2018, Vol. 33, Issue (1): 58-78. doi: 10.1007/s11390-018-1808-5

Topics: Computer Architecture and Systems; Data Management and Data Mining

• Special Section on Selected Paper from NPC 2011 •

Endurable SSD-Based Read Cache for Improving the Performance of Selective Restore from Deduplication Systems

Jian Liu1,2,3, Yun-Peng Chai2,3,*, Member, CCF, Xiao Qin4, Senior Member, IEEE, Yao-Hong Liu2,3   

  1 Division of Computer Science and Engineering, Louisiana State University, Baton Rouge, LA 70803, U.S.A.;
  2 Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education of China, Beijing 100872, China;
  3 School of Information, Renmin University of China, Beijing 100872, China;
  4 Shelby Center for Engineering Technology, Department of Computer Science and Software Engineering, Samuel Ginn College of Engineering, Auburn University, Auburn, AL 36849-5347, U.S.A.
  • Received: 2016-12-06 Revised: 2017-05-07 Online: 2018-01-05 Published: 2018-01-05
  • Contact: Yun-Peng Chai, E-mail: ypchai@ruc.edu.cn
  • About author: Jian Liu received his B.E. degree in electronic information engineering from China Agricultural University, Beijing, in 2012, and his M.E. degree in computer science and technology from National Computer System Engineering Research Institute of China, Beijing, in 2015. He is currently a Ph.D. student majoring in computer science at Louisiana State University, Baton Rouge. His research interests include data deduplication and SSD-based storage systems.
  • Supported by:

    This work is supported by the Natural Science Foundation of Beijing under Grant No. 4172031, the Fundamental Research Funds for the Central Universities of China, and the Research Funds of Renmin University of China under Grant No. 16XNLQ02. Xiao Qin's work is supported by the U.S. National Science Foundation under Grant Nos. IIS-1618669, CCF-0845257 (CAREER), CNS-0917137, CNS-0757778, CCF-0742187, CNS-0831502, CNS-0855251, and OCI-0753305. Xiao Qin's study is also supported by the Programme of Introducing Talents of Discipline to Universities (111 Project) in China under Grant No. B07038.

Abstract: Deduplication is widely used in both enterprise storage systems and cloud storage. To address the performance challenge of selective restore operations in deduplication systems, a solid-state-drive-based (SSD-based) read cache can be deployed to speed up restores by dynamically caching popular restore contents. Unfortunately, the frequent data updates induced by classical cache schemes (e.g., LRU and LFU) significantly shorten SSD lifetime and slow down I/O processing in SSDs. To address this problem, we propose a new solution, LOP-Cache, which greatly improves both the write durability of SSDs and I/O performance by enlarging the proportion of long-term popular (LOP) data among the data written into the SSD-based cache. LOP-Cache keeps LOP data in the SSD cache for a long time to reduce the number of cache replacements. Furthermore, it prevents unpopular or unnecessary data in deduplication containers from being written into the SSD cache. We implemented LOP-Cache in a prototype deduplication system to evaluate its performance. Our experimental results indicate that LOP-Cache shortens the latency of selective restore by an average of 37.3% at the cost of a small SSD-based cache whose capacity is only 5.56% of the deduplicated data. Importantly, LOP-Cache extends SSD lifetime by a factor of 9.77. These results show that LOP-Cache offers a cost-efficient SSD-based read cache solution for boosting the performance of selective restore in deduplication systems.
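The abstract does not spell out how LOP-Cache identifies long-term popular data, so the following is only a minimal illustrative sketch of the general idea it describes: admitting a chunk into the SSD cache only once it has shown repeated demand reduces SSD writes compared with plain LRU. This is not the authors' algorithm. The class names, the `admit_after` threshold, and the synthetic trace are all hypothetical and chosen purely for illustration.

```python
from collections import Counter, OrderedDict


class LRUCache:
    """Plain LRU: every miss writes the chunk into the (simulated) SSD cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()
        self.hits = 0
        self.ssd_writes = 0

    def _insert(self, chunk_id):
        # Evict the least recently used chunk if the cache is full.
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)
        self.items[chunk_id] = True
        self.ssd_writes += 1

    def access(self, chunk_id):
        if chunk_id in self.items:
            self.items.move_to_end(chunk_id)
            self.hits += 1
            return True
        self._insert(chunk_id)
        return False


class AdmissionFilteredCache(LRUCache):
    """LRU plus a popularity filter (hypothetical rule, not from the paper):
    a chunk is written to the SSD only on its `admit_after`-th request."""

    def __init__(self, capacity, admit_after=3):
        super().__init__(capacity)
        self.admit_after = admit_after
        self.seen = Counter()  # request count per chunk

    def access(self, chunk_id):
        self.seen[chunk_id] += 1
        if chunk_id in self.items:
            self.items.move_to_end(chunk_id)
            self.hits += 1
            return True
        if self.seen[chunk_id] >= self.admit_after:
            self._insert(chunk_id)
        # Otherwise the chunk is served from disk with no SSD write.
        return False


def synthetic_trace(rounds=50):
    """Two recurring hot chunks interleaved with one-off cold chunks."""
    trace = []
    cold = 0
    for _ in range(rounds):
        trace.extend(["hot-a", "hot-b"])
        for _ in range(4):
            trace.append(f"cold-{cold}")
            cold += 1
    return trace


def run(cache, trace):
    for chunk_id in trace:
        cache.access(chunk_id)
    return cache.hits, cache.ssd_writes


if __name__ == "__main__":
    trace = synthetic_trace()
    print("LRU      (hits, SSD writes):", run(LRUCache(4), trace))
    print("Filtered (hits, SSD writes):", run(AdmissionFilteredCache(4), trace))
```

On this contrived trace, plain LRU performs 300 SSD writes without a single hit because the cold chunks keep evicting the hot ones, while the filtered cache performs 2 SSD writes and gets 94 hits. The numbers illustrate the trade-off only and say nothing about the results reported in the paper.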
