2018, Vol. 33, Issue (1): 58-78. doi: 10.1007/s11390-018-1808-5

Special Issue: Computer Architecture and Systems; Data Management and Data Mining

• Computer Architecture and Systems

Endurable SSD-Based Read Cache for Improving the Performance of Selective Restore from Deduplication Systems

Jian Liu1,2,3, Yun-Peng Chai2,3,*, Member, CCF, Xiao Qin4, Senior Member, IEEE, Yao-Hong Liu2,3   

  1 Division of Computer Science and Engineering, Louisiana State University, Baton Rouge, LA 70803, U.S.A.;
  2 Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education of China, Beijing 100872, China;
  3 School of Information, Renmin University of China, Beijing 100872, China;
  4 Shelby Center for Engineering Technology, Department of Computer Science and Software Engineering, Samuel Ginn College of Engineering, Auburn University, Auburn, AL 36849-5347, U.S.A.
  • Received: 2016-12-06; Revised: 2017-05-07; Online: 2018-01-05; Published: 2018-01-05
  • Contact: Yun-Peng Chai, E-mail: ypchai@ruc.edu.cn
  • About author: Jian Liu received his B.E. degree in electronic information engineering from China Agricultural University, Beijing, in 2012, and his M.E. degree in computer science and technology from National Computer System Engineering Research Institute of China, Beijing, in 2015. He is currently a Ph.D. student majoring in computer science at Louisiana State University, Baton Rouge. His research interests include data deduplication and SSD-based storage systems.
  • Supported by:

    This work is supported by the Natural Science Foundation of Beijing under Grant No. 4172031, the Fundamental Research Funds for the Central Universities of China, and the Research Funds of Renmin University of China under Grant No. 16XNLQ02. Xiao Qin's work is supported by the U.S. National Science Foundation under Grant Nos. IIS-1618669, CCF-0845257 (CAREER), CNS-0917137, CNS-0757778, CCF-0742187, CNS-0831502, CNS-0855251, and OCI-0753305. Xiao Qin's study is also supported by the Programme of Introducing Talents of Discipline to Universities (111 Project) in China under Grant No. B07038.

Deduplication is widely used in both enterprise storage systems and cloud storage. To overcome the performance bottleneck of selective restore operations in deduplication systems, a solid-state-drive-based (SSD-based) read cache can be deployed to speed up restores by dynamically caching popular restore contents. Unfortunately, the frequent data updates induced by classical cache schemes (e.g., LRU and LFU) significantly shorten SSD lifetime while slowing down I/O processing in SSDs. To address this problem, we propose a new solution, LOP-Cache, which greatly improves both the write durability of SSDs and I/O performance by enlarging the proportion of long-term popular (LOP) data among the data written into the SSD-based cache. LOP-Cache keeps LOP data in the SSD cache for a long period to reduce the number of cache replacements. Furthermore, it prevents unpopular or unnecessary data in deduplication containers from being written into the SSD cache. We implemented LOP-Cache in a prototype deduplication system to evaluate its performance. Our experimental results indicate that LOP-Cache shortens the latency of selective restore by an average of 37.3% at the cost of a small SSD-based cache whose capacity is only 5.56% of the deduplicated data. Importantly, LOP-Cache improves SSD lifetime by a factor of 9.77. These results show that LOP-Cache offers a cost-efficient SSD-based read cache solution for boosting the performance of selective restore in deduplication systems.
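The abstract does not spell out how LOP-Cache decides what to write, so the following is only a minimal sketch of the general idea it describes: admitting data to the SSD cache selectively, rather than on every miss, reduces cache writes. It assumes a simple access-count threshold as the popularity test, which is an illustrative stand-in and not the authors' algorithm. The names `LRUCache`, `PopularityFilteredCache`, `threshold`, `make_trace`, and `replay` are hypothetical.

```python
from collections import Counter, OrderedDict


class LRUCache:
    """Classical LRU: every miss writes the missed item into the cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()
        self.writes = 0  # number of writes into the (simulated) SSD
        self.hits = 0

    def access(self, key):
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return True
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # evict least recently used
        self.items[key] = True
        self.writes += 1
        return False


class PopularityFilteredCache:
    """Assumed policy: admit a key only after `threshold` accesses, and
    replace a resident key only with a strictly more popular one."""

    def __init__(self, capacity, threshold):
        self.capacity = capacity
        self.threshold = threshold
        self.counts = Counter()  # access counts for all keys seen so far
        self.items = set()
        self.writes = 0
        self.hits = 0

    def access(self, key):
        self.counts[key] += 1
        if key in self.items:
            self.hits += 1
            return True
        if self.counts[key] < self.threshold:
            return False  # not popular enough: no SSD write
        if len(self.items) >= self.capacity:
            victim = min(self.items, key=lambda k: self.counts[k])
            if self.counts[victim] >= self.counts[key]:
                return False  # resident data is at least as popular
            self.items.remove(victim)
        self.items.add(key)
        self.writes += 1
        return False


def make_trace(rounds):
    """Synthetic trace: two hot keys interleaved with one-time cold keys."""
    trace = []
    for i in range(rounds):
        trace.extend(["a", "b", f"cold{i}"])
    return trace


def replay(cache, trace):
    """Feed every access in the trace to the cache and return it."""
    for key in trace:
        cache.access(key)
    return cache


if __name__ == "__main__":
    trace = make_trace(50)
    for cache in (LRUCache(2), PopularityFilteredCache(2, threshold=2)):
        replay(cache, trace)
        print(type(cache).__name__, "writes:", cache.writes, "hits:", cache.hits)
```

On this synthetic trace the one-time keys keep pushing the hot keys out of the LRU cache, while the filtered cache writes each hot key once and then serves it from cache. The trace is constructed to show the effect and says nothing about the workloads or results reported in the paper.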


ISSN 1000-9000(Print)

CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
E-mail: jcst@ict.ac.cn