2013, Vol. 28, Issue (6): 1012-1024. doi: 10.1007/s11390-013-1394-5

Special Issue: Data Management and Data Mining

• Special Section on Cloud Data Management •

Application-Aware Client-Side Data Reduction and Encryption of Personal Data in Cloud Backup Services

Yin-Jin Fu1 (付印金), Nong Xiao1, * (肖侬), Member, IEEE, Xiang-Ke Liao2 (廖湘科), Member, IEEE, and Fang Liu1 (刘芳), Member, CCF   

  1. State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha 410073, China;
  2. School of Computer, National University of Defense Technology, Changsha 410073, China
  • Received: 2012-12-10; Revised: 2013-05-06; Online: 2013-11-05; Published: 2013-11-05
  • About author: Yin-Jin Fu received his B.S. degree in mathematics from Nanjing University, China, and his M.S. degree in computer science from the National University of Defense Technology (NUDT), Changsha, in 2006 and 2008, respectively. He is now a Ph.D. candidate at the State Key Laboratory of High Performance Computing at NUDT. His research areas are data deduplication, cloud storage, and distributed file systems.
  • Supported by:

    This work was supported in part by the National High Technology Research and Development 863 Program of China under Grant No. 2013AA013201, and by the National Natural Science Foundation of China under Grant Nos. 61025009, 61232003, 61120106005, 61170288, and 61379146.

Cloud backup has become an important issue as large quantities of valuable data are now stored on personal computing devices. Data reduction techniques, such as deduplication, delta encoding, and Lempel-Ziv (LZ) compression, performed at the client side before data transfer, can ease cloud backup by saving network bandwidth and reducing cloud storage space. However, client-side data reduction in cloud backup services faces efficiency and privacy challenges. In this paper, we present Pangolin, a secure and efficient cloud backup service for personal data storage that exploits application awareness. It speeds up backup operations with an application-aware client-side data reduction technique, and mitigates data security risks by integrating selective encryption into data reduction for sensitive applications. Our experimental evaluation, based on a prototype implementation, shows that our scheme improves data reduction efficiency over state-of-the-art methods by shortening the backup window size to 33%~75%, and that its security mechanism for sensitive applications has negligible impact on backup window size.
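
To make the idea of application-aware client-side data reduction concrete, the following is a minimal Python sketch, not Pangolin's actual implementation. It assumes fixed-size chunking, SHA-256 fingerprints, and zlib as a stand-in for LZ compression. The `POLICY` table, the `reduce_file` function, and all chunk sizes are hypothetical names and values chosen for illustration. Delta encoding and selective encryption are omitted, since the abstract does not describe them in detail.

```python
import hashlib
import os
import zlib

# Hypothetical per-application policy: extension -> (chunk size, compress?).
# These values are illustrative assumptions, not taken from the paper.
POLICY = {
    ".txt": (4096, True),
    ".jpg": (65536, False),  # already-compressed media: skip LZ compression
}
DEFAULT_POLICY = (8192, True)


def reduce_file(name, data, index):
    """Return the (fingerprint, payload) pairs that still need uploading.

    `index` is a client-side set of fingerprints already known to be stored.
    """
    ext = os.path.splitext(name)[1].lower()
    chunk_size, compress = POLICY.get(ext, DEFAULT_POLICY)
    uploads = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint in index:
            continue  # duplicate chunk: nothing to transfer
        index.add(fingerprint)
        payload = zlib.compress(chunk) if compress else chunk
        uploads.append((fingerprint, payload))
    return uploads
```

In this sketch, backing up a second copy of the same file yields an empty upload list, because every chunk fingerprint is already in the index. Choosing the chunk size and compression setting per file type is one simple way to express application awareness.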

ISSN 1000-9000(Print)

CN 11-2296/TP

Journal of Computer Science and Technology