A Data Deduplication Framework of Disk Images with Adaptive Block Skipping

Bing Zhou; Jiang-Tao Wen

doi:10.1007/s11390-016-1665-z

Bing Zhou, Jiang-Tao Wen. A Data Deduplication Framework of Disk Images with Adaptive Block Skipping[J]. Journal of Computer Science and Technology, 2016, 31(4): 820-835. DOI: 10.1007/s11390-016-1665-z

Abstract

We describe an efficient and easily applicable data deduplication framework with heuristic-prediction-based adaptive block skipping for real-world datasets such as disk images. The framework reduces deduplication-related overheads and improves deduplication throughput while maintaining good deduplication efficiency. Under the framework, deduplication operations are skipped for data chunks that heuristic prediction identifies as likely non-duplicates. Skipping is combined with a hit-and-matching extension process that identifies duplicates within skipped blocks, and with a hysteresis-based hash indexing process that updates the hash indices for skipped chunks when they are re-encountered. For performance evaluation, the proposed framework was integrated into and implemented on the existing Data Domain and sparse indexing deduplication algorithms. Experimental results on a real-world dataset of 1.0 TB of disk images showed that adaptive block skipping significantly reduced deduplication-related overheads. For Data Domain, with deduplication metadata stored on disk, deduplication throughput improved by 30% to 80%. For sparse indexing, with an in-RAM sparse index, RAM usage was reduced by 25% to 40% and deduplication throughput improved by 15% to 20%. In both cases, the corresponding reduction in deduplication ratio was below 5%.
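The abstract does not state which heuristic decides that a chunk is a likely non-duplicate. As an illustration only, the Python sketch below assumes one simple rule. After `miss_threshold` consecutive index misses, the next few fixed-size chunks are stored without fingerprinting or lookup. This skip window doubles (up to `max_skip`) each time the next looked-up chunk is also a miss, and it resets after a duplicate hit. The function name, its parameters, the SHA-256 fingerprints, and the in-RAM `set` index are all hypothetical. The hit-and-matching extension and the hysteresis-based indexing described above are not modeled.

```python
import hashlib


def dedup_with_skipping(data, index, chunk_size=4096, miss_threshold=4, max_skip=64):
    """Hypothetical sketch of adaptive block skipping (not the paper's algorithm).

    `index` is an assumed in-RAM set of SHA-256 chunk fingerprints.
    After `miss_threshold` consecutive misses, the next `skip` chunks are
    predicted unique and stored without hashing or index lookup. The window
    doubles (capped at `max_skip`) while looked-up chunks keep missing, and
    resets to 1 on a duplicate hit.
    Returns (lookups, duplicates, stored_chunks).
    """
    lookups = duplicates = stored = 0
    misses = 0   # consecutive non-duplicate lookups
    skip = 1     # length of the next skip window, in chunks
    to_skip = 0  # chunks remaining in the active skip window
    for off in range(0, len(data), chunk_size):
        chunk = data[off:off + chunk_size]
        if to_skip > 0:
            # Predicted non-duplicate: store as new data, no hashing or lookup.
            to_skip -= 1
            stored += 1
            continue
        lookups += 1
        fp = hashlib.sha256(chunk).digest()
        if fp in index:
            duplicates += 1
            misses, skip = 0, 1  # duplicate found: reset the predictor
        else:
            index.add(fp)
            stored += 1
            misses += 1
            if misses >= miss_threshold:
                to_skip = skip
                skip = min(skip * 2, max_skip)  # adapt: skip further next time
    return lookups, duplicates, stored
```

For example, on 100 distinct 4-byte chunks with `chunk_size=4` and the default thresholds, this sketch performs 10 index lookups instead of 100. Skipped chunks are never fingerprinted, so they never enter the index. Under this sketch, that is the source of both the throughput gain and the small loss in deduplication ratio.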