Metadata Feedback and Utilization for Data Deduplication Across WAN

Bing Zhou; Jiang-Tao Wen

doi:10.1007/s11390-016-1650-6

Bing Zhou, Jiang-Tao Wen. Metadata Feedback and Utilization for Data Deduplication Across WAN[J]. Journal of Computer Science and Technology, 2016, 31(3): 604-623. DOI: 10.1007/s11390-016-1650-6

Abstract

Data deduplication for file communication across a wide area network (WAN), in applications such as file synchronization and mirroring in cloud environments, usually achieves significant bandwidth savings at the cost of significant time overhead. This overhead includes the time required for deduplication at two geographically distributed nodes (e.g., the disk access bottleneck) and the duplication query/answer operations between the sender and the receiver, since each query or answer introduces at least one round-trip time (RTT) of latency. In this paper, we present a data deduplication system across WAN with metadata feedback and metadata utilization (MFMU), designed to reduce these deduplication-related time overheads. In the proposed MFMU system, selective metadata feedback from the receiver to the sender is introduced to reduce the number of duplication query/answer operations. In addition, to limit the metadata-related disk I/O operations at the receiver, as well as the bandwidth overhead introduced by the metadata feedback, a metadata utilization component based on a hysteresis hash re-chunking mechanism is introduced. Our experimental results demonstrate that MFMU achieves an average deduplication acceleration of 20% to 40%, with the bandwidth saving ratio not reduced by the metadata feedback, compared with data deduplication solutions based on the “baseline” content-defined chunking (CDC) used in LBFS (Low-Bandwidth Network File System) and on existing state-of-the-art Bimodal chunking algorithms.
