
Research on Metadata Feedback and Suppression in Cross-Network Data Deduplication Systems

Metadata Feedback and Utilization for Data Deduplication Across WAN

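  • Abstract: File communication across wide area networks (WANs) is common in cloud applications such as file synchronization and mirror creation, and applying data deduplication across the WAN to such communication can often yield significant bandwidth savings at the cost of a corresponding deduplication time overhead. This overhead includes the time spent on deduplication at geographically distributed nodes and the time spent on duplicate query/answer exchanges between nodes across the network; each such exchange incurs a latency of at least one round-trip time (RTT). This paper presents MFMU, a data deduplication system across WANs that uses metadata feedback and suppression to reduce the time overhead of cross-network deduplication. In this system, the receiver uses data locality information to selectively and proactively feed part of its metadata back to the sender, reducing the number of cross-network duplicate query/answer exchanges. In addition, to limit the bandwidth overhead of metadata feedback and the disk I/O needed to read and write metadata at the receiver, the system introduces a metadata suppression technique based on hysteresis hash partitioning. Experiments on real disk-image datasets show that, without reducing bandwidth-saving efficiency, MFMU improves deduplication speed by 20% to 40% compared with baseline CDC and the state-of-the-art Bimodal deduplication algorithm.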
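To make the RTT argument concrete, here is a minimal Python model of why feedback cuts query/answer round trips. It assumes one round trip per fingerprint the sender cannot resolve locally, and a hypothetical feedback policy (not MFMU's actual selection rule, which the abstract does not specify) in which each confirmed duplicate makes the receiver push the next `feedback_window` fingerprints from its own stored order. The function `count_query_rtts` and its parameters are illustrative names.

```python
from typing import Dict, Sequence, Set


def count_query_rtts(
    stream: Sequence[str],
    receiver_store: Sequence[str],
    feedback_window: int = 0,
) -> int:
    """Count cross-network query/answer round trips needed to dedupe `stream`.

    Hypothetical model: the sender queries the receiver once for every chunk
    fingerprint it cannot resolve locally. When feedback_window > 0 and the
    receiver answers "duplicate", it also pushes the fingerprints that follow
    that chunk in its own stored order (exploiting data locality).
    """
    # First position of each fingerprint in the receiver's stored chunk order.
    position: Dict[str, int] = {}
    for i, fp in enumerate(receiver_store):
        position.setdefault(fp, i)

    resolved: Set[str] = set()  # fingerprints the sender can decide on locally
    rtts = 0
    for fp in stream:
        if fp in resolved:
            continue            # answered earlier or fed back: no round trip
        rtts += 1               # one query/answer exchange across the WAN
        resolved.add(fp)
        if feedback_window > 0 and fp in position:
            nxt = position[fp] + 1
            resolved.update(receiver_store[nxt:nxt + feedback_window])
    return rtts


store = [f"c{i}" for i in range(100)]          # chunks already at the receiver
stream = store + [f"n{i}" for i in range(10)]  # same data plus 10 new chunks
print(count_query_rtts(stream, store))                     # 110
print(count_query_rtts(stream, store, feedback_window=8))  # 22
```

Multiplying by the link RTT gives the query latency: at an assumed 50 ms per RTT, the two cases cost about 5.5 s and 1.1 s. The model ignores the price of feedback itself, namely the extra bandwidth and the receiver-side metadata disk I/O, which is exactly what the suppression component described in the abstract is meant to limit.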

     

    Abstract: Data deduplication for file communication across a wide area network (WAN), in applications such as file synchronization and mirroring in cloud environments, usually achieves significant bandwidth savings at the cost of significant deduplication time overhead. This overhead includes the time required for data deduplication at two geographically distributed nodes (e.g., due to the disk access bottleneck) and the time spent on duplication query/answer operations between the sender and the receiver, since each query or answer introduces at least one round-trip time (RTT) of latency. In this paper, we present MFMU, a data deduplication system across WANs with metadata feedback and metadata utilization, designed to reduce these deduplication-related time overheads. In MFMU, the receiver selectively feeds metadata back to the sender to reduce the number of duplication query/answer operations. In addition, to limit both the metadata-related disk I/O operations at the receiver and the bandwidth overhead introduced by metadata feedback, MFMU includes a metadata utilization component based on a hysteresis hash re-chunking mechanism. Our experimental results on real disk-image datasets demonstrate that MFMU achieves an average deduplication speedup of 20% to 40%, with no reduction in bandwidth saving ratio due to metadata feedback, compared with the "baseline" content-defined chunking (CDC) used in LBFS (Low-Bandwidth Network File System) and existing state-of-the-art deduplication solutions based on the Bimodal chunking algorithm.
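The "baseline" CDC that MFMU is compared against cuts a file at positions chosen by a rolling hash of the content rather than at fixed offsets, so inserting bytes near the start of a file shifts only nearby chunk boundaries and most later chunks still deduplicate. The Python sketch below illustrates that idea under stated assumptions: it uses a simple polynomial rolling hash as a stand-in for the Rabin fingerprints LBFS uses, and the window, mask, magic value, and size limits are illustrative values, not parameters reported in this paper. It does not implement MFMU's hysteresis hash re-chunking or Bimodal chunking.

```python
import hashlib
import random
from typing import List, Tuple

# Illustrative parameters (assumptions, not taken from the paper).
WINDOW = 48                # bytes covered by the rolling hash window
MASK = (1 << 13) - 1       # test the low 13 bits -> ~8 KiB expected gap past MIN_CHUNK
MAGIC = 0x78               # arbitrary target value for the masked bits
MIN_CHUNK = 2 * 1024       # never cut a chunk shorter than this
MAX_CHUNK = 64 * 1024      # always cut once a chunk reaches this size
BASE = 257
MOD = (1 << 61) - 1        # Mersenne prime modulus for the polynomial hash


def cdc_chunks(data: bytes) -> List[Tuple[int, int]]:
    """Split `data` into content-defined chunks, returned as (start, end) offsets."""
    out_weight = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    chunks: List[Tuple[int, int]] = []
    start, h = 0, 0
    for i, b in enumerate(data):
        if i - start >= WINDOW:              # window full: drop the oldest byte
            h = (h - data[i - WINDOW] * out_weight) % MOD
        h = (h * BASE + b) % MOD             # shift in the newest byte
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & MASK) == MAGIC) or length >= MAX_CHUNK:
            chunks.append((start, i + 1))
            start, h = i + 1, 0              # restart the window at the new chunk
    if start < len(data):
        chunks.append((start, len(data)))    # trailing partial chunk
    return chunks


def fingerprints(data: bytes) -> List[str]:
    """SHA-1 fingerprint of every chunk, as a sender would look them up."""
    return [hashlib.sha1(data[s:e]).hexdigest() for s, e in cdc_chunks(data)]


if __name__ == "__main__":
    rng = random.Random(1)
    original = bytes(rng.getrandbits(8) for _ in range(256 * 1024))
    edited = b"inserted!" + original         # shift every byte by 9 positions
    old, new = set(fingerprints(original)), fingerprints(edited)
    shared = sum(fp in old for fp in new)
    print(f"{shared}/{len(new)} chunks of the edited file are duplicates")
```

Because a boundary depends only on the last `WINDOW` bytes once a chunk exceeds `MIN_CHUNK`, both versions of the file typically agree on every boundary after the first, which is the property that lets a WAN deduplication system send only the changed chunks.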

     
