Improving Metadata Caching Efficiency for Data Deduplication via In-RAM Metadata Utilization

Bing Zhou; Jiang-Tao Wen

doi:10.1007/s11390-016-1664-0

Bing Zhou, Jiang-Tao Wen. Improving Metadata Caching Efficiency for Data Deduplication via In-RAM Metadata Utilization[J]. Journal of Computer Science and Technology, 2016, 31(4): 805-819. DOI: 10.1007/s11390-016-1664-0

Abstract

We describe a data deduplication system for backup storage of PC disk images, named in-RAM metadata utilizing deduplication (IR-MUD). First, in-RAM hash granularity adaptation and miniLZO-based data compression are proposed to reduce the in-RAM metadata size and thereby the space overhead of the in-RAM metadata caches. Second, an in-RAM metadata write cache, as opposed to the traditional metadata read cache, is proposed to further reduce metadata-related disk I/O operations and improve deduplication throughput. During deduplication, the metadata write cache is managed under the LRU caching policy. Each manifest that is hit in the metadata write cache avoids an expensive manifest reload from the disk. After deduplication, all manifests in the metadata write cache are written to the disk and the cache is cleared. Our experimental results on a 1.5 TB real-world disk image dataset show that 1) IR-MUD achieved about 95% size reduction for the deduplication metadata at a small time overhead; 2) when the metadata write cache was not used, with the same RAM space for the metadata read cache, IR-MUD achieved a 400% higher RAM hit ratio and a 50% higher deduplication throughput than the classic Sparse Indexing deduplication system, which uses no metadata utilization approaches; and 3) when the metadata write cache was used and enough RAM space was available, IR-MUD achieved a 500% higher RAM hit ratio than Sparse Indexing and a 70% higher deduplication throughput than IR-MUD with only a single metadata read cache. The in-RAM metadata harnessing and metadata write caching approaches of IR-MUD can be applied in most parallel deduplication systems to improve metadata caching efficiency.
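The write-cache behavior described in the abstract (LRU management during deduplication, a disk reload avoided on each hit, and a final flush to disk) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `ManifestWriteCache`, its methods, the `disk_loads` counter, and the plain dictionary standing in for on-disk manifest storage are all assumptions made for illustration, and writing an evicted manifest back to disk is likewise an assumed policy.

```python
from collections import OrderedDict


class ManifestWriteCache:
    """Hypothetical LRU write cache for manifests (illustrative sketch only)."""

    def __init__(self, capacity, store):
        self.capacity = capacity    # maximum number of manifests kept in RAM
        self.store = store          # dict standing in for on-disk manifest storage
        self.cache = OrderedDict()  # manifest_id -> list of chunk hashes, LRU order
        self.disk_loads = 0         # number of manifest reloads from "disk"

    def get(self, manifest_id):
        """Return a manifest, reloading it from disk only on a cache miss."""
        if manifest_id in self.cache:
            # Hit: mark as most recently used; no disk reload is needed.
            self.cache.move_to_end(manifest_id)
            return self.cache[manifest_id]
        if manifest_id in self.store:
            # Miss: pay for a reload from disk, then cache the manifest.
            self.disk_loads += 1
            manifest = self.store[manifest_id]
            self._insert(manifest_id, manifest)
            return manifest
        return None

    def put(self, manifest_id, manifest):
        """Cache a newly created or updated manifest instead of writing it at once."""
        self._insert(manifest_id, manifest)

    def _insert(self, manifest_id, manifest):
        self.cache[manifest_id] = manifest
        self.cache.move_to_end(manifest_id)
        while len(self.cache) > self.capacity:
            # Evict the least recently used manifest and write it back (assumed policy).
            old_id, old_manifest = self.cache.popitem(last=False)
            self.store[old_id] = old_manifest

    def flush(self):
        """After deduplication: store every cached manifest on disk, then clear the cache."""
        self.store.update(self.cache)
        self.cache.clear()
```

In this sketch, a manifest that is still cached when it is needed again costs no disk read, which is the effect the abstract attributes to the metadata write cache; `disk_loads` only makes that saving observable.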
