CRL: Efficient Concurrent Regeneration Codes with Local Reconstruction in Geo-Distributed Storage Systems

Abstract: Reed-Solomon (RS) codes are a typical choice for erasure coding, but their repair cost is so high that high reliability and storage efficiency come at a penalty, which makes them unsuitable for geo-distributed storage systems. In this paper we present a novel family of concurrent regeneration codes with local reconstruction (CRL). CRL codes offer three benefits. First, they minimize the network bandwidth needed for node repair. Second, they reduce the number of nodes accessed by calculating parities from a subset of data chunks and using an implied parity chunk. Third, they reconstruct data faster than existing erasure codes in geo-distributed storage systems. In addition, we demonstrate how CRL codes overcome the limitations of Reed-Solomon codes, and we show analytically that they achieve an excellent trade-off between chunk locality and minimum distance. We also present a theoretical analysis of CRL codes, including latency analysis and reliability analysis. Through quantitative comparison, we show that the data reconstruction time of CRL(6, 2, 2) is only 0.657x that of Azure LRC(6, 2, 2), where there are six data chunks, two global parities, and two local parities, and that the data reconstruction time of CRL(10, 4, 2) is only 0.656x that of HDFS-Xorbas(10, 4, 2), where there are 10 data chunks, four local parities, and two global parities. Our experiments evaluate CRL in two environments: 1) in memory, its encoding and decoding throughputs are at least 57.25% and 66.85% higher, respectively, than those of its competitors; and 2) on JBOD (Just a Bunch Of Disks), its encoding and decoding throughputs are at least 1.46x and 1.21x those of its competitors. We also show experimentally that, in a geo-distributed environment, the encoding and decoding throughputs of CRL are 28.79% and 30.19% higher, respectively, than those of LRC.
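The idea of computing parities from a subset of data chunks can be illustrated with a minimal sketch. This is a generic XOR local-parity scheme of the kind used in locally repairable codes, not the CRL construction itself, which the abstract does not specify. The grouping of six data chunks into two groups of three, and the names `xor_chunks` and `repair`, are assumptions made for illustration only.

```python
from functools import reduce


def xor_chunks(chunks):
    """Bytewise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))


# Six data chunks split into two local groups of three (hypothetical layout).
data = [bytes([i] * 4) for i in range(1, 7)]
groups = [data[:3], data[3:]]

# One local parity per group, computed only from that group's chunks.
local_parities = [xor_chunks(g) for g in groups]


def repair(lost_index):
    """Rebuild one lost data chunk from its own group and local parity.

    Returns the rebuilt chunk and the number of chunks that were read.
    """
    g, pos = divmod(lost_index, 3)
    survivors = [c for i, c in enumerate(groups[g]) if i != pos]
    rebuilt = xor_chunks(survivors + [local_parities[g]])
    return rebuilt, len(survivors) + 1


rebuilt, reads = repair(4)
print(rebuilt == data[4], reads)
```

In this sketch a single lost chunk is rebuilt by reading three chunks (two surviving group members plus the local parity), whereas an RS code over the same six data chunks would need to read six chunks for the same repair.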

     
