
An Asynchronous Metadata Consistency Protocol for Cluster File Systems

A Non-Forced-Write Atomic Commit Protocol for Cluster File Systems

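  • Abstract: In cluster file systems, adopting a multi-metadata-server architecture has become an inevitable trend. Maintaining the consistency of distributed metadata operations is therefore a key issue for the reliability and availability of a cluster file system. Existing atomic commit protocols all require multiple synchronous disk writes, which greatly degrades the performance of distributed metadata operations. Because network latency is far lower than disk write latency, this paper proposes Dual-Log (DL), an atomic commit protocol that requires no synchronous writes. DL is designed for distributed metadata operations that involve only two metadata servers: the two servers send the redo logs of their local sub-operations to each other over the network. When one of the metadata servers crashes, it can recover its own sub-operations from the redo log recorded redundantly on the other server. We implemented DL in a cluster file system and evaluated its performance. The results show that, compared with two other widely used atomic commit protocols, EP and S2PC-MP, DL reduces the average response time of distributed metadata operations by 40% to 60%. In addition, a crashed server recovers no more than 10,000 uncompleted distributed metadata operations in no more than 1 s.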

     

    Abstract: Distributed metadata consistency is one of the critical issues for metadata clusters in distributed file systems. Existing methods for maintaining metadata consistency generally need several forced log writes. Because synchronous disk I/O is very inefficient, these writes greatly increase the average response time of metadata operations. This paper presents an asynchronous atomic commit protocol (ACP) named Dual-Log (DL), which needs no forced log writes. DL is optimized for distributed metadata operations that involve only two metadata servers: each server records its redo log on its counterpart by sending it over the low-latency network. A crashed metadata server can then redo its metadata operations from the redundant redo log. Because network latency is much lower than disk I/O latency, DL can improve the performance of the distributed metadata service significantly. A prototype of DL is implemented on top of a local journal. Its performance is compared with that of two widely used protocols, EP and S2PC-MP. The results show that the average response time of distributed metadata operations is reduced by about 40% to 60%, and that the recovery time is only 1 second with 10,000 uncompleted distributed metadata operations.
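
    The mutual redo logging described in the abstract can be illustrated with a small in-memory sketch. It is not the authors' implementation: the names `RedoRecord`, `MetadataServer`, `execute_sub_op`, `recover_from`, and `distributed_op` are hypothetical, the network transfer is modelled as a direct method call, and all server memory is assumed to be volatile. Asynchronous local journaling, log truncation, and failure of both servers are left out.

```python
from dataclasses import dataclass, field


@dataclass
class RedoRecord:
    op_id: int   # id of the distributed operation (hypothetical field)
    key: str     # metadata item changed by the sub-operation
    value: str   # value to reapply when the record is redone


@dataclass
class MetadataServer:
    name: str
    state: dict = field(default_factory=dict)      # volatile metadata
    peer_redo: list = field(default_factory=list)  # redo records kept for the peer

    def apply(self, rec: RedoRecord) -> None:
        self.state[rec.key] = rec.value

    def execute_sub_op(self, rec: RedoRecord, peer: "MetadataServer") -> None:
        # Apply the sub-operation in memory, then ship its redo record to
        # the peer instead of forcing it to the local disk.
        self.apply(rec)
        peer.peer_redo.append(rec)

    def crash(self) -> None:
        # Assumption of this sketch: a crash loses everything held in memory.
        self.state.clear()
        self.peer_redo.clear()

    def recover_from(self, peer: "MetadataServer") -> int:
        # Replay the redo records that the surviving peer holds for us.
        for rec in peer.peer_redo:
            self.apply(rec)
        return len(peer.peer_redo)


def distributed_op(op_id, a, a_update, b, b_update):
    """Run one two-server operation; each side logs its redo record on the other."""
    a.execute_sub_op(RedoRecord(op_id, *a_update), b)
    b.execute_sub_op(RedoRecord(op_id, *b_update), a)


if __name__ == "__main__":
    mds_a, mds_b = MetadataServer("mds-a"), MetadataServer("mds-b")
    distributed_op(1, mds_a, ("/dir/entry", "inode-7"), mds_b, ("inode-7", "nlink=1"))
    mds_a.crash()
    replayed = mds_a.recover_from(mds_b)
    print(replayed, mds_a.state)
```

    In this sketch the crashed server rebuilds its state purely from records held by its counterpart, which is why no synchronous disk write appears on the operation path.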

     
