计算机科学技术学报 ›› 2022,Vol. 37 ›› Issue (1): 128-146.doi: 10.1007/s11390-021-1662-8

所属专题: Software Systems

• • 上一篇    下一篇

MongoDB因果一致性协议测试

  

  • 收稿日期:2021-06-01 修回日期:2021-10-09 接受日期:2021-12-20 出版日期:2022-01-28 发布日期:2022-01-28

Checking Causal Consistency of MongoDB

Hong-Rong Ouyang1 (欧阳鸿荣), Heng-Feng Wei1,2,* (魏恒峰), Member, CCF, Hai-Xiang Li3,* (李海翔), Member, CCF, An-Qun Pan3 (潘安群), Member, CCF, and Yu Huang1 (黄宇), Member, CCF        

  1. 1State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
    2Software Institute, Nanjing University, Nanjing 210093, China
    3Tencent Distributed SQL Team of Technology and Engineering Group of Tencent, Tencent Inc., Shenzhen 518054, China
  • Received:2021-06-01 Revised:2021-10-09 Accepted:2021-12-20 Online:2022-01-28 Published:2022-01-28
  • Contact: Heng-Feng Wei, Hai-Xiang Li E-mail:hfwei@nju.edu.cn, blueseali@tencent.com
  • About author:Heng-Feng Wei received his B.S. and Ph.D. degrees in computer science and technology from Nanjing University, Nanjing, in 2009 and 2016, respectively. He is currently a research assistant with Software Institute and the State Key Laboratory for Novel Software Technology at Nanjing University, Nanjing. His research interests include distributed computing and formal methods. He is a member of CCF.
    Hai-Xiang Li is currently a chief researcher and chief architect of the distributed database system TDSQL at Tencent Inc., Shenzhen. His research interests include distributed computing, cloud database, transaction processing, and query optimization. He is a member of CCF.
  • Supported by:
    This work was supported by the CCF-Tencent Open Fund under Grant No.RAGR20200124 and the National Natural Science Foundation of China under Grant Nos.61702253 and 61772258.

1、研究背景:
目前已有的分布式存储系统通常采用副本技术,通过将数据的多个副本存放在不同的物理节点以提供可用性和容错性,并降低访问延迟。然而多副本技术也会导致数据一致性问题。根据CAP定理,在分布式系统中强数据一致性、可用性与分区容错性无法同时满足。因此,为了能在网络分区情况下保障系统的可用性,许多分布式存储系统选择实现弱一致性协议。因果一致性协议是一种被广泛使用的弱一致性模型,它可以保证客户端以相同的顺序观察到具有因果关系的操作。已有文献表明,因果一致性是在网络分区情况下能实现的最强的一致性模型。MongoDB是最早支持因果一致性的商用分布式数据库之一。然而,为了提供系统的可用性、容错性与安全性,MongoDB因果一致性的实现包含混合逻辑时钟、基于Raft的共识协议以及签名验证等机制,这大大增加了MongoDB因果一致性协议实现的复杂性。尽管著名的分布式系统测试机构Jepsen对于MongoDB的因果一致性进行了检测,但是该检测在规约、测试样例、算法与测试场景下都存在许多不足,这大大降低测试的可信度。
2、研究目的:
作为目前世界上使用率排名第一的非关系型数据库,MongoDB近年来得到了很多应用。我们希望能够在Jepsen测试的基础上对MongoDB的因果一致性协议进行一次更系统更充分的测试。
3、研究方法:
我们在Jepsen已有测试的基础上,通过对规约、测试样例、算法与测试场景的完善与改进对MongoDB的因果一致性协议进行一次更系统更充分的测试。具体而言,首先,我们采用使用形式化规约语言TLA+与模型检验工具TLC对因果一致性变体CC,CM和CCv的规约,与基于“非法模式”的因果一致性检测算法进行形式化建模。然后,在Jepsen已有测试的基础上,我们引入了网络分区,节点失效与数据块迁移的测试场景,并收集MongoDB在由YCSB生成的覆盖率更广的测试样例下得到的操作历史。最后,使用基于“非法模式”的因果一致性检测算法检测MongoDB操作历史对于CC,CM和CCv这三种因果一致性变体的满足情况。
4、研究结果:
我们首先成功用TLA+对因果一致性变体CC,CM和CCv的规约和基于“非法模式”的因果一致性检测算法进行形式化建模,通过对从模型检验的角度验证了算法的正确性。接着,通过使用Jepsen框架对MongoDB因果一致性协议进行的更充分测试,我们得到MongoDB在不同情况下对于因果一致性的满足情况:在网络分区、节点失效等干扰存在的情况下,那么只有majority的写关注与majority的读关注设置下MongoDB才能提供因果一致性的保障。如果系统时钟正常运行,那么最弱的w1写关注与local读关注设置下MongoDB也能提供因果一致性的保障。
5、研究结论:
我们通过从规约、样例生成、算法以及测试场景四个方面对Jepsen关于MongoDB因果一致性协议的测试进行改进,对MongoDB并进行了更为充分的测试,并得到了在不同网络条件下不同读关注与写关注设置的MongoDB对因果一致性三种主要变体(CC,CM与CCv)的支持情况,这对于用户使用MongoDB的因果一致性有很大的参考价值。未来,我们希望能够使用TLA+对MongoDB的事务协议进行形式化建模与验证。

关键词: MongoDB, 因果一致性, Jepsen, 一致性检验, TLA+

Abstract: MongoDB is one of the first commercial distributed databases that support causal consistency. Its implementation of causal consistency combines several research ideas for achieving scalability, fault tolerance, and security. Given its inherent complexity, a natural question arises: "Has MongoDB correctly implemented causal consistency as it claimed?'' To address this concern, the Jepsen team has conducted black-box testing of MongoDB. However, this Jepsen testing has several drawbacks in terms of specification, test case generation, implementation of causal consistency checking algorithms, and testing scenarios, which undermine the credibility of its reports. In this work, we propose a more thorough design of Jepsen testing of causal consistency of MongoDB. Specifically, we fully implement the causal consistency checking algorithms proposed by Bouajjani et al. and test MongoDB against three well-known variants of causal consistency, namely CC, CCv, and CM, under various scenarios including node failures, data movement, and network partitions. In addition, we develop formal specifications of causal consistency and their checking algorithms in TLA+ , and verify them using the TLC model checker. We also explain how TLA+ specification can be related to Jepsen testing.

Key words: MongoDB, casual consistency, Jepsen, consistency checking, TLA+

[1] Schultz W, Avitabile T, Cabral A. Tunable consistency in MongoDB. Proc. VLDB Endow., 2019, 12(12): 2071-2081. DOI: 10.14778/3352063.3352125.
[2] Tyulenev M, Schwerin A, Kamsky A, Tan R, Cabral A, Mulrow J. Implementation of cluster-wide logical clock and causal consistency in MongoDB. In Proc. the 2019 International Conference on Management of Data, June 30-July 5, 2019, pp.636-650. DOI: 10.1145/3299869.3314049.
[3] Abadi D. Consistency tradeoffs in modern distributed database system design: CAP is only part of the story. IEEE Computer, 2012, 45(2): 37-42. DOI: 10.1109/MC.2012.33.
[4] Brewer E A. Towards robust distributed systems (abstract). In Proc. the 19th Annual ACM Symposium on Principles of Distributed Computing, July 2000, Article No.7. DOI: 10.1145/343477.343502.
[5] Gilbert S, Lynch N. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. ACM SIGACT News, 2002, 33(2): 51-59. DOI: 10.1145/564585.564601.
[6] Brzezinski J, Sobaniec C, Wawrzyniak D. From session causality to causal consistency. In Proc. the 12th Euromicro Conference on Parallel, Distributed and Network-Based Processing, Feb. 2004, pp.152-158. DOI: 10.1109/EMPDP.2004.1271440.
[7] Kulkarni S S, Demirbas M, Madappa D, Avva B, Leone M. Logical physical clocks. In Proc. the 18th International Conference on Principles of Distributed Systems, Dec. 2014, pp.17-32. DOI: 10.1007/978-3-319-14472-6-2.
[8] Du J, Iorgulescu C, Roy A, Zwaenepoel W. GentleRain: Cheap and scalable causal consistency with physical clocks. In Proc. the ACM Symposium on Cloud Computing, Nov. 2014, Article No.4. DOI: 10.1145/2670979.2670983.
[9] Akkoorath D D, Tomsic A Z, Bravo M, Li Z, Crain T, Bieniusa A, Preguiça N, Shapiro M. Cure: Strong semantics meets high availability and low latency. In Proc. the 36th International Conference on Distributed Computing Systems, June 2016, pp.405-414. DOI: 10.1109/ICDCS.2016.98.
[10] Ongaro D, Ousterhout J. In search of an understandable consensus algorithm. In Proc. the 2014 USENIX Conference on USENIX Annual Technical Conference, June 2014, pp.305-320.
[11] Bouajjani A, Enea C, Guerraoui R, Hamza J. On verifying causal consistency. In Proc. the 44th ACM Symposium on Principles of Programming Languages, Jan. 2017, pp.626-638. DOI: 10.1145/3009837.3009888.
[12] Burckhardt S. Principles of eventual consistency. Found. Trends Program. Lang., 2014, 1(1/2): 1-150. DOI: 10.1561/2500000011.
[13] Perrin M, Mostéfaoui A, Jard C. Causal consistency: Beyond memory. In Proc. the 21st ACM Symposium on Principles and Practice of Parallel Programming, Aug. 2016, Article No.26. DOI: 10.1145/2851141.2851170.
[14] Ahamad M, Neiger G, Burns J E, Kohli P, Hutto P W. Causal memory: Definitions, implementation, and programming. Distributed Computing, 1995, 9(1): 37-49. DOI: 10.1007/BF01784241.
[15] Lynch N A. Distributed Algorithms. Morgan Kaufmann Publishers Inc., 1996.
[16] Ouyang H R, Wei H F, Huang Y. Checking causal consistency of MongoDB. In Proc. the 12th Asia-Pacific Symposium on Internetware, Nov. 2020, pp.209-216. DOI: 10.1145/3457913.3457928.
[17] Lamport L. Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 1978, 21(7): 558-565. DOI: 10.1145/359545.359563.
[18] Lesani M, Bell C J, Chlipala A. Chapar: Certified causally consistent distributed key-value stores. In Proc. the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Jan. 2016, pp.357-370. DOI: 10.1145/2837614.2837622.
[19] Lamport L. Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers (1st edition). Addison-Wesley Professional, 2002.
[20] Lamport L. The temporal logic of actions. ACM Trans. Program. Lang. Syst., 1994, 16(3): 872-923. DOI: 10.1145/177492.177726.
[21] Wei H F, Tang R Z, Huang Y, Lv J. Jupiter made abstract, and then refined. Journal of Computer Science and Technology, 2020, 35(6): 1343-1364. DOI: 10.1007/s11390-020-0516-0.
[22] Yu Y, Manolios P, Lamport L. Model checking TLA+ specifications. In Proc. the 10th IFIP WG 10.5 Advanced Research Working Conference on Correct Hardware Design and Verification Methods, Sept. 1999, pp.54-66. DOI: 10.1007/3-540-48153-2-6.
[23] Cooper B F, Silberstein A, Tam E, Ramakrishnan R, Sears R. Benchmarking cloud serving systems with YCSB. In Proc. the 1st ACM Symposium on Cloud Computing, June 2010, pp.143-154. DOI: 10.1145/1807128.1807152.
[24] Bowles J, Caminati M B. A verified algorithm enumerating event structures. In Proc. the 10th International Conference on Intelligent Computer Mathematics, July 2017, pp.239-254. DOI: 10.1007/978-3-319-62075-6-17.
[25] Gibbons P, Korach E. Testing shared memories. SIAM Journal on Computing, 1997, 26(4): 1208-1244. DOI: 10.1137/S0097539794279614.
[26] Herlihy M P, Wing J M. Linearizability: A correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst., 1990, 12(3): 463-492. DOI: 10.1145/78969.78972.
[27] Attiya H, Welch J L. Sequential consistency versus linearizability. ACM Trans. Comput. Syst., 1994, 12(2): 91-122. DOI: 10.1145/176575.176576.
[28] Wei H, Huang Y, Cao J, Ma X, Lv J. Verifying Pipelined-RAM consistency over read/write traces of data replicas. IEEE Transactions on Parallel and Distributed Systems, 2013, 27(5): 1511-1523. DOI: 10.1109/TPDS.2015.2453985.
[29] Lipton R J, Sandberg J. PRAM: A scalable shared memory. Technical Report, Department of Computer Science, Princeton University, 1988. https://www.cs.princeton.edu/research/techreps/TR-180-88, Aug. 2021.
[1] Jia-Qi Yin, Hui-Biao Zhu, Yuan Fei. 利用TLA+规约和验证Zab协议[J]. 计算机科学技术学报, 2020, 35(6): 1312-1323.
[2] Heng-Feng Wei, Rui-Ze Tang, Yu Huang, Jian Lv. Jupiter协议的抽象与精化[J]. 计算机科学技术学报, 2020, 35(6): 1343-1364.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 闵应骅;. Easy Test Generation PLAs[J]. , 1987, 2(1): 72 -80 .
[2] 孙永强; 陆汝占; 黄小戎;. Termination Preserving Problem in the Transformation of Applicative Programs[J]. , 1987, 2(3): 191 -201 .
[3] 冯玉琳;. Hierarchical Protocol Analysis by Temporal Logic[J]. , 1988, 3(1): 56 -69 .
[4] 徐洁; 李庆南; 黄世泽; 徐江峰;. DFTSNA:A Distributed Fault-Tolerant Shipboard System[J]. , 1990, 5(2): 109 -116 .
[5] 李锦涛; 闵应骅;. Product-Oriented Test-Pattern Generation for Programmable Logic Arrays[J]. , 1990, 5(2): 164 -174 .
[6] 苏伯珙; 王剑; 夏金石;. TST——An Algorithm for Global Microcode Compaction with Timing Constraints[J]. , 1991, 6(1): 97 -107 .
[7] 李涛;. Competition Based Neural Networks for Assignment Problems[J]. , 1991, 6(4): 305 -315 .
[8] 唐泽圣;. Octree Representation and Its Applications in CAD[J]. , 1992, 7(1): 29 -38 .
[9] 周笛;. Adapting Backward Error Recovery to Parallel Real Time Systems[J]. , 1992, 7(3): 257 -267 .
[10] 沈一栋;. An Algorithm for Determining Database Consistency Under the Closed World Assumption[J]. , 1992, 7(4): 289 -294 .
版权所有 © 《计算机科学技术学报》编辑部
本系统由北京玛格泰克科技发展有限公司设计开发 技术支持:support@magtech.com.cn
总访问量: