Journal of Computer Science and Technology ›› 2021, Vol. 36 ›› Issue (5): 1037-1050.doi: 10.1007/s11390-021-1247-6

Special Issue: Computer Architecture and Systems

• Special Section of APPT 2021 (Part 1) • Previous Articles     Next Articles

Write-Optimized B+ Tree Index Technology for Persistent Memory

Rui-Xiang Ma1,2,3,4, Fei Wu1,2,3,4,*, Senior Member, CCF, Member, IEEE, Bu-Rong Dong1,2,3,4 Meng Zhang1,2,3,4, Wei-Jun Li5, Senior Member, CCF, IEEE, and Chang-Sheng Xie1,2,3,4, Member, IEEE        

  1. 1 Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan 430070, China;
    2 Key Laboratory of Information Storage System, Ministry of Education of China, Wuhan 430070, China;
    3 Engineering Research Center of Data Storage Systems and Technology, Huazhong University of Science and Technology Wuhan 430070, China;
    4 School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430070, China;
    5 Shenzhen DAPU Microelectronics Co., Ltd, Shenzhen 518116, China
  • Received:2020-12-31 Revised:2021-08-28 Online:2021-09-30 Published:2021-09-30
  • About author:Rui-Xiang Ma is currently pursuing his Ph.D. degree with the Wuhan National Laboratory for Optoelectronics (WNLO), Huazhong University of Science and Technology (HUST), Wuhan. His current research interests include intelligent storage, machine learning, and non-volatile memory.
  • Supported by:
    This work was supported in part by the National Natural Science Foundation of China under Grant Nos. U1709220, U2001203, 61821003, 61872413, and 61902137, in part by the National Key Research and Development Program of China under Grant No. 2018YFB1003305, and in part by the Key-Area Research and Development Program of Guangdong Province of China under Grant No. 2019B010107001.

Due to its low latency, byte-addressable, non-volatile, and high density, persistent memory (PM) is expected to be used to design a high-performance storage system. However, PM also has disadvantages such as limited endurance, thereby proposing challenges to traditional index technologies such as B+ tree. B+ tree is originally designed for dynamic random access memory (DRAM)-based or disk-based systems and has a large write amplification problem. The high write amplification is detrimental to a PM-based system. This paper proposes WO-tree, a write-optimized B+ tree for PM. WO-tree adopts an unordered write mechanism for the leaf nodes, and the unordered write mechanism can reduce a large number of write operations caused by maintaining the entry order in the leaf nodes. When the leaf node is split, WO-tree performs the cache line flushing operation after all write operations are completed, which can reduce frequent data flushing operations. WO-tree adopts a partial logging mechanism and it only writes the log for the leaf node. The inner node recognizes the data inconsistency by the read operation and the data can be recovered using the leaf node information, thereby significantly reducing the logging overhead. Furthermore, WO-tree adopts a lock-free search for inner nodes, which reduces the locking overhead for concurrency operation. We evaluate WO-tree using the Yahoo! Cloud Serving Benchmark (YCSB) workloads. Compared with traditional B+ tree, wB-tree, and Fast-Fair, the number of cache line flushes caused by WO-tree insertion operations is reduced by 84.7%, 22.2%, and 30.8%, respectively, and the execution time is reduced by 84.3%, 27.3%, and 44.7%, respectively.

Key words: persist memory; B+ tree; write amplication; consistency; YCSB (Yahoo! Cloud Serving Benchmark);

[1] Mueller W, Aichmayr G, Bergner W et al. Challenges for the DRAM cell scaling to 40nm. In Proc. IEEE International Electron Devices Meeting, December 2005, pp.336-339. DOI:10.1109/IEDM.2005.1609344.
[2] Mandelman A J, Dennard H R, Bronner B G et al. Challenges and future directions for the scaling of dynamic random-access memory (DRAM). IBM Journal of Research and Development, 2002, 46(2.3):187-212. DOI:10.1147/rd.462.0187.
[3] Freitas R, Wilcke W. Storage-class memory:The next storage system technology. IBM Journal of Research and Development, 2008, 52(4.5):439-447. DOI:10.1147/rd.524.0439.
[4] Arulraj J, Pavlo A, Dulloor S. Let's talk about storage & recovery methods for non-volatile memory database systems. In Proc. the 2015 ACM SIGMOD International Conference on Management of Data, May 31-June 4, 2015, pp.707-722. DOI:10.1145/2723372.2749441.
[5] Harter T, Borthakur D, Dong S et al. Analysis of HDFS under HBase:A facebook messages case study. In Proc. the 12th USENIX Conference on File and Storage Technologies, February 2014, pp.199-212.
[6] Lepers B, Balmau O, Gupta K et al. KVell:The design and implementation of a fast persistent key-value store. In Proc. the 27th ACM Symposium on Operating Systems Principles, October 2019, pp.447-461. DOI:10.1145/3341301.3359628.
[7] Wang Y, Tan J, Mao R et al. Temperature-aware persistent data management for LSM-Tree on 3-D NAND flash memory. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2020, 39(12):4611-4622. DOI:10.1109/TCAD.2020.2982623.
[8] Lu L, Pillai S T, Arpaci-Dusseau C A et al. WiscKey:Separating keys from values in SSD conscious storage. ACM Transactions on Storage, 2017, 13(1):1-28. DOI:10.1145/3033273.
[9] Li Y, Chan H, Lee P et al. HashKV:Enabling efficient updates in KV storage via hashing. In Proc. the 2018 USENIX Annual Technical Conference, June 2018, pp.1007-1019. DOI:10.5555/3277355.3277451.
[10] Raju P, Kadekodi R, Chidambaram V et al. PebblesDB:Building key-value stores using fragmented log-structured merge trees. In Proc. the 26th Symposium on Operating Systems Principle, October 2017, pp.497-514. DOI:10.1145/3132747.3132765.
[11] Chen S, Jin Q. Persistent B+-trees in non-volatile main memory. Proc. the VLDB Endowment, 2015, 8(7):786-797. DOI:10.14778/2752939.2752947.
[12] Lee B, Ipek E, Mutlu O et al. Phase change memory architecture and the quest for scalability. Communications of the ACM, 2010, 53(7):99-106. DOI:10.1145/1785414.1785441.
[13] Zhou P, Zhao B, Yan J et al. A durable and energy efficient main memory using phase change memory technology. In Proc. the 36th Annual International Symposium on Computer Architecture, June 2009, pp.14-23. DOI:10.1145/1555754.1555759.
[14] Yu S. Resistive Random Access Memory (RRAM). Morgan & Claypool, 2016. 10.2200/S00681ED1V01Y201510EET006.
[15] Apalkov D, Khvalkovskiy A, Watts S et al. Spintransfer torque magnetic random access memory (STTMRAM). ACM Journal on Emerging Technologies in Computing Systems, 2013, 9(2):Article No. 13. DOI:10.1145/2463585.2463589.
[16] Venkataraman S, Tolia N, Ranganathan P et al. Consistent and durable data structures for non-volatile byteaddressable memory. In Proc. the 9th USENIX Conference on File and Storage Technologies, February 2011, pp.61-75.
[17] Yang J, Wei Q, Cheng C et al. NV-tree:Reducing consistency cost for NVM-based single level systems. In Proc. the 13th USENIX Conference on File and Storage Technologies, February 2015, pp.167-181. DOI:10.5555/2750482.2750495.
[18] Oukid I, Lasperas J, Nica A et al. FPTree:A hybrid SCMDRAM persistent and concurrent B-tree for storage class memory. In Proc. the 2016 International Conference on Management of Data, June 26-July 1, 2016, pp.371-386. DOI:10.1145/2882903.2915251.
[19] Comer D. Ubiquitous B-tree. ACM Comput. Surv., 1979, 11(2):121-137. DOI:10.1145/356770.356776.
[20] Bayer R. Binary B-trees for virtual memory. In Proc. the 1971 ACM SIGFIDET Workshop on Data Description, Access and Control, November 1971, pp.219-235. DOI:10.1145/1734714.1734731.
[21] Ni J, Hu W, Li G et al. Bp-tree:A predictive B+-tree for reducing writes on phase change memory. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(10):2368-2381. DOI:10.1109/TKDE.2014.5.
[22] Hwang D, Kim W H, Won Y et al. Endurable transient inconsistency in byte-addressable persistent B+-tree. In Proc. the 16th USENIX Conference on File and Storage Technologies, February 2018, pp.187-200. DOI:10.5555/3189759.3189777.
[23] Silberschatz A, Korth H, Sudarshan S. Database Systems Concepts (5th edition). McGraw-Hill, 2005.
[24] Dulloo R S, Kumar S, Keshavamurthy A et al. System software for persistent memory. In Proc. the 9th European Conference on Computer Systems, April 2014, Article No. 15. DOI:10.1145/2592798.2592814.
[25] Volos1 H, Magalhaes G, Cherkasova L et al. Quartz:A lightweight performance emulator for persistent memory software. In Proc. the 16th Annual Middleware Conference, November 2015, pp.37-49. DOI:10.1145/2814576.2814806.
[26] Cooper B F, Silberstein A, Tam E et al. Benchmarking cloud serving systems with YCSB. In Proc. the 1st ACM Symposium on Cloud Computing, June 2010, pp.143-154. DOI:10.1145/1807128.1807152.
[1] Ze-Lin Zhao, Di Huang, and Xiao-Xing Ma. TOAST: Automated Testing of Object Transformers in Dynamic Software Updates [J]. Journal of Computer Science and Technology, 2022, 37(1): 50-66.
[2] Hong-Rong Ouyang, Heng-Feng Wei, Hai-Xiang Li, An-Qun Pan, and Yu Huang. Checking Causal Consistency of MongoDB [J]. Journal of Computer Science and Technology, 2022, 37(1): 128-146.
[3] Lie-Huang Zhu, Bao-Kun Zheng, Meng Shen, Feng Gao, Hong-Yu Li, Ke-Xin Shi. Data Security and Privacy in Bitcoin System: A Survey [J]. Journal of Computer Science and Technology, 2020, 35(4): 843-862.
[4] Xu-Ran Zhao, Xun Wang, Qi-Chao Chen. Temporally Consistent Depth Map Prediction Using Deep CNN and Spatial-temporal Conditional Random Field [J]. , 2017, 32(3): 443-456.
[5] Yue-Feng Du, De-Rong Shen, Tie-Zheng Nie, Yue Kou, Ge Yu. Content-Related Repairing of Inconsistencies in Distributed Data [J]. , 2016, 31(4): 741-758.
[6] Yuan-Chao Xu, Hu Wan, Ke-Ni Qiu, Tao Li, Wei-Gong Zhang. Reducing Synchronization Cost for Single-Level Store in Mobile Systems [J]. , 2016, 31(4): 836-848.
[7] Lengdong Wu, Liyan Yuan, Jiahuai You. Survey of Large-Scale Data Management Systems for Big Data Applications [J]. , 2015, 30(1): 163-183.
[8] Bo Wang, Ying-Fei Xiong, Zhen-Jiang Hu, Hai-Yan Zhao, Wei Zhang, and Hong Mei. Interactive Inconsistency Fixing in Feature Modeling [J]. , 2014, 29(4): 724-736.
[9] Bing-Qing Shao, Jun-Wei Zhang, Cai-Ping Zheng, Hao Zhang, Zhen-Jun Liu, Lu Xu . A Non-Forced-Write Atomic Commit Protocol for Cluster File Systems [J]. , 2014, 29(2): 303-315.
[10] Ke-Dian Mu (牟克典), Member, CCF, Weiru Liu, Zhi Jin (金芝), Senior Member, CCF, IEEE, Jun Hong, and David Bell. Managing Software Requirements Changes Based on Negotiation-Style Revision [J]. , 2011, 26(5): 890-907.
[11] Xin Wang, Student Member, CCF, Lin-Peng Huang, Senior Member, CCF, Yi Zhang, Xiao-Hui Xu, Student Member, CCF, and Jun-Qing Chen, Student Member, CCF. A Solution of Data Inconsistencies in Data Integration --- Designed for Pervasive Computing Environment [J]. , 2010, 25(3): 499-508.
[12] Zu-Song Li, Dan-Dan Huan, Wei-Wu Hu, and Zhi-Min Tang. Chip Multithreaded Consistency Model [J]. , 2008, 23(2): 298-ver .
[13] Yi-Song Wang, Ming-Yi Zhang, and Yu-Ping Shen. Consistency Property of Finite FC-Normal Logic Programs [J]. , 2007, 22(4): 554-561 .
[14] Franz Weitl and Burkhard Freitag. Checking Content Consistency of Integrated Web Documents [J]. , 2006, 21(3): 418-429 .
[15] Jian-Wan Ding, Li-Ping Chen, and Fan-Li Zhou. A Component-Based Debugging Approach for Detecting Structural Inconsistencies in Declarative Equation Based Models [J]. , 2006, 21(3): 450-458 .
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] Zhou Di;. A Recovery Technique for Distributed Communicating Process Systems[J]. , 1986, 1(2): 34 -43 .
[2] Chen Shihua;. On the Structure of Finite Automata of Which M Is an(Weak)Inverse with Delay τ[J]. , 1986, 1(2): 54 -59 .
[3] Wang Jianchao; Wei Daozheng;. An Effective Test Generation Algorithm for Combinational Circuits[J]. , 1986, 1(4): 1 -16 .
[4] Chen Zhaoxiong; Gao Qingshi;. A Substitution Based Model for the Implementation of PROLOG——The Design and Implementation of LPROLOG[J]. , 1986, 1(4): 17 -26 .
[5] Huang Heyan;. A Parallel Implementation Model of HPARLOG[J]. , 1986, 1(4): 27 -38 .
[6] Zheng Guoliang; Li Hui;. The Design and Implementation of the Syntax-Directed Editor Generator(SEG)[J]. , 1986, 1(4): 39 -48 .
[7] Huang Xuedong; Cai Lianhong; Fang Ditang; Chi Bianjin; Zhou Li; Jiang Li;. A Computer System for Chinese Character Speech Input[J]. , 1986, 1(4): 75 -83 .
[8] Xu Xiaoshu;. Simplification of Multivalued Sequential SULM Network by Using Cascade Decomposition[J]. , 1986, 1(4): 84 -95 .
[9] Tang Tonggao; Zhao Zhaokeng;. Stack Method in Program Semantics[J]. , 1987, 2(1): 51 -63 .
[10] Zhong Renbao; Xing Lin; Ren Zhaoyang;. An Interactive System SDI on Microcomputer[J]. , 1987, 2(1): 64 -71 .

ISSN 1000-9000(Print)

         1860-4749(Online)
CN 11-2296/TP

Home
Editorial Board
Author Guidelines
Subscription
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 
  Copyright ©2015 JCST, All Rights Reserved