Zhou Zhang, Pei-Quan Jin, Xiao-Liang Wang, Yan-Qi Lv, Shou-Hong Wan, Xi-Ke Xie. COLIN: A Cache-Conscious Dynamic Learned Index with High Read/Write Performance[J]. Journal of Computer Science and Technology, 2021, 36(4): 721-740. DOI: 10.1007/s11390-021-1348-2

COLIN: A Cache-Conscious Dynamic Learned Index with High Read/Write Performance

Funds: The work was supported by the National Natural Science Foundation of China under Grant No. 62072419 and the Huawei-USTC Joint Innovation Project on Fundamental System Software.
  • Author Bio:

    Zhou Zhang received his B.S. degree in computer science and technology from the University of Science and Technology of China, Hefei, in 2016. He is a Ph.D. candidate at the School of Computer Science and Technology, University of Science and Technology of China, Hefei. His current research interests include database indexing, non-volatile memory, and stream processing systems.

  • Corresponding author:

    Pei-Quan Jin E-mail: jpq@ustc.edu.cn

  • Received Date: January 31, 2021
  • Revised Date: June 12, 2021
  • Published Date: July 04, 2021
  • Abstract: The recently proposed learned index offers higher query performance and better space efficiency than the conventional B+-tree. However, the original learned index supports neither insertions nor bounded query complexity. Some variants of the learned index use an out-of-place strategy and a bottom-up build strategy to accelerate insertions and bound query complexity, but they introduce additional query costs and frequent node splitting operations. Moreover, none of the existing learned indices is cache-friendly. In this paper, aiming to support efficient queries and insertions while also offering bounded query complexity, we propose a new learned index called COLIN (Cache-cOnscious Learned INdex). Unlike previous solutions that use an out-of-place strategy, COLIN adopts an in-place approach to support insertions and reserves some empty slots in each node to optimize the node’s data placement. In particular, through model-based data placement and a cache-conscious data layout, COLIN decouples the local-search boundary from the maximum error of the model. Experimental results on five workloads and three datasets show that COLIN achieves the best read/write performance among all compared indices and outperforms the second-best index by 18.4%, 6.2%, and 32.9% on the three datasets, respectively.
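
The idea of in-place insertion into a node with reserved empty slots, guided by a model that predicts where a key belongs, can be illustrated with a small sketch. This is not COLIN's actual node layout or algorithm, which the abstract does not specify: the class name GappedNode, its methods, the integer linear model, and the shifting policy are all assumptions made for illustration, and the sketch does not attempt the cache-conscious layout or the bounded local search described above. It assumes a non-empty list of distinct, sorted integer keys.

```python
from typing import List, Optional


class GappedNode:
    """Toy node (hypothetical): a linear model maps a key to a slot in an
    array that keeps some slots empty so inserts can be done in place."""

    def __init__(self, sorted_keys: List[int], capacity: int):
        if not sorted_keys or len(sorted_keys) > capacity:
            raise ValueError("need 1..capacity distinct sorted keys")
        self.slots: List[Optional[int]] = [None] * capacity
        self.lo = sorted_keys[0]
        self.span = max(sorted_keys[-1] - self.lo, 1)
        nxt = 0
        for i, key in enumerate(sorted_keys):
            remaining = len(sorted_keys) - i
            # Model-based placement: use the predicted slot, but never go
            # backwards and always leave room for the keys still to come.
            pos = min(max(self.predict(key), nxt), capacity - remaining)
            self.slots[pos] = key
            nxt = pos + 1

    def predict(self, key: int) -> int:
        """Linear model in integer arithmetic, clamped to the slot range."""
        n = len(self.slots)
        return min(max((key - self.lo) * (n - 1) // self.span, 0), n - 1)

    def find(self, key: int) -> Optional[int]:
        """Start at the predicted slot and search locally in both directions."""
        n, p = len(self.slots), self.predict(key)
        i = p
        while i < n and (self.slots[i] is None or self.slots[i] < key):
            i += 1
        if i < n and self.slots[i] == key:
            return i
        i = p - 1
        while i >= 0 and (self.slots[i] is None or self.slots[i] > key):
            i -= 1
        return i if i >= 0 and self.slots[i] == key else None

    def _neighbours(self, key: int):
        """Slots of the nearest smaller (h) and nearest larger (j) stored key.
        A full scan keeps the sketch short; a real index would search locally."""
        h, j = -1, len(self.slots)
        for i, k in enumerate(self.slots):
            if k is None:
                continue
            if k == key:
                return i, i
            if k < key:
                h = i
            else:
                j = i
                break
        return h, j

    def insert(self, key: int) -> int:
        """Insert in place and return the slot used; sorted order is kept."""
        n = len(self.slots)
        h, j = self._neighbours(key)
        if h == j:                      # key already stored
            return h
        if j - h > 1:                   # a reserved empty slot is available
            pos = min(max(self.predict(key), h + 1), j - 1)
            self.slots[pos] = key
            return pos
        # No empty slot between the neighbours: shift towards the nearest gap.
        g = next((i for i in range(j, n) if self.slots[i] is None), None)
        if g is not None:               # shift slots[j:g] one step right
            self.slots[j + 1:g + 1] = self.slots[j:g]
            self.slots[j] = key
            return j
        g = next((i for i in range(h, -1, -1) if self.slots[i] is None), None)
        if g is not None:               # shift slots[g+1:h+1] one step left
            self.slots[g:h] = self.slots[g + 1:h + 1]
            self.slots[h] = key
            return h
        raise OverflowError("node full; a real index would split or rebuild")


node = GappedNode([10, 20, 30, 40], capacity=8)
node.insert(25)   # lands in a reserved empty slot, no shifting needed
node.insert(22)   # no free slot between 20 and 25, so 25 and 30 shift right
print(node.slots)
```

In this sketch the first insert costs nothing beyond the write itself because a gap is already waiting at the predicted position, while the second shows the fallback cost when the gaps near the prediction are used up.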