Journal of Computer Science and Technology
Journal of Computer Science and Technology, 2018, Vol. 33, Issue 1: 169-189. DOI: 10.1007/s11390-018-1813-8
Survey
Indexing Techniques of Distributed Ordered Tables: A Survey and Analysis
Chen Feng (1,2), Student Member, CCF; Chun-Dian Li (1,2), Student Member, CCF; Rui Li (3)
1 State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
2 University of Chinese Academy of Sciences, Beijing 100049, China;
3 Tencent Inc., Beijing 100080, China

Abstract: Many NoSQL (Not Only SQL) databases have been proposed to store and query huge amounts of data. Some of them, such as BigTable, PNUTS, and HBase, can be modeled as distributed ordered tables (DOTs). Many additional indexing techniques have been presented to support queries on non-key columns of DOTs. However, there has been no comprehensive analysis or comparison of these techniques, which makes it difficult for users to select or design a proper indexing technique for a given workload. This paper proposes a taxonomy based on six indexing issues to classify indexing techniques on DOTs and provides a comprehensive review of the state-of-the-art techniques. Based on the taxonomy, we propose a performance model named QSModel to estimate the query time and storage cost of these techniques, and we evaluate the model with experiments on a practical workload from Tencent. The results show that the maximum error rates of the query time and storage cost are 24.2% and 9.8%, respectively. Furthermore, we propose IndexComparator, an open source project that implements representative indexing techniques. Users can therefore select the best-fit indexing technique based on both theoretical analysis and practical experiments.
Keywords: database; Not Only SQL (NoSQL); range query; indexing
Received: 2017-03-27
Funding:

This work is partially supported by the Strategic Priority Program of the Chinese Academy of Sciences under Grant No. XDB02040009, the Key Program of the National Natural Science Foundation of China under Grant No. 61532016, the Key Program of Cloud Computing and Big Data of the Ministry of Science and Technology of China under Grant No. 2016YFB1000200, and Tencent Inc.

About the author: Chen Feng is a Ph.D. candidate at the Institute of Computing Technology, Chinese Academy of Sciences, Beijing. He received his B.S. degree in software engineering from Nankai University, Tianjin, in 2011. His current research interests include big data computing and distributed systems. He is a student member of CCF.
Cite this article:
Chen Feng, Chun-Dian Li, Rui Li. Indexing Techniques of Distributed Ordered Tables: A Survey and Analysis [J]. Journal of Computer Science and Technology, 2018, 33(1): 169-189.
Link to this article:
http://jcst.ict.ac.cn:8080/jcst/CN/10.1007/s11390-018-1813-8