Indexing Techniques of Distributed Ordered Tables: A Survey and Analysis

    Abstract: Many NoSQL (Not Only SQL) databases have been proposed to store and query huge amounts of data. Some of them, such as BigTable, PNUTS, and HBase, can be modeled as distributed ordered tables (DOTs). Many additional indexing techniques have been proposed to support queries on non-key columns of DOTs. However, there has been no comprehensive analysis or comparison of these techniques, which makes it difficult for users to select or design a proper indexing technique for a given workload. This paper proposes a taxonomy based on six indexing issues to classify indexing techniques on DOTs and provides a comprehensive review of the state-of-the-art techniques. Based on the taxonomy, we propose a performance model named QSModel to estimate the query time and storage cost of these techniques, and we evaluate the model with experiments on a real-world workload from Tencent. The results show that the maximum error rates of the query time and storage cost are 24.2% and 9.8%, respectively. Furthermore, we present IndexComparator, an open-source project that implements representative indexing techniques. With the taxonomy, QSModel, and IndexComparator, users can select the best-fit indexing technique based on both theoretical analysis and practical experiments.
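The abstract does not describe any individual technique. The sketch below is therefore only a hypothetical illustration of why DOTs need extra indexing for non-key columns.

- A DOT keeps rows sorted by row key and range-partitioned into regions. Primary-key lookups and key-range scans are cheap, but a filter on a non-key column has to read every row.
- One common family of techniques, a global secondary index, stores a second ordered table. Its row key concatenates the column value and the primary key, which turns the non-key query into one range scan plus point lookups.

The class names, the `SEP` separator, and the split keys are all assumptions for illustration. They are not the paper's design, and they are not the API of BigTable, PNUTS, HBase, or IndexComparator.

```python
import bisect

SEP = "\x00"  # hypothetical separator; assumes column values never contain it


class OrderedTable:
    """Toy DOT: rows sorted by row key and range-partitioned into regions.

    The split keys are illustrative; a real DOT splits and moves regions
    across servers, which this single-process sketch does not model.
    """

    def __init__(self, split_keys=()):
        self.split_keys = sorted(split_keys)
        # One (sorted row keys, row key -> row) pair per region.
        self.regions = [([], {}) for _ in range(len(self.split_keys) + 1)]

    def _region(self, key):
        # Region i holds keys in [split_keys[i-1], split_keys[i]).
        return self.regions[bisect.bisect_right(self.split_keys, key)]

    def get(self, key):
        return self._region(key)[1].get(key)

    def put(self, key, row):
        keys, rows = self._region(key)
        if key not in rows:
            bisect.insort(keys, key)
        rows[key] = row

    def delete(self, key):
        keys, rows = self._region(key)
        if key in rows:
            del rows[key]
            del keys[bisect.bisect_left(keys, key)]

    def scan(self, start, stop):
        """Yield (key, row) for start <= key < stop in key order."""
        first = bisect.bisect_right(self.split_keys, start)
        for keys, rows in self.regions[first:]:
            for key in keys[bisect.bisect_left(keys, start):]:
                if key >= stop:
                    return
                yield key, rows[key]

    def __iter__(self):
        # Full-table scan across all regions, in key order.
        for keys, rows in self.regions:
            for key in keys:
                yield key, rows[key]


class IndexedTable:
    """Base table plus a global secondary index table on one non-key column.

    Index row key = column value + SEP + primary key, so all rows sharing a
    value are contiguous in the index and one range scan finds them.
    """

    def __init__(self, column, split_keys=(), index_split_keys=()):
        self.column = column
        self.base = OrderedTable(split_keys)
        self.index = OrderedTable(index_split_keys)

    def put(self, pk, row):
        old = self.base.get(pk)
        if old is not None:
            # Drop the stale index entry so updates do not leave ghosts.
            self.index.delete(old[self.column] + SEP + pk)
        self.base.put(pk, row)
        self.index.put(row[self.column] + SEP + pk, pk)

    def query(self, value):
        """Non-key query: index range scan, then point gets on the base table."""
        start = value + SEP
        stop = value + chr(ord(SEP) + 1)
        return [self.base.get(pk) for _, pk in self.index.scan(start, stop)]

    def query_by_full_scan(self, value):
        """Baseline without an index: read every row and filter."""
        return [row for _, row in self.base if row[self.column] == value]
```

For example, after `put("alice", {"city": "Shenzhen"})`, `query("Shenzhen")` reads one contiguous index range instead of every base row. The price is a second table to store and an extra write for every put, plus a delete when an indexed value changes. This is the kind of query time versus storage cost trade-off that the abstract says QSModel estimates.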
