预取J<sup>+</sup>-Tree：一个缓存优化的主存数据库索引结构

栾华; 杜小勇; 王珊

预取J⁺-Tree：一个缓存优化的主存数据库索引结构

Prefetching J⁺-Tree: A Cache-Optimized Main Memory Database Index Structure

摘要

摘要: 1．本文的创新点
提出一种主存数据库的索引结构J⁺-tree，并给出在J⁺-tree上进行单值查询、范围查询、插入和删除操作的算法。提出一种具有高效范围查询算法的索引结构pJ⁺-tree，并给出在pJ⁺-tree上进行操作的方法。J⁺-tree和pJ⁺-tree的高度不会随着关键字数目的增多而变高，关键字本身的大小决定了树的高度，对于32位的关键字，仅用五层就可以存储所有可能的值，且每一层上比较次数和缺失次数都比较少，因此可以提供快速的单值查询，同时单值查询也是插入和删除操作的重要组成部分，所以插入和删除操作也能快速完成。另外，pJ⁺-tree使用软件预取技术进一步提高了范围查询的性能。
2．实现方法
J⁺-tree索引主要由两部分组成，上层是一个Judy数字树，下层是用双向指针链接起来的叶子结点。所有的关键字都按顺序存储在叶子结点中，每个叶子结点的最小关键字作为这个叶子结点的引用关键字存储到上层的Judy数据结构中。引用关键字被分解成多个字节，分层存储在Judy树中，Judy树的每一层至少可以存放关键字的一个字节。PJ⁺-tree基于J⁺-tree索引，由上下两部分组成，上层结构与J⁺-tree一致，下层的叶子结点除了包含指向左右兄弟结点的指针之外，增加了一个预取指针，指向要预取的叶子结点，通过这个指针可以很容易的得到要预取的结点的地址。进行范围查询的时候，在处理当前结点的同时发出对后面的结点的预取指令，这样多个结点的读取操作、结点的读取和处理操作可以重叠起来，加快查询速度。
3．结论及未来待解决的问题
（1）PJ⁺-tree在单值查询、范围查询、更新操作和内存空间各方面都能够超过B⁺-tree、pB⁺-tree、CSB⁺-tree和T-tree。
（2）PJ⁺-tree在范围查询上比J⁺-tree快2.0倍，在其他操作上表现类似。
（3）与B⁺-tree、pB⁺-tree和CSB⁺-tree相比，pJ⁺-tree和J⁺-tree在计算时间、内存相关延迟和其他延迟方面表现的都要好一些。与T-tree相比，pJ⁺-tree和J⁺-tree具有很小的内存相关延迟，这是pJ⁺-tree和J⁺-tree取得高性能的主要原因。
（4）所有索引的有用计算时间都很少，大部分时间都花在内存相关的延迟上，尤其是L2缓存缺失。
在未来工作中，需要研究pJ⁺-tree和J⁺-tree索引的并发控制方法。根据索引的特征，设计高性能的并发控制技术，将对索引的查询和更新操作的影响降低到最少。
4．实用价值或应用前景
长期以来，数据库领域的主要研究方向是磁盘数据库。随着硬件技术的快速发展，内存的容量越来越大，价格也越来越便宜，越来越多的数据可以放到内存中；同时磁盘数据库一直在研究减少I/O次数的技术，如异步IO、磁盘阵列、压缩技术等，使得I/O在很多情况下已经不是主要的或者是唯一的系统瓶颈，CPU对内存的访问延迟越来越重要，主存数据库越来越多的受到研究者以及工业界的关注。高性能是主存数据库的一大优势，索引技术作为主存数据库的重要组成部分，对主存数据库的性能有很大影响，对能否很好的满足应用需求起到重要作用，因此本文提出的主存数据库索引结构具有较强的实用价值和应用前景。

Abstract: As the speed gap between main memory and modern processors continues to widen, the cache behavior becomes more important for main memory database systems (MMDBs). Indexing technique is a key component of MMDBs. Unfortunately, the predominant indexes --- B⁺-trees and T-trees --- have been shown to utilize cache poorly, which triggers the development of many cache-conscious indexes, such as CSB⁺-trees and pB⁺-trees. Most of these cache-conscious indexes are variants of conventional B⁺-trees, and have better cache performance than B⁺-trees. In this paper, we develop a novel J⁺-tree index, inspired by the Judy structure which is an associative array data structure, and propose a more cache-optimized index --- Prefetching J⁺-tree (pJ⁺-tree), which applies prefetching to J⁺-tree to accelerate range scan operations. The J⁺-tree stores all the keys in its leaf nodes and keeps the reference values of leaf nodes in a Judy structure, which makes J⁺-tree not only hold the advantages of Judy (such as fast single value search) but also outperform it in other aspects. For example, J⁺-trees can achieve better performance on range queries than Judy. The pJ⁺-tree index exploits prefetching techniques to further improve the cache behavior of J⁺-trees and yields a speedup of 2.0 on range scans. Compared with B⁺-trees, CSB⁺-trees, pB⁺-trees and T-trees, our extensive experimental study shows that pJ⁺-trees can provide better performance on both time (search, scan, update) and space aspects.

HTML全文

参考文献()

施引文献

资源附件()

预取J+-Tree：一个缓存优化的主存数据库索引结构

Prefetching J+-Tree: A Cache-Optimized Main Memory Database Index Structure

预取J⁺-Tree：一个缓存优化的主存数据库索引结构

Prefetching J⁺-Tree: A Cache-Optimized Main Memory Database Index Structure