兄弟关系的结构连接和楼梯连接算法

摘要: 可扩展标记语言XML已经成为Internet上数据表示和交换的事实标准，随着XML的流行，对XML数据的管理也日益成为一个非常重要而有意义的研究课题，这其中，XML查询处理又是这一领域最活跃的研究方向。XML数据经常建模为一种层次结构或者树型结构，为了对这种树型结构的数据进行查询，现在的主流XML查询语言中都大量使用了XPath的路径表达式在一个XML文档的树型结构中进行导航。在路径表达式中有一系列轴，指定了按照何种顺序和方向来访问数据，或者说指定了上下文结点和选择结点之间的结构关系。XPath中的结构关系主要有：祖先-后裔关系、双亲-孩子关系、之前-之后关系和左兄弟-右兄弟关系。XPath表达式的处理的核心就是如何有效地处理这些结构关系。在处理XPath查询时，各种结构关系的处理经常被转化为各种连接操作，主要是结构连接（structural join）和楼梯连接（staircase join）。XPath表达式中最常见的关系就是祖先-后裔关系和双亲-孩子关系，因而这两种关系在数据库界有深入研究，并取得了一系列重要成果，但是对于其他结构关系则鲜有研究。但是其它结构关系在一些复杂查询中经常用到，比如在有序查询中经常用到兄弟关系。本文研究了兄弟关系的处理问题。兄弟关系的处理直观上看似乎很简单，但是仔细分析可知事实并非如此。本文中我们首先对兄弟关系处理的误区进行了分析，虽然目前存在一些简单的处理方法，但是这些方法要么不能正确产生用户需要的结果，要么效率难以保证。在此基础上，我们提出了针对结构关系的优化思想：利用双亲的结构信息来过滤和减少对元素无意义的访问的思想，这种思想可以加速双亲-孩子和左兄弟-右兄弟的结构连接。其次，提出了两个高效的兄弟关系的结构连接算法，这两个算法具有较好的连接性能：不参与连接的结点可以事先判断出来并利用B+树跳过，连接的两个列表最多都只需扫描一次，而且连结结果按照文档序有序。再次，我们将这些算法扩充到楼梯连接，讨论了兄弟关系的楼梯连接算法，研究显示，兄弟轴的楼梯连接问题与结构连接问题相近，而且提出的算法也具有较高的性能。实验结果不仅显示了本文提出的针对兄弟轴的优化技术的有效性，而且验证了本文中所提出算法的高效性。本文是第一次对这一问题进行研究。

Abstract: The processing of XML queries can result in evaluationof various structural relationships. Efficient algorithms forevaluating ancestor-descendant and parent-child relationships have beenproposed. Whereas the problems of evaluatingpreceding-sibling-following-sibling and preceding-following relationshipsare still open. In this paper, we studied the structural join andstaircase join for sibling relationship. First, the idea of how tofilter out and minimize unnecessary reads of elements using parent'sstructural information is introduced, which can be used to acceleratestructural joins of parent-child and preceding-sibling-following-siblingrelationships. Second, two efficient structural join algorithms ofsibling relationship are proposed. These algorithms lead to optimaljoin performance: nodes that do not participate in the join can bejudged beforehand and then skipped using B^+-tree index. Besides,each element list joined is scanned sequentially once at most.Furthermore, output of join results is sorted in document order. Wealso discussed the staircase join algorithm for sibling axes. Studiesshow that, staircase join for sibling axes is close to the structuraljoin for sibling axes and shares the same characteristic of highefficiency. Our experimental results not only demonstrate theeffectiveness of our optimizing techniques for sibling axes, but alsovalidate the efficiency of our algorithms. As far as we know, this isthe first work addressing this problem specially.

兄弟关系的结构连接和楼梯连接算法

Structural Join and Staircase Join Algorithms of Sibling Relationship