Journal of Computer Science and Technology
Indexed by SCIE, EI ...
Bimonthly, since 1986
Journal of Computer Science and Technology, 2017, Vol. 32, Issue 1: 78-92. DOI: 10.1007/s11390-017-1707-1
Data Management and Data Mining
Efficient Processing of Distributed Twig Queries Based on Node Distribution
Xin Bi, Xiang-Guo Zhao*, Member, CCF, and Guo-Ren Wang, Member, CCF, ACM, IEEE
College of Computer Science and Engineering, Northeastern University, Shenyang 110815, China

Abstract: Massive XML data are increasingly generated for the representation, storage and exchange of web information, and twig query processing over massive XML data has become a research focus. However, most traditional algorithms cannot be directly implemented in a distributed manner. Some existing distributed algorithms generate many useless intermediate results and, in most cases, must execute many join operations over partial results; others require a priori knowledge of the query patterns before XML partitioning, storage and query processing, which is impractical for large-scale data or when new queries arrive frequently. To improve efficiency and scalability, in this paper we propose a 3-phase distributed algorithm, DisT3, based on a node distribution mechanism that avoids unnecessary intermediate results. Furthermore, we propose a lightweight local index, ReP, together with an enhanced XML partitioning approach that uses an arbitrary partitioning strategy, and based on ReP we propose an improved 2-phase distributed algorithm, DisT2ReP, that further reduces the communication cost. We analyze the performance guarantees of the proposed algorithms and conduct extensive experiments to verify their efficiency and scalability in distributed twig query applications.
Keywords: XML; twig query; distributed computing; node distribution
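A twig query is a tree-shaped pattern whose edges are parent-child (`/`) or ancestor-descendant (`//`) relationships. The minimal, centralized Python sketch below only illustrates what matching such a pattern means. It is not DisT3, DisT2ReP or the ReP index, and it performs no partitioning or distribution. The `QNode` class and the functions `candidates`, `matches` and `twig_roots` are hypothetical names introduced for illustration. The sketch also makes a simplifying assumption: it returns only the document elements to which the query root can be mapped, rather than enumerating complete match tuples.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class QNode:
    """One node of a twig pattern (hypothetical representation).

    axis is the edge from the parent query node: "/" = parent-child,
    "//" = ancestor-descendant. It is ignored for the query root.
    """
    tag: str
    axis: str = "/"
    children: list[QNode] = field(default_factory=list)


def candidates(elem: ET.Element, axis: str) -> list[ET.Element]:
    """Document elements reachable from elem along the given axis."""
    if axis == "/":
        return list(elem)  # direct children only
    if axis == "//":
        # iter() also yields elem itself, so skip it to get proper descendants
        return [d for d in elem.iter() if d is not elem]
    raise ValueError(f"unknown axis: {axis!r}")


def matches(elem: ET.Element, q: QNode) -> bool:
    """True if the twig rooted at q can be embedded with q mapped to elem."""
    if elem.tag != q.tag:
        return False
    # every query child must be matched by some candidate along its axis
    return all(
        any(matches(c, qc) for c in candidates(elem, qc.axis))
        for qc in q.children
    )


def twig_roots(doc: ET.Element, q: QNode) -> list[ET.Element]:
    """All document elements to which the query root can be mapped."""
    return [e for e in doc.iter() if matches(e, q)]


doc = ET.fromstring(
    "<lib>"
    "<book><title>XML</title><author><name>A</name></author></book>"
    "<book><title>DB</title></book>"
    "<journal><author><name>B</name></author></journal>"
    "</lib>"
)
# XPath //book[title][.//name]: a book with a child title and a descendant name
q = QNode("book", children=[QNode("title", "/"), QNode("name", "//")])
print(len(twig_roots(doc, q)))  # prints 1
```

This brute-force recursion can re-examine the same subtrees many times, so it only pins down the matching semantics and does not suggest an efficient or distributed evaluation strategy.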
Received 2016-02-29;
Funding:

This work is supported in part by the National Natural Science Foundation of China under Grant Nos. 61272181, 61672145, 61572121 and U1401256.

Corresponding author: Xiang-Guo Zhao, Email: zhaoxiangguo@mail.neu.edu.cn
About the author: Xin Bi received his M.S. degree in computer science from Northeastern University, Shenyang, in 2011. He is currently a Ph.D. candidate at Northeastern University, Shenyang. His main research interests include XML data management and distributed database systems.
Cite this article:
Xin Bi, Xiang-Guo Zhao, Guo-Ren Wang. Efficient Processing of Distributed Twig Queries Based on Node Distribution [J]. Journal of Computer Science and Technology, 2017, 32(1): 78-92
Link to this article:
http://jcst.ict.ac.cn:8080/jcst/CN/10.1007/s11390-017-1707-1
Copyright 2010 by Journal of Computer Science and Technology