通过分层标记图式回答增量更新图的可达性查询
Answering Reachability Queries on Incrementally Updated Graphs by Hierarchical Labeling Schema
-
摘要: 我们提出了一个新的方案称为分层的标籤模式Hierarchical Labeling Schema(HLS)来回答在有向图的可达性查询。不同於现有主要针对静态有向无环图(DAG)的许多方法,我们的模式侧重於可以添加顶点或弧的有向循环图(DCG)。与许多传统的方法不同,HLS在构建索引时并不要求图是无环的。因此,它实际上可以被应用到DAG和DCG当中。当顶点或圆弧加到一个图时,HLS能够增量地更新索引,而不是每次重新计算索引,使它在运作时比起许多其他的方法更有效。HLS的基本概念是为一个图的每个顶点建立一颗树,以及将多颗树链接一起,以致每当有两个顶点,我们马上就可以透过适当的树来知道他们之间是否有一条路径。我们已经在合成的数据集和真实的数据集进行了广泛的实验。在索引构建时间,查询处理时间和空间消耗叁方面, 我们与两种最先进的方法,分别是路径树方法(path-tree method)和叁跳法(3-hop method) ,进行了一个模拟图在增量更新时的测试和比较,除此之外,我们亦与不同算法在处理静态图时进行了性能上的比较。我们的研究结果显示,HLS在实践中是具高度竞争力的,尤其是当图是经常更新的情况下特别有用。
-
关键词:
- 图索引 /
- 可达性查询 /
- 分层标记模式(HLS) /
- 增量更新 /
- 有向有环图
Abstract: We proposed a novel solution schema called the Hierarchical Labeling Schema (HLS) to answer reachability queries in directed graphs. Different from many existing approaches that focus on static directed acyclic graphs (DAGs), our schema focuses on directed cyclic graphs (DCGs) where vertices or arcs could be added to a graph incrementally. Unlike many of the traditional approaches, HLS does not require the graph to be acyclic in constructing its index. Therefore, it could, in fact, be applied to both DAGs and DCGs. When vertices or arcs are added to a graph, the HLS is capable of updating the index incrementally instead of re-computing the index from the scratch each time, making it more efficient than many other approaches in the practice. The basic idea of HLS is to create a tree for each vertex in a graph and link the trees together so that whenever two vertices are given, we can immediately know whether there is a path between them by referring to the appropriate trees. We conducted extensive experiments on both real-world datasets and synthesized datasets. We compared the performance of HLS, in terms of index construction time, query processing time and space consumption, with two state-of-the-art methodologies, the path-tree method and the 3-hop method. We also conducted simulations to model the situation when a graph is updated incrementally. The performance comparison of different algorithms against HLS on static graphs has also been studied. Our results show that HLS is highly competitive in the practice and is particularly useful in the cases where the graphs are updated frequently. -
-
[1] Yan X, Yu P, Han J. Substructure similarity search in graph databases. In Proc. the 2005 ACM SIGMOD International Conference on Management of Data, June 2005, pp.766- 777.
[2] Golumbic M. Algorithm Graph Theory and Perfect Graphs (2nd edition). Amsterdam, The Netherlands: North- Holland, 2004.
[3] Tian Y, Hankins R, Patel J. Efficient aggregation for graph summarization. In Proc. the 2008 ACM SIGMOD Inter national Conference on Management of Data, June 2008, pp.567-580.
[4] Jin R, Xiang Y, Ruan N, Wang H. Efficiently answering reachability queries on very large directed graphs. In Proc. the 2008 ACM SIGMOD International Conference on Management of Data, June 2008, pp.595-608.
[5] Jin R, Xiang Y, Ruan N, Fuhry D. 3HOP: A highcompression indexing scheme for reachability query. In Proc. the 2009 SIGMOD International Conference on Management of Data, June 29-July 2, 2009, pp.813-826.
[6] Wu Y, Huang H, Hao Z, Chen F. Local community detection using link similarity. Journal of Computer Science and Technology, 2012, 27(6): 1261-1268.
[7] Zhang Q. Dynamic uncertain causality graph for knowledge representation and reasoning: Discrete DAG cases. Journal of Computer Science and Technology, 2012, 27(1): 1-23.
[8] Pechsiri C, Piriyakul R. Explanation knowledge graph construction through causality extraction from texts. Journal of Computer Science and Technology, 2010, 25(5): 1055- 1070.
[9] D'Ulizia A, Ferri F, Formica A, Grifoni P. Approximating geographical queries. Journal of Computer Science and Technology, 2009, 24(6): 1109-1124.
[10] Sakr S, Al-Naymat G. Efficient relational techniques for processing graph queries. Journal of Computer Science and Technology, 2010, 25(6): 1237-1255.
[11] Abed S, Mohamed O A, Al-Sammane G. An abstract reachability approach by combining HOL induction and multiway decision graphs. Journal of Computer Science and Technology, 2009, 24(1): 76-95.
[12] Jagadish H. A compression technique to materialize transitive closure. ACM Transactions on Database Systems (TODS), 1990, 15(4): 558-598.
[13] Chen Y, Chen Y. An efficient algorithm for answering graph reachability queries. In Proc. the 24th IEEE International Conference on Data Engineering, April 2008, pp.893-902.
[14] Wang H, He H, Yang J, Yu P, Yu J. Dual labeling: Answering graph reachability queries in constant time. In Proc. the 22nd IEEE International Conference on Data Engineering, April 2006, pp.75:1-75:12.
[15] Agrawal R, Borgida A, Jagadish H. Efficient management of transitive relationships in large data and knowledge bases. In Proc. the 1989 ACM SIGMOD International Conference on Management of Data, May 31-June 2, 1989, pp.253-262.
[16] Chen L, Gupta A, Kurul M. Stack-based algorithms for pattern matching on DAGs. In Proc. the 31st International Conference on Very Large Data Bases, August 30- September 2, 2005, pp.493-504.
[17] Dey S K, Jamil H. A hierarchical approach to reachability query answering in very large graph databases. In Proc. the 19th ACM International Conference on Information and Knowledge Management, Oct. 2010, pp.1377-1380.
[18] Trißl S, Leser U. Fast and practical indexing and querying of very large graphs. In Proc. the 2007 ACM SIGMOD International Conference on Management of Data, June 2007, pp.845-856.
[19] Cohen E, Halperin E, Kaplan H, Zwick U. Reachability and distance queries via 2-hop labels. In Proc. the 13th Annual ACM-SIAM Symposium on Discrete Algorithms, January 2002, pp.937-946.
[20] Schenkel R, Theobald A, Weikum G. HOPI: An efficient connection index for complex XML document collections. In Proc. the 9th Advances in Database Technology-EDBT 2004, March 2004, pp.237-255.
[21] Cheng J, Yu J, Lin X, Wang H, Yu P. Fast computation of reachability labeling for large graphs. In Proc. the 10th Advances in Database Technology-EDBT 2006, March 2006, pp.961-979.
[22] Cheng J, Yu J, Lin X,Wang H, Yu P. Fast computing reachability labelings for large graphs with high compression rate. In Proc. the 11th International Conference on Extending Database Technology: Advances in Database Technology, March 2008, pp.193-204.
[23] Demetrescu C, Italiano G F. Fully dynamic all pairs shortest paths with real edge weights. Journal of Computer and System Sciences, 2006, 72(5): 813-837.
[24] Henzinger M R, King V. Fully dynamic biconnectivity and transitive closure. In Proc. the 36th Annual Symposium on Foundations of Computer Science, October 1995, pp.664- 672.
[25] Roditty L. Decremental maintenance of strongly connected components. In Proc. the 24th Annual ACM-SIAM Symposium on Discrete Algorithms, January 2013, pp.1143-1150.
[26] Roditty L, Zwick U. A fully dynamic reachability algorithm for directed graphs with an almost linear update time. In Proc. the 36th Annual ACM Symposium on Theory of Computing, June 2004, pp.184-191.
[27] Bramandia R, Choi B, Ng W K. Incremental maintenance of 2-hop labeling of large graphs. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(5): 682-698.
[28] Schenkel R, Theobald A, Weikum G. Efficient creation and incremental maintenance of the HOPI index for complex XML document collections. In Proc. the 21st IEEE International Conference on Data Engineering, April 2005, pp.360-371.
[29] Jin R, Ruan N, Xiang Y, Wang H. Path-tree: An efficient reachability indexing scheme for large directed graphs. ACM Transactions on Database Systems (TODS), 2011,36(1): 7:1-7:44.
[30] Yildirim H, Chaoji V, Zaki M J. DAGGER: A scalable index for reachability queries in large dynamic graphs. http://arxiv.org/abs/1301.0977, Jan. 2016.
[31] Zhu A D, Lin W, Wang S, Xiao X. Reachability queries on large dynamic graphs: A total order approach. In Proc. the 2014 ACM SIGMOD International Conference on Management of Data, June 2014, pp.1323-1334.
[32] Krommidas I, Zaroliagis C. An experimental study of algorithms for fully dynamic transitive closure. Journal of Experimental Algorithmics (JEA), 2008, 12: 1.6:1-1.6:22.
[33] Cheng J, Huang S, Wu H, Fu A W C. TF-label: A topological-folding labeling scheme for reachability querying in a large graph. In Proc. the 2013 ACM SIGMOD International Conference on Management of Data, June 2013, pp.193-204.
[34] Jin R, Wang G. Simple, fast, and scalable reachability oracle. Proc. the VLDB Endowment, 2013, 6(14): 1978-1989.
[35] Yano Y, Akiba T, Iwata Y, Yoshida Y. Fast and scalable reachability queries on graphs by pruned labeling with landmarks and paths. In Proc. the 22nd ACM International Conference on Information and Knowledge Management, October 27-November 1,2013, pp.1601-1606.
[36] Y?d?r?m H, Chaoji V, Zaki M J. GRAIL: A scalable index for reachability queries in very large graphs. The VLDB Journal, 2012, 21(4): 509-534.
[37] Johnsonbaugh R, Kalin M. A graph generation software package. ACM SIGCSE Bulletin, 1991,23(1): 151-154.
-
期刊类型引用(4)
1. Zirui Liu, Hailin Zhang, Boxuan Chen, et al. CAFE+: Towards Compact, Adaptive, and Fast Embedding for Large-scale Online Recommendation Models. ACM Transactions on Information Systems, 2025. 必应学术
2. Hailin Zhang, Zirui Liu, Boxuan Chen, et al. CAFE: Towards Compact, Adaptive, and Fast Embedding for Large-scale Recommendation Models. Proceedings of the ACM on Management of Data, 2024, 2(1): 1. 必应学术
3. Dongdong Yan, Xiaofang Li, Qin Yang, et al. Key Technologies of Government Data Heterogeneity and Interoperation: A Survey. 2024 IEEE 7th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), 必应学术
4. Bo Zhou, Yuwei Bao, Shuai Zhu. Information Extraction Algorithms and Semantic Understanding in Natural Language Processing. 2024 First International Conference on Software, Systems and Information Technology (SSITCON), 必应学术
其他类型引用(0)
计量
- 文章访问数: 26
- HTML全文浏览量: 0
- PDF下载量: 960
- 被引次数: 4