计算机科学技术学报 ›› 2019,Vol. 34 ›› Issue (6): 1185-1202.doi: 10.1007/s11390-019-1969-x

所属专题: Data Management and Data Mining

• Data Management and Data Mining • 上一篇    下一篇

针对多子图模式匹配模型的有效框架

Jiu-Ru Gao1, Wei Chen1, Jia-Jie Xu1, Member, CCF, ACM, An Liu1, Member, CCF, ACM, Zhi-Xu Li1, Member, CCF, ACM, Hongzhi Yin2, Member, CCF, ACM, Lei Zhao1,*, Member, CCF, ACM   

  1. 1 School of Computer Science and Technology, Soochow University, Suzhou 215006, China;
    2 School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane 4072, Australia
  • 收稿日期:2018-12-14 修回日期:2019-09-12 出版日期:2019-11-16 发布日期:2019-11-16
  • 通讯作者: Lei Zhao E-mail:zhaol@suda.edu.cn
  • 作者简介:Jiu-Ru Gao received her B.S. degree in computer science and technology from Changshu Institute of Technology, Changshu, in 2016. She is currently a M.S. candidate in Soochow University, Suzhou. Her research interests include data mining, data analysis and graph computing.
  • 基金资助:
    This work is supported by the National Natural Science Foundation of China under Grant No. 61572335, and the Natural Science Foundation of Jiangsu Province of China under Grant No. BK20151223.

An Efficient Framework for Multiple Subgraph Pattern Matching Models

Jiu-Ru Gao1, Wei Chen1, Jia-Jie Xu1, Member, CCF, ACM, An Liu1, Member, CCF, ACM, Zhi-Xu Li1, Member, CCF, ACM, Hongzhi Yin2, Member, CCF, ACM, Lei Zhao1,*, Member, CCF, ACM   

  1. 1 School of Computer Science and Technology, Soochow University, Suzhou 215006, China;
    2 School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane 4072, Australia
  • Received:2018-12-14 Revised:2019-09-12 Online:2019-11-16 Published:2019-11-16
  • Contact: Lei Zhao E-mail:zhaol@suda.edu.cn
  • About author:Jiu-Ru Gao received her B.S. degree in computer science and technology from Changshu Institute of Technology, Changshu, in 2016. She is currently a M.S. candidate in Soochow University, Suzhou. Her research interests include data mining, data analysis and graph computing.
  • Supported by:
    This work is supported by the National Natural Science Foundation of China under Grant No. 61572335, and the Natural Science Foundation of Jiangsu Province of China under Grant No. BK20151223.

随着在云平台存储大规模图数据的普及,也激发了在远程云端进行子图模式匹配的出现。子图同构是子图模式匹配的传统模型,它的计算是一个NP-完全问题。并且由于子图同构太过于严格以致于在一些应用场景下找不到有用的匹配结果。同时,如何在不破坏匹配结果的情况下保护子图模式匹配过程中的图数据隐私是一个重要的问题。因此,本文提出了一个新的框架来实现云平台上的可保护数据隐私的子图模式匹配。为了保护数据图中的结构隐私,本文采用了基于k-自同构模型的方法;为了保护数据图和模式图中的标签隐私,本文使用基于cost的标签泛化方法实现顶点标签的匿名化。在生成k-自同构图的过程中,可能会有大量的噪声边或噪声点添加到原始数据图中,因此,本文仅上传外包图(Outsourced Graph)到云端并完成子图模式匹配过程。外包图是k-自同构图的一部分,通过这种方式可以大大提高子图模式匹配过程的效率。本文在多个真实数据集上进行了大量实验并证明了本文所提出框架的高效性。

关键词: 子图模式匹配, k-自同构, 标签泛化

Abstract: With the popularity of storing large data graph in cloud, the emergence of subgraph pattern matching on a remote cloud has been inspired. Typically, subgraph pattern matching is defined in terms of subgraph isomorphism, which is an NP-complete problem and sometimes too strict to find useful matches in certain applications. And how to protect the privacy of data graphs in subgraph pattern matching without undermining matching results is an important concern. Thus, we propose a novel framework to achieve the privacy-preserving subgraph pattern matching in cloud. In order to protect the structural privacy in data graphs, we firstly develop a k-automorphism model based method. Additionally, we use a cost-model based label generalization method to protect label privacy in both data graphs and pattern graphs. During the generation of the k-automorphic graph, a large number of noise edges or vertices might be introduced to the original data graph. Thus, we use the outsourced graph, which is only a subset of a k-automorphic graph, to answer the subgraph pattern matching. The efficiency of the pattern matching process can be greatly improved in this way. Extensive experiments on real-world datasets demonstrate the high efficiency of our framework.

Key words: subgraph pattern matching, k-automorphism, label generalization

[1] Lu H, Chang Y. Mining disease transmission networks from health insurance claims. In Proc. the 2017 International Conference on Smart Health, June 2017, pp.268-273.
[2] Ray B, Ghedin E, Chunara R. Network inference from multimodal data:A review of approaches from infectious disease transmission. Journal of Biomedical Informatics, 2016, 64:44-54.
[3] Balsa E, Pérez-Solà C, Díaz C. Towards inferring communication patterns in online social networks. ACM Trans. Internet Techn., 2017, 17(3):Article No. 32.
[4] Yin H, Zhou X, Cui B, Wang H, Zheng K, Hung N Q V. Adapting to user interest drift for POI recommendation. IEEE Trans. Knowl. Data Eng., 2016, 28(10):2566-2581.
[5] Yin H, Hu Z, Zhou X, Wang H, Zheng K, Hung N Q V, Sadiq S W. Discovering interpretable geo-social communities for user behavior prediction. In Proc. the 32nd IEEE International Conference on Data Engineering, May 2016, pp.942-953.
[6] Xie M, Yin H, Wang H, Xu F, Chen W, Wang S. Learning graph-based POI embedding for location-based recommendation. In Proc. the 25th ACM International Conference on Information and Knowledge Management, October 2016, pp.15-24.
[7] Yin H, Wang W, Wang H, Chen L, Zhou X. Spatial-aware hierarchical collaborative deep learning for POI recommendation. IEEE Trans. Knowl. Data Eng., 2017, 29(11):2537-2551.
[8] Aggarwal C C, Wang H. Managing and Mining Graph Data. Spring, 2010.
[9] Gallagher B. Matching structure and semantics:A survey on graph-based pattern matching. In Proc. the 2006 AAAI Fall Symposium on Capturing and Using Patterns for Evidence Detection, October 2006, pp.45-53.
[10] Henzinger M R, Henzinger T A, Kopke P W. Computing simulations on finite and infinite graphs. In Proc. the 36th Annual Symposium on Foundations of Computer Science, October 1995, pp.453-462.
[11] Fan W, Li J, Ma S, Tang N, Wu Y, Wu Y. Graph pattern matching:From intractable to polynomial time. Proceedings of the VLDB Endowment, 2010, 3(1):264-275.
[12] Brynielsson J, Högberg J, Kaati L, Mårtenson C, Svenson P. Detecting social positions using simulation. In Proc. the 2010 International Conference on Advances in Social Networks Analysis and Mining, August 2010, pp.48-55.
[13] Ma S, Cao Y, Fan W, Huai J, Wo T. Strong simulation:Capturing topology in graph pattern matching. ACM Trans. Database Syst., 2014, 39(1):Article No. 4.
[14] Fard A, Nisar M U, Ramaswamy L, Miller J A, Saltz M. A distributed vertex-centric approach for pattern matching in massive graphs. In Proc. the 2013 IEEE International Conference on Big Data, October 2013, pp.403-411.
[15] Fard A, Nisar M U, Miller J A, Ramaswamy L. Distributed and scalable graph pattern matching:Models and algorithms. International Journal of Big Data, 2014, 1(1):1-14.
[16] Fan W, Li J, Ma S, Tang N, Wu Y. Adding regular expressions to graph reachability and pattern queries. In Proc. the 27th International Conference on Data Engineering, April 2011, pp.39-50.
[17] Chang Z, Zou L, Li F. Privacy preserving subgraph matching on large graphs in cloud. In Proc. the 2016 International Conference on Management of Data, June 2016, pp.199-213.
[18] Zou L, Chen L, Özsu M T. K-automorphism:A general framework for privacy preserving network publication. Proceedings of the VLDB Endowment, 2009, 2(1):946-957.
[19] Tai C, Tseng P, Yu P S, Chen M. Identity protection in sequential releases of dynamic networks. IEEE Trans. Knowl. Data Eng., 2014, 26(3):635-651.
[20] Liu K, Terzi E. Towards identity anonymization on graphs. In Proc. the ACM SIGMOD International Conference on Management of Data, June 2008, pp.93-106.
[21] Zhou B, Pei J. Preserving privacy in social networks against neighborhood attacks. In Proc. the 24th International Conference on Data Engineering, April 2008, pp.506-515.
[22] Li J, Xiong J, Wang X. The structure and evolution of large cascades in online social networks. In Proc. the 4th International Conference on Computational Social Networks, August 2015, pp. 273-284.
[23] Hay M, Miklau G, Jensen D et al. Resisting structural reidentification in anonymized social networks. VLDB, 2010, 19(6):797-823.
[24] Cheng J, Fu A W, Liu J. K-isomorphism:Privacy preserving network publication against structural attacks. In Proc. the ACM SIGMOD International Conference on Management of Data, June 2010, pp. 459-470.
[25] Wu W, Xiao Y, Wang W et al. K-symmetry model for identity anonymization in social networks. In Proc. the 13th International Conference on Extending Database Technology, March 2010, pp.111-122.
[26] Liu K, Terzi E. Towards identity anonymization on graphs. In Proc. the ACM SIGMOD International Conference on Management of Data, June 2008, pp.93-106.
[27] Tai C H, Yu P S, Yang D H, Chen M S. Privacy preserving social network publication against friendship attacks. In Proc. the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2011, pp.1262-1270.
[28] Chen S, Zhou S. Recursive mechanism:Towards node differential privacy and unrestricted joins. In Proc. the ACM SIGMOD International Conference on Management of Data, June 2013, pp.653-664.
[29] Zhu T, Li G, Zhou W, Yu P S. Differentially private data publishing and analysis:A survey. IEEE Trans. Knowl. Data Eng., 2017, 29(8):1619-1638.
[30] Qian Q, Li Z, Zhao P et al. Publishing graph node strength histogram with edge differential privacy. In Proc. the 23rd International Conference Database Systems for Advanced Applications, May 2018, pp.75-91.
[31] Sala A, Zhao X, Wilson C et al. Sharing graphs using differentially private graph models. In Proc. the 11th ACM SIGCOMM Internet Measurement Conference, November 2011, pp.81-98.
[32] Chen R, Fung B C, Yu P S et al. Correlated network data publication via differential privacy. VLDB, 2014, 23(4):653-676.
[33] Zhang J, Cormode G, Procopiuc C M et al. Private release of graph statistics using ladder functions. In Proc. the 2015 ACM SIGMOD International Conference on Management of Data, May 2015, pp.731-745.
[34] Ullmann J R. An algorithm for subgraph isomorphism. J. ACM, 1976, 23(1):31-42.
[35] Yuan Y, Wang G, Chen L et al. Efficient subgraph similarity search on large probabilistic graph databases. VLDB, 2012, 5(9):800-811.
[36] Yuan Y, Wang G, Xu J Y et al. Efficient distributed subgraph similarity matching. VLDB, 2015, 24(3):369-394.
[37] Gao J, Xu J, Liu G et al. A privacy-preserving framework for subgraph pattern matching in cloud. In Proc. the 23rd International Conference on Database Systems for Advanced Applications, May 2018, pp.307-322.
[38] Karypis G, Kumar V. Analysis of multilevel graph partitioning. In Proc. the 1995 ACM/IEEE Conference on Supercomputing, December 1995.
[39] Karypis G, Kumar V. Multilevel k-way partitioning scheme for irregular graphs. Journal of Parallel and Distributed computing, 1998, 48(1):96-129.
No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 刘明业; 洪恩宇;. Some Covering Problems and Their Solutions in Automatic Logic Synthesis Systems[J]. , 1986, 1(2): 83 -92 .
[2] 陈世华;. On the Structure of (Weak) Inverses of an (Weakly) Invertible Finite Automaton[J]. , 1986, 1(3): 92 -100 .
[3] 高庆狮; 张祥; 杨树范; 陈树清;. Vector Computer 757[J]. , 1986, 1(3): 1 -14 .
[4] 陈肇雄; 高庆狮;. A Substitution Based Model for the Implementation of PROLOG——The Design and Implementation of LPROLOG[J]. , 1986, 1(4): 17 -26 .
[5] 黄河燕;. A Parallel Implementation Model of HPARLOG[J]. , 1986, 1(4): 27 -38 .
[6] 闵应骅; 韩智德;. A Built-in Test Pattern Generator[J]. , 1986, 1(4): 62 -74 .
[7] 唐同诰; 招兆铿;. Stack Method in Program Semantics[J]. , 1987, 2(1): 51 -63 .
[8] 闵应骅;. Easy Test Generation PLAs[J]. , 1987, 2(1): 72 -80 .
[9] 朱鸿;. Some Mathematical Properties of the Functional Programming Language FP[J]. , 1987, 2(3): 202 -216 .
[10] 李明慧;. CAD System of Microprogrammed Digital Systems[J]. , 1987, 2(3): 226 -235 .
版权所有 © 《计算机科学技术学报》编辑部
本系统由北京玛格泰克科技发展有限公司设计开发 技术支持:support@magtech.com.cn
总访问量: