Journal of Computer Science and Technology ›› 2021, Vol. 36 ›› Issue (6): 1420-1430. DOI: 10.1007/s11390-020-0142-x

Special Section: Artificial Intelligence and Pattern Recognition

Pre-Train and Learn: Preserving Global Information for Graph Neural Networks

Dan-Hao Zhu1,2, Xin-Yu Dai2,*, Member, CCF, and Jia-Jun Chen2        

  1 Library, Jiangsu Police Institute, Nanjing 210031, China;
    2 Department of Computer Science and Technology, Nanjing University, Nanjing 210093, China
  • Received: 2019-10-30 Revised: 2020-10-09 Online: 2021-11-30 Published: 2021-12-01
  • Contact: Xin-Yu Dai E-mail: daixinyu@nju.edu.cn
  • About the Author: Dan-Hao Zhu received his B.S. degree in management from Hohai University, Changzhou, in 2008, his M.S. degree in management from Nanjing University, Nanjing, in 2011, and his Ph.D. degree in computer science from Nanjing University, Nanjing, in 2019. He is currently a librarian at Jiangsu Police Institute, Nanjing. His research interests include network representation, graph learning, and natural language processing.
  • Supported by:
    This work was partially supported by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 18KJB510010, the Social Science Foundation of Jiangsu Province of China under Grant No. 19TQD002, and the National Natural Science Foundation of China under Grant No. 61976114.

1. Context.
In recent years, graph neural networks (GNNs) have been at the forefront of graph learning, and related studies abound. However, common GNNs can generally afford only about two layers of depth, because stacking more layers causes the over-smoothing problem; as a result, they can exploit only local information within roughly two hops of a node.
2. Objective.
This study proposes a framework into which different types of GNN kernels can be flexibly embedded, enabling them to acquire and exploit global information and thereby improving classification performance.
3. Method.
First, the global structure features and global attribute features of each node are obtained by random-walk-based unsupervised learning (a minimal sketch of this pre-training step follows this structured abstract). Second, within a parallel architecture, three GNN models extract high-level features from the raw attributes and the pre-trained features, respectively. Finally, the different high-level features are mixed with learned weights and used for classification.
4. Results & Findings.
Empirical experiments on four datasets show that our framework clearly improves the performance of all the methods. In particular, we obtain new state-of-the-art results on two standard datasets: Cora (84.31%) and Pubmed (80.95%).
5. Conclusions.
This paper proposes a framework that equips general GNN methods with the ability to exploit global information, and the empirical results demonstrate its effectiveness. In the future, we will study schemes tailored to specific GNN methods to further improve their use of global information.
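
The pre-training step above is described only as random-walk-based unsupervised learning, so the following Python sketch shows one plausible reading: DeepWalk-style uniform walks fed to a skip-gram model, yielding a global structure vector per node. The helper name random_walks and all hyperparameters are illustrative assumptions, not the paper's exact settings; the global attribute features could be pre-trained analogously, e.g., by predicting the attributes of nodes that co-occur on walks.

    import random
    import networkx as nx
    from gensim.models import Word2Vec

    def random_walks(G, num_walks=10, walk_length=40, seed=0):
        # DeepWalk-style uniform random walks, returned as "sentences" of node ids.
        rng = random.Random(seed)
        walks, nodes = [], list(G.nodes())
        for _ in range(num_walks):
            rng.shuffle(nodes)
            for start in nodes:
                walk = [start]
                while len(walk) < walk_length:
                    nbrs = list(G.neighbors(walk[-1]))
                    if not nbrs:
                        break
                    walk.append(rng.choice(nbrs))
                walks.append([str(n) for n in walk])  # Word2Vec expects string tokens
        return walks

    # Toy graph standing in for a real dataset graph such as Cora.
    G = nx.karate_club_graph()

    # Skip-gram over the walks: each node gets a vector encoding its global
    # structural context, far beyond the ~2-hop receptive field of a shallow GNN.
    model = Word2Vec(random_walks(G), vector_size=128, window=5,
                     min_count=0, sg=1, epochs=5)
    global_structure = {n: model.wv[str(n)] for n in G.nodes()}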

Key words: graph neural network, network representation, representation learning

Abstract: Graph neural networks (GNNs) have shown great power in learning on graphs. However, it is still a challenge for GNNs to model information located far away from the source node. The ability to preserve global information can enhance graph representation and hence improve classification precision. In this paper, we propose a new learning framework named G-GNN (Global information for GNN) to address the challenge. First, the global structure and global attribute features of each node are obtained via unsupervised pre-training; these global features preserve the global information associated with the node. Then, using the pre-trained global features and the raw attributes of the graph, a set of parallel kernel GNNs is used to learn different aspects of these heterogeneous features. Any general GNN can be used as a kernel and easily gains the ability to preserve global information without having to alter its own algorithm. Extensive experiments have shown that state-of-the-art models, e.g., GCN, GAT, GraphSAGE and APPNP, can achieve improvement with G-GNN on three standard evaluation datasets. In particular, we establish new benchmark precision records on Cora (84.31%) and Pubmed (80.95%) when learning on attributed graphs.
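
To make the parallel architecture concrete, here is a minimal PyTorch Geometric sketch, assuming two-layer GCN kernels and one learned scalar weight per feature view; the class name GGNNSketch and the softmax fusion of logits are assumptions rather than the paper's verified design.

    import torch
    import torch.nn.functional as F
    from torch_geometric.nn import GCNConv

    class GGNNSketch(torch.nn.Module):
        # One GNN kernel per input view (raw attributes, pre-trained global
        # structure features, pre-trained global attribute features), with
        # the kernel outputs fused by learned weights.
        def __init__(self, dims, hidden, num_classes):
            super().__init__()
            # A two-layer GCN kernel per view; any general GNN could be swapped in.
            self.kernels = torch.nn.ModuleList(
                torch.nn.ModuleList([GCNConv(d, hidden), GCNConv(hidden, num_classes)])
                for d in dims)
            self.mix = torch.nn.Parameter(torch.ones(len(dims)))  # fusion weights

        def forward(self, views, edge_index):
            # views[i] is an [N, dims[i]] feature matrix over the same node set.
            outs = []
            for (conv1, conv2), x in zip(self.kernels, views):
                h = F.relu(conv1(x, edge_index))
                outs.append(conv2(h, edge_index))
            w = torch.softmax(self.mix, dim=0)
            return sum(wi * oi for wi, oi in zip(w, outs))  # weighted sum of logits

In use, views would hold the raw attribute matrix and the two pre-trained global feature matrices, e.g. model([x_raw, x_struct, x_attr], edge_index), so that each kernel learns a different aspect of the heterogeneous features before the weighted fusion.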

Key words: graph neural network, network embedding, representation learning, global information pre-training

[1] Kipf T N, Welling M. Semi-supervised classification with graph convolutional networks. arXiv:1609.02907, 2016. https://arxiv.org/abs/1609.02907, September 2020.
[2] Hamilton W, Ying Z, Leskovec J. Inductive representation learning on large graphs. In Proc. the 31st International Conference on Neural Information Processing Systems, December 2017, pp.1024-1034.
[3] Velickovic P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y. Graph attention networks. arXiv:1710.10903, 2017. https://arxiv.org/abs/1710.10903, October 2020.
[4] Li Q, Han Z, Wu X M. Deeper insights into graph convolutional networks for semi-supervised learning. In Proc. the 32nd AAAI Conference on Artificial Intelligence, February 2018, pp.3538-3545.
[5] Abu-El-Haija S, Kapoor A, Perozzi B, Lee J. N-GCN: Multi-scale graph convolution for semi-supervised node classification. arXiv:1802.08888, 2018. https://arxiv.org/abs/1802.08888, August 2020.
[6] Klicpera J, Bojchevski A, Günnemann S. Predict then propagate: Graph neural networks meet personalized PageRank. arXiv preprint arXiv:1810.05997, 2018. https://arxiv.org/abs/1810.05997, October 2020.
[7] Perozzi B, Al-Rfou R, Skiena S. DeepWalk: Online learning of social representations. In Proc. the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2014, pp.701-710. DOI: 10.1145/2623330.2623732.
[8] Grover A, Leskovec J. node2vec: Scalable feature learning for networks. In Proc. the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2016, pp.855-864. DOI: 10.1145/2939672.2939754.
[9] Albert R, Jeong H, Barabasi A L. Internet: Diameter of the world-wide web. Nature, 1999, 401(6749): 130-131. DOI: 10.1038/43601.
[10] Broder A, Kumar R, Maghoul F, Raghavan P, Rajagopalan S, Stata R, Tomkins A, Wiener J. Graph structure in the web. Computer Networks, 2000, 33(1-6): 309-320. DOI: 10.1016/S1389-1286(00)00083-9.
[11] Mikolov T, Sutskever I, Chen K, Corrado G, Dean J. Distributed representations of words and phrases and their compositionality. In Proc. the 26th International Conference on Neural Information Processing Systems, December 2013, pp.3111-3119.
[12] Zhu D, Dai X Y, Yang K, Chen J, He Y. PCANE: Preserving context attributes for network embedding. In Proc. the 23rd Pacific-Asia Conference on Knowledge Discovery and Data Mining, April 2019, pp.156-168. DOI: 10.1007/978-3-030-16142-2_13.
[13] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser L, Polosukhin I. Attention is all you need. In Proc. the 31st International Conference on Neural Information Processing Systems, December 2017, pp.5998-6008.
[14] Bengio Y, Lamblin P, Popovici D, Larochelle H. Greedy layer-wise training of deep networks. In Proc. the 19th International Conference on Neural Information Processing Systems, December 2006, pp.153-160.
[15] Lin T Y, Goyal P, Girshick R, He K, Dollar P. Focal loss for dense object detection. In Proc. the 2017 IEEE International Conference on Computer Vision, October 2017, pp.2980-2988. DOI: 10.1109/ICCV.2017.324.
[16] Sen P, Namata G, Bilgic M et al. Collective classification in network data. AI Magazine, 2008, 29(3): 93-106. DOI: 10.1609/aimag.v29i3.2157.
[17] Yang Z, Cohen W W, Salakhutdinov R. Revisiting semi-supervised learning with graph embeddings. arXiv:1603.08861, 2016. https://arxiv.org/abs/1603.08861, March 2021.
[18] Bojchevski A, Günnemann S. Deep Gaussian embedding of graphs: Unsupervised inductive learning via ranking. arXiv:1707.03815, 2017. https://arxiv.org/abs/1707.03815, March 2021.
[19] Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 2014, 15(1): 1929-1958.
[20] Kingma D P, Ba J. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014. https://arxiv.org/abs/1412.6980, December 2020.
[21] Gao Y, Yang H, Zhang P, Zhou C, Hu Y. GraphNAS: Graph neural architecture search with reinforcement learning. arXiv:1904.09981, 2019. https://arxiv.org/abs/1904.09981, April 2021.
[22] Abu-El-Haija S, Perozzi B, Kapoor A, Harutyunyan H, Alipourfard N, Lerman K, Ver Steeg G, Galstyan A. MixHop: Higher-order graph convolution architectures via sparsified neighborhood mixing. arXiv:1905.00067, 2019. https://arxiv.org/abs/1905.00067, May 2021.
[23] Tu C, Zhang W, Liu Z, Sun M. Max-margin DeepWalk: Discriminative learning of network representation. In Proc. the 25th International Joint Conference on Artificial Intelligence, July 2016, pp.3889-3895.
[24] Chen W, Mao X, Li X, Zhang Y, Li X. PNE: Label embedding enhanced network embedding. In Proc. the 21st Pacific-Asia Conference on Knowledge Discovery and Data Mining, May 2017, pp.547-560. DOI: 10.1007/978-3-319-57454-7_43.
[25] Zhu X, Ghahramani Z, Lafferty J D. Semi-supervised learning using Gaussian fields and harmonic functions. In Proc. the 20th International Conference on Machine Learning, August 2003, pp.912-919.
[26] Tang J, Qu M, Wang M, Zhang M, Yan J, Mei Q. LINE: Large-scale information network embedding. In Proc. the 24th International Conference on World Wide Web, May 2015, pp.1067-1077. DOI: 10.1145/2736277.2741093.
[27] Jacob Y, Denoyer L, Gallinari P. Learning latent representations of nodes for classifying in heterogeneous social networks. In Proc. the 7th ACM International Conference on Web Search and Data Mining, February 2014, pp.373-382. DOI: 10.1145/2556195.2556225.
[28] Chen J, Ma T, Xiao C. FastGCN: Fast learning with graph convolutional networks via importance sampling. arXiv:1801.10247, 2018. https://arxiv.org/abs/1801.10247, January 2021.
[29] You J, Ying R, Leskovec J. Position-aware graph neural networks. arXiv:1906.04817, 2019. https://arxiv.org/abs/1906.04817, April 2021.
[30] Xu K, Li C, Tian Y, Sonobe T, Kawarabayashi K I, Jegelka S. Representation learning on graphs with jumping knowledge networks. arXiv:1806.03536, 2018. https://arxiv.org/abs/1806.03536, June 2021.
[31] Tran P V. Learning to make predictions on graphs with autoencoders. In Proc. the 5th IEEE International Conference on Data Science and Advanced Analytics, October 2018, pp.237-245. DOI: 10.1109/DSAA.2018.00034.