Journal of Computer Science and Technology, 2021, Vol. 36, Issue (6): 1420-1430. DOI: 10.1007/s11390-020-0142-x

Special Issue: Artificial Intelligence and Pattern Recognition

• Regular Paper

Pre-Train and Learn: Preserving Global Information for Graph Neural Networks

Dan-Hao Zhu (1,2), Xin-Yu Dai (2,*), Member, CCF, and Jia-Jun Chen (2)

  1 Library, Jiangsu Police Institute, Nanjing 210031, China;
  2 Department of Computer Science and Technology, Nanjing University, Nanjing 210093, China
  • Received: 2019-10-30; Revised: 2020-10-09; Online: 2021-11-30; Published: 2021-12-01
  • Contact: Xin-Yu Dai, E-mail: daixinyu@nju.edu.cn
  • Supported by:
    This work was partially supported by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 18kJB510010, the Social Science Foundation of Jiangsu Province of China under Grant No. 19TQD002, and the National Natural Science Foundation of China under Grant No. 61976114.

Graph neural networks (GNNs) have shown great power in learning on graphs. However, it remains a challenge for GNNs to model information from nodes far away from the source node. The ability to preserve such global information can enhance graph representations and hence improve classification precision. In this paper, we propose a new learning framework named G-GNN (Global information for GNN) to address this challenge. First, the global structure features and global attribute features of each node are obtained via unsupervised pre-training; these global features preserve the global information associated with the node. Then, using the pre-trained global features together with the raw attributes of the graph, a set of parallel kernel GNNs learns different aspects of these heterogeneous features. Any general GNN can serve as a kernel and thereby gain the ability to preserve global information, without any change to its own algorithm. Extensive experiments show that state-of-the-art models, e.g., GCN, GAT, GraphSAGE and APPNP, achieve improvements with G-GNN on three standard evaluation datasets. In particular, we establish new benchmark precision records on Cora (84.31%) and Pubmed (80.95%) when learning on attributed graphs.

Key words: graph neural network; network embedding; representation learning; global information pre-train
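To make the two-stage pipeline described in the abstract concrete, here is a minimal NumPy sketch of its data flow: pre-train global structure and global attribute features without labels, then run one kernel GNN per feature view (raw attributes, global structure, global attributes) and fuse their predictions. The abstract does not specify the pre-training objective, the kernel architecture, or the fusion rule, so everything here is an illustrative assumption: the truncated-SVD factorization of a multi-step random-walk matrix stands in for the paper's pre-training, each kernel is a two-layer GCN forward pass with random untrained weights, and the kernel outputs are simply averaged. All function names (`pretrain_global_features`, `gcn_kernel`, `g_gnn_forward`, `toy_graph`) are hypothetical.

```python
import numpy as np


def normalized_adjacency(adj):
    """GCN-style symmetric normalisation D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def multi_step_transition(adj, steps):
    """Mean of P, P^2, ..., P^steps for the row-normalised walk matrix P.

    Entry (i, j) is the average probability of standing on node j after
    1..steps random-walk steps from node i, so it reaches beyond 1-hop.
    """
    deg = adj.sum(axis=1, keepdims=True)
    p = adj / np.maximum(deg, 1e-12)
    p_k = np.eye(adj.shape[0])
    acc = np.zeros_like(p)
    for _ in range(steps):
        p_k = p_k @ p
        acc += p_k
    return acc / steps


def pretrain_global_features(adj, attrs, dim, steps=5):
    """Hypothetical stand-in for the unsupervised pre-training stage.

    Structure features: truncated SVD of a log-scaled multi-step
    co-occurrence matrix. Attribute features: attributes averaged over the
    same multi-hop context, then reduced with SVD. Neither is claimed to be
    the paper's actual objective.
    """
    m = multi_step_transition(adj, steps)
    u, s, _ = np.linalg.svd(np.log1p(m * adj.shape[0]))
    structure = u[:, :dim] * np.sqrt(s[:dim])
    u2, s2, _ = np.linalg.svd(m @ attrs, full_matrices=False)
    attribute = u2[:, :dim] * s2[:dim]
    return structure, attribute


def gcn_kernel(a_norm, x, hidden, classes, rng):
    """One kernel GNN: two-layer GCN forward pass with random, untrained weights."""
    w1 = rng.normal(scale=1.0 / np.sqrt(x.shape[1]), size=(x.shape[1], hidden))
    w2 = rng.normal(scale=1.0 / np.sqrt(hidden), size=(hidden, classes))
    h = np.maximum(a_norm @ x @ w1, 0.0)  # ReLU
    return a_norm @ h @ w2  # per-node class logits


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def g_gnn_forward(adj, attrs, classes, dim=4, hidden=8, seed=0):
    """Run one kernel per feature view and fuse their predictions."""
    rng = np.random.default_rng(seed)
    structure, attribute = pretrain_global_features(adj, attrs, dim)
    a_norm = normalized_adjacency(adj)
    views = [attrs, structure, attribute]  # raw + two global feature views
    probs = [softmax(gcn_kernel(a_norm, v, hidden, classes, rng)) for v in views]
    return np.mean(probs, axis=0)  # assumed fusion: average of kernel outputs


def toy_graph():
    """Two triangles joined by a single edge (6 nodes)."""
    adj = np.zeros((6, 6))
    for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
        adj[i, j] = adj[j, i] = 1.0
    return adj


if __name__ == "__main__":
    adj = toy_graph()
    attrs = np.random.default_rng(1).random((6, 5))
    out = g_gnn_forward(adj, attrs, classes=2)
    print(out.shape)  # (6, 2)
```

The sketch only shows the forward data flow; in a real setup each kernel would be trained with node labels. The "any GNN as a kernel" idea corresponds to replacing `gcn_kernel` with, say, a GAT or APPNP forward pass while leaving the pre-training and fusion steps untouched.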
