Journal of Computer Science and Technology ›› 2021, Vol. 36 ›› Issue (4): 778-791.doi: 10.1007/s11390-021-1356-2

Special Issue: Data Management and Data Mining

• Special Section on AI4DB and DB4AI • Previous Articles     Next Articles

TransGPerf: Exploiting Transfer Learning for Modeling Distributed Graph Computation Performance

Songjie Niu, Student Member, CCF, and Shimin Chen*, Senior Member, IEEE        

  1. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences Beijing 100190, China;University of Chinese Academy of Sciences, Beijing 100049, China
  • Received:2021-02-02 Revised:2021-07-10 Online:2021-07-05 Published:2021-07-30
  • Contact: Shimin Chen
  • About author:Songjie Niu received her B.E. degree from Beijing Institute of Technology, Beijing, in 2014. She is a Ph.D. candidate at Institute of Computing Technology, Chinese Academy of Sciences, Beijing. Her research interests include graph computation, database systems, and big data processing. She is a student member of CCF.

It is challenging to model the performance of distributed graph computation. Explicit formulation cannot easily capture the diversified factors and complex interactions in the system. Statistical learning methods require a large number of training samples to generate an accurate prediction model. However, it is time-consuming to run the required graph computation tests to obtain the training samples. In this paper, we propose TransGPerf, a transfer learning based solution that can exploit prior knowledge from a source scenario and utilize a manageable amount of training data for modeling the performance of a target graph computation scenario. Experimental results show that our proposed method is capable of generating accurate models for a wide range of graph computation tasks on PowerGraph and GraphX. It outperforms transfer learning methods proposed for other applications in the literature.

Key words: performance modeling; distributed graph computation; deep learning; transfer learning;

[1] Malewicz G, Austern M H, Bik A J C, Dehnert J C, Horn I, Leiser N, Czajkowski G. Pregel:A system for large-scale graph processing. In Proc. the 2010 ACM SIGMOD International Conference on Management of Data, June 2010, pp.135-146. DOI:10.1145/1807167.1807184.
[2] Gonzalez J E, Low Y, Gu H, Bickson D, Guestrin C. PowerGraph:Distributed graph-parallel computation on natural graphs. In Proc. the 10th USENIX Symposium on Operating Systems Design and Implementation, October 2012, pp.17-30.
[3] Xin R S, Gonzalez J E, Franklin M J, Stoica I. GraphX:A resilient distributed graph system on Spark. In Proc. the 1st International Workshop on Graph Data Management Experiences and Systems, June 2013, Article No. 2. DOI:10.1145/2484425.2484427.
[4] Niu S, Chen S. Optimizing CPU cache performance for Pregel-like graph computation. In Proc. the 31st IEEE International Conference on Data Engineering Workshops, April 2015, pp.149-154. DOI:10.1109/ICDEW.2015.7129568.
[5] Han M, Daudjee K, Ammar K, Özsu M T, Wang X, Jin T. An experimental comparison of Pregel-like graph processing systems. Proc. VLDB Endow., 2014, 7(12):1047-1058. DOI:10.14778/2732977.2732980.
[6] Iosup A, Hegeman T, Ngai W L et al. LDBC graphalytics:A benchmark for large-scale graph analysis on parallel and distributed platforms. Proc. VLDB Endow., 2016, 9(13):1317-1328. DOI:10.14778/3007263.3007270.
[7] Ngai W L, Hegeman T, Heldens S, Iosup A. Granula:Toward fine-grained performance analysis of largescale graph processing platforms. In Proc. the 5th International Workshop on Graph Data-management Experiences & Systems, May 2017, Article No. 8. DOI:10.1145/3078447.3078455.
[8] Herodotou H, Lim H, Luo G, Borisov N, Dong L, Cetin F B, Babu S. Starfish:A self-tuning system for big data analytics. In Proc. the 5th Biennial Conference on Innovative Data Systems Research, January 2011, pp.261-272.
[9] Venkataraman S, Yang Z, Franklin M J, Recht B, Stoica I. Ernest:Efficient performance prediction for large-scale advanced analytics. In Proc. the 13th USENIX Symposium on Networked Systems Design and Implementation, March 2016, pp.363-378.
[10] Pan S J, Yang Q. A survey on transfer learning. IEEE Trans. Knowledge and Data Engineering, 2010, 22(10):1345-1359. DOI:10.1109/TKDE.2009.191.
[11] Weiss K R, Khoshgoftaar T M, Wang D. A survey of transfer learning. J. Big Data, 2016, 3:Article No. 9. DOI:10.1186/s40537-016-0043-6.
[12] Tzeng E, Hoffman J, Zhang N, Saenko K, Darrell T. Deep domain confusion:Maximizing for domain invariance. arXiv:1412.3474, 2014., May 2021.
[13] Long M, Cao Y, Wang J, Jordan M I. Learning transferable features with deep adaptation networks. In Proc. the 32nd International Conference on Machine Learning, July 2015, pp.97-105.
[14] Long M, Zhu H, Wang J, Jordan M I. Deep transfer learning with joint adaptation networks. In Proc. the 34th International Conference on Machine Learning, August 2017, pp.2208-2217.
[15] Sun B, Saenko K. Deep CORAL:Correlation alignment for deep domain adaptation. In Proc. the 2016 European Conference on Computer Vision Workshops, October 2016, pp.443-450. DOI:10.1007/978-3-319-49409-835.
[16] He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In Proc. the 2016 IEEE Conference on Computer Vision and Pattern Recognition, June 2016, pp.770-778. DOI:10.1109/CVPR.2016.90.
[17] Barker K J, Pakin S, Kerbyson D J. A performance model of the Krak hydrodynamics application. In Proc. the 2006 International Conference on Parallel Processing, August 2006, pp.245-254. DOI:10.1109/ICPP.2006.11.
[18] Kerbyson D J, Alme H J, Hoisie A, Petrini F, Wasserman H J, Gittings M L. Predictive performance and scalability modeling of a large-scale application. In Proc. the 2001 ACM/IEEE Conference on Supercomputing, November 2001, Article No. 37. DOI:10.1145/582034.582071.
[19] Sundaram-Stukel D, Vernon M K. Predictive analysis of a wavefront application using logGP. In Proc. the 1999 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, May 1999, pp.141-150. DOI:10.1145/301104.301117.
[20] Bhattacharyya A, Hoefler T. PEMOGEN:Automatic adaptive performance modeling during program runtime. In Proc. the 23rd International Conference on Parallel Architectures and Compilation, August 2014, pp.393-404. DOI:10.1145/2628071.2628100.
[21] Bhattacharyya A, Kwasniewski G, Hoefler T. Using compiler techniques to improve automatic performance modeling. In Proc. the 2015 International Conference on Parallel Architectures and Compilation, October 2015, pp.468-479. DOI:10.1109/PACT.2015.39.
[22] Calotoiu A, Beckingsale D, Earl C W, Hoefler T, Karlin I, Schulz M, Wolf F. Fast multi-parameter performance modeling. In Proc. the 2016 IEEE International Conference on Cluster Computing, September 2016, pp.172-181. DOI:10.1109/CLUSTER.2016.57.
[23] Sun J, Sun G, Zhan S, Zhang J, Chen Y. Automated performance modeling of HPC applications using machine learning. IEEE Trans. Computers, 2020, 69(5):749-763. DOI:10.1109/TC.2020.2964767.
[24] Pan S J, Tsang I W, Kwok J T, Yang Q. Domain adaptation via transfer component analysis. IEEE Trans. Neural Networks, 2011, 22(2):199-210. DOI:10.1109/TNN.2010.2091281.
[25] Sun B, Feng J, Saenko K. Return of frustratingly easy domain adaptation. In Proc. the 30th AAAI Conference on Artificial Intelligence, February 2016, pp.2058-2065.
[26] Tzeng E, Hoffman J, Darrell T, Saenko K. Simultaneous deep transfer across domains and tasks. In Proc. the 2015 IEEE International Conference on Computer Vision, December 2015, pp.4068-4076. DOI:10.1109/ICCV.2015.463.
[27] Long M, Zhu H, Wang J, Jordan M I. Unsupervised domain adaptation with residual transfer networks. In Proc. the 30th Annual Conference on Neural Information Processing Systems, December 2016, pp.136-144.
[28] Leskovec J, Sosič R. SNAP:A general-purpose network analysis and graph-mining library. ACM Trans. Intell. Syst. Technol., 2016, 8(1):Article No. 1. DOI:10.1145/2898361.
[29] Chakrabarti D, Zhan Y, Faloutsos C. R-MAT:A recursive model for graph mining. In Proc. the 4th SIAM International Conference on Data Mining, April 2004, pp.442-446. DOI:10.1137/1.9781611972740.43.
[30] Faloutsos M, Faloutsos P, Faloutsos C. On power-law relationships of the Internet topology. In Proc. the 1999 ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, August 30-September 3, 1999, pp.251-262. DOI:10.1145/316188.316229.
[31] Leskovec J, Chakrabarti D, Kleinberg J M, Faloutsos C. Realistic, mathematically tractable graph generation and evolution, using Kronecker multiplication. In Proc. the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, October 2005, pp.133-145. DOI:10.1007/1156412617.
[32] Park H, Kim M. TrillionG:A trillion-scale synthetic graph generator using a recursive vector model. In Proc. the 2017 ACM International Conference on Management of Data, May 2017, pp.913-928. DOI:10.1145/3035918.3064014.
[33] Boldi P, Vigna S. The webgraph framework I:Compression techniques. In Proc. the 13th International Conference on World Wide Web, May 2004, pp.595-602. DOI:10.1145/988672.988752.
[1] Xin-Feng Wang, Xiang Zhou, Jia-Hua Rao, Zhu-Jin Zhang, and Yue-Dong Yang. Imputing DNA Methylation by Transferred Learning Based Neural Network [J]. Journal of Computer Science and Technology, 2022, 37(2): 320-329.
[2] Xin Zhang, Siyuan Lu, Shui-Hua Wang, Xiang Yu, Su-Jing Wang, Lun Yao, Yi Pan, and Yu-Dong Zhang. Diagnosis of COVID-19 Pneumonia via a Novel Deep Learning Architecture [J]. Journal of Computer Science and Technology, 2022, 37(2): 330-343.
[3] Songjie Niu, Dongyan Zhou. SOOP: Efficient Distributed Graph Computation Supporting Second-Order Random Walks [J]. Journal of Computer Science and Technology, 2021, 36(5): 985-1001.
[4] Sheng-Luan Hou, Xi-Kun Huang, Chao-Qun Fei, Shu-Han Zhang, Yang-Yang Li, Qi-Lin Sun, Chuan-Qing Wang. A Survey of Text Summarization Approaches Based on Deep Learning [J]. Journal of Computer Science and Technology, 2021, 36(3): 633-663.
[5] Lan Chen, Juntao Ye, Xiaopeng Zhang. Multi-Feature Super-Resolution Network for Cloth Wrinkle Synthesis [J]. Journal of Computer Science and Technology, 2021, 36(3): 478-493.
[6] Yu-Jie Yuan, Yukun Lai, Tong Wu, Lin Gao, Li-Gang Liu. A Revisit of Shape Editing Techniques: From the Geometric to the Neural Viewpoint [J]. Journal of Computer Science and Technology, 2021, 36(3): 520-554.
[7] Wei Du, Yu Sun, Hui-Min Bao, Liang Chen, Ying Li, Yan-Chun Liang. DeepHBSP: A Deep Learning Framework for Predicting Human Blood-Secretory Proteins Using Transfer Learning [J]. Journal of Computer Science and Technology, 2021, 36(2): 234-247.
[8] Jun Gao, Paul Liu, Guang-Di Liu, Le Zhang. Robust Needle Localization and Enhancement Algorithm for Ultrasound by Deep Learning and Beam Steering Methods [J]. Journal of Computer Science and Technology, 2021, 36(2): 334-346.
[9] Hua Chen, Juan Liu, Qing-Man Wen, Zhi-Qun Zuo, Jia-Sheng Liu, Jing Feng, Bao-Chuan Pang, Di Xiao. CytoBrain: Cervical Cancer Screening System Based on Deep Learning Technology [J]. Journal of Computer Science and Technology, 2021, 36(2): 347-360.
[10] Jason Liu, Pedro Espina, Xian-He Sun. A Study on Modeling and Optimization of Memory Systems [J]. Journal of Computer Science and Technology, 2021, 36(1): 71-89.
[11] Andrea Caroppo, Alessandro Leone, Pietro Siciliano. Comparison Between Deep Learning Models and Traditional Machine Learning Approaches for Facial Expression Recognition in Ageing Adults [J]. Journal of Computer Science and Technology, 2020, 35(5): 1127-1146.
[12] Nuo Qun, Hang Yan, Xi-Peng Qiu, Xuan-Jing Huang. Chinese Word Segmentation via BiLSTM+Semi-CRF with Relay Node [J]. Journal of Computer Science and Technology, 2020, 35(5): 1115-1126.
[13] Ying Li, Jia-Jie Xu, Peng-Peng Zhao, Jun-Hua Fang, Wei Chen, Lei Zhao. ATLRec: An Attentional Adversarial Transfer Learning Network for Cross-Domain Recommendation [J]. Journal of Computer Science and Technology, 2020, 35(4): 794-808.
[14] Dun Liang, Yuan-Chen Guo, Shao-Kui Zhang, Tai-Jiang Mu, Xiaolei Huang. Lane Detection: A Survey with New Results [J]. Journal of Computer Science and Technology, 2020, 35(3): 493-505.
[15] Zheng Zeng, Lu Wang, Bei-Bei Wang, Chun-Meng Kang, Yan-Ning Xu. Denoising Stochastic Progressive Photon Mapping Renderings Using a Multi-Residual Network [J]. Journal of Computer Science and Technology, 2020, 35(3): 506-521.
Full text



[1] Zhou Di;. A Recovery Technique for Distributed Communicating Process Systems[J]. , 1986, 1(2): 34 -43 .
[2] Chen Shihua;. On the Structure of Finite Automata of Which M Is an(Weak)Inverse with Delay τ[J]. , 1986, 1(2): 54 -59 .
[3] Li Wanxue;. Almost Optimal Dynamic 2-3 Trees[J]. , 1986, 1(2): 60 -71 .
[4] Sun Zhongxiu; Shang Lujun;. DMODULA:A Distributed Programming Language[J]. , 1986, 1(2): 25 -31 .
[5] Zhang Cui; Zhao Qinping; Xu Jiafu;. Kernel Language KLND[J]. , 1986, 1(3): 65 -79 .
[6] Wang Jianchao; Wei Daozheng;. An Effective Test Generation Algorithm for Combinational Circuits[J]. , 1986, 1(4): 1 -16 .
[7] Chen Zhaoxiong; Gao Qingshi;. A Substitution Based Model for the Implementation of PROLOG——The Design and Implementation of LPROLOG[J]. , 1986, 1(4): 17 -26 .
[8] Huang Heyan;. A Parallel Implementation Model of HPARLOG[J]. , 1986, 1(4): 27 -38 .
[9] Zheng Guoliang; Li Hui;. The Design and Implementation of the Syntax-Directed Editor Generator(SEG)[J]. , 1986, 1(4): 39 -48 .
[10] Min Yinghua; Han Zhide;. A Built-in Test Pattern Generator[J]. , 1986, 1(4): 62 -74 .

ISSN 1000-9000(Print)

CN 11-2296/TP

Editorial Board
Author Guidelines
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
  Copyright ©2015 JCST, All Rights Reserved