We use cookies to improve your experience with our site.

Indexed in:

SCIE, EI, Scopus, INSPEC, DBLP, CSCD, etc.

Submission System
(Author / Reviewer / Editor)
Songjie Niu, Shimin Chen. TransGPerf: Exploiting Transfer Learning for Modeling Distributed Graph Computation Performance[J]. Journal of Computer Science and Technology, 2021, 36(4): 778-791. DOI: 10.1007/s11390-021-1356-2
Citation: Songjie Niu, Shimin Chen. TransGPerf: Exploiting Transfer Learning for Modeling Distributed Graph Computation Performance[J]. Journal of Computer Science and Technology, 2021, 36(4): 778-791. DOI: 10.1007/s11390-021-1356-2

TransGPerf: Exploiting Transfer Learning for Modeling Distributed Graph Computation Performance

More Information
  • Author Bio:

    Songjie Niu received her B.E. degree from Beijing Institute of Technology, Beijing, in 2014. She is a Ph.D. candidate at Institute of Computing Technology, Chinese Academy of Sciences, Beijing. Her research interests include graph computation, database systems, and big data processing. She is a student member of CCF.

  • Corresponding author:

    Shimin Chen E-mail: chensm@ict.ac.cn

  • Received Date: February 01, 2021
  • Revised Date: July 09, 2021
  • Published Date: July 04, 2021
  • It is challenging to model the performance of distributed graph computation. Explicit formulation cannot easily capture the diversified factors and complex interactions in the system. Statistical learning methods require a large number of training samples to generate an accurate prediction model. However, it is time-consuming to run the required graph computation tests to obtain the training samples. In this paper, we propose TransGPerf, a transfer learning based solution that can exploit prior knowledge from a source scenario and utilize a manageable amount of training data for modeling the performance of a target graph computation scenario. Experimental results show that our proposed method is capable of generating accurate models for a wide range of graph computation tasks on PowerGraph and GraphX. It outperforms transfer learning methods proposed for other applications in the literature.
  • [1]
    Malewicz G, Austern M H, Bik A J C, Dehnert J C, Horn I, Leiser N, Czajkowski G. Pregel:A system for large-scale graph processing. In Proc. the 2010 ACM SIGMOD International Conference on Management of Data, June 2010, pp.135-146. DOI: 10.1145/1807167.1807184.
    [2]
    Gonzalez J E, Low Y, Gu H, Bickson D, Guestrin C. PowerGraph:Distributed graph-parallel computation on natural graphs. In Proc. the 10th USENIX Symposium on Operating Systems Design and Implementation, October 2012, pp.17-30.
    [3]
    Xin R S, Gonzalez J E, Franklin M J, Stoica I. GraphX:A resilient distributed graph system on Spark. In Proc. the 1st International Workshop on Graph Data Management Experiences and Systems, June 2013, Article No. 2. DOI: 10.1145/2484425.2484427.
    [4]
    Niu S, Chen S. Optimizing CPU cache performance for Pregel-like graph computation. In Proc. the 31st IEEE International Conference on Data Engineering Workshops, April 2015, pp.149-154. DOI: 10.1109/ICDEW.2015.7129568.
    [5]
    Han M, Daudjee K, Ammar K, Özsu M T, Wang X, Jin T. An experimental comparison of Pregel-like graph processing systems. Proc. VLDB Endow., 2014, 7(12):1047-1058. DOI: 10.14778/2732977.2732980.
    [6]
    Iosup A, Hegeman T, Ngai W L et al. LDBC graphalytics:A benchmark for large-scale graph analysis on parallel and distributed platforms. Proc. VLDB Endow., 2016, 9(13):1317-1328. DOI: 10.14778/3007263.3007270.
    [7]
    Ngai W L, Hegeman T, Heldens S, Iosup A. Granula:Toward fine-grained performance analysis of largescale graph processing platforms. In Proc. the 5th International Workshop on Graph Data-management Experiences & Systems, May 2017, Article No. 8. DOI: 10.1145/3078447.3078455.
    [8]
    Herodotou H, Lim H, Luo G, Borisov N, Dong L, Cetin F B, Babu S. Starfish:A self-tuning system for big data analytics. In Proc. the 5th Biennial Conference on Innovative Data Systems Research, January 2011, pp.261-272.
    [9]
    Venkataraman S, Yang Z, Franklin M J, Recht B, Stoica I. Ernest:Efficient performance prediction for large-scale advanced analytics. In Proc. the 13th USENIX Symposium on Networked Systems Design and Implementation, March 2016, pp.363-378.
    [10]
    Pan S J, Yang Q. A survey on transfer learning. IEEE Trans. Knowledge and Data Engineering, 2010, 22(10):1345-1359. DOI: 10.1109/TKDE.2009.191.
    [11]
    Weiss K R, Khoshgoftaar T M, Wang D. A survey of transfer learning. J. Big Data, 2016, 3:Article No. 9. DOI: 10.1186/s40537-016-0043-6.
    [12]
    Tzeng E, Hoffman J, Zhang N, Saenko K, Darrell T. Deep domain confusion:Maximizing for domain invariance. arXiv:1412.3474, 2014. https://arxiv.org/pdf/1412.3474.pdf, May 2021.
    [13]
    Long M, Cao Y, Wang J, Jordan M I. Learning transferable features with deep adaptation networks. In Proc. the 32nd International Conference on Machine Learning, July 2015, pp.97-105.
    [14]
    Long M, Zhu H, Wang J, Jordan M I. Deep transfer learning with joint adaptation networks. In Proc. the 34th International Conference on Machine Learning, August 2017, pp.2208-2217.
    [15]
    Sun B, Saenko K. Deep CORAL:Correlation alignment for deep domain adaptation. In Proc. the 2016 European Conference on Computer Vision Workshops, October 2016, pp.443-450. DOI: 10.1007/978-3-319-49409-835.
    [16]
    He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In Proc. the 2016 IEEE Conference on Computer Vision and Pattern Recognition, June 2016, pp.770-778. DOI: 10.1109/CVPR.2016.90.
    [17]
    Barker K J, Pakin S, Kerbyson D J. A performance model of the Krak hydrodynamics application. In Proc. the 2006 International Conference on Parallel Processing, August 2006, pp.245-254. DOI: 10.1109/ICPP.2006.11.
    [18]
    Kerbyson D J, Alme H J, Hoisie A, Petrini F, Wasserman H J, Gittings M L. Predictive performance and scalability modeling of a large-scale application. In Proc. the 2001 ACM/IEEE Conference on Supercomputing, November 2001, Article No. 37. DOI: 10.1145/582034.582071.
    [19]
    Sundaram-Stukel D, Vernon M K. Predictive analysis of a wavefront application using logGP. In Proc. the 1999 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, May 1999, pp.141-150. DOI: 10.1145/301104.301117.
    [20]
    Bhattacharyya A, Hoefler T. PEMOGEN:Automatic adaptive performance modeling during program runtime. In Proc. the 23rd International Conference on Parallel Architectures and Compilation, August 2014, pp.393-404. DOI: 10.1145/2628071.2628100.
    [21]
    Bhattacharyya A, Kwasniewski G, Hoefler T. Using compiler techniques to improve automatic performance modeling. In Proc. the 2015 International Conference on Parallel Architectures and Compilation, October 2015, pp.468-479. DOI: 10.1109/PACT.2015.39.
    [22]
    Calotoiu A, Beckingsale D, Earl C W, Hoefler T, Karlin I, Schulz M, Wolf F. Fast multi-parameter performance modeling. In Proc. the 2016 IEEE International Conference on Cluster Computing, September 2016, pp.172-181. DOI: 10.1109/CLUSTER.2016.57.
    [23]
    Sun J, Sun G, Zhan S, Zhang J, Chen Y. Automated performance modeling of HPC applications using machine learning. IEEE Trans. Computers, 2020, 69(5):749-763. DOI: 10.1109/TC.2020.2964767.
    [24]
    Pan S J, Tsang I W, Kwok J T, Yang Q. Domain adaptation via transfer component analysis. IEEE Trans. Neural Networks, 2011, 22(2):199-210. DOI: 10.1109/TNN.2010.2091281.
    [25]
    Sun B, Feng J, Saenko K. Return of frustratingly easy domain adaptation. In Proc. the 30th AAAI Conference on Artificial Intelligence, February 2016, pp.2058-2065.
    [26]
    Tzeng E, Hoffman J, Darrell T, Saenko K. Simultaneous deep transfer across domains and tasks. In Proc. the 2015 IEEE International Conference on Computer Vision, December 2015, pp.4068-4076. DOI: 10.1109/ICCV.2015.463.
    [27]
    Long M, Zhu H, Wang J, Jordan M I. Unsupervised domain adaptation with residual transfer networks. In Proc. the 30th Annual Conference on Neural Information Processing Systems, December 2016, pp.136-144.
    [28]
    Leskovec J, Sosič R. SNAP:A general-purpose network analysis and graph-mining library. ACM Trans. Intell. Syst. Technol., 2016, 8(1):Article No. 1. DOI: 10.1145/2898361.
    [29]
    Chakrabarti D, Zhan Y, Faloutsos C. R-MAT:A recursive model for graph mining. In Proc. the 4th SIAM International Conference on Data Mining, April 2004, pp.442-446. DOI: 10.1137/1.9781611972740.43.
    [30]
    Faloutsos M, Faloutsos P, Faloutsos C. On power-law relationships of the Internet topology. In Proc. the 1999 ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, August 30-September 3, 1999, pp.251-262. DOI: 10.1145/316188.316229.
    [31]
    Leskovec J, Chakrabarti D, Kleinberg J M, Faloutsos C. Realistic, mathematically tractable graph generation and evolution, using Kronecker multiplication. In Proc. the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, October 2005, pp.133-145. DOI: 10.1007/1156412617.
    [32]
    Park H, Kim M. TrillionG:A trillion-scale synthetic graph generator using a recursive vector model. In Proc. the 2017 ACM International Conference on Management of Data, May 2017, pp.913-928. DOI: 10.1145/3035918.3064014.
    [33]
    Boldi P, Vigna S. The webgraph framework I:Compression techniques. In Proc. the 13th International Conference on World Wide Web, May 2004, pp.595-602. DOI: 10.1145/988672.988752.
  • Others

Catalog

    Article views (52) PDF downloads (0) Cited by()
    Related

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return