Journal of Computer Science and Technology ›› 2019, Vol. 34 ›› Issue (5): 993-1006.doi: 10.1007/s11390-019-1956-2

Special Issue: Data Management and Data Mining; Software Systems

• Special Section on Software Systems 2019 • Previous Articles     Next Articles

Graph Embedding Based API Graph Search and Recommendation

Chun-Yang Ling, Yan-Zhen Zou*, Member, CCF, ACM, IEEE, Ze-Qi Lin, Bing Xie, Senior Member, CCF   

  1. 1 Key Laboratory of High Confidence Software Technologies, Ministry of Education, Beijing 100871, China;
    2 School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
  • Received:2019-02-26 Revised:2019-07-12 Online:2019-08-31 Published:2019-08-31
  • Contact: Yan-Zhen Zou
  • About author:Chun-Yang Ling received his B.S. degree in computer science from Peking University, Beijing, in 2018. He is now a Master student in computer science at Peking University, Beijing. His current research interests include software engineering, knowledge engineering, and data mining.
  • Supported by:
    The work was supported by the National Natural Science Fund for Distinguished Young Scholars of China under Grant No. 61525201.

Searching application programming interfaces (APIs) is very important for developers to reuse software projects. Existing natural language based API search mainly faces the following challenges. 1) More accurate results are required as software projects evolve to be more heterogeneous and complex. 2) The semantic relationships between APIs (e.g., inheritances between classes, and invocations between methods) need to be illustrated so that developers can better understand their usage scenarios. To deal with these issues, we propose GeAPI, a novel graph embedding based approach for API graph search and recommendation in this paper. First, we build a software project's API graph automatically from its source code and represent each API using graph embedding methods. Second, we search the API graph with a question in natural language, and return the corresponding subgraph that is composed of relevant code elements and their associated relationships, as the best answer of the question. In experiments, we select three well-known open source projects, JodaTime, Apache Lucene and POI, as examples to perform API search tasks. The experimental results show that our approach GeAPI improves F1-score by 10% compared with the existing shortest path based API search approach, while reduces the average response time about 60 times.

Key words: application programming interface (API) recommendation; API graph; graph embedding; software reuse;

[1] Moreno L, Bavota G, di Penta M et al. How can I use this method? In Proc. the 37th International Conference on Software Engineering, May 2015, pp.880-890.
[2] Sirres R, Bissyandé T F, Kim D, Lo D, Klein J, Kim K, Traon L Y. Augmenting and structuring user queries to support efficient free-form code search. Empirical Software Engineering, 2018, 23(5):2622-2654.
[3] Stylos J, Myers B A. Mica:A web-search tool for finding API components and examples. In Proc. the 2006 IEEE Symposium on Visual Languages and Human-Centric Computing, Sept. 2006, pp.195-202.
[4] Linstead E, Bajracharya S, Ngo T, Rigor P, Lopes C, Baldi P. Sourcerer:Mining and searching internet-scale software repositories. Data Mining and Knowledge Discovery, 2009, 18(2):300-336.
[5] Baeza-Yates R, Ribeiro-Neto B. Modern Information Retrieval-The Concepts and Technology Behind Search (2nd edition). Addison-Wesley Professional, 2011.
[6] Lv F, Zhang H, Lou J, Wang S, Zhang D, Zhao J. CodeHow:Effective code search based on API understanding and extended Boolean model (E). In Proc. the 30th IEEE/ACM International Conference on Automated Software Engineering, Nov. 2015, pp.260-270.
[7] Hill E, Pollock L, Vijay-Shanker K. Improving source code search with natural language phrasal representations of method signatures. In Proc. the 26th IEEE/ACM International Conference on Automated Software Engineering, Nov. 2011, pp.524-527.
[8] Rahman M M, Roy C K. QUICKAR:Automatic query reformulation for concept location using crowdsourced knowledge. In Proc. the 31st IEEE/ACM International Conference on Automated Software Engineering, Aug. 2016, pp.220-225.
[9] McMillan C, Grechanik M, Poshyvanyk D, Xie Q, Fu C. Portfolio:Finding relevant functions and their usage. In Proc. the 33rd International Conference on Software Engineering, May 2011, pp.111-120.
[10] Chan W K, Cheng H, Lo D. Searching connected API subgraph via text phrases. In Proc. the 20th ACM SIGSOFT International Symposium on the Foundations of Software Engineering, Nov. 2012, Article No. 10.
[11] Goyal P, Ferrara E. Graph embedding techniques, applications, and performance:A survey. Knowledge-Based Systems, 2018, 151:78-94.
[12] Belkin M, Niyogi P. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Proc. the 14th International Conference on Neural Information Processing Systems:Natural and Synthetic, Dec. 2002, pp.585-591.
[13] Ahmed A, Shervashidze N, Narayanamurthy S, Josifovski V, Smola A J. Distributed large-scale natural graph factorization. In Proc. the 22nd International World Wide Web Conference, May 2013, pp.37-48.
[14] Perozzi B, Al-Rfou R, Skiena S. DeepWalk:Online learning of social representations. In Proc. the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 2014, pp.701-710.
[15] Wang D, Cui P, Zhu W. Structural deep network embedding. In Proc. the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 2016, pp.1225-1234.
[16] Kipf T N, Welling M. Semi-supervised classification with graph convolutional networks. arXiv:1609.02907, 2016., July 2019.
[17] Bordes A, Usunier N, Garcia-Durán A, Weston J, Yakhnenko O. Translating embeddings for modeling multirelational data. In Proc. the 27th Annual Conference on Neural Information Processing Systems, Dec. 2013, pp.2787-2795.
[18] Lin Y, Liu Z, Sun M, Liu Y, Zhu X. Learning entity and relation embeddings for knowledge graph completion. In Proc. the 29th AAAI Conference on Artificial Intelligence, Jan. 2015, pp.2181-2187.
[19] Tang J, Qu M, Wang M, Zhang M, Yan J, Mei Q. LINE:Large-scale information network embedding. In Proc. the 24th International Conference on World Wide Web, May 2015, pp.1067-1077.
[20] Cui P, Wang X, Pei J, Zhu W. A survey on network embedding. IEEE Transactions on Knowledge and Data Engineering, 2018, 31(5):833-852.
[21] Zou Y, Ling C, Lin Z, Xie B. Graph embedding based code search in software project. In Proc. the 10th Asia-Pacific Symposium on Internetware, Sept. 2018, Article No. 1.
[22] Chatterjee S, Juvekar S, Sen K. SNIFF:A search engine for Java using free-form queries. In Proc. the 12th International Conference on Fundamental Approaches to Software Engineering, Mar. 2009, pp.385-400.
[23] Tian Y, Lo D, Lawall J. Automated construction of a software specific word similarity database. In Proc. the 2014 IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering, Feb. 2014, pp.44-53.
[24] Yang J, Tan L. Inferring semantically related words from software context. In Proc. the 9th IEEE Working Conference of Mining Software Repositories, June 2012, pp.161-170.
[25] Sridhara G, Hill E, Pollock L, Vijay-Shanker K. Identifying word relations in software:A comparative study of semantic similarity tools. In Proc. the 16th IEEE International Conference on Program Comprehension, June 2008, pp.123-132.
[26] Wang S, Lo D, Jiang L. Active code search:Incorporating user feedback to improve code search relevance. In Proc. the 29th IEEE/ACM International Conference on Automated Software Engineering, Sept. 2014, pp.677-682.
[27] Haiduc S, Bavota G, Marcus A, Oliveto R, de Lucia A, Menzies T. Automatic query reformulations for text retrieval in software engineering. In Proc. the 35th International Conference on Software Engineering, May 2013, pp.842-851.
[28] Jiang H, Nie L, Sun Z, Kong W, Zhang T, Luo X. ROSF:Leveraging information retrieval and supervised learning for recommending code snippets. IEEE Transactions on Services Computing, 2019, 12(1):34-46.
[29] Gu X, Zhang H, Zhang D, Kim S. Deep API learning. In Proc. the 24th ACM SIGSOFT International Symposium on the Foundations of Software Engineering, Nov. 2016, pp.631-642.
[30] Richardson K, Kuhn J. Function assistant:A tool for NL querying of APIs. arXiv:1706.00468, 2017., July 2019.
[31] Nguyen T D, Nguyen A T, Phan H D, Nguyen T N. Exploring API embedding for API usages and applications. In Proc. the 39th IEEE/ACM International Conference on Software Engineering, May 2017, pp.438-449.
[32] Huang Q, Xia X, Xing Z, Lo D, Wang X. API method recommendation without worrying about the Task-API knowledge gap. In Proc. the 33rd ACM/IEEE International Conference on Automated Software Engineering, Sept. 2018, pp.292-303.
[33] Sillito J, Murphy G C, de Volder K. Asking and answering questions during a programming change task. IEEE Transactions on Software Engineering, 2008, 34(4):434-451.
[34] Li X, Wang Z, Wang Q, Yan S, Xie T, Mei H. Relationshipaware code search for JavaScript frameworks. In Proc. the 24th ACM SIGSOFT International Symposium on the Foundations of Software Engineering, Nov. 2016, pp.690-701.
[35] Fu K, Qian W Y, Peng X, Zhao W. Feature location method based on call chain analysis. Computer Science, 2014, 41(11):36-39. (in Chinese)
[1] Yong-Hao Long, Yan-Cheng Chen, Xiang-Ping Chen, Xiao-Hong Shi, and Fan Zhou. Test-Driven Feature Extraction of Web Components [J]. Journal of Computer Science and Technology, 2022, 37(2): 389-404.
[2] Ji-Zhao Zhu, Yan-Tao Jia, Jun Xu, Jian-Zhong Qiao, Xue-Qi Cheng. Modeling the Correlations of Relations for Knowledge Graph Embedding [J]. , 2018, 33(2): 323-334.
[3] MEI Hong (梅宏), XIE Tao (谢涛) and YANG Fuqing (杨芙清). A Model-Based Approach to Object-Oriented Software Metrics [J]. , 2002, 17(6): 0-0.
[4] MEI Hong (梅宏), ZHANG Lu (张路) and YANG Fuqing (杨芙清). A Component-Based Software Configuration Management Model and Its Supporting System [J]. , 2002, 17(4): 0-0.
Full text



[1] Jin Lan; Yang Yuanyuan;. A Modified Version of Chordal Ring[J]. , 1986, 1(3): 15 -32 .
[2] Fan Zhihua;. Vectorization for Loops with Three-Forked Jumps[J]. , 1988, 3(3): 186 -202 .
[3] Shen Li;. Testability Analysis at Switch Level for CMOS Circuits[J]. , 1990, 5(2): 197 -202 .
[4] Guo Qingping; Y. Paker;. Communication Analysis and Granularity Assessment for a Transputer-Based System[J]. , 1990, 5(4): 347 -362 .
[5] Han Jianchao; Shi Zhongzhi;. Formalizing Default Reasoning[J]. , 1990, 5(4): 374 -378 .
[6] Huang Zhiyi; Hu Shouren;. Detection of And-Parallelism in Logic Programs[J]. , 1990, 5(4): 379 -387 .
[7] Cai Shijie; Zhang Fuyan;. A Fast Algorithm for Polygon Operations[J]. , 1991, 6(1): 91 -96 .
[8] Tian Jie;. The Geometric Continuity of Rational Bezler Triangular Surfaces[J]. , 1991, 6(4): 383 -388 .
[9] Yao Xin; Li Guojie;. General Simulated Annealing[J]. , 1991, 6(4): 329 -338 .
[10] Deng Tieqing; Wu Quanyuan; Wang Zhiying;. A New Integrated System of Logic Programming and Relational Database[J]. , 1993, 8(1): 58 -67 .

ISSN 1000-9000(Print)

CN 11-2296/TP

Editorial Board
Author Guidelines
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
  Copyright ©2015 JCST, All Rights Reserved