计算机科学技术学报 ›› 2019,Vol. 34 ›› Issue (5): 993-1006.doi: 10.1007/s11390-019-1956-2

所属专题: Data Management and Data Mining Software Systems

• Special Section on Software Systems 2019 • 上一篇    下一篇

基于图嵌入的API子图推荐

Chun-Yang Ling, Yan-Zhen Zou*, Member, CCF, ACM, IEEE, Ze-Qi Lin, Bing Xie, Senior Member, CCF   

  1. 1 Key Laboratory of High Confidence Software Technologies, Ministry of Education, Beijing 100871, China;
    2 School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
  • 收稿日期:2019-02-26 修回日期:2019-07-12 出版日期:2019-08-31 发布日期:2019-08-31
  • 通讯作者: Yan-Zhen Zou E-mail:zouyz@pku.edu.cn
  • 作者简介:Chun-Yang Ling received his B.S. degree in computer science from Peking University, Beijing, in 2018. He is now a Master student in computer science at Peking University, Beijing. His current research interests include software engineering, knowledge engineering, and data mining.
  • 基金资助:
    The work was supported by the National Natural Science Fund for Distinguished Young Scholars of China under Grant No. 61525201.

Graph Embedding Based API Graph Search and Recommendation

Chun-Yang Ling, Yan-Zhen Zou*, Member, CCF, ACM, IEEE, Ze-Qi Lin, Bing Xie, Senior Member, CCF   

  1. 1 Key Laboratory of High Confidence Software Technologies, Ministry of Education, Beijing 100871, China;
    2 School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
  • Received:2019-02-26 Revised:2019-07-12 Online:2019-08-31 Published:2019-08-31
  • Contact: Yan-Zhen Zou E-mail:zouyz@pku.edu.cn
  • About author:Chun-Yang Ling received his B.S. degree in computer science from Peking University, Beijing, in 2018. He is now a Master student in computer science at Peking University, Beijing. His current research interests include software engineering, knowledge engineering, and data mining.
  • Supported by:
    The work was supported by the National Natural Science Fund for Distinguished Young Scholars of China under Grant No. 61525201.

检索软件项目API(Application Program Interface,应用程序接口)是开发者复用软件的重要方式.当前基于自然语言的API检索和推荐主要面临以下两个挑战:1)随着软件项目变得越来越复杂,自然语言查询的准确性需要提高.2)需要展示目标APIs之间的语义关联,以更好地辅助开发者理解APIs的逻辑和使用场景.为此,本文提出了一种基于图嵌入的API子图推荐方法.首先,该方法能够基于软件项目源代码,自动构建其代码结构图,并通过图嵌入对源代码进行表示.然后,开发者可以输入自然语言问题、检索并返回相关的APIs及其关联信息构成的连通API子图,从而提高了API检索和复用的效率.本文选择了开源项目Apache Lucene、POI、JodaTime为例及进行实验验证.结果表明本文方法的F1值相比现有的基于最短路径的方法提高了10%,而平均响应时间缩短了约60倍.

关键词: API推荐, 代码图, 图嵌入, 软件复用

Abstract: Searching application programming interfaces (APIs) is very important for developers to reuse software projects. Existing natural language based API search mainly faces the following challenges. 1) More accurate results are required as software projects evolve to be more heterogeneous and complex. 2) The semantic relationships between APIs (e.g., inheritances between classes, and invocations between methods) need to be illustrated so that developers can better understand their usage scenarios. To deal with these issues, we propose GeAPI, a novel graph embedding based approach for API graph search and recommendation in this paper. First, we build a software project's API graph automatically from its source code and represent each API using graph embedding methods. Second, we search the API graph with a question in natural language, and return the corresponding subgraph that is composed of relevant code elements and their associated relationships, as the best answer of the question. In experiments, we select three well-known open source projects, JodaTime, Apache Lucene and POI, as examples to perform API search tasks. The experimental results show that our approach GeAPI improves F1-score by 10% compared with the existing shortest path based API search approach, while reduces the average response time about 60 times.

Key words: application programming interface (API) recommendation, API graph, graph embedding, software reuse

[1] Moreno L, Bavota G, di Penta M et al. How can I use this method? In Proc. the 37th International Conference on Software Engineering, May 2015, pp.880-890.
[2] Sirres R, Bissyandé T F, Kim D, Lo D, Klein J, Kim K, Traon L Y. Augmenting and structuring user queries to support efficient free-form code search. Empirical Software Engineering, 2018, 23(5):2622-2654.
[3] Stylos J, Myers B A. Mica:A web-search tool for finding API components and examples. In Proc. the 2006 IEEE Symposium on Visual Languages and Human-Centric Computing, Sept. 2006, pp.195-202.
[4] Linstead E, Bajracharya S, Ngo T, Rigor P, Lopes C, Baldi P. Sourcerer:Mining and searching internet-scale software repositories. Data Mining and Knowledge Discovery, 2009, 18(2):300-336.
[5] Baeza-Yates R, Ribeiro-Neto B. Modern Information Retrieval-The Concepts and Technology Behind Search (2nd edition). Addison-Wesley Professional, 2011.
[6] Lv F, Zhang H, Lou J, Wang S, Zhang D, Zhao J. CodeHow:Effective code search based on API understanding and extended Boolean model (E). In Proc. the 30th IEEE/ACM International Conference on Automated Software Engineering, Nov. 2015, pp.260-270.
[7] Hill E, Pollock L, Vijay-Shanker K. Improving source code search with natural language phrasal representations of method signatures. In Proc. the 26th IEEE/ACM International Conference on Automated Software Engineering, Nov. 2011, pp.524-527.
[8] Rahman M M, Roy C K. QUICKAR:Automatic query reformulation for concept location using crowdsourced knowledge. In Proc. the 31st IEEE/ACM International Conference on Automated Software Engineering, Aug. 2016, pp.220-225.
[9] McMillan C, Grechanik M, Poshyvanyk D, Xie Q, Fu C. Portfolio:Finding relevant functions and their usage. In Proc. the 33rd International Conference on Software Engineering, May 2011, pp.111-120.
[10] Chan W K, Cheng H, Lo D. Searching connected API subgraph via text phrases. In Proc. the 20th ACM SIGSOFT International Symposium on the Foundations of Software Engineering, Nov. 2012, Article No. 10.
[11] Goyal P, Ferrara E. Graph embedding techniques, applications, and performance:A survey. Knowledge-Based Systems, 2018, 151:78-94.
[12] Belkin M, Niyogi P. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Proc. the 14th International Conference on Neural Information Processing Systems:Natural and Synthetic, Dec. 2002, pp.585-591.
[13] Ahmed A, Shervashidze N, Narayanamurthy S, Josifovski V, Smola A J. Distributed large-scale natural graph factorization. In Proc. the 22nd International World Wide Web Conference, May 2013, pp.37-48.
[14] Perozzi B, Al-Rfou R, Skiena S. DeepWalk:Online learning of social representations. In Proc. the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 2014, pp.701-710.
[15] Wang D, Cui P, Zhu W. Structural deep network embedding. In Proc. the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 2016, pp.1225-1234.
[16] Kipf T N, Welling M. Semi-supervised classification with graph convolutional networks. arXiv:1609.02907, 2016. https://arXiv.org/abs/1609.02907, July 2019.
[17] Bordes A, Usunier N, Garcia-Durán A, Weston J, Yakhnenko O. Translating embeddings for modeling multirelational data. In Proc. the 27th Annual Conference on Neural Information Processing Systems, Dec. 2013, pp.2787-2795.
[18] Lin Y, Liu Z, Sun M, Liu Y, Zhu X. Learning entity and relation embeddings for knowledge graph completion. In Proc. the 29th AAAI Conference on Artificial Intelligence, Jan. 2015, pp.2181-2187.
[19] Tang J, Qu M, Wang M, Zhang M, Yan J, Mei Q. LINE:Large-scale information network embedding. In Proc. the 24th International Conference on World Wide Web, May 2015, pp.1067-1077.
[20] Cui P, Wang X, Pei J, Zhu W. A survey on network embedding. IEEE Transactions on Knowledge and Data Engineering, 2018, 31(5):833-852.
[21] Zou Y, Ling C, Lin Z, Xie B. Graph embedding based code search in software project. In Proc. the 10th Asia-Pacific Symposium on Internetware, Sept. 2018, Article No. 1.
[22] Chatterjee S, Juvekar S, Sen K. SNIFF:A search engine for Java using free-form queries. In Proc. the 12th International Conference on Fundamental Approaches to Software Engineering, Mar. 2009, pp.385-400.
[23] Tian Y, Lo D, Lawall J. Automated construction of a software specific word similarity database. In Proc. the 2014 IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering, Feb. 2014, pp.44-53.
[24] Yang J, Tan L. Inferring semantically related words from software context. In Proc. the 9th IEEE Working Conference of Mining Software Repositories, June 2012, pp.161-170.
[25] Sridhara G, Hill E, Pollock L, Vijay-Shanker K. Identifying word relations in software:A comparative study of semantic similarity tools. In Proc. the 16th IEEE International Conference on Program Comprehension, June 2008, pp.123-132.
[26] Wang S, Lo D, Jiang L. Active code search:Incorporating user feedback to improve code search relevance. In Proc. the 29th IEEE/ACM International Conference on Automated Software Engineering, Sept. 2014, pp.677-682.
[27] Haiduc S, Bavota G, Marcus A, Oliveto R, de Lucia A, Menzies T. Automatic query reformulations for text retrieval in software engineering. In Proc. the 35th International Conference on Software Engineering, May 2013, pp.842-851.
[28] Jiang H, Nie L, Sun Z, Kong W, Zhang T, Luo X. ROSF:Leveraging information retrieval and supervised learning for recommending code snippets. IEEE Transactions on Services Computing, 2019, 12(1):34-46.
[29] Gu X, Zhang H, Zhang D, Kim S. Deep API learning. In Proc. the 24th ACM SIGSOFT International Symposium on the Foundations of Software Engineering, Nov. 2016, pp.631-642.
[30] Richardson K, Kuhn J. Function assistant:A tool for NL querying of APIs. arXiv:1706.00468, 2017. https://arXiv.org/abs/-1706.00468, July 2019.
[31] Nguyen T D, Nguyen A T, Phan H D, Nguyen T N. Exploring API embedding for API usages and applications. In Proc. the 39th IEEE/ACM International Conference on Software Engineering, May 2017, pp.438-449.
[32] Huang Q, Xia X, Xing Z, Lo D, Wang X. API method recommendation without worrying about the Task-API knowledge gap. In Proc. the 33rd ACM/IEEE International Conference on Automated Software Engineering, Sept. 2018, pp.292-303.
[33] Sillito J, Murphy G C, de Volder K. Asking and answering questions during a programming change task. IEEE Transactions on Software Engineering, 2008, 34(4):434-451.
[34] Li X, Wang Z, Wang Q, Yan S, Xie T, Mei H. Relationshipaware code search for JavaScript frameworks. In Proc. the 24th ACM SIGSOFT International Symposium on the Foundations of Software Engineering, Nov. 2016, pp.690-701.
[35] Fu K, Qian W Y, Peng X, Zhao W. Feature location method based on call chain analysis. Computer Science, 2014, 41(11):36-39. (in Chinese)
[1] 龙永浩, 陈彦呈, 陈湘萍, 石晓虹, 周凡. 测试驱动的构件功能提取[J]. 计算机科学技术学报, 2022, 37(2): 389-404.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 金兰; 杨元元;. A Modified Version of Chordal Ring[J]. , 1986, 1(3): 15 -32 .
[2] 范植华;. Vectorization for Loops with Three-Forked Jumps[J]. , 1988, 3(3): 186 -202 .
[3] 沈理;. Testability Analysis at Switch Level for CMOS Circuits[J]. , 1990, 5(2): 197 -202 .
[4] 郭庆平; Y.Paker;. Communication Analysis and Granularity Assessment for a Transputer-Based System[J]. , 1990, 5(4): 347 -362 .
[5] 韩建超; 史忠植;. Formalizing Default Reasoning[J]. , 1990, 5(4): 374 -378 .
[6] 黄志毅; 胡守仁;. Detection of And-Parallelism in Logic Programs[J]. , 1990, 5(4): 379 -387 .
[7] 蔡士杰; 张福炎;. A Fast Algorithm for Polygon Operations[J]. , 1991, 6(1): 91 -96 .
[8] 田捷;. The Geometric Continuity of Rational Bezler Triangular Surfaces[J]. , 1991, 6(4): 383 -388 .
[9] 姚新; 李国杰;. General Simulated Annealing[J]. , 1991, 6(4): 329 -338 .
[10] 邓铁清; 吴泉源; 王志英;. A New Integrated System of Logic Programming and Relational Database[J]. , 1993, 8(1): 58 -67 .
版权所有 © 《计算机科学技术学报》编辑部
本系统由北京玛格泰克科技发展有限公司设计开发 技术支持:support@magtech.com.cn
总访问量: