Journal of Computer Science and Technology ›› 2020, Vol. 35 ›› Issue (6): 1234-1257.doi: 10.1007/s11390-020-0520-4

Special Issue: Software Systems

Previous Articles     Next Articles

ProSy: API-Based Synthesis with Probabilistic Model

Bin-Bin Liu1, Wei Dong2,*, Member, CCF, Jia-Xin Liu1, Ya-Ting Zhang1, and Dai-Yan Wang1        

  1. 1 College of Computer Science, National University of Defense Technology, Changsha 410072, China;
    2 Key Laboratory of Software Engineering for Complex Systems, College of Computer Science, National University of Defense Technology, Changsha 410072, China
  • Received:2020-04-10 Revised:2020-10-22 Online:2020-11-20 Published:2020-12-01
  • Contact: Wei Dong E-mail:wdong@nudt.edu.cn
  • About author:Bin-Bin Liu is a Ph.D. candidate in software engineering at National University of Defense Technology, Changsha. He received his B.S. degree in computer science and technology and M.S. degree in software engineering from National University of Defense Technology, Changsha, in 2013 and 2015 respectively. His research interests include intelligent methods in software engineering, and data-driven software engineering.
  • Supported by:
    This paper was supported by the National Natural Science Foundation of China under Grant No. 61690203, and the National Key Research and Development Program of China under Grant No. 2018YFB0204301.

Program synthesis is an exciting topic that desires to generate programs satisfying user intent automatically. But in most cases, only small programs for simple or domain-specific tasks can be synthesized. The major obstacle of synthesis lies in the huge search space. A common practice in addressing this problem is using a domain-specific language, while many approaches still wish to synthesize programs in general programming languages. With the rapid growth of reusable libraries, component-based synthesis provides a promising way, such as synthesizing Java programs which are only composed of APIs (application programming interfaces). However, the efficiency of searching for proper solutions for complex tasks is still a challenge. Given an unfamiliar programming task, programmers would search for API usage knowledge from various coding resources to reduce the search space. Considering this, we propose a novel approach named ProSy to synthesize API-based programs in Java. The key novelty is to retrieve related knowledge from Javadoc and Stack Overflow and then construct a probabilistic reachability graph. It assigns higher probabilities to APIs that are more likely to be used in implementing the given task. In the synthesis process, the program sketch with a higher probability will be considered first; thus, the number of explored reachable paths would be decreased. Some extension and optimization strategies are further studied in the paper. We implement our approach and conduct several experiments on it. We compare ProSy with SyPet and other state-of-the-art API-based synthesis approaches. The experimental results show that ProSy reduces the synthesis time of SyPet by up to 80%.

Key words: application programming interface (API)-based program; Petri net; probabilistic reachability graph; program synthesis;

[1] Gulwani S, Polozov O, Singh R. Program synthesis. Foundations and Trends in Programming Languages, 2017, 4(1/2):1-119.
[2] Pnueli A, Rosner R. On the synthesis of a reactive module. In Proc. the 16th Annual ACM Symposium on Principles of Programming Languages, January 1989, pp.179-190.
[3] Gulwani S, Jain P. Programming by examples:PL meets ML. In Proc. the 15th Asian Symposium on Programming Languages and Systems, November 2017, pp.3-20.
[4] Alur R, Bodík R, Juniwal G, Martin M M K, Raghothaman M, Seshia S A, Singh R, Solar-Lezama A, Torlak E, Udupa A. Syntax-guided synthesis. In Proc. the 13th International Conference on Formal Methods in Computer-Aided Design, October 2013, pp.1-8.
[5] Gulwani S, Jha S, Tiwari A, Venkatesan R. Synthesis of loop-free programs. In Proc. the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2011, pp.62-73.
[6] Solar-Lezama A. Program sketching. Int. J. Softw. Tools Technol. Transf., 2013, 15(5/6):475-495.
[7] Parisotto E, Mohamed A, Singh R, Li L, Zhou D, Kohli P. Neuro-symbolic program synthesis. In Proc. the 5th International Conference on Learning Representations, April 2017.
[8] Balog M, Gaunt A L, Brockschmidt M, Nowozin S, Tarlow D. DeepCoder:Learning to write programs. In Proc. the 5th International Conference on Learning Representations, April 2017.
[9] Perelman D, Gulwani S, Grossman D, Provost P. Testdriven synthesis. In Proc. the ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2014, pp.408-418.
[10] Feng Y, Martins R, Geffen J V, Dillig I, Chaudhuri S. Component-based synthesis of table consolidation and transformation tasks from examples. In Proc. the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2017, pp.422-436.
[11] Jha S, Gulwani S, Seshia S A, Tiwari A. Oracle-guided component-based program synthesis. In Proc. the 32nd ACM/IEEE International Conference on Software Engineering, May 2010, pp.215-224.
[12] Feng Y, Martins R, Wang Y, Dillig I, Reps T W. Component-based synthesis for complex APIs. In Proc. the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, January 2017, pp.599-612.
[13] Shi K, Steinhardt J, Liang P. FrAngel:Component-based synthesis with control structures. Proc. the ACM on Programming Languages, 2019, 3(POPL):Article No. 73.
[14] Mei H, Zhang L. Can big data bring a breakthrough for software automation? SCIENCE CHINA Information Sciences, 2018, 61(5):Article No. 056101.
[15] Petri C A, Reisig W. Petri net. Scholarpedia, 2008, 3(4):Article No. 6477.
[16] Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. arXiv:1301.3781, 2013. https://arxiv.org/abs/1301.3781, Sept. 2020.
[17] Pennington J, Socher R, Manning C D. Glove:Global vectors for word representation. In Proc. the 2014 Conference on Empirical Methods in Natural Language Processing, October 2014, pp.1532-1543.
[18] Eén N, Sörensson N. Translating pseudo-boolean constraints into SAT. J. Satisf. Boolean Model. Comput., 2006, 2(1/2/3/4):1-26.
[19] Darwin I F. Java Cookbook. O'Reilly, 2001.
[20] Murali V, Qi L, Chaudhuri S, Jermaine C. Neural sketch learning for conditional program generation. In Proc. the 6th International Conference on Learning Representations, April 2018.
[21] Gulwani S. Automating string processing in spreadsheets using input-output examples. In Proc. the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, January 2011, pp.317-330.
[22] Singh R, Gulwani S. Learning semantic string transformations from examples. Proc. the VLDB Endowment, 2012, 5(8):740-751.
[23] Albarghouthi A, Gulwani S, Kincaid Z. Recursive program synthesis. In Proc. the 25th International Conference on Computer Aided Verification, July 2013, pp.934-950.
[24] Singh R, Gulwani S. Synthesizing number transformations from input-output examples. In Proc. the 24th International Conference on Computer Aided Verification, July 2012, pp.634-651.
[25] Gulwani S, Harris W R, Singh R. Spreadsheet data manipulation using examples. Communications of the ACM, 2012, 55(8):97-105.
[26] Gulwani S, Korthikanti V A, Tiwari A. Synthesizing geometry constructions. In Proc. the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2011, pp.50-61.
[27] Feser J K, Chaudhuri S, Dillig I. Synthesizing data structure transformations from input-output examples. In Proc. the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2015, pp.229-239.
[28] Kalyan A, Mohta A, Polozov O, Batra D, Jain P, Gulwani S. Neural-guided deductive search for real-time program synthesis from examples. In Proc. the 6th International Conference on Learning Representations, April 2018.
[29] Ling W, Blunsom P, Grefenstette E, Hermann K M, Kociský T, Wang F, Senior A W. Latent predictor networks for code generation. In Proc. the 54th Annual Meeting of the Association for Computational Linguistics, August 2016.
[30] Yin P, Neubig G. A syntactic neural model for general purpose code generation. In Proc. the 55th Annual Meeting of the Association for Computational Linguistics, July 2017, pp.440-450.
[31] Rabinovich M, Stern M, Klein D. Abstract syntax networks for code generation and semantic parsing. In Proc. the 55th Annual Meeting of the Association for Computational Linguistics, July 2017, pp.1139-1149.
[32] Lee W, Heo K, Alur R, Naik M. Accelerating search-based program synthesis using learned probabilistic models. In Proc. the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2018, pp.436-449.
[33] Chen Y, Martins R, Feng Y. Maximal multi-layer specification synthesis. In Proc. the ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, August 2019, pp.602-612.
[34] Linstead E, Bajracharya S K, Ngo T C, Rigor P, Lopes C V, Baldi P. Sourcerer:Mining and searching Internet-scale software repositories. Data Min. Knowl. Discov., 2009, 18(2):300-336.
[35] McMillan C, Grechanik M, Poshyvanyk D, Xie Q, Fu C. Portfolio:Finding relevant functions and their usage. In Proc. the 33rd International Conference on Software Engineering, May 2011, pp.111-120.
[36] Lu M, Sun X, Wang S, Lo D, Duan Y. Query expansion via WordNet for effective code search. In Proc. the 22nd IEEE International Conference on Software Analysis, March 2015, pp.545-549.
[37] Haiduc S, Bavota G, Marcus A, Oliveto R, Lucia A D, Menzies T. Automatic query reformulations for text retrieval in software engineering. In Proc. the 35th International Conference on Software Engineering, May 2013, pp.842-851.
[38] Lv F, Zhang H, Lou J, Wang S, Zhang D, Zhao J. CodeHow:Effective code search based on API understanding and extended Boolean model (E). In Proc. the 30th IEEE/ACM International Conference on Automated Software Engineering, November 2015, pp.260-270.
[39] Gu X, Zhang H, Kim S. Deep code search. In Proc. the 40th International Conference on Software Engineering, May 2018, pp.933-944.
[40] Ke Y, Stolee K T, Goues C L, Brun Y. Repairing programs with semantic code search (T). In Proc. the 30th IEEE/ACM International Conference on Automated Software Engineering, November 2015, pp.295-306.
[41] Reiss S P. Semantics-based code search. In Proc. the 31st International Conference on Software Engineering, May 2009, pp.243-253.
[42] Stolee K T, Elbaum S G, Dobos D. Solving the search for source code. ACM Trans. Softw. Eng. Methodol., 2014, 23(3):Article No. 26.
[43] Sirres R, Bissyandé T F, Kim D, Lo D, Klein J, Kim K, Traon Y L. Augmenting and structuring user queries to support efficient free-form code search. In Proc. the 40th International Conference on Software Engineering, May 2018, pp.945-945.
[44] Nguyen A T, Hilton M, Codoban M, Nguyen H A, Mast L, Rademacher E, Nguyen T N, Dig D. API code recommendation using statistical learning from fine-grained changes. In Proc. the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, November 2016, pp.511-522.
[45] Gu X, Zhang H, Zhang D, Kim S. Deep API learning. In Proc. the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, November 2016, pp.631-642.
[46] Huang Q, Xia X, Xing Z, Lo D, Wang X. API method recommendation without worrying about the task-API knowledge gap. In Proc. the 33rd ACM/IEEE International Conference on Automated Software Engineering, September 2018, pp.293-304.
[47] Liu X, Huang L, Ng V. Effective API recommendation without historical software repositories. In Proc. the 33rd ACM/IEEE International Conference on Automated Software Engineering, September 2018, pp.282-292.
[48] Li X, Jiang H, Kamei Y, Chen X. Bridging semantic gaps between natural languages and APIs with word embedding. arXiv:1810.09723, 2018. https://arxiv.org/abs/1810.09723, Sept. 2020.
[49] Mandelin D, Xu L, Bodík R, Kimelman D. Jungloid mining:Helping to navigate the API jungle. In Proc. the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2005, pp.48-61.
[50] Chan W, Cheng H, Lo D. Searching connected API subgraph via text phrases. In Proc. the 20th ACM SIGSOFT Symposium on the Foundations of Software Engineering, November 2012, Article No. 10.
[51] van Nguyen T, Rigby P C, Nguyen A T, Karanfil M, Nguyen T N. T2API:Synthesizing API code usage templates from English texts with statistical translation. In Proc. the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, November 2016, pp.1013-1017.
[52] Nguyen T T, Nguyen H A, Pham N H, Al-Kofahi J M, Nguyen T N. Graph-based mining of multiple object usage patterns. In Proc. the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT International Symposium on Foundations of Software Engineering, August 2009, pp.383-392.
[1] Le-Jun Fan, Yuan-Zhuo Wang, Jing-Yuan Li, Xue-Qi Cheng, Chuang Lin. Privacy Petri Net and Privacy Leak Software [J]. , 2015, 30(6): 1318-1343.
[2] Yang Liu, Huai-Kou Miao, Hong-Wei Zeng, Yan Ma, and Pan Liu. Nondeterministic Probabilistic Petri Net — A New Method to Study Qualitative and Quantitative Behaviors of System [J]. , 2013, 28(1): 203-216.
[3] Chong-Yi Yuan, Wen Zhao, Shi-Kun Zhang, and Yu Huang. A Three-Layer Model for Business Processes --- Process Logic, Case Semantics and Workflow Management [J]. , 2007, 22(3): 410-425 .
[4] Lei Shi, Ying-Jie Han, Xiao-Guang Ding, Lin Wei and Zhi-Min Gu. An SPN-Based Integrated Model for Web Prefetching and Caching [J]. , 2006, 21(4): 482-489 .
[5] Xing-Qi Huang, Li-Fu Wang, Wen Zhao, Shi-Kun Zhang, and Chong-Yi Yuan. A Workflow Process Mining Algorithm Based on Synchro-Net [J]. , 2006, 21(1): 66-71 .
[6] Yu-Yue Du and Chang-Jun Jiang. Verifying Functions in Online Stock Trading Systems [J]. , 2004, 19(2): 0-0.
[7] ZHOU GuoFu (周国富) and YUAN ChongYi (袁崇义). Mapping PUNITY to UniNet [J]. , 2003, 18(3): 0-0.
[8] LIN Chuang (林 闯) and XU MingWei (徐明伟). Stability Analysis of Buffer Priority Scheduling Policies Using Petri Nets [J]. , 2003, 18(3): 0-0.
[9] JIANG Changjun (蒋昌俊), WANG Huaiqing (王怀清) and LIAO Shaoyi (廖少毅). Behavior Relativity of Petri Nets [J]. , 2002, 17(6): 0-0.
[10] YU Jian(喻坚),WANG Shengyuan(王生原)and YUAN Chongyi(袁崇义). Solving Inheritance Anomaly with OMNets [J]. , 2002, 17(1): 0-0.
[11] WANG Shengyuan(王生原),YU Jian(喻坚)and YUAN Chongyi(袁崇义). A Pragmatic Behavior Subtyping Relation Based on Both States and Actions [J]. , 2001, 16(5): 0-0.
[12] LI Xuandong(李宣东). Verifying Time Petri Nets by Linear Programming [J]. , 2001, 16(1): 0-0.
[13] LIANG Yongquan; SHI Zhongzhi;. A Multimedia Synchronization Model Based on Timed Petri Net [J]. , 1999, 14(3): 276-282.
[14] Lin Hong; Chen Guoliang;. Program Constructionby Verifying Specification [J]. , 1998, 13(6): 597-607.
[15] Shuai Dianxun;. Hyper-Distributed Hyper-Parallel Implementation of Heuristic Search of Implicit AND/OR Graph [J]. , 1997, 12(6): 532-542.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] Zhou Di;. A Recovery Technique for Distributed Communicating Process Systems[J]. , 1986, 1(2): 34 -43 .
[2] Wang Jianchao; Wei Daozheng;. An Effective Test Generation Algorithm for Combinational Circuits[J]. , 1986, 1(4): 1 -16 .
[3] Huang Guoxiang; Liu Jian;. A Key-Lock Access Control[J]. , 1987, 2(3): 236 -243 .
[4] Duan Ping; Cai Xiyao;. A Real-Time Interprocessor Synchronization Algorithm for Communication in Distributed Computer Systems[J]. , 1987, 2(4): 292 -302 .
[5] Li Renwei;. Soundness and Completeness of Kung s Reasoning Procedure[J]. , 1988, 3(1): 7 -15 .
[6] Feng Yin; Wang Kaizhu; Chang Yadong; Li Zhongrong;. CQAES,a Chinese Question Answer Experimental System[J]. , 1988, 3(4): 317 -319 .
[7] Lian Lin; Zhang Yili; Tang Changjie;. A Non-Recursive Algorithm Computing Set Expressions[J]. , 1988, 3(4): 310 -316 .
[8] Zhang Bo; Zhang Tian; Zhang Jianwei; Zhang Ling;. Motion Planning for Robots with Topological Dimension Reduction Method[J]. , 1990, 5(1): 1 -16 .
[9] Zhu Mingyuan;. Two Congruent Semantics for Prolog with CUT[J]. , 1990, 5(1): 82 -91 .
[10] Xu Jie; Li Qingnan; Huang Shize; Xu Jiangfeng;. DFTSNA:A Distributed Fault-Tolerant Shipboard System[J]. , 1990, 5(2): 109 -116 .

ISSN 1000-9000(Print)

         1860-4749(Online)
CN 11-2296/TP

Home
Editorial Board
Author Guidelines
Subscription
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 
  Copyright ©2015 JCST, All Rights Reserved