Journal of Computer Science and Technology ›› 2021, Vol. 36 ›› Issue (4): 922-943. doi: 10.1007/s11390-021-0235-1

Special Issue: Artificial Intelligence and Pattern Recognition; Software Systems

• Regular Paper •

Discovering API Directives from API Specifications with Text Classification

Jing-Xuan Zhang1,2,3, Member, CCF, ACM, Chuan-Qi Tao1,2, Member, CCF, ACM, Zhi-Qiu Huang1,2, Member, CCF, and Xin Chen3,4, Member, CCF

  • 1 College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China;
    2 Key Laboratory of Safety-Critical Software (Nanjing University of Aeronautics and Astronautics), Ministry of Industry and Information Technology, Nanjing 210016, China;
    3 Key Laboratory of Complex Systems Modeling and Simulation (Hangzhou Dianzi University), Ministry of Education, Hangzhou 310018, China;
    4 School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
  • Received: 2019-12-18; Revised: 2021-06-09; Online: 2021-07-05; Published: 2021-07-30
  • About author: Jing-Xuan Zhang received his Ph.D. degree in software engineering from the School of Software, Dalian University of Technology, Dalian, in 2018. He is currently a lecturer at the College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing. His current research interests include mining software repositories and software data analytics.
  • Supported by:
    This work is partially supported by the National Key Research and Development Plan of China under Grant No. 2018YFB1003900, the National Natural Science Foundation of China under Grant No. 61902181, the China Postdoctoral Science Foundation under Grant No. 2020M671489, and the CCF-Tencent Open Research Fund under Grant No. RAGR20200106.

Application programming interface (API) libraries are extensively used by developers. To program correctly with APIs and avoid bugs, developers should pay attention to API directives, which describe the constraints of APIs. Unfortunately, API directives usually have diverse morphologies, which makes it time-consuming and error-prone for developers to discover all the relevant ones. In this paper, we propose an approach that leverages text classification to discover API directives from API specifications. Specifically, given a set of training sentences from API specifications, our approach first characterizes each sentence by three groups of features. Then, to deal with the imbalanced distribution of API directives and non-directives, our approach employs an under-sampling strategy to split the imbalanced training set into several subsets and trains several classifiers. Given a new sentence in an API specification, our approach combines the trained classifiers to predict whether it is an API directive. We evaluated our approach on a publicly available annotated API directive corpus. The experimental results show that our approach achieves an F-measure of up to 82.08%. In addition, our approach statistically outperforms the state-of-the-art approach by up to 29.67% in terms of F-measure.

Key words: application programming interface (API) directive; API specification; imbalanced learning; text classification
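
The under-sampling ensemble outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: plain lowercase words stand in for the three feature groups, a tiny Naive Bayes model serves as the base classifier, each subset pairs all directive sentences with an equally sized chunk of non-directive sentences, and the trained classifiers are combined by majority vote. All function names and example sentences are hypothetical.

```python
import math
import random
from collections import Counter


def tokenize(sentence):
    # Assumption: lowercase words stand in for the paper's three feature groups.
    return sentence.lower().split()


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative base learner)."""

    def fit(self, sentences, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.docs = Counter(labels)
        for sentence, label in zip(sentences, labels):
            self.counts[label].update(tokenize(sentence))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def predict(self, sentence):
        total_docs = sum(self.docs.values())
        scores = {}
        for label in (0, 1):
            total = sum(self.counts[label].values())
            score = math.log(self.docs[label] / total_docs)
            for word in tokenize(sentence):
                score += math.log((self.counts[label][word] + 1) / (total + len(self.vocab)))
            scores[label] = score
        return 1 if scores[1] > scores[0] else 0


def train_ensemble(directives, non_directives, seed=0):
    """Split the majority class into minority-sized chunks and train one model per chunk."""
    rng = random.Random(seed)
    shuffled = non_directives[:]
    rng.shuffle(shuffled)
    size = len(directives)
    models = []
    for start in range(0, len(shuffled), size):
        chunk = shuffled[start:start + size]
        sentences = directives + chunk
        labels = [1] * len(directives) + [0] * len(chunk)
        models.append(NaiveBayes().fit(sentences, labels))
    return models


def predict_ensemble(models, sentence):
    # Assumption: the classifiers are combined by a simple majority vote.
    votes = sum(model.predict(sentence) for model in models)
    return 1 if votes * 2 > len(models) else 0


# Hypothetical toy data: 3 directives (minority) and 9 non-directives (majority).
directives = [
    "callers must not pass null",
    "this method must be called before close",
    "never call this method twice",
]
non_directives = [
    "returns the number of elements",
    "returns the name of the file",
    "returns the hash code value",
    "returns the default value",
    "returns the size of the list",
    "returns a string representation of the object",
    "returns the first element of the list",
    "returns the current position",
    "returns the length of the array",
]

models = train_ensemble(directives, non_directives)
print(len(models), "classifiers trained")
print(predict_ensemble(models, "you must not call this method"))
print(predict_ensemble(models, "returns the index of the element"))
```

Because every classifier in this sketch sees a balanced subset, no single model is dominated by the non-directive class, while the ensemble as a whole still uses all of the majority-class sentences.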


ISSN 1000-9000(Print)

CN 11-2296/TP
