Journal of Computer Science and Technology, 2021, Vol. 36, Issue (4): 922-943. doi: 10.1007/s11390-021-0235-1

Special Issue: Artificial Intelligence and Pattern Recognition; Software Systems

Regular Paper

Discovering API Directives from API Specifications with Text Classification

Jing-Xuan Zhang1,2,3, Member, CCF, ACM, Chuan-Qi Tao1,2, Member, CCF, ACM, Zhi-Qiu Huang1,2, Member, CCF, and Xin Chen3,4, Member, CCF

    1 College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China;
    2 Key Laboratory of Safety-Critical Software (Nanjing University of Aeronautics and Astronautics), Ministry of Industry and Information Technology, Nanjing 210016, China;
    3 Key Laboratory of Complex Systems Modeling and Simulation (Hangzhou Dianzi University), Ministry of Education, Hangzhou 310018, China;
    4 School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
  • Received: 2019-12-18; Revised: 2021-06-09; Online: 2021-07-05; Published: 2021-07-30
  • About author: Jing-Xuan Zhang received his Ph.D. degree in software engineering from the School of Software, Dalian University of Technology, Dalian, in 2018. He is currently a lecturer at the College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing. His current research interests include mining software repositories and software data analytics.
  • Supported by:
    This work is partially supported by the National Key Research and Development Plan of China under Grant No. 2018YFB1003900, the National Natural Science Foundation of China under Grant No. 61902181, the China Postdoctoral Science Foundation under Grant No. 2020M671489, and the CCF-Tencent Open Research Fund under Grant No. RAGR20200106.

Application programming interface (API) libraries are extensively used by developers. To program correctly with APIs and avoid bugs, developers should pay attention to API directives, which describe the constraints of APIs. Unfortunately, API directives usually have diverse morphologies, making it time-consuming and error-prone for developers to discover all the relevant API directives. In this paper, we propose an approach that leverages text classification to discover API directives from API specifications. Specifically, given a set of training sentences from API specifications, our approach first characterizes each sentence by three groups of features. Then, to deal with the imbalance between API directives and non-directives, our approach employs an under-sampling strategy to split the imbalanced training set into several subsets and trains one classifier on each subset. Given a new sentence in an API specification, our approach combines the trained classifiers to predict whether it is an API directive. We have evaluated our approach on a publicly available annotated API directive corpus. The experimental results show that our approach achieves an F-measure of up to 82.08%. In addition, our approach statistically outperforms the state-of-the-art approach by up to 29.67% in terms of F-measure.

Key words: Application programming interface (API) directive; API specification; imbalanced learning; text classification
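
The under-sampling and ensemble steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that directives (label 1) are the minority class, substitutes a plain bag-of-words naive Bayes learner for the paper's three feature groups and classifiers, and assumes a simple majority vote as the way the trained classifiers are combined. All names (`NaiveBayes`, `undersample_subsets`, `train_ensemble`, `predict_directive`) and the toy sentences are hypothetical.

```python
import math
import random
from collections import Counter


def tokenize(sentence):
    # Lowercased whitespace tokens; a stand-in for the paper's feature groups.
    return sentence.lower().split()


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (assumed base learner)."""

    def fit(self, sentences, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.class_counts = Counter(labels)
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def predict(self, sentence):
        total = sum(self.class_counts.values())
        scores = {}
        for label in (0, 1):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_counts[label] / total)
            for word in tokenize(sentence):
                if word in self.vocab:  # words unseen in training are ignored
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return 1 if scores[1] > scores[0] else 0


def undersample_subsets(sentences, labels, n_subsets, seed=0):
    # Split the majority class (label 0) into n_subsets chunks and pair
    # each chunk with every minority sentence (label 1).
    minority = [s for s, y in zip(sentences, labels) if y == 1]
    majority = [s for s, y in zip(sentences, labels) if y == 0]
    random.Random(seed).shuffle(majority)
    n_subsets = max(1, min(n_subsets, len(majority)))
    subsets = []
    for i in range(n_subsets):
        chunk = majority[i::n_subsets]
        subsets.append((minority + chunk, [1] * len(minority) + [0] * len(chunk)))
    return subsets


def train_ensemble(sentences, labels, n_subsets=3):
    # One classifier per balanced subset.
    return [NaiveBayes().fit(x, y)
            for x, y in undersample_subsets(sentences, labels, n_subsets)]


def predict_directive(ensemble, sentence):
    # Assumed combination rule: majority vote; ties count as non-directive.
    votes = sum(clf.predict(sentence) for clf in ensemble)
    return votes * 2 > len(ensemble)


# Toy, made-up data: 2 directives versus 6 non-directives.
sentences = [
    "callers must not pass null",
    "this method must be called before close",
    "returns the name of the file",
    "returns the size of the list",
    "gets the current value",
    "creates a new instance",
    "returns the index of the element",
    "gets the default locale",
]
labels = [1, 1, 0, 0, 0, 0, 0, 0]
ensemble = train_ensemble(sentences, labels, n_subsets=3)
print(predict_directive(ensemble, "you must not call this twice"))
```

Each classifier sees all directive sentences but only a slice of the non-directives, so no single model is dominated by the majority class, while the ensemble as a whole still uses every training sentence.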

