Journal of Computer Science and Technology, 2021, Vol. 36, Issue (4): 856-876. doi: 10.1007/s11390-020-0042-0

Special Issue: Software Systems

• Regular Paper •

An Empirical Comparison Between Tutorials and Crowd Documentation of Application Programming Interface

Yi-Xuan Tang1, Zhi-Lei Ren1,*, Member, CCF, ACM, He Jiang1,2,3, Distinguished Member, CCF, Member, ACM, IEEE, Xiao-Chen Li1, Member, CCF, and Wei-Qiang Kong1, Member, CCF

    1 School of Software, Dalian University of Technology, Dalian 116000, China;
    2 Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian 116000, China;
    3 School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100000, China
  • Received: 2019-09-18; Revised: 2020-05-17; Online: 2021-07-05; Published: 2021-07-30
  • Contact: Zhi-Lei Ren
  • About author: Yi-Xuan Tang received her B.S. degree in computer science and technology from Liaoning University, Shenyang, in 2015. She is currently a Ph.D. candidate at Dalian University of Technology, Dalian. Her current research interests include software data analytics and compiler testing.
  • Supported by:
    This work is supported by the National Key Research and Development Program of China under Grant No. 2018YFB1003900, the National Natural Science Foundation of China under Grant Nos. 61722202, 61772107 and 61572097, and the Fundamental Research Funds for the Central Universities of China under Grant No. DUT18JC08.

API (application programming interface) documentation is critical for developers to learn APIs. However, it is unclear whether API documentation actually improves API learnability for developers. In this paper, we focus on two types of API documentation: official API tutorials and API crowd documentation. First, we analyze API coverage and check API consistency in the documentation based on API traceability. Then, we conduct a survey and extract several characteristics to analyze which API documentation can help developers learn APIs. Our findings show that: 1) API crowd documentation can be regarded as a supplement to the official API tutorials to some extent; 2) the two types of API documentation differ greatly in which frequently used APIs they focus on, which may prevent developers from deeply understanding API usage through only one type of documentation; 3) official API tutorials can help developers seek API information on a long page, while API crowd documentation can provide long code examples for a particular programming task. These findings may help developers select suitable API documentation and find the information they need.

Key words: API documentation; empirical study; quantitative analysis
