2016, Vol. 31, Issue (5): 883-909. doi: 10.1007/s11390-016-1671-1

Special Issue: Surveys; Software Systems

• Special Section on Software Systems 2016 •

Summarizing Software Artifacts: A Literature Review

Najam Nazar1, Yan Hu2, Member, CCF, ACM, and He Jiang1,2*, Member, CCF, ACM   

  1 Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, School of Software, Dalian University of Technology, Dalian 116621, China;
  2 State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, China
  • Received: 2015-11-20 Revised: 2016-07-30 Online: 2016-09-05 Published: 2016-09-05
  • Contact: He Jiang, E-mail: jianghe@dlut.edu.cn
  • About author: Najam Nazar received his B.Sc. (Hons.) degree in computer science from the University of the Punjab, Lahore, Pakistan, in 2005, and his M.S. degree in software engineering from Chalmers University of Technology, Sweden, in 2010. He is currently working towards his Ph.D. degree in software engineering at Dalian University of Technology, Dalian. His current research interests include mining software repositories, data mining, natural language processing, and machine learning.
  • Supported by:

    This work was supported in part by the National Basic Research 973 Program of China under Grant No. 2013CB035906, the Fundamental Research Funds for the Central Universities of China under Grant No. DUT13RC(3)53, and in part by the New Century Excellent Talents in University of China under Grant No. NCET-13-0073 and the National Natural Science Foundation of China under Grant No. 61300017.

This paper presents a literature review in the field of summarizing software artifacts, focusing on bug reports, source code, mailing lists, and developer discussions. From Jan. 2010 to Apr. 2016, numerous summarization techniques, approaches, and tools were proposed to meet the ongoing demand for improving software performance and quality and for helping developers understand the problems at hand. Since the aforementioned artifacts contain both structured and unstructured data, researchers have applied different machine learning and data mining techniques to generate summaries. This paper therefore first provides a general perspective on the state of the art, describing the types of artifacts, the approaches for summarization, and the common portions of experimental procedures shared among these artifacts. We then discuss the applications of summarization, i.e., what tasks have been accomplished through summarization. Next, we present tools that were developed for, or employed during, summarization tasks. In addition, we present the different summarization evaluation methods employed in the selected studies, as well as other important factors used for evaluating generated summaries, such as adequacy and quality. We also briefly present modern communication channels, along with the complementarities and commonalities among different software artifacts. Finally, we discuss challenges applicable to the existing studies in general and future research directions. This survey of existing studies will give future researchers broad and useful background knowledge on the main aspects of this research field.

ISSN 1000-9000 (Print), 1860-4749 (Online)
CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 