Abstract Code reviews in the pull-based model are open to community users on GitHub. Various participants take part in the review discussions, and the review topics concern not only the improvement of code contributions but also project evolution and social interaction. A comprehensive understanding of the review topics in the pull-based model would be useful to better organize the code review process and optimize review tasks such as reviewer recommendation and pull-request prioritization. In this paper, we first conduct a qualitative study on three popular open-source software projects hosted on GitHub and construct a fine-grained two-level taxonomy covering four level-1 categories (code correctness, pull-request decision-making, project management, and social interaction) and 11 level-2 subcategories (e.g., defect detecting, reviewer assigning, contribution encouraging). Second, we conduct a preliminary quantitative analysis on a large set of review comments that were labeled by TSHC (a two-stage hybrid classification algorithm), which automatically classifies review comments by combining rule-based and machine-learning techniques. Through the quantitative study, we explore the typical review patterns. We find that the three projects present similar comment distributions across the subcategories. Pull-requests submitted by inexperienced contributors tend to contain potential issues even though they have passed the tests. Furthermore, external contributors are more likely to break project conventions in their early contributions.
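To make the idea of combining rule-based and machine-learning classification concrete, the following is a minimal sketch and not the TSHC algorithm itself. The abstract does not say how the two stages interact, so the cascade used here (rules first, a learned model as fallback) is an assumption, as are the keyword rules, the toy training comments, and the names `RULES`, `NaiveBayes`, and `classify`. Only the three subcategory labels come from the abstract.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical keyword rules; the actual TSHC rules are not given in the abstract.
RULES = {
    "defect detecting": re.compile(r"\b(bug|crash|fails?|broken)\b", re.I),
    "reviewer assigning": re.compile(r"@\w+"),
    "contribution encouraging": re.compile(r"\b(thanks|great work|nice)\b", re.I),
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (a stand-in learner)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def classify(comment, model):
    """Assumed cascade: try the rules first, then fall back to the learned model."""
    for label, pattern in RULES.items():
        if pattern.search(comment):
            return label, "rule"
    return model.predict(comment), "ml"


# Toy training data, invented for illustration only.
train_texts = [
    "null pointer when input is empty",
    "off by one in the loop",
    "could someone take a peek at this",
    "alice can you check this patch",
    "awesome patch, much appreciated",
    "good job on this one",
]
train_labels = [
    "defect detecting",
    "defect detecting",
    "reviewer assigning",
    "reviewer assigning",
    "contribution encouraging",
    "contribution encouraging",
]
model = NaiveBayes().fit(train_texts, train_labels)

if __name__ == "__main__":
    print(classify("This will crash when the list is empty", model))
    print(classify("could you take a peek", model))
```

In this sketch the first comment is decided by a rule and the second by the fallback model. A real system would need far more training data and carefully validated rules.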
This work was supported by the National Key Research and Development Program of China under Grant No. 2016YFB1000805 and the National Natural Science Foundation of China under Grant Nos. 61432020, 61303064, 61472430 and 61502512.
About author: Zhi-Xing Li is a Master's student in the College of Computer, National University of Defense Technology, Changsha. His research interests include open source software engineering, data mining, and knowledge discovery in open source software.
Cite this article:
Zhi-Xing Li, Yue Yu, Gang Yin, Tao Wang, Huai-Min Wang. What Are They Talking About? Analyzing Code Reviews in Pull-Based Development Model[J]. Journal of Computer Science and Technology, 2017, V32(6): 1060-1075
Copyright 2010 by Journal of Computer Science and Technology