Code Review Analysis for the Pull-Request-Based Development Model

doi:10.1007/s11390-017-1783-2

What Are They Talking About? Analyzing Code Reviews in Pull-Based Development Model

Abstract

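Abstract: On GitHub, code review under the pull-based model is crowdsourced and open to all community members. Reviewers discuss not only how to improve code quality but also concerns such as project evolution and social interaction. A thorough understanding of review concerns in the pull-based model helps organize the code review process and optimize review tasks (e.g., reviewer recommendation and review prioritization). In this paper, we first conduct a qualitative study of three popular open-source software projects on GitHub and build a fine-grained two-level taxonomy. The taxonomy covers 4 level-1 categories (code improvement, contribution value decision-making, project management, and social interaction) and 11 level-2 subcategories (e.g., defect detection, reviewer assignment, and contribution encouragement). Second, we label a large set of review comments with an automated two-stage hybrid classification model, which classifies review comments by combining rule-based techniques with machine-learning algorithms. Based on this labeled dataset, we perform a quantitative analysis of typical review patterns in the pull-based model. We find that the three studied projects have similar comment distributions across categories; contributions from inexperienced contributors may contain potential issues even after passing tests; and external contributors are more likely to violate project conventions in their early contributions.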

Abstract: In the pull-based model on GitHub, code reviews are open to all community users. Various participants take part in review discussions, and the topics cover not only the improvement of code contributions but also project evolution and social interaction. A comprehensive understanding of review topics in the pull-based model would help better organize the code review process and optimize review tasks such as reviewer recommendation and pull-request prioritization. In this paper, we first conduct a qualitative study of three popular open-source software projects hosted on GitHub and construct a fine-grained two-level taxonomy covering four level-1 categories (code correctness, pull-request decision-making, project management, and social interaction) and 11 level-2 subcategories (e.g., defect detecting, reviewer assigning, contribution encouraging). Second, we conduct a preliminary quantitative analysis of a large set of review comments labeled by TSHC (a two-stage hybrid classification algorithm), which automatically classifies review comments by combining rule-based and machine-learning techniques. Through this quantitative study, we explore typical review patterns. We find that the three projects show similar comment distributions across subcategories. Pull-requests submitted by inexperienced contributors tend to contain potential issues even though they have passed the tests. Furthermore, external contributors are more likely to break project conventions in their early contributions.
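The abstract says only that TSHC combines rule-based and machine-learning techniques in two stages; it does not give the rules, features, or learner. The sketch below is a hypothetical illustration of that general idea, not the paper's TSHC. Stage one applies hand-written regex rules (the patterns in `RULES` are invented for illustration), and any comment no rule matches falls through to a small multinomial Naive Bayes classifier (a swapped-in learner chosen because it fits in a few lines of standard-library Python). The labels reuse three level-2 subcategory names from the abstract plus a placeholder `other`, and the toy training data is made up. The helper `distribution` computes each subcategory's share of comments, the kind of statistic the study compares across its three projects.

```python
import math
import re
from collections import Counter, defaultdict

# Stage 1: hypothetical high-precision rules (illustrative only, NOT TSHC's rules).
# The first matching rule wins, so order expresses priority.
RULES = [
    ("reviewer assigning", re.compile(r"\bcc\s+@\w+|\bcan you review\b", re.I)),
    ("contribution encouraging", re.compile(r"\b(?:thanks|thank you|nice work|great work)\b", re.I)),
]


def tokenize(text):
    """Lower-case word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (stage-2 fallback)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods of each token
            score = math.log(count / n_docs)
            for tok in tokenize(text):
                score += math.log((self.word_counts[label][tok] + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def classify(comment, model):
    """Two-stage labeling: try the rules first, then fall back to the model."""
    for label, pattern in RULES:
        if pattern.search(comment):
            return label, "rule"
    return model.predict(comment), "model"


def distribution(labels):
    """Share of comments per subcategory."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Toy, made-up training data for the fallback model; "other" is a placeholder label.
train_texts = [
    "this will crash when the list is empty",
    "possible null pointer here",
    "this loop has an off by one bug",
    "please squash your commits before merging",
    "could you update the changelog",
    "please rebase on the latest master",
]
train_labels = ["defect detecting"] * 3 + ["other"] * 3
model = NaiveBayes().fit(train_texts, train_labels)

if __name__ == "__main__":
    comments = [
        "Thanks, nice work!",
        "cc @alice for a second look",
        "this might crash on empty input",
        "please rebase onto master",
    ]
    labels = [classify(c, model)[0] for c in comments]
    print(labels)
    print(distribution(labels))
```

Putting the rules first favors precision on comments with obvious surface cues (a `cc @user` mention, a thank-you), while the learned model handles the long tail of free-form review text that no rule anticipates.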
