Journal of Computer Science and Technology, 2021, Vol. 36, Issue (1): 191-206. doi: 10.1007/s11390-020-9935-1
Special topic: Software Systems
Zhi-Xing Li1, Yue Yu1,*, Member, CCF, ACM, Tao Wang1, Member, CCF, ACM, Gang Yin1, Member, CCF, ACM, Xin-Jun Mao2, Member, CCF, ACM, and Huai-Min Wang1, Fellow, CCF, ACM
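In the distributed collaborative development of open source software, communication and coordination among developers have long been a closely studied research problem. As the state-of-the-art collaborative development mechanism, the pull-request-based development model gives open source developers a high degree of openness and transparency and makes their work more visible. However, because this model is parallel and lacks central coordination, multiple developers still submit duplicate pull-requests. If duplicate pull-requests are not detected in time, contributors and reviewers may waste time and effort on redundant review and update work. In this paper, we propose an approach that combines textual similarity and change similarity to detect duplicate pull-requests automatically. For a given pull-request, we first compute its textual similarity and its change similarity against historical pull-requests, then use a greedy search strategy to obtain a combined similarity, and finally return a list of the most similar pull-requests ranked by that combined similarity. Experimental results show that recall reaches 83.4% when the combined similarity is used, compared with 54.8% when only textual similarity is used and 78.2% when only change similarity is used.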
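The ranking pipeline described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: textual similarity is approximated by cosine similarity over bag-of-words term counts, change similarity by Jaccard overlap of changed file paths, and the two are combined with a fixed hypothetical weight `text_weight` rather than one found by the greedy search the abstract mentions. The names `rank_candidates`, `text_weight`, and `top_k`, and the dictionary layout of a pull-request, are likewise hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words term-count vectors (assumed text measure)."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def jaccard_similarity(files_a: set, files_b: set) -> float:
    """Jaccard overlap of changed file paths (assumed change measure)."""
    union = files_a | files_b
    return len(files_a & files_b) / len(union) if union else 0.0


def rank_candidates(new_pr: dict, history: list, text_weight: float = 0.5, top_k: int = 3) -> list:
    """Return the ids of the top_k historical pull-requests by combined similarity.

    text_weight is a hypothetical fixed weight; the paper derives its
    combination with a greedy search, which this sketch does not reproduce.
    """
    scored = []
    for pr in history:
        text_sim = cosine_similarity(new_pr["text"], pr["text"])
        change_sim = jaccard_similarity(new_pr["files"], pr["files"])
        combined = text_weight * text_sim + (1 - text_weight) * change_sim
        scored.append((combined, pr["id"]))
    # Highest combined similarity first; ties keep their original order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [pr_id for _, pr_id in scored[:top_k]]
```

A caller would pass the incoming pull-request and the list of earlier ones, for example `rank_candidates(new_pr, history, top_k=20)`, and show the returned ids to reviewers as duplicate candidates. Returning a ranked list rather than a yes/no decision matches the abstract's framing, where quality is measured by recall over the returned candidates.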