
A Method for Recommending Code Change Descriptions Based on Historical Commit Messages

Learning Human-Written Commit Messages to Document Code Changes

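  • Summary: 1. Context: During software maintenance, understanding code changes often takes up a large share of developers' time. Code changes are usually stored in version control systems as commits, and a commit message describes what the change involves, which helps developers understand it. Studies have shown that many commits currently stored in version control systems lack messages. Researchers have therefore proposed various generation approaches to fill in missing commit messages automatically. However, most current approaches cannot generate ideal commit messages: the messages they produce describe only the superficial content of a code change, not the reason behind it.
    2. Objective: Generate commit messages that describe the reason for a code change, thereby helping developers understand code changes more efficiently.
    3. Method: Commit messages hand-written by developers in version control systems often describe the reason for a code change, so the current code change can directly reuse an existing commit message. Specifically, by analyzing the syntactic and semantic similarity between the current code change and the code of commits in version control systems, the messages of similar commits are recommended for the current change. To make message recommendation more efficient, more than 500,000 commits were downloaded from version control systems and stored in a local database. Syntactic information is obtained from abstract syntax trees, semantic information is obtained using word embeddings, and the similarity between code fragments is computed using code clone detection.
    4. Results & Findings: The experimental results show that, among the recommended messages, 21.5% can be reused directly by the current code change, and 62.8% can be reused after adjustment. The reasons why recommended messages cannot be reused directly are also analyzed. In addition, a comparison with existing approaches shows that the commit messages produced by the proposed approach perform better in conciseness, accuracy, and expressiveness.
    5. Conclusions: The results show that, by reusing the messages of existing commits, messages that describe the reason for a change can be recommended for the current code change. Future work needs to further improve the accuracy of the recommended messages and increase the proportion of commit messages that can be reused directly.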

     

    Abstract: Commit messages are important complementary information for understanding code changes. To address message scarcity, several approaches have been proposed for automatically generating commit messages. However, most of these approaches focus on summarizing the changed software entities at a superficial level, without considering the intent behind the code changes (e.g., the existing approaches cannot generate a message such as “fixing null pointer exception”). Because developers often describe the intent behind a code change when writing messages, we propose ChangeDoc, an approach that reuses existing messages in version control systems for automatic commit message generation. Our approach considers syntactic, semantic, pre-syntactic, and pre-semantic similarities. For a given commit without a message, it discovers the most similar past commit in a large commit repository and recommends that commit's message as the message of the given commit. Our repository contains half a million commits collected from SourceForge. We evaluate our approach on commits from 10 projects. The results show that 21.5% of the messages recommended by ChangeDoc can be used directly without modification, and 62.8% require minor modifications. To evaluate the quality of the commit messages recommended by ChangeDoc, we performed two empirical studies involving a total of 40 participants (10 professional developers and 30 students). The results indicate that the recommended messages are very good approximations of the ones written by developers and often include important intent information that is missing from the messages generated by other tools.
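
    The retrieval idea described above (find the most similar past commit and reuse its message) can be illustrated with a small sketch. This is not ChangeDoc's implementation: it substitutes a plain bag-of-tokens cosine similarity for the syntactic, semantic, pre-syntactic, and pre-semantic similarities named in the abstract, and the function names and the two-entry repository are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(change_text):
    """Split a code change into lowercase identifier and keyword tokens."""
    return [t.lower() for t in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", change_text)]


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_message(change_text, repository):
    """Return (message, score) of the most similar past commit.

    `repository` is a list of (past_change_text, message) pairs.
    Returns (None, 0.0) when nothing in the repository overlaps.
    """
    query = Counter(tokenize(change_text))
    best_message, best_score = None, 0.0
    for past_change, message in repository:
        score = cosine(query, Counter(tokenize(past_change)))
        if score > best_score:
            best_message, best_score = message, score
    return best_message, best_score


# Hypothetical toy repository; the one in the paper holds about half a million commits.
repository = [
    ("if (user != null) { user.save(); }",
     "Fix null pointer exception when saving user"),
    ("for (int i = 0; i < n; i++) { total += prices[i]; }",
     "Compute order total"),
]

message, score = recommend_message(
    "if (account != null) { account.save(); }", repository
)
print(message, round(score, 2))
```

    Token overlap only captures surface similarity, which is why the recommended message may still need the kind of minor adjustment reported for 62.8% of the recommendations (here, "user" would have to become "account").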

     
