Predicting Code Smells and Analysis of Predictions: Using Machine Learning Techniques and Software Metrics

doi:10.1007/s11390-020-0323-7

Abstract

Code smell detection is essential for improving software quality: it enhances software maintainability and decreases the risk of faults and failures in the software system. In this paper, we propose a code smell prediction approach based on machine learning techniques and software metrics. The local interpretable model-agnostic explanations (LIME) algorithm is further used to explain the machine learning models' predictions and make them interpretable. The datasets obtained from Fontana et al. are restructured to build binary-label and multi-label datasets. The results of 10-fold cross-validation show that tree-based algorithms (mainly Random Forest) outperform kernel-based and network-based algorithms. Feature selection based on a genetic algorithm improves the accuracy of these machine learning algorithms by selecting the most relevant features in each dataset. Moreover, parameter optimization based on grid search significantly improves the accuracy of all these algorithms. Overall, machine learning techniques show high potential for predicting code smells, which helps detect these smells and improve software quality.
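The abstract names genetic-algorithm feature selection scored by 10-fold cross-validation. The following is a minimal, dependency-free sketch of that idea under loudly stated assumptions: the data are synthetic (not the Fontana et al. datasets), the classifier is a nearest-centroid model swapped in for Random Forest purely to keep the example self-contained, and the GA operators (tournament selection, uniform crossover, bit-flip mutation, elitism) and every function name are hypothetical illustrative choices, not the paper's implementation. Grid-search parameter tuning is not shown.

```python
import random


def make_toy_dataset(n=200, n_features=6, seed=0):
    """Hypothetical synthetic stand-in for a software-metrics dataset.

    This is NOT the Fontana et al. data. Only features 0 and 1 carry signal:
    a sample is labelled smelly (1) when their sum exceeds 1.0; the other
    features are pure noise that a good feature selector should drop.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.random() for _ in range(n_features)]
        X.append(row)
        y.append(1 if row[0] + row[1] > 1.0 else 0)
    return X, y


def cv_accuracy(X, y, mask, k=10):
    """k-fold cross-validated accuracy of a nearest-centroid classifier
    (an assumed stand-in for Random Forest) using only the features set in mask."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        # Mean of the selected features for each class in the training split.
        centroids = {}
        for label in sorted(set(y[i] for i in train)):
            members = [i for i in train if y[i] == label]
            centroids[label] = [sum(X[i][j] for i in members) / len(members) for j in cols]
        for i in test:
            point = [X[i][j] for j in cols]
            # Predict the class whose centroid is closest (squared Euclidean distance).
            pred = min(
                centroids,
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
            )
            correct += int(pred == y[i])
    # Every sample lands in exactly one test fold, so divide by the dataset size.
    return correct / len(X)


def ga_feature_selection(X, y, pop_size=20, generations=15, p_mut=0.1, seed=1):
    """Hypothetical GA over feature bitmasks; fitness = 10-fold CV accuracy."""
    rng = random.Random(seed)
    n_features = len(X[0])
    # Seed the population with the all-features mask so the result can only match or beat it.
    population = [[1] * n_features]
    population += [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size - 1)]
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = cv_accuracy(X, y, mask)
        return cache[key]

    def tournament():
        # Binary tournament: pick two masks at random, keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(population, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]  # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        population = children
        best = max(population + [best], key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    X, y = make_toy_dataset()
    full = cv_accuracy(X, y, [1] * len(X[0]))
    mask, acc = ga_feature_selection(X, y)
    print(f"all features: {full:.3f}   GA subset {mask}: {acc:.3f}")
```

Seeding the initial population with the all-features mask and keeping the best individual across generations means the selected subset never scores below using every metric; on this toy data the search is expected to favour the two informative features, though the exact mask depends on the seeds.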
