Journal of Computer Science and Technology
Indexed by   SCIE, EI ...
Bimonthly    Since 1986
Journal of Computer Science and Technology 2017, Vol. 32 Issue (6) :1090-1107    DOI: 10.1007/s11390-017-1785-0
Special Section on Software Systems 2017
A Cluster Based Feature Selection Method for Cross-Project Software Defect Prediction
Chao Ni¹ (Student Member, IEEE); Wang-Shu Liu¹; Xiang Chen¹,² (Senior Member, CCF); Qing Gu² (Senior Member, CCF); Dao-Xu Chen¹ (Fellow, CCF; Member, ACM, IEEE); Qi-Guo Huang¹ (Member, CCF)
¹ State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
² School of Computer Science and Technology, Nantong University, Nantong 226019, China

Abstract: Cross-project defect prediction (CPDP) uses labeled data from external source software projects to compensate for the shortage of useful data in the target project, in order to build a meaningful classification model. However, the distribution gap between the software features extracted from the source and target projects may be so large that the mixed data are of little use for training. In this paper, we propose a novel cluster-based method, FeSCH (Feature Selection Using Clusters of Hybrid-Data), which alleviates the distribution differences through feature selection. FeSCH has two phases. The feature clustering phase groups features with a density-based clustering method, and the feature selection phase picks features from each cluster using a ranking strategy. For CPDP, we design three different heuristic ranking strategies for the second phase. To investigate the prediction performance of FeSCH, we design experiments based on real-world software projects and study the effects of the design options in FeSCH (ranking strategy, feature selection ratio, and classifier). The experimental results demonstrate the effectiveness of FeSCH. First, FeSCH outperforms the state-of-the-art baseline methods, and its performance is less affected by the classifier used. Second, FeSCH improves performance by effectively selecting features across feature categories, and it provides guidelines for selecting useful features for defect prediction.
Keywords: software defect prediction; cross-project defect prediction; feature selection; feature clustering; density-based clustering
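
The two phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the distance measure, the clustering details, or the three ranking strategies, so everything below is an assumption made for illustration. The sketch clusters feature columns of the hybrid (source plus target) data with a simplified density-peak-style procedure over the distance 1 - |Pearson correlation|, then keeps the top fraction of each cluster ranked by absolute correlation with the source labels. The names `pearson`, `cluster_features`, `select_features`, `cutoff`, and `ratio` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def cluster_features(columns, n_clusters, cutoff):
    """Phase 1 (assumed form): density-peak-style clustering of feature columns."""
    m = len(columns)
    # Assumed distance between two features: 1 - |correlation|.
    dist = [[1.0 - abs(pearson(columns[i], columns[j])) for j in range(m)]
            for i in range(m)]
    # Local density: number of other features closer than the cutoff.
    rho = [sum(1 for j in range(m) if j != i and dist[i][j] < cutoff)
           for i in range(m)]
    # Visit features from densest to sparsest (ties broken by index).
    order = sorted(range(m), key=lambda i: (-rho[i], i))
    parent, delta = [-1] * m, [0.0] * m
    delta[order[0]] = max(dist[order[0]])
    for pos in range(1, m):
        i = order[pos]
        # Nearest feature among those visited earlier (higher or equal density).
        parent[i] = min(order[:pos], key=lambda j: dist[i][j])
        delta[i] = dist[i][parent[i]]
    # Centers: the densest feature plus the remaining ones with largest rho * delta.
    rest = sorted(order[1:], key=lambda i: -rho[i] * delta[i])
    centers = [order[0]] + rest[:n_clusters - 1]
    cluster_of = [-1] * m
    for cid, c in enumerate(centers):
        cluster_of[c] = cid
    # Every non-center inherits the cluster of its denser nearest neighbor.
    for i in order:
        if cluster_of[i] == -1:
            cluster_of[i] = cluster_of[parent[i]]
    return cluster_of

def select_features(source_cols, source_labels, target_cols,
                    n_clusters=2, cutoff=0.5, ratio=0.5):
    """Phase 2 (assumed ranking): keep the top `ratio` of each cluster."""
    # Hybrid data: each feature column holds source values followed by target values.
    hybrid = [s + t for s, t in zip(source_cols, target_cols)]
    cluster_of = cluster_features(hybrid, n_clusters, cutoff)
    # Assumed ranking score: |correlation| with the defect label on source data.
    score = [abs(pearson(col, source_labels)) for col in source_cols]
    selected = []
    for cid in range(n_clusters):
        members = [i for i, c in enumerate(cluster_of) if c == cid]
        keep = max(1, math.ceil(ratio * len(members)))
        members.sort(key=lambda i: -score[i])
        selected.extend(members[:keep])
    return cluster_of, sorted(selected)

# Toy data: features 0 and 1 move together, as do features 2 and 3.
src = [[1, 2, 3, 4], [2, 4, 6, 8], [1, -1, -1, 1], [2, -2, -2, 2]]
tgt = [[1, 2, 3, 4], [2, 4, 6, 8], [1, -1, -1, 1], [2, -2, -2, 2]]
labels = [0, 0, 1, 1]
cluster_of, selected = select_features(src, labels, tgt)
print(cluster_of, selected)
```

Selecting from every cluster, rather than ranking all features globally, keeps at least one representative of each group of mutually redundant features, which is the intuition the abstract gives for selecting features across feature categories.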
Received: 2017-04-21
Funding:

This work is supported in part by the National Natural Science Foundation of China under Grant Nos. 61373012, 91218302, 61321491 and 61202006, the Collaborative Innovation Center of Novel Software Technology and Industrialization, the Open Project of State Key Laboratory for Novel Software Technology at Nanjing University under Grant No. KFKT2016B18, and the National Basic Research 973 Program of China under Grant No. 2009CB320705.

Corresponding author: Qing Gu (Email: guq@nju.edu.cn)
About the author: Chao Ni received his B.S. degree in computer science from Nantong University, Nantong, in 2014, and his M.S. degree in computer science from Nanjing University, Nanjing, in 2017. He is now a Ph.D. candidate at the State Key Laboratory for Novel Software Technology and the Department of Computer Science and Technology, Nanjing University, Nanjing. His research interests are mainly in software defect prediction and machine learning.
Cite this article:
Chao Ni, Wang-Shu Liu, Xiang Chen, Qing Gu, Dao-Xu Chen, Qi-Guo Huang. A Cluster Based Feature Selection Method for Cross-Project Software Defect Prediction[J]. Journal of Computer Science and Technology, 2017, 32(6): 1090-1107
Link to this article:
http://jcst.ict.ac.cn:8080/jcst/CN/10.1007/s11390-017-1785-0
Copyright 2010 by Journal of Computer Science and Technology