Journal of Computer Science and Technology
Journal of Computer Science and Technology, 2017, Vol. 32, Issue 1: 181-198. DOI: 10.1007/s11390-017-1713-3
Regular Paper
High-Impact Bug Report Identification with Imbalanced Learning Strategies
Xin-Li Yang1(杨昕立), David Lo2, Member, ACM, IEEE, Xin Xia1,*(夏鑫), Member, CCF, ACM, IEEE, Qiao Huang1(黄乔), and Jian-Ling Sun1(孙建伶), Member, CCF, ACM
1 College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China;
2 School of Information Systems, Singapore Management University, Singapore, Singapore

Abstract: In practice, some bugs have more impact than others and thus deserve more immediate attention. Due to tight schedules and limited human resources, developers may not have enough time to inspect all bugs. Thus, they often concentrate on bugs that are highly impactful. In the literature, high-impact bugs refer to bugs that appear at unexpected times or locations and bring more unexpected effects (i.e., surprise bugs), or that break pre-existing functionality and destroy the user experience (i.e., breakage bugs). Unfortunately, identifying high-impact bugs among the thousands of bug reports in a bug tracking system is not an easy feat. An automated technique that can identify high-impact bug reports can therefore help developers become aware of them early, rectify them quickly, and minimize the damage they cause. Because only a small proportion of bugs are high-impact bugs, identifying high-impact bug reports is a difficult task. In this paper, we propose an approach to identify high-impact bug reports by leveraging imbalanced learning strategies. We investigate the effectiveness of various variants, each of which combines one particular imbalanced learning strategy with one particular classification algorithm. In particular, we choose four widely used strategies for dealing with imbalanced data and four state-of-the-art text classification algorithms, and conduct experiments on four datasets from four different open source projects. We mainly perform an analytical study on two types of high-impact bugs, i.e., surprise bugs and breakage bugs. The results show that different variants perform differently, and that the best performing variants, SMOTE (synthetic minority over-sampling technique)+KNN (K-nearest neighbours) for surprise bug identification and RUS (random under-sampling)+NB (naive Bayes) for breakage bug identification, outperform the two state-of-the-art approaches by Thung et al. and Garcia and Shihab in terms of F1-score.
Keywords: high-impact bug, imbalanced learning, bug report identification
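The abstract describes variants that pair an imbalanced learning strategy with a classifier and compares them by F1-score. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation: this page does not give the paper's features, parameters, or tooling, so the function names (`random_under_sample`, `smote`, `knn_predict`, `f1_score`), the toy numeric vectors standing in for bug report features, and the defaults (`k`, `seed`) are all hypothetical. The sketch covers RUS, a simplified SMOTE-style interpolation, a plain KNN vote, and the F1 computation; naive Bayes is omitted for brevity.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_under_sample(X, y, minority=1, seed=0):
    """RUS: keep every minority sample and randomly drop majority
    samples until both classes have the same size."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    kept = rng.sample(majority_idx, min(len(minority_idx), len(majority_idx)))
    idx = sorted(minority_idx + kept)
    return [X[i] for i in idx], [y[i] for i in idx]


def smote(X, y, minority=1, k=2, seed=0):
    """Simplified SMOTE-style over-sampling (binary labels assumed):
    create synthetic minority points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(minority_pts)) - len(minority_pts)
    X_new, y_new = list(X), list(y)
    if len(minority_pts) < 2 or n_needed <= 0:
        return X_new, y_new
    for _ in range(n_needed):
        base = rng.choice(minority_pts)
        others = [p for p in minority_pts if p is not base]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        X_new.append([b + gap * (o - b) for b, o in zip(base, other)])
        y_new.append(minority)
    return X_new, y_new


def knn_predict(X_train, y_train, query, k=3):
    """Majority vote among the k training points closest to the query."""
    ranked = sorted(zip(X_train, y_train), key=lambda pair: sq_dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data (hypothetical): label 1 = high-impact report, 0 = ordinary report.
    X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
         [0.1, 0.0], [0.0, 0.3], [1.0, 1.0], [1.2, 0.9]]
    y = [0, 0, 0, 0, 0, 0, 1, 1]
    X_bal, y_bal = smote(X, y)  # the "SMOTE+KNN" style pairing
    test_X, test_y = [[1.1, 0.95], [0.1, 0.1]], [1, 0]
    preds = [knn_predict(X_bal, y_bal, q) for q in test_X]
    print("class counts after SMOTE:", Counter(y_bal))
    print("F1 on toy test set:", f1_score(test_y, preds))
```

Swapping `smote` for `random_under_sample` in the demo gives the under-sampling counterpart. In a real setting the vectors would come from a text representation of the bug reports, and the resampling would be applied to the training split only, so that the evaluation data keeps its original class distribution.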
Received 2016-03-19;
Funding:

A preliminary version of the paper was published in the Proceedings of COMPSAC 2016. This work is supported by the National Natural Science Foundation of China under Grant Nos. 61602403 and 61402406 and the National Key Technology Research and Development Program of the Ministry of Science and Technology of China under Grant No. 2015BAH17F01.

Corresponding author: Xin Xia (Email: xxia@zju.edu.cn)
About the author: Xin Xia received his Ph.D. degree in computer science from the College of Computer Science and Technology, Zhejiang University, Hangzhou, in 2014. He is currently a research assistant professor in the College of Computer Science and Technology at Zhejiang University, Hangzhou. His research interests include software analytics, empirical studies, and mining software repositories.
Cite this article:
Xin-Li Yang, David Lo, Xin Xia, Qiao Huang, Jian-Ling Sun. High-Impact Bug Report Identification with Imbalanced Learning Strategies [J]. Journal of Computer Science and Technology, 2017, 32(1): 181-198
Link to this article:
http://jcst.ict.ac.cn:8080/jcst/CN/10.1007/s11390-017-1713-3