Journal of Computer Science and Technology ›› 2019, Vol. 34 ›› Issue (5): 1063-1078. doi: 10.1007/s11390-019-1960-6

Special Topic: Software Systems


Threshold Extraction Framework for Software Metrics

Mohammed Alqmase, Mohammad Alshayeb*, Lahouari Ghouti   

  1. Information and Computer Science Department, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
  • Received: 2018-10-13 Revised: 2019-03-23 Online: 2019-08-31 Published: 2019-08-31
  • Contact: Mohammad Alshayeb, E-mail: alshayeb@kfupm.edu.sa
  • About author: Mohammed Alqmase received his M.S. degree in computer science from King Fahd University of Petroleum and Minerals, Dhahran, in 2019, and his B.S. degree in information technology (IT) from King Abdul-Aziz University, Jeddah, in 2013. He worked as a content management system analyst for Hippo CMS in 2017. He also worked as an instructor at Sana'a Community College, Sana'a, Yemen, from 2013 to 2015. His research interests include sentiment analysis, natural language processing, machine learning, algorithms, and software engineering.

Abstract: Software metrics are used to measure different attributes of software. To measure software attributes with these metrics in practice, metric thresholds are needed. Many researchers have attempted to identify these thresholds based on personal experience. However, the resulting experience-based thresholds cannot be generalized, owing to the variability in personal experience and the subjectivity of opinions. The goal of this paper is to propose an automated clustering framework based on the expectation maximization (EM) algorithm, where clusters are generated using a simplified 3-metric set (LOC, LCOM, and CBO). Given these clusters, different threshold levels for software metrics are systematically determined such that each threshold reflects a specific level of software quality. The proposed framework comprises two major steps: the clustering step, where the software quality historical dataset is decomposed into a fixed set of clusters using the EM algorithm, and the threshold extraction step, where thresholds specific to each software metric in the resulting clusters are estimated using statistical measures such as the mean (μ) and the standard deviation (σ) of each software metric in each cluster. The paper's findings highlight the capability of EM-based clustering, using a minimum metric set, to group software quality datasets according to different quality levels.

Keywords: metric threshold, expectation maximization, empirical study
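
The two steps described in the abstract (EM clustering on LOC, LCOM, and CBO, then per-cluster threshold extraction from μ and σ) can be illustrated with a minimal sketch. This is not the authors' implementation: the diagonal-covariance Gaussian mixture, the deterministic initialization, the sample data, and the choice of μ + σ as the threshold are all assumptions made for illustration, and the function names `em_cluster` and `extract_thresholds` are hypothetical.

```python
import math
from statistics import mean, pstdev

METRICS = ("LOC", "LCOM", "CBO")


def em_cluster(rows, k, iters=50):
    """Hard cluster labels from a diagonal-covariance Gaussian mixture fitted by EM."""
    n, d = len(rows), len(rows[0])
    # Deterministic start (an assumption): evenly spaced rows after sorting by the first metric.
    ordered = sorted(rows)
    means = [list(ordered[(2 * c + 1) * n // (2 * k)]) for c in range(k)]
    spread = [max(pstdev([r[j] for r in rows]) ** 2, 1e-6) for j in range(d)]
    variances = [spread[:] for _ in range(k)]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: responsibility of each component for each row (log-space for stability).
        resp = []
        for r in rows:
            logp = [
                math.log(weights[c])
                - 0.5 * sum(
                    math.log(2 * math.pi * variances[c][j])
                    + (r[j] - means[c][j]) ** 2 / variances[c][j]
                    for j in range(d)
                )
                for c in range(k)
            ]
            top = max(logp)
            unnorm = [math.exp(lp - top) for lp in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate weights, means, and variances from the responsibilities.
        for c in range(k):
            nc = sum(resp[i][c] for i in range(n)) + 1e-12
            weights[c] = nc / n
            for j in range(d):
                mu = sum(resp[i][c] * rows[i][j] for i in range(n)) / nc
                var = sum(resp[i][c] * (rows[i][j] - mu) ** 2 for i in range(n)) / nc
                means[c][j] = mu
                variances[c][j] = max(var, 1e-6)
    # Assign each row to its most responsible component.
    return [max(range(k), key=lambda c: resp[i][c]) for i in range(n)]


def extract_thresholds(rows, labels):
    """Per-cluster, per-metric threshold taken as mean + one standard deviation (an assumption)."""
    thresholds = {}
    for c in sorted(set(labels)):
        members = [r for r, lab in zip(rows, labels) if lab == c]
        thresholds[c] = {
            name: mean([r[j] for r in members]) + pstdev([r[j] for r in members])
            for j, name in enumerate(METRICS)
        }
    return thresholds


# Made-up (LOC, LCOM, CBO) values for ten classes; not data from the paper.
classes = [
    (40, 4, 2), (55, 6, 3), (60, 5, 4), (45, 3, 2), (50, 7, 3),
    (900, 80, 20), (1000, 95, 22), (850, 70, 18), (950, 85, 25), (1100, 90, 21),
]
labels = em_cluster(classes, k=2)
for cluster, levels in extract_thresholds(classes, labels).items():
    print(cluster, {m: round(v, 1) for m, v in levels.items()})
```

Each cluster yields its own threshold per metric, so the clusters can be read as quality levels once they are ordered by metric magnitude. In practice, metrics with very different scales would likely be normalized before clustering, and the number of clusters would be chosen to match the number of quality levels of interest.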
