Journal of Computer Science and Technology, 2020, Vol. 35, Issue (6): 1428-1445. doi: 10.1007/s11390-020-0323-7

Special Issue: Software Systems


Predicting Code Smells and Analysis of Predictions: Using Machine Learning Techniques and Software Metrics

Mohammad Y. Mhawish and Manjari Gupta        

1. Computer Science, Centre for Interdisciplinary Mathematical Sciences, Institute of Science, Banaras Hindu University, Varanasi 221005, India
Received: 2020-01-24; Revised: 2020-09-29; Online: 2020-11-20; Published: 2020-12-01
About author: Mohammad Y. Mhawish received his Ph.D. degree in computer science from DST-Centre for Interdisciplinary Mathematical Sciences, Banaras Hindu University, Varanasi, in 2019. His research interests mainly include software engineering, software security, and artificial intelligence.

Code smell detection is essential for improving software quality, enhancing software maintainability, and decreasing the risk of faults and failures in a software system. In this paper, we propose a code smell prediction approach based on machine learning techniques and software metrics. We further use the local interpretable model-agnostic explanations (LIME) algorithm to explain the machine learning models' predictions and improve their interpretability. The datasets obtained from Fontana et al. were reformulated and used to build binary-label and multi-label datasets. The results of 10-fold cross-validation show that tree-based algorithms (mainly Random Forest) outperform kernel-based and network-based algorithms. Feature selection methods based on a genetic algorithm improve the accuracy of these machine learning algorithms by selecting the most relevant features in each dataset. Moreover, parameter optimization based on the grid search algorithm significantly improves the accuracy of all these algorithms. Finally, machine learning techniques show high potential for predicting code smells, which can help detect these smells and enhance software quality.

Key words: code smell; code smell detection; feature selection; prediction explanation; parameter optimization
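The abstract reports 10-fold cross-validation results and grid-search parameter optimization but does not spell out the loop. The Python sketch below shows the general shape under stated assumptions: a toy k-nearest-neighbour classifier stands in for the paper's tree-, kernel-, and network-based learners, the neighbour count `k` is the only tuned hyperparameter, folds are plain (not stratified), and the two-feature data set and all function names are hypothetical.

```python
import math
import random
from collections import Counter

# Illustrative sketch only: k-NN is a stand-in learner and all names are hypothetical.

def knn_predict(train, x, k):
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def k_fold_indices(n, folds, seed=0):
    """Shuffle row indices once and deal them into `folds` disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::folds] for i in range(folds)]

def cv_accuracy(data, k, folds=10):
    """Mean accuracy of k-NN over plain `folds`-fold cross-validation."""
    scores = []
    for test_idx in k_fold_indices(len(data), folds):
        held_out = set(test_idx)
        train = [row for i, row in enumerate(data) if i not in held_out]
        correct = sum(knn_predict(train, data[i][0], k) == data[i][1] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)

def grid_search(data, grid, folds=10):
    """Evaluate every candidate k with cross-validation; return (best_k, best_score)."""
    results = {k: cv_accuracy(data, k, folds) for k in grid}
    best_k = max(results, key=results.get)
    return best_k, results[best_k]

# Hypothetical data: two scaled metrics per class (e.g. size, complexity); label 1 = smelly.
rng = random.Random(42)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(50)]
data += [((rng.gauss(4, 1), rng.gauss(4, 1)), 1) for _ in range(50)]

best_k, best_acc = grid_search(data, grid=[1, 3, 5, 7, 9])
print(best_k, round(best_acc, 3))
```

Because the same folds both select `k` and score it, the accuracy returned by `grid_search` is optimistically biased; wrapping the search in an outer cross-validation loop (nested cross-validation) gives an unbiased estimate.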
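The abstract states that genetic-algorithm-based feature selection picks the most relevant metrics in each dataset, but it does not give the GA's encoding or settings. A minimal Python sketch, assuming bit-mask chromosomes (1 = keep the metric), binary tournament selection, single-point crossover, bit-flip mutation, and elitism; the population size, rates, function names, and the toy fitness function are hypothetical choices, not the paper's.

```python
import random

# Illustrative GA sketch; parameters and names are hypothetical, not the paper's settings.

def tournament(pop, fitness, rng):
    """Binary tournament: return the fitter of two randomly drawn masks."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def ga_feature_selection(n_features, fitness, pop_size=20, generations=30,
                         crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Evolve bit masks (1 = keep the metric) that maximise fitness(mask)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask so far always survives
        while len(children) < pop_size:
            p1 = tournament(pop, fitness, rng)
            p2 = tournament(pop, fitness, rng)
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # bit-flip mutation, applied independently to each gene
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best

# Hypothetical fitness: metrics 0, 2 and 5 are "relevant" (+1.0 each when kept);
# every other kept metric costs 0.2. In practice the fitness would be the
# cross-validated accuracy of a classifier trained on the selected metrics.
RELEVANT = {0, 2, 5}

def toy_fitness(mask):
    return sum(1.0 if i in RELEVANT else -0.2 for i, bit in enumerate(mask) if bit)

best_mask = ga_feature_selection(8, toy_fitness)
print(best_mask, toy_fitness(best_mask))
```

Elitism makes the best fitness non-decreasing across generations, so a run can only match or improve on the best subset in its initial population.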
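LIME, which the abstract uses to explain individual predictions, samples perturbed neighbours of one instance, weights them by proximity to it, and fits a simple linear model to the black-box outputs; the surrogate's coefficients act as the local explanation. The Python sketch below is a simplified, standard-library illustration of that idea, not the `lime` package's API: it omits the interpretable (binary) representation and sparse feature selection described by Ribeiro et al. [29], and the Gaussian perturbation, exponential kernel, function names, and `smell_score` model are all assumptions.

```python
import math
import random

# Simplified LIME-style sketch; names and the black-box model are hypothetical.

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def explain_locally(predict, x, n_samples=500, scale=1.0, kernel_width=1.0, seed=0):
    """Fit a distance-weighted linear surrogate of `predict` around instance x.

    Returns one coefficient per feature; the intercept is dropped."""
    rng = random.Random(seed)
    d = len(x)
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]  # accumulates X^T W X
    xtwy = [0.0] * (d + 1)                          # accumulates X^T W y
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, scale) for xi in x]              # perturbed neighbour
        w = math.exp(-math.dist(z, x) ** 2 / kernel_width ** 2)   # proximity weight
        row = [1.0] + z                                           # 1.0 = intercept column
        y = predict(z)
        for i in range(d + 1):
            xtwy[i] += w * row[i] * y
            for j in range(d + 1):
                xtwx[i][j] += w * row[i] * row[j]
    return solve(xtwx, xtwy)[1:]

# Hypothetical black-box "smell score" over two scaled metrics (e.g. size, cohesion).
def smell_score(z):
    return 0.8 * z[0] - 0.5 * z[1] + 0.1

weights = explain_locally(smell_score, [2.0, 1.0])
print([round(w, 3) for w in weights])
```

For a model that is already linear, as here, the surrogate recovers its weights up to rounding; for a non-linear classifier the coefficients describe only the model's behaviour near `x`, which is what makes the explanation local.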

[1] Wiegers K, Beatty J. Software Requirements. Pearson Education, 2013.
[2] Chung L, do Prado Leite J C S. On non-functional requirements in software engineering. In Conceptual Modeling: Foundations and Applications: Essays in Honor of John Mylopoulos, Borgida A T, Chaudhri V, Giorgini P, Yu E (eds.), Springer, 2009, pp.363-379.
[3] Fowler M, Beck K, Brant J, Opdyke W, Roberts D. Refactoring: Improving the Design of Existing Code (1st edition). Addison-Wesley Professional, 1999.
[4] Yamashita A, Moonen L. Exploring the impact of intersmell relations on software maintainability: An empirical study. In Proc. the 35th Int. Conf. Softw. Eng., May 2013, pp.682-691.
[5] Yamashita A, Counsell S. Code smells as system-level indicators of maintainability:An empirical study. J. Syst. Softw., 2013, 86(10):2639-2653.
[6] Yamashita A, Moonen L. Do code smells reflect important maintainability aspects? In Proc. the 28th IEEE Int. Conf. Softw. Maintenance, September 2012, pp.306-315.
[7] Sjøberg D I K, Yamashita A, Anda B C D, Mockus A, Dybå T. Quantifying the effect of code smells on maintenance effort. IEEE Trans. Softw. Eng., 2013, 39(8):1144-1156.
[8] Sahin D, Kessentini M, Bechikh S, Deb K. Code-smell detection as a bilevel problem. ACM Trans. Softw. Eng. Methodol., 2014, 24(1):Article No. 6.
[9] Olbrich S, Cruzes D S, Basili V, Zazworka N. The evolution and impact of code smells: A case study of two open source systems. In Proc. the 3rd International Symposium on Empirical Software Engineering and Measurement, October 2009, pp.390-400.
[10] Olbrich S M, Cruzes D S, Sjøberg D I K. Are all code smells harmful? A study of God Classes and Brain Classes in the evolution of three open source systems. In Proc. the 26th IEEE Int. Conf. Softw. Maintenance, September 2010.
[11] Khomh F, Di Penta M, Guéhéneuc Y G. An exploratory study of the impact of code smells on software change-proneness. In Proc. the 16th Working Conference on Reverse Engineering, October 2009, pp.75-84.
[12] Deligiannis I, Stamelos I, Angelis L, Roumeliotis M, Shepperd M. A controlled experiment investigation of an object-oriented design heuristic for maintainability. J. Syst. Softw., 2004, 72(2):129-143.
[13] Pérez-Castillo R, Piattini M. Analyzing the harmful effect of god class refactoring on power consumption. IEEE Softw., 2014, 31(3):48-54.
[14] Li W, Shatnawi R. An empirical study of the bad smells and class error probability in the post-release object-oriented system evolution. J. Syst. Softw., 2007, 80(7):1120-1128.
[15] Ciupke O. Automatic detection of design problems in object-oriented reengineering. In Proc. the 30th International Conference on Technology of Object-Oriented Languages and Systems, Delivering Quality Software, August 1999, pp.18-32.
[16] Travassos G, Shull F, Fredericks M, Basili V R. Detecting defects in object-oriented designs: Using reading techniques to increase software quality. ACM SIGPLAN Notices, 1999, 34(10):47-56.
[17] Dashofy E M, van der Hoek A, Taylor R N. A comprehensive approach for the development of modular software architecture description languages. ACM Trans. Softw. Eng. Methodol., 2005, 14(2):199-245.
[18] Vidal S, Vázquez H, Díaz-Pace J A, Marcos C, Garcia A, Oizumi W. JSpIRIT: A flexible tool for the analysis of code smells. In Proc. the 34th Int. Conf. Chil. Comput. Sci. Soc., November 2016.
[19] Marinescu R. Measurement and quality in object-oriented design. In Proc. the 21st IEEE Int. Conf. Softw. Maintenance, September 2005, pp.701-704.
[20] Moha N, Guéhéneuc Y G, Duchien L, Le Meur A F. DECOR: A method for the specification and detection of code and design smells. IEEE Trans. Softw. Eng., 2010, 36(1):20-36.
[21] Fontana F A, Zanoni M, Marino A, Mäntylä M V. Code smell detection: Towards a machine learning-based approach. In Proc. the 2013 IEEE Int. Conf. Softw. Maintenance, September 2013, pp.396-399.
[22] Azadi U, Fontana F A, Zanoni M. Machine learning based code smell detection through WekaNose. In Proc. the 40th Int. Conf. Softw. Eng., May 2018, pp.288-289.
[23] Fontana F A, Zanoni M. Code smell severity classification using machine learning techniques. Knowledge-Based Syst., 2017, 128:43-58.
[24] Fontana F A, Mäntylä M V, Zanoni M, Marino A. Comparing and experimenting machine learning techniques for code smell detection. Empir. Softw. Eng., 2016, 21(3):1143-1191.
[25] Sharma T, Spinellis D. A survey on software smells. J. Syst. Softw., 2018, 138:158-173.
[26] Rasool G, Arshad Z. A review of code smell mining techniques. J. Softw. Evol. Process, 2015, 27(11):867-895.
[27] Fernandes E, Oliveira J, Vale G, Paiva T, Figueiredo E. A review-based comparative study of bad smell detection tools. In Proc. the 20th International Conference on Evaluation and Assessment in Software Engineering, June 2016, Article No. 18.
[28] Fontana F A, Braione P, Zanoni M. Automatic detection of bad smells in code: An experimental assessment. J. Object Technol., 2012, 11(2):Article No. 5.
[29] Ribeiro M T, Singh S, Guestrin C. "Why should I trust you?": Explaining the predictions of any classifier. https//, Oct. 2020.
[30] Chicco D. Ten quick tips for machine learning in computational biology. BioData Mining, 2017, 10(1):35.
[31] Marinescu R. Detection strategies: Metrics-based rules for detecting design flaws. In Proc. the 20th IEEE International Conference on Software Maintenance, December 2004, pp.350-359.
[32] Abílio R, Padilha J, Figueiredo E, Costa H. Detecting code smells in software product lines: An exploratory study. In Proc. the 12th International Conference on Information Technology-New Generations, April 2015, pp.433-438.
[33] Fenske W, Schulze S. Code smells revisited: A variability perspective. In Proc. the 9th International Workshop on Variability Modelling of Software-Intensive Systems, January 2015, Article No. 3.
[34] Suryanarayana G, Samarthyam G, Sharma T. Refactoring for Software Design Smells: Managing Technical Debt (1st edition). Morgan Kaufmann, 2014.
[35] Baudry B, Le Traon Y, Sunyé G, Jézéquel J M. Measuring and improving design patterns testability. In Proc. the 9th IEEE International Software Metrics Symposium, September 2003.
[36] Langelier G, Sahraoui H, Poulin P. Visualization-based analysis of quality for large-scale software systems. In Proc. the 20th IEEE/ACM International Conference on Automated Software Engineering, November 2005, pp.214-223.
[37] Murphy-Hill E, Black A P. An interactive ambient visualization for code smells. In Proc. the 5th International Symposium on Software Visualization, October 2010, pp.5-14.
[38] de Figueiredo Carneiro G, Silva M, Mara L et al. Identifying code smells with multiple concern views. In Proc. the 24th Brazilian Symposium on Software Engineering, September 2010, pp.128-137.
[39] Kreimer J. Adaptive detection of design flaws. Electron. Notes Theor. Comput. Sci., 2005, 141(4):117-136.
[40] Amorim L, Costa E, Antunes N, Fonseca B, Ribeiro M. Experience report: Evaluating the effectiveness of decision trees for detecting code smells. In Proc. the 26th IEEE International Symposium on Software Reliability Engineering, November 2015, pp.261-269.
[41] Khomh F, Vaucher S, Guéhéneuc Y G, Sahraoui H. A Bayesian approach for the detection of code and design smells. In Proc. the 9th International Conference on Quality Software, August 2009, pp.305-314.
[42] Khomh F, Vaucher S, Guéhéneuc Y G, Sahraoui H. BDTEX: A GQM-based Bayesian approach for the detection of antipatterns. J. Syst. Softw., 2011, 84(4):559-572.
[43] Vaucher S, Khomh F, Moha N, Guéhéneuc Y G. Tracking design smells: Lessons from a study of god classes. In Proc. the 16th Working Conference on Reverse Engineering, October 2009, pp.145-154.
[44] Hassaine S, Khomh F, Guéhéneuc Y G, Hamel S. IDS: An immune-inspired approach for the detection of software design smells. In Proc. the 7th International Conference on the Quality of Information and Communications Technology, September 2010, pp.343-348.
[45] Maiga A, Ali N, Bhattacharya N et al. Support vector machines for anti-pattern detection. In Proc. the 27th IEEE/ACM International Conference on Automated Software Engineering, September 2012, pp.278-281.
[46] Maiga A, Ali N, Bhattacharya N, Sabane A, Gueheneuc Y G, Aimeur E. SMURF: A SVM-based incremental antipattern detection approach. In Proc. the 19th Working Conference on Reverse Engineering, October 2012, pp.466-475.
[47] Tempero E, Anslow C, Dietrich J et al. The Qualitas Corpus: A curated collection of Java code for empirical studies. In Proc. the 17th Asia Pacific Software Engineering Conference, November 2010, pp.336-345.
[48] Pecorelli F, Palomba F, di Nucci D, de Lucia A. Comparing heuristic and machine learning approaches for metric-based code smell detection. In Proc. the 27th Int. Conf. Progr. Compr., May 2019, pp.93-104.
[49] Wieman R. Anti-Pattern Scanner: An approach to detect anti-patterns and design violations [Master's Thesis]. Department of Computer Science, Delft University of Technology, 2011.
[50] Nongpong K. Integrating "code smells" detection with refactoring tool support [Ph.D. Thesis]. University of Wisconsin-Milwaukee, 2012.
[51] Riel A J. Object-Oriented Design Heuristics (1st edition). Addison-Wesley Professional, 1996.
[52] Chawla N V, Bowyer K W, Hall L O, Kegelmeyer W P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res., 2002, 16:321-357.
[53] Do T D, Hui S C, Fong A C M. Associative classification with prediction confidence. In Proc. the 4th International Conference on Machine Learning and Cybernetics, August 2005, pp.199-208.
[54] Malhotra R. Empirical Research in Software Engineering: Concepts, Analysis, and Applications (1st edition). Chapman and Hall/CRC, 2015.
[55] Forman G, Scholz M, Rajaram S. Feature shaping for linear SVM classifiers. In Proc. the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, June 2009, pp.299-308.
[56] Jain A, Nandakumar K, Ross A. Score normalization in multimodal biometric systems. Pattern Recognit., 2005, 38(12):2270-2285.
[57] Yang J, Honavar V. Feature subset selection using a genetic algorithm. IEEE Intell. Syst., 1998, 13(2):44-49.
[58] Cassar I R, Titus N D, Grill W M. An improved genetic algorithm for designing optimal temporal patterns of neural stimulation. J. Neural Eng., 2017, 14(6):Article No. 066013.
[59] Hassanat A, Almohammadi K, Alkafaween E, Abunawas E, Hammouri A, Prasath V B. Choosing mutation and crossover ratios for genetic algorithms: A review with a new dynamic approach. Information, 2019, 10(12):Article No. 390.
[60] Hall M A. Correlation-based feature subset selection for machine learning [Ph.D. Thesis]. Department of Computer Science, The University of Waikato, 1998.
[61] Vapnik V N. An overview of statistical learning theory. IEEE Trans. Neural Networks, 1999, 10(5):988-999.
[62] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521(7553):436-444.
[63] Aha D W, Kibler D, Albert M K. Instance-based learning algorithms. Mach. Learn., 1991, 6(1):37-66.
[64] Rokach L, Maimon O Z. Data Mining with Decision Trees: Theory and Applications. World Scientific, 2007.
[65] Malohlava M, Candel A, Click C, Roark H, Parmar V. Gradient boosting machine with H2O., May 2020.
[66] Hsu C W, Chang C C, Lin C J. A practical guide to support vector classification. Technical Report, National Taiwan University, 2008., May 2020.
[67] Thomas I L, Allcock G M. Determining the confidence level for a classification. Photogramm. Eng. Remote Sensing, 1984, 50(10):1491-1496.
[68] Chakraborty S, Tomsett R, Raghavendra R et al. Interpretability of deep learning models: A survey of results. In Proc. the 2017 IEEE SmartWorld Ubiquitous Intell. Comput. Adv. and Trust. Comput. Scalable Comput. and Commun. Cloud Big Data Comput., Internet People Smart City Innov. SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI, August 2017.
[69] Guggulothu T, Moiz S A. Code smell detection using multilabel classification approach. Softw. Qual. J., 2020, 28:1063-1086.
[70] Kiyak E O, Birant D, Birant K U. Comparison of multilabel classification algorithms for code smell detection. In Proc. the 3rd International Symposium on Multidisciplinary Studies and Innovative Technologies, October 2019.
[71] di Nucci D, Palomba F, Tamburri D A, Serebrenik A, de Lucia A. Detecting code smells using machine learning techniques: Are we there yet? In Proc. the 25th IEEE Int. Conf. Softw. Anal. Evol. Reengineering, March 2018, pp.612-621.

ISSN 1000-9000(Print)

CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
  Copyright ©2015 JCST, All Rights Reserved