Citation: Zhang FL, Chen YC, Khoo SC et al. CHANN: A hierarchical neural network for clone consistency prediction. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 40(1): 178–195, Jan. 2025. DOI: 10.1007/s11390-023-2831-8
Modifying a code segment may give rise to a consistency issue when the code segment belongs to a clone group comprising closely similar code segments. Recent studies have demonstrated that such consistent changes can incur extra maintenance costs when clones are checked for consistency, and can introduce defects if developers forget to change clones consistently when needed. To address this problem, researchers have proposed predicting clone consistency in advance from handcrafted attributes, notably using machine learning methods. Although these attributes help predict clone consistency to some extent, the capability of such an approach is generally weak and unsatisfactory in practice. This limitation is especially severe in a project’s infancy, when there is not sufficient within-project data to model clone consistency behavior and cross-project data have not been helpful in supporting prediction. In this paper, we propose the Clone Hierarchical Attention Neural Network (CHANN), which represents code clones and their evolution by adopting a hierarchical perspective of code, context, and code evolution, thereby enhancing the effectiveness of clone consistency prediction. To assess the effectiveness of CHANN, we conduct experiments on a dataset collected from eight open-source projects. The experimental results show that CHANN is highly effective in predicting clone consistency: the precision, recall, and F-measure attained in prediction are all around 82%. These findings support our hypothesis that the hierarchical neural network can help developers predict clone consistency effectively in the case of cross-project incubation, when insufficient data are available at the early stage of software development.
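For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how precision, recall, and F-measure are conventionally computed for a binary consistency predictor. It is not taken from the paper: the function name `precision_recall_f1`, the label encoding (1 for a change that requires consistent propagation, 0 otherwise), and the toy labels are all assumptions made for illustration, and the F-measure is assumed to be the balanced F1 score.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class.

    y_true and y_pred are equal-length sequences of labels.
    The encoding (1 = consistent change needed) is a hypothetical choice.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels, invented for illustration only (not data from the paper).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

In this toy case there are four true positives, one false positive, and one false negative, so all three metrics come out at roughly 0.8.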