Citation: Zhang C, Wang HZ, Liu HW et al. Fine-tuning channel-pruned deep model via knowledge distillation. Journal of Computer Science and Technology, 39(6): 1238–1247, Nov. 2024. DOI: 10.1007/s11390-023-2386-8.
Deep convolutional neural networks with high performance are difficult to deploy in many real-world applications, since the computing resources of edge devices such as smartphones or embedded GPUs are limited. To alleviate this hardware limitation, compressing deep neural networks on the model side becomes important. As one of the most popular methods, channel pruning can effectively remove redundant convolutional channels from a CNN (convolutional neural network) without significantly degrading the network's performance. Existing methods focus on the pruning design, i.e., on evaluating the importance of the different convolutional filters in the CNN model, whereas a fast and effective fine-tuning method that restores the accuracy lost during pruning is still urgently needed. In this paper, we propose a fine-tuning method, KDFT (Knowledge Distillation Based Fine-Tuning), which improves the accuracy of fine-tuned models with almost negligible training overhead by introducing knowledge distillation. Extensive experimental results on benchmark datasets with representative CNN models show that up to 4.86% accuracy improvement and 79% time saving can be obtained.
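The core idea described above is to guide the fine-tuning of the channel-pruned network with the soft predictions of the original, unpruned network. The following is a minimal sketch of one common way to realize such distillation-based fine-tuning in PyTorch; it is not the authors' implementation of KDFT, and the temperature T, the weight alpha, and the function names are illustrative assumptions.

```python
# Sketch: fine-tuning a channel-pruned "student" model with knowledge
# distillation from the original unpruned "teacher" model.
# T (temperature) and alpha are illustrative hyper-parameter assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Weighted sum of the soft-target KL term and the hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradient magnitude matches the hard loss
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def fine_tune_epoch(student, teacher, loader, optimizer, device="cuda"):
    student.train()
    teacher.eval()  # the unpruned model only provides soft targets
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        with torch.no_grad():
            teacher_logits = teacher(images)
        loss = distillation_loss(student(images), teacher_logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

In this formulation the teacher runs only in inference mode, which is one reason the extra cost of the distillation term during fine-tuning can remain small.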