Journal of Computer Science and Technology ›› 2022, Vol. 37 ›› Issue (3): 584-600. DOI: 10.1007/s11390-022-2131-8

Special Issue: Artificial Intelligence and Pattern Recognition; Computer Graphics and Multimedia

• Special Section of CVM 2022 •

Probability-Based Channel Pruning for Depthwise Separable Convolutional Networks

Han-Li Zhao1 (赵汉理), Senior Member, CCF, Kai-Jie Shi1 (史开杰), Xiao-Gang Jin2 (金小刚), Distinguished Member, CCF, Ming-Liang Xu3 (徐明亮), Member, CCF, Hui Huang1 (黄辉), Senior Member, CCF, Wang-Long Lu1,4 (卢望龙), and Ying Liu1 (刘影)        

    1 College of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China
    2 State Key Laboratory of CAD&CG, Zhejiang University, Hangzhou 310058, China
    3 School of Information Engineering, Zhengzhou University, Zhengzhou 450000, China
    4 Department of Computer Science, Memorial University of Newfoundland, St. John's A1B 3X5, Canada
  • Received: 2022-01-01; Revised: 2022-04-24; Accepted: 2022-05-06; Online: 2022-05-30; Published: 2022-05-30
  • Contact: Han-Li Zhao, E-mail: hanlizhao@wzu.edu.cn
  • About author: Han-Li Zhao is a professor at the College of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou. He received his B.Sc. degree in software engineering from Sichuan University, Chengdu, in 2004, and his Ph.D. degree in computer science from the State Key Laboratory of CAD&CG, Zhejiang University, Hangzhou, in 2009. His current research interests include computer vision, pattern recognition, medical image analysis, and deep learning. He is a senior member of CCF.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China under Grant Nos. 62036010 and 62072340, the Zhejiang Provincial Natural Science Foundation of China under Grant Nos. LZ21F020001 and LSZ19F020001, and the Open Project Program of the State Key Laboratory of CAD&CG, Zhejiang University under Grant No. A2220.

Channel pruning can reduce memory consumption and running time with minimal performance degradation, and is one of the most important techniques in network compression. However, existing channel pruning methods mainly focus on pruning standard convolutional networks, and they rely heavily on time-consuming fine-tuning to recover performance. To address these issues, we present a novel, efficient probability-based channel pruning method for depthwise separable convolutional networks. Our method leverages a simple yet effective probability-based channel pruning criterion that takes the scaling and shifting factors of batch normalization layers into consideration. A novel shifting factor fusion technique is further developed to improve the performance of the pruned networks without requiring extra time-consuming fine-tuning. We apply the proposed method to five representative deep learning networks, namely MobileNetV1, MobileNetV2, ShuffleNetV1, ShuffleNetV2, and GhostNet, to demonstrate the efficiency of our pruning method. Extensive experimental results and comparisons on the publicly available CIFAR10, CIFAR100, and ImageNet datasets validate the feasibility of the proposed method.

Key words: network compression; channel pruning; depthwise separable convolution; batch normalization
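The abstract describes the pruning criterion only at a high level, so the following minimal sketch illustrates one plausible reading, not the authors' exact formulation. It assumes that, after batch normalization, a channel's pre-activation is approximately N(β_c, γ_c²) (BN normalizes its input to near zero mean and unit variance before scaling by γ_c and shifting by β_c), so the probability that the channel survives the subsequent ReLU is Φ(β_c / |γ_c|); channels with the lowest such probability are pruned first. The helper names channel_keep_scores, prune_mask, and fuse_shift_into_next_conv are hypothetical, as is the bias-fusion rule sketched for the shifting factor fusion step.

import torch
import torch.nn as nn

def channel_keep_scores(bn: nn.BatchNorm2d) -> torch.Tensor:
    # Hypothetical criterion (an assumption, not the paper's formula):
    # P(gamma_c * x + beta_c > 0) with x ~ N(0, 1), i.e. Phi(beta_c / |gamma_c|).
    gamma = bn.weight.detach()
    beta = bn.bias.detach()
    std = gamma.abs().clamp_min(1e-12)
    return torch.distributions.Normal(0.0, 1.0).cdf(beta / std)

def prune_mask(bn: nn.BatchNorm2d, prune_ratio: float) -> torch.Tensor:
    # Keep-mask over channels: drop the channels least likely to activate.
    scores = channel_keep_scores(bn)
    n_prune = int(prune_ratio * scores.numel())
    order = scores.argsort()  # ascending: lowest activation probability first
    keep = torch.ones_like(scores, dtype=torch.bool)
    keep[order[:n_prune]] = False
    return keep

def fuse_shift_into_next_conv(bn: nn.BatchNorm2d, conv: nn.Conv2d,
                              keep: torch.Tensor) -> None:
    # Hypothetical "shifting factor fusion": a pruned channel's output is
    # approximated by the constant relu(beta_c); its mean contribution through
    # the next convolution is absorbed into that layer's bias (exact away from
    # padded borders), so no fine-tuning round is needed to compensate.
    dropped = ~keep
    const_act = torch.relu(bn.bias.detach()[dropped])            # (n_dropped,)
    w_sum = conv.weight.detach()[:, dropped].sum(dim=(2, 3))     # (out, n_dropped)
    extra = w_sum @ const_act                                    # (out,)
    if conv.bias is None:
        conv.bias = nn.Parameter(extra)
    else:
        conv.bias.data += extra

bn, conv = nn.BatchNorm2d(32), nn.Conv2d(32, 64, kernel_size=1)
keep = prune_mask(bn, prune_ratio=0.5)
fuse_shift_into_next_conv(bn, conv, keep)
print(int(keep.sum()), "of 32 channels kept")

In this sketch, scoring and bias compensation happen in a single pass over the network, which matches the abstract's claim that the pruned model needs no extra time-consuming fine-tuning; the paper's actual criterion and fusion rule may differ in detail.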

