Journal of Computer Science and Technology ›› 2022, Vol. 37 ›› Issue (3): 584-600. DOI: 10.1007/s11390-022-2131-8

Special Topics: Artificial Intelligence and Pattern Recognition; Computer Graphics and Multimedia


Probability-Based Channel Pruning for Depthwise Separable Convolutional Networks

Han-Li Zhao1 (赵汉理), Senior Member, CCF, Kai-Jie Shi1 (史开杰), Xiao-Gang Jin2 (金小刚), Distinguished Member, CCF, Ming-Liang Xu3 (徐明亮), Member, CCF, Hui Huang1 (黄辉), Senior Member, CCF, Wang-Long Lu1,4 (卢望龙), and Ying Liu1 (刘影)        

  1 College of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China
    2 State Key Laboratory of CAD&CG, Zhejiang University, Hangzhou 310058, China
    3 School of Information Engineering, Zhengzhou University, Zhengzhou 450000, China
    4 Department of Computer Science, Memorial University of Newfoundland, St. John's A1B 3X5, Canada
  • Received:2022-01-01 Revised:2022-04-24 Accepted:2022-05-06 Online:2022-05-30 Published:2022-05-30
  • Contact: Han-Li Zhao, E-mail: hanlizhao@wzu.edu.cn
  • About author: Han-Li Zhao is a professor at the College of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou. He received his B.Sc. degree in software engineering from Sichuan University, Chengdu, in 2004, and his Ph.D. degree in computer science from the State Key Laboratory of CAD&CG, Zhejiang University, Hangzhou, in 2009. His current research interests include computer vision, pattern recognition, medical image analysis, and deep learning. He is a senior member of CCF.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China under Grant Nos. 62036010 and 62072340, the Zhejiang Provincial Natural Science Foundation of China under Grant Nos. LZ21F020001 and LSZ19F020001, and the Open Project Program of the State Key Laboratory of CAD&CG, Zhejiang University under Grant No. A2220.

1. Context: As one of the most important techniques in network compression, channel pruning reduces memory consumption and running time with little performance loss. It has enabled the deployment of artificial intelligence in everyday applications such as self-driving cars, robotics, and augmented reality. However, existing channel pruning methods focus mainly on pruning standard convolutional networks, and they rely heavily on time-consuming fine-tuning to recover performance.
2. Objective: A depthwise convolution applies exactly one filter to each channel and therefore cannot change the number of channels (see the first sketch after this summary). Exploiting this property, this paper develops an effective channel pruning method for depthwise separable convolutional networks.
3. Method: We propose a probability-based channel pruning method for depthwise separable convolutions. First, exploiting the properties of batch normalization (BN) and ReLU in depthwise separable convolutions, we introduce a new probability-based pruning criterion: if the output of a BN channel is, with high probability, less than or equal to zero, the channel is regarded as unimportant and can be pruned (see the second sketch after this summary). Second, since a depthwise convolution has the same number of input and output channels, each channel to be pruned is classified into one of four cases under a channel-consistency-preserving principle, and pruning is carried out accordingly. Finally, a shifting factor fusion technique is applied to avoid the numerical error introduced by channel pruning.
4. Result & Findings: We evaluate the method on the publicly available CIFAR10, CIFAR100, and ImageNet datasets, pruning several depthwise separable convolutional networks, including MobileNetV1, MobileNetV2, ShuffleNetV1, ShuffleNetV2, and GhostNet. Experimental results show that the method achieves good trade-offs between recognition accuracy and the number of parameters. On ImageNet, the pruned model reduces both the parameters and the computation of the MobileNetV1 baseline by about 40%.
5. Conclusions: We present a probability-based channel pruning method for depthwise separable convolutional networks. A simple yet effective probability-based pruning criterion is built on the BN scaling and shifting factors, and a shifting factor fusion technique further improves pruning performance. Experimental results on public datasets demonstrate the feasibility of the method.
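
A minimal PyTorch illustration of the property the objective relies on: a depthwise convolution (groups equal to the channel count) applies one filter per channel and preserves the channel count, while the following pointwise convolution mixes channels. The channel counts here are arbitrary; this snippet is not from the paper's code.

```python
import torch
from torch import nn

channels = 32

# Depthwise convolution: groups == in_channels, i.e., exactly one 3x3
# filter per channel, so the number of channels cannot change here.
depthwise = nn.Conv2d(channels, channels, kernel_size=3, padding=1,
                      groups=channels, bias=False)

# The pointwise (1x1) convolution then mixes channels and may change
# their number; pruning must keep both layers consistent.
pointwise = nn.Conv2d(channels, 64, kernel_size=1, bias=False)

x = torch.randn(1, channels, 56, 56)
print(pointwise(depthwise(x)).shape)  # torch.Size([1, 64, 56, 56])
```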

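A minimal sketch of how the probability-based criterion could be evaluated, assuming the BN output of channel c is modeled as y_c = γ_c·x̂ + β_c with x̂ standard normal, so that P(y_c ≤ 0) = Φ(−β_c/|γ_c|). The function names and the threshold are illustrative, not necessarily the paper's exact formulation.

```python
import torch
from torch.distributions import Normal

def prob_nonpositive(bn: torch.nn.BatchNorm2d) -> torch.Tensor:
    """Per-channel probability that the BN output is <= 0.

    After batch normalization the output of channel c is
    y_c = gamma_c * x_hat + beta_c with x_hat roughly standard normal,
    hence P(y_c <= 0) = Phi(-beta_c / |gamma_c|).
    """
    gamma = bn.weight.detach()
    beta = bn.bias.detach()
    z = -beta / gamma.abs().clamp(min=1e-12)  # guard against gamma == 0
    return Normal(0.0, 1.0).cdf(z)

def prunable_channels(bn: torch.nn.BatchNorm2d, threshold: float = 0.99):
    """Channels whose BN output is almost surely non-positive are zeroed
    by the following ReLU, so they are candidates for pruning."""
    return torch.nonzero(prob_nonpositive(bn) >= threshold).flatten()
```

Channels for which this probability is close to 1 produce near-zero activations after the subsequent ReLU, which is why they can be removed with little effect on the network output.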

Abstract: Channel pruning reduces memory consumption and running time with minimal performance degradation, and is one of the most important techniques in network compression. However, existing channel pruning methods mainly focus on the pruning of standard convolutional networks and rely heavily on time-consuming fine-tuning to achieve performance improvement. To this end, we present a novel, efficient probability-based channel pruning method for depthwise separable convolutional networks. Our method leverages a new, simple yet effective probability-based channel pruning criterion that takes the scaling and shifting factors of batch normalization layers into consideration. A novel shifting factor fusion technique is further developed to improve the performance of the pruned networks without requiring extra time-consuming fine-tuning. We apply the proposed method to five representative deep learning networks, namely MobileNetV1, MobileNetV2, ShuffleNetV1, ShuffleNetV2, and GhostNet, to demonstrate the efficiency of our pruning method. Extensive experimental results and comparisons on the publicly available CIFAR10, CIFAR100, and ImageNet datasets validate the feasibility of the proposed method.
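
The shifting factor fusion step can be read as follows: the post-ReLU activation of a pruned channel is not exactly zero, so its Gaussian expectation E[ReLU(y)] = β·Φ(β/|γ|) + |γ|·φ(β/|γ|) can be folded into the bias of the following pointwise convolution to avoid accumulating numerical error. This is only one plausible reading; the function below is a sketch whose names and fusion target are our assumptions, not the paper's exact procedure.

```python
import torch
from torch import nn

def fuse_pruned_channels(next_conv: nn.Conv2d,
                         bn: nn.BatchNorm2d,
                         pruned: torch.Tensor) -> None:
    """Fold the expected post-ReLU activation of pruned BN channels into
    the bias of the following 1x1 convolution (assumed fusion target).

    For y ~ N(beta, gamma^2), E[ReLU(y)] = beta*Phi(z) + |gamma|*phi(z)
    with z = beta/|gamma|, where Phi/phi are the standard normal CDF/PDF.
    """
    normal = torch.distributions.Normal(0.0, 1.0)
    gamma = bn.weight.detach()[pruned]
    beta = bn.bias.detach()[pruned]
    s = gamma.abs().clamp(min=1e-12)
    z = beta / s
    expected = beta * normal.cdf(z) + s * torch.exp(normal.log_prob(z))

    # A pointwise conv weight has shape (out, in, 1, 1); each pruned
    # input channel c contributes W[:, c] * expected[c] at every pixel.
    w = next_conv.weight.detach()[:, pruned, 0, 0]  # (out, n_pruned)
    correction = w @ expected                       # (out,)
    if next_conv.bias is None:
        next_conv.bias = nn.Parameter(correction)
    else:
        next_conv.bias.data += correction
```

Physically removing the pruned input channels from next_conv afterwards is omitted here for brevity.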

Key words: network compression, channel pruning, depthwise separable convolution, batch normalization

