Citation: Dong-Di Zhao, Fan Li, Kashif Sharif, Guang-Min Xia, Yu Wang. Space Efficient Quantization for Deep Convolutional Neural Networks[J]. Journal of Computer Science and Technology, 2019, 34(2): 305-317. DOI: 10.1007/s11390-019-1912-1

Space Efficient Quantization for Deep Convolutional Neural Networks

Funds: The work of Fan Li is partially supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 61772077 and 61370192, and the Beijing Natural Science Foundation under Grant No. 4192051. The work of Yu Wang is partially supported by NSFC under Grant Nos. 61428203 and 61572347.
More Information
  • Author Bio:

    Dong-Di Zhao received his B.E. degree in the Internet of Things from the School of Computer Science, Beijing Institute of Technology, Beijing, in 2016. He is currently pursuing his Master's degree at Beijing Institute of Technology, Beijing. His research interests include mobile sensing, mobile computing, and deep learning.

  • Corresponding authors:

    Fan Li, E-mail: fli@bit.edu.cn

    Yu Wang, E-mail: yu.wang@uncc.edu

  • Received Date: July 14, 2018
  • Revised Date: January 26, 2019
  • Published Date: March 04, 2019
  • Abstract: Deep convolutional neural networks (DCNNs) have shown outstanding performance in computer vision, natural language processing, and complex system analysis. As their performance improves with deeper layers, DCNNs incur higher computational complexity and larger storage requirements, making it extremely difficult to deploy them on resource-limited embedded systems (such as mobile devices or Internet of Things devices). Network quantization efficiently reduces the storage space required by DCNNs; however, their performance often drops rapidly as the quantization bit-width decreases. In this article, we propose a space-efficient quantization scheme that uses eight or fewer bits to represent the original 32-bit weights. We adopt the singular value decomposition (SVD) method to reduce the parameter size of fully-connected layers for further compression. Additionally, we propose a weight clipping method based on a dynamic boundary to improve performance when lower precision is used. Experimental results demonstrate that our approach achieves up to approximately 14x compression while preserving almost the same accuracy as the full-precision models. The proposed weight clipping method can also significantly improve the performance of DCNNs when lower precision is required.
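
    The three ingredients named in the abstract (low-bit uniform quantization, weight clipping to a dynamic boundary, and SVD compression of fully-connected layers) can be illustrated with a short NumPy sketch. This is a minimal sketch, not the authors' implementation: the percentile-based clipping boundary, the bit-width k, and the rank are assumptions chosen for the example.

    import numpy as np

    def clip_and_quantize(w, k=8, clip_pct=99.9):
        # Clip weights to a dynamic boundary, then quantize uniformly to k bits.
        # Using a percentile of |w| as the boundary is an assumption standing in
        # for the paper's dynamic-boundary clipping method.
        bound = np.percentile(np.abs(w), clip_pct)
        w_clipped = np.clip(w, -bound, bound)
        levels = 2 ** k - 1                            # number of quantization steps
        scale = 2 * bound / levels                     # width of one step
        codes = np.round((w_clipped + bound) / scale)  # integer codes in [0, levels]
        return (codes * scale - bound).astype(np.float32)  # dequantized weights

    def svd_compress_fc(w, rank):
        # Factor an m x n fully-connected weight matrix into two low-rank
        # matrices, shrinking the parameter count from m*n to rank*(m + n).
        u, s, vt = np.linalg.svd(w, full_matrices=False)
        a = u[:, :rank] * s[:rank]   # m x rank
        b = vt[:rank, :]             # rank x n
        return a, b                  # forward pass: x @ a @ b approximates x @ w

    # Example: 8-bit weights for a conv layer and a rank-64 FC factorization.
    w_conv = np.random.randn(3, 3, 64, 128).astype(np.float32)
    w_q = clip_and_quantize(w_conv, k=8)
    a, b = svd_compress_fc(np.random.randn(1024, 1000).astype(np.float32), rank=64)
    print(np.abs(w_conv - w_q).max(), a.shape, b.shape)

    Storing the integer codes plus the scalar boundary is what yields the roughly 4x saving of 8-bit weights over 32-bit floats; lower bit-widths trade accuracy for further compression, which is where the clipping step matters most.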
  • Related Articles

    [1] Cheng Gong, Ye Lu, Su-Rong Dai, Qian Deng, Cheng-Kun Du, Tao Li. AutoQNN: An End-to-End Framework for Automatically Quantizing Neural Networks[J]. Journal of Computer Science and Technology, 2024, 39(2): 401-420. DOI: 10.1007/s11390-022-1632-9
    [2] Zhang-Jin Huang, Xiang-Xiang He, Fang-Jun Wang, Qing Shen. A Real-Time Multi-Stage Architecture for Pose Estimation of Zebrafish Head with Convolutional Neural Networks[J]. Journal of Computer Science and Technology, 2021, 36(2): 434-444. DOI: 10.1007/s11390-021-9599-5
    [3] Shu-Chang Zhou, Yu-Zhi Wang, He Wen, Qin-Yao He, Yu-Heng Zou. Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks[J]. Journal of Computer Science and Technology, 2017, 32(4): 667-682. DOI: 10.1007/s11390-017-1750-y
    [4] Xi-Jin Zhang, Yi-Fan Lu, Song-Hai Zhang. Multi-Task Learning for Food Identification and Analysis with Deep Convolutional Neural Networks[J]. Journal of Computer Science and Technology, 2016, 31(3): 489-500. DOI: 10.1007/s11390-016-1642-6
    [5] Si-Wei Ma, Wen Gao. Low Complexity Integer Transform and Adaptive Quantization Optimization[J]. Journal of Computer Science and Technology, 2006, 21(3): 354-359.
    [6] Zhou Jingzhou. A Neural Network Model Based on Logical Operations[J]. Journal of Computer Science and Technology, 1998, 13(5): 464-470.
    [7] Qin Kaihuai. Neural Network Methods for NURBS Curve and Surface Interpolation[J]. Journal of Computer Science and Technology, 1997, 12(1): 76-89.
    [8] Zhang Zhong. Simulation of ATPG Neural Network and Its Experimental Results[J]. Journal of Computer Science and Technology, 1995, 10(4): 310-324.
    [9] Zhang Bo, Zhang Ling. On Memory Capacity of the Probabilistic Logic Neuron Network[J]. Journal of Computer Science and Technology, 1993, 8(3): 62-66.
    [10] Weigeng Shi. Reconnectable Network with Limited Resources[J]. Journal of Computer Science and Technology, 1991, 6(3): 243-249.
  • Cited by

    Periodical cited type (5)

    1. Bhoomi Shah, Hetal Bhavsar. Time Complexity in Deep Learning Models. Procedia Computer Science, 2022, 215: 202. DOI: 10.1016/j.procs.2022.12.023
    2. Xiaohui Kuang, Xianfeng Gao, Lianfang Wang, et al. A discrete cosine transform-based query efficient attack on black-box object detectors. Information Sciences, 2021, 546: 596. DOI: 10.1016/j.ins.2020.05.089
    3. Jakub Nalepa, Marek Antoniak, Michal Myller, et al. Towards resource-frugal deep convolutional neural networks for hyperspectral image segmentation. Microprocessors and Microsystems, 2020, 73: 102994. DOI: 10.1016/j.micpro.2020.102994
    4. Chollette C. Olisah, Lyndon Smith. Understanding unconventional preprocessors in deep convolutional neural networks for face identification. SN Applied Sciences, 2019, 1(11). DOI: 10.1007/s42452-019-1538-5
    5. Andras Formanek, Daniel Hadhazi. Compressing Convolutional Neural Networks by L0 Regularization. 2019 International Conference on Control, Artificial Intelligence, Robotics & Optimization (ICCAIRO). DOI: 10.1109/ICCAIRO47923.2019.00032

    Other cited types (0)
