Journal of Computer Science and Technology

   

AutoQNN: An End-to-End Framework for Automatically Quantizing Neural Networks

Cheng Gong1 (龚成), Student Member, CCF, Ye Lu2,3 (卢冶), Senior Member, CCF, Su-Rong Dai2 (代素蓉), Student Member, CCF, Qian Deng2 (邓倩), Cheng-Kun Du2 (杜承昆), Student Member, CCF, and Tao Li2,3,∗ (李涛), Distinguished Member, CCF, Member, ACM   

    1College of Software, Nankai University, Tianjin 300350, China
    2College of Computer Science, Nankai University, Tianjin 300350, China
    3State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2022-05-30; Revised: 2022-10-22; Accepted: 2022-12-26
  • Contact: Tao Li, E-mail: litao@nankai.edu.cn
  • About author: Tao Li received his Ph.D. degree in Computer Science from Nankai University in 2007. He is a professor at the College of Computer Science, Nankai University. He is a member of the IEEE Computer Society and the ACM, and a distinguished member of the CCF. His main research interests include heterogeneous computing, machine learning, and blockchain systems.

Exploring the expected quantizing scheme with a suitable mixed-precision policy is the key to compressing deep neural networks (DNNs) with high efficiency and accuracy. This exploration imposes a heavy workload on domain experts, so an automatic compression method is needed. However, the huge search space of automatic methods introduces substantial computing overhead, which makes them challenging to apply in real scenarios. In this paper, we propose an end-to-end framework named AutoQNN that automatically quantizes different layers with different schemes and bitwidths, without any human labor. AutoQNN efficiently seeks desirable quantizing schemes and mixed-precision policies for mainstream DNN models by combining three techniques: quantizing scheme search (QSS), quantizing precision learning (QPL), and quantized architecture generation (QAG). QSS adopts five existing quantizing schemes and defines three new ones, forming an eight-scheme candidate set, and then uses the differentiable neural architecture search (DNAS) algorithm to seek the layer- or model-desired scheme from the set. To the best of our knowledge, QPL is the first method to learn mixed-precision policies by reparameterizing the bitwidths of quantizing schemes. It efficiently optimizes both the classification loss and the precision loss of DNNs, and obtains a relatively optimal mixed-precision model within a limited model size and memory footprint. QAG converts arbitrary architectures into the corresponding quantized ones without manual intervention, enabling end-to-end neural network quantization. We have implemented AutoQNN and integrated it into Keras. Extensive experiments demonstrate that AutoQNN consistently outperforms state-of-the-art quantization methods.
For 2-bit weights and activations of AlexNet and ResNet18, AutoQNN achieves accuracies of 59.75% and 68.86%, respectively, improving on state-of-the-art methods by up to 1.65% and 1.74%. Notably, compared with the full-precision AlexNet and ResNet18, the 2-bit models incur accuracy degradation of only 0.26% and 0.76%, respectively, which fulfills practical application demands.
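The core idea behind QPL, treating a layer's bitwidth as a continuous trainable parameter and regularizing it with a precision loss that bounds model size, can be illustrated with a minimal sketch. The function names, the uniform quantizer, and the hinge-style penalty below are illustrative assumptions, not the paper's exact formulation:

```python
def uniform_quant(x, b):
    """Uniform symmetric quantizer whose number of levels, 2^b - 1,
    follows from a continuous bitwidth b. In QPL-style training the
    rounding would be bypassed with a straight-through estimator so
    gradients can reach b; only the forward pass is shown here."""
    levels = 2.0 ** b - 1.0
    x = max(-1.0, min(1.0, x))  # clip to the quantizer's range [-1, 1]
    return round(x * levels / 2.0) * 2.0 / levels

def precision_loss(bitwidths, param_counts, budget_bits):
    """Penalty that keeps the total quantized model size (sum of
    per-layer bitwidth x parameter count) within a bit budget, so the
    optimizer trades accuracy against model size and memory footprint."""
    total = sum(b * n for b, n in zip(bitwidths, param_counts))
    return max(0.0, total - budget_bits) ** 2
```

During training, the overall objective would be the classification loss plus a weighted `precision_loss`, with each layer's bitwidth updated by gradient descent like any other parameter.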


Extended Abstract

1. Context:
In recent years, deep neural networks (DNNs) have attracted wide research interest. Owing to their strong nonlinear feature-extraction capability, DNNs are applied across many research fields and deliver dramatic performance gains over traditional non-DNN algorithms. However, DNNs demand enormous computation and memory, with extremely high time and space complexity. By replacing the weights and activations of a DNN with low-precision representations, neural network quantization can effectively reduce the computational complexity of DNN algorithms and improve their efficiency.
2. Objective:
Despite great progress in quantization research, choosing a suitable quantizing scheme for a given DNN model remains challenging, and a large number of quantization hyperparameters still have to be set manually. Exploring quantizing schemes with suitable hyperparameter settings is the key to efficient and accurate DNN quantization. However, a typical DNN model has hundreds of layers to quantize and may require hundreds of manually chosen hyperparameters, so the exploration space is enormous and manual exploration is extremely laborious. Moreover, as deep learning develops rapidly and new models keep emerging, manually searching for the optimal quantizing scheme for every model to be quantized becomes ever more difficult. Automatic neural network quantization has therefore attracted wide attention from researchers: it uses automated machine learning techniques to explore the huge search space and to select a suitable quantizing scheme for each input DNN model automatically. Nevertheless, exploring such a huge search space is still highly challenging, and current automatic quantization methods often incur extremely high computational complexity, which severely hinders their application.
3. Method:
This paper proposes an end-to-end automatic neural network quantization framework named AutoQNN (Automatically Quantizing Neural Networks). AutoQNN takes a full-precision DNN model as input, automatically searches a candidate set of quantizing schemes and a candidate set of bitwidths for the most suitable scheme and precision for every layer, applies the searched quantizing policy to the model, and generates the quantized model end to end. AutoQNN introduces three techniques to efficiently search for suitable quantizing schemes and mixed-precision policies for mainstream DNN models and to generate the quantized models: quantizing scheme search (QSS), quantizing precision learning (QPL), and quantized architecture generation (QAG). QSS first defines the candidate set of quantizing schemes, adopting five widely used schemes and proposing three new ones, for eight candidates in total. QSS then uses the differentiable neural architecture search (DNAS) algorithm to choose a suitable scheme from the candidate set, either for the whole DNN model or for each of its layers. QPL first defines a quantization paradigm, reparameterizes the bitwidth on top of this paradigm, derives the chain rule from the model loss to the bitwidth parameters, and learns the optimal quantizing precision with gradient-based optimization (GBO). To the best of our knowledge, QPL is the first method that uses GBO to search for mixed-precision quantization policies. QPL further proposes a precision loss to constrain the quantizing precision of the whole model, limiting the storage size and memory footprint of the quantized model. QAG proposes a general computational-graph reconstruction algorithm for applying the searched quantizing policy to the DNN model; based on this algorithm, QAG can automatically convert an arbitrary DNN model into the corresponding quantized model, achieving end-to-end neural network quantization without manual intervention.
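The DNAS-based relaxation in QSS can be sketched as follows: each layer keeps one architecture parameter per candidate scheme, and during search its output is the softmax-weighted mixture of all candidates, so the parameters are trainable by ordinary gradient descent. The three toy candidate schemes below stand in for the paper's eight candidates and are assumptions for illustration only:

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

# Toy stand-ins for candidate quantizing schemes.
def binary(x):    return 1.0 if x >= 0 else -1.0  # 1-bit sign quantizer
def coarse(x):    return round(x * 2.0) / 2.0     # uniform quantizer, step 0.5
def identity(x):  return x                        # keep full precision

CANDIDATES = [binary, coarse, identity]

def search_output(x, alphas):
    """During search the layer emits a softmax-weighted mixture of all
    candidate schemes, making the selection differentiable; after
    convergence, only the candidate with the largest alpha is kept,
    either per layer or uniformly for the whole model."""
    return sum(w * q(x) for w, q in zip(softmax(alphas), CANDIDATES))

def selected_scheme(alphas):
    return CANDIDATES[max(range(len(alphas)), key=lambda i: alphas[i])]
```

With equal architecture parameters, each candidate contributes one third of the output; as training shifts the alphas, the mixture gradually concentrates on the scheme that best suits the layer.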
4. Result & Findings:
We implement AutoQNN on top of Keras, a widely used deep learning framework, for fast and efficient DNN quantization. Extensive experiments show that AutoQNN outperforms both state-of-the-art quantization methods and automatic quantization approaches. After AutoQNN compresses AlexNet and ResNet18 by 16x, the quantized models reach top-1 classification accuracies of 59.75% and 68.86% on ImageNet, respectively, only 0.26% and 0.76% below the full-precision results and 1.65% and 1.74% above the best 2-bit results of state-of-the-art methods.
5. Conclusions:
This paper proposes an end-to-end neural network quantization framework named AutoQNN, which automatically searches for suitable quantizing schemes and precisions for a given DNN model, avoiding massive manual exploration and labor. AutoQNN comprises three techniques, QSS, QPL, and QAG, each addressing one difficulty of automatic quantization. By configuring and sharing state parameters, QSS automatically searches for suitable quantizing schemes, either per layer or as one uniform scheme for the whole model. QPL uses GBO to learn a different quantizing precision for each layer, improving the model's classification accuracy under a given compression constraint. QAG automatically generates a quantized DNN model from the full-precision one according to the given quantizing policy. Extensive experimental results show that, at the same average bitwidth, DNN models quantized with AutoQNN achieve higher classification accuracy and outperform the state-of-the-art quantization methods.
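The graph-reconstruction step in QAG can be illustrated, framework-agnostically, as a rewrite over layer descriptors: non-quantizable layers are copied unchanged, and each quantizable layer is swapped for a quantized counterpart carrying the scheme and bitwidth chosen by QSS and QPL. The descriptor format and the `Quant*` layer names below are hypothetical; the actual implementation rewrites Keras computation graphs:

```python
# Hypothetical mapping from float layers to their quantized counterparts.
QUANT_COUNTERPART = {"Conv2D": "QuantConv2D", "Dense": "QuantDense"}

def generate_quantized_graph(layers, policy):
    """Rebuild the model architecture: keep non-quantizable layers as-is
    and replace each quantizable layer with its quantized version,
    annotated with the per-layer scheme and bitwidth from the policy."""
    rebuilt = []
    for i, (kind, cfg) in enumerate(layers):
        if kind in QUANT_COUNTERPART:
            cfg = dict(cfg, scheme=policy[i]["scheme"],
                       bitwidth=policy[i]["bitwidth"])
            kind = QUANT_COUNTERPART[kind]
        rebuilt.append((kind, cfg))
    return rebuilt

# A toy three-layer model: only the Conv2D and Dense layers are rewritten.
model = [("Conv2D", {"filters": 16}), ("ReLU", {}), ("Dense", {"units": 10})]
policy = {0: {"scheme": "uniform", "bitwidth": 2},
          2: {"scheme": "uniform", "bitwidth": 4}}
quantized = generate_quantized_graph(model, policy)
```

The same pattern scales to arbitrary architectures because the rewrite is driven purely by layer type and the searched policy, with no per-model manual work.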

Key words: automatic quantization; mixed precision; quantizing scheme search; quantizing precision learning; quantized architecture generation


ISSN 1000-9000 (Print)
ISSN 1860-4749 (Online)
CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 