

Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks

    Abstract: Quantized neural networks (QNNs) use low-bitwidth numbers to represent parameters and perform computations, reducing the computational complexity, storage size, and memory usage of neural networks. In QNNs, parameters and activations are uniformly quantized to low-bitwidth numbers, so that multiply-accumulate operations can be replaced by efficient bitwise operations. However, the distributions of parameters in neural networks are often bell-shaped and contain a few large outliers, so uniform quantization determined from the extremal values may underutilize the available bitwidth and increase the error rate. In this paper, we propose a novel quantization method that produces balanced distributions of quantized values. Our method first recursively partitions the parameters by percentiles into balanced bins and then applies uniform quantization, so that the quantized values are evenly distributed among the possible values, increasing the effective bitwidth. We also introduce computationally cheaper approximations of percentiles to reduce the overhead introduced by balanced quantization. Overall, our method improves the prediction accuracy of QNNs, has negligible impact on training speed, introduces no extra computation during inference, and is applicable to both convolutional and recurrent neural networks. Experiments on standard datasets, including ImageNet and Penn Treebank, confirm the effectiveness of our method. On ImageNet, our 2-bit quantized ResNet-18 model achieves a top-5 error rate as low as 18.0%, and our 4-bit quantized GoogLeNet achieves a top-5 error rate of 12.7%, outperforming other QNN methods. Code will be made available online.
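The core idea of the abstract, partitioning parameters into equal-population bins by percentiles and then mapping each bin to a uniformly spaced level, can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `balanced_quantize`, the direct use of `np.percentile` in place of the paper's recursive partitioning and cheaper percentile approximations, and the output range [-1, 1] are all assumptions made for the example.

```python
import numpy as np

def balanced_quantize(params, bits):
    """Quantize to 2**bits levels so that (approximately) equal numbers
    of parameters map to each level: split the values at percentiles,
    then space the quantized levels uniformly in [-1, 1]."""
    n_levels = 2 ** bits
    # Percentile boundaries that split the data into equal-population bins.
    edges = np.percentile(params, np.linspace(0, 100, n_levels + 1))
    # Assign each parameter to a bin index in 0 .. n_levels - 1.
    bins = np.clip(np.digitize(params, edges[1:-1]), 0, n_levels - 1)
    # Map bin indices to uniformly spaced quantized values.
    levels = np.linspace(-1.0, 1.0, n_levels)
    return levels[bins]

w = np.random.randn(10000)          # bell-shaped weights
q = balanced_quantize(w, bits=2)    # 4 quantized values
# Each of the 4 levels receives roughly a quarter of the weights,
# so all 2 bits of the representation are used effectively.
```

With plain uniform quantization of the same bell-shaped weights, the outermost levels would be nearly empty; here every level carries roughly equal probability mass, which is the "balance" the method aims for.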

     
