Design of an Approximate Fixed-Point Multiplier Based on Linear Mapping
LMM: A Fixed-Point Linear Mapping Based Approximate Multiplier for IoT
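Abstract: Background. The Internet of Things (IoT) is developing rapidly. Because IoT devices have limited resources, circuit energy efficiency has become a key design consideration. Approximate computing trades some loss of accuracy for higher energy efficiency. It suits applications that can tolerate some accuracy loss, such as audio, image, and video processing and neural-network classification and recognition. In these error-tolerant applications, multipliers are frequently used basic modules with relatively high energy consumption, so replacing them with energy-efficient approximate multipliers can significantly improve the overall energy efficiency of the circuit. Compared with floating-point multipliers, fixed-point multipliers are more widely used in IoT devices because their computational cost, bandwidth, and memory requirements are lower. For these reasons, the design of approximate fixed-point multipliers is of considerable importance. Objective. Our goal is to design an energy-efficient approximate fixed-point multiplier that reduces the energy and computing-unit demands of error-tolerant applications, so as to meet the energy-efficiency requirements of IoT devices. Methods. We design a configurable approximate fixed-point multiplier (LMM) based on a linear mapping method, which maps multiplication to linear addition and realizes multi-level linear mapping by accumulating error-compensation terms. The LMM multiplier is implemented in three steps: normalization, linear mapping, and denormalization. To account for the extra cost of the normalization and denormalization operations, we introduce a dynamic truncation method into the LMM multiplier to further reduce hardware overhead, which yields the ILMM approximate fixed-point multiplier. The normalization module combines a shifter with multiplexers; this allows a trade-off between area and delay and simplifies the implementation of dynamic truncation. We analyze the errors of the LMM multiplier and evaluate 16-bit multipliers in terms of both accuracy and circuit characteristics. The accuracy metrics are the maximum relative error, mean relative error, bias, and standard deviation; the circuit metrics are area, power, delay, and the improvement over an exact multiplier. Results. At approximation levels 0 and 1, the LMM approximate fixed-point multiplier saves some hardware cost compared with an exact multiplier; at higher approximation levels, however, the resources saved by linear mapping cannot offset the extra cost introduced by normalization and denormalization. The ILMM approximate fixed-point multiplier, in contrast, substantially improves area and power over an LMM multiplier at the same approximation level: the 16-bit ILMM multiplier saves up to 49.7% of area and 66.39% of power. Introducing dynamic truncation, however, increases the bias. Conclusion. The LMM multiplier supports multi-level approximation configuration. Building on it, the ILMM multiplier with dynamic truncation further reduces hardware resources and offers finer-grained approximation configuration, so users can choose a configuration that fits their area, power, and other requirements and reduce circuit overhead. How to reduce the extra cost of the normalization and denormalization modules, and how well the design performs in low-bit-width multipliers, remain to be studied.

Abstract: The development of IoT (Internet of Things) calls for energy- and area-efficient circuit designs for edge devices. Approximate computing, which trades unnecessary computation precision for hardware cost savings, is a promising direction for error-tolerant applications. Multipliers, which are frequently invoked basic modules that consume non-trivial hardware resources, have been approximated to achieve significant energy and area savings in data-intensive applications. In this paper, we propose a fixed-point approximate multiplier that employs a linear mapping technique, which makes the approximation level configurable and keeps the computation errors unbiased. We then introduce a dynamic truncation method into the proposed multiplier design to cover a wider and more fine-grained range of approximation configurations for more flexible hardware cost savings.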
In addition, we propose a novel normalization module for the required shifting operations, which trades off occupied area against critical-path delay compared with conventional shifters. The errors introduced by the proposed design are analyzed and expressed as formulas, which are validated by experimental results. Experimental evaluations show that, compared with accurate multipliers, the proposed approximate multiplier provides area and power savings of up to 49.70% and 66.39%, respectively, with acceptable computation errors.
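The normalize, linear-map, compensate, and denormalize flow, together with the four accuracy metrics named above, can be illustrated with a small behavioral model. This is a minimal sketch under stated assumptions, not the LMM or ILMM design itself. The abstract does not give the mapping, so the sketch swaps in the iterative residual-compensation scheme used by iterative logarithmic multipliers. Each operand is normalized by leading-one detection as a = 2^k1 + r1. Only the linear terms of the product are kept, and at each further level the dropped cross term r1·r2 is approximated the same way. Unlike the unbiased LMM described here, this scheme always underestimates the product, so its bias is negative. The following are all hypothetical: the names `approx_mul`, `truncate`, and `error_metrics`; plain leading-one truncation as a stand-in for dynamic truncation; and the definitions of bias and standard deviation as the mean and population standard deviation of the signed relative error.

```python
"""Hypothetical behavioral model of a multi-level linear-mapping multiplier.

Not the paper's LMM/ILMM: it substitutes the iterative residual-compensation
scheme of iterative logarithmic multipliers and plain leading-one truncation.
"""
import math
from typing import Optional, Tuple


def truncate(v: int, keep: int) -> int:
    """Hypothetical stand-in for dynamic truncation: keep the `keep` bits
    starting at the leading one of v and zero every lower bit."""
    drop = max(0, v.bit_length() - keep)
    return (v >> drop) << drop


def approx_mul(a: int, b: int, level: int = 0) -> int:
    """Approximate product of two unsigned integers.

    Normalization: leading-one detection writes a = 2**k1 + r1, b = 2**k2 + r2,
    so exactly a*b = 2**(k1+k2) + r1*2**k2 + r2*2**k1 + r1*r2.
    Linear mapping: keep only the first three terms (shifts and additions).
    Compensation: at each extra level, approximate the dropped r1*r2 the same way.
    """
    if a == 0 or b == 0:
        return 0
    k1, k2 = a.bit_length() - 1, b.bit_length() - 1  # leading-one positions
    r1, r2 = a - (1 << k1), b - (1 << k2)             # bits below the leading one
    # Shifting by k1 and k2 restores the original scale (denormalization).
    p = (1 << (k1 + k2)) + (r1 << k2) + (r2 << k1)
    if level > 0:
        p += approx_mul(r1, r2, level - 1)  # accumulate error compensation
    return p


def error_metrics(width: int, level: int,
                  keep: Optional[int] = None) -> Tuple[float, float, float, float]:
    """Exhaustively test all nonzero width-bit operand pairs.

    Returns (max |RE|, mean |RE|, bias, std), where RE = (approx - exact) / exact,
    bias is the mean signed RE and std is its population standard deviation.
    """
    res = []
    for a in range(1, 1 << width):
        for b in range(1, 1 << width):
            x, y = (a, b) if keep is None else (truncate(a, keep), truncate(b, keep))
            exact = a * b
            res.append((approx_mul(x, y, level) - exact) / exact)
    n = len(res)
    bias = sum(res) / n
    std = math.sqrt(sum((e - bias) ** 2 for e in res) / n)
    abs_re = [abs(e) for e in res]
    return max(abs_re), sum(abs_re) / n, bias, std


if __name__ == "__main__":
    # 8-bit operands keep the exhaustive sweep fast; 16-bit would need sampling.
    for level in range(3):
        print("level", level, error_metrics(8, level))
    print("level 1, keep 4 bits", error_metrics(8, 1, keep=4))
```

In this model, raising `level` shrinks the error at the cost of one more recursive compensation term (more adder stages in hardware). Lowering `keep` discards operand bits and pushes the bias further negative. The model only estimates accuracy and says nothing about area, power, or delay.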