
Co-Optimizing Data Bit-Width and Operating Voltage for Memristor-Based Neural Network Accelerators

LayCO: Achieving Least Lossy Accuracy for Most Efficient RRAM-Based Deep Neural Network Accelerator via Layer-Centric Co-Optimization

  • Abstract:
    Context: Memristors natively perform matrix-vector multiplication, so building neural network accelerators out of high-performance, low-power, high-density memristors has become a mainstream research direction, one that effectively eases the tension between the limited resources of Internet of Things devices and the high hardware cost of neural networks. In current memristor-based accelerator research, however, the sheer volume of intermediate data produced during inference and training forces in large numbers of power-hungry analog-digital interface circuits, severely limiting the overall efficiency of memristor computing systems. The analog RRAM buffer lowers this energy overhead to some extent, but resource-constrained scenarios still demand further reductions in system energy; moreover, the analog buffer places enormous stress on the endurance and lifetime of memristive devices, a problem that also calls for a targeted solution.
    Objective: Our goal is a layer-centric optimization scheme for memristor-based accelerators that co-optimizes data bit-width and operating voltage. We chiefly study how the bit errors introduced by lowering the write voltage affect the accuracy of deep neural network models, apply bit-width quantization to raise performance, and jointly carry out accuracy recovery and energy regulation, thereby resolving the accelerator's energy and endurance problems while strictly guaranteeing the model's inference accuracy.
    Method: We propose LayCO, which exploits the inherent error tolerance of deep neural networks to raise the accelerator's overall energy efficiency by co-optimizing data bit-width and operating voltage around the network's layer structure (a minimal sketch of this per-layer search follows this abstract). To counter the device-endurance problems caused by the analog buffer and voltage scaling, we further propose a layer-centric data-mapping method that assigns suitable storage partitions to the network, together with a wear-aware data-swapping method that balances writes across the entire memristor array.
    Results & Findings: Experiments show that LayCO strikes a dynamic balance between preserving model accuracy and improving system energy efficiency, improving energy efficiency by an order of magnitude over its comparison targets. Concretely, while keeping the model accuracy loss below 1%, LayCO improves energy efficiency 27x over the TIMELY accelerator, and improves device lifetime and design area 308x and 6x, respectively, over the RAQ accelerator.
    Conclusions: Centered on the layer structure of the neural network, LayCO combines three key strategies, voltage scaling, data bit-width reduction, and principled data mapping and swapping, to reach a dynamic trade-off among model accuracy loss, system energy consumption, and other performance metrics. While strictly preserving DNN accuracy, LayCO beats state-of-the-art systems in energy efficiency, lifetime extension, and area reduction. Beyond further lowering the energy consumption of memristor-based systems, future work will take up computational efficiency and fault-tolerant designs that address the non-ideal behavior of memristive devices, building efficient, reliable accelerators that support the continued advance of deep learning applications.
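
    To make the layer-centric co-optimization concrete, the sketch below shows one way such a per-layer search could be organized. It is a minimal sketch, not LayCO's actual algorithm: the helper evaluate_accuracy(), the candidate bit-width and voltage grids, and the first-order energy model are all illustrative assumptions.

        # Hypothetical per-layer co-search for data bit-width and write voltage.
        # evaluate_accuracy(layer, bits, volt) is an assumed callback that
        # simulates the model with one layer quantized to `bits` and written
        # at `volt` (lower voltage -> more bit errors) and returns accuracy.

        BIT_WIDTHS = [8, 6, 4, 2]          # candidate data bit-widths
        VOLTAGES = [1.0, 0.9, 0.8, 0.7]    # normalized write-voltage levels
        MAX_LOSS = 0.01                    # accuracy-loss budget (< 1%)

        def energy_cost(bits, volt):
            # First-order proxy: write energy grows with bit count and V^2.
            return bits * volt ** 2

        def co_optimize(layers, baseline_acc, evaluate_accuracy):
            """Pick, per layer, the cheapest (bits, volt) pair whose induced
            bit errors keep end-to-end accuracy loss within the budget."""
            config = {}
            for layer in layers:
                safe = [(b, v)
                        for b in BIT_WIDTHS for v in VOLTAGES
                        if baseline_acc - evaluate_accuracy(layer, b, v) <= MAX_LOSS]
                # Fall back to the most conservative setting if nothing passes.
                config[layer] = (min(safe, key=lambda bv: energy_cost(*bv))
                                 if safe else (max(BIT_WIDTHS), max(VOLTAGES)))
            return config

    Treating each layer independently keeps the search linear in the number of layers; the premise of a layer-centric scheme is that different layers tolerate different error rates, which a single network-wide setting cannot exploit.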


    Abstract: Resistive random access memory (RRAM) natively supports massively parallel dot products and accumulations, making RRAM-based accelerators an effective way to bridge the gap between the constrained resources of Internet of Things devices and the tremendous cost of deep neural networks (DNNs). Because analog-to-digital (A/D) conversion and digital accumulation dominate the overhead, an analog RRAM buffer has been introduced to extend processing in the analog, approximate domain. Although the analog RRAM buffer offers a potential answer to the A/D conversion problem, energy consumption remains challenging in resource-constrained environments, especially under enormous intermediate data volumes. Moreover, critical endurance concerns must be resolved before the RRAM buffer can be used routinely for real DNN inference tasks. We therefore propose LayCO, a layer-centric co-optimization scheme that addresses the energy and endurance concerns together while strictly guaranteeing inference accuracy. LayCO rests on two key ideas: 1) co-optimizing reduced supply voltage with reduced bit-width in the accelerator architecture to exploit the DNN's error tolerance and raise the accelerator's energy efficiency, and 2) efficiently mapping and swapping individual DNN data onto corresponding RRAM partitions so as to meet endurance requirements (sketched below). Evaluation on representative DNN models shows that LayCO outperforms the baseline RRAM-buffer-based accelerator with a 27x improvement in energy efficiency (over a TIMELY-like configuration), 308x lifetime prolongation, and 6x area reduction (over RAQ), while keeping the DNN accuracy loss below 1%.
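
    The second idea, wear-aware mapping and swapping, can be pictured with the toy bookkeeping below. This is again a minimal sketch under stated assumptions: the WearAwareMapper class, its per-partition write counters, and SWAP_THRESHOLD are illustrative, not LayCO's actual mechanism.

        # Toy wear-leveling across RRAM partitions: count writes per partition
        # and migrate a hot partition's data to the least-worn partition once
        # the wear gap exceeds a threshold, balancing writes array-wide.

        SWAP_THRESHOLD = 1000  # tolerated write-count gap before migration

        class WearAwareMapper:
            def __init__(self, num_partitions):
                self.writes = [0] * num_partitions  # per-partition write counters
                self.data_of = {}                   # partition index -> resident layer data

            def map_layer(self, layer, partition):
                self.data_of[partition] = layer

            def record_write(self, partition):
                self.writes[partition] += 1
                self._maybe_swap(partition)

            def _maybe_swap(self, hot):
                cold = min(range(len(self.writes)), key=self.writes.__getitem__)
                if self.writes[hot] - self.writes[cold] > SWAP_THRESHOLD:
                    # Exchange residents so future writes land on the cooler partition.
                    self.data_of[hot], self.data_of[cold] = (
                        self.data_of.get(cold), self.data_of.get(hot))

    Frequently rewritten data, such as intermediate activations held in the analog buffer, would dominate record_write() calls, so periodic swapping keeps any single partition from exhausting its endurance long before the rest of the array.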

