LayCO: Achieving Least Lossy Accuracy for Most Efficient RRAM-Based Deep Neural Network Accelerator via Layer-Centric Co-Optimization
-
Abstract
Resistive random access memory (RRAM) can perform massively parallel dot products and accumulations. RRAM-based accelerators are therefore an effective way to bridge the gap between the constrained resources of Internet of Things devices and the tremendous cost of deep neural networks (DNNs). Because analog-to-digital (A/D) conversion and digital accumulation incur huge overhead, analog RRAM buffers have been introduced to extend processing in the analog domain, in approximate form. Although analog RRAM buffers offer a potential solution to the A/D conversion problem, energy consumption remains a challenge in resource-constrained environments, especially given the enormous volume of intermediate data. In addition, critical endurance concerns must be resolved before RRAM buffers can be used frequently in practice for DNN inference. We therefore propose LayCO, a layer-centric co-optimization scheme that addresses the energy and endurance concerns together while strictly guaranteeing inference accuracy. LayCO relies on two key ideas: 1) co-optimizing reduced supply voltage with reduced bit-width of the accelerator architecture to increase the DNN’s error tolerance and improve the accelerator’s energy efficiency, and 2) efficiently mapping and swapping individual DNN data to corresponding RRAM partitions in a way that meets the endurance requirements. Evaluation with representative DNN models shows that, compared with baseline RRAM-buffer-based accelerators, LayCO improves energy efficiency by 27x (over a TIMELY-like configuration), prolongs lifetime by 308x, and reduces area by 6x (over RAQ), while keeping DNN accuracy loss below 1%.