Citation: Zhao SF, Wang F, Liu B et al. LayCO: Achieving least lossy accuracy for most efficient RRAM-based deep neural network accelerator via layer-centric co-optimization. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 38(2): 328−347 Mar. 2023. DOI: 10.1007/s11390-023-2545-y.

LayCO: Achieving Least Lossy Accuracy for Most Efficient RRAM-Based Deep Neural Network Accelerator via Layer-Centric Co-Optimization

Funds: This work was supported by the National Natural Science Foundation of China under Grant Nos. U22A2027, 61832020, 61832007 and 61821003, the Science, Technology and Innovation Commission of Shenzhen Municipality under Grant No. JCYJ20210324141601005, and the Henan Provincial Science and Technology Key Project Foundation under Grant Nos. 212102310085, 222102210054, 222102210154 and 222102210252.
More Information
  • Author Bios:

    Shao-Feng Zhao is currently a Ph.D. candidate at Wuhan National Laboratory for Optoelectronics, Key Laboratory of Information Storage System, Engineering Research Center of Data Storage Systems and Technology, Huazhong University of Science and Technology, Wuhan. He received his M.S. and B.S. degrees in computer science and engineering from Zhengzhou University, Zhengzhou, in 2013 and 2008, respectively. His current research interests focus on deep learning and non-volatile memory systems.

    Fang Wang received her B.E. and Master's degrees in computer science in 1994 and 1997, respectively, and her Ph.D. degree in computer architecture in 2001, all from Huazhong University of Science and Technology (HUST), Wuhan. She is a professor of computer science and engineering at HUST, Wuhan. Her interests include distributed file systems, parallel I/O storage systems, and graph processing systems. She has more than 50 publications in major journals and international conferences, including FGCS, ACM TACO, SCIENCE CHINA Information Sciences, and Chinese Journal of Computers, as well as HiPC, ICDCS, HPDC, and ICPP. She is a member of CCF and IEEE.

    Bo Liu received her Ph.D. degree in computer architecture from Huazhong University of Science and Technology, Wuhan, in 2020. She is now a lecturer at Zhengzhou University, Zhengzhou. Her research interests focus on deep learning, heterogeneous systems, and distributed computing.

    Dan Feng received her B.E., M.E., and Ph.D. degrees in computer science and technology in 1991, 1994, and 1997, respectively, from Huazhong University of Science and Technology (HUST), Wuhan. She is a distinguished professor of the Changjiang Scholars Program and a recipient of the Distinguished Young Scholars award of the National Natural Science Foundation of China (NSFC). She is a professor and the dean of the School of Computer Science and Technology at HUST, Wuhan. She is also the director of the Division of Data Storage System, Wuhan National Laboratory for Optoelectronics, and the director of the Key Laboratory of Information Storage System, Ministry of Education of China. Her research interests include computer architecture, non-volatile memory technology, distributed and parallel file systems, and massive storage systems. She has more than 300 publications in major journals and international conferences, including IEEE TC, IEEE TPDS, IEEE TCAD, ACM TOS, ISCA, FAST, USENIX ATC, EuroSys, ICDCS, HPDC, SC, ICS, IPDPS, DAC, and DATE. She serves as an associate editor of IEEE LOCS. She has reviewed for multiple journals, including IEEE TC and IEEE TPDS, and has served on the program committees of multiple international conferences, including SC 2011 & 2013, MSST 2012 & 2015, SRDS 2020, and FAST 2022. She served as the chair of the Information Storage Technology Committee of CCF from 2016 to 2020. She is a senior member of CCF and a member of ACM and IEEE.

    Yang Liu received his B.S. and M.S. degrees in computer science and technology from Zhengzhou University, Zhengzhou, in 2003 and 2006, respectively, and his Ph.D. degree in computer architecture from Huazhong University of Science and Technology, Wuhan, in 2013. He is currently an associate professor at the Cloud Computing and Big Data Institute, Henan University of Economics and Law, Zhengzhou. His research interests include massive storage systems, data mining, and machine learning.

  • Corresponding author:

    Fang Wang initiated the project and coordinated the research.

    Bo Liu worked on the conceptualization and methodology.

  • Received Date: May 30, 2022
  • Accepted Date: March 25, 2023
  • Abstract: Resistive random access memory (RRAM) enables massively parallel dot-product and accumulation operations. RRAM-based accelerators are thus an effective approach to bridging the gap between the constrained resources of Internet of Things devices and the tremendous cost of deep neural networks (DNNs). Because analog-to-digital (A/D) conversion and digital accumulation incur huge overhead, analog RRAM buffers have been introduced to keep processing in the analog, approximate domain. Although the analog RRAM buffer offers a potential solution to the A/D conversion issue, its energy consumption remains challenging in resource-constrained environments, especially under enormous intermediate data volumes. Besides, critical concerns over endurance must also be resolved before the RRAM buffer can be used frequently in practice for DNN inference tasks. We therefore propose LayCO, a layer-centric co-optimization scheme that addresses the energy and endurance concerns together while strictly guaranteeing inference accuracy. LayCO relies on two key ideas: 1) co-optimizing the reduced supply voltage and reduced bit-width of the accelerator architecture to exploit the DNN's error tolerance and improve the accelerator's energy efficiency, and 2) efficiently mapping and swapping individual DNN data to corresponding RRAM partitions in a way that meets the endurance requirements. Evaluation with representative DNN models demonstrates that LayCO outperforms the baseline RRAM-buffer-based accelerator with a 27x improvement in energy efficiency (over a TIMELY-like configuration), a 308x lifetime extension, and a 6x area reduction (over RAQ), while keeping the DNN accuracy loss below 1%.
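
    The mechanisms above are architectural, but the two key ideas can be illustrated in a few lines of code. The Python sketch below is a rough, hypothetical rendering, not the authors' design: the function names, (voltage, bit-width) settings, and loss numbers are all assumptions. It pairs a greedy per-layer voltage/bit-width search that spends a total accuracy-loss budget with a wear-leveled mapping of layer buffers onto RRAM partitions.

```python
# Illustrative sketch only: names, settings, and numbers are hypothetical
# assumptions, not taken from the LayCO paper.
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    # Assumed offline profile: estimated accuracy loss (%) incurred by
    # running this layer at a given (supply voltage mV, bit-width) setting.
    loss_at: dict

@dataclass
class Partition:
    pid: int
    writes: int = 0  # wear counter for the RRAM partition

def co_optimize(layers, budget_pct=1.0,
                settings=((700, 4), (800, 6), (900, 8))):
    """Greedy per-layer search: pick the most aggressive (lowest-energy)
    setting for each layer while the summed estimated accuracy loss of
    the whole network stays within the budget (e.g., below 1%)."""
    plan, spent = {}, 0.0
    for layer in layers:
        for v, b in sorted(settings):  # lowest voltage/bit-width first
            loss = layer.loss_at.get((v, b), float("inf"))
            if spent + loss <= budget_pct:
                plan[layer.name] = (v, b)
                spent += loss
                break
        else:  # no aggressive setting fits: keep nominal, assumed lossless
            plan[layer.name] = max(settings)
    return plan, spent

def map_to_partitions(layers, partitions):
    """Wear-leveled mapping: each layer's intermediate buffer is placed on
    (or swapped to) the RRAM partition with the fewest accumulated writes."""
    mapping = {}
    for layer in layers:
        target = min(partitions, key=lambda p: p.writes)
        mapping[layer.name] = target.pid
        target.writes += 1  # simplified: one buffer write per inference
    return mapping

if __name__ == "__main__":
    net = [
        Layer("conv1", {(700, 4): 0.9, (800, 6): 0.4, (900, 8): 0.0}),
        Layer("conv2", {(700, 4): 0.3, (800, 6): 0.1, (900, 8): 0.0}),
        Layer("fc",    {(700, 4): 0.2, (800, 6): 0.1, (900, 8): 0.0}),
    ]
    plan, loss = co_optimize(net, budget_pct=1.0)
    print("per-layer (mV, bits):", plan, "| est. total loss: %.2f%%" % loss)
    print("buffer placement:", map_to_partitions(net, [Partition(i) for i in range(4)]))
```

    A greedy budget-constrained search is only one plausible reading of the co-optimization; the point of the sketch is the layer-centric structure: both the (voltage, bit-width) setting and the buffer placement are decided per layer, so error-tolerant layers absorb the aggressive low-energy settings while sensitive layers stay at nominal precision.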

  • References

    [1] Jin H, Liu B, Jiang W B, Ma Y, Shi X H, He B S, Zhao S F. Layer-centric memory reuse and data migration for extreme-scale deep learning on many-core architectures. ACM Trans. Architecture and Code Optimization, 2018, 15(3): Article No. 37. DOI: 10.1145/3243904.
    [2] Yang T J, Chen Y H, Sze V. Designing energy-efficient convolutional neural networks using energy-aware pruning. In Proc. the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Jul. 2017, pp.6071–6079. DOI: 10.1109/CVPR.2017.643.
    [3] He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In Proc. the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2016, pp.770–778. DOI: 10.1109/CVPR.2016.90.
    [4] Xie S N, Girshick R, Dollár P, Tu Z W, He K M. Aggregated residual transformations for deep neural networks. In Proc. the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Jul. 2017, pp.5987–5995. DOI: 10.1109/CVPR.2017.634.
    [5] Chen Y H, Emer J, Sze V. Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks. ACM SIGARCH Computer Architecture News, 2016, 44(3): 367–379. DOI: 10.1145/3007787.3001177.
    [6] Gao M Y, Yang X, Pu J, Horowitz M, Kozyrakis C. TANGRAM: Optimized coarse-grained dataflow for scalable NN accelerators. In Proc. the 24th International Conference on Architectural Support for Programming Languages and Operating Systems, Apr. 2019, pp.807–820. DOI: 10.1145/3297858.3304014.
    [7] Chi P, Li S C, Xu C, Zhang T, Zhao J S, Liu Y P, Wang Y, Xie Y. PRIME: A novel processing-in-memory architecture for neural network computation in ReRAM-based main memory. ACM SIGARCH Computer Architecture News, 2016, 44(3): 27–39. DOI: 10.1145/3007787.3001140.
    [8] Shafiee A, Nag A, Muralimanohar N, Balasubramonian R, Strachan J P, Hu M, Williams R S, Srikumar V. ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars. ACM SIGARCH Computer Architecture News, 2016, 44(3): 14–26. DOI: 10.1145/3007787.3001139.
    [9] Song L H, Qian X H, Li H, Chen Y R. PipeLayer: A pipelined ReRAM-based accelerator for deep learning. In Proc. the 2017 IEEE International Symposium on High Performance Computer Architecture, Feb. 2017, pp.541–552. DOI: 10.1109/HPCA.2017.55.
    [10] Zhu Z H, Sun H B, Lin Y J, Dai G H, Xia L X, Han S, Wang Y, Yang H Z. A configurable multi-precision CNN computing framework based on single bit RRAM. In Proc. the 56th Annual Design Automation Conference, Jun. 2019, Article No. 56. DOI: 10.1145/3316781.3317739.
    [11] Chou T, Tang W, Botimer J, Zhang Z Y. CASCADE: Connecting RRAMs to extend analog dataflow in an end-to-end in-memory processing paradigm. In Proc. the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, Oct. 2019, pp.114–125. DOI: 10.1145/3352460.3358328.
    [12] Li W T, Xu P F, Zhao Y, Li H T, Xie Y, Lin Y Y. TIMELY: Pushing data movements and interfaces in PIM accelerators towards local and in time domain. In Proc. the 2020 ACM/IEEE Annual International Symposium on Computer Architecture, May 30–June 3, 2020, pp.832–845. DOI: 10.1109/ISCA45697.2020.00073.
    [13] Waser R, Dittmann R, Staikov G, Szot K. Redox-based resistive switching memories: Nanoionic mechanisms, prospects, and challenges. Advanced Materials, 2009, 21(25/26): 2632–2663. DOI: 10.1002/adma.200900375.
    [14] Wong H S P, Lee H Y, Yu S M, Chen Y S, Wu Y, Chen P S, Lee B, Chen F T, Tsai M J. Metal-oxide RRAM. Proceedings of the IEEE, 2012, 100(6): 1951–1970. DOI: 10.1109/JPROC.2012.2190369.
    [15] Chou C C, Lin Z J, Tseng P L, Li C F, Chang C Y, Chen W C, Chih Y D, Chang T Y J. An N40 256K×44 embedded RRAM macro with SL-precharge SA and low-voltage current limiter to improve read and write performance. In Proc. the 2018 IEEE International Solid-State Circuits Conference, Feb. 2018, pp.478–480. DOI: 10.1109/ISSCC.2018.8310392.
    [16] Yang J G, Xue X Y, Xu X X, Wang Q, Jiang H J, Yu J, Dong D N, Zhang F, Lv H B, Liu M. A 14nm-FinFET 1Mb embedded 1T1R RRAM with a 0.022μm2 cell size using self-adaptive delayed termination and multi-cell reference. In Proc. the 2021 IEEE International Solid-State Circuits Conference, Feb. 2021, pp.336–338. DOI: 10.1109/ISSCC42613.2021.9365945.
    [17] Yao P, Wu H Q, Gao B, Eryilmaz S B, Huang X Y, Zhang W Q, Zhang Q T, Deng N, Shi L P, Wong H S P, Qian H. Face classification using electronic synapses. Nature Communications, 2017, 8: Article No. 15199. DOI: 10.1038/ncomms15199.
    [18] Strukov D B. Endurance-write-speed tradeoffs in nonvolatile memories. Applied Physics A, 2016, 122(4): Article No. 302. DOI: 10.1007/s00339-016-9841-0.
    [19] Vogelsang T. Understanding the energy consumption of dynamic random access memories. In Proc. the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, Dec. 2010, pp.363–374. DOI: 10.1109/MICRO.2010.42.
    [20] Koppula S, Orosa L, Yağlıkçı A G, Azizi R, Shahroodi T, Kanellopoulos K, Mutlu O. EDEN: Enabling energy-efficient, high-performance deep neural network inference using approximate DRAM. In Proc. the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, Oct. 2019, pp.166–181. DOI: 10.1145/3352460.3358280.
    [21] Indiveri G, Linn E, Ambrogio S. ReRAM-based neuromorphic computing. In Resistive Switching: From Fundamentals of Nanoionic Redox Processes to Memristive Device Applications, Ielmini D, Waser R (eds.), Wiley-VCH, 2016, pp.715–736. DOI: 10.1002/9783527680870.ch25.
    [22] Chandramoorthy N, Swaminathan K, Cochet M, Paidimarri A, Eldridge S, Joshi R V, Ziegler M M, Buyuktosunoglu A, Bose P. Resilient low voltage accelerators for high energy efficiency. In Proc. the 2019 IEEE International Symposium on High Performance Computer Architecture, Feb. 2019, pp.147–158. DOI: 10.1109/HPCA.2019.00034.
    [23] Sandrini J. Fabrication, characterization and integration of resistive random access memories [Ph.D. Thesis]. École Polytechnique Fédérale de Lausanne, Switzerland, 2017. DOI: 10.5075/epfl-thesis-8097.
    [24] Hirtzlin T, Bocquet M, Klein J O, Nowak E, Vianello E, Portal J M, Querlioz D. Outstanding bit error tolerance of resistive RAM-based binarized neural networks. In Proc. the 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems, Mar. 2019, pp.288–292. DOI: 10.1109/AICAS.2019.8771544.
    [25] Li G P, Hari S K S, Sullivan M, Tsai T, Pattabiraman K, Emer J, Keckler S W. Understanding error propagation in deep learning neural network (DNN) accelerators and applications. In Proc. the International Conference for High Performance Computing, Networking, Storage and Analysis, Nov. 2017, Article No. 8. DOI: 10.1145/3126908.3126964.
    [26] Geng Q H, Zhou Z, Cao X C. Survey of recent progress in semantic image segmentation with CNNs. SCIENCE CHINA Information Sciences, 2018, 61(5): Article No. 051101. DOI: 10.1007/s11432-017-9189-6.
    [27] Krizhevsky A. Learning multiple layers of features from tiny images. Technical Report, TR-2009, University of Toronto, 2009. http://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf, Mar. 2023.
    [28] Xia L X, Liu M Y, Ning X F, Chakrabarty K, Wang Y. Fault-tolerant training with on-line fault detection for RRAM-based neural computing systems. In Proc. the 54th Annual Design Automation Conference, Jun. 2017, Article No. 33. DOI: 10.1145/3061639.3062248.
    [29] Ketkar N, Moolayil J. Introduction to PyTorch. In Deep Learning with Python: Learn Best Practices of Deep Learning Models with PyTorch, Ketkar N, Moolayil J (eds.), Apress, 2021, pp.27–91. DOI: 10.1007/978-1-4842-5364-9_2.
    [30] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In Proc. the 25th International Conference on Neural Information Processing Systems, Dec. 2012, pp.1097–1105. DOI: 10.5555/2999134.2999257.
    [31] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In Proc. the 3rd International Conference on Learning Representations, May 2015.
    [32] Deng J, Dong W, Socher R, Li L J, Li K, Li F F. ImageNet: A large-scale hierarchical image database. In Proc. the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2009, pp.248–255. DOI: 10.1109/CVPR.2009.5206848.
    [33] Cai Y, Lin Y J, Xia L X, Chen X M, Han S, Wang Y, Yang H Z. Long live TIME: Improving lifetime for training-in-memory engines by structured gradient sparsification. In Proc. the 55th Annual Design Automation Conference, Jun. 2018, Article No. 107. DOI: 10.1145/3195970.3196071.
    [34] Wang F, Luo G J, Sun G Y, Wang Y H, Niu D M, Zheng H Z. Area efficient pattern representation of binary neural networks on RRAM. Journal of Computer Science and Technology, 2021, 36(5): 1155–1166. DOI: 10.1007/s11390-021-0906-y.
    [35] Liu B, Cai H, Wang Z, Sun Y H, Shen Z Y, Zhu W T, Li Y, Gong Y, Ge W, Yang J, Shi L X. A 22nm, 10.8 μW/15.1 μW dual computing modes high power-performance-area efficiency domained background noise aware keyword-spotting processor. IEEE Trans. Circuits and Systems I: Regular Papers, 2020, 67(12): 4733–4746. DOI: 10.1109/TCSI.2020.2997913.
    [36] Liu B, Cai H, Zhang Z L, Ding X L, Wang Z Y, Gong Y, Liu W Q, Yang J J, Wang Z, Yang J. More is less: Domain-specific speech recognition microprocessor using one-dimensional convolutional recurrent neural network. IEEE Trans. Circuits and Systems I: Regular Papers, 2022, 69(4): 1571–1582. DOI: 10.1109/TCSI.2021.3134271.
    [37] Xiao T P, Feinberg B, Bennett C H, Agrawal V, Saxena P, Prabhakar V, Ramkumar K, Medu H, Raghavan V, Chettuvetty R, Agarwal S, Marinella M J. An accurate, error-tolerant, and energy-efficient neural network inference engine based on SONOS analog memory. IEEE Trans. Circuits and Systems I: Regular Papers, 2022, 69(4): 1480–1493. DOI: 10.1109/TCSI.2021.3134313.