2016, Vol. 31, Issue (1): 3-19. doi: 10.1007/s11390-016-1608-8

Topic: Computer Architecture and Systems

• Special Section on Selected Paper from NPC 2011 •

Technological Exploration of RRAM Crossbar Array for Matrix-Vector Multiplication

Lixue Xia1,2, Student Member, IEEE, Peng Gu1,2, Student Member, IEEE, Boxun Li1,2, Student Member, IEEE, Tianqi Tang1,2, Student Member, IEEE, Xiling Yin1,2, Wenqin Huangfu1,2, Shimeng Yu3, Member, IEEE, Yu Cao3, Senior Member, IEEE, Yu Wang1,2*, Senior Member, IEEE, and Huazhong Yang1,2, Senior Member, IEEE

    1 Department of Electronic Engineering, Tsinghua University, Beijing 100084, China;
    2 Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing 100084, China;
    3 School of Electrical, Computer and Energy Engineering, Arizona State University, Arizona 85281, U.S.A.
  • Received: 2015-09-01  Revised: 2015-12-08  Online: 2016-01-05  Published: 2016-01-05
  • Contact: Yu Wang, E-mail: yu-wang@mail.tsinghua.edu.cn
  • About author: Lixue Xia received his B.S. degree in electronic engineering from Tsinghua University, Beijing, in 2013. He is currently pursuing his Ph.D. degree in the Department of Electronic Engineering, Tsinghua University, Beijing. His research focuses mainly on energy-efficient hardware computing system design and neuromorphic computing systems based on emerging non-volatile devices.
  • Supported by:

    This work was supported by the National Basic Research 973 Program of China under Grant No. 2013CB329000, the National Natural Science Foundation of China under Grant Nos. 61373026 and 61261160501, the Brain Inspired Computing Research of Tsinghua University under Grant No. 20141080934, the Tsinghua University Initiative Scientific Research Program, and the Importation and Development of High-Caliber Talents Project of Beijing Municipal Institutions.

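Matrix-vector multiplication is widely used in computation-intensive algorithms, and RRAM (memristor) devices and the crossbar arrays built from them can perform matrix-vector multiplication quickly and efficiently through analog computation. Working at both the device level and the circuit level, this paper analyzes how several non-ideal factors, including the nonlinear voltage-current characteristics of RRAM devices, random deviations in RRAM resistance, and the interconnects of the RRAM array, affect the energy consumption, computation error, and other system-level metrics of an RRAM array performing matrix-vector multiplication. Based on this analysis, we propose a technological design flow that addresses these non-ideal factors: by optimizing several circuit parameters and improving the method used to map values onto RRAM resistances, it trades off and optimizes system metrics such as energy, accuracy, and robustness. We further use a support vector machine (SVM) as a practical application of matrix-vector multiplication and validate the design flow by simulation on the MNIST handwritten digit dataset. Simulation results show that our design improves classification accuracy by 10.98% while saving 26.4% of energy. In addition, for applications that can tolerate some computation error, our method can save up to 84.4% of power.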
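The analog computation referred to above relies on Ohm's law and Kirchhoff's current law: input voltages drive the rows, each RRAM cell contributes a current proportional to its conductance, and each column sums those currents. The Python sketch below illustrates this under simplifying assumptions of our own (an ideal linear crossbar with no wire resistance, non-negative weights, and a plain linear weight-to-conductance mapping). The constants `G_MIN`/`G_MAX` and all function names are hypothetical, and this is not the mapping method proposed in the paper.

```python
"""Minimal sketch of analog matrix-vector multiplication on an ideal RRAM crossbar.

Assumptions (not from the paper): no wire resistance, linear cells,
non-negative weights mapped linearly onto [G_MIN, G_MAX], inputs applied
as row voltages, outputs read as column currents.
"""

G_MIN = 1e-6  # hypothetical lowest programmable conductance (S), i.e. 1 MOhm
G_MAX = 1e-4  # hypothetical highest programmable conductance (S), i.e. 10 kOhm


def weights_to_conductance(w, w_max):
    """Map weights in [0, w_max] linearly onto [G_MIN, G_MAX] (requires w_max > 0)."""
    scale = (G_MAX - G_MIN) / w_max
    return [[G_MIN + x * scale for x in row] for row in w], scale


def crossbar_currents(g, v):
    """Column currents I_j = sum_i V_i * G_ij (Ohm's law plus Kirchhoff's current law)."""
    return [sum(v[i] * g[i][j] for i in range(len(g))) for j in range(len(g[0]))]


def crossbar_mvm(w, v):
    """Return y_j = sum_i v_i * w_ij, i.e. W^T v, computed via crossbar currents."""
    w_max = max(max(row) for row in w)
    g, scale = weights_to_conductance(w, w_max)
    currents = crossbar_currents(g, v)
    # Every cell conducts at least G_MIN, so remove that offset before rescaling.
    offset = G_MIN * sum(v)
    return [(i_col - offset) / scale for i_col in currents]


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0], [0.5, 0.0]]  # 3 input rows x 2 output columns
    v = [0.1, 0.2, 0.3]                        # row input voltages (V)
    print([round(y, 6) for y in crossbar_mvm(W, v)])  # → [0.85, 1.0]
```

Subtracting the `G_MIN * sum(v)` offset is necessary because even a zero weight is stored as a nonzero conductance. Signed weights would need additional handling (for example, a pair of arrays whose difference encodes the sign), which this sketch omits.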

Abstract: Matrix-vector multiplication is the key operation in many computationally intensive algorithms. The emerging metal-oxide resistive switching random access memory (RRAM) device and the RRAM crossbar array have demonstrated a promising hardware realization of analog matrix-vector multiplication with ultra-high energy efficiency. In this paper, we analyze the impact of both device-level and circuit-level non-ideal factors, including the nonlinear current-voltage relationship of RRAM devices, variations in device fabrication and write operations, and interconnect resistance as well as other crossbar array parameters. Building on this analysis, we propose a technological exploration flow for device parameter configuration to overcome the impact of non-ideal factors and achieve a better trade-off among performance, energy, and reliability for each specific application. Simulation results of a support vector machine (SVM) on the Mixed National Institute of Standards and Technology (MNIST) pattern recognition dataset show that the RRAM crossbar array based SVM is robust to input signal fluctuation but sensitive to tunneling gap deviation. A further resistance resolution test shows that a 6-bit RRAM device can achieve a recognition accuracy of around 90%, indicating the physical feasibility of an RRAM crossbar array based SVM. In addition, the proposed technological exploration flow achieves a 10.98% improvement in recognition accuracy on the MNIST dataset and 26.4% energy savings compared with previous work. Experimental results also show that more than 84.4% power savings can be achieved at the cost of a small reduction in accuracy.
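To make the non-ideal factors listed in the abstract more concrete, here is a small Monte Carlo sketch in Python that perturbs an ideal crossbar with three of them: limited resistance resolution, device-to-device variation, and a nonlinear I-V curve. The device models (evenly spaced conductance levels, a lognormal variation factor, and a sinh-shaped I-V relation), the constants `G_MIN`, `G_MAX`, and `V0`, and every function name are illustrative assumptions, not the models or parameters used in the paper. Interconnect resistance is ignored, so the printed numbers should not be read as reproducing the paper's results.

```python
"""Monte Carlo sketch of how RRAM non-idealities perturb crossbar MVM.

All models and constants here are illustrative assumptions, not the paper's:
- resistance resolution: conductances snapped to 2**bits evenly spaced levels;
- device variation: each programmed conductance multiplied by a lognormal factor;
- nonlinear I-V: cell current I = G * V0 * sinh(V / V0), which reduces to
  Ohm's law (I ~ G * V) when V << V0.
Interconnect (wire) resistance is ignored.
"""
import math
import random

G_MIN, G_MAX = 1e-6, 1e-4  # hypothetical conductance range (S)
V0 = 0.25                  # hypothetical nonlinearity voltage scale (V)


def quantize(g, bits):
    """Snap a conductance to the nearest of 2**bits evenly spaced levels in [G_MIN, G_MAX]."""
    step = (G_MAX - G_MIN) / (2 ** bits - 1)
    return G_MIN + round((g - G_MIN) / step) * step


def cell_current(g, v, nonlinear):
    """Current through one cell: linear Ohm's law, or the assumed sinh model."""
    return g * V0 * math.sinh(v / V0) if nonlinear else g * v


def mvm(g, v, nonlinear=False):
    """Column currents of a crossbar with no wire resistance."""
    return [sum(cell_current(g[i][j], v[i], nonlinear) for i in range(len(g)))
            for j in range(len(g[0]))]


def relative_error(g_ideal, v, bits, sigma, nonlinear, rng):
    """Relative L2 error of the non-ideal column currents vs. the ideal linear ones."""
    # Quantize first (limited resolution), then apply a lognormal variation factor.
    g_real = [[quantize(x, bits) * math.exp(rng.gauss(0.0, sigma)) for x in row]
              for row in g_ideal]
    ideal = mvm(g_ideal, v)
    real = mvm(g_real, v, nonlinear)
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(real, ideal)))
    den = math.sqrt(sum(a ** 2 for a in ideal))
    return num / den


if __name__ == "__main__":
    rng = random.Random(0)
    n = 32
    g = [[rng.uniform(G_MIN, G_MAX) for _ in range(n)] for _ in range(n)]
    v = [rng.uniform(0.0, 0.1) for _ in range(n)]
    for bits in (2, 4, 6, 8):
        err = relative_error(g, v, bits, sigma=0.0, nonlinear=False, rng=rng)
        print(f"{bits}-bit levels, no variation: {err:.4f}")
    err = relative_error(g, v, 6, sigma=0.1, nonlinear=True, rng=rng)
    print(f"6-bit + lognormal variation (sigma=0.1) + sinh I-V: {err:.4f}")
```

Sweeping `bits` and `sigma` in this way mirrors, in a very simplified form, the resistance resolution and variation tests summarized in the abstract.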

Copyright © Editorial Office of Journal of Computer Science and Technology