Journal of Computer Science and Technology ›› 2021, Vol. 36 ›› Issue (5): 1155-1166. doi: 10.1007/s11390-021-0906-y

Special Section: Computer Architecture and Systems



Area Efficient Pattern Representation of Binary Neural Networks on RRAM

Feng Wang1, Guo-Jie Luo1,*, Member, CCF, ACM, IEEE, Guang-Yu Sun1, Member, CCF, ACM, IEEE, Yu-Hao Wang2, Di-Min Niu2, and Hong-Zhong Zheng2

  1 Center for Energy-Efficient Computing and Applications, Peking University, Beijing 100871, China
    2 Pingtouge, Alibaba Group, Hangzhou 310052, China
  • Received: 2020-08-14 Revised: 2021-08-26 Online: 2021-09-30 Published: 2021-09-30
  • About author:Feng Wang received his Ph.D. degree in computer science from Peking University, Beijing, in 2021. He is currently a research scientist in LEDA Technology. His research interests include processing-in-memory and EDA algorithms.
  • Supported by:
    This work is partly supported by the National Key Research and Development Program of China under Grant No. 2020AAA0130400, Beijing Municipal Science and Technology Program of China under Grant No. Z201100004220007, the National Natural Science Foundation of China under Grant No. 62090021, Beijing Academy of Artificial Intelligence (BAAI), and Alibaba Innovative Research (AIR) Program.

1. Context:
In recent years, several studies have used RRAM crossbars to perform multiply-and-accumulate (MAC) operations in parallel, and thereby to accelerate the fully-connected and convolutional layers of convolutional neural networks (CNNs). Because CNNs require a large number of digital-analog converters, other studies have instead mapped binary neural networks (BNNs) onto RRAM; BNN weights are -1 and +1, which keeps the conversion overhead small. However, both mainstream BNN weight representations introduce many redundant 0s and 1s when representing negative weights.
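As a hedged illustration of this redundancy (not taken from the paper; the variable names and the shift-plus-correction trick are assumptions for exposition), the two common ways of realizing {+1, -1} weights with nonnegative RRAM cell values can be sketched in NumPy:

```python
import numpy as np

w = np.array([+1, -1, +1, -1, -1])   # one binarized weight column
x = np.array([1, 0, 1, 1, 0])        # binary input activations

# Representation 1: complementary column pair (w = w_pos - w_neg).
# Every weight occupies two cells and exactly one of them is 0,
# so half of the cells store redundant 0s.
w_pos = (w == +1).astype(int)
w_neg = (w == -1).astype(int)
out1 = x @ w_pos - x @ w_neg

# Representation 2: shift mapping {-1 -> 0, +1 -> 1} plus a digital
# correction term; the reference column of all 1s adds redundant 1s.
b = (w == +1).astype(int)
out2 = 2 * (x @ b) - x.sum()

assert out1 == out2 == x @ w  # both recover the signed MAC result
```

Either way, a crossbar that can only hold nonnegative values spends extra cells on every negative weight, which is the redundancy the pattern representation targets.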
2. Objective:
In this work, we aim to reduce these redundant 0s and 1s and thus save crossbar area. To this end, we propose a novel pattern-based weight representation and design a corresponding hardware architecture.
3. Method:
First, we split the weight matrix into several small matrices with a nearest-neighbor algorithm. Then, we extract patterns of 1s from each small matrix, so that every weight column can be composed from these patterns. Next, we map the patterns onto RRAM crossbars: pattern computation crossbars compute the values of the patterns, and pattern accumulation crossbars accumulate these values into the final outputs. Finally, we compare our pattern representation with the traditional representations and adopt the more area-efficient one.
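A deliberately simplified sketch of this compute-then-accumulate flow, assuming a toy extraction rule (grouping identical rows into one all-1s pattern; the paper's nearest-neighbor clustering and pattern extraction are more sophisticated):

```python
import numpy as np
from collections import defaultdict

def extract_patterns(W):
    """W: 0/1 weight matrix. Group rows with identical 1-placement;
    each group is an all-1s submatrix (a 'pattern') shared by the
    columns where the row has a 1."""
    groups = defaultdict(list)
    for r, row in enumerate(W):
        groups[tuple(row)].append(r)
    return [(rows, np.array(key)) for key, rows in groups.items()]

def mac_via_patterns(W, x):
    """Compute x @ W pattern-wise: first sum the inputs of each
    pattern (pattern computation), then scatter each partial sum to
    the output columns that use the pattern (pattern accumulation)."""
    out = np.zeros(W.shape[1], dtype=int)
    for rows, mask in extract_patterns(W):
        partial = sum(x[r] for r in rows)  # pattern computation step
        out += partial * mask              # pattern accumulation step
    return out

W = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 1, 1],
              [1, 1, 0]])
x = np.array([1, 0, 1, 1])
assert np.array_equal(mac_via_patterns(W, x), x @ W)
```

The saving comes from computing each shared pattern's partial sum once instead of once per output column; in hardware the two loops map to the pattern computation and pattern accumulation crossbars, respectively.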
4. Results & Findings:
We evaluated convolutional and fully-connected layers on MNIST and CIFAR-10. Compared with the two mainstream weight representations, our pattern representation is effective in over 70% of the test cases and saves about 20% of the crossbar area on average.
5. Conclusions:
Unlike traditional methods, which map the weights directly, our pattern representation first extracts patterns and then reconstructs the original outputs from them. The experimental results show that this approach is particularly effective for weight matrices much larger than the crossbar size. Since peripheral circuits occupy most of the area, we plan to explore reducing their area in future work.


Abstract: Resistive random access memory (RRAM) has been demonstrated to implement multiply-and-accumulate (MAC) operations in a highly parallel analog fashion, which dramatically accelerates convolutional neural networks (CNNs). Since CNNs require a considerable number of converters between analog crossbars and digital peripheral circuits, recent studies map binary neural networks (BNNs) onto RRAM and binarize the weights to {+1, -1}. However, the two mainstream representations for BNN weights introduce redundant 0s and 1s when dealing with negative weights. In this work, we reduce the area occupied by these redundant 0s and 1s by proposing a BNN weight representation framework based on a novel pattern representation and a corresponding architecture. First, we split the weight matrix into several small matrices by clustering adjacent columns together. Second, we extract 1s' patterns, i.e., the submatrices containing only 1s, from each small weight matrix, such that each final output can be represented by the sum of several patterns. Third, we map these patterns onto RRAM crossbars, including pattern computation crossbars (PCCs) and pattern accumulation crossbars (PACs). Finally, we compare the pattern representation with the two mainstream representations and adopt the more area-efficient one. The evaluation results demonstrate that our framework can save over 20% of the crossbar area compared with the two mainstream representations.

Key words: binary neural network (BNN), pattern, resistive random access memory (RRAM)
