Journal of Computer Science and Technology 2017, Vol. 32 Issue (2) :286-296    DOI: 10.1007/s11390-017-1722-2
Computer Architecture and Systems
DLPlib: A Library for Deep Learning Processor
Hui-Ying Lan1,2,3, Lin-Yang Wu1,2,3, Student Member, CCF, Xiao Zhang1,2, Jin-Hua Tao1,2, Xun-Yu Chen1,2, Bing-Rui Wang1,2,4, Yu-Qing Wang1,2,4, Qi Guo1,2, Member, CCF, Yun-Ji Chen1,2
1 State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
2 Microprocessor Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
3 University of Chinese Academy of Sciences, Beijing 100049, China;
4 Department of Computer Science, University of Science and Technology of China, Hefei 230026, China

Abstract: Recently, deep learning processors have become one of the most promising ways to accelerate deep learning algorithms. Currently, the only way to program a deep learning processor is to write assembly instructions by hand, which costs considerable programming effort and yields low efficiency. One solution is to integrate the deep learning processor as a new back-end into a single prevalent high-level deep learning framework (e.g., the TPU (tensor processing unit) is integrated directly into TensorFlow). However, this prevents other frameworks from benefiting from the programming interface. The alternative is to design a framework-independent low-level library for deep learning processors (analogous to cuDNN, the deep learning library for GPUs). In this fashion, the library can be conveniently invoked from high-level programming frameworks and offers more generality. To allow more deep learning frameworks to benefit from such an environment, we envision it as a low-level library that can be easily embedded into current high-level frameworks while providing high performance. Three major issues in designing such a library are discussed. The first is the design of the data structures: they should be as few as possible while supporting all required operations, which makes them easier to optimize without compromising generality. The second is the selection of operations, which should cover a wide enough range to support various types of networks with high efficiency. The third is the design of the API, which should provide a flexible and user-friendly programming model and be easy to embed into existing deep learning frameworks. Considering all the above issues, we propose DLPlib, a tensor-filter-based library designed specifically for deep learning processors. It contains two major data structures, tensor and filter, and a set of operators including basic neural network primitives and matrix/vector operations. It provides a descriptor-based API exposed as a C++ interface. The library achieves a speedup of 0.79x compared with the performance of hand-written assembly instructions.
Keywords:   
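To make the descriptor-based, tensor-filter programming model more concrete, the following is a minimal C++ sketch in the spirit of the cuDNN-style descriptor APIs the abstract alludes to. All names here (TensorDesc, FilterDesc, ConvDesc, convolutionForward, the dlp_sketch namespace) are illustrative assumptions for this sketch, not the actual DLPlib interface, and the naive host-side loop stands in for work that a deep-learning-processor library would lower to device instructions.

// Hypothetical sketch of a descriptor-based tensor/filter API (cuDNN-style).
// All names below are illustrative assumptions, not the real DLPlib interface.
#include <cstdio>
#include <vector>

namespace dlp_sketch {

// Descriptor for a 4-D activation tensor in NCHW layout.
struct TensorDesc { int n, c, h, w; };

// Descriptor for a filter bank: output channels x input channels x kH x kW.
struct FilterDesc { int k, c, h, w; };

// Descriptor for a convolution (stride 1, no padding, to keep the sketch short).
struct ConvDesc { };

// Naive host-side reference convolution; a real deep-learning-processor library
// would instead dispatch this call to device code generated for the accelerator.
void convolutionForward(const TensorDesc& xd, const std::vector<float>& x,
                        const FilterDesc& wd, const std::vector<float>& w,
                        const ConvDesc&, TensorDesc& yd, std::vector<float>& y) {
    yd = {xd.n, wd.k, xd.h - wd.h + 1, xd.w - wd.w + 1};
    y.assign(static_cast<size_t>(yd.n) * yd.c * yd.h * yd.w, 0.0f);
    for (int n = 0; n < yd.n; ++n)
      for (int k = 0; k < yd.c; ++k)
        for (int oh = 0; oh < yd.h; ++oh)
          for (int ow = 0; ow < yd.w; ++ow) {
            float acc = 0.0f;
            for (int c = 0; c < xd.c; ++c)
              for (int fh = 0; fh < wd.h; ++fh)
                for (int fw = 0; fw < wd.w; ++fw) {
                  size_t xi = ((static_cast<size_t>(n) * xd.c + c) * xd.h + oh + fh) * xd.w + ow + fw;
                  size_t wi = ((static_cast<size_t>(k) * wd.c + c) * wd.h + fh) * wd.w + fw;
                  acc += x[xi] * w[wi];
                }
            y[((static_cast<size_t>(n) * yd.c + k) * yd.h + oh) * yd.w + ow] = acc;
          }
}

}  // namespace dlp_sketch

int main() {
    using namespace dlp_sketch;
    // Describe a 1x3x8x8 input and a bank of 4 filters of size 3x3x3.
    TensorDesc xd{1, 3, 8, 8};
    FilterDesc wd{4, 3, 3, 3};
    ConvDesc cd{};
    std::vector<float> x(1 * 3 * 8 * 8, 1.0f), w(4 * 3 * 3 * 3, 0.5f), y;
    TensorDesc yd{};
    convolutionForward(xd, x, wd, w, cd, yd, y);  // output is 1x4x6x6
    std::printf("output shape: %dx%dx%dx%d, y[0]=%f\n", yd.n, yd.c, yd.h, yd.w, y[0]);
    return 0;
}

Separating shape metadata (descriptors) from data buffers in this fashion is one way a library can keep only a couple of core data structures, such as tensor and filter, while still exposing convolutions, matrix/vector operations, and other primitives behind a uniform calling convention that host frameworks can embed easily.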
Received 2016-11-02;
Funding:

This work is partially supported by the National Natural Science Foundation of China under Grant Nos. 61432016, 61472396, 61473275, 61522211, 61532016, 61521092, 61502446, 61672491, 61602441, and 61602446, the National Basic Research 973 Program of China under Grant No. 2015CB358800, and the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDB02040009.

About author: Hui-Ying Lan received her B.E. degree in software engineering from Wuhan University, Wuhan, in 2012, and her Master's degree from the School of Software and Microelectronics, Peking University, Beijing, in 2015. She is currently a Ph.D. student at the Institute of Computing Technology, Chinese Academy of Sciences, Beijing. Her research interests include computer architecture and computational intelligence.
Cite this article:
Hui-Ying Lan, Lin-Yang Wu, Xiao Zhang, Jin-Hua Tao, Xun-Yu Chen, Bing-Rui Wang, Yu-Qing Wang, Qi Guo, Yun-Ji Chen. DLPlib: A Library for Deep Learning Processor[J]. Journal of Computer Science and Technology, 2017, 32(2): 286-296.
Link to this article:
http://jcst.ict.ac.cn:8080/jcst/CN/10.1007/s11390-017-1722-2