1 State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
2 School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100049, China;
3 Intelligent Processor Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
4 Cambricon Ltd., Beijing 100190, China;
5 Alibaba Infrastructure Service, Alibaba Group, Hangzhou 311121, China;
6 Iflytek Co., Ltd., Hefei 230088, China;
7 Beijing Jingdong Century Trading Co., Ltd., Beijing 100176, China;
8 RDA Microelectronics, Inc., Shanghai 201203, China;
9 Advanced Micro Devices, Inc., Sunnyvale, CA 94085, U.S.A.
Abstract The increasing attention to deep learning has tremendously spurred the design of intelligence processing hardware. The variety of emerging intelligence processors requires standard benchmarks for fair comparison and system optimization (in both software and hardware). However, existing benchmarks are unsuitable for benchmarking intelligence processors because they lack diversity and representativeness. The lack of a standard benchmarking methodology further exacerbates this problem. In this paper, we propose BenchIP, a benchmark suite and benchmarking methodology for intelligence processors. The benchmark suite in BenchIP consists of two sets of benchmarks: microbenchmarks and macrobenchmarks. The microbenchmarks consist of single-layer networks and are mainly designed for bottleneck analysis and system optimization. The macrobenchmarks contain state-of-the-art industrial networks, so as to offer a realistic comparison of different platforms. We also propose a standard benchmarking methodology built upon an industrial software stack and evaluation metrics that comprehensively reflect various characteristics of the evaluated intelligence processors. BenchIP is utilized for evaluating various hardware platforms, including CPUs, GPUs, and accelerators. BenchIP will be open-sourced soon.
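The idea of a single-layer microbenchmark can be illustrated with a minimal sketch. This is not BenchIP code: the function names, layer sizes, and the multiply-accumulate (MAC) throughput metric below are assumptions chosen for illustration, and the layer is a plain fully connected layer written in pure Python so that the example stays self-contained.

```python
import time


def fully_connected(inputs, weights):
    """Forward pass of one fully connected layer.

    out[j] = sum_i inputs[i] * weights[i][j]
    """
    n_out = len(weights[0])
    return [
        sum(x * row[j] for x, row in zip(inputs, weights))
        for j in range(n_out)
    ]


def run_microbenchmark(n_in=64, n_out=32, repeats=5):
    """Time a single layer and report MAC throughput.

    All names and the choice of metric are hypothetical; they are
    not taken from the BenchIP suite.
    """
    inputs = [1.0] * n_in
    weights = [[0.5] * n_out for _ in range(n_in)]

    best = float("inf")
    outputs = []
    for _ in range(repeats):
        start = time.perf_counter()
        outputs = fully_connected(inputs, weights)
        # Keep the fastest run to reduce noise from the host system.
        best = min(best, time.perf_counter() - start)

    macs = n_in * n_out  # one multiply-accumulate per weight
    throughput = macs / best if best > 0 else float("inf")
    return {
        "outputs": outputs,
        "macs": macs,
        "seconds": best,
        "macs_per_second": throughput,
    }


if __name__ == "__main__":
    result = run_microbenchmark()
    print(result["macs"], "MACs in", result["seconds"], "s")
```

Isolating one layer in this way makes it easier to attribute a slowdown to a specific operation, whereas a full network (as in the macrobenchmarks) reflects end-to-end behavior.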
This work is partially supported by the National Key Research and Development Program of China under Grant No. 2017YFB1003101, the National Natural Science Foundation of China under Grant Nos. 61472396, 61432016, 61473275, 61522211, 61532016, 61521092, 61502446, 61672491, 61602441, 61602446, 61732002, and 61702478, Beijing Science and Technology Projects under Grant No. Z151100000915072, the Science and Technology Service Network Initiative (STS) Projects of Chinese Academy of Sciences, and the National Basic Research 973 Program of China under Grant No. 2015CB358800.
About author: Jin-Hua Tao received his B.S. degree in statistics from University of Science and Technology of China, Hefei, in 2013. He is currently a Ph.D. student at Institute of Computing Technology, Chinese Academy of Sciences, Beijing. His research interests include computer architecture and computational intelligence.
Cite this article:
Jin-Hua Tao, Zi-Dong Du, Qi Guo, Hui-Ying Lan, Lei Zhang, Sheng-Yuan Zhou, Ling-Jie Xu, Cong Liu, Hai-Feng Liu, Shan Tang, Allen Rush, Willian Chen, Shao-Li Liu, Yun-Ji Chen, Tian-Shi Chen. BENCHIP: Benchmarking Intelligence Processors[J]. Journal of Computer Science and Technology, 2018, V33(1): 1-23