Benchmarks for Intelligence Processors

doi:10.1007/s11390-018-1805-8

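Summary:

Objective: To provide benchmark workloads and a benchmarking methodology for intelligence processors.

Innovations: 1. Microbenchmarks and macrobenchmarks are proposed as evaluation workloads. Compared with previously proposed workloads, they are more representative and more diverse, covering more computation patterns and application domains. 2. The microbenchmarks target bottleneck analysis (performance, power, and so on) of intelligence processing hardware designs, and offer more design guidance on performance, power, and area than earlier workloads. 3. The macrobenchmarks target performance comparison across hardware platforms, and are fairer than earlier workloads. 4. An easy-to-use and efficient software environment is provided. Its input is a model file compatible with open-source frameworks such as Caffe and TensorFlow, and its output is the evaluation results (performance, power, and so on) for the hardware under test.

Methods: 1. Feature analysis, correlation analysis, and application-domain analysis are applied to a range of neural network algorithms in order to select representative algorithm structures that cover different computation patterns and application domains; these serve as the microbenchmarks and macrobenchmarks. 2. An easy-to-use and efficient software environment and benchmark workload models are provided, together with a general API. Users can override the API, set the number of runs, and choose between the microbenchmark and macrobenchmark workloads. The software runs according to these settings and reports the corresponding performance and power results, which can be used to locate design bottlenecks or to compare performance.

Conclusions: The software environment and benchmark workloads make it easy for users to locate performance and other bottlenecks in a hardware design and to compare it fairly against other hardware platforms. The general API lets users plug in their own library files, which simplifies testing on different hardware platforms. In addition, the results from benchmarks aimed at different domains can guide the design of intelligence hardware for a specific domain.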
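The workflow described under Methods (an overridable API, a user-chosen number of runs, and a choice between the two workload sets) can be sketched roughly as follows. This is a minimal illustration and not BenchIP's actual interface: the names `Backend`, `DummyBackend`, `SUITES`, and `run_suite`, as well as the model file names, are all hypothetical, and only wall-clock latency is measured here because power measurement depends on the platform.

```python
import time
from abc import ABC, abstractmethod
from statistics import mean


class Backend(ABC):
    """Hypothetical hardware-facing interface that a user would override."""

    @abstractmethod
    def load(self, model_path: str) -> None:
        """Load a framework-compatible model file onto the device."""

    @abstractmethod
    def run_once(self) -> None:
        """Execute one inference pass of the loaded model."""


class DummyBackend(Backend):
    """Stand-in backend that only records calls; replace with real library calls."""

    def __init__(self) -> None:
        self.loaded = None
        self.calls = 0

    def load(self, model_path: str) -> None:
        self.loaded = model_path

    def run_once(self) -> None:
        self.calls += 1


# Placeholder workload lists (assumed names, not the suite's real contents).
SUITES = {
    "micro": ["conv_layer.model", "pooling_layer.model"],
    "macro": ["network_a.model"],
}


def run_suite(backend: Backend, suite: str, repeats: int = 3) -> dict:
    """Run every model in the chosen suite `repeats` times; return mean latency in seconds."""
    if suite not in SUITES:
        raise ValueError(f"unknown suite: {suite!r}")
    results = {}
    for model in SUITES[suite]:
        backend.load(model)
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            backend.run_once()
            timings.append(time.perf_counter() - start)
        results[model] = mean(timings)
    return results
```

A user targeting a new platform would subclass `Backend` with calls into that platform's own library, then call, for example, `run_suite(MyBackend(), "micro", repeats=10)` for bottleneck analysis or pass `"macro"` for cross-platform comparison.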

Abstract: The increasing attention on deep learning has tremendously spurred the design of intelligence processing hardware. The variety of emerging intelligence processors requires standard benchmarks for fair comparison and system optimization (in both software and hardware). However, existing benchmarks are unsuitable for benchmarking intelligence processors because they lack diversity and representativeness, and the absence of a standard benchmarking methodology exacerbates the problem. In this paper, we propose BenchIP, a benchmark suite and benchmarking methodology for intelligence processors. The benchmark suite in BenchIP consists of two sets of benchmarks: microbenchmarks and macrobenchmarks. The microbenchmarks consist of single-layer networks and are mainly designed for bottleneck analysis and system optimization. The macrobenchmarks contain state-of-the-art industrial networks, so as to offer a realistic comparison of different platforms. We also propose a standard benchmarking methodology built upon an industrial software stack and evaluation metrics that comprehensively reflect various characteristics of the evaluated intelligence processors. BenchIP is used to evaluate various hardware platforms, including CPUs, GPUs, and accelerators. BenchIP will be open-sourced soon.


BENCHIP: Benchmarking Intelligence Processors