Peng Chen, Lei Zhang, Yin-He Han, Yun-Ji Chen. A General-Purpose Many-Accelerator Architecture Based on Dataflow Graph Clustering of Applications[J]. Journal of Computer Science and Technology, 2014, 29(2): 239-246. DOI: 10.1007/s11390-014-1426-9

A General-Purpose Many-Accelerator Architecture Based on Dataflow Graph Clustering of Applications

Abstract: The combination of growing transistor counts and a limited power budget within a silicon die leads to the utilization wall problem (a.k.a. "Dark Silicon"): only a small fraction of a chip can run at full speed at any given time. Designing accelerators for specific applications or algorithms is considered one of the most promising approaches to improving energy efficiency. However, most current accelerator design methods are dedicated to particular applications or algorithms, which greatly constrains their applicability. In this paper, we propose a novel general-purpose many-accelerator architecture. Our contributions are twofold. First, we propose to cluster the dataflow graphs (DFGs) of hotspot basic blocks (BBs) in applications and to use the resulting DFG clusters for accelerator design. We choose DFGs because a DFG is the largest program unit that is not specific to a particular application. Using the LLVM compiler infrastructure, we analyze 17 benchmarks from SPEC CPU 2006, obtain over 300 hotspot DFGs, and divide them into 15 clusters based on graph similarity. Second, we introduce a function instruction set architecture (FISC) and illustrate how DFG accelerators can be integrated with a processor core and how applications can use them. Our results show that the proposed DFG clustering and FISC design speed up the SPEC benchmarks by 6.2X on average.
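
The abstract does not specify which graph-similarity metric or clustering algorithm is used to group the 300-plus hotspot DFGs into 15 clusters. As a rough illustration only, the sketch below assumes that each DFG is summarized by the multiset of its opcodes and of its producer-to-consumer opcode pairs, compares two graphs with weighted Jaccard similarity, and groups them with a greedy leader-clustering pass. The `DFG` class, the `similarity` and `cluster_dfgs` functions, and the `threshold` value are all hypothetical and are not the authors' method.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DFG:
    """Hypothetical dataflow graph of one hotspot basic block."""
    name: str
    ops: dict[int, str]  # node id -> opcode (e.g. an LLVM IR opcode name)
    edges: list[tuple[int, int]] = field(default_factory=list)  # producer -> consumer

    def signature(self) -> Counter:
        """Multiset of opcodes plus multiset of (producer_op, consumer_op) edge types."""
        sig = Counter(("op", op) for op in self.ops.values())
        sig.update(("edge", self.ops[u], self.ops[v]) for u, v in self.edges)
        return sig


def similarity(a: DFG, b: DFG) -> float:
    """Weighted Jaccard similarity of two DFG signatures, in [0, 1]."""
    sa, sb = a.signature(), b.signature()
    inter = sum((sa & sb).values())  # & keeps the minimum count per key
    union = sum((sa | sb).values())  # | keeps the maximum count per key
    return inter / union if union else 1.0


def cluster_dfgs(dfgs: list[DFG], threshold: float = 0.5) -> list[list[DFG]]:
    """Greedy leader clustering (an assumption, not the paper's algorithm).

    Each DFG joins the cluster whose leader (first member) is most similar,
    provided that similarity is at least `threshold`; otherwise it starts
    a new cluster.
    """
    clusters: list[list[DFG]] = []
    for g in dfgs:
        best, best_sim = None, -1.0
        for c in clusters:
            s = similarity(g, c[0])
            if s >= threshold and s > best_sim:
                best, best_sim = c, s
        if best is None:
            clusters.append([g])
        else:
            best.append(g)
    return clusters


if __name__ == "__main__":
    # Toy DFGs: a multiply-accumulate, the same pattern plus a store, and a compare-branch.
    mac = DFG("mac", {0: "load", 1: "load", 2: "fmul", 3: "fadd"}, [(0, 2), (1, 2), (2, 3)])
    mac2 = DFG("mac2", {0: "load", 1: "load", 2: "fmul", 3: "fadd", 4: "store"},
               [(0, 2), (1, 2), (2, 3), (3, 4)])
    cmp = DFG("cmp", {0: "icmp", 1: "br"}, [(0, 1)])
    print([[g.name for g in c] for c in cluster_dfgs([mac, mac2, cmp])])
    # prints [['mac', 'mac2'], ['cmp']]
```

A signature-based metric like this one captures only single edges, not larger graph structure; a measure closer to true graph similarity (for example, one based on graph edit distance or common subgraphs) would likely group DFGs more faithfully, at a much higher computational cost.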
