Peng Chen, Lei Zhang, Yin-He Han, Yun-Ji Chen. A General-Purpose Many-Accelerator Architecture Based on Dataflow Graph Clustering of Applications[J]. Journal of Computer Science and Technology, 2014, 29(2): 239-246. DOI: 10.1007/s11390-014-1426-9
A General-Purpose Many-Accelerator Architecture Based on Dataflow Graph Clustering of Applications

Funds: This paper is supported by the National Natural Science Foundation of China under Grant Nos. 601173006, 61221062, and the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDA06010403.
  • Author Bio:

    Peng Chen received his B.S. degree from the Department of Computer Science and Technology, Hefei University of Technology, in 2012. He is currently a master's student in the Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS), Beijing. His research interests include program analysis and accelerator design.

  • Received Date: October 29, 2013
  • Revised Date: January 02, 2014
  • Published Date: March 04, 2014
  • The combination of growing transistor counts and a limited power budget within a silicon die leads to the utilization wall problem (a.k.a. "Dark Silicon"): only a small fraction of a chip can run at full speed at any given time. Designing accelerators for specific applications or algorithms is considered one of the most promising approaches to improving energy efficiency. However, most current accelerator design methods are dedicated to particular applications or algorithms, which greatly constrains their applicability. In this paper, we propose a novel general-purpose many-accelerator architecture. Our contributions are two-fold. Firstly, we propose to cluster the dataflow graphs (DFGs) of hotspot basic blocks (BBs) in applications and to use the resulting DFG clusters for accelerator design, because a DFG is the largest program unit that is not specific to a certain application. We analyze 17 benchmarks in SPEC CPU 2006, acquire over 300 hotspot DFGs using the LLVM compiler tool, and divide them into 15 clusters based on graph similarity. Secondly, we introduce a function instruction set architecture (FISC) and illustrate how DFG accelerators can be integrated with a processor core and how they can be used by applications. Our results show that the proposed DFG clustering and FISC design can speed up SPEC benchmarks by 6.2X on average.
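
    The clustering step described in the abstract can be illustrated with a small sketch. The abstract does not state which similarity metric or clustering algorithm the paper uses, so everything below is an assumption made for illustration: each DFG is a hypothetical (opcodes, edges) pair, similarity is the Jaccard index over opcode-labelled edges, and clustering is a simple greedy threshold scheme. The DFG names, opcodes, and the 0.5 threshold are invented and are not taken from the paper.

```python
# Illustrative only: the metric, threshold, and example DFGs are assumptions,
# not the method used in the paper.

def edge_set(dfg):
    """Return the set of (producer opcode, consumer opcode) pairs in a DFG."""
    ops, edges = dfg
    return {(ops[u], ops[v]) for u, v in edges}


def similarity(a, b):
    """Jaccard similarity of two DFGs' opcode-labelled edge sets (assumed metric)."""
    ea, eb = edge_set(a), edge_set(b)
    if not ea and not eb:
        return 1.0  # two edgeless graphs are treated as identical
    return len(ea & eb) / len(ea | eb)


def cluster(dfgs, threshold=0.5):
    """Greedy clustering: a DFG joins the first cluster whose representative
    (its first member) is at least `threshold` similar; otherwise it starts
    a new cluster."""
    clusters = []
    for name, dfg in dfgs.items():
        for members in clusters:
            if similarity(dfgs[members[0]], dfg) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Hypothetical hotspot DFGs: node id -> opcode, plus producer -> consumer edges.
DFGS = {
    "mac_a": ({0: "load", 1: "load", 2: "mul", 3: "add"},
              [(0, 2), (1, 2), (2, 3)]),
    "mac_b": ({0: "load", 1: "mul", 2: "add", 3: "store"},
              [(0, 1), (1, 2), (2, 3)]),
    "cmp_sel": ({0: "load", 1: "icmp", 2: "select"},
                [(0, 1), (1, 2)]),
}

if __name__ == "__main__":
    print(cluster(DFGS))
```

    Under these assumptions the two multiply-accumulate graphs share most of their labelled edges and land in one cluster, while the compare-and-select graph forms its own. Each cluster would then be a candidate for one shared accelerator.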