›› 2013,Vol. 28 ›› Issue (1): 54-71.doi: 10.1007/s11390-013-1312-x

所属专题: Computer Architecture and Systems Computer Networks and Distributed Computing

• Special Section on Selected Paper from NPC 2011 • 上一篇    下一篇

适用于3-D 片上网络的高效能运行时增量映射

Xiao-Hang Wang1,2 (王小航), Peng Liu2,* (刘鹏), Mei Yang3 (杨梅), Maurizio Palesi4, Ying-Tao Jiang3 (蒋颖涛), and Michael C Huang5 (黄巍)   

  • 收稿日期:2012-03-05 修回日期:2012-11-01 出版日期:2013-01-05 发布日期:2013-01-05
  • 基金资助:

    This work is supported by the National Natural Science Foundation of China under Grant Nos. 60873112 and 61028004, the National Natural Science Foundation of USA under Grant No. CNS-1126688.

Energy Efficient Run-Time Incremental Mapping for 3-D Networks-on-Chip

Xiao-Hang Wang1,2 (王小航), Peng Liu2,* (刘鹏), Mei Yang3 (杨梅), Maurizio Palesi4, Ying-Tao Jiang3 (蒋颖涛), and Michael C Huang5 (黄巍)   

  1. 1. Intelligent Chips and Systems Research Centre, Guangzhou Institute of Advanced Technology, Chinese Academy of Sciences Guangzhou 511458, China;
    2. Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China;
    3. Department of Electrical and Computer Engineering, University of Nevada, Las Vegas, Nevada, U.S.A.;
    4. Faculty of Engineering, Kore University, Catania, Italy;
    5. Department of Electrical and Computer Engineering, University of Rochester, Rochester, New York, U.S.A.
  • Received:2012-03-05 Revised:2012-11-01 Online:2013-01-05 Published:2013-01-05
  • Supported by:

    This work is supported by the National Natural Science Foundation of China under Grant Nos. 60873112 and 61028004, the National Natural Science Foundation of USA under Grant No. CNS-1126688.

未来多核片上系统面临片上互连和设计复杂度等问题,3-D片上网络可以作为上述问题的一种解决方案.由于未到达应用的行为和特征无法预知,并且存在低能耗限制,使得如何设计和优化高效3-D片上网络运行时映射方法成为一个挑战性的问题.在本文中,我们提出一种适用于3-D 片上网络的高效能运行时增量映射方法(取名为ERIM).ERIM可以优化核间通信所造成的能耗,减少碎片效应,同时满足温度的限制.在我们的方法中,为了进行最短路径路由和减少通信能耗,新到达的应用首先被映射到一个立方体节点区域.碎片节点可以被重复利用以提高资源利用率.在实验评估中,本文所提出的ERIM算法和最优算法(分支限定)、两个启发式算法(TB和TL)进行比较,评估它们的性能.实验结果表明,ERIM和TB/TL相比,可以减少能耗(大约可减少10%),降低系统响应时间,并提高系统利用率.

Abstract: 3-D Networks-on-Chip (NoC) emerge as a potent solution to address both the interconnection and design complexity problems facing future Multiprocessor System-on-Chips (MPSoCs). Effective run-time mapping on such 3-D NoC-based MPSoCs can be quite challenging, as the arrival order and task graphs of the target applications are typically not known a priori, which can be further complicated by stringent energy requirements for NoC systems. This paper thus presents an energy-aware run-time incremental mapping algorithm (ERIM) for 3-D NoC which can minimize the energy consumption due to the data communications among processor cores, while reducing the fragmentation effect on the incoming applications to be mapped, and simultaneously satisfying the thermal constraints imposed on each incoming application. Specifically, incoming applications are mapped to cuboid tile regions for lower energy consumption of communication and the minimal routing. Fragment tiles due to system fragmentation can be gleaned for better resource utilization. Extensive experiments have been conducted to evaluate the performance of the proposed algorithm ERIM, and the results are compared against the optimal mapping algorithm (branch-and-bound) and two heuristic algorithms (TB and TL). The experiments show that ERIM outperforms TB and TL methods with significant energy saving (more than 10%), much reduced average response time, and improved system utilization.

[1] Pavlidis V F, Friedman E G. 3-D topologies for networks-on-chip. IEEE Trans. Very Large Scale Integration Systems,2007, 15(10): 1081-1090
[2] Davis W R, Wilson J, Mick S et al. Demystifying 3D ICs:The pros and cons of going vertical. IEEE Design and Testof Computers, 2005, 22(6): 498-511.
[3] Triviño F, S醤chez J L, Alfaro F J, Flich J. Virtualizingnetwork-on-chip resources in chip-multiprocessors. Micropro-cessors and Microsystems, 2011, 35(2): 230-245.
[4] Zhu C, Gu Z, Shang L, Dick R P, Joseph R. Three-dimensionalchip-multiprocessor run-time thermal management. IEEETrans. Computer-Aided Design of Integrated Circuits andSystems, 2008, 27(8): 1479-1492.
[5] Addo-Quaye C. Thermal-aware mapping and placement for3-D NoC designs. In Proc. Int. SoC Conf., Sept. 2005,pp.25-28.
[6] Chou C L, Marculescu R. Run-time task allocation consider-ing user behavior in embedded multiprocessor networks-on-chip. IEEE Trans. Computer-Aided Design of IntegratedCircuits and Systems, 2010, 29(1): 78-91.
[7] Chou C L, Ogras U Y, Marculescu R. Energy- andperformance-aware incremental mapping for networks on chipwith multiple voltage levels. IEEE Trans. Computer-AidedDesign of Integrated Circuits and Systems, 2008, 27(10):1866-1879.
[8] Matsutani H, Koibuchi M, Amano H. Tightly-coupled multi-layer topologies for 3-D NoCs, In Proc. Int. Conf. ParallelProcessing, Sept. 2007, Article No.75.
[9] Feero B S, Pande P P. Networks-on-chip in a three-dimensional environment: A performance evaluation. IEEETrans. Computers, 2009, 58(1): 32-45.
[10] Kim J, Nicopoulos C, Park D et al. A novel dimensionally-decomposed router for on-chip communication in 3D archi-tectures. In Proc. Int. Symp. Computer Architecture, June2007, pp.138-149.
[11] Seiculescu C, Murali S, Benini L, De Micheli G. Sunfloor 3D:A tool for networks on chip topology synthesis for 3-D systemson chips. IEEE Trans. Computer-Aided Design of IntegratedCircuits and Systems, 2010, 29(12): 1987-2000.
[12] Pavlidis V F, Friedman E G. 3-D topologies for networks-on-chip. IEEE Trans. Very Large Scale Integration Systems,2007, 15(10): 1081-1090.
[13] Wang X, Yang M, Jiang Y, Liu P. A power-aware mappingapproach to map IP cores onto NoCs under bandwidth andlatency constraints. ACM Trans. Architecture and Code Op-timization, 2010, 7(1): 1-30.
[14] Land A H, Doig A G. An automatic method for solving dis-crete programming problems. Econometrica, 1960, 28(3):497-520.
[15] Hu J, Marculescu R. Energy- and performance-aware map-ping for regular NoC architectures. IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, 2005, 24(4):551-562.
[16] Addo-Quaye C. Thermal-aware mapping and placement for3-D NoC designs. In Proc. Int. SoC Conf., Sept. 2005,pp.25-28.
[17] Smit L T, Smit G J M, Hurink J L et al. Run-time assignmentof tasks to multiple heterogeneous processors. In Proc. the5th Progress Embedded System Symp., Oct. 2004, pp.185-192.
[18] Carvalho E, Calazans N, Moraes F. Heuristics for dynamictask mapping in NoC-based heterogeneous MPSoCs. In Proc.the 18th IEEE/IFIP Int. Workshop. Rapid System Prototyp-ing, May 2007, pp.34-40.
[19] Lo V, Windisch K J, Liu W, Nitzberg B. Noncontiguous pro-cessor allocation algorithms for mesh-connected multicomput-ers. Trans. Parallel and Distributed Systems, 1997, 8(7):712-726.
[20] Arjomand M, Sarbazi-Azad H. Voltage-frequency planning forthermal-aware, low-power design of regular 3-D NoCs. InProc. the 23rd Int. Conf. VLSI Design, Jan. 2010, pp.57-62.
[21] Zhou X, Yang J, Xu Y, Zhang Y, Zhao J. Thermal-awaretask scheduling for 3D multi-core processors. IEEE Trans.Parallel and Distributed Systems, 2010, 21(1): 60-71.
[22] Chao C H, Jheng K Y, Wang H Y et al. Traffic- and thermal-aware run-time thermal management scheme for 3D NoC sys-tems. In Proc. the 4th ACM/IEEE Int. Symp. Networks-on-Chip, May 2010, pp.223-230.
[23] Truong D, Cheng W, Mohsenin T et al. A 167-processor65nm computational platform with per-processor dynamicsupply voltage and dynamic clock frequency scaling. In Proc.IEEE Symp. VLSI Circuits, June 2008, pp.22-23.
[24] Liu Y, Yang H, Dick R P, Wang H, Shang L. Thermal vs en-ergy optimization for DVFS-enabled processors in embeddedsystems. In Proc. Int. Symp. Quality Electronic Design,March 2007, pp.204-209.
[25] Srinivasan J, Adve S V, Bose P, Rivers J A. The impactof technology scaling on lifetime reliability. In Proc. Int.Conf. Dependable Systems and Networks, June 28-July 1,2004, pp.177-186.
[26] Huang W, Ghosh S, Velusamy S et al. HotSpot: A compactthermal modeling methodology for early-stage VLSI design.IEEE Trans. Very Large Scale Integration Systems, 2006,14(5): 501-513.
[27] Dally W J, Towles B. Principles and Practices of Intercon-nection Networks. San Francisco, USA: Morgan Kaufmann,2004.
[28] Kahng A, Li B, Peh L S et al. Orion 2.0: A fast and accu-rate NoC power and area model for early-stage design spaceexploration. In Proc. Conf. Design, Automation & Test inEurope, Apr. 2009, pp.423-428.
[29] Garey M R, Johnson D S. Computers and Intractability: AGuide to the Theory of NP-Completeness. New York, USA:WH Freeman, 1979.
[30] Lin M, Gamal A E, Lu Y C, Wong S. Performance benefitsof monolithically stacked 3D-FPGA. IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, 2007, 26(2):216-229.
[31] Dick R P, Rhodes D L, Wolf W. TGFF: Task graphs for free.In Proc. the 6th Int. Workshop on Hardware/Software Code-sign, March 1998, pp.97-101.
[32] Yang Y S, Bahn J H, Lee S E, Bagherzadeh N. Parallel andpipeline processing for block cipher algorithms on a network-on-chip. In Proc. the 6th Int. Conf. Information Technology:New Generations, Apr. 2009, pp.849-854.
[33] Delorme J, Houzet D. A complete 4G radiocommunicationapplication mapping onto a 2D mesh NoC architecture. InProc. North-East Workshop. Circuits and Systems, June2006, pp.93-96.
[34] Brooks D, Tiwari V, Martonosi M. Wattch: A frameworkfor architectural-level power analysis and optimizations. InProc. the 27th Int. Symp. Computer Architecture, June2000, pp.83-94.
No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 刘明业; 洪恩宇;. Some Covering Problems and Their Solutions in Automatic Logic Synthesis Systems[J]. , 1986, 1(2): 83 -92 .
[2] 陈世华;. On the Structure of (Weak) Inverses of an (Weakly) Invertible Finite Automaton[J]. , 1986, 1(3): 92 -100 .
[3] 高庆狮; 张祥; 杨树范; 陈树清;. Vector Computer 757[J]. , 1986, 1(3): 1 -14 .
[4] 陈肇雄; 高庆狮;. A Substitution Based Model for the Implementation of PROLOG——The Design and Implementation of LPROLOG[J]. , 1986, 1(4): 17 -26 .
[5] 黄河燕;. A Parallel Implementation Model of HPARLOG[J]. , 1986, 1(4): 27 -38 .
[6] 闵应骅; 韩智德;. A Built-in Test Pattern Generator[J]. , 1986, 1(4): 62 -74 .
[7] 唐同诰; 招兆铿;. Stack Method in Program Semantics[J]. , 1987, 2(1): 51 -63 .
[8] 闵应骅;. Easy Test Generation PLAs[J]. , 1987, 2(1): 72 -80 .
[9] 朱鸿;. Some Mathematical Properties of the Functional Programming Language FP[J]. , 1987, 2(3): 202 -216 .
[10] 李明慧;. CAD System of Microprogrammed Digital Systems[J]. , 1987, 2(3): 226 -235 .
版权所有 © 《计算机科学技术学报》编辑部
本系统由北京玛格泰克科技发展有限公司设计开发 技术支持:support@magtech.com.cn
总访问量: