Journal of Computer Science and Technology ›› 2018, Vol. 33 ›› Issue (5): 984-997. doi: 10.1007/s11390-018-1869-5

Special Issue: Computer Architecture and Systems


DimRouter: A Multi-Mode Router Architecture for Higher Energy-Proportionality of On-Chip Networks

Shi-Qi Lian1,2, Student Member, CCF, IEEE, Ying Wang1,*, Member, CCF, ACM, IEEE, Yin-He Han1,*, Senior Member, CCF, IEEE, Member, ACM   

  1. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2017-03-29 Revised: 2018-05-09 Online: 2018-09-17 Published: 2018-09-17
  • Contact: Ying Wang; Yin-He Han
  • Supported by:
    This work was supported by the National Natural Science Foundation of China under Grant Nos. 61522406, 61504153, 61532017, and 61521092, and Beijing Municipal Science and Technology Commission under Grant No. Z171100000117019.

In the dark silicon era, many independent components of a many-core processor are deliberately left inactive because of the chip's power budget. To preserve network connectivity, however, the on-chip interconnect must stay active to avoid isolating these inactive components, which wastes considerable energy and harms the energy proportionality of the whole processor chip. In this paper, we propose a novel design that provides more energy-proportional on-chip connection without damaging network connectivity. To achieve this goal, we redesign the router architecture. The new architecture, DimRouter, supports three modes: normal, dark, and dim. In the dim mode, only part of the router is active and provides flexible connection, while the dark mode puts all router elements to sleep. Moreover, to maximize the number of dark routers, we also propose a reconfiguration algorithm based on the degree-constrained Steiner tree. Evaluation under synthetic traffic shows that the new design can reduce energy consumption by up to 85% compared with the common design. For real application traffic, the new design saves 46% of energy consumption on average with a 4% performance improvement.

Key words: dark silicon; energy-proportion; power gating; topology reconfiguration
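
The reconfiguration idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a 2D mesh, uses a simple greedy shortest-path heuristic in place of whatever solver the paper uses for the degree-constrained Steiner tree, and assumes a mode mapping in which routers of active cores run in normal mode, routers kept only to forward traffic run in dim mode, and all others go dark. The names `build_tree`, `assign_modes`, and `max_degree` are hypothetical.

```python
from collections import deque
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"  # assumption: router attached to an active core
    DIM = "dim"        # assumption: forwarding-only router on the tree
    DARK = "dark"      # router fully asleep


def mesh_neighbors(node, width, height):
    """Yield the 4-connected neighbors of node inside a width x height mesh."""
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield (nx, ny)


def build_tree(active, width, height, max_degree=4):
    """Greedy heuristic: grow a tree that spans all active routers while
    keeping every tree node at or below max_degree links."""
    active = set(active)
    if not active:
        return {}
    start = min(active)
    adj = {start: set()}          # tree adjacency: node -> linked nodes
    remaining = active - {start}  # active routers not yet connected
    while remaining:
        # Multi-source BFS from tree nodes that still have a spare link.
        parent = {}
        queue = deque()
        for n in sorted(adj):
            if len(adj[n]) < max_degree:
                parent[n] = None
                queue.append(n)
        found = None
        while queue:
            cur = queue.popleft()
            if cur in remaining:
                found = cur
                break
            for nb in mesh_neighbors(cur, width, height):
                if nb in parent or nb in adj:
                    continue  # already visited, or a saturated tree node
                parent[nb] = cur
                queue.append(nb)
        if found is None:
            raise ValueError("heuristic found no degree-feasible tree")
        # Add the path from the reached router back to the tree.
        node = found
        while parent[node] is not None:
            prev = parent[node]
            adj.setdefault(node, set()).add(prev)
            adj.setdefault(prev, set()).add(node)
            node = prev
        remaining.discard(found)
    return adj


def assign_modes(active, width, height, max_degree=4):
    """Return (modes, tree): a mode per router plus the connecting tree."""
    active = set(active)
    tree = build_tree(active, width, height, max_degree)
    modes = {}
    for x in range(width):
        for y in range(height):
            n = (x, y)
            if n in active:
                modes[n] = Mode.NORMAL
            elif n in tree:
                modes[n] = Mode.DIM
            else:
                modes[n] = Mode.DARK
    return modes, tree


if __name__ == "__main__":
    modes, tree = assign_modes({(0, 0), (3, 0), (0, 3)}, 4, 4)
    dark = sum(1 for m in modes.values() if m is Mode.DARK)
    print(f"{dark} of {len(modes)} routers can go dark")
```

Fewer tree nodes means more dark routers, which is why the problem is naturally phrased as a Steiner tree over the active routers; the degree bound here stands in for a limit on how many ports a router keeps powered.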


ISSN 1000-9000(Print)

CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China