2013, Vol. 28, Issue (6): 1045-1053. doi: 10.1007/s11390-013-1396-3

Special Issue: Computer Architecture and Systems; Computer Networks and Distributed Computing

• Architecture and VLSI Design •

RevivePath: Resilient Network-on-Chip Design Through Data Path Salvaging of Router

Yin-He Han1, 2 (韩银和), Senior Member, CCF, IEEE, Member, ACM, Cheng Liu1, 2 (刘成), Hang Lu1, 2 (路航), Student Member, CCF, IEEE, Wen-Bo Li1, 2 (李文博), Lei Zhang1, 2 (张磊), Member, CCF, IEEE, and Xiao-Wei Li1, 2 (李晓维), Senior Member, CCF, IEEE   

  1. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received:2012-10-16 Revised:2013-08-15 Online:2013-11-05 Published:2013-11-05
  • About author: Yin-He Han received the B.Eng. degree from Nanjing University of Aeronautics and Astronautics, China, in 2001, and the M.Eng. and Ph.D. degrees in computer science from the Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS), Beijing, in 2003 and 2006, respectively. He is currently an associate professor at ICT, CAS. His research interests include VLSI architecture design and test, especially fault-tolerant and low-power architectures. Dr. Han was a recipient of the Best Paper Award at the Asian Test Symposium (ATS) 2003. He is a member of IEEE/ACM/CCF/IEICE. He is the program chair of ATS 2014, finance chair of HPCA 2013, and program co-chair of WRTLT 2009, and has served on the technical program committees of several IEEE and ACM conferences, including HPCA 2013, ASPDAC 2013, Cool Chip 2013, ATS 2008-2010, and GVLSI 2009-2010.
  • Supported by:

    The work was supported in part by the National Basic Research 973 Program of China under Grant No. 2011CB302503, and the National Natural Science Foundation of China under Grant Nos. 61076037, 60906018, 60921002.

Network-on-Chip (NoC), with its excellent scalability and high bandwidth, is widely considered the most promising communication architecture for complex integrated systems. However, NoC reliability is becoming increasingly challenging as semiconductor feature sizes shrink and integration density grows. Moreover, a single node failure in an NoC may destroy network connectivity and corrupt the entire system. Introducing redundancy is an effective way to construct a resilient communication path. However, prior redundancy-based work either achieves limited reliability with coarse-grained protection or incurs even larger hardware overhead with fine-grained protection. In this paper, we observe that the data path of an NoC, such as links, buffers, and crossbars, can be divided into multiple identical parallel slices, which can serve as inherent redundancy to enhance reliability. As long as one fault-free slice remains available, the proposed salvaging scheme, named RevivePath, can keep the overall data path functional. Furthermore, RevivePath uses direct redundancy to protect the control path, such as the switch arbiter and routing computation, providing a complete fault-tolerant scheme for the whole router. Experimental results show that it achieves high reliability with graceful performance degradation even under high fault rates.
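The slicing idea in the abstract can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the authors' hardware design: it assumes that a flit is cut into equal-width segments, one per slice, and that segments are time-multiplexed over whichever slices are still fault-free, so fewer healthy slices cost more cycles per flit. The names `split_flit`, `transfer`, and `slice_ok` are hypothetical and do not come from the paper.

```python
def split_flit(flit_bits, num_slices):
    """Cut a flit (list of bits) into num_slices equal-width segments."""
    width = len(flit_bits) // num_slices
    return [flit_bits[i * width:(i + 1) * width] for i in range(num_slices)]


def transfer(flit_bits, slice_ok):
    """Move one flit across a sliced data path using only fault-free slices.

    slice_ok[i] is True when physical slice i is fault-free (assumed to be
    known from some fault-detection step that is outside this sketch).
    Returns (received_bits, cycles_used).
    """
    num_slices = len(slice_ok)
    if num_slices == 0 or len(flit_bits) % num_slices != 0:
        raise ValueError("flit width must be a multiple of the slice count")

    healthy = [i for i, ok in enumerate(slice_ok) if ok]
    if not healthy:
        # With no fault-free slice left, the data path cannot be salvaged.
        raise RuntimeError("no fault-free slice available")

    segments = split_flit(flit_bits, num_slices)
    received = [None] * num_slices
    cycles = 0
    # Each cycle, send as many pending segments as there are healthy slices.
    for start in range(0, num_slices, len(healthy)):
        batch = range(start, min(start + len(healthy), num_slices))
        for seg_idx, _lane in zip(batch, healthy):
            # Segment seg_idx travels on healthy physical slice _lane.
            received[seg_idx] = segments[seg_idx]
        cycles += 1

    # Reassemble the flit in its original segment order.
    return [bit for seg in received for bit in seg], cycles


if __name__ == "__main__":
    flit = [1, 0, 1, 1, 0, 0, 1, 0]
    print(transfer(flit, [True, True, True, True]))
    print(transfer(flit, [True, False, False, True]))
    print(transfer(flit, [False, False, True, False]))
```

In this model the flit always arrives intact while at least one slice survives, and the cycle count rises as slices fail, which is one way to read the "graceful performance degradation" the abstract reports.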

ISSN 1000-9000 (Print), 1860-4749 (Online)
CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 
  Copyright ©2015 JCST, All Rights Reserved