RevivePath: Resilient Network-on-Chip Design Through Data Path Salvaging of Router
Abstract: The Network-on-Chip (NoC) offers excellent scalability and high bandwidth, so it is considered a highly promising interconnect architecture for future large-scale systems-on-chip. However, as semiconductor feature sizes shrink and integration density grows, NoC reliability becomes increasingly difficult to guarantee. A single node failure in an NoC can also break network connectivity and bring down the entire system. Redundancy is a common way to build resilient communication paths, but prior redundancy-based designs face a trade-off. Coarse-grained redundancy offers limited reliability, and fine-grained redundancy incurs excessive area overhead. This paper sidesteps that trade-off. We observe that the data path components of an NoC router, such as links, buffers and crossbars, can be divided into multiple identical parallel slices, and these slices can serve as inherent redundancy. The proposed salvaging scheme, RevivePath, exploits this redundancy to keep the router data path functional as long as at least one fault-free slice remains. For the control path, including the switch arbiter and the routing computation unit, RevivePath uses direct redundancy, so the scheme covers the whole router. Experimental results show that RevivePath achieves high reliability and degrades network performance gracefully even under high fault rates.
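The abstract states the salvaging condition (the data path works while at least one slice is fault-free) but not the exact mechanism. The sketch below is a hypothetical model, not the paper's design. It assumes that the pieces of a flit belonging to faulty slices are time-multiplexed onto the surviving slices. This would explain the graceful performance degradation. It also assumes that slice and control-unit faults are independent and equally likely. The function names, the 4-slice width and the fault probabilities are illustrative only. The reliability functions contrast the slice-based condition for the data path with plain duplication for the control path.

```python
from __future__ import annotations

from math import ceil


def cycles_per_flit(slice_ok: list[bool]) -> int | None:
    """Cycles needed to move one flit split evenly across len(slice_ok) slices.

    Assumption (hypothetical): pieces of faulty slices are time-multiplexed
    onto healthy slices, so throughput drops but the path stays usable.
    Returns None when no slice is healthy, i.e. the data path is lost.
    """
    healthy = sum(slice_ok)
    if healthy == 0:
        return None
    return ceil(len(slice_ok) / healthy)


def salvage_schedule(slice_ok: list[bool]) -> list[dict[int, int]]:
    """Per-cycle mapping {physical healthy slice -> logical flit piece}."""
    healthy = [i for i, ok in enumerate(slice_ok) if ok]
    if not healthy:
        return []
    n = len(slice_ok)
    schedule = []
    for start in range(0, n, len(healthy)):
        # Each cycle, the healthy slices carry the next batch of pieces in order.
        pieces = range(start, min(start + len(healthy), n))
        schedule.append(dict(zip(healthy, pieces)))
    return schedule


def p_data_path_ok(p_slice_fault: float, k: int) -> float:
    """Data path survives if at least one of k independent slices is fault-free."""
    return 1.0 - p_slice_fault ** k


def p_control_ok(p_unit_fault: float, copies: int = 2) -> float:
    """Direct redundancy: at least one replicated control unit is fault-free."""
    return 1.0 - p_unit_fault ** copies


if __name__ == "__main__":
    mask = [True, False, True, False]        # hypothetical 4-slice path; slices 1 and 3 faulty
    print(cycles_per_flit(mask))             # 2
    print(salvage_schedule(mask))            # [{0: 0, 2: 1}, {0: 2, 2: 3}]
    print(round(p_data_path_ok(0.1, 4), 4))  # 0.9999
```

In this model, a path with half its slices faulty still delivers every flit at half the throughput. Survival probability rises with the slice count k at no extra area, because the slices already exist. A duplicated control unit, by contrast, pays one full extra copy for its 1 - p^2 survival.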