2013, Vol. 28, Issue (6): 1045-1053. doi: 10.1007/s11390-013-1396-3
Topics: Computer Architecture and Systems; Computer Networks and Distributed Computing
• Special Section on Selected Paper from NPC 2011 •
Yin-He Han1, 2 (韩银和), Senior Member, CCF, IEEE, Member, ACM, Cheng Liu1, 2 (刘成), Hang Lu1, 2 (路航), Student Member, CCF, IEEE, Wen-Bo Li1, 2 (李文博), Lei Zhang1, 2 (张磊), Member, CCF, IEEE, and Xiao-Wei Li1, 2 (李晓维), Senior Member, CCF, IEEE
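Networks-on-chip (NoCs) are regarded as a highly promising interconnect technology for future large-scale systems-on-chip because they scale well and offer high bandwidth. However, as feature sizes shrink and integration density increases, NoCs become unreliable. Moreover, in an NoC a fault in any single node can destroy the connectivity of the whole network and bring it down. Redundancy is a commonly used technique for enhancing reliability, but previous redundancy designs face a dilemma: if the redundant components are partitioned at a coarse granularity, reliability is insufficient, whereas if they are partitioned at a fine granularity, the area overhead becomes excessive. This paper sidesteps that problem. We first observe that the datapath components of an on-chip router, such as links, buffers, and crossbars, can each be divided into multiple homogeneous sub-components, and that these sub-components can serve as intrinsic redundancy. Exploiting this intrinsic redundancy, we propose the RevivalPath technique, under which the whole on-chip router functions correctly as long as any one sub-component is working. The control parts of the router, such as the switch arbiter and the routing computation unit, are protected by direct redundancy. Experimental results show that the approach provides high reliability and achieves graceful degradation of network performance even at high fault rates.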
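The intrinsic-redundancy idea can be illustrated with a small back-of-the-envelope model. The sketch below is not taken from the paper: it assumes that slice faults are independent with a common probability, and that a degraded component serializes a full-width flit over its surviving slices. The function names `survival_probability` and `transfer_cycles` are hypothetical and chosen only for illustration.

```python
import math


def survival_probability(num_slices: int, slice_fault_prob: float) -> float:
    """Probability that at least one slice of a component is fault-free.

    Assumption (not from the paper): slices fail independently, each with
    probability slice_fault_prob.
    """
    return 1.0 - slice_fault_prob ** num_slices


def transfer_cycles(num_slices: int, working_slices: int) -> int:
    """Cycles needed to move one full-width flit through the working slices.

    Assumption (not from the paper): the flit is split into num_slices equal
    parts and the parts are time-multiplexed over the surviving slices.
    """
    if working_slices < 1:
        raise ValueError("component has failed: no working slice left")
    if working_slices > num_slices:
        raise ValueError("working_slices cannot exceed num_slices")
    return math.ceil(num_slices / working_slices)


if __name__ == "__main__":
    # A hypothetical component split into 4 slices, each faulty with probability 0.1.
    print(survival_probability(4, 0.1))
    # Latency grows step by step as slices are lost, rather than failing outright.
    for alive in (4, 3, 2, 1):
        print(alive, transfer_cycles(4, alive))
```

Under these assumptions, losing slices lengthens each transfer instead of disconnecting the router, which is one way to picture the graceful degradation the abstract reports.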
Copyright © Editorial Office of the Journal of Computer Science and Technology