Journal of Computer Science and Technology, 2021, Vol. 36, Issue (5): 1089-1101. DOI: 10.1007/s11390-021-0852-8

Special Issue: Computer Architecture and Systems

• Special Section of 2020 CCF Integrated Circuit Design and Automation Conference

Partial-TMR: A New Method for Protecting Register Files Against Soft Error Based on Lifetime Analysis

Xian-Geng Liang, Ying-Ke Gao*, Member, CCF, and Geng-Xin Hua, Member, CCF        

  1. Beijing Institute of Control Engineering, China Academy of Space Technology, Beijing 100090, China
  • Received: 2020-08-05; Revised: 2021-08-05; Online: 2021-09-30; Published: 2021-09-30
  • About author: Xian-Geng Liang received his B.E. and M.A. degrees in control science and engineering from Beihang University, Beijing, in 2012 and 2015, respectively, and his Ph.D. degree in computer science from China Academy of Space Technology, Beijing, in 2020. He is currently an engineer at Beijing Institute of Control Engineering, China Academy of Space Technology, Beijing. His research interests include computer architecture and reliability.

High-energy particles in space can easily cause soft errors in the register file (RF). The RF is a critical structure in a processor. It often holds data for long periods and is read frequently, which raises the probability that corrupted data spreads to other parts of the processor. Triple modular redundancy (TMR) is a common and effective fault tolerance method that enables multi-bit error correction. However, applying full TMR to all registers causes excessive area and power overheads. Moreover, some registers in the RF have less impact on processor reliability, so there is no need to protect them with TMR. This paper designs an efficient strategy that rates the registers in the RF by their vulnerability. Based on this strategy, the paper formulates a new RF fault tolerance method named Partial-TMR. It selectively protects the more vulnerable registers against multi-bit errors and improves fault tolerance efficiency. For the integer RF, Partial-TMR improves soft error correction capability by 24.5% relative to the baseline system and by 3% relative to ParShield. For the floating-point RF, the improvements are 5.17% and 0.58%, respectively. The soft error correction capability of Partial-TMR is only 1% to 3% lower than that of full TMR. In exchange, compared with full TMR, Partial-TMR reduces area and power overheads by 71.6% and 64.9%, respectively, with little impact on performance. Partial-TMR is therefore a more cost-effective fault tolerance method than both ParShield and full TMR.
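The abstract does not give the exact rating metric. The following is a minimal Python sketch of the general idea, not the paper's method. It assumes a common lifetime-analysis convention: a register value is vulnerable from its write until its last read, and a flip after that last read is overwritten without being consumed. It also assumes a hypothetical trace of `(cycle, op, reg)` events and a fixed budget of protected registers. The names `vulnerable_cycles`, `select_tmr_registers`, `tmr_vote` and `PartialTMRFile` are illustrative only.

```python
"""Hypothetical sketch: lifetime-based register rating plus selective (partial) TMR.

Assumptions (not from the paper): trace events are (cycle, op, reg) with op "W"
(write) or "R" (read); a value is vulnerable from its write until its last read
before the next write; the protection budget is a fixed number of registers.
"""


def vulnerable_cycles(trace, num_regs):
    """Per register, total cycles during which its values were live (write -> last read)."""
    vuln = [0] * num_regs
    last_write = [None] * num_regs
    last_read = [None] * num_regs
    # Sorting puts "R" before "W" within a cycle: operands are read before writeback.
    for cycle, op, reg in sorted(trace):
        if op == "W":
            # Close the previous value's live interval. Time after its last read is
            # dead: a flip there is overwritten by this write without being consumed.
            if last_write[reg] is not None and last_read[reg] is not None:
                vuln[reg] += last_read[reg] - last_write[reg]
            last_write[reg] = cycle
            last_read[reg] = None
        elif op == "R":
            last_read[reg] = cycle
    # Close intervals that are still open at the end of the trace.
    for reg in range(num_regs):
        if last_write[reg] is not None and last_read[reg] is not None:
            vuln[reg] += last_read[reg] - last_write[reg]
    return vuln


def select_tmr_registers(vuln, budget):
    """Pick the `budget` most vulnerable registers (ties broken by lower index)."""
    order = sorted(range(len(vuln)), key=lambda r: (-vuln[r], r))
    return set(order[:budget])


def tmr_vote(a, b, c):
    """Bitwise majority: each output bit takes the value held by at least two copies."""
    return (a & b) | (a & c) | (b & c)


class PartialTMRFile:
    """Register file where only selected registers keep three copies."""

    def __init__(self, num_regs, protected):
        self.protected = set(protected)
        self.copies = [[0, 0, 0] if r in self.protected else [0] for r in range(num_regs)]

    def write(self, reg, value):
        # A write refreshes every copy of the register.
        self.copies[reg] = [value] * len(self.copies[reg])

    def flip_bits(self, reg, copy, mask):
        # Fault injection: XOR `mask` into one stored copy (copy 0 if unprotected).
        self.copies[reg][copy] ^= mask

    def read(self, reg):
        c = self.copies[reg]
        return tmr_vote(*c) if len(c) == 3 else c[0]


if __name__ == "__main__":
    trace = [(0, "W", 1), (2, "R", 1), (9, "R", 1), (1, "W", 2), (3, "R", 2),
             (7, "W", 2), (4, "W", 3), (5, "R", 3), (6, "R", 3)]
    vuln = vulnerable_cycles(trace, 4)
    print(vuln)                      # [0, 9, 2, 2]
    protected = select_tmr_registers(vuln, 2)
    print(sorted(protected))         # [1, 2]
    rf = PartialTMRFile(4, protected)
    rf.write(1, 0b1010)
    rf.flip_bits(1, 0, 0b0110)       # two-bit upset in one copy of r1
    print(rf.read(1))                # 10 (0b1010, corrected by voting)
    rf.write(3, 5)
    rf.flip_bits(3, 0, 0b1)          # r3 is unprotected
    print(rf.read(3))                # 4 (error passes through)
```

In the demo, r2's dead interval (cycles 3 to 7, between its last read and the overwrite) adds nothing to its score. Only the top-ranked registers pay for three copies, which is where a partial scheme saves area and power relative to full TMR. An unprotected register such as r3 simply returns the corrupted value.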

Key words: register file; soft error; lifetime analysis; selective protection; triple modular redundancy (TMR)


ISSN 1000-9000(Print)

CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
  Copyright ©2015 JCST, All Rights Reserved