Partial-TMR: A New Method for Protecting Register Files Against Soft Error Based on Lifetime Analysis
-
Abstract
High-energy particles in the space can easily cause soft error in register file (RF). As a critical structure in a processor, RF often stores data for long periods of time and is read frequently, resulting in a higher probability of spreading corrupted data to other parts of the processor. The triple modular redundancy (TMR) is a common and effective fault tolerance method that enables multi-bit error correction. Designing full TMR for all the registers could cause excessive area and power overheads. However, some registers in RF have less impact on processor reliability. Therefore, there is no need to design TMR for them. This paper designs an efficient strategy which can rate the registers in RF based on their vulnerability. Based on the proposed strategy, a new RF fault tolerance method named Partial-TMR formulates in this paper, which selectively protects more vulnerable registers against multi-bit error, and improves fault tolerance efficiency. For integer RF, Partial-TMR improves its soft error correction capability by 24.5% relative to the baseline system and 3% relative to ParShield, while for floating-point RF, the improvement comes to 5.17% and 0.58% respectively. The soft error correction capability of Partial-TMR is slightly lower than that of full TMR by 1% to 3%, but Partial-TMR significantly cuts the area and power overheads. Compared with full TMR, Partial-TMR decreases the area and power overheads by 71.6% and 64.9%, respectively. It also has little impact on the performance. Partial-TMR is a more cost-effective fault tolerance method compared with ParShield and full TMR.
-
-