Checkpoint Management with Double Modular Redundancy Based on the Probability of Task Completion
-
Abstract
This paper proposes a checkpoint rollback strategy for real-time systems with double modular redundancy. Without built-in fault-detection and spare processors, our scheme is able to recover from both transient and permanent faults. Two comparisons are conducted at each checkpoint. First, the states stored in two consecutive checkpoints of one processor are compared for checking integrity of the processor. The states of two processors are also compared for detecting faults and the system rolls back to the previous checkpoint whenever required by logic of the proposed scheme. A Markov model is induced by the fault recovery scheme and analyzed to provide the probability of task completion within its deadline. The optimal number of checkpoints is selected so as to maximize the probability of task completion.
-
-