A TIME REDUNDANCY APPROACH TO TMR FAILURES USING FAULT-STATE LIKELIHOODS

Authors
Citation
Kg. Shin and Hb. Kim, A TIME REDUNDANCY APPROACH TO TMR FAILURES USING FAULT-STATE LIKELIHOODS, IEEE Transactions on Computers, 43(10), 1994, pp. 1151-1162
Number of citations
23
Subject Categories
Computer Sciences; Engineering, Electrical & Electronic; Computer Science, Hardware & Architecture
ISSN journal
00189340
Volume
43
Issue
10
Year of publication
1994
Pages
1151 - 1162
Database
ISI
SICI code
0018-9340(1994)43:10<1151:ATRATT>2.0.ZU;2-B
Abstract
Failure to establish a majority among the processing modules in a triple modular redundant (TMR) system, called a TMR failure, is detected by using two voters and a disagreement detector. Assuming that no more than one module becomes permanently faulty during the execution of a task, Re-execution of the task on the Same HardWare (RSHW) upon detection of a TMR failure becomes a cost-effective recovery method, because 1) the TMR system can mask the effects of one faulty module while RSHW can recover from nonpermanent faults, and 2) system reconfiguration, that is, Replace the faulty HardWare, reload, and Restart (RHWR), is expensive both in time and hardware. We propose an adaptive recovery method for TMR failures by "optimally" choosing either RSHW or RHWR based on the estimation of the costs involved. We apply Bayes' theorem to update the likelihoods of all possible states in the TMR system with each voting result. Upon detection of a TMR failure, the expected cost of RSHW is derived with these likelihoods and then compared with that of RHWR. RSHW will continue either until it recovers from the TMR failure or until the expected cost of RSHW becomes larger than that of RHWR. As the number of unsuccessful RSHWs increases, the probability of permanent fault(s) having caused the TMR failure will increase, which will, in turn, increase the cost of RSHW. Our simulation results show that the proposed method outperforms the conventional reconfiguration method using only RHWR under various conditions.
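
The decision loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' model: the two-state fault space ("nonpermanent" versus "permanent"), the one-step-lookahead cost formula, all function names, and all numeric values are assumptions chosen only to show the mechanism (a Bayes update after each failed re-execution, followed by a comparison of the expected RSHW cost with the RHWR cost).

```python
from typing import Dict, Tuple

Belief = Dict[str, float]  # hypothetical state label -> probability


def bayes_update(belief: Belief, likelihood: Dict[str, float]) -> Belief:
    """Posterior P(state | observation) from a prior belief and
    P(observation | state), using Bayes' theorem."""
    unnormalized = {s: belief[s] * likelihood[s] for s in belief}
    total = sum(unnormalized.values())
    if total <= 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s: p / total for s, p in unnormalized.items()}


def expected_rshw_cost(belief: Belief, p_retry_fails: Dict[str, float],
                       c_retry: float, c_rhwr: float) -> float:
    """Assumed one-step lookahead: pay for one re-execution and, if it
    ends in another TMR failure, fall back to RHWR."""
    p_fail = sum(belief[s] * p_retry_fails[s] for s in belief)
    return c_retry + p_fail * c_rhwr


def choose_recovery(belief: Belief, p_retry_fails: Dict[str, float],
                    c_retry: float, c_rhwr: float) -> str:
    """Pick the recovery action with the smaller (expected) cost."""
    cost = expected_rshw_cost(belief, p_retry_fails, c_retry, c_rhwr)
    return "RSHW" if cost < c_rhwr else "RHWR"


def retries_before_rhwr(belief: Belief, p_retry_fails: Dict[str, float],
                        c_retry: float, c_rhwr: float,
                        max_retries: int = 100) -> Tuple[int, Belief]:
    """Worst case in which every re-execution fails again: count the
    RSHW attempts made before the policy switches to RHWR."""
    retries = 0
    while (retries < max_retries and
           choose_recovery(belief, p_retry_fails, c_retry, c_rhwr) == "RSHW"):
        # A failed retry is itself an observation; update the belief with it.
        belief = bayes_update(belief, p_retry_fails)
        retries += 1
    return retries, belief


if __name__ == "__main__":
    # Illustrative, made-up numbers (not taken from the paper).
    prior = {"nonpermanent": 0.9, "permanent": 0.1}
    p_retry_fails = {"nonpermanent": 0.2, "permanent": 1.0}
    n, final = retries_before_rhwr(prior, p_retry_fails, c_retry=1.0, c_rhwr=5.0)
    print(n, final)
```

With these assumed numbers, each failed re-execution shifts probability toward the permanent-fault state, so the expected RSHW cost rises until it exceeds the RHWR cost and the sketch switches to reconfiguration after three unsuccessful re-executions. This mirrors the qualitative behavior the abstract describes, not its quantitative results.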