D.L. Tao and K. Kantawala, EVALUATING RELIABILITY IMPROVEMENTS OF FAULT-TOLERANT ARRAY PROCESSORS USING ALGORITHM-BASED FAULT-TOLERANCE, IEEE Transactions on Computers, 46(6), 1997, pp. 725-730
Algorithm-based fault tolerance (ABFT) is used to provide low-cost error protection for VLSI processor arrays used in real-time digital signal processing. The main objective of incorporating an ABFT technique in a processor array is to improve its reliability. All previous approaches to ABFT have been evaluated in terms of their error detecting/correcting capabilities; the reliability improvement has never been addressed. In this paper, we develop a stochastic model for an array processor incorporating ABFT that takes the behavior of transient/intermittent failures and hardware overhead into account. This model is then used to evaluate the reliability and reliability improvements of several existing ABFT techniques that tolerate single faults. A user can therefore evaluate a number of ABFT techniques and make a trade-off between reliability and cost prior to implementation. Moreover, we have conducted extensive simulation experiments, and the simulation results validate the proposed model.
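
As an illustration of what a single-fault-tolerant ABFT technique can look like, the following is a minimal sketch of the classic checksum-matrix scheme for matrix multiplication. It is not taken from the paper and does not reproduce its stochastic model; the function names (`add_column_checksum`, `locate_single_error`, and so on) are hypothetical, and integer data is assumed so that checksum comparisons are exact.

```python
# Sketch only: checksum-based ABFT for matrix multiplication (assumed example,
# not the paper's model). Integer entries are assumed so equality tests are exact.

def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_column_checksum(a):
    """Append a row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]

def add_row_checksum(b):
    """Append a column holding the sum of each row of b."""
    return [row + [sum(row)] for row in b]

def locate_single_error(c_full):
    """Return (row, col) of a single faulty data element, or None if consistent."""
    n = len(c_full) - 1        # number of data rows
    m = len(c_full[0]) - 1     # number of data columns
    bad_rows = [i for i in range(n) if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    # Faults in a checksum entry or multiple faults are reported, not corrected.
    raise ValueError("error pattern is not a single data-element fault")

def correct_single_error(c_full):
    """Repair a single faulty data element in place using its row checksum."""
    loc = locate_single_error(c_full)
    if loc is not None:
        i, j = loc
        m = len(c_full[0]) - 1
        others = sum(c_full[i][k] for k in range(m) if k != j)
        c_full[i][j] = c_full[i][m] - others
    return loc

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c_full = matmul(add_column_checksum(a), add_row_checksum(b))
c_full[1][0] += 7                      # inject a single fault into a data element
fault_location = correct_single_error(c_full)
recovered = [row[:-1] for row in c_full[:-1]]
```

The extra checksum row and column are the kind of hardware and computation overhead that a reliability evaluation has to weigh against the protection gained.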