B. Vinnakota and N. K. Jha, Diagnosability and Diagnosis of Algorithm-Based Fault-Tolerant Systems, IEEE Transactions on Computers, 42(8), 1993, pp. 924-937
Parallel processing architectures are now in common use for signal
processing and other computation-intensive applications. These
applications are characterized by high throughput and long processing
periods. Such characteristics decrease the reliability of
high-performance architectures. The erroneous data produced by faulty
processors could have damaging consequences, particularly in critical
real-time applications. It is therefore desirable that any erroneous
data produced by the system be detected and located as quickly as
possible. Algorithm-based fault tolerance (ABFT) is a low-cost
system-level concurrent error detection and fault location scheme. We
apply methods used in the analysis of multiprocessor systems employing
system-level diagnosis to the analysis of ABFT systems. A new algorithm
to analyze an ABFT system for its fault diagnosability is developed
using these methods. Based on this work, a fault diagnosis algorithm is
developed for ABFT systems. No such algorithm has been presented
previously.