EVALUATING RELIABILITY IMPROVEMENTS OF FAULT-TOLERANT ARRAY PROCESSORS USING ALGORITHM-BASED FAULT-TOLERANCE

Citation
D.L. Tao and K. Kantawala, EVALUATING RELIABILITY IMPROVEMENTS OF FAULT-TOLERANT ARRAY PROCESSORS USING ALGORITHM-BASED FAULT-TOLERANCE, IEEE Transactions on Computers, 46(6), 1997, pp. 725-730
Number of citations
23
Subject Categories
Computer Sciences; Engineering, Electrical & Electronic; Computer Science, Hardware & Architecture
Journal ISSN
0018-9340
Volume
46
Issue
6
Year of publication
1997
Pages
725 - 730
Database
ISI
SICI code
0018-9340(1997)46:6<725:ERIOFA>2.0.ZU;2-1
Abstract
Algorithm-based fault tolerance (ABFT) is used to provide low-cost error protection for VLSI processor arrays used in real-time digital signal processing. The main objective of incorporating an ABFT technique in a processor array is to improve its reliability. All previous approaches to ABFT have been evaluated in terms of their error detecting/correcting capabilities; the reliability improvement has never been addressed. In this paper, we develop a stochastic model for an array processor incorporating ABFT that takes the behavior of transient/intermittent failures and hardware overhead into account. This model is then used to evaluate the reliability and reliability improvements of several existing ABFT techniques that tolerate single faults. Therefore, a user can evaluate a number of ABFT techniques and make a trade-off between reliability and cost prior to implementation. Moreover, we have conducted extensive simulation experiments, and the simulation results validate the proposed model.