The need for reliable computers has been increasing as computers are
put to use in more and more practical applications. Multiprocessor
architectures have provided elegant solutions for certain
computationally expensive problems that find wide-ranging applications
in areas such as defense and industry. Since computation-intensive
applications are run on these architectures, the probability that some
computations will incur errors is not negligible. Hence fault tolerance
plays an important role in the design of multiprocessor architectures.
In this paper, we review a low-cost scheme for adding fault tolerance
to multiprocessor architectures, called algorithm-based fault tolerance
(ABFT). The concurrent error-detecting and error-correcting
capabilities of this scheme are demonstrated with the help of examples.
Various issues of interest, the areas open to research, and the
limitations of ABFT are also pointed out. (C) 1997 Elsevier Science B.V.
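
To make the detect-and-correct idea concrete, the following is a minimal sketch of the checksum-matrix technique that is commonly used to illustrate ABFT for matrix multiplication. It is not taken from the paper: the function names are hypothetical, integer arithmetic is assumed so that checksum comparisons are exact, and only a single faulty element in the data part of the result is handled.

```python
def column_checksum(a):
    """Return a copy of a with an extra row holding each column's sum."""
    return [row[:] for row in a] + [[sum(col) for col in zip(*a)]]


def row_checksum(b):
    """Return a copy of b with an extra column holding each row's sum."""
    return [row[:] + [sum(row)] for row in b]


def matmul(x, y):
    """Plain matrix product of two lists of lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
            for row in x]


def locate_and_correct(c):
    """Check a full-checksum matrix c and repair one faulty data element.

    Returns None if all checksums agree, or (i, j) of the element that
    was corrected in place. Assumption: at most one data element is wrong.
    """
    n = len(c) - 1       # number of data rows
    m = len(c[0]) - 1    # number of data columns
    bad_rows = [i for i in range(n) if sum(c[i][:m]) != c[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c[i][j] for i in range(n)) != c[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        # The row discrepancy equals the size of the error in c[i][j].
        c[i][j] -= sum(c[i][:m]) - c[i][m]
        return (i, j)
    raise ValueError("error pattern is not a single data-element fault")


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    # The product of a column-checksum and a row-checksum matrix
    # carries both row and column checksums.
    c = matmul(column_checksum(a), row_checksum(b))
    c[1][0] += 7                      # inject a single fault
    print(locate_and_correct(c), c)   # location found, value restored
```

The intersection of the one inconsistent row and the one inconsistent column locates the fault, and the checksum discrepancy gives the correction. With floating-point data the equality tests would need a tolerance, which this sketch does not attempt.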