We summarize techniques that may be of interest in ensuring the
survivability of a distributed or parallel application in the presence
of processor stoppages or failures. Starting from techniques used in
sequential systems, we review checkpointing and rollback recovery
techniques. Then we discuss methods which are oriented specifically
to distributed and parallel system survivability.
© 1997 Elsevier Science Inc.