W. Hohl et al., HARDWARE SUPPORT FOR ERROR-DETECTION IN MULTIPROCESSOR SYSTEMS - A CASE-STUDY, Microprocessors and Microsystems, 17(4), 1993, pp. 201-206
A comparison of the most important methods for error detection in multiprocessor systems is presented, based on the experience gained in developing the fault-tolerant multiprocessor system MEMSY. Watchdog processors and master-checker type duplication-based fault tolerance are compared in detail with respect to fault coverage, hardware overhead and time overhead. It is shown that simple replication of hardware is by itself insufficient to ensure proper error detection, especially when a low error latency is required. Design guidelines are presented for the effective use of duplication based on the master-checker mode. Additionally, a new general-purpose watchdog processor architecture is proposed, which monitors the behaviour of the main processor by checking the control flow of processes using an extended signature integrity checking (ESIC) method. The watchdog processor is independent of the architecture of the main processor because it is linked to the main processor by a memory interface. The watchdog processor is well suited to multiprocessor systems built from standard components, with a RISC or CISC processor with a large cache as the node processor.
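The abstract does not describe how ESIC works internally. As a rough illustration of the general idea behind signature-based control-flow checking by a watchdog, the following Python sketch assumes that, for each basic block it executes, the main processor reports a block identifier and a run-time signature through a memory-mapped interface, and that the watchdog holds precomputed reference signatures plus the set of legal successor blocks. The signature function (rotate-and-XOR over 16-bit instruction words), the names `block_signature`, `Watchdog` and `run`, the block names and the instruction words are all hypothetical. This is a generic scheme, not the ESIC method of the paper.

```python
from dataclasses import dataclass, field
from typing import Optional


def block_signature(words):
    """Compress a basic block's 16-bit instruction words into a 16-bit
    signature: rotate left by one bit, then XOR in the next word.
    (Hypothetical stand-in for a real signature scheme.)"""
    sig = 0
    for w in words:
        sig = ((sig << 1) | (sig >> 15)) & 0xFFFF  # 16-bit rotate left by 1
        sig ^= w & 0xFFFF
    return sig


@dataclass
class Watchdog:
    reference: dict   # block id -> expected signature, precomputed offline
    successors: dict  # block id -> set of legal next block ids (control-flow graph)
    last_block: Optional[str] = None
    errors: list = field(default_factory=list)

    def on_write(self, block_id, runtime_sig):
        """Handle one (block id, signature) pair written by the main
        processor to the watchdog's assumed memory-mapped interface."""
        # Control-flow check: is this block a legal successor of the last one?
        if self.last_block is not None and \
                block_id not in self.successors.get(self.last_block, set()):
            self.errors.append(f"illegal transition {self.last_block} -> {block_id}")
        # Integrity check: does the run-time signature match the reference?
        if self.reference.get(block_id) != runtime_sig:
            self.errors.append(f"signature mismatch in block {block_id}")
        self.last_block = block_id


# Hypothetical program: three basic blocks and their legal successors.
program = {"A": [0x1201, 0x3404], "B": [0x5607, 0x780A], "C": [0x9A0D]}
cfg = {"A": {"B", "C"}, "B": {"C"}, "C": set()}
reference = {b: block_signature(ws) for b, ws in program.items()}


def run(trace):
    """Simulate the main processor executing a trace of (block id, words
    actually executed) and return the errors the watchdog reports."""
    wd = Watchdog(reference, cfg)
    for block_id, words in trace:
        wd.on_write(block_id, block_signature(words))
    return wd.errors


if __name__ == "__main__":
    ok = [("A", program["A"]), ("B", program["B"]), ("C", program["C"])]
    bit_flip = [("A", program["A"]), ("B", [0x5607, 0x780B]), ("C", program["C"])]
    bad_jump = [("A", program["A"]), ("C", program["C"]), ("B", program["B"])]
    print(run(ok))        # []
    print(run(bit_flip))  # ['signature mismatch in block B']
    print(run(bad_jump))  # ['illegal transition C -> B']
```

Rotating before each XOR makes the signature depend on the position of each word as well as its value. Because each step is a bit permutation followed by an XOR, any single-bit error in a block's instruction words changes its signature, but multi-bit errors can cancel out, so practical designs often use stronger compaction, for example LFSR-based signatures.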