S. Tridandapani et al., LOW OVERHEAD MULTIPROCESSOR ALLOCATION STRATEGIES EXPLOITING SYSTEM SPARE CAPACITY FOR FAULT-DETECTION AND LOCATION, IEEE Transactions on Computers, 44(7), 1995, pp. 865-877
Several schemes for detecting faults at the processor level in a
multiprocessor system have been discussed in the past. One such scheme
[1] works by running secondary versions of jobs on the unused, or
spare, processors of the system and uses the comparison approach [2] to
detect faults. We build upon this scheme and propose three new
multiprocessor allocation strategies that run a variable number of
versions per job. These schemes permit on-line detection and, in many
cases, location of faulty processors in a system with nominal
degradation in its delay/throughput performance; the added delays are
limited chiefly to those associated with job preemptions. Two new
metrics, the fault detection capability (FDC) and the fault location
capability (FLC), are introduced to evaluate these schemes. Extensive
simulations are performed to obtain performance figures for the various
schemes. Stochastic Petri Net models are also developed to obtain
approximate performance results. The results show that these schemes
utilize spare capacity more efficiently, thereby improving upon the
fault detection and location capabilities of the system.
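
The comparison approach mentioned above can be illustrated with a minimal
sketch. It is not the paper's algorithm: the function name
compare_versions, the result encoding, and the majority rule used for
location are all assumptions made for illustration. The sketch assumes
that a faulty processor produces a result that differs from the correct
one, and that a strict majority of the versions of a job ran on
fault-free processors.

```python
from collections import Counter
from typing import Dict, Hashable, Set, Tuple


def compare_versions(results: Dict[int, Hashable]) -> Tuple[bool, Set[int]]:
    """Compare the outputs of several versions of one job.

    `results` maps a (hypothetical) processor id to the output produced by
    the version of the job that ran on it. Returns a pair
    (fault_detected, located_processors).
    """
    # A single version gives nothing to compare against.
    if len(results) < 2:
        return False, set()

    counts = Counter(results.values())

    # All versions agree: no fault is observed.
    if len(counts) == 1:
        return False, set()

    # Any disagreement means a fault is detected.
    top_value, top_count = counts.most_common(1)[0]

    # Assumed location rule: if a strict majority agrees, the processors
    # that disagree with the majority are taken to be the faulty ones.
    if 2 * top_count > len(results):
        suspects = {proc for proc, out in results.items() if out != top_value}
        return True, suspects

    # Disagreement without a majority: detection only, no location.
    return True, set()
```

Under these assumptions, two versions of a job suffice to detect a
fault, while three or more are needed before the disagreeing processor
can be singled out. This is one way to see why running a variable number
of versions per job trades spare capacity against detection and location
ability.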