LOW OVERHEAD MULTIPROCESSOR ALLOCATION STRATEGIES EXPLOITING SYSTEM SPARE CAPACITY FOR FAULT-DETECTION AND LOCATION

Citation
S. Tridandapani et al., LOW OVERHEAD MULTIPROCESSOR ALLOCATION STRATEGIES EXPLOITING SYSTEM SPARE CAPACITY FOR FAULT-DETECTION AND LOCATION, IEEE Transactions on Computers, 44(7), 1995, pp. 865-877
Number of citations
13
Subject Categories
Computer Sciences; Engineering, Electrical & Electronic; Computer Science, Hardware & Architecture
Journal ISSN
0018-9340
Volume
44
Issue
7
Year of publication
1995
Pages
865 - 877
Database
ISI
SICI code
0018-9340(1995)44:7<865:LOMASE>2.0.ZU;2-A
Abstract
Several schemes for detecting faults at the processor level in a multiprocessor system have been discussed in the past. One such scheme [1] works by running secondary versions of jobs on the unused, or spare, processors of the system and uses the comparison approach [2] to detect faults. We build upon this scheme and propose three new multiprocessor allocation strategies that run a variable number of versions per job. These schemes permit on-line detection and, in many cases, location of faulty processors in a system with nominal degradation in its delay/throughput performance; the added delays are limited chiefly to those associated with job preemptions. Two new metrics, the fault detection capability (FDC) and the fault location capability (FLC), are introduced to evaluate these schemes. Extensive simulations are performed to obtain performance figures for the various schemes. Stochastic Petri net models are also developed to obtain approximate performance results. The results show that these schemes utilize spare capacity more efficiently, thereby improving upon the fault detection and location capabilities of the system.
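
Illustrative note (not part of the original record): the comparison approach mentioned in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the allocation strategies proposed in the paper. The function name `compare_versions`, the processor labels, and the majority rule used for location are all hypothetical, and the sketch assumes that a faulty processor produces an output that differs from the correct one.

```python
from collections import Counter


def compare_versions(results):
    """Compare the outputs of several versions of one job.

    `results` maps a (hypothetical) processor id to the output that
    processor produced for the job. Returns a pair
    (fault_detected, suspected_processors).
    """
    if len(results) < 2:
        # A single version gives nothing to compare against.
        return False, set()

    counts = Counter(results.values())
    if len(counts) == 1:
        # All versions agree: no fault is observed.
        return False, set()

    # Any disagreement means a fault has been detected.
    top_value, top_count = counts.most_common(1)[0]
    if top_count > len(results) / 2:
        # Assumption: a strict majority is correct, so the
        # dissenting processors are the suspected faulty ones.
        suspects = {p for p, out in results.items() if out != top_value}
        return True, suspects

    # Detected, but without a majority the fault cannot be located.
    return True, set()
```

Under these assumptions, two versions that disagree allow detection only, while three versions with one dissenter also allow location, which matches the abstract's remark that location is possible "in many cases" when more versions run on spare processors.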