S. Dutt et Jp. Hayes, A LOCAL-SPARING DESIGN METHODOLOGY FOR FAULT-TOLERANT MULTIPROCESSORS, Computers & mathematics with applications, 34(11), 1997, pp. 25-50
We present a comprehensive design methodology for constructing low-cos
t multiprocessors that use local spares to tolerate the failure of eit
her processor clusters or individual processors. We first formalize th
e concepts of global-and local-sparing in terms of graph automorphisms
. We then present a method for partitioning a multiprocessor graph by
its automorphisms and for incorporating local-sparing to tolerate faul
ts. We emphasize local-sparing designs, since they offer higher reliab
ility-to-cost ratios and can reconfigure faster and in a localized man
ner. When the spare clusters in each local subsystem are certain sizes
, our designs are optimal in the number of spare intersubsystem links.
They are all efficient (optimal in some cases) in terms of the number
of spare intrasubsystem links. We present switch-based implementation
s that significantly reduces the spare link complexities of the design
s. These implementations are equally efficient for any spare cluster s
ize, so they yield efficient local-sparing designs that can tolerate i
ndividual processor faults (cluster size of one). Algorithms for fast,
localized, and incremental reconfiguration of our FT designs are also
developed. Finally, we demonstrate that our local-sparing designs hav
e higher reliability-to-cost ratios than previous designs.