A LOCAL-SPARING DESIGN METHODOLOGY FOR FAULT-TOLERANT MULTIPROCESSORS

Authors
Citation
S. Dutt et Jp. Hayes, A LOCAL-SPARING DESIGN METHODOLOGY FOR FAULT-TOLERANT MULTIPROCESSORS, Computers & mathematics with applications, 34(11), 1997, pp. 25-50
Citations number
28
ISSN journal
08981221
Volume
34
Issue
11
Year of publication
1997
Pages
25 - 50
Database
ISI
SICI code
0898-1221(1997)34:11<25:ALDMFF>2.0.ZU;2-T
Abstract
We present a comprehensive design methodology for constructing low-cos t multiprocessors that use local spares to tolerate the failure of eit her processor clusters or individual processors. We first formalize th e concepts of global-and local-sparing in terms of graph automorphisms . We then present a method for partitioning a multiprocessor graph by its automorphisms and for incorporating local-sparing to tolerate faul ts. We emphasize local-sparing designs, since they offer higher reliab ility-to-cost ratios and can reconfigure faster and in a localized man ner. When the spare clusters in each local subsystem are certain sizes , our designs are optimal in the number of spare intersubsystem links. They are all efficient (optimal in some cases) in terms of the number of spare intrasubsystem links. We present switch-based implementation s that significantly reduces the spare link complexities of the design s. These implementations are equally efficient for any spare cluster s ize, so they yield efficient local-sparing designs that can tolerate i ndividual processor faults (cluster size of one). Algorithms for fast, localized, and incremental reconfiguration of our FT designs are also developed. Finally, we demonstrate that our local-sparing designs hav e higher reliability-to-cost ratios than previous designs.