ROUTING IN MODULAR FAULT-TOLERANT MULTIPROCESSOR SYSTEMS

Authors
Citation
Ms. Alam et Rg. Melhem, ROUTING IN MODULAR FAULT-TOLERANT MULTIPROCESSOR SYSTEMS, IEEE transactions on parallel and distributed systems, 6(11), 1995, pp. 1206-1220
Citations number
31
Categorie Soggetti
System Science","Engineering, Eletrical & Electronic","Computer Science Theory & Methods
ISSN journal
10459219
Volume
6
Issue
11
Year of publication
1995
Pages
1206 - 1220
Database
ISI
SICI code
1045-9219(1995)6:11<1206:RIMFMS>2.0.ZU;2-G
Abstract
In this paper, we consider a class of modular multiprocessor architect ures in which spares are added to each module to cover for faulty node s within that module, thus forming a fault-tolerant basic block (FTBB) . In contrast to reconfiguration techniques that preserve the physical adjacency between active nodes in the system, our goal is to preserve the logical adjacency between active nodes by means of a routing algo rithm which delivers messages successfully to their destinations, We i ntroduce two-phase routing strategies that route messages first to the ir destination FTBB, and then to the destination nodes within the dest ination FTBB. Such a strategy may be applied to a variety of architect ures including binary hypercubes and three dimensional tori. In the pr esence of f faults in hypercubes and tori, we show that the worst case length of the message route is min {sigma + f, (K + 1)sigma} + c wher e sigma is the shortest path in the absence of faults, K is the number of spare nodes in an FTBB, and c is a small constant. The average rou ting overhead is much lower than the worst case overhead.