Ms. Alam et Rg. Melhem, ROUTING IN MODULAR FAULT-TOLERANT MULTIPROCESSOR SYSTEMS, IEEE transactions on parallel and distributed systems, 6(11), 1995, pp. 1206-1220
Citations number
31
Categorie Soggetti
System Science","Engineering, Eletrical & Electronic","Computer Science Theory & Methods
In this paper, we consider a class of modular multiprocessor architect
ures in which spares are added to each module to cover for faulty node
s within that module, thus forming a fault-tolerant basic block (FTBB)
. In contrast to reconfiguration techniques that preserve the physical
adjacency between active nodes in the system, our goal is to preserve
the logical adjacency between active nodes by means of a routing algo
rithm which delivers messages successfully to their destinations, We i
ntroduce two-phase routing strategies that route messages first to the
ir destination FTBB, and then to the destination nodes within the dest
ination FTBB. Such a strategy may be applied to a variety of architect
ures including binary hypercubes and three dimensional tori. In the pr
esence of f faults in hypercubes and tori, we show that the worst case
length of the message route is min {sigma + f, (K + 1)sigma} + c wher
e sigma is the shortest path in the absence of faults, K is the number
of spare nodes in an FTBB, and c is a small constant. The average rou
ting overhead is much lower than the worst case overhead.