Fault-tolerant processor arrays using additional bypass linking allocated by graph-node coloring

Authors
Citation
N. Tsuda, Fault-tolerant processor arrays using additional bypass linking allocated by graph-node coloring, IEEE COMPUT, 49(5), 2000, pp. 431-442
Citations number
26
Categorie Soggetti
Computer Science & Engineering
Journal title
IEEE TRANSACTIONS ON COMPUTERS
ISSN journal
00189340 → ACNP
Volume
49
Issue
5
Year of publication
2000
Pages
431 - 442
Database
ISI
SICI code
0018-9340(200005)49:5<431:FPAUAB>2.0.ZU;2-4
Abstract
An advanced spare-connection scheme for k-out-of-n redundancy called "gener alized additional bypass linking" is proposed for constructing fault-tolera nt massively parallel computers with series-connected, mesh-connected, or t ree-connected processing element (PE) arrays. This scheme uses bypass links with wired OR connections to selectively connect the primary PEs to a spar e PE in parallel. These bypass links are allocated to the primary PEs by no de-coloring of a graph with a minimum inter-node distance of three in order to minimize the number of bypass links (i.e., the chromatic number). The m ain advantage of this scheme is that it can be used for constructing variou s k-out-of-n configurations capable of enhanced PE-to-PE communication and broadcast while still achieving strong fault tolerance for these PEs and li nks. In particular, it enables the construction of optimal r-strongly-fault -tolerant configurations capable of direct k-out-of-n selections by providi ng r spare PEs and r extra connections per PE for any kind of array when no de-coloring with a distance of three is used. This simple spare-circuit str ucture enhances fault tolerance more than conventional schemes do. The node -coloring patterns were constructed using new node-coloring algorithms and the chromatic numbers were evaluated theoretically. Enhanced PE-to-PE commu nication and broadcast were achieved by using new fault-tolerant routing al gorithms based on the properties of the node-coloring patterns with four or five message transmission steps being optimal configurations with any size array.