This paper considers the architecture of clusters and related
message-passing (MP) software algorithms and their effect on the performance
(speedup and efficiency) of cluster computing (CC). We present new
architectures for multi-segment Ethernet clusters and new MP algorithms that
fit these architectures. The multiple segments (e.g., commodity hubs) connect
commodity processor nodes so as to allow MP to be highly parallelized by
avoiding network contention and collisions in many applications where
all-gather and other collective operations are central. We analyze all-gather
in some detail, and present new network topologies and new MP algorithms to
minimize latency.
The new topologies are based on a design, called two-by-four nets (2 x 4
nets), by Compbionics. An integrated MP software system, called Reduced
Overhead Cluster Communication (ROCC), which embodies the MP algorithms, is
also described. In brief, 2 x 4 nets are networks of "supernodes", called
2 x 4's, each having 4 processors on 2 segments, the segments usually being
Ethernet hubs. The supernodes are typically connected to form rings or tori
of supernodes. We present actual test results and supporting analyses to
demonstrate that 2 x 4 nets with the ROCC MP software are faster than many
existing clusters and generally less costly. (C) 2000 Published by Elsevier
Science B.V. All rights reserved.