The Computational Plant (Cplant) project at Sandia National Laboratories is
developing a large-scale, massively parallel computing resource from a cluster
of commodity computing and networking components. We are combining the
benefits of commodity cluster computing with our expertise in designing,
developing, using, and maintaining large-scale, massively parallel processing
(MPP) machines. In this paper, we present the design goals of the cluster
and an approach to developing a commodity-based computational resource capable
of delivering performance comparable to that of production-level MPP machines.
We describe the hardware components of a 96-node Phase I
prototype machine and discuss the experiences with the prototype that led to
the hardware choices for a 400-node Phase II production machine. We give
a detailed description of the management and runtime software components of
the cluster and offer computational performance data as well as performance
measurements of functions that are critical to the management of large
systems. (C) 2000 Elsevier Science B.V. All rights reserved.