RATIONALE AND STRATEGY FOR A 21ST-CENTURY SCIENTIFIC COMPUTING ARCHITECTURE - THE CASE FOR USING COMMERCIAL SYMMETRICAL MULTIPROCESSORS AS SUPERCOMPUTERS
W.E. Johnston, Rationale and Strategy for a 21st-Century Scientific Computing Architecture - The Case for Using Commercial Symmetrical Multiprocessors as Supercomputers, International Journal of High Speed Computing, 9(3), 1997, pp. 191-222
In this paper we argue that the next generation of supercomputers will be based on tight-knit clusters of symmetric multiprocessor systems in order to: (i) provide higher capacity at lower cost; (ii) enable easy future expansion; and (iii) ease the development of computational science applications. This strategy involves recognizing that the current vector supercomputer user community divides (roughly) into two groups, each of which will benefit from this approach: one group, the "capacity" users (who tend to run production codes aimed at solving the science problems of today), will get better throughput than they do today by moving to large symmetric multiprocessor systems (SMPs), and a second group, the "capability" users (who tend to be developing new computational science techniques), will invest the time needed to get high performance from cluster-based parallel systems.

In addition to the technology-based arguments for the strategy, we believe that it also supports a vision for a revitalization of scientific computing. This vision is that an architecture based on commodity components and computer science innovation will: (i) enable very scalable high-performance computing to address high-end computational science requirements; (ii) provide better throughput and a more productive code development environment for production supercomputing; (iii) provide a path to integration with the laboratory and experimental sciences; and (iv) be the basis
of an on-going collaboration between the scientific community, the computing industry, and the research computer science community in order to provide a computing environment that is compatible with production codes and dynamically increasing in both hardware and software capability and capacity.

We put forward the thesis that the current level of hardware performance and the sophistication of the software environment found in commercial symmetric multiprocessor (SMP) systems, together with advances in distributed systems architectures, make clusters of SMPs one of the highest-performance, most cost-effective approaches to computing available today. The current capacity users of C90-like systems will be served in such an environment by having more of several critical resources than the current environment provides: much more CPU time per unit of real time, larger memory per node, and much larger memory per cluster; the capability users are served by MPP-like performance and an architecture that enables continuous growth into the future.
In addition to these primary arguments, secondary advantages of SMP clusters include: the ability to replicate this sort of system in smaller units to provide identical computing environments at the home sites and laboratories of scientific users; the future potential for using the global Internet to interconnect large clusters at a central facility with smaller clusters at other sites to form a very high capability system; and a rapidly growing base of supporting commercial software.

The arguments made to support this thesis are as follows: (1) Workstation vendors are increasingly turning their attention to parallelism in order to run increasingly complex software in their commercial product lines. Because of their very large investment in research and development for hardware and software, the pace of development by the "workstation" manufacturers is so rapid that special-purpose research aimed at just the high-performance market is no longer able to produce significant advantages over the mass-market products. We illustrate this trend and analyze its impact on the current performance of SMPs relative to vector supercomputers. (2) Several factors also suggest that
"clusters" of SMPs will shortly outperform traditional MPPs, for reasons similar to those mentioned above. The mass-produced network architectures and components being used to interconnect SMP clusters are experiencing technology and capability growth trends similar to those of commodity computing systems. This is due to the economic drivers of the merging of computing and telecommunications technology, and to the greatly increased demand for high-bandwidth data communication. Very-high-speed general-purpose networks are now being produced for a large market, and the technology is experiencing the same kinds of rapid advances as workstation processor technology. The engineering required to build MPPs from special-purpose networks that are integrated in special ways with commercial microprocessors is costly and requires long engineering lead times. This results in delivered MPPs with less capable processors than those being delivered in workstations at the same time. (3) Commercial software now exists that provides integrated, MPP-style code development and system management for clusters of SMPs, and software architectures and components that will provide even more homogeneous views of clusters of SMPs are now emerging from several academic research groups.

We propose that the next-generation scientific supercomputer center be built from clusters of SMPs, and suggest a strategy for an initial 50 Gflop configuration, with incremental increases thereafter to reach a teraflop shortly after the turn of the century. While this cluster
uses what is called "network of workstations" technology, the individual nodes are, in and of themselves, powerful systems that typically have several gigaflops of CPU capacity and several gigabytes of memory.

The risks of this approach are analyzed and found to be similar to those of MPPs. That is, the risks lie primarily in software issues that are similar for SMP clusters and MPPs: namely, in the provision of a homogeneous view of a distributed memory system. The argument is made that the capacity of today's large SMPs, taken together with already existing distributed systems software, will provide a versatile and powerful computational science environment. We also address the issues of application availability and code conversion to this new environment even if the homogeneous cluster software environment does not mature as quickly as expected.

The throughput of the proposed SMP cluster architecture is substantial. The job mix is more easily load balanced because of the substantially greater memory size of the proposed cluster implementation as compared to a typical C90. The larger memory allows more jobs to be in the active schedule queue (in memory waiting to execute), and the larger "local" disk capacity of the cluster allows more data
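The scale implied by the 50 Gflop and teraflop targets can be sketched with a back-of-the-envelope node count. The per-node figure below is an assumption standing in for the abstract's "several gigaflops" per SMP node, not a number taken from the paper:

```python
# Illustrative cluster sizing, assuming (hypothetically) 4 Gflops of peak
# CPU capacity per SMP node; the paper says only "several gigaflops".

GFLOPS_PER_NODE = 4.0    # assumed peak Gflops per SMP node
INITIAL_TARGET = 50.0    # Gflops: the proposed initial configuration
TERAFLOP_TARGET = 1000.0 # Gflops: the turn-of-the-century goal

def nodes_needed(target_gflops, per_node=GFLOPS_PER_NODE):
    """Smallest whole number of nodes whose aggregate peak meets the target."""
    return int(-(-target_gflops // per_node))  # ceiling division

print(nodes_needed(INITIAL_TARGET))   # 13 nodes for the 50 Gflop start
print(nodes_needed(TERAFLOP_TARGET))  # 250 nodes for a teraflop
```

Under this assumption the initial system is a modest cluster of roughly a dozen SMPs, and the teraflop goal is reached by incrementally growing the node count rather than by replacing the architecture, which is the expansion property the strategy emphasizes.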