N-body codes are routinely used for simulation studies of physical systems,
e.g. in the fields of computational astrophysics and molecular dynamics.
Typically, they require only a moderate amount of run-time memory, but are
very demanding in computational power. A detailed analysis of the performance
of an N-body code, in terms of the relative weight of each task of the code, and
how this weight is influenced by software or hardware optimisations, is
essential in improving such codes. The approach of developing a dedicated
device, GRAPE [J. Makino, M. Taiji, Scientific Simulations with Special Purpose
Computers, Wiley, New York, 1998], able to provide a very high performance
for the most expensive computational task of this code, has resulted in a
dramatic performance leap. We explore the performance of different
versions of parallel N-body codes, where both software and hardware improvements
are introduced. The use of GRAPE as a 'force computation accelerator' in
a parallel computer architecture can be seen as an example of a hybrid
architecture, where special purpose device boards help a general purpose
(multi)computer to reach a very high performance. (C) 2001 Elsevier Science B.V.
All rights reserved.