In this paper we compare the optimum performance of the fast Fourier transf
orm (FFT) on torus and hypercube multicomputers. The optimal number of floa
ting point operations is architecturally invariant so the relative performa
nce is determined by communication time. We compare the performance of know
n multiport communication algorithms that utilize the full bandwidth of the
interconnection network on every cycle. Furthermore, data is routed unbloc
ked over paths with minimum length. Specifically we compare FFT performance
on the hypercube as well as two-, three- and four-dimensional torus archit
ectures. On the hypercube we observe the somewhat surprising result that, i
n the limit, communication is ultimately negligible compared to computation
. While the opposite is true of the torus, it is nevertheless possible to o
btain comparable performance over a broad range of processors and problem s
izes. As computers are built with increasing numbers of processors, torus p
erformance can still be made comparable to the hypercube by gradually incre
asing the dimension of the interconnect. Any number of processors can be us
ed to compute the FFT with efficiency that is theoretically bounded from ze
ro. (C) 2001 Elsevier Science B.V. All rights reserved.