It seems likely that improvements in arithmetic speed will continue to outp
ace advances in communication bandwidth. Furthermore, as more and more prob
lems are working on huge datasets, it is becoming increasingly likely that
data will be distributed across many processors because one processor does
not have sufficient storage capacity. For these reasons, we propose that an
inexact DFT such as an approximate matrix-vector approach based on singula
r values or a variation of the Dutt-Rokhlin fast-multipole-based algorithm
may outperform any exact parallel FFT. The speedup may be as large as a fac
tor of three in situations where FFT run time is dominated by communication
. For the multipole idea we further propose that a method of "virtual charg
es" may improve accuracy, and we provide an analysis of the singular values
that are needed for the approximate matrix-vector approaches.