The best approach to parallelize multidimensional FFT algorithms has long b
een under debate, Distributed transposes are widely used, but they also var
y in communication policies and hence performance. In this work we analyze
the impact of different redistribution strategies on the performance of par
allel FFT, on various machine architectures. We found that some redistribut
ion strategies were consistently superior, while some others were unexpecte
dly inferior, An in-depth investigation into the reasons for this behavior
is included in this work. Copyright (C) 2001 John Wiley & Sons, Ltd.