M. Hegland, AN IMPLEMENTATION OF MULTIPLE AND MULTIVARIATE FOURIER-TRANSFORMS ON VECTOR PROCESSORS, SIAM journal on scientific computing, 16(2), 1995, pp. 271-288
Several algorithms for multiple and multivariate complex Fourier trans
forms are suggested. They use little or no scratch space, i.e., they a
re essentially in-place. Furthermore, they display high levels of regu
lar parallelism needed for vector-parallel computers. Data access is m
ainly by stride 1. These algorithms correspond to matrix factorization
s using transposition permutations. An essential tool used in defining
the permutations and the factorizations are Kronecker products. The s
uggested algorithms are combined with a method for one-dimensional fas
t Fourier transforms which is also in-place and accesses data with str
ide 1. Although the number of floating point operations and the total
degree of parallelism is left unchanged by the methods used here, data
access is much more regular This is reflected in substantially lower
values of anew indicator called loop overhead indicator and often by e
xecution times approximately 2 to 5 times lower, even for moderate pro
blem sizes on the Fujitsu VP 2200, compared to the implementation of t
he fundamental tenser product factorization.