Multiply-add optimized FFT kernels

Citation
H. Karner et al., Multiply-add optimized FFT kernels, MATH MOD M, 11(1), 2001, pp. 105-117
Citations number
10
Categorie Soggetti
Mathematics
Journal title
MATHEMATICAL MODELS & METHODS IN APPLIED SCIENCES
ISSN journal
02182025 → ACNP
Volume
11
Issue
1
Year of publication
2001
Pages
105 - 117
Database
ISI
SICI code
0218-2025(200102)11:1<105:MOFK>2.0.ZU;2-B
Abstract
Modern computer architecture provides a special instruction - the fused mul tiply-add (FMA) instruction - to perform both a multiplication and an addit ion operation at the same time. In this paper newly developed radix-2, radi x-3, and radix-5 FFT kernels that efficiently take advantage of this powerf ul instruction are presented. If a processor is provided with FMA instructi ons, the radix-a FFT algorithm introduced has the lowest complexity of all Cooley-Tukey radix-2 algorithms. All floating-point operations are executed as FMA instructions. Compared to conventional radix-3 and radix-5 kernels, the new radix-3 and radix-5 kernels greatly improve the utilization of FMA instructions, which results in a significant reduction in complexity. In g eneral, the advantages of the FFT algorithms presented in this paper are th eir low arithmetic complexity, their high efficiency, and their striking si mplicity Numerical experiments show that FFT programs using the new kernels clearly outperform even the best conventional FFT routines.