High-performance radix-2, 3 and 5 parallel 1-D complex FFT algorithms for distributed-memory parallel computers

Citation
D. Takahashi et Y. Kanada, High-performance radix-2, 3 and 5 parallel 1-D complex FFT algorithms for distributed-memory parallel computers, J SUPERCOMP, 15(2), 2000, pp. 207-228
Citations number
20
Categorie Soggetti
Computer Science & Engineering
Journal title
JOURNAL OF SUPERCOMPUTING
ISSN journal
09208542 → ACNP
Volume
15
Issue
2
Year of publication
2000
Pages
207 - 228
Database
ISI
SICI code
0920-8542(200002)15:2<207:HR3A5P>2.0.ZU;2-E
Abstract
In this paper, we propose high-performance radix-2, 3 and 5 parallel 1-D co mplex FFT algorithms for distributed-memory parallel computers. We use the four-step or six-step FFT algorithms to implement the radix-2, 3 and 5 para llel 1-D complex FFT algorithms. In our parallel FFT algorithms, since we u se cyclic distribution, all-to-all communication takes place only once. Mor eover, the input data and output data are both in natural order. We also show that the suitability of a parallel FFT algorithm is machine-de pendent because of the differences in the architecture of the processor ele ments in distributed-memory parallel computers. Experimental results of 2(p )3(q)5(r) point FFTs on distributed-memory parallel computers, HITACHI SR22 01 and IBM SP2 are reported. We succeeded to get performances of about 130 GFLOPS on a 1024PE HITACHI SR2201 and about 1.25 GFLOPS on a 32PE IBM SP2.