S.F. Gorman and J.M. Wills, Partial column FFT pipelines, IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 42(6), 1995, pp. 414-423
This paper presents the development of two efficient FFT implementation
algorithms which allow for more parallelization than the standard
pipeline. M = 2^q radix-r parallel computational elements are allocated
per column of the FFT flowgraph, and the constant geometry FFT is used
for uniform stages. The first method solves the interstage data shuffle
problem by decomposing the perfect shuffle matrix into the product of
four matrices, with a memory grouping resulting in a reduction of
switching from M^2 to 2M. The second method decomposes the perfect
shuffle into the product of two matrices, and the memory is partitioned
such that multiport elements may be used. All required switching is
accomplished via addressing of the multiport elements, requiring no
external switching elements. Finally, implementations are presented which
allow for a varied amount of parallelization by using uniform modules and
merely modifying interconnect wiring.
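
To make the terms "constant geometry FFT" and "perfect shuffle" concrete, the following is a minimal serial sketch of the textbook radix-2 constant-geometry (decimation-in-frequency) FFT. It is not the paper's four-matrix or two-matrix decomposition, and it does not model the M parallel elements per column or the memory partitioning. The function names `perfect_shuffle` and `constant_geometry_fft` are hypothetical, chosen only for this illustration. The sketch shows that every stage reads inputs i and i + N/2 and writes outputs 2i and 2i + 1, so the interstage wiring is the same perfect shuffle at each stage.

```python
import cmath


def perfect_shuffle(n_points):
    """Return dest[i], the position that element i moves to under the
    perfect shuffle (interleave the first half with the second half)."""
    half = n_points // 2
    return [2 * i if i < half else 2 * (i - half) + 1 for i in range(n_points)]


def constant_geometry_fft(x):
    """Radix-2 DIF FFT in which every stage uses the same data routing.

    Assumption: len(x) is a power of two and at least 2.
    """
    n_points = len(x)
    stages = n_points.bit_length() - 1
    if n_points < 2 or (1 << stages) != n_points:
        raise ValueError("length must be a power of two >= 2")

    half = n_points // 2
    data = [complex(v) for v in x]
    for s in range(stages):
        nxt = [0j] * n_points
        for i in range(half):
            a, b = data[i], data[i + half]
            # Only the twiddle exponent depends on the stage number;
            # the read and write addresses are identical in every stage.
            exponent = (i >> s) << s
            w = cmath.exp(-2j * cmath.pi * exponent / n_points)
            nxt[2 * i] = a + b            # sum output
            nxt[2 * i + 1] = (a - b) * w  # difference output, then twiddle
        data = nxt

    # The last stage leaves the spectrum in bit-reversed order; undo that.
    def bit_reverse(k):
        return int(format(k, "0{}b".format(stages))[::-1], 2)

    return [data[bit_reverse(k)] for k in range(n_points)]
```

Writing butterfly outputs to 2i and 2i + 1 instead of back to i and i + N/2 is the same as applying `perfect_shuffle` to the in-place result, which is why a hardware pipeline built this way needs some shuffle network or addressing scheme between columns.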