We present an area-efficient parallel architecture that implements the
constant-geometry, in-place Fast Fourier Transform. It consists of an array
of special-purpose processors interconnected by a perfect unshuffle
network. For a radix-r transform of N = r^n data of size D and a column of
P = r^p processors, each processor has only one local memory of N/(rP) words of
size rD, with only one read port and one write port; nevertheless, it is
possible to read the r inputs of a butterfly and write r intermediate
results in each memory cycle. The address-generation circuit that permits
the in-place implementation is simple and identical for all the local
memories. The data flow has been designed to efficiently exploit the pipelining of
the processing section with no cycle loss. Compared with other designs of
similar performance, this architecture reduces the area by almost 50%.
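
For orientation, the constant-geometry idea (every stage uses the same read and write addressing) can be illustrated in software. The sketch below is a textbook radix-2, decimation-in-frequency formulation that uses two buffers and a shuffle-on-write pattern. It is an assumption-laden illustration, not the in-place, unshuffle-based scheme of this architecture, and the names `constant_geometry_fft` and `naive_dft` are hypothetical.

```python
import cmath


def constant_geometry_fft(x):
    """Radix-2 DIF FFT whose addressing pattern is identical in every stage.

    Illustrative only: out-of-place (two buffers), r = 2, single processor.
    """
    n_points = len(x)
    stages = n_points.bit_length() - 1
    if n_points < 2 or (1 << stages) != n_points:
        raise ValueError("length must be a power of two, at least 2")

    half = n_points // 2
    data = [complex(v) for v in x]

    for s in range(stages):
        out = [0j] * n_points
        for i in range(half):
            # Same geometry in every stage: read i and i + N/2 ...
            a, b = data[i], data[i + half]
            # ... only the twiddle exponent depends on the stage number.
            k = (i >> s) << s
            w = cmath.exp(-2j * cmath.pi * k / n_points)
            # ... and write to 2i and 2i + 1 (a perfect shuffle on output).
            out[2 * i] = a + b
            out[2 * i + 1] = (a - b) * w
        data = out

    # The stages leave the spectrum in bit-reversed order; reorder it.
    result = [0j] * n_points
    for q in range(n_points):
        k = int(format(q, "0{}b".format(stages))[::-1], 2)
        result[k] = data[q]
    return result


def naive_dft(x):
    """Direct O(N^2) DFT, used only as a reference for checking."""
    n_points = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n_points)
            for t in range(n_points))
        for k in range(n_points)
    ]
```

Comparing `constant_geometry_fft(x)` with `naive_dft(x)` on a short power-of-two input is a quick sanity check. The point of the sketch is that the inner loop's read and write addresses never change between stages, which is the property that lets a fixed interconnection network serve all stages.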