The two-dimensional discrete cosine transform (2D-DCT) is at the core of image encoding and compression applications. We present a new architecture for the 2D-DCT based on row-column decomposition. We also discuss an efficient architecture for computing the one-dimensional fast direct (1D-DCT) and inverse (1D-IDCT) cosine transforms, which is based on reordering the butterflies after their computation. The proposed architectures exploit locality, which allows pipelining between stages and in-place computation that saves memory. The result is an efficient architecture for high-speed computation of the 1D and 2D DCT that significantly reduces the area required for VLSI implementation.
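To illustrate the row-column decomposition that the 2D architecture relies on, the following Python sketch computes an N×N 2D-DCT by applying a 1D-DCT to every row and then to every column. It inverts the result the same way using the 1D-IDCT. This is a software illustration of the principle only, not the hardware architecture described above. It assumes the orthonormal DCT-II/DCT-III definitions and evaluates each 1D transform directly in O(N²) rather than with the butterfly-reordered fast algorithm. All function names (`dct_1d`, `idct_1d`, `row_column_2d`, `dct_2d_direct`, and so on) are hypothetical.

```python
import math


def _scale(k, n):
    # Orthonormal DCT normalization (assumed convention): sqrt(1/N) for k = 0, else sqrt(2/N).
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_1d(x):
    """Orthonormal DCT-II of a sequence, evaluated directly in O(N^2)."""
    n = len(x)
    return [
        _scale(k, n) * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def idct_1d(X):
    """Inverse of dct_1d (orthonormal DCT-III), evaluated directly in O(N^2)."""
    n = len(X)
    return [
        sum(_scale(k, n) * X[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for k in range(n))
        for i in range(n)
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def row_column_2d(block, transform_1d):
    """Row-column decomposition: 1D transform on every row, then on every column."""
    rows_done = [transform_1d(row) for row in block]                   # pass 1: rows
    cols_done = [transform_1d(col) for col in transpose(rows_done)]    # pass 2: columns
    return transpose(cols_done)                                        # back to [u][v] order


def dct_2d(block):
    return row_column_2d(block, dct_1d)


def idct_2d(coeffs):
    return row_column_2d(coeffs, idct_1d)


def dct_2d_direct(block):
    """Reference: direct evaluation of the separable 2D-DCT double sum (O(N^4))."""
    n = len(block)
    return [
        [
            _scale(u, n) * _scale(v, n) * sum(
                block[x][y]
                * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                * math.cos(math.pi * (2 * y + 1) * v / (2 * n))
                for x in range(n)
                for y in range(n)
            )
            for v in range(n)
        ]
        for u in range(n)
    ]


if __name__ == "__main__":
    block = [[(3 * i + 5 * j) % 17 for j in range(8)] for i in range(8)]
    coeffs = dct_2d(block)
    recon = idct_2d(coeffs)
    err = max(abs(recon[i][j] - block[i][j]) for i in range(8) for j in range(8))
    print("max reconstruction error:", err)
```

Because the 2D-DCT kernel is separable, the two 1D passes give the same coefficients as the direct double sum. With the direct 1D transform used here, the cost drops from O(N⁴) to O(N³) per block. A fast 1D algorithm, such as a butterfly-based one, reduces each 1D transform further to O(N log N).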