An efficient algorithm is presented for computing the two-dimensional discr
ete cosine transform (2-D DCT) whose size is a power of a prime. Based on a
generalised 2-D to one-dimensional (1-D) index mapping scheme, the propose
d algorithm decomposes the 2-D DCT outputs into three parts. The first part
forms a 2-D DCT of a smaller size. The remaining outputs are further decom
posed into two parts, depending on the summation of their indices. The latt
er two parts can be reformulated as a set of circular correlation (CC) or s
kew-circular correlation (SCC) matrix-vector products by utilising the rece
ntly addressed maximum coset decomposition. Such a decomposition procedure
can be repetitively carried out for the 2-D DCT of the first part, resultin
g in a sequence of CC and SCC matrix-vector products of various sizes. Empl
oying fast algorithms for the computation of these CC/SCC operations can su
bstantially reduce the numbers of multiplications compared with those of th
e conventional row-column decomposition approach. In the special case where
the data size is a power of two, the proposed algorithm can be further sim
plified, calling for computations comparable with those of previous works.