Distributed arithmetic (DA) has been widely used to implement inner-product
computations with a fixed input. Conventional ROM-based DA, however, requires
large ROMs. A new DA algorithm is proposed that expands the fixed input, rather
than the variable input as in ROM-based DA, to the bit level. The new DA
algorithm can thus take advantage of shared partial sums of products and of
sparse nonzero bits in the fixed input to reduce the number of computations.
Unlike ROM-based DA, which stores precomputed results, the new DA algorithm
uses a predefined structure to compute the results. When applied to a
1-D eight-point DCT system, the new DA algorithm needs only 30% of the hardware
area of ROM-based DA and is also faster. To illustrate the efficiency of the
proposed algorithm, a 2-D IDCT chip was implemented in 0.8 μm SPDM CMOS
technology. The chip, measuring 4575 × 5525 μm, can deliver a processing rate
of 50 Mpixels per second.
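The difference between the two expansions can be illustrated with a small software model. The following is a minimal sketch, not the paper's hardware structure. It assumes integer (fixed-point) coefficients and inputs in two's complement, and the names `rom_da` and `fixed_input_da` are hypothetical. `rom_da` bit-slices the variable inputs and looks up a 2^K-entry table. `fixed_input_da` bit-slices the fixed coefficients instead, so each bit column becomes a plain sum of inputs. Columns with no nonzero bits are skipped, and identical index subsets are summed only once. This is one simple form of partial-sum sharing and need not match the paper's scheme.

```python
from itertools import product


def rom_da(coeffs, xs, x_bits):
    """Conventional ROM-based DA: bit-slice the VARIABLE inputs xs.

    The ROM holds sum(coeffs[k] for k whose input bit is set) for all 2**K
    bit patterns, so its size grows exponentially with the number of taps K.
    Returns (y, number of ROM entries).
    """
    rom = {pattern: sum(c for c, bit in zip(coeffs, pattern) if bit)
           for pattern in product((0, 1), repeat=len(coeffs))}
    acc = 0
    for b in range(x_bits):
        # Python's >> sign-extends negative ints, giving two's-complement bits.
        pattern = tuple((x >> b) & 1 for x in xs)
        weight = -(1 << b) if b == x_bits - 1 else (1 << b)  # MSB = sign bit
        acc += weight * rom[pattern]
    return acc, len(rom)


def fixed_input_da(coeffs, xs, c_bits, cache=None):
    """Hypothetical sketch: bit-slice the FIXED coefficients instead.

    y = sum_b w_b * S_b, where S_b is the sum of xs[k] over every k whose
    coefficient has bit b set. Zero bit columns are skipped, and identical
    index subsets are summed once. `cache` may be shared only between inner
    products over the SAME input vector xs (e.g. several DCT outputs).
    Returns (y, additions spent forming new partial sums).
    """
    if cache is None:
        cache = {}
    adds = 0
    acc = 0
    for b in range(c_bits):
        subset = tuple(k for k, c in enumerate(coeffs) if (c >> b) & 1)
        if not subset:
            continue  # sparse coefficient bits: this column costs nothing
        if subset not in cache:
            cache[subset] = sum(xs[k] for k in subset)
            adds += len(subset) - 1
        weight = -(1 << b) if b == c_bits - 1 else (1 << b)  # MSB = sign bit
        acc += weight * cache[subset]
    return acc, adds


coeffs = [3, -5, 7, 2]
xs = [10, -4, 6, 1]
print(rom_da(coeffs, xs, x_bits=8))          # (94, 16)
print(fixed_input_da(coeffs, xs, c_bits=8))  # (94, 5)
```

Both functions compute the same inner product, 3·10 − 5·(−4) + 7·6 + 2·1 = 94. The ROM-based model needs a 16-entry table for four taps. The coefficient-expanded model needs only five additions to form its partial sums, not counting the shift-and-accumulate step.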