A novel architecture for block matching is proposed that flexibly handles
the various matching-block sizes and motion vector prediction modes of
current video coding standards without extra area or control overhead. The
processing element array of the proposed architecture features separate
difference and accumulation units, a design that balances delay across the
operational data paths and uses hardware resources efficiently. A VLSI
realisation of the proposed architecture in 0.6 µm CMOS technology shows
significant improvement over a conventional systolic architecture in both
area and speed.
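
For readers unfamiliar with block matching, the following is a minimal software sketch of a full-search block matcher, not a model of the proposed hardware. It assumes the sum of absolute differences (SAD) as the matching criterion, which the abstract does not specify. The function names (abs_diff, accumulate, block, full_search) and the frame layout (lists of rows of integer pixel values) are hypothetical and chosen only for illustration. The difference and accumulation steps are kept as separate functions to loosely mirror the separation described above, and the block height and width are parameters so that different block sizes can be tried.

```python
def abs_diff(cur, ref):
    """Difference stage: per-pixel absolute differences of two equal-sized blocks."""
    return [[abs(c - r) for c, r in zip(crow, rrow)]
            for crow, rrow in zip(cur, ref)]


def accumulate(diffs):
    """Accumulation stage: sum the absolute differences into one SAD value."""
    return sum(sum(row) for row in diffs)


def block(frame, top, left, height, width):
    """Extract a height x width block whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in frame[top:top + height]]


def full_search(cur_frame, ref_frame, top, left, height, width, search_range):
    """Exhaustively search ref_frame around (top, left) for the best match.

    Returns (sad, dy, dx) for the candidate with the smallest SAD, or None
    if no candidate lies fully inside the reference frame. Ties keep the
    first candidate visited (smallest dy, then smallest dx).
    """
    cur = block(cur_frame, top, left, height, width)
    rows, cols = len(ref_frame), len(ref_frame[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall partly outside the reference frame.
            if y < 0 or x < 0 or y + height > rows or x + width > cols:
                continue
            candidate = block(ref_frame, y, x, height, width)
            sad = accumulate(abs_diff(cur, candidate))
            if best is None or sad < best[0]:
                best = (sad, dy, dx)
    return best
```

Calling full_search with different height and width values exercises different block sizes with the same code path, which is the software analogue of the flexibility the abstract claims for the hardware. The sketch makes no attempt to reproduce the data flow, timing, or area characteristics of the processing element array.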