This paper presents a novel variable-latency multiplier architecture
suitable for implementation as either a self-timed multiplier core or a
fully synchronous multicycle multiplier core. The architecture combines a
second-order Booth algorithm with a pipelined split carry-save array
organization, incorporating multiple row skipping and a
completion-predicting carry-select final adder. The paper reports the
architecture and logic design, the CMOS circuit design, and a performance
evaluation. In 0.35 μm CMOS, the expected sustainable cycle time for a
32-bit synchronous implementation is 2.25 ns. Instruction-level simulations
estimate 54% single-cycle and 46% two-cycle operations in SPEC95 execution.
Using the same CMOS process, the 32-bit asynchronous implementation is
expected to reach an average 1.76 ns throughput and 3.48 ns latency in
SPEC95 execution.
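
As an illustration of the second-order (radix-4) Booth recoding mentioned above, the following is a minimal software sketch, not the circuit described in the paper. It assumes the textbook recoding rule, in which each digit is derived from an overlapping bit triplet of a two's-complement multiplier; the function names `booth_radix4_digits` and `booth_multiply` are hypothetical and chosen only for this example.

```python
def booth_radix4_digits(multiplier: int, width: int = 32) -> list[int]:
    """Recode a two's-complement multiplier into radix-4 Booth digits.

    Returns width/2 digits in {-2, -1, 0, 1, 2}, least significant first.
    Hypothetical helper for illustration; assumes the textbook rule
    d_k = y[2k-1] + y[2k] - 2*y[2k+1], with y[-1] = 0.
    """
    if width <= 0 or width % 2:
        raise ValueError("width must be a positive even number")
    y = multiplier & ((1 << width) - 1)   # keep the low `width` bits
    shifted = y << 1                      # append the implicit y[-1] = 0
    digits = []
    for i in range(0, width, 2):
        triplet = (shifted >> i) & 0b111  # bits y[i+1], y[i], y[i-1]
        low = triplet & 1                 # y[i-1]
        mid = (triplet >> 1) & 1          # y[i]
        high = (triplet >> 2) & 1         # y[i+1]
        digits.append(low + mid - 2 * high)
    return digits


def booth_multiply(multiplicand: int, multiplier: int, width: int = 32) -> int:
    """Sum the partial products selected by the Booth digits.

    The multiplier must fit in `width` bits as a signed value.
    """
    digits = booth_radix4_digits(multiplier, width)
    return sum(d * multiplicand * 4 ** k for k, d in enumerate(digits))
```

Each digit selects one partial-product row (0, ±1, or ±2 times the multiplicand), so a 32-bit multiplier yields 16 rows rather than 32. A zero digit corresponds to an all-zero row; such rows are natural candidates for skipping, although the exact skipping mechanism of the architecture is not specified in this abstract.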