The Viterbi algorithm is widely applied to decoding and estimation
problems in communications and signal processing. It is usually
implemented in a state-parallel fashion, with one add-compare-select
(ACS) unit devoted to each state in the trellis. In this paper we
present a systematic approach to partitioning, scheduling, and mapping
the N trellis states onto P ACS units, where N > P. The area saving of our
architecture comes from reducing the number of both ACS units and
interconnection wires. The design of the ACS, the path metric storage,
and the routing network is discussed in detail. The proposed
architecture creates
internal parallelism through ACS sharing, which can be exploited by
pipelining to increase the throughput rate. Consequently, the
area-efficient architecture offers a favorable (smaller) area-time product,
compared to a state-parallel implementation. These results will be
demonstrated by application examples in the accompanying paper.
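
As a rough illustration of ACS sharing, the following Python sketch
time-multiplexes the N state updates of one trellis stage over P ACS
units. It is not the partitioning or scheduling method of this paper:
the radix-2 shift-register trellis, the naive block schedule, the
requirement that P divides N, and all function names (acs,
predecessors, trellis_step) are assumptions made for the example.

```python
def acs(pm_a, bm_a, pm_b, bm_b):
    """Add-compare-select: return (survivor metric, decision bit)."""
    cand_a = pm_a + bm_a          # add
    cand_b = pm_b + bm_b
    # compare and select the smaller candidate (ties favor branch a)
    return (cand_a, 0) if cand_a <= cand_b else (cand_b, 1)


def predecessors(state, n_states):
    """Two predecessors of `state` in an assumed radix-2 shift-register trellis."""
    low = state >> 1
    return low, low | (n_states >> 1)


def trellis_step(path_metrics, branch_metric, n_acs):
    """Update all N path metrics for one stage using n_acs shared ACS units.

    branch_metric(prev_state, next_state) is a caller-supplied function.
    Returns (new_path_metrics, decisions, cycles_used).
    """
    n_states = len(path_metrics)
    assert n_states % n_acs == 0, "sketch assumes P divides N"
    new_pm = [0] * n_states
    decisions = [0] * n_states
    cycles = n_states // n_acs
    for cycle in range(cycles):
        # In hardware, the n_acs updates of one cycle would run concurrently.
        for unit in range(n_acs):
            state = cycle * n_acs + unit  # assumed naive schedule
            p0, p1 = predecessors(state, n_states)
            new_pm[state], decisions[state] = acs(
                path_metrics[p0], branch_metric(p0, state),
                path_metrics[p1], branch_metric(p1, state),
            )
    return new_pm, decisions, cycles
```

With N = 4 states, calling trellis_step with n_acs = 4 models the
state-parallel case (one cycle per stage), while n_acs = 2 takes two
cycles per stage and yields the same path metrics and decisions. Only
the number of cycles depends on P in this sketch.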