To satisfy cost and performance requirements, digital signal processing
and telecommunication systems are generally implemented with a combination
of different components, from custom-designed chips to off-the-shelf
processors. These components vary in area, performance, programmability,
and so on, and the system functionality is partitioned among them to best
exploit these tradeoffs. For performance-critical designs, however, it is
not sufficient to implement only the critical sections as custom-designed
high-performance hardware; it is also necessary to pipeline the system at
several levels of granularity. We present a design flow and an algorithm
that first allocate software and hardware components, and then partition
and pipeline a throughput-constrained specification among the selected
components, so as to best satisfy the throughput constraint at minimal
application-specific integrated circuit (ASIC) cost. Our ability to combine
partitioning with pipelining at several levels of granularity enables us
to attain high-throughput designs and distinguishes this paper from
previously proposed hardware/software partitioning algorithms.