The performance of computation-intensive digital signal processing app
lications running on parallel systems is highly dependent on communica
tion delays imposed by the parallel architecture, In order to obtain a
more compact task/processor assignment, a scheduling algorithm consid
ering the communication time between processors needs to be investigat
ed, Such applications usually contain iterative or recursive segments
that are modeled as communication sensitive data flow graphs (CS-DFG's
), where nodes represent computational tasks and edges represent depen
dencies between them, Based on the theorems derived, this paper presen
ts a novel efficient technique called cyclo-compaction scheduling, whi
ch is applied to a CS-DFG to obtain a better schedule, This new method
takes into account the data transmission time, loop carried dependenc
ies, and the target architecture, It implicitly uses the retiming tech
nique (loop pipelining) and a task remapping procedure to allocate pro
cessors and to iteratively improve the parallelism while handling the
underlying communication and resource constraints. Experimental result
s on different architectures demonstrate that this algorithm yields si
gnificant improvement over existing methods, For some applications, th
e final schedule length is less than one half of its initial length.