Many digital signal and image processing algorithms can be sped up
by executing them in parallel on multiple processors. The speed of
parallel execution is limited by the need for communication and
synchronization between processors. In this paper, we present a
paradigm for parallel processing that we call the block data flow
paradigm (BDFP). The goal of this paradigm is to reduce interprocessor
communication and to relax the synchronization requirements for such
applications. We present the block data parallel architecture, which
implements this paradigm, and we present methods for mapping
algorithms onto this architecture. We illustrate this methodology for
several applications, including two-dimensional (2-D) digital filters,
the 2-D discrete cosine transform, QR decomposition of a matrix, and
Cholesky factorization of a matrix. We analyze the resulting system
performance for these applications with regard to speedup and
efficiency as the number of processors increases. Our results
demonstrate that the block data parallel architecture is a flexible,
high-performance solution for numerous digital signal and image
processing algorithms.