The problem of partitioning systolic algorithms (a subclass of Regular
Iterative Algorithms (RIA)) has been studied by a number of researche
rs. One approach to partitioning the algorithm is to divide the index
space into equivalence classes called blocks. The blocks are scheduled
and placed on a processor array of a fixed size. This approach create
s the need for auxiliary memory structures to store partial results an
d the need for control circuitry to direct the resulting non-uniform f
low of data values within the array. The implementation of these struc
tures and the means of comparing the cost of differences in the struct
ures resulting from various partitioning and scheduling policies has b
een largely omitted in the literature. In this paper we offer a detail
ed model of Fixed Size Array Processor architectures. Using this model
we may assess the trade-offs between auxiliary memory structure size,
execution time, and control circuit complexity. The control circuits
generate signals which determine the data flow through the array. We e
mphasize methods for controlling non-uniform data flow resulting from
some partitionings of the index space. We show how to derive the contr
ol signals in such arrays from the determination of the index points.
We also show how the translation of index points to control signals ca
n be refined by a process of index pipelining which reduces the comple
xity of the control circuitry. Several partitioning and scheduling sch
emes for QR decomposition are evaluated to illustrate how the model ma
y be used to determine a good partitioning/scheduling/storage configur
ation.