The data dependence of the delete-and-insert sort algorithm for median filt
ering is analyzed in this paper. It is shown that because of data dependenc
e, the fastest throughput rate and the most efficient pipeline scheme canno
t be used concurrently. A modified delete-and-insert sort algorithm avoidin
g the above dilemma and its bit-level systolic array implementation are pro
posed in this paper. The throughput rate of the proposed architecture is eq
ual to one-half (output/clocks) the maximum throughput allowed by the delet
e-and-insert sort algorithm, and the clock cycle time is equal to the propa
gation delay of a simple combinational circuit, Its speed is about 1.5 time
s faster than the existing bit-level systolic array designed by using the s
ame delete-and-insert sort algorithm. The proposed architecture can be desi
gned to operate at different word lengths and different window sizes. It is
modular, regular, and of local interconnections and therefore amenable for
VLSI implementation.