A stable vector algorithm for the solution of block bidiagonal linear
systems is obtained by combining a permutation of the unknowns, called
wrap-around partitioning, with standard QR factorization. Wrap-around
partitioning groups the unknowns into blocks and selects the unknowns
from the blocks in turn. After a suitable orthogonal elimination step,
one obtains a reduced system that is again block bidiagonal, so
wrap-around partitioning can be applied again. Using a simple model for
vectorization overhead, it is shown that small block sizes give the
best performance. The minimal block size 2, which corresponds to cyclic
reduction, is suboptimal due to memory bank conflicts.
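
The reordering described above can be illustrated with a small index permutation. This is only a sketch under one reading of the description: it assumes that the unknowns are split into consecutive blocks of equal size and that the first unknown of every block is taken, then the second of every block, and so on. The function name wrap_around_order and its parameters are hypothetical and do not come from the text.

```python
def wrap_around_order(n, block_size):
    """Reorder indices 0..n-1 by taking position j of every block in turn.

    Assumption (not from the source): blocks are consecutive and of equal
    size, so block_size must divide n.
    """
    if n % block_size != 0:
        raise ValueError("this sketch assumes block_size divides n")
    # Position j within each block is collected across all blocks
    # before moving on to position j + 1.
    return [start + j
            for j in range(block_size)
            for start in range(0, n, block_size)]


print(wrap_around_order(8, 2))  # [0, 2, 4, 6, 1, 3, 5, 7]
```

With block size 2 this reduces to the familiar odd-even ordering, which agrees with the remark that the minimal block size corresponds to cyclic reduction.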