We propose a new parallelization scheme for the Hessenberg double shift QR
algorithm. Our scheme allows software pipelining and communication latency
hiding, and gives almost perfect load balance. An asymptotic parallelizing
overhead analysis shows that our scheme attains the best possible scalabili
ty of the double shift QR algorithm, and that the overheads are less than t
he multishift algorithm when n = omega(p(2)), where n is the matrix size an
d p is the number of processors. Its high exploitation of the parallelism o
f the double shift QR algorithm is demonstrated by an implementation on Fuj
itsu AP1000+ multicomputer system. (C) 1999 Elsevier Science B.V. All right
s reserved.