J. J. Carrig and G. G. L. Meyer, Efficient Householder QR factorization for superscalar processors, ACM Transactions on Mathematical Software, 23(3), 1997, pp. 362-378
To extract the potential promised by superscalar processors, algorithm designers must streamline memory references and allow for efficient data reuse throughout the memory hierarchy. Two parameterized Householder QR factorization algorithms are presented that take into account the caches and registers typical of such processors. Guidelines are developed for choosing parameter values that obtain near-optimal cache and register utilization. The new algorithms are implemented and performance-tuned on an Intel Pentium Pro system, a single thin POWER2 node of the IBM Scalable Parallel System 2 (SP2), and a single R8000 processor of a Silicon Graphics POWER Challenge XL.
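The abstract does not spell out the algorithms. For reference only, the Python sketch below shows a textbook Householder QR whose column updates are reordered panel by panel under a panel-width parameter `nb`. This loop structure is what cache-blocked variants build on. It is an assumption-laden illustration, not the authors' method. It does not aggregate a panel's reflectors into a matrix-matrix form, does no register tuning, and gains no cache benefit in pure Python. The names `householder_vector`, `apply_reflector`, `blocked_householder_qr`, `form_q`, `max_residual` and `nb` are all hypothetical.

```python
import math


def householder_vector(x):
    """Return (v, beta) such that (I - beta*v*v^T) x = alpha*e1, alpha = -sign(x[0])*||x||."""
    norm_x = math.sqrt(sum(t * t for t in x))
    if norm_x == 0.0:
        return [0.0] * len(x), 0.0  # nothing to annihilate; beta = 0 gives the identity
    alpha = -norm_x if x[0] >= 0 else norm_x  # sign chosen to avoid cancellation in v[0]
    v = list(x)
    v[0] -= alpha
    return v, 2.0 / sum(t * t for t in v)


def apply_reflector(A, v, beta, k, col_start, col_end):
    """Overwrite rows k.. of columns col_start..col_end-1 of A with H*A, H = I - beta*v*v^T."""
    for j in range(col_start, col_end):
        s = beta * sum(v[i - k] * A[i][j] for i in range(k, len(A)))
        for i in range(k, len(A)):
            A[i][j] -= s * v[i - k]


def blocked_householder_qr(A, nb):
    """Householder QR processed in panels of nb columns (nb is a hypothetical tuning knob).

    Returns (reflectors, R) with A = Q R, where Q = H_0 H_1 ... H_{r-1}.
    """
    m, n = len(A), len(A[0])
    r = min(m, n)
    R = [list(map(float, row)) for row in A]
    reflectors = []
    for p in range(0, r, nb):
        p_end = min(p + nb, r)
        panel = []
        # 1. Factor the panel: each reflector touches only the panel's own columns.
        for k in range(p, p_end):
            v, beta = householder_vector([R[i][k] for i in range(k, m)])
            apply_reflector(R, v, beta, k, k, p_end)
            panel.append((k, v, beta))
        # 2. Update the trailing columns with the whole panel, in the same order.
        for k, v, beta in panel:
            apply_reflector(R, v, beta, k, p_end, n)
        reflectors.extend(panel)
    for i in range(m):  # entries below the diagonal are zero in exact arithmetic
        for j in range(min(i, n)):
            R[i][j] = 0.0
    return reflectors, R


def form_q(reflectors, m):
    """Accumulate Q = H_0 H_1 ... H_{r-1} by applying the reflectors to I in reverse order."""
    Q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for k, v, beta in reversed(reflectors):
        apply_reflector(Q, v, beta, k, 0, m)
    return Q


def max_residual(A, Q, R):
    """Largest entry of |Q R - A|."""
    m, n = len(A), len(A[0])
    return max(abs(sum(Q[i][t] * R[t][j] for t in range(m)) - A[i][j])
               for i in range(m) for j in range(n))


A = [[12.0, -51.0, 4.0], [6.0, 167.0, -68.0], [-4.0, 24.0, -41.0]]
refl, R = blocked_householder_qr(A, nb=2)
print(max_residual(A, form_q(refl, len(A)), R))  # expected to be at rounding-error level
```

Each column receives the same reflectors in the same order for every `nb`, so the sketch produces the same R as the unblocked case `nb=1`. Only the order in which columns are touched changes. In a compiled implementation, that reordering is what lets a panel's worth of updates be grouped so that data stays in cache or registers. Choosing the group sizes well is the question the paper's parameter guidelines address.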