Applying recursion to serial and parallel QR factorization leads to betterperformance

Citation
E. Elmroth et Fg. Gustavson, Applying recursion to serial and parallel QR factorization leads to betterperformance, IBM J RES, 44(4), 2000, pp. 605-624
Citations number
16
Categorie Soggetti
Multidisciplinary,"Computer Science & Engineering
Journal title
IBM JOURNAL OF RESEARCH AND DEVELOPMENT
ISSN journal
00188646 → ACNP
Volume
44
Issue
4
Year of publication
2000
Pages
605 - 624
Database
ISI
SICI code
0018-8646(200007)44:4<605:ARTSAP>2.0.ZU;2-J
Abstract
We present new recursive serial and parallel algorithms for QR factorizatio n of an m by n matrix. They improve performance. The recursion leads to an automatic variable blocking, and it also replaces a Level 2 part in a stand ard block algorithm with Level 3 operations. However, there are significant additional costs for creating and performing the updates, which prohibit t he efficient use of the recursion for large n. We present a quantitative an alysis of these extra costs. This analysis leads us to introduce a hybrid r ecursive algorithm that outperforms the LAPACK algorithm DGEQRF by about 20 % for large square matrices and up to almost a factor of 3 for tall thin ma trices. Uniprocessor performance results are presented for two IBM RS/6000( R) SP nodes-a 120-MHz IBM POWER2 node and one processor of a four-way 332-M Hz IBM PowerPC(R) 604e SMP node. The hybrid recursive algorithm reaches mor e than 90% of the theoretical peak performance of the POWER2 node, Compared to standard block algorithms, the recursive approach also shows a signific ant advantage in the automatic tuning obtained from its automatic variable blocking. A successful parallel implementation on a four-way 332-MHz IBM PP C604e SMP node based on dynamic load balancing is presented. For two, three , and four processors it shows speedups of up to 1.97, 2.99, and 3.97.