Minimal-storage high-performance Cholesky factorization via blocking and recursion

Citation
Fg. Gustavson et I. Jonsson, Minimal-storage high-performance Cholesky factorization via blocking and recursion, IBM J RES, 44(6), 2000, pp. 823-850
Citations number
12
Categorie Soggetti
Multidisciplinary,"Computer Science & Engineering
Journal title
IBM JOURNAL OF RESEARCH AND DEVELOPMENT
ISSN journal
00188646 → ACNP
Volume
44
Issue
6
Year of publication
2000
Pages
823 - 850
Database
ISI
SICI code
0018-8646(200011)44:6<823:MHCFVB>2.0.ZU;2-C
Abstract
We present a novel practical algorithm for Cholesky factorization when the matrix is stored in packed format by combining blocking and recursion. The algorithm simultaneously obtains Level 3 performance, conserves about half the storage, and avoids the production of Level 3 BLAB for packed format. W e use recursive packed format, which was first described by Andersen et al. [1]. Our algorithm uses only DGEMM and Level 3 kernel routines; it first t ransforms standard packed format to packed recursive lower row format. Our new algorithm outperforms the Level 3 LAPACK routine DPOTRF even when we in clude the cost of data transformation. (This is true for three IBM platform s-the POWER3, the POWER2, and the PowerPC 604e.) For large matrices, blocki ng is not required for acceptable Level 3 performance. However, for small m atrices the overhead of pure recursion and/or data transformation is too hi gh. We analyze these costs analytically and provide detailed cost estimates . We show that blocking combined with recursion reduces all overheads to a tiny, acceptable level. However, a new problem of nonlinear addressing aris es. We use two-dimensional mappings (tables) or data copying to overcome th e high costs of directly computing addresses that are nonlinear functions o f i and j.