A recursive formulation of Cholesky factorization of a matrix in packed storage

Citation
Bs. Andersen et al., A recursive formulation of Cholesky factorization of a matrix in packed storage, ACM T MATH, 27(2), 2001, pp. 214-244
Citations number
23
Categorie Soggetti
Computer Science & Engineering
Journal title
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE
ISSN journal
00983500 → ACNP
Volume
27
Issue
2
Year of publication
2001
Pages
214 - 244
Database
ISI
SICI code
0098-3500(200106)27:2<214:ARFOCF>2.0.ZU;2-W
Abstract
A new compact way to store a symmetric or triangular matrix called RPF for Recursive Packed Format is fully described. Novel ways to transform RPF to and from standard packed format are included. A new algorithm, called RPC f or Recursive Packed Cholesky, that operates on the RPF format is presented. Algorithm RPC is based on level-3 BLAS and requires variants of algorithms TRSM and SYRK that work on RPF. We call these RP_TRSM and RP_SYRK and find that they do most of their work by calling GEMM. It follows that most of t he execution time of RPC lies in GEMM. The advantage of this storage scheme compared to traditional packed and full storage is demonstrated. First, th e RPC storage format uses the minimal amount of storage for the symmetric o r triangular matrix. Second, RPC gives a level-3 implementation of Cholesky factorization whereas standard packed implementations are only level 2. He nce, the performance of our RPC implementation is decidedly superior. Third , unlike fixed block size algorithms, RPC requires no block size tuning par ameter. We present performance measurements on several current architecture s that demonstrate improvements over the traditional packed routines. Also SMP parallel computations on the IBM SMP computer are made. The graphs that are attached in Section 7 show that the RPC algorithms are superior by a f actor between 1.6 and 7.4 for order around 1000, and between 1.9 and 10.3 f or order around 3000 over the traditional packed algorithms. For some archi tectures, the RPC performance results are almost the same or even better th an the traditional full-storage algorithm results.