A new compact way to store a symmetric or triangular matrix called RPF for
Recursive Packed Format is fully described. Novel ways to transform RPF to
and from standard packed format are included. A new algorithm, called RPC f
or Recursive Packed Cholesky, that operates on the RPF format is presented.
Algorithm RPC is based on level-3 BLAS and requires variants of algorithms
TRSM and SYRK that work on RPF. We call these RP_TRSM and RP_SYRK and find
that they do most of their work by calling GEMM. It follows that most of t
he execution time of RPC lies in GEMM. The advantage of this storage scheme
compared to traditional packed and full storage is demonstrated. First, th
e RPC storage format uses the minimal amount of storage for the symmetric o
r triangular matrix. Second, RPC gives a level-3 implementation of Cholesky
factorization whereas standard packed implementations are only level 2. He
nce, the performance of our RPC implementation is decidedly superior. Third
, unlike fixed block size algorithms, RPC requires no block size tuning par
ameter. We present performance measurements on several current architecture
s that demonstrate improvements over the traditional packed routines. Also
SMP parallel computations on the IBM SMP computer are made. The graphs that
are attached in Section 7 show that the RPC algorithms are superior by a f
actor between 1.6 and 7.4 for order around 1000, and between 1.9 and 10.3 f
or order around 3000 over the traditional packed algorithms. For some archi
tectures, the RPC performance results are almost the same or even better th
an the traditional full-storage algorithm results.