E. Rothberg, Performance of Panel and Block Approaches to Sparse Cholesky Factorization on the iPSC/860 and Paragon Multicomputers, SIAM Journal on Scientific Computing, 17(3), 1996, pp. 699-713
Sparse Cholesky factorization has historically achieved extremely low
performance on distributed-memory multiprocessors. We believe that
three issues must be addressed to improve this situation: (1) parallel
factorization methods must be based on more efficient sequential
methods; (2) parallel machines must provide higher interprocessor
communication bandwidth; and (3) the sparse matrices used to evaluate
parallel sparse factorization performance should be more
representative of the sizes of matrices people would factor on large
parallel machines. This paper demonstrates that all three of these
issues have in fact already
been addressed. Specifically, (1) single-node performance can be
improved by moving from a column-oriented approach, where the computational
kernel is level 1 BLAS, to either a panel- or block-oriented approach,
where the computational kernel is level 3 BLAS; (2) communication
hardware has improved dramatically, with new parallel computers (the
Intel Paragon system) providing one to two orders of magnitude higher
communication bandwidth than previous parallel computers (the Intel
iPSC/860 system); and (3) several larger benchmark matrices are now
available, and newer parallel machines offer sufficient memory per node
to factor these larger matrices. The result of addressing these three
issues is extremely high performance on moderately parallel machines. This
paper demonstrates performance levels of 650 double-precision Mflops
on 32 nodes of the Intel Paragon system, 1 Gflop on 64 nodes, and
1.7 Gflops on 128 nodes. This paper also presents a direct performance comparison
between the iPSC/860 and Paragon systems, as well as a comparison
between panel- and block-oriented approaches to parallel factorization.
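
The contrast the abstract draws between a column-oriented kernel (level 1
BLAS) and a block-oriented kernel (level 3 BLAS) can be illustrated with a
small sketch. This is not the paper's code: it works on a dense matrix
rather than a sparse one, runs sequentially, and uses plain Python loops in
place of real BLAS calls. The function names cholesky_column and
cholesky_blocked and the block-size parameter b are hypothetical, chosen
only for this illustration. The point is the shape of the inner work: the
first routine updates one vector at a time, while the second groups the same
arithmetic into updates on whole sub-blocks, which is the pattern a level 3
BLAS routine could exploit.

```python
import math


def cholesky_column(a):
    """Column-oriented Cholesky of a symmetric positive definite matrix.

    Only the lower triangle of `a` is read. Each earlier column is applied
    to the current column as a scaled vector subtraction, the kind of
    operation level 1 BLAS provides.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col = [a[i][j] for i in range(j, n)]
        for k in range(j):
            ljk = l[j][k]
            for i in range(j, n):
                col[i - j] -= l[i][k] * ljk  # vector update
        pivot = math.sqrt(col[0])
        for i in range(j, n):
            l[i][j] = col[i - j] / pivot
    return l


def cholesky_blocked(a, b=2):
    """Block-oriented Cholesky using b-by-b blocks (b is illustrative)."""
    n = len(a)
    w = [row[:] for row in a]  # working copy, overwritten with the factor
    for s in range(0, n, b):
        e = min(s + b, n)
        # Step 1: factor the diagonal block with the column routine.
        diag = cholesky_column([row[s:e] for row in w[s:e]])
        for i in range(s, e):
            for j in range(s, e):
                w[i][j] = diag[i - s][j - s]
        # Step 2: triangular solve for the block column below the diagonal.
        for i in range(e, n):
            for j in range(s, e):
                acc = w[i][j]
                for k in range(s, j):
                    acc -= w[i][k] * w[j][k]
                w[i][j] = acc / w[j][j]
        # Step 3: matrix-matrix update of the trailing lower triangle,
        # the part a level 3 BLAS routine would perform.
        for i in range(e, n):
            for j in range(e, i + 1):
                acc = 0.0
                for k in range(s, e):
                    acc += w[i][k] * w[j][k]
                w[i][j] -= acc
    # Return only the lower triangle; entries above the diagonal are zero.
    return [[w[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    a = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
    print(cholesky_column(a))
    print(cholesky_blocked(a, b=2))
```

Both routines compute the same factor; they differ only in how the work is
grouped. In an optimized implementation, the matrix-matrix update in step 3
is where data reuse in cache becomes possible, which is the usual reason
level 3 kernels outperform level 1 kernels on a single node.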