PERFORMANCE OF PANEL AND BLOCK APPROACHES TO SPARSE CHOLESKY FACTORIZATION ON THE IPSC/860 AND PARAGON MULTICOMPUTERS

Authors
E. Rothberg
Citation
E. Rothberg, PERFORMANCE OF PANEL AND BLOCK APPROACHES TO SPARSE CHOLESKY FACTORIZATION ON THE IPSC/860 AND PARAGON MULTICOMPUTERS, SIAM Journal on Scientific Computing, 17(3), 1996, pp. 699-713
Number of citations
21
Subject Categories
Computer Sciences, Mathematics
ISSN journal
1064-8275
Volume
17
Issue
3
Year of publication
1996
Pages
699 - 713
Database
ISI
SICI code
1064-8275(1996)17:3<699:POPABA>2.0.ZU;2-O
Abstract
Sparse Cholesky factorization has historically achieved extremely low performance on distributed-memory multiprocessors. We believe that three issues must be addressed to improve this situation: (1) parallel factorization methods must be based on more efficient sequential methods; (2) parallel machines must provide higher interprocessor communication bandwidth; and (3) the sparse matrices used to evaluate parallel sparse factorization performance should be more representative of the sizes of matrices people would factor on large parallel machines. This paper demonstrates that all three of these issues have in fact already been addressed. Specifically, (1) single node performance can be improved by moving from a column-oriented approach, where the computational kernel is level 1 BLAS, to either a panel- or block-oriented approach, where the computational kernel is level 3 BLAS; (2) communication hardware has improved dramatically, with new parallel computers (the Intel Paragon system) providing one to two orders of magnitude higher communication bandwidth than previous parallel computers (the Intel iPSC/860 system); and (3) several larger benchmark matrices are now available, and newer parallel machines offer sufficient memory per node to factor these larger matrices. The result of addressing these three issues is extremely high performance on moderately parallel machines. This paper demonstrates performance levels of 650 double-precision Mflops on 32 nodes of the Intel Paragon system, 1 Gflop on 64 nodes, and 1.7 Gflops on 128 nodes. This paper also does a direct performance comparison between the iPSC/860 and Paragon systems, as well as a comparison between panel- and block-oriented approaches to parallel factorization.