PERFORMANCE OF PANEL AND BLOCK APPROACHES TO SPARSE CHOLESKY FACTORIZATION ON THE IPSC/860 AND PARAGON MULTICOMPUTERS

Authors

ROTHBERG E

Citation

E. Rothberg, PERFORMANCE OF PANEL AND BLOCK APPROACHES TO SPARSE CHOLESKY FACTORIZATION ON THE IPSC/860 AND PARAGON MULTICOMPUTERS, SIAM journal on scientific computing, 17(3), 1996, pp. 699-713

Subject categories

Computer Sciences, Mathematics

Journal title

SIAM journal on scientific computing

ISSN journal

1064-8275

Volume

17

Issue

3

Year of publication

1996

Pages

699 - 713

Database

ISI

SICI code

1064-8275(1996)17:3<699:POPABA>2.0.ZU;2-O

Abstract

Sparse Cholesky factorization has historically achieved extremely low performance on distributed-memory multiprocessors. We believe that three issues must be addressed to improve this situation: (1) parallel factorization methods must be based on more efficient sequential methods; (2) parallel machines must provide higher interprocessor communication bandwidth; and (3) the sparse matrices used to evaluate parallel sparse factorization performance should be more representative of the sizes of matrices people would factor on large parallel machines. This paper demonstrates that all three of these issues have in fact already been addressed. Specifically, (1) single node performance can be improved by moving from a column-oriented approach, where the computational kernel is level 1 BLAS, to either a panel- or block-oriented approach, where the computational kernel is level 3 BLAS; (2) communication hardware has improved dramatically, with new parallel computers (the Intel Paragon system) providing one to two orders of magnitude higher communication bandwidth than previous parallel computers (the Intel iPSC/860 system); and (3) several larger benchmark matrices are now available, and newer parallel machines offer sufficient memory per node to factor these larger matrices. The result of addressing these three issues is extremely high performance on moderately parallel machines. This paper demonstrates performance levels of 650 double-precision Mflops on 32 nodes of the Intel Paragon system, 1 Gflop on 64 nodes, and 1.7 Gflops on 128 nodes. This paper also does a direct performance comparison between the iPSC/860 and Paragon systems, as well as a comparison between panel- and block-oriented approaches to parallel factorization.