A general approach to the parallel sparse-blocked matrix-matrix multiply is
developed in the context of linear scaling self-consistent-field (SCF)
theory. The data-parallel message passing method uses non-blocking
communication to overlap computation and communication. The space-filling
curve heuristic is used to achieve data locality for sparse matrix elements
that decay with "separation". Load balance is achieved by solving the bin
packing problem for blocks of variable size.
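A sparse-blocked matrix-matrix multiply can be sketched serially as follows, assuming a matrix is stored as a dictionary of small dense blocks keyed by block indices (I, J), with one block row and column per atom and a hypothetical `sizes` list giving each atom's number of basis functions. The norm threshold `tau` used to drop small result blocks is an illustrative assumption, not the filtering criterion of the method described here, and none of the parallel machinery is shown.

```python
import math
from collections import defaultdict

# A dense block is a list of rows; a blocked matrix maps (I, J) -> block.
Block = list[list[float]]
BlockMatrix = dict[tuple[int, int], Block]


def dense_gemm(a: Block, b: Block) -> Block:
    """Dense product of an (m x k) block and a (k x n) block."""
    n = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(len(b))) for j in range(n)]
            for row in a]


def add_into(c: Block, d: Block) -> None:
    """Accumulate block d into block c in place."""
    for i, row in enumerate(d):
        for j, v in enumerate(row):
            c[i][j] += v


def frobenius(block: Block) -> float:
    return math.sqrt(sum(v * v for row in block for v in row))


def sparse_block_multiply(A: BlockMatrix, B: BlockMatrix,
                          sizes: list[int], tau: float = 0.0) -> BlockMatrix:
    """C = A @ B over stored blocks only; blocks with norm <= tau are dropped."""
    # Group B's blocks by block row K so each A[I, K] meets only matching B[K, J].
    b_by_row = defaultdict(list)
    for (k, j), blk in B.items():
        b_by_row[k].append((j, blk))

    C: BlockMatrix = {}
    for (i, k), a_blk in A.items():
        for j, b_blk in b_by_row.get(k, []):
            if (i, j) not in C:
                C[(i, j)] = [[0.0] * sizes[j] for _ in range(sizes[i])]
            add_into(C[(i, j)], dense_gemm(a_blk, b_blk))

    # Hypothetical filtering step: keep only blocks whose norm exceeds tau.
    return {key: blk for key, blk in C.items() if frobenius(blk) > tau}


# Tiny example with hypothetical block sizes: atom 0 has 2 functions, atom 1 has 1.
sizes = [2, 1]
A = {(0, 0): [[1.0, 2.0], [3.0, 4.0]], (1, 1): [[5.0]]}
B = {(0, 0): [[1.0, 0.0], [0.0, 1.0]], (0, 1): [[1.0], [1.0]]}
C = sparse_block_multiply(A, B, sizes)
# C == {(0, 0): [[1.0, 2.0], [3.0, 4.0]], (0, 1): [[3.0], [7.0]]}
```

In this sketch, grouping B by block row means the work is proportional to the number of matching stored block pairs rather than to the full matrix dimension, which is the reason for keeping the matrix in sparse-blocked form.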
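The space-filling curve heuristic can be illustrated with a Hilbert curve, which assigns each grid cell a 1D index so that cells near each other along the curve are also near each other in space. This sketch rests on two assumptions: it uses a 2D Hilbert curve (molecular systems are 3D), and `hilbert_order` and `partition` are hypothetical helpers that sort atoms along the curve and cut that ordering into contiguous pieces, one per processor.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Distance along a 2D Hilbert curve filling an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def _cell(v: float, lo: float, hi: float, n: int) -> int:
    """Quantize a coordinate in [lo, hi] onto grid cells 0 .. n-1."""
    if hi == lo:
        return 0
    return min(int((v - lo) / (hi - lo) * n), n - 1)


def hilbert_order(coords: list[tuple[float, float]], n: int = 16) -> list[int]:
    """Atom indices sorted by the Hilbert index of their grid cell."""
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    keys = [hilbert_index(n, _cell(x, min(xs), max(xs), n),
                          _cell(y, min(ys), max(ys), n))
            for x, y in coords]
    return sorted(range(len(coords)), key=lambda i: keys[i])


def partition(order: list[int], nproc: int) -> list[list[int]]:
    """Cut the curve ordering into nproc contiguous, nearly equal-length pieces."""
    q, r = divmod(len(order), nproc)
    pieces, start = [], 0
    for p in range(nproc):
        end = start + q + (1 if p < r else 0)
        pieces.append(order[start:end])
        start = end
    return pieces


coords = [(3.0, 0.0), (0.0, 0.0), (0.0, 3.0), (1.0, 1.0)]
order = hilbert_order(coords, n=4)  # [1, 3, 2, 0]
print(partition(order, 2))          # [[1, 3], [2, 0]]
```

Because each processor receives a contiguous stretch of the curve, the atoms it owns tend to be spatially clustered, so the matrix elements that survive the decay with separation tend to stay local.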
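The abstract does not say which bin packing algorithm is used, so the sketch below swaps in a standard greedy heuristic, longest-processing-time first (LPT): blocks are taken in decreasing order of estimated cost, and each goes to the currently least-loaded processor. The cost model is an assumption left to the caller; charging block (I, J) a cost of sizes[I] * sizes[J] is one hypothetical choice that reflects variable block sizes.

```python
import heapq


def lpt_assign(costs: list[float], nproc: int) -> tuple[list[int], list[float]]:
    """Greedy LPT packing of variable-cost blocks onto nproc processors.

    Returns (owner of each block, total load per processor).
    """
    heap = [(0.0, p) for p in range(nproc)]  # (current load, processor id)
    heapq.heapify(heap)
    owner = [0] * len(costs)
    # Largest blocks first; each goes to the least-loaded processor so far
    # (ties broken by the lower processor id).
    for b in sorted(range(len(costs)), key=lambda i: costs[i], reverse=True):
        load, p = heapq.heappop(heap)
        owner[b] = p
        heapq.heappush(heap, (load + costs[b], p))
    loads = [0.0] * nproc
    for b, p in enumerate(owner):
        loads[p] += costs[b]
    return owner, loads


# Hypothetical per-atom block sizes; block (I, J) is charged sizes[I] * sizes[J].
sizes = [15, 5, 5]
costs = [sizes[i] * sizes[j] for i in range(3) for j in range(3)]
owner, loads = lpt_assign(costs, 2)  # loads == [325.0, 300.0]
```

LPT is a heuristic, not an exact solver, but it is simple and tends to keep the largest, least divisible blocks from piling up on one processor.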
With this new method as the kernel, parallel performance of the simplified
density matrix minimization (SDMM) for solution of the SCF equations is
investigated for RHF/6-31G** water clusters and RHF/3-21G estane globules.
Sustained rates above 5.7 GFLOPS for the SDMM have been achieved for
(H₂O)₂₀₀ with 95 Origin 2000 processors. Scalability is found to be limited
by load imbalance, which increases with decreasing granularity, due
primarily to the inhomogeneous distribution of variable block sizes.
Published by Elsevier Science B.V.
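Why load imbalance grows as granularity decreases can be seen from a simple bound, offered as an illustration rather than as the analysis behind these results. Blocks are indivisible, so the most-loaded processor carries at least max(c_max, W/p), where W is the total work, c_max the costliest block, and p the processor count. The imbalance ratio (maximum load over mean load) is therefore at least max(1, p*c_max/W), which rises linearly in p once p exceeds W/c_max. The costs below are made up to mimic an inhomogeneous mix of a few large blocks and many small ones.

```python
def imbalance_lower_bound(costs: list[float], nproc: int) -> float:
    """Lower bound on (max load / mean load) for indivisible blocks on nproc processors."""
    mean = sum(costs) / nproc
    return max(max(costs), mean) / mean


# Hypothetical inhomogeneous workload: 10 large blocks and 90 small ones (total 360).
costs = [27.0] * 10 + [1.0] * 90
for p in (4, 8, 16, 32, 64):
    print(p, round(imbalance_lower_bound(costs, p), 2))
# prints one pair per line: 4 1.0, 8 1.0, 16 1.2, 32 2.4, 64 4.8
```

Up to about W/c_max processors the bound allows perfect balance; beyond that, the large blocks alone dictate the slowest processor, no matter how the packing is done.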
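For scale, the quoted aggregate rate can be divided by the processor count to give an average per-processor rate (simple arithmetic on the figures above, not a separately reported number):

```latex
\frac{5.7\ \text{GFLOPS}}{95\ \text{processors}}
  = \frac{5700\ \text{MFLOPS}}{95}
  = 60\ \text{MFLOPS per processor}
```

Since the sustained rate is stated as above 5.7 GFLOPS, the per-processor average is correspondingly above 60 MFLOPS.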