Linear-scaling electronic-structure techniques, also called O(N) techniques, rely heavily on the multiplication of sparse matrices, where the sparsity arises from spatial cut-offs. In order to treat very large systems, the calculations must be run on parallel computers. We analyze the problem of parallelizing the multiplication of sparse matrices with the sparsity pattern required by linear-scaling techniques. We show that the management of inter-node communications and the effective use of on-node cache are helped by organizing the atoms into compact groups. We also discuss how to identify a key part of the code, called the 'multiplication kernel', which is repeatedly invoked to build up the matrix product, and explicit code is presented for this kernel. Numerical tests of the resulting multiplication code are reported for cases of practical interest, and it is shown that their scaling properties for systems containing up to 20,000 atoms on machines having up to 512 processors are excellent. The tests also show that the CPU efficiency of our code is satisfactory.
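As an illustration only (not code from the paper), the sketch below shows, in C, the kind of block-sparse multiplication kernel the abstract describes: each atom carries a small dense block of matrix elements, the sparsity pattern is a per-atom neighbour list set by the spatial cut-off, and the kernel accumulates C = A*B one atom row at a time, keeping only those blocks that lie inside C's own pattern. All names, the block size NSF, and the data layout are hypothetical assumptions for this sketch.

/* Minimal sketch (hypothetical layout, not the authors' code) of a
 * block-sparse multiplication kernel: atom pairs within the spatial
 * cut-off carry dense NSF x NSF blocks, indexed by neighbour lists. */
#include <stdio.h>
#include <string.h>

#define NSF    4   /* support functions (orbitals) per atom */
#define NATOMS 3   /* tiny toy system                       */

/* Block-sparse matrix: atom i has nnbr[i] neighbours nbr[i][n], and
 * blk[i][n] is the dense block for the pair (i, nbr[i][n]).          */
typedef struct {
    int    nnbr[NATOMS];
    int    nbr[NATOMS][NATOMS];
    double blk[NATOMS][NATOMS][NSF][NSF];
} BlockSparse;

/* Multiplication kernel: C += A * B, keeping only those blocks of C
 * that belong to C's own (pre-computed) sparsity pattern.            */
static void multiply_kernel(const BlockSparse *A, const BlockSparse *B,
                            BlockSparse *C)
{
    double acc[NATOMS][NSF][NSF];   /* dense accumulator for one row */

    for (int i = 0; i < NATOMS; ++i) {
        memset(acc, 0, sizeof acc);

        /* Sum over intermediate atoms k:  C_ij += A_ik * B_kj        */
        for (int na = 0; na < A->nnbr[i]; ++na) {
            int k = A->nbr[i][na];
            for (int nb = 0; nb < B->nnbr[k]; ++nb) {
                int j = B->nbr[k][nb];
                for (int a = 0; a < NSF; ++a)
                    for (int c = 0; c < NSF; ++c)
                        for (int b = 0; b < NSF; ++b)
                            acc[j][a][b] += A->blk[i][na][a][c]
                                          * B->blk[k][nb][c][b];
            }
        }

        /* Scatter back into C's pattern (blocks outside the spatial
         * cut-off of C are simply discarded).                        */
        for (int nc = 0; nc < C->nnbr[i]; ++nc) {
            int j = C->nbr[i][nc];
            for (int a = 0; a < NSF; ++a)
                for (int b = 0; b < NSF; ++b)
                    C->blk[i][nc][a][b] += acc[j][a][b];
        }
    }
}

int main(void)
{
    static BlockSparse A, B, C;   /* zero-initialized */

    /* Toy pattern: every atom neighbours every atom; A = B = identity. */
    for (int i = 0; i < NATOMS; ++i) {
        A.nnbr[i] = B.nnbr[i] = C.nnbr[i] = NATOMS;
        for (int j = 0; j < NATOMS; ++j) {
            A.nbr[i][j] = B.nbr[i][j] = C.nbr[i][j] = j;
            if (i == j)
                for (int a = 0; a < NSF; ++a) {
                    A.blk[i][j][a][a] = 1.0;
                    B.blk[i][j][a][a] = 1.0;
                }
        }
    }
    multiply_kernel(&A, &B, &C);
    printf("C[0][0][0][0] = %f (expected 1)\n", C.blk[0][0][0][0]);
    return 0;
}

The dense per-row accumulator is one plausible way to keep the innermost loops contiguous in memory, which is in the spirit of the abstract's point that effective use of on-node cache matters for this kind of kernel; the grouping of atoms into compact blocks discussed in the paper would refine this further.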