A scalable integral-direct, distributed-data parallel algorithm for
four-index transformation is presented. The algorithm was implemented in
the context of second-order Møller-Plesset (MP2) energy evaluation,
yet it is easily adapted to other electron correlation methods in which
only MO integrals with two indices in the virtual orbital space are
required. The major computational steps of the MP2 energy calculation
are the two-electron integral evaluation, O(N^4), and transformation into the MO
basis, O(ON^4), where N is the number of basis functions and O is the
number of occupied orbitals. The associated maximal communication costs
scale as O(n_Σ O^2 V N), where V and n_Σ denote the number of virtual
orbitals and the number of symmetry-unique shells, respectively. The
largest local and global memory requirements are O(N^2) for the
MO coefficients and O(OVN) for the three-quarter transformed integrals,
respectively. Several aspects of the implementation such as symmetry
treatment, integral prescreening, and the distribution of data and
computational tasks are discussed. The parallel efficiency of the
algorithm is demonstrated by calculations on the phenanthrene molecule
with 762 primitive Gaussians contracted to 412 basis functions. The
calculations were performed on an IBM SP2 with 48 nodes. The measured
wall-clock time on 48 nodes is less than 15 min for this calculation, and
the speedup relative to single-node execution is estimated to be 527.
This superlinear speedup is a result of exploiting both the compute
power and the aggregate memory of the parallel computer. The latter
reduces the number of passes through the AO integral list, and hence the
operation count of the calculation. The test calculations also show that
the evaluation of the two-electron integrals dominates the calculation,
despite the higher scaling of the transformation step.
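
The link between aggregate memory and superlinear speedup can be
illustrated with a toy model. The sketch below is not the algorithm of
this work: it assumes that every pass through the AO integral list costs
the same, that the work within a pass parallelizes perfectly, and that
the number of passes is simply the required storage divided by the
available aggregate memory, rounded up. The function names and the
numbers in the usage note are hypothetical and chosen only for
illustration.

```python
import math


def passes_needed(required_words, words_per_node, nodes):
    """Passes through the AO integral list, assuming each pass can only
    process the share of transformed integrals that fits in the
    aggregate memory of all nodes (toy assumption)."""
    return math.ceil(required_words / (words_per_node * nodes))


def modeled_speedup(required_words, words_per_node, nodes, t_pass=1.0):
    """Speedup relative to one node, assuming each pass costs t_pass on
    a single node and is divided evenly among the nodes (idealized)."""
    t_single = passes_needed(required_words, words_per_node, 1) * t_pass
    t_parallel = (
        passes_needed(required_words, words_per_node, nodes) * t_pass / nodes
    )
    return t_single / t_parallel
```

With a hypothetical requirement of 80 memory units and 10 units per
node, a single node needs 8 passes while 8 nodes need only 1, so
modeled_speedup(80, 10, 8) exceeds the node count: the parallel run both
divides the work and repeats less of it.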