We describe an efficient Particle-Mesh algorithm for the Connection
Machine CM-5. Our particular method parallelizes well and the
computation time per time step decreases as the particles become more
clustered. We achieve floating-point computation rates of 4-5
MFlops/processing node and total operations (the sum of floating-point
and integer arithmetic plus communications) of 5-19 MOps/sec/processing
node. The rates scale almost linearly from 32 to 256 processors.
Although some of what we discuss is specific to the CM-5, many aspects
(e.g., the computation of the force on a mesh) are generic to all
implementations, and other aspects (e.g., the algorithm for assignment
of the density to the mesh) are useful on any parallel computer.
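
To make the density-assignment step concrete, the following is a minimal serial sketch of depositing particle mass onto a periodic one-dimensional mesh. It is not the parallel CM-5 algorithm described in this paper. The cloud-in-cell (linear) weighting, the function name assign_density_cic, and all parameter names are assumptions chosen for illustration only.

```python
import math


def assign_density_cic(positions, masses, n_cells, box_size):
    """Deposit particle masses on a periodic 1D mesh (illustrative sketch).

    Assumes cloud-in-cell weighting: each particle shares its mass
    linearly between the two nearest mesh points. Mesh point i sits at
    x = i * box_size / n_cells.
    """
    cell = box_size / n_cells
    density = [0.0] * n_cells
    for x, m in zip(positions, masses):
        s = (x % box_size) / cell        # position in units of the cell size
        left = math.floor(s)
        frac = s - left                  # fractional distance past the left point
        i = int(left) % n_cells          # periodic wrap of the left mesh point
        j = (i + 1) % n_cells            # periodic wrap of the right mesh point
        density[i] += m * (1.0 - frac) / cell
        density[j] += m * frac / cell
    return density
```

In this sketch the total mass is conserved: summing the returned densities and multiplying by the cell size recovers the sum of the particle masses. In a parallel setting, several particles may update the same mesh point at once, which is why the assignment step needs more care than this serial loop suggests.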