The Vector Multiprocessor brings to the multiprocessor what vectorization
brought to the single processor. In addition to the usual complement of logic
and arithmetic units, each processor contains a programmable communication
unit whose registers communicate directly with the corresponding registers in
neighboring processors via an n-dimensional interconnection network.
Interprocessor communication tasks are performed to and from these registers
in the same way that computational tasks are performed on a vector
uniprocessor. Communication is shown to be optimal for a large class of
communication tasks: elements are transmitted, in parallel, to their
destination processors at an average rate of one per communication cycle. This
result, called O(1) access, is used to develop a balanced communication system
in which local and global access are comparable. It is also used to support
the "vector parallel paradigm," in which all arrays are uniformly distributed
and the user interface "looks" like a vector uniprocessor interface. Both
coarse- and fine-grain performance models are provided; they demonstrate the
unexpected result that communication time is asymptotically negligible
compared with computation time. Finally, three performance models are
presented for the spherical harmonic transform, which is the most
communication-intensive part of climate model dynamics.
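The vector parallel paradigm assumes every array is spread uniformly over the processors, so that a user can index a global array as on a vector uniprocessor. The abstract does not say which layout is used; the following is a minimal Python sketch assuming a contiguous block distribution in which block sizes differ by at most one. The class `BlockDistribution` and its methods are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockDistribution:
    """Hypothetical uniform block distribution of a length-n array over p processors.

    The first (n % p) processors hold one extra element, so block sizes
    differ by at most one (an assumed layout, not taken from the source).
    """
    n: int
    p: int

    def block_size(self, rank: int) -> int:
        """Number of elements stored on processor `rank`."""
        base, extra = divmod(self.n, self.p)
        return base + (1 if rank < extra else 0)

    def block_start(self, rank: int) -> int:
        """Global index of the first element stored on processor `rank`."""
        base, extra = divmod(self.n, self.p)
        return rank * base + min(rank, extra)

    def owner(self, i: int) -> tuple:
        """Map global index i to (owning processor, local offset)."""
        if not 0 <= i < self.n:
            raise IndexError(i)
        base, extra = divmod(self.n, self.p)
        cutoff = extra * (base + 1)  # indices held by the larger blocks
        if i < cutoff:
            return i // (base + 1), i % (base + 1)
        # Only reached when base > 0, because cutoff == n whenever n < p.
        rank = extra + (i - cutoff) // base
        return rank, (i - cutoff) % base


# Usage: 10 elements over 4 processors gives blocks of sizes 3, 3, 2, 2.
dist = BlockDistribution(n=10, p=4)
print([dist.block_size(r) for r in range(4)])
print(dist.owner(5))
```

With this layout, the per-processor share of any array is n/p rounded up or down, which is the uniformity the paradigm relies on when it treats every processor's communication load as balanced.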
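The claim that communication becomes asymptotically negligible can be illustrated with a deliberately simplified coarse-grain model. The sketch below assumes a common estimate for the spectral transform: the Legendre part costs on the order of N^3 operations for truncation N, while the data moved is on the order of N^2 elements. Under O(1) access each element costs about one communication cycle. The constants `c_comp` and `c_comm` and all function names are hypothetical, and the model ignores latency and startup terms; it is not one of the three models from the paper.

```python
def comp_time(n: int, p: int, c_comp: float = 1.0) -> float:
    """Assumed computation time: O(N^3) work divided evenly over p processors."""
    return c_comp * n**3 / p


def comm_time(n: int, p: int, c_comm: float = 1.0) -> float:
    """Assumed communication time under O(1) access: each processor moves
    its share of O(N^2) elements at about one element per cycle."""
    return c_comm * n**2 / p


def comm_fraction(n: int, p: int, c_comp: float = 1.0, c_comm: float = 1.0) -> float:
    """Fraction of total time spent communicating in this toy model.

    Algebraically this is 1 / (1 + N * c_comp / c_comm), so p cancels
    and the fraction falls toward zero as N grows.
    """
    t_comm = comm_time(n, p, c_comm)
    return t_comm / (t_comm + comp_time(n, p, c_comp))


for n in (31, 63, 127, 255):
    print(n, comm_fraction(n, p=64))
```

Because both terms are divided by p in this model, the communication fraction depends only on N; a finer model with per-message latency would add terms that do not cancel.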