This paper describes changes made to a previous implementation of an
N-body tree code developed for a fine-grained SIMD computer
architecture. These changes include (1) switching from a balanced binary tree to
a balanced oct tree, (2) addition of quadrupole corrections, and (3)
having the particles search the tree in groups rather than
individually. An algorithm for limiting errors is also discussed. In
aggregate, these changes have led to a performance increase of over a factor of 10
compared to the previous code. For problems several times larger than
the processor array, the code now achieves performance levels of
approximately 1 Gflop on the Maspar MP-2, or roughly 20% of the quoted
peak performance of this machine. This percentage is competitive with
other parallel implementations of tree codes on MIMD architectures.
This is significant, considering the low relative cost of SIMD architectures.