Xb. Zhang et Aa. Chien, DYNAMIC POINTER ALIGNMENT - TILING AND COMMUNICATION OPTIMIZATIONS FOR PARALLEL POINTER-BASED COMPUTATIONS, ACM SIGPLAN NOTICES, 32(7), 1997, pp. 37-47
Loop tiling and communication optimizations, such as message pipelinin
g and aggregation, can achieve optimized and robust memory performance
by proactively managing storage and data movement. In this paper, we
generalize these techniques to pointer-based data structures (PBDSs).
Our approach, dynamic pointer alignment (DPA), has two components. The
compiler decomposes a program into non-blocking threads that operate
on specific pointers and labels thread creation sites with their corre
sponding pointers. At runtime, an explicit mapping from pointers to de
pendent threads is updated at thread creation and is used to dynamical
ly schedule both threads and communication, such that threads using th
e same objects execute together, communication overlaps with local wor
k, and messages are aggregated. We have implemented DPA to optimize re
mote reads to global PBDSs on parallel machines. Our empirical results
on the force computation phases of two applications that use sophisti
cated PBDSs, Barnes-Hut and FMM, show that DPA achieves good absolute
performance and speedups by enabling tiling and communication optimiza
tions on the CRAY T3D.