A. Krishnamurthy and K. Yelick, "Analyses and Optimizations for Shared Address Space Programs", Journal of Parallel and Distributed Computing, 38(2), 1996, pp. 130-144
Citation count
24
Subject Categories
Computer Sciences; Computer Science Theory & Methods
We present compiler analyses and optimizations for explicitly parallel programs that communicate through a shared address space. Any type of code motion on explicitly parallel programs requires a new kind of analysis to ensure that operations reordered on one processor cannot be observed by another. The analysis, called cycle detection, is based on work by Shasha and Snir and checks for cycles among interfering accesses. We improve the accuracy of their analysis by using additional information from synchronization analysis, which handles post-wait synchronization, barriers, and locks. We also make the analysis efficient by exploiting the common code image property of SPMD programs. We make no assumptions on the use of synchronization constructs: our transformations preserve program meaning even in the presence of race conditions, user-defined spin locks, or other synchronization mechanisms built from shared memory. However, programs that use linguistic synchronization constructs rather than their user-defined shared memory counterparts will benefit from more accurate analysis and therefore better optimization. We demonstrate the use of this analysis for communication optimizations on distributed memory machines by automatically transforming programs written in a conventional shared memory style into Split-C programs, which have primitives for nonblocking memory operations and one-way communication. The optimizations include message pipelining, to allow multiple outstanding remote memory operations, conversion of two-way to one-way communication, and elimination of communication through data reuse. The performance improvements are as high as 20-35% for programs running on a CM-5 multiprocessor using the Split-C language as a global address layer. Even larger benefits can be expected on machines with higher communication latency relative to processor speed. (C) 1996 Academic Press, Inc.
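The flavor of the cycle detection described in the abstract can be sketched in a few lines. This is a hedged illustration, not the paper's algorithm: it builds program-order edges within each processor and conflict edges between processors (accesses to the same variable, at least one a write), then conservatively marks every program-order edge lying on a mixed cycle as a "delay" edge that must not be reordered. The paper's refinements, namely synchronization analysis and the exploitation of the SPMD common code image, are omitted here, and the names `find_delay_set` and `progs` are invented for the sketch.

```python
def find_delay_set(progs):
    """Conservative Shasha-Snir-style cycle detection (a sketch).

    progs maps a processor name to its sequence of shared-memory
    accesses, each an (op, var) pair with op in {'R', 'W'}.  Returns
    the set of program-order edges that lie on a cycle mixing
    program-order and conflict edges; reordering such a pair could
    be observed by another processor.
    """
    nodes = [(p, i) for p, seq in progs.items() for i in range(len(seq))]

    # Directed program-order edges within each processor.
    prog = {((p, i), (p, i + 1))
            for p, seq in progs.items() for i in range(len(seq) - 1)}

    # Conflict edges (both directions): same variable, different
    # processors, and at least one of the two accesses is a write.
    conf = set()
    for a in nodes:
        for b in nodes:
            if a[0] != b[0]:
                op_a, var_a = progs[a[0]][a[1]]
                op_b, var_b = progs[b[0]][b[1]]
                if var_a == var_b and 'W' in (op_a, op_b):
                    conf.add((a, b))

    adj = {}
    for u, v in prog | conf:
        adj.setdefault(u, []).append(v)

    def reachable(src, dst, banned):
        # Depth-first search from src to dst, never taking the banned edge.
        stack, seen = [src], {src}
        while stack:
            x = stack.pop()
            if x == dst:
                return True
            for y in adj.get(x, ()):
                if (x, y) != banned and y not in seen:
                    seen.add(y)
                    stack.append(y)
        return False

    # A program-order edge (u, v) is a delay edge if some path leads
    # from v back to u without reusing (u, v) itself: that closes a
    # cycle through another processor's accesses.
    return {(u, v) for (u, v) in prog if reachable(v, u, (u, v))}
```

On the classic example, where P1 writes X then Y while P2 reads Y then X, both program-order edges fall on a cycle, so neither pair may be reordered; if P2 instead touched unrelated data, the delay set would be empty and the compiler would be free to pipeline P1's writes.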
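A back-of-envelope latency model (our illustration, not taken from the paper) shows why message pipelining pays off and why the benefit should grow on machines with higher communication latency relative to processor speed: blocking reads serialize one full round trip per access, while split-phase reads overlap the round trips behind the per-request injection overhead and wait once at the end. The function names and the `rtt`/`overhead` parameters are assumptions made for this sketch.

```python
def blocking_gets(n, rtt, overhead):
    """Shared-memory style: each remote read stalls for a full
    round trip before the next one can issue."""
    return n * (overhead + rtt)


def pipelined_gets(n, rtt, overhead):
    """Split-phase style: issue all n nonblocking gets back to back,
    then wait once; all but one round-trip latency is hidden behind
    the injection overhead of the later requests."""
    return n * overhead + rtt
```

With 100 reads, a 10-unit round trip, and 1 unit of overhead per request, the model gives 1100 units blocking versus 110 pipelined; as `rtt` grows relative to `overhead`, the gap widens, matching the abstract's expectation of larger gains on higher-latency machines.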