ANALYSES AND OPTIMIZATIONS FOR SHARED ADDRESS SPACE PROGRAMS

Citation
A. Krishnamurthy and K. Yelick, ANALYSES AND OPTIMIZATIONS FOR SHARED ADDRESS SPACE PROGRAMS, Journal of Parallel and Distributed Computing, 38(2), 1996, pp. 130-144
Citations number
24
Subject Categories
Computer Sciences; Computer Science Theory & Methods
ISSN journal
0743-7315
Volume
38
Issue
2
Year of publication
1996
Pages
130 - 144
Database
ISI
SICI code
0743-7315(1996)38:2<130:AAOFSA>2.0.ZU;2-7
Abstract
We present compiler analyses and optimizations for explicitly parallel programs that communicate through a shared address space. Any type of code motion on explicitly parallel programs requires a new kind of analysis to ensure that operations reordered on one processor cannot be observed by another. The analysis, called cycle detection, is based on work by Shasha and Snir and checks for cycles among interfering accesses. We improve the accuracy of their analysis by using additional information from synchronization analysis, which handles post-wait synchronization, barriers, and locks. We also make the analysis efficient by exploiting the common code image property of SPMD programs. We make no assumptions on the use of synchronization constructs: our transformations preserve program meaning even in the presence of race conditions, user-defined spin locks, or other synchronization mechanisms built from shared memory. However, programs that use linguistic synchronization constructs rather than their user-defined shared memory counterparts will benefit from more accurate analysis and therefore better optimization. We demonstrate the use of this analysis for communication optimizations on distributed memory machines by automatically transforming programs written in a conventional shared memory style into a Split-C program, which has primitives for nonblocking memory operations and one-way communication. The optimizations include message pipelining, to allow multiple outstanding remote memory operations, conversion of two-way to one-way communication, and elimination of communication through data reuse. The performance improvements are as high as 20-35% for programs running on a CM-5 multiprocessor using the Split-C language as a global address layer. Even larger benefits can be expected on machines with higher communication latency relative to processor speed. (C) 1996 Academic Press, Inc.
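To make the cycle-detection idea concrete, here is a minimal Python sketch in the spirit of the Shasha-Snir analysis the abstract cites. It is not the paper's algorithm: accesses are represented as simple (op, var) tuples of my own devising, and a plain reachability check is used as a conservative over-approximation of the minimal-cycle condition. Program-order edges that lie on a cycle of program-order and conflict edges are the ones a compiler must not reorder.

```python
from itertools import combinations

def find_delay_edges(programs):
    """programs: one list per processor of (op, var) tuples, op in {'R', 'W'}.
    Returns the set of program-order edges that lie on a cycle of
    program-order and conflict edges; these edges must be enforced
    (the two accesses cannot be reordered or overlapped freely)."""
    nodes = [(p, i) for p, prog in enumerate(programs) for i in range(len(prog))]
    # Directed program-order edges within each processor.
    po = set()
    for p, prog in enumerate(programs):
        for i in range(len(prog) - 1):
            po.add(((p, i), (p, i + 1)))
    # Undirected conflict edges: same variable, different processors,
    # at least one of the two accesses is a write.
    conflict = set()
    for a, b in combinations(nodes, 2):
        if a[0] == b[0]:
            continue
        (opa, va), (opb, vb) = programs[a[0]][a[1]], programs[b[0]][b[1]]
        if va == vb and 'W' in (opa, opb):
            conflict.add((a, b))
            conflict.add((b, a))
    succ = {n: [] for n in nodes}
    for u, v in po | conflict:
        succ[u].append(v)
    def on_cycle(u, v):
        # The po edge (u, v) is on a cycle iff v can reach u.
        seen, stack = set(), [v]
        while stack:
            n = stack.pop()
            if n == u:
                return True
            if n not in seen:
                seen.add(n)
                stack.extend(succ[n])
        return False
    return {e for e in po if on_cycle(*e)}
```

On the classic two-processor example (P0 writes X then Y; P1 reads Y then X), both program-order edges lie on a cycle, so neither pair may be reordered; if P1 instead reads only X, no cycle exists and P0's two writes can be pipelined.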