ANALYSES AND OPTIMIZATIONS FOR SHARED ADDRESS SPACE PROGRAMS

Citation
A. Krishnamurthy and K. Yelick, ANALYSES AND OPTIMIZATIONS FOR SHARED ADDRESS SPACE PROGRAMS, Journal of Parallel and Distributed Computing, 38(2), 1996, pp. 130-144
Citations number
24
Subject Categories
Computer Sciences; Computer Science Theory & Methods
ISSN journal
0743-7315
Volume
38
Issue
2
Year of publication
1996
Pages
130 - 144
Database
ISI
SICI code
0743-7315(1996)38:2<130:AAOFSA>2.0.ZU;2-7
Abstract
We present compiler analyses and optimizations for explicitly parallel programs that communicate through a shared address space. Any type of code motion on explicitly parallel programs requires a new kind of analysis to ensure that operations reordered on one processor cannot be observed by another. The analysis, called cycle detection, is based on work by Shasha and Snir and checks for cycles among interfering accesses. We improve the accuracy of their analysis by using additional information from synchronization analysis, which handles post-wait synchronization, barriers, and locks. We also make the analysis efficient by exploiting the common code image property of SPMD programs. We make no assumptions on the use of synchronization constructs: our transformations preserve program meaning even in the presence of race conditions, user-defined spin locks, or other synchronization mechanisms built from shared memory. However, programs that use linguistic synchronization constructs rather than their user-defined shared memory counterparts will benefit from more accurate analysis and therefore better optimization. We demonstrate the use of this analysis for communication optimizations on distributed memory machines by automatically transforming programs written in a conventional shared memory style into a Split-C program, which has primitives for nonblocking memory operations and one-way communication. The optimizations include message pipelining, to allow multiple outstanding remote memory operations, conversion of two-way to one-way communication, and elimination of communication through data reuse. The performance improvements are as high as 20-35% for programs running on a CM-5 multiprocessor using the Split-C language as a global address layer. Even larger benefits can be expected on machines with higher communication latency relative to processor speed. (C) 1996 Academic Press, Inc.
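To make the cycle-detection idea concrete, here is a minimal Python sketch in the spirit of the Shasha-Snir analysis the abstract cites. It is not the paper's algorithm: accesses are represented as simple (op, var) tuples of my own devising, and a plain reachability check is used as a conservative over-approximation of the minimal-cycle condition. Program-order edges that lie on a cycle of program-order and conflict edges are the ones a compiler must not reorder.

```python
from itertools import combinations

def find_delay_edges(programs):
    """programs: one list per processor of (op, var) tuples, op in {'R', 'W'}.
    Returns the set of program-order edges that lie on a cycle of
    program-order and conflict edges; these edges must be enforced
    (the two accesses cannot be reordered or overlapped freely)."""
    nodes = [(p, i) for p, prog in enumerate(programs) for i in range(len(prog))]
    # Directed program-order edges within each processor.
    po = set()
    for p, prog in enumerate(programs):
        for i in range(len(prog) - 1):
            po.add(((p, i), (p, i + 1)))
    # Undirected conflict edges: same variable, different processors,
    # at least one of the two accesses is a write.
    conflict = set()
    for a, b in combinations(nodes, 2):
        if a[0] == b[0]:
            continue
        (opa, va), (opb, vb) = programs[a[0]][a[1]], programs[b[0]][b[1]]
        if va == vb and 'W' in (opa, opb):
            conflict.add((a, b))
            conflict.add((b, a))
    succ = {n: [] for n in nodes}
    for u, v in po | conflict:
        succ[u].append(v)
    def on_cycle(u, v):
        # The po edge (u, v) is on a cycle iff v can reach u.
        seen, stack = set(), [v]
        while stack:
            n = stack.pop()
            if n == u:
                return True
            if n not in seen:
                seen.add(n)
                stack.extend(succ[n])
        return False
    return {e for e in po if on_cycle(*e)}
```

On the classic two-processor example (P0 writes X then Y; P1 reads Y then X), both program-order edges lie on a cycle, so neither pair may be reordered; if P1 instead reads only X, no cycle exists and P0's two writes can be pipelined.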