COMPILING FOR DISTRIBUTED-MEMORY ARCHITECTURES

Authors

ROGERS A; PINGALI K

Citation

A. Rogers and K. Pingali, COMPILING FOR DISTRIBUTED-MEMORY ARCHITECTURES, IEEE Transactions on Parallel and Distributed Systems, 5(3), 1994, pp. 281-298

Citations number

Subject Categories

System Science; Engineering, Electrical & Electronic; Computer Science Theory & Methods

Journal title

IEEE Transactions on Parallel and Distributed Systems

ISSN journal

1045-9219

Volume

5

Issue

3

Year of publication

1994

Pages

281 - 298

Database

ISI

SICI code

1045-9219(1994)5:3<281:CFDA>2.0.ZU;2-B

Abstract

Parallel computers provide a large degree of computational power for programmers who are willing and able to harness it. The introduction of high-level languages and good compilers made possible the wide use of sequential machines, but the lack of such tools for parallel machines hinders their widespread acceptance and use. Programmers must address issues such as process decomposition, synchronization, and load balancing. This is a severe burden and opens the door to time-dependent bugs, such as race conditions between reads and writes, which are extremely difficult to detect. We have developed a parallelizing compiler that, given a sequential program and a memory layout of its data, performs process decomposition while balancing parallelism against locality of reference. A process decomposition is obtained by specializing the program for each processor to the data that resides on that processor. If this analysis fails, the compiler falls back to a simple but inefficient scheme called run-time resolution. Each process's role in the computation is determined by examining the data required for execution at run-time. Thus, our approach to process decomposition is data-driven rather than program-driven. We discuss several message optimizations that address the issues of overhead and synchronization in message transmission. Accumulation reorganizes the computation of a commutative and associative operator to reduce message traffic. Pipelining sends a value as close to its computation as possible to increase parallelism. Vectorization of messages combines messages with the same source and the same destination to reduce overhead. Our results from experiments in parallelizing SIMPLE, a large hydrodynamics benchmark, for the Intel iPSC/2, show a speedup within 60% to 70% of handwritten code.
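
The data-driven idea in the abstract can be illustrated with a small simulation. The sketch below is not the authors' compiler or its generated code; it is a minimal model under stated assumptions: a block layout of a one-dimensional array, a single loop `a[i] = b[(i + shift) % n]`, and hypothetical helper names (`owner`, `runtime_resolution`, `vectorize`). It records one message per remote read, as run-time resolution would, and then combines messages that share a source and a destination, as message vectorization would.

```python
from collections import defaultdict


def owner(i, n, p):
    """Processor holding element i of an n-element array (assumed block layout)."""
    block = (n + p - 1) // p  # ceiling division: elements per processor
    return i // block


def runtime_resolution(b, p, shift):
    """Simulate a[i] = b[(i + shift) % n] with ownership checked per element.

    The owner of a[i] performs the assignment. If b's element lives on a
    different processor, that processor sends it: one message per remote read.
    Returns the result array and the list of (src, dst, value) messages.
    """
    n = len(b)
    a = [None] * n
    messages = []
    for i in range(n):
        j = (i + shift) % n
        src, dst = owner(j, n, p), owner(i, n, p)
        if src != dst:
            messages.append((src, dst, b[j]))  # element-wise message
        a[i] = b[j]
    return a, messages


def vectorize(messages):
    """Combine messages with the same (source, destination) pair into one."""
    combined = defaultdict(list)
    for src, dst, value in messages:
        combined[(src, dst)].append(value)
    return dict(combined)


if __name__ == "__main__":
    a, msgs = runtime_resolution(list(range(8)), p=2, shift=4)
    print(len(msgs), "element-wise messages")
    print(len(vectorize(msgs)), "vectorized messages")
```

In this toy setting every read is remote, so the element-wise scheme sends one message per array element, while the vectorized scheme sends one message per processor pair. The per-message overhead that vectorization targets is not modeled here; only the message count is.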