COMPILING FOR DISTRIBUTED-MEMORY ARCHITECTURES

Citation
A. Rogers and K. Pingali, COMPILING FOR DISTRIBUTED-MEMORY ARCHITECTURES, IEEE Transactions on Parallel and Distributed Systems, 5(3), 1994, pp. 281-298
Citations number
35
Subject Categories
System Science; Engineering, Electrical & Electronic; Computer Science Theory & Methods
ISSN journal
1045-9219
Volume
5
Issue
3
Year of publication
1994
Pages
281 - 298
Database
ISI
SICI code
1045-9219(1994)5:3<281:CFDA>2.0.ZU;2-B
Abstract
Parallel computers provide a large degree of computational power for programmers who are willing and able to harness it. The introduction of high-level languages and good compilers made possible the wide use of sequential machines, but the lack of such tools for parallel machines hinders their widespread acceptance and use. Programmers must address issues such as process decomposition, synchronization, and load balancing. This is a severe burden and opens the door to time-dependent bugs, such as race conditions between reads and writes, which are extremely difficult to detect. We have developed a parallelizing compiler that, given a sequential program and a memory layout of its data, performs process decomposition while balancing parallelism against locality of reference. A process decomposition is obtained by specializing the program for each processor to the data that resides on that processor. If this analysis fails, the compiler falls back to a simple but inefficient scheme called run-time resolution. Each process's role in the computation is determined by examining the data required for execution at run-time. Thus, our approach to process decomposition is data-driven rather than program-driven. We discuss several message optimizations that address the issues of overhead and synchronization in message transmission. Accumulation reorganizes the computation of a commutative and associative operator to reduce message traffic. Pipelining sends a value as close to its computation as possible to increase parallelism. Vectorization of messages combines messages with the same source and the same destination to reduce overhead. Our results from experiments in parallelizing SIMPLE, a large hydrodynamics benchmark, for the Intel iPSC/2, show a speedup within 60% to 70% of handwritten code.
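
To make the run-time resolution fallback concrete, the following is a minimal, hypothetical sketch of the idea as described in the abstract, not the authors' implementation: every processor executes every iteration, and ownership tests on the data layout decide which processor ships operands and which performs each assignment. MPI stands in here for the Intel iPSC/2 message-passing primitives used in the paper; the owner() function, the cyclic layout, and the three-point stencil are illustrative assumptions.

    /* Hypothetical sketch of run-time resolution, not the authors' code:
     * all ranks scan all iterations; ownership tests decide who ships
     * operands and who performs each assignment. */
    #include <mpi.h>

    #define N 1024

    static int owner(int i, int nprocs) { return i % nprocs; } /* assumed cyclic layout */

    /* Deliver operand b[j] to the rank that owns the left-hand side. */
    static void fetch(double *dst, const double b[], int j,
                      int lhs, int rank, int nprocs)
    {
        int src = owner(j, nprocs);
        if (src == lhs) {                    /* operand already local to the owner */
            if (rank == lhs) *dst = b[j];
            return;
        }
        if (rank == src)                     /* operand owner ships the value */
            MPI_Send(&b[j], 1, MPI_DOUBLE, lhs, j, MPI_COMM_WORLD);
        else if (rank == lhs)                /* statement owner receives it */
            MPI_Recv(dst, 1, MPI_DOUBLE, src, j, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    void relax(double a[], const double b[], int rank, int nprocs)
    {
        for (int i = 1; i < N - 1; i++) {    /* every rank walks every iteration */
            int lhs = owner(i, nprocs);
            double left = 0.0, right = 0.0;
            fetch(&left,  b, i - 1, lhs, rank, nprocs);
            fetch(&right, b, i + 1, lhs, rank, nprocs);
            if (rank == lhs)                 /* owner of a[i] computes the assignment */
                a[i] = 0.5 * (left + right);
        }
    }

The sketch makes plain why run-time resolution is simple but inefficient: every processor pays the cost of scanning the full iteration space and evaluating ownership guards even for iterations it does not own, which is precisely the overhead that the compile-time specialization described in the abstract avoids.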
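
The accumulation optimization can likewise be illustrated with a standard reduction. In this hypothetical sketch, a commutative and associative operator (summation, chosen only for illustration) is applied locally to the elements each rank owns, and the partial results are then combined in a single collective step instead of shipping each operand in its own message:

    /* Hypothetical accumulation sketch: reduce over owned elements locally,
     * then combine partial sums with one collective rather than many
     * point-to-point messages. The cyclic ownership test is an assumption. */
    #include <mpi.h>

    double accumulate(const double a[], int n, int rank, int nprocs)
    {
        double local = 0.0, global = 0.0;
        for (int i = 0; i < n; i++)
            if (i % nprocs == rank)          /* assumed cyclic ownership test */
                local += a[i];
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        return global;                       /* every rank receives the full sum */
    }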