Current parallelizing compilers do a reasonable job of extracting parallelism from programs with regular, well-behaved, statically analyzable access patterns. However, they cannot extract a significant fraction of the available parallelism when a program has a complex access pattern, or one that is insufficiently defined at compile time, e.g., simulation programs with irregular domains and/or dynamically changing interactions. Since such programs represent a large fraction of all applications, techniques are needed for extracting their inherent parallelism at run-time. In this paper we give a new run-time technique for finding an optimal parallel execution schedule for a partially parallel loop, i.e., a loop whose parallelization requires synchronization to ensure that the iterations are executed in the correct order. Given the original loop, the compiler generates inspector code that performs run-time preprocessing of the loop's access pattern, and scheduler code that schedules (and executes) the loop iterations. The inspector is fully parallel, uses no synchronization, and can be applied to any loop (from which an inspector can be extracted).
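For illustration, a minimal sketch of the inspector/executor structure in C with OpenMP might look as follows; it is not the paper's implementation, and the index array idx[], the one-access-per-iteration loop body, and the names inspector, executor, order, and wf_start are hypothetical stand-ins:

    #include <omp.h>

    /* Inspector: records the element each iteration accesses. It is
       fully parallel, uses no synchronization, and has no side effects
       on program data. */
    void inspector(const int *idx, int n, int *accessed)
    {
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            accessed[i] = idx[i];
    }

    /* Scheduler/executor: runs the iterations wavefront by wavefront.
       order[] lists iterations grouped by wavefront and wf_start[] has
       n_wavefronts + 1 entries delimiting the groups; iterations inside
       one wavefront carry no dependences, so each inner loop is fully
       parallel. */
    void executor(double *A, const int *idx, const int *order,
                  const int *wf_start, int n_wavefronts)
    {
        for (int w = 0; w < n_wavefronts; w++) {
            #pragma omp parallel for
            for (int k = wf_start[w]; k < wf_start[w + 1]; k++) {
                int i = order[k];     /* original iteration number */
                A[idx[i]] += 1.0;     /* stand-in for the real loop body */
            }
        }
    }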
In addition, it can implement at run-time the two most effective transformations for increasing the amount of parallelism in a loop: array privatization and reduction parallelization (element-wise). The ability to identify privatizable and reduction variables is very powerful since it eliminates the data dependences involving these variables and thereby potentially increases the overall parallelism of the loop.
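To make the two transformations concrete, the following hedged sketch (ours, not the paper's) shows a hypothetical loop after both have been applied: tmp[] is privatizable because every iteration writes it before reading it, and hist[] is an element-wise reduction accumulated through per-thread partial arrays; key[], nbins, and the arithmetic are invented for illustration:

    #include <stdlib.h>

    /* Assumes key[i] >= 0 for all i. */
    void transformed_loop(const int *key, int n, int nbins, double *hist)
    {
        #pragma omp parallel
        {
            /* Array privatization: tmp is written before it is read in
               every iteration, so each thread may keep a private copy. */
            double tmp[2];

            /* Element-wise reduction: each thread accumulates into a
               private copy of hist[], merged once after the loop. */
            double *local = calloc(nbins, sizeof *local);

            #pragma omp for nowait
            for (int i = 0; i < n; i++) {
                tmp[0] = 0.5 * key[i];           /* write-first use of tmp */
                tmp[1] = tmp[0] * tmp[0];
                local[key[i] % nbins] += tmp[1]; /* reduction on one element */
            }

            #pragma omp critical   /* merge the per-thread partial results */
            for (int b = 0; b < nbins; b++)
                hist[b] += local[b];

            free(local);
        }
    }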
We also describe a new scheme for constructing an optimal parallel execution schedule for the iterations of the loop. The schedule produced is a partition of the set of iterations into subsets called wavefronts, so that there are no data dependences between iterations in a wavefront. Although the wavefronts themselves are constructed one after another, the computation of each wavefront is fully parallel and requires no synchronization.
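A simplified sketch of the wavefront construction follows; it assumes each iteration accesses exactly one element A[idx[i]] (the paper handles general access patterns), and the per-element minimum is shown sequentially for brevity, whereas the paper's scheme computes it fully in parallel without synchronization via per-processor partial results:

    #include <limits.h>
    #include <stdlib.h>

    /* Partition iterations 0..n-1 into wavefronts; wave[i] receives the
       wavefront number of iteration i. Returns the wavefront count. */
    int build_wavefronts(const int *idx, int n, int n_elems, int *wave)
    {
        for (int i = 0; i < n; i++) wave[i] = -1;  /* -1: unscheduled */

        int *first = malloc(n_elems * sizeof *first);
        int scheduled = 0, w = 0;

        while (scheduled < n) {
            for (int e = 0; e < n_elems; e++) first[e] = INT_MAX;

            /* Earliest unscheduled iteration touching each element. */
            for (int i = 0; i < n; i++)
                if (wave[i] < 0 && i < first[idx[i]])
                    first[idx[i]] = i;

            /* Iteration i joins wavefront w iff it is that earliest
               iteration, so no two iterations in a wavefront touch the
               same element. */
            #pragma omp parallel for reduction(+:scheduled)
            for (int i = 0; i < n; i++)
                if (wave[i] < 0 && first[idx[i]] == i) {
                    wave[i] = w;
                    scheduled++;
                }
            w++;
        }
        free(first);
        return w;
    }

Each pass of this sketch schedules at least one iteration, so at most n wavefronts are produced, and a dependence-free loop yields a single wavefront.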
This new method has advantages over all previous run-time techniques for analyzing and scheduling partially parallel loops, since none of them simultaneously provides all of these features.