Compiler and run-time support for exploiting regularity within irregular applications

Citation
A. Lain et al., "Compiler and run-time support for exploiting regularity within irregular applications," IEEE Transactions on Parallel and Distributed Systems, 11(2), 2000, pp. 119-135
Citation count
38
Subject categories
Computer Science & Engineering
Journal title
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
ISSN journal
1045-9219
Volume
11
Issue
2
Year of publication
2000
Pages
119-135
Database
ISI
SICI code
1045-9219(200002)11:2<119:CARSFE>2.0.ZU;2-Q
Abstract
This paper starts from a well-known idea: that structure in irregular problems improves sequential performance. It then shows that the same structure can also be exploited for parallelization of irregular problems on a distributed-memory multicomputer. In particular, we extend a well-known parallelization technique called run-time compilation to use structure information that is explicit in the array subscripts. This paper presents a number of internal representations suited to particular access patterns and shows how various preprocessing structures, such as translation tables, trace arrays, and interprocessor communication schedules, can be encoded in terms of one or more of these representations. We show how loop and index normalization are important for the detection of irregularity in array references, as well as of the presence of locality in such references. This paper presents methods for the detection of irregularity, the feasibility of inspection, and, finally, the placement of inspectors and interprocessor communication schedules. We show that this process can be automated through extensions to an HPF/Fortran-77 distributed-memory compiler (PARADIGM) and a new runtime support for irregular problems (PILAR) that uses a variety of internal representations of communication patterns. We devise performance measures which consider the relationship between the inspection cost, the execution cost, and the number of times the executor is invoked, so that a comparison of the competing schemes can be performed independent of the number of iterations. Finally, we show experimental results on an IBM SP-2 that validate our approach. These results show that dramatic improvements in both memory requirements and execution time can be achieved by using these techniques.
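The run-time compilation technique the abstract refers to is commonly realized as an inspector/executor pattern: an inspector scans the irregular subscripts once to build a reusable communication schedule, and the executor then performs the gather and the loop on every invocation. The following is a minimal illustrative sketch of that idea (all names and data are hypothetical, not from the paper, and the two "processes" and the `fetch` callback stand in for real distributed-memory communication):

```python
def build_inspector(ia, lo, hi):
    """Inspector: scan the indirection array once to find off-process
    subscripts and record them as a reusable communication schedule."""
    return sorted({j for j in ia if not (lo <= j < hi)})

def executor(x_local, ia, lo, hi, schedule, fetch):
    """Executor: gather remote values per the schedule (one
    communication step), then run the irregular loop. The schedule is
    reused across many executor invocations, amortizing inspection."""
    ghost = {j: fetch(j) for j in schedule}   # simulated remote gather
    return [x_local[j - lo] if lo <= j < hi else ghost[j] for j in ia]

# Toy setup: a global array of 8 elements, this "process" owns x[0:4].
x = [i * i for i in range(8)]
lo, hi = 0, 4
x_local = x[lo:hi]
ia = [1, 6, 3, 5, 6]                 # irregular subscripts x[ia[i]]

sched = build_inspector(ia, lo, hi)  # remote indices needed: [5, 6]
vals = executor(x_local, ia, lo, hi, sched, lambda j: x[j])
print(sched)  # [5, 6]
print(vals)   # [1, 36, 9, 25, 36]
```

The payoff the abstract's performance measures capture is visible even in this sketch: `build_inspector` runs once, while `executor` may run for many iterations against the same schedule, so the inspection cost is amortized over the number of executor invocations.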