Compiler and run-time support for exploiting regularity within irregular applications

Citation
A. Lain et al., "Compiler and run-time support for exploiting regularity within irregular applications," IEEE Transactions on Parallel and Distributed Systems, 11(2), 2000, pp. 119-135
Citation count
38
Subject categories
Computer Science & Engineering
Journal title
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
ISSN journal
1045-9219
Volume
11
Issue
2
Year of publication
2000
Pages
119-135
Database
ISI
SICI code
1045-9219(200002)11:2<119:CARSFE>2.0.ZU;2-Q
Abstract
This paper starts from a well-known idea: that structure in irregular problems improves sequential performance. It then shows that the same structure can also be exploited for parallelization of irregular problems on a distributed-memory multicomputer. In particular, we extend a well-known parallelization technique called run-time compilation to use structure information that is explicit in the array subscripts. This paper presents a number of internal representations suited to particular access patterns and shows how various preprocessing structures, such as translation tables, trace arrays, and interprocessor communication schedules, can be encoded in terms of one or more of these representations. We show how loop and index normalization are important for the detection of irregularity in array references, as well as of the presence of locality in such references. This paper presents methods for the detection of irregularity, the feasibility of inspection, and, finally, the placement of inspectors and interprocessor communication schedules. We show that this process can be automated through extensions to an HPF/Fortran-77 distributed-memory compiler (PARADIGM) and a new runtime support for irregular problems (PILAR) that uses a variety of internal representations of communication patterns. We devise performance measures which consider the relationship between the inspection cost, the execution cost, and the number of times the executor is invoked, so that a comparison of the competing schemes can be performed independent of the number of iterations. Finally, we show experimental results on an IBM SP-2 that validate our approach. These results show that dramatic improvements in both memory requirements and execution time can be achieved by using these techniques.
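The run-time compilation technique the abstract refers to is commonly realized as an inspector/executor pattern: an inspector scans the irregular subscripts once to build a reusable communication schedule, and the executor then performs the gather and the loop on every invocation. The following is a minimal illustrative sketch of that idea (all names and data are hypothetical, not from the paper, and the two "processes" and the `fetch` callback stand in for real distributed-memory communication):

```python
def build_inspector(ia, lo, hi):
    """Inspector: scan the indirection array once to find off-process
    subscripts and record them as a reusable communication schedule."""
    return sorted({j for j in ia if not (lo <= j < hi)})

def executor(x_local, ia, lo, hi, schedule, fetch):
    """Executor: gather remote values per the schedule (one
    communication step), then run the irregular loop. The schedule is
    reused across many executor invocations, amortizing inspection."""
    ghost = {j: fetch(j) for j in schedule}   # simulated remote gather
    return [x_local[j - lo] if lo <= j < hi else ghost[j] for j in ia]

# Toy setup: a global array of 8 elements, this "process" owns x[0:4].
x = [i * i for i in range(8)]
lo, hi = 0, 4
x_local = x[lo:hi]
ia = [1, 6, 3, 5, 6]                 # irregular subscripts x[ia[i]]

sched = build_inspector(ia, lo, hi)  # remote indices needed: [5, 6]
vals = executor(x_local, ia, lo, hi, sched, lambda j: x[j])
print(sched)  # [5, 6]
print(vals)   # [1, 36, 9, 25, 36]
```

The payoff the abstract's performance measures capture is visible even in this sketch: `build_inspector` runs once, while `executor` may run for many iterations against the same schedule, so the inspection cost is amortized over the number of executor invocations.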