Compiler algorithms for optimizing locality and parallelism on shared and distributed-memory machines

Citation
M. Kandemir et al., Compiler algorithms for optimizing locality and parallelism on shared and distributed-memory machines, J PAR DISTR, 60(8), 2000, pp. 924-965
Citations number
51
Subject Categories
Computer Science & Engineering
Journal title
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
ISSN journal
0743-7315
Volume
60
Issue
8
Year of publication
2000
Pages
924 - 965
Database
ISI
SICI code
0743-7315(200008)60:8<924:CAFOLA>2.0.ZU;2-7
Abstract
Distributed-memory message-passing machines deliver scalable performance but are difficult to program. Shared-memory machines, on the other hand, are easier to program, but obtaining scalable performance with a large number of processors is difficult. Recently, scalable machines based on logically shared, physically distributed memory have been designed and implemented. While some performance issues, such as parallelism and locality, are common to different parallel architectures, issues such as data distribution are unique to specific architectures. One of the most important challenges compiler writers face is the design of compilation techniques that work well on a variety of architectures. In this paper, we propose an algorithm that can be employed by optimizing compilers for different types of parallel architectures. Our optimization algorithm does the following: (1) transforms loop nests such that, where possible, the iterations of the outermost loops can be run in parallel across processors; (2) optimizes memory locality by carefully distributing each array across processors; (3) optimizes interprocessor communication using message vectorization whenever possible; and (4) optimizes cache locality by assigning an appropriate storage layout to each array and by transforming the iteration space. Depending on the machine architecture, some or all of these steps can be applied in a unified framework. We report empirical results on an SGI Origin 2000 distributed-shared-memory multiprocessor and an IBM SP-2 distributed-memory message-passing machine to validate the effectiveness of our approach. (C) 2000 Academic Press.
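To illustrate the kind of transformation step (4) refers to, the sketch below shows the classic loop-interchange example: in C's row-major layout, making the column index the inner loop touches a new cache line on almost every iteration, while interchanging the loops makes the inner loop stride-1. This is only a minimal hand-written illustration of the general idea; the function names and array size are invented here and do not come from the paper's algorithm.

```c
#include <stdio.h>

#define N 1024

/* Poor locality: in C's row-major layout the inner loop walks down a
 * column of `a`, so consecutive iterations are N doubles apart and
 * tend to touch a different cache line each time. */
double sum_column_major(double a[N][N]) {
    double s = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];
    return s;
}

/* After loop interchange: the inner loop is stride-1, so consecutive
 * iterations reuse the cache line fetched for a[i][j]. The result is
 * identical; only the traversal order (and cache behavior) changes. */
double sum_row_major(double a[N][N]) {
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];
    return s;
}
```

Interchange is legal here because the two loops carry no data dependences; a compiler applying the paper's step (4) would also consider changing the array's storage layout instead, which achieves the same stride-1 access without reordering the loops.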