The efficient computation of ownership sets in HPF

Citation
Pg. Joisha et P. Banerjee, The efficient computation of ownership sets in HPF, IEEE PARALL, 12(8), 2001, pp. 769-788
Citations number
15
Categorie Soggetti
Computer Science & Engineering
Journal title
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
ISSN journal
10459219 → ACNP
Volume
12
Issue
8
Year of publication
2001
Pages
769 - 788
Database
ISI
SICI code
1045-9219(200108)12:8<769:TECOOS>2.0.ZU;2-Z
Abstract
Ownership sets are fundamental to the partitioning of program computations across processors by the owner-computes rule. These sets arise due to the m apping of arrays onto processors. In this paper, we focus on how ownership sets can be efficiently determined in the context of the HPF language and s how how the structure of these sets can be symbolically characterized in th e presence of arbitrary array alignment and array distribution directives. Our starting point is a system of equalities and inequalities due to Ancour t et al. [1] that captures the array mapping problem in HPF. We arrive at a refined system that enables us to efficiently solve for the ownership set using the Fourier-Motzkin Elimination technique and that requires the cours e vector as the only auxiliary vector. The formulation makes it possible to enumerate the elements of the ownership set exactly once, a feature that i s very beneficial when such sets are applied to handle Do loops qualified b y HPF's INDEPENDENT directive. We develop important and general properties pertaining to HPF alignments and distributions and show how they can be use d to eliminate redundant communication due to array replication. Polynomial -time schemes that determine whether the ownership set of a particular proc essor, with respect to some array, is the empty set or whether the ownershi p set of every processor, with respect to some array, is the empty set, are presented. We show how distribution directives with unspecified processor meshes can be efficiently handled at compile time. We also show how to avoi d the generation of communication code when pairs of array references are u ltimately mapped onto the same processors. Experimental data demonstrating the improved code performance that the latter optimization enables is prese nted and discussed.