Synthesizing efficient out-of-core programs for block recursive algorithmsusing block-cyclic data distributions

Citation
Zy. Li et al., Synthesizing efficient out-of-core programs for block recursive algorithmsusing block-cyclic data distributions, IEEE PARALL, 10(3), 1999, pp. 297-315
Citations number
23
Categorie Soggetti
Computer Science & Engineering
Journal title
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
ISSN journal
10459219 → ACNP
Volume
10
Issue
3
Year of publication
1999
Pages
297 - 315
Database
ISI
SICI code
1045-9219(199903)10:3<297:SEOPFB>2.0.ZU;2-5
Abstract
In this paper, we present a framework for synthesizing I/O efficient out-of -core programs for block recursive algorithms, such as the fast Fourier tra nsform (FFT) and block matrix transposition algorithms. Our framework uses an algebraic representation which is based on tensor products and other mat rix operations. The programs are optimized for the striped Vitter and Shriv er's two-level memory model in which data can be distributed using various cyclic(B) distributions in contrast to the normally used physical track dis tribution cyclic(B-d), where B-d is the physical disk block size. We first introduce tensor bases to capture the semantics of block-cyclic data distri butions of out-of-core data and also data access patterns to out-of-core da ta. We then present program generation techniques for tensor products and m atrix transposition. We accurately represent the number of parallel I/O ope rations required for the synthesized programs for tensor products and matri x transposition as a function of tensor bases and data distributions. We in troduce an algorithm to determine the data distribution which optimizes the performance of the synthesized programs. Further, we formalize the procedu re of synthesizing efficient out-of-core programs for tensor product formul as with various block-cyclic distributions as a dynamic programming problem . We demonstrate the effectiveness of our approach through several examples . We show that the choice of an appropriate data distribution can reduce th e number of passes to access out-of-core data by as large as eight times fo r a tensor product and the dynamic programming approach can largely reduce the number of passes to access out-of-core data for the overall tensor prod uct formulas.