Efficient algorithms for block-cyclic array redistribution between processor sets

Citation
N. Park et al., Efficient algorithms for block-cyclic array redistribution between processor sets, IEEE PARALL, 10(12), 1999, pp. 1217-1240
Citations number
25
Categorie Soggetti
Computer Science & Engineering
Journal title
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
ISSN journal
10459219 → ACNP
Volume
10
Issue
12
Year of publication
1999
Pages
1217 - 1240
Database
ISI
SICI code
1045-9219(199912)10:12<1217:EAFBAR>2.0.ZU;2-J
Abstract
Run-time array redistribution is necessary to enhance the performance of pa rallel programs on distributed memory supercomputers. In this paper, we pre sent an efficient algorithm for array redistribution from cyclic(x) on P pr ocessors to cyclic(Kx) on Q processors. The algorithm reduces the overall t ime for communication by considering the data transfer, communication sched ule, and index computation costs. The proposed algorithm is based on a gene ralized circulant matrix formalism. Our algorithm generates a schedule that minimizes the number of communication steps and eliminates node contention in each communication step. The network bandwidth is fully utilized by ens uring that equal-sized messages are transferred in each communication step. Furthermore, the time to compute the schedule and the index sets is signif icantly smaller. It takes O(maz(P, Q)) time and is less than 1 percent of t he data transfer time. In comparison, the schedule computation time using t he state-of-the-art scheme (which is based on the bipartite matching scheme ) is 10 to 50 percent of the data transfer time for similar problem sizes. Therefore, our proposed algorithm is suitable for run-time array redistribu tion. To evaluate the performance of our scheme, we have implemented the al gorithm using C and MPI on an IBM SP2. Results show that our algorithm perf orms better than the previous algorithms with respect to the total redistri bution time, which includes the time for data transfer. schedule, and index computation.