Minimizing communication penalty of triangular solvers by runtime mesh configuration and workload redistribution

Authors
Citation
Dq. Wang et E. Chu, Minimizing communication penalty of triangular solvers by runtime mesh configuration and workload redistribution, J SUPERCOMP, 14(1), 1999, pp. 77-95
Citations number
10
Categorie Soggetti
Computer Science & Engineering
Journal title
JOURNAL OF SUPERCOMPUTING
ISSN journal
09208542 → ACNP
Volume
14
Issue
1
Year of publication
1999
Pages
77 - 95
Database
ISI
SICI code
0920-8542(199907)14:1<77:MCPOTS>2.0.ZU;2-I
Abstract
In this article, we study the effects of network topology and load balancin g on the performance of a new parallel algorithm for solving triangular sys tems of linear equations on distributed-memory message-passing multiprocess ors. The proposed algorithm employs novel runtime data mapping and workload redistribution methods on a communication network which is configured as a toroidal mesh. A fully parameterized theoretical model is used to predict communication behaviors of the proposed algorithm relevant to load balancin g, and the analytical performance results correctly determine the optimal d imensions of the toroidal mesh, which vary with the problem size, the numbe r of available processors, and the hardware parameters of the machine. Furt her enhancement to the proposed algorithm is then achieved through redistri buting the arithmetic workload at runtime. Our FORTRAN implementation of th e proposed algorithm as well as its enhanced version has been tested on an Intel iPSC/2 hypercube, and the same code is also suitable for executing th e algorithm on the iPSC/860 hypercube and the Intel Paragon mesh multiproce ssor. The actual timing results support our theoretical findings, and they both confirm the very significant impact a network topology chosen at runti me can have on the computational load distribution, the communication behav iors and the overall performance of parallel algorithms.