Achieving robustness and minimizing overhead in parallel algorithms through overlapped communication/computation

Citation
A.K. Somani and A.M. Sansano, Achieving robustness and minimizing overhead in parallel algorithms through overlapped communication/computation, Journal of Supercomputing, 16(1), 2000, pp. 27-52
Number of citations
22
Subject categories
Computer Science & Engineering
Journal title
JOURNAL OF SUPERCOMPUTING
Journal ISSN
0920-8542
Volume
16
Issue
1
Year of publication
2000
Pages
27 - 52
Database
ISI
SICI code
0920-8542(200005)16:1<27:ARAMOI>2.0.ZU;2-K
Abstract
One of the major goals in the design of parallel processing machines and algorithms is to achieve robustness and reduce the effects of the overhead introduced when a given problem is parallelized or a fault occurs. A key contributor to overhead is communication time, in particular when a node is faulty and another node is substituting for its operation. Many architectures try to reduce this overhead by minimizing the actual time for a communication to occur, including latency and bandwidth figures. Another approach is to hide communication by overlapping it with computation, assuming that the computation is the most prominent factor. This paper presents the mechanisms provided in the Proteus parallel computer and its effective use of communication hiding through overlapping communication/computation techniques with and without the presence of a fault. These techniques are easily extended for use in compiler support of parallel programming. We also address the complexity (or rather simplicity) in achieving complete exchange on the Proteus Machine.
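
Illustrative note (not part of the original record): the general idea of hiding communication behind computation can be sketched as a simple prefetch loop. This is a minimal sketch under stated assumptions, not the Proteus mechanism described in the paper. The names receive_block, compute, run_sequential and run_overlapped are hypothetical, and a short sleep stands in for a real message transfer.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def receive_block(i, delay=0.01):
    """Hypothetical stand-in for receiving block i from another node."""
    time.sleep(delay)  # assumed communication latency
    return list(range(i * 4, i * 4 + 4))


def compute(block):
    """Hypothetical stand-in for the local computation on one block."""
    return sum(x * x for x in block)


def run_sequential(n_blocks):
    """Baseline: each receive finishes before its computation starts."""
    return [compute(receive_block(i)) for i in range(n_blocks)]


def run_overlapped(n_blocks):
    """Request block i+1 in the background while computing on block i."""
    results = []
    if n_blocks == 0:
        return results
    with ThreadPoolExecutor(max_workers=1) as comm:
        pending = comm.submit(receive_block, 0)
        for i in range(n_blocks):
            block = pending.result()  # wait only if the transfer is not done
            if i + 1 < n_blocks:
                pending = comm.submit(receive_block, i + 1)
            results.append(compute(block))  # overlaps with the next receive
    return results
```

Both functions return the same values; only the scheduling differs. In the overlapped version the wait for each block shrinks toward zero once computation per block takes at least as long as the transfer, which matches the abstract's assumption that computation is the most prominent factor.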