Robust parallel computations through randomization

Citation
Sc. Kontogiannis et al., Robust parallel computations through randomization, THEOR C SYS, 33(5-6), 2000, pp. 427-464
Citations number
34
Categorie Soggetti
Computer Science & Engineering
Journal title
THEORY OF COMPUTING SYSTEMS
ISSN journal
14324350 → ACNP
Volume
33
Issue
5-6
Year of publication
2000
Pages
427 - 464
Database
ISI
SICI code
1432-4350(200009/12)33:5-6<427:RPCTR>2.0.ZU;2-F
Abstract
In this paper we present an efficient general simulation strategy for compu tations designed for fully operational BSP machines of n ideal processors, on n-processor dynamic-fault-prone BSP machines. The fault occurrences are fail-stop and fully dynamic, i.e., they are allowed to happen on-line at an y point of the computation, subject to the constraint that the total number of faulty processors may never exceed a known fraction. The computational paradigm can be exploited for robust computations over virtual parallel set tings with a volatile underlying infrastructure, such as a NETWORK OF WORKS TATIONS (where workstations may be taken out of the virtual parallel machin e by their owner). Our simulation strategy is Las Vegas (i.e., it may never fail, due to backt racking operations to robustly stored instances of the computation, in case of locally unrecoverable situations). It adopts an adaptive balancing sche me of the workload among the currently live processors of the BSP machine. Our strategy is efficient in the sense that, compared with an optimal off-l ine adversarial computation under the same sequence of fault occurrences, i t achieves an O((log n . log log n)(2)) multiplicative factor times the opt imal work (namely, this measure is in the sense of the "competitive ratio" of on-line analysis). In addition, our scheme is modular, integrated, and c onsiders many implementation points. We comment that, to our knowledge, no previous work on robust parallel comp utations has considered fully dynamic faults in the ssp model, or in genera l distributed memory systems. Furthermore, this is the first time an effici ent Las Vegas simulation in this area is achieved.