Multitolerance in distributed reset

Citation
S.S. Kulkarni and A. Arora, Multitolerance in distributed reset, CH J THEOR, (4), 1998, pp. 1-46
Number of citations
20
Subject Categories
Computer Science & Engineering
Journal title
CHICAGO JOURNAL OF THEORETICAL COMPUTER SCIENCE
Journal ISSN
1073-0486
Issue
4
Year of publication
1998
Pages
1 - 46
Database
ISI
SICI code
1073-0486(199812):4<1:MIDR>2.0.ZU;2-U
Abstract
A reset of a distributed system is safe if it does not complete "prematurely," i.e., without having reset some process in the system. Safe resets are possible in the presence of certain faults, such as process fail-stops and repairs, but are not always possible in the presence of more general faults, such as arbitrary transients. In this paper, we design a bounded-memory distributed-reset program that possesses two tolerances: (1) in the presence of fail-stops and repairs, it always executes resets safely, and (2) in the presence of a finite number of transient faults, it eventually executes resets safely. Designing this multitolerance in the reset program introduces the novel concern of designing a safety detector that is itself multitolerant. A broad application of our multitolerant safety detector is to make any total program likewise multitolerant.