M. Ahuja and S. Mishra, Units of Computation in Fault-Tolerant Distributed Systems, Journal of Parallel and Distributed Computing, 40(2), 1997, pp. 194-209
Number of citations
30
Subject Categories
Computer Sciences; Computer Science Theory & Methods
We develop a framework that helps in understanding a fault-tolerant
distributed system and so aids in designing such systems. We illustrate
the uses of the developed work in application areas such as checkpointing
and recovery, phase termination detection, stable property detection,
implementing membership protocols, debugging, and design of programming
languages. We define a unit of computation, and refer to it as a molecule.
A molecule has a well-defined interface with other molecules. The smallest
such unit, an indivisible molecule, is termed an atom. We show that any
execution of a fault-tolerant distributed computation can be seen as an
execution of molecules/atoms in a partial order, and such a view provides
insights into understanding the computation, particularly for a
fault-tolerant system where it is important to guarantee that a unit of
computation is either completely executed or not at all, and system
designers need to reason about the states after execution of such units.
Molecules are essentially a generalization of atomic actions.
(C) 1997 Academic Press.