Cq. Yang and Ys. Qu, RELIABILITY-ANALYSIS FOR THE EXECUTION OF REMOTE JOBS IN A WORKSTATION-BASED ENVIRONMENT, Computer systems science and engineering, 10(2), 1995, pp. 120-128
Number of citations
17
Subject Categories
System Science; Computer Application, Chemistry & Engineering; Computer Sciences, Special Topics; Computer Science Theory & Methods
Many workstation-based distributed systems allow programs to be
executed on remote machines for effective utilization of system
resources. Usually, the control policies in these systems force a
remote job to be discontinued when local jobs arrive, to guarantee the
autonomy of each workstation. Therefore, one special concern in the
design of such systems is the fault tolerance of remote job execution.
In this paper, we discuss two control policies of workstation-based
distributed systems, the checkpointing and non-checkpointing policies,
which support fault-tolerant execution of remote jobs on idle
workstations. An analysis of the reliability and mean turnaround time
of remote job execution is conducted for both control policies. The
optimal time interval between checkpoints in the checkpointing policy
is formulated based on the given reliability and overhead of the
system. In addition, several sample results derived from these
analyses are compared with the outcome of corresponding simulation
programs. Some observations on the fault-tolerant features of each
control policy are then presented as guidelines for the future
development of such workstation-based distributed systems.
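
The contrast between the two policies can be illustrated with a small Monte Carlo sketch. It is not the model from the paper: it assumes that local jobs arrive as a Poisson process, that an interrupted remote job resumes immediately on an idle workstation, and that a checkpoint is written at the end of every segment, including the last one. The function name `turnaround` and the parameters `work`, `rate`, `interval`, and `overhead` are hypothetical and chosen only for this illustration.

```python
import random


def turnaround(work, rate, interval=None, overhead=0.0, rng=random):
    """Elapsed time to finish `work` units of a remote job.

    Assumptions (illustrative, not taken from the paper):
    - local jobs interrupt the remote job as a Poisson process with `rate`;
    - after an interruption the job resumes at once, with no downtime;
    - interval=None means no checkpointing: restart from scratch;
    - otherwise a checkpoint costing `overhead` ends every `interval` units.
    """
    segment = work if interval is None else interval
    ckpt_cost = 0.0 if interval is None else overhead
    elapsed, done = 0.0, 0.0
    while done < work:
        chunk = min(segment, work - done)
        cost = chunk + ckpt_cost          # work plus the checkpoint write
        gap = rng.expovariate(rate)       # time until the next local job
        if gap >= cost:
            elapsed += cost               # segment completed and saved
            done += chunk
        else:
            elapsed += gap                # segment lost, retry it
    return elapsed


def mean_turnaround(samples, seed, **kwargs):
    """Average turnaround over `samples` independent runs."""
    rng = random.Random(seed)
    return sum(turnaround(rng=rng, **kwargs) for _ in range(samples)) / samples


if __name__ == "__main__":
    base = dict(work=10.0, rate=0.2)
    print("no checkpoints:", mean_turnaround(2000, 0, **base))
    for tau in (1.0, 2.0, 5.0):
        avg = mean_turnaround(2000, 0, interval=tau, overhead=0.1, **base)
        print("interval", tau, "->", avg)
```

Sweeping `interval` in this way shows the trade-off the abstract refers to: short intervals pay the checkpoint overhead often, while long intervals lose more work per interruption. Because the exponential distribution is memoryless, drawing a fresh gap for each attempt is equivalent to tracking one continuous arrival process.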