AN OPTIMAL RETRY POLICY BASED ON FAULT CLASSIFICATION

Authors

Lin, T. H.; Shin, K. G.

Citation

T. H. Lin and K. G. Shin, An Optimal Retry Policy Based on Fault Classification, IEEE Transactions on Computers, 43(9), 1994, pp. 1014-1025

Subject categories

Computer Sciences; Engineering, Electrical & Electronic; Computer Science, Hardware & Architecture

Journal title

IEEE Transactions on Computers

ISSN journal

0018-9340

Volume

43

Issue

9

Year of publication

1994

Pages

1014 - 1025

Database

ISI

SICI code

0018-9340(1994)43:9<1014:AORPOF>2.0.ZU;2-H

Abstract

An optimal (in some sense) retry policy in a computer system is usually derived under an unrealistic assumption that fault characteristics are known a priori and remain unchanged throughout the mission lifetime. In such a case, the optimal retry period depends only upon the system's status at the time of fault detection. We propose to remedy this deficiency by formulating the optimal retry problem as a Bayesian decision problem where not only the time of fault detection but also the results of earlier retries are used to estimate the current fault characteristics. Previous knowledge about fault characteristics is represented by the prior distributions of fault-related parameters, which are updated whenever new samples are obtained from retry and detection mechanisms. A new fault classification scheme is proposed to assign a temporal fault type (i.e., permanent, intermittent, or transient) to each detected fault so that the corresponding fault parameters can be estimated. The estimated fault parameters are then used to derive the optimal retry period that minimizes the mean task completion time. Efficient algorithms are developed to determine the optimal retry period on-line upon detection of each fault. To evaluate the goodness of the proposed retry policy, it is compared with, and is always found to outperform, a number of fixed-retry-period policies.
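
The Bayesian idea in the abstract can be illustrated with a small sketch. This is not the model from the paper, whose details this record does not give. It is a toy under loudly stated assumptions: transient-fault durations are exponential with an unknown rate, the prior on that rate is a Gamma distribution (a conjugate choice made here for convenience), a failed retry costs the retry period plus a fixed restart cost, and the retry period is picked from a list of candidates. All function and parameter names (`update_gamma`, `expected_cost`, `best_retry_period`, `p_transient`, `restart_cost`) are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def update_gamma(shape: float, rate: float,
                 durations: Iterable[float]) -> Tuple[float, float]:
    """Conjugate update of a Gamma(shape, rate) prior on the rate of an
    exponential fault-duration model (an assumption of this sketch)."""
    observed = list(durations)
    return shape + len(observed), rate + sum(observed)


def expected_cost(r: float, shape: float, rate: float,
                  p_transient: float, restart_cost: float) -> float:
    """Expected time lost when retrying for at most r time units.

    Assumed cost model: a transient fault that vanishes at time D <= r
    costs D; otherwise the retry fails and costs r + restart_cost.
    Under the Gamma posterior, D has survival (rate / (rate + t)) ** shape.
    """
    if shape <= 1:
        raise ValueError("closed form below needs shape > 1")
    survive = (rate / (rate + r)) ** shape
    # E[min(D, r)] is the integral of the survival function from 0 to r.
    mean_min = rate / (shape - 1) * (1 - (rate / (rate + r)) ** (shape - 1))
    transient_cost = mean_min + restart_cost * survive
    non_transient_cost = r + restart_cost  # retry can never succeed
    return (p_transient * transient_cost
            + (1 - p_transient) * non_transient_cost)


def best_retry_period(candidates: Sequence[float], shape: float, rate: float,
                      p_transient: float, restart_cost: float) -> float:
    """Pick the candidate retry period with the lowest expected cost."""
    return min(candidates,
               key=lambda r: expected_cost(r, shape, rate,
                                           p_transient, restart_cost))


if __name__ == "__main__":
    # Prior Gamma(2, 1), then two observed transient durations.
    shape, rate = update_gamma(2.0, 1.0, [0.5, 1.5])
    grid = [0.0, 0.25, 0.5, 1.0, 2.0, 4.0]
    print(best_retry_period(grid, shape, rate,
                            p_transient=0.7, restart_cost=2.0))
```

In this sketch, each newly observed fault duration sharpens the posterior, so the chosen retry period can change from one detection to the next, which is the contrast with a fixed-retry-period policy that the abstract draws.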