AN OPTIMAL RETRY POLICY BASED ON FAULT CLASSIFICATION

Authors
Citation
Th. Lin and Kg. Shin, AN OPTIMAL RETRY POLICY BASED ON FAULT CLASSIFICATION, IEEE Transactions on Computers, 43(9), 1994, pp. 1014-1025
Number of citations
15
Subject Categories
Computer Sciences; Engineering, Electrical & Electronic; Computer Science Hardware & Architecture
ISSN journal
0018-9340
Volume
43
Issue
9
Year of publication
1994
Pages
1014 - 1025
Database
ISI
SICI code
0018-9340(1994)43:9<1014:AORPOF>2.0.ZU;2-H
Abstract
An optimal (in some sense) retry policy in a computer system is usually derived under an unrealistic assumption that fault characteristics are known a priori and remain unchanged throughout the mission lifetime. In such a case, the optimal retry period depends only upon the system's status at the time of fault detection. We propose to remedy this deficiency by formulating the optimal retry problem as a Bayesian decision problem where not only the time of fault detection but also the results of earlier retries are used to estimate the current fault characteristics. Previous knowledge about fault characteristics is represented by the prior distributions of fault-related parameters which are updated whenever new samples are obtained from retry and detection mechanisms. A new fault classification scheme is proposed to assign a temporal fault type (i.e., permanent, intermittent, or transient) to each detected fault so that the corresponding fault parameters can be estimated. The estimated fault parameters are then used to derive the optimal retry period that minimizes the mean task completion time. Efficient algorithms are developed to determine the optimal retry period on-line upon detection of each fault. To evaluate the goodness of the proposed retry policy, it is compared with, and is always found to outperform, a number of fixed-retry-period policies.
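Illustrative note (not part of the original record): the abstract's idea of updating beliefs about the fault type from retry results can be sketched with a toy Bayes update. This is not the paper's model. The function name `update_fault_posterior`, the exponential "fault disappears at a constant rate" likelihood, and all numeric rates below are assumptions chosen only to show the mechanics of turning a prior over fault types into a posterior after one unsuccessful retry.

```python
import math


def update_fault_posterior(prior, rates, retry_time):
    """Toy Bayes update after one retry of length retry_time that FAILED.

    prior: dict mapping fault type -> prior probability.
    rates: dict mapping fault type -> assumed (hypothetical) rate at which
           a fault of that type disappears; 0.0 means it never disappears.
    Assumed likelihood that the fault is still present after retry_time:
        exp(-rate * retry_time)
    """
    unnormalized = {
        kind: prior[kind] * math.exp(-rates[kind] * retry_time)
        for kind in prior
    }
    total = sum(unnormalized.values())
    return {kind: weight / total for kind, weight in unnormalized.items()}


# Hypothetical inputs: a uniform prior and made-up disappearance rates.
prior = {"permanent": 1 / 3, "intermittent": 1 / 3, "transient": 1 / 3}
rates = {"permanent": 0.0, "intermittent": 0.5, "transient": 2.0}
posterior = update_fault_posterior(prior, rates, retry_time=1.0)
```

Under these assumed rates, a failed retry shifts probability mass toward the permanent type, which is the intuition behind using earlier retry results rather than a fixed retry period.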