
USING MULTISTAGE AND STRATIFIED SAMPLING FOR INFERRING FAULT-COVERAGE PROBABILITIES

Authors

CONSTANTINESCU C

Citation

C. Constantinescu, USING MULTISTAGE AND STRATIFIED SAMPLING FOR INFERRING FAULT-COVERAGE PROBABILITIES, IEEE transactions on reliability, 44(4), 1995, pp. 632-639

Subject categories

Computer Sciences; Engineering, Electrical & Electronic; Computer Science Hardware & Architecture; Computer Science Software Graphics Programming

Journal title

IEEE transactions on reliability

ISSN journal

0018-9529

Volume

44

Issue

4

Year of publication

1995

Pages

632 - 639

Database

ISI

SICI code

0018-9529(1995)44:4<632:UMASSF>2.0.ZU;2-B

Abstract

Development of fault-tolerant computing systems requires accurate reliability modeling. Analytic, simulation, and hybrid models are commonly used for obtaining reliability measures. These measures are functions of component failure rates and fault-coverage probabilities. Coverage provides information about the fault & error detection, isolation, and system recovery capabilities. This parameter can be derived by physical or simulated fault injection. Unfortunately, the complexity of modern computing systems makes exhaustive testing intractable. As a consequence, statistical inference has been used to extract meaningful information from sample observation. The problem of conducting fault injection experiments and statistically inferring the coverage from the information gathered in those experiments is addressed in this paper. The methods previously used for estimating the coverage considered only a few factors which influence the coverage. By contrast, we perform statistical experiments in a multi-dimensional space of events. In this way all major factors which influence the coverage (fault locations, timing characteristics of the fault, and the workload) are accounted for. For process control computers, the combination of input values and fault occurrence times provides information about the workload which is executed. Multi-stage, stratified, and combined multi-stage & stratified sampling are used in this paper for deriving the coverage. Equations of the mean, variance, and confidence interval of the coverage are provided. The statistical error produced by the injected faults which do not induce errors in the tested system (also known as the nonresponse problem) is considered. A program which emulates a typical fault environment was developed and four hypothetical systems are analyzed. These systems are characterized by coverages in the 0.90 - 0.9999 range and a 10^12 fault space.
The confidence intervals of the coverage are derived and checked against known true values. The main advantages of this approach are: (1) Fault injection is performed in a multidimensional space of events, and accounts for all major factors which affect the coverage: fault location, timing characteristics of the fault, and system workload. (2) Randomness, which characterizes the fault occurrence and error propagation in a real computer, is preserved throughout the fault injection experiment. (3) Coverage estimators are provided in a general form. Thus the same equations can be used for various applications by choosing the proper number of stages and strata. (4) The method applies both for physical & simulated fault injection. The assumption of normality is the main limitation of the method. However, experiments performed for various dimensions of the fault space and values of the coverage and reported by some researchers have confirmed the adequacy of this assumption.
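
As an illustration of the kind of estimate the abstract describes, the following is a minimal sketch of a stratified coverage estimator with a normal-approximation confidence interval. It uses the textbook stratified-proportion formulas, not the paper's own equations, which this record does not reproduce. The function name `stratified_coverage`, the stratum weights, and the sample counts are all hypothetical. The sketch also assumes that the fault space is large enough to ignore the finite-population correction and that every injected fault produces a response, so the nonresponse problem is not modeled.

```python
import math
from statistics import NormalDist


def stratified_coverage(weights, samples, covered, confidence=0.95):
    """Estimate coverage from stratified fault-injection results.

    weights[h] : assumed share of the fault space in stratum h (sums to 1)
    samples[h] : number of faults injected in stratum h
    covered[h] : number of those faults that were handled correctly
    Returns (estimate, variance, (low, high)).
    """
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("stratum weights must sum to 1")

    estimate = 0.0
    variance = 0.0
    for w, n, d in zip(weights, samples, covered):
        if n < 2:
            raise ValueError("each stratum needs at least two injections")
        p = d / n                                    # per-stratum coverage
        estimate += w * p                            # weighted mean
        variance += w * w * p * (1.0 - p) / (n - 1)  # no finite-population term

    # Two-sided interval under the normality assumption named in the abstract.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * math.sqrt(variance)
    low = max(0.0, estimate - half_width)
    high = min(1.0, estimate + half_width)
    return estimate, variance, (low, high)


if __name__ == "__main__":
    # Hypothetical experiment: three strata (for example, three fault locations).
    est, var, (low, high) = stratified_coverage(
        weights=[0.2, 0.3, 0.5],
        samples=[400, 400, 400],
        covered=[392, 380, 399],
    )
    print(f"coverage = {est:.4f}, 95% CI = [{low:.4f}, {high:.4f}]")
```

A stratum in which every injected fault is covered contributes zero to the variance in this sketch, which is one reason the normal approximation can be optimistic for coverages very close to 1.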