N. H. Vaidya and D. K. Pradhan, Fault-Tolerant Design Strategies for High Reliability and Safety, IEEE Transactions on Computers, 42(10), 1993, pp. 1195-1206
Critical applications require systems with high reliability and safety. Reliability is the probability that the system produces correct output. Safety is defined as the probability that the system output is either correct or contains a detectable error (the assumption being that the system is safe whenever the error is detected). Equivalently, safety is one minus the probability of an undetected error, so a system with high safety rarely produces undetected errors.
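The two definitions can be made concrete by classifying every run of the system into one of three mutually exclusive outcomes: correct output, detected error, or undetected error. The sketch below is a minimal illustration under that assumption; the class name `OutcomeDistribution` and the example probabilities are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class OutcomeDistribution:
    """Probabilities of the three mutually exclusive outcomes of one system run.

    Hypothetical helper for illustration only.
    """
    correct: float      # output is correct
    detected: float     # output is wrong, but the error is flagged
    undetected: float   # output is wrong and nothing flags it

    def __post_init__(self) -> None:
        # The three outcomes are exhaustive, so their probabilities must sum to 1.
        total = self.correct + self.detected + self.undetected
        if not math.isclose(total, 1.0, abs_tol=1e-9):
            raise ValueError(f"outcome probabilities sum to {total}, not 1")

    @property
    def reliability(self) -> float:
        # Reliability: probability that the output is correct.
        return self.correct

    @property
    def safety(self) -> float:
        # Safety: output is correct OR its error is detected,
        # i.e. one minus the probability of an undetected error.
        return self.correct + self.detected


# Hypothetical numbers, chosen only to show that safety >= reliability.
example = OutcomeDistribution(correct=0.90, detected=0.08, undetected=0.02)
print(f"R = {example.reliability:.2f}, S = {example.safety:.2f}")
```

Because detected errors count toward safety but not toward reliability, safety can never be lower than reliability; the gap between them is exactly the probability of a detected error.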
In this paper, several fundamental results related to reliability and safety are analyzed for modular redundant systems consisting of multiple identical modules and an arbiter. It is shown that, for a given level of redundancy, a large number of implementation alternatives exist with varying degrees of reliability and safety, and strategies are formulated that achieve a maximal combination of the two. The effect of increasing the number of modules on system reliability and safety is also analyzed. It is shown that once safety is considered in addition to reliability, simply adding modules to the system does not necessarily help: increasing the number of modules by just one does not always improve both reliability and safety. When the outputs of the individual modules carry no redundant information (e.g., coding for error detection), at least two additional modules are required to improve reliability and safety simultaneously.
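The claim that one extra module cannot always improve both measures can be checked on a simplified model. The sketch below is an illustration under stated assumptions, not the paper's own model or arbiter: modules fail independently and are fault-free with probability `p`; outputs carry no redundant information and, in the worst case, all faulty modules agree on the same wrong value; a hypothetical threshold arbiter emits a value only if at least `t` modules agree on it and otherwise signals an error. The function names `threshold_arbiter` and `dominating_designs` are hypothetical. Varying `t` for a fixed number of modules `n` also shows how a single level of redundancy admits several designs with different reliability/safety trade-offs.

```python
from math import comb


def prob_at_least(n: int, k_min: int, p: float) -> float:
    """P(at least k_min of n independent modules are fault-free), each with prob p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(max(k_min, 0), n + 1))


def threshold_arbiter(n: int, t: int, p: float) -> tuple[float, float]:
    """(reliability, safety) of n modules behind a hypothetical threshold arbiter.

    Assumed model: independent failures, binary outputs with no redundant
    information, and all faulty modules emitting the same wrong value.
    The arbiter outputs a value only if >= t modules agree on it; t > n/2
    keeps the decision unambiguous.
    """
    if not n // 2 < t <= n:
        raise ValueError("need n/2 < t <= n")
    reliability = prob_at_least(n, t, p)      # at least t correct modules agree
    safety = prob_at_least(n, n - t + 1, p)   # fewer than t faulty modules agree
    return reliability, safety


def dominating_designs(n: int, t: int, extra: int, p: float) -> list[int]:
    """Thresholds for n + extra modules that beat (n, t) in BOTH measures."""
    base = threshold_arbiter(n, t, p)
    m = n + extra
    return [
        t2 for t2 in range(m // 2 + 1, m + 1)
        if all(new > old for new, old in zip(threshold_arbiter(m, t2, p), base))
    ]


p = 0.99  # hypothetical per-module probability of being fault-free
for t in (2, 3):
    r, s = threshold_arbiter(3, t, p)
    print(f"n=3, t={t}: R={r:.6f} S={s:.6f}")
    print("  4-module thresholds beating it in both:", dominating_designs(3, t, 1, p))
    print("  5-module thresholds beating it in both:", dominating_designs(3, t, 2, p))
```

For `p = 0.99`, no four-module design beats either three-module design in both measures, while a five-module design does (threshold 3 against the majority-voting base `t = 2`, threshold 4 against the unanimity base `t = 3`). Under these assumptions, reliability survives up to `n - t` faulty modules and safety survives up to `t - 1`, and these two tolerances always sum to `n - 1`, so one extra module can raise only one of them.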
However, it is shown that if the modules themselves have built-in error detection capability, the addition of just one module may be sufficient to improve both reliability and safety.
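The effect of built-in error detection can be explored with a small exhaustive enumeration. This is a minimal sketch under hypothetical assumptions, not the paper's model: each module is fault-free with probability `1 - q`; a faulty module raises its own error flag with probability `c` (its coverage) and otherwise silently emits a wrong value; all silent faulty modules emit the same wrong value (the worst case). The hypothetical arbiter discards flagged modules and emits a value only if the remaining modules are non-empty and unanimous, otherwise it signals an error. The names `self_checking_system`, `q`, and `c` are illustrative.

```python
from itertools import product
from math import prod

CORRECT, FLAGGED, SILENT = "correct", "flagged", "silent"


def self_checking_system(n: int, q: float, c: float) -> tuple[float, float]:
    """(reliability, safety) of n self-checking modules under a hypothetical model.

    Each module is independently in one of three states:
      CORRECT with prob 1 - q, FLAGGED (faulty, error self-detected) with
      prob q * c, SILENT (faulty, undetected wrong value) with prob q * (1 - c).
    The arbiter ignores FLAGGED modules and requires the rest to be unanimous.
    """
    state_prob = {CORRECT: 1 - q, FLAGGED: q * c, SILENT: q * (1 - c)}
    correct_out = undetected = 0.0
    # Enumerate all 3**n joint module states and their probabilities.
    for states in product(state_prob, repeat=n):
        pr = prod(state_prob[s] for s in states)
        voters = {s for s in states if s != FLAGGED}
        if voters == {CORRECT}:
            correct_out += pr   # unanimous correct value reaches the output
        elif voters == {SILENT}:
            undetected += pr    # unanimous wrong value slips through unnoticed
        # otherwise: no voters left, or voters disagree -> error is signalled
    return correct_out, 1 - undetected


for c in (0.0, 0.9):  # hypothetical coverage values
    for n in (1, 2):
        r, s = self_checking_system(n, q=0.01, c=c)
        print(f"c={c}, n={n}: R={r:.5f} S={s:.6f}")
```

With `q = 0.01` and no self-checking (`c = 0`), going from one to two modules raises safety but lowers reliability; with `c = 0.9`, both rise. Under this model the two-module reliability works out to `(1 - q)(1 + q(2c - 1))`, which exceeds the single-module value `1 - q` exactly when `c > 1/2`, so the benefit of one extra module depends on how effective the built-in detection is.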