Dependability evaluation involves the study of failures and errors.
The destructive nature of a crash and long error latency make it
difficult to identify the causes of failures in the operational environment.
It is particularly hard to recreate a failure scenario for a large,
complex system. To identify and understand potential failures, the
authors use an experiment-based approach for studying system dependability.
This approach is applied during the conception, design, prototype,
and operational phases. To take an experiment-based approach, you must
first understand a system's architecture, structure, and behavior. You
need to know its tolerance for faults and failures, including its
built-in detection and recovery mechanisms, and you need specific
instruments and tools to inject faults, create failures or errors, and monitor
their effects. Engineers most often use low-cost, simulation-based
fault injection to evaluate the dependability of a system that is in the
conceptual and design phases. At this point, the system under study is
only a series of high-level abstractions; implementation details have
yet to be determined. Thus, the system is simulated on the basis of
simplified assumptions. Simulation-based fault injection, which assumes
that errors or failures occur according to a predetermined distribution,
is useful for evaluating the effectiveness of fault-tolerant
mechanisms and a system's dependability; it also provides timely feedback to
system engineers. However, it requires accurate input parameters, which
are difficult to supply: design and technology changes often
complicate the use of past measurements. Testing a prototype, on the other
hand, allows you to evaluate the system without any assumptions about
system design. Instead of injecting faults, engineers can directly measure
operational systems as they handle real workloads. Measurement-based
analysis uses actual data, which contains much information about
naturally occurring errors and failures and sometimes about recovery
attempts. Although these three experimental methods have limitations, their
unique values complement one another and allow for a wide spectrum of
dependability studies.
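
To make the idea of simulation-based fault injection concrete, the following is a minimal sketch, not the authors' tooling. It assumes faults arrive with exponentially distributed inter-arrival times (one possible "predetermined distribution"), that a fault is recovered without an outage with a fixed coverage probability, and that an uncovered fault costs a fixed repair time. The function name and all parameters (`fault_rate`, `coverage`, `repair_hours`) are hypothetical and chosen only for illustration.

```python
import random


def simulate_fault_injection(mission_hours, fault_rate, coverage,
                             repair_hours, seed=0):
    """Inject faults into an abstract system model and summarize the outcome.

    Assumptions (illustrative only): exponential fault inter-arrival times,
    a fixed probability that the recovery mechanism handles a fault, and a
    fixed outage length for every fault that is not handled.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    clock = 0.0
    downtime = 0.0
    injected = 0
    recovered = 0
    while True:
        # Draw the time to the next fault from the assumed distribution.
        clock += rng.expovariate(fault_rate)
        if clock >= mission_hours:
            break
        injected += 1
        if rng.random() < coverage:
            # The modeled detection and recovery mechanism handles the fault.
            recovered += 1
        else:
            # Uncovered fault: the system is down until repair completes
            # or the mission ends, whichever comes first.
            outage = min(repair_hours, mission_hours - clock)
            downtime += outage
            clock += outage
    return {
        "injected": injected,
        "recovered": recovered,
        "availability": 1.0 - downtime / mission_hours,
    }


if __name__ == "__main__":
    # Hypothetical inputs: 10,000 hours, one fault per 100 hours on average,
    # 90% recovery coverage, 5-hour repair after an uncovered fault.
    print(simulate_fault_injection(10_000.0, 0.01, 0.9, 5.0, seed=1))
```

The sketch also shows why accurate input parameters matter: the reported availability depends entirely on the assumed fault rate, coverage, and repair time, so a poor estimate of any of them yields a misleading result.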