Based on extensive field failure data for Tandem's GUARDIAN operating
system, this paper discusses evaluation of the dependability of
operational software. Software faults considered are major defects that
result in processor failures and invoke backup processes to take over. The
paper categorizes the underlying causes of software failures and
evaluates the effectiveness of the process pair technique in tolerating
software faults. A model to describe the impact of software faults on
the reliability of an overall system is proposed. The model is used to
evaluate the significance of key factors that determine software
dependability and to identify areas for improvement. An analysis of the data
shows that about 77% of processor failures that are initially
considered to be due to software are confirmed as software problems. The
analysis shows that the use of process pairs to provide checkpointing and
restart (originally intended for tolerating hardware faults) allows the
system to tolerate about 75% of reported software faults that result in
processor failures. The loose coupling between processors, which results
in the backup execution (the processor state and the sequence of
events) being different from the original execution, is a major reason
for the measured software fault tolerance. Over two-thirds (72%) of
measured software failures are recurrences of previously reported faults.
Modeling, based on the data, shows that, in addition to reducing the
number of software faults, software dependability can be enhanced by
reducing the recurrence rate.