Dt. Stott et al., DEPENDABILITY ANALYSIS OF A HIGH-SPEED NETWORK USING SOFTWARE-IMPLEMENTED FAULT INJECTION AND SIMULATED FAULT INJECTION, I.E.E.E. transactions on computers, 47(1), 1998, pp. 108-119
This paper presents a dependability study of high-speed, switched Local
Area Networks (LANs) using Myrinet as an example testbed (with
theoretical speeds of 2.56 Gbps). The study uses results of two fault
injection methods, simulated fault injection and software-implemented
fault injection (SWIFI), to analyze the application-level impact of
transient faults injected into the network interface hardware. These
results include a number of errors, such as dropped or corrupt
messages, host interface or host resets, and local or remote host
interface hangs. The paper presents the study in two parts: First, the
results from the SWIFI method in the real system are used as a basis to
validate the simulation and identify the major factors leading to
differences between the methods. A comparison between the two injection
methods shows that they agree for 83 percent of the fault injections.
The results, however, vary greatly, depending on the fault type
considered. The study also presents an analysis of the effects of
varying workload intensity, host platform, and interface function
targeted by the injection. An example of this analysis is to show that
the function targeted has a significant impact on the fault activation
rate. Finally, the study identifies two mechanisms by which faults may
propagate from the interface to other parts of the network; in one
example, this propagation caused the interface's host computer to
reboot, while another caused a remote interface in the network to hang.