DEPENDABILITY ANALYSIS OF A HIGH-SPEED NETWORK USING SOFTWARE-IMPLEMENTED FAULT INJECTION AND SIMULATED FAULT INJECTION

Citation
D.T. Stott et al., DEPENDABILITY ANALYSIS OF A HIGH-SPEED NETWORK USING SOFTWARE-IMPLEMENTED FAULT INJECTION AND SIMULATED FAULT INJECTION, IEEE Transactions on Computers, 47(1), 1998, pp. 108-119
Number of citations
21
Subject Categories
Computer Science, Hardware & Architecture; Engineering, Electrical & Electronic
ISSN journal
00189340
Volume
47
Issue
1
Year of publication
1998
Pages
108 - 119
Database
ISI
SICI code
0018-9340(1998)47:1<108:DAOAHN>2.0.ZU;2-L
Abstract
This paper presents a dependability study of high-speed, switched Local Area Networks (LANs) using Myrinet as an example testbed (with theoretical speeds of 2.56 Gbps). The study uses results of two fault injection methods, simulated fault injection and software-implemented fault injection (SWIFI), to analyze the application-level impact of transient faults injected into the network interface hardware. These results include a number of errors, such as dropped or corrupt messages, host interface or host resets, and local or remote host interface hangs. The paper presents the study in two parts. First, the results from the SWIFI method in the real system are used as a basis to validate the simulation and to identify the major factors leading to differences between the methods. A comparison between the two injection methods shows that they agree for 83 percent of the fault injections. The results, however, vary greatly depending on the fault type considered. The study also presents an analysis of the effects of varying workload intensity, host platform, and the interface function targeted by the injection. For example, this analysis shows that the function targeted has a significant impact on the fault activation rate. Finally, the study identifies two mechanisms by which faults may propagate from the interface to other parts of the network: in one case, propagation caused the interface's host computer to reboot; in another, it caused a remote interface in the network to hang.
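The 83 percent figure is an agreement rate between the per-injection outcomes of the two methods, and the abstract notes that this rate varies by fault type. The Python sketch below shows one way such a rate and its per-fault-type breakdown could be computed. It assumes that each injection is logged as a (fault type, SWIFI outcome, simulated outcome) triple and that "agreement" means both methods report the same outcome category. The abstract does not state the paper's exact criterion. The outcome labels paraphrase the error classes listed in the abstract, with "no_effect" added as an assumption. The fault-type names and the toy records are hypothetical and are not data from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Outcome categories paraphrased from the abstract; "no_effect" is an
# assumed extra category for injections that produce no visible error.
OUTCOMES = {
    "no_effect", "dropped_msg", "corrupt_msg",
    "interface_reset", "host_reset", "local_hang", "remote_hang",
}

# One record per injection: (fault_type, swifi_outcome, sim_outcome).
Record = Tuple[str, str, str]


def agreement(records: List[Record]) -> Tuple[float, Dict[str, float]]:
    """Return the overall and per-fault-type fraction of injections for
    which SWIFI and simulation report the same outcome category."""
    if not records:
        raise ValueError("no injection records")
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for fault_type, swifi, sim in records:
        for outcome in (swifi, sim):
            if outcome not in OUTCOMES:
                raise ValueError(f"unknown outcome: {outcome}")
        totals[fault_type] += 1
        if swifi == sim:
            hits[fault_type] += 1
    overall = sum(hits.values()) / sum(totals.values())
    per_type = {t: hits[t] / totals[t] for t in totals}
    return overall, per_type


if __name__ == "__main__":
    # Hypothetical toy records, invented for illustration only.
    toy = [
        ("code_bitflip", "host_reset", "host_reset"),
        ("code_bitflip", "local_hang", "local_hang"),
        ("data_bitflip", "corrupt_msg", "no_effect"),
        ("data_bitflip", "dropped_msg", "dropped_msg"),
    ]
    overall, per_type = agreement(toy)
    print(overall)   # 0.75
    print(per_type)  # {'code_bitflip': 1.0, 'data_bitflip': 0.5}
```

Reporting the per-type breakdown next to the overall rate matters because a single aggregate figure can hide fault types on which the simulation diverges strongly from the real system.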