Hierarchical simulation approach to accurate fault modeling for system dependability evaluation

Citation
Z. Kalbarczyk et al., Hierarchical simulation approach to accurate fault modeling for system dependability evaluation, IEEE SOFT E, 25(5), 1999, pp. 619-632
Number of citations
16
Subject categories
Computer Science & Engineering
Journal title
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING
ISSN journal
00985589
Volume
25
Issue
5
Year of publication
1999
Pages
619 - 632
Database
ISI
SICI code
0098-5589(199909/10)25:5<619:HSATAF>2.0.ZU;2-3
Abstract
This paper presents a hierarchical simulation methodology that enables accurate system evaluation under realistic faults and conditions. In this methodology, the effects of low-level (i.e., transistor- or circuit-level) faults are propagated to higher levels (i.e., the system level) using fault dictionaries. The primary fault models are obtained by simulating the transistor-level effect of a radiation particle penetrating a device. The resulting current bursts constitute the first-level fault dictionary and are used in the circuit-level simulation to determine the impact on circuit latches and flip-flops. The latched outputs constitute the next-level fault dictionary in the hierarchy and are used to conduct fault injection simulation at the chip level under selected workloads or application programs. Faults injected at the chip level result in memory corruptions, which form the next-level fault dictionary for the system-level simulation of an application running on simulated hardware. When an application terminates, either normally or abnormally, the overall fault impact on the software behavior is quantified and analyzed. The system in this sense can be a single workstation or a network. The simulation method is demonstrated and validated in a case study of a network system based on Myrinet, a commercial high-speed network.
The study shows that the method: 1) allows detailed simulation of faults at lower levels and effective fault propagation through the system to the user-visible higher levels using fault dictionaries; 2) links physical faults with effects that the user can observe at the higher levels and thus provides a foundation for realistic fault injection studies; 3) allows a significant reduction in the number of simulations needed, owing to the fault dictionary method; 4) offers high confidence in the evaluation results because the system is analyzed in the presence of realistic fault conditions; and 5) provides valuable feedback for designing error recovery mechanisms to improve dependability.
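The chain of fault dictionaries described in the abstract can be illustrated with a toy Python sketch. It is not the authors' tooling: the four simulator functions, their thresholds and units, the charge values, and the names `build_dictionary` and `run_campaign` are all hypothetical. Each lower-level simulator runs once per distinct fault to build a dictionary, and the system-level campaign then performs only dictionary lookups, which mirrors the reduction in simulation count the abstract attributes to the fault dictionary method.

```python
import random
from collections import Counter
from typing import Callable, Dict, Iterable, Optional

# Counts how often each (hypothetically expensive) lower-level simulator runs.
calls: Counter = Counter()


def transistor_level(charge_fc: int) -> int:
    """Hypothetical model: deposited charge (fC) -> current-burst peak (uA)."""
    calls["transistor"] += 1
    return charge_fc * 10


def circuit_level(peak_ua: int) -> Optional[int]:
    """Hypothetical model: bursts above 300 uA flip latch bit (peak // 100) % 8;
    weaker bursts are masked (None)."""
    calls["circuit"] += 1
    return (peak_ua // 100) % 8 if peak_ua > 300 else None


def chip_level(latch_bit: Optional[int]) -> Optional[int]:
    """Hypothetical model: a flipped latch bit corrupts memory word latch_bit * 4."""
    calls["chip"] += 1
    return None if latch_bit is None else latch_bit * 4


def system_level(corrupted_word: Optional[int]) -> str:
    """Hypothetical application: words 0-7 hold code, the rest hold data."""
    if corrupted_word is None:
        return "no effect"
    return "crash" if corrupted_word < 8 else "wrong output"


def build_dictionary(simulate: Callable, faults: Iterable) -> Dict:
    """Run a lower-level simulator once per distinct fault and record its effect."""
    return {fault: simulate(fault) for fault in set(faults)}


# Hypothetical particle-strike charges (fC) forming the primary fault set.
charges = [10, 20, 40, 50, 80]

# Each dictionary's outputs become the fault set for the next level up.
level1 = build_dictionary(transistor_level, charges)       # charge -> current burst
level2 = build_dictionary(circuit_level, level1.values())  # burst -> latched error
level3 = build_dictionary(chip_level, level2.values())     # latched error -> memory corruption


def run_campaign(n: int, seed: int = 0) -> Counter:
    """System-level injection campaign: only dictionary lookups, with no
    re-simulation of the lower levels."""
    rng = random.Random(seed)
    outcomes: Counter = Counter()
    for _ in range(n):
        charge = rng.choice(charges)
        outcomes[system_level(level3[level2[level1[charge]]])] += 1
    return outcomes


if __name__ == "__main__":
    print(run_campaign(1000))
    print(dict(calls))
```

In this sketch, running 1,000 injections leaves the lower-level call counts at 5, 5 and 4 (one call per distinct fault at each level), whereas re-simulating every level for every injection would take 3,000 lower-level runs. The sketch assumes each fault maps deterministically to a single effect; a realistic fault dictionary would likely record richer information.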