Fault-tolerant system dependability - Explicit modeling of hardware and software component-interactions

Citation
K. Kanoun and M. Ortalo-Borrel, Fault-tolerant system dependability - Explicit modeling of hardware and software component-interactions, IEEE RELIAB, 49(4), 2000, pp. 363-376
Number of citations
25
Subject categories
Electrical & Electronics Engineering
Journal title
IEEE TRANSACTIONS ON RELIABILITY
Journal ISSN
0018-9529
Volume
49
Issue
4
Year of publication
2000
Pages
363 - 376
Database
ISI
SICI code
0018-9529(200012)49:4<363:FSD-EM>2.0.ZU;2-5
Abstract
This paper presents a framework for modeling the dependability of hardware and software fault-tolerant systems, taking into account explicitly the dependence among the components. These dependencies can result from: a) functional or structural interactions between the components or b) interactions due to global system reconfiguration and maintenance strategies. Modeling is based on GSPN (generalized stochastic Petri net). The modeling approach is modular: the behavior of each component and each interaction is represented by its own GSPN, while the system model is obtained by composition of these GSPN. Composition rules are defined and formalized through clear identification of the interfaces between the component and interaction nets. In addition to modularity, the formalism brings flexibility and re-usability, thereby allowing easy sensitivity analysis with respect to the assumptions that could be made about the behavior of the components and the resulting interactions.
This approach has been successfully applied to select new architectures for the French Air Traffic Control system, based, among other things, on availability evaluation. This paper illustrates it on a simple representative example that includes all the identified types of dependencies: the duplex system. Modeling of this system showed the strong dependence between components. For example, the activation of a temporary hardware fault can propagate an error to the hosted software component, which in turn can propagate to other components communicating with it (without necessarily being on the same computer). Thus the activation of a temporary hardware fault can lead to the restart of one or more software components. Even if this has been observed on real systems, it has not been modeled explicitly in previous work.
This paper shows how the modification of one or several assumptions can be performed without modifying all GSPN, considering two repair policies and two switching policies (with or without manual switch).
The main advantage of this modeling approach, based on considering the interactions explicitly, lies in its efficiency for modeling several alternatives for the same system. These alternatives can differ in their composition or organization, or in their fault-tolerance and maintenance strategies. One can clearly identify from the beginning the components and interactions that are specific and those that are common to all alternatives. The common GSPN are thus developed and validated only once.
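Illustrative sketch (not from the paper): the modular idea described in the abstract, separate nets for each component and each interaction joined through shared interface places, can be pictured with a few lines of Python. Everything here is an assumption made for illustration: the `Net`, `Transition`, `compose`, `enabled` and `fire` names, the place names, the rates, and the choice to compose by fusing places that share a name. The paper's actual composition rules are not given in the abstract and may differ. Arcs have weight 1 and timing is ignored; only the token game is shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Transition:
    name: str
    inputs: Tuple[str, ...]       # places that must each hold a token
    outputs: Tuple[str, ...]      # places that each receive a token
    rate: Optional[float] = None  # None marks an immediate transition


@dataclass
class Net:
    places: Dict[str, int]        # place name -> initial token count
    transitions: List[Transition]


def compose(*nets: Net) -> Net:
    """Build a system net by fusing places that share a name (assumed rule)."""
    places: Dict[str, int] = {}
    transitions: List[Transition] = []
    for net in nets:
        for place, tokens in net.places.items():
            if places.get(place, tokens) != tokens:
                raise ValueError(f"conflicting initial marking for {place}")
            places[place] = tokens
        transitions.extend(net.transitions)
    return Net(places, transitions)


def enabled(net: Net, marking: Dict[str, int]) -> List[str]:
    """Names of transitions whose input places all hold a token."""
    return [t.name for t in net.transitions
            if all(marking[p] >= 1 for p in t.inputs)]


def fire(marking: Dict[str, int], t: Transition) -> Dict[str, int]:
    """Return the marking reached by firing t (arc weight 1 everywhere)."""
    if any(marking[p] < 1 for p in t.inputs):
        raise ValueError(f"{t.name} is not enabled")
    new = dict(marking)
    for p in t.inputs:
        new[p] -= 1
    for p in t.outputs:
        new[p] += 1
    return new


# Component nets (hypothetical places and rates).
hardware = Net({"hw_ok": 1, "hw_fault_active": 0},
               [Transition("activate_fault", ("hw_ok",),
                           ("hw_fault_active",), rate=1e-4)])
software = Net({"sw_ok": 1, "sw_error": 0},
               [Transition("restart", ("sw_error",), ("sw_ok",), rate=60.0)])

# Interaction net: an active temporary hardware fault puts the hosted
# software in error, then disappears. It only touches interface places.
interaction = Net({"hw_ok": 1, "hw_fault_active": 0,
                   "sw_ok": 1, "sw_error": 0},
                  [Transition("propagate", ("hw_fault_active", "sw_ok"),
                              ("hw_ok", "sw_error"))])

system = compose(hardware, interaction, software)
by_name = {t.name: t for t in system.transitions}

marking = dict(system.places)
for step in ("activate_fault", "propagate", "restart"):
    marking = fire(marking, by_name[step])
    print(step, marking)
```

Changing an assumption, for example replacing the `propagate` interaction with a different one, only requires swapping the `interaction` net before calling `compose`; the hardware and software nets stay untouched, which is the reuse benefit the abstract describes.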