SAFETY ISSUES IN THE COMPARATIVE-ANALYSIS OF DEPENDABLE ARCHITECTURES

Citation
Cy. Choi et al., SAFETY ISSUES IN THE COMPARATIVE-ANALYSIS OF DEPENDABLE ARCHITECTURES, IEEE Transactions on Reliability, 46(3), 1997, pp. 316-322
Number of citations
25
Subject Categories
"Computer Sciences", "Engineering, Electrical & Electronic", "Computer Science Hardware & Architecture", "Computer Science Software Graphics Programming"
ISSN journal
0018-9529
Volume
46
Issue
3
Year of publication
1997
Pages
316 - 322
Database
ISI
SICI code
0018-9529(1997)46:3<316:SIITCO>2.0.ZU;2-1
Abstract
This paper illustrates the value of analytic techniques for the safety analysis of dependable architectures at the system level. Its important contributions are: 1) comparative analysis of 5 common hardware architectures for life-critical applications, 2) demonstration of the effect of various coverage parameters on system safety, and 3) illustration of important metrics in evaluating system safety. Discrete-space CTMC (continuous-time Markov chains) are used to model the 5 architectures at the building-block level: a simplex architecture, two gracefully degrading architectures with & without repair, and two hard-failing architectures (hard-fail refers to a design that does not reconfigure to ensure continued operation). Comparative analysis has shown that safety & reliability usually entail tradeoff design issues: increasing reliability might not increase safety. If the coverage of the comparison process is not high enough, a simplex architecture can have a higher level of steady-state safety than other architectures which feature redundancy. Likewise, hard-fail architectures can provide a better measure of steady-state safety than the simplex. The cost of using a hard-fail approach is the reliability loss compared to an architecture which reconfigures, as illustrated in the reliability & MTTF (mean time to failure) plots. Coverage of a single module and coverage of a comparison process can appreciably affect the steady-state safety and the MTTHE (mean time to hazardous event) of a system. For all of the case architectures, improving the coverage of a single module improved their MTTHE by a factor of 10 over the control case, with all other coverage factors being equal. When the coverage of the comparison process improved, the hard-fail architectures had their MTTHE improve by a factor of 10 over their control case. Coverage on the voting process made an unimportant difference to the outcome of the MTTHE.
Repair can appreciably improve the MTTF & MTTHE of a system, provided that the repair rate is factors of 10 greater than the system failure rate, but at the cost of a lower level of steady-state safety. In the MTTF analysis, adding a 'repair rate' = 3650 x 'failure rate' could increase system MTTF by a factor of 2. In MTTHE analysis, repair in the gracefully degrading architectures made their MTTHE values comparable to their hard-failing counterparts. However, the cost of adding repair to hard-failing architectures is a lower level of resiliency in their steady-state safety. If safety is an overriding concern in the system design, a hard-fail architecture is an approach well worth looking at. However, the design cycle is subject to a myriad of goals, and the point of illustrating the models under reliability metrics is to underline the fact that designing a system for safety & reliability can incur a tradeoff decision between the two.
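
The way coverage enters an MTTHE calculation can be illustrated with a toy model. The sketch below is not one of the paper's five architectures; it assumes a hypothetical three-state CTMC for a single repairable module: UP, SAFE_DOWN (a detected failure under repair), and an absorbing HAZARD state reached when a failure goes undetected. The function name, the state names, and the failure rate of 1e-4 per hour are illustrative assumptions; only the 3650:1 repair-to-failure ratio is taken from the abstract.

```python
def mtthe_simplex(failure_rate, repair_rate, coverage):
    """Mean time to hazardous event, starting in UP, for a toy 3-state CTMC.

    Transitions (all assumptions of this sketch, not the paper's model):
      UP        -> SAFE_DOWN at rate coverage * failure_rate
      UP        -> HAZARD    at rate (1 - coverage) * failure_rate
      SAFE_DOWN -> UP        at rate repair_rate
    HAZARD is absorbing.
    """
    if failure_rate <= 0 or repair_rate <= 0:
        raise ValueError("rates must be positive")
    if not 0.0 <= coverage < 1.0:
        # With perfect coverage HAZARD is unreachable and MTTHE is infinite.
        raise ValueError("coverage must be in [0, 1)")

    lam, mu, c = failure_rate, repair_rate, coverage

    # Expected time to absorption satisfies, for the two transient states:
    #    lam * T_up - c * lam * T_sd = 1
    #   -mu  * T_up + mu      * T_sd = 1
    a11, a12, b1 = lam, -c * lam, 1.0
    a21, a22, b2 = -mu, mu, 1.0

    # Solve the 2x2 system for T_up with Cramer's rule.
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / det


if __name__ == "__main__":
    lam = 1e-4          # assumed failure rate, per hour
    mu = 3650 * lam     # repair rate at the 3650:1 ratio quoted in the abstract
    for c in (0.9, 0.99):
        print(f"coverage={c}: MTTHE = {mtthe_simplex(lam, mu, c):.1f} hours")
```

In this toy model the result reduces to (1/λ + c/μ) / (1 - c), so the MTTHE scales roughly with 1 / (1 - c) when repair is fast relative to failure. That is one simple way to see why a coverage parameter can dominate the time to a hazardous event, though the paper's own models and numbers are more detailed than this sketch.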