Sophisticated modeling and analysis methods are being developed in academic
and industrial research labs for reliability engineering and other domains.
The evaluation and evolution of such methods based on use in practice is
critical to research progress, but few such methods see widespread use. A
critical impediment to disseminating new methods is the inability to
produce, at a reasonable cost, supporting software tools that have the
usability and dependability characteristics that industrial users require,
and the evolvability to accommodate software change as the underlying
analysis methods are refined and enhanced.
The difficulty of software development thus emerges as a key impediment to
advances in engineering modeling and analysis.
Today, producing sophisticated software tools is costly and difficult, even
for capable software developers. One problem is that when common design
methods, such as object-oriented programming, are used to build such tools,
the results are often large, complex, and thus costly programs. Tools on the
order of a million lines of code are typical, with much of the code devoted
to tool interoperability, the human-computer interface, and other issues not
directly related to modeling and analysis.
Making matters worse, domain experts, such as reliability engineering
researchers, often lack skills in modern software development, while
software engineers and researchers lack knowledge of the application
domains. All too often the results of tool-development efforts today are
thus costly, hard to use, not dependable, and essentially unmaintainable.
This paper presents an approach to tool development that attacks these
problems. Progress requires synergistic, interdisciplinary collaborations
between application-domain and software-engineering researchers. We have
pursued such an approach in developing Galileo, a fault tree modeling and
analysis tool. Our innovations are described along two dimensions:
1) the Galileo core reliability modeling and analysis function;
2) our work on software engineering for high-quality, low-cost modeling and
analysis tools.
In the reliability engineering domain, Galileo supports precise, modular,
dynamic fault-tree analysis using techniques developed primarily by Dugan
and her colleagues. This approach addresses the problem that a single
analysis technique seldom applies to an entire system. A good reliability
engineer uses different techniques to analyze different parts of a system:
decomposing a complex model into smaller pieces,
applying different analysis techniques to submodels, and
integrating partial results into a system-level result.
Manually decomposing systems into parts, developing submodels, analyzing
them with different tools and techniques, and integrating the partial
results is tedious and error prone at best. By contrast, Galileo:
automatically detects independent(1) sub-trees;
translates them into appropriate submodels based on Markov chains, binary
decision diagrams, and other formalisms;
analyzes the submodels;
integrates the results.
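
The four steps above can be illustrated with a minimal sketch. It is not
Galileo's algorithm: it assumes a static fault tree with only AND and OR
gates, treats the inputs of a gate as independent when they share no basic
events, and swaps in brute-force state enumeration as a stand-in for the
Markov-chain and decision-diagram solvers. The names `Basic`, `Gate`,
`solve_by_enumeration`, and `solve_modular` are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from math import prod


@dataclass(frozen=True)
class Basic:
    name: str   # basic events with the same name are the same event
    p: float    # failure probability (assumed independent of other events)


@dataclass(frozen=True)
class Gate:
    kind: str      # "AND" or "OR" only; dynamic gates are out of scope here
    inputs: tuple  # child Basic or Gate nodes


def basic_events(node):
    """Map name -> probability for every basic event under node."""
    if isinstance(node, Basic):
        return {node.name: node.p}
    found = {}
    for child in node.inputs:
        found.update(basic_events(child))
    return found


def fails(node, state):
    """Evaluate the tree for one assignment of failed/working basic events."""
    if isinstance(node, Basic):
        return state[node.name]
    results = [fails(child, state) for child in node.inputs]
    return all(results) if node.kind == "AND" else any(results)


def solve_by_enumeration(node):
    """Exact but exponential: sum the probability of every failing state."""
    events = basic_events(node)
    names = list(events)
    total = 0.0
    for outcome in product([False, True], repeat=len(names)):
        state = dict(zip(names, outcome))
        if fails(node, state):
            total += prod(events[n] if state[n] else 1 - events[n] for n in names)
    return total


def solve_modular(node):
    """Solve independent inputs separately, then integrate the results."""
    if isinstance(node, Basic):
        return node.p
    child_events = [set(basic_events(child)) for child in node.inputs]
    shared = sum(len(s) for s in child_events) != len(set().union(*child_events))
    if shared:
        # Inputs overlap, so this subtree must be solved as one submodel.
        return solve_by_enumeration(node)
    probs = [solve_modular(child) for child in node.inputs]
    if node.kind == "AND":
        return prod(probs)
    return 1 - prod(1 - q for q in probs)


a, b, c, d = Basic("a", 0.1), Basic("b", 0.2), Basic("c", 0.3), Basic("d", 0.4)
left = Gate("OR", (Gate("AND", (a, b)), Gate("AND", (a, c))))  # inputs share "a"
top = Gate("AND", (left, d))                                   # left and d are independent
result = solve_modular(top)
```

In this example the subtree `left` is solved as a single submodel because
its inputs share event `a`, while `top` combines `left` and `d` with the
product rule. Only the overlapping part of the tree pays the exponential
cost of enumeration.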
Galileo supports precise analysis while exploiting modularity for
scalability in solving problems that, in the worst case, require time and
space exponential in the number of basic events.
Our software engineering approach centers on the component-based design
techniques of Sullivan and his colleagues. A key element of the approach is
the use of mass-market software packages as large components, viz.,
package-oriented programming. At low cost, it achieves an effective
human-computer interface, tool interoperability, and considerable
dependability for the function delivered.
Here, low cost means that the effort involves only a handful of graduate and
undergraduate students and faculty. Sullivan's mediator-based design
approach is also used at several scales to support an integrated,
multi-view environment in which fault trees can be edited in either textual
or graphical form, while fostering dependability and evolvability. To help
validate this modeling approach and to verify its implementation, both
natural-language and formal specifications are being developed for the
fault-tree gates and their interactions.
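
As a rough illustration of the mediator idea, the sketch below keeps a
textual and a graphical representation of a fault tree consistent without
either view referring to the other. It is a generic example, not Galileo's
design: the `View` and `FaultTreeMediator` classes and the one-gate-per-line
text format (`name KIND input ...`) are assumptions made for this example.

```python
class View:
    """A view that announces its own changes and knows nothing of other views."""

    def __init__(self, content):
        self.content = content
        self._observers = []

    def subscribe(self, callback):
        self._observers.append(callback)

    def update(self, content):
        self.content = content
        for callback in self._observers:
            callback(content)


def text_to_graph(text):
    """Parse the hypothetical 'name KIND input ...' lines into a dict."""
    graph = {}
    for line in text.splitlines():
        if line.strip():
            name, kind, *inputs = line.split()
            graph[name] = (kind, tuple(inputs))
    return graph


def graph_to_text(graph):
    """Render the dict form back into the hypothetical text format."""
    return "\n".join(
        " ".join([name, kind, *inputs]) for name, (kind, inputs) in graph.items()
    )


class FaultTreeMediator:
    """Owns the consistency rule between the two views."""

    def __init__(self, text_view, graph_view):
        self.text_view = text_view
        self.graph_view = graph_view
        self._syncing = False  # guards against update loops between views
        text_view.subscribe(self._on_text_changed)
        graph_view.subscribe(self._on_graph_changed)

    def _on_text_changed(self, text):
        if self._syncing:
            return
        self._syncing = True
        try:
            self.graph_view.update(text_to_graph(text))
        finally:
            self._syncing = False

    def _on_graph_changed(self, graph):
        if self._syncing:
            return
        self._syncing = True
        try:
            self.text_view.update(graph_to_text(graph))
        finally:
            self._syncing = False


text_view = View("")
graph_view = View({})
mediator = FaultTreeMediator(text_view, graph_view)

text_view.update("top AND a b")                 # edit in the textual view
graph_after_text_edit = dict(graph_view.content)
graph_view.update({"top": ("OR", ("a", "c"))})  # edit in the graphical view
text_after_graph_edit = text_view.content
```

Because the synchronization logic lives only in the mediator, either view
can be replaced or evolved without changing the other.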
Galileo has been evaluated against commercially available fault tree
analysis tools. The results highlight the need for fidelity in analysis.
Testing two tools popular in the reliability engineering community revealed
the same algorithmic error in both, despite their claimed ability to provide
exact solutions. At the intersection of software and reliability
engineering, the redundancy inherent in the use of multiple analysis
techniques in Galileo is used as an aid to testing this software. Galileo
has been acquired by hundreds of sites. We are now building an enhanced
version with NASA Langley Research Center.
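
The use of redundant analysis techniques as a testing aid can be sketched as
a cross-check between two independent solvers. The example below is an
assumption-laden illustration, not Galileo's test suite: it represents a
static fault tree by its cut sets, computes the top-event probability once
by inclusion-exclusion and once by state enumeration, and reports any
randomly generated model on which the two disagree. All function names are
hypothetical.

```python
import random
from itertools import combinations, product
from math import prod


def by_inclusion_exclusion(cut_sets, p):
    """Top-event probability via inclusion-exclusion over the cut sets."""
    total = 0.0
    for k in range(1, len(cut_sets) + 1):
        sign = 1 if k % 2 else -1
        for combo in combinations(cut_sets, k):
            # All k cut sets occur together when every event in their union fails.
            union = frozenset().union(*combo)
            total += sign * prod(p[e] for e in union)
    return total


def by_enumeration(cut_sets, p):
    """Top-event probability by summing over every failed/working state."""
    names = sorted(p)
    total = 0.0
    for outcome in product([False, True], repeat=len(names)):
        failed = {n for n, is_failed in zip(names, outcome) if is_failed}
        if any(cs <= failed for cs in cut_sets):
            total += prod(p[n] if n in failed else 1 - p[n] for n in names)
    return total


def cross_check(trials=200, seed=0, tol=1e-9):
    """Return the first random model where the solvers disagree, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        names = [f"e{i}" for i in range(rng.randint(2, 5))]
        p = {n: rng.random() for n in names}
        cut_sets = [
            frozenset(rng.sample(names, rng.randint(1, len(names))))
            for _ in range(rng.randint(1, 4))
        ]
        first = by_inclusion_exclusion(cut_sets, p)
        second = by_enumeration(cut_sets, p)
        if abs(first - second) > tol:
            return cut_sets, p, first, second
    return None


disagreement = cross_check()
```

A non-`None` result would point to a concrete model on which at least one
technique is wrong. Agreement does not prove correctness, since two
implementations can share the same error.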