A. Thakur and R. K. Iyer, Analyze-NOW: An Environment for Collection and Analysis of Failures in a Network of Workstations, IEEE Transactions on Reliability, 45(4), 1996, pp. 561-570
This paper describes Analyze-NOW, an environment for the collection and
analysis of failures and errors in a network of workstations. The
descriptions cover the data collection methodology and the tool
implemented to facilitate this process. The software tools used for
analysis are described, with emphasis on the implementation details of
the Analyzer, the primary analysis tool. Application of the tools is
demonstrated by using them to collect and analyze failure data (over a
32-week period) from a network of 69 SunOS-based workstations.
Classification based on the source and effect of faults is used to
identify problem areas. Different types of failures encountered on the
machines and the network are highlighted to develop a proper
understanding of failures in a network environment. The results from
the analysis tool should be used to pinpoint the problem areas in the
network. The results obtained from using Analyze-NOW on failure data
from the monitored network reveal some interesting behavior of the
network. Nearly 70% of the failures were network-related, whereas disk
errors were few. Network-related failures accounted for 75% of all
hard failures (failures that make a workstation unusable). Half of the
network-related failures were due to servers not responding to
clients; the other half were performance-related or of other types.
Problem areas in the network were found using this tool. Our approach
was compared with the method of using the network architecture to
locate problem areas. This comparison showed that locating problem
areas using the network architecture overestimates the number of
problem areas.
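
The classification by source and effect described above can be illustrated with a small tally. The following is only a minimal sketch under stated assumptions, not the Analyzer's implementation: the record layout (`Failure`), the category names, and the sample records are all hypothetical and are not data from the paper. It shows how a share such as "network-related failures as a fraction of all hard failures" could be computed from classified failure records.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    """Hypothetical failure record; the field names are assumptions."""
    host: str     # workstation on which the failure was logged
    source: str   # assumed source category, e.g. "network", "disk"
    hard: bool    # True if the failure made the workstation unusable


def share_by_source(failures):
    """Return the fraction of failures attributed to each source."""
    counts = Counter(f.source for f in failures)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {source: n / total for source, n in counts.items()}


def hard_share_by_source(failures):
    """Same breakdown, restricted to hard failures."""
    return share_by_source([f for f in failures if f.hard])


# Illustrative records only; these are NOT measurements from the study.
SAMPLE = [
    Failure("ws01", "network", True),
    Failure("ws02", "network", True),
    Failure("ws03", "network", True),
    Failure("ws01", "network", False),
    Failure("ws04", "disk", True),
    Failure("ws05", "software", False),
]

if __name__ == "__main__":
    print(share_by_source(SAMPLE))
    print(hard_share_by_source(SAMPLE))
```

Keeping the hard-failure breakdown as a filter over the same records means both percentages come from a single classification pass, which is one simple way to keep the two figures consistent with each other.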