EVENTS DEFINED BY DURATION AND SEVERITY, WITH AN APPLICATION TO NETWORK RELIABILITY

Citation
Ra. Becker et al., EVENTS DEFINED BY DURATION AND SEVERITY, WITH AN APPLICATION TO NETWORK RELIABILITY, Technometrics, 40(3), 1998, pp. 177-189
Number of citations
13
Subject categories
Statistics & Probability
Journal title
Technometrics
Journal ISSN
0040-1706
Volume
40
Issue
3
Year of publication
1998
Pages
177 - 189
Database
ISI
SICI code
0040-1706(1998)40:3<177:EDBDAS>2.0.ZU;2-2
Abstract
Communications networks are highly reliable and almost never experience widespread failures. But from time to time performance degrades and the probability that a call is blocked or fails to reach its destination jumps from nearly 0 to an unacceptable level. High but variable blocking may then persist for a noticeable period of time. Extended periods of high blocking, or events, can be caused by congestion in response to natural disasters, fiber cuts, equipment failures, and software errors, for example. Because the consequences of an event depend on the level of blocking and its persistence, lists of events at specified blocking and duration thresholds, such as 50% for 30 minutes or 90% for 15 minutes, are often maintained. Reliability parameters at specified blocking and duration thresholds, such as the mean number of events per year and mean time spent in events, are estimated from the lists of reported events and used to compare network service providers, transmission facilities, or brands of equipment, for example. This article shows how data obtained with two-stage sampling can be used to estimate blocking probabilities as a function of time. The estimated blocking probabilities are then used to detect and characterize events and to estimate reliability parameters at specified blocking and duration thresholds. Our estimators are model-free, except for one step in a sampling bias correction, and practical even if there are hundreds of millions of observations. Pointwise confidence intervals for reliability parameters as a function of blocking and duration thresholds are built using a kind of "partial bootstrapping" that is suitable for very large sets of data. The performance of the algorithm for event detection and the estimators of reliability parameters are explored with simulated data. An application to comparison of two network service providers is given in this article, and possible adaptations for other monitoring problems are sketched.
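Illustrative note (not part of the original record): the idea of an event at a given blocking and duration threshold, such as 50% for 30 minutes, can be sketched in a few lines of Python. This is not the authors' detection algorithm, which the abstract does not specify. It assumes a hypothetical, already-estimated series of blocking probabilities at equally spaced times and applies a strict rule: every sample in the run must be at or above the threshold. The names `find_events`, `summarize`, and `sample_blocking` are made up for this illustration, and the sample values are invented.

```python
from typing import List, Tuple


def find_events(blocking: List[float], threshold: float,
                min_duration: int) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for every run of
    consecutive samples with blocking >= threshold that lasts at least
    min_duration samples. Assumption: samples are equally spaced in time."""
    events = []
    start = None  # index where the current high-blocking run began
    for i, p in enumerate(blocking):
        if p >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_duration:
                events.append((start, i))
            start = None
    # Close a run that is still open at the end of the series.
    if start is not None and len(blocking) - start >= min_duration:
        events.append((start, len(blocking)))
    return events


def summarize(events: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (number of events, total samples spent in events)."""
    return len(events), sum(end - start for start, end in events)


# Invented example: one sample per time step, values are blocking probabilities.
sample_blocking = [0.0, 0.6, 0.7, 0.9, 0.1, 0.95, 0.95, 0.0]
```

With this invented series, a threshold of 0.5 held for at least 3 samples yields one event covering indices 1 to 3, while a threshold of 0.9 held for at least 2 samples yields one event covering indices 5 and 6. Dividing the counts and total time from `summarize` by the length of the observation period would give crude analogues of the mean number of events per year and the mean time spent in events. The sampling bias correction and confidence intervals described in the abstract are outside this sketch.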