Communications networks are highly reliable and almost never experience widespread failures. But from time to time performance degrades and the probability that a call is blocked or fails to reach its destination jumps from nearly 0 to an unacceptable level. High but variable blocking may then persist for a noticeable period of time. Extended periods of high blocking, or events, can be caused by congestion in response to natural disasters, fiber cuts, equipment failures, and software errors, for example. Because the consequences of an event depend on the level of blocking and its persistence, lists of events at specified blocking and duration thresholds, such as 50% for 30 minutes or 90% for 15 minutes, are often maintained. Reliability parameters at specified blocking and duration thresholds, such as the mean number of events per year and mean time spent in events, are estimated from the lists of reported events and used to compare network service providers, transmission facilities, or brands of equipment, for example.
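To fix ideas, these two parameters have simple plug-in versions that can be read directly off an event list. The Python sketch below is illustrative only (the function name and hour-based units are ours), and unlike the estimators developed in this article it makes no correction for sampling bias:

    def reliability_parameters(events, window_years):
        """Naive plug-in summaries of a list of reported events.

        events: (start_hour, end_hour) pairs for one blocking/duration
        threshold, observed over a window of `window_years` years.
        """
        events_per_year = len(events) / window_years
        hours_in_events = sum(end - start for start, end in events)
        return events_per_year, hours_in_events / window_years

For example, two events lasting 0.75 and 1.5 hours over a one-year window give 2.0 events per year and 2.25 event-hours per year.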
This article shows how data obtained with two-stage sampling can be used to estimate blocking probabilities as a function of time.
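As a point of reference, a deliberately naive version of this step bins the sampled call attempts in time and takes the fraction blocked in each bin; it ignores the two-stage design and the sampling bias correction that the article develops:

    import numpy as np

    def binned_blocking(call_times, blocked, t0, t1, bin_hours=0.25):
        """Naive blocking-probability estimate on a regular time grid.

        call_times: attempt timestamps in hours; blocked: 0/1 flags.
        Bins with no attempts are left as NaN.
        """
        edges = np.arange(t0, t1 + bin_hours, bin_hours)
        attempts, _ = np.histogram(call_times, bins=edges)
        blocks, _ = np.histogram(call_times, bins=edges, weights=blocked)
        p_hat = np.full(attempts.size, np.nan)
        seen = attempts > 0
        p_hat[seen] = blocks[seen] / attempts[seen]
        return edges[:-1], p_hat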
The estimated blocking probabilities are then used to detect and characterize events and to estimate reliability parameters at specified blocking and duration thresholds.
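A bare-bones version of the detection step scans the estimated series for maximal runs at or above a blocking threshold that last at least the duration threshold; the article's algorithm must also cope with sampling noise in the estimates, which this sketch does not:

    def detect_events(times, p_hat, block_thresh, min_hours):
        """Runs where estimated blocking >= block_thresh for >= min_hours.

        times are bin start points in hours; NaN estimates end a run,
        since NaN >= block_thresh is False.
        """
        events, start = [], None
        for t, p in zip(times, p_hat):
            if p >= block_thresh:
                if start is None:
                    start = t          # a candidate event begins
            elif start is not None:    # run just ended; keep it if long enough
                if t - start >= min_hours:
                    events.append((start, t))
                start = None
        if start is not None and times[-1] - start >= min_hours:
            events.append((start, times[-1]))   # run still open at the end
        return events

Event lists produced this way feed directly into reliability_parameters above.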
Our estimators are model-free, except for one step in a sampling bias correction, and remain practical even with hundreds of millions of observations.
Pointwise confidence intervals for reliability parameters as a function of blocking and duration thresholds are built using a kind of "partial bootstrapping" that is suitable for very large sets of data.
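The partial bootstrap itself is defined in the body of the article; purely to illustrate the kind of resampling that stays practical at this scale, the generic sketch below resamples coarse block counts rather than the raw observations, so the call records are touched only once, when the counts are formed:

    import numpy as np

    def block_bootstrap_rate_ci(event_starts, window_hours,
                                block_hours=24.0, n_boot=2000,
                                alpha=0.05, seed=0):
        """Percentile interval for the mean number of events per year.

        event_starts: event start times in hours.  A generic stand-in,
        not the partial bootstrap of this article.
        """
        rng = np.random.default_rng(seed)
        edges = np.arange(0.0, window_hours + block_hours, block_hours)
        counts, _ = np.histogram(event_starts, bins=edges)
        years = window_hours / (365.25 * 24.0)
        rates = np.array([
            rng.choice(counts, size=counts.size, replace=True).sum() / years
            for _ in range(n_boot)
        ])
        return np.quantile(rates, [alpha / 2.0, 1.0 - alpha / 2.0])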
The performance of the event-detection algorithm and of the reliability-parameter estimators is explored with simulated data. An application to the comparison of two network service providers is given, and possible adaptations for other monitoring problems are sketched.