Background: An appropriate measure of performance is needed to identify
anesthetic depth indicators that are promising for use in clinical
monitoring. To avoid misleading results, the measure must take into account
both desired indicator performance and the nature of available performance
data. Ideally, anesthetic depth indicator value should correlate perfectly
with anesthetic depth along a lighter-deeper anesthesia
continuum. Experimentally, however, a candidate anesthetic depth indicator
is judged against a "gold standard" indicator that provides only quantal
observations of anesthetic depth. The standard anesthetic depth indicator is
the patient's response to a specified stimulus. The
resulting observed anesthetic depth scale may consist only of patient
"response" versus "no response," or it may have multiple levels. The
measurement scales for both the candidate anesthetic depth indicator and
observed anesthetic depth are no more than ordinal; that is, only the
relative rankings of values on these scales are meaningful.

Methods: Criteria were established for a measure of anesthetic depth
indicator performance, and the performance measure that best met these
criteria was identified.

Results: The performance measure recommended by the authors is prediction
probability P_K, a rescaled variant of Kim's d_{y·x} measure of association.
This performance measure quantifies the correlation between anesthetic depth
indicator value and observed anesthetic depth, taking into account both
desired performance and the limitations
of the data. Prediction probability has a value of 1 when the indicator
predicts observed anesthetic depth perfectly and a value of 0.5 when the
indicator predicts no better than a 50:50 chance. Prediction probability
avoids the shortcomings of other measures. For example, as a
nonparametric measure, P_K is independent of scale units and does not
require knowledge of underlying distributions or efforts to linearize or
otherwise transform scales. Furthermore, P_K can be computed for any degree
of coarseness or fineness of the scales for anesthetic depth indicator value
and observed anesthetic depth; thus, P_K fully uses the available data
without imposing additional arbitrary constraints, such as dichotomization
of either scale. Finally, P_K can be used to perform both grouped- and
paired-data statistical comparisons
of anesthetic depth indicator performance. Data for comparing depth
indicators, however, must be gathered via the same response-to-stimulus test
procedure and over the same distribution of anesthetic depths.

Conclusions: Prediction probability P_K is an appropriate measure for
evaluating and comparing the performance of anesthetic depth indicators.
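Counting pairs of observations makes the link between P_K and Kim's d_{y·x} concrete. The sketch below assumes the usual pair-based definitions, because the abstract does not spell out the formula. Only pairs with different observed depths are considered. Among those pairs, P_c, P_d, and P_tx are the proportions that the indicator ranks in the same order (concordant), in the opposite order (discordant), or ties. The function names, the example data, and the assumption that larger indicator values go with deeper anesthesia are illustrative only.

```python
from itertools import combinations


def pair_counts(x, y):
    """Count concordant, discordant, and indicator-tied pairs.

    Assumption: a larger indicator value x is meant to go with a larger
    (deeper) observed depth y. For an indicator that falls as anesthesia
    deepens, pass [-v for v in x] instead.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = x_ties = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        if yi == yj:
            continue  # same observed depth: nothing to predict
        if xi == xj:
            x_ties += 1  # indicator does not separate the two depths
        elif (xi < xj) == (yi < yj):
            concordant += 1  # indicator ranks the pair correctly
        else:
            discordant += 1  # indicator ranks the pair backwards
    return concordant, discordant, x_ties


def prediction_probability(x, y):
    """P_K = (P_c + P_tx / 2) / (P_c + P_d + P_tx).

    The three probabilities share one denominator (the total number of
    pairs), so raw counts give the same ratio.
    """
    c, d, t = pair_counts(x, y)
    if c + d + t == 0:
        raise ValueError("y must contain at least two different depths")
    return (c + 0.5 * t) / (c + d + t)


def kim_d_yx(x, y):
    """Kim's d_{y·x} = (P_c - P_d) / (P_c + P_d + P_tx), so P_K = (d + 1) / 2."""
    c, d, t = pair_counts(x, y)
    if c + d + t == 0:
        raise ValueError("y must contain at least two different depths")
    return (c - d) / (c + d + t)


if __name__ == "__main__":
    # Hypothetical data: indicator values and a 3-level observed depth
    # (0 = brisk response, 1 = weak response, 2 = no response).
    x = [10, 20, 20, 35, 40, 55]
    y = [0, 0, 1, 1, 2, 2]
    print(round(prediction_probability(x, y), 4))  # 0.9583
    print(round(kim_d_yx(x, y), 4))  # 0.9167
    # An order-preserving rescaling of x leaves P_K unchanged.
    print(round(prediction_probability([v ** 3 for v in x], y), 4))  # 0.9583
    # The same function handles a dichotomous response/no-response scale.
    print(prediction_probability(x, [0, 0, 0, 0, 1, 1]))  # 1.0
    # An uninformative (constant) indicator scores 0.5.
    print(prediction_probability([7] * 6, y))  # 0.5
```

Only the ordering within each pair enters the counts. As a result, any order-preserving rescaling of x gives the same value, and the observed-depth scale may have two levels or many. This matches the nonparametric behavior described in the Results.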
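The abstract states that P_K supports grouped- and paired-data comparisons, but it does not say how their uncertainty is estimated. The sketch below assumes a delete-one jackknife standard error. This is one common choice for statistics built from pair counts and is not necessarily the authors' procedure.

- For paired data, both indicators are recorded at the same observations, so the jackknife is applied directly to the difference in P_K.
- For grouped (independent) data, the two standard errors are combined in quadrature.

All function names and data here are hypothetical.

```python
import math
from itertools import combinations


def prediction_probability(x, y):
    """P_K from pair counts, assuming larger x should go with larger y."""
    c = d = t = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        if yi == yj:
            continue  # pairs tied in observed depth are not used
        if xi == xj:
            t += 1
        elif (xi < xj) == (yi < yj):
            c += 1
        else:
            d += 1
    return (c + 0.5 * t) / (c + d + t)


def jackknife_se(stat, n):
    """Delete-one jackknife standard error of stat(indices) over n observations.

    Each leave-one-out subset must still contain two different depths;
    otherwise prediction_probability divides by zero.
    """
    full = list(range(n))
    values = [stat(full[:i] + full[i + 1:]) for i in range(n)]
    mean = sum(values) / n
    return math.sqrt((n - 1) / n * sum((v - mean) ** 2 for v in values))


def pk_with_se(x, y):
    """P_K of one indicator plus its jackknife standard error."""
    def stat(idx):
        return prediction_probability([x[i] for i in idx], [y[i] for i in idx])
    return stat(list(range(len(y)))), jackknife_se(stat, len(y))


def paired_difference(x1, x2, y):
    """Paired data: indicators x1 and x2 recorded at the same observations y."""
    def stat(idx):
        ys = [y[i] for i in idx]
        return (prediction_probability([x1[i] for i in idx], ys)
                - prediction_probability([x2[i] for i in idx], ys))
    return stat(list(range(len(y)))), jackknife_se(stat, len(y))


def grouped_difference(xa, ya, xb, yb):
    """Grouped data: independent samples, so the variances add."""
    pa, sa = pk_with_se(xa, ya)
    pb, sb = pk_with_se(xb, yb)
    return pa - pb, math.sqrt(sa ** 2 + sb ** 2)


if __name__ == "__main__":
    # Hypothetical paired data: two indicators read at the same six observations.
    y = [0, 0, 1, 1, 2, 2]
    x1 = [10, 20, 20, 35, 40, 55]
    x2 = [30, 10, 45, 20, 50, 40]
    diff, se = paired_difference(x1, x2, y)
    print(round(diff, 4), round(se, 4))
```

The abstract cautions that such comparisons are meaningful only when both data sets come from the same response-to-stimulus test procedure over the same distribution of anesthetic depths. No resampling scheme can correct for a violation of that condition.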