In this paper we investigate the use of the area under the receiver
operating characteristic (ROC) curve (AUC) as a performance measure for
machine learning algorithms. As a case study we evaluate six machine
learning algorithms (C4.5, Multiscale Classifier, Perceptron,
Multi-layer Perceptron, k-Nearest Neighbours, and a Quadratic
Discriminant Function) on six "real world" medical diagnostics data
sets. We compare
AUC with the more conventional overall accuracy, discuss its use, and
find that AUC exhibits a number of desirable properties when compared
to overall accuracy: increased sensitivity in Analysis of Variance
(ANOVA) tests; a standard error that decreased as both AUC and the
number of test samples increased; independence from the decision
threshold; and invariance to a priori class probabilities. The paper
concludes with
the recommendation that AUC be used in preference to overall accuracy
for "single number" evaluation of machine learning algorithms.
© 1997 Pattern Recognition Society.
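
As an illustration of two of the properties listed above (threshold independence and invariance to a priori class probabilities), the following is a minimal sketch, not the estimation procedure used in the paper. It assumes the rank-based (Wilcoxon-Mann-Whitney) reading of AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The function names `auc` and `accuracy` and the toy scores and labels are hypothetical and chosen only for this example.

```python
def auc(scores, labels):
    """Rank-based AUC estimate: fraction of (positive, negative) pairs in
    which the positive case has the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold):
    """Overall accuracy when scores >= threshold are labelled positive."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical toy data: classifier scores and true labels (1 = positive).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

# Same classifier, but with the negative class three times as frequent:
# each negative case is repeated two more times.
neg_scores = [s for s, y in zip(scores, labels) if y == 0]
skewed_scores = scores + neg_scores * 2
skewed_labels = labels + [0] * (len(neg_scores) * 2)

if __name__ == "__main__":
    # Accuracy depends on the chosen threshold; AUC takes no threshold.
    print("accuracy @0.50:", accuracy(scores, labels, 0.5))
    print("accuracy @0.75:", accuracy(scores, labels, 0.75))
    print("AUC:", auc(scores, labels))
    # Changing the class proportions moves accuracy but leaves AUC unchanged.
    print("skewed accuracy @0.50:", accuracy(skewed_scores, skewed_labels, 0.5))
    print("skewed AUC:", auc(skewed_scores, skewed_labels))
```

In this sketch the accuracy figure moves when either the threshold or the class mix changes, while the AUC value stays the same, which is the behaviour the abstract describes. The pairwise loop is quadratic in the number of cases and is written for clarity rather than speed.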