ANALYSIS OF CORRELATED ROC AREAS IN DIAGNOSTIC TESTING

Authors
Citation
Hh. Song, ANALYSIS OF CORRELATED ROC AREAS IN DIAGNOSTIC TESTING, Biometrics, 53(1), 1997, pp. 370-382
Citations number
36
Subject Categories
Statistics & Probability
Journal title
ISSN journal
0006-341X
Volume
53
Issue
1
Year of publication
1997
Pages
370 - 382
Database
ISI
SICI code
0006-341X(1997)53:1<370:AOCRAI>2.0.ZU;2-S
Abstract
This paper focuses on methods of analysis of areas under receiver operating characteristic (ROC) curves. Analysis of ROC areas should incorporate the correlation structure of repeated measurements taken on the same set of cases and the paucity of measurements per treatment resulting from an effective summarization of cases into a few area measures of diagnostic accuracy. The repeated nature of ROC data has been taken into consideration in the analysis methods previously suggested by Swets and Pickett (1982, Evaluation of Diagnostic Systems: Methods from Signal Detection Theory), Hanley and McNeil (1983, Radiology 148, 839-843), and DeLong, DeLong, and Clarke-Pearson (1988, Biometrics 44, 837-845). DeLong et al.'s procedure is extended to a Wald test for general situations of diagnostic testing. The method of analyzing jackknife pseudovalues by treating them as data is extremely useful when the number of area measures to be tested is quite small. The Wald test based on covariances of multivariate multisample U-statistics is compared with two approaches of analyzing pseudovalues, the univariate mixed-model analysis of variance (ANOVA) for repeated measurements and the three-way factorial ANOVA. Monte Carlo simulations demonstrate that the three tests give good approximation to the nominal size at the 5% level for large sample sizes, but the paired t-test using ROC areas as data lacks the power of the other three tests and Hanley and McNeil's method is inappropriate for testing diagnostic accuracies. The Wald statistic performs better than the ANOVAs of pseudovalues. Jackknifing schemes of multiple deletion where different structures of normal and diseased distributions are accounted for appear to perform slightly better than simple multiple-deletion schemes but no appreciable power difference is apparent, and deletion of too many cases at a time may sacrifice power. These methods have important applications in diagnostic testing in ROC studies of radiology and of medicine in general.
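Illustrative sketch (not from the paper): the abstract describes a Wald test built on covariances of U-statistics, extending DeLong et al. As a rough aid to reading, the Python below works through only the simplest special case, two diagnostic tests read on the same diseased and normal cases, using the standard DeLong-style placement values. The function names (psi, delong_components, wald_two_tests) and the chi-square reference distribution with 1 degree of freedom are assumptions made for this illustration; the paper's general procedure for more tests and treatments is not reproduced here.

```python
import math

def psi(x, y):
    """Mann-Whitney kernel: 1 if diseased score exceeds normal score, 0.5 on ties."""
    return 1.0 if x > y else (0.5 if x == y else 0.0)

def delong_components(diseased, normal):
    """Return the empirical AUC and the placement values for one test."""
    m, n = len(diseased), len(normal)
    v10 = [sum(psi(x, y) for y in normal) / n for x in diseased]  # one per diseased case
    v01 = [sum(psi(x, y) for x in diseased) / m for y in normal]  # one per normal case
    return sum(v10) / m, v10, v01

def cov(a, b):
    """Unbiased sample covariance of two equal-length lists."""
    k = len(a)
    ma, mb = sum(a) / k, sum(b) / k
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (k - 1)

def wald_two_tests(dis_a, norm_a, dis_b, norm_b):
    """Wald-type chi-square (1 df, assumed) for AUC_a == AUC_b on paired cases."""
    if len(dis_a) != len(dis_b) or len(norm_a) != len(norm_b):
        raise ValueError("both tests must be read on the same cases")
    m, n = len(dis_a), len(norm_a)
    auc_a, v10_a, v01_a = delong_components(dis_a, norm_a)
    auc_b, v10_b, v01_b = delong_components(dis_b, norm_b)

    def s(v10_p, v10_q, v01_p, v01_q):
        # One entry of the estimated covariance matrix of the AUC vector.
        return cov(v10_p, v10_q) / m + cov(v01_p, v01_q) / n

    # Variance of the AUC difference, including the correlation term.
    var_diff = (s(v10_a, v10_a, v01_a, v01_a)
                + s(v10_b, v10_b, v01_b, v01_b)
                - 2.0 * s(v10_a, v10_b, v01_a, v01_b))
    if var_diff <= 0.0:
        raise ValueError("estimated variance of the AUC difference is not positive")
    stat = (auc_a - auc_b) ** 2 / var_diff
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df
    return auc_a, auc_b, stat, p_value
```

A call such as wald_two_tests(dis_a, norm_a, dis_b, norm_b) takes four lists of scores, where position i in dis_a and dis_b refers to the same diseased case and likewise for the normal cases. The covariance term is what separates this from an unpaired comparison, and, per the abstract, ignoring the repeated-measures structure costs power.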