P.G. Shekelle et al., The Reproducibility of a Method to Identify the Overuse and Underuse of Medical Procedures, The New England Journal of Medicine, 338(26), 1998, pp. 1888-1895
Background: To assess the overuse and underuse of medical procedures, various methods have been developed, but their reproducibility has not been evaluated. This study estimates the reproducibility of one commonly used method.

Methods: We performed a parallel, three-way replication of the RAND-University of California at Los Angeles appropriateness method as applied to two medical procedures, coronary revascularization and hysterectomy. Three nine-member multidisciplinary panels of experts were composed for each procedure by stratified random sampling from a list of experts nominated by the relevant specialty societies. Each panel independently rated the same set of clinical scenarios in terms of the appropriateness of the relevant procedure on a risk-benefit scale ranging from 1 to 9. Final ratings were used to classify the procedure in each scenario as necessary or not necessary (to evaluate underuse) and inappropriate or not inappropriate (to evaluate overuse). Reproducibility was measured by overall agreement and by the kappa statistic. The criteria for underuse and overuse derived from these ratings were then applied to real populations of patients who had undergone coronary revascularization or hysterectomy.

Results: The rates of agreement among the three coronary-revascularization panels were 95, 94, and 96 percent for inappropriate-use scenarios and 93, 92, and 92 percent for necessary-use scenarios. Agreement among the three hysterectomy panels was 88, 70, and 74 percent for inappropriate-use scenarios. Scenarios involving necessary use of hysterectomy were not assessed. The three-way kappa statistic to detect overuse was 0.52 for coronary revascularization and 0.51 for hysterectomy. The three-way kappa statistic to detect underuse of coronary revascularization was 0.83. Application of individual panels' criteria to real populations of patients resulted in a 100 percent variation in the proportion of cases classified as inappropriate and a 20 percent variation in the proportion of cases classified as necessary.

Conclusions: The appropriateness method is far from perfect. Appropriateness criteria may be useful in comparing levels of appropriate procedures among populations but should not by themselves be used to direct care for individual patients. (N Engl J Med 1998; 338:1888-95.) © 1998, Massachusetts Medical Society.
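
The two reproducibility measures named in the Methods, overall agreement between panels and a three-way kappa statistic, can be illustrated with a minimal sketch. The abstract does not say which multi-rater kappa was used, so this sketch assumes Fleiss' kappa. The function names and the binary labels (1 = inappropriate, 0 = not inappropriate) are hypothetical and are not data from the study.

```python
from itertools import combinations


def pairwise_agreement(labels_a, labels_b):
    """Fraction of scenarios on which two panels assign the same label."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def fleiss_kappa(panels):
    """Fleiss' kappa for several panels labelling the same scenarios.

    `panels` is a list of equal-length label lists, one per panel.
    Assumption: Fleiss' kappa stands in for the "three-way kappa"
    mentioned in the abstract, which does not name the variant.
    """
    n_raters = len(panels)
    n_items = len(panels[0])
    categories = sorted({label for panel in panels for label in panel})

    # counts[i][j] = number of panels assigning category j to scenario i
    counts = [
        [sum(panel[i] == cat for panel in panels) for cat in categories]
        for i in range(n_items)
    ]

    # Observed agreement, averaged over scenarios
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Agreement expected by chance from the marginal category proportions
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        # Only one category was ever used, so kappa is undefined
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical labels from three panels for ten scenarios
panel_a = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
panel_b = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
panel_c = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]
panels = [panel_a, panel_b, panel_c]

for (name_x, x), (name_y, y) in combinations(zip("ABC", panels), 2):
    print(f"Agreement {name_x}-{name_y}: {pairwise_agreement(x, y):.2f}")
print(f"Three-way kappa (Fleiss): {fleiss_kappa(panels):.2f}")
```

In this toy example the panels agree on most scenarios, yet kappa is noticeably lower than the raw agreement because most scenarios fall into a single category. The same effect may help explain how agreement above 90 percent can coexist with a kappa near 0.5, as reported for overuse of coronary revascularization.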