J. S. Uebersax, STATISTICAL MODELING OF EXPERT RATINGS ON MEDICAL-TREATMENT APPROPRIATENESS, Journal of the American Statistical Association, 88(422), 1993, pp. 421-427
This article uses latent structure analysis to model ordered category ratings by multiple experts on the appropriateness of indications for the medical procedure carotid endarterectomy. The statistical method used is a form of located latent class analysis, which combines elements of latent class and latent trait analysis. It assumes that treatment indications fall into distinct latent classes, with each latent class corresponding to a different level of appropriateness. The appropriateness rating of a treatment indication by a rater is assumed to be determined by the latent class membership of the indication, the rating category thresholds of the rater, and random measurement error. The located latent class model has two alternative forms: a normal ogive form, which derives from the assumption of normally distributed measurement error, and a logistic approximation to the normal form.
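As a point of reference, the normal ogive form can be sketched with the cumulative-probability formulation standard in located latent class and graded-response models; the notation below (class locations, rater thresholds, a rater-specific error scale) is generic and is not taken from the article itself.

```latex
% Sketch of a normal ogive located latent class model (generic notation).
% Indication i belongs to latent class c with location \theta_c; rater r
% uses K rating categories with ordered thresholds
% \tau_{r1} < \dots < \tau_{r,K-1} and measurement-error scale \sigma_r.
\[
  \Pr(X_{ir} > k \mid i \in c)
    \;=\; \Phi\!\left(\frac{\theta_c - \tau_{rk}}{\sigma_r}\right),
  \qquad k = 1, \dots, K-1 .
\]
% The logistic form approximates the normal cdf \Phi with a scaled
% logistic function (the usual scaling constant is about 1.7):
\[
  \Phi(z) \;\approx\; \frac{1}{1 + \exp(-1.7\,z)} .
\]
```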
The approach has the following advantages for the analysis of ordered category ratings by multiple experts: (1) it assesses whether different raters base ratings on the same or different criteria; (2) it assesses rater bias: the tendency of some raters to make higher or lower ratings than others; (3) it characterizes rater differences in rating category definitions; (4) it provides theoretically based methods for combining the ratings of different raters; and (5) it provides a description of the distribution of the latent trait. The data examined are appropriateness ratings on 848 indications for carotid endarterectomy made by nine medical experts. The located latent class approach provides unique insights concerning the data. It identifies what appears to be a set of clear nonindications for carotid endarterectomy, but a corresponding set of clear indications is not evident. The results indicate that all raters measured a common latent trait of treatment appropriateness, but that some measured the trait better than others. Rater differences in overall bias and rating category definitions are evident. Two methods are used to combine raters' ratings. One uses ratings to calculate a continuous appropriateness score for each indication. The other uses ratings to assign indications to discrete outcome categories, each corresponding to a specific level of appropriateness.
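One way to picture the two combination methods is sketched below: posterior latent class probabilities are computed for an indication from its nine ratings under a fitted model, the posterior-weighted class location serves as a continuous appropriateness score, and the modal posterior class serves as the discrete outcome assignment. The quantities and the 9-point scale used here are illustrative placeholders, not the article's estimates; the conditional-independence assumption is the usual latent class one.

```python
import numpy as np

# Illustrative sketch only: pi, theta, and p_rating stand in for a fitted
# located latent class model and are NOT the article's estimates.
rng = np.random.default_rng(0)
n_classes, n_raters, n_cats = 3, 9, 9          # nine raters; 9-point scale assumed
pi = np.full(n_classes, 1.0 / n_classes)       # latent class prevalences
theta = np.array([-1.0, 0.0, 1.0])             # class locations (appropriateness)
# p_rating[r, c, k]: probability that rater r gives rating k to a class-c indication
p_rating = rng.dirichlet(np.ones(n_cats), size=(n_raters, n_classes))

def combine(ratings):
    """Combine one rating per rater for a single indication, assuming
    conditional independence of raters given the latent class."""
    log_post = np.log(pi).copy()
    for r, k in enumerate(ratings):
        log_post += np.log(p_rating[r, :, k])
    post = np.exp(log_post - log_post.max())
    post /= post.sum()                          # posterior class probabilities
    score = float(post @ theta)                 # continuous appropriateness score
    category = int(post.argmax())               # discrete outcome category
    return post, score, category

post, score, category = combine(rng.integers(0, n_cats, size=n_raters))
print(post, score, category)
```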
The located latent class approach for ordered category measures has possible applications besides the analysis of expert ratings, such as item analysis. Potential extensions of the model are discussed.