Ah. Steinhart et al., RELIABILITY OF A CROHNS-DISEASE CLINICAL CLASSIFICATION SCHEME BASED ON DISEASE BEHAVIOR, Inflammatory bowel diseases, 4(3), 1998, pp. 228-234
Classification of Crohn's disease (CD) by disease behavior, either
inflammatory (INF), fibrostenotic (FS), or fistulizing/perforating (FP),
has been proposed as a means of assisting management decisions and
predicting outcomes for subgroup analysis in clinical trials and for making
phenotype/genotype associations in molecular genetic studies. Accurate
and reproducible classification of CD patient subgroups is of paramount
importance in such studies, but to be useful the classification scheme
must have good interrater agreement. We sought to assess the interrater
agreement associated with the disease-behavior classification scheme of
CD. Twelve patients with CD were randomly selected from a database of
964 patients with CD undergoing medical or surgical treatment
or both. Clinical details of the 12 cases, along with their radiographs
and surgical and pathological reports, were presented to a panel of 20
experts who were asked to classify each case based on the patient's
overall disease course (scenario A) and as if the patient were being
entered into a clinical trial on that day (scenario B). Calculations
of strength of interrater agreement were made and were expressed as the
kappa statistic (kappa), with kappa <0.2 = poor strength of agreement;
kappa 0.21-0.4 = fair; kappa 0.41-0.6 = moderate; kappa 0.61-0.8 =
good; and kappa 0.81-1.0 = very good. Five panel participants did not
complete the study, and three clinical vignettes were excluded because
of incomplete scoring, leaving a total of 15 panel experts assessing
nine cases. Overall interrater agreement was only fair with kappa =
0.353 for scenario A and kappa = 0.291 for scenario B. Interrater
agreement was lower when only the most straightforward case in each
disease category was evaluated. Classification of CD by pattern of
disease behavior yields only fair interrater agreement. This raises
concerns regarding its applicability, particularly in ongoing studies of
genotype/phenotype associations. Further refinement of disease subtypes
and clear operational definitions are required.
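
The agreement calculation described above can be illustrated with a minimal sketch. The abstract does not say which multi-rater kappa variant was used; the code below assumes Fleiss' kappa, a common choice when a fixed number of raters each assign one category per case. The function names, the toy ratings, and the handling of values that fall exactly on a band boundary are illustrative assumptions, not the authors' method or data.

```python
from collections import Counter

# Behavior categories named in the abstract.
CATEGORIES = ("INF", "FS", "FP")


def fleiss_kappa(ratings, categories=CATEGORIES):
    """Fleiss' kappa (assumed variant; the abstract does not name one).

    ratings: list of cases, each a list with one category label per rater.
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per case")
    for case in ratings:
        if len(case) != n_raters:
            raise ValueError("every case needs the same number of raters")
        if any(label not in categories for label in case):
            raise ValueError("unknown category label")

    counts = [Counter(case) for case in ratings]

    # Observed agreement: mean over cases of the share of agreeing rater pairs.
    per_case = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_case) / n_cases

    # Chance agreement from the overall proportion of each category.
    total = n_cases * n_raters
    p_chance = sum((sum(c[cat] for c in counts) / total) ** 2 for cat in categories)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when only one category is used")

    return (p_observed - p_chance) / (1.0 - p_chance)


def strength_of_agreement(kappa):
    """Map kappa to the verbal bands quoted in the abstract.

    Assumption: each band includes its upper bound, so the gaps between
    the printed ranges (e.g. 0.2 to 0.21) fall into the lower band.
    """
    if kappa <= 0.2:
        return "poor"
    if kappa <= 0.4:
        return "fair"
    if kappa <= 0.6:
        return "moderate"
    if kappa <= 0.8:
        return "good"
    return "very good"


if __name__ == "__main__":
    # Made-up ratings for three cases by four raters; not study data.
    toy = [
        ["INF", "INF", "INF", "FS"],
        ["FS", "FS", "FP", "FS"],
        ["FP", "FP", "FP", "FP"],
    ]
    k = fleiss_kappa(toy)
    print(round(k, 3), strength_of_agreement(k))
    # The reported values both fall in the "fair" band.
    print(strength_of_agreement(0.353), strength_of_agreement(0.291))
```

With the study's final design, `ratings` would hold nine cases of 15 labels each, one list per scenario.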