Objective: To identify the fracture characteristics that can be reliably
assessed by analysis of plain radiographs of tibial plateau fractures.
Design: Radiographic review study.
Participants: Five orthopaedic traumatologists served as observers.
Intervention: Observers made assessments based on the radiographs of
fifty-six tibial plateau fractures. Precise definitions of the assessments
to be made were agreed on by all observers. The tested assessments included
raters' abilities to identify and locate fracture lines, identify the
presence of fracture displacement and comminution, make quantitative
measurements of displacement, and characterize qualitative features of
fractures. For thirty-eight of the fractures that had a computed tomography
(CT) scan available, assessments were repeated using both radiographs and
CT scans.
Main Outcome Measures: To characterize interobserver reliability,
percentage agreement and kappa statistics were calculated for categorical
variables, and intraclass correlation coefficients (ICCs) were calculated
for noncategorical variables.
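As an illustration of the categorical measures named above, the following is a minimal sketch of percentage agreement and a multi-rater kappa. It rests on several assumptions: the abstract does not state which kappa variant was used, so Fleiss' kappa is assumed here; the function names and the `ratings` data (four fractures, five raters, a yes/no displacement call) are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import combinations


def pairwise_percent_agreement(ratings):
    """Fraction of rater pairs that agree, pooled over all subjects.

    ratings: list of per-subject lists, one category label per rater.
    """
    agree = 0
    total = 0
    for subject in ratings:
        for a, b in combinations(subject, 2):
            agree += a == b
            total += 1
    return agree / total


def fleiss_kappa(ratings):
    """Fleiss' kappa (assumed variant), same number of raters per subject."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for subject in ratings for label in subject})
    counts = [Counter(subject) for subject in ratings]

    # Observed agreement: mean over subjects of the per-subject agreement.
    per_subject = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement from the overall proportion of each category.
    proportions = [
        sum(c[k] for c in counts) / (n_subjects * n_raters) for k in categories
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        # Every rating falls in one category; kappa is undefined.
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical data: 4 fractures, 5 raters, "is the fracture displaced?"
ratings = [
    ["yes", "yes", "yes", "yes", "yes"],
    ["yes", "yes", "yes", "yes", "no"],
    ["no", "no", "no", "no", "no"],
    ["yes", "no", "no", "yes", "no"],
]

print(pairwise_percent_agreement(ratings))
print(fleiss_kappa(ratings))
```

On this made-up data the percentage agreement is 0.75 while kappa is roughly 0.49, which shows how correcting for chance agreement can lower an apparently high agreement figure.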
Results: Reliability of the assessments varied widely. Determining the
location of fracture lines had the greatest reliability, whereas the
subjective assessments of fracture stability and energy showed the poorest
reliability. Although the ICCs for quantitative measurements approached
acceptable levels, the tolerance limits were extremely wide. The addition
of a CT scan improved the reliability of most assessments, but not to a
statistically significant degree.
Conclusions: Many basic radiographic interpretations relied on in making
treatment decisions are made variably by observers. Using experienced
raters and precise definitions of fracture assessments does not guarantee a
high level of agreement. Discrete assessments have higher interrater
agreement than do more qualitative assessments. Quantitative measures have
wide tolerance limits and, therefore, probably cannot be used reproducibly
to classify fractures or make treatment decisions. We conclude that the
reliability of fracture classification is limited by raters' abilities to
agree on basic radiographic assessments.