Background and Purpose. This study investigated whether the poor
reliability of judgments of posteroanterior (PA) spinal stiffness is due
to rater bias or is a consequence of raters each having individual
concepts of PA stiffness. Subjects. Three pairs of manipulative physical
therapists with a minimum of 5 years of experience took part in the
study. Methods. The raters were required to make stiffness judgments of a
series of metal springs, and their performance at this task was compared
with that obtained when they rated the PA stiffness of patients with low
back pain. A range of reliability indices were calculated and evaluated
to establish whether rater bias contributed to poor reliability. The
relationship between the raters' estimates of the magnitude of the
stimuli and the measured stiffness of the springs was also assessed using
the Pearson product-moment correlation coefficient. Results. The average
intraclass correlation coefficient (2,1) for rating spring stiffness was
.60, whereas for human spines it was .19. There was no evidence of rater
bias contributing to poor reliability for rating stiffness of human
spines. The average correlation between the raters' estimates of the
magnitude of the stimuli and the measured stiffness of the stimuli was
.80. Conclusion and Discussion. Physical therapists demonstrated much
better ability to judge spring stiffness than the PA stiffness of human
spines. This difference
in performance implies that mechanical stiffness is not equivalent to the
clinical concept of stiffness, and that individual interpretation of
stiffness as a construct may lead to rater disagreement in the clinic.
The reliability of judgments of PA spinal stiffness may be enhanced in
the future if its dimensions can be identified, defined, and taken into
account during clinical procedures.
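
For readers unfamiliar with the two statistics named in the Methods and Results, the following is a minimal sketch of how an intraclass correlation coefficient (2,1) and a Pearson correlation can be computed. It assumes the standard two-way random-effects, absolute-agreement, single-rater form of ICC(2,1); the study's own computational procedure is not described in this abstract. The function names and the ratings are hypothetical, chosen only to show that a constant offset between two raters lowers ICC(2,1) while leaving the Pearson correlation at 1.

```python
from math import sqrt


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per rated target (spring or spine),
    each row holding one score per rater. Raises ZeroDivisionError if
    the scores show no variation at all.
    """
    n = len(ratings)       # number of targets
    k = len(ratings[0])    # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k table with n >= 2 and k >= 2")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-targets mean square
    msc = ss_cols / (k - 1)              # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, NOT from the study: rater B always scores one point
# higher than rater A on three targets.
ratings_with_offset = [[1, 2], [2, 3], [3, 4]]
rater_a = [row[0] for row in ratings_with_offset]
rater_b = [row[1] for row in ratings_with_offset]

offset_icc = icc_2_1(ratings_with_offset)   # about 0.667: the offset is penalised
offset_r = pearson_r(rater_a, rater_b)      # 1.0: the ordering is identical
```

Because ICC(2,1) measures absolute agreement, a systematic difference between raters reduces it even when the raters rank the targets identically. This is one reason a study of rater bias might report several reliability indices alongside a correlation coefficient.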