A method for measuring interrater agreement on checklists is presented. This technique does not assign individual scores to raters, but computes a single agreement score from the concordance of their check mark configurations. An overall coefficient of agreement, called phi, is derived. The agreement coefficient that is expected by chance and the statistical significance of phi are determined by statistical simulation. Despite the dichotomous nature of the checklist agreement (raters either agree or disagree on items), we show that the binomial distribution does not provide a means for testing the statistical significance of phi. A medical education study is used to illustrate the phi methodology.
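As a rough illustration of the simulation-based significance test described above, the sketch below assumes a simplified agreement coefficient (the proportion of checklist items on which two raters' check marks coincide) and estimates its chance distribution by Monte Carlo: each rater's checks are shuffled across items while preserving how many items that rater checked. The coefficient, the shuffling scheme, and all names (`agreement`, `simulate_null`, the example data) are illustrative assumptions, not the paper's exact derivation of phi.

```python
import numpy as np

rng = np.random.default_rng(0)

def agreement(marks_a, marks_b):
    """Assumed simplification of phi: proportion of items on which the two
    raters' check marks coincide (both checked or both left blank)."""
    marks_a = np.asarray(marks_a, dtype=bool)
    marks_b = np.asarray(marks_b, dtype=bool)
    return np.mean(marks_a == marks_b)

def simulate_null(marks_a, marks_b, n_sim=10_000):
    """Monte Carlo null distribution: shuffle each rater's checks across
    items, preserving the number of items each rater checked."""
    marks_a = np.asarray(marks_a, dtype=bool)
    marks_b = np.asarray(marks_b, dtype=bool)
    sims = np.empty(n_sim)
    for i in range(n_sim):
        sims[i] = agreement(rng.permutation(marks_a), rng.permutation(marks_b))
    return sims

# Hypothetical check-mark patterns for two raters on a 10-item checklist.
rater_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
rater_2 = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]

observed = agreement(rater_1, rater_2)
null = simulate_null(rater_1, rater_2)
chance_level = null.mean()               # agreement expected by chance
p_value = np.mean(null >= observed)      # one-sided Monte Carlo p-value

print(f"observed agreement = {observed:.2f}")
print(f"chance agreement   = {chance_level:.2f}")
print(f"p-value            = {p_value:.4f}")
```

Preserving each rater's total number of checks in the shuffle is one plausible way to model chance agreement; it is offered only to show the general shape of a simulation-based test, not as the paper's null model or its argument about the binomial distribution.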