We propose a new procedure for constructing inferences about a measure of
interobserver agreement in studies involving a binary outcome and multiple
raters. The proposed procedure, based on a chi-square goodness-of-fit test
as applied to the correlated binomial model (Bahadur, 1961, in Studies in
Item Analysis and Prediction, 158-176), is an extension of the
goodness-of-fit procedure developed by Donner and Eliasziw (1992,
Statistics in Medicine 11, 1511-1519) for the case of two raters. The new
procedure is shown to provide confidence-interval coverage levels that are
close to nominal over a wide range of parameter combinations. The procedure
also provides a sample-size formula that may be used to determine the
required number of subjects and raters for such studies.