In many clinical studies more than one observer may be rating a
characteristic measured on an ordinal scale. For example, a study may involve a group
of physicians rating a feature seen on a pathology specimen or a computed
tomography scan. In clinical studies of this kind, the weighted kappa
coefficient is a popular measure of agreement for ordinally scaled ratings.
Our research stems from a study in which the severity of inflammatory skin disease
was rated. The investigators wished to determine and evaluate the strength
of agreement between a variable number of observers taking into account
patient-specific (age and gender) as well as rater-specific (whether board
certified in dermatology) characteristics. This suggested modelling kappa as a
function of these covariates. We propose the use of generalized estimating
equations to estimate the weighted kappa coefficient. This approach also
accommodates unbalanced data, which arise when some subjects are not judged
by the same set of observers. Currently, an estimate of overall kappa for a
simple unbalanced data set without covariates involving more than two
observers is unavailable. In the inflammatory skin disease study, none of
the covariates were significantly associated with kappa, thus enabling the
calculation of an overall weighted kappa for this unbalanced data set. In
the second motivating example (multiple sclerosis), geographic location was
significantly associated with kappa. We also compared the results of our
method with current methods of testing for heterogeneity of weighted kappa
coefficients across strata (geographic location) that are available for
balanced data sets.
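
For orientation, the quantity being modelled can be illustrated with a minimal
sketch of the standard two-rater weighted kappa on balanced data. This is not
the generalized estimating equations approach proposed here, and it does not
handle covariates, more than two observers, or unbalanced designs. The function
name, the integer coding of categories as 0..k-1, and the choice of linear or
quadratic disagreement weights are assumptions made for illustration only.

```python
import numpy as np


def weighted_kappa(ratings_a, ratings_b, n_categories, weights="linear"):
    """Two-rater weighted kappa for ordinal ratings coded 0..n_categories-1.

    Illustrative only: a hypothetical helper, not the GEE estimator.
    """
    k = n_categories
    # Observed joint proportions of (rater A category, rater B category).
    observed = np.zeros((k, k))
    for a, b in zip(ratings_a, ratings_b):
        observed[a, b] += 1
    observed /= observed.sum()

    # Expected joint proportions if the two raters were independent.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))

    # Disagreement weights: 0 on the diagonal, growing with category distance.
    i, j = np.indices((k, k))
    disagreement = np.abs(i - j) / (k - 1)
    if weights == "quadratic":
        disagreement = disagreement ** 2

    # Kappa = 1 - (weighted observed disagreement / weighted chance disagreement).
    return 1.0 - (disagreement * observed).sum() / (disagreement * expected).sum()


if __name__ == "__main__":
    # Made-up severity ratings on a three-point scale, for demonstration only.
    rater_a = [0, 1, 2, 2, 1, 0, 1, 2]
    rater_b = [0, 1, 2, 1, 1, 0, 2, 2]
    print(weighted_kappa(rater_a, rater_b, n_categories=3))
```

With disagreement weights, identical ratings give a value of 1 and ratings
that agree only at the chance level give a value of 0.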