J. Strayhorn et al., AN INTERVENTION TO IMPROVE THE RELIABILITY OF MANUSCRIPT REVIEWS FOR THE AMERICAN-ACADEMY-OF-CHILD-AND-ADOLESCENT-PSYCHIATRY, The American Journal of Psychiatry, 150(6), 1993, pp. 947-952
Objective: The effects of methods used to improve the interrater reliability of reviewers' ratings of manuscripts submitted to the journal of the American Academy of Child and Adolescent Psychiatry were studied.
Method: Reviewers' ratings of consecutive manuscripts submitted over approximately 1 year were first analyzed; 296 pairs of ratings were studied. Intraclass correlations and confidence intervals for the correlations were computed for the two main ratings by which reviewers quantified the quality of the article: a 1-10 overall quality rating and a recommendation for acceptance or rejection with four possibilities along that continuum. Modifications were then introduced, including a multi-item rating scale and two training manuals to accompany it. Over the next year, 272 more articles were rated, and reliabilities were computed for the new scale and for the scales previously used.
Results: The intraclass correlation of the most reliable rating before the intervention was 0.27; the reliability of the new rating procedure was 0.43. The difference between these two was significant. The reliability of the new rating scale was in the fair to good range, and it became even better when the ratings of the two reviewers were averaged and the reliability stepped up by the Spearman-Brown formula. The new rating scale had excellent internal consistency and correlated highly with other quality ratings.
Conclusions: The data confirm that the reliability of ratings of scientific articles may be improved by increasing the number of rating scale points, eliciting ratings of separate, concrete items rather than a global judgment, using training manuals, and averaging the scores of multiple reviewers.
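The "stepped-up" reliability in the Results refers to the Spearman-Brown prophecy formula, which projects the reliability of an average of k parallel ratings from the single-rating reliability. A minimal sketch (the function name and the two-reviewer illustration are ours, not the paper's; the paper reports only that the averaged reliability improved):

```python
def spearman_brown(r: float, k: int) -> float:
    """Reliability of the average of k parallel ratings, given the
    single-rating reliability r (Spearman-Brown prophecy formula)."""
    return k * r / (1 + (k - 1) * r)

# Illustrative: averaging two reviewers on the new scale (r = 0.43)
stepped_up = spearman_brown(0.43, 2)
print(round(stepped_up, 2))  # 0.6
```

With the reported single-reviewer intraclass correlation of 0.43, averaging two reviewers projects to roughly 0.60, consistent with the abstract's claim that averaging improved reliability.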