H.R. Rubin et al., HOW RELIABLE IS PEER-REVIEW OF SCIENTIFIC ABSTRACTS - LOOKING BACK AT THE 1991 ANNUAL-MEETING OF THE SOCIETY OF GENERAL INTERNAL-MEDICINE, Journal of General Internal Medicine, 8(5), 1993, pp. 255-258
Objective: To evaluate the interrater reproducibility of scientific abstract review.
Design: Retrospective analysis.
Setting: Review for the 1991 Society of General Internal Medicine (SGIM) annual meeting.
Subjects: 426 abstracts in seven topic categories evaluated by 55 reviewers.
Measurements: Reviewers rated abstracts from 1 (poor) to 5 (excellent), globally and on three specific dimensions: interest to the SGIM audience, quality of methods, and quality of presentation. Each abstract was reviewed by five to seven reviewers. Each reviewer's ratings of the three dimensions were added to compute that reviewer's summary score for a given abstract. The mean of all reviewers' summary scores for an abstract, the final score, was used by SGIM to select abstracts for the meeting.
Results: Final scores ranged from 4.6 to 13.6 (mean = 9.9). Although 222 abstracts (52%) were accepted for publication, the 95% confidence interval around the final score of 300 (70.4%) of the 426 abstracts overlapped with the threshold for acceptance of an abstract. Thus, these abstracts were potentially misclassified. Only 36% of the variance in summary scores was associated with an abstract's identity, 12% with the reviewer's identity, and the remainder with idiosyncratic reviews of abstracts. Global ratings were more reproducible than summary scores.
Conclusion: Reviewers disagreed substantially when evaluating the same abstracts. Future meeting organizers may wish to rank abstracts using global ratings, and to experiment with structured review criteria and other ways to improve raters' agreement.
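The scoring procedure described under Measurements is simple arithmetic: sum one reviewer's three dimension ratings to get that reviewer's summary score, then average the summary scores across reviewers to get the final score. The following is a minimal sketch of that arithmetic, assuming hypothetical ratings and an assumed acceptance threshold (neither is given in the abstract); it is an illustration, not the authors' actual analysis code.

    # Minimal sketch of the SGIM scoring arithmetic described in the abstract.
    # Ratings and the acceptance threshold below are hypothetical example values.
    import math
    import statistics

    # Each inner list holds one reviewer's 1-5 ratings on the three dimensions:
    # interest to the SGIM audience, quality of methods, quality of presentation.
    reviewer_ratings = [
        [4, 3, 4],
        [3, 3, 3],
        [5, 4, 4],
        [2, 3, 3],
        [4, 4, 5],
    ]

    # Summary score: one reviewer's three dimension ratings added together (range 3-15).
    summary_scores = [sum(r) for r in reviewer_ratings]

    # Final score: mean of all reviewers' summary scores for this abstract.
    final_score = statistics.mean(summary_scores)

    # 95% confidence interval around the final score (normal approximation on the
    # standard error of the mean), used to check overlap with the acceptance cutoff.
    se = statistics.stdev(summary_scores) / math.sqrt(len(summary_scores))
    ci_low, ci_high = final_score - 1.96 * se, final_score + 1.96 * se

    ACCEPTANCE_THRESHOLD = 10.0  # assumed for illustration; not reported in the abstract
    potentially_misclassified = ci_low <= ACCEPTANCE_THRESHOLD <= ci_high

    print(f"final score = {final_score:.1f}, 95% CI = ({ci_low:.1f}, {ci_high:.1f})")
    print(f"CI overlaps acceptance threshold: {potentially_misclassified}")

An abstract whose confidence interval straddles the threshold, as in this example, is what the Results section counts as potentially misclassified.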