Evaluating the accuracy of judgments obtained from item review committees

Citation
G. Engelhard et al., Evaluating the accuracy of judgments obtained from item review committees, APPL MEAS E, 12(2), 1999, pp. 199-210
Number of citations
22
Subject categories
Education
Journal title
APPLIED MEASUREMENT IN EDUCATION
Journal ISSN
0895-7347
Volume
12
Issue
2
Year of publication
1999
Pages
199 - 210
Database
ISI
SICI code
0895-7347(1999)12:2<199:ETAOJO>2.0.ZU;2-#
Abstract
The purpose of this study is to examine whether the reviewers on item review committees can accurately identify test items that exhibit a variety of flaws. An instrument with 75 items was constructed and administered to 39 reviewers who were operational members of an item review committee. After undergoing training, the 39 reviewers were asked to examine the 75 items and indicate whether each item exhibited cultural or technical flaws. There were 8 cultural flaw categories (e.g., "Does the item unfairly favor males or females?") and 8 technical flaw categories (e.g., "Is the item content inaccurate or factually incorrect?"). The accuracy of the reviewers was defined in terms of the match between the judged classifications and the a priori classifications of the items into flaw categories. A new approach based on item response theory for examining rater accuracy was used to analyze the data (Engelhard, 1996). The data suggest that it is easier to identify some types of item flaws than others; specifically, the reviewers were more accurate in identifying items with cultural flaws than with technical flaws. The reviewers exhibited fairly high accuracy rates overall that ranged from 83% to 94%, and there are statistically significant differences in judgmental accuracy between the reviewers. Suggestions for future research on judgmental accuracy and the implications of this study for identifying biased items are discussed.
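
The abstract defines accuracy as the match between each reviewer's judged classifications and the a priori classifications. As a rough illustration only, that match can be summarized as a simple proportion of agreeing items per reviewer. This is not the item response theory approach the study used (Engelhard, 1996); the function name, reviewer labels, and data below are hypothetical and are not taken from the study.

```python
from typing import Dict, List


def accuracy_rates(judged: Dict[str, List[bool]],
                   a_priori: List[bool]) -> Dict[str, float]:
    """Proportion of items on which each reviewer agrees with the a priori key.

    Assumption: each classification is reduced to a single flawed / not
    flawed flag per item. The study's actual analysis was model-based.
    """
    rates = {}
    for reviewer, calls in judged.items():
        if len(calls) != len(a_priori):
            raise ValueError(f"{reviewer}: expected {len(a_priori)} judgments")
        # Count items where the judged flag equals the a priori flag.
        matches = sum(j == k for j, k in zip(calls, a_priori))
        rates[reviewer] = matches / len(a_priori)
    return rates


# Hypothetical data: four items, two reviewers (not from the study).
a_priori = [True, False, True, True]
judged = {
    "reviewer_A": [True, False, True, False],
    "reviewer_B": [True, False, True, True],
}
rates = accuracy_rates(judged, a_priori)
```

A raw proportion of this kind treats every item as equally hard to judge, which is one reason a model-based analysis such as the one cited in the abstract may be preferred.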