Are multiple measures meaningful?: Lessons from a statewide performance assessment

Citation
Gl. Goldberg and Bs. Roswell, Are multiple measures meaningful?: Lessons from a statewide performance assessment, APPL MEAS E, 14(2), 2001, pp. 125-150
Citations number
21
Subject Categories
Education
Journal title
APPLIED MEASUREMENT IN EDUCATION
ISSN journal
08957347
Volume
14
Issue
2
Year of publication
2001
Pages
125 - 150
Database
ISI
SICI code
0895-7347(2001)14:2<125:AMMMLF>2.0.ZU;2-E
Abstract
For practical, psychometric, and pedagogical reasons, strong interest exists in developing multiple-measure constructed-response items for use in large-scale performance assessments. Items that can be scored for evidence of proficiency in 2 or more content areas raise questions, however, about the "fit" between various content areas and the possibility of sending cross-messages or confounding different content demands. To determine the factors that contribute to or compromise the effectiveness of multiscored items, in this study we combine analysis of statewide score data from the 1996 Maryland School Performance Assessment Program tests, administered at Grades 3, 5, and 8, with systematic analysis of 60 activities providing measures of writing, language usage (LU), or both, as well as one or more content areas. Although test developers to date have had greater success in creating writing/LU items that can also be scored for reading and social studies than for mathematics and science, we argue for the validity of multiple-measure items across all content areas and suggest that, across content areas, successful multiple-measure items (a) make information sources explicit and allow students to draw on both text-based and personal knowledge; (b) identify the specific content area skills or concepts being assessed; (c) permit open-ended development; (d) maintain a good fit between content demands and the rhetorical situation, creating an authentic writer-audience relationship; and (e) are uncluttered, focused, and direct, with good recall capability. Thus, we suggest that test developers reorient their concerns from the difficulty of items to identifying elements of multiple measure activities that facilitate or impede students' ability to demonstrate what they know and can do in different content areas.