For practical, psychometric, and pedagogical reasons, strong interest exists in developing multiple-measure constructed-response items for use in large-scale performance assessments. Items that can be scored for evidence of proficiency in two or more content areas raise questions, however, about the "fit" between various content areas and the possibility of sending cross-messages or confounding different content demands. To determine the factors that contribute to or compromise the effectiveness of multiscored items, in this study we combine analysis of statewide score data from the 1996 Maryland School Performance Assessment Program tests, administered at Grades 3, 5, and 8, with systematic analysis of 60 activities providing measures of writing, language usage (LU), or both, as well as one or more content areas. Although test developers to date have had greater success in creating writing/LU items that can also be scored for reading and social studies than for mathematics and science, we argue for the validity of multiple-measure items across all content areas and suggest that, across content areas, successful multiple-measure items (a) make information sources explicit and allow students to draw on both text-based and personal knowledge; (b) identify the specific content area skills or concepts being assessed; (c) permit open-ended development; (d) maintain a good fit between content demands and the rhetorical situation, creating an authentic writer-audience relationship; and (e) are uncluttered, focused, and direct, with good recall capability. Thus, we suggest that test developers reorient their concerns from the difficulty of items to identifying elements of multiple-measure activities that facilitate or impede students' ability to demonstrate what they know and can do in different content areas.