For practical, psychometric, and pedagogical reasons, strong interest exists in developing multiple-measure constructed-response items for use in large-scale performance assessments. Items that can be scored for evidence of proficiency in two or more content areas raise questions, however, about the "fit" between various content areas and the possibility of sending cross-messages or confounding different content demands. To determine the factors that contribute to or compromise the effectiveness of multiscored items, in this study we combine analysis of statewide score data from the 1996 Maryland School Performance Assessment Program tests, administered at Grades 3, 5, and 8, with systematic analysis of 60 activities providing measures of writing, language usage (LU), or both, as well as one or more content areas. Although test developers to date have had greater success in creating writing/LU items that can also be scored for reading and social studies than for mathematics and science, we argue for the validity of multiple-measure items across all content areas and suggest that, across content areas, successful multiple-measure items (a) make information sources explicit and allow students to draw on both text-based and personal knowledge; (b) identify the specific content area skills or concepts being assessed; (c) permit open-ended development; (d) maintain a good fit between content demands and the rhetorical situation, creating an authentic writer-audience relationship; and (e) are uncluttered, focused, and direct, with good recall capability. Thus, we suggest that test developers reorient their concerns from the difficulty of items to identifying elements of multiple-measure activities that facilitate or impede students' ability to demonstrate what they know and can do in different content areas.