In this study, we investigated 3 factors that may contribute to the large
variation in student performance across open-ended measures. These factors
are content domain, format (whether the task required only pencil and paper
or involved a hands-on manipulation of equipment), and level of inquiry
(whether the task guided the student toward the solution or required the
student to develop a solution strategy). A group of 6 similar investigations
of acids and bases were developed from a common shell that controlled for
format and level of inquiry. Students completed 2 of these tasks as well as
tasks drawn from other content areas and a multiple-choice test of science.
Results did not bear out the hypothesis that tasks that were similar to each
other in content, level of inquiry, and format would correlate higher with
each other than with measures that differed on these dimensions. Post hoc
analyses of the tasks revealed unanticipated differences in developers'
interpretation of the shell that may have affected student performance.
Implications for large-scale use of performance measures are discussed.