G. Solano-Flores et al., On the development and evaluation of a shell for generating science performance assessments, INT J SCI E, 21(3), 1999, pp. 293-315
We constructed a shell (blueprint) for generating science performance assessments and evaluated the characteristics of the assessments produced with it. The shell addressed four tasks: Planning, Hands-On, Analysis, and Application. Two parallel assessments were developed, Inclines (IN) and Friction (FR). Two groups of fifth graders who differed in both science curriculum experience and socioeconomic status took the assessments consecutively in either of two sequences, IN --> FR or FR --> IN. We obtained high interrater reliabilities for both assessments, statistically significant score differences due to assessment administration sequence, and considerable task-sampling measurement error. For both assessments, the magnitude of score variation due to the hands-on task indicated that it tapped a kind of knowledge not addressed by the other three tasks. Although IN and FR were similar in difficulty, they correlated differently with an external measure of science achievement. Moreover, measurement error differed depending on assessment administration sequence. The results indicate that shells can produce reliable assessments, but do not solve the task-sampling variability problem or ensure assessment exchangeability. We conclude that future shell research should focus on: (a) increasing shell precision, (b) improving shell usability, and (c) determining what specifications must be provided by the shell to ensure that the assessments generated by different developers are comparable.