G. Solano-Flores et al., On the development and evaluation of a shell for generating science performance assessments, INT J SCI E, 21(3), 1999, pp. 293-315
We constructed a shell (blueprint) for generating science performance assessments and evaluated the characteristics of the assessments produced with it. The shell addressed four tasks: Planning, Hands-On, Analysis, and Application. Two parallel assessments were developed, Inclines (IN) and Friction (FR). Two groups of fifth graders who differed in both science curriculum experience and socioeconomic status took the assessments consecutively in either of two sequences, IN --> FR or FR --> IN. We obtained high interrater reliabilities for both assessments, statistically significant score differences due to assessment administration sequence, and considerable task-sampling measurement error. For both assessments, the magnitude of score variation due to the hands-on task indicated that it tapped a kind of knowledge not addressed by the other three tasks. Although IN and FR were similar in difficulty, they correlated differently with an external measure of science achievement. Moreover, measurement error differed depending on assessment administration sequence. The results indicate that shells can produce reliable assessments, but do not solve the task-sampling variability problem or ensure assessment exchangeability. We conclude that future shell research should focus on: (a) increasing shell precision, (b) improving shell usability, and (c) determining what specifications must be provided by the shell to ensure that the assessments generated by different developers are comparable.