Science performance assessments (SPAs), in which students conduct hands-on investigations and report and interpret their findings, have been used in large-scale testing programs for years and have recently entered the international achievement arena. This paper argues that, in light of the cost, logistical, and administrative challenges of fielding these assessments, a performance assessment technology needs to be developed. Such a technology would provide tools for assessment construction and evaluation that address these challenges. Here we report on our research toward this goal, albeit without complete success. We present a conceptual framework for defining, generating, and evaluating assessments. An SPA is defined by a task, a response demand, and a scoring system. We identify types of tasks and response demands (e.g., comparative: compare alternatives and reach a conclusion), link them to regularities in scoring systems that focus on the scientific validity of students' investigations, and evaluate the technical qualities of the assessments within the context of generalizability theory (reliability and validity). We summarize the development and evaluation process, conceptually and step by step, providing concrete examples. We conclude that work remains to be done to determine whether a performance assessment technology is a feasible goal.