Note on sources of sampling variability in science performance assessments

Citation
R.J. Shavelson et al., Note on sources of sampling variability in science performance assessments, J EDUC MEAS, 36(1), 1999, pp. 61-71
Number of citations
11
Subject Categories
Psychology
Journal title
JOURNAL OF EDUCATIONAL MEASUREMENT
ISSN journal
0022-0655
Volume
36
Issue
1
Year of publication
1999
Pages
61 - 71
Database
ISI
SICI code
0022-0655(199921)36:1<61:NOSOSV>2.0.ZU;2-A
Abstract
In 1993, we reported in the Journal of Educational Measurement that task-sampling variability was the Achilles' heel of science performance assessment. To reduce measurement error, tasks needed to be stratified before sampling, sampled in large number, or possibly both. However, Cronbach, Linn, Brennan, & Haertel (1997) pointed out that a task-sampling interpretation of a large person x task variance component might be incorrect. Task and occasion sampling are confounded because tasks are typically given on only a single occasion. The person x task (pt) source of measurement error is then confounded with the pt x occasion (pto) source. If pto variability accounts for a substantial part of the commonly observed pt interaction, stratifying tasks into homogeneous subsets (a cost-effective way of addressing task-sampling variability) might not increase accuracy. Stratification would not address the pto source of error. Another conclusion reported in JEM was that only direct observation (DO) and notebook (NB) methods of collecting performance assessment data were exchangeable; computer simulation, short-answer, and multiple-choice methods were not. However, if Cronbach et al. were right, our exchangeability conclusion might be incorrect. After re-examining and reanalyzing data, we found support for Cronbach et al. We concluded that large task-sampling variability was due to both the person x task interaction and the person x task x occasion interaction. Moreover, we found that direct observation, notebook, and computer simulation methods were equally exchangeable, but their exchangeability was limited by the volatility of student performances across tasks and occasions.