In 1993, we reported in the Journal of Educational Measurement (JEM) that task-sampling variability was the Achilles' heel of science performance assessment. To reduce measurement error, tasks needed to be stratified before sampling, sampled in large numbers, or possibly both. However, Cronbach, Linn, Brennan, and Haertel (1997) pointed out that a task-sampling interpretation of a large person × task variance component might be incorrect. Task and occasion sampling are confounded because tasks are typically given on only a single occasion, so the person × task (pt) source of measurement error is confounded with the person × task × occasion (pto) source. If pto variability accounts for a substantial part of the commonly observed pt interaction, stratifying tasks into homogeneous subsets, a cost-effective way of addressing task-sampling variability, might not increase accuracy, because stratification would not address the pto source of error.
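To make the confounding concrete, a minimal sketch in standard generalizability-theory notation (assuming the usual random-effects model; the symbols n_t and n_o, the numbers of sampled tasks and occasions, are illustrative and not from the original report) gives the relative error variance for a fully crossed person × task × occasion design as

\[
\sigma^2(\delta) \;=\; \frac{\sigma^2_{pt}}{n_t} \;+\; \frac{\sigma^2_{po}}{n_o} \;+\; \frac{\sigma^2_{pto,e}}{n_t\, n_o}.
\]

When every task is administered on a single occasion (n_o = 1), the data cannot separate \sigma^2_{pt} from \sigma^2_{pto,e}; the estimable "pt" component is their sum, so a large value is ambiguous between the two interpretations.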
Another conclusion reported in JEM was that only the direct observation (DO) and notebook (NB) methods of collecting performance assessment data were exchangeable; computer-simulation, short-answer, and multiple-choice methods were not. However, if Cronbach et al. were right, our exchangeability conclusion might be incorrect. After re-examining and reanalyzing the data, we found support for Cronbach et al. We concluded that large task-sampling variability was due to both the person × task and the person × task × occasion interactions. Moreover, we found that the direct observation, notebook, and computer-simulation methods were equally exchangeable, but their exchangeability was limited by the volatility of student performances across tasks and occasions.
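As a rough sketch of what separating the pt and pto components involves, the following Python fragment estimates random-effects variance components for a fully crossed person × task × occasion G-study from the standard expected-mean-squares equations. The function name, the simulated scores, and the array layout are hypothetical illustrations, not the data or software used in the reanalysis.

import numpy as np


def g_study_pxtxo(X):
    """Variance-component estimates for a fully crossed p x t x o G-study
    (random model, one score per person-task-occasion cell).

    X: array of shape (n_p, n_t, n_o). Negative estimates are truncated at 0.
    """
    n_p, n_t, n_o = X.shape
    grand = X.mean()

    # Marginal means for main effects and two-way interactions.
    m_p, m_t, m_o = X.mean(axis=(1, 2)), X.mean(axis=(0, 2)), X.mean(axis=(0, 1))
    m_pt, m_po, m_to = X.mean(axis=2), X.mean(axis=1), X.mean(axis=0)

    # Mean squares from the usual three-way ANOVA decomposition.
    ms = {
        "p": n_t * n_o * np.sum((m_p - grand) ** 2) / (n_p - 1),
        "t": n_p * n_o * np.sum((m_t - grand) ** 2) / (n_t - 1),
        "o": n_p * n_t * np.sum((m_o - grand) ** 2) / (n_o - 1),
        "pt": n_o * np.sum((m_pt - m_p[:, None] - m_t[None, :] + grand) ** 2)
              / ((n_p - 1) * (n_t - 1)),
        "po": n_t * np.sum((m_po - m_p[:, None] - m_o[None, :] + grand) ** 2)
              / ((n_p - 1) * (n_o - 1)),
        "to": n_p * np.sum((m_to - m_t[:, None] - m_o[None, :] + grand) ** 2)
              / ((n_t - 1) * (n_o - 1)),
    }
    resid = (X - m_pt[:, :, None] - m_po[:, None, :] - m_to[None, :, :]
             + m_p[:, None, None] + m_t[None, :, None] + m_o[None, None, :] - grand)
    ms["pto,e"] = np.sum(resid ** 2) / ((n_p - 1) * (n_t - 1) * (n_o - 1))

    # Solve the expected-mean-squares equations for the variance components.
    vc = {
        "pto,e": ms["pto,e"],
        "pt": (ms["pt"] - ms["pto,e"]) / n_o,
        "po": (ms["po"] - ms["pto,e"]) / n_t,
        "to": (ms["to"] - ms["pto,e"]) / n_p,
        "p": (ms["p"] - ms["pt"] - ms["po"] + ms["pto,e"]) / (n_t * n_o),
        "t": (ms["t"] - ms["pt"] - ms["to"] + ms["pto,e"]) / (n_p * n_o),
        "o": (ms["o"] - ms["po"] - ms["to"] + ms["pto,e"]) / (n_p * n_t),
    }
    return {k: max(v, 0.0) for k, v in vc.items()}


if __name__ == "__main__":
    # Hypothetical example: 30 persons, 6 tasks, 2 occasions of simulated scores.
    rng = np.random.default_rng(0)
    scores = rng.normal(size=(30, 6, 2))
    for name, est in g_study_pxtxo(scores).items():
        print(f"sigma^2_{name} = {est:.3f}")

Note that all seven components are estimable only when each task is observed on at least two occasions; with a single occasion the pt and pto,e terms collapse into one mean square, which is exactly the confounding discussed above.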