EXAMINATION OF THE ASSUMPTIONS AND PROPERTIES OF THE GRADED ITEM RESPONSE MODEL - AN EXAMPLE USING A MATHEMATICS PERFORMANCE ASSESSMENT

Citation
S. Lane et al., EXAMINATION OF THE ASSUMPTIONS AND PROPERTIES OF THE GRADED ITEM RESPONSE MODEL - AN EXAMPLE USING A MATHEMATICS PERFORMANCE ASSESSMENT, Applied measurement in education, 8(4), 1995, pp. 313-340
Number of citations
30
Subject Categories
Psychology, Educational; Psychology, Experimental; Education & Educational Research
ISSN journal
08957347
Volume
8
Issue
4
Year of publication
1995
Pages
313 - 340
Database
ISI
SICI code
0895-7347(1995)8:4<313:EOTAAP>2.0.ZU;2-O
Abstract
With the growing popularity of performance assessments over the last decade, the use of item response theory (IRT) models for polytomously scored items has increased. However, prior to applying the graded item response model to data derived from a performance assessment, studies are needed to ensure that the assumptions and item parameter properties of the models are satisfied. This study examined the dimensionality of a mathematics performance assessment, the extent to which a subset of the tasks is speeded, and the extent to which the item parameter estimates are stable over time. The results from confirmatory factor analyses on three testing occasions indicated that the mathematics performance assessment is unidimensional on each occasion. For two of the eight tasks that were examined for "speededness," the threshold and slope parameter estimates were not stable over two conditions of administration time (i.e., approximately 5 vs. 10 min), and for another two tasks, only the slope parameter estimates were not stable over the two conditions of administration time. The analysis of the stability of item parameter estimates over time indicated that, from the fall of 1991 to the spring of 1992, the parameter estimates were stable. However, from the fall of 1992 to the spring of 1993, both the slope and threshold parameter estimates were variant for 2 of the 33 tasks, and for another two tasks, only the threshold estimates differed. Some potential reasons for the instability of the item parameter estimates and the speededness of tasks are discussed. For example, the differential emphasis on instructional content between testing occasions may affect the stability of item parameters over time.