Interrater reliability of sleep stage scoring according to Rechtschaffen and Kales rules (RKR): A review and methodological considerations

Citation
H. Danker-Hopfe and Wm. Herrmann, Interrater reliability of sleep stage scoring according to Rechtschaffen and Kales rules (RKR): A review and methodological considerations, KLIN NEUROP, 32(2), 2001, pp. 89-99
Number of citations
52
Subject Categories
Neurology
Journal title
KLINISCHE NEUROPHYSIOLOGIE
Journal ISSN
1434-0275
Volume
32
Issue
2
Year of publication
2001
Pages
89-99
Database
ISI
SICI code
1434-0275(200106)32:2<89:IROSSS>2.0.ZU;2-5
Abstract
A literature review was conducted on the interrater reliability of sleep stage scoring according to the Rechtschaffen and Kales rules, both between two raters and among more than two raters. These results were compared with the interrater reliability between visual scorings and semiautomatic as well as fully automated scorings. For single-night scorings the interrater reliability varies between 61% and 96%, while at the group level the agreement between visual scorings varies between 85% and 95%, with an average of approximately 89%. The interrater reliability between visual and automatic scoring at the group level varies between 70% and 95%, with an average of about 83%. The interrater reliability of sleep stage scorings varies with the number and experience of the scorers, the choice of the 100% reference (if two or more human experts are involved), the number of stages that are distinguished, the sample (healthy subjects vs. patients with sleep disturbances), the age of the subjects, and the choice of the statistical method used to estimate interrater reliability. Based on the review of interrater reliability data, methodological considerations on the measurement of interrater reliability are presented and discussed. For variables measured on different scales (quantitative sleep parameters measured on a metric scale vs. sleep stages as qualitative variables measured on a nominal scale), different approaches to estimating interrater reliability are used. For sleep parameters measured on a metric scale, the advantages and disadvantages of correlation statistics on the one hand and of approaches that test group differences on the other are discussed. Among the correlation approaches, the intraclass correlation should be the method of choice, and with regard to approaches that test group differences, the paired nature of the data has to be considered. Only a combination of both statistical approaches yields a comprehensive impression of the interrater reliability of the scoring results. For sleep stages, which represent nominally scaled qualitative data, agreement is commonly expressed as a percentage. Although this is a simple measure that is readily understood, it is not an adequate index of agreement, since it makes no allowance for agreement between scorers that might be attributed to chance alone. This disadvantage is overcome by the kappa statistic (by Cohen for two scorers and by Fleiss for more than two scorers), which expresses the difference between observed and chance agreement in relation to the maximum possible excess of observed over chance agreement. Kappa usually varies between 0 (agreement is equal to chance) and 1 (complete agreement between scorers). Values <0, which are rarely observed, indicate a systematic deviation in agreement.
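To make the agreement measures named in the abstract concrete, the sketch below computes percentage agreement and Cohen's kappa for two scorers, with kappa defined as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from the scorers' marginal stage frequencies. This is a minimal illustration, not the paper's analysis: the stage labels and epoch data are invented, and the function names are ours.

```python
# A minimal sketch, assuming two scorers' hypnograms are available as
# equal-length lists of nominal stage labels (one label per scored epoch).
# Stage names follow Rechtschaffen and Kales; the data are hypothetical.
from collections import Counter

def percent_agreement(a, b):
    """Raw agreement: fraction of epochs both scorers labelled identically."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohen_kappa(a, b):
    """Cohen's kappa for two scorers: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = percent_agreement(a, b)
    # Chance agreement p_e from the two scorers' marginal stage frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    p_e = sum(freq_a[s] * freq_b[s] for s in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

scorer_1 = ["W", "S1", "S2", "S2", "S3", "S4", "REM", "S2"]
scorer_2 = ["W", "S2", "S2", "S2", "S3", "S4", "REM", "S1"]
print(f"percent agreement: {percent_agreement(scorer_1, scorer_2):.2f}")  # 0.75
print(f"Cohen's kappa:     {cohen_kappa(scorer_1, scorer_2):.2f}")        # 0.68
```

The example shows the point made in the abstract: raw agreement (0.75) overstates reliability, because part of it is expected by chance alone; kappa corrects for this (0.68 here). For more than two scorers, Fleiss' extension of kappa applies, as the abstract notes.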