ASSESSING THE RELIABILITY OF 2 TOXICITY SCALES - IMPLICATIONS FOR INTERPRETING TOXICITY DATA

Citation
M.D. Brundage et al., ASSESSING THE RELIABILITY OF 2 TOXICITY SCALES - IMPLICATIONS FOR INTERPRETING TOXICITY DATA, Journal of the National Cancer Institute, 85(14), 1993, pp. 1138-1148
Number of citations
42
Subject Categories
Oncology
Volume
85
Issue
14
Year of publication
1993
Pages
1138 - 1148
Database
ISI
SICI code
Abstract
Background: The toxicity of a given cancer therapy is an important end point in clinical trials examining the potential costs and benefits of that therapy. Treatment-related toxicity is conventionally measured with one of several toxicity criteria grading scales, even though the reliability and validity of these scales have not been established. Purpose: We determined the reliability of the National Cancer Institute of Canada Clinical Trials Group (NCIC-CTG) expanded toxicity scale and the World Health Organization (WHO) standard toxicity scale by use of a clinical simulation of actual patients. Methods: Seven experienced data managers each interviewed 12 simulated patients and scored their respective acute toxic effects. Inter-rater agreement (agreement between multiple raters of the same case) was calculated using the kappa (κ) statistic across all seven randomly assigned raters for each of 18 toxicity categories (13 NCIC-CTG and five WHO categories). Intra-rater agreement (agreement within the same rater on one case rated on separate occasions) was calculated using kappa over repeated cases (where raters were blinded to the repeated nature of the subjects). Proportions of agreement (estimate of the probability of two randomly selected raters assigning the same toxicity grade to a given case) were also calculated for inter-rater agreement. Since minor lack of agreement might have adversely affected these statistics of agreement, both kappa and proportion of agreement analyses were repeated for the following condensed grading categories: none (0) versus low-grade (1 or 2) versus high-grade (3 or 4) toxicity present. Results: Modest levels of inter-rater reliability were demonstrated in this study, with kappa values that ranged from 0.50 to 1.00 in laboratory-based categories and from -0.04 to 0.82 for clinically based categories. Proportions of agreement for clinical categories ranged from 0.52 to 0.98. Condensing the toxicity grades improved statistics of agreement, but substantial lack of agreement remained (kappa range, -0.04 to 0.82; proportions of agreement range, 0.67 to 0.98). Conclusions: Experienced data managers, when interviewing patients, draw varying conclusions regarding toxic effects experienced by such patients. Neither the NCIC-CTG expanded toxicity scale nor the WHO standard toxicity scale demonstrated a clear superiority in reliability, although the breadth of toxic effects recorded differed.
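
Illustrative note: the agreement statistics described in the Methods can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' procedure. The abstract does not say which multi-rater form of kappa was used, so Fleiss' kappa is assumed here; the function names and the small data set are hypothetical and do not come from the study.

```python
from collections import Counter


def condense(grade):
    """Map a 0-4 toxicity grade to none (0), low (1 or 2), or high (3 or 4)."""
    if grade == 0:
        return "none"
    return "low" if grade <= 2 else "high"


def pairwise_agreement(case_ratings):
    """Probability that two distinct raters of one case assign the same grade."""
    n = len(case_ratings)
    counts = Counter(case_ratings)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def fleiss_kappa(ratings):
    """Fleiss' kappa (assumed variant) for a list of cases.

    Each case is a list of grades, one per rater, with the same number
    of raters for every case.
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    # Observed agreement: mean pairwise agreement over all cases.
    p_obs = sum(pairwise_agreement(case) for case in ratings) / n_cases
    # Chance agreement: sum of squared overall category proportions.
    totals = Counter(g for case in ratings for g in case)
    p_exp = sum((c / (n_cases * n_raters)) ** 2 for c in totals.values())
    if p_exp == 1.0:
        return float("nan")  # kappa is undefined when only one grade is ever used
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical data: 3 simulated cases, each graded 0-4 by 4 raters.
hypothetical_ratings = [
    [0, 0, 0, 0],
    [1, 1, 2, 2],
    [3, 3, 3, 4],
]
condensed_ratings = [[condense(g) for g in case] for case in hypothetical_ratings]

kappa_full = fleiss_kappa(hypothetical_ratings)
kappa_condensed = fleiss_kappa(condensed_ratings)
```

In this made-up example the raters disagree only between adjacent grades (1 versus 2, 3 versus 4), so condensing the grades removes all disagreement and kappa rises to 1. The study reports that condensing helped in the same direction but left substantial disagreement in the real ratings.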