R.A. Lordo et al., Comparing and evaluating alternative (in vitro) tests on their ability to predict the Draize maximum average score, TOX VITRO, 13(1), 1999, pp. 45-72
The Cosmetic, Toiletry, and Fragrance Association (CTFA) Evaluation of
Alternatives Program comprised a multi-phased study of the relationship between
Draize eye irritation test data and comparable data from a selection of
promising alternative (in vitro) tests. The CTFA Program was designed to
determine the effectiveness and limitations of several in vitro tests over a
range of different cosmetic and personal-care product types. Test materials
constituted experimental formulations representative of three distinct
product types. Each material was tested in vivo (according to a modified Draize
eye irritation test protocol) and in vitro (according to one of up to forty
different protocols). A statistical ranking and selection procedure
("concordance analysis") was used to identify those in vitro tests where
the relationship between in vitro and in vivo scores was sufficiently well
defined to warrant further statistical analysis. In vitro test performance
was then evaluated by regression modelling of these relationships. Maximum
average Draize score (MAS) was utilized as the primary quantitative measure
of eye irritation potential in vivo. The goodness-of-fit of the observed
data to the regression model and comparison of the magnitude of upper and
lower prediction-bounds on the range of probable MAS values associated with
the regression model fit (prediction intervals) provide a means by which the
performance of each in vitro test may be measured relative to Draize test
outcome. The narrower the prediction interval (i.e. the more precise the
fit), the more predictive of in vivo score (MAS) is the in vitro test
result. The
prediction interval thus represents uncertainty associated with Draize test
prediction. Such uncertainty depends heavily on the degree of irritancy.
In Phases I and II, the widths of the prediction intervals were narrowest in
the region corresponding to low irritation potential; increasing widths
were observed as irritation potential increased. In Phase III, relatively
narrow prediction interval widths were observed at both the low and high
ends of the observed range of irritation potential; wider intervals were
observed in the middle of the observed range. In general, the selected endpoints
in each phase had similar average prediction interval widths and thereby
differed only slightly in their ability to predict MAS to a given level of
precision; any differences between endpoints tended to occur at the low
and/or high ends of the observed range of irritation potential. The primary
contributor to total variability associated with prediction of MAS is the
deviation between the Draize score as observed in the laboratory and what is
predicted by the model for a given formulation. Consistently, this component
is responsible for 70% to 95% of the total variability. The other components
(i.e. variability among replicate MAS and in vitro scores) could be reduced
simply by increasing the number of replicate tests performed on each test
formulation. However, this would have relatively little impact on the
overall precision of prediction. (C) 1999 Elsevier Science Ltd. All rights
reserved.
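
As an illustration of the prediction-interval idea described above, the following is a minimal sketch of a 95% prediction interval for a new MAS value under a simple straight-line least-squares fit. The abstract does not state the form of the regression models used in the CTFA Program, so the linear model, the function names (fit_line, prediction_interval) and the data values are all assumptions made for illustration only; the critical value 2.306 is the two-sided 95% Student t quantile for 8 degrees of freedom.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x.

    Returns (a, b, s, xbar, sxx, n), where s is the residual standard deviation.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    return a, b, s, xbar, sxx, n

def prediction_interval(fit, x0, t_crit):
    """Lower and upper prediction bounds for one new observation at x0."""
    a, b, s, xbar, sxx, n = fit
    centre = a + b * x0
    half = t_crit * s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)
    return centre - half, centre + half

# Hypothetical data: in vitro scores and observed MAS for ten formulations.
IN_VITRO = [5, 12, 20, 27, 35, 41, 50, 58, 66, 75]
MAS = [3, 8, 11, 19, 22, 30, 33, 41, 44, 52]

FIT = fit_line(IN_VITRO, MAS)
T_CRIT = 2.306  # two-sided 95% t quantile, n - 2 = 8 degrees of freedom

for x0 in (5, 40, 75):
    lower, upper = prediction_interval(FIT, x0, T_CRIT)
    print(f"in vitro = {x0}: predicted MAS in [{lower:.1f}, {upper:.1f}], "
          f"width {upper - lower:.1f}")
```

In this plain linear sketch the interval is always narrowest near the mean in vitro score, which differs from the patterns reported for Phases I to III, so the models in the study presumably differ from it in some respect. The sketch also suggests why extra replicates help little: if interval width is taken to scale with the square root of total variance, and the model-deviation component is, say, 80% of that variance, then removing all replicate variability would narrow the interval by only about 1 - sqrt(0.8), roughly 11%.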