A. H. Murphy, GENERAL DECOMPOSITIONS OF MSE-BASED SKILL SCORES - MEASURES OF SOME BASIC ASPECTS OF FORECAST QUALITY, Monthly Weather Review, 124(10), 1996, pp. 2353-2369
Skill scores defined as measures of relative mean square error, and
based on standards of reference representing climatology, persistence,
or a linear combination of climatology and persistence, are decomposed.
Two decompositions of each skill score are formulated: 1) a
decomposition derived by conditioning on the forecasts and 2) a
decomposition derived by conditioning on the observations. These
general decompositions
contain terms consisting of measures of statistical characteristics of
the forecasts and/or observations and terms consisting of measures of
basic aspects of forecast quality. Properties of the terms in the
respective decompositions are examined, and relationships among the
various skill scores, and the terms in the respective decompositions,
are described. Hypothetical samples of binary forecasts and
observations are
used to illustrate the application and interpretation of these
decompositions. Limitations on the inferences that can be drawn from
comparative verification based on skill scores, as well as from
comparisons based on the terms in decompositions of skill scores, are
discussed. The relationship between the application of measures of
aspects of quality and the application of the sufficiency relation (a
statistical relation that embodies the concept of unambiguous
superiority) is briefly
explored. The following results can be gleaned from this
methodological study. 1) Decompositions of skill scores provide
quantitative measures of, and insights into, multiple aspects of the
forecasts, the observations, and their relationship. 2) Superiority in
terms of overall skill is no guarantor of superiority in terms of other
aspects of quality.
3) Sufficiency (i.e., unambiguous superiority) generally cannot be
inferred solely on the basis of superiority over a relatively small set
of measures of specific aspects of quality. Neither individual measures
of overall performance (e.g., skill scores) nor sets of measures
associated with decompositions of such overall measures respect the
dimensionality of most verification problems. Nevertheless, the
decompositions described here identify parsimonious sets of measures of
basic aspects of forecast quality that should prove to be useful in
many verification problems encountered in the real world.
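
The two kinds of decomposition named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: it assumes sample climatology (the mean observation) as the standard of reference and uses the standard conditional-mean identities for mean square error, so the term names (`reliability`, `resolution`, `conditional_bias_2`, `discrimination`), the function names, and the ten-pair binary sample are all hypothetical choices made for illustration. The paper's own notation, and its treatment of persistence-based references, may differ.

```python
from collections import defaultdict


def mean(values):
    return sum(values) / len(values)


def variance(values):
    # Population (divide-by-n) variance, so that the MSE of a
    # sample-climatology forecast equals the variance of the observations.
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def conditional_terms(given, target):
    """Condition `target` on each distinct value of `given`.

    Returns (bias, spread):
      bias   = weighted mean of (given value - conditional mean of target)^2
      spread = weighted mean of (conditional mean - overall mean of target)^2
    """
    groups = defaultdict(list)
    for g, t in zip(given, target):
        groups[g].append(t)
    n = len(given)
    overall = mean(target)
    bias = sum(len(ts) * (g - mean(ts)) ** 2 for g, ts in groups.items()) / n
    spread = sum(len(ts) * (mean(ts) - overall) ** 2 for ts in groups.values()) / n
    return bias, spread


def skill_decompositions(forecasts, observations):
    """MSE skill score against sample climatology, computed three ways."""
    mse = mean([(f - x) ** 2 for f, x in zip(forecasts, observations)])
    var_x = variance(observations)
    var_f = variance(forecasts)
    if var_x == 0:
        raise ValueError("observations are constant; skill score undefined")

    # Conditioning on the forecasts: MSE = var_x + reliability - resolution.
    reliability, resolution = conditional_terms(forecasts, observations)
    # Conditioning on the observations: MSE = var_f + cb2 - discrimination.
    conditional_bias_2, discrimination = conditional_terms(observations, forecasts)

    return {
        "ss": 1 - mse / var_x,
        "ss_given_forecasts": (resolution - reliability) / var_x,
        "ss_given_observations": 1 - (var_f + conditional_bias_2 - discrimination) / var_x,
        "reliability": reliability,
        "resolution": resolution,
        "conditional_bias_2": conditional_bias_2,
        "discrimination": discrimination,
    }


# Hypothetical binary sample (not the data used in the paper).
forecasts = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
observations = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
result = skill_decompositions(forecasts, observations)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

All three skill-score entries agree up to rounding, because each decomposition is an exact rearrangement of the same mean square error. The individual terms are what differ between forecasting systems, which is why two systems with equal overall skill can still rank differently on reliability, resolution, or discrimination.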