The performance of ensemble prediction systems (EPSs) is investigated by examining the probability distribution of 500-hPa geopotential height over Europe. The probability score (or half Brier score) is used to evaluate the quality of probabilistic forecasts of a single binary event. The skill of an EPS is assessed by comparing its performance, in terms of the probability score, to the performance of a reference probabilistic forecast. The reference forecast is based on the control forecast of the system under consideration, using model error statistics to estimate a probability distribution.
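The scoring setup described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code. It assumes that the event is "Z500 exceeds a threshold" and that control-forecast errors are Gaussian with zero mean and a standard deviation `error_sd` taken from past error statistics. The threshold, `error_sd` and all data values are hypothetical. The score is called "half" Brier because Brier's original score sums the squared error over the event and its complement, which for a binary event gives twice (p - o)^2.

```python
from statistics import NormalDist


def reference_probability(control, threshold, error_sd):
    """P(Z500 > threshold), assuming truth ~ N(control, error_sd**2).

    The Gaussian error model is an assumption made for this sketch.
    """
    return 1.0 - NormalDist(mu=control, sigma=error_sd).cdf(threshold)


def probability_score(probs, outcomes):
    """Half Brier score: mean of (p - o)**2, where o = 1 if the event occurred, else 0."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def skill_score(ps, ps_ref):
    """1 - PS/PS_ref: positive when the forecast beats the reference, 1 if perfect."""
    return 1.0 - ps / ps_ref


# Hypothetical example: event "Z500 > 5550 m" at one grid point on four days.
threshold = 5550.0
controls = [5520.0, 5580.0, 5610.0, 5490.0]   # control forecasts (m)
eps_probs = [0.1, 0.9, 1.0, 0.0]              # fraction of EPS members above threshold
observed = [5540.0, 5570.0, 5650.0, 5470.0]   # verifying analyses (m)

outcomes = [1 if z > threshold else 0 for z in observed]
ref_probs = [reference_probability(c, threshold, error_sd=40.0) for c in controls]
ss = skill_score(probability_score(eps_probs, outcomes),
                 probability_score(ref_probs, outcomes))
```

In practice `error_sd` would be estimated separately for each lead time, because control-forecast errors grow with forecast range.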
A decomposition of the skill score is applied to distinguish between the two main aspects of forecast performance: reliability and resolution. The contributions of the ensemble mean and the ensemble spread to the performance of an EPS are evaluated by comparing its skill score with the skill score of a probabilistic forecast based on the EPS mean, again using model error statistics to estimate a probability distribution.
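The abstract does not say exactly which decomposition is used. A common choice, assumed here, is Murphy's partition of the probability score into reliability (REL), resolution (RES) and uncertainty (UNC), PS = REL - RES + UNC. Both forecasts are verified against the same outcomes, so UNC cancels in the skill score. The skill score then splits as SS = (REL_ref - REL)/PS_ref + (RES - RES_ref)/PS_ref, which gives a reliability term and a resolution term. The function names below are hypothetical.

```python
from collections import defaultdict


def murphy_decomposition(probs, outcomes):
    """Return (REL, RES, UNC) such that PS = REL - RES + UNC.

    Forecasts are grouped by their exact probability value, which suits
    ensemble fractions k/N. Continuous probabilities would first be binned,
    which makes the identity only approximate.
    """
    n = len(probs)
    base_rate = sum(outcomes) / n
    groups = defaultdict(list)
    for p, o in zip(probs, outcomes):
        groups[p].append(o)
    rel = res = 0.0
    for p, obs in groups.items():
        freq = sum(obs) / len(obs)                  # observed frequency given forecast p
        rel += len(obs) * (p - freq) ** 2           # reliability: lower is better
        res += len(obs) * (freq - base_rate) ** 2   # resolution: higher is better
    return rel / n, res / n, base_rate * (1.0 - base_rate)


def skill_contributions(probs, ref_probs, outcomes):
    """Split SS = 1 - PS/PS_ref into (reliability term, resolution term).

    UNC is identical for both forecasts because they share the same outcomes.
    """
    rel, res, _ = murphy_decomposition(probs, outcomes)
    rel_ref, res_ref, unc = murphy_decomposition(ref_probs, outcomes)
    ps_ref = rel_ref - res_ref + unc
    return (rel_ref - rel) / ps_ref, (res - res_ref) / ps_ref
```

A sharper forecast can gain resolution while losing reliability. The two terms show that trade-off separately, where the total skill score alone would hide it.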
The performance of the European Centre for Medium-Range Weather Forecasts (ECMWF) EPS is reviewed. The system is skillful (with respect to the reference forecast) from +96 h onward. In terms of reliability alone, there is some skill from +48 h. The performance comes mainly from the contribution of the ensemble mean. The contribution of the ensemble spread is slightly negative but becomes positive after calibration of the EPS standard deviation. The calibration mainly improves the reliability contribution to the skill score. The calibrated EPS is skillful from +72 h onward.
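The abstract does not describe how the EPS standard deviation is calibrated. One simple possibility, assumed here for illustration only, is to rescale each member's deviation from the ensemble mean by a factor alpha. Alpha is chosen so that the average ensemble variance over a training sample matches the mean squared error of the ensemble mean. A value of alpha > 1 then indicates an under-dispersive ensemble. The event probability is taken as the fraction of members above the threshold. The function names are hypothetical, and a finite-ensemble correction to alpha is ignored.

```python
from math import sqrt
from statistics import fmean, pvariance


def spread_inflation_factor(ensembles, verifications):
    """alpha = RMS error of the ensemble mean / RMS ensemble spread.

    alpha > 1 means the ensemble is under-dispersive on average.
    """
    mse = fmean([(fmean(m) - v) ** 2 for m, v in zip(ensembles, verifications)])
    mean_var = fmean([pvariance(m) for m in ensembles])
    return sqrt(mse / mean_var)


def calibrate(members, alpha):
    """Scale deviations from the ensemble mean by alpha; the mean is unchanged."""
    mu = fmean(members)
    return [mu + alpha * (x - mu) for x in members]


def fraction_exceeding(members, threshold):
    """Event probability as the fraction of members above the threshold."""
    return sum(x > threshold for x in members) / len(members)
```

Because only the spread changes and the ensemble mean is left untouched, this kind of rescaling affects how well the stated probabilities match observed frequencies. That is consistent with the abstract's observation that calibration mainly improves reliability.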
The impact of ensemble size on the performance of an EPS is also investigated. The skill score of the ECMWF EPS decreases steadily as the number of ensemble members is reduced, and resolution is particularly affected. The impact is mainly due to the ensemble spread contributing negatively to the skill. The ensemble-mean contribution to the skill decreases only marginally as the ensemble size is reduced down to 11 members.
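An idealized setting, assumed here, shows one reason smaller ensembles score worse. Suppose the members and the verifying value are statistically exchangeable and the true event probability is q. Then the member fraction from k members has variance q(1 - q)/k, and the expected probability score equals the infinite-ensemble score plus E[q(1 - q)]/k. With k members the probability can also take only k + 1 distinct values. The sketch below estimates the score for several ensemble sizes by random subsampling. The toy data are synthetic and hypothetical: they illustrate the method and do not reproduce the ECMWF results.

```python
import random


def probability_score(probs, outcomes):
    """Half Brier score (mean squared probability error)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def subsample_probs(ensembles, threshold, k, rng):
    """Event probability from k members drawn without replacement for each case."""
    return [sum(x > threshold for x in rng.sample(m, k)) / k for m in ensembles]


def score_by_size(ensembles, outcomes, threshold, sizes, n_draws=20, seed=0):
    """Mean probability score over n_draws random subsamples for each size k."""
    rng = random.Random(seed)
    return {
        k: sum(probability_score(subsample_probs(ensembles, threshold, k, rng), outcomes)
               for _ in range(n_draws)) / n_draws
        for k in sizes
    }


# Synthetic, exchangeable toy data (hypothetical, not EPS output): each case has a
# random "signal"; the truth and each of 51 members add independent N(0, 1) noise.
gen = random.Random(42)
ensembles, outcomes = [], []
for _ in range(500):
    signal = gen.gauss(0.0, 1.0)
    ensembles.append([signal + gen.gauss(0.0, 1.0) for _ in range(51)])
    outcomes.append(1 if signal + gen.gauss(0.0, 1.0) > 0.0 else 0)

scores = score_by_size(ensembles, outcomes, threshold=0.0, sizes=[51, 21, 11, 5])
```

Under these assumptions the ensemble mean is estimated almost as well from a moderate number of members, while the probabilities derived from the spread become noisier as k shrinks.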
The performance of the U.S. National Centers for Environmental Prediction (NCEP) EPS is also reviewed. The NCEP EPS has a lower skill score (vs a reference forecast based on its control forecast) than the ECMWF EPS, especially in terms of reliability. This is mainly due to the smaller spread of the NCEP EPS contributing negatively to the skill. On the other hand, the NCEP and ECMWF ensemble means contribute similarly to the skill. As a consequence, the performance of the two systems in terms of resolution is comparable.
The performance of a poor man's EPS, consisting of the forecasts of different NWP centers, is discussed. The poor man's EPS is more skillful than either the ECMWF EPS or the NCEP EPS up to +144 h, despite a negative contribution of the spread to the skill score. The higher skill of the poor man's EPS is mainly due to better resolution.
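A poor man's EPS can be assembled by treating each center's deterministic forecast as one member. The sketch below assumes equal weights and uses four hypothetical, unnamed centers; the abstract does not state which centers or weights were used. The members come from different models and analyses rather than from perturbations of a single system. With only a few members, the event probability can take only a few values (here, multiples of 0.25).

```python
from statistics import fmean, pstdev


def poor_mans_eps(forecasts_by_center, threshold):
    """Treat each center's deterministic forecast as one equally weighted member.

    Returns (event probability, ensemble mean, ensemble spread).
    """
    members = list(forecasts_by_center.values())
    prob = sum(z > threshold for z in members) / len(members)
    return prob, fmean(members), pstdev(members)


# Hypothetical Z500 forecasts (m) from four unnamed centers for one case.
forecasts = {"center_A": 5530.0, "center_B": 5560.0,
             "center_C": 5575.0, "center_D": 5545.0}
prob, mean, spread = poor_mans_eps(forecasts, threshold=5550.0)
```

The mean and spread could also be combined with model error statistics, in the same way as the reference and ensemble-mean forecasts, instead of using the raw member fraction.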