
Verification of Ensemble Forecasts - A Survey

Laurence J. Wilson
Meteorological Service of Canada
Montreal, Quebec
Outline
• The ensemble verification problem
– Attributes applied to the ensemble distribution
• Verification of the ensemble distribution
– Wilson 1999
– RPS and CRPS
– Rank Histogram
• Verification of individual ensemble members
• Verification of probability forecasts from the
ensemble
– Reliability tables
– The ROC
Verification of the ensemble
• Problem:
– how to compare a distribution with an observation
• The concept of “consistency”:
– For each possible probability distribution f, the a posteriori
verifying observations are distributed according to f in
those circumstances when the system predicts the
distribution f. (Talagrand)
– similar to reliability
• The concept of “non-triviality”
– the EPS must predict different distributions at different times
Strategy for ensemble verification
Ensemble verification - distribution
Ensemble verification - 500 mb
Comments on “Wilson” score
• Sensitive both to “nearness” of the ensemble mean
and to ensemble spread
• Verifies the distribution only in the vicinity of the
observation; variations outside the window have
no impact
• Believed to be strictly proper - shown empirically
• Related to Brier Score for a single forecast
Sc = 1 − BS
• Can account for forecast “difficulty” by choosing
window based on climatological variance
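The window idea above can be sketched in code. This is a minimal illustration, not the actual Wilson (1999) formulation: the function name `window_score` and the parameter `half_width` are assumptions, and the score is reduced here to the probability mass the ensemble places in a window centred on the observation.

```python
import numpy as np

def window_score(members, obs, half_width):
    # Fraction of ensemble members falling within +/- half_width
    # of the observed value, i.e. the probability mass the
    # empirical ensemble distribution assigns to the window.
    # Per the slide, half_width could be chosen from the
    # climatological variance to account for forecast difficulty.
    x = np.asarray(members, dtype=float)
    return float(np.mean(np.abs(x - obs) <= half_width))
```

Variations of the forecast distribution outside the window leave this quantity unchanged, which matches the behaviour described above.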
Verification of approximations to the EPS distribution
• The Ranked probability score (RPS)
RPS = 1/(K−1) Σ_{i=1..K} [ Σ_{n=1..i} P_n − Σ_{n=1..i} O_n ]²
– discrete form, choose categories; samples distribution
according to categories
• Continuous RPS

CRPS(P, x_a) = ∫_{−∞}^{+∞} [ P(x) − P_a(x) ]² dx
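For an empirical ensemble CDF, the CRPS integral can be computed exactly with the well-known energy-form identity CRPS = E|X − x_a| − ½ E|X − X′|, where X, X′ are independent draws from the forecast distribution. A minimal sketch (the function name `crps_ensemble` is illustrative):

```python
import numpy as np

def crps_ensemble(members, obs):
    # Empirical CRPS via CRPS = E|X - x_a| - 0.5 * E|X - X'|,
    # which equals the integral of (P(x) - P_a(x))^2 when P is
    # the empirical CDF of the ensemble and P_a the step
    # function at the observation.
    x = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(x - obs))
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return term1 - term2
```

For a single-member ensemble this reduces to the absolute error, so the CRPS is directly comparable to the MAE of a deterministic forecast.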
CRPS example

[Figure: forecast CDF and observed CDF (a step function at the observed value X), probability on the vertical axis from 0 to 1.]
Rank Histogram (Talagrand Diagram)
• Preparation
– order the members of the ensemble from lowest to
highest - identifies n+1 ranges including the two
extremes
– identify the location of the observation, tally over a
large number of cases
• Interpretation
– Flat indicates ensemble spread about right to represent
uncertainty
– U-shaped - ensemble spread too small
– dome-shaped - ensemble spread too large
– asymmetric - over- or under-forecasting bias
– This is NOT a true verification measure
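The tallying step above can be sketched as follows; the function name `rank_histogram` is illustrative, and ties between the observation and a member are counted in the lower bin here for simplicity.

```python
import numpy as np

def rank_histogram(ensembles, obs, n_members):
    # Tally, over many cases, which of the n_members + 1 ranges
    # (defined by the sorted ensemble, including the two
    # extremes) each observation falls into.
    counts = np.zeros(n_members + 1, dtype=int)
    for members, o in zip(ensembles, obs):
        rank = int(np.sum(np.sort(members) < o))  # 0 .. n_members
        counts[rank] += 1
    return counts
```

A flat histogram is consistent with well-calibrated spread, but, as noted above, flatness alone is not a verification of forecast quality.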
Rank Histogram example
Rank Histogram
Verification of individual members
• Preferred over verification of the ensemble mean for
comparison with the operational model
• Unperturbed control
– compare with full resolution model
• Best and worst member
– an “a posteriori” verification - of less use to forecasters
– select over a forecast range or individually at each range
• Methods
– all that apply to continuous fields: RMSE, MAE, bias,
anomaly correlation etc.
– preferable to verify against observation data rather than an analysis
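The standard continuous scores listed above are straightforward to compute for any single member; a minimal sketch (the function name `member_scores` is illustrative):

```python
import numpy as np

def member_scores(forecasts, observations):
    # Bias (mean error), MAE, and RMSE for one ensemble member
    # verified against matched observations (or an analysis).
    f = np.asarray(forecasts, dtype=float)
    o = np.asarray(observations, dtype=float)
    err = f - o
    return {
        "bias": float(err.mean()),
        "mae": float(np.abs(err).mean()),
        "rmse": float(np.sqrt((err ** 2).mean())),
    }
```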
The Ensemble mean
• Popular because it scores well under quadratic scoring rules
• Should NOT be compared to individual outcomes:
– different sampling distribution
– not a trajectory of the model
Verification of probability forecasts from the
Ensemble
• Same as verification of any probability forecasts
• Reliability Table (with unconditional
distribution of forecasts) + ROC (with
likelihood diagram) sufficient for complete
diagnostic verification
• Reliability table: distribution conditioned on the forecast
• ROC: distribution conditioned on the observation
• Attributes:
• reliability
• sharpness
• resolution
• discrimination
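The ROC points are obtained by sweeping a probability threshold over the forecasts: at each threshold, the hit rate is the fraction of observed events that were forecast, and the false alarm rate is the fraction of observed non-events that were forecast. A minimal sketch (the function name `roc_points` is illustrative):

```python
import numpy as np

def roc_points(prob_forecasts, outcomes, thresholds):
    # For each threshold t, issue a "yes" forecast when the
    # probability is >= t, then compute
    #   hit rate        = hits / (number of observed events)
    #   false alarm rate = false alarms / (number of non-events)
    p = np.asarray(prob_forecasts, dtype=float)
    o = np.asarray(outcomes, dtype=bool)
    points = []
    for t in thresholds:
        yes = p >= t
        hr = float(np.mean(yes[o])) if o.any() else 0.0
        far = float(np.mean(yes[~o])) if (~o).any() else 0.0
        points.append((far, hr))
    return points
```

Plotting hit rate against false alarm rate over all thresholds traces the ROC curve; the area under it (Az) summarizes discrimination.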
ROC - ECMWF Ensemble Forecasts
Temperature 850 mb anomaly <-4C (vs. analysis)

[Figure: ROC curves (hit rate vs. false alarm rate, with the no-skill diagonal) for T850 anomaly < -4 C, Europe, verified against analyses, 2000, at the 96 h, 144 h, and 240 h projections; likelihood diagrams (number of cases, observed yes vs. no, by forecast probability) for each projection.]

Projection   Az      DA
96 h         0.900   1.812
144 h        0.831   1.357
240 h        0.725   0.844
ROC Issues
• Empirical vs. fitted
• No. points needed to define the ROC
• ROC and value (“potential value”)
ROC - threshold variation
(Wilson, 2000)

[Figure: ROC curves (hit rate vs. false alarm rate), Summer 1997, Europe: Day 3 precipitation probability forecasts at thresholds of 1, 2, 5, and 10 mm.]
ROC - Summer 97 -Europe

0.9

0.8

0.7

0.6
d 3 - 1 mm
Hit Rate

0.5 No skill
d 3 - 10 mm
HR - 1 mm
0.4
HR - 10 mm

0.3

Az s
0.2 d3 1mm - 0.866 d3 1 mm - 1.221
d3 10mm - 0.851 d3 10 mm - 1.096
0.1

0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
False Alarm Rate
Summary
• Verification of the ensemble distribution - depends
on how it is to be used by forecaster
• Two aspects: verification of distribution vs.
verification of probabilities from the distribution
• Several measures shown, characteristics identified
• Sufficiency of Reliability table and ROC graph for
diagnostic verification of probability forecasts
