EXTREME ORDER
STATISTICS AND
ACTUARIAL
APPLICATIONS
2004 ERM Symposium, Chicago
April 26, 2004
Sessions CS 1E, 2E, 3E:
Extreme Value Forum
H. N. Nagaraja
(hnn@stat.ohio-state.edu)
Ohio State University, Columbus
Hour 1: ORDER STATISTICS
- BASIC THEORY & ACTUARIAL
EXAMPLES
1. Exact Distributions of Order Statistics
Uniform Parent
Exponential
Normal
Gumbel
2. Order Statistics Related Quantities of Actuarial Interest
Last Survivor Policy
Multiple Life Annuities
Percentiles and VaR (Value at Risk)
Tail Probabilities
Conditional Tail Expectation (CTE)
Exceedances and Mean Excess Function
3. Large Sample Distribution of Order Statistics
Central Order Statistics (Central Percentiles)
Extreme Order Statistics
Intermediate Order Statistics
4. Sum of top order statistics & sample CTE
5. Data Examples and Model Fitting
Nationwide Data
Danish Fire-loss Data
Missouri Flood level Data
Venice Sea Level Data
Hour 3: EXTREME VALUES -
INFERENCE AND APPLICATIONS
1. GEV Distribution - Maximum Likelihood Estimates
2. Blocked Maxima Approach
3. Inference based on top k extremes
Pickands Estimator
Hill's Estimator
Estimating High Quantiles
4. GPD and Estimates Based on Exceedances
5. General Diagnostic Plots
QQ Plot
Empirical Mean Excess Function Plot
6. Examples
7. General Remarks
8. Software Resources
9. References
Hour 1
\[ F_{(r)}(x) = \Pr\{X_{(r)} \le x\} = \sum_{i=r}^{n} \binom{n}{i} F^{i}(x)\,[1 - F(x)]^{n-i} \]
\[ f_{(r)}(x) = \frac{n!}{(r-1)!\,(n-r)!}\, F^{r-1}(x)\,[1 - F(x)]^{n-r} f(x). \]
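The cdf and pdf of the r-th order statistic can be evaluated directly from the parent cdf value F(x) and pdf value f(x). The following is a minimal sketch; the function names `order_stat_cdf` and `order_stat_pdf` are illustrative and not part of the original slides.

```python
from math import comb, factorial


def order_stat_cdf(F_x: float, r: int, n: int) -> float:
    """Pr{X_(r) <= x}, given F_x = F(x): binomial tail sum from i = r to n."""
    return sum(comb(n, i) * F_x**i * (1.0 - F_x) ** (n - i) for i in range(r, n + 1))


def order_stat_pdf(F_x: float, f_x: float, r: int, n: int) -> float:
    """Density of X_(r) at x, given F_x = F(x) and f_x = f(x)."""
    const = factorial(n) / (factorial(r - 1) * factorial(n - r))
    return const * F_x ** (r - 1) * (1.0 - F_x) ** (n - r) * f_x


# Standard uniform parent with n = 5: F(x) = x and f(x) = 1 on [0, 1].
x = 0.5
cdf_max = order_stat_cdf(x, r=5, n=5)          # for the maximum this is x**5
pdf_median = order_stat_pdf(x, 1.0, r=3, n=5)  # density of the sample median at x
```

For the maximum (r = n) the sum has a single term, F(x)^n, which gives a quick sanity check on the implementation.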
Standard Uniform Parent PDFs
\[ f(x) = 1, \quad 0 \le x \le 1. \]
[Figure: pdf at x for the standard uniform parent.]
Standard Uniform Parent CDFs
\[ F(x) = x, \quad 0 \le x \le 1. \]
\[ E(X_{(r)}) = \frac{r}{n+1}; \quad \mathrm{Var}(X_{(r)}) = \frac{r(n-r+1)}{(n+1)^2 (n+2)}. \]
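For a standard uniform parent, X_(r) follows a Beta(r, n - r + 1) distribution, whose mean is r/(n+1) and whose variance is r(n-r+1)/[(n+1)^2 (n+2)]. A small sketch that tabulates these exactly with rational arithmetic; the function name `uniform_order_stat_moments` is hypothetical.

```python
from fractions import Fraction


def uniform_order_stat_moments(r: int, n: int) -> tuple[Fraction, Fraction]:
    """Exact mean and variance of X_(r) for a standard uniform sample of size n."""
    mean = Fraction(r, n + 1)
    var = Fraction(r * (n - r + 1), (n + 1) ** 2 * (n + 2))
    return mean, var


# Tabulate the moments of all five order statistics for n = 5.
for r in range(1, 6):
    mean, var = uniform_order_stat_moments(r, 5)
    print(r, mean, var)
```

The variance is symmetric in r and n - r + 1, so the smallest and largest order statistics have the same variance.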
[Figure: cdf at x for the standard uniform parent.]
Standard Normal Parent PDFs
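\[ f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2\sigma^2}(x-\mu)^2\right\}, \quad -\infty < x < \infty. \]
Here $\mu$ is the mean and $\sigma$ is the standard deviation.
Standard Normal: $\mu = 0$; $\sigma = 1$.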
Figure 3: The pdf of the standard normal population and the pdfs of the first, second, third, fourth and the largest order statistics from a random sample of size 5.
Standard Normal Parent PDFs of the Maxima
Figure 4: The pdf of the standard normal parent and the pdfs of the maximum from random samples of size n = 10, 50, 100, 200.
Standard Exponential Parent PDFs
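\[ f(x; \lambda) = \lambda \exp\{-\lambda x\}, \quad 0 \le x < \infty. \]
Here mean = std. dev. = $1/\lambda$; Std. Exponential: $\lambda = 1$.
Moments for the Max:
\[ E(X_{(n)}) \approx \log(n) + 0.5772; \quad \mathrm{Var}(X_{(n)}) \approx \pi^2/6 \approx 1.64. \]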
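The large-n approximations for the maximum of standard exponentials can be compared with the exact moments. The maximum of n i.i.d. standard exponentials has mean 1 + 1/2 + ... + 1/n and variance 1 + 1/4 + ... + 1/n^2. A short sketch; `exp_max_moments` is an illustrative name.

```python
from math import log, pi

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def exp_max_moments(n: int) -> tuple[float, float]:
    """Exact mean and variance of the maximum of n standard exponentials."""
    mean = sum(1.0 / i for i in range(1, n + 1))
    var = sum(1.0 / i**2 for i in range(1, n + 1))
    return mean, var


n = 20
mean, var = exp_max_moments(n)
print(f"exact mean {mean:.4f}  vs  log(n) + gamma = {log(n) + EULER_GAMMA:.4f}")
print(f"exact var  {var:.4f}  vs  pi^2 / 6       = {pi**2 / 6:.4f}")
```

The mean approximation errs by roughly 1/(2n), while the variance approximation overstates the exact value by roughly 1/n.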
[Figure: probability density functions for x from 0 to 4; curves labeled 16, 17, 18, 19, 20.]
Standard Gumbel Parent PDFs
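\[ F(x; a, b) = \exp\{-\exp[-(x - a)/b]\}, \quad -\infty < x < \infty. \]
Interesting Fact: $X_{(n)}$ is Gumbel with location $a + b\log(n)$ and scale $b$.
Standard Gumbel: $a = 0$, $b = 1$;
$E(X) = \gamma$ (Euler's constant) $\approx 0.5772$;
Variance $= \pi^2/6$.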
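The Gumbel family is closed under taking maxima: raising the cdf to the n-th power only shifts the location by b log(n) and leaves the scale unchanged. A minimal numerical check, with location a and scale b as in the cdf above; the function names are illustrative.

```python
from math import exp, log


def gumbel_cdf(x: float, a: float = 0.0, b: float = 1.0) -> float:
    """Gumbel cdf with location a and scale b."""
    return exp(-exp(-(x - a) / b))


def max_cdf(x: float, n: int, a: float = 0.0, b: float = 1.0) -> float:
    """cdf of the maximum of n i.i.d. Gumbel(a, b) variables: F(x)**n."""
    return gumbel_cdf(x, a, b) ** n


n, a, b = 50, 0.0, 1.0
for x in (2.0, 4.0, 6.0):
    shifted = gumbel_cdf(x, a + b * log(n), b)  # same scale, shifted location
    print(x, max_cdf(x, n, a, b), shifted)
```

The two printed columns should agree up to floating-point rounding.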
Figure 6: The pdfs of the standard Gumbel parent and those of the top 3 order statistics from a random sample of size n = 50.
2 Order Statistics Related Quantities of Actuarial Interest
3 Large Sample Distribution of Order Statistics
Central Case
Intermediate Case
4 Sum of top order statistics & sample CTE
5 Data Examples and Model Fitting
[Figure: normal quantile plot of the data, probability scale .001 to .999.]

Quantiles
  100.0%  maximum    89490993
   99.5%             36103098
   97.5%             11807434
   90.0%             2782728.4
   75.0%  quartile   207245.65
   50.0%  median     -2.094e+6
   25.0%  quartile   -3.8605e6
   10.0%             -5.1367e6
    2.5%             -6.5963e6
    0.5%             -8.0968e6
    0.0%  minimum    -9.133e+6

Moments
  Mean             -1.0354e6
  Std Dev          5983789.7
  Std Err Mean     189318.73
  upper 95% Mean   -663845.7
  lower 95% Mean   -1.4069e6
  N                999

[Figure: empirical mean excess function (mexf) plotted against lag(-1).]
Danish Fire Insurance Claims Data
Data Source: http://www.math.ethz.ch/~mcneil/
(Alexander McNeil, Swiss Federal Institute of Technology)
Original Source: Rytgaard (ASTIN Bulletin)
[Figures: Loss-in-DKM plotted against Date, 01/01/1978 to 01/01/1992; empirical mean excess function (mexf) plotted against loss_lag1.]
Missouri River Annual Maximum Flow Data for the Towns of
Boonville and Hermann
Data Source: U.S. Army Corps of Engineers
[Figures: annual maximum flow plotted against Year (1880 to 2000) for Boonville and for Hermann; Extreme-Value and Normal fits for Boonville; Extreme Value Fit for Hermann; Boonville plotted against Hermann.]
Correlation between Boonville and Hermann: 0.8936.
Venice Annual Extreme Sea level Readings (1931-1981)
[Figures: max1 and max2 plotted against year.]
Hour 2: EXTREME VALUES -
BASIC MODELS
1 Sample Maximum
Notes
Formulas for Norming Constants For the Maximum
An Example: Pareto Distribution
Another Example
2 Generalized Extreme-Value Distribution
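\[ G_\xi(y; \mu, \sigma) = \begin{cases} \exp\left\{-\left(1 + \xi\,\dfrac{y-\mu}{\sigma}\right)^{-1/\xi}\right\}, & 1 + \xi\,\dfrac{y-\mu}{\sigma} > 0,\ \xi \ne 0, \\[2ex] \exp\left\{-e^{-(y-\mu)/\sigma}\right\}, & -\infty < y < \infty,\ \xi = 0. \end{cases} \tag{2} \]
Here $\xi$ is the shape, $\mu$ is the location and $\sigma$ is the scale parameter.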
Connections
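Moments of GEV
Take $\mu = 0$, $\sigma = 1$.
When $\xi \ne 0$,
\[ \text{Mean} = \frac{1}{\xi}\left[\Gamma(1-\xi) - 1\right], \quad \xi < 1; \]
\[ \text{Variance} = \frac{1}{\xi^2}\left[\Gamma(1-2\xi) - \Gamma^2(1-\xi)\right], \quad \xi < 1/2; \]
\[ \text{Mode} = \frac{1}{\xi}\left[(1+\xi)^{-\xi} - 1\right]. \]
When $\xi = 0$, we have the Gumbel distribution and
Mean $= \gamma = 0.5772\ldots$ = Euler's constant;
Variance $= \pi^2/6$; Mode $= 0$.
Quantile Function of GEV
Take $\mu = 0$, $\sigma = 1$.
When $\xi \ne 0$,
\[ G_\xi^{-1}(q) = \frac{1}{\xi}\left\{[-\log(q)]^{-\xi} - 1\right\}. \]
For the Gumbel cdf, $G_0^{-1}(q) = -\log[-\log(q)]$.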
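The standardized GEV formulas (location 0, scale 1) translate directly into code. The sketch below assumes the shape is written as `xi`, with `xi > 0` corresponding to the Fréchet case; the function names are illustrative.

```python
from math import exp, gamma, log, pi

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def gev_cdf(y: float, xi: float) -> float:
    """Standard GEV cdf (location 0, scale 1)."""
    if xi == 0.0:
        return exp(-exp(-y))
    t = 1.0 + xi * y
    if t <= 0.0:
        # Outside the support: below the lower endpoint (xi > 0)
        # or above the upper endpoint (xi < 0).
        return 0.0 if xi > 0.0 else 1.0
    return exp(-(t ** (-1.0 / xi)))


def gev_quantile(q: float, xi: float) -> float:
    """Inverse of the standard GEV cdf for 0 < q < 1."""
    if xi == 0.0:
        return -log(-log(q))
    return ((-log(q)) ** (-xi) - 1.0) / xi


def gev_mean(xi: float) -> float:
    """Mean of the standard GEV; finite only for xi < 1."""
    if xi >= 1.0:
        raise ValueError("mean is infinite for xi >= 1")
    return EULER_GAMMA if xi == 0.0 else (gamma(1.0 - xi) - 1.0) / xi


def gev_variance(xi: float) -> float:
    """Variance of the standard GEV; finite only for xi < 1/2."""
    if xi >= 0.5:
        raise ValueError("variance is infinite for xi >= 1/2")
    if xi == 0.0:
        return pi**2 / 6.0
    return (gamma(1.0 - 2.0 * xi) - gamma(1.0 - xi) ** 2) / xi**2
```

As the shape tends to 0, each function approaches its Gumbel counterpart, which is a convenient consistency check.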
3 Limit Distributions of the top Extremes
4 Limiting Distribution of the Sample Minimum
X. Thus need to know only the upper extremes
well.
5 Distributions of Exceedances and the Generalized Pareto Distribution (GPD)
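When $\xi \ne 0$,
\[ \text{Mean} = \frac{1}{1-\xi}, \quad \xi < 1; \]
\[ \text{Variance} = \frac{1}{(1-2\xi)(1-\xi)^2}, \quad \xi < 1/2; \]
\[ \text{Mode} = 0, \quad \xi > -1 \text{ (the GPD density is decreasing)}. \]
\[ \text{Mean Excess Function} = E(X - t \mid X > t) = \frac{1 + \xi t}{1 - \xi}, \quad \xi < 1. \]
When $\xi = 0$, we have the standard exponential distribution and
Mean = Standard Deviation = 1 = Mean Excess Function.
Quantile Function of GPD
When $\xi \ne 0$,
\[ H_\xi^{-1}(q) = \frac{1}{\xi}\left\{(1-q)^{-\xi} - 1\right\}. \]
For the Exponential cdf, $H_0^{-1}(q) = -\log(1-q)$.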
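The standardized GPD quantities (scale 1) can be sketched in the same way. The cdf used here, H(y) = 1 - (1 + xi y)^(-1/xi) for y > 0, is the standard form implied by the quantile function; the function names are illustrative.

```python
from math import exp, log


def gpd_cdf(y: float, xi: float) -> float:
    """Standard GPD cdf (scale 1); equals 0 for y <= 0."""
    if y <= 0.0:
        return 0.0
    if xi == 0.0:
        return 1.0 - exp(-y)
    t = 1.0 + xi * y
    if t <= 0.0:
        return 1.0  # xi < 0 and y beyond the upper endpoint -1/xi
    return 1.0 - t ** (-1.0 / xi)


def gpd_quantile(q: float, xi: float) -> float:
    """Inverse of the standard GPD cdf for 0 < q < 1."""
    if xi == 0.0:
        return -log(1.0 - q)
    return ((1.0 - q) ** (-xi) - 1.0) / xi


def gpd_mean_excess(t: float, xi: float) -> float:
    """E(X - t | X > t) for the standard GPD; finite only for xi < 1."""
    if xi >= 1.0:
        raise ValueError("mean excess is infinite for xi >= 1")
    return (1.0 + xi * t) / (1.0 - xi)
```

The mean excess function is linear in t with slope xi/(1 - xi), which is the basis for using an empirical mean excess plot as a diagnostic for a GPD tail.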
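Note
The pdf is
\[ g_\xi(y; \mu, \sigma) = \begin{cases} \dfrac{1}{\sigma}\, G_\xi(y; \mu, \sigma)\left(1 + \xi\,\dfrac{y-\mu}{\sigma}\right)^{-(1/\xi + 1)}, & \xi(y - \mu) > -\sigma,\ \xi \ne 0, \\[2ex] \dfrac{1}{\sigma} \exp\left\{-e^{-(y-\mu)/\sigma}\right\} e^{-(y-\mu)/\sigma}, & -\infty < y < \infty,\ \xi = 0. \end{cases} \tag{7} \]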
1.1 Maximum Likelihood Method
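generally,
\[ \sqrt{m}\,(\hat\theta - \theta) \approx N(0, \sigma_\theta^2) \]
where $\sigma_\theta^2$ is related to the average Fisher information in the sample.
This variance $\sigma_\theta^2$ can be estimated by the reciprocal of
\[ -\frac{1}{m} \left. \frac{\partial^2 \log L}{\partial \theta^2} \right|_{\theta = \hat\theta} \]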
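From (7) we can write the joint pdf of $Y_1, \ldots, Y_m$. The log-likelihood, $\log L(\mu, \sigma)$, will be
\[ -m \log \sigma - \sum_{i=1}^{m} \frac{y_i - \mu}{\sigma} - \sum_{i=1}^{m} \exp\left\{-\frac{y_i - \mu}{\sigma}\right\}. \]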
We differentiate this with respect to $\mu$ and $\sigma$ and obtain two equations. Solving iteratively using numerical techniques, we obtain the MLE $(\hat\mu, \hat\sigma)$. If m is large, the vector is approximately normally distributed with mean $(\mu, \sigma)$ and covariance matrix
\[ \frac{1}{m}\,\frac{6\sigma^2}{\pi^2} \begin{pmatrix} \dfrac{\pi^2}{6} + (1-\gamma)^2 & (1-\gamma) \\[1.5ex] (1-\gamma) & 1 \end{pmatrix} \]
where $\gamma$ is Euler's constant 0.5772... This can be used to provide confidence intervals for the parameter estimates.
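One way to carry out the iterative solution for the Gumbel case is sketched below. Setting the derivative with respect to the location to zero expresses the location in terms of the scale; substituting into the second equation leaves a single increasing function of the scale, whose root can be found by bisection. Bisection is only one possible choice of numerical technique, and the function names, the simulated data and the seed are assumptions made for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def gumbel_mle(y, tol=1e-10):
    """MLE (mu_hat, sigma_hat) for a Gumbel sample via bisection on the scale."""
    m = len(y)
    y_bar = sum(y) / m
    y_min = min(y)

    def score(sigma):
        # Likelihood equation for sigma after eliminating mu:
        #   sigma - y_bar + sum(y_i w_i) / sum(w_i) = 0,  w_i = exp(-y_i / sigma).
        # Weights are shifted by y_min for numerical stability.
        w = [math.exp(-(yi - y_min) / sigma) for yi in y]
        return sigma - y_bar + sum(yi * wi for yi, wi in zip(y, w)) / sum(w)

    lo, hi = 1e-8, (y_bar - y_min) + 1.0  # score(lo) < 0 < score(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    sigma = 0.5 * (lo + hi)
    mean_w = sum(math.exp(-(yi - y_min) / sigma) for yi in y) / m
    mu = y_min - sigma * math.log(mean_w)
    return mu, sigma


def gumbel_mle_cov(sigma, m):
    """Large-sample covariance matrix of (mu_hat, sigma_hat)."""
    c = 6.0 * sigma**2 / (m * math.pi**2)
    g = 1.0 - EULER_GAMMA
    return [[c * (math.pi**2 / 6.0 + g**2), c * g], [c * g, c]]


# Simulated Gumbel sample (illustrative values): -log(Exp(1)) is standard Gumbel.
random.seed(1)
mu_true, sigma_true, m = 3.0, 2.0, 2000
sample = [mu_true - sigma_true * math.log(random.expovariate(1.0)) for _ in range(m)]
mu_hat, sigma_hat = gumbel_mle(sample)
cov = gumbel_mle_cov(sigma_hat, m)
half_width = 1.96 * math.sqrt(cov[0][0])  # approximate 95% interval for mu
print(mu_hat, sigma_hat, half_width)
```

The covariance is evaluated at the estimated scale, which is the usual plug-in approach for approximate confidence intervals.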
GEV Distribution
Smith (1985)
When $\xi > -0.5$, MLEs have the usual properties.
$-1 < \xi < -0.5$: MLEs are generally obtainable but do not have the standard asymptotic properties.
$\xi < -1$: MLEs are unlikely to be obtainable from numerical techniques; the likelihood is too bumpy.
If $\xi \le -0.5$, we have a short bounded upper tail for F, which is not encountered; Fréchet has $\xi > 0$.
$(\hat\xi, \hat\mu, \hat\sigma)$ is asymptotically normal with mean $(\xi, \mu, \sigma)$ and the covariance matrix can be approximated using the inverse of the observed Fisher information matrix.
Large Quantile Estimation
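Let $z_p$ be the upper pth quantile of the GEV. Then its estimate is given by
\[ \hat{z}_p = \begin{cases} \hat\mu - \dfrac{\hat\sigma}{\hat\xi}\left(1 - y_p^{-\hat\xi}\right), & \text{if } \hat\xi \ne 0, \\[2ex] \hat\mu - \hat\sigma \log(y_p), & \text{if } \hat\xi = 0 \text{ (Gumbel)}, \end{cases} \]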
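A sketch of the large-quantile estimate follows. It assumes $y_p = -\log(1 - p)$, which is what the GEV quantile function gives when the cdf at $z_p$ equals 1 - p; this definition of $y_p$ and the function names are assumptions, and the parameter values are made up for illustration.

```python
from math import exp, log


def gev_upper_quantile(p: float, mu: float, sigma: float, xi: float) -> float:
    """Level z_p exceeded with probability p, assuming y_p = -log(1 - p)."""
    y_p = -log(1.0 - p)
    if xi == 0.0:
        return mu - sigma * log(y_p)  # Gumbel case
    return mu - (sigma / xi) * (1.0 - y_p ** (-xi))


def gev_cdf(z: float, mu: float, sigma: float, xi: float) -> float:
    """GEV cdf, used here only to check the quantile (z inside the support)."""
    s = (z - mu) / sigma
    if xi == 0.0:
        return exp(-exp(-s))
    return exp(-((1.0 + xi * s) ** (-1.0 / xi)))


# Illustrative parameter values: the level exceeded with probability 0.01.
z = gev_upper_quantile(0.01, mu=10.0, sigma=2.0, xi=0.2)
print(z, gev_cdf(z, 10.0, 2.0, 0.2))
```

In practice the estimates of the three parameters are plugged in for `mu`, `sigma` and `xi`.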
3 Inference Using Top k Order Statistics
Remarks
If we have data of top k order statistics from several blocks, say m, then we use the product of these likelihoods and determine the MLEs using the combined likelihood.
How large can we take k? A tricky question. A larger k means a smaller standard error for the estimators, but a larger bias as we move away from G towards F. Typically one increases k and looks at a plot of the parameter estimates; when they stabilize, one stops.
Quick Estimators of $\xi$
(b) Pickands Estimator
Works for any real $\xi$.
\[ \hat{\xi}_P = \frac{1}{\log 2} \log\left( \frac{X_{(n-k+1)} - X_{(n-2k+1)}}{X_{(n-2k+1)} - X_{(n-4k+1)}} \right). \]
Here k is large but k/n is small.
Choice of k is determined by the examination of the plot of $(k, \hat{\xi}_P(k))$.
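The Pickands estimator needs only three order statistics, so it is short to code. The sketch below uses idealized data, namely GPD quantiles at the levels j/(n+1), rather than a real sample, so that the estimate can be compared with a known shape; the function names and the chosen values of n, k and the shape are illustrative.

```python
from math import log


def pickands_estimator(sample, k: int) -> float:
    """Pickands estimate of the shape from the order statistics of `sample`."""
    xs = sorted(sample)
    n = len(xs)
    if k < 1 or 4 * k > n:
        raise ValueError("need 1 <= k and 4k <= n")
    # X_(j) is the j-th smallest value, i.e. xs[j - 1].
    top = xs[n - k] - xs[n - 2 * k]         # X_(n-k+1) - X_(n-2k+1)
    bottom = xs[n - 2 * k] - xs[n - 4 * k]  # X_(n-2k+1) - X_(n-4k+1)
    return log(top / bottom) / log(2.0)


def gpd_quantile(q: float, xi: float) -> float:
    """Standard GPD quantile function for xi != 0 (used to build test data)."""
    return ((1.0 - q) ** (-xi) - 1.0) / xi


# Idealized data: exact GPD quantiles, so every k recovers the shape.
n, xi = 1000, 0.5
data = [gpd_quantile(j / (n + 1), xi) for j in range(1, n + 1)]
estimates = {k: pickands_estimator(data, k) for k in (25, 50, 100, 200)}
print(estimates)
```

With real data the estimates vary with k, and plotting them against k is how a stable region is identified.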
Remark
There are other estimators that are refinements of Hill's and Pickands'. They have explicit forms and hence computation is easy, but there are no clear preferences.
4 Generalized Pareto Distribution
Estimates Based on Exceedances
by
\[ \hat{x}_p = t + \frac{\hat\sigma}{\hat\xi}\left\{ \left( \frac{\Pr\{X > t\}}{p} \right)^{\hat\xi} - 1 \right\} \]
where we assume Pr{X > t} > p and Pr{X > t} is estimated by the sample proportion exceeding t.
Choice of the threshold t: it depends on how closely the exceedances are modeled by the Generalized Pareto distribution.
Such data are called Peaks Over Threshold (POT) data.
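The threshold-based quantile estimate can be sketched as follows, using the standard peaks-over-threshold form in which the fitted GPD scale and shape for the exceedances enter as `sigma` and `xi`. The function name, its arguments and the numbers in the example are assumptions for illustration; the case `xi == 0` uses the limiting logarithmic form.

```python
from math import log


def pot_quantile(p: float, t: float, sigma: float, xi: float,
                 n_exceed: int, n: int) -> float:
    """Level exceeded with probability p, for p below the exceedance rate of t.

    Pr{X > t} is estimated by the sample proportion n_exceed / n.
    """
    tail_prob = n_exceed / n
    if not 0.0 < p < tail_prob:
        raise ValueError("need 0 < p < Pr{X > t}")
    ratio = tail_prob / p
    if xi == 0.0:
        return t + sigma * log(ratio)  # limit of the formula as xi -> 0
    return t + (sigma / xi) * (ratio**xi - 1.0)


# Illustrative numbers: 100 of 1000 observations exceed the threshold t = 10.
x_p = pot_quantile(0.01, t=10.0, sigma=2.0, xi=0.5, n_exceed=100, n=1000)
print(x_p)
```

Repeating the calculation over a range of thresholds shows how sensitive the quantile estimate is to the choice of t.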
5 General Diagnostic Plots
5.1 QQ plots
6 General Remarks
7 Examples
c) Fitting GEV.
8 Computational/Software Resources
XTREMES.
A free-standing software package that comes with Reiss and Thomas's book (academic edition, with limited data capacity). A professional edition is also available. Website:
http://www.xtremes.math.uni-siegen.de/xtremes/
Alexander McNeil's Website:
http://www.math.ethz.ch/~mcneil/software.html
Has a free software attachment that works with S-PLUS.
http://www.maths.bris.ac.uk/~masgc/ismev/summary.html
Associated with Coles's book.
9 References
ogy and Computing in Applied Probability 4, 377-392.
12. Pickands, J. (1975). Statistical inference using extreme order statistics. Annals of Statistics 3, 119-131.
13. Reiss, R.-D. and Thomas, M. (1997). Statistical Analysis of Extreme Values. Birkhäuser Verlag, Basel, Switzerland. XTREMES software comes with the book.
14. SAS JMP software, Version 5.1. (2003). SAS Institute, Cary NC.
15. Smith, R. L. (1985). Maximum likelihood estimation in a class of non-regular cases. Biometrika 72, 67-90.