
Tutorial

Financial Econometrics/Statistics
2005 SAMSI program on Financial Mathematics, Statistics, and Econometrics
Goal
At the index level
Part I: Modeling
... in which we see what basic properties of stock prices/indices we want to capture
Contents
Returns and their (static) properties
Pricing models
Time series properties of returns
Why returns?
Prices are generally found to be non-stationary
Makes life difficult (or simpler...)
Traditional statistics prefers stationary data
Returns are found to be stationary
Which returns?
Two types of returns can be defined:

Discrete compounding:

$R_t = \frac{P_t}{P_{t-1}} - 1$

Continuous compounding:

$R_t = \log \frac{P_t}{P_{t-1}}$
Discrete compounding
If you make 10% on half of your money and 5% on the other half, you make 7.5% in total

Discrete compounding is additive over portfolio formation
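In symbols, for a portfolio with weights $w_i$ summing to one:

$R_p = \sum_i w_i R_i = 0.5 \times 10\% + 0.5 \times 5\% = 7.5\%$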
Continuous compounding
If you made 3% during the first half year and 2% during the second part of the year, you made (exactly) 5% in total

Continuous compounding is additive over time
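In symbols, for log returns over two consecutive periods:

$\log \frac{P_2}{P_0} = \log \frac{P_1}{P_0} + \log \frac{P_2}{P_1} = 3\% + 2\% = 5\%$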
Empirical properties of returns
            Mean   St.dev.  Ann. vol.  Skewness  Kurtosis     Min      Max
IBM        -0.0%    2.46%     39.03%    -23.51   1124.61    -138%    12.4%
IBM (corr)  0.0%    1.64%     26.02%     -0.28     15.56    -26.1%   12.4%
S&P         0.0%    0.95%     15.01%     -1.4      39.86    -22.9%    8.7%

Data period: July 1962 - December 2004; daily frequency
Stylized facts
Expected returns difficult to assess
What's the equity premium?
Index volatility < individual stock volatility
Negative skewness
Crash risk
Large kurtosis
Fat tails (thus EVT analysis?)
Pricing models
Finance considers the final value of an asset to be known as a random variable, that is, the distribution of $X$ is known
In such a setting, finding the price $P$ of an asset is equivalent to finding its expected return:

$E\{R\} = E\left\{ \frac{X}{P} \right\} - 1 = \frac{E\{X\}}{P} - 1$
Pricing models 2
As a result, pricing models model expected returns ...
... in terms of known quantities or a few "almost known" quantities
Capital Asset Pricing Model
One of the best known pricing models


The theorem/model states
$E\{R_{i,t} \mid \mathcal{F}_{t-1}\} - r_t^f = \beta_{i,t}\,\big( E\{R_t^m\} - r_t^f \big)$

with

$\beta_{i,t} = \frac{\mathrm{Cov}\{R_{i,t},\, R_t^m\}}{\mathrm{Var}\{R_t^m\}}$
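A minimal sketch (my own illustration, not part of the slides) of estimating $\beta$ by its sample analogue from excess-return series:

```python
import numpy as np

def capm_beta(asset_excess: np.ndarray, market_excess: np.ndarray) -> float:
    """CAPM beta: Cov(R_i, R_m) / Var(R_m), from excess-return series."""
    cov = np.cov(asset_excess, market_excess)
    return float(cov[0, 1] / cov[1, 1])

rng = np.random.default_rng(1)
rm = rng.normal(0.0003, 0.01, 2500)           # simulated market excess returns
ri = 0.8 * rm + rng.normal(0.0, 0.015, 2500)  # asset with true beta 0.8
print(capm_beta(ri, rm))                      # close to 0.8
```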
Black-Scholes
Black-Scholes is also a pricing model

(Exact) contemporaneous relation between asset prices/returns:

$\frac{\text{Call price}}{\text{Stock price}} = BS(\text{moneyness}, \text{volatility})$
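For illustration, a minimal sketch of the normalized Black-Scholes call price as a function of moneyness and volatility, assuming zero interest rates and dividends (my simplification):

```python
from math import log, sqrt
from statistics import NormalDist

def bs_call_over_spot(moneyness: float, vol: float, tau: float = 1.0) -> float:
    """Black-Scholes call price divided by the spot price.

    moneyness = strike / spot, vol = annualized volatility, tau = maturity
    in years; rates and dividends are set to zero for simplicity.
    """
    ncdf = NormalDist().cdf
    s = vol * sqrt(tau)                  # total volatility over the horizon
    d1 = (-log(moneyness) + 0.5 * s * s) / s
    d2 = d1 - s
    return ncdf(d1) - moneyness * ncdf(d2)

print(bs_call_over_spot(moneyness=1.0, vol=0.2))  # ATM call worth ~8% of spot
```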
Time series properties of returns
Traditionally a model-fitting exercise without much finance
mostly univariate time series and, thus, less scope for the traditional cross-sectional pricing models
lately more finance theory is integrated
Focuses on the dynamics/dependence in
returns
Random walk hypothesis
Standard paradigm in the 1960s-1970s
Prices follow a random walk
Returns are i.i.d.
Normality often imposed as well
Compare Black-Scholes assumptions

Box-Jenkins analysis
Linear time series analysis
Box-Jenkins analysis generally identifies white noise
This has long been taken as support for the random walk hypothesis
Recent developments
Some autocorrelation effects in momentum
Some (linear) predictability
Largely academic discussion
Higher moments and risk
Risk predictability
There is strong evidence for autocorrelation in
squared returns
also holds for other powers
volatility clustering
While direction of change is difficult to predict,
(absolute) size of change is
risk is predictable
The ARCH model
First model to capture this effect



No mean effects for simplicity
ARCH in mean
$R_t = \sigma_{t-1}\,\varepsilon_t, \qquad \varepsilon_t \sim N(0, 1)$

$\sigma_t^2 = \omega + \alpha R_t^2$
ARCH properties
Uncorrelated returns
martingale difference returns
Correlated squared returns
with limited set of possible patterns
Symmetric distribution if innovations are symmetric
Fat-tailed distribution, even if innovations are not
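A minimal simulation sketch (parameter values are my own, illustrative choices) that exhibits these properties:

```python
import numpy as np

def acf(x: np.ndarray, lag: int) -> float:
    """Sample autocorrelation of x at the given lag."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

rng = np.random.default_rng(0)
n, omega, alpha = 20_000, 1.0, 0.5         # illustrative ARCH(1) parameters
r = np.empty(n)
sigma2 = omega / (1 - alpha)               # start at the unconditional variance
for t in range(n):
    r[t] = np.sqrt(sigma2) * rng.standard_normal()
    sigma2 = omega + alpha * r[t] ** 2     # next period's conditional variance

print(acf(r, 1))                           # ~0: returns are uncorrelated
print(acf(r ** 2, 1))                      # >0: volatility clustering
m = r - r.mean()
print(np.mean(m ** 4) / np.mean(m ** 2) ** 2)  # kurtosis > 3: fat tails
```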
The GARCH model
Generalized ARCH




Beware of time indices ...
$R_t = \sigma_{t-1}\,\varepsilon_t, \qquad \varepsilon_t \sim N(0, 1)$

$\sigma_t^2 = \omega + \alpha R_t^2 + \beta \sigma_{t-1}^2$
GARCH model
Parsimonious way to describe various
correlation patterns
for squared returns
Higher-order extension trivial
Math-stat analysis not that trivial
See inference section later
Stochastic volatility models
Use latent volatility process
$R_t = \exp(h_t / 2)\,\varepsilon_t$

$h_t = \omega + \beta h_{t-1} + \eta_t$

$\begin{pmatrix} \varepsilon_t \\ \eta_t \end{pmatrix} \sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & \sigma_\eta^2 \end{pmatrix} \right)$
Stochastic volatility models
SV models also lead to volatility clustering
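A minimal simulation sketch of the discrete-time SV model as reconstructed above (parameter values are my own, illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, omega, beta, sig_eta = 20_000, -0.4, 0.95, 0.25  # illustrative parameters

h = np.empty(n)
h[0] = omega / (1 - beta)                  # stationary mean of h_t
for t in range(1, n):
    h[t] = omega + beta * h[t - 1] + sig_eta * rng.standard_normal()

r = np.exp(h / 2) * rng.standard_normal(n) # returns driven by latent volatility

x = r ** 2 - (r ** 2).mean()               # clustering: corr(r_t^2, r_{t-1}^2) > 0
print(np.dot(x[:-1], x[1:]) / np.dot(x, x))
```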
Leverage
Negative innovation correlation means that
volatility increases and price decreases go
together
Negative return/volatility correlation
(One) structural story: default risk
Continuous time modeling
Mathematical finance uses continuous time,
mainly for simplicity
Compare asymptotic statistics as
approximation theory
Empirical finance (at least originally) focused
on discrete time models
Consistency
The volatility clustering and other empirical
evidence is consistent with appropriate
continuous time models
A simple continuous time stochastic volatility
model
$d \ln S_t = \mu\, dt + \sigma_t\, dW_t^{(1)}$

$d \sigma_t = \beta \sigma_t\, dt + \omega \sigma_t\, dW_t^{(2)}$
Approximation theory
There is a large literature that deals with the
approximation of continuous time stochastic
volatility models with discrete time models
Important applications
Inference
Simulation
Pricing
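For illustration, a minimal Euler discretization of the continuous-time SV model as reconstructed above (step size and parameter values are my own choices; a sketch, not a production scheme):

```python
import numpy as np

rng = np.random.default_rng(0)
dt, n = 1 / 252, 2520                        # daily Euler steps over ten years
mu, beta, omega = 0.05, 0.0, 0.3             # illustrative parameter choices
log_s, sigma = np.log(100.0), 0.2            # initial log-price and volatility

for _ in range(n):
    dw1, dw2 = np.sqrt(dt) * rng.standard_normal(2)
    log_s += mu * dt + sigma * dw1                    # d ln S_t
    sigma += beta * sigma * dt + omega * sigma * dw2  # d sigma_t

print(np.exp(log_s), sigma)                  # terminal price and volatility
```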
Other asset classes
So far we have only discussed stocks (and indices)
Stock derivatives can be studied using derivative pricing models
Financial econometrics also deals with many
other asset classes
Term structure (including credit risk)
Commodities
Mutual funds
Energy markets
...
Term structure modeling
Model a complete curve at a single point in
time
There exist models
in discrete/continuous time
descriptive/pricing
for standard interest rates/derivatives
...
Part II: Inference
Contents
Parametric inference for ARCH-type models
Rank based inference
Analogy principle
The classical approach to estimation is based
on the analogy principle
if you want to estimate an expectation, take an
average
if you want to estimate a probability, take a
frequency
...
Moment estimation (GMM)
Consider an ARCH-type model:

$R_t = \sigma_{t-1}(\theta)\,\varepsilon_t$

We suppose that $\sigma_{t-1}(\theta)$ can be calculated on the basis of the observations if $\theta$ is known

Moment condition:

$E\left\{ R_t^2 - \sigma_{t-1}^2(\theta) \right\} = 0$
Moment estimation - 2
The estimator $\hat{\theta}_n$ now is taken to solve

$\frac{1}{n} \sum_{t=1}^{n} \left\{ R_t^2 - \sigma_{t-1}^2(\hat{\theta}_n) \right\} = 0$

In case of underidentification: use instruments
In case of overidentification: minimize distance-to-zero
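A minimal sketch of the overidentified case for ARCH(1), with instruments $(1, R_{t-1}^2, R_{t-2}^2)$ and an identity weighting matrix (these choices are mine, for illustration):

```python
import numpy as np
from scipy.optimize import minimize

def gmm_objective(theta, r):
    """Distance-to-zero of the sample moments E[(R_t^2 - sigma_{t-1}^2) z] = 0
    for ARCH(1) with sigma_{t-1}^2 = omega + alpha R_{t-1}^2 and instruments
    z = (1, R_{t-1}^2, R_{t-2}^2), using an identity weighting matrix."""
    omega, alpha = theta
    u = r[1:] ** 2 - (omega + alpha * r[:-1] ** 2)
    g = np.array([u.mean(),
                  np.mean(u * r[:-1] ** 2),
                  np.mean(u[1:] * r[:-2] ** 2)])
    return float(g @ g)

# Simulate an ARCH(1) series (omega=1, alpha=0.5) to have data to estimate.
rng = np.random.default_rng(0)
r, s2 = np.empty(20_000), 2.0
for t in range(len(r)):
    r[t] = np.sqrt(s2) * rng.standard_normal()
    s2 = 1.0 + 0.5 * r[t] ** 2

fit = minimize(gmm_objective, x0=[0.5, 0.2], args=(r,),
               bounds=[(1e-8, None), (0.0, 0.999)])
print(fit.x)                                 # roughly (1.0, 0.5)
```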
Likelihood estimation
In case the density of the innovations is known, say it is $f$, one can write down the density/likelihood of the observed returns:

$\prod_{t=1}^{n} \frac{1}{\sigma_{t-1}(\theta)}\, f\!\left( \frac{R_t}{\sigma_{t-1}(\theta)} \right)$

Estimator: maximize this
Doing the math ...
Maximizing the log-likelihood boils down to solving

$\sum_{t=1}^{n} \left( 1 + \varepsilon_t(\theta)\, \frac{f'}{f}\big( \varepsilon_t(\theta) \big) \right) \frac{1}{2}\, \frac{\partial \log \sigma_{t-1}^2(\theta)}{\partial \theta} = 0$

with

$\varepsilon_t(\theta) = \frac{R_t}{\sigma_{t-1}(\theta)}$
Efficiency consideration
Which of the above estimators is better?
Analysis using Hájek-Le Cam theory of asymptotic statistics
Approximate complicated statistical experiment
with very simple ones
Something which works well in the
approximating experiment, will also do well in
the original one
Quasi MLE
In order for maximum likelihood to work, one
needs the density of the innovations
If this is not known, one can guess a density (e.g., the normal)
This is known as
ML under non-standard conditions (Huber)
Quasi maximum likelihood
Pseudo maximum likelihood
Will it work?
For ARCH-type models, postulating the
Gaussian density can be shown to lead to
consistent estimates
There is a large theory on when this works or
not
We say that, for ARCH-type models, the Gaussian distribution has the QMLE property
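A minimal Gaussian QMLE sketch for ARCH(1) (initialization and optimizer choices are mine): the simulated innovations below are fat-tailed, yet the Gaussian quasi-likelihood still recovers the variance parameters:

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_negloglik(theta, r):
    """Negative Gaussian (quasi) log-likelihood for ARCH(1):
    R_t = sigma_{t-1} eps_t, sigma_{t-1}^2 = omega + alpha R_{t-1}^2."""
    omega, alpha = theta
    sigma2 = np.empty_like(r)
    sigma2[0] = r.var()                      # crude initialization
    sigma2[1:] = omega + alpha * r[:-1] ** 2
    return 0.5 * np.sum(np.log(sigma2) + r ** 2 / sigma2)

# Simulate ARCH(1) data (omega=1, alpha=0.5) with fat-tailed, unit-variance
# t(5) innovations: the Gaussian density is deliberately misspecified.
rng = np.random.default_rng(0)
r, s2 = np.empty(20_000), 2.0
for t in range(len(r)):
    r[t] = np.sqrt(s2) * rng.standard_t(5) * np.sqrt(3 / 5)
    s2 = 1.0 + 0.5 * r[t] ** 2

fit = minimize(gaussian_negloglik, x0=[0.5, 0.2], args=(r,),
               bounds=[(1e-8, None), (0.0, 0.999)])
print(fit.x)                                 # still roughly (1.0, 0.5)
```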

The QMLE pitfall
One often sees people referring to Gaussian
MLE
Then, they remark that we know financial
innovations are fat-tailed ...
... and they switch to t-distributions
The t-distribution does not possess the
QMLE property (but, see later)
How to deal with SV-models?
The SV models look the same:

$R_t = \sigma_{t-1}(\theta)\,\varepsilon_t$

But now, $\sigma_{t-1}(\theta)$ is a latent process and hence not observed
Likelihood estimation still works in principle, but unobserved variances have to be integrated out
Inference for continuous time models
Continuous time inference can, in theory, be
based on
continuous record observations
discretely sampled observations
Essentially all known approaches are based
on approximating discrete time models
Rank based inference
... in which we discuss the main ideas of rank
based inference
The statistical model
Consider a model where somewhere there exist i.i.d. random errors $(\varepsilon_t)_{t=1}^{n}$
The observations are $(Y_t)_{t=1}^{n}$
The parameter of interest is some $\theta \in \Theta \subset \mathbb{R}^p$
We denote the density of the errors by $f$
Formal model
We have an outcome space $\mathbb{R}^{nk}$, with $n$ the number of observations and $k$ the dimension of $Y_t$
Take standard Borel sigma-fields
Model for sample size $n$:

$E_f^{(n)} = \left\{ P_{\theta, f}^{(n)} : \theta \in \Theta \right\}$

Asymptotics refer to $n \to \infty$
Example: Linear regression
Linear regression model:

$Y_i = X_i^T \theta + \varepsilon_i$

(with observations $(Y_i, X_i)_{i=1}^{n}$)

Innovation density $f$ and cdf $F$
Example ARCH(1)
Consider the standard ARCH(1) model:

$Y_t = \sqrt{\theta_0 + \theta_1 Y_{t-1}^2}\;\varepsilon_t$

Innovation density $f$ and cdf $F$
Maintained hypothesis
For given $\theta$ and sample size $n$, the innovations $(\varepsilon_t)_{t=1}^{n}$ can be calculated from the observations $(Y_t)_{t=1}^{n}$

For cross-sectional models one may even often write

$\varepsilon_i = \varepsilon_i(\theta) = \varepsilon(Y_i; \theta)$

Latent variable (e.g., SV) models ...
Innovation ranks
The ranks $R_1, \ldots, R_n$ are the ranks of the innovations $\varepsilon_1, \ldots, \varepsilon_n$

We also write $R_1(\theta), \ldots, R_n(\theta)$ for the ranks of the innovations $\varepsilon_1(\theta), \ldots, \varepsilon_n(\theta)$ based on a value $\theta$ for the parameter of interest

Ranks of observations are generally not very useful
Basic properties
The distribution $\mathcal{L}_{\theta, f}\big( R_1(\theta), \ldots, R_n(\theta) \big)$ does not depend on $\theta$ nor on $f$: it is uniform over the permutations of $1, \ldots, n$

This is (fortunately) not true for

$\mathcal{L}_{\theta_0, f}\big( R_1(\theta), \ldots, R_n(\theta) \big)$

at least essentially
Invariance
Suppose we generate the innovations $(\varepsilon_i)_{i=1}^{n}$ as a transformation

$\varepsilon_i = F^{-1}(U_i)$

with $(U_i)_{i=1}^{n}$ i.i.d. standard uniform

Now, the ranks $(R_i)_{i=1}^{n}$ are even invariant with respect to $F$
Reconstruction
For large sample size $n$ we have

$U_i \approx \frac{R_i}{n+1}$

and, thus,

$\varepsilon_i \approx F^{-1}\!\left( \frac{R_i}{n+1} \right)$
Rank based statistics
The idea is to apply whatever procedure you
have that uses innovations on the innovations
reconstructed from the ranks
This makes the procedure robust to
distributional changes
Efficiency loss due to the approximation ($\approx$)?
Rank based autocorrelations
Time-series properties can be studied using
rank based autocorrelations


These can be interpreted as standard
autocorrelations
rank based
for given reference density and distribution
free
$r_n^{f}(l) = \frac{1}{n} \sum_{t=l+1}^{n} \varphi_f\!\left( F^{-1}\!\left( \frac{R_t(\theta)}{n+1} \right) \right) F^{-1}\!\left( \frac{R_{t-l}(\theta)}{n+1} \right), \qquad \varphi_f = -\frac{f'}{f}$
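A minimal sketch with a standard normal reference density (my choice; then $\varphi_f(x) = x$ and the statistic reduces to an autocorrelation of normal quantiles of the ranks):

```python
import numpy as np
from scipy.stats import norm, rankdata

def rank_autocorrelation(resid: np.ndarray, lag: int) -> float:
    """Autocorrelation of innovations reconstructed from their ranks:
    eps_hat_t = F^{-1}(R_t / (n + 1)), with F the standard normal cdf."""
    n = len(resid)
    e = norm.ppf(rankdata(resid) / (n + 1))  # rank-based reconstruction
    e = e - e.mean()
    return float(np.dot(e[:-lag], e[lag:]) / np.dot(e, e))

rng = np.random.default_rng(0)
eps = rng.standard_t(df=3, size=5000)        # fat-tailed i.i.d. innovations
print(rank_autocorrelation(eps, lag=1))      # ~0 for i.i.d. innovations
```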
Robustness
An important property of rank based statistics is the distributional invariance
As a result: a rank based estimator $\hat{\theta}_{RB}$ is consistent for any reference density
All densities satisfy the QMLE property when using rank based inference
Limiting distribution
The limiting distribution of $\hat{\theta}_{RB}$ depends on both the chosen reference density $f$ and the actual underlying density $g$
The optimal choice for the reference density is the actual density
How efficient is this estimator?
Semiparametrically efficient
Remark
All procedures are distribution free with respect to the innovation density $f$
They are, clearly, not distribution free with respect to the parameter of interest $\theta$
Signs and ranks
Why ranks?
So far, we have been considering completely
unrestricted sets of innovation densities
For this class of densities ranks are maximal
invariant
This is crucial for proving semiparametric
efficiency
Alternatives
Alternative specifications may impose
zero-median innovations
symmetric innovations
zero-mean innovations
This is generally a bad idea ...
Zero-median innovations
The maximal invariant now becomes the ranks and signs of the innovations:

$s_t(\theta) = \operatorname{sign}\big( \varepsilon_t(\theta) \big)$

The ideas remain the same, but with a more precise reconstruction
Split the sample of innovations in a positive and a negative part and treat those separately
But ranks are still ...
Yes, the ranks are still invariant
... and the previous results go through
But the efficiency bound has now changed
and rank based procedures are no longer
semiparametrically efficient
... but sign-and-rank based procedures are
Symmetric innovations
In the symmetric case, the signed ranks become maximal invariant:
signs of the innovations
ranks of the absolute values
The reconstruction now becomes still more precise (and efficient)
Semiparametric efficiency
General result
Using the maximal invariant to reconstitute the central sequence leads to semiparametrically efficient inference
in the model for which this maximal invariant is derived
In general use

$E_{\theta, f}\left\{ \Delta_{\theta, f}^{(n)} \,\middle|\, \text{maximal invariant} \right\}$
Proof
The proof is non-trivial, but some intuition can
be given using tangent spaces

Potrebbero piacerti anche