
University of Lausanne - École des HEC

LECTURE NOTES
ADVANCED ECONOMETRICS
Preliminary version; do not quote, cite, or reproduce without permission
Professor: Florian Pelgrin
2009-2010
CHAPTER 2: THE MULTIPLE LINEAR
REGRESSION MODEL (PART II)
Contents

1 Some concepts in point estimation
  1.1 What is an estimator?
  1.2 What constitutes a "good" estimator?
  1.3 Decision theory
  1.4 Unbiased estimators
  1.5 Best unbiased estimator in a parametric model
  1.6 Best invariant unbiased estimators
2 Statistical properties of the OLS estimator
  2.1 Semi-parametric model
      Nonstochastic regressors
      Stochastic regressors
  2.2 Parametric model
      Fixed regressors
      Stochastic regressors
In this part, some of the statistical properties of the ordinary least squares estimator are reviewed. More specifically, we focus on the unbiasedness and efficiency properties of the ordinary least squares estimator when the multiple linear regression model is parametric or semi-parametric. In the next part, the asymptotic properties of the ordinary least squares estimator are analyzed.
1 Some concepts in point estimation
Before presenting the unbiasedness and efficiency properties of the ordinary least squares estimator, it is important to define some concepts that will be used throughout this part and the course.

Generally speaking, point estimation refers to providing a "best guess" of some quantity of interest. The latter could be the parameters of the regression function, the probability density function in a nonparametric model, etc.
1.1 What is an estimator?
To define an estimator, we consider the simple case in which we have $n$ i.i.d. (independent and identically distributed) random variables, $Y_1, Y_2, \ldots, Y_n$: each random variable is characterized by a parametric distribution and thus depends on a parameter (or a vector of parameters), say $\theta$, and we observe the realizations of these random variables. In this respect, one can define an estimator of $\theta$ as follows.

Definition 1. A point estimator is any function $T(Y_1, Y_2, \ldots, Y_n)$ of a sample. Any statistic is a point estimator.
Examples: Assume that $Y_1, \ldots, Y_n$ are i.i.d. $\mathcal{N}(m, \sigma^2)$ random variables.

1. The sample mean
$$\bar{Y}_n = \frac{1}{n} \sum_{i=1}^{n} Y_i$$
is a point estimator (or an estimator) of $m$.

2. The sample variance
$$S_n^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( Y_i - \bar{Y}_n \right)^2$$
is a point estimator of $\sigma^2$.
Remarks:

1. There is no correspondence between the estimator and the parameter to estimate.

2. In the previous definition, there is no mention regarding the range of the statistic $T(Y_1, \ldots, Y_n)$. More specifically, the range of the statistic can be different from that of the parameter.

3. An estimator is a function of the sample: it is a random variable (or vector).

4. An estimate is the realized value of an estimator (i.e. a number) that is obtained when a sample is actually taken. For instance, $\bar{y}_n$ is an estimate of $\bar{Y}_n$ and is given by:
$$\bar{y}_n = \frac{1}{n} \sum_{i=1}^{n} y_i.$$
1.2 What constitutes a "good" estimator?

While this question is simple, there is no straightforward answer... A "good" estimator may be characterized in different ways.¹ For instance, one may consider a "good" estimator as one that:

1. ...minimizes a given loss function and has the smallest risk;

2. ...is unbiased, i.e.
$$E(T(Y)) = \theta \quad \text{or} \quad E(T(Y)) = g(\theta)$$
where $g$ is known;

3. ...satisfies some asymptotic properties (when the sample size is large);

4. ...is efficient, i.e. has the minimum variance among all estimators of the quantity of interest;

5. ...is the best estimator in a restricted class of estimators that satisfies some desirable properties (search within a subclass), for instance the class of unbiased estimators;

6. ...is the best estimator, which has some appropriate properties, by maximizing or minimizing a criterion (or objective function);
¹ For the sake of simplicity, we will almost always consider a statistical parametric model.
7. ...

Some of these "different" interpretations are briefly reviewed below.²
1.3 Decision theory

A first answer to our question is found in decision theory, which is a formal theory for comparing statistical procedures.

Let $\hat{\theta}_n = T(Y)$ denote an estimator of $\theta$ and $\theta_0$ the true value of the parameter $\theta$. We can define a loss function, $L\left(\hat{\theta}_n, \theta_0\right)$, which tells us the loss that occurs when $\hat{\theta}_n$ is used and the true value is $\theta_0$. As we explained before (Chapter 2, Part I), a common loss function is the quadratic one:
$$L\left(\hat{\theta}_n, \theta_0\right) = \left(\hat{\theta}_n - \theta_0\right)^2 \quad \text{or} \quad L\left(\hat{\theta}_n, \theta_0\right) = \left(\hat{\theta}_n - g(\theta_0)\right)^2.$$
However, other loss functions are possible.
1. Absolute norm:
$$L_1\left(\hat{\theta}_n, \theta\right) = \left| \hat{\theta}_n - \theta \right|$$

2. Truncated loss function:
$$L\left(\hat{\theta}_n, \theta\right) = \begin{cases} \left| \hat{\theta}_n - \theta \right| & \text{if } \left| \hat{\theta}_n - \theta \right| \le c \\ c & \text{if } \left| \hat{\theta}_n - \theta \right| > c. \end{cases}$$

3. The zero-one loss function:
$$L\left(\hat{\theta}_n, \theta\right) = \begin{cases} 0 & \text{if } \hat{\theta}_n = \theta \\ 1 & \text{if } \hat{\theta}_n \neq \theta. \end{cases}$$

4. The $L_p$ (or $\ell_p$) loss function:
$$L\left(\hat{\theta}_n, \theta\right) = \left| \hat{\theta}_n - \theta \right|^p$$

5. The Kullback-Leibler loss function:
$$L\left(\hat{\theta}_n, \theta\right) = \int_{\mathcal{Y}} \log\left( \frac{f(y; \theta)}{f(y; \hat{\theta}_n)} \right) f(y; \theta)\, dy,$$
where $f$ is the probability density function of $y$.
² Note that the way of presenting these different interpretations is somewhat artificial, in the sense that there are some relationships among them.
6. Etc.

In the sequel, we focus on the quadratic loss function. Given the loss function, we can calculate the risk function, i.e. the expected value of the loss, for any $\hat{\theta}_n$.

Definition 2. The risk function, i.e. the expected value of the loss, is defined to be:
$$R\left(\hat{\theta}_n, \theta_0\right) = E\left[L\left(\hat{\theta}_n, \theta_0\right)\right] = E\left[L(T(Y), \theta_0)\right] = \int_{\mathcal{Y}} L(T(y), \theta_0)\, f(y; \theta_0)\, dy.$$

For example, the risk function associated with the quadratic loss function is called the mean squared error (MSE):
$$R\left(\hat{\theta}_n, \theta\right) = E\left[\left(\hat{\theta}_n - \theta\right)^2\right].$$
The mean squared error can be written as follows:
$$R\left(\hat{\theta}_n, \theta\right) = V\left(\hat{\theta}_n\right) + \left[B\left(\hat{\theta}_n\right)\right]^2$$
where $B\left(\hat{\theta}_n\right) = E\left(\hat{\theta}_n\right) - \theta$ is the bias of the estimator $\hat{\theta}_n$.

In particular, if $E\left(\hat{\theta}_n\right) = \theta$ for all $\theta$, then $\hat{\theta}_n$ is an unbiased estimator of $\theta$ and
$$R\left(\hat{\theta}_n, \theta\right) = V\left(\hat{\theta}_n\right).$$
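To make the decomposition $R = V + B^2$ concrete, the following short simulation (a minimal sketch in Python with NumPy; the sample size, true mean, and number of replications are illustrative choices, not taken from the notes) approximates the risk, variance, and squared bias of two estimators of the mean of a normal sample: the sample mean and a deliberately shrunken version of it.

import numpy as np

rng = np.random.default_rng(0)
m, sigma, n, n_rep = 2.0, 1.0, 20, 200_000

# Draw n_rep samples of size n and compute both estimators on each sample.
Y = rng.normal(m, sigma, size=(n_rep, n))
t1 = Y.mean(axis=1)          # sample mean (unbiased)
t2 = 0.8 * Y.mean(axis=1)    # shrunken mean (biased)

for name, t in [("sample mean", t1), ("shrunken mean", t2)]:
    risk = np.mean((t - m) ** 2)      # Monte Carlo estimate of the quadratic risk
    var = t.var()                     # Monte Carlo variance
    bias2 = (t.mean() - m) ** 2       # squared Monte Carlo bias
    print(name, risk, var + bias2)    # the two numbers should nearly coincide

Up to simulation noise, the estimated risk matches the sum of the variance and the squared bias, which is exactly the decomposition above.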
Exercises:

1. Consider a quadratic loss function such that the risk function is given by:
$$R\left(\hat{\theta}_n, \theta\right) = E_\theta\left[\left(\hat{\theta}_n - g(\theta)\right)^2\right].$$
Show that:
$$R\left(\hat{\theta}_n, \theta\right) = V\left(\hat{\theta}_n\right) + \left[B\left(\hat{\theta}_n\right)\right]^2$$
where $B\left(\hat{\theta}_n\right) = E\left(\hat{\theta}_n\right) - g(\theta)$.
2. Let $Y_1$ and $Y_2$ be two i.i.d. $\mathcal{P}(\lambda)$ (Poisson) random variables and let $\bar{Y}_2$ and $S_2^2$ denote:
$$\bar{Y}_2 = \frac{1}{2} \sum_{i=1}^{2} Y_i \qquad S_2^2 = \frac{1}{2-1} \sum_{i=1}^{2} \left( Y_i - \bar{Y}_2 \right)^2 = \frac{(Y_1 - Y_2)^2}{2}.$$

(a) Show that:
$$R\left(\bar{Y}_2, \lambda\right) = \frac{\lambda}{2} \qquad R\left(S_2^2, \lambda\right) = \frac{\lambda}{2} + 2\lambda^2.$$

(b) Conclude.
3. Let $X_1, \ldots, X_n$ denote a sequence of i.i.d. $\mathcal{N}(0, \sigma^2)$ random variables. An estimator of $\sigma^2$ is given by:
$$S_n^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( X_i - \bar{X}_n \right)^2 \qquad \text{where} \quad \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i.$$

(a) Show that $S_n^2$ is an unbiased estimator of $\sigma^2$.

(b) Determine the root mean squared error of $S_n^2$.

(c) Consider another estimator:
$$\hat{\sigma}_n^2 = \frac{1}{n} \sum_{i=1}^{n} \left( X_i - \bar{X}_n \right)^2.$$

(i) Is it an unbiased estimator of $\sigma^2$? An asymptotically unbiased estimator of $\sigma^2$?

(ii) Determine the root mean squared error of $\hat{\sigma}_n^2$.

(iii) Compare with question (b). Conclude.
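A quick numerical check of this exercise can be done by simulation. The sketch below (Python with NumPy; the values of $n$, $\sigma^2$, and the number of replications are arbitrary illustrative choices) estimates the bias and the root mean squared error of both $S_n^2$ and $\hat{\sigma}_n^2$; it typically shows that $S_n^2$ is unbiased while $\hat{\sigma}_n^2$ is biased downward but can have a smaller RMSE.

import numpy as np

rng = np.random.default_rng(1)
sigma2, n, n_rep = 2.0, 10, 200_000

X = rng.normal(0.0, np.sqrt(sigma2), size=(n_rep, n))
s2_unbiased = X.var(axis=1, ddof=1)   # divides by n - 1
s2_naive = X.var(axis=1, ddof=0)      # divides by n

for name, est in [("S_n^2 (1/(n-1))", s2_unbiased), ("sigma_hat_n^2 (1/n)", s2_naive)]:
    bias = est.mean() - sigma2
    rmse = np.sqrt(np.mean((est - sigma2) ** 2))
    print(name, bias, rmse)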

Taking Definition 2, a good estimator is one that has small risk, and the best estimator has the smallest risk. Therefore, to compare two (or more) estimators, one can compare their risk functions. At first glance, the risk function depends on the unknown true parameter vector $\theta_0$ and thus the previous definition is not operational. However, the risk function (and loss function) can still be redefined as follows:
$$R\left(\hat{\theta}_n, \theta\right) = E_\theta\left[L\left(\hat{\theta}_n, \theta\right)\right] = \int_{\mathcal{Y}} L(T(y), \theta)\, f(y; \theta)\, dy.$$
But there is still another problem: it may happen that no estimator (among different estimators) uniformly dominates the others.³ Indeed, consider the following example.
Example: Suppose that $X \sim \mathcal{N}(\theta, 1)$ and that the loss function is quadratic. Consider two estimators, $\hat{\theta}_1 = X$ and $\hat{\theta}_2 = 3$ (a point-mass estimator). The risk function of the first estimator is:
$$R(\hat{\theta}_1, \theta) = E_\theta (X - \theta)^2 = 1.$$
The risk function of the second estimator is:
$$R(\hat{\theta}_2, \theta) = E_\theta (3 - \theta)^2 = (3 - \theta)^2.$$
If $2 < \theta < 4$, then
$$R(\hat{\theta}_2, \theta) < R(\hat{\theta}_1, \theta).$$
Neither estimator uniformly dominates the other.
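The two risk functions can also be compared numerically. The sketch below (Python with NumPy; the grid of $\theta$ values is an arbitrary choice) evaluates both risks over a grid and reports where the constant estimator beats $X$, reproducing the interval $(2, 4)$.

import numpy as np

theta = np.linspace(0.0, 6.0, 601)
risk1 = np.ones_like(theta)        # R(theta_hat_1, theta) = 1 for all theta
risk2 = (3.0 - theta) ** 2         # R(theta_hat_2, theta) = (3 - theta)^2

better = theta[risk2 < risk1]      # values of theta where the point-mass estimator wins
print(better.min(), better.max())  # approximately 2 and 4 (endpoints excluded)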
In this case, how can we proceed? Different strategies are possible. Let us first define the maximum risk and the Bayes risk.

Definition 3. The maximum risk is defined to be:
$$\bar{R}\left(\hat{\theta}_n\right) = \sup_{\theta} R\left(\hat{\theta}_n, \theta\right).$$
The Bayes risk is given by:
$$r\left(f_0, \hat{\theta}_n\right) = \int R\left(\hat{\theta}_n, \theta\right) f_0(\theta)\, d\theta$$
where $f_0$ is a prior density function for $\theta$.
Taking this definition, we can define (at least) two strategies.

³ A given loss function only yields (in general) a pre-ordering and not a total order!
Strategy 1: Find an estimator that works well for a range of values of $\theta$. For instance, if we know that $\theta \in \Theta$, we might try to find an estimator that solves:
$$\min_{\hat{\theta}_n} \max_{\theta \in \Theta} R\left(\hat{\theta}_n, \theta\right).$$
This is a so-called minimax estimator or minimax rule.

Remark: More generally, $\hat{\theta}_n$ is minimax if:
$$\sup_{\theta} R\left(\hat{\theta}_n, \theta\right) = \inf_{\tilde{\theta}} \sup_{\theta} R\left(\tilde{\theta}, \theta\right).$$
Strategy 2: Find a decision rule that minimizes the Bayes risk.

Definition 4. A decision rule that minimizes the Bayes risk is called a Bayes rule. More specifically, $\hat{\theta}_n$ is a Bayes rule with respect to the prior $f_0$ if:
$$r\left(f_0, \hat{\theta}_n\right) = \inf_{\tilde{\theta}} r\left(f_0, \tilde{\theta}\right).$$

During this course, we will not consider these strategies. However, it should be stressed that decision theory is an important part of the statistical foundations of econometrics and that a number of concepts introduced in this course are derived from it!
1.4 Unbiased estimators

A "good" estimator may be defined as an unbiased one.

Definition 5. Given a parametric model $\left(\mathcal{Y}, P_\theta, \theta \in \Theta\right)$, an estimator $T(Y)$ is unbiased for $\theta$ if:
$$E_\theta[T(Y)] = \theta \quad \text{for all } \theta \in \Theta.$$

This definition can be generalized to any function of $\theta$.
Definition 6. Given a parametric model $\left(\mathcal{Y}, P_\theta, \theta \in \Theta\right)$, an estimator $T(Y)$ is unbiased for a function $g(\theta) \in \mathbb{R}^p$ of the parameter $\theta$ if:
$$E_\theta[T(Y)] = g(\theta) \quad \text{for all } \theta \in \Theta.$$
Examples:

1. Let $Y_1, \ldots, Y_n$ be a random sample from a Bernoulli distribution. An unbiased estimator of $p$ is:
$$T(Y) = \frac{1}{n} \sum_{i=1}^{n} Y_i.$$

2. Let $Y_1, \ldots, Y_n$ be a random sample from the uniform distribution $\mathcal{U}_{[0, \theta]}$. An unbiased estimator of $\theta$ is:
$$T(Y) = \frac{2}{n} \sum_{i=1}^{n} Y_i.$$

3. Let $T(Y)$ be an unbiased estimator of $g(\theta)$. Then the linear transformation $A\, T(Y) + B$ is an unbiased estimator of $A\, g(\theta) + B$, where $A$ and $B$ are constant (nonrandom) matrices.

4. Consider the multiple linear regression model:
$$Y = X\beta_0 + u$$
where $Y \in \mathbb{R}^n$, $X \in \mathcal{M}_{n \times k}$ is nonrandom, $E(u) = 0_{n \times 1}$, and $V(u) = \sigma^2 I_n$. The OLS estimator
$$T(Y) = \left(X^t X\right)^{-1} X^t Y$$
is an unbiased estimator of $\beta_0$.

5. Consider the generalized multiple linear regression model:
$$Y = X\beta_0 + u$$
where $Y \in \mathbb{R}^n$, $X \in \mathcal{M}_{n \times k}$ is nonrandom, $E(u) = 0_{n \times 1}$, and $V(u) = \sigma^2 \Omega$ (the matrix $\Omega$ is known). The generalized least squares (GLS) estimator
$$T(Y) = \left(X^t \Omega^{-1} X\right)^{-1} X^t \Omega^{-1} Y$$
is an unbiased estimator of $\beta_0$.
More generally, we can define the unbiasedness of an estimator conditionally on some random variables. This will be particularly useful in the multiple linear regression model when one assumes a mixture of stochastic and nonstochastic regressors.

Definition 7. $T(X, Y)$ is a conditionally unbiased estimator of $\theta$ if and only if:
$$E_\theta[T(Y, X) \mid X = x] = \theta \quad \text{for all } \theta \in \Theta \text{ and } x \in \mathcal{X}.$$

This definition also generalizes to the case of a function $g$ of $\theta$.

Definition 8. $T(X, Y)$ is a conditionally unbiased estimator of $g(\theta)$ if and only if:
$$E_\theta[T(Y, X) \mid X = x] = g(\theta) \quad \text{for all } \theta \in \Theta \text{ and } x \in \mathcal{X}.$$
Example: Conditional static linear models.

The OLS estimator given by:
$$\hat{\beta}_n = \left(X^t X\right)^{-1} X^t Y$$
is conditionally unbiased for all parameter values and all $x \in \mathcal{X}$. We have:
$$\hat{\beta}_n = \beta_0 + \left(X^t X\right)^{-1} X^t u.$$
Taking the conditional expectation given $X$, we get:
$$E\left(\hat{\beta}_n \mid X\right) = \beta_0 + E\left[\left(X^t X\right)^{-1} X^t u \mid X\right] = \beta_0 + \left(X^t X\right)^{-1} X^t E(u \mid X) = \beta_0.$$
Therefore, by the law of iterated expectations,
$$E\left(\hat{\beta}_n\right) = E_X\left[E\left(\hat{\beta}_n \mid X\right)\right] = \beta_0.$$
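The argument above can be illustrated with a small simulation (Python with NumPy; the design, the true coefficient vector, and the number of replications are illustrative assumptions, not part of the notes): for each replication a new $X$ and $u$ are drawn with $E(u \mid X) = 0$, and the average of the OLS estimates is compared with $\beta_0$.

import numpy as np

rng = np.random.default_rng(2)
n, k, n_rep = 50, 3, 20_000
beta0 = np.array([1.0, -0.5, 2.0])

estimates = np.empty((n_rep, k))
for r in range(n_rep):
    X = rng.normal(size=(n, k))                          # stochastic regressors
    u = rng.normal(size=n)                               # E(u | X) = 0
    y = X @ beta0 + u
    estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS estimate

print(estimates.mean(axis=0))   # close to beta0, illustrating (conditional) unbiasedness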

Remarks:
1. The unbiasedness condition must hold for every possible value of the parameter and not only for some of these values. For instance, if $E_\theta[T(Y)] = \theta$ only when $\theta = \theta_0$, then the estimator is not unbiased, because the unbiasedness condition is not satisfied for every other parameter value.

2. In general, the property of unbiasedness is not preserved by a nonlinear transformation of the estimator.

Proposition 1. Let $T(Y)$ be an unbiased estimator of $\theta$. If $h$ is a nonlinear function of $\theta$, then $E_\theta[h(T(Y))] = h\left[E_\theta(T(Y))\right]$ no longer holds in general.

One can show the following proposition.

Proposition 2. Let $T(Y)$ be an unbiased estimator of $\theta \in \mathbb{R}$.
(a) If $h$ is convex, then $h(T(Y))$ overestimates $h(\theta)$ on average.
(b) If $h$ is concave, then $h(T(Y))$ underestimates $h(\theta)$ on average.
Example: Let $Y_1, \ldots, Y_n$ be i.i.d. $(m, \sigma^2)$ random variables. The sample variance
$$S_n^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( Y_i - \bar{Y}_n \right)^2$$
is an unbiased estimator of $\sigma^2$. However, $\sqrt{S_n^2}$ is a biased estimator of $\sigma$ and it underestimates the true value.
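This downward bias of $\sqrt{S_n^2}$ is a direct consequence of Jensen's inequality applied to the concave square-root function, and it is easy to visualize numerically. The sketch below (Python with NumPy; $n$, $\sigma$, and the number of replications are arbitrary choices) compares the average of $\sqrt{S_n^2}$ with the true $\sigma$.

import numpy as np

rng = np.random.default_rng(3)
sigma, n, n_rep = 2.0, 10, 200_000

Y = rng.normal(0.0, sigma, size=(n_rep, n))
s2 = Y.var(axis=1, ddof=1)        # unbiased for sigma^2
print(s2.mean(), sigma ** 2)      # about equal: S_n^2 is unbiased for sigma^2
print(np.sqrt(s2).mean(), sigma)  # first number is smaller: sqrt(S_n^2) underestimates sigma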
3. Asymptotically unbiased estimators:

Definition 9. The sequence of estimators $\hat{\theta}_n = T_n(Y)$ (with $n \in \mathbb{N}$) is asymptotically unbiased if:
$$\lim_{n \to \infty} E_\theta\left(T_n(Y)\right) = \theta \quad \text{for all } \theta \in \Theta$$
where $E_\theta$ is defined with respect to $P_{\theta, n}$.
An estimator $\hat{\theta}_n$ is asymptotically unbiased if the sequence of estimators $\left(\hat{\theta}_n\right)_{n \in \mathbb{N}}$ is asymptotically unbiased.

4. Existence of an unbiased estimator:

Proposition 3. If $\theta$ is a non-identified parameter, then there does not exist an unbiased estimator of $\theta$. If $g(\theta)$ is a non-identified parameter function, then there does not exist an unbiased estimator of $g(\theta)$.

A necessary condition for the existence of an unbiased estimator is the identification of the parameter (function) to be estimated.
5. The unbiasedness condition writes:
$$E_\theta[T(Y)] = \int_{\mathcal{Y}} T(y)\, \ell(y; \theta)\, dy = g(\theta)$$
for all $\theta \in \Theta$, where $\ell(y; \theta)$ denotes the density of the sample. Differentiating this condition with respect to $\theta$ yields:
$$g'(\theta) = \frac{\partial}{\partial \theta} \int_{\mathcal{Y}} T(y)\, \ell(y; \theta)\, dy = \int_{\mathcal{Y}} T(y)\, \frac{\partial \ell}{\partial \theta}(y; \theta)\, dy = \int_{\mathcal{Y}} T(y)\, \frac{\partial \log \ell}{\partial \theta}(y; \theta)\, \ell(y; \theta)\, dy = E_\theta\left[T(Y)\, \frac{\partial \log \ell}{\partial \theta^t}(Y; \theta)\right].$$
While the unbiasedness property may appear appealing per se, it is not sufficient on its own. More specifically:

(a) The absence of bias is not a sufficient criterion to discriminate among competing estimators.

(b) A best unbiased estimator may be inadmissible (e.g., Stein's estimator, see further).

(c) There may exist many unbiased estimators of the same parameter (vector) of interest.
(d) This is also true if one only requires that the estimator be asymptotically unbiased.

To illustrate this statement, consider the simple linear regression model:
$$y_i = x_i \beta_0 + u_i$$
where $x_i$ is non-random and the error terms $u_i$ are i.i.d. $\mathcal{N}(0, \sigma^2)$. The OLS estimator
$$\hat{\beta}_n = \left( \sum_{i=1}^{n} x_i^2 \right)^{-1} \left( \sum_{i=1}^{n} x_i y_i \right)$$
is an unbiased estimator of $\beta_0$. However, is it unique? Unfortunately, no! The following three estimators, which are linear with respect to $Y$, are also unbiased:
$$\hat{\beta}_{1,n} = \frac{1}{n} \sum_{i=1}^{n} \frac{y_i}{x_i}, \qquad \hat{\beta}_{2,n} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i}, \qquad \hat{\beta}_{3,n} = \frac{1}{n-1} \sum_{i=2}^{n} \frac{y_i - y_{i-1}}{x_i - x_{i-1}}.$$
More generally, any estimator of the form $\sum_{i=1}^{n} w_i y_i$ (linear with respect to $Y$) such that:
$$\sum_{i=1}^{n} w_i x_i = 1$$
is unbiased.

Exercise: Verify the previous condition for $\hat{\beta}_{1,n}$, $\hat{\beta}_{2,n}$, and $\hat{\beta}_{3,n}$.
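As a complement to the exercise, the condition $\sum_i w_i x_i = 1$ can be checked numerically. The sketch below (Python with NumPy; the particular regressor values are an arbitrary illustration) builds the weight vector $w$ implied by each estimator and verifies that $\sum_i w_i x_i = 1$.

import numpy as np

x = np.array([0.5, 1.3, 2.1, 3.7, 4.2])   # arbitrary non-random regressors
n = x.size

# Weights w such that beta_hat = sum_i w_i * y_i for each estimator.
w_ols = x / np.sum(x ** 2)
w_1 = (1.0 / n) / x
w_2 = np.ones(n) / np.sum(x)
w_3 = np.zeros(n)
for i in range(1, n):                     # beta_3 averages slopes of consecutive points
    w_3[i] += 1.0 / ((n - 1) * (x[i] - x[i - 1]))
    w_3[i - 1] -= 1.0 / ((n - 1) * (x[i] - x[i - 1]))

for name, w in [("OLS", w_ols), ("beta_1", w_1), ("beta_2", w_2), ("beta_3", w_3)]:
    print(name, np.dot(w, x))             # each should equal 1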
1.5 Best unbiased estimator in a parametric model

When an estimator is unbiased, its (matrix) quadratic risk function is given by:
$$R_\theta(T(Y), \theta) = E_\theta\left[(T(Y) - \theta)(T(Y) - \theta)^t\right]$$
and thus reduces to its variance-covariance matrix $V_\theta(T(Y))$. Therefore, comparing two (or more) unbiased estimators becomes equivalent to comparing their variance-covariance matrices.
Definition 10. Suppose that $T_1(Y)$ and $T_2(Y)$ are two unbiased estimators. $T_1(Y)$ dominates $T_2(Y)$ if and only if:
$$V_\theta(T_2(Y)) \succeq V_\theta(T_1(Y)),$$
i.e. the matrix $V_\theta(T_2(Y)) - V_\theta(T_1(Y))$ is positive semi-definite for all $\theta \in \Theta$.

In this respect, a natural question is whether there exists a lower bound for the variance-covariance matrix of unbiased estimators in parametric models. To answer this question, we first need to define the so-called Fisher information matrix.
Definition 11. A parametric model with density $\ell(y; \theta)$, $\theta \in \Theta$, is defined to be regular if:

1. $\Theta$ is an open subset of $\mathbb{R}^p$;

2. $\ell(y; \theta)$ is differentiable with respect to $\theta$;

3. $\int_{\mathcal{Y}} \ell(y; \theta)\, dy$ is differentiable with respect to $\theta$ and
$$\frac{\partial}{\partial \theta} \int_{\mathcal{Y}} \ell(y; \theta)\, dy = \int_{\mathcal{Y}} \frac{\partial \ell}{\partial \theta}(y; \theta)\, dy;$$

4. The Fisher information matrix
$$\mathcal{I}(\theta) = E_\theta\left[ \frac{\partial \log \ell(Y; \theta)}{\partial \theta}\, \frac{\partial \log \ell(Y; \theta)}{\partial \theta^t} \right]$$
exists and is nonsingular for every $\theta \in \Theta$.
Taking this definition, the following fundamental theorem can be shown.

Theorem 1. Suppose that the parametric model is regular. Every estimator $T(Y)$ that is regular and unbiased for $\theta \in \mathbb{R}^k$ has a variance-covariance matrix satisfying:
$$V_\theta(T(Y)) \succeq \mathcal{I}(\theta)^{-1}.$$
The quantity $\mathcal{I}(\theta)^{-1}$ is called the Fréchet-Darmois-Cramér-Rao (FDCR) lower bound.
Remark: $T(Y)$ is regular if it is square integrable:
$$E_\theta \| T(Y) \|^2 < \infty \quad \text{for all } \theta \in \Theta$$
and it satisfies:
$$\frac{\partial}{\partial \theta} \int_{\mathcal{Y}} T(y)\, \ell(y; \theta)\, dy = \int_{\mathcal{Y}} T(y)\, \frac{\partial \ell}{\partial \theta}(y; \theta)\, dy.$$
The previous theorem can also be stated in the case of a function $g$ of $\theta$.

Theorem 2. Suppose that the parametric model is regular. Every estimator $T(Y)$ that is regular and unbiased for $g(\theta) \in \mathbb{R}^p$ has a variance-covariance matrix satisfying:
$$V_\theta(T(Y)) \succeq \frac{\partial g(\theta)}{\partial \theta^t}\, \mathcal{I}(\theta)^{-1}\, \frac{\partial g(\theta)^t}{\partial \theta}$$
where $\frac{\partial g(\theta)}{\partial \theta^t}$ is the $p \times k$ Jacobian matrix. The quantity $\frac{\partial g(\theta)}{\partial \theta^t} \mathcal{I}(\theta)^{-1} \frac{\partial g(\theta)^t}{\partial \theta}$ is also called the Fréchet-Darmois-Cramér-Rao lower bound.
We are now in a position to define the efficiency of an estimator.

Definition 12. Assume that the parametric model is regular. A regular unbiased estimator of $\theta$ (respectively $g(\theta)$) is efficient if its variance-covariance matrix equals the FDCR lower bound:
$$V_\theta(T(Y)) = \mathcal{I}(\theta)^{-1} \quad \text{or} \quad V_\theta(T(Y)) = \frac{\partial g(\theta)}{\partial \theta^t}\, \mathcal{I}(\theta)^{-1}\, \frac{\partial g(\theta)^t}{\partial \theta}.$$

Remarks:

1. It is worth noting that we restrict the class of estimators under consideration to the unbiased estimators.

Proposition 4. An efficient estimator of $\theta$ or $g(\theta)$ is optimal in the class of unbiased estimators.
2. An efficient estimator of $\theta$ or $g(\theta)$ is necessarily unique.

Proposition 5. The best unbiased estimator of $\theta$ or $g(\theta)$ is unique. Moreover, this best unbiased estimator is uncorrelated with the difference between itself and every other unbiased estimator of $\theta$ or $g(\theta)$.
Example: Let $X_1, \ldots, X_n$ denote a sequence of i.i.d. $\mathcal{B}(p)$ (Bernoulli) random variables. The sample mean
$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$$
is an unbiased estimator of $p$ and is the maximum likelihood estimator of $p$ (see Chapter 3). We get:
$$V(\bar{X}_n) = \frac{p(1-p)}{n}.$$
The Fisher information (whose inverse is the FDCR lower bound) is given by:
$$E\left[ \left( \frac{n \bar{X}_n}{p} - \frac{n - n\bar{X}_n}{1-p} \right)^2 \right] = E\left[ \left( \frac{n(\bar{X}_n - p)}{p(1-p)} \right)^2 \right] = \frac{n}{p(1-p)}.$$
Therefore $V(\bar{X}_n)$ equals the inverse of the Fisher information, i.e. the FDCR lower bound, and $\bar{X}_n$ is efficient.
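A short simulation (Python with NumPy; $p$, $n$, and the number of replications are arbitrary choices) can be used to confirm that the Monte Carlo variance of $\bar{X}_n$ matches the FDCR bound $p(1-p)/n$.

import numpy as np

rng = np.random.default_rng(4)
p, n, n_rep = 0.3, 40, 200_000

X = rng.binomial(1, p, size=(n_rep, n))
xbar = X.mean(axis=1)

print(xbar.var())        # Monte Carlo variance of the sample mean
print(p * (1 - p) / n)   # FDCR lower bound = inverse of the Fisher information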
1.6 Best invariant unbiased estimators

The previous results can only be used to find best unbiased estimators in parametric models. These results no longer apply in the case of semi-parametric models. The class of estimators must again be restricted, i.e. we impose invariance conditions. More specifically, we are interested in two forms of invariance:

1. when (i) the parameters of interest appear linearly in the first moment and (ii) the class of estimators is restricted to estimators that are linear in the observations;

2. when (i) the parameters of interest appear linearly in the second moment and (ii) the class of estimators is restricted to quadratic estimators.
In the former case, the so-called Gauss-Markov theorem defines the best linear unbiased estimators. In the latter case, one can define the best quadratic unbiased estimators.
To proceed, consider the (conditional static) linear regression model:
$$Y = X\beta_0 + u$$
with $E(u \mid X) = 0$ and $V(u \mid X) = \sigma^2 I_n$, where $Y$ is an $n$-dimensional vector and $X$ is an $n \times k$ matrix of rank $k$ (or $P(\mathrm{rk}(X) = k) = 1$). The parameter vector of interest is the $k$-dimensional vector $\beta_0$, which is linearly related to $E(Y \mid X)$. If we now consider the class of unbiased estimators that are linear in $Y$ (conditional on $X$), one can show the following theorem.

Theorem 3. The ordinary least squares estimator of $\beta_0$ defined by
$$\hat{\beta}_{OLS} = \left(X^t X\right)^{-1} X^t Y$$
is the best estimator in the class of linear (in $Y$) unbiased estimators of $\beta_0$. Its variance is:
$$V\left(\hat{\beta}_{OLS} \mid X\right) = \sigma^2 \left(X^t X\right)^{-1}.$$
This theorem is known as the Gauss-Markov theorem. It can be easily generalized to the case of non-spherical error terms (assuming that the variance-covariance matrix of the error terms is known) as follows.

Theorem 4. Consider the conditional static linear model:
$$Y = X\beta_0 + u$$
with $E(u \mid X) = 0$ and $V(u \mid X) = \sigma^2 \Omega_0$, where $\Omega_0$ is a known positive definite matrix, $Y$ is an $n$-dimensional vector, and $X$ is an $n \times k$ matrix of rank $k$ (or $P(\mathrm{rk}(X) = k) = 1$). The generalized least squares estimator defined by
$$\hat{\beta}_{GLS} = \left(X^t \Omega_0^{-1} X\right)^{-1} X^t \Omega_0^{-1} Y$$
is the best estimator in the class of linear unbiased estimators of $\beta_0$. Its variance is given by:
$$V\left(\hat{\beta}_{GLS} \mid X\right) = \sigma^2 \left(X^t \Omega_0^{-1} X\right)^{-1}.$$
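In practice the GLS estimator can be computed directly from its formula whenever $\Omega_0$ is known. The sketch below (Python with NumPy; the AR(1)-type covariance used for $\Omega_0$ and all other numerical values are illustrative assumptions) compares the GLS and OLS estimates on one simulated data set.

import numpy as np

rng = np.random.default_rng(5)
n, beta0, sigma = 200, np.array([1.0, 2.0]), 1.0

# A known, positive definite Omega_0 (here an AR(1)-type correlation structure).
rho = 0.7
Omega0 = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

X = np.column_stack([np.ones(n), rng.normal(size=n)])
L = np.linalg.cholesky(Omega0)
u = sigma * (L @ rng.normal(size=n))          # V(u | X) = sigma^2 * Omega0
y = X @ beta0 + u

Oinv = np.linalg.inv(Omega0)
beta_gls = np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_gls, beta_ols)

In repeated samples both estimators are centered on beta_0, but the Gauss-Markov argument of Theorem 4 says that the GLS estimator has the smaller (conditional) variance-covariance matrix.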

Finally, it is also possible to show that the unbiased estimator of $\sigma^2$ is the best quadratic unbiased estimator. In this case, we need some additional assumptions regarding the third and fourth moments of the error terms conditionally on $X$. This result is stated in the theorem below.

Theorem 5. Consider the (conditional static) linear regression model:
$$Y = X\beta_0 + u$$
with $E(u_i \mid X) = 0$, $V(u_i \mid X) = \sigma^2$, $E(u_i^3 \mid X) = 0$, and $E(u_i^4 \mid X) = 3\sigma^4$, where $Y$ is an $n$-dimensional vector and $X$ is an $n \times k$ matrix of rank $k$ (or $P(\mathrm{rk}(X) = k) = 1$). The estimator of $\sigma^2$ defined by
$$s^2 = \frac{1}{n-k}\, Y^t M_X Y = \frac{\hat{u}^t \hat{u}}{n-k}$$
is the best quadratic unbiased estimator of $\sigma^2$.
2 Statistical properties of the OLS estimator


Using the previous results, we can now study the unbiasedness and efficiency properties of the ordinary least squares estimator. First, we consider the semi-parametric multiple linear regression model, making the distinction between nonstochastic and stochastic regressors. Then we analyze these properties in the parametric multiple linear regression model. In the latter, one main advantage is that we can derive the exact distribution of the ordinary least squares estimator.
2.1 Semi-parametric model
Nonstochastic regressors
In the case of nonstochastic regressors, the multiple linear regression model writes:
$$Y = X\beta + u$$
where $E(u) = 0_{n \times 1}$, $V(u) = \sigma^2 I_n$, and $X$ is a matrix of nonstochastic regressors with $\mathrm{rk}(X) = k$. This is a semi-parametric specification.

The ordinary least squares estimator of $\beta$ is given by:
$$\hat{\beta}_{OLS} = \left(X^t X\right)^{-1} X^t Y.$$
The corresponding naive least squares estimator of $\sigma^2$ is given by:
$$\hat{\sigma}^2 = \frac{\| Y - X\hat{\beta}_{OLS} \|^2}{n}.$$
On the one hand, the ordinary least squares estimator of $\beta$ is unbiased, uncorrelated with the residuals, and its variance-covariance matrix is given by $\sigma^2 (X^t X)^{-1}$.

Proposition 6. The ordinary least squares estimator of $\beta$ satisfies the following properties:
$$E\left(\hat{\beta}_{OLS}\right) = \beta \quad \text{for all } \beta$$
$$V\left(\hat{\beta}_{OLS}\right) = \sigma^2 \left(X^t X\right)^{-1}$$
and
$$\mathrm{Cov}\left(\hat{\beta}_{OLS}, \hat{u}\right) = 0_{k \times n}$$
where $\hat{u} = Y - \hat{Y} = M_X Y$.

Proof:
1. By denition,

OLS
= (X

X)
1
X

Y . Therefore,

OLS
= (X

X)
1
X

(X + )
= + (X

X)
1
X

u.
It follows that:
E(

OLS
) = +E((X

X)
1
X

u)
= + (X

X)
1
X

E(u) = .
2. By denition,
V
_

OLS
_
= E
_
_

OLS
E
_

OLS
___

OLS
E
_

OLS
__

_
= E
_
_

OLS

__

OLS

_

_
.
Using the previous proof, one has:

OLS
= (X

X)
1
X

u
Therefore (since $\left((X^t X)^{-1}\right)^t = (X^t X)^{-1}$),
$$V\left(\hat{\beta}_{OLS}\right) = E\left[(X^t X)^{-1} X^t u u^t X (X^t X)^{-1}\right] = (X^t X)^{-1} X^t E\left[u u^t\right] X (X^t X)^{-1} = (X^t X)^{-1} X^t \sigma^2 I_n X (X^t X)^{-1} = \sigma^2 (X^t X)^{-1} (X^t X)(X^t X)^{-1} = \sigma^2 (X^t X)^{-1}.$$

Remark: The result can also be shown in another way. One has $V(Y) = V(X\beta + u) = V(u) = \sigma^2 I_n$. It follows that:
$$V\left(\hat{\beta}_{OLS}\right) = V\left((X^t X)^{-1} X^t Y\right) = (X^t X)^{-1} X^t V(Y) X (X^t X)^{-1} = \sigma^2 (X^t X)^{-1}.$$

3. We have:
$$\mathrm{Cov}\left(\hat{\beta}_{OLS}, \hat{u}\right) = E\left[\left(\hat{\beta}_{OLS} - \beta\right) \hat{u}^t\right] = E\left[(X^t X)^{-1} X^t u u^t M_X\right]$$
since $E(\hat{u}) = 0_{n \times 1}$ and $\hat{u} = M_X u$. Therefore,
$$\mathrm{Cov}\left(\hat{\beta}_{OLS}, \hat{u}\right) = (X^t X)^{-1} X^t E\left[u u^t\right] M_X = \sigma^2 (X^t X)^{-1} X^t M_X = 0_{k \times n}$$
since $M_X X = 0_{n \times k}$.

On the other hand, the naive estimator of $\sigma^2$ is biased. To obtain an unbiased estimator of $\sigma^2$, we need to adjust the denominator (i.e., instead of dividing by $n$, we correct for the number of explanatory variables).

Proposition 7. The naive ordinary least squares estimator of $\sigma^2$,
$$\hat{\sigma}^2 = \frac{\| Y - X\hat{\beta}_{OLS} \|^2}{n},$$
is biased. The unbiased ordinary least squares estimator of $\sigma^2$ is given by:
$$s^2 = \frac{\| Y - X\hat{\beta}_{OLS} \|^2}{n - k}$$
where $k$ is the number of explanatory variables.


Proof: By definition,
$$\hat{u} = Y - \hat{Y} = Y - P_X Y = (I - P_X) Y,$$
i.e.
$$\hat{u} = M_X Y = M_X u$$
(since $M_X X = 0$). Therefore, $\sum_{i=1}^{n} \hat{u}_i^2$ can be written as follows:
$$\hat{u}^t \hat{u} = u^t M_X^t M_X u = u^t M_X u = \mathrm{Tr}\left(M_X u u^t\right).$$
Therefore,
$$E\left[\hat{u}^t \hat{u}\right] = E\left[\mathrm{Tr}\left(M_X u u^t\right)\right] = \mathrm{Tr}\left[M_X E\left(u u^t\right)\right] = \mathrm{Tr}\left(M_X \sigma^2 I_n\right) = \sigma^2 \mathrm{Tr}(M_X) = \sigma^2 (n - k).$$
It follows that $\hat{\sigma}^2$ is a biased estimator of $\sigma^2$ since:
$$E\left[\hat{\sigma}^2\right] = \frac{n-k}{n}\, \sigma^2.$$
Finally, $s^2 = \frac{n}{n-k}\, \hat{\sigma}^2$ is an unbiased estimator of $\sigma^2$.
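The degrees-of-freedom correction can be checked by simulation. The sketch below (Python with NumPy; the design and all numerical values are illustrative) estimates the expectations of the naive and corrected estimators over repeated samples and compares them with $\sigma^2$ and with the factor $(n-k)/n$.

import numpy as np

rng = np.random.default_rng(6)
n, k, sigma2, n_rep = 30, 4, 2.0, 50_000

X = rng.normal(size=(n, k))                  # fixed design, drawn once
P = X @ np.linalg.inv(X.T @ X) @ X.T         # projection matrix P_X
M = np.eye(n) - P                            # residual maker M_X

U = rng.normal(0.0, np.sqrt(sigma2), size=(n_rep, n))
ssr = np.einsum('ri,ij,rj->r', U, M, U)      # u' M_X u for each replication

print((ssr / n).mean(), sigma2 * (n - k) / n)   # naive estimator: biased downward
print((ssr / (n - k)).mean(), sigma2)           # corrected estimator: unbiased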
Remark: The variance of $s^2$ cannot be derived without some assumptions regarding the third and fourth moments of $u$.⁴ If we assume that such moments exist, then:
$$V\left(s^2\right) = \frac{2\sigma^4}{n-k} + \frac{\sum_i \left(\mu_{4i} - 3\sigma^4\right) m_{X,ii}^2}{(n-k)^2}$$
where $m_{X,ii}$ is the $i$-th diagonal element of $M_X$ and $\mu_{4i}$ is the fourth-order moment of $u_i$.

As explained before, unbiasedness is not sufficient per se to discriminate among estimators. We now turn to the efficiency problem. First, we study the efficiency of the ordinary least squares estimator of $\beta$. Then we focus on the efficiency of the estimator of $\sigma^2$.

⁴ To define the semi-parametric model, we only make assumptions regarding the first two moments.
Theorem 6. Consider the static multiple linear regression model:
$$Y = X\beta_0 + u$$
where $E(u_i) = 0$ and $V(u_i) = \sigma^2$ for all $i$, $Y$ is an $n$-dimensional vector and $X$ is an $n \times k$ matrix of rank $k$. The ordinary least squares estimator of $\beta_0$ defined by
$$\hat{\beta}_{OLS} = \left(X^t X\right)^{-1} X^t Y$$
is the best estimator in the class of linear (in $Y$) unbiased estimators of $\beta_0$. Its variance is
$$V\left(\hat{\beta}_{OLS}\right) = \sigma^2 \left(X^t X\right)^{-1}.$$

Proof: See Theorem 3.
Theorem 7. Consider the static multiple linear regression model:
$$Y = X\beta_0 + u$$
where $E(u_i) = 0$, $V(u_i) = \sigma^2$, $E(u_i^3) = 0$, and $E(u_i^4) = 3\sigma^4$ for all $i$, $Y$ is an $n$-dimensional vector and $X$ is an $n \times k$ matrix of rank $k$. The estimator of $\sigma^2$ defined by
$$s^2 = \frac{1}{n-k}\, Y^t M_X Y = \frac{\hat{u}^t \hat{u}}{n-k}$$
is the best quadratic unbiased estimator of $\sigma^2$.

Proof: See Theorem 5.
Stochastic regressors
We now study the properties of the estimator of $(\beta^t, \sigma^2)^t$ when the regressors are stochastic. All in all, the main properties are not altered; only the way of proving the results changes.

In the presence of stochastic regressors, the multiple linear regression model writes:
$$Y = X\beta + u$$
where $E(u \mid X) = 0_{n \times 1}$, $V(u \mid X) = \sigma^2 I_n$, and $X$ is a matrix of random regressors with $P(\mathrm{rk}(X) = k) = 1$. This is a semi-parametric specification.

The ordinary least squares estimator of $\beta$ is given by:
$$\hat{\beta}_{OLS} = \left(X^t X\right)^{-1} X^t Y.$$
The corresponding naive least squares estimator of $\sigma^2$ is given by:
$$\hat{\sigma}^2 = \frac{\| Y - X\hat{\beta}_{OLS} \|^2}{n}.$$
Proposition 8. The ordinary least squares estimator of $\beta$ satisfies the following properties:
$$E\left(\hat{\beta}_{OLS} \mid X\right) = \beta \quad \text{for all } \beta$$
$$E\left(\hat{\beta}_{OLS}\right) = \beta \quad \text{for all } \beta$$
$$V\left(\hat{\beta}_{OLS} \mid X\right) = \sigma^2 \left(X^t X\right)^{-1}$$
$$V\left(\hat{\beta}_{OLS}\right) = \sigma^2 E_X\left[\left(X^t X\right)^{-1}\right]$$
and
$$\mathrm{Cov}\left(\hat{\beta}_{OLS}, \hat{u} \mid X\right) = 0_{k \times n}$$
where $\hat{u} = Y - \hat{Y} = M_X Y$.
Remark: To obtain the unconditional properties of the ordinary least squares estimator of $\beta$ in the presence of stochastic regressors, one generally proceeds in two steps:

1. obtain the desired result conditionally on $X$;

2. obtain the unconditional result by averaging (i.e., by integrating) over the conditional distribution.
Selected proofs:

1. Conditional unbiasedness of $\hat{\beta}_{OLS}$:
$$E\left(\hat{\beta}_{OLS} \mid X\right) = \beta + E\left[\left(X^t X\right)^{-1} X^t u \mid X\right] = \beta.$$
2. Unconditional unbiasedness of $\hat{\beta}_{OLS}$:
$$E\left(\hat{\beta}_{OLS}\right) = E_X\left[E\left(\hat{\beta}_{OLS} \mid X\right)\right] = E_X[\beta] = \beta.$$

3. Conditional variance of $\hat{\beta}_{OLS}$:
$$V\left(\hat{\beta}_{OLS} \mid X\right) = E\left[\left(X^t X\right)^{-1} X^t u u^t X \left(X^t X\right)^{-1} \mid X\right] = \left(X^t X\right)^{-1} X^t E\left[u u^t \mid X\right] X \left(X^t X\right)^{-1} = \sigma^2 \left(X^t X\right)^{-1}.$$

4. Unconditional variance of $\hat{\beta}_{OLS}$:
$$V\left(\hat{\beta}_{OLS}\right) = E_X\left[V\left(\hat{\beta}_{OLS} \mid X\right)\right] + V_X\left[E\left(\hat{\beta}_{OLS} \mid X\right)\right] = E_X\left[V\left(\hat{\beta}_{OLS} \mid X\right)\right] = \sigma^2 E_X\left[\left(X^t X\right)^{-1}\right].$$

As in the case of nonstochastic regressors, one can also show that $s^2$ is an unbiased estimator of $\sigma^2$.
Proposition 9. The unbiased ordinary least squares estimator of $\sigma^2$ is given by:
$$s^2 = \frac{\| Y - X\hat{\beta}_{OLS} \|^2}{n - k}$$
where $k$ is the number of explanatory variables.

Proof: We get:
$$(n-k)\, E\left(s^2 \mid X\right) = E\left(u^t M_X u \mid X\right) = E\left[\mathrm{Tr}\left(M_X u u^t\right) \mid X\right] = \mathrm{Tr}\left[M_X E\left(u u^t \mid X\right)\right] = \sigma^2 \mathrm{Tr}(M_X).$$
The result follows.

Finally, we study the efficiency properties.
Theorem 8. Consider the conditional static multiple linear regression model:
$$Y = X\beta_0 + u$$
where $E(u_i \mid X) = 0$ and $V(u_i \mid X) = \sigma^2$ for all $i$, $Y$ is an $n$-dimensional vector and $X$ is an $n \times k$ matrix of rank $k$. The ordinary least squares estimator of $\beta_0$ defined by:
$$\hat{\beta}_{OLS} = \left(X^t X\right)^{-1} X^t Y$$
is the best estimator in the class of linear (in $Y$) unbiased estimators of $\beta_0$. Its (conditional) variance is:
$$V\left(\hat{\beta}_{OLS} \mid X\right) = \sigma^2 \left(X^t X\right)^{-1}.$$

Theorem 9. Consider the conditional static multiple linear regression model:
$$Y = X\beta_0 + u$$
where $E(u_i \mid X) = 0$, $V(u_i \mid X) = \sigma^2$, $E(u_i^3 \mid X) = 0$, and $E(u_i^4 \mid X) = 3\sigma^4$ for all $i$, $Y$ is an $n$-dimensional vector and $X$ is an $n \times k$ matrix of rank $k$. The estimator of $\sigma^2$ defined by:
$$s^2 = \frac{1}{n-k}\, Y^t M_X Y = \frac{\hat{u}^t \hat{u}}{n-k}$$
is the best quadratic unbiased estimator of $\sigma^2$.
Summary

Proposition 10. The unbiasedness results for the ordinary least squares estimators of $\beta$ and $\sigma^2$ and the Gauss-Markov theorem hold whether or not the matrix $X$ is considered as random.

2.2 Parametric model

Instead of specifying only the first two moments of the error terms, we now assume that the error terms are normally distributed (parametric model). In this case, the exact (as opposed to asymptotic) distribution of $\hat{\beta}_{OLS}$ and $s^2$ can be derived.
Fixed regressors

Proposition 11. Consider the multiple linear regression model:
$$Y = X\beta + u$$
where $u \sim \mathcal{N}(0, \sigma^2 I_n)$ and $X$ is a matrix of fixed regressors with $\mathrm{rk}(X) = k$. Then $\hat{\beta}_{OLS}$ and $\frac{(n-k)\, s^2}{\sigma^2}$ are distributed as follows:
$$\hat{\beta}_{OLS} \sim \mathcal{N}\left(\beta, \sigma^2 \left(X^t X\right)^{-1}\right)$$
$$\frac{(n-k)\, s^2}{\sigma^2} \sim \chi^2(n-k).$$
Moreover, $\hat{\beta}_{OLS}$ and $\frac{(n-k)\, s^2}{\sigma^2}$ are independent.
Proof:

1. Since $Y = X\beta + u$, we get $Y \sim \mathcal{N}(X\beta, \sigma^2 I_n)$. Moreover, $\hat{\beta}_{OLS} = (X^t X)^{-1} X^t Y$, which implies that $\hat{\beta}_{OLS}$ is normally distributed (as a linear transformation of a Gaussian vector). Therefore, we just need to characterize the first two moments, $E\left[\hat{\beta}_{OLS}\right]$ and $V\left[\hat{\beta}_{OLS}\right]$. It follows that:
$$\hat{\beta}_{OLS} \sim \mathcal{N}\left(\beta, \sigma^2 \left(X^t X\right)^{-1}\right).$$
2. As shown before, $\hat{u} = M_X Y$. Therefore, $E[\hat{u}] = E[M_X Y] = M_X E[Y] = M_X X\beta = 0$ (since $M_X X = 0$) and $V[\hat{u}] = \sigma^2 M_X$. It follows that $M_X Y \sim \mathcal{N}(0, \sigma^2 M_X)$, which implies $\frac{M_X Y}{\sigma} \sim \mathcal{N}(0, M_X)$. Since $M_X$ is idempotent, $\left\| \frac{M_X Y}{\sigma} \right\|^2 \sim \chi^2(\mathrm{rk}(M_X))$. This is equivalent to
$$\frac{\| M_X Y \|^2}{\sigma^2} \sim \chi^2(n-k), \quad \text{i.e.} \quad \frac{(n-k)\, s^2}{\sigma^2} \sim \chi^2(n-k).$$
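These exact finite-sample distributions are easy to check numerically. The sketch below (Python with NumPy and SciPy; the design, sample size, and number of replications are illustrative choices) simulates $(n-k)s^2/\sigma^2$ under normal errors and compares a few empirical quantiles with those of the $\chi^2(n-k)$ distribution.

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, k, sigma, n_rep = 25, 3, 1.5, 100_000

X = rng.normal(size=(n, k))
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T   # residual maker M_X

U = rng.normal(0.0, sigma, size=(n_rep, n))
stat = np.einsum('ri,ij,rj->r', U, M, U) / sigma ** 2   # (n - k) * s^2 / sigma^2

qs = [0.1, 0.5, 0.9]
print(np.quantile(stat, qs))            # empirical quantiles
print(stats.chi2.ppf(qs, df=n - k))     # theoretical chi-square(n - k) quantiles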
Finally, the efficiency of $\hat{\beta}_{OLS}$ can be established using maximum likelihood theory (see further). One has the following proposition.

Proposition 12. Consider the multiple linear regression model:
$$Y = X\beta + u$$
where $u \sim \mathcal{N}(0, \sigma^2 I_n)$ and $X$ is a matrix of fixed regressors with $\mathrm{rk}(X) = k$.

1. The ordinary least squares estimator of $\beta$ is efficient: its variance-covariance matrix equals the inverse of the Fisher information matrix.

2. The unbiased ordinary least squares estimator of $\sigma^2$ is not efficient. There exists no best quadratic unbiased estimator of $\sigma^2$ which is efficient.


Proof: See Chapter 3 (Maximum likelihood theory).

Stochastic regressors

Proposition 13. Consider the conditional static multiple linear regression model:
$$Y = X\beta + u$$
where $u \mid X \sim \mathcal{N}(0, \sigma^2 I_n)$ and $X$ is a matrix of random regressors with $P(\mathrm{rk}(X) = k) = 1$. Then, conditionally on $X$, $\hat{\beta}_{OLS}$ is distributed as follows:
$$\hat{\beta}_{OLS} \mid X \sim \mathcal{N}\left(\beta, \sigma^2 \left(X^t X\right)^{-1}\right).$$