LECTURE NOTES
ADVANCED ECONOMETRICS
Preliminary version; do not quote, cite, or reproduce
without permission
Professor : Florian Pelgrin
2009-2010
CHAPTER 2: THE MULTIPLE LINEAR
REGRESSION MODEL (PART II)
Contents
1 Some concepts in point estimation
  1.1 What is an estimator?
  1.2 What constitutes a "good" estimator?
  1.3 Decision theory
  1.4 Unbiased estimators
  1.5 Best unbiased estimator in a parametric model
  1.6 Best invariant unbiased estimators
2 Statistical properties of the OLS estimator
  2.1 Semi-parametric model
    Nonstochastic regressors
    Stochastic regressors
  2.2 Parametric model
    Fixed regressors
    Stochastic regressors
In this part, some of the statistical properties of the ordinary least squares
estimator are reviewed. More specifically, we focus on the unbiasedness and
efficiency properties of the ordinary least squares estimator when the multiple
linear regression model is parametric or semi-parametric. In the next part, the
asymptotic properties of the ordinary least squares estimator are analyzed.
1 Some concepts in point estimation
Before presenting the unbiasedness and efficiency properties of the ordinary
least squares estimator, it is important to define some concepts that will be
used throughout this part and the course.
Generally speaking, point estimation refers to providing a "best guess" of
some quantity of interest. The latter could be the parameters of the regression
function, the probability density function in a nonparametric model, etc.
1.1 What is an estimator?
To define an estimator, we consider the simple case in which we have n i.i.d.
(independent and identically distributed) random variables, Y_1, Y_2, \ldots, Y_n,
with mean m and variance \sigma^2.

Definition 1. A point estimator (or estimator) is any function T(Y_1, \ldots, Y_n)
of a sample.

Examples:

1. The sample mean

\bar{Y}_n = \frac{1}{n}\sum_{i=1}^n Y_i

is a point estimator (or an estimator) of m.
2. The sample variance

S_n^2 = \frac{1}{n-1}\sum_{i=1}^n \left(Y_i - \bar{Y}_n\right)^2

is a point estimator of \sigma^2.
Remarks:

1. There is no one-to-one correspondence between an estimator and the
parameter to estimate.

2. In the previous definition, there is no mention of the range of the
statistic T(Y_1, \ldots, Y_n). More specifically, the range of the statistic can
be different from that of the parameter.

3. An estimator is a function of the sample: it is a random variable (or
vector).

4. An estimate is the realized value of an estimator (i.e. a number) that is
obtained when a sample is actually taken. For instance, \bar{y}_n is an estimate
of \bar{Y}_n and is given by:

\bar{y}_n = \frac{1}{n}\sum_{i=1}^n y_i.
1.2 What constitutes a "good" estimator?
While this question is simple, there is no straightforward answer...A "good"
estimator may be characterized in dierent ways.
1
For instance, one may
consider a "good" estimator as one that:
1. ...minimizes a given loss function and has the smallest risk;
2. ...is unbiased, i.e.
E(T(Y )) = or
E(T(Y )) = g()
where g is known;
3. ...satises some asymptotic properties (when the sample size is large);
4. ...is ecient, i.e. has the minimum variance among all estimators of the
quantity of interest;
5. ...is the best estimator in a restricted class of estimators that sat-
ises some desirable properties (search within a subclass)for instance,
the class of unbiased estimators;
6. ...is the best estimator, which has some appropriate properties, by max-
imizing or minimizing a criterion (or objective function);
1
For sake of simplicity, we will almost always consider a statistical parametric model.
1 Some concepts in point estimation 4
7. ...

Some of these "different" interpretations are briefly reviewed below.²
1.3 Decision theory
A first answer to our question is found in decision theory, which is a formal
theory for comparing statistical procedures.
Let \hat{\theta}_n = T(Y) denote an estimator of \theta and \theta_0 the true value of the
parameter \theta. We can define a loss function, L(\hat{\theta}_n, \theta_0), which tells us the loss
that occurs when \hat{\theta}_n is used when the true value is \theta_0. As we explained before
(Chapter 2, part I), a common loss function is the quadratic one:

L(\hat{\theta}_n, \theta_0) = \left(\hat{\theta}_n - \theta_0\right)^2 or L(\hat{g}_n, \theta_0) = \left(\hat{g}_n - g(\theta_0)\right)^2.

However, other loss functions are possible.
1. Absolute norm:

L_1(\hat{\theta}_n, \theta) = |\hat{\theta}_n - \theta|

2. Truncated loss function:

L(\hat{\theta}_n, \theta) = |\hat{\theta}_n - \theta| if |\hat{\theta}_n - \theta| \le c, and c if |\hat{\theta}_n - \theta| > c.

3. The zero-one loss function:

L(\hat{\theta}_n, \theta) = 0 if \hat{\theta}_n = \theta, and 1 if \hat{\theta}_n \ne \theta.

4. The L_p (or \ell_p) loss function:

L(\hat{\theta}_n, \theta) = |\hat{\theta}_n - \theta|^p

5. The Kullback-Leibler loss function:

L(\hat{\theta}_n, \theta) = \int_{\mathcal{Y}} \log\left(\frac{f(y; \theta)}{f(y; \hat{\theta}_n)}\right) f(y; \theta)\, dy,

where f is the probability density function of y.
² Note that the way of presenting these different interpretations is somewhat artificial,
in the sense that there are some relationships among them.
6. Etc.
In the sequel, we focus on the quadratic loss function. Taking the loss function,
we can calculate the risk function, i.e. the expected value of the loss, for
any \hat{\theta}_n.

Definition 2. The risk function, i.e. the expected value of the loss, is
defined to be:
R(\hat{\theta}_n, \theta_0) = E_{\theta_0}\left[L(\hat{\theta}_n, \theta_0)\right] = E[L(T(Y), \theta_0)] = \int_{\mathcal{Y}} L(T(y), \theta_0)\, f(y; \theta_0)\, dy
For example, the risk function associated with the quadratic loss function is
called the mean squared error (MSE):

R(\hat{\theta}_n, \theta) = E_\theta\left[\left(\hat{\theta}_n - \theta\right)^2\right].
The mean squared error can be written as follows:

R(\hat{\theta}_n, \theta) = V_\theta(\hat{\theta}_n) + \left[B(\hat{\theta}_n)\right]^2

where B(\hat{\theta}_n) = E_\theta(\hat{\theta}_n) - \theta is the bias of the estimator \hat{\theta}_n.

In particular, if E_\theta(\hat{\theta}_n) = \theta for all \theta, then \hat{\theta}_n is an unbiased estimator of \theta
and

R(\hat{\theta}_n, \theta) = V_\theta(\hat{\theta}_n).
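The decomposition above can be checked numerically. The sketch below is an illustration of my own, not part of the original notes: the shrunk sample mean 0.9 \bar{Y}_n is an arbitrary, deliberately biased estimator, and all numerical values are assumptions.

```python
import numpy as np

# Monte Carlo check of R(theta_hat, theta) = V(theta_hat) + B(theta_hat)^2.
# The shrunk sample mean 0.9 * Ybar_n is a deliberately biased estimator,
# chosen only for illustration.
rng = np.random.default_rng(0)
theta, n, reps = 2.0, 20, 100_000

samples = rng.normal(theta, 1.0, size=(reps, n))
estimates = 0.9 * samples.mean(axis=1)

mse = np.mean((estimates - theta) ** 2)     # empirical risk under quadratic loss
var = np.var(estimates)                     # variance term V(theta_hat)
bias2 = (np.mean(estimates) - theta) ** 2   # squared bias term B(theta_hat)^2

# The decomposition holds as an exact algebraic identity on the sample.
assert abs(mse - (var + bias2)) < 1e-10
assert bias2 > 0                            # the estimator is indeed biased
```

Note that on any finite sample the identity holds exactly, since the empirical MSE around a constant always splits into empirical variance plus squared empirical bias.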
Exercises:

1. Consider a quadratic loss function such that the risk function is given
by:

R(\hat{g}_n, \theta) = E_\theta\left[\left(\hat{g}_n - g(\theta)\right)^2\right].

Show that:

R(\hat{g}_n, \theta) = V_\theta(\hat{g}_n) + \left[B(\hat{g}_n)\right]^2

where B(\hat{g}_n) = E_\theta(\hat{g}_n) - g(\theta).
2. Let Y_1 and Y_2 be two i.i.d. \mathcal{P}(\lambda) (Poisson) random variables and let
\bar{Y}_2 and S_2^2 denote:

\bar{Y}_2 = \frac{1}{2}\sum_{i=1}^2 Y_i

S_2^2 = \frac{1}{2-1}\sum_{i=1}^2 \left(Y_i - \bar{Y}_2\right)^2 = \frac{(Y_1 - Y_2)^2}{2}.

(a) Show that:

R(\bar{Y}_2, \lambda) = \frac{\lambda}{2}

R(S_2^2, \lambda) = \frac{\lambda}{2} + 2\lambda^2.

(b) Conclude.
3. Let X_1, \ldots, X_n denote a sequence of i.i.d. \mathcal{N}(0, \sigma^2) random variables.
An estimator of \sigma^2 is given by:

S_n^2 = \frac{1}{n-1}\sum_{i=1}^n \left(X_i - \bar{X}_n\right)^2

where

\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i.

(a) Show that S_n^2 is an unbiased estimator of \sigma^2.

(b) Determine the root mean squared error of S_n^2.

(c) Consider another estimator:

\hat{\sigma}_n^2 = \frac{1}{n}\sum_{i=1}^n \left(X_i - \bar{X}_n\right)^2.

(i) Is it an unbiased estimator of \sigma^2? An asymptotically unbiased
estimator of \sigma^2?

(ii) Determine the root mean squared error of \hat{\sigma}_n^2.

(iii) Compare with question (b). Conclude.
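A numerical companion to point (iii), added here as an illustration (the values \sigma^2 = 4 and n = 10 are arbitrary assumptions): for normal data the biased estimator \hat{\sigma}_n^2 has the smaller mean squared error, even though S_n^2 is unbiased.

```python
import numpy as np

# Compare the MSE of the unbiased S_n^2 (divide by n-1) with the biased
# sigma_hat_n^2 (divide by n) on simulated N(0, sigma^2) samples.
rng = np.random.default_rng(1)
sigma2, n, reps = 4.0, 10, 100_000

X = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2_unbiased = X.var(axis=1, ddof=1)   # S_n^2
s2_biased = X.var(axis=1, ddof=0)     # sigma_hat_n^2

mse_unbiased = np.mean((s2_unbiased - sigma2) ** 2)
mse_biased = np.mean((s2_biased - sigma2) ** 2)

assert abs(s2_unbiased.mean() - sigma2) < 0.05   # S_n^2 is unbiased...
assert mse_biased < mse_unbiased                 # ...but has the larger MSE
```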
Taking Definition 2, a good estimator is one that has small risk, and the best
estimator has the smallest risk. Therefore, to compare two (or more) estimators
one can compare their risk functions. At first glance, the risk function
depends on the unknown true parameter vector \theta_0, and thus the previous
definition is not operational. However, the risk function (and loss function) can
still be redefined as follows:
R(\hat{\theta}_n, \theta) = E_\theta\left[L(\hat{\theta}_n, \theta)\right] = \int_{\mathcal{Y}} L(T(y), \theta)\, f(y; \theta)\, dy.
But there is still another problem: it may happen that no estimator (among
different estimators) uniformly dominates the others. Indeed, consider the
following example.
Example: Suppose that X \sim \mathcal{N}(\theta, 1) and that the loss function is a quadratic
form. Consider two estimators, \hat{\theta}_1 = X and \hat{\theta}_2 = 3 (point-mass distribution).
The risk function of the first estimator is:

R(\hat{\theta}_1, \theta) = E_\theta(X - \theta)^2 = 1.

The risk function of the second estimator is:

R(\hat{\theta}_2, \theta) = E_\theta(3 - \theta)^2 = (3 - \theta)^2.

If 2 < \theta < 4, then

R(\hat{\theta}_2, \theta) < R(\hat{\theta}_1, \theta).

Neither estimator uniformly dominates the other.
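A quick numerical companion to this example, added for illustration (the evaluation points \theta = 3 and \theta = 0 are arbitrary choices of mine):

```python
import numpy as np

# Empirical risk of theta1_hat = X versus the constant estimator theta2_hat = 3
# for X ~ N(theta, 1) under quadratic loss.
rng = np.random.default_rng(2)
reps = 200_000

def risks(theta):
    x = rng.normal(theta, 1.0, size=reps)
    r1 = np.mean((x - theta) ** 2)   # Monte Carlo risk of theta1_hat; close to 1
    r2 = (3.0 - theta) ** 2          # exact risk of the constant estimator
    return r1, r2

r1_near, r2_near = risks(3.0)        # true value inside (2, 4)
r1_far, r2_far = risks(0.0)          # true value far from 3

assert r2_near < r1_near             # the constant estimator wins near theta = 3
assert r1_far < r2_far               # X wins elsewhere
```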
In this case, how can we proceed? Different strategies are possible. To present
them, let us first define the maximum risk and the Bayes risk.
Definition 3. The maximum risk is defined to be:

\bar{R}(\hat{\theta}_n) = \sup_\theta R(\hat{\theta}_n, \theta).

The Bayes risk is given by:

r(f_0, \hat{\theta}_n) = \int R(\hat{\theta}_n, \theta)\, f_0(\theta)\, d\theta

where f_0 is a prior density function for \theta.
Strategy 1: Find a decision rule that minimizes the maximum risk:

\hat{\theta}_n = \arg\min_{\hat{\theta}} \max_\theta R(\hat{\theta}, \theta).

This is a so-called minimax estimator or minimax rule.
Remark: More generally, \hat{\theta}_n is minimax if:

\sup_\theta R(\hat{\theta}_n, \theta) = \inf_{\hat{\theta}} \sup_\theta R(\hat{\theta}, \theta).
Strategy 2: Find a decision rule that minimizes the Bayes risk.

Definition 4. A decision rule that minimizes the Bayes risk is called a Bayes
rule. More specifically, \hat{\theta}_n is a Bayes rule with respect to the prior f_0 if:

r(f_0, \hat{\theta}_n) = \inf_{\hat{\theta}} r(f_0, \hat{\theta}).
During this course, we will not consider these strategies. However, it should be
stressed that decision theory is an important part of the statistical foundations
of econometrics and that a number of concepts introduced in this course are
derived from it!
1.4 Unbiased estimators
A "good" estimator may be dened as an unbiased one.
Definition 5. Given a parametric model (\mathcal{Y}, P_\theta, \theta \in \Theta), an estimator T(Y)
is unbiased for \theta if:

E_\theta[T(Y)] = \theta for all \theta \in \Theta.

Definition 6. Given a parametric model (\mathcal{Y}, P_\theta, \theta \in \Theta), an estimator T(Y)
is unbiased for a function g(\theta) \in \mathbb{R}^p of the parameter if:

E_\theta[T(Y)] = g(\theta) for all \theta \in \Theta.
Examples:

1. Let Y_1, \ldots, Y_n be a random sample from a Bernoulli distribution with
parameter p. An unbiased estimator of p is:

T(Y) = \frac{1}{n}\sum_{i=1}^n Y_i.

2. Let Y_1, \ldots, Y_n be a random sample from the uniform distribution \mathcal{U}_{[0,\theta]}.
An unbiased estimator of \theta is:

T(Y) = \frac{2}{n}\sum_{i=1}^n Y_i.
3. Let T(Y) be an unbiased estimator of g(\theta). Then the linear transformation
AT(Y) + B is an unbiased estimator of Ag(\theta) + B, where A and B
are constant (nonrandom) matrices.
4. Consider the multiple linear regression model:

Y = X\beta_0 + u

where Y \in \mathbb{R}^n, X \in \mathcal{M}_{n \times k} is nonrandom, E(u) = 0_{n \times 1}, and V(u) = \sigma^2 I_n.
The OLS estimator

T(Y) = \left(X^t X\right)^{-1} X^t Y

is an unbiased estimator of \beta_0.
5. Consider the generalized multiple linear regression model:

Y = X\beta_0 + u

where Y \in \mathbb{R}^n, X \in \mathcal{M}_{n \times k} is nonrandom, E(u) = 0_{n \times 1}, and V(u) = \sigma^2 \Omega_0
with \Omega_0 known. The generalized least squares estimator

T(Y) = \left(X^t \Omega_0^{-1} X\right)^{-1} X^t \Omega_0^{-1} Y

is an unbiased estimator of \beta_0.
More generally, we can define the unbiasedness of an estimator conditionally on
some random variables. This will be particularly useful in the multiple linear
regression model when one assumes a mixture of stochastic and nonstochastic
regressors.
Definition 7. T(X, Y) is an estimator conditionally unbiased for \theta if and
only if:

E_\theta[T(X, Y) \mid X = x] = \theta for all \theta \in \Theta and x \in \mathcal{X}.

Example: In the multiple linear regression model with E(u \mid X) = 0, the OLS
estimator

\hat{\beta}_n = \left(X^t X\right)^{-1} X^t Y

is conditionally unbiased for all \theta and x \in \mathcal{X}. We have:

\hat{\beta}_n = \beta_0 + \left(X^t X\right)^{-1} X^t u.
Taking the conditional expectation given X, we get:

E[\hat{\beta}_n \mid X] = \beta_0 + E\left[\left(X^t X\right)^{-1} X^t u \mid X\right]
= \beta_0 + \left(X^t X\right)^{-1} X^t E[u \mid X]
= \beta_0.

Therefore, iterating the expectation over X,

E[\hat{\beta}_n] = E_X\left[E[\hat{\beta}_n \mid X]\right] = \beta_0.
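The conditional argument above can be illustrated by simulation. This is a sketch of my own (the design matrix, coefficient vector, and sample sizes are arbitrary assumptions): holding one draw of X fixed and averaging the OLS estimator over many error draws recovers \beta_0.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, reps = 50, 3, 20_000
X = rng.normal(size=(n, k))          # one fixed realization of the regressors
beta0 = np.array([1.0, -2.0, 0.5])
XtX_inv = np.linalg.inv(X.T @ X)

# Many error draws with E(u | X) = 0; each row of Y is one sample.
U = rng.normal(0.0, 1.0, size=(reps, n))
Y = X @ beta0 + U

# beta_hat = (X'X)^{-1} X'y, computed for every replication at once.
B = Y @ X @ XtX_inv

# Averaging over the error distribution (X held fixed) recovers beta0.
assert np.allclose(B.mean(axis=0), beta0, atol=0.02)
```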
Remarks:
1. The unbiasedness condition must hold for every possible value of the
parameter and not only for some of these values. For instance, if
E_\theta[T(Y)] = \theta holds only when \theta = \theta_0, then the estimator is not
unbiased, because the unbiasedness condition is not satisfied for every
other parameter value.

2. In general, the property of unbiasedness is not preserved by a nonlinear
transformation of the estimator.

Proposition 1. Let T(Y) be an unbiased estimator of \theta. If h is a nonlinear
function of \theta, then h(T(Y)) is in general a biased estimator of h(\theta), i.e.
E_\theta[h(T(Y))] \ne h(\theta).
Example: Let Y_1, \ldots, Y_n be i.i.d. (m, \sigma^2). The sample variance

S_n^2 = \frac{1}{n-1}\sum_{i=1}^n \left(Y_i - \bar{Y}_n\right)^2

is an unbiased estimator of \sigma^2. However, \sqrt{S_n^2} is a biased estimator of \sigma,
and it underestimates the true value.
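A small simulation illustrates both claims (my addition; m = 0, \sigma = 2, and n = 10 are arbitrary): S_n^2 is unbiased for \sigma^2, yet \sqrt{S_n^2} systematically underestimates \sigma, a consequence of Jensen's inequality since the square root is concave.

```python
import numpy as np

rng = np.random.default_rng(8)
sigma, n, reps = 2.0, 10, 200_000

Y = rng.normal(0.0, sigma, size=(reps, n))
s2 = Y.var(axis=1, ddof=1)                # S_n^2, divides by n-1
s = np.sqrt(s2)                           # sqrt(S_n^2)

assert abs(s2.mean() - sigma**2) < 0.05   # unbiased for sigma^2
assert s.mean() < sigma                   # biased downward for sigma
```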
3. Asymptotically unbiased estimators:

Definition 9. The sequence of estimators \hat{\theta}_n = T_n(Y) (with n \in \mathbb{N}) is
asymptotically unbiased if:

\lim_{n \to \infty} E_\theta(T_n(Y)) = \theta for all \theta,

where E_\theta denotes the expectation under P_\theta. By extension, \hat{\theta}_n is said to
be asymptotically unbiased if the sequence of estimators (\hat{\theta}_n) is
asymptotically unbiased.
4. Existence of an unbiased estimator:

Proposition 3. If \theta is a non-identified parameter, then there does not exist
an unbiased estimator of \theta. If g(\theta) is a non-identified parameter function,
then there does not exist an unbiased estimator of g(\theta).

A necessary condition for the existence of an unbiased estimator is
the identification of the parameter (function) to be estimated.
5. The unbiasedness condition writes:

E_\theta[T(Y)] = \int_{\mathcal{Y}} T(y)\,\ell(y; \theta)\, dy = g(\theta)

for all \theta. Differentiating this condition with respect to \theta yields:

g'(\theta) = \frac{\partial}{\partial \theta}\int_{\mathcal{Y}} T(y)\,\ell(y; \theta)\, dy
= \int_{\mathcal{Y}} T(y)\,\frac{\partial \ell}{\partial \theta}(y; \theta)\, dy
= \int_{\mathcal{Y}} T(y)\,\frac{\partial \log \ell}{\partial \theta}(y; \theta)\,\ell(y; \theta)\, dy
= E_\theta\left[T(Y)\,\frac{\partial \log \ell}{\partial \theta^t}(Y; \theta)\right].
While the unbiasedness property may appear to be interesting per se, it is not
so much! More specifically,

(a) The absence of bias is not a sufficient criterion to discriminate among
competing estimators.

(b) A best unbiased estimator may be inadmissible (e.g., Stein's estimator,
see further).

(c) There may exist many unbiased estimators for the same parameter (vector)
of interest.

(d) This is also true if one requires that the estimator is asymptotically
unbiased.
To illustrate this statement, consider the simple linear regression model:

y_i = x_i \beta_0 + u_i

where x_i is non-random and the error terms u_i are i.i.d. \mathcal{N}(0, \sigma^2). The OLS
estimator

\hat{\beta}_n = \left(\sum_{i=1}^n x_i^2\right)^{-1}\left(\sum_{i=1}^n x_i y_i\right)

is an unbiased estimator of \beta_0. However, is it unique? Unfortunately, no! The
following three estimators, which are linear with respect to Y, are unbiased:

\hat{\beta}_{1,n} = \frac{1}{n}\sum_{i=1}^n \frac{y_i}{x_i}

\hat{\beta}_{2,n} = \frac{\sum_{i=1}^n y_i}{\sum_{i=1}^n x_i}

\hat{\beta}_{3,n} = \frac{1}{n-1}\sum_{i=2}^n \frac{y_i - y_{i-1}}{x_i - x_{i-1}}.
More generally, any estimator of the form \sum_{i=1}^n w_i y_i (linear with respect to
Y) such that:

\sum_{i=1}^n w_i x_i = 1

is unbiased.

Exercise: Verify the previous condition for \hat{\beta}_{1,n}, \hat{\beta}_{2,n}, and \hat{\beta}_{3,n}.
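The condition \sum w_i x_i = 1 can be checked mechanically by writing each estimator as \sum w_i y_i. The sketch below is my own; the design points x_i are arbitrary nonzero values with distinct consecutive entries.

```python
import numpy as np

n = 6
x = np.array([1.0, 2.0, 3.0, 1.5, 2.5, 4.0])   # arbitrary design points

# OLS: w_i = x_i / sum(x_j^2)
w_ols = x / np.sum(x ** 2)
# beta_1: mean of y_i / x_i  ->  w_i = 1 / (n x_i)
w1 = 1.0 / (n * x)
# beta_2: sum(y_i) / sum(x_i)  ->  w_i = 1 / sum(x_j)
w2 = np.full(n, 1.0 / x.sum())
# beta_3: mean of (y_i - y_{i-1}) / (x_i - x_{i-1}) over i = 2, ..., n
w3 = np.zeros(n)
for i in range(1, n):
    c = 1.0 / ((n - 1) * (x[i] - x[i - 1]))
    w3[i] += c
    w3[i - 1] -= c

# every weight vector satisfies the unbiasedness condition sum(w_i x_i) = 1
for w in (w_ols, w1, w2, w3):
    assert np.isclose(w @ x, 1.0)
```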
1.5 Best unbiased estimator in a parametric model
When an estimator is unbiased, its (matrix) quadratic risk function is given
by:

R(T(Y), \theta) = E_\theta\left[(T(Y) - \theta)(T(Y) - \theta)^t\right] = V_\theta(T(Y)).

An unbiased estimator T_1(Y) is said to be better than an unbiased estimator
T_2(Y) if V_\theta(T_2(Y)) \succeq V_\theta(T_1(Y)), i.e. the matrix V_\theta(T_2(Y)) - V_\theta(T_1(Y)) is
a positive semi-definite matrix for all \theta.
In this respect, a natural question is whether there exists a lower bound for the
variance-covariance matrix of unbiased estimators in parametric models. To
answer this question, we first need to define the so-called Fisher information
matrix.
Definition 11. A parametric model with density \ell(y; \theta), \theta \in \Theta, is defined
to be regular if:

1. \Theta is an open subset of \mathbb{R}^p;

2. \ell(y; \theta) is differentiable with respect to \theta;

3. \int_{\mathcal{Y}} \ell(y; \theta)\, dy is differentiable with respect to \theta and

\frac{\partial}{\partial \theta}\int_{\mathcal{Y}} \ell(y; \theta)\, dy = \int_{\mathcal{Y}} \frac{\partial \ell}{\partial \theta}(y; \theta)\, dy;

4. The Fisher information matrix

\mathcal{I}(\theta) = E_\theta\left[\frac{\partial \log \ell(Y; \theta)}{\partial \theta}\,\frac{\partial \log \ell(Y; \theta)}{\partial \theta^t}\right]

exists and is nonsingular for every \theta \in \Theta.
Taking this definition, the following fundamental theorem can be shown.

Theorem 1. Suppose that the parametric model is regular. Every estimator
T(Y) that is regular and unbiased for \theta \in \mathbb{R}^k has a variance-covariance
matrix satisfying:

V_\theta(T(Y)) \succeq \mathcal{I}(\theta)^{-1}.

The quantity \mathcal{I}(\theta)^{-1} is called the Fréchet-Darmois-Cramér-Rao (FDCR) lower
bound.
Remark: T(Y) is regular if it is square integrable:

E_\theta \|T(Y)\|^2 < \infty for all \theta,

and it satisfies:

\frac{\partial}{\partial \theta}\int_{\mathcal{Y}} T(y)\,\ell(y; \theta)\, dy = \int_{\mathcal{Y}} T(y)\,\frac{\partial \ell}{\partial \theta}(y; \theta)\, dy.
Similarly, every regular unbiased estimator T(Y) of g(\theta) has a variance-covariance
matrix satisfying:

V_\theta(T(Y)) \succeq \frac{\partial g(\theta)}{\partial \theta^t}\,\mathcal{I}(\theta)^{-1}\,\frac{\partial g(\theta)^t}{\partial \theta},

where \frac{\partial g(\theta)}{\partial \theta^t} is the Jacobian matrix of g. The quantity
\frac{\partial g(\theta)}{\partial \theta^t}\,\mathcal{I}(\theta)^{-1}\,\frac{\partial g(\theta)^t}{\partial \theta} is
also called the Fréchet-Darmois-Cramér-Rao lower bound.
We are now in a position to define the efficiency of an estimator.

Definition 12. Assume that the parametric model is regular. A regular
unbiased estimator of \theta (respectively g(\theta)) is efficient if its variance-covariance
matrix equals the FDCR lower bound:

V_\theta(T(Y)) = \mathcal{I}(\theta)^{-1} or V_\theta(T(Y)) = \frac{\partial g(\theta)}{\partial \theta^t}\,\mathcal{I}(\theta)^{-1}\,\frac{\partial g(\theta)^t}{\partial \theta}.

Remarks:

1. It is worth noting that we restrict the class of estimators under
consideration to the unbiased estimators.

Proposition 4. An efficient estimator of \theta or g(\theta) is optimal in the class of
unbiased estimators.
2. An efficient estimator of \theta or g(\theta) is necessarily unique.

Proposition 5. The best unbiased estimator of \theta or g(\theta) is unique. Moreover,
this best unbiased estimator is uncorrelated with the difference between
itself and every other unbiased estimator of \theta or g(\theta).
Example: Let X_1, \ldots, X_n denote a sequence of i.i.d. \mathcal{B}(p) random variables.
The sample mean

\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i

is an unbiased estimator of p and is the maximum likelihood estimator of p
(see Chapter 3). We get:

V(\bar{X}_n) = \frac{p(1-p)}{n}.

The Fisher information is given by:

\mathcal{I}(p) = E\left[\left(\frac{n\bar{X}_n}{p} - \frac{n - n\bar{X}_n}{1-p}\right)^2\right] = E\left[\left(\frac{n(\bar{X}_n - p)}{p(1-p)}\right)^2\right] = \frac{n}{p(1-p)}.

Therefore V(\bar{X}_n) equals the FDCR lower bound \mathcal{I}(p)^{-1} = p(1-p)/n, and \bar{X}_n is
efficient.
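A simulation check of this efficiency claim (a sketch of my own; p = 0.3 and n = 40 are arbitrary assumptions): the empirical variance of \bar{X}_n matches the FDCR bound p(1-p)/n.

```python
import numpy as np

rng = np.random.default_rng(4)
p, n, reps = 0.3, 40, 200_000

X = rng.binomial(1, p, size=(reps, n))
pbar = X.mean(axis=1)

fdcr = p * (1 - p) / n          # inverse Fisher information, p(1-p)/n
assert abs(pbar.var() - fdcr) < 2e-4
assert abs(pbar.mean() - p) < 0.01
```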
1.6 Best invariant unbiased estimators
The previous results can only be used to find best unbiased estimators in
parametric models. These results no longer apply in the case of semi-parametric
models. The class of estimators must again be restricted, i.e. we impose
invariance conditions. More specifically, we are interested in two forms of
invariance:

1. When (i) the parameters of interest appear linearly in the first moment
and (ii) the class of estimators is restricted to estimators that are linear
in the observations;

2. When (i) the parameters of interest appear linearly in the second moment
and (ii) the class of estimators is restricted to quadratic estimators.
In the former case, the so-called Gauss-Markov theorem defines the best linear
unbiased estimators. In the latter, one can define the best quadratic unbiased
estimators.
To proceed, consider the (conditional static) linear regression model:

Y = X\beta_0 + u

with E(u \mid X) = 0 and V(u \mid X) = \sigma^2 I_n, where Y is an n-dimensional vector and
X is an n \times k matrix of rank k (or P(\mathrm{rk}(X) = k) = 1). The parameter
vector of interest is the k-dimensional vector \beta_0, which is linearly related to
E(Y \mid X). If we now consider the class of unbiased estimators that are linear
in Y (conditional on X), one can show the following theorem.
Theorem 3. The ordinary least squares estimator of \beta_0, defined by

\hat{\beta}_{OLS} = (X^t X)^{-1} X^t Y,

is the best estimator in the class of linear (in Y) unbiased estimators of \beta_0.
Its variance is:

V(\hat{\beta}_{OLS} \mid X) = \sigma^2 (X^t X)^{-1}.
Theorem 4. Consider the generalized linear regression model Y = X\beta_0 + u
with V(u \mid X) = \sigma^2 \Omega_0, where \Omega_0 is known positive definite, Y is an
n-dimensional vector, and X is an n \times k matrix of rank k (or P(\mathrm{rk}(X) =
k) = 1). The generalized least squares estimator defined by

\hat{\beta}_{GLS} = \left(X^t \Omega_0^{-1} X\right)^{-1} X^t \Omega_0^{-1} Y

is the best estimator in the class of linear unbiased estimators of \beta_0. Its
variance is given by:

V(\hat{\beta}_{GLS} \mid X) = \sigma^2 \left(X^t \Omega_0^{-1} X\right)^{-1}.
Moreover,

s^2 = \frac{Y^t M_X Y}{n - k} = \frac{\hat{u}^t \hat{u}}{n - k}

is the best quadratic unbiased estimator of \sigma^2.
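A simulation sketch of the two theorems together (my own construction; the heteroskedasticity pattern \omega_i and all sizes are arbitrary assumptions): when V(u \mid X) = \sigma^2 \Omega_0 with \Omega_0 \ne I_n, both OLS and GLS remain unbiased, but GLS has the smaller variance.

```python
import numpy as np

rng = np.random.default_rng(9)
n, k, reps = 200, 2, 20_000
X = rng.normal(size=(n, k))
beta0 = np.array([1.0, 2.0])

# Known diagonal Omega_0: heteroskedastic but independent errors.
omega = rng.uniform(0.1, 10.0, size=n)
Omega_inv = np.diag(1.0 / omega)

U = rng.normal(size=(reps, n)) * np.sqrt(omega)
Y = X @ beta0 + U

A_ols = np.linalg.inv(X.T @ X) @ X.T
A_gls = np.linalg.inv(X.T @ Omega_inv @ X) @ X.T @ Omega_inv
B_ols = Y @ A_ols.T
B_gls = Y @ A_gls.T

assert np.allclose(B_ols.mean(axis=0), beta0, atol=0.05)   # both unbiased
assert np.allclose(B_gls.mean(axis=0), beta0, atol=0.05)
# Gauss-Markov in the generalized model: GLS is the more precise estimator.
assert B_gls.var(axis=0).sum() < B_ols.var(axis=0).sum()
```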
2 Statistical properties of the OLS estimator

2.1 Semi-parametric model

Nonstochastic regressors

Consider the multiple linear regression model Y = X\beta_0 + u, where X is
nonrandom, E(u) = 0_{n \times 1}, and V(u) = \sigma^2 I_n. The ordinary least squares
estimator of \beta_0 is:

\hat{\beta}_{OLS} = \left(X^t X\right)^{-1} X^t Y.
The corresponding naive least squares estimator of \sigma^2 is given by:

\tilde{\sigma}^2 = \frac{\|Y - X\hat{\beta}_{OLS}\|^2}{n}.
On the one hand, the ordinary least squares estimator of \beta is unbiased and
uncorrelated with the adjusted error terms (the residuals), and its
variance-covariance matrix is given by \sigma^2 (X^t X)^{-1}.
Proposition 6. The ordinary least squares estimator of \beta satisfies the
following properties:

E(\hat{\beta}_{OLS}) = \beta for all \beta

V(\hat{\beta}_{OLS}) = \sigma^2 (X^t X)^{-1}

and

Cov(\hat{\beta}_{OLS}, \hat{u}) = 0_{k \times n}

where \hat{u} = Y - \hat{Y} = M_X Y.
Proof:

1. By definition, \hat{\beta}_{OLS} = (X^t X)^{-1} X^t Y. Therefore,

\hat{\beta}_{OLS} = (X^t X)^{-1} X^t (X\beta + u) = \beta + (X^t X)^{-1} X^t u.

It follows that:

E(\hat{\beta}_{OLS}) = \beta + E\left((X^t X)^{-1} X^t u\right) = \beta + (X^t X)^{-1} X^t E(u) = \beta.
2. By definition,

V(\hat{\beta}_{OLS}) = E\left[\left(\hat{\beta}_{OLS} - E(\hat{\beta}_{OLS})\right)\left(\hat{\beta}_{OLS} - E(\hat{\beta}_{OLS})\right)^t\right]
= E\left[\left(\hat{\beta}_{OLS} - \beta\right)\left(\hat{\beta}_{OLS} - \beta\right)^t\right].

Using the previous proof, one has:

\hat{\beta}_{OLS} - \beta = (X^t X)^{-1} X^t u.
Therefore (since \left((X^t X)^{-1}\right)^t = (X^t X)^{-1}),

V(\hat{\beta}_{OLS}) = E\left[(X^t X)^{-1} X^t u u^t X (X^t X)^{-1}\right]
= (X^t X)^{-1} X^t E[u u^t]\, X (X^t X)^{-1}
= (X^t X)^{-1} X^t \sigma^2 I_n X (X^t X)^{-1}
= \sigma^2 (X^t X)^{-1} (X^t X)(X^t X)^{-1}
= \sigma^2 (X^t X)^{-1}.
Remark: The result can also be shown in another way. One has V(Y) =
V(X\beta + u) = V(u) = \sigma^2 I_n. It follows that:

V(\hat{\beta}_{OLS}) = V\left((X^t X)^{-1} X^t Y\right) = (X^t X)^{-1} X^t V(Y)\, X (X^t X)^{-1} = \sigma^2 (X^t X)^{-1}.
3. We have:

Cov(\hat{\beta}_{OLS}, \hat{u}) = E\left[\left(\hat{\beta}_{OLS} - \beta\right)\hat{u}^t\right] = E\left[(X^t X)^{-1} X^t u u^t M_X\right]

since E(\hat{u}) = 0_{n \times 1} and \hat{u} = M_X u. Therefore,

Cov(\hat{\beta}_{OLS}, \hat{u}) = (X^t X)^{-1} X^t E[u u^t]\, M_X = \sigma^2 (X^t X)^{-1} X^t M_X = 0_{k \times n}

since M_X X = 0_{n \times k}.
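Proposition 6 can be verified numerically. This is an illustrative sketch of my own, not part of the notes; the design and sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k, reps, sigma = 30, 2, 50_000, 1.0
X = rng.normal(size=(n, k))                 # fixed regressors
beta = np.array([1.0, -1.0])
XtX_inv = np.linalg.inv(X.T @ X)

U = rng.normal(0.0, sigma, size=(reps, n))
Y = X @ beta + U
B = Y @ X @ XtX_inv                         # OLS estimates, one per row

# empirical variance-covariance of beta_hat against sigma^2 (X'X)^{-1}
assert np.allclose(np.cov(B.T), sigma**2 * XtX_inv, atol=0.01)

# residuals are exactly orthogonal to X in every sample: X' u_hat = 0
resid = Y - B @ X.T
assert np.allclose(resid @ X, 0.0, atol=1e-8)
```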
On the other hand, the naive least squares estimator of \sigma^2,

\tilde{\sigma}^2 = \frac{\|Y - X\hat{\beta}_{OLS}\|^2}{n},

is biased. The unbiased ordinary least squares estimator of \sigma^2 is given by:

s^2 = \frac{\|Y - X\hat{\beta}_{OLS}\|^2}{n - k}

where k is the number of explanatory variables.
Proof: The sum of squared residuals \sum_{i=1}^n \hat{u}_i^2 can be written as follows:

\hat{u}^t \hat{u} = u^t M_X^t M_X u = u^t M_X u = \mathrm{Tr}(M_X u u^t).
Therefore,

E[\hat{u}^t \hat{u}] = E[\mathrm{Tr}(M_X u u^t)]
= \mathrm{Tr}[E(M_X u u^t)]
= \mathrm{Tr}[M_X E(u u^t)]
= \mathrm{Tr}\left(M_X \sigma^2 I_n\right)
= \sigma^2 \mathrm{Tr}(M_X)
= \sigma^2 (n - k).
It follows that \tilde{\sigma}^2 is a biased estimator of \sigma^2 since:

E(\tilde{\sigma}^2) = \frac{n-k}{n}\,\sigma^2.

Finally, s^2 = \frac{n}{n-k}\,\tilde{\sigma}^2 is an unbiased estimator of \sigma^2.
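A quick simulation of this bias (a sketch of my own; n = 20, k = 3, and \sigma^2 = 2 are arbitrary assumptions) confirms E(\tilde{\sigma}^2) = ((n-k)/n)\sigma^2, while \hat{u}^t\hat{u}/(n-k) is centered at \sigma^2.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, reps, sigma2 = 20, 3, 100_000, 2.0
X = rng.normal(size=(n, k))
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T   # residual-maker M_X

U = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
rss = np.einsum('ri,ij,rj->r', U, M, U)            # u' M_X u per replication

naive = rss.mean() / n            # estimates E[sigma_tilde^2] = (n-k)/n sigma^2
unbiased = rss.mean() / (n - k)   # estimates E[s^2] = sigma^2

assert abs(naive - (n - k) / n * sigma2) < 0.02
assert abs(unbiased - sigma2) < 0.02
```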
Remark: The variance of s^2 cannot be derived without some assumptions
regarding the third and fourth moments of u.⁴ If we assume that such moments
exist, then:

V(s^2) = \frac{2\sigma^4}{n-k} + \frac{\sum_i \left(\mu_{4i} - 3\sigma^4\right) m_{X,ii}^2}{(n-k)^2}

where m_{X,ii} is the i-th diagonal element of M_X and \mu_{4i} is the moment of order
4 of u_i.
As we explained before, unbiasedness is not sufficient per se in order to
discriminate among estimators. We now turn to the efficiency problem. First,
we study the efficiency of the ordinary least squares estimator of \beta. Then we
focus on the efficiency of s^2.

⁴ To define the semi-parametric model, we only make assumptions regarding the first two
moments.
Theorem 6. Consider the static multiple linear regression model:

Y = X\beta_0 + u

where E(u_i) = 0 and V(u_i) = \sigma^2 for all i, Y is an n-dimensional vector and
X is an n \times k matrix of rank k. The ordinary least squares estimator of \beta_0,
defined by

\hat{\beta}_{OLS} = (X^t X)^{-1} X^t Y,

is the best estimator in the class of linear (in Y) unbiased estimators of \beta_0.
Its variance is

V(\hat{\beta}_{OLS}) = \sigma^2 (X^t X)^{-1}.

Moreover,

s^2 = \frac{Y^t M_X Y}{n - k} = \frac{\hat{u}^t \hat{u}}{n - k}

is the best quadratic unbiased estimator of \sigma^2.
Stochastic regressors

Consider now the case in which the regressors are stochastic, with E(u \mid X) = 0
and V(u \mid X) = \sigma^2 I_n. The ordinary least squares estimator of \beta_0 is still:

\hat{\beta}_{OLS} = \left(X^t X\right)^{-1} X^t Y.

The corresponding naive least squares estimator of \sigma^2 is given by:

\tilde{\sigma}^2 = \frac{\|Y - X\hat{\beta}_{OLS}\|^2}{n}.
Proposition 8. The ordinary least squares estimator of \beta satisfies the
following properties:

E(\hat{\beta}_{OLS} \mid X) = \beta for all \beta

E(\hat{\beta}_{OLS}) = \beta for all \beta

V(\hat{\beta}_{OLS} \mid X) = \sigma^2 (X^t X)^{-1}

V(\hat{\beta}_{OLS}) = \sigma^2 E_X\left[(X^t X)^{-1}\right]

and

Cov(\hat{\beta}_{OLS}, \hat{u} \mid X) = 0_{k \times n}

where \hat{u} = Y - \hat{Y} = M_X Y.
Proof:

1. Conditional unbiasedness of \hat{\beta}_{OLS}:

E(\hat{\beta}_{OLS} \mid X) = \beta + E\left[(X^t X)^{-1} X^t u \mid X\right] = \beta.
2. Unconditional unbiasedness of \hat{\beta}_{OLS}:

E(\hat{\beta}_{OLS}) = E_X\left[E(\hat{\beta}_{OLS} \mid X)\right] = E_X[\beta] = \beta.
3. Conditional variance of \hat{\beta}_{OLS}:

V(\hat{\beta}_{OLS} \mid X) = E\left[(X^t X)^{-1} X^t u u^t X (X^t X)^{-1} \mid X\right]
= (X^t X)^{-1} X^t E\left[u u^t \mid X\right] X (X^t X)^{-1}
= \sigma^2 (X^t X)^{-1}.
4. Unconditional variance of \hat{\beta}_{OLS}:

V(\hat{\beta}_{OLS}) = E_X\left[V(\hat{\beta}_{OLS} \mid X)\right] + V\left(E(\hat{\beta}_{OLS} \mid X)\right)
= E_X\left[V(\hat{\beta}_{OLS} \mid X)\right]
= \sigma^2 E_X\left[(X^t X)^{-1}\right].
Proposition 9. The unbiased ordinary least squares estimator of \sigma^2 is:

s^2 = \frac{\|Y - X\hat{\beta}_{OLS}\|^2}{n - k}

where k is the number of explanatory variables.
Proof: We get:

(n-k)\,E\left[s^2 \mid X\right] = E\left[u^t M_X u \mid X\right]
= E\left[\mathrm{Tr}(M_X u u^t) \mid X\right]
= \mathrm{Tr}\left(M_X E\left[u u^t \mid X\right]\right)
= \sigma^2 \mathrm{Tr}(M_X).

The result follows.
Finally, we study the efficiency properties.
Theorem 8. Consider the conditional static multiple linear regression model:

Y = X\beta_0 + u

where E(u_i \mid X) = 0 and V(u_i \mid X) = \sigma^2 for all i, Y is an n-dimensional
vector and X is an n \times k matrix of rank k. The ordinary least squares
estimator of \beta_0, defined by:

\hat{\beta}_{OLS} = (X^t X)^{-1} X^t Y,

is the best estimator in the class of linear (in Y) unbiased estimators of \beta_0.
Its (conditional) variance is:

V(\hat{\beta}_{OLS} \mid X) = \sigma^2 (X^t X)^{-1}.

Moreover,

s^2 = \frac{Y^t M_X Y}{n - k} = \frac{\hat{u}^t \hat{u}}{n - k}

is the best quadratic unbiased estimator of \sigma^2.
Summary
Proposition 10. The unbiasedness results for the ordinary least squares
estimators of \beta and \sigma^2 and the Gauss-Markov theorem hold whether or not the
matrix X is considered as random.
2.2 Parametric model
Instead of defining only the first two moments of the error terms, we now
assume that the error terms are normally distributed (parametric model). In this
case, the exact (as opposed to asymptotic) distribution of \hat{\beta}_{OLS} and s^2 can be
derived.
Fixed regressors
Proposition 11. Consider the multiple linear regression model:

Y = X\beta + u

where u \sim \mathcal{N}(0, \sigma^2 I_n) and X is a matrix of fixed regressors with \mathrm{rk}(X) = k.
Then \hat{\beta}_{OLS} and \frac{(n-k)s^2}{\sigma^2} are distributed as follows:

\hat{\beta}_{OLS} \sim \mathcal{N}\left(\beta, \sigma^2 (X^t X)^{-1}\right)

\frac{(n-k)s^2}{\sigma^2} \sim \chi^2(n-k).

Moreover, \hat{\beta}_{OLS} and \frac{(n-k)s^2}{\sigma^2} are independent.
Proof:

1. Since Y = X\beta + u, we get Y \sim \mathcal{N}(X\beta, \sigma^2 I_n). Moreover, \hat{\beta}_{OLS} =
(X^t X)^{-1} X^t Y is a linear transformation of Y, so \hat{\beta}_{OLS} is normally
distributed. Therefore, we just need to characterize the first two moments,
E[\hat{\beta}_{OLS}] and V[\hat{\beta}_{OLS}]. It follows that:

\hat{\beta}_{OLS} \sim \mathcal{N}\left(\beta, \sigma^2 (X^t X)^{-1}\right).
2. As shown before, \hat{u} = M_X Y. Therefore, E[\hat{u}] = E[M_X Y] = M_X E[Y] =
M_X X\beta = 0 (since M_X X = 0) and V[\hat{u}] = \sigma^2 M_X. It follows that
M_X Y \sim \mathcal{N}(0, \sigma^2 M_X), which implies \frac{M_X Y}{\sigma} \sim \mathcal{N}(0, M_X). Since M_X is
idempotent, this can be rewritten as

\left\|\frac{M_X Y}{\sigma}\right\|^2 \sim \chi^2(\mathrm{rk}(M_X)),

which is equivalent to \frac{\|M_X Y\|^2}{\sigma^2} \sim \chi^2(n-k), i.e. \frac{(n-k)s^2}{\sigma^2} \sim \chi^2(n-k).
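The \chi^2(n-k) claim can be checked by simulation (my own sketch; n = 15 and k = 4 are arbitrary), comparing the first two moments of (n-k)s^2/\sigma^2 with those of a \chi^2(n-k) distribution (mean n-k, variance 2(n-k)).

```python
import numpy as np

rng = np.random.default_rng(7)
n, k, reps, sigma2 = 15, 4, 100_000, 1.0
X = rng.normal(size=(n, k))
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T   # M_X, idempotent of rank n-k

U = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
q = np.einsum('ri,ij,rj->r', U, M, U) / sigma2     # (n-k) s^2 / sigma^2

assert abs(q.mean() - (n - k)) < 0.1               # chi-square mean: n-k
assert abs(q.var() - 2 * (n - k)) < 0.5            # chi-square variance: 2(n-k)
```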
Finally, the efficiency of \hat{\beta}_{OLS} can be established using maximum likelihood
theory (see further). One has the following proposition.
Proposition 12. Consider the multiple linear regression model:

Y = X\beta + u

where u \sim \mathcal{N}(0, \sigma^2 I_n) and X is a matrix of fixed regressors with \mathrm{rk}(X) = k.

1. The ordinary least squares estimator of \beta is efficient: its variance-
covariance matrix equals the inverse of the Fisher information matrix.

2. The unbiased ordinary least squares estimator of \sigma^2 is not efficient.
There exists no best quadratic unbiased estimator of \sigma^2 which is efficient.
Stochastic regressors

When the regressors are stochastic and u \mid X \sim \mathcal{N}(0, \sigma^2 I_n), \hat{\beta}_{OLS} is
distributed as follows:

\hat{\beta}_{OLS} \mid X \sim \mathcal{N}\left(\beta, \sigma^2 (X^t X)^{-1}\right).