
Economics G6411

Marcelo J. Moreira
Fall 2011
Columbia University
Lecture 16: Linear Regression Models
In Economics we are often interested in assessing how much of the value of one random variable can be explained by the values of other variables. A commonly chosen way to do so is through the estimation of a linear model. We start by considering a rather simple model of this sort.
Bivariate Linear Regression
Consider the relation between random variables X and Y in a bivariate population. Assume that:

1. $y_i = \beta x_i + u_i$
2. $E(u_i) = 0$
3. $\{u_i : i = 1, \ldots, n\}$ is a set of mutually independent random variables.
4. $x_i$ is deterministic, for every $i = 1, \ldots, n$.
5. $u_i \overset{iid}{\sim} N(0, 1)$

The first of the above assumptions determines that variable X is the independent variable, and Y is the dependent variable. We do not know the value of the parameter $\beta$, so we are interested in estimating it. One way to do so is by using a method already known by us: maximum likelihood estimation. From the last assumption of the model, $u_i \overset{iid}{\sim} N(0,1)$, we get $f(y; \beta)$, the joint pdf of y, and $g_i(y_i; \beta)$, the pdf of $y_i$, for $i = 1, \ldots, n$:

$$f(y; \beta) = \prod_{i=1}^{n} g_i(y_i; \beta) = \prod_{i=1}^{n} (2\pi)^{-\frac{1}{2}}\exp\left[-\frac{(y_i - \beta x_i)^2}{2}\right] = (2\pi)^{-\frac{n}{2}}\exp\left[-\sum_{i=1}^{n}\frac{(y_i - \beta x_i)^2}{2}\right]$$
The maximum likelihood estimator of $\beta$ is then given by:

$$\hat{\beta} = \arg\max_{\beta}\left[-\frac{n}{2}\ln(2\pi) - \frac{1}{2}\sum_{i=1}^{n}(y_i - \beta x_i)^2\right] = \arg\min_{\beta}\sum_{i=1}^{n}(y_i - \beta x_i)^2 = \frac{\sum x_iy_i}{\sum x_i^2}$$
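As a quick numerical illustration of this formula, the following minimal Python sketch (simulated data; the sample size and the true value of $\beta$ are arbitrary choices) computes $\hat{\beta} = \sum x_iy_i/\sum x_i^2$ directly:

```python
import numpy as np

# Minimal sketch: simulate the bivariate model y_i = beta*x_i + u_i with u_i ~ N(0, 1)
# (the true beta, sample size, and x values below are arbitrary assumptions).
rng = np.random.default_rng(0)
n, beta = 200, 1.5
x = rng.normal(size=n)          # treated as fixed once drawn
u = rng.standard_normal(n)
y = beta * x + u

# ML/OLS estimator without intercept: beta_hat = sum(x_i y_i) / sum(x_i^2)
beta_hat = np.sum(x * y) / np.sum(x ** 2)
print(beta_hat)
```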
Since we know the distribution of $u_i$, we could use small sample results to test hypotheses. If $u_i$ is normally distributed, so is $\hat{\beta}$. However, we can choose a more general approach, one that uses asymptotic results to make inferences concerning the unknown parameter. For that, we will need the following theorem:

Result 1 (Lindeberg–Feller CLT) For each n, let $X_{n1}, \ldots, X_{nn}$ be independent random variables such that:

1. $E[X_{nt}] = 0$
2. $\sum_{t=1}^{n} E[X_{nt}^2] = 1$
3. $\lim_{n\to\infty}\sum_{t=1}^{n} E\big[X_{nt}^2\,I(|X_{nt}| > \varepsilon)\big] = 0$ for every $\varepsilon > 0$

Then $\big\{\sum_{t=1}^{n} X_{nt}\big\}_n$ converges in distribution to a standard normally distributed random variable.

Notice that to use the Lindeberg–Feller CLT we do not require the variables to be independent and identically distributed. We only require independence and the validity of the three conditions above. This is a very important theorem, which we will use repeatedly.

Now let us return to our model. We wish to apply the Lindeberg–Feller CLT to obtain the asymptotic distribution of our statistic. To do so, we must check if all the assumptions of the theorem are valid. From the estimator above,

$$\hat{\beta} - \beta = \frac{\sum x_tu_t}{\sum x_t^2} = \sum_{t=1}^{n}\left(\frac{x_t}{\sum x_t^2}\right)u_t$$
Consistent with the notation of the Lindeberg–Feller CLT, we can manipulate our equation to obtain the desired variables $X_{nt}$ and $a_{nt}$:

$$\sqrt{\sum x_t^2}\,(\hat{\beta} - \beta) = \sum_{t=1}^{n}\underbrace{\overbrace{x_t\Big(\sum x_t^2\Big)^{-\frac{1}{2}}}^{a_{nt}}\,u_t}_{X_{nt}} = \sum_{t=1}^{n} X_{nt} = \sum_{t=1}^{n} a_{nt}u_t$$

Now, we check if the required conditions hold:

$$\sum_{t=1}^{n} a_{nt}^2 = \sum_{t=1}^{n}\left[x_t\Big(\sum x_t^2\Big)^{-\frac{1}{2}}\right]^2 = \frac{\sum x_t^2}{\sum x_t^2} = 1$$

$$\sum_{t=1}^{n} E[X_{nt}^2] = \sum_{t=1}^{n} a_{nt}^2\,E[u_t^2] = 1 \quad (1)$$

$$E[X_{nt}] = E[a_{nt}u_t] = a_{nt}E[u_t] = 0 \quad (2)$$
Assuming that $\lim_{n\to\infty}\sum_{t=1}^{n} E[X_{nt}^2] = \sigma^2$, our final condition,

$$\lim_{n\to\infty}\sum_{t=1}^{n} E\big[a_{nt}^2u_t^2\,I(|a_{nt}u_t| > \varepsilon)\big] = 0 \quad (3)$$

follows from an application of the dominated convergence theorem. Equations (1), (2) and (3) guarantee that we can apply the Lindeberg–Feller theorem to obtain the asymptotic distribution of our statistic of interest.
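As an illustration of this asymptotic approximation, one can simulate the standardized statistic $\sqrt{\sum x_t^2}\,(\hat{\beta} - \beta)$ repeatedly and check that it behaves like a standard normal draw even when the errors are not normal. A minimal Monte Carlo sketch (the error distribution, sample size, and number of replications below are arbitrary choices) is:

```python
import numpy as np

# Monte Carlo sketch: the standardized OLS error sqrt(sum x_t^2) * (beta_hat - beta)
# should be approximately N(0, 1) even for non-normal errors with variance 1.
rng = np.random.default_rng(1)
n, beta, reps = 100, 0.7, 5000
x = rng.uniform(0.5, 2.0, size=n)           # fixed regressors (assumption)
scale = np.sqrt(np.sum(x ** 2))

stats = np.empty(reps)
for r in range(reps):
    u = rng.uniform(-np.sqrt(3), np.sqrt(3), size=n)   # mean 0, variance 1, non-normal
    y = beta * x + u
    beta_hat = np.sum(x * y) / np.sum(x ** 2)
    stats[r] = scale * (beta_hat - beta)

print(np.mean(stats), np.var(stats))   # should be close to 0 and 1
```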
Multivariate Linear Regression
In the previous section, we studied a model with only one explanatory variable X. However, we can consider the bivariate model as a particular case of the multivariate model, and seek results valid for every linear model with any given number k of explanatory variables, in a multivariate framework. Given n random variables $Y_1, \ldots, Y_n$, a multiple linear regression model stipulates a dependence relation between these random variables and k explanatory variables. For each of these random variables $Y_i$, $i = 1, \ldots, n$, there is a corresponding k-dimensional vector of explanatory variables $x_i$, which we assume is related to the random variable in the following manner:

$$y_i = x_{i1}\beta_1 + \cdots + x_{ik}\beta_k + u_i = x_i'\beta + u_i, \qquad i = 1, \ldots, n$$

$$y = X\beta + u \quad (4)$$
where the matrix of values of the explanatory variables, X, is given by:
$$\underset{n\times k}{X} = \begin{pmatrix}x_{11} & x_{12} & \ldots & x_{1k}\\ \vdots & \vdots & \ddots & \vdots\\ x_{n1} & x_{n2} & \ldots & x_{nk}\end{pmatrix} = \begin{pmatrix}x_1'\\ \vdots\\ x_n'\end{pmatrix}$$
and the other matrices are:
$$y = \begin{pmatrix}y_1\\ \vdots\\ y_n\end{pmatrix}, \qquad x_i = \begin{pmatrix}x_{i1}\\ \vdots\\ x_{ik}\end{pmatrix}, \qquad \beta = \begin{pmatrix}\beta_1\\ \vdots\\ \beta_k\end{pmatrix}, \qquad u = \begin{pmatrix}u_1\\ \vdots\\ u_n\end{pmatrix}$$
Notice that equation (4) is simply the multivariate version of the equation shown in the first assumption of our previous bivariate model. The classical multiple regression model has four basic assumptions:

1. $E(y) = X\beta$
2. X is nonstochastic
3. $V(y) = \sigma^2 I_n$
4. X has rank k

For any given sample $(y, X)$, the parameters $\beta$ and $\sigma^2$ are unknown and we are interested in estimating them. To do so, we can start from equation (4) and choose $\hat{\beta}$ to minimize the sum of the squared residuals $u'u = (y - X\beta)'(y - X\beta)$:

$$\hat{\beta} = \arg\min_{\beta}\frac{1}{2n}\sum_{i}(y_i - x_i'\beta)^2 = \arg\min_{\beta}\frac{1}{2n}(y - X\beta)'(y - X\beta)$$
This estimator is called the Ordinary Least Squares (OLS) estimator. Regarding the objective function, notice that:

$$(y - X\beta)'(y - X\beta) = y'y - y'X\beta - \beta'X'y + \beta'X'X\beta = y'y - 2y'X\beta + \beta'X'X\beta$$

The first-order condition is:

$$\frac{\partial}{\partial\beta}\left[\frac{1}{2n}(y - X\beta)'(y - X\beta)\right] = 0 \;\Longleftrightarrow\; X'y = X'X\hat{\beta}$$

If the matrix $X'X$ has full rank, it is invertible and we can explicitly express $\hat{\beta}$ as:

$$\hat{\beta} = (X'X)^{-1}X'y \quad (5)$$
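Equation (5) translates directly into a few lines of linear algebra. The sketch below (simulated data; in practice one solves the normal equations rather than inverting $X'X$ explicitly) illustrates it and compares the result with a standard least-squares routine:

```python
import numpy as np

# Sketch of equation (5): beta_hat = (X'X)^{-1} X'y, computed here by solving
# the normal equations X'X b = X'y instead of forming the inverse explicitly.
rng = np.random.default_rng(2)
n, k = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])   # includes an intercept
beta = np.array([1.0, 0.5, -2.0])
y = X @ beta + rng.standard_normal(n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                               # close to beta
print(np.linalg.lstsq(X, y, rcond=None)[0])   # same answer from a library routine
```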
Example 1 Consider the linear model below:

$$y_i = \beta_1 + i\,\beta_2 + u_i = \begin{pmatrix}1 & i\end{pmatrix}\begin{pmatrix}\beta_1\\ \beta_2\end{pmatrix} + u_i = x_i'\beta + u_i$$

where

$$\underset{2\times 1}{x_i} = \begin{pmatrix}1\\ i\end{pmatrix}, \qquad \underset{2\times 1}{\beta} = \begin{pmatrix}\beta_1\\ \beta_2\end{pmatrix}, \qquad \underset{n\times 2}{X} = \begin{pmatrix}1 & 1\\ \vdots & \vdots\\ 1 & n\end{pmatrix}$$

To find the ordinary least squares estimators, we apply equation (5) using the above matrices. So, we have that:

$$X'X = \begin{pmatrix}1 & \ldots & 1\\ 1 & \ldots & n\end{pmatrix}\begin{pmatrix}1 & 1\\ \vdots & \vdots\\ 1 & n\end{pmatrix} = \begin{pmatrix}n & \sum i\\ \sum i & \sum i^2\end{pmatrix}$$

$$(X'X)^{-1} = \begin{pmatrix}n & \frac{1}{2}n(n+1)\\ \frac{1}{2}n(n+1) & \frac{1}{6}n(n+1)(2n+1)\end{pmatrix}^{-1}$$
Therefore, our estimator of $\beta$ is given by:

$$\hat{\beta} = \frac{1}{\det(X'X)}\begin{pmatrix}\frac{1}{6}n(n+1)(2n+1) & -\frac{1}{2}n(n+1)\\ -\frac{1}{2}n(n+1) & n\end{pmatrix}\begin{pmatrix}\sum y_i\\ \sum i\,y_i\end{pmatrix} = \frac{1}{\det(X'X)}\begin{pmatrix}\frac{1}{6}n^2(n+1)(2n+1)\,\frac{1}{n}\sum y_i - \frac{1}{2}n^2(n+1)\,\frac{1}{n}\sum i\,y_i\\[4pt] -\frac{1}{2}n^2(n+1)\,\frac{1}{n}\sum y_i + n^2\,\frac{1}{n}\sum i\,y_i\end{pmatrix} = \begin{pmatrix}\hat{\beta}_1\\ \hat{\beta}_2\end{pmatrix}$$

where the determinant of $X'X$ is

$$\det(X'X) = \frac{n^2(n+1)(2n+1)}{6} - \frac{n^2(n+1)^2}{4}$$

The OLS estimator has some important properties that make it interesting. For example, it is easy to see that it is an unbiased estimator of $\beta$:

$$E(\hat{\beta}) = E\big[(X'X)^{-1}X'y\big] = (X'X)^{-1}X'E(y) = (X'X)^{-1}X'X\beta = \beta$$

In addition to that, under the assumptions of the classical regression model, the variance of $\hat{\beta}$ is:

$$V(\hat{\beta}) = (X'X)^{-1}X'V(y)X(X'X)^{-1} = (X'X)^{-1}X'\sigma^2 I_nX(X'X)^{-1} = \sigma^2(X'X)^{-1}X'X(X'X)^{-1}$$

$$V(\hat{\beta}) = \sigma^2(X'X)^{-1} \quad (6)$$

However, $\sigma^2$ is not observed. And if we wish to estimate $V(\hat{\beta})$ consistently, we must find a consistent estimator of $\sigma^2$. To do so, let us define the two following auxiliary matrices:

$$N = X(X'X)^{-1}X' \quad (7)$$
$$M = I_n - N = I_n - X(X'X)^{-1}X' \quad (8)$$

Using these matrices, we can write the fitted-value vector as:

$$\hat{y} = Ny = \underbrace{X(X'X)^{-1}X'y}_{X\hat{\beta}} = \underset{n\times k}{\begin{pmatrix}x_{11} & x_{12} & \ldots & x_{1k}\\ \vdots & \vdots & \ddots & \vdots\\ x_{n1} & x_{n2} & \ldots & x_{nk}\end{pmatrix}}\;\underset{k\times 1}{\begin{pmatrix}\hat{\beta}_1\\ \vdots\\ \hat{\beta}_k\end{pmatrix}}$$

and the vector of residuals as:

$$e = y - \hat{y} = My = (I_n - N)y = \underset{n\times 1}{\begin{pmatrix}y_1 - \sum_{j=1}^{k}x_{1j}\hat{\beta}_j\\ \vdots\\ y_n - \sum_{j=1}^{k}x_{nj}\hat{\beta}_j\end{pmatrix}}$$
Additionally, these matrices have the following properties (a small numerical check is sketched right after this list):

1. $N' = \big(X(X'X)^{-1}X'\big)' = X\big(X(X'X)^{-1}\big)' = X(X'X)^{-1}X' = N$
2. $N'N = X(X'X)^{-1}X'X(X'X)^{-1}X' = X(X'X)^{-1}X' = N$
3. $M' = (I - N)' = I' - N' = I - N = M$
4. $M'M = I - N - N' + N'N = I - N - N + N = I - N = M$
5. $M'N = (I - N)N = N - N'N = N - N = 0$
6. $MX = (I_n - N)X = X - X(X'X)^{-1}X'X = X - X = 0$
7. $My = M(X\beta + u) = 0 + Mu$
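A small numerical check of these properties, for an arbitrary simulated design matrix, can look like the following sketch:

```python
import numpy as np

# Numerical check of the projection-matrix properties of N and M (arbitrary X).
rng = np.random.default_rng(3)
n, k = 20, 4
X = rng.normal(size=(n, k))

N = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - N

print(np.allclose(N, N.T), np.allclose(N @ N, N))   # N symmetric and idempotent
print(np.allclose(M, M.T), np.allclose(M @ M, M))   # M symmetric and idempotent
print(np.allclose(M @ N, 0), np.allclose(M @ X, 0)) # MN = 0 and MX = 0
print(np.isclose(np.trace(N), k))                   # tr(N) = k
```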
Now we can return to our consistent estimation of $\sigma^2$. Applying the above properties, we can compute the expected value of the random variable $e'e$
(the sum of squared residuals), which we use in the estimation of $\sigma^2$:

$$E(e'e) = E\big[\mathrm{tr}(ee')\big] = \mathrm{tr}\big[E(ee')\big] = \mathrm{tr}\big[E(Myy'M')\big] = \mathrm{tr}\big[M\,E(uu')\,M'\big] = \mathrm{tr}\big[M\sigma^2I_nM'\big] = \sigma^2\,\mathrm{tr}(M) = \sigma^2\,\mathrm{tr}(I_n - N) = \sigma^2\big[\mathrm{tr}(I_n) - \mathrm{tr}(N)\big] = \sigma^2(n - k)$$

where the last equality follows from:

$$\mathrm{tr}(N) = \mathrm{tr}\big[X(X'X)^{-1}X'\big] = \mathrm{tr}\big[(X'X)^{-1}X'X\big] = \mathrm{tr}(I_k) = k$$

Therefore, we can define the adjusted mean squared residual $\hat{\sigma}^2$ as:

$$\hat{\sigma}^2 = \frac{e'e}{n - k} \quad (9)$$

which gives us an estimator of $\sigma^2$ that is unbiased:

$$E(\hat{\sigma}^2) = \frac{E(e'e)}{n - k} = \sigma^2$$
Now that we have a consistent estimator of $\sigma^2$, given by equation (9), we can use it to estimate the variance matrix from equation (6):

$$\hat{V}(\hat{\beta}) = \hat{\sigma}^2(X'X)^{-1}$$

Since X is assumed to be nonstochastic and $\hat{\sigma}^2$ is unbiased, our estimator for the variance matrix is also unbiased.
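Putting equations (5), (9) and the estimated variance matrix together, a minimal sketch with simulated data (the true parameter values are arbitrary choices) is:

```python
import numpy as np

# Sketch: OLS coefficients, the unbiased variance estimate sigma2_hat = e'e/(n-k),
# and the estimated covariance matrix sigma2_hat * (X'X)^{-1}.
rng = np.random.default_rng(4)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([2.0, -1.0, 0.5])
sigma = 1.3
y = X @ beta + sigma * rng.standard_normal(n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e = y - X @ beta_hat
sigma2_hat = e @ e / (n - k)
V_hat = sigma2_hat * XtX_inv

print(beta_hat)
print(sigma2_hat)                  # close to sigma**2
print(np.sqrt(np.diag(V_hat)))     # standard errors
```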
Result 2 (Gauss–Markov Theorem) In the framework of the classical regression model, the vector of OLS coefficients $\hat{\beta}$ is the minimum variance linear unbiased estimator of the parameter vector $\beta$.

Proof: Any linear estimator of $\beta$ can be written as $\tilde{\beta} = \tilde{A}y$, where $\tilde{A}$ is a $k\times n$ nonstochastic matrix. Since we also want the estimator to be unbiased, we must have:

$$E(\tilde{\beta}) = \tilde{A}E(y) = \tilde{A}X\beta = \beta \;\Longleftrightarrow\; \tilde{A}X = I_k$$
Define the matrices $A = (X'X)^{-1}X'$ and $D = \tilde{A} - A$. It follows that:

$$\tilde{A}X = (A + D)X = I$$

But since the OLS estimator of $\beta$, denoted by $\hat{\beta}$, is also unbiased, we have that $AX = I_k$. Therefore, $DX = 0$. The variance of $\tilde{\beta}$ is given by:

$$V(\tilde{\beta}) = \tilde{A}\,V(y)\,\tilde{A}' = (A + D)\,\sigma^2I_n\,(A + D)' = \sigma^2\big[AA' + AD' + DA' + DD'\big]$$

But notice that:

$$AD' = (DA')' = \big(\underbrace{DX}_{=0}(X'X)^{-1}\big)' = 0$$

Therefore:

$$V(\tilde{\beta}) = \sigma^2\big[AA' + DD'\big] = V(\hat{\beta}) + \sigma^2DD' \;\Longrightarrow\; V(\tilde{\beta}) - V(\hat{\beta}) = \sigma^2DD'$$

Since $\sigma^2$ is a positive scalar and $DD'$ is a positive semi-definite matrix, we have that $V(\tilde{\beta}) \ge V(\hat{\beta})$. $\square$
This important result concerning the OLS estimator of the regression coefficients tells us that within the class of linear unbiased estimators of $\beta$, we will not find any other estimator with greater precision. It is a result applicable not only to the coefficients themselves, but also to linear combinations of them. Suppose we are interested in the estimation of a parameter $\gamma$, such that:

$$\gamma = \underset{1\times k}{\lambda'}\;\underset{k\times 1}{\beta}$$

For any given $\lambda$, it seems reasonable to use as an estimator $\hat{\gamma} = \lambda'\hat{\beta}$, where $\hat{\beta}$ is the OLS estimator that by now we are already familiar with. As a matter of fact, if we consider any other estimator of the type $\tilde{\gamma} = \lambda'\tilde{\beta}$, we would have that:

$$V(\tilde{\gamma}) = \lambda'V(\tilde{\beta})\lambda = \lambda'\big[V(\hat{\beta}) + \sigma^2DD'\big]\lambda = \underbrace{\lambda'V(\hat{\beta})\lambda}_{V(\hat{\gamma})} + \sigma^2\lambda'DD'\lambda$$
which brings us to:

$$V(\tilde{\gamma}) - V(\hat{\gamma}) = \sigma^2\lambda'DD'\lambda \ge 0$$

where $DD'$ is a positive semi-definite matrix and $\sigma^2$ a positive scalar, which confirms our optimality result.
Other interesting properties of the OLS estimators are shown in the following examples.

Example 2 Consider the standard linear equation:

$$y = X\beta + u$$

Suppose we subtract the $n\times 1$ vector $X\delta$ from both sides, obtaining a new regression equation:

$$\underbrace{y - X\delta}_{\tilde{y}} = X\underbrace{(\beta - \delta)}_{\beta^*} + u \;\Longleftrightarrow\; \tilde{y} = X\beta^* + u$$

How does this affect our OLS estimator? Notice that the OLS estimator of our newly defined coefficient $\beta^*$ is given by:

$$\hat{\beta}^* = (X'X)^{-1}X'\tilde{y} = (X'X)^{-1}X'(y - X\delta) = (X'X)^{-1}X'y - (X'X)^{-1}X'X\delta = \hat{\beta} - \delta$$

which gives us a rather intuitive result.
Example 3 Once again, consider the linear model:

$$y = X\beta + u = \begin{pmatrix}X_1 & X_2\end{pmatrix}\begin{pmatrix}\beta_1\\ \beta_2\end{pmatrix} + u = X_1\beta_1 + X_2\beta_2 + u$$

But now suppose that $\beta_2$ is known by us. How can we properly estimate $\beta_1$? One way to do that is by defining a new model, in which we subtract $X_2\beta_2$ from both sides of the old equation:

$$y - X_2\beta_2 = \tilde{y} = X_1\beta_1 + u$$
We can then form a vector composed both from our OLS estimates of $\beta_1$ and from the known value of $\beta_2$:

$$\hat{\beta} = \begin{pmatrix}\hat{\beta}_1\\ \beta_2\end{pmatrix}, \qquad\text{where}\qquad \hat{\beta}_1 = (X_1'X_1)^{-1}X_1'\tilde{y}$$
Long and Short Regressions
Given k explanatory variables, we can create a partition of the matrix X to regress y on only the first $k_1$ explanatory variables and compare the OLS coefficients ($b_1$) of the regression with the short list of variables with the OLS coefficients of the regression with the longer list of explanatory variables ($\hat{\beta}_1$).

$$X = \begin{pmatrix}\underset{n\times k_1}{X_1} & \underset{n\times(k-k_1)}{X_2}\end{pmatrix}$$

Long regression: the regression of y on all k variables, which is what we have been doing so far.

$$y = X\beta + u = \begin{pmatrix}X_1 & X_2\end{pmatrix}\begin{pmatrix}\beta_1\\ \beta_2\end{pmatrix} + u = X_1\hat{\beta}_1 + X_2\hat{\beta}_2 + e$$

Short regression: the regression of y on a smaller number $k_1 < k$ of explanatory variables.

$$y = X_1b_1 + e_1$$
The OLS estimator of the coefficients from the short regression is given by:

$$b_1 = (X_1'X_1)^{-1}X_1'y = (X_1'X_1)^{-1}X_1'\big(X_1\hat{\beta}_1 + X_2\hat{\beta}_2 + e\big) = \hat{\beta}_1 + (X_1'X_1)^{-1}X_1'X_2\,\hat{\beta}_2 \quad (10)$$

where the term involving $e$ vanishes because the long-regression residuals are orthogonal to the columns of $X_1$.
From equation (10) we have that the OLS estimator $b_1$ of the short regression will be equal to the OLS estimator $\hat{\beta}_1$ of the long regression if and only if one of two conditions holds (a numerical illustration is sketched after this list):

1. $\hat{\beta}_2 = 0$;
2. $X_1'X_2 = 0$. The matrix $(X_1'X_1)^{-1}X_1'X_2$ contains in each column $j$ the $k_1$ estimated coefficients of the regression of the variable in the $j$-th column of $X_2$ on $X_1$. If these coefficients are equal to zero, the variables in $X_1$ are orthogonal to those in $X_2$.
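The following sketch (an illustration with simulated, deliberately correlated regressors) compares the short-regression coefficients $b_1$ with the long-regression coefficients $\hat{\beta}_1$ and verifies the identity in equation (10):

```python
import numpy as np

# Sketch of equation (10): b1 (short regression) equals beta1_hat (long regression)
# plus (X1'X1)^{-1} X1'X2 beta2_hat.  The design below is an arbitrary simulation.
rng = np.random.default_rng(5)
n, k1, k2 = 300, 2, 1
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = (0.6 * X1[:, 1] + rng.normal(size=n)).reshape(n, k2)   # correlated with X1
X = np.hstack([X1, X2])
y = X @ np.array([1.0, 2.0, -1.5]) + rng.standard_normal(n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)       # long regression
b1 = np.linalg.solve(X1.T @ X1, X1.T @ y)          # short regression

rhs = beta_hat[:k1] + np.linalg.solve(X1.T @ X1, X1.T @ X2) @ beta_hat[k1:]
print(b1)
print(rhs)            # identical to b1, up to rounding
```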
The conditions under which we have equality between short and long regression estimators become more clear when we look at the whole matrix X. By now we already know that the OLS estimator $\hat{\beta}$ is given by:

$$\hat{\beta} = (X'X)^{-1}X'y = \big[(X_1, X_2)'(X_1, X_2)\big]^{-1}(X_1, X_2)'y$$

With some matrix algebra, we get to:

$$\hat{\beta} = \begin{pmatrix}X_1'X_1 & X_1'X_2\\ X_2'X_1 & X_2'X_2\end{pmatrix}^{-1}\begin{pmatrix}X_1'y\\ X_2'y\end{pmatrix} = \begin{pmatrix}X_1'X_1 & 0\\ 0 & X_2'X_2\end{pmatrix}^{-1}\begin{pmatrix}X_1'y\\ X_2'y\end{pmatrix} = \begin{pmatrix}(X_1'X_1)^{-1} & 0\\ 0 & (X_2'X_2)^{-1}\end{pmatrix}\begin{pmatrix}X_1'y\\ X_2'y\end{pmatrix}$$

Notice that we used the assumption of matrix orthogonality ($X_1'X_2 = 0$) to transform the matrix $X'X$ into a block-diagonal matrix. Therefore, it follows that:

$$\hat{\beta} = \begin{pmatrix}\hat{\beta}_1\\ \hat{\beta}_2\end{pmatrix} = \begin{pmatrix}(X_1'X_1)^{-1}X_1'y\\ (X_2'X_2)^{-1}X_2'y\end{pmatrix} = \begin{pmatrix}b_1\\ b_2\end{pmatrix}$$
A different way to write the OLS estimator of $\beta$, using our partition of X, is by defining the two following matrices:

$$N_i = X_i(X_i'X_i)^{-1}X_i' \quad (11)$$

$$M_i = I - N_i \quad (12)$$

where i is the index of the short regression. Using these newly defined matrices, we can manipulate the regression equation in a very useful manner:

$$\begin{aligned}
y &= X\beta + u\\
&= X_1\beta_1 + X_2\beta_2 + u\\
&= X_1\beta_1 + (M_1X_2 + N_1X_2)\beta_2 + u\\
&= X_1\big(\beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2\big) + M_1X_2\beta_2 + u\\
&= X_1\big(\beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2\big) + X_2^*\beta_2 + u
\end{aligned}$$
where $X_2^* = M_1X_2$. In addition, notice that:

$$M_1X_1 = \big(I - X_1(X_1'X_1)^{-1}X_1'\big)X_1 = 0$$

So, we have that:

$$X_2^{*\prime}X_1 = X_2'M_1X_1 = 0$$

We wish to show that:

$$b_2^* = \big(X_2^{*\prime}X_2^*\big)^{-1}X_2^{*\prime}y = \hat{\beta}_2$$

Indeed, it follows that:

$$b_2^* = \big(X_2^{*\prime}X_2^*\big)^{-1}X_2^{*\prime}\big(X_1\hat{\beta}_1 + X_2\hat{\beta}_2 + e\big) = \underbrace{\big(X_2^{*\prime}X_2^*\big)^{-1}X_2^{*\prime}X_1\hat{\beta}_1}_{=0} + \big(X_2^{*\prime}X_2^*\big)^{-1}X_2^{*\prime}X_2\hat{\beta}_2 + \underbrace{\big(X_2^{*\prime}X_2^*\big)^{-1}X_2^{*\prime}e}_{=0} = \hat{\beta}_2$$

because $M_1M = (I - N_1)(I - N) = I - N_1 - N + N_1N = I - N = M$.
Analogously, $b_1^* = \big(X_1^{*\prime}X_1^*\big)^{-1}X_1^{*\prime}y = \hat{\beta}_1$, where $X_1^* = M_2X_1$. Using the partition of X that we have established, we can write the variance of the OLS estimators as:

$$V\begin{pmatrix}\hat{\beta}_1\\ \hat{\beta}_2\end{pmatrix} = \sigma^2\begin{pmatrix}X_1'X_1 & X_1'X_2\\ X_2'X_1 & X_2'X_2\end{pmatrix}^{-1}$$

It is also possible to write the variance of a subvector of the OLS estimator as a function of $X_2^*$:

$$V(\hat{\beta}_2) = \sigma^2\big(X_2^{*\prime}X_2^*\big)^{-1}X_2^{*\prime}\,I\,X_2^*\big(X_2^{*\prime}X_2^*\big)^{-1} = \sigma^2\big(X_2^{*\prime}X_2^*\big)^{-1}$$
Let us denote by $e_1 = M_1y$ the residual vector of the short regression. Therefore, we have that:

$$e_1 = M_1\big(X_1\hat{\beta}_1 + X_2\hat{\beta}_2 + e\big) = e + M_1X_2\hat{\beta}_2 = e + X_2^*\hat{\beta}_2$$

It follows that

$$e_1'e_1 = e'e + \hat{\beta}_2'X_2^{*\prime}X_2^*\hat{\beta}_2 \quad (13)$$
Consequently, the sum of squared residuals of the long regression cannot exceed the sum of squared residuals of the short regression. So, we cannot improve the fit by shortening the list of explanatory variables.

Example 4 Let us consider the matrix representation of the bivariate regression model, with an intercept:

$$\underset{n\times 1}{y} = \underset{n\times 1}{1_n}\,\underset{1\times 1}{\beta_1} + \underset{1\times 1}{\beta_2}\,\underset{n\times 1}{X_2} + \underset{n\times 1}{u}$$

For this particular model, our partition will give us the matrix $X_1 = 1_n$ and the $n\times 1$ matrix $X_2$ with the observations of the explanatory variable.

$$y = X_1\beta_1 + X_2\beta_2 + u, \qquad X_1 = \begin{pmatrix}1\\ \vdots\\ 1\end{pmatrix}, \qquad X = (X_1, X_2)$$
The OLS estimator for the long regression is given by:

$$\hat{\beta} = \begin{pmatrix}\hat{\beta}_1\\ \hat{\beta}_2\end{pmatrix} = (X'X)^{-1}X'y = \begin{pmatrix}1_n'1_n & 1_n'X_2\\ X_2'1_n & X_2'X_2\end{pmatrix}^{-1}\begin{pmatrix}1_n'y\\ X_2'y\end{pmatrix} = \begin{pmatrix}n & \sum_i x_{2i}\\ \sum_i x_{2i} & \sum_i x_{2i}^2\end{pmatrix}^{-1}\begin{pmatrix}\sum_i y_i\\ \sum_i x_{2i}y_i\end{pmatrix}$$

$$= \frac{1}{n\sum_i x_{2i}^2 - \left(\sum_i x_{2i}\right)^2}\begin{pmatrix}\sum_i x_{2i}^2 & -\sum_i x_{2i}\\ -\sum_i x_{2i} & n\end{pmatrix}\begin{pmatrix}\sum_i y_i\\ \sum_i x_{2i}y_i\end{pmatrix}$$

Since we are particularly interested in $\beta_2$, we have that:

$$\hat{\beta}_2 = \frac{n\sum x_{2i}y_i - \sum x_{2i}\sum y_i}{n\sum_i x_{2i}^2 - \left(\sum x_{2i}\right)^2} = \frac{\overline{x_2y} - \bar{x}_2\,\bar{y}}{\overline{x_2^2} - \bar{x}_2^2} \quad (14)$$
where the last equality we get by dividing both the numerator and the denominator by $n^2$. The alternative formula for the OLS estimator of $\beta_2$ is:

$$b_2^* = \big(X_2^{*\prime}X_2^*\big)^{-1}X_2^{*\prime}y$$

where

$$X_2^* = M_1X_2 = \big(I - 1_n(1_n'1_n)^{-1}1_n'\big)X_2 = X_2 - \bar{x}_2\,1_n$$

$$X_2^{*\prime}X_2^* = \sum x_{2i}^2 - n^{-1}\Big(\sum x_{2i}\Big)^2, \qquad X_2^{*\prime}y = \sum x_{2i}y_i - n^{-1}\sum x_{2i}\sum y_i$$

So, it follows that:

$$b_2^* = \frac{\sum x_{2i}y_i - \bar{x}_2\sum y_i}{\sum_i x_{2i}^2 - \bar{x}_2\sum x_{2i}} = \frac{\overline{x_2y} - \bar{x}_2\,\bar{y}}{\overline{x_2^2} - \bar{x}_2^2} \quad (15)$$

where the last equality comes from dividing both the denominator and the numerator by n. Notice the equivalence between formulas (14) and (15). As expected, we obtained the same expression for $\hat{\beta}_2$ and $b_2^*$.
Inference in Linear Regression Models
Once we have estimated the parameter values, we might be interested in turning to hypothesis testing. If we know the distribution of the residuals, we can use small sample results to test the null hypothesis. However, a more general approach, one that does not make assumptions concerning the distribution of the residuals, consists in using the asymptotic distribution of the test statistics to draw a conclusion about $H_0$. We start by constructing our test statistic from the model equation:

$$y = X\beta + u \;\Longleftrightarrow\; y - X\beta = u$$

If we premultiply both sides by $(X'X)^{-1}X'$, we have:

$$\underbrace{(X'X)^{-1}X'y}_{\hat{\beta}} - (X'X)^{-1}X'X\beta = \hat{\beta} - \beta = (X'X)^{-1}X'u \quad (16)$$

Or, equivalently:

$$\sqrt{n}(\hat{\beta} - \beta) = \left(\frac{1}{n}X'X\right)^{-1}\frac{1}{\sqrt{n}}X'u = \left(\frac{1}{n}X'X\right)^{-\frac{1}{2}}(X'X)^{-\frac{1}{2}}X'u$$
To be able to use asymptotic results, we need to make an important assumption concerning our $k\times k$ matrix $X'X$:

$$\frac{1}{n}X'X \xrightarrow{p} B \quad (17)$$

where $B$ is a positive definite matrix. If this assumption is valid, we can apply a multivariate version of the Lindeberg–Feller Central Limit Theorem to obtain:

$$\frac{1}{\sqrt{n}}X'u \xrightarrow{d} N(0, \sigma^2 B) \quad (18)$$
We can combine results (17) and (18) to get to:

$$\left(\frac{1}{n}X'X\right)^{-\frac{1}{2}}\frac{1}{\sqrt{n}}X'u \xrightarrow{d} B^{-\frac{1}{2}}\,N(0, \sigma^2 B)$$

which gives us the result we wanted:

$$(X'X)^{-\frac{1}{2}}X'u \xrightarrow{d} N(0, \sigma^2 I_k) \quad (19)$$

And if result (19) is valid, the Continuous Mapping Theorem assures us that:

$$\frac{u'X(X'X)^{-\frac{1}{2}}(X'X)^{-\frac{1}{2}}X'u}{\sigma^2} \xrightarrow{d} \chi^2(k) \quad (20)$$

In addition to that, we can substitute equation (16) into (19) and (20) to obtain:

$$\frac{(\hat{\beta} - \beta)'(X'X)(\hat{\beta} - \beta)}{\sigma^2} = (\hat{\beta} - \beta)'\,V(\hat{\beta})^{-1}(\hat{\beta} - \beta) \xrightarrow{d} \chi^2(k)$$

And since under some fairly general assumptions $\hat{\sigma}^2$ is a consistent estimator of $\sigma^2$, we have that:

$$\hat{\sigma}^2\left(\frac{1}{n}X'X\right)^{-1} \xrightarrow{p} \sigma^2 B^{-1}$$

Therefore, it follows that:

$$\frac{(\hat{\beta} - \beta)'(X'X)(\hat{\beta} - \beta)}{\hat{\sigma}^2} \xrightarrow{d} \chi^2(k)$$
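As an illustration, the following sketch computes this Wald-type statistic and its asymptotic p-value on simulated data generated under $H_0$ (the design and the hypothesized $\beta_0$ are arbitrary choices; scipy is used only for the chi-square tail probability):

```python
import numpy as np
from scipy import stats

# Sketch: Wald statistic (beta_hat - beta0)' (X'X) (beta_hat - beta0) / sigma2_hat,
# compared with a chi-square(k) distribution.  Data and beta0 are assumptions.
rng = np.random.default_rng(6)
n, k = 400, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta0 = np.array([1.0, 0.0, 0.5])                 # hypothesized value of beta
y = X @ beta0 + rng.standard_normal(n)            # generated under H0

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
sigma2_hat = e @ e / (n - k)

diff = beta_hat - beta0
wald = diff @ (X.T @ X) @ diff / sigma2_hat
p_value = stats.chi2.sf(wald, df=k)
print(wald, p_value)
```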
Furthermore, if our test does not concern the entire vector $\beta$ but only some part of it, say $\beta_2$, it is easy to derive asymptotic results for it, by premultiplying our test statistic by a specific matrix:

$$\sqrt{n}(\hat{\beta}_2 - \beta_2) = \sqrt{n}\begin{pmatrix}0 & 0\\ 0 & I_{k_2}\end{pmatrix}(\hat{\beta} - \beta)$$

which gives us:

$$\sqrt{n}(b_2^* - \beta_2) = \left(\frac{1}{n}X_2^{*\prime}X_2^*\right)^{-\frac{1}{2}}\big(X_2^{*\prime}X_2^*\big)^{-\frac{1}{2}}X_2^{*\prime}u$$
$$\big(X_2^{*\prime}X_2^*\big)^{-\frac{1}{2}}X_2^{*\prime}u \xrightarrow{d} N(0, \sigma^2 I_{k_2})$$

$$\frac{u'X_2^*\big(X_2^{*\prime}X_2^*\big)^{-\frac{1}{2}}\big(X_2^{*\prime}X_2^*\big)^{-\frac{1}{2}}X_2^{*\prime}u}{\sigma^2} \xrightarrow{d} \chi^2(k_2)$$

Or, equivalently:

$$\frac{(b_2^* - \beta_2)'X_2^{*\prime}X_2^*(b_2^* - \beta_2)}{\sigma^2} \xrightarrow{d} \chi^2(k_2), \qquad \frac{(b_2^* - \beta_2)'X_2^{*\prime}X_2^*(b_2^* - \beta_2)}{\hat{\sigma}^2} \xrightarrow{d} \chi^2(k_2)$$
These newly obtained asymptotic results can be used in hypothesis testing. For example, if we wish to test $H_0: \beta_2 = \beta_{2,0}$, we use:

$$\frac{(b_2^* - \beta_{2,0})'X_2^{*\prime}X_2^*(b_2^* - \beta_{2,0})}{\hat{\sigma}^2} = \left(\frac{e'e}{n-k}\right)^{-1}\big(e_1'e_1 - e'e\big) \xrightarrow{d} \chi^2(k_2)$$

where $e_1 = M_1\tilde{y}$ are the residuals of the regression of the modified dependent variable $\tilde{y} = y - X_2\beta_{2,0}$ on $X_1$. For the particular case in which $\beta_{2,0} = 0$, we have that:

$$\frac{b_2^{*\prime}X_2^{*\prime}X_2^*\,b_2^*}{\hat{\sigma}^2} = \left(\frac{e'e}{n-k}\right)^{-1}\big(e_1'e_1 - e'e\big) \xrightarrow{d} \chi^2(k_2)$$
where the above equality follows from:

$$e_1 = M_1y = M_1\big(X_1\hat{\beta}_1 + X_2\hat{\beta}_2 + e\big) = e + X_2^*\,b_2^*$$

$$e_1'e_1 = e'e + b_2^{*\prime}X_2^{*\prime}X_2^*\,b_2^*$$
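The sum-of-squared-residuals form of the statistic is convenient in practice: run the short and the long regressions and compare their residuals. A minimal sketch (simulated data in which $H_0: \beta_2 = 0$ holds) is:

```python
import numpy as np
from scipy import stats

# Sketch: test H0: beta_2 = 0 using (e1'e1 - e'e) / sigma2_hat ~ chi-square(k2),
# where e1 are short-regression and e long-regression residuals (simulated data).
rng = np.random.default_rng(7)
n, k1, k2 = 300, 2, 2
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = rng.normal(size=(n, k2))
X = np.hstack([X1, X2])
y = X1 @ np.array([1.0, -0.5]) + rng.standard_normal(n)   # beta_2 = 0 holds

def residuals(Z, y):
    # OLS residuals from regressing y on Z
    coef = np.linalg.solve(Z.T @ Z, Z.T @ y)
    return y - Z @ coef

e = residuals(X, y)              # long regression
e1 = residuals(X1, y)            # short regression
sigma2_hat = e @ e / (n - k1 - k2)

statistic = (e1 @ e1 - e @ e) / sigma2_hat
print(statistic, stats.chi2.sf(statistic, df=k2))
```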
These important asymptotic results can be used to test the value of any linear function of the parameters of the model. Suppose that $H_0$ consists of $p$ different hypotheses concerning linear combinations of the elements of $\beta$. These hypotheses can be summarized by the $p\times k$ matrix $H$, so that the hypothesis we wish to test can be written as:

$$H_0: \underset{p\times k}{H}\;\underset{k\times 1}{\beta} = \underset{p\times 1}{\beta_0}$$
such that H has rank p. We can then create a larger matrix F, formed by H as a submatrix, together with another submatrix L, in the following manner:

$$F = \begin{pmatrix}\underset{(k-p)\times k}{L}\\[4pt] \underset{p\times k}{H}\end{pmatrix}, \qquad F\beta = \begin{pmatrix}L\beta\\ H\beta\end{pmatrix} = \begin{pmatrix}\delta_1\\ \delta_2\end{pmatrix}$$

Because we wish to test $H_0$, it is important that the submatrix L is such that F is an invertible matrix, so we can modify our model to become:

$$y = XF^{-1}F\beta + u$$

In a previous example, we were interested in testing the hypothesis that the last $k_2$ of the $k$ elements of $\beta$ are equal to 0. To test this null hypothesis, we can use the following matrix H:

$$H = \begin{pmatrix}\underset{k_2\times k_1}{0} & \underset{k_2\times k_2}{I_{k_2}}\end{pmatrix} = \begin{pmatrix}0 & 0 & \ldots & 0 & 1 & 0 & \ldots & 0\\ 0 & 0 & \ldots & 0 & 0 & 1 & \ldots & 0\\ \vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \ldots & 0 & 0 & 0 & \ldots & 1\end{pmatrix}$$

Typically, we divide the matrix H into two submatrices, $H_1$ and $H_2$:

$$H = \begin{pmatrix}\underset{p\times(k-p)}{H_1} & \underset{p\times p}{H_2}\end{pmatrix}$$

$H_2$ must be invertible, so that we can compute the inverse matrix of F:

$$F = \begin{pmatrix}I_{k-p} & 0\\ H_1 & H_2\end{pmatrix}, \qquad F^{-1}F = \begin{pmatrix}I_{k-p} & 0\\ -H_2^{-1}H_1 & H_2^{-1}\end{pmatrix}\begin{pmatrix}I_{k-p} & 0\\ H_1 & H_2\end{pmatrix} = \begin{pmatrix}I_{k-p} & 0\\ 0 & I_p\end{pmatrix}$$
Starting from these matrices, we can adapt our regression model:

$$H_0: H\beta = \beta_0 \;\Longleftrightarrow\; H_0: \delta_2 = \beta_0$$

$$\underbrace{y - Z_2\beta_0}_{\tilde{y}} = Z_1\delta_1 + Z_2\underbrace{(\delta_2 - \beta_0)}_{\delta_2^*} + u$$

where $Z = XF^{-1} = (Z_1, Z_2)$ and $\delta = F\beta$. The last step is to test the new null hypothesis, $H_0: \delta_2^* = 0$, which is something we already know how to do.
Example 5 Consider a multiple regression model with three explanatory variables (X is an $n\times 3$ matrix):

$$y_i = \beta_1X_{1i} + \beta_2X_{2i} + \beta_3X_{3i} + u_i, \qquad \beta = \begin{pmatrix}\beta_1\\ \beta_2\\ \beta_3\end{pmatrix}\in\mathbb{R}^3$$

Suppose we are interested in testing the following null hypothesis:

$$H_0: \beta_1 + \beta_2 + \beta_3 = 0 \;\Longleftrightarrow\; H_0: 1_3'\beta = 0$$

To test this hypothesis we use the $1\times 3$ matrix $H = 1_3'$. And since we are interested only in H, we can choose L in a way such that F is invertible in a simple manner:
$$F = \begin{pmatrix}L\\ H\end{pmatrix} = \begin{pmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 1 & 1 & 1\end{pmatrix}$$
Relaxing the assumptions of the classical model
Until now, we have studied the linear regression model under the classical assumptions. One of these assumptions is the nonstochastic nature of the explanatory variables. Now we leave it aside and let X be a random matrix. This model starts from the following assumptions:

1. $E(y|X) = X\beta$
2. X is stochastic
3. $V(y|X) = \sigma^2 I_n$
4. X has rank k.
It is easy to see that the OLS estimator remains unbiased if X is random:

$$E(\hat{\beta}) = E_X\big[E(\hat{\beta}|X)\big] = E_X\big[(X'X)^{-1}X'E(y|X)\big] = E_X\big[(X'X)^{-1}X'X\beta\big] = \beta$$

The variance of our estimator, however, does change under this new assumption. From the conditional variance identity, we have that:

$$V(\hat{\beta}) = E_X\big[V(\hat{\beta}|X)\big] + \underbrace{V_X\big[E(\hat{\beta}|X)\big]}_{=0} \quad (21)$$

where the conditional variance is given by:

$$V(\hat{\beta}|X) = V\big((X'X)^{-1}X'y \,\big|\, X\big) = (X'X)^{-1}X'V(y|X)X(X'X)^{-1}$$

If $V(y|X) = V(u|X) = \sigma^2 I_n$, we have that:

$$V(\hat{\beta}|X) = (X'X)^{-1}X'\sigma^2I_nX(X'X)^{-1} = \sigma^2(X'X)^{-1}$$
Substituting this result into equation (21), we get to:

$$V(\hat{\beta}) = \sigma^2\,E\big[(X'X)^{-1}\big], \qquad V\big(\sqrt{n}(\hat{\beta} - \beta)\big) = \sigma^2\,E\left[\left(\frac{1}{n}X'X\right)^{-1}\right]$$
On the one hand, since we did not give any details about the distribution of the $x_i$'s, we cannot say more about the variance of $\sqrt{n}(\hat{\beta} - \beta)$ in small samples.

On the other hand, notice that to get to this new formula for the variance of our OLS estimator we made a crucial hypothesis concerning the variance of u. We maintained the classical assumption that $V(u) = \sigma^2 I_n$. Nonetheless, if we assume a different variance matrix for u, we move further away from our initial model, but make our results more general. Suppose that:

$$V(u|X) = \Sigma(\theta) \neq \sigma^2 I_n, \qquad \theta\in\mathbb{R}^d$$

where $\Sigma(\theta)$ is a positive definite matrix.

We start with the pure heteroskedasticity case. Assume that the variance matrix of u takes the following form:

$$V(u|X) = \begin{pmatrix}\sigma_1^2 & 0 & \ldots & 0\\ 0 & \sigma_2^2 & \ldots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \ldots & \sigma_n^2\end{pmatrix}$$
Then the variance matrix of our estimator of $\beta$, conditional on the observed value of X, is given by:

$$V(\hat{\beta}|X) = (X'X)^{-1}X'\begin{pmatrix}\sigma_1^2 & \ldots & 0\\ \vdots & \ddots & \vdots\\ 0 & \ldots & \sigma_n^2\end{pmatrix}X(X'X)^{-1} = \left(\sum x_ix_i'\right)^{-1}\left(\sum x_ix_i'\sigma_i^2\right)\left(\sum x_ix_i'\right)^{-1}$$

It follows that:

$$V\big(\sqrt{n}(\hat{\beta} - \beta)\,\big|\,X\big) = \left(\frac{\sum x_ix_i'}{n}\right)^{-1}\frac{\sum x_ix_i'\sigma_i^2}{n}\left(\frac{\sum x_ix_i'}{n}\right)^{-1} \quad (22)$$

$$\sqrt{n}(\hat{\beta} - \beta) \xrightarrow{d} N\left(0,\ \big(E(x_ix_i')\big)^{-1}\Big[\lim_{n\to\infty}\frac{1}{n}\sum x_ix_i'\sigma_i^2\Big]\big(E(x_ix_i')\big)^{-1}\right)$$
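Expression (22) is the population analogue of the heteroskedasticity-robust ("sandwich") variance estimator in which $\sigma_i^2$ is replaced by the squared OLS residual $\hat{u}_i^2$ (this estimator is often labelled HC0 in the literature; the data below are simulated assumptions). A sketch:

```python
import numpy as np

# Sketch of a heteroskedasticity-robust (sandwich) variance estimate for beta_hat:
# (X'X)^{-1} (sum_i x_i x_i' u_hat_i^2) (X'X)^{-1}, with simulated heteroskedastic errors.
rng = np.random.default_rng(8)
n, k = 500, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])
u = rng.standard_normal(n) * (0.5 + np.abs(X[:, 1]))   # variance depends on x_i
y = X @ beta + u

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
u_hat = y - X @ beta_hat

meat = (X * u_hat[:, None] ** 2).T @ X        # sum_i x_i x_i' * u_hat_i^2
V_robust = XtX_inv @ meat @ XtX_inv
V_classic = (u_hat @ u_hat / (n - k)) * XtX_inv

print(np.sqrt(np.diag(V_robust)))    # robust standard errors
print(np.sqrt(np.diag(V_classic)))   # classical standard errors, for comparison
```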
For the more general case, we have the following conditional variance matrix:

$$V(u|X) = \begin{pmatrix}\sigma_{11} & \ldots & \sigma_{1n}\\ \vdots & \ddots & \vdots\\ \sigma_{n1} & \ldots & \sigma_{nn}\end{pmatrix}$$

which leads to

$$\sqrt{n}(\hat{\beta} - \beta) = \left(\frac{1}{n}X'X\right)^{-1}\frac{1}{\sqrt{n}}X'u = \left(\frac{1}{n}\sum x_ix_i'\right)^{-1}\frac{1}{\sqrt{n}}\sum x_iu_i$$

and the following result for the variance:

$$V\big(\sqrt{n}(\hat{\beta} - \beta)\big) = E\left[\left(\frac{1}{n}\sum x_ix_i'\right)^{-1}\frac{1}{n}\sum_i\sum_j x_ix_j'\sigma_{ij}\left(\frac{1}{n}\sum x_ix_i'\right)^{-1}\right]$$
Notice that although we have a new variance matrix $\Sigma$, our least squares estimator remains unbiased:

$$E(\hat{\beta}|X) = E\big((X'X)^{-1}X'y \,\big|\, X\big) = (X'X)^{-1}X'E(y|X) = (X'X)^{-1}X'X\beta = \beta$$

The variance of $\hat{\beta}$, however, does not remain the same:

$$V(\hat{\beta}|X) = (X'X)^{-1}X'V(u|X)X(X'X)^{-1} = (X'X)^{-1}X'\Sigma(\theta)X(X'X)^{-1}$$
In addition to that, the new variance matrix $\Sigma(\theta)$ affects our asymptotic results, especially if the asymptotic variance of the score statistic differs from the limit of the Hessian matrix. To see that, let us repeat the process we have previously gone through to find the asymptotic distribution of the OLS estimator. Starting from our usual objective function:

$$Q_n(\beta) = \frac{1}{2n}(y - X\beta)'(y - X\beta)$$

we obtain the score statistic and the Hessian matrix:

$$S_n(\beta) = -\frac{\partial Q_n(\beta)}{\partial\beta} = \frac{1}{n}X'(y - X\beta)$$
$$H_n(\beta) = \frac{\partial^2 Q_n(\beta)}{\partial\beta\,\partial\beta'} = \frac{1}{n}X'X$$

Once again, we make an important assumption concerning $X'X$:

$$H_n(\beta) = \frac{1}{n}X'X \xrightarrow{p} B, \qquad\text{where } B \text{ is a positive definite matrix.}$$

Similarly to before, we can use the Lindeberg–Feller CLT:

$$\sqrt{n}\,S_n(\beta) = \frac{1}{\sqrt{n}}X'u = \frac{1}{\sqrt{n}}\sum x_iu_i \xrightarrow{d} N(0, A),$$

where $A = \operatorname{plim}\frac{1}{n}X'\Sigma X$. So it follows that:

$$\sqrt{n}(\hat{\beta} - \beta) \xrightarrow{d} N(0, B^{-1}AB^{-1})$$

In detail:

$$0 = S_n(\hat{\beta}) \approx S_n(\beta) - H_n(\beta)(\hat{\beta} - \beta) \;\Longrightarrow\; \sqrt{n}(\hat{\beta} - \beta) \approx H_n(\beta)^{-1}\sqrt{n}\,S_n(\beta)$$

We can then conclude that the OLS estimator will probably be asymptotically inefficient, unless B is proportional to A.
It may be useful to compare the OLS estimator to the ML estimator when we do not have homoskedasticity. Consider the joint probability density function of y:

$$f(y; \beta, \Sigma) = f(y|X; \beta, \Sigma)\,g(X)$$

$$\ln f(y; \beta, \Sigma) = \ln f(y|X; \beta, \Sigma) + \ln g(X)$$

The vector $\beta$ that maximizes the pdf $f(y; \beta, \Sigma)$ also maximizes the conditional function $f(y|X; \beta, \Sigma)$. If we assume that $u|X \sim N(0, \Sigma)$, it follows that $y|X \sim N(X\beta, \Sigma)$. So we have that:

$$f(y|X; \beta, \Sigma) = (2\pi)^{-\frac{n}{2}}\det(\Sigma)^{-\frac{1}{2}}\exp\left[-\frac{1}{2}(y - X\beta)'\Sigma^{-1}(y - X\beta)\right]$$

So, the maximum likelihood estimator will be given by:

$$\hat{\beta}_{ml} = \arg\min_{\beta}\ \frac{1}{2n}(y - X\beta)'\Sigma^{-1}(y - X\beta)$$
This optimization problem, which renders the ML estimator when the residuals are normally distributed, is identical to the optimization problem that gives us the Generalized Least Squares (GLS) estimator:

$$\hat{\beta}_{gls} = \arg\min_{\beta}\ \frac{1}{2n}(y - X\beta)'\Sigma^{-1}(y - X\beta)$$

which gives us the following first-order condition:

$$\frac{\partial \ln f(y|X; \beta, \Sigma)}{\partial\beta} = \frac{1}{n}X'\Sigma^{-1}\big(y - X\hat{\beta}_{gls}\big) = 0$$

Under certain conditions, we can obtain the GLS estimator:

$$\hat{\beta}_{gls} = \big(X'\Sigma^{-1}X\big)^{-1}X'\Sigma^{-1}y$$
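As an illustration of this formula, the following sketch computes the GLS estimator for a known diagonal $\Sigma$ (pure heteroskedasticity) and compares it with OLS; the error variances used as weights are assumptions of the simulation:

```python
import numpy as np

# Sketch: GLS with a known diagonal Sigma (pure heteroskedasticity),
# beta_gls = (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y, compared with OLS.
rng = np.random.default_rng(9)
n, k = 400, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([0.5, 1.0])
sig2 = 0.2 + rng.uniform(size=n)          # known error variances (assumption)
u = rng.standard_normal(n) * np.sqrt(sig2)
y = X @ beta + u

Sigma_inv = np.diag(1.0 / sig2)
beta_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_gls)
print(beta_ols)
```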
Under the new assumption concerning the variance matrix, the GLS estimator is still unbiased:

$$E(\hat{\beta}_{gls}|X) = \big(X'\Sigma^{-1}X\big)^{-1}X'\Sigma^{-1}E(y|X) = \big(X'\Sigma^{-1}X\big)^{-1}X'\Sigma^{-1}X\beta = \beta$$

As expected, however, the variance of the estimator is altered by the new variance matrix of the residuals:

$$V(\hat{\beta}_{gls}|X) = \big(X'\Sigma^{-1}X\big)^{-1}X'\Sigma^{-1}\Sigma\,\Sigma^{-1}X\big(X'\Sigma^{-1}X\big)^{-1} = \big(X'\Sigma^{-1}X\big)^{-1} \quad (23)$$

Remember that the variance of the OLS estimator is:

$$V(\hat{\beta}_{ols}|X) = (X'X)^{-1}X'\Sigma X(X'X)^{-1} \quad (24)$$

Typically, these variance matrices do not need to be equal. However, for the particular case in which $\Sigma = \sigma^2 I_n$, we have that:

$$\hat{\beta}_{ml} = \big(X'\sigma^{-2}I_nX\big)^{-1}X'\sigma^{-2}I_n\,y = \hat{\beta}_{ols}$$
In addition to that, our GLS estimator recovers the asymptotic optimality properties that were lost by the least squares estimator once we altered the variance of the residuals. First, let us look at the score statistic:

$$S_n(\beta) = \frac{1}{n}X'\Sigma^{-1}(y - X\beta)$$

$$\sqrt{n}\,S_n(\beta) = \frac{1}{\sqrt{n}}X'\Sigma^{-1}(y - X\beta) \xrightarrow{d} N(0, A)$$

$$H_n(\beta) = -\frac{\partial S_n(\beta)}{\partial\beta'} = \frac{1}{n}X'\Sigma^{-1}X \xrightarrow{p} A$$

where $A = \operatorname{plim}\frac{1}{n}X'\Sigma^{-1}X$. So we have that:

$$\sqrt{n}\big(\hat{\beta}_{gls} - \beta\big) \approx H_n(\beta)^{-1}\sqrt{n}\,S_n(\beta) \xrightarrow{d} N(0, A^{-1})$$
The idea behind the GLS estimator is to transform the model in a way that the variance of the adjusted residuals is proportional to $I_n$, so that we can apply the Gauss–Markov theorem. In other words, we seek an optimality result that does not depend on the normality of the residuals, obtained by restricting the class of estimators at which we look. To do so, let us premultiply the regression equation by $\Sigma^{-\frac{1}{2}}$:

$$\underbrace{\Sigma^{-\frac{1}{2}}y}_{y^*} = \underbrace{\Sigma^{-\frac{1}{2}}X}_{X^*}\,\beta + \underbrace{\Sigma^{-\frac{1}{2}}u}_{u^*}, \qquad V(u|X) = \underset{n\times n}{\Sigma}$$

Since $\Sigma$ is a positive definite matrix, we can write it as:

$$\Sigma = P\Lambda P', \qquad\text{where } P'P = I_n \text{ and } \Lambda \text{ is a diagonal matrix.}$$

$$\Sigma^{-\frac{1}{2}} = P\Lambda^{-\frac{1}{2}}P', \qquad \Sigma^{\frac{1}{2}}\Sigma^{\frac{1}{2}} = P\Lambda^{\frac{1}{2}}P'P\Lambda^{\frac{1}{2}}P' = P\Lambda P' = \Sigma$$

Our new regression equation is:

$$y^* = X^*\beta + u^*$$

For this new modified model, the variance matrix of the residuals becomes:

$$V(u^*) = E(u^*u^{*\prime}) = E\big(\Sigma^{-\frac{1}{2}}uu'\Sigma^{-\frac{1}{2}}\big) = \Sigma^{-\frac{1}{2}}E(uu')\Sigma^{-\frac{1}{2}} = \Sigma^{-\frac{1}{2}}\Sigma\,\Sigma^{-\frac{1}{2}} = I_n$$

Therefore, all the requirements of the classical regression model are met, and we can apply the Gauss–Markov theorem to $y^* = X^*\beta + u^*$:

$$\hat{\beta}^* = \big(X^{*\prime}X^*\big)^{-1}X^{*\prime}y^* = \big(X'\Sigma^{-\frac{1}{2}}\Sigma^{-\frac{1}{2}}X\big)^{-1}X'\Sigma^{-\frac{1}{2}}\Sigma^{-\frac{1}{2}}y = \big(X'\Sigma^{-1}X\big)^{-1}X'\Sigma^{-1}y = \hat{\beta}_{gls}$$
The generalized least squares estimator $\hat{\beta}_{gls}$ is the minimum variance linear unbiased estimator of $\beta$, a result known as Aitken's Theorem.

Although it is an interesting extension of the OLS estimator, the GLS estimator requires knowledge of $\Sigma$. However, we rarely know $\Sigma$, which means that testing hypotheses and constructing confidence intervals is a very complicated matter. What is left to do is to estimate this matrix and substitute $\Sigma$ in the GLS estimator formula with its estimator, $\hat{\Sigma}$. Nonetheless, for estimates of $\Sigma$, the resulting statistic may have a rather complicated distribution in small samples, leaving us no alternative but to focus on its large sample properties. This new estimator of $\beta$, with $\hat{\Sigma}$ instead of $\Sigma$, is called the Feasible Generalized Least Squares (FGLS) estimator:

$$\hat{\beta}_{fgls} = \big(X'\hat{\Sigma}^{-1}X\big)^{-1}X'\hat{\Sigma}^{-1}y = \beta + \big(X'\hat{\Sigma}^{-1}X\big)^{-1}X'\hat{\Sigma}^{-1}u$$
The properties of the FGLS estimator will of course depend on the properties of the estimator of the variance matrix, $\hat{\Sigma}$:

$$\operatorname{plim}\hat{\beta}_{fgls} = \beta + \operatorname{plim}\left[\left(\frac{1}{n}X'\hat{\Sigma}^{-1}X\right)^{-1}\frac{1}{n}X'\hat{\Sigma}^{-1}u\right] = \beta + \left(\operatorname{plim}\frac{1}{n}X'\hat{\Sigma}^{-1}X\right)^{-1}\operatorname{plim}\frac{1}{n}X'\hat{\Sigma}^{-1}u$$

However, if we choose a consistent estimator $\hat{\Sigma}$, under general conditions we can be sure that the FGLS estimator of $\beta$ will have the same asymptotic distribution as the GLS estimator $\hat{\beta}_{gls}$:

$$\sqrt{n}\big(\hat{\beta}_{fgls} - \beta\big) \xrightarrow{d} N(0, A^{-1})$$