Mark Schaffer
Heriot-Watt University

Lecture Notes 1

Autumn 2016

Lecture Outline

1. Motivation

Key concepts

- Unbiased estimator
- Efficient estimator
- Frisch-Waugh-Lovell Theorem
- Moment
- Strict exogeneity
- Unbiased estimator
- Efficient estimator

Motivation

We want to estimate a parameter β and perform inference using it. In econometrics, we have two standard settings for characterising estimators and developing results: "finite sample" and "large sample".

Finite sample setting:

In repeated samples of size n, what is E(β̂)? Is E(β̂) = β, i.e., is our estimator unbiased?

Our estimator β̂ is a statistic - it is a function of the data. In repeated samples of size n, what is the distribution of β̂, and Var(β̂) in particular?

Motivation

Large sample setting:

It turns out to be a lot easier to develop theoretical results for β̂ in the large sample setting, i.e., as n → ∞. We also say this is the asymptotic setting, and we say our results are asymptotic or asymptotically valid.

What is the probability limit of β̂ as n → ∞? Does β̂ converge to β, i.e., is plim_{n→∞} β̂ = β? In other words, is our estimator consistent? (Loosely speaking, does any bias in β̂ disappear as the sample size gets larger and larger?)

Our estimator β̂ is a statistic - it is a function of the data. What is the limiting distribution of √n(β̂ − β) as n → ∞, i.e., what is AVar(β̂)?

Our expression for AVar(β̂) may be infeasible because it depends on parameters we don't know. How do we obtain an estimate of AVar(β̂)?

Motivation

What makes for a good estimator?

Central tendency: unbiasedness (finite-sample setting) or consistency (large-sample setting). In repeated samples, β̂ will be centred around the true β; as the sample size grows, β̂ converges to β.

Variance: efficiency (finite-sample setting) or asymptotic efficiency (large-sample setting). The variance of β̂ is "small".

How to weight these two criteria? What should be our loss function? Might we sometimes prefer to use a biased or inconsistent estimator β̂?

Motivation

Definition: The mean squared error of β̂ is MSE = E[(β̂ − β)²]. MSE is

E[(β̂ − β)²] = E[((β̂ − E(β̂)) + (E(β̂) − β))²]
 = E[(β̂ − E(β̂))² + 2(E(β̂) − β)(β̂ − E(β̂)) + (E(β̂) − β)²]
 = E[(β̂ − E(β̂))²] + E[2(E(β̂) − β)(β̂ − E(β̂))] + E[(E(β̂) − β)²]
 = E[(β̂ − E(β̂))²] + 2(E(β̂) − β)·E(β̂ − E(β̂)) + (E(β̂) − β)²
 = E[(β̂ − E(β̂))²] + 2(E(β̂) − β)·0 + (E(β̂) − β)²
 = Var(β̂) + [bias(β̂)]²

where at various points we have made use of the fact that E(β̂) can be treated as a constant and so E(E(β̂)) = E(β̂).

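The decomposition above is easy to check numerically. A minimal sketch (assuming NumPy is available; the shrinkage factor 0.9 and the simulated data are illustrative choices, not from the notes) computes bias, variance and MSE of a deliberately biased estimator of a population mean:

```python
import numpy as np

rng = np.random.default_rng(42)
mu, n, reps = 2.0, 50, 100_000

# A deliberately biased estimator: shrink the sample mean towards zero.
estimates = np.array([0.9 * rng.normal(mu, 1.0, n).mean() for _ in range(reps)])

bias = estimates.mean() - mu
var = estimates.var()
mse = ((estimates - mu) ** 2).mean()

print(f"bias^2 + var = {bias**2 + var:.6f}")
print(f"MSE          = {mse:.6f}")   # identical up to floating-point error
```

The identity holds exactly in the simulated draws, not just in expectation, because the sample MSE decomposes the same way around the sample mean of the estimates.
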
Motivation

Common pattern in econometrics:

β̂ does not have a finite-sample justification. It's biased, or we just don't know what E(β̂) is, or we can't derive Var(β̂).

β̂ does have a large-sample justification. It's consistent, and we can estimate the asymptotic variance.

Because β̂ is biased in finite samples, we are worried about how it will perform in practice. No one ever actually has an infinite sample.

Sometimes we can derive expressions for the finite-sample bias or otherwise study finite-sample performance theoretically.

Often we do MC (Monte Carlo) exercises to see how the estimator performs. Increasing the sample size in a series of MCs can indicate how quickly the asymptotic justification kicks in. We can compare how different estimators perform in different settings according to their bias, variance and MSE.

This lecture: finite-sample theory for OLS.

Notation

The model: y = Xβ + ε, with n observations and K regressors.

The data in matrix form:

$$y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}, \qquad X = \begin{bmatrix} x_1' \\ \vdots \\ x_n' \end{bmatrix}, \qquad \beta = \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_K \end{bmatrix}, \qquad \varepsilon = \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$

y and ε are n×1, X is n×K with rows xᵢ′, and β is K×1.

Notation for errors, residuals and estimators:

- ε = y − Xβ: the vector of errors, n×1
- β̃: some estimator of β; ẽ ≡ y − Xβ̃: residuals defined by β̃
- β̂: some other estimator of β; ê ≡ y − Xβ̂: residuals defined by β̂
- b: the OLS estimator β̂_OLS (no subscript!); e ≡ y − Xb: residuals defined by the OLS estimator b
- β̂_OLS: the OLS estimator b. Same as above.

Notation for variance estimators:

- σ² ≡ Var(εᵢ|X)
- σ̃²: some estimator of σ²
- σ̂²: some other estimator of σ²
- σ̂²_OLS: the OLS estimator of σ²
- s²: the OLS estimator of σ². Same meaning as σ̂²_OLS.

The OLS estimators:

b = (X′X)⁻¹X′y,  s² = e′e / (n − K)

The OLS estimator minimizes the sum of squared residuals. Equivalently, it minimizes the average squared residual:

(1/n) Σᵢ₌₁ⁿ ẽᵢ² = (1/n) Σᵢ₌₁ⁿ (yᵢ − xᵢ′β̃)² = (1/n)(y − Xβ̃)′(y − Xβ̃)

Multiplying by n does not change the minimizer, so define SSR(β̃) ≡ (y − Xβ̃)′(y − Xβ̃) and

b ≡ arg min_β̃ SSR(β̃)

Expand SSR(β̃):

SSR(β̃) = (y − Xβ̃)′(y − Xβ̃)
 = (y′ − β̃′X′)(y − Xβ̃)
 = y′y − β̃′X′y − y′Xβ̃ + β̃′X′Xβ̃
 = y′y − 2y′Xβ̃ + β̃′X′Xβ̃   (since β̃′X′y is a scalar)   (1.2.2)

SSR(β̃) = y′y − 2y′Xβ̃ + β̃′X′Xβ̃   (1.2.2)

Next, differentiate SSR(β̃) with respect to β̃. Since SSR(β̃) is a scalar and β̃ is a K×1 vector, the result is a K×1 vector of first derivatives (the "gradient"). Note that:

- Since y′y does not depend on β̃, ∂(y′y)/∂β̃ = 0.
- Since ∂(a′β̃)/∂β̃ = a, ∂(−2y′Xβ̃)/∂β̃ = −2X′y. (Note the transposition.)
- Since ∂(β̃′Aβ̃)/∂β̃ = 2Aβ̃ for A symmetric, ∂(β̃′X′Xβ̃)/∂β̃ = 2X′Xβ̃.

Hence:

∂SSR(β̃)/∂β̃ = −2X′y + 2X′Xβ̃

∂SSR(β̃)/∂β̃ = −2X′y + 2X′Xβ̃

The first-order conditions for min_β̃ SSR(β̃) are ∂SSR(β̃)/∂β̃ = 0, so substitute b for β̃ to get −2X′y + 2X′Xb = 0 and rearrange:

X′Xb = X′y   (the normal equations)

Premultiply both sides by (X′X)⁻¹:

(X′X)⁻¹X′Xb = (X′X)⁻¹X′y

b = (X′X)⁻¹X′y   (1.2.5)

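The closed form is easy to compute directly. A minimal sketch (assuming NumPy; the simulated data and true β are illustrative) solving the normal equations and cross-checking against the library least-squares routine:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])  # constant + 2 regressors
beta = np.array([1.0, 0.5, -2.0])
y = X @ beta + rng.normal(size=n)

# Solve the normal equations X'Xb = X'y (numerically preferable to an explicit inverse).
b = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against the library least-squares routine.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b, b_lstsq)  # identical up to floating-point error
```
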
The second derivative (Hessian) is

∂²SSR(β̃)/∂β̃∂β̃′ = 2X′X

And since we've assumed X′X is full rank and hence positive definite, 2X′X is also positive definite and the second-order conditions for a minimization are satisfied. Done!

Some useful scalars and vectors:

e′e = Σᵢ eᵢ² (the SSR at the minimum),  y′y = Σᵢ yᵢ²,  y′e = Σᵢ yᵢeᵢ

and the vector of deviations of y from its sample mean ȳ:

y − ȳ·1 = (y₁ − ȳ, ..., yₙ − ȳ)′

where 1 is the n×1 vector of ones.

The sampling error of b is the difference between the OLS estimator b and the true β:

b − β = (X′X)⁻¹X′y − β
 = (X′X)⁻¹X′(Xβ + ε) − β
 = (X′X)⁻¹X′Xβ + (X′X)⁻¹X′ε − β
 = β + (X′X)⁻¹X′ε − β
 = (X′X)⁻¹X′ε

The projection matrix and the annihilator:

P_X ≡ X(X′X)⁻¹X′.  P_X is n×n.

M_X ≡ Iₙ − X(X′X)⁻¹X′.  M_X is n×n.

Both are symmetric and idempotent; e.g., P_X′ = (X(X′X)⁻¹X′)′ = X(X′X)⁻¹X′ = P_X.

The projection and annihilation matrices make it easy to write some OLS expressions:

ŷ ≡ Xb = X(X′X)⁻¹X′y = P_X y

e ≡ y − ŷ = y − Xb = y − X(X′X)⁻¹X′y = (Iₙ − X(X′X)⁻¹X′)y = M_X y

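A minimal numerical check of these identities (assuming NumPy; X and y reuse the simulated-data pattern from the earlier sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
P = X @ XtX_inv @ X.T          # projection matrix, n x n
M = np.eye(n) - P              # annihilator, n x n

b = XtX_inv @ X.T @ y
assert np.allclose(P @ y, X @ b)          # P_X y = y-hat
assert np.allclose(M @ y, y - X @ b)      # M_X y = residuals e
assert np.allclose(P @ P, P)              # idempotent
assert np.allclose(P, P.T)                # symmetric
assert np.allclose(X.T @ (M @ y), 0)      # residuals orthogonal to X
print("all projection/annihilator identities verified")
```
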
Now partition the regressors into two blocks, X = [X₁ X₂], and partition β conformably into β₁ and β₂. Then:

y = Xβ + ε = [X₁ X₂](β₁′, β₂′)′ + ε = X₁β₁ + X₂β₂ + ε

y = Xb + e = [X₁ X₂](b₁′, b₂′)′ + e = X₁b₁ + X₂b₂ + e

Define the projection and annihilator for X₁ alone: P_{X₁} ≡ X₁(X₁′X₁)⁻¹X₁′ and M_{X₁} ≡ Iₙ − P_{X₁}. Let ỹ ≡ M_{X₁}y be the residuals from regressing y on X₁, and let X̃₂ ≡ M_{X₁}X₂ be the residuals from regressing (each column of) X₂ on X₁.

Frisch-Waugh-Lovell Theorem: Regress the ỹ residuals on the X̃₂ residuals using OLS and you get the same b₂ as when you do OLS using the full set of Xs:

(X̃₂′X̃₂)⁻¹X̃₂′ỹ = b₂

(X̃₂′X̃₂)⁻¹X̃₂′ỹ = b₂

Moreover, the residuals from this partial regression, ẽ₂ ≡ ỹ − X̃₂b₂, are the same as the residuals e from OLS using the full set of Xs.

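A minimal sketch verifying the theorem numerically (assuming NumPy; the partition into X₁ and X₂ and the simulated data are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])   # constant + one regressor
X2 = rng.normal(size=(n, 2))                             # two more regressors
X = np.hstack([X1, X2])
y = X @ np.array([1.0, 0.5, -2.0, 0.3]) + rng.normal(size=n)

# Full regression: b2 is the last two coefficients.
b = np.linalg.solve(X.T @ X, X.T @ y)
b2_full = b[2:]

# FWL: partial X1 out of both y and X2, then regress residuals on residuals.
M1 = np.eye(n) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
y_t, X2_t = M1 @ y, M1 @ X2
b2_fwl = np.linalg.solve(X2_t.T @ X2_t, X2_t.T @ y_t)

print(np.allclose(b2_full, b2_fwl))  # True
```
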
Moments

A "moment" is a measure of the shape of a distribution (the term is borrowed from physics).

If aᵢ and cᵢ are scalar random variables, then the following are population moments:

E(aᵢ),  E(aᵢ²),  E[(aᵢ − E(aᵢ))²],  E(aᵢcᵢ),  E[(aᵢ − E(aᵢ))(cᵢ − E(cᵢ))]

Moments

The corresponding sample moments:

(1/n)Σᵢ₌₁ⁿ aᵢ,  (1/n)Σᵢ₌₁ⁿ aᵢ²,  (1/n)Σᵢ₌₁ⁿ (aᵢ − ā)²,  (1/n)Σᵢ₌₁ⁿ aᵢcᵢ,  (1/n)Σᵢ₌₁ⁿ (aᵢ − ā)(cᵢ − c̄)

Moments

The extension of moments to matrices is straightforward. We illustrate by applying it to our data variables. These sample and population moments appear frequently enough to warrant shorthand symbols.

Population moments:

E(xᵢyᵢ) ≡ σ_xy,  E(xᵢxᵢ′) ≡ Σ_xx

Sample moments:

S_xy ≡ (1/n)Σᵢ₌₁ⁿ xᵢyᵢ = (1/n)X′y,  S_xx ≡ (1/n)Σᵢ₌₁ⁿ xᵢxᵢ′ = (1/n)X′X

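A minimal sketch showing the two equivalent ways of computing the sample moments (assuming NumPy; simulated data for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, K = 500, 3
X = rng.normal(size=(n, K))
y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(size=n)

# Per-observation averages of x_i * y_i and x_i x_i' ...
S_xy = np.mean([X[i] * y[i] for i in range(n)], axis=0)
S_xx = np.mean([np.outer(X[i], X[i]) for i in range(n)], axis=0)

# ... equal the matrix expressions X'y/n and X'X/n.
assert np.allclose(S_xy, X.T @ y / n)
assert np.allclose(S_xx, X.T @ X / n)
print("sample moment identities verified")
```
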
Definition of i.i.d.

Definition of i.i.d.: independently and identically distributed.

Note: we are assuming that the regressors X are random. Sometimes textbooks assume that X is constant or "fixed in repeated samples". This makes the exposition a bit easier but is highly unrealistic, so we follow Hayashi and assume X is random from the beginning.

The sample (y, X) is an i.i.d. random sample (or just "random sample") if {yᵢ, xᵢ} is independently and identically distributed across observations i.

Note: the "identical" in i.i.d. means that the joint distribution of {εᵢ, xᵢ} does not depend on i. We will use this fact later.

Note: Hayashi (p. 12) uses the term "random sample" for an i.i.d. random sample.

Assumption 1.1 (linearity):

yᵢ = xᵢ′β + εᵢ,  i = 1, 2, ...n   (1.1.1)

Assumption 1.2 (strict exogeneity):

E(εᵢ|X) = 0,  i = 1, 2, ...n   (1.1.7)

Assumption 1.4 (spherical error variance):

(a) Conditional homoskedasticity:

E(εᵢ²|X) = σ² > 0,  i = 1, 2, ...n   (1.1.12)

(b) Independence (no correlation between observations):

E(εᵢεⱼ|X) = 0,  ∀ i, j, i ≠ j   (1.1.13)

Assumption 1.2 again: E(εᵢ|X) = 0, i = 1, 2, ...n.

This is also known as the zero conditional mean assumption, but strict exogeneity is perhaps the best term because it emphasizes the key point (conditioning on the entire X).

It is actually two assumptions:

E(εᵢ|X) = μ,  i = 1, 2, ...n

and

μ = 0

The second is not restrictive if the model has a constant term.

To see why, suppose E(εᵢ|X) = μ ∀i, i = 1, 2, ...n, with μ ≠ 0, and the model has a constant term (the first column of the data matrix X is all ones):

yᵢ = β₁ + β₂xᵢ₂ + ... + β_K xᵢK + εᵢ

Add and subtract μ:

yᵢ = (β₁ + μ) + β₂xᵢ₂ + ... + β_K xᵢK + (εᵢ − μ)

And the new error term now has a zero conditional mean.

Much more important....

Strict exogeneity has strong implications. In particular, it implies that the error for every observation is orthogonal to the regressors for all observations:

E(xⱼεᵢ) = 0,  ∀ i, j

This is very restrictive in time-series settings, as the following example shows.

Example: consider the AR(1) model

yᵢ = βyᵢ₋₁ + εᵢ   (1.1.11)

so that yᵢ is the regressor for observation i + 1. Assume the error is orthogonal to the current regressor, E(yᵢ₋₁εᵢ) = 0. Then:

E(yᵢεᵢ) = E[(βyᵢ₋₁ + εᵢ)εᵢ] = βE(yᵢ₋₁εᵢ) + E(εᵢ²) = E(εᵢ²)

So unless the error term is always zero, E(yᵢεᵢ) ≠ 0. But that means we've violated strict exogeneity, because yᵢ is the regressor for observation i + 1. We can assume weak exogeneity but not strict exogeneity.

Fortunately, weak exogeneity is enough for the OLS estimator to have good large-sample properties even in a time series setting. More on this later.

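The finite-sample consequence is easy to see in a small Monte Carlo. A minimal sketch (assuming NumPy; β = 0.9, the sample sizes and the number of replications are arbitrary illustrative choices) showing that OLS in the AR(1) model is biased in finite samples, with the bias shrinking as n grows:

```python
import numpy as np

rng = np.random.default_rng(4)
beta, reps = 0.9, 5_000

for n in (25, 100, 400):
    estimates = np.empty(reps)
    for r in range(reps):
        eps = rng.normal(size=n + 1)
        y = np.empty(n + 1)
        y[0] = eps[0]
        for i in range(1, n + 1):           # y_i = beta * y_{i-1} + eps_i
            y[i] = beta * y[i - 1] + eps[i]
        x, yy = y[:-1], y[1:]               # regress y_i on y_{i-1}
        estimates[r] = (x @ yy) / (x @ x)   # OLS slope, no constant
    print(f"n={n:4d}  mean OLS estimate = {estimates.mean():.4f}  (true beta = {beta})")
```

The mean estimate is noticeably below 0.9 at n = 25 and approaches it as n grows: OLS is biased here but consistent, exactly the "large-sample justification" pattern described earlier.
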
Two related conditions, weaker than strict exogeneity:

E(xᵢεᵢ) = 0_K,  i = 1, 2, ...n   (weak exogeneity; xᵢ is "predetermined")

E(εᵢ|xᵢ) = 0,  i = 1, 2, ...n   (conditional moment restriction)

The conditional moment restriction implies weak exogeneity. More generally, E(εᵢ|xᵢ) = 0 implies that εᵢ is orthogonal to any function f(xᵢ) of the regressors:

E[f(xᵢ)εᵢ] = E[E(f(xᵢ)εᵢ|xᵢ)]   (a)
 = E[f(xᵢ)E(εᵢ|xᵢ)]   (b)
 = E[f(xᵢ)·0]   (c)
 = 0   (d)

Notes:
(a) By the Law of Total Expectations: E[E(A|B)] = E(A).
(b) By the linearity of conditional expectations: since we are conditioning on xᵢ, we can treat any function of xᵢ as nonrandom and move it out of the inner E().
(c) By our conditional moment restriction E(εᵢ|xᵢ) = 0.

Compare the two conditioning sets:

Weak exogeneity ("predetermined"): E(εᵢ|xᵢ) = 0,  i = 1, 2, ...n

Strict exogeneity: E(εᵢ|xⱼ) = 0,  ∀ i, ∀ j

Strict exogeneity conditions the error on the regressors for every observation; weak exogeneity conditions only on the regressors for the same observation.

If the sample is i.i.d., then strict exogeneity

E(εᵢ|X) = 0,  i = 1, 2, ...n   (1.1.7)

reduces to

E(εᵢ|xᵢ) = 0,  i = 1, 2, ...n   (1.1.16)

Don't confuse this result with weak exogeneity! (1.1.16) follows from the i.i.d. assumption for {yᵢ, xᵢ}, which is often very strong. For example, if we are working with time series, the independent variables xᵢ will usually be serially correlated - GDP last year is correlated with GDP this year. If the data are not i.i.d., then we have to include all the x₁, x₂, ...xₙ in the definition of strict exogeneity.

Classic example: the dummy variable trap. Say we have a dataset of male and female individuals. Define the male dummy m as the n×1 column vector of dummies for individuals i = 1, ..., n. Define the female dummy f similarly. Say we also have a constant term, i.e., a variable ι which is just an n×1 column vector of ones:

ι ≡ (1, ..., 1)′   (n×1; another notation for this column vector is 1)

There are no other regressors. Then the matrix of regressors is X = [ι m f]. But it's easy to see that ι = m + f. Thus X is not full rank and the assumption fails.

If we in addition assume:

Optional Extra Hayashi Assumption: I.I.D. random sample:

{yᵢ, xᵢ} is i.i.d.,  i = 1, 2, ...n   (1.1.17)

Given strict exogeneity, the independence assumption

E(εᵢεⱼ|X) = 0,  ∀ i, j, i ≠ j   (1.1.13)

is equivalent to

Cov(εᵢ, εⱼ|X) = 0,  ∀ i, j, i ≠ j

Parts (a) and (b) can be combined into a single expression involving E(εε′|X). Recall that the εε′ matrix is n×n with the εᵢ²s running down the diagonal and the εᵢεⱼs on the off-diagonals:

$$\varepsilon\varepsilon' = \begin{bmatrix}
\varepsilon_1^2 & \cdots & \varepsilon_1\varepsilon_j & \cdots & \varepsilon_1\varepsilon_n \\
\vdots & \ddots & \vdots & & \vdots \\
\varepsilon_i\varepsilon_1 & \cdots & \varepsilon_i\varepsilon_j & \cdots & \varepsilon_i\varepsilon_n \\
\vdots & & \vdots & \ddots & \vdots \\
\varepsilon_n\varepsilon_1 & \cdots & \varepsilon_n\varepsilon_j & \cdots & \varepsilon_n^2
\end{bmatrix}$$

But (a) means the diagonal of E(εε′|X) is σ²s, and (b) means all the off-diagonals of E(εε′|X) are zero. Thus the matrix simplifies hugely. First, the conditional expectation of a matrix is taken element by element:

$$E(\varepsilon\varepsilon'|X) = \begin{bmatrix}
E(\varepsilon_1^2|X) & \cdots & E(\varepsilon_1\varepsilon_j|X) & \cdots & E(\varepsilon_1\varepsilon_n|X) \\
\vdots & \ddots & \vdots & & \vdots \\
E(\varepsilon_i\varepsilon_1|X) & \cdots & E(\varepsilon_i\varepsilon_j|X) & \cdots & E(\varepsilon_i\varepsilon_n|X) \\
\vdots & & \vdots & \ddots & \vdots \\
E(\varepsilon_n\varepsilon_1|X) & \cdots & E(\varepsilon_n\varepsilon_j|X) & \cdots & E(\varepsilon_n^2|X)
\end{bmatrix}$$

And since the diagonals are all σ², and the off-diagonals are all 0s...

$$E(\varepsilon\varepsilon'|X) = \begin{bmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{bmatrix}$$

Hence

E(εε′|X) = σ²Iₙ   (1.1.14)

E(εε′|X) = σ²Iₙ   (1.1.14)

Or, making use of Assumption 1.2 Strict exogeneity, and using the notation Var(ε|X) for the entire variance-covariance matrix of ε, we can write Assumption 1.4 in terms of conditional variances and covariances:

Var(ε|X) = σ²Iₙ

Unbiasedness of OLS:

E(b|X) − β = E(b − β|X)   (a)
 = E((X′X)⁻¹X′ε|X)   (b)
 = (X′X)⁻¹X′E(ε|X)   (c)
 = 0   (d)

Notes:
(a) Since β is a constant.
(b) From the definition of the sampling error of b.
(c) Since we are conditioning on X, we can treat any function of X as nonrandom and move it out of the E().
(d) By Assumption 1.2 Strict exogeneity. This is key. For models where 1.2 fails (such as most time-series models), OLS is biased.

Note we used Assumptions 1.1-1.3 but we did not use Assumption 1.4 (spherical errors: conditional homoskedasticity and independence). Unbiasedness of OLS is robust to violations of this assumption.

Conditional variance of OLS:

Var(b|X) = Var(b − β|X)   (a)
 = Var((X′X)⁻¹X′ε|X)   (b)
 = (X′X)⁻¹X′ Var(ε|X) X(X′X)⁻¹   (c)
 = (X′X)⁻¹X′ σ²Iₙ X(X′X)⁻¹   (d)
 = σ²(X′X)⁻¹   (e)

Notes:
(a) Since β is a constant.
(b) From the definition of the sampling error of b.
(c) Since we are conditioning on X, we can treat any function of X as nonrandom and move it out of the Var(): for nonrandom A, Var(Aε|X) = A Var(ε|X) A′.
(d) By Assumption 1.4.a and 1.4.b (when we combined (a) and (b) into a single statement).
(e) (X′X)⁻¹ and (X′X) cancel (after moving the scalar σ² out of the way).

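A minimal Monte Carlo check of both finite-sample results, holding X fixed across repeated samples (assuming NumPy; the sample size, σ and β are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, sigma, reps = 50, 1.5, 20_000
beta = np.array([1.0, 0.5, -2.0])
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # fixed across replications
XtX_inv = np.linalg.inv(X.T @ X)

bs = np.empty((reps, 3))
for r in range(reps):
    y = X @ beta + sigma * rng.normal(size=n)   # new errors each replication
    bs[r] = XtX_inv @ X.T @ y

print("mean of b:", bs.mean(axis=0))                        # approx beta (unbiasedness)
print("MC variance of b_2:        ", bs[:, 1].var())
print("sigma^2 * (X'X)^-1 [1, 1]: ", sigma**2 * XtX_inv[1, 1])  # approx equal
```
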
Alternatively, we can derive the same result directly from b = (X′X)⁻¹X′y:

Var(b|X) = Var((X′X)⁻¹X′y|X)   (a)
 = (X′X)⁻¹X′ Var(y|X) X(X′X)⁻¹   (b)
 = (X′X)⁻¹X′ Var(ε|X) X(X′X)⁻¹   (c)
 = (X′X)⁻¹X′ σ²Iₙ X(X′X)⁻¹   (d)
 = σ²(X′X)⁻¹   (e)

Notes:
(a) Substitution.
(b) Since we are conditioning on X, we can treat any function of X as nonrandom and move it out of the Var().
(c) From Var(y|X) = Var(ε|X).
(d) By Assumption 1.4.a and 1.4.b.
(e) (X′X)⁻¹ and (X′X) cancel (after moving the scalar σ² out of the way).

Gauss-Markov Theorem: under Assumptions 1.1-1.4, the OLS estimator b is efficient in the class of linear unbiased estimators: for any estimator β̃ that is linear in y and unbiased,

Var(β̃|X) ≥ Var(b|X)

where for matrices, A ≥ B ⟺ (A − B) is positive semidefinite.

Proof sketch: since β̃ is linear in y, we can write β̃ = Cy for some matrix C. Define D ≡ C − (X′X)⁻¹X′, so that

β̃ = (D + (X′X)⁻¹X′)y = Dy + b = DXβ + Dε + b

Taking conditional expectations of this plus the unbiasedness of β̃ implies that DX = 0 (see Hayashi). So β̃ = Dε + b and therefore

β̃ − β = Dε + (b − β)

β̃ − β = Dε + (b − β)

The second term on the right we've seen before - it's the sampling error of b. Substitute and we get

β̃ − β = Dε + (X′X)⁻¹X′ε = (D + (X′X)⁻¹X′)ε

We are now set up to go, because Var(β̃|X) = Var(β̃ − β|X). So

Var(β̃|X) = (D + (X′X)⁻¹X′) Var(ε|X) (D + (X′X)⁻¹X′)′ = σ²(DD′ + (X′X)⁻¹)

using DX = 0 to kill the cross terms. This exceeds Var(b|X) = σ²(X′X)⁻¹ by σ²DD′, which is positive semidefinite. Done!

Estimate of Var(b|X)

The variance of the OLS estimator, Var(b|X) = σ²(X′X)⁻¹, is infeasible - it depends on the unknown "nuisance parameter" σ².

The OLS estimator s² = e′e/(n − K) is feasible - we have everything we need to compute it. Replacing σ² with s² gives a feasible variance estimator:

V̂ar(b|X) = s²(X′X)⁻¹   (1.3.4)

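A minimal sketch computing s², the feasible variance estimator and the implied standard errors (assuming NumPy; data simulated as in the earlier sketches):

```python
import numpy as np

rng = np.random.default_rng(6)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b

s2 = (e @ e) / (n - K)            # s^2 = e'e / (n - K)
V_hat = s2 * XtX_inv              # feasible estimate of Var(b|X)
se = np.sqrt(np.diag(V_hat))      # standard errors of the coefficients

print("b :", b)
print("se:", se)
```
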
To do finite-sample inference we need the distribution of the sampling error

b − β = (X′X)⁻¹X′ε

Since the sampling error is a function of (X, ε), we could specify the joint distribution of (X, ε) and work with that, but that is unattractive - how do we know what the true distribution is?

Instead, we assume normality of the errors conditional on X:

Assumption 1.5 (normality): ε|X ~ N(0, σ²Iₙ)   (1.4.1)

(Just plug the conditional mean and variance into the definition of a Normal random variable - they define a Normal distribution.)

This means the distribution of ε conditional on X doesn't depend on the latter; ε and X are independent. Thus the marginal or unconditional distribution of ε is simply ε ~ N(0, σ²Iₙ).

Since the sampling error

b − β = (X′X)⁻¹X′ε

is linear in ε given X, its conditional distribution is also Normal:

(b − β)|X ~ N(0, σ²(X′X)⁻¹)   (1.4.2)

(b − β)|X ~ N(0, σ²(X′X)⁻¹)

The marginal distribution of the kth element is therefore

(bₖ − βₖ)|X ~ N(0, σ²[(X′X)⁻¹]ₖₖ)

where [(X′X)⁻¹]ₖₖ is the kth diagonal element of (X′X)⁻¹.

Standardizing, we get a test statistic with a known distribution. Under H0: βₖ = β̄ₖ,

zₖ|X ~ N(0, 1),  where zₖ ≡ (bₖ − β̄ₖ) / √(σ²[(X′X)⁻¹]ₖₖ)   (1.4.3)

zₖ|X ~ N(0, 1),  where zₖ ≡ (bₖ − β̄ₖ) / √(σ²[(X′X)⁻¹]ₖₖ)   (1.4.3)

The only problem is ... the test statistic zₖ is infeasible, because it depends on the unknown nuisance parameter σ².

Exactly the same issue arises if we want to construct tests of linear hypotheses.

Example of a joint linear hypothesis: H0: β₂ = 0 and β₃ = 0.

The general form of a set of linear hypotheses:

H0: Rβ = r   (1.4.8)

where R is #r × K and r is #r × 1, with #r the number of restrictions. Require rank(R) = #r, i.e., R is full row rank. Means no redundant equations and no inconsistent equations (see Hayashi p. 40).

We can write any set of linear hypotheses this way.

Examples of H0: Rβ = r (1.4.8), with β = (β₁, β₂, β₃)′:

H0: β₂ = 0 and β₃ = 0:

$$R = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad r = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

H0: β₂ + β₃ = 1 (CRS):

$$R = \begin{bmatrix} 0 & 1 & 1 \end{bmatrix}, \qquad r = 1$$

Consider the quadratic form

W ≡ (Rb − r)′ [σ²R(X′X)⁻¹R′]⁻¹ (Rb − r) = (Rb − r)′ [Var(Rb − r|X)]⁻¹ (Rb − r)

since Var(Rb − r|X) = R Var(b|X) R′ = σ²R(X′X)⁻¹R′.

Under H0: Rβ = r,

W|X ~ χ²(#r),  where W ≡ (Rb − r)′ [σ²R(X′X)⁻¹R′]⁻¹ (Rb − r)

Both test statistics are infeasible because they depend on the unknown σ²:

W ≡ (Rb − r)′ [σ²R(X′X)⁻¹R′]⁻¹ (Rb − r),  zₖ ≡ (bₖ − β̄ₖ) / √(σ²[(X′X)⁻¹]ₖₖ)

What happens if we replace σ² with the feasible OLS estimator s² = e′e/(n − K)?

Under H0: βₖ = β̄ₖ,  zₖ|X ~ N(0, 1),  where zₖ ≡ (bₖ − β̄ₖ) / √(σ²[(X′X)⁻¹]ₖₖ)

If we replace σ² with the OLS estimator s² = e′e/(n − K), we obtain a different, feasible test statistic with a known distribution:

Under H0: βₖ = β̄ₖ,  tₖ|X ~ t(n − K),  where tₖ ≡ (bₖ − β̄ₖ) / √(s²[(X′X)⁻¹]ₖₖ)

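A minimal sketch computing the t-statistic for H0: βₖ = 0 (assuming NumPy and SciPy for the t-distribution p-value; data simulated as before):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = (e @ e) / (n - K)

k = 1                                     # test H0: beta_k = 0
t_k = b[k] / np.sqrt(s2 * XtX_inv[k, k])
p = 2 * stats.t.sf(abs(t_k), df=n - K)    # two-sided p-value, t(n - K)
print(f"t = {t_k:.3f}, p = {p:.4f}")
```
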
Under H0: Rβ = r,  W|X ~ χ²(#r),  where W ≡ (Rb − r)′ [σ²R(X′X)⁻¹R′]⁻¹ (Rb − r)

If we replace σ² with the OLS estimator s² = e′e/(n − K), we can construct a different, feasible test statistic with a known distribution:

Under H0: Rβ = r,  F|X ~ F(#r, n − K),  where

F ≡ (Rb − r)′ [s²R(X′X)⁻¹R′]⁻¹ (Rb − r) / #r

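A minimal sketch of the F-test for H0: β₂ = 0 and β₃ = 0 (assuming NumPy and SciPy; continuing the simulated-data pattern, so the null is false here by construction and the statistic should be large):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = (e @ e) / (n - K)

R = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])          # H0: beta_2 = 0 and beta_3 = 0
r = np.zeros(2)
num_r = R.shape[0]                       # number of restrictions, #r

d = R @ b - r
F = d @ np.linalg.solve(s2 * R @ XtX_inv @ R.T, d) / num_r
p = stats.f.sf(F, num_r, n - K)          # F(#r, n - K) upper tail
print(f"F = {F:.2f}, p = {p:.4g}")
```
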
Unrestricted and restricted least squares:

b ≡ arg min_β̃ SSR(β̃)

b_R ≡ arg min_β̃ SSR(β̃) s.t. Rβ̃ = r

Values of the two objective functions at their minima: SSR_U and SSR_R.

NB: LM (Lagrange Multiplier) Principle: estimate the restricted equation. Then calculate the "reduction in cost" from relaxing the constraints in H0.

Summary: the assumptions used for finite-sample OLS theory:

1.1: Linearity: y = Xβ + ε
1.2: Strict exogeneity: E(εᵢ|X) = 0,  i = 1, 2, ...n
1.3: No multicollinearity: X is full rank with probability 1.
1.4: Spherical error variance: Var(ε|X) = σ²Iₙ
1.5: Normality of ε|X

All of these assumptions (with the possible exceptions of 1.3 and 1.1) are unattractive. We don't want our estimates and inferences to depend heavily on assumptions that we don't believe and that are likely to be violated in reality.

Loosening these assumptions and still obtaining finite-sample results is often difficult or impossible. It is much easier to relax these assumptions in a large-sample setting and rely on asymptotic results.
