
Colin Cameron: Asymptotic Theory for OLS

1. OLS Estimator Properties and Sampling Schemes


1.1. A Roadmap

Consider the OLS model with just one regressor: $y_i = \beta x_i + u_i$. The OLS estimator

$$\hat{\beta} = \left( \sum_{i=1}^N x_i^2 \right)^{-1} \sum_{i=1}^N x_i y_i$$

can be written as

$$\hat{\beta} = \beta + \frac{\frac{1}{N}\sum_{i=1}^N x_i u_i}{\frac{1}{N}\sum_{i=1}^N x_i^2}.$$

Then under assumptions given below (including $E[u_i|x_i] = 0$),

$$\text{plim } \hat{\beta} = \beta + \frac{\text{plim } \frac{1}{N}\sum_{i=1}^N x_i u_i}{\text{plim } \frac{1}{N}\sum_{i=1}^N x_i^2} = \beta + \frac{0}{\text{plim } \frac{1}{N}\sum_{i=1}^N x_i^2} = \beta.$$

It follows that $\hat{\beta}$ is consistent for $\beta$. And under assumptions given below (including $E[u_i|x_i] = 0$ and $V[u_i|x_i] = \sigma^2$),

$$\sqrt{N}(\hat{\beta} - \beta) = \frac{\frac{1}{\sqrt{N}}\sum_{i=1}^N x_i u_i}{\frac{1}{N}\sum_{i=1}^N x_i^2} \overset{d}{\to} N\left[0,\ \sigma^2 \left( \text{plim } \frac{1}{N}\sum_{i=1}^N x_i^2 \right)^{-1} \right].$$

It follows that $\hat{\beta}$ is asymptotically normally distributed with

$$\hat{\beta} \overset{a}{\sim} N\left[\beta,\ \sigma^2 \left( \sum_{i=1}^N x_i^2 \right)^{-1} \right].$$

1.2. Sampling Schemes and Error Assumptions

The key for consistency is obtaining the probability limit of the two averages (of $x_i u_i$ and of $x_i^2$), by use of laws of large numbers (LLN). For asymptotic normality the key is the limit distribution of the average of $x_i u_i$, obtained by a central limit theorem (CLT). Different assumptions about the stochastic properties of $x_i$ and $u_i$ lead to different properties of $x_i^2$ and $x_i u_i$, and hence to different LLNs and CLTs. Sampling scheme assumptions for the data include:

1. Simple random sampling (SRS). SRS is when we randomly draw $(y_i, x_i)$ from the population. Then the $x_i$ are iid, so the $x_i^2$ are iid, and the $x_i u_i$ are iid if the errors $u_i$ are iid.

2. Fixed regressors. This occurs in an experiment where we fix the $x_i$ and observe the resulting random $y_i$. Given $x_i$ fixed and $u_i$ iid, it follows that the $x_i u_i$ are inid (even if the $u_i$ are iid), while the $x_i^2$ are nonstochastic.

3. Exogenous stratified sampling. This occurs when we oversample some values of $x$ and undersample others. Then the $x_i$ are inid, so the $x_i u_i$ are inid (even if the $u_i$ are iid) and the $x_i^2$ are inid.

The simplest results assume the $u_i$ are iid. In practice, for cross-section data the errors may be inid due to conditional heteroskedasticity, with $V[u_i|x_i] = \sigma_i^2$ varying with $i$. A data-generating sketch for each scheme follows.
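The following sketch generates data under each of the three schemes; only the way the $x_i$ are drawn differs. All distributions and parameter values ($\beta = 2$, $\sigma = 1$, the grid, the strata) are illustrative assumptions, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
N, beta, sigma = 200, 2.0, 1.0   # illustrative values

# 1. Simple random sampling: (y_i, x_i) drawn from the population, x_i iid.
x_srs = rng.normal(loc=1.0, scale=1.0, size=N)

# 2. Fixed regressors: x_i set by the experimenter (a fixed grid);
#    only u_i (and hence y_i) is random.
x_fixed = np.linspace(0.5, 1.5, N)

# 3. Exogenous stratified sampling: by design, 20% of observations come
#    from a low-x stratum and 80% from a high-x stratum, so x_i are inid.
n_low = N // 5
x_strat = np.concatenate([rng.uniform(0.0, 0.5, size=n_low),
                          rng.uniform(0.5, 2.0, size=N - n_low)])

for x in (x_srs, x_fixed, x_strat):
    u = rng.normal(scale=sigma, size=N)    # iid errors in all three schemes
    y = beta * x + u
    b = (x * y).sum() / (x**2).sum()       # OLS with a single regressor
    print(b)                               # close to beta = 2 in each scheme
```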

2. Asymptotic Theory for Consistency


Consider the limit behavior of a sequence of random variables $b_N$ as $N \to \infty$. This is a stochastic extension of a sequence of real numbers, such as $a_N = 2 + (3/N)$. Examples include: (1) $b_N$ is an estimator, say $\hat{\theta}$; (2) $b_N$ is a component of an estimator, such as $N^{-1}\sum_i x_i u_i$; (3) $b_N$ is a test statistic.

2.1. Convergence in Probability, Consistency, Transformations

Due to sampling randomness we can never be certain that a random sequence $b_N$, such as an estimator $\hat{\theta}_N$, will be within a given small distance of its limit, even if the sample is infinitely large. But we can be almost certain. Different ways of expressing this almost certainty correspond to different types of convergence of a sequence of random variables to a limit. The one most used in econometrics is convergence in probability.

Recall that a sequence of nonstochastic real numbers $\{a_N\}$ converges to $a$ if, for any $\varepsilon > 0$, there exists $N^* = N^*(\varepsilon)$ such that for all $N > N^*$, $|a_N - a| < \varepsilon$. E.g. if $a_N = 2 + 3/N$, then the limit $a = 2$, since $|a_N - a| = |2 + 3/N - 2| = |3/N| < \varepsilon$ for all $N > N^* = 3/\varepsilon$.

For a sequence of random variables we cannot be certain that $|b_N - b| < \varepsilon$, even for large $N$, due to the randomness. Instead, we require that the probability of being within $\varepsilon$ is arbitrarily close to one. Thus $\{b_N\}$ converges in probability to $b$ if

$$\lim_{N \to \infty} \Pr[|b_N - b| < \varepsilon] = 1, \quad \text{for any } \varepsilon > 0.$$

A formal definition is the following.

Definition A1 (Convergence in Probability): A sequence of random variables $\{b_N\}$ converges in probability to $b$ if for any $\varepsilon > 0$ and $\delta > 0$ there exists $N^* = N^*(\varepsilon, \delta)$ such that for all $N > N^*$,

$$\Pr[|b_N - b| < \varepsilon] > 1 - \delta.$$

We write $\text{plim } b_N = b$, where plim is shorthand for probability limit, or $b_N \overset{p}{\to} b$. The limit $b$ may be a constant or a random variable. The usual definition of convergence for a sequence of real variables is a special case of A1.

For vector random variables we can apply the theory to each element of $b_N$. [Alternatively, replace $|b_N - b|$ by the scalar $(b_N - b)'(b_N - b) = (b_{1N} - b_1)^2 + \cdots + (b_{KN} - b_K)^2$ or its square root $\|b_N - b\|$.]

Now consider $\{b_N\}$ to be a sequence of parameter estimates $\hat{\theta}$.

Definition A2 (Consistency): An estimator $\hat{\theta}$ is consistent for $\theta_0$ if $\text{plim } \hat{\theta} = \theta_0$.

Unbiasedness does not imply consistency. Unbiasedness states $E[\hat{\theta}] = \theta_0$, which permits variability around $\theta_0$ that need not disappear as the sample size goes to infinity. Consistency does not imply unbiasedness either: e.g. add $1/N$ to an unbiased and consistent estimator; the result is biased but still consistent.

A useful property of plim is that it can apply to transformations of random variables.

Theorem A3 (Probability Limit Continuity): Let $b_N$ be a finite-dimensional vector of random variables, and $g(\cdot)$ be a real-valued function continuous at a constant vector point $b$. Then

$$b_N \overset{p}{\to} b \implies g(b_N) \overset{p}{\to} g(b).$$

This theorem is often referred to as Slutsky's Theorem; we instead reserve that name for Theorem A12. Theorem A3 is one of the major reasons for the prevalence of asymptotic results versus finite-sample results in econometrics. It states a very convenient property that does not hold for expectations. For example, $\text{plim}(a_N, b_N) = (a, b)$ implies $\text{plim}(a_N b_N) = ab$, whereas $E[a_N b_N]$ generally differs from $E[a_N]E[b_N]$. Similarly, $\text{plim}[a_N/b_N] = a/b$ provided $b \neq 0$. A simulation sketch of convergence in probability follows.
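A minimal Monte Carlo sketch of Definition A1 (the uniform distribution, $\varepsilon$, and the replication counts are assumed, illustrative choices): for $b_N$ the sample mean of iid uniform draws with mean $b = 0.5$, the probability of being within $\varepsilon$ of $b$ approaches one as $N$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
b, eps, reps = 0.5, 0.05, 1000

for N in (10, 100, 1000, 10000):
    draws = rng.uniform(0.0, 1.0, size=(reps, N))  # iid with mean b = 0.5
    b_N = draws.mean(axis=1)                       # reps realizations of b_N
    print(N, np.mean(np.abs(b_N - b) < eps))       # Pr[|b_N - b| < eps] -> 1
```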

2.2. Alternative Modes of Convergence

It is often easier to establish alternative modes of convergence, which in turn imply convergence in probability. [However, laws of large numbers, given in the next section, are used much more often.]

Definition A4 (Mean Square Convergence): A sequence of random variables $\{b_N\}$ is said to converge in mean square to a random variable $b$ if

$$\lim_{N \to \infty} E[(b_N - b)^2] = 0.$$

We write $b_N \overset{m}{\to} b$. Convergence in mean square is useful because $b_N \overset{m}{\to} b$ implies $b_N \overset{p}{\to} b$.

Another result that can be used to show convergence in probability is Chebyshev's inequality.

Theorem A5 (Chebyshev's Inequality): For any random variable $Z$ with mean $\mu$ and variance $\sigma^2$,

$$\Pr[(Z - \mu)^2 > k] \leq \sigma^2 / k, \quad \text{for any } k > 0.$$

A final type of convergence is almost sure convergence, denoted $b_N \overset{as}{\to} b$. This is conceptually difficult and often hard to prove, so we skip it. Almost sure convergence (or strong convergence) implies convergence in probability (or weak convergence). A numerical check of Chebyshev's inequality follows.
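A quick numerical check of Theorem A5 with assumed standard normal draws: the empirical frequency of $(Z - \mu)^2 > k$ never exceeds the bound $\sigma^2/k$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2 = 0.0, 1.0
z = rng.normal(loc=mu, scale=np.sqrt(sigma2), size=10**6)

for k in (1.0, 4.0, 9.0):
    freq = np.mean((z - mu)**2 > k)      # empirical Pr[(Z - mu)^2 > k]
    print(k, freq, sigma2 / k)           # frequency never exceeds the bound
```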

2.3. Laws of Large Numbers

Laws of large numbers are theorems for convergence in probability (or almost surely) in the special case where the sequence $\{b_N\}$ is a sample average, i.e. $b_N = \bar{X}_N$ where

$$\bar{X}_N = \frac{1}{N} \sum_{i=1}^N X_i.$$

Note that $X_i$ here is general notation for a random variable, and in the regression context does not necessarily denote the regressor variables. For example, $X_i = x_i u_i$. A LLN is a much easier way to get the plim than use of Definition A1 or Theorems A4 or A5, and LLNs are widely used in econometrics because estimators involve averages.

Definition A7 (Law of Large Numbers): A weak law of large numbers (LLN) specifies conditions on the individual terms $X_i$ in $\bar{X}_N$ under which

$$(\bar{X}_N - E[\bar{X}_N]) \overset{p}{\to} 0.$$

For a strong law of large numbers the convergence is instead almost sure. If a LLN can be applied, then

$$\text{plim } \bar{X}_N = \lim E[\bar{X}_N] \ \text{(in general)} = \lim \frac{1}{N}\sum_{i=1}^N E[X_i] \ \text{(if } X_i \text{ independent over } i\text{)} = \mu \ \text{(if } X_i \text{ iid)}.$$

Leading examples of laws of large numbers follow.

Theorem A8 (Kolmogorov LLN): Let $\{X_i\}$ be iid (independent and identically distributed). If and only if $E[X_i] = \mu$ exists and $E[|X_i|] < \infty$, then $(\bar{X}_N - \mu) \overset{as}{\to} 0$.

Theorem A9 (Markov LLN): Let $\{X_i\}$ be inid (independent but not identically distributed) with $E[X_i] = \mu_i$ and $V[X_i] = \sigma_i^2$. If $\sum_{i=1}^{\infty} \left( E[|X_i - \mu_i|^{1+\delta}] / i^{1+\delta} \right) < \infty$ for some $\delta > 0$, then $(\bar{X}_N - N^{-1}\sum_{i=1}^N E[X_i]) \overset{as}{\to} 0$.

The Markov LLN allows nonidentical distributions, at the expense of requiring the existence of an absolute moment beyond the first. The rest of the side-condition is likely to hold with cross-section data: e.g. if we set $\delta = 1$, then we need the variance to exist plus $\sum_{i=1}^{\infty} (\sigma_i^2 / i^2) < \infty$, which happens if $\sigma_i^2$ is bounded.

The Kolmogorov LLN gives almost sure convergence. Usually convergence in probability is enough and we can use the weaker Khinchine's Theorem.

Theorem A8b (Khinchine's Theorem): Let $\{X_i\}$ be iid (independent and identically distributed). If and only if $E[X_i] = \mu$ exists, then $(\bar{X}_N - \mu) \overset{p}{\to} 0$.

Which LLN should I use in regression applications? It depends on the sampling scheme. A simulation sketch of both LLNs follows.
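The sketch below (with assumed, illustrative distributions) contrasts the two LLNs: Khinchine's theorem for iid draws, and the Markov LLN for inid draws whose variances $\sigma_i^2$ are bounded, so that the side-condition $\sum_i \sigma_i^2/i^2 < \infty$ holds.

```python
import numpy as np

rng = np.random.default_rng(0)

for N in (100, 10_000, 1_000_000):
    # Khinchine: iid draws with mean mu = 1.
    x_iid = rng.exponential(scale=1.0, size=N)
    # Markov: independent but heteroskedastic draws with E[X_i] = 1 and
    # bounded V[X_i], so the side-condition sum sigma_i^2 / i^2 < inf holds.
    sigmas = 0.5 + 1.5 * (np.arange(N) % 2)        # alternates 0.5 and 2.0
    x_inid = 1.0 + sigmas * rng.normal(size=N)
    print(N, x_iid.mean(), x_inid.mean())          # both averages approach 1
```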

3. Consistency of OLS Estimator


Obtain the probability limit of $\hat{\beta} = \beta + \left[ \frac{1}{N}\sum_{i=1}^N x_i u_i \right] / \left[ \frac{1}{N}\sum_{i=1}^N x_i^2 \right]$.

3.1. Simple Random Sampling (SRS) with iid errors

Assume $x_i$ iid with mean $\mu_x$, and $u_i$ iid with mean $0$.

As the $x_i u_i$ are iid, apply Khinchine's Theorem, yielding $N^{-1}\sum_i x_i u_i \overset{p}{\to} E[xu] = E[x]E[u] = 0$.

As the $x_i^2$ are iid, apply Khinchine's Theorem, yielding $N^{-1}\sum_i x_i^2 \overset{p}{\to} E[x^2]$, which we assume exists.

By Theorem A3 (Probability Limit Continuity), $\text{plim}[a_N/b_N] = a/b$ if $b \neq 0$. Then

$$\text{plim } \hat{\beta} = \beta + \frac{\text{plim } \frac{1}{N}\sum_{i=1}^N x_i u_i}{\text{plim } \frac{1}{N}\sum_{i=1}^N x_i^2} = \beta + \frac{0}{E[x^2]} = \beta.$$

A Monte Carlo sketch of this result follows.
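A Monte Carlo sketch of consistency under SRS (the normal design and the values $\beta = 2$, $\sigma = 1$ are assumptions chosen for illustration): the two sample averages settle at $0$ and $E[x^2] = 2$, and $\hat{\beta}$ settles at $\beta$.

```python
import numpy as np

rng = np.random.default_rng(0)
beta, sigma = 2.0, 1.0                 # assumed true values

for N in (50, 500, 5000, 50000):
    x = rng.normal(loc=1.0, scale=1.0, size=N)   # iid, E[x^2] = 2
    u = rng.normal(scale=sigma, size=N)          # iid, independent of x
    y = beta * x + u
    num = (x * u).mean()                         # -> E[xu] = 0
    den = (x**2).mean()                          # -> E[x^2] = 2
    print(N, num, den, beta + num / den)         # last column -> beta = 2
```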

3.2. Fixed Regressors with iid errors

Assume $x_i$ fixed, and $u_i$ iid with mean $0$ and variance $\sigma^2$.

Then the $x_i u_i$ are inid with mean $E[x_i u_i] = x_i E[u_i] = 0$ and variance $V[x_i u_i] = x_i^2 \sigma^2$.

Apply the Markov LLN, yielding $N^{-1}\sum_i x_i u_i - N^{-1}\sum_i E[x_i u_i] \overset{p}{\to} 0$, so $N^{-1}\sum_i x_i u_i \overset{p}{\to} 0$. The side-condition with $\delta = 1$ is $\sum_{i=1}^{\infty} x_i^2 \sigma^2 / i^2 < \infty$, which is satisfied if $x_i$ is bounded. We also assume $\lim N^{-1}\sum_i x_i^2$ exists.

By Theorem A3 (Probability Limit Continuity), $\text{plim}[a_N/b_N] = a/b$ if $b \neq 0$. Then

$$\text{plim } \hat{\beta} = \beta + \frac{\text{plim } \frac{1}{N}\sum_{i=1}^N x_i u_i}{\lim \frac{1}{N}\sum_{i=1}^N x_i^2} = \beta + \frac{0}{\lim \frac{1}{N}\sum_{i=1}^N x_i^2} = \beta.$$

3.3. Exogenous Stratified Sampling with iid errors

Assume $x_i$ inid with mean $E[x_i]$ and variance $V[x_i]$, and $u_i$ iid with mean $0$.

Now the $x_i u_i$ are inid with mean $E[x_i u_i] = E[x_i]E[u_i] = 0$ and variance $V[x_i u_i] = E[x_i^2]\sigma^2$, so we need the Markov LLN. This yields $N^{-1}\sum_i x_i u_i \overset{p}{\to} 0$, with the side-condition satisfied if $E[x_i^2]$ is bounded. And the $x_i^2$ are inid, so we need the Markov LLN with a side-condition that requires e.g. existence and boundedness of $E[x_i^4]$. Combining, we again get $\text{plim } \hat{\beta} = \beta$.

4. Asymptotic Theory for Asymptotic Normality

Given consistency, the estimator $\hat{\beta}$ has a degenerate distribution that collapses on $\beta$ as $N \to \infty$, so we cannot do statistical inference. [Indeed there is no reason to do it if $N \to \infty$.] We need to magnify or rescale $\hat{\beta}$ to obtain a random variable with a nondegenerate distribution as $N \to \infty$.

4.1. Convergence in Distribution, Transformation

Often the appropriate scale factor is $\sqrt{N}$, so consider $b_N = \sqrt{N}(\hat{\beta} - \beta_0)$. $b_N$ has an extremely complicated cumulative distribution function (cdf) $F_N$. But like any other function, $F_N$ may have a limit function, where convergence is in the usual (nonstochastic) mathematical sense.

Definition A10 (Convergence in Distribution): A sequence of random variables $\{b_N\}$ is said to converge in distribution to a random variable $b$ if

$$\lim_{N \to \infty} F_N = F,$$

at every continuity point of $F$, where $F_N$ is the distribution of $b_N$, $F$ is the distribution of $b$, and convergence is in the usual mathematical sense. We write $b_N \overset{d}{\to} b$, and call $F$ the limit distribution of $\{b_N\}$.

$b_N \overset{p}{\to} b$ implies $b_N \overset{d}{\to} b$. In general the reverse is not true; but if $b$ is a constant, then $b_N \overset{d}{\to} b$ implies $b_N \overset{p}{\to} b$.

To extend limit distributions to vector random variables, simply define $F_N$ and $F$ to be the respective cdfs of the vectors $b_N$ and $b$.

A useful property of convergence in distribution is that it can apply to transformations of random variables.

Theorem A11 (Limit Distribution Continuity): Let $b_N$ be a finite-dimensional vector of random variables, and $g(\cdot)$ be a continuous real-valued function. Then

$$b_N \overset{d}{\to} b \implies g(b_N) \overset{d}{\to} g(b).$$

This result is also called the Continuous Mapping Theorem.

Theorem A12 (Slutsky's Theorem): If $a_N \overset{d}{\to} a$ and $b_N \overset{p}{\to} b$, where $a$ is a random variable and $b$ is a constant, then

(i) $a_N + b_N \overset{d}{\to} a + b$; (4.1)

(ii) $a_N b_N \overset{d}{\to} ab$; (4.2)

(iii) $a_N / b_N \overset{d}{\to} a/b$, provided $\Pr[b = 0] = 0$.

Theorem A12 (also called Cramer's Theorem) permits one to separately find the limit distribution of $a_N$ and the probability limit of $b_N$, rather than having to consider the joint behavior of $a_N$ and $b_N$. Result (ii) is especially useful and is sometimes called the Product Rule.

4.2. Central Limit Theorems

Central limit theorems give convergence in distribution when the sequence $\{b_N\}$ is a sample average. A CLT is a much easier way to get the limit distribution than e.g. use of Definition A10.
By a LLN, $\bar{X}_N$ has a degenerate distribution, as it converges to a constant, $\lim E[\bar{X}_N]$. So we scale $(\bar{X}_N - E[\bar{X}_N])$ by its standard deviation to construct a random variable with unit variance that will have a nondegenerate distribution.

Definition A13 (Central Limit Theorem): Let

$$Z_N = \frac{\bar{X}_N - E[\bar{X}_N]}{\sqrt{V[\bar{X}_N]}},$$

where $\bar{X}_N$ is a sample average. A central limit theorem (CLT) specifies the conditions on the individual terms $X_i$ in $\bar{X}_N$ under which

$$Z_N \overset{d}{\to} N[0, 1],$$

i.e. $Z_N$ converges in distribution to a standard normal random variable. Note that

$$Z_N = \frac{\bar{X}_N - E[\bar{X}_N]}{\sqrt{V[\bar{X}_N]}} \ \text{(in general)} = \frac{\sum_{i=1}^N (X_i - E[X_i])}{\sqrt{\sum_{i=1}^N V[X_i]}} \ \text{(if } X_i \text{ independent over } i\text{)} = \frac{\sqrt{N}(\bar{X}_N - \mu)}{\sigma} \ \text{(if } X_i \text{ iid)}.$$

If $\bar{X}_N$ satisfies a central limit theorem, then so too does $h(N)\bar{X}_N$ for functions $h(\cdot)$ such as $h(N) = \sqrt{N}$, since

$$Z_N = \frac{h(N)\bar{X}_N - E[h(N)\bar{X}_N]}{\sqrt{V[h(N)\bar{X}_N]}}.$$

We often apply the CLT to the normalization $\sqrt{N}\bar{X}_N = N^{-1/2}\sum_{i=1}^N X_i$, since $V[\sqrt{N}\bar{X}_N]$ is finite.

Examples of central limit theorems include the following.

Theorem A14 (Lindeberg-Levy CLT): Let $\{X_i\}$ be iid with $E[X_i] = \mu$ and $V[X_i] = \sigma^2$. Then $Z_N = \sqrt{N}(\bar{X}_N - \mu)/\sigma \overset{d}{\to} N[0, 1]$.

Theorem A15 (Liapounov CLT): Let $\{X_i\}$ be independent with $E[X_i] = \mu_i$ and $V[X_i] = \sigma_i^2$. If $\lim \left( \sum_{i=1}^N E[|X_i - \mu_i|^{2+\delta}] \right) / \left( \sum_{i=1}^N \sigma_i^2 \right)^{(2+\delta)/2} = 0$ for some choice of $\delta > 0$, then $Z_N = \sum_{i=1}^N (X_i - \mu_i) / \sqrt{\sum_{i=1}^N \sigma_i^2} \overset{d}{\to} N[0, 1]$.

Lindeberg-Levy is the CLT in introductory statistics. For the iid case the LLN requires that $\mu$ exists, while the CLT also requires that $\sigma^2$ exists. For inid data the Liapounov CLT additionally requires the existence of an absolute moment of order higher than two.

Which CLT should I use in regression applications? It depends on the sampling scheme. A simulation sketch of the Lindeberg-Levy CLT follows.
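A sketch of Theorem A14 with a deliberately skewed, assumed iid distribution (exponential with $\mu = \sigma = 1$): the standardized $Z_N$ is close to $N[0, 1]$ even at moderate $N$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, reps = 1.0, 1.0, 10000   # exponential(1): mu = sigma = 1

for N in (5, 50, 500):
    x = rng.exponential(scale=1.0, size=(reps, N))     # iid but skewed
    z = np.sqrt(N) * (x.mean(axis=1) - mu) / sigma     # Z_N from Theorem A14
    print(N, z.mean(), z.var(), np.mean(z > 1.96))     # -> 0, 1, 0.025
```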

5. Limit Distribution of OLS Estimator


Obtain the limit distribution of $\sqrt{N}(\hat{\beta} - \beta) = \left[ \frac{1}{\sqrt{N}}\sum_{i=1}^N x_i u_i \right] / \left[ \frac{1}{N}\sum_{i=1}^N x_i^2 \right]$.

5.1. Simple Random Sampling (SRS) with iid errors

Assume $x_i$ iid with mean $\mu_x$ and second moment $E[x^2]$, and assume $u_i$ iid with mean $0$ and variance $\sigma^2$.

Then the $x_i u_i$ are iid with mean $0$ and variance $\sigma^2 E[x^2]$. [Proof for the variance: $V_{x,u}[xu] = E_x[V[xu|x]] + V_x[E[xu|x]] = E_x[x^2\sigma^2] + 0 = \sigma^2 E_x[x^2]$.]

Apply the Lindeberg-Levy CLT, yielding

$$\sqrt{N} \left( \frac{\frac{1}{N}\sum_{i=1}^N x_i u_i - 0}{\sqrt{\sigma^2 E[x^2]}} \right) \overset{d}{\to} N[0, 1].$$

Using Slutsky's theorem that $a_N b_N \overset{d}{\to} a \times b$ (for $a_N \overset{d}{\to} a$ and $b_N \overset{p}{\to} b$), this implies

$$\frac{1}{\sqrt{N}} \sum_{i=1}^N x_i u_i \overset{d}{\to} N[0, \sigma^2 E[x^2]].$$

Then using Slutsky's theorem that $a_N / b_N \overset{d}{\to} a/b$ (for $a_N \overset{d}{\to} a$ and $b_N \overset{p}{\to} b$),

$$\sqrt{N}(\hat{\beta} - \beta) = \frac{\frac{1}{\sqrt{N}}\sum_{i=1}^N x_i u_i}{\frac{1}{N}\sum_{i=1}^N x_i^2} \overset{d}{\to} N\left[0, \frac{\sigma^2 E[x^2]}{(E[x^2])^2}\right] = N\left[0, \sigma^2 \left( E[x^2] \right)^{-1}\right],$$

where we use the result from the consistency proof that $\text{plim } \frac{1}{N}\sum_{i=1}^N x_i^2 = E[x^2]$.

5.2. Fixed Regressors with iid errors

Assume $x_i$ fixed, and $u_i$ iid with mean $0$ and variance $\sigma^2$.

Then the $x_i u_i$ are inid with mean $0$ and variance $V[x_i u_i] = x_i^2 \sigma^2$. Apply the Liapounov CLT, yielding

$$\frac{\frac{1}{\sqrt{N}}\sum_{i=1}^N x_i u_i}{\sqrt{\sigma^2 \lim \frac{1}{N}\sum_{i=1}^N x_i^2}} \overset{d}{\to} N[0, 1].$$

Using Slutsky's theorem that $a_N b_N \overset{d}{\to} a \times b$, this implies

$$\frac{1}{\sqrt{N}} \sum_{i=1}^N x_i u_i \overset{d}{\to} N\left[0, \sigma^2 \lim \frac{1}{N}\sum_{i=1}^N x_i^2\right].$$

Then using Slutsky's theorem that $a_N / b_N \overset{d}{\to} a/b$,

$$\sqrt{N}(\hat{\beta} - \beta) = \frac{\frac{1}{\sqrt{N}}\sum_{i=1}^N x_i u_i}{\frac{1}{N}\sum_{i=1}^N x_i^2} \overset{d}{\to} N\left[0, \sigma^2 \left( \lim \frac{1}{N}\sum_{i=1}^N x_i^2 \right)^{-1} \right].$$

5.3. Exogenous Stratified Sampling with iid errors

Assume $x_i$ inid with mean $E[x_i]$ and variance $V[x_i]$, and $u_i$ iid with mean $0$. Similar to the fixed-regressors case, we will need to use the Liapounov CLT. We will get

$$\sqrt{N}(\hat{\beta} - \beta) \overset{d}{\to} N\left[0, \sigma^2 \left( \text{plim } \frac{1}{N}\sum_{i=1}^N x_i^2 \right)^{-1} \right].$$

A Monte Carlo check of the SRS case follows.
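A Monte Carlo check of the SRS result, under an assumed design with $x \sim N(1, 1)$ so that $E[x^2] = 2$, $\beta = 2$, $\sigma = 1$: the variance of $\sqrt{N}(\hat{\beta} - \beta)$ across replications should approach $\sigma^2 (E[x^2])^{-1} = 0.5$.

```python
import numpy as np

rng = np.random.default_rng(0)
beta, sigma, N, reps = 2.0, 1.0, 500, 5000
Ex2 = 2.0                                      # E[x^2] = 2 for x ~ N(1,1)

x = rng.normal(loc=1.0, scale=1.0, size=(reps, N))
u = rng.normal(scale=sigma, size=(reps, N))
y = beta * x + u
b = (x * y).sum(axis=1) / (x**2).sum(axis=1)   # OLS in each replication
dev = np.sqrt(N) * (b - beta)                  # sqrt(N)(b - beta)
print(dev.var(), sigma**2 / Ex2)               # both close to 0.5
```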

6. Asymptotic Distribution of OLS Estimator


From consistency we have that $\hat{\beta}$ has a degenerate distribution with all mass at $\beta$, while $\sqrt{N}(\hat{\beta} - \beta)$ has a limit normal distribution. For formal asymptotic theory, such as deriving hypothesis tests, we work with this limit distribution. But for exposition it is convenient to think of the distribution of $\hat{\beta}$ rather than $\sqrt{N}(\hat{\beta} - \beta)$. We do this by introducing the artifice of the "asymptotic distribution". Specifically, we consider $N$ large but not infinite, and drop the probability limit in the preceding result, so that

$$\sqrt{N}(\hat{\beta} - \beta) \sim N\left[0, \sigma^2 \left( \frac{1}{N}\sum_{i=1}^N x_i^2 \right)^{-1} \right].$$

It follows that the asymptotic distribution of $\hat{\beta}$ is

$$\hat{\beta} \overset{a}{\sim} N\left[\beta, \sigma^2 \left( \sum_{i=1}^N x_i^2 \right)^{-1} \right].$$

Note that this is exactly the same result as we would have obtained if $y_i = \beta x_i + u_i$ with $u_i \sim N[0, \sigma^2]$. A sketch of inference based on this asymptotic distribution follows.
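In practice this asymptotic distribution is what delivers standard errors and confidence intervals; a minimal sketch for a single simulated sample (the data-generating values $\beta = 2$, $\sigma = 1$ are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
beta, sigma, N = 2.0, 1.0, 500       # assumed data-generating values

x = rng.normal(loc=1.0, scale=1.0, size=N)
u = rng.normal(scale=sigma, size=N)
y = beta * x + u

b = (x * y).sum() / (x**2).sum()
s2 = ((y - b * x)**2).sum() / (N - 1)        # consistent estimate of sigma^2
se = np.sqrt(s2 / (x**2).sum())              # from b ~a N[beta, sigma^2 (sum x_i^2)^{-1}]
print(b, se, (b - 1.96 * se, b + 1.96 * se)) # estimate and 95% asymptotic CI
```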

7. Multivariate Normal Limit Theorems


The preceding CLTs were for scalar random variables.

Definition A16a (Multivariate Central Limit Theorem): Let $\mu_N = E[\bar{X}_N]$ and $V_N = V[\bar{X}_N]$. A multivariate central limit theorem (CLT) specifies the conditions on the individual terms $X_i$ in $\bar{X}_N$ under which

$$V_N^{-1/2}(\bar{X}_N - \mu_N) \overset{d}{\to} N[0, I].$$

This is formally established using the following result.

Theorem A16 (Cramer-Wold Device): Let $\{b_N\}$ be a sequence of random $k \times 1$ vectors. If $\lambda' b_N$ converges to a normal random variable for every $k \times 1$ constant non-zero vector $\lambda$, then $b_N$ converges to a multivariate normal random variable.

The advantage of this result is that if $b_N = \bar{X}_N$, then $\lambda' b_N = \lambda_1 \bar{X}_{1N} + \cdots + \lambda_k \bar{X}_{kN}$ will be a scalar average and we can apply a scalar CLT, yielding

$$\frac{\lambda' \bar{X}_N - \lambda' \mu_N}{\sqrt{\lambda' V_N \lambda}} \overset{d}{\to} N[0, 1],$$

and hence $V_N^{-1/2}(\bar{X}_N - \mu_N) \overset{d}{\to} N[0, I]$. A simulation sketch of this standardization follows.
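A sketch of the multivariate standardization $V_N^{-1/2}(\bar{X}_N - \mu_N)$ for iid bivariate draws; the mean vector and variance matrix are assumed, illustrative choices, and $V_N^{-1/2}$ is computed as a symmetric inverse square root via an eigendecomposition.

```python
import numpy as np

rng = np.random.default_rng(0)
N, reps = 500, 5000
mu = np.array([1.0, -1.0])                    # assumed mean of X_i
V = np.array([[1.0, 0.6], [0.6, 2.0]])        # assumed V[X_i]; V_N = V/N for iid data

L = np.linalg.cholesky(V)
x = mu + rng.normal(size=(reps, N, 2)) @ L.T  # iid draws with mean mu, variance V
xbar = x.mean(axis=1)                         # reps realizations of the vector average

w, Q = np.linalg.eigh(V / N)                  # inverse symmetric square root of V_N
Vn_inv_half = Q @ np.diag(w**-0.5) @ Q.T
z = (xbar - mu) @ Vn_inv_half                 # approx N[0, I] in each replication
print(z.mean(axis=0))                         # ~ (0, 0)
print(np.cov(z.T))                            # ~ identity matrix
```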

Microeconometric estimators can often be expressed as

$$\sqrt{N}(\hat{\theta} - \theta_0) = H_N a_N,$$

where $\text{plim } H_N$ exists and $a_N$ has a limit normal distribution. The distribution of this product can be obtained directly from part (ii) of Theorem A12 (Slutsky's theorem). We restate it in a form that arises for many estimators.

Theorem A17 (Limit Normal Product Rule): If a vector $a_N \overset{d}{\to} N[\mu, A]$ and a matrix $H_N \overset{p}{\to} H$, where $H$ is positive definite, then $H_N a_N \overset{d}{\to} N[H\mu, HAH']$.

For example, the OLS estimator

$$\sqrt{N}(\hat{\beta} - \beta_0) = \left( \frac{1}{N} X'X \right)^{-1} \frac{1}{\sqrt{N}} X'u$$

is $H_N = (N^{-1}X'X)^{-1}$ times $a_N = N^{-1/2}X'u$, and we find the plim of $H_N$ and the limit distribution of $a_N$.

Theorem A17 also justifies replacement of a limit distribution variance matrix by a consistent estimate without changing the limit distribution. Given

$$\sqrt{N}(\hat{\theta} - \theta_0) \overset{d}{\to} N[0, B],$$

it then follows by Theorem A17 that

$$B_N^{-1/2} \sqrt{N}(\hat{\theta} - \theta_0) \overset{d}{\to} N[0, I]$$

for any $B_N$ that is a consistent estimate of $B$ and is positive definite.

A formal multivariate CLT yields $V_N^{-1/2}(b_N - \mu_N) \overset{d}{\to} N[0, I]$. Premultiply by $V_N^{1/2}$ and apply Theorem A17, giving the simpler form

$$b_N - \mu_N \overset{d}{\to} N[0, V],$$

where $V = \text{plim } V_N$ and we assume $b_N$ and $V_N$ are appropriately scaled so that $V$ exists and is positive definite.

Different authors express the limit variance matrix $V$ in different ways:

1. General form: $V = \text{plim } V_N$. With fixed regressors, $V = \lim V_N$.

2. Stratified sampling or fixed regressors: often $V_N$ is a matrix average, say $V_N = N^{-1}\sum_{i=1}^N S_i$, where $S_i$ is a square matrix. A LLN gives $V_N - E[V_N] \overset{p}{\to} 0$. Then $V = \lim E[V_N] = \lim N^{-1}\sum_{i=1}^N E[S_i]$.

3. Simple random sampling: the $S_i$ are iid with $E[S_i] = E[S]$, so $V = E[S]$.

As an example, $\text{plim } N^{-1}\sum_i x_i x_i' = \lim N^{-1}\sum_i E[x_i x_i']$ if a LLN applies, and this equals $E[xx']$ under simple random sampling.

8. Asymptotic Normality
It can be convenient to re-express results in terms of $\hat{\theta}$ rather than $\sqrt{N}(\hat{\theta} - \theta_0)$.

Definition A18 (Asymptotic Distribution of $\hat{\theta}$): If

$$\sqrt{N}(\hat{\theta} - \theta_0) \overset{d}{\to} N[0, B], \tag{8.1}$$

then we say that in large samples $\hat{\theta}$ is asymptotically normally distributed with

$$\hat{\theta} \overset{a}{\sim} N[\theta_0, N^{-1}B], \tag{8.2}$$

where the term "in large samples" means that $N$ is large enough for a good approximation but not so large that the variance $N^{-1}B$ goes to zero.

A more shorthand notation is to implicitly presume asymptotic normality and use the following terminology.

Definition A19 (Asymptotic Variance of $\hat{\theta}$): If (8.1) holds, then we say that the asymptotic variance matrix of $\hat{\theta}$ is

$$V[\hat{\theta}] = N^{-1}B.$$

Definition A20 (Estimated Asymptotic Variance of $\hat{\theta}$): If (8.1) holds, then we say that the estimated asymptotic variance matrix of $\hat{\theta}$ is

$$\hat{V}[\hat{\theta}] = N^{-1}\hat{B},$$

where $\hat{B}$ is a consistent estimate of $B$.

Definition A21 (Asymptotic Efficiency): A consistent asymptotically normal estimator $\hat{\theta}$ of $\theta$ is said to be asymptotically efficient if it has an asymptotic variance-covariance matrix equal to the Cramer-Rao lower bound

$$\left( -E\left[ \frac{\partial^2 \ln L_N}{\partial \theta \, \partial \theta'} \right] \right)^{-1}.$$

Some authors use $\text{Avar}[\hat{\theta}]$ in Definitions A19 and A20 to avoid potential confusion with the variance operator $V[\cdot]$. It should be clear that here $V[\hat{\theta}]$ means the asymptotic variance of an estimator, since few estimators have closed-form expressions for their finite-sample variance.

As an example of Definitions A18-A20, if $\{X_i\}$ are iid $[\mu, \sigma^2]$, then $\sqrt{N}(\bar{X}_N - \mu)/\sigma \overset{d}{\to} N[0, 1]$, or equivalently $\sqrt{N}(\bar{X}_N - \mu) \overset{d}{\to} N[0, \sigma^2]$, so $\bar{X}_N \overset{a}{\sim} N[\mu, \sigma^2/N]$. Then the asymptotic variance of $\bar{X}_N$ is $\sigma^2/N$, and the estimated asymptotic variance of $\bar{X}_N$ is $s^2/N$, where $s^2$ is a consistent estimator of $\sigma^2$ such as $s^2 = \sum_i (X_i - \bar{X}_N)^2 / (N - 1)$. A numerical sketch of this example follows.
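A numerical sketch of the sample-mean example, under an assumed $\chi^2_4$ population ($\mu = 4$, $\sigma^2 = 8$): the estimated asymptotic variance $s^2/N$ is close to $\sigma^2/N$.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2000
x = rng.chisquare(df=4, size=N)          # iid with mu = 4, sigma^2 = 8

xbar = x.mean()
s2 = ((x - xbar)**2).sum() / (N - 1)     # consistent estimator of sigma^2
print(xbar, s2 / N, 8.0 / N)             # estimated vs true asymptotic variance
```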

9. OLS Estimator with Matrix Algebra

Now consider $\hat{\beta} = (X'X)^{-1}X'y$ with $y = X\beta + u$, so

$$\hat{\beta} = \beta + (X'X)^{-1}X'u.$$

Note that the $k \times k$ matrix $X'X = \sum_i x_i x_i'$, where $x_i$ is a $k \times 1$ vector of regressors for the $i$th observation.

9.1. Consistency of OLS

To prove consistency we rewrite $\hat{\beta}$ as

$$\hat{\beta} = \beta + \left( \frac{1}{N} X'X \right)^{-1} \frac{1}{N} X'u.$$

The reason for this renormalization is that $N^{-1}X'X = N^{-1}\sum_i x_i x_i'$ is an average that converges in probability to a finite nonzero matrix if $x_i$ satisfies assumptions that permit a LLN to be applied to $x_i x_i'$. Then

$$\text{plim } \hat{\beta} = \beta + \left( \text{plim } \frac{1}{N} X'X \right)^{-1} \left( \text{plim } \frac{1}{N} X'u \right),$$

using Slutsky's Theorem (Theorem A3). The OLS estimator is therefore consistent for $\beta$ (i.e., $\text{plim } \hat{\beta}_{OLS} = \beta$) if $\text{plim } N^{-1}X'u = 0$.

If a LLN can be applied to the average $N^{-1}X'u = N^{-1}\sum_i x_i u_i$, then a necessary condition for this to hold is that $E[x_i u_i] = 0$. The fundamental condition for consistency of OLS is that $E[u_i|x_i] = 0$, so that $E[x_i u_i] = 0$.

9.2. Limit Distribution of OLS

Given consistency, the limit distribution of $\hat{\beta}$ is degenerate with all mass at $\beta$. To obtain a limit distribution we multiply $\hat{\beta}_{OLS}$ by $\sqrt{N}$, so

$$\sqrt{N}(\hat{\beta} - \beta) = \left( \frac{1}{N} X'X \right)^{-1} \frac{1}{\sqrt{N}} X'u.$$

We know that $\text{plim } N^{-1}X'X$ exists and is finite and nonzero from the proof of consistency. For iid errors, $E[uu'|X] = \sigma^2 I$ and $V[X'u|X] = E[X'uu'X|X] = \sigma^2 X'X$, and we assume that a CLT can be applied to yield

$$\frac{1}{\sqrt{N}} X'u \overset{d}{\to} N\left[0, \sigma^2 \,\text{plim } \frac{1}{N} X'X\right].$$

Then by Theorem A17 (Limit Normal Product Rule),

$$\sqrt{N}(\hat{\beta} - \beta) \overset{d}{\to} \left( \text{plim } \frac{1}{N} X'X \right)^{-1} \times N\left[0, \sigma^2 \,\text{plim } \frac{1}{N} X'X\right] = N\left[0, \sigma^2 \left( \text{plim } \frac{1}{N} X'X \right)^{-1}\right].$$

9.3. Asymptotic Distribution of OLS

Then, dropping the limits,

$$\sqrt{N}(\hat{\beta} - \beta) \sim N\left[0, \sigma^2 \left( \frac{1}{N} X'X \right)^{-1}\right],$$

so

$$\hat{\beta} \overset{a}{\sim} N[\beta, \sigma^2 (X'X)^{-1}].$$

The asymptotic variance matrix is

$$V[\hat{\beta}] = \sigma^2 (X'X)^{-1},$$

and is consistently estimated by the estimated variance matrix

$$\hat{V}[\hat{\beta}] = s^2 (X'X)^{-1},$$

where $s^2$ is consistent for $\sigma^2$; for example, $s^2 = \hat{u}'\hat{u}/(N - k)$ or $s^2 = \hat{u}'\hat{u}/N$. A numerical sketch of these formulas follows.
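A minimal matrix-algebra sketch of these formulas on simulated data; the design, the true coefficients, and the error scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 400, 3
beta = np.array([1.0, -2.0, 0.5])                 # assumed true coefficients
X = np.column_stack([np.ones(N), rng.normal(size=(N, k - 1))])
u = rng.normal(scale=1.5, size=N)                 # iid errors
y = X @ beta + u

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                             # b = (X'X)^{-1} X'y
resid = y - X @ b
s2 = resid @ resid / (N - k)                      # s^2 = u'u / (N - k)
V_hat = s2 * XtX_inv                              # estimated asymptotic variance
print(b, np.sqrt(np.diag(V_hat)))                 # coefficients and standard errors
```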
9.4. OLS with Heteroskedastic Errors

What if the errors are heteroskedastic? If $E[uu'|X] = \Omega = \text{Diag}[\sigma_i^2]$, then $V[X'u|X] = E[X'uu'X|X] = X'\Omega X = \sum_{i=1}^N \sigma_i^2 x_i x_i'$. A CLT gives

$$\frac{1}{\sqrt{N}} X'u \overset{d}{\to} N\left[0, \text{plim } \frac{1}{N} X'\Omega X\right],$$

leading to

$$\sqrt{N}(\hat{\beta} - \beta) \overset{d}{\to} \left( \text{plim } \frac{1}{N} X'X \right)^{-1} \times N\left[0, \text{plim } \frac{1}{N} X'\Omega X\right] = N\left[0, \left( \text{plim } \frac{1}{N} X'X \right)^{-1} \left( \text{plim } \frac{1}{N} X'\Omega X \right) \left( \text{plim } \frac{1}{N} X'X \right)^{-1}\right].$$

Then, dropping the limits etcetera,

$$\hat{\beta} \overset{a}{\sim} N[\beta, (X'X)^{-1} X'\Omega X (X'X)^{-1}].$$

The asymptotic variance matrix is

$$V[\hat{\beta}] = (X'X)^{-1} X'\Omega X (X'X)^{-1}.$$

White (1980) showed that this can be consistently estimated by

$$\hat{V}[\hat{\beta}] = (X'X)^{-1} \left( \sum_{i=1}^N \hat{u}_i^2 x_i x_i' \right) (X'X)^{-1},$$

even though $\hat{u}_i^2$ is not consistent for $\sigma_i^2$. A sketch of this estimator follows.
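A sketch of the White sandwich estimator in its simplest (HC0, no finite-sample correction) form, on simulated heteroskedastic data; the design and the form of the heteroskedasticity are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
u = np.abs(x) * rng.normal(size=N)           # sigma_i depends on x_i
y = X @ np.array([1.0, 2.0]) + u

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
uhat = y - X @ b

meat = (X * uhat[:, None]**2).T @ X          # sum of uhat_i^2 x_i x_i'
V_white = XtX_inv @ meat @ XtX_inv           # White sandwich estimator
V_naive = (uhat @ uhat / (N - 2)) * XtX_inv  # s^2 (X'X)^{-1}, invalid here
print(np.sqrt(np.diag(V_white)))             # robust standard errors
print(np.sqrt(np.diag(V_naive)))             # naive standard errors differ
```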

