$\hat\beta$ is consistent for $\beta$. It follows that, under assumptions given below (including $E[u_i|x_i] = 0$ and $V[u_i|x_i] = \sigma^2$),

$$\sqrt{N}(\hat\beta - \beta) = \frac{N^{-1/2}\sum_{i=1}^N x_iu_i}{N^{-1}\sum_{i=1}^N x_i^2} \to^d N\left[0,\ \sigma^2\left(\text{plim}\,\frac{1}{N}\sum_{i=1}^N x_i^2\right)^{-1}\right],$$

so $\hat\beta$ is asymptotically normally distributed with $\hat\beta \overset{a}{\sim} N\left[\beta,\ \sigma^2\left(\sum_{i=1}^N x_i^2\right)^{-1}\right]$.

1.2. Sampling Schemes and Error Assumptions

The key for consistency is obtaining the probability limit of the two averages (of $x_iu_i$ and of $x_i^2$), by use of laws of large numbers (LLN). For asymptotic normality the key is the limit distribution of the average of $x_iu_i$, obtained by a central limit theorem (CLT). Different assumptions about the stochastic properties of $x_i$ and $u_i$ lead to different properties of $x_i^2$ and $x_iu_i$, and hence to different LLNs and CLTs. For the data, different sampling scheme assumptions include:

1. Simple random sampling (SRS). SRS is when we randomly draw $(y_i, x_i)$ from the population. Then $x_i$ are iid, so $x_i^2$ are iid, and $x_iu_i$ are iid if the errors $u_i$ are iid.
2. Fixed regressors. This occurs in an experiment where we fix the $x_i$ and observe the resulting random $y_i$. Given $x_i$ fixed and $u_i$ iid, it follows that $x_iu_i$ are inid (even if $u_i$ are iid), while $x_i^2$ are nonstochastic.

3. Exogenous stratified sampling. This occurs when we oversample some values of $x$ and undersample others. Then $x_i$ are inid, so $x_iu_i$ are inid (even if $u_i$ are iid) and $x_i^2$ are inid.

The simplest results assume $u_i$ are iid. In practice, for cross-section data the errors may be inid due to conditional heteroskedasticity, with $V[u_i|x_i] = \sigma_i^2$ varying with $i$.
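The consistency claim above is easy to check numerically. Below is a minimal Monte Carlo sketch for the regression-through-the-origin model $y_i = \beta x_i + u_i$ under simple random sampling; the particular values ($\beta = 2$, $x \sim N(1,1)$, $u \sim N(0,1)$) are illustrative assumptions, not from the text.

```python
import random

# Monte Carlo sketch of OLS consistency in y_i = beta*x_i + u_i (through the
# origin, SRS), where betahat = sum(x_i*y_i)/sum(x_i^2) = beta + sum(x_i*u_i)/sum(x_i^2).
# The choices beta = 2, x ~ N(1,1), u ~ N(0,1) are illustrative assumptions.
random.seed(0)
beta = 2.0

def betahat(n):
    x = [random.gauss(1.0, 1.0) for _ in range(n)]
    u = [random.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * xi + ui for xi, ui in zip(x, u)]
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# The estimate tightens around beta = 2 as N grows, as consistency predicts.
for n in (100, 10_000, 100_000):
    print(n, betahat(n))
```

The spread of the estimate around $\beta$ shrinks at the $\sqrt{N}$ rate suggested by the limit normal result.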
Definition A1 (Convergence in Probability). A sequence of random variables $\{b_N\}$ converges in probability to $b$ if for any $\varepsilon > 0$ and $\delta > 0$ there exists $N^* = N^*(\varepsilon, \delta)$ such that, for all $N > N^*$,
$$\Pr[|b_N - b| < \varepsilon] > 1 - \delta.$$
We write $\text{plim}\, b_N = b$, where plim is shorthand for probability limit, or $b_N \to^p b$. The limit $b$ may be a constant or a random variable. The usual definition of convergence for a sequence of real variables is a special case of A1.

For vector random variables we can apply the theory to each element of $b_N$. [Alternatively, replace $|b_N - b|$ by the scalar $(b_N - b)'(b_N - b) = (b_{1N} - b_1)^2 + \cdots + (b_{KN} - b_K)^2$ or its square root $\|b_N - b\|$.]

Now consider $\{b_N\}$ to be a sequence of parameter estimates $\hat\theta$.

Definition A2 (Consistency). An estimator $\hat\theta$ is consistent for $\theta_0$ if $\text{plim}\,\hat\theta = \theta_0$.

Unbiasedness does not imply consistency: unbiasedness states $E[\hat\theta] = \theta_0$, which permits variability around $\theta_0$ that need not disappear as the sample size goes to infinity. Consistency does not imply unbiasedness: e.g. add $1/N$ to an unbiased and consistent estimator, and the result is biased but still consistent.

A useful property of plim is that it can apply to transformations of random variables.

Theorem A3 (Probability Limit Continuity). Let $b_N$ be a finite-dimensional vector of random variables, and $g(\cdot)$ be a real-valued function continuous at a constant vector point $b$. Then
$$b_N \to^p b \ \Rightarrow\ g(b_N) \to^p g(b).$$
This theorem is often referred to as Slutsky's Theorem; we instead call Theorem A12 Slutsky's Theorem. Theorem A3 is one of the major reasons for the prevalence of asymptotic results versus finite-sample results in econometrics. It states a very convenient property that does not hold for expectations. For example, $\text{plim}(a_N, b_N) = (a, b)$ implies $\text{plim}(a_N b_N) = ab$, whereas $E[a_N b_N]$ generally differs from $E[a_N]E[b_N]$. Similarly, $\text{plim}[a_N/b_N] = a/b$ provided $b \neq 0$.
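The "biased but consistent" remark above can be made concrete with a short sketch: add $1/N$ to the sample mean, so the bias is exactly $1/N$ for every $N$ and vanishes asymptotically. The $N(5, 2^2)$ population is an illustrative assumption.

```python
import random

# Sketch of "consistent but biased": add 1/N to the (unbiased, consistent)
# sample mean.  The bias is exactly 1/N for every N, so it vanishes as N grows
# and the estimator remains consistent.  The N(5, 2^2) draws are illustrative.
random.seed(1)

def biased_mean(sample):
    n = len(sample)
    return sum(sample) / n + 1.0 / n

for n in (10, 1_000, 100_000):
    draws = [random.gauss(5.0, 2.0) for _ in range(n)]
    print(n, biased_mean(draws))  # approaches the true mean 5.0
```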
2.2. Alternative Modes of Convergence

It is often easier to establish alternative modes of convergence, which in turn imply convergence in probability. [However, laws of large numbers, given in the next section, are used much more often.]

Definition A4 (Mean Square Convergence). A sequence of random variables $\{b_N\}$ is said to converge in mean square to a random variable $b$ if
$$\lim_{N\to\infty} E[(b_N - b)^2] = 0.$$
We write $b_N \to^m b$. Convergence in mean square is useful because $b_N \to^m b$ implies $b_N \to^p b$.

Another result that can be used to show convergence in probability is Chebyshev's inequality.

Theorem A5 (Chebyshev's Inequality). For any random variable $Z$ with mean $\mu$ and variance $\sigma^2$,
$$\Pr[(Z - \mu)^2 > k] \le \sigma^2/k, \quad \text{for any } k > 0.$$

A final type of convergence is almost sure convergence, denoted $b_N \to^{as} b$. This is conceptually difficult and often hard to prove, so we skip it. Almost sure convergence (or strong convergence) implies convergence in probability (or weak convergence).
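Chebyshev's inequality can be checked numerically. A sketch with $Z \sim N(0,1)$ (an illustrative choice, so $\mu = 0$ and $\sigma^2 = 1$; the bound holds for any distribution with finite variance):

```python
import random

# Numerical check of Chebyshev's inequality: Pr[(Z - mu)^2 > k] <= sigma^2/k.
# Z ~ N(0,1) here (mu = 0, sigma^2 = 1) is an illustrative choice.
random.seed(2)
draws = [random.gauss(0.0, 1.0) for _ in range(100_000)]

results = {}
for k in (1.0, 4.0, 9.0):
    freq = sum(1 for z in draws if z * z > k) / len(draws)
    results[k] = (freq, 1.0 / k)   # (empirical frequency, Chebyshev bound)
    print(k, results[k])
```

The bound is loose for the normal: at $k = 4$ the true probability is about 0.046 against the bound 0.25.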
2.3. Laws of Large Numbers

Laws of large numbers are theorems for convergence in probability (or almost surely) in the special case where the sequence $\{b_N\}$ is a sample average, i.e. $b_N = \bar X_N$ where
$$\bar X_N = \frac{1}{N}\sum_{i=1}^N X_i.$$
Note that $X_i$ here is general notation for a random variable, and in the regression context does not necessarily denote the regressor variables; for example, $X_i = x_iu_i$. A LLN is a much easier way to get the plim than use of Definition A1 or Theorems A4 or A5. LLNs are widely used in econometrics because the estimators involve averages.

Definition A7 (Law of Large Numbers). A weak law of large numbers (LLN) specifies conditions on the individual terms $X_i$ in $\bar X_N$ under which
$$(\bar X_N - E[\bar X_N]) \to^p 0.$$
For a strong law of large numbers the convergence is instead almost sure.

If a LLN can be applied, then
$$\text{plim}\,\bar X_N = \lim E[\bar X_N] \quad \text{in general}$$
$$= \lim N^{-1}\sum_{i=1}^N E[X_i] \quad \text{if } X_i \text{ independent over } i$$
$$= \mu \quad \text{if } X_i \text{ iid}.$$
Leading examples of laws of large numbers follow.

Theorem A8 (Kolmogorov LLN). Let $\{X_i\}$ be iid (independent and identically distributed). If and only if $E[X_i] = \mu$ exists and $E[|X_i|] < \infty$, then $(\bar X_N - \mu) \to^{as} 0$.

Theorem A9 (Markov LLN). Let $\{X_i\}$ be inid (independent but not identically distributed) with $E[X_i] = \mu_i$ and $V[X_i] = \sigma_i^2$. If $\sum_{i=1}^\infty \left(E[|X_i - \mu_i|^{1+\delta}]/i^{1+\delta}\right) < \infty$ for some $\delta > 0$, then $(\bar X_N - N^{-1}\sum_{i=1}^N E[X_i]) \to^{as} 0$.
The Markov LLN allows nonidentical distributions, at the expense of requiring existence of an absolute moment beyond the first. The rest of the side-condition is likely to hold with cross-section data: e.g. if we set $\delta = 1$, then we need the variance to exist plus $\sum_{i=1}^\infty (\sigma_i^2/i^2) < \infty$, which happens if $\sigma_i^2$ is bounded.

The Kolmogorov LLN gives almost sure convergence. Usually convergence in probability is enough, and we can use the weaker Khinchine's Theorem.

Theorem A8b (Khinchine's Theorem). Let $\{X_i\}$ be iid (independent and identically distributed). If and only if $E[X_i] = \mu$ exists, then $(\bar X_N - \mu) \to^p 0$.

Which LLN should I use in regression applications? It depends on the sampling scheme.
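The LLNs above are easy to illustrate numerically. A sketch with iid Exponential(1) draws, an illustrative choice satisfying the Kolmogorov/Khinchine conditions since $E[X_i] = 1$ exists:

```python
import random

# LLN sketch: for iid X_i with E[X_i] = mu, the sample average converges to mu
# (Khinchine / Kolmogorov).  Exponential(1) draws (mu = 1) are illustrative.
random.seed(3)

def sample_mean(n):
    return sum(random.expovariate(1.0) for _ in range(n)) / n

for n in (10, 1_000, 100_000):
    print(n, sample_mean(n))  # drifts toward E[X] = 1 as N grows
```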
Assume $x_i$ iid with mean $\mu_x$ and $u_i$ iid with mean 0. As $x_iu_i$ are iid, apply Khinchine's Theorem, yielding $N^{-1}\sum_i x_iu_i \to^p E[xu] = E[x]E[u] = 0$. As $x_i^2$ are iid, apply Khinchine's Theorem, yielding $N^{-1}\sum_i x_i^2 \to^p E[x^2]$, which we assume exists.
By Theorem A3 (Probability Limit Continuity), $\text{plim}[a_N/b_N] = a/b$ if $b \neq 0$. Then
$$\text{plim}\,\hat\beta = \beta + \frac{\text{plim}\, N^{-1}\sum_{i=1}^N x_iu_i}{\text{plim}\, N^{-1}\sum_{i=1}^N x_i^2} = \beta + \frac{0}{E[x^2]} = \beta.$$
Assume $x_i$ fixed and $u_i$ iid with mean 0 and variance $\sigma^2$. Then $x_iu_i$ are inid with mean $E[x_iu_i] = x_iE[u_i] = 0$ and variance $V[x_iu_i] = x_i^2\sigma^2$. Apply the Markov LLN, yielding $N^{-1}\sum_i x_iu_i - N^{-1}\sum_i E[x_iu_i] \to^p 0$, so $N^{-1}\sum_i x_iu_i \to^p 0$. The side-condition with $\delta = 1$ is $\sum_{i=1}^\infty x_i^2\sigma^2/i^2 < \infty$, which is satisfied if $x_i^2$ is bounded. We also assume $\lim N^{-1}\sum_i x_i^2$ exists.

By Theorem A3 (Probability Limit Continuity), $\text{plim}[a_N/b_N] = a/b$ if $b \neq 0$. Then
$$\text{plim}\,\hat\beta = \beta + \frac{\text{plim}\, N^{-1}\sum_{i=1}^N x_iu_i}{\lim N^{-1}\sum_{i=1}^N x_i^2} = \beta + \frac{0}{\lim N^{-1}\sum_{i=1}^N x_i^2} = \beta.$$
Assume $x_i$ inid with mean $E[x_i]$ and variance $V[x_i]$, and $u_i$ iid with mean 0. Now $x_iu_i$ are inid with mean $E[x_iu_i] = E[x_i]E[u_i] = 0$ and variance $V[x_iu_i] = E[x_i^2]\sigma^2$, so we need the Markov LLN. This yields $N^{-1}\sum_i x_iu_i \to^p 0$, with the side-condition satisfied if $E[x_i^2]$ is bounded. And $x_i^2$ are inid, so we need the Markov LLN with a side-condition that requires, e.g., existence and boundedness of $E[x_i^4]$. Combining, we again get $\text{plim}\,\hat\beta = \beta$.
Given consistency, the estimator $\hat\beta$ has a degenerate distribution that collapses on $\beta$ as $N \to \infty$, so we cannot do statistical inference. [Indeed there is no reason to do it if $N \to \infty$.] We need to magnify or rescale $\hat\beta$ to obtain a random variable with a nondegenerate distribution as $N \to \infty$.

4.1. Convergence in Distribution, Transformation

Often the appropriate scale factor is $\sqrt{N}$, so consider $b_N = \sqrt{N}(\hat\beta - \beta_0)$. $b_N$ has an extremely complicated cumulative distribution function (cdf) $F_N$. But like any other function, $F_N$ may have a limit function, where convergence is in the usual (nonstochastic) mathematical sense.
Definition A10 (Convergence in Distribution). A sequence of random variables $\{b_N\}$ is said to converge in distribution to a random variable $b$ if
$$\lim_{N\to\infty} F_N = F,$$
at every continuity point of $F$, where $F_N$ is the distribution of $b_N$, $F$ is the distribution of $b$, and convergence is in the usual mathematical sense. We write $b_N \to^d b$, and call $F$ the limit distribution of $\{b_N\}$.

$b_N \to^p b$ implies $b_N \to^d b$. In general the reverse is not true, but if $b$ is a constant then $b_N \to^d b$ implies $b_N \to^p b$.

To extend limit distribution to vector random variables, simply define $F_N$ and $F$ to be the respective cdfs of vectors $b_N$ and $b$.

A useful property of convergence in distribution is that it can apply to transformations of random variables.

Theorem A11 (Limit Distribution Continuity). Let $b_N$ be a finite-dimensional vector of random variables, and $g(\cdot)$ be a continuous real-valued function. Then
$$b_N \to^d b \ \Rightarrow\ g(b_N) \to^d g(b).$$
This result is also called the Continuous Mapping Theorem.

Theorem A12 (Slutsky's Theorem). If $a_N \to^d a$ and $b_N \to^p b$, where $a$ is a random variable and $b$ is a constant, then

(i) $a_N + b_N \to^d a + b$, (4.1)

(ii) $a_N b_N \to^d ab$, (4.2)

(iii) $a_N/b_N \to^d a/b$, provided $\Pr[b = 0] = 0$.

Theorem A12 (also called Cramer's Theorem) permits one to separately find the limit distribution of $a_N$ and the probability limit of $b_N$, rather than having to consider the joint behavior of $a_N$ and $b_N$. Result (ii) is especially useful and is sometimes called the Product Rule.

4.2. Central Limit Theorems

Central limit theorems give convergence in distribution when the sequence $\{b_N\}$ is a sample average. A CLT is a much easier way to get the limit distribution than, e.g., use of Definition A10.
By a LLN, $\bar X_N$ has a degenerate distribution as it converges to a constant, $\lim E[\bar X_N]$. So we scale $(\bar X_N - E[\bar X_N])$ by its standard deviation to construct a random variable with unit variance that will have a nondegenerate distribution.

Definition A13 (Central Limit Theorem). Let
$$Z_N = \frac{\bar X_N - E[\bar X_N]}{\sqrt{V[\bar X_N]}},$$
where $\bar X_N$ is a sample average. A central limit theorem (CLT) specifies the conditions on the individual terms $X_i$ in $\bar X_N$ under which
$$Z_N \to^d N[0, 1],$$
i.e. $Z_N$ converges in distribution to a standard normal random variable.

Note that
$$Z_N = (\bar X_N - E[\bar X_N])/\sqrt{V[\bar X_N]} \quad \text{in general}$$
$$= \sum_{i=1}^N (X_i - E[X_i]) \Big/ \sqrt{\sum_{i=1}^N V[X_i]} \quad \text{if } X_i \text{ independent over } i$$
$$= \sqrt{N}(\bar X_N - \mu)/\sigma \quad \text{if } X_i \text{ iid}.$$

If $\bar X_N$ satisfies a central limit theorem, then so too does $h(N)\bar X_N$ for functions $h(\cdot)$ such as $h(N) = \sqrt{N}$, since
$$Z_N = \frac{h(N)\bar X_N - E[h(N)\bar X_N]}{\sqrt{V[h(N)\bar X_N]}}.$$
Often we apply the CLT to the normalization $\sqrt{N}\bar X_N = N^{-1/2}\sum_{i=1}^N X_i$, since $V[\sqrt{N}\bar X_N]$ is finite.

Examples of central limit theorems include the following.

Theorem A14 (Lindeberg-Levy CLT). Let $\{X_i\}$ be iid with $E[X_i] = \mu$ and $V[X_i] = \sigma^2$. Then $Z_N = \sqrt{N}(\bar X_N - \mu)/\sigma \to^d N[0, 1]$.

Theorem A15 (Liapounov CLT). Let $\{X_i\}$ be independent with $E[X_i] = \mu_i$ and $V[X_i] = \sigma_i^2$. If $\lim \left(\sum_{i=1}^N E[|X_i - \mu_i|^{2+\delta}]\right) \Big/ \left(\sum_{i=1}^N \sigma_i^2\right)^{(2+\delta)/2} = 0$, for some choice of $\delta > 0$, then $Z_N = \sum_{i=1}^N (X_i - \mu_i) \Big/ \sqrt{\sum_{i=1}^N \sigma_i^2} \to^d N[0, 1]$.
The Lindeberg-Levy CLT is the CLT seen in introductory statistics. For the iid case the LLN required that $\mu$ exists, while the CLT also requires that $\sigma^2$ exists. For inid data the Liapounov CLT additionally requires existence of an absolute moment of order higher than two. Which CLT should I use in regression applications? It depends on the sampling scheme.
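The Lindeberg-Levy CLT can be seen in a short simulation: the standardized sample mean of decidedly non-normal draws is close to $N[0,1]$. Uniform(0,1) draws ($\mu = 0.5$, $\sigma^2 = 1/12$) and the sample sizes are illustrative choices.

```python
import random, math

# Sketch of the Lindeberg-Levy CLT: Z_N = sqrt(N)*(Xbar_N - mu)/sigma is
# approximately N[0,1] for large N even for non-normal X_i.  Uniform(0,1)
# draws (mu = 0.5, sigma^2 = 1/12) are an illustrative choice.
random.seed(4)
mu, sigma, n, reps = 0.5, math.sqrt(1.0 / 12.0), 500, 5_000

z = []
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    z.append(math.sqrt(n) * (xbar - mu) / sigma)

mean_z = sum(z) / reps
var_z = sum(zi * zi for zi in z) / reps - mean_z ** 2
print(mean_z, var_z)  # close to the standard normal's 0 and 1
```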
5.1. Simple Random Sampling with iid errors

Assume $x_i$ iid with mean $\mu_x$ and second moment $E[x^2]$, and assume $u_i$ iid with mean 0 and variance $\sigma^2$. Then $x_iu_i$ are iid with mean 0 and variance $\sigma^2E[x^2]$. [Proof for the variance: $V_{x,u}[xu] = E_x[V[xu|x]] + V_x[E[xu|x]] = E_x[x^2\sigma^2] + 0 = \sigma^2E_x[x^2]$.] Apply the Lindeberg-Levy CLT, yielding
$$\sqrt{N}\left(\frac{N^{-1}\sum_{i=1}^N x_iu_i - 0}{\sqrt{\sigma^2E[x^2]}}\right) \to^d N[0, 1].$$
Using Slutsky's theorem that $a_Nb_N \to^d ab$ (for $a_N \to^d a$ and $b_N \to^p b$), this implies
$$\frac{1}{\sqrt{N}}\sum_{i=1}^N x_iu_i \to^d N[0,\ \sigma^2E[x^2]].$$
Then using Slutsky's theorem that $a_N/b_N \to^d a/b$ (for $a_N \to^d a$ and $b_N \to^p b$),
$$\sqrt{N}(\hat\beta - \beta) = \frac{N^{-1/2}\sum_{i=1}^N x_iu_i}{N^{-1}\sum_{i=1}^N x_i^2} \to^d \frac{N[0,\ \sigma^2E[x^2]]}{\text{plim}\, N^{-1}\sum_{i=1}^N x_i^2} = \frac{N[0,\ \sigma^2E[x^2]]}{E[x^2]} = N\left[0,\ \frac{\sigma^2}{E[x^2]}\right],$$
where we use the result from the consistency proof that $\text{plim}\, N^{-1}\sum_{i=1}^N x_i^2 = E[x^2]$.

5.2. Fixed Regressors with iid errors

Assume $x_i$ fixed and $u_i$ iid with mean 0 and variance $\sigma^2$. Then $x_iu_i$ are inid with mean 0 and variance $V[x_iu_i] = x_i^2\sigma^2$. Apply the Liapounov CLT, yielding
$$\sqrt{N}\left(\frac{N^{-1}\sum_{i=1}^N x_iu_i - 0}{\sqrt{\sigma^2\lim N^{-1}\sum_{i=1}^N x_i^2}}\right) \to^d N[0, 1].$$
Using Slutsky's theorem that $a_Nb_N \to^d ab$ (for $a_N \to^d a$ and $b_N \to^p b$), this implies
$$\frac{1}{\sqrt{N}}\sum_{i=1}^N x_iu_i \to^d N\left[0,\ \sigma^2\lim\frac{1}{N}\sum_{i=1}^N x_i^2\right].$$
Then using Slutsky's theorem that $a_N/b_N \to^d a/b$ (for $a_N \to^d a$ and $b_N \to^p b$),
$$\sqrt{N}(\hat\beta - \beta) = \frac{N^{-1/2}\sum_{i=1}^N x_iu_i}{N^{-1}\sum_{i=1}^N x_i^2} \to^d N\left[0,\ \sigma^2\left(\lim\frac{1}{N}\sum_{i=1}^N x_i^2\right)^{-1}\right].$$

5.3. Exogenous Stratified Sampling with iid errors

Assume $x_i$ inid with mean $E[x_i]$ and variance $V[x_i]$, and $u_i$ iid with mean 0. Similar to the fixed-regressors case, we will need to use the Liapounov CLT. We will get
$$\sqrt{N}(\hat\beta - \beta) \to^d N\left[0,\ \sigma^2\left(\text{plim}\frac{1}{N}\sum_{i=1}^N x_i^2\right)^{-1}\right].$$
Note that this is exactly the same result as we would have obtained if $y_i = \beta x_i + u_i$ with $u_i \sim N[0, \sigma^2]$.
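The limit result for the simple-random-sampling case can be checked by simulating the distribution of $\sqrt{N}(\hat\beta - \beta)$ across replications. The values ($\beta = 1$, $\sigma = 2$, $x \sim N(0,1)$, so $\sigma^2/E[x^2] = 4$) are illustrative assumptions.

```python
import random, math

# Sketch of the SRS limit result: sqrt(N)*(betahat - beta) is approximately
# N[0, sigma^2/E[x^2]].  With the illustrative choices beta = 1, u ~ N(0, 2^2),
# x ~ N(0,1), the limit variance is sigma^2/E[x^2] = 4/1 = 4.
random.seed(5)
beta, sigma, n, reps = 1.0, 2.0, 300, 4_000

draws = []
for _ in range(reps):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    bh_num = sum(xi * (beta * xi + random.gauss(0.0, sigma)) for xi in x)
    bh = bh_num / sum(xi * xi for xi in x)
    draws.append(math.sqrt(n) * (bh - beta))

var = sum(d * d for d in draws) / reps
print(var)  # close to sigma^2 / E[x^2] = 4
```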
This is formally established using the following result.

Theorem A16 (Cramer-Wold Device). Let $\{b_N\}$ be a sequence of random $k \times 1$ vectors. If $\lambda'b_N$ converges to a normal random variable for every $k \times 1$ constant nonzero vector $\lambda$, then $b_N$ converges to a multivariate normal random variable.

The advantage of this result is that if $b_N = \bar X_N$, then $\lambda'b_N = \lambda_1\bar X_{1N} + \cdots + \lambda_k\bar X_{kN}$ will be a scalar average and we can apply a scalar CLT, yielding
$$\frac{\lambda'\bar X_N - \lambda'\mu_N}{\sqrt{\lambda'V_N\lambda}} \to^d N[0, 1],$$
and hence $V_N^{-1/2}(\bar X_N - \mu_N) \to^d N[0, I]$.
Many estimators involve a product $H_Na_N$, where plim $H_N$ exists and $a_N$ has a limit normal distribution. The distribution of this product can be obtained directly from part (ii) of Theorem A12 (Slutsky's theorem). We restate it in a form that arises for many estimators.

Theorem A17 (Limit Normal Product Rule). If a vector $a_N \to^d N[\mu, A]$ and a matrix $H_N \to^p H$, where $H$ is positive definite, then $H_Na_N \to^d N[H\mu, HAH']$.

For example, the OLS estimator
$$\sqrt{N}(\hat\beta - \beta_0) = \left(\frac{1}{N}X'X\right)^{-1}\frac{1}{\sqrt{N}}X'u$$
is $H_N = (N^{-1}X'X)^{-1}$ times $a_N = N^{-1/2}X'u$, and we find the plim of $H_N$ and the limit distribution of $a_N$.
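Theorem A17 can be illustrated with a scalar simulation: a statistic $a_N$ with a normal limit, multiplied by a sequence $H_N$ with a probability limit. The choices $A = 4$, $H = 3$ (so the product variance is $HAH = 36$) and the Gaussian designs are illustrative assumptions.

```python
import random, math

# Scalar sketch of Theorem A17: if a_N ->d N[0, A] and plim H_N = H, then
# H_N * a_N ->d N[0, H*A*H].  The choices A = 4, H = 3 and the Gaussian
# designs are illustrative assumptions.
random.seed(6)
A, H, n, reps = 4.0, 3.0, 300, 4_000

prods = []
for _ in range(reps):
    # a_N = sqrt(N) * sample mean of u_i, u ~ N(0, A): limit N[0, A]
    a_n = math.sqrt(n) * sum(random.gauss(0.0, math.sqrt(A)) for _ in range(n)) / n
    # H_N = sample mean of w_i, w ~ N(H, 1): plim H_N = H
    h_n = sum(random.gauss(H, 1.0) for _ in range(n)) / n
    prods.append(h_n * a_n)

var = sum(p * p for p in prods) / reps
print(var)  # close to H*A*H = 36
```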
Theorem A17 also justifies replacement of a limit distribution variance matrix by a consistent estimate without changing the limit distribution. Given
$$\sqrt{N}(\hat\theta - \theta_0) \to^d N[0, B],$$
then
$$B_N^{-1/2}\sqrt{N}(\hat\theta - \theta_0) \to^d N[0, I]$$
for any $B_N$ that is a consistent estimate of $B$ and is positive definite.

A formal multivariate CLT yields $V_N^{-1/2}(b_N - \mu_N) \to^d N[0, I]$. Premultiplying by $V_N^{1/2}$ and applying Theorem A17 gives the simpler form $b_N - \mu_N \to^d N[0, V]$, where $V = \text{plim}\,V_N$ and we assume $b_N$ and $V_N$ are appropriately scaled so that $V$ exists and is positive definite.

Different authors express the limit variance matrix $V$ in different ways.

1. General form: $V = \text{plim}\,V_N$. With fixed regressors, $V = \lim V_N$.

2. Stratified sampling or fixed regressors: Often $V_N$ is a matrix average, say $V_N = N^{-1}\sum_{i=1}^N S_i$, where $S_i$ is a square matrix. A LLN gives $V_N - E[V_N] \to^p 0$. Then $V = \lim E[V_N] = \lim N^{-1}\sum_{i=1}^N E[S_i]$.

As an example, $\text{plim}\, N^{-1}\sum_i x_ix_i' = \lim N^{-1}\sum_i E[x_ix_i']$ if a LLN applies, and this equals $E[xx']$ under simple random sampling.
8. Asymptotic Normality
It can be convenient to re-express results in terms of $\hat\theta$ rather than $\sqrt{N}(\hat\theta - \theta_0)$. Given
$$\sqrt{N}(\hat\theta - \theta_0) \to^d N[0, B],$$
we say that in large samples
$$\hat\theta \overset{a}{\sim} N[\theta_0,\ N^{-1}B], \tag{8.1}$$
where the term in large samples means that $N$ is large enough for good approximation but not so large that the variance $N^{-1}B$ goes to zero.

A more shorthand notation is to implicitly presume asymptotic normality and use the following terminology.

Definition A19 (Asymptotic Variance of $\hat\theta$). If (8.1) holds, then we say that the asymptotic variance matrix of $\hat\theta$ is
$$V[\hat\theta] = N^{-1}B. \tag{8.2}$$

Definition A20 (Estimated Asymptotic Variance of $\hat\theta$). If (8.1) holds, then we say that the estimated asymptotic variance matrix of $\hat\theta$ is
$$\hat V[\hat\theta] = N^{-1}\hat B,$$
where $\hat B$ is a consistent estimate of $B$.
Definition A21 (Asymptotic Efficiency). A consistent asymptotically normal estimator $\hat\theta$ of $\theta$ is said to be asymptotically efficient if it has an asymptotic variance-covariance matrix equal to the Cramer-Rao lower bound
$$\left(-E\left[\frac{\partial^2 \ln L_N}{\partial\theta\,\partial\theta'}\right]\right)^{-1}.$$
Some authors use $\text{Avar}[\hat\theta]$ and $\widehat{\text{Avar}}[\hat\theta]$ in Definitions A19 and A20 to avoid potential confusion with the variance operator $V[\cdot]$. It should be clear that here $V[\hat\theta]$ means the asymptotic variance of an estimator, since few estimators have closed-form expressions for the finite-sample variance.

As an example of Definitions A18-A20, if $\{X_i\}$ are iid $[\mu, \sigma^2]$, then $\sqrt{N}(\bar X_N - \mu)/\sigma \to^d N[0, 1]$, or equivalently $\bar X_N \overset{a}{\sim} N[\mu, \sigma^2/N]$. The asymptotic variance of $\bar X_N$ is $\sigma^2/N$, and the estimated asymptotic variance of $\bar X_N$ is $s^2/N$, where $s^2$ is a consistent estimator of $\sigma^2$ such as $s^2 = \sum_i (X_i - \bar X_N)^2/(N-1)$.
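The sample-mean example above can be sketched in a few lines; the $N(10, 3^2)$ population (so $\sigma^2 = 9$) is an illustrative assumption.

```python
import random

# Sketch of Definitions A19/A20 for the sample mean: the asymptotic variance
# is sigma^2/N, estimated by s^2/N.  iid N(10, 3^2) draws are illustrative,
# so sigma^2 = 9.
random.seed(7)
n = 50_000
x = [random.gauss(10.0, 3.0) for _ in range(n)]

xbar = sum(x) / n
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)   # consistent for sigma^2 = 9
est_avar = s2 / n                                  # estimated asymptotic variance
print(xbar, s2, est_avar)
```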
The OLS estimator can be written as
$$\hat\beta = \beta + (X'X)^{-1}X'u = \beta + \left(N^{-1}X'X\right)^{-1}\left(N^{-1}X'u\right).$$
Note that the $k \times k$ matrix $X'X = \sum_i x_ix_i'$, where $x_i$ is a $k \times 1$ vector of regressors for the $i$th observation. The reason for the renormalization on the right-hand side is that $N^{-1}X'X = N^{-1}\sum_i x_ix_i'$ is an average that converges in probability to a finite nonzero matrix if $x_i$ satisfies assumptions that permit a LLN to be applied to $x_ix_i'$. Then
$$\text{plim}\,\hat\beta = \beta + \left(\text{plim}\, N^{-1}X'X\right)^{-1}\left(\text{plim}\, N^{-1}X'u\right),$$
using Slutsky's Theorem (Theorem A3). The OLS estimator is therefore consistent for $\beta$ (i.e., $\text{plim}\,\hat\beta_{OLS} = \beta$) if $\text{plim}\, N^{-1}X'u = 0$. If a LLN can be applied to the average $N^{-1}X'u = N^{-1}\sum_i x_iu_i$, then a necessary condition for this to hold is that $E[x_iu_i] = 0$. The fundamental condition for consistency of OLS is that $E[u_i|x_i] = 0$, so that $E[x_iu_i] = 0$.

9.2. Limit Distribution of OLS

Given consistency, the limit distribution of $\hat\beta$ is degenerate with all the mass at $\beta$. To obtain a limit distribution we multiply $\hat\beta_{OLS} - \beta$ by $\sqrt{N}$, so
$$\sqrt{N}(\hat\beta - \beta) = \left(N^{-1}X'X\right)^{-1}N^{-1/2}X'u.$$
We know plim $N^{-1}X'X$ exists and is finite and nonzero from the proof of consistency. For iid errors, $E[uu'|X] = \sigma^2I$ and $V[X'u|X] = E[X'uu'X|X] = \sigma^2X'X$, and we assume that a CLT can be applied to yield
$$N^{-1/2}X'u \to^d N\left[0,\ \sigma^2\,\text{plim}\, N^{-1}X'X\right].$$
Then, by the product rule,
$$\sqrt{N}(\hat\beta - \beta) \to^d N\left[0,\ \sigma^2\left(\text{plim}\, N^{-1}X'X\right)^{-1}\right],$$
so
$$\hat\beta \overset{a}{\sim} N\left[\beta,\ \sigma^2(X'X)^{-1}\right],$$
with estimated asymptotic variance
$$\hat V[\hat\beta] = s^2(X'X)^{-1},$$
where $s^2$ is consistent for $\sigma^2$; for example, $s^2 = \hat u'\hat u/(N - k)$ or $s^2 = \hat u'\hat u/N$.

9.4. OLS with Heteroskedastic Errors
What if the errors are heteroskedastic? If $E[uu'|X] = \Omega = \text{Diag}[\sigma_i^2]$, then $V[X'u|X] = E[X'uu'X|X] = X'\Omega X = \sum_{i=1}^N \sigma_i^2x_ix_i'$. A CLT gives
$$\sqrt{N}(\hat\beta - \beta) \to^d N\left[0,\ \left(\text{plim}\, N^{-1}X'X\right)^{-1}\left(\text{plim}\, N^{-1}X'\Omega X\right)\left(\text{plim}\, N^{-1}X'X\right)^{-1}\right].$$
White (1980) showed that this can be consistently estimated by
$$\hat V[\hat\beta] = (X'X)^{-1}\left(\sum_{i=1}^N \hat u_i^2x_ix_i'\right)(X'X)^{-1},$$
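The $k = 1$ (regression-through-the-origin) case of White's estimator is a one-line sandwich. In the sketch below the design $u_i = |x_i|e_i$ with $e_i \sim N(0,1)$, so that $V[u_i|x_i] = x_i^2$ varies with $i$, and the values $\beta = 1.5$, $x \sim N(0,1)$ are illustrative assumptions.

```python
import random

# Sketch of the White (1980) heteroskedasticity-robust variance estimate for
# regression through the origin (the k = 1 case of the sandwich formula):
#   Vhat[betahat] = (sum x_i^2)^-1 * (sum uhat_i^2 * x_i^2) * (sum x_i^2)^-1.
# The design u_i = |x_i|*e_i with e ~ N(0,1), so V[u_i|x_i] = x_i^2, is an
# illustrative assumption.
random.seed(8)
beta, n = 1.5, 20_000
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [beta * xi + abs(xi) * random.gauss(0.0, 1.0) for xi in x]

sxx = sum(xi * xi for xi in x)
bh = sum(xi * yi for xi, yi in zip(x, y)) / sxx        # OLS slope
uhat = [yi - bh * xi for xi, yi in zip(x, y)]          # OLS residuals

v_white = sum((ui * xi) ** 2 for ui, xi in zip(uhat, x)) / sxx ** 2
print(bh, v_white)  # v_white scales like 1/N
```

For this design, $N \cdot \hat V[\hat\beta]$ approaches the sandwich limit variance $E[x^4]/(E[x^2])^2 = 3$ for standard normal $x$.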