      = E(u²|x) + 0 + [m(x,θo) − m(x,θ)]²
      = E(u²|x) + [m(x,θo) − m(x,θ)]².

The first term does not depend on θ and the second term is clearly minimized
at θ = θo; because this holds for every x, the conditional mean function
minimizes the squared error conditional on any value of x. The minimizer need
not be unique: for example, in the linear case m(x,θ) = xθ, any θ such that
x(θo − θ) = 0 sets the second term to 0. So E(u²|x) = exp(αo + xγo).

b. If we knew the ui = yi − m(xi,θo), then we could do a nonlinear
regression of ui² on exp(α + xγ) and just use the asymptotic theory for
nonlinear least squares. Since we must use the consistent NLS estimator, θ̂
-- that is, we replace ui² with ûi², the squared NLS residuals -- we solve

   min  Σ_{i=1}^N {[yi − m(xi,θ̂)]² − exp(α + xiγ)}².
   α,γ
c. Since θ̂ is generally consistent for θo, the two-step M-estimator is
consistent by the usual argument.
d. Using the definition of v, write u² = exp(αo + xγo)v². Taking logs
gives log(u²) = αo + xγo + log(v²). Now, if v is independent of x, so is
log(v²). Therefore,

   E[log(u²)|x] = αo + xγo + E[log(v²)|x] = αo + xγo + κo,

where κo ≡ E[log(v²)]. So, if we could observe the ui, an OLS regression of
log(ui²) on 1, xi would consistently estimate (αo + κo) and γo. As usual, the
ui can be replaced with ûi, by essentially the same argument as in part b.
So, if m(x,θ) is linear in θ, we can carry out a weighted NLS procedure
without ever doing nonlinear estimation: after obtaining γ̂ from the log
regression, weight each observation by exp(−xiγ̂/2).
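The two-step idea in parts b and d can be sketched numerically. This is a
minimal illustration, not the text's code: a hypothetical linear mean
y = β0 + β1x + u with Var(u|x) = exp(α + γx) and made-up coefficient values.
Fit by least squares, regress the log squared residuals on (1, x), then
reweight by exp(−(α̂ + γ̂x)/2); the constant in the weight absorbs κo and does
not change the weighted estimates.

```python
import math, random

def ols2(X, y):
    # Closed-form solution of the 2-parameter normal equations (X'X)b = X'y.
    s00 = sum(r[0]*r[0] for r in X); s01 = sum(r[0]*r[1] for r in X)
    s11 = sum(r[1]*r[1] for r in X)
    t0 = sum(r[0]*yi for r, yi in zip(X, y))
    t1 = sum(r[1]*yi for r, yi in zip(X, y))
    det = s00*s11 - s01*s01
    return ((s11*t0 - s01*t1)/det, (s00*t1 - s01*t0)/det)

random.seed(0)
n = 5000
x = [random.uniform(0, 2) for _ in range(n)]
# Hypothetical model: y = 1 + 2x + u, Var(u|x) = exp(-1 + x).
y = [1 + 2*xi + math.sqrt(math.exp(-1 + xi))*random.gauss(0, 1) for xi in x]

X = [(1.0, xi) for xi in x]
b0, b1 = ols2(X, y)                                   # step 1: least squares
u2 = [(yi - b0 - b1*xi)**2 for xi, yi in zip(x, y)]   # squared residuals
a0, g = ols2(X, [math.log(r) for r in u2])            # step 2: log(u^2) on 1, x
w = [math.exp(-(a0 + g*xi)/2) for xi in x]            # weights exp(-(a0+gx)/2)
Xw = [(wi, wi*xi) for wi, xi in zip(w, x)]            # GLS transform of (1, x)
yw = [wi*yi for wi, yi in zip(w, y)]
bw0, bw1 = ols2(Xw, yw)                               # step 3: weighted LS
print(round(bw0, 2), round(bw1, 2))
```

With a large sample the weighted estimates land near the assumed values
(1, 2); rescaling all weights by a constant leaves them unchanged.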
12.3. a. The approximate elasticity is ∂log[E(y|z)]/∂log(z1) = ∂[θ̂1 +
θ̂2log(z1) + θ̂3z2]/∂log(z1) = θ̂2.

b. This is approximated by 100·∂log[E(y|z)]/∂z2 = 100·θ̂3, using the
estimates from regression (12.75).
12.4. a. Write the objective function as

   (1/2) Σ_{i=1}^N [yi − m(xi,θ)]²/h(xi,γ̂).

The objective function, for any value of γ, is q(wi,θ;γ) = (1/2)[yi −
m(xi,θ)]²/h(xi,γ), with score si(θ;γ) = −∇θm(xi,θ)'[yi − m(xi,θ)]/h(xi,γ).
Since E(ui|xi) = 0,

   E[si(θo;γ)|xi] = −∇θm(xi,θo)'E(ui|xi)/h(xi,γ) = 0;

further, ∇γh(xi,γ) is a function of xi, so

   E[∇γsi(θo;γ)|xi] = ∇θm(xi,θo)'E(ui|xi)∇γh(xi,γ)/[h(xi,γ)]² = 0.

It follows by the LIE that the unconditional expectation is zero, too. In
other words, we have shown that the key condition (12.37) holds (whether or
not the variance function is correctly specified), so a valid estimator of
the asymptotic variance is

   Avar(θ̂) = [Σ_{i=1}^N ∇θm̌i'∇θm̌i]⁻¹[Σ_{i=1}^N ǔi²∇θm̌i'∇θm̌i][Σ_{i=1}^N ∇θm̌i'∇θm̌i]⁻¹,

where ǔi ≡ ûi/ĥi^(1/2) and ∇θm̌i ≡ ∇θm̂i/ĥi^(1/2).
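The three-factor sandwich can be illustrated in the simplest scalar case: one
linear regressor and h ≡ 1, so ǔi and ∇θm̌i reduce to the residual and xi.
This is a hedged sketch with made-up values, comparing the sample sandwich to
its population counterpart for a known heteroskedastic design.

```python
import random

random.seed(3)
n = 4000
x = [random.uniform(0.5, 1.5) for _ in range(n)]
u = [xi*random.gauss(0, 1) for xi in x]        # heteroskedastic: sd(u|x) = x
y = [2.0*xi + ui for xi, ui in zip(x, u)]      # made-up slope 2

sxx = sum(xi*xi for xi in x)
bhat = sum(xi*yi for xi, yi in zip(x, y))/sxx            # least squares slope
res2 = [(yi - bhat*xi)**2 for xi, yi in zip(x, y)]       # squared residuals
meat = sum(xi*xi*r2 for xi, r2 in zip(x, res2))
avar = meat/(sxx*sxx)        # sandwich: (sum x^2)^-1 (sum x^2 u^2) (sum x^2)^-1

# Population counterpart for this design: E[x^2 u^2]/(E[x^2])^2 / N,
# with E[x^2 u^2] = E[x^4] because Var(u|x) = x^2.
Ex2 = (1.5**3 - 0.5**3)/3
Ex4 = (1.5**5 - 0.5**5)/5
true_avar = Ex4/(Ex2**2)/n
print(avar/true_avar)
```

The printed ratio should be close to one; a non-robust variance estimator
would instead use the overall residual variance and miss the x-dependence.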
12.5. We need the gradient of m(xi,θ) evaluated under the null hypothesis:

   ∇δm(xi,θ̃) = g(xiβ̃)[(xiβ̃)², (xiβ̃)³].

Therefore, the usual LM statistic can be computed from the variable-addition
regression, with g̃i ≡ g(xiβ̃). If G(·) is the identity function, then
g(·) ≡ 1, and we get RESET.
as

   Hi(θ) = ∇θsi(θ) = − Σ_{t=1}^T ∇θ²m(xit,θ)uit(θ) + Σ_{t=1}^T ∇θm(xit,θ)'∇θm(xit,θ).

When we plug in θo and use the fact that E(uit|xit) = 0, all t = 1,...,T,
then

   Ao ≡ E[Hi(θo)] = − Σ_{t=1}^T E[∇θ²m(xit,θo)uit] + Σ_{t=1}^T E[∇θm(xit,θo)'∇θm(xit,θo)]
                  = Σ_{t=1}^T E[∇θm(xit,θo)'∇θm(xit,θo)]

because E[∇θ²m(xit,θo)uit] = 0, t = 1,...,T. By the usual law of large
numbers argument,

   N⁻¹ Σ_{i=1}^N Σ_{t=1}^T ∇θm(xit,θ̂)'∇θm(xit,θ̂) ≡ N⁻¹ Σ_{i=1}^N Âi

is a consistent estimator of Ao. Then, we just use the usual "sandwich"
formula in (12.49) (with the scores evaluated at θo, of course). This is very
similar to the linear regression case from Chapter 7: for t ≠ r,

   E[uituir∇θm(xit,θo)'∇θm(xir,θo)]
      = E{E(uit|xit,xir,uir)uir∇θm(xit,θo)'∇θm(xir,θo)} = 0

because E(uit|xit,xir,uir) = 0. The variance matrix obtained by ignoring the
time dimension and assuming homoskedasticity is simply

   σ̂² [Σ_{i=1}^N Σ_{t=1}^T ∇θm(xit,θ̂)'∇θm(xit,θ̂)]⁻¹,

and we just showed that N times this matrix is a consistent estimator of
Avar √N(θ̂ − θo).
12.7. a. For each i and g, define uig ≡ yig − m(xig,θo), so that E(uig|xi) =
0, g = 1,...,G. Further, let ui be the G × 1 vector containing the uig.
Then E(uiui'|xi) = E(uiui') = Ωo. Let ûi be the vector of nonlinear least
squares residuals; that is, do NLS for each g, and collect the residuals.

b. The score for observation i is

   si(θ;Ω) = −∇θmi(θ)'Ω⁻¹[yi − mi(θ)],

where, hopefully, the notation is clear. With this definition, we can verify
condition (12.37), even though the actual derivatives are complicated, so we
need not adjust for the first-stage estimation of Ωo. Alternatively, one can
verify directly that

   Bo = E{E[∇θmi(θo)'Ωo⁻¹uiui'Ωo⁻¹∇θmi(θo)|xi]}
      = E[∇θmi(θo)'Ωo⁻¹E(uiui'|xi)Ωo⁻¹∇θmi(θo)]
      = E[∇θmi(θo)'Ωo⁻¹ΩoΩo⁻¹∇θmi(θo)] = E[∇θmi(θo)'Ωo⁻¹∇θmi(θo)].
Next, we have to derive Ao ≡ E[Hi(θo;Ωo)] and show that Bo = Ao. The
Hessian itself is complicated, but its expected value is not:

   E[Hi(θo;Ωo)|xi] = ∇θmi(θo)'Ωo⁻¹∇θmi(θo) + [I_P ⊗ E(ui|xi)']F(xi,θo;Ωo)
                   = ∇θmi(θo)'Ωo⁻¹∇θmi(θo).

So we have verified (12.37) and that Ao = Bo. Therefore, from Theorem 12.3,

   Avar √N(θ̂ − θo) = Ao⁻¹ = {E[∇θmi(θo)'Ωo⁻¹∇θmi(θo)]}⁻¹.
c. As usual, we replace expectations with sample averages and unknown
parameters with estimates:

   Avar(θ̂) = [N⁻¹ Σ_{i=1}^N ∇θmi(θ̂)'Ω̂⁻¹∇θmi(θ̂)]⁻¹/N
            = [Σ_{i=1}^N ∇θmi(θ̂)'Ω̂⁻¹∇θmi(θ̂)]⁻¹.

The estimate Ω̂ can be based on the multivariate NLS residuals or can be
updated after an initial round of estimation.

d. If Ωo is diagonal, then ∇θmi(θo)'Ωo⁻¹∇θmi(θo) is block diagonal:

   ∇θmi(θo)'Ωo⁻¹∇θmi(θo)
      = diag(σo1⁻²∇θ1mi1'∇θ1mi1, ..., σoG⁻²∇θGmiG'∇θGmiG).

Taking expectations and inverting the result shows that

   Avar √N(θ̂g − θog) = σog²{E[∇θgmig'∇θgmig]}⁻¹,  g = 1,...,G;

these asymptotic variances are easily seen to be the same as those for
nonlinear least squares estimation of each equation separately. The result
given in Problem 7.5 does not extend readily to nonlinear models, even when
the same regressors appear in each equation. The key is that Xi is replaced
by the gradients ∇θgmig, which generally differ across equations.
12.8. As stated in the hint, we can use (12.33) and an updated version of
(12.66),

   N^(−1/2) Σ_{i=1}^N si(θ̃;γ̂) = N^(−1/2) Σ_{i=1}^N si(θo;γ̂) + Ao·√N(θ̃ − θo) + op(1),

to show that √N(θ̃ − θ̂) = Ao⁻¹N^(−1/2) Σ_{i=1}^N si(θ̃;γ̂) + op(1); this is
just standard algebra. Under (12.37),

   N^(−1/2) Σ_{i=1}^N si(θ̃;γ̂) = N^(−1/2) Σ_{i=1}^N si(θ̃;γo) + op(1),

by a mean value expansion similar to that used for the unconstrained
two-step M-estimator:

   N^(−1/2) Σ_{i=1}^N si(θ̃;γ̂)
      = N^(−1/2) Σ_{i=1}^N si(θ̃;γo) + E[∇γsi(θo;γo)]·√N(γ̂ − γo) + op(1),

and use E[∇γsi(θo;γo)] = 0. Now, the second-order Taylor expansion gives

   Σ_{i=1}^N q(wi,θ̃;γ̂) − Σ_{i=1}^N q(wi,θ̂;γ̂)
      = Σ_{i=1}^N si(θ̂;γ̂)'(θ̃ − θ̂) + (1/2)(θ̃ − θ̂)'[Σ_{i=1}^N Ḧi](θ̃ − θ̂)
      = (1/2)(θ̃ − θ̂)'[Σ_{i=1}^N Ḧi](θ̃ − θ̂),

since the first-order condition for θ̂ kills the linear term. Therefore,

   2[Σ_{i=1}^N q(wi,θ̃;γ̂) − Σ_{i=1}^N q(wi,θ̂;γ̂)]
      = [√N(θ̃ − θ̂)]'Ao[√N(θ̃ − θ̂)] + op(1)
      = [N^(−1/2) Σ_{i=1}^N s̃i]'Ao⁻¹[N^(−1/2) Σ_{i=1}^N s̃i] + op(1),

where s̃i ≡ si(θ̃;γo). Again, this shows the asymptotic equivalence of the QLR
statistic and the quadratic form in the scores.
conditional mean and conditional median are the same, and there is no
ambiguity about what is "the effect of xj on y," at least when only the mean
and median are in the running. Then, we could interpret large differences
between LAD and NLS as perhaps indicating an outlier problem. But it could
the nonlinear least squares estimator, say β̂. Define the weights as

   ĥi ≡ ni p(xi,β̂)[1 − p(xi,β̂)].

Then, the weighted NLS estimator minimizes

   Σ_{i=1}^N [yi − ni p(xi,β)]²/ĥi.
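A minimal numerical sketch of this two-step weighted NLS, under assumptions
not in the text: a hypothetical logistic choice probability
p(x,β) = 1/(1 + e^(−βx)), made-up binomial data, and grid minimization in
place of a real optimizer.

```python
import math, random

random.seed(2)
beta_true = 1.0                      # made-up value used to simulate data
data = []                            # (x_i, n_i, y_i) triples
for _ in range(800):
    xi = random.uniform(-2, 2)
    ni = random.randint(5, 15)
    prob = 1/(1 + math.exp(-beta_true*xi))
    yi = sum(random.random() < prob for _ in range(ni))
    data.append((xi, ni, yi))

def p(x, b):
    return 1/(1 + math.exp(-b*x))

def wssr(b, weights):
    # Weighted sum of squared residuals for candidate slope b.
    return sum((yi - ni*p(xi, b))**2/w for (xi, ni, yi), w in zip(data, weights))

grid = [i/200 for i in range(100, 301)]                # candidate betas 0.5..1.5
ones = [1.0]*len(data)
b1 = min(grid, key=lambda b: wssr(b, ones))            # step 1: unweighted NLS
wts = [ni*p(xi, b1)*(1 - p(xi, b1)) for xi, ni, _ in data]  # h_i = n_i p(1-p)
b2 = min(grid, key=lambda b: wssr(b, wts))             # step 2: weighted NLS
print(round(b2, 2))
```

The weights downweight observations whose binomial variance n·p(1 − p) is
large, mirroring the ĥi defined above.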
   E{[yi − m(xi,β)]'[yi − m(xi,β)]}
      = E(ui'ui) + 2E{[m(xi,βo) − m(xi,β)]'ui}
           + E{[m(xi,βo) − m(xi,β)]'[m(xi,βo) − m(xi,β)]},

and the middle term is zero by iterated expectations, since E(ui|xi) = 0 and
m(xi,βo) − m(xi,β) is a function of xi. As before, the first term does not
depend on β and the second term is minimized at β = βo, which is just to say
that the usual consistency proof can be used, provided βo is identified.
can use an argument very similar to the nonlinear SUR case in Problem 12.7:

   Bo = E{E[∇βmi(βo)'[Wi(δo)]⁻¹uiui'[Wi(δo)]⁻¹∇βmi(βo)|xi]}
      = E{∇βmi(βo)'[Wi(δo)]⁻¹E(uiui'|xi)[Wi(δo)]⁻¹∇βmi(βo)}
      = E{∇βmi(βo)'[Wi(δo)]⁻¹∇βmi(βo)},

where the last equality uses E(uiui'|xi) = Wi(δo). Now, the Hessian (with
respect to β), evaluated at (βo,δo), can be written as a term with zero
conditional expectation plus ∇βm(xi,βo)'[Wi(δo)]⁻¹∇βm(xi,βo); taking
expectations gives

   Ao ≡ E[Hi(βo;δo)] = E{∇βm(xi,βo)'[Wi(δo)]⁻¹∇βm(xi,βo)} = Bo.

Therefore, from the usual results on M-estimation, Avar √N(β̂ − βo) = Ao⁻¹,
and a consistent estimator of Ao is

   Â = N⁻¹ Σ_{i=1}^N ∇βm(xi,β̂)'[Wi(δ̂)]⁻¹∇βm(xi,β̂).
c. The consistency argument in part b did not use the fact that W(x,δ) is
correctly specified for Var(y|x), so the estimator remains consistent even
if the variance matrix is misspecified.
Answer:
Chapter 3 but now allowing for the randomness of wi. By a mean value
expansion, we can write

   N^(−1/2) Σ_{i=1}^N g(wi,θ̂)
      = N^(−1/2) Σ_{i=1}^N g(wi,θo) + [N⁻¹ Σ_{i=1}^N G̈i]√N(θ̂ − θo),

where G̈i is the M × P Jacobian of g(wi,θ) evaluated at mean values between
θo and θ̂, and √N(θ̂ − θo) = Op(1). Further, by Lemma 12.1, N⁻¹ Σ_{i=1}^N G̈i
→p E[∇θg(w,θo)] ≡ Go, since the mean values converge in probability to θo.
Therefore,

   [N⁻¹ Σ_{i=1}^N G̈i]√N(θ̂ − θo) = Go√N(θ̂ − θo) + op(1),

and so

   N^(−1/2) Σ_{i=1}^N g(wi,θ̂) = N^(−1/2) Σ_{i=1}^N g(wi,θo) + Go√N(θ̂ − θo) + op(1).

Since √N(θ̂ − θo) = −N^(−1/2) Σ_{i=1}^N Ao⁻¹si(θo) + op(1), we can write

   N^(−1/2) Σ_{i=1}^N g(wi,θ̂) = N^(−1/2) Σ_{i=1}^N [g(wi,θo) − GoAo⁻¹si(θo)] + op(1)

or, subtracting √N·δo from both sides,

   √N(δ̂ − δo) = N^(−1/2) Σ_{i=1}^N [g(wi,θo) − δo − GoAo⁻¹si(θo)] + op(1).

Since the term in the summation has zero mean, it follows from the CLT that
√N(δ̂ − δo) →d Normal(0,Δo), where Δo = Var(gi − δo − GoAo⁻¹si) and,
hopefully, the shorthand is clear. This differs from the usual delta method
result by the presence of gi = gi(θo).

b. We assume we have Â consistent for Ao. By the usual arguments, Ĝ =
N⁻¹ Σ_{i=1}^N ∇θg(wi,θ̂) is consistent for Go. Then

   Δ̂ = N⁻¹ Σ_{i=1}^N (ĝi − δ̂ − ĜÂ⁻¹ŝi)(ĝi − δ̂ − ĜÂ⁻¹ŝi)'

is consistent for Δo, where the "^" denotes evaluation at θ̂. The remaining
term is GoAo⁻¹BoAo⁻¹Go', which is what we wanted to show.
SOLUTIONS TO CHAPTER 13 PROBLEMS
f(yi|xi;θ)]} over Θ. The problem is that the expectation and the exponential
do not interchange, so this is not the same as maximizing E{log[f(yi|xi;θ)]}.
13.2. a. Since

   f(y|xi) = (2πσo²)^(−1/2) exp[−(y − m(xi,βo))²/(2σo²)],

it follows that for each i,

   li(β,σ²) = −(1/2)log(2π) − (1/2)log(σ²) − [yi − m(xi,β)]²/(2σ²).

Only the last of these terms depends on β. Further, for any σ² > 0,
maximizing Σ_{i=1}^N li(β,σ²) with respect to β is the same as minimizing

   Σ_{i=1}^N [yi − m(xi,β)]².                                   (13.66)

Thus, regardless of the MLE for σ², the MLE β̂ of β minimizes (13.66).
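This equivalence is easy to check numerically: for any fixed σ² > 0 the
Gaussian log likelihood is a decreasing affine function of the sum of squared
residuals, so the maximizer coincides with the least squares minimizer. A toy
sketch with one regressor and a made-up slope, using a grid instead of a real
optimizer:

```python
import math, random

random.seed(1)
n = 200
x = [random.uniform(-1, 1) for _ in range(n)]
y = [2.0*xi + random.gauss(0, 1) for xi in x]   # hypothetical slope 2

def ssr(b):
    # Sum of squared residuals for candidate slope b.
    return sum((yi - b*xi)**2 for xi, yi in zip(x, y))

def loglik(b, s2):
    # Gaussian log likelihood; equals const - ssr(b)/(2*s2).
    return sum(-0.5*math.log(2*math.pi) - 0.5*math.log(s2)
               - (yi - b*xi)**2/(2*s2) for xi, yi in zip(x, y))

grid = [i/100 for i in range(100, 301)]           # candidate slopes 1.00..3.00
b_nls = min(grid, key=ssr)
b_mle = max(grid, key=lambda b: loglik(b, 0.7))   # any fixed sigma^2 > 0 works
print(b_nls, b_mle)
```

Because the two criteria differ only by a positive affine transformation, the
two grid solutions are identical, whatever value of σ² is plugged in.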
b. First,

   ∂li(β,σ²)/∂σ² = −1/(2σ²) + [yi − m(xi,β)]²/(2σ⁴).

For notational simplicity, define the residual function ui(β) ≡ yi −
m(xi,β). Then the score is

   si(θ) = ( ∇βmi(β)'ui(β)/σ²
             −1/(2σ²) + [ui(β)]²/(2σ⁴) ),

where ∇βmi(β) ≡ ∇βm(xi,β).

Define the errors as ui ≡ ui(βo), so that E(ui|xi) = 0 and E(ui²|xi) =
Var(yi|xi) = σo². Then, since ∇βmi(βo) is a function of xi, it is easily
seen that E[si(θo)|xi] = 0. Note that we only use the fact that E(yi|xi) =
m(xi,βo) and Var(yi|xi) = σo² in showing this; in other words, only the
first two conditional moments have to be correctly specified.

c. The Hessian is

   Hi(θ) = ( [∇β²mi(β)ui(β) − ∇βmi(β)'∇βmi(β)]/σ²   −∇βmi(β)'ui(β)/σ⁴
             −ui(β)∇βmi(β)/σ⁴                        1/(2σ⁴) − [ui(β)]²/σ⁶ ),

where ∇β²mi(β) is the P × P Hessian of mi(β).
d. From part c,

   −E[Hi(θo)|xi] = ( ∇βmi(βo)'∇βmi(βo)/σo²   0
                     0                        1/(2σo⁴) ).        (13.67)
f. From general MLE theory, we know that Avar √N(β̂ − βo) is the P × P upper
left block of the inverse of (13.67):

   Avar √N(β̂ − βo) = σo²{E[∇βmi(βo)'∇βmi(βo)]}⁻¹,

and a consistent estimator is

   Avar(β̂) = σ̂² [Σ_{i=1}^N ∇βm̂i'∇βm̂i]⁻¹.                      (13.69)

If the model is linear, ∇βm̂i = xi, and we obtain exactly the asymptotic
variance estimator for the OLS estimator under homoskedasticity.
c. We need to evaluate the score and the expected Hessian with respect
to the full set of parameters, but then evaluate these at the restricted
estimates. Now,

   ∇θG(xi,β,0) = φ(xiβ)[xi, (xiβ)², (xiβ)³],

a 1 × (K + 2) vector. Let β̃ denote the probit estimates of β, obtained
under the null. The score for observation i, evaluated at the null
estimates, is the (K + 2) × 1 vector

   si(θ̃) = ∇θG(xi,β̃,0)'[yi − Φ(xiβ̃)]/{Φ(xiβ̃)[1 − Φ(xiβ̃)]}
          = φ(xiβ̃)z̃i'[yi − Φ(xiβ̃)]/{Φ(xiβ̃)[1 − Φ(xiβ̃)]},

where z̃i ≡ [xi, (xiβ̃)², (xiβ̃)³]. The negative of the expected Hessian is
obtained similarly. These can be plugged into the second expression in
equation (13.26) to obtain the LM statistic; the statistic can be computed
as the explained sum of squares from the regression

   ũi/[Φ̃i(1 − Φ̃i)]^(1/2)  on  φ̃iz̃i/[Φ̃i(1 − Φ̃i)]^(1/2).
   = [G(θo)']⁻¹E[si(θo)si(θo)'|xi][G(θo)]⁻¹
   = [G(θo)']⁻¹Ai(θo)[G(θo)]⁻¹.

Therefore,

   LMg = [Σ_{i=1}^N s̃ig]'[Σ_{i=1}^N Ãig]⁻¹[Σ_{i=1}^N s̃ig]
       = [Σ_{i=1}^N G̃'⁻¹s̃i]'[Σ_{i=1}^N G̃'⁻¹ÃiG̃⁻¹]⁻¹[Σ_{i=1}^N G̃'⁻¹s̃i]
       = [Σ_{i=1}^N s̃i]'G̃⁻¹G̃[Σ_{i=1}^N Ãi]⁻¹G̃'G̃'⁻¹[Σ_{i=1}^N s̃i]
       = [Σ_{i=1}^N s̃i]'[Σ_{i=1}^N Ãi]⁻¹[Σ_{i=1}^N s̃i] = LM.
13.6. a. No, for two reasons. First, just specifying a distribution of yit
given xit says nothing, in general, about the distribution of yit given xi _
(xi1,...,xiT). We could assume these two are the same, which is a strict
assumption.
as −∇θsi(θ):

   Hi(θ) = Σ_{t=1}^T exp(xitθ)xit'xit,

which, in this example, does not depend on the yit: Ait(θo) = Hit(θo).
Therefore,

   Â = N⁻¹ Σ_{i=1}^N Σ_{t=1}^T exp(xitθ̂)xit'xit,

where θ̂ is the partial MLE. Further,

   B̂ = N⁻¹ Σ_{i=1}^N si(θ̂)si(θ̂)',

and then Avar(θ̂) is estimated with the sandwich Â⁻¹B̂Â⁻¹/N. Under dynamic
completeness,

   E[sit(θo)|xit,yi,t−1,xi,t−1,...]
      = xit'[E(yit|xit,yi,t−1,xi,t−1,...) − exp(xitθo)] = 0,

so the scores are serially uncorrelated, and, by iterated expectations,

   Bo = Σ_{t=1}^T E[exp(xitθo)xit'xit] = Ao.

(We have really just verified the conditional information matrix equality
for each t.) A valid estimate of the asymptotic variance is then Â⁻¹/N,
which is exactly what we get by using pooled Poisson estimation and ignoring
the panel structure.
13.7. a. The joint density is simply g(y1|y2,x;θo)·h(y2|x;θo), and the log
likelihood is the sum of the logs of the two pieces.

b. Since ri2 is a function of (yi2,xi),

   E[ri2li1(θ)|yi2,xi] = ri2E[li1(θ)|yi2,xi].

c. The score is si(θ) = ri2si1(θ) + si2(θ), so

   E[si(θo)si(θo)'] = E[ri2si1(θo)si1(θo)'] + E[si2(θo)si2(θo)']
        + E[ri2si1(θo)si2(θo)'] + E[ri2si2(θo)si1(θo)'].

Now by the usual conditional MLE theory, E[si1(θo)|yi2,xi] = 0 and, since
ri2 and si2(θo) are functions of (yi2,xi), the cross terms vanish. Moreover,
the conditional information matrix equality gives

   E[ri2si1(θo)si1(θo)'] = −E[ri2Hi1(θo)],

where Hi1(θ) = ∇θsi1(θ); since ri2 is a function of (yi2,xi), we can put ri2
inside the conditional expectation. Combining these results,

   Bo = −E[ri2∇θsi1(θo) + ∇θsi2(θo)] = −E[∇θsi(θo)] ≡ −E[Hi(θo)].

d. A consistent estimator of −E[Hi(θo)] is

   −N⁻¹ Σ_{i=1}^N (ri2Ĥi1 + Ĥi2),

where the notation should be obvious. But, as we discussed in Chapters 12
and 13, this estimator need not be positive definite. Instead, we can break
the estimator into the two score pieces and use outer products of the
scores, which are positive semidefinite by construction.
e. Bonus Question: Show that if we were able to use the entire random
sample, the resulting conditional MLE would be more efficient than the
partial MLE. If we could use the entire random sample for both terms, the
asymptotic variance would be smaller in the matrix sense.

The expected Hessian is estimated by the matrix

   N⁻¹ Σ_{i=1}^N [φ(ĥiθ̂)]²ĥi'ĥi/{Φ(ĥiθ̂)[1 − Φ(ĥiθ̂)]},

where ĥi ≡ (xi,v̂i) is the vector of "generated regressors." This is just
the usual probit estimator that we would use without generated regressors.
The matrix Fo can be estimated by computing the Jacobian of the score with
respect to γ and then averaging. But, by the same argument used to simplify
the expected Hessian, all but one term has zero conditional expectation. In
particular, with hi ≡ (xi,vi),

   ∇γhi(γo)' = ( 0  )
               ( −zi ) ≡ −Ri,

a (K + 1) × M matrix, where zi is 1 × M. Therefore, a consistent estimator
of Fo is

   F̂ = N⁻¹ Σ_{i=1}^N [φ(ĥiθ̂)]²ĥi'θ̂'Ri/{Φ(ĥiθ̂)[1 − Φ(ĥiθ̂)]}.

Now, for r̂i implicit in (12.61), we can take r̂i = (Z'Z/N)⁻¹zi'v̂i. Then, we
form ĝi as in (12.61).

b. In general,

   θo'Ri = (δo', ρo)( 0  ) = ρozi.
                    ( zi )

If ρo = 0 then θo'Ri = 0, which means Fo = 0. Then, we need not adjust the
score for the first-step estimation of γo.

c. From part b, the usual probit t statistic on the generated regressor v̂i
is asymptotically standard normal under H0: ρo = 0, so we can just ignore
the generated regressor aspect of v̂i when carrying out the test.
given by

   fT(yT|yT−1)·fT−1(yT−1|yT−2)···f1(y1|y0)·f0(y0).

b. The log-likelihood

   li(θ) = Σ_{t=1}^T log[ft(yit|yi,t−1;θ)]

is the conditional log-likelihood for the density of (yi1,...,yiT) given
yi0.

c. Because we have the density of (yi1,...,yiT) given yi0, we can use
conditional MLE; if we have a package that computes a particular MLE, we can
just use any of the usual inference procedures it provides.
   g(y1,...,yG|x) = ∫_R f(y1,y2,...,yG|x,c)h(c|x)dc.

c. The density g(y1,...,yG|x) is now

   g(y1,...,yG|x;γo,δo) = ∫_R f(y1,y2,...,yG|x,c;γo)h(c|x;δo)dc
                        = ∫_R Π_{g=1}^G fg(yg|x,c;γo)h(c|x;δo)dc,

and so the log likelihood for observation i is

   log[g(yi1,...,yiG|xi;γo,δo)]
      = log{∫_R Π_{g=1}^G fg(yig|xi,c;γo)h(c|xi;δo)dc}.

d. This setup has some features in common with a linear SUR model, but in
nonlinear models -- especially if G is large and some of the models are for
qualitative response -- one probably needs to restrict the cross correlation
somehow.
13.11. a. For each t ≥ 1, the density of yit given yi,t−1 = yt−1, yi,t−2 =
yt−2, ..., yi0 = y0, and ci = c is

   (2πσe²)^(−1/2) exp[−(yt − ρyt−1 − c)²/(2σe²)].

It is not a good idea to "estimate" the ci along with ρ and σe², as the
incidental parameters problem causes inconsistency -- severe in some
cases -- in the estimator of ρ.

b. If we write ci = α0 + α1yi0 + ai, under the maintained assumption, then
the density of (yi1,...,yiT) given (yi0 = y0, ai = a) is

   Π_{t=1}^T (2πσe²)^(−1/2) exp[−(yt − ρyt−1 − α0 − α1y0 − a)²/(2σe²)].

Now, to get the density conditional on yi0 = y0 only, we integrate this
density over the density of ai given yi0 = y0. But ai and yi0 are
independent, and ai ~ Normal(0,σa²). So the density of (yi1,...,yiT) given
yi0 = y0 is

   ∫_{−∞}^{∞} {Π_{t=1}^T (2πσe²)^(−1/2) exp[−(yt − ρyt−1 − α0 − α1y0 − a)²/(2σe²)]}
        × σa⁻¹φ(a/σa) da.

If we now plug in the data (yi0,yi1,...,yiT) for each i and take the log, we
obtain the log likelihood for each i.
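The integral over a can be computed by simple quadrature. As a sanity check
(T = 1 and made-up parameter values), mixing the one-period conditional
density over a ~ Normal(0,σa²) must reproduce a normal density with variance
σe² + σa², since the convolution of independent normals is normal:

```python
import math

def npdf(x, s):
    # Normal(0, s^2) density evaluated at x.
    return math.exp(-0.5*(x/s)**2)/(s*math.sqrt(2*math.pi))

def integrated_density(y1, rho, a0, a1, y0, se, sa, grid=4001, width=8.0):
    # Midpoint quadrature of  integral f(y1|y0, a) * (1/sa)phi(a/sa) da, T = 1.
    h = 2*width*sa/grid
    total = 0.0
    for k in range(grid):
        a = -width*sa + (k + 0.5)*h
        total += npdf(y1 - rho*y0 - a0 - a1*y0 - a, se)*npdf(a, sa)*h
    return total

# Hypothetical parameter values for the check.
rho, a0, a1, y0, se, sa = 0.5, 0.2, 0.1, 1.0, 0.8, 0.6
mu = rho*y0 + a0 + a1*y0
approx = integrated_density(1.3, rho, a0, a1, y0, se, sa)
exact = npdf(1.3 - mu, math.sqrt(se**2 + sa**2))   # closed-form mixture
print(abs(approx - exact))
```

In practice one would use Gauss–Hermite nodes rather than a midpoint rule,
but the check above confirms the mixing logic for a single period.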
c. As before, we can replace ci with α0 + α1yi0 + ai. Then, the density of
(yi1,...,yiT) given yi0 has the same mixed-normal form. After we get the
MLEs, we would estimate ρ + E(ci) as ρ̂ + α̂0 + α̂1ȳ0, where ȳ0 is the sample
average of the initial observations. The log likelihood for observation i
is the log of

   ∫_{−∞}^{∞} {Π_{t=1}^T (2πσe²)^(−1/2)
        × exp[−(yit − ρyi,t−1 − zitβ − α0 − α1y0 − z̄iδ − a)²/(2σe²)]}
        × σa⁻¹φ(a/σa) da.

The assumption that we can put in the time average, z̄i, to account for the
unobserved effect is restrictive; a less restrictive alternative is to put
in the full vector zi, although this leads to many more parameters to
estimate.
correctly specified densities for yit given xit. That is, assume that there
is θo ∈ int(Θ) such that f(yt|xt;θo) is the density of yit given xit = xt,
and that D(yit|xi1,...,xiT) = D(yit|xit). Are the scores necessarily
serially uncorrelated?
Answer:
a. This is true, because, by the general theory for partial MLE, we know
the period-t score has zero conditional mean given xit. But the assumption
that D(yit|xi1,...,xiT) does not depend on xis, s ≠ t, says nothing about
whether yir, r < t, appears in D(yit|xit,yi,t−1,xi,t−1,...).
If gt(yt|zt,c;γ) is correctly specified for the density of yit given
(zit = zt, ci = c), and h(c|z̄;δ) is correctly specified for the density of
ci given z̄i, then, under the assumptions given,
D(yit|zi1,...,ziT,z̄i) = D(yit|zit,z̄i), t = 1,...,T.
SOLUTIONS TO CHAPTER 14 PROBLEMS
analytically if γ2 ≠ 1.

b. No. If γ1 = 0, the parameter γ2 does not appear in the model. Of course,
the score test and QLR test also fail because of lack of identification
under H0. What we can do is fix a value for ρ1, and then use a t test on
(wage^ρ1 − 1)/ρ1 after 2SLS (or GMM more generally). This need not be a
very good test for detecting γ1 ≠ 0 if our guess for ρ1 is not close to the
actual value. There is a growing literature on testing hypotheses when
parameters are not identified under the null.
14.3. Let Zi* be the G × P matrix of optimal instruments in (14.63), where
we suppress its dependence on xi. Let Zi be a G × L matrix that is a
function of xi and let Λo be the probability limit of the weighting matrix.
Then the asymptotic variance of the GMM estimator has the form (14.10) with
Go ≡ E[Zi'Ro(xi)]. So, in (14.54), take A ≡ Go'ΛoGo and s(wi) ≡
Go'ΛoZi'r(wi,θo). The optimal score function is s*(wi) ≡
Ro(xi)'[Ωo(xi)]⁻¹r(wi,θo). Now we can verify (14.57) with r = 1:

   E[s(wi)s*(wi)'] = Go'ΛoE[Zi'r(wi,θo)r(wi,θo)'[Ωo(xi)]⁻¹Ro(xi)]
                   = Go'ΛoE[Zi'E{r(wi,θo)r(wi,θo)'|xi}[Ωo(xi)]⁻¹Ro(xi)]
                   = Go'ΛoE[Zi'Ωo(xi)[Ωo(xi)]⁻¹Ro(xi)] = Go'ΛoGo = A.
14.4. a. The residual function for the conditional mean model E(yi|xi) =
m(xi,βo) leads to the optimal-instrument asymptotic variance

   {E[∇βm(xi,βo)'[ωo(xi)]⁻¹∇βm(xi,βo)]}⁻¹
      = σo²{E[∇βm(xi,βo)'∇βm(xi,βo)/h(xi,γo)]}⁻¹,

which is the asymptotic variance of the WNLS estimator under WNLS.1 and the
other WNLS assumptions.

b. Adding the variance restriction gives the 2 × 1 residual function

   ri(θ) = ( yi − m(xi,β), [yi − m(xi,β)]² − σ² )',

with E[ri(θo)|xi] = 0, where θo = (βo',σo²)'. To obtain the efficient IVs,
we first need E[∇θri(θo)|xi]. But

   ∇θri(θ) = ( −∇βmi(β)            0
               −2∇βmi(β)ui(β)     −1 ),

so

   Ro(xi) = E[∇θri(θo)|xi] = ( −∇βmi(βo)    0
                               0           −1 ).

We also need Ωo(xi) = Var[ri(θo)|xi]. The resulting optimal IV estimator of
β is the same as NLS; in other words, adding the moment condition for σ²
does not help estimate β. If, in addition, E(ui⁴|xi) is constant, then the
usual estimator of σo² based on the squared NLS residuals is also efficient.
the Pt we have the restrictions represented by a matrix H, a block matrix
whose entries are zeros, ones, and IK blocks (one block row per period,
selecting the parameters common across periods). This difference is

   Ho'Λo⁻¹Ho − Ho'Ξo⁻¹Ho = Ho'(Λo⁻¹ − Ξo⁻¹)Ho,

which is p.s.d. because Λo⁻¹ − Ξo⁻¹ is p.s.d.
θ ∈ R^P, where it is assumed that no restrictions are placed on θ. The
first-order condition follows in the usual way.
14.8. From the efficiency result of maximum likelihood -- see the discussion
on page 439 -- it is no less asymptotically efficient to use the density of
the full vector (yi0,yi1,...,yiT), which includes f0(y0;θ). But if f0(y0;θ)
is misspecified, then the unconditional MLE will generally be inconsistent
for θo. The MLE that conditions on yi0 is consistent provided we have the
densities of yit given past outcomes correctly specified. Because the
conditional distribution is usually the object of interest, we are usually
willing to put more effort into testing our specification of it.
14.9. We have to verify equations (14.55) and (14.56) for the random
effects and fixed effects estimators. The choices of si1, si2 (with added i
subscripts for clarity), A1, and A2 are given in the hint. Now, from
Chapter 10,

   E(si1si1') = E(X̌i'riri'X̌i) = σu²E(X̌i'X̌i) ≡ σu²A1

by the usual iterated expectations argument. This means that, in (14.55),
r ≡ σu². Now, we just need to verify (14.56) for this choice of r. But
si2si1' = Ẍi'uiri'X̌i. Now, as described in the hint,

   Ẍi'ri = Ẍi'(vi − λjT v̄i) = Ẍi'vi = Ẍi'(cijT + ui) = Ẍi'ui.

So si2si1' = Ẍi'riri'X̌i, and therefore E(si2si1'|xi) = Ẍi'E(riri'|xi)X̌i =
σu²Ẍi'X̌i, as (14.56) requires.
SOLUTIONS TO CHAPTER 15 PROBLEMS
of yi in the sample falling into category m. Therefore, the fitted values
are the cell frequencies.

b. The fitted values for each category will be the same. If we drop d1
but add an overall intercept, the overall intercept is the cell frequency
for the first category, and the coefficient on dm becomes the difference in
cell frequencies between category m and the first category.
The optimal solution is qi = 0 if the marginal utility of charitable giving
at qi = 0 is less than the price; this characterizes the conditions under
which no charitable contributions will be made; in other words, we have a
corner solution. If ai > pi, then an interior solution exists (qi > 0) and
necessarily solves the first-order condition

   ai/(1 + qi) − pi = 0,

or

   1 + qi = ai/pi.
Therefore,

   ∂P(y = 1|z1,z2)/∂z2 = (γ1 + 2γ2z2)·φ(z1δ1 + γ1z2 + γ2z2²).
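The partial effect can be computed directly and checked against a numerical
derivative of the response probability. A small sketch with made-up index
values (z1δ1 collapsed to a single scalar, hypothetical coefficients):

```python
import math

def Phi(x):
    # Standard normal cdf via the error function.
    return 0.5*(1 + math.erf(x/math.sqrt(2)))

def phi(x):
    # Standard normal pdf.
    return math.exp(-0.5*x*x)/math.sqrt(2*math.pi)

# Hypothetical values: g1, g2 are the coefficients on z2 and z2^2,
# xb1 stands in for z1*delta1.
g1, g2, xb1, z2 = 0.4, -0.15, 0.2, 1.5

def P(z2v):
    return Phi(xb1 + g1*z2v + g2*z2v**2)

analytic = (g1 + 2*g2*z2)*phi(xb1 + g1*z2 + g2*z2**2)   # formula above
h = 1e-6
numeric = (P(z2 + h) - P(z2 - h))/(2*h)                 # central difference
print(round(analytic, 4))
```

The two agree to numerical precision, confirming that the quadratic term
makes the partial effect depend on the level of z2.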
b. In the model with the quadratic, evaluate the partial effect at chosen
values of z2 and d1 = 0: just replace the parameters with their probit
estimates, and use average or other interesting values of the covariates.
Obtaining a standard error would require the full variance matrix of the
probit estimates as well as the delta method.
15.4. This is the kind of statement that arises out of failure to
distinguish between the underlying latent variable model and the model for
P(y = 1|x). The linear probability model assumes P(y = 1|x) = xβ while, for
example, the probit model assumes that P(y = 1|x) = Φ(xβ). Thus, both
models make very specific functional-form assumptions. The fact that the
probit model can be derived from a latent variable model with a normal,
homoskedastic error does not make it less plausible than the LPM. In fact,
we know that the probit functional form has some attractive properties that
the linear model does not have: Φ(xβ) is always between zero and one.
Incidentally, the LPM can be obtained from a latent variable model by
assuming a uniform distribution for the error.
15.5. a. If P(y = 1|z,q) = Φ(z1δ1 + γ1z2q), then

   ∂P(y = 1|z,q)/∂z2 = γ1q·φ(z1δ1 + γ1z2q),

which depends on the unobservable q.

b. In the latent variable formulation the composite error is e + γ1z2q,
where e|z ~ Normal(0,1) and E(e|z) = 0. Also, q is independent of z with a
standard normal distribution, so that

   P(y = 1|z) = Φ(z1δ1/(γ1²z2² + 1)^(1/2)).                     (15.90)

c. Because P(y = 1|z) depends only on γ1², this is what we can estimate
along with δ1. (For example, γ1 = −2 and γ1 = 2 give exactly the same model
for P(y = 1|z).) Define ρ1 ≡ γ1². Testing H0: ρ1 = 0 is most easily done
using the score or LM test because, under H0, we have a standard probit
model. Let δ̃1 denote the probit estimates under the null that ρ1 = 0. The
only other quantity needed is the gradient with respect to ρ1 evaluated at
the null estimates. But the partial derivative of (15.90) with respect to
ρ1 is, for each i,

   −(zi1δ1)(zi2²/2)(ρ1zi2² + 1)^(−3/2)·φ(zi1δ1/(ρ1zi2² + 1)^(1/2)).

When we evaluate this at ρ1 = 0 and δ̃1 we get −(zi1δ̃1)(zi2²/2)φ̃i. Then,
the score statistic can be obtained as N·Ru² from the regression

   ũi/[Φ̃i(1 − Φ̃i)]^(1/2)  on  φ̃izi1/[Φ̃i(1 − Φ̃i)]^(1/2),
        −(zi1δ̃1)(zi2²/2)φ̃i/[Φ̃i(1 − Φ̃i)]^(1/2).
number of cigarettes that someone smokes per day, what effect would this
have on missed work? We want to infer causality, not just find a
correlation between missing work and cigarette smoking.
b. Since people choose whether and how much to smoke, we certainly cannot
treat the data as coming from the experiment we have in mind in part a.
(That is, smoking is not randomly assigned.) It is possible that smokers
are less healthy to begin with, or have other attributes that cause them to
miss work more often. Or, it could go the other way: maybe people who smoke
are also harder workers. In any case, cigs might be correlated with the
unobservables in the equation.
cigs, then probit ignoring q1 does estimate the average partial effect of
smoking another cigarette.
d. No. There are many people in the working population who do not smoke,
so cigs cannot be conditionally normally distributed. But it is really the
pile-up at zero that is the most serious issue.
e. Use the Rivers-Vuong test. Obtain the residuals, r̂2, from the
regression of cigs on z. Then, estimate the probit of y on z1, cigs, r̂2 and
use a standard t test on r̂2. This does not rely on normality of r2 (or
cigs). It does, of course, rely on the probit model being correct for y
under H0.
f. Use the person's state of residence when the state implements no-smoking
laws in the workplace: a dummy indicator for whether the person works in a
state with a new law can serve as an instrument for cigs, since it should
be correlated with the state law indicator -- people will not be able to
smoke as much -- and hence with cigs.
. reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60
. reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60, robust
------------------------------------------------------------------------------
| Robust
arr86 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
pcnv | -.1543802 .018964 -8.14 0.000 -.1915656 -.1171948
avgsen | .0035024 .0058876 0.59 0.552 -.0080423 .0150471
tottime | -.0020613 .0042256 -0.49 0.626 -.010347 .0062244
ptime86 | -.0215953 .0027532 -7.84 0.000 -.0269938 -.0161967
inc86 | -.0012248 .0001141 -10.73 0.000 -.0014487 -.001001
black | .1617183 .0255279 6.33 0.000 .1116622 .2117743
hispan | .0892586 .0210689 4.24 0.000 .0479459 .1305714
born60 | .0028698 .0171596 0.17 0.867 -.0307774 .036517
_cons | .3609831 .0167081 21.61 0.000 .3282214 .3937449
------------------------------------------------------------------------------
The estimated effect from increasing pcnv from .25 to .75 is about -.154(.5) =
-.077, so the probability of arrest falls by about 7.7 points. There are no
important differences between the usual and robust standard errors; in
fact, they are very similar.
b. The robust statistic and its p-value are gotten by using the "test"
. test avgsen tottime
( 1) avgsen = 0.0
( 2) tottime = 0.0
F( 2, 2716) = 0.18
Prob > F = 0.8320
. qui reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60
. test avgsen tottime
( 1) avgsen = 0.0
( 2) tottime = 0.0
F( 2, 2716) = 0.18
Prob > F = 0.8360
. probit arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60
------------------------------------------------------------------------------
arr86 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
pcnv | -.5529248 .0720778 -7.67 0.000 -.6941947 -.4116549
avgsen | .0127395 .0212318 0.60 0.548 -.028874 .0543531
tottime | -.0076486 .0168844 -0.45 0.651 -.0407414 .0254442
ptime86 | -.0812017 .017963 -4.52 0.000 -.1164085 -.0459949
inc86 | -.0046346 .0004777 -9.70 0.000 -.0055709 -.0036983
black | .4666076 .0719687 6.48 0.000 .3255516 .6076635
hispan | .2911005 .0654027 4.45 0.000 .1629135 .4192875
born60 | .0112074 .0556843 0.20 0.840 -.0979318 .1203466
_cons | -.3138331 .0512999 -6.12 0.000 -.4143791 -.213287
------------------------------------------------------------------------------
Now, we must compute the difference in the normal cdf at the two different
values of pcnv, at given values of the remaining variables. This
calculation shows that the probability falls by about .10, which is larger
than the drop of about .077 estimated from the linear model.
. predict phat
(option p assumed; Pr(arr86))
| arr86
arr86h | 0 1 | Total
-----------+----------------------+----------
0 | 1903 677 | 2580
1 | 67 78 | 145
-----------+----------------------+----------
Total | 1970 755 | 2725
. di 1903/1970
.96598985
. di 78/755
.10331126
For men who were not arrested, the probit predicts correctly about 96.6% of
the time. Unfortunately, for the men who were arrested, the probit is correct
only about 10.3% of the time. The overall percent correctly predicted is
quite high, but we cannot very well predict the outcome we would most like to
predict.
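The fractions just quoted come directly from the tabulation above; as a
minimal check:

```python
# Cells from the tabulation: rows are predictions (arr86h), columns are
# actual outcomes (arr86).
correct_zeros, total_zeros = 1903, 1970   # actual 0 predicted as 0
correct_ones, total_ones = 78, 755        # actual 1 predicted as 1
n = 2725

pct_zeros = correct_zeros/total_zeros     # share of non-arrests predicted
pct_ones = correct_ones/total_ones        # share of arrests predicted
overall = (correct_zeros + correct_ones)/n
print(round(pct_zeros, 3), round(pct_ones, 4), round(overall, 3))
```

The overall rate is high only because non-arrests dominate the sample,
which is exactly the point made above.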
. probit arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60 pcnvsq
pt86sq inc86sq
------------------------------------------------------------------------------
arr86 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
pcnv | .2167615 .2604937 0.83 0.405 -.2937968 .7273198
avgsen | .0139969 .0244972 0.57 0.568 -.0340166 .0620105
tottime | -.0178158 .0199703 -0.89 0.372 -.056957 .0213253
ptime86 | .7449712 .1438485 5.18 0.000 .4630333 1.026909
inc86 | -.0058786 .0009851 -5.97 0.000 -.0078094 -.0039478
black | .4368131 .0733798 5.95 0.000 .2929913 .580635
hispan | .2663945 .067082 3.97 0.000 .1349163 .3978727
born60 | -.0145223 .0566913 -0.26 0.798 -.1256351 .0965905
pcnvsq | -.8570512 .2714575 -3.16 0.002 -1.389098 -.3250042
pt86sq | -.1035031 .0224234 -4.62 0.000 -.1474522 -.059554
inc86sq | 8.75e-06 4.28e-06 2.04 0.041 3.63e-07 .0000171
_cons | -.337362 .0562665 -6.00 0.000 -.4476423 -.2270817
------------------------------------------------------------------------------
. test pcnvsq pt86sq inc86sq
( 1) pcnvsq = 0.0
( 2) pt86sq = 0.0
( 3) inc86sq = 0.0
chi2( 3) = 38.54
Prob > chi2 = 0.0000
. tab smokes
smokes| Freq. Percent Cum.
------------+-----------------------------------
0 | 1176 84.73 84.73
1 | 212 15.27 100.00
------------+-----------------------------------
Total | 1388 100.00
------------------------------------------------------------------------------
smokes | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
motheduc | -.1450599 .0207899 -6.977 0.000 -.1858074 -.1043124
white | .1896765 .1098804 1.726 0.084 -.0256852 .4050382
lfaminc | -.1669109 .0498894 -3.346 0.000 -.2646923 -.0691296
_cons | 1.126276 .2504608 4.497 0.000 .6353822 1.617171
------------------------------------------------------------------------------
. sum faminc
. di 1.126 - .167*log(29.027)
.56350619
. di normprob(-.145*16 + .5635) - normprob(-.145*12 + .5635)
-.08019603
For nonwhite women at the average income level, the estimated difference in
the probability of smoking between college graduates and high school
graduates is about -.08; that is, women with a college education are about
.08 less likely to smoke.
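The "di" computation above is just a difference of normal cdf values and can
be reproduced exactly:

```python
import math

def normprob(z):
    # Standard normal cdf, matching Stata's normprob().
    return 0.5*(1 + math.erf(z/math.sqrt(2)))

# motheduc = 16 vs. 12, with .5635 collecting the remaining terms at the
# chosen covariate values (as computed above).
diff = normprob(-0.145*16 + 0.5635) - normprob(-0.145*12 + 0.5635)
print(round(diff, 4))  # about -.0802
```

Because the probit index is nonlinear, the same four-year change in
motheduc would produce a different probability change at other covariate
values.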
------------------------------------------------------------------------------
lfaminc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
motheduc | .0709044 .0098338 7.210 0.000 .0516109 .090198
white | .3452115 .050418 6.847 0.000 .2462931 .4441298
fatheduc | .0616625 .008708 7.081 0.000 .0445777 .0787473
_cons | 1.241413 .1103648 11.248 0.000 1.024881 1.457945
------------------------------------------------------------------------------
regression for the next part. Note that we lose 197 observations due to
missing data. Next is the probit that includes the first-stage residuals,
v̂2:
------------------------------------------------------------------------------
smokes | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
motheduc | -.0826247 .0465203 -1.776 0.076 -.1738029 .0085535
white | .4611075 .1965242 2.346 0.019 .0759272 .8462879
lfaminc | -.7622559 .3652944 -2.087 0.037 -1.47822 -.046292
v2hat | .6107298 .3708066 1.647 0.100 -.1160378 1.337497
_cons | 1.98796 .5996364 3.315 0.000 .8126946 3.163226
------------------------------------------------------------------------------
There is not real strong evidence of endogeneity, but even if there were,
we would be maintaining the assumption that fatheduc is exogenous. This is
not a very good example, but it shows you how to mechanically carry out the
procedure for each observation, especially if N is large.
evaluated at the true values, of course -- maximizes the KLIC. Since the MLEs
are consistent for the unknown parameters, asymptotically the true density
will produce the highest average log likelihood function. So, just as we can
use an R-squared to choose among different functional forms for E(y|x), we can
use values of the log-likelihood to choose among different models for P(y =
1|x).
15.10. a. There are several possibilities. One is to define p^i = P^(y = 1|xi)
always between zero and one. An alternative is to use the sum of squared
residuals form. While this produces the same R-squared measure for the linear
b. I will report the square of the correlation between yi and the fitted
probabilities for the LPM and probit. The LPM R-squared is about .106 and
f(y1,...,yT|xi) = f1(y1|xi)···fT(yT|xi),
that is, the joint density (conditional on xi) is the product of the marginal
densities. If each yt follows a probit model, then
f(y1,...,yT|xi) = ∏_{t=1}^{T} [G(xitB)]^yt [1 - G(xitB)]^(1-yt),
and so pooled probit is conditional MLE.
cancel. Therefore,
+ exp(xi2B + ci) + exp(xi3B + ci)]
Also,
and
logit. This, however, would be inefficient because it does not use the ni = 2
observations.
A similar argument can be used for the three possible configurations with
Again, this has the conditional logit form, but where the explanatory
periods.
the treatment and control groups, both before and after the policy change,
b. Let d2 be a binary indicator for the second time period, and let dB be
an indicator for the treatment group. Then a probit model to evaluate the
treatment effect is
P(y = 1|x) = F(d0 + d1d2 + d2dB + d3d2·dB + xG),
probit of y on 1, d2, dB, d2·dB, and x using all observations. Once we have
- [F(d^0 + d^1 + x̄G^) - F(d^0 + x̄G^)],
Both are estimates of the difference, between groups B and A, of the change in
c. We would have to use the delta method to obtain a valid standard error
for either q^ or q~.
15.14. a. The following Stata output contains the linear regression results.
Since pctstck is discrete (taking on only 0, 50, and 100), it seems likely
that heteroskedasticity is present, but the robust standard errors are not
very different from the usual ones (not reported).
. reg pctstck choice age educ female black married finc25-finc101 wealth89
prftshr, robust
------------------------------------------------------------------------------
| Robust
pctstck | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
choice | 12.04773 5.994437 2.01 0.046 .2188715 23.87658
age | -1.625967 .8327895 -1.95 0.052 -3.269315 .0173813
educ | .7538685 1.172328 0.64 0.521 -1.559493 3.06723
female | 1.302856 7.148595 0.18 0.856 -12.80351 15.40922
black | 3.967391 8.974971 0.44 0.659 -13.74297 21.67775
married | 3.303436 8.369616 0.39 0.694 -13.21237 19.81924
finc25 | -18.18567 16.00485 -1.14 0.257 -49.76813 13.39679
finc35 | -3.925374 15.86275 -0.25 0.805 -35.22742 27.37668
finc50 | -8.128784 15.3762 -0.53 0.598 -38.47072 22.21315
finc75 | -17.57921 16.6797 -1.05 0.293 -50.49335 15.33493
finc100 | -6.74559 16.7482 -0.40 0.688 -39.7949 26.30372
finc101 | -28.34407 16.57814 -1.71 0.089 -61.05781 4.369671
wealth89 | -.0026918 .0114136 -0.24 0.814 -.0252142 .0198307
prftshr | 15.80791 8.107663 1.95 0.053 -.190984 31.80681
_cons | 134.1161 58.87288 2.28 0.024 17.9419 250.2902
------------------------------------------------------------------------------
do not expect big differences in standard errors, and we do not see them:
. reg pctstck choice age educ female black married finc25-finc101 wealth89
prftshr, robust cluster(id)
------------------------------------------------------------------------------
| Robust
pctstck | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
choice | 12.04773 6.184085 1.95 0.053 -.1597615 24.25521
age | -1.625967 .8192942 -1.98 0.049 -3.243267 -.0086663
educ | .7538685 1.1803 0.64 0.524 -1.576064 3.083801
female | 1.302856 7.000538 0.19 0.853 -12.51632 15.12203
black | 3.967391 8.711611 0.46 0.649 -13.22948 21.16426
married | 3.303436 8.624168 0.38 0.702 -13.72082 20.32769
finc25 | -18.18567 16.82939 -1.08 0.281 -51.40716 15.03583
finc35 | -3.925374 16.17574 -0.24 0.809 -35.85656 28.00581
finc50 | -8.128784 15.91447 -0.51 0.610 -39.54421 23.28665
finc75 | -17.57921 17.2789 -1.02 0.310 -51.68804 16.52963
finc100 | -6.74559 17.24617 -0.39 0.696 -40.78983 27.29865
finc101 | -28.34407 17.10783 -1.66 0.099 -62.1152 5.427069
wealth89 | -.0026918 .0119309 -0.23 0.822 -.0262435 .02086
prftshr | 15.80791 8.356266 1.89 0.060 -.6874976 32.30332
_cons | 134.1161 58.1316 2.31 0.022 19.36333 248.8688
------------------------------------------------------------------------------
For later use, the predicted pctstck for the person described in the problem,
. oprobit pctstck choice age educ female black married finc25-finc101 wealth89
prftshr
------------------------------------------------------------------------------
pctstck | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
choice | .371171 .1841121 2.02 0.044 .010318 .7320241
age | -.0500516 .0226063 -2.21 0.027 -.0943591 -.005744
educ | .0261382 .0352561 0.74 0.458 -.0429626 .0952389
female | .0455642 .206004 0.22 0.825 -.3581963 .4493246
black | .0933923 .2820403 0.33 0.741 -.4593965 .6461811
married | .0935981 .2332114 0.40 0.688 -.3634878 .550684
finc25 | -.5784299 .423162 -1.37 0.172 -1.407812 .2509524
finc35 | -.1346721 .4305242 -0.31 0.754 -.9784841 .7091399
finc50 | -.2620401 .4265936 -0.61 0.539 -1.098148 .5740681
finc75 | -.5662312 .4780035 -1.18 0.236 -1.503101 .3706385
finc100 | -.2278963 .4685942 -0.49 0.627 -1.146324 .6905316
finc101 | -.8641109 .5291111 -1.63 0.102 -1.90115 .1729279
wealth89 | -.0000956 .0003737 -0.26 0.798 -.0008279 .0006368
prftshr | .4817182 .2161233 2.23 0.026 .0581243 .905312
-------------+----------------------------------------------------------------
_cut1 | -3.087373 1.623765 (Ancillary parameters)
_cut2 | -2.053553 1.618611
------------------------------------------------------------------------------
. di 1 - normprob(-2.054 + 2.918)
.19379395
. di 50*.373 + 100*.194
38.05
. * With choice:
. di 1 - normprob(-2.054 + 2.547)
.31100629
. di 50*.394 + 100*.311
50.8
. di 50.8 - 38.05
12.75
. * So, using the ordered probit, the effect of choice for this person is
. * about 12.8 percentage points more in stock, which is not far from the
. * 12.1 points obtained with the linear model.
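The expected-value calculation above can be sketched in Python (a hypothetical re-computation; the cut point -2.054, the index values, and the P(pctstck = 50) values .373 and .394 are all taken from the session above):

```python
from math import erf, sqrt

def normprob(z):
    # standard normal CDF, matching Stata's normprob()
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# P(pctstck = 100) with and without choice, using _cut2 = -2.054
p100_no = 1 - normprob(-2.054 + 2.918)
p100_yes = 1 - normprob(-2.054 + 2.547)

# expected pctstck = 0*P(0) + 50*P(50) + 100*P(100); the P(50)
# values (.373 and .394) come from the analogous cut1/cut2 step
ev_no = 50 * .373 + 100 * p100_no
ev_yes = 50 * .394 + 100 * p100_yes
effect = ev_yes - ev_no  # effect of choice, in percentage points
```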
d. We can compute an R-squared for the ordered probit model by using the
squared correlation between the predicted pctstcki and the actual. The
following Stata session does this, after using the "oprobit" command:
. predict p1 p2 p3
(option p assumed; predicted probabilities)
(32 missing values generated)
. sum p1 p2 p3
| pctstck pctstcko
-------------+------------------
pctstck | 1.0000
pctstcko | 0.3119 1.0000
. di .321^2
.103041
The R-squared for the linear regression was about .100, so the R-squared is
only slightly higher for ordered probit. In fact, the correlation between the
fitted values for the linear regression and ordered probit is .998, so the
two sets of fitted values are essentially the same.
15.15. We should use an interval regression model, that is, ordered probit
with known cut points. We would be assuming that the underlying GPA is
normally distributed (conditional on the covariates).
15.16. a. P(yi = 1|xi,ri) = P(xiB + ui > ri|xi,ri) = P[ui/s > (ri - xiB)/s] =
1 - F[(ri - xiB)/s] = F[(xiB - ri)/s].
this would make it easy to obtain standard errors and other test statistics
d. As in part a, the log likelihood for observation i is
yi·log[1 - G(ri - xiB;D)] + (1 - yi)·log[G(ri - xiB;D)].
to pay. Our choice of distribution for u can certainly affect our estimation
of B. If we could observe wtpi for each i, we would use linear regression and
15.17. a. We obtain the joint density by the product rule, since we have
f(y1,...,yG|x,c;Go) = f1(y1|x,c;Go)f2(y2|x,c;Go)···fG(yG|x,c;Go).
respect to the distribution of c given x:
g(y1,...,yG|x;Go) = ∫_{-∞}^{∞} [∏_{g=1}^{G} fg(yg|x,c;Go)] h(c|x;Do) dc,
where c is a dummy argument of integration. Because c appears in each
log{∫_{-∞}^{∞} [∏_{g=1}^{G} fg(yig|xi,c;G)] h(c|xi;D) dc}.
As expected, this depends only on the observed data, (xi,yi1,...,yiG), and the
unknown parameters.
b. Let gt(yt|xi,ai;Q) = [F(j + xitB + x̄iX + ai)]^yt [1 - F(j + xitB + x̄iX +
ai)]^(1-yt). Then, by the product and integration rules,
f(y1,...,yT|xi;Q) = ∫_{-∞}^{∞} [∏_{t=1}^{T} gt(yt|xi,a;Q)] h(a|xi;D) da,
where h(·|xi;D) is the Normal[0, sa^2·exp(x̄iL)] density.
15.19. a, b. Here is the Stata output for black men. I have balanced the
panel, so that only men in the sample from 1981 through 1987 appear.
Probit estimates Number of obs = 4038
LR chi2(1) = 1091.27
Prob > chi2 = 0.0000
Log likelihood = -2248.0349 Pseudo R2 = 0.1953
------------------------------------------------------------------------------
employ | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
employ_1 | 1.389433 .0437182 31.78 0.000 1.303747 1.475119
_cons | -.5396127 .0281709 -19.15 0.000 -.5948268 -.4843987
------------------------------------------------------------------------------
. di normprob(-.540)
.29459852
. di normprob(-.540 + 1.389)
.80205935
. di .802 - .295
.507
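For illustration, the persistence calculation can be mirrored in Python (a sketch; `normprob` mimics Stata's normal CDF, and the coefficients come from the probit output above):

```python
from math import erf, sqrt

def normprob(z):
    # standard normal CDF, matching Stata's normprob()
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

cons, lag = -.5396127, 1.389433  # _cons and employ_1 from above

# estimated P(employ = 1) by last year's employment status
p_lag0 = normprob(cons)        # not employed last year
p_lag1 = normprob(cons + lag)  # employed last year
gap = p_lag1 - p_lag0          # about .51, as with the di commands above
```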
------------------------------------------------------------------------------
employ | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
employ_1 | 1.321349 .0453568 29.13 0.000 1.232452 1.410247
y83 | .3427664 .0749844 4.57 0.000 .1957997 .4897331
y84 | .4586078 .0755742 6.07 0.000 .3104852 .6067304
y85 | .5200576 .0767271 6.78 0.000 .3696753 .6704399
y86 | .3936516 .0774703 5.08 0.000 .2418125 .5454907
y87 | .5292136 .0773031 6.85 0.000 .3777023 .6807249
_cons | -.8850412 .0556041 -15.92 0.000 -.9940233 -.7760591
------------------------------------------------------------------------------
. di normprob(-.885 + .529)
.36092028
------------------------------------------------------------------------------
employ | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
employ_1 | .8987858 .0677035 13.28 0.000 .7660893 1.031482
employ81 | .5662849 .088493 6.40 0.000 .3928418 .739728
y83 | .4339896 .0804062 5.40 0.000 .2763964 .5915828
y84 | .6563064 .0841192 7.80 0.000 .4914358 .821177
y85 | .7919761 .0887153 8.93 0.000 .6180972 .9658549
y86 | .6896298 .0901566 7.65 0.000 .5129262 .8663335
y87 | .8381973 .0910525 9.21 0.000 .6597376 1.016657
_cons | -1.0051 .0660937 -15.21 0.000 -1.134641 -.8755586
-------------+----------------------------------------------------------------
/lnsig2u | -1.178755 .1995222 -1.569811 -.7876984
-------------+----------------------------------------------------------------
sigma_u | .5546726 .0553347 .4561628 .6744557
rho | .2352762 .0358983 .1722434 .3126631
------------------------------------------------------------------------------
Likelihood ratio test of rho=0: chibar2(01) = 47.90 Prob >= chibar2 = 0.000
As yet, we do not know how to translate the coefficient .899 into an estimate
of the state dependence. Note that employ81 is also very significant, showing that
ci and employi,81 are positively correlated. The estimate of sa^2 is (.555)^2,
or sa^2 ≈ .308.
f. The average state dependence, where we average out the distribution of
. gen prbdif87 = normprob((-1.005 + .838 + .899 + .566*employ81)/sqrt(1 +
.555^2)) - normprob((-1.005 + .838 + .566*employ81)/sqrt(1 + .555^2)) if y87
& black
(11493 missing values generated)
ci)], where the expectation is with respect to the distribution of ci. The
estimate is based on E{F[(j + d87 + r + x·yi0)/(1 + sa^2)^(1/2)]} - E{F[(j +
d87 + x·yi0)/(1 + sa^2)^(1/2)]} (by iterated expectations):
N^(-1) S_{i=1}^{N} {F[(j^ + d^87 + r^ + x^yi0)/(1 + s^2a)^(1/2)]
- F[(j^ + d^87 + x^yi0)/(1 + s^2a)^(1/2)]};
see page 495 in the text. Interestingly, .283 is just over half of the
estimate, .507, obtained earlier from the dynamic probit that ignores the
unobserved heterogeneity.
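The gen command above evaluates this expression for each man; here is a minimal Python sketch of the two probability differences being averaged (estimates taken from the random-effects probit output above; `normprob` mimics Stata's normal CDF):

```python
from math import erf, sqrt

def normprob(z):
    # standard normal CDF, matching Stata's normprob()
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# estimates from the random-effects probit output above
j, d87, r, x, sa = -1.005, .838, .899, .566, .555
scale = sqrt(1 + sa**2)

def prbdif(employ81):
    # difference in P(employ87 = 1) with vs. without employment last year
    return (normprob((j + d87 + r + x * employ81) / scale)
            - normprob((j + d87 + x * employ81) / scale))

# averaging the two values over the employ81 distribution gives the
# .283 estimate discussed above
print(round(prbdif(0), 3), round(prbdif(1), 3))
```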
15.20. Since y1 = z1D1 + g(y2)A1 + u1, and we can write, just as before, u1 =
q1v2 + e1, the two-step procedure consistently estimates the scaled
coefficients A1*/(1 - r1^2)^(1/2) and q1/(1 - r1^2)^(1/2), where recall that
A1 is now a vector of
difficult. The key restriction is that the vector of reduced form errors, v2,
would have to be jointly normally distributed (along with u1). But the
endogeneity is solved by including the vector of reduced form residuals in
the probit. For details, see J.M. Wooldridge, "Unobserved Heterogeneity and