
Department of Economics

Harvard University

Economics 1123
Fall 2003
Midterm Exam
Solutions

Question 1 (24 points)


The measure of a no-school day used in regression (1) (column (1) in Table 2) is whether the
day is a teacher meeting day. The measure of a no-school day in regression (2) is whether the
day is a break day (Thanksgiving break, etc.).
a) (9 points) For regression (1):
(i) Provide the estimated effect on the number of incidents of having a no-school day.
estimated effect = .76
(ii) Is this estimated effect large in a real-world sense? Briefly, explain.
This means that there are, on average, .76 more juvenile property crime incidents on a teacher meeting day, which seems moderately large: big enough that it shouldn't be ignored. The mean number of incidents is 3.35, so an increase of .76 represents an increase of just over 20% relative to the mean, and an increase in teen criminal activity of 20% is large enough to be of concern. (Reaching a different judgment is OK if the reasoning is sound.)
(iii) Test the hypothesis that this effect is zero, against the alternative that it is nonzero, at
the 5% significance level.
t = .76/.26 = 2.92, so |t| > 1.96 so the hypothesis is rejected at the 5% significance level.
b) (9 points) Repeat (i) through (iii) for regression (2).
(i) estimated effect = .173
(ii) This means that there are, on average, .173 more juvenile property crime incidents on a break day, which seems pretty small: one crime for every 6 break days, and certainly much less than the estimate in regression (1). (Reaching a different judgment is OK if the reasoning is sound.)
(iii) t = .173/.114 = 1.52, so |t| < 1.96, so the null hypothesis is not rejected.
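The tests in (a)(iii) and (b)(iii) follow one recipe: divide the estimate by its standard error and compare |t| with 1.96. A minimal Python sketch, assuming the large-sample normal approximation; the p-value computed with `math.erfc` is an addition for illustration, not part of the original solution:

```python
import math

def two_sided_test(estimate, se, critical=1.96):
    """t-statistic, two-sided p-value, and 5% decision for H0: coefficient = 0.

    Uses the large-sample standard normal approximation: p = 2 * (1 - Phi(|t|)).
    """
    t = estimate / se
    p = math.erfc(abs(t) / math.sqrt(2))  # erfc(z / sqrt(2)) = 2 * (1 - Phi(z))
    return t, p, abs(t) > critical

# Estimates and standard errors for the no-school-day coefficient, as quoted above
for label, est, se in [("regression (1), teacher meeting day", 0.76, 0.26),
                       ("regression (2), break day", 0.173, 0.114)]:
    t, p, reject = two_sided_test(est, se)
    print(f"{label}: t = {t:.2f}, p = {p:.3f}, reject H0 at 5%: {reject}")
```

The same helper applies to any single-coefficient test in these solutions, for example t = .40/.31 in Question 3(b).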
c) (6 points) Suggest a plausible reason, based on the definitions of the variables in
regressions (1) and (2), why the two estimates differ.
The two types of days are different. On teacher meeting days, one would expect less parental supervision and the teens to have more free time, because this is a normal work day for the parents. On holidays like Thanksgiving or Christmas, teens are more likely to be spending time with their family rather than hanging out. (Other explanations are OK if the logic is sound.)
Question 2 (30 points)
a) (6 points) Explain what is meant by the SER in regression (3).
The SER is 3.89, which means that a typical prediction error from the regression has absolute magnitude 3.89 (in units of incidents per day).
b) (6 points) Interpret the coefficient on teacherday in regression (3).
The effect on the number of incidents of having a teacher meeting day is .87 (an increase of .87
incidents/day), holding constant population, whether the day is a break day, and whether the day
falls in the summer. (Here is another way to say the same thing: holding constant population,
the effect of having a teacher meeting day on what would otherwise be a normal school day is to
increase juvenile property crime incidents by .87 [this wording is OK because including
breakday and summer means that the base case for the comparisons is a normal school day]).
c) (6 points) Suggest a reason why the errors in regression (3) might be heteroskedastic;
explain.
Heteroskedasticity occurs when the variance of the error term depends on one or more of the
regressors.
In this regression, var(u) could plausibly depend on population. Bigger cities will have more
incidents because there are more teens, and it is plausible that the variability of the number of
incidents from day to day is greater in big cities than small cities (the distribution for a small city
might range from 0 to 5; for a big city, it might range from 2 to 20). Thus the spread of the
distribution, as well as its mean, depends on the population; that is, var(u) will be a function of population, one of the regressors.
d) (6 points) Using regression (3), compute the predicted value of the number of incidents on a
teacher meeting day for a city with a population of 200,000 (so pop = 2).
$$\widehat{incidents} = 1.39 + .87\,teacherday + .129\,breakday + .292\,summer + 1.58\,pop$$
$$= 1.39 + .87 \times 1 + .129 \times 0 + .292 \times 0 + 1.58 \times 2 = 5.42$$
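The prediction in (d) multiplies each regression (3) coefficient by the chosen regressor value and sums the results. A minimal sketch, assuming the values used above (teacherday = 1, breakday = 0, summer = 0, pop = 2); the dictionary layout and the `predict` helper are illustrative, not from the exam:

```python
# Coefficients of regression (3) as quoted in the solution above
coef = {"const": 1.39, "teacherday": 0.87, "breakday": 0.129,
        "summer": 0.292, "pop": 1.58}

def predict(values):
    """Predicted incidents: intercept plus sum of coefficient * regressor value."""
    return coef["const"] + sum(coef[name] * x for name, x in values.items())

# Teacher meeting day, not a break day, not summer, population 200,000 (pop = 2)
yhat = predict({"teacherday": 1, "breakday": 0, "summer": 0, "pop": 2})
print(round(yhat, 2))
```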
e) (6 points) The school superintendent in a city with population 200,000 is contemplating
changing a normal school day into a teacher meeting day. Use regression (6) (not
regression (3)) to estimate the effect of this decision on the number of juvenile property
crime incidents.

This can be done using a before and after calculation. Note that the terms that do not
involve teacherday, such as the term in breakday, are the same in the before and after
scenarios and thus cancel out when before is subtracted from after. The operational part of
the calculation (ignoring the terms that drop out) thus is:
after:
$$\widehat{incidents} = -.82\,teacherday + 3.05\,(teacherday \times pop) - 1.21\,(teacherday \times pop^2) + .13\,(teacherday \times pop^3)$$
$$= -.82 \times 1 + 3.05\,(1 \times 2) - 1.21\,(1 \times 2^2) + .13\,(1 \times 2^3) = 1.48$$
before:
$$\widehat{incidents} = -.82\,teacherday + 3.05\,(teacherday \times pop) - 1.21\,(teacherday \times pop^2) + .13\,(teacherday \times pop^3)$$
$$= -.82 \times 0 + 3.05\,(0 \times 2) - 1.21\,(0 \times 2^2) + .13\,(0 \times 2^3) = 0$$
The predicted effect of the decision is after minus before = 1.48; that is, according to regression (6), the predicted effect of the superintendent's decision is an increase of 1.48 juvenile property crime incidents on the proposed teacher meeting day.
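The before-and-after logic in (e) can be written as a function of the teacherday terms only, evaluated at teacherday = 1 and at teacherday = 0. A minimal sketch; the negative signs on .82 and 1.21 are the ones that reproduce the reported total of 1.48, and `teacherday_part` is a hypothetical helper name:

```python
# Interaction coefficients of regression (6); terms without teacherday cancel
b_td, b_td_pop, b_td_pop2, b_td_pop3 = -0.82, 3.05, -1.21, 0.13

def teacherday_part(teacherday, pop):
    """Terms of regression (6) that involve teacherday."""
    return (b_td * teacherday
            + b_td_pop * teacherday * pop
            + b_td_pop2 * teacherday * pop**2
            + b_td_pop3 * teacherday * pop**3)

pop = 2  # population 200,000, in units of 100,000
effect = teacherday_part(1, pop) - teacherday_part(0, pop)  # after minus before
print(round(effect, 2))
```

Because every term involves teacherday, the "before" value is zero for any population; changing `pop` shows how the estimated effect varies with city size under regression (6).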
Question 3 (26 points)
a) (8 points) One possibility is that pop enters the population regression function nonlinearly.
What does regression (5) tell us about this possibility? Briefly explain (be precise).
The hypothesis of linearity implies that the population coefficients on pop² and pop³ are zero. This is tested using the F-statistic, which is 171.5, so with p < .001 the hypothesis is rejected at the 5% (1%, etc.) significance level. This provides evidence that pop enters the population regression function for specification (5) nonlinearly. (Note: it is not a complete answer to examine only the individual t-statistics on the two individual coefficients on pop² and pop³, unless you do so using the Bonferroni critical values.)
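In large samples, an F-statistic testing q restrictions has the F(q, ∞) distribution, which is χ²(q)/q, so its p-value is P(χ²(q) > qF). A minimal sketch using only the Python standard library; because it relies on this large-sample approximation, it need not match p-values computed with finite denominator degrees of freedom exactly (for F = 1.75 with q = 3, from Question 3(b), it gives about .154 rather than the reported .155):

```python
import math

def chi2_sf(x, df):
    """P(chi-squared with integer df > x), via the recurrence
    Q(x; k + 2) = Q(x; k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1)."""
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2             # Q(x; 2) = exp(-x/2)
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1  # Q(x; 1) = erfc(sqrt(x/2))
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q

def f_pvalue(F, q):
    """Large-sample p-value of an F-statistic with q restrictions."""
    return chi2_sf(q * F, q)

print(f_pvalue(171.5, 2))  # Question 3(a): coefficients on pop^2, pop^3 are zero
print(f_pvalue(1.75, 3))   # Question 3(b): no interaction terms in regression (6)
```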
b) (8 points) Another possibility is that the effect on crime of a no-school day is different in
bigger cities than in smaller cities. What do the regression results tell us about this
possibility? Briefly explain (be precise).
To examine whether the effect of teacherday depends on population, we need to examine
regressions with interactions between teacherday and pop. There are two such regressions, (4)
and (6).
Regression (4): There is a single interaction term, teacherday × pop. The t-statistic is t = .40/.31 = 1.29, so the hypothesis that this coefficient is zero in the population is not rejected at the 5% significance level. So there is no statistical evidence of an interaction effect in this regression.
Regression (6): There are three interaction terms, teacherday × pop, teacherday × pop², and teacherday × pop³. The hypothesis of no interaction effect is equivalent to saying that the coefficients on all three terms are zero in the population regression function. This is tested using the F-statistic for these three coefficients, which is F = 1.75. Because its p-value of .155 exceeds .05, we cannot reject (at the 5% significance level) the null hypothesis that all three coefficients are zero. So there is no statistical evidence of an interaction effect in this regression.
Overall, there is no statistical evidence that the effect of a teacher meeting day varies with the
population of the city.
c) (10 points) In words, briefly summarize your conclusions from Table 2 about the effect on
juvenile property crime of having a no-school day because of a teacher meeting day.
• There is no statistical evidence that the effect of a teacher meeting day varies with the population of the city.
• There is statistical evidence that population enters the regression function nonlinearly.
• These two observations point to regression (5) as the preferred specification.
• In regression (5), after controlling for population, the estimated effect of a teacher meeting day, relative to a normal school day, is to increase the number of incidents of juvenile property crime by .87, or nearly one. In a real-world sense this seems pretty big: hold a teacher meeting day, and you get (nearly) one more juvenile property crime incident. This effect is statistically significant at the 5% (and 1%) significance level. [Also, could add: the 95% confidence interval is (.42, 1.32), which in a practical sense is rather tight; it provides strong evidence against really big effects like 10 incidents per teacher meeting day.]

Question 4 (20 points)


a) (10 points) Suggest a policy intervention (that is, a specific program to provide teenagers with some form of supervision) for which these results would be externally valid. Suggest a different policy intervention for which these results would not be externally valid. Explain (be precise).
These results would be externally valid for an intervention in which teacher meeting days were
rescheduled for Saturday or for a proposal to start the school year one day earlier.
These results would not be externally valid for programs where participation is voluntary, for
example starting up a voluntary chess club that would meet on Saturday mornings. The teens
who would join a Saturday morning chess club probably aren't the ones who would be
committing property crimes.

b) (10 points) Suggest two potential threats to the internal validity of your conclusions in 3(c).
That is, provide two potential threats to the internal validity of the regression analysis
summarized in Table 2. Explain why each threat could be relevant to this study (be precise).
Here are five:

(Omitted variable bias) Crime rates vary with income. Thus (i) income belongs in the
regression function. Moreover, teacher meeting days cost the school district money.
Suppose that the poorest cities cannot afford teacher meeting days, and the richest ones have
many teacher meeting days. If so, then (ii) teacherday would be correlated with income.
Thus the two conditions for omitted variable bias would hold and there would be omitted
variable bias. Intuitively, teacherday would be picking up an income effect. This would
result in the OLS estimates understating the effect of teacherday (more income, more teacher
days and less crime).

(Wrong functional form) Whether teens commit a crime could depend on the temperature,
with crime less likely below freezing. If so, then temperature should enter the equation and
moreover there should be interaction terms between temperature and teacherday. According
to this story, the effect on the number of incidents of having a teacher meeting day on a day
with cold weather would be less than if it is held during warm weather.

(Simultaneous causality bias) If superintendents in high-crime cities are particularly
concerned that their students be supervised, then they might schedule teacher meetings late in
the afternoon and have fewer or no teacher meeting days. Thus a large error term (high
crime city) would be associated with a small value of teacherday, so the error term would be
(negatively) correlated with teacherday. The direction of bias would be to understate the
effect of a teacher meeting day. (This could alternatively be thought of as omitted variable
bias, where the omitted variable is the overall crime rate of the city.)

(Errors-in-variables bias) If the administrative records used to construct these data were
incorrect and teacherday was measured incorrectly, then there would be errors-in-variables
bias and the estimated effect would be understated.

(Wrong standard errors) The OLS assumption of i.i.d. sampling is violated here because
there are repeated observations within the same city. This will induce correlation of the error
term across days within a city. For example, if Dayton, Ohio (one of the cities in the data set) has a large
number of incidents on a given Monday, then it might well have a large number on the
following Tuesday. If so, the formula for computing the standard errors is incorrect. This
problem does not produce bias in the OLS estimators but it means that confidence intervals
and test statistics will be wrong, leading to incorrect inferences in general. (There is no
reason you should know this: this explanation, while correct, has not yet been covered in
class.)

Part 1 (25 points)


1)

(5 points) Interpret the coefficient on Beauty in regression (2).

An increase in Beauty of one unit is associated with an increase in the Course Overall score of .275, holding constant the instructor's gender, minority status, status as a native English speaker, whether s/he is in a tenure-track position, whether the course is an introductory course, and whether the course is a one-credit course.

2)

(5 points) Using regression (2), compute a 95% confidence interval for the population
coefficient on Beauty.

$$\hat\beta_{Beauty} \pm 1.96\,SE(\hat\beta_{Beauty}) = .275 \pm 1.96 \times .059 = (.159, .391)$$

3)

(5 points) Define a 95% confidence interval.

Either of these answers receives full credit:


• A 95% confidence interval is an interval, which is a function of the data, that contains the true value of the coefficient in 95% of all samples.
• A 95% confidence interval is the set of values of $\beta_1$ that cannot be rejected using a hypothesis test with a 5% significance level.
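The first definition can be checked by simulation: draw many samples from a known population regression, build the interval $\hat\beta_1 \pm 1.96\,SE(\hat\beta_1)$ in each, and count how often it contains the true $\beta_1$. A minimal sketch; the data-generating process (true slope 0.5, n = 100, normal homoskedastic errors, homoskedasticity-only standard errors) is purely hypothetical:

```python
import math
import random

def ols_slope_and_se(x, y):
    """OLS slope of y on a constant and x, with homoskedasticity-only SE."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b1, math.sqrt(ssr / (n - 2) / sxx)

def coverage(beta1=0.5, n=100, reps=2000, seed=1):
    """Share of simulated samples whose 95% CI contains the true beta1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [1.0 + beta1 * xi + rng.gauss(0, 1) for xi in x]
        b1, se = ols_slope_and_se(x, y)
        hits += (b1 - 1.96 * se <= beta1 <= b1 + 1.96 * se)
    return hits / reps

print(coverage())
```

The printed share should be close to .95; with n = 100 it may come out slightly below, because 1.96 is a large-sample critical value.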

4)

(5 points) Professor Stock is male, not a minority, is a native English speaker, and is tenure
track. Ec1123 is not an introductory course, nor is it a one-credit elective. Suppose that
Professor Stock has average beauty, so his value of Beauty is zero. Use regression (2) to
compute the predicted overall course evaluation score for Ec1123 this semester.

This is solved by substituting the values of the variables into the regression equation:

$$\widehat{CourseOverall} = 4.25 + .275\,Beauty - .239\,Female - .249\,Minority - .253\,NonNativeEnglish - .136\,TenureTrack - .046\,IntroCourse + .687\,OneCreditCourse$$

For Prof. Stock, the predicted value is:

$$\widehat{CourseOverall} = 4.25 + .275 \times 0 - .239 \times 0 - .249 \times 0 - .253 \times 0 - .136 \times 1 - .046 \times 0 + .687 \times 0 = 4.11$$
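The substitution for Professor Stock can be sketched in Python. The negative signs on every coefficient except Beauty and OneCreditCourse are an assumption about minus signs lost in extraction; with TenureTrack = -.136 the result matches the reported 4.11. The `predict` helper is hypothetical:

```python
# Coefficients of regression (2); negative signs as assumed in the lead-in
coef = {"Beauty": 0.275, "Female": -0.239, "Minority": -0.249,
        "NonNativeEnglish": -0.253, "TenureTrack": -0.136,
        "IntroCourse": -0.046, "OneCreditCourse": 0.687}
intercept = 4.25

def predict(x):
    """Predicted CourseOverall; regressors missing from x are set to 0."""
    return intercept + sum(b * x.get(name, 0) for name, b in coef.items())

stock = {"Beauty": 0, "TenureTrack": 1}  # all other regressors equal 0
print(round(predict(stock), 2))
```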

5)

(5 points) The professor in Ec1123 next semester is a tenure-track white male Australian.
Suppose he has a Beauty score of 1.66. Use regression (2) to compute a 95% confidence
interval for the difference between the Ec1123 Course Overall evaluation score next
semester and the Course Overall score this semester.

All the variables for Prof. Elliott are the same as for Prof. Stock, except that Beauty = 1.66; so the difference is $\hat\beta_{Beauty} \times 1.66$, and the 95% confidence interval is

$$\hat\beta_{Beauty} \times 1.66 \pm 1.96 \times SE(\hat\beta_{Beauty}) \times 1.66 = .275 \times 1.66 \pm 1.96 \times .059 \times 1.66 = .457 \pm .192 = (.265, .648)$$
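Both intervals in questions 2 and 5 have the form estimate ± 1.96 × SE. For question 5 the estimate and its standard error are both multiplied by the constant 1.66, because SE(c·β̂) = |c|·SE(β̂). A minimal sketch (`ci95` is a hypothetical helper name):

```python
def ci95(estimate, se, z=1.96):
    """Large-sample 95% confidence interval: estimate +/- z * SE."""
    return estimate - z * se, estimate + z * se

b_beauty, se_beauty = 0.275, 0.059  # regression (2)

# Question 2: CI for the population coefficient on Beauty
print(ci95(b_beauty, se_beauty))

# Question 5: difference = b_beauty * 1.66, with SE scaled by the same constant
delta_beauty = 1.66
print(ci95(b_beauty * delta_beauty, se_beauty * delta_beauty))
```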

Part 2 (24 points)


1)

Suppose you want to estimate a version of regression (2) in which the coefficients on all
regressors except Beauty are the same for men and women; however, the effect of Beauty
can differ for men and women.
a) (4 points) Provide a regression specification that achieves this (be specific).
b) (2 points) In your specification in (a), how would you test the hypothesis that the effect
of Beauty is the same for men and women (be specific)?

a) This is accomplished by using an interaction between Beauty and Female:


$$CourseOverall_i = \beta_0 + \beta_1 Beauty_i + \beta_2 Female_i + \beta_3 (Beauty_i \times Female_i) + \beta_4 Minority_i + \cdots + \beta_9 OneCreditCourse_i + u_i$$
where $\cdots$ denotes all the other regressors in regression (2).
b) Compare the t-statistic testing $\beta_3 = 0$ in this regression (the coefficient on the interaction) to the desired critical value, e.g. 1.96 for a two-sided 5% test.

2)

The coefficient on Beauty drops from .410 in regression (1) to .275 in regression (2).
a) (4 points) Explain why. What does this drop imply about the relation between Beauty
and One-credit course?
b) (4 points) Is your reason in (a) for this decline plausible in a real-world sense?
Explain.

a) Because the coefficient fell, there was omitted variable bias in regression (1); specifically, the coefficient on Beauty was in part reflecting the effect of OneCreditCourse. In regression (2), the effect of OneCreditCourse is positive. This means that the correlation between OneCreditCourse and Beauty must be positive: if they are positively correlated and OneCreditCourse is omitted, then Beauty will pick up the (positive) effect of OneCreditCourse, and the coefficient on Beauty will be larger without OneCreditCourse in the regression than with it.
Alternatively, this positive correlation can be seen directly from the omitted variable bias formula:

$$\hat\beta_1 \xrightarrow{p} \beta_1 + \rho_{Xu}\frac{\sigma_u}{\sigma_X}$$

Because the estimator without OneCreditCourse is too large, the second term in the expression must be positive, so $\rho_{Xu}$ is positive. But because OneCreditCourse enters the regression with a positive coefficient, it enters u positively when it is omitted, so it must be that Beauty and OneCreditCourse are positively correlated.
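The omitted variable bias argument can be illustrated by simulation: when an omitted regressor W has a positive effect on Y and is positively correlated with X, the short-regression slope on X overstates the true effect. A minimal sketch with entirely hypothetical parameters; X plays the role of Beauty and W the role of OneCreditCourse, although W is continuous here for simplicity. With W = 0.5X + noise and var(X) = 1, the formula above gives a probability limit of 0.3 + 0.7 × 0.5 = 0.65:

```python
import random

def slope(x, y):
    """OLS slope from regressing y on a constant and x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)
n = 50_000
beta1, beta2 = 0.3, 0.7  # hypothetical true effects of X and of the omitted W
x = [rng.gauss(0, 1) for _ in range(n)]
w = [0.5 * xi + rng.gauss(0, 1) for xi in x]  # W positively correlated with X
y = [beta1 * xi + beta2 * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

print(slope(x, y), beta1)  # short-regression slope (W omitted) vs. true beta1
```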
b) For Beauty and OneCreditCourse to be positively correlated, it must be that instructors of one-credit courses, like dance or yoga, are better looking than instructors of regular courses, like econometrics. This doesn't seem plausible; if anything, I would think the correlation should be negative. Who would you rather look at anyway, an economist or a dance instructor? The economist, I should think.

3)

The following variables are not in regression (2):


a) The amount of time the instructor spends on course preparation per class.
b) The marital status of the instructor.
For each, explain whether omission of this variable from regression (2) will, in your
judgment, plausibly result in omitted variable bias for the estimated effect of Beauty.
Briefly explain. (5 points each)

a) For a variable to cause omitted variable bias, it must be (i) a determinant of Y (belong in the equation),
and (ii) be correlated with X.
(i) Yes: The amount of time the instructor spends preparing should be a determinant of course
quality and thus course evaluations.
(ii) No: To the extent that diligence and Beauty are uncorrelated, Beauty will be uncorrelated with
the amount of time spent preparing.
Because (i) and (ii) are not both Yes, omission of the amount of class preparation does not result in
omitted variable bias.
Note: Here is a (full credit) argument that (ii) should be yes: Physically unappealing instructors have
learned over the years that they must make up for their bad looks and therefore work harder.
b) Addressing the two criteria:
Two full-credit answers to (i):
(i) No. It isn't obvious why marital status should affect teaching quality.
(i) Yes. The divorced might be less happy and this might be reflected in their teaching, either negatively (bad attitude) or positively (escape into their work, which is teaching).
Three full-credit answers to (ii):
(ii) Yes. Demand for a spouse is increasing in Beauty, so the beautiful are more likely to be married, so Beauty and Married would be positively correlated.
(ii) Yes. Supply of spouses is decreasing in Beauty because the beautiful don't need to get married (more non-marital opportunities) and have shorter spells, so Beauty and Married would be negatively correlated.
(ii) No. There is assortative matching (the technical term), that is, movie stars marry movie stars, economists marry people who look like economists, etc., and everyone gets a match despite their looks, so Beauty and Married are uncorrelated.
Picking Yes to both yields OV bias; picking No to either (or both) yields no OV bias. Personally, I think the most reasonable choice is No (the divorce effect probably is a very small part of teaching quality) and No (the assortative matching argument).

Part 3 (21 points)


1)

Suppose you have data on years of teaching experience (Experience) of the instructor, and
you are considering choosing among three possible specifications:
(i) regression (2) plus Experience
(ii) regression (2) plus Experience, Experience², and Experience³
(iii) regression (2) plus log(Experience)
a)

(6 points) In your judgment (before you know the results of these regressions), which
specification, (i), (ii), or (iii), is the most appropriate? Explain.
b) (4 points) Suppose you estimated regressions for specifications (i) and (ii). How would
you decide, based on the empirical evidence, whether (i) or (ii) is more appropriate?

a) Quite plausibly, there is a decreasing marginal effect of experience: Instructors learn a lot in their first
few years, but after they are farther up the learning curve the incremental amount they learn, and
improve, in subsequent years is less. This means that (i) is a poor initial choice. Both (ii) and (iii) allow for
decreasing marginal effects. An argument for using (ii) is that it is more flexible than (iii) (more terms); for example, it would fit an S-shaped learning curve, but (iii) would not. An argument against (ii) is that it can start to slope down at high years of experience (or maybe this is an argument in favor of (ii)???). Arguments in favor of (iii) are that it is a simple starting point, and that it has a natural interpretation as having a constant increase in Y for a percentage increase in Experience (but perhaps this doesn't seem natural to you, in which case this is an argument against). Any reasoned opinion about whether (ii) or
(iii) is a better starting point, based on at least one correct, substantive difference between the two
functional forms, received full credit.
b) Compute the F-statistic testing the joint hypothesis that the coefficients on Experience² and Experience³ are both zero, against the alternative that one or the other or both is nonzero; reject if the p-value is less than the desired significance level (or if the F-statistic exceeds the appropriate critical value from the $F_{2,\infty}$ distribution).

2)

Consider regression (4).


a) (2 points) Test, at the 5% level, the hypothesis that the coefficient on $Beauty \times D_{Beauty>0}$
is zero, against the alternative that it is nonzero.
b) (4 points) In real-world terms, describe the null hypothesis you just tested, the
alternative, and the conclusion you draw from the hypothesis test.

a) t = .081/.135 = 0.60 < 1.96, so do not reject at the 5% significance level


b) If the coefficient on $Beauty \times D_{Beauty>0}$ is nonzero, then the effect of an increase in Beauty is different for
those with Beauty > 0 than it is for those with Beauty < 0. In real-world terms, under the alternative, the
effect of being more attractive might be important for those who are particularly attractive, but not for the
less-attractive (or the reverse could be true): the Beauty effect might only matter for those who are
especially attractive, for example, and not be important for the rest of us. Under the null hypothesis, the
effect of a change in Beauty is the same, no matter how attractive you are. Because the null hypothesis is
not rejected, we are led to conclude (based on this evidence) that the effect of a change in Beauty is the
same for all levels of Beauty.

3)

(5 points) Test (at the 5% significance level) the hypothesis that the effect on course
evaluations of Beauty is the same for men and for women, against the alternative that these
effects differ.

This problem requires comparing the coefficient on Beauty in regression (5) for men to that in regression (6) for women. Let $\hat\beta_{Beauty}^{men}$ be the estimator for men (regression (5)) and $\hat\beta_{Beauty}^{women}$ be the estimator for women (regression (6)). We are interested in the difference,

$$\hat\Delta = \hat\beta_{Beauty}^{men} - \hat\beta_{Beauty}^{women},$$

and in testing the null hypothesis that the population difference, $\Delta$, is zero. The test statistic is

$$t = \frac{\hat\Delta}{SE(\hat\Delta)},$$

where $SE(\hat\Delta)$ is the standard error of $\hat\Delta$. In general, the SE is an estimator of the square root of the variance. Now

$$var(\hat\Delta) = var(\hat\beta_{Beauty}^{men}) + var(\hat\beta_{Beauty}^{women}) - 2\,cov(\hat\beta_{Beauty}^{men}, \hat\beta_{Beauty}^{women}).$$

Because the estimators $\hat\beta_{Beauty}^{men}$ and $\hat\beta_{Beauty}^{women}$ are computed from different samples, they are independently distributed and therefore uncorrelated, so the covariance in the final expression is zero and

$$var(\hat\Delta) = var(\hat\beta_{Beauty}^{men}) + var(\hat\beta_{Beauty}^{women}).$$

These are population relations, and we need to estimate them. Because the standard error is the square root of the estimator of the variance, an estimator of $var(\hat\Delta)$ is (by definition) the square of its standard error. Similarly, an estimator of $var(\hat\beta_{Beauty}^{men})$ is (by definition) the square of its standard error, and likewise for $var(\hat\beta_{Beauty}^{women})$. Thus,

$$SE(\hat\Delta)^2 = SE(\hat\beta_{Beauty}^{men})^2 + SE(\hat\beta_{Beauty}^{women})^2,$$

where $SE(\hat\Delta)^2$ is the square of $SE(\hat\Delta)$. Thus we have

$$SE(\hat\Delta) = \sqrt{SE(\hat\beta_{Beauty}^{men})^2 + SE(\hat\beta_{Beauty}^{women})^2}.$$

Substituting the empirical values in regressions (5) and (6) into this expression yields:

$$SE(\hat\Delta) = \sqrt{.076^2 + .064^2} = .099.$$

Therefore,

$$t = \frac{\hat\Delta}{SE(\hat\Delta)} = \frac{.384 - .128}{.099} = \frac{.256}{.099} = 2.59,$$

which exceeds 1.96, so you can reject at the 5% significance level.


(Note: a full-credit answer did not need to go through every step of the derivation.)
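The same calculation in Python, using the values from the derivation above. Carrying the unrounded standard error (about .0994) gives t of about 2.58 rather than 2.59; the conclusion is unchanged:

```python
import math

b_men, se_men = 0.384, 0.076      # regression (5)
b_women, se_women = 0.128, 0.064  # regression (6)

diff = b_men - b_women
# Independent samples, so the covariance term is zero and the variances add
se_diff = math.sqrt(se_men**2 + se_women**2)
t = diff / se_diff
print(round(se_diff, 3), round(t, 2), abs(t) > 1.96)
```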

Part 4 (14 points)


1)

(6 points) Suppose you have data on marital status of the instructor (the data record three
possibilities: single and never married, single and divorced, married). Provide a regression
specification that modifies (2) so as to control for marital status (be specific).

To regression (2), add the variables SNMi and Mi, where SNMi = 1 if the instructor is single and never
married and = 0 otherwise, and Mi = 1 if the instructor is married and = 0 otherwise. In this specification,
the third possible binary variable, SDi (= 1 if single and divorced, = 0 otherwise) must be excluded, else
you will have perfect multicollinearity because SNMi + Mi + SDi = 1 = X0i, where X0i is the constant
regressor.
Other specifications are possible, e.g. including only SNMi and SDi, or including all three indicators but
omitting the intercept (dropping X0i). But a specification including SNMi, SDi, Mi, and an intercept is
incorrect: it will have perfect multicollinearity.
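The perfect multicollinearity can be checked directly: because each instructor falls in exactly one marital-status category, SNM + M + SD equals the constant regressor for every observation. A minimal sketch with a small, made-up list of statuses (the data are hypothetical):

```python
# Hypothetical marital-status data for five instructors (illustration only)
status = ["never_married", "married", "divorced", "married", "never_married"]

SNM = [1 if s == "never_married" else 0 for s in status]
M = [1 if s == "married" else 0 for s in status]
SD = [1 if s == "divorced" else 0 for s in status]
X0 = [1] * len(status)  # the constant regressor

# Each instructor is in exactly one category, so the three indicators
# always sum to the constant regressor: an exact linear relationship.
print(all(a + b + c == x0 for a, b, c, x0 in zip(SNM, M, SD, X0)))
```

Dropping any one of the three indicators (or the intercept) removes this exact linear relationship.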

2)

(8 points) Based on the facts given in the following statement and on the empirical results
presented in Table 2, in your judgment is the conclusion in the following statement justified
or not? Explain.
Regression (2) does not control for innate teaching ability. To do so, I obtained data
on the instructor's average teaching evaluations in the previous year and added it to
regression (2). The coefficient on Beauty fell to .051 and was not statistically
significant (SE = .079). Therefore I conclude that the Beauty coefficient in regression
(2) is subject to omitted variable bias and that the true causal effect on course
evaluations of Beauty is effectively zero.

Taking the facts stated at face value, the relevant question is the interpretation of these facts, that is, do
they imply (as the statement says) that the true causal effect on CourseOverall of Beauty is effectively
zero. To evaluate this statement, one must think through what is actually being estimated in this
regression. The coefficient on Beauty in this regression is the effect of a unit change in Beauty, holding constant last year's evaluations and the other variables in (2). Because last year's evaluations are being held constant, the effect of Beauty is its effect on the change in the course evaluations from last year to this year. Thus the regression asks the question: do more attractive individuals have a greater improvement, on average, in their course evaluations than less attractive individuals, holding the other regressors in (2) constant? This is a very different question from the original, and it makes sense that the answer to this alternative question is no. More attractive instructors might have higher evaluations, but those evaluations do not continue to get better and better because they are more attractive; rather, they just stay (on average) at their originally elevated level. So the conclusion is incorrect, because the regression is answering a different question than the original question of interest.
A second, related way to say this is that Beauty affects past evaluations, so past evaluations are one of the channels through which Beauty has an effect; holding past evaluations constant and focusing on the coefficient on Beauty is not the right concept, because holding constant past evaluations in part holds constant the effect of Beauty through this channel.

Part 5 (16 points)


A FAS committee on improving undergraduate teaching needs your help before reporting to
Dean Kirby. The committee seeks your advice, as an econometric expert, about whether FAS
should take physical appearance into account when hiring teaching faculty. (This is legal as long
as doing so is blind to race, religion, age, and gender.) You do not have time to collect your own
data so you must base your recommendations on the regression results in Table 2. Based on
your analysis of Table 2, what is your advice? Justify your advice based on a careful and
complete assessment of the internal and external validity of the results in Table 2.
Notes on Part 5:
• Assume the committee knows econometrics and econometric jargon at the level of this course.
• The committee has experts on ethics, law, and university policy, and it is uninterested in your views about the ethics or practicality of this proposed policy, whether the university should be in the business of maximizing course evaluation ratings, etc.; not that these are unimportant issues, they are simply not the question asked of you.
What follows is a complete answer; other full-credit answers are possible using different (plausible and reasoned) examples, and it is acceptable to reach different conclusions as long as the reasoning is logical and complete.
I begin by discussing the internal and external validity of the results in Table 2, then turn to the
conclusions.
Internal validity.
1. Omitted variable bias. It is always possible to think of omitted variables, but the relevant question is
whether they are likely to lead to substantial omitted variable bias. I do not think that the examples given
in the questions above, marital status and instructor diligence, are likely to be major sources of bias,
although this is speculation and the next study on this topic should address these issues (both can be
measured).
One possible source of OV bias is the omission of the department. French instructors could well be more
attractive than chemists, and if French is more fun (or better taught) than chemistry then the department
would belong in the regression, and its omission could bias the coefficient on Beauty. It is difficult to say
whether this is a major problem or not; one approach would be to put in a full set of binary indicators for the department and see if this changed the results. I suspect this is not an important effect; however, this must be raised as a caveat.
Another possible source of omitted variable bias is that Beauty might be correlated with self-confidence in
front of others, vivaciousness in class, and other aspects of instructor behavior that affect the atmosphere
of the classroom and thus student attentiveness and teaching quality. Said differently, are engaging
instructors more attractive? They could be: over a lifetime, others have enjoyed being in the presence of attractive people because of their appearance, and this makes them comfortable with being the center of attention, essentially with being a performer. So if we had data on Engaging Presence
(correlated with Beauty, a determinant of Course Overall) then we could include it, but we do not.
2. Wrong functional form. The fact that the coefficient on $Beauty \times D_{Beauty>0}$ is statistically insignificant suggests that there might not be important nonlinearities; however, this could (and should) be investigated further, e.g. by putting in quadratics or cubics. Given the insignificance of the coefficient on $Beauty \times D_{Beauty>0}$, doing so seems unlikely to change the results in an important way.
3. Measurement error in the regressors. The Beauty variable is subjectively measured, so it will have measurement error. This is plausibly a case in which the measurement error is more or less random, reflecting the tastes of the six panelists. If so, then the classical measurement error model, in which the measured variable is the true value plus random noise, would apply. This model implies that the OLS coefficient is biased toward zero (here, downward), so the actual effect of Beauty would be greater than is implied by the OLS coefficient. This suggests that the regressions in Table 2 understate the effect of Beauty.
4. Sample selection bias. The only information given in this exam about the sample selection method is
that the instructors have their photos on their Web sites. Suppose instructors who get evaluations below
3.5 are so embarrassed that they don't put up their photos, and suppose there is a large effect of Beauty.
Then, of the least attractive instructors, the only ones who will put up their photos are those with particular
teaching talent and commitment, sufficient to overcome their physical appearance. Thus the effect of
physical appearance will be attenuated, because the error term will be correlated with Beauty (a low value
of Beauty means there must be a large value of u, else the photo wouldn't be posted). This story, while
logically possible, seems a bit far-fetched; whether an instructor puts up his or her photo is more likely
to be a matter of departmental policy, whether the department has a helpful webmaster and someone to
take the photo, and so on. So sample selection bias does not seem (in my judgment) to be a major
threat.
5. Simultaneous causality bias. There is an interesting possible channel of simultaneous causality, in
which good course evaluations improve an instructor's self-image, which in turn gives them a more
resonant, open, and appealing appearance and thus a higher grade on Beauty. Against this, the
panelists were looking at the Web photos, not at the instructors' conduct in class, and were instructed to
focus on physical features. So for the Beauty variable as measured, this effect is plausibly small.
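Threats 3 and 4 above both pull the estimated Beauty coefficient toward zero. The following is a minimal Python simulation sketch of the two mechanisms, not anything taken from the study: every parameter (a true slope of 0.5, Course Overall centered at 3.5, the noise variances, and the 3.5 posting cutoff from the hypothetical in threat 4) is an illustrative assumption. It compares the OLS slope in the full sample, with a noisily rated Beauty, and in a sample truncated on the outcome.

```python
import random

def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate(n=50_000, beta=0.5, cutoff=3.5, seed=1):
    # All parameters are made up for illustration.
    rng = random.Random(seed)
    beauty = [rng.gauss(0.0, 1.0) for _ in range(n)]                 # true Beauty
    overall = [3.5 + beta * b + rng.gauss(0.0, 0.5) for b in beauty]  # Course Overall

    # Threat 3: classical measurement error (rating = true Beauty + independent noise)
    rated = [b + rng.gauss(0.0, 1.0) for b in beauty]
    slope_noisy = ols_slope(rated, overall)

    # Threat 4: photo posted only if Course Overall >= cutoff (selection on the outcome)
    kept = [(b, y) for b, y in zip(beauty, overall) if y >= cutoff]
    slope_selected = ols_slope([b for b, _ in kept], [y for _, y in kept])

    return ols_slope(beauty, overall), slope_noisy, slope_selected

full, noisy, selected = simulate()
print(f"full sample {full:.3f}, noisy rating {noisy:.3f}, posted photos only {selected:.3f}")
```

With these made-up numbers the noisy-rating slope should land near 0.5 × 1/(1 + 1) = 0.25, the classical attenuation factor var(true)/(var(true) + var(noise)), and the slope in the truncated sample also falls well short of 0.5.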
External Validity
The question of external validity is whether the results for UT-Austin in 2000-2002 can be generalized to
Harvard in 2005. The years are close, so the question must focus on differences between students and
the instructional setting.
1. Are UT-Austin students like Harvard students? In their answers to this question, some in the class
suggested that Harvard students seek truth through intellectual argument: beauty matters, but it is
the beauty of the mind, unlike, it seems, for students in Austin, for whom beauty is skin-deep. Perhaps.
2. Do the methods of instruction differ? For example, if beauty matters more in small classes (where you
can see the instructor better) and if the distributions of class size at UT-Austin and Harvard were
substantially different, then this would be a threat to external validity. (The distributions of class sizes
at the two are not so different that this threat is important: both have large, mainly introductory,
courses and small, mainly upper-level, courses.)
3. The Course Overall score is just a student evaluation, not a measure of what students actually learned
or how valuable the course was; perhaps an assessment of the value of the course, five years hence,
would produce a very different effect of Beauty, and that is arguably a more important outcome than the
end-of-semester evaluation. If the committee is interested in long-term learning or in having students who,
with the experience of age, look back on their courses as meaningful, then the Course Overall results
might differ from a longer-term retrospective. (Note: this is a threat to external validity because it
concerns the dependent variable; the measurement error threat to internal validity concerns the
regressors.)


Note that the first two of these concerns could be addressed by performing a similar study using data from
Harvard and, for example, Stanford, Yale, and Princeton. This isn't feasible in the time frame of the
question but would be feasible if this policy issue were to be taken seriously.
Policy advice
For an econometric consultant, the question is whether this represents an internally and externally valid
estimate of the causal effect of Beauty, or whether the threats to internal and/or external validity are
sufficiently severe that the results should be dismissed as unreliable for the purposes of the FAS
committee. A correct conclusion is one that follows logically from the systematic discussion of internal and
external validity.
One subtle point concerns the issue raised in the OV discussion above about the omission of Engaging
Presence: if attractive people have a lifetime of experience being the center of attention, perhaps they
get better teaching evaluations because they have become good at being the center of attention, not
because they are beautiful. The reason this is subtle is that the import of this criticism depends on the
policy under consideration. If the FAS committee were considering a policy of free plastic surgery for its
least attractive faculty members, then this criticism would be key: perhaps it is too late, their lifetime of
being unattractive has made them uncomfortable as a center of attention, and changing their physical
appearance won't change their lifetime of experiences. But that is not the policy under discussion: in the
policy under discussion, if you hire a beautiful person, you hire someone with a lifetime of experience being
beautiful, so it actually doesn't matter whether it is the beauty per se or the acquired comfort and enjoyment
of being a performer. Both come as part of the package that the FAS committee is considering buying.
Taking the foregoing concerns into account, my own conclusion is that I would be surprised if the threats
to internal and external validity above were sufficiently important, in a quantitative sense, to change the
main finding from Table 2 that the effect on Course Overall of Beauty is positive and quantitatively large.
Moreover, the discussion of internal and external validity and the observations in the previous paragraph
suggest that this is a causal effect in the sense of interest to the committee, that is, changing the
composition of faculty to be more attractive would increase teaching evaluations (even though the channel
might not be simply the instructor's physical appearance). So my advice, as econometric consultant,
would be that implementing a policy of affirmative action for attractive people (all else equal, hire the
better-looking candidate) would, in expectation, improve Course Overall scores.
A good econometric policy advisor always has some suggestions for further research (which s/he would
be happy to do, for a fee). One thing a follow-on study could do is focus on Ivy League institutions and
collect data on some potential omitted variables (marital status, department offering the course, etc.).
Another, very different study would be a randomized controlled experiment that gets directly at the
policy question. Some department heads would be instructed to assign their most attractive teachers
to the largest introductory courses (treatment group); others would be instructed to maintain the status quo
(control group). The study would assess whether there is an improvement in evaluation scores (weighted
by class size) in the treatment group. A positive result would indicate that this treatment results in an
increase in customer satisfaction.
Finally, some thoughts that were out of bounds for this exam, but would be relevant and important to raise
in the report of an econometric consultant to the FAS committee. First, academic output is not solely
teaching, and there is no reason at all that the results here would carry over to an analysis of research
output, or even of graduate student advising and teaching (the data are only for undergraduate courses);
indeed, the sign might be the opposite for research. Second, the econometric consultant could raise the
question of whether Beauty has the same moral status as gender or race, even if it does not have the same
legal status as a protected class; answering this question is outside the econometric consultant's area of
expertise, but it is a legitimate question to raise and to frame so that others can address it.


Part 1 (25 points)


1)

(5 points) Interpret the coefficient on Greek in regression (1).

Holding constant the student's age and gender, being in a fraternity or a sorority is associated with an
increase of 1.87 days of binge drinking (out of 30 days). (Different wording: the effect on binge30 [the
number of binge-drinking days out of 30 days] of being in a fraternity or sorority is 1.87 days, controlling
for the student's age and gender.)

2)

(5 points) Explain why the coefficient on Greek decreases from regression (1) to regression
(2).

The coefficient on Greek in regression (1) evidently has omitted variable bias. Students on a sports team
do more binge drinking (the coefficient on sports is positive), and being on a sports team seems to be
positively correlated with Greek (this is also common sense). That is, sports is a positive determinant of
binge30 and is positively correlated with Greek, so when sports is omitted in (1), Greek must capture both
effects (the frat effect and the sports effect); thus the coefficient on Greek is overstated in (1) and falls
when sports is included in the regression.
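The omitted variable argument can be made exact with OLS algebra: in any sample, the short-regression coefficient on Greek equals the long-regression coefficient on Greek plus the coefficient on sports times the slope from regressing sports on Greek. A sketch in Python on simulated data; the participation rates, coefficients, and noise level are invented for illustration and are not taken from Table 2.

```python
import random

def demean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def short_and_long(y, x1, x2):
    """OLS slopes: y on x1 (short), y on x1 and x2 (long), and x2 on x1 (auxiliary)."""
    y, x1, x2 = demean(y), demean(x1), demean(x2)   # demeaning absorbs the intercept
    s11, s12, s22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    s1y, s2y = dot(x1, y), dot(x2, y)
    short = s1y / s11
    delta = s12 / s11
    det = s11 * s22 - s12 ** 2                      # solve the 2x2 normal equations
    beta1 = (s22 * s1y - s12 * s2y) / det
    beta2 = (s11 * s2y - s12 * s1y) / det
    return short, beta1, beta2, delta

rng = random.Random(0)
n = 20_000
greek = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
# Hypothetical: Greeks are more likely to play a sport, so sports and Greek are positively correlated
sports = [1 if rng.random() < (0.5 if g else 0.2) else 0 for g in greek]
binge30 = [1.0 + 1.0 * g + 1.5 * s + rng.gauss(0.0, 2.0) for g, s in zip(greek, sports)]

short, b_greek, b_sports, delta = short_and_long(binge30, greek, sports)
print(f"short {short:.3f} = long {b_greek:.3f} + {b_sports:.3f} x {delta:.3f}")
```

Because both the sports coefficient and the auxiliary slope are positive, the short coefficient exceeds the long one, which is the pattern described for regressions (1) and (2).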

3)

(5 points) Define heteroskedasticity and suggest a reason why the error in regression (3)
might be heteroskedastic.

Heteroskedasticity occurs when the variance of the error term, conditional on the regressors, depends on
the regressors. If, for example, the dispersion (variance) of the number of binge-drinking days is larger for
fraternity members than for dorm residents, the variance of the error would depend on Greek.

4)

(5 points) Using regression (3), predict the number of binge-drinking days in a 30-day
period for an 18-year old white male Freshman who belongs to a fraternity and is on a
sports team.

$$\widehat{binge30} = .91 + 1.48\times 1 - .96\times 0 + .09\times 18 + 1.15\times 1 + .35\times 1 + .00\times 0 + .22\times 0 - 2.08\times 0 - 1.54\times 0$$
$$= .91 + 1.48 + .09\times 18 + 1.15 + .35 = 5.51 \text{ binge-drinking days (per 30 days)}$$

5)

(5 points) All the respondents are either Freshmen, Sophomores, Juniors, or Seniors, yet
Freshman, Sophomore, Junior, age, and the constant regressor (the intercept) are not
perfectly multicollinear in regression (3). Describe a counterfactual situation in which
these variables would be perfectly multicollinear.

If all Freshmen were 18, all Sophomores 19, all Juniors 20, and all Seniors 21, then Freshman,
Sophomore, Junior, age, and the constant regressor (1) would be perfectly multicollinear:
$$18\times Freshman + 19\times Sophomore + 20\times Junior + 21\times(1 - Freshman - Sophomore - Junior) = age$$
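The counterfactual can be checked mechanically. This Python sketch builds the five columns (constant, Freshman, Sophomore, Junior, age) under the hypothetical one-age-per-class assumption, verifies the identity row by row, and computes the column rank with exact fractions; the helper names `design_row` and `matrix_rank` are written here for illustration.

```python
from fractions import Fraction

CLASS_AGE = {"Freshman": 18, "Sophomore": 19, "Junior": 20, "Senior": 21}  # counterfactual ages

def design_row(year):
    """[constant, Freshman, Sophomore, Junior, age] for a student in the given class."""
    return [1, int(year == "Freshman"), int(year == "Sophomore"),
            int(year == "Junior"), CLASS_AGE[year]]

def matrix_rank(rows):
    """Rank via Gauss-Jordan elimination in exact rational arithmetic."""
    m = [[Fraction(v) for v in r] for r in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank

X = [design_row(y) for y in ["Freshman", "Sophomore", "Junior", "Senior"] * 5]
# age is an exact linear combination of the other four columns in every row
for const, fr, so, jr, age in X:
    assert age == 18 * fr + 19 * so + 20 * jr + 21 * (const - fr - so - jr)
print(matrix_rank(X), "of", len(X[0]), "columns")  # 4 of 5 columns

# With even one Freshman aged 19 the identity breaks and the columns are independent
X_actual = X + [[1, 1, 0, 0, 19]]
print(matrix_rank(X_actual), "of", len(X_actual[0]), "columns")  # 5 of 5 columns
```

A rank of 4 with 5 columns is exactly perfect multicollinearity: OLS cannot separate the coefficient on age from the class indicators. In the real data, where ages vary within a class, the columns have full rank.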

Part 2 (26 points)


1)

(6 points) Consider two white male frat-member non-sports Sophomores, one of whom is
18 years old and the other is 20 years old. Using regression (5):
a) (3 points) Compute the difference in the predicted values of binge30 for these two
students;

All their Xs have the same value except for age, so

$$\Delta\widehat{binge30} = \hat\beta_{age} \times \Delta age = \hat\beta_{age} \times 2 = .09 \times 2 = .18 \text{ binge-drinking days (per 30 days)}$$

b) (3 points) Compute a 95% confidence interval for the difference in part (a).

$$SE(\Delta\widehat{binge30}) = SE(\hat\beta_{age} \times 2) = 2\,SE(\hat\beta_{age}) = 2 \times .10 = .20,$$

so the 95% confidence interval for Δbinge30 is .18 ± 1.96 × .20 = [-.21, .57].
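The arithmetic in (a) and (b) generalizes to any change Δx in a single regressor: the predicted change is β̂·Δx with standard error |Δx|·SE(β̂). A small Python helper (the name `change_and_ci` is ours) reproduces the numbers above, and the same formula gives the corruption interval in Part I, question 1.

```python
def change_and_ci(beta_hat, se_beta, delta_x, z=1.96):
    """Predicted change beta_hat * delta_x, its SE |delta_x| * SE(beta_hat), and a 95% CI."""
    est = beta_hat * delta_x
    se_change = abs(delta_x) * se_beta
    return est, se_change, (est - z * se_change, est + z * se_change)

# Two otherwise identical students aged 18 and 20, regression (5)
est, se_change, (lo, hi) = change_and_ci(beta_hat=0.09, se_beta=0.10, delta_x=2)
print(f"change {est:.2f}, SE {se_change:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
# change 0.18, SE 0.20, 95% CI [-0.21, 0.57]

# A .01 increase in LowEd share, corruption regression (2)
est2, se2, (lo2, hi2) = change_and_ci(beta_hat=18.4, se_beta=8.7, delta_x=0.01)
print(f"change {est2:.3f}, SE {se2:.3f}, 95% CI ({lo2:.3f}, {hi2:.3f})")
# change 0.184, SE 0.087, 95% CI (0.013, 0.355)
```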

2)

(5 points) Use regression (4) to test the null hypothesis that the relationship between age
and binge drinking is linear, against the alternative hypothesis that the relationship is
possibly a quadratic, at the 5% significance level. Is the null hypothesis rejected?

The null hypothesis that the relationship is linear corresponds to the coefficient on age² being zero; the
alternative that the relationship is quadratic corresponds to this coefficient being nonzero. Thus the
hypothesis can be tested using the t-statistic testing that the coefficient on age² is zero. That t-statistic is
t = -.081/.062 = -1.31, which is less than 1.96 in absolute value, so the hypothesis is not rejected at the 5%
significance level.
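A two-sided test based on the large-sample normal approximation can be sketched in Python with the standard library alone: `math.erfc(|t|/√2)` equals the two-sided p-value 2[1 − Φ(|t|)]. The helper name `two_sided_test` is illustrative.

```python
import math

def two_sided_test(estimate, se, alpha=0.05):
    """t-statistic, two-sided normal p-value, and whether H0: coefficient = 0 is rejected."""
    t = estimate / se
    p = math.erfc(abs(t) / math.sqrt(2))   # = 2 * (1 - Phi(|t|)) for a standard normal
    return t, p, p < alpha

# Coefficient on age^2 in regression (4)
t, p, reject = two_sided_test(-0.081, 0.062)
print(f"t = {t:.2f}, p = {p:.2f}, reject at 5%: {reject}")
```

Comparing the p-value with 0.05 is equivalent to comparing |t| with 1.96, so the conclusion matches the one above.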

3)

(5 points) Suppose you hypothesized that female athletes are not prone to binge drinking,
even though male athletes might be. How would you modify regression (3) to test this
hypothesis? Be precise.

You want to allow the possibility that the effect of sports is nonzero for men and zero for women. This can
be achieved by defining the variable male = 1 - female, creating the interaction variable male × sports,
and including it in regression (3). The relevant part of the regression would thus be

$$binge30 = \cdots + \beta_4\, sports + \beta_5\,(male \times sports) + \cdots$$

so the effect of sports for women is β₄ and the effect for men is β₄ + β₅. Thus the hypothesis that the effect
for women is zero (but not necessarily for men) can be tested by testing β₄ = 0.
Alternatively, if you use the interaction female × sports, the relevant part of the regression would be

$$binge30 = \cdots + \beta_4\, sports + \beta_5\,(female \times sports) + \cdots$$

so the effect of sports on binge30 for women in this case is β₄ + β₅, and the test would be of the hypothesis
that β₄ + β₅ = 0.

4)

(5 points) The p-value is missing in Table 2 for one of the F-tests based on regression (6).
Estimate this missing p-value and briefly explain how you did so.

The F-statistic is 3.45 and there are q = 2 restrictions being tested (the coefficients on Black and
Hispanic/other), so the critical value is obtained from the F(q,∞) distribution with q = 2. From the table
attached to the exam, the 5% critical value is 3.00 and the 1% critical value is 4.61. Because 3.45 is
somewhat larger than 3.00, a reasonable guess is that p ≈ .04 or p ≈ .03. [The actual p-value is .032.]
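For q = 2 the guess can be made exact. Since q·F(q,∞) is distributed χ²(q), and the upper tail of a χ²(2) variable is exp(-x/2), the tail probability of F(2,∞) is simply exp(-f). A short Python check (helper names are illustrative):

```python
import math

def f2_inf_pvalue(f_stat):
    """P(F > f_stat) for F with 2 numerator and infinite denominator degrees of freedom.

    2F is chi-squared with 2 df, whose upper tail is exp(-x/2), so P(F > f) = exp(-f).
    """
    return math.exp(-f_stat)

def f2_inf_critical(alpha):
    """Critical value c with P(F(2, inf) > c) = alpha."""
    return -math.log(alpha)

print(round(f2_inf_critical(0.05), 2), round(f2_inf_critical(0.01), 2))  # 3.0 4.61
print(round(f2_inf_pvalue(3.45), 3))                                      # 0.032
```

The closed form recovers both tabulated critical values and the reported p-value of .032; for other values of q a chi-squared or F tail function (or the table) is needed instead.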

5)

(5 points) In everyday language, what is the difference in the interpretation of the coefficient on Greek in
regression (3) and regression (6)?

The two regressions differ only by the inclusion of alcohol30 in regression (6). In regression (3), the
coefficient on Greek is the effect on binge drinking of being in a fraternity (holding constant the other
regressors in (3)), while in regression (6) the coefficient on Greek is the effect on binge drinking of being in
a fraternity, holding constant the other regressors in (3) and the number of days out of 30 on which you
drink alcohol. Thus regression (3) measures the effect of Greek on the number of binge-drinking days,
whereas regression (6) measures its effect on the number of binge-drinking days holding constant the total
number of drinking days. In other words, regression (3) examines the total quantity of binge drinking;
regression (6) examines the fraction of drinking that is binge drinking.
One can interpret regression (6) by evaluating the coefficient on Greek at the mean values in
the data. The mean value of binge30 is 2.35 and the mean value of alcohol30 is 5.12, so evaluated at the
means 2.35/5.12 = 46% of drinking days are binge-drinking days. The estimated effect of Greek in (6) is to
increase binge30 by .37, holding constant the total number of drinking days, so (evaluated at the mean)
Greek increases the fraction of binge-drinking days, out of total drinking days, by .37/5.12 ≈ 7 percentage
points, a modest (but statistically significant) increase relative to the overall ratio of averages of 46%.
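The evaluation-at-the-means calculation above, as a few lines of Python using only the numbers quoted in the text:

```python
mean_binge30, mean_alcohol30 = 2.35, 5.12   # sample means quoted above
coef_greek_reg6 = 0.37                      # coefficient on Greek in regression (6)

baseline_share = mean_binge30 / mean_alcohol30   # share of drinking days that are binge days
greek_shift = coef_greek_reg6 / mean_alcohol30   # extra binge days per drinking day for Greeks

print(f"{baseline_share:.0%} of drinking days are binge days; "
      f"Greek adds {greek_shift * 100:.0f} percentage points")
# 46% of drinking days are binge days; Greek adds 7 percentage points
```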

Note to Parts 3 and 4


Your answers should be based on the results in Table 2 and your knowledge of econometrics, not
on your beliefs about personal choice, equity, probity, etc.; these are questions about the
empirical results, not about your opinions concerning the Greek system, drinking, or such
matters.
Part 3 (24 points)
Do you agree or disagree with the following statements? Briefly explain why. (8 points each)
1)

Binge drinking is a problem that primarily involves only a segment of the student
population.

To a considerable extent this is true. For example, on the one hand, the predicted value of binge drinking
for the 18-year-old Freshman sports-playing fraternity member in Part 1, question 4 is 5.51 days out of 30
days, or more than one day per week. On the other hand, a 19-year-old Sophomore non-frat, non-sports
Hispanic female has a predicted value of -0.88 binge-drinking days:

$$\widehat{binge30} = .91 + 1.48\times 0 - .96\times 1 + .09\times 19 + 1.15\times 0 + .35\times 0 + .00\times 1 + .22\times 0 - 2.08\times 0 - 1.54\times 1$$
$$= .91 - 1.96 + .09\times 19 + .00 - 1.54 = -0.88 \text{ binge-drinking days (per 30 days)}$$

This negative value does not make sense in context, but it does indicate that the binge-drinking rate for
some segments of the population is nearly zero, much less than for other segments (male sports-playing
Greeks).
One need not go through this predicted value exercise to see that average binge-drinking rates differ
sharply across groups. The overall mean binge-drinking rate is 2.35 days per 30 days, and the coefficient
on Black in (3) is -2.02, so this alone suggests that the mean binge-drinking rate for non-Greek Black
students is very much lower than the all-student average.

2)

Sororities are just as bad as fraternities, at least from the perspective of binge drinking.

False. Regression (5) addresses this by including the interaction between Greek and female. The effect
of Greek for men is 2.69 (holding constant the other regressors), while the effect of Greek for women is
2.69 - 2.06 = 0.63, far less in real-world terms. The difference is statistically significant at the 1% level
(t = -2.06/.66 = -3.12), so the hypothesis that the effect of Greek membership on binge drinking is the same
for fraternities and sororities is rejected at the 1% level, with the effect of sororities far less than that of
fraternities.

3)

Freshmen, who are learning how to cope with the new freedoms of college, have the
highest incidence of binge drinking; as students gain college experience, binge drinking
becomes much less of a problem.

Depending on which regression you look at, there is some growing-up effect (lower binge-drinking rates
for older classes), but the quantitative effect is fairly small. In all of regressions (3)-(6), the binge-drinking
rate is between .35 and .76 days (per 30 days) greater for Freshmen than for Seniors, and in all these
regressions the rate for Freshmen exceeds the rate for other classes. On the other hand, in regressions
(3)-(5) these differences between classes are not statistically significant (either the coefficient on
Freshman alone, or jointly, using an F-statistic testing that the coefficients on Freshman, Sophomore, and
Junior are all zero), so in those specifications the hypothesis of no difference among classes cannot be
rejected.
In regression (6), the class binary variables have jointly significant coefficients and the coefficient on
Freshman is individually statistically significant (at the 1% level), so in that regression there is evidence of
a statistically significant growing-up effect. Bear in mind, however, that regression (6) has a different
interpretation: it considers the number of binge-drinking days given how many drinking days you have,
that is, the fraction of drinking days that are binge-drinking days (this fraction is higher for Freshmen than
for other classes). Taking regressions (3)-(6) together, Freshmen have more binge-drinking episodes,
other things constant, and those binge-drinking episodes constitute a higher fraction of their drinking days;
this is consistent with the growing-up hypothesis. However, the first part of this summary (relying on
(3)-(5)) is not strongly supported by the results, because the effects in question have wide confidence
intervals that include zero.

Part 4 (25 points)


1)

(8 points) Summarize the results in Table 2 about the effect on binge drinking of fraternity
and sorority membership. For the purpose of this question, take the results in the table at
face value, that is, do not consider threats to the validity of these results.

Taken at face value, the results indicate that fraternity membership is associated with a statistically
significant and large (in a real-world sense) increase in the binge-drinking rate, holding constant other
student characteristics. Based on regression (5), fraternity membership is associated with an increase in
the binge-drinking rate of nearly 3 days per month on average (t = 4.80), holding constant other student
characteristics. Moreover, regression (6) tells us that when students drink, fraternity membership makes it
more likely for that drinking to be binge drinking.

In contrast, the results in the table, regression (5) in particular, indicate that sorority membership has much
less effect on binge drinking (the hypothesis that it has no effect cannot be tested using only the results
provided in the table), increasing binge drinking by only two-thirds of a day per month.
In addition, the results in regression (6) indicate that a greater fraction of drinking days are binge-drinking
days for Greek students than for non-Greek students. However, as a fraction of days, this difference is
small. Thus, the major channel of increasing binge drinking that can be deduced from these results is that
total drinking goes up and, with it, binge drinking, although the fraction of binge drinking out of total drinking
also goes up slightly.
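Why the no-sorority-effect hypothesis cannot be tested from the table alone: the sorority effect is β_Greek + β_Greek×female, and its standard error depends on the covariance between the two estimators, which Table 2 does not report. The Python sketch below backs out SE(β_Greek) = 2.69/4.80 from the t-statistic quoted above, takes SE = .66 for the interaction, and treats the correlation ρ between the two estimators as an unknown assumption to vary.

```python
import math

BETA_GREEK, SE_GREEK = 2.69, 2.69 / 4.80   # regression (5); SE backed out from t = 4.80
BETA_INTER, SE_INTER = -2.06, 0.66         # Greek x female, regression (5)

def t_for_sorority_effect(rho):
    """t-statistic for beta_Greek + beta_Greek*female under an ASSUMED correlation rho."""
    cov = rho * SE_GREEK * SE_INTER
    se_sum = math.sqrt(SE_GREEK ** 2 + SE_INTER ** 2 + 2 * cov)
    return (BETA_GREEK + BETA_INTER) / se_sum

for rho in (-0.95, -0.5, 0.0, 0.5):
    print(f"rho = {rho:+.2f}: t = {t_for_sorority_effect(rho):.2f}")
```

Depending on ρ, the same point estimate (0.63) can look insignificant or significant at the 5% level, which is why the question cannot be settled without the covariance.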

2)

(10 points) Provide two threats that, in your judgment, are the most important threats to the
internal validity of the results discussed in your response to Part 4/Question 1 (be specific
and explain your reasoning).

One major threat is omitted variable bias associated with unobserved student characteristics. Suppose
students who want a party life will drink substantially wherever they reside, and fraternities simply
provide an opportunity for these students to live with like-minded fellow partiers. Then the causal effect of
being in a fraternity could be small or even zero: Greek is positively correlated with a third variable (party
animal) which is an unobserved individual characteristic and is a determinant of binge30, so the
coefficient on Greek in regressions (1)-(5) is biased upward. This omitted variable bias could also be
present in regression (6), but the interpretation is more subtle: it is not just the amount of drinking but the
nature of the drinking (binge vs. non-binge) that is being studied in regression (6), so there the omitted
personal characteristic would be one associated not with the student's desire to use alcohol, but with
using it in a binge-drinking way.
Note: you could alternatively think of this channel as reverse causality: being a binge drinker causes you
to join a fraternity. This would not be incorrect, but it is probably not as helpful as thinking that there is a
third variable, party animal, that (for some) leads to both binge drinking and joining a fraternity.
Mathematically these are very closely related, and both lead to a violation of the first least squares
assumption and upward bias in the OLS estimator of the coefficient on Greek.
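The party-animal story can be simulated directly. In this Python sketch the causal effect of Greek is set to zero by construction, an unobserved trait raises both the chance of joining and the number of binge days, and OLS (here, a difference in means on a binary regressor) still finds a sizable positive coefficient. All probabilities and effect sizes are invented for illustration.

```python
import random

def diff_in_means(y, d):
    """OLS slope on a single binary regressor = mean(y | d = 1) - mean(y | d = 0)."""
    y1 = [a for a, b in zip(y, d) if b == 1]
    y0 = [a for a, b in zip(y, d) if b == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)

rng = random.Random(42)
n = 20_000
party = [1 if rng.random() < 0.3 else 0 for _ in range(n)]               # unobserved trait
greek = [1 if rng.random() < (0.6 if p else 0.2) else 0 for p in party]  # party animals join more
CAUSAL_EFFECT_OF_GREEK = 0.0                                              # zero by construction
binge30 = [1.0 + CAUSAL_EFFECT_OF_GREEK * g + 3.0 * p + rng.gauss(0.0, 2.0)
           for g, p in zip(greek, party)]

est = diff_in_means(binge30, greek)
print(f"OLS coefficient on Greek: {est:.2f} (true causal effect {CAUSAL_EFFECT_OF_GREEK})")
```

The estimate picks up the party-animal effect times the gap in party-animal prevalence between Greeks and non-Greeks, exactly the upward bias described above.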
A second threat arises from these data being self-reported, raising questions about the veracity of the
binge30 responses. Note that this is not an issue of errors-in-variables bias associated with
mismeasurement of the Xs: it is plausible that the students accurately reported their age, gender, college
year, race, and sports teams. Instead, this is best thought of as omitted variable bias, where the omitted
variable is respondent exaggeration. If frat respondents systematically exaggerated their drinking
exploits but other respondents did not, then Greek would be correlated with the omitted variable
respondent exaggeration, so the coefficient on Greek would reflect not a true effect of being in a frat but
merely the fact that frat members exaggerated more than others. Of course, the bias could go in the other
direction if frat members tried to hide their binge drinking and systematically underreported it relative to
non-frat members.
A third threat is possible sample selection bias because of the 65% response rate. For sample selection
bias to occur, the sample must be selected by a mechanism that is related to the dependent variable. For
example, if heavy binge drinkers were too busy drinking to fill out the survey, then there would be sample
selection bias (the selection mechanism, being too busy drinking, is related to the dependent variable,
binge drinking).

3)

(7 points) Consider the concerned college administrator of the introduction, who would like
to ban the Greek system and replace it with dorms or off-campus housing. All things
considered, do the results in Table 2 support this recommendation? Specifically, why or
why not?
First, as pointed out in the response to Part 4/question 1, binge drinking is something associated with
fraternities, not sororities, so banning the Greek system is too wide a policy; even taking the results at face
value, the ban (if justified by reducing binge drinking) should be applied to fraternities only. This said, the
results, taken at face value, would be evidence in favor of a ban on fraternities.
Second, for the ban-fraternity policy to be justified, the results must be internally and externally valid, that
is, the coefficient must be an unbiased estimate of the causal effect, and it must be possible to apply and
exploit that causal effect in the way envisioned by the policy.
Concerning internal validity, the criticism raised in response to Part 4/question 2 (that party animals join
fraternities, instead of fraternities producing party animals) is a serious one that is not adequately
addressed by this study. So, on the count of internal validity, there is good reason to doubt that this is
an unbiased estimator of a causal effect, and one should hesitate to draw policy conclusions from
these results.
Concerning external validity, as stated the policy suggests that simply banning fraternities would eliminate
the environment exerting a bad influence on the students. Clearly, simply taking the Greek letters off the
front of the buildings but leaving everything else unchanged would have little effect, although that would
possibly count in a narrow sense as moving from Greek = 1 to Greek = 0. Similarly, closing down frat
living arrangements could in many cases simply move the members off campus to a similar situation,
except that they would not be under college jurisdiction. In this sense, the coefficient on Greek (ignoring
internal validity concerns) needs to be interpreted as eliminating not just fraternities in a narrow sense but
the social environment of fraternities for such a recommendation to be supported.
Taken together, these caveats indicate that the regression results, at least as stated in Table 2, are not by
themselves a suitable basis for prescribing policy.
This said, I would not dismiss the results entirely; instead, a more nuanced interpretation is that while they
are not definitive, they do quantify a strong and large link between fraternities and binge drinking.
This link is part of conventional wisdom on campus, but the study and the analysis confirm and,
importantly, quantify the conventional wisdom (bear in mind that it could have overturned the
conventional wisdom, but it did not).
The next step in this research program is to try to measure in a more convincing way the causal effect on
binge drinking of fraternity membership. Barring a randomized controlled experiment, the methods that
are available to try to do this are natural experiments and instrumental variables regression, topics we will
take up in the second half of the semester.

Questions for Part I (25 points)

Answer these questions in blue book #1

1) (3 points) Using regression (2), construct a 95% confidence interval for the effect on the
corruption rate of an increase in LowEd share of .01 (that is, of a 1 percentage point increase
in the percent of the adult population with at most a high school degree).

Δ(LowEd share) = .01, so the predicted change in the corruption rate is .01 × 18.4 = .184 and
the standard error of this predicted change is .01 × 8.7 = .087, so the 95% confidence interval is
.184 ± 1.96 × .087 = .184 ± .171 = (.013, .355).
2) Consider regression (3):
(a) (3 points) Test the hypothesis that the population coefficient on LowEd share × Voting
share is zero, against the alternative that it is nonzero.
The t-statistic is t = 47.7/94.8 = 0.50, and |t| < 1.96, so we do not reject the null at the 5%
significance level.
(b) (3 points) Test the hypothesis that citizen participation, specifically the presidential
voting share, does not affect corruption, against the alternative that the voting share
affects corruption.
Under the null hypothesis, Voting share does not enter regression (3), which means that the
coefficients on Voting share and LowEd share × Voting share must both be zero. The F-statistic
testing this hypothesis is .52 with p = .60 > .05, so we do not reject the null hypothesis at the 5%
significance level.
3) Do you agree or disagree with the following statements? Explain (3 points each).
(a) Because immigrants are less knowledgeable about the U.S. legal system, they are more
susceptible to governmental corruption. The regression results in Table 1 show that this
is true: more foreign-born citizens, more corruption.
Disagree. The coefficient on Foreign-born share is positive so the sign of the estimated
coefficient indicates that more foreign-born citizens is associated with more corruption, however
the t-statistic is 21.3/14.3 = 1.49 < 1.645 so the coefficient is not significant at the 10% level, so
there is no statistically significant support for this claim at conventional levels of significance.
(b) The R² of regression (2) is low. Thus there are important determinants of corruption
omitted, and therefore the coefficient on LowEd share in regression (2) is biased because
of omitted variable bias.
Disagree. The low R² indicates that there might be determinants of corruption omitted from the
regression, but that alone does not mean there is omitted variable bias. For omitted variable
bias to exist, the omitted variables (1) need to be determinants of Y and (2) need to be correlated
with the included regressor(s). The low R² indicates that (1) is probably true (although not
necessarily: the error term could just be measurement error), but the R² is silent on point (2).
(c) The regression results in Table 1 are flawed because they use heteroskedasticity-robust
standard errors: if the errors really are homoskedastic, then these standard errors will be
incorrect. The table should instead report standard errors that are correct even under
homoskedasticity.
Disagree. Heteroskedasticity-robust standard errors are valid (in large samples) whether the errors are
heteroskedastic or homoskedastic.
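Statements (b) and (c) can both be illustrated by simulation. This Python sketch uses made-up data-generating processes, and the helper names are ours. It shows (b) a low R² with an unbiased slope, because the large omitted determinant is independent of the regressor, and (c) homoskedasticity-only versus heteroskedasticity-robust standard errors (HC1-style, with the n/(n - 2) degrees-of-freedom adjustment): they agree when the errors are homoskedastic, but only the robust one is valid when they are not.

```python
import math
import random

def ols_fit(x, y):
    """Slope, intercept and residuals from an OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [b - intercept - slope * a for a, b in zip(x, y)]
    return slope, intercept, resid

def r_squared(y, resid):
    my = sum(y) / len(y)
    return 1 - sum(u * u for u in resid) / sum((b - my) ** 2 for b in y)

def standard_errors(x, resid):
    """Homoskedasticity-only SE and heteroskedasticity-robust (HC1) SE of the slope."""
    n = len(x)
    mx = sum(x) / n
    sxx = sum((a - mx) ** 2 for a in x)
    se_homo = math.sqrt(sum(u * u for u in resid) / (n - 2) / sxx)
    meat = sum(((a - mx) * u) ** 2 for a, u in zip(x, resid))
    se_robust = math.sqrt(n / (n - 2) * meat / sxx ** 2)
    return se_homo, se_robust

rng = random.Random(7)
n = 20_000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]

# (b) A large omitted determinant w that is independent of x: low R^2 but no OV bias
w = [rng.gauss(0.0, 5.0) for _ in range(n)]
y_b = [2.0 * a + c for a, c in zip(x, w)]
slope_b, _, resid_b = ols_fit(x, y_b)
print(f"(b) slope {slope_b:.2f} (true 2.0), R^2 {r_squared(y_b, resid_b):.2f}")

# (c) Homoskedastic errors versus errors whose spread grows with |x|
y_homo = [1.0 + 2.0 * a + rng.gauss(0.0, 1.0) for a in x]
y_het = [1.0 + 2.0 * a + rng.gauss(0.0, abs(a)) for a in x]
for label, y in (("homoskedastic", y_homo), ("heteroskedastic", y_het)):
    _, _, resid = ols_fit(x, y)
    se_homo, se_robust = standard_errors(x, resid)
    print(f"(c) {label}: homoskedasticity-only SE {se_homo:.4f}, robust SE {se_robust:.4f}")
```

In the homoskedastic case the two standard errors are nearly identical, so using robust standard errors costs little; in the heteroskedastic case the homoskedasticity-only formula understates the sampling uncertainty.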
4) Suppose that high levels of corruption result in low-quality public institutions, including low-quality schools, which in turn result in lower levels of education.
(a) (3 points) If so, what are the implications for the estimated effect on corruption of
education in Table 1? Briefly explain.
This means that there is simultaneous causality: low education causes corruption and corruption causes
low education. As a result, LowEd share is correlated with the error term, and the OLS estimator is biased
and inconsistent.
(b) Consider the following potential instrumental variables for LowEd share in regression
(3):
(i) Newspapers = average number of newspapers per capita in 1990
(ii) Alphabet = 1 if the state falls in the first half of the alphabet, = 0 otherwise (e.g. = 1
for Alabama, = 0 for Wyoming)
(2 points each) For each proposed instrument, is the variable arguably a valid instrument
variable? Briefly explain.
The two conditions for a valid instrument Z are (1) it is relevant, i.e. Z is correlated with X, and
(2) it is exogenous, i.e. Z is uncorrelated with the error term. So:
(i) Newspapers: (1) relevance: plausibly, since higher levels of education might mean more newspaper
readership; (2) exogeneity: probably not, since one hopes that more newspapers would help to
uncover corruption and thereby limit it.
(ii) Alphabet: (1) relevance: no, there is no reason for alphabetical listing to be correlated with levels of
education; (2) exogeneity: yes, there is no reason for alphabetical listing to be correlated with
anything!
So neither proposed instrument is valid: Newspapers fails exogeneity and Alphabet fails relevance.

Part II: Corruption (B)


The questions in Part II refer to Table 2.
Table 2
The Determinants of Corruption: Two Stage Least Squares Regression Results
Dependent variable: Corruption Rate

                                 (1)       (2)       (3)       (4)       (5)       (6)
Endogenous regressor
  LowEd share                   29.4     131.0      32.9      32.5      54.8      35.4
                              (11.7)   (114.4)    (12.8)    (10.2)    (36.4)    (11.4)
Exogenous regressors
  Urban share                    1.3      18.4       1.9       -.4       2.7       -.1
                               (2.8)    (18.4)     (2.9)     (2.5)     (5.6)     (2.5)
  Foreign-born share            22.4      69.3      24.0       7.0      12.6       7.7
                              (14.4)    (48.9)    (14.7)     (9.4)    (14.8)     (9.5)
  ln(Pop)                       -.43     -2.20      -.49       .34       .18      -.32
                               (.45)    (1.97)     (.49)     (.34)     (.54)     (.35)
  Voting share                  14.4      80.4      16.6      17.4      32.1      19.2
                               (8.2)    (73.3)    (18.8)     (7.2)    (24.1)     (7.8)
  Manufacturing share                                        -22.2     -28.5     -23.0
                                                             (6.3)    (10.7)     (6.1)
Instrumental variables        HS1928 LnInc1940   HS1928,    HS1928 LnInc1940   HS1928,
                                               LnInc1940                     LnInc1940
First-stage F-statistic*        19.0       0.7      10.6      19.7       2.6      11.3
J-test of overidentifying                           3.95                          0.48
  restrictions                                (p = .047)                    (p = .487)
N                                 50        50        50        50        50        50

Notes: Heteroskedasticity-robust standard errors appear in parentheses under regression


coefficients, and p-values appear in parentheses under F-statistics. All regressions include an
estimated intercept, which is not reported. All regressions are estimated using a cross-sectional
data set consisting of 50 US states, where the variables are defined in Table 1.
*The first-stage F-statistic is the F-statistic testing the hypothesis that the coefficients on the
instruments in the first stage regression all equal zero.
Questions for Part II (25 points)

Answer these questions in blue book #2

1) (15 points) From the regressions in Table 2, select one or more preferred regressions that
you believe provide the most reliable basis for inference about the effect of low education
levels on corruption. Carefully explain your reasoning.
The regressions differ in the instruments that are used, and in whether the manufacturing share is included as a regressor. The instruments should be selected based on relevance and exogeneity. Relevance is measured by the first-stage F-statistic, which should exceed 10 for the two stage least squares results to be statistically reliable. Applying this criterion, we are left with regressions (1), (3), (4), and (6). When the coefficient is overidentified (here, when there are two instruments for the single endogenous regressor), the hypothesis that both instruments are exogenous can be tested using the J-statistic. The null of exogeneity is rejected at the 5% level in regression (3) (p = .047) but not in regression (6) (p = .487). This leaves us with (1), (4), and (6).
To make a further distinction, we must exercise judgment about the specifications. Are the instruments arguably exogenous based on our judgment? It seems like they should be: after all, they measure conditions in the distant past, and in this sense they should not be proximate determinants of corruption in the 1990s. On the other hand, if corruption is related to overall state values and culture that vary slowly over time, it is possible that these instruments could still be correlated with these slowly-varying omitted variables. This suggests that it is sensible to control for more state conditions; for example, controlling for the level of manufacturing (treating it as a control variable for slowly-varying state conditions, not as a causal variable for corruption) is warranted. This reasoning leads to preferring regressions (4) or (6). As a practical matter, there is very little difference between the two; however, regression (6) includes an instrument, LnInc1940, that is basically irrelevant (its first-stage F-statistic when used alone, in regression (5), is 2.6), so it is warranted to drop that instrument, which leaves us with regression (4). The fact that adding LnInc1940 as an instrument in regression (6) doesn't change the results or reject exogeneity is a reassuring robustness check of regression (4).
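As an illustration only (not part of the original solution), the two screening rules can be written out in Python. The numbers are transcribed from Table 2; the threshold of 10 and the 5% level come from the text; the p-value of the J-statistic is computed from a chi-squared distribution with 2 − 1 = 1 degree of freedom (two instruments, one endogenous regressor). The function names are hypothetical.

```python
import math

# First-stage F-statistics transcribed from Table 2, keyed by regression number.
first_stage_f = {1: 19.0, 2: 0.7, 3: 10.6, 4: 19.7, 5: 2.6, 6: 11.3}
# J-statistics for the overidentified regressions (two instruments, one
# endogenous regressor, so 2 - 1 = 1 degree of freedom).
j_stat = {3: 3.95, 6: 0.48}


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared(1) variable: erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


def screen(f_threshold=10.0, alpha=0.05):
    """Keep regressions with strong instruments whose overidentifying
    restrictions (if any) are not rejected at level alpha."""
    kept = []
    for reg, f in sorted(first_stage_f.items()):
        if f <= f_threshold:
            continue  # weak instruments: first-stage F does not exceed 10
        if reg in j_stat and chi2_sf_1df(j_stat[reg]) < alpha:
            continue  # J-test rejects exogeneity of the instruments
        kept.append(reg)
    return kept


print(screen())                     # [1, 4, 6]
print(round(chi2_sf_1df(3.95), 3))  # 0.047
```

The helper relies on the identity P(χ²₁ > x) = erfc(√(x/2)), which avoids needing a statistics library; it reproduces the reported p-values up to rounding.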
2) (5 points) Based on your preferred regression(s), what conclusions do you draw about the
effect on corruption of the level of education? Explain.
Based on regression (4), the effect of low levels of education (a low share of the population with a high school degree) on corruption is statistically significant at the 5% level: t = 32.5/10.2 = 3.19 > 1.96. The magnitude of the effect is substantial: a one standard deviation move in LowEd share is associated with a 32.5 × .07 = 2.3 change in the corruption rate, which is approximately a one standard deviation change in the corruption rate (and a change of more than one-half of the mean corruption rate). Assuming that the findings from the regression are internally valid, according to this regression, increasing the level of education in the population (in particular, reducing the fraction of the population with low levels of education) has not only the usual direct benefits, but also the statistically significant side benefit of substantially reducing corruption.
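A quick check of the arithmetic in this answer, as a minimal sketch. It assumes, as the text does, that the standard deviation of LowEd share is .07 (a figure from Table 1, which is not reproduced here).

```python
coef, se = 32.5, 10.2   # coefficient on LowEd share and its SE, regression (4)
sd_lowed = 0.07         # standard deviation of LowEd share, as used in the text

t_stat = coef / se        # about 3.19, well above 1.96
effect = coef * sd_lowed  # about 2.3 points on the corruption rate

print(round(t_stat, 2), round(effect, 1))  # 3.19 2.3
```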
3) (5 points) In your judgment, what are the most important threats to the internal validity of the
estimates in your preferred regression(s), upon which you based your answer to question 2?
Here are two:
1. Are the other regressors good control variables? In particular, voting share could be subject to simultaneous causality ("vote early and often"), and if so it would introduce simultaneous causality bias and would not be a suitable control.
2. Are the instruments really exogenous? Although they reflect things that happened in the distant past, political culture in a state changes very slowly. The identifying assumption is, in effect, that things that happened long ago are correlated with the level of education today but are not correlated with omitted determinants of corruption today. The specification of the equation for corruption today omits things that are plausibly strongly serially correlated, such as the vigor of investigative journalism and the way that prosecutors are appointed (political appointments? elected officials? rising through the bureaucracy?). It is particularly troubling that the results on instrument exogeneity hinge on whether the manufacturing share is included as a regressor (regression (3) v. (6)), when the manufacturing share itself is hard to understand as a proximate cause of corruption. So the case for instrument exogeneity is not compelling.

Questions for Part III (29 points)

Answer these questions in blue book #3

1) Using regression (1):


(a) (2 points) What is the estimated effect of a $600 rebate on consumption in the month in
which the rebate is received?
Consumption increases by .247 × $600 = $148.20.
(b) (2 points) Test the hypothesis that a rebate received in month t has no effect on the change in consumption in the second month after which it is received, that is, on ΔC_{t+2}.
The effect of R_t on ΔC_{t+2} is the same as the effect of R_{t-2} on ΔC_t, which is the coefficient on R_{t-2} in regression (1). The t-statistic is t = -.034/.121 = -0.28, so |t| < 1.96 and the hypothesis that this effect is zero is not rejected at the 5% (or 10%) significance level.
2) Consider regression (1):
(a) (2 points) Would you expect the error term in this regression to be serially correlated?
Why or why not?
The error term consists of determinants of the change of consumption that are not in the regression, which in this case is everything except for the rebate and monthly fixed effects. Some of these omitted factors are plausibly serially correlated. For example, young families tend to have increasing consumption requirements, and the omitted variable "age of children" is clearly serially correlated; this would produce positive serial correlation in the error term.
On the other hand, the theory of optimal consumption holds that rational agents plan their intertemporal consumption decisions in such a way that consumption is smoothed; in the extreme, consumption follows a random walk, so that consumption changes are unpredictable. In this view, the error term would be serially uncorrelated.
(b) Whatever your answer to 2(a), suppose that this error term is in fact serially correlated.
(i) (2 points) What are the implications of this serial correlation for bias in the estimated
causal effects? Explain.
Serial correlation of the error term does not introduce bias into the OLS estimator, as long as the regressor is exogenous.
(ii) (2 points) What are the implications of this serial correlation for the standard errors
reported in the table? Explain.
Serial correlation of the error term means that the usual (heteroskedasticity-robust) standard
errors are incorrect (biased). [Instead, we should use heteroskedasticity- and
autocorrelation-consistent (HAC) standard errors.]
3) Using the results of regression (1):
(a) Draw the following graphs. Clearly label the axes and provide the numerical values of
the points (3 points each).
(i) The effect of a $1 rebate on the change of consumption, ΔC_t, in the month the rebate is received and the two subsequent months.
Suppose the rebate is received in July. Effects on the change of consumption:
C_July − C_June = .247
C_August − C_July = -.172
C_September − C_August = -.034
(ii) The effect of a $1 rebate on the level of consumption, C_t, in the month the rebate is received and the two subsequent months.
Suppose the rebate is received in July. Effects on the level of consumption:
C_July − C_June = .247
C_August − C_July = -.172, so C_August − C_June = .247 − .172 = .075
C_September − C_August = -.034, so C_September − C_June = .247 − .172 − .034 = .041
Graphs for (i) and (ii):
[Figure: Dynamic effect of rebate at date t on the change of consumption and the level of consumption. Horizontal axis: months after rebate (0, 1, 2); vertical axis: consumption or change in consumption. Panel III.3(a)(i), effect on change in consumption: .247, -.172, -.034. Panel III.3(a)(ii), effect on consumption: .247, .075, .041.]
(b) (2 points) Of a $1 rebate received in July, how much is estimated to remain unspent by
the end of September?
The increases in monthly consumption in each of the three months are given in 3(a)(ii). Altogether, the total additional consumption spending is .247 in July, .075 in August, and .041 in September, for .247 + .075 + .041 = .363. Thus the amount remaining of a $1 rebate is 1.00 − .363 = $.637.
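The cumulative logic of 3(a)(ii) and 3(b) can be sketched as a running sum. This is an illustrative Python snippet, assuming (as in the calculation above) that the three effects on the change in consumption are the coefficients on R_t, R_{t-1} and R_{t-2} in regression (1); the variable names are hypothetical.

```python
from itertools import accumulate

# Coefficients on R_t, R_{t-1}, R_{t-2} in regression (1): effects of a $1
# rebate on the change in consumption 0, 1 and 2 months after receipt.
change_effects = [0.247, -0.172, -0.034]

# Effect on the level of consumption = running sum of the effects on changes.
level_effects = list(accumulate(change_effects))

# Extra spending in July, August and September; the rest of the $1 is unspent.
total_spent = sum(level_effects)
unspent = 1.00 - total_spent

print([round(v, 3) for v in level_effects])      # [0.247, 0.075, 0.041]
print(round(total_spent, 3), round(unspent, 3))  # 0.363 0.637
```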
4) (3 points) During this period, the economy was emerging from a recession. A skeptic says:
The regression results show that, on average, consumption is increasing over this six-month
period, but this could just be a consequence of the general economic recovery. Therefore,
these regressions confuse the effect on consumption of the rebate with the broader effect of
the overall economic recovery. Do you agree or disagree? Why?
Disagree. The regression includes monthly fixed effects, which control for factors that change over time but are constant across households, such as changing overall macroeconomic conditions.
5) Using the results in regression (2), compare the estimated dynamic causal effects of the
rebate for low-income families vs. non-low income families.
(a) (3 points) Is there statistically significant evidence that the dynamic effects differ for
these two groups?
For the dynamic effects to be the same for these two groups, the coefficients on all the
interaction terms for LowIncome would need to be zero. The F-statistic testing this hypothesis is
4.10 with p = .024 < .05, so the hypothesis is rejected at the 5% significance level. So, there is
statistically significant evidence that the dynamic effects differ for these two groups.
(b) (3 points) According to the estimated coefficients, which group (if any) has spent more of
the rebate check after two months, and (if so) by how much? Briefly, explain.
Base group:
Effect on C_t of R_t: .130
Effect on C_{t+1} of R_t: .130 - .067 = .063
Total increase in consumption in months t and t+1 = .130 + .063 = .193
Low-income group:
Effect on C_t of R_t: .130 + .624 = .754
Effect on C_{t+1} of R_t: (.130 + .624) + (-.067 - .459) = .754 - .526 = .228
Total increase in consumption in months t and t+1 = .754 + .228 = .982
Of a $1 tax rebate, the base group (non-low income) has spent $.193 after two months, whereas
the low-income group has spent $.982 after two months.
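The same calculation for both groups, as a small sketch. It assumes, as in the computation above, that .130 and -.067 are the base-group effects on the change in consumption in months t and t+1, and that .624 and -.459 are the corresponding LowIncome interaction coefficients in regression (2); the function name is hypothetical.

```python
def spent_after_two_months(b0, b1):
    """Cumulative extra spending in months t and t+1 from a $1 rebate, given
    the effects b0 (on the change in month t) and b1 (on the change in month t+1)."""
    level_t = b0          # effect on C_t
    level_t1 = b0 + b1    # effect on C_{t+1}
    return level_t + level_t1


base = spent_after_two_months(0.130, -0.067)
low_income = spent_after_two_months(0.130 + 0.624, -0.067 - 0.459)
print(round(base, 3), round(low_income, 3))  # 0.193 0.982
```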
(c) (2 points) Do these results accord with economic reasoning, or do they pose a puzzle?
Briefly, explain.
The low-income group has consumed almost all the rebate after two months, whereas the non-low-income group has consumed only 19% of the rebate. This makes sense economically. The low-income group is likely to be liquidity constrained (unable to borrow, at least at reasonable rates) and would therefore be more likely to spend the rebate check immediately. In contrast, the non-low-income group might already have financial savings (a substantial bank account balance), so the rebate check need not be spent immediately but could instead be saved and used for future consumption. [This is an empirical verification of an important point: the effects on consumption of changes in taxes depend on whom the tax change affects.]

Questions for Part IV (21 points)

Answer these questions in blue book #4

For purposes of Part IV, the rebate effect is the effect of receiving a $600 tax rebate on
household consumption of eligible households, in the month in which the rebate is received,
holding all else constant.
1) Consider the following estimators of the rebate effect. (2 points each) For each estimator (a)-(d), is this an unbiased estimator of the rebate effect? Briefly explain.
(a) C_{I,July} − C_{I,June}
Biased. The "control group" for this estimate is the receiving households, in the month prior to receipt. This "after minus before" estimator cannot distinguish common effects that happen over time, such as the general recovery from the recession, from the effect of receiving the rebate.
(b) C_{I,July} − C_{II,July}
Unbiased. The control group is the group that receives the rebate later. This is the simple differences estimator, treatment minus control. If treatment (receipt of rebate in July) is randomly assigned, then it will be uncorrelated with other determinants of consumption and the simple differences estimator is unbiased. [Note: one might question this reasoning by pointing out that the control group knows it will get a rebate later, so if its members are not liquidity constrained, they might increase consumption in July. In fact, in the absence of liquidity constraints, under the permanent income hypothesis the exact timing of the receipt of the rebate shouldn't matter, so this estimator would yield an estimate of zero even though the rebate does in fact increase consumption.]
(c) C_{I,July} − C_{III,July}
Biased. The control group here consists of those who are ineligible. They are likely to differ systematically from those who are eligible, mainly by having lower income, so their marginal propensity to consume out of a rebate check is arguably different from that of the eligibles, and by assumption we want to measure the rebate effect on the eligibles.
(d) (C_{I,July} − C_{I,June}) − (C_{II,July} − C_{II,June})
Unbiased. This is the differences-in-differences estimator, in which "after minus before" for the treatment group is compared to "after minus before" for the control group. Because treatment is randomly assigned, it is independent of other determinants of consumption, and this estimator is unbiased. [Note that the caveat in (b) applies here too.]
2) (3 points) Provide a regression equation by which the estimator in 1(d) can be computed by OLS regression estimated with household-level data for June and July.
Let JulyCheck_it = 1 if household i receives the check in July (group I), and = 0 if it does not (group II), and let ΔC_it = C_{i,July} − C_{i,June}. Then estimate the regression,

ΔC_it = β0 + β1 JulyCheck_it + u_it,

using only data for groups I and II. The OLS estimator of β1 is (exactly) the estimator in 1(d).
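To see why the OLS coefficient on a single binary regressor reproduces the differences-in-differences estimator, here is a sketch on made-up household data (the numbers are hypothetical, chosen only for illustration). With only a constant and JulyCheck, the OLS slope equals the difference between the group means of ΔC, which is the group-mean version of 1(d).

```python
# Hypothetical household data (group, C_June, C_July); numbers are made up.
households = [
    ("I", 100.0, 130.0), ("I", 80.0, 105.0), ("I", 120.0, 140.0),
    ("II", 90.0, 95.0), ("II", 110.0, 112.0),
]


def mean(xs):
    return sum(xs) / len(xs)


# Regressor JulyCheck and outcome Delta C = C_July - C_June.
x = [1.0 if group == "I" else 0.0 for group, _, _ in households]
y = [c_july - c_june for _, c_june, c_july in households]

# OLS slope with an intercept: sample covariance over sample variance.
x_bar, y_bar = mean(x), mean(y)
beta1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))


def group_mean(group, month):
    """Average consumption of a group in a month ("June" or "July")."""
    idx = 1 if month == "June" else 2
    return mean([row[idx] for row in households if row[0] == group])


# Estimator 1(d): (after - before) for group I minus (after - before) for group II.
did = (group_mean("I", "July") - group_mean("I", "June")) - (
    group_mean("II", "July") - group_mean("II", "June"))

print(round(beta1, 6), round(did, 6))  # 21.5 21.5
```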
3) Consider the probit regression in Table 5.
(a) (5 points) Using Table 5, compute the probability of receiving a check in July for an
eligible household with one child, aged 6 years, in which the head of household is 30
years old.
For this household, AnyChildren = 1 and HHAge = 30, so
Pr(Receive check in July) = Pr(z < -.75 + .111 × 1 − .0083 × 30) = Pr(z < -.888) ≈ .187
where the value comes from the cumulative normal tables attached to the exam (Pr(z < -.88) = .1894 and Pr(z < -.89) = .1867, so Pr(z < -.888) ≈ .187). Thus, the estimated probability of receipt of a check in July is approximately 19%.
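The probit calculation can be checked numerically. This sketch assumes the coefficients implied by the calculation above (intercept -.75, AnyChildren .111, HHAge -.0083; Table 5 itself is not reproduced here) and evaluates the standard normal CDF with math.erf instead of a printed table.

```python
import math


def std_normal_cdf(z):
    """Phi(z), the standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


any_children, hh_age = 1, 30
index = -0.75 + 0.111 * any_children - 0.0083 * hh_age   # probit index (z-value)
prob = std_normal_cdf(index)
print(round(index, 3), round(prob, 3))  # -0.888 0.187
```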
(b) (3 points) Do the results in Table 5 support, or cast doubt on, the government's claim that
the month in which checks were mailed is effectively random? Explain.
The results support the government's claim. If assignment is truly random, it should not be
possible to predict receipt using observable variables. In particular, the coefficients on
AnyChildren and HHAge in the probit regression should both be zero. The F-statistic testing
this restriction is 1.42 with p = .241 > .05, so the hypothesis of random assignment is not
rejected at the 5% (or 10%) significance level against the alternative that assignment depends
on these two variables.

Questions for Part I (18 points). Please answer these questions in Blue Book I
1)

Interpret the coefficient on Fraction daughters in regression (2). (3 points)

The dependent variable is binary so this is a linear probability model. The coefficient on Fraction
daughters is the change in the probability of voting for that bill, if the Fraction daughters were to increase
from zero to one, holding constant Registered Democrat.

2)

Consider a representative with 2 daughters and 1 son, from a district in which 55% of
voters are registered Democrats.
a) Using regression (1), compute the probability that this representative voted in favor of
the bill on teen access to contraception. (3 points)

Regression (1) is a probit model, so the regression equation gives the z-score and the probability of voting in favor (of the dependent variable being 1) is Pr(z < -0.51 + 0.36 × (2/3) + 0.71 × .55), where z is a N(0,1) random variable. This equals Pr(z < 0.12) = 0.548, or about 55%, from the cumulative normal tables.

b) Using regression (2), compute the probability that this representative voted in favor of
the bill on teen access to contraception. (3 points)
Regression (2) is a linear probability model, so the predicted value is the predicted probability, that is, the probability of voting in favor is 0.38 + 0.13 × (2/3) + 0.23 × .55 = .59, or 59%.
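A side-by-side check of the two predicted probabilities, as a minimal sketch using the coefficients quoted in the two answers above (Table 1 itself is not reproduced here). The probit prediction applies the standard normal CDF, written with math.erf, to the fitted index; the linear probability model prediction is just the fitted value.

```python
import math


def std_normal_cdf(z):
    """Phi(z), the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


frac_daughters = 2 / 3   # 2 daughters and 1 son
registered_dem = 0.55    # 55% of voters are registered Democrats

probit_index = -0.51 + 0.36 * frac_daughters + 0.71 * registered_dem
p_probit = std_normal_cdf(probit_index)                       # regression (1)
p_lpm = 0.38 + 0.13 * frac_daughters + 0.23 * registered_dem  # regression (2)
print(round(probit_index, 2), round(p_probit, 3), round(p_lpm, 2))  # 0.12 0.548 0.59
```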

3)

Does the coefficient on Fraction daughters change substantially (in a real-world sense)
from regression (3) to regression (4)? What does this tell you about the additional variables
that were included in regression (4)? (3 points)

In regression (3), a unit change in Fraction daughters (from 0 to 1) is associated with an increase in the
NOW score by 6.18 (on a 0-100 scale), in regression (4) it increases by 6.01. This is a negligible change
in a real world sense (it is also much less than either of the standard errors in regressions (3) or (4)).
Because the coefficient does not change when the additional regressors are included, omitting those regressors did not cause omitted variable bias. Thus either those regressors do not belong in the equation or they are uncorrelated with Fraction daughters. (You could say more about which of those is true from the table, but that would require using additional information beyond the mere fact that the estimated coefficient didn't change.)

4)

A critic asserts that a shortfall of this study is that it focuses exclusively on daughters,
indicating gender bias by the author. The critic suggests adding one more regressor to
regression (4), specifically, Fraction sons, which is the fraction of males among the
representative's children. What would be learned from this regression? Be specific. (3 points)

Nothing would be learned. Adding Fraction sons would produce perfect multicollinearity: Fraction sons +
Fraction daughters = 1, so because an intercept is included (as it is), Fraction sons is a perfect linear
combination of Fraction daughters and the constant regressor.
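Perfect multicollinearity can be demonstrated on a handful of made-up values (hypothetical, for illustration only). With an intercept, Fraction daughters, and Fraction sons = 1 − Fraction daughters as columns, the matrix X'X is singular, so OLS has no unique solution. Exact rational arithmetic via fractions.Fraction makes the determinant exactly zero rather than approximately zero.

```python
from fractions import Fraction

# Hypothetical Fraction daughters for five representatives (made-up values).
frac_daughters = [Fraction(1, 3), Fraction(1, 2), Fraction(1), Fraction(0), Fraction(2, 3)]
# Design matrix rows: [intercept, Fraction daughters, Fraction sons = 1 - Fraction daughters]
rows = [[Fraction(1), fd, 1 - fd] for fd in frac_daughters]

# X'X is 3 x 3; OLS needs it to be invertible (nonzero determinant).
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


print(det3(xtx))  # 0
```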

5)

Another critic suggests that more conservative districts might elect representatives with
fewer daughters, so that Fraction daughters is endogenous. The author responds that
regression (5) provides evidence against this hypothesis, because Fraction daughters is (with only one exception) unpredictable by the other regressors and thus is exogenous. Do you agree or disagree with the author's response? Why? Be precise. (3 points)
Disagree, that is, the author's response is not persuasive. The criticism can be stated this way. Let X be Fraction daughters and let W be all the other regressors, so the regression is Y on X and W. The criticism is that X is endogenous. The response is that X and W are uncorrelated (regression (5) does in fact show this: the F-statistic fails to reject the null hypothesis that all the coefficients in the regression of X on W are zero). But this does not show that X is exogenous! Endogeneity means that X is correlated with the error term (E(u|X) ≠ 0), but saying something about the relationship between X and W doesn't tell us about the relationship between X and u.

Questions for Part II (24 points). Please answer these questions in Blue Book II
1)

The following questions concern regression (4):


a) Provide a potential reason why the coefficient on district income in (4) is subject to
omitted variable bias. (2 points)

To result in omitted variable bias, an omitted variable Z must be a determinant of the NOW score and must be correlated with District income. One such variable is religious background: attitudes towards women's rights vary across religions (so this variable belongs in the regression), and income levels vary on average across religious groups in the U.S. (so religion is correlated with District income). A second such variable is education: attitudes towards women's issues vary with levels of education (although there is one measure of education in the regression, it is a limited measure; perhaps it is high school graduation that is more important), and income varies strongly with level of education.

b) Comment on the following statement: Your answer to the previous question implies
that the conditional mean of the error term in (4) is nonzero, given the regressors in (4).
Therefore, the first least squares assumption is violated and the coefficient on Fraction
daughters in (4) does not have a causal interpretation. (3 points)
Disagree. The argument in (a) implies that the coefficient on District income does not have a causal
interpretation, but this need not imply that the coefficient on Fraction daughters does not have a causal
interpretation. The relevant question is whether there is conditional mean independence, specifically,
whether E(u|X,W) = E(u|W), where X = Fraction daughters and W = the other regressors (the control
variables). In words, conditional on the other regressors, does the mean of the error term depend on
Fraction daughters or not? Because the gender of a child is as if randomly assigned by nature, it is
plausible to think that Fraction daughters is distributed independently of u given W (or, more strongly, is
independent of u and W). Thus it is plausible that conditional mean independence holds, and this suffices
to give the coefficient on Fraction daughters a causal interpretation, even if the remaining control variables
are correlated with the error term.
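A small simulation can illustrate conditional mean independence. It is a sketch on simulated data, not the study's data: the regressor x plays the role of Fraction daughters and is drawn independently of everything else, while the control w is correlated with the error u, so E(u | x, w) = w = E(u | w). The coefficient on x is then estimated without bias, while the coefficient on w absorbs the omitted factor (true value 0.5, estimate near 1.5). The coefficient values and the helper `ols` are hypothetical.

```python
import random


def ols(y, *regressors):
    """OLS with an intercept, solving the normal equations (X'X) b = X'y
    by Gaussian elimination. Returns [intercept, slope_1, slope_2, ...]."""
    X = [[1.0, *row] for row in zip(*regressors)]
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for i in range(k):                      # forward elimination (X'X is positive definite)
        for j in range(i + 1, k):
            f = A[j][i] / A[i][i]
            A[j] = [a - f * c for a, c in zip(A[j], A[i])]
            b[j] -= f * b[i]
    coef = [0.0] * k
    for i in reversed(range(k)):            # back substitution
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, k))) / A[i][i]
    return coef


random.seed(1)
n = 100_000
beta_x, beta_w = 2.0, 0.5                   # hypothetical true coefficients

w = [random.gauss(0, 1) for _ in range(n)]  # control variable
x = [random.random() for _ in range(n)]     # "as if" randomly assigned regressor
u = [wi + random.gauss(0, 1) for wi in w]   # error: E(u | x, w) = w = E(u | w)
y = [beta_x * xi + beta_w * wi + ui for xi, wi, ui in zip(x, w, u)]

_, b_x, b_w = ols(y, x, w)
print(round(b_x, 2), round(b_w, 2))         # close to 2.0 and 1.5 (not 0.5)
```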

For the remaining questions, suppose (hypothetically) that the data set is extended to be panel
data for T = 3 Congresses, the 105th (1997-1998), 106th (1999-2000), and 107th (2001-2002)
Congresses. The observational unit would be a representative (his/her votes, children, and
district) in a given Congressional session. The data set would consist of all representatives who
were elected to Congress for all three sessions. Suppose n = 300, so there is a total of 900
observations (representatives are elected for two-year terms, and almost all who run for
reelection are reelected).
2)

Representatives in the 105th Congress who retire, are not reelected, or die would be in the
cross-sectional data set used in Table 1, but would not be in the panel data set. Would this
introduce sample selection bias into the panel data estimate of the effect of Fraction
daughters? (3 points)

For sample selection bias to occur, there must be a selection process that is related to the outcome variable. Suppose that over this period the country became more conservative and representatives with liberal positions on women's issues were either voted out of office or retired to avoid defeat. Then the remaining representatives (the ones in the panel data set) would be more likely to vote conservative, whatever their Fraction daughters, and the estimated effect of Fraction daughters would be biased (towards zero). (Think of the extreme case that only conservatives were elected, and all voted against women's issues: some of these conservatives would have daughters, but because all voted conservative anyway, voting behavior would not depend on Fraction daughters.)


Whether you consider this a significant threat or not is a matter of judgment (and detailed knowledge of
the political process and the history for these Congresses which are not expected for this exam). It is
true that the U.S. Congress became more conservative during this period, and that voting on family issues
played a role. Thus in theory the sample selection bias could be important. On the other hand the
conservative shift occurred for many reasons so selection occurred in many other ways that are not
related to the dependent variable (including retirement because of old age, health, etc.). My own guess is
that this sample selection bias is not important here, but one could reasonably take the other position.

Regardless of your answer to question (2), for the rest of these questions, ignore the possibility
of sample selection bias.
3)

To what extent would including representative fixed effects address the endogeneity
criticism? Explain. (3 points)

Including representative fixed effects controls for all characteristics of the representative and district that
do not change over time, including the representative's family composition at the beginning of the panel.
Thus if family composition at the time of election is an issue, it is now contained in the fixed effect. This
effectively addresses the endogeneity criticism.
A different way to make this point is to suppose that the panel has only two Congresses (T = 2), in which
case representative fixed effects estimation is equivalent to regressions of differences between t = 2 and t
= 1 data. If the fraction daughters upon initial election enters the specification and is correlated with
district attitudes, because neither variable changes over time their first differences are zero and they do
not enter the differences specification. Instead, the differences specification regresses the change in
NOW scores on the change in Fraction daughters and the change in the other regressors.
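The equivalence for T = 2 can be verified on a tiny made-up panel (the numbers are hypothetical). The within (representative fixed effects) estimator and the first-differences regression without an intercept give the same slope.

```python
# Hypothetical T = 2 panel: (x_1, y_1, x_2, y_2) for each representative,
# where x is Fraction daughters and y is the NOW score (made-up numbers).
panel = [
    (0.50, 40.0, 0.50, 42.0),
    (0.00, 55.0, 0.50, 63.0),
    (1.00, 70.0, 1.00, 69.0),
    (0.25, 30.0, 0.50, 36.0),
]

# Representative fixed effects via the within transformation:
# subtract each representative's own mean of x and y, then run OLS.
num = den = 0.0
for x1, y1, x2, y2 in panel:
    x_bar, y_bar = (x1 + x2) / 2, (y1 + y2) / 2
    for x, y in ((x1, y1), (x2, y2)):
        num += (x - x_bar) * (y - y_bar)
        den += (x - x_bar) ** 2
beta_fe = num / den

# First differences: regress (y_2 - y_1) on (x_2 - x_1) with no intercept.
dx = [x2 - x1 for x1, _, x2, _ in panel]
dy = [y2 - y1 for _, y1, _, y2 in panel]
beta_fd = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)

print(round(beta_fe, 6), round(beta_fd, 6))  # 17.6 17.6
```

Representatives whose Fraction daughters does not change (the first and third rows) contribute nothing to either estimate.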

4)

Would it be appropriate to include time fixed effects, in addition to representative fixed effects, in the panel data regression? Explain. (3 points)

Suppose the mood of the country is becoming more conservative, and representatives' votes change to reflect that mood. Then "mood of the country" is an omitted variable and could be included using time fixed effects. For the mood of the country to introduce bias in the coefficient on Fraction daughters, the mood of the country would need to be correlated with the change in Fraction daughters for those representatives who were in all three Congresses (see the second part of the answer to question 3 above). For a representative who has a child while in Congress, the gender can be treated as assigned at random (the logical argument against this would be that voters, seeing that the representative had a daughter instead of a son, voted him or her out; this certainly is far-fetched!). So, time effects could be added to capture the mood of the country, but, very plausibly, omitting time effects would not result in omitted variable bias.

5)

Consider a hypothetical panel data version of regression (4) in Table 1, in which both
representative fixed effects and time fixed effects are included. Call this hypothetical
regression (P4) (P for panel).
a) What is the problem that is solved by clustered or HAC standard errors, and how
do clustered standard errors solve that problem? (3 points)

Let u_it denote the error term in hypothetical panel data regression (P4), where i runs over representatives and t = 1, 2, 3 runs over Congresses. If u_it is correlated over time, then the usual (heteroskedasticity-robust) formula for standard errors does not apply. [The usual formula assumes that the error term is serially uncorrelated, but if it is instead serially correlated there is less information in the data than one would think: the observations are not independent.] Clustered standard errors solve that problem by providing an estimate of the variance of the OLS estimator in panel data that allows for nonzero correlations among the errors within each cluster group (here, each representative), under the assumption that the errors are independent across cluster groups. [This answer is fully acceptable; it is not necessary to provide a formula.]

b) In regression (P4), which would you recommend using: conventional (heteroskedasticity-robust) standard errors or clustered standard errors? Explain, with specific reference to regression (P4). (3 points)
If u_it is plausibly serially correlated, then you should use clustered standard errors, where the clustering is by representative. The question then is whether u_it is plausibly serially correlated. The elements of u_it are individual political and personal characteristics of the representative, his/her district, and elements of randomness (e.g. vote-trading agreements). Individual and district characteristics that do not change are part of the fixed effect; however, characteristics that do change are in the error term. In general, those characteristics would be correlated from one Congress to the next. This would imply that u_it would be serially correlated, so clustered standard errors are appropriate.
One could also reach the opposite conclusion if, in one's judgment, the district and personal characteristics changed so slowly that they were fully captured by the representative fixed effects, so that the inherent randomness in the legislative process (logrolling, votes to satisfy specific important constituents, etc.) is the main element of the error term in the panel data specifications. If so, then the error term would be serially uncorrelated and conventional standard errors would be appropriate.

c)

Suppose that the author estimated regression (P4), using the standard errors you
recommended in part (b). Using your judgment, do you think that these standard errors
in hypothetical panel regression (P4) would be smaller, larger, or about the same as
those in the cross-section regression (4) in Table 1? Explain. (3 points)

In many cases, panel data standard errors are smaller than cross-sectional standard errors because there are more observations (nT instead of n). In regression (P4), however, this usual situation seems unlikely to arise. The reason is that, by including representative fixed effects, the variation in Fraction daughters arises only from those representatives who have children during the time covered by the panel. The average age of the representatives in the sample is 53, so most are beyond their years of having babies. Thus the variation in the regressor Fraction daughters, given the fixed effects, will be very small, so the standard errors will be large.
Another way to see this is to consider the T = 2 panel, for which the first differences regression is equivalent to OLS with fixed effects. The change in Fraction daughters will be zero for the vast majority of members. Consider the extreme case in which only one representative has a child between T = 1 and T = 2, and the child happens to be a daughter. Then the estimator will be comparing the change in his/her NOW score (the sample size is 1 for this group) to the average change in the NOW score for all the other representatives.

Parts I and II were drawn from Washington, E. (2006), "Female Socialization: How Daughters Affect their Legislator Fathers' Voting on Women's Issues," NBER Working Paper no. 11924.

Questions for Part III (21 points). Please answer these questions in Blue Book III
1)

Give the best reason you can why the OLS estimator of the coefficient on Kids>2 in Table
2, column (3) might be biased. (3 points)

Here are some very good reasons (you only need to provide one; this list of good reasons is not exhaustive):
(i) The number of children is to a considerable extent a choice variable: it is chosen by the woman (and the couple) based on various considerations, including what she could earn in the labor market. Economics of the family indicates that women with a greater value of time (greater potential earnings) will choose more paid employment and less at-home work, which includes child-rearing. This unobserved variable, earnings potential, is a determinant of hours but is also a determinant of the number of children, so the number of children is correlated with the error term, i.e. endogenous. Correlation with the error term implies a biased coefficient estimator.
(ii) The foregoing argument also applies to how professionally ambitious the wife is.
(iii) There is an accounting relationship involved here: if a woman had full-time employment during 1979 but had a child during 1979, then she would have taken maternity leave and her weeks worked would be lower. (This problem could be eliminated by restricting the sample to women with no children born in 1978 or 1979.)
(iv) Number of children and weeks worked by the mother are both influenced by cultural and religious factors. Some religions which emphasize large families also support the view that a woman's place is in the home. Religion indicators are omitted; they are a determinant of whether the woman works and are correlated with family size, so they cause omitted variable bias.

2)

Consider the hypothesis that, on average, U.S. parents want to have children of both
genders (that is, they prefer at least one girl and one boy to all girls or all boys). Does
Table 2 provide evidence in favor of this hypothesis, against this hypothesis, or neither?
Explain. (3 points)

In favor of this hypothesis. The variable Same sex enters significantly in the linear probability model of regression (1). That is, couples whose first two children are of the same sex are more likely to have subsequent children. Moreover, the effect is large in a real-world sense: the probability of having additional children increases by approximately .07, that is, 7 percentage points, for a woman whose first two children are of the same sex. This is consistent with the couples having a desire to have another child of a different gender. Regression (2) is also consistent with this: if the first two children are boys, the couple is more likely to have another child; likewise, if the first two children are girls, the couple is more likely to have a third child. The coefficient on 2 girls is somewhat larger than the coefficient on 2 boys in regression (2), indicating that the probability of having another child is greater if you have two daughters than if you have two sons, that is, having at least one son is slightly preferred on average to having at least one daughter.

3)

Consider the following potential instrumental variables for Kids>2 in regression (3):
a) Whether wife came from large family (binary) (3 points)
b) The teen pregnancy rate in the wife's city or town of residence (3 points)
For each proposed instrument, is the variable arguably a valid instrument variable? Briefly
explain.

For an instrument to be valid, it must
(i) be relevant (correlated with the included endogenous regressor, given the included exogenous
regressors) and
(ii) be exogenous (uncorrelated with the error term in the equation of interest).
For these two instruments:
(a) Wife coming from a large family:
(i) relevance: arguably yes; if the wife came from a large family, she might be predisposed to having a large family herself.
(ii) exogeneity: no; coming from a large family would be correlated with religion (e.g. Catholic) or would indicate being taught certain values that are in the error term, so coming from a large family would be correlated with the error term in her weeks worked equation.
(b) Teen pregnancy rate in wife's town:
(i) relevance: yes; if the teen pregnancy rate in her town is high, that reflects existing cultural conditions and attitudes about women and work, which could be correlated with her own personal attitudes, which enter her decision about how many children to have.
(ii) exogeneity: no; those same cultural attitudes that affect family size also influence the wife's decision to work, so the instrument would be correlated with cultural attitudes, which are in the error term.

4)

Based on a combination of your judgment and the empirical results in Table 2:


a) Is Same sex a valid instrument in regression (4)? (3 points)

Validity requires (i) relevance and (ii) exogeneity.


(i) relevance: we can check this using the first-stage F, which is 1413. [Note: computing this as the square of the t-statistic on Same sex in regression (1) yields F = (0.0694/0.0018)² ≈ 1,486; the difference between the two F-statistics is due to rounding error.] This first-stage F is (a lot) bigger than 10, so the instrument is strong and passes the weak instruments test.
(ii) exogeneity: There is only one instrument here, Same sex, so we cannot test for the exogeneity of the instrument. Thus assessing exogeneity requires exercising judgment. For the instrument to be exogenous, Same sex must be uncorrelated with the error term, given the included exogenous regressors. The sex of each child is, in effect, randomly assigned. Thus whether the first two children are of the same sex is uncorrelated with all the things in the error term that have been discussed so far: religious and cultural background, the wife's unobserved earning potential and ambition, etc. All of these should be distributed independently of the sex of the first two children, providing a strong argument that the instrument is exogenous.
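With a single instrument, the first-stage F-statistic is the square of the t-statistic on that instrument, which is how the bracketed note in (i) recovers F from regression (1). A minimal Python sketch using the coefficient and standard error quoted in that note; the threshold of 10 is the rule of thumb used in the text.

```python
coef_same_sex = 0.0694  # coefficient on Same sex, Table 2, regression (1)
se_same_sex = 0.0018    # its standard error

t_stat = coef_same_sex / se_same_sex
# With one instrument, the first-stage F testing its coefficient is t squared.
first_stage_f = t_stat ** 2

print(f"t = {t_stat:.2f}, first-stage F = {first_stage_f:.1f}")
print("strong instrument (F > 10):", first_stage_f > 10)
```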

b) Is the pair of variables, 2 boys and 2 girls, a valid set of instruments in regression (5)?
(3 points)
(i) relevance: first-stage F = 725.9 > 10, so the instruments taken as a set are relevant.
(ii) exogeneity: The logical reasoning given in (a) supports exogeneity here as well. Because there are two instruments, however, we can also test the null hypothesis that both instruments are exogenous against the alternative that one of them is not, using the J-statistic. The J-statistic has a chi-squared distribution with k − 1 degrees of freedom, where k is the number of instruments (there is one endogenous regressor), so here it has a chi-squared distribution with 1 degree of freedom. The J-statistic is 3.24; the 5% critical value of the χ²₁ distribution is 3.84 and the 10% critical value is 2.71, so the J-statistic rejects at the 10% but not the 5% significance level. This provides some limited evidence against the hypothesis that both instruments are exogenous. However, the evidence is not strong (not significant at the 5% level), especially given the very large number of observations. So it is reasonable to interpret this J-statistic as generally supportive of the hypothesis of exogeneity.
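The p-value and critical values quoted for the J-statistic can be reproduced with the Python standard library alone, because a chi-squared variable with 1 degree of freedom is the square of a standard normal. The helper names below are hypothetical, and the bisection is just one simple way to invert the p-value function.

```python
import math

def chi2_1_pvalue(stat):
    """Upper-tail p-value of a chi-squared(1) statistic.

    If Z ~ N(0, 1), then Z**2 ~ chi-squared(1), so
    P(chi2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))

def chi2_1_critical(alpha):
    """Critical value c with P(chi2(1) > c) = alpha, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_1_pvalue(mid) > alpha:
            lo = mid  # tail probability still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2.0

num_instruments = 2  # 2 boys and 2 girls
num_endogenous = 1   # Kids>2
df = num_instruments - num_endogenous  # degree of overidentification
assert df == 1, "the helpers above only handle 1 degree of freedom"

j_stat = 3.24
p_value = chi2_1_pvalue(j_stat)
crit_5 = chi2_1_critical(0.05)
crit_10 = chi2_1_critical(0.10)
print(f"p-value = {p_value:.3f}; 5% critical = {crit_5:.2f}; 10% critical = {crit_10:.2f}")
print("reject at 10%:", j_stat > crit_10, "| reject at 5%:", j_stat > crit_5)
```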
An interesting aside: Suppose parents of boys need to take more time off work on average than parents of girls. Then 2 boys and 2 girls would be correlated with the error term in Weeks worked, that is, they would be endogenous. However, this does not imply that Same sex would be correlated with the error term in Weeks worked, because Same sex simply says that you have either 2 boys (negative effect) or 2 girls (positive effect), and these effects would roughly cancel out on average. So even if you are concerned about the exogeneity of the pair of instruments 2 boys and 2 girls, you might not be concerned about the exogeneity of Same sex.

5)

The estimated coefficient on Kids>2 differs in regressions (3) and (4) (the OLS estimate is
more negative than the TSLS estimate). Provide a real-world explanation (an interpretation
of the results) that explains why the OLS estimate is more negative than the TSLS estimate.
(3 points)

Consider explanation (ii) to question 1. Higher ambition implies more weeks worked and fewer children, so the omitted variable bias is negative: Kids>2 is negatively correlated with the omitted variable (ambition), which has a positive effect on weeks worked, so the OLS coefficient on Kids>2 partly picks up lower ambition and is biased downward, toward a more negative number. This is what one sees comparing the coefficients in regressions (3) and (4).
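A small simulation makes the sign of this bias concrete. It is a sketch under made-up assumptions, not estimates from the paper: the true effect of Kids>2 on weeks worked is set to −5, ambition is a standard normal variable that lowers the chance of a third child and raises weeks worked, and OLS of weeks on Kids>2 omits ambition. The OLS slope comes out more negative than the true effect, which is the pattern described above.

```python
import random

random.seed(0)

def ols_slope(x, y):
    """Slope coefficient from an OLS regression of y on x with an intercept."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx

TRUE_EFFECT = -5.0  # hypothetical causal effect of Kids>2 on weeks worked
n = 50_000
kids, weeks = [], []
for _ in range(n):
    ambition = random.gauss(0.0, 1.0)  # omitted variable
    # More ambitious women are less likely to have a third child...
    p_third = min(max(0.4 - 0.15 * ambition, 0.0), 1.0)
    k = 1.0 if random.random() < p_third else 0.0
    # ...and, holding family size fixed, work more weeks.
    w = 30.0 + TRUE_EFFECT * k + 6.0 * ambition + random.gauss(0.0, 10.0)
    kids.append(k)
    weeks.append(w)

ols = ols_slope(kids, weeks)
print(f"true effect: {TRUE_EFFECT}, OLS estimate omitting ambition: {ols:.2f}")
```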

Questions for Part IV (17 points). Please answer these questions in Blue Book IV
1)

Consider a hypothetical regression (7),


Wife's weeks worked_i = β_0 + β_1 Kids>2_i + u_i     (7)

which would be estimated by TSLS, using Same sex as an instrument (so regression (7) is regression (4) without the variables Boy first, …, Other race). For this question, assume that Same sex is a valid instrument in regression (4) and in addition that Same sex is distributed independently of all the control variables in regression (4), so E(Boy first|Same sex) = 0, …, E(Other race|Same sex) = 0.
a) Explain why Same sex would be a valid instrument in regression (7). (3 points)
NOTE: TYPO ANNOUNCED DURING THE EXAM
The assumption
E(Boy first|Same sex) = 0, …, E(Other race|Same sex) = 0     (*)
should be:
E(Boy first|Same sex) = E(Boy first), …, E(Other race|Same sex) = E(Other race).     (**)
The basic idea is that the difference between regression (4) and regression (7) is that W, the set of control variables in (4), is in the error term of (7). But if Same sex is distributed independently of W, and if it is exogenous in (4), then it is uncorrelated with the error term in (7), because that error consists of the effect of W plus the error term in (4).
Making this argument precise in equations (which isn't necessary for full credit) goes as follows. There are two approaches. The first is to realize that, when an intercept is included in the regression, the slope coefficients are unchanged if each non-constant regressor is first replaced by its deviation from its mean. In this case, assumption (*) is valid, and the regressors are simply (without loss of generality) interpreted as deviations from their means. Here is the argument under this interpretation:
For Same sex to be valid in (7), it must be relevant and exogenous.
First the exogeneity argument:
Let u_4 be the error term in regression (4), and write regression (4) as
Y_i = γ_0 + γ_1 Kids>2_i + γ_2 W_i + u_4i     (4)

where W_i stands for all the other regressors in (4). The coefficient γ_1 is the effect of a change in Kids>2 on Y, holding constant W_i. But if Same sex_i is distributed independently of W_i, then the final clause, holding constant W_i, doesn't matter, so γ_1 is the same as β_1 in regression (7). This means that the error term in regression (7) must be
u_i = γ_2 W_i + u_4i.     (+)
Thus,
E(u_i|Same sex_i) = γ_2 E(W_i|Same sex_i) + E(u_4i|Same sex_i).


But E(W_i|Same sex_i) = 0 by the assumption that W and Same sex are independent (with W measured as a deviation from its mean), and E(u_4i|Same sex_i) = 0 by the assumption that Same sex is a valid instrument for regression (4). Therefore, E(u_i|Same sex_i) = 0 and Same sex is exogenous in regression (7).
Next, the relevance argument: Because Same sex_i is independent of W, in population the coefficient in the regression of Kids>2 on Same sex alone is the same as the coefficient on Same sex in the population version of the first-stage regression (1). In (1), Same sex is clearly relevant, so it would also be relevant if the W regressors were dropped.
The second approach is to carry through nonzero means of the regressors, that is, to use (**). In this case, the intercepts will differ in the two equations, so equation (+) is replaced by
β_0 + u_i = γ_0 + γ_2 W_i + u_4i.     (++)
Because the errors u_i and u_4i both have mean zero,
β_0 = γ_0 + γ_2 E(W_i).
Substituting this expression into (++) yields
u_i = γ_2 [W_i − E(W_i)] + u_4i.
Thus
E(u_i|Same sex_i) = γ_2 E{[W_i − E(W_i)]|Same sex_i} + E(u_4i|Same sex_i).
By the independence of W and Same sex and the exogeneity of Same sex as an instrument in regression (4), it follows that E(u_i|Same sex_i) = 0, so that Same sex is an exogenous instrument in regression (7).
The relevance argument is the same as given above.
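The argument can also be illustrated by simulation. The Python sketch below assumes a made-up data-generating process in which Same sex is independent of a single binary control W (standing in for Boy first, …, Other race) and of an unobserved ambition term, and W is left out of the estimating equation, as in regression (7). With one instrument and an intercept, TSLS reduces to the ratio cov(Z, Y)/cov(Z, X), and that ratio still lands close to the true β_1. All parameter values are hypothetical, and the first-stage effect of Same sex is set far larger than in Table 2 so that a modest sample suffices.

```python
import random

random.seed(1)

def sample_cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    return sum((u - a_bar) * (v - b_bar) for u, v in zip(a, b)) / (n - 1)

BETA1 = -5.0    # hypothetical effect of Kids>2 on weeks worked
GAMMA2 = 10.0   # hypothetical effect of the control W on weeks worked
n = 50_000
z, x, y = [], [], []
for _ in range(n):
    same_sex = 1.0 if random.random() < 0.5 else 0.0  # instrument Z
    w = 1.0 if random.random() < 0.5 else 0.0         # control W, independent of Z
    ambition = random.gauss(0.0, 1.0)                 # unobserved, part of the error
    p = min(max(0.3 + 0.3 * same_sex + 0.1 * w - 0.1 * ambition, 0.0), 1.0)
    kids = 1.0 if random.random() < p else 0.0
    weeks = 30.0 + BETA1 * kids + GAMMA2 * w + 3.0 * ambition + random.gauss(0.0, 5.0)
    z.append(same_sex)
    x.append(kids)
    y.append(weeks)

# Regression (7): W is omitted, so the error is GAMMA2 * w plus the error of (4).
tsls_7 = sample_cov(z, y) / sample_cov(z, x)
print(f"TSLS estimate of beta_1 with W omitted: {tsls_7:.2f} (true value {BETA1})")
```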

b) Provide a reason why, despite the validity of Same sex as an instrument in regression
(7), you would still prefer regression (4). (3 points)
Including W as a regressor in (4) does not change the consistency properties of 2SLS. However, it could reduce the standard error relative to (7): because W helps explain Wife's weeks worked, the variance of u_4i is less than the variance of u_i (the error of regression (7)), so the variance of the 2SLS estimator in regression (4) could be less than that in regression (7). This can be checked empirically, and the specification yielding the smaller standard error, (4) or (7), could be chosen; for the reason just given, that is likely to be (4).
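The efficiency point can be illustrated with a Monte Carlo sketch under made-up assumptions similar to those in part (a): in each of 200 simulated samples, TSLS is computed once without W, as in (7), and once with W partialled out, as in (4). Because W here is a single binary variable, regressing on an intercept and W amounts to subtracting group means, which is what the hypothetical helper `group_demean` does. Both estimators center on the true value, but the one that controls for W varies less across samples.

```python
import random
import statistics

random.seed(2)

def demean(values):
    """Residuals from regressing values on an intercept only."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def group_demean(values, groups):
    """Residuals from regressing values on an intercept and one binary regressor."""
    totals, counts = {0: 0.0, 1: 0.0}, {0: 0, 1: 0}
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]

def iv_slope(zr, xr, yr):
    """Just-identified IV slope from residualized instrument, regressor, outcome."""
    return sum(a * b for a, b in zip(zr, yr)) / sum(a * b for a, b in zip(zr, xr))

def one_sample(n, beta1=-5.0, gamma2=10.0):
    """Simulate one hypothetical sample; return TSLS with W and without W."""
    z, w, x, y = [], [], [], []
    for _ in range(n):
        zi = 1.0 if random.random() < 0.5 else 0.0
        wi = 1 if random.random() < 0.5 else 0
        ambition = random.gauss(0.0, 1.0)
        p = min(max(0.3 + 0.3 * zi + 0.1 * wi - 0.1 * ambition, 0.0), 1.0)
        xi = 1.0 if random.random() < p else 0.0
        yi = 30.0 + beta1 * xi + gamma2 * wi + 3.0 * ambition + random.gauss(0.0, 5.0)
        z.append(zi)
        w.append(wi)
        x.append(xi)
        y.append(yi)
    with_w = iv_slope(group_demean(z, w), group_demean(x, w), group_demean(y, w))
    without_w = iv_slope(demean(z), demean(x), demean(y))
    return with_w, without_w

results = [one_sample(1_000) for _ in range(200)]
est_4 = [r[0] for r in results]  # like regression (4): W included
est_7 = [r[1] for r in results]  # like regression (7): W omitted
print(f"(4): mean {statistics.mean(est_4):.2f}, sd {statistics.stdev(est_4):.2f}")
print(f"(7): mean {statistics.mean(est_7):.2f}, sd {statistics.stdev(est_7):.2f}")
```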

2)

Some women are more ambitious professionally than others. Suppose that the effect on
labor force participation of having a large family is not the same for every woman,
specifically, the more ambitious the woman, the smaller is the effect (the most ambitious
women will work whether or not they have a large family). How if at all would this
change your interpretation of the results in regressions (4) and (5)? Explain your reasoning.
(5 points)

In general, if there is heterogeneity in treatment effects, then the IV estimator estimates the local average treatment effect (LATE). The LATE is a weighted average treatment effect, weighted most heavily toward those women who are most heavily affected by the instrumental variable. The interpretation for regression (5) is the same as for regression (4), so it suffices to focus on one or the other; here we consider regression (4). Applied to regression (4), the LATE is the weighted average effect on weeks worked of having a large family (more than two children), where the weights are largest for the women whose family size decision is most heavily influenced by the gender of their first two children. To interpret this further, one must ask (a) are there, plausibly, any differences among mothers in their desire to have at least one child of each gender; and (b) if so, who are those mothers whose subsequent childbearing decisions are most influenced by the gender of their first two children?
The answer to (a) is entirely judgmental: there is no evidence on that in the table. Here are three perfectly valid answers (any one is sufficient):
(i) All mothers in the US are arguably identical in their preferences for a mix of boys and girls. Then, despite the variation in β_1i, the instrument has the same effect on everyone, so the TSLS estimator consistently estimates the average treatment effect.
(ii) Mothers vary in their preferences for children of each gender, but that variation is independent of their professional motivation, so the variation in the effect of the instrument is independent of the variation in β_1i. If so, then the TSLS estimator consistently estimates the average treatment effect.
(iii) Mothers from certain cultural backgrounds that value children of multiple genders will have less desire to work if they have a large family, so the effect of the instrument is greatest for those with small (most negative) β_1i. In this case, the LATE will be larger in magnitude (a more negative number, that is, a larger effect of having a large family on Wife's weeks worked) than the average treatment effect.
Another entirely satisfactory approach to this problem is to use the formulas presented in class, in which the effect of the instrument on Kids>2 for woman i is denoted by π_1i, so the LATE is given by
LATE = E(β_1i π_1i) / E(π_1i).
Using this formula, one can then rephrase the substantive discussion above in terms of π_1i and the various expectations in the LATE formula.
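The LATE formula can be evaluated for a hypothetical population to see how it compares with the average treatment effect. In the Python sketch below, the four equally likely types of mothers and their values of β_1i and π_1i are invented for illustration. When π_1i is the same for everyone (as in case (ii)), the LATE equals the average treatment effect; when the instrument moves the mothers with the most negative β_1i the most (as in case (iii)), the LATE is more negative than the average treatment effect.

```python
def ate(types):
    """Average treatment effect: the mean of beta_1i over equally likely types."""
    return sum(beta for beta, _ in types) / len(types)

def late(types):
    """LATE = E(beta_1i * pi_1i) / E(pi_1i) over equally likely types."""
    return sum(beta * pi for beta, pi in types) / sum(pi for _, pi in types)

# Each entry is (beta_1i, pi_1i): the effect of Kids>2 on weeks worked and the
# effect of Same sex on the probability of Kids>2. All values are hypothetical.
case_ii = [(-2.0, 0.07), (-4.0, 0.07), (-8.0, 0.07), (-10.0, 0.07)]
case_iii = [(-2.0, 0.02), (-4.0, 0.05), (-8.0, 0.10), (-10.0, 0.15)]

for name, types in [("case (ii)", case_ii), ("case (iii)", case_iii)]:
    print(f"{name}: ATE = {ate(types):.2f}, LATE = {late(types):.2f}")
```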

Use Table 2 to comment on the following statements. For each statement, do you agree or
disagree with the statement, and explain why (be specific).
3)

Families with large numbers of children tend to be unusual in certain ways, in some cases
coming from certain religious/ethnic backgrounds (traditional Catholic families, Mormons,
etc.). So the analysis in regressions (4) and (5) is not providing a valid estimate of the effect of family size on labor supply; it just reflects this religious/ethnic effect. (3 points)

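Disagree. This would be a valid criticism of OLS, but it entirely misses the point that using the instruments with TSLS eliminates the potential correlation in question: the sex mix of the first two children is, in effect, randomly assigned, so it is uncorrelated with religious/ethnic background.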

4)

Even though having large families reduces female labor force participation, this is only half
of the story because their husbands will work more to compensate for the loss of the wife's
earnings. (3 points)

According to the evidence in regression (6) in Table 2, this is wishful thinking. Husband's weeks worked increase slightly, but the estimated effect is much smaller than the decline in the wife's weeks worked, and it is statistically insignificant.
By the way, when this analysis is repeated using income earned by the wife and husband, one finds that the TSLS estimate of the husband's income increases slightly with more kids, but not nearly enough to compensate (on average) for the decline in the wife's income. That said, monetary income is not necessarily the best measure of household economic well-being; for example, the wife could be cutting back on hours but also cutting back on expenses (lower child care payments, for example), so a more complete analysis of the effect of large families on household economic well-being would need to take into account the value of home production and market purchases of home services.

Parts III and IV were drawn from Angrist, J.D. and W.N. Evans (1998), "Children and Their Parents' Labor Supply: Evidence from Exogenous Variation in Family Size," American Economic Review 88, 450-477.

Questions for Part V (20 points). Please answer these questions in Blue Book V
1)

The value of GDP growth in 2005:III was 4.1 (that is, in the third quarter of 2005, GDP
grew by 4.1% at an annual rate).
a) Use regression (1) in Table 3 to compute a forecast of GDP growth for 2005:IV. (3
points)

The forecast for quarterly GDP growth for 2005:IV is 2.42 + 0.27 × 4.1 = 3.53 ≈ 3.5, or 3.5% at an annual rate.

b) Suppose that the errors in regression (1) are normally distributed. Compute a 95%
prediction interval (forecast interval) for GDP growth in 2005:IV. (3 points)
A 95% prediction interval is approximately given by the point forecast ± 1.96 × SER (ignoring the uncertainty from estimating the coefficients), which is 3.5 ± 1.96 × 3.3 = (−2.9%, 10.0%) (which is huge).
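The forecast and the approximate interval in (a) and (b) amount to a few lines of arithmetic. A Python sketch using the coefficients and SER quoted above; like the text, it treats the SER as the forecast standard deviation and so ignores the sampling uncertainty in the estimated coefficients.

```python
intercept, ar_coef = 2.42, 0.27  # regression (1), Table 3
ser = 3.3                        # standard error of the regression
growth_2005q3 = 4.1              # GDP growth in 2005:III, annual rate

forecast = intercept + ar_coef * growth_2005q3
half_width = 1.96 * ser
lower, upper = forecast - half_width, forecast + half_width
print(f"2005:IV forecast: {forecast:.2f}%")
print(f"approximate 95% forecast interval: ({lower:.1f}%, {upper:.1f}%)")
```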

c)

Suppose that forecast errors come in clusters, for example, some years have more
volatile GDP growth than others, so that GDP growth is more difficult to predict in
some years than in others. Suggest a modification of regression (1) in Table 3 that
would produce more reliable forecast intervals if there is this forecast error volatility
clustering. (2 points)

Regression (1) could be augmented with an ARCH or GARCH model for the error variance, which models the conditional variance of the error as depending on past squared errors (and, for GARCH, on its own past values) and thus captures volatility clustering. If the error variance is currently low, the ARCH estimate of the variance will be small and the forecast interval will be tighter; if volatility is currently high, the interval will be wider.
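The mechanism can be sketched with the GARCH(1,1) variance recursion σ²_t = α_0 + α_1 u²_{t−1} + φ_1 σ²_{t−1}. The parameter values below (α_0 = 1.0, α_1 = 0.2, φ_1 = 0.7, giving an unconditional variance of 10, roughly the square of the SER of 3.3) and the residual histories are assumptions chosen for illustration, not estimates; in practice the parameters would be estimated jointly with regression (1) by maximum likelihood. The Python sketch only shows how a recent run of small or large residuals narrows or widens the next forecast interval.

```python
import math

def next_variance(residuals, alpha0, alpha1, phi1):
    """GARCH(1,1) conditional variance for the period after the last residual.

    sigma2_t = alpha0 + alpha1 * u_{t-1}**2 + phi1 * sigma2_{t-1},
    started at the unconditional variance alpha0 / (1 - alpha1 - phi1).
    """
    sigma2 = alpha0 / (1.0 - alpha1 - phi1)
    for u in residuals:
        sigma2 = alpha0 + alpha1 * u ** 2 + phi1 * sigma2
    return sigma2

ALPHA0, ALPHA1, PHI1 = 1.0, 0.2, 0.7     # hypothetical parameter values
calm = [0.5, -0.8, 0.3, 0.6, -0.4]       # hypothetical recent residuals
volatile = [5.0, -6.0, 4.5, -5.5, 6.0]

for label, resid in [("calm", calm), ("volatile", volatile)]:
    sd = math.sqrt(next_variance(resid, ALPHA0, ALPHA1, PHI1))
    print(f"{label}: forecast sd {sd:.2f}, 95% interval half-width {1.96 * sd:.2f}")
```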

2)

Table 3 reports heteroskedasticity-robust standard errors. Should it report HAC standard errors instead? Explain. (2 points)

No, heteroskedasticity-robust standard errors suffice. HAC standard errors are needed when the error term is serially correlated. If the number of autoregressive lags is large enough, then the error term will not be serially correlated: the serial correlation is captured by the autoregressive lags. The question is, is one lag enough? Regression (2) indicates that one lag is in fact enough: the hypothesis that the coefficients on the additional three lags are all zero is not rejected. Another way to determine the number of lags is the BIC, which is not reported in the table; however, the evidence in (2) is enough to provide confidence that a single lag is enough to leave the error term serially uncorrelated.

3)

In Business Week Online (January 9, 2006), David Wyss, chief economist for Standard & Poor's, wrote about how the recent decline of Term Spread has created worries about a
slowdown in U.S. economic growth. Based on the results in Table 3, do you think that
these worries are justified? Fully explain your reasoning. (5 points)

The coefficient on Term Spread in regressions (3) through (5) is positive, so a decline in Term Spread predicts a decline in GDP growth. To decide whether this is a worry or not, we need to see how large the effect is. From Figure 2, the decline in the past year or so has been large. Consider a decline of 1 percentage point. The predicted decline in GDP growth is 0.7 percentage points (regression (3)), 1.6 percentage points (regression (4)), and 0.2 percentage points (regression (5)). The estimated effect is substantial for regression (3), quite large for regression (4), and quite small for regression (5).
Therefore we must decide which regression to use. Regression (4) uses data ending in 1984, and it makes no sense to use that regression for a current forecast. Whether one uses regression (3) or (5) depends in part on whether the hypothesis of stability of regression (3) can be rejected (if it cannot, then the low estimate in (5) might just be sampling variation). The QLR statistic for regression (3) rejects the null hypothesis of stability at the 5% level, but the QLR statistic for regression (5) indicates stability over that subsample. These results indicate that the full-sample regression is unstable and thus inappropriate, but that the second-half regression is stable and thus can be used. The predicted effect in regression (5) is very small, which suggests that the worry discussed in the Business Week Online article is misplaced, based on the results in Table 3.

4)

Suppose the U.S. Federal Reserve Bank is considering setting Term Spread to 1.0, that is,
increasing Term Spread from its current value of approximately zero by 1.0 percentage
point. (Suppose that, because long rates are more sluggish than short rates, the Fed can do
this by lowering short-term interest rates until Term Spread equals 1.0.)
a) Use regression (5) to estimate the effect of this easing. (1 point)

The estimated effect is an increase in quarterly GDP growth of 0.18 × 1.0 = 0.18 percentage points (annual rate).

b) In your judgment, do you think that your answer in (a) provides a good estimate of the
effect of this proposed policy intervention by the Fed? Why or why not? (4 points)
No. For this to be valid for policy purposes, Term Spread would need to be exogenous. But interest rates, especially long rates, are set by market participants who look ahead to future economic conditions. In particular, they would be taking into account the Fed's expected future actions. To be more specific, a temporary tightening of short-term interest rates would be associated with a decline in Term Spread, but it would also be associated with slower GDP growth. This can be thought of as omitted variable bias (or it can be phrased as simultaneous causality); in either event, Term Spread is endogenous, so the OLS estimate of the coefficient is a biased estimator of the causal effect that is of interest for the hypothetical policy.

Part I (24 points)


Please answer these questions in Blue Book I
The questions in Part I refer to the results in Tables 1 and 2.
1) Using regression (1) in Table 2:
a) (3 points) Compute the estimated effect on the child's years of education of an increase of four years in the mother's education.
4 × .097 = 0.388 years

b) (2 points) Compute a 95% confidence interval for your estimated effect in (a).
SE = 4 × .027 = 0.108, so the 95% confidence interval is 0.388 ± 1.96 × 0.108 = (0.18, 0.60)
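Because the effect of a four-year change is a constant times the coefficient, its standard error is that same constant times the coefficient's standard error. A short Python check of the numbers in (a) and (b):

```python
coef, se = 0.097, 0.027   # Mother's education, Table 2, regression (1)
change = 4                # four more years of the mother's education

effect = change * coef
effect_se = abs(change) * se  # SE(c * b) = |c| * SE(b)
ci = (effect - 1.96 * effect_se, effect + 1.96 * effect_se)
print(f"effect = {effect:.3f} years, SE = {effect_se:.3f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```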

2) Consider the relationship between the child's years of education and parental BMI, holding
constant the regressors in Table 2, column (1) other than parental BMI.
a) (2 points) Suggest a reason why this effect might be nonlinear.
Here is one example: Suppose BMI is just a proxy for omitted health factors. Extremely high or low
BMI could be associated with chronic illness which would make parents less able to spend time
parenting. If so, then very high and very low BMI would be associated with worse outcomes, relative
to BMI in a normal range. This suggests that the outcomes/BMI relationship might be modeled by a
quadratic, where the maximum of the quadratic is in the range of normal BMI.

b) (2 points) Can you reject the null hypothesis that the effect of parental BMI on the child's years of education is linear? Explain.
No. The F-statistic testing the hypothesis that the coefficients on (Mother's BMI)², (Father's BMI)², and (Mother's BMI) × (Father's BMI) are all zero has a p-value of 0.634, so the F-statistic is not significant at the 5% (or 10%) level.

3) Consider the regressions in Table 1.


a) (2 points) Explain why these regressions can be used to examine the proposition that the
assignment process of adoptees to families was in effect random.
If assignment is random, then in particular there should be no relationship between the characteristics of the family and the characteristics of the adoptee. One implication of this is that, in the population regression relating adoptee characteristics (Y) to family characteristics (X), the coefficients on the family characteristics should all equal zero. That is, the population coefficients on the family characteristics in all the regressions in Table 1 should be zero if assignment is random. These restrictions can be tested using an F-statistic for each regression.

b) (2 points) Using regressions (1) and (2), can you reject the hypothesis of random
assignment? Explain.
Yes, the F-statistic in both cases is significant at the 1% significance level (p-value < .01).

c) (2 points) Using regressions (3) and (4), can you reject the hypothesis of random
assignment? Explain.
No, the F-statistic in both cases is not significant at the 10% significance level (p-value > .10).

d) (3 points) Explain what your answers to (b) and (c) imply about the program. Explain, in
real-world, concrete terms, how you might reconcile any discrepancy between your
answers to (b) and (c).
The only difference between regression (1) and (3), and between regression (2) and (4), is that
regressions (3) and (4) include a full set of year dummy variables indicating the year in which the
adoption took place. The results are consistent with the assignment being random, conditional on the
year in which the adoption took place, but not with the assignment being random unconditionally.
Unconditionally, parents' income is positively associated with weight and height, but conditionally it is
not. This might arise if, in the early years of the program, parents tended to be richer and the children
tended to be older, while in later years of the program, parents had lower incomes and the children
were younger, but in each year the children in the adoptee pool were randomly assigned to parents in
the parent pool.

4) The standard errors reported in Tables 1 and 2 are clustered standard errors, clustered at
the level of the household.
a) (3 points) Explain specifically what this means, that is, what are clustered standard errors,
clustered at the level of the household? Be precise.
Clustered standard errors allow for the possibility that the regression errors (ui) are correlated across
observations within a cluster, but that the regression errors are uncorrelated between clusters. In this
instance, clustered standard errors allow for the possibility that the error term is correlated among
adoptees who are in the same household, but not across adoptees who are in different households.

b) (3 points) Provide a reason why the clustered standard errors could be larger than the
conventional heteroskedasticity-robust standard errors for the regressions in Table 2.
The clustered standard errors will exceed the conventional heteroskedasticity-robust standard errors if there is positive correlation between the errors within the household, given that the regressors (the parental characteristics) are the same for adoptees in the same household. This would arise if there are omitted variables in the regression that are correlated for the two adoptees. Such omitted variables might be omitted household characteristics. For example, the number of natural children in the family is omitted; if more children means less attention to each child, then the number of natural children could be a determinant of economic outcomes, but because it is the same for both adoptees in the same family, the regression errors for adoptees within the family would be positively correlated. In this case, the clustered standard errors would exceed the heteroskedasticity-robust standard errors.
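The comparison can be illustrated by simulation. The Python sketch below assumes a made-up design: 2,000 households with two adoptees each, a regressor that is a household characteristic (so it is identical for both adoptees), and an error that contains a shared household component, standing in for an omitted factor such as the number of natural children. It computes the slope's heteroskedasticity-robust (HC0) and household-clustered standard errors from their textbook formulas, without the small-sample degrees-of-freedom adjustments that software usually applies; the clustered one comes out larger.

```python
import math
import random

random.seed(3)

def slope_with_ses(x, y, clusters):
    """OLS slope of y on x (with intercept) and its HC0 and cluster-robust SEs."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    xd = [xi - x_bar for xi in x]
    sxx = sum(v * v for v in xd)
    b1 = sum(v * (yi - y_bar) for v, yi in zip(xd, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # HC0: each observation's score is squared separately.
    var_hc = sum((v * e) ** 2 for v, e in zip(xd, resid)) / sxx ** 2
    # Cluster-robust: scores are summed within each household before squaring.
    cluster_scores = {}
    for g, v, e in zip(clusters, xd, resid):
        cluster_scores[g] = cluster_scores.get(g, 0.0) + v * e
    var_cl = sum(s * s for s in cluster_scores.values()) / sxx ** 2
    return b1, math.sqrt(var_hc), math.sqrt(var_cl)

x, y, household = [], [], []
for h in range(2_000):
    parent_char = random.gauss(0.0, 1.0)   # same for both adoptees in household h
    shared_shock = random.gauss(0.0, 1.0)  # hypothetical omitted household factor
    for _ in range(2):
        x.append(parent_char)
        y.append(1.0 + 0.5 * parent_char + shared_shock + random.gauss(0.0, 1.0))
        household.append(h)

b1, se_hc, se_cluster = slope_with_ses(x, y, household)
print(f"slope {b1:.3f}, HC0 SE {se_hc:.4f}, clustered SE {se_cluster:.4f}")
```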

Part II (22 points)


Please answer these questions in Blue Book II
The questions in Part II refer to the results in Tables 2 and 3.
1) Consider a female adoptee whose adoptive mother has 14 years of education, whose father
has 16 years of education, whose parents' income is $50,000, mother's BMI is 23, father's
BMI is 24, the mother does not drink, and the father does not drink. Also suppose that the
child was adopted in the initial program year (so all binary year variables equal zero).
a) (3 points) Using regression (2) in Table 3, compute predicted probability that the adoptee
grows up to be a drinker.
The z-score is
z = .013 × 14 + .022 × 16 + .079 × ln(50,000) + .000 × 23 + .000 × 24 + .374 × 0 + .211 × 0 + .203 × 0 − 1.300 = .089,
so the predicted probability is the standard normal cumulative distribution function evaluated at z = .089, which is 0.535.

b) (2 points) What is the difference in the predicted probabilities of drinking for the adoptee
in (a), compared with an adoptee whose parents have the same characteristics as those in
(a) except that the mother drinks?
If the mother drinks, the z-score is the same as above, plus .374, that is,
z = .089 + 0.374 = 0.463,
so the predicted probability is the standard normal cumulative distribution function evaluated at z = 0.463, which is 0.678.
The difference in predicted probabilities is .678 - .535 = .143. That is, the adoptee whose mother
drinks is 14.3 percentage points more likely to drink than the adoptee whose mother does not drink,
given the values of the other regressors.
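The probit calculations in (a) and (b) can be reproduced in Python with the standard normal CDF written in terms of `math.erf` (and `math.log`, the natural logarithm). The sketch uses the coefficients as they appear in the z-score above; apart from the mother-drinks coefficient (.374), the two other binary regressors (coefficients .211 and .203) equal zero for this adoptee, so they drop out, and the BMI coefficients are .000 as reported.

```python
import math

def std_normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Terms of the z-score from Table 3, regression (2), at the given characteristics.
z_base = (0.013 * 14 + 0.022 * 16 + 0.079 * math.log(50_000)
          + 0.000 * 23 + 0.000 * 24
          + 0.374 * 0 + 0.211 * 0 + 0.203 * 0
          - 1.300)
z_mother_drinks = z_base + 0.374  # same adoptee, but the mother drinks

p_base = std_normal_cdf(z_base)
p_mother_drinks = std_normal_cdf(z_mother_drinks)
print(f"z = {z_base:.3f}, P(drinker) = {p_base:.3f}")
print(f"z = {z_mother_drinks:.3f}, P(drinker) = {p_mother_drinks:.3f}")
print(f"difference = {p_mother_drinks - p_base:.3f}")
```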

c) (2 points) Now use the linear probability model from Table 2 to estimate the change in
predicted probabilities for the comparison in 1(b) (that is, a nondrinking vs. a drinking
mother, with the values of the other regressors given at the beginning of this question).
The corresponding linear probability model in Table 2 is regression (6). The coefficient on Mother
Drinks in that regression is the estimated effect of a unit increase in Mother Drinks on the probability
that the child drinks, holding the other variables constant. The estimated increase in the probability of
drinking from having a drinking mother is 0.135, approximately the same as the increase of 0.143 estimated using the probit model.

2) Using the results in Tables 2 and 3, do you agree or disagree with the following statements?
Explain.
a) (5 points) Many countries impose restrictions on foreign adopting parents, including limits on parental BMI and parents' education. The results in Tables 2 and 3 support these policies in the sense that Tables 2 and 3 show that high parental BMI and low parental education both are associated with worse outcomes for adoptees.
Agree. Conditional on the year of adoption, the assignment of children to parents does indeed appear to be random, so this is a valid experiment for determining the causal effect of assigning children to parents with different characteristics. This is completely analogous to a drug study, where the treatment is the parents' characteristics. The results in Tables 2 and 3 show that some parent characteristics do affect child outcomes, although others do not. For example, the adoptee will have more education if the mother's BMI is lower and if the mother's education is higher. These magnitudes are statistically significant and, arguably, large: in Table 2, regression (1), four more years of the mother's education is associated with approximately 0.4 years of the child's education. (One could also argue that this is a relatively small increase in education; this is a matter of judgment. But in light of the small number of manipulable variables that affect educational outcomes, this estimate of 0.4 years is rather substantial.)

b) (5 points) The results in Tables 2 and 3 show that dieting by overweight mothers has
positive benefits for children. Specifically, consider a mother who decreases her BMI by
10 (for an obese woman, this corresponds to a weight drop of approximately 25%). On
average, holding other family characteristics constant, we would expect to see this weight
loss lead to an economically substantial increase in the childs years of education and in
the childs probability of graduating from college.
Disagree. From Table 2, regression (1), a drop in BMI of 10 is associated with an increase in the
child's years of education of 0.88, that is, almost one year. This is a sizeable amount in a real-world
sense. Using the coefficient from the linear probability model in Table 2, regression (3), a drop in BMI
of 10 is associated with an increase of 0.17, or 17 percentage points, in the probability of being a
college graduate, also a large amount in a real-world sense. These effects are statistically
significant. It does not follow, however, that these estimates are causal effects of dieting. They are
causal effects of placing children in households with these different characteristics. The question is: is
E(u|X,W) = 0, where W represents the year dummies and X represents the parental characteristics in
Tables 2 and 3? Arguably, the answer is no. For example, suppose it is not BMI but maternal health
that is the determinant of Y (the child's education). Then health is an omitted variable that is correlated
with maternal BMI, so E(u|X,W) ≠ 0. In real-world terms, dieting would change BMI but might not
change the other health conditions (or might only partially change them), so the true determinant,
health, would not change (or might only change slightly). This is standard omitted variable (OV) bias.
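The omitted-variable argument can be illustrated with a small simulation. Everything here is a hypothetical data-generating process, not the adoption data. Maternal health drives the child's education, BMI is correlated with health, and a separate "diet" component moves BMI without touching health. Regressing education on BMI then shows a sizeable slope even though BMI has no causal effect, while the diet-driven part of BMI has essentially no relation to education.

```python
import random


def ols_slope(x, y):
    """Slope coefficient from an OLS regression of y on x with an intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


random.seed(0)
n = 20_000
# Hypothetical data-generating process (illustration only):
health = [random.gauss(0, 1) for _ in range(n)]        # unobserved maternal health
diet = [random.gauss(0, 1) for _ in range(n)]          # BMI variation unrelated to health
bmi = [25 - 2 * h + d for h, d in zip(health, diet)]   # BMI is lower when health is better
educ = [12 + h + random.gauss(0, 1) for h in health]   # child's education depends on health only

slope_bmi = ols_slope(bmi, educ)    # close to -0.4: looks like a BMI "effect"
slope_diet = ols_slope(diet, educ)  # close to 0: changing BMI alone does nothing
print(round(slope_bmi, 2), round(slope_diet, 2))
```

In this setup the population slope of education on BMI is cov(educ, BMI)/var(BMI) = -2/5 = -0.4, produced entirely by the omitted health variable.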

c) (5 points) The results in Table 3 shed light on the nature-nurture debate. These tables
show that paternal characteristics (such as drinking and being overweight) are transmitted
primarily through a genetic path, whereas maternal characteristics seem to be transmitted
primarily through a non-genetic (that is, environmental) path.
Agree. The coefficients on education, BMI, and parental drinking are all much larger for the natural
children than for the adoptees. This is especially true for the paternal variables. For example, the
change in the z-score for college graduation associated with 4 additional years of paternal education
is large (0.105 × 4 = 0.42) for natural children, but is a slight, statistically insignificant decrease
(-0.010 × 4 = -0.04) for adoptees. This is strongly suggestive of a genetic pathway for the paternal
education effect, not a social pathway. For mothers, the difference in the coefficients between the
adoptees and natural-born children is smaller. The effect of maternal BMI on college graduation is
quite similar for the two groups (-.086 vs. -.108), suggesting this pathway is mainly environmental.
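The arithmetic for the paternal-education comparison is reproduced below with the coefficients quoted in the text. The "gap" line follows the text's reasoning: adoptees share only the environment and natural children share both genes and environment, so the difference is read here only as a rough gauge of the genetic path.

```python
# Coefficients quoted in the text: change in the college-graduation z-score
# per additional year of paternal education.
coef_natural = 0.105
coef_adoptee = -0.010
extra_years = 4

effect_natural = coef_natural * extra_years   # 0.42
effect_adoptee = coef_adoptee * extra_years   # -0.04

# Adoptees share only the environment, natural children share genes and environment,
# so the gap is a rough gauge of the genetic path (under the adoption design).
genetic_gap = effect_natural - effect_adoptee  # 0.46
print(round(effect_natural, 2), round(effect_adoptee, 2), round(genetic_gap, 2))
```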

Questions for Part III (34 points)


Please answer these questions in Blue Book III
The questions in Part III refer to Table 4.
1) (3 points) Suggest a reason why TV exposure might be endogenous in regression (1).
An endogenous regressor is one that is correlated with the error term. Here are two reasons why TV
exposure could be correlated with the error term (one is enough). (1) Simultaneous causality: if you are
obese, you are less mobile, so you sit more, and you might as well sit in front of the TV. (2) An omitted
variable, the amount of outdoor activity, which burns calories and reduces obesity (it is a determinant of
obesity) and which is correlated with TV exposure (you watch TV inside).

2) Regression (3) uses three variables as instrumental variables for TV exposure. For each
instrument, explain whether, in your judgment, the instrument plausibly is exogenous:
a) (2 points) the Price of TV advertising in the county;
The desired exogeneity condition is E(u|Z,W) = 0, where Z are the instruments, u is the error term in the
BMI equation, and W are the additional regressors (included exogenous regressors) in the obesity
equation. Plausibly there is no feedback from BMI of the individual child to the regional price of
advertising, so there is no simultaneous causality. You also need to think about whether there might be
an omitted variable in the BMI equation that would be correlated with the price of TV advertising. One
possibility is the population density. Counties with more people will have higher TV ad prices (the TV ads
reach more people). If children in urban areas get less outdoor play and exercise than children in rural
areas then the price of TV advertising will be correlated with the omitted variable, outdoor exercise. If so,
the instrument would not be exogenous.

b) (2 points) the Number of households with TV in the county;


The reasoning here is the same as in (a): there is no simultaneous causality, but the number of households
would be correlated with any rural/urban differences such as outdoor activity. Additionally, almost all
American households have TVs, so any variation in this instrument would be associated with unusual
conditions such as extreme poverty, subpopulations that do not watch TV (Amish, rural Alaska), etc. But
these extreme situations are plausibly also determinants of health and exercise patterns (the Amish walk
instead of using cars) that would appear in the error term of the childhood BMI equation, so Z would be
correlated with u.

c) (2 points) the average annual county Temperature.


Again there would be no simultaneous causality, but the regional temperature will be correlated with
population characteristics, for example a higher proportion of Hispanics living near the border with Mexico.
However, the child's race is included as a W variable, so the question is whether temperature is correlated
with the error, controlling for the W variables (which include race). To the extent that outdoor activity is
temperature-dependent, outdoor activity would be omitted (in u) and correlated with temperature.

3) Consider regression (3).


a) (3 points) Suppose the instruments in regression (3) are weak. If so, what would the
consequence be for interpreting the results in column (3), specifically the coefficient on
TV exposure and its standard error?
If the instruments are weak, the distribution of the TSLS estimator is poorly approximated by its large-sample
normal distribution. Specifically, the TSLS coefficient will be biased (toward the OLS estimate), its standard
error is unreliable, and conventional confidence intervals will not contain the true value 95% of the time.
In fact, the coverage rate of conventional confidence intervals (and the size of hypothesis tests) can be very
far from the nominal rate of 95% (or size of 5%).

b) (3 points) Based on the results in Table 2 (TYPO: this should be Table 4), are the
instruments weak, are they strong, or do you need more information before you can
decide? Explain.
The first-stage F-statistic testing the hypothesis that the coefficients on the instruments in the first-stage
regression are all zero is 41.92. This exceeds the rule-of-thumb value of 10, so the instruments can be treated as
strong. Note: it is not enough that the first-stage F-statistic be statistically significant at the 5% or 1%
significance level. The instruments can have statistically significant coefficients but still explain so little of
the variation in the endogenous regressor that the problems listed in response to 3(a) arise, so that the
instruments are weak.
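The rule-of-thumb check can be sketched as follows. This uses the homoskedasticity-only F formula computed from the restricted and unrestricted first-stage sums of squared residuals. Table 4 may instead report a heteroskedasticity-robust F, so treat the formula as an assumption. The SSR values in the example are hypothetical. They give an F that would typically be significant at the 5% level but still falls below 10.

```python
def first_stage_f(ssr_restricted, ssr_unrestricted, q, n, k_unrestricted):
    """Homoskedasticity-only F-statistic for the q instrument coefficients in the first stage.

    ssr_restricted:   SSR of the first stage estimated WITHOUT the instruments
    ssr_unrestricted: SSR of the first stage estimated WITH the instruments
    q:                number of instruments (restrictions tested)
    n:                number of observations
    k_unrestricted:   number of regressors in the unrestricted regression, excluding the intercept
    """
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (n - k_unrestricted - 1)
    return numerator / denominator


def instruments_look_strong(f_stat, threshold=10.0):
    """Rule of thumb: a first-stage F above 10 suggests the instruments are not weak."""
    return f_stat > threshold


# Hypothetical SSRs: F = 5 is statistically significant but below the cutoff of 10.
print(first_stage_f(110.0, 100.0, q=2, n=103, k_unrestricted=2))
print(instruments_look_strong(41.92))  # first-stage F reported in Table 4
```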

4) Consider the J-statistic in column (3).


a) (3 points) Suppose you were to reject the null hypothesis using this J-statistic. What
would you conclude?
The J-statistic tests the hypothesis that the overidentifying restrictions are valid, that is, that all the
instruments are exogenous, assuming that at least one of them is exogenous. Rejection by the J-statistic indicates that one or more of the instruments are not exogenous.

b) (3 points) Using the J-statistic actually reported in column (3), do you reject the null
hypothesis at the 5% significance level? Explain how you reached this conclusion (be
precise).
Under the null hypothesis (and homoskedastic errors), the J-statistic has a large-sample chi-squared distribution with degrees of freedom equal to the number of
overidentifying restrictions. There are 3 instruments and one endogenous regressor so there are 2
overidentifying restrictions. From the chi-squared table, the 5% critical value of the chi-squared
distribution with 2 degrees of freedom is 5.99; from regression (3) in Table 4, the J-statistic is 0.308 <
5.99, so the null hypothesis is not rejected at the 5% significance level.
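With 2 degrees of freedom the chi-squared distribution is exponential with mean 2. Its upper tail is therefore exp(-x/2), and both the 5% critical value and the p-value of the reported J-statistic can be computed in closed form without a statistical table. This is a minimal sketch that covers only the 2-degree-of-freedom case.

```python
import math


def chi2_df2_upper_tail(x):
    """P(X > x) for X ~ chi-squared with 2 degrees of freedom.

    With 2 degrees of freedom the chi-squared distribution is exponential with mean 2,
    so the upper tail has the closed form exp(-x / 2).
    """
    return math.exp(-x / 2.0)


def chi2_df2_critical_value(alpha):
    """Critical value c such that P(X > c) = alpha, for 2 degrees of freedom."""
    return -2.0 * math.log(alpha)


n_instruments, n_endogenous = 3, 1
df = n_instruments - n_endogenous          # 2 overidentifying restrictions
j_stat = 0.308                             # J-statistic reported for regression (3)
critical = chi2_df2_critical_value(0.05)   # about 5.99
p_value = chi2_df2_upper_tail(j_stat)      # about 0.86
print(df, round(critical, 2), round(p_value, 2), j_stat > critical)
```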

5) (3 points) A researcher suggests using as instruments a full set of county binary variables
(county dummy variables). What would be the effect of adding a full set of county dummy
variables to regression (2)?
The instruments in regression (2) all vary at the county level, for example Temperature is the annual
mean temperature in the county. Because these vary at the county level, they are perfectly explained
by a complete set of county dummies; that is, a regression of Temperature on an intercept and n-1
county indicator variables will have R² = 1.00. Thus all three of the instruments in regression (2) are a
perfect linear combination of the county dummies, and for this reason adding the full set of county
dummies (n-1 indicators, plus the intercept) to regression (2) will result in perfect multicollinearity.
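The perfect-multicollinearity point can be checked directly on a toy example. The county labels and temperatures below are hypothetical. A county-level variable is rebuilt exactly from an intercept plus n - 1 county indicators, which is why a regression of that variable on the dummies has R² = 1 and why adding the dummies alongside it makes the regressors perfectly collinear.

```python
# Hypothetical data: six children living in three counties (labels and temperatures are made up).
county = ["A", "A", "B", "B", "C", "C"]
temp_by_county = {"A": 50.0, "B": 62.5, "C": 71.0}
temperature = [temp_by_county[c] for c in county]   # varies only across counties

counties = sorted(temp_by_county)
base, others = counties[0], counties[1:]
# Intercept plus n - 1 county indicators (county A is the omitted category).
dummies = {c: [1.0 if ci == c else 0.0 for ci in county] for c in others}

# Temperature = temp_A * 1 + sum_c (temp_c - temp_A) * D_c, exactly, for every child.
reconstructed = [
    temp_by_county[base]
    + sum((temp_by_county[c] - temp_by_county[base]) * dummies[c][i] for c in others)
    for i in range(len(county))
]
print(reconstructed == temperature)  # True: an exact linear combination, so R^2 = 1
```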

6) (5 points) Another researcher suggests replacing the instruments in regression (3) with a new
instrumental variable, ProSports, that equals one if at least one local professional sports team
was in the playoffs during the study period, and equals zero otherwise. For the purposes of
this question, suppose that ProSports is a valid instrument. Describe, in concrete and
everyday terms, a reason why the local average treatment effect obtained using ProSports
would differ from the average treatment effect. In your example, is the local average
treatment effect greater than or less than the average treatment effect?
The local average treatment effect is a weighted average treatment effect, where the most weight is
placed on those most affected by the instrument. In this case, the LATE would be the treatment effect
for those whose TV watching is most swayed by the presence of a successful pro sports team. These
are people who would not normally watch TV, but would do so if they have a playoff game to watch.
Call these people non-TV-watching sports fans.
The question is, does the effect of fast-food advertising differ for non-TV-watching sports fans,
compared with the rest of the population? If so, then the LATE differs from the average treatment effect in the
population. Here is a possible reason. Suppose these non-TV-watching sports fans are normally out
playing sports and leading healthy lives, which means not eating too much fast food. Then seeing ads
on TV will not induce them to eat more fast food. If so, the LATE using ProSports would be less than
the average treatment effect.
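With heterogeneous effects, a standard way to write the LATE is E[β_i π_i] / E[π_i]. Here β_i is individual i's treatment effect and π_i is how strongly the instrument shifts i's treatment. The minimal sketch below uses two made-up groups that mirror the story above: a small, highly responsive group with a small effect, and everyone else. The LATE ends up below the ATE.

```python
# Hypothetical population (all numbers made up for illustration).
# Each group: (population share, pi_i = first-stage response of TV hours to ProSports,
#              beta_i = effect of one more TV hour on BMI)
groups = [
    (0.2, 0.8, 0.05),  # non-TV-watching sports fans: very responsive to ProSports, small effect
    (0.8, 0.1, 0.50),  # everyone else: barely responsive to ProSports, larger effect
]

# Average treatment effect: population-share-weighted mean of beta_i.
ate = sum(share * beta for share, _, beta in groups)

# LATE: weights each beta_i by how strongly the instrument moves that group's treatment.
late = (sum(share * pi * beta for share, pi, beta in groups)
        / sum(share * pi for share, pi, _ in groups))

print(round(ate, 3), round(late, 3))  # 0.41 0.2: LATE < ATE in this example
```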

7) (5 points) Do you agree or disagree with the following statement? Explain fully. (The
sample average of TV Exposure is approximately 0.5 hours.)
The results in Table 1 (TYPO: this should be Table 4) indicate that a ban on TV fast-food
advertisements would reduce the BMI among children by an amount that is statistically
significant and meaningful in a real-world sense.
Here is a full-credit Agree response:
The criticism in question 1 implies that TV Exposure is endogenous so regression (1) is not
meaningful. Therefore, to evaluate the statistical and real-world significance, we need to look at the IV
estimates in regression (3). The coefficient on TV Exposure is statistically significant at the 5% level
in regression (3). Concerning the real-world significance, the change in BMI associated with a ban on
advertising (going from 0.5 to 0 hours/week) is -0.336 × 0.5 = -0.17. One way to see if this is large is to
compare it to the average BMI increase over the past 30 years, which is (from the intro to Part III)
17.37 - 16.63 = 0.74. Taken literally, a ban on fast-food advertising would reverse 0.17/0.74 ≈ 0.23, or
a little under one-quarter of the mean childhood weight gain over the past three decades. Thus, in a
real-world, medical sense, this is a large and statistically significant effect. The question then turns to
whether the instruments are valid, that is, whether regression (3) provides a consistent estimate of the
causal effect of fast-food TV advertising on BMI.
Based on the evidence presented, the proposed set of three instruments is valid: specifically, the instruments
are not weak (first-stage F = 41.92 > 10) and they appear to be exogenous. This latter point is
supported by the failure of the J-statistic to reject the null hypothesis that the instruments are
exogenous, that is, to reject the overidentifying restrictions. Although some doubts were raised about
the validity of the instruments in the response to question 2, these seem not to be justified, or at least
empirically important, based on this J-statistic. Because the instruments are valid, inference based on
regression (3) is internally valid and the conclusion of the previous paragraph stands.
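The real-world-significance arithmetic, common to both responses, can be reproduced as follows. One assumption is made: the TSLS coefficient on TV Exposure is taken as +0.336 BMI points per hour, which is what the product -0.336 × 0.5 for a move from 0.5 to 0 hours implies. Carrying the unrounded change of -0.168 through gives a share of about 0.23.

```python
# Numbers from the text; the coefficient is taken as +0.336 BMI points per hour of TV exposure,
# which is what the product -0.336 x 0.5 for a move from 0.5 to 0 hours implies.
coef_tv = 0.336
change_in_exposure = 0.0 - 0.5                 # ban: exposure falls from about its sample mean to zero
change_in_bmi = coef_tv * change_in_exposure   # about -0.17

bmi_rise_30_years = 17.37 - 16.63                     # 0.74
share_reversed = -change_in_bmi / bmi_rise_30_years   # about 0.23

print(round(change_in_bmi, 3), round(bmi_rise_30_years, 2), round(share_reversed, 2))
```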
Here is a full-credit Disagree response:
The criticism in question 1 implies that TV Exposure is endogenous so regression (1) is not
meaningful. Therefore, to evaluate the statistical and real-world significance, we need to look at the IV
estimates in regression (3). The coefficient on TV Exposure is statistically significant at the 5% level
in regression (3). Concerning the real-world significance, the change in BMI associated with a ban on
advertising (going from 0.5 to 0 hours/week) is -0.336 × 0.5 = -0.17. One way to see if this is large is to
compare it to the average BMI increase over the past 30 years, which is (from the intro to Part III)
17.37 - 16.63 = 0.74. Taken literally, a ban on fast-food advertising would reverse 0.17/0.74 ≈ 0.23, or
a little under one-quarter of the mean childhood weight gain over the past three decades. Thus, in a
real-world, medical sense, this is a large and statistically significant effect. The question then turns to
whether the instruments are valid, that is, whether regression (3) provides a consistent estimate of the
causal effect of fast-food TV advertising on BMI.
Based on the evidence presented, the proposed set of three instruments is not weak (first-stage F
= 41.92 > 10); however, there is good reason to believe the instruments are not valid. It is true that the J-statistic
fails to reject the null hypothesis that the overidentifying restrictions are valid. However, it should be
borne in mind that the J-statistic cannot test whether all the instruments are exogenous. It can only
test the exogeneity of two of the three instruments (in general, the number of instruments minus the number
of endogenous regressors), assuming that at least one instrument is exogenous. The
response to question 2 raises doubts about all of these instruments: they all could be correlated with
unobserved county variation that is a determinant of childhood obesity, in particular urban/rural
differentials and access to outdoor exercise opportunities. It is possible for the J-statistic not to reject
but still for all the instruments to be endogenous, and the a priori reasoning in the response to
question 2 suggests that this is what is going on in the TSLS regressions. This means that the
estimate in regression (2) is not a valid estimate of the causal effect of fast-food TV advertising on
BMI, so causal inference about the proposed policy change is not justified by the results in this table.


Questions for Part IV (20 points)


Please answer these questions in Blue Book III
The questions in Part IV refer to Table 5.
1) In an expansion year, new teams are added to the league.
a) (3 points) What is the immediate, or impact, effect of an expansion on competitiveness?
(Provide a numerical estimate and interpret.)
The dynamic effect of an expansion year is estimated in regression (3). The immediate or impact effect is
the coefficient on ExpansionYear_t (no lags), which is 1.54. That is, an expansion year is associated with a
decrease in competitiveness, specifically, a (statistically significant) increase in the standard deviation of
the winning percentage by 1.54 percentage points. (This makes sense, the expansion teams are typically
not as good as the original teams.)

b) (3 points) What is the cumulative dynamic effect of the expansion on competitiveness,
two years after the expansion? (Provide a numerical estimate and interpret.)
The cumulative dynamic effect after two years is the impact effect plus the dynamic effects after one and
two years. This is 1.544 - 0.583 - 0.293 = 0.668. Thus, after two years, the standard deviation of the
winning percentage has had a net increase of only 0.67 percentage points, less than half the initial impact
increase. This indicates that competitive balance is slowly restored after the expansion.
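Cumulative dynamic multipliers are running sums of the individual dynamic multipliers. The minimal sketch below uses the three ExpansionYear coefficients quoted in the text. The two lag coefficients enter with negative signs.

```python
from itertools import accumulate

# Dynamic multipliers on ExpansionYear in regression (3), as quoted in the text:
# impact effect (lag 0), then lags 1 and 2.
multipliers = [1.544, -0.583, -0.293]

# Cumulative dynamic multipliers: running sums of the individual multipliers.
cumulative = [round(c, 3) for c in accumulate(multipliers)]
print(cumulative)  # [1.544, 0.961, 0.668]
```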

c) (3 points) Compute the standard error for the cumulative dynamic effect in (b). If you do
not have enough information to do so, explain how you would compute this standard
error and what additional information you would need.
There is not enough information. There are several ways to calculate the standard errors for this 2-year
cumulative effect. One way to do so would be to use the full covariance matrix of the regression
coefficients. Let $\hat\beta_1$, $\hat\beta_2$, and $\hat\beta_3$ respectively denote the coefficients on ExpansionYear_t,
ExpansionYear_{t-1}, and ExpansionYear_{t-2}. Then

$$\operatorname{var}(\hat\beta_1 + \hat\beta_2 + \hat\beta_3) = \operatorname{var}(\hat\beta_1) + \operatorname{var}(\hat\beta_2) + \operatorname{var}(\hat\beta_3) + 2\operatorname{cov}(\hat\beta_1, \hat\beta_2) + 2\operatorname{cov}(\hat\beta_1, \hat\beta_3) + 2\operatorname{cov}(\hat\beta_2, \hat\beta_3).$$

The final three terms in this expression are missing from the table; their estimates are contained in the
full covariance matrix of the regression coefficients. However, this sample covariance matrix is not
provided.
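Given the full coefficient covariance matrix, the variance of the sum equals a'Va with a = (1, 1, 1). That is the sum of every entry of the matrix: the three variances plus twice each covariance. The matrix below is hypothetical, not from Table 5, and only shows the mechanics.

```python
def variance_of_sum(cov):
    """Var(b1 + ... + bk) from the k x k covariance matrix of the estimates.

    Summing every entry gives all the variances plus twice each covariance,
    i.e. a' V a with a = (1, ..., 1).
    """
    return sum(sum(row) for row in cov)


# Hypothetical covariance matrix for the three ExpansionYear coefficients (NOT from Table 5).
cov = [
    [0.090, -0.020, 0.010],
    [-0.020, 0.080, -0.015],
    [0.010, -0.015, 0.085],
]
se_cumulative = variance_of_sum(cov) ** 0.5
print(round(se_cumulative, 3))
```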

2) (3 points) Table 5 reports two sets of standard errors, heteroskedasticity-robust standard
errors and Newey-West standard errors. Which should be used here? Explain.
The Newey-West standard errors are appropriate if the errors are correlated over time; heteroskedasticity-robust
standard errors are not valid in that case. Here, it makes sense that there would be omitted determinants of
competitive balance that persist over time. For example, competitive balance is presumably affected by the relative
wealth of the different teams, not just by what they pay to free agents. The relative wealth of the different teams
would be serially correlated, so the regression errors would be serially correlated. Because the regression errors
are serially correlated, the Newey-West standard errors should be used.
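The minimal sketch below shows the Newey-West idea for the simplest case, the standard error of a sample mean. It adds Bartlett-weighted autocovariances to the ordinary variance. For a regression coefficient the same weighting is applied to products of the regressor and the residual. The series is hypothetical, and a commonly used lag choice is about 0.75 T^(1/3). With positive serial correlation the Newey-West standard error exceeds the one that ignores autocorrelation.

```python
def newey_west_lrv(x, lags):
    """Newey-West (Bartlett-kernel) estimate of the long-run variance of a series x.

    gamma_j is the sample autocovariance at lag j (divided by T), and lag j gets
    weight 1 - j / (lags + 1). With lags = 0 this is the ordinary variance.
    """
    T = len(x)
    mean = sum(x) / T
    d = [v - mean for v in x]

    def gamma(j):
        return sum(d[t] * d[t - j] for t in range(j, T)) / T

    lrv = gamma(0)
    for j in range(1, lags + 1):
        lrv += 2.0 * (1.0 - j / (lags + 1)) * gamma(j)
    return lrv


def se_of_mean(x, lags):
    """Standard error of the sample mean using the Newey-West long-run variance."""
    return (newey_west_lrv(x, lags) / len(x)) ** 0.5


# Hypothetical persistent series: runs of four +1s followed by four -1s.
x = ([1.0] * 4 + [-1.0] * 4) * 25
print(round(se_of_mean(x, 0), 4), round(se_of_mean(x, 1), 4))
```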


3) (3 points) A critic of this analysis asserts that the relationship in regression (3) might be
unstable and suggests computing the QLR statistic (with trimming of 15% on each end of the
sample, as is conventional). Is this a good recommendation for the purpose of assessing the
stability of regression (3)? Explain why or why not.
This is not a good recommendation. FreeAgents_t = 0 before the introduction of free agency (see Fig. 1).
So splitting the sample in, for example, 1965 will yield FreeAgents_t = 0 for every observation in the
pre-1965 subsample, which is perfectly multicollinear with the intercept term. Thus the QLR statistic cannot
be computed with trimming of 15% on each end of the sample.
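The same problem can be seen in the interacted (Chow-type) form of the break test. At each candidate break date, the break regressor is FreeAgents multiplied by a post-break dummy. Whenever FreeAgents is zero throughout the pre-break period, that interaction is identical to FreeAgents itself. The sample years and the first free-agency year below are hypothetical, not taken from Table 5 or Fig. 1. The sketch lists the trimmed candidate break dates at which this perfect collinearity occurs.

```python
# Hypothetical annual sample and start of free agency (NOT taken from Table 5 or Fig. 1).
years = list(range(1950, 2001))
first_free_agency_year = 1977
free_agents = [0 if y < first_free_agency_year else y - first_free_agency_year + 1
               for y in years]

T = len(years)
lo, hi = round(0.15 * T), round(0.85 * T)   # 15% trimming on each end

collinear_breaks = []
for tau in range(lo, hi + 1):
    # Break-date interaction used by a Chow-type test: FreeAgents times a post-break dummy.
    interacted = [fa if t > tau else 0 for t, fa in enumerate(free_agents)]
    # If FreeAgents is zero for every pre-break year, the interaction equals FreeAgents
    # itself, so the two columns are perfectly collinear and the F-statistic is undefined.
    if interacted == free_agents:
        collinear_breaks.append(years[tau])

print(len(collinear_breaks), collinear_breaks[0], collinear_breaks[-1])
```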

4) (5 points) Baseball owners assert that free agency reduces competitiveness across baseball
teams because rich teams can outbid poor teams, increasing talent disparities across teams.
Based on the results in Table 5, do you agree, disagree, or can you not reach a conclusion?
Explain.
Taken at face value, the regressions in Table 5 all suggest the opposite conclusion, that free agency has
been associated with a decrease in the standard deviation of winning percentages, that is, with an
increase in the competitive balance.
The question is whether to take this result at face value. It is hard to say without more information about
baseball than is provided in the introduction to this part. But it is evident from Fig. 1 that FreeAgents_t is
essentially picking up a trend towards greater competition. The question then is whether there are other
determinants of that trend that are correlated with FreeAgents_t, that is, whether there is omitted
variable bias. Plausibly, there might be. For example, if the pool of players is better now than it was, then
the difference in raw talent across teams might be smaller (hypothetically, suppose that in the 1950s there
were a few stars and many average players, but now all the players are much closer in quality; then
differences in quality across rosters would be smaller).
Note that, because the error term is serially correlated because of these omitted variables (we can also
deduce this from the substantial differences between the heteroskedasticity-robust and Newey-West
standard errors), it is not enough to say that FreeAgents_{t-1} is exogenous because it appears with a lag.
That lagged value of FreeAgents could be correlated with persistent elements of the error term, such as
improvements in the depth of talent.
