Harvard University
Economics 1123
Fall 2003
Midterm Exam
Solutions
parents. On holidays like Thanksgiving or Christmas, teens are more likely to be spending time
with their family rather than hanging out. (Other explanations are OK if the logic is sound.)
Question 2 (30 points)
a) (6 points) Explain what is meant by the SER in regression (3).
The SER is 3.89, which means that a typical prediction error from the regression has an absolute
magnitude of 3.89 (in units of incidents per day).
b) (6 points) Interpret the coefficient on teacherday in regression (3).
The effect on the number of incidents of having a teacher meeting day is .87 (an increase of .87
incidents/day), holding constant population, whether the day is a break day, and whether the day
falls in the summer. (Here is another way to say the same thing: holding constant population,
the effect of having a teacher meeting day on what would otherwise be a normal school day is to
increase juvenile property crime incidents by .87 [this wording is OK because including
breakday and summer means that the base case for the comparisons is a normal school day]).
c) (6 points) Suggest a reason why the errors in regression (3) might be heteroskedastic;
explain.
Heteroskedasticity occurs when the variance of the error term depends on one or more of the
regressors.
In this regression, var(u) could plausibly depend on population. Bigger cities will have more
incidents because there are more teens, and it is plausible that the variability of the number of
incidents from day to day is greater in big cities than small cities (the distribution for a small city
might range from 0 to 5; for a big city, it might range from 2 to 20). Thus the spread of the
distribution, as well as its mean, depends on the population; that is, var(u) will be a function of
population, one of the regressors.
d) (6 points) Using regression (3), compute the predicted value of the number of incidents on a
teacher meeting day for a city with a population of 200,000 (so pop = 2).
$$\widehat{incidents} = 1.39 + .87\,teacherday + .129\,breakday + .292\,summer + 1.58\,pop$$
$$= 1.39 + .87 \times 1 + .129 \times 0 + .292 \times 0 + 1.58 \times 2 = 5.42$$
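The substitution can be checked numerically. The following is a minimal sketch: the dictionary layout and the names `coef` and `x` are assumptions made for illustration, and only the coefficient values come from regression (3) as quoted in the solution.

```python
# Coefficients of regression (3), as quoted in the solution.
coef = {"const": 1.39, "teacherday": 0.87, "breakday": 0.129,
        "summer": 0.292, "pop": 1.58}

# Scenario from part (d): a teacher meeting day (not a break day, not summer)
# in a city of 200,000, so pop = 2.
x = {"const": 1, "teacherday": 1, "breakday": 0, "summer": 0, "pop": 2}

# Predicted value = sum of coefficient * regressor value.
predicted = sum(coef[name] * x[name] for name in coef)
print(round(predicted, 2))
```

Changing an entry of `x` (for example setting `teacherday` to 0) gives the prediction for a different scenario with the same coefficients.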
e) (6 points) The school superintendent in a city with population 200,000 is contemplating
changing a normal school day into a teacher meeting day. Use regression (6) (not
regression (3)) to estimate the effect of this decision on the number of juvenile property
crime incidents.
This can be done using a before and after calculation. Note that the terms that do not
involve teacherday, such as the term in breakday, are the same in the before and after
scenarios and thus cancel out when before is subtracted from after. The operational part of
the calculation (ignoring the terms that drop out) thus is:
after:
$$\widehat{incidents} = -.82\,teacherday + 3.05\,(teacherday \times pop) - 1.21\,(teacherday \times pop^2) + .13\,(teacherday \times pop^3)$$
$$= -.82 \times 1 + 3.05\,(1 \times 2) - 1.21\,(1 \times 2^2) + .13\,(1 \times 2^3) = 1.48$$
before:
$$\widehat{incidents} = -.82\,teacherday + 3.05\,(teacherday \times pop) - 1.21\,(teacherday \times pop^2) + .13\,(teacherday \times pop^3)$$
$$= -.82 \times 0 + 3.05\,(0 \times 2) - 1.21\,(0 \times 2^2) + .13\,(0 \times 2^3) = 0$$
The predicted effect of the decision is after - before = 1.48; that is, according to regression
(6), the predicted effect of the superintendent's decision is an increase of 1.48 juvenile property
crime incidents on the proposed teacher meeting day.
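The before-and-after calculation can be reproduced with a short sketch. The function name `teacherday_terms` is hypothetical, and the signs of the coefficients are an assumption: the teacherday and squared-interaction coefficients are taken as negative (-.82 and -1.21), which is the combination that reproduces the stated result of 1.48.

```python
def teacherday_terms(teacherday, pop):
    """Terms of regression (6) that involve teacherday; all other terms cancel.

    Signs are assumed: -.82 and -1.21 negative, 3.05 and .13 positive.
    """
    return (-0.82 * teacherday
            + 3.05 * teacherday * pop
            - 1.21 * teacherday * pop ** 2
            + 0.13 * teacherday * pop ** 3)

pop = 2                               # city of 200,000
after = teacherday_terms(1, pop)      # teacher meeting day
before = teacherday_terms(0, pop)     # normal school day
effect = after - before
print(round(effect, 2))
```

Because every term is multiplied by teacherday, the "before" value is zero for any population, so the effect equals the "after" value.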
Question 3 (26 points)
a) (8 points) One possibility is that pop enters the population regression function nonlinearly.
What does regression (5) tell us about this possibility? Briefly explain (be precise).
The hypothesis of linearity implies that the population coefficients on $pop^2$ and $pop^3$ are zero.
This is tested using the F-statistic, which is 171.5, so with p < .001 the hypothesis is rejected at
the 5% (1%, etc.) significance level. This provides evidence that pop enters the population
regression function for specification (5) nonlinearly. (Note: it is not a complete answer to
examine only the individual t-statistics on the two individual coefficients on $pop^2$ and $pop^3$,
unless you do so using the Bonferroni critical values.)
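For reference, the Bonferroni critical value mentioned in the note can be computed as follows. This is a sketch under the assumption that large-sample (standard normal) critical values are used; the function name `bonferroni_critical_value` is hypothetical.

```python
from statistics import NormalDist

def bonferroni_critical_value(alpha, n_tests):
    """Two-sided critical value applied to each of n_tests individual t-statistics.

    Assumes the t-statistics are approximately standard normal (large sample).
    The overall level alpha is split evenly across tests and across the two tails.
    """
    return NormalDist().inv_cdf(1 - alpha / (2 * n_tests))

# Two coefficients (on pop^2 and pop^3) tested at an overall 5% level.
crit = bonferroni_critical_value(0.05, 2)
print(round(crit, 2))
```

With a single test the function returns the usual 1.96; with two tests each t-statistic must clear a higher bar, which is why looking at the individual t-statistics against 1.96 is not a complete answer.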
b) (8 points) Another possibility is that the effect on crime of a no-school day is different in
bigger cities than in smaller cities. What do the regression results tell us about this
possibility? Briefly explain (be precise).
To examine whether the effect of teacherday depends on population, we need to examine
regressions with interactions between teacherday and pop. There are two such regressions, (4)
and (6).
Regression (4): There is a single interaction term, $teacherday \times pop$. The t-statistic is t = .40/.31
= 1.29, so the hypothesis that this coefficient is zero in the population is not rejected at the 5%
significance level. So, there is no statistical evidence of an interaction effect in this regression.
Regression (6): There are three interaction terms, $teacherday \times pop$, $teacherday \times pop^2$, and
$teacherday \times pop^3$. The hypothesis of no interaction effect is equivalent to saying that the
coefficients on all three terms must be zero in the population regression function. This is tested
using the F-statistic for these three coefficients, which is F = 1.75. Because its p-value of .155
exceeds .05, we cannot reject (at the 5% significance level) the null hypothesis that all three
coefficients are zero. So, there is no statistical evidence of an interaction effect in this
regression.
Overall, there is no statistical evidence that the effect of a teacher meeting day varies with the
population of the city.
c) (10 points) In words, briefly summarize your conclusions from Table 2 about the effect on
juvenile property crime of having a no-school day because of a teacher meeting day.
• There is no statistical evidence that the effect of a teacher meeting day varies with the
population of the city.
• There is statistical evidence that population enters the regression function nonlinearly.
• These two observations point to regression (5) as being the preferred specification.
• In regression (5), after controlling for population, the estimated effect of a teacher meeting
day, relative to a normal school day, is to increase the number of incidents of juvenile
property crime by .87, or nearly one. In a real-world sense this seems pretty big: hold a
teacher meeting day, you get a juvenile property crime incident. This effect is statistically
significant at the 5% (and 1%) significance level. [Also, could add: The 95% confidence
interval of (.42, 1.32) is rather tight in a practical sense; it provides strong evidence
against really big effects like 10 incidents per teacher meeting day.]
b) (10 points) Suggest two potential threats to the internal validity of your conclusions in 3(c).
That is, provide two potential threats to the internal validity of the regression analysis
summarized in Table 2. Explain why each threat could be relevant to this study (be precise).
Here are five:
• (Omitted variable bias) Crime rates vary with income. Thus (i) income belongs in the
regression function. Moreover, teacher meeting days cost the school district money.
Suppose that the poorest cities cannot afford teacher meeting days, and the richest ones have
many teacher meeting days. If so, then (ii) teacherday would be correlated with income.
Thus the two conditions for omitted variable bias would hold and there would be omitted
variable bias. Intuitively, teacherday would be picking up an income effect. This would
result in the OLS estimates understating the effect of teacherday (more income, more teacher
days and less crime).
• (Wrong functional form) Whether teens commit a crime could depend on the temperature,
with crime less likely below freezing. If so, then temperature should enter the equation and
moreover there should be interaction terms between temperature and teacherday. According
to this story, the effect on the number of incidents of having a teacher meeting day on a day
with cold weather would be less than if it is held during warm weather.
• (Errors-in-variables bias) If the administrative records used to construct these data were
incorrect and teacherday was measured incorrectly, then there would be errors-in-variables
bias and the estimated effect would be understated.
• (Wrong standard errors) The OLS assumption of i.i.d. sampling is violated here because
there are repeated observations within the same city. This will induce correlation of the error
term across observations within a city. For example, if Dayton, Ohio (one of the cities in the
data set) has a large number of incidents on a given Monday, then it might well have a large
number on the following Tuesday. If so, the formula for computing the standard errors is
incorrect. This problem does not produce bias in the OLS estimators but it means that
confidence intervals and test statistics will be wrong, leading to incorrect inferences in
general. (There is no reason you should know this: this explanation, while correct, has not yet
been covered in class.)
An increase in Beauty by one is associated with an increase in the Course Overall score of .275, holding
constant the effects of the instructor's gender, minority status, status as a native English speaker, whether
s/he is in a tenure-track position, and whether the course is a one-credit course.
2)
(5 points) Using regression (2), compute a 95% confidence interval for the population
coefficient on Beauty.
$\hat{\beta}_{Beauty} \pm 1.96\,SE(\hat{\beta}_{Beauty})$
3)
4)
(5 points) Professor Stock is male, not a minority, is a native English speaker, and is tenure
track. Ec1123 is not an introductory course, nor is it a one-credit elective. Suppose that
Professor Stock has average beauty, so his value of Beauty is zero. Use regression (2) to
compute the predicted course overall course evaluation score for Ec1123 this semester.
This is solved by substituting the values of the variables into the regression equation:
5)
(5 points) The professor in Ec1123 next semester is a tenure-track white male Australian.
Suppose he has a Beauty score of 1.66. Use regression (2) to compute a 95% confidence
interval for the difference between the Ec1123 Course Overall evaluation score next
semester and the Course Overall score this semester.
All the variables for Prof. Elliott are the same as for Prof. Stock, except that Beauty = 1.66; so the
difference is
Suppose you want to estimate a version of regression (2) in which the coefficients on all
regressors except Beauty are the same for men and women, however the effect of Beauty
can differ for men and women.
a) (4 points) Provide a regression specification that achieves this (be specific).
b) (2 points) In your specification in (a), how would you test the hypothesis that the effect
of Beauty is the same for men and women (be specific)?
2)
The coefficient on Beauty drops from .410 in regression (1) to .275 in regression (2).
a) (4 points) Explain why. What does this drop imply about the relation between Beauty
and One-credit course?
b) (4 points) Is your reason in (a) for this decline plausible in a real-world sense?
Explain.
a) Because the coefficient fell, there was omitted variable bias in regression (1), specifically the coefficient
on Beauty was in part reflecting the effect of OneCreditCourse. In regression (2), the effect of
OneCreditCourse is positive. This means that the correlation between OneCreditCourse and Beauty must
be positive: If they are positively correlated and OneCreditCourse is omitted, then Beauty will pick up the
(positive) effect of OneCreditCourse and the coefficient on Beauty will be larger without OneCreditCourse
in the regression, than with it in.
Alternatively, this positive correlation can be seen directly from the omitted variable bias formula:
$$\hat{\beta}_1 \xrightarrow{p} \beta_1 + \rho_{Xu}\,\frac{\sigma_u}{\sigma_X}$$
Because the estimator without OneCreditCourse is too large, the second term in the expression must be
positive, so $\rho_{Xu}$ is positive; and because OneCreditCourse enters the regression with a positive
coefficient, it enters u positively when it is omitted, so it must be that Beauty and OneCreditCourse are
positively correlated.
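The direction of the bias can be illustrated with a small simulation. This is a stylized sketch, not a reconstruction of the exam's data: the omitted regressor is treated as continuous rather than binary, and the values `beta_omitted = 0.6` and `link = 0.5` are hypothetical. Only the value .275 is taken from regression (2).

```python
import random

random.seed(0)
n = 20000
beta_beauty = 0.275   # coefficient on Beauty quoted from regression (2)
beta_omitted = 0.6    # hypothetical positive effect of the omitted regressor
link = 0.5            # hypothetical positive dependence of the omitted regressor on Beauty

beauty, y = [], []
for _ in range(n):
    b = random.gauss(0, 1)
    omitted = link * b + random.gauss(0, 1)   # positively correlated with Beauty
    beauty.append(b)
    y.append(beta_beauty * b + beta_omitted * omitted + random.gauss(0, 1))

def ols_slope(x, y):
    """Slope from a regression of y on x and a constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Regression that omits the second regressor: the slope absorbs part of its effect.
short_slope = ols_slope(beauty, y)
print(round(short_slope, 3))
```

In this setup the short-regression slope should be close to beta_beauty + beta_omitted * link = .575, that is, above the true .275, which mirrors the argument that a positive effect combined with a positive correlation inflates the coefficient when the regressor is omitted.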
b) For Beauty and OneCreditCourse to be positively correlated, it must be that instructors of one-credit
courses, like dance or yoga, are better looking than instructors of regular courses, like econometrics. This
doesn't seem plausible; if anything, I would think the correlation should be negative. Who would you
rather look at anyway, an economist or a dance instructor? The economist, I should think.
3)
a) For a variable to cause omitted variable bias, it must be (i) a determinant of Y (belong in the equation),
and (ii) be correlated with X.
(i) Yes: The amount of time the instructor spends preparing should be a determinant of course
quality and thus course evaluations
(ii) No: To the extent that diligence and Beauty are uncorrelated, Beauty will be uncorrelated with
the amount of time spent preparing.
Because (i) and (ii) are not both Yes, omission of the amount of class preparation does not result in
omitted variable bias.
Note: Here is a (full credit) argument that (ii) should be yes: Physically unappealing instructors have
learned over the years that they must make up for their bad looks and therefore work harder.
b) Addressing the two criteria:
Two full-credit answers to (i):
(i) No. It isn't obvious why marital status should affect teaching quality.
(i) Yes. The divorced might be less happy and this might be reflected in their teaching, either
negatively (bad attitude) or positively (escape into their work, which is teaching).
Three full-credit answers to (ii):
(ii) Yes. Demand for a spouse is increasing in Beauty so the beautiful are more likely to be
married, so Beauty and Married would be positively correlated.
(ii) Yes. Supply of spouses is decreasing in Beauty because the beautiful don't need to get married (more
non-marital opportunities) and have shorter spells, so Beauty and Married would be negatively
correlated.
(ii) No. There is assortative matching (the technical term), that is, movie stars marry movie stars,
economists marry people who look like economists, etc., and everyone gets a match despite their
looks, so Beauty and Married are uncorrelated.
Picking Yes to both yields OV bias; picking No to either (or both) yields no OV bias. Personally, I think
the most reasonable choice is No (the divorce effect probably is a very small part of teaching quality) and
No (the assortative matching argument).
Suppose you have data on years of teaching experience (Experience) of the instructor, and
you are considering choosing among three possible specifications:
(i) regression (2) plus Experience
a) (6 points) In your judgment (before you know the results of these regressions), which
specification, (i), (ii), or (iii), is the most appropriate? Explain.
b) (4 points) Suppose you estimated regressions for specifications (i) and (ii). How would
you decide, based on the empirical evidence, whether (i) or (ii) is more appropriate.
a) Quite plausibly, there is a decreasing marginal effect of experience: Instructors learn a lot in their first
few years, but after they are farther up the learning curve the incremental amount they learn, and
improve, in subsequent years is less. This means that (i) is a poor initial choice. Both (ii) and (iii) allow for
decreasing marginal effects. An argument for using (ii) is that it is more flexible than (iii) (more terms); for
example, it would fit an S-shaped learning curve but (iii) would not. An argument against (ii) is that it can
start to slope down at high years of experience (or maybe this is an argument in favor of (ii)???).
Arguments in favor of (iii) are that it is a simple starting point, and that it has a natural interpretation as
having a constant increase in Y for a percentage increase in Experience (but perhaps this doesn't seem
natural to you, in which case this is an argument against). Any reasoned opinion about whether (ii) or
(iii) is a better starting point, based on at least one correct, substantive difference between the two
functional forms, received full credit.
b) Compute the F-statistic testing the joint hypothesis that the coefficients on $Experience^2$ and
$Experience^3$ are both zero, against the alternative that one or the other or both is nonzero; reject if the
p-value is less than the desired significance level (or the F-statistic exceeds the appropriate critical value
from the $F_{2,\infty}$ distribution).
2)
3)
(5 points) Test (at the 5% significance level) the hypothesis that the effect on course
evaluations of Beauty is the same for men and for women, against the alternative that these
effects differ.
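This problem requires comparing the coefficient on Beauty in regression (5) for men to that in regression
(6) for women. Let $\hat{\beta}_{Beauty}^{men}$ and $\hat{\beta}_{Beauty}^{women}$ be the two estimators. The problem consists in computing the estimator
$$\hat{\Delta} = \hat{\beta}_{Beauty}^{men} - \hat{\beta}_{Beauty}^{women}$$
and in testing the null hypothesis that the population difference, $\Delta$, is zero. The test statistic is
$$t = \frac{\hat{\Delta}}{SE(\hat{\Delta})}.$$
The variance of the difference is
$$var(\hat{\Delta}) = var(\hat{\beta}_{Beauty}^{men}) + var(\hat{\beta}_{Beauty}^{women}) - 2\,cov(\hat{\beta}_{Beauty}^{men}, \hat{\beta}_{Beauty}^{women}).$$
Because the estimators $\hat{\beta}_{Beauty}^{men}$ and $\hat{\beta}_{Beauty}^{women}$ come from separate regressions (one on men, one on women), they are
independently distributed and therefore uncorrelated, so the covariance in the final expression is zero and
$$var(\hat{\Delta}) = var(\hat{\beta}_{Beauty}^{men}) + var(\hat{\beta}_{Beauty}^{women}).$$
These are population relations, and we need to estimate them. Because the standard error is the square
root of the estimated variance, $SE(\hat{\Delta})^2 = SE(\hat{\beta}_{Beauty}^{men})^2 + SE(\hat{\beta}_{Beauty}^{women})^2$. Thus,
$$SE(\hat{\Delta}) = \sqrt{SE(\hat{\beta}_{Beauty}^{men})^2 + SE(\hat{\beta}_{Beauty}^{women})^2}.$$
Substituting the empirical values in regressions (5) and (6) into this expression yields:
$$SE(\hat{\Delta}) = \sqrt{.076^2 + .064^2} = .099.$$
Therefore, $t = \hat{\Delta} / .099$.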
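The standard error of the difference between two independent estimators can be checked numerically. In this sketch only the two standard errors (.076 and .064) come from the solution; the function name `t_statistic` and its arguments are hypothetical, and the two Beauty coefficients are left as inputs because their values are not given in the text.

```python
import math

# Standard errors of the Beauty coefficients in regressions (5) and (6),
# as quoted in the solution.
se_first, se_second = 0.076, 0.064

# Independent estimators: variances add, so the covariance term drops out.
se_diff = math.sqrt(se_first ** 2 + se_second ** 2)
print(round(se_diff, 3))

def t_statistic(beta_men, beta_women):
    """t-statistic for the null hypothesis that the two population coefficients are equal."""
    return (beta_men - beta_women) / se_diff

def reject_at_5_percent(beta_men, beta_women):
    """Two-sided test using the large-sample critical value 1.96."""
    return abs(t_statistic(beta_men, beta_women)) > 1.96
```

Given the two estimated coefficients from the regression table, `reject_at_5_percent` returns whether the hypothesis of equal Beauty effects is rejected at the 5% level.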
(6 points) Suppose you have data on marital status of the instructor (the data record three
possibilities: single and never married, single and divorced, married). Provide a regression
specification that modifies (2) so as to control for marital status (be specific).
To regression (2), add the variables $SNM_i$ and $M_i$, where $SNM_i$ = 1 if the instructor is single and never
married and = 0 otherwise, and $M_i$ = 1 if the instructor is married and = 0 otherwise. In this specification,
the third possible binary variable, $SD_i$ (= 1 if single and divorced, = 0 otherwise) must be excluded, else
you will have perfect multicollinearity because $SNM_i + M_i + SD_i = 1 = X_{0i}$, where $X_{0i}$ is the constant
regressor.
Other specifications are possible, e.g. including only $SNM_i$ and $SD_i$, or including all three indicators but
omitting the intercept (dropping $X_{0i}$). But a specification including $SNM_i$, $SD_i$, $M_i$, and an intercept is
incorrect: it will have perfect multicollinearity.
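The dummy variable trap described above can be seen directly by building the three indicators for a few observations. The sample of marital statuses below is hypothetical and exists only to illustrate the identity.

```python
# Hypothetical sample of instructors' marital status (illustration only).
statuses = ["single_never_married", "married", "single_divorced",
            "married", "single_never_married"]

# One binary indicator per category.
snm = [1 if s == "single_never_married" else 0 for s in statuses]
m = [1 if s == "married" else 0 for s in statuses]
sd = [1 if s == "single_divorced" else 0 for s in statuses]

# For every observation the three indicators sum to 1, the value of the
# constant regressor, so all three plus an intercept are perfectly collinear.
row_sums = [a + b + c for a, b, c in zip(snm, m, sd)]
print(row_sums)
```

Dropping any one indicator (or the intercept) breaks the exact linear relation, which is why the specification with two indicators and a constant is valid.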
2)
(8 points) Based on the facts given in the following statement and on the empirical results
presented in Table 2, in your judgment is the conclusion in the following statement justified
or not? Explain.
Regression (2) does not control for innate teaching ability. To do so, I obtained data
on the instructor's average teaching evaluations in the previous year and added it to
regression (2). The coefficient on Beauty fell to .051 and was not statistically
significant (SE = .079). Therefore I conclude that the Beauty coefficient in regression
(2) is subject to omitted variable bias and that the true causal effect on course
evaluations of Beauty is effectively zero.
Taking the facts stated at face value, the relevant question is the interpretation of these facts, that is, do
they imply (as the statement says) that the true causal effect on CourseOverall of Beauty is effectively
zero. To evaluate this statement, one must think through what is actually being estimated in this
regression. The coefficient on Beauty in this regression is the effect of a unit change in Beauty, holding
constant last year's evaluations and the other variables in (2). Because last year's evaluations are being
held constant, the effect of Beauty is its effect on the change in the course evaluations from last year to
this year. Thus the regression asks the question, do more attractive individuals have a greater
improvement, on average, in their course evaluations, than less attractive individuals, holding the other
regressors in (2) constant? This is a very different question than the original, and it makes sense that the
answer to this alternative question is no. More attractive instructors might have higher evaluations, but
those evaluations do not continue to get better and better because they are more attractive; rather, they
just stay (on average) at their originally elevated level. So the conclusion is incorrect, because the
regression is answering a different question than the original question of interest.
A second, related way to say this is that Beauty affects past evaluations, so past evaluations are one
of the channels by which Beauty has an effect; holding past evaluations constant and focusing on the
coefficient on Beauty is not the right concept, because holding past evaluations constant in part
holds constant the effect of Beauty through this channel.
might not be important nonlinearities, however this could be (and should be) investigated further, e.g. by
putting in quadratics or cubics. Given the insignificance of the $Beauty \times D_{Beauty>0}$ term, it seems that this is
probably unlikely to change the results in an important way.
3. Measurement error in the regressors. The Beauty variable is subjectively measured so that it will
have measurement error. This is plausibly a case in which the measurement error is more or less
random, reflecting the tastes of the six panelists. If so, then the classical measurement error model, in
which the measured variable is the true value plus random noise, would apply. But this model implies that
the coefficient is biased down so the actual effect of Beauty would be greater than is implied by the OLS
coefficient. This suggests that the regressions in Table 2 understate the effect of Beauty.
4. Sample selection bias. The only information given in this exam about the sample selection method is
that the instructors have their photos on their Web site. Suppose instructors who get evaluations below
3.5 are so embarrassed that they don't put up their photos, and suppose there is a large effect of Beauty.
Then, of the least attractive instructors, the only ones that will put up their photos are those with particular
teaching talent and commitment, sufficient to overcome their physical appearance. Thus the effect of
physical appearance will be attenuated because the error term will be correlated with Beauty (low values
of Beauty mean there must be a large value of u, else the photo wouldn't be posted). This story, while
logically possible, seems a bit far-fetched, and whether an instructor puts up his or her photo is more likely
to be a matter of departmental policy, whether the department has a helpful webmaster and someone to
take their photo, etc. So sample selection bias does not seem (in my judgment) to be a potentially major
threat.
5. Simultaneous causality bias. There is an interesting possible channel of simultaneous causality, in
which good course evaluations improve an instructor's self-image, which in turn means they have a more
resonant, open, and appealing appearance and thus get a higher grade on Beauty. Against this, the
panelists were looking at the Web photos, not their conduct in class, and were instructed to focus on
physical features. So for the Beauty variable as measured, this effect is plausibly small.
External Validity
The question of external validity is whether the results for UT-Austin in 2000-2002 can be generalized to
Harvard in 2005. The years are close, so the question must focus on differences between students and
the instructional setting.
1. Are UT-Austin students like Harvard students? In their answers to this question, some in the class
suggested that Harvard students seek truth through intellectual arguments, that beauty matters, but it is
the beauty of the mind (unlike, it seems, students in Austin, for whom beauty is skin-deep). Perhaps.
2. Do the methods of instruction differ? For example, if beauty matters more in small classes (where you
can see the instructor better) and if the distribution of class size at UT-Austin and Harvard were
substantially different, then this would be a threat to external validity. (The distributions of class sizes
at the two are not so very different that this threat is important: both have large, mainly
introductory, courses, and small, mainly upper-level, courses.)
3. The Course Overall score is just a student evaluation, not a measure of what students actually learned
or how valuable the course was; perhaps an assessment of the value of the course, five years hence,
would produce a very different effect of Beauty, and that is arguably a more important outcome than the
end-of-semester evaluation. If the committee is interested in long-term learning or having students that,
with the experience of age, look back on their courses as meaningful, then the Course Overall results
might differ from a longer term retrospective. (Note: This is a threat to external validity because it
concerns the dependent variable; the measurement error threat to internal validity concerns the
regressors.)
Note that both of these concerns could be addressed by performing a similar study using data from
Harvard and, for example, Stanford, Yale, and Princeton. This isn't feasible in the time frame of the
question but would be feasible if this policy issue were to be taken seriously.
Policy advice
As an econometric consultant, the question is whether this represents an internally and externally valid
estimate of the causal effect of Beauty, or whether the threats to internal and/or external validity are
sufficiently severe that the results should be dismissed as unreliable for the purposes of the FAS
committee. A correct conclusion is one that follows logically from the systematic discussion of internal and
external validity.
One subtle issue concerns the issue raised in the OV discussion above about the omission of Engaging
Presence: if attractive people have a lifetime of experience being the center of attention, perhaps they
get better teaching evaluations because they have become good at being the center of attention, not
because they are beautiful. The reason this is subtle is that the import of this criticism depends on the
policy under question. If the FAS committee were considering a policy of free plastic surgery for its least
attractive faculty members, then this criticism would be key: perhaps it is too late, their lifetime of being
unattractive has made them uncomfortable as a center of attention and changing physical appearance
won't change their lifetime of experiences. But that is not the policy under discussion: in the policy under
discussion, if you hire a beautiful person, you hire someone with a lifetime of experience being beautiful,
so it actually doesn't matter whether it is the beauty per se or the acquired comfort and enjoyment of being
a performer: both come as part of the package that the FAS committee is considering buying.
Taking the foregoing concerns into account, my own conclusion is that I would be surprised if the threats
to internal and external validity above are sufficiently important, in a quantitative sense, to change the
main finding from Table 2 that the effect on Course Overall of Beauty is positive and quantitatively large.
Moreover, the discussion of internal and external validity and the observations in the previous paragraph
suggest that this is a causal effect in the sense of interest to the committee, that is, changing the
composition of faculty to be more attractive would increase teaching evaluations (even though the channel
might not necessarily be through the obvious aspect of simply the instructor's physical appearance). So
my advice, as econometric consultant, would be that implementing a policy of affirmative action for
attractive people (all else equal, hire the better-looking) would, in expectation, improve Course Overall
scores.
A good econometric policy advisor always has some suggestions for further research (which s/he would
be happy to do, for a fee). One thing a follow-on study could do is focus on Ivy League institutions, and
collect data on some potential omitted variables (marital status, department offering the course, etc.).
Another, very different study would be to do a randomized controlled experiment that would get directly at
the policy question. Some department heads would be instructed to assign their most attractive teachers
to the largest introductory courses (treatment group), others would be instructed to maintain the status quo
(control group). The study would assess whether there is an improvement in evaluation scores (weighted
by class size) in the treatment group. A positive result would indicate that this treatment results in an
increase in customer satisfaction.
Finally, some thoughts that were out of bounds for this exam, but would be relevant and important to raise
in the report of an econometric consultant to the FAS committee. First, academic output is not solely
teaching, and there is no reason at all that the results here would carry over to an analysis of research
output, or even graduate student advising and teaching (the data are only for undergrad courses); indeed,
the sign might be the opposite for research. Second, the econometric consultant could raise the question
of whether Beauty has the same moral status as gender or race, even if it does not have the same legal
status as a legally protected class; answering this question is outside the econometric consultant's area of
expertise, but it is a legitimate question to raise and to frame so that others can address it.
Holding constant the student's age and gender, being in a fraternity or a sorority is associated with an
increase of 1.87 days of binge drinking (out of 30 days). (Different wording: The effect on binge30 [the
number of binge-drinking days out of 30 days] of being in a fraternity is 1.87, controlling for the student's
age and gender.)
2)
(5 points) Explain why the coefficient on Greek decreases from regression (1) to regression
(2).
The coefficient on Greek in regression (1) evidently has omitted variable bias. Students on a sports team
do more binge drinking (the coefficient on sports is positive), and being on a sports team seems to be
positively correlated with Greek (this is also common sense). That is, sports is a positive determinant of
binge30, and is positively correlated with Greek, so when sports is omitted in (1) Greek must capture both
effects (the frat effect, and the sports effect); thus the coefficient on Greek is overstated in (1) and falls
when sports is included in the regression.
3)
(5 points) Define heteroskedasticity and suggest a reason why the error in regression (3)
might be heteroskedastic.
Heteroskedasticity occurs when the variance of the error term depends on one of the regressors. If, for
example, the dispersion (or variance) of the number of binge drinking days is larger at a frat than for dorm
residents, the variance of the error would depend on Greek.
4)
(5 points) Using regression (3), predict the number of binge-drinking days in a 30-day
period for an 18-year old white male Freshman who belongs to a fraternity and is on a
sports team.
$$\widehat{binge30} = .91 + 1.48 \times 1 - .96 \times 0 + .09 \times 18 + 1.15 \times 1 + .35 \times 1 + .00 \times 0 + .22 \times 0 - 2.08 \times 0 - 1.54 \times 0$$
$$= .91 + 1.48 + .09 \times 18 + 1.15 + .35 = 5.51 \text{ binge drinking days (per 30 days)}$$
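The arithmetic can be verified with a short sketch. The list pairs each coefficient with the regressor value in the order in which the terms appear in the solution; the variable names are not given there, so none are assigned. The negative signs on .96, 2.08, and 1.54 are an assumption; since those terms are multiplied by zero, the result does not depend on them.

```python
# (coefficient, regressor value) pairs in the order used in the solution.
# Signs of -0.96, -2.08 and -1.54 are assumed; each multiplies a zero.
terms = [
    (0.91, 1),    # intercept
    (1.48, 1),
    (-0.96, 0),
    (0.09, 18),   # age = 18
    (1.15, 1),
    (0.35, 1),
    (0.00, 0),
    (0.22, 0),
    (-2.08, 0),
    (-1.54, 0),
]

predicted_binge30 = sum(coef * value for coef, value in terms)
print(round(predicted_binge30, 2))
```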
5)
(5 points) All the respondents are either Freshmen, Sophomores, Juniors, or Seniors, yet
Freshman, Sophomore, Junior, age, and the constant regressor (the intercept) are not
perfectly multicollinear in regression (3). Describe a counterfactual situation in which
these variables would be perfectly multicollinear.
If all Freshmen were 18, all Sophomores 19, all Juniors 20, and all Seniors 21, then Freshman,
Sophomore, Junior, age, and the constant regressor (1) would be perfectly multicollinear:
$$18 \times Freshman + 19 \times Sophomore + 20 \times Junior + 21 \times (1 - Freshman - Sophomore - Junior) = age$$
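The identity can be checked for each class year. This is a minimal sketch of the counterfactual only; the function name `age_from_dummies` is hypothetical.

```python
def age_from_dummies(freshman, sophomore, junior):
    """Age implied by the class-year indicators in the counterfactual world
    where every Freshman is 18, Sophomore 19, Junior 20, and Senior 21."""
    senior = 1 - freshman - sophomore - junior   # constant minus the three dummies
    return 18 * freshman + 19 * sophomore + 20 * junior + 21 * senior

# One call per class year: Freshman, Sophomore, Junior, Senior.
ages = [age_from_dummies(1, 0, 0), age_from_dummies(0, 1, 0),
        age_from_dummies(0, 0, 1), age_from_dummies(0, 0, 0)]
print(ages)
```

Since age would then be an exact linear function of the constant and the three indicators, the regressors would be perfectly multicollinear.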
(6 points) Consider two white male frat-member non-sports Sophomores, one of whom is
18 years old and the other is 20 years old. Using regression (5):
a) (3 points) Compute the difference in the predicted values of binge30 for these two
students;
Δ(predicted binge30) = β̂_age × Δage = β̂_age × 2 = .09×2 = .18 binge drinking days (per 30 days)
b) (3 points) Compute a 95% confidence interval for the difference in part (a).
SE(Δ predicted binge30) = SE(β̂_age × 2) = 2×SE(β̂_age) = 2×.10 = .20, so the 95% confidence interval for Δ(predicted binge30)
is .18 ± 1.96×.20 = [−.21, .57]
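The same arithmetic as a short sketch; the helper name `confidence_interval_95` is an illustrative assumption.

```python
def confidence_interval_95(estimate, standard_error):
    """Large-sample 95% interval: estimate +/- 1.96 * SE."""
    half_width = 1.96 * standard_error
    return estimate - half_width, estimate + half_width

BETA_AGE = 0.09      # coefficient on age quoted in the answer
SE_BETA_AGE = 0.10   # its standard error quoted in the answer
AGE_GAP = 2          # 20-year-old versus 18-year-old

difference = BETA_AGE * AGE_GAP          # predicted difference in binge30
se_difference = AGE_GAP * SE_BETA_AGE    # scaling by a constant scales the SE

lower, upper = confidence_interval_95(difference, se_difference)
print(round(lower, 2), round(upper, 2))  # -0.21 0.57
```

The interval contains zero, so the age difference is not statistically significant at the 5% level.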
2)
(5 points) Use regression (4) to test the null hypothesis that the relationship between age
and binge drinking is linear, against the alternative hypothesis that the relationship is
possibly a quadratic, at the 5% significance level. Is the null hypothesis rejected?
The null hypothesis that the relationship is linear corresponds to the coefficient on age² being zero; the
alternative that the relationship is quadratic corresponds to this coefficient being nonzero. Thus the
hypothesis can be tested using the t-statistic testing β_age² = 0. That t-statistic is t = −.081/.062 = −1.31,
which is less than 1.96 in absolute value, so the hypothesis is not rejected at the 5% significance level.
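A minimal sketch of this t-test, with the coefficient and standard error taken from the answer; the function name is an illustrative assumption.

```python
CRITICAL_VALUE_5PCT = 1.96  # two-sided 5% critical value, standard normal

def t_statistic(coefficient, standard_error, null_value=0.0):
    """t-statistic for H0: coefficient = null_value."""
    return (coefficient - null_value) / standard_error

t_age_squared = t_statistic(-0.081, 0.062)
reject_linearity = abs(t_age_squared) > CRITICAL_VALUE_5PCT

print(round(t_age_squared, 2), reject_linearity)  # -1.31 False
```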
3)
(5 points) Suppose you hypothesized that female athletes are not prone to binge drinking,
even though male athletes might be. How would you modify regression (3) to test this
hypothesis? Be precise.
You want to allow the possibility that the effect of sports is nonzero for men and zero for women. This can
be achieved by defining the variable male = 1 − female and creating the interaction variable male×sports,
then including it in regression (3). The relevant part of the regression thus would be,
binge30 = … + β4×sports + β5×male×sports + …
so the effect of sports for women is β4 and the effect for men is β4 + β5. Thus the hypothesis that the effect
for women is zero (but not necessarily so for men) can be tested by testing β4 = 0.
Alternatively, if you use the interaction female×sports, the relevant part of the regression would be,
binge30 = … + β4×sports + β5×female×sports + …
so the effect of sports on binge30 for women in this case is β4 + β5, so the test would be of the hypothesis
that β4 + β5 = 0.
4)
(5 points) The p-value is missing in Table 2 for one of the F-tests based on regression (6).
Estimate this missing p-value and briefly explain how you did so.
The F-statistic is 3.45 and there are q = 2 restrictions being tested (the coefficients on Black and
Hispanic/other), so the critical value is obtained from the F_{q,∞} distribution with q = 2. From the table
attached to the exam, the 5% critical value is 3.00 and the 1% critical value is 4.61. Because 3.45 is
somewhat larger than 3.00, a reasonable guess is that p ≈ .04 or p ≈ .03. [The actual p-value is .032.]
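For q = 2 the p-value also has a closed form, which gives a way to check the guess without a table: F_{2,∞} is a chi-squared variable with 2 degrees of freedom divided by 2, and that chi-squared variable exceeds x with probability exp(−x/2), so the p-value for an F-statistic f is exp(−f). The sketch below uses this shortcut; the function name is an illustrative assumption, and the shortcut applies only to q = 2.

```python
import math

def p_value_f_two_restrictions(f_statistic):
    """P(F_{2,inf} > f) = P(chi2_2 > 2f) = exp(-f). Valid only for q = 2."""
    return math.exp(-f_statistic)

p_missing = p_value_f_two_restrictions(3.45)
print(round(p_missing, 3))  # 0.032

# The tabulated critical values reproduce their significance levels.
print(round(p_value_f_two_restrictions(3.00), 3))  # 0.05
print(round(p_value_f_two_restrictions(4.61), 3))  # 0.01
```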
5)
The two regressions differ only by the inclusion of alcohol30 in regression (6). In regression (3), the
coefficient on Greek is the effect on binge drinking of being in a fraternity (holding constant the other
regressors in (3)), while in regression (6) the coefficient on Greek is the effect on binge drinking of being in
a fraternity, holding constant the other regressors in (3) and the number of days out of 30 which you drink
alcohol. Thus, regression (3) measures the effect of Greek on the number of binge drinking days, whereas
regression (6) measures the effect of Greek on the number of binge drinking days, holding constant the total number of days
drinking. Thus regression (3) examines the total quantity of binge drinking; regression (6) examines the
fraction of drinking that is binge drinking.
One can interpret regression (6) by examining the effect of the coefficient on Greek at the mean values in
the data. The mean value of binge30 is 2.35 and the mean value of alcohol30 is 5.12, so evaluated at the
means 2.35/5.12 = 46% of drinking episodes are binge drinking. The estimated effect of Greek in (6) is to
increase binge30 by .37, holding constant the total number of drinking days, so (evaluated at the mean)
Greek increases the fraction of binge drinking days, out of total drinking days, by .37/5.12 = 7%, a modest
(but statistically significant) increase relative to the overall ratio of averages of 46%.
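The evaluation at the means is two divisions. A short sketch, with illustrative variable names:

```python
MEAN_BINGE30 = 2.35      # mean binge drinking days per 30 days
MEAN_ALCOHOL30 = 5.12    # mean drinking days per 30 days
GREEK_COEF_REG6 = 0.37   # coefficient on Greek in regression (6)

# Share of drinking days that are binge days, evaluated at the means.
baseline_share = MEAN_BINGE30 / MEAN_ALCOHOL30

# Change in that share from Greek, holding total drinking days at the mean.
greek_increase = GREEK_COEF_REG6 / MEAN_ALCOHOL30

print(round(baseline_share, 2), round(greek_increase, 2))  # 0.46 0.07
```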
Binge drinking is a problem that primarily involves only a segment of the student
population.
To a considerable extent this is true. For example, on the one hand the predicted value of binge drinking
for the 18-year-old Freshman sports-playing fraternity member in Part 1 question 4 is 5.51 days out of 30
days, or more than one day per week. On the other hand, a 19-year-old Sophomore non-frat, non-sports Hispanic female has a predicted value of −0.88 binge drinking days:
Predicted binge30 = .91 + 1.48×0 − .96×1 + .09×19 + 1.15×0 + .35×0 + .00×1 + .22×0 − 2.08×0 − 1.54×1
= .91 − 1.96 + .09×19 + .00 − 1.54 = −0.88 binge drinking days (per 30 days)
This negative value does not make sense in context but it does indicate that the binge drinking rate for
some segments of the population is nearly zero, much less than for other segments (male sports-playing
Greeks).
One need not go through this predicted value exercise to see that the average binge drinking rates differ
sharply across groups. The overall mean binge drinking rate is 2.35 days per 30 days, and the coefficient
on black in (3) is -2.02, so this alone suggests that the mean binge drinking rate for nonGreek blacks is
very much lower than the all-student average.
2)
Sororities are just as bad as fraternities, at least from the perspective of binge drinking.
False. Regression (5) addresses this by including the interaction between Greek and female. The effect
of Greek for men is 2.69 (holding constant the other regressors), while the effect of Greek for women is
2.69 − 2.06 = 0.63, far less in real-world terms. The difference is statistically significant at the 1% level (t =
−2.06/.66 = −3.12), so the hypothesis that binge drinking rates at fraternities and sororities are the same is
rejected at the 1% level, with the rate at sororities far less than at fraternities.
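A sketch of how the group-specific effects follow from the interaction specification, using the regression (5) numbers quoted in the answer. The names are illustrative; 2.58 is the usual two-sided 1% normal critical value.

```python
GREEK_COEF = 2.69            # coefficient on Greek in regression (5)
GREEK_X_FEMALE_COEF = -2.06  # coefficient on Greek x female
GREEK_X_FEMALE_SE = 0.66     # its standard error
CRITICAL_VALUE_1PCT = 2.58   # two-sided 1% critical value, standard normal

def greek_effect(female):
    """Effect of Greek membership for men (female=0) or women (female=1)."""
    return GREEK_COEF + GREEK_X_FEMALE_COEF * female

fraternity_effect = greek_effect(0)
sorority_effect = greek_effect(1)

# The two effects are equal only if the interaction coefficient is zero.
t_interaction = GREEK_X_FEMALE_COEF / GREEK_X_FEMALE_SE
reject_equal_effects = abs(t_interaction) > CRITICAL_VALUE_1PCT

print(round(sorority_effect, 2), round(t_interaction, 2), reject_equal_effects)
# 0.63 -3.12 True
```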
3)
Freshmen, who are learning how to cope with the new freedoms of college, have the
highest incidence of binge drinking; as students gain college experience, binge drinking
becomes much less of a problem.
Depending on what regression you look at, there is some "growing up" effect (lower binge drinking rates
for older classes), but the quantitative effect is fairly small. In all regressions (3) through (6), the binge drinking
rate is between .35 and .76 days (per 30 days) greater for Freshmen than for Seniors, and in all these
regressions the rate for Freshmen exceeds the rate for other classes. On the other hand, in regressions
(3) through (5) these differences between classes are not statistically significant (either the coefficient on
Freshman alone, or jointly with an F-statistic testing the coefficients on Freshman, Sophomore, and Junior
all being zero), so in those specifications the hypothesis of no difference among classes cannot be
rejected.
In regression (6), the class binary variables have jointly significant coefficients and the coefficient on
Freshman is individually statistically significant (at the 1% level), so in that regression there is evidence of
a statistically significant "growing up" effect. Bear in mind, however, that regression (6) has a different
interpretation: it considers the number of binge drinking days, given how many drinking days you have,
that is, the fraction of drinking days that are binge-drinking days (this fraction is higher for Freshmen than
for other classes). Taking regressions (3) through (6) together, Freshmen have more binge-drinking episodes,
other things constant, and those binge-drinking episodes constitute a higher fraction of their drinking days;
this is consistent with the "growing up" hypothesis. However, the first part of this summary (relying on (3)
through (5)) is not strongly supported by the results because the effects in question have wide confidence
intervals that include zero.
(8 points) Summarize the results in Table 2 about the effect on binge drinking of fraternity
and sorority membership. For the purpose of this question, take the results in the table at
face value, that is, do not consider threats to the validity of these results.
Taken at face value, the results indicate that fraternity membership is associated with a statistically
significant and large (in a real-world sense) increase in the binge drinking rate, holding constant other
student characteristics. Based on regression (5), fraternity membership is associated with an increase in
the binge drinking rate of nearly 3 days per month on average (t = 4.80), holding constant other student
characteristics. Moreover, regression (6) tells us that when students drink, fraternity membership makes it
more likely for that drinking to be binge drinking.
In contrast, the results in the table, regression (5) in particular, indicate that sorority membership has much
less effect on binge drinking (the hypothesis that it has no effect cannot be tested using only the results
provided in the table), increasing binge drinking by only two-thirds of a day per month.
In addition, the results in regression (6) indicate that a greater fraction of the drinking days are binge
drinking days for Greek students than for non-Greek students. However, as a fraction of days, this
difference is small. Thus, the major channel of increasing binge drinking that can be deduced from these
results is that total drinking goes up and, with it, binge drinking, although the fraction of binge drinking out
of total drinking also goes up slightly.
2)
(10 points) Provide two threats that, in your judgment, are the most important threats to the
internal validity of the results discussed in your response to Part 4/Question 1 (be specific
and explain your reasoning).
One major threat is omitted variable bias, associated with unobserved student characteristics. Suppose
students who want a "party life" will drink substantially wherever they reside, and fraternities simply
provide an opportunity for these students to live with like-minded fellow partiers. Then the causal effect of
being in a fraternity could be small or even zero: Greek is positively correlated with a third variable ("party
animal") which is an unobserved individual characteristic and is a determinant of binge30, so the
coefficient on Greek in regressions (1) through (5) is biased up. This omitted variable bias could also be present
in regression (6), but the interpretation is more subtle: it is not just the amount of drinking but the nature of
the drinking (binge v. non-binge) that is being studied in regression (6), so there the omitted personal
characteristic would be one that is associated not with the student's desire for using alcohol, but for using
it in a binge-drinking way.
Note: you could alternatively think of this channel as reverse causality: being a binge drinker causes you
to join a fraternity. This would not be incorrect but it is probably not as helpful as thinking that there is a
third variable, "party animal," that (for some) leads to both binge drinking and joining a fraternity.
Mathematically these are very closely related and both lead to a violation of the first least squares
assumption and upwards bias in the OLS estimator of the coefficient on Greek.
A second threat arises from these data being self-reported, raising questions about the veracity of the
binge30 response. Note that this is not an issue of errors-in-variables bias associated with
mismeasurement of the X's: it is plausible that the students accurately reported their age, gender, college
year, race, and sports teams. Instead, this is best thought of as omitted variable bias, where the omitted
variable is "respondent exaggeration." If frat respondents systematically exaggerated their drinking
exploits but other respondents did not, then Greek would be correlated with the omitted variable
"respondent exaggeration," so the coefficient on Greek would reflect not a true effect of being in a frat but
instead would just measure the fact that frat members exaggerated more than others. Of course, the
direction of the bias could go the other way, if frat members tried to hide their binge drinking and
systematically underreported, relative to non-frat members.
A third threat is possible sample selection because of the 65% response rate. For sample selection to
occur, the sample must be selected by a mechanism that is related to the dependent variable. For
example, if heavy binge drinkers were too busy drinking to fill out the survey, then there would be sample
selection bias (the selection mechanism, being busy drinking, is related to the dependent variable, binge
drinking).
3)
(7 points) Consider the concerned college administrator of the introduction, who would like
to ban the Greek system and replace it with dorms or off-campus housing. All things
considered, do the results in Table 2 support this recommendation? Specifically, why or
why not?
First, as pointed out in the response to Part 4/question 1, binge drinking is something associated with
fraternities, not sororities, so banning the Greek system is too wide a policy; even taking the results at face
value, the ban (if justified by reducing binge drinking) should be applied to fraternities only. This said, the
results, taken at face value, would be evidence in favor of a ban on fraternities.
Second, for the ban-fraternity policy to be justified, the results must be internally and externally valid, that
is, the coefficient must be an unbiased estimate of the causal effect, and it must be possible to apply and
exploit that causal effect in the way envisioned by the policy.
Concerning internal validity, the criticism raised in response to Part 4/question 2 (that party animals join
fraternities, instead of fraternities producing party animals) is a serious one that is not adequately
addressed by this study. So, on the count of internal validity, there is a good reason to doubt that this is
an unbiased estimator of a causal effect, so one should hesitate to make policy conclusions based on
these results.
Concerning external validity, as stated the policy suggests that simply banning fraternities would eliminate
the environment exerting a bad influence on the students. Clearly, simply taking the Greek letters off the
front of the buildings but leaving everything else unchanged would have little effect, although that would
possibly count in a narrow sense as moving from Greek = 1 to Greek = 0. Similarly, closing down frat
living arrangements could in many cases simply move the members off campus to a similar situation,
except that they would not be under college jurisdiction. In this sense, the coefficient on Greek (ignoring
internal validity concerns) needs to be interpreted as eliminating not just fraternities in a narrow sense but
the social environment of fraternities for such a recommendation to be supported.
Taken together, these caveats indicate that the regression results, at least as stated in Table 2, are not by
themselves a suitable basis for prescribing policy.
This said, I would not dismiss the results entirely; instead, a more nuanced interpretation is that while they
are not definitive, they do quantify a strong and large link between fraternities and binge drinking.
This link is part of conventional wisdom on campus, but the study and the analysis confirm and,
importantly, quantify the conventional wisdom (bear in mind that it could have overturned the
conventional wisdom, but it did not).
The next step in this research program is to try to measure in a more convincing way the causal effect on
binge drinking of fraternity membership. Barring a randomized controlled experiment, the methods that
are available to try to do this are natural experiments and instrumental variables regression, topics we will
take up in the second half of the semester.
1) (3 points) Using regression (2), construct a 95% confidence interval for the effect on the
corruption rate of an increase in LowEd share of .01 (that is, of a 1 percentage point increase
in the percent of the adult population with at most a high school degree).
Δ(LowEd share) = .01, so the predicted change in the corruption rate is .01 × 18.4 = .184 and
the standard error of this predicted change is .01 × 8.7 = .087, so the 95% confidence interval is
.184 ± 1.96 × .087 = .184 ± .171 = (.013, .355).
2) Consider regression (3):
(a) (3 points) Test the hypothesis that the population coefficient on LowEd share × Voting
share is zero, against the alternative that it is nonzero.
The t-statistic is t = 47.7/94.8 = 0.50, which is < 1.96, so we do not reject the null at the 5%
significance level.
(b) (3 points) Test the hypothesis that citizen participation, specifically the presidential
voting share, does not affect corruption, against the alternative that the voting share
affects corruption.
Under the null hypothesis, Voting share does not enter regression (3), which means that the
coefficients on Voting share and LowEd share × Voting share must both be zero. The F-statistic
testing this hypothesis is .52 with p = .60 > .05, so we do not reject the null hypothesis at the 5%
significance level.
3) Do you agree or disagree with the following statements? Explain (3 points each).
(a) Because immigrants are less knowledgeable about the U.S. legal system, they are more
susceptible to governmental corruption. The regression results in Table 1 show that this
is true: more foreign-born citizens, more corruption.
Disagree. The coefficient on Foreign-born share is positive, so the sign of the estimated
coefficient indicates that more foreign-born citizens is associated with more corruption. However,
the t-statistic is 21.3/14.3 = 1.49 < 1.645, so the coefficient is not significant at the 10% level and
there is no statistically significant support for this claim at conventional levels of significance.
(b) The R² of regression (2) is low. Thus there are important determinants of corruption
omitted, and therefore the coefficient on LowEd share in regression (2) is biased because
of omitted variable bias.
Disagree. The low R² indicates that there might be determinants of corruption omitted from the
regression, but that alone does not mean there is omitted variable bias. For omitted variable
bias to exist, the omitted variables (1) need to be determinants of Y and (2) need to be correlated
with the included regressor(s). The low R² indicates that (1) is probably true (although not
necessarily: the error term could just be measurement error), but the R² is silent on point (2).
(c) The regression results in Table 1 are flawed because they use heteroskedasticity-robust
standard errors: if the errors really are homoskedastic, then these standard errors will be
incorrect. The table should instead report standard errors that are correct even under
homoskedasticity.
Disagree. Heteroskedasticity-robust standard errors are valid whether the errors are
heteroskedastic or homoskedastic.
4) Suppose that high levels of corruption result in low-quality public institutions, including low-quality schools, which in turn result in lower levels of education.
(a) (3 points) If so, what are the implications for the estimated effect on corruption of
education in Table 1? Briefly explain.
This means that there is simultaneous causality: low education causes corruption and vice versa.
As a result the OLS estimator is biased.
(b) Consider the following potential instrumental variables for LowEd share in regression
(3):
(i) Newspapers = average number of newspapers per capita in 1990
(ii) Alphabet = 1 if the state falls in the first half of the alphabet, = 0 otherwise (e.g. = 1
for Alabama, = 0 for Wyoming)
(2 points each) For each proposed instrument, is the variable arguably a valid instrument
variable? Briefly explain.
The two conditions for a valid instrument Z are (1) it is relevant, i.e. Z is correlated with X, and
(2) it is exogenous, i.e. Z is uncorrelated with the error term. So:
(i) Newspapers: (1) relevance: maybe; higher levels of education might mean more newspaper
readership; (2) exogeneity: probably not; one hopes that more newspapers would help to
uncover corruption and thereby limit corruption.
(ii) Alphabet: (1) relevance: no; there is no reason for alphabetical listing to be correlated with levels of
education. (2) exogeneity: yes; there is no reason for alphabetical listing to be correlated with
anything!
(1)
(2)
(3)
(4)
(5)
(6)
29.4
(11.7)
131.0
(114.4)
32.9
(12.8)
32.5
(10.2)
54.8
(36.4)
35.4
(11.4)
1.3
(2.8)
22.4
(14.4)
-.43
(.45)
14.4
(8.2)
18.4
(18.4)
69.3
(48.9)
-2.20
(1.97)
80.4
(73.3)
1.9
(2.9)
24.0
(14.7)
-.49
(.49)
16.6
(18.8)
HS1928
LnInc1940
2.7
(5.6)
12.6
(14.8)
.18
(.54)
32.1
(24.1)
-28.5
(10.7)
LnInc1940
19.0
0.7
19.7
2.6
50
50
HS1928,
LnInc1940
10.6
3.95
(p = .047)
50
-.4
(2.5)
7.0
(9.4)
.34
(.34)
17.4
(7.2)
-22.2
(6.3)
HS1928
50
50
-.1
(2.5)
7.7
(9.5)
-.32
(.35)
19.2
(7.8)
-23.0
(6.1)
HS1928,
LnInc1940
11.3
0.48
(p = .487)
50
Manufacturing share
Instrumental variables
First-stage F-statistic*
J-test of overidentifying
restrictions
N
1) (15 points) From the regressions in Table 2, select one or more preferred regressions that
you believe provide the most reliable basis for inference about the effect of low education
levels on corruption. Carefully explain your reasoning.
The regressions differ in the instruments that are used, and in whether the manufacturing share
is included as a regressor. The instruments should be selected based on relevance and
exogeneity. Relevance is measured by the first-stage F-statistic, which should exceed 10 for the
two stage least squares results to be statistically reliable. Applying this criterion, we are left
with regressions (1), (3), (4), and (6). When the coefficient is overidentified (here, having at
least two instruments), the hypothesis that both instruments are exogenous can be tested using
the J-statistic. The null of exogeneity is rejected in regression (3) but not regression (6). This
leaves us with (1), (4), and (6).
To make a further distinction, we must exercise judgment about the specifications. Are the
instruments arguably exogenous based on our judgment? It seems like they should be; after all,
they measure conditions in the distant past and in this sense they should not be proximate
determinants of corruption in the 1990s. On the other hand, if corruption is related to overall
state values and culture that vary slowly over time, it is possible that these instruments still
could be correlated with these slowly-varying omitted variables. This suggests that it is sensible
to control for more state conditions; for example, controlling for the level of manufacturing
(treating it as a control variable for slowly-varying state conditions, not as a causal variable
for corruption) is warranted. This reasoning leads to preferring regressions (4) or (6). As a
practical matter, there is very little difference between the two; however, regression (6) includes
an instrument that is basically irrelevant (first-stage F = 2.6), so it is warranted to drop that
instrument, which leaves us with regression (4). The fact that adding LnInc1940 as an
instrument in regression (6) does not change the results or reject exogeneity is a reassuring
robustness check of regression (4).
2) (5 points) Based on your preferred regression(s), what conclusions do you draw about the
effect on corruption of the level of education? Explain.
Based on regression (4), the coefficient on LowEd share (the share of adults with at most a high
school degree) is statistically significant at the 5% level. The magnitude of the effect is substantial: a one
standard deviation move in LowEd share is associated with a 32.5 × .07 = 2.3 change in the
corruption rate, which is approximately a one standard deviation change in the corruption rate
(and a change of more than one-half of the mean corruption rate). Assuming that the findings
from the regression are internally valid, according to this regression, increasing the level of
education in the population (in particular, reducing the fraction of the population with low
levels of education) has not only the usual direct benefits, but also the statistically significant side
benefit of substantially reducing corruption.
3) (5 points) In your judgment, what are the most important threats to the internal validity of the
estimates in your preferred regression(s), upon which you based your answer to question 2?
Here are two:
1. Are the other regressors good control variables? In particular, voting share could be subject
to simultaneous causality ("vote early and often"), and if so it would introduce simultaneous
causality bias and not be a suitable control.
2. Are the instruments really exogenous? Although they reflect things that happened in the
distant past, political culture in a state changes very slowly. The identifying assumption is, in
effect, that things that happened long ago are correlated with the level of education today but
are not correlated with omitted determinants of corruption today. The specification of the
equation for corruption today omits things that are plausibly strongly serially correlated, such
as the vigor of investigative journalism, the way that prosecutors are appointed (political
appointments? elected officials? rising through the bureaucracy?). It is particularly troubling
that the results on instrument exogeneity hinge on whether the manufacturing share is included
as a regressor (regression (3) v. (6)), when the manufacturing share itself is hard to understand
as a proximate cause of corruption. So the case for instrument exogeneity is not compelling.
(a) Draw the following graphs. Clearly label the axes and provide the numerical values of
the points (3 points each).
(i) The effect of a $1 rebate on the change of consumption, ΔC_t, in the month the rebate
is received and the two subsequent months.
Suppose the rebate is received in July. Effects on the change of consumption:
C_July − C_June = .247
C_August − C_July = −.172
C_September − C_August = −.034
(ii) The effect of a $1 rebate on the level of consumption, C_t, in the month the rebate is
received and the two subsequent months.
Suppose the rebate is received in July. Effects on the level of consumption:
C_July − C_June = .247
C_August − C_July = −.172, so C_August − C_June = .247 − .172 = .075
C_September − C_August = −.034, so C_September − C_June = .247 − .172 − .034 = .041
Graphs for (i) and (ii):
[Figure: Dynamic effect of rebate at date t on the change of consumption and the level of consumption. Horizontal axis: months after rebate (0, 1, 2). Plotted effects on the change in consumption, III.3(a)(i): .247, −.172, −.034. Plotted effects on the level of consumption: .247, .075, .041.]
(b) (2 points) Of a $1 rebate received in July, how much is estimated to remain unspent by
the end of September?
The increases in monthly consumption in each of the three months are given in 3(a)(ii).
Altogether, the total additional consumption spending is .247 in July, .075 in August, and .041
in September, for .247 + .075 + .041 = .363. Thus the amount remaining of a $1 rebate is 1.00 − .363 = $.637.
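The step from effects on changes to effects on levels is a running sum, and the unspent amount follows from adding up the levels. A minimal sketch with illustrative variable names:

```python
from itertools import accumulate

# Effects of a $1 rebate on the monthly change in consumption:
# July (month of receipt), August, September.
monthly_changes = [0.247, -0.172, -0.034]

# Effect on the level in each month = cumulative sum of the changes.
levels = list(accumulate(monthly_changes))

# Extra spending over the three months, and what is left of the $1.
total_spent = sum(levels)
remaining = 1.00 - total_spent

print([round(x, 3) for x in levels])              # [0.247, 0.075, 0.041]
print(round(total_spent, 3), round(remaining, 3))  # 0.363 0.637
```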
4) (3 points) During this period, the economy was emerging from a recession. A skeptic says:
"The regression results show that, on average, consumption is increasing over this six-month
period, but this could just be a consequence of the general economic recovery. Therefore,
these regressions confuse the effect on consumption of the rebate with the broader effect of
the overall economic recovery." Do you agree or disagree? Why?
Disagree. The regression has monthly fixed effects, which eliminates factors that are changing
over time but constant across households, such as changing overall macroeconomic conditions.
5) Using the results in regression (2), compare the estimated dynamic causal effects of the
rebate for low-income families vs. non-low income families.
(a) (3 points) Is there statistically significant evidence that the dynamic effects differ for
these two groups?
For the dynamic effects to be the same for these two groups, the coefficients on all the
interaction terms for LowIncome would need to be zero. The F-statistic testing this hypothesis is
4.10 with p = .024 < .05, so the hypothesis is rejected at the 5% significance level. So, there is
statistically significant evidence that the dynamic effects differ for these two groups.
(b) (3 points) According to the estimated coefficients, which group (if any) has spent more of
the rebate check after two months, and (if so) by how much? Briefly, explain.
Base group:
Effect on C_t of R_t: .130
Effect on C_{t+1} of R_t: .130 − .067 = .063
Total increase in consumption in months t and t+1 = .130 + .063 = .193
Low-income group:
Effect on C_t of R_t: .130 + .624 = .754
Effect on C_{t+1} of R_t: (.130 + .624) + (−.067 − .459) = .754 − .526 = .228
Total increase in consumption in months t and t+1 = .754 + .228 = .982
Of a $1 tax rebate, the base group (non-low income) has spent $.193 after two months, whereas
the low-income group has spent $.982 after two months.
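A sketch of the two-month calculation for both groups, which also gives the gap between them. The function name `spending_path` is an illustrative assumption; the coefficients are those used in the answer.

```python
def spending_path(change_t, change_t_plus_1):
    """Return (level in t, level in t+1, two-month total) from change effects."""
    level_t = change_t
    level_next = change_t + change_t_plus_1
    return level_t, level_next, level_t + level_next

# Base (non-low-income) group: coefficients on R_t and its first lag.
base = spending_path(0.130, -0.067)

# Low-income group: base coefficients plus the LowIncome interaction terms.
low_income = spending_path(0.130 + 0.624, -0.067 - 0.459)

gap = low_income[2] - base[2]

print([round(x, 3) for x in base])        # [0.13, 0.063, 0.193]
print([round(x, 3) for x in low_income])  # [0.754, 0.228, 0.982]
print(round(gap, 3))                      # 0.789
```

Under these estimates the low-income group has spent about $.79 more of each rebate dollar after two months.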
(c) (2 points) Do these results accord with economic reasoning, or do they pose a puzzle?
Briefly, explain.
The low-income group has consumed almost all the rebate after two months, whereas the non-low-income group has consumed only 19% of the rebate. This makes sense economically. The
low-income group is likely to be liquidity constrained (unable to borrow, at least at reasonable
rates) and would therefore be more likely to spend the rebate check immediately. In contrast,
the non-low-income group might already have financial savings (a substantial bank account
balance) so that the rebate check need not be spent immediately, but instead could be saved and
used for future consumption. [This is an empirical verification of an important point: the
effects on consumption of changes in taxes depend on whom the tax change affects.]
For purposes of Part IV, the rebate effect is the effect of receiving a $600 tax rebate on
household consumption of eligible households, in the month in which the rebate is received,
holding all else constant.
1) Consider the following estimators of the rebate effect:
(a) C_{I,July} − C_{I,June}
Biased. The control group for this estimate is the receiving households, in the month prior to
receipt. This "after minus before" estimator cannot distinguish common effects that happen
over time, such as the general recovery from the recession, from the effect of receiving the
rebate.
(b) C_{I,July} − C_{II,July}
Unbiased. The control group is the group that receives the rebate later. This is the simple
differences estimator, treatment minus control. If treatment (receipt of rebate in July) is
randomly assigned, then this will be uncorrelated with other determinants of consumption and
the simple differences estimator is unbiased. [Note: one might question this reasoning by
pointing out that the control group knows it will get a rebate later so if they are not liquidity
constrained, they might increase consumption in July. In fact, in the absence of liquidity
constraints, under the permanent income hypothesis the exact timing of the receipt of the rebate
should not matter, so this estimator would yield an estimate of zero even though the rebate does
in fact increase consumption.]
(c) C_{I,July} − C_{III,July}
Biased. The control group here consists of those who are ineligible. They are likely to be different
systematically from those who are eligible, mainly by having lower income, so their marginal
propensity to consume out of a rebate check is arguably different than those who are eligible
and by assumption we want to measure the rebate effect on the eligibles.
(d) (C_{I,July} − C_{I,June}) − (C_{II,July} − C_{II,June})
Unbiased. This is the differences-in-differences estimator, in which "after minus before" for the
treatment group is compared to "after minus before" for the control group. Because treatment is
randomly assigned, it is independent of other determinants of consumption, and this estimator is
unbiased. [Note that the caveat in (b) applies here too.]
(2 points each) For each estimator (a) through (d), is this an unbiased estimator of the rebate
effect? Briefly explain.
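To make the comparison among estimators (a), (b), and (d) concrete, here is a sketch on made-up group means. All dollar figures are hypothetical and not from the exam; group I receives the rebate in July and group II receives it later, as in the question.

```python
# Hypothetical mean monthly consumption (dollars) by group and month.
mean_consumption = {
    ("I", "June"): 3000.0, ("I", "July"): 3200.0,    # rebate received in July
    ("II", "June"): 2950.0, ("II", "July"): 3050.0,  # rebate received later
}

def before_after(m):
    """Estimator (a): July minus June for the treated group only."""
    return m[("I", "July")] - m[("I", "June")]

def simple_difference(m):
    """Estimator (b): treated minus control, both in July."""
    return m[("I", "July")] - m[("II", "July")]

def diff_in_diff(m):
    """Estimator (d): treated change minus control change."""
    treated_change = m[("I", "July")] - m[("I", "June")]
    control_change = m[("II", "July")] - m[("II", "June")]
    return treated_change - control_change

print(before_after(mean_consumption))       # 200.0
print(simple_difference(mean_consumption))  # 150.0
print(diff_in_diff(mean_consumption))       # 100.0
```

In this made-up example both groups' consumption rises by 100 for reasons unrelated to the rebate; the before-after estimator attributes that common change to the rebate, while the differences-in-differences estimator subtracts it out.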
2) (3 points) Provide a regression equation by which the estimator in 1(d) can be computed by
OLS regression estimated with household-level data for June and July.
Let $JulyCheck_{it}$ = 1 if the household receives the check in July, and = 0 if it does not. Then
estimate the regression,
Questions for Part I (18 points). Please answer these questions in Blue Book I
1)
The dependent variable is binary so this is a linear probability model. The coefficient on Fraction
daughters is the change in the probability of voting for that bill, if the Fraction daughters were to increase
from zero to one, holding constant Registered Democrat.
2)
Consider a representative with 2 daughters and 1 son, from a district in which 55% of
voters are registered Democrats.
a) Using regression (1), compute the probability that this representative voted in favor of
the bill on teen access to contraception. (3 points)
Regression (1) is a probit model, so the regression equation gives the z-score, and the probability of
voting in favor (of the dependent variable being 1) is Pr(z < -0.51 + 0.36 × (2/3) + 0.71 × 0.55), where z is a
N(0,1) random variable. This equals Pr(z < 0.12) ≈ 0.55, or about 55%, from the cumulative normal tables.
b) Using regression (2), compute the probability that this representative voted in favor of
the bill on teen access to contraception. (3 points)
Regression (2) is a linear probability model, so the predicted value is the predicted probability, that is, the
probability of voting in favor is 0.38 + 0.13 × (2/3) + 0.23 × 0.55 = 0.59, or 59%.
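The two calculations can be checked with a short Python sketch. It uses only the coefficients quoted in the answers above; the helper name `normal_cdf` is ours, and the standard normal CDF is computed from the error function rather than read from a table.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

frac_daughters = 2 / 3   # 2 daughters out of 3 children
registered_dem = 0.55    # 55% of voters are registered Democrats

# Regression (1), probit: the linear index is a z-score, not a probability.
z = -0.51 + 0.36 * frac_daughters + 0.71 * registered_dem
p_probit = normal_cdf(z)

# Regression (2), linear probability model: the fitted value is the probability.
p_lpm = 0.38 + 0.13 * frac_daughters + 0.23 * registered_dem

print(f"probit: z = {z:.2f}, Pr(vote in favor) = {p_probit:.3f}")
print(f"LPM:    Pr(vote in favor) = {p_lpm:.3f}")
```

The probit and linear probability predictions differ by a few percentage points, which is typical when the fitted probability is near the middle of the unit interval.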
3)
Does the coefficient on Fraction daughters change substantially (in a real-world sense)
from regression (3) to regression (4)? What does this tell you about the additional variables
that were included in regression (4)? (3 points)
In regression (3), a unit change in Fraction daughters (from 0 to 1) is associated with an increase in the
NOW score of 6.18 (on a 0-100 scale); in regression (4) the increase is 6.01. This is a negligible change
in a real-world sense (it is also much smaller than either of the standard errors in regressions (3) or (4)).
Because the coefficient does not change when the additional regressors are included, omitting those
regressors did not cause omitted variable bias. Thus either those regressors do not belong in the
equation or are uncorrelated with Fraction daughters. (You could say more about which of those is true
from the table, but that would require using additional information beyond the mere fact that the estimated
coefficient didn't change.)
4)
A critic asserts that a shortfall of this study is that it focuses exclusively on daughters,
indicating gender bias by the author. The critic suggests adding one more regressor to
regression (4), specifically, Fraction sons, which is the fraction of males among the
representative's children. What would be learned from this regression? Be specific. (3
points)
Nothing would be learned. Adding Fraction sons would produce perfect multicollinearity: Fraction sons +
Fraction daughters = 1, so because an intercept is included (as it is), Fraction sons is a perfect linear
combination of Fraction daughters and the constant regressor.
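The perfect multicollinearity can be seen directly in a small sketch. The five Fraction daughters values below are hypothetical; exact rational arithmetic (`fractions.Fraction`) is used so that the singularity of X'X shows up as an exact zero rather than a rounding artifact.

```python
from fractions import Fraction as F

# Hypothetical Fraction daughters values for five representatives.
frac_daughters = [F(0), F(1, 3), F(1, 2), F(2, 3), F(1)]
frac_sons = [1 - d for d in frac_daughters]  # Fraction sons = 1 - Fraction daughters

# Regressor matrix with columns: constant, Fraction daughters, Fraction sons.
X = [[F(1), d, s] for d, s in zip(frac_daughters, frac_sons)]

def gram(rows):
    """Return X'X as a nested list."""
    k = len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

xtx_det = det3(gram(X))
print(xtx_det)  # 0: X'X is singular, so the OLS coefficients are not uniquely determined
```

In practice, regression software would either refuse to estimate this regression or silently drop one of the collinear regressors.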
5)
Another critic suggests that more conservative districts might elect representatives with
fewer daughters, so that Fraction daughters is endogenous. The author responds that
regression (5) provides evidence against this hypothesis, because Fraction daughters is
(with only one exception) unpredictable by the other regressors and thus is exogenous. Do
you agree or disagree with the author's response? Why? Be precise. (3 points)
Disagree, that is, the author's response is not persuasive. The criticism can be said this way. Let X be
Fraction daughters and let W be all the other regressors, so the regression is Y on X and W. The criticism
is that X is endogenous. The response is that X and W are uncorrelated (regression (5) does in fact
show this: the F-statistic fails to reject the null hypothesis that all the coefficients in the regression of X on
W are zero). But this does not show that X is exogenous! Endogeneity means that X is correlated with the error
term (E(u|X) ≠ 0), but saying something about the relationship between X and W doesn't tell us about the
relationship between X and u.
Questions for Part II (24 points). Please answer these questions in Blue Book II
1)
To result in omitted variable bias, an omitted variable Z must be a determinant of the NOW score and
must be correlated with District income. One such variable is religious background: attitudes towards
women's rights vary across religions (so this variable belongs in the regression), and income levels vary
on average across religious groups in the U.S. (so religion is correlated with District income). A second
such variable is education: attitudes towards women's issues vary with levels of education (although
there is one measure of education in the regression, it is a limited measure; perhaps it is high school
graduation that is more important), and income varies strongly with level of education.
b) Comment on the following statement: "Your answer to the previous question implies
that the conditional mean of the error term in (4) is nonzero, given the regressors in (4).
Therefore, the first least squares assumption is violated and the coefficient on Fraction
daughters in (4) does not have a causal interpretation." (3 points)
Disagree. The argument in (a) implies that the coefficient on District Income does not have a causal
interpretation, but this need not imply that the coefficient on Fraction daughters does not have a causal
interpretation. The relevant question is whether there is conditional mean independence, specifically,
whether E(u|X,W) = E(u|W), where X = Fraction daughters and W = the other regressors (the control
variables). In words, conditional on the other regressors, does the mean of the error term depend on
Fraction daughters or not? Because the gender of a child is as if randomly assigned by nature, it is
plausible to think that Fraction daughters is distributed independently of u given W (or, more strongly, is
independent of u and W). Thus it is plausible that conditional mean independence holds, and this suffices
to give the coefficient on Fraction daughters a causal interpretation, even if the remaining control variables
are correlated with the error term.
For the remaining questions, suppose (hypothetically) that the data set is extended to be panel
data for T = 3 Congresses, the 105th (1997-1998), 106th (1999-2000), and 107th (2001-2002)
Congresses. The observational unit would be a representative (his/her votes, children, and
district) in a given Congressional session. The data set would consist of all representatives who
were elected to Congress for all three sessions. Suppose n = 300, so there is a total of 900
observations (representatives are elected for two-year terms, and almost all who run for
reelection are reelected).
2)
Representatives in the 105th Congress who retire, are not reelected, or die would be in the
cross-sectional data set used in Table 1, but would not be in the panel data set. Would this
introduce sample selection bias into the panel data estimate of the effect of Fraction
daughters? (3 points)
For sample selection bias to occur, there must be a selection process that is related to the outcome
variable. Suppose that over this period the country became more conservative and representatives with
liberal positions on women's issues were either voted out of office or retired to avoid defeat. Then the
remaining representatives (the ones in the full sample) would be more likely to vote conservative,
whatever their Fraction daughters, and the estimated effect of Fraction daughters would be biased
(towards zero). (Think of the extreme case that only conservatives were elected, and all voted against
women's issues; some of these conservatives would have daughters, but because all voted conservative
Regardless of your answer to question (2), for the rest of these questions, ignore the possibility
of sample selection bias.
3)
To what extent would including representative fixed effects address the endogeneity
criticism? Explain. (3 points)
Including representative fixed effects controls for all characteristics of the representative and district that
do not change over time, including the representative's family composition at the beginning of the panel.
Thus if family composition at the time of election is an issue, it is now contained in the fixed effect. This
effectively addresses the endogeneity criticism.
A different way to make this point is to suppose that the panel has only two Congresses (T = 2), in which
case representative fixed effects estimation is equivalent to regressions of differences between t = 2 and t
= 1 data. If the fraction daughters upon initial election enters the specification and is correlated with
district attitudes, because neither variable changes over time their first differences are zero and they do
not enter the differences specification. Instead, the differences specification regresses the change in
NOW scores on the change in Fraction daughters and the change in the other regressors.
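The T = 2 argument can be illustrated with a minimal sketch. The data are hypothetical and noise-free: the outcome is an unobserved representative effect plus 2 times the regressor, and the representative effect is deliberately correlated with the regressor. The helper `ols_slope` is ours.

```python
def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical panel, n = 4 representatives, T = 2 (assumed values).
true_beta = 2.0
fixed_effect = [10.0, 20.0, 30.0, 40.0]   # time-invariant, correlated with x
x_t1 = [0.0, 0.5, 0.5, 1.0]               # regressor in period 1
x_t2 = [0.5, 0.5, 1.0, 1.0]               # regressor in period 2 (changes for two units)
y_t1 = [a + true_beta * x for a, x in zip(fixed_effect, x_t1)]
y_t2 = [a + true_beta * x for a, x in zip(fixed_effect, x_t2)]

# Pooled OLS ignores the fixed effects and is contaminated by them.
pooled_slope = ols_slope(x_t1 + x_t2, y_t1 + y_t2)

# First differences: the fixed effect drops out of y_t2 - y_t1.
dx = [b - a for a, b in zip(x_t1, x_t2)]
dy = [b - a for a, b in zip(y_t1, y_t2)]
fd_slope = ols_slope(dx, dy)

print(f"pooled OLS slope:        {pooled_slope:.2f}")
print(f"first-difference slope:  {fd_slope:.2f}")
```

Only the two units whose regressor changes contribute to the first-difference estimate, which is the same point made later about the limited variation in Fraction daughters once fixed effects are included.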
4)
Suppose the mood of the country is becoming more conservative, and representatives' votes change to
reflect that mood. Then "mood of the country" is an omitted variable and could be included using time
fixed effects. For mood of the country to introduce bias in the coefficient on Fraction daughters, the
mood of the country would need to be correlated with the change in Fraction daughters for those
representatives who were in all three Congresses (see the second part of the answer to question 3
above). For a representative who has a child while in Congress, the gender can be treated as assigned at
random (the logical argument against this would be that voters, seeing that the representative had a
daughter instead of a son, voted him or her out; this is certainly far-fetched!). So, time effects could be
added to capture the mood of the country, but, very plausibly, omission of time effects would not result
in omitted variable bias.
5)
Consider a hypothetical panel data version of regression (4) in Table 1, in which both
representative fixed effects and time fixed effects are included. Call this hypothetical
regression (P4) (P for panel).
a) What is the problem that is solved by clustered or HAC standard errors, and how
do clustered standard errors solve that problem? (3 points)
Let $u_{it}$ denote the error term in hypothetical panel data regression (P4), where i runs over representatives
and t = 1, 2, 3 runs over Congresses. If $u_{it}$ is correlated over time, then the usual
(heteroskedasticity-robust) formula for standard errors does not apply. [The usual formula assumes that the error term is
serially uncorrelated, but if it is instead serially correlated there is less information in the data than one would think:
the observations are not independent.] Clustered standard errors solve that problem by providing an
estimate of the variance of the OLS estimator in panel data that allows for nonzero correlations among the errors
within a cluster (here, the observations on one representative), under the assumption that the errors are
independent across clusters. [This answer is fully acceptable; it is not necessary to provide a formula.]
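Although no formula is required, a stripped-down sketch may help show what clustering changes. It makes several simplifying assumptions that are ours, not the exam's: a single regressor, variables already demeaned (so no intercept), no degrees-of-freedom correction, and a small hypothetical panel in which errors are positively correlated within each representative.

```python
import math

# Hypothetical panel: 4 representatives (clusters), T = 3 observations each.
# Each pair is (x, y); both are assumed to be already demeaned.
panel = {
    "A": [(1.0, 2.0), (1.0, 2.0), (1.0, 1.5)],
    "B": [(-1.0, 0.0), (-1.0, -0.5), (-1.0, 0.0)],
    "C": [(2.0, 1.0), (2.0, 1.0), (2.0, 1.5)],
    "D": [(-2.0, -2.5), (-2.0, -3.0), (-2.0, -3.0)],
}

sxx = sum(x * x for obs in panel.values() for x, _ in obs)
sxy = sum(x * y for obs in panel.values() for x, y in obs)
beta_hat = sxy / sxx  # OLS slope without intercept

# Scores x_it * u_hat_it, kept grouped by cluster.
scores = {g: [x * (y - beta_hat * x) for x, y in obs] for g, obs in panel.items()}

# Heteroskedasticity-robust variance: sum of squared individual scores.
var_robust = sum(s * s for ss in scores.values() for s in ss) / sxx ** 2

# Clustered variance: sum the scores within each cluster first, then square,
# which lets errors be correlated within a cluster but not across clusters.
var_cluster = sum(sum(ss) ** 2 for ss in scores.values()) / sxx ** 2

se_robust = math.sqrt(var_robust)
se_cluster = math.sqrt(var_cluster)
print(f"beta_hat = {beta_hat:.3f}")
print(f"robust SE = {se_robust:.4f}, clustered SE = {se_cluster:.4f}")
```

With positively correlated errors and a persistent regressor, as in this made-up panel, the clustered standard error is larger than the heteroskedasticity-robust one, reflecting the smaller amount of independent information.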
c)
Suppose that the author estimated regression (P4), using the standard errors you
recommended in part (b). Using your judgment, do you think that these standard errors
in hypothetical panel regression (P4) would be smaller, larger, or about the same as
those in the cross-section regression (4) in Table 1? Explain. (3 points)
In many cases, panel data standard errors are smaller than cross sectional standard errors because there
are more observations (nT instead of n). In regression (P4), however, this usual situation seems unlikely
to arise. The reason is that, by including representative fixed effects, the variation in Fraction daughters is
arising only from those representatives who have children during the time covered by the panel. The
average age for these representatives is 53, so most are beyond their years of having babies. Thus the
variation in the regressor Fraction daughters, given the fixed effects, will be very small, so the standard
errors will be large.
Another way to see this is to consider the T = 2 panel, for which the first differences regression is
equivalent to OLS with fixed effects. The change in Fraction daughters will be zero for the vast majority of
members. Consider the extreme case in which only one representative has a child between t = 1 and t =
2, and that child happens to be a daughter. Then the estimator will be comparing the change in his/her NOW score
(the sample size is 1 for this group) to the average change in NOW score for all the other representatives.
Parts I and II were drawn from Washington, E. (2006), "Female Socialization: How Daughters
Affect Their Legislator Fathers' Voting on Women's Issues," NBER Working Paper no. 11924.
Questions for Part III (21 points). Please answer these questions in Blue Book III
1)
Give the best reason you can why the OLS estimator of the coefficient on Kids>2 in Table
2, column (3) might be biased. (3 points)
Here are some very good reasons (you need to provide only one; this list of good reasons is not
exhaustive):
(i) The number of children to have is to a considerable extent a choice variable: it is chosen by the woman
(and the couple) based on various considerations, including what she could earn in the labor market.
The economics of the family indicates that women with a greater value of time (greater potential earnings) will
choose more paid employment and less at-home work, which includes child-rearing. This unobserved
variable, earnings potential, is a determinant of hours but is also a determinant of number of children, so
the number of children is correlated with the error term, i.e. endogenous. Correlation with the error term
implies a biased coefficient estimator.
(ii) The foregoing argument also applies to how professionally ambitious the wife is.
(iii) There is an accounting relationship involved here: if a woman had full-time employment during 1979
but had a child during 1979, then she would have taken maternity leave and her weeks worked would be
lower. (This problem could be eliminated by restricting the sample to women with no children born in 1978
or 1979.)
(iv) Number of children and weeks worked by the mother are both influenced by cultural and religious
factors. Some religions which emphasize large families also support the view that a woman's place is in
the home. Religion indicators are omitted; they are a determinant of whether the woman works and are
correlated with family size, so they cause omitted variable bias.
2)
Consider the hypothesis that, on average, U.S. parents want to have children of both
genders (that is, they prefer at least one girl and one boy to all girls or all boys). Does
Table 2 provide evidence in favor of this hypothesis, against this hypothesis, or neither?
Explain. (3 points)
In favor of this hypothesis. The variable Same sex enters significantly in the linear probability model of
regression (1). That is, couples whose first two children are of the same sex are more likely to
have subsequent children. Moreover, the effect is large in a real-world sense: the probability of having
additional children increases by approximately .07, that is, 7 percentage points, for a woman whose first two
children are of the same sex. This is consistent with couples having a desire to have another child of a different
gender. Regression (2) is also consistent with this: if the first two children are boys, the couple is more
likely to have another child; also, if the first two children are girls, the couple is more likely to have
another child. The coefficient on 2 girls is somewhat larger than the coefficient on 2 boys in regression (2),
indicating that the probability of having another child is greater if you have two daughters than if you have
two sons, which suggests that having at least one son is slightly preferred on average to having at least one
daughter.
3)
Consider the following potential instrumental variables for Kids>2 in regression (3):
a) Whether wife came from large family (binary) (3 points)
b) The teen pregnancy rate in the wife's city or town of residence (3 points)
For each proposed instrument, is the variable arguably a valid instrumental variable? Briefly
explain.
To be valid, an instrument must:
(i) be relevant (correlated with the included endogenous regressor, given the included exogenous
regressors) and
(ii) be exogenous (uncorrelated with the error term in the equation of interest).
For these two instruments:
(a) Wife coming from a large family:
(i) relevance: arguably yes. If the wife came from a large family, she might be predisposed to
having a large family herself.
(ii) exogeneity: no. Coming from a large family would be correlated with religion (e.g. Catholic) or
would indicate being taught certain values that would be in the error term, so coming from a
large family would be correlated with the error term in her weeks worked equation.
(b) Teen pregnancy rate in wife's town:
(i) relevance: yes. If the teen pregnancy rate in her town is high, that reflects existing cultural
conditions and attitudes about women and work which could be correlated with her own personal
attitudes, which enter her decision about how many children to have.
(ii) exogeneity: no. Those same cultural attitudes that affect family size also influence the wife's
decision to work, so the instrument would be correlated with cultural attitudes which are in the
error term.
4)
b) Is the pair of variables, 2 boys and 2 girls, a valid set of instruments in regression (5)?
(3 points)
(i) relevance: first-stage F = 725.9 > 10, so the instruments taken as a set are relevant.
(ii) exogeneity. The logical reasoning given in (a) supports exogeneity here as well. Because there
are two instruments, however, we can also test for exogeneity of both, against the alternative that
one of them is not exogenous, using the J-statistic. The J-statistic has a chi-squared distribution
with k − 1 degrees of freedom (there is one endogenous regressor), where k is the number of instruments,
so here it has a chi-squared distribution with 1 degree of freedom. The J-statistic is 3.24; the 5% critical
value of the $\chi^2_1$ distribution is 3.84 and the 10% critical value is 2.71, so the J-statistic rejects at the
10% but not the 5% significance level. This provides some limited evidence against the hypothesis that both
instruments are exogenous; however, the evidence is not strong (not significant at the 5% level), especially given
the very large number of observations. So it is reasonable to interpret this J-statistic as generally
supportive of the hypothesis of exogeneity.
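The comparison with the critical values can be supplemented with an exact p-value. The sketch below is a minimal illustration that relies on the fact that a chi-squared variable with 1 degree of freedom is the square of a standard normal, so its upper-tail probability can be written with the complementary error function; the function name `chi2_1_upper_tail` is ours.

```python
import math

def chi2_1_upper_tail(x):
    """P(chi-squared(1) > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

j_stat = 3.24        # J-statistic quoted in the answer
n_instruments = 2    # 2 boys, 2 girls
n_endogenous = 1     # Kids>2
dof = n_instruments - n_endogenous   # degrees of overidentification

p_value = chi2_1_upper_tail(j_stat)  # valid here because dof == 1
rejects_at_10 = j_stat > 2.71
rejects_at_5 = j_stat > 3.84

print(f"dof = {dof}, p-value = {p_value:.3f}")
print(f"reject at 10%: {rejects_at_10}, reject at 5%: {rejects_at_5}")
```

The p-value falls between 0.05 and 0.10, matching the conclusion that the test rejects at the 10% level but not at the 5% level.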
An interesting aside: Suppose parents of boys need to take more time off on average from work than
parents of girls. Then 2 boys and 2 girls would be correlated with the error term in Weeks worked, that is,
they would be endogenous. However this does not imply that Same sex would be correlated with the error
term in Weeks worked, because Same sex simply says that you could either have 2 boys (negative effect)
or 2 girls (positive effect) which would by definition cancel out on average. So even if you are concerned
about the exogeneity of the pair of instruments 2 boys and 2 girls, you might not be concerned about the
exogeneity of Same sex.
5)
The estimated coefficient on Kids>2 differs in regressions (3) and (4) (the OLS estimate is
more negative than the TSLS estimate). Provide a real-world explanation (an interpretation
of the results) that explains why the OLS estimate is more negative than the TSLS estimate.
(3 points)
Consider explanation (ii) to question 1. Higher ambition implies more weeks worked and fewer children,
so the omitted variable bias effect is negative: fewer children is picking up higher ambition so the
coefficient on Kids>2 is biased towards a large negative number. This is what one sees comparing the
coefficients in regression (3) and (4).
Questions for Part IV (17 points). Please answer these questions in Blue Book IV
1)
(7)
which would be estimated by TSLS, using Same sex as an instrument (so regression (7) is
regression (4) without the variables Boy first, ..., Other race). For this question, assume
that Same sex is a valid instrument in regression (4) and in addition that Same sex is
distributed independently of all the control variables in regression (4), so E(Boy first|Same
sex) = 0, ..., E(Other race|Same sex) = 0.
a) Explain why Same sex would be a valid instrument in regression (7). (3 points)
NOTE: TYPO ANNOUNCED DURING THE EXAM
The assumption
E(Boy first|Same sex) = 0, ..., E(Other race|Same sex) = 0   (*)
should be:
E(Boy first|Same sex) = E(Boy first), ..., E(Other race|Same sex) = E(Other race).   (**)
The basic idea is that the difference between regression (4) and regression (7) is that W is in the error
term of (7). But if Same sex is distributed independently of W, and if it is exogenous in (4), then it is
uncorrelated with the error term in (7) because that error consists of the effect of W plus the error term in
(4).
Making this argument precise in equations (which isn't necessary for full credit) goes as follows. There
are two approaches. The first is to realize that, when an intercept is included in the regression, the
regression is the same if the regressors that vary first have their means subtracted. In this case,
assumption (*) is valid, and the regressors are simply (without loss of generality) interpreted as deviations
from their means. Here is the argument under this interpretation:
For Same sex to be valid in (7), it must be relevant and exogenous.
First the exogeneity argument:
Let $u_{4i}$ be the error term in regression (4), and write regression (4) as
$Y_i = \gamma_0 + \gamma_1 \text{Kids>2}_i + \gamma_2 W_i + u_{4i}$   (4)
where $W_i$ stands for all the other regressors in (4). The coefficient $\gamma_1$ is the effect of a change in Kids>2 on
Y, holding constant $W_i$. But if Same sex$_i$ is distributed independently of $W_i$, then the final clause "holding
constant $W_i$" doesn't matter, so $\gamma_1$ is the same as $\beta_1$ in regression (7). This means that the error term in
regression (7) must be
$u_i = \gamma_2 W_i + u_{4i}.$
Thus,
(+)
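$\beta_0 + u_i = \gamma_0 + \gamma_2 W_i + u_{4i}.$   (++)
$\beta_0 = \gamma_0 + \gamma_2 E(W_i),$
Substituting this expression into (++) yields
$u_i = \gamma_2 [W_i - E(W_i)] + u_{4i}.$
Thus
$E(u_i \mid \text{Same sex}_i) = \gamma_2 E\{[W_i - E(W_i)] \mid \text{Same sex}_i\} + E(u_{4i} \mid \text{Same sex}_i).$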
By the independence of W and Same sex and the exogeneity of Same sex as an instrument in regression
(4), it follows that $E(u_i \mid \text{Same sex}_i) = 0$, so that Same sex is an exogenous instrument in regression (7).
The relevance argument is the same as given above.
b) Provide a reason why, despite the validity of Same sex as an instrument in regression
(7), you would still prefer regression (4). (3 points)
Including W as a regressor in (4) does not change the consistency properties of 2SLS; however, it could
reduce the standard error relative to (7), because including W means that the variance of $u_{4i}$ is less than
the variance of $u_i$ (the error of regression (7)), which means that the variance of the 2SLS estimator in
regression (4) could be less than that in regression (7). This can be checked empirically, and the
specification yielding the smaller standard error, (4) or (7), could be chosen; that specification could be
(4) for the reason just given.
2)
Some women are more ambitious professionally than others. Suppose that the effect on
labor force participation of having a large family is not the same for every woman,
specifically, the more ambitious the woman, the smaller is the effect (the most ambitious
women will work whether or not they have a large family). How if at all would this
change your interpretation of the results in regressions (4) and (5)? Explain your reasoning.
(5 points)
In general, if there is heterogeneity in treatment effects, then the IV estimator estimates the local average
treatment effect. The local average treatment effect is a weighted average treatment effect, weighted
most heavily by those women who are most heavily affected by the instrumental variable. The
interpretation for regression (5) is the same as for regression (4), so it suffices to focus on one or the
other; here we consider regression (4). Applied to regression (4), the local average treatment effect is the
weighted average effect on weeks worked of having a large family (more than two children), where the
most weight is placed on women whose family size decision is most heavily influenced by the gender of their
first two children. To interpret this further, one must ask, (a) are there, plausibly, any differences among
mothers in their desire to have at least one child of each gender; and (b) if so, who are those mothers whose
subsequent childbearing decisions are the most influenced by the gender of their first two children?
The answer to (a) is entirely judgmental; there is no evidence on that in the table. Here are three
perfectly valid answers (any one is sufficient):
(i) All mothers are arguably identical in the US in their preferences for a mix of boys and girls.
Then, despite the variation in $\beta_{1i}$, the instrument has the same effect on everyone, so the TSLS
estimator consistently estimates the average treatment effect.
(ii) Mothers vary in their preferences for children of each gender; however, that variation is
independent of their professional motivation, so the variation in the effect of the instrument is
independent of the variation in $\beta_{1i}$. If so, then the TSLS estimator consistently estimates the
average treatment effect.
(iii) Mothers from certain cultural backgrounds that value children of multiple genders will have
less desire to work, if they have a large family, so the effect of the instrument is the greatest for
those with small (more negative) $\beta_{1i}$. In this case, LATE will be larger (a larger negative number, that is,
what seems to be a larger effect on Wife's weeks of having a large family) than the average treatment effect.
Another entirely satisfactory approach to this problem is to use the formulas presented in class in which
the effect of the instrument on Kids>2 is denoted by $\pi_i$, so the LATE is given by
$\text{LATE} = \dfrac{E(\beta_{1i}\pi_i)}{E(\pi_i)}.$
Using this formula, one can then rephrase the substantive discussion above in terms of $\pi_i$ and the various
expectations in the LATE formula.
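The LATE formula can be explored numerically. In the sketch below, `betas` holds each type's effect of a large family on weeks worked and `pis` holds each type's effect of the instrument on Kids>2; the three types, their equal population shares, and all numerical values are hypothetical and chosen only to illustrate cases (i) and (iii).

```python
def weighted_mean(values, weights):
    """Population mean of `values` with population shares `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def late(betas, pis, weights):
    """Local average treatment effect: E(beta_i * pi_i) / E(pi_i)."""
    numerator = weighted_mean([b * p for b, p in zip(betas, pis)], weights)
    denominator = weighted_mean(pis, weights)
    return numerator / denominator

# Three hypothetical types of women with equal population shares.
betas = [-8.0, -4.0, 0.0]     # effect of Kids>2 on weeks worked, by type
weights = [1.0, 1.0, 1.0]

ate = weighted_mean(betas, weights)

# Case (i): the instrument moves every type equally, so LATE equals the ATE.
late_same_pi = late(betas, [0.06, 0.06, 0.06], weights)

# Case (iii): the instrument matters most for types with the most negative beta.
late_varying_pi = late(betas, [0.10, 0.06, 0.02], weights)

print(f"ATE = {ate:.2f}")
print(f"LATE, equal pi = {late_same_pi:.2f}")
print(f"LATE, pi largest where beta is most negative = {late_varying_pi:.2f}")
```

When the instrument's effect is the same for everyone, the weights cancel and LATE coincides with the average treatment effect; when it is concentrated on the types with the most negative effects, LATE is more negative than the average treatment effect.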
Use Table 2 to comment on the following statements. For each statement, do you agree or
disagree with the statement, and explain why (be specific).
3)
Families with large numbers of children tend to be unusual in certain ways, in some cases
coming from certain religious/ethnic backgrounds (traditional Catholic families, Mormons,
etc.). So the analysis in regressions (4) and (5) is not providing a valid estimate of the
effect of family size on labor supply, it just reflects this religious/ethnic effect. (3 points)
Disagree. This criticism would be a valid criticism of OLS but it entirely misses the point that the use of the
instruments and TSLS eliminates the potential correlation in question.
4)
Even though having large families reduces female labor force participation, this is only half
of the story because their husbands will work more to compensate for the loss of the wife's
earnings. (3 points)
According to the evidence in regression (6) in Table 2, this is wishful thinking. Husbands' weeks worked
increase slightly, but the estimated effect is much smaller than the decline in wives' weeks worked and it is
statistically insignificant.
By the way, when this analysis is repeated using income earned by the wife and husband, one finds that
the TSLS estimate of the husband's income increases slightly with more kids, but it does not increase
nearly enough to compensate for the decline in the wife's income (on average). This said, monetary
income is not necessarily the best measure of household economic well-being, for example the wife could
be cutting back on hours but also cutting back on expenses (lower child care payments, for example), so a
more complete analysis of the effect of large families on household economic well-being would need to
take into account the value of home production and market purchase of home services.
Parts III and IV were drawn from Angrist, J.D. and W.N. Evans (1998), "Children and Their Parents'
Labor Supply: Evidence from Exogenous Variation in Family Size," American Economic Review 88,
450-477.
Questions for Part V (20 points). Please answer these questions in Blue Book V
1)
The value of GDP growth in 2005:III was 4.1 (that is, in the third quarter of 2005, GDP
grew by 4.1% at an annual rate).
a) Use regression (1) in Table 3 to compute a forecast of GDP growth for 2005:IV. (3
points)
The forecast for quarterly GDP growth for 2005:IV is 2.42 + 0.27 × 4.1 = 3.5, or 3.5% at an annual rate.
b) Suppose that the errors in regression (1) are normally distributed. Compute a 95%
prediction interval (forecast interval) for GDP growth in 2005:IV. (3 points)
A 95% prediction interval is approximately given by the point estimate ± 1.96 × SER, which is 3.5 ± 1.96 × 3.3
= (-2.9%, 10.0%) (which is huge).
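The forecast and its interval can be reproduced in a few lines. This sketch uses only the numbers quoted in the answers (intercept 2.42, lag coefficient 0.27, SER 3.3); like the answer, it treats the interval as approximate because it ignores estimation uncertainty in the coefficients. Variable names are ours.

```python
intercept = 2.42           # regression (1) intercept, as used in the answer
ar1_coef = 0.27            # coefficient on lagged GDP growth
ser = 3.3                  # standard error of the regression
gdp_growth_2005q3 = 4.1    # most recent observed value (percent, annual rate)

# One-step-ahead AR(1) forecast for 2005:IV.
forecast = intercept + ar1_coef * gdp_growth_2005q3

# Approximate 95% forecast interval under normally distributed errors.
half_width = 1.96 * ser
interval = (forecast - half_width, forecast + half_width)

print(f"forecast = {forecast:.1f}")
print(f"95% interval = ({interval[0]:.1f}, {interval[1]:.1f})")
```

The interval is wide because the SER is large relative to typical movements in GDP growth, so the AR(1) model has limited predictive content.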
c)
Suppose that forecast errors come in clusters, for example, some years have more
volatile GDP growth than others, so that GDP growth is more difficult to predict in
some years than in others. Suggest a modification of regression (1) in Table 3 that
would produce more reliable forecast intervals if there is this forecast error volatility
clustering. (2 points)
Regression (1) could be modified to have an ARCH or GARCH model, which models the error variance as
depending on past squared errors and thus captures the effect of volatility clustering. If the error variance
is currently low, the ARCH estimate of the variance will be small and the forecast interval will be tighter.
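As a rough illustration of how an ARCH model changes the width of the forecast interval, here is a sketch of an ARCH(1) variance equation. The parameter values `alpha0` and `alpha1` and the two example forecast errors are hypothetical; they are not estimated from the data behind Table 3.

```python
import math

def arch1_variance(alpha0, alpha1, last_error):
    """Conditional error variance under ARCH(1): alpha0 + alpha1 * u_{t-1}^2."""
    return alpha0 + alpha1 * last_error ** 2

def forecast_interval(point_forecast, variance, z=1.96):
    """Approximate 95% forecast interval given a conditional error variance."""
    half_width = z * math.sqrt(variance)
    return (point_forecast - half_width, point_forecast + half_width)

alpha0, alpha1 = 4.0, 0.5   # hypothetical ARCH(1) parameters
point_forecast = 3.5        # point forecast from part (a)

# After a small forecast error last quarter (calm period).
calm = forecast_interval(point_forecast, arch1_variance(alpha0, alpha1, last_error=0.5))

# After a large forecast error last quarter (volatile period).
volatile = forecast_interval(point_forecast, arch1_variance(alpha0, alpha1, last_error=6.0))

print(f"calm period:     ({calm[0]:.1f}, {calm[1]:.1f})")
print(f"volatile period: ({volatile[0]:.1f}, {volatile[1]:.1f})")
```

The point forecast is unchanged; only the interval adapts, narrowing after small recent errors and widening after large ones.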
2)
No, heteroskedasticity-robust standard errors suffice. HAC standard errors are needed when the error
term is serially correlated. If the number of autoregressive lags is large enough, then the error term will
not be serially correlated: the serial correlation is captured by the autoregressive lags. The question is, is
one lag enough? Regression (2) indicates that one lag is in fact enough: the hypothesis that the additional
three lags have zero coefficients is not rejected. Another way to determine the number of lags
is the BIC, which is not reported in the table; however, the evidence in (2) is enough to provide confidence
that a single lag is enough to leave the error term serially uncorrelated.
3)
In Business Week Online (January 9, 2006), David Wyss, chief economist for Standard and
Poor's, wrote about how the recent decline of Term Spread has created worries about a
slowdown in U.S. economic growth. Based on the results in Table 3, do you think that
these worries are justified? Fully explain your reasoning. (5 points)
The coefficient on Term Spread in regressions (3)-(5) is positive, so a decline in Term Spread predicts a
decline in GDP growth. To decide whether this is a worry or not, we need to see how large the effect is.
From Figure 2, the decline in the past year or so has been large. Consider a decline of 1 percentage
point. The predicted decline in GDP growth is 0.7 percentage points (regression (3)), 1.6 percentage
points (regression (4)), and 0.2 percentage points (regression (5)). The estimates for regression (3) are
substantial, for regression (4) are quite large, and for regression (5) are quite small.
Therefore we must decide which regression to use. Regression (4) uses data ending in 1984 and it
makes no sense to use that regression. Whether one uses regression (3) or (5) depends in part on
whether the hypothesis of stability of regression (3) can be rejected (if it cannot, then the low estimate in
(5) might just be sampling variation). The QLR statistic in regression (3) rejects the null hypothesis of
stability at the 5% level, but the QLR statistic for regression (5) indicates stability over that subsample.
These results indicate that the full-sample regression is unstable and thus inappropriate, but that the
second-half regression is stable and thus can be used. The predicted effect in regression (5) is very
small, so this suggests that the worry discussed in the Business Week Online article is misplaced, based
on the results in Table 3.
4)
Suppose the U.S. Federal Reserve Bank is considering setting Term Spread to 1.0, that is,
increasing Term Spread from its current value of approximately zero by 1.0 percentage
point. (Suppose that, because long rates are more sluggish than short rates, the Fed can do
this by lowering short-term interest rates until Term Spread equals 1.0.)
a) Use regression (5) to estimate the effect of this easing. (1 point)
The estimated effect is an increase in quarterly GDP growth of 0.18 × 1.0 = 0.18 percentage points (annual
rate).
b) In your judgment, do you think that your answer in (a) provides a good estimate of the
effect of this proposed policy intervention by the Fed? Why or why not? (4 points)
No. For this to be valid for policy purposes, the term spread would need to be exogenous. But interest
rates, especially long rates, are set by market participants who look ahead to future economic conditions.
In particular, they would be taking into account the Fed's expected future economic actions. To be more
specific, temporary short-term interest rate tightening would be associated with a decline in Term Spread,
but that would also be associated with slowing the growth rate of GDP. This can be thought of as omitted
variable bias (or it can be phrased as simultaneous causation); in either event, Term Spread is
endogenous, so the OLS estimate of the coefficient is a biased estimator of the causal effect that is of
interest in the hypothetical policy.
b) (2 points) Compute a 95% confidence interval for your estimated effect in (a).
SE = 4 × 0.027 = 0.108, so the 95% confidence interval is 0.388 ± 1.96 × 0.108 = (0.18, 0.60)
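The interval arithmetic above can be reproduced with a short sketch. The helper name `scaled_ci` is hypothetical; the inputs (point estimate 0.388, coefficient standard error 0.027, scale factor 4) are simply the numbers quoted in the answer, and the normal critical value 1.96 is assumed.

```python
# Hypothetical helper: 95% confidence interval for an effect of the form
# scale * beta_hat, whose standard error is |scale| * SE(beta_hat).
def scaled_ci(estimate, coef_se, scale, z=1.96):
    se = abs(scale) * coef_se
    return se, (estimate - z * se, estimate + z * se)

# Numbers taken from the answer above.
se, (lower, upper) = scaled_ci(estimate=0.388, coef_se=0.027, scale=4)
print(f"SE = {se:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```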
2) Consider the relationship between the child's years of education and parental BMI, holding
constant the regressors in Table 2, column (1) other than parental BMI.
a) (2 points) Suggest a reason why this effect might be nonlinear.
Here is one example: Suppose BMI is just a proxy for omitted health factors. Extremely high or low
BMI could be associated with chronic illness which would make parents less able to spend time
parenting. If so, then very high and very low BMI would be associated with worse outcomes, relative
to BMI in a normal range. This suggests that the outcomes/BMI relationship might be modeled by a
quadratic, where the maximum of the quadratic is in the range of normal BMI.
b) (2 points) Can you reject the null hypothesis that the effect on the child's years of education
of parental BMI is linear? Explain.
No. The F-statistic testing the hypothesis that the coefficients on (Mother's BMI)², (Father's BMI)², and
(Mother's BMI) × (Father's BMI) are all zero has a p-value of 0.634, so the F-statistic is not significant
at the 5% (or 10%) level.
b) (2 points) Using regressions (1) and (2), can you reject the hypothesis of random
assignment? Explain.
Yes, the F-statistic in both cases is significant at the 1% significance level (p-value < .01).
c) (2 points) Using regressions (3) and (4), can you reject the hypothesis of random
assignment? Explain.
No, the F-statistic in both cases is not significant at the 10% significance level (p-value > .10).
d) (3 points) Explain what your answers to (b) and (c) imply about the program. Explain, in
real-world, concrete terms, how you might reconcile any discrepancy between your
answers to (b) and (c).
The only difference between regression (1) and (3), and between regression (2) and (4), is that
regressions (3) and (4) include a full set of year dummy variables indicating the year in which the
adoption took place. The results are consistent with the assignment being random, conditional on the
year in which the adoption took place, but not with the assignment being random unconditionally.
Unconditionally, parents' income is positively associated with weight and height, but conditionally it is
not. This might arise if, in the early years of the program, parents tended to be richer and the children
tended to be older, while in later years of the program, parents had lower incomes and the children
were younger, but in each year the children in the adoptee pool were randomly assigned to parents in
the parent pool.
4) The standard errors reported in Tables 1 and 2 are clustered standard errors, clustered at
the level of the household.
a) (3 points) Explain specifically what this means, that is, what are clustered standard errors,
clustered at the level of the household? Be precise.
Clustered standard errors allow for the possibility that the regression errors (u_i) are correlated across
observations within a cluster, but that the regression errors are uncorrelated between clusters. In this
instance, clustered standard errors allow for the possibility that the error term is correlated among
adoptees who are in the same household, but not across adoptees who are in different households.
b) (3 points) Provide a reason why the clustered standard errors could be larger than the
conventional heteroskedasticity-robust standard errors for the regressions in Table 2.
The clustered standard errors will exceed the conventional heteroskedasticity-robust standard errors if
there is positive correlation between the errors within the household. This would arise if there are
omitted variables in the regression that are correlated for the two adoptees. Such omitted variables
might be omitted household characteristics. For example, the number of natural children in the family
is omitted; if more children means less attention to each child, then the number of natural children
could be a determinant of economic outcomes, but because it is the same for both adoptees in the
same family, the regression errors for adoptees within the family would be positively correlated. In this
case, the clustered standard errors would exceed the heteroskedasticity-robust standard errors.
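The mechanism described above can be illustrated with a small simulation. This is only a sketch on made-up data: two adoptees per household, a household-level regressor, and an omitted household characteristic that enters both adoptees' errors. The function name `ols_se` is hypothetical, and no small-sample corrections are applied to either variance estimator.

```python
import numpy as np

def ols_se(X, y, groups):
    """Return OLS coefficients, HC0 robust SEs, and cluster-robust SEs."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta
    Xu = X * u[:, None]                      # rows are x_i * u_i
    meat_hc = Xu.T @ Xu                      # sum_i u_i^2 x_i x_i'
    ids, inv = np.unique(groups, return_inverse=True)
    scores = np.zeros((len(ids), X.shape[1]))
    np.add.at(scores, inv, Xu)               # sum x_i * u_i within each cluster
    meat_cl = scores.T @ scores              # sum over clusters of outer products
    se_hc = np.sqrt(np.diag(XtX_inv @ meat_hc @ XtX_inv))
    se_cl = np.sqrt(np.diag(XtX_inv @ meat_cl @ XtX_inv))
    return beta, se_hc, se_cl

# Simulated data (all values hypothetical).
rng = np.random.default_rng(0)
n_households = 2000
household = np.repeat(np.arange(n_households), 2)   # two adoptees per household
x_h = rng.normal(size=n_households)                 # household-level regressor
v_h = rng.normal(size=n_households)                 # omitted household characteristic
x = x_h[household]
u = v_h[household] + rng.normal(size=2 * n_households)
y = 1.0 + 0.5 * x + u
X = np.column_stack([np.ones(2 * n_households), x])

beta, se_hc, se_cl = ols_se(X, y, household)
print("slope:", beta[1], "robust SE:", se_hc[1], "clustered SE:", se_cl[1])
```

Because the errors of the two adoptees in a household share the component `v_h`, they are positively correlated within the cluster, and the clustered standard error on the slope comes out larger than the heteroskedasticity-robust one.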
b) (2 points) What is the difference in the predicted probabilities of drinking for the adoptee
in (a), compared with an adoptee whose parents have the same characteristics as those in
(a) except that the mother drinks?
If the mother drinks, the z score is the same as above, plus .374, that is,
z = .089 + 0.374 = 0.463
so the predicted probability is the standard normal cumulative distribution function, evaluated at z = 0.463, which is 0.678.
The difference in predicted probabilities is .678 - .535 = .143. That is, the adoptee whose mother
drinks is 14.3 percentage points more likely to drink than the adoptee whose mother does not drink,
given the values of the other regressors.
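The probit calculation above can be checked numerically. The sketch below uses only the two z values quoted in the answer (0.089 and 0.089 + 0.374) and evaluates the standard normal CDF through the error function; the function name `std_normal_cdf` is an illustrative choice.

```python
import math

def std_normal_cdf(z):
    # Phi(z) expressed through the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z_no_drink = 0.089                  # z for the adoptee in (a), from the answer
z_drink = z_no_drink + 0.374        # add the coefficient on Mother Drinks

p_no_drink = std_normal_cdf(z_no_drink)
p_drink = std_normal_cdf(z_drink)
diff = p_drink - p_no_drink

print(f"{p_no_drink:.3f} {p_drink:.3f} {diff:.3f}")
```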
c) (2 points) Now use the linear probability model from Table 2 to estimate the change in
predicted probabilities for the comparison in 1(b) (that is, a nondrinking vs. a drinking
mother, with the values of the other regressors given at the beginning of this question).
The corresponding linear probability model in Table 2 is regression (6). The coefficient on Mother
Drinks in that regression is the estimated effect of a unit increase in Mother Drinks on the probability
that the child drinks, holding the other variables constant. The estimated increase in the probability of
drinking from having a drinking mother is 0.135, approximately the same as the increase of 0.143
estimated using the probit model.
2) Using the results in Tables 2 and 3, do you agree or disagree with the following statements?
Explain.
a) (5 points) Many countries impose restrictions on foreign adopting parents, including
limits on parental BMI and parents' education. The results in Tables 2 and 3 support
these policies in the sense that Tables 2 and 3 show that high parental BMI and low
parental education both are associated with worse outcomes for adoptees.
Agree. Conditional on the year of adoption, the assignment of children to parents does indeed appear
to be random, so this is a valid experiment for determining the causal effect of assignment of children
to parents with different characteristics. This is completely analogous to a drug study, where the
treatment is the parent characteristics. The results in Tables 2 and 3 show that some parent
characteristics do affect child outcomes, although others do not. For example, the adoptee will have
more education if the mother's BMI is lower and if the mother's education is higher. These magnitudes
are statistically significant and, arguably, large: in Table 2, regression (1), four more years of mother's
education is associated with approximately 0.4 years of child's education. (One could also argue that
this is a relatively small increase in education; this is a matter of judgment. But in light of the small
number of manipulable variables that affect educational outcomes, this estimate of 0.4 years is rather
substantial.)
b) (5 points) The results in Tables 2 and 3 show that dieting by overweight mothers has
positive benefits for children. Specifically, consider a mother who decreases her BMI by
10 (for an obese woman, this corresponds to a weight drop of approximately 25%). On
average, holding other family characteristics constant, we would expect to see this weight
loss lead to an economically substantial increase in the child's years of education and in
the child's probability of graduating from college.
Disagree. From Table 2, regression (1), a drop in BMI of 10 is associated with an increase in the
child's years of education by 0.88, that is, almost one year. This is a sizeable amount in a real-world
sense. Using the coefficient from the linear probability model in Table 2, regression (3), a drop of BMI
by 10 is associated with an increased probability of being a college graduate by 0.17, or 17
percentage points, also a large amount in a real-world sense. These effects are statistically
significant. It does not follow, however, that these estimates are causal effects of dieting. They are
causal effects of placing children in households with these different characteristics. The question is, is
E(u|X,W) = 0, where W represents the year dummies and X represents the parental characteristics in
Tables 2 and 3? Arguably, the answer is no. For example, suppose it is not BMI, but maternal health
that is the determinant of Y (child's education). Then health is an omitted variable which is correlated
with maternal BMI, so E(u|X,W) ≠ 0. In real-world terms, dieting would change BMI but might not
change the other health conditions (or might only partially change them) so that the true determinant,
health, would not change (or might only change slightly). This is standard OV bias.
c) (5 points) The results in Table 3 shed light on the nature-nurture debate. These tables
show that paternal characteristics (such as drinking and being overweight) are transmitted
primarily through a genetic path, whereas maternal characteristics seem to be transmitted
primarily through a non-genetic (that is, environmental) path.
Agree. The coefficients on education, BMI, and parental drinking are all much larger for the natural
children than for the adoptees. This is especially true for the paternal variables, for example the
change in the z-score for college graduation associated with 4 additional years of paternal education
is large (0.105 × 4 = 0.42) for natural children, but falls slightly by a statistically insignificant amount
(-0.010 × 4 = -0.04) for adoptees. This is strongly suggestive of a genetic pathway for the paternal
education effect, not a social pathway. For mothers, the difference in the coefficients is smaller
between the adoptees and natural-born children. The effect of maternal BMI on college graduation is
quite similar for the two groups (-.086 vs. -.108), suggesting this pathway is mainly environmental.
2) Regression (3) uses three variables as instrumental variables for TV exposure. For each
instrument, explain whether, in your judgment, the instrument plausibly is exogenous:
a) (2 points) the Price of TV advertising in the county;
The desired exogeneity condition is E(u|Z,W) = 0, where Z are the instruments, u is the error term in the
BMI equation, and W are the additional regressors (included exogenous regressors) in the obesity
equation. Plausibly there is no feedback from BMI of the individual child to the regional price of
advertising, so there is no simultaneous causality. You also need to think about whether there might be
an omitted variable in the BMI equation that would be correlated with the price of TV advertising. One
possibility is the population density. Counties with more people will have higher TV ad prices (the TV ads
reach more people). If children in urban areas get less outdoor play and exercise than children in rural
areas then the price of TV advertising will be correlated with the omitted variable, outdoor exercise. If so,
the instrument would not be exogenous.
intervals will not contain the true value 95% of the time; in fact, the coverage rate of conventional
confidence intervals (and the size of hypothesis tests) can be very far from the nominal rate of 95% (or
size of 5%).
b) (3 points) Based on the results in Table 2 (TYPO: this should be Table 4), are the
instruments weak, are they strong, or do you need more information before you can
decide? Explain.
The first-stage F-statistic testing the hypothesis that the coefficients on the instruments in the first-stage
regression are all zero is 41.92. This exceeds the rule-of-thumb value of 10, so the instruments can be treated as
strong. Note: it is not enough that the first-stage F-statistic be statistically significant at the 5% or 1%
significance level. The instruments can have statistically significant coefficients but still explain so little of
the variation in the endogenous regressor that the problems listed in response to 3(a) arise, so that the
instruments are weak.
b) (3 points) Using the J-statistic actually reported in column (3), do you reject the null
hypothesis at the 5% significance level? Explain how you reached this conclusion (be
precise).
The J-statistic has a chi-squared distribution with degrees of freedom equal to the number of
overidentifying restrictions. There are 3 instruments and one endogenous regressor so there are 2
overidentifying restrictions. From the chi-squared table, the 5% critical value of the chi-squared
distribution with 2 degrees of freedom is 5.99; from regression (3) in Table 4, the J-statistic is 0.308 <
5.99, so the null hypothesis is not rejected at the 5% significance level.
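The table lookup above can be reproduced without a statistics library, because for 2 degrees of freedom the chi-squared CDF has the closed form 1 - exp(-x/2). The sketch below relies on that special case only; the function names are illustrative, and the J-statistic value 0.308 is the one quoted in the answer.

```python
import math

def chi2_df2_critical_value(alpha):
    # Solve 1 - exp(-x/2) = 1 - alpha for x (valid only for 2 degrees of freedom)
    return -2.0 * math.log(alpha)

def chi2_df2_p_value(stat):
    # Upper-tail probability P(chi2_2 > stat) = exp(-stat/2)
    return math.exp(-stat / 2.0)

n_instruments, n_endogenous = 3, 1
df = n_instruments - n_endogenous     # number of overidentifying restrictions

j_stat = 0.308                        # from regression (3) in Table 4
crit = chi2_df2_critical_value(0.05)
p_value = chi2_df2_p_value(j_stat)
reject = j_stat > crit

print(f"df = {df}, 5% critical value = {crit:.2f}, p-value = {p_value:.3f}, reject = {reject}")
```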
5) (3 points) A researcher suggests using as instruments a full set of county binary variables
(county dummy variables). What would be the effect of adding a full set of county dummy
variables to regression (2)?
The instruments in regression (2) all vary at the county level, for example Temperature is the annual
mean temperature in the county. Because these vary at the county level, they are perfectly explained
by a complete set of county dummies; that is, a regression of Temperature on an intercept and n-1
county indicator variables will have R² = 1.00. Thus all three of the instruments in regression (2) are a
perfect linear combination of the county dummies, and for this reason adding the full set of county
dummies (n-1 indicators, plus the intercept) to regression (2) will result in perfect multicollinearity.
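The multicollinearity argument above can be seen in a toy example. The sketch below uses made-up data (5 counties, 4 children each, and an invented county-level Temperature variable), regresses Temperature on an intercept plus n-1 county indicators, and then checks the rank of the combined regressor matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n_counties, kids_per_county = 5, 4                       # hypothetical sizes
county = np.repeat(np.arange(n_counties), kids_per_county)
temp_by_county = rng.normal(55.0, 10.0, size=n_counties) # invented county means
temperature = temp_by_county[county]                     # varies only by county

# Intercept plus n-1 county indicators (county 0 is the omitted base case).
dummies = (county[:, None] == np.arange(1, n_counties)).astype(float)
D = np.column_stack([np.ones(county.size), dummies])

# Regress Temperature on the county dummies and compute R^2.
coef, *_ = np.linalg.lstsq(D, temperature, rcond=None)
resid = temperature - D @ coef
r2 = 1.0 - (resid @ resid) / np.sum((temperature - temperature.mean()) ** 2)

# Adding Temperature to the dummies adds a column but no new information.
X = np.column_stack([D, temperature])
rank = np.linalg.matrix_rank(X)

print(f"R^2 = {r2:.4f}, columns = {X.shape[1]}, rank = {rank}")
```

The matrix has one more column than its rank, which is the perfect multicollinearity described in the answer.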
6) (5 points) Another researcher suggests replacing the instruments in regression (3) with a new
instrumental variable, ProSports, that equals one if at least one local professional sports team
was in the playoffs during the study period, and equals zero otherwise. For the purposes of
this question, suppose that ProSports is a valid instrument. Describe, in concrete and
everyday terms, a reason why the local average treatment effect obtained using ProSports
would differ from the average treatment effect. In your example, is the local average
treatment effect greater than or less than the average treatment effect?
The local average treatment effect is a weighted average treatment effect, where the most weight is
placed on those most affected by the instrument. In this case, the LATE would be the treatment effect
for those whose TV watching is most swayed by the presence of a successful pro sports team. These
are people who would not normally watch TV, but would do so if they have a playoff game to watch.
Call these people non-TV watching sports fans.
The question is, does the effect of fast-food advertising differ for non-TV watching sports fans,
compared with the rest of the population? If so, then LATE ≠ the average treatment effect in the
population. Here is a possible reason. Suppose these non-TV watching sports fans are normally out
playing sports and leading healthy lives, which means not eating too much fast food. Then seeing ads
on TV will not induce them to eat more fast food. If so, the LATE using ProSports would be less than
the average treatment effect.
7) (5 points) Do you agree or disagree with the following statement? Explain fully. (The
sample average of TV Exposure is approximately 0.5 hours.)
The results in Table 1 (TYPO: this should be Table 4) indicate that a ban on TV fast-food
advertisements would reduce the BMI among children by an amount that is statistically
significant and meaningful in a real-world sense.
Here is a full credit Agree response:
The criticism in question 1 implies that TV Exposure is endogenous so regression (1) is not
meaningful. Therefore, to evaluate the statistical and real-world significance, we need to look at the IV
estimates in regression (3). The coefficient on TV Exposure is statistically significant at the 5% level
in regression (3). Concerning the real-world significance, the change in BMI associated with a ban on
advertising (going from 0.5 to 0 hours/week) is -0.336 × 0.5 = -0.17. One way to see if this is large is to
compare it to the average BMI increase over the past 30 years, which is (from the intro to Part III)
17.37 - 16.63 = 0.74. Taken literally, a ban on fast-food advertising would reverse 0.17/0.74 = 0.23, or
approximately one-fifth of the mean childhood weight gain over the past three decades. Thus, in a
real-world, medical sense, this is a large and statistically significant effect. The question then turns to
whether the instruments are valid, that is, whether regression (3) provides a consistent estimate of the
causal effect of fast-food TV advertising on BMI.
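The back-of-the-envelope calculation in the paragraph above can be laid out step by step. The sketch uses only numbers quoted in the response (coefficient magnitude 0.336, mean exposure 0.5 hours/week, mean BMI of 16.63 and 17.37); the variable names are illustrative, and the share is computed from the unrounded product.

```python
coef_tv_exposure = 0.336          # magnitude of the IV coefficient in regression (3)
mean_exposure = 0.5               # sample average of TV Exposure (hours/week)

# A ban moves exposure from 0.5 to 0, so the predicted change in BMI is negative.
bmi_change = -coef_tv_exposure * mean_exposure

# Increase in mean childhood BMI over the past 30 years, as quoted in the response.
bmi_rise = 17.37 - 16.63

share_reversed = abs(bmi_change) / bmi_rise

print(f"change in BMI = {bmi_change:.3f}, rise = {bmi_rise:.2f}, share = {share_reversed:.3f}")
```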
Based on the evidence presented, the proposed set of three instruments is valid: specifically, the instruments
are not weak (first-stage F = 41.92 > 10) and they appear to be exogenous. This latter point is
supported by the failure of the J-statistic to reject the null hypothesis that the instruments are
exogenous, that is, to reject the overidentifying restrictions. Although some doubts were raised about
the validity of the instruments in the response to question 2, these seem not to be justified, or at least
empirically important, based on this J-statistic. Because the instruments are valid, inference based on
regression (3) is internally valid and the conclusion of the previous paragraph stands.
Here is a full credit Disagree response:
The criticism in question 1 implies that TV Exposure is endogenous so regression (1) is not
meaningful. Therefore, to evaluate the statistical and real-world significance, we need to look at the IV
estimates in regression (3). The coefficient on TV Exposure is statistically significant at the 5% level
in regression (3). Concerning the real-world significance, the change in BMI associated with a ban on
advertising (going from 0.5 to 0 hours/week) is -0.336 × 0.5 = -0.17. One way to see if this is large is to
compare it to the average BMI increase over the past 30 years, which is (from the intro to Part III)
17.37 - 16.63 = 0.74. Taken literally, a ban on fast-food advertising would reverse 0.17/0.74 = 0.23, or
approximately one-fifth of the mean childhood weight gain over the past three decades. Thus, in a
real-world, medical sense, this is a large and statistically significant effect. The question then turns to
whether the instruments are valid, that is, whether regression (3) provides a consistent estimate of the
causal effect of fast-food TV advertising on BMI.
Based on the evidence presented, the proposed set of three instruments are not weak (first-stage
F = 41.92 > 10); however, there is good reason to believe they are not valid. It is true that the J-statistic
fails to reject the null hypothesis that the overidentifying restrictions are valid. However, it should be
borne in mind that the J-statistic cannot test whether all the instruments are exogenous; it can only
test the exogeneity of k-1 instruments, assuming that at least one instrument is exogenous. The
response to question 2 raises doubts about all of these instruments: they all could be correlated with
unobserved county variation that is a determinant of childhood obesity, in particular urban/rural
differentials and access to outdoor exercise opportunities. It is possible for the J-statistic not to reject
but still for all the instruments to be endogenous, and the a-priori reasoning in the response to
question 2 suggests that this is what is going on in the TSLS regressions. This means that the
estimate in regression (2) is not a valid estimate of the causal effect of fast-food TV advertising on
BMI, so causal inference about the proposed policy change is not justified by the results in this table.
c) (3 points) Compute the standard error for the cumulative dynamic effect in (b). If you do
not have enough information to do so, explain how you would compute this standard
error and what additional information you would need.
There is not enough information. There are several ways to calculate the standard errors for this 2-year
cumulative effect. One way to do so would be to use the full covariance matrix of the regression
coefficients. Let β1, β2, and β3
3) (3 points) A critic of this analysis asserts that the relationship in regression (3) might be
unstable and suggests computing the QLR statistic (with trimming of 15% on each end of the
sample, as is conventional). Is this a good recommendation for the purpose of assessing the
stability of regression (3)? Explain why or why not.
This is not a good recommendation. FreeAgents_t = 0 before the introduction of free agency; see Fig. 1.
So splitting the sample in, for example, 1965 will yield all values of FreeAgents_t = 0 in the pre-1965 sample,
which will be perfectly multicollinear with the intercept term. Thus the QLR cannot be computed with
trimming of 15% on each end of the sample.
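The problem with an early break date can be seen from the rank of the regressor matrix on the pre-break subsample. The sketch below uses an entirely hypothetical series: the sample period, the start year of free agency (1976 here), and the values of the variable afterward are assumptions made for illustration, not data from Fig. 1.

```python
import numpy as np

years = np.arange(1950, 2006)      # hypothetical sample period
start_year = 1976                  # assumed first year with free agents

# Hypothetical FreeAgents series: zero before start_year, positive afterward.
free_agents = np.where(years >= start_year, years - start_year + 1, 0).astype(float)

def design_rank(mask):
    # Rank of [intercept, FreeAgents] on the subsample selected by mask
    X = np.column_stack([np.ones(mask.sum()), free_agents[mask]])
    return np.linalg.matrix_rank(X)

pre_1965 = years < 1965
rank_pre = design_rank(pre_1965)                  # FreeAgents is all zeros here
rank_full = design_rank(np.ones_like(pre_1965))   # full sample

print(f"rank before 1965 = {rank_pre}, rank in full sample = {rank_full}")
```

With only one linearly independent column in the early subsample, the coefficient on FreeAgents cannot be estimated there, so the break-date regression needed for the QLR statistic is not computable at that date.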
4) (5 points) Baseball owners assert that free agency reduces competitiveness across baseball
teams because rich teams can outbid poor teams, increasing talent disparities across teams.
Based on the results in Table 5, do you agree, disagree, or can you not reach a conclusion?
Explain.
Taken at face value, the regressions in Table 5 all suggest the opposite conclusion, that free agency has
been associated with a decrease in the standard deviation of winning percentages, that is, with an
increase in the competitive balance.
The question is whether to take this result at face value. It is hard to say without more information about
baseball than is provided in the introduction to this part. But it is evident from Fig. 1 that FreeAgents_t is
essentially picking up a trend towards greater competition. The question then is whether there are other
determinants of that trend, which would be correlated with FreeAgents_t, that is, whether there is omitted
variable bias. Plausibly, there might be. For example, if the pool of players is better now than it was, then
the difference in raw talent across teams might be less (hypothetically, suppose in the 1950s there were a
few stars then many average players, but now all the players are much closer in quality; then differences
in quality across rosters would be less).
Note that, since the error term is serially correlated owing to these omitted variables (we can also
deduce this from the substantial differences between the heteroskedasticity-robust and Newey-West
standard errors), it is not enough to say that FreeAgents_{t-1} is exogenous because it appears with a lag.
That lagged value of FreeAgents could be correlated with persistent elements of the error term, such as
improvements in the depth of talent.