
School of Economics

Introductory Econometrics
ECON2206/ECON3290

Tutorial Program
Sample Answers

Session 1, 2011

Table of Contents
Week 2 Tutorial Exercises ............................................................................................................................................................................................... 3
Problem Set ...................................................................................................................................................................................................................... 3
STATA Hints ..................................................................................................................................................................................................................... 4
Week 3 Tutorial Exercises ............................................................................................................................................................................................... 5
Review Questions (these may or may not be discussed in tutorial classes) ..................................................................................... 5
Problem Set ...................................................................................................................................................................................................................... 5
Week 4 Tutorial Exercises ............................................................................................................................................................................................... 7
Review Questions (these may or may not be discussed in tutorial classes) ..................................................................................... 7
Problem Set (these will be discussed in tutorial classes) .......................................................................................................................... 8
Week 5 Tutorial Exercises ............................................................................................................................................................................................... 9
Review Questions (these may or may not be discussed in tutorial classes) ..................................................................................... 9
Problem Set (these will be discussed in tutorial classes) ....................................................................................................................... 10
Week 6 Tutorial Exercises ............................................................................................................................................................................................ 11
Review Questions (these may or may not be discussed in tutorial classes) .................................................................................. 11
Problem Set (these will be discussed in tutorial classes) ....................................................................................................................... 11
Week 7 Tutorial Exercises ............................................................................................................................................................................................ 15
Review Questions (these may or may not be discussed in tutorial classes) .................................................................................. 15
Problem Set (these will be discussed in tutorial classes) ....................................................................................................................... 15
Week 8 Tutorial Exercises ............................................................................................................................................................................................ 17
Review Questions (these may or may not be discussed in tutorial classes) .................................................................................. 17
Problem Set (these will be discussed in tutorial classes) ....................................................................................................................... 17
Week 9 Tutorial Exercises ............................................................................................................................................................................................ 20
Review Questions (these may or may not be discussed in tutorial classes) .................................................................................. 20
Problem Set (these will be discussed in tutorial classes) ....................................................................................................................... 20
Week 10 Tutorial Exercises ......................................................................................................................................................................................... 23
Readings .......................................................................................................................................................................................................................... 23
Review Questions (these may or may not be discussed in tutorial classes) .................................................................................. 23
Problem Set (these will be discussed in tutorial classes) ....................................................................................................................... 24
Week 11 Tutorial Exercises ......................................................................................................................................................................................... 26
Review Questions (these may or may not be discussed in tutorial classes) .................................................................................. 26
Problem Set ................................................................................................................................................................................................................... 27
Week 12 Tutorial Exercises ......................................................................................................................................................................................... 29
Review Questions (these may or may not be discussed in tutorial classes) .................................................................................. 29
Problem Set (these will be discussed in tutorial classes) ....................................................................................................................... 30
Week 13 Tutorial Exercises ......................................................................................................................................................................................... 32
Review Questions (these may or may not be discussed in tutorial classes) .................................................................................. 32
Problem Set (these will be discussed in tutorial classes) ....................................................................................................................... 33

Week 2 Tutorial Exercises

Problem Set
Q1. Wooldridge 1.1
• (i) Ideally, we could randomly assign students to classes of different sizes. That is, each
student is assigned a different class size without regard to any student characteristics such
as ability and family background. For reasons we will see in Chapter 2, we would like
substantial variation in class sizes (subject, of course, to ethical considerations and resource
constraints).
• (ii) A negative correlation means that larger class size is associated with lower performance.
We might find a negative correlation because larger class size actually hurts performance.
However, with observational data, there are other reasons we might find a negative
relationship. For example, children from more affluent families might be more likely to
attend schools with smaller class sizes, and affluent children generally score better on
standardized tests. Another possibility is that, within a school, a principal might assign the
better students to smaller classes. Or, some parents might insist their children are in the
smaller classes, and these same parents tend to be more involved in their children’s
education.
• (iii) Given the potential for confounding factors – some of which are listed in (ii) – finding a
negative correlation would not be strong evidence that smaller class sizes actually lead to
better performance. Some way of controlling for the confounding factors is needed, and this
is the subject of multiple regression analysis.

Q2. Wooldridge 1.2


• (i) Here is one way to pose the question: If two firms, say A and B, are identical in all
respects except that firm A supplies job training one hour per worker more than firm B, by
how much would firm A’s output differ from firm B’s?
• (ii) Firms are likely to choose job training depending on the characteristics of workers.
Some observed characteristics are years of schooling, years in the workforce, and
experience in a particular job. Firms might even discriminate based on age, gender, or race.
Perhaps firms choose to offer training to more or less able workers, where “ability” might be
difficult to quantify but where a manager has some idea about the relative abilities of
different employees. Moreover, different kinds of workers might be attracted to firms that
offer more job training on average, and this might not be evident to employers.
• (iii) The amount of capital and technology available to workers would also affect output. So,
two firms with exactly the same kinds of employees would generally have different outputs
if they use different amounts of capital or technology. The quality of managers would also
have an effect.
• (iv) No, unless the amount of training is randomly assigned. The many factors listed in parts
(ii) and (iii) can contribute to finding a positive correlation between output and training
even if job training does not improve worker productivity.

Q3. Wooldridge C1.3 (meap01_ch01.do)


• (i) The largest is 100, the smallest is 0.
• (ii) 38 out of 1,823, or about 2.1 percent of the sample.
• (iii) 17
• (iv) The average of math4 is about 71.9 and the average of read4 is about 60.1. So, at least in
2001, the reading test was harder to pass.
• (v) The sample correlation between math4 and read4 is about 0.843, which is a very high
degree of (linear) association. Not surprisingly, schools that have high pass rates on one test
have a strong tendency to have high pass rates on the other test.

• (vi) The average of exppp is about $5,194.87. The standard deviation is $1,091.89, which
shows rather wide variation in spending per pupil. [The minimum is $1,206.88 and the
maximum is $11,957.64.]
• (vii) 8.7%. Note that, by Taylor expansion, log(x + h) – log(x) = log(1 + h/x) ≈ h/x, for
positive x and small h/x (in absolute value).
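
The quality of this approximation is easy to check numerically. A minimal Python sketch (the numbers are illustrative, not taken from the meap01 data; the course's own do-files are in STATA):

```python
import math

# Check the approximation log(x + h) - log(x) = log(1 + h/x) ~ h/x
# for positive x and small h/x (illustrative numbers).
x, h = 100.0, 8.7
exact = math.log(x + h) - math.log(x)   # log(1.087), about .0834
approx = h / x                          # .087
```

The two values agree to within about half a percentage point here, and the gap shrinks as h/x gets smaller.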

Q4. Wooldridge C1.4 (jtrain2_ch01.do)


• (i) 185/445 ≈ .416 is the fraction of men receiving job training, or about 41.6%.
• (ii) For men receiving job training, the average of re78 is about 6.35, or $6,350. For men not
receiving job training, the average of re78 is about 4.55, or $4,550. The difference is $1,800,
which is very large. On average, the men receiving the job training had earnings about 40%
higher than those not receiving training.
• (iii) About 24.3% of the men who received training were unemployed in 1978; the figure is
35.4% for men not receiving training. This, too, is a big difference.
• (iv) The differences in earnings and unemployment rates suggest the training program had
strong, positive effects. Our conclusions about economic significance would be stronger if
we could also establish statistical significance (which is done in Computer Exercise C9.10 in
Chapter 9).

STATA Hints
• Some of the STATA do-files for solving the Problem Set are posted on the course website. It is
important to understand the commands in the do-files so that you will be able to write your
own do-files for your assignments and the course project.

Week 3 Tutorial Exercises
Review Questions (these may or may not be discussed in tutorial classes)
• The minimum requirement for OLS to be carried out for the data set {(xi, yi), i=1,…,n} with
the sample size n > 2 is that the sample variance of x is positive. In what circumstances is the
sample variance of x zero?
When all observations on x have the same value, there is no variation in x.
• The OLS estimation of the simple regression model has the following properties:
a) the sum of the residuals is zero;
b) the sample covariance of the residuals and x is zero.
Why? How would you relate them to the “least squares” principle?
These are derived from the first-order conditions for minimising the sum of squared
residuals (SSR).
• Convince yourself that the point (x̄, ȳ), the sample means of x and y, is on the sample
regression function (SRF), which is a straight line.
This can be derived from the first of the first-order conditions for minimising SSR.
• How do you know that SST = SSE + SSR is true?
The dependent variable values can be expressed as the fitted value plus the residual. After
subtracting the sample mean from both sides of the equation, we find (yi − ȳ) = (ŷi − ȳ) + ûi.
The required "SST = SSE + SSR" is obtained by squaring and summing both sides of the
above equation, noting (ŷi − ȳ) = β̂1(xi − x̄), and using the second of the first-order
conditions of minimising SSR.
• Which of the following models is (are) nonlinear model(s)?
a) sales = β0 /[1 + exp(-β1 ad_expenditure)] + u;
b) sales = β0 + β1 log(ad_expenditure) + u;
c) sales = β0 + β1 exp(ad_expenditure) + u;
d) sales = exp(β0 + β1 ad_expenditure + u).
Only the model in a) is nonlinear (nonlinear in the parameters); b) and c) are linear in the
parameters, and d) becomes linear after taking logs of both sides.
• Can you follow the proofs of Theorems 2.1-2.3?
You should be able to follow the proofs. If not at this point, try again.

Problem Set
Q1. Wooldridge 2.4
• (i) When cigs = 0, predicted birth weight is 119.77 ounces. When cigs = 20, the predicted
birth weight is 109.49 ounces. This is about an 8.6% drop.
• (ii) Not necessarily. There are many other factors that can affect birth weight,
particularly overall health of the mother and quality of prenatal care. These could be
correlated with cigarette smoking during pregnancy. Also, something such as caffeine
consumption can affect birth weight, and might also be correlated with cigarette
smoking.
• (iii) If we want a predicted bwght of 125, then cigs = (125 − 119.77)/(−.514) ≈ −10.18, or
about −10 cigarettes! This is nonsense, of course, and it shows what happens when we
are trying to predict something as complicated as birth weight with only a single
explanatory variable. The largest predicted birth weight is necessarily 119.77. Yet
almost 700 of the births in the sample had a birth weight higher than 119.77.
• (iv) 1,176 out of 1,388 women did not smoke while pregnant, or about 84.7%. Because
we are using only cigs to explain birth weight, we have only one predicted birth weight
at cigs = 0. The predicted birth weight is necessarily roughly in the middle of the
observed birth weights at cigs = 0, and so we will under-predict high birth weights.

Q2. Wooldridge 2.7


• (i) When we condition on inc in computing an expectation, inc becomes a constant. So
E(u|inc) = E(√inc·e|inc) = √inc·E(e|inc) = √inc·0 = 0, because E(e|inc) = E(e) = 0.
• (ii) Again, when we condition on inc in computing a variance, inc becomes a constant. So
Var(u|inc) = Var(√inc·e|inc) = (√inc)²Var(e|inc) = σe²·inc, because Var(e|inc) = σe².
• (iii) Families with low incomes do not have much discretion about spending; typically, a
low-income family must spend on food, clothing, housing, and other necessities. Higher
income people have more discretion, and some might choose more consumption while
others more saving. This discretion suggests wider variability in saving among higher
income families.

Q3. Wooldridge C2.6 (meap93_ch02.do)


• (i) It seems plausible that another dollar of spending has a larger effect for low-spending
schools than for high-spending schools. At low-spending schools, more money can go
toward purchasing more books, computers, and for hiring better qualified teachers. At high
levels of spending, we would expect little, if any, effect because the high-spending schools
already have high-quality teachers, nice facilities, plenty of books, and so on.
• (ii) If we take changes, as usual, we obtain
Δmath10^ = β1 Δlog(expend) ≈ (β1/100)·(%Δexpend),
just as in the second row of Table 2.3. So, if %Δexpend = 10, Δmath10^ = β1/10.
• (iii) The regression results are
math10^ = −69.34 + 11.16 log(expend), n = 408, R² = .0297
• (iv) If expend increases by 10 percent, math10^ increases by about 1.1 percentage points.
This is not a huge effect, but it is not trivial for low-spending schools, where a 10 percent
increase in spending might be a fairly small dollar amount.
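
The 1.1-point figure can be checked directly from the estimated coefficient. A small Python sketch (illustrative only; the regression itself was run in STATA):

```python
import math

# With beta1 = 11.16 on log(expend), a 10 percent increase in spending
# raises predicted math10 by roughly beta1/10 points.
b1 = 11.16
approx = (b1 / 100) * 10            # approximation: about 1.12 points
exact = b1 * math.log(1.10)         # exact log change: about 1.06 points
```

Both calculations round to "about 1.1 percentage points", which is the figure reported in part (iv).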
• (v) In this data set, the largest value of math10 is 66.7, which is not especially close to 100.
In fact, the largest fitted value is only about 30.2.

Q4. Wooldridge C2.7 (charity_ch02.do)


• (i) The average gift is about 7.44 Dutch guilders. Out of 4,268 respondents, 2,561 did not
give a gift, or about 60 percent.
• (ii) The average number of mailings per year is about 2.05. The minimum value is .25 and the
maximum value is 3.5.
• (iii) The estimated equation is
gift^ = 2.01 + 2.65 mailsyear, n = 4,268, R² = .0138
• (iv) The slope coefficient from part (iii) means that each mailing per year is associated with
– perhaps even “causes” – an estimated 2.65 additional guilders, on average. Therefore, if
each mailing costs one guilder, the expected profit from each mailing is estimated to be 1.65
guilders. This is only the average, however. Some mailings generate no contributions, or a
contribution less than the mailing cost; other mailings generate much more than the
mailing cost.
• (v) Because the smallest mailsyear in the sample is .25, the smallest predicted value of gift
is 2.01 + 2.65(.25) ≈ 2.67. Even if we look at the overall population, where some people have
received no mailings, the smallest predicted value is about two. So, with this estimated
equation, we never predict zero charitable gifts.
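
The point about never predicting zero can be seen directly from the fitted line. A tiny Python sketch of the prediction (illustrative; the estimation was done in STATA):

```python
# Fitted equation from part (iii): gift^ = 2.01 + 2.65*mailsyear.
def gift_hat(mailsyear):
    return 2.01 + 2.65 * mailsyear

smallest = gift_hat(0.25)   # smallest mailsyear in the sample: about 2.67
at_zero = gift_hat(0.0)     # even with no mailings, the prediction is 2.01, not 0
```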

Week 4 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)


• What do we mean when we say “regress wage on educ and expr”?
Use OLS to estimate the multiple linear regression model: wage = β0 + β1educ + β2expr + u.
• Why and under what circumstance do we need to “control for” expr in the regression model
in order to quantify the effect of educ on wage?
We need to control for expr in the model when the partial or ceteris paribus effect of educ on
wage is the parameter of interest and expr and educ are correlated.
• What is the bias of an estimator?
bias = E(β̂j) − βj.
• What is the “omitted variable bias”?
It is the bias that results from omitting relevant variables that are correlated with the
included explanatory variables in a regression model.
• What is the consequence of adding an irrelevant variable to a regression model?
Including an irrelevant variable will generally increase the variance of the OLS estimators.
• What is the requirement of the ZCM assumption, in your own words?
You must do this by yourself.
• Why is assuming E(u) = 0 not restrictive when an intercept is included in the regression
model?
When the mean value of the disturbance is not zero, you can always define the new
disturbance as the original disturbance minus the mean value, and define the new intercept
as the original intercept plus the mean value.
• In terms of notation, why do we need two subscripts for independent variables?
In our notation, the first subscript is for indexing observations (1st obs, 2nd obs, etc.) and
the second subscript is for indexing variables (1st variable, 2nd variable, etc.).
• Using OLS to estimate a multiple regression model with k independent variables and an
intercept, how many first-order conditions are there?
k + 1.
• How do you know that the OLS estimators are linear combinations of the observations on
the dependent variable?
By the first-order conditions, we see that the OLS estimators are the solution to k+1 linear
equations with k+1 unknowns. The “right-hand-sides” of the equations are linear functions
of observations on the dependent variable (say y). Hence the solution is linear combinations
of the y-observations.
• What is the Gauss-Markov theorem? Read the textbook.
• Try to explain why R2 never decreases (it is likely to increase) when additional explanatory
variables are added to the regression model.
First, the smaller the SSR, the greater the R-squared. OLS chooses parameter
estimates to minimise the SSR. With extra explanatory variables, the R-squared does not
decrease because the SSR does not increase: OLS now has a larger
parameter space over which to minimise the SSR (more possibilities to make the SSR small).
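
This property can be demonstrated numerically. A minimal Python sketch that fits OLS via the normal equations on toy data (purely illustrative; in this course you would use STATA's regress command):

```python
# Show that R-squared cannot fall when a regressor is added:
# fit y on x1, then y on x1 and x2 (toy data, pure stdlib OLS).

def ols_r2(y, X):
    """R-squared from OLS of y on an intercept plus the columns in X."""
    n = len(y)
    cols = [[1.0] * n] + X
    k = len(cols)
    # Normal equations A b = c.
    A = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)]
         for i in range(k)]
    c = [sum(cols[i][t] * y[t] for t in range(n)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        c[i], c[p] = c[p], c[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            for j in range(i, k):
                A[r][j] -= f * A[i][j]
            c[r] -= f * c[i]
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    yhat = [sum(b[j] * cols[j][t] for j in range(k)) for t in range(n)]
    ybar = sum(y) / n
    ssr = sum((y[t] - yhat[t]) ** 2 for t in range(n))
    sst = sum((yv - ybar) ** 2 for yv in y)
    return 1.0 - ssr / sst

y  = [3.0, 5.0, 4.0, 8.0, 7.0, 9.0]
x1 = [1.0, 2.0, 2.0, 4.0, 3.0, 5.0]
x2 = [0.0, 1.0, 1.0, 0.0, 1.0, 1.0]
r2_small = ols_r2(y, [x1])
r2_big = ols_r2(y, [x1, x2])
# r2_big >= r2_small always: the larger model minimises SSR over a
# parameter space that contains the smaller model.
```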
• What is an endogenous explanatory variable? Exogenous explanatory variable?
See the top of page 88.
• What is multicollinearity and its likely effect on the OLS estimators? Read pages 95-99.

Problem Set (these will be discussed in tutorial classes)
Q1. Wooldridge 3.2
• (i) Yes. Because of budget constraints, it makes sense that, the more siblings there are in a
family, the less education any one child in the family has. To find the increase in the number
of siblings that reduces predicted education by one year, we solve 1 = .094(Δsibs), so Δsibs =
1/.094 ≈ 10.6.
• (ii) Holding sibs and feduc fixed, one more year of mother’s education implies .131 years
more of predicted education. So if a mother has four more years of education, her son is
predicted to have about a half a year (.524) more years of education.
• (iii) Since the number of siblings is the same, but meduc and feduc are both different, the
coefficients on meduc and feduc both need to be accounted for. The predicted difference in
education between B and A is .131(4) + .210(4) = 1.364.

Q2. Wooldridge 3.10


• (i) Because x1 is highly correlated with x2 and x3, and these latter variables have large partial
effects on y, the simple and multiple regression coefficients on x1 can differ by large amounts.
We have not done this case explicitly, but given equation (3.46) and the discussion with a
single omitted variable, the intuition is pretty straightforward.
• (ii) Here we would expect β̃1 and β̂1 to be similar (subject, of course, to what we mean by
"almost uncorrelated"). The amount of correlation between x2 and x3 does not directly affect
the multiple regression estimate on x1 if x1 is essentially uncorrelated with x2 and x3.
• (iii) In this case we are (unnecessarily) introducing multicollinearity into the regression: x2
and x3 have small partial effects on y and yet x2 and x3 are highly correlated with x1. Adding x2
and x3 likely increases the standard error of the coefficient on x1 substantially, so se(β̂1) is
likely to be much larger than se(β̃1).
• (iv) In this case, adding x2 and x3 will decrease the residual variance without causing much
collinearity (because x1 is almost uncorrelated with x2 and x3), so we should see se(β̂1)
smaller than se(β̃1). The amount of correlation between x2 and x3 does not directly affect se(β̂1).

Q3. Wooldridge C3.2 (hprice1_ch03.do)


• (i) The estimated equation is
price^ = −19.32 + .128 sqrft + 15.20 bdrms, n = 88, R² = .632
• (ii) Holding square footage constant, Δprice = 15.20·Δbdrms. Hence, for one more bedroom,
the predicted price increases by 15.20, which means $15,200 (price is in thousands of dollars).
• (iii) Now Δprice = .128·Δsqrft + 15.20·Δbdrms = .128(140) + 15.20(1) = 33.12, that is,
$33,120. Because the house size is also increasing, the effect is larger than that in (ii).
• (iv) About 63.2%
• (v) The predicted price is -19.32+.128(2438)+15.20(4) = 353.544, or $353,544.
• (vi) The residual is 300,000 − 353,544 = −53,544. The buyer under-paid, judged by the
predicted price. But, of course, there are many other features of a house (some that we
cannot even measure) that affect price. We have not controlled for those.
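
Parts (v) and (vi) are a straightforward calculation from the fitted equation in part (i). A small Python check (illustrative; the estimation was done in STATA, and price is in $1000s):

```python
# Fitted equation from part (i): price^ = -19.32 + .128 sqrft + 15.20 bdrms.
def price_hat(sqrft, bdrms):
    return -19.32 + 0.128 * sqrft + 15.20 * bdrms

pred = price_hat(2438, 4)   # about 353.544, i.e. $353,544
residual = 300.0 - pred     # selling price $300,000 gives residual about -53.544
```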

Q4. Wooldridge C3.6 (wage2_ch03_c3_6.do)


• (i) The slope coefficient from regressing IQ on educ is δ̃1 = 3.53383.
• (ii) The slope coefficient from regressing log(wage) on educ is β̃1 = .05984.
• (iii) The slope coefficients from regressing log(wage) on educ and IQ are β̂1 = .03912 and
β̂2 = .00586, respectively.
• (iv) The computed β̂1 + δ̃1·β̂2 = .03912 + 3.53383(.00586) ≈ .05984 = β̃1, as expected.
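
The identity in part (iv) is algebraic, so it holds exactly in any sample. A Python sketch that verifies it on simulated data (not the wage2 sample; pure stdlib, using the Frisch-Waugh partialling-out result for the multiple-regression slopes):

```python
import random

# Check the identity beta1_tilde = beta1_hat + delta1_tilde * beta2_hat, where
# "tilde" is from the short regression of y on x1, "hat" from y on x1 and x2,
# and delta1_tilde is the slope from regressing x2 on x1 (simulated data).
random.seed(1)
n = 200
x1 = [random.gauss(12, 2) for _ in range(n)]
x2 = [0.5 * a + random.gauss(0, 1) for a in x1]          # x2 correlated with x1
y = [1.0 + 0.04 * a + 0.006 * b + random.gauss(0, 0.1)
     for a, b in zip(x1, x2)]

def slope(y, x):
    xb, yb = sum(x) / len(x), sum(y) / len(y)
    return (sum((a - xb) * (b - yb) for a, b in zip(x, y))
            / sum((a - xb) ** 2 for a in x))

def resid(y, x):
    # Residuals from the simple regression of y on x (with intercept).
    b = slope(y, x)
    a0 = sum(y) / len(y) - b * sum(x) / len(x)
    return [yi - (a0 + b * xi) for xi, yi in zip(x, y)]

beta1_tilde = slope(y, x1)
delta1_tilde = slope(x2, x1)
# Frisch-Waugh: each multiple-regression slope is the slope of y on the
# partialled-out regressor.
r1, r2 = resid(x1, x2), resid(x2, x1)
beta1_hat = sum(a * b for a, b in zip(r1, y)) / sum(a * a for a in r1)
beta2_hat = sum(a * b for a, b in zip(r2, y)) / sum(a * a for a in r2)
gap = beta1_tilde - (beta1_hat + delta1_tilde * beta2_hat)   # zero up to rounding
```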

Week 5 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)


• What are the CLM assumptions?
These are MLR1, MLR2, MLR3 and MLR6. MLR6 is a very strong assumption that implies
both MLR4 and MLR5.
• What is the sampling distribution of the OLS estimators under the CLM assumptions?
The OLS estimators under the CLM assumptions follow the normal distribution.
• What are the standard errors of the OLS estimators?
The standard error is the square root of the estimated variance of the estimator.
• What is the null hypothesis about a parameter?
In this context, the null hypothesis is a statement that assigns a known value (or
hypothesised value) to the parameter of interest. Usually, the null is a maintained
proposition (from economic theories or experience) that will only be rejected on very
strong evidence.
• What is a one-tailed (two-tailed) alternative hypothesis?
This should be clear once you finish your reading.
• In testing hypotheses, what is a Type 1 (Type 2) error? What is the level of significance?
Type 1 error is the rejection of a true null. Type 2 error is the non-rejection of a false null.
The level of significance is the probability of a Type 1 error that the test allows.
• The decision rule we use can be stated as “reject the null if the t-statistic exceeds the critical
value”. How is the critical value determined?
The critical value is determined by the level of significance (or level of confidence in the case
of confidence intervals) and the distribution of the test statistic. For example, for t-stat, we
use the t-distribution when the sample size is small, and the normal distribution when the
sample size is large (df > 120).
• Justify the statement “Given the observed test statistic, the p-value is the smallest significant
level at which the null hypothesis would be rejected.”
If you used the observed t-stat as your critical value, you would just reject the null (because
the t-stat is equal to this value). This “critical value” (= the t-stat value) implies a level of
significance, say p, which is exactly the p-value. For any significance level smaller than p, the
corresponding critical value is more extreme than the t-stat value and does not lead to the
rejection of the null. Hence, p is the smallest significance level at which the null would be
rejected.
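For large samples the p-value can be computed from the standard normal distribution. A small Python sketch (a large-sample approximation, not the exact t distribution):

```python
import math

# Two-sided p-value of a t statistic under the standard normal
# (large-sample) distribution: p = 2*(1 - Phi(|t|)) = erfc(|t|/sqrt(2)).
def two_sided_p(tstat):
    return math.erfc(abs(tstat) / math.sqrt(2.0))

p = two_sided_p(1.96)   # about .05: reject for any level at or above p, not below
```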
• What is the 90% confidence interval for a parameter?
It is an interval [L, U] defined by two sample statistics: the lower bound L and the upper
bound U. The probability that [L, U] covers the parameter is 0.9.
• In constructing a confidence interval for a parameter, what is the level of confidence?
It is the probability that the CI covers the parameter.
• When the level of confidence increases, how would the width of the confidence interval
change (holding other things fixed)? The width will increase.
• Try to convince yourself that the event “the 90% confidence interval covers a hypothesised
value of the parameter” is the same as the event “the null of the parameter being the
hypothesised value cannot be rejected in favour of the two-tailed alternative at the 10%
level of significance.”
It becomes apparent if you start with P(|t-stat| ≤ c) = 0.9 and disentangle the expression for
the t-stat.
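
The equivalence can also be seen numerically. A Python sketch with a hypothetical estimate and standard error (c = 1.645 is the large-sample two-tailed critical value at the 10% level):

```python
# Duality between a 90% CI and a 10%-level two-tailed t test.
c = 1.645                     # large-sample two-tailed critical value, 10% level
beta_hat, se = 0.52, 0.10     # hypothetical estimate and standard error
b0 = 0.40                     # hypothesised parameter value

lower, upper = beta_hat - c * se, beta_hat + c * se
t = (beta_hat - b0) / se
covers = lower <= b0 <= upper          # CI covers b0?
cannot_reject = abs(t) <= c            # two-tailed test fails to reject?
# The two booleans coincide for any choice of b0.
```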

Problem Set (these will be discussed in tutorial classes)
Q1. Wooldridge 4.1
• (i) and (iii) generally cause the t statistics not to have a t distribution under H0.
Homoskedasticity is one of the CLM assumptions. An important omitted variable violates
Assumption MLR.4 (hence MLR.6) in general. The CLM assumptions contain no mention of
the sample correlations among independent variables, except to rule out the case where the
correlation is one (MLR.3).

Q2. Wooldridge 4.2


• (i) H0:β3 = 0. H1:β3 > 0.
• (ii) The proportionate effect on salary is .00024(50) = .012. To obtain the percentage effect, we
multiply this by 100: 1.2%. Therefore, a 50 point ceteris paribus increase in ros is predicted
to increase salary by only 1.2%. Practically speaking, this is a very small effect for such a
large change in ros.
• (iii) The 10% critical value for a one-tailed test, using df = ∞, is obtained from Table G.2 as
1.282. The t statistic on ros is .00024/.00054 ≈ .44, which is well below the critical value.
Therefore, we fail to reject H0 at the 10% significance level.
• (iv) Based on this sample, the estimated ros coefficient appears to be different from zero
only because of sampling variation. On the other hand, including ros may not be causing any
harm; it depends on how correlated it is with the other independent variables (although
these are very significant even with ros in the equation).

Q3. Wooldridge 4.5


• (i) .412 ± 1.96(.094), or about [.228, .596].
• (ii) No, because the value .4 is well inside the 95% CI.
• (iii) Yes, because 1 is well outside the 95% CI.

Q4. Wooldridge C4.8 (401ksubs_ch04.do)


• (i) There are 2,017 single people in the sample of 9,275.
• (ii) The estimated equation is
� = −43.04 + .799 inc + .843 age
𝑛𝑒𝑡𝑡𝑓𝑎
( 4.08) (.060) (.092)
n = 2,017, R2 = .119.
The coefficient on inc indicates that one more dollar in income (holding age fixed) is
reflected in about 80 more cents in predicted nettfa; no surprise there. The coefficient on
age means that, holding income fixed, if a person gets another year older, his/her nettfa is
predicted to increase by about $843. (Remember, nettfa is in thousands of dollars.) Again,
this is not surprising.
• (iii) The intercept is not very interesting as it gives the predicted nettfa for inc = 0 and age =
0. Clearly, there is no one with even close to these values in the relevant population.
• (iv) The t statistic is (.843 − 1)/.092 ≈ −1.71. Against the one-sided alternative H1: β2 < 1, the
p-value is about .044. Therefore, we can reject H0: β2 = 1 at the 5% significance level (against
the one-sided alternative).
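
The calculation in part (iv) is easy to reproduce. A Python sketch using the large-sample normal approximation for the p-value (the test itself was run on STATA output):

```python
import math

# One-sided test of H0: beta2 = 1 against H1: beta2 < 1, as in part (iv).
beta2_hat, se = 0.843, 0.092
t = (beta2_hat - 1) / se                               # about -1.71
# Large-sample one-sided p-value P(Z < t) via the standard normal CDF:
p_one_sided = 0.5 * math.erfc(-t / math.sqrt(2.0))     # about .044
```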
• (v) The slope coefficient on inc in the simple regression is about .821, which is not very
different from the .799 obtained in part (ii). As it turns out, the correlation between inc and
age in the sample of single people is only about .039, which helps explain why the simple
and multiple regression estimates are not very different; see also the text.

Week 6 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)


• How would you test hypotheses about a single linear combination of parameters?
We may re-parameterise the regression model to isolate the “single linear combination”.
Then the OLS on the re-parameterised model will provide the estimate of the “single linear
combination” and the related standard error. See the example in the textbook and lecture
slides.
• What are exclusion restrictions for a regression model?
Exclusion restrictions are the null hypothesis that a group of x-variables have zero
coefficients in the regression model.
• What are restricted and unrestricted models?
When the null hypothesis is a set of restrictions on the parameters, the regression under the
null is called the restricted model, while the regression under the alternative (which simply
states that the null is false) is called the unrestricted model.
• How do you compute the F-statistic, given that you have SSRs?
Clearly, the general F-stat is based on the relative difference between the SSRs under the
restricted model and unrestricted model: F = [(SSRr – SSRur)/q]/[SSRur/(n-k-1)], where you
should be able to explain the meanings of various symbols.
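
As a worked example, the formula can be coded directly. A Python sketch using the SSRs from Problem Set Q1(ii) of this week (q = 2 restrictions, n − k − 1 = 86):

```python
# General F statistic from the restricted and unrestricted SSRs:
# F = [(SSR_r - SSR_ur)/q] / [SSR_ur/(n-k-1)].
def f_stat(ssr_r, ssr_ur, q, df_ur):
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)

F = f_stat(209448.99, 165644.51, q=2, df_ur=86)   # about 11.37
```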
• What are general linear restrictions on parameters?
These are linear equations on the parameters. For example, β1 – 2β2 + 3β3 = 0.
• What is the test for the overall significance of a regression?
This is the F-test for the null hypothesis that all the coefficients of x-variables are zero.
• How would you report your regression results?
See the guidelines in Section 4.6 of the textbook.
• Why would you care about the asymptotic properties of the OLS estimators?
When MLR.6 does not hold, the finite-sample distribution of the OLS estimators is not
available. For reasonably large samples (large n), we use the asymptotic distribution of the
OLS estimators, which is known from studying the asymptotic properties, to approximate
the finite-sample distribution of the OLS estimators, which is unknown.
• Comparing the inference procedures in Chapter 5 with those in Chapter 4, can you list the
similarities and differences?
It is beneficial to make a list by yourself.
• Under MLR.1-MLR.5, the OLS estimators are consistent, asymptotically normal, and
asymptotically efficient. Try to explain these properties in your own words.
This is really a test on your understanding of these notions in econometrics.

Problem Set (these will be discussed in tutorial classes)


Q1. Wooldridge 4.6
• (i) With df = n – 2 = 86, we obtain the 5% critical value from Table G.2 with df = 90. Because
each test is two-tailed, the critical value is 1.987. The t statistic for H0:β0 = 0 is about -.89,
which is much less than 1.987 in absolute value. Therefore, we fail to reject H0:β0 = 0. The t
statistic for H0:β1 = 1 is (.976 – 1)/.049 ≈ -.49, which is even less significant. (Remember,
we reject H0 in favor of H1 in this case only if |t| > 1.987.)
• (ii) We use the SSR form of the F statistic. We are testing q = 2 restrictions and the df in the
unrestricted model is 86. We are given SSRr = 209,448.99 and SSRur = 165,644.51. Therefore,
F = [(209,448.99 − 165,644.51)/2]/[165,644.51/86] ≈ 11.37
which is a strong rejection of H0: from Table G.3c, the 1% critical value with 2 and 90 df is
4.85.
• (iii) We use the R-squared form of the F statistic. We are testing q = 3 restrictions and there
are 88 – 5 = 83 df in the unrestricted model. The F statistic is
F = [(.829 – .820)/3]/[(1 – .829)/83] ≈ 1.46. The 10% critical value (again using 90
denominator df in Table G.3a) is 2.15, so we fail to reject H0 at even the 10% level. In fact,
the p-value is about .23.
• (iv) If heteroskedasticity were present, Assumption MLR.5 would be violated, and the F
statistic would not have an F distribution under the null hypothesis. Therefore, comparing
the F statistic against the usual critical values, or obtaining the p-value from the F
distribution, would not be especially meaningful.
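The R-squared form of the F statistic used in part (iii) can be verified with a few lines of Python (an illustrative sketch, not part of the original answers):

```python
# R-squared form of the F statistic:
# F = [(R2_ur - R2_r)/q] / [(1 - R2_ur)/df_ur].

def f_stat_r2(r2_ur, r2_r, q, df_ur):
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)

# Numbers from part (iii): q = 3 restrictions, 83 df in the unrestricted model.
F = f_stat_r2(0.829, 0.820, q=3, df_ur=83)   # ≈ 1.46
```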

Q2. Wooldridge 4.8


• (i) We use Property VAR.3 from Appendix B:
Var(𝛽̂1 − 3𝛽̂2 ) = Var (𝛽̂1 ) + 9 Var (𝛽̂2 ) – 6 Cov (𝛽̂1 , 𝛽̂2 ).
• (ii) t = (𝛽̂1 − 3𝛽̂2 − 1)/se(𝛽̂1 − 3𝛽̂2 ), so we need the standard error of 𝛽̂1 − 3𝛽̂2 .
• (iii) Because θ = β1 – 3β2, we can write β1 = θ + 3β2. Plugging this into the population model
gives y = β0 + (θ + 3β2)x1 + β2 x2 + β3 x3 + u = β0 + θ x1 + β2 (3 x1 + x2) + β3 x3 + u. This last
equation is what we would estimate by regressing y on x1, 3x1 + x2, and x3. The coefficient
and standard error on x1 are what we want.
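The equivalence behind this re-parameterisation can be checked by simulation. Below is a hypothetical Python/NumPy sketch (the data and coefficient values are invented); the coefficient on x1 in the re-parameterised regression equals β̂1 − 3β̂2 from the original regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 0.5 * x1 + 0.2 * x2 - 0.3 * x3 + rng.normal(size=n)  # hypothetical betas

# Original regression: y on x1, x2, x3 (with intercept).
X = np.column_stack([np.ones(n), x1, x2, x3])
b = np.linalg.lstsq(X, y, rcond=None)[0]
theta_direct = b[1] - 3 * b[2]

# Re-parameterised regression: y on x1, (3*x1 + x2), x3.
Z = np.column_stack([np.ones(n), x1, 3 * x1 + x2, x3])
g = np.linalg.lstsq(Z, y, rcond=None)[0]
theta_repar = g[1]  # coefficient on x1 is exactly beta1_hat - 3*beta2_hat
```

The standard error reported for x1 in the second regression is then se(β̂1 − 3β̂2), which is what the t statistic in part (ii) needs.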

Q3. Wooldridge 4.10


• (i) We need to compute the F statistic for the overall significance of the regression with n =
142 and k = 4: F = [.0395/(1 – .0395)](137/4) ≈ 1.41. The 5% critical value with 4 and 120
df is 2.45, which is well above the value of F. Therefore, we fail to reject H0: β1 = β2 = β3 = β4
= 0 at the 10% level. No explanatory variable is individually significant at the 5% level. The
largest absolute t statistic is on dkr, tdkr ≈ 1.60, which is not significant at the 5% level
against a two-sided alternative.
• (ii) The F statistic (with the same df) is now [.0330/(1 – .0330)](137/4) ≈ 1.17, which is
even lower than in part (i). None of the t statistics is significant at a reasonable level.
• (iii) We probably should not use the logs, as the logarithm is not defined for firms that have
zero for dkr or eps. Therefore, we would lose some firms in the regression.
• (iv) It seems very weak. There are no significant t statistics at the 5% level (against a two-
sided alternative), and the F statistics are insignificant in both cases. Plus, less than 4% of
the variation in return is explained by the independent variables.

Q4. Wooldridge C4.9 (discrim_c4_9.do)


• (i) The results from the OLS regression, with standard errors in parentheses, are

log (𝑝𝑠𝑜𝑑𝑎) = −1.46 + .073 prpblck + .137 log(income) + .380 prppov
(0.29) (.031) (.027) (.133)
n = 401, R² = .087

• The p-value for testing H0: β1= 0 against the two-sided alternative is about .018, so that we
reject H0 at the 5% level but not at the 1% level.
• (ii) The correlation is about −.84, indicating a strong degree of multicollinearity. Yet each
coefficient is very statistically significant: the t statistic on log(income) is about 5.1 and that
on prppov is about 2.86 (two-sided p-value = .004).
• (iii) The OLS regression results when log(hseval) is added are

log (𝑝𝑠𝑜𝑑𝑎) = −.84 + .098 prpblck − .053 log(income) + .052 prppov + .121 log(hseval)
(.29) (.029) (.038) (.134) (.018)

n = 401, R² = .184
The coefficient on log(hseval) is an elasticity: a one percent increase in housing value,
holding the other variables fixed, increases the predicted price by about .12 percent. The
two-sided p-value is zero to three decimal places.
• (iv) Adding log(hseval) makes log(income) and prppov individually insignificant (log(income)
at even the 15% significance level against a two-sided alternative, and prppov does not
have a t statistic even close to one in absolute value). Nevertheless, they are jointly
significant at the 5% level because the outcome of the F2,396 statistic is about 3.52 with p-
value = .030. All of the control variables – log(income), prppov, and log(hseval) – are highly
correlated, so it is not surprising that some are individually insignificant.
• (v) Because the regression in (iii) contains the most controls, log(hseval) is individually
significant, and log(income) and prppov are jointly significant, (iii) seems the most reliable.
It holds fixed three measures of income and affluence. Therefore, a reasonable estimate is
that if prpblck increases by .10, psoda is estimated to increase by 1%, other factors held fixed.

Q5. Wooldridge 5.2


• This is about the inconsistency of the simple regression of pctstck on funds. A higher
tolerance of risk means more willingness to invest in the stock market, so β2 > 0. By
assumption, funds and risktol are positively correlated. Now we use equation (5.5), where
δ1 > 0: plim(β̃1) = β1 + β2δ1 > β1, so β̃1 has a positive inconsistency (asymptotic bias). This
makes sense: if we omit risktol from the regression and it is positively correlated with funds,
some of the estimated effect of funds is actually due to the effect of risktol.
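This inconsistency can be illustrated by simulation. A hypothetical Python/NumPy sketch (all parameter values are invented): with β1 = .3, β2 = .4, and δ1 = .4, the short-regression slope converges to .46 rather than .3.

```python
import numpy as np

# Simulated illustration of plim(beta1_tilde) = beta1 + beta2*delta1 (equation 5.5).
rng = np.random.default_rng(1)
n = 100_000
risktol = rng.normal(size=n)
funds = 0.5 * risktol + rng.normal(size=n)                   # Corr(funds, risktol) > 0
y = 1.0 + 0.3 * funds + 0.4 * risktol + rng.normal(size=n)   # beta1 = .3, beta2 = .4

# Short regression omitting risktol. Here delta1 = Cov(funds, risktol)/Var(funds)
# = 0.5/1.25 = 0.4, so the slope converges to .3 + .4*.4 = .46 > .3.
X = np.column_stack([np.ones(n), funds])
slope = np.linalg.lstsq(X, y, rcond=None)[0][1]
```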

Q6. Wooldridge C5.1 (wage1_c5_1.do)


• (i) The estimated equation is
wage-hat = −2.87 + .599 educ + .022 exper + .169 tenure
             (0.73)  (.051)      (.012)       (.022)
n = 526, R² = .306, σ̂ = 3.085.
• Below is a histogram of the 526 residuals, ûi, i = 1, 2, ..., 526. The histogram uses 27 bins,
which is suggested by the formula in the Stata manual for 526 observations. For comparison,
the normal distribution that provides the best fit to the histogram is also plotted.


• (ii) With log(wage) as the dependent variable the estimated equation is
log(wage)-hat = .284 + .092 educ + .0041 exper + .022 tenure
                  (.104)  (.007)     (.0017)       (.003)
n = 526, R² = .316, σ̂ = .441.

The histogram for the residuals from this equation, with the best-fitting normal distribution
overlaid, is given below:

• (iii) The residuals from the log(wage) regression appear to be more normally distributed.
Certainly the histogram in part (ii) fits under its comparable normal density better than in
part (i), and the histogram for the wage residuals is notably skewed to the right. In the wage
regression there are some very large residuals (roughly equal to 15) that lie almost five
estimated standard deviations (σ̂ = 3.085) from the mean of the residuals, which is
identically zero, of course. Residuals far from zero do not appear to be nearly as much of a
problem in the log(wage) regression.
• The skewness and kurtosis of the residuals from the OLS regression may be used to test the
null that u is normally distributed (skewness = 0 and kurtosis = 3 under the null). The null
distribution of the Jarque-Bera test statistic (which takes into account both skewness and
kurtosis) is approximately the chi-squared with 2 degrees of freedom. Hence, at the 5% level,
we reject the normality of u whenever the JB statistic is greater than 5.99. In STATA, this test
is carried out by the command “sktest”. With the data in WAGE1.RAW, the JB statistic is
extremely large for the wage model and about 10.59 for the log(wage) model. While normality is
rejected for both models, we do see that the JB statistic is much smaller for the log(wage)
model [also compare its p-values of skewness and excess kurtosis tests against those from
the wage model]. Hence the normality appears to be a reasonable approximation to the
distribution of the error term in the log(wage) model.
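The JB formula can be computed by hand. Below is a small Python sketch (not part of the original answers; the five-point sample is purely illustrative, not the WAGE1 residuals):

```python
# Jarque-Bera statistic: JB = n/6 * (S^2 + (K - 3)^2 / 4), where S is skewness and
# K is kurtosis; compare against the chi-squared(2) 5% critical value 5.99.

def jarque_bera(resid):
    n = len(resid)
    mean = sum(resid) / n
    m2 = sum((e - mean) ** 2 for e in resid) / n
    m3 = sum((e - mean) ** 3 for e in resid) / n
    m4 = sum((e - mean) ** 4 for e in resid) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

# Tiny illustrative sample (symmetric, so the skewness term is exactly zero).
jb = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
reject = jb > 5.99   # 5% critical value of chi-squared(2)
```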

Week 7 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)


• What are the advantages of using the log of a variable in regression? Find the “rules of
thumb” for taking logs.
See page 191 of Wooldridge (p198-199 for the 3rd edition)
• Be careful when you interpret the coefficients of explanatory variables in a model where
some variables are in logarithm. Do you remember Table 2.3?
Consult Table 2.3.
• How do you compute the change in y caused by Δx when the model is built for log(y)?
A precise computation is given by Equation (6.8) of Wooldridge.
• Why do we need the “interaction” terms in regression models?
An interaction term is needed if the partial effect of an explanatory variable is linearly
related to another explanatory variable. See (6.17) for example.
• What is the adjusted R-squared? What is the difference between it and the R-squared?
The primary attractiveness of the adjusted R-squared is that it imposes a penalty for adding
additional explanatory variables to a model. The R-squared can never fall when a new
explanatory variable is added. However, the adjusted R-squared will fall if the t-ratio on the
new variable is less than one in absolute value.
• How do you construct interval prediction for given x-values?
This is nicely summarised in Equations (6.27)-(6.31) and the surrounding text.
• How do you predict y for given x-values when the model is built for log(y)?
Check the list on page 212 (p220 for the 3rd edition).
• What is involved in “residual analysis”?
See pages 209-210 (p217-218 for the 3rd edition).

Problem Set (these will be discussed in tutorial classes)


Q1. Wooldridge 6.4
• (i) Holding all other factors fixed we have
Δlog(wage) = β1 Δeduc + β2 Δeduc⋅ pareduc = (β1 + β2 pareduc) Δeduc .
Dividing both sides by Δeduc gives the result. The sign of β2 is not obvious, although β2 > 0 if
we think a child gets more out of another year of education the more highly educated are
the child’s parents.
• (ii) We use the values pareduc = 32 and pareduc = 24 to interpret the coefficient on
educ⋅pareduc. The difference in the estimated return to education is .00078(32 – 24) = .0062,
or about .62 percentage points. (Percentage points are changes in percentages.)
• (iii) When we add pareduc by itself, the coefficient on the interaction term is negative. The t
statistic on educ⋅pareduc is about –1.33, which is not significant at the 10% level against a
two-sided alternative. Note that the coefficient on pareduc is significant at the 5% level
against a two-sided alternative. This provides a good example of how omitting a level effect
(pareduc in this case) can lead to biased estimation of the interaction effect.

Q2. Wooldridge 6.5


• This would make little sense. Performances on math and science exams are measures of
outputs of the educational process, and we would like to know how various educational
inputs and school characteristics affect math and science scores. For example, if the
staff-to-pupil ratio has an effect on both exam scores, why would we want to hold
performance on the science test fixed while studying the effects of staff on the math pass
rate? This would be an example of controlling for too many factors in a regression
equation. The science test score could itself be a dependent variable in an identical
regression equation.

Q3. Wooldridge C6.3 (C6.2 for the 3rd Edition) (wage2_c6_3.do)


• (i) Holding exper (and the elements in u) fixed, we have
Δ log(wage) = β1Δeduc + β2Δeduc⋅exper = (β1 + β2exper)Δeduc ,
or
Δ log(wage)/ Δeduc = (β1 + β2exper),
This is the approximate proportionate change in wage given one more year of education.
• (ii) H0: β3 = 0. If we think that education and experience interact positively – so that people
with more experience are more productive when given another year of education – then β3 >
0 is the appropriate alternative.
• (iii) The estimated equation is
log(wage)-hat = 5.95 + .0440 educ − .0215 exper + .00320 educ⋅exper
                  (0.24)  (.0174)     (.0200)       (.00153)
n = 935, R² = .135, adjusted R² = .132.

The t statistic on the interaction term is about 2.13, which gives a p-value below .02 against
H1: β3 > 0. Therefore, we reject H0: β3 = 0 in favor of H1: β3 > 0 at the 2% level.
• (iv) We rewrite the equation as
log(wage) = β0 + θ1educ + β2exper + β3 educ (exper − 10) + u,
and run the regression log(wage) on educ, exper, and educ(exper – 10). We want the
coefficient on educ. We obtain θ1 ≈ .0761 and se(θ1) ≈ .0066. The 95% CI for θ1 is about .063
to .089.
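The confidence interval in part (iv) is simple arithmetic. A Python sketch of the calculation (the 1.96 critical value is the large-sample normal approximation, an assumption not stated in the original answer):

```python
# 95% CI for theta1 (the return to education at exper = 10), using the estimates
# theta1_hat = .0761 and se = .0066 reported above.
theta_hat, se = 0.0761, 0.0066
lo = theta_hat - 1.96 * se   # lower bound ≈ .063
hi = theta_hat + 1.96 * se   # upper bound ≈ .089
```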

Q4. Wooldridge C6.8 (hprice1_c6_8.do)


• (i) The estimated equation (where price is in dollars) is
price-hat = −21,770.3 + 2.068 lotsize + 122.78 sqrft + 13,852.5 bdrms
              (29,475.0)  (0.642)         (13.24)       (9,010.1)
n = 88, R² = .672, adjusted R² = .661, σ̂ = 59,833.

The predicted price at lotsize = 10,000, sqrft = 2,300, and bdrms = 4 is about $336,714.
• (ii) The regression is pricei on (lotsizei – 10,000), (sqrfti – 2,300), and (bdrmsi – 4). We want
the intercept estimate and the associated 95% CI from this regression. The CI is
approximately 336,706.7 ± 14,665, or about $322,042 to $351,372 when rounded to the
nearest dollar.
• (iii) We must use equation (6.36) to obtain the standard error of ŷ0 and then use equation
(6.37) (assuming that price is normally distributed). But from the regression in part (ii),
se(ŷ0) ≈ 7,374.5 and σ̂ ≈ 59,833. Therefore, se(ê0) ≈ [(7,374.5)² + (59,833)²]^1/2 ≈ 60,285.8.
Using 1.99 as the approximate 97.5th percentile in the t84 distribution gives the 95% CI for
price0, at the given values of the explanatory variables, as 336,706.7 ± 1.99(60,285.8) or,
rounded to the nearest dollar, $216,738 to $456,675. This is a fairly wide prediction interval.
But we have not used many factors to explain housing price. If we had more, we could,
presumably, reduce the error standard deviation, and therefore σ̂, to obtain a tighter
prediction interval.
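The arithmetic in part (iii) can be reproduced directly (a Python sketch of the calculation, not part of the original answers):

```python
import math

# Prediction-error standard error: se(e0) = sqrt(se(y0_hat)^2 + sigma_hat^2),
# then the 95% prediction interval is y0_hat ± t_crit * se(e0).
se_y0hat, sigma_hat = 7374.5, 59833.0
y0_hat, t_crit = 336706.7, 1.99          # 1.99 ≈ 97.5th percentile of t(84)

se_e0 = math.sqrt(se_y0hat ** 2 + sigma_hat ** 2)   # ≈ 60,285.8
lo = y0_hat - t_crit * se_e0                        # ≈ 216,738
hi = y0_hat + t_crit * se_e0                        # ≈ 456,675
```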

Week 8 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)


• What is a qualitative factor? Give some examples.
This should be easy.
• How do you use dummy variables to represent qualitative factors?
A factor with two levels (or values) is easily represented by one dummy variable. For a factor
with more than two levels (or values), we need more than one dummy. For example, a
factor with 5 values requires 4 dummy variables.
• How do you use dummy variables to represent an ordinal variable?
An ordinal variable may be represented by dummies in the same way as a multi-levelled
factor.
• How do you test for differences in regression functions across different groups?
We generally use the F test on properly defined restricted and unrestricted models. The
Chow test is a special case of the F test, where the null hypothesis is that there is no
difference across groups.
• What can you achieve by interacting group dummy variables with other regressors?
The interaction allows the slope coefficients to be different across groups.
• What is “program evaluation”?
It assesses the effectiveness of a program by checking the differences in regression
coefficients for the control (or base) group and the treatment group.
• What are the interpretations of the PRF and SRF when the dependent variable is binary?
The PRF is the conditional probability of “success” for given explanatory variables. The SRF
is the predicted conditional probability of “success” for given explanatory variables.
• What are the shortcomings of the LPM?
The major drawback of the LPM is that the predicted probability of success may be outside
the interval [0,1].
• Does MLR5 hold for the LPM?
No, as indicated in the textbook and lecture, the conditional variance of the disturbance is
not a constant and depends on explanatory variables: var(u|x) = E(y|x)[1− E(y|x)].

Problem Set (these will be discussed in tutorial classes)


Q1. Wooldridge 7.4
• (i) The approximate difference is just the coefficient on utility times 100, or –28.3%. The t
statistic is −.283/.099 ≈ −2.86, which is statistically significant.
• (ii) 100⋅[exp(−.283) – 1] ≈ −24.7%, and so the estimate is somewhat smaller in magnitude.
• (iii) The proportionate difference is .181 − .158 = .023, or about 2.3%. One equation that can
be estimated to obtain the standard error of this difference is
log(salary) = β0 + β1log(sales) + β2roe + δ1consprod + δ2utility +δ3trans + u,
where trans is a dummy variable for the transportation industry. Now, the base group is
finance, and so the coefficient δ1 directly measures the difference between the consumer
products and finance industries, and we can use the t statistic on consprod.

Q2. Wooldridge 7.6


• In Section 3.3 – in particular, in the discussion surrounding Table 3.2 – we discussed how to
determine the direction of bias in the OLS estimators when an important variable (ability, in
this case) has been omitted from the regression. As we discussed there, Table 3.2 only
strictly holds with a single explanatory variable included in the regression, but we often
ignore the presence of other independent variables and use this table as a rough guide. (Or,
we can use the results of Problem 3.10 for a more precise analysis.) If less able workers are
more likely to receive training, then train and u are negatively correlated. If we ignore the
presence of educ and exper, or at least assume that train and u are negatively correlated
after netting out educ and exper, then we can use Table 3.2: the OLS estimator of β1 (with
ability in the error term) has a downward bias. Because we think β1 ≥ 0, we are less likely to
conclude that the training program was effective. Intuitively, this makes sense: if those
chosen for training had not received training, they would have lower wages, on average,
than the control group.

Q3. Wooldridge 7.9


• (i) Plugging in u = 0 and d = 1 gives f1(z) = (β0+δ0) + (β1+δ1)z.
• (ii) Setting f0(z*) = f1(z*) gives β0 + β1z* = (β0+δ0) + (β1+δ1)z* . Therefore, provided δ1 ≠ 0,
we have z* = −δ0/δ1 . Clearly, z* is positive if and only if δ0/δ1 is negative, which means δ0
and δ1 must have opposite signs.
• (iii) Using part (ii) we have totcoll* = .357/.030 = 11.9 years. (totcoll= total college years).
• (iv) The estimated years of college where women catch up to men is much too high to be
practically relevant. While the estimated coefficient on female⋅totcoll shows that the gap is
reduced at higher levels of college, it is never closed – not even close. In fact, at four years of
college, the difference in predicted log wage is still −.357 + .030(4) = −.237, or about 21.1%
less for women.
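The arithmetic in parts (iii) and (iv) is easy to reproduce (a Python sketch, not part of the original answers; the exact percentage uses the usual exp() conversion from a log difference):

```python
import math

# Part (iii): z* = -delta0/delta1 with delta0_hat = -.357 and delta1_hat = .030.
z_star = 0.357 / 0.030              # ≈ 11.9 years of college

# Part (iv): remaining log-wage gap at totcoll = 4, and its exact percentage form.
gap = -0.357 + 0.030 * 4            # ≈ -.237
pct = 100 * (math.exp(gap) - 1)     # ≈ -21.1%
```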

Q4. Wooldridge C7.13 (apple_c7_13.do)


• (i) 412/660 ≈ .624.
• (ii) The OLS estimates of the LPM are
ecobuy-hat = .424 − .803 ecoprc + .719 regprc + .00055 faminc + .024 hhsize
(.165) (.109) (.132) (.00053) (.013)
+ .025 educ − .00050 age
(.008) (.00125)
n = 660, R2 = .110
If ecoprc increases by, say, 10 cents (.10), then the probability of buying eco-labeled apples
falls by about .080. If regprc increases by 10 cents, the probability of buying eco-labeled
apples increases by about .072. (Of course, we are assuming that the probabilities are not
close to the boundaries of zero and one, respectively.)
• (iii) The F test, with 4 and 653 df, is 4.43, with p-value = .0015. Thus, based on the usual F
test, the four non-price variables are jointly very significant. Of the four variables, educ
appears to have the most important effect. For example, a difference of four years of
education implies an increase of .025(4) = .10 in the estimated probability of buying eco-
labeled apples. This suggests that more highly educated people are more open to buying
produce that is environmentally friendly, which is perhaps expected. Household size (hhsize)
also has an effect. Comparing a couple with two children to one that has no children – other
factors equal – the couple with two children has a .048 higher probability of buying eco-
labeled apples.
• (iv) The model with log(faminc) fits the data slightly better: the R-squared increases to
about .112. (We would not expect a large increase in R-squared from a simple change in the
functional form.) The coefficient on log(faminc) is about .045 (t = 1.55). If log(faminc)
increases by .10, which means roughly a 10% increase in faminc, then P(ecobuy = 1) is
estimated to increase by about .0045, a pretty small effect.

• (v) The fitted probabilities range from about .185 to 1.051, so none are negative. There are
two fitted probabilities above 1, which is not a source of concern with 660 observations.
• (vi) Using the standard prediction rule – predict one when ecobuy-hat ≥ .5 and zero
otherwise – gives the fraction correctly predicted for ecobuy = 0 as 102/248 ≈ .411, so about
41.1%. For ecobuy = 1, the fraction correctly predicted is 340/412 ≈ .825, or 82.5%. With the
usual prediction rule, the model does a much better job predicting the decision to buy eco-
labeled apples. (The overall percent correctly predicted is about 67%.)
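The prediction-rule fractions can be reproduced from the counts reported above (a Python sketch, not part of the original answers):

```python
# Fractions correctly predicted under the 0.5 prediction rule.
correct0, total0 = 102, 248    # ecobuy = 0 cases predicted correctly
correct1, total1 = 340, 412    # ecobuy = 1 cases predicted correctly

frac0 = correct0 / total0                               # ≈ .411
frac1 = correct1 / total1                               # ≈ .825
overall = (correct0 + correct1) / (total0 + total1)     # ≈ .67
```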

Week 9 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)


• What is heteroskedasticity in a regression model?
When homoskedasticity (MLR5) fails and the variance of the disturbance (u) changes across
observation index (i), we say that heteroskedasticity is present.
• In the presence of heteroskedasticity, are the t-stat and F-stat from the usual OLS still valid?
Why? Are there any other problems with the OLS under heteroskedasticity?
The usual t-stat and F-stat are not valid because the usual OLS standard errors are incorrect
under heteroskedasticity. Further, the OLS estimator will no longer be (asymptotically)
efficient, and there are better estimators.
• What are the heteroskedasticity-robust standard errors? How do you use them in STATA?
These are the corrected standard errors that take into account the possible presence of
heteroskedasticity. The t-stat and F-stat computed using the heteroskedasticity-robust
standard errors are valid test statistics. In STATA, these are easily obtained by using the
option “robust” with the “regress” command, e.g., regress lwage educ, robust
• How do you detect if there is heteroskedasticity?
The Breusch-Pagan test or the White test can be used to detect the presence of
heteroskedasticity. See Section 8.3 for details.
• If heteroskedasticity is present in a known form, how would you estimate the model?
In this case, the WLS estimators should be used, which are more efficient than the OLS
estimators. See Section 8.4 for details.
• If heteroskedasticity is present in an unknown form, how would you estimate the model?
If there is strong evidence for heteroskedasticity, the FGLS estimators should be used, which
are based on an exponential model of the variance function. See Section 8.4 for details.
• What are the steps in the FGLS estimation?
You should summarise from Section 8.4.
• How would you handle the heteroskedasticity of the LPM?
The heteroskedasticity functional form is known for the LPM. Hence the WLS can be used in
principle. However, because the LPM can produce predicted probabilities that are outside
the interval (0,1), the known functional form p(x)[1-p(x)] may not be useful for the WLS. In
that case, it may be necessary to use FGLS for the LPM.

Problem Set (these will be discussed in tutorial classes)


Q1. Wooldridge 8.1
• Parts (ii) and (iii). The homoskedasticity assumption played no role in Chapter 5 in showing
that OLS is consistent. But we know that heteroskedasticity causes statistical inference
based on the usual t and F statistics to be invalid, even in large samples. As
heteroskedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE.

Q2. Wooldridge 8.2


• With Var(u|inc, price, educ, female) = σ²inc², h(x) = inc², where h(x) is the heteroskedasticity
function defined in equation (8.21). Therefore, √h(x) = inc, and so the transformed equation
is obtained by dividing the original equation by inc:
beer/inc = β0(1/inc) + β1 + β2(price/inc) + β3(educ/inc) + β4(female/inc) + u/inc

Notice that β1, which is the slope on inc in the original model, is now a constant in the
transformed equation. This is simply a consequence of the form of the heteroskedasticity
and the functional forms of the explanatory variables in the original equation.

Q3. Wooldridge 8.3


• False. The unbiasedness of WLS and OLS hinges crucially on Assumption MLR.4, and, as we
know from Chapter 4, this assumption is often violated when an important variable is
omitted. When MLR.4 does not hold, both WLS and OLS are biased. Without specific
information on how the omitted variable is correlated with the included explanatory
variables, it is not possible to determine which estimator has the smaller bias. It is possible that
WLS would have either more or less bias than OLS. Because we cannot know, we should not
claim to use WLS in order to solve “biases” associated with OLS.

Q4. Wooldridge 8.5


• (i) No. For each coefficient, the usual standard errors and the heteroskedasticity-robust
ones are practically very similar.
• (ii) The effect is −.029(4) = −.116, so the probability of smoking falls by about .116.
• (iii) As usual, we compute the turning point in the quadratic: .020/[2(.00026)] ≈ 38.46, so
about 38 and one-half years.
• (iv) Holding other factors in the equation fixed, a person in a state with restaurant smoking
restrictions has a .101 lower chance of smoking. This is similar to the effect of having four
more years of education.
• (v) We just plug the values of the independent variables into the OLS regression line:
smokes-hat = .656 − .069 log(67.44) + .012 log(6,500) − .029(16) + .020(77) − .00026(77²) ≈ .0052.
Thus, the estimated probability of smoking for this person is close to zero. (In fact, this
person is not a smoker, so the equation predicts well for this particular observation.)

Q5. Wooldridge C8.10 (401ksubs_c8_10.do)


• (i) In the following equation, estimated by OLS, the usual standard errors are in (⋅) and the
heteroskedasticity-robust standard errors are in [⋅]:
e401k-hat = −.506 + .0124 inc − .000062 inc² + .0265 age − .00031 age² − .0035 male
(.081) (.0006) (.000005) (.0039) (.00005) (.0121)
[.079] [.0006] [.000005] [.0038] [.00004] [.0121]
n = 9,275, R2 = .094.
There are no important differences; if anything, the robust standard errors are smaller.
• (ii) This is a general claim. Since Var(y|x) = p(x)[1 − p(x)], we can write E(u²|x) = p(x) − p(x)².
Written in error form, u² = p(x) − p(x)² + v. In other words, we can write this as a regression
model u² = δ0 + δ1p(x) + δ2p(x)² + v, with the restrictions δ0 = 0, δ1 = 1, and δ2 = −1.
Remember that, for the LPM, the fitted values, ŷ, are estimates of p(x). So, when we run the
regression û² on ŷ and ŷ² (including an intercept), the intercept estimate should be close
to zero, the coefficient on ŷ should be close to one, and the coefficient on ŷ² should be
close to −1.
• (iii) The White LM statistic and F statistic are about 581.9 and 310.32 respectively, both of
which are very significant. The coefficient on e401k-hat is about 1.010, the coefficient on
(e401k-hat)² is about −.970, and the intercept is about −.009. These estimates are quite close to what we
expect to find from the theory in part (ii).

• (iv) The smallest fitted value is about .030 and the largest is about .697. The WLS estimates
of the LPM are
e401k-hat = −.488 + .0126 inc − .000062 inc² + .0255 age − .00030 age² − .0055 male
(.076) (.0005) (.000004) (.0037) (.00004) (.0117)
n = 9,275, R2 = .108.
There are no important differences with the OLS estimates. The largest relative change is in
the coefficient on male, but this variable is very insignificant using either estimation method.

Week 10 Tutorial Exercises

Readings
• Read Chapter 9 thoroughly.
• Make sure that you know the meanings of the Key Terms at the chapter end.

Review Questions (these may or may not be discussed in tutorial classes)


• What is functional form misspecification? What is its major consequence in regression
analysis?
As the word “misspecification” suggests, it is a mis-specified model with a wrong functional
form for explanatory variables. For example, a log wage model that does not include age² is
mis-specified if the partial effect of age on log wage is inverted-U shaped. In linear
regression models, the functional form misspecification can be viewed as the omission of a
nonlinear function of the explanatory variables. Hence the consequences of omitting
important variables apply and the major one is estimation bias.
• How would you test for functional form misspecification?
The RESET may be used to test for functional form misspecification, more specifically, for
neglected nonlinearities (nonlinear function of regressors). The RESET is based on the F-
test for the joint significance of the squared and cubed predicted values (y-hat) in an
“expanded” model that includes all the original regressors as well as the squared and cubed
y-hat, where the latter represent possible nonlinear functions of the regressors.
• What are nested models? And, nonnested models?
Two models are nested if one is a restricted version of the other (by restricting the values of
parameters). Two models are nonnested if neither nests the other.
• What is the purpose of testing one model against another?
The purpose of testing one model against another is to select a “best” or “preferred” model
for statistical inference and prediction.
• How would you test for two nonnested models?
There are two strategies. The first is to test exclusion restrictions in an expanded model that
nests both candidate models and decide which model can be “excluded”. The second is to test
the significance of the predicted value of one model in the other model (known as Davidson-
MacKinnon test).
• What is a proxy variable? What are the conditions for a proxy variable to be valid in
regression analysis?
A proxy variable is one that is used to represent the influence of an unobserved (and
important) explanatory variable. There are two conditions for the validity of a proxy. The
first is that the zero conditional mean assumption holds for all explanatory variables
(including the unobserved and the proxy). The second is that the conditional mean of the
unobserved, given other explanatory variables and the proxy, only depends on the proxy.
• Can you analyse the consequences of measurement errors?
With a careful reading of Section 9.4, you should be able to analyse these consequences.
• In what circumstances will missing observations cause major concerns in regression
analysis?
Concerns arise when observations are systematically missing and the sample can no longer
be regarded as a random sample. Nonrandom samples may cause bias in the OLS estimation.
• What is exogenous sample selection? What is endogenous sample selection?
Exogenous sample selection is a nonrandom sampling scheme in which the sample is
chosen on the basis of the explanatory variables. It is easily seen that such a sampling scheme
does not violate MLR1,3,4 and hence does not introduce bias in the OLS estimation. On the
other hand, the endogenous sample selection scheme selects a sample on the basis of the
dependent variable and causes bias in the OLS estimation.
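A small pure-Python simulation (all numbers invented) illustrates the contrast: selecting the sample on x leaves the OLS slope essentially unchanged, while selecting on y biases it.

```python
import random

random.seed(5)
n = 50_000
# True model: y = 1 + 2x + u, with u independent of x
data = []
for _ in range(n):
    x = random.gauss(0, 1)
    data.append((x, 1 + 2 * x + random.gauss(0, 1)))

def slope(pairs):
    """Simple-regression OLS slope of y on x."""
    xs, ys = [p[0] for p in pairs], [p[1] for p in pairs]
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - xbar) * (b - ybar) for a, b in pairs)
    den = sum((a - xbar) ** 2 for a in xs)
    return num / den

full = slope(data)                            # ~2: random sample
exog = slope([p for p in data if p[0] > 0])   # ~2: selecting on x is harmless
endog = slope([p for p in data if p[1] < 1])  # < 2: selecting on y attenuates
print(round(full, 1), round(exog, 1), endog < 1.9)
```

The endogenously selected sample (kept only when y < 1) produces a slope well below the true value of 2, while the exogenously selected one does not.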
• What are outliers and influential observations?
Outliers are unusual observations that are far away from the centre of other observations. A
small number of outliers can have large impact on the OLS estimates of parameters. These
are called influential observations if the estimation results when including them are
significantly different from the results when excluding them.
• Consider the simple regression model yi = α + (β + bi)xi + ui for i = 1, 2,…, n, where the "slope
parameter" contains a random variable bi; α and β are constant parameters. Assume that the
usual MLR1-5 hold for yi = α + βxi + ui; E(bi|xi) = 0 and Var(bi|xi) = ω2. If we regress y on x
with an intercept, will the estimator of the "slope parameter" be biased from β? Will the usual
OLS standard errors be valid for statistical inference?
The model can be written as yi = α + βxi + (bixi + ui), where the overall disturbance is (bixi
+ ui). Given the stated assumptions, MLR1-4 hold for this model and the OLS estimator of the
“slope parameter” is unbiased (from β). However, MLR5 does not hold since the conditional
variance of the overall disturbance, var(bixi + ui), will be dependent on xi , and the usual OLS
standard errors will not be valid.
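The heteroskedasticity claim can be checked by simulation. Under the stated assumptions, Var(bixi + ui | xi) = ω²xi² + σ², which depends on xi. A pure-Python sketch (ω² = σ² = 1 are invented values for illustration):

```python
import random

random.seed(1)
omega2, sigma2 = 1.0, 1.0

def overall_disturbance(x):
    b = random.gauss(0, omega2 ** 0.5)  # random part of the slope
    u = random.gauss(0, sigma2 ** 0.5)  # usual error term
    return b * x + u

def sample_var(x, reps=200_000):
    """Monte Carlo estimate of Var(b*x + u | x)."""
    draws = [overall_disturbance(x) for _ in range(reps)]
    m = sum(draws) / reps
    return sum((d - m) ** 2 for d in draws) / reps

# The conditional variance grows with |x|: omega2 * x**2 + sigma2
for x in (0.5, 2.0):
    print(x, round(sample_var(x), 1), omega2 * x**2 + sigma2)
```

The simulated conditional variances track ω²x² + σ² closely, confirming that MLR5 fails and the usual OLS standard errors are invalid (robust standard errors would still be valid).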
Problem Set (these will be discussed in tutorial classes)
Q1. Wooldridge 9.1
• There is functional form misspecification if β6 ≠ 0 or β7 ≠ 0, where these are the population
parameters on ceoten2 and comten2, respectively. Therefore, we test the joint significance of
these variables using the R-squared form of the F test:
F = [(.375 − .353)/2]/[(1 − .375)/(177 – 8)] = 2.97.
With 2 and ∞ df, the 10% critical value is 2.30, while the 5% critical value is 3.00. Thus, the
p-value is slightly above .05, which is reasonable evidence of functional form
misspecification. (Of course, whether this has a practical impact on the estimated partial
effects for various levels of the explanatory variables is a different matter.)
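The arithmetic can be verified directly; this is just the R-squared form of the F statistic, nothing model-specific:

```python
def f_stat(r2_ur, r2_r, q, df_denom):
    """R-squared form: F = [(R2_ur - R2_r)/q] / [(1 - R2_ur)/df_denom]."""
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_denom)

# Q1 numbers: R-squared rises from .353 to .375 when ceoten^2 and comten^2
# are added; denominator df = n - k - 1 = 177 - 8 = 169
print(round(f_stat(0.375, 0.353, 2, 169), 2))  # 2.97
```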
Q2. Wooldridge 9.3
• (i) Eligibility for the federally funded school lunch program is very tightly linked to being
economically disadvantaged. Therefore, the percentage of students eligible for the lunch
program is very similar to the percentage of students living in poverty.
• (ii) We can use our usual reasoning on omitting important variables from a regression
equation. The variables log(expend) and lnchprg are negatively correlated: school districts
with poorer children spend, on average, less on schools. Further, β3 < 0. From Table 3.2,
omitting lnchprg (the proxy for poverty) from the regression produces an upward biased
estimator of β1 [ignoring the presence of log(enroll) in the model]. So when we control for
the poverty rate, the effect of spending falls.
• (iii) Once we control for lnchprg, the coefficient on log(enroll) becomes negative and has a t
of about –2.17, which is significant at the 5% level against a two-sided alternative. The
coefficient implies that Δmath10 ≈ −(1.26/100)(%Δenroll) = −.0126(%Δenroll). Therefore,
a 10% increase in enrollment leads to a drop in math10 of .126 percentage points.
• (iv) Both math10 and lnchprg are percentages. Therefore, a ten percentage point increase in
lnchprg leads to about a 3.23 percentage point fall in math10, a sizeable effect.
• (v) In column (1) we are explaining very little of the variation in pass rates on the MEAP
math test: less than 3%. In column (2), we are explaining almost 19% (which still leaves
much variation unexplained). Clearly most of the variation in math10 is explained by
variation in lnchprg. This is a common finding in studies of school performance: family
income (or related factors, such as living in poverty) are much more important in explaining
student performance than are spending per student or other school characteristics.
Q3. Wooldridge 9.5
• The sample selection in this case is arguably endogenous. Because prospective students may
look at campus crime as one factor in deciding where to attend college, colleges with high
crime rates have an incentive not to report crime statistics. If this is the case, then the
chance of appearing in the sample is negatively related to u in the crime equation. (For a
given school size, higher u means more crime, and therefore a smaller probability that the
school reports its crime figures.)
Q4. Wooldridge C9.3 (jtrain_c9_3.do)
• (i) If the grants were awarded to firms based on firm or worker characteristics, grant could
easily be correlated with such factors that affect productivity. In the simple regression
model, these are contained in u.
• (ii) The simple regression estimates using the 1988 data are

log(scrap) = .409 + .057 grant
            (.241)  (.406)
n = 54, R² = .0004.
The coefficient on grant is actually positive, but not statistically different from zero.
• (iii) When we add log(scrap87) to the equation, we obtain

log(scrap88) = .021 − .254 grant88 + .831 log(scrap87)
              (.089)  (.147)         (.044)
n = 54, R² = .873,
where the years are indicated as suffixes to the variables for clarity. The t statistic for H0:
βgrant = 0 is −.254/.147 ≈ −1.73. We use the 5% critical value for 40 df in Table G.2: -1.68.
Because t = −1.73 < −1.68, we reject H0 in favor of H1: βgrant < 0 at the 5% level.
• (iv) The t statistic is (.831 – 1)/.044 ≈ −3.84, which is a strong rejection of H0.
• (v) With the heteroskedasticity-robust standard error, the t statistic for grant88 is
−.254/.142 ≈ −1.79, so the coefficient is even more significantly less than zero when we use
the heteroskedasticity-robust standard error. The t statistic for H0: βlog(scrap87) = 1 is (.831 –
1)/.071 ≈ −2.38, which is notably smaller than before, but it is still pretty significant.
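All the t statistics quoted in (iii)–(v) follow the same pattern, (estimate − hypothesized value)/standard error, and can be checked quickly:

```python
def t_stat(bhat, b_null, se):
    """t statistic for H0: beta = b_null."""
    return (bhat - b_null) / se

print(round(t_stat(-0.254, 0, 0.147), 2))  # -1.73  (iii)
print(round(t_stat(0.831, 1, 0.044), 2))   # -3.84  (iv)
print(round(t_stat(-0.254, 0, 0.142), 2))  # -1.79  (v, robust se)
print(round(t_stat(0.831, 1, 0.071), 2))   # -2.38  (v, robust se)
```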
Week 11 Tutorial Exercises
Review Questions (these may or may not be discussed in tutorial classes)
• What are the main features of time series data? How do time series data differ from cross-
sectional data?
The main features include the following. First, time series data have a temporal ordering,
which matters in regression analysis because of the second feature. Second, many economic
time series are serially correlated (or autocorrelated), meaning that future observations are
dependent on present and past observations. Third, many economic time series contain
trends and seasonality. In comparison to cross-sectional data, the major difference is that
time series data are not a "random sample". Recall that, for a random sample, observations
are required to be independent of one another.
• What is a stochastic process and its realisation?
A stochastic process (SP) is a sequence of random variables indexed by time. At any fixed
point in time, the SP is a random variable (hence has a distribution). The SP can be viewed
as a “random curve” with the horizontal axis being the time index, where the outcome of the
underlying experiment is a “curve” (or a time series plot). The time series we observe is
viewed as a realisation (or outcome) of the “random curve” or the SP.
• What is serial correlation or autocorrelation?
The serial (or auto) correlation of a time-series variable is the correlation between the
variable at one point in time and the same variable at another point in time. The serial (or
auto) correlation of yt is usually denoted as Corr(yt, yt-h) = Cov(yt, yt-h)/[Var(yt)Var(yt-h)]^(1/2).
• What is a finite distributed lag model? What is the long-run propensity (LRP)? How would
you estimate the LRP and the associated standard error (say in STATA)?
Read ie_Slides10 pages 5-7. Read Section 10.2. Try Wooldridge 10.3.
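In brief: in a finite distributed lag model yt = α + δ0zt + δ1zt-1 + δ2zt-2 + ut, the LRP is δ0 + δ1 + δ2, the total effect of a permanent one-unit increase in z. One convenient route in Stata is lincom after regress, which also gives the standard error of the sum. A sketch of the arithmetic (the coefficient values are those from Q4 of the problem set below):

```python
def lrp(deltas):
    """Long-run propensity: the sum of the lag coefficients in an FDL model."""
    return sum(deltas)

# e.g., the i3 equation in Q4 has coefficients .343 on inf_t and .382 on
# inf_{t-1}, giving LRP = .725
print(round(lrp([0.343, 0.382]), 3))  # 0.725
```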
• What are TS1-6 (assumptions about time series regression)? How do they differ from the
assumptions in MLR1-6?
TS1-6 include: 1) linear in parameter; 2) no perfect collinearity; 3) strict zero conditional
mean; 4) homoskedasticity; 5) no serial correlation; 6) normality. These differ from MLR1-6
mainly in that MLR2 (random sampling) does not hold for time series data. To ensure the
unbiasedness of the OLS estimators, TS3 is needed. To ensure the validity of OLS standard
errors, t-stat and F-stat for statistical inference, TS3-TS6 are needed. These are very strong
assumptions and can be relaxed when sample sizes are large.
• What are “strictly exogenous” regressors and “contemporaneously exogenous” regressors?
Strictly exogenous regressors satisfy the assumption TS3, whereas the contemporaneously
exogenous regressors only satisfy the condition (10.10) (or (z10) of the Slides) that is
weaker than TS3.
• What is a trending time series? What is a time trend?
A time series is trending if it has the tendency of growing (or shrinking) over time. A time
trend is a function of time index (eg, linear, quadratic, etc). A time trend can be used to
mimic (or model) the trending component of a time series.
• Why may a regression with trending time series produce “spurious” results?
First, two time trends are always “correlated” because they grow together in the same (or
opposite) direction. Now consider two unrelated time series, each of which contains a time
trend. Since the time trends in the two series are “correlated”, regressing one on the other
will produce a significant slope coefficient. Such “statistical significance” is spurious because
it is purely induced by the time trends and has little to say about the true relationship
between the two time series.
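A pure-Python simulation (trend slopes and noise levels invented) makes the point: two independent trending series produce a hugely "significant" slope.

```python
import random

random.seed(3)
T = 100
# Two unrelated series, each with its own deterministic time trend
y = [0.5 * t + random.gauss(0, 1) for t in range(T)]
x = [-0.3 * t + random.gauss(0, 1) for t in range(T)]

# Simple OLS of y on x, with the usual t statistic for the slope
xbar, ybar = sum(x) / T, sum(y) / T
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((x[t] - xbar) * (y[t] - ybar) for t in range(T)) / sxx
b0 = ybar - b1 * xbar
ssr = sum((y[t] - b0 - b1 * x[t]) ** 2 for t in range(T))
se_b1 = (ssr / (T - 2) / sxx) ** 0.5
print(abs(b1 / se_b1) > 10)  # "significant", but entirely spurious
```

Adding a time trend as a regressor removes this artificial association, which is exactly the remedy discussed in the next question.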
• Why would you include a time trend in regressions with trending variables?
Including a time trend in regressions allows one to study the relationship among time series
variables that is not induced by the time trends in the time series.
• What is seasonality in a time series? Give an example of time series variable with seasonality.
Seasonality in a time series is the fluctuation that repeats itself every year (or every week, or
every day). An example would be the daily revenue of a pub, which is likely to have
day-of-week seasonality.
• For quarterly data, how would you define seasonal dummy variables for a regression model?
We need to define three dummies. For instance, if we take the first quarter as the base, we
define dummy variables Q2, Q3 and Q4 for the second, third and fourth quarters and include
them in the regression model.
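A sketch of the construction (in Stata one could simply include i.quarter in the regression; below is plain Python with a made-up quarter index):

```python
# Quarter label (1..4) for each of 12 quarterly observations
quarters = [(t % 4) + 1 for t in range(12)]

# Q1 is the base category, so define dummies only for Q2-Q4
Q2 = [int(q == 2) for q in quarters]
Q3 = [int(q == 3) for q in quarters]
Q4 = [int(q == 4) for q in quarters]
print(quarters[:4], Q2[:4], Q3[:4], Q4[:4])
# [1, 2, 3, 4] [0, 1, 0, 0] [0, 0, 1, 0] [0, 0, 0, 1]
```

Omitting one quarter (the base) avoids the dummy variable trap; the coefficients on Q2-Q4 measure seasonal differences relative to the first quarter.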
Problem Set
Q1. Wooldridge 10.1
• (i) Disagree. Most time series processes are correlated over time, and many of them are
strongly correlated. This means they cannot be independent across observations, which simply
represent different time periods. Even series that do appear to be roughly uncorrelated –
such as stock returns – do not appear to be independently distributed, as they have dynamic
forms of heteroskedasticity.
• (ii) Agree. This follows immediately from Theorem 10.1. In particular, we do not need the
homoskedasticity and no serial correlation assumptions.
• (iii) Disagree. Trending variables are used all the time as dependent variables in a
regression model. We do need to be careful in interpreting the results because we may
simply find a spurious association between yt and trending explanatory variables. Including
a trend in the regression is a good idea with trending dependent or independent variables.
As discussed in Section 10.5, the usual R-squared can be misleading when the dependent
variable is trending.
• (iv) Agree. With annual data, each time period represents a year and is not associated with
any season.
Q2. Wooldridge 10.2
• We follow the hint and write
gGDPt-1 = α0 + δ0intt-1 + δ1intt-2 + ut-1,
and plug this into the right-hand side of the intt equation:
intt = γ0 + γ1(α0 + δ0intt-1 + δ1intt-2 + ut-1 − 3) + vt
     = (γ0 + γ1α0 − 3γ1) + γ1δ0intt-1 + γ1δ1intt-2 + γ1ut-1 + vt.
Now by assumption, ut-1 has zero mean and is uncorrelated with all right-hand-side
variables in the previous equation, except itself of course. So
Cov(intt, ut-1) = E(intt ⋅ ut-1) = γ1E(ut-1²) > 0
because γ1 > 0. If σ² = E(ut²) for all t then Cov(intt, ut-1) = γ1σ². This violates the strict
exogeneity assumption, TS3. While ut is uncorrelated with intt, intt-1, and so on, ut is
correlated with intt+1.
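The covariance result can be verified numerically. A pure-Python simulation of the two-equation system (all parameter values invented for the exercise) recovers Cov(intt, ut-1) ≈ γ1σ²:

```python
import random

random.seed(4)
g0, g1 = 1.0, 0.5           # int equation parameters (gamma0, gamma1)
a0, d0, d1 = 3.0, 0.2, 0.1  # gGDP equation parameters
sigma2 = 1.0                # Var(u_t)
T = 100_000

i_, g_, u_ = [0.0, 0.0], [3.0, 3.0], [0.0, 0.0]
for t in range(2, T):
    u = random.gauss(0, sigma2 ** 0.5)
    u_.append(u)
    g_.append(a0 + d0 * i_[t - 1] + d1 * i_[t - 2] + u)           # gGDP_t
    i_.append(g0 + g1 * (g_[t - 1] - 3) + random.gauss(0, 1))     # int_t

# Sample covariance of int_t with u_{t-1}
mi = sum(i_[2:]) / (T - 2)
mu = sum(u_[1:-1]) / (T - 2)
cov = sum((i_[t] - mi) * (u_[t - 1] - mu) for t in range(2, T)) / (T - 2)
print(abs(cov - g1 * sigma2) < 0.05)  # Cov(int_t, u_{t-1}) = gamma1 * sigma^2
```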
Q3. Wooldridge 10.7
• (i) pet-1 and pet-2 must be increasing by the same amount as pet.
• (ii) The long-run effect, by definition, should be the change in gfr when pe increases
permanently. But a permanent increase means the level of pe increases and stays at the new
level, and this is achieved by increasing pet-2, pet-1, and pet by the same amount.
Q4. Wooldridge C10.10 (intdef_c10_10.do)
• (i) The sample correlation between inf and def is only about .098, which is pretty small.
Perhaps surprisingly, inflation and the deficit rate are practically uncorrelated over this
period. Of course, this is a good thing for estimating the effects of each variable on i3, as it
implies almost no multicollinearity.
• (ii) The equation with the lags is

i3t = 1.61 + .343 inft + .382 inft-1 − .190 deft + .569 deft-1
     (0.40)  (.125)      (.134)       (.221)      (.197)
n = 55, R² = .685, adj. R² = .660.
• (iii) The estimated LRP of i3 with respect to inf is .343 + .382 = .725, which is somewhat
larger than .606, which we obtain from the static model in (10.15). But the estimates are
fairly close considering the size and significance of the coefficient on inft-1.
• (iv) The F statistic for significance of inf t-1 and def t-1 is about 5.22, with p-value ≈ .009. So
they are jointly significant at the 1% level. It seems that both lags belong in the model.