HONOURS ECONOMETRICS TUTORIAL 7

MORE ON SPECIFICATION AND DATA ISSUES


06 APRIL 2011
ECO4016F
Part A: Problems
Answer any three questions in this section
1. (a) If the true model is $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$ but you fit
$Y_i = \alpha_1 + \alpha_2 X_{2i} + v_i$, what
model specification error will you have committed? What will be the properties
of $\hat{\alpha}_1$ and $\hat{\alpha}_2$ with respect to bias if $X_2$ and $X_3$ are uncorrelated?
(b) What are the differences between an outlier, an observation with high leverage
and an influential observation?
2. (a) For a two variable regression model $Y_i = \beta_1 + \beta_2 X_i + u_i$, show that when there
are errors of measurement in $X$ (rather than $Y$) the explanatory variable and the
error term are correlated.
(b) If the true model is $Y_i = \alpha_1 + \alpha_2 X_{2i} + v_i$ but you fit
$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$, what
will be the properties of $\hat{\beta}_1$, $\hat{\beta}_2$ and $\hat{\beta}_3$ with respect to bias? What is the value of
$E(\hat{\beta}_3)$?
3. Let maths denote the percentage of students at a Western Cape high school receiving
a passing score on a standardised maths test. We are interested in estimating the effect
of per student spending (expend) on maths performance. A simple model is
$\mathit{maths} = \beta_0 + \beta_1 \log(\mathit{expend}) + \beta_2 \log(\mathit{enroll}) + \beta_3\,\mathit{poverty} + u,$
where enroll is student enrollment (to reflect school size) and poverty is the percentage
of students living in poverty.
(a) The variable bobs4good is the percentage of students eligible to receive school
shoes from Bobby Skinstad's bobsforgood foundation (http://bobsforgood.co.za).
Why is this a sensible proxy for poverty?
(b) The table that follows contains OLS estimates, with and without bobs4good as an
explanatory variable.

Dependent Variable: maths

Independent Variables       (1)         (2)
log(expend)                11.13        7.75
                           (3.30)      (3.04)
log(enroll)                 .022       -1.26
                           (.615)       (.58)
bobs4good                              -.324
                                       (.036)
intercept                 -69.24      -23.14
                          (26.72)     (24.99)
Observations                428         428
R-squared                  .0297       .1893

Explain why the effect of expenditures on maths is lower in column (2) than in
column (1). Is the effect in column (2) still statistically greater than zero?
(c) Does it appear that pass rates are lower at larger schools, other factors being equal?
Explain.
4. We are interested in estimating a model relating number of campus crimes to student
enrollment for a sample of universities in 2006. The sample we have is not a random
sample of universities in South Africa, because many universities did not report campus
crimes in 2006. Do you think that university failure to report crimes can be viewed as
exogenous sample selection? Explain.
5. The following equation explains weekly hours of television viewing by a child in terms
of the child's age, mother's education, father's education, and number of siblings:
$\mathit{tvhours}^* = \beta_0 + \beta_1\,\mathit{age} + \beta_2\,\mathit{age}^2 + \beta_3\,\mathit{motheduc} + \beta_4\,\mathit{fatheduc} + \beta_5\,\mathit{sibs} + u.$
We are worried that $\mathit{tvhours}^*$ is measured with error in our survey. Let tvhours denote the
reported hours of television viewing per week.
(a) What do the classical errors-in-variables (CEV) assumptions require in this
application?
(b) Do you think the CEV assumptions are likely to hold? Explain.
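
The measurement-error questions in Part A (Questions 2(a) and 5) can be explored on simulated data. The following is a minimal sketch and not part of the original tutorial: the sample size, the true slope of 2 and the unit error variances are assumptions chosen purely for illustration. It contrasts the OLS slope when the regressor is measured with classical error against the slope when only the dependent variable is mismeasured.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # large sample so estimates sit close to their probability limits


def ols_slope(x, y):
    """Slope from a simple regression of y on x (intercept included)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


beta1 = 2.0                                # assumed true slope (illustrative)
x_true = rng.normal(0.0, 1.0, n)           # correctly measured regressor, variance 1
u = rng.normal(0.0, 1.0, n)                # structural error
y_true = 1.0 + beta1 * x_true + u

# Case 1: classical measurement error in the regressor.
x_obs = x_true + rng.normal(0.0, 1.0, n)   # error independent of x_true and u
slope_x_err = ols_slope(x_obs, y_true)

# Case 2: measurement error in the dependent variable only.
y_obs = y_true + rng.normal(0.0, 1.0, n)   # error independent of x_true
slope_y_err = ols_slope(x_true, y_obs)

# Under these assumptions the Case 1 slope converges to
# beta1 * var(x_true) / (var(x_true) + var(error)) = 2 * 1 / (1 + 1) = 1,
# while the Case 2 slope still converges to beta1 = 2.
print(f"error in x: {slope_x_err:.3f}   error in y: {slope_y_err:.3f}")
```

In this design the error in the dependent variable only adds noise, whereas the error in the regressor pulls the slope towards zero. Whether the independence assumptions built into the simulation are plausible for survey data is exactly what Question 5(b) asks.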
Part B: Computer Exercises
We'll go over Questions 2, 3 and 4 in the tutorial. Questions 1 and 5 are
homework.
1. Use the data set WAGE2.dta for this exercise. The dataset contains information on
monthly earnings, education, several demographic variables, and IQ scores for 935 men
in 1980.
(a) Apply RESET to the model
$\log(\mathit{wage}) = \beta_0 + \beta_1\,\mathit{educ} + \beta_2\,\mathit{exper} + \beta_3\,\mathit{tenure} + u$
Is there evidence of functional form misspecification in the model?
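(b) Use the Davidson-MacKinnon test to test the model
$\log(\mathit{wage}) = \beta_0 + \beta_1\,\mathit{educ} + \beta_2\,\mathit{exper} + \beta_3\,\mathit{tenure} + u$   (1)
against the model
$\log(\mathit{wage}) = \beta_0 + \beta_1 \log(\mathit{educ}) + \beta_2 \log(\mathit{exper}) + \beta_3 \log(\mathit{tenure}) + u$   (2)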
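(c) Now estimate the following model
$\log(\mathit{wage}) = \beta_0 + \beta_1\,\mathit{educ} + \beta_2\,\mathit{exper} + \beta_3\,\mathit{tenure} + \beta_4\,\mathit{married} + \beta_5\,\mathit{south} + \beta_6\,\mathit{urban} + \beta_7\,\mathit{black} + \beta_8\,\mathit{IQ} + u$
where IQ controls for omitted ability bias.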
(d) Now use the variable KWW (the knowledge of the world of work test score) as
a proxy for ability in place of IQ. What is the estimated return to education in
this case?
(e) Now use IQ and KWW together as proxy variables. What happens to the
estimated return to education?
(f) In part (e), are IQ and KWW individually significant? Are they jointly
significant?
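
For part (a) of this question, the mechanics of RESET can be sketched as follows. This is an illustrative sketch on simulated data rather than on WAGE2.dta, and the helper names `ols` and `reset_f`, the sample size and the coefficients are all assumptions introduced here. The version shown adds the squared and cubed fitted values to the original regression and computes the F statistic for their joint significance.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients and the sum of squared residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)


def reset_f(X, y):
    """F statistic for adding yhat^2 and yhat^3 to the original regression."""
    n, k = X.shape
    beta, ssr_r = ols(X, y)                  # restricted: the original model
    yhat = X @ beta
    X_aug = np.column_stack([X, yhat**2, yhat**3])
    _, ssr_ur = ols(X_aug, y)                # unrestricted: powers of yhat added
    q = 2                                    # number of added terms
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - q))


# Simulated data (hypothetical): one regressor, two alternative outcomes.
rng = np.random.default_rng(42)
n = 500
x = rng.uniform(0.0, 4.0, n)
u = rng.normal(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])

y_linear = 1.0 + 0.5 * x + u                 # the linear model is correct
y_curved = 1.0 + 0.5 * x + 0.4 * x**2 + u    # the linear model omits x^2

f_linear = reset_f(X, y_linear)
f_curved = reset_f(X, y_curved)
print(f"RESET F, correct model: {f_linear:.2f}")
print(f"RESET F, misspecified model: {f_curved:.2f}")
```

Each statistic is compared with an F distribution with 2 and n - k - 2 degrees of freedom; with samples this large the 5% critical value is roughly 3.0. A rejection signals some functional form problem but does not say which alternative specification is better, which is where the Davidson-MacKinnon test in part (b) comes in.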
2. You need to use two datasets for this exercise, JTRAIN2.dta and JTRAIN3.dta. The
former is an outcome of a job training experiment. The file JTRAIN3.dta contains
observational data, where individuals largely determine whether they participate in
job training. The datasets cover the same time period.
(a) In the dataset JTRAIN2.dta, what fraction of the men received job training?
What is the fraction in JTRAIN3.dta? Why do you think there is such a big
difference?
(b) Using JTRAIN2.dta, run a simple regression of re78 on train. What is the
estimated effect of participating in job training on real earnings?
(c) Now add as controls to the regression in part (b) the variables
re74, re75, educ, age, black, and hisp.
Does the estimated effect of job training on re78 change much? How come? (Hint:
Remember that these are experimental data.)
(d) Do the regressions in parts (b) and (c) using the data in JTRAIN3.dta, reporting
only the estimated coefficients on train, along with their t-statistics. What is the
effect now of controlling for the extra factors, and why?
(e) Define avgre = (re74 + re75)/2. Find the sample averages, standard deviations,
and minimum and maximum values in the two datasets. Are these datasets
representative of the same populations in 1978?
(f) Almost 96% of the men in the dataset JTRAIN2.dta have avgre less than $10,000.
Using only these men, run the regression of
re78 on train, re74, re75, educ, age, black, and hisp
and report the training estimate and its t statistic. Run the same regression
for JTRAIN3.dta, using only men with avgre ≤ 10. For the subsample of low
income men, how do the estimated training effects compare across experimental
and nonexperimental data sets?
(g) Now use each data set to run the simple regression re78 on train, but only for men
who were unemployed in 1974 and 1975. How do the training estimates compare
now?
(h) Using your findings from the previous regressions, discuss the potential
importance of having comparable populations underlying comparisons of experimental
and nonexperimental estimates.
3. Use the state-level data on murder rates and executions in MURDER.dta for the
following questions. The variable mrdrte is the murder rate, that is, the number
of murders per 100,000 people. The variable exec is the total number of prisoners
executed for the current and prior two years; unem is the state unemployment rate.
Use the data for the year 1993 for this question, although you will need to first obtain
the lagged murder rate, say $\mathit{mrdrte}_{-1}$.
(a) Run the regression of mrdrte on exec, unem. What are the coefficient and t
statistic on exec?
(b) How many executions are reported for Texas during 1993? (Actually, this is the
sum of executions for the current and past two years.) How does this compare
with the other states? Add a dummy variable for Texas to the regression in part
(a). Is its t statistic unusually large? From this, does it appear Texas is an
outlier?
(c) To the regression in part (a) add the lagged murder rate. What happens to
$\hat{\beta}_{exec}$ and its statistical significance?
(d) For the regression in part (c), does it appear Texas is an outlier? What is the
effect on $\hat{\beta}_{exec}$ from dropping Texas from the regression?
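
The single-observation dummy used for Texas in this question has a useful algebraic property that can be checked numerically. The sketch below uses simulated data, not MURDER.dta, and the planted unusual observation, the variable names and the helper `ols` are assumptions made for illustration. It shows that adding a dummy for one observation gives the same remaining coefficients as dropping that observation, and that the dummy coefficient equals that observation's prediction error from the regression without it.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)
j = 0
x[j], y[j] = 6.0, 12.0        # make observation j unusual in both x and y

X = np.column_stack([np.ones(n), x])
dummy = np.zeros(n)
dummy[j] = 1.0                # equals one only for observation j


def ols(X, y):
    """OLS coefficients with conventional (homoskedastic) standard errors."""
    n_obs, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    se = np.sqrt(np.diag(XtX_inv) * (e @ e) / (n_obs - k))
    return b, se


# Regression with the dummy versus regression with observation j dropped.
b_dummy, se_dummy = ols(np.column_stack([X, dummy]), y)
keep = np.arange(n) != j
b_drop, _ = ols(X[keep], y[keep])

t_dummy = b_dummy[2] / se_dummy[2]   # t statistic on the single-observation dummy
print("with dummy:", b_dummy[:2], " dropping obs j:", b_drop)
print("dummy coefficient:", b_dummy[2], " t statistic:", t_dummy)
```

Because the dummy fits observation j perfectly, its t statistic measures how far that observation lies from the line fitted to the rest of the sample, which is why it is a natural outlier check.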
4. Use the dataset JTRAIN.dta for Michigan manufacturing firms.
(a) Consider the simple regression model
$\log(\mathit{scrap}) = \beta_0 + \beta_1\,\mathit{grant} + u,$
where scrap is the firm scrap rate and grant is a dummy variable indicating
whether a firm received a job training grant. Can you think of some reasons why
the unobserved factors in u might be correlated with grant?
(b) Estimate the simple regression model using the data for 1988. (You should have
54 observations.) Does receiving a job training grant significantly lower a firm's
scrap rate?
(c) Now add as an explanatory variable $\log(\mathit{scrap}_{87})$. How does this change the
estimated effect of grant? Interpret the coefficient on grant. Is it statistically
significant at the 5% level against the one-sided alternative $H_1: \beta_{grant} < 0$?
(d) Test the null hypothesis that the parameter on $\log(\mathit{scrap}_{87})$ is one against the
two-sided alternative. Report the p-value for the test.
(e) Repeat parts (c) and (d), using heteroskedasticity-robust standard errors, and
briefly discuss any notable differences.
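
Parts (d) and (e) combine a test that a coefficient equals one with heteroskedasticity-robust standard errors. A minimal sketch of both calculations follows. It uses simulated data rather than JTRAIN.dta, and the helper `ols_se`, the data-generating process and the choice of the HC1 degrees-of-freedom correction are assumptions made here for illustration, not the tutorial's prescribed method.

```python
import numpy as np


def ols_se(X, y):
    """OLS coefficients with conventional and HC1 robust standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    se_usual = np.sqrt(np.diag(XtX_inv) * (e @ e) / (n - k))
    meat = X.T @ (X * (e**2)[:, None])          # sum over i of e_i^2 * x_i x_i'
    V_robust = (n / (n - k)) * XtX_inv @ meat @ XtX_inv
    return b, se_usual, np.sqrt(np.diag(V_robust))


# Simulated data (hypothetical): true slope is one, error variance rises with |x|.
rng = np.random.default_rng(7)
n = 2000
x = rng.normal(size=n)
u = rng.normal(size=n) * (0.5 + np.abs(x))
y = 0.2 + 1.0 * x + u
X = np.column_stack([np.ones(n), x])

b, se_usual, se_robust = ols_se(X, y)

# t statistics for H0: slope = 1 (not the default H0: slope = 0).
t_usual = (b[1] - 1.0) / se_usual[1]
t_robust = (b[1] - 1.0) / se_robust[1]
print(f"slope = {b[1]:.3f}")
print(f"usual se = {se_usual[1]:.4f}, t = {t_usual:.2f}")
print(f"robust se = {se_robust[1]:.4f}, t = {t_robust:.2f}")
```

The point to note is that the t statistic subtracts the hypothesised value of one before dividing by the standard error, so it differs from the t statistic regression software prints by default. In this simulated design the robust standard error on the slope is larger than the conventional one, but with real data the difference can go either way.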
5. Use the file CHICKEN.dta to study the demand for chicken in the US, 1960-1982. This
file contains data for the following variables:
$Y$ = per capita consumption of chickens, in kg
$X_2$ = real disposable income per capita, in dollars
$X_3$ = real retail price of chicken per kg, in cents
$X_4$ = real retail price of pork per kg, in cents
$X_5$ = real retail price of beef per kg, in cents
$X_6$ = composite real price of chicken substitutes per kg, in cents (which is a weighted
average of the real retail prices per kg of pork and beef, the weights being the relative
consumptions of beef and pork in total beef and pork consumption).
Now consider the following demand functions:
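$\ln Y_t = \alpha_1 + \alpha_2 \ln X_{2t} + \alpha_3 \ln X_{3t} + u_t$   (1)
$\ln Y_t = \gamma_1 + \gamma_2 \ln X_{2t} + \gamma_3 \ln X_{3t} + \gamma_4 \ln X_{4t} + u_t$   (2)
$\ln Y_t = \lambda_1 + \lambda_2 \ln X_{2t} + \lambda_3 \ln X_{3t} + \lambda_5 \ln X_{5t} + u_t$   (3)
$\ln Y_t = \theta_1 + \theta_2 \ln X_{2t} + \theta_3 \ln X_{3t} + \theta_4 \ln X_{4t} + \theta_5 \ln X_{5t} + u_t$   (4)
$\ln Y_t = \beta_1 + \beta_2 \ln X_{2t} + \beta_3 \ln X_{3t} + \beta_6 \ln X_{6t} + u_t$   (5)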
From microeconomic theory it is known that the demand for a commodity generally
depends on the real income of the consumer, the real price of the commodity, and the real
prices of competing and complementary commodities. In view of these considerations,
answer the following questions.
(a) Which demand function among the ones given here would you choose, and why?
What is the difference between specifications (2) and (4)? What problems do you
foresee if you adopt specification (4)?
(b) Since specification (5) includes the composite price of beef and pork, would you
prefer the demand function (5) to function (4)? Why? Are pork and/or beef
competing or substitute products to chicken? How do you know?
(c) Assume function (5) is the correct demand function. Estimate the parameters
of this model, obtain their standard errors, and $R^2$, adjusted-$R^2$. Interpret your
results. Now suppose you run the incorrect model (2). Assess the consequences
of this misspecification by considering the values of $\gamma_2$ and $\gamma_3$ in relation to $\beta_2$
and $\beta_3$ respectively.
(d) Assume now that model (1) is the true demand function. If we now estimate model
(5), what type of specification error is committed in this instance? What are the
theoretical consequences of this type of specification error? Illustrate with the
data at hand.
(e) Are models (2) and (3) nested models? Motivate. How do you decide the model
to adopt between the two of them? Which one is preferable?
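
When comparing coefficients across the specifications in this question, it helps to recall how the estimates from a short regression relate to those from a longer one. The sketch below is illustrative only: it uses simulated data instead of CHICKEN.dta, and the coefficients, the variable names and the helper `ols` are assumptions. It checks the in-sample identity that the short-regression slope equals the long-regression slope plus the omitted variable's coefficient times the slope from regressing the omitted variable on the included one.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x2 = rng.normal(size=n)
x3 = 0.6 * x2 + rng.normal(size=n)            # x3 is correlated with x2
y = 1.0 + 0.8 * x2 - 0.5 * x3 + rng.normal(size=n)


def ols(X, y):
    """OLS coefficient vector."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


ones = np.ones(n)
b_long = ols(np.column_stack([ones, x2, x3]), y)   # includes x3
a_short = ols(np.column_stack([ones, x2]), y)      # omits x3
delta = ols(np.column_stack([ones, x2]), x3)       # auxiliary: x3 on x2

# Identity: short slope = long slope on x2 + (coefficient on x3) * (slope of x3 on x2)
implied_short = b_long[1] + b_long[2] * delta[1]
print(f"short-regression slope on x2: {a_short[1]:.4f}")
print(f"implied by the identity:      {implied_short:.4f}")
```

The two printed numbers agree up to rounding because the identity is algebraic rather than a large-sample approximation. The same decomposition shows why the bias vanishes when the omitted and included regressors are uncorrelated in the sample, which is the case raised in Part A, Question 1(a).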