
Economics 326 Problem Set 3 - Due Feb. 22

Book and Lecture Questions

1. Do questions 3.2, 3.5, 3.7, and 3.9 (pp. 106-108) in Wooldridge.

2. Measurement Error Question

Suppose that the true population model is $Y = \beta_0 + \beta_1 X_1 + U$. Assume that the model satisfies the first four of our model assumptions, so our estimates of $\beta_0$ and $\beta_1$ would be unbiased if we could estimate this model. The problem is that we don't observe $X_1$ and instead use an imperfect measure of $X_1$, which we will call $X_1^*$. We can define the measurement error from using $X_1^*$ instead of $X_1$ as $e_1 = X_1^* - X_1$. For this problem we will assume that, on average, there is no measurement error (negative and positive measurement errors cancel out), that is, $E[e_1] = 0$. We will also assume that $U$ is uncorrelated with both $X_1$ (this follows from our standard four assumptions) and $X_1^*$.

Most economists consider two different cases, or additional assumptions, when thinking about whether the estimate of $\beta_1$ will be biased when the independent variable is measured with error. In this problem we will go through both cases.

(a) First, let's assume, in addition to everything above, that $\mathrm{Cov}(X_1^*, e_1) = 0$. Plug our equation for the measurement error into the true population model to determine the model that we estimate. What is the independent variable? Show that the new regression equation has a new error term with two parts. What are the two parts of the new error term?

Our estimate of $\beta_1$ from this new regression equation will be biased if the independent variable is correlated with the new error term, that is, if the covariance between them is not equal to zero. So let's look at the covariance between the independent variable and the new error term. Is the covariance equal to zero? Here you will want to write out the definition of covariance and use our assumptions and the properties of expected value to check whether this covariance equals zero. Show your steps. What do you conclude regarding whether your estimated coefficient for $\beta_1$ will be biased?
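For the covariance checks in this question, the standard definition of covariance and its linearity property may be a useful starting point. In the reminder below, $A$, $B$, and $C$ are generic random variables and $c$ is a constant; these placeholder symbols are introduced only for illustration and are not part of the problem's notation.

```latex
\begin{align*}
\mathrm{Cov}(A, B) &= E\big[(A - E[A])(B - E[B])\big] = E[AB] - E[A]\,E[B], \\
\mathrm{Cov}(A, B + cC) &= \mathrm{Cov}(A, B) + c\,\mathrm{Cov}(A, C), \\
\mathrm{Cov}(A, A) &= \mathrm{Var}(A).
\end{align*}
```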
(b) Now, instead of the additional assumption from the last part ($\mathrm{Cov}(X_1^*, e_1) = 0$), let's assume something different. Don't assume that the imperfect measure is uncorrelated with the measurement error, $\mathrm{Cov}(X_1^*, e_1) = 0$; instead assume that the true variable is: $\mathrm{Cov}(X_1, e_1) = 0$ (all of the assumptions in the setup of the problem still hold). Let's examine whether the independent variable will be correlated with the error term by checking whether $\mathrm{Cov}(X_1^*, e_1) = 0$.

As before, plug our equation for the measurement error into the true population model to determine the model that we estimate. As before, our estimate of $\beta_1$ from this new regression equation will be biased if the independent variable is correlated with the new error term, that is, if the covariance between them is not equal to zero. So let's look at the covariance between the independent variable and the new error term. Is the covariance equal to zero? Here you will want to write out the definition of covariance and use our assumptions and the properties of expected value to check whether this covariance equals zero. Show your steps. What do you conclude regarding whether your estimated coefficient for $\beta_1$ will be biased?

Stata Questions

We will use the same data as we did for Problem Set 2 (ps2data in the PS2 folder under problem sets on Blackboard). We will want to clean the file again before starting this problem set. Copy and paste your code from your .do file for PS2 into the beginning of your .do file for this problem set (PS3). Rerun the code (from PS2 Question 1) that attempts to drop variables with missing and unknown values. Also rerun the code that creates the new variable that is the distance to the nearest highway. After rerunning this code from PS2 Question 1, save the new file as PS3data. We are now ready to begin the new Stata questions.
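The setup steps above might look like the following at the top of the PS3 do-file. This is only a sketch: it assumes ps2data.dta sits in Stata's current working directory, and the PS2 cleaning code is left as a placeholder comment because it is not reproduced here.

```stata
* Sketch of the top of a PS3 do-file (the file location is an assumption)
clear all
use ps2data, clear

* Paste the PS2 Question 1 code here: the commands that drop missing and
* unknown values and that create the distance-to-nearest-highway variable.

* Save the cleaned data under the new name; replace overwrites an old copy
save PS3data, replace
```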

1. Maternal Age and Birthweight

We'll start by examining a medical question: how maternal age is associated with child birthweight.

(a) First, estimate a simple linear regression model with weight as the dependent variable and age_mother as the independent variable. Interpret your estimated coefficient for age in words.

(b) Create a new variable called yhat_slr that contains the predicted values of birthweight based on this regression. To do this, the command is simply predict yhat_slr, xb. Write down the formula to calculate the predicted difference in birthweight between a 19-year-old and a 25-year-old mother, and calculate this value (either by hand or using Stata).

(c) Calculate the predicted difference in birthweight between a 25-year-old and a 31-year-old mother, and compare it to the preceding answer.

(d) The medical literature suggests that very low birthweight (a bad outcome) is more common among both young and old mothers, relative to mothers of average age. Explain why it is impossible to capture such a relationship using SLR.

(e) Create a new variable with the squared age of the mother.

(f) Estimate a multiple linear regression model with both age and age-squared as explanatory variables. Interpret your estimated coefficients (for both variables) in words.

(g) As you did in part (b), create yhat_mlr containing predicted values of birthweight based on this regression. Write down the formula used to calculate the predicted difference in birthweight between a 19-year-old and a 25-year-old mother, and calculate this value (either by hand or using Stata).

(h) Calculate the predicted difference in birthweight between a 25-year-old and a 31-year-old mother, and compare it to the preceding answer.

(i) Graph the predicted values to compare the two regressions visually. Click on the Graphics menu and select Two-way Graphs. First create Plot 1 (Basic category, Scatter) to graph your predictions from the SLR against maternal age. Then create Plot 2 to graph the predictions from the MLR against maternal age. Once you have created the desired graph, with both sets of predicted values shown at once, exit the graph dialogue and write down the command that was automatically submitted by Stata. In the future, you can use this command (replacing variable names as needed) instead of going through the steps of the Graphics module.

(j) Notice that the two predicted functions for y intersect at two points. Figure out what the maternal ages are at those two points (this will be easier to calculate by hand).
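To illustrate the mechanics of fitting a linear and a quadratic specification and comparing predicted differences, here is a minimal sketch on Stata's bundled auto dataset rather than the course data. The variables mpg and weight, the new names weight2, yhat_lin, and yhat_quad, and the comparison points of 2,000 and 3,000 lbs are all choices made for this illustration; _b[varname] refers to a coefficient from the most recent regression.

```stata
* Illustration on Stata's built-in auto data, not the PS3 data
sysuse auto, clear

* Linear fit: the predicted difference depends only on the gap in x
regress mpg weight
predict yhat_lin, xb
display _b[weight] * (3000 - 2000)

* Quadratic fit: add the squared regressor
generate weight2 = weight^2
regress mpg weight weight2
predict yhat_quad, xb

* The predicted difference now also involves the difference in squares
display _b[weight] * (3000 - 2000) + _b[weight2] * (3000^2 - 2000^2)
```

The same pattern should carry over to any pair of regressor values: change the two numbers inside the display expressions.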

2. Partialling-out Interpretation of MLR

(a) Estimate a simple regression model with distance to the closest highway as the dependent variable and age of the mother as the independent variable. (Recall that these are both potential explanatory variables for birthweight.) What is the estimated coefficient for age of the mother? Interpret this coefficient in words.

(b) We want to create a new variable that is equal to the residuals from the last regression. Use the following command in your do-file: predict residuals, resid. Now check the mean of this new variable (residuals). Does this mean make sense (interpret it to the first 5 decimal places)?

(c) Estimate a simple regression with weight as the dependent variable and residuals as the independent variable. What is the estimated coefficient on the residual variable?

(d) Now regress weight on both the age of the mother and the distance to the closest highway. Compare the estimated coefficient on distance to highway with the estimated coefficient on the residual variable in part (c). Explain and interpret any differences or similarities.
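The sequence of regressions in this question can be rehearsed on Stata's bundled auto dataset before turning to the course data. In this sketch, mpg stands in for the outcome, weight and length for the two explanatory variables, and r_length is a name chosen here for the residuals; none of these come from the problem set.

```stata
* Illustration on Stata's built-in auto data, not the PS3 data
sysuse auto, clear

* Step 1: regress one explanatory variable on the other, keep the residuals
regress length weight
predict r_length, resid
summarize r_length

* Step 2: regress the outcome on those residuals alone
regress mpg r_length

* Step 3: regress the outcome on both explanatory variables and compare
* the coefficient on length with the coefficient on r_length from step 2
regress mpg weight length
```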
