
Regression Assignment Instructions

This assignment is due on the last day of instruction, May 8, at 11:59 PM. To submit the assignment, email me an Excel file that includes your answers to all parts of the assignment. The file name should be your last name, an underscore, and first initial (for example, Michael Jackson's file would be called jackson_m.xls). Check your file for viruses! I will mark down your paper by two letter grades if you send me an infected file. I recommend using the campus computer labs to complete the assignment if you don't have a copy of Excel at home. For Mac users, I would also recommend using the campus computers.

Background

A problem of interest to health officials (and others) is to determine the effects of smoking during pregnancy on infant health. One measure of infant health is birth weight; a birth weight that is too low can put an infant at risk for contracting various illnesses. Since factors other than cigarette smoking that affect birth weight are likely to be correlated with smoking, we should take those factors into account.

1. Download the file birth_weight.txt from Moodle. Import this data into an Excel file. Name this worksheet data. Make sure all the data ends up in the correct columns and that the columns are properly named. Here are the variable names and their descriptions:

   Variable Name   Description
   faminc          family income, $1000s
   cigtax          cigarette tax in home state
   cigprice        cigarette price in home state
   bweight         child's birth weight, ounces
   fatheduc        father's years of education
   motheduc        mother's years of education
   parity          birth order of child
   gender          gender of child
   white           =Yes if child is white
   cigs            cigarettes smoked per day while pregnant

2. Create a dummy variable based on the variable gender. The dummy should equal 1 for male children and 0 otherwise. Name this variable gender_dum. Create a dummy variable based on the variable white. The dummy should equal 1 for white children, 0 otherwise. Name this variable white_dum.

3. Create a new variable based on the variables cigtax and cigprice. The new variable should equal the after-tax price of cigarettes. Name this new variable cigcost.

4. Examine the values of the variables motheduc and fatheduc. Sometimes, if data is missing, researchers will assign a value of -999 to observations that have missing values. Remove any such observations from the data set entirely.

5. Perform a simple regression using the following model:

   $bweight = \beta_0 + \beta_1 \, cigs + \varepsilon$

   Name the worksheet with the regression output regression_1. Expand the columns as needed to make the results look nice.

6. In a text box inserted in the regression_1 worksheet, interpret the regression results. Specifically, you should discuss:
   (a) the meaning of each slope coefficient (don't forget the intercept), including an explanation of the effect of the variable on birth weight and whether the coefficient has the sign you would expect
   (b) the statistical significance of each coefficient
   (c) the explanatory power of the regression as a whole, using R-squared
   Be sure to put your explanations of slope coefficients in terms of the original units of measure, as given in the original data file.

7. Now examine the relationship between cigarette smoking and birth weight visually. Create a new worksheet tab named chart. On the new tab, create a scatter plot with a trend line showing the linear relationship. The birth weight variable should be on the y-axis and the number of cigarettes smoked should be on the x-axis. Make the chart look pretty by removing the gridlines and labeling each axis.

8. Now perform a multiple regression, using the same explanatory variables as the first regression, but also include faminc, parity, motheduc, gender, and white as explanatory variables. Expand the columns to make the results look nice. Name the worksheet with the new regression output regression_2.
   (a) In a text box inserted into this worksheet, interpret the results of this regression (as in question 6 above, parts a, b, and c).
   (b) In a second text box in this worksheet, compare your results from this regression to the previous one. Does the second regression do a better job of explaining differences in birth weight?

9. Think carefully about the possible determinants of a child's birth weight. What other variables do you think might be relevant? How would you go about including them in the regression? (You do not need to perform another regression. Just explain in words the approach you might use.) Create a new worksheet called analysis, and put your answer to this question in a text box in that worksheet.

10. Can you think of any other problems or difficulties with the approach we've used? Put your answer in a second text box in the analysis worksheet. (Do not repeat your answer to question 9, and do not expect full credit for pointing out just one potential problem or difficulty. Remember the four basic assumptions of the simple linear regression model.)
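
For anyone who wants to sanity-check the Excel work in steps 2 through 5, the same logic can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the required method: the rows below are made up, the comma-delimited layout and the text codes "male"/"female" and "Yes"/"No" are guesses about the data file, and cigcost is computed as cigprice plus cigtax on the assumption that cigprice is a pre-tax price in the same units as cigtax. The helper names prepare and simple_ols are hypothetical.

```python
import csv
import io

# Made-up rows for illustration only. The column layout, the comma delimiter,
# and the codes "male"/"female" and "Yes"/"No" are assumptions about the file.
RAW = """faminc,cigtax,cigprice,bweight,fatheduc,motheduc,parity,gender,white,cigs
27.5,16.5,122.3,120,12,12,1,male,Yes,0
13.5,16.5,122.3,115,-999,12,2,female,No,10
37.5,20.0,130.0,110,16,14,1,female,Yes,20
18.5,20.0,130.0,118,12,-999,3,male,No,5
42.5,25.0,135.5,105,14,16,2,male,Yes,30
22.5,25.0,135.5,116,12,12,1,female,No,10
"""

MISSING = -999.0  # placeholder value used for missing education data


def prepare(rows):
    """Steps 2-4: add dummies and cigcost, drop rows with missing education."""
    cleaned = []
    for row in rows:
        if float(row["motheduc"]) == MISSING or float(row["fatheduc"]) == MISSING:
            continue  # step 4: remove the observation entirely
        cleaned.append({
            "bweight": float(row["bweight"]),
            "cigs": float(row["cigs"]),
            # Step 2: 1 for male / white children, 0 otherwise.
            "gender_dum": 1 if row["gender"].strip().lower() == "male" else 0,
            "white_dum": 1 if row["white"].strip().lower() == "yes" else 0,
            # Step 3 (assumption): cigprice is pre-tax, same units as cigtax.
            "cigcost": float(row["cigprice"]) + float(row["cigtax"]),
        })
    return cleaned


def simple_ols(x, y):
    """Closed-form least squares for y = b0 + b1*x; returns (b0, b1, r2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return b0, b1, 1 - ss_res / ss_tot


data = prepare(csv.DictReader(io.StringIO(RAW)))
b0, b1, r2 = simple_ols([d["cigs"] for d in data], [d["bweight"] for d in data])
print(f"rows kept: {len(data)}")
print(f"intercept: {b0:.2f} oz, slope: {b1:.2f} oz per cigarette, R^2: {r2:.3f}")
```

The slope and intercept printed here should match what Excel's regression output reports for the same rows, which makes the sketch a quick way to confirm that the -999 observations were actually removed before the regression was run. It does not cover the multiple regression in step 8, the significance tests, or the written interpretation.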
