
Statistics 191

Introduction to Regression Analysis and Applied Statistics
Assignment #1
Due Wednesday, January 21
Prof. J. Taylor

Use R for all calculations. Provide copies of your code in the
assignment.

Q. 1) (Mendenhall & Sincich, 1.51) Many Vietnam veterans have dangerously
high levels of the dioxin TCDD (2,3,7,8-tetrachlorodibenzo-p-dioxin) in
blood and fat tissue as a result of their exposure to the defoliant Agent
Orange. A study published in Chemosphere (Vol. 20, 1990) reported on
the TCDD levels of 20 Massachusetts Vietnam veterans who were possibly
exposed to Agent Orange. The amounts of TCDD (measured in parts per
trillion) in blood plasma are in the table
http://www-stat.stanford.edu/~jtaylo/courses/stats191/data/TCDD.table

(a) Construct a 90% confidence interval for the true mean TCDD level
in the plasma of all Vietnam veterans exposed to Agent Orange.
(b) Interpret the interval in part (a).
(c) What assumptions are you making?
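
For part (a), a t-based interval can be sketched as follows. The column names in the TCDD file are not shown here, so this sketch uses R's built-in mtcars$mpg as a stand-in sample, and the helper name t_interval is hypothetical. It assumes the observations are an independent sample from an approximately normal population.

```r
# Sketch: t-based confidence interval for a population mean.
# `t_interval` is a hypothetical helper; `mtcars$mpg` is a stand-in sample,
# NOT the TCDD data.
t_interval <- function(x, conf_level = 0.90) {
  n     <- length(x)
  xbar  <- mean(x)
  se    <- sd(x) / sqrt(n)                          # estimated standard error
  tcrit <- qt(1 - (1 - conf_level) / 2, df = n - 1) # two-sided critical value
  c(lower = xbar - tcrit * se, upper = xbar + tcrit * se)
}

x <- mtcars$mpg
ci_manual  <- t_interval(x, conf_level = 0.90)
ci_builtin <- t.test(x, conf.level = 0.90)$conf.int # same interval via t.test

print(ci_manual)
print(ci_builtin)
```

To use it on the assignment data, replace x with the TCDD column after loading the table with read.table.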

Q. 2) Use the right-to-work data discussed in class for this question:
http://www.ilr.cornell.edu/~hadi/RABE4/Data4/P005.txt
We are interested in the question of whether right-to-work laws have
affected the unionization rates in various states.
(a) Create a boxplot of the unionization rate, splitting the data by
whether or not a state has a right-to-work law.
(b) Compute the sample mean and sample standard deviation within each
group of states.
(c) Create a histogram of the unionization rate, one for each group of
states.

(d) Compute a 95% confidence interval for the difference in mean
unionization rates between the two groups of states. What assumptions
are you making?
(e) At level α = 0.10, test the null hypothesis that the mean
unionization rate is the same in both groups. What assumptions are you
making? What can you conclude?
(f) Repeat the test in (e) using the function lm.
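
The relationship between parts (b), (d), (e) and (f) can be sketched as follows. Since the layout of P005.txt is not shown here, the sketch uses the built-in mtcars data as a stand-in, with mpg as the response and am as the two-level grouping variable. These are assumptions for illustration only, not the right-to-work data. It also assumes equal variances in the two groups, which is what makes the t-test and the lm fit agree.

```r
# Sketch: comparing two group means with t.test and with lm.
# Stand-in data: built-in `mtcars` (response `mpg`, groups from `am`);
# these are NOT the right-to-work data.
d <- data.frame(y = mtcars$mpg, g = factor(mtcars$am))

# Sample mean and standard deviation within each group
group_means <- tapply(d$y, d$g, mean)
group_sds   <- tapply(d$y, d$g, sd)

# Pooled-variance two-sample t-test, with a 95% interval for the difference
tt <- t.test(y ~ g, data = d, var.equal = TRUE, conf.level = 0.95)

# The same comparison as a regression on the group indicator
fit    <- lm(y ~ g, data = d)
lm_row <- summary(fit)$coefficients[2, ]  # row for the group effect

print(rbind(mean = group_means, sd = group_sds))
print(tt$conf.int)
# The two t-statistics agree up to sign; the p-values are identical
print(c(t_test = unname(tt$statistic), lm = unname(lm_row["t value"])))
print(c(t_test = tt$p.value, lm = unname(lm_row["Pr(>|t|)"])))
```

With the same data frame, boxplot(y ~ g, data = d) gives the side-by-side boxplots, and calling hist on each group's values gives the per-group histograms. For a test at level 0.10, compare the reported p-value with 0.10.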
Q. 3) (a) (RABE, 2.7) Load the Anscombe quartet data into R using the
command read.table. The data are located at
http://www-stat.stanford.edu/~jtaylo/courses/stats191/data/anscombe.table
(b) Attach the table using the command attach.
(c) Plot the 4 data sets on a 2-by-2 grid of plots using the commands
plot and par(mfrow=c(2,2)). Add the number of each plot as the
main title on each plot.
(d) Fit a regression model to the data sets:
• Y1 ~ X1
• Y2 ~ X2
• Y3 ~ X3
• Y4 ~ X4
using the command lm. Verify that all the fitted models have the
exact same coefficients.
(e) Using the command cor, compute the sample correlation for each
data set.
(f) Fit the models with X and Y reversed:
• X1 ~ Y1
• X2 ~ Y2
• X3 ~ Y3
• X4 ~ Y4
Using the output of the command summary, does anything about the
results stay the same when you reverse X and Y?
(g) Compute the SSE, SST and R^2 value for each data set. Use the
commands mean, sum, predict.
(h) Using the command summary, verify that all 4 models have exactly
the same t-statistics for testing the hypotheses H0: β0 = 0 and
H0: β1 = 0.
(i) Using the command abline, replot the data, adding the regression
line to each plot.
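
Parts (d), (e) and (g) can be sketched as follows. This sketch uses R's built-in anscombe data frame, whose columns are named x1..x4 and y1..y4, rather than the course file, whose column names may differ. The helper name fit_stats is hypothetical.

```r
# Sketch for the Anscombe quartet using R's built-in `anscombe` data frame
# (columns x1..x4, y1..y4); the course file may name its columns differently.
data(anscombe)

# `fit_stats` is a hypothetical helper: fit y ~ x and collect summaries
fit_stats <- function(x, y) {
  fit  <- lm(y ~ x)
  yhat <- predict(fit)
  sse  <- sum((y - yhat)^2)     # error (residual) sum of squares
  sst  <- sum((y - mean(y))^2)  # total sum of squares
  c(intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    cor       = cor(x, y),
    SSE       = sse,
    SST       = sst,
    R2        = 1 - sse / sst)
}

# One row of summaries per data set
results <- t(sapply(1:4, function(i) {
  fit_stats(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
}))
rownames(results) <- paste("set", 1:4)

print(round(results, 3))
```

In the built-in data the four fits agree to about two decimal places rather than to full machine precision, and R^2 equals the squared sample correlation in each set. For the plots in (c) and (i), calling plot(x, y, main = i) followed by abline(lm(y ~ x)) inside the same loop is one option.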
