
Homework 3 Solution (Due Oct. 1, Thursday)

Chapter 2 Problems:
[1] Text Problems 1.27, 2.27 and 2.28 (for 2.28, a) and b) only).
[SAS output]
The REG Procedure
Model: MODEL1
Dependent Variable: muscle
Number of Observations Read                   61
Number of Observations Used                   60
Number of Observations with Missing Values     1

Analysis of Variance
                           Sum of        Mean
Source             DF     Squares      Square    F Value    Pr > F
Model               1       11627       11627     174.06    <.0001
Error              58  3874.44750    66.80082
Corrected Total    59       15502

Root MSE          8.17318    R-Square    0.7501
Dependent Mean   84.96667    Adj R-Sq    0.7458
Coeff Var         9.61927

Parameter Estimates
                  Parameter    Standard
Variable    DF     Estimate       Error    t Value    Pr > |t|    95% Confidence Limits
Intercept    1    156.34656     5.51226      28.36      <.0001    145.31257    167.38056
age          1     -1.19000     0.09020     -13.19      <.0001     -1.37054     -1.00945

Output Statistics
      Dependent   Predicted      Std Error
Obs    Variable       Value   Mean Predict        95% CL Mean          95% CL Predict      Residual
  1    106.0000    105.1768         1.8601    101.4534   108.9001     88.3980   121.9555     0.8232
  2    106.0000    107.5567         2.0113    103.5308   111.5827     90.7083   124.4052    -1.5567
  3     97.0000    100.4168         1.5763     97.2615   103.5721     83.7549   117.0787    -3.4168
  4    113.0000    101.6068         1.6444     98.3151   104.8984     84.9185   118.2950    11.3932
  5     96.0000    102.7968         1.7146     99.3647   106.2289     86.0803   119.5133    -6.7968
  6    119.0000    107.5567         2.0113    103.5308   111.5827     90.7083   124.4052    11.4433
  7     92.0000    100.4168         1.5763     97.2615   103.5721     83.7549   117.0787    -8.4168
  8    112.0000    107.5567         2.0113    103.5308   111.5827     90.7083   124.4052     4.4433
  9     92.0000     99.2268         1.5105     96.2032   102.2504     82.5893   115.8642    -7.2268
< outputs for case 10 - case 57 were omitted >
 58     56.0000     65.9069         1.7890     62.3259    69.4879     49.1592    82.6546    -9.9069
 59     70.0000     70.6669         1.5127     67.6390    73.6948     54.0287    87.3051    -0.6669
 60     74.0000     65.9069         1.7890     62.3259    69.4879     49.1592    82.6546     8.0931
 61      .          84.9468         1.0552     82.8347    87.0590     68.4507   101.4430      .

1.27
a) The estimated regression function is $\hat{Y} = 156.35 - 1.1900X$. The linear regression looks like a good
fit in terms of picking up the general trend in how muscle mass changes with age, and supports the
idea that muscle mass is decreasing with age.
b) (1) is simply asking for an estimate of $\beta_1$, which is $-1.1900$. (2) $\hat{Y}_h = 156.35 - 1.1900 \times 60 = 84.95$.
(3) $e_8 = Y_8 - \hat{Y}_8 = 112 - (156.35 - 1.19 \times 41) = 4.44$. (4) $MSE = 66.8$.
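
The arithmetic in part b) can be reproduced with a few lines of Python. This is only a sketch: the coefficients are copied from the SAS Parameter Estimates table, and the helper name `fitted` is introduced here for illustration.

```python
# Point estimates copied from the SAS "Parameter Estimates" table.
b0 = 156.34656   # intercept
b1 = -1.19000    # slope for age

def fitted(age):
    """Fitted mean muscle mass at a given age (hypothetical helper)."""
    return b0 + b1 * age

y_hat_60 = fitted(60)    # part (2): estimated mean response at age 60
e8 = 112 - fitted(41)    # part (3): residual for case 8 (Y = 112, X = 41)

print(round(y_hat_60, 2), round(e8, 2))
```

Using the unrounded intercept instead of 156.35 changes the results only beyond the second decimal place.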
2.27
a) This is a one-sided t-test for $\beta_1$. We are testing $H_0: \beta_1 = 0$ vs $H_a: \beta_1 < 0$. For $\alpha = 0.05$, we
would reject $H_0$ if the observed value of $t = b_1 / s(b_1)$ is less than $-t(0.95, 58) = -1.671$ (an approximation
from table B.2 with degrees of freedom = 60). Since $t_{obs} = -13.19$, reject $H_0$ and conclude $\beta_1 < 0$. As
stated on the assignment, you do not need to calculate an exact p-value.
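
As a quick check of the decision rule in part a), the sketch below recomputes the t statistic from the SAS estimate and standard error and applies the lower-tailed rejection rule. The critical value is the Table B.2 approximation quoted above, not an exact quantile.

```python
b1 = -1.19000      # slope estimate from the SAS output
se_b1 = 0.09020    # standard error of the slope
t_crit = 1.671     # t(0.95, 60) from Table B.2, standing in for df = 58

t_obs = b1 / se_b1
reject_h0 = t_obs < -t_crit   # lower-tailed rule for Ha: beta1 < 0

print(round(t_obs, 2), reject_h0)
```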
b) No. Doing so would be extrapolation: the X data values start at age 40, so the model is fitted only
for ages 40 and above. There is no reason to believe the model will hold at lower ages.
Interpreting $\beta_0$ as E(Y) at X = 0 only makes sense if the linear model works all the way down to 0
(it often doesn't), or when the data include X values near 0.
c) This is just a confidence interval for $\beta_1$, which is given in the SAS output as $(-1.37054, -1.00945)$.
The difference [E(Y) at (X + 1)] - [E(Y) at X] is $\beta_1$ no matter what X is.
2.28
a) From SAS output, the confidence interval for the expected mass at age 60 is (82.8347, 87.0590). We
conclude with 95% confidence that the mean muscle mass at age 60 is between 82.8347 and 87.0590.
Note that this confidence coefficient is interpreted with respect to repeated sampling where the values
of the predictor are kept at the same levels as in the observed sample.
b) From SAS output, the prediction interval for the mass of a randomly selected woman from those
age 60 is (68.4507, 101.4430). Whether the prediction interval is precise or not depends on i) whether the
assumptions of the simple linear regression model are satisfied and ii) how important a range of 68 to
101 is. But, since that is close to the full range of mass values in the data, it looks very imprecise to
me.
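
To see why the prediction interval is so much wider than the confidence interval, both can be rebuilt from the Obs 61 row of the SAS output. The sketch below assumes $t(0.975, 58) \approx 2.0017$, a value that SAS does not print, so the endpoints agree with the output only up to rounding.

```python
import math

y_hat = 84.9468     # predicted value at age 60 (Obs 61)
se_mean = 1.0552    # "Std Error Mean Predict" for Obs 61
mse = 66.80082      # mean square error from the ANOVA table
t_crit = 2.0017     # assumed value of t(0.975, 58)

# A new observation adds the error variance MSE to the variance of the fitted mean.
se_pred = math.sqrt(mse + se_mean ** 2)

ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)   # mean response
pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)   # single new response

print([round(v, 3) for v in ci], [round(v, 3) for v in pi])
```

The MSE term dominates `se_pred`, which is why collecting more data would narrow the confidence interval but would barely change the prediction interval.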

[2]
a) The estimates of $\beta_0$, $\beta_1$, and $\sigma^2$ are $b_0 = 168.6$, $b_1 = 2.03438$, $s^2 = MSE = 10.45893$ and
$s = \sqrt{s^2} = 3.23403$.

b) Using table B.2 in the book, $t(.975, 14) = 2.145$. The 95% confidence interval for $\beta_0$ is
$168.6 \pm 2.145(2.65702) = (162.90125, 174.29875)$ and the 95% confidence interval for $\beta_1$ is
$2.03438 \pm 2.145(.09039) = (1.8405046, 2.2282554)$.
c) The t-statistic of 22.51 is $t_{obs} = 2.03438/0.09039$, which is the estimate divided by its standard error. The p-value
is the probability (before the data are collected), when $H_0$ is true (that is, $H_0: \beta_1 = 0$), that we will get
a t statistic $t^*$ such that $|t^*| > 22.51$. This probability is less than .0001.
d) The value $\hat{Y}_h$ of E(Y) at $X_h = 24$ is $168.6 + 2.03438 \times 24 = 217.42512$.
For the confidence interval at $X_h = 24$, we need to calculate

$$s^2(\hat{Y}_h) = MSE\left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right] = \frac{MSE}{n} + \frac{MSE\,(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}.$$

From outputs above, $MSE = 10.45893$ and $s^2(b_1) = \frac{MSE}{\sum (X_i - \bar{X})^2} = (0.09039)^2$. Since $(X_h - \bar{X})^2 = (24 - 28)^2 = 16$,
$s^2(\hat{Y}_h) = 0.78442$ and $s(\hat{Y}_h) = 0.8857$. Using a critical value of $t(0.975, 14) = 2.145$, we have the
confidence interval for $E(Y_h)$, $217.42512 \pm 2.145 \times 0.8857 = (215.52554, 219.3247)$ (subject to rounding
error).

e) We are now trying to predict the response from a single observation that will be taken at X = 24.
The predicted value is the same as the estimate of the mean at 24, namely 217.42512. $s^2(pred) =
MSE + s^2(\hat{Y}_h) = 10.45893 + 0.78442 = 11.243354$ and $s(pred) = 3.35311$. The prediction interval is
$217.42512 \pm 2.145 \times 3.3531104 = (210.23341, 224.61683)$.
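
Parts d) and e) can be checked numerically without the raw data, because $MSE/\sum (X_i - \bar{X})^2$ is just $s^2(b_1)$. The sketch below makes two assumptions that should be verified against the data set: $n = 16$ (implied by the 14 error degrees of freedom) and $\bar{X} = 28$ (as used in part d). It uses the table value 2.145, so the endpoints differ from those quoted in the last few decimals.

```python
import math

b0, b1 = 168.6, 2.03438
mse = 10.45893
se_b1 = 0.09039    # s(b1); s^2(b1) = MSE / sum((Xi - Xbar)^2)
n = 16             # assumption: n - 2 = 14 error degrees of freedom
x_bar = 28.0       # sample mean of X, as used in part d)
t_crit = 2.145     # t(0.975, 14) from Table B.2

x_h = 24.0
y_hat = b0 + b1 * x_h

# Variance of the estimated mean response, then of a new prediction.
var_mean = mse / n + (x_h - x_bar) ** 2 * se_b1 ** 2
var_pred = mse + var_mean

ci = (y_hat - t_crit * math.sqrt(var_mean), y_hat + t_crit * math.sqrt(var_mean))
pi = (y_hat - t_crit * math.sqrt(var_pred), y_hat + t_crit * math.sqrt(var_pred))

print(round(var_mean, 5), round(math.sqrt(var_pred), 5))
print([round(v, 3) for v in ci], [round(v, 3) for v in pi])
```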
2.29
b) The following is the ANOVA table from SAS
Analysis of Variance
                           Sum of        Mean
Source             DF     Squares      Square    F Value    Pr > F
Model               1       11627       11627     174.06    <.0001
Error              58  3874.44750    66.80082
Corrected Total    59       15502


c) Here we are testing $H_0: \beta_1 = 0$ vs $H_a: \beta_1 \neq 0$. Reject $H_0$ if $F_{obs} > F(0.95; 1, 58) = 4.007$. Here,
$F_{obs} = 174.06$ from the SAS output, so reject $H_0$. Also, reject $H_0$ if p-value $< \alpha = 0.05$. From the
SAS output, p-value < .0001, which is less than 0.05. So reject $H_0$. Conclude that there is a significant
linear relationship between muscle mass and age.
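
The entries of the ANOVA table are tied together by simple ratios, which the following sketch recomputes. SAS prints the model sum of squares rounded to an integer, so the recomputed F agrees with 174.06 only approximately. The critical value 4.007 is the one quoted in part c).

```python
import math

ssr, df_model = 11627.0, 1       # model sum of squares (printed rounded by SAS)
sse, df_error = 3874.44750, 58   # error sum of squares

msr = ssr / df_model
mse = sse / df_error             # mean square error
f_obs = msr / mse                # F statistic for H0: beta1 = 0
root_mse = math.sqrt(mse)

f_crit = 4.007                   # F(0.95; 1, 58) as quoted in the solution
reject_h0 = f_obs > f_crit

print(round(mse, 5), round(root_mse, 5), round(f_obs, 2), reject_h0)
```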

Parameter Estimates
                  Parameter    Standard
Variable    DF     Estimate       Error    t Value    Pr > |t|
Intercept    1    156.34656     5.51226      28.36      <.0001
age          1     -1.19000     0.09020     -13.19      <.0001

d) $1 - R^2 = 1 - 0.7501 = 0.2499$ is the proportion of variation left unexplained, which is relatively small.


Root MSE          8.17318    R-Square    0.7501
Dependent Mean   84.96667    Adj R-Sq    0.7458
Coeff Var         9.61927

e) $R^2 = SSR/SSTO = 0.7501$ (or from SAS output). $r = -\sqrt{R^2} = -0.866$. r is negative since the
slope of the regression line is negative from SAS output.

Pearson Correlation Coefficients, N = 60
Prob > |r| under H0: Rho=0
              muscle         age
muscle       1.00000    -0.86606
                          <.0001
age         -0.86606     1.00000
              <.0001
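
The sign rule for r in simple linear regression can be written as a one-line helper. The function name `signed_r` is hypothetical and used only for illustration; the inputs are the R-Square and slope values from the SAS output.

```python
import math

def signed_r(r_squared, slope):
    """Return sqrt(R^2) carrying the sign of the fitted slope."""
    return math.copysign(math.sqrt(r_squared), slope)

r_muscle = signed_r(0.7501, -1.19000)
print(round(r_muscle, 3))
```

Because R-Square is printed to four decimals, the result matches the Pearson table value of -0.86606 only to about four decimal places.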

2.30 See the following SAS outputs


Parameter Estimates
                    Parameter      Standard
Variable      DF     Estimate         Error    t Value    Pr > |t|    99% Confidence Limits
Intercept      1        20518    3277.64269       6.26      <.0001         11874         29161
population     1   -170.57519      41.57433      -4.10      <.0001    -280.21182     -60.93856


a) Here we are testing $H_0: \beta_1 = 0$ vs $H_a: \beta_1 \neq 0$. Reject $H_0$ if $|t_{obs}| > t(0.995; 82) = 2.637$.
Here, $|t_{obs}| = |-4.1| = 4.1$ from the SAS output. Since $|t_{obs}| = 4.1 > 2.637$, we reject $H_0$. The p-value
is < 0.0001, from the SAS output. We conclude that there is a significant linear association between
crime rate and percentage of high school graduates.
c) The 99% CI for $\beta_1$ is $b_1 \pm t(0.995, n - 2)\,s(b_1) = -170.58 \pm 2.637(41.57) = (-280.2, -60.94)$.
2.31
a) See the following SAS outputs
Analysis of Variance
                           Sum of        Mean
Source             DF     Squares      Square    F Value    Pr > F
Model               1    93462942    93462942      16.83    <.0001
Error              82   455273165     5552112
Corrected Total    83   548736108


b) Here we are testing $H_0: \beta_1 = 0$ vs $H_a: \beta_1 \neq 0$. Reject $H_0$ if $F_{obs} > F(0.99; 1, 82) = 6.954$. Here,
$F_{obs} = 16.83$ from the SAS output, so reject $H_0$. Also, reject $H_0$ if p-value $< \alpha = 0.01$. From the SAS
output, p-value < .0001, which is less than 0.01. So reject $H_0$. We conclude that there is a significant
linear association between crime rate and percentage of high school graduates.

Root MSE         2356.29195    R-Square    0.1703
Dependent Mean   7111.20238    Adj R-Sq    0.1602
Coeff Var          33.13493

Here $|t_{obs}|^2 = F$, since $(-4.1)^2 \approx 16.83$ (with some rounding error). Yes, the p-value for the F-test is
the same as that for the t-test.
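
The "rounding error" mentioned here comes from squaring the two-decimal t value. A small sketch using the unrounded estimate and standard error from the Parameter Estimates table shows that the identity $t^2 = F$ holds to the precision SAS prints.

```python
b1 = -170.57519    # slope estimate from the SAS output
se_b1 = 41.57433   # standard error of the slope

t_obs = b1 / se_b1
f_from_t = t_obs ** 2             # equals the F statistic in simple linear regression

f_from_rounded_t = (-4.10) ** 2   # what squaring the printed t value gives

print(round(t_obs, 4), round(f_from_t, 2), round(f_from_rounded_t, 2))
```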
c) The percent explained is about 17% ($R^2 = 0.1703$). This seems to be somewhat low.

d) $r = -\sqrt{R^2} = -0.4127$. This is negative since the regression slope is negative.


Pearson Correlation Coefficients, N = 84
Prob > |r| under H0: Rho=0
                  rate    population
rate           1.00000      -0.41270
                              <.0001
population    -0.41270       1.00000
                <.0001

[3] Text Problems 2.1.

a) The 95% confidence interval for $\beta_1$ allows us to test $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$ at $\alpha = .05$ by
rejecting $H_0$ if 0 is not in the interval, which it is not. Hence the conclusion is warranted with an
implied level of significance of .05. This assumes that a plot shows that the simple linear regression
model is reasonable.
b) First, there is no reason to be sure that the regression relationship used still holds at low values of
X. If it does not, then we would not interpret $\beta_0$ as the expected sales at population 0 (in this case, at
zero population you would have zero sales, exactly).
