1, Thursday)
Chapter 2 Problems:
[1] Text Problems 1.27, 2.27 and 2.28 (for 2.28, a) and b) only).
[SAS output]
The REG Procedure
Model: MODEL1
Dependent Variable: muscle

Number of Observations Read                   61
Number of Observations Used                   60
Number of Observations with Missing Values     1
Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1            11627         11627    174.06   <.0001
Error             58       3874.44750      66.80082
Corrected Total   59            15502

Root MSE           8.17318    R-Square   0.7501
Dependent Mean    84.96667    Adj R-Sq   0.7458
Coeff Var          9.61927
Parameter Estimates

Variable    DF   Parameter   Standard   t Value   Pr > |t|   95% Confidence Limits
                  Estimate      Error
Intercept    1   156.34656    5.51226     28.36     <.0001           …   167.38056
age          1    -1.19000    0.09020    -13.19     <.0001    -1.37054    -1.00945
Output Statistics

Obs  Dependent  Predicted     Std Error     95% CL Mean        95% CL Predict     Residual
      Variable      Value  Mean Predict
  1   106.0000   105.1768        1.8601  101.4534  108.9001   88.3980  121.9555    0.8232
  2   106.0000   107.5567        2.0113  103.5308  111.5827   90.7083  124.4052   -1.5567
  3    97.0000   100.4168        1.5763   97.2615  103.5721   83.7549  117.0787   -3.4168
  4   113.0000   101.6068        1.6444   98.3151  104.8984   84.9185  118.2950   11.3932
  5    96.0000   102.7968        1.7146   99.3647  106.2289   86.0803  119.5133   -6.7968
  6   119.0000   107.5567        2.0113  103.5308  111.5827   90.7083  124.4052   11.4433
  7    92.0000   100.4168        1.5763   97.2615  103.5721   83.7549  117.0787   -8.4168
  8   112.0000   107.5567        2.0113  103.5308  111.5827   90.7083  124.4052    4.4433
  9    92.0000    99.2268        1.5105   96.2032  102.2504   82.5893  115.8642   -7.2268
...
 58    56.0000    65.9069        1.7890   62.3259   69.4879   49.1592   82.6546   -9.9069
 59    70.0000    70.6669        1.5127   67.6390   73.6948   54.0287   87.3051   -0.6669
 60    74.0000    65.9069        1.7890   62.3259   69.4879   49.1592   82.6546    8.0931
 61          .    84.9468        1.0552   82.8347   87.0590   68.4507  101.4430         .
1.27
a) The estimated regression function is $\hat{Y} = 156.35 - 1.1900X$. The linear regression looks like a good
fit in terms of picking up the general trend in how muscle mass changes with age, and it supports the
idea that muscle mass decreases with age.
b) (1) asks for the point estimate of $\beta_1$, which is $b_1 = -1.1900$. (2) $\hat{Y}_h = 156.35 - 1.1900(60) = 84.95$.
(3) $e_8 = Y_8 - \hat{Y}_8 = 112 - (156.35 - 1.19(41)) = 4.44$. (4) $MSE = 66.8$.
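The arithmetic in (2) and (3) is easy to double-check by coding the fitted line directly. This is a minimal Python sketch that hard-codes the rounded estimates b0 = 156.35 and b1 = -1.19 used above; the helper names `fitted` and `residual` are illustrative only.

```python
# Rounded least-squares estimates for the muscle mass data (from the SAS output).
B0 = 156.35  # intercept b0
B1 = -1.19   # slope b1: estimated change in mean muscle mass per additional year of age


def fitted(x: float) -> float:
    """Estimated mean response Y-hat = b0 + b1 * x at age x."""
    return B0 + B1 * x


def residual(y: float, x: float) -> float:
    """Residual e = Y - Y-hat for an observed case (x, y)."""
    return y - fitted(x)


print(f"(2) Y-hat at age 60: {fitted(60):.2f}")
print(f"(3) e_8 (age 41, mass 112): {residual(112, 41):.2f}")
```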
2.27
a) This is a one-sided t-test for $\beta_1$. We are testing $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 < 0$. For $\alpha = 0.05$, we
would reject $H_0$ if the observed value of $t^* = b_1 / s(b_1)$ is less than $-t(0.95; 58) = -1.671$ (an approximation
from Table B.2 with 60 degrees of freedom). Since $t_{obs} = -13.19$, we reject $H_0$ and conclude $\beta_1 < 0$. As
stated on the assignment, you do not need to calculate an exact p-value.
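The decision rule in a) can be written out directly. This sketch uses the rounded SAS values and the table critical value 1.671 quoted above; it only checks the arithmetic and is not meant as a replacement for the SAS output.

```python
# Lower-tail t-test of H0: beta1 = 0 vs Ha: beta1 < 0 (values from the SAS output).
b1 = -1.19000    # slope estimate
se_b1 = 0.09020  # standard error s(b1)
t_crit = 1.671   # t(0.95; 58), approximated from Table B.2 using 60 df

t_obs = b1 / se_b1
# Reject H0 when t_obs falls in the lower tail, i.e. below -t(0.95; 58).
reject_h0 = t_obs < -t_crit
print(f"t_obs = {t_obs:.2f}; reject H0: {reject_h0}")
```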
b) No. Doing so would be extrapolation: the X values in the data start at age 40, so the model is fitted
only for ages 40 and above. There is no reason to believe the model holds at lower ages.
Interpreting $\beta_0$ as $E(Y)$ at $X = 0$ only makes sense if the linear model works all the way down to 0
(it often doesn't) or when the data include X values near 0.
c) This is just a confidence interval for $\beta_1$, which is given in the SAS output as (-1.37054, -1.00945).
The difference $E(Y \mid X + 1) - E(Y \mid X)$ equals $\beta_1$ no matter what X is.
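The interval in c) is $b_1 \pm t(0.975; 58)\, s(b_1)$. The sketch below reproduces it; the multiplier 2.0017 for t(0.975; 58) is taken from a standard t table and is an assumption here, since the text does not print it. It also checks numerically that the difference in estimated mean response for a one-year age difference equals the slope, whatever the starting age.

```python
b0 = 156.34656   # intercept estimate
b1 = -1.19000    # slope estimate
se_b1 = 0.09020  # standard error of b1
t_975 = 2.0017   # t(0.975; 58); assumed table value, not printed in the text

half_width = t_975 * se_b1
ci_b1 = (b1 - half_width, b1 + half_width)
print(f"95% CI for beta1: ({ci_b1[0]:.5f}, {ci_b1[1]:.5f})")


def mean_response(x: float) -> float:
    """Estimated E(Y) at age x under the fitted line."""
    return b0 + b1 * x


# Estimated E(Y) at (X + 1) minus estimated E(Y) at X is b1 for every X.
for age in (45, 60, 75):
    diff = mean_response(age + 1) - mean_response(age)
    print(age, round(diff, 5))
```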
2.28
a) From the SAS output (observation 61), the 95% confidence interval for the expected mass at age 60 is (82.8347, 87.0590). We
conclude with 95% confidence that the mean muscle mass of women aged 60 is between 82.8347 and 87.0590.
Note that the confidence coefficient is interpreted with respect to repeated sampling in which the values
of the predictor are kept at the same levels as in the observed sample.
b) From the SAS output, the 95% prediction interval for the mass of a randomly selected woman aged 60
is (68.4507, 101.4430). Whether this prediction interval is precise depends on i) whether the
assumptions of the simple linear regression model are satisfied and ii) how important a range of 68 to
101 is in practice. Since that range is close to the full range of mass values in the data, it looks very
imprecise to me.
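Both SAS intervals for observation 61 can be rebuilt from the predicted value, its standard error and MSE. The sketch below assumes t(0.975; 58) ≈ 2.0017 from a t table (not printed in the text). It shows why b) is so much wider than a): the prediction variance adds the full MSE (about 66.8) to $s^2(\hat{Y}_h)$ (about 1.11).

```python
import math

mse = 66.80082   # error mean square from the ANOVA table
y_hat = 84.9468  # predicted value at age 60 (observation 61)
se_fit = 1.0552  # Std Error Mean Predict, s(Y-hat_h)
t_975 = 2.0017   # t(0.975; 58); assumed table value, not printed in the text

# Confidence interval for the mean response E(Y_h).
clm = (y_hat - t_975 * se_fit, y_hat + t_975 * se_fit)

# Prediction interval for one new woman aged 60: s^2(pred) = MSE + s^2(Y-hat_h).
se_pred = math.sqrt(mse + se_fit ** 2)
clp = (y_hat - t_975 * se_pred, y_hat + t_975 * se_pred)

print(f"95% CL Mean:    ({clm[0]:.4f}, {clm[1]:.4f})")
print(f"95% CL Predict: ({clp[0]:.4f}, {clp[1]:.4f})")
```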
[2]
a) The estimates of $\beta_0$, $\beta_1$, $\sigma^2$ and $\sigma$ are $b_0 = 168.6$, $b_1 = 2.03438$, $s^2 = MSE = 10.45893$ and
$s = \sqrt{s^2} = 3.23403$.
b) Using Table B.2 in the book, $t(0.975; 14) = 2.145$. The 95% confidence interval for $\beta_0$ is
$168.6 \pm 2.145(2.65702) = (162.90125, 174.29875)$, and the 95% confidence interval for $\beta_1$ is
$2.03438 \pm 2.145(0.09039) = (1.8405046, 2.2282554)$.
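Both intervals have the form estimate ± t · (standard error). The sketch below recomputes them with the table value 2.145; the helper `conf_int` is an illustrative name. With exactly 2.145 the limits come out as about (162.90069, 174.29931) and (1.84049, 2.22827). The limits printed above differ only in the fourth decimal place, which is consistent with an unrounded quantile t(0.975; 14) ≈ 2.1448 having been used.

```python
b0, se_b0 = 168.6, 2.65702    # intercept estimate and s(b0)
b1, se_b1 = 2.03438, 0.09039  # slope estimate and s(b1)
t_975 = 2.145                 # t(0.975; 14) from Table B.2


def conf_int(est: float, se: float, t: float) -> tuple[float, float]:
    """Two-sided interval est +/- t * se."""
    return (est - t * se, est + t * se)


ci_b0 = conf_int(b0, se_b0, t_975)
ci_b1 = conf_int(b1, se_b1, t_975)
print(f"95% CI for beta0: ({ci_b0[0]:.5f}, {ci_b0[1]:.5f})")
print(f"95% CI for beta1: ({ci_b1[0]:.7f}, {ci_b1[1]:.7f})")
```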
c) The t-statistic of 22.51 is $t_{obs} = 2.03438 / 0.09039$, i.e. the estimate divided by its standard error. The p-value
is the probability (before the data are collected), when $H_0$ is true (that is, $H_0: \beta_1 = 0$), that we will get
a t statistic with $|t| > 22.51$. This probability is less than .0001.
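The p-value above is read off the SAS output. To illustrate what that number means, the following sketch approximates $P(|T| > |t_{obs}|)$ for a t distribution with 14 df by integrating the t density with Simpson's rule. This numerical approach and the function names `t_pdf` and `two_sided_p` are illustrative choices (not how SAS computes it), and only Python's standard library is assumed.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_obs: float, df: int, n: int = 2000) -> float:
    """Approximate P(|T| > |t_obs|) under H0 via Simpson's rule (n even).

    The substitution x = a / u maps the upper tail [a, inf) onto (0, 1].
    For df > 1 the transformed integrand tends to 0 as u -> 0, so it is
    set to 0 at that endpoint.
    """
    a = abs(t_obs)
    if a == 0:
        return 1.0

    def g(u: float) -> float:
        return 0.0 if u == 0 else t_pdf(a / u, df) * a / (u * u)

    h = 1.0 / n
    total = g(0.0) + g(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(i * h)
    upper_tail = total * h / 3
    return 2 * upper_tail  # t is symmetric, so double one tail


t_obs = 2.03438 / 0.09039
print(f"t_obs = {t_obs:.2f}, two-sided p = {two_sided_p(t_obs, 14):.2e}")
print(f"check: p at t = 2.145 with 14 df is about {two_sided_p(2.145, 14):.3f}")
```

The second line is a sanity check: 2.145 is the 97.5th percentile used in b), so its two-sided p-value should be close to 0.05.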
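d) The estimate of $E(Y_h)$ at $X_h = 24$ is $\hat{Y}_h = 168.6 + 2.03438 \times 24 = 217.42512$.
For the confidence interval we need

$$s^2(\hat{Y}_h) = MSE\left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right] = \frac{MSE}{n} + (X_h - \bar{X})^2 \, \frac{MSE}{\sum (X_i - \bar{X})^2}.$$

From above, $MSE = 10.45893$ and $s^2(b_1) = MSE / \sum (X_i - \bar{X})^2 = (0.09039)^2$; also $n = 16$, since the error
degrees of freedom are $n - 2 = 14$. Since $(X_h - \bar{X})^2 = (24 - 28)^2 = 16$, we get
$s^2(\hat{Y}_h) = 0.78442$ and $s(\hat{Y}_h) = 0.8857$. Using the critical value $t(0.975; 14) = 2.145$, the
confidence interval for $E(Y_h)$ is $217.42512 \pm 2.145 \times 0.8857 = (215.52554, 219.3247)$ (subject to rounding
error).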
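The variance formula in d) can be evaluated step by step. This sketch assumes n = 16 (from the error df of 14) and $\bar{X} = 28$ (from $X_h - \bar{X} = 24 - 28$); all other inputs are the values quoted above. Small differences from the printed limits in the fourth decimal place come from rounding.

```python
import math

mse = 10.45893   # error mean square
se_b1 = 0.09039  # s(b1), so s^2(b1) = MSE / sum((Xi - Xbar)^2)
n = 16           # implied by the error df: n - 2 = 14
x_bar = 28.0     # sample mean of X, from (Xh - Xbar) = 24 - 28
x_h = 24.0
b0, b1 = 168.6, 2.03438
t_975 = 2.145    # t(0.975; 14) from Table B.2

y_hat_h = b0 + b1 * x_h
# s^2(Y-hat_h) = MSE/n + (Xh - Xbar)^2 * s^2(b1)
var_fit = mse / n + (x_h - x_bar) ** 2 * se_b1 ** 2
se_fit = math.sqrt(var_fit)
ci_mean = (y_hat_h - t_975 * se_fit, y_hat_h + t_975 * se_fit)

print(f"Y-hat_h = {y_hat_h:.5f}, s^2 = {var_fit:.5f}, s = {se_fit:.4f}")
print(f"95% CI for E(Y_h): ({ci_mean[0]:.4f}, {ci_mean[1]:.4f})")
```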
e) We are now trying to predict the response for a single new observation to be taken at $X = 24$.
The predicted value is the same as the estimate of the mean at 24, namely 217.42512. Then
$s^2(\text{pred}) = MSE + s^2(\hat{Y}_h) = 10.45893 + 0.78442 = 11.24335$ and $s(\text{pred}) = 3.35311$. The prediction interval is
$217.42512 \pm 2.145 \times 3.35311 = (210.23341, 224.61683)$.
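The only change from d) to e) is the extra MSE term in the variance. This sketch reuses $s^2(\hat{Y}_h) = 0.78442$ and the table value t(0.975; 14) = 2.145 from above; the recomputed limits agree with the printed ones to within rounding (under 0.001).

```python
import math

mse = 10.45893       # error mean square
var_fit = 0.78442    # s^2(Y-hat_h) at Xh = 24, from part d)
y_hat_h = 217.42512  # predicted value at Xh = 24
t_975 = 2.145        # t(0.975; 14)

# A new observation carries its own error variance on top of the
# uncertainty in the estimated mean: s^2(pred) = MSE + s^2(Y-hat_h).
var_pred = mse + var_fit
se_pred = math.sqrt(var_pred)
pred_int = (y_hat_h - t_975 * se_pred, y_hat_h + t_975 * se_pred)

print(f"s^2(pred) = {var_pred:.5f}, s(pred) = {se_pred:.5f}")
print(f"95% prediction interval: ({pred_int[0]:.5f}, {pred_int[1]:.5f})")
```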
2.29
b) The following is the ANOVA table from SAS
Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1            11627         11627    174.06   <.0001
Error             58       3874.44750      66.80082
Corrected Total   59            15502
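To test $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$: $F_{obs} = 174.06$ from the SAS output, which far exceeds $F(0.95; 1, 58) \approx 4.01$,
so reject $H_0$. Equivalently, reject $H_0$ if the p-value $< \alpha = 0.05$. From the SAS output, the p-value is $< .0001$,
which is less than 0.05, so reject $H_0$. Conclude that there is a significant linear relationship between muscle mass and age.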
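The F statistic is MSR/MSE, and in simple linear regression it equals the square of the slope's t statistic. The sketch recomputes both from the printed output. Because SAS prints the model sum of squares rounded to 11627, the recomputed F matches 174.06 only to about two decimals. The critical value F(0.95; 1, 58) is taken as t(0.975; 58)² with t ≈ 2.0017, an assumed table value.

```python
ssr, df_model = 11627.0, 1    # model sum of squares (printed rounded by SAS)
sse, df_error = 3874.44750, 58  # error sum of squares

msr = ssr / df_model
mse = sse / df_error          # 66.80082 in the output
f_obs = msr / mse

f_crit = 2.0017 ** 2          # F(0.95; 1, 58) = t(0.975; 58)^2; assumed table value
t_obs = -1.19000 / 0.09020    # slope t statistic; its square should match F

print(f"MSE = {mse:.5f}, F = {f_obs:.2f}, t^2 = {t_obs ** 2:.2f}")
print(f"reject H0 at alpha = 0.05: {f_obs > f_crit}")
```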
Parameter Estimates

Variable    DF   Parameter   Standard   t Value   Pr > |t|
                  Estimate      Error
Intercept    1   156.34656    5.51226     28.36     <.0001
age          1    -1.19000    0.09020    -13.19     <.0001

Root MSE           8.17318    R-Square   0.7501
Dependent Mean    84.96667    Adj R-Sq   0.7458
Coeff Var          9.61927

Correlations (p-value for H0: rho = 0 in parentheses)

            age                  muscle
age         1.00000             -0.86606 (<.0001)
muscle     -0.86606 (<.0001)     1.00000
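[SAS output: crime rate (Y) regressed on percentage of high school graduates (X)]

Parameter Estimates

Variable    DF    Parameter     Standard   t Value   Pr > |t|   Upper 99% CL
                   Estimate        Error
Intercept    1        20518   3277.64269      6.26     <.0001          29161
X            1   -170.57519     41.57433     -4.10     <.0001      -60.93856

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1         93462942      93462942     16.83   <.0001
Error             82        455273165       5552112
Corrected Total   83        548736108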
$F_{obs} = 16.83$ from the SAS output, which exceeds $F(0.99; 1, 82) \approx 6.95$, so reject $H_0: \beta_1 = 0$. Equivalently,
reject $H_0$ if the p-value $< \alpha = 0.01$. From the SAS output, the p-value is $< .0001$, which is less than 0.01, so
reject $H_0$. We conclude that there is a significant linear association between crime rate and percentage of high
school graduates.
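The crime-rate F test follows the same recipe: each mean square is a sum of squares divided by its degrees of freedom, and F is their ratio. The critical value 6.95 for F(0.99; 1, 82) is an approximate table value assumed here, not taken from the text.

```python
ss_model, df_model = 93462942.0, 1
ss_error, df_error = 455273165.0, 82

ms_model = ss_model / df_model
ms_error = ss_error / df_error  # printed by SAS as 5552112
f_obs = ms_model / ms_error

f_crit = 6.95                   # approx. F(0.99; 1, 82); assumed table value
print(f"MSE = {ms_error:.0f}, F = {f_obs:.2f}, reject H0 at alpha = 0.01: {f_obs > f_crit}")
```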
Root MSE        2356.29195    R-Square   0.1703
Dependent Mean  7111.20238    Adj R-Sq   0.1602
Coeff Var         33.13493
Here $|t_{obs}|^2 = F$, since $(-4.10)^2 = 16.81 \approx 16.83$ (the difference is rounding error). Yes, the p-value for the F-test is
the same as that for the two-sided t-test.
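The identity $(t^*)^2 = F$ can be checked with more digits than the printed t value of -4.10 carries. This sketch recomputes t from the estimate and its standard error and compares its square with MSR/MSE from the ANOVA table.

```python
b1, se_b1 = -170.57519, 41.57433  # slope estimate and s(b1)
t_obs = b1 / se_b1

ms_model = 93462942.0 / 1         # SSR / df_model
ms_error = 455273165.0 / 82       # SSE / df_error
f_obs = ms_model / ms_error

print(f"t = {t_obs:.4f}, t^2 = {t_obs ** 2:.4f}, F = {f_obs:.4f}")
```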
c) The regression explains about 17% of the variation in crime rate ($R^2 = 0.1703$). This seems somewhat low.
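$R^2$ is SSR/SSTO, and in simple linear regression the correlation coefficient is $r = \pm\sqrt{R^2}$, taking the sign of $b_1$. The sketch below recomputes $R^2$, the adjusted $R^2$ (which divides each sum of squares by its degrees of freedom) and r from the printed sums of squares. The recomputed r agrees with the correlation of -0.41270 in the SAS output.

```python
import math

ss_model = 93462942.0
ss_error, df_error = 455273165.0, 82
ss_total, df_total = 548736108.0, 83
b1 = -170.57519  # slope estimate; its sign gives the sign of r

r2 = ss_model / ss_total
adj_r2 = 1 - (ss_error / df_error) / (ss_total / df_total)
r = math.copysign(math.sqrt(r2), b1)

print(f"R^2 = {r2:.4f}, adj R^2 = {adj_r2:.4f}, r = {r:.5f}")
```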
Correlations (p-value for H0: rho = 0 in parentheses)

               …                    population
…              1.00000             -0.41270 (<.0001)
population    -0.41270 (<.0001)     1.00000