
Chapter 11

Multiple Regression and Model Building


11.1 a. E(y) = β0 + β1x1 + β2x2

b. E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4

c. E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5

11.2 a. β̂0 = 506.346, β̂1 = −941.900, β̂2 = −429.060

b. ŷ = 506.346 − 941.900x1 − 429.060x2

c. SSE = 151,016, MSE = 8883, s = 94.251


We expect about 95% of the y-values to fall within 2s or 2(94.251) or 188.502 units of the fitted
regression equation.

d. H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = (β̂1 − 0)/s(β̂1) = −941.900/275.08 = −3.42

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1) = 20 − (2 + 1) = 17. From Table V, Appendix B, t.025 = 2.110. The rejection region is t < −2.110 or t > 2.110.

Since the observed value of the test statistic falls in the rejection region (t = −3.42 < −2.110), H0 is rejected. There is sufficient evidence to indicate β1 ≠ 0 at α = .05.
e. For confidence coefficient .95, α = .05 and α/2 = .025. From Table V, Appendix B, with df = n − (k + 1) = 20 − (2 + 1) = 17, t.025 = 2.110. The 95% confidence interval is:

β̂2 ± t.025 s(β̂2) ⇒ −429.060 ± 2.110(379.83) ⇒ −429.060 ± 801.441 ⇒ (−1230.501, 372.381)
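The arithmetic in parts d and e can be double-checked with a short script. This is a sketch in Python (not part of the original solution); the estimates, standard errors, and the table value t.025 = 2.110 are taken from the exercise's printout:

```python
# Check the t statistic (part d) and the 95% CI (part e) for Exercise 11.2,
# using the estimates and standard errors reported in the printout.

b1, se_b1 = -941.900, 275.08   # beta-hat_1 and its estimated standard error
b2, se_b2 = -429.060, 379.83   # beta-hat_2 and its estimated standard error
t_crit = 2.110                 # t.025 with df = 17, from Table V

t_stat = (b1 - 0) / se_b1      # test statistic for H0: beta_1 = 0
half_width = t_crit * se_b2    # half-width of the interval for beta_2
ci = (b2 - half_width, b2 + half_width)

print(round(t_stat, 2))                    # -3.42, in the rejection region
print(round(ci[0], 3), round(ci[1], 3))    # -1230.501 372.381
```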

f. R2 = R-Sq = 45.9%. 45.9% of the total sample variation of the y-values is explained by the model containing x1 and x2.

R2a = R-Sq(adj) = 39.6%. 39.6% of the total sample variation of the y-values is explained by the model containing x1 and x2, adjusted for the sample size and the number of parameters in the model.
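The adjusted value can be recovered from R2 with the formula R2a = 1 − (1 − R2)(n − 1)/[n − (k + 1)]. A quick check (a Python sketch, using the rounded printout value R2 = .459, so the result comes out a shade under the printed 39.6%):

```python
# Recompute R-Sq(adj) from R-Sq for Exercise 11.2.

n, k = 20, 2   # sample size; number of independent variables
r2 = 0.459     # R-Sq from the printout (rounded)

r2_adj = 1 - (1 - r2) * (n - 1) / (n - (k + 1))
print(round(r2_adj, 3))   # 0.395 (the printout's 39.6% uses the unrounded R-Sq)
```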

Copyright 2011 Pearson Education, Inc. Publishing as Prentice Hall.

g. To determine if at least one of the independent variables is significant in predicting y, we test:

H0: β1 = β2 = 0
Ha: At least one βi ≠ 0

From the printout, the test statistic is F = 7.22.

Since no α level was given, we will choose α = .05. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k = 2 and ν2 = n − (k + 1) = 20 − (2 + 1) = 17. From Table VIII, Appendix B, F.05 = 3.59. The rejection region is F > 3.59.

Since the observed value of the test statistic falls in the rejection region (F = 7.22 > 3.59), H0 is rejected. There is sufficient evidence to indicate at least one of the variables, x1 or x2, is significant in predicting y at α = .05.
h. The observed significance level of the test is p-value = 0.005. Since the p-value is so small, we will reject H0 for most reasonable values of α. There is sufficient evidence to indicate at least one of the variables, x1 or x2, is significant in predicting y at α greater than 0.005.

11.3 a. We are given β̂2 = 2.7, s(β̂2) = 1.86, and n = 30.

H0: β2 = 0
Ha: β2 ≠ 0

The test statistic is t = (β̂2 − 0)/s(β̂2) = 2.7/1.86 = 1.45

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1) = 30 − (3 + 1) = 26. From Table V, Appendix B, t.025 = 2.056. The rejection region is t < −2.056 or t > 2.056.

Since the observed value of the test statistic does not fall in the rejection region (t = 1.45 < 2.056), H0 is not rejected. There is insufficient evidence to indicate β2 ≠ 0 at α = .05.

b. We are given β̂3 = .93, s(β̂3) = .29, and n = 30.

H0: β3 = 0
Ha: β3 ≠ 0

The test statistic is t = (β̂3 − 0)/s(β̂3) = .93/.29 = 3.21

The rejection region is the same as part a, t < −2.056 or t > 2.056.

Since the observed value of the test statistic falls in the rejection region (t = 3.21 > 2.056), H0 is rejected. There is sufficient evidence to indicate β3 ≠ 0 at α = .05.

c. β̂3 has a smaller estimated standard error than β̂2. Therefore, the test statistic is larger for β̂3 even though β̂3 is smaller than β̂2.

11.4 a. We are given β̂1 = 3.1, s(β̂1) = 2.3, and n = 25.

H0: β1 = 0
Ha: β1 > 0

The test statistic is t = (β̂1 − 0)/s(β̂1) = 3.1/2.3 = 1.35

The rejection region requires α = .05 in the upper tail of the t distribution with df = n − (k + 1) = 25 − (2 + 1) = 22. From Table V, Appendix B, t.05 = 1.717. The rejection region is t > 1.717.

Since the observed value of the test statistic does not fall in the rejection region (t = 1.35 < 1.717), H0 is not rejected. There is insufficient evidence to indicate β1 > 0 at α = .05.


b. We are given β̂2 = .92, s(β̂2) = .27, and n = 25.

H0: β2 = 0
Ha: β2 ≠ 0

The test statistic is t = (β̂2 − 0)/s(β̂2) = .92/.27 = 3.41

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1) = 25 − (2 + 1) = 22. From Table V, Appendix B, t.025 = 2.074. The rejection region is t < −2.074 or t > 2.074.

Since the observed value of the test statistic falls in the rejection region (t = 3.41 > 2.074), reject H0. There is sufficient evidence to indicate β2 ≠ 0 at α = .05.
c. For confidence coefficient .90, α = 1 − .90 = .10 and α/2 = .10/2 = .05. From Table V, Appendix B, with df = n − (k + 1) = 25 − (2 + 1) = 22, t.05 = 1.717. The confidence interval is:

β̂1 ± t.05 s(β̂1) ⇒ 3.1 ± 1.717(2.3) ⇒ 3.1 ± 3.949 ⇒ (−.849, 7.049)

We are 90% confident that β1 falls between −.849 and 7.049.


d. For confidence coefficient .99, α = 1 − .99 = .01 and α/2 = .01/2 = .005. From Table V, Appendix B, with df = n − (k + 1) = 25 − (2 + 1) = 22, t.005 = 2.819. The confidence interval is:

β̂2 ± t.005 s(β̂2) ⇒ .92 ± 2.819(.27) ⇒ .92 ± .761 ⇒ (.159, 1.681)

We are 99% confident that β2 falls between .159 and 1.681.


11.5 The number of degrees of freedom available for estimating σ2 is n − (k + 1), where k is the number of independent variables in the regression model. Each additional independent variable placed in the model causes a corresponding decrease in the degrees of freedom.


11.6 a. For x2 = 1 and x3 = 3,

E(y) = 1 + 2x1 + 1 − 3(3) = 2x1 − 7

The graph is the line with slope 2 and y-intercept −7.

b. For x2 = −1 and x3 = 1,

E(y) = 1 + 2x1 + (−1) − 3(1) = 2x1 − 3

The graph is the line with slope 2 and y-intercept −3.

c.

They are parallel, each with a slope of 2. They have different y-intercepts.

d.

The relationship will be parallel lines.
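Taking the model in this exercise to be E(y) = 1 + 2x1 + x2 − 3x3 (as used in parts a and b), the parallelism in part c can be verified numerically; a small Python sketch, not part of the original solution:

```python
# Verify Exercise 11.6c: for E(y) = 1 + 2*x1 + x2 - 3*x3, fixing (x2, x3)
# gives lines in x1 with the same slope (2) but different y-intercepts.

def mean_y(x1, x2, x3):
    return 1 + 2 * x1 + x2 - 3 * x3

# Slope = change in E(y) for a one-unit change in x1
slope_a = mean_y(1, 1, 3) - mean_y(0, 1, 3)    # line from part a
slope_b = mean_y(1, -1, 1) - mean_y(0, -1, 1)  # line from part b

print(slope_a, slope_b)                    # 2 2  -> parallel lines
print(mean_y(0, 1, 3), mean_y(0, -1, 1))   # -7 -3  -> different intercepts
```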

11.7 a. Yes. Since R2 = .92 is close to 1, this indicates the model provides a good fit. Without knowledge of the units of the dependent variable, the value of SSE cannot be used to determine how well the model fits.

b. H0: β1 = β2 = ⋯ = β5 = 0
Ha: At least one of the parameters is not 0
The test statistic is F = (R2/k) / [(1 − R2)/(n − (k + 1))] = (.92/5) / [(1 − .92)/(30 − (5 + 1))] = 55.2

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k = 5 and ν2 = n − (k + 1) = 30 − (5 + 1) = 24. From Table VIII, Appendix B, F.05 = 2.62. The rejection region is F > 2.62.

Since the observed value of the test statistic falls in the rejection region (F = 55.2 > 2.62), H0 is rejected. There is sufficient evidence to indicate the model is useful in predicting y at α = .05.
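This global F computation is easy to script; a minimal Python check of F = (R2/k)/[(1 − R2)/(n − (k + 1))] with the values from this exercise (not part of the original solution):

```python
# Compute the global F statistic from R^2 for Exercise 11.7.

n, k = 30, 5   # sample size; number of independent variables
r2 = 0.92      # coefficient of determination

f_stat = (r2 / k) / ((1 - r2) / (n - (k + 1)))
print(round(f_stat, 1))   # 55.2, well beyond the critical value F.05 = 2.62
```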
11.8

No. There may be other independent variables that are important that have not been included in the model,
while there may also be some variables included in the model which are not important. The only
conclusion is that at least one of the independent variables is a good predictor of y.

11.9 a. To determine if the model is useful, we test:

H0: β1 = β2 = β3 = β4 = 0
Ha: At least 1 βi ≠ 0

From the problem, the test statistic is F = 4.74 and the p-value is less than .01.

Since the p-value is less than α = .05 (p < .01), H0 is rejected. There is sufficient evidence to indicate the model is useful for predicting accountants' Mach scores at α = .05.

b.

R2 = .13. 13% of the total sample variation of the accountants' Mach scores around their means is explained by the model containing age, gender, education, and income.

c. To determine if income is a useful predictor of Mach score, we test:

H0: β4 = 0
Ha: β4 ≠ 0

From the printout, t = 0.52 and the p-value is p > .10. Since the p-value is greater than α = .05 (p > .10), H0 is not rejected. There is insufficient evidence to indicate that income is a useful predictor of Mach score at α = .05.

11.10

a.

The two properties are that the sum of the errors of prediction is 0 and that the sum of the squares of the errors of prediction (SSE) is a minimum.
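Both properties from part a can be demonstrated on a small made-up data set (the x and y values below are hypothetical, not from the exercise; a single predictor is used for brevity):

```python
# Demonstrate the two least squares properties on hypothetical data:
# the residuals sum to 0, and SSE is minimized at the fitted line.

x = [1, 2, 3, 4, 5]   # made-up predictor values
y = [2, 3, 5, 4, 6]   # made-up responses

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx             # least squares slope
b0 = y_bar - b1 * x_bar    # least squares intercept

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)

# Property 1: the errors of prediction sum to zero (up to rounding)
print(abs(sum(residuals)) < 1e-9)    # True

# Property 2: any other slope gives a larger sum of squared errors
sse_other = sum((yi - (b0 + (b1 + 0.1) * xi)) ** 2 for xi, yi in zip(x, y))
print(sse < sse_other)               # True
```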

b.

β̂4 = .42. For each unit change in the betweenness centrality score, the mean lead-user rating is estimated to increase by .42, holding all other variables constant.

c.

Since the p-value is less than α (p = .002 < .05), H0 is rejected. There is sufficient evidence to indicate
that there is a significant linear relationship between betweenness centrality and lead-user rating,
holding all other variables constant.

11.11 a. The least squares prediction equation is: ŷ = 1.81231 + 0.10875x1 + 0.00017x2

b. β̂0 = 1.81231. Since x1 = 0 and x2 = 0 are not in the observed range, β̂0 has no meaning.

β̂1 = 0.10875. For each additional mile of roadway length, the mean number of crashes per three years is estimated to increase by .10875 when average annual daily traffic is held constant.

β̂2 = 0.00017. For each additional unit increase in average annual daily traffic, the mean number of crashes per three years is estimated to increase by .00017 when miles of roadway length is held constant.

c. For confidence coefficient .99, α = .01 and α/2 = .01/2 = .005. From Table V, Appendix B, with df = n − (k + 1) = 100 − (2 + 1) = 97, t.005 ≈ 2.63. The 99% confidence interval is:

β̂1 ± t.005 s(β̂1) ⇒ 0.10875 ± 2.63(0.03166) ⇒ 0.10875 ± 0.08327 ⇒ (0.02548, 0.19202)

We are 99% confident that the increase in the mean number of crashes per three years will be between 0.02548 and 0.19202 for each additional mile of roadway length, holding average annual daily traffic constant.
d. The 99% confidence interval is:

β̂2 ± t.005 s(β̂2) ⇒ 0.00017 ± 2.63(0.00003) ⇒ 0.00017 ± 0.00008 ⇒ (0.00009, 0.00025)

We are 99% confident that the increase in the mean number of crashes per three years will be between 0.00009 and 0.00025 for each additional unit increase in average annual daily traffic, holding miles of roadway length constant.
e. The least squares prediction equation is: ŷ = 1.20785 + 0.06343x1 + 0.00056x2

β̂0 = 1.20785. Since x1 = 0 and x2 = 0 are not in the observed range, β̂0 has no meaning.

β̂1 = 0.06343. For each additional mile of roadway length, the mean number of crashes per three years is estimated to increase by 0.06343 when average annual daily traffic is held constant.

β̂2 = 0.00056. For each additional unit increase in average annual daily traffic, the mean number of crashes per three years is estimated to increase by 0.00056 when miles of roadway length is held constant.

The 99% confidence interval is:

β̂1 ± t.005 s(β̂1) ⇒ 0.06343 ± 2.63(0.01809) ⇒ 0.06343 ± 0.04758 ⇒ (0.01585, 0.11101)


We are 99% confident that the increase in the mean number of crashes per three years will be
between 0.01585 and 0.11101 for each additional mile of roadway length, holding average annual
daily traffic constant.
The 99% confidence interval is:

β̂2 ± t.005 s(β̂2) ⇒ 0.00056 ± 2.63(0.00012) ⇒ 0.00056 ± 0.00032 ⇒ (0.00024, 0.00088)

We are 99% confident that the increase in the mean number of crashes per three years will be between 0.00024 and 0.00088 for each additional unit increase in average annual daily traffic, holding miles of roadway length constant.
11.12 a. The first-order model is: E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5

b. R2 = .58. 58% of the total sample variation of the levels of trust is explained by the model containing the 5 independent variables.

c. The test statistic is F = (R2/k) / [(1 − R2)/(n − (k + 1))] = (.58/5) / [(1 − .58)/(66 − (5 + 1))] = 16.57

d. The rejection region requires α = .10 in the upper tail of the F-distribution with ν1 = k = 5 and ν2 = n − (k + 1) = 66 − (5 + 1) = 60. From Table VII, Appendix B, F.10 ≈ 1.96. The rejection region is F > 1.96.

Since the observed value of the test statistic falls in the rejection region (F = 16.57 > 1.96), H0 is rejected. There is sufficient evidence to indicate that at least one of the 5 independent variables is useful in the prediction of level of trust at α = .10.
11.13 a. β̂1 = 2.006. For each unit increase in the proportion of block with low-density residential areas, the mean population density is estimated to increase by 2.006, holding proportion of block with high-density residential areas constant. Since x1 is a proportion, it is unlikely that it can increase by one unit. A better interpretation is: For each increase of .1 in the proportion of block with low-density residential areas, the mean population density is estimated to increase by .2006, holding proportion of block with high-density residential areas constant.

β̂2 = 5.006. For each unit increase in the proportion of block with high-density residential areas, the mean population density is estimated to increase by 5.006, holding proportion of block with low-density residential areas constant. Since x2 is a proportion, it is unlikely that it can increase by one unit. A better interpretation is: For each increase of .1 in the proportion of block with high-density residential areas, the mean population density is estimated to increase by .5006, holding proportion of block with low-density residential areas constant.

b. R2 = .686. 68.6% of the total sample variation of the population densities is explained by the linear relationship between population density and the independent variables proportion of block with low-density residential areas and proportion of block with high-density residential areas.

c. To determine if the overall model is adequate, we test:

H0: β1 = β2 = 0
Ha: At least one βi ≠ 0

The test statistic is F = (R2/k) / [(1 − R2)/(n − (k + 1))] = (.686/2) / [(1 − .686)/(125 − (2 + 1))] = 133.27

d. The rejection region requires α = .01 in the upper tail of the F distribution with ν1 = k = 2 and ν2 = n − (k + 1) = 125 − (2 + 1) = 122. From Table X, Appendix B, F.01 ≈ 4.79. The rejection region is F > 4.79.

e. Since the observed value of the test statistic falls in the rejection region (F = 133.27 > 4.79), H0 is rejected. There is sufficient evidence to indicate the model is adequate at α = .01.
11.14 a. The least squares prediction equation is:

ŷ = 3.70 + .34x1 + .49x2 + .72x3 + 1.14x4 + 1.51x5 + .26x6 − .14x7 − .10x8 − .10x9

b. β̂0 = 3.70. This is the estimate of the y-intercept. It has no other meaning because the point with all independent variables equal to 0 is not in the observed range.

β̂1 = 0.34. For each additional walk, the mean number of runs scored is estimated to increase by .34, holding all other variables constant.

β̂2 = 0.49. For each additional single, the mean number of runs scored is estimated to increase by .49, holding all other variables constant.

β̂3 = 0.72. For each additional double, the mean number of runs scored is estimated to increase by .72, holding all other variables constant.

β̂4 = 1.14. For each additional triple, the mean number of runs scored is estimated to increase by 1.14, holding all other variables constant.

β̂5 = 1.51. For each additional home run, the mean number of runs scored is estimated to increase by 1.51, holding all other variables constant.

β̂6 = 0.26. For each additional stolen base, the mean number of runs scored is estimated to increase by .26, holding all other variables constant.

β̂7 = −0.14. For each additional time a runner is caught stealing, the mean number of runs scored is estimated to decrease by .14, holding all other variables constant.

β̂8 = −0.10. For each additional strikeout, the mean number of runs scored is estimated to decrease by .10, holding all other variables constant.

β̂9 = −0.10. For each additional out, the mean number of runs scored is estimated to decrease by .10, holding all other variables constant.

c. H0: β7 = 0
Ha: β7 < 0

The test statistic is t = (β̂7 − 0)/s(β̂7) = (−.14 − 0)/.14 = −1.00

The rejection region requires α = .05 in the lower tail of the t-distribution with df = n − (k + 1) = 234 − (9 + 1) = 224. From Table V, Appendix B, t.05 = 1.645. The rejection region is t < −1.645.

Since the observed value of the test statistic does not fall in the rejection region (t = −1.00 > −1.645), H0 is not rejected. There is insufficient evidence to indicate that the mean number of runs decreases as the number of runners caught stealing increases, holding all other variables constant, at α = .05.
d. For confidence level .95, α = .05 and α/2 = .05/2 = .025. From Table V, Appendix B, with df = 224, t.025 ≈ 1.96. The 95% confidence interval is:

β̂5 ± t.025 s(β̂5) ⇒ 1.51 ± 1.96(.05) ⇒ 1.51 ± 0.098 ⇒ (1.412, 1.608)

We are 95% confident that the mean number of runs will increase by anywhere from 1.412 to 1.608 for each additional home run, holding all other variables constant.
11.15 a. The first-order model would be:

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4

b.

Since the p-value is less than α (p = .005 < .01), H0 is rejected. There is sufficient evidence to
indicate that there is a negative linear relationship between change from routine and the number of
years played golf, holding number of rounds of golf per year, total number of golf vacations, and
average golf score constant.

c.

The statement would be correct if the independent variables are not correlated. However, if the
independent variables are correlated, then this interpretation would not necessarily hold.

d.

To determine if the overall first-order regression model is adequate, we test:


H0: β1 = β2 = β3 = β4 = 0
Ha: At least one βi ≠ 0

e. For all dependent variables, the rejection region requires α = .01 in the upper tail of the F-distribution with ν1 = k = 4 and ν2 = n − (k + 1) = 393 − (4 + 1) = 388. From Table X, Appendix B, F.01 ≈ 3.32. The rejection region is F > 3.32. Using MINITAB, the exact F.01 with ν1 = 4 and ν2 = 388 is 3.67. The true rejection region is F > 3.67.

f. For Thrill: Since the observed value of the test statistic falls in the rejection region (F = 5.56 > 3.67), H0 is rejected. There is sufficient evidence to indicate at least one of the 4 independent variables is linearly related to Thrill at α = .01.

For Change from Routine: Since the observed value of the test statistic does not fall in the rejection region (F = 3.02 < 3.67), H0 is not rejected. There is insufficient evidence to indicate at least one of the 4 independent variables is linearly related to Change from Routine at α = .01.

For Surprise: Since the observed value of the test statistic does not fall in the rejection region (F = 3.33 < 3.67), H0 is not rejected. There is insufficient evidence to indicate at least one of the 4 independent variables is linearly related to Surprise at α = .01.

g. For Thrill: Since the p-value is less than α (p < .001 < .01), H0 is rejected. There is sufficient evidence to indicate that at least one of the independent variables is linearly related to Thrill at α = .01.

For Change from Routine: Since the p-value is not less than α (p = .018 > .01), H0 is not rejected. There is insufficient evidence to indicate that at least one of the independent variables is linearly related to Change from Routine at α = .01.

For Surprise: Since the p-value is not less than α (p = .011 > .01), H0 is not rejected. There is insufficient evidence to indicate that at least one of the independent variables is linearly related to Surprise at α = .01.

h.

For Thrill: R2 = .055. 5.5% of the total variability around the mean thrill values can be explained by
the model containing the 4 independent variables: x1 = number of rounds of golf per year, x2 = total
number of golf vacations taken, x3 = number of years played golf, and x4 = average golf score.
For Change from Routine: R2 = .030. 3.0% of the total variability around the mean change from
routine values can be explained by the model containing the 4 independent variables: x1 = number of
rounds of golf per year, x2 = total number of golf vacations taken, x3 = number of years played golf,
and x4 = average golf score.
For Surprise: R2 = .023. 2.3% of the total variability around the mean surprise values can be
explained by the model containing the 4 independent variables: x1 = number of rounds of golf per
year, x2 = total number of golf vacations taken, x3 = number of years played golf, and x4 = average
golf score.

11.16 a. Let x1 = latitude, x2 = longitude, and x3 = depth. The first-order model is:

y = β0 + β1x1 + β2x2 + β3x3 + ε

b. Using MINITAB, the results are:
Regression Analysis: ARSENIC versus LATITUDE, LONGITUDE, DEPTH-FT

The regression equation is
ARSENIC = - 86991 - 2220 LATITUDE + 1544 LONGITUDE - 0.349 DEPTH-FT

327 cases used, 1 cases contain missing values

Predictor      Coef  SE Coef      T      P
Constant     -86991    31218  -2.79  0.006
LATITUDE    -2220.1    526.8  -4.21  0.000
LONGITUDE    1543.9    373.0   4.14  0.000
DEPTH-FT    -0.3493   0.1566  -2.23  0.026

S = 103.295   R-Sq = 12.8%   R-Sq(adj) = 12.0%

Analysis of Variance
Source           DF       SS      MS      F      P
Regression        3   506196  168732  15.81  0.000
Residual Error  323  3446366   10670
Total           326  3952562

Source     DF  Seq SS
LATITUDE    1  132506
LONGITUDE   1  320624
DEPTH-FT    1   53066

The least squares model is: ŷ = −86,991 − 2,220.1(latitude) + 1,543.9(longitude) − .3493(depth)
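Coefficient estimates like these are the solution of the normal equations (X′X)b = X′y. A minimal Python sketch of that computation on a small made-up data set rather than the arsenic data (which are not reproduced here):

```python
# Sketch of how least squares coefficients arise: build X'X and X'y and
# solve the normal equations with Gaussian elimination. The data are made
# up; y is exactly 1 + 2*x1 + 3*x2, so the fit recovers those coefficients.

rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]   # (x1, x2) pairs
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

X = [[1, x1, x2] for x1, x2 in rows]   # design matrix with intercept column
p = len(X[0])

# Normal equations: A = X'X, v = X'y
A = [[sum(X[r][i] * X[r][j] for r in range(len(X))) for j in range(p)]
     for i in range(p)]
v = [sum(X[r][i] * y[r] for r in range(len(X))) for i in range(p)]

# Solve A b = v by Gaussian elimination with partial pivoting
for col in range(p):
    pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
    A[col], A[pivot] = A[pivot], A[col]
    v[col], v[pivot] = v[pivot], v[col]
    for r in range(col + 1, p):
        factor = A[r][col] / A[col][col]
        for c in range(col, p):
            A[r][c] -= factor * A[col][c]
        v[r] -= factor * v[col]

b = [0.0] * p
for i in reversed(range(p)):
    b[i] = (v[i] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]

print([round(bi, 6) for bi in b])   # [1.0, 2.0, 3.0]
```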


c. β̂1 = −2,220.1. For each unit increase in latitude, the mean arsenic level is estimated to decrease by 2,220.1, holding longitude and depth constant.

β̂2 = 1,543.9. For each unit increase in longitude, the mean arsenic level is estimated to increase by 1,543.9, holding latitude and depth constant.

β̂3 = −.3493. For each unit increase in depth, the mean arsenic level is estimated to decrease by .3493, holding latitude and longitude constant.


d.

From the printout, s = 103.295. We would expect about 95% of all observations to fall within 2s = 2(103.295) = 206.590 units of their predicted values.

e.

From the printout, R2 = 12.8%. 12.8% of the total sample variation of the arsenic levels is explained by the model containing latitude, longitude, and depth.

From the printout, R2adj = 12.0%. 12.0% of the total sample variation of the arsenic levels is explained by the model containing latitude, longitude, and depth, adjusting for the sample size and number of independent variables in the model.

f. To determine if the model is adequate, we test:

H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0

From the printout, the test statistic is F = 15.81 and the p-value is p = 0.000.

Since the p-value is less than α (p = .000 < .05), H0 is rejected. There is sufficient evidence to indicate the model is adequate at α = .05.

g. Although the model was found to be adequate at α = .05, it is not a particularly good model. The R2 value is only 12.8% and R2adj = 12.0%. Only about 12% of the variation in arsenic values is explained by the model.

11.17 a. The first-order model is: E(y) = β0 + β1x1 + β2x2

b.

Using MINITAB, the results of fitting the model are:


Regression Analysis: Earnings versus Age, Hours

The regression equation is
Earnings = - 20 + 13.4 Age + 244 Hours

Predictor    Coef  SE Coef      T      P
Constant    -20.4    652.7  -0.03  0.976
Age        13.350    7.672   1.74  0.107
Hours      243.71    63.51   3.84  0.002

S = 547.737   R-Sq = 58.2%   R-Sq(adj) = 51.3%

Analysis of Variance
Source          DF       SS       MS     F      P
Regression       2  5018232  2509116  8.36  0.005
Residual Error  12  3600196   300016
Total           14  8618428

Source  DF   Seq SS
Age      1   600498
Hours    1  4417734

Unusual Observations
Obs   Age  Earnings   Fit  SE Fit  Residual  St Resid
  4  18.0      1552  2657     205     -1105    -2.18R

R denotes an observation with a large standardized residual.

The least squares prediction equation is: ŷ = −20.4 + 13.350x1 + 243.71x2


c. β̂0 = −20.4. This has no meaning since x1 = 0 and x2 = 0 are not in the observed range.

β̂1 = 13.350. For each additional year of age, the mean annual earnings is predicted to increase by $13.350, holding hours worked per day constant.

β̂2 = 243.71. For each additional hour worked per day, the mean annual earnings is predicted to increase by $243.71, holding age constant.

d. To determine if age is a useful predictor of annual earnings, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = 1.74.

The p-value is p = .107. Since the p-value is greater than α = .01 (p = .107 > .01), H0 is not rejected. There is insufficient evidence to indicate that age is a useful predictor of annual earnings, adjusted for hours worked per day, at α = .01.

e. For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table V, Appendix B, with df = n − (k + 1) = 15 − (2 + 1) = 12, t.025 = 2.179. The 95% confidence interval is:

β̂2 ± t.025 s(β̂2) ⇒ 243.71 ± 2.179(63.51) ⇒ 243.71 ± 138.388 ⇒ (105.322, 382.098)

We are 95% confident that the change in the mean annual earnings for each additional hour worked per day will be somewhere between $105.322 and $382.098, holding age constant.
f.

From the printout, R2 = R-Sq = 58.2% or .582. 58.2% of the total sample variance of annual earnings
is explained by the model containing age and hours worked per day.

g.

R2a = R-Sq(adj) = 51.3% or .513. 51.3% of the total sample variance of annual earnings is
explained by the model containing age and hours worked per day, adjusted for the sample size and
the number of parameters in the model.

h. To determine if at least one of the variables is useful in predicting the annual earnings, we test:

H0: β1 = β2 = 0
Ha: At least 1 βi ≠ 0

The test statistic is F = 8.36 and the p-value is p = .005. Since the p-value is less than α = .01 (p = .005 < .01), H0 is rejected. There is sufficient evidence to indicate at least one of the variables is useful in predicting the annual earnings at α = .01.
11.18

a.

From MINITAB, the output is:


Regression Analysis: DDT versus Mile, Length, Weight

The regression equation is
DDT = - 108 + 0.0851 Mile + 3.77 Length - 0.0494 Weight

Predictor      Coef  SE Coef      T      P
Constant    -108.07    62.70  -1.72  0.087
Mile        0.08509  0.08221   1.03  0.302
Length        3.771    1.619   2.33  0.021
Weight     -0.04941  0.02926  -1.69  0.094

S = 97.48   R-Sq = 3.9%   R-Sq(adj) = 1.8%

Analysis of Variance
Source           DF       SS     MS     F      P
Regression        3    53794  17931  1.89  0.135
Residual Error  140  1330210   9501
Total           143  1384003

The least squares prediction equation is:

ŷ = −108.07 + 0.08509x1 + 3.771x2 − 0.04941x3


b.

s = 97.48. We would expect about 95% of the observed values of DDT level to fall within 2s or
2(97.48) = 194.96 units of their least squares predicted values.

c. To determine if at least one of the variables is useful in predicting the DDT level, we test:

H0: β1 = β2 = β3 = 0
Ha: At least 1 βi ≠ 0

The test statistic is F = 1.89 and the p-value is p = .135. Since the p-value is not less than α = .05 (p = .135 > .05), H0 is not rejected. There is insufficient evidence to indicate at least one of the variables is useful in predicting the DDT level at α = .05.

d. To determine if DDT level increases as length increases, we test:

H0: β2 = 0
Ha: β2 > 0

The test statistic is t = 2.33.

The p-value is p = .021/2 = .0105. Since the p-value is less than α (p = .0105 < .05), H0 is rejected. There is sufficient evidence to indicate that DDT level increases as length increases, holding the other variables constant, at α = .05.

The observed significance level is p = .0105.

e. For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table V, Appendix B, with df = n − (k + 1) = 144 − (3 + 1) = 140, t.025 ≈ 1.96. The 95% confidence interval is:

β̂3 ± t.025 s(β̂3) ⇒ −0.04941 ± 1.96(0.02926) ⇒ −0.04941 ± 0.05735 ⇒ (−0.10676, 0.00794)

We are 95% confident that the mean DDT level will change by anywhere from −0.10676 to 0.00794 for each additional point increase in weight, holding length and mile constant. Since 0 is in the interval, there is no evidence that weight and DDT level are linearly related.
11.19 a. The first-order model is: E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5

b.

Using MINITAB, the results are:


Regression Analysis: HEATRATE versus RPM, INLET-TEMP, ...

The regression equation is
HEATRATE = 13614 + 0.0888 RPM - 9.20 INLET-TEMP + 14.4 EXH-TEMP + 0.4 CPRATIO - 0.848 AIRFLOW

Predictor       Coef  SE Coef      T      P
Constant     13614.5    870.0  15.65  0.000
RPM          0.08879  0.01391   6.38  0.000
INLET-TEMP    -9.201    1.499  -6.14  0.000
EXH-TEMP      14.394    3.461   4.16  0.000
CPRATIO         0.35    29.56   0.01  0.991
AIRFLOW      -0.8480   0.4421  -1.92  0.060

S = 458.828   R-Sq = 92.4%   R-Sq(adj) = 91.7%

Analysis of Variance
Source          DF         SS        MS       F      P
Regression       5  155055273  31011055  147.30  0.000
Residual Error  61   12841935    210524
Total           66  167897208

Source      DF     Seq SS
RPM          1  119598530
INLET-TEMP   1   26893467
EXH-TEMP     1    7784225
CPRATIO      1       4623
AIRFLOW      1     774427

Unusual Observations
Obs    RPM  HEATRATE      Fit  SE Fit  Residual  St Resid
 11  18000   14628.0  13214.0   117.9    1414.0     3.19R
 32  14950   10656.0  11663.0   132.5   -1007.0    -2.29R
 36   4473   13523.0  12489.5   195.1    1033.5     2.49R
 47   7280   11588.0  10533.0   154.7    1055.0     2.44R
 61  33000   16243.0  15758.0   246.5     485.0     1.25 X
 64   3600    8714.0   8415.2   340.9     298.8     0.97 X

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.

The least squares prediction equation is:

ŷ = 13,614.5 + 0.0888x1 − 9.201x2 + 14.394x3 + 0.35x4 − 0.848x5


c. β̂0 = 13,614.5. Since 0 is not within the range of all the independent variables, this value has no meaning.

β̂1 = 0.0888. For each unit increase in RPM, the mean heat rate is estimated to increase by .0888, holding all the other 4 variables constant.

β̂2 = −9.201. For each unit increase in inlet temperature, the mean heat rate is estimated to decrease by 9.201, holding all the other 4 variables constant.

β̂3 = 14.394. For each unit increase in exhaust temperature, the mean heat rate is estimated to increase by 14.394, holding all the other 4 variables constant.

β̂4 = 0.35. For each unit increase in cycle pressure ratio, the mean heat rate is estimated to increase by 0.35, holding all the other 4 variables constant.

β̂5 = −0.8480. For each unit increase in air flow rate, the mean heat rate is estimated to decrease by .848, holding all the other 4 variables constant.

d. From the printout, s = 458.828. We would expect to see most of the heat rate values within 2s = 2(458.828) = 917.656 units of the least squares line.

e. To determine if at least one of the variables is useful in predicting the heat rate values, we test:

H0: β1 = β2 = β3 = β4 = β5 = 0
Ha: At least 1 βi ≠ 0

The test statistic is F = 147.30 and the p-value is p = .000. Since the p-value is less than α = .01 (p = .000 < .01), H0 is rejected. There is sufficient evidence to indicate at least one of the variables is useful in predicting the heat rate values at α = .01.

f.

R2a = R-Sq(adj) = 91.7% or .917. 91.7% of the total sample variance of the heat rate values is
explained by the model containing the 5 independent variables.

g. To determine if there is evidence to indicate heat rate is linearly related to inlet temperature, we test:

H0: β2 = 0
Ha: β2 ≠ 0

The test statistic is t = −6.14 and the p-value is p = 0.000. Since the p-value is less than α = .01 (p = .000 < .01), H0 is rejected. There is sufficient evidence to indicate heat rate is linearly related to inlet temperature at α = .01.

11.20 a. E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6 + β7x7

b. Using MINITAB, the output is:

The regression equation is
y = 0.998 - 0.0224 x1 + 0.156 x2 - 0.0172 x3 - 0.00953 x4 + 0.421 x5 + 0.417 x6 - 0.155 x7
Predictor       Coef     StDev      T      P
Constant      0.9981    0.2475   4.03  0.002
x1         -0.022429  0.005039  -4.45  0.001
x2           0.15571   0.07429   2.10  0.060
x3          -0.01719   0.01186  -1.45  0.175
x4         -0.009527  0.009619  -0.99  0.343
x5            0.4214    0.1008   4.18  0.002
x6            0.4171    0.4377   0.95  0.361
x7           -0.1552    0.1486  -1.04  0.319

S = 0.4365   R-Sq = 77.1%   R-Sq(adj) = 62.5%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       7   7.9578  1.1368  5.29  0.007
Residual Error  11   2.3632  0.2148
Total           18  10.3210
Source  DF  Seq SS
x1       1  1.4016
x2       1  1.9263
x3       1  0.1171
x4       1  0.0446
x5       1  4.0771
x6       1  0.1565
x7       1  0.2345
Unusual Observations
Obs    x1      y     Fit  StDev Fit  Residual  St Resid
 14  80.0  0.120  -0.628      0.328     0.748     2.28R

R denotes an observation with a large standardized residual.

The least squares model is ŷ = .9981 - .0224x1 + .1557x2 - .0172x3 - .0095x4 + .4214x5 + .4171x6 - .1552x7
c.

β̂0 = .9981 = the estimate of the y-intercept.

β̂1 = -.0224. We estimate that the mean voltage will decrease by .0224 kw/cm for each additional
increase of 1% in x1, the disperse phase volume (with all other variables held constant).

β̂2 = .1557. We estimate that the mean voltage will increase by .1557 kw/cm for each additional
increase of 1% in x2, the salinity (with all other variables held constant).


β̂3 = -.0172. We estimate that the mean voltage will decrease by .0172 kw/cm for each additional
increase of 1 degree in x3, the temperature in Celsius (with all other variables held constant).

β̂4 = -.0095. We estimate that the mean voltage will decrease by .0095 kw/cm for each additional
increase of 1 hour in x4, the time delay (with all other variables held constant).

β̂5 = .4214. We estimate that the mean voltage will increase by .4214 kw/cm for each additional
increase of 1% in x5, surfactant concentration (with all other variables held constant).

β̂6 = .4171. We estimate that the mean voltage will increase by .4171 kw/cm for each additional
increase of 1 unit in x6, span:Triton (with all other variables held constant).

β̂7 = -.1552. We estimate that the mean voltage will decrease by .1552 kw/cm for each additional
increase of 1% in x7, the solid particles (with all other variables held constant).
d.

To determine if at least one of the variables is useful in predicting voltage, we test:


H0: β1 = β2 = β3 = β4 = β5 = β6 = β7 = 0
Ha: At least one βi ≠ 0
The test statistic is F = 5.29 and the p-value is p = .007. Since the p-value is less than α = .10 (p =
.007 < .10), H0 is rejected. There is sufficient evidence to indicate at least one of the 7 variables is
useful in predicting voltage at α = .10.

11.21

a.

R2 = .362. 36.2% of the variability in the AC scores can be explained by the model containing the
variables self-esteem score, optimism score, and group cohesion score.

b.

To test the utility of the model, we test:

H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0, i = 1, 2, 3
The test statistic is:

F = (R2/k) / [(1 - R2)/(n - (k + 1))] = (.362/3) / [(1 - .362)/(31 - (3 + 1))] = 5.11

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k = 3 and ν2 = n
- (k + 1) = 31 - (3 + 1) = 27. From Table VIII, Appendix B, F.05 = 2.96. The rejection region is F >
2.96.
Since the observed value of the test statistic falls in the rejection region (F = 5.11 > 2.96), H0 is
rejected. There is sufficient evidence that the model is useful in predicting AC score at α = .05.

11.22


To determine if the model is useful, we test:

H0: β1 = β2 = ... = β18 = 0
Ha: At least one βi ≠ 0, i = 1, 2, ... , 18

The test statistic is F = (R2/k) / [(1 - R2)/(n - (k + 1))] = (.95/18) / [(1 - .95)/(20 - (18 + 1))] = 1.06

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k = 18 and ν2 = n - (k
+ 1) = 20 - (18 + 1) = 1. From Table VIII, Appendix B, F.05 ≈ 245.9. The rejection region is F > 245.9.

Since the observed value of the test statistic does not fall in the rejection region (F = 1.06 ≯ 245.9), H0 is not
rejected. There is insufficient evidence to indicate the model is adequate at α = .05.


Note: Although R2 is large, there are so many variables in the model that ν2 is small.
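The F statistic in Exercise 11.22 can be rebuilt directly from R2; a minimal sketch (not from the text) showing why a large R2 with 18 predictors and only 20 observations is not significant:

```python
def f_from_r2(r2, n, k):
    """Global F statistic computed from R-squared for a model with k
    predictors fit to n observations."""
    return (r2 / k) / ((1 - r2) / (n - (k + 1)))

F = f_from_r2(0.95, n=20, k=18)
print(round(F, 2))  # 1.06 -- far below the F.05 critical value of about 245.9
```

The same function reproduces the F = 5.11 of Exercise 11.21b with r2 = .362, n = 31, k = 3.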
11.23

a.

Model 1:

H0: β1 = 0
Ha: β1 ≠ 0
The test statistic is t = (β̂1 - 0)/sβ̂1 = .0354/.0137 = 2.58.

Since no α was given, we will use α = .05. The rejection region requires α/2 = .05/2 = .025 in each
tail of the t distribution. From Table V, Appendix B, with df = n - (k + 1) = 12 - (1 + 1) = 10, t.025 =
2.228. The rejection region is t < -2.228 or t > 2.228.
Since the observed value of the test statistic falls in the rejection region (t = 2.58 > 2.228), H0 is
rejected. There is sufficient evidence to indicate that there is a linear relationship between vintage
and the logarithm of price.
Model 2:

H0: β1 = 0
Ha: β1 ≠ 0
The test statistic is t = (β̂1 - 0)/sβ̂1 = .0238/.00717 = 3.32

Since no α was given, we will use α = .05. The rejection region requires α/2 = .05/2 = .025 in each
tail of the t distribution. From Table V, Appendix B, with df = n - (k + 1) = 12 - (4 + 1) = 7, t.025 =
2.365. The rejection region is t < -2.365 or t > 2.365.
Since the observed value of the test statistic falls in the rejection region (t = 3.32 > 2.365), H0 is
rejected. There is sufficient evidence to indicate that there is a linear relationship between vintage
and the logarithm of price, adjusting for all other variables.
H0: β2 = 0
Ha: β2 ≠ 0


The test statistic is t = (β̂2 - 0)/sβ̂2 = .616/.0952 = 6.47

The rejection region is t < -2.365 or t > 2.365.


Since the observed value of the test statistic falls in the rejection region (t = 6.47 > 2.365), H0 is
rejected. There is sufficient evidence to indicate that there is a linear relationship between average
growing season temperature and the logarithm of price, adjusting for all other variables.
H0: β3 = 0
Ha: β3 ≠ 0
The test statistic is t = (β̂3 - 0)/sβ̂3 = -.00386/.00081 = -4.77

The rejection region is t < -2.365 or t > 2.365.

Since the observed value of the test statistic falls in the rejection region (t = -4.77 < -2.365), H0 is
rejected. There is sufficient evidence to indicate that there is a linear relationship between Sept./
Aug. rainfall and the logarithm of price, adjusting for all other variables.
H0: β4 = 0
Ha: β4 ≠ 0
The test statistic is t = (β̂4 - 0)/sβ̂4 = .0001173/.000482 = 0.24.

The rejection region is t < -2.365 or t > 2.365.

Since the observed value of the test statistic does not fall in the rejection region (t = 0.24 ≯ 2.365),
H0 is not rejected. There is insufficient evidence to indicate that there is a linear relationship between
rainfall in months preceding vintage and the logarithm of price, adjusting for all other variables.
Model 3:

H0: β1 = 0
Ha: β1 ≠ 0
The test statistic is t = (β̂1 - 0)/sβ̂1 = .0240/.00747 = 3.21

Since no α was given, we will use α = .05. The rejection region requires α/2 = .05/2 = .025 in each
tail of the t distribution. From Table V, Appendix B, with df = n - (k + 1) = 12 - (5 + 1) = 6, t.025 =
2.447. The rejection region is t < -2.447 or t > 2.447.

Since the observed value of the test statistic falls in the rejection region (t = 3.21 > 2.447), H0 is
rejected. There is sufficient evidence to indicate that there is a linear relationship between vintage
and the logarithm of price, adjusting for all other variables.


H0: β2 = 0
Ha: β2 ≠ 0
The test statistic is t = (β̂2 - 0)/sβ̂2 = .608/.116 = 5.24.

The rejection region is t < -2.447 or t > 2.447.


Since the observed value of the test statistic falls in the rejection region (t = 5.24 > 2.447), H0 is
rejected. There is sufficient evidence to indicate that there is a linear relationship between average
growing season temperature and the logarithm of price, adjusting for all other variables.
H0: β3 = 0
Ha: β3 ≠ 0
The test statistic is t = (β̂3 - 0)/sβ̂3 = -.00380/.00095 = -4.00

The rejection region is t < -2.447 or t > 2.447.

Since the observed value of the test statistic falls in the rejection region (t = -4.00 < -2.447), H0 is
rejected. There is sufficient evidence to indicate that there is a linear relationship between Sept./Aug.
rainfall and the logarithm of price, adjusting for all other variables.
H0: β4 = 0
Ha: β4 ≠ 0
The test statistic is t = (β̂4 - 0)/sβ̂4 = .00115/.000505 = 2.28

The rejection region is t < -2.447 or t > 2.447.

Since the observed value of the test statistic does not fall in the rejection region (t = 2.28 ≯ 2.447),
H0 is not rejected. There is insufficient evidence to indicate that there is a linear relationship between
rainfall in months preceding vintage and the logarithm of price, adjusting for all other variables.
H0: β5 = 0
Ha: β5 ≠ 0
The test statistic is t = (β̂5 - 0)/sβ̂5 = .00765/.0565 = 0.14.

The rejection region is t < -2.447 or t > 2.447.

Since the observed value of the test statistic does not fall in the rejection region (t = 0.14 ≯ 2.447),
H0 is not rejected. There is insufficient evidence to indicate that there is a linear relationship between
average September temperature and the logarithm of price, adjusting for all other variables.


b.

Model 1:

β̂1 = .0354, e^.0354 - 1 = .036

We estimate that the mean price will increase by 3.6% for each additional increase of 1 unit of x1,
vintage year.
Model 2:

β̂1 = .0238, e^.0238 - 1 = .024

We estimate that the mean price will increase by 2.4% for each additional increase of 1 unit of x1,
vintage year (with all other variables held constant).

β̂2 = .616, e^.616 - 1 = .852

We estimate that the mean price will increase by 85.2% for each additional increase of 1 unit of x2,
average growing season temperature in °C (with all other variables held constant).

β̂3 = -.00386, e^-.00386 - 1 = -.004

We estimate that the mean price will decrease by .4% for each additional increase of 1 unit of x3,
Sept./Aug. rainfall in cm (with all other variables held constant).

β̂4 = .0001173, e^.0001173 - 1 = .0001

We estimate that the mean price will increase by .01% for each additional increase of 1 unit of x4,
rainfall in the months preceding vintage in cm (with all other variables held constant).
Model 3:

β̂1 = .0240, e^.0240 - 1 = .024

We estimate that the mean price will increase by 2.4% for each additional increase of 1 unit of x1,
vintage year (with all other variables held constant).

β̂2 = .608, e^.608 - 1 = .837

We estimate that the mean price will increase by 83.7% for each additional increase of 1 unit of x2,
average growing season temperature in °C (with all other variables held constant).

β̂3 = -.00380, e^-.00380 - 1 = -.004

We estimate that the mean price will decrease by .4% for each additional increase of 1 unit of x3,
Sept./Aug. rainfall in cm (with all other variables held constant).

β̂4 = .00115, e^.00115 - 1 = .001


We estimate that the mean price will increase by .1% for each additional increase of 1 unit of
x4, rainfall in the months preceding vintage in cm (with all other variables held constant).

β̂5 = .00765, e^.00765 - 1 = .008

We estimate that the mean price will increase by .8% for each additional increase of 1 unit of
x5, average Sept. temperature in °C (with all other variables held constant).
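Because these models regress the logarithm of price, each coefficient converts to a percentage change in price via e^β̂ - 1. A small sketch of that conversion (coefficient values taken from Model 2 above):

```python
import math

def pct_change(beta_hat):
    """Approximate proportional change in the response (price) per unit
    increase in a predictor, when the model's response is log(price)."""
    return math.exp(beta_hat) - 1

# Model 2 coefficients from the text
print(round(pct_change(0.0238), 3))    # 0.024 -> +2.4% per vintage year
print(round(pct_change(0.616), 3))     # 0.852 -> +85.2% per degree C
print(round(pct_change(-0.00386), 3))  # -0.004 -> -0.4% per cm of rainfall
```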
c.

I would recommend model 2. Model 1 has only 1 independent variable in the model and it is
significant at α = .05. The R2 for this model is R2 = .212 and s = .575. Model 2 has 4 independent
variables in the model and all terms are significant at α = .05 except one. This one variable is
significant at α = .10. This model has R2 = .828 and s = .287. Comparing model 2 to model 1, the R2
for model 2 is much larger than that for model 1 and the estimate of the standard deviation is much
smaller. Model 3 contains all of the independent variables that model 2 has plus one additional
variable. This additional variable is not significant at α = .10. In addition, the R2 for this new model
is .828, the same as for model 2. However, the estimate of the standard deviation of model 3 is now
larger than that of model 2. This indicates that model 2 is better than model 3.

11.24

a.

From MINITAB, the output is:


Regression Analysis: Labor versus Pounds, Units, Weight

The regression equation is
Labor = 132 + 2.73 Pounds + 0.0472 Units - 2.59 Weight

Predictor     Coef  SE Coef      T      P
Constant    131.92    25.69   5.13  0.000
Pounds       2.726    2.275   1.20  0.248
Units      0.04722  0.09335   0.51  0.620
Weight     -2.5874   0.6428  -4.03  0.001

S = 9.810   R-Sq = 77.0%   R-Sq(adj) = 72.7%

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       3  5158.3  1719.4  17.87  0.000
Residual Error  16  1539.9    96.2
Total           19  6698.2

Source  DF  Seq SS
Pounds   1  3400.6
Units    1   198.4
Weight   1  1559.3

The least squares equation is:

ŷ = 131.92 + 2.726x1 + .0472x2 - 2.587x3


b.

To test the usefulness of the model, we test:


H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0, for i = 1, 2, 3
The test statistic is F = MSR/MSE = 1719.4/96.2 = 17.87


The rejection region requires α = .01 in the upper tail of the F-distribution with ν1 = k = 3 and ν2 = n
- (k + 1) = 20 - (3 + 1) = 16. From Table VIII, Appendix B, F.01 = 5.29. The rejection region is F >
5.29.
Since the observed value of the test statistic falls in the rejection region (F = 17.87 > 5.29), H0 is
rejected. There is sufficient evidence to indicate a relationship exists between hours of labor and at
least one of the independent variables at α = .01.
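The global F above can be rebuilt directly from the ANOVA table's sums of squares; a minimal sketch (numbers taken from the printout):

```python
# Global F statistic rebuilt from the ANOVA sums of squares in the printout.
ss_regression, df_regression = 5158.3, 3
ss_error, df_error = 1539.9, 16

msr = ss_regression / df_regression  # mean square for regression
mse = ss_error / df_error            # mean square error; sqrt(mse) = s
f_stat = msr / mse

print(round(f_stat, 2))  # 17.87, matching the MINITAB output
```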
c.

H0: β2 = 0
Ha: β2 ≠ 0
The test statistic is t = 0.51. The p-value = .620. We reject H0 if the p-value < α. Since .620 > .05, do not
reject H0. There is insufficient evidence to indicate a relationship exists between hours of labor and
percentage of units shipped by truck, all other variables held constant, at α = .05.

d.

If the average number of pounds per shipment increases from 20 to 21, the estimated change in the
mean number of hours of labor is -2.587. Thus, it will cost $7.50(2.587) = $19.4025 less, if the
variables x1 and x2 are held constant.

e.

Since s = Standard Error = 9.81, we can estimate approximately with 2s precision, or 2(9.81) =
19.62 hours.

f.

R2 is printed as R-Sq. R2 = .770. We conclude that 77% of the sample variation of the labor hours is
explained by the regression model, including the independent variables pounds shipped, percentage
of units shipped by truck, and weight.

g.

No. Regression analysis only determines if variables are related. It cannot be used to determine
cause and effect.

11.25

a.

For x1 = 1, x2 = 10, x3 = 5, and x4 = 2, ŷ = 3.58 + .01(1) - .06(10) - .01(5) + .42(2) = 3.78

b.

For x1 = 0, x2 = 8, x3 = 10, and x4 = 4, ŷ = 3.58 + .01(0) - .06(8) - .01(10) + .42(4) = 4.68

11.26

You would look up the number of walks (x1), singles (x2), doubles (x3), triples (x4), home runs (x5), stolen
bases (x6), caught stealing (x7), strikeouts (x8), and outs (x9) for your favorite team. Then use the fitted
regression line to predict the number of runs scored:

ŷ = 3.70 + .34x1 + .49x2 + .72x3 + 1.14x4 + 1.51x5 + .26x6 - .14x7 - .10x8 - .10x9
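Evaluating a fitted multiple-regression equation like the one in Exercise 11.25 is a simple dot product; a quick sketch (using the coefficient signs that reproduce the two predictions 3.78 and 4.68 reported in the text):

```python
# Evaluate a fitted regression line y-hat = b0 + b1*x1 + ... + b4*x4.
# Coefficients for Exercise 11.25, signs chosen so that the two predictions
# reported in the text (3.78 and 4.68) are reproduced.
b0, betas = 3.58, [0.01, -0.06, -0.01, 0.42]

def predict(xs):
    return b0 + sum(b * x for b, x in zip(betas, xs))

print(round(predict([1, 10, 5, 2]), 2))  # 3.78
print(round(predict([0, 8, 10, 4]), 2))  # 4.68
```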

11.27

a.

The 95% prediction interval is (1,759.75, 4,275.38). We are 95% confident that the actual
annual earnings for a vendor who is 45 years old and who works 10 hours per day is between
$1,759.75 and $4,275.38.

b.

The 95% confidence interval is (2,620.25, 3,414.87). We are 95% confident that the true mean
annual earnings for vendors who are 45 years old and who work 10 hours per day is between
$2,620.25 and $3,414.87.

c.

Yes. The prediction interval for the ACTUAL value of y is always wider than the confidence interval
for the MEAN value of y.

11.28

From the printout, the 90% prediction interval is (143.218, 180.978). We are 90% confident that an
actual DDT level for a fish caught 300 miles upstream that is 40 centimeters long and weighs 800 grams
will be between 143.218 and 180.978. Since the DDT level cannot be negative, the interval would be
between 0 and 180.978.
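The width relationship in 11.27c follows from the interval formulas: at the same point x0, the prediction interval half-width uses s·sqrt(1 + h) while the confidence interval for the mean uses s·sqrt(h), where h is the leverage of x0. A sketch with illustrative, made-up values of s, h, and the t critical value:

```python
import math

# Illustrative values only (not from any exercise): residual SD, leverage,
# and a t critical value.
s, h, t_crit = 100.0, 0.08, 2.0

ci_half = t_crit * s * math.sqrt(h)      # half-width of the CI for E(y)
pi_half = t_crit * s * math.sqrt(1 + h)  # half-width of the PI for an actual y

print(ci_half < pi_half)  # True: the prediction interval is always wider
```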

11.29

a.

The 95% prediction interval is (11,599.6, 13,665.5). We are 95% confident that the actual heat rate
will be between 11,599.6 and 13,665.5 when the RPM is 7,500, the inlet temperature is 1,000, the
exhaust temperature is 525, the cycle pressure ratio is 13.5 and the air flow rate is 10.

b.

The 95% confidence interval is (12,157.9, 13,107.1). We are 95% confident that the mean heat rate
will be between 12,157.9 and 13,107.1 when the RPM is 7,500, the inlet temperature is 1,000, the
exhaust temperature is 525, the cycle pressure ratio is 13.5 and the air flow rate is 10.

c.

Yes. The confidence interval for the mean will always be narrower than the prediction interval for the
actual value. This is because there are 2 error terms involved in predicting an actual value and only
one error term involved in estimating the mean. First, we have the error in locating the mean of the
distribution. Once the mean is located, the actual value can still vary around the mean; thus, the
second error. There is only one error term involved when estimating the mean, which is the error in
locating the mean.

11.30

a.

Yes, we agree. The fitted regression model is:

ŷ = 80,991 - 2,220.1(latitude) + 1,543.9(longitude) - .3493(depth)

Because the estimated coefficients for latitude and depth are negative, the higher levels of arsenic will occur
when these variables are low. Because the estimated coefficient for longitude is positive, the higher levels of
arsenic will occur when longitude is high.
The lowest value of latitude is 23.755, the maximum longitude is 90.662 and the minimum depth is 25.
Using MINITAB, the output is:
Predicted Values for New Observations

New Obs     Fit  SE Fit            95% CI           95% PI
      1  232.43   23.23  (186.73, 278.14)  (24.14, 440.73)X

X denotes a point that is an outlier in the predictors.

Values of Predictors for New Observations

New Obs  LATITUDE  LONGITUDE  DEPTH-FT
      1      23.8       90.7      25.0

From the printout, the 95% prediction interval is (24.14, 440.73). We are 95% confident that the actual
arsenic level will be between 24.14 and 440.73 when the latitude is 23.755, longitude is 90.662, and depth
is 25.

11.31


a.

Using MINITAB, the results are:


Regression Analysis: PPRatio versus ARTenure, AR6Year, AveSal6

The regression equation is
PPRatio = 0.70 + 0.180 ARTenure + 0.0729 AR6Year - 0.120 AveSal6

Predictor     Coef  SE Coef      T      P
Constant     0.704    1.192   0.59  0.556
ARTenure   0.17957  0.08876   2.02  0.045
AR6Year    0.07285  0.07379   0.99  0.325
AveSal6   -0.11981  0.04238  -2.83  0.005

S = 8.89248   R-Sq = 11.2%   R-Sq(adj) = 9.6%

Analysis of Variance
Source           DF        SS      MS     F      P
Regression        3   1704.73  568.24  7.19  0.000
Residual Error  171  13522.03   79.08
Total           174  15226.76

Source    DF  Seq SS
ARTenure   1  994.79
AR6Year    1   78.02
AveSal6    1  631.93

The least squares prediction equation is: ŷ = .704 + .180x1 + .0729x2 - .120x3
b.

To determine if the model is adequate, we test:


H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0
From the printout, the test statistic is F = 7.19 and the p-value is p = .000.
Since the p-value is less than α (p = .000 < .05), H0 is rejected. There is sufficient evidence to indicate
the model is adequate at α = .05.

c.

Using MINITAB, the results are:


Predicted Values for New Observations

New Obs     Fit  SE Fit           95% CI            95% PI
      1  10.098   1.830  (6.486, 13.710)  (-7.822, 28.019)

Values of Predictors for New Observations

New Obs  ARTenure  AR6Year  AveSal6
      1      40.0     32.0     1.00

The 95% prediction interval for the efficiency rating of a CEO with x1 = 40%, x2 = 32%, and x3 = $1
million is (-7.822, 28.019). We are 95% confident that the actual efficiency rating of a CEO with the
above values for the independent variables is between -7.822 and 28.019.

11.32


The first order model is:

E(y) = β0 + β1x1 + β2x2 + β3x5
We want to find a 95% prediction interval for the actual voltage when the volume fraction of the disperse
phase is at the high level (x1 = 80), the salinity is at the low level (x2 = 1), and the amount of surfactant is at
the low level (x5 = 2).
Using MINITAB, the output is:
The regression equation is
y = 0.993 - 0.0243 x1 + 0.142 x2 + 0.385 x5

Predictor      Coef     StDev      T      P
Constant     0.9326    0.2482   3.76  0.002
x1        -0.024272  0.004900  -4.95  0.000
x2          0.14206   0.07573   1.88  0.080
x5          0.38457   0.09801   3.92  0.001

S = 0.4796   R-Sq = 66.6%   R-Sq(adj) = 59.9%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       3   6.8701  2.2900  9.95  0.001
Residual Error  15   3.4509  0.2301
Total           18  10.3210

Source  DF  Seq SS
x1       1  1.4016
x2       1  1.9263
x5       1  3.5422

Unusual Observations
Obs    x1      y    Fit  StDev Fit  Residual  St Resid
  3  40.0  3.200  2.068      0.239     1.132     2.72R

R denotes an observation with a large standardized residual

Predicted Values

    Fit  StDev Fit          95.0% CI          95.0% PI
 -0.098      0.232   (-0.592, 0.396)   (-1.233, 1.038)

The 95% prediction interval is (-1.233, 1.038). We are 95% confident that the actual voltage is between
-1.233 and 1.038 kw/cm when the volume fraction of the disperse phase is at the high level (x1 = 80), the
salinity is at the low level (x2 = 1), and the amount of surfactant is at the low level (x5 = 2).

11.33


a.

From MINITAB, the output is:


Regression Analysis: Man-Hours versus Capacity, Pressure, Type, Drum

The regression equation is
Man-Hours = - 3783 + 0.00875 Capacity + 1.93 Pressure + 3444 Type + 2093 Drum

Predictor       Coef    SE Coef      T      P
Constant       -3783       1205  -3.14  0.004
Capacity   0.0087490  0.0009035   9.68  0.000
Pressure      1.9265     0.6489   2.97  0.006
Type          3444.3      911.7   3.78  0.001
Drum          2093.4      305.6   6.85  0.000

S = 894.6   R-Sq = 90.3%   R-Sq(adj) = 89.0%

Analysis of Variance
Source          DF         SS        MS      F      P
Regression       4  230854854  57713714  72.11  0.000
Residual Error  31   24809761    800315
Total           35  255664615

Source    DF     Seq SS
Capacity   1  175007141
Pressure   1     490357
Type       1   17813091
Drum       1   37544266

Predicted Values for New Observations

New Obs   Fit  SE Fit      95.0% CI    95.0% PI
      1  1936     239  (1449, 2424)  (48, 3825)

Values of Predictors for New Observations

New Obs  Capacity  Pressure  Type  Drum
      1    150000       500  1.00     0

The fitted regression line is:

ŷ = -3,783 + 0.00875x1 + 1.9265x2 + 3,444.3x3 + 2,093.4x4


b.

To determine if the model is useful for predicting the number of man-hours needed, we test:
H0: β1 = β2 = β3 = β4 = 0
Ha: At least one βi ≠ 0, i = 1, 2, 3, 4
The test statistic is F = 72.11 with p-value = .000. Since the p-value is less than α = .01, we can
reject H0. There is sufficient evidence that the model is useful for predicting man-hours at α = .01.

c.

The confidence interval is (1449, 2424).


With 95% confidence, we can conclude that the mean number of man-hours for all boilers with
characteristics x1 = 150,000, x2 = 500, x3 = 1, x4 = 0 will fall between 1449 hours and 2424 hours.

11.34

a.

E(y) = β0 + β1x1 + β2x2 + β3x1x2

b.

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3 + β6x2x3

11.35

a.

The response surface is a twisted surface in three-dimensional space.

b.

For x1 = 0, E(y) = 3 + 0 + 2x2 - 0(x2) = 3 + 2x2
For x1 = 1, E(y) = 3 + 1 + 2x2 - 1(x2) = 4 + x2
For x1 = 2, E(y) = 3 + 2 + 2x2 - 2(x2) = 5
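The three substituted lines come from E(y) = 3 + x1 + 2x2 - x1x2 (the model implied by the substitutions above); a quick sketch confirming that the x2-slopes shrink as x1 grows:

```python
# E(y) = 3 + x1 + 2*x2 - x1*x2, the interaction model behind the three lines.
def mean_y(x1, x2):
    return 3 + x1 + 2 * x2 - x1 * x2

# Slope in x2 at each fixed x1: the coefficient of x2 is (2 - x1).
slopes = [mean_y(x1, 1) - mean_y(x1, 0) for x1 in (0, 1, 2)]
print(slopes)  # [2, 1, 0] -- the lines are not parallel, so x1 and x2 interact
```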


The plot of the lines (omitted here) shows the three non-parallel lines E(y) = 3 + 2x2, E(y) = 4 + x2, and E(y) = 5.

c.

The lines are not parallel because interaction between x1 and x2 is present. Interaction between x1 and
x2 means that the effect of x2 on y depends on what level x1 takes on.

d.

For x1 = 0, as x2 increases from 0 to 5, E(y) increases from 3 to 13.


For x1 = 1, as x2 increases from 0 to 5, E(y) increases from 4 to 9.
For x1 = 2, as x2 increases from 0 to 5, E(y) = 5.

e.

For x1 = 2 and x2 = 4, E(y) = 5


For x1 = 0 and x2 = 5, E(y) = 13
Thus, E(y) changes from 5 to 13.

11.36

a.

R2 = 1 - SSE/SSyy = 1 - 21/479 = .956

95.6% of the total variability of the y values is explained by this model.


b.

To test the utility of the model, we test:


H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0, i = 1, 2, 3

The test statistic is F = (R2/k) / [(1 - R2)/(n - (k + 1))] = (.956/3) / [(1 - .956)/(32 - (3 + 1))] = 202.8

The rejection region requires α = .05 in the upper tail of the F distribution, with ν1 = k = 3 and ν2 = n
- (k + 1) = 32 - (3 + 1) = 28. From Table VIII, Appendix B, F.05 = 2.95. The rejection region is F >
2.95.
Since the observed value of the test statistic falls in the rejection region (F = 202.8 > 2.95), H0 is
rejected. There is sufficient evidence that the model is adequate for predicting y at α = .05.


c.

The relationship between y and x1 depends on the level of x2.

d.

To determine if x1 and x2 interact, we test:

H0: β3 = 0
Ha: β3 ≠ 0

The test statistic is t = (β̂3 - 0)/sβ̂3 = 10/4 = 2.5.

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n - (k + 1)
= 32 - (3 + 1) = 28. From Table V, Appendix B, t.025 = 2.048. The rejection region is t < -2.048 or t
> 2.048.

Since the observed value of the test statistic falls in the rejection region (t = 2.5 > 2.048), H0 is
rejected. There is sufficient evidence to indicate that x1 and x2 interact at α = .05.
11.37

a.

The prediction equation is:

ŷ = -2.55 + 3.82x1 + 2.63x2 - 1.29x1x2

b.

The response surface is a twisted plane, since the equation contains an interaction term.

c.

For x2 = 1, ŷ = -2.55 + 3.82x1 + 2.63(1) - 1.29x1(1) = .08 + 2.53x1

For x2 = 3, ŷ = -2.55 + 3.82x1 + 2.63(3) - 1.29x1(3) = 5.34 - .05x1

For x2 = 5, ŷ = -2.55 + 3.82x1 + 2.63(5) - 1.29x1(5) = 10.60 - 2.63x1
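With an interaction term the slope on x1 is β̂1 + β̂3x2; a quick sketch using the fitted coefficients above:

```python
# Slope of y-hat with respect to x1 at a fixed x2, for the fitted model
# y-hat = -2.55 + 3.82*x1 + 2.63*x2 - 1.29*x1*x2.
b1, b3 = 3.82, -1.29

def slope_x1(x2):
    return b1 + b3 * x2

for x2 in (1, 3, 5):
    print(x2, round(slope_x1(x2), 2))  # 2.53, -0.05, -2.63: sign flips with x2
```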

d.

If x1 and x2 interact, the effect of x1 on ŷ is different at different levels of x2. When x2 = 1, as x1
increases, ŷ also increases. When x2 = 5, as x1 increases, ŷ decreases.

e.


The hypotheses are:


H0: β3 = 0
Ha: β3 ≠ 0

f.

The test statistic is t = β̂3/sβ̂3 = -1.285/.159 = -8.06

The rejection region requires α/2 = .01/2 = .005 in each tail of the t distribution with df = n - (k + 1)
= 15 - (3 + 1) = 11. From Table V, Appendix B, t.005 = 3.106. The rejection region is t < -3.106 or
t > 3.106.

Since the observed value of the test statistic falls in the rejection region (t = -8.06 < -3.106), H0 is
rejected. There is sufficient evidence to indicate that x1 and x2 interact at α = .01.
11.38

a.

To determine if the overall model is useful for predicting y, we test:


H0: β1 = β2 = β3 = 0
Ha: At least one βi is not 0
The test statistic is F = 226.35 and the p-value is p < .001. Since the p-value is less than α
(p < .001 < .05), H0 is rejected. There is sufficient evidence to indicate the overall model is useful
for predicting y, willingness of the consumer to shop at a retailer's store in the future, at α = .05.

b.

To determine if consumer satisfaction and retailer interest interact to affect willingness to shop at the
retailer's shop in the future, we test:

H0: β3 = 0
Ha: β3 ≠ 0

The test statistic is t = -3.09 and the p-value is p < .01. Since the p-value is less than α (p < .01 <
.05), H0 is rejected. There is sufficient evidence to indicate consumer satisfaction and retailer interest
interact to affect willingness to shop at the retailer's shop in the future at α = .05.

c.

When x2 = 1,

ŷ = β̂0 + .426x1 + .044x2 - .157x1x2
  = β̂0 + .426x1 + .044(1) - .157x1(1)
  = (β̂0 + .044) + (.426 - .157)x1
  = β̂0 + .044 + .269x1

Since no value is given for β̂0, we will use β̂0 = 1 for graphing purposes. Using MINITAB, a
graph might look like:


[Scatterplot of ŷ vs x1 when x2 = 1]

d.

When x2 = 7,

ŷ = β̂0 + .426x1 + .044x2 - .157x1x2
  = β̂0 + .426x1 + .044(7) - .157x1(7)
  = (β̂0 + .308) + (.426 - 1.099)x1
  = β̂0 + .308 - .673x1

Since no value is given for β̂0, we will again use β̂0 = 1 for graphing purposes.

Using MINITAB, a graph might look like:

[Scatterplot of ŷ vs x1 when x2 = 7]


e.

Using MINITAB, both plots on the same graph would be:

[Scatterplot of ŷ vs x1 for x2 = 1 and x2 = 7]

Since the lines are not parallel, it indicates that interaction is present.
11.39

a.

A regression model incorporating interaction between x1 and x2 would be:

E(y) = β0 + β1x1 + β2x2 + β3x1x2
b.

If the slope of the relationship between number of defects (y) and turntable speed (x1) is steeper for
lower values of cutting blade speed, then the interaction term must be negative. As the value of
cutting speed increases, the steepness gets smaller; thus, the interaction term must get smaller. This
implies β3 < 0.

11.40

a.

The hypothesized regression model including the interaction between x1 and x2 would be:

E(y) = β0 + β1x1 + β2x2 + β3x1x2

b.

If x1 and x2 interact to affect y then the effect of x1 on y depends on the level of x2. Also, the effect
of x2 on y depends on the level of x1.

c.

Since the p-value is not small (p = .25), H0 is not rejected. There is insufficient evidence to indicate
x1 and x2 interact to affect y.

d.

β1 corresponds to x1, the number ahead in line. If the negative feeling score gets larger as the
number of people ahead increases, then β1 is positive. β2 corresponds to x2, the number behind in
line. If the negative feeling score gets lower as the number of people behind increases, then β2 is
negative.

11.41


a.

Using MINITAB, the results of fitting the interaction model are:


Regression Analysis: Earnings versus Age, Hours, A_H

The regression equation is
Earnings = 1042 - 13.2 Age + 103 Hours + 3.62 A_H

Predictor    Coef  SE Coef      T      P
Constant     1042     1304   0.80  0.441
Age        -13.24    29.23  -0.45  0.659
Hours       103.3    162.0   0.64  0.537
A_H         3.621    3.840   0.94  0.366

S = 550.289   R-Sq = 61.4%   R-Sq(adj) = 50.8%

Analysis of Variance
Source          DF       SS       MS     F      P
Regression       3  5287427  1762476  5.82  0.012
Residual Error  11  3331000   302818
Total           14  8618428

Source  DF   Seq SS
Age      1   600498
Hours    1  4417734
A_H      1   269196

The least squares prediction equation is:

ŷ = 1042 - 13.24x1 + 103.3x2 + 3.621x1x2


b.

When x2 = 10, the least squares line is:

ŷ = 1042 - 13.24x1 + 103.3(10) + 3.621x1(10)
  = 1042 + 1033 + (-13.24 + 36.21)x1 = 2075 + 22.97x1

The estimated slope relating annual earnings to age is 22.97. When hours worked is equal to 10, for
each additional year of age, the mean annual earnings is estimated to increase by 22.97.

c.

When x1 = 40, the least squares line is:

ŷ = 1042 - 13.24(40) + 103.3x2 + 3.621(40)x2
  = 1042 - 529.6 + (103.3 + 144.84)x2 = 512.4 + 248.14x2

The estimated slope relating annual earnings to hours worked is 248.14. When age is equal to 40, for
each additional hour worked, the mean annual earnings is estimated to increase by 248.14.
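These conditional slopes follow the usual interaction pattern: the slope on age is β̂1 + β̂3(hours) and the slope on hours is β̂2 + β̂3(age). A quick sketch with the fitted coefficients:

```python
# Conditional slopes in the interaction fit
# y-hat = 1042 - 13.24*age + 103.3*hours + 3.621*age*hours.
b1, b2, b3 = -13.24, 103.3, 3.621

slope_age_at_10_hours = b1 + b3 * 10   # effect of one more year of age
slope_hours_at_age_40 = b2 + b3 * 40   # effect of one more hour worked

print(round(slope_age_at_10_hours, 2))  # 22.97
print(round(slope_hours_at_age_40, 2))  # 248.14
```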

d.

To determine if age and hours worked interact, we test:

H0: β3 = 0
Ha: β3 ≠ 0

e.

From the printout, the test statistic for the test for interaction is t = 0.94 and the p-value is p = .366.


f.

Since the p-value is so large (p = .366), H0 is not rejected. There is insufficient evidence to indicate
age and hours worked interact to affect annual earnings.

11.42

a.

If client credibility and linguistic delivery style interact, then the effect of client credibility on the
likelihood value depends on the level of linguistic delivery style.

b.

To determine the overall model adequacy, we test:


H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0

c.

The test statistic is F = 55.35 and the p-value is p < 0.0005.

Since the p-value is so small (p < 0.0005), H0 is rejected for any reasonable value of α. There is
sufficient evidence to indicate that the model is adequate for any α > 0.0005.

d.

To determine if client credibility and linguistic delivery style interact, we test:


H0: β3 = 0
Ha: β3 ≠ 0

e.

The test statistic is t = 4.008 and the p-value is p < 0.005.

Since the p-value is so small (p < 0.005), H0 is rejected. There is sufficient evidence to indicate that
client credibility and linguistic delivery style interact for any α > 0.005.

f.

When x1 = 22, the least squares line is:

ŷ = 15.865 + 0.037(22) - 0.678x2 + 0.036x2(22) = 16.679 + 0.114x2


The estimated slope of the Likelihood-Linguistic delivery style line when client credibility is 22 is
0.114. When client credibility is equal to 22, for each additional point increase in linguistic delivery
style, the mean likelihood is estimated to increase by 0.114.

g.

When x1 = 46, the least squares line is:

ŷ = 15.865 + 0.037(46) - 0.678x2 + 0.036x2(46) = 17.567 + 0.978x2


The estimated slope of the Likelihood-Linguistic delivery style line when client credibility is 46 is
0.978. When client credibility is equal to 46, for each additional point increase in linguistic delivery
style, the mean likelihood is estimated to increase by 0.978.

11.43

a.

Let x1 = latitude, x2 = longitude, and x3 = depth. The model is

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x3 + β5x2x3 + ε

b.

Using MINITAB, the results are:


Regression Analysis: ARSENIC versus LATITUDE, LONGITUDE, ...

The regression equation is
ARSENIC = 10845 - 1280 LATITUDE + 217 LONGITUDE - 1549 DEPTH-FT - 11.0 Lat_D
          + 20.0 Long_D

327 cases used, 1 cases contain missing values

Predictor      Coef  SE Coef      T      P
Constant      10845    67720   0.16  0.873
LATITUDE      -1280     1053  -1.22  0.225
LONGITUDE     217.4    814.5   0.27  0.790
DEPTH-FT    -1549.2    985.6  -1.57  0.117
Lat_D        -11.00    11.86  -0.93  0.355
Long_D        19.98    11.20   1.78  0.076

S = 103.072   R-Sq = 13.7%   R-Sq(adj) = 12.4%

Analysis of Variance
Source           DF       SS      MS      F      P
Regression        5   542303  108461  10.21  0.000
Residual Error  321  3410258   10624
Total           326  3952562

Source     DF  Seq SS
LATITUDE    1  132448
LONGITUDE   1  320144
DEPTH-FT    1   53179
Lat_D       1    2756
Long_D      1   33777

The least squares model is:

ŷ = 10,845 - 1,280(latitude) + 217.4(longitude) - 1,549.2(depth) - 11.00(lat_d) + 19.98(long_d)
c.

To determine if latitude and depth interact to affect arsenic level, we test:

H0: β4 = 0
Ha: β4 ≠ 0

From the printout, the test statistic is t = -0.93 and the p-value is p = .355.

Since the p-value is not less than α (p = .355 > α = .05), H0 is not rejected. There is insufficient
evidence to indicate latitude and depth interact to affect arsenic level at α = .05.

d.

To determine if longitude and depth interact to affect arsenic level, we test:

H0: β5 = 0
Ha: β5 ≠ 0

From the printout, the test statistic is t = 1.78 and the p-value is p = .076.

Since the p-value is not less than α (p = .076 > α = .05), H0 is not rejected. There is insufficient
evidence to indicate longitude and depth interact to affect arsenic level at α = .05.

e.

Because the interactions are not significant, this means that the effect of latitude on the arsenic levels
does not depend on the depth and the effect of longitude on the arsenic levels does not depend on the
depth.

11.44

a.

The model that incorporates the researchers' theories is:

E(y) = β0 + β1x2 + β2x3 + β3x5 + β4x2x5 + β5x3x5

b.

Using MINITAB, the results of fitting the model are:


Regression Analysis: HEATRATE versus INLET-TEMP, EXH-TEMP, ...

The regression equation is
HEATRATE = 13945 - 15.1 INLET-TEMP + 28.8 EXH-TEMP - 0.69 AIRFLOW
           + 0.0228 IT_AFR - 0.0543 ET_AFR

Predictor        Coef   SE Coef       T      P
Constant        13945      1044   13.35  0.000
INLET-TEMP   -15.1379    0.7775  -19.47  0.000
EXH-TEMP       28.843     2.304   12.52  0.000
AIRFLOW        -0.689     3.628   -0.19  0.850
IT_AFR       0.022770  0.002999    7.59  0.000
ET_AFR       -0.05430   0.01053   -5.16  0.000

S = 425.072   R-Sq = 93.4%   R-Sq(adj) = 92.9%

Analysis of Variance
Source          DF         SS        MS       F      P
Regression       5  156875371  31375074  173.64  0.000
Residual Error  61   11021838    180686
Total           66  167897208

The least squares prediction equation is:

ŷ = 13,945 - 15.1379x2 + 28.843x3 - 0.689x5 + 0.02277x2x5 - 0.0543x3x5


c.

To determine if inlet temperature and air flow rate interact to affect heat rate, we test:

H0: β4 = 0
Ha: β4 ≠ 0

The test statistic is t = 7.59 with a p-value of p = 0.000. Since the p-value is less than
α = .05, H0 is rejected. There is sufficient evidence to indicate that inlet temperature and air flow rate
interact to affect heat rate at α = .05.

d.

To determine if exhaust temperature and air flow rate interact to affect heat rate, we test:

H0: β5 = 0
Ha: β5 ≠ 0

The test statistic is t = -5.16 with a p-value of p = 0.000. Since the p-value is less than
α = .05, H0 is rejected. There is sufficient evidence to indicate that exhaust temperature and air flow
rate interact to affect heat rate at α = .05.

e.

Since the interaction of inlet temperature and air flow rate is significant, it means that the effect of
inlet temperature on the heat rate depends on the level of air flow rate. Also, since the interaction of
exhaust temperature and air flow rate is significant, it means that the effect of exhaust temperature on
the heat rate also depends on the level of air flow rate.

11.45

a.

By including the interaction terms, it implies that the relationship between voltage and volume
fraction of the disperse phase depends on the levels of salinity and surfactant concentration.
A possible sketch of the relationship is:

b.

From MINITAB, the output is:


Regression Analysis: Voltage versus x1, x2, x5, x1x2, x1x5

The regression equation is
Voltage = 0.906 - 0.0228 x1 + 0.305 x2 + 0.275 x5 - 0.00280 x1x2
          + 0.00158 x1x5

Predictor       Coef   SE Coef      T      P
Constant      0.9057    0.2855   3.17  0.007
x1         -0.022753  0.008318  -2.74  0.017
x2            0.3047    0.2366   1.29  0.220
x5            0.2747    0.2270   1.21  0.248
x1x2       -0.002804  0.003790  -0.74  0.473
x1x5        0.001579  0.003947   0.40  0.696

S = 0.5047   R-Sq = 67.9%   R-Sq(adj) = 55.6%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       5   7.0103  1.4021  5.51  0.006
Residual Error  13   3.3107  0.2547
Total           18  10.3210

Source  DF  Seq SS
x1       1  1.4016
x2       1  1.9263
x5       1  3.5422
x1x2     1  0.0994
x1x5     1  0.0408

The fitted regression line is:

ŷ = .906 - .023x1 + .305x2 + .275x5 - .003x1x2 + .002x1x5


To determine if the model is useful, we test:

H0: β1 = β2 = β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0, for i = 1, 2, ..., 5

The test statistic is F = 5.51.

Since no α was given, α = .05 will be used. The rejection region requires α = .05 in the upper tail of
the F-distribution with ν1 = k = 5 and ν2 = n - (k + 1) = 19 - (5 + 1) = 13. From Table VIII,
Appendix B, F.05 = 3.03. The rejection region is F > 3.03.

Since the observed value of the test statistic falls in the rejection region (F = 5.51 > 3.03), H0 is
rejected. There is sufficient evidence to indicate the model is useful for predicting voltage at α = .05.
R² = .679. Thus, 67.9% of the sample variation of voltage is explained by the model containing the
three independent variables and two interaction terms.

The estimate of the standard deviation is s = .5047.

Comparing this model to that fit in Exercise 11.20, the model in Exercise 11.20 appears to fit the data
better. The model in Exercise 11.20 has a higher R² (.771 vs .679) and a smaller estimate of the
standard deviation (.4365 vs .5047).
c.

β̂0 = .906. This is simply the estimate of the y-intercept.

β̂1 = -.023. For each unit increase in disperse phase volume, we estimate that the mean voltage
will decrease by .023 units, holding salinity and surfactant concentration at 0.

β̂2 = .305. For each unit increase in salinity, we estimate that the mean voltage will increase
by .305 units, holding disperse phase volume and surfactant concentration at 0.

β̂3 = .275. For each unit increase in surfactant concentration, we estimate that the mean
voltage will increase by .275 units, holding disperse phase volume and salinity at 0.

β̂4 = -.003. This estimates the difference in the slope of the relationship between voltage and
disperse phase volume for each unit increase in salinity, holding surfactant
concentration constant.

β̂5 = .002. This estimates the difference in the slope of the relationship between voltage and
disperse phase volume for each unit increase in surfactant concentration, holding
salinity constant.
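The two interaction interpretations can be made concrete with a small numeric sketch (ours, not part of the original solution). In the fitted model the slope of ŷ with respect to x1 is β̂1 + β̂4x2 + β̂5x5, so each unit of salinity shifts that slope by β̂4 and each unit of surfactant concentration shifts it by β̂5:

```python
# Slope of predicted voltage with respect to x1 in the fitted model
# y-hat = .906 - .023*x1 + .305*x2 + .275*x5 - .003*x1*x2 + .002*x1*x5
b1, b4, b5 = -0.023, -0.003, 0.002

def slope_x1(x2, x5):
    """Change in y-hat per unit x1, at fixed salinity x2 and surfactant x5."""
    return b1 + b4 * x2 + b5 * x5

print(round(slope_x1(0, 0), 3))                   # baseline slope: -0.023
print(round(slope_x1(1, 0) - slope_x1(0, 0), 3))  # shift per unit salinity: -0.003
print(round(slope_x1(0, 1) - slope_x1(0, 0), 3))  # shift per unit surfactant: 0.002
```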

11.46

a.

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5

b.

H0: β4 = 0

c.

t = 4.408, p-value = .001

Since the p-value is so small, there is strong evidence to reject H0. There is sufficient evidence to
indicate that the strength of client-therapist relationship contributes information for the prediction of
a client's reaction for any α > .001.

d.

R² = .2946. 29.46% of the variability in the client's reaction scores can be explained by this model.

e.

Answers may vary.

11.47

a.

E(y) = β0 + β1x + β2x²

b.

E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²

c.

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3 + β6x2x3 + β7x1² + β8x2² + β9x3²

11.48

a.

H0: β2 = 0
Ha: β2 ≠ 0

The test statistic is t = (β̂2 - 0)/s_β̂2 = (.47 - 0)/.15 = 3.133

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n - (k + 1)
= 25 - (2 + 1) = 22. From Table V, Appendix B, t.025 = 2.074. The rejection region is t < -2.074 or t
> 2.074.

Since the observed value of the test statistic falls in the rejection region (t = 3.133 > 2.074), H0 is
rejected. There is sufficient evidence to indicate the quadratic term should be included in the model
at α = .05.
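The arithmetic of this t test is easy to verify. A minimal Python sketch (ours, not from the text):

```python
# t statistic for H0: beta2 = 0, using the estimate and its standard error
beta2_hat, se = 0.47, 0.15
t = (beta2_hat - 0) / se
print(round(t, 3))      # 3.133
# Two-tailed rejection region at alpha = .05 with df = 22: |t| > 2.074
print(abs(t) > 2.074)   # True
```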
b.

H0: β2 = 0
Ha: β2 > 0

The test statistic is the same as in part a, t = 3.133.

The rejection region requires α = .05 in the upper tail of the t distribution with df = 22. From Table
V, Appendix B, t.05 = 1.717. The rejection region is t > 1.717.

Since the observed value of the test statistic falls in the rejection region (t = 3.133 > 1.717), H0 is
rejected. There is sufficient evidence to indicate the quadratic curve opens upward at α = .05.

11.49

a.

To determine if the model contributes information for predicting y, we test:

H0: β1 = β2 = 0
Ha: At least one βi ≠ 0, i = 1, 2

The test statistic is F = (R²/k) / {(1 - R²)/[n - (k + 1)]} = (.91/2) / {(1 - .91)/[20 - (2 + 1)]} = 85.94

The rejection region requires α = .05 in the upper tail of the F distribution, with ν1 = k = 2, and
ν2 = n - (k + 1) = 20 - (2 + 1) = 17. From Table VIII, Appendix B, F.05 = 3.59. The rejection region
is F > 3.59.

Since the observed value of the test statistic falls in the rejection region (F = 85.94 > 3.59), H0 is
rejected. There is sufficient evidence that the model contributes information for predicting y at
α = .05.
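The F statistic above is computed entirely from R², k, and n. A short Python sketch (our own, using the formula in the solution) reproduces both this value and the F = 26.25 that appears later in Exercise 11.54:

```python
# Global F statistic from the coefficient of determination:
# F = (R^2 / k) / ((1 - R^2) / (n - (k + 1)))
def f_from_r2(r2, n, k):
    return (r2 / k) / ((1 - r2) / (n - (k + 1)))

print(round(f_from_r2(0.91, 20, 2), 2))   # 85.94 (this exercise)
print(round(f_from_r2(0.12, 388, 2), 2))  # 26.25 (Exercise 11.54)
```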
b.

To determine if upward curvature exists, we test:

H0: β2 = 0
Ha: β2 > 0

c.

To determine if downward curvature exists, we test:

H0: β2 = 0
Ha: β2 < 0

11.50

a.

b.

It moves the graph to the right (-2x) or to the left (+2x) compared to the graph of
y = 1 + x².

c.

It controls whether the graph opens up (+x²) or down (-x²). It also controls how steep the curvature
is, i.e., the larger the absolute value of the coefficient of x², the narrower the curve is.

11.51

a.

To determine if at least one of the parameters is nonzero, we test:

H0: β1 = β2 = β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 1, 2, 3, 4, 5

The test statistic is F = 25.93, with p-value = 0.000. Since the p-value is less than α = .05, H0 is
rejected. There is sufficient evidence to indicate that at least one of the parameters β1, β2, β3, β4, and
β5 is nonzero at α = .05.

b.

H0: β4 = 0
Ha: β4 ≠ 0

The test statistic is t = 10.74 with p-value = 0.000. Since the p-value is less than
α = .01, H0 is rejected. There is sufficient evidence to indicate that β4 ≠ 0 at α = .01.

c.

H0: β5 = 0
Ha: β5 ≠ 0

The test statistic is t = .60 with p-value = .550. Since the p-value is greater than α = .01, H0 is not
rejected. There is insufficient evidence to indicate that β5 ≠ 0 at α = .01.

d.

Graphs may vary.

11.52

a.

β̂0 has no meaning because x = 0 would not be in the observed range of values.

b.

β̂1 = 321.67. Since the quadratic effect is included in the model, the linear term is
just a location parameter and has no meaning.

c.

β̂2 = .0794. Since the value of β̂2 is positive, the curvature is upward.

d.

Since no data could have been collected from 2009 to 2021, we have no idea if the relationship
between the two variables will remain the same until 2021.

11.53

a.

If information were available only for x = 30, 31, 32, and 33, we would suggest a first-order model
where β1 > 0. If information was available only for x = 33, 34, 35, and 36, we would again suggest a
first-order model where β1 < 0. If all the information was available, we would suggest a second-order
model.

b.

11.54

a.

To determine if the model is adequate, we test:

H0: β1 = β2 = 0
Ha: At least one βi ≠ 0

The test statistic is F = (R²/k) / {(1 - R²)/[n - (k + 1)]} = (.12/2) / {(1 - .12)/[388 - (2 + 1)]} = 26.25

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k = 2 and
ν2 = n - (k + 1) = 388 - (2 + 1) = 385. From Table VIII, Appendix B, F.05 ≈ 3.00. The rejection
region is F > 3.00.

Since the observed value of the test statistic falls in the rejection region (F = 26.25 > 3.00), H0 is
rejected. There is sufficient evidence to indicate the model is adequate at α = .05.

b.

To determine if leadership ability increases at a decreasing rate with assertiveness, we test:

H0: β2 = 0
Ha: β2 < 0

c.

From the table, the test statistic is t = -3.97 and the p-value is p < .01/2 = .005. Since the p-value is
less than α (p < .005 < .05), H0 is rejected. There is sufficient evidence to indicate leadership ability
increases at a decreasing rate with assertiveness at α = .05.

11.55

a.

The complete 2nd order model is:

E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²

b.

R² = .14. 14% of the total variation in the efficiency scores is explained by the complete 2nd order
model containing level of CEO leadership and level of congruence between the CEO and the VP.

c.

If the β-coefficient for the x2² term is negative, then as the value of the level of congruence increases,
the efficiency will increase at a decreasing rate to some point and then the efficiency will decrease at
an increasing rate, holding level of CEO leadership constant.

d.

Since the p-value is less than α (p = .02 < .05), H0 is rejected. There is sufficient evidence to indicate
that the level of CEO leadership and the level of congruence between the CEO and the VP interact to
affect efficiency. This means that the effect of CEO leadership on efficiency depends on the level of
congruence between the CEO and the VP.

11.56

a.

E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²

b.

β4x1² and β5x2²

11.57

a.

A first order model is:

E(y) = β0 + β1x

b.

A second order model is:

E(y) = β0 + β1x + β2x²

c.

Using MINITAB, a scattergram of these data is:

[Scatterplot of International vs Domestic]

From the plot, it appears that the first order model might fit the data better. There does not appear to
be much of a curve to the relationship.
d.

Using MINITAB, the output is:


Regression Analysis: International versus Domestic, Dsq

The regression equation is
International = 183 - 0.24 Domestic + 0.00262 Dsq

Predictor      Coef   SE Coef      T      P
Constant      182.9     301.0   0.61  0.554
Domestic     -0.243     1.849  -0.13  0.897
Dsq        0.002625  0.002523   1.04  0.317

S = 175.370   R-Sq = 65.4%   R-Sq(adj) = 60.1%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       2   755320  377660  12.28  0.001
Residual Error  13   399811   30755
Total           15  1155131

Source    DF  Seq SS
Domestic   1  722025
Dsq        1   33295

To investigate the usefulness of the model, we test:

H0: β1 = β2 = 0
Ha: At least one βi ≠ 0, i = 1, 2

The test statistic is F = 12.28.

The p-value is p = 0.001. Since the p-value is so small, we reject H0. There is sufficient evidence to
indicate the model is useful for predicting foreign gross revenue.

To determine if a curvilinear relationship exists between foreign and domestic gross revenues, we
test:

H0: β2 = 0
Ha: β2 ≠ 0

The test statistic is t = 1.04.

The p-value is p = .317. Since the p-value is greater than α = .05
(p = .317 > α = .05), H0 is not rejected. There is insufficient evidence to indicate that a curvilinear
relationship exists between foreign and domestic gross revenues at α = .05.
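Each t statistic in the MINITAB printout is simply the coefficient estimate divided by its standard error. A quick Python check (ours) using the values quoted above:

```python
# t = Coef / SE Coef for each term in the printout
coef = {"Domestic": -0.243, "Dsq": 0.002625}
se   = {"Domestic":  1.849, "Dsq": 0.002523}

for term in coef:
    print(term, round(coef[term] / se[term], 2))  # Domestic -0.13, Dsq 1.04
```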
e.

From the analysis in part d, the first-order model better explains the variation in foreign gross
revenues. In part d, we concluded that the second-order term did not improve the model.

11.58

a.

Using MINITAB, a sketch of the least squares prediction equation is:

[Scatterplot of yhat vs Dose]
b.

For x = 500, ŷ = 10.25 + .0053(500) - .0000266(500²) = 10.25 + 2.65 - 6.65 = 6.25

c.

For x = 0, ŷ = 10.25 + .0053(0) - .0000266(0²) = 10.25

d.

For x = 100, ŷ = 10.25 + .0053(100) - .0000266(100²) = 10.25 + .53 - .266 = 10.514

This value is slightly larger than that for the control group (10.25).

For x = 200, ŷ = 10.25 + .0053(200) - .0000266(200²) = 10.25 + 1.06 - 1.064 = 10.246

This value is slightly smaller than that for the control group (10.25). So, the largest value of x which
yields an estimated weight change that is closest to, but just less than the estimated weight change for
the control group is x = 200.
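The predicted values above come from evaluating the fitted quadratic at each dose. A short Python sketch (ours) reproduces them:

```python
# Fitted quadratic: y-hat = 10.25 + .0053*x - .0000266*x^2
def y_hat(x):
    return 10.25 + 0.0053 * x - 0.0000266 * x ** 2

for x in (0, 100, 200, 500):
    print(x, round(y_hat(x), 3))  # 10.25, 10.514, 10.246, 6.25
```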

11.59

a.

Using MINITAB, the scattergram of the data is:

[Scatterplot of Time vs Temp]

The relationship appears to be curvilinear. As temperature increases, the value of time tends to
decrease but at a decreasing rate.
b.

Using MINITAB the results are:


Regression Analysis: Time versus Temp, Tempsq

The regression equation is
Time = 154243 - 1909 Temp + 5.93 Tempsq

Predictor     Coef  SE Coef      T      P
Constant    154243    21868   7.05  0.000
Temp       -1908.9    303.7  -6.29  0.000
Tempsq       5.929    1.048   5.66  0.000

S = 688.137   R-Sq = 94.2%   R-Sq(adj) = 93.5%

Analysis of Variance
Source          DF         SS        MS       F      P
Regression       2  144830280  72415140  152.93  0.000
Residual Error  19    8997107    473532
Total           21  153827386

Source  DF     Seq SS
Temp     1  129663987
Tempsq   1   15166293

The fitted regression line is: ŷ = 154,243 - 1,908.9temp + 5.929temp²

c.

To determine if there is an upward curvature in the relationship between failure time and solder
temperature, we test:

H0: β2 = 0
Ha: β2 > 0

From the printout, the test statistic is t = 5.66 and the p-value is p = 0.000. Since the p-value is less
than α (p = 0.000 < .05), H0 is rejected. There is sufficient evidence to indicate an upward curvature
in the relationship between failure time and solder temperature at α = .05.
11.60

a.

Using MINITAB, the results are:

Regression Analysis: RATE versus EST, Esq

The regression equation is
RATE = - 288 + 1.39 EST + 0.000035 Esq

Predictor        Coef     SE Coef      T      P
Constant         -288        8049  -0.04  0.972
EST             1.395       3.651   0.38  0.706
Esq        0.00003509  0.00009724   0.36  0.722

S = 31901.1   R-Sq = 45.9%   R-Sq(adj) = 40.8%

Analysis of Variance
Source          DF           SS          MS     F      P
Regression       2  18138955261  9069477631  8.91  0.002
Residual Error  21  21371254395  1017678781
Total           23  39510209656

Source  DF       Seq SS
EST      1  18006405335
Esq      1    132549926

To determine if the incidence rate is curvilinearly related to the estimated rate, we test:

H0: β2 = 0
Ha: β2 ≠ 0

From the printout, the test statistic is t = .36 and the p-value is p = .722. Since the p-value is not less
than α (p = .722 > α = .05), H0 is not rejected. There is insufficient evidence to indicate that the
incidence rate is curvilinearly related to the estimated rate at α = .05.


b.

Using MINITAB, the scatterplot of the data is:

[Scatterplot of RATE vs EST]

The point for Botulism is in the lower right hand corner of the graph. The estimated value is much
larger than the actual value.
c.

Using MINITAB, the results are:


Regression Analysis: RATE2 versus EST2, Esq2

The regression equation is
RATE2 = 735 - 0.081 EST2 + 0.000151 Esq2

Predictor        Coef     SE Coef      T      P
Constant        735.0       695.9   1.06  0.303
EST2          -0.0810      0.3167  -0.26  0.801
Esq2       0.00015052  0.00000868  17.34  0.000

S = 2756.80   R-Sq = 99.6%   R-Sq(adj) = 99.6%

Analysis of Variance
Source          DF           SS           MS        F      P
Regression       2  39251490541  19625745270  2582.35  0.000
Residual Error  20    151998825      7599941
Total           22  39403489366

Source  DF       Seq SS
EST2     1  36967483627
Esq2     1   2284006914

To determine if the incidence rate is curvilinearly related to the estimated rate after eliminating the
point for Botulism, we test:

H0: β2 = 0
Ha: β2 ≠ 0

From the printout, the test statistic is t = 17.34 and the p-value is p = .000. Since the p-value is less
than α (p = .000 < .05), H0 is rejected. There is sufficient evidence to indicate that the incidence rate
is curvilinearly related to the estimated rate after omitting the Botulism point at α = .05.

Yes, the fit has improved. With all of the points, the value of R² = 45.9%. When the Botulism point
has been omitted, R² = 99.6%. Almost all of the variation in the incidence rates is explained by the
curvilinear relationship between incidence rate and estimated value.
11.61

The model would be E(y) = β0 + β1x + β2x². Since the value
of y is expected to increase and then decrease as x gets larger,
β2 will be negative. A sketch of the model would be:

[Sketch of a downward-opening quadratic curve]

11.62

a.

A scatterplot of the data is:

[Character scatterplot of Y versus X, for X from 0 to 40]

b.

From the plot, it looks like a second-order model would fit the data better than a first-order model.
There is little evidence that a third-order model would fit the data better than a second-order model.

c.

Using MINITAB, the output for fitting a first-order model is:

The regression equation is
Y = 2752 + 122 X

Predictor    Coef  Stdev  t-ratio      p
Constant   2752.4  613.5     4.49  0.000
X          122.34  26.08     4.69  0.000

s = 1904   R-sq = 36.7%   R-sq(adj) = 35.0%

Analysis of Variance
SOURCE      DF         SS        MS      F      p
Regression   1   79775688  79775688  22.01  0.000
Error       38  137726224   3624374
Total       39  217501920

Unusual Observations
Obs.     X      Y   Fit  Stdev.Fit  Residual  St.Resid
 27   27.0   2007  6056        345     -4049    -2.16R
 40   40.0  11520  7646        591      3874     2.14R

R denotes an obs. with a large st. resid.

To see if there is a significant linear relationship between day and demand, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = 4.69.

The p-value for the test is p = 0.000. Since the p-value is less than α = .05, H0 is rejected. There is
sufficient evidence to indicate that there is a linear relationship between day and demand at α = .05.
d.

Using MINITAB, the output for fitting a second-order model is:

The regression equation is
Y = 5120 - 216 X + 8.25 XSQ

Predictor     Coef  Stdev  t-ratio      p
Constant    5120.2  816.9     6.27  0.000
X          -215.92  91.89    -2.35  0.024
XSQ          8.250  2.173     3.80  0.001

s = 1637   R-sq = 54.4%   R-sq(adj) = 52.0%

Analysis of Variance
SOURCE      DF         SS        MS      F      p
Regression   2  118377056  59188528  22.09  0.000
Error       37   99124856   2679050
Total       39  217501920

SOURCE  DF    SEQ SS
X        1  79775688
XSQ      1  38601372

Unusual Observations
Obs.     X     Y   Fit  Stdev.Fit  Residual  St.Resid
 27   27.0  2007  5305        357     -3298    -2.06R

R denotes an obs. with a large st. resid.

To see if there is a significant quadratic relationship between day and demand, we test:

H0: β2 = 0
Ha: β2 ≠ 0

The test statistic is t = 3.80.

The p-value for the test is p = 0.001. Since the p-value is less than α = .05, H0 is rejected. There is
sufficient evidence to indicate that there is a quadratic relationship between day and demand at
α = .05.
e.

Since the quadratic term is significant in the second-order model in part d, the second-order model is
better.

11.63

Let x = 1 if the qualitative variable assumes the 2nd level, 0 otherwise

The model is E(y) = β0 + β1x

β0 = mean value of y when the qualitative variable assumes the first level
β1 = difference in the mean values of y between levels 2 and 1 of the qualitative variable
11.64

The model is E(y) = β0 + β1x1 + β2x2

where

x1 = 1 if the variable is at level 2, 0 otherwise
x2 = 1 if the variable is at level 3, 0 otherwise

β0 = mean value of y when qualitative variable is at level 1.
β1 = difference in mean value of y between level 2 and level 1 of qualitative variable.
β2 = difference in mean value of y between level 3 and level 1 of qualitative variable.

11.65

a.

Level 1 implies x1 = x2 = x3 = 0. ŷ = 10.2

Level 2 implies x1 = 1 and x2 = x3 = 0. ŷ = 10.2 - 4(1) = 6.2

Level 3 implies x2 = 1 and x1 = x3 = 0. ŷ = 10.2 + 12(1) = 22.2

Level 4 implies x3 = 1 and x1 = x2 = 0. ŷ = 10.2 + 2(1) = 12.2
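The four level predictions above come from plugging the dummy values into ŷ = 10.2 - 4x1 + 12x2 + 2x3. A minimal Python check (ours, not part of the original solution):

```python
# Dummy-variable model: y-hat = 10.2 - 4*x1 + 12*x2 + 2*x3
def y_hat(x1, x2, x3):
    return 10.2 - 4 * x1 + 12 * x2 + 2 * x3

# One dummy is 1 for each of levels 2-4; level 1 is the base level.
levels = {1: (0, 0, 0), 2: (1, 0, 0), 3: (0, 1, 0), 4: (0, 0, 1)}
for level, x in levels.items():
    print(level, round(y_hat(*x), 1))  # 10.2, 6.2, 22.2, 12.2
```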

b.

The hypotheses are:

H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0, i = 1, 2, 3

11.66

a.

The least squares prediction equation is:

ŷ = 80 + 16.8x1 + 40.4x2

b.

β̂1 estimates the difference in the mean value of the dependent variable between level 2 and level 1
of the independent variable.

β̂2 estimates the difference in the mean value of the dependent variable between level 3 and level 1
of the independent variable.

c.

The hypothesis H0: β1 = β2 = 0 is the same as H0: μ1 = μ2 = μ3.

The hypothesis Ha: At least one of the parameters β1 and β2 differs from 0 is the same as Ha: At
least one mean (μ1, μ2, or μ3) is different.

d.

The test statistic is F = MSR/MSE = 2059.5/83.3 = 24.72

Since no α was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of
the F distribution with numerator df = k = 2 and denominator df = n - (k + 1) = 15 - (2 + 1) = 12.
From Table VIII, Appendix B, F.05 = 3.89. The rejection region is F > 3.89.

Since the observed value of the test statistic falls in the rejection region (F = 24.72 > 3.89), H0 is
rejected. There is sufficient evidence to indicate at least one of the means is different at α = .05.
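The global F statistic here is just the ratio of the two mean squares from the ANOVA table. A quick Python check (ours, with the MSR and MSE quoted above):

```python
# Global F test: F = MSR / MSE
msr, mse = 2059.5, 83.3
f = msr / mse
print(round(f, 2))   # 24.72
# Rejection region at alpha = .05 with df 2 and 12: F > 3.89
print(f > 3.89)      # True
```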
11.67

a.

Let x1 = 1 if grape-picking method is manual, 0 otherwise
Let x2 = 1 if soil type is clay, 0 otherwise
Let x3 = 1 if soil type is gravel, 0 otherwise
Let x4 = 1 if slope orientation is East, 0 otherwise
Let x5 = 1 if slope orientation is South, 0 otherwise
Let x6 = 1 if slope orientation is West, 0 otherwise
Let x7 = 1 if slope orientation is Southeast, 0 otherwise

b.

The model is: E(y) = β0 + β1x1

β0 = mean wine quality for grape-picking method automated
β1 = difference in mean wine quality between grape-picking methods manual and automated

c.

The model is: E(y) = β0 + β1x2 + β2x3

β0 = mean wine quality for soil type sand
β1 = difference in mean wine quality between soil types clay and sand
β2 = difference in mean wine quality between soil types gravel and sand

d.

The model is: E(y) = β0 + β1x4 + β2x5 + β3x6 + β4x7

β0 = mean wine quality for slope orientation Southwest
β1 = difference in mean wine quality between slope orientations East and Southwest
β2 = difference in mean wine quality between slope orientations South and Southwest
β3 = difference in mean wine quality between slope orientations West and Southwest
β4 = difference in mean wine quality between slope orientations Southeast and Southwest

11.68
a.

Let x1 = 1 if race is black, 0 otherwise
Let x2 = 1 if availability is high, 0 otherwise
Let x3 = 1 if position is quarterback, 0 otherwise
Let x4 = 1 if position is running back, 0 otherwise
Let x5 = 1 if position is wide receiver, 0 otherwise
Let x6 = 1 if position is tight end, 0 otherwise
Let x7 = 1 if position is defensive lineman, 0 otherwise
Let x8 = 1 if position is linebacker, 0 otherwise
Let x9 = 1 if position is defensive back, 0 otherwise
b.

The model is: E(y) = β0 + β1x1

β0 = mean price for race white
β1 = difference in mean price between races black and white

c.

The model is: E(y) = β0 + β2x2

β0 = mean price for card availability low
β2 = difference in mean price between card availabilities high and low

d.

The model is: E(y) = β0 + β3x3 + β4x4 + β5x5 + β6x6 + β7x7 + β8x8 + β9x9

β0 = mean price for position offensive lineman
β3 = difference in mean price between player positions quarterback and offensive lineman
β4 = difference in mean price between player positions running back and offensive lineman
β5 = difference in mean price between player positions wide receiver and offensive lineman
β6 = difference in mean price between player positions tight end and offensive lineman
β7 = difference in mean price between player positions defensive lineman and offensive lineman
β8 = difference in mean price between player positions linebacker and offensive lineman
β9 = difference in mean price between player positions defensive back and offensive lineman

11.69

a.

Let x = 1 if Developer, 0 otherwise

Then the model would be: E(y) = β0 + β1x

β0 = mean accuracy for the Project Leader
β1 = difference in mean accuracy between the Developer and the Project Leader

b.

Let x1 = 1 if Low, 0 otherwise
Let x2 = 1 if Medium, 0 otherwise

Then the model would be: E(y) = β0 + β1x1 + β2x2

β0 = mean accuracy for the High task complexity
β1 = difference in mean accuracy between Low and High task complexity
β2 = difference in mean accuracy between Medium and High task complexity

c.

Let x = 1 if Fixed price, 0 otherwise

Then the model would be: E(y) = β0 + β1x

β0 = mean accuracy for the Hourly rate
β1 = difference in mean accuracy between the Fixed price and the Hourly rate

d.

Let x1 = 1 if Time-of-delivery, 0 otherwise
Let x2 = 1 if Cost, 0 otherwise

Then the model would be: E(y) = β0 + β1x1 + β2x2

β0 = mean accuracy for the Quality
β1 = difference in mean accuracy between Time-of-delivery and Quality
β2 = difference in mean accuracy between Cost and Quality
11.70

a.

The model would be: E(y) = β0 + β1x

b.

β0 = mean relative optimism for analysts who worked for sell-side firms
β1 = difference in mean relative optimism for analysts who worked for buy-side and sell-side firms

c.

Yes.

d.

Yes. If the buy-side analysts are less optimistic, then their estimates will be smaller than the sell-side
estimates. Thus, the estimate of β1 will be negative.

11.71

a.

R²adj = .76. 76% of the total sample variation of SAT-Math scores is explained by the regression
model including score on PSAT and whether the student was coached or not, adjusting for the sample
size and the number of independent variables in the model.
b.

For confidence level .95, α = .05 and α/2 = .05/2 = .025. From Table V, Appendix B,
with df = n - (k + 1) = 3,492 - (2 + 1) = 3,489, t.025 = 1.96. The 95% confidence interval is:

β̂2 ± t_α/2 s_β̂2 → 19 ± 1.96(3) → 19 ± 5.88 → (13.12, 24.88)

We are 95% confident that the mean SAT-Math score for those who were coached was anywhere
from 13.12 to 24.88 points higher than the mean for those who were not coached, holding PSAT
scores constant.
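The interval arithmetic can be verified directly. A minimal Python sketch (ours, not part of the original solution):

```python
# 95% confidence interval for beta2: estimate +/- t.025 * (standard error)
est, se, t025 = 19, 3, 1.96
half_width = t025 * se
ci = (round(est - half_width, 2), round(est + half_width, 2))
print(ci)  # (13.12, 24.88)
```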
c.

Since 0 is not contained in the confidence interval for β2, we can conclude that the coaching effect
was present. Those who received coaching scored higher on the SAT-Math than those who did not,
holding PSAT scores constant.

11.72

a. β̂4 = .296. The difference in the mean value of DTVA between when the operating earnings are
negative and lower than last year and when the operating earnings are not negative and lower than
last year is estimated to be .296, holding all other variables constant.

b. To determine if the mean DTVA for firms with negative earnings and earnings lower than last year
exceeds the mean DTVA of other firms, we test:

H0: β4 = 0
Ha: β4 > 0

The p-value for this test is p = .001/2 = .0005. Since the p-value is so small, we would reject H0 for
α = .05. There is sufficient evidence to indicate the mean DTVA for firms with negative earnings
and earnings lower than last year exceeds the mean DTVA of other firms at α = .05.

c. R²a = .280. 28% of the variability in the DTVA scores is explained by the model containing the 5
independent variables, adjusted for the number of variables in the model and the sample size.
11.73

a. To determine if there is a difference in the mean monthly rate of return for T-Bills between an
expansive Fed monetary policy and a restrictive Fed monetary policy, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = 8.14.

Since n is not given, we cannot determine the exact rejection region. However, we can assume
that n is greater than 2 since the data used are from 1972 through 1997. With α = .05, the critical value
of t for the rejection region will be smaller than 4.303. Thus, with α = .05, t = 8.14 will fall in the
rejection region. There is sufficient evidence to indicate a difference in the mean monthly rate of
return for T-Bills between an expansive Fed monetary policy and a restrictive Fed monetary policy at
α = .05.

However, the value of R² is .1818. The model used is explaining only 18.18% of the variability in
the monthly rate of return. This is not a particularly large value.
To determine if there is a difference in the mean monthly rate of return for Equity REIT between an
expansive Fed monetary policy and a restrictive Fed monetary policy, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = -3.46.

Since n is not given, we cannot determine the exact rejection region. However, we can assume
that n is greater than 4 since the data used are from 1972 through 1997. With α = .05, the critical value
of t for the rejection region will be smaller than 3.182. Thus, with α = .05, t = -3.46 will fall in the
rejection region. There is sufficient evidence to indicate a difference in the mean monthly rate of
return for Equity REIT between an expansive Fed monetary policy and a restrictive Fed monetary
policy at α = .05.

However, the value of R² is .0387. The model used is explaining only 3.87% of the variability in the
monthly rate of return. This is a very small value.
b. For the first model, β1 is the difference in the mean monthly rate of return for T-Bills between an
expansive Fed monetary policy and a restrictive Fed monetary policy.

For the second model, β1 is the difference in the mean monthly rate of return for Equity REIT
between an expansive Fed monetary policy and a restrictive Fed monetary policy.

c. The least squares prediction equation for the equity REIT index is:

ŷ = 0.01863 - 0.01582x

When the Federal Reserve's monetary policy is restrictive, x = 1. The predicted mean monthly rate of
return for the equity REIT index is

ŷ = 0.01863 - 0.01582(1) = .00281

When the Federal Reserve's monetary policy is expansive, x = 0. The predicted mean monthly rate of
return for the equity REIT index is

ŷ = 0.01863 - 0.01582(0) = .01863
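The two predicted values can be reproduced with a few lines (the coefficients are copied from the fitted equation above):

```python
# Fitted model for the equity REIT index: y-hat = 0.01863 - 0.01582 x
def y_hat(x):
    return 0.01863 - 0.01582 * x

restrictive = y_hat(1)  # x = 1: restrictive Fed policy
expansive = y_hat(0)    # x = 0: expansive Fed policy
print(round(restrictive, 5), round(expansive, 5))  # 0.00281 0.01863
```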

11.74

a. Let x1 = 1 if study group complete solution, 0 otherwise
   Let x2 = 1 if study group check figures, 0 otherwise

A possible model would be: E(y) = β0 + β1x1 + β2x2


b. The difference between the mean knowledge gains of students in the "completed solution" and "no
help" groups would be β1.

c. Using MINITAB, the results are:

Regression Analysis: IMPROVE versus X1, X2

The regression equation is
IMPROVE = 2.43 - 0.483 X1 + 0.287 X2

Predictor      Coef  SE Coef      T      P
Constant     2.4333   0.4941   4.92  0.000
X1          -0.4833   0.7813  -0.62  0.538
X2           0.2867   0.7329   0.39  0.697

S = 2.70636   R-Sq = 1.2%   R-Sq(adj) = 0.0%

Analysis of Variance

Source          DF       SS      MS     F      P
Regression       2    6.643   3.322  0.45  0.637
Residual Error  72  527.357   7.324
Total           74  534.000

Source  DF  Seq SS
X1       1   5.523
X2       1   1.121

The least squares prediction equation is: ŷ = 2.4333 - .4833x1 + .2867x2


d. To determine if the model is useful, we test:

H0: β1 = β2 = 0
Ha: At least one βi ≠ 0

From the printout, the test statistic is F = .45 and the p-value is p = .637. Since the p-value is not less
than α (p = .637 > .05), H0 is not rejected. There is insufficient evidence to indicate that the model is
useful at α = .05.
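The F statistic can be recomputed from the ANOVA table entries (a sketch; the sums of squares and degrees of freedom are copied from the printout above):

```python
# Recompute F from the ANOVA table: F = MS(Regression) / MSE
ss_reg, df_reg = 6.643, 2
ss_err, df_err = 527.357, 72

ms_reg = ss_reg / df_reg  # mean square for regression
ms_err = ss_err / df_err  # mean square error
f_stat = ms_reg / ms_err
print(round(f_stat, 2))  # 0.45
```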

e. From Exercise 8.28, the test statistic was F = .45 and the p-value was p = .637. These are the same as
those in part d. Thus, the results agree.

11.75

a. Let x = 1 if Lotion/cream, 0 otherwise

The model is E(y) = β0 + β1x.

b. From MINITAB, the output is:

Regression Analysis: Cost/Use versus Type

The regression equation is
Cost/Use = 0.778 + 0.109 Type

Predictor     Coef  SE Coef     T      P
Constant    0.7775   0.2975  2.61  0.023
Type        0.1092   0.4545  0.24  0.814

S = 0.8415   R-Sq = 0.5%   R-Sq(adj) = 0.0%

Analysis of Variance

Source          DF      SS      MS     F      P
Regression       1  0.0409  0.0409  0.06  0.814
Residual Error  12  8.4973  0.7081
Total           13  8.5381

The fitted model is: ŷ = 0.7775 + .1092x


c. To determine whether repellent type is a useful predictor of cost-per-use, we test:

H0: β1 = 0

d. The alternative hypothesis is

Ha: β1 ≠ 0

The test statistic is t = 0.24 and the p-value is p = 0.814.

Since the p-value is greater than α (p = .814 > .10), H0 is not rejected. There is insufficient evidence
to indicate that repellent type is a useful predictor of cost-per-use at α = .10.

e. The dummy variable will be defined the same way and the model will look the same (just the
dependent variable will be different).

From MINITAB, the output is:

Regression Analysis: MaxProt versus Type

The regression equation is
MaxProt = 7.56 - 1.65 Type

Predictor     Coef  SE Coef      T      P
Constant     7.563    2.339   3.23  0.007
Type        -1.646    3.574  -0.46  0.653

S = 6.617   R-Sq = 1.7%   R-Sq(adj) = 0.0%

Analysis of Variance

Source          DF      SS     MS     F      P
Regression       1    9.29   9.29  0.21  0.653
Residual Error  12  525.43  43.79
Total           13  534.71

The fitted model is: ŷ = 7.56 - 1.65x

To determine whether repellent type is a useful predictor of maximum number of hours of protection,
we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = -0.46 and the p-value is p = 0.653.

Since the p-value is greater than α (p = .653 > .10), H0 is not rejected. There is insufficient evidence
to indicate that repellent type is a useful predictor of maximum number of hours of protection at
α = .10.
11.76

a. For no stock split, x1 = 0. For high discretionary accrual, x2 = 1. The mean buy-and-hold return rate is
E(y) = β0 + β1x1 + β2x2 + β3x1x2 = β0 + β1(0) + β2(1) + β3(0)(1) = β0 + β2.

b. For no stock split, x1 = 0. For low discretionary accrual, x2 = 0. The mean buy-and-hold return rate is
E(y) = β0 + β1x1 + β2x2 + β3x1x2 = β0 + β1(0) + β2(0) + β3(0)(0) = β0.

c. The difference would be β0 + β2 - β0 = β2.

d. For stock split, x1 = 1. For high discretionary accrual, x2 = 1. The mean buy-and-hold return rate is
E(y) = β0 + β1x1 + β2x2 + β3x1x2 = β0 + β1(1) + β2(1) + β3(1)(1) = β0 + β1 + β2 + β3.

For stock split, x1 = 1. For low discretionary accrual, x2 = 0. The mean buy-and-hold return rate is
E(y) = β0 + β1x1 + β2x2 + β3x1x2 = β0 + β1(1) + β2(0) + β3(1)(0) = β0 + β1.

The difference would be β0 + β1 + β2 + β3 - (β0 + β1) = β2 + β3.

e. When there is no stock split, the mean buy-and-hold return rate increases by β2 when discretionary
accrual goes from low to high. When there is a stock split, the mean buy-and-hold return rate
increases by β2 + β3 when discretionary accrual goes from low to high. Thus, the effect of
discretionary accrual on the mean buy-and-hold return rate depends on the level of stock split.

f. Since the p-value is less than α (p = .027 < .05), H0 is rejected. There is sufficient evidence to indicate
that interaction between stock split and discretionary accrual exists at α = .05.

g. Yes. For no stock split, the difference between high discretionary accrual and low discretionary
accrual is β2. Since β̂2 is negative, the performance of the high discretionary accrual acquirers is
worse than low discretionary accrual acquirers.

For stock split, the difference between high discretionary accrual and low discretionary accrual is β2 +
β3. Since both β̂2 and β̂3 are negative, the performance of the high discretionary accrual acquirers
is worse than low discretionary accrual acquirers, and even worse than for no stock split.

11.77

a. Let x1 = 1 if Group V, 0 otherwise
   Let x2 = 1 if Group S, 0 otherwise

The model would be: E(y) = β0 + β1x1 + β2x2


b. Using MINITAB, the results are:

Regression Analysis: Recall versus x1, x2

The regression equation is
Recall = 3.17 - 1.08 x1 - 1.45 x2

Predictor      Coef  SE Coef      T      P
Constant     3.1667   0.1670  18.96  0.000
x1          -1.0833   0.2362  -4.59  0.000
x2          -1.4537   0.2362  -6.15  0.000

S = 1.73596   R-Sq = 11.3%   R-Sq(adj) = 10.7%

Analysis of Variance

Source           DF        SS      MS      F      P
Regression        2   123.265  61.633  20.45  0.000
Residual Error  321   967.352   3.014
Total           323  1090.617

Source  DF   Seq SS
x1       1    9.150
x2       1  114.116

The least squares prediction equation is: ŷ = 3.1667 - 1.0833x1 - 1.4537x2.


c. To determine if the overall model is useful, we test:

H0: β1 = β2 = 0
Ha: At least one βi ≠ 0

The test statistic is F = 20.45 and the p-value is p = 0.000. Since the p-value is less than α = .01, H0
is rejected. There is sufficient evidence to indicate the model is useful in predicting brand recall at
α = .01.

From the Chapter 8 SIA, the test statistic was F = 20.45 and the p-value was p = 0.000. These are
identical to those above. The model is useful in predicting recall. This is the same as the conclusion
that there is a difference in mean recall among the 3 groups.

d. With the dummy variable coding in part a, β0 is the mean recall for group N. Thus, the estimated
mean recall for Group N is 3.1667 or 3.17. β1 is the difference in mean recall between Group V and
Group N. Thus, the mean recall for Group V is β0 + β1 and is estimated to be 3.1667 - 1.0833 =
2.0834 or 2.08. β2 is the difference in mean recall between Group S and Group N. Thus, the mean
recall for Group S is β0 + β2 and is estimated to be 3.1667 - 1.4537 = 1.7130 or 1.71.
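A small script reproduces the three estimated group means from the printed coefficients (values copied from the MINITAB output above):

```python
# Estimated group means from the dummy-variable coefficients
b0, b1, b2 = 3.1667, -1.0833, -1.4537  # from the MINITAB printout

mean_N = b0       # Group N: both dummies = 0
mean_V = b0 + b1  # Group V: x1 = 1
mean_S = b0 + b2  # Group S: x2 = 1
print(round(mean_N, 2), round(mean_V, 2), round(mean_S, 2))  # 3.17 2.08 1.71
```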

11.78

a. The first-order model is: E(y) = β0 + β1x1

b. The new model is E(y) = β0 + β1x1 + β2x2 + β3x3

where x2 = 1 if level 2, 0 otherwise
      x3 = 1 if level 3, 0 otherwise

c. To allow for interactions, the model is:

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3

d. The response lines will be parallel if β4 = β5 = 0

e. There will be one response line if β2 = β3 = β4 = β5 = 0

11.79

a. The complete second-order model is E(y) = β0 + β1x1 + β2x1²

b. The new model is E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x3

where x2 = 1 if level 2, 0 otherwise
      x3 = 1 if level 3, 0 otherwise

c. The model with the interaction terms is:

E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x3 + β5x1x2 + β6x1x3 + β7x1²x2 + β8x1²x3

d. The response curves will be parallel lines if the interaction terms as well as the second-order terms
are absent or if β2 = β5 = β6 = β7 = β8 = 0.

e. The response curves will have the same shape if none of the interaction terms are present or if β5 =
β6 = β7 = β8 = 0.

f. The response curves will be identical if no terms involving the qualitative variable are present or β3 =
β4 = β5 = β6 = β7 = β8 = 0.

11.80

a. When x2 = x3 = 0, E(y) = β0 + β1x1
   When x2 = 1 and x3 = 0, E(y) = β0 + β1x1 + β2
   When x2 = 0 and x3 = 1, E(y) = β0 + β1x1 + β3

b. For level 1, ŷ = 44.8 + 2.2x1

For level 2, ŷ = 44.8 + 2.2x1 + 9.4 = 54.2 + 2.2x1

For level 3, ŷ = 44.8 + 2.2x1 + 15.6 = 60.4 + 2.2x1

11.81

a. For x2 = 0 and x3 = 0, ŷ = 48.8 - 3.4x1 + .07x1²

For x2 = 1 and x3 = 0, ŷ = 48.8 - 3.4x1 + .07x1² - 2.4(1) + 3.7x1(1) - .02x1²(1)
                          = 46.4 + 0.3x1 + .05x1²

For x2 = 0 and x3 = 1, ŷ = 48.8 - 3.4x1 + .07x1² - 7.5(1) + 2.7x1(1) - .04x1²(1)
                          = 41.3 - 0.7x1 + .03x1²
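The coefficient arithmetic for the three curves can be checked with a short script (the base and dummy coefficients are copied from the fitted model above):

```python
# Base curve (level 1): 48.8 - 3.4 x1 + .07 x1^2, stored as (intercept, x1, x1^2)
base = (48.8, -3.4, 0.07)
# Dummy adjustments to (intercept, x1, x1^2) for levels 2 and 3
adj = {2: (-2.4, 3.7, -0.02), 3: (-7.5, 2.7, -0.04)}

# Add the adjustments term by term to get each level's curve
curves = {lvl: tuple(round(b + d, 2) for b, d in zip(base, a))
          for lvl, a in adj.items()}
print(curves)  # {2: (46.4, 0.3, 0.05), 3: (41.3, -0.7, 0.03)}
```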

b. The plots of the lines are:

[Plot of the three fitted curves from part a, not reproduced here.]

11.82 The model is E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x3 + β5x4

where x1 is the quantitative variable and

x2 = 1 if level 2 of qualitative variable, 0 otherwise
x3 = 1 if level 3 of qualitative variable, 0 otherwise
x4 = 1 if level 4 of qualitative variable, 0 otherwise
11.83

a. To determine if the model is adequate, we test:

H0: β1 = β2 = β3 = . . . = β12 = 0
Ha: At least 1 βi ≠ 0

The test statistic is F = 26.9.

Using Tables VII, VIII, IX, and X, Appendix B, with ν1 = k = 12 and ν2 = n - (k + 1) = 148 - (12 + 1)
= 135, the p-value associated with F = 26.9 is less than .001. Since the p-value is so small, H0 is
rejected. There is sufficient evidence to indicate the model is adequate.

R² = .705. 70.5% of the total variation of the natural logarithm of card prices is explained by the
model with the 12 variables in the model.

Adj-R² = .681. 68.1% of the total variation of the natural logarithm of card prices is explained by the
model with the 12 variables in the model, adjusting for the sample size and the number of variables in
the model.

Since these R² values are fairly large, it indicates that the model is pretty good.

b. To determine if race contributes to the price, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = 1.014 and the p-value is p = .312. Since the p-value is so large, H0 is not
rejected. There is insufficient evidence to indicate race has an impact on the value of professional
football players' rookie cards for any reasonable value of α, holding the other variables constant.

c. To determine if card vintage contributes to the price, we test:

H0: β3 = 0
Ha: β3 ≠ 0

The test statistic is t = 10.92 and the p-value is p = .000. Since the p-value is so small, H0 is rejected.
There is sufficient evidence to indicate card vintage has an impact on the value of professional
football players' rookie cards for any reasonable value of α, holding the other variables constant.
d. The first-order model is: E(y) = β0 + β1x3 + β2x5 + β3x6 + β4x7 + β5x8 + β6x9 + β7x10 + β8x11
+ β9x12 + β10x5x3 + β11x6x3 + β12x7x3 + β13x8x3 + β14x9x3 + β15x10x3 + β16x11x3 + β17x12x3

11.84

a. R² = .069. 6.9% of the total variation of the relative optimism of the analysts' 3-month horizon
forecasts is explained by the model containing type of firm, number of days between forecast and
fiscal year-end, and the natural logarithm of the number of quarters the analyst had worked with the
firm.

b. To determine if the model is useful, we test:

H0: β1 = β2 = β3 = 0
Ha: At least 1 βi ≠ 0

The test statistic is F = [R²/k] / [(1 - R²)/(n - (k + 1))] = (.069/3) / [(1 - .069)/(11,121 - (3 + 1))] = 274.64

The rejection region requires α = .01 in the upper tail of the F distribution with ν1 = k = 3 and
ν2 = n - (k + 1) = 11,121 - (3 + 1) = 11,117. From Table X, Appendix B, F.01 = 3.78. The rejection
region is F > 3.78.

Since the observed value of the test statistic falls in the rejection region (F = 274.64 > 3.78), H0 is
rejected. There is sufficient evidence to indicate the model is useful at α = .01.
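The F computation from R² can be reproduced directly (a sketch; R², k, and n are taken from the exercise):

```python
# F statistic from R^2: F = (R^2/k) / ((1 - R^2)/(n - (k + 1)))
r2, k, n = 0.069, 3, 11121

f_stat = (r2 / k) / ((1 - r2) / (n - (k + 1)))
print(round(f_stat, 2))  # 274.64
```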
c. To determine if x1 contributes significantly to the prediction of y, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = 4.3.

The rejection region requires α/2 = .01/2 = .005 in each tail of the t distribution. From Table V,
Appendix B, with df = n - (k + 1) = 11,121 - (3 + 1) = 11,117, t.005 = 2.576. The rejection region is t >
2.576 or t < -2.576.

Since the observed value of the test statistic falls in the rejection region (t = 4.3 > 2.576), H0 is
rejected. There is sufficient evidence to indicate x1 contributes significantly to the prediction of y at
α = .01, holding the other variables constant.

d. Yes. In part c, we concluded that β1 is different from 0. Because the estimate of β1 is greater than 0,
we can conclude that β1 is positive. Therefore, the earnings forecasts by the analysts at buy-side firms
are more optimistic than forecasts made by analysts at sell-side firms, holding the other variables
constant.

11.85

a. For obese smokers, x2 = 0. The equation of the hypothesized line relating mean REE to time after
smoking for obese smokers is:

E(y) = β0 + β1x1 + β2(0) + β3x1(0) = β0 + β1x1

The slope of the line is β1.
b. For normal weight smokers, x2 = 1. The equation of the hypothesized line relating mean REE to time
after smoking for normal weight smokers is:

E(y) = β0 + β1x1 + β2(1) + β3x1(1) = (β0 + β2) + (β1 + β3)x1

The slope of the line is β1 + β3.

c. The reported p-value is .044. Since the p-value is small, there is evidence to indicate that interaction
between time and weight is present for α > .044.

For α = .01, there is no evidence to indicate that interaction between time and weight is present.

11.86

a. Let x2 = 1 if perceived organizational support is low, 0 otherwise
   Let x3 = 1 if perceived organizational support is neutral, 0 otherwise

b. The model would be E(y) = β0 + β1x1 + β2x2 + β3x3.

c. The model would be E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3.

d. If the effect of bullying on intention to leave is greater at the low level of POS than at the
high level of POS, this indicates that POS and bullying interact. Thus, the model in part c
supports these findings.
11.87

a. Let x1 = 1 if Channel catfish, 0 otherwise
   Let x2 = 1 if Largemouth bass, 0 otherwise

b. Let x3 = weight. The model would be: E(y) = β0 + β1x1 + β2x2 + β3x3

c. The model would be: E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x3 + β5x2x3

d. From MINITAB, the output is:

Regression Analysis: DDT versus x1, x2, Weight

The regression equation is
DDT = 3.1 + 26.5 x1 - 4.1 x2 + 0.0037 Weight

Predictor      Coef  SE Coef      T      P
Constant       3.13    38.89   0.08  0.936
x1            26.51    21.52   1.23  0.220
x2            -4.09    37.91  -0.11  0.914
Weight      0.00371  0.02598   0.14  0.887

S = 98.57   R-Sq = 1.7%   R-Sq(adj) = 0.0%

Analysis of Variance

Source           DF       SS    MS     F      P
Regression        3    23652  7884  0.81  0.490
Residual Error  140  1360351  9717
Total           143  1384003

Source  DF  Seq SS
x1       1   23041
x2       1     414
Weight   1     198

The least squares prediction equation is: ŷ = 3.1 + 26.5x1 - 4.1x2 + 0.0037x3
e. β̂3 = 0.0037. For each additional gram of weight, the mean level of DDT is expected to increase by
0.0037 units, holding species constant.

f. From MINITAB, the output is:

Regression Analysis: DDT versus x1, x2, Weight, x1Weight, x2Weight

The regression equation is
DDT = 3.5 + 25.6 x1 - 3.5 x2 + 0.0034 Weight + 0.0008 x1Weight
      - 0.0013 x2Weight

Predictor       Coef  SE Coef      T      P
Constant        3.50    54.69   0.06  0.949
x1             25.59    67.52   0.38  0.705
x2             -3.47    84.70  -0.04  0.967
Weight       0.00344  0.03843   0.09  0.929
x1Weight     0.00082  0.05459   0.02  0.988
x2Weight    -0.00129  0.09987  -0.01  0.990

S = 99.29   R-Sq = 1.7%   R-Sq(adj) = 0.0%

Analysis of Variance

Source           DF       SS    MS     F      P
Regression        5    23657  4731  0.48  0.791
Residual Error  138  1360346  9858
Total           143  1384003

Source    DF  Seq SS
x1         1   23041
x2         1     414
Weight     1     198
x1Weight   1       4
x2Weight   1       2

The least squares prediction equation is:

ŷ = 3.5 + 25.6x1 - 3.5x2 + 0.0034x3 + 0.0008x1x3 - .0013x2x3

g. For Channel catfish, x1 = 1 and x2 = 0. The least squares line is

ŷ = 3.5 + 25.6(1) + 0.0034x3 + 0.0008(1)x3 = 29.1 + .0042x3

The estimated slope is .0042.

11.88

a. The first-order model is:

E(y) = β0 + β1x1 + β2x2

b. For the high-tech firms, x2 = 1. The model for the high-tech firm is:

E(y) = β0 + β1x1 + β2(1) = β0 + β2 + β1x1

The slope of the line would be β1.

c. The new model would include the interaction term:

E(y) = β0 + β1x1 + β2x2 + β3x1x2

d. For the high-tech firms, x2 = 1. The model for the high-tech firm is:

E(y) = β0 + β1x1 + β2(1) + β3x1(1) = β0 + β2 + (β1 + β3)x1

The slope of the line would be β1 + β3.
11.89

a. Let x1 = sales volume
   x2 = 1 if NW, 0 if not
   x3 = 1 if S, 0 if not
   x4 = 1 if W, 0 if not

The complete second-order model for the sales price of a single-family home is:

E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x3 + β5x4 + β6x1x2 + β7x1x3 + β8x1x4
       + β9x1²x2 + β10x1²x3 + β11x1²x4

b. For the West, x2 = 0, x3 = 0, and x4 = 1. The equation would be:

E(y) = β0 + β1x1 + β2x1² + β3(0) + β4(0) + β5(1) + β6x1(0) + β7x1(0)
       + β8x1(1) + β9x1²(0) + β10x1²(0) + β11x1²(1)
     = β0 + β1x1 + β2x1² + β5 + β8x1 + β11x1²
     = (β0 + β5) + (β1 + β8)x1 + (β2 + β11)x1²

c. For the Northwest, x2 = 1, x3 = 0, and x4 = 0. The equation would be:

E(y) = β0 + β1x1 + β2x1² + β3(1) + β4(0) + β5(0) + β6x1(1) + β7x1(0)
       + β8x1(0) + β9x1²(1) + β10x1²(0) + β11x1²(0)
     = β0 + β1x1 + β2x1² + β3 + β6x1 + β9x1²
     = (β0 + β3) + (β1 + β6)x1 + (β2 + β9)x1²

d. The parameters β3, β4, and β5 allow for the y-intercepts of the 4 regions to be different. The
parameters β6, β7, and β8 allow for the peaks of the curves to be at different values of sales volume (x1)
for the four regions. The parameters β9, β10, and β11 allow for the shapes of the curves to be different
for the four regions. Thus, all the parameters from β3 through β11 allow for differences in mean sales
prices among the four regions.

e. Using MINITAB, the printout is:

Regression Analysis: Price versus X1, X1SQ, ...

The regression equation is
Price = 1904740 - 70.4 X1 + 0.000721 X1SQ + 159661 X2 + 5291908 X3 + 3663319 X4
        + 22.2 X1X2 - 23.9 X1X3 - 37 X1X4 - 0.000421 X1SQX2 - 0.000404 X1SQX3
        - 0.000181 X1SQX4

Predictor        Coef    SE Coef      T      P
Constant      1904740    1984278   0.96  0.351
X1             -70.44      72.09  -0.98  0.343
X1SQ        0.0007211  0.0006515   1.11  0.285
X2             159661    2069265   0.08  0.939
X3            5291908    4812586   1.10  0.288
X4            3663319    4478880   0.82  0.425
X1X2            22.25      73.74   0.30  0.767
X1X3           -23.86      92.09  -0.26  0.799
X1X4            -37.2      103.0  -0.36  0.723
X1SQX2     -0.0004210  0.0006589  -0.64  0.532
X1SQX3     -0.0004044  0.0006777  -0.60  0.559
X1SQX4     -0.0001810  0.0007333  -0.25  0.808

S = 24365.8   R-Sq = 85.0%   R-Sq(adj) = 74.6%

Analysis of Variance

Source          DF           SS          MS     F      P
Regression      11  53633628997  4875784454  8.21  0.000
Residual Error  16   9499097458   593693591
Total           27  63132726455

Source  DF       Seq SS
X1       1      3591326
X1SQ     1     64275360
X2       1  11338642654
X3       1  10081000583
X4       1    241539024
X1X2     1  18258475317
X1X3     1   5579187440
X1X4     1   7566169810
X1SQX2   1    138146367
X1SQX3   1    326425228
X1SQX4   1     36175888

Unusual Observations
Obs     X1   Price     Fit  SE Fit  Residual  St Resid
  2  61025  235900  291659   18746    -55759    -3.58R
  5  60324  345300  279697   15712     65603     3.52R
  7  61025  240855  241084   24360      -229    -0.42 X

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.

To determine if the model is useful for predicting sales price, we test:

H0: β1 = β2 = ⋯ = β11 = 0
Ha: At least one of the coefficients is nonzero

The test statistic is F = MS(Model)/MSE = 8.21

The p-value is p = .000. Since the p-value is less than α = .01 (p = .000 < .01), H0 is rejected. There
is sufficient evidence to indicate the model is useful in predicting sales price at α = .01.
11.90

a. Let x2 = 1 if Developing, 0 otherwise

The model would be:

E(y) = β0 + β1x1 + β2x2 + β3x1x2

b. Using MINITAB, the plot of the data is:

[Scatterplot of Volatility vs CredRat, with separate symbols for the Developed (D) and Emerging (E)
markets; not reproduced here.]

From the plot, it appears that the model is appropriate. The two lines appear to have different slopes.
c. Using MINITAB, the output is:

Regression Analysis: y versus x1, x2, x1x2

The regression equation is
y = 58.8 - 0.557 x1 - 18.7 x2 + 0.354 x1x2

Predictor      Coef  SE Coef      T      P
Constant     58.786    1.217  48.30  0.000
x1         -0.55743  0.03669 -15.19  0.000
x2          -18.718    5.572  -3.36  0.002
x1x2        0.35368  0.07615   4.64  0.000

S = 2.66123   R-Sq = 96.1%   R-Sq(adj) = 95.7%

Analysis of Variance

Source          DF      SS      MS       F      P
Regression       3  4596.5  1532.2  216.34  0.000
Residual Error  26   184.1     7.1
Total           29  4780.6

Source  DF  Seq SS
x1       1  4388.0
x2       1    55.7
x1x2     1   152.8

The fitted regression model is:

ŷ = 58.786 - .557x1 - 18.718x2 + .354x1x2
For the emerging countries, x2 = 0. The fitted model is:

ŷ = 58.786 - .557x1 - 18.718(0) + .354x1(0) = 58.786 - .557x1

For the developed countries, x2 = 1. The fitted model is:

ŷ = 58.786 - .557x1 - 18.718(1) + .354x1(1) = 40.068 - .203x1
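The intercepts and slopes of the two fitted lines can be recovered from the printed coefficients (copied from the output above):

```python
# Fitted model: y-hat = 58.786 - .557 x1 - 18.718 x2 + .354 x1 x2
b0, b1, b2, b3 = 58.786, -0.557, -18.718, 0.354

# x2 = 0: intercept b0, slope b1
line_x2_0 = (round(b0, 3), round(b1, 3))
# x2 = 1: intercept b0 + b2, slope b1 + b3
line_x2_1 = (round(b0 + b2, 3), round(b1 + b3, 3))
print(line_x2_0, line_x2_1)  # (58.786, -0.557) (40.068, -0.203)
```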


d. The plot of the fitted lines is:

[Scatterplot of y vs x1 with the two fitted lines for the Developed (D) and Emerging (E) markets;
not reproduced here.]

e. To determine if the slope of the linear relationship between volatility and credit rating depends on
market type, we test:

H0: β3 = 0
Ha: β3 ≠ 0

The test statistic is t = 4.64.

The p-value is 0.000. Since the p-value is less than α = .01, H0 is rejected. There is sufficient
evidence to indicate that the slope of the linear relationship between volatility and credit rating
depends on market type at α = .01.

11.91 The models in parts a and b are nested:

The complete model is E(y) = β0 + β1x1 + β2x2
The reduced model is E(y) = β0 + β1x1

The models in parts a and d are nested.

The complete model is E(y) = β0 + β1x1 + β2x2 + β3x1x2
The reduced model is E(y) = β0 + β1x1 + β2x2

The models in parts a and e are nested.

The complete model is E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²
The reduced model is E(y) = β0 + β1x1 + β2x2

The models in parts b and c are nested.

The complete model is E(y) = β0 + β1x1 + β2x1²
The reduced model is E(y) = β0 + β1x1

The models in parts b and d are nested.

The complete model is E(y) = β0 + β1x1 + β2x2 + β3x1x2
The reduced model is E(y) = β0 + β1x1

The models in parts b and e are nested.

The complete model is E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²
The reduced model is E(y) = β0 + β1x1

The models in parts c and e are nested.

The complete model is E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²
The reduced model is E(y) = β0 + β1x1 + β2x1²

The models in parts d and e are nested.

The complete model is E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²
The reduced model is E(y) = β0 + β1x1 + β2x2 + β3x1x2

11.92

a. H0: β3 = β4 = β5 = 0
   Ha: At least one βi ≠ 0, i = 3, 4, 5

b. The reduced model would be E(y) = β0 + β1x1 + β2x2

c. The numerator df = k - g = 5 - 2 = 3 and the denominator df = n - (k + 1)
= 30 - (5 + 1) = 24.

d. H0: β3 = β4 = β5 = 0
   Ha: At least one βi ≠ 0, i = 3, 4, 5

The test statistic is F = [(SSER - SSEC)/(k - g)] / [SSEC/(n - (k + 1))]
                        = [(1250.2 - 1125.2)/(5 - 2)] / [1125.2/(30 - (5 + 1))] = 41.6667/46.8833 = .89

The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k - g
= 5 - 2 = 3 and denominator df = n - (k + 1) = 30 - (5 + 1) = 24. From Table VIII, Appendix B,
F.05 = 3.01. The rejection region is F > 3.01.

Since the observed value of the test statistic does not fall in the rejection region (F = .89 < 3.01), H0
is not rejected. There is insufficient evidence to indicate the second-order terms are useful at α = .05.
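The partial F computation can be reproduced with a few lines (the SSE values and degrees of freedom are copied from the solution above):

```python
# Nested-model (partial) F test:
# F = ((SSE_R - SSE_C)/(k - g)) / (SSE_C/(n - (k + 1)))
sse_r, sse_c = 1250.2, 1125.2
k, g, n = 5, 2, 30

numerator = (sse_r - sse_c) / (k - g)  # drop in SSE per dropped term
denominator = sse_c / (n - (k + 1))    # MSE of the complete model
f_stat = numerator / denominator
print(round(f_stat, 2))  # 0.89
```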

11.93

a. Including β0, there are five parameters in the complete model and three in the reduced model.

b. The hypotheses are:

H0: β3 = β4 = 0
Ha: At least one βi ≠ 0, i = 3, 4

c. The test statistic is F = [(SSER - SSEC)/(k - g)] / [SSEC/(n - (k + 1))]
                          = [(160.44 - 152.66)/(4 - 2)] / [152.66/(20 - (4 + 1))] = 3.89/10.1773 = .38

The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k - g
= 4 - 2 = 2 and denominator df = n - (k + 1) = 20 - (4 + 1) = 15. From Table VIII, Appendix B,
F.05 = 3.68. The rejection region is F > 3.68.

Since the observed value of the test statistic does not fall in the rejection region (F = .38 < 3.68), H0
is not rejected. There is insufficient evidence to indicate the complete model is better than the
reduced model at α = .05.
11.94

a. Let variables x1 through x4 be the Demographic variables, variables x5 through x11 be the Diagnostic
variables, variables x12 through x15 be the Treatment variables, and variables x16 through x21 be the
Community variables. The complete model is:

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6 + β7x7 + β8x8 + β9x9
       + β10x10 + β11x11 + β12x12 + β13x13 + β14x14 + β15x15 + β16x16 + β17x17
       + β18x18 + β19x19 + β20x20 + β21x21

b. To determine if the 7 Diagnostic variables contribute information for the prediction of y, we test:

H0: β5 = β6 = ⋯ = β11 = 0

c. The reduced model would be:

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β12x12 + β13x13 + β14x14
       + β15x15 + β16x16 + β17x17 + β18x18 + β19x19 + β20x20 + β21x21

d. Since the p-value is so small (p < .0001), H0 is rejected. There is sufficient evidence to indicate at
least one of the seven diagnostic variables contributes information for the prediction of y.

11.95

a. To determine whether the quadratic terms in the model are statistically useful for predicting relative
optimism, we test:

H0: β4 = β5 = 0
Ha: At least 1 βi ≠ 0

b. The complete model is E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x2² + β5x1x2² and the reduced model is
E(y) = β0 + β1x1 + β2x2 + β3x1x2.

c. To determine whether the interaction terms in the model are statistically useful for predicting relative
optimism, we test:

H0: β3 = β5 = 0
Ha: At least 1 βi ≠ 0

d. The complete model is E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x2² + β5x1x2² and the reduced model is
E(y) = β0 + β1x1 + β2x2 + β4x2².

e. To determine whether the dummy variable terms in the model are statistically useful for predicting
relative optimism, we test:

H0: β1 = β3 = β5 = 0
Ha: At least 1 βi ≠ 0

f. The complete model is E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x2² + β5x1x2² and the reduced model is
E(y) = β0 + β2x2 + β4x2².

11.96

a. The model from part b of Exercise 11.86 is E(y) = β0 + β1x1 + β2x2 + β3x3. The model from part c of
Exercise 11.86 is E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3. These two models are nested
because all of the terms in the first model are contained in the second model. The first model is the
reduced model and the second model is the complete model.

b. The null hypothesis for comparing the two models is H0: β4 = β5 = 0.

c. If we reject H0 in part b, we would conclude that at least one of the interaction terms is not 0. Thus,
we would prefer the second model.

d. If we fail to reject H0 in part b, then we would conclude that we have no evidence to indicate that the
interaction terms were significant. Thus, we would prefer the first model.

11.97

a. Let x1 = cycle speed and x2 = cycle pressure ratio. A complete second-order model is:

E(y) = β0 + β1x1 + β2x2 + β3x1² + β4x2² + β5x1x2

b. To determine whether the curvature terms in the complete 2nd-order model are useful for predicting
heat rate, we test:

H0: β3 = β4 = 0
Ha: At least one of the parameters β3, β4 differs from 0

c. The complete model is: E(y) = β0 + β1x1 + β2x2 + β3x1² + β4x2² + β5x1x2

The reduced model is: E(y) = β0 + β1x1 + β2x2 + β5x1x2

d. From the printout, SSER = 25,310,639, SSEC = 19,370,350, and MSEC = 317,547.

e. The test statistic is:

F = [(SSER - SSEC)/(k - g)] / [SSEC/(n - (k + 1))]
  = [(25,310,639 - 19,370,350)/(5 - 3)] / [19,370,350/(67 - (5 + 1))] = 9.35


f.

The rejection region requires α = .10 in the upper tail of the F-distribution with
ν1 = k - g = 5 - 3 = 2 and ν2 = n - (k + 1) = 67 - (5 + 1) = 61. From Table VII, Appendix B,
F.10 = 2.39. The rejection region is F > 2.39.
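As a quick numeric check (a sketch, not part of the printed solution), the partial F statistic for this nested-model test can be reproduced with a short script; the SSE values and the table critical value F.10 = 2.39 are taken from the solution above.

```python
# Partial (nested-model) F test for the curvature terms in Exercise 11.97.
def partial_f(sse_r, sse_c, k, g, n):
    """F = [(SSE_R - SSE_C)/(k - g)] / {SSE_C/[n - (k + 1)]}."""
    return ((sse_r - sse_c) / (k - g)) / (sse_c / (n - (k + 1)))

f_stat = partial_f(sse_r=25_310_639, sse_c=19_370_350, k=5, g=3, n=67)
print(round(f_stat, 2))   # 9.35
print(f_stat > 2.39)      # True -> F falls in the rejection region
```

The same helper applies to every complete-versus-reduced comparison in this chapter; only the SSE values, k, g, and n change.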

g.

Since the observed value of the test statistic falls in the rejection region (F = 9.35 > 2.39), H0 is
rejected. There is sufficient evidence to indicate at least one of the curvature terms in the complete
2nd-order model is useful for predicting heat rate at α = .10.

11.98 a.

Model 1: R² = .101. 10.1% of the total variation in the supervisor-directed aggression score is
explained by the terms in Model 1.
Model 2: R² = .555. 55.5% of the total variation in the supervisor-directed aggression score is
explained by the terms in Model 2.

b.

To compare the fits of Model 1 and Model 2, we test:


H0: β5 = β6 = β7 = β8 = 0
Ha: At least 1 βi ≠ 0

c.

Yes. All of the terms in Model 1 are contained in Model 2.

d.

H0 would be rejected. There is sufficient evidence that at least one of the variables Self-esteem,
history of aggression, Interactional injustice at primary job, and Abusive supervisor at primary job is
significant in predicting supervisor-directed aggression score.

e.

Model 3: E(y) = β0 + β1(Age) + β2(Gender) + β3(Interactional injustice at 2nd job) +
β4(Abusive supervisor at 2nd job) + β5(Self-esteem) + β6(History of aggression) +
β7(Interactional injustice at primary job) + β8(Abusive supervisor at primary job) +
β9(Self-esteem)(History of aggression) + β10(Self-esteem)(Interactional injustice at primary job) +
β11(Self-esteem)(Abusive supervisor at primary job) +
β12(History of aggression)(Interactional injustice at primary job) +
β13(History of aggression)(Abusive supervisor at primary job) +
β14(Interactional injustice at primary job)(Abusive supervisor at primary job).

f.

To compare Model 2 with Model 3, we test:


H0: β9 = β10 = . . . = β14 = 0
Ha: At least 1 βi ≠ 0
The p-value for the test is p > .10. Since the p-value > .10, H0 is not rejected. There is insufficient
evidence to indicate any of the interaction terms are significant in predicting supervisor-directed
aggression score for any reasonable value of α.

11.99

a.

The hypothesized equation for E(y) is:

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6 + β7x7 + β8x8 + β9x9 + β10x10

b.

To determine if the initial model is sufficient, we test:


H0: β3 = β4 = . . . = β10 = 0
Ha: At least one βi ≠ 0, i = 3, 4, . . ., 10


c.

Since the F was significant, we reject H0 at α = .05. There is sufficient evidence to indicate that at
least one of the additional variables (student ethnicity, socio-economic status, school performance,
number of math courses taken in high school and overall GPA in the math courses) contributes to the
prediction of the SAT-math score.

d.

R²adj = .79. 79% of the sample variability of SAT-Math scores is explained by the model containing
the 10 independent variables, adjusted for the sample size and the number of variables.
e.

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table V, Appendix B, with df =
n - (k + 1) = 3,492 - (10 + 1) = 3,481, t.025 = 1.96. The confidence interval is:

β̂2 ± t.025 s(β̂2) = 14 ± 1.96(3) = 14 ± 5.88 = (8.12, 19.88)

We are 95% confident that the mean SAT-Math score for those who were coached was anywhere
from 8.12 to 19.88 points higher than the mean for those who were not coached, holding all other
variables constant.
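The interval arithmetic can be verified with a few lines of code (a supplementary sketch, using only the estimate, standard error, and t value given above):

```python
# 95% confidence interval for beta_2, the coaching effect, in Exercise 11.99e.
b2_hat = 14    # estimated coefficient for the coaching dummy variable
se_b2 = 3      # its standard error
t_crit = 1.96  # t_.025 with 3,481 df (essentially the large-sample z value)
half_width = t_crit * se_b2
ci = (round(b2_hat - half_width, 2), round(b2_hat + half_width, 2))
print(ci)   # (8.12, 19.88)
```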
f.

Yes. The value of β̂2 decreased from 19 to 14 when the additional variables were added to the
model. Thus, the increase from coaching is not as great.

g.

The new model including all the interaction terms is:

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6 + β7x7 + β8x8 + β9x9 + β10x10
+ β11x1x2 + β12x3x2 + β13x4x2 + β14x5x2 + β15x6x2 + β16x7x2 + β17x8x2
+ β18x9x2 + β19x10x2

h.

To determine if the model with the interaction terms is better in predicting SAT-Math scores, we test:

H0: β11 = β12 = . . . = β19 = 0
Ha: At least one βi ≠ 0, i = 11, 12, . . ., 19

We would fit the complete model above. We would then compare it to the fitted model from part a
(the reduced model). The test statistic would be:

F = [(SSER - SSEC)/(k - g)] / {SSEC/[n - (k + 1)]}


11.100 a.

Using MINITAB, the results for fitting the reduced model are:
Regression Analysis: Price versus X1, X2, X3, X4, X1X2, X1X3, X1X4
The regression equation is
Price = - 286970 + 9.32 X1 + 578133 X2 + 60968 X3 - 575769 X4 - 10.4 X1X2
- 6.52 X1X3 + 1.00 X1X4
Predictor   Coef      SE Coef   T      P
Constant    -286970   161003    -1.78  0.090
X1          9.317     2.900     3.21   0.004
X2          578133    183578    3.15   0.005
X3          60968     292823    0.21   0.837
X4          -575769   325699    -1.77  0.092
X1X2        -10.408   3.060     -3.40  0.003
X1X3        -6.522    3.300     -1.98  0.062
X1X4        1.000     3.903     0.26   0.800

S = 30785.9    R-Sq = 70.0%    R-Sq(adj) = 59.5%

Analysis of Variance
Source          DF  SS           MS          F     P
Regression      7   44177277861  6311039694  6.66  0.000
Residual Error  20  18955448594  947772430
Total           27  63132726455

Source  DF  Seq SS
X1      1   3591326
X2      1   8414868549
X3      1   9294417537
X4      1   1463449502
X1X2    1   17344397940
X1X3    1   7594294303
X1X4    1   62258704

From Exercise 11.89, SSEC = 9,499,097,458, n = 28, and k = 11.


To determine if the quadratic terms are statistically useful for predicting sales price, we test:

H0: β2 = β9 = β10 = β11 = 0
Ha: At least 1 βi ≠ 0

The test statistic is F = [(SSER - SSEC)/(k - g)] / {SSEC/[n - (k + 1)]}
= [(18,955,448,594 - 9,499,097,458)/(11 - 7)] / {9,499,097,458/[28 - (11 + 1)]} = 3.98

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k - g = 11 - 7 = 4
and ν2 = n - (k + 1) = 28 - (11 + 1) = 16. From Table VIII, Appendix B, F.05 = 3.01. The rejection
region is F > 3.01.

Since the observed value of the test statistic falls in the rejection region (F = 3.98 > 3.01), H0 is
rejected. There is sufficient evidence to indicate at least one of the quadratic terms is statistically
useful for predicting sales price at α = .05.
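The F statistic above can be double-checked numerically (a sketch, not part of the printed solution; SSE values as reported):

```python
# Nested-model F statistic for the quadratic terms in Exercise 11.100a.
sse_r, sse_c = 18_955_448_594, 9_499_097_458
k, g, n = 11, 7, 28
f_stat = ((sse_r - sse_c) / (k - g)) / (sse_c / (n - (k + 1)))
print(round(f_stat, 2))   # 3.98
print(f_stat > 3.01)      # True -> reject H0 at alpha = .05
```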
b.

Since we rejected H0 in part a, the complete model is preferred. At least one of the quadratic terms is
significant.

c.

The preferred model from part b is the complete model. Using MINITAB, the results of fitting the
model without the interaction terms is:
Regression Analysis: Price versus X1, X1SQ, X2, X3, X4
The regression equation is
Price = 289549 - 2.15 X1 + 0.000019 X1SQ - 57530 X2 - 203755 X3 - 24038 X4
Predictor  Coef        SE Coef     T      P
Constant   289549      138840      2.09   0.049
X1         -2.150      3.325       -0.65  0.524
X1SQ       0.00001888  0.00001621  1.16   0.257
X2         -57530      50653       -1.14  0.268
X3         -203755     113332      -1.80  0.086
X4         -24038      67099       -0.36  0.724

S = 43381.9    R-Sq = 34.4%    R-Sq(adj) = 19.5%

Analysis of Variance
Source          DF  SS           MS          F     P
Regression      5   21729048947  4345809789  2.31  0.079
Residual Error  22  41403677509  1881985341
Total           27  63132726455

Source  DF  Seq SS
X1      1   3591326
X1SQ    1   64275360
X2      1   11338642654
X3      1   10081000583
X4      1   241539024

To determine whether region and sales volume interact to affect sales price, we test:

H0: β6 = β7 = β8 = β9 = β10 = β11 = 0
Ha: At least 1 βi ≠ 0

The test statistic is F = [(SSER - SSEC)/(k - g)] / {SSEC/[n - (k + 1)]}
= [(41,403,677,509 - 9,499,097,458)/(11 - 5)] / {9,499,097,458/[28 - (11 + 1)]} = 8.96

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k - g = 11 - 5 = 6
and ν2 = n - (k + 1) = 28 - (11 + 1) = 16. From Table VIII, Appendix B, F.05 = 2.74. The rejection
region is F > 2.74.

Since the observed value of the test statistic falls in the rejection region (F = 8.96 > 2.74), H0 is
rejected. There is sufficient evidence to indicate region and sales volume interact to affect sales price
at α = .05.
d.

Since we rejected H0 in part c, the complete model is preferred. At least one of the interaction terms is
significant.


11.101 a.

The model would be:


E(y) = β0 + β1x1 + β2x2 + β3x3

b.

The model including the interaction terms is:


E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3

c.

For AL, x2 = x3 = 0. The model would be:


E(y) = β0 + β1x1 + β2(0) + β3(0) + β4x1(0) + β5x1(0) = β0 + β1x1
The slope of the line is β1.
For TDS-3A, x2 = 1 and x3 = 0. The model would be:
E(y) = β0 + β1x1 + β2(1) + β3(0) + β4x1(1) + β5x1(0) = (β0 + β2) + (β1 + β4)x1
The slope of the line is β1 + β4.
For FE, x2 = 0 and x3 = 1. The model would be:
E(y) = β0 + β1x1 + β2(0) + β3(1) + β4x1(0) + β5x1(1) = (β0 + β3) + (β1 + β5)x1
The slope of the line is β1 + β5.

d.

To test for the presence of temperature-waste type interaction, we would fit the complete model listed
in part b and the reduced model found in part a. The hypotheses would be:
H0: β4 = β5 = 0
Ha: At least one βi ≠ 0, for i = 4, 5

The test statistic would be F = [(SSER - SSEC)/(k - g)] / {SSEC/[n - (k + 1)]}, where k = 5, g = 3,
SSER is the SSE for the reduced model, and SSEC is the SSE for the complete model.
11.102 a.

To determine whether the rate of increase of emotional distress with experience is different for the
two groups, we test:
H0: β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 4, 5

b.

To determine whether there are differences in mean emotional distress levels that are attributable to
exposure group, we test:
H0: β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 3, 4, 5

c.

To determine whether there are differences in mean emotional distress levels that are attributable to
exposure group, we test:
H0: β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 3, 4, 5

The test statistic is F = [(SSER - SSEC)/(k - g)] / {SSEC/[n - (k + 1)]}
= [(795.23 - 783.9)/(5 - 2)] / {783.9/[200 - (5 + 1)]} = .93


The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k - g = 5 - 2 = 3
and ν2 = n - (k + 1) = 200 - (5 + 1) = 194. From Table VIII, Appendix B, F.05 ≈ 2.60. The rejection
region is F > 2.60.

Since the observed value of the test statistic does not fall in the rejection region
(F = .93 < 2.60), H0 is not rejected. There is insufficient evidence to indicate that there are
differences in mean emotional distress levels that are attributable to exposure group at α = .05.
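The same nested-model arithmetic can be checked with a few lines of code (a supplementary sketch, using the SSE values from the solution above):

```python
# F statistic for Exercise 11.102c, comparing the complete and reduced models.
sse_r, sse_c = 795.23, 783.9
k, g, n = 5, 2, 200
f_stat = ((sse_r - sse_c) / (k - g)) / (sse_c / (n - (k + 1)))
print(round(f_stat, 2))   # 0.93
print(f_stat < 2.60)      # True -> do not reject H0
```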
11.103 a.

Using MINITAB, the output from fitting a complete second-order model is:
* NOTE * X1 is highly correlated with other predictor variables
* NOTE * X2 is highly correlated with other predictor variables
* NOTE * X1X2 is highly correlated with other predictor variables

The regression equation is
Y = 172788 - 10739 X1 - 499 X2 - 20.2 X1X2 + 198 X1SQ + 14.7 X2SQ

Predictor  Coef     Stdev   t-ratio  p
Constant   172788   97785   1.77     0.084
X1         -10739   2789    -3.85    0.000
X2         -499     1444    -0.35    0.731
X1X2       -20.20   21.36   -0.95    0.350
X1SQ       197.57   22.60   8.74     0.000
X2SQ       14.678   8.819   1.66     0.103

s = 13132    R-sq = 95.9%    R-sq(adj) = 95.5%

Analysis of Variance
SOURCE      DF  SS           MS           F       p
Regression  5   1.70956E+11  34191134720  198.27  0.000
Error       42  7242915328   172450368
Total       47  1.78199E+11

SOURCE  DF  SEQ SS
X1      1   1.56067E+11
X2      1   13214024
X1X2    1   1686339840
X1SQ    1   12711371776
X2SQ    1   477704384

Unusual Observations
Obs.  X1    Y       Fit     Stdev.Fit  Residual  St.Resid
14    62.9  203288  235455  6002       -32167    -2.75R
22    45.4  27105   58567   3603       -31462    -2.49R
34    28.2  28722   15156   11311      13566     2.03RX
43    64.3  230329  248054  8790       -17725    -1.82 X
47    63.9  212309  240469  4904       -28160    -2.31R

R denotes an obs. with a large st. resid.
X denotes an obs. whose X value gives it large influence.


b.

To test the hypothesis H0: β4 = β5 = 0, we must fit the reduced model

E(y) = β0 + β1x1 + β2x2 + β3x1x2

Using MINITAB, the output from fitting the reduced model is:
* NOTE * X1X2 is highly correlated with other predictor variables

The regression equation is
Y = - 476768 + 11458 X1 + 3404 X2 - 64.4 X1X2

Predictor  Coef     Stdev   t-ratio  p
Constant   -476768  100852  -4.73    0.000
X1         11458    1874    6.11     0.000
X2         3404     1814    1.88     0.067
X1X2       -64.35   33.77   -1.91    0.063

s = 21549    R-sq = 88.5%    R-sq(adj) = 87.8%

Analysis of Variance
SOURCE      DF  SS           MS           F       p
Regression  3   1.57767E+11  52588867584  113.25  0.000
Error       44  20431990784  464363424
Total       47  1.78199E+11

SOURCE  DF  SEQ SS
X1      1   1.56067E+11
X2      1   13214024
X1X2    1   1686339840

Unusual Observations
Obs.  X1    Y       Fit     Stdev.Fit  Residual  St.Resid
34    28.2  28722   -59713  11922      88435     4.93RX
38    66.5  290411  250350  9553       40061     2.07R
43    64.3  230329  202899  11574      27430     1.51 X

R denotes an obs. with a large st. resid.
X denotes an obs. whose X value gives it large influence.

The test is:

H0: β4 = β5 = 0
Ha: At least one βi ≠ 0, for i = 4, 5

The test statistic is F = [(SSER - SSEC)/(k - g)] / {SSEC/[n - (k + 1)]}
= [(20,431,990,784 - 7,242,915,328)/(5 - 3)] / {7,242,915,328/[48 - (5 + 1)]} = 38.24

The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k - g = 5 - 3 = 2
and ν2 = n - (k + 1) = 48 - (5 + 1) = 42. From Table VIII, Appendix B, F.05 ≈ 3.23. The rejection
region is F > 3.23.


Since the observed value of the test statistic falls in the rejection region (F = 38.24 > 3.23), H0 is
rejected. There is sufficient evidence to indicate that at least one of the quadratic terms contributes to
the prediction of monthly collision claims at α = .05.
c.

From part b, we know at least one of the quadratic terms is significant. From part a, it appears that
none of the terms involving x2 may be significant.
Thus, we will fit the model with just x1 and x1². The MINITAB output is:

The regression equation is
Y = 185160 - 11580 X1 + 196 X1SQ

Predictor  Coef    Stdev  t-ratio  p
Constant   185160  54791  3.38     0.002
X1         -11580  2182   -5.31    0.000
X1SQ       195.54  21.64  9.04     0.000

s = 13219    R-sq = 95.6%    R-sq(adj) = 95.4%

Analysis of Variance
SOURCE      DF  SS           MS           F       p
Regression  2   1.70335E+11  85167357952  487.36  0.000
Error       45  7863868416   174752624
Total       47  1.78199E+11

SOURCE  DF  SEQ SS
X1      1   1.56067E+11
X1SQ    1   14267676672

Unusual Observations
Obs.  X1    Y       Fit     Stdev.Fit  Residual  St.Resid
10    35.8  28957   21200   5825       7757      0.65 X
14    62.9  203288  230397  4044       -27109    -2.15R
22    45.4  27105   62456   2856       -35351    -2.74R
34    28.2  28722   14099   11344      14623     2.15RX
38    66.5  290411  279798  6189       10613     0.91 X
47    63.9  212309  243611  4570       -31302    -2.52R

R denotes an obs. with a large st. resid.
X denotes an obs. whose X value gives it large influence.

To see if any of the terms involving x2 are significant, we test:

H0: β2 = β3 = β5 = 0
Ha: At least one βi ≠ 0, for i = 2, 3, 5

The test statistic is F = [(SSER - SSEC)/(k - g)] / {SSEC/[n - (k + 1)]}
= [(7,863,868,416 - 7,242,915,328)/(5 - 2)] / {7,242,915,328/[48 - (5 + 1)]} = 1.20


The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k - g = 5 - 2 = 3
and ν2 = n - (k + 1) = 48 - (5 + 1) = 42. From Table VIII, Appendix B, F.05 ≈ 2.84. The rejection
region is F > 2.84.

Since the observed value of the test statistic does not fall in the rejection region (F = 1.20 < 2.84),
H0 is not rejected. There is insufficient evidence to indicate that any of the terms involving x2
contribute to the model at α = .05.

Thus, it appears that the best model is E(y) = β0 + β1x1 + β2x1². The model does not support the
analyst's claim. In the model above, the estimate for β2 is positive. This would indicate that the
higher claims are for both the young and the old. Also, there is no evidence to support the claim that
there are more claims when the temperature goes down.
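The F statistic for this last comparison can be verified numerically (a sketch, not part of the printed solution; SSE values as reported in the two printouts):

```python
# F test for dropping all terms involving x2 in Exercise 11.103c.
sse_r, sse_c = 7_863_868_416, 7_242_915_328
k, g, n = 5, 2, 48
f_stat = ((sse_r - sse_c) / (k - g)) / (sse_c / (n - (k + 1)))
print(round(f_stat, 2))   # 1.2
print(f_stat < 2.84)      # True -> do not reject H0
```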

11.104 a.

The best one-variable predictor of y is the one whose t statistic has the largest absolute value. The t
statistics for each of the variables are:
Independent Variable    t
x1    t = 1.6/.42 = 3.81
x2    t = .9/.01 = 90
x3    t = 3.4/1.14 = 2.98
x4    t = 2.5/2.06 = 1.21
x5    t = 4.4/.73 = 6.03
x6    t = .3/.35 = .86
The variable x2 is the best one-variable predictor of y. The absolute value of the corresponding t
score is 90. This is larger than any of the others.
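The table of t statistics can be reproduced with a short script (a supplementary sketch using the estimates and standard errors listed above):

```python
# t statistic (estimate / standard error) for each predictor, Exercise 11.104a.
coefs = {"x1": (1.6, .42), "x2": (.9, .01), "x3": (3.4, 1.14),
         "x4": (2.5, 2.06), "x5": (4.4, .73), "x6": (.3, .35)}
t_stats = {x: b / se for x, (b, se) in coefs.items()}
best = max(t_stats, key=lambda x: abs(t_stats[x]))
print(best)                     # x2
print(round(t_stats[best], 0))  # 90.0
```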
b.

Yes. In the stepwise procedure, the first variable entered is the one which has the largest absolute
value of t, provided the absolute value of the t falls in the rejection region.

c.

Once x2 is entered, the next variable that is entered is the one that, in conjunction with x2, has the
largest absolute t value associated with it.

11.105 a.

In Step 1, all one-variable models are fit to the data. These models are of the form:
E(y) = β0 + β1xi
Since there are 7 independent variables, 7 models are fit. (Note: There are actually only 6
independent variables. One of the qualitative variables has three levels and thus two dummy
variables. Some statistical packages will allow one to bunch these two variables together so that they
are either both in or both out. In this answer, we are assuming that each xi stands by itself.)

b.

In Step 2, all two-variable models are fit to the data, where the variable selected in Step 1, say x1, is
one of the variables. These models are of the form:
E(y) = β0 + β1x1 + β2xi
Since there are 6 independent variables remaining, 6 models are fit.

c.


In Step 3, all three-variable models are fit to the data, where the variables selected in Step 2, say x1
and x2, are two of the variables. These models are of the form:
E(y) = β0 + β1x1 + β2x2 + β3xi
Since there are 5 independent variables remaining, 5 models are fit.
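The shrinking count of candidate models in Steps 1 through 3 follows a simple pattern that can be sketched in code (an illustrative aside, not part of the printed solution):

```python
# Models fit at each step of forward stepwise selection with 7 candidate
# predictors (Exercise 11.105): one model per remaining variable, so the
# count drops by one at each step.
p = 7
models_per_step = [p - step for step in range(p)]
print(models_per_step[:3])   # [7, 6, 5]
```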

d.

The procedure stops adding independent variables when none of the remaining variables, when
added to the model, have a p-value less than some predetermined value. This predetermined value is
usually α = .05.

e.

Two major drawbacks to using the final stepwise model as the "best" model are:
(1) An extremely large number of single parameter t-tests have been conducted. Thus, the
probability is very high that one or more errors have been made in including or excluding
variables.

(2) Often the variables selected to be included in a stepwise regression do not include the
high-order terms. Consequently, we may have initially omitted several important terms from the
model.

11.106 a.

In the first step, there are 8 one-variable models fit to the data.

b.

The best one-variable model is the model that contains the one variable with the largest absolute
value of the t-statistic. This would also correspond to the one variable with the smallest p-value.

c.

In step 2, there would be 7 two-variable models fit to the data.

d.

β̂1 = -.28. The mean relative error for developers is estimated to be .28 lower than the mean relative
error for project leaders, holding previous accuracy constant.

β̂8 = .27. The mean relative error for previous accuracy more than 20% is estimated to be .27 higher
than the mean relative error for previous accuracy less than 20%, holding company role of estimator
constant.

e.

There are a couple of reasons for being wary of using this model as the final model. First, in stepwise
regression, once a variable is in the model, it cannot be dropped. The best one-variable model might
contain x1, but the best model may contain the variables x2 and x3. By including x1 in the model, we
may never get to the best model. Another reason to be wary is that we have not considered any 2nd-order
terms in the model or any interactions. These higher-order terms might be very important in the
model.

11.107 a.

In step 1, all 1-variable models are fit. Thus, there are a total of 11 models fit.

b.

In step 2, all two-variable models are fit, where 1 of the variables is the best one selected in step 1.
Thus, a total of 10 two-variable models are fit.

c.

In the 11th step, only one model is fit: the model containing all the independent variables.

d.

The model would be:


E(y) = β0 + β1x11 + β2x4 + β3x2 + β4x7 + β5x10 + β6x1 + β7x9 + β8x3


e.

67.7% of the total sample variability of overall satisfaction is explained by the model containing the
independent variables safety on bus, seat availability, dependability, travel time, convenience of
route, safety at bus stops, hours of service, and frequency of service.

f.

Using stepwise regression does not guarantee that the best model will be found. There may be better
combinations of the independent variables that are never found, because of the order in which the
independent variables are entered into the model. In addition, there are no squared or interaction
terms included. There is a high probability of making at least one Type I error.

11.108 a.

From the printout, the three variables that should be included in the model are: ST-DEPTH,
TGRSWT, and TI. They are all entered into the model using stepwise regression and all are retained.

b.

No. There may be other independent variables that were not included.

c.

The model is E(y) = β0 + β1x4 + β2x5 + β3x6 + β4x4x5 + β5x4x6 + β6x5x6

d.

He would test

H0: β4 = β5 = β6 = 0 versus
Ha: At least one βi ≠ 0, i = 4, 5, 6

He would fit the first-order model and record SSER. He would then fit the model with the interaction
terms and record SSEC.
The test statistic is F = [(SSER - SSEC)/(k - g)] / {SSEC/[n - (k + 1)]}

e.

To improve the model, the marine biologist could try to find other independent variables that affect y,
the log of the number of marine animals present, or higher order terms of the already identified
independent variables.

11.109 Yes. x2 and x4 are highly correlated (.93), as well as x4 and x5 (.86). When highly correlated independent
variables are present in a regression model, the results can be confusing. The researcher may want to
include only one of the variables.
11.110 a.

The plot of the residuals reveals a nonrandom pattern. The residuals exhibit a curved shape. Such a
pattern usually indicates that curvature needs to be added to the model.

b.

The plot of the residuals reveals a nonrandom pattern. The residuals versus the predicted values
show a pattern where the range in values of the residuals increases as ŷ increases. This indicates
that the variance of the random error, ε, becomes larger as the estimate of E(y) increases in value.
Since E(y) depends on the x-values in the model, this implies that the variance of ε is not constant
for all settings of the x's.

c.

This plot reveals an outlier, since all or almost all of the residuals should fall within 3 standard
deviations of their mean of 0.

d.

This frequency distribution of the residuals is skewed to the right. This may be due to outliers or
could indicate the need for a transformation of the dependent variable.

11.111 a.

Since the absolute value of the correlation coefficient is .983, this would imply there is a very high
potential for multicollinearity.

b.

Since the absolute value of the correlation coefficient is .074, this would imply there is a very low
potential for multicollinearity.

c.

Since the absolute value of the correlation coefficient is .722, this would imply there is a moderate
potential for multicollinearity.

d.

Since the absolute value of the correlation coefficient is .528, this would imply there is a moderate
potential for multicollinearity.

11.112 a.

Since all the pairwise correlations are .45 or less in absolute value, there is little evidence of extreme
multicollinearity.

b.

No. The overall model test is significant (p < .001). This implies that at least one variable contributes
to the prediction of the urban/rural rating. Looking at the individual t-tests, there are several that are
significant, namely x1, x3, and x5. There is no evidence that multicollinearity is present.

11.113 It is possible that company role of estimator and previous accuracy could be correlated with each other.
This indicates multicollinearity may be present.

11.114 First, we need to compute the value of the residual:

Residual = y - ŷ = 87 - 29.63 = 57.37


We are given that the standard deviation is s = 24.68. Thus, an observation with a residual of 57.37 is
57.37 / 24.68 = 2.32 standard deviations from the fitted regression line. Since this is less than 3 standard
deviations from the regression line, this point is not considered an outlier.
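The outlier check above amounts to one division, which can be verified with a short script (a supplementary sketch using the values given in the exercise):

```python
# Standardized size of the residual in Exercise 11.114.
y, y_hat, s = 87, 29.63, 24.68
residual = y - y_hat      # 57.37
z = residual / s          # number of standard deviations from the line
print(round(z, 2))        # 2.32
print(abs(z) < 3)         # True -> not flagged as an outlier
```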
11.115 a.

The normal probability plot should be used to check for normal errors. The points in this plot are
fairly close to the straight line, so the assumption of normality appears to be satisfied.

b.

The graph of the residuals versus the fitted or predicted values should be used to check for unequal
variances. The spread of the residuals appears to be fairly constant in this graph. It appears that the
assumption of equal variances is satisfied.

11.116 a.

From MINITAB, the output is:
Regression Analysis: Food versus Income, Size

The regression equation is
Food = 2.79 - 0.00016 Income + 0.383 Size

Predictor  Coef       SE Coef   T      P
Constant   2.7944     0.4363    6.40   0.000
Income     -0.000164  0.006564  -0.02  0.980
Size       0.38348    0.07189   5.33   0.000

S = 0.7188    R-Sq = 55.8%    R-Sq(adj) = 52.0%

Analysis of Variance
Source          DF  SS       MS      F      P
Regression      2   15.0027  7.5013  14.52  0.000
Residual Error  23  11.8839  0.5167
Total           25  26.8865

Source  DF  Seq SS
Income  1   0.2989
Size    1   14.7037

Correlations: Income, Size

Pearson correlation of Income and Size = 0.137
P-Value = 0.506

No; Income and household size do not seem to be highly correlated. The correlation coefficient
between income and household size is .137.


Using MINITAB, the residual plots are:

[Residual plots (response is Food): Histogram of the Residuals, Residuals Versus the Fitted Values, Residuals Versus Income, and Residuals Versus Size]

b.

Yes; The residuals versus income and residuals versus homesize exhibit a curved shape. Such a
pattern could indicate that a second-order model may be more appropriate.
c.

No; The residuals versus the predicted values reveal varying spreads for different values of ŷ. This
implies that the variance of ε is not constant for all settings of the x's.

d.

Yes; The outlier shows up in several plots and is the 26th household (Food consumption = $7500,
income = $7300 and household size = 5).

e.

No; The frequency distribution of the residuals shows that the outlier skews the frequency
distribution to the right.


11.117 Using MINITAB, the residual plots are:

[Residual Plots for ARSENIC: Normal Probability Plot of the Residuals, Residuals Versus the Fitted Values, Histogram of the Residuals, and Residuals Versus the Order of the Data; scatterplots of the standardized residuals (SRES1) versus LATITUDE, LONGITUDE, and DEPTH-FT]

a.

From the histogram of the standardized residuals, it appears that the mean of the residuals is close to
0. Thus, the assumption that the mean error is 0 appears to be met.

b.

From the plot of the standardized residuals versus the fitted values, it appears that the spread of the
residuals increases as the fitted values increase. Thus, it appears that the assumption of constant
variance is violated.

c.

From the plots of the standardized residuals versus the fitted values, it appears that there are some
outliers. There are several observations with standardized residuals of 4 or more.

d.

From the normal probability plot, the data do not form a straight line. Thus, it appears that the
assumption of normal error terms is violated.

e.

Using MINITAB, the correlations among the independent variables are:


Correlations: LATITUDE, LONGITUDE, DEPTH-FT

           LATITUDE  LONGITUDE
LONGITUDE  0.311
           0.000
DEPTH-FT   0.151     -0.328
           0.006     0.000

Cell Contents: Pearson correlation
               P-Value

None of the pairwise correlations are large in absolute value, so there is no evidence of
multicollinearity. In addition, the global test indicates that at least one of the independent variables is
significant and each of the independent variables is statistically significant. This also indicates that
multicollinearity does not exist.
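A simple screen of these pairwise correlations can be written in a few lines (an illustrative sketch; the |r| > .8 cutoff is a common rough rule, not from the text):

```python
# Screening the pairwise predictor correlations from Exercise 11.117e.
corrs = {("LATITUDE", "LONGITUDE"): 0.311,
         ("LATITUDE", "DEPTH-FT"): 0.151,
         ("LONGITUDE", "DEPTH-FT"): -0.328}
flagged = [pair for pair, r in corrs.items() if abs(r) > 0.8]
print(flagged)   # [] -> no pair suggests extreme multicollinearity
```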


11.118 Using MINITAB, the residual plots are:

[Residual Plots for DDT: Normal Probability Plot of the Residuals, Residuals Versus the Fitted Values, Histogram of the Residuals, and Residuals Versus the Order of the Data; also Residuals Versus WEIGHT, Residuals Versus LENGTH, and Residuals Versus MILE]
From the normal probability plot, the points do not fall on a straight line, indicating the residuals are not
normal. The histogram of the residuals indicates the residuals are skewed to the right, which also indicates
that the residuals are not normal. The plot of the residuals versus yhat indicates that there is at least one
outlier and the variance is not constant. One observation has a standardized residual of more than 10 and
several others have standardized residuals greater than 3. This is also evident in the plots of the residuals
versus each of the independent variables. Since the assumptions of normality and constant variance appear
to be violated, we could consider transforming the data. We should also check the outlying observations to
see if there are any errors connected with these observations.
11.119 a.

Using MINITAB, the results are:


Regression Analysis: Time versus Temp

The regression equation is
Time = 30856 - 192 Temp

Predictor  Coef     SE Coef  T       P
Constant   30856    2713     11.37   0.000
Temp       -191.57  18.49    -10.36  0.000

S = 1099.17    R-Sq = 84.3%    R-Sq(adj) = 83.5%

Analysis of Variance
Source          DF  SS         MS         F       P
Regression      1   129663987  129663987  107.32  0.000
Residual Error  20  24163399   1208170
Total           21  153827386

The fitted regression line is ŷ = 30,856 - 191.57(temp).


b.

For temperature = 149, ŷ = 30,856 - 191.57(149) = 2,312.07. There are 2 observations with a
temperature of 149. The residuals for the microchips manufactured at a temperature of 149°C are

r = y - ŷ = 1,100 - 2,312.07 = -1,212.07 and r = y - ŷ = 1,150 - 2,312.07 = -1,162.07.
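The fitted value and both residuals can be verified with a short script (a supplementary sketch using the estimated intercept and slope from part a):

```python
# Fitted value and residuals at temperature 149 for Exercise 11.119b.
b0, b1, temp = 30_856, -191.57, 149
y_hat = b0 + b1 * temp
residuals = [y - y_hat for y in (1_100, 1_150)]
print(round(y_hat, 2))                    # 2312.07
print([round(r, 2) for r in residuals])   # [-1212.07, -1162.07]
```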


c.

Using MINITAB, the plot of the residuals versus temperature is:


[Scatterplot of the residuals (RESI1) versus Temp]

There appears to be a U-shaped trend to the data.


d.

Yes. Because there appears to be a U-shaped trend to the data, this indicates that there is a
curvilinear relationship between temperature and time.

11.120 Using MINITAB, the results of the regression are:


Regression Analysis: HEATRATE versus RPM, CPRATIO, RPM*CPR

The regression equation is
HEATRATE = 12065 + 0.170 RPM - 146 CPRATIO - 0.00242 RPM*CPR

Predictor       Coef   SE Coef      T      P
Constant     12065.5     418.5  28.83  0.000
RPM          0.16969   0.03467   4.89  0.000
CPRATIO      -146.07     26.66  -5.48  0.000
RPM*CPR    -0.002425  0.003120  -0.78  0.440

S = 633.842   R-Sq = 84.9%   R-Sq(adj) = 84.2%

Analysis of Variance

Source          DF         SS        MS       F      P
Regression       3  142586570  47528857  118.30  0.000
Residual Error  63   25310639    401756
Total           66  167897208

Source   DF     Seq SS
RPM       1  119598530
CPRATIO   1   22745478
RPM*CPR   1     242561

Unusual Observations
Obs    RPM  HEATRATE      Fit  SE Fit  Residual  St Resid
 11  18000   14628.0  12710.6   165.1    1917.4     3.13R
 28  22516   14796.0  14561.9   277.9     234.1     0.41 X
 36   4473   13523.0  11428.0   171.5    2095.0     3.43R
 61  33000   16243.0  16105.3   410.2     137.7     0.28 X
 62  30000   14628.0  15296.4   288.7    -668.4    -1.18 X
 64   3600    8714.0   7258.6   427.1    1455.4     3.11RX

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.


The residual plots are:

[Residual Plots for HEATRATE: normal probability plot of the residuals, residuals versus the fitted values, histogram of the residuals, residuals versus the order of the data, plus plots of the standardized residuals versus CPRATIO and versus RPM.]
From the normal probability plot, the points do not fall on a straight line, indicating the residuals are
not normal. The histogram of the residuals indicates the residuals are skewed to the right, which also
indicates that the residuals are not normal. The plot of the residuals versus ŷ indicates that there
are potentially 3 outliers with standardized residuals of 3 or more. The variance appears to be
constant. On the graph of the residuals versus RPM, the spread of the residuals appears to decrease
as the value of RPM increases. This indicates the variance may not be constant for RPMs. Since the
assumptions of normality and constant variance appear to be violated, we could consider
transforming the data. We should also check the outlying observations to see if there are any errors
connected with these observations.

11.121 In multiple regression, as in simple regression, the confidence interval for the mean value of y is narrower
than the prediction interval of a particular value of y.
11.122 The error of prediction is smallest when the values of x1, x2, and x3 are equal to their sample means. The
further x1, x2, and x3 are from their means, the larger the error. When x1 = 60, x2 = .4, and x3 = 900, the
observed values are outside the observed ranges of the x values. When x1 = 30, x2 = .6, and x3 = 1300, the
observed values are within the observed ranges and consequently the x values are closer to their means.
Thus, when x1 = 30, x2 = .6, and x3 = 1300, the error of prediction is smaller.
11.123 The model-building step is the key to the success or failure of a regression analysis. If the model is a good
model, we will have a good predictive model for the dependent variable y. If the model is not a good
model, the predictive ability will not be of much use.
11.124 a.

To determine if at least one of the parameters is not zero, we test:


H0: β1 = β2 = β3 = β4 = 0
Ha: At least one βi ≠ 0
The test statistic is F = (R²/k) / [(1 − R²)/(n − (k + 1))] = (.83/4) / [(1 − .83)/(25 − (4 + 1))] = 24.41

The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k = 4
and denominator df = n − (k + 1) = 25 − (4 + 1) = 20. From Table VIII, Appendix B, F.05 = 2.87.
The rejection region is F > 2.87.

Since the observed value of the test statistic falls in the rejection region (F = 24.41 > 2.87), H0 is
rejected. There is sufficient evidence to indicate at least one of the β parameters is nonzero at α = .05.
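The F statistic above can be reproduced from R², k, and n alone; a minimal sketch using the values from this exercise:

```python
# Global F test computed from R^2 (R^2 = .83, k = 4 predictors, n = 25)
R2, k, n = 0.83, 4, 25

df_num = k
df_den = n - (k + 1)
F = (R2 / df_num) / ((1 - R2) / df_den)
```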
b.

H0: β1 = 0
Ha: β1 < 0

The test statistic is t = (β̂1 − 0)/s(β̂1) = (−2.43 − 0)/1.21 = −2.01

The rejection region requires α = .05 in the lower tail of the t distribution with df = n − (k + 1) = 25
− (4 + 1) = 20. From Table V, Appendix B, t.05 = 1.725. The rejection region is t < −1.725.

Since the observed value of the test statistic falls in the rejection region (t = −2.01 < −1.725), H0 is
rejected. There is sufficient evidence to indicate β1 is less than 0 at α = .05.

c.

H0: β2 = 0
Ha: β2 > 0

The test statistic is t = (β̂2 − 0)/s(β̂2) = (.05 − 0)/.16 = .31

The rejection region requires α = .05 in the upper tail of the t distribution. From part b above, the
rejection region is t > 1.725.

Since the observed value of the test statistic does not fall in the rejection region (t = .31 < 1.725), H0
is not rejected. There is insufficient evidence to indicate β2 is greater than 0 at α = .05.


d.

H0: β3 = 0
Ha: β3 ≠ 0

The test statistic is t = (β̂3 − 0)/s(β̂3) = (.62 − 0)/.26 = 2.38

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = 20. From
Table V, Appendix B, t.025 = 2.086. The rejection region is t < −2.086 or t > 2.086.

Since the observed value of the test statistic falls in the rejection region (t = 2.38 > 2.086), H0 is
rejected. There is sufficient evidence to indicate β3 is different from 0 at α = .05.
11.125 a.

The least squares equation is ŷ = 90.1 − 1.836x1 + .285x2

b.

R² = .916. About 91.6% of the sample variability in the y's is explained by the model E(y) = β0 +
β1x1 + β2x2.

c.

To determine if the model is useful for predicting y, we test:

H0: β1 = β2 = 0
Ha: At least one βi ≠ 0, i = 1, 2

The test statistic is F = MSR/MSE = 7400/114 = 64.91

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k = 2 and ν2 = n
− (k + 1) = 15 − (2 + 1) = 12. From Table VIII, Appendix B, F.05 = 3.89. The rejection region is F >
3.89.

Since the observed value of the test statistic falls in the rejection region (F = 64.91 > 3.89), H0 is
rejected. There is sufficient evidence to indicate the model is useful for predicting y at α = .05.


d.

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = β̂1/s(β̂1) = −1.836/.367 = −5.01

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1)
= 15 − (2 + 1) = 12. From Table V, Appendix B, t.025 = 2.179. The rejection region is t < −2.179 or t
> 2.179.

Since the observed value of the test statistic falls in the rejection region (t = −5.01 < −2.179), H0 is
rejected. There is sufficient evidence to indicate β1 is not 0 at α = .05.

e.

The standard deviation is s = √MSE = √114 = 10.68. We would expect about 95% of the
observations to fall within 2s = 2(10.68) = 21.36 units of the fitted regression line.

11.126 From the plot of the residuals for the straight line model, there appears to be a mound shape which implies
the quadratic model should be used.
11.127 E(y) = β0 + β1x1 + β2x2 + β3x3

where x1 = 1 if level 2, 0 otherwise
      x2 = 1 if level 3, 0 otherwise
      x3 = 1 if level 4, 0 otherwise

11.128 a.

E(y) = β0 + β1x1 + β2x2 + β3x3

where x2 = 1 if level 2, 0 otherwise
      x3 = 1 if level 3, 0 otherwise

b.

E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x3 + β5x1x2 + β6x1x3 + β7x1²x2 + β8x1²x3

where x1, x2, and x3 are as in part a.
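The indicator coding of Exercise 11.127 (a four-level qualitative variable with level 1 as the base) can be sketched in code; the function name is mine:

```python
# Dummy coding for a qualitative variable with four levels,
# using level 1 as the base level (all three indicators zero).
def level_dummies(level):
    """Return (x1, x2, x3) indicators for levels 2, 3, and 4."""
    return tuple(int(level == j) for j in (2, 3, 4))
```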


11.129 The stepwise regression method is used to try to find the best model to describe a process. It is a screening
procedure that tries to select a small subset of independent variables from a large set of independent
variables that will adequately predict the dependent variable. This method is useful in that it can eliminate
some unimportant independent variables from consideration.
11.130 a.

E(y) = β0 + β1x1 + β2x2

b.

E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x2² + β5x1x2

11.131 Even though SSE = 0, we cannot estimate σ² because there are no degrees of freedom corresponding to
error. With three data points, there are only two degrees of freedom available. The degrees of freedom
corresponding to the model is k = 2 and the degrees of freedom corresponding to error is n − (k + 1) = 3 −
(2 + 1) = 0. Without an estimate for σ², no inferences can be made.
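This can be demonstrated numerically: with three data points and three parameters, least squares reproduces the data exactly, so SSE = 0 and zero degrees of freedom remain for error. The data below are arbitrary, for illustration only:

```python
import numpy as np

# Three observations, model E(y) = b0 + b1*x1 + b2*x2 (k = 2 predictors).
x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([2.0, 1.0, 5.0])
y = np.array([4.0, 7.0, 1.0])

X = np.column_stack([np.ones(3), x1, x2])   # design matrix, 3 x 3
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

sse = float(np.sum((y - X @ beta) ** 2))    # essentially zero
df_error = len(y) - X.shape[1]              # n - (k + 1) = 0
```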


11.132 a.

H0: β4 = β5 = 0
Ha: At least one of β4 and β5 ≠ 0

b.

The regression model

E(y) = β0 + β1x1 + β2x2 + β3x2² + β4x1x2 + β5x1x2²

is fit to the 35 data points, yielding a sum of squares for error, denoted SSEC. The regression model

E(y) = β0 + β1x1 + β2x2 + β3x2²

is also fit to the data and its sum of squares for error is obtained, denoted SSER. Then the test statistic
is:

F = [(SSER − SSEC)/(k − g)] / {SSEC/[n − (k + 1)]}

where k = 5, g = 3, and n = 35.

c.

The numerator degrees of freedom is k − g = 5 − 3 = 2, and the denominator degrees of freedom is n
− (k + 1) = 35 − (5 + 1) = 29.

d.

The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = 2 and
denominator df = 29. From Table VIII, Appendix B, F.05 = 3.33. The rejection region is F > 3.33.
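The mechanics of this nested-model F test can be sketched on simulated data (the data below are synthetic; only the model forms and the degrees of freedom match the exercise):

```python
import numpy as np

n, k, g = 35, 5, 3
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
y = 2 + 0.5 * x1 - 0.3 * x2 + 0.1 * x2**2 + rng.normal(0, 1, n)

def sse(X, y):
    """Sum of squared errors from a least squares fit."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ b) ** 2))

ones = np.ones(n)
X_reduced = np.column_stack([ones, x1, x2, x2**2])                        # g = 3 terms
X_complete = np.column_stack([ones, x1, x2, x2**2, x1 * x2, x1 * x2**2])  # k = 5 terms

sse_r, sse_c = sse(X_reduced, y), sse(X_complete, y)
F = ((sse_r - sse_c) / (k - g)) / (sse_c / (n - (k + 1)))
```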

11.133 a.

A confidence interval for the difference of two population means, (μ1 − μ2), could be used. Since
both sample sizes are over 30, the large-sample confidence interval is used (with independent
samples).

b.

Let x = 1 if public college, 0 otherwise

The model is E(y) = β0 + β1x

c.

β1 is the difference between the two population means. A point estimate for β1 is β̂1. A confidence
interval for β1 could be used to estimate the difference in the two population means.

11.134 a.

1. The "Quantitative GMAT score" is measured on a numerical scale, so it is a quantitative variable.
2. The "Verbal GMAT score" is measured on a numerical scale, so it is a quantitative variable.
3. The "Undergraduate GPA" is measured on a numerical scale, so it is a quantitative variable.
4. The "First-year graduate GPA" is measured on a numerical scale, so it is a quantitative variable.
5. The "Student cohort" has 3 categories, so it is a qualitative variable. Note that the numerical
   scale is meaningless in this situation. (It is possible to consider this as a quantitative variable.
   However, for this problem we will consider it as qualitative.)

b.

The quantitative variables GMAT score, verbal GMAT score, undergraduate GPA, and first-year
graduate GPA should all be positively correlated to final GPA.
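Returning to the dummy-variable model of Exercise 11.133, the claim that β1 equals the difference in the two population means has a simple sample analogue worth verifying: the fitted slope β̂1 is exactly the difference in group sample means. A sketch on made-up figures (the numbers are hypothetical):

```python
import numpy as np

# Hypothetical responses for public (x = 1) and private (x = 0) colleges.
public = np.array([12.0, 15.0, 11.0, 14.0])
private = np.array([20.0, 22.0, 19.0])

y = np.concatenate([public, private])
x = np.concatenate([np.ones(public.size), np.zeros(private.size)])

X = np.column_stack([np.ones(y.size), x])      # model E(y) = b0 + b1*x
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]
```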


c.

x5 = 1 if student entered doctoral program in year 3, 0 otherwise
x6 = 1 if student entered doctoral program in year 5, 0 otherwise

d.

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6

e.

β0 = the y-intercept for students entering in year 1.

β1 = the change in mean final GPA for each one-unit increase in quantitative GMAT score,
holding the remaining variables constant.

β2 = the change in mean final GPA for each one-unit increase in verbal GMAT score,
holding the remaining variables constant.

β3 = the change in mean final GPA for each one-point increase in undergraduate GPA,
holding the remaining variables constant.

β4 = the change in mean final GPA for each one-point increase in first-year graduate GPA,
holding the remaining variables constant.

β5 = difference in mean final GPA between the year 3 and year 1 student cohorts.
β6 = difference in mean final GPA between the year 5 and year 1 student cohorts.

f.

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6 + β7x1x5 + β8x1x6
       + β9x2x5 + β10x2x6 + β11x3x5 + β12x3x6 + β13x4x5 + β14x4x6

g.

For the year 1 cohort, x5 = x6 = 0. The model is:

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5(0) + β6(0) + β7x1(0) + β8x1(0)
       + β9x2(0) + β10x2(0) + β11x3(0) + β12x3(0) + β13x4(0) + β14x4(0)
     = β0 + β1x1 + β2x2 + β3x3 + β4x4

The slopes for the four variables are β1, β2, β3, and β4, respectively.

11.135 a.

The type of juice extractor is qualitative. The size of the orange is quantitative.

b.

The model is E(y) = β0 + β1x1 + β2x2

where x1 = diameter of orange
      x2 = 1 if Brand B, 0 if not

c.

To allow the lines to differ, the interaction term is added:

E(y) = β0 + β1x1 + β2x2 + β3x1x2

d.

For part b and part c, sketches of the two fitted lines are omitted here. (In part b the two brand lines
are parallel; the interaction model of part c allows their slopes to differ.)

e.

To determine whether the model in part c provides more information for predicting yield than does
the model in part b, we test:

H0: β3 = 0
Ha: β3 ≠ 0

f.

The test statistic would be F = [(SSER − SSEC)/(k − g)] / {SSEC/[n − (k + 1)]}

To compute SSER, the model in part b is fit and SSER is the sum of squares for error.
To compute SSEC, the model in part c is fit and SSEC is the sum of squares for error.

k − g = the number of β parameters in H0, which is 1.
n − (k + 1) = the degrees of freedom for error in the complete model.
11.136 a.

R² = .31. 31% of the total sample variation of the natural log of the level of CO2 emissions in 1996 is
explained by the model containing the 7 independent variables.

b.

The test statistic is F = (R²/k) / [(1 − R²)/(n − (k + 1))] = (.31/7) / [(1 − .31)/(66 − (7 + 1))] = 3.72

The rejection region requires α = .01 in the upper tail of the F distribution with ν1 = k = 7 and
ν2 = n − (k + 1) = 66 − (7 + 1) = 58. From Table VIII, Appendix B, F.01 ≈ 2.95. The rejection region
is F > 2.95.

Since the observed value of the test statistic falls in the rejection region (F = 3.72 > 2.95), H0 is
rejected. There is sufficient evidence to indicate that at least one of the 7 independent variables is
useful in the prediction of the natural log of the level of CO2 emissions in 1996 at α = .01.


c.

To determine if foreign investments in 1980 is a useful predictor of CO2 emissions in 1996, we test:

H0: β1 = 0
Ha: β1 ≠ 0

d.

The test statistic is t = 2.52 and the p-value is p < 0.05. Since the observed p-value is less than α
(p < .05), H0 is rejected. There is sufficient evidence to indicate foreign investments in 1980 is a
useful predictor of CO2 emissions in 1996 at α = .05.

11.137 Variables that are highly correlated with each other are x4 and x5 (r = -.84). When highly correlated
independent variables are present in a regression model, the results can be confusing. Possible problems
include:
1.

Global test indicates at least one independent variable is useful in the prediction of y, but none of the
individual tests for the independent variables is significant.

2.

The signs of the estimated beta coefficients are opposite from what is expected.

11.138 a.

The main effects model would be: E(y) = β0 + β1x1 + ⋯ + β8x8

b.

β̂1 = −.28. The mean value of the relative error of the effort estimate for developers
is estimated to be .28 units below that of project leaders, holding previous accuracy constant.

β̂8 = .27. The mean value of the relative error of the effort estimate if previous accuracy is more
than 20% is estimated to be .27 units above that if previous accuracy is less than 20%, holding
company role of estimator constant.

c.

One possible reason for the sign of β̂1 being opposite from what is expected could be that company
role of estimator and previous accuracy could be correlated.

11.139 a.

R² = .712. 71.2% of the total sample variation in the fees charged by auditors is explained
by the model containing 7 independent variables.


b.

To determine if the model is adequate, we test:

H0: β1 = β2 = β3 = β4 = β5 = β6 = β7 = 0
Ha: At least one βi ≠ 0, i = 1, 2, 3, ..., 7

The test statistic is F = 111.1 (from the table).

Since no α was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of
the F distribution with ν1 = k = 7 and ν2 = n − (k + 1) = 268 − (7 + 1) = 260. From Table VIII,
Appendix B, F.05 ≈ 2.01. The rejection region is F > 2.01.

Since the observed value of the test statistic falls in the rejection region (F = 111.1 > 2.01), H0 is
rejected. There is sufficient evidence to indicate that the model is adequate for predicting the audit
fees at α = .05.

c.

If new auditors charge less than incumbent auditors, then β1 is negative. By definition, x1 = 1 if new
auditor and 0 if incumbent. Therefore, we will be adding β1 to the mean only for new auditors. If new
auditors charge less, we have to add a negative number.

11.140 a.


Let x1 = 1 if no, 0 if yes

The model would be E(y) = β0 + β1x1

In this model, β0 is the mean job preference for those who responded "yes" to the question "Flextime
of the position applied for" and β1 is the difference in the mean job preference between those who
responded "no" to the question and those who answered "yes" to the question.

b.

Let x1 = 1 if referral, 0 if not
    x2 = 1 if on-premise, 0 if not

The model would be E(y) = β0 + β1x1 + β2x2

In this model, β0 is the mean job preference for those who responded "none" to level of day care
support required, β1 is the difference in the mean job preference between those who responded
"referral" and those who responded "none", and β2 is the difference in the mean job preference
between those who responded "on-premise" and those who responded "none".

c.

Let x1 = 1 if counseling, 0 if not
    x2 = 1 if active search, 0 if not

The model would be E(y) = β0 + β1x1 + β2x2

In this model, β0 is the mean job preference for those who responded "none" to spousal transfer
support required, β1 is the difference in the mean job preference between those who responded
"counseling" and those who responded "none", and β2 is the difference in the mean job preference
between those who responded "active search" and those who responded "none".

d.

Let x1 = 1 if not married, 0 if married

The model would be E(y) = β0 + β1x1

In this model, β0 is the mean job preference for those who responded "married" to marital status and
β1 is the difference in the mean job preference between those who responded "not married" and those
who answered "married".

e.

Let x1 = 1 if female, 0 if male

The model would be E(y) = β0 + β1x1

In this model, β0 is the mean job preference for males and β1 is the difference in the mean job
preference between females and males.


11.141 The correlation coefficient between Importance and Replace is .2682. This correlation coefficient is fairly
small and would not indicate a problem with multicollinearity between Importance and Replace. The
correlation coefficient between Importance and Support is .6991. This correlation coefficient is fairly large
and would indicate a potential problem with multicollinearity between Importance and Support. Probably
only one of these variables should be included in the regression model. The correlation coefficient
between Replace and Support is .0531. This correlation coefficient is very small and would not indicate a
problem with multicollinearity between Replace and Support. Thus, the model could probably include
Replace and one of the variables Support or Importance.
11.142 CEO income (x1) and stock percentage (x2) are said to interact if the effect of one variable, say CEO
income, on the dependent variable profit (y) depends on the level of the second variable, stock percentage.
11.143 a.

Let x2 = 1 if intervention group, 0 otherwise

The first-order model would be:

E(y) = β0 + β1x1 + β2x2

b.

For the control group, x2 = 0. The first-order model is:

E(y) = β0 + β1x1 + β2(0) = β0 + β1x1

For the intervention group, x2 = 1. The first-order model is:

E(y) = β0 + β1x1 + β2(1) = β0 + β1x1 + β2 = (β0 + β2) + β1x1

In both models, the slope of the line is β1.

c.

If pretest score and group interact, the first-order model would be:

E(y) = β0 + β1x1 + β2x2 + β3x1x2

d.

For the control group, x2 = 0. The first-order model including the interaction is:

E(y) = β0 + β1x1 + β2(0) + β3x1(0) = β0 + β1x1

For the intervention group, x2 = 1. The first-order model including the interaction is:

E(y) = β0 + β1x1 + β2(1) + β3x1(1) = β0 + β1x1 + β2 + β3x1 = (β0 + β2) + (β1 + β3)x1

The slope of the model for the control group is β1. The slope of the model for the intervention group
is β1 + β3.
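The slope algebra in parts b and d can be checked numerically with arbitrary β values (the numbers below are made up purely to illustrate that the interaction term shifts the slope):

```python
# E(y) = b0 + b1*x1 + b2*x2 + b3*x1*x2, with arbitrary illustrative betas
b0, b1, b2, b3 = 10.0, 0.8, 5.0, 0.4

def mean_y(x1, x2):
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

# Slope = change in E(y) per unit change in x1, within each group
slope_control = mean_y(1, 0) - mean_y(0, 0)        # control group (x2 = 0)
slope_intervention = mean_y(1, 1) - mean_y(0, 1)   # intervention group (x2 = 1)
```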

11.144 a.


The SAS output is:


DEP VARIABLE: Y

ANALYSIS OF VARIANCE

SOURCE     DF   SUM OF SQUARES   MEAN SQUARE   F VALUE   PROB>F
MODEL       3      25784705.01    8594901.67   241.758   0.0001
ERROR      16        568826.19   35551.63709
C TOTAL    19      26353531.20

ROOT MSE   188.5514    R-SQUARE   0.9784
DEP MEAN     3014.2    ADJ R-SQ   0.9744
C.V.       6.255438

PARAMETER ESTIMATES

VARIABLE  DF   PARAMETER ESTIMATE   STANDARD ERROR   T FOR H0: PARAMETER=0   PROB > |T|
INTERCEP   1           1333.17830        290.99944                   4.581       0.0003
X1         1          -0.15122302       0.37864583                  -0.399       0.6949
X2         1          -2.62532461       5.34596285                  -0.491       0.6300
X1X2       1           0.05195415      0.006863831                   7.569       0.0001

The fitted model is ŷ = 1333.18 − .151x1 − 2.625x2 + .052x1x2

b.


To determine if the overall model is useful, we test:

H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0, i = 1, 2, 3

The test statistic is F = MSR/MSE = 8,594,901.67/35,551.637 = 241.758

The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k = 3
and denominator df = n − (k + 1) = 20 − (3 + 1) = 16. From Table VIII, Appendix B, F.05 = 3.24.
The rejection region is F > 3.24.

Since the observed value of the test statistic falls in the rejection region (F = 241.758 > 3.24), H0 is
rejected. There is sufficient evidence to indicate the model is useful at α = .05.
c.

To determine if the interaction is present, we test:

H0: β3 = 0
Ha: β3 ≠ 0

The test statistic is t = (β̂3 − 0)/s(β̂3) = 7.569.


The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1)
= 20 − (3 + 1) = 16. From Table V, Appendix B, t.025 = 2.120. The rejection region is t < −2.120 or t
> 2.120.

Since the observed value of the test statistic falls in the rejection region (t = 7.569 > 2.120), H0 is
rejected. There is sufficient evidence to indicate the interaction between advertising expenditure and
shelf space is present at α = .05.
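The t statistic for the interaction term is just the parameter estimate divided by its standard error, both read from the SAS output above:

```python
# From the SAS PARAMETER ESTIMATES table, X1X2 row
estimate = 0.05195415
std_error = 0.006863831

t = estimate / std_error   # about 7.569
```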
d.

Advertising expenditure and shelf space are said to interact if the effect of advertising expenditure on
sales is different at different levels of shelf space.

e.

If a first-order model was used, the effect of advertising expenditure on sales would be the same
regardless of the amount of shelf space. If interaction really exists, the effect of advertising
expenditure on sales would depend on which level of shelf space was present.

f.

Since the data collected are sequential, it is fairly unlikely that the error terms are independent.

11.145 a.

Not necessarily. If Nickel was highly correlated to several other variables, then it might be better to
keep Nickel and drop some of the other highly correlated variables.

b.

Using stepwise regression is a good start for selecting the best set of predictor variables. However,
one should use caution when looking at the model selected using stepwise regression. Sometimes
important variables are not selected to be entered into the model. Also, many t-tests have been run,
thus inflating the Type I and Type II error rates. One must also consider using higher order terms in
the model and interaction terms.

c.

No, further exploration should be used. One should consider using higher order terms for the
variables (i.e. squared terms) and also interaction terms.
11.146 a.

Using MINITAB, a scattergram of the data is:

[Scatterplot of Rate vs Time: Rate (0.00 to 1.00) plotted against Time (0.0 to 3.0).]
It appears that as the time increases, the rate decreases but at a decreasing rate.


b.


Using MINITAB, the results are:


Regression Analysis: Rate versus Time, Tmsq

The regression equation is
Rate = 1.01 - 1.17 Time + 0.290 Tmsq

Predictor     Coef  SE Coef      T      P
Constant   1.00705  0.07899  12.75  0.000
Time       -1.1671   0.1219  -9.57  0.000
Tmsq       0.28975  0.03937   7.36  0.000

S = 0.101142   R-Sq = 92.7%   R-Sq(adj) = 91.4%

Analysis of Variance

Source          DF       SS       MS      F      P
Regression       2  1.54782  0.77391  75.65  0.000
Residual Error  12  0.12276  0.01023
Total           14  1.67057

Source  DF   Seq SS
Time     1  0.99365
Tmsq     1  0.55416

The least squares prediction equation is: ŷ = 1.007 − 1.1671x + .2898x²


c.

To determine if there is an upward curvature in the relationship between surface production rate and
time after turnoff, we test:

H0: β2 = 0
Ha: β2 > 0

From the printout, the test statistic is t = 7.36 and the p-value is p = 0.000/2 = 0.000. Since the p-value
is less than α (p = 0.000 < .05), H0 is rejected. There is sufficient evidence to indicate there is an
upward curvature in the relationship between surface production rate and time after turnoff at α = .05.
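Fitting such a quadratic model takes one call in most environments; a sketch with numpy on synthetic data generated from known coefficients (not the data of the exercise):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 3.0, 15)
y = 1.0 - 1.2 * x + 0.3 * x**2 + rng.normal(0, 0.05, x.size)

# polyfit returns coefficients highest power first: [b2, b1, b0]
b2, b1, b0 = np.polyfit(x, y, deg=2)
```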
11.147 a.

Using MINITAB, the scattergram is:


b.

Let x2 = 1 if I-35W, 0 if not

The complete second-order model would be:

E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x1x2 + β5x1²x2

c.

Using MINITAB, the printout is:


Regression Analysis

The regression equation is
y = 776 + 0.104 x1 - 0.000002 x1sq + 232 x2 - 0.0091 x1x2 + 0.000000 x1sqx2

Predictor         Coef       StDev      T      P
Constant         776.4       144.5   5.37  0.000
x1             0.10418     0.01388   7.50  0.000
x1sq       -0.00000223  0.00000033  -6.73  0.000
x2                 232        1094   0.21  0.833
x1x2          -0.00914     0.09829  -0.09  0.926
x1sqx2      0.00000027  0.00000220   0.12  0.903

S = 15.58   R-Sq = 97.2%   R-Sq(adj) = 97.0%

Analysis of Variance

Source          DF      SS      MS       F      P
Regression       5  555741  111148  457.73  0.000
Residual Error  66   16027     243
Total           71  571767

Source  DF  Seq SS
x1       1  254676
x1sq     1   21495
x2       1  279383
x1x2     1     183
x1sqx2   1       4

Unusual Observations
Obs     x1        y      Fit  StDev Fit  Residual  St Resid
 27  19062  1917.64  1953.27       2.51    -35.63    -2.32R
 48  26148  1982.02  1978.23       9.10      3.79     0.30 X
 53  26166  1972.92  1978.01       9.15     -5.09    -0.40 X
 55  20250  2120.00  2130.56      10.57    -10.56    -0.92 X
 56  20251  2140.00  2130.57      10.57      9.43     0.82 X
 63  24885  2160.02  2161.81      12.67     -1.79    -0.20 X

R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.


The fitted model is

ŷ = 776 + .104x1 − .000002x1² + 232x2 − .0091x1x2 + .00000027x1²x2.

To determine if the curvilinear relationship is different at the two locations, we test:

H0: β3 = β4 = β5 = 0
Ha: At least one of the coefficients is nonzero

In order to test this hypothesis, we must fit the reduced model

E(y) = β0 + β1x1 + β2x1²
Using MINITAB, the printout from fitting the reduced model is:
Regression Analysis

The regression equation is
y = 197 + 0.149 x1 - 0.000003 x1sq

Predictor         Coef       StDev      T      P
Constant         197.5       578.9   0.34  0.734
x1             0.14921     0.05551   2.69  0.009
x1sq       -0.00000295  0.00000132  -2.24  0.028

S = 65.45   R-Sq = 48.3%   R-Sq(adj) = 46.8%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       2  276171  138085  32.23  0.000
Residual Error  69  295597    4284
Total           71  571767

Source  DF  Seq SS
x1       1  254676
x1sq     1   21495

Unusual Observations
Obs     x1        y      Fit  StDev Fit  Residual  St Resid
 30  16691  1916.13  1865.11      23.39     51.02     0.83 X
 48  26148  1982.02  2079.68      33.08    -97.66    -1.73 X
 53  26166  1972.92  2079.59      33.31   -106.67    -1.89 X
 56  20251  2140.00  2007.88      10.43    132.12     2.04R

R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.

The fitted regression line is ŷ = 197 + .149x1 − .000003x1²

To determine if the curvilinear relationship is different at the two locations, we test:

H0: β3 = β4 = β5 = 0
Ha: At least one of the coefficients is nonzero


The test statistic is

F = [(SSER − SSEC)/(k − g)] / {SSEC/[n − (k + 1)]} = [(295,597 − 16,027)/(5 − 2)] / {16,027/[72 − (5 + 1)]} = 383.76

Since no α was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of
the F distribution with ν1 = (k − g) = (5 − 2) = 3 and ν2 = n − (k + 1) = 72 − (5 + 1) = 66. From
Table VIII, Appendix B, F.05 ≈ 2.76. The rejection region is F > 2.76.

Since the observed value of the test statistic falls in the rejection region (F = 383.76 > 2.76), H0 is
rejected. There is sufficient evidence to indicate the curvilinear relationship is different at the two
locations at α = .05.
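The partial F computation above reduces to a few lines once the two SSE values are read off the printouts:

```python
# SSEs from the reduced and complete model printouts above
sse_r, sse_c = 295_597, 16_027
k, g, n = 5, 2, 72   # complete model has k = 5 terms, reduced has g = 2

F = ((sse_r - sse_c) / (k - g)) / (sse_c / (n - (k + 1)))
```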
d.

Using MINITAB, the plot of the residual versus x1 is:

From this plot, we notice that there is only one point more than 2 standard deviations from the mean
and no points that are more than 3 standard deviations from the mean. Thus, there do not appear to
be any outliers. There is no curve to the residuals, so we have the appropriate model.
A stem-and-leaf display of the residuals is:
Character Stem-and-Leaf Display

Stem-and-leaf of RESI1   N = 72
Leaf Unit = 1.0

   1   -3  5
   1   -3
   2   -2  5
   5   -2  210
  13   -1  99877755
  23   -1  4443221100
  29   -0  996655
 (10)  -0  4432111000
  33    0  03344
  28    0  5678899
  21    1  11222244
  13    1  577
  10    2  0012334
   3    2  556


The stem-and-leaf display looks fairly mound-shaped, so it appears that the assumption of normality
is valid.
A plot of the residuals versus the fitted values is:

From this plot, there is no cone-shape. Thus, it appears that the assumption of constant variance is
valid.
11.148 a.

The first-order model for this problem is:

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4

b.

Using MINITAB, the printout is:


Regression Analysis

The regression equation is
y = 28.9 - 0.000000 x1 + 0.844 x2 - 0.360 x3 - 0.300 x4

Predictor         Coef       StDev      T      P
Constant         28.87       12.67   2.28  0.034
x1         -0.00000011  0.00000028  -0.38  0.708
x2              0.8440      0.2326   3.63  0.002
x3             -0.3600      0.1316  -2.74  0.013
x4             -0.3003      0.1834  -1.64  0.117

S = 5.989   R-Sq = 51.2%   R-Sq(adj) = 41.5%

Analysis of Variance

Source          DF       SS      MS     F      P
Regression       4   753.76  188.44  5.25  0.005
Residual Error  20   717.40   35.87
Total           24  1471.17

Source  DF  Seq SS
x1       1  129.96
x2       1  355.43
x3       1  172.19
x4       1   96.17

Unusual Observations
Obs        x1      y    Fit  StDev Fit  Residual  St Resid
  4  11940345  32.60  17.25       3.40     15.35     3.11R
 12   4905123  27.00  16.17       4.36     10.83     2.63R

R denotes an observation with a large standardized residual

The least squares prediction line is ŷ = 28.9 − .00000011x1 + .844x2 − .360x3 − .300x4.
To determine if the model is useful for predicting percentage of problem mortgages, we test:

H0: β1 = β2 = β3 = β4 = 0
Ha: At least one of the coefficients is nonzero

The test statistic is F = MS(Model)/MSE = 5.25

The p-value is p = .005. Since the p-value is less than α = .05 (p = .005 < .05), H0 is rejected. There
is sufficient evidence to indicate the model is useful in predicting percentage of problem mortgages
at α = .05.
c.

β̂0 = 28.9. This is merely the y-intercept. It has no other meaning in this problem.

β̂1 = −0.00000011. For each unit increase in total mortgage loans, the mean percentage of problem
mortgages is estimated to decrease by 0.00000011, holding percentage of invested assets, percentage
of commercial mortgages, and percentage of residential mortgages constant.

β̂2 = 0.844. For each unit increase in percentage of invested assets, the mean percentage of problem
mortgages is estimated to increase by 0.844, holding total mortgage loans, percentage of commercial
mortgages, and percentage of residential mortgages constant.

β̂3 = −0.360. For each unit increase in percentage of commercial mortgages, the mean percentage of
problem mortgages is estimated to decrease by 0.360, holding total mortgage loans, percentage of
invested assets, and percentage of residential mortgages constant.

β̂4 = −0.300. For each unit increase in percentage of residential mortgages, the mean percentage of
problem mortgages is estimated to decrease by 0.300, holding total mortgage loans, percentage of
invested assets, and percentage of commercial mortgages constant.

d.

Using MINITAB, the scattergrams are:

From the scattergrams, it appears that possibly x2 and x4 might warrant inclusion in the model as
second order terms.


e.

Using MINITAB, the printout is:


Regression Analysis

The regression equation is
y = 56.2 -0.000000 x1 - 1.82 x2 - 0.449 x3 + 0.223 x4 + 0.0771 x2sq - 0.0189 x4sq

Predictor         Coef       StDev      T      P
Constant         56.17       13.81   4.07  0.001
x1         -0.00000008  0.00000025  -0.31  0.760
x2             -1.8177      0.9935  -1.83  0.084
x3             -0.4494      0.1127  -3.99  0.001
x4              0.2227      0.6079   0.37  0.718
x2sq           0.07707     0.02665   2.89  0.010
x4sq          -0.01887     0.02334  -0.81  0.429

S = 4.956    R-Sq = 69.9%    R-Sq(adj) = 59.9%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       6  1029.03  171.51  6.98  0.001
Residual Error  18   442.13   24.56
Total           24  1471.17

Source  DF  Seq SS
x1       1  129.96
x2       1  355.43
x3       1  172.19
x4       1   96.17
x2sq     1  259.22
x4sq     1   16.05

Unusual Observations
Obs        x1       y     Fit  StDev Fit  Residual  St Resid
  4  11940345  32.600  26.777      4.038     5.823     2.03R
 10   5328142   7.500  16.105      2.599    -8.605    -2.04R
 12   4905123  27.000  16.559      3.607    10.441     3.07R
 20   2978628   3.200  11.759      2.679    -8.559    -2.05R

R denotes an observation with a large standardized residual

The least squares prediction equation is

ŷ = 56.2 - .00000008x1 - 1.82x2 - .449x3 + .223x4 + .0771x2² - .0189x4²

To determine if the model is useful for predicting percentage of problem mortgages, we test:

H0: β1 = β2 = β3 = β4 = β5 = β6 = 0
Ha: At least one of the coefficients is nonzero

The test statistic is F = MS(Model)/MSE = 6.98


The p-value is p = .001. Since the p-value is less than α = .05 (p = .001 < .05), H0 is rejected. There
is sufficient evidence to indicate the model is useful in predicting percentage of problem mortgages
at α = .05.
f.

To determine if one or more of the second-order terms of our model contribute information for the
prediction of the percentage of problem mortgages, we test:
H0: β5 = β6 = 0
Ha: At least one of the coefficients is nonzero

The test statistic is F = [(SSE_R - SSE_C)/(k - g)] / {SSE_C/[n - (k + 1)]}
= [(717.40 - 442.13)/(6 - 4)] / {442.13/[25 - (6 + 1)]} = 5.60

The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = (k - g) = (6 - 4)
= 2 and ν2 = n - (k + 1) = 25 - (6 + 1) = 18. From Table VIII, Appendix B, F.05 = 3.55. The
rejection region is F > 3.55.
Since the observed value of the test statistic falls in the rejection region (F = 5.60 > 3.55), H0 is
rejected. There is sufficient evidence to indicate one or more of the second-order terms of our model
contribute information for the prediction of the percentage of problem mortgages at α = .05.
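The nested-model F computation above can be checked numerically. A minimal sketch (the SSEs and degrees of freedom are the ones printed in the two MINITAB runs; scipy supplies the critical value):

```python
from scipy.stats import f

# Partial (nested-model) F test of H0: beta5 = beta6 = 0.
sse_r, sse_c = 717.40, 442.13   # SSE of the reduced and complete models
k, g, n = 6, 4, 25              # complete-model terms, reduced-model terms, n

F = ((sse_r - sse_c) / (k - g)) / (sse_c / (n - (k + 1)))
F_crit = f.ppf(0.95, k - g, n - (k + 1))   # upper-tail .05 critical value

print(round(F, 2), round(F_crit, 2))       # 5.6 3.55 -> reject H0
```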
11.149 a.

The model is:

E(y) = β0 + β1x1 + β2x2 + β3x3

where

y = market share
x1 = 1 if VH, 0 otherwise
x2 = 1 if H, 0 otherwise
x3 = 1 if M, 0 otherwise

We assume that the error terms (εi) or y's are normally distributed at each exposure level, with a
common variance. Also, we assume the εi's have a mean of 0 and are independent.

b.

No interaction terms were included because we have only one independent variable, exposure level.
Even though we have 3 xi's in the model, they are dummy variables and correspond to different levels
of the one independent variable.
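The dummy coding described here can be sketched in plain Python (treating no exposure as the base level is an assumption of this sketch; the base level gets all-zero dummies and is absorbed by the intercept):

```python
# One qualitative variable (exposure level) becomes three 0/1 dummies x1, x2, x3.
def dummies(level):
    return (int(level == "VH"), int(level == "H"), int(level == "M"))

rows = [dummies(lev) for lev in ("none", "M", "H", "VH")]
print(rows)   # [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)]
```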

c.

Using MINITAB, the output is:


Regression Analysis: y versus x1, x2, x3

The regression equation is
y = 10.2 + 0.500 x1 + 2.02 x2 + 0.683 x3

Predictor      Coef  SE Coef      T      P
Constant    10.2333   0.1084  94.41  0.000
x1           0.5000   0.1533   3.26  0.004
x2           2.0167   0.1533  13.16  0.000
x3           0.6833   0.1533   4.46  0.000

S = 0.2655    R-Sq = 90.4%    R-Sq(adj) = 89.0%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       3  13.3433  4.4478  63.09  0.000
Residual Error  20   1.4100  0.0705
Total           23  14.7533

Source  DF   Seq SS
x1       1   0.7200
x2       1  11.2225
x3       1   1.4008
The fitted model is ŷ = 10.2 + .5x1 + 2.02x2 + .683x3

where
x1 = 1 if VH, 0 otherwise
x2 = 1 if H, 0 otherwise
x3 = 1 if M, 0 otherwise
d.

To determine if the firm's expected market share differs for different levels of advertising exposure,
we test:

H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0, i = 1, 2, 3

The test statistic is F = 63.09.

The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k = 3 and ν2 = n
- (k + 1) = 24 - (3 + 1) = 20. From Table VIII, Appendix B, F.05 = 3.10. The rejection region is F >
3.10.
Since the observed value of the test statistic falls in the rejection region (F = 63.09 > 3.10), H0 is
rejected. There is sufficient evidence to indicate the firm's expected market share differs for different
levels of advertising exposure at α = .05.

11.150 a.


Using SAS, the output for fitting the model is:


DEP VARIABLE: Y
                        ANALYSIS OF VARIANCE

                          SUM OF        MEAN
SOURCE         DF        SQUARES      SQUARE    F VALUE   PROB>F
MODEL           3     2396.36410   798.78803    99.394    0.0001
ERROR          16      128.58590     8.03662
C TOTAL        19     2524.95000

ROOT MSE     2.83489    R-SQUARE   0.9491
DEP MEAN    23.05000    ADJ R-SQ   0.9395
C.V.        12.29889

                     PARAMETER ESTIMATES

              PARAMETER    STANDARD    T FOR H0:
VARIABLE DF    ESTIMATE       ERROR  PARAMETER=0  PROB > |T|
INTERCEP  1  -11.768830  3.05032146       -3.858      0.0014
X1        1   10.293782  1.43788129        7.159      0.0001
X1SQ      1   -0.417991  0.16132974       -2.591      0.0197
X2        1   13.244076  1.50325080        8.810      0.0001
The fitted model is: ŷ = -11.8 + 10.3x1 - .418x1² + 13.2x2

b.

To determine if the second-order term is necessary, we test:

H0: β2 = 0
Ha: β2 ≠ 0

The test statistic is t = -2.591.
The p-value is p = .0197. Since the p-value is less than α (p = .0197 < .05), H0 is rejected. There is
sufficient evidence to conclude that the second-order term in the model proposed by the operations
manager is necessary at α = .05.
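The SAS t statistic and p-value for the quadratic term can be reproduced from the printed estimate and standard error; a sketch (df = n - (k + 1) = 20 - 4 = 16):

```python
from scipy.stats import t

est, se, df = -0.417991, 0.16132974, 16   # from the SAS parameter estimates
t_stat = est / se
p_value = 2 * t.sf(abs(t_stat), df)       # two-sided p-value

print(round(t_stat, 3))   # -2.591, matching the SAS printout
```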


c.

The reduced model E(y) = β0 + β3x2 was fit to the data. The SAS output is:
DEP VARIABLE: Y
                        ANALYSIS OF VARIANCE

                          SUM OF         MEAN
SOURCE         DF        SQUARES       SQUARE    F VALUE   PROB>F
MODEL           1     1.25000000   1.25000000     0.009    0.9258
ERROR          18   2523.70000    140.20556
C TOTAL        19   2524.95000

ROOT MSE    11.84084    R-SQUARE    0.0005
DEP MEAN    23.05       ADJ R-SQ   -0.0550
C.V.        51.37025

                     PARAMETER ESTIMATES

              PARAMETER    STANDARD    T FOR H0:
VARIABLE DF    ESTIMATE       ERROR  PARAMETER=0  PROB > |T|
INTERCEP  1  23.30000000  3.74440323        6.223      0.0001
X2        1  -0.50000000  5.29538583       -0.094      0.9258

The fitted model is ŷ = 23.3 - .5x2.

The hypotheses are:

H0: β1 = β2 = 0
Ha: At least one βi ≠ 0, i = 1, 2

The test statistic is F = [(SSE_R - SSE_C)/(k - g)] / {SSE_C/[n - (k + 1)]}
= [(2523.7 - 128.586)/(3 - 1)] / {128.586/[20 - (3 + 1)]} = 1197.557/8.036625 = 149.01

The rejection region requires α = .10 in the upper tail of the F distribution with numerator df = k - g
= 3 - 1 = 2 and denominator df = n - (k + 1) = 20 - (3 + 1) = 16. From Table VIII, Appendix B,
F.10 = 2.67. The rejection region is F > 2.67.

Since the observed value of the test statistic falls in the rejection region (F = 149.01 > 2.67), H0 is
rejected. There is sufficient evidence to indicate the age of the machine contributes information to
the model at α = .10.
After adjusting for machine type, there is evidence that down time is related to age.

11.151 a.


β̂0 = -105 has no meaning because x3 = 0 is not in the observable range. β̂0 is simply the y-intercept.

β̂1 = 25. The estimated difference in mean attendance between weekends and weekdays is 25, holding
temperature and weather constant.

β̂2 = 100. The estimated difference in mean attendance between sunny and overcast days is 100, holding
type of day (weekend or weekday) and temperature constant.

β̂3 = 10. The estimated change in mean attendance for each additional degree of temperature is 10, with
type of day (weekend or weekday) and weather (sunny or overcast) held constant.

b.

To determine if the model is useful for predicting daily attendance, we test:

H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0, i = 1, 2, 3

The test statistic is F = (R²/k) / {(1 - R²)/[n - (k + 1)]} = (.65/3) / {(1 - .65)/[30 - (3 + 1)]} = 16.10

The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k = 3
and denominator df = n - (k + 1) = 30 - (3 + 1) = 26. From Table VIII, Appendix B, F.05 ≈ 2.98.
The rejection region is F > 2.98.

Since the observed value of the test statistic falls in the rejection region (F = 16.10 > 2.98), H0 is
rejected. There is sufficient evidence to indicate the model is useful for predicting daily attendance
at α = .05.
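Because only R² is available here, the F statistic is built from R² rather than from sums of squares. A quick numeric check:

```python
# Global F test from R^2: F = (R^2 / k) / [(1 - R^2) / (n - (k + 1))]
R2, n, k = 0.65, 30, 3
F = (R2 / k) / ((1 - R2) / (n - (k + 1)))
print(round(F, 2))   # 16.1
```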
c.

To determine if mean attendance increases on weekends, we test:

H0: β1 = 0
Ha: β1 > 0

The test statistic is t = (β̂1 - 0)/s_β̂1 = (25 - 0)/10 = 2.5

The rejection region requires α = .10 in the upper tail of the t distribution with df = n - (k + 1) = 30
- (3 + 1) = 26. From Table V, Appendix B, t.10 = 1.315. The rejection region is t > 1.315.

Since the observed value of the test statistic falls in the rejection region (t = 2.5 > 1.315), H0 is
rejected. There is sufficient evidence to indicate the mean attendance increases on weekends at α =
.10.
d.

Sunny: x2 = 1; Weekday: x1 = 0; Temperature 95°: x3 = 95

ŷ = -105 + 25(0) + 100(1) + 10(95) = 945

e.

We are 90% confident that the actual attendance for sunny weekdays with a temperature of 95 is
between 645 and 1245.


11.152 a.

For a sunny weekday, x1 = 0 and x2 = 1:

x3 = 70: ŷ = 250 - 700(0) + 100(1) + 5(70) + 15(0)(70) = 700
x3 = 80: ŷ = 250 - 700(0) + 100(1) + 5(80) + 15(0)(80) = 750
x3 = 90: ŷ = 800
x3 = 100: ŷ = 850

For a sunny weekend, x1 = 1 and x2 = 1:

x3 = 70: ŷ = 250 - 700(1) + 100(1) + 5(70) + 15(1)(70) = 1050
x3 = 80: ŷ = 250 - 700(1) + 100(1) + 5(80) + 15(1)(80) = 1250
x3 = 90: ŷ = 1450
x3 = 100: ŷ = 1650

For both sunny weekdays and sunny weekend days, as the predicted high temperature increases, so
does the predicted day's attendance. However, the predicted day's attendance on sunny weekend
days increases at a faster rate than on sunny weekdays. Also, the predicted day's attendance is higher
on sunny weekend days than on sunny weekdays.
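The two prediction lines computed above can be generated in a few lines (coefficients as stated in the exercise):

```python
# Attendance model with a weekend-by-temperature interaction term.
def y_hat(x1, x2, x3):
    return 250 - 700*x1 + 100*x2 + 5*x3 + 15*x1*x3

temps = (70, 80, 90, 100)
weekday = [y_hat(0, 1, t) for t in temps]   # slope 5 per degree
weekend = [y_hat(1, 1, t) for t in temps]   # slope 5 + 15 = 20 per degree
print(weekday, weekend)   # [700, 750, 800, 850] [1050, 1250, 1450, 1650]
```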
b.

To determine if the interaction term is a useful addition to the model, we test:

H0: β4 = 0
Ha: β4 ≠ 0

The test statistic is t = β̂4/s_β̂4 = 15/3 = 5

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n - (k + 1)
= 30 - (4 + 1) = 25. From Table V, Appendix B, t.025 = 2.06. The rejection region is t < -2.06 or t >
2.06.

Since the observed value of the test statistic falls in the rejection region (t = 5 > 2.06), H0 is rejected.
There is sufficient evidence to indicate the interaction term is a useful addition to the model at α =
.05.

c.


For x1 = 0, x2 = 1, and x3 = 95,

ŷ = 250 - 700(0) + 100(1) + 5(95) + 15(0)(95) = 825

d.

The width of the interval in Exercise 11.151e is 1245 - 645 = 600, while the width is
850 - 800 = 50 for the model containing the interaction term. The smaller the width of the interval,
the smaller the variance. This implies that the interaction term is quite useful in predicting daily
attendance. It has reduced the unexplained error.

e.

Because an interaction term including x1 is in the model, the coefficient corresponding to x1 must be
interpreted with caution. For all observed values of x3 (temperature), the interaction contribution
15x3 is greater than 700, so the estimated weekend effect, -700 + 15x3, is positive.

11.153 a.

E(y) = β0 + β1x1 + β2x6 + β3x7

where x6 = 1 if condition is good, 0 otherwise
      x7 = 1 if condition is fair, 0 otherwise

b.

The model specified in part a seems appropriate. The points for E, F, and G cluster around three
parallel lines.


c.

Using MINITAB, the output is


The regression equation is
y = 188875 + 15617 x1 - 103046 x6 - 152487 x7

Predictor     Coef  StDev      T      P
Constant    188875  28588   6.61  0.000
x1           15617   1066  14.66  0.000
x6         -103046  31784  -3.24  0.004
x7         -152487  39157  -3.89  0.001

S = 64624    R-Sq = 91.8%    R-Sq(adj) = 90.7%

Analysis of Variance
Source          DF           SS           MS      F      P
Regression       3  9.86170E+11  3.28723E+11  78.71  0.000
Residual Error  21  87700442851   4176211564
Total           24  1.07387E+12

Source  DF       Seq SS
x1       1  9.15776E+11
x6       1   7061463149
x7       1  63332198206

Unusual Observations
Obs    x1       y      Fit  StDev Fit  Residual  St Resid
 10  62.0  950000  1054078      53911   -104078    -2.92RX
 23  14.0  573200   407512      26670    165688     2.81R

R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.

The fitted model is ŷ = 188,875 + 15,617x1 - 103,046x6 - 152,487x7

For excellent condition, ŷ = 188,875 + 15,617x1
For good condition, ŷ = 85,829 + 15,617x1
For fair condition, ŷ = 36,388 + 15,617x1


d.


e.


We must first fit a reduced model with just x1, number of apartments. Using MINITAB, the output
is:
The regression equation is
y = 101786 + 15525 x1

Predictor    Coef  StDev      T      P
Constant   101786  23291   4.37  0.000
x1          15525   1345  11.54  0.000

S = 82908    R-Sq = 85.3%    R-Sq(adj) = 84.6%

Analysis of Variance
Source          DF           SS           MS       F      P
Regression       1  9.15776E+11  9.15776E+11  133.23  0.000
Residual Error  23  1.58094E+11   6873656705
Total           24  1.07387E+12

Unusual Observations
Obs    x1       y      Fit  StDev Fit  Residual  St Resid
  4  26.0  676200   505433      24930    170757     2.16R
 10  62.0  950000  1064353      69058   -114353    -2.49RX
 23  14.0  573200   319140      16765    254060     3.13R

R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.

The fitted model is ŷ = 101,786 + 15,525x1.

To determine if the relationship between sale price and number of units differs depending on the
physical condition of the apartments, we test:

H0: β2 = β3 = 0
Ha: At least one βi ≠ 0, i = 2, 3

The test statistic is:

F = [(SSE_R - SSE_C)/(k - g)] / {SSE_C/[n - (k + 1)]}
= [(1.58094 × 10¹¹ - 87,700,442,851)/2] / 4,176,211,564 = 8.43

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k - g = 3 - 1 = 2
and ν2 = n - (k + 1) = 25 - (3 + 1) = 21. From Table VIII, Appendix B, F.05 = 3.47. The rejection
region is F > 3.47.

Since the observed value of the test statistic falls in the rejection region (F = 8.43 > 3.47), H0 is
rejected. There is evidence to indicate that the relationship between sale price and number of units
differs depending on the physical condition of the apartments at α = .05.


f.

We will look for high pairwise correlations.

        x1      x2      x3      x4      x5      x6
x2  -0.014
x3   0.800  -0.188
x4   0.224  -0.363   0.166
x5   0.878   0.027   0.673   0.089
x6   0.175  -0.447   0.271   0.112   0.020
x7  -0.128   0.392  -0.118   0.050  -0.238  -0.564

When highly correlated independent variables are present in a regression model, the results can be
confusing. The researchers may want to include only one variable from each highly correlated pair.
This may be the case for the pairs x1 and x3, x1 and x5, and x3 and x5.
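The screen applied above can be automated; a sketch using the printed correlations (the .65 cutoff is an assumption of this sketch, chosen to isolate the three pairs singled out in the text):

```python
# Flag predictor pairs whose pairwise correlation is large in magnitude.
corr = {("x1", "x3"): 0.800, ("x1", "x5"): 0.878, ("x3", "x5"): 0.673,
        ("x1", "x2"): -0.014, ("x2", "x7"): 0.392, ("x6", "x7"): -0.564}
flagged = sorted(pair for pair, r in corr.items() if abs(r) > 0.65)
print(flagged)   # [('x1', 'x3'), ('x1', 'x5'), ('x3', 'x5')]
```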
g.

Use the following plots to check the assumptions on ε:

residuals vs x1
residuals vs x2
residuals vs x3
residuals vs x4
residuals vs x5
residuals vs predicted values
frequency distribution of the standardized residuals

From the plots of the residuals, there do not appear to be any outliers - no standardized residuals are
larger than 2.38 in magnitude. In all the plots of the residuals vs xi, there is no trend that would
indicate non-constant variance (no funnel shape). In addition, there is no U or upside-down U shape
that would indicate that any of the variables should be squared. In the histogram of the residuals, the
plot is fairly mound-shaped, which would indicate the residuals are approximately normally
distributed. All of the assumptions appear to be met.


Residuals Versus x1 (response is y)
[plot]

Residuals Versus x2 (response is y)
[plot]

Residuals Versus x3 (response is y)
[plot]

Residuals Versus x4 (response is y)
[plot]

Residuals Versus x5 (response is y)
[plot]

Residuals Versus the Predicted Values (response is y)
[plot]

Histogram of the Residuals (response is y)
[plot]

11.154 Let x1 = Length of operation and let x2 = 1 if C, 0 otherwise.

To allow for the relationship between Drop in Light Output and Length of Operation to be different for the
two different Bulb Surfaces, we will fit the model: E(y) = β0 + β1x1 + β2x2 + β3x1x2.

Using MINITAB, the results of fitting this model are:
Regression Analysis: DROP versus x1, x2, x1x2

The regression equation is
DROP = 1.46 + 0.00473 x1 + 5.39 x2 + 0.00991 x1x2

Predictor      Coef   SE Coef     T      P
Constant      1.464     2.151  0.68  0.512
x1         0.004732  0.001492  3.17  0.010
x2            5.393     3.042  1.77  0.107
x1x2       0.009911  0.002109  4.70  0.001

S = 3.15719    R-Sq = 95.5%    R-Sq(adj) = 94.1%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       3  2106.68  702.23  70.45  0.000
Residual Error  10    99.68    9.97
Total           13  2206.36

Source  DF   Seq SS
x1       1   840.88
x2       1  1045.79
x1x2     1   220.02

The fitted regression line is ŷ = 1.464 + .0047x1 + 5.393x2 + .0099x1x2

To determine if the model is adequate for predicting Drop in Light Output, we test:

H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0


From the printout, the test statistic is F = 70.45 and the p-value is p = 0.000. Since the p-value is so small,
H0 is rejected. There is sufficient evidence to indicate the model is adequate for predicting Drop in Light
Output at any reasonable value of α.

For this model, R² = 95.5%. 95.5% of the total variability of the Drop in Light Output values is explained
by the model containing Length of Operation, Bulb Surface, and the interaction of Bulb Surface and Length
of Operation.
To determine if the interaction between Bulb Surface and Length of Operation is significant, we test:
H0: β3 = 0
Ha: β3 ≠ 0

From the printout, the test statistic is t = 4.70 and the p-value is p = 0.001. Since the p-value is so small, H0
is rejected. There is sufficient evidence to indicate Bulb Surface and Length of Operation interact at any
reasonable value of α.
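The significant interaction means each bulb surface gets its own slope; a sketch built from the printed coefficients:

```python
# DROP = b0 + b1*hours + b2*coated + b3*hours*coated (coated = 0 or 1)
b0, b1, b2, b3 = 1.464, 0.004732, 5.393, 0.009911

def drop(hours, coated):
    return b0 + b1*hours + b2*coated + b3*hours*coated

slope_plain = b1         # 0.004732 per hour of operation
slope_coated = b1 + b3   # 0.014643 per hour: roughly three times steeper
print(round(drop(1000, 0), 3), round(drop(1000, 1), 3))
```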
Using MINITAB, the residual plots are:
Residual Plots for DROP
[four panels: Normal Probability Plot of the Residuals, Residuals Versus the Fitted Values, Histogram of
the Residuals, Residuals Versus the Order of the Data]

From the histogram of the residuals, the residuals look somewhat mound-shaped. In addition, the normal
probability plot looks to be a fairly straight line. Thus, the assumption of normal errors appears to be valid.
From the plot of the residuals versus the fitted values, there is no funnel shape. It does not appear that the
error terms increase or decrease as the fitted values increase. Thus, the assumption of constant variance
appears to be valid.
It appears that the model is a pretty good model for the prediction of the Drop in Light Output.


11.155 a.


To determine whether the complete model contributes information for the prediction of y, we test:

H0: β1 = β2 = β3 = β4 = β5 = 0
Ha: At least one of the βi's is not 0, i = 1, 2, 3, 4, 5

MSR = SS(Model)/k = 4,911.56/5 = 982.31
MSE = SSE/[n - (k + 1)] = 1,830.44/[40 - (5 + 1)] = 53.84

b.

The test statistic is F = MSR/MSE = 982.31/53.84 = 18.24

The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k = 5
and denominator df = n - (k + 1) = 40 - (5 + 1) = 34. From Table VIII, Appendix B, F.05 ≈ 2.53.
The rejection region is F > 2.53.

Since the observed value of the test statistic falls in the rejection region (F = 18.24 > 2.53), H0 is
rejected. There is sufficient evidence to indicate that the complete model contributes information for
the prediction of y at α = .05.
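A numeric check of the quantities above (SS(Model) and SSE as given in the exercise):

```python
# Global F statistic assembled from the ANOVA sums of squares.
ss_model, sse, n, k = 4911.56, 1830.44, 40, 5
msr = ss_model / k            # mean square for the model
mse = sse / (n - (k + 1))     # mean square error
F = msr / mse
print(round(msr, 2), round(mse, 2), round(F, 2))
```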
c.

To determine whether a second-order model contributes more information than a first-order model
for the prediction of y, we test:

H0: β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 3, 4, 5

d.

The test statistic is F = [(SSE_R - SSE_C)/(k - g)] / {SSE_C/[n - (k + 1)]}
= [(3197.16 - 1830.44)/(5 - 2)] / {1830.44/[40 - (5 + 1)]} = 455.5733/53.8365 = 8.46

The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k - g
= 3 and denominator df = n - (k + 1) = 40 - (5 + 1) = 34. From Table VIII, Appendix B, F.05 ≈ 2.92.
The rejection region is F > 2.92.

Since the observed value of the test statistic falls in the rejection region (F = 8.46 > 2.92), H0 is
rejected. There is sufficient evidence to indicate the second-order model contributes more
information than a first-order model for the prediction of y at α = .05.
e.

The second-order model, based on the test result in part d.

11.156 a.

The complete second-order model is:

E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x1x2 + β5x1²x2

where x1 = age
x2 = 1 if current, 0 otherwise


b.

To determine if the quadratic terms are important, we test:

H0: β2 = β5 = 0

c.

To determine if the interaction terms are important, we test:

H0: β4 = β5 = 0

d.

From MINITAB, the outputs from fitting the three models are:
Regression Analysis: Value versus Age, AgeSq, Status, AgeSt, AgeSqSt

The regression equation is
Value = 83 - 5.7 Age + 0.236 AgeSq - 62 Status + 5.4 AgeSt - 0.234 AgeSqSt

Predictor     Coef  SE Coef      T      P
Constant      83.4    316.3   0.26  0.793
Age          -5.74    18.68  -0.31  0.760
AgeSq       0.2361   0.2549   0.93  0.359
Status       -62.1    354.8  -0.18  0.862
AgeSt         5.36    24.81   0.22  0.830
AgeSqSt    -0.2337   0.4080  -0.57  0.570

S = 286.8    R-Sq = 24.7%    R-Sq(adj) = 16.1%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       5  1186549  237310  2.89  0.024
Residual Error  44  3618994   82250
Total           49  4805542

Source   DF  Seq SS
Age       1  865746
AgeSq     1  138871
Status    1   77594
AgeSt     1   77342
AgeSqSt   1   26996

Regression Analysis: Value versus Age, Status, AgeSt

The regression equation is
Value = - 176 + 11.2 Age + 196 Status - 11.4 AgeSt

Predictor     Coef  SE Coef      T      P
Constant    -176.1    145.0  -1.21  0.231
Age         11.166    3.902   2.86  0.006
Status       196.5    178.9   1.10  0.278
AgeSt      -11.432    6.763  -1.69  0.098

S = 283.2    R-Sq = 23.2%    R-Sq(adj) = 18.2%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       3  1116017  372006  4.64  0.006
Residual Error  46  3689526   80207
Total           49  4805543

Source  DF  Seq SS
Age      1  865746
Status   1   21097
AgeSt    1  229174

Regression Analysis: Value versus Age, AgeSq, Status

The regression equation is
Value = 166 - 8.8 Age + 0.253 AgeSq - 106 Status

Predictor    Coef  SE Coef      T      P
Constant    165.8    182.7   0.91  0.369
Age         -8.81    10.89  -0.81  0.423
AgeSq      0.2535   0.1632   1.55  0.127
Status     -105.6    107.9  -0.98  0.333

S = 284.5    R-Sq = 22.5%    R-Sq(adj) = 17.5%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       3  1082210  360737  4.46  0.008
Residual Error  46  3723332   80942
Total           49  4805542

Source  DF  Seq SS
Age      1  865746
AgeSq    1  138871
Status   1   77594

Test for part b:

The test statistic is:

F = [(SSE_R - SSE_C)/(k - g)] / {SSE_C/[n - (k + 1)]} = [(3,689,526 - 3,618,994)/2] / 82,250 = .429

Since no α is given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the
F distribution with ν1 = 2 numerator degrees of freedom and ν2 = 44 denominator degrees of
freedom. From Table VIII, Appendix B, F.05 ≈ 3.23. The rejection region is F > 3.23.

Since the observed value of the test statistic does not fall in the rejection region (F = .429 ≯ 3.23),
H0 is not rejected. There is insufficient evidence to indicate the quadratic terms are important for
predicting market value at α = .05.

Test for part c:

The test statistic is:

F = [(SSE_R - SSE_C)/(k - g)] / {SSE_C/[n - (k + 1)]} = [(3,723,332 - 3,618,994)/(5 - 3)] / 82,250 = .634

The rejection region is the same as in the previous test. Reject H0 if F > 3.23.

Since the observed value of the test statistic does not fall in the rejection region
(F = .634 ≯ 3.23), H0 is not rejected. There is insufficient evidence to indicate the interaction terms
are important for predicting market value at α = .05.
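Both partial F statistics can be computed from the three SSEs above in one pass; a sketch (the complete-model MSE is 82,250 from the first printout):

```python
# Nested-model F statistics against the complete second-order model.
sse_complete, mse_complete = 3618994, 82250
reduced = {"quadratic terms (part b)": (3689526, 2),    # (SSE_reduced, numerator df)
           "interaction terms (part c)": (3723332, 2)}

F = {name: round(((sse_r - sse_complete) / df) / mse_complete, 3)
     for name, (sse_r, df) in reduced.items()}
print(F)   # both well below F.05 = 3.23, so neither H0 is rejected
```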


11.157 First, we will fit the first-order model: E(y) = β0 + β1x1 + β2x2


Using MINITAB, the results are:
Regression Analysis: y versus x1, x2

The regression equation is
y = - 1.57 + 0.0257 x1 + 0.0336 x2

Predictor      Coef   SE Coef      T      P
Constant    -1.5705    0.4937  -3.18  0.003
x1         0.025732  0.004024   6.40  0.000
x2         0.033615  0.004928   6.82  0.000

S = 0.4023    R-Sq = 68.1%    R-Sq(adj) = 66.4%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       2  12.7859  6.3930  39.51  0.000
Residual Error  37   5.9876  0.1618
Total           39  18.7735

Source  DF  Seq SS
x1       1  5.2549
x2       1  7.5311

Unusual Observations
Obs   x1       y     Fit  SE Fit  Residual  St Resid
  4  100  1.5400  2.6498  0.1699   -1.1098    -3.04R
 32   39  1.2200  2.1558  0.1483   -0.9358    -2.50R

R denotes an observation with a large standardized residual

To determine if the model is useful in the prediction of y (GPA), we test:

H0: β1 = β2 = 0
Ha: At least one βi ≠ 0, i = 1, 2
The test statistic is F = 39.51 and the p-value is p = 0.000. Since the p-value is so small, H0 is rejected for
any reasonable value of α. There is sufficient evidence to indicate at least one of the variables Verbal score
or Mathematics score is useful in predicting GPA.
To determine if Verbal score is useful in predicting GPA, controlling for Mathematics score, we test:
H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = 6.40 and the p-value is p = 0.000. Since the p-value is so small, H0 is rejected for
any reasonable value of α. There is sufficient evidence to indicate Verbal score is useful in predicting
GPA, controlling for Mathematics score.
To determine if Mathematics score is useful in predicting GPA, controlling for Verbal score, we test:
H0: β2 = 0
Ha: β2 ≠ 0

The test statistic is t = 6.82 and the p-value is p = 0.000. Since the p-value is so small, H0 is rejected for
any reasonable value of α. There is sufficient evidence to indicate Mathematics score is useful in
predicting GPA, controlling for Verbal score.


Thus, both terms in the model are significant. The R-squared value is R² = .681.
This indicates that 68.1% of the sample variance of the GPAs is explained by the model.
Now, we need to check the residuals. From MINITAB, the plots are:

Residual Plots for y
[four panels: Normal Probability Plot of the Residuals, Residuals Versus the Fitted Values, Histogram of
the Residuals, Residuals Versus the Order of the Data]

Residuals Versus x1 (response is y)
[plot]

Residuals Versus x2 (response is y)
[plot]

From the normal probability plot, it appears that the assumption of normality is valid. The points are very
close to a straight line except for the first 2 points. The histogram of the residuals implies that the residuals
are slightly skewed to the left. I would still consider the assumption to be valid. The plot of the residuals
versus ŷ indicates a random spread of the residuals between the two bands. This indicates that the
assumption of equal variances is probably valid. The plot of the residuals versus x1 indicates that the
relationship between GPA and Verbal score may not be linear, but quadratic, because the points form a
somewhat upside-down U shape. The plot of the residuals versus x2 indicates that the relationship between
GPA and Mathematics score may or may not be quadratic.

Since the plots indicate a possible 2nd-order model and the R² value is not real large, we will fit a complete
2nd-order model:

E(y) = β0 + β1x1 + β2x2 + β3x1² + β4x2² + β5x1x2


Using MINITAB, the results are:


Regression Analysis: y versus x1, x2, x1sq, x2sq, x1x2

The regression equation is
y = - 9.92 + 0.167 x1 + 0.138 x2 - 0.00111 x1sq - 0.000843 x2sq + 0.000241 x1x2

Predictor        Coef    SE Coef      T      P
Constant       -9.917      1.354  -7.32  0.000
x1            0.16681    0.02124   7.85  0.000
x2            0.13760    0.02673   5.15  0.000
x1sq       -0.0011082  0.0001173  -9.45  0.000
x2sq       -0.0008433  0.0001594  -5.29  0.000
x1x2        0.0002411  0.0001440   1.67  0.103

S = 0.187142    R-Sq = 93.7%    R-Sq(adj) = 92.7%

Analysis of Variance
Source          DF       SS      MS       F      P
Regression       5  17.5827  3.5165  100.41  0.000
Residual Error  34   1.1908  0.0350
Total           39  18.7735

Source  DF  Seq SS
x1       1  5.2549
x2       1  7.5311
x1sq     1  3.6434
x2sq     1  1.0552
x1x2     1  0.0982

Unusual Observations
Obs   x1       y     Fit  SE Fit  Residual  St Resid
  2   68  2.8900  3.2820  0.1002   -0.3920    -2.48R
  4  100  1.5400  1.5806  0.1404   -0.0406    -0.33 X
 34   70  3.8200  3.3940  0.0753    0.4260     2.49R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.

To determine if the interaction between Verbal score and Mathematics score is useful in the prediction of y
(GPA), we test:

H0: β5 = 0
Ha: β5 ≠ 0

The test statistic is t = 1.67 and the p-value is p = 0.103. Since the p-value is not small, H0 is not rejected
for any value of α < .10. There is insufficient evidence to indicate the interaction between Verbal score
and Mathematics score is useful in predicting GPA.

Now, we will fit a model without the interaction term, but including the squared terms:

E(y) = β0 + β1x1 + β2x2 + β3x1² + β4x2²


Using MINITAB, the results are:


Regression Analysis: y versus x1, x2, x1sq, x2sq

The regression equation is
y = - 11.5 + 0.189 x1 + 0.159 x2 - 0.00114 x1sq - 0.000871 x2sq

Predictor        Coef     SE Coef       T      P
Constant      -11.458       1.019  -11.24  0.000
x1            0.18887     0.01709   11.05  0.000
x2            0.15874     0.02417    6.57  0.000
x1sq       -0.0011412   0.0001186   -9.62  0.000
x2sq       -0.0008705   0.0001626   -5.35  0.000

S = 0.191905   R-Sq = 93.1%   R-Sq(adj) = 92.3%

Analysis of Variance
Source          DF       SS      MS       F      P
Regression       4  17.4845  4.3711  118.69  0.000
Residual Error  35   1.2890  0.0368
Total           39  18.7735

Source  DF  Seq SS
x1       1  5.2549
x2       1  7.5311
x1sq     1  3.6434
x2sq     1  1.0552

Unusual Observations
Obs   x1       y     Fit  SE Fit  Residual  St Resid
  2   68  2.8900  3.2921  0.1025   -0.4021    -2.48R
  4  100  1.5400  1.7059  0.1219   -0.1659    -1.12 X
 32   39  1.2200  1.3190  0.1240   -0.0990    -0.68 X
 34   70  3.8200  3.3954  0.0772    0.4246     2.42R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.
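The global F statistic on this printout is just the ratio of the mean squares, which can be verified directly from the sums of squares and degrees of freedom in the ANOVA table:

```python
# Verify the global F statistic for the reduced (no-interaction) model.
ss_reg, ss_err = 17.4845, 1.2890  # from the ANOVA table above
df_reg, df_err = 4, 35            # k = 4 terms; n - (k + 1) = 40 - 5

ms_reg = ss_reg / df_reg          # ~4.3711
mse = ss_err / df_err             # ~0.0368
f_stat = ms_reg / mse

print(round(f_stat, 2))           # ~118.69, matching F on the printout
```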

To determine if the relationship between Verbal score and GPA is quadratic, controlling for Mathematics score, we test:

H0: β3 = 0
Ha: β3 ≠ 0

The test statistic is t = −9.62 and the p-value is p = 0.000. Since the p-value is so small, H0 is rejected for any reasonable value of α. There is sufficient evidence to indicate the relationship between Verbal score and GPA is quadratic, controlling for Mathematics score.
To determine if the relationship between Mathematics score and GPA is quadratic, controlling for Verbal score, we test:

H0: β4 = 0
Ha: β4 ≠ 0


The test statistic is t = −5.35 and the p-value is p = 0.000. Since the p-value is so small, H0 is rejected for any reasonable value of α. There is sufficient evidence to indicate the relationship between Mathematics score and GPA is quadratic, controlling for Verbal score.

Thus, both quadratic terms in the model are significant. The R-squared value is R² = .931. This indicates that 93.1% of the sample variance of the GPAs is explained by the model.
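Both R-Sq = 93.1% and R-Sq(adj) = 92.3% on the printout follow from SSE and SS(Total) by the usual formulas, as this short check shows:

```python
# R-squared and adjusted R-squared for the quadratic model from the
# printout's SSE and SS(Total); n = 40 observations, k = 4 terms.
sse, ss_total = 1.2890, 18.7735
n, k = 40, 4

r_sq = 1 - sse / ss_total
r_sq_adj = 1 - (1 - r_sq) * (n - 1) / (n - (k + 1))

print(round(r_sq, 3))      # ~0.931 -> R-Sq = 93.1%
print(round(r_sq_adj, 3))  # ~0.923 -> R-Sq(adj) = 92.3%
```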
Now, we need to check the residuals. From MINITAB, the plots are:

[MINITAB figure: Residual Plots for y — four panels: Normal Probability Plot of the Residuals, Residuals Versus the Fitted Values, Histogram of the Residuals, Residuals Versus the Order of the Data]

[MINITAB figure: Residuals Versus x1 (response is y)]

[MINITAB figure: Residuals Versus x2 (response is y)]

From the normal probability plot, it appears that the assumption of normality is valid. The points are very
close to a straight line. The histogram of the residuals also implies that the residuals are approximately
normal. The plot of the residuals versus y-hat indicates a random spread of the residuals between the two
bands. This indicates that the assumption of equal variances is probably valid. The plot of the residuals
versus x1 indicates a random spread of the residuals between the two bands. This indicates that the order of
x1 (2nd) is appropriate. The plot of the residuals versus x2 indicates a random spread of the residuals
between the two bands. This indicates that the order of x2 (2nd) is appropriate.
The model appears to be pretty good. All terms in the model are significant, the residual analysis indicates
the assumptions are met and the R-squared value is fairly close to 1. The fitted model is
ŷ = −11.5 + 0.189x1 + 0.159x2 − 0.00114x1² − 0.000871x2²
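The fitted equation is easy to turn into a small prediction function. The scores x1 = 70 and x2 = 80 below are hypothetical, chosen only to illustrate a prediction inside the observed score range; the coefficients are the full-precision values from the printout:

```python
# Predict GPA from the fitted quadratic model (full-precision coefficients).
def predict_gpa(x1, x2):
    # x1 = Verbal score, x2 = Mathematics score
    return (-11.458 + 0.18887 * x1 + 0.15874 * x2
            - 0.0011412 * x1 ** 2 - 0.0008705 * x2 ** 2)

# Hypothetical student: Verbal 70, Mathematics 80
print(round(predict_gpa(70, 80), 2))  # ~3.30
```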
