1. A company sets different prices for a particular stereo system in eight different
regions of the country. The accompanying table shows the numbers of units sold and
the corresponding prices (in hundreds of dollars).
SALES 420 380 350 400 440 380 450 420
PRICE 5.5 6.0 6.5 6.0 5.0 6.5 4.5 5.0
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.937137027
R Square 0.878225806
Adjusted R Square 0.857930108
Standard Error 12.74227575
Observations 8
ANOVA
df SS MS F Significance F
Regression 1 7025.806452 7025.806452 43.27152318 0.000592135
Residual 6 974.1935484 162.3655914
Total 7 8000
[Figure: scatter plot of Units Sold versus Price ($100)]
(b) What effect would you expect a $100 increase in price to have on sales?
A $100 increase in the price will be expected to cause a 42.58 unit drop in sales.
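The slope quoted above can be reproduced from the table in the problem statement. The following is a minimal sketch in plain Python; the function name `fit_simple_regression` is illustrative and not part of the original solution, which used Excel.

```python
# Data from Exercise 1: price in hundreds of dollars, sales in units.
price = [5.5, 6.0, 6.5, 6.0, 5.0, 6.5, 4.5, 5.0]
sales = [420, 380, 350, 400, 440, 380, 450, 420]


def fit_simple_regression(x, y):
    """Return (intercept, slope, r_squared) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


intercept, slope, r_squared = fit_simple_regression(price, sales)
print(f"slope = {slope:.2f} units per $100 of price")
print(f"intercept = {intercept:.2f}, R^2 = {r_squared:.4f}")
```

The fitted slope is about -42.58 and R Square matches the 0.8782 shown in the summary output.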
Regression Statistics
Multiple R 0.733725713
R Square 0.538353422
Adjusted R Square 0.518281832
Standard Error 0.642482917
Observations 25
ANOVA
df SS MS F Significance F
Regression 1 11.07156114 11.07156114 26.82166251 2.99579E-05
Residual 23 9.494038861 0.412784298
Total 24 20.5656
Regression Statistics
Multiple R 0.805179413
R Square 0.648313886
Adjusted R Square 0.609237652
Standard Error 0.207325489
Observations 11
ANOVA
df SS MS F Significance F
Regression 1 0.713145275 0.713145275 16.5910019 0.002786228
Residual 9 0.386854725 0.042983858
Total 10 1.1
4. Refer to the data of Exercise 2. Test against a two-sided alternative the null hypothesis
that mutual fund losses on Friday, November 13, 1989, did not depend linearly on
previous gains in 1989.
We perform this test by looking at the p-value associated with the regression coefficient
for the variable "Gains" (see regression output; this p-value is 2.996E-05, or 0.00002996).
The null hypothesis that the "Gains" coefficient is zero can be rejected with a very small
probability of Type I error (α ≈ 0.00003).
\[ t = \frac{0.2875}{0.04566} = 6.2965 \]
[Figure: scatter plot of Sales ($100) versus Shelf Space (feet)]
ANOVA
df SS MS F Significance F
Regression 1.0000 2.0535 2.0535 21.6386 0.0009
Residual 10.0000 0.9490 0.0949
Total 11.0000 3.0025
\( \hat{\beta}_0 = 1.45 \)
\( \hat{\beta}_1 = 0.074 \)
(c) Interpret the meaning of the slope \( \hat{\beta}_1 \) in this problem.
For every additional foot of shelf space, we expect to see an increase in sales of $7.40 (in
other words, 0.0740 * $100).
(d) Predict the average weekly sales (in hundreds of dollars) of pet food for stores with 8 feet
of shelf space for pet food.
\[ \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X = 1.45 + 0.074(8) = 2.042 \]
In dollars: 2.042 * $100 = $204.20
ANOVA
df SS MS F Significance F
Regression 1.0000 1.5360 1.5360 15.8242 0.0026
Residual 10.0000 0.9707 0.0971
Total 11.0000 2.5067
[Figure: scatter plot with Video Sales (units) on the vertical axis]
ANOVA
df SS MS F Significance F
Regression 1.0000 171499.7780 171499.7780 74.8505 0.0000
Residual 28.0000 64154.4244 2291.2294
Total 29.0000 235654.2023
9. An agent for a residential real estate company in a large city would like to be able to predict
the monthly rental costs for apartments based on the size of apartment as defined by square
footage. A sample of 25 apartments in a particular residential neighborhood was selected and the
information gathered revealed the following:
APARTMENT   MONTHLY RENT ($)   SIZE (SQUARE FEET)
1           950                850
2           1,600              1,450
3           1,200              1,085
4           1,500              1,232
5           950                718
6           1,700              1,485
7           1,650              1,136
8           935                726
9           875                700
10          1,150              956
11          1,400              1,100
12          1,650              1,285
13          2,300              1,985
14          1,800              1,369
15          1,400              1,175
16          1,450              1,225
17          1,100              1,245
18          1,700              1,259
19          1,200              1,150
20          1,150              896
21          1,600              1,361
22          1,650              1,040
23          1,200              755
24          800                1,000
25          1,750              1,200
(a) Set up a scatter diagram.
[Figure: Scatter Diagram - Rent vs. Square Feet; horizontal axis: Square Feet]
ANOVA
df SS MS F Significance F
Regression 1 2268776.5453 2268776.5453 59.9138 0.0000
Residual 23 870949.4547 37867.3676
Total 24 3139726.0000
\( \hat{\beta}_0 = 177.12 \)
\( \hat{\beta}_1 = 1.0651 \)
(c) State the regression equation.
\[ \hat{Y} = 177.12 + 1.0651 X \]
(e) Predict the average monthly rental cost for an apartment that has 1,000 square feet.
\[ \hat{Y} = 177.12 + 1.0651 X = 177.12 + 1.0651(1000) \approx \$1{,}242 \]
(g) Your friends Jim and Jennifer are considering signing a lease for an apartment in this
residential neighborhood. They are trying to decide between two apartments, one
with 1,000 square feet for a monthly rent of $1,250 and the other with 1,200 square
feet for a monthly rent of $1,425. What would you recommend to them? Why?
We can use our model to see whether these apartments are a relatively good deal for the
money.
1,000 square foot apartment:
\[ \hat{Y} = 177.12 + 1.0651 X = 177.12 + 1.0651(1000) \approx \$1{,}242 \]
($1,250 is a little more expensive than we would expect.)
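The same comparison can be carried out for both apartments at once. This is a small sketch that simply plugs the two sizes from part (g) into the fitted equation; the helper name `predicted_rent` is an assumption made for illustration.

```python
# Coefficients reported in the solution to Problem 9
# (rent in dollars, size in square feet).
B0 = 177.12
B1 = 1.0651


def predicted_rent(square_feet):
    """Rent predicted by the fitted line for an apartment of the given size."""
    return B0 + B1 * square_feet


# (size in square feet, asking rent in dollars) for the two apartments in part (g).
apartments = [(1000, 1250), (1200, 1425)]

differences = {}
for size, asking in apartments:
    expected = predicted_rent(size)
    differences[size] = asking - expected
    print(f"{size} sq ft: predicted ${expected:,.2f}, "
          f"asking ${asking:,}, difference {asking - expected:+.2f}")
```

With these coefficients the 1,200 square foot apartment is predicted at about $1,455, so its $1,425 rent is roughly $30 below what the model expects, while the 1,000 square foot apartment is about $8 above.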
\[ R = \sqrt{R^2} = \sqrt{1 - \frac{SSE}{TSS}} = \sqrt{1 - \frac{4}{40}} = \sqrt{0.9} = 0.94868 \]
This coefficient has two possible interpretations:
R is the correlation coefficient between predictions made with this model (Y-hat) and
actual observations of the dependent variable (Y).
R is the absolute value of the sample correlation coefficient between X and Y.
The first of these two interpretations applies not only to simple regression, but also to
multiple regression models. The second interpretation applies only to simple regression
models.
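Both interpretations can be checked numerically. The sketch below reuses the stereo data from Exercise 1 (not the data of this problem, which is not reproduced here) and a hand-written `pearson` helper, which is an illustrative name rather than anything from the original solution.

```python
from math import sqrt

# Stereo data from Exercise 1 (price in $100, units sold).
price = [5.5, 6.0, 6.5, 6.0, 5.0, 6.5, 4.5, 5.0]
sales = [420, 380, 350, 400, 440, 380, 450, 420]


def pearson(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / sqrt(saa * sbb)


# Least-squares line, then the fitted values Y-hat.
n = len(price)
x_bar = sum(price) / n
y_bar = sum(sales) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(price, sales))
         / sum((x - x_bar) ** 2 for x in price))
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * x for x in price]

r_xy = pearson(price, sales)      # negative here, because the slope is negative
r_fit = pearson(fitted, sales)    # correlation between Y-hat and Y
print(f"corr(X, Y) = {r_xy:.6f}, corr(Y-hat, Y) = {r_fit:.6f}")
```

Both `abs(r_xy)` and `r_fit` agree with the Multiple R of 0.937137 reported for Exercise 1.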
(c) How useful do you think this regression model is for predicting sales?
While it doesn't explain all of the variability in sales, this model seems to be fairly
useful, based on (a) and (b).
(b) At the α = 0.05 level of significance, what are the critical values?
Note that n - k - 1 = 18 - 1 - 1 = 16. In the t-table, we see that
\[ t_{16,\,0.025} = 2.120 \]
Therefore, we will reject the null hypothesis if the test statistic is greater than 2.12, or if
it is less than -2.12.
(c) On the basis of your answers to (a) and (b), what statistical decision should be
made?
We reject the null hypothesis that the slope is zero, and conclude that the independent
variable is useful in predicting the behavior of the dependent variable.
(c) On the basis of your answers to (a) and (b), what statistical decision should be
made?
The value of the F statistic is clearly large enough to reject the null hypothesis. This
model is useful in explaining variability in the dependent variable. In fact, using Excel,
we can determine the p-value associated with this F:
=FDIST(27,1,18) = 0.000061
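For readers without Excel, the same upper-tail probability can be approximated with the Python standard library. This is only a sketch: it relies on the identity P(F > f) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1 f), where I_x is the regularized incomplete beta function, and evaluates the integral with Simpson's rule. The function name `f_upper_tail` and the step count are assumptions chosen for illustration.

```python
from math import gamma


def f_upper_tail(f_value, df1, df2, steps=2000):
    """Approximate P(F > f_value) for an F(df1, df2) distribution.

    Assumes df2 > 2 and a fairly large f_value, so that the integration
    range [0, x] stays away from the endpoint singularity at 1.
    """
    a, b = df2 / 2.0, df1 / 2.0
    x = df2 / (df2 + df1 * f_value)

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    # Composite Simpson's rule on [0, x]; steps must be even.
    h = x / steps
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    incomplete_beta = total * h / 3.0

    complete_beta = gamma(a) * gamma(b) / gamma(a + b)
    return incomplete_beta / complete_beta


p_value = f_upper_tail(27, 1, 18)
print(f"p-value = {p_value:.6f}")
```

The result rounds to the 0.000061 returned by FDIST(27,1,18).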
15. In Problem 9 above an agent for a real estate company wanted to predict the
monthly rent for apartments based on the size of the apartment. Use the computer
output you obtained to solve that problem.
(a) At the 0.05 level of significance, is there evidence of a linear relationship between
the size of the apartment and the monthly rent?
Yes, as evidenced by the very small p-value associated with the Size in Square Feet
independent variable. Note in the t-table that the critical value for 5% significance is
\[ t_{n-k-1,\,\alpha/2} = t_{23,\,0.025} = 2.069 \]
The t statistic here is 7.7404, clearly in the rejection region.
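(b) Set up a 95% confidence interval estimate of the population slope \( \beta_1 \).
\[
\begin{aligned}
\hat{\beta}_1 \pm t_{n-k-1,\,\alpha/2}\, s_{\hat{\beta}_1}
  &= 1.0651 \pm t_{23,\,0.025}\,(0.1376) \\
  &= 1.0651 \pm 2.069\,(0.1376) \\
  &= 1.0651 \pm 0.2847
\end{aligned}
\]
Or, (0.780, 1.350)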
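The interval arithmetic is easy to check in code. A minimal sketch follows, with the critical value 2.069 taken from the t-table as above; the helper name `confidence_interval` is illustrative.

```python
def confidence_interval(estimate, t_critical, standard_error):
    """Return (lower, upper) for estimate +/- t_critical * standard_error."""
    margin = t_critical * standard_error
    return estimate - margin, estimate + margin


# Slope estimate, t-table critical value and standard error from Problem 15(b).
low, high = confidence_interval(1.0651, 2.069, 0.1376)
print(f"95% CI for the slope: ({low:.3f}, {high:.3f})")

# The interval excludes zero, which agrees with rejecting a zero slope in part (a).
slope_is_significant = low > 0 or high < 0
```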
[Figure: scatter plot of delivery time versus number of cases delivered; horizontal axis: Cases]
df SS MS F Significance F
Regression 1 2443.4660 2443.4660 619.1956 0.0000
Residual 18 71.0315 3.9462
Total 19 2514.4975
\( \hat{\beta}_0 = 24.8345 \)
\( \hat{\beta}_1 = 0.14 \)
(g) Compute the coefficient of determination \( R^2 \) and explain its meaning in this
problem.
\[ R^2 = 1 - \frac{SSE}{TSS} = 1 - \frac{71.0315}{2514.4975} = 0.9718 \]
97.18% of the variation in delivery times can be explained by variation in the number of
cases of beverages being delivered.
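The other quantities in the ANOVA table follow from the same two sums of squares. The sketch below recomputes them for the delivery-time model; the function name `anova_summary` and the dictionary keys are assumptions made for illustration.

```python
from math import sqrt


def anova_summary(ss_regression, ss_residual, n, k=1):
    """Recompute R^2, F and the standard error of the estimate from sums of squares.

    n is the number of observations and k the number of independent variables.
    """
    df_residual = n - k - 1
    ss_total = ss_regression + ss_residual
    ms_regression = ss_regression / k
    ms_residual = ss_residual / df_residual
    return {
        "r_squared": 1 - ss_residual / ss_total,
        "f": ms_regression / ms_residual,
        "s": sqrt(ms_residual),  # standard error of the estimate
    }


# Sums of squares from the delivery-time ANOVA table (20 observations).
summary = anova_summary(2443.4660, 71.0315, n=20)
print(f"R^2 = {summary['r_squared']:.4f}")
print(f"F   = {summary['f']:.4f}")
print(f"s   = {summary['s']:.4f}")
```

The standard error s of about 1.9865 is the value used later in the prediction interval of part (m).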
[Figure: residual plot; horizontal axis: Delivery Time]
(k) At the .05 level of significance, is there evidence of a linear relationship between
delivery time and the number of cases delivered?
The critical value of t for this test is \( t_{n-k-1,\,\alpha/2} = t_{18,\,0.025} = 2.101 \). Our t is:
\[ t = \frac{\hat{\beta}_1}{s_{\hat{\beta}_1}} = \frac{0.14}{0.0056} = 25 \]
We can reject the null hypothesis and conclude that there is a relationship between
delivery time and the number of cases delivered.
(m) Set up a 95% prediction interval estimate of the delivery time for an individual customer
who is receiving 150 cases of soft drink.
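\[
\begin{aligned}
\hat{Y} \pm t_{n-k-1,\,\alpha/2}\, s
  &= (24.8345 + 0.14 \times 150) \pm t_{18,\,0.025}\,(1.9865) \\
  &= 45.83 \pm 2.101\,(1.9865) \\
  &= 45.83 \pm 4.174
\end{aligned}
\]
Or, (41.656, 50.004)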
(o) Explain how the results obtained in (a)-(n) can help allocate delivery costs to
customers.
The beverage company can predict the cost of any particular delivery using the
regression model, which allows it to perform activity-based cost analysis and allocate
delivery costs to individual customers taking into account the differences in delivery
volume.
[Figure: scatter plot; horizontal axis: Incoming Calls]
df SS MS F Significance F
Regression 1 48645.56 48645.56 56.21 0.0000
Residual 33 28560.84 865.48
Total 34 77206.40
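\( \hat{\beta}_0 = -63.0205 \)
\( \hat{\beta}_1 = 0.1890 \)
(e) Predict the average number of trades executed for a day in which the number of incoming
calls is 2,000.
\[
\begin{aligned}
\hat{Y} &= \hat{\beta}_0 + \hat{\beta}_1 X_1 \\
        &= -63.0205 + 0.1890\, X_1 \\
        &= -63.0205 + 0.1890\,(2000) \\
        &\approx 315.0 \text{ trades}
\end{aligned}
\]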
[Figure: scatter plot of GPI versus GMAT score; horizontal axis: GMAT]
df SS MS F Significance F
Regression 1 1.7257 1.7257 71.0310 0.0000
Residual 18 0.4373 0.0243
Total 19 2.1631
(b) Interpret the meaning of the Y intercept \( \hat{\beta}_0 \) and the slope \( \hat{\beta}_1 \) in this problem.
\( \hat{\beta}_0 \) is the theoretical GPI that would be expected if a student had received a zero on the
GMAT. \( \hat{\beta}_1 \) is the expected increase in the GPI for each additional one-point
improvement in the GMAT score.
(c) Use the regression model developed in (a) to predict the average GPI for a student with a GMAT score of
600.
\[
\begin{aligned}
\hat{Y} &= \hat{\beta}_0 + \hat{\beta}_1 X_1 \\
        &= 0.3003 + 0.0049\, X_1 \\
        &= 0.3003 + 0.0049\,(600) \\
        &= 3.24
\end{aligned}
\]
\[ R = \sqrt{R^2} = \sqrt{0.7978} = 0.8932 \]
(g) Perform a residual analysis on your results and determine the adequacy of the fit of the
model.
OBSERVATION GMAT SCORE GPI Predicted GPI Residuals
1 688 3.72 3.651 0.069
2 647 3.44 3.451 -0.011
3 652 3.21 3.476 -0.266
4 608 3.29 3.261 0.029
5 680 3.91 3.612 0.298
6 617 3.28 3.305 -0.025
7 557 3.02 3.013 0.007
8 599 3.13 3.218 -0.088
9 616 3.45 3.300 0.150
10 594 3.33 3.193 0.137
11 567 3.07 3.062 0.008
12 542 2.86 2.940 -0.080
13 551 2.91 2.984 -0.074
14 573 2.79 3.091 -0.301
15 536 3 2.911 0.089
16 639 3.55 3.412 0.138
17 619 3.47 3.315 0.155
18 694 3.6 3.680 -0.080
19 718 3.88 3.797 0.083
20 759 3.76 3.997 -0.237
[Figure: residual plot; horizontal axis: GPI]
There is no obvious pattern, but one might see evidence of a subtle upward trend in the
residuals as GPI increases.
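The residual column, and the standard error of the estimate that the later interval calculations rely on, can be recomputed from the table. The sketch below uses the GPI and Predicted GPI columns exactly as printed (rounded to three decimals), so the results agree with the Excel output only up to rounding.

```python
from math import sqrt

# GPI and Predicted GPI columns from the residual table, observations 1 to 20.
gpi = [3.72, 3.44, 3.21, 3.29, 3.91, 3.28, 3.02, 3.13, 3.45, 3.33,
       3.07, 2.86, 2.91, 2.79, 3.00, 3.55, 3.47, 3.60, 3.88, 3.76]
predicted = [3.651, 3.451, 3.476, 3.261, 3.612, 3.305, 3.013, 3.218, 3.300, 3.193,
             3.062, 2.940, 2.984, 3.091, 2.911, 3.412, 3.315, 3.680, 3.797, 3.997]

# Residual = observed value minus predicted value.
residuals = [y - y_hat for y, y_hat in zip(gpi, predicted)]

sse = sum(e ** 2 for e in residuals)   # residual sum of squares
s = sqrt(sse / (len(gpi) - 2))         # n - k - 1 = 18 degrees of freedom

print(f"sum of residuals = {sum(residuals):+.3f}")
print(f"SSE = {sse:.4f}, s = {s:.4f}")
```

The sum of the residuals is essentially zero, as it should be for a least-squares fit with an intercept, and SSE and s come out close to the 0.4373 and 0.1559 used elsewhere in this problem.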
(h) At the .05 level of significance, is there evidence of a linear relationship between
GMAT score and GPI?
Yes, the p-value for the estimated slope coefficient is very small.
(i) Set up a 95% confidence interval estimate for the average GPI of students with a GMAT score of 600.
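\[
\begin{aligned}
\hat{Y} \pm t_{n-k-1,\,\alpha/2}\, \frac{s}{\sqrt{n}}
  &= (0.3003 + 0.0049 \times 600) \pm t_{18,\,0.025}\, \frac{0.1559}{\sqrt{20}} \\
  &= 3.2403 \pm 2.101\,(0.0349) \\
  &= 3.2403 \pm 0.0732
\end{aligned}
\]
Or, (3.17, 3.31)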
(j) Set up a 95% prediction interval estimate of the GPI for a particular student with a GMAT score of 600.
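\[
\begin{aligned}
\hat{Y} \pm t_{n-k-1,\,\alpha/2}\, s
  &= (0.3003 + 0.0049 \times 600) \pm t_{18,\,0.025}\,(0.1559) \\
  &= 3.2403 \pm 2.101\,(0.1559) \\
  &= 3.2403 \pm 0.3275
\end{aligned}
\]
Or, (2.91, 3.57)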
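The two intervals in (i) and (j) differ only in the width of the margin. The sketch below reproduces both using the simplified formulas from this solution, which omit the term that depends on how far X lies from its sample mean; the full textbook formulas would give somewhat wider intervals. The function name `simplified_intervals` is an assumption made for illustration.

```python
from math import sqrt


def simplified_intervals(y_hat, t_critical, s, n):
    """Return (mean_interval, prediction_interval) using the simplified margins.

    Mean response:        y_hat +/- t * s / sqrt(n)
    Individual response:  y_hat +/- t * s
    """
    mean_margin = t_critical * s / sqrt(n)
    individual_margin = t_critical * s
    mean_interval = (y_hat - mean_margin, y_hat + mean_margin)
    prediction_interval = (y_hat - individual_margin, y_hat + individual_margin)
    return mean_interval, prediction_interval


# GPI example: predicted value at GMAT = 600, t(18, 0.025) = 2.101, s = 0.1559, n = 20.
y_hat = 0.3003 + 0.0049 * 600
mean_interval, prediction_interval = simplified_intervals(y_hat, 2.101, 0.1559, 20)
print(f"95% CI for the mean GPI:        ({mean_interval[0]:.2f}, {mean_interval[1]:.2f})")
print(f"95% PI for an individual's GPI: ({prediction_interval[0]:.2f}, {prediction_interval[1]:.2f})")
```

The prediction interval is wider because it covers a single student's GPI rather than the average GPI of all students with that GMAT score.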
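\[
\begin{aligned}
\hat{\beta}_1 \pm t_{n-k-1,\,\alpha/2}\, s_{\hat{\beta}_1}
  &= 0.0049 \pm t_{18,\,0.025}\,(0.0006) \\
  &= 0.0049 \pm 2.101\,(0.0006) \\
  &= 0.0049 \pm 0.00126
\end{aligned}
\]
Or, (0.00364, 0.00616)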
[Figure: scatter plot of Hours versus Invoices Processed]
df SS MS F Significance F
Regression 1 25.9438 25.9438 232.2200 0.0000
Residual 28 3.1282 0.1117
Total 29 29.0720
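(d) Use the regression model developed in (b) to predict the average amount of time it would take to process 150
invoices.
\[
\begin{aligned}
\hat{Y} &= \hat{\beta}_0 + \hat{\beta}_1 X_1 \\
        &= 0.4024 + 0.0126\, X_1 \\
        &= 0.4024 + 0.0126\,(150) \\
        &= 2.29 \text{ hours}
\end{aligned}
\]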
(h) Plot the residuals against the number of invoices processed and also against time.
[Figure: two scatter plots of the residuals, one against Invoices Processed and one against Hours]
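(k) Set up a 95% confidence interval estimate of the average amount of time taken to process 150 invoices.
\[
\begin{aligned}
\hat{Y} \pm t_{n-k-1,\,\alpha/2}\, \frac{s}{\sqrt{n}}
  &= (0.4024 + 0.0126 \times 150) \pm t_{28,\,0.025}\, \frac{0.3342}{\sqrt{30}} \\
  &= 2.29 \pm 2.048\,(0.061) \\
  &= 2.29 \pm 0.125
\end{aligned}
\]
Or, (2.165, 2.415)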
(l) Set up a 95% prediction interval estimate of the amount of time it takes to process 150 invoices on a
particular day.
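\[
\begin{aligned}
\hat{Y} \pm t_{n-k-1,\,\alpha/2}\, s
  &= (0.4024 + 0.0126 \times 150) \pm t_{28,\,0.025}\,(0.3342) \\
  &= 2.29 \pm 2.048\,(0.3342) \\
  &= 2.29 \pm 0.684
\end{aligned}
\]
Or, (1.606, 2.974)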