
06-05-10

Multiple Regression

Part 1. Basic Multiple Regression
    The Linear Regression Model
    The Least Squares Point Estimates
    The Mean Squared Error and the Standard Error
    Model Utility: R², Adjusted R², and the F Test
    Testing Significance of an Independent Variable
    Confidence Intervals and Prediction Intervals

Part 2. Using Squared and Interaction Terms
    The Quadratic Regression Model
    Interaction


Part 3. Dummy Variables and Advanced Statistical Inferences
    Dummy Variables to Model Qualitative Variables
    The Partial F Test: Testing a Portion of a Model

Part 4. Model Building and Model Diagnostics
    Model Building and the Effects of Multicollinearity
    Diagnostics for Detecting Outlying and Influential Observations

The Linear Regression Model

The linear regression model relating y to x1, x2, …, xk is

    y = μ_y|x1,x2,…,xk + ε = β0 + β1 x1 + β2 x2 + … + βk xk + ε

where
    μ_y|x1,x2,…,xk = β0 + β1 x1 + β2 x2 + … + βk xk is the mean value of the dependent variable y when the values of the independent variables are x1, x2, …, xk;
    β0, β1, β2, …, βk are the regression parameters relating the mean value of y to x1, x2, …, xk;
    ε is an error term that describes the effects on y of all factors other than the independent variables x1, x2, …, xk.


Example: The Linear Regression Model

Example: The Fuel Consumption Case


Week   Average Hourly Temperature, x1 (°F)   Chill Index, x2   Fuel Consumption, y (MMcf)
1 28.0 18 12.4
2 28.0 14 11.7
3 32.5 24 12.4
4 39.0 22 10.8
5 45.9 8 9.4
6 57.8 16 9.5
7 58.1 1 8.0
8 62.5 0 7.5

y = β0 + β1 x1 + β2 x2 + ε
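The least squares fit for this model can be reproduced with NumPy; a minimal sketch using the eight observations in the table above (assumes NumPy is available):

```python
import numpy as np

# Fuel consumption data from the table above
x1 = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])  # temperature (°F)
x2 = np.array([18, 14, 24, 22, 8, 16, 1, 0], dtype=float)        # chill index
y = np.array([12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5])       # fuel (MMcf)

# Design matrix with an intercept column; least squares solves X b ≈ y
X = np.column_stack([np.ones_like(y), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b, 4))  # b0, b1, b2 — compare with the Minitab output below
```

The coefficients agree with the Minitab fit shown later (b0 ≈ 13.1087, b1 ≈ −0.0900, b2 ≈ 0.0825).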

The Linear Regression Model Illustrated

Example: The Fuel Consumption Case


The Regression Model Assumptions

Model:  y = μ_y|x1,x2,…,xk + ε = β0 + β1 x1 + β2 x2 + … + βk xk + ε

Assumptions about the model error terms, the ε's:

Mean Zero: The mean of the error terms is equal to 0.
Constant Variance: The variance of the error terms, σ², is the same for every combination of values of x1, x2, …, xk.
Normality: The error terms follow a normal distribution for every combination of values of x1, x2, …, xk.
Independence: The values of the error terms are statistically independent of each other.

Least Squares Estimates and Prediction

Estimation/Prediction Equation:

    ŷ = b0 + b1 x01 + b2 x02 + … + bk x0k

is the point estimate of the mean value of the dependent variable when the values of the independent variables are x01, x02, …, x0k. It is also the point prediction of an individual value of the dependent variable when the values of the independent variables are x01, x02, …, x0k.

b1, b2, …, bk are the least squares point estimates of the parameters β1, β2, …, βk.

x01, x02, …, x0k are specified values of the independent predictor variables x1, x2, …, xk.


Example: Least Squares Estimation

Example: The Fuel Consumption Case


FuelCons = 13.1 - 0.0900 Temp + 0.0825 Chill

Predictor Coef StDev T P


Constant 13.1087 0.8557 15.32 0.000
Temp -0.09001 0.01408 -6.39 0.001
Chill 0.08249 0.02200 3.75 0.013

S = 0.3671 R-Sq = 97.4% R-Sq(adj) = 96.3%

Analysis of Variance
Source DF SS MS F P
Regression 2 24.875 12.438 92.30 0.000
Residual Error 5 0.674 0.135
Total 7 25.549

Predicted Values (Temp = 40, Chill = 10)


Fit StDev Fit 95.0% CI 95.0% PI
10.333 0.170 ( 9.895, 10.771) ( 9.293, 11.374)

Example: Point Predictions and Residuals

Example: The Fuel Consumption Case

Week   Temperature, x1 (°F)   Chill Index, x2   Observed Fuel Consumption, y (MMcf)   Predicted: 13.1087 − .0900 x1 + .0825 x2   Residual e = y − ŷ
1 28.0 18 12.4 12.0733 0.3267
2 28.0 14 11.7 11.7433 -0.0433
3 32.5 24 12.4 12.1631 0.2369
4 39.0 22 10.8 11.4131 -0.6131
5 45.9 8 9.4 9.6372 -0.2372
6 57.8 16 9.5 9.2260 0.2740
7 58.1 1 8.0 7.9616 0.0384
8 62.5 0 7.5 7.4831 0.0169


Mean Square Error and Standard Error

    SSE = Σ ei² = Σ (yi − ŷi)²                  Sum of Squared Errors
    s² = MSE = SSE / (n − (k + 1))              Mean Square Error, point estimate of residual variance σ²
    s = √MSE = √( SSE / (n − (k + 1)) )         Standard Error, point estimate of residual standard deviation σ

Example: The Fuel Consumption Case
Analysis of Variance
Source          DF      SS      MS      F       P
Regression       2  24.875  12.438  92.30   0.000
Residual Error   5   0.674   0.135
Total            7  25.549

    s² = MSE = SSE / (n − (k + 1)) = 0.674 / (8 − 3) = 0.1348
    s = √s² = √0.1348 = 0.3671
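The SSE, MSE, and standard error above can be checked numerically; a sketch that refits the fuel consumption model with NumPy:

```python
import numpy as np

# Fuel consumption data from Part 1
x1 = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])
x2 = np.array([18, 14, 24, 22, 8, 16, 1, 0], dtype=float)
y = np.array([12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5])

X = np.column_stack([np.ones_like(y), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b                      # residuals
n, k = len(y), 2
SSE = float(e @ e)                 # sum of squared errors
MSE = SSE / (n - (k + 1))          # s², with n − (k+1) = 5 d.f.
s = MSE ** 0.5                     # standard error
print(round(SSE, 3), round(MSE, 4), round(s, 4))
```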

The Multiple Coefficient of Determination

The multiple coefficient of determination R² is

    R² = (Explained variation) / (Total variation)

R² is the proportion of the total variation in y explained by the linear regression model.

Total variation = Explained variation + Unexplained variation

    Total variation       = Σ (yi − ȳ)²    Total Sum of Squares (SSTO)
    Explained variation   = Σ (ŷi − ȳ)²    Regression Sum of Squares (SSR)
    Unexplained variation = Σ (yi − ŷi)²   Error Sum of Squares (SSE)

Multiple correlation coefficient: R = √R²


The Adjusted R²

The adjusted multiple coefficient of determination is

    R̄² = ( R² − k/(n − 1) ) · ( (n − 1) / (n − (k + 1)) )

Example: The Fuel Consumption Case
S = 0.3671   R-Sq = 97.4%   R-Sq(adj) = 96.3%
Analysis of Variance
Source          DF      SS      MS      F       P
Regression       2  24.875  12.438  92.30   0.000
Residual Error   5   0.674   0.135
Total            7  25.549

    R² = 24.875 / 25.549 = 0.974
    R̄² = ( 0.974 − 2/(8 − 1) ) · ( (8 − 1) / (8 − (2 + 1)) ) = 0.963

F Test for the Linear Regression Model

To test H0: β1 = β2 = … = βk = 0 versus
Ha: At least one of β1, β2, …, βk is not equal to 0

Test Statistic:

    F(model) = [ (Explained variation) / k ] / [ (Unexplained variation) / (n − (k + 1)) ]

Reject H0 in favor of Ha if:
    F(model) > Fα, or
    p-value < α
Fα is based on k numerator and n − (k + 1) denominator degrees of freedom.


Example: F Test for the Linear Regression Model

Example: The Fuel Consumption Case

Analysis of Variance
Source          DF      SS      MS      F       P
Regression       2  24.875  12.438  92.30   0.000
Residual Error   5   0.674   0.135
Total            7  25.549

Test Statistic:
    F(model) = [(Explained variation)/k] / [(Unexplained variation)/(n − (k + 1))]
             = (24.875 / 2) / (0.674 / (8 − 3)) = 92.30

Reject H0 at the α = 0.05 level of significance, since
    F(model) = 92.30 > 5.79 = F.05  and
    p-value ≈ 0.000 < 0.05 = α
Fα is based on 2 numerator and 5 denominator degrees of freedom.
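The F statistic, critical value, and p-value can be checked with SciPy; a sketch using the values read off the ANOVA table above:

```python
from scipy.stats import f

# Values from the ANOVA table above
SSR, SSE, n, k = 24.875, 0.674, 8, 2
F = (SSR / k) / (SSE / (n - (k + 1)))      # F(model)
F_crit = f.ppf(0.95, k, n - (k + 1))       # F.05 with 2 and 5 d.f.
p_value = f.sf(F, k, n - (k + 1))          # upper-tail p-value
print(round(F, 2), round(F_crit, 2), p_value)
```

The rounded table values reproduce F ≈ 92.3, F.05 ≈ 5.79, and a p-value well below 0.001.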

Testing the Significance of an Independent Variable

If the regression assumptions hold, we can reject H0: βj = 0 at the α level of significance (probability of Type I error equal to α) if and only if the appropriate rejection point condition holds or, equivalently, if the corresponding p-value is less than α.

Alternative      Reject H0 if:                p-Value
Ha: βj > 0       t > tα                       Area under the t distribution to the right of t
Ha: βj < 0       t < −tα                      Area under the t distribution to the left of t
Ha: βj ≠ 0       |t| > tα/2, that is,         Twice the area under the t distribution to the right of |t|
                 t > tα/2 or t < −tα/2

Test Statistic:                  100(1 − α)% Confidence Interval for βj:
    t = bj / s_bj                    [ bj ± tα/2 s_bj ]

tα, tα/2 and p-values are based on n − (k + 1) degrees of freedom.


Example: Testing and Estimation for the βs

Example: The Fuel Consumption Case

Predictor     Coef      StDev      T       P
Constant   13.1087     0.8557   15.32   0.000
Temp       -0.09001    0.01408  -6.39   0.001
Chill       0.08249    0.02200   3.75   0.013

Test:
    t = b2 / s_b2 = 0.08249 / 0.02200 = 3.75 > 2.571 = t.025
    p-value = 2 × P(t > 3.75) = 0.013

Interval:
    [ b2 ± tα/2 s_b2 ] = [ 0.08249 ± (2.571)(0.02200) ]
                       = [ 0.08249 ± 0.05656 ] = [ 0.02593, 0.13905 ]

Chill is significant at the α = 0.05 level, but not at α = 0.01.
tα, tα/2 and p-values are based on 5 degrees of freedom.
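A sketch of the same t test and confidence interval in SciPy, using the coefficient and standard error from the Minitab output above:

```python
from scipy.stats import t

b2, s_b2, df = 0.08249, 0.02200, 5         # Chill coefficient, its StDev, n-(k+1)
t_stat = b2 / s_b2                         # test statistic
p_value = 2 * t.sf(abs(t_stat), df)        # two-sided p-value
t_crit = t.ppf(0.975, df)                  # t.025 with 5 d.f.
ci = (b2 - t_crit * s_b2, b2 + t_crit * s_b2)
print(round(t_stat, 2), round(p_value, 3), round(t_crit, 3),
      tuple(round(v, 5) for v in ci))
```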

Confidence and Prediction Intervals

Prediction:  ŷ = b0 + b1 x01 + b2 x02 + … + bk x0k

If the regression assumptions hold:

100(1 − α)% confidence interval for the mean value of y:

    [ ŷ ± tα/2 s √(Distance value) ]

100(1 − α)% prediction interval for an individual value of y:

    [ ŷ ± tα/2 s √(1 + Distance value) ]

The Distance value requires matrix algebra; see Appendix G on the CD-ROM.
tα/2 is based on n − (k + 1) degrees of freedom.


Example: Confidence and Prediction Intervals

Example: The Fuel Consumption Case

FuelCons = 13.1 - 0.0900 Temp + 0.0825 Chill

Predicted Values (Temp = 40, Chill = 10)

Fit      StDev Fit    95.0% CI           95.0% PI
10.333   0.170        (9.895, 10.771)    (9.293, 11.374)

95% Confidence Interval:
    [ ŷ ± tα/2 s √(Distance value) ] = [ 10.333 ± (2.571)(0.3671)√0.2144515 ]
                                     = [ 10.333 ± 0.438 ] = [ 9.895, 10.771 ]

95% Prediction Interval:
    [ ŷ ± tα/2 s √(1 + Distance value) ] = [ 10.333 ± (2.571)(0.3671)√(1 + 0.2144515) ]
                                         = [ 10.333 ± 1.041 ] = [ 9.292, 11.374 ]
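The interval arithmetic above can be sketched in SciPy, plugging in the fit, standard error, and distance value shown:

```python
from scipy.stats import t

# Values from the output above
s, dist, y_hat, df = 0.3671, 0.2144515, 10.333, 5
t_crit = t.ppf(0.975, df)                    # t.025 with 5 d.f. ≈ 2.571
ci_half = t_crit * s * dist ** 0.5           # confidence-interval half-width
pi_half = t_crit * s * (1 + dist) ** 0.5     # prediction-interval half-width
print((round(y_hat - ci_half, 3), round(y_hat + ci_half, 3)),
      (round(y_hat - pi_half, 3), round(y_hat + pi_half, 3)))
```

The prediction interval is wider because it adds the variance of an individual error term to the variance of the estimated mean.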

The Quadratic Regression Model

Model:  y = β0 + β1 x + β2 x² + ε


Example: Quadratic Regression Model

Example: The Gasoline Additive Case

Units of Additive, x    Mileage, y (MPG)
0 25.8
0 26.1
0 25.4
1 29.6
1 29.2
1 29.8
2 32.0
2 31.4
2 31.7
3 31.7
3 31.5
3 31.2
4 29.4
4 29.0
4 29.5

Example: Quadratic Regression Model

Example: The Gasoline Additive Case


Mileage = 25.7 + 4.98 Units - 1.02 UnitsSq
Predictor Coef StDev T P
Constant 25.7152 0.1554 165.43 0.000
Units 4.9762 0.1841 27.02 0.000
UnitsSq -1.01905 0.04414 -23.09 0.000
S = 0.2861 R-Sq = 98.6% R-Sq(adj) = 98.3%
Analysis of Variance
Source DF SS MS F P
Regression 2 67.915 33.958 414.92 0.000
Residual Error 12 0.982 0.082
Total 14 68.897
Predicted Values (Units = 2.44, UnitsSq = (2.44)(2.44) = 5.9536)
Fit StDev Fit 95.0% CI 95.0% PI
31.7901 0.1111 ( 31.5481, 32.0322) ( 31.1215, 32.4588)

ŷ = 25.7152 + 4.9762(2.44) − 1.01905(2.44)² = 31.7901 mpg
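The quadratic fit and the prediction at x = 2.44 can be reproduced with NumPy, using the fifteen observations from the data table; a sketch:

```python
import numpy as np

x = np.repeat([0.0, 1.0, 2.0, 3.0, 4.0], 3)           # units of additive
y = np.array([25.8, 26.1, 25.4, 29.6, 29.2, 29.8,
              32.0, 31.4, 31.7, 31.7, 31.5, 31.2,
              29.4, 29.0, 29.5])                       # mileage (MPG)

X = np.column_stack([np.ones_like(x), x, x ** 2])      # quadratic design matrix
b, *_ = np.linalg.lstsq(X, y, rcond=None)
x0 = 2.44
y_hat = b[0] + b[1] * x0 + b[2] * x0 ** 2
print(np.round(b, 4), round(y_hat, 4))
```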


Interaction

Example 12.13: The Bonner Frozen Foods Case

Sales    Radio and TV        Print               Sales
Region   Expenditures, x1    Expenditures, x2    Volume, y
1        1                   1                    3.27
2        1                   2                    8.38
3        1                   3                   11.28
4        1                   4                   14.5
5        1                   5                   19.63
6        2                   1                    5.84
7        2                   2                   10.01
8        2                   3                   12.46
9        2                   4                   16.67
10       2                   5                   19.83
11       3                   1                    8.51
12       3                   2                   10.14
13       3                   3                   14.75
14       3                   4                   17.99
15       3                   5                   19.85
16       4                   1                    9.46
17       4                   2                   12.61
18       4                   3                   15.5
19       4                   4                   17.68
20       4                   5                   21.02
21       5                   1                   12.23
22       5                   2                   13.58
23       5                   3                   16.77
24       5                   4                   20.56
25       5                   5                   21.05

Modeling Interaction

Model:  y = β0 + β1 x1 + β2 x2 + β3 x1 x2 + ε, where x1 x2 is a cross-product or interaction term

Example: The Bonner Frozen Foods Case (Minitab Output)
Sales = - 2.35 + 2.36 RadioTV + 4.18 Print - 0.349 Interact
Predictor Coef StDev T P
Constant -2.3497 0.6883 -3.41 0.003
RadioTV 2.3611 0.2075 11.38 0.000
Print 4.1831 0.2075 20.16 0.000
Interact -0.34890 0.06257 -5.58 0.000
S = 0.6257 R-Sq = 98.6% R-Sq(adj) = 98.4%
Analysis of Variance
Source DF SS MS F P
Regression 3 590.41 196.80 502.67 0.000
Residual Error 21 8.22 0.39
Total 24 598.63
Predicted Values (RadioTV = 2, Print = 5, Interact=(2)(5) = 10)
Fit StDev Fit 95.0% CI 95.0% PI
19.799 0.265 ( 19.247, 20.351) ( 18.385, 21.213)
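The interaction fit can be reproduced with NumPy by adding a cross-product column to the design matrix; a sketch using the 25 sales regions from the table above:

```python
import numpy as np

x1 = np.repeat([1.0, 2.0, 3.0, 4.0, 5.0], 5)   # radio and TV expenditures
x2 = np.tile([1.0, 2.0, 3.0, 4.0, 5.0], 5)     # print expenditures
y = np.array([3.27, 8.38, 11.28, 14.5, 19.63,
              5.84, 10.01, 12.46, 16.67, 19.83,
              8.51, 10.14, 14.75, 17.99, 19.85,
              9.46, 12.61, 15.5, 17.68, 21.02,
              12.23, 13.58, 16.77, 20.56, 21.05])

# Interaction model: intercept, x1, x2, and the cross-product x1*x2
X = np.column_stack([np.ones_like(y), x1, x2, x1 * x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b, 4))  # compare with the Minitab coefficients above
```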


Using Dummy Variables to Model a Qualitative Independent Variable

Example: The Electronics World Case

Store   Number of Households, x   Location   Location Dummy, DM   Sales Volume, y
1 161 Street 0 157.27
2 99 Street 0 93.28
3 135 Street 0 136.81
4 120 Street 0 123.79
5 164 Street 0 153.51
6 221 Mall 1 241.74
7 179 Mall 1 201.54
8 204 Mall 1 206.71
9 214 Mall 1 229.78
10 101 Mall 1 135.22

Location Dummy Variable:

    DM = 1 if a store is in a mall location
         0 otherwise

Example: Regression with a Dummy Variable

Example: The Electronics World Case

Sales = 17.4 + 0.851 Households + 29.2 DM

Predictor Coef StDev T P


Constant 17.360 9.447 1.84 0.109
Househol 0.85105 0.06524 13.04 0.000
DM 29.216 5.594 5.22 0.001

S = 7.329 R-Sq = 98.3% R-Sq(adj) = 97.8%

Analysis of Variance

Source DF SS MS F P
Regression 2 21412 10706 199.32 0.000
Residual Error 7 376 54
Total 9 21788
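The dummy-variable fit can be reproduced with NumPy by including the 0/1 location column in the design matrix; a sketch using the ten stores from the data table:

```python
import numpy as np

hh = np.array([161, 99, 135, 120, 164, 221, 179, 204, 214, 101], dtype=float)
dm = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)  # 1 = mall location
y = np.array([157.27, 93.28, 136.81, 123.79, 153.51,
              241.74, 201.54, 206.71, 229.78, 135.22])

X = np.column_stack([np.ones_like(y), hh, dm])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b, 3))  # intercept, households slope, mall-location shift
```

The coefficient on DM (≈ 29.2) estimates how much higher mean sales volume is for a mall store than for a street store with the same number of households.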


The Partial F Test: Testing the Significance of a Portion of a Regression Model

Complete model:  y = β0 + β1 x1 + … + βg xg + βg+1 xg+1 + … + βk xk + ε
Reduced model:   y = β0 + β1 x1 + … + βg xg + ε

To test H0: βg+1 = βg+2 = … = βk = 0 versus
Ha: At least one of βg+1, βg+2, …, βk is not equal to 0

Partial F Statistic:

    F = [ (SSE_R − SSE_C) / (k − g) ] / [ SSE_C / (n − (k + 1)) ]

Reject H0 in favor of Ha if:
    F > Fα, or
    p-value < α
Fα is based on k − g numerator and n − (k + 1) denominator degrees of freedom.
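A sketch of the partial F computation in SciPy. The SSE values, n, k, and g below are hypothetical numbers chosen only to illustrate the formula, not taken from any case in these notes:

```python
from scipy.stats import f

# Hypothetical illustration: complete model with k = 4 predictors (SSE_C),
# reduced model keeping g = 2 of them (SSE_R), n = 30 observations
SSE_R, SSE_C, n, k, g = 15.4, 10.2, 30, 4, 2
F = ((SSE_R - SSE_C) / (k - g)) / (SSE_C / (n - (k + 1)))
p_value = f.sf(F, k - g, n - (k + 1))   # k-g numerator, n-(k+1) denominator d.f.
print(round(F, 3), p_value)
```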

Model Building and the Effects of Multicollinearity

Example: The Sales Territory Performance Case
Sales Time MktPoten Adver MktShare Change Accts WkLoad Rating
3669.88 43.10 74065.11 4582.88 2.51 0.34 74.86 15.05 4.9
3473.95 108.13 58117.30 5539.78 5.51 0.15 107.32 19.97 5.1
2295.10 13.82 21118.49 2950.38 10.91 -0.72 96.75 17.34 2.9
4675.56 186.18 68521.27 2243.07 8.27 0.17 195.12 13.40 3.4
6125.96 161.79 57805.11 7747.08 9.15 0.50 180.44 17.64 4.6
2134.94 8.94 37806.94 402.44 5.51 0.15 104.88 16.22 4.5
5031.66 365.04 50935.26 3140.62 8.54 0.55 256.10 18.80 4.6
3367.45 220.32 35602.08 2086.16 7.07 -0.49 126.83 19.86 2.3
6519.45 127.64 46176.77 8846.25 12.54 1.24 203.25 17.42 4.9
4876.37 105.69 42053.24 5673.11 8.85 0.31 119.51 21.41 2.8
2468.27 57.72 36829.71 2761.76 5.38 0.37 116.26 16.32 3.1
2533.31 23.58 33612.67 1991.85 5.43 -0.65 142.28 14.51 4.2
2408.11 13.82 21412.79 1971.52 8.48 0.64 89.43 19.35 4.3
2337.38 13.82 20416.87 1737.38 7.80 1.01 84.55 20.02 4.2
4586.95 86.99 36272.00 10694.20 10.34 0.11 119.51 15.26 5.5
2729.24 165.85 23093.26 8618.61 5.15 0.04 80.49 15.87 3.6
3289.40 116.26 26879.59 7747.89 6.64 0.68 136.58 7.81 3.4
2800.78 42.28 39571.96 4565.81 5.45 0.66 78.86 16.00 4.2
3264.20 52.84 51866.15 6022.70 6.31 -0.10 136.58 17.44 3.6
3453.62 165.04 58749.82 3721.10 6.35 -0.03 138.21 17.98 3.1
1741.45 10.57 23990.82 860.97 7.37 -1.63 75.61 20.99 1.6
2035.75 13.82 25694.86 3571.51 8.39 -0.43 102.44 21.66 3.4
1578.00 8.13 23736.35 2845.50 5.15 0.04 76.42 21.46 2.7
4167.44 58.54 34314.29 5060.11 12.88 0.22 136.58 24.78 2.8
2799.97 21.14 22809.53 3552.00 9.14 -0.74 88.62 24.96 3.9


Correlation Matrix

Example: The Sales Territory Performance Case
[Correlation matrix output omitted]

Multicollinearity

Multicollinearity refers to the condition where the independent variables (or predictors) in a model are dependent, related, or correlated with each other.

Effects
Hinders the ability to use the bj's, t statistics, and p-values to assess the relative importance of predictors.
Does not hinder the ability to predict the dependent (or response) variable.

Detection
Scatter Plot Matrix
Correlation Matrix
Variance Inflation Factors (VIF)


Variance Inflation Factors (VIF)

The variance inflation factor for the jth independent (or predictor) variable xj is

    VIFj = 1 / (1 − Rj²)

where Rj² is the multiple coefficient of determination for the regression model relating xj to the other predictors x1, …, xj−1, xj+1, …, xk:

    xj = β0 + β1 x1 + … + βj−1 xj−1 + βj+1 xj+1 + … + βk xk + ε

Notes:
    VIFj = 1 implies xj is not related to the other predictors.
    max(VIFj) > 10 suggests severe multicollinearity.
    mean(VIFj) substantially greater than 1 suggests severe multicollinearity.
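A sketch of the VIF computation in NumPy: each predictor is regressed on the others and VIF = 1/(1 − Rj²). For illustration it reuses the two fuel consumption predictors from Part 1 (with only two predictors, both VIFs reduce to 1/(1 − r²) for their correlation r, so they coincide):

```python
import numpy as np

def vif(xj, others):
    """VIF of predictor xj, given a list of the other predictor columns."""
    X = np.column_stack([np.ones(len(xj))] + others)
    b, *_ = np.linalg.lstsq(X, xj, rcond=None)
    resid = xj - X @ b
    r2 = 1 - (resid @ resid) / ((xj - xj.mean()) @ (xj - xj.mean()))
    return 1.0 / (1.0 - r2)

# Fuel consumption predictors from Part 1
x1 = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])
x2 = np.array([18, 14, 24, 22, 8, 16, 1, 0], dtype=float)
print(round(vif(x1, [x2]), 3), round(vif(x2, [x1]), 3))
```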

Example: Variance Inflation Factors (VIF)

Example: The Sales Territory Performance Case (MegaStat Output)

Regression output confidence interval


variables coefficients std. error t (df=16) p-value 95% lower 95% upper VIF
Intercept -1,507.8137 778.6349 -1.936 .0707 -3,158.4457 142.8182
Time 2.0096 1.9307 1.041 .3134 -2.0832 6.1024 3.343
MktPoten 0.0372 0.0082 4.536 .0003 0.0198 0.0546 1.978
Adver 0.1510 0.0471 3.205 .0055 0.0511 0.2509 1.910
MktShare 199.0235 67.0279 2.969 .0090 56.9307 341.1164 3.236
Change 290.8551 186.7820 1.557 .1390 -105.1049 686.8152 1.602
Accts 5.5510 4.7755 1.162 .2621 -4.5728 15.6747 5.639
WkLoad 19.7939 33.6767 0.588 .5649 -51.5975 91.1853 1.818
Rating 8.1893 128.5056 0.064 .9500 -264.2304 280.6090 1.809
mean VIF                                                                  2.667

max(VIFj) = 5.639, mean(VIFj) = 2.667: probably not severe multicollinearity


Residual Analysis in Multiple Regression

For an observed value of yi, the residual is

    ei = yi − ŷi = yi − (b0 + b1 xi1 + … + bk xik)


If the regression assumptions hold, the residuals should
look like a random sample from a normal distribution
with mean 0 and variance σ2.
Residual Plots
Residuals versus each independent variable
Residuals versus predicted y’s
Residuals in time order (if the response is a time series)
Histogram of residuals
Normal plot of the residuals
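The residuals and fitted values needed for the plots above can be computed directly; a sketch using the fuel consumption fit from Part 1 (the arrays `resid` and `fitted` are what you would pass to a plotting library):

```python
import numpy as np

# Fuel consumption fit from Part 1
x1 = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])
x2 = np.array([18, 14, 24, 22, 8, 16, 1, 0], dtype=float)
y = np.array([12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5])

X = np.column_stack([np.ones_like(y), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ b
resid = y - fitted   # plot resid vs x1, vs x2, vs fitted, and in time order
# With an intercept in the model, least squares forces the residuals
# to sum to (numerically) zero:
print(float(resid.mean()))
```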

Nonconstant Variance: Remedial Measures

Example: The QHIC Case. Transformed model:  y/x = β0 (1/x) + β1 + β2 x + η
Upkeep/V = - 53.5 1/V + 3.41 One + 0.0112 Value
Predictor Coef SE Coef T P
Noconstant
1/V -53.50 83.20 -0.64 0.524
One 3.409 1.321 2.58 0.014
Value 0.011224 0.004627 2.43 0.020
Predicted Values (1/V = 0.004545, One = 1, Value = 220)
Fit SE Fit 95.0% CI 95.0% PI
5.635 0.162 ( 5.306, 5.964) ( 3.994, 7.276)

Plots: residuals versus x and versus the predicted responses


Diagnostics for Detecting Outlying and Influential Observations

Observation 1: Outlying with respect to its y value
Observation 2: Outlying with respect to its x value
Observation 3: Outlying with respect to its x value, with a y value not consistent with the regression relationship (influential)

Example: Influence Diagnostics

Hospital Labor Needs Case. Model:
    y  = monthly labor hours required
    x1 = monthly X-ray exposures
    x2 = monthly occupied bed days
    x3 = average length of patient stay (days)

Obs   Hours   Predicted   Residual   Leverage   Studentized Residual   Studentized Deleted Residual   Cook's D
1 566.520 688.409 -121.889 0.121 -0.211 -0.203 0.002
2 696.820 721.848 -25.028 0.226 -0.046 -0.044 0.000
3 1,033.150 965.393 67.757 0.130 0.118 0.114 0.001
4 1,603.620 1,172.464 431.156 0.159 0.765 0.752 0.028
5 1,611.370 1,526.780 84.590 0.085 0.144 0.138 0.000
6 1,613.270 1,993.869 -380.599 0.112 -0.657 -0.642 0.014
7 1,854.170 1,676.558 177.612 0.084 0.302 0.291 0.002
8 2,160.550 1,791.405 369.145 0.083 0.627 0.612 0.009
9 2,305.580 2,798.761 -493.181 0.085 -0.838 -0.828 0.016
10 3,503.930 4,191.333 -687.403 0.120 -1.192 -1.214 0.049
11 3,571.890 3,190.957 380.933 0.077 0.645 0.630 0.009
12 3,741.400 4,364.502 -623.102 0.177 -1.117 -1.129 0.067
13 4,026.520 4,364.229 -337.709 0.064 -0.568 -0.553 0.006
14 10,343.810 8,713.307 1,630.503 0.146 2.871 4.558 0.353
15 11,732.170 12,080.864 -348.694 0.682 -1.005 -1.006 0.541
16 15,414.940 15,133.026 281.914 0.785 0.990 0.989 0.897
17 18,854.450 19,260.453 -406.003 0.863 -1.786 -1.975 5.033


Leverage Values

Leverage = distance value (hi). An observation is outlying with respect to x if it has a large leverage, greater than 2(k+1)/n.

Hospital Labor Needs Case: n = 17, k = 3, so the cutoff is 2(3+1)/17 = 0.4706; observations 15, 16, and 17 exceed it.
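Leverages are the diagonal entries of the hat matrix H = X(XᵀX)⁻¹Xᵀ. A sketch on the fuel consumption design matrix from Part 1 (the hospital data's x columns are not listed in these notes, so the smaller case is used for illustration):

```python
import numpy as np

# Fuel consumption design matrix from Part 1
x1 = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])
x2 = np.array([18, 14, 24, 22, 8, 16, 1, 0], dtype=float)
X = np.column_stack([np.ones(8), x1, x2])
n, k = X.shape[0], X.shape[1] - 1

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
leverage = np.diag(H)                  # h_i, the distance values
cutoff = 2 * (k + 1) / n               # 2(k+1)/n rule of thumb
print(np.round(leverage, 3), cutoff)
```

A useful check: the leverages always sum to k + 1, the number of estimated parameters.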

Residuals and Studentized Residuals

    Studentized Residual = Residual / Residual Standard Error:  e′i = ei / ( s √(1 − hi) )

An observation is outlying with respect to y if it has a large studentized (or standardized) residual, |e′i| greater than 2.


Studentized Deleted Residuals

    Studentized Deleted Residual = Deleted Residual / Deleted Residual Standard Error:
    ti = di / sdi = ei √( (n − k − 2) / ( SSE(1 − hi) − ei² ) )

An observation is outlying with respect to y if it has a large studentized deleted residual, |ti| greater than tα/2 [with n − k − 2 d.f.].
Hospital Labor Needs Case: 17 − 3 − 2 = 12 d.f., so the cutoff is |ti| > t.025 = 2.179; observation 14 (ti = 4.558) exceeds it.

Cook's Distance

    Cook's Distance = Di = [ ei² / ((k + 1) s²) ] · [ hi / (1 − hi)² ]

An observation is influential with respect to the estimated regression parameters b0, b1, …, bk if it has a large Cook's distance, Di greater than F.50 [with k + 1 and n − (k + 1) d.f.].
Hospital Labor Needs Case: k + 1 = 4 and 17 − (3 + 1) = 13 d.f., so the cutoff is Di > F.50 = 0.8845; observations 16 (Di = 0.897) and 17 (Di = 5.033) exceed it.
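Studentized residuals and Cook's distances can both be built from the residuals and leverages; a sketch on the fuel consumption fit from Part 1 (again used because the hospital case's x columns are not listed in these notes):

```python
import numpy as np

# Fuel consumption fit from Part 1
x1 = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])
x2 = np.array([18, 14, 24, 22, 8, 16, 1, 0], dtype=float)
y = np.array([12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5])
X = np.column_stack([np.ones_like(y), x1, x2])
n, k = len(y), 2

b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
s2 = (e @ e) / (n - (k + 1))                        # MSE
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)       # leverages

stud = e / np.sqrt(s2 * (1 - h))                    # studentized residuals
cooks = e ** 2 / ((k + 1) * s2) * h / (1 - h) ** 2  # Cook's distances
print(np.round(stud, 2), np.round(cooks, 3))
```

Note the algebraic link: Di = e′i² · hi / ((k + 1)(1 − hi)), so a point needs both a large residual and a large leverage to be influential.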
