
06-05-10

Multiple Regression

Part 1. Basic Multiple Regression
    The Linear Regression Model
    The Least Squares Point Estimates
    The Mean Squared Error and the Standard Error
    Model Utility: R², Adjusted R², and the F Test
    Testing Significance of an Independent Variable
    Confidence Intervals and Prediction Intervals

Part 2. Using Squared and Interaction Terms
    The Quadratic Regression Model
    Interaction


Part 3. Dummy Variables and Advanced Statistical Inferences
    Dummy Variables to Model Qualitative Variables
    The Partial F Test: Testing a Portion of a Model

Part 4. Model Building and Model Diagnostics
    Model Building and the Effects of Multicollinearity
    Diagnostics for Detecting Outlying and Influential Observations

The Linear Regression Model

The linear regression model relating y to x1, x2, …, xk is

    y = μ_y|x1,x2,…,xk + ε = β0 + β1 x1 + β2 x2 + … + βk xk + ε

where
    μ_y|x1,x2,…,xk = β0 + β1 x1 + β2 x2 + … + βk xk is the mean value of the dependent variable y when the values of the independent variables are x1, x2, …, xk;
    β0, β1, β2, …, βk are the regression parameters relating the mean value of y to x1, x2, …, xk;
    ε is an error term that describes the effects on y of all factors other than the independent variables x1, x2, …, xk.


Example: The Linear Regression Model

Example: The Fuel Consumption Case


Week   Average Hourly Temperature, x1 (°F)   Chill Index, x2   Fuel Consumption, y (MMcf)
1 28.0 18 12.4
2 28.0 14 11.7
3 32.5 24 12.4
4 39.0 22 10.8
5 45.9 8 9.4
6 57.8 16 9.5
7 58.1 1 8.0
8 62.5 0 7.5

y = β0 + β1 x1 + β2 x2 + ε
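The least squares fit for this model can be reproduced with NumPy; a minimal sketch using the eight observations in the table above (assumes NumPy is available):

```python
import numpy as np

# Fuel consumption data from the table above
x1 = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])  # temperature (°F)
x2 = np.array([18, 14, 24, 22, 8, 16, 1, 0], dtype=float)        # chill index
y = np.array([12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5])       # fuel (MMcf)

# Design matrix with an intercept column; least squares solves X b ≈ y
X = np.column_stack([np.ones_like(y), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b, 4))  # b0, b1, b2 — compare with the Minitab output below
```

The coefficients agree with the Minitab fit shown later (b0 ≈ 13.1087, b1 ≈ −0.0900, b2 ≈ 0.0825).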

The Linear Regression Model Illustrated

Example: The Fuel Consumption Case


The Regression Model Assumptions

Model:  y = μ_y|x1,x2,…,xk + ε = β0 + β1 x1 + β2 x2 + … + βk xk + ε

Assumptions about the model error terms, the ε's:

Mean Zero: The mean of the error terms is equal to 0.
Constant Variance: The variance of the error terms, σ², is the same for every combination of values of x1, x2, …, xk.
Normality: The error terms follow a normal distribution for every combination of values of x1, x2, …, xk.
Independence: The values of the error terms are statistically independent of each other.

Least Squares Estimates and Prediction

Estimation/Prediction Equation:

    ŷ = b0 + b1 x01 + b2 x02 + … + bk x0k

is the point estimate of the mean value of the dependent variable when the values of the independent variables are x01, x02, …, x0k. It is also the point prediction of an individual value of the dependent variable when the values of the independent variables are x01, x02, …, x0k.

b1, b2, …, bk are the least squares point estimates of the parameters β1, β2, …, βk.

x01, x02, …, x0k are specified values of the independent predictor variables x1, x2, …, xk.


Example: Least Squares Estimation

Example: The Fuel Consumption Case


FuelCons = 13.1 - 0.0900 Temp + 0.0825 Chill

Predictor Coef StDev T P


Constant 13.1087 0.8557 15.32 0.000
Temp -0.09001 0.01408 -6.39 0.001
Chill 0.08249 0.02200 3.75 0.013

S = 0.3671 R-Sq = 97.4% R-Sq(adj) = 96.3%

Analysis of Variance
Source DF SS MS F P
Regression 2 24.875 12.438 92.30 0.000
Residual Error 5 0.674 0.135
Total 7 25.549

Predicted Values (Temp = 40, Chill = 10)


Fit StDev Fit 95.0% CI 95.0% PI
10.333 0.170 ( 9.895, 10.771) ( 9.293, 11.374)

Example: Point Predictions and Residuals

Example: The Fuel Consumption Case

Week   Temperature, x1 (°F)   Chill Index, x2   Observed Fuel Consumption, y (MMcf)   Predicted: 13.1087 − .0900 x1 + .0825 x2   Residual e = y − ŷ
1 28.0 18 12.4 12.0733 0.3267
2 28.0 14 11.7 11.7433 -0.0433
3 32.5 24 12.4 12.1631 0.2369
4 39.0 22 10.8 11.4131 -0.6131
5 45.9 8 9.4 9.6372 -0.2372
6 57.8 16 9.5 9.2260 0.2740
7 58.1 1 8.0 7.9616 0.0384
8 62.5 0 7.5 7.4831 0.0169


Mean Square Error and Standard Error

    SSE = Σ ei² = Σ (yi − ŷi)²                  Sum of Squared Errors
    s² = MSE = SSE / (n − (k + 1))              Mean Square Error, point estimate of residual variance σ²
    s = √MSE = √( SSE / (n − (k + 1)) )         Standard Error, point estimate of residual standard deviation σ

Example: The Fuel Consumption Case
Analysis of Variance
Source          DF      SS      MS      F       P
Regression       2  24.875  12.438  92.30   0.000
Residual Error   5   0.674   0.135
Total            7  25.549

    s² = MSE = SSE / (n − (k + 1)) = 0.674 / (8 − 3) = 0.1348
    s = √s² = √0.1348 = 0.3671
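The SSE, MSE, and standard error above can be checked numerically; a sketch that refits the fuel consumption model with NumPy:

```python
import numpy as np

# Fuel consumption data from Part 1
x1 = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])
x2 = np.array([18, 14, 24, 22, 8, 16, 1, 0], dtype=float)
y = np.array([12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5])

X = np.column_stack([np.ones_like(y), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b                      # residuals
n, k = len(y), 2
SSE = float(e @ e)                 # sum of squared errors
MSE = SSE / (n - (k + 1))          # s², with n − (k+1) = 5 d.f.
s = MSE ** 0.5                     # standard error
print(round(SSE, 3), round(MSE, 4), round(s, 4))
```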

The Multiple Coefficient of Determination

The multiple coefficient of determination R² is

    R² = (Explained variation) / (Total variation)

R² is the proportion of the total variation in y explained by the linear regression model.

Total variation = Explained variation + Unexplained variation

    Total variation       = Σ (yi − ȳ)²    Total Sum of Squares (SSTO)
    Explained variation   = Σ (ŷi − ȳ)²    Regression Sum of Squares (SSR)
    Unexplained variation = Σ (yi − ŷi)²   Error Sum of Squares (SSE)

Multiple correlation coefficient: R = √R²


The Adjusted R²

The adjusted multiple coefficient of determination is

    R̄² = ( R² − k/(n − 1) ) · ( (n − 1) / (n − (k + 1)) )

Example: The Fuel Consumption Case
S = 0.3671   R-Sq = 97.4%   R-Sq(adj) = 96.3%
Analysis of Variance
Source          DF      SS      MS      F       P
Regression       2  24.875  12.438  92.30   0.000
Residual Error   5   0.674   0.135
Total            7  25.549

    R² = 24.875 / 25.549 = 0.974
    R̄² = ( 0.974 − 2/(8 − 1) ) · ( (8 − 1) / (8 − (2 + 1)) ) = 0.963

F Test for the Linear Regression Model

To test H0: β1 = β2 = … = βk = 0 versus
Ha: At least one of β1, β2, …, βk is not equal to 0

Test Statistic:

    F(model) = [ (Explained variation) / k ] / [ (Unexplained variation) / (n − (k + 1)) ]

Reject H0 in favor of Ha if:
    F(model) > Fα, or
    p-value < α
Fα is based on k numerator and n − (k + 1) denominator degrees of freedom.


Example: F Test for the Linear Regression Model

Example: The Fuel Consumption Case

Analysis of Variance
Source          DF      SS      MS      F       P
Regression       2  24.875  12.438  92.30   0.000
Residual Error   5   0.674   0.135
Total            7  25.549

Test Statistic:
    F(model) = [(Explained variation)/k] / [(Unexplained variation)/(n − (k + 1))]
             = (24.875 / 2) / (0.674 / (8 − 3)) = 92.30

Reject H0 at the α = 0.05 level of significance, since
    F(model) = 92.30 > 5.79 = F.05  and
    p-value ≈ 0.000 < 0.05 = α
Fα is based on 2 numerator and 5 denominator degrees of freedom.
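The F statistic, critical value, and p-value can be checked with SciPy; a sketch using the values read off the ANOVA table above:

```python
from scipy.stats import f

# Values from the ANOVA table above
SSR, SSE, n, k = 24.875, 0.674, 8, 2
F = (SSR / k) / (SSE / (n - (k + 1)))      # F(model)
F_crit = f.ppf(0.95, k, n - (k + 1))       # F.05 with 2 and 5 d.f.
p_value = f.sf(F, k, n - (k + 1))          # upper-tail p-value
print(round(F, 2), round(F_crit, 2), p_value)
```

The rounded table values reproduce F ≈ 92.3, F.05 ≈ 5.79, and a p-value well below 0.001.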

Testing the Significance of an Independent Variable

If the regression assumptions hold, we can reject H0: βj = 0 at the α level of significance (probability of Type I error equal to α) if and only if the appropriate rejection point condition holds or, equivalently, if the corresponding p-value is less than α.

Alternative      Reject H0 if:                p-Value
Ha: βj > 0       t > tα                       Area under the t distribution to the right of t
Ha: βj < 0       t < −tα                      Area under the t distribution to the left of t
Ha: βj ≠ 0       |t| > tα/2, that is,         Twice the area under the t distribution to the right of |t|
                 t > tα/2 or t < −tα/2

Test Statistic:                  100(1 − α)% Confidence Interval for βj:
    t = bj / s_bj                    [ bj ± tα/2 s_bj ]

tα, tα/2 and p-values are based on n − (k + 1) degrees of freedom.


Example: Testing and Estimation for the βs

Example: The Fuel Consumption Case

Predictor     Coef      StDev      T       P
Constant   13.1087     0.8557   15.32   0.000
Temp       -0.09001    0.01408  -6.39   0.001
Chill       0.08249    0.02200   3.75   0.013

Test:
    t = b2 / s_b2 = 0.08249 / 0.02200 = 3.75 > 2.571 = t.025
    p-value = 2 × P(t > 3.75) = 0.013

Interval:
    [ b2 ± tα/2 s_b2 ] = [ 0.08249 ± (2.571)(0.02200) ]
                       = [ 0.08249 ± 0.05656 ] = [ 0.02593, 0.13905 ]

Chill is significant at the α = 0.05 level, but not at α = 0.01.
tα, tα/2 and p-values are based on 5 degrees of freedom.
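A sketch of the same t test and confidence interval in SciPy, using the coefficient and standard error from the Minitab output above:

```python
from scipy.stats import t

b2, s_b2, df = 0.08249, 0.02200, 5         # Chill coefficient, its StDev, n-(k+1)
t_stat = b2 / s_b2                         # test statistic
p_value = 2 * t.sf(abs(t_stat), df)        # two-sided p-value
t_crit = t.ppf(0.975, df)                  # t.025 with 5 d.f.
ci = (b2 - t_crit * s_b2, b2 + t_crit * s_b2)
print(round(t_stat, 2), round(p_value, 3), round(t_crit, 3),
      tuple(round(v, 5) for v in ci))
```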

Confidence and Prediction Intervals

Prediction:  ŷ = b0 + b1 x01 + b2 x02 + … + bk x0k

If the regression assumptions hold:

100(1 − α)% confidence interval for the mean value of y:

    [ ŷ ± tα/2 s √(Distance value) ]

100(1 − α)% prediction interval for an individual value of y:

    [ ŷ ± tα/2 s √(1 + Distance value) ]

The Distance value requires matrix algebra; see Appendix G on the CD-ROM.
tα/2 is based on n − (k + 1) degrees of freedom.


Example: Confidence and Prediction Intervals

Example: The Fuel Consumption Case

FuelCons = 13.1 - 0.0900 Temp + 0.0825 Chill

Predicted Values (Temp = 40, Chill = 10)

Fit      StDev Fit    95.0% CI           95.0% PI
10.333   0.170        (9.895, 10.771)    (9.293, 11.374)

95% Confidence Interval:
    [ ŷ ± tα/2 s √(Distance value) ] = [ 10.333 ± (2.571)(0.3671)√0.2144515 ]
                                     = [ 10.333 ± 0.438 ] = [ 9.895, 10.771 ]

95% Prediction Interval:
    [ ŷ ± tα/2 s √(1 + Distance value) ] = [ 10.333 ± (2.571)(0.3671)√(1 + 0.2144515) ]
                                         = [ 10.333 ± 1.041 ] = [ 9.292, 11.374 ]
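The interval arithmetic above can be sketched in SciPy, plugging in the fit, standard error, and distance value shown:

```python
from scipy.stats import t

# Values from the output above
s, dist, y_hat, df = 0.3671, 0.2144515, 10.333, 5
t_crit = t.ppf(0.975, df)                    # t.025 with 5 d.f. ≈ 2.571
ci_half = t_crit * s * dist ** 0.5           # confidence-interval half-width
pi_half = t_crit * s * (1 + dist) ** 0.5     # prediction-interval half-width
print((round(y_hat - ci_half, 3), round(y_hat + ci_half, 3)),
      (round(y_hat - pi_half, 3), round(y_hat + pi_half, 3)))
```

The prediction interval is wider because it adds the variance of an individual error term to the variance of the estimated mean.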

The Quadratic Regression Model

Model:  y = β0 + β1 x + β2 x² + ε


Example: Quadratic Regression Model

Example: The Gasoline Additive Case

Units of Additive, x    Mileage, y (MPG)
0 25.8
0 26.1
0 25.4
1 29.6
1 29.2
1 29.8
2 32.0
2 31.4
2 31.7
3 31.7
3 31.5
3 31.2
4 29.4
4 29.0
4 29.5

Example: Quadratic Regression Model

Example: The Gasoline Additive Case


Mileage = 25.7 + 4.98 Units - 1.02 UnitsSq
Predictor Coef StDev T P
Constant 25.7152 0.1554 165.43 0.000
Units 4.9762 0.1841 27.02 0.000
UnitsSq -1.01905 0.04414 -23.09 0.000
S = 0.2861 R-Sq = 98.6% R-Sq(adj) = 98.3%
Analysis of Variance
Source DF SS MS F P
Regression 2 67.915 33.958 414.92 0.000
Residual Error 12 0.982 0.082
Total 14 68.897
Predicted Values (Units = 2.44, UnitsSq = (2.44)(2.44) = 5.9536)
Fit StDev Fit 95.0% CI 95.0% PI
31.7901 0.1111 ( 31.5481, 32.0322) ( 31.1215, 32.4588)

ŷ = 25.7152 + 4.9762(2.44) − 1.01905(2.44)² = 31.7901 mpg
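The quadratic fit and the prediction at x = 2.44 can be reproduced with NumPy, using the fifteen observations from the data table; a sketch:

```python
import numpy as np

x = np.repeat([0.0, 1.0, 2.0, 3.0, 4.0], 3)           # units of additive
y = np.array([25.8, 26.1, 25.4, 29.6, 29.2, 29.8,
              32.0, 31.4, 31.7, 31.7, 31.5, 31.2,
              29.4, 29.0, 29.5])                       # mileage (MPG)

X = np.column_stack([np.ones_like(x), x, x ** 2])      # quadratic design matrix
b, *_ = np.linalg.lstsq(X, y, rcond=None)
x0 = 2.44
y_hat = b[0] + b[1] * x0 + b[2] * x0 ** 2
print(np.round(b, 4), round(y_hat, 4))
```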


Interaction

Example 12.13: The Bonner Frozen Foods Case

Sales    Radio and TV        Print               Sales
Region   Expenditures, x1    Expenditures, x2    Volume, y
1        1                   1                    3.27
2        1                   2                    8.38
3        1                   3                   11.28
4        1                   4                   14.5
5        1                   5                   19.63
6        2                   1                    5.84
7        2                   2                   10.01
8        2                   3                   12.46
9        2                   4                   16.67
10       2                   5                   19.83
11       3                   1                    8.51
12       3                   2                   10.14
13       3                   3                   14.75
14       3                   4                   17.99
15       3                   5                   19.85
16       4                   1                    9.46
17       4                   2                   12.61
18       4                   3                   15.5
19       4                   4                   17.68
20       4                   5                   21.02
21       5                   1                   12.23
22       5                   2                   13.58
23       5                   3                   16.77
24       5                   4                   20.56
25       5                   5                   21.05

Modeling Interaction

Model:  y = β0 + β1 x1 + β2 x2 + β3 x1 x2 + ε, where x1 x2 is a cross-product or interaction term

Example: The Bonner Frozen Foods Case (Minitab Output)
Sales = - 2.35 + 2.36 RadioTV + 4.18 Print - 0.349 Interact
Predictor Coef StDev T P
Constant -2.3497 0.6883 -3.41 0.003
RadioTV 2.3611 0.2075 11.38 0.000
Print 4.1831 0.2075 20.16 0.000
Interact -0.34890 0.06257 -5.58 0.000
S = 0.6257 R-Sq = 98.6% R-Sq(adj) = 98.4%
Analysis of Variance
Source DF SS MS F P
Regression 3 590.41 196.80 502.67 0.000
Residual Error 21 8.22 0.39
Total 24 598.63
Predicted Values (RadioTV = 2, Print = 5, Interact=(2)(5) = 10)
Fit StDev Fit 95.0% CI 95.0% PI
19.799 0.265 ( 19.247, 20.351) ( 18.385, 21.213)
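The interaction fit can be reproduced with NumPy by adding a cross-product column to the design matrix; a sketch using the 25 sales regions from the table above:

```python
import numpy as np

x1 = np.repeat([1.0, 2.0, 3.0, 4.0, 5.0], 5)   # radio and TV expenditures
x2 = np.tile([1.0, 2.0, 3.0, 4.0, 5.0], 5)     # print expenditures
y = np.array([3.27, 8.38, 11.28, 14.5, 19.63,
              5.84, 10.01, 12.46, 16.67, 19.83,
              8.51, 10.14, 14.75, 17.99, 19.85,
              9.46, 12.61, 15.5, 17.68, 21.02,
              12.23, 13.58, 16.77, 20.56, 21.05])

# Interaction model: intercept, x1, x2, and the cross-product x1*x2
X = np.column_stack([np.ones_like(y), x1, x2, x1 * x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b, 4))  # compare with the Minitab coefficients above
```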


Using Dummy Variables to Model a Qualitative Independent Variable

Example: The Electronics World Case

Store   Number of Households, x   Location   Location Dummy, DM   Sales Volume, y
1 161 Street 0 157.27
2 99 Street 0 93.28
3 135 Street 0 136.81
4 120 Street 0 123.79
5 164 Street 0 153.51
6 221 Mall 1 241.74
7 179 Mall 1 201.54
8 204 Mall 1 206.71
9 214 Mall 1 229.78
10 101 Mall 1 135.22

Location Dummy Variable:

    DM = 1 if a store is in a mall location
         0 otherwise

Example: Regression with a Dummy Variable

Example: The Electronics World Case

Sales = 17.4 + 0.851 Households + 29.2 DM

Predictor Coef StDev T P


Constant 17.360 9.447 1.84 0.109
Househol 0.85105 0.06524 13.04 0.000
DM 29.216 5.594 5.22 0.001

S = 7.329 R-Sq = 98.3% R-Sq(adj) = 97.8%

Analysis of Variance

Source DF SS MS F P
Regression 2 21412 10706 199.32 0.000
Residual Error 7 376 54
Total 9 21788
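The dummy-variable fit can be reproduced with NumPy by including the 0/1 location column in the design matrix; a sketch using the ten stores from the data table:

```python
import numpy as np

hh = np.array([161, 99, 135, 120, 164, 221, 179, 204, 214, 101], dtype=float)
dm = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)  # 1 = mall location
y = np.array([157.27, 93.28, 136.81, 123.79, 153.51,
              241.74, 201.54, 206.71, 229.78, 135.22])

X = np.column_stack([np.ones_like(y), hh, dm])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b, 3))  # intercept, households slope, mall-location shift
```

The coefficient on DM (≈ 29.2) estimates how much higher mean sales volume is for a mall store than for a street store with the same number of households.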


The Partial F Test: Testing the Significance of a Portion of a Regression Model

Complete model:  y = β0 + β1 x1 + … + βg xg + βg+1 xg+1 + … + βk xk + ε
Reduced model:   y = β0 + β1 x1 + … + βg xg + ε

To test H0: βg+1 = βg+2 = … = βk = 0 versus
Ha: At least one of βg+1, βg+2, …, βk is not equal to 0

Partial F Statistic:

    F = [ (SSE_R − SSE_C) / (k − g) ] / [ SSE_C / (n − (k + 1)) ]

Reject H0 in favor of Ha if:
    F > Fα, or
    p-value < α
Fα is based on k − g numerator and n − (k + 1) denominator degrees of freedom.
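A sketch of the partial F computation in SciPy. The SSE values, n, k, and g below are hypothetical numbers chosen only to illustrate the formula, not taken from any case in these notes:

```python
from scipy.stats import f

# Hypothetical illustration: complete model with k = 4 predictors (SSE_C),
# reduced model keeping g = 2 of them (SSE_R), n = 30 observations
SSE_R, SSE_C, n, k, g = 15.4, 10.2, 30, 4, 2
F = ((SSE_R - SSE_C) / (k - g)) / (SSE_C / (n - (k + 1)))
p_value = f.sf(F, k - g, n - (k + 1))   # k-g numerator, n-(k+1) denominator d.f.
print(round(F, 3), p_value)
```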

Model Building and the Effects of Multicollinearity

Example: The Sales Territory Performance Case
Sales Time MktPoten Adver MktShare Change Accts WkLoad Rating
3669.88 43.10 74065.11 4582.88 2.51 0.34 74.86 15.05 4.9
3473.95 108.13 58117.30 5539.78 5.51 0.15 107.32 19.97 5.1
2295.10 13.82 21118.49 2950.38 10.91 -0.72 96.75 17.34 2.9
4675.56 186.18 68521.27 2243.07 8.27 0.17 195.12 13.40 3.4
6125.96 161.79 57805.11 7747.08 9.15 0.50 180.44 17.64 4.6
2134.94 8.94 37806.94 402.44 5.51 0.15 104.88 16.22 4.5
5031.66 365.04 50935.26 3140.62 8.54 0.55 256.10 18.80 4.6
3367.45 220.32 35602.08 2086.16 7.07 -0.49 126.83 19.86 2.3
6519.45 127.64 46176.77 8846.25 12.54 1.24 203.25 17.42 4.9
4876.37 105.69 42053.24 5673.11 8.85 0.31 119.51 21.41 2.8
2468.27 57.72 36829.71 2761.76 5.38 0.37 116.26 16.32 3.1
2533.31 23.58 33612.67 1991.85 5.43 -0.65 142.28 14.51 4.2
2408.11 13.82 21412.79 1971.52 8.48 0.64 89.43 19.35 4.3
2337.38 13.82 20416.87 1737.38 7.80 1.01 84.55 20.02 4.2
4586.95 86.99 36272.00 10694.20 10.34 0.11 119.51 15.26 5.5
2729.24 165.85 23093.26 8618.61 5.15 0.04 80.49 15.87 3.6
3289.40 116.26 26879.59 7747.89 6.64 0.68 136.58 7.81 3.4
2800.78 42.28 39571.96 4565.81 5.45 0.66 78.86 16.00 4.2
3264.20 52.84 51866.15 6022.70 6.31 -0.10 136.58 17.44 3.6
3453.62 165.04 58749.82 3721.10 6.35 -0.03 138.21 17.98 3.1
1741.45 10.57 23990.82 860.97 7.37 -1.63 75.61 20.99 1.6
2035.75 13.82 25694.86 3571.51 8.39 -0.43 102.44 21.66 3.4
1578.00 8.13 23736.35 2845.50 5.15 0.04 76.42 21.46 2.7
4167.44 58.54 34314.29 5060.11 12.88 0.22 136.58 24.78 2.8
2799.97 21.14 22809.53 3552.00 9.14 -0.74 88.62 24.96 3.9


Correlation Matrix

Example: The Sales Territory Performance Case
[Correlation matrix output omitted]

Multicollinearity

Multicollinearity refers to the condition where the independent variables (or predictors) in a model are dependent, related, or correlated with each other.

Effects
Hinders the ability to use the bj's, t statistics, and p-values to assess the relative importance of predictors.
Does not hinder the ability to predict the dependent (or response) variable.

Detection
Scatter Plot Matrix
Correlation Matrix
Variance Inflation Factors (VIF)


Variance Inflation Factors (VIF)

The variance inflation factor for the jth independent (or predictor) variable xj is

    VIFj = 1 / (1 − Rj²)

where Rj² is the multiple coefficient of determination for the regression model relating xj to the other predictors x1, …, xj−1, xj+1, …, xk:

    xj = β0 + β1 x1 + … + βj−1 xj−1 + βj+1 xj+1 + … + βk xk + ε

Notes:
    VIFj = 1 implies xj is not related to the other predictors.
    max(VIFj) > 10 suggests severe multicollinearity.
    mean(VIFj) substantially greater than 1 suggests severe multicollinearity.
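A sketch of the VIF computation in NumPy: each predictor is regressed on the others and VIF = 1/(1 − Rj²). For illustration it reuses the two fuel consumption predictors from Part 1 (with only two predictors, both VIFs reduce to 1/(1 − r²) for their correlation r, so they coincide):

```python
import numpy as np

def vif(xj, others):
    """VIF of predictor xj, given a list of the other predictor columns."""
    X = np.column_stack([np.ones(len(xj))] + others)
    b, *_ = np.linalg.lstsq(X, xj, rcond=None)
    resid = xj - X @ b
    r2 = 1 - (resid @ resid) / ((xj - xj.mean()) @ (xj - xj.mean()))
    return 1.0 / (1.0 - r2)

# Fuel consumption predictors from Part 1
x1 = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])
x2 = np.array([18, 14, 24, 22, 8, 16, 1, 0], dtype=float)
print(round(vif(x1, [x2]), 3), round(vif(x2, [x1]), 3))
```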

Example: Variance Inflation Factors (VIF)

Example: The Sales Territory Performance Case (MegaStat Output)

Regression output confidence interval


variables coefficients std. error t (df=16) p-value 95% lower 95% upper VIF
Intercept -1,507.8137 778.6349 -1.936 .0707 -3,158.4457 142.8182
Time 2.0096 1.9307 1.041 .3134 -2.0832 6.1024 3.343
MktPoten 0.0372 0.0082 4.536 .0003 0.0198 0.0546 1.978
Adver 0.1510 0.0471 3.205 .0055 0.0511 0.2509 1.910
MktShare 199.0235 67.0279 2.969 .0090 56.9307 341.1164 3.236
Change 290.8551 186.7820 1.557 .1390 -105.1049 686.8152 1.602
Accts 5.5510 4.7755 1.162 .2621 -4.5728 15.6747 5.639
WkLoad 19.7939 33.6767 0.588 .5649 -51.5975 91.1853 1.818
Rating 8.1893 128.5056 0.064 .9500 -264.2304 280.6090 1.809
mean VIF                                                                  2.667

max(VIFj) = 5.639, mean(VIFj) = 2.667: probably not severe multicollinearity


Residual Analysis in Multiple Regression

For an observed value of yi, the residual is

    ei = yi − ŷi = yi − (b0 + b1 xi1 + … + bk xik)


If the regression assumptions hold, the residuals should
look like a random sample from a normal distribution
with mean 0 and variance σ2.
Residual Plots
Residuals versus each independent variable
Residuals versus predicted y’s
Residuals in time order (if the response is a time series)
Histogram of residuals
Normal plot of the residuals
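The residuals and fitted values needed for the plots above can be computed directly; a sketch using the fuel consumption fit from Part 1 (the arrays `resid` and `fitted` are what you would pass to a plotting library):

```python
import numpy as np

# Fuel consumption fit from Part 1
x1 = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])
x2 = np.array([18, 14, 24, 22, 8, 16, 1, 0], dtype=float)
y = np.array([12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5])

X = np.column_stack([np.ones_like(y), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ b
resid = y - fitted   # plot resid vs x1, vs x2, vs fitted, and in time order
# With an intercept in the model, least squares forces the residuals
# to sum to (numerically) zero:
print(float(resid.mean()))
```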

Nonconstant Variance: Remedial Measures

Example: The QHIC Case. Transformed model:  y/x = β0 (1/x) + β1 + β2 x + η
Upkeep/V = - 53.5 1/V + 3.41 One + 0.0112 Value
Predictor Coef SE Coef T P
Noconstant
1/V -53.50 83.20 -0.64 0.524
One 3.409 1.321 2.58 0.014
Value 0.011224 0.004627 2.43 0.020
Predicted Values (1/V = 0.004545, One = 1, Value = 220)
Fit SE Fit 95.0% CI 95.0% PI
5.635 0.162 ( 5.306, 5.964) ( 3.994, 7.276)

Plots: residuals versus x and versus the predicted responses


Diagnostics for Detecting Outlying and Influential Observations

Observation 1: Outlying with respect to its y value
Observation 2: Outlying with respect to its x value
Observation 3: Outlying with respect to its x value, with a y value not consistent with the regression relationship (influential)

Example: Influence Diagnostics

Hospital Labor Needs Case. Model:
    y  = monthly labor hours required
    x1 = monthly X-ray exposures
    x2 = monthly occupied bed days
    x3 = average length of patient stay (days)

Obs   Hours   Predicted   Residual   Leverage   Studentized Residual   Studentized Deleted Residual   Cook's D
1 566.520 688.409 -121.889 0.121 -0.211 -0.203 0.002
2 696.820 721.848 -25.028 0.226 -0.046 -0.044 0.000
3 1,033.150 965.393 67.757 0.130 0.118 0.114 0.001
4 1,603.620 1,172.464 431.156 0.159 0.765 0.752 0.028
5 1,611.370 1,526.780 84.590 0.085 0.144 0.138 0.000
6 1,613.270 1,993.869 -380.599 0.112 -0.657 -0.642 0.014
7 1,854.170 1,676.558 177.612 0.084 0.302 0.291 0.002
8 2,160.550 1,791.405 369.145 0.083 0.627 0.612 0.009
9 2,305.580 2,798.761 -493.181 0.085 -0.838 -0.828 0.016
10 3,503.930 4,191.333 -687.403 0.120 -1.192 -1.214 0.049
11 3,571.890 3,190.957 380.933 0.077 0.645 0.630 0.009
12 3,741.400 4,364.502 -623.102 0.177 -1.117 -1.129 0.067
13 4,026.520 4,364.229 -337.709 0.064 -0.568 -0.553 0.006
14 10,343.810 8,713.307 1,630.503 0.146 2.871 4.558 0.353
15 11,732.170 12,080.864 -348.694 0.682 -1.005 -1.006 0.541
16 15,414.940 15,133.026 281.914 0.785 0.990 0.989 0.897
17 18,854.450 19,260.453 -406.003 0.863 -1.786 -1.975 5.033


Leverage Values

Leverage = distance value (hi). An observation is outlying with respect to x if it has a large leverage, greater than 2(k+1)/n.

Hospital Labor Needs Case: n = 17, k = 3, so the cutoff is 2(3+1)/17 = 0.4706; observations 15, 16, and 17 exceed it.
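Leverages are the diagonal entries of the hat matrix H = X(XᵀX)⁻¹Xᵀ. A sketch on the fuel consumption design matrix from Part 1 (the hospital data's x columns are not listed in these notes, so the smaller case is used for illustration):

```python
import numpy as np

# Fuel consumption design matrix from Part 1
x1 = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])
x2 = np.array([18, 14, 24, 22, 8, 16, 1, 0], dtype=float)
X = np.column_stack([np.ones(8), x1, x2])
n, k = X.shape[0], X.shape[1] - 1

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
leverage = np.diag(H)                  # h_i, the distance values
cutoff = 2 * (k + 1) / n               # 2(k+1)/n rule of thumb
print(np.round(leverage, 3), cutoff)
```

A useful check: the leverages always sum to k + 1, the number of estimated parameters.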

Residuals and Studentized Residuals

    Studentized Residual = Residual / Residual Standard Error:  e′i = ei / ( s √(1 − hi) )

An observation is outlying with respect to y if it has a large studentized (or standardized) residual, |e′i| greater than 2.


Studentized Deleted Residuals

    Studentized Deleted Residual = Deleted Residual / Deleted Residual Standard Error:
    ti = di / sdi = ei √( (n − k − 2) / ( SSE(1 − hi) − ei² ) )

An observation is outlying with respect to y if it has a large studentized deleted residual, |ti| greater than tα/2 [with n − k − 2 d.f.].
Hospital Labor Needs Case: 17 − 3 − 2 = 12 d.f., so the cutoff is |ti| > t.025 = 2.179; observation 14 (ti = 4.558) exceeds it.

Cook's Distance

    Cook's Distance = Di = [ ei² / ((k + 1) s²) ] · [ hi / (1 − hi)² ]

An observation is influential with respect to the estimated regression parameters b0, b1, …, bk if it has a large Cook's distance, Di greater than F.50 [with k + 1 and n − (k + 1) d.f.].
Hospital Labor Needs Case: k + 1 = 4 and 17 − (3 + 1) = 13 d.f., so the cutoff is Di > F.50 = 0.8845; observations 16 (Di = 0.897) and 17 (Di = 5.033) exceed it.
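Studentized residuals and Cook's distances can both be built from the residuals and leverages; a sketch on the fuel consumption fit from Part 1 (again used because the hospital case's x columns are not listed in these notes):

```python
import numpy as np

# Fuel consumption fit from Part 1
x1 = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])
x2 = np.array([18, 14, 24, 22, 8, 16, 1, 0], dtype=float)
y = np.array([12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5])
X = np.column_stack([np.ones_like(y), x1, x2])
n, k = len(y), 2

b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
s2 = (e @ e) / (n - (k + 1))                        # MSE
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)       # leverages

stud = e / np.sqrt(s2 * (1 - h))                    # studentized residuals
cooks = e ** 2 / ((k + 1) * s2) * h / (1 - h) ** 2  # Cook's distances
print(np.round(stud, 2), np.round(cooks, 3))
```

Note the algebraic link: Di = e′i² · hi / ((k + 1)(1 − hi)), so a point needs both a large residual and a large leverage to be influential.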
