13) Multicollinearity
18) Summary
Division of the numerator and denominator by (n - 1) gives
$$
r = \frac{\sum_{i=1}^{n} \frac{(X_i - \bar{X})(Y_i - \bar{Y})}{n-1}}
         {\sqrt{\sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n-1}} \; \sqrt{\sum_{i=1}^{n} \frac{(Y_i - \bar{Y})^2}{n-1}}}
  = \frac{COV_{xy}}{S_x S_y}
$$
© 2007 Prentice Hall 17-6
Product Moment Correlation
Respondent No.   Attitude (Y)   Duration of residence (X)   Third variable
2                9              12                          11
3                8              12                          4
4                3              4                           1
5                10             12                          11
6                4              6                           1
7                5              8                           7
8                2              2                           4
9                11             18                          8
10               9              9                           10
11               10             17                          8
12               2              2                           5
The correlation coefficient may be calculated as follows:
$$\bar{X} = (10 + 12 + 12 + 4 + 12 + 6 + 8 + 2 + 18 + 9 + 17 + 2)/12 = 9.333$$
$$\bar{Y} = (6 + 9 + 8 + 3 + 10 + 4 + 5 + 2 + 11 + 9 + 10 + 2)/12 = 6.583$$
$$
\begin{aligned}
\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})
 &= (10 - 9.33)(6 - 6.58) + (12 - 9.33)(9 - 6.58) \\
 &\quad + (12 - 9.33)(8 - 6.58) + (4 - 9.33)(3 - 6.58) \\
 &\quad + (12 - 9.33)(10 - 6.58) + (6 - 9.33)(4 - 6.58) \\
 &\quad + (8 - 9.33)(5 - 6.58) + (2 - 9.33)(2 - 6.58) \\
 &\quad + (18 - 9.33)(11 - 6.58) + (9 - 9.33)(9 - 6.58) \\
 &\quad + (17 - 9.33)(10 - 6.58) + (2 - 9.33)(2 - 6.58) \\
 &= -0.3886 + 6.4614 + 3.7914 + 19.0814 \\
 &\quad + 9.1314 + 8.5914 + 2.1014 + 33.5714 \\
 &\quad + 38.3214 - 0.7986 + 26.2314 + 33.5714 \\
 &= 179.6668
\end{aligned}
$$
$$
\begin{aligned}
\sum_{i=1}^{n} (X_i - \bar{X})^2
 &= (10 - 9.33)^2 + (12 - 9.33)^2 + (12 - 9.33)^2 + (4 - 9.33)^2 \\
 &\quad + (12 - 9.33)^2 + (6 - 9.33)^2 + (8 - 9.33)^2 + (2 - 9.33)^2 \\
 &\quad + (18 - 9.33)^2 + (9 - 9.33)^2 + (17 - 9.33)^2 + (2 - 9.33)^2 \\
 &= 0.4489 + 7.1289 + 7.1289 + 28.4089 \\
 &\quad + 7.1289 + 11.0889 + 1.7689 + 53.7289 \\
 &\quad + 75.1689 + 0.1089 + 58.8289 + 53.7289 \\
 &= 304.6668
\end{aligned}
$$
$$
\begin{aligned}
\sum_{i=1}^{n} (Y_i - \bar{Y})^2
 &= (6 - 6.58)^2 + (9 - 6.58)^2 + (8 - 6.58)^2 + (3 - 6.58)^2 \\
 &\quad + (10 - 6.58)^2 + (4 - 6.58)^2 + (5 - 6.58)^2 + (2 - 6.58)^2 \\
 &\quad + (11 - 6.58)^2 + (9 - 6.58)^2 + (10 - 6.58)^2 + (2 - 6.58)^2 \\
 &= 0.3364 + 5.8564 + 2.0164 + 12.8164 \\
 &\quad + 11.6964 + 6.6564 + 2.4964 + 20.9764 \\
 &\quad + 19.5364 + 5.8564 + 11.6964 + 20.9764 \\
 &= 120.9168
\end{aligned}
$$
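The three sums above are all that is needed to obtain r. The following is a minimal Python sketch of that calculation, using the X and Y values listed in the means computation; the function name `pearson_r` is chosen here for illustration and is not part of the slides.

```python
from math import sqrt

# Duration of residence (X) and attitude (Y), as listed in the mean calculations.
X = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]
Y = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]


def pearson_r(x, y):
    """Product moment correlation from the three sums of deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    # The (n - 1) divisors cancel, so the raw sums can be used directly.
    return s_xy / sqrt(s_xx * s_yy)


r = pearson_r(X, Y)
print(round(r, 4))  # approximately 0.9361
```

Because the code keeps full precision in the means rather than rounding to 9.33 and 6.58, the intermediate sums differ from the hand calculation in the last decimal place.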
$$
r^2 = \frac{\text{Total variation} - \text{Error variation}}{\text{Total variation}}
    = \frac{SS_y - SS_{error}}{SS_y}
$$
[Figure: plot of Y against X, with the X axis running from -3 to 3]
Partial Correlation
A partial correlation coefficient measures the
association between two variables after controlling for,
or adjusting for, the effects of one or more additional
variables.
$$
r_{xy.z} = \frac{r_{xy} - (r_{xz})(r_{yz})}{\sqrt{1 - r_{xz}^2} \; \sqrt{1 - r_{yz}^2}}
$$
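As a small illustration, the partial correlation formula can be written as a function of the three simple correlations. This is only a sketch: the function name `partial_corr` and the input values 0.6, 0.5, and 0.4 are hypothetical and do not come from the slides.

```python
from math import sqrt


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / (sqrt(1 - r_xz ** 2) * sqrt(1 - r_yz ** 2))


# Hypothetical simple correlations, for illustration only.
r_xy_z = partial_corr(r_xy=0.6, r_xz=0.5, r_yz=0.4)
print(round(r_xy_z, 4))
```

When z is uncorrelated with both x and y, the function returns r_xy unchanged, which is a quick sanity check on the formula.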
Statistics Associated with Bivariate Regression Analysis
Regression coefficient. The estimated parameter
b is usually referred to as the non-standardized
regression coefficient.
Scattergram. A scatter diagram, or scattergram, is a
plot of the values of two variables for all the cases or
observations.
Standard error of estimate. This statistic, SEE, is
the standard deviation of the actual Y values from the
predicted Y values.
Standard error. The standard deviation of b, $SE_b$, is
called the standard error.
$$Y_i = \beta_0 + \beta_1 X_i + e_i$$
where
Y = dependent or criterion variable
X = independent or predictor variable
$\beta_0$ = intercept of the line
$\beta_1$ = slope of the line
[Fig. 17.3: plot of Attitude against Duration of Residence, with candidate lines labeled Line 2, Line 3, and Line 4]
[Fig. 17.5: the line $\beta_0 + \beta_1 X$ plotted as Y against X (X1 to X5), with observed values $Y_j$ and errors $e_j$ marked]
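$$\hat{Y}_i = a + b X_i$$
where $\hat{Y}_i$ is the estimated or predicted value of $Y_i$, and
a and b are estimators of $\beta_0$ and $\beta_1$, respectively.
$$
b = \frac{COV_{xy}}{S_x^2}
  = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
  = \frac{\sum_{i=1}^{n} X_i Y_i - n \bar{X} \bar{Y}}{\sum_{i=1}^{n} X_i^2 - n \bar{X}^2}
$$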
Conducting Bivariate Regression Analysis
Estimate the Parameters
$$a = \bar{Y} - b \bar{X}$$
For the data in Table 17.1, the estimation of parameters may be
illustrated as follows:
$$
\begin{aligned}
\sum_{i=1}^{12} X_i Y_i
 &= (10)(6) + (12)(9) + (12)(8) + (4)(3) + (12)(10) + (6)(4) \\
 &\quad + (8)(5) + (2)(2) + (18)(11) + (9)(9) + (17)(10) + (2)(2) \\
 &= 917
\end{aligned}
$$
$$
\begin{aligned}
\sum_{i=1}^{12} X_i^2
 &= 10^2 + 12^2 + 12^2 + 4^2 + 12^2 + 6^2 \\
 &\quad + 8^2 + 2^2 + 18^2 + 9^2 + 17^2 + 2^2 \\
 &= 1350
\end{aligned}
$$
It may be recalled from earlier calculations of the simple correlation
that:
$$\bar{X} = 9.333 \qquad \bar{Y} = 6.583$$
Given n = 12, b can be calculated as:
$$
b = \frac{917 - (12)(9.333)(6.583)}{1350 - (12)(9.333)^2} = 0.5897
$$
$$
a = \bar{Y} - b \bar{X} = 6.583 - (0.5897)(9.333) = 1.0793
$$
Attitude ($\hat{Y}$) = 1.0793 + 0.5897 (Duration of residence)
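The parameter estimates can be reproduced with a few lines of Python. This is a sketch that applies the computational formula for b to the twelve (X, Y) pairs used in the sums; the variable names are chosen here for illustration.

```python
# Duration of residence (X) and attitude (Y) for the 12 respondents.
X = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]
Y = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n
sum_xy = sum(x * y for x, y in zip(X, Y))  # 917
sum_x2 = sum(x * x for x in X)             # 1350

# Slope from the computational formula, then the intercept.
b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
a = y_bar - b * x_bar

print(f"Attitude = {a:.4f} + {b:.4f} * Duration")
```

To four decimals the estimates agree with the hand calculation (a = 1.0793, b = 0.5897).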
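The total variation decomposes as
$$SS_y = SS_{reg} + SS_{res}$$
where
$$SS_y = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$
$$SS_{reg} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$$
$$SS_{res} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$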
[Fig. 17.6: total variation $SS_y$ split into explained variation $SS_{reg}$ and residual variation $SS_{res}$, plotted as Y against X (X1 to X5)]
The strength of association may then be calculated as follows:
$$
r^2 = \frac{SS_{reg}}{SS_y} = \frac{SS_y - SS_{res}}{SS_y}
$$
To illustrate the calculation of $r^2$, let us consider again the effect of duration
of residence on attitude toward the city. It may be recalled from earlier
calculations of the simple correlation coefficient that:
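$$SS_y = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = 120.9168$$
The predicted values $\hat{Y}_i$ follow from the estimated regression equation. For the
first observation, $\hat{Y} = 1.0793 + 0.5897(10) = 6.9763$. Therefore,
$$
\begin{aligned}
SS_{reg} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2
 &= (6.9763 - 6.5833)^2 + (8.1557 - 6.5833)^2 \\
 &\quad + (8.1557 - 6.5833)^2 + (3.4381 - 6.5833)^2 \\
 &\quad + (8.1557 - 6.5833)^2 + (4.6175 - 6.5833)^2 \\
 &\quad + (5.7969 - 6.5833)^2 + (2.2587 - 6.5833)^2 \\
 &\quad + (11.6939 - 6.5833)^2 + (6.3866 - 6.5833)^2 \\
 &\quad + (11.1042 - 6.5833)^2 + (2.2587 - 6.5833)^2 \\
 &= 0.1544 + 2.4724 + 2.4724 + 9.8922 + 2.4724 \\
 &\quad + 3.8643 + 0.6184 + 18.7021 + 26.1182 \\
 &\quad + 0.0387 + 20.4385 + 18.7021 \\
 &= 105.9524
\end{aligned}
$$
$$SS_{res} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = 14.9644$$
It can be seen that $SS_y = SS_{reg} + SS_{res}$. Furthermore,
$$
r^2 = \frac{SS_{reg}}{SS_y} = \frac{105.9524}{120.9168} = 0.8762
$$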
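The decomposition of the total variation can be checked numerically. The sketch below refits the line to the twelve (X, Y) pairs and computes the three sums of squares; variable names are illustrative, and full-precision arithmetic gives values that differ from the hand calculation only in the later decimals.

```python
# Duration of residence (X) and attitude (Y) for the 12 respondents.
X = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]
Y = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]
n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Least-squares slope and intercept.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sum(
    (x - x_bar) ** 2 for x in X
)
a = y_bar - b * x_bar
y_hat = [a + b * x for x in X]

ss_y = sum((y - y_bar) ** 2 for y in Y)                 # total variation
ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)         # explained variation
ss_res = sum((y - yh) ** 2 for y, yh in zip(Y, y_hat))  # residual variation
r2 = ss_reg / ss_y

print(round(ss_y, 4), round(ss_reg, 4), round(ss_res, 4), round(r2, 4))
```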
$$H_0: R^2_{pop} = 0$$
$$H_0: \beta_1 = 0$$
$$H_1: \beta_1 \neq 0$$
or
$$H_0: \rho = 0$$
$$H_1: \rho \neq 0$$
Conducting Bivariate Regression Analysis
Determine the Strength and Significance of Association
$$r^2 = 105.9522/(105.9522 + 14.9644) = 0.8762$$
which is the same as the value calculated earlier. The value of the
F statistic is:
$$F = 105.9522/(14.9644/10) = 70.8027$$
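The F statistic above is the ratio of the regression mean square to the residual mean square. A minimal sketch follows, assuming k denotes the number of independent variables (k = 1 in the bivariate case); the function name `f_statistic` is illustrative.

```python
def f_statistic(ss_reg, ss_res, n, k=1):
    """Ratio of regression mean square to residual mean square.

    k is the number of independent variables (1 for bivariate regression),
    giving k and n - k - 1 degrees of freedom.
    """
    return (ss_reg / k) / (ss_res / (n - k - 1))


# Sums of squares and sample size taken from the calculation above.
F = f_statistic(ss_reg=105.9522, ss_res=14.9644, n=12)
print(round(F, 2))
```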
Multiple R        0.93608
R²                0.87624
Adjusted R²       0.86387
Standard Error    1.22329
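$$
SEE = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2}}
$$
or
$$
SEE = \sqrt{\frac{SS_{res}}{n - 2}}
$$
In general, with k independent variables:
$$
SEE = \sqrt{\frac{SS_{res}}{n - k - 1}}
$$
For the data given in Table 17.2, the SEE is estimated as follows:
$$
SEE = \sqrt{14.9644/(12 - 2)} = 1.22329
$$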
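The same figure can be obtained in code. This sketch uses the general n - k - 1 divisor, with k = 1 for the bivariate case; the function name is chosen here for illustration.

```python
from math import sqrt


def standard_error_of_estimate(ss_res, n, k=1):
    """Square root of the residual sum of squares over n - k - 1."""
    return sqrt(ss_res / (n - k - 1))


# Residual sum of squares and sample size from the bivariate example.
see = standard_error_of_estimate(ss_res=14.9644, n=12)
print(round(see, 5))
```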
Assumptions
The error term is normally distributed. For each
fixed value of X, the distribution of Y is normal.
The means of all these normal distributions of Y,
given X, lie on a straight line with slope b.
The mean of the error term is 0.
The variance of the error term is constant. This
variance does not depend on the values assumed
by X.
The error terms are uncorrelated. In other words,
the observations have been drawn independently.
$$\hat{Y} = a + b_1 X_1 + b_2 X_2$$
Suppose one were to remove the effect of $X_2$ from $X_1$. This could
be done by running a regression of $X_1$ on $X_2$. In other words,
one would estimate the equation $\hat{X}_1 = a + b X_2$ and calculate
the residual $X_r = (X_1 - \hat{X}_1)$. The partial regression coefficient, $b_1$,
is then equal to the bivariate regression coefficient, $b_r$, obtained
from the equation $\hat{Y} = a + b_r X_r$.
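The equivalence between $b_1$ and $b_r$ can be verified numerically. The sketch below uses the eleven complete data rows listed earlier (respondents 2 to 12) and treats the third data column as $X_2$, which is an assumption made only for illustration; the helper names `simple_ols` and `two_predictor_ols` are likewise hypothetical.

```python
# Rows for respondents 2-12: attitude (Y), duration (X1), and the third
# data column, ASSUMED here to play the role of X2.
Y = [9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]
X1 = [12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]
X2 = [11, 4, 1, 11, 1, 7, 4, 8, 10, 8, 5]


def mean(v):
    return sum(v) / len(v)


def simple_ols(x, y):
    """Return (a, b) for the bivariate fit y_hat = a + b * x."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return my - b * mx, b


def two_predictor_ols(x1, x2, y):
    """Return (a, b1, b2) by solving the normal equations in deviation form."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return my - b1 * m1 - b2 * m2, b1, b2


# Step 1: regress X1 on X2 and keep the residual Xr = X1 - X1_hat.
a12, b12 = simple_ols(X2, X1)
Xr = [u - (a12 + b12 * v) for u, v in zip(X1, X2)]

# Step 2: regress Y on the residual; its slope is br.
_, br = simple_ols(Xr, Y)

# Compare with b1 from the full two-predictor regression.
_, b1, _ = two_predictor_ols(X1, X2, Y)
print(abs(b1 - br) < 1e-9)
```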
Multiple R        0.97210
R²                0.94498
Adjusted R²       0.93276
Standard Error    0.85974
$$R^2 = \frac{SS_{reg}}{SS_y}$$
where
$$SS_y = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$
$$SS_{reg} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$$
$$SS_{res} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
$$H_0: \beta_1 = \beta_2 = \beta_3 = \dots = \beta_k = 0$$
$$
F = \frac{SS_{reg}/k}{SS_{res}/(n - k - 1)}
  = \frac{R^2/k}{(1 - R^2)/(n - k - 1)}
$$
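The second form of the F statistic needs only $R^2$, n, and k. The following sketch applies it to the reported $R^2$ of 0.94498, assuming n = 12 observations and k = 2 independent variables as in the two-predictor model shown earlier; the function name `f_from_r2` is illustrative.

```python
def f_from_r2(r2, n, k):
    """Overall F statistic computed from R^2 with k and n - k - 1 df."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Reported R^2 for the multiple regression; n = 12 and k = 2 are assumed
# from the twelve respondents and the two-predictor model.
F_multiple = f_from_r2(0.94498, n=12, k=2)

# The same function with k = 1 recovers the bivariate F from r^2 = 0.87624.
F_bivariate = f_from_r2(0.87624, n=12, k=1)

print(round(F_multiple, 2), round(F_bivariate, 2))
```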
$$t = \frac{b}{SE_b}$$
[Figures: residual plots. Residuals against Predicted Y Values; Residuals against Time; Residuals against Predicted Y Values]
$$\hat{Y}_i = a + b_1 D_1 + b_2 D_2 + b_3 D_3$$
In this case, "heavy users" has been selected as a reference
category and has not been directly included in the regression
equation.
The coefficient $b_1$ is the difference in predicted $\hat{Y}_i$ for
nonusers, as compared to heavy users.
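A short sketch may help make the dummy coding concrete. It assumes four usage categories with "heavy users" as the reference; the labels "light users" and "medium users" and all coefficient values are hypothetical and serve only to illustrate the coding.

```python
# Four categories; only "nonusers" and "heavy users" are named in the text,
# the other two labels are ASSUMED for illustration.
CATEGORIES = ["nonusers", "light users", "medium users", "heavy users"]
REFERENCE = "heavy users"


def dummy_code(category):
    """Return (D1, D2, D3); the reference category maps to all zeros."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    return tuple(int(category == c) for c in CATEGORIES if c != REFERENCE)


def predict(category, a, b):
    """Predicted Y for a category given intercept a and slopes (b1, b2, b3)."""
    return a + sum(bi * di for bi, di in zip(b, dummy_code(category)))


# Hypothetical coefficients, not estimates from the slides.
a, b = 2.0, (1.5, 0.5, 0.25)
diff = predict("nonusers", a, b) - predict("heavy users", a, b)
print(dummy_code("nonusers"), dummy_code("heavy users"), diff)
```

The printed difference equals $b_1$, matching the interpretation given above: each coefficient compares its category with the omitted reference category.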
Analysis of Variance and Covariance with Regression
$$R^2 = \eta^2$$
7. Click OK.
8. Click CONTINUE.
9. Click OK.