2007 A. Karpinski
Code
1
2
3
4
Coding B
Religion      Code
Protestant    1
Jewish        2
Catholic      3
Other         4
o Even if we solve the coding problem (say we could get all researchers
to agree on coding scheme A), the regression model estimates a linear
relationship between the predictor variable and the outcome variable.
Y = b0 + b1 X
AttitudesTowardAbortion = b0 + b1 (Religion)
Dummy coding
For dummy coding, one group is specified to be the reference group and is
given a value of 0 for each of the (a-1) indicator variables.
Dummy Coding of Gender (a = 2)
Gender    D1
Male      1
Female    0

Group          D1    D2
Treatment 1    0     1
Treatment 2    1     0
Control        0     0
The choice of the reference group is statistically arbitrary, but it affects how
you interpret the resulting regression parameters. Here are some
considerations that should guide your choice of reference group (Hardy,
1993):
o The reference group should serve as a useful comparison (e.g., a
control group; a standard treatment; or the group expected to have the
highest/lowest score).
o The reference group should be a meaningful category (e.g., not an
"other" category).
o If the sample sizes in each group are unequal, it is best if the reference
group not have a small sample size relative to the other groups.
Dummy coding a dichotomous variable
o We wish to examine whether gender predicts level of implicit self-esteem
(as measured by a Single Category Implicit Association Test).
Implicit self-esteem data are obtained from a sample of women (n = 56)
and men (n = 17).
[Figure: implicit (y-axis) by gender (x-axis)]
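o Using the equation Implicit Self Esteem = b0 + b1 * Dummy, we can obtain
separate regression lines for women and men by substituting appropriate
values for the dummy variable.
For women: Dummy = 0
Implicit Self Esteem = b0 + b1 * 0
= b0
For men: Dummy = 1
Implicit Self Esteem = b0 + b1 * 1
= b0 + b1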
REGRESSION
/STATISTICS COEFF OUTS R ANOVA ZPP
/DEPENDENT implicit
/METHOD=ENTER dummy.
Model Summary
Model    R       R Square    Adjusted R Square    Std. Error of the Estimate
1        .404    .163        .151                 .28264

Coefficients (B and Std. Error are unstandardized; Beta is standardized; Zero-order, Partial, and Part are correlations)
Model 1       B        Std. Error    Beta     t         Sig.    Zero-order    Partial    Part
(Constant)    .493     .038                   13.059    .000
dummy         -.291    .078          -.404    -3.716    .000    -.404         -.404      -.404
The test of b0 indicates that women have more positive than negative
associations with the self (high self-esteem), b = .49, t(71) = 13.06, p < .01.
Also, we know that $\bar{Y}_{Women} = 0.49$.
The test of b1 indicates that men's self-esteem differs from women's
self-esteem by -0.29 units (that is, men have lower self-esteem),
b = -.29, t(71) = -3.72, p < .01.
By inference, we know that the average self-esteem of men is
b0 + b1 = .493 - .291 = .202. However, with this dummy coding, we do
not obtain a test of whether or not the mean score for men differs from
zero.
The female/male distinction accounts for 16% of the variance in
implicit self-esteem scores, $R^2 = .16$.
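The link between this regression and a two-group comparison can be checked from summary statistics alone. The sketch below is not part of the original analysis: it assumes the group sizes, means, and standard deviations reported in the Descriptives table of these notes, and the function name `dummy_regression_from_summary` is a hypothetical helper. It rebuilds b0, b1, the standard error of b1, and t from the pooled within-group variance.

```python
import math


def dummy_regression_from_summary(ref, other):
    """Return coefficients of a one-dummy regression from group summaries.

    ref and other are (n, mean, sd) tuples; ref is the group coded 0.
    """
    n0, m0, s0 = ref
    n1, m1, s1 = other
    df = n0 + n1 - 2
    # Pooled within-group variance equals the residual mean square
    mse = ((n0 - 1) * s0 ** 2 + (n1 - 1) * s1 ** 2) / df
    b0 = m0        # intercept: mean of the reference group
    b1 = m1 - m0   # slope: difference from the reference group
    se_b1 = math.sqrt(mse * (1 / n0 + 1 / n1))
    return {"b0": b0, "b1": b1, "se_b1": se_b1,
            "t": b1 / se_b1, "df": df, "rmse": math.sqrt(mse)}


# Summaries taken from the notes; women are the reference group (dummy = 0)
women = (56, 0.4932, 0.27403)
men = (17, 0.2024, 0.31041)
result = dummy_regression_from_summary(women, men)
for name, value in result.items():
    print(f"{name:>6}: {value:.4f}")
```

Under these assumptions the values agree with the SPSS output to rounding (b1 near -.291, standard error near .078, t near -3.716), which is the usual pooled-variance two-sample t-test.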
gender    Mean     Std. Error
Male      .2024    .07529
Female    .4932    .03662
[Figure: ata (y-axis) by religion (x-axis: Protestant, Catholic, Jewish, Other)]
o When the categorical variable has more than two levels (meaning that
more than 1 dummy variable is required), it is essential that all the
dummy variables be entered into the regression equation.
ATA = b0 + (b1 * D1 ) + (b2 * D2 ) + (b3 * D3 )
o Using this equation, we can obtain separate regression lines for each
religion by substituting appropriate values for the dummy variables.
Reference Group = Protestant
Religion      D1    D2    D3
Protestant    0     0     0
Catholic      1     0     0
Jewish        0     1     0
Other         0     0     1
For Protestant: D1 = 0; D2 = 0; D3 = 0, so ATA = b0
For Jewish: D1 = 0; D2 = 1; D3 = 0, so ATA = b0 + b2
For Catholic: D1 = 1; D2 = 0; D3 = 0, so ATA = b0 + b1
For Other: D1 = 0; D2 = 0; D3 = 1, so ATA = b0 + b3
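The coding table above can be generated mechanically for any choice of reference group. This is a minimal sketch, not the SPSS procedure used in these notes; `dummy_codes` is a hypothetical helper name, and the level order is assumed to follow the table.

```python
def dummy_codes(levels, reference):
    """Map each level to its (a - 1) dummy codes.

    The reference level receives 0 on every indicator variable.
    """
    if reference not in levels:
        raise ValueError("reference must be one of the levels")
    indicators = [level for level in levels if level != reference]
    return {level: [1 if level == ind else 0 for ind in indicators]
            for level in levels}


religions = ["Protestant", "Catholic", "Jewish", "Other"]
codes = dummy_codes(religions, reference="Protestant")
for religion, row in codes.items():
    print(f"{religion:<12}", row)
```

Changing `reference` to another level yields the coding for a different reference group with no other changes.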
The test of b2 tells us whether the mean score on the outcome variable
differs between the reference group and the group identified by D2.
The test of b3 tells us whether the mean score on the outcome variable
differs between the reference group and the group identified by D3.
REGRESSION
/STATISTICS COEFF OUTS R ANOVA ZPP
/DEPENDENT ATA
/METHOD=ENTER dummy1 dummy2 dummy3.
Model Summary
Model    R       R Square    Adjusted R Square    Std. Error of the Estimate
1        .596    .355        .294                 23.41817

Coefficients (B and Std. Error are unstandardized; Beta is standardized; Zero-order, Partial, and Part are correlations)
Model 1       B          Std. Error    Beta     t         Sig.    Zero-order    Partial    Part
(Constant)    93.308     6.495                  14.366    .000
dummy1        -32.641    10.155        -.514    -3.214    .003    -.442         -.494      -.456
dummy2        10.192     11.558        .138     .882      .384    .355          .154       .125
dummy3        -23.183    10.523        -.351    -2.203    .035    -.225         -.363      -.313
The test of b0 indicates that the mean ATA score for Protestants,
$\bar{Y}_{Protestants} = 93.31$, is significantly above zero,
b = 93.31, t(32) = 14.37, p < .01. In this case, the test of b0 is probably
meaningless (the scale responses have to be greater than zero).
The test of b1 indicates that the mean ATA score for Catholics is
significantly less than (because the sign is negative) the mean ATA
score for Protestants, b = -32.64, t(32) = -3.21, p < .01. The mean ATA
score for Catholics is $\bar{Y}_{Catholics} = b_0 + b_1 = 93.31 - 32.64 = 60.67$. The
Catholic/non-Catholic distinction accounts for 19.5% of the variance
in ATA scores ($r^2_{D_1 Y} = (-.442)^2 = .195$).
The test of b2 indicates that the mean ATA score for Jews is not
significantly different from the mean ATA score for Protestants,
b = 10.19, t(32) = 0.88, p = .38. The mean ATA score for Jews is
$\bar{Y}_{Jews} = b_0 + b_2 = 93.31 + 10.19 = 103.50$. The Jewish/non-Jewish distinction
accounts for 12.6% of the variance in ATA scores ($r^2_{D_2 Y} = .355^2 = .126$).
The test of b3 indicates that the mean ATA score for Others is
significantly lower than the mean ATA score for Protestants,
b = -23.18, t(32) = -2.20, p = .04. The mean ATA score for Others is
$\bar{Y}_{Others} = b_0 + b_3 = 93.31 - 23.18 = 70.13$. The Other/non-Other distinction
accounts for 5.1% of the variance in ATA scores ($r^2_{D_3 Y} = (-.225)^2 = .051$).
Reference Group    Parameter    b         β       p         r       Model R2
Protestant         b0           93.31             < .001
                   Dummy 1      -32.64    -.51    .003      -.44
                   Dummy 2      10.19     .14     .384      .36
                   Dummy 3      -23.18    -.35    .035      -.23    .355
Catholic           b0           60.67             < .001
                   Dummy 1      32.64     .57     .003      .32
                   Dummy 2      42.83     .58     .002      .36
                   Dummy 3      9.46      .14     .412      -.23    .355
Jewish             b0           103.50            < .001
                   Dummy 1      -42.83    -.68    .002      -.44
                   Dummy 2      -10.19    -.18    .384      .32
                   Dummy 3      -33.38    -.51    .013      -.23    .355
Other              b0           70.13             < .001
                   Dummy 1      -9.46     -.15    .412      -.44
                   Dummy 2      23.18     .41     .035      .32
                   Dummy 3      33.38     .45     .013      .36     .355
o Note that model parameters, p-values, and correlations are all different.
o In all cases, it is the unstandardized regression coefficients that have
meaning. We should not interpret or report standardized regression
coefficients for dummy code analyses (This general rule extends to all
categorical variable coding systems).
o So long as the a-1 indicator variables are entered into the regression
equations, the Model R2 is the same regardless of how the model is
parameterized.
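Why the unstandardized coefficients change with the reference group while the fit does not can be seen by working from the group means. The sketch below assumes the four group means implied by the Protestant-reference output (b0, b0 + b1, b0 + b2, b0 + b3); `dummy_coefficients` is a hypothetical helper, not an SPSS feature.

```python
# Group means implied by the Protestant-reference coefficients in these notes
means = {
    "Protestant": 93.308,
    "Catholic": 93.308 - 32.641,
    "Jewish": 93.308 + 10.192,
    "Other": 93.308 - 23.183,
}


def dummy_coefficients(group_means, reference):
    """Return (b0, slopes) for dummy coding with the given reference group."""
    b0 = group_means[reference]
    # Each slope is that group's mean minus the reference group's mean
    slopes = {group: mean - b0
              for group, mean in group_means.items() if group != reference}
    return b0, slopes


for reference in means:
    b0, slopes = dummy_coefficients(means, reference)
    formatted = ", ".join(f"{g}: {b:.2f}" for g, b in slopes.items())
    print(f"reference = {reference:<10} b0 = {b0:.2f}  {formatted}")
```

Every parameterization reproduces the same four predicted means, which is why Model R2 stays at .355 across the table.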
Effects coding

Gender    E1
Male      1
Female    -1

Group          E1    E2
Treatment 1    0     1
Treatment 2    1     0
Control        -1    -1
o Now, we can predict implicit self esteem from the effects coded gender
variable in an OLS regression
Implicit Self Esteem = b0 + b1 * Effect1
o Using this equation, we can get separate regression lines for women
and men by substituting appropriate values for the effects coded
variable.
For women: Effect1 = -1
Implicit Self Esteem = b0 - b1
For men: Effect1 = 1
Implicit Self Esteem = b0 + b1
REGRESSION
/STATISTICS COEFF OUTS R ANOVA ZPP
/DEPENDENT implicit
/METHOD=ENTER effect1.
Model Summary
Model    R       R Square    Adjusted R Square    Std. Error of the Estimate
1        .404    .163        .151                 .28264

Coefficients (B and Std. Error are unstandardized; Beta is standardized; Zero-order, Partial, and Part are correlations)
Model 1       B        Std. Error    Beta     t         Sig.    Zero-order    Partial    Part
(Constant)    .348     .039                   8.887     .000
effect1       -.145    .039          -.404    -3.716    .000    -.404         -.404      -.404
The test of b0 indicates that the average self-esteem score (the average
of men's self-esteem and of women's self-esteem) is greater than
zero, b = .35, t(71) = 8.89, p < .01.
The test of b1 indicates that men's self-esteem differs from the
average self-esteem score by -0.145 units (that is, men have lower
self-esteem), b = -.15, t(71) = -3.72, p < .01.
Thus, we know that $\bar{Y}_{Men} = .348 - .145 = .203$ and that
$\bar{Y}_{Women} = .348 + .145 = .493$.
The female/male distinction accounts for 16% of the variance in
implicit self-esteem scores, $r^2 = .16$.
o Note that with unequal sample sizes, the grand mean estimated by b0 is
the unweighted average of the group means,
$\bar{Y}_{Unweighted} = (.2024 + .4932)/2 = .348 = b_0$, and is not the
grand mean of all N observations, $\bar{Y} = .426$.
Descriptives
implicit
          N     Mean     Std. Deviation
Male      17    .2024    .31041
Female    56    .4932    .27403
Total     73    .4255    .30676
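As a worked check using the values in the Descriptives table, the unweighted average of the two group means and the mean of all 73 observations can be written out side by side. Only the table values are used; nothing else is assumed.

```latex
\[
\bar{Y}_{\text{unweighted}} = \frac{.2024 + .4932}{2} = .3478 \approx .348 = b_0
\]
\[
\bar{Y}_{\text{all } N} = \frac{17(.2024) + 56(.4932)}{17 + 56}
  = \frac{3.4408 + 27.6192}{73} = \frac{31.0600}{73} \approx .4255
\]
```

The two differ because the larger group (women, n = 56) pulls the mean of all observations toward its own mean, while effects coding weights the two groups equally.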
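* Effects codes for religion: the reference group (religion = 1) is coded -1 on every variable.
IF (religion = 1) effect1 = -1 .
IF (religion = 2) effect1 = 1 .
IF (religion = 3) effect1 = 0 .
IF (religion = 4) effect1 = 0 .
IF (religion = 1) effect2 = -1 .
IF (religion = 2) effect2 = 0 .
IF (religion = 3) effect2 = 1 .
IF (religion = 4) effect2 = 0 .
IF (religion = 1) effect3 = -1 .
IF (religion = 2) effect3 = 0 .
IF (religion = 3) effect3 = 0 .
IF (religion = 4) effect3 = 1 .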
o Using the equation ATA = b0 + (b1 * E1) + (b2 * E2) + (b3 * E3), we can
get separate regression lines for each religion by substituting
appropriate values for the effect coded variables.
Reference Group = Protestant
Religion      E1    E2    E3
Protestant    -1    -1    -1
Catholic      1     0     0
Jewish        0     1     0
Other         0     0     1
For Protestant: E1 = -1; E2 = -1; E3 = -1
For Catholic: E1 = 1; E2 = 0; E3 = 0
For Jewish: E1 = 0; E2 = 1; E3 = 0
For Other: E1 = 0; E2 = 0; E3 = 1
The test of b2 tells us whether the mean score on the outcome variable for
the group identified by E2 differs from the grand mean.
The test of b3 tells us whether the mean score on the outcome variable for
the group identified by E3 differs from the grand mean.
REGRESSION
/STATISTICS COEFF OUTS R ANOVA ZPP
/DEPENDENT ATA
/METHOD=ENTER effect1 effect2 effect3.
Model Summary
Model    R       R Square    Adjusted R Square    Std. Error of the Estimate
1        .596    .355        .294                 23.41817

Coefficients (B and Std. Error are unstandardized; Beta is standardized; Zero-order, Partial, and Part are correlations)
Model 1       B          Std. Error    Beta     t         Sig.    Zero-order    Partial    Part
(Constant)    81.900     4.055                  20.198    .000
effect1       -21.233    6.849         -.598    -3.100    .004    -.444         -.481      -.440
effect2       21.600     7.883         .550     2.740     .010    -.029         .436       .389
effect3       -11.775    7.122         -.322    -1.653    .108    -.328         -.281      -.235
The test of b0 indicates that the grand mean ATA score,
$\bar{Y}_{Group Means} = 81.90$ (calculated as the average of the four group
means), is significantly above zero, b = 81.90, t(32) = 20.20, p < .01.
The test of b1 indicates that the mean ATA score for Catholics is
significantly less than (because the sign is negative) the grand mean ATA
score, b = -21.23, t(32) = -3.10, p < .01. The mean ATA score for Catholics
is $\bar{Y}_{Catholics} = b_0 + b_1 = 81.900 - 21.233 = 60.67$.
The test of b2 indicates that the mean ATA score for Jews is
significantly greater than (because the sign is positive) the grand mean ATA
score, b = 21.60, t(32) = 2.74, p = .01. The mean ATA score for Jews is
$\bar{Y}_{Jews} = b_0 + b_2 = 81.900 + 21.600 = 103.50$.
The test of b3 indicates that the mean ATA score for Others is not
significantly different from the grand mean ATA score,
b = -11.78, t(32) = -1.65, p = .11. The mean ATA score for Others is
$\bar{Y}_{Others} = b_0 + b_3 = 81.900 - 11.775 = 70.13$.
Overall, 35.5% of the variability in ATA scores in the sample is
associated with religion (R Square = .355); 29.4% of the variability in ATA
scores in the population is associated with religion (Adjusted R Square = .294).
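The effects-coded output does not print a coefficient for the reference group, but its mean can be recovered because that group is coded -1 on every variable. The sketch below assumes the coefficients reported in the output above; `means_from_effects` is a hypothetical helper name.

```python
def means_from_effects(b0, effects, reference):
    """Recover all group means from effects-coded regression coefficients."""
    group_means = {group: b0 + b for group, b in effects.items()}
    # The reference group is coded -1 on every effects variable
    group_means[reference] = b0 - sum(effects.values())
    return group_means


b0 = 81.900
effects = {"Catholic": -21.233, "Jewish": 21.600, "Other": -11.775}
group_means = means_from_effects(b0, effects, reference="Protestant")
for group, mean in group_means.items():
    print(f"{group:<12}{mean:8.3f}")

# The intercept is the unweighted average of the four group means
unweighted_mean = sum(group_means.values()) / len(group_means)
print(f"{'Average':<12}{unweighted_mean:8.3f}")
```

The recovered Protestant mean matches the intercept from the dummy-coded analysis with Protestants as the reference group.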
C1    C2    C3
1     1     1
1     1     -1
1     -2    0
-3    0     0
o For each contrast, the sum of the contrast coefficients should equal zero
o The contrasts should be orthogonal (assuming equal n)
If the contrast codes are not orthogonal, then you need to be very
careful about interpreting the regression coefficients.
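Both requirements can be verified directly from the codes. This sketch assumes the codes assigned to religion = 1 through 4 by the SPSS IF statements in these notes (cont1, cont2, cont3); the helper names are hypothetical, and the orthogonality check is the unweighted one, which applies with equal n.

```python
from itertools import combinations

# Codes for religion = 1, 2, 3, 4, as assigned in the SPSS IF statements
contrasts = {
    "cont1": [1, 1, 1, -3],
    "cont2": [1, 1, -2, 0],
    "cont3": [-1, 1, 0, 0],
}


def sums_to_zero(codes):
    """A contrast's coefficients must sum to zero."""
    return sum(codes) == 0


def is_orthogonal(codes_a, codes_b):
    """Two contrasts are orthogonal (equal n) if their cross-products sum to zero."""
    return sum(a * b for a, b in zip(codes_a, codes_b)) == 0


all_sum_to_zero = all(sums_to_zero(codes) for codes in contrasts.values())
all_orthogonal = all(is_orthogonal(contrasts[a], contrasts[b])
                     for a, b in combinations(contrasts, 2))
print("All contrasts sum to zero:", all_sum_to_zero)
print("All pairs orthogonal:", all_orthogonal)
```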
IF (religion = 1) cont1 = 1.
IF (religion = 2) cont1 = 1 .
IF (religion = 3) cont1 = 1 .
IF (religion = 4) cont1 = -3.
IF (religion = 1) cont2 = 1 .
IF (religion = 2) cont2 = 1.
IF (religion = 3) cont2 = -2 .
IF (religion = 4) cont2 = 0.
IF (religion = 1) cont3 = -1 .
IF (religion = 2) cont3 = 1 .
IF (religion = 3) cont3 = 0 .
IF (religion = 4) cont3 = 0.
Now, we can enter all a-1 contrast codes into a regression equation
REGRESSION
/STATISTICS COEFF OUTS R ANOVA ZPP
/DEPENDENT ATA
/METHOD=ENTER cont1 cont2 cont3.
Model Summary
Model    R       R Square    Adjusted R Square    Std. Error of the Estimate
1        .596    .355        .294                 23.41817

Coefficients (B and Std. Error are unstandardized; Beta is standardized; Zero-order, Partial, and Part are correlations)
Model 1       B          Std. Error    Beta     t         Sig.    Zero-order    Partial    Part
(Constant)    81.900     4.055                  20.198    .000
cont1         3.925      2.374         .237     1.653     .108    .225          .281       .235
cont2         -8.838     3.608         -.352    -2.449    .020    -.277         -.397      -.348
cont3         -16.321    5.077         -.459    -3.214    .003    -.444         -.494      -.456
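$\bar{Y}_{Catholics} = b_0 + b_1 + b_2 + b_3 = 81.900 + 3.925 - 8.838 - 16.321 = 60.67$
$\bar{Y}_{Jews} = b_0 + b_1 - 2 b_2 = 81.900 + 3.925 + 17.676 = 103.50$
$\bar{Y}_{Others} = b_0 - 3 b_1 = 81.900 - 11.775 = 70.13$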
ata
Contrast    Value of Contrast    Std. Error    t         df    Sig. (2-tailed)
1           47.0994              28.48656      1.653     32    .108
2           -53.0256             21.65011      -2.449    32    .020
3           -32.6410             10.15480      -3.214    32    .003
o Note that the t-values and the p-values are identical to what we
obtained from the regression analysis.
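The two outputs differ only by a scale factor: each regression coefficient is the contrast value divided by the sum of the squared contrast codes. The sketch below assumes the group means recovered earlier in these notes and assumes that religion = 1 through 4 corresponds to Protestant, Catholic, Jewish, Other (inferred from the coding, not stated explicitly). The helper names are hypothetical.

```python
# Group means for religion = 1..4 (assumed order: Protestant, Catholic, Jewish, Other)
means = [93.308, 60.667, 103.500, 70.125]

contrasts = {
    "cont1": [1, 1, 1, -3],
    "cont2": [1, 1, -2, 0],
    "cont3": [-1, 1, 0, 0],
}


def contrast_value(codes, group_means):
    """Value of the contrast: sum of code times group mean."""
    return sum(c * m for c, m in zip(codes, group_means))


def regression_slope(codes, group_means):
    """Slope for an orthogonal contrast code: contrast value over sum of squared codes."""
    return contrast_value(codes, group_means) / sum(c * c for c in codes)


values = {name: contrast_value(codes, means) for name, codes in contrasts.items()}
slopes = {name: regression_slope(codes, means) for name, codes in contrasts.items()}
for name in contrasts:
    print(f"{name}: value = {values[name]:8.3f}, b = {slopes[name]:7.3f}")
```

Because the standard error is rescaled by the same factor, the t-values and p-values are unchanged.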
HispanicAmerican
21.13
20.52
22.31
22.04
22.52
17.71
26.60
26.36
27.05
27.97
24.03
19.08
25.82
23.01
41.50
23.43
24.13
18.30
18.01
34.89
24.22
29.18
36.91
29.44
25.80
19.75
25.10
AsianAmerican
20.52
20.94
21.03
18.36
22.46
19.00
19.58
18.30
20.52
20.17
19.53
18.02
16.47
18.18
22.46
18.83
16.47
21.80
22.31
18.71
20.98
19.46
22.46
19.76
17.75
19.13
18.89
CaucasianAmerican
18.65
18.01
19.13
19.79
19.20
20.72
19.37
20.80
19.57
23.49
19.65
23.49
19.74
24.69
19.76
24.69
19.79
31.47
19.80
15.78
20.12
17.63
20.25
17.71
20.30
17.93
18.01
[Figure: BMI (y-axis) by ethnic (x-axis: African-American, Hispanic-American, Asian-American, Caucasian-American), with outlying cases labeled]
o We have some outliers and unequal variances, but let's ignore the
assumptions for the moment and compare the ANOVA and regression
outputs.
ANOVA Approach
$Y_{ij} = \mu + \alpha_j + \epsilon_{ij}$

Descriptives
BMI
                      N      Mean
African-American      27     21.4896
Hispanic-American     27     25.0670
Asian-American        27     19.7070
Caucasian-American    27     20.3533
Total                 108    21.6543

$\hat{\mu} = 21.65$
$\hat{\alpha}_j = \bar{Y}_{.j} - \bar{Y}_{..}$
$\hat{\alpha}_1 = 21.4896 - 21.6543 = -0.165$
$\hat{\alpha}_2 = 25.0670 - 21.6543 = 3.413$
$\hat{\alpha}_3 = 19.7070 - 21.6543 = -1.947$
$\hat{\alpha}_4 = 20.3533 - 21.6543 = -1.301$

Regression Approach
Effects Coding
IF (ethnic = 1) effect1 = 1 .
IF (ethnic = 2) effect1 = 0 .
IF (ethnic = 3) effect1 = 0 .
IF (ethnic = 4) effect1 = -1 .
IF (ethnic = 1) effect2 = 0 .
IF (ethnic = 2) effect2 = 1 .
IF (ethnic = 3) effect2 = 0 .
IF (ethnic = 4) effect2 = -1 .
IF (ethnic = 1) effect3 = 0 .
IF (ethnic = 2) effect3 = 0 .
IF (ethnic = 3) effect3 = 1 .
IF (ethnic = 4) effect3 = -1 .

Coefficients
Model 1       B         Std. Error
(Constant)    21.654    .334
effect1       -.165     .578
effect2       3.413     .578
effect3       -1.947    .578

$\hat{\alpha}_1 = b_1$
$\hat{\alpha}_2 = b_2$
$\hat{\alpha}_3 = b_3$
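The correspondence between ANOVA effects and effects-coded coefficients can be reproduced from the group means alone. This sketch assumes the BMI means in the Descriptives table and equal group sizes (n = 27 each), so the grand mean is simply the average of the four means; the variable names are illustrative.

```python
# BMI group means from the Descriptives table (n = 27 per group)
group_means = {
    "African-American": 21.4896,
    "Hispanic-American": 25.0670,
    "Asian-American": 19.7070,
    "Caucasian-American": 20.3533,
}

# With equal n, the grand mean is the average of the group means
grand_mean = sum(group_means.values()) / len(group_means)

# ANOVA effect for each group: group mean minus grand mean
effects = {group: mean - grand_mean for group, mean in group_means.items()}

print(f"grand mean (b0): {grand_mean:.3f}")
for group, effect in effects.items():
    print(f"{group:<20}{effect:8.3f}")
print(f"sum of effects: {sum(effects.values()):.6f}")
```

The first three effects match b1, b2, and b3; the fourth (the group coded -1 throughout) is minus the sum of the other three.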
The ANOVA tables produced by ANOVA and regression also test
equivalent hypotheses:
o ANOVA: $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ (There are no group effects)
o Regression: $H_0: b_1 = b_2 = b_3 = 0$ (The predictor variable accounts for no
variability in the outcome variable).
o Let's compare the ANOVA tables from the two analyses
REGRESSION
/DEPENDENT BMI
/METHOD=ENTER effect1 effect2 effect3.
ANOVA (one-way ANOVA output)
BMI
                  Sum of Squares    df     Mean Square    F         Sig.
Between Groups    463.272           3      154.424        12.828    .000
Within Groups     1251.986          104    12.038
Total             1715.258          107

ANOVA (regression output)
Model 1       Sum of Squares    df     Mean Square    F
Regression    463.272           3      154.424        12.828
Residual      1251.986          104    12.038
Total         1715.258          107
Model Summary
Model    R       R Square    Adjusted R Square    Std. Error of the Estimate
1        .520    .270        .249                 3.46963

Source             df     Mean Square    F           Sig.    Partial Eta Squared
Corrected Model    3      154.424        12.828      .000    .270
Intercept          1      50641.950      4206.727    .000    .976
ethnic             3      154.424        12.828      .000    .270
Error              104    12.038
Total              108
Corrected Total    107

$R^2 = \eta^2 = .27$
We have not yet examined the individual parameter estimates from the
regression output. All the regression output we have examined thus far is
independent of how the regression model is parameterized. Thus, any
parameterization of the regression model should produce identical output to
the ANOVA results.
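As a worked check, the shared quantities follow directly from the sums of squares and degrees of freedom in the tables above. Only values printed in the output are used.

```latex
\[
MS_{\text{between}} = \frac{463.272}{3} = 154.424, \qquad
MS_{\text{within}} = \frac{1251.986}{104} = 12.038
\]
\[
F(3, 104) = \frac{154.424}{12.038} = 12.83, \qquad
R^2 = \eta^2 = \frac{463.272}{1715.258} = .270
\]
\[
R^2_{\text{adj}} = 1 - (1 - .270)\,\frac{107}{104} = .249
\]
```

None of these depend on whether the groups were coded with dummy, effects, or contrast codes, since each coding produces the same predicted group means.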
Regression ANOVA table: Sig. = .000
In fact, when you use the UNIANOVA command in SPSS, SPSS constructs
dummy variables, runs a regression, and converts the output to an ANOVA
format. We can see this by asking for parameter estimates in the output.
UNIANOVA BMI BY ethnic
/PRINT = PARAMETER.
Parameter Estimates
Dependent Variable: BMI
Parameter        B         Std. Error    t         Sig.
Intercept        20.353    .668          30.481    .000
[ethnic=1.00]    1.136     .944          1.203     .232
[ethnic=2.00]    4.714     .944          4.992     .000
[ethnic=3.00]    -.646     .944          -.684     .495
[ethnic=4.00]    0         .             .         .
IF (ethnic = 1) dummy1 = 1 .
IF (ethnic ne 1) dummy1 = 0 .
IF (ethnic = 2) dummy2 = 1 .
IF (ethnic ne 2) dummy2 = 0 .
IF (ethnic = 3) dummy3 = 1 .
IF (ethnic ne 3) dummy3 = 0 .
REGRESSION
/STATISTICS COEFF OUTS R ANOVA
/DEPENDENT BMI
/METHOD=ENTER dummy1 dummy2 dummy3.
Coefficients
Model 1       B         Std. Error    t         Sig.
(Constant)    20.353    .668          30.481    .000
dummy1        1.136     .944          1.203     .232
dummy2        4.714     .944          4.992     .000
dummy3        -.646     .944          -.684     .495
Regression
Effects coded parameters
Dummy coded parameters
Contrast coded parameters
We have shown that ANOVA and regression are equivalent analyses. The
common framework that unites the two is called the general linear model.
Specifically, ANOVA is a special case of regression analysis.