
Applied Statistics for Engineers – Week 9 Handouts Husam A. Abu Hajar

Analysis of Covariance: ANCOVA (GLM 2)


We have seen that the one-way ANOVA can be expressed as a multiple linear regression model
using dummy-coded variables (0 and 1). We can extend this multiple linear regression model to
incorporate one or more continuous variables (predictors). Such continuous variables may not be
a part of the experimental manipulation but they have an influence on the outcome variable (they
are called covariates). Including the covariates in the analysis is called the analysis of covariance
(ANCOVA). ANCOVA is sometimes necessary to reduce the within-group error variance 𝑆𝑆𝑅 and to assess the effect of the experimental manipulation more accurately (by allowing the covariate to explain some of the otherwise unexplained variance in the outcome).
If we go back to the drug example in the ANOVA chapter, the researchers may discover that
another health parameter may have an effect on the outcome. So they repeat the experiment on a
different set of participants and the new independent variable (covariate) is taken into
consideration. The multiple linear regression equation can be expressed as follows:
𝑂𝑢𝑡𝑐𝑜𝑚𝑒 = 𝑏0 + 𝑏3 𝐶𝑜𝑣𝑎𝑟𝑖𝑎𝑡𝑒 + 𝑏1 𝐻𝑖𝑔ℎ + 𝑏2 𝐿𝑜𝑤
Notice that the covariate is entered first: the goal is to partial out its effect before entering the dummy variables that represent the experimental manipulation.
Assumptions of ANCOVA
The same ANOVA assumptions apply to ANCOVA but there are two additional considerations:
 Independence of the covariate and treatment effect
This requires that the covariate shares variance only with the portion of the outcome that is not explained by the treatment (i.e., the covariate is independent of the treatment effect). If the covariate overlaps with the experimental effect (placebo, low, and high doses), ANCOVA should not be used because the covariate will obscure the effect of the treatment. A classic example: if you want to compare an anxious group vs. a non-anxious group on a certain parameter and decide to include depression as a covariate to look at the pure effect of anxiety, you have made a mistake because anxiety and depression are closely correlated. One way to check this assumption is to run a 𝑡-test or ANOVA to see whether the depression levels differ significantly among the groups; if there is a significant difference, depression cannot be used as a covariate.
 Homogeneity of regression slopes
When we conduct ANCOVA, we explore the overall relationship between the outcome and the covariate by fitting a regression line to the entire data set with no regard to the groups. Thus, we assume that a positive or negative relationship between the covariate and the outcome in one group holds in all groups. If this relationship exists in only one group, the overall regression model will not be accurate. Put simply, if we fit separate regression lines between the covariate and the outcome for the different groups, those lines should look broadly similar (have similar slopes).


Conducting ANCOVA using SPSS


Use the data in the following table to run ANCOVA analysis in SPSS.

Dose      Outcome   Covariate
Placebo      3          4
             2          1
             5          5
             2          1
             2          2
             2          2
             7          7
             2          4
             4          5
Low          7          5
             5          3
             3          1
             4          2
             4          2
             7          6
             5          4
             4          2
High         9          1
             2          3
             6          5
             3          4
             4          3
             4          3
             4          2
             6          0
             4          1
             6          3
             2          0
             8          1
             5          0

(Group sizes: placebo 𝑛 = 9, low 𝑛 = 8, high 𝑛 = 13.)

The first step in the ANCOVA analysis is to check the “Independence of the covariate and
treatment effect” assumption. This is accomplished by checking if the mean of the covariate is
equal across the different levels of the grouping variable. We will simply run ANOVA with the
covariate as the outcome and the dose as the predictor (grouping variable). The ANOVA results
are presented in the Table below. The ANOVA results indicate that the differences between the


covariate means across the different groups are not significant. Thus, the covariate can be included
in the ANCOVA.

ANOVA
Covariate
Sum of Squares df Mean Square F Sig.
Between Groups 12.769 2 6.385 1.979 .158
Within Groups 87.097 27 3.226
Total 99.867 29
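This independence check can also be reproduced outside SPSS. The sketch below (assuming Python with SciPy is available; this is not part of the handout's SPSS workflow) runs the same one-way ANOVA on the covariate, using the group sizes implied by the degrees of freedom in the table (9, 8, and 13):

```python
from scipy.stats import f_oneway

# Covariate values from the data table, grouped by dose
# (n = 9, 8, 13, matching df = 2 and 27 in the ANOVA table)
placebo = [4, 1, 5, 1, 2, 2, 7, 4, 5]
low = [5, 3, 1, 2, 2, 6, 4, 2]
high = [1, 3, 5, 4, 3, 3, 2, 0, 1, 3, 0, 1, 0]

f_stat, p_value = f_oneway(placebo, low, high)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")  # F = 1.979, p = 0.158
```

Since p > .05, the covariate means do not differ significantly across the dose groups, so the covariate can be entered into the ANCOVA.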

To carry out ANCOVA, go to Analyze → General Linear Model → Univariate (Figure below).

Once a covariate is selected, the “Post Hoc” button will be disabled. We can still make planned comparisons using the “Contrasts” option. Click “Contrasts”, where you can select one of the several available standard contrasts. Select “Simple” from the dropdown list, change the “Reference Category” to “First” as shown in the Figure below, and click “Change”. The simple contrast compares each level to the first category.


Click “Continue” to go back to the main dialogue box. Click “Options” and drag the variable
“Dose” to the “Display Means for” box as shown in the Figure below. Check the “Compare main
effects” box and from the dropdown list, select one of the proposed adjustments (Bonferroni or
Sidak are recommended).

Click “Continue” to go back to the main dialogue box and click “OK”.


SPSS output
The first main output is Levene’s test result, as shown in the following Table. The result clearly states that the homogeneity of variance assumption has been violated (Levene’s statistic is significant).

Levene's Test of Equality of Error Variancesa


Dependent Variable: Outcome
F df1 df2 Sig.
4.618 2 27 .019
Tests the null hypothesis that the error variance of
the dependent variable is equal across groups.
a. Design: Intercept + Covariate + Dose

The next table is the “Tests of Between-Subjects Effects” (ANOVA) table. Clearly, the covariate has a significant contribution to the model (sig. = 0.035). The Dose is also significant, which means that once the covariate’s effect is partialled out, the Dose becomes a significant predictor (or factor). The 𝑆𝑆𝑀 is 31.92, of which 25.185 is accounted for by the Dose. To appreciate the roles of 𝑆𝑆𝑀 and 𝑆𝑆𝑅, you may run an ANOVA (without the covariate) and observe how the error terms differ between ANOVA and ANCOVA.

Tests of Between-Subjects Effects


Dependent Variable: Outcome
Source                 Type III Sum of Squares    df    Mean Square    F        Sig.
Corrected Model 31.920a 3 10.640 3.500 .030
Intercept 76.069 1 76.069 25.020 .000
Covariate 15.076 1 15.076 4.959 .035
Dose 25.185 2 12.593 4.142 .027
Error 79.047 26 3.040
Total 683.000 30
Corrected Total 110.967 29
a. R Squared = .288 (Adjusted R Squared = .205)
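Because ANCOVA is just a regression with dummy variables, this table can be reproduced with ordinary least squares. Below is a minimal sketch (assuming Python with NumPy is available; the variable names are mine, not SPSS's), using the high dose as the reference category as SPSS does:

```python
import numpy as np

# Data in the order of the table: placebo (n=9), low (n=8), high (n=13)
outcome = np.array([3, 2, 5, 2, 2, 2, 7, 2, 4,
                    7, 5, 3, 4, 4, 7, 5, 4,
                    9, 2, 6, 3, 4, 4, 4, 6, 4, 6, 2, 8, 5], float)
covariate = np.array([4, 1, 5, 1, 2, 2, 7, 4, 5,
                      5, 3, 1, 2, 2, 6, 4, 2,
                      1, 3, 5, 4, 3, 3, 2, 0, 1, 3, 0, 1, 0], float)
dose = np.array([1] * 9 + [2] * 8 + [3] * 13)

# Design matrix: intercept, covariate, dummy for placebo, dummy for low
# (high dose is the reference category, as in SPSS's parameter estimates)
X = np.column_stack([np.ones(30), covariate, dose == 1, dose == 2]).astype(float)
b, *_ = np.linalg.lstsq(X, outcome, rcond=None)

residuals = outcome - X @ b
sse = residuals @ residuals                      # error sum of squares
sst = ((outcome - outcome.mean()) ** 2).sum()    # corrected total SS
r2 = 1 - sse / sst

print(b.round(3))                    # ≈ [ 4.014  0.416 -2.225 -0.439]
print(round(sse, 3), round(r2, 3))   # ≈ 79.047 0.288
```

The coefficients, error SS, and R squared should match the SPSS tables up to rounding.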

The “Parameter Estimates” table presents the regression coefficients as explained earlier in the ANOVA chapter (two dummy variables for the three dose levels). In this case, the reference category is the one with the highest coding value, that is, the high dose. The B values represent the differences between the group means, and the 𝑡-statistic values indicate whether those differences are significant. From this table, we can conclude that there is a significant difference between the high dose and the placebo groups, whereas the difference between the high and low doses is not significant. The covariate’s coefficient (0.416) indicates that, with all other factors held equal (similar doses), a one-unit increase in the covariate is associated with a 0.416-unit increase in the outcome (a positive relationship).

Parameter Estimates
Dependent Variable: Outcome
95% Confidence Interval
Parameter B Std. Error t Sig. Lower Bound Upper Bound
Intercept 4.014 .611 6.568 .000 2.758 5.270
Covariate .416 .187 2.227 .035 .032 .800
[Dose=1.00] -2.225 .803 -2.771 .010 -3.875 -.575
[Dose=2.00] -.439 .811 -.541 .593 -2.107 1.228
[Dose=3.00] 0a . . . . .
a. This parameter is set to zero because it is redundant.

The “Contrast Results” table shows the contrast analysis. The first comparison is between the low dose and the placebo, and the second is between the high dose and the placebo. Both contrasts are significant at the 0.05 level; the high vs. placebo contrast is consistent with the regression coefficients reported earlier.

Contrast Results (K Matrix)

Dependent Variable: Outcome
Dose Simple Contrastᵃ
Level 2 vs. Level 1    Contrast Estimate                          1.786
                       Hypothesized Value                         0
                       Difference (Estimate - Hypothesized)       1.786
                       Std. Error                                 .849
                       Sig.                                       .045
                       95% Confidence Interval    Lower Bound     .040
                       for Difference             Upper Bound     3.532
Level 3 vs. Level 1    Contrast Estimate                          2.225
                       Hypothesized Value                         0
                       Difference (Estimate - Hypothesized)       2.225
                       Std. Error                                 .803
                       Sig.                                       .010
                       95% Confidence Interval    Lower Bound     .575
                       for Difference             Upper Bound     3.875
a. Reference category = 1


One may consider exploring the group means as a way of comparison after determining that the differences are significant. However, the original means are of little help because they have not been adjusted for the effect of the covariate. Thus, SPSS adjusts the group means as shown in the following “Estimates” Table. Notice that SPSS does not allow you to run planned contrasts defined as in the ANOVA chapter. There may be a way around this using special coding, or one may elect to run a multiple linear regression in which the planned contrasts are defined through the regression coefficients with respect to the reference category.

Estimates
Dependent Variable: Outcome
95% Confidence Interval
Dose Mean Std. Error Lower Bound Upper Bound
Placebo 2.926a .596 1.701 4.152
Low dose 4.712a .621 3.436 5.988
High dose 5.151a .503 4.118 6.184
a. Covariates appearing in the model are evaluated at the following
values: Covariate = 2.7333.
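These adjusted means are simply the model's predictions for each group with the covariate held at its grand mean (2.7333). Using the coefficients from the “Parameter Estimates” table, a quick check (a Python sketch, not SPSS output):

```python
# Coefficients from the "Parameter Estimates" table
b0 = 4.014         # intercept (high dose is the reference category)
b_cov = 0.416      # covariate slope
effect = {"Placebo": -2.225, "Low dose": -0.439, "High dose": 0.0}
cov_mean = 2.7333  # grand mean of the covariate

# Adjusted mean = prediction for each group at the grand covariate mean
adjusted = {g: round(b0 + b_cov * cov_mean + e, 3) for g, e in effect.items()}
print(adjusted)  # {'Placebo': 2.926, 'Low dose': 4.712, 'High dose': 5.151}
```

The three values reproduce the “Estimates” table exactly.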

Finally, the Sidak-corrected post hoc comparisons are presented in the “Pairwise Comparisons” Table shown below. Based on these comparisons, we can conclude that the high dose differs significantly from the placebo (as portrayed earlier). However, the high and low doses are not significantly different (sig. = 0.932), and the low dose is not significantly different from the placebo (sig. = 0.130), unlike the simple contrast result reported earlier.

Pairwise Comparisons
Dependent Variable: Outcome

(I) Dose    (J) Dose     Mean Difference (I-J)   Std. Error   Sig.ᵇ   95% CI Lower Bound   95% CI Upper Bound
Placebo     Low dose     -1.786                  .849         .130    -3.953               .381
            High dose    -2.225*                 .803         .030    -4.273               -.177
Low dose    Placebo      1.786                   .849         .130    -.381                3.953
            High dose    -.439                   .811         .932    -2.509               1.631
High dose   Placebo      2.225*                  .803         .030    .177                 4.273
            Low dose     .439                    .811         .932    -1.631               2.509
Based on estimated marginal means
*. The mean difference is significant at the .05 level.
b. Adjustment for multiple comparisons: Sidak.
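The Sidak adjustment in this table is easy to verify by hand: for m comparisons, the adjusted p-value is 1 − (1 − p)^m. Applying it to the unadjusted significance values reported earlier (a Python sketch; small rounding differences against SPSS are expected because SPSS works with unrounded p-values):

```python
# Sidak correction for m = 3 pairwise comparisons: p_adj = 1 - (1 - p)^m
m = 3
unadjusted = {"placebo vs. low": 0.045, "placebo vs. high": 0.010, "low vs. high": 0.593}
sidak = {pair: round(1 - (1 - p) ** m, 3) for pair, p in unadjusted.items()}
print(sidak)  # close to SPSS's .130, .030, and .932
```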


In summary, we can look at the parameter estimate of the covariate: the sign of the coefficient tells us the direction of the relationship. You may also run ANCOVA as a classic multiple linear regression in a hierarchical way, entering the covariate in the first block and the dummy-coded variables in the second block. There will be a few small differences because the dummy coding uses 0 and 1 with respect to the placebo (baseline group).
Finally, we need to check the assumption of homogeneity of regression slopes. This means that
the relationship between the outcome and the covariate is pretty similar across the different
categories (treatment groups). To test this assumption, we rerun the ANCOVA with a customized
model. Access the main dialogue box and insert the variables in the same way. Click “Model” → select “Custom”. We need to specify a model that includes the interaction between the independent variable and the covariate: first include the main effects of each variable, then the interaction of the two. The Model dialogue box should appear like the following Figure.

Click “Continue” and then “OK”.

In the SPSS output, look at the “Tests of Between-Subjects Effects” table. The interaction term is significant, which means that the assumption of homogeneous regression slopes has been violated. In other words, the results of the ANCOVA analysis are probably biased. Unfortunately, SPSS does not offer an easy nonparametric alternative to ANCOVA.


Tests of Between-Subjects Effects


Dependent Variable: Outcome
Source                 Type III Sum of Squares    df    Mean Square    F        Sig.
Corrected Model 52.346a 5 10.469 4.286 .006
Intercept 53.542 1 53.542 21.921 .000
Dose 36.558 2 18.279 7.484 .003
Covariate 17.182 1 17.182 7.035 .014
Dose * Covariate 20.427 2 10.213 4.181 .028
Error 58.621 24 2.443
Total 683.000 30
Corrected Total 110.967 29
a. R Squared = .472 (Adjusted R Squared = .362)
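This interaction test is a hierarchical model comparison: the error SS of the main-effects model (79.047) minus the error SS of the model with the Dose × Covariate terms (58.621) gives the interaction SS. A sketch of that comparison (assuming Python with NumPy; not part of the SPSS workflow):

```python
import numpy as np

# Same 30 cases as before: placebo (n=9), low (n=8), high (n=13)
outcome = np.array([3, 2, 5, 2, 2, 2, 7, 2, 4,
                    7, 5, 3, 4, 4, 7, 5, 4,
                    9, 2, 6, 3, 4, 4, 4, 6, 4, 6, 2, 8, 5], float)
covariate = np.array([4, 1, 5, 1, 2, 2, 7, 4, 5,
                      5, 3, 1, 2, 2, 6, 4, 2,
                      1, 3, 5, 4, 3, 3, 2, 0, 1, 3, 0, 1, 0], float)
dose = np.array([1] * 9 + [2] * 8 + [3] * 13)
d1, d2 = (dose == 1).astype(float), (dose == 2).astype(float)
ones = np.ones(30)

def error_ss(X):
    """Fit OLS and return the residual (error) sum of squares."""
    b, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    r = outcome - X @ b
    return r @ r

sse_main = error_ss(np.column_stack([ones, covariate, d1, d2]))
sse_full = error_ss(np.column_stack([ones, covariate, d1, d2,
                                     d1 * covariate, d2 * covariate]))

# F-test for adding the two interaction terms (df = 2, error df = 24)
F = ((sse_main - sse_full) / 2) / (sse_full / 24)
print(round(sse_full, 3), round(F, 3))  # ≈ 58.621 4.181
```

The F of about 4.18 with (2, 24) degrees of freedom reproduces the Dose * Covariate row of the table.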


Factorial ANOVA (GLM 3)


If we have two categorical predictor variables, one-way ANOVA cannot be utilized. Instead, we use factorial ANOVA. Previously, there was only one categorical independent variable (in ANCOVA, we added a continuous independent variable, which does not split the data into groups).
If an experiment has two or more independent categorical variables, it is called a factorial design.
The factorial design has several types:
 Independent factorial design, where there are two or more independent predictors and each
has been measured using different participants (between groups).
 Repeated measures (related) factorial design, where the same participants were used to
measure several independent predictors (GLM 4).
 Mixed design, in which some predictors have been measured using different participants
and some using the same participants (Mixed design ANOVA).

Example: A researcher was interested in evaluating the effect of a drug dose on a certain health
indicator (outcome). The researcher also believes that the effect would be different based on the
gender. 48 patients were selected for the experiment (24 males and 24 females) and were divided
into 3 groups: placebo, low, and high doses. The experimental data is presented in the following
table.

Dose                    Placebo            Low                High
Gender               Female    Male    Female    Male    Female    Male
Outcome                 65       50       70       45       55       30
(dependent variable)    70       55       65       60       65       30
                        60       80       60       85       70       30
                        60       65       70       65       55       55
                        60       70       65       70       55       35
                        55       75       60       70       60       20
                        60       75       60       80       50       45
                        55       65       50       60       50       40
Total                  485      535      500      535      460      285
𝑋̅                    60.625   66.875   62.500   66.875   57.500   35.625
𝑉𝑎𝑟                 24.554  106.696   42.857  156.696   50.000  117.411
𝑠                     4.955   10.329    6.547   12.518    7.071   10.836


Two-way ANOVA is very similar to one-way ANOVA where we will find the 𝑆𝑆𝑇 which is broken
down into 𝑆𝑆𝑀 and 𝑆𝑆𝑅 . The 𝑆𝑆𝑀 is broken down further into variance explained by the first
independent variable 𝑆𝑆𝐴 , variance explained by the second 𝑆𝑆𝐵 , and variance explained by the
interaction of the two 𝑆𝑆𝐴×𝐵 .
To begin with, we calculate 𝑆𝑆𝑇, which represents the variability between all scores (ignoring the experimental conditions), as 𝑆𝑆𝑇 = grand 𝑠² × (𝑁 − 1). The grand variance is the variance of all 48 participants’ scores (190.78). Thus, 𝑆𝑆𝑇 = 190.78 × 47 = 8,966.7 (𝑑𝐹 = 47).
To calculate 𝑆𝑆𝑀, we need to consider the six experimental groups as follows:

𝑆𝑆𝑀 = ∑ 𝑛𝑘 (𝑥̅𝑘 − 𝑥̅𝑔𝑟𝑎𝑛𝑑 )²

So the 𝑆𝑆𝑀 deals with each group of the six groups (placebo-male, placebo-female, low-male, low-
female, high-male, high-female) and is calculated as follows:
𝑆𝑆𝑀 = 8(60.625 − 58.33)2 + 8(66.875 − 58.33)2 + 8(62.5 − 58.33)2
+ 8(66.875 − 58.33)2 + 8(57.5 − 58.33)2 + 8(35.625 − 58.33)2 = 5,479.17
The 𝑑𝐹 for 𝑆𝑆𝑀 is equal to the number of groups – 1 (𝑑𝐹 = 5).
By dividing our experimental data into six groups, we were able to explain 5,479.17 of the 8,966.7 total variance units.
To break down the 𝑆𝑆𝑀 , we will first deal with the independent variable (gender). Therefore, we
need to rearrange the groups based on the gender (ignoring the dose and placing all males in one
group and all females into another) as follows:

Female (𝑥̅ = 60.208)                    Male (𝑥̅ = 56.458)


65 70 55 50 45 30
70 65 65 55 60 30
60 60 70 80 85 30
60 70 55 65 65 55
60 65 55 70 70 35
55 60 60 75 70 20
60 60 50 75 80 45
55 50 50 65 60 40

We calculate the 𝑆𝑆𝑀 explained by the gender as follows:


𝑆𝑆𝑔𝑒𝑛𝑑𝑒𝑟 = 24(60.208 − 58.33)2 + 24(56.458 − 58.33)2 = 168.75.


𝑑𝐹 for the 𝑆𝑆𝑔𝑒𝑛𝑑𝑒𝑟 = 2 – 1 = 1.

We then rearrange the data based on the doses and ignoring the gender variable as follows:

Placebo (𝑥̅ = 63.75) Low (𝑥̅ = 64.69) High (𝑥̅ = 46.56)


65 50 70 45 55 30
70 55 65 60 65 30
60 80 60 85 70 30
60 65 70 65 55 55
60 70 65 70 55 35
55 75 60 70 60 20
60 75 60 80 50 45
55 65 50 60 50 40

We calculate the 𝑆𝑆𝑀 explained by the dose as follows:


𝑆𝑆𝑑𝑜𝑠𝑒 = 16(63.75 − 58.33)2 + 16(64.69 − 58.33)2 + 16(46.56 − 58.33)2 = 3,332.3.
𝑑𝐹 for the 𝑆𝑆𝑑𝑜𝑠𝑒 = 3 – 1 = 2.
The 𝑆𝑆𝑀 explained by the interaction of the two independent variables is computed as follows:
𝑆𝑆𝑔𝑒𝑛𝑑𝑒𝑟×𝑑𝑜𝑠𝑒 = 𝑆𝑆𝑀 − 𝑆𝑆𝑔𝑒𝑛𝑑𝑒𝑟 − 𝑆𝑆𝑑𝑜𝑠𝑒

𝑆𝑆𝑔𝑒𝑛𝑑𝑒𝑟×𝑑𝑜𝑠𝑒 = 5,479.17 − 168.75 − 3,332.3 = 1,978.12.

The 𝑑𝐹 for 𝑆𝑆𝑔𝑒𝑛𝑑𝑒𝑟×𝑑𝑜𝑠𝑒 is the product of the 𝑑𝐹’s of the two variables (1 × 2 = 2). It can also be determined by subtracting the 𝑑𝐹’s of both variables from the 𝑑𝐹 of 𝑆𝑆𝑀 (5 − 1 − 2 = 2).
Finally, the residual sum of squares 𝑆𝑆𝑅 is the difference between 𝑆𝑆𝑇 and 𝑆𝑆𝑀 = 8,966.7 –
5,479.17 = 3,487.53. Alternatively, 𝑆𝑆𝑅 can be computed as: 𝑠12 (𝑛1 − 1) + 𝑠22 (𝑛2 − 1) + ⋯ +
𝑠𝑛2 (𝑛𝑛 − 1). The numbers 1, 2, … , 𝑛 correspond to the groups. So 𝑆𝑆𝑅 = 24.554 (8 – 1) + 106.696
(8 – 1) + 42.857 (8 – 1) + 156.696 (8 – 1) + 50 (8 – 1) + 117.411 (8 – 1) =3,487.5. The 𝑑𝐹 for the
𝑆𝑆𝑅 is the number of groups × (number of observations in each group – 1) = 6 × (8 – 1) = 42.
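The hand calculations above can be verified programmatically. The sketch below (assuming Python with NumPy is available; the function and variable names are mine) reproduces every sum of squares from the six cell groups:

```python
import numpy as np

# The 6 cells (dose x gender), 8 scores each, from the data table
cells = {
    ("placebo", "female"): [65, 70, 60, 60, 60, 55, 60, 55],
    ("placebo", "male"):   [50, 55, 80, 65, 70, 75, 75, 65],
    ("low", "female"):     [70, 65, 60, 70, 65, 60, 60, 50],
    ("low", "male"):       [45, 60, 85, 65, 70, 70, 80, 60],
    ("high", "female"):    [55, 65, 70, 55, 55, 60, 50, 50],
    ("high", "male"):      [30, 30, 30, 55, 35, 20, 45, 40],
}
scores = np.array(list(cells.values()), float).ravel()
grand = scores.mean()                                 # grand mean, 58.333

ss_t = ((scores - grand) ** 2).sum()                  # total SS
ss_m = sum(8 * (np.mean(v) - grand) ** 2 for v in cells.values())

def marginal_ss(index):
    """SS explained by one factor: pool the cells sharing that factor level."""
    groups = {}
    for key, v in cells.items():
        groups.setdefault(key[index], []).extend(v)
    return sum(len(g) * (np.mean(g) - grand) ** 2 for g in groups.values())

ss_dose, ss_gender = marginal_ss(0), marginal_ss(1)
ss_interaction = ss_m - ss_dose - ss_gender
ss_r = ss_t - ss_m

print(round(ss_t, 1), round(ss_gender, 2), round(ss_dose, 1),
      round(ss_interaction, 1), round(ss_r, 1))
# ≈ 8966.7 168.75 3332.3 1978.1 3487.5
```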


Question: If we have three predictors, how can we determine the 𝑆𝑆𝑖𝑛𝑡𝑒𝑟𝑎𝑐𝑡𝑖𝑜𝑛 ?


This case becomes more complicated where 𝑆𝑆𝐴×𝐵×𝐶 = 𝑆𝑆𝑀 − 𝑆𝑆𝐴×𝐵 − 𝑆𝑆𝐵×𝐶 − 𝑆𝑆𝐴×𝐶 − 𝑆𝑆𝐴 −
𝑆𝑆𝐵 − 𝑆𝑆𝐶 . First, we determine the sum of squares for the interaction between A and B by
eliminating C (as if it has no impact) and then we repeat the same for B, C and A, C.

Now that we have the sum of squares and the corresponding 𝑑𝐹’s, we can calculate the mean sum
of squares and the 𝐹 ratios.
𝑀𝑆𝑔𝑒𝑛𝑑𝑒𝑟 = 168.75 / 1 = 168.75
𝑀𝑆𝑑𝑜𝑠𝑒 = 3,332.3 / 2 = 1,666.15
𝑀𝑆𝑔𝑒𝑛𝑑𝑒𝑟×𝑑𝑜𝑠𝑒 = 1,978.13 / 2 = 989.07
𝑀𝑆𝑅 = 3,487.53 / 42 = 83.04
The 𝐹-ratio is computed by dividing the mean sum of squares by the residual mean sum of squares
as follows:
𝐹𝑔𝑒𝑛𝑑𝑒𝑟 = 𝑀𝑆𝑔𝑒𝑛𝑑𝑒𝑟 / 𝑀𝑆𝑅 = 168.75 / 83.04 = 2.032
𝐹𝑑𝑜𝑠𝑒 = 𝑀𝑆𝑑𝑜𝑠𝑒 / 𝑀𝑆𝑅 = 1,666.15 / 83.04 = 20.06
𝐹𝑔𝑒𝑛𝑑𝑒𝑟×𝑑𝑜𝑠𝑒 = 𝑀𝑆𝑔𝑒𝑛𝑑𝑒𝑟×𝑑𝑜𝑠𝑒 / 𝑀𝑆𝑅 = 989.07 / 83.04 = 11.91

Each of the above 𝐹-ratios can then be compared against a critical value obtained from the 𝐹-
distribution (based on the 𝑑𝐹’s) to determine if the effect of each predictor is significant or not.
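The critical values and exact p-values can be obtained from the F-distribution, sketched here in Python with SciPy (an outside-of-SPSS check), using the F-ratios computed above with 𝑑𝐹 = 42 for the residual:

```python
from scipy.stats import f

df_error = 42
for name, f_ratio, df_effect in [("gender", 2.032, 1),
                                 ("dose", 20.06, 2),
                                 ("gender x dose", 11.91, 2)]:
    critical = f.ppf(0.95, df_effect, df_error)  # critical F at alpha = .05
    p = f.sf(f_ratio, df_effect, df_error)       # right-tail p-value
    print(f"{name}: F = {f_ratio}, critical = {critical:.2f}, p = {p:.4f}")
```

Only the gender effect falls below its critical value (p ≈ .16); the dose and interaction effects are significant.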

Factorial ANOVA on SPSS


We are going to run factorial ANOVA on the previous example using SPSS. You will need first
to define two categorical variables (nominal): one for the gender and the other for the dose. For
the gender, define the coding values as male = 0 and female = 1. For the dose, define the coding
values as placebo = 1, low = 2, and high = 3. The data should look like this:


Gender Dose Outcome Gender Dose Outcome


1 1 65 0 2 45
1 1 70 0 2 60
1 1 60 0 2 85
1 1 60 0 2 65
1 1 60 0 2 70
1 1 55 0 2 70
1 1 60 0 2 80
1 1 55 0 2 60
0 1 50 1 3 55
0 1 55 1 3 65
0 1 80 1 3 70
0 1 65 1 3 55
0 1 70 1 3 55
0 1 75 1 3 60
0 1 75 1 3 50
0 1 65 1 3 50
1 2 70 0 3 30
1 2 65 0 3 30
1 2 60 0 3 30
1 2 70 0 3 55
1 2 65 0 3 35
1 2 60 0 3 20
1 2 60 0 3 45
1 2 50 0 3 40

Remember that this is an independent design (different participants were assigned to the different
groups). Go to Analyze  General Linear Model  Univariate. Select the independent and
dependent variables as shown in the Figure below.


The “Model” button will allow you to customize the model (for instance, you may want to test only the main effects rather than the full factorial model). The “Model” button comes in handy when there are 3 or more independent variables. We will keep the default settings in “Model”.
Click on “Plots” to select the graphs you want to appear in your output. One of the most useful
graphical outputs in factorial ANOVA is the interaction graph which helps us understand the
combined effect of gender and dose. Select the variables as shown in the Figure below.

Click “Add” and if there are no additional graphs you wish to plot, click “Continue”.
“Contrasts” will allow you to establish useful comparisons using the SPSS standard contrasts. One
disadvantage of contrasts is that they compare the main effects but not the interactions. For the
gender variable, there is no need to establish a contrast because there are only two categories under
this variable. For the dose variable, there are 3 levels so we can select different options for contrasts
(from the Contrast dropdown list) such as:


 Simple contrast, which compares each level to the first category. In our example, SPSS
will compare the low to the placebo and the high to the placebo.
 Repeated contrast, which compares each level to the previous one. In our example, it will
compare the low to the placebo and then the high to the low.
 Helmert contrast, which compares each level to all subsequent levels. In our example, it
will compare the placebo to the low and high dose groups and then will compare the low
to the remaining groups (only the high).
We will use the “Helmert” contrast for the dose variable as shown in the Figure below (remember to click “Change” to save the contrast).

Click “Continue” to go back to the main dialogue box.


The “Post Hoc” button will allow you to run additional comparisons between the different categories of
each independent variable. Because the gender variable has only two categories, there is no need
to run post hoc tests for this variable. To run post hoc tests on the dose variable, click “Post Hoc”
from the main dialogue box and select the dose variable. Select the tests shown in the Figure below.


Click “Continue” to go back to the main dialogue box, click “Options”, and select the options
shown in the Figure below.

Click “Continue” to go back to the main dialogue box and then click “OK” to view the output.


SPSS output
The first output is the descriptive statistics table, shown below. These descriptive statistics were already computed by hand when we first worked through this example earlier in these handouts.

Descriptive Statistics
Dependent Variable: Outcome
Gender Dose Mean Std. Deviation N
Male Placebo 66.8750 10.32940 8
Low 66.8750 12.51784 8
High 35.6250 10.83562 8
Total 56.4583 18.50259 24
Female Placebo 60.6250 4.95516 8
Low 62.5000 6.54654 8
High 57.5000 7.07107 8
Total 60.2083 6.33815 24
Total Placebo 63.7500 8.46562 16
Low 64.6875 9.91106 16
High 46.5625 14.34326 16
Total 58.3333 13.81232 48

The next table is Levene’s test output which tells us that the variances are homogenous.

Levene's Test of Equality of Error Variancesa


Dependent Variable: Outcome
F df1 df2 Sig.
1.527 5 42 .202
Tests the null hypothesis that the error variance of
the dependent variable is equal across groups.
a. Design: Intercept + Gender + Dose + Gender *
Dose

The most important output is the “Tests of Between-Subjects Effects” shown below. From this
table, we can conclude that there is a significant main effect of the dose. This conclusion can be
confirmed by looking at the “Descriptive Statistics” Table from which we can see that the placebo
and the low dose effects have close means (63.75 and 64.69, respectively) while the high is much
less than the two (46.56). The gender effect on the outcome is not significant. Again, going back
to the “Descriptive Statistics” table, we can see that the overall means of the female and male


groups are pretty close (60.21 and 56.46, respectively). We can also take a look at the graphical
output we requested (“Estimated Marginal Means of Outcome”). Finally, we can conclude that the
interaction between gender and dose has a significant effect on the outcome. In other words, this
informs us that the effect of the dose on the outcome was different for male participants and female
participants. If we inspect the “Estimated Marginal Means of Outcome” Figure, we can see that
the dose has very little effect for female participants while there is a pronounced effect of the dose
(high dose) for male participants. In general, non-parallel lines in this Figure indicate a significant
interaction effect.
Notice that we earlier concluded that the dose has a significant main effect, but the interaction analysis reveals that the dose’s effect is pronounced mainly for male participants. This indicates that main effects can be misleading in factorial designs.

Tests of Between-Subjects Effects


Dependent Variable: Outcome
Source                 Type III Sum of Squares    df    Mean Square    F           Sig.
Corrected Model 5479.167a 5 1095.833 13.197 .000
Intercept 163333.333 1 163333.333 1967.025 .000
Gender 168.750 1 168.750 2.032 .161
Dose 3332.292 2 1666.146 20.065 .000
Gender * Dose 1978.125 2 989.062 11.911 .000
Error 3487.500 42 83.036
Total 172300.000 48
Corrected Total 8966.667 47
a. R Squared = .611 (Adjusted R Squared = .565)


The “Contrast Results (K Matrix)” Table shows the result of the Helmert contrast analysis (on the dose only). The table is divided into two main components. The first is “Level 1 vs. Later”, which compares the first category (placebo) to the other two groups: the mean of the placebo group (63.75) is compared against the mean of the other two groups [(64.69 + 46.56)/2 = 55.63]. The difference between the two means = 63.75 − 55.63 = 8.12, and this contrast is significant. This seems to imply that any amount of the drug makes a difference, but that is misleading: inspecting the placebo and low groups shows that their means are almost identical.
The second component of the table is the “Level 2 vs. Level 3” contrast which tests the difference
between the low and the high groups. The difference is 18.125 which is also significant.

Contrast Results (K Matrix)

Dependent Variable: Outcome
Dose Helmert Contrast
Level 1 vs. Later      Contrast Estimate                          8.125
                       Hypothesized Value                         0
                       Difference (Estimate - Hypothesized)       8.125
                       Std. Error                                 2.790
                       Sig.                                       .006
                       95% Confidence Interval    Lower Bound     2.494
                       for Difference             Upper Bound     13.756
Level 2 vs. Level 3    Contrast Estimate                          18.125
                       Hypothesized Value                         0
                       Difference (Estimate - Hypothesized)       18.125
                       Std. Error                                 3.222
                       Sig.                                       .000
                       95% Confidence Interval    Lower Bound     11.623
                       for Difference             Upper Bound     24.627
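Both contrast estimates can be verified directly from the group means in the “Descriptive Statistics” table (plain arithmetic, sketched here in Python):

```python
# Group means of the outcome (from the Descriptive Statistics table)
placebo, low, high = 63.75, 64.6875, 46.5625

# Helmert contrast: each level vs. the mean of all subsequent levels
level1_vs_later = placebo - (low + high) / 2   # placebo vs. mean(low, high)
level2_vs_level3 = low - high                  # low vs. high
print(level1_vs_later, level2_vs_level3)       # 8.125 18.125
```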

The next output is the “Multiple Comparisons” table which presents the post hoc results.
Remember that we only asked for post hoc tests for the dose variable. The results in the table
inform us that the placebo and low dose groups are not significantly different whereas the high
group is significantly different from the other two groups. Remember that the post hoc tests (like contrasts) do not inform us about the interaction effect of the two independent variables.


Multiple Comparisons
Dependent Variable: Outcome

             (I) Dose   (J) Dose   Mean Difference (I-J)   Std. Error   Sig.    95% CI Lower Bound   95% CI Upper Bound
Tukey HSD    Placebo    Low        -.9375                  3.22172      .954    -8.7646              6.8896
                        High       17.1875*                3.22172      .000    9.3604               25.0146
             Low        Placebo    .9375                   3.22172      .954    -6.8896              8.7646
                        High       18.1250*                3.22172      .000    10.2979              25.9521
             High       Placebo    -17.1875*               3.22172      .000    -25.0146             -9.3604
                        Low        -18.1250*               3.22172      .000    -25.9521             -10.2979
Bonferroni   Placebo    Low        -.9375                  3.22172      1.000   -8.9714              7.0964
                        High       17.1875*                3.22172      .000    9.1536               25.2214
             Low        Placebo    .9375                   3.22172      1.000   -7.0964              8.9714
                        High       18.1250*                3.22172      .000    10.0911              26.1589
             High       Placebo    -17.1875*               3.22172      .000    -25.2214             -9.1536
                        Low        -18.1250*               3.22172      .000    -26.1589             -10.0911
Based on observed means.
The error term is Mean Square(Error) = 83.036.
*. The mean difference is significant at the .05 level.
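The Tukey portion of this table can be reproduced with SciPy's `tukey_hsd` (available in recent SciPy versions; this is an outside-of-SPSS check, pooling the 16 participants per dose across gender):

```python
from scipy.stats import tukey_hsd

# Outcome per dose group (16 participants each, genders pooled)
placebo = [65, 70, 60, 60, 60, 55, 60, 55, 50, 55, 80, 65, 70, 75, 75, 65]
low     = [70, 65, 60, 70, 65, 60, 60, 50, 45, 60, 85, 65, 70, 70, 80, 60]
high    = [55, 65, 70, 55, 55, 60, 50, 50, 30, 30, 30, 55, 35, 20, 45, 40]

res = tukey_hsd(placebo, low, high)
# res.statistic[i][j] is mean(group i) - mean(group j)
print(res.statistic.round(4))
print(res.pvalue.round(3))  # placebo vs. low is not significant; high differs from both
```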

The “Homogeneous Subsets” table provides similar conclusions: the placebo and low groups are placed in the same homogeneous subset (statistically equal means).


Outcome
                                                  Subset
Dose                                  N         1          2
Tukey HSDᵃ,ᵇ             High         16     46.5625
                         Placebo      16                63.7500
                         Low          16                64.6875
                         Sig.                1.000      .954
Ryan-Einot-Gabriel-      High         16     46.5625
Welsch Rangeᵇ            Placebo      16                63.7500
                         Low          16                64.6875
                         Sig.                1.000      .772
Means for groups in homogeneous subsets are displayed.
Based on observed means.
The error term is Mean Square(Error) = 83.036.
a. Uses Harmonic Mean Sample Size = 16.000.
b. Alpha = .05.

Question: What would you infer if the “Estimated Marginal Means of Outcome” plot looked like this?

We can conclude that there is no interaction (as the dose increases, a similar trend is observed for both genders). When the two lines are almost parallel, we can safely assume that there is no interaction; however, if the lines cross or diverge markedly, an interaction is likely (although its significance must still be tested).


Question: What if one or more of the factorial ANOVA (parametric) assumptions were violated?
SPSS does not provide a nonparametric counterpart to factorial ANOVA, although other software might. Data transformation may provide a solution (for non-normal data and/or heterogeneous variances).

Self-study problem
People with obsessive compulsive disorder (OCD) tend to check things excessively: they may, for instance, check whether they locked the door so many times that it takes them forever to leave the house. One of the OCD theories suggests that it is caused by a combination of the mood (positive
or negative) interacting with your rules on when to stop (you continue the task until you feel like
stopping or until you feel that you’ve done the task as best as possible). Davey et al. (2003) tested
this hypothesis on a group of people by inducing positive mood, negative mood, and no mood in
different participants. The outcome (dependent variable) is the number of things that each
participant will check before leaving home on a holiday. Half of the participants in each mood condition were asked to generate as many items (checks) as they could (“as many as can” stop rule), whereas the other half were asked to keep generating items for as long as they felt like continuing (“feel like continuing” stop rule). Conduct the proper analysis to test the hypothesis that
OCD is affected by the combination of the mood and the stop rule. The data is in “Davey et al.
(2003)” data file.

