Univariate GLM, ANOVA, and ANCOVA

Overview
Univariate GLM is the version of the general linear model now often used to implement two long-
established statistical procedures - ANOVA and ANCOVA. Univariate GLM, ANOVA, and ANCOVA all
deal with the situation where there is one dependent variable and one or more independents. ANCOVA also
supports use of continuous control variables as covariates.

Analysis of variance (ANOVA) is used to uncover the main and interaction effects of categorical
independent variables (called "factors") on an interval dependent variable. The new general linear model
(GLM) implementation of ANOVA also supports categorical dependents. A "main effect" is the direct
effect of an independent variable on the dependent variable. An "interaction effect" is the joint effect of
two or more independent variables on the dependent variable. Whereas regression models cannot handle
interaction unless explicit crossproduct interaction terms are added, ANOVA uncovers interaction effects
on a built-in basis. For the case of multiple dependents, discussed separately, multivariate GLM
implements multivariate analysis of variance (MANOVA), including a variant which supports control
variables as covariates (MANCOVA).

The key statistic in ANOVA is the F-test of difference of group means, testing if the means of the groups
formed by values of the independent variable (or combinations of values for multiple independent
variables) are different enough not to have occurred by chance. If the group means do not differ
significantly then it is inferred that the independent variable(s) did not have an effect on the dependent
variable. If the F test shows that overall the independent variable(s) is (are) related to the dependent
variable, then multiple comparison tests of significance are used to explore just which values of the
independent(s) have the most to do with the relationship.

If the data involve repeated measures of the same variable, as in before-after or matched pairs tests, the F-
test is computed differently from the usual between-groups design, but the inference logic is the same.
There are also a large variety of other ANOVA designs for special purposes, all with the same general
logic.

Note that analysis of variance tests the null hypotheses that group means do not differ. It is not a test of
differences in variances, but rather assumes relative homogeneity of variances. Thus some key ANOVA
assumptions are that the groups formed by the independent variable(s) are relatively equal in size and have
similar variances on the dependent variable ("homogeneity of variances"). Like regression, ANOVA is a
parametric procedure which assumes multivariate normality (the dependent has a normal distribution for
each value category of the independent(s)).
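
For readers working outside SPSS, a minimal check of these two assumptions can be sketched in Python with SciPy (an illustrative assumption, not a tool discussed in this text); the groups below are hypothetical.

    # Sketch: checking ANOVA assumptions with SciPy (hypothetical data).
    import numpy as np
    from scipy.stats import levene, shapiro

    rng = np.random.default_rng(0)
    # Hypothetical civil-liberties scores for three groups
    g1 = rng.normal(50, 10, 40)
    g2 = rng.normal(55, 10, 40)
    g3 = rng.normal(60, 10, 40)

    # Levene's test: null hypothesis = equal variances across groups
    W, p_levene = levene(g1, g2, g3, center='median')
    print(f"Levene W = {W:.2f}, p = {p_levene:.3f}")

    # Shapiro-Wilk test of normality within each group
    for i, g in enumerate([g1, g2, g3], start=1):
        stat, p = shapiro(g)
        print(f"Group {i}: Shapiro-Wilk p = {p:.3f}")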

Analysis of covariance (ANCOVA) is used to test the main and interaction effects of categorical variables
on a continuous dependent variable, controlling for the effects of selected other continuous variables which
covary with the dependent. The control variable is called the "covariate." There may be more than one
covariate. One may also perform planned comparison or post hoc comparisons to see which values of a
factor contribute most to the explanation of the dependent. ANCOVA uses built-in regression using the
covariates to predict the dependent, then does an ANOVA on the residuals (the observed minus the
predicted values of the dependent) to see if the factors are still significantly related to the dependent
variable after the variation due to the covariates has been removed.
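
A minimal sketch of this two-step logic, using Python's statsmodels rather than SPSS (an illustrative assumption; the variable names score, pretest, and group are hypothetical):

    # Sketch of the ANCOVA logic described above, on hypothetical data.
    import numpy as np
    import pandas as pd
    from statsmodels.formula.api import ols
    from statsmodels.stats.anova import anova_lm

    rng = np.random.default_rng(1)
    n = 90
    df = pd.DataFrame({
        "group":   np.repeat(["A", "B", "C"], n // 3),   # factor
        "pretest": rng.normal(50, 10, n),                # covariate
    })
    df["score"] = 0.6 * df["pretest"] + df["group"].map({"A": 0, "B": 3, "C": 6}) \
                  + rng.normal(0, 5, n)

    # Two-step illustration of the idea: regress out the covariate,
    # then ask whether the factor still explains the residual variation.
    df["resid"] = ols("score ~ pretest", data=df).fit().resid
    print(anova_lm(ols("resid ~ C(group)", data=df).fit()))

    # In practice the ANCOVA model fits factor and covariate together:
    print(anova_lm(ols("score ~ pretest + C(group)", data=df).fit(), typ=2))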

In SPSS, select Analyze, General Linear Model, Univariate; enter the dependent variable, the factor(s), and
the covariate(s); click the Model button and accept the default, which is Full Factorial (if you select
Custom, your model should not include interactions of factors with covariates: that is used beforehand in
testing the equality of regressions assumption discussed below in the "Assumptions" section, but not in the
ANCOVA model itself). The Full Factorial model contains the intercept, all factor and covariate main
effects, and all factor-by-factor interactions. For instance, for three variables A, B, and C, it includes the
effects A, B, C, A*B, A*C, B*C, and A*B*C. It does not contain factor-by-covariate interactions.
Covariates will be listed in the /DESIGN statement after the WITH keyword. The maximum number of
covariates SPSS will process is 10.

ANCOVA is used for three purposes:

In quasi-experimental (observational) designs, to remove the effects of variables which modify the
relationship of the categorical independents to the interval dependent.
In experimental designs, to control for factors which cannot be randomized but which can be
measured on an interval scale. Since randomization in principle controls for all unmeasured
variables, the addition of covariates to a model is rarely if ever needed in experimental research.
If a covariate is added and it is uncorrelated with the treatment (independent) variable, it is difficult
to interpret, as in principle it is controlling for something already controlled for by randomization. If
the covariate is correlated with the treatment/independent, then its inclusion will lead the researcher
to underestimate the effect size of the treatment factors (independent variables).
In regression models, to fit regressions where there are both categorical and interval independents.
(This third purpose has largely been displaced by logistic regression and other methods. On ANCOVA
regression models, see Wildt and Ahtola, 1978: 52-54.)

All three purposes have the goal of reducing the error term in the model. Like other control procedures,
ANCOVA can be seen as a form of "what if" analysis, asking what would happen if all cases scored
equally on the covariates, so that the effect of the factors over and beyond the covariates can be isolated.
ANCOVA can be used in all ANOVA designs and the same assumptions apply.

Key Concepts
o Why testing means is related to variance. ANOVA focuses on F-tests of significance of differences in
group means, discussed below. If one has an enumeration rather than a sample, then any difference of
means is "real." However, when ANOVA is used for comparing two or more different samples, the real
means are unknown. The researcher wants to know if the difference in sample means is enough to
conclude the real means do in fact differ among two or more groups (ex., if support for civil liberties
differs among Republicans, Democrats, and Independents). The answer depends on:
the size of the difference between group means (the variability of group means).
the sample sizes in each group. Larger sample sizes give more reliable information and even small
differences in means may be significant if the sample sizes are large enough.
the variances of the dependent variable (ex., civil liberties scores) in each group. For the same
absolute difference in means, the difference is more significant if in each group the civil liberties
scores tightly cluster about their respective means. Likewise, if the civil liberties scores are widely
dispersed (have high variance) in each group, then the given difference of means is less significant.

The formulas for the t-test (a special case of one-way ANOVA), and for the F-test used in ANOVA, thus
reflect three things: the difference in means, group sample sizes, and the group variances. That is, the
ANOVA F-test is a function of the variance of the set of group means, the overall mean of all observations,
and the variances of the observations in each group weighted for group sample size. Thus, the larger the
difference in means, the larger the sample sizes, and/or the lower the variances, the more likely a finding of
significance.
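
A small worked sketch of that computation follows, in Python (not part of the original text); the three groups of scores are hypothetical.

    # Sketch: computing the one-way F statistic by hand, to show how mean
    # differences, group sizes, and group variances enter the formula.
    import numpy as np

    groups = [np.array([52., 55, 58, 61, 49]),   # hypothetical group 1 scores
              np.array([60., 63, 59, 66, 62]),   # hypothetical group 2 scores
              np.array([57., 54, 60, 58, 56])]   # hypothetical group 3 scores

    all_scores = np.concatenate(groups)
    grand_mean = all_scores.mean()
    k, N = len(groups), all_scores.size

    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within  = sum(((g - g.mean()) ** 2).sum() for g in groups)

    ms_between = ss_between / (k - 1)   # variance of group means, weighted by n
    ms_within  = ss_within / (N - k)    # pooled within-group variance
    F = ms_between / ms_within
    print(f"F({k - 1}, {N - k}) = {F:.2f}")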

o One-way ANOVA tests differences in a single interval dependent variable among two, three, or more
groups formed by the categories of a single categorical independent variable. Also known as univariate
ANOVA, simple ANOVA, single classification ANOVA, or one-factor ANOVA, this design deals with one
independent variable and one dependent variable. It tests whether the group means are similar
(specifically, by comparing between-group and within-group estimates of variance). If the groups seem
different, then it is concluded that the independent variable has an effect on the dependent (ex., if different
treatment groups have different health outcomes). One may note also that the significance level of a
correlation coefficient for the correlation of an interval variable with a dichotomy will be the same as for a
one-way ANOVA on the interval variable using the dichotomy as the only factor. This similarity does not
extend to categorical variables with more than two values.

In SPSS, select Analyze, Compare Means, One-Way ANOVA; enter the dependent variable in the
Dependent list; enter the independent variable as the Factor.
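
The equivalent analysis outside SPSS, plus a check of the dichotomy/correlation note above, can be sketched in Python with SciPy (hypothetical data; SciPy is an assumption of this illustration, not something the text prescribes).

    # Sketch: one-way ANOVA with SciPy; for a dichotomous factor, the
    # correlation p-value equals the one-way ANOVA p-value, as noted above.
    import numpy as np
    from scipy.stats import f_oneway, pearsonr

    rng = np.random.default_rng(2)
    treated = rng.normal(75, 8, 30)   # hypothetical outcome, group 1
    control = rng.normal(70, 8, 30)   # hypothetical outcome, group 2

    F, p_anova = f_oneway(treated, control)

    y = np.concatenate([treated, control])
    dummy = np.concatenate([np.ones(30), np.zeros(30)])   # dichotomy coded 1/0
    r, p_corr = pearsonr(y, dummy)

    print(f"ANOVA p = {p_anova:.4f}, correlation p = {p_corr:.4f}")  # identical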

o Two-way ANOVA analyzes one interval dependent in terms of the categories (groups) formed by two
independents, one of which may be conceived as a control variable. Two-way ANOVA tests whether the
groups formed by the categories of the independent variables have similar means on the dependent
variable. Two-way ANOVA is less sensitive than one-way ANOVA to moderate violations of the assumption
of homogeneity of variances across the groups. In SPSS, select Analyze, General Linear Model, Univariate;
enter the dependent
variable and the independents (factors); if you want to test interactions, click Model and select Custom,
Model (Interaction) and enter interaction terms (ex. gender*race); click Plots to set plot options; click
Options to set what predicted group and interaction means are desired.
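
A rough Python/statsmodels equivalent of such a two-way model with an interaction term (illustrative only; the gender, race, and score variables are hypothetical):

    # Sketch of a two-way ANOVA with an interaction term on hypothetical data.
    import numpy as np
    import pandas as pd
    from statsmodels.formula.api import ols
    from statsmodels.stats.anova import anova_lm

    rng = np.random.default_rng(3)
    df = pd.DataFrame({
        "gender": rng.choice(["male", "female"], 200),
        "race":   rng.choice(["white", "black", "other"], 200),
    })
    df["score"] = 50 + rng.normal(0, 10, 200)

    # 'C(gender) * C(race)' expands to both main effects plus the interaction
    model = ols("score ~ C(gender) * C(race)", data=df).fit()
    print(anova_lm(model, typ=2))
    # SPSS's default is Type III sums of squares; in statsmodels that would
    # require sum-to-zero contrasts, e.g. C(gender, Sum), together with typ=3.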
o Multivariate or n-way ANOVA. To generalize, n-way ANOVA deals with n independents. It should be
noted that as the number of independents increases, the number of potential interactions proliferates. Two
independents have a single first-order interaction (AB). Three independents have three first-order
interactions (AB, AC, BC) and one second-order interaction (ABC), or four in all. Four independents have
six first-order (AB, AC, AD, BC, BD, CD), four second-order (ABC, ABD, ACD, BCD), and one third-order
(ABCD) interaction, or 11 in all. As the number of interactions increases, it becomes increasingly difficult
to interpret the model. The MAXORDERS command in SPSS syntax allows the researcher to limit what
order of interaction is computed.
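
A tiny sketch confirming these counts (Python, illustrative only):

    # Counting the potential interaction terms among n factors:
    # 4 in all for n = 3, 11 in all for n = 4.
    from math import comb

    for n in (2, 3, 4):
        counts = [comb(n, k) for k in range(2, n + 1)]   # 2-way, 3-way, ... n-way
        print(f"n = {n}: interactions by order {counts}, total {sum(counts)}")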
o Factors are categorical independent variables, such as treatments. A factor is a fixed factor if all of its
values (categories) are measured, which is the usual case. A factor is a random factor if only a random
sample of its values are measured, which may be the case when a factor has a very large number of values.
Profile plots are plots of marginal means on the continuous dependent variable for value groups of
one factor, using values of another factor as the X axis (the Y axis is the magnitude of the mean).
Profile plots are an easy way to visualize the relationship of factors to the dependent variable and to
each other. For profile plots, click the Plots button in the univariate ANOVA dialog, specify one
factor as the horizontal axis, then specify a second factor for "Separate lines."
o Covariates in ANCOVA: A covariate is an interval-level independent. If there are covariates, ANCOVA is
used instead of ANOVA. Covariates are commonly used as control variables. For instance, use of a
baseline pre-test score can be used as a covariate to control for initial group differences on math ability or
whatever is being assessed in the ANCOVA study. That is, in ANCOVA we look at the effects of the
categorical independents on an interval dependent, after effects of interval covariates are controlled. (This
is similar to regression, where the beta weights of categorical independents represented as dummy
variables entered after interval independents reflect the control effect of these independents).

In SPSS, select Analyze, General Linear Model, Univariate; enter the dependent variable, factor(s), and
covariate variables; click Model and select the model type you want (ex., Full Factorial) or click Custom
and specify a model (such as Model: gender race gender*race). You may click the Options button and
specify means to estimate (gives marginal means for each level of the selected factor when the covariate is
at its mean value), and under the Options button you may check that you want Parameter Estimates (which
are the GLM-estimated regression coefficients).

Regression models. In univariate GLM, entering only covariates and no factors in the model is
equivalent to specifying a regression model.
b coefficients. If you ask for parameter estimates for a model which has factors as well as
covariates, adding the b corresponding to a given level of a factor to the b for the intercept
gives the estimate of the dependent for that level when the covariate is 0.
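
A minimal sketch of this interpretation of the parameter estimates, using Python's statsmodels on hypothetical data (the group and pretest names are invented for illustration):

    # Sketch: intercept b0 is the predicted dependent for the reference group at
    # covariate = 0; b0 plus the b for another factor level is the prediction
    # for that level at covariate = 0.
    import numpy as np
    import pandas as pd
    from statsmodels.formula.api import ols

    rng = np.random.default_rng(4)
    df = pd.DataFrame({
        "group":   np.repeat(["control", "treatment"], 50),
        "pretest": rng.normal(50, 10, 100),
    })
    df["score"] = 20 + 0.5 * df["pretest"] + (df["group"] == "treatment") * 5 \
                  + rng.normal(0, 4, 100)

    fit = ols("score ~ C(group) + pretest", data=df).fit()
    print(fit.params)
    b0 = fit.params["Intercept"]
    b_treat = fit.params["C(group)[T.treatment]"]
    print("Predicted score, treatment group, pretest = 0:", b0 + b_treat)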
o Designs. ANOVA and ANCOVA have a number of different experimental designs. The alternative designs
affect how the F-ratio is computed in generating the ANOVA table. However, regardless of design, the
ANOVA table is interpreted similarly -- the significance of the F-ratio indicates the significance of each
main and interaction effect (and each covariate effect in ANCOVA).
Between-groups ANOVA design: When a dependent variable is measured for independent groups
of sample members, where each group is exposed to a different condition, the set of conditions is
called a between-subjects factor. The groups correspond to conditions, which are categories of a
categorical independent variable. For the experimental mode, the conditions are assigned randomly
to subjects by the researcher, or subjects are assigned randomly to exposure to the conditions,
which is equivalent. For the non-experimental mode, the conditions are simply measures of the
independent variable for each group. For instance, four random groups might all be asked to take a
performance test (the interval dependent variable) but each group might be exposed to different
levels of noise distraction (the categorical independent variable).

This is the usual ANOVA design. There is one set of subjects: the "groups" refer to the subset of
subjects associated with each category of the independent variable (in one-way ANOVA) or with
each cell formed by multiple categorical independents (in multivariate ANOVA). After
measurements are taken for each group, analysis of variance is computed to see if the variance on
the dependent variable between groups is different from the variance within groups. Just by chance,
one would expect the variance between groups to be as large as the variance within groups. If the
variance between groups is enough larger than the variance within groups, as measured by the F
ratio (discussed below), then it is concluded that the grouping factor (the independent variable(s)
does/do have a significant effect.

Completely randomized design is simply between-groups ANOVA design for the experimental
mode (see above), where an equal number of subjects is assigned randomly to each of the cells
formed by the factors (treatments). Randomization is an effort to control for all unmeasured factors.
When there is an a priori reason for thinking some additional independent categorical variable is
important, the additional variable may be controlled explicitly by a block design (see below), or by
a covariate (in ANCOVA) if the independent is a continuous variable. In the non-experimental
mode, where there is no control by randomization, it is all the more important to control explicitly
by using covariates.
Latin square designs. Latin square designs extend the logic of block designs to control for two
categorical variables. Latin square designs also reduce the number of observations necessary to
compute ANOVA. This design requires that the researcher assume all interaction effects are zero.
Normally, if one had three variables, each of which could assume four values, then one would need
4^3 = 64 observations just to have one observation for every possible combination. Under Latin
square design, however, the number of necessary observations is reduced to 4^2 = 16 because the
third variable is nested. For instance, suppose there are 4 teachers, 4 classes, and 4 textbooks. The
16 groups in the design would be the 16 different teacher-class pairs. Each teacher would teach in
each of the four classes, using a different text each time. Each class would be taught by each of the
four different teachers, using a different text each time. However, only 16 of the 64 possible
teacher-class-textbook combinations would be represented in the design because textbooks are a
nested factor, with each class and each teacher being exposed to a given textbook only once.
Eliminating all but 16 cells from the complete (crossed) design requires the researcher to assume
there are no significant teacher-textbook or class-textbook interaction effects, only the main effects
for teacher, class, and textbook. For a discussion of how to select the necessary observations under
Latin square, see Iverson and Norpoth (1987: 80-84).
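
A small sketch of the cyclic construction of such a square (Python, illustrative only; the teacher/class/textbook labels are hypothetical):

    # Building a 4 x 4 Latin square assignment like the teacher/class/textbook
    # example above, using a simple cyclic construction.
    n = 4
    textbooks = ["T1", "T2", "T3", "T4"]

    # Cell (teacher i, class j) gets textbook (i + j) mod n, so each textbook
    # appears exactly once in every row (teacher) and every column (class):
    # 16 cells instead of the 64 needed for a fully crossed design.
    for i in range(n):
        row = [textbooks[(i + j) % n] for j in range(n)]
        print(f"teacher {i + 1}: {row}")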
Graeco-Latin square designs extend block designs to control for three categorical variables.
Full Factorial ANOVA: Factorial ANOVA is for more than one factor (more than one independent
-- hence for two-way ANOVA or higher), used to assess the relative importance of various
combinations of independents. In a full factorial design, the model includes all main effects and all
interactions among the factors but does not include interactions between the factors and the
covariates. As such, factorial ANOVA is not a true separate form of ANOVA design but rather a way of
combining designs. A design matrix table shows the intersection of the categories of the
independent variables. A corresponding ANOVA table is constructed in which the columns are the
various covariate (in ANCOVA), main, and interaction effects. See the discussion of two-way
ANOVA below.
Factors are categorical independent variables. The categories of a factor are its groups or
levels. When using factorial ANOVA terminology, 2 x 3 ("two-by-three") factorial design
means there are two factors with the first having two categories and the second having
three, for a total of six groups (levels). A 2x2x2 factorial design has three factors, each with
two categories. The order of the factors makes no difference. If you multiply through, you
have the number of groups (often "treatment groups") formed by all the independents
collectively. Thus a 2x3 design has 6 groups, and a 2x2x2 design has 8 groups. In
experimental research equal numbers of subjects are assigned to each group on a random
basis.
Balanced designs are simply factorial designs where there are equal numbers of cases in
each subgroup (cell) of the design, assuring that the factors are independent of one another
(but not necessarily the covariates). Unbalanced designs have unequal n's in the cells
formed by the intersection of the factors.
Random effect models. Most ANOVA designs are fixed effect models ("Model I"), meaning
that data are collected on all categories of the independent variables. Factors with all
category values included are called "fixed factors." In random effect models ("Model II"),
in contrast, data are collected only for a sample of categories. For instance, a researcher
may study the effect of item order in a questionnaire. Six items could be ordered 720 ways.
However, the researcher may limit him- or herself to the study of a sample of six of these
720 ways. The random effect model in this case would test the null hypothesis that the
effects of ordering are zero. For one-way ANOVA, computation of F is the same for fixed
and random effects, but computation differs when there are two or more independents. The
resulting ANOVA table still gives similar sums of squares and F-ratios for the main and
interaction effects, and is read similarly (see below). See Iverson and Norpoth (1987: 69-
78). Random effect models assume normality, homogeneity of variances, and sphericity, but
are robust to violations of these assumptions (Jackson and Brashers, 1994: 34-35).
Random factors models are the same as random effect models. Do not confuse these
terms with completely randomized design or randomized block design, which are
fixed factor models.
Random effects are factors which meet two criteria:
Replaceability: The levels (categories) of the factor (independent variable)
are randomly or arbitrarily selected, and could be replaced by other, equally
acceptable levels.
Generalization: The researcher wishes to generalize findings beyond the
particular, randomly or arbitrarily selected levels in the study.
Mixed factorial design is a factorial design with at least one fixed factor and at least
one random factor.
Nested designs. In nested designs, there are two (or more) factors, but the levels of
one factor are never repeated as levels of the other factor. This happens in
hierarchical designs, for instance, when a forester samples trees, then samples
seedlings of each sampled tree for survival rates. The seedlings are unique to each
tree and are a random factor. Likewise, we could sample drug companies and within
sampled companies, we could sample drug products for quality. This contrasts with
crossed designs of ordinary two-way (or higher) ANOVA, in which the levels of one
factor appear as levels in another factor (ex., tests may appear as levels across
schools). We can get the mean of different tests by averaging across schools, but we
cannot get the mean survival rate of different seedlings across trees because each
tree has its own unique seedlings. Likewise, we cannot compute the mean quality
rating for a drug product across companies because each company has its own
unique set of products. Latin square and Graeco-Latin square designs (see above)
are also nested designs.

In SPSS, Analyze, General Linear Model, Univariate; specify the main factor as
fixed or random, then specify the nested factor as random; click the Model button
and enter the main effects of the main (not nested) factor(s); click the Paste button
and modify the /DESIGN statement to a format such as /DESIGN = mainfactor
nestedfactor(mainfactor), signifying the model is the main effect of the fixed factor
plus the effects of the random nested factor at each value of the main fixed factor. In
the syntax window, Run All. In the resulting ANOVA table, a significant
nestedfactor(mainfactor) effect means that the dependent variable varies by the
nested factor even within the same level of (controlling for) the main factor.
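
For readers outside SPSS, a rough analogue can be sketched in Python by fitting the nested random factor as a random intercept with statsmodels MixedLM. This is a sketch under that assumption, not a reproduction of SPSS's expected-mean-squares computation; the treatment, batch, and y names are hypothetical.

    # Rough analogue of a nested design: fixed main factor, random factor
    # nested within it, fit as a random intercept per nested unit.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(5)
    rows = []
    for treat in ["A", "B"]:                 # fixed main factor
        for b in range(6):                   # random factor nested in treatment
            batch_effect = rng.normal(0, 2)
            for _ in range(5):               # observations within each batch
                y = {"A": 10, "B": 13}[treat] + batch_effect + rng.normal(0, 1)
                rows.append({"treatment": treat, "batch": f"{treat}{b}", "y": y})
    df = pd.DataFrame(rows)

    # Random intercept for each nested batch; fixed effect of treatment.
    model = smf.mixedlm("y ~ C(treatment)", data=df, groups=df["batch"])
    print(model.fit().summary())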

Treatment by replication design is a common random effects model. The treatment
is a fixed factor, such as exposure to different types of public advertising, while the
replication factor is the particular respondents who are treated. Sometimes it is
possible and advisable to simplify analysis from a hierarchical design to a simple
treatment by replication design by shifting the unit of analysis, as by using class
averages rather than student averages in a design in which students are a random
factor nested within teachers as another random factor (the shift drops the student
random factor from analysis). Note also that the greater the variance of the random
effect variable, the more levels needed (ex. more subjects in replication) to test the
fixed (treatment) factor at a given alpha level of significance.
Effects shown in the ANOVA table for a random factor design are interpreted a bit
differently from standard, within-groups designs. The main effect of the fixed
treatment variable is the average effect of the treatment across the randomly-
sampled or arbitrarily-selected categories of the random effect variable. The effect
of the fixed by random (ex., treatment by replication) interaction indicates the
variance of the treatment effect across the categories of the random effect variable.
The main effect of the random effect variable (ex., the replication effect) is of no
theoretical interest as its levels are arbitrary particular cases from a large population
of equally acceptable cases.
Treating a random factor as a fixed factor will inflate Type I error. The F test for the
treatment effect may read as .05 on the computer output, but F will have been
computed incorrectly. That is, the treatment effect will be .05 only for the particular
levels of the random effect variable (ex., the subjects in the replication factor). This
test result is irrelevant to the researcher's real interest, which is controlling the alpha
error rate (ex., .05) for the population from which the levels of the random effect
variable were taken. The correct computation of F is discussed below.

Put another way, if a random factor is treated as a fixed factor, the researcher opens
his or her research up to the charge that the findings pertain only to the particular
arbitrary cases studied and findings and inferences might well be quite different if
alternative cases had been selected. The purpose of using a random effect model is
to avoid these potential criticisms by taking into account the variability of the
replications or random effects when computing the error term which forms the
denominator of the F test for random effect models.

Mixed effects models. One can have mixed effects models, which have both fixed
factors and random factors. In SPSS, select Analyze, General Linear Model,
Univariate; specify the dependent variable; specify the fixed factor(s); specify the
random factor(s); click Model and select your model.
Repeated measures or within-groups ANOVA design: When a dependent variable is
measured repeatedly at different time points (ex., before and after treatment) for all sample
members across a set of conditions (the categories of an independent variable), this set of
conditions is called a within-subjects factor and the design is called within-groups or
repeated measures ANOVA. In the within-groups or repeated measures design, there is one
group of subjects. The conditions are the categories of the independent variable, which is
the repeated measures factor, and each subject is exposed to each condition and measured.
For instance, a single group of subjects might be asked to take a performance test (the interval
dependent variable) four times -- once under each of four levels of noise distraction (the
categorical independent variable).

The object of repeated measures design is to test the same group of subjects at each
category (ex., levels of distraction) of the independent variable. The levels are introduced to
the subject in a counterbalanced manner to rule out effects of practice and fatigue. The
levels must be independent (performance on one cannot affect performance on another).
Each subject is his or her own "control": the different "groups" are really the same people
tested at different levels of the independent variable. Because each subject is his/her own
control, unlike between-groups ANOVA, in repeated measures designs individual
differences don't affect differences between treatment groups. This in turn means that
within-group variance is no longer the appropriate error term (denominator) in the F-ratio.
This then requires different computation of error terms. SPSS makes these different
calculations automatically when repeated measures design is specified. Repeated measures
ANOVA is also much more affected by violations of the assumption of homogeneity of
variances (and covariances in ANCOVA) compared to between-groups ANOVA.

Note that the RELIABILITY procedure in SPSS's "Professional Statistics" module can be
used to perform repeated measures analysis of variance when the more complex options of
the MANOVA procedure are not needed.
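
A minimal sketch of a one-within-factor repeated measures ANOVA in Python with statsmodels' AnovaRM (balanced data required; the subject/noise/score names are hypothetical, and this is an illustrative assumption rather than an SPSS procedure):

    # Repeated measures ANOVA: each subject tested under every noise level.
    import numpy as np
    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    rng = np.random.default_rng(6)
    levels = ["none", "low", "medium", "high"]
    rows = [{"subject": s,
             "noise": lv,
             "score": 80 - 3 * i + rng.normal(0, 4)}
            for s in range(1, 21) for i, lv in enumerate(levels)]
    df = pd.DataFrame(rows)

    res = AnovaRM(data=df, depvar="score", subject="subject",
                  within=["noise"]).fit()
    print(res)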

Randomized Complete Block Design ANOVA: In a randomized complete block
design (RCBD), there are still one or more factors (treatments) as in completely
randomized design, but there is also another categorical (factor) variable, which is
used to form the blocks. In agriculture, the blocking variable might be plots of land
and the treatment factors might be fertilizers. In another study, drug brand and drug
dosage level could be the factors and the blocks could be age groups. Each brand-
dosage combination is considered a treatment. The blocking factor is sometimes
called the "nuisance variable." If there are four brands and three dosage levels, the
factor design contains 12 cells, one for each treatment. In RCB designs, subjects are
matched together in blocks (ex., age group), then one (usually) member of each
block is randomly assigned to each treatment. Each treatment is assigned an equal
number of times to each block group (usually once). Within each block there must
be as many subjects as treatment categories (12 here).
Within-groups F ratio. Note that when sample members are matched in this
way, the F-ratio is computed similar to that in a repeated measures ANOVA,
discussed above. Within-subjects ANOVA applies to matching as well as to
repeated measures designs.
RCB designs with empty cells. Type III sums of squares (the SPSS default)
are used even if some design cells are empty, provided (1) every treatment
appears at least once in some blocks and (2) each block has some of the same
treatments. If, however, a treatment does not appear in any block, then
significance tests should utilize Type IV sums of squares, not the default
Type III. Type IV sums of squares use a type of averaging to compensate for
empty cells. In SPSS, Type IV sums of squares are specified by choosing
Model, Custom, Model in the Univariate GLM dialog. For further
discussion, see Milliken and Johnson (1992).
Mixed design models is a term which refers to the fact that in repeated measures ANOVA
there also may still be one or more between-subjects factors in which each group of the
dependent variable is exposed to a separate and different category of an independent
variable, as discussed above in the section on between-groups designs. Mixed designs are
common. For instance, a performance test might be the interval dependent variable, noise
distraction might be the within-subjects repeated factor (measure) administered to all
subjects in a counterbalanced sequence, and the between-subjects factor might be mode of
testing (ex., having a pen-and-paper test group and a computer-tested group). Repeated
measures ANOVA must be specified whenever there are one or more repeated factor
measures, even if there are also some between-groups factors which are not repeated
measures. In mixed designs, sphericity is almost always violated and therefore epsilon
adjustments to degrees of freedom are routine prior to computing F-test significance levels.
Split-plot designs are a form of mixed design, originating in agricultural research,
where seeds were assigned to different plots of land, each receiving a different
treatment. In split plot designs, subjects (ex., seeds) are randomly assigned to each
level (ex., plots) of the between-groups factor (soil types), prior to receiving the
within-subjects repeated factor treatments (ex., applications of different types of
fertilizer).
Pretest-posttest designs are a special variant of mixed designs, which involve baseline
testing of treatment and control groups, administration of a treatment, and post-test
measurement. As Girden (1992: 57-58) notes, there are four ways of handling such designs:
One-way ANOVA on the posttest scores. This involves ignoring the pretest data and
is therefore not recommended.
Split-plot repeated measures ANOVA can be used when the same subjects are
measured more than once. In this design, the between-subjects factor is the group
(treatment or control) and the repeated measure is, for example, the test scores for
two trials. The resulting ANOVA table will include a main treatment effect
(reflecting being in the control or treatment group) and a group-by-trials interaction
effect (reflecting treatment effect on posttest scores, taking pretest scores into
account). This partitioning of the treatment effect may be more confusing than
analysis of difference scores, which gives equivalent results and therefore is
sometimes recommended.

In a typical split-plot repeated measures design, Subjects will be measured on some
Score over a number of Trials. Subjects will also be split by some Group variable. In
SPSS, Analyze, General Linear Model, Univariate; enter Score as the dependent;
enter Trial and Group as fixed factors; enter Subject as a random factor; Press the
Model button and choose Custom, asking for the Main effects for Group and Trial,
and the interaction effect of Trial*Group; then click the Paste button and modify
the /DESIGN statement to also include Subject(Group) to get the Subject-within-
Group effect; then select Run All in the syntax window to execute.

One-way ANOVA on difference scores, where difference is the posttest score minus
the pretest score. This is equivalent to a split-plot design if there is close to a perfect
linear relation between the pretest and posttest scores in all treatment and control
groups. This linearity will be reflected in a pooled within-groups regression
coefficient of 1.0. When this coefficient approaches 1.0, this method is more
powerful than the ANCOVA method.
ANCOVA on the posttest scores, using the pretest scores as a covariate control.
When pooled within-groups regression coefficient is less than 1.0, the error term is
smaller in this method than in ANOVA on difference scores, and the ANCOVA
method is more powerful.
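
A small sketch contrasting approaches (3) and (4) on hypothetical pretest/posttest data, using Python's statsmodels (an illustrative assumption, not part of Girden's treatment):

    # ANOVA on difference scores versus ANCOVA on posttest with pretest covariate.
    import numpy as np
    import pandas as pd
    from statsmodels.formula.api import ols
    from statsmodels.stats.anova import anova_lm

    rng = np.random.default_rng(7)
    n = 80
    df = pd.DataFrame({"group": np.repeat(["control", "treatment"], n // 2)})
    df["pre"] = rng.normal(50, 10, n)
    df["post"] = df["pre"] * 0.7 + (df["group"] == "treatment") * 4 \
                 + rng.normal(0, 5, n)

    # (3) One-way ANOVA on difference scores
    print(anova_lm(ols("I(post - pre) ~ C(group)", data=df).fit()))

    # (4) ANCOVA on posttest with pretest as the covariate
    print(anova_lm(ols("post ~ pre + C(group)", data=df).fit(), typ=2))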

Effects
Main effects: Main effects are the unique effects of the categorical independent
variables. If the probability of F is less than .05 for any independent, it is concluded
that that variable does have an effect on the dependent.
Interaction effects: Interaction effects are the joint effects of pairs, triplets, or
higher-order combinations of the independent variables, different from what would
be predicted from any of the independents acting alone. That is, when there is
interaction, the effect of an independent on a dependent varies according to the
values of another independent. When there is interaction in a fixed effect model, the
researcher becomes less interested in the main effects but instead proceeds to
examine one factor's effect at each level of the other factor. If the probability of F is
less than .05 for any such combination, we conclude that that interaction of the
combination does have an effect on the dependent. Note that the concept of
interaction between two independents is not related to the issue of whether the two
variables are correlated.
Covariate effects: In ANCOVA, covariates are interval-level independents. The F
test is interpreted the same as for main effects.
Residual effects: Residual effects are the effects of unmeasured variables. The smaller the
sum of squared residuals, the better the fit of the ANOVA model. When the residual
effect is 0, then the values of each observation are the same within groups, since the
grouping effects totally determine the dependent. That is, the group mean estimates
the value of the dependent when the residual effect is 0. The differences between
observed values and group means are the residuals. The residual sum of squares will
equal the total sum of squares minus the group sum of squares.
Residual analysis: Systematic patterns in the residuals may throw light on
unmeasured variables which should have been included in the analysis.
Extreme outliers in the distribution may indicate cases which need to be
explained on a different basis. Normal distribution of residuals is an
assumption of ANOVA. While it has been demonstrated that the F-test is
robust in the face of violations of this assumption, if there is extreme
skewness or extreme kurtosis, then the reliability of the F-test is brought into
question. Histograms of residuals (bar charts) allow visual inspection of
skew and kurtosis, and are output by most ANOVA software.
Effect size. An effect size, usually denoted d, is a standardized measure of the
strength of a relationship. That is, the effect size indicates the relative importance of
the given covariate, main, or interaction effect. Effect sizes are computed from
differences in subgroup means by effect category: the difference in means is
divided by the pooled standard deviation (the standard deviation of the
unstandardized data for all the cases, for all groups) to provide a coefficient which
may be used to compare group effects. In a two-variable analysis, d is the difference
in group means (on y, the dependent) divided by the pooled standard deviation of y
(a worked sketch follows this list of effects). Computation of d becomes more
complex for other ANOVA designs - Cortina and Nouri (2000) give formulas for
n-way, factorial, ANCOVA, and repeated measures designs. In an ANOVA table, the
effect size normally is placed at the bottom of each effect column.

There are several variants for computing d. Glass's d, for instance, uses the control
group standard deviation rather than the pooled standard deviation as the denominator.
If the homogeneity of variances assumption is met, this variation is without effect.
Cortina and Nouri (2000: 58) note that for sample sizes greater than 50, variant
formulas for computing d seldom inflate d by more than .01, rarely affecting
inferences.

For meta-analyses comparing effect sizes across groups, either d or r (correlation)
may be used. Usually the researcher will use one or the other, but not both.
Conversion of p values, F values, and t values to r is easier than for d and so r is
more often used for this purpose. Cortina and Nouri (2000) provide formulas for
conversion to d.

Profile plots of effects. Effects may be depicted graphically. Univariate GLM
predicts mean cell values, which may be plotted across groups to show trends and
effects. The X axis is categories of a factor and the Y axis is estimated means. Lines
connect means of the dependent (ex., test scores) of a second factor's categories (ex.,
gender) across categories of a first factor (ex., region). Parallel or roughly parallel
lines indicate lack of interaction effects (ex., lack of interaction between gender and
region). In SPSS, profile plots are selected under the Plots button in the main dialog
box on selecting Analyze, General Linear Model, Univariate.
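
A worked sketch of the two-group effect size d described in the list above (Python; all scores hypothetical):

    # Effect size d: difference in group means divided by the pooled SD.
    import numpy as np

    treated = np.array([78., 82, 85, 80, 84, 79, 83, 81])
    control = np.array([74., 76, 79, 73, 77, 75, 78, 72])

    n1, n2 = len(treated), len(control)
    s1, s2 = treated.std(ddof=1), control.std(ddof=1)
    pooled_sd = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

    d = (treated.mean() - control.mean()) / pooled_sd
    print(f"Cohen's d = {d:.2f}")
    # Glass's variant would divide by the control-group SD (s2) instead.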
Significance tests
F-test, also called the F-ratio. The F-test is an overall test of the null hypothesis that
group means on the dependent variable do not differ. It is used to test the
significance of each main and interaction effect (the residual effect is not tested
directly). A "Sig." or "p" probability value of .05 or less on the F test conventionally
leads the researcher to conclude the effect is real and not due to chance of sampling.
For most ANOVA designs, F is between-groups mean square variance divided by
within-groups mean square variance. (Between-groups variance is the variance of
the set of group means from the overall mean of all observations. Within-groups
variance is a function of the variances of the observations in each group weighted
for group size.) If the computed F score is greater than 1, then there is more
variation between groups than within groups, from which we infer that the grouping
variable does make a difference. If the F score is enough above 1, it will be found to
be significant in a table of F values, using df = k - 1 and df = N - k, where N is sample
size and k is the number of groups formed by the factor(s). That is, the logic of the
F-test is that the larger the ratio of between-groups variance (a measure of effect) to
within-groups variance (a measure of noise), the less likely that the null hypothesis
is true.

If the computed F value is around 1.0, differences in group means are only random
variations. If the computed F score is significantly greater than 1, then there is more
variation between groups than within groups, from which we infer that the grouping
variable does make a difference. Note that the significant difference may be very
small for large samples. The researcher should report not only significance, but also
strength of association, discussed below.

Reading the F value. If the F score is enough above 1, it will be found to be
significant in a table of F values, using k - 1 (number of groups minus 1)
degrees of freedom for between-groups and n - k (sample size minus the
number of groups) for within-groups degrees of freedom. If F is significant,
then we conclude there are differences in group means, indicating that the
independent variable has an effect on the dependent variable. In practice, of
course, computer programs do the lookup for the researcher and return the
significance level automatically.

Example. For instance, an F-ratio of 1.21 with 1 and 8 degrees of freedom
corresponds to a significance level of .30, which means that there is a 30%
chance that one would find a sample difference of means this large or larger
when the unknown real difference is zero. At the .05 significance level
customarily used by social scientists, this is too much chance.
That is, the researcher would not reject the null hypothesis that the group
means do not differ on the dependent variable being measured.
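
The same lookup can be sketched with SciPy's F distribution instead of a printed table (an illustrative assumption, not part of the text):

    # Upper-tail probability of F = 1.21 with 1 and 8 degrees of freedom.
    from scipy.stats import f

    p = f.sf(1.21, 1, 8)    # survival function = P(F > 1.21)
    print(round(p, 2))      # roughly .30, as in the example above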

Significance in two-way ANOVA. Toothaker (1993: 69) notes that in two-way
ANOVA most researchers set the alpha significance level (ex., .05) at the
same level for the two main effects and the interaction effect, but that "when
you make this choice, you should realize that the error rate for the whole
experiment is approximately three times alpha." Toothaker therefore
recommends setting the error rate at alpha/3 to obtain an overall
experimentwise error rate of alpha in two-way ANOVA.

F-test assumptions. The F test is less reliable as sample sizes are smaller,
group sample sizes are more divergent, and the number of factors increases
(see Jaccard, 1998: 81). In the case of unequal variances and unequal group
sample sizes, F is conservative if smaller variances are found in groups with
smaller samples. If larger variances are found in groups with smaller
samples, F is too liberal, with actual Type I error more than indicated by the
F test.
Adjusted means are usually part of ANCOVA output and are examined if the F-test
demonstrates significant relationships exist. Comparison of the original and adjusted
group means can provide insight into the role of the covariates. For k groups formed
by categories of the categorical independents and measured on the dependent
variable, the adjustment shows how these k means were altered to control for the
covariates. Typically, the adjustment is a linear regression of the type:
Y'(i) = Y(i) - b*[X(i) - X(grand)], where Y(i) is the unadjusted mean of the dependent
for group i, X(i) is the covariate mean for group i, X(grand) is the grand mean of the
covariate, and b is the regression coefficient. There is no constant when Y is
standardized. For multiple covariates, of course, there are additional similar X
terms in the equation.
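
A small numeric sketch of this adjustment (all values hypothetical):

    # A group scoring above the grand covariate mean has its dependent-variable
    # mean adjusted downward; a group below it is adjusted upward.
    b = 0.8              # pooled within-groups regression coefficient
    grand_cov_mean = 50  # grand mean of the covariate X
    groups = {"A": (75.0, 52.0),   # (unadjusted Y mean, covariate X mean)
              "B": (70.0, 47.0)}

    for name, (y_mean, x_mean) in groups.items():
        y_adj = y_mean - b * (x_mean - grand_cov_mean)
        print(f"group {name}: unadjusted {y_mean}, adjusted {y_adj:.1f}")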
Hotelling's T-Square is a multivariate significance test of mean differences, for the
case of multiple interval dependents and two groups formed by a categorical
independent. SPSS computes the related statistic, Hotelling's Trace (a.k.a. Lawley-
Hotelling or Hotelling-Lawley Trace). To convert from the Trace coefficient to the
T-Square coefficient, multiply the Trace coefficient by (N-L), where N is the sample
size across all groups and L is the number of groups. The T-Square result will still
have the same F value, degrees of freedom, and significance level as the Trace
coefficient.
Multiple Comparison Procedures are used to assess which group means differ
from which others, after the overall F test has demonstrated at least one difference
exists. If the F test establishes that there is an effect on the dependent variable, the
researcher then proceeds to determine just which group means differ significantly
from others. The group means, of course, refer to the means on the dependent
variable for each of the k groups formed by the categories of the independent
variable(s). The possible number of comparisons is k(k-1)/2. Multiple comparisons
help specify the exact nature of the overall effect determined by the F test.

In SPSS, for One-Way ANOVA, select Analyze, Compare Means, One-Way
ANOVA; click Post Hoc; select the multiple comparison test you want (see below).

Planned comparisons with the t-test: The t-test is a test of significance of
the difference in the means of a single interval dependent, for the case of two
groups formed by a categorical independent. The simple t-test is
recommended when the researcher has a single planned comparison (a
comparison of means specified beforehand on the basis of a priori theory). If
the independents formed 8 groups there would be 8!/(6!2!) = 28 comparisons
and if one used the .05 significance level, one would expect at least one of
the comparisons to generate a false positive (thinking you had a relationship
when you did not).

In contrast to the t-test, coefficients based on the q-statistic, discussed below,
are commonly used for post-hoc comparisons (exploring the data to uncover
large differences, without limiting investigation by a priori theory). T-tests
may be seen as a special case of one-way ANOVA. The SPSS TTEST
procedure implements t-tests.
Bonferroni-adjusted multiple t-tests. Also called the Dunn test, a simple
type of multiple comparisons is to apply the standard t-test, but then to adjust
the significance level by multiplying by the number of comparisons being
made. For instance, a finding of .01 significance for 9 comparisons
becomes .09. This is equivalent to saying that if the target alpha significance
level is .05, then the t-test must show alpha/9 (ex., .05/9 = .0056) or lower
for a finding of significance to be made. Bonferroni-adjusted multiple t-tests
are usually employed only when there are few comparisons, as with many it
quickly becomes practically impossible to show significance. Note this test
may be applied to F-tests as well as t-tests. That is, it can handle nonpairwise
as well as pairwise comparisons.

This test imposes an extremely small alpha significance level as the number
of comparisons becomes large. That is, this method is not recommended
when the number of comparisons is large because the power of the test
becomes low. Klockars and Sax (1986: 38-39) recommend using a simple
.05 alpha rate when there are few comparisons, but using the more stringent
Bonferroni-adjusted multiple t-test when the number of planned comparisons
is greater than the number of degrees of freedom for between-groups mean
square (which is k-1, where k is the number of groups). Nonetheless,
researchers still try to limit the number of comparisons, trying to reduce the
probability of Type II errors (accepting a false null hypothesis). This test is
not recommended when the researcher wishes to perform all possible
pairwise comparisons.

If the Bonferroni test is requested, SPSS will print out a table of "Multiple
Comparisons" giving the mean difference in the dependent variable between
any two groups (ex., differences in test scores for any two educational
groups). The significance of this difference is also printed, and an asterisk is
printed next to differences significant at the .05 level or better. SPSS
supports the Bonferroni test in its GLM and UNIANOVA procedure.

Sidak test. The Sidak test is a variant on the Dunn or Bonferroni approach,
using a t-test for pairwise multiple comparisons. The alpha significance level
for multiple comparisons is adjusted to tighter bounds than for the
Bonferroni test. SPSS supports the Sidak test in its GLM and UNIANOVA
procedures.
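
A minimal sketch of Bonferroni and Sidak adjustment of a set of p-values, using Python's statsmodels (hypothetical p-values from 9 comparisons; illustrative only):

    # Adjusting pairwise p-values for multiple comparisons.
    from statsmodels.stats.multitest import multipletests

    pvals = [0.010, 0.020, 0.300, 0.004, 0.700, 0.045, 0.150, 0.080, 0.500]

    for method in ("bonferroni", "sidak"):
        reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method=method)
        print(method, [round(p, 3) for p in p_adj])
    # Bonferroni multiplies each p by the number of comparisons (capped at 1.0);
    # Sidak uses 1 - (1 - p)**m, a slightly less severe correction.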
Dunnett's test is a t-statistic which is used when the researcher wishes to
compare each treatment group mean with the mean of the control group, and
for this purpose has better power than alternative tests. This test, based on a
1955 article by Dunnett, is not to be confused with Dunnett's C or Dunnett's
T3, discussed below.
Post-hoc tests are used when the researcher is exploring differences, not limited by ones
specified in advance on the basis of theory. Most tests in this group use the q-statistic. As a
general principle, when comparisons of group means are selected on a post hoc basis simply
because they are large, there is an expected increase in variability for which the researcher
must compensate by applying a more conservative test -- otherwise, the likelihood of Type I
errors will be substantial.
The q-statistic, also called the q range statistic or the Studentized range statistic, is
commonly used in coefficients for post hoc comparisons. Both the q and t statistics
use the difference of means in the numerator, but whereas the t statistic uses the
standard error of the difference between the means in the denominator, q uses the
standard error of the mean. Consequently, where the t test tests the difference
between two means, the q-statistic tests the probability that the largest mean and
smallest mean among the k groups formed by the categories of the independent(s)
were sampled from the same population. If the q-statistic computed for the two
sample means is not as large as the criterion q value in a table of critical q values,
then the researcher cannot reject the null hypothesis that the groups do not differ at
the given alpha significance level (usually .05). If the null hypothesis is not rejected
for the largest compared to smallest group means, it follows that all intermediate
groups are also drawn from the same population -- so the q-statistic is thus also a
test of homogeneity for all k groups formed by the independent variable(s).
Tukey honestly significant difference (HSD) test: If the Tukey test is requested,
SPSS will produce a table similar to that for the Bonferroni test (see above) of all
pairwise comparisons between groups, interpreted in the same way. The Tukey
method is preferred when the number of groups is large as it is a very conservative
pairwise comparison test, and researchers prefer to be conservative when the large
number of groups threatens to inflate Type I errors. That is, HSD is the most
conservative of the post-hoc tests in that it is the most likely to accept the null
hypothesis of no group differences. Some recommend it only when all pairwise
comparisons are being tested. When all pairwise comparisons are being tested, the
Tukey HSD test is more powerful than the Dunn test (Dunn may be more powerful
for fewer than all comparisons). The Tukey HSD test is based on the q-statistic (the
Studentized range distribution) and is limited to pairwise comparisons.

In SPSS, Analyze, Compare Means, One-Way ANOVA; click Post Hoc; select
Tukey. The "sig." column is the Tukey corrected significance level.

Games and Howell's modification of Tukey's HSD is a modified HSD test
which is appropriate when the homogeneity of variances assumption is
violated. The Games-Howell test is relatively liberal. (See below for other tests
not assuming homogeneity of variances.)
Tukey's wholly significant difference (WSD) test, also called Tukey-b in
SPSS, is a less conservative version of Tukey's HSD test, also based on the
q-statistic. The critical value of WSD (Tukey-b) is the mean of the
corresponding value for the Tukey's HSD test and the Newman-Keuls test,
discussed below.
Newman-Keuls test, also called the Student-Newman-Keuls test, is a post-hoc
comparison test, also based on the q-statistic, which is used to evaluate partial null
hypotheses (hypotheses that all but g of the k means come from the same
population). Let k = the number of groups formed by categories of the independent
variable(s). First all combinations of k-1 means are tested, then k-2 groups, and so
on until sets of 2 means are tested. As one is proceeding toward testing ever smaller
sets, testing stops if an insignificant range is discovered (that is, if the q-statistic for
the comparison of the highest and lowest mean in the set [the "stretch"] is not as
great as the critical value of q for the number of groups in the set). Klockars and Sax
(1986: 57) recommend the Newman-Keuls test when the researcher wants to
compare adjacent means (pairs adjacent to each other when all means are presented
in rank order). Toothaker (1993: 29) recommends Newman-Keuls only when the
number of groups to be compared equals 3, assuming one wants to control the
comparison error rate at the experimentwise alpha rate (ex., .05), but states that the
Ryan or Shaffer-Ryan, or the Fisher-Hayter tests are preferable (Toothaker, 1993:
46).
Ryan test (REGWQ): This is a modified Newman-Keuls test adjusted so
critical values decrease as stretch size (the range from highest to lowest
mean in the set being considered) decreases. The result is that Ryan controls
the experimentwise alpha rate at the desired level (ex., .05) even when the
number of groups exceeds 3, but at a cost of being less powerful (more
chance of Type II errors) than Newman-Keuls. As with Newman-Keuls,
Ryan is a step-down procedure such that one will not get to smaller stretch
comparisons if the null hypothesis is accepted for larger stretches of which
they are a subset. Toothaker (1993: 56) calls Ryan the "best choice" among
tests supported by major statistical packages because it maintains good alpha
control (ex., better than Newman-Keuls) while having at least 75% of the
power of the most powerful tests (ex., better than Tukey HSD).
The Shaffer-Ryan test modifies the Ryan test. It is also a protected
or step-down test, requiring that the overall F test reject the null
hypothesis first, but it uses slightly different critical values. To date,
Shaffer-Ryan is not supported by SAS or SPSS, but it is
recommended by Toothaker (1993: 55) as "one of the best multiple
comparison tests in terms of power."
The least significant difference test (LSD), also called Fisher's LSD or the
protected t test, is not a range test (as are those above based on the q-statistic), but
instead is based on the t-statistic and thus can be considered a form of t-test. It
compares all possible pairs of means after the F-test rejects the null hypothesis that
groups do not differ (this is a requirement of the test). (Note that some computer
packages wrongly report LSD t-test coefficients for comparisons even if the F test
leads to acceptance of the null hypothesis.) It can handle both pairwise and
nonpairwise comparisons and does not require equal sample sizes. LSD is the most
liberal of the post-hoc tests (it is most likely to reject the null hypothesis in favor of
finding groups do differ). It controls the experimentwise Type I error rate at a
selected alpha level (typically 5%), but only for the omnibus (overall) test of the null
hypothesis. LSD allows higher Type I errors for the partial null hypotheses involved
in the comparisons. Toothaker (1993: 42) recommends against any use of LSD on
the grounds that it has poor control of alpha significance, and better alternatives
exist such as Shaffer-Ryan, discussed above. However, the LSD test is the default in
SPSS for pairwise comparisons in its GLM or UNIANOVA procedures.
The Fisher-Hayter test is a modification of the LSD test meant to control
for the liberal alpha significance level allowed by LSD. It is used when all
pairwise comparisons are done post-hoc, but power may be low for fewer
comparisons. See Toothaker (1993: 43-44).
The Scheffé test is a widely-used method of controlling Type I errors in post hoc
testing of differences in group means. It works by first requiring the overall F test of
the null hypothesis be rejected. If the null hypothesis is not rejected overall, then it
is not rejected for any comparison null hypothesis. If the overall null hypothesis is
rejected, however, then F values are computed simultaneously for all possible
comparison pairs and must be higher than an even larger critical value of F than for
the overall F test described above. Let F be the critical value of F as used for the
overall test. For the Scheffé test, the new, higher critical value, F', is (k-1)F. The
Scheffé test can be used to analyze any linear combination of group means.

While the Scheffé test maintains an experimentwise .05 significance level in the face
of multiple comparisons, it does so at the cost of a loss in statistical power (more
Type II errors may be made -- thinking you do not have a relationship when you do).
That is, the Scheffé test is a conservative one (more conservative than Dunn or
Tukey, for ex.), not appropriate for planned comparisons but rather restricted to post
hoc comparisons. Even for post hoc comparisons, the test is used for complex
comparisons and is not recommended for pairwise comparisons due to "an
unacceptably high level of Type II errors" (Brown and Melamed, 1990: 35).
Toothaker (1993: 28) recommends the Scheffé test only for complex comparisons,
or when the number of comparisons is large. The Scheffé test is low in power and
thus not preferred for particular comparisons, but it can be used when one wishes to
do all or a large number of comparisons. Tukey's HSD is preferred for making all
pairwise comparisons among group means, and Scheffé for making all or a large
number of other linear combinations of group means.
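As a rough illustration, the Scheffé critical value F' can be obtained in Python from the ordinary critical F; the group count k, total sample size N, and alpha below are assumed values rather than figures from any example in this document.

# Sketch: Scheffe critical value for post hoc comparisons (k, N, alpha are assumed).
from scipy.stats import f

k, N, alpha = 4, 40, 0.05                  # 4 groups, 40 cases in all (hypothetical)
F_crit = f.ppf(1 - alpha, k - 1, N - k)    # critical F for the overall test
scheffe_crit = (k - 1) * F_crit            # the larger critical value used by Scheffe
print(F_crit, scheffe_crit)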

Methods when the assumption of homogeneity of variances is not met: SPSS
provides these alternate methods, all of which also provide for unequal sample sizes:
Games-Howell (GH). Games-Howell, designed for unequal variances and
unequal sample sizes, is based on the q-statistic distribution. GH can be
liberal when sample size is small and is recommended only when group
sample sizes are greater than 5. Because GH is only slightly liberal and
because it is more powerful than Dunnett's C or T3, it is recommended over
these tests. Toothaker (1993: 66) recommends GH for the situation of
unequal (or equal) sample sizes and unequal or unknown variances.
Dunnett's T3 and Dunnett's C. These tests might be used in lieu of GH
only if it were essential to maintain strict control over the alpha significance
level (ex., exactly .05 or better).
Tamhane's T2. Tamhane's T2 is a conservative test.
Additional methods when sample sizes are unequal: Toothaker (1993: 60) notes
most multiple comparison procedures assume equal sample sizes in the groups being
compared. Because multiple comparison tests are thought to be robust in face of a
violation of this assumption, tests specifically designed for unequal sample sizes are
not common. Three procedures when sizes are markedly unequal are:
The Tukey-Kramer test: This test, described in Toothaker (1993: 60, who
also gives an appendix with critical values), controls alpha but requires equal
population variances. Toothaker (p. 66) recommends this test for the
situation of equal variances but unequal sample sizes. In SPSS, if you ask for
the Tukey test and sample sizes are unequal, you will get the Tukey-Kramer
test, using the harmonic mean.
The Miller-Winer test: Not recommended unless equal population
variances are assured.
The Hochberg GT2 test: Not recommended unless equal population
variances are assured.
Multiple range (homogeneous subset) tests test for homogeneous subsets of
groups based on their group means. This is simply a different way of looking at
multiple comparison tests based on the q range statistic. The Tukey HSD method
can output both comparison and range tests. For the range tests, SPSS will print out
a table which lists all the groups (the categories on the independent variable) and
their means. Then for the .05 level of significance, additional columns will be
printed, one for each subset where group means do not differ significantly. Note this
means that when the significance level of mean differences is worse than .05, we fail
to reject the null hypothesis that the means do not differ. In each subset column
only the means of the groups in that subset will be printed. The subsets may
overlap (a group may belong to more than one subset). Examination of the different
subset columns reveals for which groups (independent variable categories) the
means on the dependent do or do not differ.
Contrast tests. A contrast is a comparison of means among some or all of the
groups. A contrast test is a test of an hypothesis relating the group means. For
instance, given three groups (Catholic, Protestant, Jewish) one might test the
hypothesis that a liberalism quotient for Catholics is half-way between the means for
the other two groups. Contrast codes would be set as Catholic=2, Protestant= -1, and
Jewish = -1. The codes must add to zero. In this example, 2 times the mean of the
liberalism quotient for Catholics, minus 1 times the mean for Protestants, minus 1
times the mean for Jews, equals zero. (Note this is equivalent to the quotient for
Catholics being the mean of the quotients for the other two groups). If the "sig 2-
tailed" t-test of the contrast is significant, then the researcher rejects the null
hypothesis that the Catholic mean is the average of the other two. (Actually, there
are two "sig 2-tailed" rows: the "assume equal variances" row is if the research
assumes groups come from the same population, and "Doesn't assume equal" if the
groups cannot be assumed to come from the same population).

In SPSS, select Analyze, Compare Means, One-Way ANOVA; click Contrasts; enter
the contrasts you want. Any number of contrast tests is possible. If the researcher
wishes to omit a group from the comparison, it is simply coded 0.
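The arithmetic behind the contrast t test is simple: the contrast value is the weighted sum of the group means, and its standard error uses the within-groups mean square. The Python sketch below works through the Catholic/Protestant/Jewish example; the group means, group sizes, and within-groups mean square are invented for illustration, and equal variances are assumed.

# Sketch: t test of the contrast 2*Catholic - 1*Protestant - 1*Jewish = 0.
import numpy as np
from scipy.stats import t

c     = np.array([2.0, -1.0, -1.0])        # contrast codes (must sum to zero)
means = np.array([52.0, 48.0, 54.0])       # group means on the liberalism quotient (assumed)
n     = np.array([30, 30, 30])             # group sizes (assumed)
ms_within = 36.0                           # within-groups mean square from the ANOVA (assumed)

psi = np.sum(c * means)                            # value of the contrast
se  = np.sqrt(ms_within * np.sum(c**2 / n))        # standard error, equal variances assumed
t_stat = psi / se
df = n.sum() - len(n)                              # N - k degrees of freedom
p_two_tailed = 2 * t.sf(abs(t_stat), df)
print(t_stat, p_two_tailed)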

Polynomial contrast tests. Polynomial contrast tests can be used to test which level
of polynomial (linear, quadratic, cubic) suffices to explain the relationship under
study. The polynomial contrast output will have significance tests for unweighted,
weighted, and deviation terms. If the Deviation row is not significant, then the
researcher does not reject the null hypothesis that the polynomial term (ex., linear)
can explain the relationship. A significant Deviation row suggests that the linear (or
other polynomial selected) term cannot explain the relationship. Unweighted and
weighted rows both test the same thing. If these rows are significant, then the
researcher concludes the polynomial (ex., linear) term can adequately explain the
relationship (technically, the researcher rejects the null hypothesis that there is no
polynomial relationship of the selected Degree).

In SPSS, select Analyze, Compare Means, One-Way ANOVA; select the dependent
and the factor (categorical independent); click Contrasts; select Polynomial; set the
Degree drop-down list to Linear, Quadratic, or Cubic.

Measures of association (effect size). ANOVA typically centers on significance, not
association. However, with large samples, groups may be found to differ significantly on a
dependent variable even though the differences are small in terms of effect size. Therefore researchers
using ANOVA should also report the level of association for significant effects.
The coefficient of determination, omega-square: This is the proportion of variance
in the dependent variable accounted for by the independent variable, interpreted
similarly to r-square. Omega-square = (Between-groups SS - (k-1)* within-groups
MS)/(Total SS + Within-Groups MS), where SS is sum of squares, MS is mean
square, and k is the number of groups formed by categories of the independent
variable. Omega-square normally varies from 0 to 1, but may have negative values
when the F-ratio is less than 1. Omega-square is probably the most commonly used
measure of the magnitude of the effect of the independent factor. Cohen (1977) calls
omega-square "large" when over .15, "medium" when .06 to .15, and otherwise
"small." Note omega-square is not used for random effects designs. While it may be
used for one-way repeated measures designs, omega-square is underestimated
slightly if there is subject by treatment interaction. Due to sources of variability
being large, omega-square is not usually reported for two-way or higher repeated
measures designs.
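For illustration, omega-square can be computed directly from the quantities in the formula above. The Python sketch below plugs in the sums of squares and within-groups mean square from the one-way ANOVA table shown in the Frequently Asked Questions section later in this document.

# Sketch: omega-square from one-way ANOVA table values (SSbetween=64, SStotal=132, MSwithin=3.24, k=3).
ss_between, ss_total = 64.0, 132.0
ms_within = 3.24
k = 3                                       # number of groups

omega_sq = (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)
print(omega_sq)                             # about .43, a "large" effect by Cohen's rule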
The correlation ratio, eta: Eta-squared, also called the coefficient of nonlinear
correlation or E-squared (E2), is the percent of total variance in the dependent
variable accounted for by the variance between categories (groups) formed by the
independent variable(s); eta itself is the square root of this quantity, the correlation
ratio. Eta-squared is thus the ratio of the between-groups sum of squares
to the total sum of squares. The between-groups sum of squares measures the effect
of the grouping variable (that is, the extent to which the means are different between
groups). It can also be said that eta-squared is the percent that prediction is improved by
knowing the grouping variable(s) when the dependent is measured in terms of the
square of the prediction error. Eta-squared is analogous to R2 in regression analysis. When
there are curvilinear relations of the factor to the dependent, eta-squared will be
higher than the corresponding coefficient of multiple determination (R2).
In SPSS, select Analyze, Compare Means; Means; click Options; select ANOVA
table and Eta.
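Because eta-squared is just the ratio of the between-groups to the total sum of squares, it is easy to compute from raw scores. The Python sketch below uses invented data for a three-category factor.

# Sketch: eta-squared = SS(between) / SS(total), computed from raw scores (invented data).
import numpy as np

groups = [np.array([4.0, 5.0, 6.0, 5.0]),
          np.array([7.0, 8.0, 6.0, 7.0]),
          np.array([2.0, 3.0, 2.0, 3.0])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
ss_total   = np.sum((all_scores - grand_mean) ** 2)
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

eta_sq = ss_between / ss_total
eta    = np.sqrt(eta_sq)                    # the correlation ratio itself
print(eta_sq, eta)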

The coefficient of intraclass correlation, r: This ANOVA-based type of correlation
measures the relative homogeneity within groups in ratio to the total variation and is
used, for example, in assessing inter-rater reliability. Intraclass correlation, r =
(Between-groups MS - Within-groups MS)/(Between-groups MS + (n-1)*Within-
Groups MS), where n is the average number of cases in each category of the
independent. Intraclass correlation is large and positive when there is no variation
within the groups, but group means differ. It will be at its largest negative value
when group means are the same but there is great variation within groups. Its
maximum value is 1.0, but its maximum negative value is (-1/(n-1)). A negative
intraclass correlation occurs when between-group variation is less than within-group
variation, indicating some third (control) variable has introduced nonrandom effects
on the different groups. Intraclass correlation is discussed further in the section on
reliability.
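A minimal Python sketch of the one-way intraclass correlation formula above follows; the mean squares and the average group size are assumed values.

# Sketch: intraclass correlation r from ANOVA mean squares (values are illustrative).
bms, wms = 12.0, 2.0       # between-groups and within-groups mean squares (assumed)
n_bar = 5.0                # average number of cases per group (assumed)

icc = (bms - wms) / (bms + (n_bar - 1) * wms)
print(icc)                 # 0.5 for these illustrative values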

Assumptions
Interval data. ANOVA assumes an interval-level dependent. With Likert scales and other ordinal
dependents, the nonparametric Kruskal-Wallis test is preferred.
Homogeneity of variances. The dependent variable should have the same variance in each
category of the independent variable. When there is more than one independent, there must be
homogeneity of variances in the cells formed by the independent categorical variables. The reason
for this assumption is that the denominator of the F-ratio is the within-group mean square, which is
the average of group variances taking group sizes into account. When groups differ widely in
variances, this average is a poor summary measure. However, ANOVA is robust for small and even
moderate departures from homogeneity of variance (Box, 1954). Still, a rule of thumb is that the
ratio of largest to smallest group variances should be 3:1 or less. Moore (1995) suggests the more
lenient standard of 4:1. When choosing rules of thumb, remember that the more unequal the sample
sizes, the smaller the differences in variances which are acceptable. Marked violations of the
homogeneity of variances assumption can lead to either over- or under-estimation of the
significance level, disrupting the F-test.
Levene's test of homogeneity of variance is computed by SPSS to test the ANOVA
assumption that each group (category) of the independent(s) has the same variance. If the
Levene statistic is significant at the .05 level or better, the researcher rejects the null
hypothesis that the groups have equal variances. The Levene test is robust in the face of
departures from normality. Note, however, that failure to meet the assumption of
homogeneity of variances is not fatal to ANOVA, which is relatively robust, particularly
when groups are of equal sample size. When groups are of very unequal sample size,
Welch's variance-weighted ANOVA is recommended.
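Outside SPSS, the same Levene test can be run with scipy, as in the sketch below; the group data are hypothetical.

# Sketch: Levene's test of homogeneity of variances (hypothetical data).
import numpy as np
from scipy.stats import levene

g1 = np.array([23.0, 25.0, 27.0, 24.0, 26.0])
g2 = np.array([30.0, 35.0, 28.0, 40.0, 32.0])
g3 = np.array([22.0, 21.0, 23.0, 22.0, 24.0])

stat, p = levene(g1, g2, g3, center='mean')   # center='median' gives the Brown-Forsythe variant of the statistic
if p < 0.05:
    print("Reject equal variances; the assumption appears violated:", stat, p)
else:
    print("No evidence against equal variances:", stat, p)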
Bartlett's test of homogeneity of variance is an older test which is alternative to Levene's
test. Bartlett's test is a chi-square statistic with (k-1) degrees of freedom, where k is the
number of categories in the independent variable. The Bartlett's test is dependent on
meeting the assumption of normality and therefore Levene's test has now largely replaced it.
Brown & Forsythe's F test of equality of means is more robust than ANOVA using the
Levine test when groups are unequal in size and the absolute deviation scores (deviations
from the group means) are highly skewed, causing a violation of the normality assumption.
The Brown-Forsythe F test does not assume homogeneity of variances.

In SPSS, Analyze, Compare Means, One-Way ANOVA; click Options; select Brown-
Forsythe.

Welch's test of equality of means is used when variances and/or group sizes are unequal.

In SPSS, Analyze, Compare Means, One-Way ANOVA; click Options; select Welch test.
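For readers working outside SPSS, Welch's variance-weighted F can also be written out from its standard formula. The Python sketch below is a minimal implementation with invented data; it is not the SPSS routine itself.

# Sketch: Welch's F test of equality of means with unequal variances and group sizes (invented data).
import numpy as np
from scipy.stats import f

groups = [np.array([18.0, 20.0, 22.0, 19.0, 21.0]),
          np.array([25.0, 30.0, 28.0, 35.0, 27.0, 29.0]),
          np.array([15.0, 16.0, 14.0, 17.0])]

k     = len(groups)
n     = np.array([len(g) for g in groups])
means = np.array([g.mean() for g in groups])
s2    = np.array([g.var(ddof=1) for g in groups])

w     = n / s2                               # weights: group size over group variance
w_sum = w.sum()
grand = np.sum(w * means) / w_sum            # variance-weighted grand mean

numer  = np.sum(w * (means - grand) ** 2) / (k - 1)
lam    = 3.0 * np.sum((1 - w / w_sum) ** 2 / (n - 1)) / (k ** 2 - 1)
F_welch = numer / (1 + 2 * lam * (k - 2) / 3)
df1, df2 = k - 1, 1 / lam

print(F_welch, df1, df2, f.sf(F_welch, df1, df2))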

Appropriate sums of squares. Normally there are data for every cell in the design. For instance,
2-way ANOVA with a 3-level factor and a 4-level factor will have 12 cells (groups). But if there are
no data for some of the cells, the ordinary computation of sums of squares ("Type III" is the
ordinary, default type) will result in bias. When there are empty cells, one must ask for "Type IV"
sums of squares, which compare a given cell with averages of other cells. In SPSS, Analyze,
General Linear Model, Univariate; click Model, then set "Sum of Squares" to "Type IV" or other
appropriate type depending on one's design:
Type I. Used in hierarchical balanced designs where main effects are specified before first-
order interaction effects, and first-order interaction effects are specified before second-order
interaction effects, etc. Also used for purely nested models where a first effect is nested
within a second effect, the second within a third, etc. And used in polynomial regression
models where simple terms are specified before higher-order terms (ex., squared terms).
Type II. Used with purely nested designs which have main factors and no interaction effects,
or with any regression model, or for balanced models common in experimental research.
Type III. The default type and by far the most common, for any models mentioned above
and any balanced or unbalanced model as long as there are no empty cells in the design.
Type IV. Required if any cells are empty in a balanced or unbalanced design. This would
include all nested designs, such as Latin square designs.
Random sampling. For purposes of significance testing, the subjects in each group are randomly
sampled.
Multivariate normality. For purposes of significance testing, variables should follow multivariate
normal distributions. The dependent variable should be normally distributed in each category of the
independent variable(s). ANOVA is robust even for moderate departures from multivariate
normality, so this is among the less crucial assumptions of ANOVA.
Boxplot tests of the normality assumption: The SPSS boxplot output option produces
charts in which the Y axis is the interval dependent and categories of the independent are
arrayed on the X axis. Inside the graph, for each X category, will be a rectangle indicating
the spread of the dependent's values for that category. If these rectangles are roughly at the
same Y elevation for all categories, this indicates little difference among groups. Within
each rectangle is a horizontal dark line, indicating the median. If most of the rectangle is on
one side or the other of the median line, this indicates the dependent is skewed (not normal)
for that group (category). Note you can display boxplots for two factors (two independents)
together by selecting Clustered Boxplots from the Boxplot item on the SPSS Graphs menu.
Quantile plot tests of normality generate a Q-Q plot of residuals, which should form a 45-
degree line on a plot of observed and expected values.

In SPSS, select Analyze, General Linear Model, Univariate; click Save; select the residual
(res_1) for the dependent; click Plots; select Normality Plots.

Orthogonal error. Error terms are uncorrelated (best assured through randomization of subjects).
Error terms should be random, independent, and normally distributed around a zero mean. Error
patterns that differ by group are not random.
Equal or similar sample sizes. The groups formed by the categories of the independent(s) should
be equal or similar in sample size. The more the groups are similar in size, the more robust ANOVA
will be with respect to violations of the assumptions of normality and homogeneity of variance.
Unequal sample sizes will confound interpretation of main effects but will not affect the F test of
interaction effects in the ANOVA table, for two-way ANOVA. For 3-way ANOVA, unequal sample
sizes will confound interpretation of main and 2-way interaction effects but will not affect analysis
of 3-way interactions, etc. When sample sizes are markedly dissimilar, Brown and Forsythe's test or
Welch's test of equality of means may be preferred (see above).

Balanced ANOVA designs have equal group sizes, unbalanced ANOVA does not. Unbalanced
designs require adjustments in how ANOVA is computed. This is done automatically in ANOVA
and MANOVA in SPSS. In SAS, unless a recent version has changed it, no correction is made in
PROC ANOVA but correction for unequal groups is done in PROC GLM.

Equal group sizes are not assumed by the t or F tests for the overall model. The range tests based on
the q statistic do require a common n, but this is derived by computing the harmonic mean of the
unequal group n's when differences are small, and by computing the harmonic mean of the two
groups being compared when differences are larger.
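For example, the common n used by q-based range tests can be taken as the harmonic mean of the (hypothetical) group sizes, as in this short Python sketch.

# Sketch: harmonic mean of unequal group sizes (illustrative sizes).
from statistics import harmonic_mean

group_sizes = [12, 15, 20]
n_common = harmonic_mean(group_sizes)
print(n_common)            # 15.0, somewhat below the arithmetic mean of about 15.7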

Data independence. In most ANOVA designs, it is assumed the independents are orthogonal
(uncorrelated, independent). This corresponds to the absence of multicollinearity in regression
models. If there is such lack of independence, then the ratio of the between to within variances will
not follow the F distribution assumed for significance testing. If all cells in a factorial design have
approximately equal numbers of cases, orthogonality is assured because there will be no
association in the design matrix table. In factorial designs, orthogonality is assured by equalizing
the number of cases in each cell of the design matrix table, either through original sampling or by
post-hoc sampling of cells with larger frequencies. Note, however, that there are other designs for
correlated independents, using different computation.
Recursive models. A causal model is implied which does not have feedback loops from the
dependent back to the independent(s).
Categorical independents. The independent variable is or variables are categorical.
Continuous dependents. The dependent variable is continuous and interval level.
Non-significant outliers. There are no gross outliers in the data.
Sphericity, also called circularity, is assumed for repeated measures designs and for random effect
designs. This is a special case of the homogeneity of variance assumption. Sphericity is when the
variance of the difference between the estimated means for any pair of groups is the same as for
any other pair. SPSS automatically does a sphericity test for repeated measures designs with
repeated measures factors with three or more levels, or for within-subjects designs, but not for
mixed model designs. (With two levels there is only one covariance and there is no issue of
homogeneity of variances of differences). In SAS, the PRINTE option in PROC GLM tests
sphericity. If the significance of the sphericity test is less than 0.05, then the researcher rejects the
null hypothesis that the data are spherical; the sphericity assumption is violated, and the researcher
must correct for sphericity or must use multivariate ANOVA tests (Wilks' Lambda, Pillai's Trace,
Hotelling-Lawley Trace, Roy's Greatest Root).
Compound symmetry. A more restrictive assumption, called compound symmetry, is that the
correlations between any two different groups are the same value. If compound symmetry exists,
sphericity exists. Tests or adjustments for lack of sphericity are usually actually based on possible
lack of compound symmetry.

Epsilon. If the researcher wishes to correct the univariate F test, this is done by using Huynh-Feldt
or Greenhouse-Geisser Epsilon. The closer epsilon is to 1.0, the greater the sphericity. Recall that F
is the ratio of between-groups to within-groups mean square variance. The degrees of freedom for
between-groups is (k-1), where k = the number of groups. The degrees of freedom for within-
groups is k(n-1), where n is the number of cases in each group. To correct F given a finding of lack
of sphericity, the researcher multiplies both the between-groups and the within-groups degrees of
freedom by the value of epsilon. SPSS supplies Huynh-Feldt epsilon and the more conservative
Greenhouse-Geisser epsilon (which in turn is an extension of Box's epsilon, no longer widely used).
For more severe departures from sphericity (epsilon < .75), the more conservative Greenhouse-Geisser
epsilon is used, while Huynh-Feldt epsilon is used for less severe violations of the sphericity assumption.
The researcher rounds degrees of freedom down to the nearest whole number and looks up the
corrected F value in a table using the corrected degrees of freedom.
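A small Python sketch of this correction follows; the number of levels, the group size, the observed F, and the epsilon value are all assumed. Because scipy accepts non-integer degrees of freedom, the corrected p-value can be obtained without the rounding needed for a printed table.

# Sketch: epsilon-corrected degrees of freedom for the F test (illustrative values).
from scipy.stats import f

k, n  = 4, 12              # levels of the factor and cases per group (assumed)
F_obs = 4.20               # observed F from the ANOVA table (assumed)
eps   = 0.71               # Greenhouse-Geisser epsilon reported by the software (assumed)

df1 = eps * (k - 1)        # corrected between-groups degrees of freedom
df2 = eps * k * (n - 1)    # corrected within-groups degrees of freedom
print(df1, df2, f.sf(F_obs, df1, df2))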

ANOVA does not assume linear relationships and can handle interaction effects in most cases.
However, note that for block designs, ANOVA assumes additivity -- that raw scores are an additive
combination of the mean, the group effect, the block effect, and an error term, meaning that it
assumes there is no interaction between the group factor (ex., the independent variable representing
the treatment) and the block factor (ex., the independent variable used as an explicit control in
assignment of subjects). If this additivity assumption is violated, the observed value of F is
underestimated and it becomes more difficult to detect differences between groups.
Assumptions related to ANCOVA:
Limited number of covariates. The more the covariates, the greater the likelihood that an
additional covariate will have little residual correlation with the dependent after other
covariates are controlled. The marginal gain in explanatory power is offset by loss of
statistical power (a degree of freedom is lost for each added covariate).
Low measurement error of the covariate. The covariate variables are continuous and
interval level, and are assumed to be measured without error. Imperfect measurement
reduces the statistical power of significance tests for ANCOVA, and for experimental data
there is a conservative bias (increased likelihood of Type II errors: thinking there is no
relationship when in fact there is a relationship). As a rule of thumb, covariates should have
a reliability coefficient of .80 or higher.
Covariates are linearly related or in a known relationship to the dependent. The form
of the relationship between the covariate and the dependent must be known and most
computer programs assume this relationship is linear, adjusting the dependent mean based
on linear regression. Scatterplots of the covariate and the dependent for each of the k groups
formed by the independents are one way to assess violations of this assumption. Covariates
may be transformed (ex., log transform) to establish a linear relationship.
Homogeneity of covariate regression coefficients. This is ANCOVA's "equality of
regressions" or "homogeneity of regressions" assumption. The covariate coefficients (the
slopes of the regression lines) are the same for each group formed by the categorical
variables and measured on the dependent. The more this assumption is violated, the more
conservative ANCOVA becomes (increased likelihood of Type II errors: thinking there is no
relationship when in fact there is a relationship). Homogeneity of regression in SPSS can be
tested under the Model button of Analyze, General Linear Model, Univariate; select Custom
under the Model button; enter a model with all main effects of the factors and covariates
and the interaction of the covariate(s) with the factor(s). These interaction effects should be
non-significant if the homogeneity of regressions assumption is met (a code sketch of this
check appears after the separate slopes discussion below). There is also a statistical test of the
assumption of homogeneity of regression coefficients (see Wildt and Ahtola, 1978: 27).
Separate slopes models. If the equality of regressions assumption is violated, the
dependent variable can be estimated by a separate slopes model. That is, one must
obtain the slopes of the covariates for each level of the factor(s). This is obtained
under the Model button by specifying Custom model, then asking for a model with
the main effects of the factor(s) and the interaction of the factor(s) with the
covariate(s), but not asking for the main effect of the covariate(s) and not asking for
an intercept. Under the Options button one asks for Parameter estimates. The
parameter estimates table will give the separate regression slopes for the covariate at each
level of the factor(s). These will be the slopes in the factor*covariate rows. The
same table will also have the B coefficients for each level of the factor(s). These
coefficients can then be used to construct estimates of the dependent using separate
slopes for each level of the factor(s). For a one-factor model, equations take the
form: Dependent = (B for the first level of the factor) + (B for the
[factor=1]*covariate)*(value of the covariate for a given case). A similar equation is
used for each level of the factor(s). For a given case, depending on its values on the
factor(s), a separate regression equation is used.
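As noted under the homogeneity of regressions assumption above, the factor-by-covariate interaction check can also be run outside SPSS. The Python sketch below uses the statsmodels package; the data file and the column names (outcome, group, pretest) are hypothetical.

# Sketch: homogeneity of regression slopes check via the factor-by-covariate interaction.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("ancova_data.csv")          # columns: outcome, group, pretest (assumed)

model = smf.ols("outcome ~ C(group) + pretest + C(group):pretest", data=df).fit()
print(sm.stats.anova_lm(model, typ=3))       # a non-significant C(group):pretest row is
                                             # consistent with homogeneous slopes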
No covariate outliers. ANCOVA is highly sensitive to outliers in the covariates.
No high multicollinearity of the covariates. ANCOVA is sensitive to multicollinearity
among the covariates and also loses statistical power as redundant covariates are added to
the model. Some researchers recommend dropping from analysis any added covariates
whose squared correlation with prior covariates is .50 or higher.
Additivity. The values of the dependent are an additive combination of its overall mean, the
effect of the categorical independents, the covariate effect, and an error term. ANCOVA is
robust against violations of additivity but in severe violations the researcher may transform
the data, as by using a logarithmic transformation to change a multiplicative model into an
additive model. Note, however, that ANCOVA automatically handles interaction effects and
thus is not an additive procedure in the sense of regression models without interaction
terms.
Independence of the error term. The error term is independent of the covariates and the
categorical independents. Randomization in experimental designs assures this assumption
will be met.
Independent variables orthogonal to covariates. In traditional ANCOVA, the
covariates are assumed to be orthogonal to the factors. If the covariate is influenced by
the categorical independents, then the control adjustment ANCOVA makes on the
dependent variable prior to assessing the effects of the categorical independents will be
biased since some indirect effects of the independents will be removed from the dependent.
However, in GLM ANCOVA, the values of the factors are adjusted for interactions with the
covariates.
Homogeneity of variances. It is assumed there is homogeneity of variances of the
dependent and of the covariates in the cells formed by the factors. Heteroscedasticity is lack
of homogeneity of variances, in violation of this assumption. When this assumption is
violated, the offending covariate may be dropped or the researcher may adopt a more
stringent alpha significance level (ex., .01 instead of .05).

Frequently Asked Questions


How do you interpret an ANOVA table?
One-Way ANOVA Table
                         SS    df    MS      F
between or explained     64     2    32.00   9.88
within or residual       68    21     3.24
total                   132    23
SS is the sum of squares (the variation), df the degrees of freedom, MS the mean square
(the variance, which is SS/df), and F is the F ratio (which is between MS divided by within
MS). As the MS for between-groups is much greater than the MS for within-groups, this
table shows the grouping variable does have an effect, as indicated by the F ratio being
greater than 1. The grouping variable had three groups (high, medium, low), which is why
the between-groups df was (3-1)=2. There are 8 people per group, so the within-groups d.f.
is number of groups times one less than the number of people per group: 3*(8-1)=21. These
are the df for the numerator and denominator respectively. We look in the F table for the .05
significance level with 2 and 21 d.f., and find the critical F value is 3.47. As the computed F
value is considerably more (9.88), we can be 95% confident that the grouping (independent)
variable makes a difference in the dependent variable. (In fact, the F is high enough to be
significant at the .001 level, and some computer programs will print this out in the ANOVA
table along with or instead of the F value).
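Rather than consulting a printed F table, the critical value and the p-value for this example can be obtained with scipy, as in the Python sketch below, using the degrees of freedom and F ratio from the table above.

# Sketch: critical F and p-value for the one-way example (F = 9.88 with 2 and 21 df).
from scipy.stats import f

F_obs, df_between, df_within = 9.88, 2, 21
critical_F = f.ppf(0.95, df_between, df_within)   # about 3.47, as in the text
p_value    = f.sf(F_obs, df_between, df_within)   # upper-tail probability, well under .05
print(critical_F, p_value)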

The table for two-way ANOVA is similar, but there are additional rows for the main
(dependent on independent) effects and for interaction effects, as well as total explained and
residual portions:
Two-Way ANOVA Table
                              SS    df    MS       F
Main Effects                  88     3    29.333   18.857
  X1                          24     1    24.000   15.429
  X2                          64     2    32.000   20.571
2-Way Interaction Effects     16     2     8.000    5.143
  X1 x X2                     16     2     8.000    5.143
explained                    104     5    20.800   13.371
residual                      28    18     1.556
total                        132    23     5.739
The two-way table is interpreted in the same way, except now there are rows for assessing
the between-groups (main effects) variation overall and for each independent, and there are
rows for assessing the interaction effects overall and for each interaction (here there is just
one interaction, which is thus the same as the overall interaction row). The Explained row
now reflects the combined main and interaction effects of the grouping variables, and the
Residual is the remaining within-groups variation (the total variation minus the explained
variation).

The two-way ANOVA table can be interpreted in terms of the difference of mean
differences. The F test for either of the main effects in the table above is reflected in the
difference between row means or between column means (depending on whether X1 or X2
is the row or column variable) in a table (not shown) where X1 and X2 are independent
factors and the cell entries are means on the dependent variable. The F test for the
interaction effect is reflected in the difference of these two mean differences.

Isn't ANOVA just for experimental research designs?

No. This is sometimes heard but is incorrect. It is true that ANOVA emerges from the
experimental tradition, whereas political scientists have traditionally relied more on
correlation and regression models, but ANOVA is not limited to experimental design.

Should I standardize my data before using ANOVA or ANCOVA?

No. ANOVA and ANCOVA analyze differences and patterns in variances, covariances, and means.
Standardization, by definition, makes all variables have the same mean (0) and the same
standard deviation (1).

Since orthogonality (uncorrelated independents) is an assumption, and since this is rare in
real-life topics of interest to social scientists, shouldn't regression models be used instead of
ANOVA models?

The orthogonality problem in ANOVA is mirrored by the multicollinearity problem in
regression models. In fact, ANOVA models can be modeled as a type of regression model
with dummy variables. When independents are highly correlated, separating their causal
role can be difficult or impossible regardless of the statistical model used. Wherever
possible, the researcher should seek to employ variables which minimize this problem. For
instance, using husband's age and wife's age as two independents will almost always
involve multicollinearity, but using one of these and using difference between the two ages
as a second variable will not.

How does counterbalancing work in repeated measures designs?

Repeated measures designs test the same subjects repeatedly at different points in time.
Suppose there are four time points, corresponding to four conditions (ex., text, online,
telephone, and personal coaching, in that order, prior to completing a task). If all subjects
receive the four levels in the same order, then improvement in later tasks might be due to
later conditions (personal coaching) being more effective or it might simply be due to a
practice effect or a positive carry-over treatment effect (positive effects of prior coaching
methods kick in during later coaching treatments). Worse scores in later tasks might be due
to a fatigue effect or negative carry-over treatment effect (there could be negative effects
from prior coaching methods, such as confusion). Counterbalancing of the presentation of
treatments must be used to control these otherwise confounding effects.

Under counterbalancing, treatments are introduced in different orders for different subjects,
at random, such that overall each treatment occurs equally often at each time stage (here,
four time periods) and equally often before and after every other treatment. Some
algorithms have been devised to help the researcher set the sequences for each subject.

Even-number algorithm: Assuming an even number of treatment levels (here, four: 1=text, 2=online, 3=telephone,
4=personal), let the first subject receive the treatments in this order: 1, 2, 4, 3. If there were six levels, this would be the
order: 1, 2, 6, 3, 5, 4. If there were n levels, the order would be: 1, 2, n, 3, (n-1), 4, (n-2), etc. The sequence for the second
subject would be derived by adding 1 to the first sequence, except rolling a sum > n back to 1. Thus the 1, 2, 4, 3 for the
first subject becomes 2, 3, 1, 4 for the second subject, and so on.

Odd-number algorithm: Assuming an odd number of treatment levels, let the sequence for the first subject be the same as
for the even-number algorithm above. If there were five levels, this would be the order: 1, 2, 5, 3, 4. The second subject
would be the reverse of the first: 4, 3, 5, 2, 1. The third subject would be the order created by adding 1: 5, 4, 1, 3, 2; and
the fourth would be its reverse: 2, 3, 1, 4, 5. For subsequent subjects one alternates between adding 1 and reversing, etc.
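A minimal Python sketch of the even-number algorithm follows; it reproduces the 1, 2, 4, 3 starting order for four treatment levels and applies the add-one rule for each subsequent subject.

# Sketch: even-number counterbalancing algorithm (first order 1, 2, n, 3, n-1, ...).
def first_sequence(n_levels):
    order = [1]
    low, high = 2, n_levels
    take_low = True
    while low <= high:
        if take_low:
            order.append(low)
            low += 1
        else:
            order.append(high)
            high -= 1
        take_low = not take_low
    return order

def next_sequence(seq, n_levels):
    # add 1 to each treatment number, rolling values above n_levels back to 1
    return [x + 1 if x < n_levels else 1 for x in seq]

n_levels = 4                                # text, online, telephone, personal coaching
seq = first_sequence(n_levels)              # [1, 2, 4, 3]
for subject in range(1, 5):
    print(subject, seq)
    seq = next_sequence(seq, n_levels)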

How is F computed in random effect designs?

When there is one fixed factor and one random effect factor, the denominator of the F test is
their two-way interaction, not the usual within-groups mean square, which is the default in
many computer programs. In SPSS, the MANOVA procedure can be used to implement the
random effect design:

MANOVA Y BY a(1,3) b(1,4)
  /DESIGN = a VS 1
            a BY b = 1 VS WITHIN
            b VS WITHIN
Let a be the fixed effects factor and b the random effect factor.

Line 1 invokes the MANOVA procedure and asks implicitly for testing of the main effect for a, the main effect for b, and
the interaction effect.

Line 2 specifies that the fixed factor is to be tested against error term #1, to be defined next, rather than the default
WITHIN (within-groups mean square) term.

Line 3 defines a BY b as error term 1, and also asks that the a BY b interaction term be tested against the WITHIN
default.

Line 4 asks that the random effect factor, b, also be tested against the WITHIN error term.

What designs are available in ANOVA for correlated independents?

Correlation of independents is common in non-experimental applications of ANOVA. Such
correlation violates one of the assumptions of usual ANOVA designs. When correlation
exists, the sum of squares reflecting the main effect of an independent no longer represents
the unique effect of that variable. The general procedure for dealing with correlated
independents in ANOVA involves taking one independent at a time, computing its sum of
squares, then allocating the remaining sum to the other independents. One will get different
solutions depending on the order in which the independents are considered. Order of entry
is set by common logic (ex., race or gender cannot be determined by such variables as
opinion or income, and thus should be entered first). If there is no evident logic to the
variables, a rule of thumb is to consider them in the order of magnitude of their sums of
squares. See Iverson and Norpoth (1987: 58-64).

If the assumption of homogeneity of variances is not met, should regression models be used instead?

Again, the problem that a relationship may differ for different ranges of the independent(s)
is a generic research problem, not to be sidestepped merely by switching statistical
procedures. Regardless of the procedure used, one must test for heteroscedasticity.

Is ANOVA a linear procedure like regression? What is the "Contrasts" option?

No. Residuals, which are errors of estimate, are measured differently in ANOVA compared
to regression. In regression, residuals are deviations from the regression line. In ANOVA,
residuals are deviations from the group means. When a relationship is nonlinear, the
regression line will not pass through the group means and in such cases residuals in
regression are different from residuals in ANOVA. The sum of squared residuals, which is
an indicator of model lack of fit, will be greater for regression than for ANOVA when
relationships are nonlinear.

The SPSS ANOVA procedure can be used as a test for the existence of linear, quadratic, and
other polynomial relationships, using the "Contrasts" option. Of course, nonlinear effects
can be modeled in regression using polynomial, logarithmic, or other nonlinear data
transformations. A polynomial contrast partitions the between-groups sums of squares into
trend components, which can be used to test for a trend (ex., a linear trend) of the dependent
variable across the ordered levels of the categorical independent variable. SPSS supports
1st, 2nd, 3rd, 4th, and 5th degree polynomials.

What is hierarchical ANOVA or ANCOVA?

This is an option in the SPSS METHOD command in the ANOVA procedure. The
METHOD command determines how effects are calculated from the sums of squares. The
SPSS default is the regression or "unique" method, which calculates all effects
simultaneously. Another option is the "classic experimental approach", which computes
effects separately in this order: covariate effects (in ANCOVA), main effects, two-way
interaction effects, three-way interaction effects, ..., five-way interaction effects. The
hierarchical method also computes effects separately, but covariate effects are assessed only
for previously-computed covariate effects, and likewise main effects are assessed only for
previously-computed main effects. In both the experimental and hierarchical methods,
interaction effects are assessed the same (assessed for covariates, factors and all equal or
lower-order interactions), which is different from the regression method (assessed for all
covariates, factors, and interactions).

Is there a limit on the number of independents which can be included in an analysis of variance?

Yes and no. SPSS limits you to 10. While there is no theoretical limit, even 10 would be a
great many. As one adds independents, the likelihood of collinearity of the variables
increases, adding little to the percent of variance explained (R-squared) and making
interpretation of the standard errors of the individual independents difficult.

Which SPSS procedures compute ANOVA?

Not just ANOVA, but also ONEWAY, SUMMARIZE, and GLM.

I have several independent variables, which means there are a very large number of possible interaction effects.
Does SPSS have to compute them all?

The MAXORDERS subcommand in the SPSS ANOVA procedure is used to suppress the
effects of various orders of interaction.

Do you use the same designs (between groups, repeated measures, etc.) with ANCOVA as you do with ANOVA?

By and large, yes. Be warned, however, that for repeated measures designs with more than
two levels of the repeated measure factor, if the covariates are also measured repeatedly,
only the univariate tests output is appropriate (in SPSS, the ones labeled "AVERAGED tests
of significance"). SPSS will also print the multivariate tests, but they are not appropriate
because the multivariate tests partial each of the covariates from the entire set of dependent
variables. The appropriate univariate approach partials variance on the basis of dependent-
variable/covariate matched pairs.

How is GLM ANCOVA different from traditional ANCOVA?

The traditional method assumes that the covariates are uncorrelated with the factors. The
GLM (general linear model) approach adjusts for interactions of the covariates with the
factors.

What are paired comparisons (planned or post hoc) in ANCOVA?

After the omnibus F test establishes an overall relationship, the researcher can test
differences between pairs of group means (assuming the independent has more than two
levels) to determine which groups are most involved with significant effects. Ideally these
comparisons are based on a priori theory and there are just a few of them. But if the
researcher wants to investigate all possible paired comparisons on a post hoc basis, some
will be found significant just by chance, so there are various adjustments (Bonferroni,
Tukey, Scheffe) which make it harder to find significance.

Can ANCOVA be modeled using regression?

Yes, if dummy variables are used for the categorical independents. When creating dummy
variables, one must use one less category than there are values of each independent. For full
ANCOVA one would also add the interaction crossproduct terms for each pair of
independents included in the equation, including the dummies. Then one computes multiple
regression. The resulting F tests will be the same as in classical ANCOVA.
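One way to carry this out is sketched below in Python with the statsmodels package: the categorical independent is converted to one-less-than-k dummy variables, crossproduct terms with the covariate are added, and an ordinary regression is fitted. The data file and column names are hypothetical.

# Sketch: ANCOVA run as a regression with explicit dummy variables and crossproducts.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("ancova_data.csv")                    # columns: outcome, group, covariate (assumed)

dummies = pd.get_dummies(df["group"], prefix="g", drop_first=True).astype(float)  # k-1 dummies
X = pd.concat([dummies, df[["covariate"]]], axis=1)
for col in dummies.columns:                            # crossproduct (interaction) terms
    X[col + "_x_cov"] = X[col] * df["covariate"]
X = sm.add_constant(X)

model = sm.OLS(df["outcome"], X).fit()
print(model.summary())                                 # tests correspond to classical ANCOVA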

How does blocking with ANOVA compare to ANCOVA?

In blocking under ANOVA, what would have been a continuous covariate in ANCOVA is
classified (ex., high, medium, low) and used as an additional factor in an ANOVA. The
main effect of this factor is similar to the effect of the covariate in ANCOVA. If there is an
interaction effect involving this factor, this shows the homogeneity of regressions
assumption would have been violated in ANCOVA. This has the advantage compared to
ANCOVA that one need not assume the relationship between the covariate and the
dependent variable is linear. However, classification involves loss of information and
attenuation of correlation. If the covariate is related to the dependent in a linear manner,
ANCOVA will be more powerful than ANOVA with blocking and is preferred. Also,
blocking after data are collected may involve unequal group sample sizes, which also makes
ANOVA less robust.

Bibliography
Box, G. E. P. (1954). "Some theorems on quadratic forms applied in the study of analysis of variance problems." Annals
of Mathematical Statistics, 25: 290-302. Cited with regard to robustness of the F test even in the face of small violations of the
homogeneity of variances assumption.

Brown, Steven R. and Lawrence E. Melamed (1990). Experimental design and analysis. Thousand Oaks, CA: Sage
Publications. Quantitative Applications in the Social Sciences series no. 74. Discusses alternative ANOVA designs and
related coefficients.

Box, G. E. P., W. G. Hunter, and J. S. Hunter (1978). Statistics for experimenters: An introduction to design and data
analysis. NY: John Wiley. General introduction.

Cohen, J. (1977). Statistical power analysis for the behavioral sciences. NY: Academic Press. Cited with regard to
interpretation of omega-square.

Cortina, Jose M. and Hossein Nouri (2000). Effect size for ANOVA designs. Thousand Oaks, CA: Sage Publications.
Quantitative Applications in the Social Sciences series no. 129.

Dunteman, George H. and Moon-Ho R. Ho (2005). An introduction to generalized linear models. Thousand Oaks, CA:
Sage Publications. Quantitative Applications in the Social Sciences, Vol. 145.

Girden, Ellen R. (1992). ANOVA Repeated Measures. Thousand Oaks, CA: Sage Publications. Quantitative Applications
in the Social Sciences series no. 84. A thorough review of repeated measures design, including single-, two-, and three-
factor studies. Good discussion of sphericity and epsilon adjustments to degrees of freedom in F tests. Cited in relation to
pretest-posttest designs.

Iverson, Gudmund R. and Helmut Norpoth (1987). Analysis of variance. Thousand Oaks, CA: Sage Publications.
Quantitative Applications in the Social Sciences series no. 1. A readable introduction to one-way and two-way ANOVA,
including Latin Square designs, nested designs, and discussion of ANOVA in relation to regression.

Jaccard, James (1998). Interaction effects in factorial analysis of variance. Quantitative Applications in the Social
Sciences Series No. 118. Thousand Oaks, CA: Sage Publications.

Jackson, Sally and Dale E. Brashers (1994). Random factors in ANOVA. Thousand Oaks, CA: Sage Publications.
Quantitative Applications in the Social Sciences series no. 98. Thorough coverage of random effect models, including
SAS and SPSS code.

Klockars, Alan J. and Gilbert Sax (1986). Multiple comparisons. Thousand Oaks, CA: Sage Publications. Quantitative
Applications in the Social Sciences series #61. Covers multiple comparison tests, range tests, Tukey's tests, Scheffe test,
and others.

Levin, Irwin P. (1999). Relating statistics and experimental design. Thousand Oaks, CA: Sage Publications. Quantitative
Applications in the Social Sciences series #125. Elementary introduction covers t-tests and various simple ANOVA
designs. Some additional discussion of chi-square, significance tests for correlation and regression. and non-parametric
tests such as the runs test, median test, and Mann-Whitney U test.

Milliken, G. A. and D. E. Johnson (1992). Analysis of messy data, vol. 1: Designed experiments. NY: Chapman and Hall.
Moore, D. S. (1995). The basic practice of statistics. NY: Freeman and Co.

Rutherford, Andrew (2001). Introducing ANOVA and ANCOVA: A GLM approach. Thousand Oaks, CA: Sage
Publications.

Toothaker, Larry E. (1993). Multiple comparison procedures. Thousand Oaks, CA: Sage Publications. Quantitative
Applications in the Social Sciences series #89. Discusses multiple comparison tests, assumptions, power considerations,
and use in two-way ANOVA. Good coverage of SAS and SPSS support for MCP's.

Turner, J. Rick and Julian Thayer (2001). Introduction to analysis of variance. Thousand Oaks, CA: Sage Publications.
Focus on explaining different types of designs.

Wildt, Albert R. and Olli T. Ahtola (1978). Analysis of covariance. Quantitative Applications in the Social Sciences series
#12. Thousand Oaks, CA: Sage Publications.
