
Analysis of variance

From Wikipedia, the free encyclopedia


Analysis of variance (ANOVA) is a collection of statistical models used to analyze the
differences between group means and their associated procedures (such as "variation"
among and between groups), developed by R. A. Fisher. In the ANOVA setting, the
observed variance in a particular variable is partitioned into components attributable to
different sources of variation. In its simplest form, ANOVA provides a statistical test of
whether or not the means of several groups are equal, and therefore generalizes the t-test
to more than two groups. As doing multiple two-sample t-tests would result in an
increased chance of committing a statistical type I error, ANOVAs are useful in
comparing (testing) three or more means (groups or variables) for statistical significance.
Contents
1 Motivating example
2 Background and terminology
    o 2.1 Design-of-experiments terms
3 Classes of models
    o 3.1 Fixed-effects models
    o 3.2 Random-effects models
    o 3.3 Mixed-effects models
4 Assumptions of ANOVA
    o 4.1 Textbook analysis using a normal distribution
    o 4.2 Randomization-based analysis
        4.2.1 Unit-treatment additivity
        4.2.2 Derived linear model
        4.2.3 Statistical models for observational data
    o 4.3 Summary of assumptions
5 Characteristics of ANOVA
6 Logic of ANOVA
    o 6.1 Partitioning of the sum of squares
    o 6.2 The F-test
    o 6.3 Extended logic
7 ANOVA for a single factor
8 ANOVA for multiple factors
9 Worked numeric examples
10 Associated analysis
    o 10.1 Preparatory analysis
        10.1.1 The number of experimental units
        10.1.2 Power analysis
        10.1.3 Effect size
    o 10.2 Followup analysis
        10.2.1 Model confirmation
        10.2.2 Follow-up tests
11 Study designs and ANOVAs
12 ANOVA cautions
13 Generalizations
14 History
15 See also
16 Footnotes
17 Notes
18 References
19 Further reading
20 External links
Motivating example
[Illustrations: three candidate groupings of the dog-weight distribution, captioned "No fit", "Fair fit", and "Very good fit".]
The analysis of variance can be used as an exploratory tool to explain observations. A dog
show provides an example. A dog show is not a random sampling of the breed; it is
typically limited to dogs that are male, adult, pure-bred, and exemplary. A histogram of
dog weights from a show might plausibly be rather complex, like the yellow-orange
distribution shown in the illustrations. Suppose we wanted to predict the weight of a dog
based on a certain set of characteristics of each dog. Before we could do that, we would
need to explain the distribution of weights by dividing the dog population into groups
based on those characteristics. A successful grouping will split dogs such that (a) each
group has a low variance of dog weights (meaning the group is relatively homogeneous)
and (b) the mean of each group is distinct (if two groups have the same mean, then it isn't
reasonable to conclude that the groups are, in fact, separate in any meaningful way).

In the illustrations to the right, each group is identified as X1, X2, etc. In the first
illustration, we divide the dogs according to the product (interaction) of two binary
groupings: young vs. old, and short-haired vs. long-haired (thus, group 1 is young, short-
haired dogs, group 2 is young, long-haired dogs, etc.). Since the distributions of dog
weight within each of the groups (shown in blue) have a large variance, and since the
means are very close across groups, grouping dogs by these characteristics does not
produce an effective way to explain the variation in dog weights: knowing which group a
dog is in does not allow us to make any reasonable statements as to what that dog's
weight is likely to be. Thus, this grouping fails to fit the distribution we are trying to
explain (yellow-orange).
An attempt to explain the weight distribution by grouping dogs as (pet vs. working breed)
and (less athletic vs. more athletic) would probably be somewhat more successful (fair
fit). The heaviest show dogs are likely to be big, strong working breeds, while breeds kept
as pets tend to be smaller and thus lighter. As shown by the second illustration, the
distributions have variances that are considerably smaller than in the first case, and the
means are more reasonably distinguishable. However, the significant overlap of
distributions, for example, means that we cannot reliably say that X1 and X2 are truly
distinct (i.e., it is perhaps reasonably likely that splitting dogs according to the flip of a
coin, by pure chance, might produce distributions that look similar).

An attempt to explain weight by breed is likely to produce a very good fit. All
Chihuahuas are light and all St Bernards are heavy. The difference in weights between
Setters and Pointers does not justify separate breeds. The analysis of variance provides
the formal tools to justify these intuitive judgments. A common use of the method is the
analysis of experimental data or the development of models. The method has some
advantages over correlation: not all of the data must be numeric, and one result of the
method is a judgment in the confidence in an explanatory relationship.
Background and terminology
ANOVA is a particular form of statistical hypothesis testing heavily used in the analysis
of experimental data. A statistical hypothesis test is a method of making decisions using
data. A test result (calculated from the null hypothesis and the sample) is called
statistically significant if it is deemed unlikely to have occurred by chance, assuming the
truth of the null hypothesis. A statistically significant result, when a probability (p-value)
is less than a threshold (significance level), justifies the rejection of the null hypothesis,
but only if the a priori probability of the null hypothesis is not high.

In the typical application of ANOVA, the null hypothesis is that all groups are simply
random samples of the same population. For example, when studying the effect of
different treatments on similar samples of patients, the null hypothesis would be that all
treatments have the same effect (perhaps none). Rejecting the null hypothesis would
imply that different treatments result in altered effects.

By construction, hypothesis testing limits the rate of Type I errors (false positives leading
to false scientific claims) to a significance level. Experimenters also wish to limit Type II
errors (false negatives resulting in missed scientific discoveries). The Type II error rate is
a function of several things including sample size (positively correlated with experiment
cost), significance level (when the standard of proof is high, the chances of overlooking a
discovery are also high) and effect size (when the effect is obvious to the casual observer,
Type II error rates are low).

The terminology of ANOVA is largely from the statistical design of experiments. The
experimenter adjusts factors and measures responses in an attempt to determine an effect.
Factors are assigned to experimental units by a combination of randomization and
blocking to ensure the validity of the results. Blinding keeps the weighing impartial.
Responses show a variability that is partially the result of the effect and is partially
random error.

ANOVA is the synthesis of several ideas and it is used for multiple purposes. As a
consequence, it is difficult to define concisely or precisely.
"/lassical !#$%! for balanced data does three things at once?
( !s e*ploratory data analysis, an !#$%! is an organization of an additive data
decomposition, and its sums of s&uares indicate the variance of each component
of the decomposition (or, e&uivalently, each set of terms of a linear model)
+ /omparisons of mean s&uares, along with F'tests allow testing of a nested
se&uence of models
. /losely related to the !#$%! is a linear model fit with coefficient estimates and
standard errors"
C(D
"n short, !#$%! is a statistical tool used in several ways to develop and confirm an
e*planation for the observed data
!dditionally?
0 "t is computationally elegant and relatively robust against violations of its
assumptions
4 !#$%! provides industrial strength (multiple sample comparison) statistical
analysis
5 "t has been adapted to the analysis of a variety of e*perimental designs
!s a result? !#$%! "has long enBoyed the status of being the most used (some would
say abused) statistical techni&ue in psychological research"
C+D
!#$%! "is probably the
most useful techni&ue in the field of statistical inference"
C.D
!#$%! is difficult to teach, particularly for comple* e*periments, with split'plot designs
being notorious
C0D
"n some cases the proper application of the method is best determined
by problem pattern recognition followed by the consultation of a classic authoritative test
C4D
Design-of-experiments terms
(Condensed from the NIST Engineering Statistics handbook: Section 5.7. A Glossary of
DOE Terminology)[6]

Balanced design
    An experimental design where all cells (i.e. treatment combinations) have the
    same number of observations.
Blocking
    A schedule for conducting treatment combinations in an experimental study such
    that any effects on the experimental results due to a known change in raw
    materials, operators, machines, etc., become concentrated in the levels of the
    blocking variable. The reason for blocking is to isolate a systematic effect and
    prevent it from obscuring the main effects. Blocking is achieved by restricting
    randomization.
Design
    A set of experimental runs which allows the fit of a particular model and the
    estimate of effects.
DOE
    Design of experiments. An approach to problem solving involving collection of
    data that will support valid, defensible, and supportable conclusions.[7]
Effect
    How changing the settings of a factor changes the response. The effect of a single
    factor is also called a main effect.
Error
    Unexplained variation in a collection of observations. DOEs typically require
    understanding of both random error and lack-of-fit error.
Experimental unit
    The entity to which a specific treatment combination is applied.
Factors
    Process inputs an investigator manipulates to cause a change in the output.
Lack-of-fit error
    Error that occurs when the analysis omits one or more important terms or factors
    from the process model. Including replication in a DOE allows separation of
    experimental error into its components: lack of fit and random (pure) error.
Model
    Mathematical relationship which relates changes in a given response to changes in
    one or more factors.
Random error
    Error that occurs due to natural variation in the process. Random error is typically
    assumed to be normally distributed with zero mean and a constant variance.
    Random error is also called experimental error.
Randomization
    A schedule for allocating treatment material and for conducting treatment
    combinations in a DOE such that the conditions in one run neither depend on the
    conditions of the previous run nor predict the conditions in the subsequent runs.[nb 1]
Replication
    Performing the same treatment combination more than once. Including replication
    allows an estimate of the random error independent of any lack-of-fit error.
Responses
    The output(s) of a process. Sometimes called dependent variable(s).
Treatment
    A treatment is a specific combination of factor levels whose effect is to be
    compared with other treatments.
Classes of models
There are three classes of models used in the analysis of variance, and these are outlined
here.
Fixed-effects models
Main article: Fixed effects model

The fixed-effects model of analysis of variance applies to situations in which the
experimenter applies one or more treatments to the subjects of the experiment to see if
the response variable values change. This allows the experimenter to estimate the ranges
of response variable values that the treatment would generate in the population as a
whole.
Random-effects models
Main article: Random effects model

Random-effects models are used when the treatments are not fixed. This occurs when the
various factor levels are sampled from a larger population. Because the levels themselves
are random variables, some assumptions and the method of contrasting the treatments (a
multi-variable generalization of simple differences) differ from the fixed-effects model.[8]
Mixed-effects models
Main article: Mixed model

A mixed-effects model contains experimental factors of both fixed and random-effects
types, with appropriately different interpretations and analysis for the two types.

Example: Teaching experiments could be performed by a university department to find a
good introductory textbook, with each text considered a treatment. The fixed-effects
model would compare a list of candidate texts. The random-effects model would
determine whether important differences exist among a list of randomly selected texts.
The mixed-effects model would compare the (fixed) incumbent texts to randomly
selected alternatives.

Defining fixed and random effects has proven elusive, with competing definitions
arguably leading toward a linguistic quagmire.[9]
Assumptions of ANOVA
The analysis of variance has been studied from several approaches, the most common of
which uses a linear model that relates the response to the treatments and blocks. Note that
the model is linear in parameters but may be nonlinear across factor levels. Interpretation
is easy when data is balanced across factors but much deeper understanding is needed for
unbalanced data.
Textbook analysis using a normal distribution
The analysis of variance can be presented in terms of a linear model, which makes the
following assumptions about the probability distribution of the responses:[10][11][12][13]

- Independence of observations – this is an assumption of the model that simplifies
  the statistical analysis.
- Normality – the distributions of the residuals are normal.
- Equality (or "homogeneity") of variances, called homoscedasticity – the variance
  of data in groups should be the same.

The separate assumptions of the textbook model imply that the errors are independently,
identically, and normally distributed for fixed-effects models, that is, that the errors (ε's)
are independent and ε ~ N(0, σ²).
Randomization-based analysis
See also: Random assignment and Randomization test

In a randomized controlled experiment, the treatments are randomly assigned to
experimental units, following the experimental protocol. This randomization is objective
and declared before the experiment is carried out. The objective random assignment is
used to test the significance of the null hypothesis, following the ideas of C. S. Peirce and
Ronald A. Fisher. This design-based analysis was discussed and developed by Francis J.
Anscombe at Rothamsted Experimental Station and by Oscar Kempthorne at Iowa State
University.[14] Kempthorne and his students make an assumption of unit treatment
additivity, which is discussed in the books of Kempthorne and David R. Cox.[citation needed]
Unit-treatment additivity
"n its simplest form, the assumption of unit'treatment additivity
Cnb +D
states that the
observed response from e*perimental unit when receiving treatment can be written
as the sum of the unit@s response and the treatment'effect , that is
C(4DC(5DC(9D
1he assumption of unit'treatment additivity implies that, for every treatment , the th
treatment have e*actly the same effect on every e*periment unit
1he assumption of unit treatment additivity usually cannot be directly falsified, according
to /o* and Gempthorne >owever, many consequences of treatment'unit additivity can
be falsified For a randomized e*periment, the assumption of unit'treatment additivity
implies that the variance is constant for all treatments 1herefore, by contraposition, a
necessary condition for unit'treatment additivity is that the variance is constant
1he use of unit treatment additivity and randomization is similar to the design'based
inference that is standard in finite'population survey sampling
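The constant-variance consequence can be illustrated with a minimal sketch; the unit responses and treatment effects below are made-up numbers, not from any source. Adding a constant effect t_j shifts a group's mean but leaves its variance unchanged.

```python
# Minimal sketch of unit-treatment additivity: y_{i,j} = y_i + t_j.
# Unit responses and treatment effects are hypothetical.
from statistics import pvariance

unit_responses = [12.0, 15.5, 9.8, 14.2, 11.1]       # y_i for five units
treatment_effects = {"A": 0.0, "B": 3.0, "C": -1.5}  # t_j for three treatments

for name, t in treatment_effects.items():
    observed = [y + t for y in unit_responses]        # responses under treatment
    # the shift changes the mean but not the variance
    print(name, round(sum(observed) / len(observed), 2), round(pvariance(observed), 4))
```

Every treatment prints the same variance, which is the falsifiable consequence the randomization-based analysis exploits.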
Derived linear model
Kempthorne uses the randomization-distribution and the assumption of unit treatment
additivity to produce a derived linear model, very similar to the textbook model discussed
previously.[18] The test statistics of this derived linear model are closely approximated by
the test statistics of an appropriate normal linear model, according to approximation
theorems and simulation studies.[19] However, there are differences. For example, the
randomization-based analysis results in a small but (strictly) negative correlation between
the observations.[20][21] In the randomization-based analysis, there is no assumption of a
normal distribution and certainly no assumption of independence. On the contrary, the
observations are dependent!

The randomization-based analysis has the disadvantage that its exposition involves
tedious algebra and extensive time. Since the randomization-based analysis is
complicated and is closely approximated by the approach using a normal linear model,
most teachers emphasize the normal linear model approach. Few statisticians object to
model-based analysis of balanced randomized experiments.
Statistical models for observational data
However, when applied to data from non-randomized experiments or observational
studies, model-based analysis lacks the warrant of randomization.[22] For observational
data, the derivation of confidence intervals must use subjective models, as emphasized by
Ronald A. Fisher and his followers. In practice, the estimates of treatment effects from
observational studies are often inconsistent. In practice, "statistical models" and
observational data are useful for suggesting hypotheses that should be treated very
cautiously by the public.[23]
Summary of assumptions
The normal-model-based ANOVA analysis assumes the independence, normality and
homogeneity of the variances of the residuals. The randomization-based analysis assumes
only the homogeneity of the variances of the residuals (as a consequence of unit-
treatment additivity) and uses the randomization procedure of the experiment. Both these
analyses require homoscedasticity, as an assumption for the normal-model analysis and as
a consequence of randomization and additivity for the randomization-based analysis.

However, studies of processes that change variances rather than means (called dispersion
effects) have been successfully conducted using ANOVA.[24] There are no necessary
assumptions for ANOVA in its full generality, but the F-test used for ANOVA hypothesis
testing has assumptions and practical limitations which are of continuing interest.

Problems which do not satisfy the assumptions of ANOVA can often be transformed to
satisfy the assumptions. The property of unit-treatment additivity is not invariant under a
"change of scale", so statisticians often use transformations to achieve unit-treatment
additivity. If the response variable is expected to follow a parametric family of
probability distributions, then the statistician may specify (in the protocol for the
experiment or observational study) that the responses be transformed to stabilize the
variance.[25] Also, a statistician may specify that logarithmic transforms be applied to the
responses, which are believed to follow a multiplicative model.[16][26] According to
Cauchy's functional equation theorem, the logarithm is the only continuous
transformation that transforms real multiplication to addition.[citation needed]
Characteristics of ANOVA
ANOVA is used in the analysis of comparative experiments, those in which only the
difference in outcomes is of interest. The statistical significance of the experiment is
determined by a ratio of two variances. This ratio is independent of several possible
alterations to the experimental observations: adding a constant to all observations does
not alter significance; multiplying all observations by a constant does not alter
significance. So ANOVA statistical significance results are independent of constant bias
and scaling errors as well as the units used in expressing observations. In the era of
mechanical calculation it was common to subtract a constant from all observations (when
equivalent to dropping leading digits) to simplify data entry.[27][28] This is an example of
data coding.
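This invariance is easy to check numerically. The sketch below uses hypothetical data and a hand-rolled one-way F ratio; shifting every observation by a constant and rescaling the units both leave F unchanged.

```python
# Sketch: the one-way F ratio is invariant to constant bias and to scaling.
def f_statistic(groups):
    """One-way ANOVA F ratio computed from the sum-of-squares partition."""
    all_obs = [y for g in groups for y in g]
    grand = sum(all_obs) / len(all_obs)
    ss_treat = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_error = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    df_treat, df_error = len(groups) - 1, len(all_obs) - len(groups)
    return (ss_treat / df_treat) / (ss_error / df_error)

data = [[6.0, 8.0, 4.0, 5.0], [8.0, 12.0, 9.0, 11.0], [13.0, 9.0, 11.0, 8.0]]
f0 = f_statistic(data)
shifted = [[y + 100.0 for y in g] for g in data]  # constant bias
scaled = [[y * 2.54 for y in g] for g in data]    # change of units
print(round(f0, 4), round(f_statistic(shifted), 4), round(f_statistic(scaled), 4))
```

All three printed values agree, so a decision based on F is unaffected by the units of measurement.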
Logic of ANOVA
The calculations of ANOVA can be characterized as computing a number of means and
variances, dividing two variances and comparing the ratio to a handbook value to
determine statistical significance. Calculating a treatment effect is then trivial: "the effect
of any treatment is estimated by taking the difference between the mean of the
observations which receive the treatment and the general mean."[29]
Partitioning of the sum of squares
Main article: Partition of sums of squares

ANOVA uses traditional standardized terminology. The definitional equation of sample
variance is

    s² = (1 / (n − 1)) Σᵢ (yᵢ − ȳ)²,

where the divisor is called the degrees of freedom (DF), the summation is called the sum
of squares (SS), the result is called the mean square (MS) and the squared terms are
deviations from the sample mean. ANOVA estimates 3 sample variances: a total variance
based on all the observation deviations from the grand mean, an error variance based on
all the observation deviations from their appropriate treatment means, and a treatment
variance. The treatment variance is based on the deviations of treatment means from the
grand mean, the result being multiplied by the number of observations in each treatment
to account for the difference between the variance of observations and the variance of
means.

The fundamental technique is a partitioning of the total sum of squares SS into
components related to the effects used in the model. For example, the model for a
simplified ANOVA with one type of treatment at different levels is

    SS_Total = SS_Error + SS_Treatments.

The number of degrees of freedom DF can be partitioned in a similar way:

    DF_Total = DF_Error + DF_Treatments.

One of these components (that for error) specifies a chi-squared distribution which
describes the associated sum of squares, while the same is true for "treatments" if there is
no treatment effect.

See also Lack-of-fit sum of squares.
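The two partitions can be verified on a small made-up data set, one factor at three levels with four observations each:

```python
# Sketch: numerical check of SS_Total = SS_Error + SS_Treatments and the
# matching degrees-of-freedom split (hypothetical data).
data = {
    "low":    [6.0, 8.0, 4.0, 5.0],
    "medium": [8.0, 12.0, 9.0, 11.0],
    "high":   [13.0, 9.0, 11.0, 8.0],
}

all_obs = [y for g in data.values() for y in g]
grand = sum(all_obs) / len(all_obs)

ss_total = sum((y - grand) ** 2 for y in all_obs)
ss_error = sum((y - sum(g) / len(g)) ** 2 for g in data.values() for y in g)
ss_treat = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in data.values())

df_total = len(all_obs) - 1           # 11
df_treat = len(data) - 1              # 2
df_error = len(all_obs) - len(data)   # 9

print(round(ss_total, 4), round(ss_error + ss_treat, 4))  # the two agree
print(df_total, df_error + df_treat)                      # 11 11
```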
The F-test
Main article: F-test

The F-test is used for comparing the factors of the total deviation. For example, in one-
way, or single-factor ANOVA, statistical significance is tested for by comparing the F test
statistic

    F = variance between treatments / variance within treatments
      = MS_Treatments / MS_Error
      = (SS_Treatments / (I − 1)) / (SS_Error / (n_T − I))

to the F-distribution with I − 1, n_T − I degrees of freedom, where MS is mean square,
I = number of treatments and n_T = total number of cases. Using the F-distribution is
a natural candidate because the test statistic is the ratio of two scaled sums of squares
each of which follows a scaled chi-squared distribution.

The expected value of F is 1 + n·σ²_Treatment / σ²_Error (where n is the treatment sample
size), which is 1 for no treatment effect. As values of F increase above 1, the evidence is
increasingly inconsistent with the null hypothesis. Two apparent experimental methods of
increasing F are increasing the sample size and reducing the error variance by tight
experimental controls.

There are two methods of concluding the ANOVA hypothesis test, both of which produce
the same result:

- The textbook method is to compare the observed value of F with the critical value
  of F determined from tables. The critical value of F is a function of the degrees of
  freedom of the numerator and the denominator and the significance level (α). If F
  > F_Critical, the null hypothesis is rejected.
- The computer method calculates the probability (p-value) of a value of F greater
  than or equal to the observed value. The null hypothesis is rejected if this
  probability is less than or equal to the significance level (α).
The ANOVA F-test is known to be nearly optimal in the sense of minimizing false
negative errors for a fixed rate of false positive errors (i.e. maximizing power for a fixed
significance level). For example, to test the hypothesis that various medical treatments
have exactly the same effect, the F-test's p-values closely approximate the permutation
test's p-values: The approximation is particularly close when the design is balanced.[19][30]

Such permutation tests characterize tests with maximum power against all alternative
hypotheses, as observed by Rosenbaum.[nb 3] The ANOVA F-test (of the null hypothesis
that all treatments have exactly the same effect) is recommended as a practical test,
because of its robustness against many alternative distributions.[31][nb 4]
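The permutation test mentioned here can be sketched directly: relabel the observations at random many times and count how often the relabeled F statistic reaches the observed one. The data are hypothetical, and the Monte Carlo p-value only approximates the exact permutation p-value.

```python
# Sketch of a permutation test for the one-way null hypothesis, using random
# relabelings instead of the F-distribution (hypothetical data).
import random

def f_statistic(groups):
    all_obs = [y for g in groups for y in g]
    grand = sum(all_obs) / len(all_obs)
    ss_treat = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_error = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    return (ss_treat / (len(groups) - 1)) / (ss_error / (len(all_obs) - len(groups)))

random.seed(0)
groups = [[6.0, 8.0, 4.0, 5.0], [8.0, 12.0, 9.0, 11.0], [13.0, 9.0, 11.0, 8.0]]
sizes = [len(g) for g in groups]
pooled = [y for g in groups for y in g]
observed = f_statistic(groups)

hits, trials = 0, 5000
for _ in range(trials):
    random.shuffle(pooled)                 # relabel observations at random
    it = iter(pooled)
    relabeled = [[next(it) for _ in range(n)] for n in sizes]
    if f_statistic(relabeled) >= observed:
        hits += 1

p_value = hits / trials                    # approximates the F-test p-value
print(round(observed, 3), round(p_value, 4))
```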
Extended logic
ANOVA consists of separable parts; partitioning sources of variance and hypothesis
testing can be used individually. ANOVA is used to support other statistical tools.
Regression is first used to fit more complex models to data, then ANOVA is used to
compare models with the objective of selecting simple(r) models that adequately describe
the data. "Such models could be fit without any reference to ANOVA, but ANOVA tools
could then be used to make some sense of the fitted models, and to test hypotheses about
batches of coefficients."[32] "[W]e think of the analysis of variance as a way of
understanding and structuring multilevel models, not as an alternative to regression but
as a tool for summarizing complex high-dimensional inferences."[32]
ANOVA for a single factor
Main article: One-way analysis of variance

The simplest experiment suitable for ANOVA analysis is the completely randomized
experiment with a single factor. More complex experiments with a single factor involve
constraints on randomization and include completely randomized blocks and Latin
squares (and variants: Graeco-Latin squares, etc.). The more complex experiments share
many of the complexities of multiple factors. A relatively complete discussion of the
analysis (models, data summaries, ANOVA table) of the completely randomized
experiment is available.
ANOVA for multiple factors
Main article: Two-way analysis of variance

ANOVA generalizes to the study of the effects of multiple factors. When the experiment
includes observations at all combinations of levels of each factor, it is termed factorial.
Factorial experiments are more efficient than a series of single-factor experiments and the
efficiency grows as the number of factors increases.[33] Consequently, factorial designs are
heavily used.

The use of ANOVA to study the effects of multiple factors has a complication. In a 3-way
ANOVA with factors x, y and z, the ANOVA model includes terms for the main effects
(x, y, z) and terms for interactions (xy, xz, yz, xyz). All terms require hypothesis tests.
The proliferation of interaction terms increases the risk that some hypothesis test will
produce a false positive by chance. Fortunately, experience says that high order
interactions are rare.[34] The ability to detect interactions is a major advantage of multiple
factor ANOVA. Testing one factor at a time hides interactions, but produces apparently
inconsistent experimental results.[33]
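For a balanced two-factor design, the decomposition into main effects and an interaction can be sketched as follows. The 2×2 design with three replicates per cell is hypothetical, and the last cell is deliberately larger than additivity would predict, so the interaction sum of squares is nonzero.

```python
# Sketch: main-effect and interaction sums of squares for a balanced
# two-factor (2x2) design with replication (made-up observations).
def mean(values):
    return sum(values) / len(values)

cells = {  # (level of x, level of y) -> replicate observations
    ("x1", "y1"): [10.0, 12.0, 11.0],
    ("x1", "y2"): [14.0, 15.0, 16.0],
    ("x2", "y1"): [13.0, 12.0, 14.0],
    ("x2", "y2"): [22.0, 23.0, 21.0],  # larger than additivity predicts
}
reps = 3
x_levels, y_levels = ["x1", "x2"], ["y1", "y2"]

cell_mean = {k: mean(v) for k, v in cells.items()}
grand = mean([obs for v in cells.values() for obs in v])
x_mean = {x: mean([cell_mean[(x, y)] for y in y_levels]) for x in x_levels}
y_mean = {y: mean([cell_mean[(x, y)] for x in x_levels]) for y in y_levels}

ss_x = reps * len(y_levels) * sum((x_mean[x] - grand) ** 2 for x in x_levels)
ss_y = reps * len(x_levels) * sum((y_mean[y] - grand) ** 2 for y in y_levels)
ss_xy = reps * sum((cell_mean[(x, y)] - x_mean[x] - y_mean[y] + grand) ** 2
                   for x in x_levels for y in y_levels)
ss_error = sum((obs - cell_mean[k]) ** 2 for k, v in cells.items() for obs in v)
ss_total = sum((obs - grand) ** 2 for v in cells.values() for obs in v)

# the partition holds: SS_total = SS_x + SS_y + SS_xy + SS_error
print(round(ss_total, 4), round(ss_x + ss_y + ss_xy + ss_error, 4))
```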
Caution is advised when encountering interactions; test interaction terms first and
expand the analysis beyond ANOVA if interactions are found. Texts vary in their
recommendations regarding the continuation of the ANOVA procedure after encountering
an interaction. Interactions complicate the interpretation of experimental data. Neither the
calculations of significance nor the estimated treatment effects can be taken at face value.
"A significant interaction will often mask the significance of main effects."[35] Graphical
methods are recommended to enhance understanding. Regression is often useful. A
lengthy discussion of interactions is available in Cox (1958).[36] Some interactions can be
removed (by transformations) while others cannot.

A variety of techniques are used with multiple factor ANOVA to reduce expense. One
technique used in factorial designs is to minimize replication (possibly no replication
with support of analytical trickery) and to combine groups when effects are found to be
statistically (or practically) insignificant. An experiment with many insignificant factors
may collapse into one with a few factors supported by many replications.[37]
Worked numeric examples
Several fully worked numerical examples are available. A simple case uses one-way (a
single factor) analysis. A more complex case uses two-way (two-factor) analysis.
Associated analysis
Some analysis is required in support of the design of the experiment while other analysis
is performed after changes in the factors are formally found to produce statistically
significant changes in the responses. Because experimentation is iterative, the results of
one experiment alter plans for following experiments.
Preparatory analysis
The number of experimental units
"n the design of an e*periment, the number of e*perimental units is planned to satisfy the
goals of the e*periment 8*perimentation is often se&uential
8arly e*periments are often designed to provide mean'unbiased estimates of treatment
effects and of e*perimental error 6ater e*periments are often designed to test a
hypothesis that a treatment effect has an important magnitudeL in this case, the number of
e*perimental units is chosen so that the e*periment is within budget and has ade&uate
power, among other goals
Reporting sample size analysis is generally re&uired in psychology "7rovide information
on sample size and the process that led to sample size decisions"
C.:D
1he analysis, which
is written in the e*perimental protocol before the e*periment is conducted, is e*amined in
grant applications and administrative review boards
,esides the power analysis, there are less formal methods for selecting the number of
e*perimental units 1hese include graphical methods based on limiting the probability of
false negative errors, graphical methods based on an e*pected variation increase (above
the residuals) and methods based on achieving a desired confident interval
C.;D
Power analysis
Power analysis is often applied in the context of ANOVA in order to assess the
probability of successfully rejecting the null hypothesis if we assume a certain ANOVA
design, effect size in the population, sample size and significance level. Power analysis
can assist in study design by determining what sample size would be required in order to
have a reasonable chance of rejecting the null hypothesis when the alternative hypothesis
is true.[40][41][42][43]
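Power analysis for ANOVA is usually done with noncentral-F tables or software; a simulation sketch conveys the idea without either. All design numbers below (group means, error SD, sample size, significance level) are assumptions chosen for illustration, and the critical value of F is taken from the simulated null distribution rather than a table.

```python
# Simulation sketch of power analysis for a one-way design (3 groups,
# n per group).  All design numbers are hypothetical assumptions.
import random

def f_statistic(groups):
    all_obs = [y for g in groups for y in g]
    grand = sum(all_obs) / len(all_obs)
    ss_treat = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_error = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    return (ss_treat / (len(groups) - 1)) / (ss_error / (len(all_obs) - len(groups)))

def simulate_f(means, sd, n):
    """F statistic for one simulated experiment with the given true means."""
    return f_statistic([[random.gauss(m, sd) for _ in range(n)] for m in means])

random.seed(1)
n, sd, alpha, reps = 10, 2.0, 0.05, 2000
null_means = [0.0, 0.0, 0.0]   # null: no treatment effect
alt_means = [0.0, 1.0, 2.0]    # assumed effect sizes in the population

null_fs = sorted(simulate_f(null_means, sd, n) for _ in range(reps))
f_crit = null_fs[int((1 - alpha) * reps)]   # empirical critical value

power = sum(simulate_f(alt_means, sd, n) > f_crit for _ in range(reps)) / reps
print(round(f_crit, 2), round(power, 2))
```

Raising n or the assumed effect sizes raises the estimated power, which is how such a sketch can guide the choice of sample size.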
Effect size
Main article: Effect size

Several standardized measures of effect have been proposed for ANOVA to summarize
the strength of the association between a predictor(s) and the dependent variable (e.g., η²,
ω², or ƒ²) or the overall standardized difference (Ψ) of the complete model. Standardized
effect-size estimates facilitate comparison of findings across studies and disciplines.
However, while standardized effect sizes are commonly used in much of the professional
literature, a non-standardized measure of effect size that has immediately "meaningful"
units may be preferable for reporting purposes.[44]
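One such measure, η² (eta-squared), is simply the treatment share of the total sum of squares. A minimal sketch on made-up one-way data:

```python
# Sketch: eta-squared = SS_treatment / SS_total, the share of total
# variation associated with the treatment factor (hypothetical data).
groups = [[6.0, 8.0, 4.0, 5.0], [8.0, 12.0, 9.0, 11.0], [13.0, 9.0, 11.0, 8.0]]

all_obs = [y for g in groups for y in g]
grand = sum(all_obs) / len(all_obs)
ss_total = sum((y - grand) ** 2 for y in all_obs)
ss_treat = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)

eta_squared = ss_treat / ss_total
print(round(eta_squared, 3))
```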
Followup analysis
"t is always appropriate to carefully consider outliers 1hey have a disproportionate
impact on statistical conclusions and are often the result of errors
Model confirmation
"t is prudent to verify that the assumptions of !#$%! have been met Residuals are
e*amined or analyzed to confirm homoscedasticity and gross normality
C04D
Residuals
should have the appearance of (zero mean normal distribution) noise when plotted as a
function of anything including time and modeled data values 1rends hint at interactions
among factors or among observations $ne rule of thumb? ""f the largest standard
deviation is less than twice the smallest standard deviation, we can use methods based on
the assumption of e&ual standard deviations and our results will still be appro*imately
correct"
C05D
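The rule of thumb can be applied mechanically; the groups below are made up:

```python
# Sketch of the rule of thumb: compare the largest and smallest group
# standard deviations (hypothetical groups).
from statistics import stdev

groups = [[6.0, 8.0, 4.0, 5.0], [8.0, 12.0, 9.0, 11.0], [13.0, 9.0, 11.0, 8.0]]
sds = [stdev(g) for g in groups]
equal_sd_ok = max(sds) < 2 * min(sds)   # rule of thumb
print([round(s, 2) for s in sds], equal_sd_ok)
```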
Follow-up tests
! statistically significant effect in !#$%! is often followed up with one or more
different follow'up tests 1his can be done in order to assess which groups are different
from which other groups or to test various other focused hypotheses Follow'up tests are
often distinguished in terms of whether they are planned (a priori) or post hoc 7lanned
tests are determined before looking at the data and post hoc tests are performed after
looking at the data
$ften one of the "treatments" is none, so the treatment group can act as a control
-unnett@s test (a modification of the t'test) tests whether each of the other treatment
groups has the same mean as the control
C09D
Post hoc tests such as Tukey's range test most commonly compare every group mean with every other group mean and typically incorporate some method of controlling for Type I errors. Comparisons, which are most commonly planned, can be either simple or compound. Simple comparisons compare one group mean with one other group mean. Compound comparisons typically compare two sets of group means where one set has two or more groups (e.g., compare average group means of groups A, B and C with group D). Comparisons can also look at tests of trend, such as linear and quadratic relationships, when the independent variable involves ordered levels.

Following ANOVA with pair-wise multiple-comparison tests has been criticized on several grounds.[44][48] There are many such tests (10 in one table) and recommendations regarding their use are vague or conflicting.[49][50]
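If SciPy (version 1.8 or later) is available, its tukey_hsd function illustrates the kind of all-pairs post hoc comparison described above. The data here are hypothetical:

```python
from scipy.stats import tukey_hsd

# Hypothetical data for three treatment groups; group b is shifted upward.
a = [24.5, 23.5, 26.4, 27.1, 29.9]
b = [28.4, 34.2, 29.5, 32.2, 30.1]
c = [26.1, 28.3, 24.3, 26.2, 27.8]

res = tukey_hsd(a, b, c)
# res.pvalue[i, j] is the family-wise adjusted p-value for the
# comparison of group i with group j; every pair is tested at once,
# which is exactly the Type I error control the text refers to.
print(res)
```

Because the p-values are already adjusted for the whole family of comparisons, they can be read off directly without a further correction.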
Study designs and ANOVAs
There are several types of ANOVA. Many statisticians base ANOVA on the design of the experiment,[51] especially on the protocol that specifies the random assignment of treatments to subjects; the protocol's description of the assignment mechanism should include a specification of the structure of the treatments and of any blocking. It is also common to apply ANOVA to observational data using an appropriate statistical model.[citation needed]

Some popular designs use the following types of ANOVA:

One-way ANOVA is used to test for differences among two or more independent groups (means), e.g. different levels of urea application in a crop. Typically, however, the one-way ANOVA is used to test for differences among at least three groups, since the two-group case can be covered by a t-test.[52] When there are only two means to compare, the t-test and the ANOVA F-test are equivalent; the relation between ANOVA and t is given by F = t².

Factorial ANOVA is used when the experimenter wants to study the interaction effects among the treatments.

Repeated measures ANOVA is used when the same subjects are used for each treatment (e.g., in a longitudinal study).

Multivariate analysis of variance (MANOVA) is used when there is more than one response variable.
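The two-group equivalence F = t² stated above can be verified numerically, assuming SciPy is available, by running both tests on the same (made-up) data; ttest_ind defaults to the pooled-variance t-test, which is the version the equivalence holds for:

```python
from scipy.stats import f_oneway, ttest_ind

# Hypothetical two-group data.
g1 = [19.1, 20.3, 18.8, 21.0, 19.7]
g2 = [22.4, 21.8, 23.1, 22.0, 23.5]

F, p_f = f_oneway(g1, g2)
t, p_t = ttest_ind(g1, g2)  # pooled-variance two-sample t-test

# With exactly two groups, F equals t squared and the p-values match.
print(F, t**2, p_f, p_t)
```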
ANOVA cautions
Balanced experiments (those with an equal sample size for each treatment) are relatively easy to interpret; unbalanced experiments offer more complexity. For single-factor (one-way) ANOVA, the adjustment for unbalanced data is easy, but the unbalanced analysis lacks both robustness and power.[53] For more complex designs the lack of balance leads to further complications. "The orthogonality property of main effects and interactions present in balanced data does not carry over to the unbalanced case. This means that the usual analysis of variance techniques do not apply. Consequently, the analysis of unbalanced factorials is much more difficult than that for balanced designs."[54] In the general case, "The analysis of variance can also be applied to unbalanced data, but then the sums of squares, mean squares, and F-ratios will depend on the order in which the sources of variation are considered."[32] The simplest techniques for handling unbalanced data restore balance by either throwing out data or by synthesizing missing data. More complex techniques use regression.

ANOVA is (in part) a significance test. The American Psychological Association holds the view that simply reporting significance is insufficient and that reporting confidence bounds is preferred.[44]

While ANOVA is conservative (in maintaining a significance level) against multiple comparisons in one dimension, it is not conservative against comparisons in multiple dimensions.[55]
Generalizations
ANOVA is considered to be a special case of linear regression,[56][57] which in turn is a special case of the general linear model.[58] All consider the observations to be the sum of a model (fit) and a residual (error) to be minimized.

The Kruskal–Wallis test and the Friedman test are nonparametric tests, which do not rely on an assumption of normality.[59][60]
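The claim that ANOVA is a special case of linear regression can be demonstrated directly: regressing the response on an intercept plus group-indicator dummies yields the same F statistic as the one-way ANOVA. A sketch with hypothetical data, assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy.stats import f_oneway

# Hypothetical data: three groups of three observations each.
groups = [[3.0, 5.0, 4.0], [8.0, 9.0, 7.0], [6.0, 5.0, 7.0]]
y = np.concatenate(groups)
n, k = len(y), len(groups)
labels = np.repeat(np.arange(k), [len(g) for g in groups])

# Regression design: intercept plus dummy indicators for groups 2..k.
X = np.ones((n, k))
for j in range(1, k):
    X[:, j] = (labels == j).astype(float)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
ss_res = resid @ resid                # error (residual) sum of squares
ss_tot = ((y - y.mean()) ** 2).sum()  # total sum of squares

# F from the regression sum-of-squares decomposition, versus ANOVA's F.
F_reg = ((ss_tot - ss_res) / (k - 1)) / (ss_res / (n - k))
F_anova = f_oneway(*groups).statistic
print(F_reg, F_anova)  # the two F statistics agree
```

The fitted regression reproduces the group means exactly, which is why the model (fit) plus residual (error) decomposition is identical in both framings.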
History
While the analysis of variance reached fruition in the 20th century, antecedents extend centuries into the past according to Stigler.[61] These include hypothesis testing, the partitioning of sums of squares, experimental techniques and the additive model. Laplace was performing hypothesis testing in the 1770s.[62] The development of least-squares methods by Laplace and Gauss circa 1800 provided an improved method of combining observations (over the existing practices of astronomy and geodesy). It also initiated much study of the contributions to sums of squares. Laplace soon knew how to estimate a variance from a residual (rather than a total) sum of squares.[63] By 1827 Laplace was using least squares methods to address ANOVA problems regarding measurements of atmospheric tides.[64] Before 1800 astronomers had isolated observational errors resulting from reaction times (the "personal equation") and had developed methods of reducing the errors.[65] The experimental methods used in the study of the personal equation were later accepted by the emerging field of psychology,[66] which developed strong (full factorial) experimental methods to which randomization and blinding were soon added.[67] An eloquent non-mathematical explanation of the additive effects model was available in 1885.[68]

Sir Ronald Fisher introduced the term "variance" and proposed a formal analysis of variance in a 1918 article The Correlation Between Relatives on the Supposition of Mendelian Inheritance.[69] His first application of the analysis of variance was published in 1921.[70] Analysis of variance became widely known after being included in Fisher's 1925 book Statistical Methods for Research Workers.

Randomization models were developed by several researchers. The first was published in Polish by Neyman in 1923.[71]

One of the attributes of ANOVA which ensured its early popularity was computational elegance. The structure of the additive model allows solution for the additive coefficients by simple algebra rather than by matrix calculations. In the era of mechanical calculators this simplicity was critical. The determination of statistical significance also required access to tables of the F function which were supplied by early statistics texts.
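The "simple algebra" referred to above can be made concrete: for a balanced one-way layout, the least-squares estimates of the additive model y_ij = μ + τ_i + ε_ij are just the grand mean and the group-mean deviations, with no matrix inversion required. A sketch with hypothetical data:

```python
import numpy as np

# The additive one-way model y_ij = mu + tau_i + error is solved by
# simple averaging: mu-hat is the grand mean, and each tau_i-hat is
# group i's mean minus the grand mean.  Data below are made up.
groups = {"A": [3.0, 5.0, 4.0], "B": [8.0, 9.0, 7.0], "C": [6.0, 5.0, 7.0]}
all_obs = np.concatenate(list(groups.values()))
mu = all_obs.mean()                                        # grand mean
tau = {g: np.mean(obs) - mu for g, obs in groups.items()}  # effects
print(mu, tau)
# The fitted value for an observation in group i is mu + tau[i], i.e.
# exactly the group mean, the same fit least-squares regression gives.
```

In the era of mechanical calculators, these running averages could be accumulated by hand, which is the computational elegance the passage describes.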
See also
AMOVA
Analysis of covariance (ANCOVA)
ANORVA
ANOVA on ranks
ANOVA-simultaneous component analysis
Mixed-design analysis of variance
Multivariate analysis of variance (MANOVA)
One-way analysis of variance
Repeated measures ANOVA
Two-way analysis of variance
Footnotes
1. Randomization is a term used in multiple ways in this material. "Randomization has three roles in applications: as a device for eliminating biases, for example from unobserved explanatory variables and selection effects; as a basis for estimating standard errors; and as a foundation for formally exact significance tests." Cox (2006, page 192). Hinkelmann and Kempthorne use randomization both in experimental design and for statistical analysis.
2. Unit-treatment additivity is simply termed additivity in most texts. Hinkelmann and Kempthorne add adjectives and distinguish between additivity in the strict and broad senses. This allows a detailed consideration of multiple error sources (treatment, state, selection, measurement and sampling) on page 161.
3. Rosenbaum (2002, page 40) cites Section 5.7 (Permutation Tests), Theorem 2.3 (actually Theorem 3, page 184) of Lehmann's Testing Statistical Hypotheses (1959).
4. The F-test for the comparison of variances has a mixed reputation. It is not recommended as a hypothesis test to determine whether two different samples have the same variance. It is recommended for ANOVA where two estimates of the variance of the same sample are compared. While the F-test is not generally robust against departures from normality, it has been found to be robust in the special case of ANOVA. Citations from Moore & McCabe (2003): "Analysis of variance uses F statistics, but these are not the same as the F statistic for comparing two population standard deviations." (page 554) "The F test and other procedures for inference about variances are so lacking in robustness as to be of little use in practice." (page 556) "[The ANOVA F test] is relatively insensitive to moderate nonnormality and unequal variances, especially when the sample sizes are similar." (page 763) ANOVA assumes homoscedasticity, but it is robust. The statistical test for homoscedasticity (the F-test) is not robust. Moore & McCabe recommend a rule of thumb.
Notes
1. Gelman (2005, p. 2)
2. Howell (2002, p. 320)
3. Montgomery (2001, p. 63)
4. Gelman (2005, p. 1)
5. Gelman (2005, p. 5)
6. "Section 5.7. A Glossary of DOE Terminology". NIST Engineering Statistics handbook. NIST. Retrieved 5 April 2012.
7. "Section 4.3.1. A Glossary of DOE Terminology". NIST Engineering Statistics handbook. NIST. Retrieved 14 Aug 2012.
8. Montgomery (2001, Chapter 12: Experiments with random factors)
9. Gelman (2005, pp. 20–21)
10. Snedecor, George W.; Cochran, William G. (1967). Statistical Methods (6th ed.). p. 321.
11. Cochran & Cox (1992, p. 48)
12. Howell (2002, p. 323)
13. Anderson, David R.; Sweeney, Dennis J.; Williams, Thomas A. (1996). Statistics for business and economics (6th ed.). Minneapolis/St. Paul: West Pub. Co. pp. 452–453. ISBN 0-314-06378-1.
14. Anscombe (1948)
15. Kempthorne (1979, p. 30)
16. Cox (1958, Chapter 2: Some Key Assumptions)
17. Hinkelmann and Kempthorne (2008, Volume 1, Throughout. Introduced in Section 2.3.3: Principles of experimental design; The linear model; Outline of a model)
18. Hinkelmann and Kempthorne (2008, Volume 1, Section 6.3: Completely Randomized Design; Derived Linear Model)
19. Hinkelmann and Kempthorne (2008, Volume 1, Section 6.6: Completely randomized design; Approximating the randomization test)
20. Bailey (2008, Chapter 2.14 "A More General Model" in Bailey, pp. 38–40)
21. Hinkelmann and Kempthorne (2008, Volume 1, Chapter 7: Comparison of Treatments)
22. Kempthorne (1979, pp. 125–126, "The experimenter must decide which of the various causes that he feels will produce variations in his results must be controlled experimentally. Those causes that he does not control experimentally, because he is not cognizant of them, he must control by the device of randomization." "[O]nly when the treatments in the experiment are applied by the experimenter using the full randomization procedure is the chain of inductive inference sound. It is only under these circumstances that the experimenter can attribute whatever effects he observes to the treatment and the treatment only. Under these circumstances his conclusions are reliable in the statistical sense.")
23. Freedman[full citation needed]
24. Montgomery (2001, Section 3.8: Discovering dispersion effects)
25. Hinkelmann and Kempthorne (2008, Volume 1, Section 6.10: Completely randomized design; Transformations)
26. Bailey (2008)
27. Montgomery (2001, Section 3-3: Experiments with a single factor: The analysis of variance; Analysis of the fixed effects model)
28. Cochran & Cox (1992, p. 2 example)
29. Cochran & Cox (1992, p. 49)
30. Hinkelmann and Kempthorne (2008, Volume 1, Section 6.7: Completely randomized design; CRD with unequal numbers of replications)
31. Moore and McCabe (2003, page 763)
32. Gelman (2008)
33. Montgomery (2001, Section 5-2: Introduction to factorial designs; The advantages of factorials)
34. Belle (2008, Section 8.4: High-order interactions occur rarely)
35. Montgomery (2001, Section 5-1: Introduction to factorial designs; Basic definitions and principles)
36. Cox (1958, Chapter 6: Basic ideas about factorial experiments)
37. Montgomery (2001, Section 5-3.7: Introduction to factorial designs; The two-factor factorial design; One observation per cell)
38. Wilkinson (1999, p. 596)
39. Montgomery (2001, Section 3-7: Determining sample size)
40. Howell (2002, Chapter 8: Power)
41. Howell (2002, Section 11.12: Power (in ANOVA))
42. Howell (2002, Section 13.7: Power analysis for factorial experiments)
43. Moore and McCabe (2003, pp. 778–780)
44. Wilkinson (1999, p. 599)
45. Montgomery (2001, Section 3-4: Model adequacy checking)
46. Moore and McCabe (2003, p. 755, Qualifications to this rule appear in a footnote)
47. Montgomery (2001, Section 3-5.8: Experiments with a single factor: The analysis of variance; Practical interpretation of results; Comparing means with a control)
48. Hinkelmann and Kempthorne (2008, Volume 1, Section 7.5: Comparison of Treatments; Multiple Comparison Procedures)
49. Howell (2002, Chapter 12: Multiple comparisons among treatment means)
50. Montgomery (2001, Section 3-5: Practical interpretation of results)
51. Cochran & Cox (1957, p. 9, "[T]he general rule [is] that the way in which the experiment is conducted determines not only whether inferences can be made, but also the calculations required to make them.")
52. "The Probable Error of a Mean". Biometrika 6 (1): 1–25. 1908. doi:10.1093/biomet/6.1.1.
53. Montgomery (2001, Section 3-3.4: Unbalanced data)
54. Montgomery (2001, Section 14-2: Unbalanced data in factorial design)
55. Wilkinson (1999, p. 600)
56. Gelman (2005, p. 1) (with qualification in the later text)
57. Montgomery (2001, Section 3.9: The Regression Approach to the Analysis of Variance)
58. Howell (2002, p. 604)
59. Howell (2002, Chapter 18: Resampling and nonparametric approaches to data)
60. Montgomery (2001, Section 3-10: Nonparametric methods in the analysis of variance)
61. Stigler (1986)
62. Stigler (1986, p. 134)
63. Stigler (1986, p. 153)
64. Stigler (1986, pp. 154–155)
65. Stigler (1986, pp. 240–242)
66. Stigler (1986, Chapter 7: Psychophysics as a Counterpoint)
67. Stigler (1986, p. 253)
68. Stigler (1986, pp. 314–315)
69. The Correlation Between Relatives on the Supposition of Mendelian Inheritance. Ronald A. Fisher. Philosophical Transactions of the Royal Society of Edinburgh. 1918. (volume 52, pages 399–433)
70. On the "Probable Error" of a Coefficient of Correlation Deduced from a Small Sample. Ronald A. Fisher. Metron, 1: 3–32 (1921)
71. Scheffé (1959, p. 291, "Randomization models were first formulated by Neyman (1923) for the completely randomized design, by Neyman (1935) for randomized blocks, by Welch (1937) and Pitman (1937) for the Latin square under a certain null hypothesis, and by Kempthorne (1952, 1955) and Wilk (1955) for many other designs.")
References

Anscombe, F. J. (1948). "The Validity of Comparative Experiments". Journal of the Royal Statistical Society, Series A (General) 111 (3): 181–211. doi:10.2307/2984159. JSTOR 2984159. MR 30181.
Bailey, R. A. (2008). Design of Comparative Experiments. Cambridge University Press. ISBN 978-0-521-68357-9. Pre-publication chapters are available on-line.
Belle, Gerald van (2008). Statistical rules of thumb (2nd ed.). Hoboken, NJ: Wiley. ISBN 978-0-470-14448-0.
Cochran, William G.; Cox, Gertrude M. (1992). Experimental designs (2nd ed.). New York: Wiley. ISBN 978-0-471-54567-5.
Cohen, Jacob (1988). Statistical power analysis for the behavior sciences (2nd ed.). Routledge. ISBN 978-0-8058-0283-2.
Cohen, Jacob (1992). "A power primer". Psychological Bulletin 112 (1): 155–159. doi:10.1037/0033-2909.112.1.155. PMID 19565683.
Cox, David R. (1958). Planning of experiments. Reprinted as ISBN 978-0-471-57429-3.
Cox, D. R. (2006). Principles of statistical inference. Cambridge; New York: Cambridge University Press. ISBN 978-0-521-68567-2.
Freedman, David A. (2005). Statistical Models: Theory and Practice. Cambridge University Press. ISBN 978-0-521-67105-7.
Gelman, Andrew (2005). "Analysis of variance? Why it is more important than ever". The Annals of Statistics 33: 1–53. doi:10.1214/009053604000001048.
Gelman, Andrew (2008). "Variance, analysis of". The new Palgrave dictionary of economics (2nd ed.). Basingstoke, Hampshire; New York: Palgrave Macmillan. ISBN 978-0-333-78676-5.
Hinkelmann, Klaus & Kempthorne, Oscar (2008). Design and Analysis of Experiments. I and II (Second ed.). Wiley. ISBN 978-0-470-38551-7.
Howell, David C. (2002). Statistical methods for psychology (5th ed.). Pacific Grove, CA: Duxbury/Thomson Learning. ISBN 0-534-37770-X.
Kempthorne, Oscar (1979). The Design and Analysis of Experiments (Corrected reprint of (1952) Wiley ed.). Robert E. Krieger. ISBN 0-88275-105-0.
Lehmann, E. L. (1959). Testing Statistical Hypotheses. John Wiley & Sons.
Montgomery, Douglas C. (2001). Design and Analysis of Experiments (5th ed.). New York: Wiley. ISBN 978-0-471-31649-7.
Moore, David S. & McCabe, George P. (2003). Introduction to the Practice of Statistics (4e). W. H. Freeman & Co. ISBN 0-7167-9657-0.
Rosenbaum, Paul R. (2002). Observational Studies (2nd ed.). New York: Springer-Verlag. ISBN 978-0-387-98967-9.
Scheffé, Henry (1959). The Analysis of Variance. New York: Wiley.
Stigler, Stephen M. (1986). The history of statistics: the measurement of uncertainty before 1900. Cambridge, Mass.: Belknap Press of Harvard University Press. ISBN 0-674-40340-1.
Wilkinson, Leland (1999). "Statistical Methods in Psychology Journals: Guidelines and Explanations". American Psychologist 54 (8): 594–604. doi:10.1037/0003-066X.54.8.594.
Further reading

Box, G. E. P. (1953). "Non-Normality and Tests on Variances". Biometrika (Biometrika Trust) 40 (3/4): 318–335. doi:10.1093/biomet/40.3-4.318. JSTOR 2333350.
Box, G. E. P. (1954). "Some Theorems on Quadratic Forms Applied in the Study of Analysis of Variance Problems, I. Effect of Inequality of Variance in the One-Way Classification". The Annals of Mathematical Statistics 25 (2): 290. doi:10.1214/aoms/1177728786.
Box, G. E. P. (1954). "Some Theorems on Quadratic Forms Applied in the Study of Analysis of Variance Problems, II. Effects of Inequality of Variance and of Correlation Between Errors in the Two-Way Classification". The Annals of Mathematical Statistics 25 (3): 484. doi:10.1214/aoms/1177728717.
Caliński, Tadeusz & Kageyama, Sanpei (2000). Block designs: A Randomization approach, Volume I: Analysis. Lecture Notes in Statistics 150. New York: Springer-Verlag. ISBN 0-387-98578-6.
Christensen, Ronald (2002). Plane Answers to Complex Questions: The Theory of Linear Models (Third ed.). New York: Springer. ISBN 0-387-95361-2.
Cox, David R. & Reid, Nancy M. (2000). The theory of design of experiments. (Chapman & Hall/CRC). ISBN 978-1-58488-195-7.
Fisher, Ronald (1921). "Studies in Crop Variation. I. An examination of the yield of dressed grain from Broadbalk". Journal of Agricultural Science 11: 107–135. doi:10.1017/S0021859600003750.
Freedman, David A.; Pisani, Robert; Purves, Roger (2007). Statistics (4th edition). W. W. Norton & Company. ISBN 978-0-393-92972-0.
Hettmansperger, T. P.; McKean, J. W. (1998). Edward Arnold, ed. Robust nonparametric statistical methods. Kendall's Library of Statistics. Volume 5 (First ed.). New York: John Wiley & Sons, Inc. pp. xiv+467. ISBN 0-340-54937-8. MR 1604954.
Lentner, Marvin; Thomas Bishop (1993). Experimental design and analysis (Second ed.). P.O. Box 884, Blacksburg, VA 24063: Valley Book Company. ISBN 0-9616255-2-X.
Tabachnick, Barbara G. & Fidell, Linda S. (2007). Using Multivariate Statistics (5th ed.). Boston: Pearson International Edition. ISBN 978-0-205-45938-4.
Wichura, Michael J. (2006). The coordinate-free approach to linear models. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press. pp. xiv+199. ISBN 978-0-521-86842-6. MR 2283455.
External links
SOCR ANOVA Activity and interactive applet
Examples of all ANOVA and ANCOVA models with up to three treatment factors, including randomized block, split plot, repeated measures, and Latin squares, and their analysis in R
NIST/SEMATECH e-Handbook of Statistical Methods, section 7.4.3: "Are the means equal?"
Navigation menu
/reate account
6og in
!rticle
1alk
Read
8dit
%iew history
)ain page
/ontents
Featured content
/urrent events
Random article
-onate to Wikipedia
Wikimedia 3hop
8nteraction
>elp
!bout Wikipedia
/ommunity portal
Recent changes
/ontact page
!ools
What links here
Related changes
2pload file
3pecial pages
7ermanent link
7age information
Wikidata item
/ite this page
(rint9export
/reate a book
-ownload as 7-F
7rintable version
'anguages
Z[\]^_`
!zarbaycanca
bcdefghij
/atalk
lemtina
-eutsch
8spanol
8uskara
opqrs
Frantais
=alego



,ahasa "ndonesia
"taliano
,asa Fawa
6atviemu
)agyar
,ahasa )elayu
#ederlands

#orsk bokmul
#orsk nynorsk
7olski
7ortuguvs
wxhhijy
3lovenmzina
,asa 3unda
3venska


1{rkte
|igf}~hif

8dit links
This page was last modified on 10 August 2014 at 14:55.
Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.
