
Analysis of Variance

1998 Brooks/Cole Publishing/ITP

Specific Topics
1. The analysis of variance
2. The completely randomized design
3. The randomized block design

4. Factorial experiments


What Is an Analysis of Variance?

Responses exhibit variability.


In an analysis of variance (ANOVA), the total variation in the
response measurements is divided into portions that may be
attributed to various factors, e.g., amount of variation due to
Drug A and amount due to Drug B.
The ANOVA formalizes the comparison of the variability of the measurements within experimental groups with the variability between experimental groups.


F-test

(Figure: three groups, each with its own mean.)

Is the mean of one group significantly different from the means of the other groups?

Analysis of Variance

(Figure: a chart classifying ANOVA designs.)

- One-way ANOVA: one independent variable.
  - Between subjects: different participants in each group.
  - Repeated measures / within subjects: the same participants in every condition.
- Factorial ANOVA: more than one independent variable (two-way, three-way, four-way).

The Assumptions for an Analysis of Variance

Analysis of variance procedures are fairly robust when the sample sizes are equal and the data are fairly mound-shaped.

Assumptions of the ANOVA test and estimation procedures:
- The observations within each population are normally distributed with a common variance σ² (a quick code check of this assumption is sketched below).
- Assumptions regarding the sampling procedures are specified for each design.
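A minimal sketch, not from the slides, of checking these assumptions with SciPy: approximate normality within each group (Shapiro-Wilk) and a common variance across groups (Levene). The three groups below are made-up illustrative data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Three made-up groups of responses with similar spread
groups = [rng.normal(loc=m, scale=2.0, size=12) for m in (10, 11, 13)]

# Shapiro-Wilk test of normality, run separately on each group
for i, g in enumerate(groups, start=1):
    w, p = stats.shapiro(g)
    print(f"Group {i}: Shapiro-Wilk W = {w:.3f}, p = {p:.3f}")

# Levene's test for equality of variances across the groups
stat, p = stats.levene(*groups)
print(f"Levene's test: W = {stat:.3f}, p = {p:.3f}")
```

Large p-values in both checks are consistent with the normality and common-variance assumptions listed above.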


The Completely Randomized Design: A One-Way Classification

A completely randomized design is one in which random samples are selected independently from each of k populations.

Example 11.3
A researcher is interested in the effects of five types of
insecticide for use in controlling the boll weevil in cotton fields.
Explain how to implement a completely randomized design to
investigate the effects of the five insecticides on crop yield.
Solution
The only way to generate the equivalent of five random samples from the hypothetical populations corresponding to the five insecticides is to use a method called random assignment.

A fixed number of cotton plants are chosen for treatment, and each is assigned a random number. Suppose that each sample is to have an equal number of measurements. Using a randomization device, you can assign the first n plants chosen to receive insecticide 1, the second n plants to receive insecticide 2, and so on, until all five treatments have been assigned.
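A minimal sketch, not from the text, of the random assignment just described: plant labels are shuffled and the first n shuffled plants receive insecticide 1, the next n insecticide 2, and so on. The treatment and sample sizes are made up.

```python
import numpy as np

k = 5          # number of insecticides (treatments)
n = 4          # plants per treatment, so k * n plants in total
rng = np.random.default_rng(2024)

plants = np.arange(1, k * n + 1)      # plant labels 1..20
rng.shuffle(plants)                   # random permutation of the plants

# First n shuffled plants get insecticide 1, the next n get insecticide 2, ...
assignment = {f"insecticide {i + 1}": sorted(plants[i * n:(i + 1) * n])
              for i in range(k)}
for treatment, assigned in assignment.items():
    print(treatment, "->", assigned)
```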

The completely randomized design has these characteristics:
- It involves only one factor and is therefore designated a one-way classification, e.g., dosage level for Drug A only.
- The factor has k levels corresponding to the k populations, which are also the treatments for this design, e.g., five possible dosage levels: 5, 10, 15, 20, and 25 ml of Drug A.
- The question is whether the k population means are all the same, or whether at least one is different from the others.
- The ANOVA compares the population means simultaneously, rather than pair by pair as with Student's t.

The Analysis of Variance for a Completely Randomized Design

Suppose you want to compare k population means based on independent random samples of sizes n_1, n_2, ..., n_k from normal populations with a common variance σ². The populations have the same shape, but potentially different locations.


Null Hypothesis Versus Alternative Hypothesis

H0: μ1 = μ2 = μ3
Ha: At least one pair of means differs (the F test is right-tailed)

Partitioning the Total Variation in an Experiment


Let x_ij be the j-th measurement (j = 1, 2, ..., n_i) in the i-th sample. The total sum of squares is

Total SS = Σ(x_ij - x̄)² = Σx_ij² - (Σx_ij)²/n

If we let G represent the grand total of all n observations, then the correction for the mean is

CM = (Σx_ij)²/n = G²/n

This Total SS is partitioned into two components. The first component, called the sum of squares for treatments (SST), measures the variation among the k sample means:

SST = Σ n_i (x̄_i - x̄)² = Σ (T_i²/n_i) - CM

where T_i is the total of the observations for treatment i.



The second component, called the sum of squares for error (SSE), is used to measure the pooled variation within the k samples:

SSE = (n_1 - 1)s_1² + (n_2 - 1)s_2² + ... + (n_k - 1)s_k²

We can show algebraically that, in the analysis of variance,

Total SS = SST + SSE

Therefore, you need to calculate only two of the three sums of squares.
Each of the three sources of variation, when divided by its appropriate degrees of freedom, provides an estimate of the variation in the experiment.
Since Total SS involves n squared observations, its degrees of freedom are df = n - 1.


Similarly, the sum of squares for treatments involves k squared totals, and its degrees of freedom are df = k - 1. Finally, the sum of squares for error, a direct extension of the pooled estimate, has

df = (n_1 - 1) + (n_2 - 1) + ... + (n_k - 1) = n - k

Notice that the degrees of freedom for the three sources of variation are additive, that is,

df(total) = df(treatments) + df(error)

The three sources of variation and their respective degrees of freedom are combined to form the mean squares as MS = SS/df.
The total variation in the experiment is then displayed in an ANOVA table. See Example 11.4 for such a table.
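The formulas above translate directly into code. A minimal sketch, not from the text and using made-up sample data, that builds the pieces of a one-way ANOVA table (CM, Total SS, SST, SSE, the mean squares, and the F statistic):

```python
import numpy as np
from scipy import stats

def one_way_anova_table(samples):
    """Return Total SS, SST, SSE, MST, MSE, F, and the p-value for k samples."""
    k = len(samples)
    all_x = np.concatenate(samples)
    n = all_x.size
    cm = all_x.sum() ** 2 / n                                 # correction for the mean
    total_ss = (all_x ** 2).sum() - cm                        # Total SS
    sst = sum(s.sum() ** 2 / s.size for s in samples) - cm    # treatments
    sse = total_ss - sst                                      # error (within samples)
    mst, mse = sst / (k - 1), sse / (n - k)
    f = mst / mse
    p = stats.f.sf(f, k - 1, n - k)                           # right-tail p-value
    return total_ss, sst, sse, mst, mse, f, p

# Made-up data: three small samples
samples = [np.array(s, dtype=float) for s in
           ([3.1, 2.9, 3.4], [3.8, 4.0, 3.7], [2.5, 2.7, 2.6])]
print(one_way_anova_table(samples))
```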

ANOVA Table for k Independent Random Samples: Completely Randomized Design

Critical F-Value:
Using these two df values, look in Table F to determine the critical F value. In this illustration the numerator df is 5 and the denominator df is 4; with an alpha value of .05, the critical F value is 6.26.

Therefore, if F computed is larger than F critical = 6.26, we can reject H0 and conclude that there is a significant difference among the groups.

Table F (lookup for df = 5, 4; a SciPy equivalent is sketched below).
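As a quick alternative to the table lookup, the same critical value can be obtained in code. A short sketch, not from the slides, assuming SciPy is available:

```python
from scipy import stats

alpha, df_num, df_den = 0.05, 5, 4
# Percent-point function (inverse CDF) of the F distribution
f_crit = stats.f.ppf(1 - alpha, df_num, df_den)
print(f"F_crit({alpha}; {df_num}, {df_den}) = {f_crit:.2f}")   # about 6.26
```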

Example: Nineteen cows are assigned at random among four experimental groups. Each group is fed a different diet. The data are cow body weights, in kilograms, after being raised on these diets. We wish to ask whether mean cow weight is the same for all four diets.

Feed 1    Feed 2    Feed 3    Feed 4
 60.8      68.7     102.6      87.9
 57.0      67.7     102.1      84.2
 65.0      74.0     100.2      83.1
 58.6      66.3      96.5      85.7
 61.7      69.8                90.3

(Feed 3 has four cows and feed 4 has five, for n = 19 in all.)

1. The hypotheses are
   H0: μ1 = μ2 = μ3 = μ4 (mean body weight is the same on all four diets)
   Ha: At least one pair of the μ_i are not equal

Total SS = Σ(x_ij - x̄)² = Σx_ij² - (Σx_ij)²/n

2. Total SS = 119981.900 - (1482.2)²/19
            = 119981.900 - 115627.202
            = 4354.698

SST = Σ n_i (x̄_i - x̄)² = Σ (T_i²/n_i) - CM

3. SSTreatment = (303.1)²/5 + (346.5)²/5 + (401.4)²/4 + (431.2)²/5 - 115627.202
              = 18373.922 + 24012.450 + 40280.490 + 37186.688 - 115627.202
              = 4226.348

4. SSError = Total SS - SSTreatment
           = 4354.698 - 4226.348
           = 128.350

5. Summary of the analysis of variance

Source of Variation      SS          df             MS          F computed
Total                  4354.698   19 - 1 = 18
Treatment              4226.348    4 - 1 = 3     1408.783    164.6 (= 1408.783/8.557)
Error                   128.350   18 - 3 = 15       8.557

6. F critical = F0.05(1),3,15 = 3.29

7. Conclusion
Since F computed > F critical, we reject H0 and conclude that there is a significant difference among our four groups of feeds.
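As a cross-check on the hand calculation (not part of the original worked example), the same F statistic and critical value can be obtained with SciPy, using the cow weights from the table above (feed 3 has four cows, feed 4 has five):

```python
from scipy import stats

feed1 = [60.8, 57.0, 65.0, 58.6, 61.7]
feed2 = [68.7, 67.7, 74.0, 66.3, 69.8]
feed3 = [102.6, 102.1, 100.2, 96.5]
feed4 = [87.9, 84.2, 83.1, 85.7, 90.3]

# One-way ANOVA F statistic and p-value
f, p = stats.f_oneway(feed1, feed2, feed3, feed4)
f_crit = stats.f.ppf(0.95, 3, 15)
print(f"F = {f:.1f}, p = {p:.3g}, F_crit(0.05; 3, 15) = {f_crit:.2f}")
# Expect F near 164.6 and F_crit near 3.29, matching the hand calculation.
```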

The Randomized Block Design: A Two-Way Classification

The randomized block design is a direct extension of the matched-pairs or paired-difference design.
k treatment means are compared.
The design uses blocks of k experimental units that are relatively similar or homogeneous, with one unit within each block randomly assigned to each treatment.
If the design involves k treatments within each of b blocks, then the total number of observations is n = bk.
The purpose of blocking is to remove or isolate the block-to-block variability that might hide the effect of the treatments.
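A minimal sketch, not from the text, of laying out such a design in code: within every block, the order of the k treatments is randomized independently, so each block contains each treatment exactly once. The numbers of treatments and blocks are made up.

```python
import numpy as np

k, b = 4, 3                      # k treatments, b blocks (made-up sizes)
treatments = [f"T{i + 1}" for i in range(k)]
rng = np.random.default_rng(5)

# Independently permute the treatment order inside each block
layout = {f"block {j + 1}": list(rng.permutation(treatments))
          for j in range(b)}
for block, order in layout.items():
    print(block, "->", order)
```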


The Analysis of Variance for a Randomized Block Design

The randomized block design identifies two factors: treatments and blocks. See Example 11.8.

Partitioning the Total Variation in the Experiment

Let x_ij be the response when the i-th treatment (i = 1, 2, ..., k) is applied in the j-th block (j = 1, 2, ..., b). The total variation in the n = bk observations is

Total SS = Σ(x_ij - x̄)² = Σx_ij² - (Σx_ij)²/n

This is partitioned into three (rather than two) parts in such a way that

Total SS = SSB + SST + SSE


SSB (sum of squares for blocks) measures the variation among the block means.

SST (sum of squares for treatments) measures the variation among the treatment means.

SSE (sum of squares for error) measures the variation of the differences among the treatment observations within blocks, which measures the experimental error.

Calculating the Sums of Squares for a Randomized Block Design, k Treatments in b Blocks

CM = G²/n
where
G = Σx_ij = total of all n = bk observations

Total SS = Σx_ij² - CM = (sum of squares of all x-values) - CM

SST = Σ(T_i²/b) - CM

SSB = Σ(B_j²/k) - CM

SSE = Total SS - SST - SSB

with
T_i = total of all observations receiving treatment i, i = 1, 2, ..., k
B_j = total of all observations in block j, j = 1, 2, ..., b
(These formulas are applied in the code sketch that follows.)
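A minimal sketch, an assumed helper that is not from the text, applying these block-design formulas to a b x k data matrix with one row per block and one column per treatment:

```python
import numpy as np

def rbd_sums_of_squares(x):
    """Return Total SS, SST, SSB, SSE for a (blocks x treatments) array."""
    b, k = x.shape
    n = b * k
    cm = x.sum() ** 2 / n                         # CM = G^2 / n
    total_ss = (x ** 2).sum() - cm                # Total SS
    sst = (x.sum(axis=0) ** 2).sum() / b - cm     # treatments (column totals)
    ssb = (x.sum(axis=1) ** 2).sum() / k - cm     # blocks (row totals)
    sse = total_ss - sst - ssb
    return total_ss, sst, ssb, sse

# Example call on a small made-up layout: 3 blocks, 2 treatments
x = np.array([[4.0, 5.1], [3.8, 4.9], [4.2, 5.4]])
print(rbd_sums_of_squares(x))
```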


Each of the three sources of variation, when divided by the appropriate degrees of freedom, provides an estimate of the variation in the experiment.
Since Total SS involves n = bk squared observations, its degrees of freedom are df = n - 1.
Similarly, SST involves k squared totals, and its degrees of freedom are df = k - 1, while SSB involves b squared totals and has b - 1 degrees of freedom.
Finally, since the degrees of freedom are additive, the remaining degrees of freedom associated with SSE can be shown algebraically to be df = (b - 1)(k - 1).
The four sources of variation and their respective degrees of freedom are combined to form the mean squares as MS = SS/df.
The total variation in the experiment is then displayed in an analysis of variance table as:


ANOVA Table for a Randomized Block Design, k Treatments and b Blocks

Testing the Equality of the Treatment and Block Means

The mean squares in the analysis of variance table can be used to test the null hypotheses
H0: No difference among the k treatment means
H0: No difference among the b block means
versus the alternative hypothesis
Ha: At least one of the means is different from the others
using a theoretical argument similar to the one we used for the completely randomized design.

Remember that σ² is the common variance for the observations in all bk block-treatment combinations. The quantity MSE = SSE/[(b - 1)(k - 1)] is an unbiased estimate of σ² whether or not H0 is true.


The two mean squares, MST and MSB, estimate σ² only if H0 is true and tend to be unusually large if H0 is false and either the treatment or block means are different.
The test statistics
F = MST/MSE and F = MSB/MSE
are used to test the equality of treatment and block means, respectively.
Both statistics tend to be larger than usual if H0 is false. Hence, you can reject H0 for large values of F, using right-tailed critical values of the F distribution with appropriate degrees of freedom.


Tests for a Randomized Block Design:

For comparing treatment means:
1. Null hypothesis: H0: The treatment means are equal
2. Alternative hypothesis: Ha: At least two of the treatment means differ
3. Test statistic: F = MST/MSE, where F is based on df1 = k - 1 and df2 = (b - 1)(k - 1)
4. Rejection region: Reject H0 if F > Fα, where Fα lies in the upper tail of the F distribution, or when the p-value < α

For comparing block means:
1. Null hypothesis: H0: The block means are equal
2. Alternative hypothesis: Ha: At least two of the block means differ
3. Test statistic: F = MSB/MSE, where F is based on df1 = b - 1 and df2 = (b - 1)(k - 1)
4. Rejection region: Reject H0 if F > Fα, where Fα lies in the upper tail of the F distribution, or when the p-value < α

Some Cautionary Comments on Blocking:


- A randomized block design should not be used when
treatments and blocks both correspond to experimental
factors of interest to the researcher, e.g., dosage levels of
two drugs.
- Remember that blocking may not always be beneficial.
- Remember that you cannot construct confidence intervals
for individual treatment means unless it is reasonable to
assume that the b blocks have been randomly selected from
a population of blocks.

Partitioning the Total Sum of Squares in the Randomized Block Design

(Figure: SST, the total sum of squares, splits into SSC, the treatment [column] sum of squares; SSR, the block [row] sum of squares; and SSE, the error sum of squares. Note that in this notation SST denotes the total, unlike the Mendenhall notation used earlier.)

Source: Business Statistics, 4e, by Ken Black. 2003 John Wiley & Sons.

A Randomized Block Design: Single Independent Variable

(Figure: a grid of individual observations, with the blocking variable defining the rows and the levels of the single independent variable defining the columns.)

Randomized Block Design: Tread-Wear Example

                        Speed
Supplier      Slow   Medium   Fast    Block total   Block mean (x̄)
1              3.7     4.5     3.1        11.3           3.77
2              3.4     3.9     2.8        10.1           3.37
3              3.5     4.1     3.0        10.6           3.53
4              3.2     3.5     2.6         9.3           3.10
5              3.9     4.8     3.4        12.1           4.03

Treatment total      17.7    20.8    14.9    Grand total = 53.4
Treatment mean (x̄)   3.54    4.16    2.98    Grand mean = 3.56

n = 5 blocks, C = 3 treatments, N = 15 observations

Total SS = Σ(x_ij - x̄)² = Σx_ij² - (Σx_ij)²/n

2. Total SS = 5.176

SST = Σ n_i (x̄_i - x̄)² = Σ (T_i²/n_i) - CM

3. SSTreatment = 3.484

4. SSBlock = 1.549

5. SSError = Total SS - SSBlock - SSTreatment
           = 5.176 - 1.549 - 3.484
           = 0.143

Randomized Block Design: Mean Square Calculations

MSC = SSC/(C - 1) = 3.484/2 = 1.742
MSR = SSR/(n - 1) = 1.549/4 = 0.387
MSE = SSE/(N - n - C + 1) = 0.143/8 = 0.018
F = MSC/MSE = 1.742/0.018 = 96.78


Analysis of Variance for the Tread-Wear Example

Source of Variance     SS     df     MS       F
Treatment             3.484    2    1.742    96.78
Block                 1.549    4    0.387    21.50
Error                 0.143    8    0.018
Total                 5.176   14

Randomized Block Design Treatment Effects: Procedural Summary

H0: μ1 = μ2 = μ3
Ha: At least one of the means is different from the others

F = MSC/MSE = 1.742/0.018 = 96.78

F = 96.78 > F.01,2,8 = 8.65, so reject H0.

Randomized Block Design Blocking Effects: Procedural Overview

H0: μ1 = μ2 = μ3 = μ4 = μ5
Ha: At least one of the blocking means is different from the others

F = MSR/MSE = 0.387/0.018 = 21.5

F = 21.5 > F.01,4,8 = 7.01, so reject H0.
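The tread-wear ANOVA can also be reproduced in code. A sketch, not from the Ken Black text, that fits the additive treatment-plus-block model with statsmodels; the F values should come out close to the hand-rounded 96.78 and 21.50 above (small differences arise from rounding MSE to 0.018).

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Tread-wear data from the table above: 5 suppliers (blocks) x 3 speeds (treatments)
wear = {"Slow":   [3.7, 3.4, 3.5, 3.2, 3.9],
        "Medium": [4.5, 3.9, 4.1, 3.5, 4.8],
        "Fast":   [3.1, 2.8, 3.0, 2.6, 3.4]}
rows = [{"speed": s, "supplier": i + 1, "wear": w}
        for s, values in wear.items() for i, w in enumerate(values)]
df = pd.DataFrame(rows)

# Randomized block model: treatment (speed) plus block (supplier), no interaction
model = ols("wear ~ C(speed) + C(supplier)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```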

The a x b Factorial Experiment: A Two-Way Classification

You may be interested in two factors and also the interaction or relationship between the two factors, e.g., two drugs and their interactions.
Consider two different examples of interaction on responses in this situation.


The Analysis of Variance for an a x b Factorial Experiment

An analysis of variance for a two-factor experiment replicated r times follows the same pattern as the previous designs.
If the letters A and B are used to identify the two factors, the total variation in the experiment,

Total SS = Σ(x - x̄)² = Σx² - CM,

is partitioned into four parts in such a way that

Total SS = SSA + SSB + SS(AB) + SSE

where
SSA (sum of squares for factor A) measures the variation among the factor A means.
SSB (sum of squares for factor B) measures the variation among the factor B means.

SS(AB) (sum of squares for the A x B interaction) measures the variation among the means of the ab factor-level combinations after the main effects of A and B have been removed.
SSE (sum of squares for error) measures the variation of the differences among the observations within each combination of factor levels, i.e., the experimental error.

The sums of squares SSA and SSB are often called main effect sums of squares, to distinguish them from the interaction sum of squares.
You can assume that there are:
- a levels of factor A
- b levels of factor B
- r replications of each of the ab factor combinations
- a total of n = abr observations


Finally, the equality of means for the various factor-level combinations (the interaction effect) and for the levels of both main effects, A and B, can be tested using the ANOVA F tests, as shown below:

Tests for a Factorial Experiment:

For interaction:
1. Null hypothesis: H0: Factors A and B do not interact
2. Alternative hypothesis: Ha: Factors A and B interact
3. Test statistic: F = MS(AB)/MSE, where F is based on df1 = (a - 1)(b - 1) and df2 = ab(r - 1)
4. Rejection region: Reject H0 when F > Fα, where Fα lies in the upper tail of the F distribution, or when the p-value < α


For main effects, factor A:
1. Null hypothesis: H0: There are no differences among the factor A means
2. Alternative hypothesis: Ha: At least two of the factor A means differ
3. Test statistic: F = MSA/MSE, where F is based on df1 = a - 1 and df2 = ab(r - 1)
4. Rejection region: Reject H0 when F > Fα or when the p-value < α

For main effects, factor B:
1. Null hypothesis: H0: There are no differences among the factor B means
2. Alternative hypothesis: Ha: At least two of the factor B means differ
3. Test statistic: F = MSB/MSE, where F is based on df1 = b - 1 and df2 = ab(r - 1)
4. Rejection region: Reject H0 when F > Fα or when the p-value < α
(A fitted two-factor example follows this list.)
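A minimal sketch, with made-up data that is not from the text, of fitting an a x b factorial ANOVA with r replications in statsmodels; the C(A):C(B) row of the output is the interaction test described above, and the C(A) and C(B) rows are the main-effect tests.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(7)
a_levels, b_levels, r = ["A1", "A2"], ["B1", "B2", "B3"], 4

# Simulated responses with a shift for A2 and for B3 (purely illustrative)
rows = [{"A": a, "B": b,
         "y": rng.normal(10 + (a == "A2") * 2 + (b == "B3"), 1)}
        for a in a_levels for b in b_levels for _ in range(r)]
df = pd.DataFrame(rows)

# Main effects plus interaction: y ~ A + B + A:B
model = ols("y ~ C(A) * C(B)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```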

If the interaction effect is significant, the differences in the treatment means can be further studied by looking at the a x b factor-level combinations.
If the interaction effect is not significant, then the significance of the main effect means should be investigated, using multiple-comparison procedures such as:
- Least significant difference (LSD)
- Duncan's new multiple range test


Fisher's Least Significant Difference (LSD) Test

The rejection of H0 does not imply that all k means are different from one another, and we know neither how many differences there are nor where the differences are located among the k population means.
For example, if k = 3 and H0: μ1 = μ2 = μ3 is rejected, then we do not know whether Ha: μ1 ≠ μ2 = μ3, Ha: μ1 = μ2 ≠ μ3, or Ha: μ1 ≠ μ2 ≠ μ3 is the appropriate alternative hypothesis.
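A minimal sketch, not from the text, of a multiple-comparison follow-up after a significant F. It uses Tukey's HSD from statsmodels rather than Fisher's LSD, but it answers the same question: which pairs of means differ? The data are the cow feed weights from the earlier worked example.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

values = np.array([60.8, 57.0, 65.0, 58.6, 61.7,      # feed 1
                   68.7, 67.7, 74.0, 66.3, 69.8,      # feed 2
                   102.6, 102.1, 100.2, 96.5,         # feed 3
                   87.9, 84.2, 83.1, 85.7, 90.3])     # feed 4
labels = ["feed1"] * 5 + ["feed2"] * 5 + ["feed3"] * 4 + ["feed4"] * 5

# Pairwise comparisons of all group means at the 5% family-wise level
print(pairwise_tukeyhsd(values, labels, alpha=0.05))
```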

5. Summary of the analysis of variance

Source of Variation      SS        df      MS        F computed
Total                  2437.572    29
Treatment              2193.442     4    548.360      56.2
Error                   244.130    25      9.765

6. F critical = F0.05(1),4,25 = 2.76

7. Conclusion
Since F computed > F critical, we reject H0.
Because a significant F resulted from the analysis of variance, the Tukey test is now performed to determine which means differ.
