
Analysis of Variance

Jigyasu Gaur

General ANOVA Setting

- Investigator controls one or more independent variables
  - Called factors (or treatment variables)
  - Each factor contains two or more levels (or groups or categories/classifications)
- Observe effects on the dependent variable
  - Response to the levels of the independent variable
- Experimental design: the plan used to collect the data

One-Way Analysis of Variance

- Evaluate the difference among the means of three or more groups
  - Example: performance rates for the 1st, 2nd, and 3rd shifts of employees in a factory
- Assumptions
  - Populations are normally distributed (or the CLT applies)
  - Populations have equal variances
  - Samples are randomly and independently drawn

Hypotheses of One-Way ANOVA

H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_c

- All population means are equal
  - i.e., no treatment effect (no variation in means among groups)

H_1: Not all of the population means are the same

- At least one population mean is different
  - i.e., there is a treatment effect
  - Does not mean that all population means are different (some pairs may be the same)

Why ANOVA?
We could compare the means one by one using t-tests for the difference of two means.

Problem: each test carries its own type I error. Treating the comparisons as independent, the total (family-wise) type I error across k comparisons is 1 - (1 - \alpha)^k, where k is the number of comparisons.

For example, if there are 5 means and you use \alpha = .05, you must make 10 pairwise comparisons. The family-wise type I error is then 1 - (.95)^{10}, which is about .40. That is, roughly 40% of the time you will reject the null hypothesis of equal means in favor of the alternative even when it is true!
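As a quick check of that arithmetic, the family-wise error rate can be computed directly; the following is a minimal Python sketch using the numbers from the example above:

```python
from math import comb

alpha = 0.05
n_means = 5
k = comb(n_means, 2)                    # 10 pairwise comparisons among 5 means

# Family-wise type I error, treating the k tests as independent
family_wise_error = 1 - (1 - alpha) ** k
print(k, round(family_wise_error, 3))   # 10 0.401
```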

One-Way ANOVA

H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_c
H_1: Not all \mu_j are the same

All means are the same: the null hypothesis is true (no treatment effect)

[Figure: three identical population distributions centered at the same mean, \mu_1 = \mu_2 = \mu_3]

One-Way ANOVA
(continued)

H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_c
H_1: Not all \mu_j are the same

At least one mean is different: the null hypothesis is NOT true (a treatment effect is present)

[Figure: two cases -- one distribution shifted away from the other two (\mu_1 = \mu_2 \ne \mu_3), or all three distributions shifted (\mu_1 \ne \mu_2 \ne \mu_3)]

Partitioning the Variation

Total variation can be split into two parts:

SST = SSA + SSW


SST = Total Sum of Squares (total variation)
SSA = Sum of Squares Among Groups (between-group variation)
SSW = Sum of Squares Within Groups (within-group variation)

Partitioning the Variation


(continued)

SST = SSA + SSW


- Total variation = the aggregate dispersion of the individual data values across the various factor levels (SST)
- Between-group variation = dispersion between the factor sample means (SSA)
- Within-group variation = dispersion that exists among the data values within a particular factor level (SSW)

Partition of Total Variation

Total Variation (SST) is split into:

- Variation due to the factor (SSA), commonly referred to as:
  - Sum of Squares Between
  - Sum of Squares Among
  - Sum of Squares Explained
  - Among-Groups Variation

- Variation due to random sampling (SSW), commonly referred to as:
  - Sum of Squares Within
  - Sum of Squares Error
  - Sum of Squares Unexplained
  - Within-Groups Variation

Total Sum of Squares


SST = SSA + SSW

SST = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X})^2

Where:
SST = total sum of squares
c = number of groups (levels or treatments)
n_j = number of observations in group j
X_{ij} = ith observation from group j
\bar{X} = grand mean (mean of all data values)
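To make the double sum concrete, here is a minimal Python sketch; the `groups` arrays are made-up illustration data, not taken from these slides:

```python
import numpy as np

# Hypothetical data: one array of observations per group (factor level)
groups = [np.array([23.0, 25.0, 21.0]),
          np.array([30.0, 28.0, 29.0, 31.0]),
          np.array([26.0, 24.0])]

grand_mean = np.concatenate(groups).mean()                 # mean of all data values
SST = sum(((x - grand_mean) ** 2).sum() for x in groups)   # total sum of squares
print(round(SST, 2))
```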

Total Variation
(continued)

SST = (X_{11} - \bar{X})^2 + (X_{12} - \bar{X})^2 + \cdots + (X_{cn_c} - \bar{X})^2

[Figure: response values X for Groups 1-3 scattered around the grand mean \bar{X}]

Among-Group Variation
SST = SSA + SSW

SSA = \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{X})^2

Where:
SSA = sum of squares among groups
c = number of groups or populations
n_j = sample size from group j
\bar{X}_j = sample mean from group j
\bar{X} = grand mean (mean of all data values)

Among-Group Variation
(continued)
SSA = \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{X})^2

Variation due to differences among groups.

MSA = SSA / (c - 1)

Mean Square Among = SSA / degrees of freedom
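Continuing the sketch from the SST slide (same made-up `groups` data, purely illustrative), SSA and MSA can be computed as:

```python
import numpy as np

groups = [np.array([23.0, 25.0, 21.0]),
          np.array([30.0, 28.0, 29.0, 31.0]),
          np.array([26.0, 24.0])]

c = len(groups)
grand_mean = np.concatenate(groups).mean()

# SSA: weight each group's squared deviation from the grand mean by its size n_j
SSA = sum(len(x) * (x.mean() - grand_mean) ** 2 for x in groups)
MSA = SSA / (c - 1)                      # mean square among groups
print(round(SSA, 2), round(MSA, 2))
```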

Among-Group Variation
(continued)

SSA = n_1 (\bar{X}_1 - \bar{X})^2 + n_2 (\bar{X}_2 - \bar{X})^2 + \cdots + n_c (\bar{X}_c - \bar{X})^2

[Figure: group means \bar{X}_1, \bar{X}_2, \bar{X}_3 for Groups 1-3 plotted against the grand mean \bar{X}]

Within-Group Variation
SST = SSA + SSW

SSW = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2

Where:
SSW = sum of squares within groups
c = number of groups
n_j = sample size from group j
\bar{X}_j = sample mean from group j
X_{ij} = ith observation in group j

Within-Group Variation
(continued)
SSW = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2

Summing the variation within each group and then adding over all groups.

MSW = SSW / (n - c)

Mean Square Within = SSW / degrees of freedom
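And the within-group piece, again with the same illustrative `groups` data (hypothetical values, not from the slides):

```python
import numpy as np

groups = [np.array([23.0, 25.0, 21.0]),
          np.array([30.0, 28.0, 29.0, 31.0]),
          np.array([26.0, 24.0])]

n = sum(len(x) for x in groups)
c = len(groups)

# SSW: squared deviations of each observation from its own group mean
SSW = sum(((x - x.mean()) ** 2).sum() for x in groups)
MSW = SSW / (n - c)                      # mean square within groups
print(round(SSW, 2), round(MSW, 2))
```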

Within-Group Variation
(continued)

SSW = (X_{11} - \bar{X}_1)^2 + (X_{12} - \bar{X}_2)^2 + \cdots + (X_{cn_c} - \bar{X}_c)^2

[Figure: response values X within Groups 1-3 scattered around their own group means \bar{X}_1, \bar{X}_2, \bar{X}_3]

Obtaining the Mean Squares


MSA = SSA / (c - 1)

MSW = SSW / (n - c)

MST = SST / (n - 1)

One-Way ANOVA Table


Source of Variation   SS                df      MS (Variance)          F ratio
Among Groups          SSA               c - 1   MSA = SSA / (c - 1)    F = MSA / MSW
Within Groups         SSW               n - c   MSW = SSW / (n - c)
Total                 SST = SSA + SSW   n - 1

c = number of groups
n = sum of the sample sizes from all groups
df = degrees of freedom
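A small sketch of how the table's df, MS, and F columns are assembled once SSA and SSW are known (the SSA and SSW values below are arbitrary placeholders):

```python
# Placeholder inputs; in practice SSA and SSW come from the formulas above
SSA, SSW = 120.0, 80.0
n, c = 15, 3                      # total sample size and number of groups

SST = SSA + SSW                   # total variation
MSA = SSA / (c - 1)               # among-groups mean square, df1 = c - 1
MSW = SSW / (n - c)               # within-groups mean square, df2 = n - c
F = MSA / MSW                     # F ratio

print(f"Among   SS={SSA:7.1f}  df={c - 1:2d}  MS={MSA:7.2f}  F={F:.3f}")
print(f"Within  SS={SSW:7.1f}  df={n - c:2d}  MS={MSW:7.2f}")
print(f"Total   SS={SST:7.1f}  df={n - 1:2d}")
```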

One-Factor ANOVA
F Test Statistic
H_0: \mu_1 = \mu_2 = \cdots = \mu_c
H_1: At least two population means are different

Test statistic:

F = MSA / MSW

- MSA is the mean square among groups (the among-groups estimate of variance)
- MSW is the mean square within groups (the within-groups estimate of variance)

Degrees of freedom:

- df1 = c - 1 (c = number of groups)
- df2 = n - c (n = sum of sample sizes from all populations)

Interpreting the One-Factor ANOVA F Statistic

The F statistic is the ratio of the among-groups estimate of variance to the within-groups estimate of variance:

- The ratio must always be positive
- df1 = c - 1 will typically be small
- df2 = n - c will typically be large

Decision rule: reject H_0 if F > F_U; otherwise do not reject H_0.

[Figure: F distribution with \alpha = .05; "do not reject H_0" to the left of the critical value F_U, "reject H_0" to the right]
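The critical value F_U comes from the upper tail of the F distribution. A minimal sketch, assuming scipy is available (the observed F here is a made-up value for illustration):

```python
from scipy.stats import f

alpha = 0.05
df1, df2 = 2, 12                   # c - 1 and n - c, e.g. for 3 groups of 5 observations

F_U = f.ppf(1 - alpha, df1, df2)   # upper critical value of the F distribution
F_stat = 4.50                      # hypothetical observed F statistic

print(round(F_U, 2))               # about 3.89
print("Reject H0" if F_stat > F_U else "Do not reject H0")
```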

One-Factor ANOVA
F Test Example
You want to see if three different golf clubs yield different distances. You randomly select five measurements from trials on an automated driving machine for each club. At the .05 significance level, is there a difference in mean distance?

Club 1: 254, 263, 241, 237, 251
Club 2: 234, 218, 235, 227, 216
Club 3: 200, 222, 197, 206, 204

One-Factor ANOVA Example: Scatter Diagram

Club 1: 254, 263, 241, 237, 251
Club 2: 234, 218, 235, 227, 216
Club 3: 200, 222, 197, 206, 204

\bar{X}_1 = 249.2, \bar{X}_2 = 226.0, \bar{X}_3 = 205.8, grand mean \bar{X} = 227.0

[Scatter diagram: distance (190-270) on the vertical axis versus club (1-3) on the horizontal axis, with each club's mean and the grand mean marked]

One-Factor ANOVA Example: Computations

Club 1: 254, 263, 241, 237, 251
Club 2: 234, 218, 235, 227, 216
Club 3: 200, 222, 197, 206, 204

\bar{X}_1 = 249.2, n_1 = 5
\bar{X}_2 = 226.0, n_2 = 5
\bar{X}_3 = 205.8, n_3 = 5
\bar{X} = 227.0, n = 15, c = 3

SSA = 5(249.2 - 227)^2 + 5(226 - 227)^2 + 5(205.8 - 227)^2 = 4716.4
SSW = (254 - 249.2)^2 + (263 - 249.2)^2 + ... + (204 - 205.8)^2 = 1119.6

MSA = 4716.4 / (3 - 1) = 2358.2
MSW = 1119.6 / (15 - 3) = 93.3

F = 2358.2 / 93.3 = 25.275
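The hand computations above can be cross-checked in Python; a minimal sketch, assuming scipy is available, using the club data from the slide:

```python
from scipy.stats import f_oneway

club1 = [254, 263, 241, 237, 251]
club2 = [234, 218, 235, 227, 216]
club3 = [200, 222, 197, 206, 204]

# One-way ANOVA across the three clubs
F_stat, p_value = f_oneway(club1, club2, club3)
print(round(F_stat, 3), p_value)   # about 25.275 and 4.99e-05
```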

One-Factor ANOVA Example: Solution

H_0: \mu_1 = \mu_2 = \mu_3
H_1: \mu_j not all equal
\alpha = .05
df1 = 2, df2 = 12

Critical value: F_U = 3.89

Test statistic:

F = MSA / MSW = 2358.2 / 93.3 = 25.275

Decision: reject H_0 at \alpha = 0.05, since F = 25.275 > F_U = 3.89.

Conclusion: there is evidence that at least one \mu_j differs from the rest.

[Figure: F distribution with the rejection region to the right of F_U = 3.89; F = 25.275 falls well inside the rejection region]

ANOVA -- Single Factor: Excel Output

EXCEL: Tools | Data Analysis | ANOVA: Single Factor

SUMMARY
Groups   Count   Sum    Average   Variance
Club 1   5       1246   249.2     108.2
Club 2   5       1130   226.0     77.5
Club 3   5       1029   205.8     94.2

ANOVA
Source of Variation   SS       df   MS       F        P-value    F crit
Between Groups        4716.4   2    2358.2   25.275   4.99E-05   3.89
Within Groups         1119.6   12   93.3
Total                 5836.0   14

What happens if there is more than one explanation for changes in the dependent variable?

If two or more independent variables each have independent effects, you can get a good answer by running a separate one-way ANOVA for each of them. This works only when the independent variables are not related to each other (not correlated) and when there is no interaction between them in influencing the dependent variable.

Thank You
