EXAMPLE 1.
Shaw and Rauscher believe that there is a link between some cognitive abilities and
music. This is the research hypothesis. To investigate this hypothesis they
decided to use three treatments. These were (a) listening to a Mozart sonata which
finished 10 minutes before the task was performed; (b) listening to minimalist music
which finished 10 minutes before the task was performed; (c) sitting in silence for a period
which finished 10 minutes before the task was performed. The experimental units
were college students, and they all performed the same cutting and folding task with
a piece of paper. [Reported in Neurological Research, 1996]
EXAMPLE 2.
An experiment was conducted to compare how different tasks affect a worker’s pulse
rate. In this experiment there were 60 male workers who were randomly assigned to
the 6 tasks such that each task had 10 workers assigned to it. Each group was trained
to perform their assigned task. On one day after training each group performed their
assigned task for one hour and then the pulse rates of the workers in each group
were measured. Is there a significant difference in the pulse rates for the 6 tasks?
The results from this experiment (now sorted to appear by task) appear below.
[Figure: Scatterplot of Pulse and Task, showing pchange (vertical axis, 20 to 50) against task (horizontal axis, 1 to 6).]
Generated by:
libname lect '/courses/da9372e5ba27fe300/35356';
Analysis of Variance
We represent the response on the jth experimental unit receiving the ith treatment
by $y_{ij}$. In general we will assume that there are $n$ observations made on each of the
$a$ treatments. We can also define similar models for unequal replication.
Since the goal of the experiment is to decide whether the response to each
of the treatments is the same, we need to model the response. Two models are
commonly used. In the means model we write
$$y_{ij} = \mu_i + e_{ij},$$
where $\mu_i$ is the mean response to the ith treatment.
In the effects model we assume that the mean response $\mu_i$ can be written as an
overall mean $\mu$ and a term, $\tau_i$, called the ith treatment effect. Thus we have
$$y_{ij} = \mu + \tau_i + e_{ij},$$
with $i = 1, \dots, a$ and $j = 1, \dots, n$ in both models.
For either of these models, we say that this is the linear statistical model for a
one-way treatment classification in a completely randomised design for comparing
$a$ treatments. We allow for some variation from the population mean for each
treatment group by including the $e_{ij}$. These random variables have mean 0 and
variance $\sigma^2$, which is constant for all the treatments. For hypothesis testing we
assume further that the error terms are from a normal distribution.
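The $a$ treatments that we have been discussing could represent the set of all possible
treatments that we are interested in, or they could represent a random sample
from a much larger set of treatments. In the first case we are interested in the mean
response to each treatment. In the second case we would like to be able to make
comments about all of the possible treatments, whether or not they were actually
included in the experiment. In this case we think of the $\tau_i$ as random variables and
we are interested in using the experiment to decide if there is substantial variability
between the treatments. The first model is referred to as a fixed effects model and the
second as a random effects model.
Writing $\bar{y}_{i.}$ for the mean of the observations on the ith treatment and $\bar{y}_{..}$ for the
mean of all the observations, each deviation from the overall mean can be split as
$$y_{ij} - \bar{y}_{..} = y_{ij} - \bar{y}_{i.} + \bar{y}_{i.} - \bar{y}_{..} = (y_{ij} - \bar{y}_{i.}) + (\bar{y}_{i.} - \bar{y}_{..}).$$
Squaring both sides and summing over $i$ and $j$, the cross-product term vanishes and we obtain
$$\sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{..})^2 = \sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i.})^2 + n\sum_{i=1}^{a}(\bar{y}_{i.} - \bar{y}_{..})^2,$$
which is the fundamental identity of the analysis of variance. This identity is often
written as: total sum of squares = error sum of squares + treatment sum of squares.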
The ANOVA Procedure
Dependent Variable: pchange

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              5       624.733333    124.946667      4.25   0.0025
Error             54      1586.000000     29.370370
Corrected Total   59      2210.733333

R-Square   Coeff Var   Root MSE   pchange Mean
0.282591    16.53950   5.419444       32.76667

Source   DF      Anova SS   Mean Square   F Value   Pr > F
task      5   624.7333333   124.9466667      4.25   0.0025

Level of          pchange
task      N         Mean      Std Dev
1        10   32.7000000   5.12184862
2        10   31.9000000   5.82046199
3        10   35.8000000   5.30827446
4        10   37.9000000   6.52261025
5        10   29.2000000   4.61398839
6        10   29.1000000   4.90917508
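As a cross-check on the table above, the sums of squares can be rebuilt from the group means and standard deviations alone. The following is a minimal Python sketch, not part of the original SAS analysis; the function name `anova_from_summary` is an illustrative choice, and the code assumes equal replication with n observations per group.

```python
from math import fsum

def anova_from_summary(means, sds, n):
    """One-way ANOVA quantities from group means and SDs (equal group size n)."""
    a = len(means)
    grand_mean = fsum(means) / a  # valid because every group has the same size
    ss_treat = n * fsum((m - grand_mean) ** 2 for m in means)
    ss_error = (n - 1) * fsum(s ** 2 for s in sds)  # pooled within-group variation
    df_treat, df_error = a - 1, a * (n - 1)
    ms_treat, ms_error = ss_treat / df_treat, ss_error / df_error
    return {
        "ss_treat": ss_treat, "df_treat": df_treat, "ms_treat": ms_treat,
        "ss_error": ss_error, "df_error": df_error, "ms_error": ms_error,
        "f_value": ms_treat / ms_error,
    }

# Group summaries for the six tasks, copied from the SAS output above.
task_means = [32.7, 31.9, 35.8, 37.9, 29.2, 29.1]
task_sds = [5.12184862, 5.82046199, 5.30827446,
            6.52261025, 4.61398839, 4.90917508]

table = anova_from_summary(task_means, task_sds, n=10)
print(f"SS(task)  = {table['ss_treat']:.4f} on {table['df_treat']} df")
print(f"SS(error) = {table['ss_error']:.4f} on {table['df_error']} df")
print(f"F = {table['f_value']:.2f}")
```

The p-value is left out because the F distribution is not available in the Python standard library; the sums of squares and the F ratio can be compared directly with the Model and Error rows of the SAS table.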
Estimating Model Parameters
We will estimate the model parameters using least squares. To do this we calculate
$$\sum_{i=1}^{a}\sum_{j=1}^{b}(y_{ij} - \mu - \tau_i)^2 = \sum_{i=1}^{a}\sum_{j=1}^{b} e_{ij}^2 = S,$$
and choose values for µ and τi that minimise this sum of squares. Thus we must
differentiate S with respect to each of the parameters in turn, set the resulting
equations to 0 and solve to find the parameter estimates.
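Carried out explicitly, the differentiation can be sketched as follows. This is a reconstruction in the notation of these notes, under the assumption that $y_{..}$ denotes the total of all observations and $y_{i.}$ the total for the ith treatment.

```latex
\begin{align*}
\frac{\partial S}{\partial \mu}
  &= -2\sum_{i=1}^{a}\sum_{j=1}^{b}\left(y_{ij} - \mu - \tau_i\right) = 0
  &&\Longrightarrow\quad ab\,\hat{\mu} + b\sum_{i=1}^{a}\hat{\tau}_i = y_{..}, \\
\frac{\partial S}{\partial \tau_i}
  &= -2\sum_{j=1}^{b}\left(y_{ij} - \mu - \tau_i\right) = 0
  &&\Longrightarrow\quad b\,\hat{\mu} + b\,\hat{\tau}_i = y_{i.},
  \qquad i = 1, \dots, a.
\end{align*}
```

Adding the $a$ equations for the $\tau_i$ reproduces the first equation, so the $a + 1$ equations are not linearly independent and a constraint is needed to obtain a unique solution.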
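Imposing the constraint $\sum_{i=1}^{a}\hat{\tau}_i = 0$, the normal equations become
$$ab\,\hat{\mu} = y_{..},$$
$$b\,\hat{\mu} + b\,\hat{\tau}_i = y_{i.}, \quad i = 1, \dots, a,$$
where $y_{..}$ is the total of all the observations and $y_{i.}$ is the total for the ith treatment. Solving these gives
$$\hat{\mu} = \bar{y}_{..},$$
$$\hat{\tau}_i = \bar{y}_{i.} - \bar{y}_{..},$$
where $i = 1, \dots, a$ indexes the treatments and $j = 1, \dots, b$ indexes the observations within each treatment.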
Other constraints could give different estimates for the parameter values in the model
but the estimates for the estimable functions are independent of the constraints
chosen.
(Each treatment level has ten observations, shown in two columns.)

     0g             5g            10g            15g
 6.7   7.8     9.9  11.9     10.4   9.1     9.3  10.2
 7.8   8.6     8.4   7.1      8.1   8.8     9.3   8.7
 5.5   7.4    10.4   6.4     10.6   8.1     7.2   8.6
 8.4   5.8     9.3   8.6      8.7   7.8     7.8   9.3
 7.0   7.0    10.7  10.6     10.7   8.0     9.3   7.2
The GLM Procedure
Dependent Variable: HEMO

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3      26.80275000    8.93425000      5.70   0.0027
Error             36      56.47100000    1.56863889
Corrected Total   39      83.27375000

R-Square   Coeff Var   Root MSE   HEMO Mean
0.321863    14.62719   1.252453    8.562500

Source   DF     Type I SS   Mean Square   F Value   Pr > F
SULFA     3   26.80275000    8.93425000      5.70   0.0027

Source   DF   Type III SS   Mean Square   F Value   Pr > F
SULFA     3   26.80275000    8.93425000      5.70   0.0027

[Figure: Distribution of HEMO by SULFA level (1 to 4), annotated with F = 5.70 and Prob > F = 0.0027.]
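The least squares estimates and the fundamental identity can be checked numerically for these data. The following is a minimal Python sketch rather than part of the SAS analysis; it assumes that each pair of printed columns in the data table belongs to one SULFA level, and the variable names are illustrative.

```python
from math import fsum, isclose

# HEMO responses by SULFA level, read from the data table
# (assumption: two printed columns per level, 10 observations each).
hemo = {
    "0g":  [6.7, 7.8, 5.5, 8.4, 7.0, 7.8, 8.6, 7.4, 5.8, 7.0],
    "5g":  [9.9, 8.4, 10.4, 9.3, 10.7, 11.9, 7.1, 6.4, 8.6, 10.6],
    "10g": [10.4, 8.1, 10.6, 8.7, 10.7, 9.1, 8.8, 8.1, 7.8, 8.0],
    "15g": [9.3, 9.3, 7.2, 7.8, 9.3, 10.2, 8.7, 8.6, 9.3, 7.2],
}

all_obs = [y for ys in hemo.values() for y in ys]
mu_hat = fsum(all_obs) / len(all_obs)                        # overall mean
group_means = {g: fsum(ys) / len(ys) for g, ys in hemo.items()}
tau_hat = {g: m - mu_hat for g, m in group_means.items()}    # treatment effects

# The three sums of squares in the fundamental identity.
ss_total = fsum((y - mu_hat) ** 2 for y in all_obs)
ss_treat = fsum(len(ys) * (group_means[g] - mu_hat) ** 2 for g, ys in hemo.items())
ss_error = fsum((y - group_means[g]) ** 2 for g, ys in hemo.items() for y in ys)

df_treat = len(hemo) - 1
df_error = len(all_obs) - len(hemo)
f_value = (ss_treat / df_treat) / (ss_error / df_error)

print(f"mu_hat = {mu_hat:.4f}")
for g, t in tau_hat.items():
    print(f"tau_hat[{g}] = {t:+.4f}")
print(f"SS total = {ss_total:.5f}, SS treatment = {ss_treat:.5f}, SS error = {ss_error:.5f}")
print("identity holds:", isclose(ss_total, ss_treat + ss_error))
print(f"F = {f_value:.2f} on {df_treat} and {df_error} df")
```

The estimated effects sum to zero, as the constraint requires, and the sums of squares can be compared with the Model, Error and Corrected Total rows of the GLM output.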
Power Analysis
The POWER Procedure
Overall F Test for One-Way ANOVA

Fixed Scenario Elements
Method                   Exact
Group Means              32.7 31.9 35.8 37.9 29.2 29.1
Standard Deviation       5.419444
Sample Size Per Group    10
Alpha                    0.05

Computed Power
Power   0.943
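The scenario above enters the power calculation through the noncentrality parameter of the F distribution, $\lambda = n\sum_i(\mu_i - \bar{\mu})^2/\sigma^2$. The following Python sketch computes $\lambda$ and the corresponding effect size (Cohen's f) for this scenario. It is an illustration only: the function name is hypothetical, and the power itself is not reproduced because the noncentral F distribution is not available in the Python standard library.

```python
from math import fsum, sqrt

def noncentrality(group_means, sd, n_per_group):
    """Noncentrality parameter of the overall F test (equal group sizes)."""
    mean_of_means = fsum(group_means) / len(group_means)
    between = fsum((m - mean_of_means) ** 2 for m in group_means)
    return n_per_group * between / sd ** 2

group_means = [32.7, 31.9, 35.8, 37.9, 29.2, 29.1]  # from the scenario above
sd = 5.419444                                       # root MSE from the ANOVA
n = 10

lam = noncentrality(group_means, sd, n)
df_treat = len(group_means) - 1
df_error = len(group_means) * (n - 1)
effect_size_f = sqrt(lam / (n * len(group_means)))  # Cohen's f, since lambda = N * f^2

print(f"lambda = {lam:.3f}, df = ({df_treat}, {df_error}), f = {effect_size_f:.3f}")
```

Here $\lambda$ is about 21.27, which is the treatment sum of squares from the ANOVA table divided by the error mean square.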
Checks of the model assumptions are usually based on the estimated residuals. Recall that $\hat{e}_{ij} = y_{ij} - \hat{\mu}_i$, for
$i = 1, 2, \dots, a$ and $j = 1, 2, \dots, r$, are the estimated residuals.
To check the assumption of normality we plot the residuals against the expected
values of the residuals if they do indeed come from a normal distribution. This gives a
visual comparison; formal tests based on the difference between the estimated
residuals and their expected values have also been developed (see below). To check the
assumption of constant variance we plot the estimated residuals against the fitted
values for each treatment group. All treatment groups should have residuals that
are approximately equally spread out.
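In SAS, we use the GLM procedure and in R, we use the commands
# fit the one-way ANOVA model
model <- aov(response ~ tmt, data=dataset)
# standardised residuals
std_resid <- rstandard(model)
# diagnostic plots (residuals against fitted values, normal Q-Q plot, ...)
plot(model)
where dataset is the name of the data set containing the variables response and tmt (tmt must be a factor).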
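The quantities behind these plots can also be computed by hand. The following Python sketch does so for the HEMO data from the earlier example. It is an illustration, not the method used in these notes: it assumes the plotting positions $(k - 0.5)/N$ for the normal scores (one common choice), and it uses the fact that the leverage of an observation in a one-way layout is $1/n_i$.

```python
from math import fsum, sqrt
from statistics import NormalDist

# HEMO responses by SULFA level (same data as in the earlier example).
groups = {
    "0g":  [6.7, 7.8, 5.5, 8.4, 7.0, 7.8, 8.6, 7.4, 5.8, 7.0],
    "5g":  [9.9, 8.4, 10.4, 9.3, 10.7, 11.9, 7.1, 6.4, 8.6, 10.6],
    "10g": [10.4, 8.1, 10.6, 8.7, 10.7, 9.1, 8.8, 8.1, 7.8, 8.0],
    "15g": [9.3, 9.3, 7.2, 7.8, 9.3, 10.2, 8.7, 8.6, 9.3, 7.2],
}

n_total = sum(len(ys) for ys in groups.values())
means = {g: fsum(ys) / len(ys) for g, ys in groups.items()}

# Estimated residuals: observation minus its fitted value (the group mean).
residuals = [(g, y - means[g]) for g, ys in groups.items() for y in ys]
mse = fsum(e ** 2 for _, e in residuals) / (n_total - len(groups))

# Standardised residuals: e_ij / sqrt(MSE * (1 - 1/n_i)).
std_resid = [e / sqrt(mse * (1 - 1 / len(groups[g]))) for g, e in residuals]

# Normal scores for a probability plot (assumed plotting positions (k - 0.5)/N).
normal_scores = [NormalDist().inv_cdf((k - 0.5) / n_total)
                 for k in range(1, n_total + 1)]
qq_pairs = list(zip(normal_scores, sorted(std_resid)))

# Spread of the residuals within each group, for the constant-variance check.
for g, ys in groups.items():
    sd = sqrt(fsum((y - means[g]) ** 2 for y in ys) / (len(ys) - 1))
    print(f"{g}: fitted value = {means[g]:.2f}, residual SD = {sd:.3f}")
print("largest standardised residual:", round(max(std_resid), 3))
```

Plotting the pairs in `qq_pairs` gives the normal probability plot; points close to a straight line support the normality assumption.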
EXAMPLE 6. There are various methods available to estimate the peak discharge
from a watershed. Four of these methods were compared (Montgomery [2005]).
Since there were four methods, we have a = 4. Each of the methods was used six
times on the watershed and the data are the discharge volumes in cubic feet per
second. In this experiment each treatment has r = 6 replications.
[Figure: Scatterplot of Watershed Data, showing volume (vertical axis, 0 to 18) against method (horizontal axis, 1 to 4).]
To obtain the ANOVA output as well as residual plots, we run the following code
in SAS,
proc glm data=lect.watershed plots=diagnostics;
class method;
model volume=method;
means method;
run;
The GLM Procedure
Dependent Variable: volume

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3      708.3471125   236.1157042     76.07   <.0001
Error             20       62.0810833     3.1040542
Corrected Total   23      770.4281958

R-Square   Coeff Var   Root MSE   volume Mean
0.919420    27.12424   1.761833      6.495417

Source   DF     Type I SS   Mean Square   F Value   Pr > F
method    3   708.3471125   236.1157042     76.07   <.0001

Source   DF   Type III SS   Mean Square   F Value   Pr > F
method    3   708.3471125   236.1157042     76.07   <.0001
EXAMPLE 7. (from Dean and Voss [1999]) Larry was interested in determining
whether the amount of time he had to wait at a particular pedestrian crossing
depended on the number of times he pushed the button. In this case the response
is the waiting time and the treatment is the number of pushes.
Number of Pushes
    0       1       2       3
38.14   38.28   38.17   38.14
38.20   37.17   38.13   38.30
38.31   38.08   38.16   38.21
38.14   38.25   38.30   38.04
38.29   38.18   38.34   38.37
38.17   38.03   38.34
38.20   37.95   38.17
        38.26   38.18
        38.30   38.09
        38.21   38.06
[Figure: Scatterplot of Pedestrian Data, showing time (vertical axis, 37.1 to 38.4) against presses (horizontal axis, 0 to 3).]
The GLM Procedure
Dependent Variable: time

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3       0.11979714    0.03993238      0.93   0.4413
Error             28       1.20795286    0.04314117
Corrected Total   31       1.32775000

R-Square   Coeff Var   Root MSE   time Mean
0.090226    0.544281   0.207705    38.16125

Source    DF    Type I SS   Mean Square   F Value   Pr > F
presses    3   0.11979714    0.03993238      0.93   0.4413

Source    DF   Type III SS   Mean Square   F Value   Pr > F
presses    3    0.11979714    0.03993238      0.93   0.4413
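Because the four groups in this example have different sizes, the treatment sum of squares weights each group by its own $n_i$. The following Python sketch illustrates the calculation for unequal replication. It is not part of the original analysis: the function name `one_way_anova` is an illustrative choice, and the assignment of the shorter rows of the data table to columns is an assumption based on the group sizes.

```python
from math import fsum

# Waiting times by number of pushes (unequal group sizes: 7, 10, 10, 5).
times = {
    0: [38.14, 38.20, 38.31, 38.14, 38.29, 38.17, 38.20],
    1: [38.28, 37.17, 38.08, 38.25, 38.18, 38.03, 37.95, 38.26, 38.30, 38.21],
    2: [38.17, 38.13, 38.16, 38.30, 38.34, 38.34, 38.17, 38.18, 38.09, 38.06],
    3: [38.14, 38.30, 38.21, 38.04, 38.37],
}

def one_way_anova(groups):
    """Sums of squares and F ratio for a one-way layout with group sizes n_i."""
    all_obs = [y for ys in groups.values() for y in ys]
    grand_mean = fsum(all_obs) / len(all_obs)
    means = {g: fsum(ys) / len(ys) for g, ys in groups.items()}
    # Each group's squared deviation is weighted by its own size n_i.
    ss_treat = fsum(len(ys) * (means[g] - grand_mean) ** 2
                    for g, ys in groups.items())
    ss_error = fsum((y - means[g]) ** 2 for g, ys in groups.items() for y in ys)
    df_treat = len(groups) - 1
    df_error = len(all_obs) - len(groups)
    f_value = (ss_treat / df_treat) / (ss_error / df_error)
    return ss_treat, ss_error, df_treat, df_error, f_value

ss_treat, ss_error, df_treat, df_error, f_value = one_way_anova(times)
print(f"SS(presses) = {ss_treat:.6f} on {df_treat} df")
print(f"SS(error)   = {ss_error:.6f} on {df_error} df")
print(f"F = {f_value:.2f}")
```

The results can be compared with the Model and Error rows of the GLM output for time.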
To test the assumption of constant variance in SAS, we can use the hovtest option in the means statement of PROC GLM. A corresponding test can also be carried out in R. For the watershed data the SAS output is:

Levene's Test for Homogeneity of volume Variance
ANOVA of Squared Deviations from Group Means
Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
method    3            135.7       45.2417      6.75   0.0025
Error    20            134.1        6.7050
The relevant table is:

Tests for Normality
Test                  Statistic           p Value
Shapiro-Wilk          W      0.957014     Pr < W       0.3814
Kolmogorov-Smirnov    D      0.128915     Pr > D      >0.1500
Cramer-von Mises      W-Sq   0.053671     Pr > W-Sq   >0.2500
The additional table in our SAS output is then:

The GLM Procedure
Levene's Test for Homogeneity of time Variance
ANOVA of Squared Deviations from Group Means
Source    DF   Sum of Squares   Mean Square   F Value   Pr > F
presses    3           0.0609        0.0203      1.01   0.4033
Error     28           0.5635        0.0201
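As the table title indicates, this form of Levene's test is an ordinary one-way ANOVA applied to the squared deviations of each observation from its group mean. The following Python sketch illustrates the calculation for the pedestrian data. It is not part of the SAS output: the function name `levene_squared` is hypothetical, and the p-value is omitted because the F distribution is not in the Python standard library.

```python
from math import fsum

# Waiting times by number of pushes (same data as in Example 7).
times = {
    0: [38.14, 38.20, 38.31, 38.14, 38.29, 38.17, 38.20],
    1: [38.28, 37.17, 38.08, 38.25, 38.18, 38.03, 37.95, 38.26, 38.30, 38.21],
    2: [38.17, 38.13, 38.16, 38.30, 38.34, 38.34, 38.17, 38.18, 38.09, 38.06],
    3: [38.14, 38.30, 38.21, 38.04, 38.37],
}

def levene_squared(groups):
    """Levene's test: one-way ANOVA F on squared deviations from group means."""
    # Step 1: replace each observation by z_ij = (y_ij - ybar_i)^2.
    z = {}
    for g, ys in groups.items():
        m = fsum(ys) / len(ys)
        z[g] = [(y - m) ** 2 for y in ys]
    # Step 2: ordinary one-way ANOVA on the z_ij.
    all_z = [v for zs in z.values() for v in zs]
    grand = fsum(all_z) / len(all_z)
    z_means = {g: fsum(zs) / len(zs) for g, zs in z.items()}
    ss_between = fsum(len(zs) * (z_means[g] - grand) ** 2 for g, zs in z.items())
    ss_within = fsum((v - z_means[g]) ** 2 for g, zs in z.items() for v in zs)
    df_between = len(z) - 1
    df_within = len(all_z) - len(z)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_value

ss_between, ss_within, f_value = levene_squared(times)
print(f"SS(presses) = {ss_between:.4f}, SS(error) = {ss_within:.4f}, F = {f_value:.2f}")
```

A small F ratio, as here, gives no evidence against the assumption of constant variance.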
To obtain the normality test, we use PROC UNIVARIATE
proc univariate data=lect.pedestrian2 normaltest;
var resi;
run;
We obtain the following table (amongst many others)

Tests for Normality
Test                  Statistic           p Value
Shapiro-Wilk          W      0.702535     Pr < W      <0.0001
Kolmogorov-Smirnov    D      0.201401     Pr > D      <0.0100
Cramer-von Mises      W-Sq   0.284607     Pr > W-Sq   <0.0050
The GLM Procedure
Levene's Test for Homogeneity of HEMO Variance
ANOVA of Squared Deviations from Group Means
Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
SULFA     3          20.9101        6.9700      2.35   0.0891
Error    36            107.0        2.9713

Tests for Location: Mu0=0
Test           Statistic     p Value
Student's t    t    0        Pr > |t|     1.0000
Sign           M    0        Pr >= |M|    1.0000
Signed Rank    S    5.5      Pr >= |S|    0.9422

Tests for Normality
Test                  Statistic           p Value
Shapiro-Wilk          W      0.982657     Pr < W       0.7866
Kolmogorov-Smirnov    D      0.10714      Pr > D      >0.1500
Cramer-von Mises      W-Sq   0.046727     Pr > W-Sq   >0.2500
Quantiles (Definition 5)
Level         Quantile
100% Max         2.570
99%              2.570
95%              1.620
90%              1.455
75% Q3           0.840
50% Median      -0.010
25% Q1          -0.930

Further reading
These notes are only intended to provide an overview of the material. They are
supplemented by the discussion that takes place in the classroom, both in lectures
and in labs. The exercise sheets are an integral part of the subject and students
should attempt all of the questions.
Students who would like further reading about this topic have many options as this
is a standard topic that is covered in any book on designed experiments. Kuehl
[2000] and Montgomery (2007 and earlier editions) both cover this topic in detail
and are well written. But any book that you find in the library that covers this
topic in a way that you find helpful is suitable.
References
A. Dean and D. Voss. Design and Analysis of Experiments. New York: Springer,
1999. ISBN 0387985611.
D.C. Montgomery. Design and Analysis of Experiments. John Wiley & Sons Inc,
2005. ISBN 047148735X.