
Analysis of Variance (ANOVA)

1. All the students of this batch are requested to keep a copy of the main class note in order to check for any discrepancy.
2. For readers outside the batch: if you find any mistake regarding anything in these notes, then before taking a screenshot and making fun of it, report it to me directly, because always remember that you are reading this since your own note is not worthwhile.

What is analysis of variance? Explain with an example.


The total variation present in a set of observable quantities may, under certain circumstances, be partitioned into a number of disjoint components associated with the nature of classification of the data. The systematic methodology by which one can partition the total variation into such components, each attributable to a distinct cause, is called analysis of variance.
As an example, let us consider the yield of paddy. Suppose the cultivation is carried out using three kinds of seed. Then variation in yield occurs due to the variation of seed and also due to random error. This is an example of a one-way fixed effects layout of ANOVA.
What is the Gauss-Markov linear model? Distinguish between fixed, mixed and random effects models. Also distinguish between the ANOVA, regression and ANCOVA models.
Consider n independent random observations y1, y2, ..., yn where
E(yi) = ai1 β1 + ai2 β2 + ... + aip βp,
V(yi) = σ²,
cov(yi, yj) = 0 ∀ i ≠ j,
where β1, β2, ..., βp are model parameters and ai1, ai2, ..., aip are coefficients which are known.
Define
$$X_{n\times p}=\begin{pmatrix}a_{11}&a_{12}&\dots&a_{1p}\\a_{21}&a_{22}&\dots&a_{2p}\\\vdots&\vdots&&\vdots\\a_{n1}&a_{n2}&\dots&a_{np}\end{pmatrix},\qquad \mathbf{y}=\begin{pmatrix}y_1\\y_2\\\vdots\\y_n\end{pmatrix}.$$
Then
$$E(\mathbf{y})=\begin{pmatrix}E(y_1)\\E(y_2)\\\vdots\\E(y_n)\end{pmatrix}=\begin{pmatrix}a_{11}\beta_1+a_{12}\beta_2+\dots+a_{1p}\beta_p\\a_{21}\beta_1+a_{22}\beta_2+\dots+a_{2p}\beta_p\\\vdots\\a_{n1}\beta_1+a_{n2}\beta_2+\dots+a_{np}\beta_p\end{pmatrix}=\begin{pmatrix}a_{11}&\dots&a_{1p}\\\vdots&&\vdots\\a_{n1}&\dots&a_{np}\end{pmatrix}\begin{pmatrix}\beta_1\\\vdots\\\beta_p\end{pmatrix}=X_{n\times p}\,\boldsymbol\beta_{p\times 1},$$
where β = (β1, β2, ..., βp)'.
So finally we can write y = Xβ + e, where E(e) = 0 and Disp(e) = σ² In.
This model is called the Gauss-Markov linear model.
If we assume that β1, β2, ..., βp are themselves a random sample, i.e., each βj is the realised value of a random variable β̃j, then the model is called a random effects model.
If all the parameters are prefixed quantities, then the model is called a fixed effects model.
If some of the parameters are fixed quantities and some are chosen at random, then the model is called a mixed effects model.
Let us consider the Gauss-Markov linear model y = Xβ + e, where X is called the design matrix; it contains the coefficients corresponding to the model parameters.
If the values of the coefficients are binary, i.e., 0 or 1, then the model is called an analysis of variance (ANOVA) model. By an ANOVA model we test whether an effect is absent or present.
When the values of the coefficients come from the values of other independent covariates, i.e., the coefficients take usual continuous values, then the model is called a regression model. Hence by a regression model the impact of an effect can be judged.
If some of the coefficient values are binary and some are continuous, then the model is called an analysis of covariance (ANCOVA) model. The three design matrices are sketched below.
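As an illustrative sketch (hypothetical numbers, not part of the notes; assuming NumPy is available), the three design matrices can be written out explicitly; only the presence of 0/1 indicator columns versus continuous columns distinguishes the models:

```python
import numpy as np

# ANOVA: one factor with 2 levels, 2 observations per level.
# Columns: general effect mu, indicator of level 1, indicator of level 2 (all 0/1).
X_anova = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 1],
])

# Regression: a continuous covariate x; columns are (1, x).
x = np.array([1.2, 2.4, 3.1, 4.8])
X_reg = np.column_stack([np.ones_like(x), x])

# ANCOVA: factor indicators plus the continuous covariate.
X_ancova = np.column_stack([X_anova, x])

print(X_anova, X_reg, X_ancova, sep="\n\n")
```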

One-way fixed effects model:

Motivation: Let us consider a single medicine which controls fever and has four different dose levels.

Drug A — dose levels: 10 mg, 15 mg, 25 mg, 50 mg.

Here fever is influenced by a single factor (the medicine) and the factor has fixed levels. Hence the dependence of fever on the medicine has one-way variation, viz. the medicine.
Suppose there is a single factor A with k fixed levels A1, A2, ..., Ak, say. For the ith level, let there be ni observations yi1, yi2, ..., yini, i = 1(1)k. We represent the observations in the following data array.

A1 : y11, y12, ..., y1n1
A2 : y21, y22, ..., y2n2
⋮
Ak : yk1, yk2, ..., yknk

Here Σ_{i=1}^{k} ni = n is the total number of responses.
The one-way fixed effects model is given by
yij = µi + eij,  i = 1(1)k, j = 1(1)ni,
where
yij = response corresponding to the jth observation of the ith level of A,
µi = effect due to the ith level of A.
Next we reparametrise the model as follows:
yij = µ + (µi − µ) + eij = µ + αi + eij,
where
αi = additional effect due to the ith level of A,
µ = general effect,
eij = error in the model.
The reparametrisation essentially leads to a separation of the general effect, the exact additional effect due to the factor A, and the error.

Assumptions:
1. Σ_{i=1}^{k} ni αi = Σ ni (µi − µ) = Σ ni µi − µ Σ ni = nµ − nµ = 0, where µ = (Σ ni µi)/n is the grand mean of the effects.
2. eij ~ iid N(0, σ²).

Estimation of the model parameters:

E = Σ_i Σ_j e²ij = Σ_i Σ_j (yij − µ − αi)²
∂E/∂µ = 0 ⇒ (∂/∂µ) Σ_i Σ_j (yij − µ − αi)² = 0
⇒ Σ_i Σ_j (yij − µ − αi)(−1) = 0
⇒ Σ_i Σ_j yij = nµ + Σ_{i=1}^{k} ni αi, and the last sum is 0 by assumption,
⇒ µ̂ = (1/n) Σ_{i=1}^{k} Σ_{j=1}^{ni} yij = ȳ00.

∂E/∂αi = 0 ⇒ (∂/∂αi) Σ_{i=1}^{k} Σ_{j=1}^{ni} (yij − µ − αi)² = 0
⇒ Σ_{j=1}^{ni} (yij − µ − αi)(−1) = 0  [here we differentiate w.r.t. αi for a specific value of i = 1(1)k, so the summation over i vanishes]
⇒ Σ_{j=1}^{ni} yij − ni µ = ni αi
⇒ α̂i = (1/ni) Σ_{j=1}^{ni} yij − µ̂ = ȳi0 − ȳ00,

i.e., yij = ȳ00 + (ȳi0 − ȳ00) + eij.   (1)

Now
eij = yij − µ̂ − α̂i = yij − ȳ00 − (ȳi0 − ȳ00) = yij − ȳi0,
i.e., from (1),
yij = ȳ00 + (ȳi0 − ȳ00) + eij
⇒ (yij − ȳ00) = (ȳi0 − ȳ00) + (yij − ȳi0).

Squaring and summing over i and j we get
Σ_{i=1}^{k} Σ_{j=1}^{ni} (yij − ȳ00)² = Σ_i Σ_j (ȳi0 − ȳ00)² + Σ_i Σ_j (yij − ȳi0)²   (the product term vanishes)
= Σ_{i=1}^{k} ni (ȳi0 − ȳ00)² + Σ_{i=1}^{k} Σ_{j=1}^{ni} (yij − ȳi0)²,
where
Σ_{i=1}^{k} ni (ȳi0 − ȳ00)² = sum of squares due to the factor A (SSA),
Σ_{i=1}^{k} Σ_{j=1}^{ni} (yij − ȳi0)² = sum of squares due to error (SSE),
i.e., TSS = SSA + SSE.
This is called the orthogonal splitting of the total sum of squares.
Now note that TSS carries (n − 1) degrees of freedom and SSA carries (k − 1) degrees of freedom. Hence SSE carries (n − k) degrees of freedom.
Hypothesis: Here we want to test whether all the levels of A have similar effects or not:
H0 : αi = 0 ∀ i  vs  H1 : at least one inequality in H0.
Since yij = µi + eij and αi = µi − µ, the null hypothesis H0 is equivalent to µ1 = µ2 = ... = µk.

Expectation of the sums of squares:

SSA = Σ_{i=1}^{k} ni (ȳi0 − ȳ00)².
Note that yij = µ + αi + eij, so
ȳi0 = (1/ni) Σ_j yij = (1/ni) Σ_j (µ + αi + eij) = µ + αi + ēi0,
ȳ00 = (1/n) Σ_i Σ_j yij = (1/n) Σ_i Σ_j (µ + αi + eij) = µ + ē00   (since Σ ni αi = 0),
i.e., SSA = Σ_{i=1}^{k} ni {(µ + αi + ēi0) − (µ + ē00)}²
= Σ_{i=1}^{k} ni {αi + (ēi0 − ē00)}²
= Σ_{i=1}^{k} ni {αi² + (ēi0 − ē00)² + 2αi (ēi0 − ē00)}.

The eij's are linearly independent N(0, σ²), so E(ēi0) = 0 and E(ē00) = 0. Therefore
E(SSA) = Σ_{i=1}^{k} ni αi² + Σ_{i=1}^{k} ni E(ēi0 − ē00)² + 2 Σ_{i=1}^{k} ni αi E(ēi0 − ē00), and the last term is 0.

Therefore,
E(SSA) = Σ_{i=1}^{k} ni αi² + Σ_{i=1}^{k} ni E(ēi0 − ē00)².
Now
E(ēi0 − ē00)² = E(ē²i0) + E(ē²00) − 2E(ēi0 ē00)
= V(ēi0) + V(ē00) − 2 cov(ēi0, ē00)
= σ²/ni + σ²/n − 2σ²/n = σ²/ni − σ²/n,
since
cov(ēi0, ē00) = cov(ēi0, (1/n) Σ_{i=1}^{k} ni ēi0) = (1/n) ni V(ēi0) = (1/n) · ni · σ²/ni = σ²/n.
Therefore,
E(SSA) = Σ ni αi² + Σ ni (σ²/ni − σ²/n) = Σ ni αi² + kσ² − (σ²/n)·n = Σ_{i=1}^{k} ni αi² + (k − 1)σ².
 
E(SSE) = E[Σ_{i=1}^{k} Σ_{j=1}^{ni} (yij − ȳi0)²].
Since yij = µ + αi + eij and ȳi0 = µ + αi + ēi0,
E(SSE) = E[Σ_i Σ_j (eij − ēi0)²] = Σ_i Σ_j {E(e²ij) + E(ē²i0) − 2E(eij ēi0)}.
Now
E(e²ij) = V(eij) = σ²,
E(ē²i0) = V(ēi0) = σ²/ni,
E(eij ēi0) = E(eij · (1/ni) Σ_j eij) = (1/ni) E(e²ij) = σ²/ni.
Therefore,
E(SSE) = Σ_i Σ_j (σ² + σ²/ni − 2σ²/ni) = Σ_{i=1}^{k} (ni σ² − σ²)
= σ² Σ_{i=1}^{k} ni − σ² Σ_{i=1}^{k} 1 = σ²(n − k).

Note that
E(SSA) = Σ ni αi² + (k − 1)σ²
⇒ E(SSA/(k − 1)) = (1/(k − 1)) Σ ni αi² + σ²
⇒ E(MSA) = (1/(k − 1)) Σ_{i=1}^{k} ni αi² + σ²,
with (k − 1) degrees of freedom corresponding to SSA. Here MSA = mean square due to A.
Again,
E(SSE) = (n − k)σ²
⇒ E(SSE/(n − k)) = σ²   (MSE: mean square error)
⇒ E(MSE) = σ²,
with (n − k) degrees of freedom due to error. Under H0,
E(MSA) = σ² = E(MSE).

As we deviate away from H0,
E(MSA) ≥ E(MSE).
We define the test statistic F = MSA/MSE; a large value of F indicates rejection of H0. So a right-tailed test based on the F statistic under H0 is appropriate. Under H0,
SSA/σ² ∼ χ²_{k−1} and SSE/σ² ∼ χ²_{n−k}, independently,
so
F = (SSA/(k − 1)) / (SSE/(n − k)) ∼ F_{k−1, n−k},
i.e., we reject H0 at level α if F > F_{α; k−1, n−k}.
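The whole one-way computation above can be verified numerically. The following sketch (made-up yield data for k = 3 seed groups; it assumes NumPy and SciPy are available) computes SSA, SSE, MSA, MSE and the right-tailed F test exactly as derived, and cross-checks the result against scipy.stats.f_oneway:

```python
import numpy as np
from scipy import stats

# Hypothetical yields for k = 3 seed varieties (unequal group sizes n_i)
groups = [np.array([4.1, 4.5, 3.9, 4.3]),
          np.array([5.0, 5.2, 4.8]),
          np.array([3.6, 3.8, 3.5, 3.7, 3.9])]

n = sum(len(g) for g in groups)              # total number of responses
k = len(groups)
grand_mean = np.concatenate(groups).mean()   # y_bar_00

# Orthogonal splitting: TSS = SSA + SSE
ssa = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)

msa, mse = ssa / (k - 1), sse / (n - k)
F = msa / mse
p_value = stats.f.sf(F, k - 1, n - k)        # right-tailed test
print(F, p_value)

# Cross-check against SciPy's built-in one-way ANOVA
print(stats.f_oneway(*groups))
```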

Paired comparison:

H0 : µ1 = µ2 = ... = µk  vs  H1 : at least one inequality in H0.
If H0 is rejected then the class means are not all equal, so a paired comparison is required.
For comparing the ith class with the i′th class we have the following hypothesis:
H0 : µi = µi′  vs  H1 : µi ≠ µi′.
The corresponding test statistic is
t = (ȳi0 − ȳi′0) / √{MSE (1/ni + 1/ni′)} ∼ t_{n−k}.
We reject H0 at level α if
|ȳi0 − ȳi′0| > √{MSE (1/ni + 1/ni′)} · t_{α/2; n−k};
the quantity on the right is called the least significant difference.
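A sketch of the paired comparison by the least-significant-difference rule (the helper name lsd_pairs is hypothetical; groups is a list of NumPy arrays as in the previous sketch):

```python
from itertools import combinations
import numpy as np
from scipy import stats

def lsd_pairs(groups, alpha=0.05):
    """All pairwise comparisons by the least-significant-difference rule."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    mse = sum(((g - g.mean()) ** 2).sum() for g in groups) / (n - k)
    tcrit = stats.t.ppf(1 - alpha / 2, n - k)
    results = []
    for i, j in combinations(range(k), 2):
        diff = abs(groups[i].mean() - groups[j].mean())
        lsd = tcrit * np.sqrt(mse * (1 / len(groups[i]) + 1 / len(groups[j])))
        results.append((i, j, diff, lsd, diff > lsd))  # True = significantly different
    return results
```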


Remark:

Let m hypotheses be tested simultaneously, each test having type-I error α. Then
P[at least one false rejection] = 1 − P[no false rejection] = 1 − (1 − α)^m,
and as m → ∞, P[at least one false rejection] → 1. This is the problem of multiple testing.
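A quick numerical check of this remark (a sketch assuming α = 0.05 and independent tests):

```python
alpha = 0.05
for m in (1, 5, 10, 20, 50):
    # probability of at least one false rejection among m independent tests
    print(m, 1 - (1 - alpha) ** m)
# m = 10 already gives about 0.40, far above the nominal 0.05
```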

Two-way fixed effects model (one observation per cell):

Motivation: Suppose that in the statistics tuition market there are two teachers; the first teacher has two fixed batches and the second teacher has three fixed batches. Also suppose that only one student is present in each intersection of the batches of the two teachers. Then the marks of the students vary due to two factors, i.e., the two teachers, and the situation is modelled by the ANOVA two-way fixed effects model with one observation per cell.

B → :  B1   B2   ...  Bq
A ↓
A1  :  y11  y12  ...  y1q
A2  :  y21  y22  ...  y2q
⋮
Ap  :  yp1  yp2  ...  ypq
Model:
yij = observation corresponding to the ith level of A and the jth level of B.
Here the model is yij = µij + eij, i = 1(1)p, j = 1(1)q.
We reparametrise the model as follows:
yij = µ + (µi0 − µ) + (µ0j − µ) + (µij − µi0 − µ0j + µ) + eij
= µ + αi + βj + γij + eij,
where
µ = general effect,
αi = additional effect due to the ith level of A,
βj = additional effect due to the jth level of B,
γij = interaction effect due to the ith level of A and the jth level of B,
eij = error in the model.
With one observation per cell we take γij = 0, as γij cannot be estimated, so the model becomes
yij = µ + αi + βj + eij.

Assumptions:
(i) Σ_{i=1}^{p} αi = 0, (ii) Σ_{j=1}^{q} βj = 0, (iii) eij ~ iid N(0, σ²).
Estimation of the model parameters:
E = Σ_{i=1}^{p} Σ_{j=1}^{q} e²ij = Σ_i Σ_j (yij − µ − αi − βj)²

∂E/∂µ = 0 ⇒ Σ_i Σ_j (yij − µ − αi − βj)(−1) = 0
⇒ Σ_i Σ_j yij − pqµ − q Σ_i αi − p Σ_j βj = 0, and the last two sums are 0,
⇒ µ̂ = (1/pq) Σ_i Σ_j yij = ȳ00.
∂E/∂αi = 0 ⇒ (∂/∂αi) Σ_i Σ_j (yij − µ − αi − βj)² = 0
⇒ Σ_{j=1}^{q} yij = qµ + qαi + Σ_{j=1}^{q} βj, and the last sum is 0,
⇒ α̂i = (1/q) Σ_{j=1}^{q} yij − µ̂ = ȳi0 − ȳ00.
Similarly, β̂j = ȳ0j − ȳ00, where ȳ0j = (1/p) Σ_{i=1}^{p} yij.
Orthogonal splitting of the total sum of squares:
Note that yij = µ̂ + α̂i + β̂j + eij,
i.e., yij = ȳ00 + (ȳi0 − ȳ00) + (ȳ0j − ȳ00) + (yij − ȳi0 − ȳ0j + ȳ00)
⇒ (yij − ȳ00) = (ȳi0 − ȳ00) + (ȳ0j − ȳ00) + (yij − ȳi0 − ȳ0j + ȳ00).
Squaring and summing (the product terms vanish),
Σ_{i=1}^{p} Σ_{j=1}^{q} (yij − ȳ00)² = q Σ_{i=1}^{p} (ȳi0 − ȳ00)² + p Σ_{j=1}^{q} (ȳ0j − ȳ00)² + Σ_i Σ_j (yij − ȳi0 − ȳ0j + ȳ00)²,
i.e., TSS = SSA + SSB + SSE,
where TSS = total sum of squares, SSA = sum of squares due to A, SSB = sum of squares due to B, and SSE = sum of squares due to error.
Degrees of freedom corresponding to TSS → pq − 1
Degrees of freedom corresponding to SSA → p − 1
Degrees of freedom corresponding to SSB → q − 1
Degrees of freedom corresponding to error → (p − 1)(q − 1)

Expectation of the sums of squares:
E(SSE) = E[Σ_{i=1}^{p} Σ_{j=1}^{q} (yij − ȳi0 − ȳ0j + ȳ00)²].
Since yij = µ + αi + βj + eij,
ȳi0 = (1/q) Σ_j yij = µ + αi + (1/q) Σ_j βj + ēi0 = µ + αi + ēi0,
and similarly
ȳ0j = µ + βj + ē0j,  ȳ00 = µ + ē00.
SSE:
Σ_i Σ_j (µ + αi + βj + eij − µ − αi − ēi0 − µ − βj − ē0j + µ + ē00)²
= Σ_i Σ_j (eij − ēi0 − ē0j + ē00)²
= Σ_i Σ_j (e²ij + ē²i0 + ē²0j + ē²00 − 2eij ēi0 − 2eij ē0j + 2eij ē00 + 2ēi0 ē0j − 2ēi0 ē00 − 2ē0j ē00).

E[SSE] = Σ_i Σ_j [E(e²ij) + E(ē²i0) + E(ē²0j) + E(ē²00) − 2 cov(eij, ēi0) − 2 cov(eij, ē0j) + 2 cov(eij, ē00) + 2 cov(ēi0, ē0j) − 2 cov(ēi0, ē00) − 2 cov(ē0j, ē00)].
Now
cov(eij, ēi0) = cov(eij, (1/q) Σ_j eij) = (1/q) V(eij) = σ²/q,
cov(eij, ē0j) = cov(eij, (1/p) Σ_i eij) = (1/p) V(eij) = σ²/p,
cov(eij, ē00) = cov(eij, (1/pq) Σ_i Σ_j eij) = (1/pq) V(eij) = σ²/pq,
cov(ēi0, ē0j) = cov((1/q) Σ_j eij, (1/p) Σ_i eij) = (1/pq) V(eij) = σ²/pq,
cov(ēi0, ē00) = cov(ēi0, (1/p) Σ_i ēi0) = (1/p) V(ēi0) = (1/p)(σ²/q) = σ²/pq,
and similarly
cov(ē0j, ē00) = cov(ē0j, (1/q) Σ_j ē0j) = (1/q)(σ²/p) = σ²/pq.
Therefore,
E(SSE) = Σ_i Σ_j (σ² + σ²/q + σ²/p + σ²/pq − 2σ²/q − 2σ²/p + 2σ²/pq + 2σ²/pq − 2σ²/pq − 2σ²/pq)
= Σ_i Σ_j (σ² − σ²/p − σ²/q + σ²/pq)
= pqσ² (p − 1)(q − 1)/pq = σ²(p − 1)(q − 1),
i.e., E[SSE] = σ²(p − 1)(q − 1) ⇒ E[SSE/((p − 1)(q − 1))] = σ² ⇒ E[MSE] = σ².

E[SSA] = E[q Σ_{i=1}^{p} (ȳi0 − ȳ00)²] = E[q Σ_{i=1}^{p} (µ + αi + ēi0 − µ − ē00)²]
= E[q Σ_{i=1}^{p} (αi + ēi0 − ē00)²]
= q Σ_{i=1}^{p} [αi² + E(ē²i0) + E(ē²00) − 2 cov(ēi0, ē00) + 2αi E(ēi0 − ē00)]
= q Σ_{i=1}^{p} [αi² + σ²/q − σ²/pq]
= q Σ_{i=1}^{p} αi² + pσ² − σ² = q Σ_{i=1}^{p} αi² + (p − 1)σ².
Hence
E[SSA/(p − 1)] = (q/(p − 1)) Σ αi² + σ²
⇒ E[MSA] = (q/(p − 1)) Σ_{i=1}^{p} αi² + σ².
Similarly, E[MSB] = (p/(q − 1)) Σ_{j=1}^{q} βj² + σ².
Hypotheses:
Here we want to test
(i) H01 : αi = 0 ∀ i vs H1 : at least one inequality in H01 (the factor A has no effect);
(ii) H02 : βj = 0 ∀ j vs H1 : at least one inequality in H02 (the factor B has no effect).
Under H01, E(MSA) = σ² = E(MSE); as we deviate away from H01,
E(MSA) ≥ σ² = E(MSE).
So intuitively a large value of MSA/MSE indicates rejection of H01.

Under H01,
SSA/σ² ∼ χ²_{p−1} and SSE/σ² ∼ χ²_{(p−1)(q−1)}, independently.
Now
F1 = (SSA/(p − 1)) / (SSE/((p − 1)(q − 1))) = MSA/MSE ∼ F_{p−1,(p−1)(q−1)}.
So we reject H01 at size α if
F1 > F_{α; p−1,(p−1)(q−1)}.
Similarly, define F2 = MSB/MSE ∼ F_{q−1,(p−1)(q−1)}; we reject H02 at size α if
F2 > F_{α; q−1,(p−1)(q−1)}.
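A minimal numerical sketch of the two-way layout with one observation per cell (a hypothetical 2 × 3 table, assuming NumPy and SciPy); it reproduces the orthogonal splitting and both F tests:

```python
import numpy as np
from scipy import stats

# Hypothetical p x q table, one observation per cell
y = np.array([[4.2, 4.8, 5.1],
              [3.9, 4.4, 4.9]])
p, q = y.shape

row_means, col_means, grand = y.mean(axis=1), y.mean(axis=0), y.mean()

ssa = q * ((row_means - grand) ** 2).sum()
ssb = p * ((col_means - grand) ** 2).sum()
sse = ((y - row_means[:, None] - col_means[None, :] + grand) ** 2).sum()

msa, msb = ssa / (p - 1), ssb / (q - 1)
mse = sse / ((p - 1) * (q - 1))

F1, F2 = msa / mse, msb / mse
print("A:", F1, stats.f.sf(F1, p - 1, (p - 1) * (q - 1)))
print("B:", F2, stats.f.sf(F2, q - 1, (p - 1) * (q - 1)))
```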
Two-way fixed effects model (m observations per cell):

Motivation: Suppose that in the statistics tuition market there are two teachers. The first teacher has two fixed batches and the second teacher has three fixed batches. Also suppose that m students are present in each intersection of the batches of the two teachers. Then the marks of the students vary due to two factors, i.e., the two teachers, and the situation is modelled by the ANOVA two-way fixed effects layout with m observations per cell.

Model:
We consider a factor A having p fixed levels A1, ..., Ap and another factor B with q fixed levels B1, B2, ..., Bq. Corresponding to the ith level of A and the jth level of B there are m observations, ∀ i, j:
yijk = µ + αi + βj + γij + eijk,  i = 1(1)p, j = 1(1)q, k = 1(1)m,
where
yijk = kth observation corresponding to the ith level of A and the jth level of B,
µ = general effect,
αi = additional effect due to the ith level of A,
βj = additional effect due to the jth level of B,
γij = interaction effect due to the ith level of A and the jth level of B,
eijk = error in the model.
Here we can estimate γij and hence we incorporate it in the model.
Assumptions:
(i) Σ_{i=1}^{p} αi = 0, (ii) Σ_{j=1}^{q} βj = 0,
(iii) Σ_{i=1}^{p} γij = 0 ∀ j and Σ_{j=1}^{q} γij = 0 ∀ i,
(iv) eijk ~ iid N(0, σ²).

Estimation of the model parameters:

µ̂ = ȳ000 = (1/pqm) Σ_i Σ_j Σ_k yijk,
α̂i = ȳi00 − ȳ000, where ȳi00 = (1/qm) Σ_j Σ_k yijk,
β̂j = ȳ0j0 − ȳ000, where ȳ0j0 = (1/pm) Σ_i Σ_k yijk.
For γij, minimise E = Σ_i Σ_j Σ_k (yijk − µ − αi − βj − γij)²:
∂E/∂γij = (−2) Σ_{k=1}^{m} (yijk − µ − αi − βj − γij) = 0
⇒ Σ_{k=1}^{m} yijk = mµ̂ + mα̂i + mβ̂j + mγ̂ij
⇒ γ̂ij = (1/m) Σ_k yijk − (ȳi00 − ȳ000) − (ȳ0j0 − ȳ000) − ȳ000
⇒ γ̂ij = ȳij0 − ȳi00 − ȳ0j0 + ȳ000.
Orthogonal splitting of the total sum of squares:

yijk = ȳ000 + (ȳi00 − ȳ000) + (ȳ0j0 − ȳ000) + (ȳij0 − ȳi00 − ȳ0j0 + ȳ000) + (yijk − ȳij0)
⇒ (yijk − ȳ000) = (ȳi00 − ȳ000) + (ȳ0j0 − ȳ000) + (ȳij0 − ȳi00 − ȳ0j0 + ȳ000) + (yijk − ȳij0).
Squaring and summing on both sides we get
Σ_i Σ_j Σ_k (yijk − ȳ000)² = qm Σ_{i=1}^{p} (ȳi00 − ȳ000)² + pm Σ_{j=1}^{q} (ȳ0j0 − ȳ000)² + m Σ_i Σ_j (ȳij0 − ȳi00 − ȳ0j0 + ȳ000)² + Σ_i Σ_j Σ_k (yijk − ȳij0)²,
i.e., TSS = SSA + SSB + SS(AB) + SSE.
Degrees of freedom of TSS = pqm − 1
Degrees of freedom of SSA = p − 1
Degrees of freedom of SSB = q − 1
Degrees of freedom of SS(AB) = (p − 1)(q − 1)
Degrees of freedom of SSE = pq(m − 1)

Expectation of the sums of squares:

SSE = Σ_i Σ_j Σ_k (yijk − ȳij0)².
Since yijk = µ + αi + βj + γij + eijk,
ȳij0 = (1/m) Σ_k (µ + αi + βj + γij + eijk) = µ + αi + βj + γij + ēij0.
E(SSE) = Σ_i Σ_j Σ_k E(eijk − ēij0)²
= Σ_i Σ_j Σ_k E(e²ijk + ē²ij0 − 2ēij0 eijk)
= Σ_i Σ_j Σ_k [E(e²ijk) + E(ē²ij0) − 2 cov(eijk, ēij0)]
= Σ_i Σ_j Σ_k [σ² + σ²/m − 2 cov(eijk, (1/m) Σ_k eijk)]
= Σ_i Σ_j Σ_k (σ² + σ²/m − 2σ²/m)
= Σ_i Σ_j Σ_k (σ² − σ²/m)
= pqm σ² (m − 1)/m = (m − 1)pq σ².
E(SS(AB)) = m Σ_i Σ_j E(ȳij0 − ȳi00 − ȳ0j0 + ȳ000)².
Since yijk = µ + αi + βj + γij + eijk, and using the constraints Σ_i αi = Σ_j βj = 0 and Σ_i γij = Σ_j γij = 0,
ȳij0 = µ + αi + βj + γij + ēij0,
ȳi00 = µ + αi + ēi00,
ȳ0j0 = µ + βj + ē0j0,
ȳ000 = µ + ē000.
Hence
E[SS(AB)] = m Σ_i Σ_j E{γij + (ēij0 − ēi00 − ē0j0 + ē000)}²
= m Σ_i Σ_j {γij² + E(ēij0 − ēi00 − ē0j0 + ē000)² + 2γij E(ēij0 − ēi00 − ē0j0 + ē000)}
= m Σ_i Σ_j γij² + m Σ_i Σ_j E(ēij0 − ēi00 − ē0j0 + ē000)².
Note that
E(ēij0 − ēi00 − ē0j0 + ē000)²
= E(ē²ij0) + E(ē²i00) + E(ē²0j0) + E(ē²000)
− 2 cov(ēij0, ēi00) − 2 cov(ēij0, ē0j0) + 2 cov(ēij0, ē000)
+ 2 cov(ēi00, ē0j0) − 2 cov(ēi00, ē000) − 2 cov(ē0j0, ē000).
Now
cov(ēij0, ēi00) = cov(ēij0, (1/q) Σ_j ēij0) = (1/q) V(ēij0) = σ²/mq,
cov(ēij0, ē0j0) = cov(ēij0, (1/p) Σ_i ēij0) = (1/p) V(ēij0) = σ²/mp,
cov(ēi00, ē0j0) = cov((1/qm) Σ_j Σ_k eijk, (1/pm) Σ_i Σ_k eijk) = (1/pqm²) Σ_{k=1}^{m} V(eijk) = mσ²/pqm² = σ²/pqm,
cov(ēi00, ē000) = cov(ēi00, (1/p) Σ_i ēi00) = (1/p) V(ēi00) = σ²/pqm,
cov(ēij0, ē000) = σ²/pqm, and similarly cov(ē0j0, ē000) = σ²/pqm.
Therefore,
m Σ_i Σ_j E(ēij0 − ēi00 − ē0j0 + ē000)²
= m Σ_i Σ_j (σ²/m + σ²/qm + σ²/pm + σ²/pqm − 2σ²/mq − 2σ²/mp + 2σ²/pqm + 2σ²/pqm − 2σ²/pqm − 2σ²/pqm)
= m Σ_i Σ_j (σ²/m − σ²/mp − σ²/qm + σ²/pqm)
= m · pq · σ²(pq − p − q + 1)/(pqm) = σ²(p − 1)(q − 1).

Therefore,
E[SS(AB)] = m Σ_i Σ_j γij² + (p − 1)(q − 1)σ².
Similarly,
E(SSA) = qm Σ_{i=1}^{p} αi² + (p − 1)σ²  [see copy],
E(SSB) = pm Σ_{j=1}^{q} βj² + (q − 1)σ²  [see copy].
Hypotheses:
Here we want to test
(i) H01 : αi = 0 ∀ i vs H1 : at least one inequality in H01 (the factor A has no effect);
(ii) H02 : βj = 0 ∀ j vs H1 : at least one inequality in H02 (the factor B has no effect);
(iii) H03 : γij = 0 ∀ i, j vs H1 : at least one inequality in H03 (the interaction of A and B has no effect).
Test statistic:
Note that under H01, E(MSA) = E(MSE) = σ². As we drift away from H01, E(MSA) > E(MSE). Thus a large value of MSA/MSE indicates rejection of H01.
Under H01,
SSA/σ² ∼ χ²_{p−1} and SSE/σ² ∼ χ²_{pq(m−1)}, independently, so
F1 = (SSA/(p − 1)) / (SSE/(pq(m − 1))) = MSA/MSE ∼ F_{p−1, pq(m−1)}.
We reject H01 at level α if F1 > F_{α; p−1, pq(m−1)}.
Similarly we reject H02 at level α if F2 > F_{α; q−1, pq(m−1)}, and we reject H03 at level α if F3 > F_{α; (p−1)(q−1), pq(m−1)}, where F2 = MSB/MSE and F3 = MS(AB)/MSE.
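The m-observations-per-cell computation can be sketched the same way (simulated data of shape (p, q, m); all names and numbers are illustrative):

```python
import numpy as np
from scipy import stats

# Hypothetical data: shape (p, q, m) = levels of A, levels of B, replicates
rng = np.random.default_rng(0)
p, q, m = 2, 3, 4
y = rng.normal(5.0, 1.0, size=(p, q, m))

cell = y.mean(axis=2)                 # y_bar_ij0
a_mean = y.mean(axis=(1, 2))          # y_bar_i00
b_mean = y.mean(axis=(0, 2))          # y_bar_0j0
grand = y.mean()                      # y_bar_000

ssa = q * m * ((a_mean - grand) ** 2).sum()
ssb = p * m * ((b_mean - grand) ** 2).sum()
ssab = m * ((cell - a_mean[:, None] - b_mean[None, :] + grand) ** 2).sum()
sse = ((y - cell[:, :, None]) ** 2).sum()

df_e = p * q * (m - 1)
for name, ss, df in [("A", ssa, p - 1), ("B", ssb, q - 1),
                     ("AB", ssab, (p - 1) * (q - 1))]:
    F = (ss / df) / (sse / df_e)
    print(name, F, stats.f.sf(F, df, df_e))
```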

Random effects model: one-way layout

Motivation: The yield of paddy in India is influenced by the different states, so 'state' is a single factor causing variation. Since India has twenty-nine states, yield data cannot be obtained from all the states due to time and cost constraints, so a sample of states is chosen at random. Hence the levels of the factor 'state' are randomly chosen, and the situation is modelled as a one-way random effects model.
Model:
Let us consider a single factor A having k random levels, where the levels are chosen at random from a larger number of levels. There are r observations corresponding to each level, so the total number of observations is n = rk.
The ANOVA model is given by
yij = µ + ai + eij,
where
µ = general effect,
ai = additional random effect corresponding to the ith level of A,
eij = error in the model,
yij = jth observation corresponding to the ith level of A.
Assumptions:
(i) ai ~ iid N(0, σa²),
(ii) eij ~ iid N(0, σe²),
(iii) the ai and the eij are independent.

Orthogonal splitting of TSS:

Σ_{i=1}^{k} Σ_{j=1}^{r} (yij − ȳ00)² = r Σ_{i=1}^{k} (ȳi0 − ȳ00)² + Σ_{i=1}^{k} Σ_{j=1}^{r} (yij − ȳi0)²,
i.e., TSS = SSA + SSE.
Degrees of freedom of TSS = n − 1
Degrees of freedom of SSA = k − 1
Degrees of freedom of SSE = n − k

Expectation of the sums of squares:

E[SSA] = E[r Σ_{i=1}^{k} (ȳi0 − ȳ00)²].
Since yij = µ + ai + eij,
ȳi0 = (1/r) Σ_j yij = µ + ai + ēi0,  ȳ00 = µ + ā + ē00,
so
E[SSA] = E[r Σ_{i=1}^{k} (µ + ai + ēi0 − µ − ā − ē00)²] = E[r Σ_{i=1}^{k} {(ai − ā) + (ēi0 − ē00)}²].   (*)
Now by (*),
E(SSA) = E[r Σ {(ai − ā)² + (ēi0 − ē00)² + 2(ai − ā)(ēi0 − ē00)}]
= r Σ E(ai − ā)² + r Σ E(ēi0 − ē00)²   (the cross term has expectation 0 since the ai and the eij are independent with zero means).
Now
E(ai − ā)² = E(a²i) + E(ā²) − 2 cov(ai, ā) = σa² + σa²/k − 2σa²/k = σa² − σa²/k,
E(ēi0 − ē00)² = E(ē²i0) + E(ē²00) − 2 cov(ēi0, ē00) = σe²/r + σe²/rk − 2σe²/rk = σe²/r − σe²/rk,
since cov(ēi0, ē00) = cov(ēi0, (1/k) Σ_i ēi0) = (1/k) V(ēi0) = σe²/rk.
Therefore,
E(SSA) = r Σ_{i=1}^{k} (σa² − σa²/k) + r Σ_{i=1}^{k} (σe²/r − σe²/rk)
= r(k − 1)σa² + (k − 1)σe² = (k − 1)(rσa² + σe²),
so E[SSA/(k − 1)] = rσa² + σe², i.e.,
⇒ E[MSA] = rσa² + σe².

 
E[SSE] = E[Σ_{i=1}^{k} Σ_{j=1}^{r} (yij − ȳi0)²]
= E[Σ_i Σ_j (µ + ai + eij − µ − ai − ēi0)²]
= E[Σ_i Σ_j (eij − ēi0)²].
Now
E(eij − ēi0)² = E(e²ij) + E(ē²i0) − 2 cov(eij, ēi0)
= σe² + σe²/r − 2 cov(eij, (1/r) Σ_j eij)
= σe² + σe²/r − 2σe²/r = σe² − σe²/r.
Hence
E(SSE) = Σ_i Σ_j (σe² − σe²/r) = nσe² − kσe² = (n − k)σe²
⇒ E(MSE) = σe².

Hypothesis:
Here we want to test
H0 : σa² = 0 vs H1 : σa² > 0.
Test statistic:
Under H0, E(MSA) = E(MSE) = σe², and as we deviate away from the null, E(MSA) ≥ E(MSE). So a right-tailed test based on F = MSA/MSE is appropriate; under H0, F ∼ F_{k−1, n−k}, so we reject H0 at level α if F > F_{α; k−1, n−k}.
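For the random effects model the same F statistic applies, and E(MSA) = rσa² + σe² also yields the usual ANOVA-type moment estimator of the variance component, σ̂a² = (MSA − MSE)/r, truncated at zero. A simulation sketch with assumed parameter values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
k, r = 8, 5                                  # k random levels, r obs per level
sigma_a, sigma_e = 1.0, 2.0

a = rng.normal(0, sigma_a, size=k)           # random level effects
y = 10 + a[:, None] + rng.normal(0, sigma_e, size=(k, r))

n = k * r
msa = r * ((y.mean(axis=1) - y.mean()) ** 2).sum() / (k - 1)
mse = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum() / (n - k)

F = msa / mse
print("F =", F, "p =", stats.f.sf(F, k - 1, n - k))

# ANOVA-type estimator of the variance component, from E(MSA) = r*sigma_a^2 + sigma_e^2
sigma_a2_hat = max((msa - mse) / r, 0.0)
print("sigma_a^2 estimate:", sigma_a2_hat)
```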

Random effects model: two-way layout, m observations per cell

Motivation: The total yield of a crop in India varies over different states as well as over different types of seed. Since the number of seed types and the number of states are both very large, data collection becomes tough, so a sample of seed types and a sample of states are chosen. Thus both factors have randomly chosen levels. Also, in each intersection of levels we have a fixed number m of observations. Hence the yield of the crop is modelled by the random effects model, two-way layout with m observations per cell:

yijk = µ + ai + bj + cij + eijk
i = 1(1)p, j = 1(1)q and k = 1(1)m, where
yijk = kth observation corresponding to the ith level of A and the jth level of B,
ai = random effect due to the ith level of A,
bj = random effect due to the jth level of B,
cij = random interaction effect due to the ith level of A and the jth level of B,
eijk = random error in the model.

Orthogonal splitting of TSS:
Σ_i Σ_j Σ_k (yijk − ȳ000)² = mq Σ_{i=1}^{p} (ȳi00 − ȳ000)² + pm Σ_{j=1}^{q} (ȳ0j0 − ȳ000)² + m Σ_i Σ_j (ȳij0 − ȳi00 − ȳ0j0 + ȳ000)² + Σ_i Σ_j Σ_k (yijk − ȳij0)²,
i.e., TSS = SSA + SSB + SS(AB) + SSE.
Degrees of freedom of TSS = pqm − 1
Degrees of freedom of SSA = p − 1
Degrees of freedom of SSB = q − 1
Degrees of freedom of SS(AB) = (p − 1)(q − 1)
Degrees of freedom of SSE = pq(m − 1)


Assumptions:

iid
i. eijk ∼ N (0, σe2 )

iid 2)
ii. ai ∼ N (0, σA

iid 2)
iii. bj ∼ N (0, σB

iid 2 )
iv. cij ∼ N (0, σAB

v. ai , bj , cij , eijk are mutually uncorrelated

E[SS(AB)] = E[m Σ_i Σ_j (ȳij0 − ȳi00 − ȳ0j0 + ȳ000)²] = m Σ_i Σ_j E(ȳij0 − ȳi00 − ȳ0j0 + ȳ000)².
Since yijk = µ + ai + bj + cij + eijk,
ȳij0 = µ + ai + bj + cij + ēij0,
ȳi00 = µ + ai + b̄ + c̄i0 + ēi00,
ȳ0j0 = µ + ā + bj + c̄0j + ē0j0,
ȳ000 = µ + ā + b̄ + c̄00 + ē000.
Expectation of the sums of squares:
E[SS(AB)] = m Σ_i Σ_j E{(cij − c̄i0 − c̄0j + c̄00) + (ēij0 − ēi00 − ē0j0 + ē000)}²
= m Σ_i Σ_j [E(c²ij) + E(c̄²i0) + E(c̄²0j) + E(c̄²00) − 2 cov(cij, c̄i0) − 2 cov(cij, c̄0j) + 2 cov(cij, c̄00) + 2 cov(c̄i0, c̄0j) − 2 cov(c̄i0, c̄00) − 2 cov(c̄0j, c̄00)
+ E(ē²ij0) + E(ē²i00) + E(ē²0j0) + E(ē²000) − 2 cov(ēij0, ēi00) − 2 cov(ēij0, ē0j0) + 2 cov(ēij0, ē000) + 2 cov(ēi00, ē0j0) − 2 cov(ēi00, ē000) − 2 cov(ē0j0, ē000)]
(the cross terms between the c's and the ē's vanish since the c's and the e's are uncorrelated).

Now
cov(cij, c̄i0) = cov(cij, (1/q) Σ_j cij) = σAB²/q,
cov(cij, c̄0j) = σAB²/p,  cov(cij, c̄00) = σAB²/pq,
cov(c̄i0, c̄0j) = σAB²/pq,  cov(c̄i0, c̄00) = σAB²/pq,  cov(c̄0j, c̄00) = σAB²/pq.
Again
cov(ēij0, ēi00) = cov(ēij0, (1/q) Σ_j ēij0) = σe²/qm,
cov(ēij0, ē0j0) = σe²/pm,  cov(ēij0, ē000) = σe²/pqm,
cov(ēi00, ē000) = σe²/pqm,  cov(ē0j0, ē000) = σe²/pqm,
cov(ēi00, ē0j0) = σe²/pqm.
Therefore,
E[SS(AB)] = m · pq · (p − 1)(q − 1)σAB²/pq + m · pq · (p − 1)(q − 1)σe²/(mpq)
= (p − 1)(q − 1)(mσAB² + σe²)
⇒ E[SS(AB)/((p − 1)(q − 1))] = mσAB² + σe²
⇒ E[MS(AB)] = mσAB² + σe².
Similarly [see copy],
E(MSA) = mσAB² + qmσA² + σe²,
E(MSB) = mσAB² + pmσB² + σe²,
E(MSE) = σe².

Hypotheses:
Here we want to test
H0A : σA² = 0 vs H1A : σA² > 0,
H0B : σB² = 0 vs H1B : σB² > 0,
and H0AB : σAB² = 0 vs H1AB : σAB² > 0.
Test statistic:
Under H0A, E(MSA) = mσAB² + σe² = E(MS(AB)). As we deviate away from H0A, E(MSA) ≥ E(MS(AB)), so a right-tailed test based on MSA/MS(AB) is appropriate; under H0A, MSA/MS(AB) ∼ F_{p−1,(p−1)(q−1)}. We reject H0A at level α if
MSA/MS(AB) > F_{α; (p−1),(p−1)(q−1)}.
Similarly, under H0B, E(MSB) = E(MS(AB)) = σe² + mσAB². As we deviate away from H0B, E(MSB) ≥ E(MS(AB)); a right-tailed test based on MSB/MS(AB) is appropriate, and we reject H0B if MSB/MS(AB) > F_{α; (q−1),(p−1)(q−1)}.
Again, under H0AB, E[MS(AB)] = E[MSE] = σe². As we deviate away from H0AB, E[MS(AB)] > E[MSE]. Thus a right-tailed test based on MS(AB)/MSE is appropriate; we reject H0AB if MS(AB)/MSE > F_{α; (p−1)(q−1), pq(m−1)}.
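The choice of denominators is the essential point here, so a compact sketch (the function name random_effects_tests is hypothetical; y has shape (p, q, m)) that wires up the three tests with their valid errors may help:

```python
import numpy as np

def random_effects_tests(y):
    """F statistics for a two-way random effects layout, y of shape (p, q, m).

    Note the denominators: MS(AB) is the valid error for A and B,
    while MSE is the valid error for the interaction.
    """
    p, q, m = y.shape
    cell, grand = y.mean(axis=2), y.mean()
    a_mean, b_mean = y.mean(axis=(1, 2)), y.mean(axis=(0, 2))

    msa = q * m * ((a_mean - grand) ** 2).sum() / (p - 1)
    msb = p * m * ((b_mean - grand) ** 2).sum() / (q - 1)
    msab = m * ((cell - a_mean[:, None] - b_mean[None, :] + grand) ** 2).sum() \
           / ((p - 1) * (q - 1))
    mse = ((y - cell[:, :, None]) ** 2).sum() / (p * q * (m - 1))

    f_a = msa / msab   # H0A tested against MS(AB)
    f_b = msb / msab   # H0B tested against MS(AB)
    f_ab = msab / mse  # H0AB tested against MSE
    return f_a, f_b, f_ab
```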

Two-way mixed effects model: m observations per cell

Motivation:
The yield of a crop may vary over different states and fertilisers. We have randomly chosen 10 states and kept all the varieties of fertiliser. Hence the effect due to state is a random effect while the effect due to fertiliser remains a fixed effect. We have m observations corresponding to each state and each fertiliser. Thus the analysis of the yield of the crop can be carried out by the two-way mixed effects model with m observations per cell.

Model:

Suppose there are two factors A and B. The factor A has p fixed levels and the factor B has q randomly chosen levels. Suppose there are m observations per cell. The ANOVA model is given by
yijk = µ + ai + bj + cij + eijk,  i = 1(1)p, j = 1(1)q, k = 1(1)m,
where
yijk = kth observation corresponding to the ith level of A and the jth level of B,
µ = general effect,
ai = additional fixed effect due to the ith level of A,
bj = additional random effect due to the jth level of B,
cij = random interaction effect due to the ith level of A and the jth level of B.
Assumptions:

(i) Σ_{i=1}^{p} ai = 0,
(ii) Σ_{i=1}^{p} cij = 0 ∀ j,
(iii) bj ~ iid N(0, σB²),
(iv) cij ~ N(0, σi²) independently,
(v) eijk ~ iid N(0, σe²).

Remark: {bj} and {eijk} are independently distributed.
We further define
σA² = (1/(p − 1)) Σ_{i=1}^{p} ai²,   σAB² = (1/(p − 1)) Σ_{i=1}^{p} σi².
Orthogonal splitting of the total sum of squares:

Σ_i Σ_j Σ_k (yijk − ȳ000)² = qm Σ_{i=1}^{p} (ȳi00 − ȳ000)² + pm Σ_{j=1}^{q} (ȳ0j0 − ȳ000)² + m Σ_i Σ_j (ȳij0 − ȳi00 − ȳ0j0 + ȳ000)² + Σ_i Σ_j Σ_k (yijk − ȳij0)²,
i.e., TSS = SSA + SSB + SS(AB) + SSE.
Degrees of freedom of TSS = pqm − 1
Degrees of freedom of SSA = p − 1
Degrees of freedom of SSB = q − 1
Degrees of freedom of SS(AB) = (p − 1)(q − 1)
Degrees of freedom of SSE = pq(m − 1)
Expectation of the sums of squares:

ȳi00 = (1/qm) Σ_j Σ_k yijk = (1/qm) Σ_j Σ_k (µ + ai + bj + cij + eijk) = µ + ai + b̄ + c̄i0 + ēi00,
ȳ0j0 = (1/pm) Σ_i Σ_k yijk = µ + bj + ē0j0   (since ā = 0 and c̄0j = 0 by the constraints),
ȳij0 = (1/m) Σ_k yijk = µ + ai + bj + cij + ēij0,
ȳ000 = (1/pqm) Σ_i Σ_j Σ_k yijk = µ + b̄ + ē000.

E[SSA] = qm Σ_{i=1}^{p} E(ȳi00 − ȳ000)²
= qm Σ_{i=1}^{p} E(µ + ai + b̄ + c̄i0 + ēi00 − µ − b̄ − ē000)²
= qm Σ_{i=1}^{p} E(ai + c̄i0 + ēi00 − ē000)²
= qm Σ_{i=1}^{p} [E(a²i) + E(c̄²i0) + E(ēi00 − ē000)²]   (the product terms vanish due to independence)
= qm [Σ_{i=1}^{p} a²i + Σ_{i=1}^{p} σi²/q + Σ_{i=1}^{p} {E(ē²i00) + E(ē²000) − 2E(ēi00 ē000)}].
Now
E(ēi00 ē000) = cov(ēi00, ē000) = cov(ēi00, (1/p) Σ_i ēi00) = (1/p) V(ēi00) = σe²/pqm.
Therefore,
E(SSA) = qm [Σ_{i=1}^{p} a²i + (1/q) Σ_{i=1}^{p} σi² + Σ_{i=1}^{p} (σe²/qm − σe²/pqm)]
= qm [(p − 1)σA² + ((p − 1)/q) σAB² + (p/qm) σe² ((p − 1)/p)]
= (p − 1)(qmσA² + mσAB² + σe²).
and
E(SSB) = pm Σ_{j=1}^{q} E(ȳ0j0 − ȳ000)²
= pm Σ_{j=1}^{q} E(µ + bj + ē0j0 − µ − b̄ − ē000)²
= pm Σ_{j=1}^{q} [E(bj − b̄)² + E(ē0j0 − ē000)² − 2E{(bj − b̄)(ē0j0 − ē000)}],
and the last expectation is 0. Now
E(bj − b̄)² = E(b²j) + E(b̄²) − 2E(bj b̄) = σB² + σB²/q − 2σB²/q = ((q − 1)/q) σB²,
E(ē0j0 − ē000)² = E(ē²0j0) + E(ē²000) − 2 cov(ē0j0, ē000) = σe²/pm + σe²/pqm − 2σe²/pqm = (σe²/pm)((q − 1)/q).
Hence
E(SSB) = pm Σ_{j=1}^{q} [((q − 1)/q) σB² + ((q − 1)/q)(σe²/pm)]
= pm(q − 1)σB² + (q − 1)σe² = (q − 1)(pmσB² + σe²).

Now
E[SS(AB)] = m Σ_i Σ_j E[ȳij0 − ȳi00 − ȳ0j0 + ȳ000]²
= m Σ_i Σ_j E[µ + ai + bj + cij + ēij0 − µ − ai − b̄ − c̄i0 − ēi00 − µ − bj − ē0j0 + µ + b̄ + ē000]²
= m Σ_i Σ_j E[(cij − c̄i0)² + (ēij0 − ēi00 − ē0j0 + ē000)²]
(the cross term vanishes), where E(ēij0 − ēi00 − ē0j0 + ē000)² = (p − 1)(q − 1)σe²/(pqm), as already computed.
Now
E[cij − c̄i0]² = E(c²ij) + E(c̄²i0) − 2 cov(cij, c̄i0) = σi² − σi²/q = ((q − 1)/q) σi².
Therefore,
E[SS(AB)] = m Σ_i Σ_j [((q − 1)/q) σi² + (p − 1)(q − 1)σe²/(pqm)]
= m(q − 1) Σ_{i=1}^{p} σi² + (p − 1)(q − 1)σe²
= m(p − 1)(q − 1)σAB² + (p − 1)(q − 1)σe²
= (p − 1)(q − 1)(mσAB² + σe²).

So finally,
E(MSA) = E[SSA/(p − 1)] = qmσA² + mσAB² + σe²,
E(MSB) = pmσB² + σe²,
E(MS(AB)) = mσAB² + σe²,
and E(MSE) = σe².
Hypotheses: Here we want to test H0A : σA² = 0 [i.e., ai = 0 ∀ i] vs H1A : σA² > 0,

H0B : σB² = 0 vs H1B : σB² > 0,
H0AB : σAB² = 0 vs H1AB : σAB² > 0.
Test statistic:
Under H0A,
E[MSA] = mσAB² + σe² = E[MS(AB)].
As we deviate from H0A,
E[MSA] ≥ E[MS(AB)],
so a right-tailed test based on MSA/MS(AB) is appropriate; under H0A,
MSA/MS(AB) ∼ F_{(p−1),(p−1)(q−1)}.
We reject H0A at level α if
MSA/MS(AB) > F_{α; (p−1),(p−1)(q−1)}.
Under H0B, E(MSB) = E(MSE) = σe², and as we deviate from H0B,
E(MSB) ≥ E(MSE),
so a right-tailed test based on MSB/MSE is appropriate. Similarly, for testing H0AB a right-tailed test based on MS(AB)/MSE is appropriate.

Remark:
In general MSA/MS(AB) does not have an exact F-distribution under H0A, so we consider an approximate F statistic with degrees of freedom (p − 1) and (p − 1)(q − 1).
Some important questions and answers:
1. What is a general linear hypothesis?

Let y1, y2, ..., yn be independently distributed normal variables with
E(yi) = ai1 β1 + ai2 β2 + ... + aip βp,
V(yi) = σ² ∀ i = 1(1)n,
cov(yi, yj) = 0 ∀ i ≠ j,
i.e.,
$$E(\mathbf{y})=\begin{pmatrix}E(y_1)\\E(y_2)\\\vdots\\E(y_n)\end{pmatrix}=\begin{pmatrix}a_{11}\beta_1+a_{12}\beta_2+\dots+a_{1p}\beta_p\\a_{21}\beta_1+a_{22}\beta_2+\dots+a_{2p}\beta_p\\\vdots\\a_{n1}\beta_1+a_{n2}\beta_2+\dots+a_{np}\beta_p\end{pmatrix}=X\boldsymbol\beta.\qquad(1)$$
X is called the design matrix containing known coefficients; β is the vector of unknown model parameters.
Let us consider that the parameters β are subject to m independent linear constraints
H_{m×p} β_{p×1} = h_{m×1}.   (2)
Now, for the linear model (1) and the restriction (2), we consider a hypothesis consisting of linear equations in the βi's, given as follows:
H0 : L_{t×p} β_{p×1} = l_{t×1},
where the t linear functions of β are assumed to be independent. It is necessary to assume that the row vectors of L are linearly dependent on the row vectors of X and H.
This set of hypotheses is called a general linear hypothesis.
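As a small worked example (not from the notes): in the one-way model with k = 3 level means β = (µ1, µ2, µ3)′, the hypothesis µ1 = µ2 = µ3 is the general linear hypothesis Lβ = l with t = 2 independent rows:

```latex
H_0 :\;
\underbrace{\begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \end{pmatrix}}_{L_{2\times 3}}
\begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{pmatrix}
=
\underbrace{\begin{pmatrix} 0 \\ 0 \end{pmatrix}}_{\boldsymbol{l}_{2\times 1}}
```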

2. Explain the concept of selection of a valid error.

In ANOVA, the valid error refers to the denominator of the test statistic for a given hypothesis. For most ANOVA models the mean square error serves as the valid error. We consider an example here where, besides the mean square error, some other quantity turns out to be the valid error: the two-way random effects model with m observations per cell.
yijk = µ + ai + bj + cij + eijk,  i = 1(1)p, j = 1(1)q, k = 1(1)m,
where
yijk = kth observation corresponding to the ith level of A and the jth level of B,
ai = random effect due to the ith level of A,
bj = random effect due to the jth level of B,
cij = random interaction effect due to the ith level of A and the jth level of B.
Orthogonal splitting of TSS:
Σ_i Σ_j Σ_k (yijk − ȳ000)² = mq Σ_{i=1}^{p} (ȳi00 − ȳ000)² + pm Σ_{j=1}^{q} (ȳ0j0 − ȳ000)² + m Σ_i Σ_j (ȳij0 − ȳi00 − ȳ0j0 + ȳ000)² + Σ_i Σ_j Σ_k (yijk − ȳij0)²,
i.e., TSS = SSA + SSB + SS(AB) + SSE.
Degrees of freedom of TSS = pqm − 1
Degrees of freedom of SSA = p − 1
Degrees of freedom of SSB = q − 1
Degrees of freedom of SS(AB) = (p − 1)(q − 1)
Degrees of freedom of SSE = pq(m − 1)
Assumptions:
(i) eijk ~ iid N(0, σe²), (ii) ai ~ iid N(0, σA²), (iii) bj ~ iid N(0, σB²), (iv) cij ~ iid N(0, σAB²).
It can be shown that
E[MS(AB)] = mσAB² + σe²,
E(MSA) = mσAB² + qmσA² + σe²,
E(MSB) = mσAB² + pmσB² + σe²,
E(MSE) = σe².
Hypotheses: Here we want to test
H0A : σA² = 0 vs H1A : σA² > 0,
H0B : σB² = 0 vs H1B : σB² > 0,
and H0AB : σAB² = 0 vs H1AB : σAB² > 0.
Test statistic:
Under H0A, E(MSA) = mσAB² + σe² = E(MS(AB)), and as we deviate away from H0A, E(MSA) ≥ E(MS(AB)); so a right-tailed test based on MSA/MS(AB) is appropriate. Under H0A, MSA/MS(AB) ∼ F_{p−1,(p−1)(q−1)}, and we reject H0A at level α if MSA/MS(AB) > F_{α; (p−1),(p−1)(q−1)}.
Similarly, under H0B, E(MSB) = E(MS(AB)) = σe² + mσAB², and as we deviate away from H0B, E(MSB) ≥ E(MS(AB)); a right-tailed test based on MSB/MS(AB) is appropriate, and we reject H0B if MSB/MS(AB) > F_{α; (q−1),(p−1)(q−1)}.
Again, under H0AB, E[MS(AB)] = E[MSE] = σe², and as we deviate away from H0AB, E[MS(AB)] > E[MSE]; thus a right-tailed test based on MS(AB)/MSE is appropriate, and we reject H0AB if MS(AB)/MSE > F_{α; (p−1)(q−1), pq(m−1)}.
To summarise: as we drift away from H0A and H0B, E(MSA) ≥ E(MS(AB)) and E(MSB) ≥ E(MS(AB)), i.e., right-tailed tests based on MSA/MS(AB) and MSB/MS(AB) are appropriate, and MS(AB) serves as the valid error for testing H0A and H0B. Whereas under H0AB, E(MS(AB)) = E(MSE), and as we drift away from H0AB, E(MS(AB)) ≥ E(MSE); hence a right-tailed test based on MS(AB)/MSE is appropriate and MSE serves as the valid error in this case. Hence the valid error changes over different testing problems.

3. What is orthogonal splitting?

In analysis of variance the main objective is to partition the total variability of a response into disjoint parts, where each part indicates the variability due to a certain effect. This partitioning is achieved via the orthogonal splitting of the total sum of squares: the total sum of squares represents the total variability, and via orthogonal splitting it is split into sums of squares due to the different sources of variation. Here the term 'orthogonal' indicates that the sums of squares due to the different sources are independent of each other.
In the one-way layout fixed effects model, the total sum of squares is partitioned into the sum of squares due to the single factor and the sum of squares due to error, as the total variability is caused by the single factor and the error. We can describe it as follows:
yij = ȳ00 + (ȳi0 − ȳ00) + eij
⇒ (yij − ȳ00) = (ȳi0 − ȳ00) + (yij − ȳi0).
Squaring and summing over i and j we get
Σ_{i=1}^{k} Σ_{j=1}^{ni} (yij − ȳ00)² = Σ_i Σ_j (ȳi0 − ȳ00)² + Σ_i Σ_j (yij − ȳi0)²   (the product term vanishes)
= Σ_{i=1}^{k} ni (ȳi0 − ȳ00)² + Σ_{i=1}^{k} Σ_{j=1}^{ni} (yij − ȳi0)²,
where
Σ_{i=1}^{k} ni (ȳi0 − ȳ00)² = sum of squares due to the factor A (SSA),
Σ_{i=1}^{k} Σ_{j=1}^{ni} (yij − ȳi0)² = sum of squares due to error (SSE),
i.e., TSS = SSA + SSE.

4. If the F-ratio is fractional, then what would be the interpretation?

A high value of the F-ratio indicates rejection of H0, and if the F-ratio equals one, the numerator of the test statistic equals the valid error, so the null hypothesis is trivially accepted.
Whenever the F-ratio is a fraction (less than one), it strongly suggests that the valid error dominates the effect of the factor. Hence the null hypothesis is comfortably accepted, i.e., the effect of the corresponding factor is judged to be absent.

5. In two-way classified data, if the equality of the levels of a certain factor gets rejected, then discuss how the significant effect can be traced out.

We consider two-way classified data with one observation per cell. The ANOVA model is given by
yij = µ + αi + βj + γij + eij, i = 1(1)p, j = 1(1)q,
where
µ = general effect,
αi = additional effect due to the ith level of A,
βj = additional effect due to the jth level of B,
γij = interaction effect due to the ith level of A and the jth level of B.
With one observation per cell we take γij = 0, as γij cannot be estimated, so the model becomes
yij = µ + αi + βj + eij.
Assumptions:
(i) Σ_{i=1}^{p} αi = 0, (ii) Σ_{j=1}^{q} βj = 0, (iii) eij ~ iid N(0, σ²).
Hypotheses:
(i) H01 : αi = 0 ∀ i vs H1 : at least one inequality in H01 (the factor A has no effect);
(ii) H02 : βj = 0 ∀ j vs H1 : at least one inequality in H02 (the factor B has no effect).
Under H01, E(MSA) = σ² = E(MSE); as we deviate away from H01, E(MSA) ≥ σ² = E(MSE). So intuitively a large value of MSA/MSE indicates rejection of H01.

Under H01,
SSA/σ² ∼ χ²_{p−1} and SSE/σ² ∼ χ²_{(p−1)(q−1)}, independently.
Now
F1 = (SSA/(p − 1)) / (SSE/((p − 1)(q − 1))) = MSA/MSE ∼ F_{p−1,(p−1)(q−1)},
so we reject H01 at size α if F1 > F_{α; p−1,(p−1)(q−1)}.
If H01 is rejected then a paired comparison is required. For comparing the ith class with the i′th class we have the following hypothesis:
H0 : αi = αi′ vs H1 : αi ≠ αi′.
The corresponding test statistic is
t = (ȳi0 − ȳi′0) / √{MSE (1/q + 1/q)} ∼ t_{(p−1)(q−1)}.
We reject H0 at level α if
|ȳi0 − ȳi′0| > √{2 MSE/q} · t_{α/2; (p−1)(q−1)};
this quantity is called the least significant difference.

6. In a two-way layout with m observations per cell, which hypothesis should be tested first, and why?

The hypothesis of no interaction should be tested first. If it is rejected, then tests for the individual effects are not worth making: under the presence of interaction, if a particular level of A is found to be the best, there is no way to detect that it will remain the best for each level of B, and the same holds for the factor B as well. So under the presence of interaction it is suggested to perform a one-way ANOVA on the factor A for a particular level of B, or a one-way ANOVA on the factor B for a particular level of A.
In the current set-up, tests for the individual effects can be carried out only if the interaction is tested to be absent, since in that case, under the respective null hypotheses,
E[MSA] = E[MS(AB)] and E(MSB) = E[MS(AB)].
