Analysis of Variance (ANOVA)
Note:

1. All the students of this batch are requested to keep a copy of the main class note in order to check for any discrepancy.

2. For readers outside the batch: if you find any mistake regarding anything in these notes, then before taking a screenshot and making fun of it, report it to me directly, because remember that you are reading this because your own notes are not adequate.
$$E(y_i) = a_{i1}\beta_1 + a_{i2}\beta_2 + \cdots + a_{ip}\beta_p$$
$$V(y_i) = \sigma^2$$
$$\operatorname{cov}(y_i, y_j) = 0 \quad \forall\, i \neq j$$

where $\beta_1, \beta_2, \ldots, \beta_p$ are model parameters and $a_{i1}, a_{i2}, \ldots, a_{ip}$ are coefficients which are known.
Define,

$$X_{n\times p} = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1p}\\ a_{21} & a_{22} & \dots & a_{2p}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \dots & a_{np} \end{pmatrix}, \qquad \underset{\sim}{y} = \begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{pmatrix}$$

$$E(\underset{\sim}{y}) = \begin{pmatrix} E(y_1)\\ E(y_2)\\ \vdots\\ E(y_n) \end{pmatrix} = \begin{pmatrix} a_{11}\beta_1 + a_{12}\beta_2 + \cdots + a_{1p}\beta_p\\ a_{21}\beta_1 + a_{22}\beta_2 + \cdots + a_{2p}\beta_p\\ \vdots\\ a_{n1}\beta_1 + a_{n2}\beta_2 + \cdots + a_{np}\beta_p \end{pmatrix}$$
parameters.
If the values of the coefficients are binary, i.e., 0 or 1, then the model is called an analysis of variance (ANOVA) model. By an ANOVA model we test whether an effect is absent or present.

When the values of the coefficients come from some other independent covariates, i.e., the coefficients take usual continuous values, then the model is called a regression model. Hence by a regression model the impact of an effect can be judged.

If some of the coefficient values are binary and some are continuous, then the model is called an analysis of covariance (ANCOVA) model.

Motivation: Let us consider a single medicine which controls fever and has four different dose levels:

Drug A, dose levels: 10 mg, 15 mg, 25 mg, 50 mg.
Here fever is influenced by a single medicine (factor), and the factor has fixed levels. Hence the dependence of fever on medicine has one-way variation, viz. medicine. Suppose there is a single factor A with $k$ fixed levels $A_1, A_2, \ldots, A_k$, say.
For the $i$th level, let there be $n_i$ observations $y_{i1}, y_{i2}, \ldots, y_{in_i}$, $i = 1(1)k$. We represent the observations in the following data array.
$A_1:\ y_{11}, y_{12}, \ldots, y_{1n_1}$
$A_2:\ y_{21}, y_{22}, \ldots, y_{2n_2}$
$\vdots$
$A_k:\ y_{k1}, y_{k2}, \ldots, y_{kn_k}$

$\sum_{i=1}^{k} n_i = n$ is the total number of responses.
The one-way fixed effects model is given by

$$y_{ij} = \mu_i + e_{ij}, \quad i = 1(1)k,\ j = 1(1)n_i \tag{1}$$

$y_{ij}$ = response corresponding to the $j$th observation of the $i$th level of A
$\mu_i$ = effect due to the $i$th level of A

Next we reparametrize the model as follows:

$$y_{ij} = \mu + (\mu_i - \mu) + e_{ij} = \mu + \alpha_i + e_{ij}$$

$\alpha_i$ = additional effect due to the $i$th level of A
$\mu$ = general effect
$e_{ij}$ = error in the model

Reparametrization essentially leads to separation of the error and the exact effect due to the factor A.
Assumption:

1. $\sum_{i=1}^{k} n_i \alpha_i = \sum_{i=1}^{k} n_i(\mu_i - \mu) = \sum_{i=1}^{k} n_i \mu_i - \mu \sum_{i=1}^{k} n_i = n\mu - n\mu = 0$, where $\mu = \dfrac{\sum n_i \mu_i}{n}$ = grand mean of effects.

2. $e_{ij} \overset{iid}{\sim} N(0, \sigma^2)$
$$\frac{\partial E}{\partial \mu} = 0 \Rightarrow \frac{\partial}{\partial \mu}\sum_i\sum_j (y_{ij} - \mu - \alpha_i)^2 = 0$$
$$\Rightarrow \sum_i\sum_j (y_{ij} - \mu - \alpha_i)(-1) = 0$$
$$\Rightarrow \sum_i\sum_j y_{ij} = \sum_i\sum_j \mu + \sum_i\sum_j \alpha_i$$
$$\Rightarrow \sum_i\sum_j y_{ij} = n\mu + \underbrace{\sum_{i=1}^{k} n_i\alpha_i}_{0}$$
$$\Rightarrow \hat{\mu} = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij} = \bar{y}_{00}$$
$$\frac{\partial}{\partial \alpha_i}\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \mu - \alpha_i)^2 = 0$$
$$\Rightarrow \sum_{j=1}^{n_i} (y_{ij} - \mu - \alpha_i)(-1) = 0 \quad \left[\text{here we differentiate w.r.t. } \alpha_i \text{ for a specific value of } i = 1(1)k, \text{ so the summation over } i \text{ vanishes}\right]$$
$$\Rightarrow \sum_{j=1}^{n_i} y_{ij} - \sum_{j=1}^{n_i}\mu - \sum_{j=1}^{n_i}\alpha_i = 0$$
$$\Rightarrow \sum_{j=1}^{n_i} y_{ij} - n_i\mu = n_i\alpha_i$$
$$\Rightarrow \hat{\alpha}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij} - \hat{\mu} = \bar{y}_{i0} - \bar{y}_{00}$$
Now

$$\hat{e}_{ij} = y_{ij} - \hat{\alpha}_i - \hat{\mu} = y_{ij} - (\bar{y}_{i0} - \bar{y}_{00}) - \bar{y}_{00} = y_{ij} - \bar{y}_{i0}$$

i.e., from (1),

$$y_{ij} = \bar{y}_{00} + (\bar{y}_{i0} - \bar{y}_{00}) + \hat{e}_{ij}$$
$$\Rightarrow (y_{ij} - \bar{y}_{00}) = (\bar{y}_{i0} - \bar{y}_{00}) + (y_{ij} - \bar{y}_{i0})$$
Squaring and summing over $i$ and $j$ we get

$$\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{00})^2 = \sum_{i=1}^{k} n_i(\bar{y}_{i0} - \bar{y}_{00})^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i0})^2$$

where

$$\sum_{i=1}^{k} n_i(\bar{y}_{i0} - \bar{y}_{00})^2 = \text{sum of squares due to the factor A (SSA)}$$

$$\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i0})^2 = \text{sum of squares due to the error (SSE)}$$

i.e., $TSS = SSA + SSE$.

This is called the orthogonal splitting of the total sum of squares.
Now note that TSS carries $(n-1)$ degrees of freedom and SSA carries $(k-1)$ degrees of freedom. Hence SSE carries $(n-k)$ degrees of freedom.
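The splitting above is an exact algebraic identity, so it can be checked numerically. A minimal sketch, assuming a made-up unbalanced one-way layout (group sizes and means are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
# unbalanced one-way layout: k = 3 levels with n_i observations each
groups = [rng.normal(loc=mu, scale=1.0, size=n)
          for mu, n in [(5.0, 4), (6.0, 6), (4.5, 5)]]

y_all = np.concatenate(groups)
grand_mean = y_all.mean()                                   # ybar_00

TSS = ((y_all - grand_mean) ** 2).sum()
SSA = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
SSE = sum(((g - g.mean()) ** 2).sum() for g in groups)

# the cross term vanishes, so TSS equals SSA + SSE up to rounding
print(abs(TSS - (SSA + SSE)))
```

The printed difference is zero up to floating-point rounding, for any data, which is what "orthogonal splitting" asserts.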
Hypothesis: Here we want to test whether all levels of A have a similar effect or not:

$$H_0: \alpha_i = 0 \ \forall\, i \quad \text{vs} \quad H_1: \text{at least one inequality in } H_0.$$

[In terms of the original model $y_{ij} = \mu_i + e_{ij}$, $\alpha_i = \mu_i - \mu = 0 \ \forall\, i$ means $\mu_1 = \mu_2 = \cdots = \mu_k$.]

Recall $SSA = \sum_{i=1}^{k} n_i(\bar{y}_{i0} - \bar{y}_{00})^2$.
Note that $y_{ij} = \mu + \alpha_i + e_{ij}$. Then

$$\bar{y}_{i0} = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij} = \frac{1}{n_i}\sum_{j=1}^{n_i}(\mu + \alpha_i + e_{ij}) = \mu + \alpha_i + \underbrace{\frac{1}{n_i}\sum_{j=1}^{n_i} e_{ij}}_{\bar{e}_{i0}} = \mu + \alpha_i + \bar{e}_{i0}$$

$$\bar{y}_{00} = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij} = \mu + \underbrace{\frac{1}{n}\sum_{i=1}^{k} n_i\alpha_i}_{0} + \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} e_{ij} = \mu + \bar{e}_{00}$$
i.e.,

$$SSA = \sum_{i=1}^{k} n_i\{(\mu + \alpha_i + \bar{e}_{i0}) - (\mu + \bar{e}_{00})\}^2 = \sum_{i=1}^{k} n_i\{\alpha_i + \bar{e}_{i0} - \bar{e}_{00}\}^2$$
$$= \sum_{i=1}^{k} n_i\left\{\alpha_i^2 + (\bar{e}_{i0} - \bar{e}_{00})^2 + 2\alpha_i(\bar{e}_{i0} - \bar{e}_{00})\right\}$$

$$\therefore E(SSA) = \sum_{i=1}^{k} n_i\alpha_i^2 + \sum_{i=1}^{k} n_i\, E(\bar{e}_{i0} - \bar{e}_{00})^2 + 2\sum_{i=1}^{k} n_i\alpha_i\underbrace{E(\bar{e}_{i0} - \bar{e}_{00})}_{0}$$

[since $E(\bar{e}_{i0}) = 0$ and $E(\bar{e}_{00}) = 0$]
Therefore,

$$E(SSA) = \sum_{i=1}^{k} n_i\alpha_i^2 + \sum_{i=1}^{k} n_i\, E(\bar{e}_{i0} - \bar{e}_{00})^2$$

$$E(\bar{e}_{i0} - \bar{e}_{00})^2 = E(\bar{e}_{i0}^2) + E(\bar{e}_{00}^2) - 2E(\bar{e}_{i0}\bar{e}_{00}) = V(\bar{e}_{i0}) + V(\bar{e}_{00}) - 2\operatorname{cov}(\bar{e}_{i0}, \bar{e}_{00})$$
$$= \frac{\sigma^2}{n_i} + \frac{\sigma^2}{n} - 2\frac{\sigma^2}{n} = \frac{\sigma^2}{n_i} - \frac{\sigma^2}{n}$$

$$\left[\operatorname{cov}(\bar{e}_{i0}, \bar{e}_{00}) = \operatorname{cov}\!\left(\bar{e}_{i0}, \frac{1}{n}\sum_{i=1}^{k} n_i\bar{e}_{i0}\right) = \frac{1}{n}\, n_i V(\bar{e}_{i0}) = \frac{1}{n}\cdot n_i\frac{\sigma^2}{n_i} = \frac{\sigma^2}{n}\right]$$
Therefore,

$$E(SSA) = \sum_{i=1}^{k} n_i\alpha_i^2 + \sum_{i=1}^{k} n_i\left(\frac{\sigma^2}{n_i} - \frac{\sigma^2}{n}\right) = \sum_{i=1}^{k} n_i\alpha_i^2 + k\sigma^2 - \frac{\sigma^2}{n}\cdot n = \sum_{i=1}^{k} n_i\alpha_i^2 + (k-1)\sigma^2$$
Again, $y_{ij} = \mu + \alpha_i + e_{ij}$ and $\bar{y}_{i0} = \mu + \alpha_i + \bar{e}_{i0}$, so

$$E(SSE) = E\sum_{i=1}^{k}\sum_{j=1}^{n_i}(e_{ij} - \bar{e}_{i0})^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left[E(e_{ij}^2) + E(\bar{e}_{i0}^2) - 2E(e_{ij}\bar{e}_{i0})\right]$$

$$E(e_{ij}^2) = V(e_{ij}) = \sigma^2, \qquad E(\bar{e}_{i0}^2) = V(\bar{e}_{i0}) = \frac{\sigma^2}{n_i}$$

$$E(e_{ij}\bar{e}_{i0}) = E\!\left(e_{ij}\cdot\frac{1}{n_i}\sum_{j=1}^{n_i} e_{ij}\right) = \frac{1}{n_i}E(e_{ij}^2) = \frac{\sigma^2}{n_i}$$
Therefore,

$$E(SSE) = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(\sigma^2 + \frac{\sigma^2}{n_i} - 2\frac{\sigma^2}{n_i}\right) = \sum_{i=1}^{k}\left(n_i\sigma^2 - \sigma^2\right) = \sigma^2\sum_{i=1}^{k} n_i - \sigma^2\sum_{i=1}^{k} 1 = \sigma^2(n-k)$$
Note that

$$E(SSA) = \sum_{i=1}^{k} n_i\alpha_i^2 + (k-1)\sigma^2$$
$$\Rightarrow E\!\left(\frac{SSA}{k-1}\right) = \frac{1}{k-1}\sum_{i=1}^{k} n_i\alpha_i^2 + \sigma^2$$
$$\Rightarrow E(MSA) = \frac{1}{k-1}\sum_{i=1}^{k} n_i\alpha_i^2 + \sigma^2$$

[$(k-1)$ is the degrees of freedom corresponding to SSA; MSA = mean square due to A.]

Again,

$$E(SSE) = (n-k)\sigma^2 \Rightarrow E\!\left(\frac{SSE}{n-k}\right) = \sigma^2 \quad (\text{MSE: mean square error}) \Rightarrow E(MSE) = \sigma^2$$
We define the test statistic $F = \dfrac{MSA}{MSE}$; a large value of $F$ indicates the rejection of $H_0$. So a right-tailed test based on the $F$ statistic under $H_0$ is appropriate.

Under $H_0$,

$$\frac{SSA}{\sigma^2} \sim \chi^2_{k-1} \quad \text{and} \quad \frac{SSE}{\sigma^2} \sim \chi^2_{n-k}, \quad \text{independently}$$

$$F = \frac{SSA/(k-1)}{SSE/(n-k)} \sim F_{k-1,\,n-k}$$

i.e., we reject $H_0$ at level $\alpha$ if $F > F_{\alpha;\,k-1,\,n-k}$.
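The F test above can be sketched in a few lines. This is an illustrative simulation (the group means and sizes are made up), cross-checked against SciPy's built-in one-way ANOVA:

```python
import numpy as np
from scipy import stats

# simulated one-way layout: k = 3 groups, the third mean shifted
rng = np.random.default_rng(1)
groups = [rng.normal(m, 1.0, size=8) for m in (0.0, 0.0, 1.5)]

k = len(groups)
n = sum(len(g) for g in groups)
gm = np.concatenate(groups).mean()                    # ybar_00

SSA = sum(len(g) * (g.mean() - gm) ** 2 for g in groups)
SSE = sum(((g - g.mean()) ** 2).sum() for g in groups)
F = (SSA / (k - 1)) / (SSE / (n - k))                 # MSA / MSE
p_value = stats.f.sf(F, k - 1, n - k)                 # right-tailed test

# cross-check against scipy's built-in one-way ANOVA
F_ref, p_ref = stats.f_oneway(*groups)
```

`stats.f.sf` is the upper tail of the $F_{k-1,n-k}$ distribution, so `p_value` is the right-tailed p-value; it agrees with `f_oneway` to rounding.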
Paired comparison:
If $H_0$ is rejected then the class means differ from each other, so a paired comparison is required.
For comparing the $i$th class with the $i'$th class we have the following hypothesis:

$$H_0: \mu_i = \mu_{i'} \quad \text{vs} \quad H_1: \mu_i \neq \mu_{i'}$$
Suppose the marks of students vary over the batches of two teachers. Then the marks are influenced by two factors, and the situation is modelled by a two-way fixed effects ANOVA model with one observation per cell.

      B_1     B_2    ...   B_q
A_1   y_{11}  y_{12} ...  y_{1q}
A_2   y_{21}  y_{22} ...  y_{2q}
...
A_p   y_{p1}  y_{p2} ...  y_{pq}
Model:
$y_{ij}$ = observation corresponding to the $i$th level of A and the $j$th level of B.
Here the model is $y_{ij} = \mu_{ij} + e_{ij}$, $i = 1(1)p$, $j = 1(1)q$.
We reparametrize the model as follows:

$$y_{ij} = \mu + (\mu_{i0} - \mu) + (\mu_{0j} - \mu) + (\mu_{ij} - \mu_{i0} - \mu_{0j} + \mu) + e_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij} + e_{ij}$$

$\mu$ = general effect
$\alpha_i$ = additional effect due to the $i$th level of A
$\beta_j$ = additional effect due to the $j$th level of B
$\gamma_{ij}$ = interaction effect due to the $i$th level of A and the $j$th level of B

With one observation per cell $\gamma_{ij}$ cannot be estimated, so we take $\gamma_{ij} = 0$ and work with $y_{ij} = \mu + \alpha_i + \beta_j + e_{ij}$.

Assumptions:

$$\text{(i)}\ \sum_{i=1}^{p}\alpha_i = 0 \qquad \text{(ii)}\ \sum_{j=1}^{q}\beta_j = 0 \qquad \text{(iii)}\ e_{ij} \overset{iid}{\sim} N(0, \sigma^2)$$

Model parameter estimation:

$$E = \sum_{i=1}^{p}\sum_{j=1}^{q} e_{ij}^2 = \sum_{i=1}^{p}\sum_{j=1}^{q}(y_{ij} - \mu - \alpha_i - \beta_j)^2$$
$$\frac{\partial E}{\partial \mu} = 0 \Rightarrow \sum_{i=1}^{p}\sum_{j=1}^{q}(y_{ij} - \mu - \alpha_i - \beta_j)(-1) = 0$$
$$\Rightarrow \sum_{i=1}^{p}\sum_{j=1}^{q} y_{ij} - pq\mu - q\underbrace{\sum_{i=1}^{p}\alpha_i}_{0} - p\underbrace{\sum_{j=1}^{q}\beta_j}_{0} = 0$$
$$\Rightarrow \hat{\mu} = \frac{1}{pq}\sum_{i=1}^{p}\sum_{j=1}^{q} y_{ij} = \bar{y}_{00}$$

$$\frac{\partial E}{\partial \alpha_i} = 0 \Rightarrow \frac{\partial}{\partial \alpha_i}\sum_{i=1}^{p}\sum_{j=1}^{q}(y_{ij} - \mu - \alpha_i - \beta_j)^2 = 0$$
$$\Rightarrow \sum_{j=1}^{q} y_{ij} = q\mu + q\alpha_i + \underbrace{\sum_{j=1}^{q}\beta_j}_{0}$$
$$\Rightarrow \hat{\alpha}_i = \frac{1}{q}\sum_{j=1}^{q} y_{ij} - \hat{\mu} = \bar{y}_{i0} - \bar{y}_{00}$$

Similarly, $\hat{\beta}_j = \bar{y}_{0j} - \bar{y}_{00}$, where $\bar{y}_{0j} = \dfrac{1}{p}\sum_{i=1}^{p} y_{ij}$.
Orthogonal splitting of the total sum of squares:

Note that $y_{ij} = \hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j + \hat{e}_{ij}$,
i.e., $y_{ij} = \bar{y}_{00} + (\bar{y}_{i0} - \bar{y}_{00}) + (\bar{y}_{0j} - \bar{y}_{00}) + (y_{ij} - \bar{y}_{i0} - \bar{y}_{0j} + \bar{y}_{00})$
$\Rightarrow (y_{ij} - \bar{y}_{00}) = (\bar{y}_{i0} - \bar{y}_{00}) + (\bar{y}_{0j} - \bar{y}_{00}) + (y_{ij} - \bar{y}_{i0} - \bar{y}_{0j} + \bar{y}_{00})$

$$\Rightarrow \sum_{i=1}^{p}\sum_{j=1}^{q}(y_{ij} - \bar{y}_{00})^2 = q\sum_{i=1}^{p}(\bar{y}_{i0} - \bar{y}_{00})^2 + p\sum_{j=1}^{q}(\bar{y}_{0j} - \bar{y}_{00})^2 + \sum_{i=1}^{p}\sum_{j=1}^{q}(y_{ij} - \bar{y}_{i0} - \bar{y}_{0j} + \bar{y}_{00})^2$$

Degrees of freedom corresponding to TSS → $pq - 1$
Degrees of freedom corresponding to SSA → $p - 1$
Degrees of freedom corresponding to SSB → $q - 1$
Degrees of freedom corresponding to error → $(p-1)(q-1)$
Expectation of SS:

$$E(SSE) = E\sum_{i=1}^{p}\sum_{j=1}^{q}(y_{ij} - \bar{y}_{i0} - \bar{y}_{0j} + \bar{y}_{00})^2$$

Since $y_{ij} = \mu + \alpha_i + \beta_j + e_{ij}$,

$$\bar{y}_{i0} = \frac{1}{q}\sum_{j=1}^{q} y_{ij} = \mu + \alpha_i + \underbrace{\frac{1}{q}\sum_{j=1}^{q}\beta_j}_{0} + \frac{1}{q}\sum_{j=1}^{q} e_{ij} = \mu + \alpha_i + \bar{e}_{i0}$$

Similarly,

$$\bar{y}_{0j} = \mu + \beta_j + \bar{e}_{0j}, \qquad \bar{y}_{00} = \mu + \bar{e}_{00}$$
SSE:

$$\sum_{i=1}^{p}\sum_{j=1}^{q}(\mu + \alpha_i + \beta_j + e_{ij} - \mu - \alpha_i - \bar{e}_{i0} - \mu - \beta_j - \bar{e}_{0j} + \mu + \bar{e}_{00})^2$$
$$= \sum_{i=1}^{p}\sum_{j=1}^{q}(e_{ij} - \bar{e}_{i0} - \bar{e}_{0j} + \bar{e}_{00})^2$$
$$= \sum_{i=1}^{p}\sum_{j=1}^{q}\left(e_{ij}^2 + \bar{e}_{i0}^2 + \bar{e}_{0j}^2 + \bar{e}_{00}^2 - 2e_{ij}\bar{e}_{i0} - 2e_{ij}\bar{e}_{0j} + 2e_{ij}\bar{e}_{00} + 2\bar{e}_{i0}\bar{e}_{0j} - 2\bar{e}_{i0}\bar{e}_{00} - 2\bar{e}_{0j}\bar{e}_{00}\right)$$
Taking expectations term by term,

$$E(SSE) = \sum_{i=1}^{p}\sum_{j=1}^{q}\left[E(e_{ij}^2) + E(\bar{e}_{i0}^2) + E(\bar{e}_{0j}^2) + E(\bar{e}_{00}^2) - 2\operatorname{cov}(e_{ij},\bar{e}_{i0}) - 2\operatorname{cov}(e_{ij},\bar{e}_{0j}) + 2\operatorname{cov}(e_{ij},\bar{e}_{00}) + 2\operatorname{cov}(\bar{e}_{i0},\bar{e}_{0j}) - 2\operatorname{cov}(\bar{e}_{i0},\bar{e}_{00}) - 2\operatorname{cov}(\bar{e}_{0j},\bar{e}_{00})\right]$$

$$\operatorname{cov}(e_{ij},\bar{e}_{i0}) = \operatorname{cov}\!\left(e_{ij}, \frac{1}{q}\sum_{j=1}^{q} e_{ij}\right) = \frac{1}{q}E(e_{ij}^2) = \frac{1}{q}V(e_{ij}) = \frac{\sigma^2}{q}$$

$$\operatorname{cov}(e_{ij},\bar{e}_{0j}) = \operatorname{cov}\!\left(e_{ij}, \frac{1}{p}\sum_{i=1}^{p} e_{ij}\right) = \frac{1}{p}V(e_{ij}) = \frac{\sigma^2}{p}$$

$$\operatorname{cov}(e_{ij},\bar{e}_{00}) = \operatorname{cov}\!\left(e_{ij}, \frac{1}{pq}\sum_{i=1}^{p}\sum_{j=1}^{q} e_{ij}\right) = \frac{1}{pq}V(e_{ij}) = \frac{\sigma^2}{pq}$$

$$\operatorname{cov}(\bar{e}_{i0},\bar{e}_{0j}) = \operatorname{cov}\!\left(\frac{1}{q}\sum_{j=1}^{q} e_{ij}, \frac{1}{p}\sum_{i=1}^{p} e_{ij}\right) = \frac{1}{pq}V(e_{ij}) = \frac{\sigma^2}{pq}$$

$$\operatorname{cov}(\bar{e}_{i0},\bar{e}_{00}) = \operatorname{cov}\!\left(\bar{e}_{i0}, \frac{1}{p}\sum_{i=1}^{p}\bar{e}_{i0}\right) = \frac{1}{p}\cdot\frac{\sigma^2}{q} = \frac{\sigma^2}{pq}$$

Similarly,

$$\operatorname{cov}(\bar{e}_{0j},\bar{e}_{00}) = \operatorname{cov}\!\left(\bar{e}_{0j}, \frac{1}{q}\sum_{j=1}^{q}\bar{e}_{0j}\right) = \frac{1}{q}\cdot\frac{\sigma^2}{p} = \frac{\sigma^2}{pq}$$
$$E(SSE) = \sum_{i=1}^{p}\sum_{j=1}^{q}\left[\sigma^2 + \frac{\sigma^2}{q} + \frac{\sigma^2}{p} + \frac{\sigma^2}{pq} - 2\frac{\sigma^2}{q} - 2\frac{\sigma^2}{p} + 2\frac{\sigma^2}{pq} + 2\frac{\sigma^2}{pq} - 2\frac{\sigma^2}{pq} - 2\frac{\sigma^2}{pq}\right]$$
$$= \sum_{i=1}^{p}\sum_{j=1}^{q}\left[\sigma^2 - \frac{\sigma^2}{p} - \frac{\sigma^2}{q} + \frac{\sigma^2}{pq}\right] = \frac{pq\sigma^2(p-1)(q-1)}{pq} = \sigma^2(p-1)(q-1)$$

i.e., $E(SSE) = \sigma^2(p-1)(q-1) \Rightarrow E\!\left(\dfrac{SSE}{(p-1)(q-1)}\right) = \sigma^2 \Rightarrow E(MSE) = \sigma^2$.
$$E(SSA) = E\!\left[q\sum_{i=1}^{p}(\bar{y}_{i0} - \bar{y}_{00})^2\right] = E\!\left[q\sum_{i=1}^{p}(\mu + \alpha_i + \bar{e}_{i0} - \mu - \bar{e}_{00})^2\right]$$
$$= E\!\left[q\sum_{i=1}^{p}(\alpha_i + \bar{e}_{i0} - \bar{e}_{00})^2\right]$$
$$= E\!\left[q\sum_{i=1}^{p}\left(\alpha_i^2 + \bar{e}_{i0}^2 + \bar{e}_{00}^2 - 2\bar{e}_{i0}\bar{e}_{00} + 2\alpha_i\bar{e}_{i0} - 2\alpha_i\bar{e}_{00}\right)\right]$$
$$= q\sum_{i=1}^{p}\left[\alpha_i^2 + E(\bar{e}_{i0}^2) + E(\bar{e}_{00}^2) - 2\operatorname{cov}(\bar{e}_{i0},\bar{e}_{00})\right]$$
$$= q\sum_{i=1}^{p}\alpha_i^2 + q\sum_{i=1}^{p}\left(\frac{\sigma^2}{q} - \frac{\sigma^2}{pq}\right) = q\sum_{i=1}^{p}\alpha_i^2 + p\sigma^2 - \sigma^2 = q\sum_{i=1}^{p}\alpha_i^2 + (p-1)\sigma^2$$

$$E\!\left(\frac{SSA}{p-1}\right) = \frac{q}{p-1}\sum_{i=1}^{p}\alpha_i^2 + \sigma^2 \Rightarrow E(MSA) = \frac{q}{p-1}\sum_{i=1}^{p}\alpha_i^2 + \sigma^2$$

Similarly, $E(MSB) = \dfrac{p}{q-1}\sum_{j=1}^{q}\beta_j^2 + \sigma^2$.
Hypothesis:
Here we want to test

(i) $H_{01}: \alpha_i = 0 \ \forall\, i$ vs $H_1$: at least one inequality in $H_{01}$ (the factor A has no effect);
(ii) $H_{02}: \beta_j = 0 \ \forall\, j$ vs $H_1$: at least one inequality in $H_{02}$ (the factor B has no effect).

Under $H_{01}$, $E(MSA) = \sigma^2$; as we deviate away from $H_{01}$, $E(MSA) \geq \sigma^2 = E(MSE)$.
So intuitively a large value of $\dfrac{MSA}{MSE}$ indicates the rejection of $H_{01}$.
Under $H_{01}$,

$$\frac{SSA}{\sigma^2} \sim \chi^2_{p-1} \quad \text{and} \quad \frac{SSE}{\sigma^2} \sim \chi^2_{(p-1)(q-1)}, \quad \text{independently}$$

Now

$$F_1 = \frac{SSA/(p-1)}{SSE/((p-1)(q-1))} = \frac{MSA}{MSE} \sim F_{p-1,\,(p-1)(q-1)}$$

So we reject $H_{01}$ at size $\alpha$ if $F_1 > F_{\alpha;\,p-1,(p-1)(q-1)}$.

Similarly, define $F_2 = \dfrac{MSB}{MSE} \sim F_{q-1,\,(p-1)(q-1)}$ under $H_{02}$; we reject $H_{02}$ at size $\alpha$ if $F_2 > F_{\alpha;\,q-1,(p-1)(q-1)}$.
Two-way fixed effects model, m observations per cell

Suppose again that the marks of students vary over the batches of two teachers; the marks are influenced by two factors, and now the layout has m observations per cell.

Model:
We consider a factor A having $p$ fixed levels $A_1, \ldots, A_p$ and another factor B with $q$ fixed levels $B_1, B_2, \ldots, B_q$. Corresponding to the $i$th level of A and the $j$th level of B there are $m$ observations, $\forall\, i, j$:

$$y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + e_{ijk}, \quad i = 1(1)p,\ j = 1(1)q,\ k = 1(1)m$$

$y_{ijk}$ = $k$th observation corresponding to the $i$th level of A and the $j$th level of B
$\mu$ = general effect
$\alpha_i$ = additional effect due to the $i$th level of A
$\beta_j$ = additional effect due to the $j$th level of B
$\gamma_{ij}$ = interaction effect due to the $i$th level of A and the $j$th level of B
$e_{ijk}$ = error in the model
Here we can estimate $\gamma_{ij}$, and hence we incorporate it in the model.

Assumptions:

$$\text{(i)}\ \sum_{i=1}^{p}\alpha_i = 0 \qquad \text{(ii)}\ \sum_{j=1}^{q}\beta_j = 0 \qquad \text{(iii)}\ \sum_{i=1}^{p}\gamma_{ij} = 0\ \forall\, j \text{ and } \sum_{j=1}^{q}\gamma_{ij} = 0\ \forall\, i \qquad \text{(iv)}\ e_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$$
Estimation of model parameters:

$$\hat{\mu} = \bar{y}_{000} = \frac{1}{pqm}\sum_{i=1}^{p}\sum_{j=1}^{q}\sum_{k=1}^{m} y_{ijk}$$

$$\hat{\alpha}_i = \bar{y}_{i00} - \bar{y}_{000}, \qquad \bar{y}_{i00} = \frac{1}{qm}\sum_{j=1}^{q}\sum_{k=1}^{m} y_{ijk}$$

$$\hat{\beta}_j = \bar{y}_{0j0} - \bar{y}_{000}, \qquad \bar{y}_{0j0} = \frac{1}{pm}\sum_{i=1}^{p}\sum_{k=1}^{m} y_{ijk}$$

$$E = \sum_{i=1}^{p}\sum_{j=1}^{q}\sum_{k=1}^{m}(y_{ijk} - \mu - \alpha_i - \beta_j - \gamma_{ij})^2$$

$$\frac{\partial E}{\partial \gamma_{ij}} = \sum_{k=1}^{m}(-2)(y_{ijk} - \mu - \alpha_i - \beta_j - \gamma_{ij}) = 0$$
$$\Rightarrow \sum_{k=1}^{m} y_{ijk} = m\hat{\mu} + m\hat{\alpha}_i + m\hat{\beta}_j + m\hat{\gamma}_{ij}$$
$$\Rightarrow \hat{\gamma}_{ij} = \frac{1}{m}\sum_{k=1}^{m} y_{ijk} - (\bar{y}_{i00} - \bar{y}_{000}) - (\bar{y}_{0j0} - \bar{y}_{000}) - \bar{y}_{000}$$
$$\Rightarrow \hat{\gamma}_{ij} = \bar{y}_{ij0} - \bar{y}_{i00} - \bar{y}_{0j0} + \bar{y}_{000}$$
Orthogonal splitting of the total sum of squares:

$$\sum_{i=1}^{p}\sum_{j=1}^{q}\sum_{k=1}^{m}(y_{ijk} - \bar{y}_{000})^2 = qm\sum_{i=1}^{p}(\bar{y}_{i00} - \bar{y}_{000})^2 + pm\sum_{j=1}^{q}(\bar{y}_{0j0} - \bar{y}_{000})^2$$
$$+\ m\sum_{i=1}^{p}\sum_{j=1}^{q}(\bar{y}_{ij0} - \bar{y}_{i00} - \bar{y}_{0j0} + \bar{y}_{000})^2 + \sum_{i=1}^{p}\sum_{j=1}^{q}\sum_{k=1}^{m}(y_{ijk} - \bar{y}_{ij0})^2$$

Expectation of SS:

$$SSE = \sum_{i=1}^{p}\sum_{j=1}^{q}\sum_{k=1}^{m}(y_{ijk} - \bar{y}_{ij0})^2$$
$$E(SSE) = \sum_{i=1}^{p}\sum_{j=1}^{q}\sum_{k=1}^{m} E(e_{ijk} - \bar{e}_{ij0})^2$$
$$= \sum_{i=1}^{p}\sum_{j=1}^{q}\sum_{k=1}^{m}\left[E(e_{ijk}^2) + E(\bar{e}_{ij0}^2) - 2\operatorname{cov}(e_{ijk},\bar{e}_{ij0})\right]$$
$$= \sum_{i=1}^{p}\sum_{j=1}^{q}\sum_{k=1}^{m}\left[\sigma^2 + \frac{\sigma^2}{m} - 2\operatorname{cov}\!\left(e_{ijk}, \frac{1}{m}\sum_{k=1}^{m} e_{ijk}\right)\right]$$
$$= \sum_{i=1}^{p}\sum_{j=1}^{q}\sum_{k=1}^{m}\left[\sigma^2 + \frac{\sigma^2}{m} - 2\frac{\sigma^2}{m}\right] = \sum_{i=1}^{p}\sum_{j=1}^{q}\sum_{k=1}^{m}\left[\sigma^2 - \frac{\sigma^2}{m}\right]$$
$$= \frac{(m-1)pqm\sigma^2}{m} = (m-1)pq\sigma^2$$
Next,

$$SS(AB) = m\sum_{i=1}^{p}\sum_{j=1}^{q}(\bar{y}_{ij0} - \bar{y}_{i00} - \bar{y}_{0j0} + \bar{y}_{000})^2$$

With $\bar{y}_{ij0} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \bar{e}_{ij0}$, $\bar{y}_{i00} = \mu + \alpha_i + \bar{e}_{i00}$, $\bar{y}_{0j0} = \mu + \beta_j + \bar{e}_{0j0}$ and $\bar{y}_{000} = \mu + \bar{e}_{000}$,

$$E[SS(AB)] = m\sum_{i=1}^{p}\sum_{j=1}^{q} E(\gamma_{ij} + \bar{e}_{ij0} - \bar{e}_{i00} - \bar{e}_{0j0} + \bar{e}_{000})^2$$
$$= m\sum_{i=1}^{p}\sum_{j=1}^{q}\left\{E(\bar{e}_{ij0} - \bar{e}_{i00} - \bar{e}_{0j0} + \bar{e}_{000})^2 + \gamma_{ij}^2 + 2\gamma_{ij}\underbrace{E(\bar{e}_{ij0} - \bar{e}_{i00} - \bar{e}_{0j0} + \bar{e}_{000})}_{0}\right\}$$
$$= m\sum_{i=1}^{p}\sum_{j=1}^{q}\gamma_{ij}^2 + m\sum_{i=1}^{p}\sum_{j=1}^{q} E(\bar{e}_{ij0} - \bar{e}_{i00} - \bar{e}_{0j0} + \bar{e}_{000})^2$$
Note that

$$E(\bar{e}_{ij0} - \bar{e}_{i00} - \bar{e}_{0j0} + \bar{e}_{000})^2 = E(\bar{e}_{ij0}^2) + E(\bar{e}_{i00}^2) + E(\bar{e}_{0j0}^2) + E(\bar{e}_{000}^2)$$
$$-\ 2\operatorname{cov}(\bar{e}_{ij0},\bar{e}_{i00}) - 2\operatorname{cov}(\bar{e}_{ij0},\bar{e}_{0j0}) + 2\operatorname{cov}(\bar{e}_{ij0},\bar{e}_{000}) + 2\operatorname{cov}(\bar{e}_{i00},\bar{e}_{0j0}) - 2\operatorname{cov}(\bar{e}_{i00},\bar{e}_{000}) - 2\operatorname{cov}(\bar{e}_{0j0},\bar{e}_{000})$$

$$\operatorname{cov}(\bar{e}_{ij0},\bar{e}_{i00}) = \operatorname{cov}\!\left(\bar{e}_{ij0}, \frac{1}{q}\sum_{j=1}^{q}\bar{e}_{ij0}\right) = \frac{1}{q}V(\bar{e}_{ij0}) = \frac{\sigma^2}{mq}$$

$$\operatorname{cov}(\bar{e}_{ij0},\bar{e}_{0j0}) = \operatorname{cov}\!\left(\bar{e}_{ij0}, \frac{1}{p}\sum_{i=1}^{p}\bar{e}_{ij0}\right) = \frac{1}{p}V(\bar{e}_{ij0}) = \frac{\sigma^2}{mp}$$

$$\operatorname{cov}(\bar{e}_{i00},\bar{e}_{0j0}) = \operatorname{cov}\!\left(\frac{1}{qm}\sum_{j=1}^{q}\sum_{k=1}^{m} e_{ijk}, \frac{1}{pm}\sum_{i=1}^{p}\sum_{k=1}^{m} e_{ijk}\right) = \frac{1}{pqm^2}\sum_{k=1}^{m} V(e_{ijk}) = \frac{m\sigma^2}{pqm^2} = \frac{\sigma^2}{pqm}$$

$$\operatorname{cov}(\bar{e}_{i00},\bar{e}_{000}) = \operatorname{cov}\!\left(\bar{e}_{i00}, \frac{1}{p}\sum_{i=1}^{p}\bar{e}_{i00}\right) = \frac{1}{p}V(\bar{e}_{i00}) = \frac{\sigma^2}{pqm}$$

$$\operatorname{cov}(\bar{e}_{ij0},\bar{e}_{000}) = \frac{\sigma^2}{pqm}$$

Similarly,

$$\operatorname{cov}(\bar{e}_{0j0},\bar{e}_{000}) = \frac{\sigma^2}{pqm}$$
Hence

$$m\sum_{i=1}^{p}\sum_{j=1}^{q} E(\bar{e}_{ij0} - \bar{e}_{i00} - \bar{e}_{0j0} + \bar{e}_{000})^2$$
$$= m\sum_{i=1}^{p}\sum_{j=1}^{q}\left[\frac{\sigma^2}{m} + \frac{\sigma^2}{qm} + \frac{\sigma^2}{pm} + \frac{\sigma^2}{pqm} - 2\frac{\sigma^2}{mq} - 2\frac{\sigma^2}{mp} + 2\frac{\sigma^2}{pqm} + 2\frac{\sigma^2}{pqm} - 2\frac{\sigma^2}{pqm} - 2\frac{\sigma^2}{pqm}\right]$$
$$= m\sum_{i=1}^{p}\sum_{j=1}^{q}\left[\frac{\sigma^2}{m} - \frac{\sigma^2}{mq} - \frac{\sigma^2}{mp} + \frac{\sigma^2}{pqm}\right] = m\cdot pq\cdot\frac{\sigma^2}{m}\cdot\frac{(p-1)(q-1)}{pq} = \sigma^2(p-1)(q-1)$$
Therefore,

$$E[SS(AB)] = m\sum_{i=1}^{p}\sum_{j=1}^{q}\gamma_{ij}^2 + (p-1)(q-1)\sigma^2$$

$$E(SSA) = qm\sum_{i=1}^{p}\alpha_i^2 + (p-1)\sigma^2 \quad [\text{see copy}]$$

$$E(SSB) = pm\sum_{j=1}^{q}\beta_j^2 + (q-1)\sigma^2 \quad [\text{see copy}]$$
A
Hypothesis
Here we want to test,
G
(i) H01 : αi = 0 ∀ i vs H1 : at least one inequality in H01
↓
The factor A has no effect
N
(ii) H02 : βj = 0 ∀ j vs H1 : at least one inequality in H02
↓
RA
Factor B has no effect.
test statistic
Note that under H01 , E(M SA) = E(M SE) = σ 2 .As we drift away from H01 ,
M SA
E(M SA) > E(M SB). Thus a large value of M SE indicates the rejection of
H01 .
Under $H_{01}$,

$$\frac{SSA}{\sigma^2} \sim \chi^2_{p-1} \quad \text{and} \quad \frac{SSE}{\sigma^2} \sim \chi^2_{pq(m-1)}, \quad \text{independently}$$

$$F_1 = \frac{SSA/(p-1)}{SSE/(pq(m-1))} = \frac{MSA}{MSE} \sim F_{p-1,\,pq(m-1)}$$

We reject $H_{01}$ at level $\alpha$ if $F_1 > F_{\alpha;\,p-1,pq(m-1)}$.
Similarly, with $F_2 = \dfrac{MSB}{MSE}$ and $F_3 = \dfrac{MS(AB)}{MSE}$, we reject $H_{02}$ at level $\alpha$ if $F_2 > F_{\alpha;\,q-1,pq(m-1)}$ and we reject $H_{03}$ at level $\alpha$ if $F_3 > F_{\alpha;\,(p-1)(q-1),pq(m-1)}$.
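The four-way splitting and the F statistics for the m-per-cell layout can be sketched as follows. An illustrative simulation (the data are generated with a made-up A effect and no built-in interaction):

```python
import numpy as np
from scipy import stats

# p x q x m layout; a genuine A effect, no interaction built in
rng = np.random.default_rng(3)
p, q, m = 3, 4, 5
y = rng.normal(0.0, 1.0, (p, q, m)) + 0.5 * np.arange(p)[:, None, None]

gm = y.mean()                      # ybar_000
yi = y.mean(axis=(1, 2))           # ybar_i00
yj = y.mean(axis=(0, 2))           # ybar_0j0
yij = y.mean(axis=2)               # ybar_ij0

TSS = ((y - gm) ** 2).sum()
SSA = q * m * ((yi - gm) ** 2).sum()
SSB = p * m * ((yj - gm) ** 2).sum()
SSAB = m * ((yij - yi[:, None] - yj[None, :] + gm) ** 2).sum()
SSE = ((y - yij[:, :, None]) ** 2).sum()

MSE = SSE / (p * q * (m - 1))
F1 = (SSA / (p - 1)) / MSE                         # tests H01: all alpha_i = 0
F3 = (SSAB / ((p - 1) * (q - 1))) / MSE            # tests H03: all gamma_ij = 0
crit = stats.f.isf(0.05, p - 1, p * q * (m - 1))   # F_{0.05; p-1, pq(m-1)}
```

Here `stats.f.isf` inverts the upper tail, giving the critical value $F_{\alpha;\,p-1,pq(m-1)}$ for the right-tailed test.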
One-way random effects model

Motivation: Suppose the yield of a crop varies over states. The yield data cannot be obtained from all the states due to time and cost constraints, so a sample of states is chosen at random. Hence the factor "state" has randomly chosen levels, and the situation is modelled as a one-way random effects model.

Model:
Let us consider a single factor A having $k$ random levels, where the levels are chosen at random from a larger number of levels. There are $r$ observations corresponding to each level, so the total number of observations is $n = rk$.
The ANOVA model is given by

$$y_{ij} = \mu + a_i + e_{ij}, \quad i = 1(1)k,\ j = 1(1)r$$

Assumptions:
(i) $a_i \overset{iid}{\sim} N(0, \sigma_a^2)$
(ii) $e_{ij} \overset{iid}{\sim} N(0, \sigma_e^2)$
$$TSS = SSA + SSE$$

degrees of freedom of $TSS$ = $n - 1$
degrees of freedom of $SSA$ = $k - 1$
degrees of freedom of $SSE$ = $n - k$
Expectation of SS:

With $y_{ij} = \mu + a_i + e_{ij}$,

$$\bar{y}_{i0} = \frac{1}{r}\sum_{j=1}^{r} y_{ij} = \mu + a_i + \bar{e}_{i0}, \qquad \bar{y}_{00} = \mu + \bar{a} + \bar{e}_{00} \quad (*)$$

$$E[SSA] = E\!\left[r\sum_{i=1}^{k}(\bar{y}_{i0} - \bar{y}_{00})^2\right] = E\!\left[r\sum_{i=1}^{k}(\mu + a_i + \bar{e}_{i0} - \mu - \bar{a} - \bar{e}_{00})^2\right]$$
Now by (*),

$$E(SSA) = E\!\left[r\sum_{i=1}^{k}\left\{(a_i - \bar{a})^2 + (\bar{e}_{i0} - \bar{e}_{00})^2 + 2(a_i - \bar{a})(\bar{e}_{i0} - \bar{e}_{00})\right\}\right]$$
$$= r\sum_{i=1}^{k} E(a_i - \bar{a})^2 + r\sum_{i=1}^{k} E(\bar{e}_{i0} - \bar{e}_{00})^2 + 2r\sum_{i=1}^{k}\underbrace{E\{(a_i - \bar{a})(\bar{e}_{i0} - \bar{e}_{00})\}}_{0}$$

$$E(a_i - \bar{a})^2 = E(a_i^2) + E(\bar{a}^2) - 2\operatorname{cov}(a_i, \bar{a}) = \sigma_a^2 + \frac{\sigma_a^2}{k} - 2\frac{\sigma_a^2}{k} = \sigma_a^2 - \frac{\sigma_a^2}{k}$$

$$E(\bar{e}_{i0} - \bar{e}_{00})^2 = E(\bar{e}_{i0}^2) + E(\bar{e}_{00}^2) - 2\operatorname{cov}(\bar{e}_{i0}, \bar{e}_{00}) = \frac{\sigma_e^2}{r} + \frac{\sigma_e^2}{rk} - 2\frac{\sigma_e^2}{rk} = \frac{\sigma_e^2}{r} - \frac{\sigma_e^2}{rk}$$

$$\left[\operatorname{cov}(\bar{e}_{i0}, \bar{e}_{00}) = \operatorname{cov}\!\left(\bar{e}_{i0}, \frac{1}{k}\sum_{i=1}^{k}\bar{e}_{i0}\right) = \frac{1}{k}V(\bar{e}_{i0}) = \frac{\sigma_e^2}{rk}\right]$$

Therefore,

$$E(SSA) = r\sum_{i=1}^{k}\left(\sigma_a^2 - \frac{\sigma_a^2}{k}\right) + r\sum_{i=1}^{k}\left(\frac{\sigma_e^2}{r} - \frac{\sigma_e^2}{rk}\right) = r(k-1)\sigma_a^2 + (k-1)\sigma_e^2 = (k-1)(r\sigma_a^2 + \sigma_e^2)$$

$$\Rightarrow E\!\left[\frac{SSA}{k-1}\right] = E(MSA) = r\sigma_a^2 + \sigma_e^2$$
$$E[SSE] = E\sum_{i=1}^{k}\sum_{j=1}^{r}(y_{ij} - \bar{y}_{i0})^2 = E\sum_{i=1}^{k}\sum_{j=1}^{r}(\mu + a_i + e_{ij} - \mu - a_i - \bar{e}_{i0})^2 = E\sum_{i=1}^{k}\sum_{j=1}^{r}(e_{ij} - \bar{e}_{i0})^2$$

$$E(e_{ij} - \bar{e}_{i0})^2 = E(e_{ij}^2) + E(\bar{e}_{i0}^2) - 2\operatorname{cov}(e_{ij},\bar{e}_{i0}) = \sigma_e^2 + \frac{\sigma_e^2}{r} - 2\operatorname{cov}\!\left(e_{ij}, \frac{1}{r}\sum_{j=1}^{r} e_{ij}\right)$$
$$= \sigma_e^2 + \frac{\sigma_e^2}{r} - 2\frac{\sigma_e^2}{r} = \sigma_e^2 - \frac{\sigma_e^2}{r}$$

$$E(SSE) = \sum_{i=1}^{k}\sum_{j=1}^{r}\left(\sigma_e^2 - \frac{\sigma_e^2}{r}\right) = (n-k)\sigma_e^2$$
$$\Rightarrow E(MSE) = \sigma_e^2$$
Hypothesis:
Here we want to test $H_0: \sigma_a^2 = 0$ vs $H_1: \sigma_a^2 > 0$.

Test statistic:
Under $H_0$, $E(MSA) = E(MSE) = \sigma_e^2$; as we deviate away from the null, $E(MSA) \geq E(MSE)$.
So a right-tailed test based on $\dfrac{MSA}{MSE}$ is appropriate. Under $H_0$, $F = \dfrac{MSA}{MSE} \sim F_{k-1,\,n-k}$, so we reject $H_0$ at level $\alpha$ if $F > F_{\alpha;\,k-1,n-k}$.
Two-way random effects model, m observations per cell

The model is

$$y_{ijk} = \mu + a_i + b_j + c_{ij} + e_{ijk}, \quad i = 1(1)p,\ j = 1(1)q,\ k = 1(1)m$$

Orthogonal splitting of TSS:

$$\sum_{i=1}^{p}\sum_{j=1}^{q}\sum_{k=1}^{m}(y_{ijk} - \bar{y}_{000})^2 = qm\sum_{i=1}^{p}(\bar{y}_{i00} - \bar{y}_{000})^2 + pm\sum_{j=1}^{q}(\bar{y}_{0j0} - \bar{y}_{000})^2$$
$$+\ m\sum_{i=1}^{p}\sum_{j=1}^{q}(\bar{y}_{ij0} - \bar{y}_{i00} - \bar{y}_{0j0} + \bar{y}_{000})^2 + \sum_{i=1}^{p}\sum_{j=1}^{q}\sum_{k=1}^{m}(y_{ijk} - \bar{y}_{ij0})^2$$

$$TSS = SSA + SSB + SS(AB) + SSE$$

degrees of freedom of TSS = $pqm - 1$
degrees of freedom of SSA = $p - 1$
degrees of freedom of SSB = $q - 1$
degrees of freedom of SS(AB) = $(p-1)(q-1)$
degrees of freedom of SSE = $pq(m-1)$

Assumptions:
(i) $e_{ijk} \overset{iid}{\sim} N(0, \sigma_e^2)$
(ii) $a_i \overset{iid}{\sim} N(0, \sigma_A^2)$
(iii) $b_j \overset{iid}{\sim} N(0, \sigma_B^2)$
(iv) $c_{ij} \overset{iid}{\sim} N(0, \sigma_{AB}^2)$
Expectation of SS:

$$E[SS(AB)] = m\sum_{i=1}^{p}\sum_{j=1}^{q} E(\bar{y}_{ij0} - \bar{y}_{i00} - \bar{y}_{0j0} + \bar{y}_{000})^2$$

where

$$\bar{y}_{ij0} = \mu + a_i + b_j + c_{ij} + \bar{e}_{ij0}$$
$$\bar{y}_{i00} = \mu + a_i + \bar{b} + \bar{c}_{i0} + \bar{e}_{i00}$$
$$\bar{y}_{0j0} = \mu + \bar{a} + b_j + \bar{c}_{0j} + \bar{e}_{0j0}$$
$$\bar{y}_{000} = \mu + \bar{a} + \bar{b} + \bar{c}_{00} + \bar{e}_{000}$$

so that

$$\bar{y}_{ij0} - \bar{y}_{i00} - \bar{y}_{0j0} + \bar{y}_{000} = (c_{ij} - \bar{c}_{i0} - \bar{c}_{0j} + \bar{c}_{00}) + (\bar{e}_{ij0} - \bar{e}_{i00} - \bar{e}_{0j0} + \bar{e}_{000})$$

Now

$$\operatorname{cov}(c_{ij}, \bar{c}_{i0}) = \operatorname{cov}\!\left(c_{ij}, \frac{1}{q}\sum_{j=1}^{q} c_{ij}\right) = \frac{\sigma_{AB}^2}{q}$$

$$\operatorname{cov}(c_{ij}, \bar{c}_{0j}) = \frac{\sigma_{AB}^2}{p}, \qquad \operatorname{cov}(c_{ij}, \bar{c}_{00}) = \frac{\sigma_{AB}^2}{pq}$$

$$\operatorname{cov}(\bar{c}_{i0}, \bar{c}_{0j}) = \frac{\sigma_{AB}^2}{pq}, \qquad \operatorname{cov}(\bar{c}_{i0}, \bar{c}_{00}) = \operatorname{cov}(\bar{c}_{0j}, \bar{c}_{00}) = \frac{\sigma_{AB}^2}{pq}$$
Again,

$$\operatorname{cov}(\bar{e}_{ij0}, \bar{e}_{i00}) = \operatorname{cov}\!\left(\bar{e}_{ij0}, \frac{1}{q}\sum_{j=1}^{q}\bar{e}_{ij0}\right) = \frac{\sigma_e^2}{qm}$$

$$\operatorname{cov}(\bar{e}_{ij0}, \bar{e}_{0j0}) = \frac{\sigma_e^2}{pm}, \qquad \operatorname{cov}(\bar{e}_{ij0}, \bar{e}_{000}) = \frac{\sigma_e^2}{pqm}$$

$$\operatorname{cov}(\bar{e}_{i00}, \bar{e}_{000}) = \operatorname{cov}(\bar{e}_{0j0}, \bar{e}_{000}) = \frac{\sigma_e^2}{pqm}, \qquad \operatorname{cov}(\bar{e}_{i00}, \bar{e}_{0j0}) = \frac{\sigma_e^2}{pqm}$$

Therefore,

$$E[SS(AB)] = m\cdot pq\cdot\frac{(p-1)(q-1)}{pq}\sigma_{AB}^2 + m\cdot pq\cdot\frac{(p-1)(q-1)}{mpq}\sigma_e^2 = (p-1)(q-1)\left(m\sigma_{AB}^2 + \sigma_e^2\right)$$

$$\Rightarrow E\!\left[\frac{SS(AB)}{(p-1)(q-1)}\right] = m\sigma_{AB}^2 + \sigma_e^2 \Rightarrow E[MS(AB)] = m\sigma_{AB}^2 + \sigma_e^2$$

$$E(MSA) = m\sigma_{AB}^2 + qm\sigma_A^2 + \sigma_e^2 \quad [\text{see copy}]$$
$$E(MSB) = m\sigma_{AB}^2 + pm\sigma_B^2 + \sigma_e^2 \quad [\text{see copy}]$$
$$E(MSE) = \sigma_e^2$$
Hypothesis:
Here we want to test
$H_{0A}: \sigma_A^2 = 0$ vs $H_{1A}: \sigma_A^2 > 0$,
$H_{0B}: \sigma_B^2 = 0$ vs $H_{1B}: \sigma_B^2 > 0$,
and $H_{0AB}: \sigma_{AB}^2 = 0$ vs $H_{1AB}: \sigma_{AB}^2 > 0$.

Test statistic:
Under $H_{0A}$, $E(MSA) = m\sigma_{AB}^2 + \sigma_e^2 = E(MS(AB))$.
As we deviate away from $H_{0A}$, $E(MSA) \geq E(MS(AB))$, so a right-tailed test based on $\dfrac{MSA}{MS(AB)}$ is appropriate. Under $H_{0A}$, $\dfrac{MSA}{MS(AB)} \sim F_{p-1,\,(p-1)(q-1)}$, and we reject $H_{0A}$ at level $\alpha$ if

$$\frac{MSA}{MS(AB)} > F_{\alpha;\,p-1,(p-1)(q-1)}.$$
As we deviate away from $H_{0B}$, $E(MSB) \geq E(MS(AB))$. A right-tailed test based on $\dfrac{MSB}{MS(AB)}$ is appropriate; hence we reject $H_{0B}$ if $\dfrac{MSB}{MS(AB)} > F_{\alpha;\,q-1,(p-1)(q-1)}$.

Again, under $H_{0AB}$, $E[MS(AB)] = E[MSE] = \sigma_e^2$. As we deviate away from $H_{0AB}$, $E[MS(AB)] > E[MSE]$. Thus a right-tailed test based on $\dfrac{MS(AB)}{MSE}$ is appropriate, and we reject $H_{0AB}$ if $\dfrac{MS(AB)}{MSE} > F_{\alpha;\,(p-1)(q-1),pq(m-1)}$.
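The expected mean squares, and hence the choice of MS(AB) rather than MSE as the denominator for testing $\sigma_A^2 = 0$, can be checked by simulation. An illustrative sketch with made-up variance components (a large $p$ is used so the observed mean squares sit close to their expectations):

```python
import numpy as np

# two-way random effects with m observations per cell
rng = np.random.default_rng(5)
p, q, m = 100, 8, 4
sA2, sB2, sAB2, se2 = 2.25, 1.0, 0.64, 1.0   # assumed true variance components

a = rng.normal(0.0, np.sqrt(sA2), p)[:, None, None]
b = rng.normal(0.0, np.sqrt(sB2), q)[None, :, None]
c = rng.normal(0.0, np.sqrt(sAB2), (p, q))[:, :, None]
y = 3.0 + a + b + c + rng.normal(0.0, np.sqrt(se2), (p, q, m))

gm = y.mean()
yi = y.mean(axis=(1, 2))
yj = y.mean(axis=(0, 2))
yij = y.mean(axis=2)

MSA = q * m * ((yi - gm) ** 2).sum() / (p - 1)
MSAB = m * ((yij - yi[:, None] - yj[None, :] + gm) ** 2).sum() / ((p - 1) * (q - 1))
MSE = ((y - yij[:, :, None]) ** 2).sum() / (p * q * (m - 1))

# expected mean squares (from the derivation above):
#   E(MSA)    = qm*sA2 + m*sAB2 + se2 = 32*2.25 + 4*0.64 + 1 = 75.56
#   E(MS(AB)) = m*sAB2 + se2          = 3.56
#   E(MSE)    = se2                   = 1.0
# so MS(AB), not MSE, matches E(MSA) when sA2 = 0: it is the valid denominator
```

Running this shows `MSAB` near 3.56 and `MSE` near 1.0, so an F ratio with `MSE` in the denominator would reject $H_{0A}$ even when $\sigma_A^2 = 0$.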
Two-way mixed effects model, m observations per cell

Motivation: The yield of a crop may vary over states and fertilisers. We have randomly chosen 10 states and kept all the varieties of fertiliser. Hence the effect due to state is a random effect while the effect due to fertiliser remains a fixed effect. We have $m$ observations corresponding to each state-fertiliser combination. Thus the analysis of the yield of crop can be carried out by a two-way mixed effects model with $m$ observations per cell.

Model:
Suppose there are two factors A and B. The factor A has $p$ fixed levels and the factor B has $q$ randomly chosen levels, with $m$ observations per cell. The ANOVA model is given by

$$y_{ijk} = \mu + a_i + b_j + c_{ij} + e_{ijk}, \quad i = 1(1)p,\ j = 1(1)q,\ k = 1(1)m$$
Assumptions:

(i) $\sum_{i=1}^{p} a_i = 0$

(ii) $\sum_{i=1}^{p} c_{ij} = 0 \ \forall\, j$

(iii) $b_j \overset{iid}{\sim} N(0, \sigma_B^2)$

(iv) $c_{ij} \sim N(0, \sigma_i^2)$, independently over $j$

(v) $e_{ijk} \overset{iid}{\sim} N(0, \sigma_e^2)$

Remark: $\{b_j\}$ and $\{e_{ijk}\}$ are independently distributed.
We further define

$$\sigma_A^2 = \frac{1}{p-1}\sum_{i=1}^{p} a_i^2, \qquad \sigma_{AB}^2 = \frac{1}{p-1}\sum_{i=1}^{p}\sigma_i^2$$
Orthogonal splitting of the total sum of squares:

$$\sum_{i}\sum_{j}\sum_{k}(y_{ijk} - \bar{y}_{000})^2 = qm\sum_{i=1}^{p}(\bar{y}_{i00} - \bar{y}_{000})^2 + pm\sum_{j=1}^{q}(\bar{y}_{0j0} - \bar{y}_{000})^2$$
$$+\ m\sum_{i}\sum_{j}(\bar{y}_{ij0} - \bar{y}_{i00} - \bar{y}_{0j0} + \bar{y}_{000})^2 + \sum_{i}\sum_{j}\sum_{k}(y_{ijk} - \bar{y}_{ij0})^2$$
Here

$$\bar{y}_{i00} = \frac{1}{qm}\sum_{j}\sum_{k} y_{ijk}, \quad \bar{y}_{0j0} = \frac{1}{pm}\sum_{i}\sum_{k} y_{ijk}, \quad \bar{y}_{ij0} = \frac{1}{m}\sum_{k} y_{ijk}, \quad \bar{y}_{000} = \frac{1}{pqm}\sum_{i}\sum_{j}\sum_{k} y_{ijk}$$

Substituting $y_{ijk} = \mu + a_i + b_j + c_{ij} + e_{ijk}$,

$$\bar{y}_{i00} = \mu + a_i + \bar{b} + \bar{c}_{i0} + \bar{e}_{i00}$$
$$\bar{y}_{0j0} = \mu + b_j + \bar{e}_{0j0} \quad [\text{since } \bar{a} = 0 \text{ and } \bar{c}_{0j} = 0]$$
$$\bar{y}_{ij0} = \mu + a_i + b_j + c_{ij} + \bar{e}_{ij0}$$
$$\bar{y}_{000} = \mu + \bar{b} + \bar{e}_{000}$$
$$E[SSA] = qm\sum_{i=1}^{p} E(\bar{y}_{i00} - \bar{y}_{000})^2 = qm\sum_{i=1}^{p} E(\mu + a_i + \bar{b} + \bar{c}_{i0} + \bar{e}_{i00} - \mu - \bar{b} - \bar{e}_{000})^2$$
$$= qm\sum_{i=1}^{p} E(a_i + \bar{c}_{i0} + \bar{e}_{i00} - \bar{e}_{000})^2$$
$$= qm\sum_{i=1}^{p}\left[a_i^2 + E(\bar{c}_{i0}^2) + E(\bar{e}_{i00} - \bar{e}_{000})^2\right] \quad [\text{the product terms vanish due to independence}]$$
$$= qm\left[\sum_{i=1}^{p} a_i^2 + \sum_{i=1}^{p}\frac{\sigma_i^2}{q} + \sum_{i=1}^{p}\left\{E(\bar{e}_{i00}^2) + E(\bar{e}_{000}^2) - 2E(\bar{e}_{i00}\bar{e}_{000})\right\}\right]$$
Now

$$E(\bar{e}_{i00}\bar{e}_{000}) = \operatorname{cov}(\bar{e}_{i00}, \bar{e}_{000}) = \operatorname{cov}\!\left(\bar{e}_{i00}, \frac{1}{p}\sum_{i=1}^{p}\bar{e}_{i00}\right) = \frac{1}{p}V(\bar{e}_{i00}) = \frac{\sigma_e^2}{pqm}$$

Therefore,

$$E(SSA) = qm\left[\sum_{i=1}^{p} a_i^2 + \frac{1}{q}\sum_{i=1}^{p}\sigma_i^2 + \sum_{i=1}^{p}\left(\frac{\sigma_e^2}{qm} - \frac{\sigma_e^2}{pqm}\right)\right]$$
$$= qm\left[(p-1)\sigma_A^2 + \frac{p-1}{q}\sigma_{AB}^2 + \frac{p}{qm}\cdot\frac{p-1}{p}\sigma_e^2\right]$$
$$= (p-1)\left(qm\sigma_A^2 + m\sigma_{AB}^2 + \sigma_e^2\right)$$
and

$$E(SSB) = pm\sum_{j=1}^{q} E(\bar{y}_{0j0} - \bar{y}_{000})^2 = pm\sum_{j=1}^{q} E(\mu + b_j + \bar{e}_{0j0} - \mu - \bar{b} - \bar{e}_{000})^2$$
$$= pm\sum_{j=1}^{q}\left[E(b_j - \bar{b})^2 + E(\bar{e}_{0j0} - \bar{e}_{000})^2 - 2\underbrace{E\{(b_j - \bar{b})(\bar{e}_{0j0} - \bar{e}_{000})\}}_{0}\right]$$

$$\therefore E(b_j - \bar{b})^2 = E(b_j^2) + E(\bar{b}^2) - 2E(b_j\bar{b}) = \sigma_B^2 + \frac{\sigma_B^2}{q} - 2\frac{\sigma_B^2}{q} = \frac{q-1}{q}\sigma_B^2$$

$$E(\bar{e}_{0j0} - \bar{e}_{000})^2 = E(\bar{e}_{0j0}^2) + E(\bar{e}_{000}^2) - 2\operatorname{cov}(\bar{e}_{0j0}, \bar{e}_{000})$$
$$= \frac{\sigma_e^2}{pm} + \frac{\sigma_e^2}{pqm} - 2\frac{\sigma_e^2}{pqm} = \frac{\sigma_e^2}{pm} - \frac{\sigma_e^2}{pqm} = \frac{\sigma_e^2}{pm}\cdot\frac{q-1}{q}$$

$$E(SSB) = pm\sum_{j=1}^{q}\left[\frac{q-1}{q}\sigma_B^2 + \frac{q-1}{q}\cdot\frac{\sigma_e^2}{pm}\right] = pm(q-1)\sigma_B^2 + (q-1)\sigma_e^2 = (q-1)\left(pm\sigma_B^2 + \sigma_e^2\right)$$
Now

$$E[SS(AB)] = m\sum_{i=1}^{p}\sum_{j=1}^{q} E[\bar{y}_{ij0} - \bar{y}_{i00} - \bar{y}_{0j0} + \bar{y}_{000}]^2$$
$$= m\sum_{i=1}^{p}\sum_{j=1}^{q} E[\mu + a_i + b_j + c_{ij} + \bar{e}_{ij0} - \mu - a_i - \bar{b} - \bar{c}_{i0} - \bar{e}_{i00} - \mu - b_j - \bar{e}_{0j0} + \mu + \bar{b} + \bar{e}_{000}]^2$$
$$= m\sum_{i=1}^{p}\sum_{j=1}^{q}\left[E(c_{ij} - \bar{c}_{i0})^2 + \underbrace{E(\bar{e}_{ij0} - \bar{e}_{i00} - \bar{e}_{0j0} + \bar{e}_{000})^2}_{\frac{(p-1)(q-1)}{pqm}\sigma_e^2\ (\text{already done})}\right]$$

Therefore,

$$E[SS(AB)] = m\sum_{i=1}^{p}\sum_{j=1}^{q}\frac{q-1}{q}\sigma_i^2 + m\cdot pq\cdot\frac{(p-1)(q-1)}{pqm}\sigma_e^2$$
$$= m(q-1)\sum_{i=1}^{p}\sigma_i^2 + (p-1)(q-1)\sigma_e^2$$
$$= m(p-1)(q-1)\sigma_{AB}^2 + (p-1)(q-1)\sigma_e^2 = (p-1)(q-1)\left(m\sigma_{AB}^2 + \sigma_e^2\right)$$
So finally,

$$E(MSA) = E\!\left(\frac{SSA}{p-1}\right) = qm\sigma_A^2 + m\sigma_{AB}^2 + \sigma_e^2$$
$$E(MSB) = pm\sigma_B^2 + \sigma_e^2$$
$$E(MS(AB)) = m\sigma_{AB}^2 + \sigma_e^2$$
$$E(MSE) = \sigma_e^2$$

Hypothesis: Here we want to test $H_{0A}: \sigma_A^2 = 0$ [i.e., $a_i = 0 \ \forall\, i$] vs $H_{1A}: \sigma_A^2 > 0$,
$H_{0B}: \sigma_B^2 = 0$ vs $H_{1B}: \sigma_B^2 > 0$,
and $H_{0AB}: \sigma_{AB}^2 = 0$ vs $H_{1AB}: \sigma_{AB}^2 > 0$.

Test statistic:
Under $H_{0A}$,

$$E[MSA] = m\sigma_{AB}^2 + \sigma_e^2 = E[MS(AB)]$$

so a right-tailed test based on $\dfrac{MSA}{MS(AB)}$ is appropriate; under $H_{0A}$,

$$\frac{MSA}{MS(AB)} \sim F_{p-1,\,(p-1)(q-1)}$$

For testing $H_{0B}$ a right-tailed test based on $\dfrac{MSB}{MSE}$ is appropriate. Similarly, for testing $H_{0AB}$ a right-tailed test based on $\dfrac{MS(AB)}{MSE}$ is appropriate.

Remark:
In general $\dfrac{MSA}{MS(AB)}$ has no exact $F$-distribution under $H_{0A}$. So we consider an approximate $F$-statistic with degrees of freedom $(p-1)$, $(p-1)(q-1)$.
Some important questions and answers:
i.e.,

$$\begin{pmatrix} E(y_1)\\ E(y_2)\\ \vdots\\ E(y_n) \end{pmatrix} = \begin{pmatrix} a_{11}\beta_1 + a_{12}\beta_2 + \cdots + a_{1p}\beta_p\\ a_{21}\beta_1 + a_{22}\beta_2 + \cdots + a_{2p}\beta_p\\ \vdots\\ a_{n1}\beta_1 + a_{n2}\beta_2 + \cdots + a_{np}\beta_p \end{pmatrix}$$

$$\Rightarrow E(\underset{\sim}{y}) = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1p}\\ a_{21} & a_{22} & \dots & a_{2p}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \dots & a_{np} \end{pmatrix}\begin{pmatrix} \beta_1\\ \beta_2\\ \vdots\\ \beta_p \end{pmatrix} \tag{2}$$

$$\Rightarrow E(\underset{\sim}{y}) = X\underset{\sim}{\beta}$$

$X$ is called the design matrix containing the known coefficients; $\underset{\sim}{\beta}$ is the vector of unknown model parameters.
$$y_{ijk} = \mu + a_i + b_j + c_{ij} + e_{ijk}$$

Orthogonal splitting of TSS:

$$\sum_{i=1}^{p}\sum_{j=1}^{q}\sum_{k=1}^{m}(y_{ijk} - \bar{y}_{000})^2 = qm\sum_{i=1}^{p}(\bar{y}_{i00} - \bar{y}_{000})^2 + pm\sum_{j=1}^{q}(\bar{y}_{0j0} - \bar{y}_{000})^2$$
$$+\ m\sum_{i=1}^{p}\sum_{j=1}^{q}(\bar{y}_{ij0} - \bar{y}_{i00} - \bar{y}_{0j0} + \bar{y}_{000})^2 + \sum_{i=1}^{p}\sum_{j=1}^{q}\sum_{k=1}^{m}(y_{ijk} - \bar{y}_{ij0})^2$$

$$E[MS(AB)] = m\sigma_{AB}^2 + \sigma_e^2$$
$$E(MSA) = m\sigma_{AB}^2 + qm\sigma_A^2 + \sigma_e^2$$
$$E(MSB) = m\sigma_{AB}^2 + pm\sigma_B^2 + \sigma_e^2$$
$$E(MSE) = \sigma_e^2$$
Hypothesis:
Here we want to test $H_{0A}: \sigma_A^2 = 0$ vs $H_{1A}: \sigma_A^2 > 0$,
$H_{0B}: \sigma_B^2 = 0$ vs $H_{1B}: \sigma_B^2 > 0$,
and $H_{0AB}: \sigma_{AB}^2 = 0$ vs $H_{1AB}: \sigma_{AB}^2 > 0$.

Test statistic:
Under $H_{0A}$, $E(MSA) = m\sigma_{AB}^2 + \sigma_e^2 = E(MS(AB))$.
As we deviate away from $H_{0A}$, $E(MSA) \geq E(MS(AB))$, so a right-tailed test based on $\dfrac{MSA}{MS(AB)}$ is appropriate; under $H_{0A}$, $\dfrac{MSA}{MS(AB)} \sim F_{p-1,\,(p-1)(q-1)}$, and we reject $H_{0A}$ at level $\alpha$ if $\dfrac{MSA}{MS(AB)} > F_{\alpha;\,p-1,(p-1)(q-1)}$.

As we deviate away from $H_{0B}$, $E(MSB) \geq E(MS(AB))$. A right-tailed test based on $\dfrac{MSB}{MS(AB)}$ is appropriate; hence we reject $H_{0B}$ if $\dfrac{MSB}{MS(AB)} > F_{\alpha;\,q-1,(p-1)(q-1)}$.

Again, under $H_{0AB}$, $E[MS(AB)] = E[MSE] = \sigma_e^2$. As we deviate away from $H_{0AB}$, $E[MS(AB)] > E[MSE]$. Thus a right-tailed test based on $\dfrac{MS(AB)}{MSE}$ is appropriate, and we reject $H_{0AB}$ if $\dfrac{MS(AB)}{MSE} > F_{\alpha;\,(p-1)(q-1),pq(m-1)}$.
In summary: as we drift away from $H_{0A}$ and $H_{0B}$, $E(MSA) \geq E(MS(AB))$ and $E(MSB) \geq E(MS(AB))$, i.e., right-tailed tests based on $\dfrac{MSA}{MS(AB)}$ and $\dfrac{MSB}{MS(AB)}$ are appropriate, and MS(AB) serves as the valid error term for testing $H_{0A}$ and $H_{0B}$. Whereas under $H_{0AB}$, $E(MS(AB)) = E(MSE)$, and as we drift away from $H_{0AB}$, $E(MS(AB)) \geq E(MSE)$; hence a right-tailed test based on $\dfrac{MS(AB)}{MSE}$ is appropriate and MSE serves as the valid error term in this case. Hence the valid error term changes over the different testing problems.
...is referred to as the total variance, and via orthogonal splitting the total sum of squares is split into sums of squares due to the different sources of variation. Here the term "orthogonal" indicates that the sums of squares due to different sources are independent of each other.

In the one-way layout fixed effects model, the total sum of squares is partitioned into the sum of squares due to the single factor and the sum of squares due to error, as the total variability is caused by the single factor and the error. We can describe it as follows:

$$y_{ij} = \bar{y}_{00} + (\bar{y}_{i0} - \bar{y}_{00}) + \hat{e}_{ij}$$
$$\Rightarrow (y_{ij} - \bar{y}_{00}) = (\bar{y}_{i0} - \bar{y}_{00}) + (y_{ij} - \bar{y}_{i0})$$

Squaring and summing over $i$ and $j$ we get

$$\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{00})^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(\bar{y}_{i0} - \bar{y}_{00})^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{i0})^2$$
$$= \sum_{i=1}^{k} n_i(\bar{y}_{i0} - \bar{y}_{00})^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{i0})^2$$

where

$$\sum_{i=1}^{k} n_i(\bar{y}_{i0} - \bar{y}_{00})^2 = \text{sum of squares due to the factor A (SSA)}$$

$$\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{i0})^2 = \text{sum of squares due to the error (SSE)}$$
...the null hypothesis is strongly accepted, i.e., the effect of the corresponding factor is tested to be absent.

$$y_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij} + e_{ij}, \quad i = 1(1)p,\ j = 1(1)q$$

$\mu$ = general effect
$\alpha_i$ = additional effect due to the $i$th level of A
$\beta_j$ = additional effect due to the $j$th level of B
$\gamma_{ij}$ = interaction effect due to the $i$th level of A and the $j$th level of B

For one observation per cell we take $\gamma_{ij} = 0$, as $\gamma_{ij}$ cannot be estimated; so the model becomes

$$y_{ij} = \mu + \alpha_i + \beta_j + e_{ij}.$$

Assumptions:

$$\text{(i)}\ \sum_{i=1}^{p}\alpha_i = 0 \qquad \text{(ii)}\ \sum_{j=1}^{q}\beta_j = 0 \qquad \text{(iii)}\ e_{ij} \overset{iid}{\sim} N(0, \sigma^2)$$

Hypothesis:

(i) $H_{01}: \alpha_i = 0 \ \forall\, i$ vs $H_1$: at least one inequality in $H_{01}$ (the factor A has no effect);
(ii) $H_{02}: \beta_j = 0 \ \forall\, j$ vs $H_1$: at least one inequality in $H_{02}$ (the factor B has no effect).

Under $H_{01}$, $E(MSA) = \sigma^2$; as we deviate away from $H_{01}$, $E(MSA) \geq \sigma^2 = E(MSE)$.
So intuitively a large value of $\dfrac{MSA}{MSE}$ indicates the rejection of $H_{01}$.
Under $H_{01}$,

$$\frac{SSA}{\sigma^2} \sim \chi^2_{p-1} \quad \text{and} \quad \frac{SSE}{\sigma^2} \sim \chi^2_{(p-1)(q-1)}, \quad \text{independently}$$

Now

$$F_1 = \frac{SSA/(p-1)}{SSE/((p-1)(q-1))} = \frac{MSA}{MSE} \sim F_{p-1,\,(p-1)(q-1)}$$

So we reject $H_{01}$ at size $\alpha$ if $F_1 > F_{\alpha;\,p-1,(p-1)(q-1)}$.

If $H_{01}$ is rejected then a paired comparison is required.
For comparing the $i$th class with the $i'$th class we have the following hypothesis:

$$H_0: \alpha_i = \alpha_{i'} \quad \text{vs} \quad H_1: \alpha_i \neq \alpha_{i'}$$

We reject $H_0$ at level $\alpha$ if

$$\sqrt{\frac{2\,MSE}{\cdots}}$$
...B, or performing a one-way ANOVA by considering factor B for a particular level of A.
In the current setup, testing for individual effects can be carried out only if the interaction is tested to be absent.