
Analysis of Variance (ANOVA)

1. All the students of this batch are requested to keep a copy of the main class note in order to check for any discrepancy.
2. For readers outside the batch: if you find any mistake regarding anything in these notes, then before taking a screenshot and making fun of it, report it to me directly, because always remember that you are reading this since your own note is not worthwhile.

What is analysis of variance? Explain with an example.


The total variation present in a set of observable quantities may, under certain circumstances, be partitioned into a number of disjoint components associated with the nature of classification of the data. The systematic methodology by which one can partition the total variation into such components, each attributable to a distinct cause, is called analysis of variance.
As an example, let us consider the yield of paddy. Suppose the cultivation is carried out using three kinds of seed. Then variation in yield occurs due to the variation of seed and also due to random error. This is an example of a one-way fixed effects layout of ANOVA.
What is the Gauss-Markov linear model? Distinguish between fixed, mixed and random effects models. Also distinguish between the ANOVA, regression and ANCOVA models.
Consider n independent random observations y1, y2, ..., yn where
E(yi) = ai1 β1 + ai2 β2 + ... + aip βp,
V(yi) = σ²,
cov(yi, yj) = 0 ∀ i ≠ j,
where β1, β2, ..., βp are model parameters and ai1, ai2, ..., aip are coefficients which are known.
Define
$$X_{n\times p}=\begin{pmatrix}a_{11}&a_{12}&\dots&a_{1p}\\a_{21}&a_{22}&\dots&a_{2p}\\\vdots&\vdots&&\vdots\\a_{n1}&a_{n2}&\dots&a_{np}\end{pmatrix},\qquad \mathbf{y}=\begin{pmatrix}y_1\\y_2\\\vdots\\y_n\end{pmatrix}.$$
Then
$$E(\mathbf{y})=\begin{pmatrix}E(y_1)\\E(y_2)\\\vdots\\E(y_n)\end{pmatrix}=\begin{pmatrix}a_{11}\beta_1+a_{12}\beta_2+\dots+a_{1p}\beta_p\\a_{21}\beta_1+a_{22}\beta_2+\dots+a_{2p}\beta_p\\\vdots\\a_{n1}\beta_1+a_{n2}\beta_2+\dots+a_{np}\beta_p\end{pmatrix}=\begin{pmatrix}a_{11}&\dots&a_{1p}\\\vdots&&\vdots\\a_{n1}&\dots&a_{np}\end{pmatrix}\begin{pmatrix}\beta_1\\\vdots\\\beta_p\end{pmatrix}=X_{n\times p}\,\boldsymbol\beta_{p\times 1},$$
where β = (β1, β2, ..., βp)'.
So finally we can write y = Xβ + e, where E(e) = 0 and Disp(e) = σ² In.
This model is called the Gauss-Markov linear model.
If we assume that β1, β2, ..., βp are themselves a random sample, i.e., each βj is the realised value of a random variable β̃j, then the model is called a random effects model.
If all the parameters are prefixed quantities, then the model is called a fixed effects model.
If some of the parameters are fixed quantities and some are chosen at random, then the model is called a mixed effects model.
Let us consider the Gauss-Markov linear model y = Xβ + e, where X is called the design matrix; it contains the coefficients corresponding to the model parameters.
If the values of the coefficients are binary, i.e., 0 or 1, then the model is called an analysis of variance (ANOVA) model. By an ANOVA model we test whether an effect is absent or present.
When the values of the coefficients come from the values of other independent covariates, i.e., the coefficients take usual continuous values, then the model is called a regression model. Hence by a regression model the impact of an effect can be judged.
If some of the coefficient values are binary and some are continuous, then the model is called an analysis of covariance (ANCOVA) model. The three design matrices are sketched below.
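As an illustrative sketch (hypothetical numbers, not part of the notes; assuming NumPy is available), the three design matrices can be written out explicitly; only the presence of 0/1 indicator columns versus continuous columns distinguishes the models:

```python
import numpy as np

# ANOVA: one factor with 2 levels, 2 observations per level.
# Columns: general effect mu, indicator of level 1, indicator of level 2 (all 0/1).
X_anova = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 1],
])

# Regression: a continuous covariate x; columns are (1, x).
x = np.array([1.2, 2.4, 3.1, 4.8])
X_reg = np.column_stack([np.ones_like(x), x])

# ANCOVA: factor indicators plus the continuous covariate.
X_ancova = np.column_stack([X_anova, x])

print(X_anova, X_reg, X_ancova, sep="\n\n")
```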

One-way fixed effects model:

Motivation: Let us consider a single medicine which controls fever and has four different dose levels.

Drug A — dose levels: 10 mg, 15 mg, 25 mg, 50 mg.

Here fever is influenced by a single factor (the medicine) and the factor has fixed levels. Hence the dependence of fever on the medicine has one-way variation, viz. the medicine.
Suppose there is a single factor A with k fixed levels A1, A2, ..., Ak, say. For the ith level, let there be ni observations yi1, yi2, ..., yini, i = 1(1)k. We represent the observations in the following data array.

A1 : y11, y12, ..., y1n1
A2 : y21, y22, ..., y2n2
⋮
Ak : yk1, yk2, ..., yknk

Here Σ_{i=1}^{k} ni = n is the total number of responses.
The one-way fixed effects model is given by
yij = µi + eij,  i = 1(1)k, j = 1(1)ni,
where
yij = response corresponding to the jth observation of the ith level of A,
µi = effect due to the ith level of A.
Next we reparametrise the model as follows:
yij = µ + (µi − µ) + eij = µ + αi + eij,
where
αi = additional effect due to the ith level of A,
µ = general effect,
eij = error in the model.
The reparametrisation essentially leads to a separation of the general effect, the exact additional effect due to the factor A, and the error.

Assumptions:
1. Σ_{i=1}^{k} ni αi = Σ ni (µi − µ) = Σ ni µi − µ Σ ni = nµ − nµ = 0, where µ = (Σ ni µi)/n is the grand mean of the effects.
2. eij ~ iid N(0, σ²).

Estimation of the model parameters:

E = Σ_i Σ_j e²ij = Σ_i Σ_j (yij − µ − αi)²
∂E/∂µ = 0 ⇒ (∂/∂µ) Σ_i Σ_j (yij − µ − αi)² = 0
⇒ Σ_i Σ_j (yij − µ − αi)(−1) = 0
⇒ Σ_i Σ_j yij = nµ + Σ_{i=1}^{k} ni αi, and the last sum is 0 by assumption,
⇒ µ̂ = (1/n) Σ_{i=1}^{k} Σ_{j=1}^{ni} yij = ȳ00.

∂E/∂αi = 0 ⇒ (∂/∂αi) Σ_{i=1}^{k} Σ_{j=1}^{ni} (yij − µ − αi)² = 0
⇒ Σ_{j=1}^{ni} (yij − µ − αi)(−1) = 0  [here we differentiate w.r.t. αi for a specific value of i = 1(1)k, so the summation over i vanishes]
⇒ Σ_{j=1}^{ni} yij − ni µ = ni αi
⇒ α̂i = (1/ni) Σ_{j=1}^{ni} yij − µ̂ = ȳi0 − ȳ00,

i.e., yij = ȳ00 + (ȳi0 − ȳ00) + eij.   (1)

Now
eij = yij − µ̂ − α̂i = yij − ȳ00 − (ȳi0 − ȳ00) = yij − ȳi0,
i.e., from (1),
yij = ȳ00 + (ȳi0 − ȳ00) + eij
⇒ (yij − ȳ00) = (ȳi0 − ȳ00) + (yij − ȳi0).

Squaring and summing over i and j we get
Σ_{i=1}^{k} Σ_{j=1}^{ni} (yij − ȳ00)² = Σ_i Σ_j (ȳi0 − ȳ00)² + Σ_i Σ_j (yij − ȳi0)²   (the product term vanishes)
= Σ_{i=1}^{k} ni (ȳi0 − ȳ00)² + Σ_{i=1}^{k} Σ_{j=1}^{ni} (yij − ȳi0)²,
where
Σ_{i=1}^{k} ni (ȳi0 − ȳ00)² = sum of squares due to the factor A (SSA),
Σ_{i=1}^{k} Σ_{j=1}^{ni} (yij − ȳi0)² = sum of squares due to error (SSE),
i.e., TSS = SSA + SSE.
This is called the orthogonal splitting of the total sum of squares.
Now note that TSS carries (n − 1) degrees of freedom and SSA carries (k − 1) degrees of freedom. Hence SSE carries (n − k) degrees of freedom.
Hypothesis: Here we want to test whether all the levels of A have similar effects or not:
H0 : αi = 0 ∀ i  vs  H1 : at least one inequality in H0.
Since yij = µi + eij and αi = µi − µ, the null hypothesis H0 is equivalent to µ1 = µ2 = ... = µk.

Expectation of the sums of squares:

SSA = Σ_{i=1}^{k} ni (ȳi0 − ȳ00)².
Note that yij = µ + αi + eij, so
ȳi0 = (1/ni) Σ_j yij = (1/ni) Σ_j (µ + αi + eij) = µ + αi + ēi0,
ȳ00 = (1/n) Σ_i Σ_j yij = (1/n) Σ_i Σ_j (µ + αi + eij) = µ + ē00   (since Σ ni αi = 0),
i.e., SSA = Σ_{i=1}^{k} ni {(µ + αi + ēi0) − (µ + ē00)}²
= Σ_{i=1}^{k} ni {αi + (ēi0 − ē00)}²
= Σ_{i=1}^{k} ni {αi² + (ēi0 − ē00)² + 2αi (ēi0 − ē00)}.

The eij's are linearly independent N(0, σ²), so E(ēi0) = 0 and E(ē00) = 0. Therefore
E(SSA) = Σ_{i=1}^{k} ni αi² + Σ_{i=1}^{k} ni E(ēi0 − ē00)² + 2 Σ_{i=1}^{k} ni αi E(ēi0 − ē00), and the last term is 0.

Therefore,
E(SSA) = Σ_{i=1}^{k} ni αi² + Σ_{i=1}^{k} ni E(ēi0 − ē00)².
Now
E(ēi0 − ē00)² = E(ē²i0) + E(ē²00) − 2E(ēi0 ē00)
= V(ēi0) + V(ē00) − 2 cov(ēi0, ē00)
= σ²/ni + σ²/n − 2σ²/n = σ²/ni − σ²/n,
since
cov(ēi0, ē00) = cov(ēi0, (1/n) Σ_{i=1}^{k} ni ēi0) = (1/n) ni V(ēi0) = (1/n) · ni · σ²/ni = σ²/n.
Therefore,
E(SSA) = Σ ni αi² + Σ ni (σ²/ni − σ²/n) = Σ ni αi² + kσ² − (σ²/n)·n = Σ_{i=1}^{k} ni αi² + (k − 1)σ².
 
E(SSE) = E[Σ_{i=1}^{k} Σ_{j=1}^{ni} (yij − ȳi0)²].
Since yij = µ + αi + eij and ȳi0 = µ + αi + ēi0,
E(SSE) = E[Σ_i Σ_j (eij − ēi0)²] = Σ_i Σ_j {E(e²ij) + E(ē²i0) − 2E(eij ēi0)}.
Now
E(e²ij) = V(eij) = σ²,
E(ē²i0) = V(ēi0) = σ²/ni,
E(eij ēi0) = E(eij · (1/ni) Σ_j eij) = (1/ni) E(e²ij) = σ²/ni.
Therefore,
E(SSE) = Σ_i Σ_j (σ² + σ²/ni − 2σ²/ni) = Σ_{i=1}^{k} (ni σ² − σ²)
= σ² Σ_{i=1}^{k} ni − σ² Σ_{i=1}^{k} 1 = σ²(n − k).

Note that
E(SSA) = Σ ni αi² + (k − 1)σ²
⇒ E(SSA/(k − 1)) = (1/(k − 1)) Σ ni αi² + σ²
⇒ E(MSA) = (1/(k − 1)) Σ_{i=1}^{k} ni αi² + σ²,
with (k − 1) degrees of freedom corresponding to SSA. Here MSA = mean square due to A.
Again,
E(SSE) = (n − k)σ²
⇒ E(SSE/(n − k)) = σ²   (MSE: mean square error)
⇒ E(MSE) = σ²,
with (n − k) degrees of freedom due to error. Under H0,
E(MSA) = σ² = E(MSE).

As we deviate away from H0,
E(MSA) ≥ E(MSE).
We define the test statistic F = MSA/MSE; a large value of F indicates rejection of H0. So a right-tailed test based on the F statistic under H0 is appropriate. Under H0,
SSA/σ² ∼ χ²_{k−1} and SSE/σ² ∼ χ²_{n−k}, independently,
so
F = (SSA/(k − 1)) / (SSE/(n − k)) ∼ F_{k−1, n−k},
i.e., we reject H0 at level α if F > F_{α; k−1, n−k}.
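The whole one-way computation above can be verified numerically. The following sketch (made-up yield data for k = 3 seed groups; it assumes NumPy and SciPy are available) computes SSA, SSE, MSA, MSE and the right-tailed F test exactly as derived, and cross-checks the result against scipy.stats.f_oneway:

```python
import numpy as np
from scipy import stats

# Hypothetical yields for k = 3 seed varieties (unequal group sizes n_i)
groups = [np.array([4.1, 4.5, 3.9, 4.3]),
          np.array([5.0, 5.2, 4.8]),
          np.array([3.6, 3.8, 3.5, 3.7, 3.9])]

n = sum(len(g) for g in groups)              # total number of responses
k = len(groups)
grand_mean = np.concatenate(groups).mean()   # y_bar_00

# Orthogonal splitting: TSS = SSA + SSE
ssa = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)

msa, mse = ssa / (k - 1), sse / (n - k)
F = msa / mse
p_value = stats.f.sf(F, k - 1, n - k)        # right-tailed test
print(F, p_value)

# Cross-check against SciPy's built-in one-way ANOVA
print(stats.f_oneway(*groups))
```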

Paired comparison:

H0 : µ1 = µ2 = ... = µk  vs  H1 : at least one inequality in H0.
If H0 is rejected then the class means are not all equal, so a paired comparison is required.
For comparing the ith class with the i′th class we have the following hypothesis:
H0 : µi = µi′  vs  H1 : µi ≠ µi′.
The corresponding test statistic is
t = (ȳi0 − ȳi′0) / √{MSE (1/ni + 1/ni′)} ∼ t_{n−k}.
We reject H0 at level α if
|ȳi0 − ȳi′0| > √{MSE (1/ni + 1/ni′)} · t_{α/2; n−k};
the quantity on the right is called the least significant difference.
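A sketch of the paired comparison by the least-significant-difference rule (the helper name lsd_pairs is hypothetical; groups is a list of NumPy arrays as in the previous sketch):

```python
from itertools import combinations
import numpy as np
from scipy import stats

def lsd_pairs(groups, alpha=0.05):
    """All pairwise comparisons by the least-significant-difference rule."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    mse = sum(((g - g.mean()) ** 2).sum() for g in groups) / (n - k)
    tcrit = stats.t.ppf(1 - alpha / 2, n - k)
    results = []
    for i, j in combinations(range(k), 2):
        diff = abs(groups[i].mean() - groups[j].mean())
        lsd = tcrit * np.sqrt(mse * (1 / len(groups[i]) + 1 / len(groups[j])))
        results.append((i, j, diff, lsd, diff > lsd))  # True = significantly different
    return results
```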


Remark:

Let m hypotheses be tested simultaneously, each test having type-I error α. Then
P[at least one false rejection] = 1 − P[no false rejection] = 1 − (1 − α)^m,
and as m → ∞, P[at least one false rejection] → 1. This is the problem of multiple testing.
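A quick numerical check of this remark (a sketch assuming α = 0.05 and independent tests):

```python
alpha = 0.05
for m in (1, 5, 10, 20, 50):
    # probability of at least one false rejection among m independent tests
    print(m, 1 - (1 - alpha) ** m)
# m = 10 already gives about 0.40, far above the nominal 0.05
```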

Two-way fixed effects model (one observation per cell):

Motivation: Suppose that in the statistics tuition market there are two teachers; the first teacher has two fixed batches and the second teacher has three fixed batches. Also suppose that only one student is present in each intersection of the batches of the two teachers. Then the marks of the students vary due to two factors, i.e., the two teachers, and the situation is modelled by the ANOVA two-way fixed effects model with one observation per cell.

B → :  B1   B2   ...  Bq
A ↓
A1  :  y11  y12  ...  y1q
A2  :  y21  y22  ...  y2q
⋮
Ap  :  yp1  yp2  ...  ypq
Model:
yij = observation corresponding to the ith level of A and the jth level of B.
Here the model is yij = µij + eij, i = 1(1)p, j = 1(1)q.
We reparametrise the model as follows:
yij = µ + (µi0 − µ) + (µ0j − µ) + (µij − µi0 − µ0j + µ) + eij
= µ + αi + βj + γij + eij,
where
µ = general effect,
αi = additional effect due to the ith level of A,
βj = additional effect due to the jth level of B,
γij = interaction effect due to the ith level of A and the jth level of B,
eij = error in the model.
With one observation per cell we take γij = 0, as γij cannot be estimated, so the model becomes
yij = µ + αi + βj + eij.

Assumptions:
(i) Σ_{i=1}^{p} αi = 0, (ii) Σ_{j=1}^{q} βj = 0, (iii) eij ~ iid N(0, σ²).
Estimation of the model parameters:
E = Σ_{i=1}^{p} Σ_{j=1}^{q} e²ij = Σ_i Σ_j (yij − µ − αi − βj)²

∂E/∂µ = 0 ⇒ Σ_i Σ_j (yij − µ − αi − βj)(−1) = 0
⇒ Σ_i Σ_j yij − pqµ − q Σ_i αi − p Σ_j βj = 0, and the last two sums are 0,
⇒ µ̂ = (1/pq) Σ_i Σ_j yij = ȳ00.
∂E/∂αi = 0 ⇒ (∂/∂αi) Σ_i Σ_j (yij − µ − αi − βj)² = 0
⇒ Σ_{j=1}^{q} yij = qµ + qαi + Σ_{j=1}^{q} βj, and the last sum is 0,
⇒ α̂i = (1/q) Σ_{j=1}^{q} yij − µ̂ = ȳi0 − ȳ00.
Similarly, β̂j = ȳ0j − ȳ00, where ȳ0j = (1/p) Σ_{i=1}^{p} yij.
Orthogonal splitting of the total sum of squares:
Note that yij = µ̂ + α̂i + β̂j + eij,
i.e., yij = ȳ00 + (ȳi0 − ȳ00) + (ȳ0j − ȳ00) + (yij − ȳi0 − ȳ0j + ȳ00)
⇒ (yij − ȳ00) = (ȳi0 − ȳ00) + (ȳ0j − ȳ00) + (yij − ȳi0 − ȳ0j + ȳ00).
Squaring and summing (the product terms vanish),
Σ_{i=1}^{p} Σ_{j=1}^{q} (yij − ȳ00)² = q Σ_{i=1}^{p} (ȳi0 − ȳ00)² + p Σ_{j=1}^{q} (ȳ0j − ȳ00)² + Σ_i Σ_j (yij − ȳi0 − ȳ0j + ȳ00)²,
i.e., TSS = SSA + SSB + SSE,
where TSS = total sum of squares, SSA = sum of squares due to A, SSB = sum of squares due to B, and SSE = sum of squares due to error.
Degrees of freedom corresponding to TSS → pq − 1
Degrees of freedom corresponding to SSA → p − 1
Degrees of freedom corresponding to SSB → q − 1
Degrees of freedom corresponding to error → (p − 1)(q − 1)

Expectation of the sums of squares:
E(SSE) = E[Σ_{i=1}^{p} Σ_{j=1}^{q} (yij − ȳi0 − ȳ0j + ȳ00)²].
Since yij = µ + αi + βj + eij,
ȳi0 = (1/q) Σ_j yij = µ + αi + (1/q) Σ_j βj + ēi0 = µ + αi + ēi0,
and similarly
ȳ0j = µ + βj + ē0j,  ȳ00 = µ + ē00.
SSE:
Σ_i Σ_j (µ + αi + βj + eij − µ − αi − ēi0 − µ − βj − ē0j + µ + ē00)²
= Σ_i Σ_j (eij − ēi0 − ē0j + ē00)²
= Σ_i Σ_j (e²ij + ē²i0 + ē²0j + ē²00 − 2eij ēi0 − 2eij ē0j + 2eij ē00 + 2ēi0 ē0j − 2ēi0 ē00 − 2ē0j ē00).

E[SSE] = Σ_i Σ_j [E(e²ij) + E(ē²i0) + E(ē²0j) + E(ē²00) − 2 cov(eij, ēi0) − 2 cov(eij, ē0j) + 2 cov(eij, ē00) + 2 cov(ēi0, ē0j) − 2 cov(ēi0, ē00) − 2 cov(ē0j, ē00)].
Now
cov(eij, ēi0) = cov(eij, (1/q) Σ_j eij) = (1/q) V(eij) = σ²/q,
cov(eij, ē0j) = cov(eij, (1/p) Σ_i eij) = (1/p) V(eij) = σ²/p,
cov(eij, ē00) = cov(eij, (1/pq) Σ_i Σ_j eij) = (1/pq) V(eij) = σ²/pq,
cov(ēi0, ē0j) = cov((1/q) Σ_j eij, (1/p) Σ_i eij) = (1/pq) V(eij) = σ²/pq,
cov(ēi0, ē00) = cov(ēi0, (1/p) Σ_i ēi0) = (1/p) V(ēi0) = (1/p)(σ²/q) = σ²/pq,
and similarly
cov(ē0j, ē00) = cov(ē0j, (1/q) Σ_j ē0j) = (1/q)(σ²/p) = σ²/pq.
Therefore,
E(SSE) = Σ_i Σ_j (σ² + σ²/q + σ²/p + σ²/pq − 2σ²/q − 2σ²/p + 2σ²/pq + 2σ²/pq − 2σ²/pq − 2σ²/pq)
= Σ_i Σ_j (σ² − σ²/p − σ²/q + σ²/pq)
= pqσ² (p − 1)(q − 1)/pq = σ²(p − 1)(q − 1),
i.e., E[SSE] = σ²(p − 1)(q − 1) ⇒ E[SSE/((p − 1)(q − 1))] = σ² ⇒ E[MSE] = σ².

E[SSA] = E[q Σ_{i=1}^{p} (ȳi0 − ȳ00)²] = E[q Σ_{i=1}^{p} (µ + αi + ēi0 − µ − ē00)²]
= E[q Σ_{i=1}^{p} (αi + ēi0 − ē00)²]
= q Σ_{i=1}^{p} [αi² + E(ē²i0) + E(ē²00) − 2 cov(ēi0, ē00) + 2αi E(ēi0 − ē00)]
= q Σ_{i=1}^{p} [αi² + σ²/q − σ²/pq]
= q Σ_{i=1}^{p} αi² + pσ² − σ² = q Σ_{i=1}^{p} αi² + (p − 1)σ².
Hence
E[SSA/(p − 1)] = (q/(p − 1)) Σ αi² + σ²
⇒ E[MSA] = (q/(p − 1)) Σ_{i=1}^{p} αi² + σ².
Similarly, E[MSB] = (p/(q − 1)) Σ_{j=1}^{q} βj² + σ².
Hypotheses:
Here we want to test
(i) H01 : αi = 0 ∀ i vs H1 : at least one inequality in H01 (the factor A has no effect);
(ii) H02 : βj = 0 ∀ j vs H1 : at least one inequality in H02 (the factor B has no effect).
Under H01, E(MSA) = σ² = E(MSE); as we deviate away from H01,
E(MSA) ≥ σ² = E(MSE).
So intuitively a large value of MSA/MSE indicates rejection of H01.

Under H01,
SSA/σ² ∼ χ²_{p−1} and SSE/σ² ∼ χ²_{(p−1)(q−1)}, independently.
Now
F1 = (SSA/(p − 1)) / (SSE/((p − 1)(q − 1))) = MSA/MSE ∼ F_{p−1,(p−1)(q−1)}.
So we reject H01 at size α if
F1 > F_{α; p−1,(p−1)(q−1)}.
Similarly, define F2 = MSB/MSE ∼ F_{q−1,(p−1)(q−1)}; we reject H02 at size α if
F2 > F_{α; q−1,(p−1)(q−1)}.
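A minimal numerical sketch of the two-way layout with one observation per cell (a hypothetical 2 × 3 table, assuming NumPy and SciPy); it reproduces the orthogonal splitting and both F tests:

```python
import numpy as np
from scipy import stats

# Hypothetical p x q table, one observation per cell
y = np.array([[4.2, 4.8, 5.1],
              [3.9, 4.4, 4.9]])
p, q = y.shape

row_means, col_means, grand = y.mean(axis=1), y.mean(axis=0), y.mean()

ssa = q * ((row_means - grand) ** 2).sum()
ssb = p * ((col_means - grand) ** 2).sum()
sse = ((y - row_means[:, None] - col_means[None, :] + grand) ** 2).sum()

msa, msb = ssa / (p - 1), ssb / (q - 1)
mse = sse / ((p - 1) * (q - 1))

F1, F2 = msa / mse, msb / mse
print("A:", F1, stats.f.sf(F1, p - 1, (p - 1) * (q - 1)))
print("B:", F2, stats.f.sf(F2, q - 1, (p - 1) * (q - 1)))
```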
Two-way fixed effects model (m observations per cell):

Motivation: Suppose that in the statistics tuition market there are two teachers. The first teacher has two fixed batches and the second teacher has three fixed batches. Also suppose that m students are present in each intersection of the batches of the two teachers. Then the marks of the students vary due to two factors, i.e., the two teachers, and the situation is modelled by the ANOVA two-way fixed effects layout with m observations per cell.

Model:
We consider a factor A having p fixed levels A1, ..., Ap and another factor B with q fixed levels B1, B2, ..., Bq. Corresponding to the ith level of A and the jth level of B there are m observations, ∀ i, j:
yijk = µ + αi + βj + γij + eijk,  i = 1(1)p, j = 1(1)q, k = 1(1)m,
where
yijk = kth observation corresponding to the ith level of A and the jth level of B,
µ = general effect,
αi = additional effect due to the ith level of A,
βj = additional effect due to the jth level of B,
γij = interaction effect due to the ith level of A and the jth level of B,
eijk = error in the model.
Here we can estimate γij and hence we incorporate it in the model.
Assumptions:
(i) Σ_{i=1}^{p} αi = 0, (ii) Σ_{j=1}^{q} βj = 0,
(iii) Σ_{i=1}^{p} γij = 0 ∀ j and Σ_{j=1}^{q} γij = 0 ∀ i,
(iv) eijk ~ iid N(0, σ²).

Estimation of the model parameters:

µ̂ = ȳ000 = (1/pqm) Σ_i Σ_j Σ_k yijk,
α̂i = ȳi00 − ȳ000, where ȳi00 = (1/qm) Σ_j Σ_k yijk,
β̂j = ȳ0j0 − ȳ000, where ȳ0j0 = (1/pm) Σ_i Σ_k yijk.
For γij, minimise E = Σ_i Σ_j Σ_k (yijk − µ − αi − βj − γij)²:
∂E/∂γij = (−2) Σ_{k=1}^{m} (yijk − µ − αi − βj − γij) = 0
⇒ Σ_{k=1}^{m} yijk = mµ̂ + mα̂i + mβ̂j + mγ̂ij
⇒ γ̂ij = (1/m) Σ_k yijk − (ȳi00 − ȳ000) − (ȳ0j0 − ȳ000) − ȳ000
⇒ γ̂ij = ȳij0 − ȳi00 − ȳ0j0 + ȳ000.
Orthogonal splitting of the total sum of squares:

yijk = ȳ000 + (ȳi00 − ȳ000) + (ȳ0j0 − ȳ000) + (ȳij0 − ȳi00 − ȳ0j0 + ȳ000) + (yijk − ȳij0)
⇒ (yijk − ȳ000) = (ȳi00 − ȳ000) + (ȳ0j0 − ȳ000) + (ȳij0 − ȳi00 − ȳ0j0 + ȳ000) + (yijk − ȳij0).
Squaring and summing on both sides we get
Σ_i Σ_j Σ_k (yijk − ȳ000)² = qm Σ_{i=1}^{p} (ȳi00 − ȳ000)² + pm Σ_{j=1}^{q} (ȳ0j0 − ȳ000)² + m Σ_i Σ_j (ȳij0 − ȳi00 − ȳ0j0 + ȳ000)² + Σ_i Σ_j Σ_k (yijk − ȳij0)²,
i.e., TSS = SSA + SSB + SS(AB) + SSE.
Degrees of freedom of TSS = pqm − 1
Degrees of freedom of SSA = p − 1
Degrees of freedom of SSB = q − 1
Degrees of freedom of SS(AB) = (p − 1)(q − 1)
Degrees of freedom of SSE = pq(m − 1)

Expectation of the sums of squares:

SSE = Σ_i Σ_j Σ_k (yijk − ȳij0)².
Since yijk = µ + αi + βj + γij + eijk,
ȳij0 = (1/m) Σ_k (µ + αi + βj + γij + eijk) = µ + αi + βj + γij + ēij0.
E(SSE) = Σ_i Σ_j Σ_k E(eijk − ēij0)²
= Σ_i Σ_j Σ_k E(e²ijk + ē²ij0 − 2ēij0 eijk)
= Σ_i Σ_j Σ_k [E(e²ijk) + E(ē²ij0) − 2 cov(eijk, ēij0)]
= Σ_i Σ_j Σ_k [σ² + σ²/m − 2 cov(eijk, (1/m) Σ_k eijk)]
= Σ_i Σ_j Σ_k (σ² + σ²/m − 2σ²/m)
= Σ_i Σ_j Σ_k (σ² − σ²/m)
= pqm σ² (m − 1)/m = (m − 1)pq σ².
E(SS(AB)) = m Σ_i Σ_j E(ȳij0 − ȳi00 − ȳ0j0 + ȳ000)².
Since yijk = µ + αi + βj + γij + eijk, and using the constraints Σ_i αi = Σ_j βj = 0 and Σ_i γij = Σ_j γij = 0,
ȳij0 = µ + αi + βj + γij + ēij0,
ȳi00 = µ + αi + ēi00,
ȳ0j0 = µ + βj + ē0j0,
ȳ000 = µ + ē000.
Hence
E[SS(AB)] = m Σ_i Σ_j E{γij + (ēij0 − ēi00 − ē0j0 + ē000)}²
= m Σ_i Σ_j {γij² + E(ēij0 − ēi00 − ē0j0 + ē000)² + 2γij E(ēij0 − ēi00 − ē0j0 + ē000)}
= m Σ_i Σ_j γij² + m Σ_i Σ_j E(ēij0 − ēi00 − ē0j0 + ē000)².
Note that
E(ēij0 − ēi00 − ē0j0 + ē000)²
= E(ē²ij0) + E(ē²i00) + E(ē²0j0) + E(ē²000)
− 2 cov(ēij0, ēi00) − 2 cov(ēij0, ē0j0) + 2 cov(ēij0, ē000)
+ 2 cov(ēi00, ē0j0) − 2 cov(ēi00, ē000) − 2 cov(ē0j0, ē000).
Now
cov(ēij0, ēi00) = cov(ēij0, (1/q) Σ_j ēij0) = (1/q) V(ēij0) = σ²/mq,
cov(ēij0, ē0j0) = cov(ēij0, (1/p) Σ_i ēij0) = (1/p) V(ēij0) = σ²/mp,
cov(ēi00, ē0j0) = cov((1/qm) Σ_j Σ_k eijk, (1/pm) Σ_i Σ_k eijk) = (1/pqm²) Σ_{k=1}^{m} V(eijk) = mσ²/pqm² = σ²/pqm,
cov(ēi00, ē000) = cov(ēi00, (1/p) Σ_i ēi00) = (1/p) V(ēi00) = σ²/pqm,
cov(ēij0, ē000) = σ²/pqm, and similarly cov(ē0j0, ē000) = σ²/pqm.
Therefore,
m Σ_i Σ_j E(ēij0 − ēi00 − ē0j0 + ē000)²
= m Σ_i Σ_j (σ²/m + σ²/qm + σ²/pm + σ²/pqm − 2σ²/mq − 2σ²/mp + 2σ²/pqm + 2σ²/pqm − 2σ²/pqm − 2σ²/pqm)
= m Σ_i Σ_j (σ²/m − σ²/mp − σ²/qm + σ²/pqm)
= m · pq · σ²(pq − p − q + 1)/(pqm) = σ²(p − 1)(q − 1).

Therefore,
E[SS(AB)] = m Σ_i Σ_j γij² + (p − 1)(q − 1)σ².
Similarly,
E(SSA) = qm Σ_{i=1}^{p} αi² + (p − 1)σ²  [see copy],
E(SSB) = pm Σ_{j=1}^{q} βj² + (q − 1)σ²  [see copy].
Hypotheses:
Here we want to test
(i) H01 : αi = 0 ∀ i vs H1 : at least one inequality in H01 (the factor A has no effect);
(ii) H02 : βj = 0 ∀ j vs H1 : at least one inequality in H02 (the factor B has no effect);
(iii) H03 : γij = 0 ∀ i, j vs H1 : at least one inequality in H03 (the interaction of A and B has no effect).
Test statistic:
Note that under H01, E(MSA) = E(MSE) = σ². As we drift away from H01, E(MSA) > E(MSE). Thus a large value of MSA/MSE indicates rejection of H01.
Under H01,
SSA/σ² ∼ χ²_{p−1} and SSE/σ² ∼ χ²_{pq(m−1)}, independently, so
F1 = (SSA/(p − 1)) / (SSE/(pq(m − 1))) = MSA/MSE ∼ F_{p−1, pq(m−1)}.
We reject H01 at level α if F1 > F_{α; p−1, pq(m−1)}.
Similarly we reject H02 at level α if F2 > F_{α; q−1, pq(m−1)}, and we reject H03 at level α if F3 > F_{α; (p−1)(q−1), pq(m−1)}, where F2 = MSB/MSE and F3 = MS(AB)/MSE.
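The m-observations-per-cell computation can be sketched the same way (simulated data of shape (p, q, m); all names and numbers are illustrative):

```python
import numpy as np
from scipy import stats

# Hypothetical data: shape (p, q, m) = levels of A, levels of B, replicates
rng = np.random.default_rng(0)
p, q, m = 2, 3, 4
y = rng.normal(5.0, 1.0, size=(p, q, m))

cell = y.mean(axis=2)                 # y_bar_ij0
a_mean = y.mean(axis=(1, 2))          # y_bar_i00
b_mean = y.mean(axis=(0, 2))          # y_bar_0j0
grand = y.mean()                      # y_bar_000

ssa = q * m * ((a_mean - grand) ** 2).sum()
ssb = p * m * ((b_mean - grand) ** 2).sum()
ssab = m * ((cell - a_mean[:, None] - b_mean[None, :] + grand) ** 2).sum()
sse = ((y - cell[:, :, None]) ** 2).sum()

df_e = p * q * (m - 1)
for name, ss, df in [("A", ssa, p - 1), ("B", ssb, q - 1),
                     ("AB", ssab, (p - 1) * (q - 1))]:
    F = (ss / df) / (sse / df_e)
    print(name, F, stats.f.sf(F, df, df_e))
```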

Random effects model: one-way layout

Motivation: The yield of paddy in India is influenced by the different states, so 'state' is a single factor causing variation. Since India has twenty-nine states, yield data cannot be obtained from all the states due to time and cost constraints, so a sample of states is chosen at random. Hence the levels of the factor 'state' are randomly chosen, and the situation is modelled as a one-way random effects model.
Model:
Let us consider a single factor A having k random levels, where the levels are chosen at random from a larger number of levels. There are r observations corresponding to each level, so the total number of observations is n = rk.
The ANOVA model is given by
yij = µ + ai + eij,
where
µ = general effect,
ai = additional random effect corresponding to the ith level of A,
eij = error in the model,
yij = jth observation corresponding to the ith level of A.
Assumptions:
(i) ai ~ iid N(0, σa²),
(ii) eij ~ iid N(0, σe²),
(iii) the ai and the eij are independent.

Orthogonal splitting of TSS:

Σ_{i=1}^{k} Σ_{j=1}^{r} (yij − ȳ00)² = r Σ_{i=1}^{k} (ȳi0 − ȳ00)² + Σ_{i=1}^{k} Σ_{j=1}^{r} (yij − ȳi0)²,
i.e., TSS = SSA + SSE.
Degrees of freedom of TSS = n − 1
Degrees of freedom of SSA = k − 1
Degrees of freedom of SSE = n − k

Expectation of the sums of squares:

E[SSA] = E[r Σ_{i=1}^{k} (ȳi0 − ȳ00)²].
Since yij = µ + ai + eij,
ȳi0 = (1/r) Σ_j yij = µ + ai + ēi0,  ȳ00 = µ + ā + ē00,
so
E[SSA] = E[r Σ_{i=1}^{k} (µ + ai + ēi0 − µ − ā − ē00)²] = E[r Σ_{i=1}^{k} {(ai − ā) + (ēi0 − ē00)}²].   (*)
Now by (*),
E(SSA) = E[r Σ {(ai − ā)² + (ēi0 − ē00)² + 2(ai − ā)(ēi0 − ē00)}]
= r Σ E(ai − ā)² + r Σ E(ēi0 − ē00)²   (the cross term has expectation 0 since the ai and the eij are independent with zero means).
Now
E(ai − ā)² = E(a²i) + E(ā²) − 2 cov(ai, ā) = σa² + σa²/k − 2σa²/k = σa² − σa²/k,
E(ēi0 − ē00)² = E(ē²i0) + E(ē²00) − 2 cov(ēi0, ē00) = σe²/r + σe²/rk − 2σe²/rk = σe²/r − σe²/rk,
since cov(ēi0, ē00) = cov(ēi0, (1/k) Σ_i ēi0) = (1/k) V(ēi0) = σe²/rk.
Therefore,
E(SSA) = r Σ_{i=1}^{k} (σa² − σa²/k) + r Σ_{i=1}^{k} (σe²/r − σe²/rk)
= r(k − 1)σa² + (k − 1)σe² = (k − 1)(rσa² + σe²),
so E[SSA/(k − 1)] = rσa² + σe², i.e.,
⇒ E[MSA] = rσa² + σe².

 
E[SSE] = E[Σ_{i=1}^{k} Σ_{j=1}^{r} (yij − ȳi0)²]
= E[Σ_i Σ_j (µ + ai + eij − µ − ai − ēi0)²]
= E[Σ_i Σ_j (eij − ēi0)²].
Now
E(eij − ēi0)² = E(e²ij) + E(ē²i0) − 2 cov(eij, ēi0)
= σe² + σe²/r − 2 cov(eij, (1/r) Σ_j eij)
= σe² + σe²/r − 2σe²/r = σe² − σe²/r.
Hence
E(SSE) = Σ_i Σ_j (σe² − σe²/r) = nσe² − kσe² = (n − k)σe²
⇒ E(MSE) = σe².

Hypothesis:
Here we want to test
H0 : σa² = 0 vs H1 : σa² > 0.
Test statistic:
Under H0, E(MSA) = E(MSE) = σe², and as we deviate away from the null, E(MSA) ≥ E(MSE). So a right-tailed test based on F = MSA/MSE is appropriate; under H0, F ∼ F_{k−1, n−k}, so we reject H0 at level α if F > F_{α; k−1, n−k}.
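For the random effects model the same F statistic applies, and E(MSA) = rσa² + σe² also yields the usual ANOVA-type moment estimator of the variance component, σ̂a² = (MSA − MSE)/r, truncated at zero. A simulation sketch with assumed parameter values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
k, r = 8, 5                                  # k random levels, r obs per level
sigma_a, sigma_e = 1.0, 2.0

a = rng.normal(0, sigma_a, size=k)           # random level effects
y = 10 + a[:, None] + rng.normal(0, sigma_e, size=(k, r))

n = k * r
msa = r * ((y.mean(axis=1) - y.mean()) ** 2).sum() / (k - 1)
mse = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum() / (n - k)

F = msa / mse
print("F =", F, "p =", stats.f.sf(F, k - 1, n - k))

# ANOVA-type estimator of the variance component, from E(MSA) = r*sigma_a^2 + sigma_e^2
sigma_a2_hat = max((msa - mse) / r, 0.0)
print("sigma_a^2 estimate:", sigma_a2_hat)
```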

Random effects model: two-way layout, m observations per cell

Motivation: The total yield of a crop in India varies over different states as well as over different types of seed. Since the number of seed types and the number of states are both very large, data collection becomes tough, so a sample of seed types and a sample of states are chosen. Thus both factors have randomly chosen levels. Also, in each intersection of levels we have a fixed number m of observations. Hence the yield of the crop is modelled by the random effects model, two-way layout with m observations per cell:

yijk = µ + ai + bj + cij + eijk
i = 1(1)p, j = 1(1)q and k = 1(1)m, where
yijk = kth observation corresponding to the ith level of A and the jth level of B,
ai = random effect due to the ith level of A,
bj = random effect due to the jth level of B,
cij = random interaction effect due to the ith level of A and the jth level of B,
eijk = random error in the model.

Orthogonal splitting of TSS:
Σ_i Σ_j Σ_k (yijk − ȳ000)² = mq Σ_{i=1}^{p} (ȳi00 − ȳ000)² + pm Σ_{j=1}^{q} (ȳ0j0 − ȳ000)² + m Σ_i Σ_j (ȳij0 − ȳi00 − ȳ0j0 + ȳ000)² + Σ_i Σ_j Σ_k (yijk − ȳij0)²,
i.e., TSS = SSA + SSB + SS(AB) + SSE.
Degrees of freedom of TSS = pqm − 1
Degrees of freedom of SSA = p − 1
Degrees of freedom of SSB = q − 1
Degrees of freedom of SS(AB) = (p − 1)(q − 1)
Degrees of freedom of SSE = pq(m − 1)


Assumptions:

iid
i. eijk ∼ N (0, σe2 )

iid 2)
ii. ai ∼ N (0, σA

iid 2)
iii. bj ∼ N (0, σB

iid 2 )
iv. cij ∼ N (0, σAB

v. ai , bj , cij , eijk are mutually uncorrelated

E[SS(AB)] = E[m Σ_i Σ_j (ȳij0 − ȳi00 − ȳ0j0 + ȳ000)²] = m Σ_i Σ_j E(ȳij0 − ȳi00 − ȳ0j0 + ȳ000)².
Since yijk = µ + ai + bj + cij + eijk,
ȳij0 = µ + ai + bj + cij + ēij0,
ȳi00 = µ + ai + b̄ + c̄i0 + ēi00,
ȳ0j0 = µ + ā + bj + c̄0j + ē0j0,
ȳ000 = µ + ā + b̄ + c̄00 + ē000.
Expectation of the sums of squares:
E[SS(AB)] = m Σ_i Σ_j E{(cij − c̄i0 − c̄0j + c̄00) + (ēij0 − ēi00 − ē0j0 + ē000)}²
= m Σ_i Σ_j [E(c²ij) + E(c̄²i0) + E(c̄²0j) + E(c̄²00) − 2 cov(cij, c̄i0) − 2 cov(cij, c̄0j) + 2 cov(cij, c̄00) + 2 cov(c̄i0, c̄0j) − 2 cov(c̄i0, c̄00) − 2 cov(c̄0j, c̄00)
+ E(ē²ij0) + E(ē²i00) + E(ē²0j0) + E(ē²000) − 2 cov(ēij0, ēi00) − 2 cov(ēij0, ē0j0) + 2 cov(ēij0, ē000) + 2 cov(ēi00, ē0j0) − 2 cov(ēi00, ē000) − 2 cov(ē0j0, ē000)]
(the cross terms between the c's and the ē's vanish since the c's and the e's are uncorrelated).

Now
cov(cij, c̄i0) = cov(cij, (1/q) Σ_j cij) = σAB²/q,
cov(cij, c̄0j) = σAB²/p,  cov(cij, c̄00) = σAB²/pq,
cov(c̄i0, c̄0j) = σAB²/pq,  cov(c̄i0, c̄00) = σAB²/pq,  cov(c̄0j, c̄00) = σAB²/pq.
Again
cov(ēij0, ēi00) = cov(ēij0, (1/q) Σ_j ēij0) = σe²/qm,
cov(ēij0, ē0j0) = σe²/pm,  cov(ēij0, ē000) = σe²/pqm,
cov(ēi00, ē000) = σe²/pqm,  cov(ē0j0, ē000) = σe²/pqm,
cov(ēi00, ē0j0) = σe²/pqm.
Therefore,
E[SS(AB)] = m · pq · (p − 1)(q − 1)σAB²/pq + m · pq · (p − 1)(q − 1)σe²/(mpq)
= (p − 1)(q − 1)(mσAB² + σe²)
⇒ E[SS(AB)/((p − 1)(q − 1))] = mσAB² + σe²
⇒ E[MS(AB)] = mσAB² + σe².
Similarly [see copy],
E(MSA) = mσAB² + qmσA² + σe²,
E(MSB) = mσAB² + pmσB² + σe²,
E(MSE) = σe².

Hypotheses:
Here we want to test
H0A : σA² = 0 vs H1A : σA² > 0,
H0B : σB² = 0 vs H1B : σB² > 0,
and H0AB : σAB² = 0 vs H1AB : σAB² > 0.
Test statistic:
Under H0A, E(MSA) = mσAB² + σe² = E(MS(AB)). As we deviate away from H0A, E(MSA) ≥ E(MS(AB)), so a right-tailed test based on MSA/MS(AB) is appropriate; under H0A, MSA/MS(AB) ∼ F_{p−1,(p−1)(q−1)}. We reject H0A at level α if
MSA/MS(AB) > F_{α; (p−1),(p−1)(q−1)}.
Similarly, under H0B, E(MSB) = E(MS(AB)) = σe² + mσAB². As we deviate away from H0B, E(MSB) ≥ E(MS(AB)); a right-tailed test based on MSB/MS(AB) is appropriate, and we reject H0B if MSB/MS(AB) > F_{α; (q−1),(p−1)(q−1)}.
Again, under H0AB, E[MS(AB)] = E[MSE] = σe². As we deviate away from H0AB, E[MS(AB)] > E[MSE]. Thus a right-tailed test based on MS(AB)/MSE is appropriate; we reject H0AB if MS(AB)/MSE > F_{α; (p−1)(q−1), pq(m−1)}.
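The choice of denominators is the essential point here, so a compact sketch (the function name random_effects_tests is hypothetical; y has shape (p, q, m)) that wires up the three tests with their valid errors may help:

```python
import numpy as np

def random_effects_tests(y):
    """F statistics for a two-way random effects layout, y of shape (p, q, m).

    Note the denominators: MS(AB) is the valid error for A and B,
    while MSE is the valid error for the interaction.
    """
    p, q, m = y.shape
    cell, grand = y.mean(axis=2), y.mean()
    a_mean, b_mean = y.mean(axis=(1, 2)), y.mean(axis=(0, 2))

    msa = q * m * ((a_mean - grand) ** 2).sum() / (p - 1)
    msb = p * m * ((b_mean - grand) ** 2).sum() / (q - 1)
    msab = m * ((cell - a_mean[:, None] - b_mean[None, :] + grand) ** 2).sum() \
           / ((p - 1) * (q - 1))
    mse = ((y - cell[:, :, None]) ** 2).sum() / (p * q * (m - 1))

    f_a = msa / msab   # H0A tested against MS(AB)
    f_b = msb / msab   # H0B tested against MS(AB)
    f_ab = msab / mse  # H0AB tested against MSE
    return f_a, f_b, f_ab
```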

Two-way mixed effects model: m observations per cell

Motivation:
The yield of a crop may vary over different states and fertilisers. We have randomly chosen 10 states and kept all the varieties of fertiliser. Hence the effect due to state is a random effect while the effect due to fertiliser remains a fixed effect. We have m observations corresponding to each state and each fertiliser. Thus the analysis of the yield of the crop can be carried out by the two-way mixed effects model with m observations per cell.

Model:

Suppose there are two factors A and B. The factor A has p fixed levels and the factor B has q randomly chosen levels. Suppose there are m observations per cell. The ANOVA model is given by
yijk = µ + ai + bj + cij + eijk,  i = 1(1)p, j = 1(1)q, k = 1(1)m,
where
yijk = kth observation corresponding to the ith level of A and the jth level of B,
µ = general effect,
ai = additional fixed effect due to the ith level of A,
bj = additional random effect due to the jth level of B,
cij = random interaction effect due to the ith level of A and the jth level of B.
Assumptions:

(i) Σ_{i=1}^{p} ai = 0,
(ii) Σ_{i=1}^{p} cij = 0 ∀ j,
(iii) bj ~ iid N(0, σB²),
(iv) cij ~ N(0, σi²) independently,
(v) eijk ~ iid N(0, σe²).

Remark: {bj} and {eijk} are independently distributed.
We further define
σA² = (1/(p − 1)) Σ_{i=1}^{p} ai²,   σAB² = (1/(p − 1)) Σ_{i=1}^{p} σi².
Orthogonal splitting of the total sum of squares:

Σ_i Σ_j Σ_k (yijk − ȳ000)² = qm Σ_{i=1}^{p} (ȳi00 − ȳ000)² + pm Σ_{j=1}^{q} (ȳ0j0 − ȳ000)² + m Σ_i Σ_j (ȳij0 − ȳi00 − ȳ0j0 + ȳ000)² + Σ_i Σ_j Σ_k (yijk − ȳij0)²,
i.e., TSS = SSA + SSB + SS(AB) + SSE.
Degrees of freedom of TSS = pqm − 1
Degrees of freedom of SSA = p − 1
Degrees of freedom of SSB = q − 1
Degrees of freedom of SS(AB) = (p − 1)(q − 1)
Degrees of freedom of SSE = pq(m − 1)
Expectation of the sums of squares:

ȳi00 = (1/qm) Σ_j Σ_k yijk = (1/qm) Σ_j Σ_k (µ + ai + bj + cij + eijk) = µ + ai + b̄ + c̄i0 + ēi00,
ȳ0j0 = (1/pm) Σ_i Σ_k yijk = µ + bj + ē0j0   (since ā = 0 and c̄0j = 0 by the constraints),
ȳij0 = (1/m) Σ_k yijk = µ + ai + bj + cij + ēij0,
ȳ000 = (1/pqm) Σ_i Σ_j Σ_k yijk = µ + b̄ + ē000.

E[SSA] = qm Σ_{i=1}^{p} E(ȳi00 − ȳ000)²
= qm Σ_{i=1}^{p} E(µ + ai + b̄ + c̄i0 + ēi00 − µ − b̄ − ē000)²
= qm Σ_{i=1}^{p} E(ai + c̄i0 + ēi00 − ē000)²
= qm Σ_{i=1}^{p} [E(a²i) + E(c̄²i0) + E(ēi00 − ē000)²]   (the product terms vanish due to independence)
= qm [Σ_{i=1}^{p} a²i + Σ_{i=1}^{p} σi²/q + Σ_{i=1}^{p} {E(ē²i00) + E(ē²000) − 2E(ēi00 ē000)}].
Now
E(ēi00 ē000) = cov(ēi00, ē000) = cov(ēi00, (1/p) Σ_i ēi00) = (1/p) V(ēi00) = σe²/pqm.
Therefore,
E(SSA) = qm [Σ_{i=1}^{p} a²i + (1/q) Σ_{i=1}^{p} σi² + Σ_{i=1}^{p} (σe²/qm − σe²/pqm)]
= qm [(p − 1)σA² + ((p − 1)/q) σAB² + (p/qm) σe² ((p − 1)/p)]
= (p − 1)(qmσA² + mσAB² + σe²).
and
E(SSB) = pm Σ_{j=1}^{q} E(ȳ0j0 − ȳ000)²
= pm Σ_{j=1}^{q} E(µ + bj + ē0j0 − µ − b̄ − ē000)²
= pm Σ_{j=1}^{q} [E(bj − b̄)² + E(ē0j0 − ē000)² − 2E{(bj − b̄)(ē0j0 − ē000)}],
and the last expectation is 0. Now
E(bj − b̄)² = E(b²j) + E(b̄²) − 2E(bj b̄) = σB² + σB²/q − 2σB²/q = ((q − 1)/q) σB²,
E(ē0j0 − ē000)² = E(ē²0j0) + E(ē²000) − 2 cov(ē0j0, ē000) = σe²/pm + σe²/pqm − 2σe²/pqm = (σe²/pm)((q − 1)/q).
Hence
E(SSB) = pm Σ_{j=1}^{q} [((q − 1)/q) σB² + ((q − 1)/q)(σe²/pm)]
= pm(q − 1)σB² + (q − 1)σe² = (q − 1)(pmσB² + σe²).

Now
E[SS(AB)] = m Σ_i Σ_j E[ȳij0 − ȳi00 − ȳ0j0 + ȳ000]²
= m Σ_i Σ_j E[µ + ai + bj + cij + ēij0 − µ − ai − b̄ − c̄i0 − ēi00 − µ − bj − ē0j0 + µ + b̄ + ē000]²
= m Σ_i Σ_j E[(cij − c̄i0)² + (ēij0 − ēi00 − ē0j0 + ē000)²]
(the cross term vanishes), where E(ēij0 − ēi00 − ē0j0 + ē000)² = (p − 1)(q − 1)σe²/(pqm), as already computed.
Now
E[cij − c̄i0]² = E(c²ij) + E(c̄²i0) − 2 cov(cij, c̄i0) = σi² − σi²/q = ((q − 1)/q) σi².
Therefore,
E[SS(AB)] = m Σ_i Σ_j [((q − 1)/q) σi² + (p − 1)(q − 1)σe²/(pqm)]
= m(q − 1) Σ_{i=1}^{p} σi² + (p − 1)(q − 1)σe²
= m(p − 1)(q − 1)σAB² + (p − 1)(q − 1)σe²
= (p − 1)(q − 1)(mσAB² + σe²).

So finally,
E(MSA) = E[SSA/(p − 1)] = qmσA² + mσAB² + σe²,
E(MSB) = pmσB² + σe²,
E(MS(AB)) = mσAB² + σe²,
and E(MSE) = σe².
Hypotheses: Here we want to test H0A : σA² = 0 [i.e., ai = 0 ∀ i] vs H1A : σA² > 0,

H0B : σB² = 0 vs H1B : σB² > 0,
H0AB : σAB² = 0 vs H1AB : σAB² > 0.
Test statistic:
Under H0A,
E[MSA] = mσAB² + σe² = E[MS(AB)].
As we deviate from H0A,
E[MSA] ≥ E[MS(AB)],
so a right-tailed test based on MSA/MS(AB) is appropriate; under H0A,
MSA/MS(AB) ∼ F_{(p−1),(p−1)(q−1)}.
We reject H0A at level α if
MSA/MS(AB) > F_{α; (p−1),(p−1)(q−1)}.
Under H0B, E(MSB) = E(MSE) = σe², and as we deviate from H0B,
E(MSB) ≥ E(MSE),
so a right-tailed test based on MSB/MSE is appropriate. Similarly, for testing H0AB a right-tailed test based on MS(AB)/MSE is appropriate.

Remark:
In general MSA/MS(AB) does not have an exact F-distribution under H0A, so we consider an approximate F statistic with degrees of freedom (p − 1) and (p − 1)(q − 1).
Some important questions and answers:
1. What is a general linear hypothesis?

Let y1, y2, ..., yn be independently distributed normal variables with
E(yi) = ai1 β1 + ai2 β2 + ... + aip βp,
V(yi) = σ² ∀ i = 1(1)n,
cov(yi, yj) = 0 ∀ i ≠ j,
i.e.,
$$E(\mathbf{y})=\begin{pmatrix}E(y_1)\\E(y_2)\\\vdots\\E(y_n)\end{pmatrix}=\begin{pmatrix}a_{11}\beta_1+a_{12}\beta_2+\dots+a_{1p}\beta_p\\a_{21}\beta_1+a_{22}\beta_2+\dots+a_{2p}\beta_p\\\vdots\\a_{n1}\beta_1+a_{n2}\beta_2+\dots+a_{np}\beta_p\end{pmatrix}=X\boldsymbol\beta.\qquad(1)$$
X is called the design matrix containing known coefficients; β is the vector of unknown model parameters.
Let us consider that the parameters β are subject to m independent linear constraints
H_{m×p} β_{p×1} = h_{m×1}.   (2)
Now, for the linear model (1) and the restriction (2), we consider a hypothesis consisting of linear equations in the βi's, given as follows:
H0 : L_{t×p} β_{p×1} = l_{t×1},
where the t linear functions of β are assumed to be independent. It is necessary to assume that the row vectors of L are linearly dependent on the row vectors of X and H.
This set of hypotheses is called a general linear hypothesis.
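As a small worked example (not from the notes): in the one-way model with k = 3 level means β = (µ1, µ2, µ3)′, the hypothesis µ1 = µ2 = µ3 is the general linear hypothesis Lβ = l with t = 2 independent rows:

```latex
H_0 :\;
\underbrace{\begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \end{pmatrix}}_{L_{2\times 3}}
\begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{pmatrix}
=
\underbrace{\begin{pmatrix} 0 \\ 0 \end{pmatrix}}_{\boldsymbol{l}_{2\times 1}}
```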

2. Explain the concept of selection of a valid error.

In ANOVA, the valid error refers to the denominator of the test statistic for a given hypothesis. For most ANOVA models the mean square error serves as the valid error. We consider an example here where, besides the mean square error, some other quantity turns out to be the valid error: the two-way random effects model with m observations per cell.
yijk = µ + ai + bj + cij + eijk,  i = 1(1)p, j = 1(1)q, k = 1(1)m,
where
yijk = kth observation corresponding to the ith level of A and the jth level of B,
ai = random effect due to the ith level of A,
bj = random effect due to the jth level of B,
cij = random interaction effect due to the ith level of A and the jth level of B.
Orthogonal splitting of TSS:
Σ_i Σ_j Σ_k (yijk − ȳ000)² = mq Σ_{i=1}^{p} (ȳi00 − ȳ000)² + pm Σ_{j=1}^{q} (ȳ0j0 − ȳ000)² + m Σ_i Σ_j (ȳij0 − ȳi00 − ȳ0j0 + ȳ000)² + Σ_i Σ_j Σ_k (yijk − ȳij0)²,
i.e., TSS = SSA + SSB + SS(AB) + SSE.
Degrees of freedom of TSS = pqm − 1
Degrees of freedom of SSA = p − 1
Degrees of freedom of SSB = q − 1
Degrees of freedom of SS(AB) = (p − 1)(q − 1)
Degrees of freedom of SSE = pq(m − 1)
Assumptions:
(i) eijk ~ iid N(0, σe²), (ii) ai ~ iid N(0, σA²), (iii) bj ~ iid N(0, σB²), (iv) cij ~ iid N(0, σAB²).
It can be shown that
E[MS(AB)] = mσAB² + σe²,
E(MSA) = mσAB² + qmσA² + σe²,
E(MSB) = mσAB² + pmσB² + σe²,
E(MSE) = σe².
Hypotheses: Here we want to test
H0A : σA² = 0 vs H1A : σA² > 0,
H0B : σB² = 0 vs H1B : σB² > 0,
and H0AB : σAB² = 0 vs H1AB : σAB² > 0.
Test statistic:
Under H0A, E(MSA) = mσAB² + σe² = E(MS(AB)), and as we deviate away from H0A, E(MSA) ≥ E(MS(AB)); so a right-tailed test based on MSA/MS(AB) is appropriate. Under H0A, MSA/MS(AB) ∼ F_{p−1,(p−1)(q−1)}, and we reject H0A at level α if MSA/MS(AB) > F_{α; (p−1),(p−1)(q−1)}.
Similarly, under H0B, E(MSB) = E(MS(AB)) = σe² + mσAB², and as we deviate away from H0B, E(MSB) ≥ E(MS(AB)); a right-tailed test based on MSB/MS(AB) is appropriate, and we reject H0B if MSB/MS(AB) > F_{α; (q−1),(p−1)(q−1)}.
Again, under H0AB, E[MS(AB)] = E[MSE] = σe², and as we deviate away from H0AB, E[MS(AB)] > E[MSE]; thus a right-tailed test based on MS(AB)/MSE is appropriate, and we reject H0AB if MS(AB)/MSE > F_{α; (p−1)(q−1), pq(m−1)}.
To summarise: as we drift away from H0A and H0B, E(MSA) ≥ E(MS(AB)) and E(MSB) ≥ E(MS(AB)), i.e., right-tailed tests based on MSA/MS(AB) and MSB/MS(AB) are appropriate, and MS(AB) serves as the valid error for testing H0A and H0B. Whereas under H0AB, E(MS(AB)) = E(MSE), and as we drift away from H0AB, E(MS(AB)) ≥ E(MSE); hence a right-tailed test based on MS(AB)/MSE is appropriate and MSE serves as the valid error in this case. Hence the valid error changes over different testing problems.

3. What is orthogonal splitting?

In analysis of variance the main objective is to partition the total variability of a response into disjoint parts, where each part indicates the variability due to a certain effect. This partitioning is achieved via the orthogonal splitting of the total sum of squares: the total sum of squares represents the total variability, and via orthogonal splitting it is split into sums of squares due to the different sources of variation. Here the term 'orthogonal' indicates that the sums of squares due to the different sources are independent of each other.
In the one-way layout fixed effects model, the total sum of squares is partitioned into the sum of squares due to the single factor and the sum of squares due to error, as the total variability is caused by the single factor and the error. We can describe it as follows:
yij = ȳ00 + (ȳi0 − ȳ00) + eij
⇒ (yij − ȳ00) = (ȳi0 − ȳ00) + (yij − ȳi0).
Squaring and summing over i and j we get
Σ_{i=1}^{k} Σ_{j=1}^{ni} (yij − ȳ00)² = Σ_i Σ_j (ȳi0 − ȳ00)² + Σ_i Σ_j (yij − ȳi0)²   (the product term vanishes)
= Σ_{i=1}^{k} ni (ȳi0 − ȳ00)² + Σ_{i=1}^{k} Σ_{j=1}^{ni} (yij − ȳi0)²,
where
Σ_{i=1}^{k} ni (ȳi0 − ȳ00)² = sum of squares due to the factor A (SSA),
Σ_{i=1}^{k} Σ_{j=1}^{ni} (yij − ȳi0)² = sum of squares due to error (SSE),
i.e., TSS = SSA + SSE.

4. If the F-ratio is fractional, then what would be the interpretation?

A high value of the F-ratio indicates rejection of H0, and if the F-ratio equals one, the numerator of the test statistic equals the valid error, so the null hypothesis is trivially accepted.
Whenever the F-ratio is a fraction (less than one), it strongly suggests that the valid error dominates the effect of the factor. Hence the null hypothesis is comfortably accepted, i.e., the effect of the corresponding factor is judged to be absent.

5. In two-way classified data, if the equality of the levels of a certain factor gets rejected, then discuss how the significant effect can be traced out.

We consider two-way classified data with one observation per cell. The ANOVA model is given by
yij = µ + αi + βj + γij + eij, i = 1(1)p, j = 1(1)q,
where
µ = general effect,
αi = additional effect due to the ith level of A,
βj = additional effect due to the jth level of B,
γij = interaction effect due to the ith level of A and the jth level of B.
With one observation per cell we take γij = 0, as γij cannot be estimated, so the model becomes
yij = µ + αi + βj + eij.
Assumptions:
(i) Σ_{i=1}^{p} αi = 0, (ii) Σ_{j=1}^{q} βj = 0, (iii) eij ~ iid N(0, σ²).
Hypotheses:
(i) H01 : αi = 0 ∀ i vs H1 : at least one inequality in H01 (the factor A has no effect);
(ii) H02 : βj = 0 ∀ j vs H1 : at least one inequality in H02 (the factor B has no effect).
Under H01, E(MSA) = σ² = E(MSE); as we deviate away from H01, E(MSA) ≥ σ² = E(MSE). So intuitively a large value of MSA/MSE indicates rejection of H01.

Under H01,
SSA/σ² ∼ χ²_{p−1} and SSE/σ² ∼ χ²_{(p−1)(q−1)}, independently.
Now
F1 = (SSA/(p − 1)) / (SSE/((p − 1)(q − 1))) = MSA/MSE ∼ F_{p−1,(p−1)(q−1)},
so we reject H01 at size α if F1 > F_{α; p−1,(p−1)(q−1)}.
If H01 is rejected then a paired comparison is required. For comparing the ith class with the i′th class we have the following hypothesis:
H0 : αi = αi′ vs H1 : αi ≠ αi′.
The corresponding test statistic is
t = (ȳi0 − ȳi′0) / √{MSE (1/q + 1/q)} ∼ t_{(p−1)(q−1)}.
We reject H0 at level α if
|ȳi0 − ȳi′0| > √{2 MSE/q} · t_{α/2; (p−1)(q−1)};
this quantity is called the least significant difference.

6. In a two-way layout with m observations per cell, which hypothesis should be tested first, and why?

The hypothesis of no interaction should be tested first. If it is rejected, then tests for the individual effects are not worth making: under the presence of interaction, if a particular level of A is found to be the best, there is no way to detect that it will remain the best for each level of B, and the same holds for the factor B as well. So under the presence of interaction it is suggested to perform a one-way ANOVA on the factor A for a particular level of B, or a one-way ANOVA on the factor B for a particular level of A.
In the current set-up, tests for the individual effects can be carried out only if the interaction is tested to be absent, since in that case, under the respective null hypotheses,
E[MSA] = E[MS(AB)] and E(MSB) = E[MS(AB)].
