Consider the linear model in the expectation form:

$$E(Y_i) = \sum_{j=1}^{p} \beta_j x_{ij}, \quad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, p$$
$$\text{Var}(Y_i) = \sigma^2.$$

This is the linear model in the expectation form, where $\beta_1, \beta_2, \ldots, \beta_p$ are the unknown parameters and the $x_{ij}$'s are the known values of the independent covariates $X_1, X_2, \ldots, X_p$.
In terms of the random errors, the model can be written as

$$y_i = \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i, \quad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, p$$

where the $\varepsilon_i$'s are identically and independently distributed random error components with mean 0 and variance $\sigma^2$, i.e., $E(\varepsilon_i) = 0$, $\text{Var}(\varepsilon_i) = \sigma^2$ and $\text{Cov}(\varepsilon_i, \varepsilon_j) = 0 \; (i \neq j)$.
In matrix notation, the model is $Y = X\beta + \varepsilon$, where the matrix

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}$$

is an $n \times p$ matrix of $n$ observations on the $p$ independent covariates $X_1, X_2, \ldots, X_p$, and $\beta = (\beta_1, \beta_2, \ldots, \beta_p)'$ is a $p \times 1$ vector of the regression parameters.
Note that in the linear regression model, the covariates are usually continuous variables. When some of the covariates are counter (indicator) variables and the rest are continuous variables, the model is called a mixed model and is used in the analysis of covariance.
Consider an example of agricultural yield. The study variable $Y$ denotes the yield, which depends on various covariates $X_1, X_2, \ldots, X_p$. In regression analysis, the covariates $X_1, X_2, \ldots, X_p$ are different variables like temperature, quantity of fertilizer, amount of irrigation, etc. Now consider the case of a one-way model and try to understand its interpretation in terms of the multiple regression model. The covariate $X$ is now measured at different levels; e.g., if $X$ is the quantity of fertilizer and there are $p$ possible values, say 1 Kg., 2 Kg., ..., p Kg., then $X_1, X_2, \ldots, X_p$ denote these $p$ values in the following way.
The linear model now can be expressed as

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$$

by defining

$$X_j = \begin{cases} 1 & \text{if the effect of the } j\text{th level is present} \\ 0 & \text{if the effect of the } j\text{th level is absent.} \end{cases}$$

If the effect of 1 Kg. of fertilizer is present, then the other effects will obviously be absent and the linear model is expressible as

$$Y = \beta_0 + \beta_1 (X_1 = 1) + \beta_2 (X_2 = 0) + \cdots + \beta_p (X_p = 0) + \varepsilon = \beta_0 + \beta_1 + \varepsilon,$$

and so on.
If the experiment with 1 Kg. of fertilizer is repeated $n_1$ times, then $n_1$ observations on the response variable are recorded, which can be represented as

$$Y_{11} = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \cdots + \beta_p \cdot 0 + \varepsilon_{11}$$
$$Y_{12} = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \cdots + \beta_p \cdot 0 + \varepsilon_{12}$$
$$\vdots$$
$$Y_{1n_1} = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \cdots + \beta_p \cdot 0 + \varepsilon_{1n_1}.$$

If $X_2 = 1$ is repeated $n_2$ times, then on the same lines $n_2$ observations on the response variable are recorded:

$$Y_{21} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 1 + \cdots + \beta_p \cdot 0 + \varepsilon_{21}$$
$$Y_{22} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 1 + \cdots + \beta_p \cdot 0 + \varepsilon_{22}$$
$$\vdots$$
$$Y_{2n_2} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 1 + \cdots + \beta_p \cdot 0 + \varepsilon_{2n_2}.$$

The experiment is continued, and if $X_p = 1$ is repeated $n_p$ times, then on the same lines

$$Y_{p1} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 0 + \cdots + \beta_p \cdot 1 + \varepsilon_{p1}$$
$$Y_{p2} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 0 + \cdots + \beta_p \cdot 1 + \varepsilon_{p2}$$
$$\vdots$$
$$Y_{pn_p} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 0 + \cdots + \beta_p \cdot 1 + \varepsilon_{pn_p}.$$
All these observations can be assembled in matrix form as

$$\begin{pmatrix} y_{11} \\ y_{12} \\ \vdots \\ y_{1n_1} \\ y_{21} \\ y_{22} \\ \vdots \\ y_{2n_2} \\ \vdots \\ y_{p1} \\ y_{p2} \\ \vdots \\ y_{pn_p} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 0 & \cdots & 1 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 0 & \cdots & 1 \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{pmatrix} + \begin{pmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \vdots \\ \varepsilon_{1n_1} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \vdots \\ \varepsilon_{2n_2} \\ \vdots \\ \varepsilon_{p1} \\ \varepsilon_{p2} \\ \vdots \\ \varepsilon_{pn_p} \end{pmatrix}$$

(the first $n_1$ rows of the design matrix correspond to 1 Kg. of fertilizer, the next $n_2$ rows to 2 Kg., and so on), or

$$Y = X\beta + \varepsilon.$$
In the two-way analysis of variance model, there are two covariates and the linear model is expressible as

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \gamma_1 Z_1 + \gamma_2 Z_2 + \cdots + \gamma_q Z_q + \varepsilon$$

where $X_1, X_2, \ldots, X_p$ denote, e.g., the $p$ levels of the quantity of fertilizer, say 1 Kg., 2 Kg., ..., p Kg., and $Z_1, Z_2, \ldots, Z_q$ denote, e.g., the $q$ levels of irrigation, say 10 Cms., 20 Cms., ..., 10q Cms. The levels $X_1, X_2, \ldots, X_p, Z_1, Z_2, \ldots, Z_q$ are the counter variables indicating the presence or absence of the effect, as in the earlier case. If the effects of $X_1$ and $Z_1$ are present, i.e., 1 Kg. of fertilizer and 10 Cms. of irrigation are used, then the linear model is written as

$$Y = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \cdots + \beta_p \cdot 0 + \gamma_1 \cdot 1 + \gamma_2 \cdot 0 + \cdots + \gamma_q \cdot 0 + \varepsilon = \beta_0 + \beta_1 + \gamma_1 + \varepsilon.$$

Similarly, if the effects of $X_2$ and $Z_2$ are present, then

$$Y = \beta_0 + \beta_2 + \gamma_2 + \varepsilon.$$

The design matrix can be written accordingly, as in the one-way analysis of variance case.
In the three-way analysis of variance model,

$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \gamma_1 Z_1 + \cdots + \gamma_q Z_q + \delta_1 W_1 + \cdots + \delta_r W_r + \varepsilon.$$
If all $\beta$'s are unknown constants, they are called the parameters of the model, and the model is called a fixed effect model, or model I. The objective in this case is to make inferences about the parameters and the error variance $\sigma^2$.

If for some $j$, $x_{ij} = 1$ for all $i = 1, 2, \ldots, n$, then $\beta_j$ is termed an additive constant. In this case, $\beta_j$ occurs with every observation, so it is also called the general mean effect.

If all $\beta$'s are random variables except the additive constant, then the linear model is termed a random effect model, model II, or variance components model. The objective in this case is to make inferences about the variances of the $\beta$'s, i.e., $\sigma_{\beta_1}^2, \sigma_{\beta_2}^2, \ldots, \sigma_{\beta_p}^2$, and the error variance $\sigma^2$.

If some parameters are fixed and some are random variables, then the model is called a mixed effect model, or model III. In a mixed effect model, at least one $\beta_j$ is a constant and at least one $\beta_j$ is a random variable.
Analysis of variance

Analysis of variance is a body of statistical methods for analyzing measurements assumed to be structured as

$$y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i, \quad i = 1, 2, \ldots, n$$

where the $x_{ij}$'s are integers, generally 0 or 1, usually indicating the absence or presence of the effects $\beta_j$, and the $\varepsilon_i$'s are assumed to be identically and independently distributed with mean 0 and variance $\sigma^2$. It may be noted that the $\varepsilon_i$'s can additionally be assumed to follow a normal distribution $N(0, \sigma^2)$. This assumption is needed from the beginning of the analysis for maximum likelihood estimation of the parameters, but in least squares estimation it is needed only for the tests of hypothesis and the confidence interval estimation of the parameters. The least squares method does not require any distributional assumption, such as normality, up to the stage of estimation of the parameters.
The least squares estimator of $\beta$ is obtained by minimizing

$$S^2 = \sum_{i=1}^{n} \varepsilon_i^2 = (y - X\beta)'(y - X\beta) = y'y - 2\beta'X'y + \beta'X'X\beta$$

where $y = (y_1, y_2, \ldots, y_n)'$. Differentiating $S^2$ with respect to $\beta$ and setting the derivative to zero, the normal equations are obtained as

$$\frac{\partial S^2}{\partial \beta} = 2X'X\beta - 2X'y = 0$$

or

$$X'X\beta = X'y.$$

If $X$ has full rank $p$, then the least squares estimator of $\beta$ is

$$\hat{\beta} = (X'X)^{-1}X'y,$$

which is the best linear unbiased estimator of $\beta$ in the sense of having minimum variance in the class of linear and unbiased estimators. If the rank of $X$ is not full, then a generalized inverse is used in place of the inverse of $X'X$.
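As a quick numerical illustration of these formulas, here is a minimal NumPy sketch; the data, dimensions and seed are hypothetical, and only the estimators themselves come from the text:

```python
import numpy as np

# Simulate a full-rank linear model y = X beta + eps (illustrative values).
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))              # n observations on p covariates
beta_true = np.array([1.0, -2.0, 0.5])   # hypothetical true parameters
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Normal equations X'X b = X'y; with full column rank, b = (X'X)^{-1} X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# When X is rank deficient, a generalized (Moore-Penrose) inverse gives
# one solution of the normal equations.
beta_pinv = np.linalg.pinv(X) @ y
print(beta_hat, beta_pinv)
```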
If $L'\beta$ is a linear parametric function, where $L = (\ell_1, \ell_2, \ldots, \ell_p)'$ is a non-null vector of known constants, then its least squares estimate is $L'\hat{\beta}$. A question arises: under what conditions does a linear parametric function $L'\beta$ admit a unique least squares estimate in the general case?
Estimable functions:

A linear parametric function $L'\beta$ is said to be estimable if there exists a linear function $\ell'Y$ of $Y$ such that $E(\ell'Y) = L'\beta$ for all $\beta$.

If $L'\beta$ is estimable, then $L'\hat{\beta}$, where $\hat{\beta}$ is a solution of

$$X'X\beta = X'Y,$$

is the best linear unbiased estimator of $L'\beta$ in the sense of having minimum variance in the class of all linear and unbiased estimators of $L'\beta$.

Theorem 3: If $\theta_1 = \ell_1'\beta, \; \theta_2 = \ell_2'\beta, \ldots, \theta_k = \ell_k'\beta$ are estimable, then any linear combination of $\theta_1, \theta_2, \ldots, \theta_k$ is also estimable.
Theorem 4: All linear parametric functions in $\beta$ are estimable if and only if $X$ has full rank.
If $X$ is not of full rank, then some linear parametric functions do not admit unbiased linear estimators, and nothing can be inferred about them. The linear parametric functions which are not estimable are said to be confounded. A possible solution to this problem is to add linear restrictions on $\beta$ so as to reduce the linear model to full rank.
Theorem 5: Let $L_1'\beta$ and $L_2'\beta$ be two estimable parametric functions and let $L_1'\hat{\beta}$ and $L_2'\hat{\beta}$ be their least squares estimators. Then

$$\text{Var}(L_1'\hat{\beta}) = \sigma^2 L_1'(X'X)^{-1}L_1$$
$$\text{Cov}(L_1'\hat{\beta}, L_2'\hat{\beta}) = \sigma^2 L_1'(X'X)^{-1}L_2,$$

assuming that $X$ is a full rank matrix. If not, the generalized inverse of $X'X$ can be used in place of the unique inverse.
An unbiased estimator of $\sigma^2$ is obtained from the residual sum of squares $(y - X\hat{\beta})'(y - X\hat{\beta}) = y'\left[I - X(X'X)^{-1}X'\right]y$. Since

$$\text{tr}\left[I - X(X'X)^{-1}X'\right] = \text{tr}\, I - \text{tr}\left[X(X'X)^{-1}X'\right] = n - \text{tr}\left[(X'X)^{-1}X'X\right] \;(\text{using } \text{tr}(AB) = \text{tr}(BA)) = n - \text{tr}\, I_p = n - p,$$

it follows that

$$E\left[\frac{(y - X\hat{\beta})'(y - X\hat{\beta})}{n-p}\right] = \frac{\sigma^2\,\text{tr}\left[I - X(X'X)^{-1}X'\right]}{n-p} = \sigma^2,$$

so $s^2 = \dfrac{(y - X\hat{\beta})'(y - X\hat{\beta})}{n-p}$ is an unbiased estimator of $\sigma^2$.
Suppose $y_1, y_2, \ldots, y_n$ are independently distributed following a normal distribution with mean $E(y_i) = \sum_{j=1}^{p}\beta_j x_{ij}$ and variance $\text{Var}(y_i) = \sigma^2$ $(i = 1, 2, \ldots, n)$. Then the likelihood function of $y_1, y_2, \ldots, y_n$ is

$$L(\beta, \sigma^2 \mid y) = \frac{1}{(2\pi)^{n/2}(\sigma^2)^{n/2}}\exp\left[-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right].$$
Differentiating the log-likelihood with respect to $\beta$ and $\sigma^2$, we have

$$\frac{\partial \ln L}{\partial \beta} = 0 \;\Rightarrow\; X'X\beta = X'y$$
$$\frac{\partial \ln L}{\partial \sigma^2} = 0 \;\Rightarrow\; \sigma^2 = \frac{1}{n}(y - X\beta)'(y - X\beta).$$

Assuming the full rank of $X$, the normal equations are solved and the maximum likelihood estimators are obtained as

$$\tilde{\beta} = (X'X)^{-1}X'y$$
$$\tilde{\sigma}^2 = \frac{1}{n}(y - X\tilde{\beta})'(y - X\tilde{\beta}) = \frac{1}{n}\,y'\left[I - X(X'X)^{-1}X'\right]y.$$

The second order differentiation conditions can be checked, and they are satisfied for $\tilde{\beta}$ and $\tilde{\sigma}^2$ to be the maximum likelihood estimators.
Note that the maximum likelihood estimator $\tilde{\beta}$ is the same as the least squares estimator $\hat{\beta}$, whereas $\tilde{\sigma}^2$ is biased:

$$E(\tilde{\sigma}^2) = \frac{n-p}{n}\,\sigma^2,$$

so $\tilde{\sigma}^2$ is not unbiased, unlike the least squares estimator $s^2$.
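A small simulation makes the bias visible; everything below (dimensions, true coefficients, noise level) is illustrative:

```python
import numpy as np

# Contrast the ML and least squares estimators of sigma^2 on simulated data.
rng = np.random.default_rng(1)
n, p = 40, 4
X = rng.normal(size=(n, p))
y = X @ np.ones(p) + rng.normal(scale=2.0, size=n)   # true sigma^2 = 4

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
rss = np.sum((y - X @ beta_hat) ** 2)

sigma2_mle = rss / n        # biased: E(sigma2_mle) = (n - p)/n * sigma^2
s2 = rss / (n - p)          # unbiased least squares estimator
print(sigma2_mle, s2)       # s2 should be near 4.0 on average
```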
Now we use the following theorems for developing the tests of hypothesis.

Theorem 6: Let $Y = (Y_1, Y_2, \ldots, Y_n)'$ follow $N(\mu, \Sigma)$ with mean vector $\mu$ and positive definite covariance matrix $\Sigma$. Then $Y'AY$ follows a noncentral chi-square distribution with $p$ degrees of freedom and noncentrality parameter $\mu'A\mu$, i.e., $\chi^2(p, \mu'A\mu)$, if and only if $A\Sigma$ is an idempotent matrix of rank $p$.

Theorem 7: Let $Y = (Y_1, Y_2, \ldots, Y_n)'$ follow $N(\mu, \Sigma)$ with mean vector $\mu$ and positive definite covariance matrix $\Sigma$. Let $Y'A_1Y$ follow $\chi^2(p_1, \mu'A_1\mu)$ and $Y'A_2Y$ follow $\chi^2(p_2, \mu'A_2\mu)$. Then $Y'A_1Y$ and $Y'A_2Y$ are independently distributed if $A_1\Sigma A_2 = 0$.
Theorem 8: Let $Y = X\beta + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2 I)$, $\hat{\beta} = (X'X)^{-1}X'Y$ and $n\hat{\sigma}^2 = Y'\left[I - X(X'X)^{-1}X'\right]Y$, where rank$(X) = p$. Then $L'\hat{\beta}$ and $\hat{\sigma}^2$ are independently distributed and

$$L'\hat{\beta} \sim N\left(L'\beta, \; \sigma^2 L'(X'X)^{-1}L\right).$$

Proof: Let $A = I - X(X'X)^{-1}X'$ and $B = L'(X'X)^{-1}X'$; then

$$L'\hat{\beta} = L'(X'X)^{-1}X'Y = BY$$

and

$$n\hat{\sigma}^2 = (Y - X\beta)'\left[I - X(X'X)^{-1}X'\right](Y - X\beta) = Y'AY,$$

since $AX = 0$. So, using Theorem 6 with rank$(A) = n - p$, $\frac{n\hat{\sigma}^2}{\sigma^2}$ follows $\chi^2(n-p)$. Also

$$BA = L'(X'X)^{-1}X' - L'(X'X)^{-1}X'X(X'X)^{-1}X' = 0.$$

So, using Theorem 7, $Y'AY$ and $BY$ are independently distributed.
Analysis of Variance

The technique of analysis of variance involves breaking down the total variation into orthogonal components. Each orthogonal component represents the variation due to a particular factor contributing to the total variation.

Model: Let $Y = X\beta + \varepsilon$, where $E(\varepsilon) = 0$ and

$$E\left[(Y - X\beta)(Y - X\beta)'\right] = \sigma^2 I.$$
Now we consider four different types of tests of hypothesis. In the first two cases, we develop the likelihood ratio test for null hypotheses related to the analysis of variance. Later, we will derive the same test on the basis of the least squares principle as well. An important idea behind this development is to demonstrate that the tests used in the analysis of variance can be derived by the least squares principle as well as by the likelihood ratio test.
Case 1: Test of $H_0: \beta = \beta_0$

Consider the null hypothesis $H_0: \beta = \beta_0$, where $\beta = (\beta_1, \beta_2, \ldots, \beta_p)'$ and $\beta_0 = (\beta_{10}, \beta_{20}, \ldots, \beta_{p0})'$ is specified. This hypothesis is equivalent to

$$H_0: \beta_1 = \beta_{10}, \; \beta_2 = \beta_{20}, \ldots, \beta_p = \beta_{p0}.$$

Assume that all $\beta_i$'s are estimable, i.e., rank$(X) = p$ (full column rank). We now develop the likelihood ratio test.
The $(p+1)$-dimensional parametric space $\Omega$ is the collection of points

$$\Omega = \{(\beta, \sigma^2): -\infty < \beta_i < \infty, \; \sigma^2 > 0, \; i = 1, 2, \ldots, p\}.$$

Under $H_0$, all $\beta_i$'s are known, equal to $\beta_{i0}$, and $\Omega$ reduces to the one-dimensional space

$$\omega = \{(\beta_0, \sigma^2): \sigma^2 > 0\}.$$
The likelihood function is

$$L(y \mid \beta, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2}\exp\left[-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right].$$

The likelihood function is maximum over $\Omega$ when $\beta$ and $\sigma^2$ are substituted with their maximum likelihood estimators, i.e.,

$$\hat{\beta} = (X'X)^{-1}X'y, \qquad \hat{\sigma}^2 = \frac{1}{n}(y - X\hat{\beta})'(y - X\hat{\beta}).$$

Substituting $\hat{\beta}$ and $\hat{\sigma}^2$ in $L(y \mid \beta, \sigma^2)$ gives
$$\max_{\Omega} L(y \mid \beta, \sigma^2) = \left(\frac{1}{2\pi\hat{\sigma}^2}\right)^{n/2}\exp\left(-\frac{n}{2}\right) = \left[\frac{n}{2\pi (y - X\hat{\beta})'(y - X\hat{\beta})}\right]^{n/2}\exp\left(-\frac{n}{2}\right).$$
1
Under H 0 , the maximum likelihood estimator of 2 is 2 =
( y X 0 )( y X 0 ).
n
1 2
1
, )
exp 2 ( y X 0 )( y X 0 )
Max L( y =
2
2
n
2
n
n
exp
=
0
0
2
2 ( y X )( y X )
The likelihood ratio is

$$\lambda = \frac{\max_{\omega} L(y \mid \beta, \sigma^2)}{\max_{\Omega} L(y \mid \beta, \sigma^2)} = \left[\frac{(y - X\hat{\beta})'(y - X\hat{\beta})}{(y - X\beta_0)'(y - X\beta_0)}\right]^{n/2}.$$

Writing $y - X\beta_0 = (y - X\hat{\beta}) + X(\hat{\beta} - \beta_0)$ and noting that the cross-product terms vanish,

$$\lambda^{-2/n} = \frac{(y - X\hat{\beta})'(y - X\hat{\beta}) + (\hat{\beta} - \beta_0)'X'X(\hat{\beta} - \beta_0)}{(y - X\hat{\beta})'(y - X\hat{\beta})} = 1 + \frac{(\hat{\beta} - \beta_0)'X'X(\hat{\beta} - \beta_0)}{(y - X\hat{\beta})'(y - X\hat{\beta})} = 1 + \frac{q_1}{q_2},$$

i.e., $\lambda = \left(1 + \frac{q_1}{q_2}\right)^{-n/2}$, where

$$q_1 = (\hat{\beta} - \beta_0)'X'X(\hat{\beta} - \beta_0) \quad \text{and} \quad q_2 = (y - X\hat{\beta})'(y - X\hat{\beta}).$$
Further,

$$\begin{aligned} q_1 &= (\hat{\beta} - \beta_0)'X'X(\hat{\beta} - \beta_0) \\ &= \left[(X'X)^{-1}X'y - \beta_0\right]'X'X\left[(X'X)^{-1}X'y - \beta_0\right] \\ &= \left[(X'X)^{-1}X'(y - X\beta_0)\right]'X'X\left[(X'X)^{-1}X'(y - X\beta_0)\right] \\ &= (y - X\beta_0)'X(X'X)^{-1}X'X(X'X)^{-1}X'(y - X\beta_0) \\ &= (y - X\beta_0)'X(X'X)^{-1}X'(y - X\beta_0) \end{aligned}$$

and

$$\begin{aligned} q_2 &= (y - X\hat{\beta})'(y - X\hat{\beta}) \\ &= \left[y - X(X'X)^{-1}X'y\right]'\left[y - X(X'X)^{-1}X'y\right] = y'\left[I - X(X'X)^{-1}X'\right]y \\ &= \left[(y - X\beta_0) + X\beta_0\right]'\left[I - X(X'X)^{-1}X'\right]\left[(y - X\beta_0) + X\beta_0\right] \\ &= (y - X\beta_0)'\left[I - X(X'X)^{-1}X'\right](y - X\beta_0), \end{aligned}$$

using the result $\left[I - X(X'X)^{-1}X'\right]X = 0$.
In order to find the decision rule for $H_0$ based on $\lambda$, we first need to check whether $\lambda$ is a monotonic increasing or decreasing function of $\frac{q_1}{q_2}$. Let $g = \frac{q_1}{q_2}$, so that

$$\lambda = \left(1 + \frac{q_1}{q_2}\right)^{-n/2} = (1 + g)^{-n/2};$$

then

$$\frac{d\lambda}{dg} = -\frac{n}{2}\,(1 + g)^{-\left(\frac{n}{2} + 1\right)} < 0.$$

So as $g$ increases, $\lambda$ decreases: $\lambda$ is a monotonic decreasing function of $\frac{q_1}{q_2}$. The decision rule is to reject $H_0$ whenever $\lambda \leq \lambda_0$, i.e.,

$$(1 + g)^{-n/2} \leq \lambda_0 \quad\Longleftrightarrow\quad g \geq \lambda_0^{-2/n} - 1 \quad\Longleftrightarrow\quad g \geq C,$$

where $C$ is a constant to be determined by the size $\alpha$ condition of the test. So reject $H_0$ whenever

$$\frac{q_1}{q_2} \geq C.$$

Note that the statistic $\frac{q_1}{q_2}$ can also be obtained by the least squares method as follows:

- $\min_{H_0}(y - X\beta)'(y - X\beta) = (y - X\beta_0)'(y - X\beta_0) = q_1 + q_2$ is the sum of squares due to $H_0$, i.e., the total sum of squares;
- $\min_{\Omega}(y - X\beta)'(y - X\beta) = q_2$ is the sum of squares due to error;
- their difference $q_1$ is the sum of squares due to the deviation from $H_0$, or the sum of squares due to $\beta$.
It will be seen later that the test statistic is based on the ratio $\frac{q_1}{q_2}$. In order to find an appropriate distribution of $\frac{q_1}{q_2}$, we use the following theorem:
Theorem 9: Let $Z = Y - X\beta_0$,

$$Q_1 = Z'X(X'X)^{-1}X'Z \qquad \text{and} \qquad Q_2 = Z'\left[I - X(X'X)^{-1}X'\right]Z.$$

Then

$$\frac{Q_1}{\sigma^2} \sim \chi^2(p) \qquad \text{and} \qquad \frac{Q_2}{\sigma^2} \sim \chi^2(n-p),$$

where $\chi^2(m)$ denotes the $\chi^2$ distribution with $m$ degrees of freedom.
Proof: Under $H_0$,

$$E(Z) = X\beta_0 - X\beta_0 = 0, \qquad \text{Var}(Z) = \text{Var}(Y) = \sigma^2 I,$$

so $Z \sim N(0, \sigma^2 I)$. The matrices $X(X'X)^{-1}X'$ and $\left[I - X(X'X)^{-1}X'\right]$ are idempotent, with

$$\text{tr}\left[X(X'X)^{-1}X'\right] = \text{tr}\left[(X'X)^{-1}X'X\right] = \text{tr}(I_p) = p$$
$$\text{tr}\left[I - X(X'X)^{-1}X'\right] = \text{tr}\, I_n - \text{tr}\left[X(X'X)^{-1}X'\right] = n - p.$$

Hence

$$\frac{Q_1}{\sigma^2} \sim \chi^2(p) \qquad \text{and} \qquad \frac{Q_2}{\sigma^2} \sim \chi^2(n-p),$$

where the degrees of freedom $p$ and $(n-p)$ are obtained as the traces of $X(X'X)^{-1}X'$ and $I - X(X'X)^{-1}X'$, respectively. Since

$$\left[I - X(X'X)^{-1}X'\right]X(X'X)^{-1}X' = 0,$$

using Theorem 7 the quadratic forms $Q_1$ and $Q_2$ are independent under $H_0$. Hence the theorem is proved.
Since $Q_1$ and $Q_2$ are independently distributed, under $H_0$

$$\frac{Q_1/p}{Q_2/(n-p)} = \frac{n-p}{p}\cdot\frac{Q_1}{Q_2} \sim F(p, n-p),$$

a central $F$-distribution.
The constant $C$ is therefore obtained from the size condition as

$$C = F_{1-\alpha}(p, n-p),$$

where $F_{1-\alpha}(n_1, n_2)$ denotes the upper $100\alpha\%$ point of the $F$-distribution with $n_1$ and $n_2$ degrees of freedom. The computations for this test of hypothesis can be represented in the form of an analysis of variance table.
ANOVA for testing $H_0: \beta = \beta_0$
______________________________________________________________________________
Source of       Degrees of      Sum of                          Mean            F-value
variation       freedom         squares                         squares
______________________________________________________________________________
Due to $\beta$  $p$             $q_1$                           $q_1/p$         $\dfrac{n-p}{p}\cdot\dfrac{q_1}{q_2}$
Error           $n-p$           $q_2$                           $q_2/(n-p)$
Total           $n$             $(y - X\beta_0)'(y - X\beta_0)$
______________________________________________________________________________
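The whole test is easy to carry out numerically. The sketch below computes $q_1$, $q_2$ and the $F$-statistic exactly as in the table above, on simulated data; all names and values are illustrative assumptions, not part of the text:

```python
import numpy as np
from scipy import stats

# Likelihood ratio / F test of H0: beta = beta0 (here H0 is true by design).
rng = np.random.default_rng(2)
n, p = 60, 3
X = rng.normal(size=(n, p))
beta0 = np.zeros(p)                  # hypothesised value beta_0
y = rng.normal(size=n)               # data generated with beta = beta0 = 0

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
q1 = (beta_hat - beta0) @ (X.T @ X) @ (beta_hat - beta0)
q2 = np.sum((y - X @ beta_hat) ** 2)

F = (n - p) / p * (q1 / q2)          # ~ F(p, n-p) under H0
p_value = stats.f.sf(F, p, n - p)
print(F, p_value)
```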
Case 2: Test of $H_0: \beta_k = \beta_{k0}$, $k = 1, 2, \ldots, r < p$, when $\beta_{r+1}, \beta_{r+2}, \ldots, \beta_p$ and $\sigma^2$ are unknown
In Case 1, the test of hypothesis was developed for all the $\beta$'s, in the sense that we test $H_0: \beta_i = \beta_{i0}$ for each $i = 1, 2, \ldots, p$. Now consider another situation in which the interest is to test only a subset of $\beta_1, \beta_2, \ldots, \beta_p$, i.e., not all but only a few parameters. This type of test of hypothesis can be used, e.g., in the following situation. Suppose five levels of voltage are applied to check the rotations per minute (rpm) of a fan at 160, 180, 200, 220 and 240 volts. In practice, when the voltage is low, the differences in rpm at 160, 180 and 200 volts can be observed easily. At 220 and 240 volts, the fan rotates at full speed and there is not much difference in the rotations per minute at these voltages. So the interest of the experimenter lies in testing the hypothesis related to only the first three effects, viz., $\beta_1$ for 160 volts, $\beta_2$ for 180 volts and $\beta_3$ for 200 volts. The null hypothesis in this case can be written as
$$H_0: \beta_1 = \beta_{10}, \; \beta_2 = \beta_{20}, \; \beta_3 = \beta_{30},$$

with the remaining parameters and $\sigma^2$ unknown. Partition $\beta = (\beta_{(1)}', \beta_{(2)}')'$, where $\beta_{(1)} = (\beta_1, \ldots, \beta_r)'$ contains the parameters under test and $\beta_{(2)} = (\beta_{r+1}, \ldots, \beta_p)'$ the remaining ones, and partition $X = (X_1 \; X_2)$ accordingly, so that the null hypothesis is $H_0: \beta_{(1)} = \beta_{(1)}^0$. The whole parametric space is

$$\Omega = \{(\beta, \sigma^2): -\infty < \beta_i < \infty, \; \sigma^2 > 0, \; i = 1, 2, \ldots, p\},$$

and under $H_0$ it reduces to

$$\omega = \{(\beta, \sigma^2): \beta_{(1)} = \beta_{(1)}^0, \; -\infty < \beta_i < \infty \; (i = r+1, \ldots, p), \; \sigma^2 > 0\}.$$
The likelihood function is

$$L(y \mid \beta, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2}\exp\left[-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right].$$

The maximum value of the likelihood function under $\Omega$ is obtained by substituting the maximum likelihood estimates of $\beta$ and $\sigma^2$, i.e.,

$$\hat{\beta} = (X'X)^{-1}X'y, \qquad \hat{\sigma}^2 = \frac{1}{n}(y - X\hat{\beta})'(y - X\hat{\beta}),$$

as

$$\max_{\Omega} L(y \mid \beta, \sigma^2) = \left(\frac{1}{2\pi\hat{\sigma}^2}\right)^{n/2}\exp\left(-\frac{n}{2}\right) = \left[\frac{n}{2\pi (y - X\hat{\beta})'(y - X\hat{\beta})}\right]^{n/2}\exp\left(-\frac{n}{2}\right).$$
Now we find the maximum value of the likelihood function under $H_0$. The model under $H_0$ becomes

$$Y = X_1\beta_{(1)}^0 + X_2\beta_{(2)} + \varepsilon.$$

The likelihood function under $H_0$ is

$$L(y \mid \beta_{(2)}, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2}\exp\left[-\frac{1}{2\sigma^2}(y^* - X_2\beta_{(2)})'(y^* - X_2\beta_{(2)})\right]$$

where $y^* = y - X_1\beta_{(1)}^0$. Note that $\beta_{(2)}$ and $\sigma^2$ are the unknown parameters. This likelihood function is maximized at

$$\hat{\beta}_{(2)} = (X_2'X_2)^{-1}X_2'y^*, \qquad \hat{\sigma}_{\omega}^2 = \frac{1}{n}(y^* - X_2\hat{\beta}_{(2)})'(y^* - X_2\hat{\beta}_{(2)}).$$

Note that $X_2'X_2$ is a principal minor of $X'X$. Since $X'X$ is a positive definite matrix, $X_2'X_2$ is also positive definite. Thus $(X_2'X_2)^{-1}$ exists and is unique.
Thus the maximum value of the likelihood function under $\omega$ is

$$\max_{\omega} L(y \mid \beta_{(2)}, \sigma^2) = \left[\frac{n}{2\pi (y^* - X_2\hat{\beta}_{(2)})'(y^* - X_2\hat{\beta}_{(2)})}\right]^{n/2}\exp\left(-\frac{n}{2}\right).$$
The likelihood ratio test statistic for $H_0: \beta_{(1)} = \beta_{(1)}^0$ is

$$\begin{aligned} \lambda &= \frac{\max_{\omega} L(y \mid \beta, \sigma^2)}{\max_{\Omega} L(y \mid \beta, \sigma^2)} = \left[\frac{(y - X\hat{\beta})'(y - X\hat{\beta})}{(y^* - X_2\hat{\beta}_{(2)})'(y^* - X_2\hat{\beta}_{(2)})}\right]^{n/2} \\ &= \left[1 + \frac{(y^* - X_2\hat{\beta}_{(2)})'(y^* - X_2\hat{\beta}_{(2)}) - (y - X\hat{\beta})'(y - X\hat{\beta})}{(y - X\hat{\beta})'(y - X\hat{\beta})}\right]^{-n/2} \\ &= \left(1 + \frac{q_1}{q_2}\right)^{-n/2}. \end{aligned}$$
Consider

$$\begin{aligned} (y^* - X_2\hat{\beta}_{(2)})'(y^* - X_2\hat{\beta}_{(2)}) &= y^{*\prime}\left[I - X_2(X_2'X_2)^{-1}X_2'\right]y^* \\ &= \left[(y - X_1\beta_{(1)}^0 - X_2\beta_{(2)}) + X_2\beta_{(2)}\right]'\left[I - X_2(X_2'X_2)^{-1}X_2'\right]\left[(y - X_1\beta_{(1)}^0 - X_2\beta_{(2)}) + X_2\beta_{(2)}\right] \\ &= (y - X_1\beta_{(1)}^0 - X_2\beta_{(2)})'\left[I - X_2(X_2'X_2)^{-1}X_2'\right](y - X_1\beta_{(1)}^0 - X_2\beta_{(2)}). \end{aligned}$$

The other terms become zero using the result $\left[I - X_2(X_2'X_2)^{-1}X_2'\right]X_2 = 0$. Note that under $H_0$, $X_1\beta_{(1)}^0 + X_2\beta_{(2)}$ can be expressed as $(X_1 \; X_2)(\beta_{(1)}^{0\prime}, \beta_{(2)}')'$.
Similarly, consider

$$\begin{aligned} (y - X\hat{\beta})'(y - X\hat{\beta}) &= \left[y - X(X'X)^{-1}X'y\right]'\left[y - X(X'X)^{-1}X'y\right] \\ &= y'\left[I - X(X'X)^{-1}X'\right]y \\ &= \left[(y - X_1\beta_{(1)}^0 - X_2\beta_{(2)}) + (X_1\beta_{(1)}^0 + X_2\beta_{(2)})\right]'\left[I - X(X'X)^{-1}X'\right]\left[(y - X_1\beta_{(1)}^0 - X_2\beta_{(2)}) + (X_1\beta_{(1)}^0 + X_2\beta_{(2)})\right] \\ &= (y - X_1\beta_{(1)}^0 - X_2\beta_{(2)})'\left[I - X(X'X)^{-1}X'\right](y - X_1\beta_{(1)}^0 - X_2\beta_{(2)}), \end{aligned}$$

and the other terms become zero using the result $\left[I - X(X'X)^{-1}X'\right]X = 0$. Note that under $H_0$, the term $X_1\beta_{(1)}^0 + X_2\beta_{(2)}$ can be expressed as $(X_1 \; X_2)(\beta_{(1)}^{0\prime}, \beta_{(2)}')'$.
Thus

$$\begin{aligned} q_1 &= (y^* - X_2\hat{\beta}_{(2)})'(y^* - X_2\hat{\beta}_{(2)}) - (y - X\hat{\beta})'(y - X\hat{\beta}) \\ &= y^{*\prime}\left[I - X_2(X_2'X_2)^{-1}X_2'\right]y^* - y'\left[I - X(X'X)^{-1}X'\right]y \\ &= (y - X_1\beta_{(1)}^0 - X_2\beta_{(2)})'\left[I - X_2(X_2'X_2)^{-1}X_2'\right](y - X_1\beta_{(1)}^0 - X_2\beta_{(2)}) \\ &\quad - (y - X_1\beta_{(1)}^0 - X_2\beta_{(2)})'\left[I - X(X'X)^{-1}X'\right](y - X_1\beta_{(1)}^0 - X_2\beta_{(2)}) \\ &= (y - X_1\beta_{(1)}^0 - X_2\beta_{(2)})'\left[X(X'X)^{-1}X' - X_2(X_2'X_2)^{-1}X_2'\right](y - X_1\beta_{(1)}^0 - X_2\beta_{(2)}) \end{aligned}$$

and

$$\begin{aligned} q_2 &= (y - X\hat{\beta})'(y - X\hat{\beta}) = y'\left[I - X(X'X)^{-1}X'\right]y \\ &= (y - X_1\beta_{(1)}^0 - X_2\beta_{(2)})'\left[I - X(X'X)^{-1}X'\right](y - X_1\beta_{(1)}^0 - X_2\beta_{(2)}), \end{aligned}$$

where the other terms become zero. Note that in simplifying the terms $q_1$ and $q_2$, we tried to write them as quadratic forms in the same variable $(y - X_1\beta_{(1)}^0 - X_2\beta_{(2)})$.
Using the same argument as in Case 1, since $\lambda$ is a monotonic decreasing function of $\frac{q_1}{q_2}$, the likelihood ratio test rejects $H_0$ whenever

$$\frac{q_1}{q_2} > C,$$

where $C$ is a constant determined by the size $\alpha$ of the test.
The likelihood ratio test statistic can also be obtained through the least squares method as follows:

- $q_1 + q_2$: the minimum value of $(y - X\beta)'(y - X\beta)$ when $H_0: \beta_{(1)} = \beta_{(1)}^0$ holds true;
- $q_2$: the minimum value of $(y - X\beta)'(y - X\beta)$ over the whole parametric space (the sum of squares due to error);
- $q_1$: the sum of squares due to the deviation from $H_0$, or the sum of squares due to $\beta_{(1)}$ adjusted for $\beta_{(2)}$.

If $\beta_{(1)}^0 = 0$, then $\hat{\beta}_{(2)} = (X_2'X_2)^{-1}X_2'y$, and $q_1$ is the sum of squares due to $\beta_{(2)}$ subtracted from the total, i.e., the reduction in the sum of squares, or the sum of squares due to $\beta$ ignoring $\beta_{(1)}$.
Analogously to Theorem 9, let $Z = Y - X_1\beta_{(1)}^0 - X_2\beta_{(2)}$ and

$$Q_1 = Z'AZ, \qquad Q_2 = Z'BZ,$$

where

$$A = X(X'X)^{-1}X' - X_2(X_2'X_2)^{-1}X_2', \qquad B = I - X(X'X)^{-1}X'.$$

Then

$$\frac{Q_1}{\sigma^2} \sim \chi^2(r) \qquad \text{and} \qquad \frac{Q_2}{\sigma^2} \sim \chi^2(n-p),$$

and $Q_1$ and $Q_2$ are independently distributed. Thus, under $H_0$,

$$\frac{Q_1/r}{Q_2/(n-p)} = \frac{n-p}{r}\cdot\frac{Q_1}{Q_2} \sim F(r, n-p).$$
The constant in the decision rule is

$$C = F_{1-\alpha}(r, n-p),$$

where $F_{1-\alpha}(r, n-p)$ denotes the upper $\alpha\%$ point of the $F$-distribution with $r$ and $(n-p)$ degrees of freedom.
______________________________________________________________________________
Source of             Degrees of     Sum of        Mean           F-value
variation             freedom        squares       squares
______________________________________________________________________________
Due to $\beta_{(1)}$  $r$            $q_1$         $q_1/r$        $\dfrac{n-p}{r}\cdot\dfrac{q_1}{q_2}$
Error                 $n-p$          $q_2$         $q_2/(n-p)$
Total                 $n-(p-r)$      $q_1 + q_2$
______________________________________________________________________________
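A numerical sketch of this case: $H_0$ fixes $\beta_{(1)}$ (the first $r$ coefficients) at a hypothesised value while $\beta_{(2)}$ stays free, and $q_1$ is the increase in the residual sum of squares under the restriction. All names and data below are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, p, r = 80, 5, 2
X = rng.normal(size=(n, p))
y = X @ np.array([0.0, 0.0, 1.0, -1.0, 0.5]) + rng.normal(size=n)

beta1_0 = np.zeros(r)                 # hypothesised value of beta_(1)
X1, X2 = X[:, :r], X[:, r:]
y_star = y - X1 @ beta1_0             # y* = y - X1 beta_(1)^0

# Restricted fit (under H0) and full fit.
b2 = np.linalg.solve(X2.T @ X2, X2.T @ y_star)
rss_H0 = np.sum((y_star - X2 @ b2) ** 2)        # q1 + q2
b = np.linalg.solve(X.T @ X, X.T @ y)
q2 = np.sum((y - X @ b) ** 2)
q1 = rss_H0 - q2

F = (n - p) / r * (q1 / q2)           # ~ F(r, n-p) under H0
print(F, stats.f.sf(F, r, n - p))
```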
Case 3: Test of $H_0: L'\beta = \delta$

Let us consider the test of hypothesis related to a linear parametric function. Assume that the linear parametric function $L'\beta$ is estimable, where $L = (\ell_1, \ell_2, \ldots, \ell_p)'$ is a $p \times 1$ vector of known constants and $\beta = (\beta_1, \beta_2, \ldots, \beta_p)'$. The null hypothesis of interest is

$$H_0: L'\beta = \delta$$

where $\delta$ is some specified constant. Consider the set-up of the linear model $Y = X\beta + \varepsilon$, where $Y = (Y_1, Y_2, \ldots, Y_n)'$ follows $N(X\beta, \sigma^2 I)$.
The estimator $L'\hat{\beta}$ satisfies

$$E(L'\hat{\beta}) = L'\beta, \qquad \text{Var}(L'\hat{\beta}) = \sigma^2 L'(X'X)^{-1}L,$$

so that

$$L'\hat{\beta} \sim N\left(L'\beta, \; \sigma^2 L'(X'X)^{-1}L\right), \qquad \frac{n\hat{\sigma}^2}{\sigma^2} \sim \chi^2(n-p),$$

and $L'\hat{\beta}$ and $\frac{n\hat{\sigma}^2}{\sigma^2}$ are independently distributed. Under $H_0: L'\beta = \delta$, the statistic

$$t = \frac{\sqrt{n-p}\,(L'\hat{\beta} - \delta)}{\sqrt{n\hat{\sigma}^2\, L'(X'X)^{-1}L}}$$

follows a $t$-distribution with $(n-p)$ degrees of freedom. The test of $H_0: L'\beta = \delta$ against $H_1: L'\beta \neq \delta$ rejects $H_0$ whenever

$$|t| \geq t_{1-\frac{\alpha}{2}}(n-p),$$

where $t_{1-\frac{\alpha}{2}}(n_1)$ denotes the upper $\frac{\alpha}{2}\%$ point of the $t$-distribution with $n_1$ degrees of freedom.
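A sketch of this $t$ test, using the equivalent form $t = (L'\hat{\beta} - \delta)/\sqrt{s^2 L'(X'X)^{-1}L}$ with $s^2 = n\hat{\sigma}^2/(n-p)$; the contrast vector and data are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 1.0, 0.0]) + rng.normal(size=n)

L = np.array([1.0, -1.0, 0.0])        # tests beta_1 - beta_2 = delta
delta = 0.0

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
s2 = np.sum((y - X @ beta_hat) ** 2) / (n - p)

t = (L @ beta_hat - delta) / np.sqrt(s2 * L @ XtX_inv @ L)
p_value = 2 * stats.t.sf(abs(t), n - p)   # two-sided test
print(t, p_value)
```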
Case 4: Test of $H_0: \varphi_1 = \delta_1, \; \varphi_2 = \delta_2, \ldots, \varphi_k = \delta_k$

Now we develop the test of hypothesis related to more than one linear parametric function. Let the $i$th estimable linear parametric function be $\varphi_i = L_i'\beta$, and suppose there are $k$ such functions, with $L_i$ and $\beta$ both being $p \times 1$ vectors as in Case 3. The null hypothesis is

$$H_0: \varphi_1 = \delta_1, \; \varphi_2 = \delta_2, \ldots, \varphi_k = \delta_k$$

where $\delta_1, \delta_2, \ldots, \delta_k$ are known constants. Let $\varphi = (\varphi_1, \varphi_2, \ldots, \varphi_k)'$ and $\delta = (\delta_1, \delta_2, \ldots, \delta_k)'$. Then $H_0$ is expressible as

$$H_0: L\beta = \delta$$

where $L$ is a $k \times p$ matrix of constants whose rows are $L_1', L_2', \ldots, L_k'$. The maximum likelihood estimator of $\varphi_i$ is $\hat{\varphi}_i = L_i'\hat{\beta}$, where $\hat{\beta} = (X'X)^{-1}X'y$. Then $\hat{\varphi} = (\hat{\varphi}_1, \hat{\varphi}_2, \ldots, \hat{\varphi}_k)' = L\hat{\beta}$, with $E(\hat{\varphi}) = \varphi$ and
$$\text{Cov}(\hat{\varphi}) = \sigma^2 V, \qquad V = \left(\left(L_i'(X'X)^{-1}L_j\right)\right),$$

where $L_i'(X'X)^{-1}L_j$ is the $(i,j)$th element of $V$. Thus

$$\frac{(\hat{\varphi} - \delta)'V^{-1}(\hat{\varphi} - \delta)}{\sigma^2}$$

follows a $\chi^2$ distribution with $k$ degrees of freedom, $\frac{n\hat{\sigma}^2}{\sigma^2}$ follows $\chi^2(n-p)$, and the two quadratic forms are independently distributed. Thus, under $H_0: \varphi = \delta$,

$$\frac{(\hat{\varphi} - \delta)'V^{-1}(\hat{\varphi} - \delta)/k}{n\hat{\sigma}^2/(n-p)} = \frac{n-p}{k}\cdot\frac{(\hat{\varphi} - \delta)'V^{-1}(\hat{\varphi} - \delta)}{n\hat{\sigma}^2}$$

follows an $F$-distribution with $k$ and $(n-p)$ degrees of freedom, and the hypothesis $H_0: \varphi = \delta$ is rejected whenever this statistic exceeds $F_{1-\alpha}(k, n-p)$.
One-way classification with fixed effect linear models of full rank:

Let there be $p$ univariate normal populations, and let samples of different sizes be drawn from each of the populations. Let $y_{ij}$ $(j = 1, 2, \ldots, n_i)$ be a random sample from the $i$th normal population with mean $\beta_i$ and variance $\sigma^2$, $i = 1, 2, \ldots, p$. The random samples from different populations are assumed to be independent of each other. These observations follow the set-up of the linear model $Y = X\beta + \varepsilon$ with

$$Y = (Y_{11}, Y_{12}, \ldots, Y_{1n_1}, Y_{21}, \ldots, Y_{2n_2}, \ldots, Y_{p1}, Y_{p2}, \ldots, Y_{pn_p})'$$
$$y = (y_{11}, y_{12}, \ldots, y_{1n_1}, y_{21}, \ldots, y_{2n_2}, \ldots, y_{p1}, y_{p2}, \ldots, y_{pn_p})'$$
$$\beta = (\beta_1, \beta_2, \ldots, \beta_p)'$$
$$\varepsilon = (\varepsilon_{11}, \varepsilon_{12}, \ldots, \varepsilon_{1n_1}, \varepsilon_{21}, \ldots, \varepsilon_{2n_2}, \ldots, \varepsilon_{p1}, \varepsilon_{p2}, \ldots, \varepsilon_{pn_p})'.$$
The design matrix is

$$X = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}$$

with the first $n_1$ rows equal to $(1, 0, \ldots, 0)$, the next $n_2$ rows equal to $(0, 1, \ldots, 0)$, and so on; the $(j,i)$th entry is 1 if effect $\beta_i$ is present in observation $j$ and 0 if it is absent, and

$$n = \sum_{i=1}^{p} n_i.$$
Thus the first $n_1$ rows of $X$ are $\xi_1' = (1, 0, \ldots, 0)$, and similarly the last $n_p$ rows of $X$ are $\xi_p' = (0, 0, \ldots, 0, 1)$. Obviously, rank$(X) = p$, $E(Y) = X\beta$ and $\text{Cov}(Y) = \sigma^2 I$. This completes the representation of a fixed effect linear model of full rank.
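The block structure of $X$ is easy to build programmatically; a short sketch with hypothetical group sizes:

```python
import numpy as np

# One-way design matrix: a block column of ones per group.
group_sizes = [3, 2, 4]            # hypothetical n_1, n_2, n_3
p = len(group_sizes)
n = sum(group_sizes)

X = np.zeros((n, p))
row = 0
for i, ni in enumerate(group_sizes):
    X[row:row + ni, i] = 1.0       # rows of population i pick out beta_i
    row += ni

print(X)
print(np.linalg.matrix_rank(X))    # -> p, i.e., full column rank
```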
The null hypothesis of interest is

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = \beta \;(\text{say})$$

against $H_1$: at least one $\beta_i \neq \beta_j$ $(i \neq j)$, where $\beta$ and $\sigma^2$ are unknown. We develop here the likelihood ratio test. It may be noted that the same test can also be derived through the least squares method; this will be demonstrated in the next module, so that the reader will understand both methods.
We have already developed the likelihood ratio test for a hypothesis of this type in Case 1. The whole parametric space is the $(p+1)$-dimensional space

$$\Omega = \{(\beta, \sigma^2): -\infty < \beta_i < \infty, \; \sigma^2 > 0, \; i = 1, 2, \ldots, p\},$$

and under $H_0$ it reduces to the two-dimensional space

$$\omega = \{(\beta, \sigma^2): -\infty < \beta < \infty, \; \sigma^2 > 0\}.$$
The likelihood function is

$$L(y \mid \beta, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \beta_i)^2\right]$$

$$\ln L(y \mid \beta, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \beta_i)^2.$$

Differentiating gives the normal equations

$$\frac{\partial \ln L}{\partial \beta_i} = 0 \;\Rightarrow\; \hat{\beta}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij} = \bar{y}_{io}, \quad i = 1, 2, \ldots, p$$

$$\frac{\partial \ln L}{\partial \sigma^2} = 0 \;\Rightarrow\; \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{io})^2.$$
The dot sign $(o)$ in $\bar{y}_{io}$ indicates that the average has been taken over the second subscript $j$. The Hessian matrix of second order partial derivatives of $\ln L$ with respect to $\beta_i$ and $\sigma^2$ is negative definite at $\beta_i = \bar{y}_{io}$ and $\sigma^2 = \hat{\sigma}^2$, which ensures that the likelihood function is maximized at these values.
Thus the maximum value of $L(y \mid \beta, \sigma^2)$ over $\Omega$ is

$$\max_{\Omega} L(y \mid \beta, \sigma^2) = \left[\frac{n}{2\pi\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{io})^2}\right]^{n/2}\exp\left(-\frac{n}{2}\right).$$
Under $H_0: \beta_1 = \cdots = \beta_p = \beta$, the likelihood function becomes

$$L(y \mid \beta, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \beta)^2\right]$$

$$\ln L(y \mid \beta, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \beta)^2.$$

The normal equations give the maximum likelihood estimates

$$\frac{\partial \ln L}{\partial \beta} = 0 \;\Rightarrow\; \hat{\beta} = \frac{1}{n}\sum_{i=1}^{p}\sum_{j=1}^{n_i} y_{ij} = \bar{y}_{oo}$$

$$\frac{\partial \ln L}{\partial \sigma^2} = 0 \;\Rightarrow\; \hat{\sigma}_{\omega}^2 = \frac{1}{n}\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{oo})^2,$$

and the maximum value of the likelihood function under $\omega$ is

$$\max_{\omega} L(y \mid \beta, \sigma^2) = \left[\frac{n}{2\pi\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{oo})^2}\right]^{n/2}\exp\left(-\frac{n}{2}\right).$$
The likelihood ratio is

$$\lambda = \frac{\max_{\omega} L(y \mid \beta, \sigma^2)}{\max_{\Omega} L(y \mid \beta, \sigma^2)} = \left[\frac{\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{io})^2}{\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{oo})^2}\right]^{n/2}.$$

We have the decomposition

$$\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{oo})^2 = \sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{io})^2 + \sum_{i=1}^{p} n_i(\bar{y}_{io} - \bar{y}_{oo})^2.$$

Thus

$$\lambda^{-2/n} = \frac{\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{io})^2 + \sum_{i=1}^{p} n_i(\bar{y}_{io} - \bar{y}_{oo})^2}{\sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{io})^2} = 1 + \frac{q_1}{q_2},$$

i.e., $\lambda = \left(1 + \frac{q_1}{q_2}\right)^{-n/2}$, where

$$q_1 = \sum_{i=1}^{p} n_i(\bar{y}_{io} - \bar{y}_{oo})^2 \qquad \text{and} \qquad q_2 = \sum_{i=1}^{p}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{io})^2.$$
In terms of random variables, define

$$Q_1 = \sum_{i=1}^{p} n_i(\bar{Y}_{io} - \bar{Y}_{oo})^2, \qquad Q_2 = \sum_{i=1}^{p} S_i^2,$$

where

$$S_i^2 = \sum_{j=1}^{n_i}(Y_{ij} - \bar{Y}_{io})^2, \qquad \bar{Y}_{oo} = \frac{1}{n}\sum_{i=1}^{p}\sum_{j=1}^{n_i} Y_{ij}, \qquad \bar{Y}_{io} = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}.$$
Then under $H_0$,

$$\frac{Q_1}{\sigma^2} \sim \chi^2(p-1) \qquad \text{and} \qquad \frac{Q_2}{\sigma^2} \sim \chi^2(n-p),$$

and $Q_1$ and $Q_2$ are independently distributed. Thus under $H_0$,

$$\frac{Q_1/(p-1)}{Q_2/(n-p)} \sim F(p-1, n-p).$$

The decision rule is to reject $H_0$ whenever this statistic exceeds the constant $C = F_{1-\alpha}(p-1, n-p)$.
The analysis of variance table for the one-way classification in the fixed effect model is
______________________________________________________________________________
Source of             Degrees of     Sum of        Mean sum        F-value
variation             freedom        squares       of squares
______________________________________________________________________________
Between populations   $p-1$          $q_1$         $q_1/(p-1)$     $\dfrac{n-p}{p-1}\cdot\dfrac{q_1}{q_2}$
Within populations    $n-p$          $q_2$         $q_2/(n-p)$
Total                 $n-1$          $q_1 + q_2$
______________________________________________________________________________
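For concreteness, here is a sketch computing the table entries for three hypothetical samples, cross-checked against scipy.stats.f_oneway; the data are invented for illustration:

```python
import numpy as np
from scipy import stats

samples = [np.array([4.1, 3.9, 4.5, 4.2]),
           np.array([5.0, 5.4, 4.8]),
           np.array([3.5, 3.7, 3.9, 3.6, 3.8])]
p = len(samples)
n = sum(len(s) for s in samples)
grand_mean = np.concatenate(samples).mean()

q1 = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in samples)  # between
q2 = sum(((s - s.mean()) ** 2).sum() for s in samples)            # within

F = (q1 / (p - 1)) / (q2 / (n - p))
print(F, stats.f.sf(F, p - 1, n - p))
print(stats.f_oneway(*samples))      # same F and p-value from scipy
```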
Note that

$$E\left(\frac{Q_2}{n-p}\right) = \sigma^2$$

and

$$E\left(\frac{Q_1}{p-1}\right) = \sigma^2 + \frac{\sum_{i=1}^{p} n_i(\beta_i - \bar{\beta})^2}{p-1}, \qquad \bar{\beta} = \frac{1}{p}\sum_{i=1}^{p}\beta_i,$$

so the between-population mean square exceeds $\sigma^2$ in expectation unless all the $\beta_i$'s are equal.
Case of rejection of $H_0$:

If $H_0$ is rejected, we may test $H_0: \beta_i = \beta_k$ $(i \neq k)$ against $H_1: \beta_i \neq \beta_k$ using the following $t$-statistic:

$$t = \frac{\bar{Y}_{io} - \bar{Y}_{ko}}{\sqrt{s^2\left(\dfrac{1}{n_i} + \dfrac{1}{n_k}\right)}}$$

where $s^2 = \dfrac{q_2}{n-p}$. Thus the decision rule is to reject $H_0$ whenever

$$|t| \geq t_{1-\frac{\alpha}{2},\, n-p}.$$

The quantity $t_{1-\frac{\alpha}{2},\, n-p}\sqrt{s^2\left(\frac{1}{n_i} + \frac{1}{n_k}\right)}$ is called the critical difference.
The computations are simplified if $n_i = n$ for all $i$. In such a case, the common critical difference (CCD) is

$$CCD = t_{1-\frac{\alpha}{2},\, n-p}\sqrt{\frac{2s^2}{n}},$$

and the observed differences $|\bar{y}_{io} - \bar{y}_{ko}|$, $i \neq k$, are compared with the CCD. If

$$|\bar{y}_{io} - \bar{y}_{ko}| > CCD,$$

then the corresponding effects/means $\bar{y}_{io}$ and $\bar{y}_{ko}$ are declared to come from populations with different means.
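A sketch of pairwise comparison via the critical difference, with $s^2 = q_2/(n-p)$; the samples are the same illustrative data as above:

```python
import numpy as np
from scipy import stats

samples = [np.array([4.1, 3.9, 4.5, 4.2]),
           np.array([5.0, 5.4, 4.8]),
           np.array([3.5, 3.7, 3.9, 3.6, 3.8])]
p = len(samples)
n = sum(len(s) for s in samples)
s2 = sum(((s - s.mean()) ** 2).sum() for s in samples) / (n - p)  # q2/(n-p)

alpha = 0.05
tcrit = stats.t.ppf(1 - alpha / 2, n - p)
for i in range(p):
    for k in range(i + 1, p):
        cd = tcrit * np.sqrt(s2 * (1 / len(samples[i]) + 1 / len(samples[k])))
        diff = abs(samples[i].mean() - samples[k].mean())
        print(i + 1, k + 1, diff > cd)   # True -> means declared different
```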
Suppose $H_{01}: \beta_1 = \beta_2$ (denote its acceptance as event $A$) and $H_{02}: \beta_2 = \beta_3$ (event $B$) are accepted; then $H_{03}: \beta_1 = \beta_3$ (event $C$) will be accepted. The question arises: in what sense do we conclude such a statement about the acceptance of $H_{03}$? The reason is as follows. Since the event $A \cap B \subset C$,

$$P(A \cap B) \leq P(C).$$

In this sense, the probability of an event is higher than that of the intersection of the events, i.e., the probability that $H_{03}$ is accepted is higher than the probability of acceptance of both $H_{01}$ and $H_{02}$; so we conclude, in general, that the acceptance of $H_{01}$ and $H_{02}$ implies the acceptance of $H_{03}$.
This approach is based mainly on the $t$-statistic. If we want the significance level $\alpha$ to hold simultaneously for all group comparisons of interest, the appropriate multiple test procedure is one that controls the error rate on a per-experiment basis. There are various multiple comparison tests available. We discuss some of them in the context of the one-way classification; in two-way or higher classifications, they can be used on similar lines.
1. Studentized range test:

Assuming equal sample sizes $n_i = n$, let $R$ denote the range of the sample means $\bar{y}_{io}$. The statistic $\frac{R\sqrt{n}}{s}$ is compared with $q_{\alpha, p, \gamma}$, the upper $\alpha\%$ point of the Studentized range with $\gamma = n - p$ degrees of freedom; tables for $q_{\alpha, p, \gamma}$ are available. The testing procedure involves the comparison of $\frac{R\sqrt{n}}{s}$ with $q_{\alpha, p, \gamma}$ in the usual way: $H_0: \beta_1 = \beta_2 = \cdots = \beta_p$ is accepted when the statistic does not exceed $q_{\alpha, p, \gamma}$.
2. Student-Newman-Keuls test:

The Student-Newman-Keuls test is similar to the Studentized range test in the sense that the range is compared with the $\alpha\%$ points of the critical Studentized range $W_p$ given by

$$W_p = q_{\alpha, p, \gamma}\sqrt{\frac{s^2}{n}}.$$

If $R < W_p$, then stop the process of comparison and conclude that $\beta_1 = \beta_2 = \cdots = \beta_p$. If $R > W_p$, then
(i) divide the ranked means $y_1^*, y_2^*, \ldots, y_p^*$ into two subgroups containing $(y_p^*, y_{p-1}^*, \ldots, y_2^*)$ and $(y_{p-1}^*, y_{p-2}^*, \ldots, y_1^*)$;

(ii) compute the ranges $R_1 = y_p^* - y_2^*$ and $R_2 = y_{p-1}^* - y_1^*$, and compare the ranges $R_1$ and $R_2$ with $W_{p-1}$.

If either range ($R_1$ or $R_2$) is smaller than $W_{p-1}$, then the means (or $\beta_i$'s) in the corresponding group are declared equal. If $R_1$ and/or $R_2$ is greater than $W_{p-1}$, then the $(p-1)$ means (or $\beta_i$'s) in the group concerned are divided into two groups of $(p-2)$ means (or $\beta_i$'s) each, and the ranges of the two groups are compared with $W_{p-2}$. This procedure is continued until a group of $i$ means (or $\beta_i$'s) is found whose range does not exceed $W_i$. By this method, the difference between any two means under test is significant when the range of the observed means of each and every subgroup containing the two means under test is significant according to the Studentized critical range.
Schematically, the procedure runs as follows:

1. Arrange the $\bar{y}_{io}$'s in increasing order: $y_1^* \leq y_2^* \leq \cdots \leq y_p^*$.
2. Compute $R = y_p^* - y_1^*$ and compare with $W_p = q_{\alpha, p, \gamma}\sqrt{s^2/n}$. If $R < W_p$, stop and conclude $\beta_1 = \beta_2 = \cdots = \beta_p$; if $R > W_p$, continue.
3. Compute $R_1 = y_p^* - y_2^*$ and $R_2 = y_{p-1}^* - y_1^*$ and compare with $W_{p-1}$:
   - if $R_1 < W_{p-1}$ and $R_2 < W_{p-1}$, conclude $\beta_2 = \beta_3 = \cdots = \beta_p$ and $\beta_1 = \beta_2 = \cdots = \beta_{p-1}$, hence $\beta_1 = \beta_2 = \cdots = \beta_p$;
   - if $R_1 < W_{p-1}$ and $R_2 > W_{p-1}$, conclude $\beta_2 = \beta_3 = \cdots = \beta_p$; one subgroup is $(\beta_1, \beta_2, \ldots, \beta_{p-1})$, which is examined further;
   - if $R_1 > W_{p-1}$ and $R_2 < W_{p-1}$, conclude $\beta_1 = \beta_2 = \cdots = \beta_{p-1}$; one subgroup is $(\beta_2, \beta_3, \ldots, \beta_p)$, which is examined further;
   - if $R_1 > W_{p-1}$ and $R_2 > W_{p-1}$, compute $R_3 = y_p^* - y_3^*$, $R_4 = y_{p-1}^* - y_2^*$ and $R_5 = y_{p-2}^* - y_1^*$, and compare them with $W_{p-2}$, and so on.
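A rough sketch of this recursive comparison for equal group sizes, using SciPy's studentized range distribution (available in SciPy >= 1.7); the means, pooled variance, group size and error degrees of freedom below are all hypothetical:

```python
import numpy as np
from scipy import stats

means = np.sort(np.array([3.7, 4.2, 5.1, 5.2]))   # ranked means y1* <= ... <= yp*
s2, n_per_group, nu = 0.10, 5, 16                 # pooled variance, size, error df
alpha = 0.05

def snk(lo, hi):
    """Test the subgroup means[lo..hi] against W_g and recurse if needed."""
    g = hi - lo + 1
    if g < 2:
        return
    W = stats.studentized_range.ppf(1 - alpha, g, nu) * np.sqrt(s2 / n_per_group)
    R = means[hi] - means[lo]
    if R < W:
        print(f"means {lo + 1}..{hi + 1}: declared equal")
    else:
        print(f"means {lo + 1} and {hi + 1} differ; split and continue")
        snk(lo, hi - 1)
        snk(lo + 1, hi)

snk(0, len(means) - 1)
```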
3. Duncan's multiple comparison test:

In the Duncan test, the critical Studentized range is replaced by

$$D_p = q^*_{\alpha_p, p, \gamma}\sqrt{\frac{s^2}{n}},$$

where $\alpha_p = 1 - (1 - \alpha)^{p-1}$ and $q^*_{\alpha_p, p, \gamma}$ is the upper $\alpha_p\%$ point of Duncan's range. When the sample sizes differ, the quantity $q^*_{\alpha_p, p, \gamma}\,\frac{s}{\sqrt{n}}$ is replaced by

$$q^*_{\alpha_p, p, \gamma}\; s\,\sqrt{\frac{1}{2}\left(\frac{1}{n_U} + \frac{1}{n_L}\right)}$$

where $n_U$ and $n_L$ are the numbers of observations corresponding to the largest and smallest means in the data. This procedure is only an approximate procedure but will tend to be conservative, since means based on a small number of observations will tend to be overrepresented in the extreme groups of means.
4. The Least Significant Difference (LSD):

For testing $H_0: \beta_i = \beta_k$ against $H_1: \beta_i \neq \beta_k$, the statistic

$$t = \frac{\bar{y}_{io} - \bar{y}_{ko}}{\sqrt{\widehat{\text{Var}}(\bar{y}_{io} - \bar{y}_{ko})}}$$

is used, which follows a $t$-distribution, say with $df$ degrees of freedom. Thus $H_0$ is rejected whenever

$$|t| > t_{df,\, 1-\frac{\alpha}{2}},$$

and it is concluded that $\beta_i$ and $\beta_k$ are significantly different. This inequality can equivalently be written as

$$|\bar{y}_{io} - \bar{y}_{ko}| > t_{df,\, 1-\frac{\alpha}{2}}\sqrt{\widehat{\text{Var}}(\bar{y}_{io} - \bar{y}_{ko})};$$

when it holds, the difference between $\beta_i$ and $\beta_k$ is declared significant. Based on this idea, we use the pooled variance of the two samples, $s^2$, in $\widehat{\text{Var}}(\bar{y}_{io} - \bar{y}_{ko})$, and the Least Significant Difference (LSD) is defined as

$$LSD = t_{df,\, 1-\frac{\alpha}{2}}\sqrt{s^2\left(\frac{1}{n_i} + \frac{1}{n_k}\right)}.$$

If $n_i = n_k = n$, then

$$LSD = t_{df,\, 1-\frac{\alpha}{2}}\sqrt{\frac{2s^2}{n}}.$$

Now all $\frac{p(p-1)}{2}$ pairs of means can be compared with the LSD. The LSD can be misleading when it is applied only to the pair with the largest and smallest sample means, or if all pairwise comparisons are done without correction of the test level: if the LSD is used for all the pairwise comparisons, then these tests are not independent. Such correction for test levels was incorporated in Duncan's test.
5. Tukey's Honest Significant Difference (HSD):

In this procedure, the standard error of the difference of pooled means is used in place of the standard error of the mean in the common critical difference for testing $H_0: \beta_i = \beta_k$ against $H_1: \beta_i \neq \beta_k$, and the Studentized range quantile replaces the $t$-quantile:

$$HSD = q_{1-\frac{\alpha}{2},\, p,\, \gamma}\sqrt{\frac{MS_{error}}{n}}.$$

All $\frac{p(p-1)}{2}$ pairs $|\bar{y}_{io} - \bar{y}_{ko}|$ are compared with the HSD; a pair exceeding the HSD is declared significantly different.
Contrast:

A linear parametric function $L = \ell'\beta = \sum_{i=1}^{p}\ell_i\beta_i$, where $\ell = (\ell_1, \ell_2, \ldots, \ell_p)'$ is a vector of known constants, is said to be a contrast when

$$\sum_{i=1}^{p}\ell_i = 0.$$

For example, $\beta_1 - \beta_2 = 0$, $\beta_1 + \beta_2 - 2\beta_3 = 0$ and $\beta_1 + 2\beta_2 - 3\beta_3 = 0$ are contrasts, since in each case the coefficients sum to zero.
Orthogonal contrast:

If $L_1 = \ell'\beta = \sum_{i=1}^{p}\ell_i\beta_i$ and $L_2 = m'\beta = \sum_{i=1}^{p}m_i\beta_i$ are contrasts such that

$$\sum_{i=1}^{p}\ell_i m_i = 0,$$

then $L_1$ and $L_2$ are called orthogonal contrasts. For example, $L_1 = \beta_1 + \beta_2 - \beta_3 - \beta_4$ and $L_2 = \beta_1 - \beta_2 + \beta_3 - \beta_4$ are orthogonal contrasts. The condition $\sum_{i=1}^{p}\ell_i m_i = 0$ ensures that

$$\text{Cov}(\hat{L}_1, \hat{L}_2) = 0,$$

i.e., the estimators of the contrasts are uncorrelated. It may be noted that the number of mutually orthogonal contrasts is the number of degrees of freedom.
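The two defining conditions are one-liners to verify; the second contrast vector below is a hypothetical example:

```python
import numpy as np

l1 = np.array([1, 1, -1, -1])    # L1 = b1 + b2 - b3 - b4 (from the text)
l2 = np.array([1, -1, 1, -1])    # L2 = b1 - b2 + b3 - b4 (illustrative)
print(l1.sum() == 0 and l2.sum() == 0)   # both are contrasts
print(np.dot(l1, l2) == 0)               # orthogonality: sum l_i m_i = 0
```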
Coming back to the multiple comparison tests: if the null hypothesis of equality of all effects is rejected, then it is reasonable to look for the contrasts which are responsible for the rejection. In terms of contrasts, it is desirable to have a procedure

(i) that permits the selection of the contrasts after the data is available, and
(ii) with which a specified significance level can be associated.

Such procedures are Tukey's and Scheffé's procedures. Before discussing these procedures, let us consider the following example, which illustrates the relationship between the testing of hypothesis and confidence intervals.
The statistic

$$t = \frac{\hat{L} - L}{\sqrt{\widehat{\text{Var}}(\hat{L})}},$$

where $\hat{L}$ denotes the maximum likelihood (or least squares) estimator of $L = L'\beta$, follows a $t$-distribution with $df$ degrees of freedom; for a difference of means $L = \beta_i - \beta_j$, this is $\left[(\bar{y}_{io} - \bar{y}_{jo}) - (\beta_i - \beta_j)\right]/\sqrt{\widehat{\text{Var}}(\bar{y}_{io} - \bar{y}_{jo})}$. This statistic can, in fact, be used for any linear contrast, e.g., $L = \beta_1 + \beta_2 - \beta_3 - \beta_4$. The decision rule is: reject $H_0: L = 0$ against $H_1: L \neq 0$ if

$$|\hat{L}| > t_{df}\sqrt{\widehat{\text{Var}}(\hat{L})}.$$

Also,

$$P\left[-t_{df} \leq \frac{\hat{L} - L}{\sqrt{\widehat{\text{Var}}(\hat{L})}} \leq t_{df}\right] = 1 - \alpha$$

or

$$P\left[\hat{L} - t_{df}\sqrt{\widehat{\text{Var}}(\hat{L})} \leq L \leq \hat{L} + t_{df}\sqrt{\widehat{\text{Var}}(\hat{L})}\right] = 1 - \alpha,$$

so

$$\left[\hat{L} - t_{df}\sqrt{\widehat{\text{Var}}(\hat{L})},\; \hat{L} + t_{df}\sqrt{\widehat{\text{Var}}(\hat{L})}\right]$$

is a $100(1-\alpha)\%$ confidence interval for $L$; if this interval contains zero, $H_0: L = 0$ is accepted. Our objective is thus to check whether the confidence interval contains zero or not.

Suppose for some given data the confidence intervals for $\beta_1 - \beta_2$ and $\beta_1 - \beta_3$ are obtained as

$$-3 \leq \beta_1 - \beta_2 \leq 2 \qquad \text{and} \qquad 2 \leq \beta_1 - \beta_3 \leq 4.$$

Thus we find that the interval for $\beta_1 - \beta_2$ includes zero, which implies that $H_0: \beta_1 - \beta_2 = 0$ is accepted; thus $\beta_1 = \beta_2$. The interval for $\beta_1 - \beta_3$ does not include zero, so $H_0: \beta_1 - \beta_3 = 0$ is not accepted; thus $\beta_1 \neq \beta_3$. If the interval for $\beta_1 - \beta_3$ had been $-1 \leq \beta_1 - \beta_3 \leq 1$, then $H_0: \beta_1 = \beta_3$ would have been accepted, and if both $H_0: \beta_1 = \beta_2$ and $H_0: \beta_1 = \beta_3$ were accepted, we could conclude $\beta_2 = \beta_3$ in the sense discussed earlier.
Tukey's simultaneous confidence intervals:

One assumption underlying Tukey's procedure is that each $\hat{\beta}_i$ (the sample mean $\bar{Y}_{io}$) is normally distributed with mean $\beta_i$ and variance $\frac{\sigma^2}{n_i}$. This reduces to the simple condition that all $n_i$'s are the same, i.e., $n_i = n$ for all $i$, so that all the variances are the same. Another assumption is that $\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p$ are statistically independent and the only contrasts considered are the $\frac{p(p-1)}{2}$ differences $\beta_i - \beta_j$, $i \neq j = 1, 2, \ldots, p$.

More generally, the assumptions are:

(ii) $\hat{\beta}_i \sim N(\beta_i, a\sigma^2)$, $i = 1, 2, \ldots, p$, where $a > 0$ is a known constant;
(iii) $\frac{\gamma s^2}{\sigma^2} \sim \chi^2(\gamma)$; and
(iv) $s^2$ is statistically independent of $\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p$.
Under these assumptions, with probability $(1 - \alpha)$, all contrasts $L = \sum_{i=1}^{p} C_i\beta_i$ $\left(\sum_{i=1}^{p} C_i = 0\right)$ simultaneously satisfy

$$\hat{L} - Ts\left(\frac{1}{2}\sum_{i=1}^{p}|C_i|\right) \leq L \leq \hat{L} + Ts\left(\frac{1}{2}\sum_{i=1}^{p}|C_i|\right),$$

where $T = q_{\alpha, p, \gamma}$ and $s$ estimates the standard deviation of each $\hat{\beta}_i$. For a pairwise difference $L = \beta_i - \beta_j$ we have $\frac{1}{2}\sum_{i=1}^{p}|C_i| = 1$, so the intervals become

$$(\hat{\beta}_i - \hat{\beta}_j) - Ts \leq \beta_i - \beta_j \leq (\hat{\beta}_i - \hat{\beta}_j) + Ts.$$

Thus the maximum likelihood (or least squares) estimates $\hat{\beta}_i$ and $\hat{\beta}_j$ are declared significantly different if $|\hat{\beta}_i - \hat{\beta}_j| > Ts$, or, more generally, if $|\hat{L}| > Ts\left(\frac{1}{2}\sum_{i=1}^{p}|C_i|\right)$. The steps involved in the testing are:

- compute $\hat{L}$ or $\hat{\beta}_i - \hat{\beta}_j$;
- compute the bound $T\frac{s}{\sqrt{n}}\left(\frac{1}{2}\sum_{i=1}^{p}|C_i|\right)$, with $T = q_{\alpha, p, \gamma}$, since with sample means of $n$ observations the estimated standard deviation of each $\hat{\beta}_i$ is $s/\sqrt{n}$;
- if $|\hat{L}|$ (or $|\hat{\beta}_i - \hat{\beta}_j|$) exceeds this bound, then $\beta_i$ and $\beta_j$ are declared significantly different.
When the sample sizes $n_i$ and $n_j$ are unequal, $\frac{s}{\sqrt{n}}$ is replaced by $s\sqrt{\frac{1}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$, i.e., the critical bound becomes

$$q_{\alpha, p, \gamma}\; s\,\sqrt{\frac{1}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}\left(\frac{1}{2}\sum_{i=1}^{p}|C_i|\right).$$
Scheffé's simultaneous confidence intervals:

Let $\mathcal{L}$ be the set of all linear parametric functions of the form $L = \sum_{i=1}^{p} C_i\beta_i$, where $C_1, C_2, \ldots, C_p$ are known constants. For any $L \in \mathcal{L}$, let $\hat{L} = \sum_{i=1}^{p} C_i\bar{y}_i$ be its least squares (or maximum likelihood) estimator, with

$$\text{Var}(\hat{L}) = \sigma^2\sum_{i=1}^{p}\frac{C_i^2}{n_i} = \sigma_L^2 \;(\text{say}), \qquad \hat{\sigma}_L^2 = s^2\sum_{i=1}^{p}\frac{C_i^2}{n_i}.$$

Then, with confidence coefficient $(1 - \alpha)$, all $L \in \mathcal{L}$ simultaneously satisfy

$$\hat{L} - S\hat{\sigma}_L \leq L \leq \hat{L} + S\hat{\sigma}_L,$$

where the constant $S = \sqrt{p\,F_{1-\alpha}(p, n-p)}$.
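A sketch of one Scheffé interval using the formula above; all data are hypothetical, and the variance formula assumes $\hat{L} = \sum_i C_i\bar{y}_i$ with $\text{Var}(\bar{y}_i) = \sigma^2/n_i$:

```python
import numpy as np
from scipy import stats

means = np.array([10.2, 11.5, 9.8, 12.1])   # hypothetical group means
ns = np.array([6, 6, 6, 6])                 # hypothetical group sizes
p, n = len(means), ns.sum()
s2, alpha = 0.8, 0.05                       # pooled error mean square

C = np.array([1, -1, 0, 0])                 # a contrast, sum C_i = 0
L_hat = C @ means
sd_L = np.sqrt(s2 * np.sum(C ** 2 / ns))    # estimated sd of L_hat
S = np.sqrt(p * stats.f.ppf(1 - alpha, p, n - p))
print(L_hat - S * sd_L, L_hat + S * sd_L)   # simultaneous CI for this contrast
```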