
Analysis of Variance Approach to Regression Analysis

Rationale

The Analysis of Variance (ANOVA) is a statistical principle based on partitioning total observed variation into several components, with the aim of explaining the sources of that variation. Total observed variation is often measured by the total of the squared deviations of each observation from the mean.

In the context of regression analysis, we presume that the observations on the response variable can be expressed as a (linear) function of the independent variables, in the form

$y_i = \beta_0 + \beta_1 x_i + e_i$

Based on sample data, and assuming that such a relation is true, the line that best fits the observed values is obtained as

$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$

After fitting this regression line, we gather evidence on whether such a model really holds in describing the relationship.
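As a sketch of how the fitted line is obtained, the least-squares estimates can be computed directly from the usual closed-form formulas. The data and the helper name `fit_slr` below are illustrative, not from the slides:

```python
# Minimal sketch: least-squares estimates for simple linear regression.
# Data and function name are illustrative (hypothetical), not from the slides.

def fit_slr(x, y):
    """Return (b0, b1), the least-squares estimates of beta_0 and beta_1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept estimate
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_slr(x, y)        # b1 ≈ 1.99, b0 ≈ 0.05 for these data
```

The same estimates are what any regression routine returns for simple linear regression; writing them out makes the later sums of squares easy to trace.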

[Figure: scatter plot showing, for one observation, the decomposition of the total deviation $(y_i - \bar{y})$ into $(\hat{y}_i - \bar{y})$ and $(y_i - \hat{y}_i)$ around the fitted line $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$.]

$(y_i - \bar{y}) = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)$

Total Deviation

$(y_i - \bar{y}) = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)$

$(y_i - \bar{y})$: total deviation
$(\hat{y}_i - \bar{y})$: deviation of the fitted regression value around the mean
$(y_i - \hat{y}_i)$: deviation around the fitted regression line

Sum of Squares

Squaring both sides of the decomposition and summing over all observations:

$\sum (y_i - \bar{y})^2 = \sum [(\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)]^2$

$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2 + 2 \sum (\hat{y}_i - \bar{y})(y_i - \hat{y}_i)$

The cross-product term equals zero, so

$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$

$\sum (y_i - \bar{y})^2$: Total Sum of Squares (TSS)
$\sum (\hat{y}_i - \bar{y})^2$: Sum of Squares due to the Regression of y on x (SSR)
$\sum (y_i - \hat{y}_i)^2$: Sum of Squares Error (SSE)

TSS = SSR + SSE
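The decomposition can be checked numerically. The sketch below, with illustrative data, computes the three sums of squares from a least-squares fit and verifies that TSS = SSR + SSE:

```python
# Sketch: computing TSS, SSR, SSE for a fitted SLR line (illustrative data).
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# least-squares fit
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)               # total SS
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # regression SS
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error SS

assert abs(tss - (ssr + sse)) < 1e-9                   # TSS = SSR + SSE
```

The identity holds for any data set once the line is fitted by least squares, because the cross-product term vanishes only at the least-squares solution.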



Degrees of Freedom

The Total Degrees of Freedom (associated with TSS) is (n-1). One degree of freedom is lost because:

the deviations $(y_i - \bar{y})$ are subject to one constraint: their sum is zero; or,
the sample mean is used to estimate the population mean.

The Degrees of Freedom due to Error is n-2. Two degrees of freedom are lost because we are estimating two parameters, $\beta_0$ and $\beta_1$, in obtaining the fitted values $\hat{y}_i$.

The Degrees of Freedom due to Regression is 1. Although there are n deviations $(\hat{y}_i - \bar{y})$, all fitted values $\hat{y}_i$ are calculated from the same regression line. Two df are associated with the regression line, but 1 df is lost because the deviations $(\hat{y}_i - \bar{y})$ are subject to one constraint: their sum is zero.

Thus, df(Total) = df(Regression) + df(Error).

Mean Squares

In a general ANOVA, the mean squares are obtained by dividing each SS by its corresponding df. That is,

MSTot = SSTot/(n-1)
MSR = SSR/1; MSE = SSE/(n-2)

Note: MSTot ≠ MSR + MSE
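With hypothetical numbers for the sums of squares (assumed here purely for illustration), the mean squares work out as follows; note that, unlike the sums of squares, the mean squares do not add up:

```python
# Sketch with hypothetical values: n, TSS, SSR are assumed, not from the slides.
n = 10
tss, ssr = 120.0, 90.0
sse = tss - ssr                # 30.0, since TSS = SSR + SSE

ms_tot = tss / (n - 1)         # MSTot = 120/9
msr = ssr / 1                  # MSR = 90.0
mse = sse / (n - 2)            # MSE = 30/8 = 3.75

# Unlike the sums of squares, the mean squares are not additive:
assert ms_tot != msr + mse
```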


ANOVA Table

Results of the Analysis of Variance are summarized in an ANOVA table:

Source of Variation   df    SS    MS
Regression            1     SSR   MSR
Error                 n-2   SSE   MSE
Total                 n-1   TSS

Expected Mean Squares (EMS)

The EMS are useful quantities that:

tell us what parametric function is being estimated by the MS [Method of Moments Estimator];
in some instances, suggest how the test statistic should be defined to test specific hypotheses.

$E[MSE] = \sigma^2$

The mean of the sampling distribution of MSE is $\sigma^2$ whether or not X and Y are linearly related (i.e., whether or not $\beta_1 = 0$):

$SSE/\sigma^2 \sim \chi^2_{(n-2)} \Rightarrow E[SSE/\sigma^2] = n - 2$

$E[MSE] = E\!\left[\frac{SSE}{n-2}\right] = \sigma^2$

$E[MSR] = \sigma^2 + \beta_1^2 \sum (x_i - \bar{x})^2$

The mean of the sampling distribution of MSR is also $\sigma^2$ when $\beta_1 = 0$. In this case, MSR and MSE will tend to be of the same magnitude.
When $\beta_1 \neq 0$, MSR > MSE. Thus, a comparison of MSR and MSE may be used to determine whether or not $\beta_1 = 0$.
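To see the effect of $\beta_1$ on $E[MSR]$, one can plug hypothetical values into the formula. All numbers below (error variance, x values, slopes) are assumed for illustration only:

```python
# Sketch: evaluating E[MSR] = sigma^2 + beta1^2 * Sxx for assumed values.
sigma2 = 4.0                              # assumed error variance
x = [1, 2, 3, 4, 5]
x_bar = sum(x) / len(x)
sxx = sum((xi - x_bar) ** 2 for xi in x)  # Sxx = 10.0

# When beta1 = 0, E[MSR] collapses to sigma^2, matching E[MSE]:
ems_msr_null = sigma2 + 0.0 ** 2 * sxx    # 4.0

# When beta1 != 0, E[MSR] exceeds sigma^2:
ems_msr_alt = sigma2 + 1.5 ** 2 * sxx     # 4 + 2.25 * 10 = 26.5
```

This is the arithmetic behind the comparison: under $\beta_1 = 0$ the two mean squares estimate the same quantity; otherwise MSR is inflated.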

Test of Hypothesis: H0: $\beta_1 = 0$ vs H1: $\beta_1 \neq 0$

From the EMS, it appears logical that to test this hypothesis one can compare MSR and MSE.
From statistical theory, and assuming normality of the error terms (Cochran's Theorem):

MSR and MSE are independent, and under H0,

$SSR/\sigma^2 \sim \chi^2_{(1)}, \quad SSE/\sigma^2 \sim \chi^2_{(n-2)}$

Thus, a logical test statistic (GLRT) would be

$F_c = \frac{MSR}{MSE} \sim F_{(1,\,n-2)}$

Reject H0 for large values of $F_c$, i.e., if $F_c \geq F_{\alpha,(1,n-2)}$.
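A sketch of the test with illustrative data follows; the critical value $F_{0.05,(1,8)} \approx 5.32$ is taken from a standard F table, and the data are made up for the example:

```python
# Sketch: F test of H0: beta1 = 0 at alpha = 0.05 (illustrative data, n = 10).
x = list(range(1, 11))
y = [1.2, 2.1, 2.9, 4.2, 4.8, 6.1, 7.0, 8.2, 8.9, 10.1]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

msr = sum((yh - y_bar) ** 2 for yh in y_hat) / 1                 # df = 1
mse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) / (n - 2)  # df = n-2
fc = msr / mse                      # ~ F(1, n-2) under H0

F_CRIT = 5.32                       # F_{0.05,(1,8)} from a standard F table
reject_h0 = fc >= F_CRIT            # strong linear trend -> very large fc
```

For these strongly linear data $F_c$ is far above the critical value, so H0 is rejected; in practice one would read the critical value (or a p-value) from software rather than a table.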

General Linear Test Approach

This is another approach to testing hypotheses concerning the regression parameters (or a function of such parameters).
First, fit the Full Model. In the SLR case,

$y_i = \beta_0 + \beta_1 x_i + e_i$

Compute:

$SSE(F) = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - [\hat{\beta}_0 + \hat{\beta}_1 x_i])^2 = SSE$

Next, fit the Reduced Model under H0 (assuming H0 is true). In the SLR case, H0: $\beta_1 = 0$. This means we fit the (reduced) model

$y_i = \beta_0 + e_i$

In this case, the value of $\hat{\beta}_0$ that minimizes $\sum (y_i - \beta_0)^2$ is $\bar{y}$.

Compute the SSE for the reduced model as

$SSE(R) = \sum (y_i - \hat{\beta}_0)^2 = \sum (y_i - \bar{y})^2 = SST$

Note that, in general, $SSE(F) \leq SSE(R)$, because the more parameters are employed in fitting the model, the better the fit.

Test Statistic:

$F^* = \frac{[SSE(R) - SSE(F)]\,/\,(df_R - df_F)}{SSE(F)\,/\,df_F} \sim F_{[df_R - df_F,\; df_F]}$

Note that in the case of the SLR, testing H0: $\beta_1 = 0$,

$F^* = \frac{(SST - SSE)\,/\,[(n-1) - (n-2)]}{SSE\,/\,(n-2)} = \frac{SSR/1}{MSE} = \frac{MSR}{MSE}$

Thus, the two tests are equivalent.
This approach can be extended to more complex tests.
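The equivalence can be confirmed numerically. The sketch below, with illustrative data, computes $F^*$ from SSE(R) and SSE(F) and checks that it coincides with MSR/MSE:

```python
# Sketch: general linear test for H0: beta1 = 0 in SLR (illustrative data).
x = [1, 2, 3, 4, 5, 6]
y = [2.3, 3.1, 4.8, 5.2, 7.1, 7.9]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Full model: y_i = beta0 + beta1 * x_i + e_i
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
sse_full = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # df_F = n-2

# Reduced model under H0: y_i = beta0 + e_i, whose fitted value is y_bar
sse_red = sum((yi - y_bar) ** 2 for yi in y)                        # df_R = n-1

f_star = ((sse_red - sse_full) / ((n - 1) - (n - 2))) / (sse_full / (n - 2))

msr = (sse_red - sse_full) / 1  # SSR = SST - SSE
mse = sse_full / (n - 2)
assert abs(f_star - msr / mse) < 1e-9  # F* coincides with MSR/MSE
```

The same recipe (fit full model, fit reduced model, compare the drop in SSE to MSE) carries over unchanged to tests on several coefficients at once in multiple regression.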
