
Analysis of Variance (ANOVA)

The analysis of variance, frequently referred to as ANOVA, is a statistical technique specially designed to test whether the means of more than two quantitative populations are equal. The analysis is capable of fruitful application to a diversity of practical problems. Basically, it consists of classifying and cross-classifying statistical results and testing whether the means of a specified classification differ significantly. In this way it is determined whether the given classification is important in affecting the results.

For example, the output of a given process might be cross-classified by machines and operators (each operator having worked on each machine). From this classification it could be determined whether the mean qualities of the outputs of the various operators differed significantly. Also, it could independently be determined whether the mean qualities of the outputs of the various machines differed significantly. Such a study would help us in determining whether uniformity in the quality of outputs could be increased by standardizing the procedures of the operators (say, through special training) and likewise whether it could be increased by standardizing the machines.

Analysis of variance thus enables us to analyse the total variance of our data into components which may be attributed to various 'sources' or 'causes' of variation. The analysis of variance originated in agrarian research and its language is thus loaded with agricultural terms like "blocks" (referring to land) and "treatments" (referring to populations or samples).

Assumptions in ANOVA

ANOVA is based on the following assumptions:

(i) Normality: The universe from which the sample is drawn is normally distributed.

(ii) Homogeneity: The variances of the populations from which the samples have been taken do not significantly differ from one another. In other words, the null hypothesis is $H_0: \sigma_1^2 = \sigma_2^2 = \dots = \sigma_n^2$.

(iii) Independence of error: The samples drawn from the universe are random and independent of each other.

In the problems faced in actual life, these assumptions may or may not hold good. However, unless the universes are highly skewed, minor departures from the assumptions do not affect the validity of the F test.

Techniques of Analyzing Variance

For the sake of clarity, the technique of analysis of variance has been classified as
(i) One-way classification, and
(ii) Two-way classification.

One-Way Classification

In a one-way classification, the data are classified according to one criterion. The null hypothesis is

$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$

and the alternative hypothesis is

$H_1$: not all of $\mu_1, \mu_2, \mu_3, \dots, \mu_k$ are equal.

The null hypothesis means that the arithmetic means of the populations from which the $k$ samples were randomly drawn are equal to one another.

Steps in Carrying out Analysis

(I) Calculate Variance Between the Samples

(i) The variance between samples (groups) measures the difference between the sample mean of each group and the overall mean, weighted by the number of observations in each group.
(ii) The variance between samples takes into account the random variations from observation to observation.
(iii) It measures the difference from one group to another.
(iv) The sum of the squares between samples is denoted by SSC.

For calculating the variance between the samples, we take the total of the squares of the deviations of the means of the various samples from the grand average and divide this total by the degrees of freedom. Thus, the steps in calculating the variance between samples are:

(a) Calculate the mean of each sample, i.e. $\bar{X}_1, \bar{X}_2, \dots$, etc.
(b) Calculate the grand average $\bar{\bar{X}}$, pronounced 'X double bar'. Its value is obtained as follows:

$\bar{\bar{X}} = \dfrac{\sum X_1 + \sum X_2 + \sum X_3 + \dots}{N_1 + N_2 + N_3 + \dots}$

(c) Take the difference between the means of the various samples and the grand average.
(d) Square these deviations, counting each squared deviation once for every observation in its sample, and obtain the total, which gives the sum of the squares between the samples; and
(e) Divide the total obtained in step (d) by the degrees of freedom. The degrees of freedom will be one less than the number of samples, i.e. if there are 4 samples, then the degrees of freedom will be 4 - 1 = 3, or $\nu = k - 1$, where $k$ is the number of samples.
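Steps (a) to (e) can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the function name `between_sample_variance` and the three small samples are hypothetical and do not come from the text, and each squared deviation is weighted by the sample size, as point (i) describes.

```python
def between_sample_variance(samples):
    """Return (SSC, degrees of freedom, mean square between samples)."""
    n_total = sum(len(s) for s in samples)
    # Steps (a)-(b): sample means and the grand average.
    grand_mean = sum(sum(s) for s in samples) / n_total
    ssc = 0.0
    for s in samples:
        mean = sum(s) / len(s)
        # Steps (c)-(d): squared deviation of the sample mean from the
        # grand average, counted once per observation in the sample.
        ssc += len(s) * (mean - grand_mean) ** 2
    # Step (e): divide by k - 1 degrees of freedom.
    df = len(samples) - 1
    return ssc, df, ssc / df


# Hypothetical data: three samples of three observations each.
samples = [[2, 4, 6], [5, 7, 9], [8, 10, 12]]
ssc, df, msc = between_sample_variance(samples)
print(ssc, df, msc)  # 54.0 2 27.0
```

The sample means here are 4, 7 and 10 and the grand average is 7, so the weighted squared deviations are 27, 0 and 27.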

Calculation of Variance Within Samples

The variance (or sum of squares) within samples measures those differences within the samples that are due to chance only. It is denoted by SSE. The variance within samples (groups) measures variability around the mean of each group. The steps taken in calculating the variance within the samples are as follows:

(i) Calculate the mean value of each sample, i.e. $\bar{X}_1, \bar{X}_2, \dots$, etc.
(ii) Take the deviations of the various items in a sample from the mean value of the respective sample.
(iii) Square these deviations and obtain the total, which gives the sum of the squares within the samples; and
(iv) Divide the total obtained in step (iii) by the degrees of freedom. The degrees of freedom are obtained by deducting the number of samples from the total number of items, i.e. $\nu = n - k$, where $k$ refers to the number of samples and $n$ to the total number of observations.
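The four steps for the within-sample variance can be sketched in the same way. Again this is a hedged illustration rather than a definitive implementation: `within_sample_variance` and the sample data are hypothetical names and values chosen for the example.

```python
def within_sample_variance(samples):
    """Return (SSE, degrees of freedom, mean square within samples)."""
    sse = 0.0
    for s in samples:
        mean = sum(s) / len(s)                   # step (i)
        # Steps (ii)-(iii): squared deviations of each item from the
        # mean of its own sample, added up over all samples.
        sse += sum((x - mean) ** 2 for x in s)
    n = sum(len(s) for s in samples)             # total number of items
    k = len(samples)                             # number of samples
    df = n - k                                   # step (iv)
    return sse, df, sse / df


# Hypothetical data: three samples of three observations each.
samples = [[2, 4, 6], [5, 7, 9], [8, 10, 12]]
sse, df, mse = within_sample_variance(samples)
print(sse, df, mse)  # 24.0 6 4.0
```

Each hypothetical sample has deviations of -2, 0 and 2 from its own mean, contributing 8 to the sum of squares, so the three samples give 24 on 9 - 3 = 6 degrees of freedom.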

Calculation of F Ratio

$F = \dfrac{\text{Between-column variance}}{\text{Within-column variance}}$

Symbolically,

$F = \dfrac{S_1^2}{S_2^2}$

Compare the calculated value of F with the table value of F for the given degrees of freedom at a certain critical level (generally taken to be the 5 percent level of significance). If the calculated value of F is greater than the table value, the difference in the sample means is taken to be significant. On the other hand, if the calculated value of F is less than the table value, the difference is taken as not significant and may have arisen due to fluctuations of sampling.

It is customary to summarise the calculations for the sums of squares, together with their numbers of degrees of freedom and mean squares, in a table called the "analysis of variance table". It is shown as

Analysis of Variance (ANOVA) Table: One-Way Classification Model

Source of Variation | SS (Sum of Squares) | $\nu$ (Degrees of Freedom) | MS (Mean Square) | Variance Ratio F
Between samples | SSC | $\nu_1 = c - 1$ | MSC = SSC/(c - 1) | F = MSC/MSE
Within samples | SSE | $\nu_2 = n - c$ | MSE = SSE/(n - c) |
Total | SST | $n - 1$ | |

Where
SST = Total sum of squares of variations
SSC = Sum of squares between samples (columns)
SSE = Sum of squares within samples (rows)
MSC = Mean sum of squares between samples
MSE = Mean sum of squares within samples

Example

To assess the significance of possible variation in performance in a certain test between the grammar schools of a city, a common test was given to a number of students taken at random from the senior fifth class of each of the four schools concerned. The results are given below. Make an analysis of variance of the data.

A | B | C | D
8 | 12 | 18 | 13
10 | 11 | 12 | 9
12 | 9 | 16 | 12
8 | 14 | 6 | 16
7 | 4 | 8 | 15

Solution

 | Sample 1 ($X_1$) | Sample 2 ($X_2$) | Sample 3 ($X_3$) | Sample 4 ($X_4$)
 | 8 | 12 | 18 | 13
 | 10 | 11 | 12 | 9
 | 12 | 9 | 16 | 12
 | 8 | 14 | 6 | 16
 | 7 | 4 | 8 | 15
Total | 45 | 50 | 60 | 65
Mean $\bar{X}$ | 9 | 10 | 12 | 13

Grand Mean

$\bar{\bar{X}} = \dfrac{\sum X_1 + \sum X_2 + \sum X_3 + \sum X_4}{N} = \dfrac{45 + 50 + 60 + 65}{20} = \dfrac{220}{20} = 11$

Since the four samples are of equal size, the same value is obtained as the mean of the sample means, (9 + 10 + 12 + 13)/4 = 11.

Variance Between Samples

To obtain the variation between samples, calculate the squares of the deviations of the various sample means from the grand average. The mean of sample 1 is 9 but the grand mean is 11; thus the difference is taken and squared. Likewise, for sample 2 the mean is 10 and the grand mean is 11; the difference between 10 and 11 is taken and squared. Repeating the procedure for the remaining samples, with one entry for each of the five observations in a sample, we get the following table.

 | Sample 1 | Sample 2 | Sample 3 | Sample 4
Sample mean | 9 | 10 | 12 | 13
 | $(\bar{X}_1 - \bar{\bar{X}})^2$ | $(\bar{X}_2 - \bar{\bar{X}})^2$ | $(\bar{X}_3 - \bar{\bar{X}})^2$ | $(\bar{X}_4 - \bar{\bar{X}})^2$
 | 4 | 1 | 1 | 4
 | 4 | 1 | 1 | 4
 | 4 | 1 | 1 | 4
 | 4 | 1 | 1 | 4
 | 4 | 1 | 1 | 4
Total | 20 | 5 | 5 | 20

Sum of the squares between samples = 20 + 5 + 5 + 20 = 50

Mean sum of the squares between the samples $= \dfrac{50}{(4 - 1)} = \dfrac{50}{3} = 16.7$ (because the df here is 3)

Variance Within Samples

Here we find the sum of the squares of the deviations of the various items in a sample from the mean value of the respective sample. Thus, for the first sample the mean is 9, so we take the deviations of the respective items of the sample from 9, and so on. The squared deviations are given in the following tables.

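 | Sample 1: $X_1$ | $(X_1 - \bar{X}_1)^2$ | Sample 2: $X_2$ | $(X_2 - \bar{X}_2)^2$ | Sample 3: $X_3$ | $(X_3 - \bar{X}_3)^2$ | Sample 4: $X_4$ | $(X_4 - \bar{X}_4)^2$
 | 8 | 1 | 12 | 4 | 18 | 36 | 13 | 0
 | 10 | 1 | 11 | 1 | 12 | 0 | 9 | 16
 | 12 | 9 | 9 | 1 | 16 | 16 | 12 | 1
 | 8 | 1 | 14 | 16 | 6 | 36 | 16 | 9
 | 7 | 4 | 4 | 36 | 8 | 16 | 15 | 4
Total | | 16 | | 58 | | 104 | | 30

Total sum of squares within the samples = 16 + 58 + 104 + 30 = 208

Mean sum of squares within the samples $= \dfrac{208}{20 - 4} = \dfrac{208}{16} = 13$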

It is advisable to check the calculations by finding the total variation. The total variation is calculated by taking the squares of the deviations of each item from the grand average, $\bar{\bar{X}} = 11$.

 | Sample 1: $X_1$ | $(X_1 - \bar{\bar{X}})^2$ | Sample 2: $X_2$ | $(X_2 - \bar{\bar{X}})^2$ | Sample 3: $X_3$ | $(X_3 - \bar{\bar{X}})^2$ | Sample 4: $X_4$ | $(X_4 - \bar{\bar{X}})^2$
 | 8 | 9 | 12 | 1 | 18 | 49 | 13 | 4
 | 10 | 1 | 11 | 0 | 12 | 1 | 9 | 4
 | 12 | 1 | 9 | 4 | 16 | 25 | 12 | 1
 | 8 | 9 | 14 | 9 | 6 | 25 | 16 | 25
 | 7 | 16 | 4 | 49 | 8 | 9 | 15 | 16
Total | 45 | 36 | 50 | 63 | 60 | 109 | 65 | 50
Mean | 9 | | 10 | | 12 | | 13 |

Total sum of squares = (36 + 63 + 109 + 50) = 258

The degrees of freedom = 20 - 1 = 19

Thus, when we add the sum of squares between samples and the sum of squares within samples, we get the same total: 50 + 208 = 258. Thus, our calculation is correct. Now all the above results are tabulated as

Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square
Between samples | 50 | 3 | 16.7
Within samples | 208 | 16 | 13.0
Total | 258 | 19 |
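The check that the between-sample and within-sample sums of squares add up to the total can also be reproduced with a few lines of Python. The sketch below uses the scores of the four schools as read from the worked example; the variable names are illustrative assumptions only.

```python
# Test scores of the four schools from the worked example.
schools = {
    "A": [8, 10, 12, 8, 7],
    "B": [12, 11, 9, 14, 4],
    "C": [18, 12, 16, 6, 8],
    "D": [13, 9, 12, 16, 15],
}

all_scores = [x for scores in schools.values() for x in scores]
grand_mean = sum(all_scores) / len(all_scores)

# Total variation: every item against the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_scores)

# Between samples: each sample mean against the grand mean,
# weighted by the number of items in the sample.
ssc = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2
          for s in schools.values())

# Within samples: every item against the mean of its own sample.
sse = sum((x - sum(s) / len(s)) ** 2
          for s in schools.values() for x in s)

print(grand_mean, ssc, sse, sst)  # 11.0 50.0 208.0 258.0
```

The three quantities are computed independently of one another, so the agreement of 50 + 208 with 258 is a genuine check on the arithmetic rather than a restatement of it.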

$F = \dfrac{\text{Variance between samples}}{\text{Variance within samples}} = \dfrac{16.7}{13} = 1.285$

The table value of F for $\nu_1 = 3$ and $\nu_2 = 16$ at the 5 percent level of significance is 3.24. The calculated value of F is less than the table value; hence the difference in the mean values of the samples is not significant. Thus, we can say that the samples could have come from the same universe.
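The decision rule used here (compare the calculated F with the table value) can be written as a small helper. This is a sketch under assumptions: `f_test` is a hypothetical name, and the critical value 3.24 is simply the table value quoted in the text, passed in by hand rather than looked up by the code.

```python
def f_test(ms_between, ms_within, critical_value):
    """Return the variance ratio F and whether it exceeds the table value."""
    f = ms_between / ms_within
    return f, f > critical_value


# Mean squares from the school example: 50/3 between, 208/16 within.
f, significant = f_test(50 / 3, 208 / 16, critical_value=3.24)
print(round(f, 3), significant)  # 1.282 False
```

Working with the unrounded mean square 50/3 gives F of about 1.282; the figure 1.285 in the text comes from using the rounded value 16.7. The conclusion is the same either way.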

Analysis of Variance: Two-Way Classification Model

In a one-factor analysis of variance, the treatments constitute different levels of a single factor which is controlled in the experiment. There could be many situations in which the response variable of interest may be affected by more than one factor. For example, the sale of cosmetics, in addition to being affected by the point of display, might also be affected by the price charged, the size and/or location of the store, or the number of competitive products sold by the store. Similarly, petrol mileage may be affected by the car driven, the way it is driven, road conditions and other factors in addition to the brand of petrol used.

When it is believed that two independent factors might have an effect on the response variable of interest, it is possible to design the test so that an analysis of variance can be used to test the effects of the two factors simultaneously. Such a test is called a two-factor analysis of variance. With a two-factor analysis of variance, we can test two sets of hypotheses with the same data at the same time.

In a two-way classification, the data are classified according to two different criteria or factors. In a two-way classification, the analysis of variance table takes the following form:
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Sum of Squares | Ratio F
Between samples (columns) | SSC | $(c - 1)$ | MSC = SSC/(c - 1) | MSC/MSE
Between rows | SSR | $(r - 1)$ | MSR = SSR/(r - 1) | MSR/MSE
Residual or error | SSE | $(c - 1)(r - 1)$ | MSE = SSE/((r - 1)(c - 1)) |
Total | SST | $n - 1$ | |

Where,
SSC = Sum of squares between columns
SSR = Sum of squares between rows
SSE = Sum of squares due to error
SST = Total sum of squares

F values are calculated as

$F(\nu_1, \nu_2) = \dfrac{MSC}{MSE}$, where $\nu_1 = (c - 1)$ and $\nu_2 = (c - 1)(r - 1)$

$F(\nu_1, \nu_2) = \dfrac{MSR}{MSE}$, where $\nu_1 = (r - 1)$ and $\nu_2 = (c - 1)(r - 1)$

It should be carefully noted that $\nu_1$ may not be the same in both cases: in one case $\nu_1 = (c - 1)$ and in the other $\nu_1 = (r - 1)$.

The calculated values of F are compared with the table values. If the calculated value of F is greater than the table value at a pre-assigned level of significance, the null hypothesis is rejected; otherwise it is accepted.

Example

A tea company appoints four salesmen A, B, C and D and observes their sales in three seasons: summer, winter and monsoon. The figures (in lakhs) are given in the following table.

Seasons | A | B | C | D | Season's Total
Summer | 36 | 36 | 21 | 35 | 128
Winter | 28 | 29 | 31 | 32 | 120
Monsoon | 26 | 28 | 29 | 29 | 112
Total | 90 | 93 | 81 | 96 | 360

(i) Do the salesmen significantly differ in performance?
(ii) Is there a significant difference between the seasons?

Solution

The above data are classified according to two criteria: (a) salesmen and (b) seasons.

Further, in order to simplify the calculations, we code the data by subtracting 30 from each figure. The data in coded form are summarized as follows.

Seasons | A | B | C | D | Season's Total
Summer | +6 | +6 | -9 | +5 | +8
Winter | -2 | -1 | +1 | +2 | 0
Monsoon | -4 | -2 | -1 | -1 | -8
Total | 0 | +3 | -9 | +6 | 0

Correction Factor

$C = \dfrac{T^2}{N} = \dfrac{0^2}{12} = 0$

(The number of items, N, is 12.)

Sum of the Squares Between Salesmen

This is obtained by squaring the salesmen's totals, dividing each squared total by the number of items included in it, adding these figures and subtracting the correction factor from the sum.

Thus, the sum of squares between salesmen

$= \dfrac{(0)^2}{3} + \dfrac{(3)^2}{3} + \dfrac{(-9)^2}{3} + \dfrac{(6)^2}{3} - \dfrac{(0)^2}{12} = 0 + 3 + 27 + 12 - 0 = 42$

$\nu = (c - 1) = (4 - 1) = 3$

Sum of the Squares Between Seasons

This is obtained by dividing the squares of the seasons' totals by the number of items that make up each total, adding all such figures and subtracting the correction factor from the sum.

Thus, the sum of squares between seasons

$= \dfrac{(8)^2}{4} + \dfrac{(0)^2}{4} + \dfrac{(-8)^2}{4} - \dfrac{T^2}{N} = 16 + 0 + 16 - 0 = 32$

$\nu = (r - 1) = (3 - 1) = 2$

Total Sum of Squares

This is obtained by adding the squares of all the items in the table and subtracting the correction factor from the sum. Thus, the total sum of squares

$= (6)^2 + (-2)^2 + (-4)^2 + (6)^2 + (-1)^2 + (-2)^2 + (-9)^2 + (1)^2 + (-1)^2 + (5)^2 + (2)^2 + (-1)^2 - \dfrac{T^2}{N} = 210 - 0 = 210$

$\nu = (n - 1) = (12 - 1) = 11$
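The residual (error) sum of squares is not computed directly in this example; under the usual two-way decomposition it is what remains of the total once the salesmen and season components are removed. The short derivation below spells out that step using the symbols SST, SSC, SSR and SSE of the two-way table. The subscripted $\nu$ symbols are labels assumed here for the corresponding degrees of freedom.

```latex
\begin{aligned}
SSE &= SST - SSC - SSR = 210 - 42 - 32 = 136 \\
\nu_{E} &= \nu_{T} - \nu_{C} - \nu_{R} = 11 - 3 - 2 = 6 = (c - 1)(r - 1) \\
MSE &= \frac{SSE}{\nu_{E}} = \frac{136}{6} \approx 22.67
\end{aligned}
```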

Now, the above information is presented in the following table of analysis of variance.

Source of Variation | Sum of Squares | $\nu$ (df) | Mean Square
Between samples (salesmen) | 42 | 3 | 14
Between rows (seasons) | 32 | 2 | 16
Residual | 136 | 6 | 22.67
Total | 210 | 11 |
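The sums of squares in this table can be checked with a short Python sketch. It applies the correction-factor method both to the original sales figures and to the coded figures (each reduced by 30), to illustrate that coding does not change any sum of squares. The function name `two_way_sums_of_squares` and the row/column layout are assumptions made for the illustration.

```python
def two_way_sums_of_squares(table):
    """table: list of rows (seasons), each a list of column values (salesmen)."""
    r, c = len(table), len(table[0])
    n = r * c
    grand_total = sum(sum(row) for row in table)
    cf = grand_total ** 2 / n                        # correction factor T^2 / N
    col_totals = [sum(row[j] for row in table) for j in range(c)]
    row_totals = [sum(row) for row in table]
    ssc = sum(t ** 2 for t in col_totals) / r - cf   # between salesmen
    ssr = sum(t ** 2 for t in row_totals) / c - cf   # between seasons
    sst = sum(x ** 2 for row in table for x in row) - cf
    sse = sst - ssc - ssr                            # residual
    return {"SSC": ssc, "SSR": ssr, "SSE": sse, "SST": sst}


# Sales in lakhs; columns are salesmen A, B, C, D.
raw = [
    [36, 36, 21, 35],   # summer
    [28, 29, 31, 32],   # winter
    [26, 28, 29, 29],   # monsoon
]
coded = [[x - 30 for x in row] for row in raw]

ss_raw = two_way_sums_of_squares(raw)
ss_coded = two_way_sums_of_squares(coded)
print(ss_coded)  # {'SSC': 42.0, 'SSR': 32.0, 'SSE': 136.0, 'SST': 210.0}
print(ss_raw == ss_coded)  # True
```

Dividing by the degrees of freedom 3, 2 and 6 then gives the mean squares 14, 16 and about 22.67 shown in the table.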

Now let us take the hypothesis that there is no difference between the sales of the salesmen or of the seasons; in other words, the three independent estimates of variance are estimates of the variance of a common population.

First, compare the salesmen variance estimate with the residual variance estimate. Thus,

$F = \dfrac{22.67}{14} = 1.619$

The table value of F for $\nu_1 = 3$ and $\nu_2 = 6$ at the 5% level of significance is 4.76. The calculated value is less than the table value, and we conclude that the sales of the different salesmen do not differ significantly.

Now, let us compare the season variance estimate with the residual variance estimate. Thus,

$F = \dfrac{22.67}{16} = 1.417$

The critical value of F for $\nu_1 = 2$ and $\nu_2 = 6$ at the 5% level of significance is 5.14. The calculated value is less than the critical value; hence we conclude that the difference between the seasons is not significant. Hence, we say that the sales of the salesmen in the different seasons do not differ significantly.
