
ANOVA & Chi-Square

To Test the Significance of the
Difference Among More Than Two
Sample Means
• The training director wanted to evaluate three
different training methods to determine whether
there were any differences in the effectiveness of
the methods.
• After completion of the training period, she
chose 16 new employees who had been assigned at random to
the three training methods.
• Counting the production output of these 16
trainees, she summarized the data and
calculated the mean production of the trainees.
To determine the grand mean, either
method can be used
• Grand Mean
= {(15+18+19+22+11) + (22+27+18+21+17) + (18+24+19+16+22+15)} / 16
= 304 / 16 = 19
• Method-1   Method-2   Method-3
     15         22         18
     …          …          …
     11         17         15
  -------    -------    -------
     85        105        114      (column totals)
Grand Mean = (85 + 105 + 114) / 16 = 19
Statement of the Hypotheses
• The question is whether these three samples were drawn from
populations having the same mean (a population here is the total number of
employees who could be trained by that method).
• Null hypothesis: the three population means are equal.
Alternative hypothesis: the three population means are not all equal.
• If the population means do not differ significantly,
we can infer that the training methods
are equally effective with respect to the productivity
of the employees.
• Otherwise, we could adjust our training program
accordingly.
ANOVA
• ANOVA assumes that each of the samples is drawn from a normal
population and that each of the populations has
the same variance.
• If the sample sizes are large, the normality
assumption is not required.
• If the null hypothesis is true, classifying the data into
three columns is unnecessary and the entire set
of 16 measurements of productivity can be
thought of as a sample from one population
having a common variance.
Comparison of Estimates
• ANOVA is based on a comparison of two different
estimates of the variance of the overall population.
• We can calculate one of these estimates by examining
the variance among the three sample means (17, 21, 19).
• The other estimate can be determined from the variation
within the three samples themselves (sample variances 17.5, 15.5, 12.0).
• Compare these two estimates of the population variance.
• Because both are estimates of the same variance, they should be
approximately equal in value when the null hypothesis is
true.
• If the null hypothesis is not true, these two estimates will
differ considerably.
Variability among sample means: The variance
between the samples provides a good estimate of the population variance
only if the null hypothesis is true. If the null
hypothesis is false, it overestimates the variance
(here, 20).
Variability of data within samples: The variance
within the samples provides a good
estimate of the population variance in either case
(here, 14.769).
When the populations are not the same, the between-sample
variance tends to be larger than the
within-sample variance, so F tends to
be large, which leads to rejecting the null
hypothesis (here, F = 20 / 14.769 = 1.354).
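The two variance estimates and their ratio can be checked numerically. The following is a minimal Python sketch using only the standard library and the observations listed in the grand-mean formula. The 0.05 significance level and the tabulated critical value 3.81 for (2, 13) degrees of freedom are assumptions, since the slides do not state a level.

```python
from statistics import mean, variance

# Production output per trainee, as listed in the grand-mean formula.
groups = [
    [15, 18, 19, 22, 11],       # Method-1
    [22, 27, 18, 21, 17],       # Method-2
    [18, 24, 19, 16, 22, 15],   # Method-3
]

k = len(groups)                               # number of samples
n_total = sum(len(g) for g in groups)         # total number of observations
grand_mean = sum(sum(g) for g in groups) / n_total

# Between-sample estimate: squared deviations of the sample means from the
# grand mean, weighted by sample size and divided by (k - 1).
between_var = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)

# Within-sample estimate: sample variances pooled with weights (n_j - 1).
within_var = sum((len(g) - 1) * variance(g) for g in groups) / (n_total - k)

f_ratio = between_var / within_var

# Assumption: 5% level; 3.81 is the tabulated F value for (2, 13) d.f.
F_CRITICAL = 3.81
reject_null = f_ratio > F_CRITICAL

print(f"between = {between_var:.3f}, within = {within_var:.3f}, F = {f_ratio:.3f}")
print("reject H0" if reject_null else "do not reject H0")
```

Under the assumed level, F = 1.354 is below the tabulated value, so the sketch does not reject the hypothesis of equal means.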
Problem
McDonald, a fast-food chain, feels it is gaining a bad reputation
because it takes too long to serve its customers. Because the
chain has four restaurants in a city, it is concerned with whether all
four restaurants have the same average service time. One of the
owners of the chain has decided to visit each of the restaurants
and monitor the service time for five randomly selected customers.
At his four noontime visits, he records the following service times in
minutes:
Restaurant-1    3    4     5.5   3.5   4
Restaurant-2    3    3.5   4.5   4     5.5
Restaurant-3    2    3.5   5     6.5   6
Restaurant-4    3    4     5.5   2.5   3
(a) Using a 0.05 significance level, do all the restaurants have the
same mean service time?
(b) Based on his results, should the owner make any policy
recommendations to any of the restaurant managers?
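Part (a) can be worked through with the same between/within comparison. The sketch below is one possible approach, not a prescribed solution; the helper name `one_way_anova` is hypothetical, and 3.24 is the tabulated 5% critical value for (3, 16) degrees of freedom.

```python
from statistics import mean, variance

def one_way_anova(groups):
    """Return (F ratio, d.f. between, d.f. within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    within = sum((len(g) - 1) * variance(g) for g in groups) / (n_total - k)
    return between / within, k - 1, n_total - k

# Service times in minutes, one list per restaurant.
service_times = [
    [3, 4, 5.5, 3.5, 4],     # Restaurant-1
    [3, 3.5, 4.5, 4, 5.5],   # Restaurant-2
    [2, 3.5, 5, 6.5, 6],     # Restaurant-3
    [3, 4, 5.5, 2.5, 3],     # Restaurant-4
]

f_ratio, df_between, df_within = one_way_anova(service_times)

# Tabulated F value at the 0.05 level for (3, 16) degrees of freedom.
F_CRITICAL = 3.24
same_mean_plausible = f_ratio <= F_CRITICAL

print(f"F = {f_ratio:.3f} with ({df_between}, {df_within}) d.f.")
print("do not reject H0" if same_mean_plausible else "reject H0")
```

If the F ratio stays below the critical value, the data give no basis for singling out any one restaurant in part (b).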
A survey conducted over the last 25 years
indicated that in 10 years the winter was
mild, in 8 years it was cold, and in the
remaining years it was very cold. A company
sells 1000 woolen coats in a mild year, 1300
in a cold year, and 2000 in a very cold year.
Find the yearly expected profit of the
company if a woolen coat that costs Rs. 1730/-
is sold for Rs. 2480/- on average.
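The expected profit is the probability-weighted average of the yearly profits. A short sketch follows, assuming the 25-year relative frequencies can be used as probabilities and that every coat sold earns the same margin; the variable names are illustrative.

```python
# Number of years of each winter type out of the 25 surveyed.
total_years = 25
years = {"mild": 10, "cold": 8}
years["very cold"] = total_years - sum(years.values())   # the remaining years

# Coats sold in a year of each type.
coats_sold = {"mild": 1000, "cold": 1300, "very cold": 2000}

# Profit per coat in rupees: selling price minus cost.
profit_per_coat = 2480 - 1730

# Expected coats per year: sum of (relative frequency x coats sold).
expected_coats = sum(years[w] * coats_sold[w] for w in years) / total_years
expected_profit = expected_coats * profit_per_coat

print(f"expected coats per year = {expected_coats}")
print(f"expected yearly profit  = Rs. {expected_profit}")
```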
Chi-Square as a Test of
Independence
• The chi-square test can be used to test whether more than two population
proportions can be considered equal.
• One can classify a population into several
categories with respect to two attributes and
use this test to determine whether the attributes are independent or
whether one influences the other.
• If the null hypothesis is true, one can combine
the data from the samples and then estimate the
common proportion.
Problem (Attitude about job
interview)
                  N-E    S-E    Central    West Coast    Total
Present Method     68     75       57          79         279
New Method         32     45       33          31         141
Total             100    120       90         110         420
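Under the null hypothesis the expected count in each cell is (row total × column total) / grand total, and the chi-square statistic sums (observed − expected)² / expected over all cells. A minimal sketch for the table above follows. The 0.05 level is an assumption, as the problem does not give one; 7.815 is the tabulated chi-square value for 3 degrees of freedom at that level.

```python
# Observed counts by region: N-E, S-E, Central, West Coast.
observed = [
    [68, 75, 57, 79],   # Present Method
    [32, 45, 33, 31],   # New Method
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell if method preference is independent of region.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) x (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Assumption: 5% level; tabulated chi-square value for 3 d.f.
CHI2_CRITICAL = 7.815
independent_plausible = chi_square <= CHI2_CRITICAL

print(f"chi-square = {chi_square:.3f} with {df} d.f.")
```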
Problem-1
The number of car accidents per month in a
certain city was as follows:

12, 18, 20, 2, 14, 10, 15, 6, 9, 4.

Are these frequencies in agreement with the
belief that accident conditions were the same
during this 10-month period?
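If conditions were the same in every month, each month would be expected to have the same number of accidents, namely the monthly average. A sketch of the resulting goodness-of-fit test follows, assuming a 0.05 level (not stated in the problem) with the tabulated value 16.919 for 9 degrees of freedom.

```python
# Observed accidents in each of the 10 months.
accidents = [12, 18, 20, 2, 14, 10, 15, 6, 9, 4]

# Under the null hypothesis every month has the same expected count.
expected = sum(accidents) / len(accidents)

chi_square = sum((o - expected) ** 2 / expected for o in accidents)
df = len(accidents) - 1

# Assumption: 5% level; tabulated chi-square value for 9 d.f.
CHI2_CRITICAL = 16.919
same_conditions_plausible = chi_square <= CHI2_CRITICAL

print(f"expected per month = {expected}, chi-square = {chi_square:.3f}, d.f. = {df}")
```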
Problem-2
A theory predicts that the proportions of an
item in the four groups A, B, C, D should
be 9:3:3:1. In an experiment among 1600
items, the numbers in the groups were
882, 313, 287 and 118. Does this
experiment support the theory?
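Here the expected counts come from splitting the 1600 items in the ratio 9:3:3:1. The sketch below assumes a 0.05 level (the problem states none) and uses the tabulated chi-square value 7.815 for 3 degrees of freedom.

```python
observed = {"A": 882, "B": 313, "C": 287, "D": 118}
ratio = {"A": 9, "B": 3, "C": 3, "D": 1}

total = sum(observed.values())
ratio_sum = sum(ratio.values())

# Expected count per group if the theoretical ratio holds.
expected = {g: total * ratio[g] / ratio_sum for g in observed}

chi_square = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)
df = len(observed) - 1

# Assumption: 5% level; tabulated chi-square value for 3 d.f.
CHI2_CRITICAL = 7.815
theory_supported = chi_square <= CHI2_CRITICAL

print(f"expected = {expected}")
print(f"chi-square = {chi_square:.3f} with {df} d.f.")
```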
Problem-3
Records taken of the number of male and female births in
800 families having 4 children are given as:
Male    Female    Families
 0        4          32
 1        3         178
 2        2         290
 3        1         236
 4        0          64
Test whether the data are consistent with the hypothesis
that male and female births are equally likely.
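If male and female births are equally likely, the number of boys among 4 children follows a binomial distribution with n = 4 and p = 0.5, which gives the expected number of families in each row. The sketch below assumes a 0.05 level (not stated in the problem); 9.488 is the tabulated chi-square value for 4 degrees of freedom.

```python
from math import comb

# Observed number of families by number of male children (0 to 4).
families_by_boys = {0: 32, 1: 178, 2: 290, 3: 236, 4: 64}

n_children = 4
p_male = 0.5          # hypothesis: male and female births equally likely
total_families = sum(families_by_boys.values())

# Binomial expected counts: N * C(n, k) * p^k * (1 - p)^(n - k).
expected = {
    k: total_families * comb(n_children, k) * p_male**k * (1 - p_male) ** (n_children - k)
    for k in families_by_boys
}

chi_square = sum(
    (families_by_boys[k] - expected[k]) ** 2 / expected[k] for k in families_by_boys
)

# p is specified by the hypothesis (not estimated), so d.f. = classes - 1.
df = len(families_by_boys) - 1

# Assumption: 5% level; tabulated chi-square value for 4 d.f.
CHI2_CRITICAL = 9.488
consistent = chi_square <= CHI2_CRITICAL

print(f"expected = {expected}")
print(f"chi-square = {chi_square:.3f} with {df} d.f.")
```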
Problem-4
An educator is of the opinion that the grades high school students
make depend on the amount of time they spend listening to music.
To test this theory, he has given a questionnaire to 400 randomly
selected students. The questionnaire includes these two questions: “How
many hours a week do you listen to music?” and “What is the average
grade for all your classes?” The data from the survey are in the
following table. Using a 5% significance level, test whether grades
and the time spent listening to music are independent or dependent.
Hours spent listening to music        Average Grade
                                   A     B     C     D     F
Less than 5 hours                 13    10    11    16     5
5 to 10 hours                     20    27    27    19     2
10 to 15 hours                     9    27    71    16    32
More than 20 hours                 8    11    41    24    11
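A sketch of the independence test for this 4 × 5 table is given below. The function name `chi_square_independence` is hypothetical, and 21.026 is the tabulated chi-square value for 12 degrees of freedom at the 5% level given in the problem.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Rows: listening-time categories; columns: grades A, B, C, D, F.
grades_by_hours = [
    [13, 10, 11, 16, 5],
    [20, 27, 27, 19, 2],
    [9, 27, 71, 16, 32],
    [8, 11, 41, 24, 11],
]

chi_square, df = chi_square_independence(grades_by_hours)

# Tabulated chi-square value at the 5% level for 12 d.f.
CHI2_CRITICAL = 21.026
independent = chi_square <= CHI2_CRITICAL

print(f"chi-square = {chi_square:.2f} with {df} d.f.")
print("independent" if independent else "dependent")
```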
Problem-5
The distribution of typing mistakes committed by a typist is
given as:

Mistakes/page     0     1     2     3     4     5
No. of pages    142   156    69    27     5     1

Assuming the distribution to be random, find the
expected number of pages containing 0, 1, 2, 3, 4, 5
mistakes respectively. Test the hypothesis that there is no
significant difference between the observed and expected
numbers of pages containing the mistakes.
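One common reading of "random" here is that mistakes per page follow a Poisson distribution whose mean is estimated from the data. The sketch below works under that assumption. It also pools the last two classes so that every expected count is at least 5, a conventional rule of thumb rather than something the problem specifies, and assumes a 0.05 level with the tabulated value 7.815 for 3 degrees of freedom.

```python
from math import exp, factorial

# pages[k] = number of pages with k mistakes, for k = 0..5.
pages = [142, 156, 69, 27, 5, 1]
total_pages = sum(pages)

# Estimate the Poisson mean as the average number of mistakes per page.
lam = sum(k * n for k, n in enumerate(pages)) / total_pages

# Expected pages with k mistakes: N * e^(-lam) * lam^k / k!
expected = [total_pages * exp(-lam) * lam**k / factorial(k) for k in range(len(pages))]

# Pool the classes for 4 and 5 mistakes so each expected count is at least 5.
observed_pooled = pages[:4] + [sum(pages[4:])]
expected_pooled = expected[:4] + [sum(expected[4:])]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed_pooled, expected_pooled))

# d.f. = classes - 1 - (one parameter, the mean, estimated from the data).
df = len(observed_pooled) - 1 - 1

# Assumption: 5% level; tabulated chi-square value for 3 d.f.
CHI2_CRITICAL = 7.815
fit_acceptable = chi_square <= CHI2_CRITICAL

print("expected pages:", [round(e, 2) for e in expected])
print(f"chi-square = {chi_square:.3f} with {df} d.f.")
```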
Find the Correlation Coefficient
Sales    Expenses (lakh)
 50         11
 50         13
 55         14
 60         16
 65         16
 65         15
 65         15
 60         14
 60         13
 50         13
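The Pearson correlation coefficient divides the sum of products of deviations by the square root of the product of the two sums of squared deviations. A minimal sketch for the table above, written without external libraries:

```python
from math import sqrt

sales = [50, 50, 55, 60, 65, 65, 65, 60, 60, 50]
expenses = [11, 13, 14, 16, 16, 15, 15, 14, 13, 13]

n = len(sales)
mean_sales = sum(sales) / n
mean_expenses = sum(expenses) / n

# Sums of products and squares of the deviations from the means.
s_xy = sum((x - mean_sales) * (y - mean_expenses) for x, y in zip(sales, expenses))
s_xx = sum((x - mean_sales) ** 2 for x in sales)
s_yy = sum((y - mean_expenses) ** 2 for y in expenses)

r = s_xy / sqrt(s_xx * s_yy)

print(f"r = {r:.4f}")
```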
Weight at age 12
Age Weight
1 5
2 8
3 9
4 11
5 14
6 15
7 17
8 18
9 20
10 25
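The table gives no task beyond its title, so the following sketch assumes the intent is to fit a least-squares line of weight on age and extrapolate it to age 12. Units are not stated in the source and are left out here.

```python
ages = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
weights = [5, 8, 9, 11, 14, 15, 17, 18, 20, 25]

n = len(ages)
mean_age = sum(ages) / n
mean_weight = sum(weights) / n

# Least-squares slope and intercept of the line: weight = intercept + slope * age.
s_xy = sum((x - mean_age) * (y - mean_weight) for x, y in zip(ages, weights))
s_xx = sum((x - mean_age) ** 2 for x in ages)
slope = s_xy / s_xx
intercept = mean_weight - slope * mean_age

# Assumption: the linear trend continues two years beyond the observed ages.
predicted_weight_at_12 = intercept + slope * 12

print(f"weight = {intercept:.3f} + {slope:.3f} * age")
print(f"estimated weight at age 12 = {predicted_weight_at_12:.2f}")
```

Because age 12 lies outside the observed range of 1 to 10, the estimate is an extrapolation and should be read with that caution.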