
• Population: the set of all possible values
• Sample: a portion of the population
• Statistical inference: generalizing from a
sample to a population with a calculated
degree of certainty
• Two forms of statistical inference
◦ Hypothesis testing
◦ Estimation
• Parameter: a characteristic of the population, e.g., the
population mean µ
• Statistic: a characteristic of a sample, e.g., the
sample mean x̄ and the sample standard deviation s
• The general goal of a hypothesis test is to
rule out chance (sampling error) as a
plausible explanation for the results from a
research study.
• Hypothesis testing is a technique to help
determine whether a specific treatment has
an effect on the individuals in a population.

The hypothesis test is used to evaluate the
results from a research study in which
1. A sample is selected from the
population.
2. The treatment is administered to the
sample.
3. After treatment, the individuals in the
sample are measured.

• This helps us to decide, on the basis of the
results of the samples, whether
(i) the deviation between the observed sample
statistic and the hypothetical parameter value
(or)
(ii) the deviation between two sample
statistics
is significant.

• On the basis of sample information, we make certain
decisions about the population. In taking such decisions
we make certain assumptions. These assumptions are
known as statistical hypotheses. Assuming the
hypothesis is correct, we calculate the probability of getting
the observed sample.
• Null hypothesis (H0): We set up a hypothesis which
assumes that there is no significant difference between
the sample statistic and the corresponding parameter, or
between two sample statistics. Such a hypothesis of
no difference is called a null hypothesis.
• Alternative hypothesis (H1): Any hypothesis which is
complementary to the null hypothesis is called an
alternative hypothesis.

• Because the hypothesis test relies on sample
data, and because sample data are not
completely reliable, there is always the risk
that misleading data will cause the hypothesis
test to reach a wrong conclusion.
• Two types of error are possible.

• Type I error
• Type II error

• A Type I error is the mistake of rejecting the
null hypothesis when it is actually true. The probability
of a Type I error is the level of significance α.
• A Type II error is the mistake of failing to
reject the null hypothesis when it is actually
false (also known as consumer’s risk).

The tails in a distribution are the extreme
regions bounded by critical values.
Determinations of P-values and critical
values are affected by whether a critical
region is in two tails, the left tail, or the
right tail. It therefore becomes important to
correctly characterize a hypothesis test as
two-tailed, left-tailed, or right-tailed.

• If θ0 is a population parameter and θ is the
corresponding sample statistic and if we set up the null
hypothesis H0: θ = θ0, then the alternative hypothesis,
which is complementary to H0, can be any one of the
following:
(i) H1: θ ≠ θ0 (ie) θ > θ0 or θ < θ0
(ii) H1: θ > θ0
(iii) H1: θ < θ0
• H1 given in (i) is called a two-tailed alternative
hypothesis.
• H1 given in (ii) is called a right-tailed alternative
hypothesis.
• H1 given in (iii) is called a left-tailed alternative
hypothesis.

• The critical region (or rejection region) is the set of
all values of the test statistic that cause us to reject
the null hypothesis.
• The region of the sample space S which amounts to the
acceptance of H0 is called the acceptance region.
• The value of the test statistic which separates the
critical region from the acceptance region is called
the critical value or significant value.
• When the test statistic is z, this critical value is
denoted by zα, where α is the level of
significance.

• A confidence interval estimate of a
population parameter contains the likely
values of that parameter. If a confidence
interval does not include a claimed value of a
population parameter, reject that claim.

(i) The null hypothesis H0 is defined.
(ii) The alternative hypothesis H1 is also defined (two-tailed
or one-tailed).
(iii) The level of significance (LOS) α is fixed, or taken from
the problem if specified, and the critical value zα is noted.
(iv) The test statistic z is computed.
(v) If |z| < zα, the null hypothesis is accepted and H1 is
rejected (ie) the difference is not significant at
α% LOS. Otherwise the null hypothesis is rejected and we
conclude that there is a significant difference.

Critical values of z by level of significance (LOS)

Nature of test   1% (0.01)     2% (0.02)      5% (0.05)      10% (0.1)
Two-tailed       |zα| = 2.58   |zα| = 2.33    |zα| = 1.96    |zα| = 1.645
Right-tailed     zα = 2.33     zα = 2.055     zα = 1.645     zα = 1.28
Left-tailed      zα = -2.33    zα = -2.055    zα = -1.645    zα = -1.28
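The tabulated critical values can be reproduced from the inverse of the standard normal distribution function. A minimal sketch, assuming Python 3.8 or later (for `statistics.NormalDist`); the helper name `critical_values` is hypothetical:

```python
from statistics import NormalDist

def critical_values(alpha):
    """Return (two_tailed, right_tailed, left_tailed) critical z values for a given LOS."""
    std_normal = NormalDist()                        # mean 0, standard deviation 1
    two_tailed = std_normal.inv_cdf(1 - alpha / 2)   # alpha split between both tails
    right_tailed = std_normal.inv_cdf(1 - alpha)     # all of alpha in the right tail
    left_tailed = std_normal.inv_cdf(alpha)          # all of alpha in the left tail
    return two_tailed, right_tailed, left_tailed

for alpha in (0.01, 0.02, 0.05, 0.10):
    two, right, left = critical_values(alpha)
    print(f"LOS {alpha:.2f}: two-tailed +/-{two:.3f}, right {right:.3f}, left {left:.3f}")
```

The computed values agree with the table to the rounding used there; small differences in the third decimal place come from table rounding.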

• The power of a hypothesis test is defined as
the probability that the test will reject the null
hypothesis when the treatment does have an
effect.
• The power of a test depends on a variety of
factors including the size of the treatment
effect and the size of the sample.
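How power depends on effect size and sample size can be sketched for one simple case. The example below assumes a right-tailed z test with known σ, for which the power is 1 - Φ(zα - effect·√n/σ); the function name and the numbers used are hypothetical and serve only to show the trend.

```python
from statistics import NormalDist

def power_right_tailed(effect, sigma, n, alpha):
    """Power of a right-tailed z test when the true mean exceeds mu0 by `effect`."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)    # critical value of the test
    shift = effect * n ** 0.5 / sigma          # mean of z when the effect is real
    return 1 - std_normal.cdf(z_alpha - shift)

# Hypothetical numbers, chosen only to show how power grows with sample size.
for n in (10, 25, 50, 100):
    print(n, round(power_right_tailed(effect=30, sigma=100, n=n, alpha=0.05), 3))
```

With no treatment effect (`effect=0`) the function returns α itself, since rejecting H0 is then a Type I error.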

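• 95% confidence limits for μ are given by

$$\left( \bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \ \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}} \right)$$

• When σ is not given, 95% confidence limits are
given by

$$\left( \bar{x} - 1.96\,\frac{s}{\sqrt{n}},\ \ \bar{x} + 1.96\,\frac{s}{\sqrt{n}} \right)$$

• where s is the sample standard deviation.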

• The mean breaking strength of the cables
supplied by a manufacturer is 1800 with an SD
of 100. By a new technique in the
manufacturing process, it is claimed that the
breaking strength of the cable has increased.
To test this claim, a sample of 50 cables is
tested and it is found that the mean breaking
strength is 1850. Can we support the claim at
1% LOS?

$\bar{x} = 1850,\ n = 50,\ \mu = 1800$ and $\sigma = 100$
$H_0: \bar{x} = \mu$
$H_1: \bar{x} > \mu$
A one-tailed (right-tailed) test is to be used.
Let LOS = 1%. Then $z_\alpha = 2.33$.
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{1850 - 1800}{100/\sqrt{50}} = 3.54$$
$|z| > z_\alpha$
Therefore, the difference between $\bar{x}$ and $\mu$ is significant at the 1% level.
$H_0$ is rejected and $H_1$ is accepted (ie) the sample supports the claim of an increase in breaking strength.
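The arithmetic of this example can be checked with a few lines of Python. This is a sketch only; the function name `one_sample_z` is hypothetical, and the critical value 2.33 is taken from the table of critical values rather than computed.

```python
from math import sqrt

def one_sample_z(sample_mean, mu0, sigma, n):
    """Test statistic z = (sample mean - mu) / (sigma / sqrt(n)) for a large sample."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

# Cable example: sample mean 1850, mu = 1800, sigma = 100, n = 50, right-tailed test at 1% LOS.
z = one_sample_z(1850, 1800, 100, 50)
z_alpha = 2.33                      # tabulated right-tailed critical value at 1%
reject_h0 = z > z_alpha
print(f"z = {z:.2f}, reject H0: {reject_h0}")
```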

• A sample of 100 students is taken from a
large population. The mean height of the
students in this sample is 160 cm. Can it be
reasonably regarded that, in the population,
the mean height is 165 cm, and the SD is 10
cm?

$\bar{x} = 160,\ n = 100,\ \mu = 165$ and $\sigma = 10$
$H_0: \bar{x} = \mu$
$H_1: \bar{x} \ne \mu$
A two-tailed test is to be used.
Let LOS be 1%. Therefore, $z_\alpha = 2.58$.
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{160 - 165}{10/\sqrt{100}} = -5$$
$|z| > z_\alpha$
The difference is significant at the 1% level (ie) $H_0$ is rejected.
It is not statistically correct to assume that $\mu = 165$.

• The mean value of a random sample of 60
items was found to be 145, with an SD of 40.
Find the 95% confidence limits for the
population mean. What size of sample is
required to estimate the population mean
within 5 of its actual value with 95% or more
confidence, using the sample mean?

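• Since σ is not given, we use the formula
$$\bar{x} - 1.96\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\,\frac{s}{\sqrt{n}}$$
(ie) $145 - \frac{1.96 \times 40}{\sqrt{60}} \le \mu \le 145 + \frac{1.96 \times 40}{\sqrt{60}}$
(ie) $134.9 \le \mu \le 155.1$
We have to find the value of n such that
$P\{\bar{x} - 5 \le \mu \le \bar{x} + 5\} \ge 0.95$
$P\{-5 \le \mu - \bar{x} \le 5\} \ge 0.95$
$P(|\mu - \bar{x}| \le 5) \ge 0.95$
(ie) $P(|\bar{x} - \mu| \le 5) \ge 0.95$
$$P\left( \frac{|\bar{x} - \mu|}{\sigma/\sqrt{n}} \le \frac{5}{\sigma/\sqrt{n}} \right) \ge 0.95$$
$$P\left( |z| \le \frac{5\sqrt{n}}{\sigma} \right) \ge 0.95$$
• where z is the standard normal variate. With σ estimated by s = 40,
$$\frac{5\sqrt{n}}{\sigma} \ge 1.96$$
$n \ge 245.86$ (ie) a sample of at least 246 items is required.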
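Both parts of this example, the confidence limits and the required sample size, can be reproduced with a short script. It is a sketch under the same assumptions as the solution (large sample, s = 40 used in place of σ); the function names are hypothetical.

```python
from math import sqrt, ceil

def confidence_limits(sample_mean, sd, n, z=1.96):
    """Large-sample confidence limits: sample mean +/- z * sd / sqrt(n)."""
    margin = z * sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

def required_sample_size(sd, max_error, z=1.96):
    """Smallest integer n for which z * sd / sqrt(n) <= max_error."""
    return ceil((z * sd / max_error) ** 2)

low, high = confidence_limits(145, 40, 60)
n_needed = required_sample_size(40, 5)
print(f"95% limits: ({low:.1f}, {high:.1f}); n needed: {n_needed}")
```

Rounding n up is the design choice here: a fractional bound such as 245.86 means the smaller integer would not meet the stated precision.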

• A normal population has a mean of 6.8 and
an SD of 1.5. A sample of 400 members gave a
mean of 6.75. Is the difference significant?

$\bar{x} = 6.75,\ n = 400,\ \mu = 6.8$ and $\sigma = 1.5$
$H_0: \bar{x} = \mu$
$H_1: \bar{x} \ne \mu$
A two-tailed test is to be used.
Let LOS be 5%. Therefore, $z_\alpha = 1.96$.
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{6.75 - 6.8}{1.5/\sqrt{400}} = -0.67$$
$|z| < z_\alpha$
The difference is not significant at the 5% level (ie) $H_0$ is accepted.

• The mean weight obtained from a random
sample of size 100 is 64 gms. The SD of the
weight distribution of the population is 3
gms. Test the statement that the mean
weight of the population is 67 gms at the 5%
level of significance. Also set up 99%
confidence limits of the mean weight of the
population.

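$n = 100,\ \mu = 67,\ \bar{x} = 64,\ \sigma = 3$
$H_0: \bar{x} = \mu$ or $\mu = 67$
$H_1: \bar{x} \ne \mu$ or $\mu \ne 67$
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{64 - 67}{3/\sqrt{100}} = -10$$
$|z| > 1.96$
$H_0$ is rejected.

99% confidence limits: $\bar{x} \pm 2.58\,(\sigma/\sqrt{n})$ (ie) $64 \pm 2.58\,(3/10)$
$= (63.226,\ 64.774)$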

Suppose the manufacturer claims that the mean lifetime
of a light bulb is more than 10,000 hours. In a sample
of 30 light bulbs, it was found that they only last
9,900 hours on average. Assume the population
standard deviation is 120 hours. At 0.05 significance
level, can we reject the claim by the manufacturer?
• Solution:
• Null hypothesis: μ ≥ 10000. Alternative hypothesis: μ < 10000.
• Test statistic: z = (9900 - 10000)/(120/√30) = -4.5644.
• Interpretation: The test statistic -4.5644 is less than
the critical value of -1.6449. Hence, at 0.05
significance level, we reject the claim that the mean
lifetime of a light bulb is above 10,000 hours.
• Since the lower-tail p-value of the test statistic is less than
the significance level 0.05, we reject the null
hypothesis that μ ≥ 10000.
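The p-value mentioned in the interpretation can be obtained from the standard normal distribution function. The following is a minimal sketch, assuming Python 3.8 or later; the helper `z_test_p_value` and its `tail` argument are hypothetical names introduced for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test_p_value(z, tail):
    """p-value of a z statistic; `tail` is 'left', 'right' or 'two'."""
    std_normal = NormalDist()
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "right":
        return 1 - std_normal.cdf(z)
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right' or 'two'")

# Light bulb example: sample mean 9900, mu0 = 10000, sigma = 120, n = 30, lower-tailed test.
z = (9900 - 10000) / (120 / sqrt(30))
p_value = z_test_p_value(z, "left")
print(f"z = {z:.4f}, p-value = {p_value:.2e}")
```

Comparing the p-value with α gives the same decision as comparing z with the critical value; the two rules are equivalent.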

Suppose the food label on a cookie bag states that
there is at most 2 grams of saturated fat in a
single cookie. In a sample of 35 cookies, it is
found that the mean amount of saturated fat per
cookie is 2.1 grams. Assume that the population
standard deviation is 0.25 grams. At 0.05
significance level, can we reject the claim on the food
label?
• Solution:
• Null hypothesis: μ ≤ 2. Alternative hypothesis: μ > 2.
• Test statistic: z = (2.1 - 2)/(0.25/√35) = 2.3664.
• Interpretation:
• The test statistic 2.3664 is greater than the critical
value of 1.6449. Hence, at 0.05 significance level,
we reject the claim that there is at most 2 grams of
saturated fat in a cookie.

• Suppose the mean weight of King Penguins found
in an Antarctic colony last year was 15.4 kg. In a
sample of 35 penguins at the same time this year in the
same colony, the mean penguin weight is 14.6 kg.
Assume the population standard deviation is 2.5
kg. At 0.05 significance level, can we reject the null
hypothesis that the mean penguin weight does not
differ from last year?
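The slides give no solution for this exercise. One possible worked computation, assuming a two-tailed large-sample z test with H0: μ = 15.4 and the tabulated critical values ±1.96 at the 0.05 level, is sketched here:

```latex
z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}
  = \frac{14.6 - 15.4}{2.5/\sqrt{35}}
  \approx -1.8931
```

Under these assumptions z lies between -1.96 and 1.96, so the null hypothesis that the mean weight does not differ from last year would not be rejected at the 0.05 level.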

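Let $\bar{X}_1$ and $\bar{X}_2$ be the means of two large samples of sizes $n_1$ and $n_2$
drawn from populations with standard deviations $\sigma_1$ and $\sigma_2$.
The statistic z is given by
$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
If $|z| \le z_\alpha$, the difference between $\bar{X}_1$ and $\bar{X}_2$ is not significant at α% LOS.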

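• If the samples are drawn from the same
population (ie) σ1 = σ2 = σ
$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

• If σ1 and σ2 are unknown, s1 and s2 can be
used
$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$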

• If σ1, σ2 are equal and not known, then
σ1 = σ2 = σ is found using
$$\sigma^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}$$

In such cases
$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_2} + \dfrac{s_2^2}{n_1}}}$$
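The exchanged denominators in the last formula follow from substituting the pooled estimate of σ² into the equal-variance statistic. A short check of the algebra, using the same symbols as the slide:

```latex
\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)
= \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2} \cdot \frac{n_1 + n_2}{n_1 n_2}
= \frac{s_1^2}{n_2} + \frac{s_2^2}{n_1}
```

This is why n2 appears under s1² and n1 under s2² when the two populations are assumed to share a common, unknown SD.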

• In a random sample of size 500, the mean is
found to be 20. In another independent
sample of size 400, the mean is 15. Could
the samples have been drawn from the same
population with SD 4?

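$\bar{x}_1 = 20,\ n_1 = 500,\ \bar{x}_2 = 15,\ n_2 = 400,\ \sigma = 4$
$H_0: \bar{x}_1 = \bar{x}_2$
$H_1: \bar{x}_1 \ne \bar{x}_2$
A two-tailed test is to be used.
Let LOS be 1%. Then $z_\alpha = 2.58$.
$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{20 - 15}{4\sqrt{\dfrac{1}{500} + \dfrac{1}{400}}} = 18.6$$
$|z| > z_\alpha$
Therefore, the difference between $\bar{x}_1$ and $\bar{x}_2$ is significant at the 1% level.
$H_0$ is rejected.
The samples are not drawn from the same population.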

• A simple sample of heights of 6400 Englishmen
has a mean of 170 cm and an SD of 6.4 cm,
while a simple sample of heights of 1600
Americans has a mean of 172 cm and an SD of
6.3 cm. Do the data indicate that Americans
are, on average, taller than
Englishmen?

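$n_1 = 6400,\ \bar{x}_1 = 170,\ s_1 = 6.4,\ n_2 = 1600,\ \bar{x}_2 = 172,\ s_2 = 6.3$
$H_0: \mu_1 = \mu_2$ or $\bar{x}_1 = \bar{x}_2$
$H_1: \mu_1 < \mu_2$ or $\bar{x}_1 < \bar{x}_2$
Left-tailed test
LOS 1%
$z_\alpha = -2.33$
$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \approx \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{170 - 172}{\sqrt{\dfrac{6.4^2}{6400} + \dfrac{6.3^2}{1600}}} = -11.32$$
$|z| > |z_\alpha|$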

• The difference is significant at the 1% level.
• H0 is rejected and H1 is accepted
• (ie) Americans are, on average, taller than
the Englishmen.
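The two preceding worked examples use the same statistic with different inputs, which a small helper makes explicit. This is a sketch only; the function name `two_sample_z` is hypothetical, and for the heights example the sample SDs stand in for the unknown σ1 and σ2, as in the solution.

```python
from math import sqrt

def two_sample_z(mean1, mean2, sd1, sd2, n1, n2):
    """z = (mean1 - mean2) / sqrt(sd1^2/n1 + sd2^2/n2) for two large independent samples."""
    return (mean1 - mean2) / sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

# Same population with known SD 4 (so sd1 = sd2 = sigma): sample sizes 500 and 400.
z_same_population = two_sample_z(20, 15, 4, 4, 500, 400)

# Heights: Englishmen (n = 6400) versus Americans (n = 1600), sample SDs used for sigma1, sigma2.
z_heights = two_sample_z(170, 172, 6.4, 6.3, 6400, 1600)

print(f"{z_same_population:.1f} {z_heights:.2f}")
```

Setting `sd1 = sd2 = sigma` reduces the general formula to the common-σ form, so one function covers both cases.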

• Test the significance of the difference between
the means of the samples, drawn from two
normal populations with the same SD, using
the following data:

            Size   Mean   SD
Sample 1    100    61     4
Sample 2    200    63     6

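$H_0: \bar{x}_1 = \bar{x}_2$ or $\mu_1 = \mu_2$
$H_1: \bar{x}_1 \ne \bar{x}_2$ or $\mu_1 \ne \mu_2$
A two-tailed test is to be used.
LOS 1%
$z_\alpha = 2.58$
Since the populations have the same (unknown) SD,
$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_2} + \dfrac{s_2^2}{n_1}}} = \frac{61 - 63}{\sqrt{\dfrac{4^2}{200} + \dfrac{6^2}{100}}} = -3.02$$
$|z| > z_\alpha$
The difference is significant at the 1% level.
The two normal populations, from which the samples are drawn, may
not have the same mean, though they have the same SD.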

• The average marks scored by 32 boys is 72
with an SD of 8, while that for 36 girls is 70
with an SD of 6. Test at 1% LOS whether the
boys perform better than girls.

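$H_0: \bar{x}_1 = \bar{x}_2$ or $\mu_1 = \mu_2$
$H_1: \bar{x}_1 > \bar{x}_2$ or $\mu_1 > \mu_2$
A right-tailed test is to be used.
LOS 1%
$z_\alpha = 2.33$
$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{72 - 70}{\sqrt{\dfrac{8^2}{32} + \dfrac{6^2}{36}}} = 1.15$$
$|z| < z_\alpha$
The difference is not significant at the 1% level.
We cannot conclude that boys perform better than girls.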

• The average income of persons was Rs.210
with an SD of Rs.10 in a sample of 100 people of
a city. For another sample of 150 persons,
the average income was Rs.220 with an SD of
Rs.12. The SD of incomes of the people of the
city was Rs.11. Test whether there is any
significant difference between the average
incomes of the localities.

• For sample 1: n1 = 1000, ∑x = 49000,
∑(x - x̄)² = 784000
• For sample 2: n2 = 1500, ∑x = 70500,
∑(x - x̄)² = 2400000
• Discuss the significance of the difference of
the sample means.

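• X denotes the number of successes in n independent
Bernoulli trials.
• P denotes the probability of success in each trial, and Q = 1 - P.
X follows a binomial distribution with mean nP and variance nPQ.
When n is large, X follows the normal distribution $N(nP,\ nPQ)$.
The sample proportion $p = X/n$ then follows the normal distribution
$$N\left( \frac{nP}{n},\ \frac{nPQ}{n^2} \right) \quad \text{(ie)} \quad N\left( P,\ \frac{PQ}{n} \right)$$
The statistic is
$$z = \frac{p - P}{\sqrt{PQ/n}}$$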

• 1. If P is not known, we assume that p is
nearly equal to P, then
$$z = \frac{p - P}{\sqrt{pq/n}}$$

• 2. 95% confidence limits for P are then given
by
$$\frac{|P - p|}{\sqrt{pq/n}} \le 1.96$$
(ie)
$$\left( p - 1.96\sqrt{\frac{pq}{n}},\ \ p + 1.96\sqrt{\frac{pq}{n}} \right)$$

• The fatality rate of typhoid patients is
believed to be 17.26%. In a certain year 640
patients suffering from typhoid were treated
in a metropolitan hospital and only 63
patients died. Can you consider the hospital
efficient?

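$H_0: p = P$
$H_1: p < P$
One-tailed (left-tailed) test
Let LOS be 1%.
$z_\alpha = -2.33$
$P = 0.1726,\ Q = 0.8274,\ p = 63/640 = 0.0984$
$$z = \frac{0.0984 - 0.1726}{\sqrt{(0.1726 \times 0.8274)/640}} = -4.96$$
$|z| > |z_\alpha|$
$H_0$ is rejected (ie) the hospital can be considered efficient.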

• Experience has shown that 20% of a
manufactured product is of top quality. In one
day’s production of 400 articles, only 50 are
of top quality. Show that either the
production of the day chosen was not a
representative sample or the hypothesis of
20% was wrong. Based on the particular day’s
production, find also the 95% confidence
limits for the percentage of top quality
product.

$H_0: P = 1/5$
$H_1: P \ne 1/5$
$p$ = proportion of top quality products $= 50/400 = 1/8$
$$z = \frac{1/8 - 1/5}{\sqrt{(1/5)(4/5)(1/400)}} = -3.75$$
$|z| = 3.75 > 1.96$, so $H_0$ is rejected at the 5% level.
95% confidence limits for P are given by
$$p - 1.96\sqrt{\frac{pq}{n}} \le P \le p + 1.96\sqrt{\frac{pq}{n}}$$
$0.093 \le P \le 0.157$ (ie) between 9.3% and 15.7%
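The test for a single proportion and its confidence limits can be sketched as two small functions. The function names are hypothetical; note that the test uses the hypothesised P in the standard error, while the confidence limits use the sample proportion p, as in the formulas on the slides.

```python
from math import sqrt

def one_proportion_z(p_hat, p0, n):
    """z = (p - P) / sqrt(PQ/n), with Q = 1 - P taken from the null hypothesis."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

def proportion_confidence_limits(p_hat, n, z=1.96):
    """Limits p +/- z * sqrt(pq/n), using the sample proportion in the standard error."""
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Top-quality example: 50 of 400 articles, hypothesised proportion 20%.
p_hat = 50 / 400
z = one_proportion_z(p_hat, 0.20, 400)
low, high = proportion_confidence_limits(p_hat, 400)
print(f"z = {z:.2f}, 95% limits: ({low:.3f}, {high:.3f})")
```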

• A salesman in a departmental store claims
that at most 60% of the shoppers entering the
store leave without making a purchase. A
random sample of 50 shoppers showed that
35 of them left without making a purchase.
Are these sample results consistent with the
claim of the salesman? Use an LOS of 0.05.

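$H_0: p = P$
$H_1: p > P$
$p = 35/50 = 0.7,\ P = 0.6$
One-tailed (right-tailed) test; at LOS 0.05, $z_\alpha = 1.645$
$$z = \frac{0.7 - 0.6}{\sqrt{(0.6 \times 0.4)/50}} = 1.443$$
$|z| < z_\alpha$
$H_0$ is accepted.
The sample results are consistent with the claim of the salesman.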

• Suppose 60% of citizens voted in the last election. 85 out of
148 people in a telephone survey said that they voted in the
current election. At 0.05 significance level, can we reject
the null hypothesis that the proportion of voters in the
population is above 60% this year?
• Test statistic: p = 85/148 = 0.5743, so
z = (0.5743 - 0.6)/√(0.6 × 0.4/148) = -0.6376.
• Interpretation:
• The test statistic -0.6376 is not less than the critical
value of -1.6449. Hence, at 0.05 significance level, we do
not reject the null hypothesis that the proportion of
voters in the population is above 60% this year.
• Since the p-value is greater than the 0.05 significance level, we do
not reject the null hypothesis that the proportion of
voters in the population is above 60% this year.

• Let p1, p2 be the proportions of successes in two
large samples of sizes n1 and n2 from the same
population or from two populations with the
same proportion P. Then
$$p_1 \sim N\left( P,\ \frac{PQ}{n_1} \right), \qquad p_2 \sim N\left( P,\ \frac{PQ}{n_2} \right)$$
The statistic is
$$z = \frac{p_1 - p_2}{\sqrt{PQ\left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}$$
If P is not known, it is estimated by
$$P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$

• In a large city A, 20% of a random sample of
900 school boys had a slight physical defect.
In another large city B, 18.5% of a
random sample of 1600 school boys had the
same defect. Is the difference between the
proportions significant?

$p_1 = 0.2,\ p_2 = 0.185,\ n_1 = 900,\ n_2 = 1600$
$H_0: p_1 = p_2$
$H_1: p_1 \ne p_2$
A two-tailed test is to be used.
Let LOS be 5%. Then $z_\alpha = 1.96$.
$$P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = 0.1904$$
$$z = \frac{p_1 - p_2}{\sqrt{PQ\left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} = 0.92$$
$|z| < z_\alpha$
The difference between $p_1$ and $p_2$ is not significant at the 5% level.

• Before an increase in excise duty on tea, 800
people out of a sample of 1000 were
consumers of tea. After the increase in duty,
800 people were consumers of tea in a
sample of 1200 persons. Find whether there
is a significant decrease in the consumption
of tea after the increase in duty.

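$p_1, p_2$ denote the proportions of consumers before and after the
increase in duty respectively.
$p_1 = 800/1000 = 4/5,\ p_2 = 800/1200 = 2/3$
$H_0: p_1 = p_2$
$H_1: p_1 > p_2$
One-tailed (right-tailed) test
Let LOS be 1%. Then $z_\alpha = 2.33$.
$$P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = 0.7273$$
$$z = \frac{p_1 - p_2}{\sqrt{PQ\left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} = 6.99$$
$|z| > z_\alpha$ (ie) the difference between $p_1$ and $p_2$ is significant at 1%, so there is a significant decrease in the consumption of tea.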
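The two-proportion statistic, with P estimated by pooling the samples, can be sketched as follows. The function name `two_proportion_z` is hypothetical; the inputs are the figures from the two preceding examples.

```python
from math import sqrt

def two_proportion_z(p1, p2, n1, n2):
    """Return (z, pooled P) for the difference between two sample proportions."""
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)        # estimate of the common P
    std_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_error, pooled

# School boys with a physical defect: city A (900 boys) versus city B (1600 boys).
z_cities, pooled_cities = two_proportion_z(0.20, 0.185, 900, 1600)

# Tea consumers before (800 of 1000) and after (800 of 1200) the duty increase.
z_tea, pooled_tea = two_proportion_z(800 / 1000, 800 / 1200, 1000, 1200)

print(f"cities: P = {pooled_cities:.4f}, z = {z_cities:.2f}")
print(f"tea:    P = {pooled_tea:.4f}, z = {z_tea:.2f}")
```

Returning the pooled P alongside z makes it easy to compare each intermediate value with the hand calculation.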

• 15.5% of a random sample of 1600
undergraduates were smokers, whereas 20%
of a random sample of 900 postgraduates
were smokers in a state. Can we conclude
that the proportion of smokers is lower among
undergraduates than among postgraduates?

$p_1, p_2$ denote the proportions of undergraduate and postgraduate smokers.
$p_1 = 0.155,\ p_2 = 0.2,\ n_1 = 1600,\ n_2 = 900$
$H_0: p_1 = p_2$
$H_1: p_1 < p_2$
One-tailed (left-tailed) test
Let LOS be 5%. Then $z_\alpha = -1.645$.
$$P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = 0.1712$$
$$z = \frac{p_1 - p_2}{\sqrt{PQ\left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} = -2.87$$
$|z| > |z_\alpha|$ (ie) the difference between $p_1$ and $p_2$ is significant at 5%.
The habit of smoking is less common among undergraduates than among postgraduates.

• Suppose that 12% of apples harvested in an
orchard last year were rotten. 30 out of 214
apples in a harvest sample this year turn out
to be rotten. At 0.05 significance level, can we
reject the null hypothesis that the proportion
of rotten apples in the harvest stays below 12%
this year?
• The null hypothesis is that p ≤ 0.12.
• The alternative hypothesis is that p > 0.12.
• Test statistic: the sample proportion is 30/214 = 0.1402, so
z = (0.1402 - 0.12)/√(0.12 × 0.88/214) = 0.90875.

• Interpretation:
• The test statistic 0.90875 is not greater than
the critical value of 1.6449. Hence, at 0.05
significance level, we do not reject the null
hypothesis that the proportion of rotten
apples in the harvest stays below 12% this year.

• Since the p-value is greater than the 0.05 significance
level, we do not reject the null hypothesis
that the proportion of rotten apples in the
harvest stays below 12% this year.

• Suppose a coin toss turns up 12 heads out of
30 trials. At 0.05 significance level, can one
reject the null hypothesis that the coin toss is
fair?
• The null hypothesis is that p = 0.5.
• Test statistic: the sample proportion is 12/30 = 0.4, so
z = (0.4 - 0.5)/√(0.5 × 0.5/30) = -1.095445.
• Interpretation:
• The test statistic -1.095445 lies between the
critical values -1.9600 and 1.9600. Hence, at
0.05 significance level, we do not reject the
null hypothesis that the coin toss is fair.
• Since the p-value is greater than the 0.05 significance
level, we do not reject the null hypothesis
that the coin toss is fair.

STUDENT’S t-DISTRIBUTION

• The probability density function is symmetric,
and its overall shape resembles the bell shape
of a normally distributed variable with mean
0 and variance 1, except that it is a bit lower
and wider. As the number of degrees of
freedom grows, the t-distribution approaches
the normal distribution with mean 0 and
variance 1.
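• The statistic is given by

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n - 1}}, \quad \text{with } n - 1 \text{ degrees of freedom}$$

or

$$t = \frac{\bar{x} - \mu}{S/\sqrt{n}}, \qquad S^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

Here s is the sample SD computed with divisor n, and S is the SD computed with divisor n - 1.

95% confidence limits for μ:

$$\bar{x} - t_{0.05}\,\frac{s}{\sqrt{n - 1}} \le \mu \le \bar{x} + t_{0.05}\,\frac{s}{\sqrt{n - 1}}$$

where $t_{0.05}$ is the 5% critical value of t for n - 1 degrees of freedom.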

• A machinist is expected to make engine parts
with axle diameter of 1.75 cm. A random
sample of 10 parts shows a mean diameter of
1.85 cm with an SD of 0.1 cm. On the basis of
this sample, would you say that the work of
the machinist is inferior?

$\bar{x} = 1.85,\ s = 0.1,\ n = 10$ and $\mu = 1.75$
$H_0: \bar{x} = \mu$
$H_1: \bar{x} \ne \mu$
A two-tailed test is to be used. Let LOS be 5%.
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n - 1}} = \frac{1.85 - 1.75}{0.1/\sqrt{9}} = 3, \qquad v = n - 1 = 9$$
Table values of t: $t_{0.05} = 2.26,\ t_{0.01} = 3.25$
$|t| > t_{0.05}$ and $|t| < t_{0.01}$
$H_0$ is rejected and $H_1$ is accepted at the 5% level, but
$H_1$ is rejected and $H_0$ is accepted at the 1% level.
At 5% LOS, the work of the machinist is inferior.
At 1% LOS, the work of the machinist is not inferior.

• A certain injection administered to each of 12
patients resulted in the following increases of
blood pressure:
5, 2, 8, -1, 3, 0, 6, -2, 1, 5, 0, 4
• Can it be concluded that the injection will be,
in general, accompanied by an increase in BP?

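$$\bar{x} = \frac{\sum x}{n} = \frac{31}{12} = 2.58$$
$$s^2 = \frac{\sum x^2}{n} - \left( \frac{\sum x}{n} \right)^2 = \frac{185}{12} - \left( \frac{31}{12} \right)^2 = 8.74$$
$s = 2.96$
$H_0: \bar{x} = \mu$, where $\mu = 0$ (no increase in BP)
$H_1: \bar{x} > \mu$
Right-tailed test, LOS 5%, $v = 11$
$t_{0.05} = 1.80$
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n - 1}} = \frac{2.58 - 0}{2.96/\sqrt{11}} = 2.89$$
$|t| > t_{0.05}$. Hence $H_0$ is rejected and $H_1$ is accepted.
The injection is accompanied by an increase in BP.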
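The same statistic can be computed directly from the raw data. The sketch below uses Python's `statistics` module, where `pstdev` divides by n (the s of the slides) and `stdev` divides by n - 1 (the S of the slides); the function name `one_sample_t` is hypothetical, and the critical value is taken from the t table rather than computed.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

def one_sample_t(data, mu0):
    """t = (mean - mu) / (s / sqrt(n - 1)), where s is the SD with divisor n."""
    n = len(data)
    return (mean(data) - mu0) / (pstdev(data) / sqrt(n - 1))

increases = [5, 2, 8, -1, 3, 0, 6, -2, 1, 5, 0, 4]   # blood pressure data from the example
t = one_sample_t(increases, 0)

# Equivalent form with S (divisor n - 1): t = (mean - mu) / (S / sqrt(n)).
t_alt = mean(increases) / (stdev(increases) / sqrt(len(increases)))

t_critical = 1.80          # tabulated one-tailed 5% value for 11 degrees of freedom
print(f"t = {t:.2f}, reject H0: {t > t_critical}")
```

With unrounded intermediate values t comes out near 2.90; the hand calculation gives 2.89 because it uses the rounded mean and SD. The decision is the same either way.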

• Suppose now that the manager of the team
(given the results obtained) fires the coach,
who has not made any improvement, and takes on
another, more promising one. We report the
times of the athletes after the second training:
• Before training:
12.9 13.5 12.8 15.6 17.2 19.2 12.6 15.3 14.4 11.3
• After the second training:
12.0 12.2 11.2 13.0 15.0 15.8 12.2 13.4 12.9 11.0
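The slides do not show the analysis of these times. Since the same ten athletes are measured before and after training, one natural choice is a paired t test on the differences; that choice is an assumption made here for illustration, and the function name `paired_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

before = [12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3]
after = [12.0, 12.2, 11.2, 13.0, 15.0, 15.8, 12.2, 13.4, 12.9, 11.0]

def paired_t(first, second):
    """Paired t statistic on the differences, with n - 1 degrees of freedom."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # stdev uses divisor n - 1, so the standard error is stdev / sqrt(n).
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1

t, dof = paired_t(before, after)
print(f"t = {t:.3f} with {dof} degrees of freedom")
```

The resulting t would then be compared with the tabulated critical value for 9 degrees of freedom at the chosen LOS.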
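• Statistic

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

When σ is not known, it is estimated by

$$\sigma = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}}$$

Degrees of freedom: $n_1 + n_2 - 2$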

• Two independent samples of sizes 8 and 7
contained the following values:

Sample 1   19  17  15  21  16  18  16  14
Sample 2   15  14  15  19  15  18  16

• Is the difference between the sample means
significant?

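$\bar{x}_1 = 17,\ \bar{x}_2 = 16,\ s_1 = 2.12,\ s_2 = 1.69$
$H_0: \bar{x}_1 = \bar{x}_2$
$H_1: \bar{x}_1 \ne \bar{x}_2$
Two-tailed test. LOS 5%
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
σ is not known, so
$$\sigma = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{36 + 20}{13}} = 2.08$$
$t = 0.93,\ v = n_1 + n_2 - 2 = 13$
$t_{0.05} = 2.16$
$|t| < t_{0.05}$, so $H_0$ is accepted and $H_1$ is rejected.
The two sample means do not differ significantly at 5% LOS.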
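The pooled two-sample t statistic can be computed from the raw values of this example. A sketch, with the hypothetical function name `pooled_two_sample_t`; `pvariance` divides by n, so n·s² is the sum of squared deviations used in the slide formula.

```python
from math import sqrt
from statistics import mean, pvariance

def pooled_two_sample_t(sample1, sample2):
    """Return (t, degrees of freedom) using sigma^2 = (n1*s1^2 + n2*s2^2) / (n1 + n2 - 2)."""
    n1, n2 = len(sample1), len(sample2)
    # pvariance uses divisor n, so n * pvariance is the sum of squared deviations.
    pooled_var = (n1 * pvariance(sample1) + n2 * pvariance(sample2)) / (n1 + n2 - 2)
    t = (mean(sample1) - mean(sample2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

sample1 = [19, 17, 15, 21, 16, 18, 16, 14]
sample2 = [15, 14, 15, 19, 15, 18, 16]
t, dof = pooled_two_sample_t(sample1, sample2)
print(f"t = {t:.2f} with {dof} degrees of freedom")
```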

• The mean height and the SD of height of 8
randomly chosen soldiers are 166.9 and 8.29
cm respectively. The corresponding values for 6
randomly chosen sailors are 170.3 and 8.5
cm respectively. Based on these data, can we
conclude that soldiers are, in general, shorter
than sailors?

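$\bar{x}_1 = 166.9,\ \bar{x}_2 = 170.3,\ s_1 = 8.29,\ s_2 = 8.50,\ n_1 = 8,\ n_2 = 6$
$H_0: \bar{x}_1 = \bar{x}_2$
$H_1: \bar{x}_1 < \bar{x}_2$
One-tailed (left-tailed) test. LOS 5%
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
σ is not known, so
$$\sigma = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}} = 9.05$$
$t = -0.695,\ v = n_1 + n_2 - 2 = 12$
$t_{0.05} = 1.78$
$|t| < t_{0.05}$, so $H_0$ is accepted and $H_1$ is rejected.
We cannot conclude that soldiers are, in general, shorter than sailors.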
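
$$F = \frac{\sigma_1^2}{\sigma_2^2} \quad \text{with } (v_1, v_2) \text{ degrees of freedom}$$
$$\sigma_1^2 = \frac{n_1 s_1^2}{n_1 - 1} \quad \text{with } v_1 = n_1 - 1 \text{ degrees of freedom}$$
$$\sigma_2^2 = \frac{n_2 s_2^2}{n_2 - 1} \quad \text{with } v_2 = n_2 - 1 \text{ degrees of freedom}$$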

• Two samples of sizes 9 and 8 gave the sums
of squares of deviations from their respective
means equal to 160 and 91 respectively. Can they
be regarded as drawn from the same normal
population?

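$n_1 = 9,\ \sum (x_i - \bar{x})^2 = 160$ (ie) $n_1 s_1^2 = 160$
$n_2 = 8,\ \sum (y_i - \bar{y})^2 = 91$ (ie) $n_2 s_2^2 = 91$
$\sigma_1^2 = 160/8 = 20,\ \sigma_2^2 = 91/7 = 13$
$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \ne \sigma_2^2$
LOS 5%
$$F = \frac{\sigma_1^2}{\sigma_2^2} = \frac{20}{13} = 1.54$$
$F_{0.05}(v_1 = 8,\ v_2 = 7) = 3.73$
$F < F_{0.05}$
$H_0$ is accepted.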

• Two independent samples of 8 and 7 items
respectively had the following values of the
variable:

Sample 1   9   11  13  11  15  9   12  14
Sample 2   10  12  10  14  9   8   10

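$\sum x_1 = 94,\ \sum x_1^2 = 1138$
$$s_1^2 = \frac{1}{n_1}\sum x_1^2 - \left( \frac{1}{n_1}\sum x_1 \right)^2 = 4.19$$
$\sum x_2 = 73,\ \sum x_2^2 = 785$
$$s_2^2 = \frac{1}{n_2}\sum x_2^2 - \left( \frac{1}{n_2}\sum x_2 \right)^2 = 3.39$$
$$\sigma_1^2 = \frac{n_1 s_1^2}{n_1 - 1} = 4.79$$
$$\sigma_2^2 = \frac{n_2 s_2^2}{n_2 - 1} = 3.96$$
$H_0: \sigma_1^2 = \sigma_2^2$ and $H_1: \sigma_1^2 \ne \sigma_2^2$
$$F = \frac{\sigma_1^2}{\sigma_2^2} = \frac{4.79}{3.96} = 1.21$$
$F_{0.05}(v_1 = 7,\ v_2 = 6) = 4.21$
$F < F_{0.05}$, so $H_0$ is accepted (ie) $\sigma_1^2$ and $\sigma_2^2$ do not differ significantly.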
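The variance ratio can be computed from the raw data in a few lines. In this sketch `statistics.variance` divides by n - 1, which equals n·s²/(n - 1) in the notation of the slides. Placing the larger estimate in the numerator is a common convention assumed here (both worked examples happen to have the larger estimate first); the function name `f_ratio` is hypothetical.

```python
from statistics import variance

def f_ratio(sample1, sample2):
    """F = larger variance estimate / smaller one, with (numerator, denominator) d.f."""
    v1, v2 = variance(sample1), variance(sample2)   # divisor n - 1, i.e. n*s^2/(n - 1)
    if v1 >= v2:
        return v1 / v2, len(sample1) - 1, len(sample2) - 1
    return v2 / v1, len(sample2) - 1, len(sample1) - 1

sample1 = [9, 11, 13, 11, 15, 9, 12, 14]
sample2 = [10, 12, 10, 14, 9, 8, 10]
F, dof1, dof2 = f_ratio(sample1, sample2)
f_critical = 4.21      # tabulated F at 5% for (7, 6) degrees of freedom
print(f"F = {F:.2f} with ({dof1}, {dof2}) d.f., significant: {F > f_critical}")
```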

Chi-Square Goodness Of Fit

• A statistical method assessing the goodness
of fit between a set of observed values and
those expected theoretically. A chi-squared
test is any statistical hypothesis test in
which the sampling distribution of the test
statistic is a chi-square distribution when
the null hypothesis is true. The chi-squared
test is used to determine whether there is a
significant difference between the expected
frequencies and the observed frequencies in
one or more categories.

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
$O_i$ = observed frequency
$E_i$ = expected frequency
If $\chi^2 < \chi_v^2(\alpha)$, the null hypothesis is accepted.
$\alpha$ = LOS
$v$ = degrees of freedom

• 1. The no. of observations N in the sample must be
reasonably large, say ≥ 50.
• 2. Individual frequencies must not be too small
(ie) Oi ≥ 10. In case Oi < 10, it is combined with the
neighbouring frequencies, so that the combined
frequency is ≥ 10.
• 3. The number of classes n must be neither too
small nor too large (ie) 4 ≤ n ≤ 16.

• Total no. of digits = 10000
• If digits occur uniformly, then each digit will
occur 10,000/10 = 1000 times.

Oi   1026  1107   997   966  1075   933  1107   972   964   853
Ei   1000  1000  1000  1000  1000  1000  1000  1000  1000  1000

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = \frac{1}{1000}\left[ (26)^2 + (107)^2 + \dots \right] = 58.542$$
$v = 10 - 1 = 9$
$\chi^2_{0.05}(v = 9) = 16.919$
Since $\chi^2 > \chi^2_{0.05}$, $H_0$ is rejected
(ie) digits do not occur uniformly in the directory.
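The chi-square sum for this example is easy to verify by machine. A minimal sketch; the function name `chi_square_statistic` is hypothetical and the critical value is taken from the chi-square table rather than computed.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Digit frequencies from the example; uniform expectation of 1000 per digit.
observed = [1026, 1107, 997, 966, 1075, 933, 1107, 972, 964, 853]
expected = [sum(observed) / len(observed)] * len(observed)

chi2 = chi_square_statistic(observed, expected)
dof = len(observed) - 1
chi2_critical = 16.919   # tabulated 5% value for 9 degrees of freedom
print(f"chi-square = {chi2:.3f}, d.f. = {dof}, reject H0: {chi2 > chi2_critical}")
```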

• The following data give the number of aircraft
accidents that occurred during the
various days of a week.

Day                Mon   Tue   Wed   Thur   Fri   Sat
No. of accidents   15    19    13    12     16    15

• Test whether the accidents are uniformly
distributed over the week.

Oi   15  19  13  12  16  15
Ei   15  15  15  15  15  15

$\sum O_i = \sum E_i = 90$
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = \frac{0 + 16 + 4 + 9 + 1 + 0}{15} = 2$$
$v = 6 - 1 = 5$
$\chi^2_{0.05}(v = 5) = 11.07$
Since $\chi^2 < \chi^2_{0.05}$, $H_0$ is accepted.
Accidents occur uniformly over the week.

Chi-Square Test

INDEPENDENCE OF ATTRIBUTES

                  B
A           B1    B2    ...   Bj    ...   Bn    ROW TOTAL
A1          O11   O12   ...   O1j   ...   O1n   O1*
A2          O21   O22   ...   O2j   ...   O2n   O2*
...
Ai          Oi1   Oi2   ...   Oij   ...   Oin   Oi*
...
Am          Om1   Om2   ...   Omj   ...   Omn   Om*
COL TOTAL   O*1   O*2   ...   O*j   ...   O*n   n
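The slides that follow this table did not survive extraction. As a hedged sketch of how such a table is usually analysed: under independence, the expected frequency of cell (i, j) is taken as (row total × column total) / n, and the chi-square sum is referred to (m - 1)(n - 1) degrees of freedom. The function names and the 2 × 2 counts below are hypothetical and are not taken from the slides.

```python
def expected_counts(table):
    """Expected frequency for cell (i, j) = row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Return (chi-square, degrees of freedom) for a table of observed counts."""
    expected = expected_counts(table)
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

observed = [[10, 20], [30, 40]]          # hypothetical counts, not from the slides
chi2, dof = chi_square_independence(observed)
print(expected_counts(observed), round(chi2, 4), dof)
```

The computed value would then be compared with the tabulated chi-square critical value for the given degrees of freedom, as in the goodness-of-fit examples.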
