3
The hypothesis test is used to evaluate the
results from a research study in which
1. A sample is selected from the
population.
2. The treatment is administered to the
sample.
3. After treatment, the individuals in the
sample are measured.
4
This helps us to decide, on the basis of the
results of the samples, whether
(i) the deviation between the observed sample
statistic and the hypothetical parameter value
(or)
(ii) the deviation between two sample
statistics
is significant
5
On the basis of sample information, we make certain
decisions about the population. In taking such decisions
we make certain assumptions. These assumptions are
known as statistical hypotheses. Assuming the
hypothesis is correct, we calculate the probability of getting
the observed sample.
Null hypothesis (H0): We set up a hypothesis which
assumes that there is no significant difference between
the sample statistic and the corresponding parameter or
between two sample statistics. Such a hypothesis of
no difference is called a null hypothesis.
Alternative hypothesis (H1): Any hypothesis which is
complementary to the null hypothesis is called an
alternative hypothesis.
6
Because the hypothesis test relies on sample
data, and because sample data are not
completely reliable, there is always the risk
that misleading data will cause the hypothesis
test to reach a wrong conclusion.
Two types of error are possible.
7
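Type I error: rejecting the null hypothesis when it is actually true.
Type II error: failing to reject the null hypothesis when it is actually false.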
8
The tails in a distribution are the extreme
regions bounded by critical values.
Determinations of P-values and critical
values are affected by whether a critical
region is in two tails, the left tail, or the
right tail. It therefore becomes important to
correctly characterize a hypothesis test as
two-tailed, left-tailed, or right-tailed.
9
If θ0 is a population parameter, θ is the
corresponding sample statistic, and we set up the null
hypothesis H0: θ = θ0, then the alternative hypothesis,
which is complementary to H0, can be any one of the
following:
(i) H1: θ≠θ0 (ie) θ>θ0 or θ<θ0
(ii) H1: θ>θ0
(iii) H1: θ<θ0
H1 given in (i) is called a two tailed alternative
hypothesis
H1 given in (ii) is called a right tailed alternative
hypothesis.
H1 given in (iii) is called a left tailed alternative
hypothesis.
10
The critical region (or rejection region) is the set of
all values of the test statistic that cause us to reject
the null hypothesis.
The region of the sample space S which amounts to the
acceptance of H0 is called the acceptance region.
The value of the test statistic which separates the
critical region from the acceptance region is called
the critical value or significant value.
The value of the test statistic z for which the
critical region and acceptance region are separated
is called the critical value or the significant value of
z and denoted by zα, where α is the level of
significance.
11
A confidence interval estimate of a
population parameter contains the likely
values of that parameter. If a confidence
interval does not include a claimed value of a
population parameter, reject that claim.
12
13
(i) Null hypothesis H0 is defined.
(ii) Alternative hypothesis H1 is also defined (two
tailed or one tailed).
(iii) Level of significance α is fixed, or taken from
the problem if specified, and the critical value $z_\alpha$ is noted.
(iv) The test statistic z is computed.
(v) If $|z| < z_\alpha$, the null hypothesis is accepted and H1 is
rejected, (ie) the difference is not significant at the
α% level. Otherwise the null hypothesis is rejected, and we
conclude that there is a significant difference.
14
Critical values of z by nature of test and LOS
Nature of test    1% (0.01)    2% (0.02)    5% (0.05)    10% (0.1)
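The critical values for each nature of test and LOS can be computed from the standard normal distribution. The following is a minimal sketch using Python's standard library; the helper name `critical_values` is illustrative and not part of the slides.

```python
from statistics import NormalDist


def critical_values(alpha):
    """Return (two_tailed, right_tailed, left_tailed) critical z values for LOS alpha."""
    std = NormalDist()  # standard normal: mean 0, SD 1
    two_tailed = std.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
    right_tailed = std.inv_cdf(1 - alpha)    # all of alpha in the right tail
    return two_tailed, right_tailed, -right_tailed


# Print one row per LOS used in the table header.
for alpha in (0.01, 0.02, 0.05, 0.10):
    two, right, left = critical_values(alpha)
    print(f"LOS {alpha:.2f}: |z| = {two:.3f}, right = {right:.3f}, left = {left:.3f}")
```

Rounded to two or three decimals, these are the values usually quoted in z tables, for example 1.96 for a two tailed test at 5% LOS.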
15
The power of a hypothesis test is defined as
the probability that the test will reject the null
hypothesis when the treatment does have an
effect.
The power of a test depends on a variety of
factors including the size of the treatment
effect and the size of the sample.
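To illustrate how power depends on the effect size and the sample size, here is a minimal sketch for a right tailed z test with known σ. It assumes the true mean exceeds the hypothesised mean by `effect`, so that the power is $1 - \Phi(z_\alpha - \text{effect}\cdot\sqrt{n}/\sigma)$; the function name and the example numbers are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def power_right_tailed_z(effect, sigma, n, alpha=0.05):
    """Power of a right tailed z test when the true mean is mu0 + effect."""
    std = NormalDist()
    z_alpha = std.inv_cdf(1 - alpha)   # critical value of the test
    shift = effect * sqrt(n) / sigma   # mean of z under the alternative
    return 1 - std.cdf(z_alpha - shift)


# Power grows with the sample size for a fixed effect (illustrative values).
for n in (10, 50, 200):
    print(n, round(power_right_tailed_z(effect=0.5, sigma=2.0, n=n), 3))
```

With `effect=0` the function returns α itself, which is the probability of a Type I error.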
17
18
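95% confidence limits for μ are given by
$\bar{x} - 1.96\,\dfrac{\sigma}{\sqrt{n}}$ and $\bar{x} + 1.96\,\dfrac{\sigma}{\sqrt{n}}$
When σ is not given, 95% confidence limits are
given by
$\bar{x} - 1.96\,\dfrac{s}{\sqrt{n}}$ and $\bar{x} + 1.96\,\dfrac{s}{\sqrt{n}}$
where s is the sample standard deviation.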
19
The mean breaking strength of the cables
supplied by a manufacturer is 1800 with a SD
of 100. By a new technique in the
manufacturing process, it is claimed that the
breaking strength of the cable has increased.
To test this claim, a sample of 50 cables is
tested and it is found that the mean breaking
strength is 1850. Can we support the claim at
1% LOS?
20
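$\bar{x} = 1850$, $n = 50$, $\mu = 1800$ and $\sigma = 100$
$H_0: \bar{x} = \mu$
$H_1: \bar{x} > \mu$
One tailed (right tailed) test is to be used.
Let LOS be 1%. Therefore, $z_\alpha = 2.33$
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{1850 - 1800}{100/\sqrt{50}} = 3.54$
$|z| > z_\alpha$
Therefore, the difference between $\bar{x}$ and $\mu$ is significant at 1% level.
$H_0$ is rejected and $H_1$ is accepted (ie) the sample supports the claim of an increase in breaking strength.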
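The same calculation can be checked numerically. This is a minimal sketch, with the helper name `one_sample_z` chosen for illustration and the critical value 2.33 taken from the slide.

```python
from math import sqrt


def one_sample_z(sample_mean, pop_mean, sigma, n):
    """z statistic for a single large-sample mean with known population SD."""
    return (sample_mean - pop_mean) / (sigma / sqrt(n))


z = one_sample_z(sample_mean=1850, pop_mean=1800, sigma=100, n=50)
z_alpha = 2.33        # right tailed critical value at 1% LOS (from the slide)
reject = z > z_alpha  # right tailed rule: reject H0 when z exceeds z_alpha
print(round(z, 2), reject)
```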
21
A sample of 100 students is taken from a
large population. The mean height of the
students in this sample is 160 cm. Can it be
reasonably regarded that, in the population,
the mean height is 165 cm, and the SD is 10
cm?
22
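$\bar{x} = 160$, $n = 100$, $\mu = 165$ and $\sigma = 10$
$H_0: \bar{x} = \mu$
$H_1: \bar{x} \ne \mu$
Two tailed test is to be used.
Let LOS be 1%. Therefore, $z_\alpha = 2.58$
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{160 - 165}{10/\sqrt{100}} = -5$
$|z| > z_\alpha$
The difference is significant at 1% level (ie) $H_0$ is rejected.
It is not statistically correct to assume that $\mu = 165$.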
23
The mean value of a random sample of 60
items was found to be 145, with a SD of 40.
Find the 95% confidence limits for the
population mean. What size of the sample is
required to estimate the population mean
within 5 of its actual value with 95% or more
confidence, using the sample mean?
24
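Since σ is not given, we use the formula
$\bar{x} - 1.96\,\dfrac{s}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\,\dfrac{s}{\sqrt{n}}$
(ie) $145 - \dfrac{1.96 \times 40}{\sqrt{60}} \le \mu \le 145 + \dfrac{1.96 \times 40}{\sqrt{60}}$
(ie) $134.9 \le \mu \le 155.1$
We have to find the value of n such that
$P\{\bar{x} - 5 \le \mu \le \bar{x} + 5\} \ge 0.95$
$P\{-5 \le \mu - \bar{x} \le 5\} \ge 0.95$
$P(|\mu - \bar{x}| \le 5) \ge 0.95$
(ie) $P(|\bar{x} - \mu| \le 5) \ge 0.95$
$P\left(\dfrac{|\bar{x} - \mu|}{\sigma/\sqrt{n}} \le \dfrac{5}{\sigma/\sqrt{n}}\right) \ge 0.95$
$P\left(|z| \le \dfrac{5\sqrt{n}}{\sigma}\right) \ge 0.95$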
25
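where z is the standard normal variate.
Since $P(|z| \le 1.96) = 0.95$, we need $\dfrac{5\sqrt{n}}{\sigma} \ge 1.96$
Taking σ ≈ s = 40, $\sqrt{n} \ge 15.68$, so $n \ge 245.86$
(ie) the sample size should be at least 246.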
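Both parts of this example can be reproduced with a few lines of code. The sketch below assumes, as the slide does, that the sample SD s = 40 stands in for the unknown σ; the function names are illustrative.

```python
from math import ceil, sqrt


def ci95(mean, s, n, z=1.96):
    """95% confidence limits for the population mean (large sample)."""
    half_width = z * s / sqrt(n)
    return mean - half_width, mean + half_width


def min_sample_size(s, margin, z=1.96):
    """Smallest n for which z * s / sqrt(n) does not exceed the margin."""
    return ceil((z * s / margin) ** 2)


low, high = ci95(mean=145, s=40, n=60)
n_min = min_sample_size(s=40, margin=5)
print(round(low, 1), round(high, 1), n_min)
```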
26
A normal population has a mean of 6.8 and
SD of 1.5. A sample of 400 members gave a
mean of 6.75. Is the difference significant?
27
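$\bar{x} = 6.75$, $n = 400$, $\mu = 6.8$ and $\sigma = 1.5$
$H_0: \bar{x} = \mu$
$H_1: \bar{x} \ne \mu$
Two tailed test is to be used.
Let LOS be 5%. Therefore, $z_\alpha = 1.96$
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{6.75 - 6.8}{1.5/\sqrt{400}} = -0.67$
$|z| < z_\alpha$
The difference is not significant at 5% level (ie) $H_0$ is accepted.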
28
The mean weight obtained from a random
sample of size 100 is 64 gms. The SD of the
weight distribution of the population is 3
gms. Test the statement that the mean
weight of the population is 67 gms at 5%
level of significance. Also set up 99%
confidence limits of the mean weight of the
population.
29
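$n = 100$, $\mu = 67$, $\bar{x} = 64$, $\sigma = 3$
$H_0: \bar{x} = \mu$ or $\mu = 67$
$H_1: \bar{x} \ne \mu$ or $\mu \ne 67$
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{64 - 67}{3/\sqrt{100}} = -10$
$|z| > 1.96$
$H_0$ is rejected.
99% confidence limits for $\mu$ are $\bar{x} \pm 2.58\,\dfrac{\sigma}{\sqrt{n}} = 64 \pm 0.774$, (ie) $63.23 \le \mu \le 64.77$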
30
Suppose the manufacturer claims that the mean lifetime
of a light bulb is more than 10,000 hours. In a sample
of 30 light bulbs, it was found that they only last
9,900 hours on average. Assume the population
standard deviation is 120 hours. At 0.05 significance
level, can we reject the claim by the manufacturer?
Solution:
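A minimal sketch of how this test could be carried out, assuming the null hypothesis is the manufacturer's claim that μ ≥ 10000 and a lower tailed test is used (the variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

# Values from the problem statement.
mu0, xbar, sigma, n, alpha = 10000, 9900, 120, 30, 0.05

z = (xbar - mu0) / (sigma / sqrt(n))       # test statistic
z_crit = -NormalDist().inv_cdf(1 - alpha)  # lower tail critical value
reject = z < z_crit                        # reject H0 when z falls below it

print(round(z, 4), round(z_crit, 4), reject)
```

Under these assumptions the test statistic falls well below the critical value, so the claim that the mean lifetime exceeds 10,000 hours would be rejected at the 0.05 level.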
33
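Let $\bar{X}_1$ and $\bar{X}_2$ be the means of two large samples of sizes $n_1$ and $n_2$
with standard deviations $\sigma_1$ and $\sigma_2$.
The statistic z is given by
$z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
If $|z| \le z_\alpha$, the difference between $\bar{X}_1$ and $\bar{X}_2$ is not significant at α% LOS.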
34
If the samples are drawn from the same
population (ie) σ1 = σ2 = σ, then
$z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
35
If σ1, σ2 are equal and not known, then
σ1 = σ2 = σ is found using
$\sigma^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}$
In such cases
$z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_2} + \dfrac{s_2^2}{n_1}}}$
36
In a random sample of size 500, the mean is
found to be 20. In another independent
sample of size 400, the mean is 15. Could
the samples have been drawn from the same
population with SD 4?
37
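$\bar{x}_1 = 20$, $n_1 = 500$, $\bar{x}_2 = 15$, $n_2 = 400$, $\sigma = 4$
$H_0: \bar{x}_1 = \bar{x}_2$
$H_1: \bar{x}_1 \ne \bar{x}_2$
Two tailed test is to be used.
Let LOS be 1%. $z_\alpha = 2.58$
$z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{20 - 15}{4\sqrt{\dfrac{1}{500} + \dfrac{1}{400}}} = 18.6$
$|z| > z_\alpha$
Therefore, the difference between $\bar{x}_1$ and $\bar{x}_2$ is significant at 1% level.
$H_0$ is rejected.
Samples are not drawn from the same population.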
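As a check on the arithmetic, here is a minimal sketch of the two-sample statistic for a common, known SD; the function name is illustrative.

```python
from math import sqrt


def two_sample_z_common_sd(mean1, n1, mean2, n2, sigma):
    """z statistic for two large-sample means drawn from a population with SD sigma."""
    return (mean1 - mean2) / (sigma * sqrt(1 / n1 + 1 / n2))


z = two_sample_z_common_sd(mean1=20, n1=500, mean2=15, n2=400, sigma=4)
significant = abs(z) > 2.58  # two tailed critical value at 1% LOS
print(round(z, 1), significant)
```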
38
A simple sample of heights of 6400 English
men has a mean of 170 cm and a SD 6.4 cm,
while a simple sample of heights of 1600
Americans has a mean of 172 cm and a SD of
6.3 cm. Do the data indicate that Americans
are, on the average taller than the
Englishmen?
39
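$n_1 = 6400$, $\bar{x}_1 = 170$, $s_1 = 6.4$, $n_2 = 1600$, $\bar{x}_2 = 172$, $s_2 = 6.3$
$H_0: \mu_1 = \mu_2$ or $\bar{x}_1 = \bar{x}_2$
$H_1: \mu_1 < \mu_2$ or $\bar{x}_1 < \bar{x}_2$
Left tail test
LOS 1%
$z_\alpha = -2.33$
$z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \approx \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{170 - 172}{\sqrt{\dfrac{6.4^2}{6400} + \dfrac{6.3^2}{1600}}} = -11.32$
$|z| > |z_\alpha|$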
40
The difference is significant at 1% level
H0 is rejected and H1 is accepted
(ie) Americans are, on an average, taller than
the Englishmen
41
Test the significance of the difference between
the means of the samples, drawn from two
normal populations with the same SD using
the following data:
Size Mean SD
Sample 1 100 61 4
Sample 2 200 63 6
42
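$H_0: \bar{x}_1 = \bar{x}_2$ or $\mu_1 = \mu_2$
$H_1: \bar{x}_1 \ne \bar{x}_2$ or $\mu_1 \ne \mu_2$
Two tailed test is to be used.
LOS 1%
$z_\alpha = 2.58$
$z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_2} + \dfrac{s_2^2}{n_1}}} = \dfrac{61 - 63}{\sqrt{\dfrac{4^2}{200} + \dfrac{6^2}{100}}} = -3.02$
$|z| > z_\alpha$
The difference is significant at 1% level.
The two normal populations, from which the samples are drawn, may
not have the same mean, though they have the same SD.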
43
The average marks scored by 32 boys is 72
with a SD of 8, while that for 36 girls is 70
with a SD of 6. Test at 1% LOS whether the
boys perform better than girls.
44
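$H_0: \bar{x}_1 = \bar{x}_2$ or $\mu_1 = \mu_2$
$H_1: \bar{x}_1 > \bar{x}_2$ or $\mu_1 > \mu_2$
Right tailed test is to be used.
LOS 1%
$z_\alpha = 2.33$
$z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{72 - 70}{\sqrt{\dfrac{8^2}{32} + \dfrac{6^2}{36}}} = 1.15$
$|z| < z_\alpha$
The difference is not significant at 1% level.
We cannot conclude that boys perform better than girls.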
45
The average income of persons was Rs.210
with a SD of Rs.10 in a sample of 100 people of
a city. For another sample of 150 persons,
the average income was Rs. 220 with SD of
Rs.12. The SD of incomes of the people of the
city was Rs.11. Test whether there is any
significant difference between the average
incomes of the localities.
46
For sample 1: $n_1 = 1000$, $\sum x = 49000$, $\sum (x - \bar{x})^2 = 784000$
For sample 2: $n_2 = 1500$, $\sum x = 70500$, $\sum (x - \bar{x})^2 = 2400000$
Discuss the significance of the difference of
the sample means.
47
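Let X denote the number of successes in n independent
Bernoulli trials.
Let P denote the probability of success in each trial, and Q = 1 - P.
X follows the binomial distribution with mean nP and variance nPQ.
When n is large,
X follows the normal distribution $N(nP,\ nPQ)$
$X/n$ follows the normal distribution $N\left(\dfrac{nP}{n},\ \dfrac{nPQ}{n^2}\right)$
(ie) $N\left(P,\ \dfrac{PQ}{n}\right)$
Statistic is $z = \dfrac{p - P}{\sqrt{\dfrac{PQ}{n}}}$, where $p = X/n$ is the sample proportion.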
48
1. If P is not known, we assume that p is
nearly equal to P, then
$z = \dfrac{p - P}{\sqrt{\dfrac{pq}{n}}}$
50
$H_0: p = P$
$H_1: p < P$
One tailed test
Let LOS be 1%
$z_\alpha = -2.33$
$P = 0.1726$, $Q = 0.8274$
$z = \dfrac{0.0984 - 0.1726}{\sqrt{(0.1726 \times 0.8274)/640}} = -4.96$
$|z| > |z_\alpha|$
$H_0$ is rejected
51
Experience has shown that 20% of a
manufactured product is of top quality. In one
day’s production of 400 articles, only 50 are
of top quality. Show that either the
production of the day chosen was not a
representative sample or the hypothesis of
20% was wrong. Based on the particular day’s
production, find also the 95% confidence
limits for the percentage of top quality
product.
52
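$H_0: P = 1/5$
$H_1: P \ne 1/5$
p = proportion of top quality products = 50/400 = 1/8
$z = \dfrac{1/8 - 1/5}{\sqrt{(1/5) \times (4/5) \times (1/400)}} = -3.75$
$|z| = 3.75 > 1.96$, so $H_0$ is rejected at 5% LOS.
95% confidence limits for P are given by
$p - 1.96\sqrt{\dfrac{pq}{n}} \le P \le p + 1.96\sqrt{\dfrac{pq}{n}}$
$0.093 \le P \le 0.157$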
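The test statistic and the confidence limits for this example can be verified with a short sketch. The function names are illustrative; the confidence limits use the sample proportion p in the standard error, as on the slide.

```python
from math import sqrt


def proportion_z(p, P, n):
    """z statistic for a sample proportion p against a hypothesised proportion P."""
    return (p - P) / sqrt(P * (1 - P) / n)


def proportion_ci95(p, n, z=1.96):
    """95% confidence limits for the population proportion."""
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


p = 50 / 400  # observed proportion of top quality articles
z = proportion_z(p, P=0.20, n=400)
low, high = proportion_ci95(p, n=400)
print(round(z, 2), round(low, 3), round(high, 3))
```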
53
A salesman in a departmental store claims
that at most 60% of the shoppers entering the
store leave without making a purchase. A
random sample of 50 shoppers showed that
35 of them left without making a purchase.
Are these sample results consistent with the
claim of the salesman? Use an LOS of 0.05.
54
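$H_0: p = P$
$H_1: p > P$
$p = 35/50 = 0.7$, $P = 0.6$
One tailed (right tailed) test; at 5% LOS, $z_\alpha = 1.645$
$z = \dfrac{p - P}{\sqrt{PQ/n}} = \dfrac{0.7 - 0.6}{\sqrt{(0.6 \times 0.4)/50}} = 1.443$
$|z| < z_\alpha$
$H_0$ is accepted.
Sample results are consistent with the claim of the salesman.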
55
Suppose 60% of citizens voted in last election. 85 out of
148 people in a telephone survey said that they voted in
current election. At 0.05 significance level, can we reject
the null hypothesis that the proportion of voters in the
population is above 60% this year?
Interpretation :
The test statistic -0.6376 is not less than the critical
value of -1.6449. Hence, at 0.05 significance level, we do
not reject the null hypothesis that the proportion of
voters in the population is above 60% this year.
Since the p-value is greater than the 0.05 significance level, we do
not reject the null hypothesis that the proportion of
voters in the population is above 60% this year.
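The figures quoted in this interpretation can be reproduced as follows. This is a minimal sketch of a lower tailed proportion test, with the p-value taken as the area to the left of the test statistic; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values from the problem statement.
voted, n, P0, alpha = 85, 148, 0.60, 0.05

p_hat = voted / n
z = (p_hat - P0) / sqrt(P0 * (1 - P0) / n)  # test statistic
z_crit = NormalDist().inv_cdf(alpha)        # lower tail critical value
p_value = NormalDist().cdf(z)               # area to the left of z

print(round(z, 4), round(z_crit, 4), p_value > alpha)
```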
56
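Let p1, p2 be the proportions of successes in two
large samples of sizes n1 and n2 from the same
population or from two populations with the
same proportion P.
$p_1$ follows $N\left(P,\ \dfrac{PQ}{n_1}\right)$
$p_2$ follows $N\left(P,\ \dfrac{PQ}{n_2}\right)$
Statistic $z = \dfrac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
If P is not known, $P = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$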
57
In a large city A, 20% of a random sample of
900 school boys had a slight physical defect.
In another large city B, 18.5% of a
random sample of 1600 school boys had the
same defect. Is the difference between the
proportions significant?
58
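$p_1 = 0.2$, $p_2 = 0.185$, $n_1 = 900$, $n_2 = 1600$
$H_0: p_1 = p_2$
$H_1: p_1 \ne p_2$
Two tailed test is to be used.
LOS be 5%, $z_\alpha = 1.96$
$z = \dfrac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
$P = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = 0.1904$
$z = 0.92$
$|z| < z_\alpha$
The difference between $p_1$ and $p_2$ is not significant at 5% level.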
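A minimal sketch of the pooled two-proportion statistic used here; the function name is illustrative, and the pooled estimate P is computed from the two samples as on the slide.

```python
from math import sqrt


def two_proportion_z(p1, n1, p2, n2):
    """Return (z, pooled P) for the difference between two sample proportions."""
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)  # estimate of the common P
    std_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_error, pooled


z, P = two_proportion_z(p1=0.20, n1=900, p2=0.185, n2=1600)
print(round(P, 4), round(z, 2), abs(z) < 1.96)
```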
59
Before an increase in excise duty on tea, 800
people out of a sample of 1000 were
consumers of tea. After the increase in duty,
800 people were consumers of tea in a
sample of 1200 persons. Find whether there
is a significant decrease in the consumption
of tea after the increase in duty.
60
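p1, p2 denote proportions of the consumers before and after the
increase in duty respectively
$p_1 = 800/1000 = 4/5$, $p_2 = 800/1200 = 2/3$
$H_0: p_1 = p_2$
$H_1: p_1 > p_2$
One tailed (right tailed) test
LOS be 1%, $z_\alpha = 2.33$
$z = \dfrac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
$P = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = 0.7273$
$z = 6.99$
$|z| > z_\alpha$ (ie) the difference between $p_1$ and $p_2$ is significant at 1%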
61
15.5% of a random sample of 1600
undergraduates were smokers, whereas 20%
of a random sample of 900 postgraduates
were smokers in a state. Can we conclude
that fewer undergraduates than postgraduates
are smokers?
62
p1, p2 denote proportions of the undergraduate and postgraduate smokers
$p_1 = 0.155$, $p_2 = 0.2$, $n_1 = 1600$, $n_2 = 900$
$H_0: p_1 = p_2$
$H_1: p_1 < p_2$
One tailed (left tailed) test
LOS be 5%, $z_\alpha = -1.645$
$z = \dfrac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
$P = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = 0.1712$
$z = -2.87$
$|z| > |z_\alpha|$ (ie) the difference between $p_1$ and $p_2$ is significant at 5%
The habit of smoking is less among undergraduates than postgraduates.
63
Suppose that 12% of apples harvested in an
orchard last year were rotten. 30 out of 214
apples in a harvest sample this year turn out
to be rotten. At 0.05 significance level, can we
reject the null hypothesis that the proportion
of rotten apples in harvest stays below 12%
this year?
The null hypothesis is that p ≤ 0.12.
Alternative hypothesis p>0.12
64
Interpretation:-
The test statistic 0.90875 is not greater than
the critical value of 1.6449. Hence, at 0.05
significance level, we do not reject the null
hypothesis that the proportion of rotten
apples in harvest stays below 12% this year.
Since the p-value is greater than the 0.05 significance
level, we do not reject the null hypothesis
that the proportion of rotten apples in
harvest stays below 12% this year.
65
Suppose a coin toss turns up 12 heads out of
30 trials. At 0.05 significance level, can one
reject the null hypothesis that the coin toss is
fair?
The null hypothesis is that p = 0.5.
Interpretation:
The test statistic -1.095445 lies between the
critical values -1.9600 and 1.9600. Hence, at
0.05 significance level, we do not reject the
null hypothesis that the coin toss is fair.
Since the p-value is greater than the 0.05 significance
level, we do not reject the null hypothesis
that the coin toss is fair.
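For a two tailed test such as this one, the p-value is twice the tail area beyond the observed statistic. A minimal sketch, with illustrative variable names:

```python
from math import sqrt
from statistics import NormalDist

# Values from the problem statement.
heads, n, P0, alpha = 12, 30, 0.5, 0.05

z = (heads / n - P0) / sqrt(P0 * (1 - P0) / n)  # test statistic
p_value = 2 * NormalDist().cdf(-abs(z))         # both tails
reject = p_value < alpha

print(round(z, 6), reject)
```

The statistic is negative because the observed proportion of heads (0.4) is below 0.5; its magnitude lies inside the critical values ±1.96.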
66
STUDENT’S t-DISTRIBUTION
67
The probability density function is symmetric,
and its overall shape resembles the bell shape
of a normally distributed variable with mean
0 and variance 1, except that it is a bit lower
and wider. As the number of degrees of
freedom grows, the t-distribution approaches
the normal distribution with mean 0 and
variance 1.
68
69
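Statistic is given by
$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n - 1}}$, with $n - 1$ degrees of freedom
or $t = \dfrac{\bar{x} - \mu}{S/\sqrt{n}}$, where $S^2 = \dfrac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
95% confidence limits: $\bar{x} - t_{0.05}\,\dfrac{s}{\sqrt{n - 1}} \le \mu \le \bar{x} + t_{0.05}\,\dfrac{s}{\sqrt{n - 1}}$, where $t_{0.05}$ is the 5% critical
value of t for $n - 1$ degrees of freedom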
70
A machinist is expected to make engine parts
with axle diameter of 1.75 cm. A random
sample of 10 parts shows a mean diameter
1.85 cm with an SD of 0.1cm. On the basis of
this sample, would you say that the work of
the machinist is inferior?
71
$\bar{x} = 1.85$, $s = 0.1$, $n = 10$ and $\mu = 1.75$
$H_0: \bar{x} = \mu$
$H_1: \bar{x} \ne \mu$
Two tailed test is to be used. LOS be 5%
$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n - 1}} = \dfrac{1.85 - 1.75}{0.1/\sqrt{9}} = 3$, $\nu = n - 1 = 9$
Table t values: $t_{0.05} = 2.26$, $t_{0.01} = 3.25$
$|t| > t_{0.05}$ and $|t| < t_{0.01}$
$H_0$ is rejected and $H_1$ is accepted at 5% level but
$H_1$ is rejected and $H_0$ is accepted at 1% level
At 5% work of machinist is inferior
At 1% work of machinist is not inferior
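The statistic can be recomputed with a small helper. This sketch uses the slide's convention of dividing s by $\sqrt{n - 1}$, where s is the sample SD computed with divisor n, and takes the critical values 2.26 and 3.25 from the t table quoted above; the function name is illustrative.

```python
from math import sqrt


def one_sample_t(sample_mean, pop_mean, s, n):
    """t statistic with n - 1 degrees of freedom; s uses divisor n."""
    return (sample_mean - pop_mean) / (s / sqrt(n - 1))


t = one_sample_t(sample_mean=1.85, pop_mean=1.75, s=0.1, n=10)
t_05, t_01 = 2.26, 3.25  # two tailed table values for 9 degrees of freedom

print(round(t, 2), abs(t) > t_05, abs(t) > t_01)
```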
72
A certain injection administered to each of 12
patients resulted in the following increases of
blood pressure
5,2,8,-1,3,0,6,-2,1,5,0,4
Can it be concluded that the injection will be,
in general, accompanied by an increase in BP?
73
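Mean $\bar{x} = \dfrac{\sum x}{n} = 31/12 = 2.58$
$s^2 = \dfrac{\sum x^2}{n} - \left(\dfrac{\sum x}{n}\right)^2 = 8.76$
$s = 2.96$
$H_0: \bar{x} = \mu$, where $\mu = 0$
$H_1: \bar{x} > \mu$
Right tailed test, LOS 5%, $\nu = 11$
$t_{0.05} = 1.8$
$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n - 1}} = \dfrac{2.58 - 0}{2.96/\sqrt{11}} = 2.89$
$|t| > t_{0.05}$. Hence $H_0$ is rejected and $H_1$ is accepted.
Injection is accompanied by an increase in BP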
74
Suppose now that the manager of the team,
given the results obtained, fired the coach
who had not made any improvement and hired
another, more promising one. We report the
times of the athletes after the second training:
Before training:
12.9 13.5 12.8 15.6 17.2 19.2 12.6 15.3 14.4 11.3
After the second training:
12.0 12.2 11.2 13.0 15.0 15.8 12.2 13.4 12.9 11.0
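Statistic
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
If σ is not known, $\sigma^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$
Degrees of freedom $= n_1 + n_2 - 2$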
76
Two independent samples of sizes 8 and 7
contained the following values:
Sample 1 19 17 15 21 16 18 16 14
Sample 2 15 14 15 19 15 18 16
77
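$\bar{x}_1 = 17$, $\bar{x}_2 = 16$, $s_1 = 2.12$, $s_2 = 1.69$
$H_0: \bar{x}_1 = \bar{x}_2$
$H_1: \bar{x}_1 \ne \bar{x}_2$
Two tailed. LOS 5%
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
σ is not known, $\sigma^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$
$t = 0.93$, $\nu = n_1 + n_2 - 2 = 13$
$t_{0.05} = 2.16$
$|t| < t_{0.05}$, $H_0$ is accepted and $H_1$ is rejected
Two sample means do not differ significantly at 5% LOS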
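Starting from the raw sample values, the pooled statistic can be computed as in the following sketch. The function name is illustrative; note that the sum of squared deviations of each sample equals $n s^2$ in the notation of the slides.

```python
from math import sqrt


def pooled_t(sample1, sample2):
    """Return (t, degrees of freedom) for two small independent samples."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = sum(sample1) / n1, sum(sample2) / n2
    ss1 = sum((x - mean1) ** 2 for x in sample1)  # equals n1 * s1^2
    ss2 = sum((x - mean2) ** 2 for x in sample2)  # equals n2 * s2^2
    dof = n1 + n2 - 2
    sigma = sqrt((ss1 + ss2) / dof)               # pooled estimate of sigma
    t = (mean1 - mean2) / (sigma * sqrt(1 / n1 + 1 / n2))
    return t, dof


t, dof = pooled_t([19, 17, 15, 21, 16, 18, 16, 14], [15, 14, 15, 19, 15, 18, 16])
print(round(t, 2), dof, abs(t) < 2.16)  # 2.16 is the table value quoted above
```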
78
The mean height and the SD of the heights of 8
randomly chosen soldiers are 166.9 and 8.29
cm respectively. The corresponding values for 6
randomly chosen sailors are 170.3 and 8.5
cm respectively. Based on this data, can we
conclude that soldiers are, in general, shorter
than sailors?
79
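$\bar{x}_1 = 166.9$, $\bar{x}_2 = 170.3$, $s_1 = 8.29$, $s_2 = 8.50$, $n_1 = 8$, $n_2 = 6$
$H_0: \bar{x}_1 = \bar{x}_2$
$H_1: \bar{x}_1 < \bar{x}_2$
One tailed. LOS 5%
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
σ is not known, $\sigma^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$
$t = -0.695$, $\nu = n_1 + n_2 - 2 = 12$
$t_{0.05} = 1.78$
$|t| < t_{0.05}$, $H_0$ is accepted and $H_1$ is rejected
We cannot conclude that soldiers are, in general, shorter than sailors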
80
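$F = \dfrac{\sigma_1^2}{\sigma_2^2}$ with $(\nu_1, \nu_2)$ degrees of freedom
$\sigma_1^2 = \dfrac{n_1 s_1^2}{n_1 - 1}$ with $\nu_1 = n_1 - 1$ degrees of freedom,
$\sigma_2^2 = \dfrac{n_2 s_2^2}{n_2 - 1}$ with $\nu_2 = n_2 - 1$ degrees of freedom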
81
Two samples of sizes 9 and 8 gave the sums
of squares of deviations from their respective
means equal to 160 and 91 respectively. Can they
be regarded as drawn from the same normal
population?
82
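$n_1 = 9$, $\sum (x_i - \bar{x})^2 = 160$ (ie) $n_1 s_1^2 = 160$
$n_2 = 8$, $\sum (y_i - \bar{y})^2 = 91$ (ie) $n_2 s_2^2 = 91$
$\sigma_1^2 = 160/8 = 20$, $\sigma_2^2 = 91/7 = 13$
$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \ne \sigma_2^2$
LOS 5%
$F = \dfrac{\sigma_1^2}{\sigma_2^2} = \dfrac{20}{13} = 1.54$
$F_{0.05}(\nu_1 = 8, \nu_2 = 7) = 3.73$
$F < F_{0.05}$
$H_0$ is accepted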
83
Two independent samples of 8 and 7 items
respectively had the following values of the
variable
Sample 1 9 11 13 11 15 9 12 14
Sample 2 10 12 10 14 9 8 10
84
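$\sum x_1 = 94$, $\sum x_1^2 = 1138$
$s_1^2 = \dfrac{1}{n_1}\sum x_1^2 - \left(\dfrac{1}{n_1}\sum x_1\right)^2 = 4.19$
$\sum x_2 = 73$, $\sum x_2^2 = 785$
$s_2^2 = \dfrac{1}{n_2}\sum x_2^2 - \left(\dfrac{1}{n_2}\sum x_2\right)^2 = 3.39$
$\sigma_1^2 = \dfrac{n_1 s_1^2}{n_1 - 1} = 4.79$
$\sigma_2^2 = \dfrac{n_2 s_2^2}{n_2 - 1} = 3.96$
$H_0: \sigma_1^2 = \sigma_2^2$ and $H_1: \sigma_1^2 \ne \sigma_2^2$
$F = \dfrac{\sigma_1^2}{\sigma_2^2} = 1.21$
$F_{0.05}(\nu_1 = 7, \nu_2 = 6) = 4.21$
$F < F_{0.05}$, $H_0$ is accepted (ie) $\sigma_1^2$ and $\sigma_2^2$ do not differ significantly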
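The variance ratio for these two samples can be computed directly from the raw values. In this minimal sketch the function names are illustrative, and the larger variance estimate is placed in the numerator so that F is at least 1. Small differences in the last decimal from the hand calculation come from intermediate rounding.

```python
def variance_estimate(sample):
    """Estimate of the population variance: sum of squared deviations / (n - 1)."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / (n - 1)


def f_ratio(sample_a, sample_b):
    """Return (F, v1, v2) with the larger variance estimate in the numerator."""
    var_a, var_b = variance_estimate(sample_a), variance_estimate(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1


F, v1, v2 = f_ratio([9, 11, 13, 11, 15, 9, 12, 14], [10, 12, 10, 14, 9, 8, 10])
print(round(F, 2), v1, v2, F < 4.21)  # 4.21 is the table value quoted above
```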
85
Chi-Square Goodness
Of Fit
86
A statistical method assessing the goodness
of fit between a set of observed values and
those expected theoretically. A chi-squared
test is any statistical hypothesis test in
which the sampling distribution of the test
statistic is a Chi-square distribution when
the null hypothesis is true. The chi-squared
test is used to determine whether there is a
significant difference between the expected
frequencies and the observed frequencies in
one or more categories
87
$\chi^2 = \sum_{i=1}^{n} \dfrac{(O_i - E_i)^2}{E_i}$
$O_i$ = Observed frequency
$E_i$ = Expected frequency
If $\chi^2 < \chi^2_\nu(\alpha)$, null hypothesis is accepted
α = LOS
ν = degrees of freedom
88
1. The no. of observations N in the sample must be
reasonably large, say ≥50
2. Individual frequencies must not be too small
(ie)Oi≥10. In case Oi<10, it is combined with the
neighbouring frequencies, so that the combined
frequency is ≥10
3. The number of classes n must be neither too
small nor too large (ie) 4≤n≤16
89
90
Total no of digits = 10000
If digits occur uniformly, then each digit will
occur 10,000/10 = 1000 times.
Oi 1026 1107 997 966 1075 933 1107 972 964 853
Ei 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000
$\chi^2 = \sum_{i=1}^{n} \dfrac{(O_i - E_i)^2}{E_i} = \dfrac{1}{1000}\left[(26)^2 + (107)^2 + \dots\right] = 58.542$
$\nu = 10 - 1 = 9$
$\chi^2_{0.05}(\nu = 9) = 16.919$
Since $\chi^2 > \chi^2_{0.05}$, H0 is rejected,
(ie) digits do not occur uniformly in the directory
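The chi-square statistic for the digit frequencies can be recomputed in a few lines. This is a minimal sketch; the function name is illustrative and the critical value 16.919 is the table value quoted above.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [1026, 1107, 997, 966, 1075, 933, 1107, 972, 964, 853]
expected = [1000] * 10  # uniform occurrence of the ten digits

chi2 = chi_square_statistic(observed, expected)
dof = len(observed) - 1
print(round(chi2, 3), dof, chi2 > 16.919)
```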
91
The following data give the number of aircraft
accidents that occurred during the
various days of a week.
92
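Oi 15 19 13 12 16 15
Ei 15 15 15 15 15 15
$\chi^2 = \sum_{i=1}^{n} \dfrac{(O_i - E_i)^2}{E_i} = \dfrac{0 + 16 + 4 + 9 + 1 + 0}{15} = 2$
$\nu = 6 - 1 = 5$
$\chi^2_{0.05}(\nu = 5) = 11.07$
Since $\chi^2 < \chi^2_{0.05}$, H0 is accepted
Accidents occur uniformly over the week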
Chi-Square Test
INDEPENDENCE OF
ATTRIBUTES
[Contingency table: attribute A in rows against attribute B in columns B1, B2, ..., Bj, ..., Bn, with row totals]