STATISTICAL ESTIMATIONS
2.1 INTRODUCTION
We now turn to the part of statistics called inferential statistics. Inferential statistics was
defined as the part of statistics that helps us make decisions about some characteristics of a
population based on sample information. In other words, inferential statistics uses sample
results to make decisions and draw conclusions about the population from which the sample is
drawn. Estimation and hypothesis testing, taken together, are usually referred to as inference
making.
2.2 STATISTICAL ESTIMATION
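Population mean, $\mu$:  $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Population variance, $\sigma^2$:  $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$
Population S.D, $\sigma$:  $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$
Population proportion, P:  $\hat{P} = \frac{x}{n}$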
A population parameter can have more than one estimator, but only one best estimator.
Example 2.2: $\mu$ can be estimated by the sample mean $\bar{X}$, the sample median $\tilde{X}$,
or the sample mode $\hat{X}$.
Best Estimator means that the sample statistic should be as close to the true value of the
parameter as possible.
2.2.4. POINT ESTIMATION
Point estimation is a statistical procedure in which we use a single value to estimate a population
parameter.
A point estimate is a single number that is used as an estimate of a population parameter and is
derived from a random sample taken from the population of interest.
Some of the most important point estimators are given below.
Population parameter            Point estimator
Mean, $\mu$                     $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Variance, $\sigma^2$            $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$
Standard deviation, $\sigma$    $S = \sqrt{S^2}$
Proportion, P                   $\hat{P} = \frac{x}{n}$
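The point estimators listed above can be computed directly from a sample. The following is a minimal sketch using Python's standard `statistics` module; the sample values and the counts `x_successes` and `n_trials` are hypothetical and are not taken from the text.

```python
import statistics

# Hypothetical sample of n = 8 measurements (illustrative only, not from the text).
sample = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 11.0]

x_bar = statistics.mean(sample)    # point estimate of the population mean
s2 = statistics.variance(sample)   # sample variance, divisor n - 1
s = statistics.stdev(sample)       # sample standard deviation, the square root of S^2

# Hypothetical counts: x items with the attribute out of n items sampled.
x_successes, n_trials = 45, 150
p_hat = x_successes / n_trials     # point estimate of the population proportion

print(x_bar, s2, s, p_hat)
```

Note that `statistics.variance` and `statistics.stdev` use the divisor n - 1, which matches the estimators in the table; `statistics.pvariance` would divide by n instead.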
2.3.3 INTERVAL ESTIMATION
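and $-z_{\alpha/2}$ denotes the value of Z for which the area to its left is $\alpha/2$.
[Figure: standard normal curve with central area $1-\alpha$ between $-z_{\alpha/2}$ and
$z_{\alpha/2}$, and area $\alpha/2$ in each tail]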
b) $(1-\alpha)100\% = 95\%$
$1 - \alpha = \frac{95}{100} = 0.95$
$\alpha = 1 - 0.95 = 0.05$, so $\alpha/2 = 0.025$
$z_{\alpha/2} = z_{0.025} = 1.96$ (from table)
Thus, a 95% confidence interval for $\mu$ is
$\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 200 \pm (1.96)\frac{50}{\sqrt{25}} = 200 \pm 19.6$,
that is, (180.40 Birr, 219.60 Birr).
Therefore, we are 95 percent confident that the true mean monthly expenditure for food ($\mu$) is
between 180.40 Birr and 219.60 Birr.
Example 2.7: A manufacturer claims that his tyres last 20,000 miles on average. A research
organization tests a random sample of 64 tyres and reports an average mileage of 19,200 with a
standard deviation of 2,000 miles. Does a 99 percent confidence interval for the mean life of all
tyres produced by this manufacturer support the claim?
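The large-sample interval $\bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ can be checked numerically. The sketch below is illustrative only: the function name `z_interval` is an assumption, and the critical value is taken from Python's `statistics.NormalDist` rather than a printed table, so results may differ from hand calculations in the last digits. It reproduces the food-expenditure interval and then applies the same formula to Example 2.7, using the sample standard deviation in place of $\sigma$ because n = 64 is large.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sd, n, confidence):
    """Large-sample (z) confidence interval for a population mean."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    margin = z * sd / sqrt(n)
    return x_bar - margin, x_bar + margin

# Food-expenditure example: x_bar = 200, sigma = 50, n = 25, 95% confidence.
food_lo, food_hi = z_interval(200, 50, 25, 0.95)

# Example 2.7: n = 64, x_bar = 19,200, S = 2,000, 99% confidence.
tyre_lo, tyre_hi = z_interval(19200, 2000, 64, 0.99)
claim_supported = tyre_lo <= 20000 <= tyre_hi

print(round(food_lo, 2), round(food_hi, 2))
print(round(tyre_lo), round(tyre_hi), claim_supported)
```

Under these assumptions the interval for Example 2.7 runs from roughly 18,556 to 19,844 miles. Since 20,000 lies outside it, the interval does not support the manufacturer's claim.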
t-distribution
[Figure: t-distribution curve centered at $\mu = 0$]
Degrees of freedom can be defined as the number of values we can choose freely.
Suppose we are dealing with a sample of size n = 6 and we know that the mean of these 6
numbers is 4. Symbolically, we have
$\frac{a+b+c+d+e+f}{6} = 4$
Now, we are free to assign any value to a, b, c, d and e, say a = 2, b = 4, c = 8, d = 4 and e = 2.
But we are no longer free to assign a value to f, since
$\frac{a+b+c+d+e+f}{6} = 4 \Rightarrow \frac{20+f}{6} = 4 \Rightarrow 20 + f = 24 \Rightarrow f = 4$,
that is, in order for the mean of these 6 numbers to be equal to 4, f must be equal to 4. If we
assign another number to f, then the mean will not be equal to 4. Thus, we are free to choose only
5 values and the 6th is determined automatically. Hence, the degrees of freedom is df = 6 - 1 = 5.
Similarly, a sample of size n = 25 would give us 24 degrees of freedom.
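The idea that the last value is forced once the mean and the other n - 1 values are fixed can be illustrated with a few lines of code. This is a small sketch with a hypothetical helper `last_value` and a different, made-up set of numbers (n = 5, mean 10).

```python
def last_value(free_values, n, mean):
    """Return the one value that is forced once n - 1 values and the mean are fixed."""
    if len(free_values) != n - 1:
        raise ValueError("exactly n - 1 values can be chosen freely")
    # The n values must sum to n * mean, so the last one is whatever is left over.
    return n * mean - sum(free_values)

# Hypothetical numbers: four freely chosen values, sample size 5, required mean 10.
forced = last_value([8, 12, 9, 11], n=5, mean=10)
degrees_of_freedom = 5 - 1

print(forced, degrees_of_freedom)
```

Whatever four values are chosen, the fifth is determined by them, which is why the sample carries n - 1 = 4 degrees of freedom.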
The values of t for different degrees of freedom and different values of $\alpha$ are tabulated.
$t_{\alpha}(n-1)$ denotes the value of t for which the area under the curve to its right is equal
to $\alpha$ with (n-1) degrees of freedom.
Example 8.8: Find
a) $t_{0.025}(19)$
b) $t_{0.005}(25)$
Solution:
a) From the t-distribution table, $t_{0.025}(19) = 2.093$ (area to the right = 0.025).
b) From the t-distribution table, $t_{0.005}(25) = 2.787$ (area to the right = 0.005).
Under such a situation, a $(1-\alpha)100\%$ confidence interval for the population mean $\mu$ is
given by
$\bar{X} \pm t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}}$
Example 8.9: A company has been concerned about the length of time it takes to deliver its
products to customers. It felt that it took about three weeks on average to deliver its products
after receiving an order. If a random sample of 25 orders averaged 3.4 weeks with a standard
deviation of 0.8 weeks, would a 95 percent confidence interval for the average delivery time of
all orders confirm the estimate of three weeks? Assume delivery times are normally distributed.
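Solution: Given n = 25, $\bar{X}$ = 3.4 weeks, S = 0.8 weeks.
Since the population is normally distributed, $\sigma$ is not known and n < 30, we have to use the
t-distribution with n - 1 = 25 - 1 = 24 degrees of freedom.
$(1-\alpha)100\% = 95\%$, so $\alpha = 0.05$
$t_{\alpha/2}(n-1) = t_{0.025}(24) = 2.064$ (from table)
Thus, a 95% confidence interval for $\mu$ is
$\bar{X} \pm t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}} = 3.4 \pm (2.064)\frac{0.8}{\sqrt{25}} = 3.4 \pm 0.33$,
that is, (3.07 weeks, 3.73 weeks).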
Therefore, we are 95 percent confident that the average delivery time for all orders is no less
than 3.07 weeks, the lower limit of the confidence interval. Since this lower limit is more than
three weeks, the estimate of three weeks is not confirmed by the confidence interval. In fact, the
interval indicates that the average delivery time exceeds three weeks.
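The arithmetic of Example 8.9 can be verified with a short script. This is a sketch only: the function name `t_interval` is an assumption, and the critical value 2.064 is supplied by hand from the t table quoted in the solution, since the Python standard library has no t-distribution quantile function.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """Small-sample confidence interval; t_crit is t_{alpha/2}(n - 1) read from a t table."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Example 8.9: n = 25, x_bar = 3.4 weeks, S = 0.8 weeks, t_{0.025}(24) = 2.064.
lo, hi = t_interval(x_bar=3.4, s=0.8, n=25, t_crit=2.064)
three_weeks_inside = lo <= 3.0 <= hi

print(round(lo, 2), round(hi, 2), three_weeks_inside)
```

The interval comes out at about (3.07, 3.73) weeks, and three weeks falls below its lower limit.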
Confidence interval for population proportion
Recall that when both np and n ( 1 - p ) are at least 5, the binomial distribution can be
approximated by the normal distribution.
Let $\hat{P}$ be the sample proportion. If the sample size is large ($n \ge 50$), then a
$(1-\alpha)100\%$ confidence interval for the population proportion P is given by:
$\hat{P} \pm z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}}$
Example 8.10: In a random sample of 200 people who are having new eyeglasses made, 162
select plastic lenses rather than glass lenses. Find a 95% confidence interval for the percentage of
all new eyeglasses made with plastic lenses.
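Solution: Given: n = 200, x = 162, $\hat{P} = \frac{162}{200} = 0.81$.
$n\hat{P} = 162$ and $n(1-\hat{P}) = 38$
$(1-\alpha)100\% = 95\%$, so $\alpha = 0.05$, $\alpha/2 = 0.025$ and $z_{\alpha/2} = 1.96$.
Thus, a 95 percent confidence interval for the population proportion P is
$\hat{P} \pm z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}} = 0.81 \pm 1.96(0.028) = 0.81 \pm 0.055$,
that is, (0.755, 0.865).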
That is, with 95% certainty, we conclude that between 75.5% and 86.5% of the people choose
plastic lenses.
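The eyeglasses calculation can be reproduced as follows. This is a minimal sketch under stated assumptions: the function name `proportion_interval` is hypothetical, and the critical value comes from `statistics.NormalDist` instead of the table, so the limits differ from the hand calculation in the third decimal place because the text rounds the standard error to 0.028.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = x / n
    # The approximation is only reasonable when both counts are at least 5.
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        raise ValueError("normal approximation not appropriate")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Example 8.10: 162 of 200 people select plastic lenses, 95% confidence.
lo, hi = proportion_interval(162, 200, 0.95)

print(round(lo, 3), round(hi, 3))
```

The same function could be applied to the exercise that follows by passing the number of on-time flights (74% of 300, that is 222) and the desired confidence level.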
Exercise: Of a randomly chosen group of 300 air flights, 74% arrive on time.
a) Find a 99% confidence interval for the proportion of all flights that arrive on time.
b) Find a 95% confidence interval for the proportion of all flights that arrive late.
2.3.4 SAMPLE SIZE DETERMINATION IN ESTIMATING THE
MEAN AND PROPORTION
I. Sample size determination for the mean
Whenever we take a sample for inferential purposes, there is always a sampling error. This
sampling error is controlled by selecting a sample that is adequate in size. If the sample size is
small, then we may fail to achieve the objective of our analysis and if it is too large, then we
waste the resources when we gather the sample.
1. When we estimate the population mean $\mu$ by the sample mean $\bar{X}$, with probability
$(1-\alpha)$ the maximum error E will be:
$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ if $\sigma$ is known
$E = z_{\alpha/2}\frac{S}{\sqrt{n}}$ if $\sigma$ is not known
2. With probability $(1-\alpha)$, the sampling error will not exceed some prescribed quantity E if
the sample size is at least:
$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$
When we compute n using the above relation, we may get a fractional number. In such cases we
always round up the fractional number to the next integer.
Example 8.11: A manufacturing concern wants to estimate the average amount of purchase of its
product in a month by the customers. If the standard deviation is Birr 10, find the sample size if
the maximum error is not to exceed Birr 3 with probability of 0.99.
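Solution: Given: $\sigma$ = Birr 10, E = Birr 3,
$(1-\alpha)100\% = 99\%$, so $\alpha = 0.01$ and $z_{\alpha/2} = 2.58$.
$\therefore n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{(2.58)(10)}{3}\right)^2 = 73.96 \approx 74$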
Hence, any sample of size 74 or larger will give the desired accuracy with approximately 99%
certainty.
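The sample-size rule for the mean, including the round-up step, can be sketched in a few lines. The function name `sample_size_for_mean` is an assumption, and the critical value 2.58 is passed in by hand as read from the table in Example 8.11.

```python
from math import ceil

def sample_size_for_mean(z, sigma, max_error):
    """Smallest n for which z * sigma / sqrt(n) does not exceed max_error."""
    exact = (z * sigma / max_error) ** 2
    return ceil(exact)   # always round a fractional result up to the next integer

# Example 8.11: sigma = Birr 10, E = Birr 3, z_{alpha/2} = 2.58 (99% confidence).
n_required = sample_size_for_mean(z=2.58, sigma=10, max_error=3)

print(n_required)
```

Rounding up instead of to the nearest integer guarantees that the error bound is met; rounding down would leave the maximum error slightly above E.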
Exercise: A researcher for a coffee distributing agency is interested in determining the rate of
coffee usage per household in a certain city. He believes that yearly consumption per household
is normally distributed with a standard deviation of 3 kilos.
a) How large a sample of households must he take in order to be 99 percent certain that the
sample mean is within 0.5 kilo of the true mean?
b) If the researcher takes a random sample of 64 households and records their consumption
for one year, what is the maximum error committed in estimating the mean consumption
of all households by the sample mean at the 95 percent confidence level?
II. Sample size determination for a proportion
The methods of sample size determination that are used in estimating a population proportion are
similar to those employed in estimating a mean.
Recall that in developing the sample size for a confidence interval for the mean, the sampling
error was defined by
$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$.
When estimating a proportion, $\sigma$ is replaced by $\sqrt{P(1-P)}$. Thus, the sampling error is
$E = z_{\alpha/2}\sqrt{\frac{P(1-P)}{n}}$.
Solving for n, the sample size necessary to develop a confidence interval estimate for a
proportion is obtained as
$n = \frac{z_{\alpha/2}^2\,P(1-P)}{E^2}$
Example 8.12: A principal of a college wants to estimate the proportion of smokers among his
students. What sample size should he select so that the error in estimating the proportion of
smokers does not exceed 10% at the 98% confidence level? It is believed from previous records
that the proportion of smokers was 0.30.
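Solution: Given: P = 0.3, E = 10% = 0.1,
$(1-\alpha)100\% = 98\%$, so $\alpha = 2\% = 0.02$
Then $z_{\alpha/2} = z_{0.01} = 2.33$
Required: n
We know that $n = \frac{z_{\alpha/2}^2\,P(1-P)}{E^2}$, so
$n = \frac{(2.33)^2 (0.3)(0.7)}{(0.1)^2} = 114.0069$
$\therefore n \approx 115$, rounding the fractional number up to the next integer.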
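The sample-size formula for a proportion can be checked in the same way. In this sketch the function name `sample_size_for_proportion` is an assumption and the critical value 2.33 is entered by hand from the table. The unrounded result is 114.0069; applying the rule stated earlier for sample sizes (always round a fractional number up to the next integer) gives 115.

```python
from math import ceil

def sample_size_for_proportion(z, p, max_error):
    """Return the unrounded and rounded-up sample size for estimating a proportion."""
    exact = z ** 2 * p * (1 - p) / max_error ** 2
    return exact, ceil(exact)

# Example 8.12: P = 0.30, E = 0.10, z_{alpha/2} = 2.33 (98% confidence).
n_exact, n_required = sample_size_for_proportion(z=2.33, p=0.30, max_error=0.10)

print(round(n_exact, 4), n_required)
```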
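Let $\bar{X}_1$ and $\bar{X}_2$ be the sample means of the first and second populations,
respectively. The central limit theorem states that the difference in sample means
$\bar{X}_1 - \bar{X}_2$ is approximately normally distributed for large sample sizes
($n_1 \ge 30$ and $n_2 \ge 30$) with mean $\mu_1 - \mu_2$ and standard deviation
$\sigma_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ if $\sigma_1$ and $\sigma_2$ are known,
$\sigma_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$ if $\sigma_1$ and $\sigma_2$ are unknown.
That is, $\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{S.D_1^2}{n_1} + \frac{S.D_2^2}{n_2}\right)$
$\therefore Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S.D_1^2}{n_1} + \frac{S.D_2^2}{n_2}}} \sim N(0,1)$,
where S.D stands for the population standard deviation when it is known and for the sample
standard deviation otherwise.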
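Let $\alpha$ be the probability that the difference of the two means may not be contained in the
interval.
[Figure: standard normal curve with central area $1-\alpha$ between $-z_{\alpha/2}$ and
$z_{\alpha/2}$, and area $\alpha/2$ in each tail]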
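Hence, with a similar argument as in the single mean case, a $(1-\alpha)100\%$ confidence interval
estimate for the difference of two population means is:
$(\bar{X}_1 - \bar{X}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
and, when $\sigma_1$ and $\sigma_2$ are unknown,
$(\bar{X}_1 - \bar{X}_2) \pm z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$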
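where
$S_1^2 = \frac{\sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2}{n_1 - 1}$
$S_2^2 = \frac{\sum_{i=1}^{n_2} (X_{2i} - \bar{X}_2)^2}{n_2 - 1}$
and $X_{1i}$, $X_{2i}$ denote the observations in the first and second samples.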
Example 2.13: A standardized Accounting test was given to 50 girls and 75 boys. The girls made
an average grade of 76 with a standard deviation of 6, while the boys made an average grade of 82
with a standard deviation of 8. Find a 96% confidence interval for the difference $\mu_1 - \mu_2$,
where $\mu_1$ is the mean score of all boys and $\mu_2$ is the mean score of all girls who might
take this test.
Solution: The data given are:
Girls                  Boys
$n_g = 50$             $n_b = 75$
$\bar{X}_g = 76$       $\bar{X}_b = 82$
$S_g = 6$              $S_b = 8$
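With the boys as sample 1 and the girls as sample 2, and $z_{\alpha/2} = z_{0.02} = 2.05$, the formula
$(\bar{X}_1 - \bar{X}_2) - z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$
yields the 96% confidence interval:
$6 - 2.05\sqrt{\frac{64}{75} + \frac{36}{50}} < \mu_1 - \mu_2 < 6 + 2.05\sqrt{\frac{64}{75} + \frac{36}{50}}$
or $3.43 < \mu_1 - \mu_2 < 8.57$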
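The interval for the difference of two means in Example 2.13 can be verified numerically. This is a sketch under stated assumptions: the function name `two_mean_interval` is hypothetical, the boys are treated as sample 1 and the girls as sample 2 as in the example statement, and the critical value 2.05 is entered by hand from the normal table.

```python
from math import sqrt

def two_mean_interval(x1, s1, n1, x2, s2, n2, z):
    """Large-sample confidence interval for mu_1 - mu_2 using sample variances."""
    diff = x1 - x2
    margin = z * sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return diff - margin, diff + margin

# Example 2.13: boys (mean 82, S = 8, n = 75), girls (mean 76, S = 6, n = 50), z = 2.05.
lo, hi = two_mean_interval(x1=82, s1=8, n1=75, x2=76, s2=6, n2=50, z=2.05)

print(round(lo, 2), round(hi, 2))
```

Because the whole interval lies above zero, the data suggest that the boys' mean score is higher than the girls' mean score at this confidence level.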
This procedure for estimating the difference between two means is applicable if $\sigma_1^2$ and
$\sigma_2^2$ are