
CHAPTER TWO

STATISTICAL ESTIMATIONS
2.1 INTRODUCTION

We now enter the part of statistics called inferential statistics. Inferential statistics is the
part of statistics that helps us make decisions about some characteristics of a population based
on sample information. In other words, inferential statistics uses sample results to make
decisions and draw conclusions about the population from which the sample is drawn. Estimation
and hypothesis testing taken together are usually referred to as inference making.
2.2 STATISTICAL ESTIMATION

Definition: Estimation is the process by which we approximate or estimate various unknown
population parameters from sample statistics.
In inferential statistics, $\mu$ is called the true population mean and $P$ is called the true
population proportion. There are many other population parameters, such as the median, mode,
variance, and standard deviation. The difference between a population value and the
corresponding sample value is called sampling error.
2.2.1 ESTIMATOR AND ESTIMATES
Definitions:
1. Any sample statistic that is used to estimate a population parameter is called an
estimator.
2. An estimate is a numerical value of an estimator.
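Example 2.1:
Parameter (population value)       Estimator (statistic)
Population mean, $\mu$             $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Population variance, $\sigma^2$    $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Population S.D., $\sigma$          $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
Population proportion, $P$         $\hat{P} = \frac{x}{n}$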
A population parameter can have more than one estimator, but only one best estimator.
Example 2.2: $\mu$ can be estimated by $\bar{X}$ (the sample mean), $\tilde{X}$ (the sample
median), or $\hat{X}$ (the sample mode).
Best estimator means that the sample statistic should be as close to the true value of the
parameter as possible.
2.2.4. POINT ESTIMATION
Point estimation is a statistical procedure in which we use a single value to estimate a population
parameter.
A point estimate is a single number that is used as an estimate of a population parameter, and is
derived from a random sample taken from the population of interest.
Some of the most important point estimators are given below.
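Population parameter               Point estimator
Mean, $\mu$                        $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Variance, $\sigma^2$               $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Standard deviation, $\sigma$       $S = \sqrt{S^2}$
Proportion, $P$                    $\hat{P} = \frac{x}{n}$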
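As a rough illustration of these point estimators, the sketch below computes each of them with Python's standard library. The sample values and the counts `x` and `n` are made-up data chosen only for illustration; they do not come from the text.

```python
import math
import statistics

# Hypothetical sample of 8 measurements (illustrative data only).
sample = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 15.0]

x_bar = statistics.mean(sample)      # point estimate of the population mean
s2 = statistics.variance(sample)     # sample variance S^2, divisor n - 1
s = statistics.stdev(sample)         # sample standard deviation, sqrt(S^2)

# Hypothetical counts: x = 18 items with the attribute out of n = 60 inspected.
x, n = 18, 60
p_hat = x / n                        # point estimate of the population proportion

print(x_bar, s2, s, p_hat)
```

Note that `statistics.variance` and `statistics.stdev` already use the divisor n - 1, which matches the estimators in the table.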
2.3.3 INTERVAL ESTIMATION

Definition: Interval estimation is a method of estimating a population parameter by a range of
values, i.e., by an interval whose two limits are expected to enclose the population parameter.
The width of the interval depends on the probability with which the population parameter is
expected to fall in it. Given two values $T_1$ and $T_2$, we can always determine the probability
that the interval $(T_1, T_2)$ contains the parameter.
In general, $P(T_1 < \theta < T_2) = 1 - \alpha$, where $\theta$ denotes the parameter being
estimated and $\alpha$ is the probability that the parameter is not contained in the interval
(usually called the level of significance).
Common values of $\alpha$ are 0.1, 0.05, 0.025, and 0.01.
$1 - \alpha$ is called the confidence coefficient and the interval $(T_1, T_2)$ is called the
confidence interval.
Confidence interval for the population mean ($\mu$)

I. Sampling from a normally distributed population with known variance $\sigma^2$
(sample large or small)
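Recall that $z_{\alpha}$ denotes the value of $Z$ for which the area under the standard normal
curve to its right is equal to $\alpha$. Analogously, $z_{\alpha/2}$ denotes the value of $Z$ for
which the area to its right is $\alpha/2$, and $-z_{\alpha/2}$ denotes the value of $Z$ for which
the area to its left is $\alpha/2$.

Consider the following figure.

[Figure: standard normal curve with area $1 - \alpha$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$,
and area $\alpha/2$ in each tail]

From the above figure we have:

$$P\left(-z_{\alpha/2} < Z < z_{\alpha/2}\right) = 1 - \alpha$$

Substituting $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ and solving for $\mu$ gives the
$(1 - \alpha)100\%$ confidence interval $\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$.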
1. Large sample confidence interval for the population mean $\mu$
If the distribution of the population from which the sample is drawn is unknown or is not normal
but the sample size is large, i.e., the sample size is at least 30, then by the Central Limit
Theorem, a confidence interval for the population mean $\mu$ is given by
$$\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \text{ if } \sigma \text{ is known}$$
or
$$\bar{X} \pm z_{\alpha/2}\frac{S}{\sqrt{n}}, \text{ if } \sigma \text{ is not known}$$
Example 2.6 In a certain small city, to estimate the mean monthly expenditure for food, a
random sample of 25 households was selected, yielding a mean of 200 Birr. From experience it is
known that such expenditures are normally distributed with a standard deviation of 50 Birr.
a) What is the point estimate of the mean monthly expenditure for food of all households in
the city?
b) Find a 95 percent confidence interval for the mean monthly expenditure for food of all
households in the city.
Solution: Here, the population mean $\mu$ is:
$\mu$ = the mean monthly expenditure for food for all households in the city

Given: $\bar{X} = 200$ Birr, $\sigma = 50$ Birr, $n = 25$

a) The point estimate of $\mu$ is $\hat{\mu} = \bar{X} = 200$ Birr.

b) $(1 - \alpha)100\% = 95\%$
$1 - \alpha = 0.95$
$\alpha = 1 - 0.95 = 0.05$
$z_{\alpha/2} = z_{0.025} = 1.96$ (from table)
Thus, a 95% confidence interval for $\mu$ is
$$\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 200 \pm (1.96)\frac{50}{\sqrt{25}} = 200 \pm 19.6$$
(180.40 Birr, 219.60 Birr)
Therefore, we are 95 percent confident that the true mean monthly expenditure for food ($\mu$) is
between 180.40 Birr and 219.60 Birr.
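
The calculation in Example 2.6 can be sketched in Python as follows. This is only a minimal illustration: the function name `z_interval` is hypothetical, and the critical value is taken from the standard library's `statistics.NormalDist` rather than from a printed table, so it is slightly more precise than 1.96.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Return (lower, upper) limits for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    margin = z * sigma / sqrt(n)              # z_{alpha/2} * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# Values from Example 2.6: mean 200 Birr, sigma 50 Birr, n = 25, 95% confidence.
lower, upper = z_interval(x_bar=200, sigma=50, n=25, confidence=0.95)
print(f"({lower:.2f}, {upper:.2f})")
```

Rounded to two decimals, the limits agree with the interval obtained by hand.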

Example 2.7 A manufacturer claims that his tyres last 20,000 miles on average. A research
organization tests a random sample of 64 tyres and reports an average mileage of 19,200 miles
with a standard deviation of 2,000 miles. Does a 99 percent confidence interval for the mean life
of all tyres produced by this manufacturer support the claim?

Solution: Given: $n = 64$, $\bar{X} = 19,200$ miles, $S = 2,000$ miles.

Though we have no information about the normality of the population, since $n$ is large
(i.e., $n \ge 30$) we can use the normal distribution by the central limit theorem.
$(1 - \alpha)100\% = 99\%$
$\alpha = 0.01$
$z_{\alpha/2} = z_{0.005} = 2.58$ (from table)
A 99 percent confidence interval for the mean ($\mu$) will thus be
$$\bar{X} \pm z_{\alpha/2}\frac{S}{\sqrt{n}} = 19,200 \pm (2.58)\frac{2,000}{\sqrt{64}} = 19,200 \pm 645.0$$
(18,555 miles, 19,845 miles)
Hence, we are 99 percent confident that the true mean mileage is between 18,555 and 19,845
miles. Since the claimed mean of 20,000 miles lies above the upper limit of this interval, the
interval does not support the manufacturer's claim.
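
Checking a claimed value against a confidence interval, as in Example 2.7, amounts to testing whether the value lies between the two limits. The short sketch below uses the table value z = 2.58 from the example; the variable names are illustrative only.

```python
from math import sqrt

# Values from Example 2.7; z = 2.58 is the table value for 99% confidence.
n, x_bar, s = 64, 19_200, 2_000
z = 2.58
claimed_mean = 20_000

margin = z * s / sqrt(n)                          # z_{alpha/2} * S / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

# The claim is consistent with the data only if it falls inside the interval.
claim_supported = lower <= claimed_mean <= upper

print(lower, upper, claim_supported)
```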
II. Small sample confidence interval for the population mean: sampling from a
normally distributed population with $\sigma^2$ unknown and $n < 30$.
When 1. the population from which the sample is selected is (approximately)
normally distributed,
2. the sample size is small (that is, $n < 30$), and
3. the population standard deviation $\sigma$ is not known,
the normal distribution is replaced by the t distribution to construct confidence intervals about
$\mu$. The t distribution was developed by W. S. Gosset in 1908 and published under the
pseudonym Student. As a result, the t distribution is also called Student's t distribution. The t
distribution is similar to the normal distribution in some respects. Like the normal distribution
curve, the t distribution curve is symmetric (bell-shaped) about the mean and it never meets the
horizontal axis. The total area under a t distribution curve is 1.0 or 100%. However, the t
distribution curve is flatter than the standard normal distribution curve. In other words, the t
distribution curve has a lower height and a wider spread (or, we can say, a larger standard
deviation) than the standard normal distribution.
As the sample size increases, the t distribution approaches the standard normal
distribution. The units of a t distribution are denoted by $t$. The shape of a particular t
distribution curve depends on the number of degrees of freedom (df).
[Figure: the Z-distribution curve and the flatter t-distribution curve, both centred at $\mu = 0$]
Degrees of freedom can be defined as the number of values we can choose freely.
Suppose we are dealing with a sample of size $n = 6$ and we know that the mean of these 6
numbers is 4. Symbolically, we have
$$\frac{a + b + c + d + e + f}{6} = 4$$
Now, we are free to assign any value to a, b, c, d and e, say, a = 2, b = 4, c = 8, d = 4 and
e = 2. But we are no longer free to assign a value to f, since
$$\frac{a + b + c + d + e + f}{6} = 4 \;\Rightarrow\; \frac{20 + f}{6} = 4 \;\Rightarrow\; 20 + f = 24 \;\Rightarrow\; f = 4,$$
that is, in order for the mean of these 6 numbers to be equal to 4, f must be equal to 4. If we
assign another number to f, then the mean will not be equal to 4. Thus, we are free to choose
only 5 values and the 6th is determined automatically. Hence, the degrees of freedom is
df = 6 - 1 = 5.
Similarly, a sample of size n = 25 would give us 24 degrees of freedom.
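
The degrees-of-freedom argument can be checked numerically. In the sketch below, the five freely chosen values are the ones from the illustration (a = 2, b = 4, c = 8, d = 4, e = 2); the last value is then forced by the requirement that the mean be 4. Variable names are illustrative.

```python
n = 6
required_mean = 4
free_values = [2, 4, 8, 4, 2]        # a, b, c, d, e chosen freely

# The constraint (sum of all six values) = n * mean leaves no choice for f.
f = n * required_mean - sum(free_values)
degrees_of_freedom = n - 1

# Verify that the forced value really produces the required mean.
assert (sum(free_values) + f) / n == required_mean
print(f, degrees_of_freedom)
```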
The values of $t$ for different degrees of freedom and different values of $\alpha$ are tabulated.
$t_{\alpha}(n-1)$ denotes the value of $t$ for which the area under the curve to its right is
equal to $\alpha$ with $(n-1)$ degrees of freedom.
Example 2.8: Find
a) $t_{0.025}(19)$
b) $t_{0.005}(25)$
Solution:
a) From the t-distribution table, $t_{0.025}(19) = 2.093$ (the area under the curve to the right
of this value is 0.025).
b) From the t-distribution table, $t_{0.005}(25) = 2.787$ (the area under the curve to the right
of this value is 0.005).

Under such conditions, a $(1 - \alpha)100\%$ confidence interval for the population mean $\mu$ is
given by
$$\bar{X} \pm t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}}$$
Example 2.9: A company has been concerned about the length of time it takes to deliver its
products to customers. It felt it averaged about three weeks to deliver its products after
receiving an order. If a random sample of 25 orders averaged 3.4 weeks with a standard deviation
of 0.8 weeks, would a 95 percent confidence interval for the average delivery time of all orders
confirm the estimate of three weeks? Assume delivery times are normally distributed.
Solution: Given $n = 25$, $\bar{X} = 3.4$ weeks, $S = 0.8$ weeks.
Since the population is normally distributed, $\sigma$ is not known and $n < 30$, we have to use
the t-distribution with $n - 1 = 25 - 1 = 24$ degrees of freedom.
$(1 - \alpha)100\% = 95\%$
$\alpha = 0.05$
$t_{\alpha/2}(n-1) = t_{0.025}(24) = 2.064$ (from table)

A 95 percent confidence interval for $\mu$ will be:
$$\bar{X} \pm t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}} = 3.4 \pm (2.064)\frac{0.8}{\sqrt{25}} = 3.4 \pm 0.33$$
(3.07 weeks, 3.73 weeks)

Therefore, we are 95 percent confident that the mean delivery time for all orders is between
3.07 and 3.73 weeks. Since the lower limit of this interval is more than three weeks, the
estimate of three weeks is not confirmed by the confidence interval. The data suggest that the
average delivery time exceeds three weeks.
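
Python's standard library has no t-distribution table, so the sketch below approximates the critical value numerically: it integrates the t density with Simpson's rule and finds the quantile by bisection. This is an illustrative substitute for the table lookup, not the method used in the text; the helper names `t_pdf`, `t_cdf` and `t_critical` are hypothetical, and the approach assumes $\alpha < 0.5$ and moderate degrees of freedom.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: Simpson's rule on [0, x] plus symmetry about 0."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(alpha, df):
    """Value t whose right-tail area is alpha, located by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > alpha:
            lo = mid      # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

# Small-sample interval for the delivery-time example: n = 25, mean 3.4, S = 0.8.
n, x_bar, s = 25, 3.4, 0.8
t = t_critical(0.025, n - 1)
margin = t * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(round(t, 3), round(lower, 2), round(upper, 2))
```

The same `t_critical` helper can be used to check other table entries, such as the values for 19 and 25 degrees of freedom found earlier.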
Confidence interval for the population proportion
Recall that when both $np$ and $n(1 - p)$ are at least 5, the binomial distribution can be
approximated by the normal distribution.
Let $\hat{P}$ be the sample proportion. If the sample size is large ($n \ge 50$), then a
$(1 - \alpha)100\%$ confidence interval for the population proportion $P$ is given by:
$$\hat{P} \pm z_{\alpha/2}\sqrt{\frac{\hat{P}(1 - \hat{P})}{n}}$$
Example 2.10. In a random sample of 200 people who are having new eyeglasses made, 162
select plastic lenses rather than glass lenses. Find a 95% confidence interval for the percentage
of all new eyeglasses made with plastic lenses.
Solution: Given: $n = 200$, $x = 162$, $\hat{P} = \frac{162}{200} = 0.81$.
$n\hat{P} = 162$ and $n(1 - \hat{P}) = 38$, so both are at least 5.
$(1 - \alpha)100\% = 95\%$
$\alpha = 0.05$
$\alpha/2 = 0.025$, so $z_{\alpha/2} = 1.96$
Thus, a 95 percent confidence interval for the population proportion $P$ is
$$\hat{P} \pm z_{\alpha/2}\sqrt{\frac{\hat{P}(1 - \hat{P})}{n}} = 0.81 \pm 1.96\,(0.028) = 0.81 \pm 0.055$$
(0.755, 0.865)
That is, with 95% confidence, we conclude that between 75.5% and 86.5% of the people choose
plastic lenses.
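
The interval for a proportion can be sketched in Python as below, using the eyeglasses data (162 out of 200). The function name `proportion_interval` is illustrative, and the critical value comes from `statistics.NormalDist` instead of a table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, confidence):
    """Large-sample (lower, upper) limits for a population proportion."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lower, upper = proportion_interval(x=162, n=200, confidence=0.95)
print(round(lower, 3), round(upper, 3))
```

The limits may differ from the hand calculation in the third decimal place, because the hand calculation rounds the standard error to 0.028 before multiplying by 1.96.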
Exercise: Of a randomly chosen group of 300 air flights, 74% arrive on time.
a) Find a 99% confidence interval for the proportion of all flights that arrive on time.
b) Find a 95% confidence interval for the proportion of all flights that arrive late.
2.3.4 SAMPLE SIZE DETERMINATION IN ESTIMATING THE
MEAN AND PROPORTION
I. Sample size determination for the mean
Whenever we take a sample for inferential purposes, there is always a sampling error. This
sampling error is controlled by selecting a sample that is adequate in size. If the sample size is
small, then we may fail to achieve the objective of our analysis and if it is too large, then we
waste the resources when we gather the sample.
1. When we estimate the population mean $\mu$ by the sample mean $\bar{X}$, with probability
$(1 - \alpha)$ the maximum error $E$ will be:
$$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \text{ if } \sigma \text{ is known}$$
$$E = z_{\alpha/2}\frac{S}{\sqrt{n}}, \text{ if } \sigma \text{ is not known}$$

2. With probability $(1 - \alpha)$, the sampling error will not exceed some prescribed quantity
$E$ if the sample size is at least:
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$

When we compute $n$ using the above relation, we may get a fractional number. In such cases we
always round up the fractional number to the next integer.
Example 2.11: A manufacturing concern wants to estimate the average amount of purchase of its
product in a month by the customers. If the standard deviation is Birr 10, find the sample size if
the maximum error is not to exceed Birr 3 with probability of 0.99.
Solution: Given: $\sigma$ = Birr 10, $E$ = Birr 3,
$(1 - \alpha)100\% = 99\%$, so $\alpha = 0.01$
$z_{\alpha/2} = z_{0.005} = 2.58$
$$\therefore n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{(2.58)(10)}{3}\right)^2 = 73.96 \approx 74$$
Hence, any sample of size 74 or larger will give the desired accuracy with approximately 99%
certainty.
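
The sample-size rule for the mean, including the round-up step, can be sketched as follows. The function name is illustrative, and the critical value is passed in as the table value (2.58 for 99% confidence) so that the result matches the hand calculation.

```python
from math import ceil

def sample_size_for_mean(z, sigma, max_error):
    """Smallest integer n for which z * sigma / sqrt(n) does not exceed max_error."""
    return ceil((z * sigma / max_error) ** 2)

# Standard deviation Birr 10, maximum error Birr 3, 99% confidence (z = 2.58).
n = sample_size_for_mean(z=2.58, sigma=10, max_error=3)
print(n)
```

Using `ceil` rather than ordinary rounding reflects the rule that a fractional result is always rounded up to the next integer.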
Exercise: A researcher for a coffee distributing agency is interested in determining the rate of
coffee usage per household in a certain city. He believes that yearly consumption per household
is normally distributed with a standard deviation of 3 kilos.
a) How large a sample of households must he take in order to be 99 percent certain that the
sample mean is within 0.5 kilo of the true mean?
b) If the researcher takes a random sample of 64 households and records their consumption
for one year, what is the maximum error committed in estimating the mean consumption
of all households by the sample mean at the 95 percent confidence level?
II. Sample size determination for a proportion
The methods of sample size determination that are used in estimating a population proportion are
similar to those employed in estimating a mean.
Recall that in developing the sample size for a confidence interval for the mean, the sampling
error was defined by
$$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}.$$
When estimating a proportion, $\sigma$ is replaced by $\sqrt{P(1 - P)}$. Thus, the sampling
error is
$$E = z_{\alpha/2}\sqrt{\frac{P(1 - P)}{n}}.$$
Solving for $n$, the sample size necessary to develop a confidence interval estimate for a
proportion is obtained as
$$n = \frac{z_{\alpha/2}^2\,P(1 - P)}{E^2}$$
Example 2.12: The principal of a college wants to estimate the proportion of smokers among his
students. What sample size should he select so that the error in estimating the proportion of
smokers does not exceed 10% at the 98% confidence level? It is believed from previous records
that the proportion of smokers was 0.30.
Solution: Given: $P = 0.3$, $E = 10\% = 0.1$,
$(1 - \alpha)100\% = 98\%$, so $\alpha = 2\% = 0.02$
Then $z_{\alpha/2} = z_{0.01} = 2.33$

Required: $n$

We know that
$$n = \frac{z_{\alpha/2}^2\,P(1 - P)}{E^2} = \frac{(2.33)^2 (0.3)(0.7)}{(0.1)^2} = 114.0069$$
Since a fractional sample size is always rounded up to the next integer, $n = 115$.
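
The sample-size formula for a proportion can be sketched in the same way. The function name is illustrative and the table value z = 2.33 is passed in directly; the result is rounded up, following the round-up rule stated for sample sizes earlier in this section.

```python
from math import ceil

def sample_size_for_proportion(z, p, max_error):
    """Smallest integer n with z * sqrt(p * (1 - p) / n) not exceeding max_error."""
    return ceil(z ** 2 * p * (1 - p) / max_error ** 2)

# Prior proportion 0.30, maximum error 0.10, 98% confidence (z = 2.33).
raw = 2.33 ** 2 * 0.3 * 0.7 / 0.1 ** 2      # unrounded value of the formula
n = sample_size_for_proportion(z=2.33, p=0.3, max_error=0.1)
print(round(raw, 4), n)
```

Because the unrounded value is slightly above 114, rounding up gives a sample size of 115.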

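Confidence interval for the difference between two population means ($\mu_1 - \mu_2$)

Suppose we have a random sample of size $n_1$ from a $N(\mu_1, \sigma_1^2)$ population and
another independent sample of size $n_2$ from a normal population $N(\mu_2, \sigma_2^2)$.

Let $\bar{X}_1$ and $\bar{X}_2$ be the sample means from the first and second population
respectively. By the central limit theorem, the difference in sample means
$\bar{X}_1 - \bar{X}_2$ is approximately normally distributed for large sample sizes
($n_1 \ge 30$ and $n_2 \ge 30$) with mean
$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$$
and standard deviation
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}, \text{ if } \sigma_1 \text{ and } \sigma_2 \text{ are known,}$$
$$\sigma_{\bar{X}_1 - \bar{X}_2} \approx \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}, \text{ if } \sigma_1 \text{ and } \sigma_2 \text{ are unknown.}$$

That is, $\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$, and therefore
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0, 1),$$
with $S_1$ and $S_2$ used in place of $\sigma_1$ and $\sigma_2$ when the latter are unknown.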

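Let $\alpha$ be the probability that the difference of the two means is not contained in the
interval.

[Figure: standard normal curve with area $1 - \alpha$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$,
centred at 0, and area $\alpha/2$ in each tail]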

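Hence, with a similar argument as in the single mean case, a $(1 - \alpha)100\%$ confidence
interval estimate for the difference of two population means:

i) When $n_1$ and $n_2$ are large and $\sigma_1$ and $\sigma_2$ are known, is given by
$$(\bar{X}_1 - \bar{X}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

ii) When $n_1$ and $n_2$ are large and $\sigma_1$ and $\sigma_2$ are unknown, is given by
$$(\bar{X}_1 - \bar{X}_2) \pm z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$
where
$$S_1^2 = \frac{\sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2}{n_1 - 1}, \qquad S_2^2 = \frac{\sum_{i=1}^{n_2} (X_{2i} - \bar{X}_2)^2}{n_2 - 1}$$
and $X_{1i}$, $X_{2i}$ denote the observations in the first and second sample respectively.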
Example 2.13: A standardized Accounting test was given to 50 girls and 75 boys. The girls made
an average grade of 76 with a standard deviation of 6, while the boys made an average grade of 82
with a standard deviation of 8. Find a 96% confidence interval for the difference
$\mu_1 - \mu_2$, where $\mu_1$ is the mean score of all boys and $\mu_2$ is the mean score of all
girls who might take this test.
Solution: The data given are:
Girls                Boys
$n_g = 50$           $n_b = 75$
$\bar{X}_g = 76$     $\bar{X}_b = 82$
$S_g = 6$            $S_b = 8$

Using $\alpha = 0.04$ we find $z_{0.02} = 2.05$.

Hence, substituting $\bar{X}_1 = \bar{X}_b$, $S_1 = S_b$, $n_1 = n_b$ and $\bar{X}_2 = \bar{X}_g$,
$S_2 = S_g$, $n_2 = n_g$ in the formula
$$(\bar{X}_1 - \bar{X}_2) - z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$
yields the 96% confidence interval:
$$6 - 2.05\sqrt{\frac{64}{75} + \frac{36}{50}} < \mu_1 - \mu_2 < 6 + 2.05\sqrt{\frac{64}{75} + \frac{36}{50}}$$
or $3.43 < \mu_1 - \mu_2 < 8.57$.
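
The two-sample interval can be sketched in Python as below, using the test-score data (boys as sample 1, girls as sample 2) and the table value z = 2.05 for 96% confidence. The function name `two_mean_interval` is illustrative.

```python
from math import sqrt

def two_mean_interval(x1, s1, n1, x2, s2, n2, z):
    """Large-sample (lower, upper) limits for mu_1 - mu_2."""
    diff = x1 - x2
    margin = z * sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return diff - margin, diff + margin

# Boys: mean 82, S = 8, n = 75.  Girls: mean 76, S = 6, n = 50.
lower, upper = two_mean_interval(x1=82, s1=8, n1=75, x2=76, s2=6, n2=50, z=2.05)
print(round(lower, 2), round(upper, 2))
```

Since both limits are positive, the interval suggests that the mean score of boys exceeds that of girls on this test.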

This procedure for estimating the difference between two means is applicable if $\sigma_1^2$ and
$\sigma_2^2$ are known or can be estimated from large samples.


Exercise
A businessman bought 50 bulbs of each of two types of electric bulbs, A and B. When testing them
he found that bulb A has a mean life of 1262 hrs and bulb B has a mean life of 1200 hrs. From
previous studies it is known that bulbs of type A have a standard deviation of 60 hrs and those
of type B have a standard deviation of 50 hrs. Construct a 99% confidence interval estimate for
the difference in the mean life of the two types of bulbs.
