Statistics 2

ESTIMATION

Whereas the numerical values of the members of the population can be
summarized by a population probability distribution, this distribution is
often not completely known. For instance, certain of its parameters, such as
its mean and its standard deviation, may be unknown. A fundamental
concern in statistics relates to how one can use the results from a sample of
the population to estimate these unknown parameters.
In this chapter we will consider ways of estimating certain parameters of
the population distribution. To accomplish this, we will show how to use
estimators and the estimates they give rise to.
Definition An estimator is a statistic whose value depends on the particular
sample drawn. The value of the estimator, called the estimate, is used to
predict the value of a population parameter.
Point Estimator of a Population Mean
The sample mean $\bar{X}$ can be used as an estimator of $\mu$. Since we have
seen that $E(\bar{X}) = \mu$, the population mean, such an estimator is called unbiased.
Definition An estimator whose expected value is equal to the parameter it is
estimating is said to be an unbiased estimator of that parameter.

Example: To estimate the average amount of damages claimed in fires, a
consumer organization sampled the files of a large insurance company and
obtained the following amounts (in thousands of dollars) for 10 claims,
$X_i$, $i = 1, \dots, 10$: 121, 55, 63, 12, 8, 141, 42, 51, 66, 103. The estimate
of the mean amount of damages claimed in all fires of the type being
considered is
$$\bar{X} = \frac{1}{10}\sum_{i=1}^{10} X_i = \frac{662}{10} = 66.2$$
That is, we estimate that the mean fire damage claim is 66,200 dollars. Since a random
variable is not likely to be too many standard deviations away from its expected
value, it is important to determine the standard deviation of $\bar{X}$. As we
have already noted, $SD(\bar{X}) = \sigma/\sqrt{n}$, where σ is the population standard deviation.
The quantity $SD(\bar{X})$ is sometimes called the standard error of $\bar{X}$ as an estimator
of the mean. Since a random variable is unlikely to be more than 2 standard
deviations away from its mean (especially when that random variable is
approximately normal, as $\bar{X}$ will be when the sample size n is large),
we are usually fairly confident that the estimate of the population mean will be
correct to within ±2 standard errors. Note that the standard error is inversely
proportional to the square root of the sample size; as a result, to cut the
standard error in half, we must increase the sample size by a factor of 4.
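
The fire-claims estimate and the scaling of the standard error can be checked with a short Python sketch. The helper name `standard_error` and the value σ = 40 are assumptions made only for illustration; the notes do not give σ for this population.

```python
import math

# Fire-damage claims from the example, in thousands of dollars.
claims = [121, 55, 63, 12, 8, 141, 42, 51, 66, 103]

# Point estimate of the population mean: the sample mean.
sample_mean = sum(claims) / len(claims)


def standard_error(sigma, n):
    """Standard error of the sample mean, sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


# Hypothetical sigma, used only to illustrate the 1/sqrt(n) scaling.
sigma = 40.0
se_n = standard_error(sigma, 10)
se_4n = standard_error(sigma, 40)

print(f"sample mean = {sample_mean:.1f} (thousand dollars)")
print(f"SE ratio when n is quadrupled = {se_4n / se_n:.2f}")
```

Whatever value of σ is assumed, quadrupling n halves the standard error, which is the point made in the paragraph above.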

Example: Successive tests for the level of potassium in an individual's
blood vary because of the basic imprecision of the test and because the actual
level itself varies, depending on such things as the amount of food recently
eaten and the amount of stress recently undergone. Suppose it is known that,
for a given individual, the successive readings of potassium level vary around
a mean value μ with a standard deviation of 0.3. If a set of four readings on a
particular individual yields the data 3.6, 3.9, 3.4, 3.5, then, since the sample
mean is 3.6 with standard error $SD(\bar{X}) = 0.3/\sqrt{4} = 0.15$, we can be quite confident
that the actual mean will not differ from 3.6 by more than 0.30.
Suppose we wanted the estimator to have a standard error of 0.05. Then,
since this would be a reduction in standard error by a factor of 3, it follows
that we would have had to choose a sample 9 times as large. That is, we
would have had to take 36 blood potassium readings.
Example: A manufacturer of compact disk players wants to estimate the average
lifetime of the lasers in its product. A random sample of 40 is chosen. If the sum
of the lifetimes of these lasers is 6624 hours, what is the estimate of the average
lifetime of a laser?
A- 6624/40 = 165.6 hours
Example: It is known that the standard deviation of the weight of a newborn
child is 10 ounces. If we want to estimate the average weight of a newborn,
how large a sample will be needed for the standard error of the estimate to be
less than 3 ounces?
A- The standard error is $\sigma/\sqrt{n} = 10/\sqrt{n}$. Requiring $10/\sqrt{n} < 3$ gives
$n > (10/3)^2 = 11.1$. That is, we need a sample size of at least n = 12.
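
The same sample-size calculation can be written as a small Python sketch. The function name `min_sample_size` is an assumption introduced here, and the loop is only a guard against rounding in the floating-point square.

```python
import math


def min_sample_size(sigma, max_se):
    """Smallest n for which sigma / sqrt(n) is strictly below max_se."""
    # Start from the algebraic bound n > (sigma / max_se)^2 ...
    n = max(1, math.floor((sigma / max_se) ** 2))
    # ... and step up until the strict inequality actually holds.
    while sigma / math.sqrt(n) >= max_se:
        n += 1
    return n


# Newborn-weight example: sigma = 10 ounces, target standard error below 3.
n_newborn = min_sample_size(10, 3)
print(n_newborn)
```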
Example: The following frequency table gives the household sizes of a
random selection of 100 single-family households in a given city.

Household size 1 2 3 4 5 6 7
Frequency 11 19 28 26 11 4 1
Estimate the average size of all single-family households in the city.
A- $$\bar{X} = \frac{\sum_i X_i f_i}{\sum_i f_i} = \frac{323}{n} = \frac{323}{100} = 3.23$$
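
As a check on the frequency-table calculation, here is a minimal Python sketch; the variable names are illustrative only.

```python
# Household sizes and their observed frequencies from the table.
sizes = [1, 2, 3, 4, 5, 6, 7]
freqs = [11, 19, 28, 26, 11, 4, 1]

# Sum of X_i * f_i and total number of households n.
total = sum(size * freq for size, freq in zip(sizes, freqs))
n = sum(freqs)

# Frequency-weighted sample mean.
mean_size = total / n
print(total, n, mean_size)
```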

Example: Does (a) or (b) yield a more precise estimator of μ?
(a) The sample mean of a sample of size n from a population with mean
μ and standard deviation σ
(b) The sample mean of a sample of size 3n from a population with
mean μ and standard deviation 3σ
A- The standard error of a sample mean is the population SD divided by the
square root of the sample size. For (a) it is $\sigma/\sqrt{n}$, but for (b) it is
$3\sigma/\sqrt{3n} = \sqrt{3}\,\sigma/\sqrt{n}$, which is larger. Thus (a) yields a more
precise estimator of μ.

Point Estimator of a Population Proportion


Suppose that we are trying to estimate the proportion of a large population
that is in favor of a given proposition. Let p denote the unknown proportion.
To estimate p, a random sample should be chosen, and then p should be
estimated by the proportion of the sample that is in favor. That is, let $X_i$ be an
indicator variable defined by $X_i = 1$ when the ith element of the sample is in favor of
the proposition and $X_i = 0$ otherwise. Clearly the sample mean of the $X_i$ is the logical
choice for the estimator. Calling this estimator $\hat{p}$, we can express it as
$$\hat{p} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
It is an unbiased estimator of p, since $E(\hat{p}) = p$.
The spread of the estimator $\hat{p}$ about its mean p is measured by its standard
deviation, which is equal to
$$SD(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$$
The standard deviation of $\hat{p}$ is also called the standard error of $\hat{p}$ as an
estimator of the population proportion p. By the foregoing formula this
standard error will be small whenever the sample size n is large. In fact, since it
can be shown that for every value of p we have p(1 − p) ≤ 1/4 (the
left-hand side is maximized at p = 1/2), it follows that
$$SD(\hat{p}) \le \frac{1}{2\sqrt{n}}$$
For instance, suppose a random sample of size 900 is chosen. Then no matter
what proportion of the population is actually in favor of the proposition, it
follows that the standard error of the estimator of this proportion is less than
or equal to 1/(2√900) = 1/60.
Example : A school district is trying to determine its students’ reaction to a
proposed change in regulations. To do so, the school selected a random
sample of 50 students and questioned them. If 20 were in favor of the
proposal, then
(a) Estimate the proportion of all students who are in favor.
(b) Estimate the standard error of the estimate.
A- (a) The estimate of the proportion of all students who are in favor of the
change is 20/50 = 0.40.
(b) The standard error of the estimate is $\sqrt{p(1-p)/50}$, where p is the actual
proportion of the entire population that is in favor. Using the estimate for p of
0.4, we can estimate this standard error by $\sqrt{0.4(1-0.4)/50} = 0.0693$.
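
The school-district calculation can be sketched in Python as follows. The function name `proportion_estimate` is an assumption made for illustration; it returns the point estimate, the estimated standard error, and the worst-case bound $1/(2\sqrt{n})$.

```python
import math


def proportion_estimate(successes, n):
    """Return (p_hat, estimated SE, worst-case SE bound) for a sample proportion."""
    p_hat = successes / n
    # Plug-in estimate: replace the unknown p by p_hat.
    se_hat = math.sqrt(p_hat * (1 - p_hat) / n)
    # Bound valid for every p, because p(1 - p) <= 1/4.
    se_bound = 1 / (2 * math.sqrt(n))
    return p_hat, se_hat, se_bound


# School-district example: 20 of 50 students in favor.
p_hat, se_hat, se_bound = proportion_estimate(20, 50)
print(f"p_hat = {p_hat:.2f}, SE = {se_hat:.4f}, bound = {se_bound:.4f}")
```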
Example: A random sample of 85 students at a university revealed that 35
students owned a car. Estimate the proportion of all students at the university who
do not own a car. What is the estimate of the standard error of this estimate?
A- In the sample, 50 out of 85 students do not own a car. Thus the estimate of
their proportion is 50/85 = 0.588. The standard error can be estimated by
$\sqrt{0.588(1-0.588)/85} = 0.053$.
Example : A random sample of 1000 construction workers revealed that 122
are presently unemployed.
(a) Estimate the proportion of all construction workers who are unemployed.
(b) Estimate the standard error of the estimate in part (a).
A- (a) 0.122, (b) 0.01

Example : Los Angeles has roughly 3 times the voters of San Diego. Each city
will be voting on an election. To determine the preferences of the voters, a
random sample of 3000 Los Angeles voters and a random sample of 1000 San
Diego voters will be queried. Of the following statements, which is most
accurate?
(a) The resulting estimates of the proportions of people who will vote in the two
cities are equally accurate.
(b) The Los Angeles estimate is 3 times as accurate.
(c) The Los Angeles estimate is roughly 1.7 times as accurate.
Explain how you are interpreting the word accurate in statements (a),(b), and (c).
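A- Let the true proportions be $p_1$ for Los Angeles and $p_2$ for San Diego, with
estimates $\hat{p}_1$ and $\hat{p}_2$. The corresponding standard errors satisfy
$$SD(\hat{p}_i) = \sqrt{\frac{p_i(1-p_i)}{n_i}} \le \frac{1}{2\sqrt{n_i}}, \qquad SD(\hat{p}_1) \le \frac{1}{\sqrt{3}}\,\frac{1}{2\sqrt{1000}} \quad\text{and}\quad SD(\hat{p}_2) \le \frac{1}{2\sqrt{1000}}$$
Thus, if accuracy is interpreted as the inverse of the standard error, we expect
the estimate for Los Angeles to be more accurate by a factor of approximately
$\sqrt{3} \approx 1.7$ (its standard error is smaller by a factor of $1/\sqrt{3}$), as claimed in part (c).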
Note, however, that this claim holds only when the (unknown) proportions $p_1$ and
$p_2$ are close to each other. As an extreme example, suppose $p_1 = 0.5$ and $p_2 = 0.1$,
which yields
$$SD(\hat{p}_1) = \sqrt{\frac{0.5(1-0.5)}{3000}} = \frac{1}{2\sqrt{3}\,\sqrt{1000}} \approx 0.0091, \qquad SD(\hat{p}_2) = \sqrt{\frac{0.1(1-0.1)}{1000}} = \frac{0.3}{\sqrt{1000}} \approx 0.0095$$
Evidently, the SDs will then be approximately equal to each other.
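
A short Python sketch makes the comparison concrete. It evaluates the standard error for equal proportions in the two samples, and then for the extreme case of p = 0.5 with n = 3000 against p = 0.1 with n = 1000. The helper name `se_proportion` is an assumption for illustration.

```python
import math


def se_proportion(p, n):
    """Standard error of a sample proportion, sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


n_large, n_small = 3000, 1000

# Equal proportions: the larger sample is sqrt(3) ~ 1.7 times as accurate.
ratio_equal_p = se_proportion(0.5, n_small) / se_proportion(0.5, n_large)

# Extreme case: p = 0.5 in the large sample, p = 0.1 in the small one.
se_large = se_proportion(0.5, n_large)
se_small = se_proportion(0.1, n_small)

print(f"ratio with equal p = {ratio_equal_p:.3f}")
print(f"SE (n=3000, p=0.5) = {se_large:.4f}, SE (n=1000, p=0.1) = {se_small:.4f}")
```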


Randomizing Samples : Suppose that a company is interested in learning
about the extent of illegal drug use among its employees. However, the
company recognizes that employees might be reluctant to truthfully answer
questions on this subject even if they have been assured that their answers will
be kept in confidence. Presumably if the true answer is no, then the worker
will not hesitate to give that answer. However, if the real answer is yes, then
some workers may still answer no. Given this background, how can the
company elicit the desired information? The method is to employ a sample
randomization technique. Let us present an example as to how it may work.
To relieve any pressure to lie, the following rule for answering should be
explained to each worker before the questioning begins: After the question has
been posed, the worker is to flip a fair coin, not allowing the questioner to see
the result of the flip. If the coin lands on heads, then the worker should answer
yes to the question; and if it lands on tails, then the worker should answer the
question honestly. It should be explained to the worker that an answer of yes
does not mean that he or she is admitting to having used illegal drugs, since
that answer may have resulted solely from the coin flip’s landing on heads
(which will occur 50 percent of the time). In this manner the workers sampled
should feel assured that they can play the game truthfully and, at the same
time, preserve their privacy.
Example : How can one estimate proportions from data randomized as
explained above?
A- Let p be the proportion of workers who have used an illegal drug and q = 1 − p
the proportion of those who have not. Consider q: since a "no" answer occurs only if both
(1) the coin toss lands on tails and (2) the worker has not used
any illegal drugs, we see that P(no) = q/2. Hence, we can take the fraction
of workers sampled who answered no as our estimate of q/2; or, equivalently,
we can estimate q to be twice the proportion who answered no.
Since p = 1 − q, this will also result in an estimate of p, the proportion of all
workers who have used an illegal drug. For instance, if 70 percent of the workers
sampled answered the question in the affirmative, and so 30 percent answered no,
then we would estimate that q was equal to 2(0.3) = 0.6. That is, we would
estimate that 60 percent of the population has not, and so 40 percent of the
population has, used an illegal drug.
Example: Suppose that the same randomization scheme is employed. If a
sample of 50 people results in 32 yes answers, what is the estimate of p, the
proportion who would truthfully answer yes?
A- There are 50 − 32 = 18 no answers, so P(no) = 18/50 = q/2, which gives
q = 18/25 and p = 1 − q = 7/25 = 0.28.
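
The randomized-response estimate can be sketched in Python as below. The function name is an assumption for illustration, and the sketch assumes the scheme described above: a fair coin, a forced "yes" on heads, and an honest answer on tails.

```python
def randomized_response_estimate(n_yes, n_total):
    """Estimate p under the fair-coin scheme (heads: say yes, tails: answer honestly)."""
    # Only honest non-users whose coin landed on tails answer "no",
    # so P(no) = q / 2 and q is estimated by twice the "no" fraction.
    prop_no = (n_total - n_yes) / n_total
    q_hat = 2 * prop_no
    return 1 - q_hat


p_hat_50 = randomized_response_estimate(32, 50)    # 32 yes answers out of 50
p_hat_70 = randomized_response_estimate(70, 100)   # 70 percent yes answers
print(round(p_hat_50, 2), round(p_hat_70, 2))
```

Note that the estimate would be negative if fewer than half of the answers were "yes", which can happen by chance in small samples; the sketch does not guard against that case.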
Estimating a Population Variance
The sample variance $S^2$, defined by
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$
is an unbiased estimator of the population variance, since $E(S^2) = \sigma^2$. When
the population mean μ is known, it is appropriate to use
$$\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \mu\right)^2$$
as the estimator. The identity
$$\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2$$
can be used in calculating variance estimates:
$$S^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right)$$

Example : Estimate the mean and the variance for the two samples of size ten.
(a) X={48, 22, 19, 65, 72, 37, 55, 60, 49, 28}
(b) X={776, 810, 790, 788, 822, 806, 795, 807, 812, 791}
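A- (a) Consider Y = X − 50 = {-2, -28, -31, 15, 22, -13, 5, 10, -1, -22} for convenience.
$$\bar{Y} = \frac{1}{10}\sum_{i=1}^{n} Y_i = -4.5, \qquad \bar{X} = 50 + \bar{Y} = 45.5$$
$$Var(Y) = \frac{1}{9}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2 = \frac{1}{9}\sum_{i=1}^{n} Y_i^2 - \frac{10}{9}\bar{Y}^2 = \frac{3237}{9} - \frac{10}{9}(20.25) = 337.17 = Var(X)$$
(b) Consider Y = X − 790 = {-14, 20, 0, -2, 32, 16, 5, 17, 22, 1} for convenience.
$$\bar{Y} = \frac{1}{10}\sum_{i=1}^{n} Y_i = 9.7, \qquad \bar{X} = 790 + \bar{Y} = 799.7$$
$$Var(Y) = \frac{1}{9}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2 = \frac{1}{9}\sum_{i=1}^{n} Y_i^2 - \frac{10}{9}\bar{Y}^2 = \frac{2679}{9} - \frac{10}{9}(94.09) = 193.12 = Var(X)$$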
Example: Estimate the population mean μ and the population variance σ2 for
the data sample X={104, 110, 114, 97, 105, 113, 106, 101, 100, 107}.
Recalculate variance supposing it is known that the population mean is 104.
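A- Consider Y = X − 105 = {-1, 5, 9, -8, 0, 8, 1, -4, -5, 2} for convenience.
$$\bar{Y} = \frac{1}{10}\sum_{i=1}^{n} Y_i = 0.7, \qquad \bar{X} = 105 + \bar{Y} = 105.7$$
$$Var(Y) = \frac{1}{9}\sum_{i=1}^{10} Y_i^2 - \frac{10}{9}\bar{Y}^2 = 31.22 - 0.54 = 30.68 = Var(X)$$
When it is known that the population mean is 104, we replace
$\bar{X} = 105.7$ with $\mu = 104$ or, equivalently, $\bar{Y} = 0.7$ with $\mu_Y = 104 - 105 = -1$,
and divide by n instead of n − 1. The shortcut identity no longer applies,
because $\mu_Y$ is not the sample mean, so the square must be expanded in full.
We then obtain
$$Var(Y) = \frac{1}{10}\sum_{i=1}^{10}\left(Y_i - \mu_Y\right)^2 = \frac{1}{10}\sum_{i=1}^{10} Y_i^2 - 2\mu_Y\bar{Y} + \mu_Y^2 = 28.1 + 1.4 + 1 = 30.5 = Var(X)$$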
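
Both variance estimators can be evaluated directly, as in this minimal Python sketch with illustrative variable names. It computes $S^2$ from the definition and from the shortcut identity, and then the known-mean estimator $\frac{1}{n}\sum (X_i - \mu)^2$ with μ = 104 straight from its definition, for which it gives 30.5.

```python
data = [104, 110, 114, 97, 105, 113, 106, 101, 100, 107]
n = len(data)
xbar = sum(data) / n

# Sample variance from the definition (divisor n - 1).
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)

# Same quantity via the identity sum(X_i^2) - n * xbar^2.
s2_shortcut = (sum(x * x for x in data) - n * xbar ** 2) / (n - 1)

# Known population mean: deviations are taken about mu, divisor is n.
mu_known = 104
var_known_mean = sum((x - mu_known) ** 2 for x in data) / n

print(f"xbar = {xbar:.1f}, S^2 = {s2:.2f}, known-mean estimate = {var_known_mean:.1f}")
```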
Interval estimator of the mean – Normal population with known SD
When we estimate a parameter by a point estimator, we do not expect the
resulting estimator to exactly equal the parameter, but we expect that it will
be “close” to it. To be more specific, we can try to find an interval about the
point estimator in which we can be highly confident that the parameter lies.
Such an interval is called an interval estimator.
Definition An interval estimator of a population parameter is an interval that
is predicted to contain the parameter. The confidence we ascribe to the
interval is the probability that it will contain the parameter.
Let X1, . . . , Xn be a sample of size n from a normal population having known
standard deviation σ, and suppose we want to utilize this sample to obtain a 95
percent confidence interval estimator for the population mean μ. To obtain
such an interval, we start with the sample mean $\bar{X}$, which is the point estimator
of μ. We now make use of the fact that $\bar{X}$ is normal with mean μ and standard
deviation σ/√n, which implies that the standardized variable Z,
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \sqrt{n}\,\frac{\bar{X} - \mu}{\sigma}$$
has a standard normal distribution. It follows that 95 percent of the time the
absolute value of Z is less than or equal to 1.96.
Thus, we can write
$$P\left(|Z| = \left|\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right| \le 1.96\right) = 0.95 \;\Rightarrow\; P\left(\left|\bar{X} - \mu\right| \le 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
$$P\left(-1.96\,\frac{\sigma}{\sqrt{n}} \le \bar{X} - \mu \le 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
That is, with 95 percent probability, the interval $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$ will contain
the population mean.
Interval estimation for given percentiles
When we denote by $z_\alpha$ the percentile of the standard normal distribution
defined by $P(Z \le z_\alpha) = 1 - \alpha$, then from the table of the standard normal
distribution we obtain, for some frequently used confidence levels:
Confidence Level Percentiles
Confidence level 100(1 − α)    Value of α    Value of z_{α/2}
90                             0.10          z_{0.05} = 1.645
95                             0.05          z_{0.025} = 1.960
99                             0.01          z_{0.005} = 2.576
The interval $\bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ is called a 100(1 − α) percent
confidence interval estimator of the population mean μ.
Determining the Necessary Sample Size
The length of the 100(1 − α) percent confidence interval estimator of the
population mean will be less than or equal to b when the sample size n
satisfies
$$n \ge \left(\frac{2\,z_{\alpha/2}\,\sigma}{b}\right)^2$$
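
The percentile values and the sample-size rule can be reproduced with Python's standard library, as in the sketch below. The function names and the inputs σ = 3 and b = 2 are hypothetical, chosen only for illustration; `statistics.NormalDist` requires Python 3.8 or later.

```python
import math
from statistics import NormalDist


def z_upper(alpha):
    """z_alpha defined by P(Z <= z_alpha) = 1 - alpha for standard normal Z."""
    return NormalDist().inv_cdf(1 - alpha)


def sample_size_for_length(sigma, b, confidence):
    """Smallest n whose two-sided interval has total length at most b."""
    alpha = 1 - confidence
    z = z_upper(alpha / 2)
    return math.ceil((2 * z * sigma / b) ** 2)


# Percentiles for the 90, 95 and 99 percent confidence levels.
z90, z95, z99 = z_upper(0.05), z_upper(0.025), z_upper(0.005)

# Hypothetical inputs: sigma = 3, desired interval length b = 2, 95 percent level.
n_needed = sample_size_for_length(sigma=3, b=2, confidence=0.95)
print(f"{z90:.3f} {z95:.3f} {z99:.3f} n = {n_needed}")
```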
Interval estimation
To estimate the unknown mean μ of a population whose standard deviation σ is
known, suppose a sample of size n is drawn from the population.
The mean computed from this sample can be used as a guess for the population
mean: $\bar{X} \approx \mu$.
Interval estimation is the problem of determining the probability (1 − α) that
the "distance" between the sample mean and the population mean stays below a
given value (b/2):
$$P\left(\bar{X} - b/2 \le \mu \le \bar{X} + b/2\right) = 1 - \alpha \;\Leftrightarrow\; P\left(\left|\bar{X} - \mu\right| \le b/2\right) = 1 - \alpha$$
Given the probability, we can determine the interval:
$$P\left(-b/2 \le \bar{X} - \mu \le b/2\right) = 1 - \alpha \;\Rightarrow\; P\left(-\frac{b\sqrt{n}}{2\sigma} \le Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{b\sqrt{n}}{2\sigma}\right) = 1 - \alpha$$
Defining $z_{\alpha/2} = \frac{b\sqrt{n}}{2\sigma}$, that is, $b = 2\,z_{\alpha/2}\,\sigma/\sqrt{n}$, and writing Φ for the
standard normal distribution function, the relation above gives
$$\Phi(z_{\alpha/2}) - \Phi(-z_{\alpha/2}) = 1 - \alpha \;\Rightarrow\; \Phi(z_{\alpha/2}) = 1 - \alpha/2$$
For example, for 1 − α = 0.9 we find
$$\Phi(z_{\alpha/2}) = 0.95 \;\Rightarrow\; z_{\alpha/2} = 1.65 \;\Rightarrow\; b/2 = 1.65\,\frac{\sigma}{\sqrt{n}}, \qquad \left|\mu - \bar{X}\right| \le \frac{b}{2}$$
that is, with 90% probability the population mean differs from the sample mean
by at most b/2.
One-sided interval estimation:
In some problems we are interested not in the interval in which the population
mean may lie, but only in probabilities concerning the largest or smallest value
it can take. In that case, a minimum value of μ that holds with probability
(1 − α) can be determined as follows:
$$P\left(\bar{X} - b \le \mu\right) = 1 - \alpha \;\Rightarrow\; P\left(\mu + b \ge \bar{X}\right) = 1 - \alpha$$
With $z_\alpha = b\sqrt{n}/\sigma$, that is, $b = z_\alpha\,\sigma/\sqrt{n}$, we obtain
$\Phi(z_\alpha) = 1 - \alpha \;\Rightarrow\; \mu \ge \bar{X} - b$.
The maximum value of μ can be found similarly:
$$P\left(\bar{X} + b \ge \mu\right) = 1 - \alpha \;\Rightarrow\; P\left(\mu - b \le \bar{X}\right) = 1 - \alpha$$
With $z_\alpha = b\sqrt{n}/\sigma$ and $P(Z \ge -z_\alpha) = 1 - \alpha$, we obtain
$\Phi(z_\alpha) = 1 - \alpha \;\Rightarrow\; \mu \le \bar{X} + b$.
As an example, take 1 − α = 0.99. Then $\Phi(z_\alpha) = 0.99$, $z_\alpha = 2.33$, and
with 99% probability the inequality $\mu \ge \mu_{min} = \bar{X} - 2.33\,\sigma/\sqrt{n}$ holds.
Similarly, with 99% probability the maximum value of μ satisfies
$\mu \le \mu_{max} = \bar{X} + 2.33\,\sigma/\sqrt{n}$.
Difference from the two-sided interval: the probability on each side is
1 − α instead of 1 − α/2.
Example: The following are data from a normal population with standard
deviation 3: {3, 5, 4, 8, 12, 11, 7, 14, 12, 15, 10}
(a) With 95 percent confidence, find the maximum value of the population
mean.
(b) With 99 percent confidence, find the minimum value of the
population mean.
A- The sample mean is calculated as $\bar{X} = 101/11 = 9.18$.
(a) $P\left(\mu - b \le \bar{X}\right) = 1 - \alpha$ with $Z = \frac{\bar{X} - \mu}{3/\sqrt{11}} \ge -z_\alpha$, $\Phi(z_\alpha) = 0.95$, $z_\alpha = 1.65$
$\mu \le \mu_{max} = \bar{X} + 1.65 \cdot 3/\sqrt{11} = \bar{X} + 1.49 = 10.67$
$P\left(\mu \le \mu_{max} = 10.67\right) = P(Z \ge -1.65) = 1 - \Phi(-1.65) = \Phi(1.65) = 0.95$
Thus with a probability of 95% the population mean will be less than
10.67.
(b) $P\left(\bar{X} - \mu \le b\right) = 0.99$ with $Z = \frac{\bar{X} - \mu}{3/\sqrt{11}} \le \frac{b\sqrt{11}}{3} = z_\alpha = 2.33$, so $\mu \ge \mu_{min} = \bar{X} - 2.11 = 7.07$
$P\left(\mu < \mu_{min} = 7.07\right) = P(Z > 2.33) = 1 - \Phi(2.33) = 0.01$
Thus with a probability of 99% the population mean will be larger than 7.07.
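
The one-sided bounds in this example can be checked with the following Python sketch. The function names are assumptions for illustration. Because the code uses unrounded normal percentiles rather than the table values 1.65 and 2.33, its results can differ from the hand calculation in the last digit.

```python
import math
from statistics import NormalDist

data = [3, 5, 4, 8, 12, 11, 7, 14, 12, 15, 10]
sigma = 3                      # known population standard deviation
n = len(data)
xbar = sum(data) / n
se = sigma / math.sqrt(n)      # standard error of the sample mean


def upper_bound(confidence):
    """mu_max such that mu <= mu_max with the given confidence."""
    return xbar + NormalDist().inv_cdf(confidence) * se


def lower_bound(confidence):
    """mu_min such that mu >= mu_min with the given confidence."""
    return xbar - NormalDist().inv_cdf(confidence) * se


mu_max_95 = upper_bound(0.95)
mu_min_99 = lower_bound(0.99)
print(f"xbar = {xbar:.2f}, mu_max = {mu_max_95:.2f}, mu_min = {mu_min_99:.2f}")
```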
Interval estimator of the mean – Normal population with unknown SD
Since σ is no longer known, it is natural to replace it by its estimator S,
the sample standard deviation. However, this replacement affects the
probability distribution: the variable Z of the standard normal
distribution now becomes the variable $T_{n-1}$ of the so-called t distribution
with n − 1 degrees of freedom.
$$Z = \sqrt{n}\,\frac{\bar{X} - \mu}{\sigma} \;\longrightarrow\; T_{n-1} = \sqrt{n}\,\frac{\bar{X} - \mu}{S}$$
The density function of a t random variable, like a standard normal random
variable, is symmetric about zero. It looks similar to a standard normal
density, although it is somewhat more spread out, resulting in its having
“larger tails.” As the degree of freedom parameter increases, the density
becomes more and more similar to the standard normal density. For sample
sizes n>30 the two distributions become practically identical. For smaller
sample sizes we have to replace Z by T and obtain its value from the table
given in the textbook.
Example: Consider the data of a sample of size n=20
{16, 0, 0, 2, 3, 6, 8, 2, 5, 0, 12, 10, 5, 7, 2, 3, 8, 17, 9, 1}
For the population mean obtain a 95 percent confidence interval.
A- The sample mean and SD are obtained as
$$\bar{X} = 116/20 = 5.8, \qquad S^2 = \frac{1}{19}\left(\sum X_i^2 - 20\bar{X}^2\right) = 25.85, \qquad S = 5.08$$
$$1 - \alpha/2 = 0.975 \;\Rightarrow\; t_{19}(0.025) = 2.093 \;\Rightarrow\; -2.093 \le \sqrt{n}\,\frac{\bar{X} - \mu}{S} \le 2.093 \;\Rightarrow\; \mu = 5.8 \pm 2.38$$
Thus, with 95 percent confidence, the population mean lies in the interval (3.42, 8.18).
It is interesting to note that if Z were used in place of T, one would
replace 2.093 by 1.96 and obtain the interval 5.8 ± 2.23, which is
on the optimistic side but not substantially far from the correct result.
Note, however, that the error resulting from using Z instead of T increases
as the sample size decreases.
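
The interval for this sample can be recomputed with a short Python sketch. The t percentile 2.093 is taken from the text rather than computed, since the Python standard library has no t distribution; the variable names are illustrative.

```python
import math

data = [16, 0, 0, 2, 3, 6, 8, 2, 5, 0, 12, 10, 5, 7, 2, 3, 8, 17, 9, 1]
n = len(data)
xbar = sum(data) / n

# Sample standard deviation with divisor n - 1.
s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))

T_19 = 2.093   # t percentile for 19 degrees of freedom, as quoted in the text
Z_975 = 1.96   # standard normal percentile, for comparison

half_width_t = T_19 * s / math.sqrt(n)
half_width_z = Z_975 * s / math.sqrt(n)

print(f"t interval: {xbar - half_width_t:.2f} to {xbar + half_width_t:.2f}")
print(f"z half-width {half_width_z:.2f} vs t half-width {half_width_t:.2f}")
```

The z-based interval comes out narrower, which is the optimism the text warns about for small samples.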
Interval Estimators of Population Proportion
We recall that the proportion of the sample having a certain characteristic
(which has population probability p) is determined by calculating the sample mean
of the indicator variable X,
$$\hat{p} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
and that its expected value and standard deviation are
$$E(\hat{p}) = p, \qquad SD(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$$
When n is large enough that both np and n(1 − p) are greater than 5, we can
use the normal approximation to the binomial distribution to assert that an
approximate 100(1 − α) percent confidence interval estimator of p is given by
$$p \in \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Note that in calculating the SD, p is replaced by its estimate $\hat{p}$:
$$SD(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} \approx \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Also recall that $P(Z \le z_\alpha) = 1 - \alpha$.


Example: On December 24, 1991, The New York Times reported that a
poll indicated that 46 percent of the population was in favor of the way that
President Bush was handling the economy, with a confidence level of 95 %
and a margin of error of ±3 percent. What does this mean? Can we infer
how many people were questioned?
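A- It means that the 95 percent confidence interval for the proportion in favor
is 0.46 ± 0.03. We can assume that the sample size is large enough for the normal
approximation to be applicable. Then, since the sample proportion and its SD are
$$\hat{p} = 0.46, \qquad SD(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \quad\text{and}\quad z_{\alpha/2} = 1.96$$
the confidence interval is
$$p \in \hat{p} \pm z_{\alpha/2}\,SD(\hat{p}) = 0.46 \pm 1.96\sqrt{\frac{0.46(0.54)}{n}}$$
hence
$$1.96\sqrt{\frac{0.46(0.54)}{n}} = 0.03 \;\Rightarrow\; n \approx 1060$$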
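
Solving the margin-of-error equation for n can be sketched in Python as follows. The function name `implied_sample_size` is an assumption for illustration, and the calculation presumes the poll used the plain normal-approximation interval.

```python
from statistics import NormalDist


def implied_sample_size(p_hat, margin, confidence):
    """Solve z * sqrt(p_hat * (1 - p_hat) / n) = margin for n."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p_hat * (1 - p_hat) * (z / margin) ** 2


# Poll example: 46 percent in favor, margin of error 3 points, 95 percent level.
n_poll = implied_sample_size(0.46, 0.03, 0.95)
print(round(n_poll))
```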
Review Problems
1. Consider a sample of n = 100. Given that the sample mean and SD are
320 and 16, respectively, determine the 95% confidence interval for the
population mean.
A- α/2 = 0.025, $z_{\alpha/2} = 1.96$. Hence the confidence interval is
320 ± 1.96 SD/10 = 320 ± 3.136, i.e., 316.864 to 323.136.
$$P(316.864 \le \mu \le 323.136) = P(-1.96 \le Z \le 1.96) = 2\Phi(1.96) - 1 = 0.95$$

2. A sample of n = 50 000 is used for determining the
unemployment rate. Considering a 95% confidence level, what
would be the margin of error in the outcome?
A- Let $\hat{p}$ be the unemployment rate obtained from the poll and p the true rate.
The standard deviation of $\hat{p}$ is bounded by
$$SD = \sqrt{\frac{p(1-p)}{n}} \le \frac{1}{2\sqrt{n}} \approx 2.2 \times 10^{-3}$$
95% of the time the outcome will be within 2 SDs of the true rate; thus the
margin of error is estimated as at most about 0.0044.
3. Among a random sample of 1000 there are 518 females. Determine
the 95% confidence interval for the population proportion of females.
A- Since the population SD is not known, we have to approximate it with the sample
SD. For a sample size of 1000 we can use the normal approximation rather than
the t distribution. Hence,
$$\hat{p} = 0.518; \qquad SD = \sqrt{\frac{\hat{p}(1-\hat{p})}{1000}} = 0.0158$$
$$p \in \hat{p} \pm 1.96\,SD = 0.518 \pm 0.031, \quad\text{i.e.,}\quad 0.487 \le p \le 0.549$$

4. The sample average of a random sample of 36 is given as 35. Determine the
95% confidence interval for the population mean when the population SD is
known to be (a) 3 and (b) 12.
A- (a) $\mu = 35 \pm 1.96 \cdot \frac{3}{6} = 35 \pm 0.98$
(b) $\mu = 35 \pm 1.96 \cdot \frac{12}{6} = 35 \pm 3.92$
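
The two intervals in Problem 4 can be reproduced with a short Python sketch; the function name `mean_confidence_interval` is an assumption for illustration.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Two-sided interval xbar +/- z_{alpha/2} * sigma / sqrt(n), sigma known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


ci_a = mean_confidence_interval(35, 3, 36)    # population SD 3
ci_b = mean_confidence_interval(35, 12, 36)   # population SD 12
print(f"(a) {ci_a[0]:.2f} to {ci_a[1]:.2f}")
print(f"(b) {ci_b[0]:.2f} to {ci_b[1]:.2f}")
```

Multiplying the population SD by 4 multiplies the interval width by 4, since the width is proportional to σ for a fixed n.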
