ESTIMATION
Household size 1 2 3 4 5 6 7
Frequency 11 19 28 26 11 4 1
Estimate the average size of all single-family households in the city.
A-
$$\bar{X} = \frac{\sum_i X_i f_i}{\sum_i f_i} = \frac{323}{100} = 3.23$$
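The frequency-weighted mean above can be checked with a few lines of Python. This is only a sketch; the variable names are illustrative and not part of the notes.

```python
# Household sizes and their observed frequencies (from the table above).
sizes = [1, 2, 3, 4, 5, 6, 7]
freqs = [11, 19, 28, 26, 11, 4, 1]

n = sum(freqs)                                    # total number of households sampled
total = sum(x * f for x, f in zip(sizes, freqs))  # sum of X_i * f_i
mean_size = total / n                             # frequency-weighted sample mean

print(n, total, mean_size)
```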
Example : Los Angeles has roughly 3 times the voters of San Diego. Each city
will be voting on an election. To determine the preferences of the voters, a
random sample of 3000 Los Angeles voters and a random sample of 1000 San
Diego voters will be queried. Of the following statements, which is most
accurate?
(a) The resulting estimates of the proportions of people who will vote in the two
cities are equally accurate.
(b) The Los Angeles estimate is 3 times as accurate.
(c) The Los Angeles estimate is roughly 1.7 times as accurate.
Explain how you are interpreting the word accurate in statements (a),(b), and (c).
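A- Let the estimated proportions be p1 for LA and p2 for San Diego. The
corresponding standard error estimates will be
$$SD(p_i) = \sqrt{\frac{p_i(1-p_i)}{n_i}} \le \frac{1}{2\sqrt{n_i}} \;\Rightarrow\; SD(p_1) \le \frac{1}{\sqrt{3}}\cdot\frac{1}{2\sqrt{1000}} \quad\text{and}\quad SD(p_2) \le \frac{1}{2\sqrt{1000}}$$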
Thus we expect the standard error for LA to be smaller by a factor of
approximately 1/√3, that is, the LA estimate will be about √3 ≈ 1.7 times as
accurate, as claimed in part (c).
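Note, however, that this claim holds only when the (unknown) proportions p1 and
p2 are close to each other. As an extreme example, suppose p1 = 0.5 for LA and
p2 = 0.1 for San Diego, which yields
$$SD(p_1) = \sqrt{\frac{0.5(1-0.5)}{3\cdot 1000}} = \frac{1}{\sqrt{3}}\cdot\frac{1}{2\sqrt{1000}} \approx 0.0091, \qquad SD(p_2) = \sqrt{\frac{0.1(1-0.1)}{1000}} = \frac{0.3}{\sqrt{1000}} \approx 0.0095$$
so in this case the two estimates are almost equally accurate.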
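A short Python sketch of the comparison: it evaluates sqrt(p(1-p)/n) for equal proportions and for the extreme case, where p = 0.5 goes with the sample of 3000 and p = 0.1 with the sample of 1000. The helper name `se_proportion` and the common value p = 0.4 are assumptions made for illustration.

```python
from math import sqrt

def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p(1-p)/n)."""
    return sqrt(p * (1 - p) / n)

# Equal proportions (p = 0.4 is an arbitrary illustrative value):
# the ratio of the standard errors is sqrt(3), whatever p is.
ratio_equal = se_proportion(0.4, 1000) / se_proportion(0.4, 3000)

# Extreme case: p = 0.5 with n = 3000 versus p = 0.1 with n = 1000.
se_large_sample = se_proportion(0.5, 3000)
se_small_sample = se_proportion(0.1, 1000)

print(ratio_equal, se_large_sample, se_small_sample)
```

With equal proportions the factor is sqrt(3) regardless of p. In the extreme case the two standard errors nearly coincide.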
Variance estimation:
$$S^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right)$$
Example : Estimate the mean and the variance for the two samples of size ten.
(a) X={48, 22, 19, 65, 72, 37, 55, 60, 49, 28}
(b) X={776, 810, 790, 788, 822, 806, 795, 807, 812, 791}
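A- (a) Consider Y=X-50={-2,-28,-31,15,22,-13,5,10,-1,-22} for convenience
$$\bar{Y} = \frac{1}{10}\sum_{i=1}^{n} Y_i = -4.5, \qquad \bar{X} = 50 + \bar{Y} = 45.5$$
$$Var(Y) = \frac{1}{9}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2 = \frac{1}{9}\sum_{i=1}^{n} Y_i^2 - \frac{10}{9}\bar{Y}^2 = \frac{3237}{9} - \frac{10}{9}(20.25) = 337.17 = Var(X)$$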
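(b) Consider Y=X-790={-14,20,0,-2,32,16,5,17,22,1} for convenience
$$\bar{Y} = \frac{1}{10}\sum_{i=1}^{n} Y_i = 9.7, \qquad \bar{X} = 790 + \bar{Y} = 799.7$$
$$Var(Y) = \frac{1}{9}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2 = \frac{1}{9}\sum_{i=1}^{n} Y_i^2 - \frac{10}{9}\bar{Y}^2 = \frac{2679}{9} - \frac{10}{9}(94.09) = 193.12 = Var(X)$$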
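The shift-then-compute procedure used in both parts can be sketched in Python as follows. The function name `shifted_mean_var` is hypothetical. The standard library's `statistics.variance`, which also divides by n - 1, serves as an independent cross-check.

```python
from statistics import variance

def shifted_mean_var(xs, shift):
    """Mean and sample variance of xs, computed from Y = X - shift."""
    ys = [x - shift for x in xs]
    n = len(ys)
    y_bar = sum(ys) / n
    # Shortcut formula: (sum of Y_i^2 - n * Y_bar^2) / (n - 1)
    var_y = (sum(y * y for y in ys) - n * y_bar ** 2) / (n - 1)
    # A constant shift moves the mean but leaves the variance unchanged.
    return shift + y_bar, var_y

sample_a = [48, 22, 19, 65, 72, 37, 55, 60, 49, 28]
sample_b = [776, 810, 790, 788, 822, 806, 795, 807, 812, 791]

mean_a, var_a = shifted_mean_var(sample_a, 50)
mean_b, var_b = shifted_mean_var(sample_b, 790)

# Cross-check against the library's sample variance.
print(mean_a, var_a, variance(sample_a))
print(mean_b, var_b, variance(sample_b))
```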
Example: Estimate the population mean μ and the population variance σ2 for
the data sample X={104, 110, 114, 97, 105, 113, 106, 101, 100, 107}.
Recalculate variance supposing it is known that the population mean is 104.
A- Consider Y=X-105={-1,5,9,-8,0,8,1,-4,-5,2} for convenience
$$\bar{Y} = \frac{1}{10}\sum_{i=1}^{n} Y_i = 0.7, \qquad \bar{X} = 105 + \bar{Y} = 105.7$$
$$Var(Y) = \frac{1}{9}\sum_{i=1}^{10} Y_i^2 - \frac{10}{9}\bar{Y}^2 = 31.22 - 0.54 = 30.68 = Var(X)$$
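For the second part of the question (population mean known to be 104), one common approach is to average the squared deviations from the known mean, dividing by n rather than n - 1, since no degree of freedom is spent estimating the mean. The sketch below assumes this is the estimator the exercise intends.

```python
xs = [104, 110, 114, 97, 105, 113, 106, 101, 100, 107]
mu = 104  # population mean, taken as known in this part of the example
n = len(xs)

# Assumed estimator for known mean: average squared deviation from mu (divisor n).
var_known_mean = sum((x - mu) ** 2 for x in xs) / n

print(var_known_mean)
```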
Interval estimator of the mean – Normal population with known SD
When we estimate a parameter by a point estimator, we do not expect the
resulting estimator to exactly equal the parameter, but we expect that it will
be “close” to it. To be more specific, we can try to find an interval about the
point estimator in which we can be highly confident that the parameter lies.
Such an interval is called an interval estimator.
Definition An interval estimator of a population parameter is an interval that
is predicted to contain the parameter. The confidence we ascribe to the
interval is the probability that it will contain the parameter.
Let X1, . . . , Xn be a sample of size n from a normal population having known
standard deviation σ, and suppose we want to utilize this sample to obtain a 95
percent confidence interval estimator for the population mean μ. To obtain
such an interval, we start with the sample mean $\bar{X}$, which is the point estimator
of μ. We now make use of the fact that $\bar{X}$ is normal with mean μ and standard
deviation σ/√n, which implies that the standardized variable Z,
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \sqrt{n}\,\frac{\bar{X} - \mu}{\sigma}$$
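has a standard normal distribution. It follows that 95 percent of the time the
absolute value of Z is less than or equal to 1.96.
Thus, we can write
$$|Z| \le 1.96 \;\Rightarrow\; P\left(\frac{|\bar{X} - \mu|}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95 \;\Rightarrow\; P\left(|\bar{X} - \mu| \le 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
$$P\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
That is, with 95 percent probability, the interval $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$ will contain
the population mean.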
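As a minimal sketch, the interval can be computed in Python with the standard library's `statistics.NormalDist` supplying the normal percentile. The function name `z_interval`, the sample values and σ = 0.3 are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xs, sigma, confidence=0.95):
    """Two-sided confidence interval for the mean of a normal population
    with known standard deviation sigma."""
    n = len(xs)
    x_bar = sum(xs) / n
    # z such that P(|Z| <= z) = confidence, e.g. about 1.96 for 0.95
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical sample of size 9 with an assumed known sigma of 0.3.
sample = [5.1, 4.7, 5.6, 4.9, 5.2, 5.0, 4.8, 5.3, 5.4]
lower, upper = z_interval(sample, sigma=0.3)
print(lower, upper)
```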
Interval estimation for given percentiles
When we denote by $z_\alpha$ the percentile value for probability α, defined by
$P(Z \le z_\alpha) = 1-\alpha$, then from the table of the standard normal distribution we
obtain for some frequently used percentile values:
Confidence Level Percentiles
Confidence level 100(1 − α)    Corresponding value of α    Value of zα/2
90                             0.10                        z0.05 = 1.645
95                             0.05                        z0.025 = 1.960
99                             0.01                        z0.005 = 2.576
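The tabulated percentiles can be reproduced without a printed table. The sketch below uses `statistics.NormalDist.inv_cdf` from the Python standard library; the dictionary name `z_table` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

z_table = {}
for level in (90, 95, 99):
    alpha = 1 - level / 100
    # z_{alpha/2} satisfies P(Z <= z_{alpha/2}) = 1 - alpha/2
    z_table[level] = round(std_normal.inv_cdf(1 - alpha / 2), 3)

print(z_table)
```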
The interval $\bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ is called a 100(1 − α) percent
confidence interval estimator of the population mean μ.
Determining the Necessary Sample Size
The length of the 100(1 − α) percent confidence interval estimator of the
population mean will be less than or equal to b when the sample size n
satisfies
$$n \ge \left(\frac{2\,z_{\alpha/2}\,\sigma}{b}\right)^2$$
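A hedged sketch of the sample-size rule: since n must be an integer, the bound is rounded up. The function name `required_sample_size` and the inputs σ = 3 and b = 2 are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def required_sample_size(sigma, b, confidence=0.95):
    """Smallest n for which the interval length 2*z*sigma/sqrt(n) is at most b."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((2 * z * sigma / b) ** 2)

# Hypothetical inputs: sigma = 3, desired total interval length b = 2.
n_needed = required_sample_size(sigma=3, b=2)

# Check: the resulting interval length does not exceed b.
z95 = NormalDist().inv_cdf(0.975)
length = 2 * z95 * 3 / sqrt(n_needed)
print(n_needed, length)
```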
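Interval estimation
To estimate the unknown mean μ of a population with known standard deviation σ,
let a sample of size n be drawn from the population.
The mean computed from this sample can be used to make an assumption about
the population mean: $\mu \approx \bar{X}$.
Interval estimation is the problem of determining the probability (1−α) that the
"distance" between the sample mean and the population mean stays below a
given value (b/2).
$$P\left(\bar{X} - b/2 \le \mu \le \bar{X} + b/2\right) = 1-\alpha$$
$$P\left(|\bar{X} - \mu| \le b/2\right) = 1-\alpha$$
Given the probability, we can determine the interval:
$$P\left(-b/2 \le \bar{X} - \mu \le b/2\right) = 1-\alpha \;\Rightarrow\; P\left(-\frac{b\sqrt{n}}{2\sigma} \le Z \le \frac{b\sqrt{n}}{2\sigma}\right) = 1-\alpha$$
Defining $z_{\alpha/2} = \frac{b\sqrt{n}}{2\sigma}$, that is, $b = 2\,z_{\alpha/2}\,\sigma/\sqrt{n}$, the relation above gives
$$\Phi(z_{\alpha/2}) - \Phi(-z_{\alpha/2}) = 1-\alpha \;\Rightarrow\; \Phi(z_{\alpha/2}) = 1-\alpha/2.$$
For example, for 1−α = 0.9,
$$\Phi(z_{\alpha/2}) = 0.95 \;\Rightarrow\; z_{\alpha/2} = 1.65 \;\Rightarrow\; b/2 = 1.65\,\frac{\sigma}{\sqrt{n}}, \qquad |\bar{X} - \mu| \le \frac{b}{2}$$
is obtained; that is, with 90% probability the population mean differs from the
sample mean by at most b/2.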
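One-sided interval estimation:
In some problems we are interested not in the interval in which the population mean
may lie, but only in probabilities concerning the largest or smallest value it can take.
In this case the minimum value of μ can be determined with probability (1−α) as follows:
$$P\left(\mu \ge \bar{X} - b\right) = 1-\alpha \;\Rightarrow\; P\left(\bar{X} - \mu \le b\right) = 1-\alpha$$
With $z_\alpha = \frac{b\sqrt{n}}{\sigma}$ we have $\Phi(z_\alpha) = 1-\alpha$, and
$$\mu \ge \bar{X} - z_\alpha\,\frac{\sigma}{\sqrt{n}}$$
is obtained.
The maximum value of μ can be found similarly:
$$P\left(\mu \le \bar{X} + b\right) = 1-\alpha \;\Rightarrow\; P\left(-b \le \bar{X} - \mu\right) = 1-\alpha$$
With $z_\alpha = \frac{b\sqrt{n}}{\sigma}$ we have $P(Z \ge -z_\alpha) = 1-\alpha$, so $\Phi(z_\alpha) = 1-\alpha$, and
$$\mu \le \bar{X} + z_\alpha\,\frac{\sigma}{\sqrt{n}}$$
is obtained.
As an example take 1−α = 0.99. Then Φ(z_α) = 0.99 gives z_α = 2.33, and with
99% probability the inequality $\mu \ge \bar{X} - 2.33\,\sigma/\sqrt{n}$ holds.
Similarly, with 99% probability the maximum value satisfies the inequality
$\mu \le \bar{X} + 2.33\,\sigma/\sqrt{n}$.
Difference from the two-sided interval: the probability on each side is
1−α instead of 1−α/2.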
Example : The following are data from a normal population with standard
deviation 3: {3, 5, 4, 8, 12, 11, 7, 14, 12, 15, 10}
(a) With 95 percent confidence, find the maximum value of the population
mean.
(b) With 99 percent confidence, find the minimum value of the
population mean.
Thus with a probability of 99 % the population mean will be larger than 7.07.
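Both one-sided bounds for this example can be sketched in Python. The data are read as eleven values (3 and 5 as separate observations), an assumption that is consistent with the stated 99 percent bound. The percentiles 1.645 and 2.33 are the tabulated values used in these notes.

```python
from math import sqrt

data = [3, 5, 4, 8, 12, 11, 7, 14, 12, 15, 10]  # read as n = 11 observations
sigma = 3                                        # known population SD
n = len(data)
x_bar = sum(data) / n
se = sigma / sqrt(n)                             # standard deviation of the sample mean

# (a) one-sided upper bound, 95 percent: z_0.05 = 1.645 (table value)
upper_95 = x_bar + 1.645 * se
# (b) one-sided lower bound, 99 percent: z_0.01 = 2.33 (table value)
lower_99 = x_bar - 2.33 * se

print(round(upper_95, 2), round(lower_99, 2))
```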
Interval estimator of the mean – Normal population with unknown SD
Since σ is no longer known, it is natural to replace it by its estimator S,
the sample standard deviation. However, this replacement affects the
probability distribution: the Z variable with the standard normal
distribution now becomes a $T_{n-1}$ variable, which follows the so-called t distribution with
n − 1 degrees of freedom.
$$Z = \sqrt{n}\,\frac{\bar{X} - \mu}{\sigma} \;\longrightarrow\; T_{n-1} = \sqrt{n}\,\frac{\bar{X} - \mu}{S}$$
The density function of a t random variable, like a standard normal random
variable, is symmetric about zero. It looks similar to a standard normal
density, although it is somewhat more spread out, resulting in its having
“larger tails.” As the degree of freedom parameter increases, the density
becomes more and more similar to the standard normal density. For sample
sizes n>30 the two distributions become practically identical. For smaller
sample sizes we have to replace Z by T and obtain its value from the table
given in the textbook.
Example: Consider the data of a sample of size n=20
{16, 0, 0, 2, 3, 6, 8, 2, 5, 0, 12, 10, 5, 7, 2, 3, 8, 17, 9, 1}
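Obtain a 95 percent confidence interval for the population mean.
A- Sample mean and SD are obtained as
$$\bar{X} = 116/20 = 5.8, \qquad S^2 = \frac{1}{19}\left(\sum X_i^2 - 20\,\bar{X}^2\right) = 25.85 \;\Rightarrow\; S = 5.08$$
$$1 - \alpha/2 = 0.975 \;\Rightarrow\; t_{19}(0.025) = 2.093 \;\Rightarrow\; \sqrt{n}\,\frac{|\bar{X} - \mu|}{S} \le 2.093 \;\Rightarrow\; \mu = 5.8 \pm 2.38$$
Thus, with 95 percent confidence, the population mean lies in the interval (3.42, 8.18).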
It is interesting to note that if Z were used in place of T, one would
replace 2.093 by 1.96 and obtain the interval 5.8 ± 2.23, which is
on the optimistic side but not far from the correct interval.
Note, however, that the error resulting from using Z instead of T increases
as the sample size decreases.
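The example can be reproduced with the following sketch. The Python standard library has no t-distribution quantile function, so the critical value 2.093 is taken from the table as in the notes; everything else is computed from the data.

```python
from math import sqrt
from statistics import mean, stdev

data = [16, 0, 0, 2, 3, 6, 8, 2, 5, 0, 12, 10, 5, 7, 2, 3, 8, 17, 9, 1]
n = len(data)
x_bar = mean(data)
s = stdev(data)        # sample standard deviation (divisor n - 1)

t_crit = 2.093         # t_19(0.025), read from a t table
z_crit = 1.96          # z_0.025, for comparison

half_t = t_crit * s / sqrt(n)   # half-width using the t distribution
half_z = z_crit * s / sqrt(n)   # narrower half-width if Z is used instead

print(x_bar - half_t, x_bar + half_t)
print(x_bar - half_z, x_bar + half_z)
```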
Interval Estimators of Population Proportion
We recall that the proportion of the sample having a certain characteristic
(with a population probability p) is determined by calculating the sample mean
of the indicator variable X as
$$\hat{p} = \bar{X} = \frac{1}{n}\sum X_i$$
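A minimal sketch of the indicator-variable view: each observation is coded 1 if it has the characteristic and 0 otherwise, and the sample proportion is the mean of these values. The data below are hypothetical, and the standard error uses the sqrt(p(1-p)/n) formula from earlier in these notes with the estimate substituted for p.

```python
from math import sqrt

# Hypothetical indicators: 1 = has the characteristic, 0 = does not.
indicators = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
n = len(indicators)

p_hat = sum(indicators) / n              # sample mean of the indicator variable
se_hat = sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error of p_hat

print(p_hat, se_hat)
```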
(b) $35 \pm 1.96\cdot\frac{12}{6} = 35 \pm 3.92$