Chapter 9 One- and Two-Sample Estimation Problems

EXAMPLE 9.3. Which one of the following estimators is the most efficient one?

[Figure 9.1: Sampling distributions of different estimators of $\theta$, with curves labelled $\hat\theta_1$, $\hat\theta_2$, and $\hat\theta_3$.]

... of $\tilde X$. Thus, both estimates $\bar x$ and $\tilde x$ will, on average, equal the population mean $\mu$, but $\bar x$ is likely to be closer to $\mu$ for a given sample, and thus $\bar X$ is more efficient than $\tilde X$.

Interval Estimation

Even the most efficient unbiased estimator is unlikely to estimate the population parameter exactly. It is true that estimation accuracy increases with large samples, but there is still no reason we should expect a point estimate from a given sample to be exactly equal to the population parameter it is supposed to estimate. There are many situations in which it is preferable to determine an interval within which we would expect to find the value of the parameter. Such an interval is called an interval estimate.

An interval estimate of a population parameter $\theta$ is an interval of the form
\[ \hat\theta_L < \theta < \hat\theta_U, \]
where $\hat\theta_L$ and $\hat\theta_U$ depend on the value of the statistic $\hat\Theta$ for a particular sample and also on the sampling distribution of $\hat\Theta$. For example, a random sample of SAT verbal scores for students in the entering freshman class might produce an interval from 530 to 550, within which we expect to find the true average of all SAT verbal scores for the freshman class. The values of the endpoints, 530 and 550, will depend on the computed sample mean $\bar x$ and the sampling distribution of $\bar X$. As the sample size increases, we know that $\sigma^2_{\bar X} = \sigma^2/n$ decreases, and consequently our estimate is likely to be closer to the parameter $\mu$, resulting in a shorter interval. Thus, the interval estimate indicates, by its length, the accuracy of the point estimate. An engineer will gain some insight into the population proportion defective by taking a sample and computing the sample proportion defective. But an interval estimate might be more informative.

9.4 Single Sample: Estimating the Mean

9.4.1 An Introductory Example

Let us now look at an example. The heights of the freshmen at UMD are supposed to follow a normal distribution with mean $\mu$ and standard deviation $\sigma = 10$ (in cm). A random sample of size $n = 36$ is taken, and the sample mean is $\bar x = 160$.

• Use $\bar x = 160$ to estimate the value of $\mu$. This is a point estimate of $\mu$.
• Is $\mu$ equal to 160?
• We would like to convert this point estimate into a statement, like "the value of $\mu$ is between 150 cm and 170 cm", and attach to the statement a measure of the degree of confidence of it being true.
• From the distribution of $\bar X$,
\[ \frac{\bar X - \mu}{\sigma/\sqrt{n}} \sim N(0, 1). \]
• About 95% of the values of $\bar X$ are expected to fall within $2(\sigma/\sqrt{n})$ of $\mu$, i.e.,
\[ P\left(\mu - 2\frac{\sigma}{\sqrt{n}} \le \bar X \le \mu + 2\frac{\sigma}{\sqrt{n}}\right) = 0.95. \]
Exchanging the positions of $\mu$ and $\bar X$,
\[ P\left(\bar X - 2\frac{\sigma}{\sqrt{n}} \le \mu \le \bar X + 2\frac{\sigma}{\sqrt{n}}\right) = 0.95. \]
• Our sample gives $\bar x = 160$, so the interval is from $\bar x - 2\sigma/\sqrt{n}$ to $\bar x + 2\sigma/\sqrt{n}$, or,
\[ \left(160 - 2\,\frac{10}{\sqrt{36}},\; 160 + 2\,\frac{10}{\sqrt{36}}\right), \]
or, rounded to the nearest centimetre, (157, 163).
• This is an interval estimate of the unknown parameter $\mu$.
  – 0.95, confidence level or confidence coefficient
  – (157, 163), 95% confidence interval of $\mu$
  – 157, lower confidence limit
  – 163, upper confidence limit
  – 6 = 163 − 157, interval width

Interpretation of Interval Estimates

Since different samples will generally yield different values of $\hat\Theta$ and, therefore, different values for $\hat\theta_L$ and $\hat\theta_U$, these endpoints of the interval are values of corresponding random variables $\hat\Theta_L$ and $\hat\Theta_U$. From the sampling distribution of $\hat\Theta$, we shall be able to determine $\hat\Theta_L$ and $\hat\Theta_U$ such that
\[ P(\hat\Theta_L < \theta < \hat\Theta_U) = 1 - \alpha \]
for $0 < \alpha < 1$. We then have a probability of $1 - \alpha$ of selecting a random sample that will produce an interval containing $\theta$.

• The interval $\hat\theta_L < \theta < \hat\theta_U$, computed from the selected sample, is called a $100(1 - \alpha)\%$ confidence interval.
• The fraction $1 - \alpha$ is called the confidence coefficient or the degree of confidence.
• The endpoints, $\hat\theta_L$ and $\hat\theta_U$, are called the lower and upper confidence limits.

Clearly, the values of the random variables $\hat\Theta_L$ and $\hat\Theta_U$, defined in Section 9.3, are the confidence limits
\[ \hat\theta_L = \bar x - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad\text{and}\quad \hat\theta_U = \bar x + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}. \]
Different samples will yield different values of $\bar x$ and therefore produce different interval estimates of the parameter $\mu$, as shown in Figure 9.3. The dot at the center of each interval indicates the position of the point estimate $\bar x$ for that random sample. Note that all of these intervals are of the same width, since their widths depend only on the choice of $z_{\alpha/2}$ once $\bar x$ is determined. The larger the value we choose for $z_{\alpha/2}$, the wider we make all the intervals and the more confident we can be that the particular sample selected will produce an interval that contains the unknown parameter $\mu$. In general, for a selection of $z_{\alpha/2}$, $100(1 - \alpha)\%$ of the intervals will cover $\mu$.

[Figure 9.3: Interval estimates of $\mu$ for samples 1 to 10, plotted against $x$ with $\mu$ marked.]
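The long-run reading of the confidence level can be checked with a small simulation. The sketch below is illustrative only: it uses the heights example ($\sigma = 10$, $n = 36$, $\bar x = 160$), the exact normal quantile (about 1.96) instead of the rounded value 2, and a hypothetical true mean of 160 for the simulated samples, since the real $\mu$ is unknown. The function names are made up for this example.

```python
import math
import random
from statistics import NormalDist, fmean


def z_interval(xbar, sigma, n, conf=0.95):
    """Two-sided interval xbar -/+ z_{alpha/2} * sigma / sqrt(n), sigma known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


def coverage(mu, sigma, n, conf=0.95, trials=2000, seed=1):
    """Fraction of simulated normal samples whose interval contains mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = z_interval(fmean(sample), sigma, n, conf)
        hits += lo < mu < hi
    return hits / trials


lo, hi = z_interval(160, 10, 36)
print(f"95% interval for the heights example: ({lo:.2f}, {hi:.2f})")
# mu=160 is an assumed value, used only to generate the simulated samples.
print("simulated coverage:", coverage(mu=160, sigma=10, n=36))
```

The computed endpoints round to the interval (157, 163), and the simulated coverage should land close to 0.95. The statement being illustrated is about the procedure across many samples, not about any single interval.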
• Caution – Don't say
  – 95% of all freshmen at UMD are expected to have heights between 157 cm and 163 cm.
  – We are 95% confident that a randomly selected UMD freshman has a height between 157 cm and 163 cm.
  – 95% of all simple random samples of 10 UMD freshmen will have mean height between 157 cm and 163 cm.

In general, let us consider the interval estimate of the unknown population mean $\mu$. If the sample is selected from a normal population or if $n$ is sufficiently large, we can establish a confidence interval for $\mu$ based on the sampling distribution of $\bar X$.

A random sample of size $n$ is selected from a population whose variance $\sigma^2$ is known, and the mean $\bar x$ is computed to give the $100(1 - \alpha)\%$ confidence interval below. It is important to emphasize that we have invoked the Central Limit Theorem above. As a result, it is important to note the conditions for applications that follow.

Confidence Interval on $\mu$, $\sigma^2$ Known
If $\bar x$ is the mean of a random sample of size $n$ from a population with known variance $\sigma^2$, a $100(1 - \alpha)\%$ confidence interval for $\mu$ is given by
\[ \bar x - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar x + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \]
where $z_{\alpha/2}$ is the z-value leaving an area of $\alpha/2$ to the right.

For small samples selected from nonnormal populations, we cannot expect our degree of confidence to be accurate. However, for samples of size $n \ge 30$, with the shape of the distributions not too skewed, sampling theory guarantees good results.

EXAMPLE 9.4. High school students who take the SAT mathematics exam a second time generally score higher than on their first try. The change in score has a normal distribution with variance $\sigma^2 = 2500$. A random sample of 1000 students gains an average of $\bar x = 22$ points on their second try.

(a) Construct a 90% confidence interval for the mean score gain $\mu$ in the population of all students.
(b) Interpret the C.I. in part (a).
(c) Repeat part (a) for levels of confidence of 95% and 99%.
(d) How does increasing the confidence level affect the width of a confidence interval?
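As a rough numerical companion to Example 9.4, the sketch below evaluates $\bar x \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ at the three confidence levels asked for. The helper name `mean_ci_known_sigma` is hypothetical, and the quantile comes from Python's standard-library normal distribution rather than a printed table, so the last digit may differ from a table-based answer.

```python
import math
from statistics import NormalDist


def mean_ci_known_sigma(xbar, sigma, n, conf):
    """Interval xbar -/+ z_{alpha/2} * sigma / sqrt(n) for a known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


sigma = math.sqrt(2500)  # the example states the variance, not sigma
for conf in (0.90, 0.95, 0.99):
    lo, hi = mean_ci_known_sigma(22, sigma, 1000, conf)
    print(f"{conf:.0%}: ({lo:.2f}, {hi:.2f}), width {hi - lo:.2f}")
```

The printed widths grow with the confidence level, which is the pattern part (d) asks about.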
9.4.2 The case of known σ

The idea is the same as that in the UMD freshmen height example.
\[ P\left(-z_{\alpha/2} < Z < z_{\alpha/2}\right) = 1 - \alpha \]
\[ P\left(-z_{\alpha/2} < \frac{\bar X - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha \]
\[ P\left(\mu - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \bar X < \mu + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \]
\[ P\left(\bar X - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar X + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \]

Confidence Interval on $\mu$, when $\sigma$ Known
If $\bar x$ is the mean of a random sample of size $n$ from a population with known standard deviation $\sigma$, a $100(1 - \alpha)\%$ confidence interval for $\mu$ is given by
\[ \bar x - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar x + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \]
where $z_{\alpha/2}$ is the z-value such that $P(Z > z_{\alpha/2}) = \alpha/2$.

A wise user of statistics never plans data collection without at the same time planning the inference. We could arrange to have both high confidence and a small error.

If $e$ is a pre-fixed (perhaps desired and specified) amount that the error $z_{\alpha/2}\,\sigma/\sqrt{n}$ cannot exceed, we set
\[ z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < e. \]
Solving for $n$, we have
\[ n > \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2. \]

Sample Size Determination
If $\bar x$ is used as an estimate of $\mu$, we can be $100(1 - \alpha)\%$ confident that the error will not exceed a specified amount $e$ when the sample size is
\[ n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2. \]

NOTE. When solving for the sample size, $n$, we round all fractional values up to the next whole number. This way, we can be sure that our degree of confidence never falls below $100(1 - \alpha)\%$.
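The sample size rule, including the round-up step, can be sketched in a few lines. The function name and the numbers plugged in ($\sigma = 10$ cm from the heights example and a hypothetical target error of $e = 2$ cm) are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, e, conf=0.95):
    """Smallest whole n with z_{alpha/2} * sigma / sqrt(n) no larger than e."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    return math.ceil((z * sigma / e) ** 2)  # always round up


n = sample_size_for_mean(sigma=10, e=2)
z = NormalDist().inv_cdf(0.975)
print("required n:", n)
print("resulting error bound:", round(z * 10 / math.sqrt(n), 3))
```

Rounding down instead would leave the error bound slightly above $e$, which is why the NOTE insists on rounding up.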
\[ \bar x - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar x + t_{\alpha/2}\frac{s}{\sqrt{n}}, \]
where $t_{\alpha/2}$ is the t-value with $(n - 1)$ degrees of freedom such that $P(T(n - 1) > t_{\alpha/2}) = \alpha/2$.

For estimating population means based on a single sample, it is essential to require that (i) the statistics (i.e., $\bar x$ and $s$) come from a random sample, and (ii) the population is normal or, failing that, $n \ge 30$.

In general, when constructing the 2-sided C.I., we

• use $\bar x \pm z_{\alpha/2}\,\sigma/\sqrt{n}$, if $\sigma$ is known.
• use $\bar x \pm t_{\alpha/2}\,s/\sqrt{n}$, if $\sigma$ is unknown and $n$ small.
• use $\bar x \pm z_{\alpha/2}\,s/\sqrt{n}$, if $\sigma$ is unknown and $n$ large.

(a) Is it appropriate to use a t confidence interval to analyze the data? Briefly explain.
(b) Give a 95% confidence interval for the mean monthly cost of Internet access in August 2000.

We will now conduct statistical inference procedures for estimating $\mu_1 - \mu_2$, the difference between two population means, based on independent samples.

As in Subsection 8.4.3, suppose that we have two populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively. We take two independent random samples, one from each population, of sizes $n_1$ and $n_2$. It is then natural to take the difference between the two sample means, $\bar X_1 - \bar X_2$, as a point estimator of the difference between the two population means, $\mu_1 - \mu_2$.

For an interval estimate of $\mu_1 - \mu_2$, we must consider the sampling distribution of $\bar X_1 - \bar X_2$.
C.I. for $\mu_1 - \mu_2$, when both $\sigma_1^2$ and $\sigma_2^2$ known
If $\bar x_1$ and $\bar x_2$ are means of independent random samples of sizes $n_1$ and $n_2$ from populations with known variances $\sigma_1^2$ and $\sigma_2^2$, respectively, a $100(1 - \alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is given by
\[ (\bar x_1 - \bar x_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar x_1 - \bar x_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}, \]
where $z_{\alpha/2}$ is the z-value defined previously.

NOTE. The confidence interval is exact when two independent samples are taken from normal populations. For non-normal populations, the Central Limit Theorem allows for a pretty good approximation for reasonable size samples.

EXAMPLE 9.11. We would like to compare the mean tar content in regular cigarettes and light cigarettes. We take simple random samples of regular and light cigarettes of a particular brand and measure the tar content (in mg) of each cigarette. The data are as follows:

Regular: 11.3 12.1 12.6 11.5 12.2 12.8
Light: 9.5 9.8 9.3 8.9 10.0

It is known that tar content for regular cigarettes of this brand follows a normal distribution with standard deviation 0.4 mg and tar content for light cigarettes of this brand follows a normal distribution with standard deviation 0.3 mg. Find a 95% confidence interval for the difference in mean tar content for all regular cigarettes and all light cigarettes of this brand.

9.8.2 Variances Unknown but Equal

Assume that the population variances $\sigma_1^2$ and $\sigma_2^2$ are unknown but equal, i.e., $\sigma_1^2 = \sigma_2^2 = \sigma^2$. We can show ...

Then the above statistic becomes
\[ T = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim T(n_1 + n_2 - 2). \]
Now,
\[ 1 - \alpha = P\left(-t_{\alpha/2} < T < t_{\alpha/2}\right) = P\left(-t_{\alpha/2} < \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} < t_{\alpha/2}\right), \]
from which the following interval follows.

C.I. for $\mu_1 - \mu_2$, when $\sigma_1^2 = \sigma_2^2 = \sigma^2$, but Unknown
If $\bar x_1$ and $\bar x_2$ are means of independent random samples of sizes $n_1$ and $n_2$ from populations with unknown but equal variances, a $100(1 - \alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is given by
\[ (\bar x_1 - \bar x_2) - t_{\alpha/2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar x_1 - \bar x_2) + t_{\alpha/2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \]
where $t_{\alpha/2}$ is the t-value defined previously and
\[ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}. \]

EXAMPLE 9.12. An insurance company would like to know if men drive faster on average than women. The company took a random sample of 52 cars driven by men on a highway and found the mean speed to be 114 km/h with a standard deviation of 10 km/h. Another sample of 30 cars driven by women on the same highway gave a mean speed of 108 km/h with a standard deviation of 7 km/h. Construct a 98% confidence interval for the true difference between the mean speeds of cars driven by men and women on this highway.
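The two formulas above can be tried on the numbers from Examples 9.11 and 9.12. The sketch below is a minimal illustration with made-up function names: it computes the known-variance interval for the tar data, and only the pooled standard deviation $s_p$ for the speed data, because the t critical value needed to finish that interval has to come from a t table (the Python standard library has no t quantile function).

```python
import math
from statistics import NormalDist, fmean

regular = [11.3, 12.1, 12.6, 11.5, 12.2, 12.8]
light = [9.5, 9.8, 9.3, 8.9, 10.0]


def two_sample_z_interval(x1, x2, sigma1, sigma2, conf=0.95):
    """Interval for mu1 - mu2 when both population sigmas are known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    diff = fmean(x1) - fmean(x2)
    se = math.sqrt(sigma1**2 / len(x1) + sigma2**2 / len(x2))
    return diff - z * se, diff + z * se


def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation s_p for the equal-variance case."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


lo, hi = two_sample_z_interval(regular, light, sigma1=0.4, sigma2=0.3)
print(f"Example 9.11, 95% interval: ({lo:.3f}, {hi:.3f}) mg")
print(f"Example 9.12, pooled sd: {pooled_sd(10, 52, 7, 30):.3f} km/h")
```

With $s_p$ in hand, the interval for Example 9.12 is $(114 - 108) \pm t_{\alpha/2}\, s_p \sqrt{1/52 + 1/30}$ with 80 degrees of freedom.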
Assume that the population variances $\sigma_1^2$ and $\sigma_2^2$ are unknown but unequal, i.e., $\sigma_1^2 \ne \sigma_2^2$. The statistic
\[ T = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \overset{\cdot}{\sim} T(\nu), \]
where
\[ \nu = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\left[(s_1^2/n_1)^2/(n_1 - 1)\right] + \left[(s_2^2/n_2)^2/(n_2 - 1)\right]}. \]

NOTE. The expression for $\nu$ above is an estimate of the degrees of freedom. In applications, it is rarely a whole number, and we should round it down to the nearest integer to achieve the desired confidence.

C.I. for $\mu_1 - \mu_2$, when $\sigma_1^2 \ne \sigma_2^2$, but Unknown
If $\bar x_1$ and $s_1^2$ and $\bar x_2$ and $s_2^2$ are the means and variances of independent random samples of sizes $n_1$ and $n_2$, respectively, from approximately normal populations with unknown and unequal variances, an approximate $100(1 - \alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is given by
\[ (\bar x_1 - \bar x_2) - t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar x_1 - \bar x_2) + t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \]
where $t_{\alpha/2}$ is the t-value defined previously with $\nu$ as above.

EXAMPLE 9.13. The gasoline prices (in cents/litre) for a random sample of 8 Winnipeg gas stations and 5 Calgary gas stations are recorded one day and are shown below:

Winnipeg: 119.9 122.4 121.7 120.9 121.0 122.9 119.9 121.7
Calgary: 117.9 120.4 118.4 122.9 117.0

Find a 95% confidence interval for the difference in mean gas prices for the two cities.

9.9 Paired Observations

Take STAT 3612 for "Paired Observations".

Define the sample proportion of successes
\[ \hat P = \frac{X}{n}. \]
When $n$ is large, the sampling distribution of $\hat P$ is approximately normal with mean
\[ \mu_{\hat P} = E(\hat P) = E\left(\frac{X}{n}\right) = \frac{np}{n} = p \]
and variance
\[ \sigma^2_{\hat P} = \mathrm{Var}(\hat P) = \mathrm{Var}\left(\frac{X}{n}\right) = \frac{np(1 - p)}{n^2} = \frac{p(1 - p)}{n}. \]
That is,
\[ \hat P \overset{\cdot}{\sim} N\left(p, \sqrt{\frac{p(1 - p)}{n}}\right), \quad n \to \infty, \]
or,
\[ Z = \frac{\hat P - p}{\sqrt{\frac{p(1 - p)}{n}}} \to N(0, 1). \]

NOTE. As a rule of thumb, this approximation requires $np \ge 5$ and $n(1 - p) \ge 5$.

Now,
\begin{align*}
1 - \alpha &= P\left(-z_{\alpha/2} < Z < z_{\alpha/2}\right) \\
&= P\left(-z_{\alpha/2} < \frac{\hat P - p}{\sqrt{\frac{p(1 - p)}{n}}} < z_{\alpha/2}\right) \\
&\approx P\left(-z_{\alpha/2} < \frac{\hat P - p}{\sqrt{\frac{\hat p(1 - \hat p)}{n}}} < z_{\alpha/2}\right) \\
&= P\left(\hat P - z_{\alpha/2}\sqrt{\frac{\hat p(1 - \hat p)}{n}} < p < \hat P + z_{\alpha/2}\sqrt{\frac{\hat p(1 - \hat p)}{n}}\right),
\end{align*}
where we used the point estimate $\hat p = x/n$ to replace $p$ under the radical sign.

Large-Sample Confidence Intervals for $p$
If $\hat p$ is the proportion of successes in a random sample of size $n$, an approximate $100(1 - \alpha)\%$ confidence interval for the binomial parameter $p$ is given by
\[ \hat p - z_{\alpha/2}\sqrt{\frac{\hat p(1 - \hat p)}{n}} < p < \hat p + z_{\alpha/2}\sqrt{\frac{\hat p(1 - \hat p)}{n}}. \]
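For the unequal-variance case, the degrees-of-freedom estimate $\nu$ is the fiddly part, so a short sketch on the gas price data of Example 9.13 may help. The function name is hypothetical, and the critical value 2.571 is the standard t-table entry for an upper-tail area of 0.025 with 5 degrees of freedom, entered by hand here because the standard library does not provide t quantiles.

```python
import math
from statistics import fmean, variance

winnipeg = [119.9, 122.4, 121.7, 120.9, 121.0, 122.9, 119.9, 121.7]
calgary = [117.9, 120.4, 118.4, 122.9, 117.0]


def approx_df(x1, x2):
    """Estimated degrees of freedom nu for the unequal-variance t interval."""
    a = variance(x1) / len(x1)  # s1^2 / n1
    b = variance(x2) / len(x2)  # s2^2 / n2
    return (a + b) ** 2 / (a**2 / (len(x1) - 1) + b**2 / (len(x2) - 1))


nu = approx_df(winnipeg, calgary)
df = math.floor(nu)  # round down, as the NOTE recommends
t_crit = 2.571  # table value for t_{0.025} with 5 df (assumed from a t table)

diff = fmean(winnipeg) - fmean(calgary)
se = math.sqrt(variance(winnipeg) / len(winnipeg) + variance(calgary) / len(calgary))
print(f"nu = {nu:.2f}, rounded down to {df}")
print(f"95% interval: ({diff - t_crit * se:.2f}, {diff + t_crit * se:.2f}) cents/litre")
```

If $\nu$ had rounded down to a value other than 5, the hand-entered critical value would need to be replaced by the matching table entry.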
NOTE. The (estimated) standard error of $\hat P$ is defined by
\[ \mathrm{s.e.}(\hat P) = \sqrt{\frac{\hat p(1 - \hat p)}{n}}. \]

EXAMPLE 9.14. A question in a Christmas tree market survey was "Did you have a Christmas tree last year?" Of the 500 respondents, 421 answered "Yes."

(a) Find the sample proportion and its standard error.
(b) Give a 90% confidence interval for the proportion of Indiana households who had a Christmas tree this year.

NOTE. If we have no idea what the value of $p$ might be, we can use $\hat p = 0.5$ in the sample size formula, i.e.,
\[ n = \frac{(z_{\alpha/2})^2}{4e^2}. \]
This is the most conservative estimate of the sample size.

EXAMPLE 9.17. The use of email is growing rapidly and is having a dramatic effect on the way we communicate. Suppose that we want to determine the current proportion of Canadian households using email. How many households must be surveyed to estimate the proportion with a 90% confidence and an error of no more than 3%?
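Both proportion formulas are easy to evaluate directly. The sketch below, with hypothetical function names, applies the large-sample interval to the counts in Example 9.14 and the conservative sample size rule to the requirements in Example 9.17. The normal quantile is taken from Python's standard library, so results may differ in the last digit from table-based work.

```python
import math
from statistics import NormalDist


def z_crit(conf):
    """z_{alpha/2} for a two-sided interval at the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)


def proportion_interval(x, n, conf):
    """Sample proportion, its estimated standard error, and the interval."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    z = z_crit(conf)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)


def conservative_sample_size(e, conf):
    """n = z^2 / (4 e^2), rounded up; uses p_hat = 0.5 as the worst case."""
    return math.ceil(z_crit(conf) ** 2 / (4 * e**2))


p_hat, se, (lo, hi) = proportion_interval(421, 500, 0.90)
print(f"p_hat = {p_hat:.3f}, s.e. = {se:.4f}, 90% interval: ({lo:.3f}, {hi:.3f})")
print("households needed:", conservative_sample_size(e=0.03, conf=0.90))
```

The rule of thumb is comfortably met for the survey data, since both 421 and 79 are well above 5.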
EXAMPLE 9.18. Do older adults and young adults have different views on Canada's involvement in the war in Afghanistan? A sample of 150 older adults (aged 40 - 65) and a sample of 120 young adults (aged 18 - 30) were selected. Respondents were asked if they approved of Canada's continued involvement in Afghanistan. Of the older adults, 87 said they agree with the decision, while 54 of the young adults said they agree. Let $p_1$ be the true population proportion of all older adults who agree with the war and let $p_2$ be the true population proportion of all young adults who agree with the war. Calculate a 95% confidence interval for the difference in population proportions $p_1 - p_2$.

9.12 Single Sample: Estimating the Variance

We have shown that the sample variance $S^2$ is an unbiased estimator of the population variance $\sigma^2$. Thus, $S^2$ is a point estimate of $\sigma^2$.

We have also shown that the statistic
\[ \chi^2 = \frac{(n - 1)S^2}{\sigma^2} \sim \chi^2(n - 1) \]
if random samples of size $n$ are selected from a normal population.

Based on this,
\[ P\left(\chi^2_{1-\alpha/2} < \chi^2 < \chi^2_{\alpha/2}\right) = 1 - \alpha. \]

Confidence Intervals for $\sigma^2$
If $s^2$ is the variance of a random sample of size $n$ from a normal population, a $100(1 - \alpha)\%$ confidence interval for $\sigma^2$ is given by
\[ \frac{(n - 1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2}}, \]
where $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ are $\chi^2$-values with $\nu = n - 1$ degrees of freedom, leaving areas of $\alpha/2$ and $1 - \alpha/2$, respectively, to the right.

EXAMPLE 9.19. The bottlers of a new soft drink are experiencing problems with the filling mechanism for their 16 floz bottles. To estimate the standard deviation of the volume, 20 bottles were measured, yielding a sample standard deviation of 0.1 floz. Compute a 95% confidence interval for the population variance.

NOTE. Under the same setups, a $100(1 - \alpha)\%$ confidence interval for $\sigma$ is given by
\[ \sqrt{\frac{(n - 1)s^2}{\chi^2_{\alpha/2}}} < \sigma < \sqrt{\frac{(n - 1)s^2}{\chi^2_{1-\alpha/2}}}. \]

EXAMPLE 9.20. Suppose that data are collected from a random sample of 20 observations from a normal population and the sample variance is 100. Construct a 90% confidence interval for the population standard deviation $\sigma$.

An approximate $100(1 - \alpha)\%$ confidence interval for $\sigma$ is obtained by taking the square root of each endpoint of the interval for $\sigma^2$.

Example 9.18: The following are the weights, in decagrams, of 10 packages of grass seed distributed by a certain company: 46.4, 46.1, 45.8, 47.0, 46.1, 45.9, 45.8, 46.9, 45.2, and 46.0. Find a 95% confidence interval for the variance of the weights of all such packages of grass seed distributed by this company, assuming a normal population.

Solution: First we find
\[ s^2 = \frac{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n - 1)} = \frac{(10)(21{,}273.12) - (461.2)^2}{(10)(9)} = 0.286. \]
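The grass seed calculation can be carried through to the interval itself. In the sketch below the function name is made up, and the two chi-square critical values for 9 degrees of freedom (19.023 and 2.700) are hand-entered table values, since the Python standard library has no chi-square quantile function.

```python
import math
from statistics import variance

weights = [46.4, 46.1, 45.8, 47.0, 46.1, 45.9, 45.8, 46.9, 45.2, 46.0]


def variance_interval(s2, n, chi2_upper, chi2_lower):
    """Interval (n-1)s^2/chi2_{alpha/2} < sigma^2 < (n-1)s^2/chi2_{1-alpha/2}."""
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower


s2 = variance(weights)  # sample variance with divisor n - 1
# Table values (assumed): chi2_{0.025} = 19.023 and chi2_{0.975} = 2.700 for 9 df.
lo, hi = variance_interval(s2, len(weights), chi2_upper=19.023, chi2_lower=2.700)

print(f"s^2 = {s2:.3f}")
print(f"95% interval for sigma^2: ({lo:.3f}, {hi:.3f})")
print(f"95% interval for sigma:   ({math.sqrt(lo):.3f}, {math.sqrt(hi):.3f})")
```

Note that the larger critical value produces the lower endpoint, which is why $\chi^2_{\alpha/2}$ sits in the left-hand denominator of the formula.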