
CHAPTER 9

ONE- AND TWO-SAMPLE ESTIMATION PROBLEMS

9.1 Introduction

The purpose of statistical inference is to draw conclusions from data. We have already examined data and arrived at conclusions many times in the previous chapters. Formal inference emphasizes substantiating our conclusions by probability calculations.

Although there are many specific recipes for inference, there are only a few general types of statistical inference. This chapter and the next will introduce the two most common types: confidence intervals and tests of significance. We usually refer to them as the problems of estimation and hypothesis testing.

9.2 Statistical Inference

Statistical inference draws conclusions about a population or process based on sample data. For instance, one uses the sample mean x̄ to generalize about the population mean µ, and the sample standard deviation s to generalize about the population standard deviation σ. It also provides a statement, expressed in terms of probability, of how much confidence we can place in our conclusions.

For the problem of estimating unknown population parameters such as the mean, the proportion, and the variance, the trend is to distinguish between the classical method, or frequentist method, whereby inferences are based strictly on information obtained from a random sample selected from the population, and the Bayesian method, which utilizes prior subjective knowledge about the probability distribution of the unknown parameters in conjunction with the information provided by the sample data. We shall use classical methods to estimate unknown population parameters by computing statistics from random samples and applying the theory of sampling distributions.

Because the methods of formal inference are based on sampling distributions, they require a probability model for the data. Trustworthy probability models can arise in many ways, but the model is most secure and inference is most reliable when the data are produced by a properly randomized design. When you use statistical inference, you are acting as if the data come from a random sample or a randomized experiment.

9.3 Classical Methods of Estimation

A point estimate of some population parameter θ is a single value θ̂ of a statistic Θ̂. For example, the value

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \quad\text{of the statistic}\quad \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

is a point estimate of the population parameter µ. Similarly, p̂ = x/n is a point estimate of the true proportion p for a binomial experiment.

Note that we do not expect an error-free estimate, because of sampling bias and variability. Also, there may be more than one point estimate for an unknown population parameter; we need to choose one wisely.

Unbiased Estimator
A statistic Θ̂ is said to be an unbiased estimator of the parameter θ if

$$\mu_{\hat{\Theta}} = E(\hat{\Theta}) = \theta.$$

EXAMPLE 9.1. Show that X̄ is an unbiased estimator of the parameter µ.

EXAMPLE 9.2. Show that S² is an unbiased estimator of the parameter σ².

Most Efficient Estimator
If we consider all possible unbiased estimators of some parameter θ, the one with the smallest variance is called the most efficient estimator of θ.
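The unbiasedness claims in Examples 9.1 and 9.2 can be checked empirically. The following is a minimal simulation sketch, not a proof; the population values (µ = 5, σ = 2), the sample size, and the number of repetitions are illustrative assumptions that do not come from the notes.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable
MU, SIGMA, N, REPS = 5.0, 2.0, 10, 20000  # hypothetical population and design

xbars, s2_values, biased_values = [], [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbars.append(statistics.fmean(sample))
    s2_values.append(statistics.variance(sample))       # divides by n - 1
    biased_values.append(statistics.pvariance(sample))  # divides by n

avg_xbar = statistics.fmean(xbars)            # should be close to MU
avg_s2 = statistics.fmean(s2_values)          # should be close to SIGMA**2
avg_biased = statistics.fmean(biased_values)  # systematically below SIGMA**2
print(avg_xbar, avg_s2, avg_biased)
```

Averaged over many samples, x̄ and S² land near µ and σ², while the divide-by-n variance falls short by roughly the factor (n − 1)/n.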
EXAMPLE 9.3. Which one of the following estimators is the most efficient one?

[Figure 9.1: Sampling distributions of different estimators of θ, labeled $\hat{\Theta}_1$, $\hat{\Theta}_2$, and $\hat{\Theta}_3$.]

For a normal population, both estimates x̄ and x̃ (the sample median) will, on average, equal the population mean µ, but x̄ is likely to be closer to µ for a given sample, and thus X̄ is more efficient than X̃.

Interval Estimation

Even the most efficient unbiased estimator is unlikely to estimate the population parameter exactly. It is true that estimation accuracy increases with large samples, but there is still no reason we should expect a point estimate from a given sample to be exactly equal to the population parameter it is supposed to estimate. In many situations, we prefer to determine an interval within which we would expect to find the value of the parameter. Such an interval is called an interval estimate.

An interval estimate of a population parameter θ is an interval of the form

$$\hat{\theta}_L < \theta < \hat{\theta}_U,$$

where $\hat{\theta}_L$ and $\hat{\theta}_U$ depend on the value of the statistic Θ̂ for a particular sample and also on the sampling distribution of Θ̂. For example, a random sample of SAT verbal scores for students in the entering freshman class might produce an interval from 530 to 550, within which we expect to find the true average of all SAT verbal scores for the freshman class. The values of the endpoints, 530 and 550, will depend on the computed sample mean x̄ and the sampling distribution of X̄. As the sample size increases, we know that $\sigma^2_{\bar{X}} = \sigma^2/n$ decreases, and consequently our estimate is likely to be closer to the parameter µ, resulting in a shorter interval. Thus, the interval estimate indicates, by its length, the accuracy of the point estimate. An engineer will gain some insight into the population proportion defective by taking a sample and computing the sample proportion defective. But an interval estimate might be more informative.

9.4 Single Sample: Estimating the Mean

9.4.1 An Introductory Example

Let us now look at an example.

The heights of the freshmen at UMD are supposed to follow a normal distribution with mean µ and standard deviation σ = 10 (in cm). A random sample of size n = 36 is taken, and the sample mean is x̄ = 160.

• Use x̄ = 160 to estimate the value of µ. This is a point estimate of µ.

• Is µ equal to 160?

• We would like to convert this point estimate into a statement like "the value of µ is between 150 cm and 170 cm" and attach to the statement a measure of the degree of confidence of its being true.

• From the distribution of X̄,

$$\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1).$$

• About 95% of the values of X̄ are expected to fall within $2(\sigma/\sqrt{n})$ of µ, i.e.,

$$P\left(\mu - 2\frac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu + 2\frac{\sigma}{\sqrt{n}}\right) = 0.95.$$

Exchanging the positions of µ and X̄,

$$P\left(\bar{X} - 2\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 2\frac{\sigma}{\sqrt{n}}\right) = 0.95.$$

• Our sample gives x̄ = 160, so the interval is from $\bar{x} - 2\sigma/\sqrt{n}$ to $\bar{x} + 2\sigma/\sqrt{n}$, or

$$\left(160 - 2\,\frac{10}{\sqrt{36}},\; 160 + 2\,\frac{10}{\sqrt{36}}\right),$$

or, after rounding to whole centimetres, (157, 163).

• This is an interval estimate of the unknown parameter µ.
  – 0.95, confidence level or confidence coefficient
  – (157, 163), 95% confidence interval for µ
  – 157, lower confidence limit
  – 163, upper confidence limit
  – 6 = 163 − 157, interval width

Interpretation of Interval Estimates

Since different samples will generally yield different values of Θ̂ and, therefore, different values for $\hat{\theta}_L$ and $\hat{\theta}_U$, these endpoints of the interval are values of corresponding random variables $\hat{\Theta}_L$ and $\hat{\Theta}_U$. From the sampling distribution of Θ̂, we shall be able to determine $\hat{\Theta}_L$ and $\hat{\Theta}_U$ such that

$$P\left(\hat{\Theta}_L < \theta < \hat{\Theta}_U\right) = 1 - \alpha$$

for 0 < α < 1. We then have a probability of 1 − α of selecting a random sample that will produce an interval containing θ.

• The interval $\hat{\theta}_L < \theta < \hat{\theta}_U$, computed from the selected sample, is called a 100(1 − α)% confidence interval.

• The fraction 1 − α is called the confidence coefficient or the degree of confidence.

• The endpoints, $\hat{\theta}_L$ and $\hat{\theta}_U$, are called the lower and upper confidence limits.

For the mean with σ known (Section 9.4.2), the values of the random variables $\hat{\Theta}_L$ and $\hat{\Theta}_U$, defined in Section 9.3, are the confidence limits

$$\hat{\theta}_L = \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad\text{and}\quad \hat{\theta}_U = \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}.$$

Different samples will yield different values of x̄ and therefore produce different interval estimates of the parameter µ, as shown in Figure 9.3. The dot at the center of each interval indicates the position of the point estimate x̄ for that random sample. Note that all of these intervals are of the same width, since their widths depend only on the choice of $z_{\alpha/2}$ once x̄ is determined. The larger the value we choose for $z_{\alpha/2}$, the wider we make all the intervals and the more confident we can be that the particular sample selected will produce an interval that contains the unknown parameter µ. In general, for a selection of $z_{\alpha/2}$, 100(1 − α)% of the intervals will cover µ.

[Figure 9.3: Interval estimates of µ for different samples.]
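The repeated-sampling picture in Figure 9.3 can be imitated with a short simulation. This is a sketch under stated assumptions: σ = 10 and n = 36 come from the UMD height example, but the "true" mean of 160 and the number of repetitions are hypothetical values chosen only so that coverage can be counted.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(1)  # fixed seed so the run is repeatable
TRUE_MU, SIGMA, N, REPS = 160.0, 10.0, 36, 5000  # TRUE_MU and REPS are assumed

z = NormalDist().inv_cdf(0.975)   # z_{alpha/2} for a 95% interval
half = z * SIGMA / sqrt(N)        # every interval has this half-width

covered = 0
for _ in range(REPS):
    xbar = fmean(random.gauss(TRUE_MU, SIGMA) for _ in range(N))
    if xbar - half < TRUE_MU < xbar + half:
        covered += 1

coverage = covered / REPS         # fraction of intervals that contain TRUE_MU
print(round(half, 3), coverage)
```

The half-width is about 3.27 cm when the exact 1.96 multiplier is used instead of 2, and the observed coverage should sit close to 0.95.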

STAT-3611 Lecture Notes 2015 Fall X. Li


Example 9.2: The average zinc concentration recovered from a sample of measurements taken
in 36 different locations in a river is found to be 2.6 grams per milliliter. Find
the 95% and 99% confidence intervals for the mean zinc concentration in the river.
Hence,

$$P\left(-z_{\alpha/2} < \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1-\alpha.$$

Continuing with the UMD freshmen height example:

• Interpretation of the 95% confidence interval
  – We are 95% confident that the interval from 157 cm to 163 cm will contain the true value of µ.
  – We are 95% confident that the true value of µ lies between 157 cm and 163 cm.
  – If we repeat the sampling process over and over again, then approximately 95% of the similarly constructed intervals are expected to contain the true value of µ.

• Caution: do not say
  – 95% of all freshmen at UMD are expected to have heights between 157 cm and 163 cm.
  – We are 95% confident that a randomly selected UMD freshman has a height between 157 cm and 163 cm.
  – 95% of all simple random samples of 10 UMD freshmen will have mean height between 157 cm and 163 cm.

Let us now consider the interval estimate of the unknown population mean µ in general. If the sample is selected from a normal population or if n is sufficiently large, we can establish a confidence interval for µ based on the sampling distribution of X̄.

9.4.2 The case of known σ

The idea is the same as that in the UMD freshmen height example.

[Figure 9.2: $P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$.]

$$\begin{aligned}
P\left(-z_{\alpha/2} < Z < z_{\alpha/2}\right) &= 1-\alpha \\
P\left(-z_{\alpha/2} < \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) &= 1-\alpha \\
P\left(\mu - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) &= 1-\alpha \\
P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) &= 1-\alpha
\end{aligned}$$

The last line is obtained by multiplying each term in the inequality by $\sigma/\sqrt{n}$, then subtracting X̄ from each term and multiplying by −1 (reversing the sense of the inequalities). It is important to emphasize that the Central Limit Theorem has been invoked here, so the conditions for the applications that follow should be noted.

Confidence Interval on µ, when σ Known
If x̄ is the mean of a random sample of size n from a population with known standard deviation σ, a 100(1 − α)% confidence interval for µ is given by

$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}},$$

where $z_{\alpha/2}$ is the z-value such that $P(Z > z_{\alpha/2}) = \alpha/2$, i.e., the z-value leaving an area of α/2 to the right.

NOTE. Sometimes, it is easier to use the t table to find the z-values.

NOTE. The C.I. is exact if the population is normal; it is approximate if the population is non-normal and n is sufficiently large. For small samples selected from non-normal populations, we cannot expect our degree of confidence to be accurate. However, for samples of size n ≥ 30, with the shape of the distributions not too skewed, sampling theory guarantees good results.

EXAMPLE 9.4. High school students who take the SAT mathematics exam a second time generally score higher than on their first try. The change in score has a normal distribution with variance σ² = 2500. A random sample of 1000 students gains an average of x̄ = 22 points on their second try.

(a) Construct a 90% confidence interval for the mean score gain µ in the population of all students.
(b) Interpret the C.I. in part (a).
(c) Repeat part (a) for levels of confidence of 95% and 99%.
(d) How does increasing the confidence level affect the width of a confidence interval?

A wise user of statistics never plans data collection without at the same time planning the inference. We could arrange to have both high confidence and a small error.

If e is a pre-fixed (perhaps desired and specified) amount that the error $z_{\alpha/2}\,\sigma/\sqrt{n}$ cannot exceed, we set

$$z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le e.$$

Solving for n gives $n \ge (z_{\alpha/2}\,\sigma/e)^2$.

Sample Size Determination
If x̄ is used as an estimate of µ, we can be 100(1 − α)% confident that the error will not exceed a specified amount e when the sample size is

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2.$$

NOTE. When solving for the sample size, n, we round all fractional values up to the next whole number. This way, we can be sure that our degree of confidence never falls below 100(1 − α)%.
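Parts (a), (c), and (d) of Example 9.4 can be worked numerically with the known-σ interval. The sketch below uses only the values stated in the example (σ² = 2500, n = 1000, x̄ = 22); the function name `z_interval` is a hypothetical helper, not something defined in the notes.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf):
    """Two-sided confidence interval for a mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    half = z * sigma / sqrt(n)
    return xbar - half, xbar + half

sigma = sqrt(2500)  # the example gives the variance, so sigma = 50
intervals = {conf: z_interval(22, sigma, 1000, conf) for conf in (0.90, 0.95, 0.99)}
widths = {conf: hi - lo for conf, (lo, hi) in intervals.items()}

for conf, (lo, hi) in intervals.items():
    print(f"{conf:.0%}: ({lo:.2f}, {hi:.2f}), width {widths[conf]:.2f}")
```

The three intervals come out near (19.40, 24.60), (18.90, 25.10), and (17.93, 26.07), which illustrates part (d): raising the confidence level widens the interval.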

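The sample size rule for estimating a mean with known σ, including the round-up step, can be sketched as follows. The numbers are those of Example 9.5 (σ = 20 grams, error at most 5 grams, 95% confidence); the function name is a hypothetical helper.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, e, conf):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= e, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((z * sigma / e) ** 2)

n_girls = sample_size_for_mean(sigma=20, e=5, conf=0.95)
print(n_girls)
```

The unrounded value is about 61.5, so under these assumptions 62 girls would be interviewed.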

EXAMPLE 9.5. A community health nutritionist wishes to conduct a survey among a population of teenage girls to determine their average daily protein intake (measured in grams). Assume that the population of protein intakes is normally distributed with a standard deviation of 20 grams. If she wants a 95% confidence interval with an error of no more than 5 grams, how many teenage girls should be interviewed?

One-Sided Confidence Interval on µ, σ Known
If x̄ is the mean of a random sample of size n from a population with standard deviation σ, the one-sided 100(1 − α)% confidence intervals for µ are given by

$$\text{upper one-sided C.I.:}\quad -\infty < \mu < \bar{x} + z_{\alpha}\frac{\sigma}{\sqrt{n}}$$

$$\text{lower one-sided C.I.:}\quad \bar{x} - z_{\alpha}\frac{\sigma}{\sqrt{n}} < \mu < \infty.$$

The derivation can be similarly done. For the lower one-sided C.I.,

$$\begin{aligned}
1-\alpha &= P(Z < z_{\alpha}) = P\left(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < z_{\alpha}\right) \\
&= P\left(\bar{X} < z_{\alpha}\frac{\sigma}{\sqrt{n}} + \mu\right) = P\left(\mu > \bar{X} - z_{\alpha}\frac{\sigma}{\sqrt{n}}\right).
\end{aligned}$$

EXAMPLE 9.6. An electrical firm manufactures light bulbs that have a length of life that is approximately normally distributed with a standard deviation of 40 hours. If a sample of 30 bulbs has an average life of 780 hours, find a 98% lower one-sided confidence interval for the population mean of all bulbs produced by this firm.

9.4.3 The case of unknown σ

In practice, it is more realistic, and more important, to assume that the population standard deviation σ is unknown.

Confidence Interval on µ, when σ Unknown
If x̄ and s are the mean and the standard deviation of a random sample of size n from a population with unknown standard deviation σ, a 100(1 − α)% confidence interval for µ is given by

$$\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}},$$

where $t_{\alpha/2}$ is the t-value with (n − 1) degrees of freedom such that $P\left(T(n-1) > t_{\alpha/2}\right) = \alpha/2$.

[Figure 9.5: $P(-t_{\alpha/2} < T < t_{\alpha/2}) = 1 - \alpha$.]

The derivation can be done in a similar fashion:

$$\begin{aligned}
1-\alpha &= P\left(-t_{\alpha/2} < T < t_{\alpha/2}\right) \\
&= P\left(-t_{\alpha/2} < \frac{\bar{X}-\mu}{S/\sqrt{n}} < t_{\alpha/2}\right) \\
&= P\left(\bar{X} - t_{\alpha/2}\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2}\frac{S}{\sqrt{n}}\right).
\end{aligned}$$

We have made a distinction between the cases of σ known and σ unknown in computing confidence interval estimates. We should emphasize that for σ known we exploited the Central Limit Theorem, whereas for σ unknown we made use of the sampling distribution of the random variable T. However, the use of the t-distribution is based on the premise that the sampling is from a normal distribution. As long as the distribution is approximately bell shaped, confidence intervals can be computed when σ² is unknown by using the t-distribution and we may expect very good results.

Computed one-sided confidence bounds for µ with σ unknown are, as the reader would expect,

$$\bar{x} + t_{\alpha}\frac{s}{\sqrt{n}} \quad\text{and}\quad \bar{x} - t_{\alpha}\frac{s}{\sqrt{n}}.$$

They are the upper and lower 100(1 − α)% bounds, respectively. Here $t_{\alpha}$ is the t-value having an area of α to the right.

[Textbook excerpt, Section 9.4:]
Example 9.5: The contents of seven similar containers of sulfuric acid are 9.8, 10.2, 10.4, 9.8, 10.0, 10.2, and 9.6 liters. Find a 95% confidence interval for the mean contents of all such containers, assuming an approximately normal distribution.
Solution: The sample mean and standard deviation for the given data are x̄ = 10.0 and s = 0.283. Using Table A.4, we find $t_{0.025} = 2.447$ for v = 6 degrees of freedom.

EXAMPLE 9.7. In an experiment on the metabolism of insects, American cockroaches were fed measured amounts of a sugar solution after being deprived of food for a week and of water for 3 days. After 2, 5, and 10 hours, the researchers dissected some of the cockroaches and measured the amount of sugar in various tissues. Five cockroaches fed the sugar D-glucose and dissected after 10 hours had the following amounts (in micrograms) of D-glucose in their hindguts:

55.95 68.24 52.73 21.50 23.78

(a) List the conditions that are required for this interval estimation.
(b) Find a 99% confidence interval for the mean amount of D-glucose in cockroach hindguts under these conditions.

EXAMPLE 9.8. How much do users pay for Internet service? Here are the monthly fees (in dollars) paid by a random sample of 50 users of commercial Internet service providers in August 2000: (Data from the August 2000 supplement to the Current Population Survey, from the Census Bureau Web site, www.census.gov.)

20 40 22 22 21 21 20 10 20 20
20 13 18 50 20 18 15  8 22 25
22 10 20 22 22 21 15 23 30 12
 9 20 40 22 29 19 15 20 20 20
20 15 19 21 14 22 21 35 20 22

(a) Is it appropriate to use a t confidence interval to analyze the data? Briefly explain.
(b) Give a 95% confidence interval for the mean monthly cost of Internet access in August 2000.
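The t-interval can be carried through for the sulfuric acid data quoted from the textbook. In this sketch the critical value 2.447 is taken from the text (Table A.4, 6 degrees of freedom) and hard-coded, because the Python standard library has no t-quantile function; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

contents = [9.8, 10.2, 10.4, 9.8, 10.0, 10.2, 9.6]  # liters, from the example
T_CRIT = 2.447  # t_{0.025} with 6 degrees of freedom, as quoted from Table A.4

n = len(contents)
xbar = mean(contents)
s = stdev(contents)              # sample standard deviation (n - 1 divisor)
half = T_CRIT * s / sqrt(n)      # t_{alpha/2} * s / sqrt(n)
lo, hi = xbar - half, xbar + half
print(round(xbar, 3), round(s, 3), round(lo, 2), round(hi, 2))
```

With these inputs the interval is roughly 9.74 < µ < 10.26 liters. A different confidence level or sample size needs a different table value in place of `T_CRIT`.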


One-Sided Confidence Interval on µ, σ Unknown
If x̄ and s are the mean and the standard deviation of a random sample of size n from a population with unknown standard deviation σ, the one-sided 100(1 − α)% confidence intervals for µ are given by

$$\text{upper one-sided C.I.:}\quad -\infty < \mu < \bar{x} + t_{\alpha}\frac{s}{\sqrt{n}}$$

$$\text{lower one-sided C.I.:}\quad \bar{x} - t_{\alpha}\frac{s}{\sqrt{n}} < \mu < \infty.$$

EXAMPLE 9.9. A meat inspector has randomly selected 30 packs of 95% lean beef. The sample resulted in a mean of 96.2% with a sample standard deviation of 0.8%. Find a 90% upper one-sided confidence interval for the mean leanness of all packs. Assume normality.

9.4.4 Large-Sample Confidence Interval

Assume that the sample size n is greater than 30 and the population distribution is not too skewed. We may utilize both the z-values and the sample standard deviation s for estimating the population mean µ:

$$\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}.$$

This is often referred to as a large-sample confidence interval.

NOTE. This can be regarded as a normal approximation (the t-value becomes the z-value when n is sufficiently large); the quality of the approximation becomes better as the sample size gets larger.

EXAMPLE 9.10. Due to the decrease in interest rates, the First Citizens Bank received a lot of mortgage applications. A recent sample of 100 mortgage loans resulted in an average loan amount of $255,500 with a standard deviation of $25,000. Construct a 95% confidence interval for the mean loan amount for all customers who fill out mortgage applications.

9.4.5 Summary

For estimating population means based on a single sample, it is essential to require that (i) the statistics (i.e., x̄ and s) come from a random sample, and (ii) the population is normal or, failing that, n ≥ 30.

In general, when constructing the 2-sided C.I., we

• use $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$, if σ is known.
• use $\bar{x} \pm t_{\alpha/2}\,s/\sqrt{n}$, if σ is unknown and n is small.
• use $\bar{x} \pm z_{\alpha/2}\,s/\sqrt{n}$, if σ is unknown and n is large.

9.5 Standard Error of a Point Estimate

The standard error of X̄ is the standard deviation of X̄:

$$\text{s.e.}(\bar{x}) = \frac{\sigma}{\sqrt{n}}.$$

The estimated standard error of X̄ is defined by the estimator of $\sigma/\sqrt{n}$:

$$\widehat{\text{s.e.}}(\bar{x}) = \frac{s}{\sqrt{n}}.$$

It is also called the standard error of X̄ in many statistical computing packages.

NOTE. All the 2-sided confidence intervals that we have constructed in the preceding sections can be written as

$$\bar{x} \pm (z_{\alpha/2}\ \text{or}\ t_{\alpha/2}) \cdot \text{s.e.}(\bar{x}).$$

More generally, a 2-sided 100(1 − α)% C.I. for θ is expressible as

$$\hat{\theta} \pm (\text{critical value}) \cdot \text{s.e.}(\hat{\theta}).$$

9.6 Prediction Intervals

Take STAT 3612 for "Prediction Intervals"

9.7 Tolerance Limits

Take STAT 3612 for "Tolerance Limits"

9.8 Two Samples: Estimating the Difference between Two Means

We will now conduct statistical inference procedures for estimating µ1 − µ2, the difference between two population means, based on independent samples.

As in Subsection 8.4.3, suppose that we have two populations with means µ1 and µ2 and variances $\sigma_1^2$ and $\sigma_2^2$, respectively. We take two independent random samples, one from each population, of sizes n1 and n2. It is then quite natural to take the difference between the two sample means, $\bar{X}_1 - \bar{X}_2$, as a point estimator of the difference between the two population means, µ1 − µ2.

For an interval estimate of µ1 − µ2, we must consider the sampling distribution of $\bar{X}_1 - \bar{X}_2$.
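The general form "estimate ± (critical value) · s.e." from Section 9.5 translates directly into a small helper. The sketch below applies it to the figures of Example 9.10 (n = 100, x̄ = 255,500, s = 25,000, 95% confidence) using the large-sample z-value; `two_sided_ci` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_ci(estimate, std_error, critical_value):
    """Return estimate -/+ critical_value * std_error."""
    margin = critical_value * std_error
    return estimate - margin, estimate + margin

n, xbar, s = 100, 255_500, 25_000
z = NormalDist().inv_cdf(0.975)   # z_{0.025}; large-sample interval, n > 30
std_error = s / sqrt(n)           # estimated standard error of the mean
lo, hi = two_sided_ci(xbar, std_error, z)
print(round(lo), round(hi))
```

Under these inputs the interval runs from about $250,600 to $260,400. The same helper covers the t-based intervals if a tabulated t-value is passed as `critical_value`.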


9.8.1 Variances Known

Assume that both $\sigma_1^2$ and $\sigma_2^2$ are known. We have, from Subsection 8.4.3, that

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \sim N(0,1).$$

Now,

$$\begin{aligned}
1-\alpha &= P\left(-z_{\alpha/2} < Z < z_{\alpha/2}\right) \\
&= P\left(-z_{\alpha/2} < \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} < z_{\alpha/2}\right),
\end{aligned}$$

from which the following interval follows.

C.I. for µ1 − µ2, when both $\sigma_1^2$ and $\sigma_2^2$ known
If x̄1 and x̄2 are means of independent random samples of sizes n1 and n2 from populations with known variances $\sigma_1^2$ and $\sigma_2^2$, respectively, a 100(1 − α)% confidence interval for µ1 − µ2 is given by

$$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},$$

where $z_{\alpha/2}$ is the z-value defined previously.

NOTE. The confidence interval is exact when two independent samples are taken from normal populations. For non-normal populations, the Central Limit Theorem allows for a pretty good approximation for reasonable size samples.

EXAMPLE 9.11. We would like to compare the mean tar content in regular cigarettes and light cigarettes. We take simple random samples of regular and light cigarettes of a particular brand and measure the tar content (in mg) of each cigarette. The data are as follows:

Regular: 11.3 12.1 12.6 11.5 12.2 12.8
Light: 9.5 9.8 9.3 8.9 10.0

It is known that tar content for regular cigarettes of this brand follows a normal distribution with standard deviation 0.4 mg and tar content for light cigarettes of this brand follows a normal distribution with standard deviation 0.3 mg. Find a 95% confidence interval for the difference in mean tar content for all regular cigarettes and all light cigarettes of this brand.

9.8.2 Variances Unknown but Equal

Assume that the population variances $\sigma_1^2$ and $\sigma_2^2$ are unknown but equal, i.e., $\sigma_1^2 = \sigma_2^2 = \sigma^2$. We can show that the statistic

$$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma^2(1/n_1 + 1/n_2)}} \Bigg/ \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{\sigma^2(n_1 + n_2 - 2)}}$$

follows the Student t-distribution with ν = n1 + n2 − 2 degrees of freedom.

Define the pooled estimate of variance, or the pooled sample variance, as

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}.$$

NOTE. The pooled variance is just a weighted average of the two sample variances $S_1^2$ and $S_2^2$, where the weights are the respective degrees of freedom.

Then the above statistic becomes

$$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim T(n_1 + n_2 - 2).$$

Now,

$$\begin{aligned}
1-\alpha &= P\left(-t_{\alpha/2} < T < t_{\alpha/2}\right) \\
&= P\left(-t_{\alpha/2} < \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} < t_{\alpha/2}\right),
\end{aligned}$$

from which the following interval follows.

C.I. for µ1 − µ2, when $\sigma_1^2 = \sigma_2^2 = \sigma^2$, but Unknown
If x̄1 and x̄2 are means of independent random samples of sizes n1 and n2 from populations with unknown but equal variances, a 100(1 − α)% confidence interval for µ1 − µ2 is given by

$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$

where $t_{\alpha/2}$ is the t-value with n1 + n2 − 2 degrees of freedom defined previously and

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}.$$

EXAMPLE 9.12. An insurance company would like to know if men drive faster on average than women. The company took a random sample of 52 cars driven by men on a highway and found the mean speed to be 114 km/h with a standard deviation of 10 km/h. Another sample of 30 cars driven by women on the same highway gave a mean speed of 108 km/h with a standard deviation of 7 km/h. Construct a 98% confidence interval for the true difference between the mean speeds of cars driven by men and women on this highway.
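For the known-variances case, the interval in Section 9.8.1 can be evaluated on the tar-content data of Example 9.11. This is a sketch using the stated standard deviations (0.4 mg and 0.3 mg); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean

regular = [11.3, 12.1, 12.6, 11.5, 12.2, 12.8]  # tar content, mg
light = [9.5, 9.8, 9.3, 8.9, 10.0]
SIGMA_REG, SIGMA_LIGHT = 0.4, 0.3               # known population std devs

diff = mean(regular) - mean(light)
std_error = sqrt(SIGMA_REG**2 / len(regular) + SIGMA_LIGHT**2 / len(light))
z = NormalDist().inv_cdf(0.975)                 # z_{0.025} for 95% confidence
lo, hi = diff - z * std_error, diff + z * std_error
print(round(diff, 3), round(lo, 3), round(hi, 3))
```

The point estimate is about 2.58 mg and the interval is roughly (2.17, 3.00) mg.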

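The pooled-variance interval of Section 9.8.2 can be sketched with the summary statistics of Example 9.12. The critical value $t_{0.01} \approx 2.374$ for 80 degrees of freedom is an assumption taken from a standard t table and hard-coded here, since the Python standard library does not provide t quantiles.

```python
from math import sqrt

n1, xbar1, s1 = 52, 114.0, 10.0   # men: size, mean speed, std dev (km/h)
n2, xbar2, s2 = 30, 108.0, 7.0    # women
T_CRIT = 2.374  # assumed table value of t_{0.01} with n1 + n2 - 2 = 80 d.f.

# Pooled standard deviation: sample variances weighted by degrees of freedom
sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
std_error = sp * sqrt(1 / n1 + 1 / n2)
diff = xbar1 - xbar2
lo, hi = diff - T_CRIT * std_error, diff + T_CRIT * std_error
print(round(sp, 3), round(lo, 2), round(hi, 2))
```

Here the ratio s1/s2 = 10/7 lies between 1/2 and 2, so pooling is consistent with the rule of thumb in the notes. The resulting 98% interval is approximately (1.09, 10.91) km/h.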

NOTE. It is practically important to determine whether the population variances can be assumed to be equal or not. A rule of thumb is to look at the ratio of the sample standard deviations. If

$$\frac{1}{2} < \frac{s_1}{s_2} < 2,$$

equal variances can be assumed; otherwise, treat the variances as unequal.

9.8.3 Variances Unknown and Unequal

Assume that the population variances $\sigma_1^2$ and $\sigma_2^2$ are unknown and unequal, i.e., $\sigma_1^2 \ne \sigma_2^2$. The statistic

$$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \;\dot\sim\; T(\nu),$$

where

$$\nu = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\left[(s_1^2/n_1)^2/(n_1 - 1)\right] + \left[(s_2^2/n_2)^2/(n_2 - 1)\right]}.$$

NOTE. The expression for ν above is an estimate of the degrees of freedom. In applications, it is rarely a whole number, and we should round it down to the nearest integer to achieve the desired confidence.

C.I. for µ1 − µ2, when $\sigma_1^2 \ne \sigma_2^2$, but Unknown
If x̄1 and $s_1^2$ and x̄2 and $s_2^2$ are the means and variances of independent random samples of sizes n1 and n2, respectively, from approximately normal populations with unknown and unequal variances, an approximate 100(1 − α)% confidence interval for µ1 − µ2 is given by

$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},$$

where $t_{\alpha/2}$ is the t-value defined previously with ν as above.

EXAMPLE 9.13. The gasoline prices (in cents/litre) for a random sample of 8 Winnipeg gas stations and 5 Calgary gas stations are recorded one day and are shown below:

Winnipeg: 119.9 122.4 121.7 120.9 121.0 122.9 119.9 121.7

Calgary: 117.9 120.4 118.4 122.9 117.0

Find a 95% confidence interval for the difference in mean gas prices for the two cities.

9.9 Paired Observations

Take STAT 3612 for "Paired Observations"

9.10 Single Sample: Estimating a Proportion

Suppose that we draw a random sample of size n from a large population having population proportion p of successes. Let X be the count of successes in the sample, which follows the binomial distribution with parameters n and p.

Define the sample proportion of successes

$$\hat{P} = \frac{X}{n}.$$

When n is large, the sampling distribution of P̂ is approximately normal with mean

$$\mu_{\hat{P}} = E(\hat{P}) = E\left(\frac{X}{n}\right) = \frac{np}{n} = p$$

and variance

$$\sigma^2_{\hat{P}} = \text{Var}(\hat{P}) = \text{Var}\left(\frac{X}{n}\right) = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}.$$

That is,

$$\hat{P} \;\dot\sim\; N\left(p, \sqrt{\frac{p(1-p)}{n}}\right), \quad n \to \infty,$$

or,

$$Z = \frac{\hat{P} - p}{\sqrt{\frac{p(1-p)}{n}}} \to N(0,1).$$

NOTE. As a rule of thumb, this approximation requires np ≥ 5 and n(1 − p) ≥ 5.

Now,

$$\begin{aligned}
1-\alpha &= P\left(-z_{\alpha/2} < Z < z_{\alpha/2}\right) \\
&= P\left(-z_{\alpha/2} < \frac{\hat{P} - p}{\sqrt{\frac{p(1-p)}{n}}} < z_{\alpha/2}\right) \\
&\approx P\left(-z_{\alpha/2} < \frac{\hat{P} - p}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}} < z_{\alpha/2}\right) \\
&= P\left(\hat{P} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{P} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right),
\end{aligned}$$

where we used the point estimate p̂ = x/n to replace p under the radical sign.

Large-Sample Confidence Intervals for p
If p̂ is the proportion of successes in a random sample of size n, an approximate 100(1 − α)% confidence interval for the binomial parameter p is given by

$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$
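The unequal-variance procedure of Section 9.8.3 can be traced on the gas price data of Example 9.13, including the estimated degrees of freedom. In this sketch ν is computed from the formula in the notes and rounded down; the critical value $t_{0.025} \approx 2.571$ for 5 degrees of freedom is an assumed table value, hard-coded because the standard library has no t quantiles.

```python
from math import floor, sqrt
from statistics import mean, variance

winnipeg = [119.9, 122.4, 121.7, 120.9, 121.0, 122.9, 119.9, 121.7]
calgary = [117.9, 120.4, 118.4, 122.9, 117.0]

a = variance(winnipeg) / len(winnipeg)   # s1^2 / n1
b = variance(calgary) / len(calgary)     # s2^2 / n2
nu = (a + b) ** 2 / (a**2 / (len(winnipeg) - 1) + b**2 / (len(calgary) - 1))
df = floor(nu)                           # round down, as the notes advise

T_CRIT = 2.571  # assumed table value of t_{0.025} with 5 d.f.; valid only if df == 5
diff = mean(winnipeg) - mean(calgary)
std_error = sqrt(a + b)
lo, hi = diff - T_CRIT * std_error, diff + T_CRIT * std_error
print(round(nu, 2), df, round(lo, 2), round(hi, 2))
```

For these data the ratio of sample standard deviations is below 1/2, so the rule of thumb points to the unequal-variance interval. The result is about (−0.90, 4.86) cents/litre, an interval that includes zero.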

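The large-sample interval for a proportion can be sketched with the counts from Example 9.14 (421 "Yes" answers out of 500) at the 90% level. The function name `proportion_ci` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, conf):
    """Large-sample interval: p_hat -/+ z_{alpha/2} * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = successes / n
    std_error = sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return p_hat, std_error, (p_hat - z * std_error, p_hat + z * std_error)

p_hat, std_error, (lo, hi) = proportion_ci(421, 500, 0.90)
print(round(p_hat, 3), round(std_error, 4), round(lo, 3), round(hi, 3))
```

The sample proportion is 0.842 with a standard error of about 0.0163, giving an interval of roughly (0.815, 0.869). Both n·p̂ and n(1 − p̂) are well above 5 here, in line with the rule of thumb for the normal approximation.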

NOTE. The (estimated) standard error of P̂ is defined by

$$\text{s.e.}(\hat{P}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$

EXAMPLE 9.14. A question in a Christmas tree market survey was "Did you have a Christmas tree last year?" Of the 500 respondents, 421 answered "Yes."

(a) Find the sample proportion and its standard error.
(b) Give a 90% confidence interval for the proportion of Indiana households who had a Christmas tree this year.

EXAMPLE 9.15. When trying to hire managers and executives, companies sometimes verify the academic credentials described by the applicants. One company that performs these checks summarized its findings for a six-month period. Of the 84 applicants whose credentials were checked, 15 lied about having a degree. (Data provided by Jude M. Werra & Associates, Brookfield, Wisconsin.)

(a) Find the proportion of applicants who lied about having a degree and its standard error.
(b) Consider these data to be a random sample of credentials from a large collection of similar applicants. Give a 95% confidence interval for the true proportion of applicants who lie about having a degree.

In a similar fashion, if e is a pre-fixed amount that the error cannot exceed, we set $z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} < e$ and solve for n to determine the sample size.

Sample Size Determination
If p̂ is used as an estimate of p, we can be 100(1 − α)% confident that the error will be less than a specified amount e when the sample size is approximately

$$n = \frac{(z_{\alpha/2})^2\,\hat{p}(1-\hat{p})}{e^2}.$$

NOTE. In order to ensure that the degree of confidence is no less than 100(1 − α)%, we round all fractional values up to the next whole number.

EXAMPLE 9.16. An automobile manufacturer would like to know what proportion of its customers are dissatisfied with the service received from their local dealer. The customer relations department will survey a random sample of customers and compute a 95% confidence interval for the proportion that are dissatisfied. From past studies, they believe that this proportion will be about 0.25. Find the sample size needed if the error of the confidence interval is to be no more than 0.02.

NOTE. If we have no idea what the value of p might be, we can use p̂ = 0.5 in the sample size formula, i.e.,

$$n = \frac{(z_{\alpha/2})^2}{4e^2}.$$

This is the most conservative estimate of the sample size.

EXAMPLE 9.17. The use of email is growing rapidly and is having a dramatic effect on the way we communicate. Suppose that we want to determine the current proportion of Canadian households using email. How many households must be surveyed to estimate the proportion with 90% confidence and an error of no more than 3%?

9.11 Two Samples: Estimating the Difference between Two Proportions

We will now turn our attention to the case where we wish to compare two population proportions and would like to estimate the difference in population proportions p1 − p2, where p1 and p2 are the true proportions of all individuals in Population 1 and Population 2, respectively, who have some attribute.

To do this, we will take a random sample of size n1 from Population 1 and a random sample of size n2 from Population 2, and then calculate p̂1 and p̂2, the sample proportions from the first and second samples, respectively.

Hence it is quite natural that our point estimate of p1 − p2 is p̂1 − p̂2. The mean of p̂1 − p̂2 is

$$\mu_{\hat{p}_1 - \hat{p}_2} = E(\hat{p}_1 - \hat{p}_2) = E(\hat{p}_1) - E(\hat{p}_2) = p_1 - p_2$$

and, since the sample proportions are independent, the variance of p̂1 − p̂2 is

$$\sigma^2_{\hat{p}_1 - \hat{p}_2} = \text{Var}(\hat{p}_1 - \hat{p}_2) = \text{Var}(\hat{p}_1) + \text{Var}(\hat{p}_2) = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}.$$

If both sample sizes are large, we have the approximate distribution of p̂1 − p̂2:

$$\hat{p}_1 - \hat{p}_2 \;\dot\sim\; N\left(p_1 - p_2,\ \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}\right)$$

and so

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}} \;\dot\sim\; N(0,1).$$
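The two sample-size rules for a proportion (with a planning value of p, and the conservative p̂ = 0.5 version) can be sketched in one helper. The inputs are those of Examples 9.16 and 9.17; the function name and its default argument are illustrative choices.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(e, conf, p=0.5):
    """n = z_{alpha/2}^2 * p(1 - p) / e^2, rounded up; p = 0.5 is the conservative choice."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(z**2 * p * (1 - p) / e**2)

n_dealer = sample_size_for_proportion(e=0.02, conf=0.95, p=0.25)  # Example 9.16
n_email = sample_size_for_proportion(e=0.03, conf=0.90)           # Example 9.17, no prior p
print(n_dealer, n_email)
```

With these inputs the helper gives 1801 customers for Example 9.16 and 752 households for Example 9.17. Using p = 0.5 never yields a smaller n than any other planning value, because p(1 − p) is largest at p = 0.5.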

STAT-3611 Lecture Notes 2015 Fall X. Li


Section 9.12. Single Sample: Estimating the Variance 51
304 Chapter 9 One- and Two-Sample Estimation Problems

NOTE. The (estimated) standard error of $\hat p_1 - \hat p_2$ is given by the estimate of the standard deviation

$$\mathrm{s.e.}(\hat p_1 - \hat p_2) = \sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}.$$
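Besides a normal quantile, the estimated standard error is the only ingredient needed for a large-sample interval of the form $(\hat p_1 - \hat p_2) \pm z_{\alpha/2}\,\mathrm{s.e.}(\hat p_1 - \hat p_2)$. The minimal Python sketch below computes such an interval, assuming both samples are large enough for the normal approximation. The function name `two_proportion_ci` and its argument names are hypothetical, and the counts (87 of 150 and 54 of 120) are those of the survey in Example 9.18.

```python
import math
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample confidence interval for p1 - p2.

    x1, x2 -- numbers of successes in the two samples
    n1, n2 -- sample sizes
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Estimated standard error of p1_hat - p2_hat.
    se = math.sqrt(p1_hat * (1.0 - p1_hat) / n1 + p2_hat * (1.0 - p2_hat) / n2)
    # z_{alpha/2}: the value leaving an area of alpha/2 in the upper tail.
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se


# Survey counts from Example 9.18: older adults first, young adults second.
lower, upper = two_proportion_ci(87, 150, 54, 120, confidence=0.95)
print(f"{lower:.3f} < p1 - p2 < {upper:.3f}")
```

For these counts the point estimate is 0.58 − 0.45 = 0.13 and the interval lies entirely above zero, running from roughly 0.011 to 0.249.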
Large-Sample Confidence Interval for p1 − p2

If $\hat p_1$ and $\hat p_2$ are the proportions of successes in random samples of sizes $n_1$ and $n_2$, respectively, an approximate $100(1-\alpha)\%$ confidence interval for the difference of two binomial parameters, $p_1 - p_2$, is given by

$$(\hat p_1 - \hat p_2) - z_{\alpha/2}\,\mathrm{s.e.}(\hat p_1 - \hat p_2) < p_1 - p_2 < (\hat p_1 - \hat p_2) + z_{\alpha/2}\,\mathrm{s.e.}(\hat p_1 - \hat p_2)$$

or

$$(\hat p_1 - \hat p_2) - z_{\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}} < p_1 - p_2 < (\hat p_1 - \hat p_2) + z_{\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}$$

EXAMPLE 9.18. Do older adults and young adults have different views on Canada's involvement in the war in Afghanistan? A sample of 150 older adults (aged 40-65) and a sample of 120 young adults (aged 18-30) were selected. Respondents were asked if they approved of Canada's continued involvement in Afghanistan. Of the older adults, 87 said they agree with the decision, while 54 of the young adults said they agree. Let $p_1$ be the true population proportion of all older adults who agree with the war and let $p_2$ be the true population proportion of all young adults who agree with the war. Calculate a 95% confidence interval for the difference in population proportions $p_1 - p_2$.

9.12 Single Sample: Estimating the Variance

We have shown that the sample variance $S^2$ is an unbiased estimator of the population variance $\sigma^2$. Thus, $S^2$ is a point estimate of $\sigma^2$.

We have also shown that the statistic

$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$$

if random samples of size $n$ are selected from a normal population.

Based on this,

$$P\left(\chi^2_{1-\alpha/2} < \chi^2 < \chi^2_{\alpha/2}\right) = 1 - \alpha$$

[Figure 9.7: $P(\chi^2_{1-\alpha/2} < X^2 < \chi^2_{\alpha/2}) = 1 - \alpha$. The plot marks a central area of $1-\alpha$ and an area of $\alpha/2$ in each tail.]

And,

$$\begin{aligned}
1 - \alpha &= P\left(\chi^2_{1-\alpha/2} < \frac{(n-1)S^2}{\sigma^2} < \chi^2_{\alpha/2}\right) \\
&= P\left(\frac{1}{\chi^2_{\alpha/2}} < \frac{\sigma^2}{(n-1)S^2} < \frac{1}{\chi^2_{1-\alpha/2}}\right) \\
&= P\left(\frac{(n-1)S^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}}\right)
\end{aligned}$$

where $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ are as we defined previously. Equivalently, dividing each term in the inequality by $(n-1)S^2$ and then inverting each term (thereby changing the sense of the inequalities), we obtain

$$P\left[\frac{(n-1)S^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}}\right] = 1 - \alpha.$$

For a random sample of size $n$ from a normal population, the sample variance $s^2$ is computed, and the following $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is obtained.

Confidence Intervals for σ²

If $s^2$ is the variance of a random sample of size $n$ from a normal population, a $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is given by

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$

where $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ are $\chi^2$-values with $\nu = n - 1$ degrees of freedom, leaving areas of $\alpha/2$ and $1 - \alpha/2$, respectively, to the right.

Example 9.18: The following are the weights, in decagrams, of 10 packages of grass seed distributed by a certain company: 46.4, 46.1, 45.8, 47.0, 46.1, 45.9, 45.8, 46.9, 45.2, and 46.0. Find a 95% confidence interval for the variance of the weights of all such packages of grass seed distributed by this company, assuming a normal population.

Solution: First we find

$$s^2 = \frac{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n-1)} = \frac{(10)(21{,}273.12) - (461.2)^2}{(10)(9)} = 0.286.$$

EXAMPLE 9.19. The bottlers of a new soft drink are experiencing problems with the filling mechanism for their 16 fl oz bottles. To estimate the standard deviation of the fill volume, the filled volume for 20 bottles was measured, yielding a sample standard deviation of 0.1 fl oz. Compute a 95% confidence interval for the population variance.

NOTE. Under the same setup, a $100(1-\alpha)\%$ confidence interval for $\sigma$ is obtained by taking the square root of each endpoint of the interval for $\sigma^2$:

$$\sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2}}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}}$$

EXAMPLE 9.20. Suppose that data are collected from a random sample of 20 observations from a normal population and that the sample variance is 100. Construct a 90% confidence interval for the population standard deviation $\sigma$.
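The chi-squared intervals for $\sigma^2$ and $\sigma$ can be checked numerically. The sketch below is one possible way to do so using only the Python standard library. It is an assumption of this example, not the notes' method, that the $\chi^2$ quantiles are found by bisection on a series expansion of the chi-squared distribution function instead of being read from a table. The helper names `chi2_cdf`, `chi2_upper_quantile` and `variance_ci` are hypothetical.

```python
import math


def chi2_cdf(x, df):
    """P(X <= x) for X ~ chi-squared(df), using the power series of the
    regularized lower incomplete gamma function (all terms positive)."""
    if x <= 0:
        return 0.0
    a, t = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-16 and n < 100000:
        n += 1
        term *= t / (a + n)
        total += term
    return total * math.exp(-t + a * math.log(t) - math.lgamma(a))


def chi2_upper_quantile(upper_tail, df):
    """Value c with P(X > c) = upper_tail, found by bisection."""
    target = 1.0 - upper_tail
    lo, hi = 0.0, float(df)
    while chi2_cdf(hi, df) < target:  # widen the bracket until it covers c
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def variance_ci(s2, n, confidence=0.95):
    """Interval (n-1)s^2/chi2_{alpha/2} < sigma^2 < (n-1)s^2/chi2_{1-alpha/2}."""
    alpha = 1.0 - confidence
    df = n - 1
    lower = df * s2 / chi2_upper_quantile(alpha / 2.0, df)
    upper = df * s2 / chi2_upper_quantile(1.0 - alpha / 2.0, df)
    return lower, upper


# Example 9.19: n = 20, s = 0.1 fl oz, 95% interval for the variance.
var_lo, var_hi = variance_ci(0.1 ** 2, 20, confidence=0.95)
print(f"{var_lo:.4f} < sigma^2 < {var_hi:.4f}")

# Example 9.20: n = 20, s^2 = 100, 90% interval for the standard deviation.
v_lo, v_hi = variance_ci(100.0, 20, confidence=0.90)
sd_lo, sd_hi = math.sqrt(v_lo), math.sqrt(v_hi)
print(f"{sd_lo:.2f} < sigma < {sd_hi:.2f}")
```

With 19 degrees of freedom the computed quantiles should agree with the usual table values (about 32.852 and 8.907 for the 95% case, 30.144 and 10.117 for the 90% case). This gives roughly 0.0058 < σ² < 0.0213 for Example 9.19 and roughly 7.94 < σ < 13.70 for Example 9.20.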
