Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
The null hypothesis of Tomato Example 10.1 is: N OTE . Note again that hypotheses always refer to some
population or model, not to a particular outcome. For
H0 : µ = 227 this reason, we must state H0 and H1 in terms of popu-
lation parameters. So, x > 100, x1 − x2 < 1.2, p̂ 6= 0.75,
where µ is the true average weight of tomato packs. σ 2 < 0.4 or σ12 /σ22 6= 1 would NOT be the correct sta-
N OTE . The conclusions do NOT involve a formal and tistical hypothesis.
literal “accept H0 .”
The most well-known illustration for hypothesis
One should reach at one of the two following testing is on the predicament encountered in a jury
conclusions: trial, where the null and alternative hypotheses are:
H0 : defendant is innocent
• reject H0 in favor of H1 because of sufficient
H1 : defendant is guilty.
evidence in the data
• fail to reject H0 because of insufficient evidence Would ”failure to reject H0 ” imply innocence? Do you
in the data. think the jury should ”accept H0 ” or ”fail to reject
H0 ”?
N OTE . Keep in mind that one should reach at one of
Alternative Hypothesis H1
the two following conclusions:
The statement making the claim which we are trying
to find evidence for supporting is called the alternative • reject H0 in favor of H1 because of sufficient evi-
hypothesis, denoted H1 or Ha . dence in the data
E XAMPLE 10.2 (Hepatitis B Vaccine). Henning et al • When H0 is true, we expect the estimate to take
(Amer. J. Pub. Health 1992) found that in a sample of a value near the parameter value specified by H0 .
670 infants, 66 percent of them had completed the hep-
atitis B vaccine series. Can we conclude on the basis of • Values of the estimate far from the parameter
these data that in the sampled population, more than 60 value specified by H0 give evidence against H0 .
percent infants have completed the hepatitis B vaccine
• The alternative hypothesis determines which di-
series? What are the null and alternative hypotheses?
rection count against H0 .
E XAMPLE 10.3 (Material). An experiment was per-
formed to compare the abrasive wear of two different
laminated materials. Twelve pieces of material 1 were
The test statistic of Tomato Example 10.1 can be
tested by exposing each piece to a machine measuring
found to be
wear. Ten pieces of material 2 were similarly tested. In
each case, the depth of wear was observed. The samples x − µ0 218 − 227
z0 = √ = √ = −3.6
of material 1 gave an average (coded) wear of 81 units σ/ n 5/ 4
with a sample standard deviation of 4, while the samples
of material 2 gave an average of 85 with a sample stan-
dard deviation of 5. Can we conclude that the abrasive The test statistic assesses how far the estimate is from
wear of material 1 is less than that of material 2? the parameter. The standardized estimate is usually
The alternative hypothesis allows for the possibility that µ < 68 or µ > 68.
A sample mean that falls close to the hypothesized
Section 10.2. Testingvalue a
of Hypothesis
68 would be consid- 55
ered evidence in favor of H0 . On the other hand, a sample mean that is considerably
less than or more than 68 would be evidence inconsistent with H0 and therefore
favoring H1 . The sample mean is the test statistic in this case. A critical region
for the test statistic might arbitrarily be chosen to be the two intervals x̄ < 67
adopted. In many common situations the test statistic For testing H : µ = 68 v.s. H : µ 6= 68,
and x̄ > 69. The nonrejection
0 1 be the interval 67 ≤ x̄ ≤ 69. This
region will then
decision criterion is illustrated in Figure 10.4.
is of form
estimate - hypothesized value Reject H0 Do not reject H0 Reject H0
( µ!
" 68) ( µ ! 68) (µ !
" 68)
standard deviation of the estimate x
67 68 69
For example, the statistic for testing H0 : µ = µ0 is
Figure 10.4: Critical region (in blue).
x − µ0 A summary fordecision
thecriterion
Critical/Rejection Region Ap-
Let us now use the of Figure 10.4 to calculate the probabilities
z0 = √ , when σ is known of committing type I and type II errors when testing the null hypothesis that µ = 68
σ/ n proach
kilograms against the alternative that µ != 68 kilograms.
x−µ Assume the standard deviation of the population of weights to be σ = 3.6. For
t0 = √ 0 , when σ is unknown. large samples, we may substitute s for σ if no other estimate of σ is available.
s/ n 1. State the null and alternative hypotheses.
Our decision statistic, based on a random sample of size n = 36, will be X̄, the
most efficient estimator of µ. From the Central Limit Theorem, we know that
2. Choose an appropriate test statistic and estab-
the sampling
√ distribution of X̄ is approximately normal with standard deviation
σX̄ = σ/ n = 3.6/6 = 0.6.
lish the critical region based on α.
Significance Level α 3. Reject H0 if the computed test statistic is in the
critical region. Otherwise, do not reject.
The probability that the test statistic will fall in
the critical region when the null hypothesis is actually 4. Draw scientific or engineering conclusions.
true. It is also the probability of making the mistake of
rejecting the null hypothesis when it is true. Common E XAMPLE 10.4 (Health). A health advocacy group sus-
choices of α is 0.01, 0.05, 0.10, with 0.05 being most pects that cigarette manufacturers sell cigarettes with a
common. nicotine content higher than what they advertise (1.4 mg
per cigarette) in order to better addict consumers to their
N OTE . If the level of significance α is NOT given, a
products and maintain revenues. The health advocacy
rule of thumb is to take α = 0.05.
group took a sample random sample of 12 cigarettes and
found that the sample mean was 1.6 mg with a standard
Critical Region (Rejection Region) deviation of 0.3 mg. Suppose that the measurement of
nicotine in cigarettes is normally distributed. Is there
The set of all values of the test statistic that cause
any evidence for the suspicion of the health advocacy
us to reject the null hypothesis. They are the extreme
group at α = 5%? State the hypotheses, calculate the
regions bounded by the critical values.
value of the test statistic, write the rejection region and
draw the statistical conclusion.
For example, again, suppose that we test
• H0 : µ = µ0 v.s. H1 : µ 6= µ0 : The critical region Type I Error, Type II Error and Power
is in the two extreme regions 10.2
(bothTesting
tails).a Statistical Hypothesis 323
As shown in the following diagram, there are four sit-
test statistic value > zα/2 or tα/2 uations we 10.1.
may encounter upon making a conclusion
summarized in Table
for a statistical test of hypothesis:
• H0 : µ = µ0 v.s. H1 : µ < µ0 : The critical region Table 10.1: Possible Situations for Testing a Statistical Hypothesis
is in the extreme left region (left tail). H0 is true H0 is false
Do not reject H0 Correct decision Type II error
test statistic value < −zα or − tα Reject H0 Type I error Correct decision
The probability of committing a type I error, also called the level of signif-
• H0 : µ = µ0 v.s. H1 : µ > µ0 : The critical regionicance, Obviously,
is denoted bywe
thewant
Greek to reject
letter Hour
α. In 0 when it is false
illustration, a typeand
I error will
is in the extreme right region (right tail). we more
occur when wantthan
to fail to rejectinoculated
8 individuals H0 whenwithit isthetrue.
new vaccine surpass the
2-year period without contracting the virus and researchers conclude that the new
vaccine is better when it is actually equivalent to the one in use. Hence, if X is
test statistic value > zα or tα Type I Error
the number of individuals who remain free of the virus for at least 2 years,
Rejection of the null !
hypothesis when1 "
it is #
true
20 !is called
1
"
αa =type
P (type I error) = P
I error. X > 8 when p = = b x; 20,
Decision Criterion using Critical Regions 4 x=9
4
8
# ! "
1
= 1 −We say
b x;we
20,have=made a Type
1 − 0.9591 I Error if we reject
= 0.0409.
• If the test statistic falls within the critical region, H0 when 4
x=0 H0 is true.
we reject H0 at a level of significance α.
We say that the null hypothesis, p = 1/4, is being tested at the α = 0.0409 level of
α = P (type I error)
significance. Sometimes the level of significance is called the size of the test. A
• If the test statistic does not fall within the criti-critical region of size 0.0409=
is P (reject
very 0 |Htherefore
small,Hand 0 true) it is unlikely that a type
cal region, we fail to (do not) reject H0 at an αI error will be committed. Consequently, it would be most unusual for more than
level. 8 individuals
It is to remain
also immune
called to a virus
the size of thefortest.
a 2-year period using a new vaccine
that is essentially equivalent to the one now on the market.
N OTE . This is just our regular level of significance we (c) Compare, graphically, the results in part (b) and
have been using all along. discuss your findings.
Power
The power of a test is the probability of rejecting H0 10.3 The Use of P -Values
given that a specific alternative is true.
P-value
Power = 1 − β
= P (reject H0 |H1 true) The probability, computed assuming that H0 is true,
that the test statistic would take a value as extreme or
more extreme than the actual observed value is called
N OTE . The power of a test of significance depends on the P-value of the test.
µ1 , the specific value of the mean10.3 Thethe
under Usealternative
of P -Values for Decision Making in Testing Hypotheses 333
hypothesis. In other words, a P-value is the lowest level (of
A Graphical Demonstration of a P-Value
significance) at which the observed value of the test
E XAMPLE 10.5. Suppose we wanted to conduct a One hy- very simple
statistic is significant.
way of explaining a P -value graphically is to consider two distinct
pothesis test to determine whether or not a coin is fair, samples. Suppose that two materials are being considered for coating a particular
i.e., H0 : p = 0.5 vs. H1 : p 6= 0.5, where p is the proba-
type of metal in order to inhibit corrosion. Specimens are obtained, and one
• This is a way of assessing the “believability” of
bility of the coin landing on heads. We will flip the coincollection is coated with material 1 and one collection coated with material 2. The
sample sizes are n the=null
n2 =hypothesis given the
10, and corrosion evidenceinprovided
is measured percent of surface
ten times and count X, the number of heads we observe. 1
area affected. Theby a random
hypothesis sample.
is that the samples came from common distributions
We will reject H0 if X ≤ 2 or if X ≥ 8. with mean µ = 10. Let us assume that the population variance is 1.0. Then we
are testing • The P-value is used to measure the strength of
(a) What is the probability of a Type I Error? the evidence in the data against H0 .
H0: µ1 = µ2 = 10.
(b) What is the power of the test if p = 0.3? • The smaller the P-value, the stronger the evi-
Let Figure 10.8 represent a point plot of the data; the data are placed on the
dence against H0 .
distribution stated by the null hypothesis. Let us assume that the “×” data refer to
E XAMPLE 10.6. A manufacturer has developed a new material 1 and the “◦” data refer to material 2. Now it seems clear that the data do
fishing line, which the company claims has a mean break- • The P-value is calculated assuming that the null
refute the null hypothesis. But how can this be summarized in one number? The
hypothesis
P-value can be viewed as simply
ing strength of 15 kilograms with a standard deviation H 0 is true.
the probability of obtaining these data
given
of 0.5 kilogram. To test the hypothesis that µ = 15 kilo- that both samples come from the same distribution. Clearly, this
• The
probability is quite pre-fix
small, significanceThus,
say 0.00000001! levelthe
is small
not required for refutes
P -value clearly
grams against the alternative that µ < 15 kilograms, a calculating the P-values.
H0 , and the conclusion is that the population means are significantly different.
random sample of 50 lines will be tested. The critical
region is defined to be x < 14.9.
Figure 10.8: Data that are likely generated from populations having two different
STAT-3611 Lecture Notes means. 2015 Fall X. Li
For example, again, the P-values can be calculated as N OTE . This approach and the rejection region approach
follows for one sample mean testing problem. are the same in nature. The confidence area is the “op-
posite” of the rejection region.
To test a population mean µ when σ is known,
E XAMPLE 10.8. Bottles of antacid tablets include a
• 2P (Z ≥ |z0 |) for H1 : µ 6= µ0 printed label claiming that they contain 1000 mg of cal-
cium carbonate. In a simple random sample of 21 bottles
• P (Z ≥ z0 ) for H1 : µ > µ0
of tablets made by the Medassist Pharmaceutical Com-
• P (Z ≤ z0 ) for H1 : µ < µ0 pany, the amounts of calcium carbonate are measured
and the average amount of calcium carbonate is found
x − µ0 to be 985 mg. Suppose that the amounts of calcium
where z0 = √ and Z ∼ N (0, 1).
σ/ n carbonate in bottles of antacid tablets produced by the
company follows a normal distribution with a standard
To test a population mean µ when σ is unknown, deviation of 50 mg. Is there sufficient evidence to sup-
port the claim that consumers are being cheated at the
• 2P (T ≥ |t0 |) for H1 : µ 6= µ0 significance level of α = 0.01. Use the confidence inter-
• P (T ≥ t0 ) for H1 : µ > µ0 val approach.
x − µ0
where t0 = √ and T ∼ t(n − 1). • The critical region method and the confidence
s/ n interval method give a black and white answer:
N OTE . The inequality sign for P-value is consistent with Reject or do not reject H0 . But it also estimates a
that in the one-sided alternative hypothesis. range of likely values for the true population pa-
rameters.
Decision Criterion using P-values
• A P-value quantifies how strong the evidence is
• If the P-value ≤ α, we reject H0 at the level of against the H0 . But if you reject H0 , it does not
significance α. provide any information about the true population
• If the P-value > α, we fail to (do not) reject H0 parameter.
at the level of significance α.
pressure in this sample is 126.07 and the standard devia- E XAMPLE 10.11. Refer to Example Material (10.3).
tion is 15. Is this evidence that executive blood pressures Conduct a hypothesis test at a level of significance 5%.
are higher than the national average? Assume normality. Assume the populations to be approximately normal with
unknown-but-equal variances.