
Testing hypotheses about proportions
Tests of significance
The reasoning of significance tests
Stating hypotheses
The P-value
Statistical significance
Tests for a population proportion
Confidence intervals to test hypotheses
Reasoning of Significance Tests

Example: A coin is tossed 500 times. It lands heads
275 times, which is a bit more than we expect. Is
the coin fair or not?
Is the somewhat higher number of heads due to
chance variation?
Is it evidence that the coin is not fair?
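A minimal sketch of how this question is settled numerically, using only Python's standard library (the z statistic and normal approximation are introduced later in these notes; `normal_cdf` is a helper defined here, not a library function):

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Coin example: 275 heads in 500 tosses, H0: p = 1/2
n, heads, p0 = 500, 275, 0.5
p_hat = heads / n                        # observed proportion, 0.55
se = math.sqrt(p0 * (1 - p0) / n)       # standard deviation under H0
z = (p_hat - p0) / se                   # test statistic
p_value = 2 * (1 - normal_cdf(abs(z)))  # two-sided P-value

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

Here z ≈ 2.24 and the two-sided P-value ≈ 0.025, so at the usual α = 0.05 level the 275 heads would in fact be judged statistically significant.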


Stating Hypotheses

Situation: We observe some effect and we have
two explanations for it:
1) the effect is due to chance variation
2) the effect is due to something significant
How to decide?
Statement 1) = null hypothesis H0 (the coin is fair)
Statement 2) = alternative hypothesis Ha (the coin is not fair)

The null hypothesis is a very specific statement about a parameter of the
population(s). It is labeled H0 and states the status quo: previous knowledge,
no effect, the observed difference is due to chance. It is the one which we
want to reject.
The alternative hypothesis is a more general statement about a parameter of
the population(s) that is the opposite of the null hypothesis. It is labeled Ha
and is the one we try to prove.
Coin tossing example:
H0: p = 1/2 (p is the probability that the coin lands heads)
Ha: p ≠ 1/2 (p is either larger or smaller)
Analogy with a criminal trial
H0: the defendant is innocent
If sufficient evidence is presented, the jury will reject this hypothesis and conclude that
Ha: the defendant is guilty

One-sided and two-sided tests
A two-tail or two-sided test of the population proportion
has these null and alternative hypotheses:
H0: p = p0 [a specific proportion]    Ha: p ≠ p0

A one-tail or one-sided test of a population proportion has
these null and alternative hypotheses:
H0: p = p0 [a specific proportion]    Ha: p < p0
OR
H0: p = p0 [a specific proportion]    Ha: p > p0

What determines the choice of a one-sided versus a two-sided test is what we
know about the problem before we perform a test of statistical significance.
It is important to make the choice before performing the test, or else you
could make a choice of convenience.

The P-value
Example (cont'd): A coin is tossed 500 times. It lands heads 275 times.
H0: p = 1/2 vs. Ha: p ≠ 1/2
What is the chance of observing something like what we
observed if H0 is true?
Tests of statistical significance quantify the chance of
obtaining a particular random sample result if the null
hypothesis were true. This quantity is the P-value.
This is a way of assessing the believability of the null
hypothesis, given the evidence provided by a random sample.


Interpreting a P-value
Could random variation alone account for the difference between the null
hypothesis and observations from a random sample?
A small P-value implies that random variation due to the sampling process
alone is not likely to account for the observed difference.
With a small P-value we reject H0. The true property of the population is
significantly different from what was stated in H0.
Thus, small P-values are strong evidence AGAINST H0.
Oftentimes, a P-value of 0.05 or less is considered significant: the
phenomenon observed is unlikely to be entirely due to a chance event from the
random sampling.
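This definition can be made concrete with a small simulation: repeatedly toss a fair coin 500 times and record how often chance alone produces a result at least as extreme as the 275 heads from the running example (i.e., at least 25 heads away from the expected 250). A sketch using only the standard library; the seed and trial count are arbitrary choices:

```python
import random

random.seed(1)  # reproducible runs
n_tosses, trials = 500, 10_000
extreme = 0
for _ in range(trials):
    heads = sum(random.random() < 0.5 for _ in range(n_tosses))
    if abs(heads - 250) >= 25:  # as far from 250 as the observed 275
        extreme += 1
mc_p_value = extreme / trials
print(f"Monte Carlo P-value ~ {mc_p_value:.3f}")
```

With these settings the estimate typically lands around 0.02–0.04, consistent with the normal-approximation P-value for this example.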



Test for a population proportion
The sampling distribution of the sample proportion p̂ is approximately normal for
large sample sizes, and its shape depends solely on p and n.
Thus, we can easily test the null hypothesis:
H0: p = p0 (a given value we are testing).
If H0 is true, the sampling distribution of p̂ has mean p0 and standard
deviation √(p0(1 − p0)/n), so the test statistic is

z = (p̂ − p0) / √(p0(1 − p0)/n)

The likelihood of our sample proportion, given the null
hypothesis, depends on how far our p̂ is from p0 in
units of standard deviation.
This is valid when both the expected successes np0 and the expected
failures n(1 − p0) are each 10 or larger.
P-values and one or two sided hypotheses
As always, if the P-value is as small as or smaller than the significance level
α, then the difference is statistically significant and we reject H0.
A national survey by the National Institute for Occupational Safety and Health on
restaurant employees found that 75% said that work stress had a negative impact on
their personal lives.
You investigate a restaurant chain to see if the proportion of all their employees
negatively affected by work stress differs from the national proportion p0 = 0.75.
H0: p = p0 = 0.75 vs. Ha: p ≠ 0.75 (two-sided alternative)
In your SRS of 100 employees, you find that 68 answered "Yes" when asked, "Does work
stress have a negative impact on your personal life?"
The expected counts are 100 × 0.75 = 75 and 100 × 0.25 = 25.
Both are greater than 10, so we can use the z-test.
The test statistic is z = (0.68 − 0.75) / √(0.75 × 0.25 / 100) ≈ −1.62.
From the standard normal table we find the area to the left of z = 1.62 is 0.9474.
Thus P(Z ≥ 1.62) = 1 − 0.9474 = 0.0526. Since the alternative hypothesis is two-sided,
the P-value is the area in both tails: P = 2 × 0.0526 = 0.1052 > 5%.
The chain restaurant data are not significantly different from the
national survey results.
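The restaurant calculation can be reproduced in a few lines. A sketch using only the standard library; `normal_cdf` is a helper defined here, not a library function:

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, yes, p0 = 100, 68, 0.75
p_hat = yes / n                                   # 0.68
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)  # about -1.62
p_value = 2 * (1 - normal_cdf(abs(z)))           # two-sided
print(f"z = {z:.2f}, P-value = {p_value:.4f}")   # P > 0.05: fail to reject H0
```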
Four steps of hypothesis testing
1) Define the hypotheses to test, and the required significance level α.
2) Calculate the value of the test statistic.
3) Find the P-value based on the observed data.
4) State the conclusion: reject the null hypothesis if the P-value ≤ α;
if the P-value > α, the data do not provide sufficient evidence to reject the null.
The significance level α
The significance level, α, is the largest P-value tolerated for
rejecting a true null hypothesis (how much evidence against H0
we require). This value is decided arbitrarily before conducting
the test.
If the P-value is equal to or less than α (P ≤ α), then we reject H0.
If the P-value is greater than α (P > α), then we fail to reject H0.
When the z score falls within the rejection region (shaded area on
the tail side), the P-value is smaller than α and you have shown
statistical significance.
[Figure: rejection regions for a one-sided test with α = 5% (reject when z < −1.645) and for a two-sided test with α = 1%.]

Rejection region for a two-tail test of p with α = 0.05 (5%)
A two-sided test means that α is spread
between both tails of the curve, thus:
- a middle area C = 1 − α = 95%, and
- an upper tail area of α/2 = 0.025.

[Figure: the traditional z* critical values from the Normal model; with α = 0.05 two-sided, each tail has area 0.025 and z* = 1.96.]
Example
A marketing company claims that it receives 8% responses from its mailing.
To test this claim, a random sample of 500 people were surveyed, with 25
responses. Test at the α = 0.05 significance level.
Check: np0 = (500)(0.08) = 40 and n(1 − p0) = (500)(0.92) = 460.
Both are ≥ 10, so the normal assumption is OK.

Rejection-region solution:
H0: p = 0.08 vs. HA: p ≠ 0.08
α = 0.05, n = 500, p̂ = 25/500 = 0.05
Critical values: ±1.96
Test statistic: z ≈ −2.47
Decision: since z = −2.47 < −1.96, the test statistic falls in the
rejection region, so reject H0 at α = 0.05.
Conclusion: there is statistical evidence to reject the company's
claim of an 8% response rate.

z = (p̂ − p0) / √(p0(1 − p0)/n) = (0.05 − 0.08) / √(0.08(1 − 0.08)/500) ≈ −2.47
[Figure: standard normal curve with rejection regions beyond ±1.96 (α/2 = 0.025 in each tail); the test statistic z = −2.47 falls in the lower rejection region.]
Calculate the P-value and compare to α
(for a two-tailed test the P-value is always two-tailed):
P-value = P(z ≤ −2.47) + P(z > 2.47) = 2·P(z ≤ −2.47) = 2(0.0068) = 0.0136
Since the P-value = 0.0136 < α = 0.05, reject H0.
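The rejection-region approach for this example can also be checked in code. A sketch using only the standard library; the 1.96 critical value corresponds to α = 0.05, two-sided:

```python
import math

n, responses, p0 = 500, 25, 0.08
p_hat = responses / n                             # 0.05
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)  # about -2.47
z_star = 1.96                                     # two-sided critical value, alpha = 0.05
reject = abs(z) > z_star                          # falls in the rejection region?
print(f"z = {z:.2f}, reject H0: {reject}")
```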
Statistical significance vs practical significance
Statistical significance only says whether the effect observed is likely
to be due to chance alone because of random sampling.

Statistical significance may not be practically important. That's because
statistical significance doesn't tell you about the magnitude of the
effect, only that there is one.

An effect could be too small to be relevant. And with a large enough
sample size, significance can be reached even for the tiniest effect.

Example: A drug to lower temperature is found to lower patient
temperature by 0.4°C (P-value < 0.01). But clinical benefits of
temperature reduction only appear for a decrease of 1°C or larger.

Don't ignore lack of significance

Consider this provocative title from the British Medical
Journal: "Absence of evidence is not evidence of absence."

Indeed, failing to find statistical significance in results means not
rejecting the null hypothesis. This is very different from actually
accepting it. The sample size, for instance, could be too small to
overcome large variability in the population.


Interpretation: magnitude vs. reliability of effects
The reliability of an interpretation is related to the strength of
the evidence. The smaller the p-value, the stronger the
evidence against the null hypothesis and the more confident
you can be about your interpretation.

The magnitude or size of an effect relates to the real-life
relevance of the phenomenon uncovered. The p-value does
NOT assess the relevance of the effect, nor its magnitude.

A confidence interval will assess the magnitude of the effect.
However, magnitude is not necessarily equivalent to how
theoretically or practically relevant an effect is.
Confidence intervals to test hypotheses
Because a two-sided test is symmetrical, you can also use
a confidence interval to test a two-sided hypothesis.
In a two-sided test, C = 1 − α, where C is the confidence level and α the
significance level: if the hypothesized value p0 falls outside the level-C
confidence interval, reject H0 at significance level α = 1 − C. (For
proportions this correspondence is approximate, because the interval's
standard error is based on p̂ while the test's is based on p0.)
Although a hypothesis test can tell us whether the observed statistic
differs from the hypothesized value, it doesn't say by how much.
The corresponding confidence interval gives us more information.
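Applied to the marketing example above, the confidence-interval route looks like this. The sketch computes a 95% interval around p̂ = 25/500 and checks whether the claimed p0 = 0.08 falls inside it:

```python
import math

n, responses = 500, 25
p_hat = responses / n                      # 0.05
z_star = 1.96                              # 95% confidence
se = math.sqrt(p_hat * (1 - p_hat) / n)    # SE based on p_hat
low, high = p_hat - z_star * se, p_hat + z_star * se
print(f"95% CI: ({low:.4f}, {high:.4f})")  # 0.08 lies outside: reject H0 at alpha = 0.05
```

The interval comes out near (0.031, 0.069); since 0.08 lies outside it, the CI leads to the same rejection as the z-test.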
Type I and II errors
A Type I error is made when we reject the null hypothesis but the
null hypothesis is actually true (incorrectly rejecting a true H0).
The probability of making a Type I error is the significance level α.

A Type II error is made when we fail to reject the null hypothesis
but the null hypothesis is false (incorrectly keeping a false H0).
The probability of making a Type II error is labeled β.
The power of a test is 1 − β.
Running a test of significance is a balancing act between the chance α
of making a Type I error and the chance β of making a Type II error.
Reducing α reduces the power of a test and thus increases β.
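This balance can be made concrete numerically. The sketch below computes the approximate power of a two-sided one-proportion z-test; the true proportion p = 0.55 and n = 500 are illustrative choices, not values from these notes:

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p0, p_true, z_star = 500, 0.50, 0.55, 1.96  # alpha = 0.05, two-sided
se0 = math.sqrt(p0 * (1 - p0) / n)              # SE under H0
se1 = math.sqrt(p_true * (1 - p_true) / n)      # SE under the true p
hi_cut = p0 + z_star * se0                      # rejection cutoffs for p_hat
lo_cut = p0 - z_star * se0
# Power: probability that p_hat lands in the rejection region when p = p_true
power = (1 - normal_cdf((hi_cut - p_true) / se1)) + normal_cdf((lo_cut - p_true) / se1)
beta = 1 - power
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

With these assumed values, roughly 61% of samples would detect p = 0.55, so β ≈ 0.39; increasing n (or α) raises the power.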
