
Inferences on Proportions

Tieming Ji
Fall 2012
Example: One measure of quality and customer satisfaction is
repeat business. A supplier of paper used for computer
printouts sampled 75 customer accounts last year and found
that 40 of these had placed more than one order during the
year. Estimate the proportion of repeat business, and give a
$100(1-\alpha)\%$ confidence interval for the proportion of repeat
business.
$X$: whether a customer reorders or not.
$X = 1$, repeat business; $X = 0$, no repeat business.
$X \sim \text{Bernoulli}(p)$
We want to estimate $p$. Notice $E(X) = p$.
Thus, for an i.i.d. sample of size $n$, an unbiased
point estimator is:
\[
\hat{p} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{40}{75} = \frac{8}{15}.
\]
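As a minimal sketch of the point estimate, the snippet below rebuilds a 0/1 sample consistent with the counts in the example and takes its mean. The list `sample` is a hypothetical reconstruction: the slide gives only the totals, not the individual accounts.

```python
from fractions import Fraction

# Hypothetical 0/1 data matching the counts on the slide:
# 40 accounts with repeat orders (X = 1) and 35 without (X = 0).
sample = [1] * 40 + [0] * 35

n = len(sample)
p_hat = Fraction(sum(sample), n)  # sample mean of the Bernoulli indicators

print(p_hat)                    # 8/15
print(round(float(p_hat), 4))   # 0.5333
```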
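We have found an unbiased point estimator for $p$. What
about a C.I. estimate?
Consider $\hat{p} = \bar{X}$. What have we learnt about $\bar{X}$? By the CLT, we
have
\[
\frac{\bar{X} - \mu_X}{\sigma_X/\sqrt{n}} \overset{\text{approx.}}{\sim} N(0, 1) \quad \text{for sufficiently large } n.
\]
We have $\mu_X = E(X) = p$ and $\sigma_X^2 = \operatorname{Var}(X) = p(1-p)$. So
\[
\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \overset{\text{approx.}}{\sim} N(0, 1).
\]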
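The standardization above relies on the mean and variance of the sample proportion. As a short worked check, assuming only that the $X_i$ are i.i.d. Bernoulli($p$) as stated in the example, linearity of expectation and independence give:

\[
E(\hat{p}) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = p,
\qquad
\operatorname{Var}(\hat{p}) = \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i) = \frac{p(1-p)}{n}.
\]

So the standard deviation of $\hat{p}$ is $\sqrt{p(1-p)/n}$, which is the denominator of the standardized quantity.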
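\[
\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \overset{\text{approx.}}{\sim} N(0, 1)
\]
\[
P\left(-z_{\alpha/2} \le \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \le z_{\alpha/2}\right) = 1 - \alpha
\]
\[
P\left(\hat{p} - z_{\alpha/2}\sqrt{p(1-p)/n} \le p \le \hat{p} + z_{\alpha/2}\sqrt{p(1-p)/n}\right) = 1 - \alpha
\]
Replacing $p$ by $\hat{p}$ in the standard error:
\[
P\left(\hat{p} - z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}\right) \approx 1 - \alpha
\]
C.I. for $p$:
\[
\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}.
\]
Thus, the C.I. for $p$ at confidence level $(1-\alpha)$ is:
\[
\frac{40}{75} \pm z_{\alpha/2}\sqrt{\frac{40}{75}\left(1 - \frac{40}{75}\right)\Big/ 75}.
\]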
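A minimal sketch of this interval in Python, assuming a 95% level (the slide leaves $\alpha$ general) and using the standard library's `statistics.NormalDist` to obtain $z_{\alpha/2}$. The function name `wald_ci` is illustrative only, not part of the lecture.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(x, n, conf_level=0.95):
    """Large-sample interval p_hat +/- z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 point of N(0, 1)
    half = z * sqrt(p_hat * (1 - p_hat) / n)  # half interval length
    return p_hat - half, p_hat + half


# Repeat-business example: 40 of 75 accounts reordered.
lo, hi = wald_ci(40, 75)
print(f"{lo:.3f}, {hi:.3f}")  # 0.420, 0.646
```

Under the assumed 95% level, the interval for the repeat-business proportion is roughly (0.420, 0.646).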
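C.I. for $p$: $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$.
Half interval length: $d = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$. If we want to
control the interval length ($2d$) or half interval length ($d$), the
sample size $n$ should satisfy:
\[
n \ge \frac{z_{\alpha/2}^2\, \hat{p}(1-\hat{p})}{d^2}.
\]
Since $\hat{p}(1-\hat{p}) \le 1/4$, when no estimate of $p$ is available, use
\[
n \ge \frac{z_{\alpha/2}^2}{4d^2}.
\]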
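The two sample-size bounds can be sketched as a small helper. The target half-length `d = 0.05`, the 95% level, and the function name `sample_size` are assumptions chosen for illustration; none of them appears on the slide.

```python
from math import ceil
from statistics import NormalDist


def sample_size(d, conf_level=0.95, p_guess=None):
    """Smallest integer n with z_{alpha/2}^2 * p(1-p) / d^2 <= n.

    If no prior estimate p_guess is supplied, p(1-p) is replaced by its
    maximum value 1/4, which gives the conservative bound z^2 / (4 d^2).
    """
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2)
    pq = 0.25 if p_guess is None else p_guess * (1 - p_guess)
    return ceil(z ** 2 * pq / d ** 2)


print(sample_size(0.05))                   # 385 (no estimate of p available)
print(sample_size(0.05, p_guess=40 / 75))  # 383 (using the earlier estimate 40/75)
```

Because $40/75$ is close to $1/2$, the prior estimate saves almost nothing here; the saving grows as the guessed proportion moves away from $1/2$.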
Now, we want to test if the proportion of repeat business is
bigger than $p_0 = 0.5$ at the significance level $\alpha = 0.05$. This is
to test
\[
H_0: p \le p_0 \quad \text{vs.} \quad H_1: p > p_0
\]
(or $H_0: p = p_0$ vs. $H_1: p > p_0$).
Rationale:
We would reject $H_0$ if $\hat{p} = \frac{40}{75}$ is much bigger than $p_0 = 0.5$.
If $H_0$ is true,
\[
\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} \overset{\text{approx.}}{\sim} N(0, 1).
\]
If $\frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$ is much bigger than 0, we will reject $H_0$.
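To control the type I error (rejecting $H_0$ when $H_0$ is true) at $\alpha = 0.05$,
we reject $H_0$ if
\[
\frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} \ge z_{\alpha}.
\]
We observe:
\[
\frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{40/75 - 0.5}{\sqrt{0.5(1-0.5)/75}} = 0.577,
\]
and $z_{0.05} = 1.645$.
Thus, since $0.577 < 1.645$, we fail to reject $H_0$.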
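The arithmetic of this right-tailed test can be checked with a few lines of Python. This is only a sketch using the numbers from the example; the p-value is an extra quantity not computed on the slide.

```python
from math import sqrt
from statistics import NormalDist

x, n, p0, alpha = 40, 75, 0.5, 0.05

p_hat = x / n
# Under H0 the standard error is built from p0, not from p_hat.
z_obs = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
z_crit = NormalDist().inv_cdf(1 - alpha)   # z_alpha for a right-tailed test
p_value = 1 - NormalDist().cdf(z_obs)      # P(Z >= z_obs), not on the slide

print(f"z = {z_obs:.3f}, critical value = {z_crit:.3f}")
print("reject H0" if z_obs >= z_crit else "fail to reject H0")
```

The observed statistic is close to 0.58, well below the critical value of about 1.645, so the script reports that $H_0$ is not rejected.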
Statistical inferences for proportions:
Summary:
1. $\hat{p} = \bar{X}$ is an unbiased estimator for $p$.
2. A $(1-\alpha)$ confidence interval for $p$ is: $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$.
3. In hypothesis tests (right-tailed, left-tailed, two-tailed), use the
test statistic $\frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$. When $H_0$ is true (and $p_0$ is the
boundary value in the hypothesis),
\[
\frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} \overset{\text{approx.}}{\sim} N(0, 1).
\]
Find the critical value and rejection region according to the type of
the test.
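One way to organize the three rejection regions is a single helper that switches on the type of test. This is a sketch, not the lecture's own procedure; the function name `one_prop_ztest` and the `tail` labels are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_ztest(x, n, p0, alpha=0.05, tail="right"):
    """Return (z, reject) for the large-sample test with boundary value p0."""
    z = (x / n - p0) / sqrt(p0 * (1 - p0) / n)
    std_normal = NormalDist()
    if tail == "right":      # H1: p > p0, reject for large z
        reject = z >= std_normal.inv_cdf(1 - alpha)
    elif tail == "left":     # H1: p < p0, reject for small z
        reject = z <= -std_normal.inv_cdf(1 - alpha)
    elif tail == "two":      # H1: p != p0, reject for large |z|
        reject = abs(z) >= std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("tail must be 'right', 'left', or 'two'")
    return z, reject


z, reject = one_prop_ztest(40, 75, 0.5, tail="right")
print(f"z = {z:.3f}, reject H0: {reject}")  # z = 0.577, reject H0: False
```

Only the critical value and the direction of the comparison change between the three cases; the statistic itself is the same.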
Comparison of two proportions
Example: A study is conducted to compare computer usage in
Canadian businesses to that of businesses in the United States.
Independent random samples of size 375 businesses are
selected from the population of Canadian and United States
businesses, respectively. It is found that 221 of the Canadian
firms and 232 of the firms in the United States have
mainframe computers.
Proportion of businesses in Canada having mainframe computers: $p_1$.
Proportion of businesses in the United States having mainframe computers: $p_2$.
(1) Estimation: How to give point and interval estimates of $p_1 - p_2$?
(2) Test: How to test if $p_1 - p_2 > p_0$, $p_1 - p_2 < p_0$, or $p_1 - p_2 \ne p_0$?
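An i.i.d. sample of size $n_1$ with $x_1$ successes from Canada;
An i.i.d. sample of size $n_2$ with $x_2$ successes from the United
States.
We have
\[
E\left(\frac{x_1}{n_1} - \frac{x_2}{n_2}\right) = E\left(\frac{x_1}{n_1}\right) - E\left(\frac{x_2}{n_2}\right) = p_1 - p_2.
\]
Thus, an unbiased point estimate for $p_1 - p_2$ is
\[
\widehat{p_1 - p_2} = \hat{p}_1 - \hat{p}_2 = \frac{x_1}{n_1} - \frac{x_2}{n_2} = \frac{221}{375} - \frac{232}{375} = -0.03.
\]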
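What about a C.I. estimator for $p_1 - p_2$? That relies on the
distribution of the point estimator $\hat{p}_1 - \hat{p}_2$.
According to the Central Limit Theorem, we have
\[
\frac{\hat{p}_1 - \hat{p}_2 - (p_1 - p_2)}{\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}} \overset{\text{approx.}}{\sim} N(0, 1).
\]
Using $\hat{p}_1$ and $\hat{p}_2$ to replace $p_1$ and $p_2$ in the denominator, we
have:
\[
\frac{\hat{p}_1 - \hat{p}_2 - (p_1 - p_2)}{\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}} \overset{\text{approx.}}{\sim} N(0, 1).
\]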
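\[
P\left(-z_{\alpha/2} \le \frac{\hat{p}_1 - \hat{p}_2 - (p_1 - p_2)}{\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}} \le z_{\alpha/2}\right) = 1 - \alpha
\]
Thus, the $(1-\alpha)$ C.I. is:
\[
(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}.
\]
A 95% C.I. is
\[
-0.03 \pm 1.96\sqrt{(0.589)(0.411)/375 + (0.619)(0.381)/375} = -0.03 \pm 0.07 = [-0.1, 0.04].
\]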
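A minimal sketch of the two-sample interval, applied to the Canada/United States counts. The function name `diff_ci` is illustrative. Because the code keeps the sample proportions unrounded, its endpoints differ from the slide's hand calculation in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist


def diff_ci(x1, n1, x2, n2, conf_level=0.95):
    """Large-sample C.I. for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    half = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - half, diff + half


# 221 of 375 Canadian firms and 232 of 375 U.S. firms have mainframes.
lo, hi = diff_ci(221, 375, 232, 375)
print(f"[{lo:.3f}, {hi:.3f}]")  # [-0.099, 0.041]
```

The interval contains 0, so at the 95% level the data do not rule out equal proportions in the two countries.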
Consider the following tests:
1. Right-tail test:
$H_0: p_1 - p_2 = (p_1 - p_2)_0$ or $H_0: p_1 - p_2 \le (p_1 - p_2)_0$
$H_1: p_1 - p_2 > (p_1 - p_2)_0$
2. Left-tail test:
$H_0: p_1 - p_2 = (p_1 - p_2)_0$ or $H_0: p_1 - p_2 \ge (p_1 - p_2)_0$
$H_1: p_1 - p_2 < (p_1 - p_2)_0$
3. Two-tail test:
$H_0: p_1 - p_2 = (p_1 - p_2)_0$
$H_1: p_1 - p_2 \ne (p_1 - p_2)_0$
Consider two cases: $(p_1 - p_2)_0 \ne 0$ and $(p_1 - p_2)_0 = 0$. Test
statistics are different for these two cases.
Case 1: $(p_1 - p_2)_0 \ne 0$ example:
At the significance level $\alpha = 0.05$, we want to test
\[
H_0: p_1 - p_2 = -0.05 \quad \text{vs.} \quad H_1: p_1 - p_2 \ne -0.05
\]
We will reject $H_0$ if $\hat{p}_1 - \hat{p}_2$ is too far from $-0.05$.
If $H_0$ is true, by the Central Limit Theorem, we have
\[
\frac{\hat{p}_1 - \hat{p}_2 - (p_1 - p_2)_0}{\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}} \overset{\text{approx.}}{\sim} N(0, 1).
\]
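We use the test statistic
$\frac{\hat{p}_1 - \hat{p}_2 - (p_1 - p_2)}{\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}}$, and when $H_0$ is
true,
\[
\frac{\hat{p}_1 - \hat{p}_2 - (p_1 - p_2)_0}{\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}} \overset{\text{approx.}}{\sim} N(0, 1).
\]
Since this is a two-tailed test, we reject $H_0$ if
\[
\left|\frac{\hat{p}_1 - \hat{p}_2 - (p_1 - p_2)_0}{\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}}\right| \ge z_{\alpha/2}.
\]
We observe
\[
\left|\frac{\hat{p}_1 - \hat{p}_2 - (p_1 - p_2)_0}{\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}}\right|
= \left|\frac{-0.03 - (-0.05)}{\sqrt{\frac{0.589(1-0.589)}{375} + \frac{0.619(1-0.619)}{375}}}\right| = 0.560,
\]
and $z_{0.025} = 1.96$. Thus, since $0.560 < 1.96$, we fail to reject $H_0$.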
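The Case 1 calculation can be sketched as follows. The function name `two_prop_ztest_unpooled` is hypothetical. The code keeps $\hat{p}_1 - \hat{p}_2$ unrounded (about $-0.0293$ rather than $-0.03$), so it yields a statistic near 0.579 instead of the hand-computed 0.560; the conclusion is unchanged.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_ztest_unpooled(x1, n1, x2, n2, diff0, alpha=0.05):
    """Two-tailed test of H0: p1 - p2 = diff0 for a nonzero diff0.

    Uses the unpooled standard error, since p1 and p2 are not assumed equal.
    Returns (z, reject).
    """
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2 - diff0) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, abs(z) >= z_crit


z, reject = two_prop_ztest_unpooled(221, 375, 232, 375, diff0=-0.05)
print(f"z = {z:.3f}, reject H0: {reject}")
```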
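Case 2: $(p_1 - p_2)_0 = 0$ example:
At the significance level $\alpha = 0.05$, we want to test
\[
H_0: p_1 = p_2 \quad \text{vs.} \quad H_1: p_1 \ne p_2
\]
We will reject $H_0$ if $\hat{p}_1 - \hat{p}_2$ is too far from 0.
When $H_0$ is true, write $p$ for the common value $p_1 = p_2$; then
\[
\frac{\hat{p}_1 - \hat{p}_2 - 0}{\sqrt{p(1-p)(1/n_1 + 1/n_2)}} \overset{\text{approx.}}{\sim} N(0, 1).
\]
When $H_0$ is true, estimate $p$ by the pooled proportion $\hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$, thus
\[
\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}} \overset{\text{approx.}}{\sim} N(0, 1).
\]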
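If $H_0$ is true, then
\[
\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}} \overset{\text{approx.}}{\sim} N(0, 1).
\]
Since this is a two-tailed test, at the level $\alpha$, we reject $H_0$ if
\[
\left|\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}}\right| \ge z_{\alpha/2}.
\]
We observe $\hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{375(0.589) + 375(0.619)}{375 + 375} = 0.604$, and
\[
\left|\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}}\right|
= \left|\frac{-0.03}{\sqrt{0.604(1-0.604)\left(\frac{1}{375} + \frac{1}{375}\right)}}\right| = 0.840.
\]
Since $z_{0.025} = 1.96$ and $0.840 < 1.96$, we fail to reject $H_0$.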
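A sketch of the pooled test for Case 2. The function name `two_prop_ztest_pooled` is hypothetical. As with the hand calculation, the pooled estimate is 0.604, but because the code does not round $\hat{p}_1 - \hat{p}_2$ to $-0.03$, the statistic comes out near $-0.821$ in place of the 0.840 obtained by hand (in absolute value). Either way it is far inside $\pm 1.96$.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_ztest_pooled(x1, n1, x2, n2, alpha=0.05):
    """Two-tailed test of H0: p1 = p2 using the pooled proportion.

    Returns (p_pool, z, reject).
    """
    p1, p2 = x1 / n1, x2 / n2
    # (x1 + x2) / (n1 + n2) equals (n1*p1 + n2*p2) / (n1 + n2).
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return p_pool, z, abs(z) >= z_crit


p_pool, z, reject = two_prop_ztest_pooled(221, 375, 232, 375)
print(f"pooled p = {p_pool:.3f}, z = {z:.3f}, reject H0: {reject}")
```

Pooling is appropriate only when the null hypothesis says the two proportions are equal; for a nonzero hypothesized difference the unpooled standard error of Case 1 applies.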
Comparing Two Proportions
Summary:
A random sample of size $n_1$ from population 1 with $x_1$ successes.
A random sample of size $n_2$ from population 2 with $x_2$ successes.
1. $E\left(\frac{x_1}{n_1} - \frac{x_2}{n_2}\right) = p_1 - p_2$.
2. A $(1-\alpha)$ C.I. for $p_1 - p_2$ is
\[
(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}.
\]
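3. Tests:
(1) To test if $p_1 \ne p_2$, $p_1 > p_2$ or $p_1 < p_2$ (in $H_1$), under $H_0$,
use the test statistic
\[
\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}} \overset{\text{approx.}}{\sim} N(0, 1),
\]
where $\hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$.
(2) To test if $p_1 - p_2 \ne c_0$, $p_1 - p_2 > c_0$ or $p_1 - p_2 < c_0$ (in
$H_1$, with $c_0 \ne 0$), under $H_0$, use the test statistic
\[
\frac{\hat{p}_1 - \hat{p}_2 - c_0}{\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}} \overset{\text{approx.}}{\sim} N(0, 1).
\]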