$$\chi^2 = \sum_{i=1}^{k} \frac{(n_i - np_{i0})^2}{np_{i0}} \quad \text{vs.} \quad \chi^2_{\alpha,\,k-1}$$
Thus, goodness of fit tests are upper tail
Chi-Square tests.
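The statistic above can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the function name `chi_square_gof` is hypothetical, and it assumes the hypothesized probabilities $p_{i0}$ are fully specified, so no parameters are estimated and df = k − 1.

```python
def chi_square_gof(observed, p0):
    """Return (chi2, df) for a goodness-of-fit test with fully specified p_i0.

    observed: cell counts n_1, ..., n_k
    p0: hypothesized cell probabilities p_10, ..., p_k0 (must sum to 1)
    """
    if len(observed) != len(p0):
        raise ValueError("observed and p0 must have the same length")
    if abs(sum(p0) - 1.0) > 1e-9:
        raise ValueError("hypothesized probabilities must sum to 1")
    n = sum(observed)
    chi2 = 0.0
    for n_i, p_i in zip(observed, p0):
        e_i = n * p_i                      # expected count n * p_i0
        chi2 += (n_i - e_i) ** 2 / e_i     # squared deviation scaled by expectation
    return chi2, len(observed) - 1         # df = k - 1

# Upper-tail test: reject H0 when chi2 >= the tabled value chi2_{alpha, k-1}.
```

For example, `chi_square_gof([12, 8], [0.5, 0.5])` returns `(0.8, 1)`: both expected counts are 10 and each cell contributes 4/10.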
With continuous distributions, cell probabilities for such tests are selected to satisfy $p_{i0} = P(a_{i-1} \le X < a_i)$, where it is recommended that cells be chosen to satisfy $np_{i0} \ge 5$.
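As a sketch of this cell construction, $p_{i0}$ is a difference of CDF values, and cells with $np_{i0} < 5$ can be flagged. The helper names, the standard normal CDF used as $F$, the boundaries and the sample sizes below are all illustrative assumptions, not values from these notes.

```python
import math

def cell_probabilities(cdf, boundaries):
    """p_i0 = F(a_i) - F(a_{i-1}) for consecutive boundaries a_0 < a_1 < ... < a_k."""
    return [cdf(b) - cdf(a) for a, b in zip(boundaries, boundaries[1:])]

def small_cells(n, probs, minimum=5.0):
    """Indices of cells whose expected count n * p_i0 is below `minimum`."""
    return [i for i, p in enumerate(probs) if n * p < minimum]

def std_normal_cdf(x):
    # Standard normal CDF via the error function (an example choice of F).
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

cells = [-math.inf, -1.0, 0.0, 1.0, math.inf]   # hypothetical cell boundaries
probs = cell_probabilities(std_normal_cdf, cells)

print(small_cells(30, probs))   # [0, 3]: the two tail cells expect about 4.76 each
print(small_cells(40, probs))   # []: with n = 40 every cell expects at least 5
```

When tail cells fall short, the usual remedies are a larger sample or wider (merged) cells.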
Goodness of fit test application:
A statistics tutoring service was designed to
serve the following client base:
Mgt: 40%, Engr: 30%, Soc: 20%, Agr: 10%
A sample of 120 client visits yields:
Mgt: 52, Engr: 38, Soc: 21, Agr: 9
Was the design expectation correct?
Design share   Category          Observed   Expected
40%            Business             52      .4(120) = 48
30%            Engineering          38      .3(120) = 36
20%            Social Science       21      .2(120) = 24
10%            Agriculture           9      .1(120) = 12
               Total               120
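$$H_0: p_1 = .4,\ p_2 = .3,\ p_3 = .2,\ p_4 = .1 \quad \text{vs.} \quad H_a: \text{Distribution is different}$$
$$df = 4 - 1 = 3, \qquad \chi^2_{.05,3} = 7.81$$
$$\chi^2 = \frac{(52-48)^2}{48} + \frac{(38-36)^2}{36} + \frac{(21-24)^2}{24} + \frac{(9-12)^2}{12} = 1.57 < 7.81$$
Fail to reject $H_0$: at $\alpha = .05$ the sample gives no evidence that the client mix differs from the design expectation.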
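The same calculation as a short Python check. The counts and proportions are those of the tutoring example; the critical value 7.81 is hard-coded from the slide rather than computed, since only the standard library is assumed.

```python
# Observed visits and design proportions from the tutoring-service example.
observed = {"Business": 52, "Engineering": 38, "Social Science": 21, "Agriculture": 9}
p0 = {"Business": 0.4, "Engineering": 0.3, "Social Science": 0.2, "Agriculture": 0.1}

n = sum(observed.values())                       # 120 client visits
expected = {c: n * p0[c] for c in observed}      # 48, 36, 24, 12
chi2 = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
df = len(observed) - 1                           # 4 - 1 = 3
critical = 7.81                                  # chi2_{.05,3} as tabled in the example

print(f"chi2 = {chi2:.2f}, df = {df}")           # chi2 = 1.57, df = 3
print("Reject H0" if chi2 >= critical else "Fail to reject H0")
```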
Problem 14-9 (page 602)
Test with an exponential distribution
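$$f(x) = \lambda e^{-\lambda x}, \qquad F(x) = 1 - e^{-\lambda x}, \qquad x > 0$$
$$H_0: f(x) = e^{-x}\ (\lambda = 1) \quad \text{vs.} \quad H_a: f(x) \ne e^{-x}$$
5 intervals, each with probability .2 under $H_0$: $(0, C_1), (C_1, C_2), (C_2, C_3), (C_3, C_4), (C_4, \infty)$
$$.2 = 1 - e^{-C_1} \Rightarrow C_1 = .2231$$
$$.4 = 1 - e^{-C_2} \Rightarrow C_2 = .5108$$
$$.6 = 1 - e^{-C_3} \Rightarrow C_3 = .9163$$
$$.8 = 1 - e^{-C_4} \Rightarrow C_4 = 1.61$$
Cell counts are 6, 8, 10, 7 and 9 ($n = 40$, so each expected count is $40(.2) = 8$):
$$\chi^2 = \frac{(6-8)^2}{8} + \frac{(8-8)^2}{8} + \cdots + \frac{(9-8)^2}{8} = 1.25$$
Fail to reject $H_0$ since $1.25 < \chi^2_{.10,4} = 7.779$.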
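A Python sketch of Problem 14-9, assuming (as above) $\lambda = 1$ and five equiprobable cells. The boundaries come from inverting the CDF, $C_i = -\ln(1 - i/k)/\lambda$; the critical value 7.779 is taken from the problem rather than computed.

```python
import math

lam = 1.0                      # H0: X ~ exponential with lambda = 1
k = 5                          # number of equiprobable cells
observed = [6, 8, 10, 7, 9]
n = sum(observed)              # 40 observations

# Boundaries C_i solve F(C_i) = 1 - exp(-lam * C_i) = i / k.
boundaries = [-math.log(1 - i / k) / lam for i in range(1, k)]

expected = n / k               # n * p_i0 with p_i0 = 1/k = .2, i.e. 8 per cell
chi2 = sum((o - expected) ** 2 / expected for o in observed)
df = k - 1                     # no parameters estimated
critical = 7.779               # chi2_{.10,4} as used in the problem

print([round(c, 4) for c in boundaries])   # [0.2231, 0.5108, 0.9163, 1.6094]
print(chi2, "Fail to reject H0" if chi2 < critical else "Reject H0")   # 1.25 Fail to reject H0
```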
Problem 14-17 (page 612)
Test for a discrete distribution
x       0   1   2   3   4   5   6   7   8   9   10   11   12
Freq   24  16  16  18  15   9   6   5   3   4    3    0    1
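$$H_0: p(x) = \frac{e^{-\lambda}\lambda^x}{x!},\ x = 0, 1, \ldots \quad \text{vs.} \quad H_a: p(x) \text{ is non-Poisson}$$
$$\hat\lambda = \bar x = \frac{0(24) + 1(16) + \cdots + 11(0) + 12(1)}{120} = 3.167$$
$$H_0: p(x) = \frac{e^{-3.167}(3.167)^x}{x!} \quad \text{for } x = 0, 1, \ldots$$

x           0       1     2     3     4     5     6     ≥7
p̂(x)      .0421   .1335   …     …     …     …   .059   .043
n·p̂(x)    5.05     16     …     …     …     …    7.1   5.16
Observed     24     16    16    18    15     9     6     16

(The last cell pools x = 7, …, 12.)
$$\chi^2 = \frac{(24-5.05)^2}{5.05} + \frac{(16-16)^2}{16} + \cdots + \frac{(16-5.16)^2}{5.16} = 104$$
$$\chi^2_{.01,7} = 18.47 < 104 \Rightarrow \text{Reject } H_0$$
Because $\lambda$ was estimated from the full (ungrouped) data, the exact critical value lies between $\chi^2_{.01,k-m-1} = \chi^2_{.01,6}$ and $\chi^2_{.01,k-1} = \chi^2_{.01,7}$ ($k = 8$ cells, $m = 1$ estimated parameter). Since 104 exceeds even the larger value, $H_0$ is rejected.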
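A Python sketch of the Problem 14-17 calculation, assuming the same cells as the table (x = 0, …, 6 plus a pooled x ≥ 7 cell). It uses unrounded Poisson probabilities, so the expected counts differ slightly from the hand-rounded table (for example about 5.11 rather than 5.16 for the pooled cell), but the statistic still comes out near 104.

```python
import math

# Frequencies of x = 0, 1, ..., 12 from Problem 14-17.
freq = [24, 16, 16, 18, 15, 9, 6, 5, 3, 4, 3, 0, 1]
n = sum(freq)                                        # 120
lam = sum(x * f for x, f in enumerate(freq)) / n     # sample mean, about 3.167

def poisson_pmf(x, lam):
    """Poisson probability e^{-lam} lam^x / x!."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Cells x = 0..6 plus a pooled cell x >= 7.
probs = [poisson_pmf(x, lam) for x in range(7)]
probs.append(1.0 - sum(probs))                       # P(X >= 7)
observed = freq[:7] + [sum(freq[7:])]                # pooled cell holds 16 counts

expected = [n * p for p in probs]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

k, m = len(observed), 1                              # 8 cells, 1 estimated parameter
df_low, df_high = k - m - 1, k - 1                   # 6 and 7
print(f"lambda_hat = {lam:.3f}, chi2 = {chi2:.1f}, df between {df_low} and {df_high}")
```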
A Contingency Table is a table whose
rows represent the possible values of
one variable and whose columns
represent the possible values for a
second variable. The entries in the table
are the number of times that each pair of
values occurs.
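Tallying such a table from raw paired observations is a direct count. The sketch below uses a few hypothetical (defect type, shift) records, not data from these notes, to show how each entry $n_{ij}$ is the number of times a pair occurs.

```python
from collections import Counter

# Hypothetical raw records: one (defect type, shift) pair per defective item.
records = [("Color", "1st"), ("Printing", "2nd"), ("Color", "1st"),
           ("Skewness", "3rd"), ("Printing", "1st")]

counts = Counter(records)                               # n_ij for each observed pair
rows = sorted({r for r, _ in records})                  # Color, Printing, Skewness
cols = sorted({c for _, c in records})                  # 1st, 2nd, 3rd
table = [[counts[(r, c)] for c in cols] for r in rows]  # unseen pairs count as 0

for r, row in zip(rows, table):
    print(r, row)
```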
Two Variables: defect type and shift
                    Shift:
Defect Type:     1st   2nd   3rd   Total
Color             27    13    10      50
Printing          20    17     7      44
Skewness           5     7     5      17
Total             52    37    22     111
In problems of this type, we may be
interested in testing whether proportions
in the different categories are the same
for all populations, i.e., whether the
populations are homogeneous. For
example, are all defect types distributed
the same way across the three shifts?
(In this case, the row totals are fixed and
the column totals are random variables.)
Defining $n_{ij}$ and $e_{ij}$ as the observed and expected number from the $i$th sample falling into category $j$, and $p_{ij}$ as the proportion of individuals in population $i$ who fall into category $j$, we can use the data in the contingency table to test homogeneity using hypotheses of the form:
$$H_0: p_{1j} = p_{2j} = \cdots = p_{Ij} \quad \text{for } j = 1, \ldots, J \qquad \text{vs.} \qquad H_a: H_0 \text{ is not true}$$
where the corresponding Chi-Square test statistic has $(I-1)(J-1)$ degrees of freedom and has the form:
$$\chi^2 = \sum_{\text{all cells}} \frac{(n_{ij} - e_{ij})^2}{e_{ij}} \quad \text{vs.} \quad \chi^2_{\alpha,\,(I-1)(J-1)}$$
where
$$e_{ij} = \frac{(i\text{th row total})(j\text{th column total})}{\text{grand total}}$$
As before, it is recommended that data be collected to satisfy $e_{ij} \ge 5$ for all cells.
In tests of homogeneity, $H_0$ states that the proportion of individuals in category $j$ is the same for each population, and that this is true for every category.
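A Python sketch of the homogeneity statistic for the defect-type by shift table above, using $e_{ij}$ = (row total)(column total)/(grand total). No critical value is applied because these notes do not carry this example through to a decision. The sketch also flags cells below the recommended $e_{ij} \ge 5$; here the Skewness/3rd-shift cell expects only about 3.37.

```python
# Defect type (rows) by shift (columns) counts from the contingency table.
table = [
    [27, 13, 10],   # Color
    [20, 17, 7],    # Printing
    [5, 7, 5],      # Skewness
]

row_totals = [sum(row) for row in table]              # 50, 44, 17
col_totals = [sum(col) for col in zip(*table)]        # 52, 37, 22
grand = sum(row_totals)                               # 111

# e_ij = (row i total)(column j total) / (grand total)
expected = [[r * c / grand for c in col_totals] for r in row_totals]

chi2 = sum(
    (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(table))
    for j in range(len(col_totals))
)
df = (len(table) - 1) * (len(col_totals) - 1)         # (I-1)(J-1) = 4
low = [(i, j) for i, row in enumerate(expected) for j, e in enumerate(row) if e < 5]

print(f"chi2 = {chi2:.2f}, df = {df}, cells with e_ij < 5: {low}")
```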
This procedure assumes fixed row totals; the column totals are random variables. In the sample contingency table, for illustration, defect type is fixed and shift is the random variable.
A closely related procedure to homogeneity testing can be used to test for independence by defining $p_{ij}$ as the proportion of individuals in category $(i, j)$, $\hat p_{i\cdot}$ as the sample proportion for category $i$ of factor 1, and $\hat p_{\cdot j}$ as the sample proportion for category $j$ of factor 2. These give the estimated expected cell counts
$$e_{ij} = n\,\hat p_{i\cdot}\,\hat p_{\cdot j} = \frac{(i\text{th row total})(j\text{th column total})}{\text{grand total}}$$
which can be applied in tests of the form:
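$$H_0: p_{ij} = p_{i\cdot}\,p_{\cdot j} \quad \text{for } i = 1, \ldots, I \text{ and } j = 1, \ldots, J \qquad \text{vs.} \qquad H_a: H_0 \text{ is not true}$$
where $p_{i\cdot} = \sum_j p_{ij}$ and $p_{\cdot j} = \sum_i p_{ij}$ are the population marginal proportions.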
where the corresponding test statistic has the form:
$$\chi^2 = \sum_{\text{all cells}} \frac{(n_{ij} - e_{ij})^2}{e_{ij}} \quad \text{vs.} \quad \chi^2_{\alpha,\,(I-1)(J-1)}$$
As before, it is recommended that data be collected to satisfy $e_{ij} \ge 5$ for all cells.
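The identity $n\,\hat p_{i\cdot}\,\hat p_{\cdot j} = (\text{row total})(\text{column total})/(\text{grand total})$ can be checked exactly with rational arithmetic. In this sketch the function name `expected_counts` is hypothetical, and the defect-type table is reused only to exercise the arithmetic, not as an independence example from the notes.

```python
from fractions import Fraction

def expected_counts(table):
    """Estimated expected counts e_ij = n * phat_i. * phat_.j for an I x J table."""
    n = sum(sum(row) for row in table)
    p_row = [Fraction(sum(row), n) for row in table]         # phat_i.
    p_col = [Fraction(sum(col), n) for col in zip(*table)]   # phat_.j
    return [[n * pi * pj for pj in p_col] for pi in p_row]

table = [[27, 13, 10], [20, 17, 7], [5, 7, 5]]   # reused only to show the arithmetic
e = expected_counts(table)

# Exactly equal to (row total)(column total)/(grand total):
assert e[0][0] == Fraction(50 * 52, 111)
print(float(e[0][0]))   # about 23.42
```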
Tests of Homogeneity vs. Independence:
In tests of homogeneity, either the row
total is fixed and the column totals are
random variables, or else the column
totals are fixed and the row totals are the
random variables.
In tests of independence, only the
sample size is fixed and the row and
column totals are both random variables.
Problem 14-32 (page 620)
Test of independence