
12.2 Chi-Squared Tests for the Homogeneity of Proportions

In a goodness-of-fit test, we are analyzing the distribution of 1 variable in 1 population.

In a homogeneity of proportions test, we are analyzing the distribution of 1 variable in 2 or more populations.
Note: the variable can have 2 or more categories.

For example, if we wanted to know whether WHS students prefer a particular subject (English, math, etc.), we would
do a goodness-of-fit test.

population: WHS students, variable: favorite subject

However, if we wanted to see if girls prefer different subjects than boys, we would do a homogeneity of
proportions test.

populations: girls and boys, variable: favorite subject

Likewise, if we wanted to see if different classes have different preferences, we would do a homogeneity of
proportions test.

populations: grades 9, 10, 11, and 12, variable: favorite subject

The χ² test statistic stays the same, but the data come in a two-way table instead of a one-way table, and the
methods for calculating the expected counts and the df change as well.
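For reference, here is a sketch of the statistic in symbols, summed over every cell of the two-way table, together with the degrees of freedom used below ("observed" and "expected" are the cell counts):

$$
\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}},
\qquad
\text{df} = (\text{rows} - 1)(\text{columns} - 1)
$$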

Suppose that independent random samples were taken of voters from 4 different regions of the country and that
each voter’s political affiliation was noted (see table below). Are political party affiliations the same throughout
the US?
           Northeast     South   Midwest      West     Total
Republican        86       137       142        93       458
Democrat         158       112        79       123       472
Other             22        23        41        51       137
Total            266       272       262       267      1067

Note: Why would we combine all of the other political parties into one “Other” category?
Because it is unlikely that the expected cell counts for each minor party would all be > 5.

Discuss: What are the populations? What is the variable? Based on the table alone, we couldn’t tell; we
must know the sampling procedure! We select randomly from each population and measure the variable.
(Sample Northeasterners and ask their party affiliation, … OR sample Democrats and ask where they live.)
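H₀: Political affiliations are the same throughout the country, OR
H₀: Democrats, Republicans, and Others have the same distribution regardless of region.
The first is the correct one: the samples were taken from each region, so the regions are the populations and party affiliation is the variable.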

df = (rows – 1)(columns – 1) = (3 – 1)(4 – 1) = 6


Discuss degrees of freedom. If we take the totals as given, how many of the counts can be freely
chosen? The last count in each row and column is determined, since the row or column must add up to its specified total.
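The free/determined idea can be checked with a short Python sketch. It assumes the observed voter table above, and the helper name `fill_from_free_cells` is hypothetical, chosen only for illustration. Only the top-left (3 – 1) × (4 – 1) = 6 cells are supplied; every other cell is then forced by the row and column totals.

```python
# Observed voter counts from the table above.
# Rows: Republican, Democrat, Other; columns: Northeast, South, Midwest, West.
observed = [
    [86, 137, 142, 93],
    [158, 112, 79, 123],
    [22, 23, 41, 51],
]

row_totals = [sum(row) for row in observed]        # [458, 472, 137]
col_totals = [sum(col) for col in zip(*observed)]  # [266, 272, 262, 267]


def fill_from_free_cells(free, row_totals, col_totals):
    """Rebuild the full table from its top-left (r-1) x (c-1) block of
    freely chosen cells plus the fixed row and column totals."""
    r, c = len(row_totals), len(col_totals)
    table = [[0] * c for _ in range(r)]
    for i in range(r - 1):
        table[i][: c - 1] = free[i]
        # The last cell in each of the first r-1 rows is forced by its row total.
        table[i][c - 1] = row_totals[i] - sum(free[i])
    for j in range(c):
        # Every cell in the last row is forced by its column total.
        table[r - 1][j] = col_totals[j] - sum(table[i][j] for i in range(r - 1))
    return table


# Keep only the 2 x 3 block of free cells and rebuild everything else.
free = [row[:-1] for row in observed[:-1]]
rebuilt = fill_from_free_cells(free, row_totals, col_totals)
print("free cells (df):", sum(len(row) for row in free))
print("matches observed:", rebuilt == observed)
```

The number of free cells printed is the df, and the rebuilt table should match the observed one exactly.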
Expected cell counts:
Set up a blank expected-cell-count table on the board.

Use different colors when filling in free/determined cells.


Expected   Northeast     South   Midwest      West     Total
Republican     114.2     116.8     112.5     114.6       458
Democrat       117.7     120.3     115.9     118.1       472
Other           34.2      34.9      33.6      34.3       137
Total            266       272       262       267      1067

Since Republicans make up .429 of the combined sample (458/1067), then according to the null hypothesis,
Republicans should make up .429 of each region.
Expected Republicans in NE = $\left(\dfrac{458}{1067}\right) \times 266 = 114.2$, etc.
Since Democrats make up .442 of the combined sample (472/1067), then according to the null hypothesis,
Democrats should make up .442 of each region.
Expected Democrats in NE = $\left(\dfrac{472}{1067}\right) \times 266 = 117.7$, etc.
Note: $\text{Expected cell count} = \dfrac{\text{row total} \times \text{column total}}{\text{grand total}}$
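The row total × column total / grand total rule can be sketched in Python. The helper name `expected_counts` is illustrative, not from any library. Run on the observed voter table, it should reproduce the expected-count table above once rounded to one decimal, and it also checks the "expected counts all > 5" condition used in step 3.

```python
# Observed voter counts (rows: Republican, Democrat, Other;
# columns: Northeast, South, Midwest, West).
observed = [
    [86, 137, 142, 93],
    [158, 112, 79, 123],
    [22, 23, 41, 51],
]


def expected_counts(table):
    """Expected count for each cell under H0:
    row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]


expected = expected_counts(observed)
for label, row in zip(["Republican", "Democrat", "Other"], expected):
    print(label, [round(x, 1) for x in row])

# Large-counts condition: every expected count above 5.
print("all expected > 5:", all(x > 5 for row in expected for x in row))
```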

5 Steps:
1. At first glance, it appears that political affiliations are not the same throughout the country, since the observed
counts differ from the expected counts. However, it is possible that political affiliations are the same
and these differences are due to sampling variability. To decide, I will conduct a χ² Homogeneity of
Proportions test.
2. H₀: Political affiliations are the same throughout the country.
Hₐ: They aren’t.
α = .05
3. Conditions:
a. Independent random samples of voters in each region? Given.
b. Large sample size? Expected counts are all > 5 (see table above).
c. Samples < 10% of populations? Yes, assuming at least 2,720 voters in each region (10 × the largest sample, 272).
4. $\chi^2 = \dfrac{(86 - 114.2)^2}{114.2} + \cdots = 66.80$,
df = (4 - 1)(3 - 1) = 6,
P-value ≈ 0
5. Since the P-value < α, I reject the null hypothesis and conclude that party affiliations are not the same
throughout the US.
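Step 4 can be reproduced with a minimal pure-Python sketch; the helper `chi_square_homogeneity` is hypothetical. Because it keeps the expected counts unrounded, its last digit may differ slightly from the hand value of 66.80 (it gives about 66.81). SciPy's `scipy.stats.chi2_contingency` performs the same test if SciPy is available.

```python
# Observed voter counts (rows: Republican, Democrat, Other;
# columns: Northeast, South, Midwest, West).
observed = [
    [86, 137, 142, 93],
    [158, 112, 79, 123],
    [22, 23, 41, 51],
]


def chi_square_homogeneity(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            # Each cell contributes (observed - expected)^2 / expected.
            stat += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


stat, df = chi_square_homogeneity(observed)
print(f"chi-square = {stat:.2f}, df = {df}")
```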

Many high schools survey graduating classes to determine the graduates’ plans. We might wonder whether
students’ plans have stayed roughly the same over the past decades or whether they have changed. Here are
data from random samples of Bay Area High School graduates taken in 1980, 1990, and 2000 that address this very
question. Is there evidence that the proportions have changed over the years?

Proportions:

Expected Counts:

Chi Square:

1. At first glance, it appears that students’ plans after high school over a 30-year period are not the same at
East Bay High School, since the observed counts differ from the expected counts. However, it is possible
that the plans after high school are the same and these differences are due to sampling variability. To decide,
I will conduct a χ² Homogeneity of Proportions test.
2. H₀: Plans for after high school are the same throughout the 30-year period.
Hₐ: They aren’t.
α = .05
3. Conditions:
a. Independent random samples of students in each year? Given.
b. Large sample size? Expected counts are all > 5 (see table above) except for Travel (expected count = 2). Proceed with
caution.
c. Samples < 10% of populations? Yes, assuming at least 10,000 graduates in 30 years.
4. $\chi^2 = \dfrac{(320 - 365.2)^2}{365.2} + \cdots = 72.13$,
df = (4 - 1)(3 - 1) = 6,
P-value ≈ 0 (1.494 × 10⁻¹³)
5. Since the P-value < α, I reject the null hypothesis and conclude that the plans after high school are not the same
over 30 years.
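Neither P-value is literally 0; "0" is shorthand for a tail area far below α. For an even number of degrees of freedom (df = 6 in both examples), the chi-square upper-tail area has the closed form P(χ² > x) = e^(−x/2) · Σ_{j=0}^{df/2 − 1} (x/2)^j / j!, which this Python sketch uses. The helper `chi2_sf_even_df` is hypothetical and only handles even df; for general df, `scipy.stats.chi2.sf` computes the same tail area.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail area P(X > x) for a chi-square variable with a positive
    even df, via exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)**j / j!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only works for a positive even df")
    half = x / 2
    term = 1.0   # (x/2)**0 / 0!
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j  # builds (x/2)**j / j! from the previous term
        total += term
    return math.exp(-half) * total


p_voters = chi2_sf_even_df(66.80, 6)  # political affiliation example
p_grads = chi2_sf_even_df(72.13, 6)   # graduation plans example
print(f"voters: {p_voters:.3e}")
print(f"graduates: {p_grads:.3e}")
```

For the graduation-plans data this gives about 1.494 × 10⁻¹³, in line with the value in step 4; for the voter data it gives roughly 1.8 × 10⁻¹².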

HW #92: 12.15, 12.17, 12.32, 12.33
