
The Logic of Chi-Square Test

Alp Eren AKYUZ∗

April 2011

1 Purpose and Method of Chi-Square Test

The chi-square test is performed to assess whether two categorical variables are related
to each other.
Assume we have a population whose members can be classified according to certain
properties such as gender, income level, etc. Let us start with the following piece
of information.

Gender distribution of employees: Female=300 Male=700


Job title distribution of employees: CEO=100 Middle Manager=300 Supervisor=600

If job title and gender were totally independent, we would expect to see a proportional
distribution of job titles among genders. For example, out of 700 male employees we would
expect 70 to be CEOs, since 10% of the total population consists of CEOs. Similarly, we
would expect 210 males to be middle managers and 420 to be supervisors if gender had
no effect on position. For females, we would expect to see 30 CEOs, 90 middle managers
and 180 supervisors. In general, the expected count of a cell is its row total multiplied
by its column total, divided by the grand total.
This information can be presented by constructing a contingency table.

∗ Bogazici University, Department of Management, Bebek, Istanbul, 34342. Email:
alperen.akyuz@boun.edu.tr. Phone: +90 212 3597508.


              Male   Female   Total
CEO             70       30     100
Mid. Man.      210       90     300
Supervisor     420      180     600
Total          700      300    1000

This is what we should observe if gender and job title were totally independent. As
we move away from this condition (as the values we observe differ from what we expect),
our test statistic should increase in value.
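The expected counts can be reproduced from the marginal totals alone. The following is a minimal sketch in Python; the variable names are illustrative and not part of the original text.

```python
# Expected cell counts under independence:
# (row total * column total) / grand total.
row_totals = {"CEO": 100, "Mid. Man.": 300, "Supervisor": 600}
col_totals = {"Male": 700, "Female": 300}
grand_total = sum(row_totals.values())  # 1000 employees

expected = {
    (job, gender): r * c / grand_total
    for job, r in row_totals.items()
    for gender, c in col_totals.items()
}

for (job, gender), value in expected.items():
    print(f"{job:<10} {gender:<6} {value:.0f}")
```

Each printed value matches the corresponding cell of the contingency table, for example 70 for male CEOs and 180 for female supervisors.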
The easiest way to measure the distance between the observed and expected values is
to take their differences. However, some differences would have a negative sign, which
makes them awkward to interpret and lets them cancel out when summed. To get rid of the
negative sign, we can take the squares of the differences. This also imposes a heavier
penalty on larger differences. At the final step we sum these squared differences to get
an idea of the total difference between the observed and expected values.
The chi-square test statistic follows this principle, with one refinement: each squared
difference is divided by the expected value of its cell before summing, so that a given
deviation counts for more in a cell where few observations were expected. To see how
its value changes as we move away from the independence condition, consider the following
two cases.

1.1 Case 1: Small deviation from Independence

In this case the numbers outside the parentheses are observed values, whereas those in
parentheses are the expected values we calculated previously.

              Male        Female      Total
CEO           80 (70)     20 (30)       100
Mid. Man.     220 (210)   80 (90)       300
Supervisor    400 (420)   200 (180)     600
Total         700         300          1000

Calculate the test statistic.


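\[
\chi^2_{STAT} = \sum_{\text{all cells}} \frac{(f_{obs} - f_{exp})^2}{f_{exp}}
\]

\[
\chi^2_{STAT} = \frac{(80-70)^2}{70} + \frac{(20-30)^2}{30}
              + \frac{(220-210)^2}{210} + \frac{(80-90)^2}{90}
              + \frac{(400-420)^2}{420} + \frac{(200-180)^2}{180}
              = 9.5238
\]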
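The calculation for Case 1 can be checked with a few lines of Python. This is a minimal sketch; the function name `chi_square_statistic` and the flat cell ordering are choices made here for illustration only.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Cells in the order: CEO (Male, Female), Mid. Man. (Male, Female),
# Supervisor (Male, Female).
expected = [70, 30, 210, 90, 420, 180]
observed_case1 = [80, 20, 220, 80, 400, 200]

stat_case1 = chi_square_statistic(observed_case1, expected)
print(round(stat_case1, 4))  # 9.5238
```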

1.2 Case 2: Larger deviation from Independence

Let us double the deviation for each cell.

              Male        Female      Total
CEO           90 (70)     10 (30)       100
Mid. Man.     230 (210)   70 (90)       300
Supervisor    380 (420)   220 (180)     600
Total         700         300          1000


Now calculate the test statistic once again.

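\[
\chi^2_{STAT} = \sum_{\text{all cells}} \frac{(f_{obs} - f_{exp})^2}{f_{exp}}
\]

\[
\chi^2_{STAT} = \frac{(90-70)^2}{70} + \frac{(10-30)^2}{30}
              + \frac{(230-210)^2}{210} + \frac{(70-90)^2}{90}
              + \frac{(380-420)^2}{420} + \frac{(220-180)^2}{180}
              = 38.0952
\]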

Doubling the deviations quadrupled the chi-square test statistic (38.0952 = 4 × 9.5238),
because each deviation enters the statistic squared.
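Since every deviation is squared, multiplying all deviations by a factor k multiplies the statistic by k squared. The sketch below illustrates this by scaling the Case 1 deviations; the factor k = 3 is an extra illustration that does not appear in the text, and all names are hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

expected = [70, 30, 210, 90, 420, 180]
# Case 1 deviations (observed minus expected), same cell order as above.
deviations = [10, -10, 10, -10, -20, 20]

stats = {}
for k in (1, 2, 3):
    observed = [e + k * d for e, d in zip(expected, deviations)]
    stats[k] = chi_square_statistic(observed, expected)
    print(k, round(stats[k], 4), round(stats[k] / stats[1], 4))
```

With k = 1 and k = 2 the loop reproduces the Case 1 and Case 2 statistics, and the last printed column shows the ratio to Case 1 growing as k squared.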

2 Conclusion

In conclusion, as the observed values move away from the values expected under
independence, the chi-square test statistic increases with the square of the deviations.
The larger the statistic, the stronger the evidence against the null hypothesis stating
“the two variables are independent”, and the more likely it is to be rejected.
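To make the link to rejection concrete, the two statistics can be converted to p-values. The sketch below rests on an assumption the text does not state: that the statistic is compared with a chi-square distribution with (3 − 1)(2 − 1) = 2 degrees of freedom, the usual choice for a 3 × 2 table. For 2 degrees of freedom the upper-tail probability has the closed form exp(−x/2), so no statistics library is needed.

```python
import math

def p_value_df2(statistic):
    """Upper-tail probability of a chi-square variable with 2 degrees
    of freedom (assumed here; valid only for df = 2)."""
    return math.exp(-statistic / 2)

p_case1 = p_value_df2(9.5238)    # Case 1 statistic from the text
p_case2 = p_value_df2(38.0952)   # Case 2 statistic from the text

print(f"Case 1: p = {p_case1:.4g}")
print(f"Case 2: p = {p_case2:.4g}")
```

Under this assumption both cases fall below a 1% significance level, and the p-value for Case 2 is smaller than the one for Case 1 by several orders of magnitude.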