
Chi Square Test

Rationale for Chi Square Test


• The t test is popular and widely used, but it is
not applicable to qualitative data
• Frequency data, categorical data, qualitative
data
• Categorical data do not quantify, for example,
BP levels, but classify persons into
hypertensive and normotensive
• Classification table – contingency table
• The Chi square (χ²) test is used to determine
association between two variables
Basics of a Chi Square Test
• For a given phenomenon Chi square test
compares the observed frequencies with
the expected frequencies
• Expected frequencies are calculated from the
hypothesis being tested
• Example: tossing a coin
• In comparing the observed frequencies (O) with
the expected frequencies (E), determine whether
the deviations (O-E) are significant.
Observed and expected frequencies and their deviations for 100 tosses

         1      2      3       4         5
         O      E      O-E     (O-E)²    (O-E)²/E
H        40     50     -10     100       2
T        60     50      10     100       2
Total    100    100      0     200       4
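
The arithmetic in the coin table can be reproduced in a few lines. This is a minimal sketch: the function name `chi_square_statistic` and the dictionary layout are illustrative choices, not part of the original slides.

```python
# Observed and expected counts for 100 coin tosses (values from the table above).
observed = {"H": 40, "T": 60}
expected = {"H": 50, "T": 50}

def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over all categories."""
    return sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

chi2 = chi_square_statistic(observed, expected)
print(chi2)  # 4.0
```

Each face contributes 100 / 50 = 2, so the total matches the value 4 in column 5 of the table.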
• Chi square distribution
Basics cont...
• The next question is whether the value ∑(O-E)²/E
could have occurred by chance
• To answer it we need the Chi square distribution,
i.e. the probability distribution of the χ² statistic
• Chi square is a positively skewed
distribution beginning at 0.
• Degrees of freedom are determined by the
number of independent deviations (O-E)
in the contingency table
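
For one degree of freedom, the chance of a χ² value at least as large as the one observed can be sketched without a table. The snippet relies on the standard fact that a chi-square variable with 1 df is the square of a standard normal variable; the function name is hypothetical, and it does not apply to other degrees of freedom.

```python
import math

def chi2_tail_probability_df1(x):
    """P(chi-square with 1 df >= x), using chi2(1) = Z^2 for standard normal Z."""
    return math.erfc(math.sqrt(x / 2.0))

# Coin example: chi-square = 4 with 1 degree of freedom.
p_coin = chi2_tail_probability_df1(4.0)
print(round(p_coin, 4))  # 0.0455
```

A probability just under 0.05 means a deviation this large would arise by chance in fewer than 1 in 20 experiments with a fair coin.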
Assumptions
• The data used for analysis comes from a random sample.
• The size of the sample is large. Applying chi-square to
small samples exposes the researcher to an unacceptable
rate of type 2 errors
• Adequate cell sizes are also assumed
• Observations are assumed to be independent. The same
observation can only appear in one cell.
• It is assumed that chi-square tests the hypothesis that two
variables are related only by chance. If a significant
relationship is found, this is not equivalent to establishing
the researcher's hypothesis that A causes B, or that B
causes A.
• It is assumed that values are finite. Observations
must be grouped in categories.

• Chi-square is a nonparametric test in the sense that
it does not assume a normal distribution for the data,
only for the deviations.
• No assumption is made about level of data.
Nominal, ordinal, or interval data may be used with
chi-square tests.

The basic two-by-two contingency table for
epidemiological studies

                         Outcome
                   Ill            Not ill
Exposure   Yes     a              b                 a+b = total exposed
           No      c              d                 c+d = total unexposed
                   a+c            b+d               a+b+c+d = n
                   (total ill)    (total not ill)


CHI-SQUARE (χ 2) TEST
Example
In a study of the factors affecting the utilization of antenatal clinics, it
was found that 64% of the women who lived within 10 km of the
clinic came for antenatal care, compared to only 47% of those who
lived more than 10 km away. This suggests that antenatal care
(ANC) is used more often by women who live close to the clinics.
• Calculate the χ² value
• Using a χ² table
• Interpreting the result
CALCULATE χ² VALUE
• Calculate the expected frequency (E) for each cell:
E = row total x column total / grand (overall) total
• For each cell, subtract the expected frequency
from the observed frequency (O):
O-E
• For each cell, square the result of (O-E) and
divide by the expected frequency E.
• Add the results of the above step for all the
cells.
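
The four steps above can be sketched for a table of any size. The function names are illustrative; the counts are the antenatal-care figures tabulated later in these slides (51, 29, 35, 40).

```python
def expected_frequencies(table):
    """E = row total x column total / grand total, for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(table):
    """Sum (O - E)^2 / E over all cells of the table."""
    expected = expected_frequencies(table)
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(table, expected)
               for o, e in zip(o_row, e_row))

# Rows: less than 10 km, 10 km or more. Columns: used ANC, did not use ANC.
anc = [[51, 29], [35, 40]]
print([[round(e, 1) for e in row] for row in expected_frequencies(anc)])
# [[44.4, 35.6], [41.6, 33.4]]
print(round(chi_square(anc), 2))  # 4.57
```

The unrounded result, 4.57, differs slightly from a hand calculation that rounds each expected frequency to one decimal place first.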
• Formula for calculating the chi-square
value:
χ² = ∑ (O - E)² / E
O is the observed frequency (indicated in the table)
E is the expected frequency (to be calculated)
∑ (the sum of) directs you to add together the values of (O-E)²/E
for all the cells of the table

• For a two-by-two table (which contains 4
cells) the formula is
χ² = (O1 - E1)²/E1 + (O2 - E2)²/E2 + (O3 - E3)²/E3 + (O4 - E4)²/E4
USING A χ² TABLE
• Decide on a p-value (significance level), for example 0.05
• Degrees of freedom = df = (r-1) x (c-1), where r is the
number of rows and c the number of columns;
for a 2 by 2 table the number of df is 1 (i.e.
df = (2-1) x (2-1) = 1)
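
A table lookup of this kind might be sketched as follows. The dictionary holds the usual printed critical values for p = 0.05 at 1 to 4 degrees of freedom; the names `CRITICAL_0_05`, `degrees_of_freedom` and `is_significant` are hypothetical, and larger tables would need more entries.

```python
# Critical values of chi-square at p = 0.05 for df = 1..4 (standard table values).
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def degrees_of_freedom(n_rows, n_cols):
    """df = (r - 1) x (c - 1) for an r-by-c contingency table."""
    return (n_rows - 1) * (n_cols - 1)

def is_significant(chi2, n_rows, n_cols):
    """True if chi2 exceeds the p = 0.05 critical value for this table size."""
    return chi2 > CRITICAL_0_05[degrees_of_freedom(n_rows, n_cols)]

print(degrees_of_freedom(2, 2))    # 1
print(is_significant(4.55, 2, 2))  # True
```

The value 4.55 is the statistic obtained in the worked antenatal-care example; any calculated χ² can be passed in the same way.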
• Step 1 (a)
Expected frequency for each cell, e.g.
E1 = 86 x 80 / 155 = 44.4

Distance from ANC     Used ANC                Did not use ANC         Total
Less than 10 km       O1 = 51   E1 = 44.4     O2 = 29   E2 = 35.6     80
10 km or more         O3 = 35   E3 = 41.6     O4 = 40   E4 = 33.4     75
Total                 86                      69                      155

• Step 1 (b) to 1 (d)
χ² = (51-44.4)² / 44.4 + …… + …… + …….
= 0.98 + 1.22 + 1.05 + 1.30 = 4.55
• Step 2
- The number of degrees of freedom (df) is 1
- In the table of chi-square values, use the decided p-value of 0.05
- Since df is 1, we look along that row to the column where p = 0.05.
This gives a value of 3.84. Our value of 4.55 is > 3.84,
so the result is significant at the 0.05 level.
INTERPRETING THE RESULT
• Step 3
- We can now conclude that the women
living within the distance of 10 km from
the clinic used antenatal care
significantly more often than women
living more than 10 km away.
• Odds Ratio= ad/bc
• Relative Risk = [a/(a+b)] / [c/(c+d)]
• Attack rate = Exposure specific risk =
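
As an illustration, the odds ratio and relative risk can be computed for the antenatal-care table. This sketch assumes a mapping the slides do not state explicitly: "exposure" is living less than 10 km from the clinic and the "outcome" is using ANC, so a = 51, b = 29, c = 35, d = 40. The function names are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio = ad / bc for a two-by-two table."""
    return (a * d) / (b * c)

def relative_risk(a, b, c, d):
    """Relative risk = [a/(a+b)] / [c/(c+d)]."""
    return (a / (a + b)) / (c / (c + d))

# Assumed mapping: exposed = less than 10 km, outcome = used ANC.
a, b, c, d = 51, 29, 35, 40
print(round(odds_ratio(a, b, c, d), 2))     # 2.01
print(round(relative_risk(a, b, c, d), 2))  # 1.37
```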

QUICK FORMULA
• For two-by-two tables there is a quick method for calculating the
chi-square value, which can replace step 1 described above.
If the various numbers in the cross table are represented by the
following letters

The quick formula for calculating the Chi-square value is


χ² = n (ad - bc)² / efgh
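
A short sketch of the quick formula follows. The lettered table is missing from these slides, so the code assumes the usual convention: a, b, c, d are the four cells, e and f the row totals, g and h the column totals, and n the grand total. The function name is illustrative.

```python
def chi_square_quick(a, b, c, d):
    """Quick formula for a two-by-two table: n (ad - bc)^2 / (e f g h)."""
    n = a + b + c + d
    e, f = a + b, c + d  # row totals (assumed meaning of e and f)
    g, h = a + c, b + d  # column totals (assumed meaning of g and h)
    return n * (a * d - b * c) ** 2 / (e * f * g * h)

# Antenatal-care example: 51, 29, 35, 40.
print(round(chi_square_quick(51, 29, 35, 40), 2))  # 4.57
```

The result agrees with the cell-by-cell method when the expected frequencies are not rounded; the 4.55 in the worked example reflects rounding each E to one decimal place.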
