
CHI-SQUARE TEST

Introduction

Many business surveys and experiments produce qualitative rather than quantitative response variables;
that is, the data are classified rather than measured. Such data consist of counts: the number of
occurrences of some event, or the number of responses falling into each class. This type of data is
known as count, classification or enumeration data, and its level of measurement is either nominal or
ordinal. For example, we might be interested in the number of consumers who choose each of three
brands of coffee, or in the number of sales made by each of three medical representatives during January.

One reason for collecting count data is to analyze how the counts are distributed across the classes
or categories, which gives us an idea of the proportion of the population that falls into each category.
For example, we may want to estimate the proportion of households that prefer each of three brands of
detergent by counting the number of households in a sample that buy each brand. A head nurse might
classify the nurses according to their work preference, either day shift or night shift. The results of an
advertising campaign would yield count data that classify consumer reactions. A marketing manager
may be interested in studying the reaction of sales prospects to a particular promotional device,
classified into three categories (High, Moderate, Low). Data of this type are known as one-dimensional
classification data: the observations are not measured but counted.

One-Dimensional Count Data

Consider the following survey results on brand preference for a random sample of 150 coffee
consumers: Brand A = 61, Brand B = 53 and Brand C = 36. Given these data, the question of interest is:
"Do these results indicate that a preference exists for any of the brands?"

To answer the question, let us formulate the null hypothesis that there is no preference for any
of the three brands of coffee against the alternative that a preference exists for one or more of the
coffee brands. If we let

p1 be the proportion of customers who prefer brand A,
p2 be the proportion of customers who prefer brand B, and
p3 be the proportion of customers who prefer brand C,

we want to test

Ho: p1 = p2 = p3 = 1/3 (the three proportions are equal, so there is no preference for any of the three
brands of coffee)

Ha: At least one of the proportions exceeds 1/3 (customers have a preference for at least one brand of
coffee; because the proportions sum to 1, this is the same as saying that not all of them equal 1/3).

Now, if the null hypothesis is true, we would expect 1/3 of the customers in the sample to purchase
each brand. Thus, in a sample of 150 customers, (1/3)(150) = 50 would be expected to purchase brand A,
50 to purchase brand B and another 50 to purchase brand C, which reflects no preference in the choice
of coffee brand. The value 50 is known as the expected frequency; in general, it is obtained by:

$$E(n_i) = n\,p_i$$

where n = total sample size,
pi = proportion of group i under Ho,
E(ni) = expected number of observations in group i.
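The expected-frequency formula can be sketched in Python as below. This is a minimal illustration: the function name `expected_counts` is hypothetical, and the check that the proportions sum to 1 is an added safeguard rather than part of the original formula.

```python
def expected_counts(n, proportions):
    """Expected frequency E(n_i) = n * p_i for every class i under Ho."""
    # Added safeguard (not in the original formula): the hypothesised
    # proportions must describe the whole population.
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions under Ho must sum to 1")
    return [n * p for p in proportions]


# Coffee example: n = 150 customers, Ho: p1 = p2 = p3 = 1/3.
coffee_expected = expected_counts(150, [1 / 3, 1 / 3, 1 / 3])
print([round(e, 2) for e in coffee_expected])  # [50.0, 50.0, 50.0]
```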

To test the above hypothesis, we use the test statistic χ² (read as "chi-square"), which measures
the degree of disagreement between the data and the null hypothesis.

Chi-square goodness of fit test

Chi-square is used as a test of significance when the data are expressed as frequencies or counts.
The goodness of fit test determines whether the observed frequencies in one-dimensional classification
data agree with the frequencies expected under the null hypothesis. The assumptions underlying the use
of this test are:
1. The observations must be independent; that is, no response or observation is related to any
other response or observation.
2. The categories must be mutually exclusive; that is, each observation is placed in one and only
one category.
3. No more than 20% of the classes have expected frequencies less than 5, and no expected
frequency is less than 1.
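Assumptions 1 and 2 concern how the data are collected and cannot be checked from the counts alone, but the rule of thumb in assumption 3 can. A minimal sketch follows; the helper name `expected_frequency_rule_ok` is hypothetical, and the second call uses made-up expected counts purely to show a failing case.

```python
def expected_frequency_rule_ok(expected):
    """Assumption 3: no expected count below 1, and at most 20% of the
    classes with an expected count below 5."""
    if any(e < 1 for e in expected):
        return False
    small = sum(1 for e in expected if e < 5)
    return small <= 0.20 * len(expected)


print(expected_frequency_rule_ok([50, 50, 50]))    # True (coffee example)
print(expected_frequency_rule_ok([20, 3, 4, 30]))  # False (hypothetical: 2 of 4 classes below 5)
```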

We want to test the hypothesis

Ho: Oi = Ei versus Ha: Oi ≠ Ei for at least one i, i = 1, 2, 3, …, k

where Oi = observed frequency in the ith class,
Ei = expected frequency under Ho in the ith class,
k = number of classes.

The null hypothesis may be tested using the following statistic:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

which, under Ho, approximately follows a chi-square distribution with k – 1 degrees of freedom.

To reach a decision, the computed value of chi-square is compared with the critical value of the
chi-square distribution for the chosen significance level α and k – 1 degrees of freedom.
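The statistic above translates directly into a few lines of Python. This sketch assumes the expected frequencies have already been computed under Ho; the name `chi_square_statistic` is illustrative. Applied to the coffee data, it gives the computed value shown in Table 1.

```python
def chi_square_statistic(observed, expected):
    """Goodness-of-fit statistic: sum over classes of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per class")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [61, 53, 36]   # brands A, B, C
expected = [50, 50, 50]   # n * p_i with p_i = 1/3
stat = chi_square_statistic(observed, expected)
df = len(observed) - 1    # k - 1 degrees of freedom
print(round(stat, 3), df)  # 6.52 2
```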

Presenting the problem in tabular form,


Table 1. Coffee brand preference

Brand   Observed (O)   Expected (E)   O – E   (O – E)²   (O – E)²/E
A            61             50          11        121       2.420
B            53             50           3          9       0.180
C            36             50         –14        196       3.920
Total       150            150                              6.520 (χ² computed)
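Table 1 can be rebuilt class by class, which makes it easy to see which brand contributes most to χ² (here brand C, whose observed count falls furthest below its expectation). A minimal sketch; `contribution_table` is a hypothetical helper and the printed layout only approximates the table.

```python
def contribution_table(labels, observed, expected):
    """Rows of (label, O, E, O - E, (O - E)^2, (O - E)^2 / E), as in Table 1."""
    rows = []
    for label, o, e in zip(labels, observed, expected):
        diff = o - e
        rows.append((label, o, e, diff, diff ** 2, diff ** 2 / e))
    return rows


rows = contribution_table(["A", "B", "C"], [61, 53, 36], [50, 50, 50])
print(f"{'Brand':<6}{'O':>5}{'E':>5}{'O-E':>6}{'(O-E)^2':>9}{'(O-E)^2/E':>11}")
for label, o, e, d, d2, c in rows:
    print(f"{label:<6}{o:>5}{e:>5}{d:>6}{d2:>9}{c:>11.3f}")
total = sum(row[5] for row in rows)  # the last column sums to chi-square
print(f"{'Total':<6}{sum(r[1] for r in rows):>5}{sum(r[2] for r in rows):>5}"
      f"{'':>6}{'':>9}{total:>11.3f}")
```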

Note that the farther the observed frequencies are from their expected frequencies, that is, the larger
the differences between observed and expected frequencies (whether positive or negative), the larger the
chi-square value will be. Large values of chi-square therefore suggest that the null hypothesis is false.

If the null hypothesis is true, χ² approximately follows a chi-square distribution with k – 1 degrees of freedom.

Our decision rule will be:

If $\chi^2_{\text{comp}} > \chi^2_{\alpha,\,k-1}$, reject the null hypothesis;
if $\chi^2_{\text{comp}} \le \chi^2_{\alpha,\,k-1}$, fail to reject the null hypothesis.
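The decision rule reduces to a single comparison. In this sketch, `decide` is a hypothetical name, and the critical value is passed in by the caller (for example, looked up in Table 2) rather than computed.

```python
def decide(chi2_computed, chi2_critical):
    """Reject Ho only when the computed value exceeds the critical value."""
    if chi2_computed > chi2_critical:
        return "reject Ho"
    return "fail to reject Ho"


print(decide(6.520, 5.991))  # reject Ho (coffee example, alpha = 0.05, 2 df)
```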

Table 2. Partial critical values of the chi-square distribution (column headings give the upper-tail area α)

df    χ²0.100   χ²0.050   χ²0.025   χ²0.010   χ²0.005
1      2.706     3.841     5.024     6.635     7.879
2      4.605     5.991     7.378     9.210    10.597
3      6.251     7.815     9.348    11.345    12.838
4      7.779     9.488    11.143    13.277    14.860
5      9.236    11.070    12.832    15.086    16.750
6     10.645    12.592    14.449    16.812    18.548
7     12.017    14.067    16.013    18.475    20.278
8     13.362    15.507    17.535    20.090    21.955
9     14.684    16.919    19.023    21.666    23.589
10    15.987    18.307    20.483    23.209    25.188
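The entries of Table 2 can be regenerated numerically. The chi-square CDF with df degrees of freedom equals the regularized lower incomplete gamma function P(df/2, x/2); the sketch below evaluates that function with its power series and then finds the upper-tail critical value by bisection. This is an illustrative, standard-library-only approach (the names `chi2_cdf` and `chi2_critical` are hypothetical); in practice a statistics library is preferable, for example SciPy's `scipy.stats.chi2.ppf(1 - alpha, df)`. The printed values should match Table 2 to about three decimals, although the last digit may occasionally round differently.

```python
import math


def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable X with `df` degrees of freedom.

    Evaluates the regularized lower incomplete gamma function P(a, z),
    with a = df / 2 and z = x / 2, through its power series.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a          # n = 0 term of the sum z^n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)
        total += term
    # Multiply by z^a e^(-z) / Gamma(a), computed on the log scale.
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_critical(alpha, df):
    """Upper-tail critical value: the x for which P(X > x) = alpha."""
    target = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < target:   # widen the bracket until it holds x
        hi *= 2.0
    for _ in range(100):               # bisection on the increasing CDF
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


alphas = [0.100, 0.050, 0.025, 0.010, 0.005]
for df in range(1, 11):
    print(df, " ".join(f"{chi2_critical(a, df):7.3f}" for a in alphas))
```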

For our brand preference problem, with α = 0.05 and k – 1 = 3 – 1 = 2 df, the critical value is
$\chi^2_{0.05,\,2} = 5.991$.

Since the computed chi-square value of 6.520 exceeds the critical value of 5.991, we reject Ho and
conclude, at the 5% level of significance, that customers prefer one or more of the brands of coffee.
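An equivalent way to reach the same decision is to compute the p-value, the area under the chi-square curve to the right of the computed statistic, and compare it with α. For 2 degrees of freedom the chi-square distribution is exponential with mean 2, so the upper-tail area has the closed form exp(-x/2). The sketch below relies on that special case only (it is not valid for other degrees of freedom), and `p_value_df2` is a hypothetical name.

```python
import math


def p_value_df2(chi2_computed):
    """Upper-tail area P(X > x) for a chi-square variable with 2 df.

    Valid for 2 degrees of freedom only: chi-square(2) is exponential
    with mean 2, so P(X > x) = exp(-x / 2).
    """
    return math.exp(-chi2_computed / 2.0)


p = p_value_df2(6.520)   # coffee example, k - 1 = 2 df
print(round(p, 4))       # 0.0384
print(p < 0.05)          # True, so Ho is rejected at the 5% level
```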

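[Figure 1. Rejection region for the coffee brand preference test: chi-square density f(χ²) with the rejection region (α = 0.05) to the right of the critical value 5.99; the computed value 6.52 falls inside the rejection region.]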

Sample Problem 1: A watch company undertook a study to determine whether young college students
have a special preference for a watchband color or whether all four colors under consideration are
equally preferred. A random sample of 80 prospective watch buyers was selected. The results are
tabulated below.

Table 3. Watchband color preference

Color    Tan   Brown   Maroon   Black   Total
Count     12      40        8      20      80

Solution:

1. Define the null and alternative hypotheses:

Ho: p1 = p2 = p3 = p4 = ¼ (the four band colors are equally preferred)
Ha: At least one of the proportions exceeds ¼ (the four colors are not equally preferred)

2. Compute the expected frequencies based on Ho:

Under Ho, the expected number of people who will choose color i is Ei = npi. Since all the proportions
equal ¼, we have:

E1 = E2 = E3 = E4 = (80)(1/4) = 20

3. Our test statistic is

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = \frac{(12-20)^2}{20} + \frac{(40-20)^2}{20} + \frac{(8-20)^2}{20} + \frac{(20-20)^2}{20}$$

$$= \frac{64}{20} + \frac{400}{20} + \frac{144}{20} + \frac{0}{20} = 3.2 + 20 + 7.2 + 0 = 30.4$$

Critical value (α = 0.01, k – 1 = 4 – 1 = 3 df): $\chi^2_{0.01,\,3} = 11.345$

Decision rule:
If $\chi^2_{\text{comp}} > \chi^2_{\alpha,\,k-1}$, reject the null hypothesis;
if $\chi^2_{\text{comp}} \le \chi^2_{\alpha,\,k-1}$, fail to reject the null hypothesis.

Conclusion: Since the computed chi-square value is much larger than the critical value at the 1% level
of significance, we reject the null hypothesis that all four colors are equally likely to be chosen. This
result suggests that young college students prefer some watchband colors over others.
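The whole procedure for Sample Problem 1 (expected frequencies, statistic, decision) can be checked in one place. A minimal sketch; `goodness_of_fit` is a hypothetical helper, and the critical value 11.345 (α = 0.01, 3 df) is taken from Table 2 rather than computed.

```python
def goodness_of_fit(observed, proportions, critical_value):
    """Chi-square goodness-of-fit test against a supplied critical value."""
    n = sum(observed)
    expected = [n * p for p in proportions]                 # E_i = n p_i
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    decision = "reject Ho" if stat > critical_value else "fail to reject Ho"
    return expected, stat, decision


# Sample Problem 1: tan, brown, maroon, black; Ho: every p_i = 1/4.
expected, stat, decision = goodness_of_fit([12, 40, 8, 20], [0.25] * 4, 11.345)
print(expected, round(stat, 2), decision)
# [20.0, 20.0, 20.0, 20.0] 30.4 reject Ho
```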

Sample Problem 2: A study reports an analysis of 35 key product categories. At the time of the study,
70% of the products were local brands, 25% were imported brands and 5% were generic. We want to
test whether these percentages are still valid for today's market, so we collect a random sample of 1000
products from the 35 product categories studied and obtain the following results: 610 products are local
brands, 290 are imported and 100 are generic. Conduct the test of hypothesis and state your conclusion.

Solution:

State Ho and Ha:

Ho: Oi = Ei (the proportions are still 70%, 25% and 5%)
Ha: Oi ≠ Ei for at least one i (the proportions 70%, 25% and 5% are no longer valid)

Compute the expected frequencies based on Ho:

E1 = (1000)(0.70) = 700
E2 = (1000)(0.25) = 250
E3 = (1000)(0.05) = 50

Our test statistic is

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = \frac{(610-700)^2}{700} + \frac{(290-250)^2}{250} + \frac{(100-50)^2}{50}$$

$$= \frac{8100}{700} + \frac{1600}{250} + \frac{2500}{50} = 11.57 + 6.4 + 50 = 67.97$$

Critical value (α = 0.01, k – 1 = 3 – 1 = 2 df): $\chi^2_{0.01,\,2} = 9.21$

Decision rule:

If $\chi^2_{\text{comp}} > \chi^2_{\alpha,\,k-1}$, reject the null hypothesis;
if $\chi^2_{\text{comp}} \le \chi^2_{\alpha,\,k-1}$, fail to reject the null hypothesis.

Conclusion: Since the computed chi-square value is much larger than the critical value at the 1% level
of significance, we reject the null hypothesis that the proportions are still 70%, 25% and 5%. There is
evidence that these proportions are no longer valid for today's market.
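Recomputing Sample Problem 2 term by term is a quick way to check the arithmetic: the second term is 1600/250 = 6.4, and the total is about 67.97. The sketch below also shows that the generic category contributes most of the statistic. The dictionary layout is an illustrative choice, and the critical value 9.210 (α = 0.01, 2 df) is taken from Table 2.

```python
observed = {"local": 610, "imported": 290, "generic": 100}
proportions = {"local": 0.70, "imported": 0.25, "generic": 0.05}  # under Ho
n = sum(observed.values())                                         # 1000

contributions = {}
for category, o in observed.items():
    e = n * proportions[category]          # expected count under Ho
    contributions[category] = (o - e) ** 2 / e

stat = sum(contributions.values())
for category, c in contributions.items():
    print(f"{category:<9}{c:8.2f}")
print(f"{'total':<9}{stat:8.2f}")
print("reject Ho" if stat > 9.210 else "fail to reject Ho")
```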
