Introduction
Many business surveys and experiments yield qualitative rather than quantitative response
variables; that is, the data are classified rather than quantified. Data from these surveys and
experiments consist of counts of the number of occurrences of some event, or of the number of
responses falling in each class. This type of data is known as count, classification, or
enumeration data, and its level of measurement is either nominal or ordinal. For example, we might be
interested in the number of consumers who choose each of three brands of coffee, or the number of
sales made by each of three medical representatives during the month of January.
One reason we collect count data is to analyze the distribution of counts across the classes,
which gives us an idea of the proportion of the population in each category. For example, we may
want to estimate the proportion of households who prefer each of three different brands of
detergent by counting the number of households in a sample who buy each brand. A head nurse might
classify nurses according to their work preference, either day shift or night shift. The results of
an advertising campaign would yield count data that indicate a classification of consumer
reactions. A marketing manager may be interested in studying the reaction of sales prospects to a
particular promotional device, classified into three categories (High, Moderate, Low). This type of
data is known as one-dimensional classification data; the observations are not measured but
counted.
Consider the following survey results on the brand preference of a random sample of 150
coffee consumers: Brand A = 61, Brand B = 53 and Brand C = 36. The question of interest, given
the data, is: “do these results indicate that a preference exists for any of the brands?”
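To answer the question, let us formulate the null hypothesis that there is no preference for any
of the three brands of coffee against the alternative that a preference exists for one or more of the
coffee brands. If we let
p1 = proportion of consumers who prefer Brand A
p2 = proportion of consumers who prefer Brand B
p3 = proportion of consumers who prefer Brand C
we want to test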
Ho: p1 = p2 = p3 = 1/3 (since the proportions for the three brands are the same, this hypothesis
indicates that there is no preference for any of the three brands of coffee)
Ha: At least one of the proportions exceeds 1/3 (since a proportion exceeds 1/3, this hypothesis
indicates that customers have a preference for one or more of the brands of coffee).
Now, if the null hypothesis is true, then we would expect 1/3 of the customers in the
sample to purchase each brand. Thus, in a sample of 150 customers, 1/3 of them, or
1/3(150) = 50, will purchase brand A, 50 will purchase brand B and another 50 will purchase brand C
coffee, which shows that no preference exists in the choice of the brand of coffee. The value 50 is
known as the expected value, and in general it is obtained by:
E(ni) = npi
where n = total sample size
pi = hypothesized proportion of group i
E(ni) = expected number of observations in group i
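The expected-value formula above can be sketched in a few lines of Python. This is only an illustration: the function name `expected_counts` and the variable `coffee_expected` are assumptions made for the example and do not come from the original notes.

```python
def expected_counts(n, proportions):
    """Return E(ni) = n * pi for each category under the null hypothesis."""
    # The hypothesized proportions must describe a complete set of categories.
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    return [n * p for p in proportions]


# Coffee example: 150 consumers, three equally preferred brands under Ho.
coffee_expected = expected_counts(150, [1 / 3, 1 / 3, 1 / 3])
print(coffee_expected)  # each entry is 50, up to floating-point rounding
```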
To test the above hypothesis, we use the test statistic χ2 (read as chi-square) which measures
the degree of disagreement between the data and the null hypothesis.
Chi-square is used as a test of significance when the data are expressed as frequencies or
counts. The test determines whether the observed frequencies in one-dimensional classification
data agree or disagree with the frequencies expected under the null hypothesis. The assumptions
underlying the use of this test are:
1. Data must be independent, that is, no response or observation is related to any other
response or observation.
2. The categories must be mutually exclusive, that is, an observation or frequency is placed in
one and only one category.
3. No more than 20% of the classes have expected frequencies less than 5 and no expected
frequency is less than 1.
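Assumption 3 is the only one that can be checked mechanically from the expected frequencies. A minimal sketch of such a check follows; the function name `expected_counts_ok` and the second set of expected frequencies are hypothetical and used only for illustration.

```python
def expected_counts_ok(expected):
    """Check assumption 3: at most 20% of classes have E < 5 and no E < 1."""
    small = sum(1 for e in expected if e < 5)
    return min(expected) >= 1 and small <= 0.20 * len(expected)


# Coffee example: all three expected frequencies equal 50.
print(expected_counts_ok([50, 50, 50]))      # True

# Hypothetical case: 3 of 5 classes (60%) have expected frequencies below 5.
print(expected_counts_ok([12, 9, 4, 3, 2]))  # False
```

When the check fails, a common remedy is to combine sparse classes before computing chi-square.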
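In making decisions about the parameters, the computed value of chi-square is compared with
the critical value of the chi-square statistic. The computed value is
χ2 comp = Σ (ni - E(ni))² / E(ni), summed over the k classes i = 1, ..., k
where ni is the observed frequency and E(ni) is the expected frequency of class i.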
Note that the farther the observed frequencies are from their expected frequencies, that is, the
larger the differences between observed and expected frequencies (whether positive or negative),
the larger the chi-square value will be. Large values of chi-square therefore suggest that the null
hypothesis is false.
If the null hypothesis is true, χ2 follows a chi-square distribution with k-1 degrees of freedom.
For our brand preference problem, with α = 0.05 and k-1 = 3-1 = 2 df, our critical value is
χ2 0.05,2 = 5.991.
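With observed counts 61, 53 and 36 and an expected count of 50 for each brand, the computed value is
χ2 comp = (61-50)²/50 + (53-50)²/50 + (36-50)²/50 = 2.42 + 0.18 + 3.92 = 6.52.
Since the computed chi-square of 6.52 exceeds the critical value of 5.991, we conclude at the
5% level of significance that there is customer preference for one or more of the brands of coffee.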
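The coffee test can be reproduced with the Python standard library alone. The sketch below is an illustration, not part of the original notes: the helper name `chi_square_statistic` is an assumption, and the closed-form critical value and p-value rely on the fact that a chi-square variable with exactly 2 df has upper-tail probability exp(-x/2). That shortcut does not apply to other degrees of freedom.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [61, 53, 36]   # Brand A, Brand B, Brand C
expected = [50, 50, 50]   # 150 * (1/3) for each brand under Ho
alpha = 0.05

stat = chi_square_statistic(observed, expected)

# Shortcut valid ONLY for 2 df: P(X > x) = exp(-x / 2).
critical = -2 * math.log(alpha)
p_value = math.exp(-stat / 2)

print(round(stat, 2))      # 6.52
print(round(critical, 3))  # 5.991
print(stat > critical)     # True, so Ho is rejected at the 5% level
```

The p-value comes out a little below 0.05, which matches the decision reached by comparing 6.52 with 5.991.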
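[Figure: chi-square density f(χ2) with 2 df. The rejection region (α = 0.05) lies to the right of the critical value 5.99; the computed value 6.52 falls inside it.]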
Sample Problem 1: A study was undertaken by a watch company to determine whether young college
students have a special preference for the color of a watchband or whether all four colors under
consideration are equally preferred. A random sample of 80 prospective watch buyers was selected.
Results are tabulated below.
Solution:
Under Ho, the expected number of people who will choose color i is Ei = npi, and since the
proportions are all equal to 1/4, we have the following:
E1 = E2 = E3 = E4 = (80)(1/4) = 20
χ2 = 64/20 + 400/20 + 144/20 + 0/20
= 3.2 + 20 + 7.2 + 0 = 30.4
Decision rule:
If χ2 computed value > χ2 α,(k-1), reject the null hypothesis.
If χ2 computed value ≤ χ2 α,(k-1), fail to reject the null hypothesis.
Conclusion: Since the computed chi-square value (30.4) is much larger than the chi-square critical
value at the 1% level of significance with 3 df (11.345), we reject the null hypothesis that all
four colors are equally likely to be chosen. This result shows that some watchband colors are probably
preferred by young college students.
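The arithmetic in this solution can be checked with a short Python sketch. It starts from the squared deviations 64, 400, 144 and 0 that appear in the worked solution. The helper `chi2_sf_3df` is an assumption introduced for illustration; it uses the closed-form upper-tail probability of a chi-square variable with 3 df, and the critical value 11.345 is the standard table entry for α = 0.01 and 3 df rather than a figure quoted in the notes.

```python
import math

squared_deviations = [64, 400, 144, 0]  # (ni - Ei)^2 for the four colors
expected = 20                           # Ei = 80 * (1/4)

stat = sum(d / expected for d in squared_deviations)


def chi2_sf_3df(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 df."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))


critical_1pct = 11.345  # standard chi-square table value, alpha = 0.01, 3 df

print(round(stat, 1))            # 30.4
print(stat > critical_1pct)      # True, so Ho is rejected at the 1% level
print(chi2_sf_3df(stat) < 0.01)  # True, the p-value is far below 0.01
```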
Sample Problem 2: A study reports an analysis of 35 key product categories. At the time of the study,
70% of the products were local brands, 25% were imported brands and 5% were generic. We want to
test whether these percentages are still valid for the market today, so we collect a random sample of
1000 products in the 35 product categories studied and obtain the following results: 610 products are
local brands, 290 are imported and 100 are generic. Conduct the test of hypothesis and state your
conclusion.
Solution:
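Under Ho, the expected counts are E1 = 1000(0.70) = 700, E2 = 1000(0.25) = 250 and
E3 = 1000(0.05) = 50, so
χ2 = (610-700)²/700 + (290-250)²/250 + (100-50)²/50
= 11.57 + 6.40 + 50.00 = 67.97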
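Decision rule:
If χ2 computed value > χ2 0.01,2 = 9.210, reject the null hypothesis; otherwise, fail to reject
the null hypothesis.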
Conclusion: Since the computed chi-square value is much larger than the chi-square critical value
at the 1% level of significance, we reject the null hypothesis that the earlier proportions are
still valid. The data indicate that the 70%, 25% and 5% shares no longer describe the market today.
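Because the hypothesized proportions here are unequal, it helps to see each term of the statistic separately. The sketch below is an illustration with assumed variable names (`hypothesized`, `observed`, `terms`). It again uses the 2 df shortcut P(X > x) = exp(-x/2) to obtain the 1% critical value, which would not hold for other degrees of freedom.

```python
import math

n = 1000
hypothesized = {"local": 0.70, "imported": 0.25, "generic": 0.05}
observed = {"local": 610, "imported": 290, "generic": 100}

# Contribution of each category: (observed - expected)^2 / expected.
terms = {}
for brand, p in hypothesized.items():
    e = n * p  # expected count under Ho
    terms[brand] = (observed[brand] - e) ** 2 / e

stat = sum(terms.values())

# Shortcut valid ONLY for 2 df: critical value = -2 * ln(alpha).
critical = -2 * math.log(0.01)

for brand, value in terms.items():
    print(brand, round(value, 2))  # local 11.57, imported 6.4, generic 50.0
print(round(stat, 2))              # 67.97
print(round(critical, 3))          # 9.21
print(stat > critical)             # True, so Ho is rejected at the 1% level
```

The generic category contributes most of the statistic: its observed count is double the expected count.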