
UIC BUSINESS

Announcement
Final exam: Saturday, May 2, 9:30am - 11:30am, in DH220.
Final exam syllabus:
Chapters 1 (except 1.7), 2, 3 (except 3.4), 4 (except 4.5),
5, 6 (except 6.4), 7 (except 7.5 & 7.6), 8 (except 8.5), 9,
10, 11 (only 11.1, 11.2, and 11.3), 14 (only 14.1)
There are 30 True/False and 5 numerical questions.
The exam duration is two hours.
You need your laptop with R software installed.
Open books, notes, and laptop, but no internet or any other form of
communication.

Final Preparation
Use the slides as your main reference to understand concepts
and the textbook for clarifications and additional explanation
Rework the in-class exercises and assignments
Be familiar with R

Topic 3:
Inference for Two-Way Tables
Chi-Square Test
(Additional reading: Chapter 6.4 in the textbook)

Chi-Square as a statistical test
Chi-square test: an inferential statistics technique designed to
test for significant relationships between two variables
organized in a bivariate table.
The chi-square test can be used to evaluate a relationship
between two categorical variables.
It is one example of a nonparametric test.
Chi-square requires no assumptions about the shape of the
population distribution from which a sample is drawn.
However, like all inferential techniques it assumes random
sampling.

Example
Suppose that 125 children are shown three television commercials
for breakfast cereal and are asked to pick which they liked best.
The results are shown in the table below.

Commercial Preference for Boys and Girls

          A     B     C   Total
Boys     30    29    16      75
Girls    12    33     5      50
Total    42    62    21     125

Rows are the levels of variable 1 (gender); columns are the levels of variable 2 (commercial).

We would like to know if the choice of favorite commercial was
related to whether the child was a boy or a girl or if these two
variables are independent.

Example contd
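Hypotheses
$H_0$: Variables are independent
$H_1$: Variables are related (dependent)

Observed counts, with expected counts in parentheses:

          A            B            C           Total
Boys     30 (25.2)    29 (37.2)    16 (12.6)      75
Girls    12 (16.8)    33 (24.8)     5 (8.4)       50
Total    42           62           21            125

Expected frequency, cell (1,1):
$$E_{11} = \frac{75 \times 42}{125} = 25.2$$
Expected frequency, cell (1,2):
$$E_{12} = \frac{75 \times 62}{125} = 37.2$$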

Example contd
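Observed values, with expected values in parentheses:

          A            B            C           Total
Boys     30 (25.2)    29 (37.2)    16 (12.6)      75
Girls    12 (16.8)    33 (24.8)     5 (8.4)       50
Total    42           62           21            125

$$\chi^2 = \frac{(30 - 25.2)^2}{25.2} + \frac{(29 - 37.2)^2}{37.2} + \frac{(16 - 12.6)^2}{12.6} + \frac{(12 - 16.8)^2}{16.8} + \frac{(33 - 24.8)^2}{24.8} + \frac{(5 - 8.4)^2}{8.4} = 9.098$$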
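The calculation above can be checked with a short script. This is a minimal sketch in plain Python (the course itself uses R); the helper names `expected_counts` and `chi_square_stat` are illustrative and do not come from the slides.

```python
# Observed counts from the commercial-preference table (rows: boys, girls).
observed = [
    [30, 29, 16],
    [12, 33, 5],
]

def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_stat(table):
    """Sum of (O - E)^2 / E over all cells of the table."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

expected = expected_counts(observed)
chi2 = chi_square_stat(observed)
print([[round(e, 1) for e in row] for row in expected])
print(round(chi2, 3))
```

The printed expected counts should match the values in parentheses in the table, and the statistic should agree with the hand calculation to three decimals.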

Example contd
The larger $\chi^2$, the more likely it is that the variables are related;
note that the cells that contribute the most to the resulting
statistic are those in which the expected count is very different
from the actual count.

Chi-square has a probability distribution, the critical values for
which are listed in chi-square tables. As with
the t-distribution, $\chi^2$ has a degrees-of-freedom parameter, the
formula for which is (rows - 1)(columns - 1):
$$df = (2 - 1)(3 - 1) = 1 \times 2 = 2$$

Example contd
In the chi-square table, a chi-square of 9.098 with two degrees
of freedom falls between the critical values for the commonly used
significance levels of 0.05 (5.991) and 0.01 (9.210). If you had
specified an alpha of 0.05 for
the test, you could, therefore, reject the null hypothesis that
gender and favorite commercial are independent. At $\alpha = 0.01$,
however, you could not reject the null hypothesis.

The $\chi^2$ test does not allow you to conclude anything more
specific than that there is some relationship in your sample
between gender and commercial liked (at $\alpha = 0.05$).
Examining the observed versus expected counts in each cell
might give you a clue as to the nature of the relationship and
which levels of the variables are involved. For example,
Commercial B appears to have been liked more by girls than
boys. But $\chi^2$ tests only the very general null hypothesis that the
two variables are independent.
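Instead of bracketing the statistic between table entries, the p-value can be computed directly. The sketch below relies on a special case: with exactly 2 degrees of freedom, the chi-square upper-tail probability is exp(-x/2). The function name `chi2_sf_df2` is hypothetical, and the shortcut does not apply to other degrees of freedom.

```python
import math

def chi2_sf_df2(x):
    """P(chi-square > x), valid only for 2 degrees of freedom."""
    return math.exp(-x / 2)

p_value = chi2_sf_df2(9.098)

# Compare the p-value against the two significance levels from the slide.
for alpha in (0.05, 0.01):
    decision = "reject H0" if p_value < alpha else "do not reject H0"
    print(f"alpha = {alpha}: p = {p_value:.4f}, {decision}")
```

The p-value lands between 0.01 and 0.05, which is the same conclusion the table lookup gives.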

Hypothesis Testing with Chi-Square
Hypothesis testing with chi-square follows five steps:
1. Making assumptions (random sampling)
2. Stating the research and null hypotheses and selecting alpha
3. Selecting the sampling distribution and specifying the test
statistic
4. Computing the test statistic
5. Making a decision and interpreting the results

Stating research and null hypothesis
The research hypothesis (H1) proposes that the two variables
are related in the population.
The null hypothesis (H0) states that no association exists
between the two cross-tabulated variables in the population,
and therefore the variables are statistically independent.

Expected frequencies
Observed frequencies O: the cell frequencies actually observed in
a bivariate table.
Expected frequencies E: the cell frequencies that would be
expected in a bivariate table if the two variables were statistically
independent.
To obtain the expected frequencies for any cell in any cross-
tabulation in which the two variables are assumed independent,
multiply the row and column totals for that cell and divide the
product by the total number of cases in the table.

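$$E = \frac{(\text{row total}) \times (\text{column total})}{N}$$
where $N$ is the total number of cases in the table.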


Chi-Square
Chi-square ($\chi^2$): the test statistic that summarizes the differences between the
observed ($O$) and the expected ($E$) frequencies in a bivariate
table.

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

The sampling distribution of chi-square tells the probability of
getting values of chi-square, assuming no relationship exists in
the population.
The chi-square sampling distributions depend on the degrees
of freedom (df = (r-1)(c-1), where r = the number of rows and
c = the number of columns).

Sampling distribution of Chi-Square
The distributions are positively skewed. The research
hypothesis for the chi-square is always a one-tailed test.
Chi-square values are never negative. The minimum possible
value is zero, with no upper limit to its maximum value.
As the number of degrees of freedom increases, the $\chi^2$
distribution becomes more symmetrical.

Limitation of the Chi-Square test
The chi-square test does not give us much information about the
strength of the relationship or its substantive significance in the
population.
The chi-square test is sensitive to sample size. For a fixed
pattern of cell proportions, the size of the calculated chi-square
is directly proportional to the size of the sample, independent of
the strength of the relationship between the variables.
The chi-square test is also sensitive to small expected
frequencies in one or more of the cells in the table.
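The sample-size sensitivity is easy to see numerically. As a sketch, the script below reuses the commercial-preference counts from the earlier example and doubles every cell, which keeps the proportions (and so the strength of the relationship) unchanged. The function name `chi_square_stat` is illustrative.

```python
def chi_square_stat(table):
    """Chi-square statistic for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count
            total += (o - e) ** 2 / e
    return total

base = [[30, 29, 16], [12, 33, 5]]
doubled = [[2 * count for count in row] for row in base]  # same proportions, twice the sample

chi2_base = chi_square_stat(base)
chi2_doubled = chi_square_stat(doubled)
print(round(chi2_base, 3), round(chi2_doubled, 3))
```

Doubling every count doubles the statistic, so a result that is not significant in a small sample can become significant in a larger one with the same pattern.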

Another example
A public opinion poll surveyed a simple random sample of 1000
voters. Respondents were classified by gender (male or female)
and by voting preference (Republican, Democrat, or Independent).
Results are shown in the two-way table below.

          Republican   Democrat   Independent   Total
Male         200          150          50         400
Female       250          300          50         600
Total        450          450         100        1000

Is there a gender gap? Do the men's voting preferences differ
significantly from the women's preferences? Use a 0.05 level of
significance.

Example contd
The solution to this problem takes three steps:
(1) state the hypotheses,
(2) analyze sample data,
(3) interpret results.
State the hypotheses. The first step is to state the null
hypothesis and an alternative hypothesis.
H0: Gender and voting preferences are independent.
Ha: Gender and voting preferences are not independent.

Example contd
Analyze sample data. Applying the chi-square test for
independence to sample data, we compute the degrees of
freedom, the expected frequency counts, and the chi-square
test statistic. Based on the chi-square statistic and the degrees
of freedom, we determine the p-value.
Degrees of freedom = (2 - 1)(3 - 1) = 2

Observed counts, with expected counts in parentheses:

          Republican    Democrat      Independent   Total
Male      200 (180)     150 (180)     50 (40)         400
Female    250 (270)     300 (270)     50 (60)         600
Total     450           450           100            1000

Example contd
$$\chi^2 = \frac{(200 - 180)^2}{180} + \frac{(150 - 180)^2}{180} + \frac{(50 - 40)^2}{40} + \frac{(250 - 270)^2}{270} + \frac{(300 - 270)^2}{270} + \frac{(50 - 60)^2}{60} = 16.2$$
The P-value is the probability that a chi-square statistic having 2
degrees of freedom is more extreme than 16.2.
We use the Chi-Square Distribution Table to find $P(\chi^2 > 16.2) = 0.0003$.
Interpret results. Since the P-value (0.0003) is less than the
significance level (0.05), we reject the null hypothesis. Thus,
we conclude that there is a relationship between gender and voting
preference.
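The whole voting example can be reproduced end to end. This is a sketch under the same assumption as before: the closed-form p-value exp(-x/2) holds only because the table has 2 degrees of freedom. The variable names are illustrative.

```python
import math

# Observed counts: rows are male, female; columns are Republican, Democrat, Independent.
observed = [
    [200, 150, 50],
    [250, 300, 50],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n  # expected count under independence
        chi2 += (o - e) ** 2 / e

# Upper-tail probability; this formula is specific to df == 2.
p_value = math.exp(-chi2 / 2)

print("df:", df)
print("chi-square:", round(chi2, 1))
print("p-value:", round(p_value, 4))
print("reject H0 at 0.05:", p_value < 0.05)
```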

Review
(Examples of ANOVA, Chi-Square, and Comparing Two
Proportions)

Example 1
Northern States Marketing Research has been asked to determine
if an advertising campaign for a new cell phone has increased
customer recognition of the new World A phone. A random
sample of 270 residents of a major city were asked if they knew
about the World A phone before the advertising campaign. In this
survey 50 respondents had heard of World A. After the
advertising campaign a second random sample of 203 residents
were asked exactly the same question using the same protocol. In
this case, 81 respondents had heard of the World A phone. Do
these results provide evidence that customer recognition increased
after the advertising campaign?

Solution
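Let $P_1$ and $P_2$ be the population proportions that recognized the
World A phone before and after the advertising campaign,
respectively. The null hypothesis is
$$H_0: P_1 = P_2$$
versus
$$H_1: P_1 < P_2$$
The decision rule is to reject $H_0$ in favor of $H_1$ if
$$\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_0 (1 - \hat{p}_0)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} < -z_\alpha$$
where $\hat{p}_0$ is the pooled sample proportion.
The data for this problem are as follows:
$$n_1 = 270, \quad \hat{p}_1 = \frac{50}{270} = 0.185, \quad n_2 = 203, \quad \hat{p}_2 = \frac{81}{203} = 0.399$$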

Solution contd
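The estimate of the common proportion under the null hypothesis is
as follows:
$$\hat{p}_0 = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2} = \frac{(270)(0.185) + (203)(0.399)}{270 + 203} = 0.277$$
The test statistic is as follows:
$$\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_0 (1 - \hat{p}_0)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.185 - 0.399}{\sqrt{0.277 (1 - 0.277)\left(\frac{1}{270} + \frac{1}{203}\right)}} = -5.15$$
For a one-tailed test with $\alpha = 0.05$, the critical value $-z_{0.05}$ is -1.645.
Since -5.15 < -1.645,
we reject the null hypothesis and conclude that customer
recognition did increase after the advertising campaign.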
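The pooled two-proportion z-test above can be sketched with Python's standard library. The function name `two_proportion_z` is hypothetical. Because the script works from the raw counts rather than the rounded proportions, its statistic comes out near -5.14 instead of the slide's -5.15; the conclusion is the same.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: P1 = P2, from success counts and sample sizes."""
    p1, p2 = x1 / n1, x2 / n2
    p0 = (x1 + x2) / (n1 + n2)  # pooled proportion under H0
    se = sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = two_proportion_z(50, 270, 81, 203)
critical = NormalDist().inv_cdf(0.05)  # lower-tail critical value, about -1.645
p_value = NormalDist().cdf(z)          # one-sided p-value for H1: P1 < P2

print("z:", round(z, 2))
print("critical value:", round(critical, 3))
print("reject H0:", z < critical)
```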

Example 2
When marketers position or establish new brands, they aim to
differentiate their product from its competition. To investigate
consumers' perceptions, consumers are exposed to different products and
asked what comes to mind when they see or hear of each product.
For example, suppose a study was conducted to determine whether
"Safety" or "Sporty" comes to a person's mind when they see or hear of
a particular type of automobile: BMW, Mercedes, or Lexus. Associations
and products can be organized in the cross table below. We want to
know whether the products mentioned differ in their associations and
are, thus, perceived as dissimilar.

Automobile   SPORTY   SAFETY   TOTAL
BMW             256       74     330
Mercedes         41       42      83
Lexus            66       34     100
Total           363      150     513

Solution
The null hypothesis to be tested implies that, in the population, the
three types of automobiles are perceived as similar; that is, there is no
association between automobile type and customers' perception
of the car as being known for being sporty or being known for its
safety.
To do the test, we first calculate the expected values (shown in parentheses).

Automobile   SPORTY         SAFETY        TOTAL
BMW          256 (233.5)    74 (96.5)       330
Mercedes      41 (58.7)     42 (24.3)        83
Lexus         66 (70.8)     34 (29.2)       100
Total        363            150             513

Solution contd
The $\chi^2$-test statistic is computed as follows:
$$\chi^2 = \frac{(256 - 233.5)^2}{233.5} + \frac{(74 - 96.5)^2}{96.5} + \frac{(41 - 58.7)^2}{58.7} + \frac{(42 - 24.3)^2}{24.3} + \frac{(66 - 70.8)^2}{70.8} + \frac{(34 - 29.2)^2}{29.2} = 26.8$$

The degrees of freedom are (r-1)(c-1) = (3-1)(2-1) = 2. From the
$\chi^2$ table we find the following:
$$\chi^2_{2,\,0.005} = 10.60$$

Therefore, the null hypothesis of no association is very clearly
rejected, even at the 0.5% level. The evidence against this
hypothesis is overwhelming.
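Since the test only says that some association exists, it can help to list how much each cell contributes to the statistic. The sketch below does that for the automobile table; the dictionary layout and the names `contributions` and `largest_cell` are illustrative choices, not part of the slides.

```python
# Observed counts by automobile and association.
observed = {
    "BMW": {"SPORTY": 256, "SAFETY": 74},
    "Mercedes": {"SPORTY": 41, "SAFETY": 42},
    "Lexus": {"SPORTY": 66, "SAFETY": 34},
}

row_totals = {car: sum(cells.values()) for car, cells in observed.items()}
col_totals = {}
for cells in observed.values():
    for label, count in cells.items():
        col_totals[label] = col_totals.get(label, 0) + count
n = sum(row_totals.values())

# Per-cell contribution (O - E)^2 / E to the chi-square statistic.
contributions = {}
for car, cells in observed.items():
    for label, o in cells.items():
        e = row_totals[car] * col_totals[label] / n
        contributions[(car, label)] = (o - e) ** 2 / e

chi2 = sum(contributions.values())
largest_cell = max(contributions, key=contributions.get)

for cell, value in sorted(contributions.items(), key=lambda kv: -kv[1]):
    print(cell, round(value, 2))
print("chi-square:", round(chi2, 1))
```

The Mercedes/SAFETY cell contributes the most: Mercedes was associated with safety far more often than independence would predict.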

Example 3
Which of the following is true about the F ratio?
1) it has no negative values
2) it is positively skewed
3) the mean of the F distribution equals zero
5) (1) and (2)
6) (1) and (3)
7) (2) and (3)
8) (1), (2), and (3)

Solution: 5

Example 4
While conducting a one-way ANOVA comparing five treatments
with 10 observations per treatment, you compute SSG = 42.41
and MSE = 6.34. What is the value of F?
A) 42.41
B) 1.67
C) 6.34
D) 6.69
E) 0.74
Solution: B
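The answer follows from F = MSG / MSE, where MSG = SSG / (k - 1) for k groups. A minimal sketch of the arithmetic, with variable names chosen for illustration:

```python
k = 5            # number of treatments
n_per_group = 10 # observations per treatment
ssg = 42.41      # sum of squares between groups
mse = 6.34       # mean square error

dfg = k - 1                # degrees of freedom for groups
dfe = k * n_per_group - k  # degrees of freedom for error
msg = ssg / dfg            # mean square for groups
f_ratio = msg / mse

print("DFG:", dfg, "DFE:", dfe)
print("F:", round(f_ratio, 2))
```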

Example 5
When comparing three treatments in a one-way ANOVA, the null
hypothesis would be $H_0: \mu_1 = \mu_2 = \mu_3$; that is, all three treatments have the same
effect on the mean response. In words, how would you interpret
the alternative hypothesis Ha?
A) At least two treatments are different from each other in terms
of their effect on the mean response.
B) All three treatments have different effects on the mean
response.
C) Exactly two of the three treatments have the same effect on
the mean response.
D) All of the above
Solution: A

Example 6
It is desired to test $H_0: \mu = 45$ against $H_a: \mu < 45$ using $\alpha = 0.10$.
The population in question is Normally distributed with a
standard deviation of 15. A random sample of 49 will be drawn
from this population. If $\mu$ is really equal to 40, what is the power
of this test?
A) 0.1469
B) 0.8531
C) 0.3531
D) 0.6469
Solution: B
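The power can be worked out in two steps: find the sample-mean cutoff below which H0 is rejected, then find the probability of falling below that cutoff when the true mean is 40. The sketch below uses Python's `statistics.NormalDist`. It gives about 0.85; the small difference from option B comes from rounding z-values when using a printed Normal table.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 45       # mean under H0
mu_true = 40   # mean under the specific alternative
sigma = 15     # population standard deviation
n = 49         # sample size
alpha = 0.10   # significance level

se = sigma / sqrt(n)  # standard error of the sample mean

# Lower-tailed test: reject H0 when the sample mean falls below this cutoff.
cutoff = mu0 + NormalDist().inv_cdf(alpha) * se

# Power: probability of rejecting H0 when the true mean is mu_true.
power = NormalDist(mu_true, se).cdf(cutoff)

print("cutoff:", round(cutoff, 2))
print("power:", round(power, 2))
```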
