The chi-square goodness-of-fit test can be used to determine whether sample data conform to an expected distribution when the data are categorical (nominal or ordinal). The test determines whether the data fit a given distribution, such as uniform, normal, …
χ² = Σ (fo − fe)² / fe    df = k − 1 − m

Where:
fo = frequency of observed (or actual) values
fe = frequency of expected (or theoretical) values
k = number of categories
m = number of parameters being estimated from the sample data
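A minimal sketch of this test in Python (the observed counts and the uniform expected distribution below are made-up illustration data, not from the slides):

```python
from scipy.stats import chisquare

# Hypothetical observed counts across k = 4 categories (made-up data)
observed = [18, 22, 29, 31]
# Expected counts under a uniform distribution: 100 / 4 = 25 each
expected = [25, 25, 25, 25]

# chisquare computes chi2 = sum((fo - fe)**2 / fe), with df = k - 1
# when no parameters are estimated from the data (m = 0)
stat, p = chisquare(f_obs=observed, f_exp=expected)
print(stat)  # chi-square statistic: 4.4 for these counts
print(p)     # p-value; if large, do not reject the expected distribution
```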
Chi-square test for independence
The Chi-square test for independence is based on the count in a
contingency (or cross tabs) table. It tests whether the counts for the
row categories are probabilistically independent of the counts for the
column categories.
χ² = Σij (Oij − Eij)² / Eij    df = (rows − 1)(columns − 1)

Where:
Oij = observed count in row i, column j
Eij = expected count in row i, column j
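scipy's `chi2_contingency` is one standard implementation of this test; a sketch on a made-up 2×3 table (not from the slides):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x3 contingency table of counts (made-up data)
table = np.array([[20, 30, 25],
                  [30, 20, 25]])

# chi2_contingency forms Eij = (row total)(column total)/N and computes
# chi2 = sum((Oij - Eij)**2 / Eij), df = (rows - 1)(cols - 1)
chi2, p, df, expected = chi2_contingency(table)
print(df)    # (2 - 1) * (3 - 1) = 2
print(chi2)  # 4.0 for this table (every Eij is 25)
```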
Chi-square test – Local survey
◆ In a national survey, consumers were asked, 'In general, how would you rate the level of service that businesses in this country provide?'
◆ The distribution of responses was in the National column.
◆ Suppose the manager of a store in the city wants to find out whether this result applies to her customers.
◆ She ran a similar survey of 207 randomly selected customers in her store.
[Embedded Excel worksheet]
Clive Morley 4
Steps in Hypothesis Testing
Contingency Tables
Two-way table
χ² = Σ (Oi − Ei)²/Ei
Contingency Tables - Example
Two-way table, e.g. responses to question 6 (a, b, or c) by two groups:

Counts:
Q6   Group 1   Group 2
a      10        18
b      12        22
c      15        26

Column percentages:
Q6   Group 1   Group 2
a      27%       27%
b      32%       33%
c      41%       39%
Contingency Tables - Example

Q6   Group 1   Group 2
a      10        18
b      12        22
c      15        26

Chi-square test statistic: χ² = 0.0142
p-value = 0.9929
Not significant
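The slide's figures can be reproduced with scipy's `chi2_contingency` (one standard way to run this test):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Counts from the Q6 example: rows a, b, c; columns Group 1, Group 2
table = np.array([[10, 18],
                  [12, 22],
                  [15, 26]])

chi2, p, df, expected = chi2_contingency(table)
print(round(chi2, 4))  # 0.0142, matching the slide
print(round(p, 4))     # 0.9929 -- far above 0.05, so not significant
print(df)              # (3 - 1) * (2 - 1) = 2
```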
Statistical Decision
For a t-test (for a mean or proportion):
Null hypothesis: the no-change situation
For a chi-square test:
Null hypothesis: the two variables are independent
Type I and Type II errors
Two ways a hypothesis test result can be wrong:
Type I and Type II errors
                        REALITY
                        Hypothesis correct           Hypothesis wrong
TEST FINDS
Hypothesis wrong        × type I error               √
                        (test significance level)
Hypothesis correct      √                            × type II error
Type I and Type II errors
Prob value = observed probability of a type I error
Using t = 2 as the critical value is equivalent to setting the probability of a type I error at approximately 0.05.
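A quick check of this rule of thumb, using the normal approximation to t (valid for large samples):

```python
from scipy.stats import norm

# Two-tailed probability of |z| > 2 under the standard normal,
# which the t distribution approaches for large samples
p = 2 * (1 - norm.cdf(2))
print(round(p, 3))  # 0.046 -- roughly the conventional 0.05 level
```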
BUSM 4074 Management Decision Making
4. Multiple regression
… as my salary increases, computers are getting cheaper; therefore, to get cheaper computers, pay me more.

Y = a + b1X1 + b2X2 + b3X3
Plot of data
y = 75.814 + 0.123 × SqFt
Simple Linear Regression Model
yi = β0 + β1 xi + ε i
[Figure: scatter of observed Y values against Xi, with the fitted regression line (Y = mX + c), slope β1, and errors εi]
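A least-squares fit of this model can be sketched with numpy on made-up (x, y) points; `np.polyfit` with degree 1 estimates β1 and β0:

```python
import numpy as np

# Made-up data roughly following y = 2 + 3x plus small errors
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.1, 7.9, 11.2, 13.8, 17.1])

# polyfit with degree 1 returns [slope b1, intercept b0]
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x      # fitted values
residuals = y - y_hat    # estimates of the errors eps_i
print(round(b0, 2), round(b1, 2))  # 2.05 2.99 for these points
```

For an ordinary least-squares fit the residuals always sum to (numerically) zero.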
Excel Residual Output for House Price model

X (SqFt)   Y ($'000)   Predicted Y   Residual (Y − Ŷ)
1700       255         284.85        −29.85

It shows how well the regression line fits the data points. The best and worst predictions were 3.94 and 64.3, respectively.
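The residual column is just Y minus the model's prediction. Recomputing with the rounded coefficients printed earlier gives a slightly different value than the table, because the slide's 284.85 was produced from the unrounded slope:

```python
# Prediction and residual for the 1700 sq.ft house, using the rounded
# coefficients y = 75.814 + 0.123x (the table's 284.85 used more digits)
b0, b1 = 75.814, 0.123
x, y = 1700, 255

y_hat = b0 + b1 * x
residual = y - y_hat
print(round(y_hat, 2))     # 284.91 with these rounded coefficients
print(round(residual, 2))  # -29.91
```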
Measures of variation
SSE = Σ(Yi − Ŷi)²
SSE: Sum of Squares of Error; SSR: Sum of Squares of Regression
[Figure: data point Yi, fitted value Ŷi, and mean Ȳ plotted at Xi]
Measures of variation
SSE (Error sum of squares) = Σy² − b0Σy − b1Σxy
The Standard Error of the Estimate tells us how spread out the errors are.
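For a least-squares fit, the definition of SSE and the computational shortcut above give the same value; a quick check with made-up numbers:

```python
import numpy as np

# Made-up data; fit y = b0 + b1*x by least squares
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
b1, b0 = np.polyfit(x, y, 1)

# Definition: SSE = sum of squared residuals
sse_def = np.sum((y - (b0 + b1 * x)) ** 2)

# Shortcut: SSE = sum(y^2) - b0*sum(y) - b1*sum(xy)
sse_short = np.sum(y ** 2) - b0 * np.sum(y) - b1 * np.sum(x * y)

print(np.isclose(sse_def, sse_short))  # True -- the formulas agree
```

The identity holds because OLS residuals are orthogonal to the fitted values, so Σŷ(y − ŷ) = 0.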
Linear Regression – Example
Computer output:
Correlation R = 0.837 , R-squared = 0.700
Coefficient t sig
Slight improvement in R²
Data

Company   MBV     Revenue
1         2.011   39.505
2         1.814   4.165
3         1.522   10.406
4         1.826   7.602
5         1.824   2.942
6         1.337   5.228
7         1.650   1.697
etc.
Linear regression - example
Output
Dep Var: MBV N: 71
Multiple R: 0.318
Squared multiple R: 0.101
Model is:
MBV = 2.010 + 0.046× Revenue
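Only seven of the 71 data rows are listed above; as a sketch, the same kind of output (the squared multiple R from an OLS fit) can be computed from them with numpy. The result will differ from the slide's 0.101, which used all 71 rows:

```python
import numpy as np

# The seven (Revenue, MBV) pairs listed in the Data slide
revenue = np.array([39.505, 4.165, 10.406, 7.602, 2.942, 5.228, 1.697])
mbv = np.array([2.011, 1.814, 1.522, 1.826, 1.824, 1.337, 1.650])

b1, b0 = np.polyfit(revenue, mbv, 1)
fitted = b0 + b1 * revenue

# Squared multiple R = 1 - SSE/SST
sse = np.sum((mbv - fitted) ** 2)
sst = np.sum((mbv - mbv.mean()) ** 2)
r_squared = 1 - sse / sst
print(round(r_squared, 3))  # based on 7 rows only, so not the slide's 0.101
```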
Dialogue box
Dependent: Annual Nursing Salary
Independents: Number of beds in home
Annual medical in-patient days
Annual total patient days
Rural (1) and non-rural (0) homes
Multiple regression - Example
Model Summary

R        R Square   Adjusted R Square
0.8803   0.775      0.7557

ANOVA          F         Sig
Regression     40.4375   0.000
Multiple regression - Example
R = 0.88: coefficient of correlation, measuring the relationship between two variables. R = −1: strong negative relationship; R = 1: strong positive relationship; R = 0: no relationship between the two variables.
R Square = 0.775: coefficient of determination. 77.5% of the variation in Y can be explained by variation in X; the other 22.5% is due to other factors. This fit is quite strong.
Adjusted R Square = 0.7557: R Square adjusted for the number of variables. A decrease in Adjusted R Square when a variable is added means the newly added variable is not significant.
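The adjustment uses the standard formula adj R² = 1 − (1 − R²)(n − 1)/(n − k − 1). A sketch, where the sample size n and number of predictors k below are hypothetical values chosen for illustration (the slides do not state them):

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1),
    penalising R^2 for the number of predictors k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# R^2 from the slide; n = 52 and k = 4 are assumed for illustration
print(round(adjusted_r_squared(0.775, 52, 4), 4))  # 0.7559, near the slide's 0.7557
```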
Multiple regression - Example
[Figure: histogram of regression standardized residuals; Dependent Variable: Current Salary; Std. Dev = 1.00, Mean = 0.00, N = 474]
[Figure: scatterplot of residuals against the estimate for Current Salary]
Multiple regression - Example
First model:   R Square = 0.7359, Standard Error = 2733.7424
Second model:  R Square = 0.8216, Standard Error = 2280.7998
Log model
Very often we use multiple regression to fit a multiplicative model:
Y = a · X1^b1 · X2^b2 · X3^b3
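Taking logs turns the multiplicative model into a linear one, ln Y = ln a + b1 ln X1 + b2 ln X2, which ordinary least squares can fit. A sketch on noise-free generated data (all values below are made up), where the fit recovers the parameters exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(1, 10, 50)
x2 = rng.uniform(1, 10, 50)
y = 2.0 * x1 ** 0.5 * x2 ** 1.5  # exact multiplicative model, no noise

# Design matrix [1, ln x1, ln x2]; solve for [ln a, b1, b2]
A = np.column_stack([np.ones(50), np.log(x1), np.log(x2)])
coef, *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
ln_a, b1, b2 = coef
print(round(np.exp(ln_a), 3), round(b1, 3), round(b2, 3))  # 2.0 0.5 1.5
```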
[Figure: plot of SALES (about 20–45) against TIME (0–20)]
Multiple regression – time series example
Create dummy variables for the Quarters and time period
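One common way to create the quarter dummies is pandas' `get_dummies`; the quarterly series below is hypothetical:

```python
import pandas as pd

# Hypothetical quarterly series: a time index plus quarter labels
df = pd.DataFrame({
    "time": range(1, 9),
    "quarter": ["Q1", "Q2", "Q3", "Q4"] * 2,
    "sales": [22, 25, 30, 28, 26, 29, 35, 33],
})

# get_dummies creates 0/1 indicator columns; drop_first avoids the
# dummy-variable trap (Q1 becomes the baseline quarter)
dummies = pd.get_dummies(df["quarter"], drop_first=True)
df = pd.concat([df, dummies], axis=1)
print(list(dummies.columns))  # ['Q2', 'Q3', 'Q4']
```

The regression then uses `time` plus the three quarter dummies as predictors.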