
12/9/2014

Inferential Statistics using SPSS
Chafik Bouhaddioui
Department of Statistics

Outline
Hypothesis testing using SPSS
Analysis of Variance: one-way and two-way ANOVA using SPSS
Regression analysis using SPSS
Time Series using SPSS*.

One mean
Is there evidence that the mean salary of
employees is greater than $32,000.00?
Hypotheses:
If we denote by $\mu$ the mean salary of
employees, then:
$H_0: \mu = 32000$
$H_1: \mu > 32000$
This test is called a one-sample t-test.
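
The test statistic behind the one-sample t-test can be sketched in a few lines. The Python code below is only an illustration (the course itself uses SPSS); the function name `one_sample_t` and the salary figures are hypothetical and are not taken from the course data file.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: mu = mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)      # sample standard deviation (n - 1 denominator)
    se = sd / math.sqrt(n)             # standard error of the mean
    return (mean - mu0) / se, n - 1

# Hypothetical salaries, for illustration only.
salaries = [31000, 35500, 29800, 40250, 33900, 36750, 30400, 38100]
t_stat, df = one_sample_t(salaries, 32000)
print(f"t = {t_stat:.3f}, df = {df}")
```

A large positive t supports H1 (mean salary above 32000); the p-value is then read from the t distribution with n - 1 degrees of freedom.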


One mean t-test: SPSS

P-value method
The p-value provides information about the amount
of statistical evidence that supports the alternative
hypothesis.

The p-value of a test is the probability of observing a
test statistic at least as extreme as the one computed,
given that the null hypothesis is true.

Let us demonstrate the concept with an example.

SPSS, or any other statistical software, will report the p-value.
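
To make the definition concrete, the one-sided p-value of a t statistic is the area of the t density to the right of the observed value. The sketch below approximates that area with Simpson's rule using only the Python standard library. It is an illustration, not how SPSS computes it; the function names and the example values (t = 2.0, 19 degrees of freedom) are assumptions.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_p(t_stat, df, steps=2000):
    """P(T >= t_stat) under H0, using Simpson's rule on [0, |t_stat|]."""
    a = abs(t_stat)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # area under the density from 0 to |t|
    tail = 0.5 - area                  # area to the right of |t|
    return tail if t_stat >= 0 else 1 - tail

# Hypothetical test statistic and degrees of freedom.
p = upper_tail_p(2.0, 19)
alpha = 0.05
print(f"one-sided p-value = {p:.4f}")
print("reject H0" if p < alpha else "do not reject H0")
```

The smaller the p-value, the stronger the evidence for the alternative hypothesis.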

One mean t-test: SPSS


Interpret?
Decision:
Conclusion:

Two independent means


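Is there any evidence to conclude that the
company is discriminating between males
and females in salaries?
If we define:
$\mu_1$ = mean salary of male employees
$\mu_2$ = mean salary of female employees
Hypotheses:

$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 > \mu_2$
This test is called a two-sample t-test.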
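
The two-sample t statistic can be sketched as follows. SPSS reports both an "equal variances assumed" and an "equal variances not assumed" version of the test; the code mirrors that with a pooled and a Welch variant. This is an illustrative Python sketch: the function name `two_sample_t` and both salary lists are hypothetical.

```python
import math
import statistics

def two_sample_t(x, y, equal_var=True):
    """Return (t statistic, degrees of freedom) for H0: mu1 = mu2."""
    n1, n2 = len(x), len(y)
    m1, m2 = statistics.mean(x), statistics.mean(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)
    if equal_var:
        # Pooled variance: weighted average of the two sample variances.
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # Welch version with Satterthwaite degrees of freedom.
        se = math.sqrt(v1 / n1 + v2 / n2)
        df = (v1 / n1 + v2 / n2) ** 2 / (
            (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
        )
    return (m1 - m2) / se, df

# Hypothetical salaries for two groups, for illustration only.
group_1 = [41000, 38500, 45200, 39900, 47300, 42800]
group_2 = [36200, 39100, 34800, 37500, 40300, 35900]
t_stat, df = two_sample_t(group_1, group_2)
print(f"t = {t_stat:.3f}, df = {df}")
```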

Two independent means: SPSS


How to extract age from date?
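
The underlying calculation is a date difference in whole years. As a sketch of the logic only (in SPSS this would be done with a computed variable), here is a Python version; the function name `age_in_years` and both dates are hypothetical.

```python
from datetime import date

def age_in_years(birth_date, on_date):
    """Completed years between birth_date and on_date."""
    years = on_date.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in on_date's year.
    if (on_date.month, on_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years

# Hypothetical birth date and reference date.
print(age_in_years(date(1975, 6, 30), date(2014, 1, 15)))
```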

Exercise
Compare the mean ages of male and
female employees.


Analysis of Variance:
Example:
A pharmaceutical manufacturer would like to be able to
claim that its new headache relief medication is better than
those of rivals. Also, it has two methods for formulating
its product, and it would like to compare these as well.
File: Headache.sav
The data are the result of an experiment. The column
drug codes the treatment: 1 = active compound #1,
2 = active compound #2, 3 = rival product, and
4 = control group (aspirin). We measured a pain relief score
ranging from 0 (no relief) to 50 (complete relief). The study
was carried out double-blind.
From this small experiment, what claims can the marketers
make?

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_1$: At least one of the means differs
To perform the analysis of variance
we need to build an F statistic.
To more easily follow the process we use
the following notation:


Descriptives
PainRelief

            N    Mean     Std. Deviation   Std. Error   95% CI Lower Bound   95% CI Upper Bound   Minimum   Maximum
Active1     10   13.370   5.9183           1.8715       9.136                17.604               1.3       22.3
Active2     11   22.255   6.2943           1.8978       18.026               26.483               10.6      31.9
Control     29   11.462   7.6760           1.4254       8.542                14.382               .5        25.1
Rival       14   14.250   6.6110           1.7669       10.433               18.067               3.3       25.2
Total       64   14.225   7.8349           .9794        12.268               16.182               .5        31.9

ANOVA
PainRelief

                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   937.908          3    312.636       6.403   .001
Within Groups    2929.412         60   48.824
Total            3867.320         63

In the ANOVA table, the Between Groups sum of squares is SSTreat and its mean square is MSTreat; the Within Groups sum of squares is SSE and its mean square is MSE; Sig. is the p-value.
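
As a check on the ANOVA table, the mean squares and the F statistic can be recomputed from the sums of squares and the degrees of freedom. The short Python sketch below does this with the values from the table (4 groups, 64 observations); the variable names are illustrative.

```python
# Values taken from the ANOVA table above.
ss_treat = 937.908     # Between Groups sum of squares (SSTreat)
ss_error = 2929.412    # Within Groups sum of squares (SSE)
k = 4                  # number of groups
n = 64                 # total number of observations

df_treat = k - 1                   # 3
df_error = n - k                   # 60
ms_treat = ss_treat / df_treat     # MSTreat
ms_error = ss_error / df_error     # MSE
f_stat = ms_treat / ms_error       # F = MSTreat / MSE

print(round(ms_treat, 3), round(ms_error, 3), round(f_stat, 3))
```

The rounded results agree with the Mean Square and F columns of the table (312.636, 48.824 and 6.403).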


We can also use General Linear Model / Univariate.
This way we do not need to do any recoding.

Tests of Between-Subjects Effects
Dependent Variable: PainRelief

Source            Type III Sum of Squares   df   Mean Square   F         Sig.
Corrected Model   937.908a                  3    312.636       6.403     .001
Intercept         12674.937                 1    12674.937     259.607   .000
Drug              937.908                   3    312.636       6.403     .001
Error             2929.412                  60   48.824
Total             16817.760                 64
Corrected Total   3867.320                  63

a. R Squared = .243 (Adjusted R Squared = .205)

The Sig. column gives the p-value.
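
The R Squared footnote can be reproduced from the table: R squared is the share of the corrected total sum of squares explained by the model, and the adjusted version applies the usual degrees-of-freedom correction. A minimal Python sketch, with illustrative variable names:

```python
# Values taken from the Tests of Between-Subjects Effects table.
ss_model = 937.908        # Corrected Model sum of squares
ss_total = 3867.320       # Corrected Total sum of squares
n = 64                    # number of observations
df_model = 3              # model degrees of freedom

r_squared = ss_model / ss_total
# Adjusted R squared: 1 - (1 - R^2) * (n - 1) / (n - df_model - 1)
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - df_model - 1)

print(round(r_squared, 3), round(adj_r_squared, 3))
```

Both values round to the figures in the footnote (.243 and .205).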


Interpretations
Decision:

Conclusion:

Multiple comparisons
When the null hypothesis is rejected, it may
be desirable to find which mean(s) is (are)
different, and in what rank order.
Three statistical inference procedures
geared to doing this are discussed:
Fisher's least significant difference (LSD) method
Bonferroni adjustment
Tukey's multiple comparison method

Multiple comparisons
If you just need to verify 2 or 3 pairwise
comparisons use the Bonferroni method.
If you plan to do all possible comparisons,
use Tukey.
Fisher might be used if you want to identify
areas that require further analysis.
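
The Bonferroni adjustment itself is simple arithmetic: the overall significance level is divided by the number of comparisons. The Python sketch below shows this for a few planned comparisons and for all pairs of the four drug groups; the function name `bonferroni_alpha` is hypothetical.

```python
from itertools import combinations

def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under the Bonferroni adjustment."""
    return alpha / n_comparisons

# Three planned comparisons at an overall level of 0.05.
print(bonferroni_alpha(0.05, 3))

# All pairwise comparisons among the four groups of the headache example.
groups = ["Active1", "Active2", "Control", "Rival"]
pairs = list(combinations(groups, 2))
print(len(pairs), bonferroni_alpha(0.05, len(pairs)))
```

With many comparisons the per-comparison level becomes very small, which is one reason Tukey's method is preferred when all pairs are compared.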


Multiple Comparisons
Dependent Variable: PainRelief
Tukey HSD

(I) drug_code   (J) drug_code   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower Bound   95% CI Upper Bound
Active1         Active2         -8.8845*                3.0530       .025   -16.952              -.817
Active1         Control         1.9079                  2.5624       .879   -4.863               8.679
Active1         Rival           -.8800                  2.8931       .990   -8.525               6.765
Active2         Active1         8.8845*                 3.0530       .025   .817                 16.952
Active2         Control         10.7925*                2.4743       .000   4.254                17.331
Active2         Rival           8.0045*                 2.8153       .030   .565                 15.444
Control         Active1         -1.9079                 2.5624       .879   -8.679               4.863
Control         Active2         -10.7925*               2.4743       .000   -17.331              -4.254
Control         Rival           -2.7879                 2.2740       .613   -8.797               3.221
Rival           Active1         .8800                   2.8931       .990   -6.765               8.525
Rival           Active2         -8.0045*                2.8153       .030   -15.444              -.565
Rival           Control         2.7879                  2.2740       .613   -3.221               8.797

*. The mean difference is significant at the .05 level.
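
The Std. Error column of the Tukey table is consistent with the usual formula for the standard error of a difference between two group means, sqrt(MSE * (1/n_i + 1/n_j)). The Python sketch below recomputes it from the Within Groups sum of squares (2929.412 on 60 df) and the group sizes in the Descriptives table; the function name `se_diff` is illustrative.

```python
import math

mse = 2929.412 / 60    # Within Groups mean square from the ANOVA table
group_sizes = {"Active1": 10, "Active2": 11, "Control": 29, "Rival": 14}

def se_diff(group_i, group_j):
    """Standard error of the difference between two group means."""
    return math.sqrt(mse * (1 / group_sizes[group_i] + 1 / group_sizes[group_j]))

for i, j in [("Active1", "Active2"), ("Active2", "Control"), ("Control", "Rival")]:
    print(i, j, round(se_diff(i, j), 4))
```

Because the groups have different sizes, each pair has its own standard error, which is why the confidence intervals in the table differ in width.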

Other fixed effects Analysis of Variance Models
We are interested in studying the effect of
several factors on some dependent variable.
Each characteristic investigated is called a factor.
Each factor has several levels.

[Figure: four panels plotting mean response against the levels of factor A, with separate lines for level 1 and level 2 of factor B. The panels show:
Difference among the levels of factor A, and difference among the levels of factor B; no interaction.
Difference among the levels of factor A; no difference among the levels of factor B.
No difference among the levels of factor A; difference among the levels of factor B.
Interaction.]


Example: Evaluating Employee Time Schedules


Should the clerical employees of a large insurance company be
switched to a four-day week, allowed to use flextime schedules, or
kept to the usual 9-to-5 workday?
File: Flextime.sav
The data measure the percentage
efficiency gains over a four-week trial.

Department            Condition
1 Claims              1 Flextime
2 Data processing     2 Four-day week
3 Investments         3 Regular hours

[Figure: Estimated Marginal Means of Improve, plotted by Department (Claims, DP, Invest) with one line per Condition (Flex, FourDay, Regular); the vertical axis runs from -5.00 to 15.00.]


Box-Plot


Exercise
Study the effect of gender and the job
categories on salaries.


Regression Analysis Using SPSS
We employ regression analysis to
examine the relationship among
quantitative variables.
The technique is used to predict the
value of one variable (the dependent
variable, y) based on the values of other
variables (the independent variables
$x_1, x_2, \dots, x_k$).

Example 1: DataCom
The human resources manager at DataCom, Inc.
wants to predict the annual salary of given
employees using the following explanatory
variables: The number of years of prior relevant
experience, the number of years of employment at
DataCom, the number of years of education beyond
high school, the employee's gender, the employee's
department, and the number of individuals
supervised by the given employee. These data are
collected for a sample of employees and are
provided in the file DataCom.sav.

The Model
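The first-order linear model

$y = \beta_0 + \beta_1 x + \varepsilon$

y = dependent variable
x = independent variable
$\beta_0$ = y-intercept
$\beta_1$ = slope of the line (Rise/Run)
$\varepsilon$ = error variable

$\beta_0$ and $\beta_1$ are unknown and are
therefore estimated from the data.

[Figure: the line plotted with y against x, showing the slope as Rise/Run.]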
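
Estimating the intercept and slope from data is usually done by least squares. The Python sketch below shows the standard closed-form estimates for a single predictor; it illustrates the idea and is not a description of SPSS internals. The function name `fit_line` and the data values are hypothetical.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx               # estimated slope
    b0 = y_bar - b1 * x_bar      # estimated intercept
    return b0, b1

# Hypothetical data: years of experience and salary in thousands.
experience = [1, 3, 4, 6, 8, 10]
salary = [34.0, 37.5, 41.0, 44.5, 50.0, 55.5]
b0, b1 = fit_line(experience, salary)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```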


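If we have more predictors

We allow for k independent variables to
potentially be related to the dependent
variable:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$

where y is the dependent variable, $x_1, \dots, x_k$ are the independent variables, $\beta_0, \dots, \beta_k$ are the coefficients, and $\varepsilon$ is the random error variable.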

Example 1
DataCom example mentioned in the
first part of this unit.
Use dummy variables to evaluate the
effect of the department on salaries.
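
Dummy coding turns a categorical variable with m levels into m - 1 indicator (0/1) variables, with one level left out as the reference category. The Python sketch below illustrates the idea; the function name `dummy_code` and the department names are hypothetical, since the actual department labels in DataCom.sav are not listed here.

```python
def dummy_code(values, reference):
    """Return (non-reference levels, list of 0/1 indicator rows)."""
    levels = sorted(set(values) - {reference})
    rows = [{f"dept_{lvl}": int(v == lvl) for lvl in levels} for v in values]
    return levels, rows

# Hypothetical department labels, for illustration only.
departments = ["Sales", "Engineering", "Support", "Sales", "Support"]
levels, rows = dummy_code(departments, reference="Sales")
print(levels)
for dept, row in zip(departments, rows):
    print(dept, row)
```

Each dummy coefficient in the regression is then interpreted as the difference in mean salary between that department and the reference department, holding the other predictors fixed.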

Example 1 using SPSS


