
SPSS Workshop 2014 Tutorial

Sung Hyun Kang, Biostatistician


WCHRI, University of Alberta
3/5/2014
1 SPSS Workshop 2014 Tutorial

Table of Contents
1. Outline: SPSS Workshop 2014 .............................................................................................................................3
2. What is SPSS? .....................................................................................................................................................4
3. Introducing the SPSS interface ............................................................................................................................5
3.1. SPSS Data Editor: Data View ........................................................................................................................................... 5
3.2. SPSS Data Editor: Variable View...................................................................................................................................... 5
3.3. SPSS Output window....................................................................................................................................................... 7
3.4. SPSS Syntax window........................................................................................................................................................ 7
4. Getting familiar with SPSS Menu and Icon ...........................................................................................................8
5. Data Import/Export ..........................................................................................................................................10
5.1. Create Data File (Entering Data) ................................................................................................................................... 10
5.2. Opening Data File (Import data) ................................................................................................................................... 15
5.2.1. Opening SPSS data: File > Open>Data… (Select SPSS statistics (*.sav) as File of type :) ..................................... 15
5.2.2. Opening Text File: Fixed width .............................................................................................................................. 15
5.2.3. Opening Text File: (Tab) Delimited ....................................................................................................................... 17
5.2.4. Opening EXCEL (or CSV) File .................................................................................................................................. 19
5.2.5. Opening SAS data file ............................................................................................................................................ 20
5.3. Export Data File (Save as different type of data) .......................................................................................................... 21
5.4. Saving Data File with selected variables ....................................................................................................................... 21
6. Manipulating data1 (SPSS Menu: Data) .............................................................................................................22
6.1. Data Menu: Sort Cases… ............................................................................................................................................... 22
6.2. Data Menu: Identify Duplicate Cases… ......................................................................................................................... 23
6.3. Data Menu: Merge Files > Add Cases…......................................................................................................................... 24
6.4. Data Menu: Merge Files > Add Variables… ................................................................................................................... 25
6.5. Data Menu: Aggregate… ............................................................................................................................................... 26
6.6. Data Menu: Restructure… ............................................................................................................................................. 27
6.7. Data Menu: Split into Files ............................................................................................................................................ 29
6.8. Data Menu: Split Files…................................................................................................................................................. 30
6.9. Data Menu: Select Cases… ............................................................................................................................................ 31
6.10. Data Menu: Weight Cases… .................................................................................................................................. 32
7. Manipulating data2 (SPSS Menu: Transform).....................................................................................................33
7.1. Transform Menu: Compute Variable… ......................................................................................................................... 33
7.2. Transform Menu: Recode into Same Variables….......................................................................................................... 33
7.3. Transform Menu: Recode into Different Variables… .................................................................................................... 34
7.4. Transform Menu: Automatic Recode… ......................................................................................................................... 34
7.5. Transform Menu: Create Dummy Variables ................................................................................................................. 35
7.6. Transform Menu: Visual Binning… ................................................................................................................................ 35
7.7. Transform Menu: Rank Cases…..................................................................................................................................... 37
7.8. Transform Menu: Date and time Wizard… ................................................................................................................... 37
7.9. Transform Menu: Replace missing values… .................................................................................................................. 38
8. Descriptive statistics .........................................................................................................................................39
8.1. Descriptive statistics for continuous data (Interval, Ratio)........................................................................................... 39
8.2. Descriptive statistics for categorical data (Nominal, Ordinal) ...................................................................................... 44
8.3. Generating graphs (or charts) for continuous data (Interval, Ratio) ............................................................................ 47
8.4. Generating graphs (or charts) for categorical data (Nominal, Ordinal) ........................................................................ 50
8.5. Using Chart Builder ....................................................................................................................................................... 52

8.6. Chart Edit Window ........................................................................................................................................................ 53


9. Compare Means (T-test) ...................................................................................................................................54
9.1. Independent sample t-test with two groups ................................................................................................................ 54
9.2. Paired samples t-test .................................................................................................................................................... 55
10. Compare proportions (Analysis of contingency table) and association................................................................56
10.1. Pearson’s Chi-Square test ..................................................................................................................................... 56
10.2. Fisher’s exact test.................................................................................................................................................. 57
10.3. Cochran-Mantel-Haenszel (CMH) Statistics .......................................................................................................... 58
10.4. McNemar’s test for matched pairs data ............................................................................................................... 61
10.5. Measure of Agreement (Cohen’s Kappa) .............................................................................................................. 62
10.6. Studies in medical science (Review) ..................................................................................................................... 63
11. ANOVA, ANCOVA, and MANOVA ......................................................................................................................64
11.1. One-way ANOVA ................................................................................................................................................... 64
11.2. Two-way ANOVA (With interaction) ..................................................................................................................... 66
11.3. ANCOVA (Analysis of Covariance) ......................................................................................................................... 68
11.4. MANOVA (Multivariate ANOVA) ........................................................................................................................... 69
12. Nonparametric method......................................................................................72
12.1. Wilcoxon rank-sum test (Mann-Whitney U test) .................................................................................................. 72
12.2. Wilcoxon signed-rank test for paired data ........................................................................................................... 73
12.3. Kruskal-Wallis test ................................................................................................................................................. 74
13. Correlation and Regression analysis ..................................................................................................................75
13.1. Correlation analysis ............................................................................................................................................... 75
13.2. Linear Regression model ....................................................................................................................................... 76
14. Logistic regression analysis ...............................................................................................................................78


1. Outline: SPSS Workshop 2014

 The first day objectives are:


• Learning about SPSS¹
• Opening and reviewing layouts of SPSS
• Becoming familiar with menus and icons
• Manipulating data files
• Calculating descriptive statistics
• Comparing means and proportions
• Calculating association
• Creating graphs
• Working with SPSS syntax

 The second day objectives are:


• ANOVA, ANCOVA, MANOVA
• Repeated measure ANOVA
• Correlation analysis & (ordinary) linear regression
• Logistic regression
• Nonparametric method

¹ Note that this tutorial was created using IBM SPSS Statistics Version 22.


2. What is SPSS?

• A Windows-based program that can be used to perform data entry and analysis and to create tables and graphs.
• Capable of handling large amounts of data; it can perform all of the analyses covered in this tutorial and much more.
• Commonly used in the social sciences and in the business world.
• SPSS is updated often.

Figure 1 Running SPSS with welcome dialogue

• SPSS Extension file name:


o SPSS data file: *.sav
o SPSS output file: *.spv (*.spo in versions before SPSS 16)
o SPSS syntax file: *.sps


3. Introducing the SPSS interface


3.1. SPSS Data Editor: Data View

 Many of the features of Data View are similar to the features that are found in spreadsheet applications. There are,
however, several important distinctions:
• Rows are cases. Each row represents a case or an observation. For example, each individual respondent to a
questionnaire is a case.
• Columns are variables. Each column represents a variable or characteristic that is being measured. For example, each
item on a questionnaire is a variable.
• Cells contain values. Each cell contains a single value of a variable for a case. The cell is where the case and the variable
intersect. Cells contain only data values. Unlike spreadsheet programs, cells in the Data Editor cannot contain formulas.
• The data file is rectangular. The dimensions of the data file are determined by the number of cases and variables. You
can enter data in any cell. If you enter data in a cell outside the boundaries of the defined data file, the data rectangle
is extended to include any rows and/or columns between that cell and the file boundaries. There are no "empty" cells
within the boundaries of the data file. For numeric variables, blank cells are converted to the system-missing value. For
string variables, a blank is considered a valid value.

Figure 2 SPSS Data Editor: Data View

3.2. SPSS Data Editor: Variable View

 Variable View contains descriptions of the attributes of each variable in the data file. In Variable View:
• Rows are variables.
• Columns are variable attributes.

 You can add or delete variables and modify attributes of variables, including the following attributes:
• Variable name
• Data type
• Number of digits or characters
• Number of decimal places
• Descriptive variable and value labels
• User-defined missing values
• Column width
• Measurement level
All of these attributes are saved when you save the data file.

 In addition to defining variable properties in Variable View, there are two other methods for defining variable properties:
• The Copy Data Properties Wizard provides the ability to use an external IBM® SPSS® Statistics data file or another
dataset that is available in the current session as a template for defining file and variable properties in the active
dataset. You can also use variables in the active dataset as templates for other variables in the active dataset. Copy
Data Properties is available on the Data menu in the Data Editor window. See the topic Copying Data Properties for
more information.
• Define Variable Properties (also available on the Data menu in the Data Editor window) scans your data and lists all
unique data values for any selected variables, identifies unlabeled values, and provides an auto-label feature. This
method is particularly useful for categorical variables that use numeric codes to represent categories--for example, 0 =
Male, 1 = Female. See the topic Defining Variable Properties for more information.

Figure 3 SPSS Data Editor: Variable View


3.3. SPSS Output window

Figure 4 SPSS output

3.4. SPSS Syntax window

Figure 5 SPSS syntax


4. Getting familiar with SPSS Menu and Icon


Figure 6 SPSS Menu and Icon

 SPSS MENU
• File includes all of the options you typically use in other programs, such as open, save, exit. Notice, that you can open
or create new files of multiple types as illustrated to the right.
• Edit includes the typical cut, copy, and paste commands, and allows you to specify various options for displaying data
and output.
o Click on Options, and you will see the Options dialog box. You can use this to format the data, output, charts,
etc. The choices can be overwhelming at first, so you can simply take the default options for now.

Figure 7 SPSS Menu - Edit (Options)



• View allows you to select which toolbars you want to show, select font size, add or remove the gridlines that separate
each piece of data, and to select whether or not to display your raw data or the data labels.
• Data allows you to select several options ranging from displaying data that is sorted by a specific variable to selecting
certain cases for subsequent analyses.
• Transform includes several options to change current variables. For example, you can change continuous variables to
categorical variables, change scores into rank scores, add a constant to variables, etc.
• Analyze includes all of the commands to carry out statistical analyses and to calculate descriptive statistics. Much of
this tutorial focuses on using commands located in this menu.
• Graphs includes the commands to create various types of graphs including box plots, histograms, line graphs, and bar
charts.
• Utilities allows you to list file information, which is a list of all variables, their labels, values, locations in the data file,
and types.
• Add-ons are programs that can be added to the base SPSS package. You probably do not have access to any of those.
• Window can be used to select which window you want to view (i.e., Data Editor, Output Viewer, or Syntax). Since we
have a data file and an output file open, let’s try this.
o Select Window/Data Editor. Then select Window/SPSS Viewer.
• Help has many useful options including a link to the SPSS homepage, a statistics coach, and a syntax guide. Using topics,
you can use the index option to type in any key word and get a list of options, or you can view the categories and
subcategories available under contents. This is an excellent tool and can be used to troubleshoot most problems.

 SPSS ICON
• The Icons directly under the Menu bar provide shortcuts to many common commands that are available in specific
menus. Take a moment to review these as well.

 STATUS Bar
The status bar at the bottom of each IBM® SPSS® Statistics window provides the following information:
• Command status. For each procedure or command that you run, a case counter indicates the number of cases
processed so far. For statistical procedures that require iterative processing, the number of iterations is displayed.
• Filter status. If you have selected a random sample or a subset of cases for analysis, the message Filter on indicates
that some type of case filtering is currently in effect and not all cases in the data file are included in the analysis.
• Weight status. The message Weight on indicates that a weight variable is being used to weight cases for analysis.
• Split File status. The message Split File on indicates that the data file has been split into separate groups for analysis,
based on the values of one or more grouping variables.


5. Data Import/Export
5.1. Create Data File (Entering Data)

 Variable names
The following rules apply to variable names:

• Each variable name must be unique; duplication is not allowed.


• Variable names can be up to 64 bytes long, and the first character must be a letter or one of the characters @, #, or $.
Subsequent characters can be any combination of letters, numbers, non-punctuation characters, and a period (.). In
code page mode, sixty-four bytes typically means 64 characters in single-byte languages (for example, English, French,
German, Spanish, Italian, Hebrew, Russian, Greek, Arabic, and Thai) and 32 characters in double-byte languages (for
example, Japanese, Chinese, and Korean). Many string characters that only take one byte in code page mode take two
or more bytes in Unicode mode. For example, é is one byte in code page format but is two bytes in Unicode format; so
résumé is six bytes in a code page file and eight bytes in Unicode mode.

Note: Letters include any non-punctuation characters used in writing ordinary words in the languages supported in the
platform's character set.

• Variable names cannot contain spaces.


• A # character in the first position of a variable name defines a scratch variable. You can only create scratch variables
with command syntax. You cannot specify a # as the first character of a variable in dialog boxes that create new
variables.
• A $ sign in the first position indicates that the variable is a system variable. The $ sign is not allowed as the initial
character of a user-defined variable.
• The period, the underscore, and the characters $, #, and @ can be used within variable names. For example, A._$@#1
is a valid variable name.
• Variable names ending with a period should be avoided, since the period may be interpreted as a command terminator.
You can only create variables that end with a period in command syntax. You cannot create variables that end with a
period in dialog boxes that create new variables.
• Variable names ending in underscores should be avoided, since such names may conflict with names of variables
automatically created by commands and procedures.
• Reserved keywords cannot be used as variable names. Reserved keywords are ALL, AND, BY, EQ, GE, GT, LE, LT, NE,
NOT, OR, TO, and WITH.
• Variable names can be defined with any mixture of uppercase and lowercase characters, and case is preserved for
display purposes.
• When long variable names need to wrap onto multiple lines in output, lines are broken at underscores, periods, and
points where content changes from lower case to upper case.

 Variable type
Variable Type specifies the data type for each variable. By default, all new variables are assumed to be numeric. You can use
Variable Type to change the data type. The contents of the Variable Type dialog box depend on the selected data type. For
some data types, there are text boxes for width and number of decimals; for other data types, you can simply select a
format from a scrollable list of examples. The available data types are as follows:

• Numeric. A variable whose values are numbers. Values are displayed in standard numeric format. The Data Editor
accepts numeric values in standard format or in scientific notation.

• Comma. A numeric variable whose values are displayed with commas delimiting every three places and displayed with
the period as a decimal delimiter. The Data Editor accepts numeric values for comma variables with or without commas
or in scientific notation. Values cannot contain commas to the right of the decimal indicator.
• Dot. A numeric variable whose values are displayed with periods delimiting every three places and with the comma as
a decimal delimiter. The Data Editor accepts numeric values for dot variables with or without periods or in scientific
notation. Values cannot contain periods to the right of the decimal indicator.
• Scientific notation. A numeric variable whose values are displayed with an embedded E and a signed power-of-10
exponent. The Data Editor accepts numeric values for such variables with or without an exponent. The exponent can be
preceded by E or D with an optional sign or by the sign alone--for example, 123, 1.23E2, 1.23D2, 1.23E+2, and 1.23+2.
• Date. A numeric variable whose values are displayed in one of several calendar-date or clock-time formats. Select a
format from the list. You can enter dates with slashes, hyphens, periods, commas, or blank spaces as delimiters. The
century range for two-digit year values is determined by your Options settings (from the Edit menu, choose Options,
and then click the Data tab).
• Dollar. A numeric variable displayed with a leading dollar sign ($), commas delimiting every three places, and a period
as the decimal delimiter. You can enter data values with or without the leading dollar sign.
• Custom currency. A numeric variable whose values are displayed in one of the custom currency formats that you have
defined on the Currency tab of the Options dialog box. Defined custom currency characters cannot be used in data
entry but are displayed in the Data Editor.
• String. A variable whose values are not numeric and therefore are not used in calculations. The values can contain any
characters up to the defined length. Uppercase and lowercase letters are considered distinct. This type is also known as
an alphanumeric variable.
• Restricted numeric. A variable whose values are restricted to non-negative integers. Values are displayed with leading
zeros padded to the maximum width of the variable. Values can be entered in scientific notation.

Figure 8 SPSS Variable View: Variable Type

 Measure of Variables
• Nominal variable is one that has two or more categories with no intrinsic ordering to the categories.
e.g., gender, ethnicity, etc.


• Ordinal variable is similar to a nominal variable but with a clear ordering of the categories; the spacing between the values
may not be the same.
e.g., socio-economic status, severity of disease, etc.
• Interval variable is similar to an ordinal variable, but the intervals between values are equally spaced. (In SPSS, interval and ratio variables both use the Scale measurement level.)
e.g., height, weight, age, etc.

Figure 9 SPSS Variable View: Measure of Variable

 Missing values
• If you do not enter any data in a field, it will be treated as missing, and SPSS will display a period for you (for numeric variables).
• Alternatively, you can define a specific value as a user-defined missing value.

Figure 10 SPSS Variable View: Define missing values


Example) Enter the following data in SPSS and save file


PatientID  Gender  Age  Weight  Height  Ethnicity
    1        1      18    175     155       A
    2        2      31    156     150       W
    3        1      12    141     136       B
    4        9      31    160     177       O
• For Gender,
o 1=”Male”
o 2=”Female”
o 9=”User-defined missing value”
• For Ethnicity,
o A=”Aboriginal”
o W=”White”
o B=”Black”
o O=”Others”

• Entering data in SPSS (Variable name, define value labels, and define missing value)


• Save data file as SPSS data format (File name: Example1.sav)
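o Equivalent SPSS syntax for entering and saving this example (a sketch; the output path is an assumption):

```
* Enter the example data (Gender 9 is a user-defined missing value).
DATA LIST LIST / PatientID (F8.0) Gender (F8.0) Age (F8.0)
  Weight (F8.0) Height (F8.0) Ethnicity (A8).
BEGIN DATA
1 1 18 175 155 A
2 2 31 156 150 W
3 1 12 141 136 B
4 9 31 160 177 O
END DATA.
VALUE LABELS Gender 1 'Male' 2 'Female'
  /Ethnicity 'A' 'Aboriginal' 'W' 'White' 'B' 'Black' 'O' 'Others'.
MISSING VALUES Gender (9).
SAVE OUTFILE='C:\SPSSWorkshop\Example1.sav'.
```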


5.2. Opening Data File (Import data)


Data files come in a wide variety of formats, and this software is designed to handle many of them, including:
• Spreadsheets created with Excel and Lotus
• Database tables from many database sources, including Oracle, SQLServer, Access, dBASE, and others
• Tab-delimited and other types of simple text files
• Data files in IBM® SPSS® Statistics format created on other operating systems
• SYSTAT data files. SYSTAT SYZ files are not supported.
• SAS data files
• Stata data files
• IBM Cognos Business Intelligence data packages and list reports

5.2.1. Opening SPSS data: File > Open > Data… (Select SPSS Statistics (*.sav) as “Files of type:”)
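o Equivalent syntax (the path is an assumption):

```
GET FILE='C:\SPSSWorkshop\Example1.sav'.
DATASET NAME Example1 WINDOW=FRONT.
```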
5.2.2. Opening Text File: Fixed width
o Raw data

Figure 11 Text Data File: Fixed width

o Open in SPSS
o File > Read Text Data…
o File > Open>Data…


5.2.3. Opening Text File: (Tab) Delimited


o Raw data

Figure 12 Text Data File: (Tab) Delimited

o Open in SPSS
o File > Read Text Data…
o File > Open>Data…
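o The Text Import Wizard pastes GET DATA syntax along these lines (a sketch; the path and variable list are assumptions):

```
GET DATA
  /TYPE=TXT
  /FILE='C:\SPSSWorkshop\mydata.txt'
  /ARRANGEMENT=DELIMITED
  /DELIMITERS="\t"
  /FIRSTCASE=2
  /VARIABLES=PatientID F8.0 Gender F8.0 Age F8.0.
EXECUTE.
```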


5.2.4. Opening EXCEL (or CSV) File


o Raw data

Figure 13 EXCEL data file

Figure 14 CSV data file

o Open in SPSS
o File > Open>Data…
 Select EXCEL as “Files of type:” to open EXCEL file
 Select TEXT as “Files of type:” to open CSV file
o Or simply drag EXCEL (or CSV) file to SPSS program
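o Equivalent syntax for an Excel file (a sketch; the file name and sheet name are assumptions):

```
GET DATA
  /TYPE=XLSX
  /FILE='C:\SPSSWorkshop\DataExcel.xlsx'
  /SHEET=NAME 'Sheet1'
  /READNAMES=ON.
```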

5.2.5. Opening SAS data file


o Raw data

Figure 15 SAS data file

o Open in SPSS
o File > Open>Data…
 Select SAS as “Files of type:” to open SAS data file
o Or simply drag SAS data file to SPSS program
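o Equivalent syntax (the path is an assumption):

```
GET SAS DATA='C:\SPSSWorkshop\mydata.sas7bdat'.
```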

5.3. Export Data File (Save as different type of data)


o Data export in SPSS
o File >Save as…
 Select “Save as type” to export data

Figure 16 Export data (Save as different data format)
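o Equivalent syntax using SAVE TRANSLATE (a sketch; the path is an assumption; /TYPE=XLS with /VERSION=12 writes an .xlsx file):

```
SAVE TRANSLATE OUTFILE='C:\SPSSWorkshop\Example1.xlsx'
  /TYPE=XLS
  /VERSION=12
  /FIELDNAMES
  /REPLACE.
```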

5.4. Saving Data File with selected variables


o File >Save as… (click “Variables…” in Save Data As window)

Figure 17 Save data with selected variables
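o Equivalent syntax (a sketch; the variable list is an assumption):

```
SAVE OUTFILE='C:\SPSSWorkshop\Example1_subset.sav'
  /KEEP=PatientID Gender Age.
```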


6. Manipulating data1 (SPSS Menu: Data)


Even after you have cleaned your final dataset, you will still need to manipulate the data during analysis. For instance, if you
have several datasets from several different sources, you might need to merge or subset data. And no matter how carefully you
planned your data design, you’ll probably want to work with some variables in different forms. If you collected income or age
data, for example, you might want to group the continuous variables into categories. Or you might want to create a variable
that combines various conditions, say, all minority managers by gender.
Sections 6 and 7 deal with the Data and Transform menus in SPSS.

Figure 18 SPSS Menu: Data
Figure 19 SPSS Menu: Transform

6.1. Data Menu: Sort Cases…

Figure 20 Data Menu: Sort Cases...
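o Equivalent syntax (a sketch; the sort keys are assumptions; A = ascending, D = descending):

```
SORT CASES BY Gender (A) Age (D).
```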


6.2. Data Menu: Identify Duplicate Cases…


(Sample data: Example_Identify_Duplicated.sav)

Figure 21 Data with duplicate cases

Figure 22 Data Menu: Identify Duplicate Cases...

o SPSS output

Indicator of each last matching case as Primary

                        Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Duplicate Case          1       3.2             3.2                  3.2
        Primary Case           30      96.8            96.8                100.0
        Total                  31     100.0           100.0
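o The dialog pastes syntax along these lines (a sketch; the key variable is an assumption):

```
* Flag the last case in each PatientID group as the primary case.
SORT CASES BY PatientID (A).
MATCH FILES /FILE=* /BY PatientID /LAST=PrimaryLast.
FREQUENCIES VARIABLES=PrimaryLast.
```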


6.3. Data Menu: Merge Files > Add Cases…


(Sample data: Example_MergeByCase_01.sav, Example_MergeByCase_02.sav)

Figure 23 SPSS data files to be merged (Add cases)

o Both SPSS datasets above have the same variables with the same names, but different cases (one contains data for females,
the other for males).

Figure 24 Data Menu: Merge Files > Add Cases...
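o Equivalent syntax (a sketch; the path is an assumption; * refers to the active dataset):

```
ADD FILES /FILE=* /FILE='C:\SPSSWorkshop\Example_MergeByCase_02.sav'.
EXECUTE.
```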



6.4. Data Menu: Merge Files > Add Variables…


(Sample data: Example_MergeByVariables_01.sav, Example_MergeByVariable_02.sav)

Figure 25 SPSS datasets to be merged (Add variables)

o Note that both SPSS datasets above must contain a key variable (a unique identifier).
o Note that both SPSS datasets above must be sorted by the key variable before merging files.

Figure 26 Data Menu: Merge Files > Add Variables...
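o Equivalent syntax (a sketch; the key variable and path are assumptions; both files must already be sorted by the key):

```
SORT CASES BY PatientID (A).
MATCH FILES /FILE=*
  /FILE='C:\SPSSWorkshop\Example_MergeByVariable_02.sav'
  /BY PatientID.
EXECUTE.
```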



6.5. Data Menu: Aggregate…


(Sample data: DataExcel.sav)

Figure 27 Data Menu: Aggregate...

Figure 28 Aggregated data by Gender
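o Equivalent syntax writing the aggregated results to a new dataset (a sketch; the dataset name and summary functions are assumptions):

```
DATASET DECLARE AggByGender.
AGGREGATE
  /OUTFILE='AggByGender'
  /BREAK=Gender
  /Age_mean=MEAN(Age)
  /Weight_mean=MEAN(Weight)
  /N_BREAK=N.
```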


6.6. Data Menu: Restructure…


(Sample data: Example1_RepeatedMeasureANOVA.sav)

o Data structure for Repeated Measures ANOVA (Wide format)

subject  group  dv1  dv2  dv3  dv4
   1       1     3    4    7    3
   2       1     6    8   12    9
   3       1     7   13   11   11
   4       1     0    3    6    6
   5       2     5    6   11    7
   6       2    10   12   18   15
   7       2    10   15   15   14
   8       2     5    7   11    9

o Data structure for Mixed Model (Long format)

subject group dv trial


1 1 3 1
1 1 4 2
1 1 7 3
1 1 3 4
: : : :
8 2 5 1
8 2 7 2
8 2 11 3
8 2 9 4

o Using the “Restructure” menu in SPSS (Steps 1-7), we can convert wide-format data into long format (or vice versa)
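o The same restructuring can be done directly in syntax with VARSTOCASES (wide to long) and CASESTOVARS (long to wide); a sketch using the variable names above:

```spss
* Wide to long: stack dv1-dv4 into a single dv, indexed by trial (1-4).
VARSTOCASES
  /MAKE dv FROM dv1 dv2 dv3 dv4
  /INDEX=trial(4)
  /KEEP=subject group.

* Long to wide: one row per subject, dv spread across trial.
SORT CASES BY subject trial.
CASESTOVARS
  /ID=subject group
  /INDEX=trial.
```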


6.7. Data Menu: Split into Files


(Sample data: DataExcel.sav)

Figure 29 Data Menu: Split into Files

o This procedure will generate two separate SPSS datasets, one for each gender
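o A manual syntax equivalent (not necessarily the exact syntax this menu pastes): copy the dataset, select one gender in each copy, and save each copy to its own file. The output file names are hypothetical.

```spss
DATASET NAME All.
DATASET COPY Females.
DATASET ACTIVATE Females.
SELECT IF (Gender = 'F').
SAVE OUTFILE='DataExcel_F.sav'.
DATASET ACTIVATE All.
DATASET COPY Males.
DATASET ACTIVATE Males.
SELECT IF (Gender = 'M').
SAVE OUTFILE='DataExcel_M.sav'.
```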


6.8. Data Menu: Split Files…


(Sample data: DataExcel.sav)

Figure 30 Data Menu: Split Files...

o If you run the above, you will see “Split by Gender” in the status bar
o SPSS output before splitting files by Gender.
Descriptive Statistics
N Minimum Maximum Mean Std. Deviation
Age 30 12 31 21.60 6.360
Weight 30 100 200 147.13 34.309
Height 30 121 188 144.00 17.388
Valid N (listwise) 30

o SPSS output after splitting files by Gender.


Descriptive Statistics
Gender N Minimum Maximum Mean Std. Deviation
F Age 18 12 31 20.06 5.955
Weight 18 100 199 142.61 36.934
Height 18 121 188 143.94 18.479
Valid N (listwise) 18
M Age 12 12 31 23.92 6.487
Weight 12 111 200 153.92 30.189
Height 12 122 177 144.08 16.412
Valid N (listwise) 12

o To analyze all cases again, select “Analyze all cases, do not create groups” in the “Split Files…” dialog (see Figure 30)
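o The corresponding syntax, as a sketch: SPLIT FILE requires the data to be sorted by the grouping variable; LAYERED puts all groups in one table, while SEPARATE produces one table per group.

```spss
SORT CASES BY Gender.
SPLIT FILE LAYERED BY Gender.
DESCRIPTIVES VARIABLES=Age Weight Height.
* Turn splitting off again (equivalent to "Analyze all cases").
SPLIT FILE OFF.
```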


6.9. Data Menu: Select Cases…


(Sample data: DataExcel.sav)

Figure 31 Data Menu: Select Cases...

Figure 32 SPSS dataset with selected cases

o In Figure 32, the female cases were selected, so all male cases will be excluded from the analysis.
o If you want to use all cases again, select “All cases” in the “Select Cases…” dialog in Figure 31
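o The dialog pastes roughly the following filter syntax (assuming Gender is a string coded 'F'/'M'):

```spss
USE ALL.
COMPUTE filter_$=(Gender = 'F').
FILTER BY filter_$.
EXECUTE.
* To use all cases again:
FILTER OFF.
USE ALL.
```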


6.10. Data Menu: Weight Cases…

Figure 33 Data Menu: Weight Cases...


Suppose we have the cross (or contingency) table below to see the association between X and Y.

                       Y
Variable           1       2    Total
X          1      15      20       35
           2      25      35       60
Total             40      55       95

o Data input in SPSS.

Figure 34 Data input for weight cases


o SPSS output before weighting cases.
x * y Crosstabulation
Count
y
1 2 Total
x 1 1 1 2
2 1 1 2
Total 2 2 4

o SPSS output after weighting cases (“Weight on” in Status bar).

x * y Crosstabulation
Count
y
1 2 Total
x 1 15 20 35
2 25 35 60
Total 40 55 95
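o The whole example as syntax (a sketch); the three variables x, y, and count are entered exactly as in Figure 34:

```spss
DATA LIST FREE / x y count.
BEGIN DATA
1 1 15
1 2 20
2 1 25
2 2 35
END DATA.
* Each row now represents 'count' cases.
WEIGHT BY count.
CROSSTABS /TABLES=x BY y /CELLS=COUNT.
* WEIGHT OFF restores unweighted analyses.
```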


7. Manipulating data2 (SPSS Menu: Transform)


7.1. Transform Menu: Compute Variable…
(Sample data: DataExcel.sav)

Figure 35 Transform Menu: Compute Variable
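o Compute Variable pastes a COMPUTE command. A sketch (HeightM is a hypothetical new variable, assuming Height is recorded in centimetres):

```spss
* Convert height from cm to metres.
COMPUTE HeightM = Height / 100.
EXECUTE.
```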

7.2. Transform Menu: Recode into Same Variables…


(Sample data: DataExcel.sav)
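o Recode into Same Variables pastes a RECODE command that overwrites the original values. A hypothetical example reverse-coding a 1-5 item called Q1 (not in the sample data):

```spss
RECODE Q1 (1=5) (2=4) (3=3) (4=2) (5=1).
EXECUTE.
```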


7.3. Transform Menu: Recode into Different Variables…


(Sample data: DataExcel.sav)
o From the example dataset, we want to generate a categorized age variable (< 20 years, 20-29 years, >= 30 years)
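o The equivalent RECODE ... INTO syntax for these age categories (AgeCat is an assumed name for the new variable):

```spss
RECODE Age (LOWEST THRU 19=1) (20 THRU 29=2) (30 THRU HIGHEST=3) INTO AgeCat.
VARIABLE LABELS AgeCat 'Age (categorical variable)'.
VALUE LABELS AgeCat 1 '< 20 years' 2 '20-29 years' 3 '>= 30 years'.
EXECUTE.
```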

7.4. Transform Menu: Automatic Recode…


(Sample data: DataExcel.sav)
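o A syntax sketch: AUTORECODE converts string values (e.g., Ethnicity) into consecutive integers, carrying the original values over as value labels.

```spss
AUTORECODE VARIABLES=Ethnicity
  /INTO Ethnicity_num
  /PRINT.
```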


7.5. Transform Menu: Create Dummy Variables


(Sample data: DataExcel.sav)
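o A manual syntax equivalent (not necessarily what this menu pastes): a logical expression in COMPUTE returns 1 when true and 0 when false, which creates 0/1 dummies. The Ethnicity codes A/B/O/W come from the sample data; the dummy variable names are assumptions.

```spss
COMPUTE Eth_A = (Ethnicity = 'A').
COMPUTE Eth_B = (Ethnicity = 'B').
COMPUTE Eth_O = (Ethnicity = 'O').
COMPUTE Eth_W = (Ethnicity = 'W').
EXECUTE.
```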

7.6. Transform Menu: Visual Binning…


(Sample data: DataExcel.sav)

o From the example dataset, we want to generate the same categorized age variable (< 20 years, 20-29 years, >= 30 years) as in
Section 7.3.


o Make Cutpoints…


7.7. Transform Menu: Rank Cases…


(Sample data: DataExcel.sav)
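o A syntax sketch ranking Height in ascending order, with ties assigned the mean rank:

```spss
RANK VARIABLES=Height (A)
  /RANK
  /PRINT=YES
  /TIES=MEAN.
```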

7.8. Transform Menu: Date and time Wizard…


(Sample data: DataExcel.sav)

o In the example dataset, the “Date” variable is a string. Let’s generate a date-type variable from the string (text) values.
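o The wizard ultimately pastes a conversion such as the following sketch, which assumes the string looks like mm/dd/yyyy (DateVar is a hypothetical name):

```spss
* Parse the string as an American-style date and display it as one.
COMPUTE DateVar = NUMBER(Date, ADATE10).
FORMATS DateVar (ADATE10).
EXECUTE.
```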


7.9. Transform Menu: Replace missing values…


(Sample data: DataExcel.sav)
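o This dialog pastes an RMV command. A sketch replacing missing Age values with the series mean (Age_1 follows the dialog's default naming style):

```spss
RMV /Age_1=SMEAN(Age).
```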


8. Descriptive statistics
• With the dataset specified and labeled, it is ready for analysis.
• The first step before conducting any analysis is to present descriptive statistics for each of the
variables in the study.
• The descriptive statistics that will be presented are frequency distributions, measures of central tendency, and comparisons of
means across different groups.

Figure 36 Analyze Menu: Descriptive Statistics

8.1. Descriptive statistics for continuous data (Interval, Ratio)


- Central tendency (Mean, Median, Mode etc.)
- Dispersion (Variance, Standard deviation, Range, IQR etc.)
- Distribution (Skewness, kurtosis etc.)

• SPSS Menu to perform descriptive analysis for continuous data


o Analyze > Descriptive Statistics>Descriptive…
o Analyze > Descriptive Statistics>Explore…
o Analyze > Compare Means>Means…
o Analyze > Descriptive Statistics>Frequencies…


• Analyze > Descriptive Statistics>Descriptive…


(Example dataset: DataExcel.sav)

Figure 37 Analyze>Descriptive Statistics>Descriptives...
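o A syntax sketch requesting the statistics shown in the output below:

```spss
DESCRIPTIVES VARIABLES=Age Weight Height
  /STATISTICS=MEAN STDDEV VARIANCE MIN MAX SKEWNESS KURTOSIS.
```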

o SPSS output:

Descriptive Statistics
N Minimum Mean Std. Deviation Variance Skewness Kurtosis
Statistic Statistic Statistic Statistic Statistic Statistic Std. Error Statistic Std. Error
Age 30 12 21.60 6.360 40.455 .067 .427 -1.380 .833
Weight 30 100 147.13 34.309 1177.085 .087 .427 -1.414 .833
Height 30 121 144.00 17.388 302.345 .872 .427 .358 .833
Valid N (listwise) 30

o SPSS output (by Gender): Splitting File by Gender

Descriptive Statistics
N Minimum Mean Std. Deviation Variance Skewness Kurtosis
Gender Statistic Statistic Statistic Statistic Statistic Statistic Std. Error Statistic Std. Error
F Age 18 12 20.06 5.955 35.467 .527 .536 -.805 1.038
Weight 18 100 142.61 36.934 1364.134 .314 .536 -1.596 1.038
Height 18 121 143.94 18.479 341.467 1.119 .536 .902 1.038
Valid N (listwise) 18
M Age 12 12 23.92 6.487 42.083 -.688 .637 -.852 1.232
Weight 12 111 153.92 30.189 911.356 -.141 .637 -.674 1.232
Height 12 122 144.08 16.412 269.356 .435 .637 -.338 1.232
Valid N (listwise) 12


• Analyze > Descriptive Statistics>Explore…


(Example dataset: DataExcel.sav)

Figure 38 Analyze>Descriptive Statistics>Explore...
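o A syntax sketch for Explore; NPPLOT requests the normality tests shown below:

```spss
EXAMINE VARIABLES=Height
  /PLOT BOXPLOT STEMLEAF NPPLOT
  /STATISTICS DESCRIPTIVES
  /MISSING LISTWISE.
```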

o SPSS output:
(Note that if you do analysis by group variable, then add group variable into “Factor List:” on the menu)

Descriptives
Statistic Std. Error
Height Mean 144.00 3.175
95% Confidence Interval for Lower Bound 137.51
Mean
Upper Bound 150.49
5% Trimmed Mean 142.96
Median 144.00
Variance 302.345
Std. Deviation 17.388
Minimum 121
Maximum 188
Range 67
Interquartile Range 24
Skewness .872 .427
Kurtosis .358 .833

Tests of Normality
            Kolmogorov-Smirnov(a)            Shapiro-Wilk
          Statistic    df    Sig.     Statistic    df    Sig.
Height      .111       30    .200*      .927       30    .041
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction


Height Stem-and-Leaf Plot

Frequency Stem & Leaf

6.00 12 . 124579
8.00 13 . 00013568
6.00 14 . 355679
5.00 15 . 02455
2.00 16 . 03
1.00 17 . 7
2.00 18 . 08

Stem width: 10
Each leaf: 1 case(s)

• Analyze > Compare Means>Means…


(Example dataset: DataExcel.sav)

Figure 39 Analyze > Compare Means > Means…
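o A syntax sketch (AgeCat is an assumed name for the categorized age variable):

```spss
MEANS TABLES=Weight BY AgeCat BY Gender
  /CELLS=COUNT MEAN STDDEV MEDIAN MIN MAX.
```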


o SPSS output: Generate descriptive statistics by Age (categorized) and Gender

Report
Weight
Age (categorical variable) Gender N Mean Std. Deviation Median Minimum Maximum
< 20 years F 10 131.60 36.056 111.00 100 197
M 3 169.00 30.050 167.00 140 200
Total 13 140.23 37.343 140.00 100 200
20-29 years F 6 162.33 35.770 169.00 115 199
M 7 146.29 34.369 161.00 111 198
Total 13 153.69 34.541 161.00 111 199
>= 30 years F 2 138.50 38.891 138.50 111 166
M 2 158.00 2.828 158.00 156 160
Total 4 148.25 25.171 158.00 111 166
Total F 18 142.61 36.934 133.00 100 199
M 12 153.92 30.189 160.50 111 200
Total 30 147.13 34.309 158.00 100 200

• Analyze > Descriptive Statistics>Frequencies…


(Example dataset: DataExcel.sav)

Figure 40 Analyze > Descriptive Statistics>Frequencies...
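o A syntax sketch requesting the percentiles shown below while suppressing the frequency tables themselves:

```spss
FREQUENCIES VARIABLES=Weight Height
  /FORMAT=NOTABLE
  /PERCENTILES=10 25 30 50 70 75
  /STATISTICS=MEAN MEDIAN.
```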


o SPSS output:
Statistics
Weight Height
N Valid 30 30
Missing 0 0
Mean 147.13 144.00
Median 158.00 144.00
Percentiles 10 102.80 124.10
25 111.00 130.00
30 111.90 130.30
50 158.00 144.00
70 166.70 151.40
75 169.75 154.25

8.2. Descriptive statistics for categorical data (Nominal, Ordinal)


- Frequency table, Cross table

• SPSS Menu to perform descriptive analysis for categorical data


o Analyze > Descriptive Statistics>Frequencies…
o Analyze > Descriptive Statistics>Crosstabs…

• Analyze > Descriptive Statistics>Frequencies…


(Example dataset: DataExcel.sav)

Figure 41 Analyze > Descriptive Statistics>Frequencies...
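o A syntax sketch (AgeCat is an assumed name for the categorized age variable):

```spss
FREQUENCIES VARIABLES=Gender Ethnicity AgeCat.
```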



o SPSS output:
Gender
Cumulative
Frequency Percent Valid Percent Percent
Valid F 18 60.0 60.0 60.0
M 12 40.0 40.0 100.0
Total 30 100.0 100.0

Ethnicity
Cumulative
Frequency Percent Valid Percent Percent
Valid A 11 36.7 36.7 36.7
B 5 16.7 16.7 53.3
O 5 16.7 16.7 70.0
W 9 30.0 30.0 100.0
Total 30 100.0 100.0

Age (categorical variable)


Cumulative
Frequency Percent Valid Percent Percent
Valid < 20 years 13 43.3 43.3 43.3
20-29 years 13 43.3 43.3 86.7
>= 30 years 4 13.3 13.3 100.0
Total 30 100.0 100.0


• Analyze > Descriptive Statistics>Crosstabs…


(Example dataset: DataExcel.sav)

Figure 42 Analyze >Descriptive Statistics>Crosstabs...
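o A syntax sketch requesting the row, column, and total percentages shown below:

```spss
CROSSTABS
  /TABLES=Gender BY Ethnicity
  /CELLS=COUNT ROW COLUMN TOTAL.
```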

o SPSS output:
Gender * Ethnicity Crosstabulation
Ethnicity
A B O W Total
Gender F Count 6 3 3 6 18
% within Gender 33.3% 16.7% 16.7% 33.3% 100.0%
% within Ethnicity 54.5% 60.0% 60.0% 66.7% 60.0%
% of Total 20.0% 10.0% 10.0% 20.0% 60.0%
M Count 5 2 2 3 12
% within Gender 41.7% 16.7% 16.7% 25.0% 100.0%
% within Ethnicity 45.5% 40.0% 40.0% 33.3% 40.0%
% of Total 16.7% 6.7% 6.7% 10.0% 40.0%
Total Count 11 5 5 9 30
% within Gender 36.7% 16.7% 16.7% 30.0% 100.0%
% within Ethnicity 100.0% 100.0% 100.0% 100.0% 100.0%
% of Total 36.7% 16.7% 16.7% 30.0% 100.0%


8.3. Generating graphs (or charts) for continuous data (Interval, Ratio)
- Histogram, Box-plot, Stem-and-Leaf plot
- Error bar chart, Scatter plot etc.
(Example dataset: DataExcel.sav)

Figure 43 SPSS Menu-Graphs

• Histogram: Graphs > Legacy Dialogs > Histogram…

Figure 44 Graph menu - Histogram


• Box-plot: Graphs > Legacy Dialogs > Boxplot…

Figure 45 Graph menu - Boxplot

• Error bar: Graphs > Legacy Dialogs > Errorbar…

Figure 46 Graph menu - Errorbar


• Scatter/Dot plot: Graphs > Legacy Dialogs > Scatter/Dot…

Figure 47 Graph menu – Scatter/Dot

• Stem-and-Leaf: Analyze > Descriptive Statistics>Explore… (see page 40)


8.4. Generating graphs (or charts) for categorical data (Nominal, Ordinal)
- Bar, Pie chart, Line, Area chart etc.
(Example dataset: DataExcel.sav)

• Bar chart: Graphs > Legacy Dialogs > Bar…

Figure 48 Graph menu – Bar chart

• Line chart: Graphs > Legacy Dialogs > Line…

Figure 49 Graph menu – Line chart


• Area chart: Graphs > Legacy Dialogs > Area…

Figure 50 Graph menu – Area chart

• Pie chart: Graphs > Legacy Dialogs > Pie…

Figure 51 Graph menu – Pie chart
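o All of the legacy chart dialogs in Sections 8.3 and 8.4 paste short GRAPH commands; a sketch (AgeCat is an assumed variable name):

```spss
GRAPH /HISTOGRAM=Height.
GRAPH /ERRORBAR(CI 95)=Height BY Gender.
GRAPH /SCATTERPLOT(BIVAR)=Weight WITH Height.
GRAPH /BAR(SIMPLE)=COUNT BY Ethnicity.
GRAPH /LINE(SIMPLE)=COUNT BY AgeCat.
GRAPH /PIE=COUNT BY Gender.
```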


8.5. Using Chart Builder


o SPSS Menu: Graphs > Chart Builder
o Example (Data: DataExcel.sav)


8.6. Chart Edit Window


• If you want to edit a chart in the SPSS output window, just double-click the chart; it will open in the “Chart Edit
Window”, where you can make your changes.

Figure 52 Chart Edit Window

• Example:


9. Compare Means (T-test)


9.1. Independent sample t-test with two groups

o SPSS Menu: Analyze > Compare means > Independent-samples T-Test…


o Example (Graze.sav): Taken from Huntsberger and Billingsley (1989). The data compare two grazing methods using 32
cattle. Half of the cattle are allowed to graze continuously, while the other half are subjected to controlled
grazing time. The researchers want to know whether these two grazing methods affect weight gain differently.

Figure 53 Independent-samples T test
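o A syntax sketch, assuming GrazeType is coded 1 = continuous and 2 = controlled:

```spss
T-TEST GROUPS=GrazeType(1 2)
  /VARIABLES=WeightGain
  /CRITERIA=CI(.95).
```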

o SPSS output:

Group Statistics
GrazeType N Mean Std. Deviation Std. Error Mean
WeightGain continuous 16 75.19 33.812 8.453
controlled 16 83.13 30.535 7.634

Independent Samples Test


Levene's Test for
Equality of Variances t-test for Equality of Means
95% Confidence
Interval of the
Sig. (2- Mean Std. Error Difference
F Sig. t df tailed) Difference Difference Lower Upper
WeightGain Equal variances
.085 .773 -.697 30 .491 -7.938 11.390 -31.198 15.323
assumed
Equal variances
-.697 29.694 .491 -7.938 11.390 -31.208 15.333
not assumed

o Interpretation: A two-group test statistic for the equality of means is reported for both equal and unequal
variances. Both tests indicate a lack of evidence for a significant difference between grazing methods (both
for the pooled test, which assumes equal variances, and for the Satterthwaite test, which does not). Levene’s
test for the equality of variances does not indicate a significant difference between the two variances. The
t-test assumes that the observations in both groups are normally distributed.


9.2. Paired samples t-test

o SPSS Menu: Analyze > Compare means > Paired-samples T-Test…


o Example (Pressure.sav): A stimulus is being examined to determine its effect on systolic blood pressure.
Twelve men participate in the study. Each man’s systolic blood pressure is measured both before and after the
stimulus is applied.

Figure 54 Paired-samples T test
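o A syntax sketch for the paired test:

```spss
T-TEST PAIRS=SBPbefore WITH SBPafter (PAIRED)
  /CRITERIA=CI(.95).
```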

o SPSS output:

Paired Samples Statistics


Mean N Std. Deviation Std. Error Mean
Pair 1 SBPbefore 128.67 12 6.933 2.001
SBPafter 130.50 12 5.916 1.708

Paired Samples Correlations

N Correlation Sig.

Pair 1 SBPbefore & SBPafter 12 .598 .040

Paired Samples Test


Paired Differences
95% Confidence Interval of
Std. Std. Error the Difference Sig. (2-
Mean Deviation Mean Lower Upper t df tailed)
Pair 1 SBPbefore -
-1.833 5.828 1.683 -5.536 1.870 -1.090 11 .299
SBPafter

o Interpretation: The variables SBPbefore and SBPafter are the paired variables, with a sample size of 12.
The summary statistics of the difference (mean, standard deviation, and standard error) are displayed
along with their 95% confidence limits. The test is not significant (t = -1.090, p = 0.299), indicating
that the stimulus did not significantly affect systolic blood pressure.


10. Compare proportions (Analysis of contingency table) and association


10.1. Pearson’s Chi-Square test
o SPSS Menu: Analyze > Descriptive Statistics > Crosstab…
o Example (Color.sav): The eye and hair color of children from two different regions of Europe are recorded in
the data set Color. Instead of recording one observation per child, the data are recorded as cell counts, where
the variable Count contains the number of children exhibiting each of the 15 eye and hair color combinations.
The data set does not include missing combinations.

Figure 55 Pearson’s Chi-square test
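o A syntax sketch; because the data are recorded as cell counts, cases are first weighted by Count (the names EyeColor and HairColor are assumptions):

```spss
WEIGHT BY Count.
CROSSTABS
  /TABLES=EyeColor BY HairColor
  /CELLS=COUNT EXPECTED ROW COLUMN
  /STATISTICS=CHISQ.
```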

o SPSS output:
Chi-Square Tests
Asymp. Sig. (2-
Value df sided)
Pearson Chi-Square 20.925a 8 .007
Likelihood Ratio 25.973 8 .001
Linear-by-Linear Association 3.229 1 .072
N of Valid Cases 762
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is
5.75.

Eye Color * Hair Color Crosstabulation


Hair Color
black dark fair medium red Total
Eye Color blue Count 6 51 69 68 28 222
Expected Count 6.4 53.0 66.4 63.2 32.9 222.0
% within Eye Color 2.7% 23.0% 31.1% 30.6% 12.6% 100.0%
% within Hair Color 27.3% 28.0% 30.3% 31.3% 24.8% 29.1%
brown Count 16 94 90 94 47 341
Expected Count 9.8 81.4 102.0 97.1 50.6 341.0
% within Eye Color 4.7% 27.6% 26.4% 27.6% 13.8% 100.0%
% within Hair Color 72.7% 51.6% 39.5% 43.3% 41.6% 44.8%
green Count 0 37 69 55 38 199
Expected Count 5.7 47.5 59.5 56.7 29.5 199.0
% within Eye Color 0.0% 18.6% 34.7% 27.6% 19.1% 100.0%
% within Hair Color 0.0% 20.3% 30.3% 25.3% 33.6% 26.1%
Total Count 22 182 228 217 113 762
Expected Count 22.0 182.0 228.0 217.0 113.0 762.0
% within Eye Color 2.9% 23.9% 29.9% 28.5% 14.8% 100.0%
% within Hair Color 100.0% 100.0% 100.0% 100.0% 100.0% 100.0%

o Interpretation: The SPSS output displays the chi-square statistics. The alternative hypothesis for this analysis
states that eye color is associated with hair color. With a p-value of 0.007, the alternative hypothesis is supported.

10.2. Fisher’s exact test

o SPSS Menu: Analyze > Descriptive Statistics > Crosstab…


o Example (FatComp.sav): This example computes chi-square tests and Fisher’s exact test to compare the
probability of coronary heart disease for two types of diet. It also estimates the relative risks and computes
exact confidence limits for the odds ratio. The data set “FatComp.sav” contains hypothetical data for a case-
control study of high fat diet and the risk of coronary heart disease. The data are recorded as cell counts,
where the variable Count contains the frequencies for each exposure and response combination. The data set
is sorted in descending order by the variables Exposure and Response, so that the first cell of the 2 by 2 table
contains the frequency of positive exposure and positive response.

Figure 56 Fisher’s exact test
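o A syntax sketch (variable names are assumptions); for a 2 by 2 table, the CHISQ statistics automatically include Fisher's exact test, and RISK produces the odds ratio table:

```spss
WEIGHT BY Count.
CROSSTABS
  /TABLES=HeartDisease BY Exposure
  /CELLS=COUNT EXPECTED
  /STATISTICS=CHISQ RISK.
```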

o SPSS output:

Heart Disease * Exposure Crosstabulation


Exposure
No Yes Total
Heart Disease Low Cholesterol Diet Count 6 4 10
Expected Count 3.5 6.5 10.0
High Cholesterol Diet Count 2 11 13
Expected Count 4.5 8.5 13.0
Total Count 8 15 23
Expected Count 8.0 15.0 23.0

Chi-Square Tests
Asymp. Sig. (2- Exact Sig. (2- Exact Sig. (1-
Value df sided) sided) sided)
Pearson Chi-Square 4.960a 1 .026
Continuity Correctionb 3.188 1 .074
Likelihood Ratio 5.098 1 .024
Fisher's Exact Test .039 .037
Linear-by-Linear Association 4.744 1 .029
N of Valid Cases 23
a. 2 cells (50.0%) have expected count less than 5. The minimum expected count is 3.48.
b. Computed only for a 2x2 table


Risk Estimate
95% Confidence Interval
Value Lower Upper
Odds Ratio for Heart Disease
(Low Cholesterol Diet / High 8.250 1.154 59.003
Cholesterol Diet)
For cohort Exposure = No 3.900 .989 15.373
For cohort Exposure = Yes .473 .214 1.045
N of Valid Cases 23

o Interpretation: SPSS output displays the chi-square statistics. Because the expected counts in some of the
table cells are small, Output gives a warning that the asymptotic chi-square tests might not be appropriate. In
this case, the exact tests are appropriate. The alternative hypothesis for this analysis states that coronary
heart disease is more likely to be associated with a high fat diet, so a one-sided test is desired. Fisher’s exact
right-sided test analyzes whether the probability of heart disease in the high fat group exceeds the probability
of heart disease in the low fat group; because this p-value is small, the alternative hypothesis is supported.

The odds ratio, displayed in the “Risk Estimate” table, provides an estimate of the relative risk when an event is
rare. This estimate indicates that the odds of heart disease are 8.25 times higher in the high fat diet group;
however, the wide confidence limits indicate that this estimate has low precision.

10.3. Cochran-Mantel-Haenszel (CMH) Statistics

o SPSS Menu: Analyze > Descriptive Statistics > Crosstab…


o Example (Migraine.sav): The data set Migraine contains hypothetical data for a clinical trial of migraine
treatment. Subjects of both genders receive either a new drug therapy or a placebo. Their response to
treatment is coded as 'Better' or 'Same'. The data are recorded as cell counts, and the number of subjects for
each treatment and response combination is recorded in the variable Count.

Figure 57 CMH test
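o A syntax sketch; Gender is the layer (stratification) variable, and CMH(1) requests the Cochran-Mantel-Haenszel test of a common odds ratio equal to 1:

```spss
WEIGHT BY Count.
CROSSTABS
  /TABLES=Treatment BY Response BY Gender
  /CELLS=COUNT EXPECTED
  /STATISTICS=CHISQ RISK CMH(1).
```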


o SPSS output:

Treatment * Response * Gender Crosstabulation


Response
Gender Better Same Total
female Treatment Active Count 16 11 27
Expected Count 10.9 16.1 27.0
Placebo Count 5 20 25
Expected Count 10.1 14.9 25.0
Total Count 21 31 52
Expected Count 21.0 31.0 52.0
male Treatment Active Count 12 16 28
Expected Count 9.9 18.1 28.0
Placebo Count 7 19 26
Expected Count 9.1 16.9 26.0
Total Count 19 35 54
Expected Count 19.0 35.0 54.0
Total Treatment Active Count 28 27 55
Expected Count 20.8 34.2 55.0
Placebo Count 12 39 51
Expected Count 19.2 31.8 51.0
Total Count 40 66 106
Expected Count 40.0 66.0 106.0

Chi-Square Tests
Asymp. Sig. (2- Exact Sig. (2- Exact Sig. (1-
Gender Value df sided) sided) sided)
female Pearson Chi-Square 8.310c 1 .004
Continuity Correctionb 6.759 1 .009
Likelihood Ratio 8.633 1 .003
Fisher's Exact Test .005 .004
N of Valid Cases 52
male Pearson Chi-Square 1.501d 1 .221
Continuity Correctionb .884 1 .347
Likelihood Ratio 1.515 1 .218
Fisher's Exact Test .264 .174
N of Valid Cases 54
Total Pearson Chi-Square 8.443a 1 .004
Continuity Correctionb 7.318 1 .007
Likelihood Ratio 8.626 1 .003
Fisher's Exact Test .005 .003
N of Valid Cases 106
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 19.25.
b. Computed only for a 2x2 table
c. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 10.10.
d. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 9.15.

Risk Estimate
95% Confidence Interval
Gender Value Lower Upper
female Odds Ratio for Treatment
5.818 1.676 20.203
(Active / Placebo)
For cohort Response = Better 2.963 1.274 6.891
For cohort Response = Same .509 .310 .836
N of Valid Cases 52
male Odds Ratio for Treatment
2.036 .648 6.398
(Active / Placebo)
For cohort Response = Better 1.592 .741 3.418
For cohort Response = Same .782 .526 1.163

N of Valid Cases 54
Total Odds Ratio for Treatment
3.370 1.462 7.772
(Active / Placebo)
For cohort Response = Better 2.164 1.237 3.783
For cohort Response = Same .642 .471 .875
N of Valid Cases 106

Tests of Homogeneity of the Odds Ratio


Asymp. Sig. (2-
Chi-Squared df sided)
Breslow-Day 1.493 1 .222
Tarone's 1.491 1 .222

Tests of Conditional Independence


Asymp. Sig. (2-
Chi-Squared df sided)
Cochran's 8.465 1 .004
Mantel-Haenszel 7.198 1 .007
Under the conditional independence assumption, Cochran's statistic
is asymptotically distributed as a 1 df chi-squared distribution, only if
the number of strata is fixed, while the Mantel-Haenszel statistic is
always asymptotically distributed as a 1 df chi-squared distribution.
Note that the continuity correction is removed from the Mantel-
Haenszel statistic when the sum of the differences between the
observed and the expected is 0.

Mantel-Haenszel Common Odds Ratio Estimate


Estimate 3.313
ln(Estimate) 1.198
Std. Error of ln(Estimate) .423
Asymp. Sig. (2-sided) .005
Asymp. 95% Confidence Common Odds Ratio Lower Bound 1.446
Interval Upper Bound 7.593
ln(Common Odds Ratio) Lower Bound .369
Upper Bound 2.027
The Mantel-Haenszel common odds ratio estimate is asymptotically normally distributed
under the common odds ratio of 1.000 assumption. So is the natural log of the estimate.

o Interpretation: SPSS output above displays the CMH statistics.


 Breslow-Day test: The large p-value for the Breslow-Day test (p-value=0.222) in Output indicates no
significant gender difference in the odds ratios.
 CMH test: The significant p-value (p=0.004) indicates that the association between treatment and
response remains strong after adjusting for gender.
 The CMH statistics option in Statistics window (See Figure 57) also produces a table of overall relative
risks. Because this is a prospective study, the relative risk estimate assesses the effectiveness of the
new drug; the "For cohort response=Better” values are the appropriate estimates for the first column
(the risk of improvement). The probability of migraine improvement with the new drug is just over
two times the probability of improvement with the placebo (Relative risk=2.164).


10.4. McNemar’s test for matched pairs data


• Common subjects being observed under 2 conditions (2 treatments, before/after, 2 diagnostic tests) in a crossover
setting
• Two possible outcomes (Presence/Absence of Characteristic) on each measurement
• Four possibilities for each subjects with respect to outcome:
– Present in both conditions
– Absent in both conditions
– Present in Condition 1, Absent in Condition 2
– Absent in Condition 1, Present in Condition 2

o SPSS Menu: Analyze > Descriptive Statistics > Crosstab…


o Example (PrimeMinister.sav): From the data, we want to compare the probabilities of approval for the prime
minister’s performance at the times of two surveys.

Figure 58 McNemar’s test
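o A syntax sketch (the survey variable names are assumptions):

```spss
CROSSTABS
  /TABLES=FirstSurvey BY SecondSurvey
  /CELLS=COUNT ROW
  /STATISTICS=MCNEMAR.
```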

o SPSS output:
First survey * Second survey Crosstabulation
Second survey
Approve Disapprove Total
First survey Approve Count 794 150 944
% within First survey 84.1% 15.9% 100.0%
Disapprove Count 86 570 656
% within First survey 13.1% 86.9% 100.0%
Total Count 880 720 1600
% within First survey 55.0% 45.0% 100.0%

Symmetric Measures
Value Asymp. Std. Errora Approx. Tb Approx. Sig.
Interval by Interval Pearson's R .702 .018 39.396 .000c
Ordinal by Ordinal Spearman Correlation .702 .018 39.396 .000c
N of Valid Cases 1600
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Based on normal approximation.

o Interpretation: The SPSS output above displays the result of McNemar’s test for matched-pairs data. There is a
large difference between the probabilities of approval of the prime minister’s performance at the times of the two
surveys (p-value < 0.001); i.e., we have strong evidence to support a drop in the approval rating.


10.5. Measure of Agreement (Cohen’s Kappa)


o Cohen's kappa measures the agreement between the evaluations of two raters when both are rating the same
object. A value of 1 indicates perfect agreement. A value of 0 indicates that agreement is no better than
chance. Kappa is based on a square table in which row and column values represent the same scale. Any cell
that has observed values for one variable but not the other is assigned a count of 0. Kappa is not computed if
the data storage type (string or numeric) is not the same for the two variables. For string variables, both
must have the same defined length.
o A kappa value higher than 0.75 indicates excellent agreement, while a value lower than 0.4 indicates poor
agreement.

o SPSS Menu: Analyze > Descriptive Statistics > Crosstab…


o Example (Dermatology.sav): Two dermatologists evaluate the skin condition of 88 people. From the data, we
want to know whether the two dermatologists’ evaluations agree.

Figure 59 Cohen’s Kappa
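o A syntax sketch using the rater variables shown in the output:

```spss
CROSSTABS
  /TABLES=Derm1 BY Derm2
  /CELLS=COUNT
  /STATISTICS=KAPPA.
```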

o SPSS output:

Derm1 * Derm2 Crosstabulation


Count
Derm2
clear marginal poor terrible Total
Derm1 clear 13 6 2 0 21
marginal 5 12 4 2 23
poor 2 12 10 5 29
terrible 0 1 4 10 15
Total 20 31 20 17 88

Symmetric Measures
Value Asymp. Std. Errora Approx. Tb Approx. Sig.
Measure of Agreement Kappa .345 .072 5.637 .000
N of Valid Cases 88
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

o Interpretation: From the SPSS output, the estimated Cohen’s kappa is 0.345, and the test is significant
(p-value < 0.001); the magnitude of kappa, however, implies low agreement.


10.6. Studies in medical science (Review)


Prospective studies
A prospective study watches for outcomes, such as the development of a disease, during the study period and
relates this to other factors such as suspected risk or protection factor(s). The study usually involves taking a
cohort of subjects and watching them over a long period. The outcome of interest should be common; otherwise,
the number of outcomes observed will be too small to be statistically meaningful (indistinguishable from those
that may have arisen by chance). All efforts should be made to avoid sources of bias such as the loss of individuals
to follow up during the study. Prospective studies usually have fewer potential sources of bias and confounding
than retrospective studies.

Retrospective studies
A retrospective study looks backwards and examines exposures to suspected risk or protection factors in relation
to an outcome that is established at the start of the study. Many valuable case-control studies, such as
Lane-Claypon's 1926 investigation of risk factors for breast cancer, were retrospective investigations. Most sources of
error due to confounding and bias are more common in retrospective studies than in prospective studies. For this
reason, retrospective investigations are often criticised. If the outcome of interest is uncommon, however, the size
of prospective investigation required to estimate relative risk is often too large to be feasible. In retrospective
studies the odds ratio provides an estimate of relative risk. You should take special care to avoid sources of bias
and confounding in retrospective studies.
Prospective investigation is required to make precise estimates of either the incidence of an outcome or the
relative risk of an outcome based on exposure.

Case-Control studies
Case-Control studies are usually but not exclusively retrospective; the opposite is true for cohort studies. The
following notes relate case-control to cohort studies:
• outcome is measured before exposure
• controls are selected on the basis of not having the outcome
• good for rare outcomes
• relatively inexpensive
• smaller numbers required
• quicker to complete
• prone to selection bias
• prone to recall/retrospective bias
• related methods are risk (retrospective), chi-square 2 by 2 test, Fisher's exact test, exact confidence
interval for odds ratio, odds ratio meta-analysis and conditional logistic regression.

Cohort studies
Cohort studies are usually but not exclusively prospective; the opposite is true for case-control studies. The
following notes relate cohort to case-control studies:
• outcome is measured after exposure
• yields true incidence rates and relative risks
• may uncover unanticipated associations with outcome
• best for common outcomes
• expensive
• requires large numbers
• takes a long time to complete
• prone to attrition bias (compensate by using person-time methods)
• prone to the bias of change in methods over time
• related methods are risk (prospective), relative risk meta-analysis, risk difference meta-analysis and
proportions


11. ANOVA, ANCOVA, and MANOVA


11.1. One-way ANOVA
- The One-Way ANOVA procedure produces a one-way analysis of variance for a quantitative dependent variable by a
single factor (independent) variable. Analysis of variance is used to test the hypothesis that several means are equal.
This technique is an extension of the two-sample t test.
- The assumptions of analysis of variance are that treatment effects are additive and experimental errors are
independently random with a normal distribution that has mean zero and constant variance.
- Once you have determined that differences exist among the means, post hoc range tests and pairwise multiple
comparisons can determine which means differ. Range tests identify homogeneous subsets of means that are not
different from each other. Pairwise multiple comparisons test the difference between each pair of means and yield a
matrix where asterisks indicate significantly different group means at an alpha level of 0.05.

o SPSS Menu: Analyze > Compare Means > One-Way ANOVA...


o Example (Clover.sav): The following example studies the effect of bacteria on the nitrogen content of red
clover plants. The treatment factor is bacteria strain, and it has six levels. Five of the six levels consist of five
different Rhizobium trifolii bacteria cultures combined with a composite of five Rhizobium meliloti strains. The
sixth level is a composite of the five Rhizobium trifolii strains with the composite of the Rhizobium meliloti.
Red clover plants are inoculated with the treatments, and nitrogen content is later measured in milligrams.
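o The same analysis can be run from a Syntax window (the Paste button in the dialog generates equivalent commands; variable names here are taken from the output below):

```spss
* One-way ANOVA of nitrogen content by bacteria strain, with
  descriptives, Levene's test, and Tukey post hoc comparisons.
ONEWAY Nitrogen BY Strain
  /STATISTICS DESCRIPTIVES HOMOGENEITY
  /POSTHOC=TUKEY ALPHA(0.05).
```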

Figure 60 One-way ANOVA

o SPSS output:

Descriptives
Nitrogen
95% Confidence Interval for Mean
N Mean Std. Deviation Std. Error Lower Bound Upper Bound Minimum Maximum
3DOK1 5 28.8200 5.80017 2.59392 21.6181 36.0219 19.40 33.00
3DOK13 5 13.2600 1.42759 .63844 11.4874 15.0326 11.60 14.40
3DOK4 5 14.6400 4.11619 1.84082 9.5291 19.7509 9.10 19.40
3DOK5 5 23.9800 3.77717 1.68920 19.2900 28.6700 17.70 27.90
3DOK7 5 19.9200 1.13004 .50537 18.5169 21.3231 18.60 21.00
COMPOS 5 18.7000 1.60156 .71624 16.7114 20.6886 16.90 20.80
Total 30 19.8867 6.24217 1.13966 17.5558 22.2175 9.10 33.00

Test of Homogeneity of Variances


Nitrogen
Levene Statistic df1 df2 Sig.
3.145 5 24 .025


ANOVA
Nitrogen
Sum of Squares df Mean Square F Sig.
Between Groups 847.047 5 169.409 14.371 .000
Within Groups 282.928 24 11.789
Total 1129.975 29

Multiple Comparisons
Dependent Variable: Nitrogen
Tukey HSD
Mean Difference 95% Confidence Interval
(I) Strain (J) Strain (I-J) Std. Error Sig. Lower Bound Upper Bound
3DOK1 3DOK13 15.56000* 2.17151 .000 8.8458 22.2742
3DOK4 14.18000* 2.17151 .000 7.4658 20.8942
3DOK5 4.84000 2.17151 .262 -1.8742 11.5542
3DOK7 8.90000* 2.17151 .005 2.1858 15.6142
COMPOS 10.12000* 2.17151 .001 3.4058 16.8342
3DOK13 3DOK1 -15.56000* 2.17151 .000 -22.2742 -8.8458
3DOK4 -1.38000 2.17151 .987 -8.0942 5.3342
3DOK5 -10.72000* 2.17151 .001 -17.4342 -4.0058
3DOK7 -6.66000 2.17151 .053 -13.3742 .0542
COMPOS -5.44000 2.17151 .162 -12.1542 1.2742
3DOK4 3DOK1 -14.18000* 2.17151 .000 -20.8942 -7.4658
3DOK13 1.38000 2.17151 .987 -5.3342 8.0942
3DOK5 -9.34000* 2.17151 .003 -16.0542 -2.6258
3DOK7 -5.28000 2.17151 .185 -11.9942 1.4342
COMPOS -4.06000 2.17151 .443 -10.7742 2.6542
3DOK5 3DOK1 -4.84000 2.17151 .262 -11.5542 1.8742
3DOK13 10.72000* 2.17151 .001 4.0058 17.4342
3DOK4 9.34000* 2.17151 .003 2.6258 16.0542
3DOK7 4.06000 2.17151 .443 -2.6542 10.7742
COMPOS 5.28000 2.17151 .185 -1.4342 11.9942
3DOK7 3DOK1 -8.90000* 2.17151 .005 -15.6142 -2.1858
3DOK13 6.66000 2.17151 .053 -.0542 13.3742
3DOK4 5.28000 2.17151 .185 -1.4342 11.9942
3DOK5 -4.06000 2.17151 .443 -10.7742 2.6542
COMPOS 1.22000 2.17151 .993 -5.4942 7.9342
COMPOS 3DOK1 -10.12000* 2.17151 .001 -16.8342 -3.4058
3DOK13 5.44000 2.17151 .162 -1.2742 12.1542
3DOK4 4.06000 2.17151 .443 -2.6542 10.7742
3DOK5 -5.28000 2.17151 .185 -11.9942 1.4342
3DOK7 -1.22000 2.17151 .993 -7.9342 5.4942
*. The mean difference is significant at the 0.05 level.


11.2. Two-way ANOVA (With interaction)

o SPSS Menu: Analyze > General Linear Model > Univariate...


o Example (Drug.sav): This example uses data from Kutner (1974, p. 98) to illustrate a two-way analysis of
variance. The original data source is Afifi and Azen (1972, p. 166).
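o A syntax sketch for this analysis (variable names y, drug, and disease as they appear in the output below):

```spss
* Unbalanced two-way ANOVA with the drug-by-disease interaction,
  plus Tukey comparisons on the drug factor.
UNIANOVA y BY drug disease
  /POSTHOC=drug(TUKEY)
  /DESIGN=drug disease drug*disease.
```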

Figure 61 Unbalanced two-way ANOVA with interaction

o SPSS output:

Tests of Between-Subjects Effects


Dependent Variable: y
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 4259.339a 11 387.213 3.506 .001
Intercept 20037.613 1 20037.613 181.414 .000
drug 2997.472 3 999.157 9.046 .000
disease 415.873 2 207.937 1.883 .164
drug * disease 707.266 6 117.878 1.067 .396
Error 5080.817 46 110.453
Total 30013.000 58
Corrected Total 9340.155 57
a. R Squared = .456 (Adjusted R Squared = .326)


Multiple Comparisons
Dependent Variable: y
Tukey HSD
Mean Difference 95% Confidence Interval
(I) drug (J) drug (I-J) Std. Error Sig. Lower Bound Upper Bound
1 2 .53 3.838 .999 -9.70 10.76
  3 17.32* 4.070 .001 6.47 28.17
  4 12.57* 3.777 .009 2.50 22.63
2 1 -.53 3.838 .999 -10.76 9.70
  3 16.78* 4.070 .001 5.93 27.63
  4 12.03* 3.777 .013 1.97 22.10
3 1 -17.32* 4.070 .001 -28.17 -6.47
  2 -16.78* 4.070 .001 -27.63 -5.93
  4 -4.75 4.013 .640 -15.45 5.95
4 1 -12.57* 3.777 .009 -22.63 -2.50
  2 -12.03* 3.777 .013 -22.10 -1.97
  3 4.75 4.013 .640 -5.95 15.45
Based on observed means.
The error term is Mean Square(Error) = 110.453.
*. The mean difference is significant at the .05 level.


11.3. ANCOVA (Analysis of Covariance)


Two general applications exist for ANCOVA:
• Removing error variance in a randomized experiment: participants are assigned to treatment and control
groups in any ANOVA-type design, and ANCOVA is used as the statistical technique to remove irrelevant
variance in the dependent variable (y).
• Equating non-equivalent (intact) groups: a more controversial use of ANCOVA is to correct for initial group
differences on y (i.e., differences that existed prior to treatment assignment) among several intact groups.

o SPSS Menu: Analyze > General Linear Model > Univariate...


o Example (Cholesterol.sav): Cholesterol levels [mg/ml] for 30 women from two US states, Iowa and Nebraska.
Age [years] may be a relevant covariate.
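o A syntax sketch for the ANCOVA (state as the fixed factor and age as the covariate, as in the output below):

```spss
* ANCOVA: compare states on cholesterol, adjusting for age.
UNIANOVA cholesterol BY state WITH age
  /DESIGN=age state.
```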

Figure 62 ANCOVA

o SPSS output:

Tests of Between-Subjects Effects


Dependent Variable: cholesterol
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 54432.754a 2 27216.377 14.965 .000
Intercept 16901.543 1 16901.543 9.293 .005
age 53820.058 1 53820.058 29.593 .000
state 5456.450 1 5456.450 3.000 .095
Error 49103.913 27 1818.663
Total 1473140.000 30
Corrected Total 103536.667 29
a. R Squared = .526 (Adjusted R Squared = .491)

o Exercise (Goat.sav): Experiments were carried out on six commercial goat farms to determine whether the
standard worm drenching program was adequate. Forty goats were used in each experiment. Twenty of these,
chosen completely at random, were drenched according to the standard program, while the remaining twenty
were drenched more frequently. The goats were individually tagged, and weighed at the start and end of the
year-long study. For the first farm in the study the resulting liveweight gains are given along with the initial
liveweights. In each experiment the main interest was in the comparison of the liveweight gains between the
two treatments.


11.4. MANOVA (Multivariate ANOVA)


MANOVA tests for differences among groups on multiple (>1) dependent variables; ANOVA is a special case of MANOVA.
Several analysis options exist:
• MANOVA - This is a good option if there are two or more continuous dependent variables and one categorical predictor
variable.
• Discriminant function analysis - This is a reasonable option and is equivalent to a one-way MANOVA.
• The data could be reshaped into long format and analyzed as a multilevel model.
• Separate univariate ANOVAs - You could analyze these data using separate univariate ANOVAs for each response
variable. The univariate ANOVA will not produce multivariate results utilizing information from all variables
simultaneously. In addition, separate univariate tests are generally less powerful because they do not take into
account the inter-correlation of the dependent variables.

Assumption of MANOVA
• One of the assumptions of MANOVA is that the response variables come from group populations that are multivariate
normal distributed. This means that each of the dependent variables is normally distributed within group, that any
linear combination of the dependent variables is normally distributed, and that all subsets of the variables must be
multivariate normal. With respect to Type I error rate, MANOVA tends to be robust to minor violations of the
multivariate normality assumption.
• The homogeneity of population covariance matrices across groups (commonly assessed with Box's M test) is another
assumption. This implies that the population variances and covariances of all dependent variables must be equal in
all groups formed by the independent variables.
• Small samples can have low power, but if the multivariate normality assumption is met, the MANOVA is generally more
powerful than separate univariate tests.

o SPSS Menu: Analyze > General Linear Model > Multivariate...


o Example (MANOVA_Dietary.sav): A researcher randomly assigns 33 subjects to one of three groups. The first
group receives technical dietary information interactively from an on-line website. Group 2 receives the same
information from a nurse practitioner, while group 3 receives the information from a video tape made by the
same nurse practitioner. The researcher looks at three different ratings of the presentation, difficulty,
usefulness and importance, to determine if there is a difference in the modes of presentation. In particular,
the researcher is interested in whether the interactive website is superior because that is the most cost-
effective way of delivering the information.
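o A syntax sketch for the one-way MANOVA (variable names taken from the output below):

```spss
* One-way MANOVA: three ratings as dependent variables, GROUP as the factor.
GLM useful difficulty importance BY group
  /PRINT=DESCRIPTIVE
  /DESIGN=group.
```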


Figure 63 MANOVA

o SPSS output:

Descriptive Statistics
GROUP Mean Std. Deviation N
USEFUL Treatment 18.1182 3.90380 11
Control1 15.5273 2.07562 11
Control2 15.3455 3.13827 11
Total 16.3303 3.29246 33
DIFFICULTY Treatment 6.1909 1.89971 11
Control1 5.5818 2.43426 11
Control2 5.3727 1.75903 11
Total 5.7152 2.01760 33
IMPORTANCE Treatment 8.6818 4.86309 11
Control1 5.1091 2.53119 11
Control2 5.6364 3.54691 11
Total 6.4758 3.98513 33

Multivariate Testsa
Effect Value F Hypothesis df Error df Sig.
Intercept Pillai's Trace .986 657.857b 3.000 28.000 .000
Wilks' Lambda .014 657.857b 3.000 28.000 .000
Hotelling's Trace 70.485 657.857b 3.000 28.000 .000
Roy's Largest Root 70.485 657.857b 3.000 28.000 .000
GROUP Pillai's Trace .477 3.025 6.000 58.000 .012
Wilks' Lambda .526 3.538b 6.000 56.000 .005
Hotelling's Trace .897 4.038 6.000 54.000 .002
Roy's Largest Root .892 8.623c 3.000 29.000 .000
a. Design: Intercept + GROUP
b. Exact statistic
c. The statistic is an upper bound on F that yields a lower bound on the significance level.


Tests of Between-Subjects Effects


Source Dependent Variable Type III Sum of Squares df Mean Square F Sig.
Corrected Model USEFUL 52.924a 2 26.462 2.701 .083
DIFFICULTY 3.975b 2 1.988 .472 .628
IMPORTANCE 81.830c 2 40.915 2.879 .072
Intercept USEFUL 8800.400 1 8800.400 898.106 .000
DIFFICULTY 1077.878 1 1077.878 256.054 .000
IMPORTANCE 1383.869 1 1383.869 97.371 .000
GROUP USEFUL 52.924 2 26.462 2.701 .083
DIFFICULTY 3.975 2 1.988 .472 .628
IMPORTANCE 81.830 2 40.915 2.879 .072
Error USEFUL 293.965 30 9.799
DIFFICULTY 126.287 30 4.210
IMPORTANCE 426.371 30 14.212
Total USEFUL 9147.290 33
DIFFICULTY 1208.140 33
IMPORTANCE 1892.070 33
Corrected Total USEFUL 346.890 32
DIFFICULTY 130.262 32
IMPORTANCE 508.201 32
a. R Squared = .153 (Adjusted R Squared = .096)
b. R Squared = .031 (Adjusted R Squared = -.034)
c. R Squared = .161 (Adjusted R Squared = .105)

o Exercise (Pottery.sav): This example employs multivariate analysis of variance (MANOVA) to measure
differences in the chemical characteristics of ancient pottery found at four kiln sites in Great Britain. The data
are from Tubb, Parker, and Nickless (1980), as reported in Hand et al. (1994). For each of 26 samples of
pottery, the percentages of oxides of five metals are measured.


12. Nonparametric methods


• A statistical method is called non-parametric if it makes no assumption about the form of the population
distribution and requires no minimum sample size.
• This is in contrast with most parametric methods in elementary statistics, which assume that the data are
quantitative, that the population has a normal distribution, and that the sample size is sufficiently large.
• In general, conclusions drawn from non-parametric methods are not as powerful as parametric ones. However, because
non-parametric methods make fewer assumptions, they are more flexible, more robust, and applicable to
non-quantitative data.

12.1. Wilcoxon rank-sum test (Mann-Whitney U test)


o SPSS Menu: Analyze > Nonparametric Tests > Legacy Dialogs > 2 Independent Samples …
o Corresponding parametric method: two-sample t-test
o Example (Graze.sav): Taken from Huntsberger and Billingsley (1989). This example compares two grazing
methods using 32 cattle. Half of the cattle are allowed to graze continuously while the other half are subjected
to controlled grazing time. The researchers want to know whether these two grazing methods affect weight gain
differently.
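o A syntax sketch for this test (the (1 2) range assumes GrazeType is coded 1 and 2):

```spss
* Mann-Whitney U (Wilcoxon rank-sum) test of weight gain by grazing method.
NPAR TESTS
  /M-W=WeightGain BY GrazeType(1 2).
```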

Figure 64 Wilcoxon rank-sum (or Mann-Whitney U test)

o SPSS output:
Ranks
GrazeType N Mean Rank Sum of Ranks
WeightGain continuous 16 15.19 243.00
controlled 16 17.81 285.00
Total 32

Test Statisticsa
WeightGain
Mann-Whitney U 107.000
Wilcoxon W 243.000
Z -.792
Asymp. Sig. (2-tailed) .429
Exact Sig. [2*(1-tailed Sig.)] .445b
a. Grouping Variable: GrazeType
b. Not corrected for ties.


12.2. Wilcoxon signed-rank test for paired data


o SPSS Menu: Analyze > Nonparametric Tests > Legacy Dialogs > 2 Related Samples …
o Corresponding parametric method: paired t-test
o Example (Pressure.sav): A stimulus is being examined to determine its effect on systolic blood pressure.
Twelve men participate in the study. Each man’s systolic blood pressure is measured both before and after the
stimulus is applied.
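o A syntax sketch for the paired comparison (variable names as in the output below):

```spss
* Wilcoxon signed-rank test on the paired before/after blood pressures.
NPAR TESTS
  /WILCOXON=SBPbefore WITH SBPafter (PAIRED).
```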

Figure 65 Wilcoxon signed-rank test

o SPSS output:

Ranks
N Mean Rank Sum of Ranks
SBPafter - SBPbefore Negative Ranks 3a 8.17 24.50
Positive Ranks 9b 5.94 53.50
Ties 0c
Total 12
a. SBPafter < SBPbefore
b. SBPafter > SBPbefore
c. SBPafter = SBPbefore

Test Statisticsa
SBPafter - SBPbefore
Z -1.143b
Asymp. Sig. (2-tailed) .253
a. Wilcoxon Signed Ranks Test
b. Based on negative ranks.


12.3. Kruskal-Wallis test


o SPSS Menu: Analyze > Nonparametric Tests > Legacy Dialogs > K Independent Samples …
o Corresponding parametric method: One-way ANOVA
o Example (Clover.sav): The following example studies the effect of bacteria on the nitrogen content of red
clover plants. The treatment factor is bacteria strain, and it has six levels. Five of the six levels consist of five
different Rhizobium trifolii bacteria cultures combined with a composite of five Rhizobium meliloti strains. The
sixth level is a composite of the five Rhizobium trifolii strains with the composite of the Rhizobium meliloti.
Red clover plants are inoculated with the treatments, and nitrogen content is later measured in milligrams.
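o A syntax sketch for the test (the (1 6) range assumes the Strain groups are coded 1 through 6):

```spss
* Kruskal-Wallis test of nitrogen content across the six strains.
NPAR TESTS
  /K-W=Nitrogen BY Strain(1 6).
```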

Figure 66 Kruskal-Wallis test

o SPSS output:

Ranks
Strain N Mean Rank
Nitrogen 3DOK1 5 26.00
3DOK13 5 4.60
3DOK4 5 8.00
3DOK5 5 22.20
3DOK7 5 17.60
COMPOS 5 14.60
Total 30

Test Statisticsa,b
Nitrogen
Chi-Square 21.659
df 5
Asymp. Sig. .001
a. Kruskal Wallis Test
b. Grouping Variable: Strain


13. Correlation and Regression analysis

13.1. Correlation analysis


- Linear relation between bivariate variables

o SPSS Menu: Analyze > Correlate > Bivariate...


o Example (Cholesterol.sav): Cholesterol levels [mg/ml] for 30 women from two US states, Iowa and Nebraska.
Age [years] may be a relevant covariate.
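o Syntax sketches for both correlation coefficients shown in the output below:

```spss
* Pearson correlation between cholesterol and age.
CORRELATIONS
  /VARIABLES=cholesterol age
  /PRINT=TWOTAIL SIG.
* Spearman rank correlation between the same variables.
NONPAR CORR
  /VARIABLES=cholesterol age
  /PRINT=SPEARMAN TWOTAIL.
```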

Figure 67 Correlation analysis

o SPSS output:

Correlations
cholesterol age
cholesterol Pearson Correlation 1 .688**
Sig. (2-tailed) .000
N 30 30
age Pearson Correlation .688** 1
Sig. (2-tailed) .000
N 30 30
**. Correlation is significant at the 0.01 level (2-tailed).

Correlations
cholesterol age
Spearman's rho cholesterol Correlation Coefficient 1.000 .749**
Sig. (2-tailed) . .000
N 30 30
age Correlation Coefficient .749** 1.000
Sig. (2-tailed) .000 .
N 30 30
**. Correlation is significant at the 0.01 level (2-tailed).


13.2. Linear Regression model


– Linear regression is one of the most widely used statistical techniques: it is the study of linear (i.e., straight-line)
relationships between variables, usually under an assumption of normally distributed errors.

o SPSS Menu: Analyze > Regression > Linear...


o Example (Cholesterol.sav): Cholesterol levels [mg/ml] for 30 women from two US states, Iowa and Nebraska.
Age [years] may be a relevant covariate.
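o A syntax sketch for the model (State1 is the state indicator variable named in the output below):

```spss
* Linear regression of cholesterol on age and the state indicator.
REGRESSION
  /STATISTICS COEFF R ANOVA
  /DEPENDENT cholesterol
  /METHOD=ENTER age State1.
```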

Figure 68 Linear regression model

o SPSS output:

Model Summaryb
Model R R Square Adjusted R Square Std. Error of the Estimate
1 .725a .526 .491 42.646
a. Predictors: (Constant), State1, age
b. Dependent Variable: cholesterol

ANOVAa
Model Sum of Squares df Mean Square F Sig.
1 Regression 54432.754 2 27216.377 14.965 .000b
Residual 49103.913 27 1818.663
Total 103536.667 29
a. Dependent Variable: cholesterol
b. Predictors: (Constant), State1, age

Coefficientsa
Unstandardized Coefficients Standardized Coefficients
Model B Std. Error Beta t Sig.
1 (Constant) 93.141 24.799 3.756 .001
age 2.698 .496 .738 5.440 .000
State1 -28.651 16.541 -.235 -1.732 .095
a. Dependent Variable: cholesterol


o Exercise (BrainSize.sav): Are the size and weight of your brain indicators of your mental capacity? In this
study by Willerman et al. (1991) the researchers use Magnetic Resonance Imaging (MRI) to determine the
brain size of the subjects. The researchers take into account gender and body size to draw conclusions about
the connection between brain size and intelligence. Willerman et al. (1991) conducted their study at a large
southwestern university. They selected a sample of 40 right-handed Anglo introductory psychology students
who had indicated no history of alcoholism, unconsciousness, brain damage, epilepsy, or heart disease. These
subjects were drawn from a larger pool of introductory psychology students with total Scholastic Aptitude Test
Scores higher than 1350 or lower than 940 who had agreed to satisfy a course requirement by allowing the
administration of four subtests (Vocabulary, Similarities, Block Design, and Picture Completion) of the
Wechsler (1981) Adult Intelligence Scale-Revised. With prior approval of the University's research review
board, students selected for MRI were required to obtain prorated full-scale IQs of greater than 130 or less
than 103, and were equally divided by sex and IQ classification. The MRI Scans were performed at the same
facility for all 40 subjects. The scans consisted of 18 horizontal MR images. The computer counted all pixels
with non-zero gray scale in each of the 18 images and the total count served as an index for brain size.

Variable Information:
Gender: Male or Female
FSIQ: Full Scale IQ scores based on the four Wechsler (1981) subtests
VIQ: Verbal IQ scores based on the four Wechsler (1981) subtests
PIQ: Performance IQ scores based on the four Wechsler (1981) subtests
Weight: body weight in pounds
Height: height in inches
MRI_Count: total pixel Count from the 18 MRI scans


14. Logistic regression analysis


Logistic regression, also called a logit model, is used to model dichotomous outcome variables. In the logit model the log odds of
the outcome is modeled as a linear combination of the predictor variables.

o SPSS Menu: Analyze > Regression > Binary Logistic...


o Example (Logistic.sav): A researcher is interested in how variables, such as GRE (Graduate Record Exam
scores), GPA (grade point average) and prestige of the undergraduate institution, affect admission into
graduate school. The outcome variable, admit/don't admit, is binary. This data set has a binary response
(outcome, dependent) variable called admit, which is equal to 1 if the individual was admitted to graduate
school, and 0 otherwise. There are three predictor variables: gre, gpa, and rank. We will treat the variables gre
and gpa as continuous. The variable rank takes on the values 1 through 4. Institutions with a rank of 1 have
the highest prestige, while those with a rank of 4 have the lowest. We start out by looking at some descriptive
statistics.
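o A syntax sketch for the model (rank is declared categorical; SPSS's default indicator coding uses the last
category as the reference, which matches the RANK(1)–RANK(3) rows in the output below):

```spss
* Binary logistic regression of admission on GRE, GPA, and school rank,
  with 95% confidence intervals for the odds ratios Exp(B).
LOGISTIC REGRESSION VARIABLES admit
  /METHOD=ENTER gre gpa rank
  /CATEGORICAL=rank
  /PRINT=CI(95).
```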

Figure 69 Logistic regression model

o SPSS output:

Model Summary
Cox & Snell R Nagelkerke R
Step -2 Log likelihood Square Square
1 458.517a .098 .138
a. Estimation terminated at iteration number 4 because parameter
estimates changed by less than .001.


Classification Tablea
Predicted
ADMIT Percentage
Observed Not admitted Admitted Correct
Step 1 ADMIT Not admitted 254 19 93.0
Admitted 97 30 23.6
Overall Percentage 71.0
a. The cut value is .500

Variables in the Equation


95% C.I.for EXP(B)
B S.E. Wald df Sig. Exp(B) Lower Upper
Step 1a GRE .002 .001 4.284 1 .038 1.002 1.000 1.004
GPA .804 .332 5.872 1 .015 2.235 1.166 4.282
RANK 20.895 3 .000
RANK(1) 1.551 .418 13.787 1 .000 4.718 2.080 10.702
RANK(2) .876 .367 5.706 1 .017 2.401 1.170 4.927
RANK(3) .211 .393 .289 1 .591 1.235 .572 2.668
Constant -5.541 1.138 23.709 1 .000 .004
a. Variable(s) entered on step 1: GRE, GPA, RANK.

o Interpretation: From the output, the GRE, GPA, and rank variables are all associated with the response
variable (admitted or not). In logistic regression, Exp(B) is useful for interpretation. For the GPA
coefficient, the odds ratio is computed by raising e to the power of the logistic coefficient:
OR = e^b = e^0.804 = 2.235. This means that a one-unit increase in GPA multiplies the odds of admission by
2.235, holding the other predictors constant.
