
WCHRI, University of Alberta

3/5/2014

1 SPSS Workshop 2014 Tutorial

Table of Contents

1. Outline: SPSS Workshop 2014
2. What is SPSS?
3. Introducing the SPSS interface
   3.1. SPSS Data Editor: Data View
   3.2. SPSS Data Editor: Variable View
   3.3. SPSS Output window
   3.4. SPSS Syntax window
4. Getting familiar with SPSS Menu and Icon
5. Data Import/Export
   5.1. Create Data File (Entering Data)
   5.2. Opening Data File (Import data)
      5.2.1. Opening SPSS data: File > Open > Data… (Select SPSS Statistics (*.sav) as "Files of type:")
      5.2.2. Opening Text File: Fixed width
      5.2.3. Opening Text File: (Tab) Delimited
      5.2.4. Opening EXCEL (or CSV) File
      5.2.5. Opening SAS data file
   5.3. Export Data File (Save as different type of data)
   5.4. Saving Data File with selected variables
6. Manipulating data 1 (SPSS Menu: Data)
   6.1. Data Menu: Sort Cases…
   6.2. Data Menu: Identify Duplicate Cases…
   6.3. Data Menu: Merge Files > Add Cases…
   6.4. Data Menu: Merge Files > Add Variables…
   6.5. Data Menu: Aggregate…
   6.6. Data Menu: Restructure…
   6.7. Data Menu: Split into Files
   6.8. Data Menu: Split Files…
   6.9. Data Menu: Select Cases…
   6.10. Data Menu: Weight Cases…
7. Manipulating data 2 (SPSS Menu: Transform)
   7.1. Transform Menu: Compute Variable…
   7.2. Transform Menu: Recode into Same Variables…
   7.3. Transform Menu: Recode into Different Variables…
   7.4. Transform Menu: Automatic Recode…
   7.5. Transform Menu: Create Dummy Variables
   7.6. Transform Menu: Visual Binning…
   7.7. Transform Menu: Rank Cases…
   7.8. Transform Menu: Date and Time Wizard…
   7.9. Transform Menu: Replace Missing Values…
8. Descriptive statistics
   8.1. Descriptive statistics for continuous data (Interval, Ratio)
   8.2. Descriptive statistics for categorical data (Nominal, Ordinal)
   8.3. Generating graphs (or charts) for continuous data (Interval, Ratio)
   8.4. Generating graphs (or charts) for categorical data (Nominal, Ordinal)
   8.5. Using Chart Builder
9. Compare Means (T-test)
   9.1. Independent samples t-test with two groups
   9.2. Paired samples t-test
10. Compare proportions (Analysis of contingency tables) and association
   10.1. Pearson’s Chi-Square test
   10.2. Fisher’s exact test
   10.3. Cochran-Mantel-Haenszel (CMH) Statistics
   10.4. McNemar’s test for matched-pairs data
   10.5. Measure of Agreement (Cohen’s Kappa)
   10.6. Studies in medical science (Review)
11. ANOVA, ANCOVA, and MANOVA
   11.1. One-way ANOVA
   11.2. Two-way ANOVA (with interaction)
   11.3. ANCOVA (Analysis of Covariance)
   11.4. MANOVA (Multivariate ANOVA)
12. Nonparametric methods
   12.1. Wilcoxon rank-sum test (Mann-Whitney U test)
   12.2. Wilcoxon signed-rank test for paired data
   12.3. Kruskal-Wallis test
13. Correlation and Regression analysis
   13.1. Correlation analysis
   13.2. Linear Regression model
14. Logistic regression analysis

1. Outline: SPSS Workshop 2014

• Learning about SPSS¹

• Opening and reviewing layouts of SPSS

• Becoming familiar with menus and icons

• Manipulating data files

• Calculating descriptive statistics

• Comparing means and proportions

• Calculating association

• Creating graphs

• Working with SPSS syntax

• ANOVA, ANCOVA, MANOVA

• Repeated measure ANOVA

• Correlation analysis & (ordinary) linear regression

• Logistic regression

• Nonparametric method

¹ Note that this tutorial was created using IBM SPSS Statistics Version 22.


2. What is SPSS?

• A Windows-based program that can be used to perform data entry and analysis and to create tables and graphs.

• Capable of handling large amounts of data; it can perform all of the analyses covered in this tutorial and much more.

• Commonly used in the social sciences and in the business world.

• SPSS is updated often.

o SPSS data file: *.sav

o SPSS output file: *.spv (*.spo in version 15 and earlier)

o SPSS syntax file: *.sps

3. Introducing the SPSS interface

3.1. SPSS Data Editor: Data View

Many of the features of Data View are similar to the features that are found in spreadsheet applications. There are,

however, several important distinctions:

• Rows are cases. Each row represents a case or an observation. For example, each individual respondent to a

questionnaire is a case.

• Columns are variables. Each column represents a variable or characteristic that is being measured. For example, each

item on a questionnaire is a variable.

• Cells contain values. Each cell contains a single value of a variable for a case. The cell is where the case and the variable

intersect. Cells contain only data values. Unlike spreadsheet programs, cells in the Data Editor cannot contain formulas.

• The data file is rectangular. The dimensions of the data file are determined by the number of cases and variables. You

can enter data in any cell. If you enter data in a cell outside the boundaries of the defined data file, the data rectangle

is extended to include any rows and/or columns between that cell and the file boundaries. There are no "empty" cells

within the boundaries of the data file. For numeric variables, blank cells are converted to the system-missing value. For

string variables, a blank is considered a valid value.

3.2. SPSS Data Editor: Variable View

Variable View contains descriptions of the attributes of each variable in the data file. In Variable View:

• Rows are variables.

• Columns are variable attributes.


You can add or delete variables and modify attributes of variables, including the following attributes:

• Variable name

• Data type

• Number of digits or characters

• Number of decimal places

• Descriptive variable and value labels

• User-defined missing values

• Column width

• Measurement level

All of these attributes are saved when you save the data file.

In addition to defining variable properties in Variable View, there are two other methods for defining variable properties:

• The Copy Data Properties Wizard provides the ability to use an external IBM® SPSS® Statistics data file or another

dataset that is available in the current session as a template for defining file and variable properties in the active

dataset. You can also use variables in the active dataset as templates for other variables in the active dataset. Copy

Data Properties is available on the Data menu in the Data Editor window. See the topic Copying Data Properties for

more information.

• Define Variable Properties (also available on the Data menu in the Data Editor window) scans your data and lists all

unique data values for any selected variables, identifies unlabeled values, and provides an auto-label feature. This

method is particularly useful for categorical variables that use numeric codes to represent categories--for example, 0 =

Male, 1 = Female. See the topic Defining Variable Properties for more information.

4. Getting familiar with SPSS Menu and Icon

SPSS MENU

• File includes all of the options you typically use in other programs, such as open, save, exit. Notice, that you can open

or create new files of multiple types as illustrated to the right.

• Edit includes the typical cut, copy, and paste commands, and allows you to specify various options for displaying data

and output.

o Click on Options to open the Options dialog box. You can use this to format the data, output, charts, etc. The choices can be overwhelming, and you can simply take the default options for now.


• View allows you to select which toolbars you want to show, select font size, add or remove the gridlines that separate

each piece of data, and to select whether or not to display your raw data or the data labels.

• Data allows you to select several options ranging from displaying data that is sorted by a specific variable to selecting

certain cases for subsequent analyses.

• Transform includes several options to change current variables. For example, you can change continuous variables to

categorical variables, change scores into rank scores, add a constant to variables, etc.

• Analyze includes all of the commands to carry out statistical analyses and to calculate descriptive statistics. Much of this tutorial focuses on commands located in this menu.

• Graphs includes the commands to create various types of graphs including box plots, histograms, line graphs, and bar

charts.

• Utilities allows you to list file information: all variables, their labels, values, locations in the data file, and types.

• Add-ons are programs that can be added to the base SPSS package. You probably do not have access to any of those.

• Window can be used to select which window you want to view (i.e., Data Editor, Output Viewer, or Syntax). Since we

have a data file and an output file open, let’s try this.

o Select Window/Data Editor. Then select Window/SPSS Viewer.

• Help has many useful options including a link to the SPSS homepage, a statistics coach, and a syntax guide. Using topics,

you can use the index option to type in any key word and get a list of options, or you can view the categories and

subcategories available under contents. This is an excellent tool and can be used to troubleshoot most problems.

SPSS ICON

• The Icons directly under the Menu bar provide shortcuts to many common commands that are available in specific

menus. Take a moment to review these as well.

STATUS Bar

The status bar at the bottom of each IBM® SPSS® Statistics window provides the following information:

• Command status. For each procedure or command that you run, a case counter indicates the number of cases

processed so far. For statistical procedures that require iterative processing, the number of iterations is displayed.

• Filter status. If you have selected a random sample or a subset of cases for analysis, the message Filter on indicates

that some type of case filtering is currently in effect and not all cases in the data file are included in the analysis.

• Weight status. The message Weight on indicates that a weight variable is being used to weight cases for analysis.

• Split File status. The message Split File on indicates that the data file has been split into separate groups for analysis,

based on the values of one or more grouping variables.


5. Data Import/Export

5.1. Create Data File (Entering Data)

Variable names

The following rules apply to variable names:

• Variable names can be up to 64 bytes long, and the first character must be a letter or one of the characters @, #, or $.

Subsequent characters can be any combination of letters, numbers, non-punctuation characters, and a period (.). In

code page mode, sixty-four bytes typically means 64 characters in single-byte languages (for example, English, French,

German, Spanish, Italian, Hebrew, Russian, Greek, Arabic, and Thai) and 32 characters in double-byte languages (for

example, Japanese, Chinese, and Korean). Many string characters that only take one byte in code page mode take two

or more bytes in Unicode mode. For example, é is one byte in code page format but is two bytes in Unicode format; so

résumé is six bytes in a code page file and eight bytes in Unicode mode.

Note: Letters include any non-punctuation characters used in writing ordinary words in the languages supported in the

platform's character set.

• A # character in the first position of a variable name defines a scratch variable. You can only create scratch variables

with command syntax. You cannot specify a # as the first character of a variable in dialog boxes that create new

variables.

• A $ sign in the first position indicates that the variable is a system variable. The $ sign is not allowed as the initial

character of a user-defined variable.

• The period, the underscore, and the characters $, #, and @ can be used within variable names. For example, A._$@#1

is a valid variable name.

• Variable names ending with a period should be avoided, since the period may be interpreted as a command terminator.

You can only create variables that end with a period in command syntax. You cannot create variables that end with a

period in dialog boxes that create new variables.

• Variable names ending in underscores should be avoided, since such names may conflict with names of variables

automatically created by commands and procedures.

• Reserved keywords cannot be used as variable names. Reserved keywords are ALL, AND, BY, EQ, GE, GT, LE, LT, NE,

NOT, OR, TO, and WITH.

• Variable names can be defined with any mixture of uppercase and lowercase characters, and case is preserved for

display purposes.

• When long variable names need to wrap onto multiple lines in output, lines are broken at underscores, periods, and

points where content changes from lower case to upper case.

Variable type

Variable Type specifies the data type for each variable. By default, all new variables are assumed to be numeric. You can use

Variable Type to change the data type. The contents of the Variable Type dialog box depend on the selected data type. For

some data types, there are text boxes for width and number of decimals; for other data types, you can simply select a

format from a scrollable list of examples. The available data types are as follows:

• Numeric. A variable whose values are numbers. Values are displayed in standard numeric format. The Data Editor

accepts numeric values in standard format or in scientific notation.


• Comma. A numeric variable whose values are displayed with commas delimiting every three places and displayed with

the period as a decimal delimiter. The Data Editor accepts numeric values for comma variables with or without commas

or in scientific notation. Values cannot contain commas to the right of the decimal indicator.

• Dot. A numeric variable whose values are displayed with periods delimiting every three places and with the comma as

a decimal delimiter. The Data Editor accepts numeric values for dot variables with or without periods or in scientific

notation. Values cannot contain periods to the right of the decimal indicator.

• Scientific notation. A numeric variable whose values are displayed with an embedded E and a signed power-of-10

exponent. The Data Editor accepts numeric values for such variables with or without an exponent. The exponent can be

preceded by E or D with an optional sign or by the sign alone--for example, 123, 1.23E2, 1.23D2, 1.23E+2, and 1.23+2.

• Date. A numeric variable whose values are displayed in one of several calendar-date or clock-time formats. Select a

format from the list. You can enter dates with slashes, hyphens, periods, commas, or blank spaces as delimiters. The

century range for two-digit year values is determined by your Options settings (from the Edit menu, choose Options,

and then click the Data tab).

• Dollar. A numeric variable displayed with a leading dollar sign ($), commas delimiting every three places, and a period

as the decimal delimiter. You can enter data values with or without the leading dollar sign.

• Custom currency. A numeric variable whose values are displayed in one of the custom currency formats that you have

defined on the Currency tab of the Options dialog box. Defined custom currency characters cannot be used in data

entry but are displayed in the Data Editor.

• String. A variable whose values are not numeric and therefore are not used in calculations. The values can contain any

characters up to the defined length. Uppercase and lowercase letters are considered distinct. This type is also known as

an alphanumeric variable.

• Restricted numeric. A variable whose values are restricted to non-negative integers. Values are displayed with leading

zeros padded to the maximum width of the variable. Values can be entered in scientific notation.

Measure of Variables

• A nominal variable has two or more categories, but there is no intrinsic ordering to the categories.

  e.g., gender, ethnicity, etc.

• An ordinal variable is similar to a nominal variable but has a clear ordering of the categories; the spacing between the values may not be equal.

  e.g., socio-economic status, severity of disease, etc.

• An interval variable is similar to an ordinal variable, but the intervals between values are equally spaced.

  e.g., height, weight, age, etc.

Missing values

• If you do not enter any data in a field, it will be considered as missing and SPSS will enter a period for you.

• Or you can define specific value as missing value


PatientID Gender Age Weight Height Ethnicity

1 1 18 175 155 A

2 2 31 156 150 W

3 1 12 141 136 B

4 9 31 160 177 O

• For Gender,

o 1=”Male”

o 2=”Female”

o 9=”User defined Missing value”

• For Ethnicity,

o A=”Aboriginal”

o W=”White”

o B=”Black”

o O=”Others”

• Entering data in SPSS (Variable name, define value labels, and define missing value)
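The same setup can also be done in the Syntax window. Below is a minimal sketch, not the tutorial's own syntax; the variable formats are assumptions, while the data, value labels, and missing value follow the sample table above.

```spss
* Sketch: enter the sample data, define value labels, and a user-defined missing value.
DATA LIST LIST / PatientID (F2.0) Gender (F1.0) Age (F3.0)
    Weight (F3.0) Height (F3.0) Ethnicity (A1).
BEGIN DATA
1 1 18 175 155 A
2 2 31 156 150 W
3 1 12 141 136 B
4 9 31 160 177 O
END DATA.
VALUE LABELS Gender 1 'Male' 2 'Female'
  /Ethnicity 'A' 'Aboriginal' 'W' 'White' 'B' 'Black' 'O' 'Others'.
MISSING VALUES Gender (9).
EXECUTE.
```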

5.2. Opening Data File (Import data)

Data files come in a wide variety of formats, and this software is designed to handle many of them, including:

• Spreadsheets created with Excel and Lotus

• Database tables from many database sources, including Oracle, SQLServer, Access, dBASE, and others

• Tab-delimited and other types of simple text files

• Data files in IBM® SPSS® Statistics format created on other operating systems

• SYSTAT data files. SYSTAT SYZ files are not supported.

• SAS data files

• Stata data files

• IBM Cognos Business Intelligence data packages and list reports

5.2.1. Opening SPSS data: File > Open>Data… (Select SPSS statistics (*.sav) as File of type :)

5.2.2. Opening Text File: Fixed width

o Raw data

o Open in SPSS

o File > Read Text Data…

o File > Open>Data…
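Fixed-width text can also be read with command syntax. A hedged sketch: the file path and column positions below are hypothetical and must match your raw data layout.

```spss
* Sketch: read fixed-width text; path and column ranges are hypothetical.
DATA LIST FILE='C:\data\patients.txt' FIXED
  / PatientID 1-2 Gender 3 Age 4-5 Weight 6-8 Height 9-11 Ethnicity 12 (A).
EXECUTE.
```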

5.2.3. Opening Text File: (Tab) Delimited

o Raw data

o Open in SPSS

o File > Read Text Data…

o File > Open>Data…
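The equivalent syntax for delimited text uses GET DATA. A sketch, assuming a hypothetical path and variable list; the file is tab-delimited with variable names in the first line.

```spss
* Sketch: read tab-delimited text; path and variables are hypothetical.
GET DATA /TYPE=TXT
  /FILE='C:\data\patients.txt'
  /ARRANGEMENT=DELIMITED
  /DELIMITERS='\t'
  /FIRSTCASE=2
  /VARIABLES=PatientID F2.0 Gender F1.0 Age F3.0 Weight F3.0 Height F3.0 Ethnicity A1.
EXECUTE.
```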

5.2.4. Opening EXCEL (or CSV) File

o Raw data

o Open in SPSS

o File > Open>Data…

Select EXCEL as “Files of type:” to open EXCEL file

Select TEXT as “Files of type:” to open CSV file

o Or simply drag EXCEL (or CSV) file to SPSS program
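In syntax, an Excel workbook can be read with GET DATA. A minimal sketch; the path and sheet name are hypothetical.

```spss
* Sketch: read an Excel workbook; path and sheet name are hypothetical.
GET DATA /TYPE=XLSX
  /FILE='C:\data\patients.xlsx'
  /SHEET=NAME 'Sheet1'
  /READNAMES=ON.
EXECUTE.
```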

5.2.5. Opening SAS data file

o Raw data

o Open in SPSS

o File > Open>Data…

Select SAS as “Files of type:” to open SAS data file

o Or simply drag SAS data file to SPSS program
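The syntax equivalent is GET SAS. A sketch with a hypothetical file path:

```spss
* Sketch: read a SAS data file; the path is hypothetical.
GET SAS DATA='C:\data\patients.sas7bdat'.
EXECUTE.
```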

5.3. Export Data File (Save as different type of data)

o Data export in SPSS

o File >Save as…

Select “Save as type” to export data

5.4. Saving Data File with selected variables

o File > Save as… (click “Variables…” in the Save Data As window)
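Both exporting and saving a subset of variables can be done in syntax. A sketch: the paths and variable list are hypothetical.

```spss
* Sketch: save only selected variables, then export the data to Excel.
SAVE OUTFILE='C:\data\subset.sav' /KEEP=PatientID Gender Age.
SAVE TRANSLATE OUTFILE='C:\data\patients.xlsx'
  /TYPE=XLS /VERSION=12 /FIELDNAMES /REPLACE.
```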

6. Manipulating data 1 (SPSS Menu: Data)

Even after you have cleaned the final dataset, you will need to manipulate data during the analysis. For instance, if you have several datasets from several different sources, you might need to merge or cut data. And no matter how carefully you planned your data design, you will probably want to work with some variables in different forms. If you collected income or age data, for example, you might want to group the continuous variables into categories. Or you might want to create a variable that combines various conditions, say, all minority managers by gender.

Sections 6 and 7 deal with the Data and Transform menus in SPSS.

6.2. Data Menu: Identify Duplicate Cases…

(Sample data: Example_Identify_Duplicated.sav)

o SPSS output

                          Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Duplicate Case            1       3.2             3.2                  3.2
        Primary Case             30      96.8            96.8                100.0
        Total                    31     100.0           100.0
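The dialog pastes syntax along these lines; a hedged sketch, assuming PatientID is the variable that defines matching cases and that the last case in each group is flagged as primary.

```spss
* Sketch: flag the last case in each PatientID group as the primary case.
SORT CASES BY PatientID.
MATCH FILES /FILE=* /BY PatientID /LAST=PrimaryLast.
VALUE LABELS PrimaryLast 0 'Duplicate Case' 1 'Primary Case'.
FREQUENCIES VARIABLES=PrimaryLast.
EXECUTE.
```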

6.3. Data Menu: Merge Files > Add Cases…

(Sample data: Example_MergeByCase_01.sav, Example_MergeByCase_02.sav)

o Both SPSS datasets above have the same variables with the same names, but different cases (one contains data for females, the other for males).
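The merge by cases can be sketched in syntax as follows; ADD FILES appends the cases of the named file to the active dataset.

```spss
* Sketch: append the cases of the second file to the first.
GET FILE='Example_MergeByCase_01.sav'.
ADD FILES /FILE=* /FILE='Example_MergeByCase_02.sav'.
EXECUTE.
```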

6.4. Data Menu: Merge Files > Add Variables…

(Sample data: Example_MergeByVariables_01.sav, Example_MergeByVariable_02.sav)

o Note that both SPSS datasets above must have a key variable (unique identifier).

o Note that both SPSS datasets above must be sorted by the key variable before merging the files.
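A syntax sketch of the merge by variables; the key variable name PatientID is an assumption, and both files must already be sorted by the key.

```spss
* Sketch: merge two files on an assumed key variable, PatientID.
GET FILE='Example_MergeByVariables_01.sav'.
SORT CASES BY PatientID.
MATCH FILES /FILE=* /FILE='Example_MergeByVariable_02.sav' /BY PatientID.
EXECUTE.
```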

6.5. Data Menu: Aggregate…

(Sample data: DataExcel.sav)
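Aggregation can be sketched in syntax as below; the break and summary variables (Gender, Age) are assumptions based on the sample dataset.

```spss
* Sketch: add the group mean of Age as a new column, broken by Gender.
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=Gender
  /Age_mean=MEAN(Age).
```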

6.6. Data Menu: Restructure…

(Sample data: Example1_RepeatedMeasureANOVA.sav)

Wide format (one row per subject):

ID   Group   Time1   Time2   Time3   Time4
 1       1       3       4       7       3
 2       1       6       8      12       9
 3       1       7      13      11      11
 4       1       0       3       6       6
 5       2       5       6      11       7
 6       2      10      12      18      15
 7       2      10      15      15      14
 8       2       5       7      11       9

Long format (one row per measurement):

ID   Group   Score   Time
 1       1       3      1
 1       1       4      2
 1       1       7      3
 1       1       3      4
 :       :       :      :
 8       2       5      1
 8       2       7      2
 8       2      11      3
 8       2       9      4

o Using the “Restructure” menu in SPSS (Steps 1–7), we can convert wide-format data into long format (or vice versa).
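The restructuring steps above can be sketched in syntax; the variable names (Time1-Time4, Score, Time) are assumptions matching the tables above.

```spss
* Sketch: wide to long with assumed variable names.
VARSTOCASES
  /MAKE Score FROM Time1 Time2 Time3 Time4
  /INDEX=Time(4)
  /KEEP=ID Group.
* The reverse direction, long to wide:
* CASESTOVARS /ID=ID Group /INDEX=Time.
```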

6.7. Data Menu: Split into Files

(Sample data: DataExcel.sav)

6.8. Data Menu: Split Files…

(Sample data: DataExcel.sav)

o If you run the above, you will see “Split by Gender” in the status bar.

o SPSS output before splitting files by Gender.

Descriptive Statistics

                       N   Minimum   Maximum     Mean   Std. Deviation
Age                   30        12        31    21.60            6.360
Weight                30       100       200   147.13           34.309
Height                30       121       188   144.00           17.388
Valid N (listwise)    30

o SPSS output after splitting files by Gender:

Descriptive Statistics

Gender                         N   Minimum   Maximum     Mean   Std. Deviation
F       Age                   18        12        31    20.06            5.955
        Weight                18       100       199   142.61           36.934
        Height                18       121       188   143.94           18.479
        Valid N (listwise)    18
M       Age                   12        12        31    23.92            6.487
        Weight                12       111       200   153.92           30.189
        Height                12       122       177   144.08           16.412
        Valid N (listwise)    12

o If you want to analyze all cases again, select “Analyze all cases, do not create groups” in the “Split Files…” menu (see Figure 30).
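The split-file analysis above can be sketched in syntax; Gender as the grouping variable follows the example output.

```spss
* Sketch: split output by Gender, run descriptives, then turn splitting off.
SORT CASES BY Gender.
SPLIT FILE SEPARATE BY Gender.
DESCRIPTIVES VARIABLES=Age Weight Height.
SPLIT FILE OFF.
```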

6.9. Data Menu: Select Cases…

(Sample data: DataExcel.sav)

o In Figure 32, the female data were selected, so all of the male data will be excluded from the analysis.

o If you want to use all cases again, select “All cases” in the “Select Cases…” menu in Figure 31.
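Selecting cases pastes filter syntax along these lines; a sketch assuming Gender is a string variable coded 'F'/'M' as in the example output.

```spss
* Sketch: keep only female cases via a filter variable.
USE ALL.
COMPUTE filter_$ = (Gender = 'F').
FILTER BY filter_$.
EXECUTE.
* To use all cases again:
FILTER OFF.
USE ALL.
```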

6.10. Data Menu: Weight Cases…

Suppose we have the cross (or contingency) table below showing the association between X and Y.

                    Y
Variable        1      2   Total
X         1    15     20      35
          2    25     35      60
Total          40     55      95

o SPSS output before weighting cases:

x * y Crosstabulation
Count
                 y
             1       2   Total
x      1     1       1       2
       2     1       1       2
Total        2       2       4

o SPSS output after weighting cases:

x * y Crosstabulation
Count
                 y
             1       2   Total
x      1    15      20      35
       2    25      35      60
Total       40      55      95
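The weighted analysis above can be reproduced in syntax; a sketch that enters the table as one row per cell with a count variable, then weights by it.

```spss
* Sketch: enter the table as x, y, and a cell count, then weight by the count.
DATA LIST LIST / x y count.
BEGIN DATA
1 1 15
1 2 20
2 1 25
2 2 35
END DATA.
WEIGHT BY count.
CROSSTABS /TABLES=x BY y /CELLS=COUNT.
```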


7.1. Transform Menu: Compute Variable…

(Sample data: DataExcel.sav)

7.2. Transform Menu: Recode into Same Variables…

(Sample data: DataExcel.sav)

7.3. Transform Menu: Recode into Different Variables…

(Sample data: DataExcel.sav)

o From the example dataset, we want to generate a categorized age variable (<20 years, 20-29 years, >=30 years).
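The recode into a different variable can be sketched in syntax; the new variable name AgeCat is an assumption.

```spss
* Sketch: categorize Age into a new variable (AgeCat is an assumed name).
RECODE Age (LO THRU 19=1) (20 THRU 29=2) (30 THRU HI=3) INTO AgeCat.
VALUE LABELS AgeCat 1 '<20 years' 2 '20-29 years' 3 '>=30 years'.
EXECUTE.
```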

7.4. Transform Menu: Automatic Recode…

(Sample data: DataExcel.sav)

7.5. Transform Menu: Create Dummy Variables

(Sample data: DataExcel.sav)

7.6. Transform Menu: Visual Binning…

(Sample data: DataExcel.sav)

o From the example dataset, we want to generate the same categorized age variable (<20 years, 20-29 years, >=30 years) as in Section 7.4.


o Make Cutpoints…

7.7. Transform Menu: Rank Cases…

(Sample data: DataExcel.sav)

7.8. Transform Menu: Date and Time Wizard…

(Sample data: DataExcel.sav)

o In the example dataset, the “Date” variable is a string variable. Let’s generate a date-type variable from the string (text) values.
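The conversion can also be sketched in syntax; this assumes the strings look like mm/dd/yyyy (the ADATE10 format), and DateNew is an assumed name.

```spss
* Sketch: convert a string date to a date-format variable.
* ADATE10 assumes strings of the form mm/dd/yyyy.
COMPUTE DateNew = NUMBER(Date, ADATE10).
FORMATS DateNew (ADATE10).
EXECUTE.
```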


(Sample data: DataExcel.sav)


8. Descriptive statistics

• With the dataset specified and labeled, it is ready for analysis.

• The first step before conducting the analysis is to present descriptive statistics for each of the variables in the study.

• The descriptive statistics presented here are frequency distributions, measures of central tendency, and comparisons of means across different groups:

- Central tendency (Mean, Median, Mode, etc.)

- Dispersion (Variance, Standard deviation, Range, IQR, etc.)

- Distribution (Skewness, Kurtosis, etc.)

o Analyze > Descriptive Statistics > Descriptives…

o Analyze > Descriptive Statistics > Explore…

o Analyze > Compare Means > Means…

o Analyze > Descriptive Statistics > Frequencies…
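Equivalent syntax for these menu paths looks roughly like this (variable names from DataExcel.sav; agecat stands for the categorized age variable created earlier):

```spss
DESCRIPTIVES VARIABLES=age weight height
  /STATISTICS=MEAN STDDEV MIN MAX.
EXAMINE VARIABLES=height
  /PLOT BOXPLOT STEMLEAF NPPLOT.
MEANS TABLES=weight BY agecat BY gender
  /CELLS=COUNT MEAN STDDEV MEDIAN MIN MAX.
FREQUENCIES VARIABLES=gender ethnicity.
```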


(Example dataset: DataExcel.sav)

o SPSS output:

Descriptive Statistics

N Minimum Mean Std. Deviation Variance Skewness Kurtosis

Statistic Statistic Statistic Statistic Statistic Statistic Std. Error Statistic Std. Error

Age 30 12 21.60 6.360 40.455 .067 .427 -1.380 .833

Weight 30 100 147.13 34.309 1177.085 .087 .427 -1.414 .833

Height 30 121 144.00 17.388 302.345 .872 .427 .358 .833

Valid N (listwise) 30

Descriptive Statistics

N Minimum Mean Std. Deviation Variance Skewness Kurtosis

Gender Statistic Statistic Statistic Statistic Statistic Statistic Std. Error Statistic Std. Error

F Age 18 12 20.06 5.955 35.467 .527 .536 -.805 1.038

Weight 18 100 142.61 36.934 1364.134 .314 .536 -1.596 1.038

Height 18 121 143.94 18.479 341.467 1.119 .536 .902 1.038

Valid N (listwise) 18

M Age 12 12 23.92 6.487 42.083 -.688 .637 -.852 1.232

Weight 12 111 153.92 30.189 911.356 -.141 .637 -.674 1.232

Height 12 122 144.08 16.412 269.356 .435 .637 -.338 1.232

Valid N (listwise) 12


(Example dataset: DataExcel.sav)

o SPSS output:

(Note that if you do the analysis by a group variable, add the group variable to the “Factor List:” box in the menu.)

Descriptives

Statistic Std. Error

Height   Mean                                          144.00     3.175
         95% Confidence Interval for Mean
             Lower Bound                               137.51
             Upper Bound                               150.49
         5% Trimmed Mean                               142.96
         Median                                        144.00
         Variance                                      302.345
         Std. Deviation                                 17.388
         Minimum                                       121
         Maximum                                       188
         Range                                          67
         Interquartile Range                            24
         Skewness                                         .872    .427
         Kurtosis                                         .358    .833

Tests of Normality

          Kolmogorov-Smirnov(a)          Shapiro-Wilk
          Statistic   df   Sig.          Statistic   df   Sig.
Height      .111      30   .200*           .927      30   .041

*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction


Height Stem-and-Leaf Plot

Frequency   Stem &  Leaf

6.00 12 . 124579

8.00 13 . 00013568

6.00 14 . 355679

5.00 15 . 02455

2.00 16 . 03

1.00 17 . 7

2.00 18 . 08

Stem width: 10

Each leaf: 1 case(s)

(Example dataset: DataExcel.sav)


Report

Weight

Age (categorical variable) Gender N Mean Std. Deviation Median Minimum Maximum

< 20 years F 10 131.60 36.056 111.00 100 197

M 3 169.00 30.050 167.00 140 200

Total 13 140.23 37.343 140.00 100 200

20-29 years F 6 162.33 35.770 169.00 115 199

M 7 146.29 34.369 161.00 111 198

Total 13 153.69 34.541 161.00 111 199

>= 30 years F 2 138.50 38.891 138.50 111 166

M 2 158.00 2.828 158.00 156 160

Total 4 148.25 25.171 158.00 111 166

Total F 18 142.61 36.934 133.00 100 199

M 12 153.92 30.189 160.50 111 200

Total 30 147.13 34.309 158.00 100 200

(Example dataset: DataExcel.sav)


o SPSS output:

Statistics

Weight Height

N Valid 30 30

Missing 0 0

Mean 147.13 144.00

Median 158.00 144.00

Percentiles 10 102.80 124.10

25 111.00 130.00

30 111.90 130.30

50 158.00 144.00

70 166.70 151.40

75 169.75 154.25
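The percentile table above can be requested in syntax, for example:

```spss
* Suppress the full frequency table; show only the requested percentiles.
FREQUENCIES VARIABLES=weight height
  /FORMAT=NOTABLE
  /PERCENTILES=10 25 30 50 70 75
  /STATISTICS=MEAN MEDIAN.
```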

- Frequency table, Cross table

o Analyze > Descriptive Statistics > Frequencies…

o Analyze > Descriptive Statistics > Crosstabs…

(Example dataset: DataExcel.sav)


o SPSS output:

Gender

Frequency   Percent   Valid Percent   Cumulative Percent

Valid F 18 60.0 60.0 60.0

M 12 40.0 40.0 100.0

Total 30 100.0 100.0

Ethnicity

Frequency   Percent   Valid Percent   Cumulative Percent

Valid A 11 36.7 36.7 36.7

B 5 16.7 16.7 53.3

O 5 16.7 16.7 70.0

W 9 30.0 30.0 100.0

Total 30 100.0 100.0

Age (categorical variable)

Frequency   Percent   Valid Percent   Cumulative Percent

Valid < 20 years 13 43.3 43.3 43.3

20-29 years 13 43.3 43.3 86.7

>= 30 years 4 13.3 13.3 100.0

Total 30 100.0 100.0


(Example dataset: DataExcel.sav)

o SPSS output:

Gender * Ethnicity Crosstabulation

Ethnicity

A B O W Total

Gender F Count 6 3 3 6 18

% within Gender 33.3% 16.7% 16.7% 33.3% 100.0%

% within Ethnicity 54.5% 60.0% 60.0% 66.7% 60.0%

% of Total 20.0% 10.0% 10.0% 20.0% 60.0%

M Count 5 2 2 3 12

% within Gender 41.7% 16.7% 16.7% 25.0% 100.0%

% within Ethnicity 45.5% 40.0% 40.0% 33.3% 40.0%

% of Total 16.7% 6.7% 6.7% 10.0% 40.0%

Total Count 11 5 5 9 30

% within Gender 36.7% 16.7% 16.7% 30.0% 100.0%

% within Ethnicity 100.0% 100.0% 100.0% 100.0% 100.0%

% of Total 36.7% 16.7% 16.7% 30.0% 100.0%


8.3. Generating graphs (or charts) for continuous data (Interval, Ratio)

- Histogram, Box-plot, Stem-and-Leaf plot

- Error bar chart, Scatter plot etc.

(Example dataset: DataExcel.sav)
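Sketches of the legacy graph syntax for these chart types (height and weight from DataExcel.sav):

```spss
* Histogram of a continuous variable.
GRAPH /HISTOGRAM=height.
* Box-plot and stem-and-leaf via Explore.
EXAMINE VARIABLES=height
  /PLOT BOXPLOT STEMLEAF
  /STATISTICS NONE.
* Simple scatter plot.
GRAPH /SCATTERPLOT(BIVAR)=weight WITH height.
```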


8.4. Generating graphs (or charts) for categorical data (Nominal, Ordinal)

- Bar, Pie chart, Line, Area chart etc.

(Example dataset: DataExcel.sav)

• Bar chart: Graphs > Legacy Dialogs > Bar…
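The legacy-dialog charts of counts can be pasted as syntax like the sketch below:

```spss
* Simple bar chart of counts by category.
GRAPH /BAR(SIMPLE)=COUNT BY gender.
* Pie chart of counts by category.
GRAPH /PIE=COUNT BY ethnicity.
```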


o SPSS Menu: Graphs > Chart Builder

o Example (Data: DataExcel.sav)


• If you want to edit a chart in the SPSS output window, just double-click the chart that you want to edit; you can

then edit it in the Chart Editor window.

• Example:


9.1. Independent sample t-test with two groups

o Example (Graze.sav): Taken from Huntsberger and Billingsley (1989). Compares two grazing methods using 32

cattle. Half of the cattle are allowed to graze continuously while the other half are subjected to controlled

grazing time. The researchers want to know if these two grazing methods affect weight gain differently.
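The menu path Analyze > Compare Means > Independent-Samples T Test… pastes syntax like the sketch below; the group codes 1 and 2 are assumptions — if GrazeType is a string variable, use GROUPS=GrazeType('continuous' 'controlled') instead:

```spss
T-TEST GROUPS=GrazeType(1 2)
  /VARIABLES=WeightGain
  /CRITERIA=CI(.95).
```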

o SPSS output:

Group Statistics

GrazeType N Mean Std. Deviation Std. Error Mean

WeightGain continuous 16 75.19 33.812 8.453

controlled 16 83.13 30.535 7.634

Independent Samples Test

                                         Levene's Test for
                                         Equality of Variances             t-test for Equality of Means
                                                                    Sig.        Mean         Std. Error   95% Confidence Interval
                                         F      Sig.    t    df     (2-tailed)  Difference   Difference   Lower       Upper
WeightGain  Equal variances assumed      .085   .773  -.697  30       .491      -7.938       11.390       -31.198     15.323
            Equal variances not assumed               -.697  29.694   .491      -7.938       11.390       -31.208     15.333

o Interpretation: A t statistic for the equality of means is reported for both equal and unequal variances. Both

tests indicate a lack of evidence for a significant difference between grazing methods (p = 0.491 both for the

pooled test, equal variances assumed, and for the Satterthwaite test, equal variances not assumed). Levene's

test does not indicate a significant difference between the two variances (p = 0.773). The t-test assumes that

the observations in both groups are normally distributed.


o Example (Pressure.sav): A stimulus is being examined to determine its effect on systolic blood pressure.

Twelve men participate in the study. Each man’s systolic blood pressure is measured both before and after the

stimulus is applied.
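Analyze > Compare Means > Paired-Samples T Test… pastes syntax along these lines (variable names from Pressure.sav):

```spss
T-TEST PAIRS=SBPbefore WITH SBPafter (PAIRED)
  /CRITERIA=CI(.95).
```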

o SPSS output:

Paired Samples Statistics

Mean N Std. Deviation Std. Error Mean

Pair 1 SBPbefore 128.67 12 6.933 2.001

SBPafter 130.50 12 5.916 1.708

Paired Samples Correlations

N Correlation Sig.

Paired Samples Test

                               Paired Differences
                                       Std.        Std. Error   95% Confidence Interval
                               Mean    Deviation   Mean         Lower      Upper         t        df   Sig. (2-tailed)
Pair 1  SBPbefore - SBPafter   -1.833  5.828       1.683        -5.536     1.870         -1.090   11   .299

o Interpretation: The variables SBPbefore and SBPafter are the paired variables, with a sample size of 12.

The summary statistics of the difference (mean, standard deviation, and standard error) are displayed

along with their confidence limits. The test is not significant (t = -1.090, p = 0.299), indicating that the

stimulus did not significantly affect systolic blood pressure.


10.1. Pearson’s Chi-Square test

o SPSS Menu: Analyze > Descriptive Statistics > Crosstabs…

o Example (Color.sav): The eye and hair color of children from two different regions of Europe are recorded in

the data set Color. Instead of recording one observation per child, the data are recorded as cell counts, where

the variable Count contains the number of children exhibiting each of the 15 eye and hair color combinations.

The data set does not include missing combinations.
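Because the data are stored as cell counts, weight the cases by Count first. A sketch (EyeColor, HairColor, and Count are assumed variable names based on the labels in the output):

```spss
WEIGHT BY Count.
CROSSTABS /TABLES=EyeColor BY HairColor
  /STATISTICS=CHISQ
  /CELLS=COUNT EXPECTED ROW COLUMN.
```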

o SPSS output:

Chi-Square Tests

Value    df    Asymp. Sig. (2-sided)

Pearson Chi-Square 20.925a 8 .007

Likelihood Ratio 25.973 8 .001

Linear-by-Linear Association 3.229 1 .072

N of Valid Cases 762

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 5.75.

Hair Color

black dark fair medium red Total

Eye Color blue Count 6 51 69 68 28 222

Expected Count 6.4 53.0 66.4 63.2 32.9 222.0

% within Eye Color 2.7% 23.0% 31.1% 30.6% 12.6% 100.0%

% within Hair Color 27.3% 28.0% 30.3% 31.3% 24.8% 29.1%

brown Count 16 94 90 94 47 341

Expected Count 9.8 81.4 102.0 97.1 50.6 341.0

% within Eye Color 4.7% 27.6% 26.4% 27.6% 13.8% 100.0%

% within Hair Color 72.7% 51.6% 39.5% 43.3% 41.6% 44.8%

green Count 0 37 69 55 38 199

Expected Count 5.7 47.5 59.5 56.7 29.5 199.0

% within Eye Color 0.0% 18.6% 34.7% 27.6% 19.1% 100.0%

% within Hair Color 0.0% 20.3% 30.3% 25.3% 33.6% 26.1%

Total Count 22 182 228 217 113 762

Expected Count 22.0 182.0 228.0 217.0 113.0 762.0

% within Eye Color 2.9% 23.9% 29.9% 28.5% 14.8% 100.0%

% within Hair Color 100.0% 100.0% 100.0% 100.0% 100.0% 100.0%

o Interpretation: The SPSS output displays the chi-square statistics. The alternative hypothesis for this analysis

states that eye color is associated with hair color. With p-value = 0.007, the alternative hypothesis is supported.


o Example (FatComp.sav): This example computes chi-square tests and Fisher’s exact test to compare the

probability of coronary heart disease for two types of diet. It also estimates the relative risks and computes

exact confidence limits for the odds ratio. The data set “FatComp.sav” contains hypothetical data for a case-

control study of high fat diet and the risk of coronary heart disease. The data are recorded as cell counts,

where the variable Count contains the frequencies for each exposure and response combination. The data set

is sorted in descending order by the variables Exposure and Response, so that the first cell of the 2 by 2 table

contains the frequency of positive exposure and positive response.
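A syntax sketch for this analysis (HeartDisease, Exposure, and Count are assumed names; for a 2x2 table CROSSTABS reports Fisher's exact test automatically, and the RISK keyword produces the odds ratio table):

```spss
WEIGHT BY Count.
CROSSTABS /TABLES=HeartDisease BY Exposure
  /STATISTICS=CHISQ RISK
  /CELLS=COUNT EXPECTED.
```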

o SPSS output:

Exposure

No Yes Total

Heart Disease Low Cholesterol Diet Count 6 4 10

Expected Count 3.5 6.5 10.0

High Cholesterol Diet Count 2 11 13

Expected Count 4.5 8.5 13.0

Total Count 8 15 23

Expected Count 8.0 15.0 23.0

Chi-Square Tests

Value    df    Asymp. Sig. (2-sided)    Exact Sig. (2-sided)    Exact Sig. (1-sided)

Pearson Chi-Square 4.960a 1 .026

Continuity Correctionb 3.188 1 .074

Likelihood Ratio 5.098 1 .024

Fisher's Exact Test .039 .037

Linear-by-Linear Association 4.744 1 .029

N of Valid Cases 23

a. 2 cells (50.0%) have expected count less than 5. The minimum expected count is 3.48.

b. Computed only for a 2x2 table


Risk Estimate

95% Confidence Interval

Value Lower Upper

Odds Ratio for Heart Disease

(Low Cholesterol Diet / High 8.250 1.154 59.003

Cholesterol Diet)

For cohort Exposure = No 3.900 .989 15.373

For cohort Exposure = Yes .473 .214 1.045

N of Valid Cases 23

o Interpretation: SPSS output displays the chi-square statistics. Because the expected counts in some of the

table cells are small, Output gives a warning that the asymptotic chi-square tests might not be appropriate. In

this case, the exact tests are appropriate. The alternative hypothesis for this analysis states that coronary

heart disease is more likely to be associated with a high fat diet, so a one-sided test is desired. Fisher’s exact

right-sided test analyzes whether the probability of heart disease in the high fat group exceeds the probability

of heart disease in the low fat group; because this p-value is small, the alternative hypothesis is supported.

The odds ratio, displayed in “Risk estimate” table, provides an estimate of the relative risk when an event is

rare. This estimate indicates that the odds of heart disease is 8.25 times higher in the high fat diet group;

however, the wide confidence limits indicate that this estimate has low precision.

o Example (Migraine.sav): The data set Migraine contains hypothetical data for a clinical trial of migraine

treatment. Subjects of both genders receive either a new drug therapy or a placebo. Their response to

treatment is coded as 'Better' or 'Same'. The data are recorded as cell counts, and the number of subjects for

each treatment and response combination is recorded in the variable Count.
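Adding Gender as a layer variable produces the stratified tables, and the CMH(1) keyword requests the Cochran's/Mantel-Haenszel and Breslow-Day/Tarone's statistics shown below (a sketch; variable names assumed from the output):

```spss
WEIGHT BY Count.
CROSSTABS /TABLES=Treatment BY Response BY Gender
  /STATISTICS=CHISQ RISK CMH(1)
  /CELLS=COUNT EXPECTED.
```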


o SPSS output:

Response

Gender Better Same Total

female Treatment Active Count 16 11 27

Expected Count 10.9 16.1 27.0

Placebo Count 5 20 25

Expected Count 10.1 14.9 25.0

Total Count 21 31 52

Expected Count 21.0 31.0 52.0

male Treatment Active Count 12 16 28

Expected Count 9.9 18.1 28.0

Placebo Count 7 19 26

Expected Count 9.1 16.9 26.0

Total Count 19 35 54

Expected Count 19.0 35.0 54.0

Total Treatment Active Count 28 27 55

Expected Count 20.8 34.2 55.0

Placebo Count 12 39 51

Expected Count 19.2 31.8 51.0

Total Count 40 66 106

Expected Count 40.0 66.0 106.0

Chi-Square Tests

Gender    Value    df    Asymp. Sig. (2-sided)    Exact Sig. (2-sided)    Exact Sig. (1-sided)

female Pearson Chi-Square 8.310c 1 .004

Continuity Correctionb 6.759 1 .009

Likelihood Ratio 8.633 1 .003

Fisher's Exact Test .005 .004

N of Valid Cases 52

male Pearson Chi-Square 1.501d 1 .221

Continuity Correctionb .884 1 .347

Likelihood Ratio 1.515 1 .218

Fisher's Exact Test .264 .174

N of Valid Cases 54

Total Pearson Chi-Square 8.443a 1 .004

Continuity Correctionb 7.318 1 .007

Likelihood Ratio 8.626 1 .003

Fisher's Exact Test .005 .003

N of Valid Cases 106

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 19.25.

b. Computed only for a 2x2 table

c. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 10.10.

d. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 9.15.

Risk Estimate

95% Confidence Interval

Gender Value Lower Upper

female Odds Ratio for Treatment

5.818 1.676 20.203

(Active / Placebo)

For cohort Response = Better 2.963 1.274 6.891

For cohort Response = Same .509 .310 .836

N of Valid Cases 52

male Odds Ratio for Treatment

2.036 .648 6.398

(Active / Placebo)

For cohort Response = Better 1.592 .741 3.418

For cohort Response = Same .782 .526 1.163


N of Valid Cases 54

Total Odds Ratio for Treatment

3.370 1.462 7.772

(Active / Placebo)

For cohort Response = Better 2.164 1.237 3.783

For cohort Response = Same .642 .471 .875

N of Valid Cases 106

Tests of Homogeneity of the Odds Ratio

                 Chi-Squared   df   Asymp. Sig. (2-sided)
Breslow-Day      1.493         1    .222
Tarone's         1.491         1    .222

Tests of Conditional Independence

                  Chi-Squared   df   Asymp. Sig. (2-sided)
Cochran's         8.465         1    .004
Mantel-Haenszel   7.198         1    .007

Under the conditional independence assumption, Cochran's statistic

is asymptotically distributed as a 1 df chi-squared distribution, only if

the number of strata is fixed, while the Mantel-Haenszel statistic is

always asymptotically distributed as a 1 df chi-squared distribution.

Note that the continuity correction is removed from the Mantel-

Haenszel statistic when the sum of the differences between the

observed and the expected is 0.

Mantel-Haenszel Common Odds Ratio Estimate

Estimate                                                        3.313
ln(Estimate)                                                    1.198
Std. Error of ln(Estimate)                                       .423
Asymp. Sig. (2-sided)                                            .005
Asymp. 95% Confidence   Common Odds Ratio       Lower Bound     1.446
Interval                                        Upper Bound     7.593
                        ln(Common Odds Ratio)   Lower Bound      .369
                                                Upper Bound     2.027

The Mantel-Haenszel common odds ratio estimate is asymptotically normally distributed

under the common odds ratio of 1.000 assumption. So is the natural log of the estimate.

Breslow-Day test: The large p-value for the Breslow-Day test (p-value=0.222) in Output indicates no

significant gender difference in the odds ratios.

CMH test: The significant p-value (p=0.004) indicates that the association between treatment and

response remains strong after adjusting for gender.

The CMH statistics option in Statistics window (See Figure 57) also produces a table of overall relative

risks. Because this is a prospective study, the relative risk estimate assesses the effectiveness of the

new drug; the "For cohort response=Better” values are the appropriate estimates for the first column

(the risk of improvement). The probability of migraine improvement with the new drug is just over

two times the probability of improvement with the placebo (Relative risk=2.164).


• Common subjects being observed under 2 conditions (2 treatments, before/after, 2 diagnostic tests) in a crossover

setting

• Two possible outcomes (Presence/Absence of Characteristic) on each measurement

• Four possibilities for each subject with respect to outcome:

– Present in both conditions

– Absent in both conditions

– Present in Condition 1, Absent in Condition 2

– Absent in Condition 1, Present in Condition 2

o Example (PrimeMinister.sav): From the data, we want to compare the probabilities of approval for the prime

minister’s performance at the times of two surveys.
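McNemar's test is requested from the Crosstabs Statistics dialog; in syntax, roughly (survey1 and survey2 are hypothetical names for the two survey variables, and the WEIGHT step applies only if the data are stored as cell counts):

```spss
WEIGHT BY Count.
CROSSTABS /TABLES=survey1 BY survey2
  /STATISTICS=MCNEMAR
  /CELLS=COUNT ROW.
```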

o SPSS output:

First survey * Second survey Crosstabulation

Second survey

Approve Disapprove Total

First survey Approve Count 794 150 944

% within First survey 84.1% 15.9% 100.0%

Disapprove Count 86 570 656

% within First survey 13.1% 86.9% 100.0%

Total Count 880 720 1600

% within First survey 55.0% 45.0% 100.0%

Symmetric Measures

Value Asymp. Std. Errora Approx. Tb Approx. Sig.

Interval by Interval Pearson's R .702 .018 39.396 .000c

Ordinal by Ordinal Spearman Correlation .702 .018 39.396 .000c

N of Valid Cases 1600

a. Not assuming the null hypothesis.

b. Using the asymptotic standard error assuming the null hypothesis.

c. Based on normal approximation.

o Interpretation: The SPSS output above displays the result of McNemar’s test for matched-pair data. There is a

large difference between the probabilities of approval of the prime minister’s performance at the times of the

two surveys (p-value < 0.001); i.e., we have strong evidence of a drop in the approval rating.


o Cohen's kappa measures the agreement between the evaluations of two raters when both are rating the same

object. A value of 1 indicates perfect agreement. A value of 0 indicates that agreement is no better than

chance. Kappa is based on a square table in which row and column values represent the same scale. Any cell

that has observed values for one variable but not the other is assigned a count of 0. Kappa is not computed if

the data storage type (string or numeric) is not the same for the two variables. For string variables, both

variables must have the same defined length.

o A kappa value higher than 0.75 indicates excellent agreement, while a value lower than 0.4 indicates poor

agreement.

o Example (Dermatology.sav): Two dermatologists evaluate the skin condition of 88 people. From the data, we

want to know whether the two dermatologists’ evaluations agree.
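Kappa is requested from the same Crosstabs dialog; a sketch using the rater variable names shown in the output:

```spss
CROSSTABS /TABLES=Derm1 BY Derm2
  /STATISTICS=KAPPA
  /CELLS=COUNT.
```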

o SPSS output:

Count

Derm2

clear marginal poor terrible Total

Derm1 clear 13 6 2 0 21

marginal 5 12 4 2 23

poor 2 12 10 5 29

terrible 0 1 4 10 15

Total 20 31 20 17 88

Symmetric Measures

Value Asymp. Std. Errora Approx. Tb Approx. Sig.

Measure of Agreement Kappa .345 .072 5.637 .000

N of Valid Cases 88

a. Not assuming the null hypothesis.

b. Using the asymptotic standard error assuming the null hypothesis.

o Interpretation: From the SPSS output, the estimated Cohen’s Kappa is 0.345 and the test is significant

(p-value < 0.001), implying low agreement between the two raters.


Prospective studies

A prospective study watches for outcomes, such as the development of a disease, during the study period and

relates this to other factors such as suspected risk or protection factor(s). The study usually involves taking a

cohort of subjects and watching them over a long period. The outcome of interest should be common; otherwise,

the number of outcomes observed will be too small to be statistically meaningful (indistinguishable from those

that may have arisen by chance). All efforts should be made to avoid sources of bias such as the loss of individuals

to follow up during the study. Prospective studies usually have fewer potential sources of bias and confounding

than retrospective studies.

Retrospective studies

A retrospective study looks backwards and examines exposures to suspected risk or protection factors in relation

to an outcome that is established at the start of the study. Many valuable case-control studies, such as

Lane-Claypon's 1926 investigation of risk factors for breast cancer, were retrospective investigations. Most sources of

error due to confounding and bias are more common in retrospective studies than in prospective studies. For this

reason, retrospective investigations are often criticised. If the outcome of interest is uncommon, however, the size

of prospective investigation required to estimate relative risk is often too large to be feasible. In retrospective

studies the odds ratio provides an estimate of relative risk. You should take special care to avoid sources of bias

and confounding in retrospective studies.

Prospective investigation is required to make precise estimates of either the incidence of an outcome or the

relative risk of an outcome based on exposure.

Case-Control studies

Case-Control studies are usually but not exclusively retrospective; the opposite is true for cohort studies. The

following notes relate case-control to cohort studies:

• outcome is measured before exposure

• controls are selected on the basis of not having the outcome

• good for rare outcomes

• relatively inexpensive

• smaller numbers required

• quicker to complete

• prone to selection bias

• prone to recall/retrospective bias

• related methods are risk (retrospective), chi-square 2 by 2 test, Fisher's exact test, exact confidence

interval for odds ratio, odds ratio meta-analysis and conditional logistic regression.

Cohort studies

Cohort studies are usually but not exclusively prospective; the opposite is true for case-control studies. The

following notes relate cohort to case-control studies:

• outcome is measured after exposure

• yields true incidence rates and relative risks

• may uncover unanticipated associations with outcome

• best for common outcomes

• expensive

• requires large numbers

• takes a long time to complete

• prone to attrition bias (compensate by using person-time methods)

• prone to the bias of change in methods over time

• related methods are risk (prospective), relative risk meta-analysis, risk difference meta-analysis and

proportions


11.1. One-way ANOVA

- The One-Way ANOVA procedure produces a one-way analysis of variance for a quantitative dependent variable by a

single factor (independent) variable. Analysis of variance is used to test the hypothesis that several means are equal.

This technique is an extension of the two-sample t test.

- The assumptions of analysis of variance are that treatment effects are additive and experimental errors are

independently random with a normal distribution that has mean zero and constant variance.

- Once you have determined that differences exist among the means, post hoc range tests and pairwise multiple

comparisons can determine which means differ. Range tests identify homogeneous subsets of means that are not

different from each other. Pairwise multiple comparisons test the difference between each pair of means and yield a

matrix where asterisks indicate significantly different group means at an alpha level of 0.05.

o Example (Clover.sav): The following example studies the effect of bacteria on the nitrogen content of red

clover plants. The treatment factor is bacteria strain, and it has six levels. Five of the six levels consist of five

different Rhizobium trifolii bacteria cultures combined with a composite of five Rhizobium meliloti strains. The

sixth level is a composite of the five Rhizobium trifolii strains with the composite of the Rhizobium meliloti.

Red clover plants are inoculated with the treatments, and nitrogen content is later measured in milligrams.
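Analyze > Compare Means > One-Way ANOVA… pastes syntax like the sketch below (Nitrogen and Strain are assumed names based on the output):

```spss
ONEWAY Nitrogen BY Strain
  /STATISTICS=DESCRIPTIVES HOMOGENEITY
  /POSTHOC=TUKEY ALPHA(0.05).
```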

o SPSS output:

Descriptives

Nitrogen

95% Confidence Interval for Mean

N Mean Std. Deviation Std. Error Lower Bound Upper Bound Minimum Maximum

3DOK1 5 28.8200 5.80017 2.59392 21.6181 36.0219 19.40 33.00

3DOK13 5 13.2600 1.42759 .63844 11.4874 15.0326 11.60 14.40

3DOK4 5 14.6400 4.11619 1.84082 9.5291 19.7509 9.10 19.40

3DOK5 5 23.9800 3.77717 1.68920 19.2900 28.6700 17.70 27.90

3DOK7 5 19.9200 1.13004 .50537 18.5169 21.3231 18.60 21.00

COMPOS 5 18.7000 1.60156 .71624 16.7114 20.6886 16.90 20.80

Total 30 19.8867 6.24217 1.13966 17.5558 22.2175 9.10 33.00

Test of Homogeneity of Variances

Nitrogen

Levene Statistic   df1   df2   Sig.
     3.145          5    24    .025


ANOVA

Nitrogen

Sum of Squares df Mean Square F Sig.

Between Groups 847.047 5 169.409 14.371 .000

Within Groups 282.928 24 11.789

Total 1129.975 29

Multiple Comparisons

Dependent Variable: Nitrogen

Tukey HSD

Mean Difference 95% Confidence Interval

(I) Strain (J) Strain (I-J) Std. Error Sig. Lower Bound Upper Bound

3DOK1 3DOK13 15.56000* 2.17151 .000 8.8458 22.2742

3DOK4 14.18000* 2.17151 .000 7.4658 20.8942

3DOK5 4.84000 2.17151 .262 -1.8742 11.5542

3DOK7 8.90000* 2.17151 .005 2.1858 15.6142

COMPOS 10.12000* 2.17151 .001 3.4058 16.8342

3DOK13 3DOK1 -15.56000* 2.17151 .000 -22.2742 -8.8458

3DOK4 -1.38000 2.17151 .987 -8.0942 5.3342

3DOK5 -10.72000* 2.17151 .001 -17.4342 -4.0058

3DOK7 -6.66000 2.17151 .053 -13.3742 .0542

COMPOS -5.44000 2.17151 .162 -12.1542 1.2742

3DOK4 3DOK1 -14.18000* 2.17151 .000 -20.8942 -7.4658

3DOK13 1.38000 2.17151 .987 -5.3342 8.0942

3DOK5 -9.34000* 2.17151 .003 -16.0542 -2.6258

3DOK7 -5.28000 2.17151 .185 -11.9942 1.4342

COMPOS -4.06000 2.17151 .443 -10.7742 2.6542

3DOK5 3DOK1 -4.84000 2.17151 .262 -11.5542 1.8742

3DOK13 10.72000* 2.17151 .001 4.0058 17.4342

3DOK4 9.34000* 2.17151 .003 2.6258 16.0542

3DOK7 4.06000 2.17151 .443 -2.6542 10.7742

COMPOS 5.28000 2.17151 .185 -1.4342 11.9942

3DOK7 3DOK1 -8.90000* 2.17151 .005 -15.6142 -2.1858

3DOK13 6.66000 2.17151 .053 -.0542 13.3742

3DOK4 5.28000 2.17151 .185 -1.4342 11.9942

3DOK5 -4.06000 2.17151 .443 -10.7742 2.6542

COMPOS 1.22000 2.17151 .993 -5.4942 7.9342

COMPOS 3DOK1 -10.12000* 2.17151 .001 -16.8342 -3.4058

3DOK13 5.44000 2.17151 .162 -1.2742 12.1542

3DOK4 4.06000 2.17151 .443 -2.6542 10.7742

3DOK5 -5.28000 2.17151 .185 -11.9942 1.4342

3DOK7 -1.22000 2.17151 .993 -7.9342 5.4942

*. The mean difference is significant at the 0.05 level.


o Example (Drug.sav): This example uses data from Kutner (1974, p. 98) to illustrate a two-way analysis of

variance. The original data source is Afifi and Azen (1972, p. 166).
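The two-way ANOVA is run from Analyze > General Linear Model > Univariate…; the equivalent syntax is roughly (variable names y, drug, and disease taken from the output):

```spss
UNIANOVA y BY drug disease
  /POSTHOC=drug(TUKEY)
  /PRINT=DESCRIPTIVE
  /DESIGN=drug disease drug*disease.
```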

o SPSS output:

Dependent Variable: y

Type III Sum of

Source Squares df Mean Square F Sig.

Corrected Model 4259.339a 11 387.213 3.506 .001

Intercept 20037.613 1 20037.613 181.414 .000

drug 2997.472 3 999.157 9.046 .000

disease 415.873 2 207.937 1.883 .164

drug * disease 707.266 6 117.878 1.067 .396

Error 5080.817 46 110.453

Total 30013.000 58

Corrected Total 9340.155 57

a. R Squared = .456 (Adjusted R Squared = .326)


Multiple Comparisons

Dependent Variable: y

Tukey HSD

                        Mean                                   95% Confidence Interval
(I) drug   (J) drug     Difference (I-J)   Std. Error   Sig.   Lower Bound   Upper Bound
1          2                .53            3.838        .999     -9.70         10.76
           3              17.32*           4.070        .001      6.47         28.17
           4              12.57*           3.777        .009      2.50         22.63
2          1               -.53            3.838        .999    -10.76          9.70
           3              16.78*           4.070        .001      5.93         27.63
           4              12.03*           3.777        .013      1.97         22.10
3          1             -17.32*           4.070        .001    -28.17         -6.47
           2             -16.78*           4.070        .001    -27.63         -5.93
           4              -4.75            4.013        .640    -15.45          5.95
4          1             -12.57*           3.777        .009    -22.63         -2.50
           2             -12.03*           3.777        .013    -22.10         -1.97
           3               4.75            4.013        .640     -5.95         15.45

Based on observed means.

The error term is Mean Square(Error) = 110.453.

*. The mean difference is significant at the .05 level.


Two general applications exist for ANCOVA:

• Remove Error Variance in the Randomized Experiment: Participants are assigned to treatment and control

groups in any ANOVA-type design. ANCOVA is then used as the statistical technique to eliminate irrelevant y

variance.

• Equating Non-Equivalent (Intact) Groups: A very controversial use of ANCOVA is to correct for initial group

differences on y (prior to assignment to x) that exist among several intact, state-variable groups.

o Example (Cholesterol.sav): Cholesterol levels [mg/ml] for 30 women from two US states, Iowa and Nebraska.

Age [years] may be a relevant covariate.
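In syntax, the covariate goes after the WITH keyword (a sketch; lowercase variable names assumed from the output):

```spss
UNIANOVA cholesterol BY state WITH age
  /PRINT=DESCRIPTIVE
  /DESIGN=age state.
```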

Figure 62 ANCOVA

o SPSS output:

Dependent Variable: cholesterol

Type III Sum of

Source Squares df Mean Square F Sig.

Corrected Model 54432.754a 2 27216.377 14.965 .000

Intercept 16901.543 1 16901.543 9.293 .005

age 53820.058 1 53820.058 29.593 .000

state 5456.450 1 5456.450 3.000 .095

Error 49103.913 27 1818.663

Total 1473140.000 30

Corrected Total 103536.667 29

a. R Squared = .526 (Adjusted R Squared = .491)

o Exercise (Goat.sav): Experiments were carried out on six commercial goat farms to determine whether the

standard worm drenching program was adequate. Forty goats were used in each experiment. Twenty of these,

chosen completely at random, were drenched according to the standard program, while the remaining twenty

were drenched more frequently. The goats were individually tagged, and weighed at the start and end of the

year-long study. For the first farm in the study the resulting liveweight gains are given along with the initial

liveweights. In each experiment the main interest was in the comparison of the liveweight gains between the

two treatments.


MANOVA provides multivariate (>1 dependent variable) tests for differences among groups; ANOVA is a special case of MANOVA.

• MANOVA - This is a good option if there are two or more continuous dependent variables and one categorical predictor

variable.

• Discriminant function analysis - This is a reasonable option and is equivalent to a one-way MANOVA.

• The data could be reshaped into long format and analyzed as a multilevel model.

• Separate univariate ANOVAs - You could analyze these data using separate univariate ANOVAs for each response

variable. The univariate ANOVA will not produce multivariate results utilizing information from all variables

simultaneously. In addition, separate univariate tests are generally less powerful because they do not take into

account the inter-correlation of the dependent variables.

Assumption of MANOVA

• One of the assumptions of MANOVA is that the response variables come from group populations that are multivariate

normal distributed. This means that each of the dependent variables is normally distributed within group, that any

linear combination of the dependent variables is normally distributed, and that all subsets of the variables must be

multivariate normal. With respect to Type I error rate, MANOVA tends to be robust to minor violations of the

multivariate normality assumption.

• The homogeneity of population covariance matrices (a.k.a. sphericity) is another assumption. This implies that the

population variances and covariances of all dependent variables must be equal in all groups formed by the

independent variables.

• Small samples can have low power, but if the multivariate normality assumption is met, the MANOVA is generally more

powerful than separate univariate tests.
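The payoff from modeling the dependent variables jointly shows up in how the MANOVA test statistics are built. As a minimal sketch (hypothetical data, not any of the workshop files), Wilks' lambda for a one-way design with two dependent variables is the ratio det(W)/det(T) of the pooled within-group to the total sums-of-squares-and-cross-products (SSCP) matrices; values near 0 indicate strong group separation:

```python
# Toy computation of Wilks' lambda for two groups and two dependent
# variables (hypothetical data): lambda = det(W) / det(T), where W is the
# pooled within-group SSCP matrix and T is the total SSCP matrix.

def sscp(rows, means):
    """2x2 sums-of-squares-and-cross-products matrix about `means`."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = (r[0] - means[0], r[1] - means[1])
        for i in range(2):
            for j in range(2):
                m[i][j] += d[i] * d[j]
    return m

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def col_mean(rows, j):
    return sum(r[j] for r in rows) / len(rows)

g1 = [(4, 7), (5, 9), (6, 8)]          # group 1: (DV1, DV2) per subject
g2 = [(7, 5), (8, 7), (9, 6)]          # group 2
w1 = sscp(g1, [col_mean(g1, 0), col_mean(g1, 1)])
w2 = sscp(g2, [col_mean(g2, 0), col_mean(g2, 1)])
W = [[a + b for a, b in zip(x, y)] for x, y in zip(w1, w2)]  # pooled within
T = sscp(g1 + g2, [col_mean(g1 + g2, 0), col_mean(g1 + g2, 1)])
wilks = det2(W) / det2(T)
print(round(wilks, 3))                  # small lambda -> groups differ
```

For these toy numbers lambda is well below 1, which is the pattern a significant group effect produces; SPSS additionally converts lambda into the approximate F shown in the Multivariate Tests table.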

o Example (MANOVA_Dietary.sav): A researcher randomly assigns 33 subjects to one of three groups. The first

group receives technical dietary information interactively from an on-line website. Group 2 receives the same

information from a nurse practitioner, while group 3 receives the information from a video tape made by the

same nurse practitioner. The researcher looks at three different ratings of the presentation, difficulty,

usefulness and importance, to determine if there is a difference in the modes of presentation. In particular,

the researcher is interested in whether the interactive website is superior because that is the most cost-

effective way of delivering the information.


Figure 63 MANOVA

o SPSS output:

Descriptive Statistics

GROUP Mean Std. Deviation N

USEFUL Treatment 18.1182 3.90380 11

Control1 15.5273 2.07562 11

Control2 15.3455 3.13827 11

Total 16.3303 3.29246 33

DIFFICULTY Treatment 6.1909 1.89971 11

Control1 5.5818 2.43426 11

Control2 5.3727 1.75903 11

Total 5.7152 2.01760 33

IMPORTANCE Treatment 8.6818 4.86309 11

Control1 5.1091 2.53119 11

Control2 5.6364 3.54691 11

Total 6.4758 3.98513 33

Multivariate Testsa

Effect Value F Hypothesis df Error df Sig.

Intercept Pillai's Trace .986 657.857b 3.000 28.000 .000

Wilks' Lambda .014 657.857b 3.000 28.000 .000

Hotelling's Trace 70.485 657.857b 3.000 28.000 .000

Roy's Largest Root 70.485 657.857b 3.000 28.000 .000

GROUP Pillai's Trace .477 3.025 6.000 58.000 .012

Wilks' Lambda .526 3.538b 6.000 56.000 .005

Hotelling's Trace .897 4.038 6.000 54.000 .002

Roy's Largest Root .892 8.623c 3.000 29.000 .000

a. Design: Intercept + GROUP

b. Exact statistic

c. The statistic is an upper bound on F that yields a lower bound on the significance level.


Source Dependent Variable Type III Sum of Squares df Mean Square F Sig.

Corrected Model USEFUL 52.924a 2 26.462 2.701 .083

DIFFICULTY 3.975b 2 1.988 .472 .628

IMPORTANCE 81.830c 2 40.915 2.879 .072

Intercept USEFUL 8800.400 1 8800.400 898.106 .000

DIFFICULTY 1077.878 1 1077.878 256.054 .000

IMPORTANCE 1383.869 1 1383.869 97.371 .000

GROUP USEFUL 52.924 2 26.462 2.701 .083

DIFFICULTY 3.975 2 1.988 .472 .628

IMPORTANCE 81.830 2 40.915 2.879 .072

Error USEFUL 293.965 30 9.799

DIFFICULTY 126.287 30 4.210

IMPORTANCE 426.371 30 14.212

Total USEFUL 9147.290 33

DIFFICULTY 1208.140 33

IMPORTANCE 1892.070 33

Corrected Total USEFUL 346.890 32

DIFFICULTY 130.262 32

IMPORTANCE 508.201 32

a. R Squared = .153 (Adjusted R Squared = .096)

b. R Squared = .031 (Adjusted R Squared = -.034)

c. R Squared = .161 (Adjusted R Squared = .105)

o Exercise (Pottery.sav): This example employs multivariate analysis of variance (MANOVA) to measure

differences in the chemical characteristics of ancient pottery found at four kiln sites in Great Britain. The data

are from Tubb, Parker, and Nickless (1980), as reported in Hand et al. (1994). For each of 26 samples of

pottery, the percentages of oxides of five metals are measured.


• A statistical method is called non-parametric if it makes no assumptions about the population distribution or sample size.

• This is in contrast with most parametric methods in elementary statistics, which assume that the data are quantitative, the

population has a normal distribution, and the sample size is sufficiently large.

• In general, non-parametric methods are less powerful than their parametric counterparts when the parametric assumptions

hold. However, because they make fewer assumptions, non-parametric methods are more flexible, more robust, and

applicable to non-quantitative (e.g., ordinal) data.

– Mann-Whitney U test (two independent samples)

o SPSS Menu: Analyze > Nonparametric Tests > Legacy Dialogs > 2 Independent Samples …

o Corresponding parametric method: two-sample t-test

o Example (Graze.sav): Taken from Huntsberger and Billingsley (1989). The study compares two grazing methods using

32 cattle. Half of the cattle are allowed to graze continuously while the other half are subjected to controlled

grazing time. The researchers want to know if these two grazing methods affect weight gain differently.

o SPSS output:

Ranks

GrazeType N Mean Rank Sum of Ranks

WeightGain continuous 16 15.19 243.00

controlled 16 17.81 285.00

Total 32

Test Statisticsa

WeightGain

Mann-Whitney U 107.000

Wilcoxon W 243.000

Z -.792

Asymp. Sig. (2-tailed) .429

Exact Sig. [2*(1-tailed Sig.)] .445b

a. Grouping Variable: GrazeType

b. Not corrected for ties.
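The U statistic in the table comes directly from the rank sums. A minimal pure-Python sketch on hypothetical weight gains (not the Graze.sav values) shows the computation SPSS performs:

```python
# Mann-Whitney U from rank sums (hypothetical data; ties get average ranks).

def avg_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def mann_whitney_u(a, b):
    r = avg_ranks(a + b)
    r1 = sum(r[:len(a)])                    # rank sum of the first group
    u1 = r1 - len(a) * (len(a) + 1) / 2
    return min(u1, len(a) * len(b) - u1)    # report the smaller U

continuous = [1.2, 0.8, 1.5, 1.1, 0.9]      # hypothetical gains, group 1
controlled = [1.4, 1.6, 1.3, 1.7, 1.0]      # hypothetical gains, group 2
print(mann_whitney_u(continuous, controlled))   # → 5.0
```

The same arithmetic reproduces the table above: the smaller rank sum there is W = 243, so U = 243 - 16(17)/2 = 107.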


– Wilcoxon signed-rank test (two related samples)

o SPSS Menu: Analyze > Nonparametric Tests > Legacy Dialogs > 2 Related Samples …

o Corresponding parametric method: paired t-test

o Example (Pressure.sav): A stimulus is being examined to determine its effect on systolic blood pressure.

Twelve men participate in the study. Each man’s systolic blood pressure is measured both before and after the

stimulus is applied.

o SPSS output:

Ranks

N Mean Rank Sum of Ranks

SBPafter - SBPbefore Negative Ranksa 3 8.17 24.50

Positive Ranksb 9 5.94 53.50

Tiesc 0

Total 12

a. SBPafter < SBPbefore

b. SBPafter > SBPbefore

c. SBPafter = SBPbefore

Test Statisticsa

SBPafter - SBPbefore

Z -1.143b

Asymp. Sig. (2-tailed) .253

a. Wilcoxon Signed Ranks Test

b. Based on negative ranks.
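The ranks table above is built from the signed differences. A sketch with hypothetical blood-pressure readings (the absolute differences are distinct here, so no tie handling is needed; SPSS averages ranks over tied |differences|):

```python
# Wilcoxon signed-rank sums from paired data (hypothetical readings).
# Zero differences are dropped; |differences| are assumed distinct here.

def signed_rank_sums(before, after):
    diffs = [a - b for a, b in zip(after, before) if a != b]
    ordered = sorted(diffs, key=abs)          # rank by absolute difference
    w_neg = sum(r for r, d in enumerate(ordered, 1) if d < 0)
    w_pos = sum(r for r, d in enumerate(ordered, 1) if d > 0)
    return w_neg, w_pos

before = [120, 124, 130, 118, 140, 128]       # hypothetical SBP before
after  = [126, 122, 138, 125, 151, 131]       # hypothetical SBP after
print(signed_rank_sums(before, after))        # → (1, 20)
```

The smaller of the two rank sums is the Wilcoxon test statistic; SPSS standardizes it into the Z value shown above.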


– Kruskal-Wallis test (k independent samples)

o SPSS Menu: Analyze > Nonparametric Tests > Legacy Dialogs > K Independent Samples …

o Corresponding parametric method: One-way ANOVA

o Example (Clover.sav): The following example studies the effect of bacteria on the nitrogen content of red

clover plants. The treatment factor is bacteria strain, and it has six levels. Five of the six levels consist of five

different Rhizobium trifolii bacteria cultures combined with a composite of five Rhizobium meliloti strains. The

sixth level is a composite of the five Rhizobium trifolii strains with the composite of the Rhizobium meliloti.

Red clover plants are inoculated with the treatments, and nitrogen content is later measured in milligrams.

o SPSS output:

Ranks

Strain N Mean Rank

Nitrogen 3DOK1 5 26.00

3DOK13 5 4.60

3DOK4 5 8.00

3DOK5 5 22.20

3DOK7 5 17.60

COMPOS 5 14.60

Total 30

Test Statisticsa,b

Nitrogen

Chi-Square 21.659

df 5

Asymp. Sig. .001

a. Kruskal Wallis Test

b. Grouping Variable: Strain
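The chi-square value above is the Kruskal-Wallis H statistic, computed from the per-group rank sums. A minimal sketch on hypothetical nitrogen values (no ties assumed; SPSS additionally applies a tie correction):

```python
# Kruskal-Wallis H = 12/(N(N+1)) * sum(R_i^2 / n_i) - 3(N+1),
# where R_i is the rank sum of group i (hypothetical data, no ties).

def kruskal_h(groups):
    pooled = sorted(x for g in groups for x in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}      # 1-based ranks
    n = len(pooled)
    s = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n * (n + 1)) * s - 3 * (n + 1)

groups = [[19.4, 32.6, 27.0],      # hypothetical strain A
          [17.7, 24.8, 27.9],      # hypothetical strain B
          [7.8, 14.4, 10.1]]       # hypothetical strain C
print(round(kruskal_h(groups), 3))   # → 5.6
```

Under the null hypothesis H follows an approximate chi-square distribution with k - 1 degrees of freedom, which is where the df = 5 in the table comes from for six strains.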


– Correlation measures the strength of the linear relation between two variables; Pearson's r assumes interval data, while Spearman's rho operates on ranks.

o Example (Cholesterol.sav): Cholesterol levels [mg/ml] for 30 women from two US states, Iowa and Nebraska.

Age [years] may be a relevant covariate.

o SPSS output:

Correlations

cholesterol age

cholesterol Pearson Correlation 1 .688**

Sig. (2-tailed) .000

N 30 30

age Pearson Correlation .688** 1

Sig. (2-tailed) .000

N 30 30

**. Correlation is significant at the 0.01 level (2-tailed).

Correlations

cholesterol age

Spearman's rho cholesterol Correlation Coefficient 1.000 .749**

Sig. (2-tailed) . .000

N 30 30

age Correlation Coefficient .749** 1.000

Sig. (2-tailed) .000 .

N 30 30

**. Correlation is significant at the 0.01 level (2-tailed).
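Both coefficients above follow the same recipe: Pearson's r is the covariance scaled by the two standard deviations, and Spearman's rho is simply Pearson's r applied to the ranks. A sketch on hypothetical age/cholesterol pairs (not the Cholesterol.sav values):

```python
import math

# Pearson's r and Spearman's rho from first principles (hypothetical data).

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    def to_ranks(v):                  # no ties assumed in this sketch
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, 1):
            r[i] = rank
        return r
    return pearson_r(to_ranks(x), to_ranks(y))

age = [46, 52, 39, 65, 58]               # hypothetical ages [years]
chol = [3.5, 3.2, 2.9, 5.1, 4.6]         # hypothetical cholesterol [mg/ml]
print(round(pearson_r(age, chol), 2), round(spearman_rho(age, chol), 2))
```

Because rho uses only the ordering, it is less sensitive to outliers and to non-linear but monotone relationships than r.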


– Linear regression is the most widely used of all statistical techniques: it is the study of linear (i.e., straight-line)

relationships between variables, usually under an assumption of normally distributed errors

o Example (Cholesterol.sav): Cholesterol levels [mg/ml] for 30 women from two US states, Iowa and Nebraska.

Age [years] may be a relevant covariate.

o SPSS output:

Model Summaryb

Model R R Square Adjusted R Square Std. Error of the Estimate

1 .725a .526 .491 42.646

a. Predictors: (Constant), State1, age

b. Dependent Variable: cholesterol

ANOVAa

Model Sum of Squares df Mean Square F Sig.

1 Regression 54432.754 2 27216.377 14.965 .000b

Residual 49103.913 27 1818.663

Total 103536.667 29

a. Dependent Variable: cholesterol

b. Predictors: (Constant), State1, age

Coefficientsa

Model Unstandardized B Std. Error Standardized Beta t Sig.

1 (Constant) 93.141 24.799 3.756 .001

age 2.698 .496 .738 5.440 .000

State1 -28.651 16.541 -.235 -1.732 .095

a. Dependent Variable: cholesterol
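Underneath the Coefficients table, SPSS solves an ordinary least squares problem. The mechanics can be sketched by solving the normal equations (X'X)b = X'y directly; the data below are constructed (not Cholesterol.sav), and y follows an exact linear rule so the coefficients the solver should recover are known in advance:

```python
# OLS by solving the normal equations (X'X) b = X'y with Gaussian
# elimination. Each row of X is [1, age, state_dummy]; the intercept
# column of ones is included explicitly. Hypothetical, exactly linear data.

def ols(X, y):
    k = len(X[0])
    # augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] +
         [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):                        # elimination, partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            for j in range(c, k + 1):
                A[r][j] -= f * A[c][j]
    b = [0.0] * k
    for r in range(k - 1, -1, -1):            # back substitution
        b[r] = (A[r][k] - sum(A[r][j] * b[j] for j in range(r + 1, k))) / A[r][r]
    return b

X = [[1, 46, 0], [1, 52, 0], [1, 39, 1],     # [intercept, age, state dummy]
     [1, 65, 1], [1, 58, 0], [1, 44, 1]]
y = [10 + 2 * age - 5 * state for _, age, state in X]   # exact linear rule
print([round(v, 2) for v in ols(X, y)])      # → [10.0, 2.0, -5.0]
```

The state dummy plays the same role as State1 in the SPSS output: its coefficient is the shift in mean cholesterol between the two states at any fixed age.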


o Exercise (BrainSize.sav): Are the size and weight of your brain indicators of your mental capacity? In this

study by Willerman et al. (1991) the researchers use Magnetic Resonance Imaging (MRI) to determine the

brain size of the subjects. The researchers take into account gender and body size to draw conclusions about

the connection between brain size and intelligence. Willerman et al. (1991) conducted their study at a large

southwestern university. They selected a sample of 40 right-handed Anglo introductory psychology students

who had indicated no history of alcoholism, unconsciousness, brain damage, epilepsy, or heart disease. These

subjects were drawn from a larger pool of introductory psychology students with total Scholastic Aptitude Test

Scores higher than 1350 or lower than 940 who had agreed to satisfy a course requirement by allowing the

administration of four subtests (Vocabulary, Similarities, Block Design, and Picture Completion) of the

Wechsler (1981) Adult Intelligence Scale-Revised. With prior approval of the University's research review

board, students selected for MRI were required to obtain prorated full-scale IQs of greater than 130 or less

than 103, and were equally divided by sex and IQ classification. The MRI Scans were performed at the same

facility for all 40 subjects. The scans consisted of 18 horizontal MR images. The computer counted all pixels

with non-zero gray scale in each of the 18 images and the total count served as an index for brain size.

Variable Information:

Gender: Male or Female

FSIQ: Full Scale IQ scores based on the four Wechsler (1981) subtests

VIQ: Verbal IQ scores based on the four Wechsler (1981) subtests

PIQ: Performance IQ scores based on the four Wechsler (1981) subtests

Weight: body weight in pounds

Height: height in inches

MRI_Count: total pixel Count from the 18 MRI scans


Logistic regression, also called a logit model, is used to model dichotomous outcome variables. In the logit model the log odds of

the outcome is modeled as a linear combination of the predictor variables.

o Example (Logistic.sav): A researcher is interested in how variables, such as GRE (Graduate Record Exam

scores), GPA (grade point average) and prestige of the undergraduate institution, affect admission into

graduate school. The outcome variable, admit/don't admit, is binary. This data set has a binary response

(outcome, dependent) variable called admit, which is equal to 1 if the individual was admitted to graduate

school, and 0 otherwise. There are three predictor variables: gre, gpa, and rank. We will treat the variables gre

and gpa as continuous. The variable rank takes on the values 1 through 4. Institutions with a rank of 1 have

the highest prestige, while those with a rank of 4 have the lowest. Selected SPSS output is shown below.

o SPSS output:

Model Summary

Step -2 Log likelihood Cox & Snell R Square Nagelkerke R Square

1 458.517a .098 .138

a. Estimation terminated at iteration number 4 because parameter

estimates changed by less than .001.


Classification Tablea

Predicted

ADMIT Percentage

Observed Not admitted Admitted Correct

Step 1 ADMIT Not admitted 254 19 93.0

Admitted 97 30 23.6

Overall Percentage 71.0

a. The cut value is .500

95% C.I. for EXP(B)

B S.E. Wald df Sig. Exp(B) Lower Upper

Step 1a GRE .002 .001 4.284 1 .038 1.002 1.000 1.004

GPA .804 .332 5.872 1 .015 2.235 1.166 4.282

RANK 20.895 3 .000

RANK(1) 1.551 .418 13.787 1 .000 4.718 2.080 10.702

RANK(2) .876 .367 5.706 1 .017 2.401 1.170 4.927

RANK(3) .211 .393 .289 1 .591 1.235 .572 2.668

Constant -5.541 1.138 23.709 1 .000 .004

a. Variable(s) entered on step 1: GRE, GPA, RANK.

o Interpretation: From the output, the GRE, GPA, and rank variables are all associated with the response

variable (admitted or not). In logistic regression, Exp(B) is the key quantity for interpretation. For the GPA

coefficient, the odds ratio is obtained by raising e to the power of the logistic coefficient: OR = e^b = e^0.804 =

2.235. This means that a one-unit increase in GPA multiplies the odds of admission by about 2.235, holding the

other predictors constant.
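The arithmetic behind that interpretation can be checked directly from the coefficients in the table, along with a predicted probability for one hypothetical applicant (GRE 600, GPA 3.5, rank-1 institution). Note the printed odds ratio differs from the table's 2.235 in the last decimal only because the table rounds B to 0.804:

```python
import math

# Odds ratio and a predicted probability from the logistic coefficients above.
coef = {"const": -5.541, "GRE": 0.002, "GPA": 0.804, "RANK1": 1.551}

or_gpa = math.exp(coef["GPA"])        # odds multiplier per one unit of GPA
# hypothetical applicant: GRE 600, GPA 3.5, rank-1 undergraduate institution
logit = coef["const"] + coef["GRE"] * 600 + coef["GPA"] * 3.5 + coef["RANK1"]
prob = 1 / (1 + math.exp(-logit))     # inverse logit -> admission probability
print(round(or_gpa, 3), round(prob, 3))   # → 2.234 0.506
```

The inverse-logit step is why logistic coefficients are not directly interpretable as probability changes: the same one-unit shift in GPA changes the predicted probability by different amounts depending on where the applicant sits on the curve.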
