V.G SPSS 13 Video Guide

3/24/2005 10:42:00 AM SPSS 13 Video Guide 1
Text
A Video Guide to SPSS for Windows, Version 13

(Version G)
by Mark S. Saviano
Part I: Features of the SPSS Program
Welcome to the SPSS Video Guide. This Guide is designed primarily for use by
introductory students who are learning statistics and using SPSS for the first time. We also hope the
Guide is useful as a reference or refresher for experienced or advanced users of SPSS, including
users in industry or elsewhere.
The Guide is designed to familiarize you with the structure and features of the SPSS
program and the use of the program to conduct statistical analyses. It also provides some general
insight on how to manage and work effectively with data sets. It is important to note that this
Guide is not intended to teach statistics; that task is left to statistics textbooks and instructors.
Part I is designed to give you a thorough introduction to the basic structure and use of the
SPSS program.
Preliminary Lessons A and B demonstrate how to find and open the SPSS program on
your computer.
Chapters 1 to 3 cover very basic elements of SPSS.
Chapter 1 introduces the three main windows used by SPSS to hold data, review output,
and store programming code.
Chapter 2 demonstrates some useful ways to navigate between the three windows to work
more effectively.
Chapter 3 outlines some features common to all SPSS windows.
Chapters 4 to 6 give an in-depth look at the features and options within the SPSS
program.
Chapter 4 builds on Chapter 1, demonstrating the nature of each of the three primary
SPSS windows and how they are used.
Chapter 5 gives an overview of the menus available.
Chapter 6 gives an in-depth examination of each of the menus and the selections
they contain.
Chapter 7 outlines many useful skills for effectively managing your data.
Chapter 8 explains the Dialog Box, the basic tool for converting your intentions into
SPSS actions.
Preliminaries:
A. Running SPSS
Be sure that you are working on a computer that has the SPSS software. To open SPSS
using the menus, click on the Start button in the lower left corner of your screen. The style of the
Start Menu may vary from one computer to another, but there should be a listing for Programs,
All Programs, or something similar. Select Programs (or All Programs). Then select SPSS for
Windows. If SPSS for Windows is not showing, it is likely within one of the folders that is
showing. Open these folders, locate SPSS for Windows, and select it. Then click on SPSS 13
for Windows, or whichever version you are using. SPSS versions 11.5, 12, and 13 are fairly
similar, and this guide should work reasonably well with any of them. SPSS should open after a
short wait.
Try this now on your computer.
B. Entering SPSS
Starting SPSS opens a Data Editor window by default, shown here in the background. If
an SPSS for Windows dialog box asking "What would you like to do?" opened on your
computer, simply click the Cancel button to close the dialog box and go to the Data Editor.
Anything we could do from this dialog box can also be done from the main program.
1. Three Primary SPSS Windows
There are three main windows that we'll need to learn in order to work with SPSS: the
Data Editor, shown here, the Output Viewer, and the Syntax Editor.
1.1 Data Editor Overview
Here we are looking at the Data Editor, labeled at the top. This is the window that opens
automatically when we start SPSS and is more or less the heart of the program. The Data Editor
stores, in a matrix format, all of the data collected in your research study. We will
also use the Data Editor to transform or manipulate data as well as to run statistical tests or
analyses.
1.2 Output Viewer Overview
The second window we'll use in SPSS is the Output Viewer. We can open the Output
Viewer by clicking File, then clicking New, then clicking Output.
Pause this lesson now and try this on your computer.
Here we see the Output Viewer. This window will show the tables and graphs that result
from the statistical procedures we run and will also be used to organize your output and print
hard copies.
1.3 Syntax Editor Overview
The third window we will use in SPSS is the Syntax Editor. We can open the Syntax
Editor by clicking File, then clicking New, then clicking Syntax. We can do this from either the
Data Editor or the Output Viewer, whichever is the active window at the moment.
Pause this lesson now and try this on your computer.

Here we see the Syntax Editor. The Syntax Editor is essentially just a text editor that is
linked to the SPSS statistical procedures. In this window we can write text commands that will
run SPSS procedures the same way that choosing those procedures from a menu would. Some
instructors prefer to omit use of the Syntax Editor when first introducing students to SPSS. We
have included use of the Syntax Editor in this video guide because most frequent users of SPSS
will benefit from using syntax and because SPSS can be made to write most of the syntax for us.
2. Switching Between Windows
We will often want to switch back and forth between the Data Editor window, the Output
Viewer window, and the Syntax Editor window. It is useful to look at a couple of ways of
quickly switching the view from one of these windows to another.
2.1 Using the Taskbar
One way to switch views from one window to another is to click on the window you want
to see on the Taskbar at the bottom of the screen. Here we can see all three windows (the Data
Editor, the Output Viewer, and the Syntax Editor) represented on the Taskbar. To switch to the
Output Viewer we simply click on it. The Output Viewer becomes active on the screen and is
highlighted on the Taskbar. We can switch to another window the same way, by clicking on it:
the Syntax Editor, the Output Viewer, or the Data Editor. Note the unique symbol next to each
window name on the Taskbar, which helps us tell Data Editors, Output Viewers, and Syntax
Editors apart.
Try switching between windows, on your computer, using the Taskbar now.
2.2 Using the Window Menu
Another way to switch views from one window to another is to select the window you
want to see from the Window Menu at the top of the screen. Clicking on the Window menu
shows all of the SPSS windows that are currently open, with the active window marked with a
check. We can see that a Data Editor, an Output Viewer, and a Syntax Editor are all
open. To switch to the Output Viewer we simply highlight this option and select it. We can
switch to another window the same way, by highlighting and then selecting it: the Syntax Editor,
the Output Viewer, or the Data Editor. Note that this menu appears in all three windows for your
convenience.
Try switching between windows on your computer, using the Window menu, now.
2.3 Sizing and Placing Windows
Another way to switch views from one window to another is to click on a part of the
window you want to see. Here I have resized and organized the Data Editor, Output Viewer, and
Syntax Editor windows so that a part of each is always showing, no matter which window is
active at the moment. The Syntax Editor is now active. To show the Output Viewer we simply
click on any part of that window. We can switch to another window the same way, by clicking
on a part of it: the Data Editor, the Output Viewer, or the Syntax Editor.
Try switching between windows by sizing and placing the windows on your computer
now.
3. Common Window Features
The SPSS Data Editor, Output Viewer, and Syntax Editor windows share some common
features, features that are also common in many other Windows programs. The next several
lessons will introduce these features.
3.1 Title Bar
At the top of the window is the Title Bar. On the left side, the Title Bar shows the name
of the file this window displays as well as the type of SPSS window it is, in this case a Data Editor.
On the right side, the Title Bar contains the Minimize button, the Restore or Maximize button,
and the Close button.
3.2 Menu Bar
Below the Title Bar is the Menu Bar. The menus contain the actions or procedures we
want to perform. A single click on a menu heading will reveal a number of actions or
procedures associated with that menu. Clicking the menu heading a second time, or clicking
anywhere outside the menu contents, will close the menu.
3.3 Toolbar
Below the Menu Bar is the Toolbar, containing a number of toolbar buttons. Clicking a
button performs an action and acts as a shortcut for selecting that action from the
menus. The function of each button can be seen by resting the cursor over the button for a
second without moving it. Here we see the Open File, Save File, and Print buttons.
3.4 Status Bar
At the bottom of the screen is the Status Bar. This bar shows the status of procedures you
are running and whether you have filtered, weighted, or subdivided your cases in any way.
3.5 Common Features Summary
These features (a Title Bar, a Menu Bar, a Toolbar with buttons, and a Status Bar) are
common to all SPSS windows. Note that options can be set to hide the Toolbar and/or
the Status Bar, so it is possible that one or both may not be shown.
4. Unique Window Features
While all three SPSS windows have some features in common, each window also has
unique features suited for the function of that window. The next several lessons will introduce
these features.
4.1 Two Data Editor Views
The Data Editor has two separate views, Data View, for viewing raw data, and Variable
View, for viewing characteristics of our variables. We can switch between these views by
clicking the tabs at the bottom of the screen: Variable View, then Data View.
Try switching back and forth between these two views, on your computer, now.
4.1.1 Data Editor Data View
Here I have opened a data file. In the Data View of the Data Editor Window there is an
Active Data line below the Toolbar, and a grid of cells formed by the rows and columns.
The Active Data line shows which data cell is selected and what its contents are. The
gray portion on the left lists the row and column of the cell selected. In this example the cell in
row 1 and column ID is selected and the gray portion of the Active Data line shows this.
The white portion on the right lists what data has been entered into that cell. The selected cell
has the number 1 entered into it, and this is shown on the Active Data line. We can switch to
other cells simply by clicking on them, and the cell listing and contents on the Active Data line
will change accordingly. For example, the cell in row 5, column GRE_Q contains 760, and the
cell in row 9, column GRE_V contains 530.
The Data Cells are the main portion of the Data Editor Window. The data cells are
simply a matrix or grid of rows and columns, into which we can enter individual pieces of data.
The rows indicate cases or individuals we have measured, in this example person one, person
two, person three, and so on. The columns represent characteristics or variables we have
measured for each person, in this example participant ID, sex of participant, etc. Together, the
rows and columns create a place to enter a particular characteristic for a particular individual; for
example, the Quantitative score of Person 3 is 680. The whole matrix of cells holds all the
characteristics we have measured for all individuals in our study.
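For readers who think in code, the grid can be pictured as a list of rows, one per case, each holding one value per variable. The Python sketch below only illustrates that structure; it is not something SPSS runs. The column names come from the example file, but every value except Person 3's Quantitative score of 680 is made up, we assume that score lives in the GRE_Q column, and the `cell` helper is hypothetical.

```python
# Illustration only: the Data View grid as a list of rows (cases),
# where each row holds one value per column (variable).
columns = ["ID", "Sex", "GRE_Q", "GRE_V"]

# Made-up values, except Person 3's GRE_Q score of 680 from the example.
rows = [
    [1, 1, 620, 500],
    [2, 2, 700, 560],
    [3, 2, 680, 590],
]


def cell(row_number, column_name):
    """Return the value at a 1-based row number and a named column,
    roughly what the Active Data line reports for a selected cell."""
    return rows[row_number - 1][columns.index(column_name)]


print(cell(3, "GRE_Q"))  # 680
```

Every row has the same length as the list of columns, which is exactly why each cell corresponds to one characteristic of one individual.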
Try selecting different cells on your computer now and notice that the Active Data line
changes to list the cell you have selected.
4.1.2 Data Editor Variable View
Switch to Variable View by clicking the tab at the bottom of the window. In Variable
View of the Data Editor there is also a matrix or grid of rows and columns. The rows and
columns here, however, represent something very different from the rows and columns in the
Data View. Unlike the Data View, this grid represents only the variables, not every specific piece
of data.
The rows in Variable View represent the variables in our study, just as the columns did in
Data View. Switching between views, we can see that the variables ID, Sex, Program, and
Major are the columns in the Data View but become the rows in the
Variable View. The columns in Variable View represent specific characteristics of our variables,
not characteristics of the individuals in our study.
The purpose of Variable View is to allow us to quickly enter and define the characteristics
of the variables we are measuring in our study.
NAME shows the name of our variable, up to 8 characters long in SPSS v11.5 or earlier,
or up to 64 characters long in SPSS v.12 and later.
TYPE shows what kind of data will be entered into this variable. By far the most
common type of data is numeric data or numbers, though we could set the variable for several
other kinds of data, including string data (letters or text) and dates.
WIDTH shows the maximum number of digits or characters that a single piece of data for this
variable can contain.
DECIMALS shows how many decimal places to use and display for this variable.
LABEL allows us to create a label for our variable that is more descriptive than the
variable name. For example, variable labels may contain spaces or begin with a number whereas
variable names may not contain spaces and may not begin with a number.
VALUES allows us to indicate categories using a numeric code. For example, the
variable Sex has value labels of 1=male and 2=female. Coding data in this way allows us to
enter the data faster and keep the data set more organized.
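To see what value labels do, here is a small Python sketch using the 1=male, 2=female coding above, a made-up column of codes, and a hypothetical `show` helper. The stored data stay numeric; the labels only change what is displayed.

```python
# Illustration only: value labels map stored numeric codes to display text.
sex_labels = {1: "male", 2: "female"}

# Made-up codes as they might be typed into the Sex column.
sex_codes = [1, 2, 2, 1, 2]


def show(codes, labels):
    """Return the displayed text for each code; codes without a label
    are shown as the number itself."""
    return [labels.get(code, str(code)) for code in codes]


print(show(sex_codes, sex_labels))
# ['male', 'female', 'female', 'male', 'female']
```

Typing a single digit per case is faster than typing a word, and the labels restore readability when the data are displayed.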
MISSING allows us to set codes for data that is missing, such as non-response, refusal, or
measurement error.
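Declaring a missing-value code matters because, otherwise, the code would be treated as a real score. The Python sketch below illustrates the idea with an assumed code of 99 and made-up responses; it is not how SPSS stores missing values internally. Without excluding the 99s, the mean of these six responses would be about 35.7 instead of 4.0.

```python
# Illustration only: values matching a declared missing-value code
# are left out of computations.
MISSING_CODES = {99}  # assumed code, e.g. for "refused to answer"

responses = [4, 5, 99, 3, 99, 4]  # made-up data


def valid_values(values, missing_codes):
    """Keep only values that are not declared missing."""
    return [v for v in values if v not in missing_codes]


def mean_excluding_missing(values, missing_codes):
    """Average only the valid (non-missing) values."""
    valid = valid_values(values, missing_codes)
    return sum(valid) / len(valid)


print(mean_excluding_missing(responses, MISSING_CODES))  # 4.0
```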
COLUMNS indicates how wide the display for the variable will be in the Data View.
ALIGN indicates how the data for this variable will be aligned in the column in Data
View.
MEASURE allows us to set the scale of measurement for this variable: Scale, Ordinal, or Nominal.
4.2. Output Viewer
Below the Toolbar, the Output Viewer is divided into two sections by a vertical bar. The
left side of the bar is the Outline View and contains an outline or list of all of the results, much
like a Table of Contents. The right side of the bar is the Results View and shows in full all of the
actual graphs and tables generated by our procedures. Later in this Guide we will go over how to
run procedures and produce output. For now, let me quickly produce some output that we may
use as an example.
On the left, in the Outline View, we can see an outline of all of the results just produced.
Each line here is referred to as a book and uses an icon that looks like a small open book. On
the right, in the Results View, we can see the actual results in full.
Clicking on a book in the Outline View on the left will cause the output represented by
that book to appear in the Results View on the right. In this way, the Outline View helps us to
move quickly from one section to another as we look over our results. We can look at the
Statistics table, then look at the 3rd histogram at the end of our results, then return to the Statistics
table or to the Log.
Double-clicking a book causes the book to close in the Outline View and will hide the
corresponding output in the Results View. For example, double-clicking the Statistics book
closes the book and hides the Statistics table. Double-clicking again opens the book and restores
the Statistics table.
We can also rearrange the order of our output by clicking and dragging the books in
Outline View. For example, if we wanted the Histograms to appear before the Frequency Tables,
we could click and drag the set of Histograms to a new location.
On the right, in Results View, we can move through results using the window scroll bar.
Also, if you want to change the relative size of the Outline View and Results View sides, simply
click and drag the vertical bar that divides them.
4.3. Syntax Editor
Below the Toolbar, the Syntax Editor is more or less a text editor. A blinking cursor
shows our location in the editor and we can type in whatever we want. Like most text editors we
can use the arrow keys to move the cursor, double-click on a word to select it, click and drag
to highlight a section of text, or click and drag a highlighted section to a new location. Unlike
most text editors, there are not many ways we can format the text, other than changing the type
or size of the font. This is because the Syntax Editor is used to write short text programs that
command SPSS to run procedures. The look of the text is not really that important, so long as
the text is written correctly to run the SPSS procedure. And we'll see later that SPSS will write
most of the program text for us, making our jobs much easier.
5. The Menus Overview
Now that you understand the basic features and purposes of the Data Editor, Output
Viewer, and Syntax Editor windows, let's look at the list of menus, also called Menu
Selections, and the options they provide. These menus are where all the functions and statistical
procedures of SPSS are located and where you go to run these procedures.
Note, this guide does not go into detail on use of the toolbar buttons because every
command that can be executed with a toolbar button can also be executed from a menu or using
syntax. The toolbar buttons, however, can be a nice shortcut for menu commands that are used
frequently, and we encourage you to explore the use of the buttons as you become more familiar
with the program.
5.1. Common Menus
Most of the menu selections in the Menu Bars of the Data Editor, Output Viewer, and
Syntax Editor windows are the same. All three primary SPSS windows have the following basic
set of menus: File, Edit, View, Data, Transform, Analyze, Graphs, Utilities, Add-ons, Window
and Help. Further, the specific listings in the Data, Transform, Analyze, Graphs, Utilities, Add-
ons, Window, and Help menus are virtually identical in all three SPSS windows. This allows you
to choose most menu options from whichever SPSS window you are in, without having to switch
from one window to another, for example, from the Output Viewer to the Data Editor. The
listing within the File, Edit, and View menus may change somewhat from one window to another,
and when you choose a listing, for example Save File, the resulting action usually applies only
to the window you selected it from. For example, the Save File listing in the Output Viewer
window will save only the Output file, not the Data or Syntax files. To save the Data Editor
or Syntax Editor files, you must select the Save File command from within those windows.
5.2. Unique Menus
While most of the menus appear in all three SPSS windows, the Insert and
Format menus appear only in the Output Viewer window, not in the Data Editor or Syntax Editor
windows. Similarly, the Run menu appears only in the Syntax Editor window. The commands
in the Insert and Format menus apply specifically to your Output and you must be in the
Output Viewer window to use them. Similarly, the commands in the Run menu apply
specifically to your syntax and you must be in that window to use them.
6. The Menus Up Close
In the next several lessons, we'll take a close look at the options available in each menu.
We'll start by looking at the options in the common menus that are more or less identical in all
three SPSS windows. Then we'll look at the options in the Insert and Format menus, from
the Output Viewer window, and the options in the Run menu, from the Syntax Editor window.
6.1. File Menu
Like most Windows programs, the File menu has selections allowing you to create a new
file, open an existing file, save your file or a renamed copy of it, print your file or preview it,
open a file from a list of recently used files, or exit the program. These selections should
function very much as they would in other Windows programs. As a data analysis program, the
File menu in SPSS also includes selections for opening other types of data files, such as external
databases or text files, and a selection for displaying a comprehensive summary of the data in
your file. The Cache Data, Stop Processor, and Switch Server selections are for advanced users
working with very large data files or using data over a network connection. The differences
between the File Menu options here in the Data Editor and the ones in the Output Viewer and
Syntax Editor windows are small and we leave it to you to explore those on your own.
6.2. Edit Menu
Like most Windows programs, the Edit menu has options that allow you to Undo your
last action, Cut, Copy, or Paste data, Clear or Delete an entry, or Find a particular number, word,
or phrase. In the Output Viewer this menu also has an Outline option for organizing your output
and in the Syntax Editor window this menu also includes a Replace option for finding and replacing
particular words or phrases, usually variable names. Each of these commands functions very
much like the similar commands found in other Windows programs.
The Edit Menu also includes a selection called Options. This selection is often
overlooked by users of SPSS but is in fact very important and will be covered in the next lesson.
6.2.1. Edit>Options Dialog Box
Selecting Edit>Options from the menus brings you to a complicated dialog box that has
many tabs and many options on each tab. It is from here that you can control most of the user-
interface options in SPSS. That is, this is where you can customize how SPSS will look and feel
and some of the ways it will respond to your actions. If you come to use SPSS frequently you
will definitely want to explore these options and make changes that suit your style and needs. A
full exposition of the options in this dialog box is beyond the scope of this introductory guide.
Instead, we will highlight a few of the most useful and relevant options you should know.
On the General tab, it's worth noting the Variable Lists options. These control how
variables are displayed in the dialog boxes from which we run our tests. We have options for
displaying either the variable labels or the variable names and displaying the variables either in
alphabetical order or in the order in which they appear in the Data Editor. It's also worth noting
that SPSS supports a variety of languages.
On the Viewer tab, it's worth noting the check box for "Display commands in the log."
Selecting this check box will cause SPSS to print the syntax for any command you run at the
beginning of the resulting output. This is one nice way to have SPSS generate syntax for you
that may be copied to the Syntax Editor Window so that you do not have to create the syntax for
yourself.
The Output Labels tab allows you to choose whether variable names or variable labels are
used in your output as well as whether value labels or specific values are used in your output.
Finally I will note that a brief description of many of the options in the Edit>Options
dialog box can be seen by RIGHT-clicking on an item. I'll leave it to you to explore these
options further.
6.3. View Menu
The View menu allows you to control to some extent what is shown on your screen. In
the Data Editor window, the View menu will allow you to hide or show the Status Bar and the
Toolbar; for example, we can hide the Status Bar and show it again, then hide the Toolbar and
show it again. In
addition, the View menu allows you to control the font and point size of the data displayed, to
hide or show the gridlines or cell boundaries, and to show the value labels or simply the coded
entries. You may also switch back and forth between the Data View and Variable View tabs from
this menu.
In the Output Viewer window, the View menu similarly has options for showing or hiding
the Status bar and Toolbar, or for changing the font. There are also options for opening or
closing the books in the Outline View, or for changing the size of the outline in the Outline View.
In the Syntax Editor window, the View menu simply controls the Status bar, Toolbar, and
fonts.
6.4. Data Menu
The Data menu provides a number of options for working with your data set. Insert
Variable allows you to create a new column for a new variable, shown here on the left. Insert
Cases allows you to create a new row for a new individual, shown here at the top. And Sort
Cases allows you to sort the rows according to some column or variable. Here the data is sorted
by ID, but we could easily sort by quantitative GRE score, as shown here, or return to a sort by ID.
Note that even without this menu you can always add a variable by using the first unused column
to the right of your data set or add a case by using the first unused row at the bottom of your data
set. Note also that you can insert variables or cases, or sort, by right-clicking on the row or column
headings and choosing the desired option from the menu that results.
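Sorting moves whole rows, so each case keeps all of its own values. Here is a Python sketch of that idea with made-up scores, sorting by GRE_Q in ascending and descending order and then returning to a sort by ID; it illustrates the result, not how SPSS performs the sort.

```python
# Made-up cases; each dictionary is one whole row of the data set.
cases = [
    {"ID": 1, "GRE_Q": 620},
    {"ID": 2, "GRE_Q": 760},
    {"ID": 3, "GRE_Q": 680},
]

by_gre_q = sorted(cases, key=lambda case: case["GRE_Q"])  # ascending
by_gre_q_desc = sorted(cases, key=lambda case: case["GRE_Q"], reverse=True)
back_to_id = sorted(by_gre_q, key=lambda case: case["ID"])  # original order again

print([case["ID"] for case in by_gre_q])       # [1, 3, 2]
print([case["ID"] for case in by_gre_q_desc])  # [2, 3, 1]
print([case["ID"] for case in back_to_id])     # [1, 2, 3]
```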
The Split File option allows you to divide your data set into groups based on some
variable, for example, Sex, so that you can perform analyses separately for each group, for
example for males and for females. Similarly, Select Cases allows you to perform analyses on
some specific subset of your data.
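The difference between Split File and Select Cases can be sketched in Python with made-up data: splitting runs the same computation once per group, while selecting simply drops the cases that fail a condition. The helper names `split_means` and `select` are hypothetical, and the GRE_Q cutoff of 650 is an arbitrary example.

```python
from statistics import mean

# Made-up cases; Sex uses the codes 1 = male, 2 = female.
cases = [
    {"Sex": 1, "GRE_Q": 620},
    {"Sex": 2, "GRE_Q": 700},
    {"Sex": 2, "GRE_Q": 680},
    {"Sex": 1, "GRE_Q": 560},
]


def split_means(rows, group_var, value_var):
    """Split File idea: compute the same statistic separately per group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_var], []).append(row[value_var])
    return {group: mean(values) for group, values in sorted(groups.items())}


def select(rows, condition):
    """Select Cases idea: keep only the rows that satisfy a condition."""
    return [row for row in rows if condition(row)]


print(split_means(cases, "Sex", "GRE_Q"))  # mean of 590 for code 1, 690 for code 2
high_scorers = select(cases, lambda row: row["GRE_Q"] >= 650)
print(len(high_scorers))  # 2
```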
By default, SPSS gives equal weight to every case or individual when performing
computations. Weight Cases allows you to give some individuals greater weight in computations
than others if it is appropriate to do so.
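As a rough illustration, here is how weights change a mean when each weight is treated as a count of how many times the case appears. The scores and weights are made up and the `weighted_mean` helper is hypothetical. With these weights the case scoring 5 counts twice, so the weighted mean is 4.25 rather than the unweighted 4.0.

```python
def weighted_mean(values, weights):
    """Each case counts `weight` times, as if it appeared that many times."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


scores = [3, 4, 5]
weights = [1, 1, 2]  # made-up weights

print(weighted_mean(scores, weights))    # 4.25
print(weighted_mean(scores, [1, 1, 1]))  # 4.0 (equal weights = ordinary mean)
```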

Aggregate allows you to combine multiple rows into a single row using some
mathematical function, for example, the mean or standard deviation of the rows. This is not
typically used to get simple means or standard deviations for your data, but rather to create a new
higher-order data set, for example aggregating data for individuals into data for states, or data for
states into data for countries.
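Aggregating can be sketched in Python as grouping rows by a "break" variable (here, State) and producing one summary row per group. The state codes, incomes, and function name below are made up; the Aggregate dialog in SPSS offers many more summary functions than the mean used here.

```python
from statistics import mean

# Made-up individual-level data: one row per person, tagged with a state.
people = [
    {"State": "CT", "Income": 50},
    {"State": "CT", "Income": 80},
    {"State": "NY", "Income": 40},
    {"State": "NY", "Income": 60},
    {"State": "NY", "Income": 80},
]


def aggregate_mean(rows, break_var, value_var):
    """Collapse all rows sharing a break-variable value into a single row
    holding the group mean and the number of rows combined."""
    groups = {}
    for row in rows:
        groups.setdefault(row[break_var], []).append(row[value_var])
    return [
        {break_var: key, value_var + "_mean": mean(values), "N": len(values)}
        for key, values in sorted(groups.items())
    ]


for state_row in aggregate_mean(people, "State", "Income"):
    print(state_row)
```

The result is a new, smaller data set with one row per state, which is what "higher-order" means here.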
The Merge Files option allows you to combine your current data set with an additional
data file that contains either the same variables for additional individuals or additional variables
for the same individuals.
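The two kinds of merge described above (more individuals, or more variables for the same individuals) can be sketched in Python as stacking rows versus joining rows on an ID. All data and function names here are hypothetical; in the SPSS menus these appear as separate Merge Files choices, Add Cases and Add Variables.

```python
# Made-up files keyed by ID.
file_a = [{"ID": 1, "GRE_Q": 620}, {"ID": 2, "GRE_Q": 700}]
file_b = [{"ID": 3, "GRE_Q": 680}]                       # same variables, new people
file_c = [{"ID": 1, "GPA": 3.4}, {"ID": 2, "GPA": 3.8}]  # same people, new variable


def add_cases(first, second):
    """Stack rows: the same variables for additional individuals."""
    return first + second


def add_variables(first, second, key):
    """Join on a key: additional variables for the same individuals.
    Rows in `first` with no match in `second` are kept unchanged."""
    lookup = {row[key]: row for row in second}
    return [{**row, **lookup.get(row[key], {})} for row in first]


print(len(add_cases(file_a, file_b)))          # 3
print(add_variables(file_a, file_c, "ID")[1])  # {'ID': 2, 'GRE_Q': 700, 'GPA': 3.8}
```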
Define Variable Properties, Copy Data Properties, Define Dates, Transpose, and
Restructure are more advanced options for working with variable definitions or changing the
arrangement of your data set.
6.5. Transform Menu
The Transform menu generally creates a new variable based on some manipulation of the
existing variables. Compute allows you to create a new variable using some mathematical
function of existing variables, for example creating a GRE Total score by adding the verbal and
math scores. Recode allows you to create a new recoded variable when necessary, for example,
when you have positively and negatively worded survey items and need to reverse the order of
the negatively worded items to match that of the positively worded items.
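Both transformations are simple arithmetic applied to each case. In the Python sketch below the scores are made up, and the reverse-scoring rule assumes a 1-to-5 item scale, where a response x becomes 6 - x; a different scale would need different endpoints.

```python
# Made-up scores for three cases.
gre_v = [500, 560, 590]
gre_q = [620, 700, 680]

# Compute: a new variable from existing ones (GRE total = verbal + quantitative).
gre_total = [v + q for v, q in zip(gre_v, gre_q)]


def reverse_score(value, low=1, high=5):
    """Recode: flip a response on a low-to-high scale (1 -> 5, 2 -> 4, ...)."""
    return (low + high) - value


item = [1, 2, 5, 4]  # made-up answers to a negatively worded item
item_reversed = [reverse_score(x) for x in item]

print(gre_total)      # [1120, 1260, 1270]
print(item_reversed)  # [5, 4, 1, 2]
```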
Count will create a new variable that counts, for each case, how many of a chosen set of variables meet
a particular condition, for example the number of questions rated 5 on a teaching evaluation,
where the rows are the students and the columns are the questions, and the questions were on a 1
to 5 scale. Rank Cases will create a new variable that lists the rank of an individual in the whole
set of individuals, for example which case had the highest GRE score, the second highest, etc.
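Here is a Python sketch of both computations with made-up ratings and scores. The ranking helper gives rank 1 to the highest value, matching the GRE example above, and gives tied cases the average of the ranks they occupy; the Rank Cases dialog in SPSS lets you choose whether rank 1 goes to the smallest or largest value and how ties are handled.

```python
# Count: for one case (row), how many of the listed items equal a target value.
def count_matches(row, target):
    return sum(1 for value in row if value == target)


# Rank Cases sketch: rank 1 goes to the highest value; tied cases share
# the average of the positions they occupy.
def rank_descending(values):
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend `end` across any run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1  # mean of 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks


ratings = [5, 3, 5, 4, 5]  # one student's made-up answers on a 1-to-5 scale
print(count_matches(ratings, 5))              # 3
print(rank_descending([620, 760, 680, 760]))  # [4.0, 1.5, 3.0, 1.5]
```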
Replace Missing Values allows you to replace blank or missing data points in the data set
with a substitute value, when it is appropriate to do so. For example, a missing value could be replaced
with the mean of all individuals who responded.
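Mean substitution, the example just given, can be sketched in Python as follows, with `None` standing in for a missing value. The function name and data are hypothetical. The sketch returns a new list rather than overwriting the original, in keeping with the Transform menu's habit of creating a new variable.

```python
from statistics import mean


def replace_missing_with_mean(values):
    """Return a new list in which each None (missing) entry is replaced
    by the mean of the non-missing entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


scores = [4, None, 2, 6]  # made-up data; None marks a missing response
print(replace_missing_with_mean(scores))  # the None becomes 4, the mean of 4, 2, and 6
```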
Automatic Recode, Create Time Series, and Random Number Seed are for advanced or
special purpose functions and are less frequently used by introductory students.
6.6. Analyze, Window, and Add-Ons Menus
The Analyze menu is probably the most important of the menus and certainly the one you
will use most frequently. This menu contains all of the statistical tests and procedures you have
learned about, or will learn about, in your Introductory Statistics class. Most of the later lessons
in this Guide will focus on examining the options in this menu in detail.
The Window menu was described thoroughly in Chapter 2, in the Using the Window
Menu lesson.
The Add-ons menu contains a number of additional SPSS products that can be purchased
and used in conjunction with the Base product we are learning. In most cases, these products
will not be installed on your computer.
6.7. Graphs Menu
For many SPSS procedures, it is possible to produce graphs at the same time the
procedure is run. If you wish to produce a graph separately from any procedure, however, the
Graphs menu has a wide variety of graphing options for you to choose from. The graphs most
commonly used by statistics students are probably Histograms, Scatter Plots, and Box
Plots. The menu also has options for the more traditional Bar, Line, Area, and Pie graphs, as
well as several other types of graphs or plots.
6.8. Utilities Menu
The Utilities menu contains some useful tools that make working with SPSS easier. The
Variables option can be used to quickly review your variables and the characteristics of each.
The Data File Comments option simply allows you to enter some notes that will be attached to
your data file, for example, notes on how and where the data were collected.
Define Sets allows you to group some variables together into a set you define. For
example, we can group the variables Sex, Program, and Major, into a set called Descriptors. You
could also use Define Sets to group together all the items from a single survey. Grouping
variables into a set can be particularly helpful when you are working with large files with many
variables. Use Sets allows you to limit the variables you are working with to only the sets you
choose. For example, if we only wanted to work with the variables Sex, Program, and Major,
we could select only the set Descriptors, which we just created, and remove the two predefined
SPSS sets, All Variables and New Variables. All the variables will still appear in the Data Editor
but only Sex, Program, and Major, the variables in the set chosen, will be listed in the dialog
boxes when we choose procedures from the menus. Menu Editor is a tool that allows you to
customize the menus and options that appear in each SPSS window. In general, I recommend
working with the default menus provided by SPSS.
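Conceptually, a variable set is just a named list of variable names, and Use Sets narrows the list shown in dialog boxes to the variables in the chosen sets. The Python sketch below models that idea with variables from our example file (their order here is assumed); the function is hypothetical and says nothing about how SPSS implements sets.

```python
# Variables from the example file, in an assumed Data Editor order.
all_variables = ["ID", "Sex", "Program", "Major", "GRE_Q", "GRE_V"]

# A user-defined set, like the Descriptors set described above.
variable_sets = {"Descriptors": ["Sex", "Program", "Major"]}


def variables_in_use(chosen_sets, sets, data_order):
    """Return the variables that belong to any chosen set, in Data Editor order."""
    chosen = {name for set_name in chosen_sets for name in sets[set_name]}
    return [name for name in data_order if name in chosen]


print(variables_in_use(["Descriptors"], variable_sets, all_variables))
# ['Sex', 'Program', 'Major']
```

Note that `all_variables` itself is never changed, just as all the variables remain in the Data Editor.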
OMS Identifiers and Run Script are advanced options beyond the scope of this
introductory guide.
6.9. Help Menu
The SPSS Help menu has several tools to aid you in using SPSS and conducting your
analyses. Selecting Topics will take you to the online manual, which contains a complete listing
of SPSS's features. This option can be useful if you are looking for specific information about a
particular topic.
Selecting Tutorial runs a built-in tutorial provided by the makers of SPSS. The tutorial
presents general information on using SPSS, is structured for novice users, and introduces topics
in a careful, methodical way.
The Statistics Coach is an interactive tool that guides you to the test you need to perform
by asking you about your question and the nature of your data.
The Case Studies option is a wonderful tool if you have a specific analysis to perform.
You select the test and SPSS will show you the steps involved in a standard analysis, including a
concrete example with sample data and an explanation of the resulting output.
The Command Syntax Reference option opens an Acrobat Reader file that contains
complete information on how to write the syntax for every SPSS procedure. For most of the
work in this Guide, we will have SPSS generate the syntax for us. The Command Syntax
Reference information, however, may become particularly useful when you become an
intermediate or advanced user of SPSS.
The SPSS Home Page option is simply a link that connects you to SPSS's home page via
the internet. About gives specific information about the version of SPSS in use. And Register
Product connects you to the SPSS web page and allows you to formally register a newly
purchased and installed version of SPSS. This option will only be of use to you if you have
purchased SPSS for installation on your own personal computer.
6.10. Insert Menu
The Insert menu is only available when we are working in the Output Viewer window,
so we will switch to that window now. The Insert menu contains options that allow you to insert
or remove a page break or insert Headings, Titles, Page Titles, or plain Text. You can insert a
Text File or an Object such as a picture, a spreadsheet, or clip art, or use the Interactive options
to create a new graph from your data to insert into your output file.
SPSS v.12 contained a menu option allowing you to insert an old graph saved from
previous work, but that option has been removed in SPSS v.13, likely because of revisions made
between v.12 and v.13 to the Output system and programming.
6.11. Format Menu
The Format menu, like the Insert menu, is only available when we are working in the
Output Viewer window. This menu allows you to set the page alignment of an output item or
title for printing. Note: It is also possible to open an output item by double-clicking it.
When you do so, the Format menu will change and will have many more options for formatting
the item you are working with. In SPSS v.13, but not v.12, an additional Pivoting Trays window
and Formatting Toolbar will appear when you open the object. The Formatting Toolbar is
controlled by the Toolbar option in the View menu, while the Pivoting Trays window is
controlled by the Pivoting Trays option in the Pivot menu. Use of a Formatting Toolbar is
probably familiar to you from using other Windows programs. The Pivoting Trays window is
useful and will be described further in Chapter 9, Descriptive Statistics, in the lesson on OLAP Cubes.
To close an output item, simply click outside of the item, and it will return to its normal display.
6.12. Run Menu
The Run menu is only available from the Syntax Editor window, so we will switch to
that window now. Recall that the Syntax Editor window is used to write text commands that
run SPSS statistical procedures. The Run menu allows you to run all of the procedures you have
typed into the Syntax Editor, just those procedures you have highlighted with the cursor, just the
procedure at the current location of the cursor, or all the procedures from the current location of
the cursor to the end of the file.
7. Working With Data
Before you can run any statistical tests, you must first assemble your data set. Working
effectively with data is a skill in its own right, separate from skill in conducting statistics. The
following lessons are intended to give you a basic introduction to some of the most fundamental
skills needed to work effectively with data.
7.1. Opening SPSS Files
The most basic data skill is being able to open existing SPSS files. This allows you to
open the sample files included with SPSS, the practice files included with this Video Guide, files
created by your instructor, or any other existing SPSS files.
Starting from the empty Data Editor window, we click File>Open, then the kind of SPSS
file we want to open, in this case a data file. This opens a Windows dialog box used to locate
files on your computer. The folder shown here may vary from machine to machine,
depending on the default settings. Frequently, the default folder will be the
SPSS folder. It's worth noting that the SPSS folder contains a large number of sample data files
included with SPSS. You can open these files yourself, or they can be accessed through the
SPSS Tutorial or Case Studies help tools.
We will use the Up One Level button repeatedly until we reach the level that shows the
computer's disk drives. Open the SPSS Video Guide folder on the CD drive by double-clicking it.
Select the data file, Video Guide Practice Data 1.sav, and click Open. The sample data file
should be loaded into the Data Editor window. The process for Output files and Syntax files is
identical: File>Open>Output. The dialog box still shows the Video Guide folder. We select
the file, Video Guide Practice Output 1.spo, and open it. Similarly: File>Open>Syntax.
Select the file, Video Guide Practice Syntax.1.sps, and open it.
Note that it is possible to have multiple Output Viewer or Syntax Editor windows open
simultaneously, but you may only have one Data Editor window open at a time.
To make a point, we can go back to the CD folder and use the View menu's
Details view to see that VG Sample Output 1 is 51 kilobytes in size. In the same way, we can
see that VG Sample Syntax file 1 is only 1 kilobyte in size. Yet the output in the output file can
be regenerated from the syntax file in a second or two. This is one reason to learn
to use syntax: Output files can be regenerated from Syntax files, which are small and easily
stored and transferred, while Output files, which contain tables and graphic images, are quite
large and can fill up your disk or CD quickly.
Try opening the practice data, output, and syntax files from the Video Guide CD with
your computer now.
7.1.1. Recently Used Files
When analyzing data and running tests, you will often need to work with a
number of different files, whether data, output, or syntax. The File menu in SPSS makes it easy
to open or switch to files by using the Recently Used Data option to open data files or the
Recently Used Files option to open output or syntax files. Using these options may be faster than
using the Open option, especially when files are located in a number of different drives or
folders.
One word of caution: Be sure you know which drive you are working from. If you have
the same file saved on different drives, it is easy to confuse the file on one drive, say the floppy
drive, with the same file on another drive, say the CD drive. Confusing the drives can cause
recent or updated work to be lost. In the lessons in this Guide, we'll be using both the Open
option and the Recently Used options to open files.
7.2. Saving SPSS Files
Saving files in SPSS is an easy task and probably one you are familiar with from other
Windows programs. A few quick notes: First, remember that you MUST save the data file,
output file, and syntax file separately. New users of SPSS are often frustrated when they have
saved the output file, thinking it would save all their work, and then find that all the data they
entered is lost. Save your data from the Data Editor window: File>Save As..., choose a
location to save (I'll use a floppy disk), name your file, note the file extension .SAV for data,
and click Save. Save your output from the Output Viewer window: File>Save As..., choose a
location, name the file, note the extension .SPO for output, and click Save. Save your syntax from
the Syntax Editor window: File>Save As..., choose a location, name the file, note the
extension .SPS for syntax, and click Save. Because the Data, Output, and Syntax files have
the extensions .sav, .spo, and .sps respectively, it is possible to give all three files the
same filename if you choose.
A second note on saving: I use the following rule of thumb for deciding how often to save
my work. If you would be upset at losing everything you've done since the last time you saved,
it's time to save again.
Try saving the practice data, output, and syntax files on your computer now.
7.3. Exporting and Importing Data
The following two lessons describe the basic steps involved in moving data between
programs, either exporting data from SPSS to another program, or importing data from another
program into an SPSS data file.
7.3.1. Exporting Data
Sometimes it is desirable to transfer data between SPSS and another program, for
example a database or spreadsheet program. You should NOT have to reenter any data! In
general, once you enter something into a computer, you should never have to enter it by hand
again; rather, it should be transferable. SPSS is able to save and open files in several formats,
making it relatively easy to move data between programs. Let's look at an example of saving
data in another format.
Open the data set Practice Data 1. Select File>Save As.... By default, SPSS saves
data files in SPSS data file format, but there are many other formats we can use. If the
program you want to move the data to is listed, you can choose that format. Or we can choose
Tab Delimited (.dat). This is a generic format that is compatible with most data programs.
Choose the location to save, name the file, and save. We now have the data saved in a generic
format that can be opened in most other data programs.
Try saving Practice Data set 1 in tab-delimited format on your computer now.
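For readers curious about what a tab-delimited file actually contains, here is a minimal Python sketch (Python is used only for illustration; this is not SPSS) that writes a few cases the same way: variable names on the first line, one case per line after that, and values separated by tab characters. The variable names id, sex, and gretot1 and their values are hypothetical stand-ins, not the actual contents of Practice Data 1.

```python
import csv
import io

# Hypothetical cases standing in for Practice Data 1; the variable
# names and values are illustrative only.
variables = ["id", "sex", "gretot1"]
cases = [
    {"id": 1, "sex": 1, "gretot1": 1252},
    {"id": 2, "sex": 2, "gretot1": 1180},
]

def to_tab_delimited(variables, cases):
    """Return text with variable names on line 1 and one case per line after it."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=variables, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()      # line 1: the variable names
    writer.writerows(cases)   # line 2 onward: one row per case
    return buffer.getvalue()

text = to_tab_delimited(variables, cases)
print(text)
```

To write to disk rather than to a string, you would pass a file opened with open(path, "w", newline="") in place of the StringIO buffer. Opening the result in any text editor shows plain tab-separated columns, which is why most data programs can read it.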
7.3.2. Importing Data
Other data programs usually allow you to save data in tab-delimited format, which allows
you to move data from those programs to SPSS. Let's look at how SPSS can open tab-delimited
files by opening the tab-delimited file we just created. Select File>Open>Data. Select the
folder where the file is located. Note: Only files of the type listed will be showing. Change the
type to All Files to show all files in this folder. Select the tab-delimited file (.dat) and click
Open. SPSS runs an Importing Wizard to help you import the data. The wizard may look
complicated, but in most cases the default selections work fine.
First it asks whether you have already defined a format for opening this kind of data file. We
have not, so No is appropriate. NEXT. Our variables were saved in tab-DELIMITED format,
and our variable names should be at the top of the file. NEXT. The first case, or first row, of
data begins on line 2, which is correct. If we did not have variable names at the top of our file, data
would begin on line 1. Each line represents a case, which is correct, and we want to import all of
the data. NEXT. Our data was TAB-delimited, which is correct, and we do not have a text qualifier.
The Data Preview box shows us how SPSS will read the data based on the choices we've made
so far. If this preview didn't look correct, for example if our data was not saved in tab-delimited
format but we said it was, we could go back and change our choices in the wizard. NEXT. This
screen allows us to specify the type for each of our variables, or we could choose not to import
some variables. We'll accept the defaults. NEXT. We have finished telling SPSS how to import
our tab-delimited file, and we did not have to change any of the defaults. If for any reason the
defaults are not the same on your machine, just select the choices we've used here. This screen
allows you to save all of the choices we made in the wizard. If we saved this format, the next time
we wanted to import a similarly formatted tab-delimited file, we would not have to go through
the whole wizard; we would simply tell SPSS on the first window of the wizard to use the
choices we've made and saved here. I will not be saving our choices at this time, though.
FINISH. The tab-delimited practice data file 1 is opened and looks correct. It is always a good
idea to look over the imported data, though, to verify that it IS correct.
Try opening the tab-delimited Practice Data Set 1 on your computer now.
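The wizard's key choices (variable names on line 1, data beginning on line 2, one case per line, tab as the delimiter) can be mirrored in a short Python sketch. It illustrates the logic only and is not how SPSS implements the wizard; the function name read_tab_delimited, the V1, V2, ... default names for a file without a header line, and the use of None for blank cells are all assumptions made for this example.

```python
import csv
import io

def read_tab_delimited(text, names_on_first_line=True):
    """Parse tab-delimited text in which each line represents one case."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if names_on_first_line:
        names, data = rows[0], rows[1:]      # first case begins on line 2
    else:
        # Hypothetical default names when the file has no header line.
        names = [f"V{i + 1}" for i in range(len(rows[0]))]
        data = rows                          # first case begins on line 1

    def convert(value):
        # Blank cells become None (a stand-in for a missing value);
        # anything that parses as a number is treated as numeric.
        if value == "":
            return None
        try:
            return float(value)
        except ValueError:
            return value                     # leave non-numeric text as a string

    return names, [[convert(v) for v in row] for row in data]

names, data = read_tab_delimited("id\tsex\tgretot1\n1\t1\t1252\n2\t2\t\n")
print(names)  # ['id', 'sex', 'gretot1']
print(data)   # [[1.0, 1.0, 1252.0], [2.0, 2.0, None]]
```

If a comma-delimited file were passed to this function, each line would come back as one unsplit value. That is the kind of mistake the wizard's Data Preview box lets you catch before clicking FINISH.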
7.4. Merging Two Data Files
Sometimes you will need to combine the data from two SPSS data files. You might have
a second set of data that contains the same variables or measurements as does the first data set,
but contains additional individuals not included in the first data set. To merge two sets of data,
each with the same variables for two different groups of individuals, we can use Data>Merge
Files>Add Cases.
Alternatively, you might have a second set of data that contains additional
variables for the same individuals from the first data set. This often happens with longitudinal
studies, for example. To merge two sets of data, each with separate variables for the same
individuals, we can use Data>Merge Files>Add Variables.
7.4.1. Merging Cases
In Merging Cases, we want to add new individuals to our data set. Open Video Guide
Practice Data 1. This data set has 200 individuals, ID numbers 1 through 200, and for each
individual listed, the variables ID, Sex, Program, Major, and 3 sets of GRE scores. Open our
second data set, Video Guide Practice Data 2. This data set contains the same variables as the
first data set, ID, Sex, Program, Major, and 3 sets of GRE scores. However, looking at ID
numbers in Data View we can see that this data set contains new individuals that were not
contained in the first data set, specifically ID numbers 201 through 400. We want to combine the
two data sets so that we have data on these variables for all 400 individuals in one file. Since
SPSS will use the variable names to match the data, we MUST make sure that the variable names
are the same in both data sets before Merging Cases.
Let's begin from the first data set. Open Practice Data 1. To merge the two data sets, we
choose Data>Merge Files>Add Cases. Choose the correct file to merge; in this example, we
are adding the second practice data set to the first practice data set, which is currently open. The dialog box
shows, on the right, the variables for which it will add cases and, on the left, the variables for which it will
not add cases because the variable names do not match. We click OK, and the new
cases are added to the current data file. We can make sure the cases are in order by ID by
right-clicking the ID column heading and choosing Sort Ascending. All 400 cases are now in
one file. We can save this new combined data set as Practice Data 12, so that the name shows
the two files we combined, 1 and 2.
Try merging Practice Data Sets 1 and 2 on your computer now, and save the resulting
combined file of 400 cases.
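Conceptually, Add Cases stacks the rows of the second file under the rows of the first, pairing columns by variable name, which is why matching names matter. The Python sketch below illustrates that idea under simplifying assumptions: each file is a list of dictionaries, every case in a file has the same variables, and unpaired variables are simply dropped, as they are when left on the left side of the dialog. It is not SPSS's implementation, and the variable names id, sex, gre1, and extra are hypothetical.

```python
def add_cases(active, other):
    """Stack the cases of `other` below those of `active`.

    Only variables whose names appear in both files (the paired
    variables) are kept; the combined cases are then sorted by id.
    """
    other_names = set(other[0])
    paired = [name for name in active[0] if name in other_names]
    combined = [{name: case[name] for name in paired} for case in active + other]
    combined.sort(key=lambda case: case["id"])  # like Sort Ascending on the ID column
    return paired, combined

# Hypothetical miniature files; "extra" has no matching name in file1.
file1 = [{"id": 2, "sex": 1, "gre1": 1200}, {"id": 1, "sex": 2, "gre1": 1100}]
file2 = [{"id": 3, "sex": 1, "gre1": 1300, "extra": 7}]
paired, combined = add_cases(file1, file2)
print(paired)                              # ['id', 'sex', 'gre1']
print([case["id"] for case in combined])   # [1, 2, 3]
```

A variable spelled differently in the two files (for example, gre1 in one and gre_1 in the other) would be treated as unpaired and dropped, which is why the lesson stresses checking the names first.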
7.4.2. Merging Variables
Open the combined data set Practice Data 12 that we created in the previous lesson. Or,
if you don't have Practice Data 12, open Practice Data 1 and substitute that file everywhere
Practice Data 12 is mentioned in this lesson. Practice Data 12 has 400 individuals and, for each
individual, the variables ID, Sex, Program, Major, and 3 sets of GRE scores. Open Practice Data
set 3. We can see that Practice Data 3 contains ID numbers 1 to 200, that is, it contains data for
the same 200 individuals that were in Practice Data 1, or for the first 200 individuals in the
combined set, Practice Data 12. This new file, Practice Data 3, however, contains three
additional sets of GRE scores for these 200 individuals. We want to combine Practice Data 12
with this file, Practice Data 3, so that all 6 sets of GRE scores for the first 200 individuals are
contained in one file.
IMPORTANT! Before we can Merge Variables, BOTH files must contain a variable we
can use to match rows from the two files, AND both files must be SORTED by this variable
before merging, or the merge may not happen correctly. The matching variable must have the
same name in both data files. In this example, ID is the matching variable, so we will sort Practice Data 3 by ID
by right-clicking the column heading and selecting Sort Ascending. Then save the file.
Return to Practice Data 12, either by using the Open command or by using the Recently
Used Data command, sort the file by the ID column, then save the sorted file.
We are now ready to merge the two files. Select Data>Merge Files>Add Variables....
Choose the file you want to add (in this example, we're adding Practice Data 3 to the already
open Practice Data 12) and click Open.
On the right we see variables that will be included in our new data set, which combines
the data from Practice Data 12 and Practice Data 3. The variables with a star come from the
current file, Practice Data 12, the variables with a plus are being added from Practice Data 3. To
tell SPSS to match the rows for the same individuals in both files we select Match Cases. I
recommend keeping the default option, Both files contain cases. We must tell SPSS which
variable to use to match rows from the two files. We used ID, so we move ID into the Key
Variables box. Click OK, and SPSS gives us a warning about sorting. We sorted, so we click
OK, and the variables from our two sets are merged into the current file. We can see that the new
file contains variables for all six sets of GRE scores. Looking at the data itself, we can see that
the 4th, 5th, and 6th sets of GRE scores are missing for ID numbers 201-400. This is appropriate
since Practice Data 3 only had data for the first 200 ID numbers. Let's save this combined file as
Practice Data 123, so that the name shows which data sets are included in this file.
Try combining Practice Data 12 with Practice Data 3 on your computer now. Don't
forget to sort both files by the matching variable ID before combining.
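The reason both files must be sorted is easiest to see in code. The Python sketch below walks through two files in step, one row at a time, much as a match merge on a key variable does; if either file were out of order, rows for the same ID could be passed over without being paired. It illustrates the technique and is not SPSS's actual implementation: the function name, the list-of-dictionaries layout, the hypothetical variables gre1 and gre4, and the use of None for missing values are assumptions, and it assumes the two files share no variable names other than the key.

```python
def add_variables(active, other, key="id"):
    """Match-merge two files that are both sorted ascending by `key`.

    Every case from either file is kept; variables a case has no
    data for are set to None (missing). Assumes the files share no
    variable names other than `key`.
    """
    for data in (active, other):
        keys = [case[key] for case in data]
        if keys != sorted(keys):
            raise ValueError("both files must be sorted by the key variable")

    active_vars = [v for v in active[0] if v != key]
    other_vars = [v for v in other[0] if v != key]
    merged, i, j = [], 0, 0
    while i < len(active) or j < len(other):
        a = active[i] if i < len(active) else None
        b = other[j] if j < len(other) else None
        if b is None or (a is not None and a[key] < b[key]):
            b = None          # this case appears only in the active file
            i += 1
        elif a is None or b[key] < a[key]:
            a = None          # this case appears only in the other file
            j += 1
        else:                 # same key in both files: combine into one row
            i += 1
            j += 1
        row = {key: a[key] if a is not None else b[key]}
        for v in active_vars:
            row[v] = a[v] if a is not None else None
        for v in other_vars:
            row[v] = b[v] if b is not None else None
        merged.append(row)
    return merged

# Hypothetical miniature versions of the two files.
file_a = [{"id": 1, "gre1": 1100}, {"id": 2, "gre1": 1200}, {"id": 3, "gre1": 1300}]
file_b = [{"id": 1, "gre4": 1150}, {"id": 2, "gre4": 1250}]
for row in add_variables(file_a, file_b):
    print(row)
# {'id': 1, 'gre1': 1100, 'gre4': 1150}
# {'id': 2, 'gre1': 1200, 'gre4': 1250}
# {'id': 3, 'gre1': 1300, 'gre4': None}
```

As in the lesson, the case with no match in the second file keeps its row but has missing values for the added variables.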
7.5. Printing
While printing in SPSS is very similar to printing in other Windows programs, there are
some specific aspects to printing in SPSS that are worth reviewing, particularly aspects related to
printing portions of your output or customizing your output before printing.
7.5.1. Printing Data or Syntax
Printing the data file from the Data Editor or syntax file from the Syntax Editor is a
relatively straightforward process. To print the data, use the Data Editor window. File>Print
Preview shows what the data will look like when printed. File>Print... opens the printer
dialog box, and clicking OK prints your data file. View>Fonts... can be used to reduce the size
of the font before printing, which reduces the number of printed pages. Or, if you want to print only
part of the file, highlight the part you wish to print, select File>Print..., and use the Selection
option; however, it's not possible to preview when printing only a portion of the data file. You
may also switch to Variable View and use File>Print... to print your variable definitions.
To print the syntax file, use File>Print...>OK from the Syntax Editor window. Use View>Fonts... to
change the size of the font if desired. Or highlight a portion of the syntax file, choose
File>Print..., and use the Selection option to print only that portion.
Try printing a few selections from a practice data and syntax file on your computer now.
7.5.2. Printing Output
Printing your output from the Output Viewer is far more common than printing data or
syntax, and you have many more options for controlling what is printed. Frequently, you do not
want to print the entire SPSS output but rather only those portions necessary for your purposes.
By default, the Print command will use the selection option to print only the portions of the
output you have selected. The Outline View shows the currently selected section, in this
example the Statistics table, and Print Preview will show how this will look when printed. If you
want to print your entire file, you can click the Output heading to select the entire output. You
can click a procedure heading to print the results of one procedure or a subprocedure
heading to print one subprocedure. Holding the CTRL key on your keyboard while clicking, you
can select or deselect individual books to print, and Print Preview will show what only these
books will look like when printed. It is also possible to close some books before making your
selection in order to prevent those books from printing. If you are unhappy with the order of
your output, you can of course click and drag a book to a new location before printing.
As you can see, there are many ways to choose which portions of the output you want to print.
Try printing selections from Practice Output 1 on your computer now.
7.5.3. Customizing Output
In addition to choosing the portions of the output to print, you can change and customize
the output itself before printing. File>Page Setup... will take you to some controls for how the
page will look. Options... will take you to the Header/Footer tab, which allows you to insert text or
other information into the header or footer of your document. You can type text, change the font
or alignment of the text, and insert dates, times, page numbers, filenames, or page titles. The
Options tab allows you to control the size of the graphs in your output. You can increase the size
to emphasize a graph or decrease the size to save space. These options must be selected before
you run the procedure and produce the output.
The output objects themselves can also be edited. Double-clicking a text object will
open the object. Once the text object is open, you can add to or edit the text, format the text or
change the font, or insert a page break between lines of text. This works for the log, titles, or any
other text object.
Tables can be edited as well. Parts of the table can be selected or edited. And the look of
the table can be customized using the Format menu, or Formatting Toolbar, or the Pivot menu, or
Pivoting Trays window, which appeared when we opened the table.
Similarly, you can open and customize graphical objects, also called charts. Double-clicking
opens the chart in the Chart Editor window. Parts of the chart can be selected, and the
Edit, Options, and Elements menus can be used to make changes. Edit>Properties opens a
dialog box with many options for customizing the selected part of the chart. Or you can double-click
any part of the chart to open the properties dialog box for that part. A particularly nice
feature of the Chart Editor window is the ability to save a template for the chart formatting
changes you just made. Once saved, you can apply the chart template, containing all of the
changes you decided to make, by opening any default chart produced by SPSS and applying the
chart template. It is possible to save many different templates for displaying your charts in
different ways.
I encourage you to take some time and experiment with customizing the output objects in
Practice Output 1. The features used to customize can be learned quickly and the customizing
can have a huge effect on how your final output will look.
7.6. Entering Data
Entering data is an essential, but often tedious, part of the overall data analysis. Having
an efficient and effective strategy for data entry helps to save entry time and maintain the
accuracy of the data.
7.6.1. Defining Variables
If you are working on a project of your own, you may need to create a new data file and
enter your data by hand. I recommend first defining all of your variables and their characteristics
in Variable View, then entering the actual data in Data View. The most important characteristics
to define are the variable name, variable type, variable label, and, if necessary, value labels. You
can create variables quickly by typing a variable name and using the down arrow on your
keyboard. There are a few restrictions on variable names; for example, they cannot begin
with a number or contain spaces. I recommend keeping names as short as possible while still
making them descriptive. Often, a combination of letters and numbers works well. Keeping
variable names short helps to keep the column width in Data View manageable. After defining
variable names, you can change the variable types if necessary. By default, SPSS makes all
new variables numeric, and in the vast majority of cases your variables will be numeric. But if
you do have non-numeric data, you can change the type here. Defining variable labels allows
you to describe your variables more completely without cluttering up the variable names.
Defining value labels is only necessary for discrete variables but helps a lot when reviewing your
output. For example, if the data came from a 5-point anchored survey, you may want to enter the
anchors as value labels. Enter the value. Use the Tab key or mouse to move to the next box.
Enter the label. Then click Add. Repeat this process for every possible entry the discrete
variable could have. Then click OK to save the labels. You should check the other default
characteristics of the variables and make sure they are acceptable for your data. But in most
cases, defining the variable name, type, label, and value labels is adequate in preparing your file
for data entry.
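The two naming restrictions mentioned above, and the idea behind value labels, can be expressed in a few lines of Python. This sketch checks only the two rules stated in this lesson; SPSS enforces additional rules that are not modeled here. The function name name_problems and the survey anchors are hypothetical examples.

```python
def name_problems(name):
    """Check a proposed variable name against the two rules in this lesson only."""
    problems = []
    if name[:1].isdigit():
        problems.append("cannot begin with a number")
    if " " in name:
        problems.append("cannot contain spaces")
    return problems

# Value labels: each stored value maps to the anchor text shown in output.
# These anchors are hypothetical examples for a 5-point survey item.
value_labels = {1: "Strongly disagree", 2: "Disagree", 3: "Neutral",
                4: "Agree", 5: "Strongly agree"}

responses = [5, 3, 4]
print(name_problems("gre total"))            # ['cannot contain spaces']
print(name_problems("1gre"))                 # ['cannot begin with a number']
print(name_problems("gretot1"))              # []
print([value_labels[r] for r in responses])  # ['Strongly agree', 'Neutral', 'Agree']
```

The data cells still hold the numbers 1 through 5; the labels only change how those values are displayed, which is why they make output easier to read without affecting the analysis.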
7.6.2. Inputting Data
Once you have defined your variables, your data file is ready for you to input the actual
data. While speed is desirable, ACCURACY is THE MOST IMPORTANT priority when
entering your data. What seem like simple typos may not be noticed later and can have a
profound effect on the results of your statistical analysis. We are all human, and even the best of us
make occasional entry errors. Professional SPSS users always perform double checks on their
entry to eliminate mistakes.
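The Guide does not prescribe a particular checking method. One common approach (an assumption here, not a step from this Guide) is double entry: type the data twice, independently, and compare the two versions cell by cell. The Python sketch below shows that comparison; each inner list is one row, that is, one individual, and the function name and sample values are hypothetical.

```python
def find_mismatches(first_pass, second_pass):
    """Return (row, column, first value, second value) for every cell where
    two independent entries of the same data disagree (1-based positions)."""
    if len(first_pass) != len(second_pass):
        raise ValueError("the two passes have different numbers of rows")
    mismatches = []
    for r, (row1, row2) in enumerate(zip(first_pass, second_pass), start=1):
        if len(row1) != len(row2):
            raise ValueError(f"row {r} has a different number of values")
        for c, (v1, v2) in enumerate(zip(row1, row2), start=1):
            if v1 != v2:
                mismatches.append((r, c, v1, v2))
    return mismatches

first = [[1, 1, 1252], [2, 2, 1180]]
second = [[1, 1, 1252], [2, 2, 1108]]   # hypothetical transposed-digit typo
print(find_mismatches(first, second))   # [(2, 3, 1180, 1108)]
```

Each reported cell is then checked against the hardcopy to decide which entry is correct.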
Organizing your hardcopy data before entry will help to speed the entry process and
improve accuracy. Sometimes it is useful to use your finger, a ruler, or some other kind of aid to
help you keep your place as you move through the hardcopy data. Whether you enter the
variables horizontally, by individual, or vertically, by variable, will depend on what form your
hard copy is in. However, it is important to remember that SPSS expects that ALL data entered
on the same row is for the same individual. Entering data from more than one individual on the
same row will not cause any error messages, but it WILL cause the output from your procedures
to be totally meaningless. SPSS can only work correctly if you enter the data in the form it
expects, only one individual per row.
You can use the mouse pointer, the ENTER key, the TAB key, and the four ARROW keys
to move from data cell to data cell. In general, the keyboard is a faster technique than using the
mouse. The ENTER key will move the active cell down and is useful when you are entering data
by variable. The TAB key will move the active cell to the right and is useful when you are
entering data by individual. Using the SHIFT key and the TAB key together will move the active
cell to the left. The ARROW keys move the active cell in the direction of the arrow: up,
right, left, or down. They are useful for larger shifts in location in the data set, and the
Page Up, Page Down, Home, and End keys can also be useful. With experience, you
will find the entry techniques that are best suited for you.
Open a new, empty Data Editor file and practice defining a few variables and entering
some sample data. If you like, print some of Practice Data Set 1 and try recreating it.
7.7. Transforming Data
It is often necessary to manipulate the raw data gathered from an experiment before the
data is in the appropriate form to be analyzed. SPSS provides options for quickly and easily
making any needed manipulations to the data. Entering the data directly as it was gathered, then
using these options to transform it, is generally the fastest and most accurate way to transform the
data into the form needed. The next two lessons describe these options for transforming data.
7.7.1. Transform>Compute
Once you have entered your data, you may wish to create new variables by combining the
existing variables in some way. Open Practice Data 1. This practice data set has three variables
for GRE Total scores; that is, each individual in this set took the GRE three times. Let's
say we wanted to create a fourth variable representing each person's average Total score over all
3 tests. Transform>Compute opens a dialog box that allows you to create a new variable by
combining the existing variables in some mathematical function. Target Variable is the name we
want to give to the new variable SPSS will create to hold the results of our calculation. We can
label this variable Average GRE and confirm that the type is numeric. Continue. In the Numeric
Expression box, we can create our formula. We can use simple mathematical operators or choose
from a large number of predefined functions included with SPSS. In SPSS v.13, groups of
similar functions are listed in the top box, and clicking a group displays the specific functions
available in the lower box. In versions 11.5 and 12, there is only a single box listing all available
functions. We want the average GRE score, so we will add the three scores and divide by three.
We will need parentheses around the addition in the numerator: (GRE total 1 + GRE total 2 + GRE
total 3). Move the cursor outside the parentheses and divide by 3. Check to see that the
formula and variable name are correct for what you want. Then click OK.
We can see that SPSS has created our new variable, Average GRE, and placed it at the
end of our data file. 1263.33 should be the average of person 1's three total scores, 1252, 1278,
and 1260, and it is.
Using the Transform>Compute command in this way, we can create whatever new
variables we want from any mathematical combination of the existing variables. Note that this
can be a way to save data entry time. I entered the math and verbal scores when I created
Practice Data 1 but used TransformCompute to quickly create the total scores.
Try experimenting with the Transform>Compute command to create some new variables
from those existing in Practice Data 1 now.
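The arithmetic behind the Numeric Expression can be checked by hand or with a short Python sketch (Python here serves only as a calculator; it is not SPSS syntax). It uses person 1's three Total scores quoted above and also shows why the parentheses matter: without them, only the third score would be divided by 3. The function name average_of_three is hypothetical.

```python
def average_of_three(total1, total2, total3):
    """Mirror the expression (total1 + total2 + total3) / 3."""
    return (total1 + total2 + total3) / 3

with_parens = average_of_three(1252, 1278, 1260)
without_parens = 1252 + 1278 + 1260 / 3   # only the last score is divided

print(round(with_parens, 2))   # 1263.33
print(without_parens)          # 2950.0
```

The result without parentheses, 2950, is larger than any single GRE Total score, which is a quick sign that a formula has gone wrong.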
7.7.2. Recoding Data
For a variety of reasons, variables often need to be recoded. For example, let's say we
wanted to recode the Sex variable to use negative one, instead of one, to represent males, and one,
instead of two, to represent females. There may be some complicated formula that would allow
us to make these changes with the Transform>Compute command, but the Transform>Recode
command can accomplish this very simply. When using Transform>Recode, I recommend
always choosing Into a Different Variable. Having an extra variable in your data set is usually
not a problem, whereas choosing Into the Same Variable will overwrite your existing data,
making it impossible to retrieve.
We need to choose the variable we want to recode and create a new variable to hold
the recoded values. We want to recode Sex, and we will call our new variable sexr1 with the
label Sex Recode 1. Clicking Change confirms our choice for the new variable. The Old and
New Values button allows us to tell SPSS how to recode the data. The old value of 1 for males
needs to be changed to a new value of negative 1 for males. We click ADD to confirm these
values. Our old value of 2 for females needs to be changed to a new value of 1 for females.
ADD confirms these values. Changing the values 1 and 2 should be sufficient to recode our sex
variable. As a safeguard, however, it is often useful to specify that all other values are either
copied unchanged or set to system-missing. When our recode choices are complete, we click
Continue, double-check our variables, then click OK.
We see that our new variable, Sex Recode 1 has been created and appears to contain the
correct values.
Try recoding the Sex, Program, or Major variables from Practice Data 1 on your
computer now.
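Recoding is essentially a lookup from old values to new values. The Python sketch below mirrors the Sex example recoded into a different variable: 1 becomes -1, 2 becomes 1, and every other value becomes None, standing in for system-missing (the "all other values" safeguard). Because it builds a new list rather than changing the old one, the original data stays recoverable, which is the point of choosing Into a Different Variable. This is an illustration, not SPSS's implementation; the function name and the stray value 9 are hypothetical.

```python
# Old value -> new value, as entered in the Old and New Values dialog.
SEX_RECODE = {1: -1, 2: 1}

def recode_into_different_variable(values, mapping):
    """Return a new list of recoded values; the original list is not modified.
    Values not listed in `mapping` become None, a stand-in for system-missing."""
    return [mapping.get(value) for value in values]

sex = [1, 2, 2, 1, 9]   # hypothetical data; 9 represents an entry error
sexr1 = recode_into_different_variable(sex, SEX_RECODE)
print(sexr1)   # [-1, 1, 1, -1, None]
print(sex)     # [1, 2, 2, 1, 9]
```

Sending unexpected values to missing, rather than copying them unchanged, makes an entry error like the 9 stand out when you review the new variable.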
7.8. Generating and Saving Syntax
Running SPSS procedures by selecting options from the menus and dialog boxes is
always acceptable. But using Syntax offers several advantages over using menus. For example,
syntax can be saved and used again with other data files. Experienced users often create a
template for the procedures they use the most and just change the variable names in the syntax
file when they apply it to a new data set. Menu choices cannot be saved and must be reentered
for other data sets. Also, highlighting and running syntax is faster than making selections from
the menus. It's much easier to change a bit of text in the syntax file than it is to go through all the
buttons in the dialog boxes. Most users of SPSS who don't use syntax avoid it because they find
writing the syntax tedious and troublesome. In this guide, we'd like to emphasize the ways in
which SPSS can be made to write the syntax, or at least most of it, for us.
7.8.1. Pasting Syntax
The first way to make SPSS generate the syntax for a procedure is to use the Paste button
after making your selections from a dialog box. In the Transform>Compute lesson, we wrote a
Text
formula to compute the average GRE Total score from the first three GRE Total scores. Lets
examine that process again.
Open Practice Data 1, the original version before we computed AveGRE. We can see that
no AveGRE variable appears in the Data Editor. Switch to the Syntax Editor.
Select Transform>Compute from the menus. The dialog box is already filled in,
with our new variable and label on the left and our formula on the right. Earlier, we clicked OK
to run the procedure and create the new variable from the formula. This time we will click the
Paste button instead. The syntax is then pasted into the Syntax Editor.
We can switch to the Data Editor to confirm that the Paste button did not create our
new variable; it only pasted the syntax. We can then switch back to the Syntax
Editor, highlight the syntax for creating AveGRE, and select Run>Selection from the menus.
AveGRE has been computed and now appears in the Data Editor. This time, however, it was
produced from syntax, not from the dialog box.
Virtually all SPSS procedures can be pasted in this way and subsequently run from the
saved syntax file.
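Pasted syntax makes the formula behind AveGRE visible, and how that formula is written matters
when data are missing. In SPSS, an arithmetic expression such as (a + b + c) / 3 returns
system-missing if any of its values is missing, while the MEAN function averages whichever values
are present. The Python sketch below imitates those two rules; it is an illustration only, the
function names and scores are hypothetical, and None stands in for system-missing.

```python
def ave_arithmetic(a, b, c):
    """(a + b + c) / 3: like SPSS arithmetic, any missing value makes the result missing."""
    if None in (a, b, c):
        return None
    return (a + b + c) / 3


def ave_mean_function(*scores):
    """Like the SPSS MEAN function: average whichever values are not missing."""
    valid = [s for s in scores if s is not None]
    if not valid:
        return None
    return sum(valid) / len(valid)


print(ave_arithmetic(1200, 1300, 1250))     # 1250.0
print(ave_arithmetic(1200, None, 1250))     # None
print(ave_mean_function(1200, None, 1250))  # 1225.0
```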
Try creating the syntax for the average GRE formula by using the paste button on your
computer now.
7.8.2. Copying Syntax
A second way to make SPSS generate the syntax for a procedure is by copying the syntax
from the Log in the Output Viewer. To activate the log, select Edit>Options, select the
Viewer tab, and check the Display commands in the log checkbox. Then click OK to save
these preferences and exit the Edit>Options dialog box. Now when we run a procedure, whether
from the menus or from syntax, the syntax for that procedure will be shown in the Log portion of
the Output Viewer. Let's delete the variable AveGRE and run our formula from the menus again:
Transform>Compute, the formula is ready, OK. Switching to the Output Viewer, we can see
a book called Log in the Outline View. Clicking that book shows the actual syntax in the
Results View. With the log selected, we can choose Edit>Copy, switch to the Syntax Editor,
place the cursor where we want, and choose Edit>Paste. We have successfully copied the
syntax from the Output Viewer.
Try activating the Log, running a procedure, and copying the syntax from the Output
Viewer into the Syntax Editor on your computer now.
8. Dialog Boxes
In some of the previous lessons, we have mentioned and used Dialog Boxes.
Dialog Boxes are a central component in working with SPSS, and it's time we looked at them in a
little more detail.
Dialog Boxes, like the one seen here, provide an interface between you and SPSS; that
is, they provide a way for you to interact with the program. You make selections within the
Dialog Box, and when you click OK, SPSS converts those selections into programming code that
will run the procedure. Dialog boxes are used because they are a much more user-friendly way
of interacting with SPSS than learning a programming language.
Most Dialog Boxes in SPSS share a common set of features. At the top of the Dialog
Box is a title bar listing the name of the procedure for which this box is used. This Dialog Box is
used for the Frequencies procedure. On the left is a list of the variables from the Data Editor, in
this case the variables from Practice Data 1. In the center we see boxes representing variable
choices for our procedure, in this case a single box. On the right we see a set of program-related
buttons, and at the bottom we see a set of statistical option buttons, in this case three.
8.1. Variable List
The variable list on the left shows all of the variables from the Data Editor, or, if we have
defined Sets from the Utilities menu, all of the variables from the Sets we're working with. The
variable list shows the names of the variables and might also show the variable labels, as it does
in this example, showing the name ID and the label Participant ID. This setting is controlled
from Edit>Options, the General tab, the Variable Lists section. Checking Display names will
show only variable names in the dialog boxes. Checking Display labels will show both names
and labels.
8.2. Variable Choice Boxes
The variable choice boxes in the center allow us to choose which variables will be used in
our statistical procedure. What are the variables we want to analyze? In this example, the
Frequencies dialog box, there is only a single variable choice box, called Variable(s). The
number and type of variable choice boxes appearing within the larger dialog box will depend on
what statistical procedure we are conducting.
8.3. Program Buttons
Most dialog boxes have the same five Program Buttons: OK, Paste, Reset, Cancel, and
Help. As we have seen, the OK button runs the procedure once variables have been entered
into the necessary Variable Choice Boxes. The Paste button does not run the procedure but does
paste the syntax into the Syntax Editor. The Reset button will clear all the variable choice boxes
and any other changes you have made to the default settings. This is a quick way to start over
with the dialog box. The Cancel button will exit the dialog box without running the procedure
and return you to the last SPSS window you were working in. The Help button will connect you
to the built-in help files for SPSS. Specifically, it opens files that explain the components and
settings of the dialog box you are working from. When first running a procedure, it is often good
to click the Help button and read the resulting information.
8.4. Statistical Option Buttons
The statistical option buttons allow us to change settings for the procedure to customize
the way the procedure runs and the output it creates. In this example we see buttons that allow
us to choose which statistics we want to generate, what charts or graphs we want to produce, and
how the output will be formatted.
8.5. A Second Example
Here we see a second example of a dialog box, used for the Univariate procedure of the
General Linear Model. We see the familiar variable list on the left and the variable choice boxes
in the center. Notice that there are more boxes here because this is a more sophisticated
procedure. And we see the familiar program-related buttons and statistical option buttons,
though for some reason they have been transposed so that the program buttons are at the bottom
and the statistical option buttons are on the right. Don't let small inconsistencies like this
confuse you. We still have the same basic parts of a dialog box that were described previously.
Part II. Descriptive Statistics
Part II of this guide demonstrates how to use SPSS to generate the Descriptive Statistics
usually covered in an introductory Statistics class.
Chapter 9, covering descriptive statistics, includes a more detailed description of the data
set, and lessons showing how to use SPSS for scales of measurement, frequency distributions,
measures of central tendency and variability, and z-scores.
Beginning with Chapter 9, much of the discussion in The Guide presumes you have a
basic knowledge of statistics or are currently taking an introductory class. The lessons alone are
not sufficient to teach statistics to someone with no background in the subject.
9. Descriptive Statistics
All statistical analysis begins with a set of measurements we refer to as data. The goal of
Descriptive Statistics is to describe the characteristics of that data with as much detail and clarity
as possible while still being brief and parsimonious. Statisticians have found that a set of
measurements can be described thoroughly yet briefly by examining their mathematical nature,
central tendency, and distribution or variability. This chapter discusses some of the ways in
which SPSS can be used to describe these characteristics using Scales of Measurement, measures
of central tendency and variability, and tables and graphs of observed data.
9.1. The Data Set
Open the data set, Practice Data 1. This data is fictional and was designed strictly as a
teaching tool for statistics. The data is intended to represent 200 undergraduate students, 100
applying to medical school and 100 applying to graduate school in clinical psychology. GRE
stands for Graduate Record Examinations, a standardized test taken by undergraduates before
applying to graduate school, much as the SAT is taken by high school students before
applying to college. Program indicates which program the student is applying to, and Major
indicates the student's undergraduate major.
9.2. Scales of Measurement
Scales of Measurement are handled by SPSS in the Variable View of the Data Editor
under the heading Measure. Variables can be defined as Nominal, Ordinal, or Scale, which
represents both Interval and Ratio data. SPSS cannot tell you what the scale of measurement for
a variable is; you must tell SPSS, based on the way the variable was measured and the concept it
represents. By default, SPSS defines all newly created variables as Scale variables.
9.3. Frequency Distributions
The Frequencies procedure, located under Analyze>Descriptive Statistics>Frequencies,
can produce most of the desired descriptive statistics for variables, including frequency tables,
frequency graphs, percentiles, measures of central tendency, and measures of variability. The
single Variable Choice box allows you to choose as many variables as you would like for
analysis, though it's usually a good idea to analyze nominal variables separately from scale
variables.
9.3.1. Frequencies Procedure Options
Checking the Display frequency tables checkbox will produce, for each variable, an
ungrouped frequency table that includes the frequency, percent, and cumulative percent for each
category in the table. The frequency table can be ordered from low to high or from high to low
by choosing the Ascending values or Descending values option from the Format button. In addition
to producing a frequency table, the Frequencies procedure can produce a frequency
graph, either a histogram for continuous variables or a bar chart for nominal or ordinal variables,
by choosing the desired option from the Charts button. For histograms, it's possible to
superimpose a normal curve over your data.
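To see exactly what the percent and cumulative percent columns contain, here is a minimal Python
sketch of an ungrouped frequency table. It assumes there are no missing values (so Percent and
Valid Percent would be identical); the function name and the category codes are hypothetical.

```python
from collections import Counter


def frequency_table(values, descending=False):
    """Rows of (value, frequency, percent, cumulative percent), sorted by value."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for value in sorted(counts, reverse=descending):
        freq = counts[value]
        cumulative += freq  # running total of cases so far
        rows.append((value, freq, 100 * freq / n, 100 * cumulative / n))
    return rows


majors = [1, 2, 2, 3, 3, 3, 1, 2]  # hypothetical category codes
for row in frequency_table(majors):
    print(row)
# (1, 2, 25.0, 25.0)
# (2, 3, 37.5, 62.5)
# (3, 3, 37.5, 100.0)
```

Passing descending=True mirrors the Descending values option: the same rows in reverse order, with
the cumulative percent recomputed from the top.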
At the same time, the Frequencies procedure can produce, for each variable, the
desired measures of central tendency, measures of variability or dispersion, and/or percentiles by
using the Statistics button. For percentiles, you can choose the standard quartiles, cut points for
equal divisions such as every five or every ten percentage points, or the specific
percentile cutoffs you desire. For your output, you can choose to have all of the selected statistics
displayed in separate tables for each variable, OR combine statistics for all variables into
a single table, by selecting the desired choice from the Format button.
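Percentiles can be defined in several slightly different ways, so hand calculations may not match
your output exactly. As far as I know, the default definition used by Frequencies interpolates at
position (n + 1)p in the sorted data. Python's statistics.quantiles with method="exclusive" uses the
same (n + 1) rule (Python 3.8 or later), but treat that correspondence as an assumption and check
it against your own output. The scores are made up.

```python
import statistics

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # made-up scores

# Quartiles: the three cut points that divide the data into 4 equal groups.
quartiles = statistics.quantiles(scores, n=4, method="exclusive")

# Cut points for 10 equal groups, i.e. every ten percentage points.
deciles = statistics.quantiles(scores, n=10, method="exclusive")

print(quartiles)  # [2.75, 5.5, 8.25]
print(deciles)
```

For the 25th percentile here, the position is (10 + 1) x 0.25 = 2.75, so the value lies three
quarters of the way from the 2nd score (2) to the 3rd score (3), giving 2.75.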
9.3.2. Output
Let's run the Frequencies procedure and examine how the output looks. Open Practice
Data 1. Choose Analyze>Descriptive Statistics>Frequencies. Move the three GRE Total
variables into the variable choice box. Check the Display frequency tables box. From the
Statistics button, let's choose mean, median, and mode for central tendency; standard deviation,
variance, minimum, and maximum for variability; and quartiles for percentiles. From the Charts
button, we'll choose Histogram, since our variables are at least interval scale, and we'll include a
normal curve. From the Format button, Ascending order for frequency tables is acceptable, and
Compare variables condenses the output, so we'll keep both. Then run the procedure.
In the output, the Log shows the syntax for the variables and choices we selected. I will
copy and paste the syntax from the Log into Practice Syntax 1. The Statistics table shows the
three variables and each of our choices from the Statistics Button. It also shows the number of
cases or rows of data used to generate these statistics to help us confirm that we analyzed the
data we wanted to. Ungrouped frequency tables were produced for each of our three variables
showing frequency, percent, and cumulative percent. Also, histograms with a superimposed
normal curve were produced for each variable.
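The statistics chosen above can be checked by hand. This Python sketch uses the standard statistics
module on a handful of made-up GRE Total scores. Like SPSS, statistics.stdev and
statistics.variance use the sample (n - 1) denominator. One difference to keep in mind: when there
are several modes, SPSS reports the smallest, while statistics.mode returns the first one it
encounters.

```python
import statistics

scores = [1100, 1250, 1250, 1300, 1400]  # made-up GRE Total scores

summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "std_deviation": statistics.stdev(scores),  # sample SD, n - 1 denominator
    "variance": statistics.variance(scores),    # sample variance, n - 1 denominator
    "minimum": min(scores),
    "maximum": max(scores),
}
for name, value in summary.items():
    print(name, value)
```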
Recall that charts are customizable by double-clicking on them. You can use the
histogram to produce a grouped frequency table if you like. Select the x-axis values
and choose Edit>Properties. Create the desired grouped categories by customizing the category
width, or bin size, on the Histogram Options tab and/or by changing the range choices on the
Scale tab. To obtain the frequency value for each category, select the bars by clicking, then
choose Elements>Show Data Labels and click Close. The histogram can now be used to build a
grouped frequency table.
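The grouped frequency table you read off the customized histogram can also be built directly. This
Python sketch assumes equal-width intervals of the form [low, low + width), with a value on a
boundary counted in the higher interval; the histogram's own boundary rule may differ, and the
function name and scores are hypothetical. Empty intervals are kept so the table has no gaps.

```python
from collections import Counter


def grouped_frequencies(values, start, width):
    """Return (lower limit, upper limit, count) for intervals [low, low + width)."""
    # Interval index for each value: 0 for [start, start + width), 1 for the next, ...
    counts = Counter((v - start) // width for v in values)
    table = []
    # Walk every interval from the lowest to the highest occupied one,
    # so empty intervals appear with a count of 0 (Counter returns 0 for them).
    for k in range(min(counts), max(counts) + 1):
        low = start + k * width
        table.append((low, low + width, counts[k]))
    return table


scores = [1040, 1090, 1110, 1180, 1190, 1250, 1260, 1270, 1390]
for low, high, count in grouped_frequencies(scores, start=1000, width=100):
    print(f"{low} to under {high}: {count}")
```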
Note: For scale variables, I recommend deselecting the Display frequency tables
checkbox in the dialog box but keeping the statistics and histograms. For nominal variables, I
recommend keeping the Display frequency tables checkbox, omitting the statistics, and using bar
charts rather than histograms.
Practice using the Frequencies procedure to generate descriptive statistics for the
variables in Practice Data 1 on your computer now.
9.4. Subgroups
While the Frequencies procedure allows us to examine our variables with most of the
tools of descriptive statistics, it does not allow us to look at subgroups within a variable. For
example, we may want to divide the variable GRE Total 1 into male and female subgroups and
produce statistics separately for each group. SPSS offers other procedures for examining
subgroups.
9.4.1. Case Summaries
To examine subgroups of a variable, we can use Analyze>Reports>Case Summaries.
As with the Frequencies procedure, we can move the variables we want to examine into the upper
box, Variables, for example GRE Total 1. In the lower box, Grouping Variables, we enter the
variables we want to use to divide GRE Total 1 into subgroups before calculating statistics. For
example, using Program as a grouping variable tells SPSS to divide the values in the GRE
Total 1 column of the data set into two groups, Graduate and Medical, based on the entries in
the Program column of the data set. If you want to, you can see the divided data by looking at
the GRE Total 1 column after sorting the data by the Program variable.
Let's choose all three GRE Total variables to examine and use both Program and Sex as
grouping variables. Turn off the Display cases option to simplify the output, choose the
statistics Number of Cases, Mean, and Standard Deviation, accept the default options, and run
the procedure. The output consists of the Log, showing the syntax for the variables and choices
we made; the Case Processing Summary table, showing which data was processed in the procedure;
and the Case Summaries table, which shows the subgroup information we wanted. Each of our
target variables has been divided first by Program, Medical and Graduate, then again by Sex
into Male-Medical, Female-Medical, Male-Graduate, and Female-Graduate groups. We see the
chosen statistics for these four groups, as well as statistics for all Medical and all
Graduate students, or all males and all females. I will copy and paste the syntax for this
procedure from the Log to Practice Syntax 1.
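Under the hood, Case Summaries collects the target variable's values for each combination of
grouping values and then computes the requested statistics for each collection. This Python sketch
mimics that for Number of Cases, Mean, and Standard Deviation. The column names (program, sex,
gre_tot1), the function name, and the data are hypothetical; a standard deviation is returned as
None for a group with a single case, since it cannot be computed from one score.

```python
import statistics
from collections import defaultdict


def case_summaries(rows, target, *grouping):
    """Return {group key: (N, mean, SD)} for each combination of grouping values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[g] for g in grouping)
        groups[key].append(row[target])
    summary = {}
    for key in sorted(groups):
        values = groups[key]
        # A sample standard deviation needs at least two cases.
        sd = statistics.stdev(values) if len(values) > 1 else None
        summary[key] = (len(values), statistics.mean(values), sd)
    return summary


data = [
    {"program": "Medical", "sex": "Male", "gre_tot1": 1200},
    {"program": "Medical", "sex": "Male", "gre_tot1": 1300},
    {"program": "Medical", "sex": "Female", "gre_tot1": 1250},
    {"program": "Medical", "sex": "Female", "gre_tot1": 1350},
    {"program": "Graduate", "sex": "Male", "gre_tot1": 1100},
    {"program": "Graduate", "sex": "Female", "gre_tot1": 1150},
    {"program": "Graduate", "sex": "Female", "gre_tot1": 1250},
]

# The four Program-by-Sex subgroups, then the Program totals.
for key, stats in case_summaries(data, "gre_tot1", "program", "sex").items():
    print(key, stats)
for key, stats in case_summaries(data, "gre_tot1", "program").items():
    print(key, stats)
```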
Practice combinations of variables and statistics using the Case Summaries procedure on
your computer now. Try checking the Display cases box to see what happens.
9.4.2. OLAP Cubes
The OLAP Cubes procedure is located under Analyze>Reports>OLAP Cubes. This
procedure is essentially identical to the Case Summaries procedure, except that it can produce a
3-dimensional table that has depth, or layers, in addition to the two dimensions of rows and
columns. This allows you to isolate the data of interest readily for examination or for printing.
Let's use the same target variables, GRE Totals 1, 2, and 3; the same grouping variables, Program
and Sex; and the same statistics, Number of Cases, Mean, and Standard Deviation.
The visible table shows only the statistics for all of the cases in each variable. Activating
the table by double-clicking, we can see the statistics subdivided by Program only (Medical or
Graduate), by Sex only (Male or Female), or by combinations (Medical-Male, Medical-Female,
and so on). Think of each of these tables as being lined up, one behind the other, giving the
visible table depth. We can use the Pivot menu to show multiple layers of depth at the same
time by using the Move Layers to Rows or Move Layers to Columns options. For the
greatest flexibility of format, use the Pivoting Trays command. This window shows icons for
our variables (Program, Sex, Statistics, and Variables, our GRE Total scores) and table locations
for Rows, Columns, and Depth or Layers. To control what is placed where in the table, move the
icons into the desired row, column, or layer location. For example, this arrangement
duplicates our table from the Case Summaries lesson. Before exiting, I will save the syntax to
Practice Syntax 1.
Practice combinations of variables using the OLAP Cubes procedure on your computer
now. Experiment with the Pivot menu options to see what happens.
9.4.3. Explore
If you want to produce graphs for subgroups of a variable, instead of the Case Summaries
procedure you can use the Explore procedure, located under
Analyze>Descriptive Statistics>Explore. Like the Case Summaries procedure, the first box,
Dependent List, is for the target variables we want to examine, and the second box, Factor List,
is for the categorical variables we'll use to create subgroups. Explore can produce statistics if you
want it to, but there are few choices for customizing, and the format of the resulting output is far
inferior to the tables created by the Case Summaries procedure. Therefore, I recommend
displaying Plots only.
With the Plots button, we can choose our standard graph, the histogram. The Explore
procedure also offers two other kinds of graphs, stem-and-leaf plots and boxplots, if you
have learned about those graphs and want to use them.
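A stem-and-leaf plot splits each score into a stem (the leading digits) and a leaf (the next
digit). This Python sketch uses a fixed leaf unit of 10, so 1259 becomes stem 12, leaf 5, and the
trailing digit is dropped. SPSS chooses the stem width automatically, may split a stem across
several lines, and lists extreme values separately, so its display will not always match; the
function name and scores are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=10):
    """Return the lines of a simple stem-and-leaf display.

    Each value is divided by leaf_unit and truncated; the last digit of the
    result is the leaf and the remaining digits are the stem. With
    leaf_unit=10, 1259 -> 125 -> stem 12, leaf 5.
    Assumes non-negative whole-number scores.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        truncated = v // leaf_unit
        stems[truncated // 10].append(truncated % 10)
    lines = []
    # Include empty stems so gaps in the distribution stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return lines


scores = [1040, 1090, 1110, 1180, 1190, 1250, 1259, 1390]
for line in stem_and_leaf(scores):
    print(line)
```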
Making the choice of how to handle missing values is beyond the scope of this
introductory guide, so we'll leave that discussion to your statistics textbooks or instructors.
We'll simply accept the default here and run the procedure. Note that you can get
help with missing values from the Help menu in the main window: choose Topics, select the
Index tab, type "missing values" into the Find box, highlight the "in Explore" listing
in the results, and click the Display button.
Now, returning to the Explore procedure, we see the output consists of the Log showing the
syntax for our procedure, the Explore procedure, and subprocedures for our grouping variables,
Program Type and Sex. The subprocedures consist of the familiar Case Processing Summary,
histograms, stem-and-leaf plots, and boxplots for the two Programs and for the two Sexes for each
of our variables. Notice that the procedure used the two grouping variables, Program and Sex,
separately rather than combining them into, for example, Medical-Male and Medical-Female, as
the Case Summaries procedure did.
As usual, I'll copy the syntax before exiting.
In the syntax, it is possible to combine our grouping variables to produce graphs for the
Medical-Male, Medical-Female, and other combined subgroups by adding the keyword BY between
the grouping variables. This is not possible when using the dialog box and the menus, and it is one
example of how syntax is more powerful than menus and dialog boxes.
Practice using the Explore procedure to produce graphs and plots for subgroups on your
computer now. Try adding the word BY to your syntax and see what results.
9.4.4. Normality & Homogeneity
The assumptions of Normality of distribution and Homogeneity of Variance between
subgroups are common to most statistical tests. The Explore procedure also allows us to
examine these assumptions using the Normality Tests and Levene Test check boxes. The
resulting output shows the Normality and Homogeneity test results, with significant findings
(p < .05) indicating a violation of the assumptions.
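Levene's test checks homogeneity of variance by running a one-way ANOVA on each score's absolute
distance from its own group mean: if the groups are equally spread out, those distances should
have similar averages. This Python sketch computes the statistic W and its degrees of freedom for
this classic mean-centered version. The p-value, which SPSS reports for you, comes from the F
distribution with those degrees of freedom and is not computed here. The function name and data
are hypothetical.

```python
import statistics


def levene_w(*groups):
    """Levene's W (absolute deviations from group means) and its (df1, df2)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each score from its own group mean.
    z = [[abs(y - statistics.mean(g)) for y in g] for g in groups]
    z_means = [statistics.mean(zg) for zg in z]
    z_grand = sum(sum(zg) for zg in z) / n_total
    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(zg) * (zm - z_grand) ** 2 for zg, zm in zip(z, z_means))
    within = sum((zij - zm) ** 2 for zg, zm in zip(z, z_means) for zij in zg)
    w = (n_total - k) / (k - 1) * between / within
    return w, (k - 1, n_total - k)


w, df = levene_w([2, 4, 6], [1, 5, 9, 13])
print(w, df)  # compare w against the F distribution with df (1, 5)
```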
9.4.5. Split File
Data>Split File is another general procedure for examining subgroups of variables.
After you choose your grouping variables, the data file is essentially divided into sets
representing all possible combinations of the values of the grouping variables. After this is done,
ALL procedures selected will be applied to each of the created sets separately. For example, we can
use Frequencies to produce the mean, standard deviation, and histograms for GRE Tot 1. The
output shows results for the four sets created by combining Program and Sex. Important: If you
use this procedure, don't forget to turn Split File off (Data>Split File, Analyze all cases, do not
create groups) before conducting additional tests.
9.5. More Graphs
In addition to the ways discussed so far, the Graphs menu allows you to produce the
desired type of graph directly. In most cases you simply need to specify the variable you wish to
examine. For example: Bar Graph, choose a variable, OK. Line Graph, choose a variable,
OK. Box Plots, choose a variable and a grouping variable, OK. Error Bars, choose a
variable and a grouping variable, OK. Scatterplot, select the two variables whose relationship
you want to graph, OK. And Histogram, choose a variable, OK.
The Interactive option allows you to create more sophisticated graphs of the type desired.
For example: Histogram, drag the variable to graph onto the X-axis, drag any grouping
variables to the Panel Variables box, OK. Histograms on the same scale are produced for all the
combinations of the panel variables selected.
In SPSS 13, but not in versions 11.5 or 12, the graph dialog boxes include panel variable
selection boxes that allow you to create graphs for subgroups of your data. For example, using
Sex as a panel variable for scatterplots produces separate scatterplots for males and females.
Experiment with the different graphing choices and the interactive option to see which
graphs you can produce on your computer now.
9.6. Z-scores
The simplest way to produce z-scores for a variable is to use Analyze>Descriptive
Statistics>Descriptives. Choose the variables you want to create z-scores for, then select the
Save standardized values as variables checkbox. This will cause the procedure to calculate
z-scores for the selected variables and save them as new variables in the active data file. The
Descriptives procedure may also be used to produce descriptive statistics if you like. The output
shows any statistics you selected, and the data file now contains z-scores for the three GRE Total
scores as three new variables.
If you want to, try using Descriptives to create z-scores for one or more variables in your
data set. Pick a few cases and verify the z-score entry by calculating it by hand.
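To verify a z-score by hand, subtract the variable's mean and divide by its standard deviation. My
understanding is that Descriptives uses the sample standard deviation (n - 1 denominator), the
same one it reports, so this Python sketch does too; the function name and scores are hypothetical.

```python
import statistics


def z_scores(values):
    """z = (x - mean) / s, where s is the sample standard deviation (n - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]


scores = [1100, 1200, 1300]  # mean 1200, sample SD 100
print(z_scores(scores))  # [-1.0, 0.0, 1.0]
```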
Part III. Inferential Statistics: Mean Differences
Beginning in Part III of the Guide, we cover inferential tests of mean differences.
Chapters 10 to 12 cover t-tests. Chapter 10 covers the one sample t-test, the most basic
inferential test of mean differences. Chapter 11 covers the t-test for two independent samples
and Chapter 12 covers the t-test for two paired samples.
Chapters 13 to 15 cover the Analysis of Variance procedure, also called ANOVA. Chapter
13 covers the ANOVA test used for one between-groups factor. SPSS offers two procedures, one
specific and one general, that can be used for this test, and both will be demonstrated. Chapter 14
covers the ANOVA test used for two between-groups factors, and Chapter 15 covers the ANOVA
test used for one within-groups factor.
10. One-Sample T-Test
The One-Sample T-Test compares the mean of a sample to a known population mean to
test whether these two means differ significantly, that is, more than would be expected by
sampling variability or chance. In our example, we will compare the mean of GRE Total score 1
for our sample to a known population mean of 1000. Repeat each step of this process on your
computer with Practice Data 1 as we move through this example.
10.1. Running Procedure
Select Analyze>Compare Means>One-Sample T-Test from the menus. Move the
variable to be compared, GRE Total 1, into the Test Variable box and enter the known
population mean, 1000, into the Test Value box. The Options button allows us to change the
confidence interval percentage or the way the procedure will handle missing values. We'll accept
the defaults. Click OK to run the procedure.
10.2. Reading Output
The output for a One-Sample T-Test contains two tables: descriptive statistics for our chosen
variable and the results of the t-test comparison. The descriptive statistics table shows the
variable we chose to compare, GRE Total 1, the number of cases, the mean, the standard
deviation, and the standard error of the mean for that variable.
The results table for our t-test shows the variable we selected, GRE Total 1; the known
population value that we compared it to, 1000; the t-statistic, 39.543; the degrees of freedom,
199; the significance or probability value, less than .001; the mean difference, 259.4, or simply
1259.4 minus 1000; and the upper and lower bounds of a 95% confidence interval for the
difference between the means.
The t-statistic, 39.5, with 199 df, is significant at the p < .05 alpha level, indicating a
significant difference between our sample mean and the known population mean of 1000.
Looking at the descriptive statistics, we can see that our sample mean, 1259.4, is significantly
higher than the population mean of 1000.
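The one-sample t-statistic is the mean difference divided by the standard error of the mean:
t = (M - mu) / (s / sqrt(n)), with n - 1 degrees of freedom. This Python sketch computes it for a
few made-up scores; the function name is hypothetical, and the p-value, which requires the t
distribution, is left to SPSS.

```python
import math
import statistics


def one_sample_t(values, population_mean):
    """Return (mean difference, t, df) for a one-sample t-test."""
    n = len(values)
    mean_diff = statistics.mean(values) - population_mean
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return mean_diff, mean_diff / se, n - 1


scores = [1050, 1100, 1150, 1200, 1250]  # made-up GRE totals
print(one_sample_t(scores, 1000))  # mean difference 150, t about 4.24, df 4
```

Applied to the output above, the same relationship implies a standard error of about
259.4 / 39.543, or roughly 6.56, which should match the standard error of the mean in the
descriptive statistics table.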
10.3. Using Syntax
Copy the syntax for the One-Sample T-Test from the previous lesson to the Syntax Editor,
then clear the Output Viewer by deleting all output. The first line of syntax tells SPSS which
procedure to run: the one-sample t-test. The Variables subcommand tells SPSS which variables
to enter into the procedure: GRE Total 1. The Test Value subcommand tells SPSS the value of
the known population mean: 1000. The Missing subcommand tells SPSS how to handle
missing values, and the Confidence Interval Criteria subcommand tells SPSS how wide a
confidence interval to compute: a 95% confidence interval.
SPSS has created the basic syntax for us. If we want, we can make changes to our
procedure here, instead of in the dialog box, simply by editing the text. Create a copy of the
syntax to experiment with. We can add the variables gre_tot2 and gre_tot3 to the procedure and
change the population value to 900. Run the modified procedure by highlighting it and then
choosing Run>Selection. We now have test results for all three GRE Total variables, using a
comparison value of 900 instead of 1000. As procedures become more complex, making
simple text changes to the basic syntax SPSS has produced, as we did here, becomes much
faster than using dialog boxes to run procedures.
Experiment on your computer with other changes to the syntax for this test to see what
results you can produce.
11. Independent Samples T-Test
For the Independent Samples T-Test an independent variable creates two subgroups or
samples. The test compares the difference between the means of those two samples for some
dependent variable to a hypothesized difference of zero between the population means for that
variable for the two populations from which the samples came. A significant difference would
indicate that the sample means differ more than would be expected by sampling variability or
chance if the samples had actually been drawn from two populations with equal means for that
dependent variable.
Our example will compare the means of the male and female subgroups for the dependent
variable GRE Tot 1. Repeat each step of this process on your computer using Practice Data 1 as
we move through this example.
Select Analyze>Compare Means>Independent Samples T-Test from the menus. For
this test, one column of data in the data set holds values for both samples, and another column
defines the two samples. Move the variable that contains the data for both samples, GRE Total
1, into the Test Variable box. Move the variable that defines the two samples, Sex, into the
Grouping Variable box. Click Define Groups and tell SPSS which values were used in the
data set to represent the two groups. Accept the default options and click OK to run the
procedure.
The output for the Independent Samples T-Test contains two tables: descriptive statistics
for the two samples and the results of the t-test comparison. The descriptive statistics table should
be self-explanatory. The results table shows the Levene test, which tests for homogeneity of
variance between the two groups. The test is non-significant, p = .354, so homogeneity of variance
can be assumed, and we'll use the first row of the table as our results. If the test were significant,
p < .05, we would not assume homogeneity of variance, and we would use the second row of the
table as our results. We see the variable that contained data for both samples, GRE Total 1; a
t-statistic of -0.936 on 198 degrees of freedom; and a two-tailed significance of 0.350,
indicating no significant difference between the male and female groups. The mean difference,
-12.54 (its sign always matches the sign of t), is the male group mean minus the female group
mean and would be the numerator when calculating the formula by hand. The standard error of
the difference, 13.39, would be the denominator when calculating the formula by hand. Finally, we
have the upper and lower bounds of a 95% confidence interval for the difference.
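The two rows of the results table correspond to two ways of computing the standard error of the
difference. This Python sketch shows both: the pooled-variance version used when equal variances
are assumed (first row), and the Welch version, with its adjusted degrees of freedom, used when
they are not (second row). It subtracts the second sample's mean from the first's, following the
order of the groups in Define Groups; the function name and data are hypothetical.

```python
import math
import statistics


def independent_t(sample1, sample2):
    """Return ((t, df) equal variances assumed, (t, df) equal variances not assumed)."""
    n1, n2 = len(sample1), len(sample2)
    mean_diff = statistics.mean(sample1) - statistics.mean(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)

    # First row: pooled variance, df = n1 + n2 - 2.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se_pooled = math.sqrt(pooled * (1 / n1 + 1 / n2))
    equal = (mean_diff / se_pooled, n1 + n2 - 2)

    # Second row: Welch standard error and Welch-Satterthwaite df.
    a, b = v1 / n1, v2 / n2
    se_welch = math.sqrt(a + b)
    df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    unequal = (mean_diff / se_welch, df_welch)
    return equal, unequal


equal, unequal = independent_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(equal)    # pooled-variance t and df
print(unequal)  # same t here, because the two sample variances are equal
```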
11.3. Using Syntax
Copy the syntax for the Independent Samples T-Test to the Syntax Editor, then clear the
Output Viewer by deleting all output. The first and second lines of syntax tell SPSS which
procedure to run (an independent samples t-test), which variable defines the two groups (sex), and
which values from the data set define the groups (1 and 2). The Variables subcommand
tells SPSS which variable to enter into the procedure: GRE Total 1. The Missing subcommand
tells SPSS how to handle missing values, and the Confidence Interval Criteria subcommand tells
SPSS to create a 95% confidence interval.
Create a copy of the syntax to experiment with. Add the variables gre_tot2 and gre_tot3
to the procedure. Highlight and run the modified procedure with Run>Selection. A set of output
similar to the one just described is produced, this time for each of the three variables.
12. Paired Samples T-Test
For the Paired Samples T-Test an independent variable creates two samples that are
matched in some way, often by repeated measurements. The test compares the difference
between the means of those two samples for some dependent variable to a hypothesized
difference of zero between the population means for that variable for the two populations from
which the samples came. In the Paired Samples T-Test, unlike the Independent Samples T-test,
the two samples must be of equal size and each data point in the first sample must be linked,
either by matching or by repeated measurement, to a single data point in the second sample. A
significant difference indicates that the sample means for this dependent variable differ more
than would be expected by sampling variability or chance if actually drawn from two populations
with equal means for that dependent variable. In our example, time is the independent variable,
creating two samples: the first GRE testing and the second GRE testing. The dependent variable
is GRE Total score, that is, GRE Tot 1 for time 1 and GRE Tot 2 for time 2. Repeat each step of
this process on your computer using Practice Data 1 as we move through this example.
Select Analyze>Compare Means>Paired Samples T-Test from the menus. For this test,
one column of data in the data file is for sample 1 and another column of data is for sample 2.
SPSS differentiates the samples using the columns, not an additional variable. Move the
variables for sample 1 and sample 2 into the Paired Variables box. Accept the default options
and click OK to run the procedure.
The output for the Paired Samples T-Test contains three tables: descriptive statistics for
the two samples, the correlation between the two samples, and the results of the t-test comparison.
The descriptive statistics table should be self-explanatory. The correlation table shows the number of
cases in each sample, 200, the correlation between samples, 0.716, and the significance level of
that correlation, less than .001. The t-test results table shows the two samples compared, GRE
Total 1 and 2, the t-statistic, 0.102 with 199 degrees of freedom, and the two-tailed significance
of 0.919 indicating non-significant differences between the two GRE Total scores. The table also
shows the average, standard deviation, standard error of the mean, and a 95% confidence interval
for the 200 difference scores created by subtracting values in one sample from values in the other
for all 200 cases. The mean difference score would be the numerator and the standard error
would be the denominator for calculating the paired samples t-score by hand.
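To make the by-hand calculation concrete, here is a minimal sketch in Python (used here only for illustration). It assumes two equal-length lists of scores for the same individuals and uses made-up numbers rather than Practice Data 1; the function name `paired_t` is hypothetical.

```python
import math
import statistics


def paired_t(sample1, sample2):
    """Paired samples t-statistic built from difference scores.

    Returns (mean difference, standard error, t, degrees of freedom).
    """
    if len(sample1) != len(sample2):
        raise ValueError("paired samples must contain the same number of cases")
    # One difference score per case: first measurement minus second.
    diffs = [a - b for a, b in zip(sample1, sample2)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)                 # numerator of t
    se_diff = statistics.stdev(diffs) / math.sqrt(n)   # denominator: SE of the mean difference
    return mean_diff, se_diff, mean_diff / se_diff, n - 1


# Hypothetical GRE Total scores for five people tested twice.
time1 = [1100, 1200, 1050, 1300, 1150]
time2 = [1120, 1180, 1070, 1310, 1140]
mean_diff, se, t, df = paired_t(time1, time2)
print(f"mean difference = {mean_diff}, SE = {se:.2f}, t({df}) = {t:.3f}")
```

Fed the GRE Tot 1 and GRE Tot 2 columns instead, the same arithmetic should reproduce the mean difference, standard error, and t in the SPSS output, up to rounding.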
12.3. Using Syntax
Copy the syntax for the Paired Samples T-Test to the Syntax Editor, then clear the Output
Viewer by deleting all output. The first and second lines of syntax tell SPSS which procedure to
run, a paired samples t-test, and which variables to use, GRE Total 1 and 2. The Missing
subcommand tells SPSS how to handle missing values, and the Criteria subcommand tells SPSS
to create a 95% confidence interval.

Create a copy of the syntax to experiment with. Add the pair of variables gre_tot2 with
gre_tot3 to the procedure. SPSS pairs the variables in the order they appear, GRE Total 1 with 2,
and GRE Total 2 with 3. Run the modified procedure with Run > Selection. The procedure now
shows results for both pairs of variables, 1 with 2 and 2 with 3.
13. One-Factor ANOVA Between Groups
A One-Factor ANOVA Between Groups is analogous to an Independent Samples T-Test
that has been generalized so that more than 2 group means can be compared and tested for mean
differences. An independent variable creates two or more subgroups or samples. The test
compares the differences between the sample means for some dependent variable to a
hypothesized difference of zero between the population means for that variable for the
populations from which the samples came. That is, the hypothesis states that all the populations
have equal means for that variable. A significant result indicates that there is a greater difference
between at least one sample mean and at least one other sample mean than would be expected by
chance if the samples came from populations where the means were equal.
For ANOVA, mean differences and differences expected by chance are converted
mathematically into variances (mean squares). The variances are then compared in a ratio, the
F-ratio: variance due to mean differences divided by variance expected by chance. Under the
hypothesis of equal population means, the ratio should be close to one, indicating that the
variance due to mean differences is no larger than the variance expected by chance. A significant
result is a ratio much larger than one, indicating that the variance due to mean differences is
much larger than the variance expected by chance.
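The ratio described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each group is a plain list of scores; the function name `one_way_anova` and the numbers are hypothetical, and p-values (which require the F distribution) are left out.

```python
import statistics


def one_way_anova(groups):
    """Sums of squares, degrees of freedom, and F for a one-factor between-groups ANOVA."""
    scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(scores)
    k, n_total = len(groups), len(scores)
    # Variability due to mean differences: group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variability expected by chance: scores around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio


# Three hypothetical groups of GRE Total scores (e.g. three majors).
groups = [[1000, 1100, 1200], [1100, 1200, 1300], [1200, 1300, 1400]]
print(one_way_anova(groups))  # (60000, 60000, 2, 6, 3.0)
```

Here the between-groups mean square (30000) is three times the within-groups mean square (10000), so F = 3.0.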
In this example, the independent variable that defines the groups to be compared is
Student Major, either 1=Biology, 2=Neuroscience, or 3=Psychology. The dependent variable on
which these groups will be compared is GRE Total 1. SPSS provides two different procedures
we can use to conduct a One-Factor ANOVA Between Groups: the One-Way ANOVA
procedure and the Univariate General Linear Model procedure. Repeat the steps demonstrated
here on your computer using Practice Data 1 as we move through this example.
13.1. One-Way ANOVA
Using the One-Way ANOVA procedure is simpler than using the General Linear Model
procedure and is the preferred procedure when there is only a single independent variable to be
tested and no covariates. For the single-factor case, both the dialog box and the output of this
procedure are more parsimonious than those of the more general GLM procedure.
13.1.1. Running Procedure
Select Analyze > Compare Means > One-Way ANOVA from the menus. For this test, one
column of data in the data set holds values for all samples and another column defines the
samples. Move the variable that contains the data for all samples, GRE Total 1, into the
Dependent List box. Move the variable that defines the samples, Major, into the Factor box.
The Contrasts button allows us to look for polynomial trends in our data or to define
specific contrasts among the levels of our independent variable. In this example, we will not be
making any selections here.
The Post Hoc button allows us to select post hoc comparisons of means for the levels of
our independent variable to determine specifically which means differ from which others. The
box contains a wide selection of post hoc tests, some assuming homogeneity of variance and
others not requiring this assumption. I'll assume homogeneity and choose the Tukey post hoc
test with a .05 significance level.
In the Options button, we'll choose to produce Descriptive Statistics, a Homogeneity of
Variance test, and a Means plot.
Then we can run the procedure.
13.1.2. Reading Output
The ANOVA summary table is the basic output for this procedure. The Multiple
Comparisons table and the Homogeneous Subsets table were generated because we selected the
Tukey post-hoc test from the Post Hoc button. The Descriptive Statistics table, Homogeneity of
Variance table, and Plot of Means were generated because we selected these options from the
Options button.
The Log shows the syntax for this procedure with the options we selected. I'll copy this
syntax into the Syntax Editor. The Descriptive Statistics table shows some basic statistics for the
data from GRE Total 1, broken down by the subgroups defined by Major. Most SPSS tests allow
you to generate some descriptive statistics. Whether you use these or produce descriptive
statistics separately using one of the other procedures we've covered is a choice you'll make as
you become more familiar with the program.
The Homogeneity of Variance table shows the results of Levene's test of homogeneity. The
significance level, .302, is greater than .05, so the test gives no evidence against our assumption
of homogeneity of variance.
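Levene's test is itself a one-way ANOVA run on each score's absolute distance from its own group mean. The sketch below assumes that mean-centered form of the statistic (a median-centered variant also exists), uses hypothetical data, and stops at the F-statistic; the function name `levene_f` is our own.

```python
import statistics


def levene_f(groups):
    """Mean-centered Levene statistic: a one-way ANOVA F on absolute deviations.

    Returns (F, df1, df2).
    """
    # Replace each score by its absolute distance from its own group mean.
    devs = [[abs(x - statistics.mean(g)) for x in g] for g in groups]
    pooled = [d for g in devs for d in g]
    grand = statistics.mean(pooled)
    k, n = len(devs), len(pooled)
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in devs)
    ss_within = sum((d - statistics.mean(g)) ** 2 for g in devs for d in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k)), k - 1, n - k


# Equal spreads give F = 0; very unequal spreads give a larger F.
print(levene_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
print(levene_f([[1, 2, 3], [0, 5, 10]]))
```

A small F, and therefore a large significance value such as .302, means the group spreads are about as similar as chance would produce.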
The ANOVA summary table shows the results of the mean comparisons. As expected, it
shows the variable analyzed, the partitioning of total sums of squares, the degrees of freedom,
mean squares, F-statistic, and significance value. A significance value greater than .05 tells us that
no significant differences were found between any of our groups.

In this example, the Multiple Comparisons and Homogeneous Subsets tables are
extraneous because no significant main effect was found. The Multiple Comparisons table, if
necessary, tests each pair of subgroups for significant differences: Biology with Neuroscience,
Biology with Psychology, and Neuroscience with Psychology. The significance levels would tell
us if significant differences between pairs existed.
The Homogeneous Subsets table, rather than showing significant differences, shows
groups that are NOT significantly different by putting them in the same column. Here there is
only one column since there were no significant differences.
The plot of means shows the mean GRE Total score for the Biology, Neuroscience, and
Psychology subgroups. There were no significant differences between these groups. Beware of
Y-axis scaling that exaggerates small differences.
13.1.3. Using Syntax
Clear the Output Viewer by deleting all output. The first and second lines of syntax tell
SPSS which procedure to run, the One-Way ANOVA, which variable holds the data, GRE Total
1, and which variable defines the subgroups, Major. The Statistics subcommand tells SPSS to
produce a descriptive statistics table and a homogeneity test. The Plot subcommand tells SPSS
to create a plot of means. The Missing subcommand tells SPSS how to handle missing values.
And the Post Hoc subcommand tells SPSS to conduct a post hoc test, which test to conduct, the
Tukey test, and the significance level to use.
Create a copy of the syntax to experiment with. Since we know no post hoc tests were
necessary, we could delete this subcommand. Move the period, which marks the end of the
command, to the end of the Missing line and delete the Post Hoc subcommand. Run the modified
procedure with Run > Selection. The output now omits any post hoc test results.
Experiment on your computer with other choices from the dialog box or other changes to
the syntax to see what results you can produce.
13.2. Univariate GLM
Using the General Linear Model procedure is more complicated than using the One-Way
ANOVA procedure for the case where there is only a single independent variable to be tested and
no covariates. This procedure is designed for cases where there is more than one independent
variable to be tested and/or there are covariates to be examined. Why would anyone use this
procedure for a single-factor test with no covariates? One possible reason is that most
applications of ANOVA in the research world involve more than one factor and so must use
the General Linear Model procedure. Researchers get used to the GLM procedure and may
come to prefer it over the simpler One-Way ANOVA procedure, even for the simplest case.
Select Analyze > General Linear Model > Univariate from the menus. As with the One-Way
ANOVA procedure, for this test, one column of data in the data set holds values for all
samples and another column defines the samples. Move the variable that contains the data for all
samples, GRE Total 1, into the Dependent Variable box. Move the variable that defines the
samples, Major, into the Fixed Factors box. There are several other choice boxes that we do
not need to use, but that would be used with some other more sophisticated tests.
In the Model button, simply accept the defaults. The Contrasts button, similar to the
Contrasts button in the One-Way procedure, allows us to define specific contrasts among the levels
of our independent variable. Simply accept the defaults. The Plots button allows us to produce a
plot of the means of our samples. This is useful, so move the group-defining variable, Major,
to the Horizontal Axis box, then click Add to confirm the plot. The Post Hoc button contains
options similar to those of the same button in the One-Way procedure. Move Major into the
Post Hoc Tests box to have the procedure produce post hoc tests, then select the test of your
choice from the list. I will select the Tukey test.
The Save button allows you to produce new variables that represent a number of
additional characteristics useful in more advanced testing. We will not make any selections here.
The Options button allows us to produce some additional information for our groups. Choose
Descriptive Statistics, Estimates of Effect Size, Observed Power, and Homogeneity Tests. Then
run the procedure.
The Between Subjects Factors table, listing the groups we are comparing, and the Tests of
Between Subjects Effects table, showing the results of the comparison, are the basic output for
this procedure. The Descriptive Statistics table, Levene's test for homogeneity of variance, and the
last 3 columns of the Tests of Between Subjects Effects table were produced because of the
selections we made in the Options button. The output under the Post Hoc Tests heading was produced because
we selected the Tukey Post Hoc test. And the Profile Plot was produced because we selected this
graph from the Plots button.
The Descriptive Statistics table should be self-explanatory. Levene's test works as it did in the
One-Way procedure: p=0.302, so there is no evidence against the homogeneity of variance
assumption. The main results table shows us the partitioning of total sums of squares, degrees of
freedom, mean square, F-statistic, and significance value for our test. Major is the between
subjects factor and generates the numerator of the F-statistic. Error represents the differences
expected by chance and generates the denominator of the F-statistic. The significance value,
p=0.789, indicates there were no significant differences between the means. The observed power
and estimates of effect size are included in case you are covering these topics in your class. If
not, ignore them. The Post Hoc Multiple Comparisons table and Homogeneous Subsets table are
extraneous in this test because of the non-significant main effect but, if necessary, would be
interpreted in the way described for the One-Way procedure. The means plot shows the means of
the three groups, which are not significantly different from one another, distorted to some degree
by the choice of Y-axis scaling.
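SPSS labels the effect-size column Partial Eta Squared. Its usual definition is the effect's sum of squares divided by the sum of the effect and Error sums of squares; with only one factor, as here, it equals ordinary eta squared. The short sketch below just encodes that formula, using hypothetical sums of squares rather than the values in our output.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Proportion of (effect + error) variability attributable to the effect."""
    return ss_effect / (ss_effect + ss_error)


# Hypothetical sums of squares for Major and Error.
print(round(partial_eta_squared(6000.0, 594000.0), 3))  # 0.01
```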
Copy the syntax from the Log to the Syntax Editor. Then clear the Output Viewer by
deleting all output. The first and second lines of syntax tell SPSS which procedure to run, the
Univariate GLM, which variable holds the data, GRE Total 1, and which variable defines the
subgroups, Major. The Method, Intercept, and Design subcommands are advanced options we
will not describe here. The Post Hoc, Plot, Print, and Criteria subcommands represent the
options we chose from the Post Hoc, Plots, and Options buttons in the dialog box; the Print and
Criteria subcommands both come from the Options button.
If you like, experiment with other variables and choices from the dialog box or in the
syntax on your computer to see how they affect the results produced.
14. Two Factor ANOVA Between Groups
A Two-Factor ANOVA Between Groups is actually 3 tests in one, testing main effects for
each of the two independent variables and an interaction effect combining the two independent
variables. Conceptually, each of the two main effect tests is similar to a One-Factor ANOVA
Between Groups test, using variances to compare the observed differences between two or more
independent sample means created by a single independent variable with the differences
expected by chance if the population means are equal. A significant main effect indicates that at
least one of the sample means differs by more than chance from at least one other sample mean,
when comparing the groups created by that independent variable.
An interaction test examines variability caused by mean differences between all of the
groups created by combining both independent variables, controlling for the amount that is due
to the two main effects. A significant interaction effect indicates that there are differences
between the means of the groups created by combining both independent variables that are greater
than the differences expected by chance, and that cannot be explained by the differences between
the groups created by each independent variable separately.
In this example, the two quasi-independent variables that define the groups to be
compared are Student Major, either Biology, Neuroscience, or Psychology, and Sex of Student,
either male or female. The dependent variable on which these groups will be compared is GRE
Total 1. Repeat each step of this process on your computer using Practice Data 1 as we move
through this example.
Select Analyze > General Linear Model > Univariate from the menus. The dialog
box should look familiar from the One Factor ANOVA Univariate lesson. For the Two Factor
test, one column of data in the data set holds values for all samples, a second column of data
defines the groups for the first independent variable, and a third column of data defines the
groups for the second independent variable. Move the variable that contains the data for all
samples, GRE Total 1, into the Dependent Variable box. Move the two variables, Sex and
Major, that define the samples for each independent variable and for the interaction into the
Fixed Factors box. For this test, we do not need to use the other variable choice boxes.
In the Model and Contrasts buttons, simply accept the defaults. In the Plots
button, move Major to the Horizontal Axis box and Sex to the Separate Lines box. Keeping the
variable that has more groups, in this case Major with 3 groups, on the horizontal axis generally
produces more easily interpretable graphs. Click Add to confirm the Plot. In the Post Hoc
button, move Major into the Post Hoc Tests box and select the Tukey post hoc test. It is not
necessary to do post hoc tests for the variable Sex: with only two groups, a significant effect can
mean only one thing, that males differed from females. We do not need to make any selections from the Save
button. And in the Options button choose Descriptive Statistics, Estimates of Effect Size,
Observed Power, and Homogeneity Tests. Then run the procedure.
The Between Subjects Factors table shows how the 200 individuals in the study were
broken down into groups by each independent variable. The Descriptive Statistics table shows
statistics for all 6 groups created by combining the two independent variables, 1, 2, 3, 4, 5, 6, as
well as what we refer to as marginal totals for each independent variable separately, Sex, 1, 2,
and Major, 1, 2, 3. These are the groups that will be compared to determine whether significant
differences exist. Levene's test checks for homogeneity of variance across all six groups.
The significance value, p > 0.05, indicates the homogeneity assumption is valid: no significant
differences in the variances of the six groups were found.
The Tests of Between Subjects Effects table is the main output for this procedure and
shows the results of the three tests, two for the main effects of Sex and Major and one for the
Sex-by-Major interaction. The Between Cells partition of sums of squares, created as an interim
step when calculating by hand, is not shown here, but it can be calculated by adding the sums of
squares for Sex, Major, and Sex-by-Major (exactly so only when all six groups are the same size).
The table shows the partitioning of sums of squares, degrees of freedom, mean squares,
F-statistics, and significance values for all three tests. The F-statistic for each of the three tests,
Sex, Major, and Sex by Major, was calculated by dividing the Mean Square value for that effect
by the Mean Square value for Error.
The significance values for all three tests are greater than 0.05, indicating that no significant
differences were found. That is, the means of the male and female groups did not differ significantly;
the means of the Biology, Neuroscience, and Psychology groups did not differ significantly;
and the six group means did not differ beyond what the two main effects explain, so there was no
significant interaction.
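The by-hand partitioning can be sketched in Python for the simplest case: a balanced design in which every cell (for example, every Sex-by-Major group) holds the same number of scores. That assumption matters: with equal cell sizes, the two main-effect and interaction sums of squares add up exactly to the Between Cells sum of squares, whereas SPSS's default Type III sums of squares for unequal groups generally do not. The function name and data below are hypothetical.

```python
import statistics


def two_way_anova(cells):
    """Two-factor between-groups ANOVA for a balanced design.

    cells[i][j] holds the scores for level i of factor A and level j of
    factor B; every cell must contain the same number of scores.
    Returns ({effect: (SS, df, F)}, Error SS, Error df).
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    if any(len(cell) != n for row in cells for cell in row):
        raise ValueError("this sketch assumes equal cell sizes")
    grand = statistics.mean([x for row in cells for cell in row for x in cell])
    cell_means = [[statistics.mean(cell) for cell in row] for row in cells]
    a_means = [statistics.mean([x for cell in row for x in cell]) for row in cells]
    b_means = [statistics.mean([x for row in cells for x in row[j]]) for j in range(b)]

    ss_a = n * b * sum((m - grand) ** 2 for m in a_means)
    ss_b = n * a * sum((m - grand) ** 2 for m in b_means)
    # Between Cells SS; the interaction is whatever the main effects leave over.
    ss_cells = n * sum((m - grand) ** 2 for row in cell_means for m in row)
    ss_ab = ss_cells - ss_a - ss_b
    ss_error = sum((x - cell_means[i][j]) ** 2
                   for i, row in enumerate(cells)
                   for j, cell in enumerate(row)
                   for x in cell)

    df_error = a * b * (n - 1)
    ms_error = ss_error / df_error
    effects = {"A": (ss_a, a - 1), "B": (ss_b, b - 1), "AxB": (ss_ab, (a - 1) * (b - 1))}
    # Each F divides the effect's mean square by the Error mean square.
    table = {name: (ss, df, (ss / df) / ms_error) for name, (ss, df) in effects.items()}
    return table, ss_error, df_error


# Hypothetical 2 x 3 layout: rows = Sex, columns = Major, two scores per cell.
cells = [
    [[1100, 1180], [1150, 1250], [1120, 1160]],
    [[1120, 1160], [1050, 1130], [1140, 1180]],
]
table, ss_error, df_error = two_way_anova(cells)
for name, (ss, df, f) in table.items():
    print(f"{name}: SS = {ss:.1f}, df = {df}, F = {f:.3f}")
print(f"Error: SS = {ss_error:.1f}, df = {df_error}")
```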
The Multiple Comparisons and Homogeneous Subsets tables, if necessary, show the
specific differences existing between groups. The Profile Plot graphically shows the values for
the means of the six groups created by combining Major and Sex. Sex is represented by
separate lines, one for males and one for females. A significant effect of sex would show up as a
large vertical distance between the lines. Major is represented by three positions along the
horizontal axis, biology, neuroscience, and psychology. A significant effect of major would show
up as different heights, or Y-axis values, for the average of the two points at each position. A
significant interaction effect shows up as non-parallel lines. We know from our test results there
are no significant effects, so the average values at each position should be roughly equal, which
they appear to be. And the male and female lines should be close together, which they are except
for the neuroscience group. The lines should also be roughly parallel, but clearly they are not.
Rather, there seems to be an interaction effect caused by the difference between the male and
female neuroscience groups. This could indicate that our test did not have enough statistical
power to detect the interaction. On the other hand, if we change the Y-axis to represent the full
range of the GRE Total scale, from 400 to 1600, we see that there do not appear to be any
substantial effects. Which is true is open to interpretation and further research.

14.3. Using Syntax
Copy the syntax from the Log to the Syntax Editor, then clear the Output Viewer by
deleting all output. The basic syntax is the same here as it was for the one-factor between groups
test, with the Post Hoc, Plot, Print, and Criteria subcommands showing the choices we made in
the dialog box. The main differences from the one-factor test are the listing of two
independent variables, Sex and Major, and the resulting Design subcommand, which specifies
the three tests to be run: Sex, Major, and Sex by Major.
15. One Factor ANOVA Within Groups
The One-Factor ANOVA Within Groups test can perhaps be best learned by combining
and building on what we covered in the lessons for the Paired Samples T-Test and for the One-
Factor Between Groups ANOVA. Similar to the case for the Paired Samples T-Test, an
independent variable creates samples of equal size that are matched or related in some way, most
often by repeated measurement of the same individuals. Similar to the One Factor Between
Groups ANOVA, more than two samples can be tested for mean differences, the hypothesis is
that the population means for this variable are all equal, and the test will be conducted using
variances to compare variance due to mean differences to variance expected by chance. A
significant result would indicate that there is a greater difference between at least one sample
mean and at least one other sample mean than would be expected by chance if all the population
means were equal.
In this example, the independent variable that defines the samples is Time, creating GRE
testing 1, GRE testing 2, and GRE testing 3. The dependent variable on which these groups will
be compared is GRE Total Score. Repeat each step of this process on your computer using
Practice Data 1 as we move through this example.
Select Analyze > General Linear Model > Repeated Measures from the menus. As a
preliminary step, we must define for SPSS what our repeated measurement independent variable
is. I'll call it Time, representing time of testing, with three levels, time 1, time 2, and time 3.
Click Add to confirm the definition, then click Define to continue. SPSS now knows that we have
three times, represented as three columns or variables in the data set, but does not yet know
which columns these are. Move the three variables for GRE Total into the Within Subjects
Variables box. In this case, we do not have any Between Subjects variables or covariates. Time,
the repeated measurement variable, is our only variable.
In the Model and Contrasts buttons, simply accept the defaults. In the Plots
button, move the repeated measurement variable Time to the Horizontal Axis box, then click Add
to confirm the plot. The Post Hoc button allows us to specify post hoc tests for between subjects
factors. In this test there are no between subjects factors so there is nothing to select. We do not
need to make any selections from the Save button. In the Options button choose Descriptive
Statistics, Estimates of Effect Size, and Observed Power. There is no option for Homogeneity
Tests because an analogous test for repeated measurement variables is displayed automatically.
Move Time into the Display means box, select Compare Main Effects, and choose the LSD or
Least Significant Difference option. These selections will produce post hoc tests for the repeated
measurement variable Time. Then run the procedure.

The format of the output SPSS produces for the Within Groups ANOVA differs
substantially from the format presented in most Introductory Statistics textbooks. Therefore
we'll divide the normal lesson for reading the output into two lessons, one to explain how the
Partitioning of Sums of Squares is represented in SPSS, and one to examine the results of our
sample test.
15.2.1. Partitioning of SS
The Tests of Within Subjects Effects and Tests of Between Subjects Effects tables are the primary
output for this procedure. For the moment, I will hide the Tests of Within Subjects Contrasts
table by closing its book icon in the Outline View, so we can see the two tables of interest one
above the other. Both of these tables show the source, sums of squares, degrees of freedom, mean
squares, F-statistic, and significance value, as we would expect. Otherwise, these tables look
very different from the One Factor ANOVA Within Groups summary tables presented in most
introductory textbooks.
In most books, the partitioning of Sums of Squares is described something like this:
Total SS is first divided into Between Treatments and Within Treatments components. Then the
Within Treatments component is further subdivided into individual differences and error
components. Thus, the Total sums of squares ends up divided into 3 partitions: a Between
Treatments partition, an Individual Differences partition, and an Error partition. The F statistic is
then calculated by dividing the Mean Square for the Between treatments partition by the Mean
Square for the Error partition.
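The textbook partition just described can be written out as a short Python sketch. It assumes a complete layout with one row per individual and one column per treatment (here, testing time), with no missing values; the function name and the tiny data set are hypothetical. The Error degrees of freedom use the (total observations - groups) - (individuals - 1) arithmetic, which always equals (n - 1)(k - 1).

```python
import statistics


def within_groups_anova(scores):
    """Textbook partition for a one-factor within-groups (repeated measures) ANOVA.

    scores[s][t] is individual s's score at treatment (time) t.
    """
    n, k = len(scores), len(scores[0])
    flat = [x for row in scores for x in row]
    grand = statistics.mean(flat)
    treatment_means = [statistics.mean([row[t] for row in scores]) for t in range(k)]
    person_means = [statistics.mean(row) for row in scores]

    ss_total = sum((x - grand) ** 2 for x in flat)
    ss_between = n * sum((m - grand) ** 2 for m in treatment_means)   # Between Treatments
    ss_within = ss_total - ss_between                                 # Within Treatments
    ss_individuals = k * sum((m - grand) ** 2 for m in person_means)  # Individual Differences
    ss_error = ss_within - ss_individuals                             # Error

    df_between = k - 1
    df_individuals = n - 1
    df_error = (n * k - k) - (n - 1)  # equals (n - 1) * (k - 1)
    f_ratio = (ss_between / df_between) / (ss_error / df_error)
    return {
        "SS": (ss_between, ss_individuals, ss_error),
        "df": (df_between, df_individuals, df_error),
        "F": f_ratio,
    }


# Three hypothetical people, each tested at three times.
print(within_groups_anova([[1, 2, 3], [3, 5, 4], [2, 2, 5]]))
```

With 200 individuals and 3 times, the same df formula gives (600 - 3) - (200 - 1) = 398.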
SPSS shows these two partitions, which form the F-statistic of interest, in the Tests of
Within Subjects Effects table. In our example, Time represents the Between Treatments partition
and has 2 degrees of freedom, 3 groups minus 1. Error(time) represents the Error partition and
has the expected 398 degrees of freedom: (600 observations - 3 groups) - (200 individuals - 1) =
597 - 199 = 398 degrees of freedom. SPSS presents the last partition, Individual Differences, in
the Tests of Between Subjects Effects table under the heading Error. As it should, this
partition has 199 degrees of freedom, 200 individuals - 1.
The other major difference between these tables and what we see in textbooks is that
there are 4 rows for each source. The F-statistic is the same in every row, but the degrees of
freedom, and therefore the significance values, differ. Don't stress. Only one row will be used in
any specific study. If the homogeneity of variance assumption is valid, we'd use the first row,
Sphericity Assumed, for both Time and Error(time) and the p=0.187 significance value. If the
homogeneity assumption is not valid, we can instead use one of the other 3 rows,
Greenhouse-Geisser, Huynh-Feldt, or Lower-bound, each of which corrects for the violated
assumption by reducing the degrees of freedom.
For technical reasons I won't go into, SPSS tests Sphericity rather than Homogeneity of
Variance and presents the results in the Mauchly's Test of Sphericity table. In most introductory
classes, you can consider Sphericity to be synonymous with Homogeneity of Variance.
15.2.2. Output Contents
Now that we have a better understanding of how SPSS displays summary results for a
Within Subjects ANOVA, let's examine the contents of the results in detail. Mauchly's Test is
significant, p < .05, so we cannot assume sphericity. Therefore, in the Tests of Within Subjects
Effects table we must choose one of the three rows with corrected degrees of freedom. We'll
arbitrarily choose the most conservative one, Lower-bound. Its p = 0.196 indicates there are no
significant differences between the three GRE Total scores. In this example, the Tests of Between
Subjects Effects table is useful only for identifying the SS for individual differences, and we need
not worry about the Tests of Within Subjects Contrasts table, which tests for polynomial trends in the data.
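The Lower-bound correction is simple enough to compute by hand: it multiplies both degrees of freedom by the smallest possible epsilon, 1/(k - 1), where k is the number of levels of the repeated measures factor. The sketch below plugs in our design, k = 3 times and n = 200 individuals; the function name is hypothetical. Because both degrees of freedom shrink by the same factor, the F-statistic itself is unchanged; only the degrees of freedom used to look up its significance value change (if SciPy is available, `scipy.stats.f.sf(F, df1, df2)` returns that upper-tail probability).

```python
def lower_bound_df(k, n):
    """Degrees of freedom for a one-factor within-groups ANOVA after the
    Lower-bound sphericity correction (k levels, n individuals)."""
    epsilon = 1 / (k - 1)                     # smallest possible epsilon
    df_effect = (k - 1) * epsilon             # always 1
    df_error = (k - 1) * (n - 1) * epsilon    # always n - 1
    return epsilon, df_effect, df_error


print(lower_bound_df(3, 200))  # (0.5, 1.0, 199.0)
```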
The Within Subjects Factors table shows which variables we used to define our repeated
measures variable, and the Descriptive Statistics table shows statistics for those variables. Note
that N=200 for each group: every individual contributes a score at all three times, which yields a
more powerful test than a Between Subjects test with the same individuals. The Multivariate Tests
table is not relevant for the question we are asking, so we can ignore this table. If we had found a
significant main effect, we could use the Pairwise Comparisons table to determine which groups
differed significantly from which others. The 1s, 2s, and 3s in this table simply refer back to the
Within Subjects Factors definitions. The profile
plot shows a graph of the means of the three GRE Total groups. The graph may appear to show
large differences but this is mostly because of the distorted Y-axis scaling. Our previous tables
showed there were no significant differences between these means.
15.3. Using Syntax
Copy the syntax from the Log to the Syntax Editor, then clear the Output Viewer by
deleting all output. The first two lines of syntax tell SPSS to run the General Linear Model
procedure using the variables GRE Total 1, 2, and 3 as within subjects measurements. The
Within Subjects Factor and Design subcommands complete the definition of the within subjects
variable as Time and in addition request tests for polynomial trends in the data. The Plot,
Print, and Criteria subcommands are similar to other ANOVA procedures, representing choices
we made from the dialog box. The Estimated Marginal Means subcommand replaces the Post
Hoc subcommand from previous procedures and asks for a post hoc comparison using the Least
Significant Difference test.
Make a copy of the syntax to experiment with. We could run the same test on a different
set of variables simply by changing the variable names, for example, to the quantitative GRE
scores. Highlight the modified syntax and select Run > Selection. The output now shows results
for a within subjects test of the quantitative scores only.
Part IV Inferential Statistics: Association
In Part IV, we continue our examination of inferential tests. Chapters 16 to 18 examine
inferential tests of association between two or more variables. Chapter 16 covers correlation
between two variables. Correlation can be used as an inferential test, but it can also be used as a
descriptive statistic. Chapter 17 covers regression between two variables, that is, using
the values of one variable to predict the values of another. Chapter 18 examines Multiple
Regression, or MR, that is, predicting the values of one variable from the values of two or more
predictor variables.
16. Bivariate Correlation
A bivariate correlation measures the extent to which two variables are related; that is,
how closely changes in one variable are related to changes in the other.
Sometimes we simply desire the correlation coefficient as a descriptive statistic telling us
something about two variables. Other times we want to use the sample correlation to estimate
the correlation in the population by performing an inferential test. Correlation as an inferential
statistic tests whether two sample variables are correlated more than would be expected by
sampling error or chance if the populations from which these variables came are not correlated.
A significant result indicates that the correlation between sample variables is more than would be
expected by chance if the populations are not correlated. In our example, we will examine and
test the correlations between GRE 1 and GRE 2, GRE 1 and GRE 3, and GRE 2 and GRE 3.
Select Analyze > Correlate > Bivariate from the menus. Move the variables you want to
correlate into the Variables box, in this case GRE Totals 1, 2, and 3. By default, SPSS performs
tests between every possible pair of variables listed in the box. We'll accept the standard
Pearson correlation coefficient and a two-tailed test. Flagging significant correlations will allow
us to read the table more easily. In the Options button, we can also request means and standard
deviations for the variables analyzed, in case we do not already have these values. Then run the procedure.
The output contains two tables, a Descriptive Statistics table since we selected this option
from the Options button, and the basic output for this procedure, a correlation table showing the
correlation coefficients between variables and the corresponding significance values. The
variables chosen, GRE Totals 1, 2, and 3, are listed both as rows and as columns. Each cell in
the table shows the results for that row variable and that column variable, for example, GRE
Total 1 with GRE Tot 2, GRE Tot 1 with GRE Tot 3, and GRE Tot 2 with GRE Tot 3. The cells
on the diagonal show the result for a variable with itself, which is necessarily a perfect
correlation of positive 1.0. The lower half of the table is simply the mirror image of the upper
half and offers no additional information.
For each pair of variables, for example GRE Tot 1 with 2, the table shows the actual
correlation between the sample variables, the significance level or p-value if this correlation is
used as an inferential test of the population correlation, and the number of data points that were
correlated, in this case, 200 pairs of data points. The significance level is also indicated as
asterisks next to the correlation because we used the Flag Significant Correlations checkbox. A
single asterisk indicates a significant correlation at the p<0.05 level, and a double asterisk
indicates a significant correlation at the p<0.01 level. The results indicate that all three pairs of
variables are significantly and strongly positively correlated. This is what we would expect,
since a person should perform similarly on each of the GRE tests they took.
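For readers who want to see where the correlation and its significance come from, here is a minimal Python sketch of the Pearson coefficient and the usual t-statistic for testing whether the population correlation is zero, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The function name and data are hypothetical. Plugging in the Guide's r = .716 and n = 200 gives t of about 14.4 with 198 degrees of freedom.

```python
import math
import statistics


def pearson_with_t(x, y):
    """Pearson r, the t-statistic for H0: population r = 0, and its df."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    return r, t, n - 2


r, t, df = pearson_with_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"r = {r:.3f}, r squared = {r * r:.3f}, t({df}) = {t:.3f}")
```

The printed r squared is the coefficient of determination discussed next.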
We can square these correlation coefficients to find the coefficient of determination, the
proportion of variance in one variable that can be explained by the other variable. For example,
.716 squared equals .513, meaning that 51.3% of the variance in GRE Total 2 can be explained by
GRE Total 1. The remaining 48.7% of the variance cannot be explained and may be due to
differing conditions between tests, such as different test items, testing location, fatigue level
of students during the test, room temperature, and so on.
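To check the arithmetic above, here is a minimal Python sketch. Python is used only for illustration and is not part of SPSS. It squares r = .716 and also computes the usual t statistic for testing whether the population correlation is zero, t = r * sqrt(n - 2) / sqrt(1 - r^2). Because .716 is a rounded value, the results are approximate. The function names are chosen for this example.

```python
import math

def coefficient_of_determination(r):
    """Proportion of variance in one variable explained by the other (r squared)."""
    return r ** 2

def correlation_t(r, n):
    """t statistic for H0: population correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = 0.716   # GRE Total 1 with GRE Total 2 (rounded value from the Correlations table)
n = 200     # number of pairs of scores

r2 = coefficient_of_determination(r)
print(f"explained:   {r2:.1%}")                    # 51.3%
print(f"unexplained: {1 - r2:.1%}")                # 48.7%
print(f"t({n - 2}) = {correlation_t(r, n):.1f}")  # t(198) = 14.4
```

The same t = 14.4 appears again as the test of the slope in the bivariate regression output.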
One caveat: if the variables are related in a nonlinear way, or if there are strong
outliers in the data, the numbers shown in this table can be misleading and may differ
substantially from the true relationship between the variables. For this reason, I strongly
recommend examining a scatterplot of each pair of variables before computing
correlations.
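The following small Python illustration shows why the scatterplot matters. It uses hypothetical data rather than the GRE file. In this data, y is completely determined by x, yet Pearson's r is exactly zero because the relationship is curved rather than linear. The helper pearson_r is written for this example and is not an SPSS function.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y depends perfectly on x, but the relation is U-shaped.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

print(pearson_r(xs, ys))                 # 0.0: no *linear* relationship at all
print(pearson_r([1, 2, 3], [2, 4, 6]))   # 1.0: a perfect straight-line relationship
```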
16.3. Using Syntax
deleting all output. The procedure name is CORRELATIONS. The VARIABLES subcommand tells SPSS
to calculate correlation coefficients for each possible pairing of the variables listed. The PRINT
subcommand tells SPSS to show two-tailed significance values and to flag significant correlations
in the output table. The STATISTICS subcommand tells SPSS to compute descriptive statistics for
the variables listed and is included because we checked this box under the Options button. The
MISSING subcommand tells SPSS how to handle missing values.

Create a copy of the syntax to experiment with. Add the keyword WITH between the
variables GRE Total 1 and GRE Total 2, highlight the modified syntax, and run the procedure.
SPSS produces a smaller table showing only the correlations of GRE Total 1 with GRE Total 2
and of GRE Total 1 with GRE Total 3. The correlation of each variable with itself, the
correlation of GRE Total 2 with GRE Total 3, and the duplicate mirror-image cells are not
displayed. Adding WITH to the syntax is very useful when you want to calculate correlations
between groups of variables but not within groups of variables. Correlations like this are only
possible with syntax, not from the pull-down menus and dialog box. How, then, can you learn
about syntax that the dialog box does not generate? The Command Syntax Reference option in the
Help menu provides a detailed explanation of the syntax for all procedures. As you become more
comfortable with syntax, you may want to consult this material.
Try experimenting with other variables or changes to syntax on your computer now to see
how you can control the output.
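To make the WITH layout concrete, the following Python sketch builds the same rectangular arrangement. It uses five hypothetical students rather than the Guide's data set. The table has one row per variable listed before WITH and one column per variable listed after it, with no diagonal or mirror-image cells. The function correlate_with is a hypothetical helper, not part of SPSS.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlate_with(data, left, right):
    """Rows = variables before WITH, columns = variables after WITH.

    Only between-group pairs are computed: no self-correlations,
    no within-group pairs, and no mirror-image cells.
    """
    return {a: {b: pearson_r(data[a], data[b]) for b in right} for a in left}

# Hypothetical scores for five students (not the Guide's data set).
data = {
    "gre_tot1": [1000, 1100, 1200, 1300, 1400],
    "gre_tot2": [1050, 1150, 1250, 1350, 1450],
    "gre_tot3": [1400, 1300, 1200, 1100, 1000],
}

table = correlate_with(data, ["gre_tot1"], ["gre_tot2", "gre_tot3"])
for row, cells in table.items():
    print(row, {col: round(r, 3) for col, r in cells.items()})
```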
17. Bivariate Regression
Bivariate Regression measures the same relationship between two variables that
Correlation does. In Correlation, the two variables are standardized before the relationship is
examined, so the correlation is a unit-less number. In Regression, the relationship is expressed
in the original units of the two variables. Typically we form a regression equation with a slope
and a y-intercept and plot the regression line graphically. As an inferential statistic, regression
tests the same question as correlation: whether the two sample variables are more strongly
related than would be expected if the variables were unrelated in the population. With
regression, however, we usually speak of whether one variable can significantly predict the
other rather than of significant relationships.
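The link between the unit-less correlation and the slope in original units can be written as b1 = r * (SD of y / SD of x). A minimal Python sketch with made-up numbers shows that the least-squares slope and the rescaled correlation agree. The helper names are illustrative only.

```python
import math
import statistics

def least_squares(xs, ys):
    """Slope b1 and intercept b0 of the ordinary least-squares line y = b1*x + b0."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return b1, my - b1 * mx

def pearson_r(xs, ys):
    """Unit-less Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (not from the GRE file).
xs = [2, 4, 6, 8, 10]
ys = [3, 7, 5, 11, 9]

b1, b0 = least_squares(xs, ys)
r = pearson_r(xs, ys)
# Rescale the unit-less r into "units of y per unit of x".
b1_from_r = r * statistics.stdev(ys) / statistics.stdev(xs)

print(f"slope {b1:.3f}, intercept {b0:.3f}, r {r:.3f}")
print(f"r * SDy / SDx = {b1_from_r:.3f}")
```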
17.1. Curve Estimation

Select Analyze > Regression > Curve Estimation from the menus. The Curve Estimation
procedure is one of the simplest ways to generate the regression coefficients and significance
tests, and it also produces a nice plot of the regression line. In this example, we will predict GRE
Total 2 from GRE Total 1. That is, we want to know whether scores on GRE Total 2 can be
predicted, beyond chance, from scores on GRE Total 1. Move GRE Total 2 into the Dependent box
and GRE Total 1 into the Independent box. Select linear regression, show the ANOVA significance
test, include the intercept in the equation, and plot the regression line. Then run the procedure.
Recall that a typical regression equation has the form y = bX + a or, in the more common
notation, y = b1X + b0, where y is the criterion. For this example, the equation would be
gre_tot2 = b1(gre_tot1) + b0. We can use this equation as a guide to understanding the values
presented in the output.
Note: in SPSS versions 11.5 and 12, the Curve Estimation procedure produces Text
Output, containing all the results, and a graph of the regression line. In SPSS v.13, the
information from the Text Output from previous versions is broken down and displayed in
separate tables.
The Model Description, Case Processing Summary, and Variable Processing Summary
tables show that we predicted GRE Total 2 from GRE Total 1 using linear regression with the
intercept included in the equation, that there were 200 individuals in the data set with no missing
values, and that all 200 individuals had positive data values, the only possible result for a GRE test.
The Coefficients table contains the basic values of interest for our regression equation.
The slope, b1, and y-intercept, b0, from the equation are found in the Unstandardized
Coefficients section under B. The slope is +0.91, on the GRE Total 1 line, and the y-intercept is
+107.8, on the Constant line. These coefficients have been used to plot the regression line
through a scatterplot of the data from GRE Totals 1 and 2. Note that the axes are in raw GRE units
and that the far left boundary is NOT the origin of the x-axis. The plot also allows us to check for
influential outliers, and for the most part there do not appear to be any.
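Once b1 and b0 are known, a predicted GRE Total 2 score is simply 0.91 * (GRE Total 1) + 107.8. The short Python sketch below uses the rounded coefficients from the table and two hypothetical GRE Total 1 scores. The function name is illustrative.

```python
B1 = 0.91    # slope: GRE Total 1 line, column B (rounded)
B0 = 107.8   # y-intercept: Constant line, column B (rounded)

def predict_gre_tot2(gre_tot1):
    """Predicted GRE Total 2 = b1 * GRE Total 1 + b0."""
    return B1 * gre_tot1 + B0

for score in (1000, 1200):   # hypothetical GRE Total 1 scores
    print(score, "->", round(predict_gre_tot2(score), 1))
# prints 1000 -> 1017.8 and 1200 -> 1199.8
```

Each additional point on GRE Total 1 adds 0.91 points to the predicted GRE Total 2.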
Returning to the Coefficients table, we see Beta, the slope of the regression line
expressed in standardized units, which is similar to Pearson's correlation coefficient. We also see
a significance test for the slope, t = 14.4, indicating a significant relationship between GRE
Totals 1 and 2 at the p < .001 level.
The underlying details of the significance test can be found above in the ANOVA table,
expressed with the F-statistic rather than the t-statistic. F = 207.8 is necessarily equal to the
square of the t-value, 14.4 (apart from rounding), with the same level of significance. The Model
Summary table expresses the same results in the format we will encounter later for Multiple
Regression. With only one predictor, Multiple R equals the absolute value of Beta: here both are
0.716. R-Square, as the name implies, is calculated by squaring the R-value, .716. It is the
coefficient of determination, the proportion of variability in GRE Total 2 that can be explained
by GRE Total 1. The table also displays Adjusted R-square, which corrects R-square for the
number of predictors relative to the sample size, and the Standard Error of the Estimate.
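The relationships among these values can be checked by hand. In this minimal Python sketch, F is recomputed from R-square with the usual formula F = (R^2 / k) / ((1 - R^2) / (n - k - 1)). Adjusted R-square is computed as 1 - (1 - R^2)(n - 1) / (n - k - 1), with n = 200 cases and k = 1 predictor. Because the inputs are the rounded table values, the results match the output only to about one decimal place.

```python
def f_from_r2(r2, n, k):
    """Overall F statistic for a regression with k predictors and n cases."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

def adjusted_r2(r2, n, k):
    """R-square corrected for the number of predictors relative to the sample size."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n, k = 200, 1
r2 = 0.512   # R-square from the Model Summary table (rounded)
t = 14.4     # t for the slope from the Coefficients table (rounded)

print(f"F from R-square:   {f_from_r2(r2, n, k):.1f}")   # 207.7 (table: 207.8)
print(f"t squared:         {t ** 2:.1f}")                # 207.4
print(f"Adjusted R-square: {adjusted_r2(r2, n, k):.3f}") # 0.510
```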
17.2. Linear Regression
Instead of using the Curve Estimation procedure, a more common way to calculate
bivariate linear regression is to use the Linear Regression procedure. Curve Estimation uses a
simpler dialog box with fewer choices and produces more basic output. We can now look at the
options available in Linear Regression.

Choose Analyze > Regression > Linear from the menus. Move GRE Total 2 to the
Dependent box and GRE Total 1 to the Independent box. Under the Statistics button, the default
statistics are Coefficient Estimates and Model Fit, which will produce the same Model Summary,
ANOVA, and Coefficients tables as in the Curve Estimation procedure, but no scatterplot. We can
also select Confidence Intervals for the coefficients and examine possible outlier cases whose
residuals are more than 2 standard deviations from the mean residual. If we wanted to, we could
also display Descriptive Statistics for the variables.
The Plots button allows you to select a number of different plots all used to check the
validity of assumptions about the regression equation. A detailed look at these options is beyond
the scope of this Guide, though you can find descriptions of these values and other options using
the Help button.
The Save button allows you to create new variables containing predicted values for GRE
Total 2, computed from GRE Total 1 and the regression equation; residual values, the difference
between actual and predicted GRE Total 2 scores; and Influence and Distance values that can be
examined for possible outliers.
In the Options button we can accept the default values, making sure we Include the
Constant or y-intercept in the equation. Then run the procedure.
The Linear Regression procedure produces Model Summary, ANOVA, and Coefficients
tables exactly analogous to the tables produced in the Curve Estimation procedure. In both
procedures, R = 0.716, R-square = 0.512, and Adjusted R-square = 0.510; F = 207.8; and the
coefficients are 0.91 (slope) and 107.8 (constant), with t = 14.4, p < .001.

In the Linear Regression procedure we were also able to produce 95% confidence
intervals for the regression coefficients, as well as Casewise Diagnostics and Residual Statistics
tables. The Casewise Diagnostics table shows the individuals in the data set with the largest
residuals, that is, the largest differences between actual and predicted GRE Total 2 scores. The
standardized residuals make these easier to interpret in terms of the standard normal curve and
standard deviations. The Residual Statistics table gives several common statistics for the
predicted GRE Total 2 scores, for the residuals (actual minus predicted), and for the
standardized versions of both.
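The idea behind the Casewise Diagnostics table can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SPSS's own code. The data are hypothetical, and each residual is divided by the standard error of the estimate, sqrt(SSE / (n - 2)). Cases beyond 2 standard deviations are flagged, matching the cutoff chosen above.

```python
import math
import statistics

def fit_line(xs, ys):
    """Least-squares slope and intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return b1, my - b1 * mx

def standardized_residuals(xs, ys):
    """Residuals divided by the standard error of the estimate, sqrt(SSE / (n - 2))."""
    b1, b0 = fit_line(xs, ys)
    residuals = [y - (b1 * x + b0) for x, y in zip(xs, ys)]
    see = math.sqrt(sum(e ** 2 for e in residuals) / (len(xs) - 2))
    return [e / see for e in residuals]

# Hypothetical data: y = 2x + 1 exactly, except case index 4, which is 10 points too high.
xs = list(range(1, 11))
ys = [2 * x + 1 for x in xs]
ys[4] += 10

z = standardized_residuals(xs, ys)
flagged = [i for i, value in enumerate(z) if abs(value) > 2]
print(flagged)   # [4]
```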
17.3. Using Syntax
Copy the syntax for the Curve Estimation and Linear Regression procedures to the
Syntax Editor. The Curve Estimation syntax shows the variables we chose, GRE Total 1 and 2,
linear regression with the constant in the equation, the ANOVA significance test for Multiple R,
and the scatterplot with the regression line.
The Linear Regression syntax shows the criterion variable, GRE Total 2, the
predictor variable, GRE Total 1, the statistics tables we chose, for example the Coefficients
and ANOVA tables, and the residual diagnostics.
On your computer, practice with the Curve Estimation and Linear Regression procedures,
using several different pairs of variables, and compare the results of the two procedures.
18. Multiple Regression
In Bivariate Regression, the relation between two variables is determined. In Multiple
Regression, the relation between one variable and a linear composite of variables is
determined. The linear composite is formed from several variables, rather than one as in
Bivariate Regression, each with its own regression coefficient, plus a y-intercept. There is one
inferential test for the relation between the criterion variable and the linear composite, using the
Multiple R statistic. There are then significance tests for each individual regression coefficient,
that is, for the relationship between the criterion and each variable in the linear composite.
Multiple Regression is a sophisticated procedure often reserved for advanced students
of statistics, and a full exposition of the procedure is beyond the scope of this introductory
Guide. The lessons that follow introduce some of the basic features of SPSS for use with
Multiple Regression, organized for an introductory statistics student with only a basic
understanding of the procedure. Follow along on your computer as we work through this
example.
Select Analyze > Regression > Linear from the menus. We have already discussed
portions of this dialog box in the lesson on Bivariate Regression. In this example, we'll try to
predict GRE Total 3 from a linear composite of GRE Totals 1 and 2, after controlling
for the Sex of the test-taker. Move GRE Total 3 into the Dependent box and Sex into the
Independent box. Click the Next button to create a second block in the hierarchical regression,
and enter GRE Totals 1 and 2 into this block. The Method drop-down menu allows you to
choose from several methods of entering variables into the equation. We'll keep the default,
Enter.
In the Statistics button we'll make the same selections we made for Bivariate Regression
and, in addition, select R-squared change, part and partial correlations, and collinearity
diagnostics. In the Plots button we'll choose only a couple of the many plots you might want to
produce: we'll plot the standardized residual values against the standardized predicted values,
and select All Partial Plots.

In the Save button, we'll choose only a couple of the many new variables you may be
interested in: Cook's D and Standardized DF Betas. Accept the defaults in the
Options button and run the procedure.
First we examine the Model Summary for the two models: Sex alone in Model 1, then the
linear composite of Sex with GRE Totals 1 and 2 in Model 2. At the far right of this table, we see
that Sex alone in Model 1 does not significantly predict GRE Total 3, p>.05, but the linear
composite of Sex, GRE Total 1, and GRE Total 2 does significantly predict GRE Total 3, p<.05.
The Multiple R for Model 2, 0.726, is the correlation between the linear composite of all
three predictors and GRE Total 3. The R-squared change, 0.525, is the additional proportion of
variance in GRE Total 3 explained by GRE Totals 1 and 2 after controlling for Sex in the first
model. The ANOVA table contains the raw values used to obtain the F-statistics and significance
values, p=0.633 and p=0.000, that we saw in the Model Summary table above. SPSS rounds to
three decimal places, so p=0.000 should be read as p<.001.
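Because R-squared change is a difference between two R-square values, the Model 1 R-square can be recovered from the numbers above. The Python sketch below does this from the rounded values. It also defines the standard F test for an increase in R-square. The function f_change is written for illustration, and its result on rounded inputs is only approximate.

```python
def f_change(r2_full, r2_reduced, added, n, k_full):
    """F test for the increase in R-square when `added` predictors join the model.

    n is the number of cases; k_full is the number of predictors in the larger model.
    """
    return ((r2_full - r2_reduced) / added) / ((1 - r2_full) / (n - k_full - 1))

r_model2 = 0.726    # Multiple R for Model 2 (Sex, GRE Total 1, GRE Total 2)
r2_change = 0.525   # R-squared change when GRE Totals 1 and 2 are added

r2_model2 = r_model2 ** 2
r2_model1 = r2_model2 - r2_change   # implied R-square for Sex alone

print(f"Model 2 R-square: {r2_model2:.3f}")   # 0.527
print(f"Model 1 R-square: {r2_model1:.3f}")   # 0.002
print(f"approximate F change: {f_change(r2_model2, r2_model1, 2, 200, 3):.1f}")
```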
The Coefficients table shows us the regression coefficients for the two models. Only
Model 2 was significant above, so that is the only model we will examine. We see that only
GRE Total 1 significantly predicted GRE Total 3, p<.05, after controlling for the other variables,
GRE Total 2 and Sex. GRE Total 1 has a regression coefficient of 0.890 in raw units and a
standardized coefficient, Beta, of 0.699. This table also shows confidence intervals for the
regression coefficients, and bivariate, part, and partial correlations for all the variables in both
models with the criterion, GRE Total 3. The collinearity statistics show the Tolerance and
Variance Inflation Factor for each variable, both of which should ideally be close to one.
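Tolerance and VIF are two views of the same quantity. For each predictor j, Tolerance is 1 minus the R-square obtained by predicting j from the other predictors, and VIF is 1 / Tolerance. The Python sketch uses hypothetical R-square values to show why both are best when close to one.

```python
def tolerance(r2_j):
    """1 minus the R-square from predicting predictor j with the other predictors."""
    return 1 - r2_j

def vif(r2_j):
    """Variance Inflation Factor: the reciprocal of Tolerance."""
    return 1 / tolerance(r2_j)

# Hypothetical R-square values for one predictor regressed on the others.
for r2_j in (0.0, 0.5, 0.75, 0.9):
    print(f"R2_j = {r2_j:.2f}   Tolerance = {tolerance(r2_j):.2f}   VIF = {vif(r2_j):.1f}")
```

A predictor unrelated to the others has Tolerance = VIF = 1. As overlap with the other predictors grows, Tolerance shrinks toward 0 and VIF grows.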
The Excluded Variables table shows that GRE Total 1 and GRE Total 2 were not included in the
first model. The Casewise Diagnostics and Residual Statistics tables contain the same kind of
information as they did in Bivariate Regression, now applied to the current multiple regression
model. The charts can be used to look for curvilinear relations or for fan-shaped
heterogeneity of variances.
Finally, I will use the Frequencies procedure to produce histograms for each of the new
variables created by the selections we made in the Save button. These histograms can be used
to look for outliers that may distort the model, for example this outlier case in the graph of
Cook's D.
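For readers curious about what Cook's D measures, here is a minimal Python sketch for the bivariate case. It uses hypothetical data with one case shifted upward. It applies the textbook formula D = e^2 / (p * MSE) * h / (1 - h)^2, where e is the residual, h is the case's leverage, and p = 2 parameters. The 4/n cutoff is one common rule of thumb, not something SPSS applies automatically.

```python
import statistics

def cooks_distance(xs, ys):
    """Cook's D for each case in a simple linear regression (p = 2 parameters)."""
    n, p = len(xs), 2
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b0 = my - b1 * mx
    residuals = [y - (b1 * x + b0) for x, y in zip(xs, ys)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    leverage = [1 / n + (x - mx) ** 2 / sxx for x in xs]   # h for simple regression
    return [e ** 2 / (p * mse) * h / (1 - h) ** 2 for e, h in zip(residuals, leverage)]

# Hypothetical data: y = 2x + 1, except case index 4, which is 10 points too high.
xs = list(range(1, 11))
ys = [2 * x + 1 for x in xs]
ys[4] += 10

d = cooks_distance(xs, ys)
cutoff = 4 / len(xs)   # rule-of-thumb threshold (assumption, not an SPSS default)
flagged = [i for i, value in enumerate(d) if value > cutoff]
print(flagged)   # [4]
```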
18.3. Using Syntax
Copy the syntax for the multiple regression procedure to the syntax editor. Among other
things we can see listings for the statistical tables we chose, the two blocks or levels used to
create the models, the partial plots and scatterplot, the outlier diagnostic and the new variables
we created.
Part V Inferential Statistics: Nonparametric Tests
In Part V of this Guide we briefly introduce the concept of nonparametric tests and two
specific tests based on the Chi-Squared distribution. Chapter 19 covers the chi-squared tests of
independence and of goodness of fit. Lessons 19.1 and 19.2 examine these two tests in detail.
19. Chi-Squared
Unlike the tests we have described so far, nonparametric tests do not require
assumptions about population parameters. Parametric tests such as t-tests, ANOVAs,
correlation, and regression each require some kind of mathematical manipulation of VALUES
from the population, and assumptions about the population, for example, about its standard
deviation, normality, or linearity.
The nonparametric tests we will discuss do not need to manipulate scores or values, or make
assumptions about the population. Instead, they test hypotheses using nominal level data. Thus,
all that is required of the population is that the individuals can be divided into groups. In this
Guide we'll cover two nonparametric tests that rely on the chi-squared distribution of
frequencies: the Test of Independence, which is analogous to correlation for nominal variables,
and the Goodness of Fit test, which compares an observed distribution of frequencies with a
hypothesized distribution.
19.1. Test of Independence
In correlation, we examine the relation between two variables measured on interval or
ratio scales, for example, GRE scores. What if we want to measure the relationship between
variables measured only on nominal scales, for example the relation between race and political
party affiliation? The chi-square test of independence allows us to examine the relationship
between two or more nominal level variables.
Select Analyze > Descriptive Statistics > Crosstabs from the menus. In this example,
we'll test whether there is a relationship between the sex of a student and the graduate program
to which they apply. Move Sex into the Row box and Program Type into the Column box, and
select the Display Clustered Bar Charts check box. In the Statistics button choose Chi-Square. In
the Cells button, select observed and expected counts, and row, column, and total percentages.
Accept the default format and run the procedure.
The Processing Summary table shows us that 200 individuals were divided into the
groups created by combining Sex and Program Type. The Cross-tabulation table shows
how many individuals were in each group: 51 males and 49 females applied to medical school,
and 29 males and 71 females applied to clinical graduate school. The table also shows the
expected frequency in each cell if we assume no relation between the variables. If we were doing
this test by hand, we would calculate each expected frequency from the row and column totals as
(row total x column total) / 200, for example (100 x 80) / 200 = 40 and (100 x 120) / 200 = 60.
The percentages show subtotals by category; for example, 63.8% of the males applied to medical
school, and 51% of medical school applicants were male.
The observed and expected frequencies from the 4 cells can be used to calculate the
chi-squared statistic, equal to 10.083 on one degree of freedom. This chi-squared value is
significant at the p<.01 level, indicating that there is a significant relationship between the
variables Sex and Graduate Program. This table also shows that no cell has an expected
frequency below the minimum of 5, below which the chi-squared test becomes inaccurate.
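The hand calculation described above can be reproduced in a short Python sketch using the observed counts from the Cross-tabulation table. The p-value line relies on the fact that, with exactly 1 degree of freedom, the upper tail of the chi-squared distribution equals erfc(sqrt(chi2 / 2)). This shortcut does not apply to larger tables.

```python
import math

observed = [
    [51, 29],   # males: medical, clinical
    [49, 71],   # females: medical, clinical
]

row_totals = [sum(row) for row in observed]        # [80, 120]
col_totals = [sum(col) for col in zip(*observed)]  # [100, 100]
n = sum(row_totals)                                # 200

# Expected count in each cell = row total * column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)
p = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-squared, valid for df = 1 only

print(expected)             # [[40.0, 40.0], [60.0, 60.0]]
print(round(chi2, 3), df)   # 10.083 1
print(p < 0.01)             # True
```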
The bar chart helps us to show what exactly the relationship between variables is. Males
are more likely to apply to medical school than to clinical graduate school, while the reverse is
true for females. Females are more likely to apply to clinical graduate school than to medical
school.
Copy the syntax for the Chi-squared Test of Independence from the Output Viewer to the
Syntax Editor. The syntax shows the procedure we are running, Crosstabs, the variables that
create the categories, Sex by Program, and the request for: the chi-squared statistic and
significance test, the values for each cell of the cross-tabulation, and the bar chart.
Experiment with the Crosstabs procedure to produce Chi-squared Tests of independence
on your computer now. Try conducting the test on Sex and Major or on Major and Program.
19.2. Goodness of Fit

The chi-squared goodness of fit test asks whether the distribution of individuals in the
sample across a number of groups differs significantly from what would be expected based on
the hypothesized population distribution. For example, we could hypothesize that the three
undergraduate majors, biology, neuroscience, and psychology, are equally attractive to students,
and therefore we'd expect equal numbers of students in each major. We'll test this hypothesis in
the following lessons.
Select Analyze > Nonparametric Tests > Chi-Square from the menus. Note that the
procedure we use for this chi-squared test is found in a different location in the SPSS menus than
the procedure for the chi-squared Test of Independence. Move the variable we are testing,
undergraduate Major, to the Test Variable box. In the Expected Values box, we must tell SPSS
how many individuals we expect to be in each group created by the variable Major, based upon
our hypothesis. Since we hypothesized that each major was equally attractive, the number of
students in each major should be equal. Therefore, the 200 students in the sample should be
divided equally, with 66.7 students in each major. Don't worry about the fractional student; this
is common in statistics. If our hypothesis had led to unequal numbers in each group, it would be
important to enter the expected values here in the same order as the subgroups are listed in the
variable. Accept the default options and run the procedure.
The output for the chi-squared goodness of fit test is fairly simple, consisting of a table of
observed and expected frequencies and a table showing the chi-squared value and significance
level. The observed frequencies show there were 82 biology students, 54 neuroscience students,
and 64 psychology students in the sample, compared with our expected frequencies of 66.7 in
each major. These values can be entered into the chi-squared formula to arrive at a chi-squared
value of 6.04 with 2 degrees of freedom. This statistic is significant at the p<.05 level, indicating
that the actual distribution across majors was significantly different from what was expected
under our hypothesis about the population. Considering only the results of this test, we would
conclude that the three majors are NOT equally attractive to students in the population.
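The goodness of fit calculation can be checked the same way. In this Python sketch, the expected count for each major is 200 / 3. The p-value uses the fact that a chi-squared distribution with exactly 2 degrees of freedom has upper tail exp(-chi2 / 2). This exact shortcut holds only for 2 degrees of freedom.

```python
import math

observed = [82, 54, 64]     # biology, neuroscience, psychology
n = sum(observed)           # 200
expected = [n / 3] * 3      # equal-attractiveness hypothesis: about 66.7 each

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p = math.exp(-chi2 / 2)     # upper tail of chi-squared with exactly 2 df

print(round(chi2, 2), df)   # 6.04 2
print(round(p, 3))          # 0.049
```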
Copy the syntax for the chi-squared Goodness of Fit test from the Output Viewer to the
Syntax Editor. SPSS uses the NPAR TEST procedure to conduct the Goodness of Fit test. The
CHISQUARE subcommand tells SPSS to calculate the chi-square statistic and significance test
for the groups created by the variable Major. EXPECTED defines the expected frequencies for
each group, in the same order the groups are identified in the variable. And MISSING tells SPSS
how to handle missing values.
On your computer, experiment with different hypotheses for the variable Major, or with
hypotheses about the variables Sex or Program, to see what kinds of results you can obtain.
