Contents
CHAPTER 1: Contents
CHAPTER 2: Introduction
Using this Manual
What’s New in SigmaStat 3.1
System Requirements
SigmaStat Procedures
Major SigmaStat Features
Using SigmaStat Toolbars
Saving Your Work
Setting Options
Exiting SigmaStat
Getting Help
Systat on the World Wide Web
References
1 Introduction
The first
To perform individual tests and interpret test and analysis results, see:
Chapter 8, Comparing Two or More Groups; Chapter 9, Comparing
Repeated Measurements of the Same Individuals; Chapter 10,
Comparing Frequencies, Rates, and Proportions; Chapter 12, Survival
Analysis; and Chapter 13, Computing Power and Sample Size.
For a reference of all menu commands and associated dialog box option
functions, see Chapter 15, Reference.
SigmaPlot 9.0 Integration
Prepare your SigmaStat graphs for publication using SigmaPlot’s
advanced graph editing. SigmaPlot 9.0 must be installed.
For more information, see Using SigmaPlot to Modify Graphs on page
189.
Survival Analysis
Use Survival Analysis to compute the probability of time to an event,
such as surviving lung cancer, using Kaplan-Meier (product limit)
survival curve estimation. Run Single Group, Log Rank, or Gehan-Breslow
tests, and SigmaStat automatically generates a graph page and
report. Survival curve options include error bars and confidence
intervals, and fraction or percent scales. Survival Analysis supports
both raw and indexed data formats, and it allows for multiple
definitions of event and censored data.
For more information, see Chapter 12, Survival Analysis.
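The Kaplan-Meier (product limit) estimate mentioned above can be sketched in a few lines of Python. This is only an illustration of the general technique, not SigmaStat's implementation; the function name kaplan_meier and the sample times are made up for the example, and an event flag of 1 is assumed to mean the event occurred while 0 means the observation was censored.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- observed time for each subject
    events -- 1 if the event occurred at that time, 0 if censored
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for x in times if x >= t)
        # Subjects whose event (not censoring) happened exactly at t.
        occurred = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        if occurred:
            # Product-limit step: multiply by the fraction surviving t.
            survival *= 1.0 - occurred / at_risk
            curve.append((t, survival))
    return curve


# Illustrative data only: five subjects, two of them censored.
sample_times = [2, 3, 3, 5, 8]
sample_events = [1, 1, 0, 1, 0]
for time, probability in kaplan_meier(sample_times, sample_events):
    print(time, round(probability, 3))
```

Censored subjects never lower the curve themselves; they only shrink the number at risk for later event times.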
Improved Worksheet
SigmaStat 3.1’s new worksheet can now handle larger data sets of 32
million rows by 32,000 columns. Adjust row height and column width,
enter longer text strings and variable names, add row titles, perform
multiple undo, and format cells and empty columns.
For more information, see Chapter 3, Using the Worksheet.
Improved Importing
Import MS Access, SPSS, and previous SigmaPlot and SigmaStat files.
Multiple Undo
Experiment with different annotations on your graph page, worksheet,
or report, then quickly undo the last several changes and start again.
Improved Reports
SigmaStat reports now support .pdf and .html export, the ability to
insert a date/time field, decimal aligned tabs, more keyboard control
options for improved navigation, a new Formatting toolbar that includes
text justification, Print Preview, and an improved Find/Replace dialog
box with a Go To option.
For more information, see Chapter 6, Working with Reports.
Graph Improvements
Graph improvements include graph legends, multiple undo, step plots,
and more symbols. Create and re-create graphs with fewer clicks using
the new Graph Wizard. Select a range of data in a column, rather than
the whole column. Graph single X or Y data, or when creating multiple
plots, choose X many Y or many X and Y data formats. When creating
histograms, indicate the number of bins.
For more information, see Chapter 7, Creating and Modifying Graphs.
Transform Language Additions
Added to the transform language are parameter determination functions
using ape, dsinp, fwhm, inv, lowess, lowpass, sinp, x25, x50, x75,
xatymax, xwtr.
Performance Improvements
SigmaStat 3.1’s statistical tests, curve fitting and transforms are now
faster than ever.
New Technology
SigmaStat 3.1 is compatible with Windows 2000 and Windows XP.
System Requirements
SigmaStat Procedures
Viewing Test Reports
Test reports automatically appear after the test is complete. Reports
can be edited, exported, printed, and saved to the notebook.
Generating Report Graphs
You can also generate graphs from test reports. Report graphs can be
edited, printed, and saved to notebook files.
Creating Exploratory Graphs
You can create a variety of graphs using your worksheet data. These
graphs can be edited, printed, and saved to the notebook.
SigmaStat Transforms
You can also use the complete set of data transforms to center,
standardize, and otherwise modify your data prior to tests.
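As a rough illustration of what centering and standardizing a column mean, the following Python sketch subtracts the mean and then divides by the standard deviation. It is not written in the SigmaStat transform language; the function names are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def center(values):
    """Subtract the column mean so the result has mean zero."""
    m = mean(values)
    return [v - m for v in values]


def standardize(values):
    """Center, then divide by the sample standard deviation (assumed)."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


column = [1, 2, 3]
print(center(column))
print(standardize(column))
```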
Several special SigmaStat features provide you with new standards of ease
of use, power, and flexibility.
The Advisor Wizard
The Help menu Advisor command starts the Advisor Wizard. The
Advisor asks you a series of questions about what you want to do with
your data, and the kind of data you have. Answering these questions
enables SigmaStat to select the appropriate test for your analysis.
Assumption Checking
All statistical tests and analyses can assume that your data possesses
certain characteristics that underlie the methods used to test the data.
For the appropriate tests, SigmaStat can automatically check whether
your data is compatible with assumptions of:
➤ Normality
➤ Equal variance
➤ Constant variance
➤ Multicollinearity
➤ Outlying and influential points
Missing and Unbalanced Data Handling
SigmaStat can perform One and Two Way (two factor) ANOVA and
Repeated Measures ANOVA on data that contains missing values or
with unbalanced designs without requiring you to supply your own
estimates of the missing data. A sophisticated general linear model is
automatically used to provide least squares estimates of the means for the
cells.
Data Transforms
SigmaStat provides a complete array of data transformations. These can
be used to modify data to better fit assumptions of tests, or otherwise
modify it before performing a statistical procedure.
User-Defined Nonlinear Regression
Both Linear and Nonlinear Regression are provided. The functions used
for Nonlinear Regression can be defined as any function of up to ten
independent variables and up to 25 parameters. Full use of the transform
language is supported, and many other nonlinear regression procedure
settings are customizable. You can save these nonlinear procedures to
and load them from nonlinear regression fit files.
Importing Data
SigmaStat can import data from most common spreadsheet formats,
dBASE files, and ASCII text files.
SigmaStat can display two toolbars at the top of the SigmaStat window.
The standard toolbar provides quick and easy access to statistical tests
and the most commonly used commands. The formatting toolbar
provides easy access to text attributes commands for report graphs.
FIGURE 1–1
The SigmaStat Toolbar
(Buttons labeled in the figure: New Notebook, Open, Save, Print, Cut,
Copy, Paste, Undo, Redo, New Worksheet, New Excel Worksheet, Create
Graph, Select Test Drop-Down List, Run Test, Rerun Test, Test Options,
Zoom, Advisor, Help)
FIGURE 1–2
The Formatting Toolbar
(Buttons labeled in the figure include: Boldface, Underline, Align
Right, Justified, 1½ Spaces)
Viewing and Hiding Toolbars
Choose the View menu Toolbars command to open the Toolbars dialog
box, then select or clear options to view or hide selected toolbars.
FIGURE 1–3
The Toolbars Dialog Box
Check the Large Buttons check box to increase the size of standard and
formatting toolbar buttons. Clear the Color Buttons check box to make
color toolbar buttons monochrome. Clear the Show ToolTips check box
to hide the toolbar help tags.
Positioning Toolbars
The toolbars can be moved from their default position to anywhere in
the screen, including changing from a horizontal to a vertical position.
To move a toolbar from its fixed position, click anywhere in the toolbar
that is not a button, and drag the toolbar to the desired place on the
screen. The toolbar appears as a floating palette when it is not attached
to the SigmaStat window.
dragging the toolbar to the edge of the window, release the mouse
button after the toolbar flips to a vertical position.
To save the notebook and its associated data, report, and graphs, choose
the File menu Save command. The first time you save your work, you
must select or enter a file name and/or directory. The default file
extension is .SNB for Notebook files. Select OK to save the notebook.
! Note that each worksheet, its associated reports and graphs, and the
notebook are saved together as a single file.
After initially saving your work to a file, you can continue to save to the
same file name with the Save command, or choose a different file name
and/or destination with the Save As command. Use the Save All
command to save all open notebooks.
Setting Options
Click Options on the Tool menu to set worksheet and graph page
preferences.
Worksheet Options
The worksheet options include numeric display, default column width,
number of decimal places, and use of engineering notation. Worksheet
preferences are discussed in Setting Worksheet Display Options on page
41.
Page Options
The page options control units of measurement on the page, graph and
object resizing options, and page Undo disabling. Graph page
preferences are discussed in Setting Graph Page Options on page 178.
Report Options
The report options include units of measurement on the report, setting
the number of decimals displayed in the report, enabling or disabling
Exiting SigmaStat
Select either the File menu Exit command, or press Alt+F, then X to
leave SigmaStat. You can press Alt+F, X from any location in the
program to quit.
All current toolbar and text settings are saved as defaults between
sessions.
Getting Help
SigmaStat's on-line help uses the standard Windows help system. To get
help, select the Help menu, then choose Contents and Index to open the
help search dialog box and search for a specific topic within help.
Getting Technical Support
The services of Systat Software Technical Support are available to
registered customers. Customers may call Technical Support for
assistance in using Systat Software products or for installation help for
one of the supported hardware environments. To reach Technical
Support, see the Systat Software home page on the World Wide Web at
http://www.systat.com, or contact us:
In North America:
In Europe:
Telephone: 49 2104 / 95480
Fax: 49 2104 / 95410
Email: eurotechsupport@systat.com
References
Press, WH, Flannery, BP, Teukolsky, SA, and Vetterling, WT. Numerical
Recipes. Cambridge: Cambridge University Press, 1986.
Searle SR. Linear Models for Unbalanced Data. New York: John Wiley
and Sons, 1987.
Weisberg S. Applied Linear Regression (2 ed). New York: John Wiley and
Sons, 1985.
Notebook Manager Basics
SigmaPlot notebook files contain all of your SigmaPlot data and graphs,
and are organized within the SigmaPlot Notebook Manager. This
chapter covers:
When you first start SigmaPlot, an empty worksheet appears along
with the Notebook Manager. The Notebook Manager is a dockable or
floating window that displays all open notebooks.
The first time you see the Notebook Manager, it appears with one open
notebook, which contains one section. That section contains one empty
worksheet. Contents of the Notebook Manager appear as a tree
structure, similar to Windows Explorer.
FIGURE 2–1
The Notebook Manager Window
Each open notebook appears as the top level, with one or more sections
at the second level, and one or more items at the third level. Within each
section you can create one worksheet and an unlimited number of graph
pages, reports, equations, and macros. The most recently opened
notebook file appears at the top of the Notebook Manager.
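The three-level structure described above (notebook, sections, items, with at most one worksheet per section) can be modeled with a small Python sketch. The class and attribute names are hypothetical and exist only for illustration; they do not describe how the program stores notebooks. The names Notebook1, Section 1, and Data 1 are the default names given in this chapter.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Section:
    name: str
    worksheet: Optional[str] = None          # at most one worksheet
    other_items: List[str] = field(default_factory=list)  # pages, reports, ...

    def add_worksheet(self, name: str) -> None:
        # A section that already has a worksheet cannot take another one.
        if self.worksheet is not None:
            raise ValueError(f"{self.name} already contains a worksheet")
        self.worksheet = name


@dataclass
class Notebook:
    name: str
    sections: List[Section] = field(default_factory=list)


notebook = Notebook("Notebook1", [Section("Section 1", worksheet="Data 1")])
notebook.sections[0].other_items.append("Report 1")
print(notebook)
```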
FIGURE 2–2
The Notebook
Manager in a
Docked Position
Modified Notebook Names
An asterisk next to an item in the Notebook Manager indicates that the
item has been modified since the last time you saved the notebook.
Notebook Item Names
The default startup notebook is named Notebook1. It contains one
notebook section, Section 1, and one worksheet, Data 1. When you save
your notebook file, the name of the file appears at the top of the
Notebook Manager window. Notebook files use a (.jnb) extension. The
default names given to notebook sections and items are Section
(number), Data (number) or Excel (number), and Report (number).
Regression equations are named when they are created. New items are
numbered sequentially.
You can open as many notebooks as you like. All opened notebooks
appear in the Notebook Manager. You can navigate through the
different open notebooks by selecting them in the Notebook Manager.
You can hide a notebook by clicking the Close button on the upper
right-hand corner of the Notebook window; however, this does not close
the notebook. It only hides it from view. To close a notebook, use the
File menu.
To open a notebook:
1 From the File menu, click Open. The Open dialog box appears.
2 Select a notebook (.jnb) file from the list, and click Open. The
notebook appears in the Notebook Manager.
To close a notebook:
1 Select the notebook to close in the Notebook Manager.
You can also choose Close Notebook from the File menu.
Sizing and Docking the Notebook Manager
The Notebook Manager can appear in six states:
➤ Docked with summary information in view.
➤ Docked with summary information hidden.
➤ Floating with summary information in view.
➤ Floating with summary information hidden.
➤ Docked and collapsed.
➤ Hidden.
FIGURE 2–3
The Notebook
Manager Displaying
Summary Information
5 To drag and drop the Notebook Manager, click the title bar and drag
the Notebook Manager anywhere on the SigmaPlot desktop.
3 Type a name for the notebook in the File Name text box.
4 Click Save to save the notebook file and close the Save As dialog
box.
3 Type a name for the notebook in the File Name text box.
4 Click Save to save the notebook file and close the Save As dialog
box.
Printing Selected Notebook Items
You can print active worksheets, graph pages, reports, and selected
notebook items by clicking the Print button on the Standard toolbar.
You can print individual or multiple items from the notebook, including
entire sections.
4 Click Properties.
Protecting Notebooks
5 Click OK.
5 If you want to remove this password, leave this box, and the
Reconfirm box, empty.
7 Click OK.
Working with Sections in the Notebook Manager
Notebook sections are place-holders in the notebook. They contain
notebook items, but no data. However, you can name, open, and close
notebook sections.
You can create as many new sections as you want in a notebook. You
may also create reports within each section to document the items in
each section.
Creating New Items in the Notebook Manager
Using the right-click shortcut menu, you can create new sections and
items in the Notebook Manager, such as:
➤ Worksheets
➤ Excel Worksheets
➤ Graph pages
➤ Reports
➤ Equations
➤ Sections
➤ Macros
2 On the shortcut menu click New, and then select the item to
create. The new section or item appears in the Notebook
Manager.
Copying and Pasting to Create New Sections
Another method to create a new notebook section is to copy and paste a
section in the notebook window. Whenever you copy and paste a
section, its contents appear at the bottom of the notebook window.
SigmaPlot names and numbers the section automatically. For example, if
you copy notebook Section 3, the new section is named Copy of Section
3.
Copied sections create copies of all items within that section as well.
Renaming Notebook Files and Items
You can change summary information for all notebook files and items.
To change summary information:
1 If the summary information is hidden on the Notebook Manager,
click View summary information.
In-place Editing Section and Item Names
You can change the name of a notebook section or item in the notebook
itself without opening the Summary Information dialog box.
To in-place edit:
1 In the Notebook Manager, click the section or item you want to
rename.
! To change the name of the notebook, use the Save As dialog box. For more
information, see “Saving Your Work” above.
Copying a Page to a Section with No Worksheet
If you copy a graph page into an empty section or a section that has no
worksheet, you create an independent page. The independent page
retains all its plotted data without the worksheet. You can store the pages
from several different sections that have different data together this way.
However, if you ever create or paste a worksheet into a section, all
independent pages will revert to plotting the data from the new
worksheet.
Opening Files in the Notebook Manager
You can open SigmaPlot files and other types of files as SigmaPlot
notebooks.
1 Click the Open button on the Standard toolbar. The Open dialog
box appears.
FIGURE 2–4
Open Dialog Box
4 If you want to open another type of file, choose the type of file from
the Files of type list.
Opening Worksheets, Reports, and Pages
You can open a worksheet, report, or page by double-clicking its icon in
the Notebook Manager. You can also right-click the item, and on the
shortcut menu, click Open. Open worksheets, pages, and reports appear
in their own windows, and in the notebook as colored icons.
Opening Multiple Items
You can open as many items as your system’s memory allows. You can
open multiple items from multiple notebooks. The selected item appears
highlighted in the Notebook Manager.
Copying and Pasting Items in the Notebook Manager
You can copy and paste items from one open notebook file to another in
the Notebook Manager; however, you cannot copy a worksheet into a
notebook section that already contains a worksheet.
2 Right-click the section where you want to paste the item, and on
the shortcut menu, click Paste. The selected item is pasted to the
current notebook and section.
Items removed from a notebook file using the Delete button are
removed permanently.
Using the Worksheet
FIGURE 3–1
Example of a
Worksheet
Column numbers and titles
appear here.
Opening New and Multiple Worksheets
To begin a new worksheet, choose the File menu New... command, or
click the toolbar button. The New dialog box appears. Select
Worksheet from the New drop-down list, select SigmaStat 3.1
from the Type option, then click OK.
FIGURE 3–2
Selecting a Worksheet from
the New Dialog Box
FIGURE 3–3
Example of Multiple
Worksheets
SigmaStat also supports Microsoft Excel worksheets that you can use to
run tests and create graphs on your data. To open an Excel worksheet in
SigmaStat, choose the File menu New... command, or click the toolbar
button. The New dialog box appears. Select Worksheet from the
New drop-down list, select Excel from the Type option, then click OK.
You can also click the toolbar button to start an Excel worksheet.
FIGURE 3–4
Selecting an Excel
Worksheet from
the New Dialog Box
An Excel worksheet appears in its own section of the notebook, and the
Microsoft Excel menus, menu commands, and toolbars appear in
SigmaStat.
Opening New and Multiple Excel Worksheets
To begin a new Excel worksheet, choose the File menu New... command
or click the toolbar button. The New dialog box appears. Select
Worksheet from the New drop-down list, select Excel Worksheet from
the Type list, then click OK. A new Excel Worksheet appears in the
SigmaStat window. You can open as many Excel worksheets as desired.
Each worksheet you open is assigned to its own notebook section.
1 Move the pointer to the cell where you want to begin and click, or
move the cursor to the desired location.
FIGURE 3–5
Entering Data in
the Worksheet
You can move around the worksheet using the scroll bars or by moving
the cell highlight using the keyboard.
Function                        Keystroke
Move one column right/left      → or ←
Move one row up/down            ↑ or ↓
Going to a Specific Cell
To move the cell highlight to any cell in the worksheet by specifying
its column and row number, double-right-click the worksheet icon in the
upper left corner of the worksheet, or choose the Edit menu Go To...
command. The Go To Cell dialog box appears.
FIGURE 3–6
Using the Go To Dialog Box
to Move to a Specific
Cell in the Worksheet
Enter the desired column and row number. To select the block of cells
between the current highlight location and the new cell, click the Extend
Selection to Cell check box. Click OK to move to the new cell.
➤ Drag the mouse over the desired worksheet cells while pressing and
holding down the left mouse button.
➤ Hold down the Shift key and press the arrow, PgUp, PgDn, Home,
or End keys.
➤ Use the Go To... command (see Going to a Specific Cell on page 32).
Selecting Columns and Rows
To select an entire column, move the pointer over the column. When
the pointer changes to a downward pointing arrow, click or drag to
highlight the desired columns.
FIGURE 3–7
Selecting a Block
of Data in the Worksheet
To select entire rows, move the pointer to the left of the rows. When the
pointer changes to a right-pointing arrow, click or drag to select the
desired rows.
Selecting the Entire Worksheet
To select all data in the worksheet, double-click the worksheet icon in
the upper left corner of the worksheet. To select the entire worksheet,
double-click the worksheet icon.
Use the appropriate Edit menu commands to cut, copy, paste, or clear a
selected cell or block. You can also press Ctrl+X, Ctrl+C, and Ctrl+V,
or use the corresponding toolbar buttons, to cut, copy, and paste data.
You can also access the Edit menu Cut, Copy, and Paste commands using
the right-click popup menu. Right-click the column with the data you
want to cut, copy, or paste, then choose Cut, Copy, Paste, or Delete.
Cutting and Copying Data
To remove a selected cell or block of data from the worksheet and copy it
to the Clipboard, choose the Edit menu Cut command, click the toolbar
button, or press Ctrl+X. To copy a selected cell or block of data from
the worksheet to the Clipboard without removing it from the worksheet,
choose the Edit menu Copy command, click the toolbar button, or
press Ctrl+C.
! The Clipboard is a data buffer which retains the last cut or copied data
block. Subsequent cuts or copies overwrite the current Clipboard contents.
Pasting Data
To paste cut or copied data from the Clipboard, click or move the
worksheet cursor to the cell you want to paste the data to; then choose
the Paste command, click the toolbar button, or press Ctrl+V. The
Clipboard contents appear in the specified cells of the worksheet.
Moving Data
To move a block of data, cut it, select the upper-left cell of the new
location, then paste the block.
Deleting Data
Use the Clear command or press the Delete key to permanently erase
selected data. The data is not copied to the Clipboard.
Stacking Columns
You can merge the contents of two or more columns by stacking the
column contents on top of each other.
2 Select the output column to place the stacked data by clicking the
worksheet column.
FIGURE 3–8
Stacking Data in
the Worksheet
Column 5 contains
the results of stacking
columns 1 through 4.
Note that you cannot stack blocks of data, only entire columns.
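Stacking can be pictured as concatenating whole columns, one after another, into a single output column. The Python sketch below illustrates that idea only; the function name is hypothetical, and how the program treats empty cells within a column is not stated here, so the sketch simply assumes each column is a list of its filled cells.

```python
def stack_columns(columns):
    """Place each column's contents below the previous column's contents."""
    stacked = []
    for column in columns:
        stacked.extend(column)
    return stacked


# Four input columns stacked into one output column.
columns_1_to_4 = [[1, 2], [3], [4, 5, 6], [7]]
print(stack_columns(columns_1_to_4))
```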
You can insert and delete blocks of cells as well as multiple columns or
rows using the Edit menu Insert Cells and Delete Cells commands.
You can cut or delete data and titles in columns by highlighting the
column, then cutting or deleting, but this does not shift or affect other
columns.
3 Select the direction to shift the existing data when the empty block
is inserted or deleted. Select Columns or Rows to insert or delete
the columns or rows specified in the selected block.
FIGURE 3–9
Inserting an Empty Block
of Data into the Worksheet
FIGURE 3–10
The Result of Inserting
an Empty Block with
Cells Shifted Down
FIGURE 3–11
Viewing a Worksheet
Right-Click Popup Menu
Press the Ins key or on the Edit menu click Insertion Mode to switch
between overwrite and insert data entry modes.
If in Insertion mode, “Ins” appears in the status bar. A check mark next
to the Insertion Mode command on the Edit menu also indicates that the
worksheet is in insertion mode. New data entered in a cell does not erase
the previous contents. Any existing data in the column is moved down
one row. If you paste a block of cells, existing data is pushed down and/
or to the right to make room for the pasted cells. If you cut or clear data,
data below the deleted block moves up and/or to the left.
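The difference between the two data entry modes can be sketched for a single column in Python. This is an illustration under simplifying assumptions (one column, a 0-based row index that already exists in the column); the function name is hypothetical.

```python
def enter_value(column, row, value, insert_mode):
    """Return the column after typing `value` at `row` (0-based)."""
    result = list(column)
    if insert_mode:
        # Insertion mode: existing data from this row on moves down one row.
        result.insert(row, value)
    else:
        # Overwrite mode: the previous cell contents are replaced.
        result[row] = value
    return result


original = [10, 20, 30]
print(enter_value(original, 1, 99, insert_mode=True))
print(enter_value(original, 1, 99, insert_mode=False))
```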
Column and row titles label and identify data. Reports reference column
titles when building tables of results. The Indexing transform also uses
column titles to build index columns. Some column titles are generated
automatically when residuals or other results are placed in the worksheet.
Column titles also appear in the Graph Wizard when picking columns
to plot and can be used in transforms instead of column numbers.
! You must use at least one text character in every column title. If you need to
use a number as a column title, type a space character (by pressing the space
bar) before the number.
Using the Column and Row Titles Dialog Box
You can enter and edit column and row titles using the Column and
Row Titles dialog box.
To enter or edit a column or row title:
Figure 3–12
Entering a Column Title
Using the Column and Row
Titles Dialog Box
2 Click the Column tab to enter or edit a column title, or the Row
tab to enter or edit a row title.
5 Click OK to close the Column and Row Titles dialog box when
you are finished editing the titles.
Using a Worksheet Row for Column Titles
Enter labels into a row, then use that row for worksheet column titles.
This is useful for data imported or copied from spreadsheets.
All the cells of the selected row are promoted, not just those cells which
contain column titles. This may affect other data sets in the worksheet.
2 Select the cells in the row you want to use as column titles.
The row you wish to promote appears in the Promote row to titles
box.
5 To delete the original row once it has been promoted, select Delete
Promoted Row.
6 Click Promote.
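The effect of promoting a row can be sketched in Python as follows. This only illustrates the idea described in this section; the function name and the delete_promoted parameter are hypothetical stand-ins for the dialog box behavior and the Delete Promoted Row option.

```python
def promote_row_to_titles(rows, index, delete_promoted=True):
    """Use every cell of rows[index] as a column title.

    Returns (titles, remaining_rows). All cells of the row are promoted,
    not just the ones that look like titles.
    """
    titles = [str(cell) for cell in rows[index]]
    if delete_promoted:
        remaining = [row for i, row in enumerate(rows) if i != index]
    else:
        remaining = [list(row) for row in rows]
    return titles, remaining


imported = [["Dose", "Response"], [1, 0.5], [2, 0.9]]
titles, data = promote_row_to_titles(imported, 0)
print(titles)
print(data)
```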
Using a Worksheet Column for Row Titles
Enter labels into a column, then use that column for worksheet row
titles. This is particularly useful for data imported or copied from
spreadsheets.
All the cells of the selected column are promoted, not just those cells
which contain row titles. This may affect other data sets in the worksheet.
6 Click Promote.
Using a Cell as a Column or Row Title
Use the Column and Row Titles dialog box to promote individual cells
to column and row titles.
3 Click the Row tab to promote a row cell to title; click the Column
tab to promote a column cell to a title.
4 Click Promote.
Use the Options dialog box to set the default for how data is displayed in
the worksheet. You can also set the default for column widths and row
height. Options set here appear in all subsequently opened worksheets.
The Options dialog box Worksheet tab sets the display for:
➤ Numeric
➤ Date and Time
➤ Statistics
➤ Appearance
Setting Worksheet Numeric Display
To set the way numbers are displayed in the worksheet, select one of the
numeric formats available on the Options dialog box.
Figure 3–13
Selecting a Numeric Display
on the Options Dialog Box
Setting Decimal Places in the Worksheet
The column width limits the number of decimal places allowed. The
maximum number of decimal places cannot exceed the column width.
Figure 3–14
Setting Decimal Places on
the Options Dialog Box
3 Select the number of decimal places from the Decimal Places drop-
down list.
Changing Date and Time Display in a Worksheet
SigmaStat has a variety of date/time displays. When you enter a value
into a date/time formatted cell, SigmaStat assumes internal date/time
information about that value from the year to the millisecond.
For example, if you enter a day and month, you can display the month
and year.
Figure 3–15
Using the Options Dialog
Box to Change the Date and
Time Display on the
Worksheet
3 To change the Date format, you can type a format listed below, or
select a format from the drop-down list.
Typing                Displays
M/d/yy                No leading 0 for single digit month, day or year
MM/dd/yy              Leading 0 for single digit month, day or year
MMMM                  Complete month
dddd                  Complete day
yyy or yyyy           Complete year
MMM                   Three-letter month
ddd                   Three-letter day
gg                    Era (AD or BC)
Typing                Displays
hh or h               12 hour clock
HH or H               Military hours
mm or m               Minutes
ss or s               Seconds
uu or u               Milliseconds
H: h: m: s: or u      No leading zeroes for single digits
HH: hh: mm: ss: uu    Leading zero for single digits
tt                    Double letter AM or PM
t                     Single letter AM or PM
Day Zero
Setting a Start Date is only necessary if you are importing numbers to be
converted to dates, or converting dates to numbers for export. The
starting date must match the date used by the other application.
3 Select a date from the Day Zero drop-down list, or type your own
start date. SigmaStat provides three start dates:
➤ 1900
➤ 1904
➤ -4713
Figure 3–16
Using the Options Dialog
Box to Set the Start Date for
Date and Time Data
Day Zero becomes the number 1.00 when you change from Date and
Time to Numbers format. The basic unit of conversion is the day; that
is, whole integers correspond to days. Fractions of numbers convert to
times. Zero and negative numbers entered into the worksheet convert to
days previous to the Day Zero start date.
Conversion between date/time values and numbers can occur for the
calendar range of 4713 BC to beyond the year 4,000 AD. The internal
calendar calculates dates using the Julian calendar until September,
1752. After that, dates are calculated using the Gregorian calendar.
! If you convert numbers to dates, a start date is applied. If you convert the
dates back to numbers, be sure you use the same start date as when you
converted them, or they will have a different value.
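The conversion rule described above (Day Zero maps to 1.00, whole numbers are days, fractions are times) can be sketched in Python to show why the start date matters. This is an illustration, not the program's own code: the function names are hypothetical, and the 1900 and 1904 start dates are assumed to mean 1 January of those years, which the text does not state. Python's datetime type cannot represent the -4713 start date or Julian calendar dates, so only the two modern start dates are shown.

```python
from datetime import datetime, timedelta

# Assumed start dates (1 January is an assumption, not stated in the text).
DAY_ZERO_1900 = datetime(1900, 1, 1)
DAY_ZERO_1904 = datetime(1904, 1, 1)


def number_to_datetime(value, day_zero):
    """Day Zero corresponds to 1.0; fractions of a day become times."""
    return day_zero + timedelta(days=value - 1.0)


def datetime_to_number(moment, day_zero):
    """Inverse conversion, counted in days from the chosen Day Zero."""
    return (moment - day_zero) / timedelta(days=1) + 1.0


moment = number_to_datetime(2.5, DAY_ZERO_1900)
print(moment)                                     # date and time for 2.5
print(datetime_to_number(moment, DAY_ZERO_1900))  # same start date
print(datetime_to_number(moment, DAY_ZERO_1904))  # different start date
```

Converting back with the same start date recovers the original number, while converting back with a different start date shifts the result by the number of days between the two start dates.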
! Date and time values appear on the worksheet using the date and time
delimiters, generally a forward slash (/) or colon (:).
Changing Worksheet Appearance
Use the Options dialog box to adjust column width and row height, set
grid line color and thickness, set the data feedback display, and to set the
worksheet font and size.
! To learn how to set data feedback colors, see Setting Data Feedback Colors
below.
3 To adjust column width and row height, select from the
Column Width and Row Height drop-down lists. The maximum
number of columns displayed depends on the resolution of your
display and the size of the worksheet window.
Note that you can drag the boundaries of worksheet columns and
rows to resize them. For more information, see Sizing Individual
Columns and Rows on page 3-51.
Cell entries whose length exceeds the column width are displayed
as pound symbols (####).
4 To set color and thickness, select from the Color and Thickness
drop-down lists.
5 To set the font style and size, select from the Font and size drop-
down lists.
Figure 3–17
Using the Options Dialog
Box to Set Worksheet
Line Thickness
Setting Data Feedback Colors
Data Feedback highlights the cells and columns on the worksheet that
correspond to the selected curve or datapoint’s X and Y values.
Set data feedback colors and thickness from the X and Y drop-down
lists.
Figure 3–18
Using the Options Dialog
Box to Set Worksheet
Data Feedback Colors
3 To set the data display, click the Data tab. Under Type, select
Numeric, Text, or Date and Time data.
FIGURE 3–19
Using the Format Cells
Dialog Box to Set the
Numeric Display on a
Worksheet
4 To set the column width and row height, click the Rows and
Columns tab.
You can also drag the boundaries of column and row headings to
resize. To learn more, see Sizing Individual Columns and Rows
below.
5 Select heights and widths from the Height and Width drop-down
lists.
Sizing Individual Columns and Rows
If the contents of your column exceed the column width, cell contents
display as pound symbols (####). Label entries are truncated.
To change a column width, drag the boundary on the right side of the
column heading until the column is the size you want.
To change a row height, drag the boundary below the row heading until
the row is the size you want.
FIGURE 3–20
Dragging a Column
Heading to increase the
Column Width
Sorting Data
! Because the sort command sorts data in place, if you want the original data to
remain intact, copy the data to a new location and sort the copied data.
FIGURE 3–21
The Sort
Selection Dialog Box
3 Select the key column to use. If you sort more than one column of
data, the key column is used as the sorting index for all other
selected data. Only the key column is sorted. The rows in the other
selected columns are “attached” to the original rows in the key
column, and follow the rows in the key column as they are sorted.
Note that this will not necessarily sort the other columns in
ascending or descending order; instead, the order is determined by
the order the key column was sorted.
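The key-column behavior can be illustrated with a short Python sketch. The function name and the sample columns are hypothetical. Unlike the Sort command, which sorts in place, this sketch returns new lists.

```python
def sort_by_key_column(columns, key_index, descending=False):
    """Sort rows by one key column; the other columns follow row by row."""
    rows = list(zip(*columns))                      # columns -> rows
    rows.sort(key=lambda row: row[key_index], reverse=descending)
    return [list(col) for col in zip(*rows)]        # rows -> columns

weights = [72, 55, 90]
names = ["Carl", "Ann", "Bob"]
sorted_weights, sorted_names = sort_by_key_column([weights, names], key_index=0)
```

Here the weights end up in ascending order, while the names simply follow their original rows, so they are not in alphabetical order.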
FIGURE 3–22
The Column
Statistics Worksheet
You can close the Column Statistics window by clicking the button
in the upper right corner of the worksheet window, by choosing the
View menu Statistics command again, or by choosing the File menu
Close command.
Column Statistics Options
To display only a portion of the available statistics, use the Worksheet
Preferences dialog box, then select the column statistics to show or hide.
2 Select the statistic(s) you want shown or hidden, then use the
Show and Hide buttons to move the statistics between the Shown
and Not Shown lists.
width, see page 41. For more information on changing other data
display settings, see page 41.
Available Statistics The statistics shown in the Column Statistics window are determined by
your settings in the Column Statistics Options dialog box (see the
preceding section, Column Statistics Options). The following statistics
can be displayed in the Column Statistics window. Empty cells, missing
values, and text are ignored in most calculations.
Mean The arithmetic mean, or average, of all the cells in the column,
excluding the missing values. This is defined by:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
Std Dev The sample standard deviation is defined as the square root of
the sum of the squared differences between the data samples x_i in the
column and their mean, divided by n − 1. Missing values are ignored.
$$s = \left[ \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2 \right]^{1/2}$$
Std Err The standard error is the standard deviation of the mean. It is
the sample standard deviation divided by the square root of the number
of samples. For sample standard deviation s:
$$\text{Std Err} = \frac{s}{\sqrt{n}}$$
95% Conf The value for a 95% confidence interval. The end points of
the interval are given by:
$$\bar{x} \pm t(v, z)\,\frac{s}{\sqrt{n}}$$
where $\bar{x}$ is the mean, s is the sample standard deviation, and t(v, z) is
the t statistic for v = n − 1 degrees of freedom and z = 1.96 standard normal
percentile equivalent.
99% Conf The value for a 99% confidence interval. The end points for
this interval are computed from the equation for the 95% confidence
interval using z = 2.576.
Size The number of occupied cells in the column, whether they are
occupied by data, text, or missing values.
Min The value of the numerically smallest data value in the column,
ignoring missing values.
Max The value of the numerically largest data value in the column.
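As a cross-check of the definitions above, the following Python sketch computes the same quantities for one column. It is an illustration under stated assumptions, not SigmaStat's code: missing values are represented by `None`, text cells by strings, the list holds only occupied cells, and the critical t value `t_crit` is supplied by the caller (for example from a t table) rather than computed.

```python
import math
import statistics

def column_statistics(cells, t_crit):
    """Summarize one worksheet column; None = missing value, str = text."""
    data = [c for c in cells
            if isinstance(c, (int, float)) and not isinstance(c, bool)]
    n = len(data)
    std_dev = statistics.stdev(data)        # n - 1 in the denominator
    std_err = std_dev / math.sqrt(n)
    return {
        "Mean": statistics.mean(data),
        "Std Dev": std_dev,
        "Std Err": std_err,
        "Conf": t_crit * std_err,           # end points are Mean -/+ Conf
        "Size": len(cells),                 # occupied cells, any content
        "Min": min(data),
        "Max": max(data),
    }

# Five numeric values, so v = n - 1 = 4; 2.776 is the two-sided 95% t value.
stats = column_statistics([1, 2, None, 3, "n/a", 4, 5], t_crit=2.776)
```

The missing value and the text cell count toward Size but are excluded from every other statistic.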
Printing Column Statistics
To print or export column statistics, choose the File menu Print...
command and choose Statistics from the Print drop-down list. Note that
to print the names of the statistics, you must print the row titles:
click Options, then select Row Titles under the Headers options.
SigmaStat Worksheets are saved to Notebook files. To save data for the
current worksheet to a notebook file, choose the File menu Save
command, press Ctrl+S, or click the toolbar button.
If you are saving the notebook for the first time, the Save As dialog box
appears prompting you for a file name and path for the notebook file. If
you are saving the worksheet to an existing notebook file, the notebook
is updated to include the new worksheet or the changes to the existing
worksheet.
! To save worksheets as non-notebook files, you must export them using the
File menu Export... command. For more information on exporting
worksheets, see Exporting Worksheets on page 58.
Exporting Worksheets
Opening Worksheets
To open a worksheet, choose the File menu Open... command, click the
toolbar button, or press Ctrl+O. When the Open dialog box appears,
select the type of worksheet you want to open by selecting a file type
from the List Files of Type drop-down list, then click OK.
Worksheets in Notebook Files
If you open a SigmaStat Notebook file (.SNB), the notebook appears,
displaying its sections and items. To view the desired worksheet,
double-click the worksheet icon in the appropriate notebook section.
Non-Notebook Worksheet Files
Non-notebook files are individual files which are separate from the
notebook. They are automatically converted to notebook file format
when opened in SigmaStat. You can open the following non-notebook
worksheet file types in SigmaStat:
➤ Mocha and SigmaScan files (.MOC)
➤ DIF (.dif)
Importing Data
➤ MS Excel (*.xls)
➤ Plain Text (*.txt)
➤ Comma Delimited (*.csv)
➤ MS Access (*.mdb)
➤ SPSS (*.sav)
➤ SigmaPlot 1.0, 2.0 Worksheet (*.spw)
➤ SigmaPlot Macintosh 4.0 Worksheet (*.sp5)
➤ SigmaPlot Macintosh 5.0 Worksheet (*.spw)
➤ SigmaStat 1.0 Worksheet (*.spw)
➤ SigmaPlot DOS Worksheet (*.sp5, *.spg)
➤ SigmaStat DOS Worksheet (*.sp5)
➤ SigmaScan, SigmaScan Pro Worksheets (*.spw)
➤ Mocha, SigmaScan Image Worksheets (*.spw)
➤ Axon Binary (*.abf, *.dat)
➤ Axon Text (*.atf, etc.)
➤ Lotus 1-2-3 (*.wks, *.wk1, *.wk3, *.wk4)
➤ DBase (*.db2, *.db3, *.dbf)
➤ Quattro Pro (*.wq1, *.wkq)
➤ Paradox (*.db)
➤ Symphony (*.wk1, *.wr1, *.wrk, *.wks)
➤ SYSTAT (*.sys, *.syd)
➤ TableCurve 2D & 3D (*.tvc, *.txt, *.prn)
➤ DIF (*.dif)
1 Move the worksheet cursor to the worksheet cell where you want
the imported data to start.
2 Choose the File menu Import Data... command. A file dialog box
displaying the current drive, directory, and files appears.
3 Use the List Files of Type box to select the type of file you want to
import.
4 Change the drive and directory as desired, select the file you want
to read, then click Import, or double-click the file name.
Depending on the type of file, the data is either imported
immediately, or another dialog box appears.
Select the start and end of the range; the default is the entire range. The
dialog box lists the insertion point in the SigmaStat worksheet.
FIGURE 4–1
Import SPW Dialog Box
After selecting the range, click Import to place the data in the SigmaStat
worksheet.
FIGURE 4–2
Import Spreadsheet
Dialog Box
! Importing data from an Excel file places the data into a worksheet.
To open an Excel worksheet in SigmaStat, see Using Excel
Worksheets in SigmaStat on page 29.
Importing Text Files If you are importing a text file, the Import Text dialog box appears. Use
this dialog box to view the text file and to specify other delimiter types
or to build a model of the data file according to custom column widths.
The drop-down lists display all delimiters used by all saved formats
(see Saving Text Import Formats below).
FIGURE 4–3
The Import Text Dialog Box
Saving Text Import Formats You can save the specifications used to
import a text file for future use. Enter a name into the Format scheme
box, then click Add. Delete unwanted import formats using the
Remove button.
When you are finished specifying the file parameters, click Import. The
specified data from the file is imported.
Occasionally you may need to rearrange data from a row oriented format
to a column-wise organization or vice versa. In this case, you can use the
Edit menu Transpose Paste command to paste Clipboard contents with
the row and column coordinates transposed.
1 Drag the mouse or use the Shift+arrow keys to select the block of
data whose rows and columns you want to transpose.
3 Select the cell to paste the beginning of the data, then choose the
Edit menu Transpose Paste command. The data is pasted to the
worksheet with the column and row coordinates reversed.
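The effect of a transposed paste can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes a rectangular block (all selected columns have the same number of rows).

```python
def transpose_block(columns):
    """Swap row and column coordinates of a rectangular block of cells."""
    return [list(row) for row in zip(*columns)]

col1 = [1, 2, 3]
col2 = [10, 20, 30]
# Two columns of three rows become three columns of two rows.
pasted = transpose_block([col1, col2])
```

Each original row becomes one output column, so the first pasted column holds the first cell of every selected column.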
FIGURE 4–4
Results of Switching Rows
to Columns Using the
Transpose Paste Command
Columns 1 and 2
were copied and then
transpose pasted,
beginning in
column 3, row 1.
SigmaStat t-tests, analyses of variance (ANOVAs), repeated measures
ANOVAs, and their nonparametric analogs can analyze data in several
forms, such as:
➤ Raw data, which places the data for each group in separate columns;
this is the format used by SigmaStat.
➤ Indexed data, which places the group names in one column, and the
corresponding data for each group in another column.
➤ Statistical summary data, which can be used by unpaired t-tests and
One Way ANOVAs.
The data format is set in the Pick Columns dialog box that appears after
choosing the Statistics menu Run Current Test... command or clicking
the toolbar Run icon.
t-tests and Rank Tests The groups to be compared are always placed in
two columns.
Paired t-tests and signed rank tests (both repeated measures tests)
assume that the data for each subject is in the same row.
FIGURE 4–5
Raw Data for an
Unpaired t-test
For more information on arranging data for t-tests and rank tests, see
pages 8-207 and 8-221.
One Way ANOVA and One Way ANOVA on Ranks Data for each
group is placed in separate columns, with as many columns as there are
groups. One way repeated measures ANOVA and one way repeated
measures ANOVA on ranks assume that the data for each subject is in
the same row.
For more information on arranging data for one way ANOVAs, see page
8-231.
Raw Data for Two and Three Way ANOVAs The Two Way
ANOVA, Two Way Repeated Measures ANOVA, and Three Way
ANOVA cannot analyze raw data and require indexed data; for a
description of indexed data, see Indexed Data below. For more
information on using the Index command, see INDEXING DATA on page
73.
For more information on arranging data for Two Way ANOVAs, see
Arranging Two Way ANOVA Data on page 254.
Two Way ANOVAs require two factor columns and one data column.
Three Way ANOVAs require three factor columns and one data column,
and repeated measures ANOVAs require an additional subject column
to identify the subject of the measurement.
The order of the rows containing the index and data does not matter;
i.e., they do not have to be grouped or sorted by factor level or subject.
! If you are analyzing entire columns of data, the location in the worksheet of
the factor, subject, and data columns does not matter.
If you plan to compare only a portion of the data, put the index in the
FIGURE 4–6
Indexed Data for
a One Way ANOVA
Column 1 (Species) is the
factor column, with levels A,
B, and C, and column 2
(Density) is the
corresponding data.
You can index data using Edit menu Index command. For information
on indexing data, see INDEXING DATA below.
For more information on arranging data for the t-test and the Rank Sum
Test, see pages 8-207 and 8-221 in Chapter 9.
For more information on arranging data for the Paired t-test and the
Signed Rank Sum Test, see pages 333 and 346 in Chapter 10.
For more information on arranging data for the One Way ANOVA and
the ANOVA on Ranks, see pages 8-231 and 9-311 in Chapter 9.
Two Way ANOVA Two factor columns are required for Two Way
ANOVAs, one for each level of the observation. Each data point should
be represented by different combinations of the factors; see Table 4-3 on
page 74 and Figure 4–13 on page 75 for an example. The factors are
Gender and Drug, and the levels are Male/Female and Drug A/Drug B.
! If you do not want to bother entering indexed data for a Two Way ANOVA,
you can enter the data for each cell of the Two Way ANOVA table into
separate columns, then use the Edit menu Index command to create the
indexed columns. See page 4-73 for this procedure.
For more information on arranging data for the Two Way ANOVA, see
page 9-254 in Chapter 9.
Three Way ANOVA Three factors are required for Three Way
ANOVAs, one for each level of observation. Each data point should be
represented by different combinations of the factors.
For more information on arranging data for the Three Way ANOVA, see
Three Way Analysis of Variance (ANOVA) on page 283.
FIGURE 4–7
Indexed Data Format
for a Two Way Repeated
Measures ANOVA of the
Data from Table 3-1
Column 1 is the subject
column, columns 2 and 3
are the factor columns, and
column 4 is the data column.
Statistical Summary Data
Unpaired t-tests and one way ANOVAs can be performed on summary
statistics of the data. These statistics can be in the form of:
➤ The sample size, mean, and standard deviation for each group, or
➤ The sample size, mean, and standard error of the mean (SEM) for
each group.
FIGURE 4–8
Statistical Summary Data
for a One Way ANOVA
another column, and the standard deviations (or standard errors of the
mean) in a third column, with the data for each group in the same row.
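Because the standard error is the sample standard deviation divided by the square root of the sample size, the two summary formats carry the same information. The Python sketch below converts one to the other; the function name and the sample rows are made up for illustration.

```python
import math

def sd_from_sem(sem, n):
    """Standard deviation recovered from the standard error of the mean."""
    return sem * math.sqrt(n)

# One row per group: (sample size, mean, SEM). Values are illustrative only.
summary_rows = [(16, 5.2, 0.25), (25, 6.1, 0.30)]
as_sd = [(n, mean, sd_from_sem(sem, n)) for n, mean, sem in summary_rows]
```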
Data for χ² (Chi-Square) tests, the Fisher Exact Test, and McNemar’s
Test can be arranged in the worksheet as either the contingency table
data, or as indexed raw data.
Tabulated Data Tabulated data is arranged in a contingency table showing the number of
observations for each cell. The worksheet rows and columns correspond
to the groups and categories. The number of observations must always
be an integer.
Note that the order and location of the rows or columns corresponding
to the groups and categories is unimportant. You can use the rows for
category and the columns for group, or vice versa.
Fisher Exact Test The data for a Fisher exact test must form a 2 x 2
(two rows by two columns) contingency table, with 5 or fewer expected
observations in one or more cells of the table.
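The expected number of observations in a cell is conventionally the row total times the column total, divided by the grand total. The Python sketch below applies that standard formula to a 2 x 2 table and checks the "5 or fewer" condition; the function name and the counts are hypothetical, and the check is not necessarily how SigmaStat decides which test to suggest.

```python
def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

observed = [[3, 7],
            [8, 2]]
expected = expected_counts(observed)
# True when at least one cell has an expected count of 5 or fewer.
has_small_cell = any(e <= 5 for row in expected for e in row)
```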
Raw Data You can report the group and category of each individual observation by
placing the group in one worksheet column and the corresponding
category in another column. Each row corresponds to a single
observation, so there should be as many rows of data as there are total
numbers of observations.
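The relationship between the raw and tabulated arrangements can be sketched in Python: counting the group and category pairs of the raw rows yields the contingency table. The function name and sample observations are illustrative assumptions.

```python
from collections import Counter

def tabulate(groups, categories):
    """Count raw (group, category) rows into a contingency table."""
    counts = Counter(zip(groups, categories))
    group_names = sorted(set(groups))
    category_names = sorted(set(categories))
    table = [[counts[(g, c)] for c in category_names] for g in group_names]
    return group_names, category_names, table

# One entry per observation, as in the raw data format.
groups = ["Men", "Men", "Women", "Women", "Women"]
categories = ["Yes", "No", "Yes", "Yes", "No"]
row_names, col_names, table = tabulate(groups, categories)
```

The cell counts always sum to the number of raw rows, since each row is a single observation.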
FIGURE 4–9
Worksheet Data
Arrangement for
Contingency Table
Data from Table 3-1
Columns 1 through 3 are in
tabular format, and columns
4 and 5 are raw data.
Fisher Exact Test There can be no more than two categories for each
group, so that exactly four possible combinations result. There should be
no more than 5 observations in at least one combination of categories.
McNemar’s Test There must be the same set of categories used in each
column.
Raw Data All regressions use data arranged in raw data format. To enter data in raw
format, place the data for the observed dependent variable in one
FIGURE 4–10
Data for a Multiple
Linear Regression
Temperature and pH are
the independent
variables, and Growth
Rate is the dependent
variable.
Grouped Data Only the Logistic Regression uses the grouped data format. Use grouped
data to specify the number of instances a combination of dependent and
independent variables appear in a logistic regression data set. This data
format is useful if you have several instances of the same variable
combination, and you don’t want to enter every instance in the
worksheet.
To enter data in grouped format, place the data for the observed
dependent variable in one column and the data for the corresponding
independent variables in one or more columns. Only enter one instance
of each different combination of dependent and independent variables,
then specify the number of times the combination appears in the data set
in the corresponding row of another worksheet column.
For example, if there are three instances of the dependent variable 0 with
corresponding independent variables of 26 and 142, place 0 in the
dependent variable column, 26 and 142 in the corresponding rows of
the independent variable columns, and 3 in the corresponding row of
the count worksheet column.
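To make the grouped format concrete, the Python sketch below expands grouped rows back into the equivalent raw rows. The function name is hypothetical; the first grouped row is the example from the text, and the second is invented for illustration.

```python
def expand_grouped(rows):
    """rows: (dependent, independents, count) -> one raw row per instance."""
    raw = []
    for dependent, independents, count in rows:
        raw.extend([(dependent, *independents)] * count)
    return raw

grouped = [
    (0, (26, 142), 3),   # from the text: three instances of 0 with 26 and 142
    (1, (31, 150), 2),   # illustrative second combination
]
raw_rows = expand_grouped(grouped)
```

Two grouped rows stand in for five raw rows here, which is the saving the grouped format offers.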
You can convert raw data to indexed data and vice versa, using the
Transforms menu Index and Unindex commands. You can index and
unindex data with one and two factors.
Creating Indexed Data
Before indexing data, add titles to the columns. The column title strings
are used as the index codes.
! If you are indexing two ways, you must use column titles consisting of the
levels of the two factors for that table cell, separated by a hyphen (–), forward
slash (/) or colon (:). These level names will be used for the index codes.
For more information on entering unindexed cell data for a Two Way
ANOVA, see Indexing Data on page 73.
To index data:
FIGURE 4–11
The Results of a One Way
Index of Columns 1, 2, and 3
The results appear in
columns 4 and 5.
2 Select the output column to place the indexed data by clicking the
worksheet column. This should be an empty column with at least
one empty column to the right for a One Way ANOVA, or two
empty columns for Two Way ANOVA.
3 Select the columns to index, either by clicking the worksheet
columns, or selecting the column from the Data for Input drop-
down list. Click Finish to index the contents of the selected input
columns in the selected output column.
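A one way index can be sketched in Python as follows: each raw column's title becomes the index code for every value in that column. The function name and the sample data are hypothetical, and this is an illustration of the data layout rather than the Index command itself.

```python
def index_one_way(columns):
    """columns: {title: values}. Returns (factor column, data column)."""
    factor, data = [], []
    for title, values in columns.items():
        for value in values:
            factor.append(title)   # the column title is the index code
            data.append(value)
    return factor, data

raw = {"A": [1.2, 1.4], "B": [2.1], "C": [3.3, 3.0]}
factor_col, data_col = index_one_way(raw)
```

The output occupies two columns, which is why the output column needs one empty column to its right.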
Indexing Raw Data for a Two Way ANOVA In order to index data for
a Two Way ANOVA, you must have entered the data for each cell of the
Two Way ANOVA table into separate columns before indexing.
1 Decide on the strings to use as the indexes for the factor levels.
These should be no longer than six characters in length.
2 Enter the factor level combination for each cell, using the index
names for the levels, as the titles for the columns.
Repeat this for every cell, being sure that you enter the name for
the level identically each time, with the first factor level entered
first, followed by the second factor level.
3 Enter the cell data into the column with the corresponding
column title.
FIGURE 4–12
Raw Data Format for a Two
Way ANOVA, Arranged by
Cell (see Table 3-4)
TABLE 4-3
A Two Way ANOVA table
The factors are Gender and Drug, and the levels are Male/Female and
Drug A/Drug B
4 To create the indexed columns, choose the Transforms menu Index
command, choose Two Way, and select the columns as directed.
The levels used in the column titles are used as the level indexes.
5 If you are indexing data for a two way repeated measures test, you
still need to enter the subject index. Use the Edit menu Copy,
Paste, and Stack commands to quickly create a subject index
column.
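The two way indexing procedure can be sketched in Python: each column title is split at the separator into its two factor levels, which then index every value in that column. The function name and the sample values are assumptions; the Gender and Drug levels follow the example in the text.

```python
import re

def index_two_way(columns):
    """columns: {"LevelA/LevelB": values}. Returns factor A, factor B, data."""
    factor_a, factor_b, data = [], [], []
    for title, values in columns.items():
        # Split once at a hyphen, en dash, forward slash, or colon.
        level_a, level_b = re.split(r"\s*[-–/:]\s*", title, maxsplit=1)
        for value in values:
            factor_a.append(level_a)
            factor_b.append(level_b)
            data.append(value)
    return factor_a, factor_b, data

raw = {
    "Male/Drug A": [5.1, 4.8],
    "Male/Drug B": [6.0],
    "Female/Drug A": [4.2],
    "Female/Drug B": [5.5, 5.9],
}
gender, drug, response = index_two_way(raw)
```

A title without a separator raises an error in this sketch, which mirrors the requirement that every title name both factor levels.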
FIGURE 4–13
Columns 5 through 7
contain indexed data for a
Two Way ANOVA, generated
using the Index Two Ways
command.
Unindexing Data Indexed data can be unindexed for graphing purposes using the unindex
command.
FIGURE 4–14
Results of a Two Way
Unindex of Columns
5, 6, and 7
Columns 5 and 6 are the
factor columns, and column
7 is the data column. The
unindexed data was placed in
column 8 and appears in
columns 8 through 11.
4 The data is unindexed into raw data. The level indexes are used as
the column titles. If you unindexed one way, each column contains
the data for each level of the factor.
5 If you unindexed two ways, each column contains the data for one
cell in the Two Way ANOVA table, and the two factor levels
appear as the column title, separated by a hyphen (-).
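Unindexing two ways is the reverse operation, sketched below in Python: rows are grouped by their pair of factor levels, and each pair becomes one raw column titled with the two levels joined by a hyphen. The function name and sample rows are hypothetical.

```python
def unindex_two_way(factor_a, factor_b, data):
    """Group indexed rows into one raw column per factor-level pair."""
    cells = {}
    for a, b, value in zip(factor_a, factor_b, data):
        cells.setdefault(f"{a}-{b}", []).append(value)
    return cells

gender = ["Male", "Male", "Female", "Male"]
drug = ["Drug A", "Drug B", "Drug A", "Drug A"]
response = [5.1, 6.0, 4.2, 4.8]
raw_columns = unindex_two_way(gender, drug, response)
```

The rows do not need to be sorted by factor level: the last row here still lands in the "Male-Drug A" column.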
Quick Mathematical Transforms
SigmaStat provides quick transform functions that can be executed from
a menu command. These functions are:
➤ Add
➤ Subtract
➤ Divide
➤ Square
➤ Absolute value
➤ Natural logarithm
FIGURE 4–15
The Quick Transforms
and Transforms Available
from the Transform Menu
2 Select the column with the data you want to manipulate as your
input column.
3 Select the column where you want to place the transform results as
your output column, then click Run. The results appear in the
specified column.
User-Defined Transforms
You can specify transforms other than those provided as commands in
the Transforms menu using the User-Defined... command. User-defined
transforms use the SigmaStat transform language. These
transforms are defined by typing equations using variables you define,
the transform language functions, and standard arithmetic and
logic operators.
User-defined transforms can use data from the worksheet as well as save
equation results to the worksheet.
2 Select the edit box to begin entering the desired equations. As you
enter your equation, the window scrolls up to accommodate all of
the
FIGURE 4–16
The User-Defined
Transform Dialog Box
The transform entered
into the Edit Window
recodes the numeric data
in column 1 to the values
“SMALL,” “MEDIUM,” and
“LARGE” in column 2.
FIGURE 4–17
The Results of the
User-defined Recoding
Transform from Figure 4–16
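The recoding shown in the figures can be approximated in Python. This is not SigmaStat transform language syntax, and the cutoff values are hypothetical, since the thresholds used in the figure are not given in the text.

```python
def recode(value, small_below=10, large_from=100):
    """Map a number to a size label; both cutoffs are assumed values."""
    if value < small_below:
        return "SMALL"
    if value < large_from:
        return "MEDIUM"
    return "LARGE"

column_1 = [3, 42, 250, 10]
column_2 = [recode(v) for v in column_1]   # results go to a second column
```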
You can send the contents of the worksheet to a printer using the toolbar
button or the File menu Print... command. To print the worksheet:
1 Make sure that the worksheet is the active window. If you want to
print only a portion of the columns in the worksheet, select a block
from the worksheet.
FIGURE 4–18
The Print Dialog Box for the
HP LaserJet 4/4M Postscript
Printer Driver
3 Set the appropriate options, then click OK. The Print Data
Worksheet dialog box appears.
4 Specify whether you want to print the entire worksheet, only the
selected cells in the worksheet, or a specified range of columns by
selecting one of the options under the Area to Print heading.
5 To include page, column, and row titles, and column numbers for
the current worksheet, select the appropriate check boxes.
6 To print the data at the full twenty-one place precision, select the
Full Precision option. Otherwise, the data is printed as displayed in
the worksheet. Worksheet data display is controlled with the File
menu Preferences... command (see page 41).
FIGURE 4–19
The Print Data
Worksheet Dialog Box
Printing Column Statistics
To print column statistics, select the column statistics worksheet, click
the toolbar button, choose the File menu Print... command, or press
Ctrl+P, then follow the procedures for printing the worksheet (see
PRINTING WORKSHEET DATA on page 80).
! Note that in order to print the names of the statistics that appear in the row
region of the worksheet, you must select to print row titles.
2 Answer the questions about what you want to do and the format of
your data. Click Next to go to the following dialog box, Back to go
to the preceding dialog box, Finish to view the suggested test, or
Cancel to close the Advisor Wizard.
3 When a test is suggested, click Run to perform the test. The Pick
Columns dialog box for the suggested test appears prompting you
to select the worksheet columns with the data you want to test. For
information on how to use this dialog box, see page 99.
The remainder of this chapter describes the answers for each dialog box.
The first step in assigning a test appropriate to your data is defining what
you want to accomplish. SigmaStat’s Advisor begins by prompting you
to select what you need to do. After selecting the desired general goal,
you are either prompted for additional information or a dialog box
appears suggesting the test to use.
Describe your Data with Basic Statistics
Select this option if you want to view a list of descriptive statistics for
one or more columns of data.
After you select this option, click Finish. SigmaStat suggests performing
the Describe Statistics test. Click Run to perform the test. The Pick
Columns dialog box appears prompting you to pick the columns you
want to use for the test. For directions on performing this test, see page
107. For information on the results of this procedure, see page 108.
FIGURE 4–1
The Test Suggestion
Dialog Box Prompting You
to Run the Suggested Test
Comparing Groups or Treatments for Significant Differences
Select this option if you want to compare data for significant differences,
for example, if you want to compare the mean blood pressure of people
who are receiving different drug treatments. The data to be compared
can be the data collected from different groups, the data for different
treatments on the same subjects, or the distributions or proportions of
different groups.
If you select this option, you are asked to describe how your data is
measured; see How are the data measured? on page 85.
Predict a Trend, Find a Correlation, or Fit a Curve
Select this option if you want to use regression to predict a dependent
variable from one or more independent variables, or describe the
strength of association between two variables with a correlation
coefficient. For example, select this option if you want to see if you can
predict the average caloric intake of an animal from its weight.
If you select this option, you are asked to describe how your data is
measured; see the following section, HOW ARE THE DATA MEASURED?.
Determine Sample Size for an Experimental Design
Select this option if you want to determine the desired sample size for an
experiment you intend to perform.
If you select this option, you are asked to describe how the data is
measured; see the following section, HOW ARE THE DATA MEASURED?.
Determine the Sensitivity of an Experimental Design
Select this answer to determine the power or ability of a test to detect an
effect for an experiment you want to perform.
If you select this option, then click Next, you are asked to describe how
the data is measured; see the following section, HOW ARE THE DATA
MEASURED?.
You need to define how your data are measured to determine which
SigmaStat test to perform for most procedures.
➤ By numeric values.
➤ By order or rank.
➤ By proportion or number of observations.
By Numeric Values (e.g., meters or degrees)
Select this option if your data are measured on a continuous scale using
numbers. Examples of numeric values include height, weight,
concentrations, ages, or any measurement where there is an arithmetic
relationship between values.
By Order or Rank (e.g., poor, fair, good, excellent)
Select this option if your data are measured on a rank scale that has an
ordering relationship, but no arithmetic relationship, between values.
For example, clinical status is often measured on an ordinal scale, such
as: Healthy = 1, Feeling ill = 2, Sick = 3, Hospitalized = 4, and Dead =
5. These ratings show that being dead is worse than being healthy, but
they do not indicate that being dead is five times worse than being
healthy.
By Proportion or Number of Observations in Categories (e.g., male vs. female)
Select this option if your data is measured on a nominal scale, which
counts the number or proportions that fall into categories, and where
there is no relationship between the categories (such as Democrat versus
Republican).
➤ If you are comparing groups or treatments for differences, you are
asked if you have repeated observations on the same individuals. See
Did you apply more than one treatment per subject? on page 86.
➤ If you are predicting a trend, click Finish. SigmaStat suggests
running a Multiple Logistic Regression. Click Run to perform the
test, Cancel to exit the Advisor and return to the worksheet, or Help
for information on the test. For information on how to perform a
Multiple Logistic Regression, see page 527. For information on
Logistic Regression results, see page 544.
➤ If you are determining a sample size or the sensitivity of an
experimental group, you are asked how your data is formatted. See
What kind of data do you have? on page 91.
Yes Answer Yes if the observations are different treatments made on the same
subjects. Select Yes when you are comparing the same individuals before
and after one or more different treatments or changes in condition. For
example, you would select Yes if you were testing the effect of changing
diet on the cholesterol level of experimental subjects, or if you were
taking an opinion poll of the same voters before and after a political
debate.
Two Select this option if you have two different experimental groups or if
your subjects underwent two different treatments.
After you select this option, SigmaStat suggests the appropriate test.
Click Finish to view the suggested test, then Run to perform the test.
Click Cancel to exit the Advisor and return to the worksheet, or Help
for information on the test.
Three or More Select this option if you have three or more different groups to
compare, or are comparing the response of the same subjects to three or
more different treatments.
For example, if you collected ethnic diversity data from five different
cities, or subjected individuals to a series of four dietary changes and
measured change in serum cholesterol, you are analyzing three or more
groups.
After you select this option, click Finish. SigmaStat suggests the
appropriate test. Click Run to perform the test, Cancel to exit the
Advisor and return to the worksheet, or Help for information on the test.
There are two combinations of groups or treatments to consider (e.g., males
and females from different cities)
Select this option if each experimental subject is affected by two
different experimental factors or underwent two different treatments
simultaneously. Note that different levels of a factor, such as male and
female for gender, are not considered to be different factors.
For example, if you were comparing only males and females, you would
have only one factor. However, if you compared males and females from
different countries, there would be two factors, gender and nationality.
After you select this option, click Finish. SigmaStat suggests the
appropriate test. Click Run to perform the test, Cancel to exit the
Advisor and return to the worksheet, or Help for information on the
test.
There are three combinations of groups to consider.
Select this option if each experimental subject is affected by three
different experimental factors or underwent three different treatments
simultaneously. Note that different levels of a factor, such as male and
female for gender, and Italian and German for nationalities, are not
considered to be different factors.
For example, if you are comparing only males and females, from Italy
and Germany, you have only two factors. However, if you are comparing
males and females from different countries with different diets, there are
three factors: gender, nationality, and diet.
After you select this option, click Finish to view the suggested test.
SigmaStat suggests you run a Three Way ANOVA. Click Run to
perform the test, Back to return to the previous Advisor panel, or
Cancel to return to the worksheet without running the test. For
directions on performing the Three Way ANOVA, see page 293. For
descriptions of the results for the Three Way ANOVA, see page 300.
This is a measure of the association between two variables
If you are determining power or sample size, this option also appears. If
you select this answer, click Finish. SigmaStat suggests performing power
or sample size computations for a correlation coefficient.
Click Run to perform the test, Cancel to exit the Advisor and return to
the worksheet, or Help for information on the test.
You can have two kinds of data that are arranged by proportions in
categories. After specifying the kind of data you have, click Finish to
view the suggested test, Back to return to the previous panel, or Cancel
to quit the Advisor and return to the worksheet. Click Run to perform
the test, Cancel to return to the worksheet, or Help for information on
the test.
A Contingency Table Select this option if you have data in the form of a contingency table. A
contingency table is a method of displaying the observed numbers of
different groups that fall into different categories; for example, the
number of men and women that voted for a Republican or Democratic
candidate. These tables are used to see if there is a difference between the
expected and observed distributions of the groups in the categories.
A contingency table uses the groups and categories as the rows and
columns, and places the number of observations for each combination
in the cells. For more information on how to create a contingency table,
see page 443.
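As a rough illustration of how expected and observed counts are compared, the sketch below computes the expected count for each cell and an uncorrected chi-square statistic. The function name and the vote counts are hypothetical, and the sketch applies no continuity correction, so it is not necessarily the computation SigmaStat performs.

```python
def chi_square(table):
    """Return (expected, chi2) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count for a cell = row total * column total / grand total
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    return expected, chi2

# Hypothetical counts: rows are men and women, columns are the two candidates.
votes = [[30, 20],
         [20, 30]]
expected, chi2 = chi_square(votes)
```

The rows and columns of the list mirror the rows and columns of the worksheet table, with one observed count per cell.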
Observed Proportions Select this option when you have data for the sample sizes of two groups
and the proportion of each group that falls into a single category. This
data is used to see if there is a difference between the proportion of two
different groups that fall into the category. For information on how to
enter this data, see page 435.
If you select this option, click Finish to view the suggested test;
SigmaStat suggests that you compare proportions. Click Run to perform
the procedure, Cancel to quit the Advisor, or Help for information on
the test. For directions on performing this procedure, see page 438. For
descriptions of the results for this procedure, see page 439.
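A minimal sketch of comparing two observed proportions follows. It assumes a pooled two-proportion z statistic with no continuity correction; the function name and sample values are hypothetical, and SigmaStat's own procedure may differ in detail.

```python
import math

def two_proportion_z(n1, p1, n2, p2):
    """z statistic for the difference between two observed proportions."""
    # Pooled proportion under the hypothesis that both groups share one rate
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_err

# Hypothetical data: two groups of 100, with 60% and 40% in the category.
z = two_proportion_z(100, 0.60, 100, 0.40)
```

The inputs match the data described above: a sample size and a proportion for each of the two groups.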
Fit a Straight Line Select this answer to find the slope and the intercept of the line
Through the Data
y = p0 + p1 x
that most closely describes the relationship of your data, where y is the
dependent variable and x is the independent variable.
If you select this option, click Finish to view the suggested test.
SigmaStat suggests performing a Linear Regression. Click Run to
perform the procedure, Cancel to quit the Advisor, or Help for
information on the test. For directions on performing this procedure, see
page 482. For descriptions of the results for this procedure, see page
483.
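To make the slope and intercept concrete, here is a small least squares sketch for the line y = p0 + p1 x. The function name and data are hypothetical; it shows the standard closed-form estimates only and none of the diagnostics a full Linear Regression report contains.

```python
def fit_line(x, y):
    """Return (p0, p1), the least squares intercept and slope."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariation of x and y divided by the variation of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    p1 = sxy / sxx
    p0 = mean_y - p1 * mean_x
    return p0, p1

# Hypothetical data lying exactly on y = 1 + 2x
p0, p1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```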
Fit a Curved Line Select this answer to find an equation that predicts the dependent
Through the Data variable from an independent variable without assuming a straight line
relationship. If you select to fit a curved line through your data,
SigmaStat asks you what kind of curve you want to use; see WHAT KIND
OF CURVE DO YOU WANT TO USE? on page 93.
Predict a Select this option if you want to predict a dependent variable from more
Dependent than one independent variable using the linear relationship
Variable from Several
Independent Variables y = b0 + b1 x1 + b2 x2 + b3 x3 + ... + bk xk
where y is the dependent variable, x1, x2, x3, ..., xk are the k independent
variables, and b0, b1, b2,...,bk are the regression coefficients. As the values
for xi vary, the corresponding value for y either increases or decreases
proportionately.
If you select this option, SigmaStat asks how you want to specify the
independent variables. See HOW DO YOU WANT TO SPECIFY THE
INDEPENDENT VARIABLES? on page 94.
Measure Variable Select this option to find how closely the value of one variable predicts
Association Strength the value of another (i.e., the likelihood that a variable increases or
decreases when the other variable increases or decreases), without
specifying which is the dependent and independent variable.
If you select this option, click Finish. SigmaStat suggests computing the
Pearson Product Moment Correlation. Click Run to perform the
procedure, Cancel to quit the Advisor, or Help for information on the
test.
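The sketch below computes a Pearson product moment correlation coefficient from two columns of paired values. The function name and data are hypothetical. Swapping the two arguments gives the same result, which reflects the point that neither variable is treated as dependent.

```python
import math

def pearson_r(x, y):
    """Pearson product moment correlation coefficient of paired values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: one pair that rises together, one that moves oppositely
r_rising = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_falling = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
```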
If you are trying to predict one variable from one or more other variables
using a curved line, you are asked what kind of curve you want to use.
A Polynomial Select this option if you want to use a kth order polynomial curve of the
Curve with One form
Independent Variable
y = b0 + b1 x + b2 x^2 + ... + bk x^k
A General Select this option if you want to describe your data with a nonlinear
Nonlinear Equation function. Common nonlinear functions include rising and falling
exponential and log curves, logistic sigmoid curves, and hyperbolic
curves that approach a maximum or minimum.
Let SigmaStat Select the Select this option if you want SigmaStat to screen the potential
“Best” Variables to independent variables you select and only include ones that significantly
Include in the Equation contribute to predicting the dependent variable. You are then asked how
you want to select the independent variables; see HOW DO YOU WANT
SIGMASTAT TO SELECT THE INDEPENDENT VARIABLE? below.
If you are predicting the value of one variable from other variables, and
you want SigmaStat to screen potential variables for their contribution
to the predictive value of the regression equation, you can select one of
three different methods.
Sequentially Add New Select this option to select the independent variables for the equation by
Independent Variables to starting with no independent variables, then adding variables until the
the Equation ability to predict the dependent variable is no longer improved. The
variables are added in order of the amount of predictive ability they add
to the model.
Sequentially Remove Select this option to select the independent variables for the equation by
Independent Variables starting with all independent variables in the equation, then deleting
from the Equation variables one at a time. The variable that contributes the least to the
prediction of the dependent variable is deleted from the equation first.
This elimination process continues until the ability of the model to
predict the dependent variable is reduced below a specified level.
If you select this option, click Finish. SigmaStat suggests the Backward
Stepwise Regression. Click Run to perform the test, Cancel to exit the
Advisor and return to the worksheet, or Help for information on the
test. For directions on performing this procedure, see page 594. For
descriptions of the results for this procedure, see page 596.
Consider All Possible Select this option if you want SigmaStat to evaluate all possible
Combinations of the regression models, and isolate the models that “best” predict the
Independent Variable dependent variable.
and Select the Best
Subset If you select this option, click Finish. SigmaStat suggests the Best Subset
Regression. Click Run to perform the procedure, Cancel to exit the
Advisor and return to the worksheet, or Help for information on the
test. For directions on performing this procedure, see page 618. For
descriptions of the results for this procedure, see page 619.
SigmaStat selects the sets of independent variables that “best” predict the
dependent variable using criteria specified in the Best Subsets Regression
Options dialog box.
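To give a sense of what "all possible combinations" means, the sketch below only enumerates the candidate sets of independent variables; it does not fit or rank any model. The function and variable names are hypothetical. With k candidate variables there are 2^k - 1 non-empty subsets, so the number of models grows quickly.

```python
from itertools import combinations

def candidate_subsets(variables):
    """Yield every non-empty subset of the candidate independent variables."""
    for size in range(1, len(variables) + 1):
        for subset in combinations(variables, size):
            yield subset

# Three hypothetical candidate variables give 2**3 - 1 = 7 possible models
subsets = list(candidate_subsets(["x1", "x2", "x3"]))
```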
The statistical procedure used to analyze a given data set depends on the
goals of your analysis and the nature of your data. The Advisor Wizard
asks you questions about your goals and your data, then selects the
appropriate test. For information on how to use the Advisor Wizard, see
Chapter 4. Alternately, you can perform SigmaStat's statistical
procedures directly by choosing the appropriate Statistics menu
command.
3 If desired, setting the test options using the selected test’s Options
dialog box.
4 Running the test by picking the worksheet columns with the data you want
to test using the Pick Columns dialog box.
Arranging Worksheet The method used to enter or arrange data in the worksheet depends on
Data the type of test you are running. For information on how to arrange data
for the different tests, see Data Format for Group Comparison Tests on
page 204, Data Format for Repeated Measures Tests on page 330, Data
Format for Rate and Proportion Tests on page 431, and Data Format for
Regression and Correlation on page 468.
Selecting a Test You can select a test by selecting the test from the drop-down list in the
toolbar or by choosing the appropriate Statistics menu command.
To change test options before you run a test, you must select the test in
the toolbar drop-down list, then choose the Statistics menu Current Test
Options... command, or click the toolbar button.
Setting Test Options Almost all SigmaStat procedures can be configured with a set of options.
These settings enable you to perform additional tests and procedures.
You may wish to enable or disable some of these options or change
assumption checking parameters; all changes are saved between
SigmaStat sessions.
1 Select the test you will be running from the drop-down list in the
toolbar, then click the button or choose the Statistics menu
Current Test Options... command.
2 Select the tab of the options you want to view. Click a selected
check box if you do not want to use that test option. Click an
unselected check box to include an option in the test.
FIGURE 5–1
Example of
an Options Dialog Box
Each test has
its own settings.
3 Click the Select All button to select all the options in the panel.
Click Clear to clear all the selected options in the panel.
4 Once you have changed the desired options, click Run Test to
continue the test. The Pick Columns dialog box appears (see the
next topic for more information). To accept the current settings
without continuing the test, click Apply. To close the dialog box
without changing any settings or running the test, click Cancel.
Select Help at any time to access SigmaStat’s on-line help system.
Picking Data to Test The Pick Columns dialog box is used to select the worksheet columns
with the data you want to test and to specify how your data is arranged
in the worksheet.
1 Start the test you want to run; this opens the Pick Columns dialog
box for that test. You can either:
➤ Select the test from the drop-down list in the toolbar, then click
the toolbar button.
➤ Click Run Test from the Options dialog box.
➤ Choose the test from the Statistics menu.
2 If your data can be arranged in more than one format, the Pick
Columns dialog box appears prompting you to specify a data
format. Select the appropriate format from the Data Format drop-
down list, then click Next.
The available formats depend on the test you are running. For
information on how data can be arranged for the different
SigmaStat tests, see Data Format for Repeated Measures Tests on
page 330, Data Format for Rate and Proportion Tests on page 431,
and Data Format for Regression and Correlation on page 468.
FIGURE 5–2
The Pick Columns
Dialog Box for a One Way
ANOVA Prompting
For a Data Format
The data formats available
depend on the type of test
you are running.
If the test you are running uses only one type of data format, the
Pick Columns dialog box appears prompting you to select the
columns with the data you want to test (see the following step).
3 If you selected columns before you chose the test, the selected
columns automatically appear in the Selected Columns list. To
assign the desired worksheet columns to the Selected Columns list,
select the columns in the worksheet, or select the columns from
the Data drop-down list.
The dialog box indicates the type of data you are selecting.
FIGURE 5–3
The Pick Columns Dialog
Box
for a One Way ANOVA
Using Raw Data
If you select your data
columns before you
run the test, the columns
appear in the dialog box.
FIGURE 5–4
The Pick Columns
Dialog Box
for the Forward Stepwise
Regression Prompting
You to Specify the
Variables to Force into
the Regression Equation
FIGURE 5–5
The Convert Empty Cells
to Missing Values Dialog
Box
Reports and Test reports automatically appear after a test has been performed. To
Report Graphs generate a report graph, make sure the report is the active window, then
click the toolbar button, or choose the Graph menu Create Graph...
command.
Graphs are not created for rates and proportions tests, or for best subset
and incremental polynomial regression reports. The toolbar button
and the Graph menu Create Graph... command are dimmed for these
tests.
! If you close a report without generating or saving a graph, the graph is not
recoverable. See Saving Graphs in Notebook Files on page 200 for more
information on saving graphs.
Repeating Tests Repeating a test involves running the last test you performed, using the
same worksheet columns. To repeat a test using new data columns, use
the button or the Statistics menu Run Current Test... command (see
Running SigmaStat Procedures on page 97 for more information).
1 Make sure the last test you performed is displayed in the toolbar
drop-down list.
2 If desired, edit the data in the columns used by the test. You can
add data and change values and column titles.
3 To change the option settings before you rerun the test, click the
toolbar button, change the desired options, then click OK to
accept the changes and close the dialog box.
5 Click Finish to repeat the procedure using these columns. After the
computations are complete, a new report appears.
Numeric, normally distributed with equal variances: Unpaired t-test | One Way or Two Way ANOVA | Paired t-test | One Way or Two Way Repeated Measures ANOVA | Regression or Pearson Product Moment Correlation
All statistical procedure commands are found under the Statistics menu.
FIGURE 5–7
Data Arrangement
with Treatments or
Groups in Columns
Selecting Data Columns You can calculate statistics for entire columns or only a portion of
columns. When running the descriptive statistics procedure, you can:
➤ Select the columns or block of data before you run the test, or
➤ Select the columns while running the test (page 105)
! To calculate statistics for only a range of data, select the data before you run
the test. You can select a minimum of one column and a maximum of 32
columns when describing data.
1 If you are going to run the test after changing test options, and
want to select your data before you run the test, drag the pointer
over your data.
➤ Click the button, or
➤ Choose the Statistics menu Current Test Options... command
FIGURE 5–8
The Options for
Descriptive
Statistics Dialog Box
5 To change the P value for the normality test, edit the value in the P
Value to Reject edit box. The P value determines the probability of
being incorrect in concluding that the data is not normally
distributed. If the P computed by the test is greater than the P set
here, the data passes the normality test.
6 To select all statistics options, click the All button. To clear all
selections, click the Clear button.
7 Click Run Test to perform the test with the selected options
settings. Click Apply to accept the selected settings without
continuing with the test. Cancel closes the Options dialog box and
returns to the previous option settings, and Help accesses
SigmaStat’s on-line help system.
1 If you want to select your data before you run the procedure, drag
the pointer over your data.
2 Open the Pick Columns for Descriptive Statistics dialog box by:
3 If you selected columns before you chose the test, the selected
columns automatically appear in the Selected Columns list. To assign
the desired worksheet columns to the Selected Columns list, select
the columns in the worksheet, or select the columns from the Data
for Data drop-down list.
FIGURE 5–9
The Pick Columns
for Descriptive
Statistics Dialog Box
The first selected column is assigned to the first row in the Selected
Columns list, and all successively selected columns are assigned to
successive rows in the list. The number or title of each selected column
appears in each row. You can select up to 64 columns of data for the
Descriptive Statistics Test.
5 Click Finish to describe the data in the selected columns. After the
computations are completed, the report appears. To edit the
report, use the Format menu commands; for information on
editing reports, see page 137.
Mean The mean is the average value for a column. If the observations are
normally distributed, the mean is the center of the distribution.
Standard Deviation Standard deviation is a measure of data variability about the mean.
Standard Error The standard error of the mean is a measure of how closely the sample
of the Mean mean approximates the true population mean.
FIGURE 5–10
Descriptive
Statistics Results
Report
Range The range is the minimum value subtracted from the maximum value.
Percentiles The two percentile points which define the upper and lower ends (tails)
of the data, as specified by the Descriptive Statistics options.
Sum The sum is the sum of all observations. The mean equals the sum
divided by the sample size.
Sum of Squares The sum of squares is the sum of squared deviations
from the mean.
Confidence Interval for The confidence interval for the mean is the range in which the true
the Mean population mean will fall for a percentage of all possible samples drawn
from the population.
Normality Normality tests the observations for normality using the Kolmogorov-
Smirnov test.
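Several of the statistics defined above can be reproduced with a few lines of code. This is a sketch only: the function name and data are hypothetical, and it assumes the sample (n - 1) form of the standard deviation, which may or may not match the report exactly.

```python
import math
import statistics

def describe(values):
    """Return a few descriptive statistics for one column of observations."""
    n = len(values)
    std_dev = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "size": n,
        "sum": sum(values),
        "mean": sum(values) / n,           # the sum divided by the sample size
        "std_dev": std_dev,
        "std_err": std_dev / math.sqrt(n),  # standard error of the mean
        "range": max(values) - min(values),
    }

# Hypothetical column of eight observations
column_stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
```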
You can generate up to five graphs using the results from a descriptive
statistics report. They include a:
Bar Chart The Descriptive Statistics bar chart plots the group means as vertical bars
with error bars indicating the standard deviation. The column titles are
used as the tick marks for the bar chart bars and default X Data and Y
Data axis titles are assigned to the graph. For an example of a bar chart,
see page 149.
Scatter Plot The Descriptive Statistics scatter plot graphs the column means as single
points with error bars indicating the standard deviation. The column
titles are used as the tick marks for the scatter plot points and default X
Data and Y Data axis titles are assigned to the graph. For an example of
a scatter plot, see page 150.
Point Plot The Descriptive Statistics point plot graphs all values in each column as
a point on the graph. The column titles are used as the tick marks for the
plot points and default X Data and Y Data axis titles are assigned to the
graph. For an example of a point plot, see page 150.
Point and The Descriptive Statistics point and column means plot graphs all values
Column Means Plot in each column as a point on the graph with error bars indicating the
column means and standard deviations of each column. The column
titles are used as the tick marks for the plot points and default X Data
and Y Data axis titles are assigned to the graph. For an example of a
point and column means plot, see page 151.
Box Plot The Descriptive Statistics test box plot graphs the percentiles and the
median of column data. The ends of the boxes define the 25th and 75th
percentiles, with a line at the median and error bars defining the 10th
and 90th percentiles. For an example of a box plot, see Figure 5–12 on
page 112.
The column titles are used as the tick marks for the box plot boxes, and
no axis titles are assigned to the graph.
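The five values drawn by the box plot can be sketched as follows. The function name and data are hypothetical, and the "inclusive" interpolation rule of Python's statistics.quantiles is an assumption; SigmaStat may interpolate percentiles differently, so values for small columns can differ.

```python
import statistics

def box_plot_summary(values):
    """Return the percentiles that define the box, median line, and error bars."""
    # 99 cut points: cuts[p - 1] is the p-th percentile
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {
        "error_bar_low": cuts[9],    # 10th percentile
        "box_low": cuts[24],         # 25th percentile
        "median": cuts[49],          # 50th percentile
        "box_high": cuts[74],        # 75th percentile
        "error_bar_high": cuts[89],  # 90th percentile
    }

# Hypothetical column holding the values 0, 1, ..., 100
box = box_plot_summary(list(range(101)))
```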
2 Select the type of graph you want to create from the Graph Type
list, then click OK, or double-click the desired graph in the list.
FIGURE 5–11
The Create Graph Dialog
Box
for the Descriptive Statistics
Report
For more information on each of the graph types, see pages 7-149
through 8-176. The specified graph appears in a graph window or
in the report.
FIGURE 5–12
A Box Plot of the Result
Data for a Descriptive
Statistics Test
! Note that SigmaStat can automatically test for assumptions of normality and
equal variance.
SigmaStat lists the specific tests in the Statistics menu and the toolbar
drop-down list. The complete procedures for each group comparison
test are outlined in Chapter 6.
For more information on how to use the Advisor Wizard, see Chapter 5.
When to Compare Two If data was collected from two different groups of subjects (for example,
Groups two different species of fish or voters from two different parts of the
country), use a two group comparison to test for a significant difference
beyond what can be attributed to random sampling variation.
➤ Choose the unpaired t-test (page 206) if your samples were taken
from normally distributed populations and the variances of the two
! Note that you can tell SigmaStat to analyze your data and test for normal
distribution and equal variance. If assumptions of normality and equal
variance are violated, the alternative parametric or nonparametric test is
suggested. Assumption tests are activated and configured in the t-test and
Mann-Whitney Options dialog boxes.
When to Compare Many If you collected data from three or more different groups of subjects,
Groups use one of the ANOVA (analysis of variance) procedures to test if there
There are four procedures available: the single factor or One Way
ANOVA, the Two Way ANOVA, the Three Way ANOVA, and the
Kruskal-Wallis ANOVA on ranks.
➤ Choose One, Two, or Three Way ANOVA (page 230, page 253,
and page 283) if the samples were taken from normally distributed
populations and the variances of the populations are equal. The
One, Two, and Three Way ANOVAs are parametric tests which
directly compare the samples arithmetically.
➤ If your samples were taken from populations with non-normal
distribution and/or unequal variance, choose the Kruskal-Wallis
ANOVA on ranks (page 310), which is the nonparametric analog of
the one way ANOVA. The Kruskal-Wallis ANOVA on ranks
arranges the data into sets of rankings, then performs an analysis of
variance based on these ranks, rather than directly on the data, so it
does not require assuming normality and equal variance.
FIGURE 5–14
The Compare Many
Groups Commands
The advantage of parametric ANOVAs is that, when the normality and
equal variance assumptions are met, they are slightly more sensitive (i.e.,
they have greater power) than the analysis based on ranks. When the
assumptions are not met, the Kruskal-Wallis ANOVA on ranks is more
reliable.
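The idea of analyzing ranks rather than the data can be sketched as below. The function names and data are hypothetical. The sketch computes the usual Kruskal-Wallis H statistic with average ranks for ties but without a tie correction, and it does not compute a P value, so it is an illustration rather than SigmaStat's exact procedure.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # average of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks

def kruskal_wallis_h(groups):
    """H statistic computed from the ranks of the pooled observations."""
    pooled = [v for group in groups for v in group]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total = 0.0
    position = 0
    for group in groups:
        rank_sum = sum(ranks[position:position + len(group)])
        total += rank_sum ** 2 / len(group)
        position += len(group)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

# Three hypothetical groups with no overlap between them
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

Because only the ranks enter the calculation, replacing 9 with 900 in the last group leaves H unchanged.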
! Note that SigmaStat does not have a two factor analysis of variance based
on ranks.
! Note also that you can tell SigmaStat to analyze your data and test for
normal distribution and equal variance. If assumptions of normality and
equal variance are violated, the alternative parametric or nonparametric test
is suggested. These tests are specified in the Options dialog boxes. To open
the dialog box for the current test, click the button, or choose the
Statistics menu Current Test Options... command.
When to Use One, Two, The difference between a One, Two, and Three Way ANOVA lies in the
and Three Way ANOVAs design of the experiment that produced the data.
➤ Use a One Way ANOVA (page 230) if there are several different
experimental groups that received a set of related but different
treatments (i.e., one factor). This design is essentially the same as an
unpaired t-test (a one way ANOVA of two groups obtains exactly
the same P value as an unpaired t-test).
➤ Use a Two Way ANOVA (page 253) if there were two experimental
factors that are varied for each experimental group.
➤ Use a Three Way ANOVA (page 283) if there are three experimental
factors which are varied for each experimental group.
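The equivalence between a one way ANOVA of two groups and an unpaired t-test can be checked numerically: the F statistic equals the square of the pooled-variance t statistic, with matching degrees of freedom, which is why the P values agree. The function names and data below are hypothetical.

```python
def pooled_t(a, b):
    """Unpaired t statistic assuming equal variances in both groups."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((v - mean_a) ** 2 for v in a)
    ss_b = sum((v - mean_b) ** 2 for v in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mean_a - mean_b) / (pooled_var * (1 / na + 1 / nb)) ** 0.5

def one_way_f(groups):
    """F statistic of a one way ANOVA: between-group over within-group mean square."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Two hypothetical groups
group_a = [1, 2, 3]
group_b = [2, 4, 6]
t = pooled_t(group_a, group_b)
f = one_way_f([group_a, group_b])
```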
factor design can test for (1) there is no difference in opinion of the
teachers among gender; (2) there is no difference in opinion of the
teachers among states; (3) there is no difference in knowledge among
education levels; and (4) there is no interaction between gender, state,
and education in terms of knowledge; any differences between differing
levels of education are the same for all genders in all states.
How to Determine Which Analysis of variance techniques (both parametric and nonparametric)
Groups are Different test the hypothesis of no differences between the groups, but do not
indicate what the differences are. You can use the multiple comparison
procedures (post-hoc tests) provided by SigmaStat to isolate these
differences.
To always test for differences among the groups select the Always
Perform option under the Post Hoc Tests tab in the ANOVA options
dialog boxes. You can also specify to use multiple comparisons to test for
a difference only when the ANOVA P value is significant by selecting the
Only When ANOVA P Value is Significant option, then select the
desired P value.
! Note that SigmaStat can automatically test for assumptions of normality and
equal variance.
SigmaStat lists the specific tests in the Statistics menu and the toolbar
drop-down list. The complete procedures for each repeated measures
comparison test are outlined in Chapter 10.
When to Compare If data was collected from the same group of individuals (for example,
Effects on Individuals patients before and after a surgical treatment, or rats before and after
Before and After a Single training), use Before and After comparison to test for a significant
Treatment difference beyond what can be attributed to random individual
variation.
➤ Choose the Paired t-test (page 332) if your samples were taken from
a population in which the changes to each subject are normally
distributed. The Paired t-test is a parametric test which directly
compares the sample data.
➤ If your sample effects are not normally distributed, choose the
Wilcoxon Signed Rank Test (page 345). The Wilcoxon Signed Rank
Test arranges the data into sets of rankings, then performs a Paired t-
test on the sum of these ranks, rather than directly on the data.
➤ If your samples are already ordered according to qualitative ranks,
such as poor, fair, good, and very good, use the Wilcoxon Signed
Rank Test.
FIGURE 5–15
The Before and After
Comparison Commands
The advantage of the paired t-test is that, assuming normality and equal
variance, it is slightly more sensitive (i.e., it has greater power) than the
Wilcoxon Signed Rank Test. When these assumptions are not met, the
Wilcoxon Signed Rank Test is more reliable.
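A paired comparison works on the change within each individual rather than on the two columns separately. The sketch below computes the paired t statistic and its degrees of freedom from hypothetical before and after values; it does not compute the P value, and the function name is an assumption for illustration.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t, degrees of freedom) for the mean change per individual."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / std_err, n - 1

# Hypothetical measurements on four individuals
t_paired, df = paired_t([10, 12, 9, 11], [12, 15, 10, 14])
```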
! Note that you can tell SigmaStat to analyze your data and test for normality.
If the assumption of normality is violated, the alternative parametric or
nonparametric test is suggested. Assumption tests are activated and
configured in the Paired t-test and Wilcoxon Options dialog boxes.
When to Compare If you collected data on the same individuals undergoing three or more
Effects on Individuals different treatments or conditions, use one of the Repeated Measures
After Multiple ANOVA (analysis of variance) procedures to test if there is a difference
Treatments among the effects of the treatments beyond what can be attributed to
random individual variation.
There are three procedures available: the single factor or One Way
Repeated Measures ANOVA (analysis of variance), the Two Way
Repeated Measures ANOVA, and the Friedman Repeated Measures
ANOVA on Ranks.
➤ Choose One or Two Way ANOVA (page 230 and page 379) if
the treatment effects are normally distributed with equal variances.
The one and two way ANOVAs are parametric tests which directly
compare the two samples arithmetically.
➤ If the treatment effects are not normally distributed and/or have
unequal variances, choose the Friedman Repeated Measures
FIGURE 5–16
The Repeated Measures
Comparison Commands
! Note that SigmaStat does not have a two factor analysis of variance based
on ranks.
! Note that you can tell SigmaStat to analyze your data and test for normal
distribution and equal variance. If assumptions of normality and equal
variance are violated, the alternative parametric or nonparametric test is
suggested. These tests are specified in the repeated measures one and two way
and Friedman options dialog boxes. See page 230 and page 408 for
more information.
When to Use One and The difference between a one factor and two factor repeated measures
Two Way RM ANOVA ANOVA lies in the design of the experiment that produced the data.
The two factor design can test three hypotheses about the education
levels and schools: (1) there is no difference in reading skill at different
education levels; (2) there is no difference in reading skill at different
schools or after changing schools; and (3) there is no interaction between
education level and school in terms of reading skill; any effect of levels of
education is the same in all schools.
How to Determine Which Repeated measures analysis of variance techniques (both parametric and
Treatments Have an nonparametric) test the hypothesis of no effect among treatments, but
Effect do not indicate which treatments have an effect. You can use the
multiple comparison procedures provided by SigmaStat to isolate the
differences in effect.
To always test for differences among the groups, select the Always
Perform option under the Post Hoc Tests tab in the ANOVA options
dialog boxes. You can also specify to use multiple comparisons to test for
a difference only when the ANOVA P value is significant by selecting the
Only When ANOVA P Value is Significant option, then select the
desired P value.
! Note that SigmaStat automatically analyzes your data for its suitability for
Chi-Square or Fisher Exact Test, and suggests the appropriate test.
FIGURE 5–17
The Rates and Proportions
Comparison Methods
When you want to predict the value of one variable from one or more
other variables, you can use regression methods to estimate the
predictive equation, and/or compute a correlation coefficient to
describe how strongly the value of one variable is associated with
another.
When to Use Regression Regression methods are used to predict the value of one variable (the
to dependent variable) from one or more independent variables by
Predict a Variable estimating the coefficients in a mathematical model. Regression assumes
that the value of the dependent variable is always determined by the
value of independent variables. Regression is also known as fitting a line
or curve to the data.
FIGURE 5–18
The Regression Commands
When to Use Correlation Compute the correlation coefficient if you want to quantify the
relationship between two variables without specifying which variable is
the dependent variable and which is the independent variable.
Correlation does not predict the value of one variable from another; it
only quantifies the strength of association between the value of one
variable with another.
FIGURE 5–19
The Correlation Coefficient
Computation Commands
Testing Normality
FIGURE 5–20
Example of Normally
Distributed Data Plotted
Using a Line Plot
Normally distributed data
has a characteristic “bell”
shaped distribution,
as shown on the left.
When to Test for Normality is assumed for all parametric tests and regression procedures.
Normality SigmaStat can automatically perform a normality test when running
a statistical procedure that makes assumptions about the population
parameters. This assumption testing is enabled in the Options dialog
box for each test. If the data fails the assumptions required for a
particular test, SigmaStat will suggest the appropriate test that can be
used instead.
However, if you want to perform a parametric test and your data fails the
normality test, you can transform your data using Transforms menu
commands so that it meets the normality requirements. To make sure
transformed data now follows a normal distribution pattern, you can
run a normality test on the data before performing the parametric
procedure again.
2 If desired, set the P value used to pass or fail the data in the Report
Options dialog box (see the following section).
5 View and interpret the Normality test report, and generate the
report graphs.
2 If you want to change the P value for the normality test, select the
P value box. The P value determines the probability of being
incorrect in concluding that the data is not normally distributed. If
the P computed by the test is greater than the P set here, the test
passes.
FIGURE 5–21
The Report Options Dialog
Box
Normality test data must be in raw data format, with the individual
observations for each group, treatment or level in separate columns. You
can test up to 64 columns of data for normality.
FIGURE 5–22
Valid Data Format
for Normality Testing
To run a Normality test, you need to select the data to test. The Pick
Columns dialog box is used to select the worksheet columns with the
data you want to test.
1 If you want to select your data before you run the test, drag the
pointer over your data.
3 If you selected columns before you chose the test, the selected
columns automatically appear in the Selected Columns list. To
assign the desired worksheet columns to the Selected Columns list,
select the columns in the worksheet, or select the columns from
the Data for Data drop-down list.
FIGURE 5–23
The Pick Columns
for Normality Dialog Box
The first selected column is assigned to the first row in the Selected
Columns list, and all successively selected columns are assigned to
successive rows in the list. The number or title of each selected column
appears in each row. You can select up to 64 columns of data for the
Normality test.
5 Click Finish to describe the data in the selected columns. After the
computations are completed, the report appears. To edit the
report, use the Format menu commands; for information on
editing reports, see page 137.
The results of a Normality test display the K-S distances and P values
computed for each column, and whether or not each column selected
passed or failed the test.
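To give a sense of what the K-S distance in the report measures, the following minimal Python sketch computes it for one column using only the standard library. It assumes the reference normal distribution is fitted from the sample mean and sample standard deviation, which the manual does not state. The function name `ks_distance` and the data are hypothetical, and the P value calculation is not reproduced.

```python
from statistics import NormalDist, mean, stdev


def ks_distance(values):
    """Largest gap between the empirical CDF of the sample and a normal
    CDF fitted with the sample mean and standard deviation (assumption)."""
    data = sorted(values)
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    distance = 0.0
    for i, x in enumerate(data, start=1):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x, so check both sides.
        distance = max(distance, i / n - cdf, cdf - (i - 1) / n)
    return distance


column = [4.1, 4.8, 5.0, 5.2, 5.3, 5.5, 5.9, 6.4]  # hypothetical worksheet column
print(f"K-S distance: {ks_distance(column):.3f}")
```

A smaller distance means the column follows the fitted normal curve more closely.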
FIGURE 5–24
The Normality
Test Report
P Values
The P values report the results of testing the observations for normality using the
Kolmogorov-Smirnov test. If the P computed by the test is greater than
the P set in the Report Options dialog box (see page 135), your data can
be considered normal.
You can generate two graphs using the results from a Normality report.
They include a:
Histogram of Residuals
The Normality histogram plots the raw residuals in a specified range,
using a defined interval set. The residuals are divided into a number of
evenly incremented histogram intervals and plotted as histogram bars
indicating the number of residuals in each interval. The X axis represents
the histogram intervals, and the Y axis represents the number of residuals
in each interval. For an example of a histogram of residuals, see page 153.
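The binning described above can be sketched in a few lines of Python. This is only an illustration: it assumes the intervals span the minimum to the maximum residual, and the function name `histogram_counts` and the data are hypothetical.

```python
def histogram_counts(residuals, bins):
    """Count residuals in `bins` equal-width intervals spanning their range."""
    low, high = min(residuals), max(residuals)
    if bins < 1 or high == low:
        raise ValueError("need at least one bin and a non-zero data range")
    width = (high - low) / bins
    counts = [0] * bins
    for r in residuals:
        # The maximum value sits on the last edge, so clamp it into the last bin.
        index = min(int((r - low) / width), bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(bins + 1)]
    return edges, counts


residuals = [-2.0, -1.2, -0.5, -0.1, 0.0, 0.3, 0.4, 1.1, 2.0]
edges, counts = histogram_counts(residuals, bins=4)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {'#' * count}")
```

Each printed row corresponds to one histogram bar: the interval is the X position and the count is the bar height.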
Normal Probability Plot
The Normality probability plot graphs the frequency of the raw
residuals. The residuals are sorted and then plotted as points around a
curve representing the area of the Gaussian plotted on a probability axis.
Plots with residuals that fall along the Gaussian curve indicate that your data
was taken from a normally distributed population. The X axis is a linear
scale representing the residual values. The Y axis is a probability scale
representing the cumulative frequency of the residuals. For an example
of a normal probability plot, see page 155.
FIGURE 5–25
The Create Graph Dialog
Box
for the Normality Report
2 Select the type of graph you want to create from the Graph Type
list, then click OK, or double-click the desired graph in the list.
For more information on each of the graph types, see page 149.
When to Compute Power and Sample Size
Power and sample size computations are used to determine the
parameters for an intended experiment, before the experiment is carried
out. Use these procedures to help improve the ability of your
experiments to test the desired hypotheses.
When the dialog box appears, specify the remaining parameters of the
data. For more information on determining the power of a test, see
Chapter 13.
FIGURE 5–26
The Power
Computation Commands
FIGURE 5–27
The Sample Size
Computation Commands
When the dialog box appears, specify the power and the remaining
parameters of the data. For more information on determining the
sample size of a test, see Chapter 13.
1 Choose the Tools menu Options command and then click the
Report tab.
Setting Significant Digits
The Number of Significant Digits option is used to set the number of
significant digits used for the values in the report. The default is three
digits. The maximum number of digits is sixteen.
Using Scientific Notation
The Always Use Scientific Notation option uses scientific notation for
the appropriate values in the report tables. If this option is disabled,
scientific notation is only used when the value is too long to fit in the
table cell. This option is disabled by default.
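As a rough illustration of how the significant digits and scientific notation options interact, here is a minimal Python sketch. The function `format_report_value` and its `cell_width` parameter are hypothetical; the manual does not say how wide a table cell is, so a width of 10 characters is assumed purely for the example.

```python
def format_report_value(value, significant_digits=3,
                        always_scientific=False, cell_width=10):
    """Format one number for a report table cell.

    cell_width is a hypothetical stand-in for "too long to fit in the
    table cell"; the real cell size is not given in the manual.
    """
    if not 1 <= significant_digits <= 16:
        raise ValueError("significant digits must be between 1 and 16")
    scientific = f"{value:.{significant_digits - 1}e}"
    if always_scientific:
        return scientific
    # Round to the requested significant digits, then write in fixed notation.
    exponent = int(scientific.split("e")[1])
    decimals = max(significant_digits - 1 - exponent, 0)
    plain = f"{float(scientific):.{decimals}f}"
    return plain if len(plain) <= cell_width else scientific


print(format_report_value(0.0123456))
print(format_report_value(0.0123456, always_scientific=True))
print(format_report_value(0.000000012345))
```

With the option off, the third call still falls back to scientific notation because the fixed-notation text would not fit in the assumed cell width.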
Explaining Test Results
The Explain Test Results option includes explanatory text for test results
in the report. This option is enabled by default. Disable the option to
keep explanatory text out of the report.
Specifying a Significant P Value
The P Value for Significance determines whether there is a statistically
significant difference in the mean values of the groups being tested. The
value you specify is compared to the P values computed by all tests.
! It is important to note that this P value does not affect the actual test results.
It only affects the text that explains whether the difference in the mean values of the
groups is statistically significant or can be attributed to random sampling variation.
If the P computed by the test is smaller than the P set here, the text
reads, “The difference in the mean values of the two groups is greater
than would be expected by chance; there is a statistically significant
difference between the input groups.”
If the P value computed by the test is greater than the P set here, the text
reads, “The difference in the mean values of the two groups is not great
enough to reject the possibility that the difference is due to random
sampling variability. There is not a statistically significant difference
between the input groups.”
One of the above explanation text strings appears for each P value
computed by the test. ANOVAs and some regressions produce multiple
P values.
! If the Explain Test Results option is turned off, the results of this P value do
not appear in the report.
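The rule for choosing the explanatory text can be sketched as follows. This is an illustration only: the function names are hypothetical, the default threshold of 0.05 is an assumption (the manual does not give the default here), and a computed P exactly equal to the threshold is treated as not significant, which the manual does not specify.

```python
SIGNIFICANT = (
    "The difference in the mean values of the two groups is greater "
    "than would be expected by chance; there is a statistically significant "
    "difference between the input groups."
)
NOT_SIGNIFICANT = (
    "The difference in the mean values of the two groups is not great "
    "enough to reject the possibility that the difference is due to random "
    "sampling variability. There is not a statistically significant difference "
    "between the input groups."
)


def explain(p_computed, p_for_significance=0.05):
    """Pick the report text for one computed P value (0.05 is an assumed default)."""
    return SIGNIFICANT if p_computed < p_for_significance else NOT_SIGNIFICANT


def explain_all(p_values, p_for_significance=0.05):
    """One explanation per P value, as for ANOVAs that report several."""
    return [explain(p, p_for_significance) for p in p_values]


for text in explain_all([0.003, 0.41]):
    print(text)
```

Changing the threshold changes only which sentence is printed; the computed P values themselves are untouched.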
Hiding and Displaying the Report Ruler
The Show Ruler option displays the ruler at the top margin of the report
page. This option is enabled by default. Disable the option to hide the
report ruler.
Generating Reports
Reports contain the results of a performed test. Each time you run a test,
a new test section containing the report is generated. The report takes
the name of the test it was generated from and the number of the report.
The section takes the name of the test the report was generated from and
is numbered according to the order in which it is generated.
For example, if you generate the first report by running the Descriptive
Statistics test, the title of the report window is Descriptive Statistics 1. If
you generate a second report using data from the Paired t-test worksheet,
the title of the report is Paired t-test 2. The sections are Descriptive
Statistics 1 and Paired t-test 1. For information on renaming and moving
notebook items, see Copying and Moving Notebook Items on page 28
and Naming Sections and Items on page 23.
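One way to read the naming example above is that reports are numbered across all tests, while sections are numbered separately for each test name. The following Python sketch models that reading. It is an interpretation of the example, not a description of SigmaStat internals, and the class `ReportNamer` is hypothetical.

```python
from collections import Counter


class ReportNamer:
    """Hypothetical model of report and section titles.

    Assumes reports share one running number and sections are counted
    per test name, which matches the example in the text.
    """

    def __init__(self):
        self.report_count = 0
        self.section_counts = Counter()

    def new_report(self, test_name):
        self.report_count += 1
        self.section_counts[test_name] += 1
        report_title = f"{test_name} {self.report_count}"
        section_title = f"{test_name} {self.section_counts[test_name]}"
        return report_title, section_title


namer = ReportNamer()
for test in ["Descriptive Statistics", "Paired t-test"]:
    report, section = namer.new_report(test)
    print(f"report: {report:28} section: {section}")
```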
You can generate as many reports as desired. If you have multiple reports
opened, use the Window menu to select the report you want to view.
! Use the Edit menu commands to combine reports. See the
following section, EDITING REPORTS, for more information.
Editing Reports
Editing reports involves changing text and paragraph attributes. Use the
formatting toolbar or the Format menu commands to modify the font,
margins, text alignment, line spacing, and tabs in the selected
report. You can also use the Edit menu to search for and replace specified
text with new text.
! By default, the report page is set to US Letter size and Portrait orientation.
Formatting Toolbar
The formatting toolbar automatically appears under the standard
toolbar whenever a report window is open; it is not available unless a
report is open, and is not active unless a report is the active window. Use
the formatting toolbar to modify the style, alignment, spacing, and tabs
of selected report text.
Report Ruler Settings
A ruler automatically appears at the top of each report. It can be used to
set tabs and position text in the report. To turn the report ruler on or
off, choose the Tools menu Options command, and click the Report tab.
Select or clear Show Ruler. Select the units of measurement you want to
use, then click OK.
FIGURE 6–1
The Ruler Units Dialog Box
Changing Text Attributes
Changing the attributes of report text involves modifying the typeface,
color, style, and size of the selected characters.
FIGURE 6–2
The Font Dialog Box
3 Select the desired font, style, and size from the Font, Font Style,
and Size lists. Use the Underline and Strikeout check boxes to draw
lines under and through selected text. Select the desired color from
the Color drop-down list. An example of the specified font
attributes appears in the box at the bottom of the dialog box.
! You can also use the buttons in the formatting toolbar to italicize,
underline, or make selected text boldface.
Spacing and Aligning Report Text
Space and align report text using the Format menu Spacing and
Alignment commands or the formatting toolbar. Line spacing can be set
to 1 Line, 2 Lines, or 3 Lines. Text can be left-aligned, right-aligned, centered,
or justified. Select the lines you want to space. Choose the Edit menu
Select All command to select all the text in the report. When the desired
text is selected, choose the appropriate command, or click the
appropriate buttons from the formatting toolbar.
Setting Tabs
Set tabs using the ruler that appears above the report. Select the left,
right, center, or decimal aligned tab button from the formatting toolbar
or choose the appropriate Format menu Tab command, then click in the
2 Type the text you want to search for in the Find What edit box.
4 Select the Match Case option to search only for text that matches
the case you specified in the Find What edit box.
5 To replace the text you are searching for with new text, click the
Replace tab, and type the text you want to replace the old text with
in the Replace With edit box.
6 Click the Find Next button to find the specified text according to
the selected settings. SigmaStat starts searching at the cursor
location. The first instance of the text after the cursor is
highlighted in the report.
7 To replace the highlighted text with the text in the Replace With
edit box, click the Replace button. Click the Replace All button to
replace all instances of the text in the Find What edit box with the
text in the Replace With edit box.
! Selecting Replace with nothing in the Replace With edit box deletes
report text matching the text in the Find What edit box.
8 Click Cancel to close the dialog box. You can also close the dialog
box by clicking the button in the upper right corner of the
dialog box.
Cutting and Copying Report Text
To remove selected text from the report, choose the Edit menu Cut
command, click the toolbar button, or press Ctrl+X. To copy
selected text from the report to the Clipboard without removing it from
the report, choose the Edit menu Copy command, click the toolbar
button, or press Ctrl+C.
Pasting Report Text
Use the Paste command to paste text to other locations in the report,
other reports, or other applications. To paste cut or copied text from the
Clipboard, click or move the cursor to where you want to place the text;
then choose the Paste command, click the toolbar button, or press
Ctrl+V. The Clipboard contents appear in the specified location of the
report or application.
Deleting Report Text
Use the Clear command or press the Delete key to permanently erase
selected text from the report. The text is not copied to the Clipboard.
Editing Reports Using a Word Processor
The SigmaStat editor is a fully functional text editor; however, for
complex or lengthy editing tasks, you can use a more powerful word
processor. To open reports in other applications, you need to export the
report as either a text or an RTF file. Reports saved as RTF files keep all of
the formatting code. To leave the formatting code out of the report,
export the report as plain text by choosing Text (.TXT) as the file type.
Use the scroll bars at the right and bottom edges of the report window to
scroll through the current page of the report. The scroll bars do not move to
the next or previous page. You must use the formatting toolbar and
buttons to move one page up and down in the report.
You can also use the following keyboard commands to move around and
select text in the report.
Function                     Keystroke
Move to next character       Right Arrow
Move to previous character   Left Arrow
Move to next word            Ctrl+Right Arrow
Move to previous word        Ctrl+Left Arrow
If you are saving a notebook for the first time, the Save As dialog box
appears, prompting you for a file name and path for the notebook file.
If you are saving the report to an existing notebook file, the notebook is
updated to include the new report or the changes to the existing report.
Exporting Reports
To export a report as a non-notebook file, drag the mouse over the text
you want to save to a file. If no text is selected, the entire report is
exported. Choose the File menu Export... command, then select the file
type to export the report to. Reports can be exported as the following file
types:
RTF files are saved with formatting attributes. Text files are saved
without the formatting attributes.
! You can also cut and/or copy report text, then paste it into a word processing
application using the Edit menu commands. Text pasted into other
applications is pasted as plain text.
Opening Reports
To open a report, choose the File menu Open... command, click the
toolbar button, or press Ctrl+O. When the Open dialog box appears,
select the type of file you want to open by selecting a file type from the
List Files of Type drop-down list, then click OK. If you are opening a
report in a notebook file, choose a notebook file as the file type. If you
are opening a non-notebook report file, choose Text or Rich Text
Format as the file type.
Reports in Notebook Files
If you open a notebook file (.SNB), the notebook appears, displaying
its sections and items. To view the desired report, double-click the report
icon in the appropriate section.
Non-Notebook Report Files
Non-notebook files are individual files which are separate from the
notebook. They are automatically converted to notebook file format
when opened in SigmaStat. You can open the following non-notebook
report file types in SigmaStat:
Close a report by clicking the close button which appears in the upper
right corner of the report window. You can also close reports by choosing
the File menu Close command while the report is the active window.
Closed reports and graphs can be reopened by double-clicking the report or graph
icon in the notebook section.
Closed reports and graphs are not removed from the notebook. To delete
a report or graph, close it, then select the report icon in the notebook
section and press Delete. For more information on deleting notebook
items, see Removing Items from a Notebook File on page 30.
Printing Reports
To print a report:
1 Make sure the report or graph you want to print is in the active
window, then choose the File menu Print... command, click the
toolbar button, or press Ctrl+P. The Print dialog box appears.
2 Specify the printer to use, the range of pages, and number of copies
to print.
FIGURE 6–3
Example of the Print dialog
box for the HP LaserJet
Postscript Printer
This dialog box differs
depending on the type of
output device you have.
! Note that the Print dialog box differs depending on the type of
printer you have. Figure 6–3 is an example of the Print dialog box
with an HP 4/4M Postscript driver selected.
SigmaStat creates graphs generated from reports and exploratory graphs that
you create using the Graph Wizard. This chapter discusses how to create and
modify both types of SigmaStat graphs. It explains how to:
Creating and Modifying Graphs
Graphs can be generated for all test reports except Two Way Repeated
Measures ANOVA, rates and proportions tests, Best Subset and Incremental
Polynomial Regression, and Multiple Logistic reports.
To generate a report graph, select the appropriate report, then click the
toolbar button, or choose the Graph menu Create Graph... command, or
press F3. The Create Graph dialog box appears displaying the available
graphs for the selected report.
Select the report graph you want to create, then click OK, or double-click the
graph in the list.
FIGURE 7–1
The Create Graph
Dialog Box for a Report
Graph
FIGURE 7–2
The Select Independent
Variable Dialog Box
The selected graph appears in a graph page window with the name of the
page in the window title bar. Graph pages are named according to the type of
graph created and are numbered incrementally. The graph page is assigned to
the test section of its associated report.
Bar Charts of the Column Means
Bar charts of the column means are available for the following tests:
➤ Descriptive Statistics (see page 104)
➤ t-test (see page 206)
➤ One Way ANOVA (see page 230)
This bar chart plots the group means as vertical bars with error bars
indicating the standard deviation.
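The values behind such a bar chart are just one mean and one standard deviation per column. A minimal Python sketch, assuming the sample (n - 1) standard deviation, which the manual does not specify here; the function name `column_summary` and the worksheet data are hypothetical.

```python
from statistics import mean, stdev


def column_summary(columns):
    """Map each column title to (mean, sample standard deviation)."""
    return {title: (mean(values), stdev(values))
            for title, values in columns.items()}


worksheet = {"Control": [10.0, 12.0, 14.0], "Treated": [15.0, 18.0, 21.0]}
for title, (bar_height, error_bar) in column_summary(worksheet).items():
    print(f"{title}: bar height {bar_height:.1f}, error bar +/- {error_bar:.1f}")
```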
FIGURE 7–3
A Bar Chart of the
Result Data for a t-test
Scatter Plot
The scatter plot is available for the following tests:
The scatter plot graphs the group means as single points with error bars
indicating the standard deviation.
FIGURE 7–4
A Scatter Plot of the
Result Data for a
One Way ANOVA
Point Plot
The point plot is available for the following tests:
The point plot graphs all values in each column as a point on the graph.
FIGURE 7–5
A Point Plot of the
Result Data for an
ANOVA on Ranks
Point Plots and Column Means
The point and column means plot is only available for Descriptive Statistics
(see Describing Your Data with Basic Statistics on page 104).
The point and column means plot graphs all values in each column as a point
on the graph with error bars indicating the column means and standard
deviations of each column.
FIGURE 7–6
A Point and Column Means
Plot of the Result Data for a
Descriptive Statistics Test
The error bars plot the
column means and the
standard deviations
of the column data.
Box Plot
The box plot is available for the following tests:
The Rank Sum Test box plot graphs the percentiles and the median of
column data. The ends of the boxes define the 25th and 75th percentiles,
with a line at the median and error bars defining the 10th and 90th
percentiles.
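The five values that define one box in the plot can be computed as in the sketch below. Percentiles can be interpolated in several ways and the manual does not say which one SigmaStat uses, so the "inclusive" method of Python's `statistics.quantiles` is an assumption; the function name `box_plot_summary` and the data are hypothetical.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Percentiles that define one box: 10th, 25th, 50th, 75th and 90th."""
    # With n=100 there are 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = quantiles(values, n=100, method="inclusive")
    return {
        "error_bar_low": cuts[9],
        "box_low": cuts[24],
        "median": cuts[49],
        "box_high": cuts[74],
        "error_bar_high": cuts[89],
    }


column = [2, 4, 4, 5, 7, 8, 9, 11, 12, 15, 20]  # hypothetical worksheet column
for name, value in box_plot_summary(column).items():
    print(f"{name}: {value:.2f}")
```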
FIGURE 7–7
A Box Plot of the
Result Data for a
Rank Sum Test
2D Scatter Plot of the Residuals
The 2D scatter plot of the residuals is available for all of the regressions
except the Multiple Logistic and the Incremental Polynomial Regressions.
The scatter plots of the residuals plot the raw residuals of the independent
variables as points relative to the standard deviations. The X axis represents
the independent variable values, the Y axis represents the residuals of the
variables, and the horizontal lines running across the graph represent the
standard deviations of the data. See Chapter 12 for more information on the
graphs for the individual regression reports.
FIGURE 7–8
Scatter Plot of the Simple
Linear Regression
Residuals with Standard
Deviations
Bar Chart of the Standardized Residuals
Bar charts of the standardized residuals are available for all regressions except
the Multiple Logistic and the Incremental Polynomial Regressions. They
plot the standardized residuals of the data in the selected independent
variable column as points relative to the standard deviations.
See Chapter 12 for more information on the graphs for the individual
regression reports.
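The quantities plotted in the two residual graphs above can be sketched for a simple linear regression as follows. This is an illustration under assumptions: the standardized residual is taken to be the raw residual divided by the standard error of the estimate, and that same standard error is used as the scale for the horizontal lines. The function name `fit_and_residuals` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean


def fit_and_residuals(x, y):
    """Least-squares line y = intercept + slope * x, with its residuals."""
    mx, my = mean(x), mean(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    intercept = my - slope * mx
    raw = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Standard error of the estimate: residual spread with n - 2 degrees of freedom.
    std_error = sqrt(sum(r * r for r in raw) / (len(x) - 2))
    standardized = [r / std_error for r in raw]
    return raw, standardized, std_error


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]
raw, standardized, std_error = fit_and_residuals(x, y)
for xi, r, s in zip(x, raw, standardized):
    print(f"x={xi:.0f}  raw residual={r:+.3f}  standardized={s:+.2f}")
```

A standardized residual far from zero, for example beyond 2 in absolute value, flags a point that lies unusually far from the fitted line.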
Histogram of Residuals
The histogram of residuals graph is available for the following tests:
➤ t-test (see page 206)
➤ One Way ANOVA (see page 230)
➤ Two Way ANOVA (see page 253)
➤ Three Way ANOVA (see page 283)
➤ Paired t-test (see page 332)
➤ One Way Repeated Measures ANOVA (see page 355)
➤ Two Way Repeated Measures ANOVA (see page 379)
➤ Linear Regression (see page 469)
➤ Multiple Linear Regression (see page 495)
➤ Polynomial Regression (see page 553)
➤ Stepwise Regression (see page 577)
FIGURE 7–9
Multiple Linear
Regression Bar Chart
of the Standardized
Residuals with Standard
Deviations Using One
Independent Variable
The histogram plots the raw residuals in a specified range, using a defined
interval set. The residuals are divided into a number of evenly incremented
histogram intervals and plotted as histogram bars indicating the number of
residuals in each interval.
FIGURE 7–10
A Histogram of
the Residuals for a t-test
Normal Probability Plot
The normal probability plot is available for the following test reports:
➤ t-test (see page 206)
➤ One Way ANOVA (see page 230)
➤ Two Way ANOVA (see page 253)
➤ Three Way ANOVA (see page 283)
➤ Paired t-test (see page 332)
➤ One Way Repeated Measures ANOVA (see page 355)
➤ Two Way Repeated Measures ANOVA (see page 379)
➤ Linear Regression (see page 469)
➤ Multiple Linear Regression (see page 495)
➤ Polynomial Regression (see page 553)
➤ Stepwise Regression (see page 577)
➤ Nonlinear Regression (see page 636)
➤ Normality test (see page 127)
The probability plot graphs the frequency of the raw residuals. The residuals
are sorted and then plotted as points around a curve representing the area of
the Gaussian plotted on a probability axis. Plots with residuals that fall along
the Gaussian curve indicate that your data was taken from a normally distributed
population. The X axis is a linear scale representing the residual values. The Y
axis is a probability scale representing the cumulative frequency of the
residuals.
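The coordinates of a normal probability plot can be sketched as below: the residuals are sorted, and each one is paired with a cumulative frequency. The plotting position (rank - 0.5) / n is a common convention and an assumption here, since the manual does not give the formula; the function name `normal_probability_points` and the data are hypothetical.

```python
from statistics import NormalDist


def normal_probability_points(residuals):
    """Return (residual, cumulative frequency, expected normal score) triples."""
    ordered = sorted(residuals)
    n = len(ordered)
    standard_normal = NormalDist()
    points = []
    for rank, value in enumerate(ordered, start=1):
        cumulative = (rank - 0.5) / n  # assumed plotting position
        # Position of this cumulative frequency on a probability axis.
        expected = standard_normal.inv_cdf(cumulative)
        points.append((value, cumulative, expected))
    return points


for value, cumulative, expected in normal_probability_points([0.3, -1.0, 0.0, 1.2]):
    print(f"residual={value:+.1f}  cumulative={cumulative:.3f}  normal score={expected:+.2f}")
```

If the residuals come from a normal population, plotting the residual against the normal score gives points close to a straight line.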
FIGURE 7–11
Normal Probability
Plot of the Residuals
2D Line/Scatter Plots of the Regressions with Prediction and Confidence Intervals
The 2D line and scatter plots of the regressions are available for all of the
regression reports, except Multiple Logistic and Incremental Polynomial
Regressions. They plot the observations of the regressions as a line/scatter
plot. The points represent the dependent variable data plotted against the
independent variables, the solid line running through the points represents
the regression line, and the dashed lines represent the prediction and
confidence intervals. The X axis represents the independent variables and the
Y axis represents the dependent variables.
FIGURE 7–12
A Line/Scatter Plot
of the Linear Regression
Observations with
a Regression and
Confidence and
Prediction Interval Lines
3D Residual Scatter Plot
The 3D residual scatter plots are available for the following test reports:
➤ Two Way ANOVA (see page 253)
➤ Two Way Repeated Measures ANOVA (see page 379)
➤ Multiple Linear Regression (see page 495)
➤ Stepwise Regression (see page 577)
➤ Nonlinear Regression (see page 636)
They plot the residuals of the two selected columns of independent variable
data. The X and the Y axes represent the independent variables, and the Z
axis represents the residuals.
FIGURE 7–13
A Multiple Linear
Regression 3D
Residual Scatter
Plot of the Two
Selected Independent
Variable Columns
Grouped Bar Chart with Error Bars
This graph is available for the Two Way ANOVA (see page 253) report. It
plots the data means with error bars indicating the standard deviations for
each level of the factor columns. The levels in the first factor column are used
as the X axis tick marks, and the title of the first factor column and the data
column are used as the X and the Y axis titles. The first bar in the group
represents the first level of the second factor column and the second bar in
the group represents the second level in the second factor column.
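The bar heights and error bars in the grouped chart come from one mean and one standard deviation per combination of factor levels. A minimal Python sketch, assuming the data are arranged as rows of first factor level, second factor level, and value, and assuming the sample standard deviation; the function name `cell_summary`, the factor levels, and the values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def cell_summary(rows):
    """rows: (first factor level, second factor level, value) triples.

    Returns {(first level, second level): (mean, sample standard deviation)}.
    """
    cells = defaultdict(list)
    for first, second, value in rows:
        cells[(first, second)].append(value)
    return {key: (mean(vals), stdev(vals)) for key, vals in cells.items()}


rows = [
    ("Low", "Male", 10.0), ("Low", "Male", 12.0),
    ("Low", "Female", 11.0), ("Low", "Female", 15.0),
    ("High", "Male", 20.0), ("High", "Male", 22.0),
    ("High", "Female", 19.0), ("High", "Female", 25.0),
]
for (first, second), (m, sd) in cell_summary(rows).items():
    print(f"X tick {first:4}  bar {second:6}  mean={m:.1f}  sd={sd:.2f}")
```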
FIGURE 7–14
A Two Way ANOVA Grouped
Bar Chart with Error Bars
3D Category Scatter Graph
This graph is available for the Two Way ANOVA (see Two Way Analysis of
Variance (ANOVA) on page 253) and the Two Way Repeated Measures
ANOVA (see Two Way Repeated Measures Analysis of Variance (ANOVA)
on page 379). The 3D Category Scatter plot graphs the two factors from the
independent data columns along the X and Y axes against the data of the
dependent variable column along the Z axis. The tick marks for the X and Y
axes represent the two factors from the independent variable columns, and
the tick marks for the Z axis represent the data from the dependent variable
column.
FIGURE 7–15
A Two Way ANOVA 3D
Category Scatter Plot
Before and After Line Plots
The before and after line plot is available for the:
➤ Paired t-test (see page 332)
➤ Signed Rank Test (see page 345)
➤ One Way Repeated Measures ANOVA (see page 355)
➤ Repeated Measures ANOVA on Ranks (see page 408)
The before and after line plot uses lines to plot a subject's change after each
treatment. If the graph plots raw data, the lines represent the rows in the
column, the column titles are used as the tick marks for the X axis and the
data is used as the tick marks for the Y axis.
If the graph plots indexed data, the lines represent the levels in the subject
column, the levels in the treatment column are used as the tick marks for the
X axis, the data is used as the tick marks for the Y axis, and the treatment and
data column titles are used as the axis titles.
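The difference between the raw and indexed layouts can be illustrated with a short Python sketch that converts raw columns to indexed rows and then groups them into one line per subject. The layout of the indexed rows (subject, treatment, value) and the function names are assumptions made for the example.

```python
def raw_to_indexed(raw_columns):
    """raw_columns: {treatment title: [value for subject 1, subject 2, ...]}.

    Returns indexed rows of (subject, treatment, value).
    """
    indexed = []
    for treatment, values in raw_columns.items():
        for subject, value in enumerate(values, start=1):
            indexed.append((subject, treatment, value))
    return indexed


def subject_lines(indexed):
    """Group indexed rows into one line of (treatment, value) points per subject."""
    lines = {}
    for subject, treatment, value in indexed:
        lines.setdefault(subject, []).append((treatment, value))
    return lines


raw = {"Before": [5.0, 6.0, 4.5], "After": [7.0, 6.5, 6.0]}
for subject, points in subject_lines(raw_to_indexed(raw)).items():
    print(f"subject {subject}: {points}")
```

Each printed entry corresponds to one line on the graph: the treatments are the X axis ticks and the values are the Y positions.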
FIGURE 7–16
A Before & After Line Scatter
Plot Displaying Data
for a Paired t-test
Multiple Comparison Graphs
The multiple comparison graphs are available for all ANOVA reports. They
plot significant differences between levels of a significant factor. There is one
graph for every significant factor reported by the specified multiple
comparison test. If there is one significant factor reported, one graph appears;
if there are two significant factors, two graphs appear, etc. If a factor is not
reported as significant, a graph for the factor does not appear.
For information on individual graphs for each ANOVA, see Chapters 9
and 10.
Scatter Matrix
The matrix of scatter graphs is available for all the Pearson and the Spearman
Correlation reports (see Pearson Product Moment Correlation on page 623
and Spearman Rank Order Correlation on page 631). The matrix is a series
of scatter graphs that plot the associations between all possible combinations
of variables.
The first row of the matrix represents the first set of variables or the first
column of data, the second row of the matrix represents the second set of
variables or the second data column, and the third row of the matrix
represents the third set of variables or third data column. The X and Y data
for the graphs correspond to the column and row of the graph in the matrix.
For example, the X data for the graphs in the first row of the matrix is taken
from the second column of tested data, and the Y data is taken from the first
column of tested data. The X data for the graphs in the second row of the
matrix is taken from the first column of tested data, and the Y data is taken
from the second column of tested data. The X data for the graphs in the third
row of the matrix is taken from the second column of tested data, and the Y
data is taken from the third column of tested data. The number of graph
rows in the matrix is equal to the number of data columns tested.
FIGURE 7–17
A Multiple Comparison
Matrix for a Two
Way ANOVA with One
Significant Factor
FIGURE 7–18
A Scatter Matrix for a
Pearson Correlation
Exploratory graphs are graphs that you create by picking data from the
worksheet. Use your worksheet data to create a variety of scatter, line, point,
and box plots, bar charts, and pie charts.
1 Make sure a worksheet is the active window. If you want to pick your
data before creating a graph, drag the pointer over the data you want to
plot.
2 On the Graph menu click Create Graph or click the toolbar button.
The Graph Wizard appears displaying all of the exploratory graph types
in the Graph Styles scroll box.
FIGURE 7–19
Use the Graph Wizard to
Create Exploratory Graphs
3 Select the graph you want to create, and click Next to move to the next
panel of the Graph Wizard.
➤ If you select Histogram as the graph style, the Graph Wizard Create
Graph - Histogram panel prompts you to specify the number of bins
for the histogram. Go to Step 4.
➤ Some graph styles, like scatter plots and bar charts, require you to
choose the data format. Some, however, like pie charts or 3D scatter
plots, use a default data format.
➤ If you select a graph style that requires you to select the data
format, the Graph Wizard Create Graph - Data Format panel
appears. Go to Step 5.
➤ If you select a graph style that uses a default data format, the Graph
Wizard Create Graph - Select Data panel appears. Go to Step 6.
4 To specify the number of bins to use for a histogram, type the
desired value in the edit box, then click Next.
FIGURE 7–20
Selecting Histogram Bins
! The number of bins is the number of intervals used for the histogram
bars. For more information, see Histogram on page 172.
FIGURE 7–21
Picking the Columns to Plot
5 Select the data format you want to use from the Data format list, and
click Next.
6 To pick the columns with the data to plot, begin picking data either
by clicking the corresponding column directly in the worksheet, or
choosing the appropriate column from the Data Columns drop-down
list. The selected columns appear in the Selected Columns list in the
order they are picked from the worksheet. You are prompted for X and/
or Y columns depending on the type of graph you are creating.
➤ If you are creating a histogram, select the column with the range of
data to plot as the Y column, and select as the Output column the column
in which to place the results of the data range divided by the number of
bins. For more information on the histogram, see Histogram on page 172.
➤ If you are creating a graph of residuals, you are prompted to select the
data column with the residuals you want to plot as the Y column,
and the column in which to place the residuals of the Y data as the Output
column. If you are creating a Normal Probability Plot, select the first
output column for the sorted residuals and the second output
column for the cumulative frequency of the residuals. For more
information on residual and normal probability plots, see 2D Scatter
Plot of the Residuals on page 152, Bar Chart of the Standardized
Residuals on page 153, and Normal Probability Plot on page 155.
➤ If you are plotting error bars and you selected Worksheet Column as
your error bar source, you are prompted for your X and/or Y
columns and the columns with the error bar values. For more
information on graphs with error bars, see Bar Charts of the Column
Means on page 149 and Grouped Bar Chart with Error Bars on page
157.
For more information on how data is plotted for a graph, see the
sections on each of the exploratory graphs below.
If you have already selected the data you want to plot in the worksheet,
the selected columns automatically appear in the Selected Columns
box.
8 When you have finished picking your columns, click Finish to create
the graph. The graph appears in a graph window with the name and
number of the graph page and its associated worksheet in the window
title bar. The graph is assigned to the notebook section of its associated
worksheet. For descriptions of each of the exploratory graphs, see the
following pages.
Scatter Plot
Select Single Scatter Plot as the graph type from the Graph Wizard to use
symbols to graph two columns of worksheet data as X values versus Y values.
Select one worksheet column for the X data and one worksheet column for
the Y data.
FIGURE 7–22
Example of a Scatter Plot
Scatter Plot Using Many Y Columns
Select Multiple Scatter Plot as the graph type from the Graph Wizard to use
symbols to graph multiple columns of worksheet data as X values against Y
values. Select as many XY column pairs as desired. Each XY pair represents a
curve on the graph.
FIGURE 7–23
Example of a Scatter Plot
With Multiple Curves
Line Plot
Select Single Line Plot as the graph type from the Graph Wizard to use lines
to graph two columns of worksheet data as X values versus Y values. The
Graph Wizard dialog box prompts you to pick one worksheet column for the
X data and one worksheet column for the Y data. Lines connect the points
plotted on the graph.
FIGURE 7–24
Example of a Line Plot
Line Plot Using Many Select Multiple Line Plot as the graph type from the Graph Wizard to use
Y Columns lines to graph multiple columns of worksheet data as X values against Y
values. You are prompted to select as many XY column pairs as desired. Each
column of data represents a curve on the graph. Lines connect the points
plotted on the graph.
Bar Chart Select Simple Bar Chart as the graph type from the Graph Wizard to plot all
values in a selected column as bars on a bar chart. The column values are
plotted as the Y values against X values representing the row numbers of each
value. The Graph Wizard dialog box prompts you to pick one column of
data.
FIGURE 7–25
Example of a Line Plot
with Many Curves
FIGURE 7–26
Example of a Bar Chart
Bar Chart of Select Bar Chart Col Means as the graph type from the Graph Wizard to plot
Column Means the means of column data as bars with error bars indicating the standard
deviation of each column. The column means are plotted as the Y values
against X values representing the row numbers of each value.
The error bars are calculated using the means from the plotted worksheet
columns or from specified worksheet columns. If you selected to use values
from a worksheet column, you are prompted to pick the columns with the
error bar values. You pick one error bar column for each bar in the bar chart.
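As a rough illustration of the values behind this graph, the following Python sketch computes one bar height (the column mean) and one error bar length (the standard deviation) per column. The function name column_summary and the sample data are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption; this manual section does not state which form SigmaStat uses.

```python
from statistics import mean, stdev


def column_summary(columns):
    """Return a (mean, sample standard deviation) pair for each column."""
    return [(mean(col), stdev(col)) for col in columns]


# Two hypothetical worksheet columns.
columns = [[2.0, 4.0, 6.0], [1.0, 1.0, 4.0]]
for index, (m, sd) in enumerate(column_summary(columns), start=1):
    print(f"column {index}: bar height = {m:.3f}, error bar = +/- {sd:.3f}")
```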
FIGURE 7–27
Example of a Bar Chart
of the Column Means
Scatter Plot of Select Scatter Plot Col Means as the graph type from the Graph Wizard to
Column Means plot the means of column data as points with error bars indicating the
standard deviation of each column. The column means are plotted as the Y
values against X data representing the row numbers of each value.
The error bars are calculated using the means from the plotted worksheet
columns or from specified worksheet columns. If you selected to use values
from a worksheet column, you are prompted to pick the columns with the
error bar values. You pick one error bar column for each point you plot on
the graph.
FIGURE 7–28
Example of a Scatter Plot
of the Column Means
Point Plot Select Point Plot as the graph type from the Graph Wizard to graph each
value in a selected worksheet column as Y data against X values representing
the order the columns are selected from the worksheet. The points are
represented by symbols. The Graph Wizard dialog box prompts you to pick
as many columns as desired.
FIGURE 7–29
Example of a Point Plot
Point Plot and Select Point and Column Means as the graph type from the Graph Wizard to
Column Means graph each value in the selected columns as Y data. The order of the columns
represents the X data. Error bars plot the means of each data column with
their standard deviations.
This graph style uses a default data format. Do not pre-select the data;
instead, select the data on the Graph Wizard Create Graph - Select Data
panel. First, select an empty column for the Output, which is the location for
the symbols of the means.
FIGURE 7–30
Selecting Data for a Point
Plot with Column Means
The error bars are calculated using the means from the plotted worksheet
columns or from specified worksheet columns.
FIGURE 7–31
Example of a Point
Plot with Error Bars
Plotting the Column Means
Box Plot Select Box Plot as the graph type in the Graph Wizard to graph the
percentiles and the median of the data in selected columns as boxes. The
percentiles and the median are plotted as the Y data against X values
representing the order the columns are selected from the worksheet. You can
select as many columns as desired.
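The summary values that a box plot draws can be sketched in Python as follows. This is only an illustration: the function name box_summary is hypothetical, and the choice of the 25th and 75th percentiles with the "inclusive" interpolation method is an assumption, since this section does not state which percentiles or which interpolation rule SigmaStat uses.

```python
from statistics import quantiles


def box_summary(values):
    """Return the 25th percentile, median, and 75th percentile of a column."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"p25": q1, "median": q2, "p75": q3}


# One hypothetical worksheet column.
print(box_summary([1, 2, 3, 4, 5]))
```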
FIGURE 7–32
Example of a Box Plot
Histogram Select Histogram as the graph type from the Graph Wizard to plot a column
of data in a specified range, using a defined interval set. The data is divided
into a number of evenly incremented histogram bins and plotted as
histogram bars indicating the number of data points in each bin or interval.
The X axis represents the histogram bins, and the Y axis represents the
number of residuals in each group. The Graph Wizard dialog box prompts
you for the worksheet column with the range of data you want to plot as the Y
column and the column in which to place the results of your data divided by the
specified bin.
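The binning described above can be sketched in Python. The function name histogram_counts is hypothetical, and the rule that a value exactly on the upper limit falls into the last bin is an assumption made for this sketch, not documented SigmaStat behavior.

```python
def histogram_counts(values, low, high, n_bins):
    """Count the values that fall in n_bins equal-width bins over [low, high]."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if low <= v <= high:
            # A value equal to the upper limit is folded into the last bin.
            i = min(int((v - low) / width), n_bins - 1)
            counts[i] += 1
    return counts


# Hypothetical column of data, plotted from 0 to 10 in five bins.
print(histogram_counts([1, 2, 2, 3, 9, 10], low=0, high=10, n_bins=5))
```

Each count corresponds to the height of one histogram bar.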
FIGURE 7–33
Example of a Histogram
Plotting Residuals of
Each Data Column
FIGURE 8–31
Example of a Pie Chart
3D Scatter Plot Select 3D Scatter Plot as the graph type from the Graph Wizard to use points
to graph three columns of worksheet data as X, Y, and Z values on a three
dimensional plane. The Graph Wizard prompts you to select an X, Y, and Z
column. You can select as many XYZ triplets as desired.
FIGURE 8–32
Example of a 3D
Scatter Plot
FIGURE 8–33
Example of a Scatter
Plot Plotting Residuals
of Each Data Column
Bar Chart of Select Standardized Residuals as the graph type from the Graph Wizard to
the Standardized plot the standardized residuals of the values in the selected column as bars
Residuals relative to the standard deviations. The column residuals are plotted as the Y
values against X values representing the row numbers of each Y value. The
Graph Wizard dialog box prompts you to pick a Y column with the data to
plot the residuals for, and an Output column to place the residual results in.
The output column is plotted.
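A minimal Python sketch of the kind of values placed in the Output column is shown below. It assumes that the residual of a value is its difference from the column mean and that standardizing divides by the sample standard deviation; both are assumptions for illustration, and the function name standardized_residuals is hypothetical.

```python
from statistics import mean, stdev


def standardized_residuals(values):
    """Return (value - column mean) / sample standard deviation for each row."""
    m = mean(values)
    sd = stdev(values)
    return [(v - m) / sd for v in values]


# One hypothetical Y column; each result is the height of one bar.
print(standardized_residuals([2.0, 4.0, 6.0]))
```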
Normal Select Normal Probability Plot as the graph style in the Graph Wizard to plot
Probability Plot the frequency of the raw residuals along a Gaussian curve. The residuals are
sorted and then plotted as points around a curve representing the area of the
Gaussian plotted on a probability axis. Plots with residuals that fall along the
Gaussian curve indicate that your data was taken from a normally distributed
population.
The X axis is a linear scale representing the residual values. The Y axis is a
probability scale representing the cumulative frequency of the residuals. Select
the column with the data you want to plot the residuals for as the Y
column, the column to place the sorted residuals of the data as the first
output column, and the column to place the cumulative frequency of the
residuals as the second output column.
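The two output columns can be sketched in Python as follows. This is an illustration under stated assumptions: residuals are taken as differences from the column mean, and the cumulative frequency of the i-th sorted residual is taken as (i - 0.5) / n, a common plotting position that may differ from the one SigmaStat uses. The function name probability_plot_points is hypothetical.

```python
from statistics import NormalDist, mean


def probability_plot_points(values):
    """Return sorted residuals, cumulative frequencies, and normal scores."""
    m = mean(values)
    residuals = sorted(v - m for v in values)
    n = len(residuals)
    # Assumed plotting position for the i-th sorted residual (1-based).
    cum_freq = [(i - 0.5) / n for i in range(1, n + 1)]
    # Position of each cumulative frequency on a probability axis,
    # expressed in standard deviations from the center.
    scores = [NormalDist().inv_cdf(p) for p in cum_freq]
    return residuals, cum_freq, scores


residuals, cum_freq, scores = probability_plot_points([3.0, 1.0, 2.0, 4.0])
for r, p, z in zip(residuals, cum_freq, scores):
    print(f"residual {r:+.2f}  cumulative frequency {p:.3f}  normal score {z:+.3f}")
```

If the residuals come from a normal population, plotting them against the normal scores gives points that lie close to a straight line.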
FIGURE 8–35
Example of a Normal
Probability Plot of the
Residuals
Setting the graph page options includes setting the page margins, size and
orientation, and the units of measurement on the page, graph resizing
options, and page undo disabling.
Setting Page The margins, size, and orientation of the graph page are set in the Page Setup
Margins, Size, dialog box.
and Orientation
1 On the File menu click Page Setup.
2 To set the margins of the graph page, click the Margins tab, then type
in or select the desired margins in the appropriate edit boxes. The
measurement unit used for the margins is specified in the Page
Preferences dialog box (see the following section).
FIGURE 8–36
The Page Setup Dialog
Box Displaying the
Margins Options
3 Select or clear the Show Margins option. If this option
is selected, margins are displayed on the page. To hide page margins,
make sure the Show Margins option is not selected.
4 To set the size of the graph page, click the Page Size tab, then select a
paper size from the Paper Size drop-down list. Use the Width and
Height options to specify the dimensions of the page and the Portrait
and Landscape options to specify the orientation of the page.
FIGURE 8–37
The Page Setup
Dialog Box Displaying
the Page Size Options
5 To set the background color and to show or hide graphs on the graph
page, click the Page Layout tab. Graphs that appear on the page are
listed under Shown. Graphs that are hidden are listed under Hidden.
Double-click a graph to move it between lists.
FIGURE 8–38
The Page Setup
Dialog Box Displaying
the Page Layout Options
6 Click OK to accept the specified settings and close the dialog box, or
Apply to accept the settings without closing the dialog box.
Setting Measurement The graph page measurement units, graph resizing options, and page undo
Units, Graph Resizing disabling option are set in the Options dialog box on the Page tab. To set
Options, and Page these options:
Undo
1 On the Tools menu click Options.
FIGURE 8–39
The Page Panel of the
Preferences Dialog Box
3 To set the measurement unit used for the graph page, select the
desired unit of measurement from the Units list. You can choose inches,
millimeters, or points. The selected measurement unit is used for the
page margins and size and for the size of the graph.
4 To enable and disable the Undo and Redo commands for the graph
page, select the Page Undo check box. When this check box is cleared,
the Undo and Redo commands are unavailable. Disabling the Undo
and Redo functionality of the graph page can speed page operations
significantly; however, it means page editing cannot be undone. To
disable the page Undo and Redo commands, clear the check box.
5 To maintain the aspect ratio of a resized graph, select the Stretch Maintains
Aspect Ratio check box. When this option is selected, resized
objects maintain their vertical-to-horizontal ratio. If this option is
not selected, objects can be resized disproportionately. For more
information on sizing graphs, see Resizing and Moving Graphs on the Page
on page 182.
7 To display grids on the graph page, select Show Grid. You can display
grids either as dots or as lines. Select the density of the grid from the
Density drop-down list. Select the color of the grid from the Color
drop-down list.
9 Click OK to accept the settings and close the dialog box or Apply to
accept the settings without closing the dialog box.
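The three measurement units offered in the Units list are related by fixed factors (1 inch = 25.4 millimeters = 72 points). The following Python sketch converts a page dimension between them; the function name convert_length and the unit strings are hypothetical and are not SigmaStat identifiers.

```python
# Number of each unit in one inch.
UNITS_PER_INCH = {"inches": 1.0, "millimeters": 25.4, "points": 72.0}


def convert_length(value, from_unit, to_unit):
    """Convert a page measurement between inches, millimeters, and points."""
    inches = value / UNITS_PER_INCH[from_unit]
    return inches * UNITS_PER_INCH[to_unit]


# A hypothetical 1-inch margin expressed in the other two units.
print(convert_length(1.0, "inches", "millimeters"))
print(convert_length(1.0, "inches", "points"))
```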
Use the View menu Zoom command to reduce and enlarge a graph in the
graph window. There are five different zoom levels to choose from. You can
zoom out on the entire page or you can choose a 50, 100, 200, or 400
percent view of the graph. You can also select the desired zoom level from the
toolbar drop-down list.
FIGURE 8–40
Example of Different
Zoom Levels for Graphs
Before you can modify a graph or graph labels, you need to select them. To
select a graph or graph label, you must be in select mode. To make sure you
are in select mode, choose the Tools menu Select Object command. A check
mark next to the Select Object command indicates you are in select mode.
Once you are in select mode, click a graph or graph label to select it. Selected
graphs are surrounded by handles, and selected labels are surrounded by a
dotted box.
Use text mode to enter and edit text on a page. For information on entering
and editing text on a page, see Creating and Editing Labels on the Graph
Page on page 186.
FIGURE 8–41
Moving a Graph to a
New Location on the Page
Sizing Graphs To resize and scale a graph, select it. Handles surround a selected graph. Drag
a side handle to stretch or shrink a graph in one direction; drag a corner
handle to stretch or shrink a graph in two dimensions. A dashed outline of
the resized graph follows the pointer position.
FIGURE 8–42
Sizing a Graph on the Page
To modify graph attributes in SigmaStat, select the graph, then choose the
Graph menu Graph Properties... command. The Graph Properties dialog box
appears.
FIGURE 8–43
The Graph Properties Dialog
Box
The options available in the dialog box depend on the type of graph you have
selected. Change the desired settings, click Apply to update the graph, and
OK to close the dialog box.
Changing Fills Use the Fill Color option to change the color of graph symbols, bars, boxes,
meshes, and pie slices. Use the Fill Pattern option to change the fill pattern of
graph bars, boxes, and pie slices.
Select a color scheme from the list of options to assign a set of colors to the
curves, bars, or boxes in the plot or to the slices in the pie chart.
Changing Symbols Use the Symbol Type option to change the type of symbols used in 2D and
3D scatter plots. Change symbol sizes by moving the slider with your mouse.
Moving it to the left decreases symbol size and moving it to the right
increases symbol size. The value in the edit box changes to reflect the position
of the slider. You can also edit the value in the box to change symbol size.
Changing Lines Use the color option to change the color of plot and mesh lines and the
outline and fill pattern lines of bars, boxes, and pie chart slices. Use the Type
option to change the type of lines used for Line Plot lines.
! You cannot change the line type of bar, box, or pie slice outlines or of mesh
plots.
Changing Axes Select the axis you want to apply the selected scale type to from the Apply to
drop-down list. You can change the scale type of individual axes or multiple
axes in the graph.
Changing Axis Scales Use the Scale option to assign a different scale type to the specified axis or
axes. The default axis scales are linear, but you can also use common log,
natural log, probability, and logit axis scales.
Common Log Scale A common log axis scale is a base 10 logarithmic scale.
Probit Scale A probit axis scale is similar to the probability scale; the
Gaussian cumulative distribution function plots as a straight line on a probit
scale. The scale is linear, however, with major tick marks at each Normal
Equivalent Deviation (N.E.D. = (X − μ)/σ) plus 5.0. At the mean (X = μ) the
probit = 5.0; at the mean plus one standard deviation (X = μ + σ) the
probit = 6.0, etc. The default range is from 3 to 7. The range limit for a
probit axis scale is 1 to 9.
where a = 100 and 0 ≤ y ≤ 100. The default range is 7 to 97. Like the
probability and probit scales, the logit scale “straightens” a sigmoidal curve
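The probit values quoted above (5.0 at the mean, 6.0 at one standard deviation above it) follow from adding 5.0 to the Normal Equivalent Deviation, which the Python sketch below reproduces. The logit function uses the conventional transform ln(y / (a - y)); that formula is an assumption here, because the defining equation does not appear in this section. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def probit_from_value(x, mu, sigma):
    """Probit of x: the Normal Equivalent Deviation (x - mu) / sigma plus 5.0."""
    return (x - mu) / sigma + 5.0


def probit_from_percent(percent):
    """Probit of a cumulative percentage strictly between 0 and 100."""
    return NormalDist().inv_cdf(percent / 100.0) + 5.0


def logit_from_percent(y, a=100.0):
    """Assumed logit transform ln(y / (a - y)) for 0 < y < a."""
    return math.log(y / (a - y))


# Hypothetical population with mean 10 and standard deviation 2.
print(probit_from_value(10.0, mu=10.0, sigma=2.0))
print(probit_from_value(12.0, mu=10.0, sigma=2.0))
print(probit_from_percent(50.0))
print(logit_from_percent(50.0))
```

Both transforms are undefined at the extremes of the percentage range, which is one reason an axis of this kind is drawn over a finite range.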
Labels, legends, and other kinds of text are added to the graph page using the
Edit Text dialog box. You can also use the Edit Text dialog box to edit graph
and axis titles which automatically appear on the page when a graph is
created.
Creating Labels You can add an unlimited number of text labels and legends to any page.
and Legends SigmaStat for Windows supports:
1 Make the page the active window by selecting it, then choose the Tools
menu Text command to switch from select to text mode. A check mark
next to the command indicates that you are in text mode.
2 Click the page where you want the label to begin. The Edit Text dialog
box appears.
3 Select the font and character size, and normal, bold, italic, or
underlined characters. You can also use the Edit Text options to create
FIGURE 8–44
The Edit Text Dialog Box
! Note that the Rotation, Alignment, and Line Spacing options affect the
entire label, not just the selected text, and that Line Spacing is an
automatic spacing control, not fixed. If you change the height of
characters by changing font sizes or by adding superscripts or subscripts,
the line height adjusts automatically.
4 Use the keyboard to type your label. To type additional lines, insert a
line break by pressing the Enter key.
You can use all standard cut (Ctrl+X), copy (Ctrl+C), and paste
(Ctrl+V) keystrokes and the Ctrl+→ and Ctrl+← key commands to
move the cursor to the next and previous words in the text.
8 To add legend symbols to your text, click the Symbols... button. The
Symbols dialog box appears.
Legend symbols added to text using the Edit Text dialog box do not
appear in the Edit Text dialog box; they appear with the text on the
page.
FIGURE 8–45
The Symbol Dialog Box
9 Click OK to place the symbol in the text and to close the Symbol dialog
box.
10 When you are finished entering text, close the Edit Text dialog box by
clicking OK.
Editing Existing Use the Edit Text dialog box to edit existing text and labels that you’ve
Text Labels created, as well as automatically created graph and axis titles. Editing existing
text consists of changing the content of the text, including adding Greek
symbols to the text, adding legend symbols, and changing formatting of the
text.
2 Select the text to modify, then use your keyboard to type new text, or
use the Edit Text dialog box options to format the text. For more
information on Edit Text options, see step 3 on page 186. You can also
use all standard cut (Ctrl+X), copy (Ctrl+C), and paste (Ctrl+V)
keystrokes as well as the corresponding toolbar buttons.
FIGURE 8–46
Example of Report Graph
with Added Legends and
with Modified Axis Titles
If you have SigmaPlot 8.02 installed on your computer, you can use
SigmaPlot's more advanced graph editing capabilities to modify your
SigmaStat graph. To view and edit SigmaStat graphs in SigmaPlot, choose the
Graph menu Edit with SigmaPlot command.
SigmaPlot opens within SigmaStat, which you can use to customize your
graph.
To close SigmaPlot, click another SigmaStat window, choose the File
menu End SigmaPlot Editing command, or press Esc.
You can use SigmaPlot to edit both the data and the graph attributes of
exploratory graphs but only the graph attributes of report graphs. Because
report graphs do not use worksheets in SigmaStat, worksheets do not appear
with the graph page when you run SigmaPlot.
Cut and copy graphs to the Clipboard using the toolbar, or by using Edit
menu commands.
Clipboard contents can be pasted to any open page, or into any other
Windows application that supports Windows Metafiles or OLE2 (Object
Linking and Embedding). To learn about pasting objects and graphs, see
Pasting Graphs and Other Objects onto a SigmaStat Graph Page on page
190.
! The Clipboard is a Microsoft Windows feature. To learn more about how the
Clipboard works, refer to your Windows User’s Guide.
To cut or copy a graph or page object, select the graph or object to cut or
copy by clicking it. To cut the item, click the toolbar button, choose the
Edit menu Cut command, or press Ctrl+X. To copy the item, click the
toolbar button, choose the Edit menu Copy command, or press Ctrl+C. A
copy of the selected graph is placed in the Clipboard. Since copied items
remain in the Clipboard until replaced, you can paste as many copies as you
want without having to cut or copy the object each time.
For information on retrieving cut and copied items from the Clipboard, see
the following section, PASTING GRAPHS AND OTHER OBJECTS ONTO A
PAGE.
Use the Edit menu Paste or Paste Special commands to paste Clipboard
contents to a graph page window. Pasted objects can be SigmaStat graphs,
scanned images, clip art, text from a word processor, or anything else that can
be cut or loaded into the Windows Clipboard.
Use the Paste command, the toolbar button, or press Ctrl+V to paste an
object without linking or embedding it. Use the Paste Special... command to
paste an object as a specified file type, as an embedded object, or as a linked
file object.
To learn about using the Edit menu Cut and Copy commands to place
graphs and other objects on the Clipboard, see Cutting and Copying Graphs
and Other Page Objects on page 190.
When using the Edit menu Paste Special... command to paste an object to a
page, you can usually choose between embedding the object as a specified file
type by choosing the Paste Special dialog box Paste option, or linking the
object using the Paste Link option. Embedding the object actually places a
copy of the object on the graph page and enables you to edit the object by
activating the object’s source application when you double-click it, but does
not change the original file from which the object was pasted. Embedding the
object has the advantage of keeping all the associated data in one place, but
can create large files.
Linking the object appears to place a copy of the object on the page, but
actually only places a reference to the original object file, and modifies the
object every time the original file is changed. Linking is useful when a
number of files need to refer to a central graph page, but also need to be
stored separately, either to save disk space, or to keep file elements in their
native applications for easy location and updating. The disadvantage of
linking objects is that a referenced file cannot be accessed if the locations of
the SigmaStat file and/or the source file are changed.
To learn about viewing, updating, and changing object links, see Viewing and
Modifying Object Links on page 196.
! Note that the Link option is unavailable if the Clipboard contents come from
an application that cannot link to SigmaPlot.
If you don’t anticipate needing to edit the object you are pasting, you do not
need to paste it as an embedded or linked file; however, you can still use the
Paste Special... command to paste the object as a specified file type.
Pasting Graphs and Other Objects onto a SigmaStat Graph Page 191
! You can place an object on the graph page without pasting it. To learn about
inserting an object on the page, see Inserting Graphic Objects on page 194.
1 Select the graph to cut or copy, then use the Edit menu Cut or Copy
command, click the corresponding toolbar button, or press Ctrl+X or Ctrl+C to
cut or copy the graph.
2 View the page to paste the graph to, then choose the Edit menu Paste
command, or click the toolbar button.
Graphs pasted using the Edit menu Paste command take their plotted
data with them; pasted graph data is placed in the worksheet associated
with the current page. Graphs pasted using the Paste command can be
modified just like any other graph. To learn about modifying graphs,
see Modifying Graph Attributes on page 184.
3 To paste a graph without moving its data, choose the Edit menu
Paste Special... command. The Paste Special dialog box appears.
FIGURE 8–47
Using the Paste
Special Dialog Box to
Paste a Graph to
Another Graph Page
4 Select the Paste option and choose Metafile from the As box. The graph
is pasted as a metafile object and does not place data on the worksheet.
Graphs pasted as metafile objects cannot be edited using most
SigmaPlot commands.
Pasting Objects To paste artwork, text from a word processing application, or other objects
onto a page:
1 Open the application and file containing the desired artwork or text,
and cut or copy the object.
2 Switch to SigmaStat and view the page, opening the notebook file with
the page, if necessary. (To learn about opening notebook files and items,
see Opening and Viewing Notebook Files and Items on page 25.)
! Note that the options available in the Paste Special dialog box depend on
the type of file being pasted.
5 Check the Display As Icon option if you want the object displayed as an
icon. Click the icon to view and edit the object in its source application.
You can also specify a different icon to display the pasted object. Click
the Icon... button to open the Change Icon dialog box. Choose a
different icon from the available options, or click the Browse... button
to search for alternative icons on your system.
! The options in the As box change depending on your selection of either
Paste or Paste Link, and the explanation in the Result box changes
depending on your selection in the As option box.
FIGURE 8–48
Using the Paste Special
Dialog Box to Paste an
Object from Microsoft
Word to SigmaPlot
7 Select the file type to paste, embed, or link, from the As box. If you
have selected the Paste option, the text in the Results box explains
whether you are choosing simply to paste the object, or whether you are
embedding the object. Embedded objects enable you to activate a
source application and edit the object.
8 Click OK to paste the object and to close the Paste Special dialog box.
The object is pasted to the page.
! If you pasted the object as an embedded file, you can double-click the
object to edit it. If you pasted the object as a linked file, double-click it to
modify the pasted object and the file from which it was pasted.
2 Choose the Edit menu Insert New Object... command. The Insert
Object dialog box appears.
3 To display the new object as an icon, check the Display As Icon check
box.
You can also specify a different icon to display the inserted object. Click
the Icon... button to open the Change Icon dialog box. Choose a
different icon from the available options, or click the Browse... button
to search for alternative icons on your system.
4 To create a new object to place on the page, select the Create New
option, then choose the type of object to create from the Object Type
list. The objects available to create depend on the applications installed
on your system.
FIGURE 8–49
The Insert Object Dialog Box
After Selecting Create New
6 To insert an object from an existing file on the graph page, select the
Create From File option, then type the path and file name of the
desired file in the File edit box, or click the Browse button to open the
Browse dialog box from which you can select the appropriate path and
file name.
7 Select the Link option to place the object on the page as
a linked object. If the Link option is not selected, the object is pasted to
the page as an embedded object.
FIGURE 8–50
The Insert Object Dialog Box
After Selecting Create From
File, and the Browse Dialog
Box
8 Click OK to place the object on the page and close the dialog box.
Viewing and View and modify links of pasted objects using the Links dialog box. The
Modifying Links dialog box displays all links associated with the current graph page.
Object Links
To view and modify links:
1 View the page by selecting it, or by choosing the name of the page from
the Window menu.
2 Choose the Edit menu Links... command. The Links dialog box
appears, displaying the path, file name, and type of file for each link, and
whether the link is updated manually or automatically.
FIGURE 8–51
The Links Dialog Box
If you do not have any linked objects on this page, the dialog box does
not display any links.
4 To edit a linked object, highlight the object name in the Links dialog
box by selecting it, then click the Open Source button. The source file
opens in the appropriate application where you can make changes, then
exit the application and return to SigmaStat.
5 To change the source file used for a linked object, click the Change
Source button. The Change Source dialog box opens. Choose the new
path and file name, then click OK. The link appears in the Links dialog
box with the new path and file name. You may need to click the Update
Now button to view this change in your document.
FIGURE 8–52
The Change Source Dialog
Box
6 To end the link between an object and its source file, click the Break
Link button. The object is no longer treated as a linked object.
Identifying Page Objects 10
If an object has been pasted or inserted on a graph page, you can use the Edit
menu Object command to determine the type of object. To identify the
object, select it, then view the Edit menu by selecting it with the pointer. The
Object command changes to reflect the file type of the selected object. For
example, if a bitmap object is selected, the Object command may read
Bitmap Image Object.
You can paste SigmaStat graphs into other applications exactly the same way
as you paste objects and graphs into SigmaStat. Copy the graph in SigmaStat
using the Edit menu Copy command (see Cutting and Copying Graphs and
Other Page Objects on page 190), then open the application you want to
paste the graph into, and choose the Edit menu Paste or Paste Special
command.
For detailed information on linking and embedding objects using the Paste
commands, see Pasting Graphs and Other Objects onto a SigmaStat Graph
Page on page 190. For information on how to insert graphs into other
applications, see Pasting SigmaStat Graphs into other Applications on page
198.
To save a graph page to a notebook file, choose the File menu Save command,
press Ctrl+S, or click the toolbar button. The Save As dialog box appears,
prompting you for a file name and path for the notebook file.
If you are saving the graph to an existing notebook file, the notebook is
updated to include the new worksheet or the changes to the existing
worksheet.
For more information on saving notebook items, see Saving Notebook Files
and Items on page 30.
Graphs as Non- SigmaStat graphs cannot be exported directly, but you can use the Edit menu
notebook Files commands to cut or copy a graph to the Clipboard, then paste it into another
application.
For more information on cutting, copying, and pasting graphs, see Cutting
and Copying Graphs and Other Page Objects on page 190.
To open a graph page, choose the File menu Open... command, click the
toolbar button, or press Ctrl+O. When the Open dialog box appears,
select the type of graph you want to open by selecting a file type from the List
Files of Type drop-down list, then click OK.
Graph Pages in If you open a Notebook file type (.SNB), a notebook file appears displaying
Notebook Files its sections and items. To view the desired graph, double-click the graph icon
in the appropriate notebook section.
The graph page and associated worksheet or report appear in the SigmaStat
window.
Non-Notebook Graph Non-notebook files are individual files which are separate from the notebook.
Page Files They are automatically converted to notebook file format when opened in
SigmaStat.
FIGURE 8–54
Opening a Graph File
Using the Open Dialog Box
Close a graph page by clicking the close button which appears in the upper
right corner of the Windows 95 graph page window. You can also close graph
pages by choosing the File menu Close command while the graph page is the
active window. Closed graph pages can be opened by double-clicking the
graph icon in the notebook section.
Closed graphs are not removed from the notebook. To delete a graph page,
close it, then select the report or graph icon in the notebook section and press
the Delete key.
To delete a graph from a graph page, select it, then press the Delete key or
choose the Edit menu Clear command. Cut the graph to the Clipboard using
the Edit menu Cut command, pressing Ctrl+X, or clicking the toolbar
button.
1 Make sure the graph page you want to print is the active window, then
choose the File menu Print... command, click the toolbar button, or
press Ctrl+P. The Print dialog box appears.
! To set page margins, size, orientation, and paper source before you print
the page, choose the File menu Page Setup... command, set the desired
options, then select Printer... to go to the Print dialog box. See Setting
Graph Page Options on page 178 for more information.
2 Specify the printer to use, the range of pages, and number of copies to
print.
FIGURE 8–55
Example of the Print dialog
box for the HP LaserJet
Postscript Printer
This dialog box differs
depending on the type of
output device you have.
3 Click Properties to set more advanced printing options. Once you have
set the desired options, click OK to return to the Print dialog box, then
click OK again to print the selected page.
! Note that the Print dialog box differs depending on the type of printer you
have. Figure 8–55 is an example of the Print dialog box with an HP 4/4M
Postscript driver selected.
Use group comparison tests to compare random samples from two or more
different groups for differences in the mean or median values which cannot
be attributed to random sampling variation.
See Choosing the Group Comparison Test to Use on page 113 for more
information on when to use the different SigmaStat group comparison tests.
Parametric and Parametric tests assume samples were drawn from normally distributed
Nonparametric Tests populations with the same variances (or standard deviations). Parametric tests
are based on estimates of the population means and standard deviations, the
parameters of a normal distribution.
Nonparametric tests do not assume that the samples were drawn from a
normal population. Instead, they perform a comparison on ranks of the
observations. Rank Sum Tests automatically rank numeric data, then
compare the ranks rather than the original values.
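The ranking step can be illustrated with a short Python sketch. It assigns tied values the average of the ranks they span, which is the usual convention for rank-based tests; treating that as the rule SigmaStat applies is an assumption, and the function name average_ranks is hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the run while the next sorted value ties with the current one.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Mean of the 1-based positions start + 1 through end + 1.
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


# Hypothetical observations pooled from the groups being compared.
print(average_ranks([10, 20, 20, 5]))
```

The test statistic is then computed from these ranks instead of from the original observations.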
If you are using one of these procedures to compare multiple groups, and you
find a statistically significant difference, you can use several multiple
comparison procedures (also known as post-hoc tests) to determine exactly
which groups are different and the size of the difference. These procedures are
described for each test.
For t-tests and One Way ANOVAs, you can also use:
➤ The sample size, mean, and standard deviation for each group.
➤ The sample size, mean, and standard error of the mean (SEM) for each
group.
Raw Data
The raw data format uses separate worksheet columns for the data in each
group. This is the most common format, where your data have not yet been
analyzed or transformed.
You can use raw data for all tests except Two and Three Way ANOVAs.
! SigmaStat tests accept messy and unbalanced data and do not require equal
sample sizes in the groups being compared. There are no problems associated
with missing data or uneven columns; however, missing values must be
indicated by double dashes (“--”), not empty cells.
FIGURE 8–1
Valid Data Formats for an Unpaired t-test
Columns 1 and 2 are arranged as raw data. Columns 3, 4, and 5 are arranged as descriptive statistics using the sample size, mean, and standard deviation. Columns 6 and 7 are arranged as group indexed data, with column 6 as the factor column and column 7 as the data column.
Descriptive Statistics
If your data is in the form of statistical values (sample size, mean, standard
deviation, or standard error of the mean), the sample sizes (N) must be in one
worksheet column, the means in another column, and the standard
deviations (or standard errors of the mean) in a third column, with the data
for each group in the same row. When comparing two groups, there should
be exactly two rows of data.
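To show why the sample size, mean, and standard deviation (or SEM) for each group are enough to run a t-test, here is a minimal Python sketch. It assumes the standard pooled-variance form of the unpaired t statistic and the usual relation SD = SEM × √N. The function names and the numbers are hypothetical.

```python
import math


def sd_from_sem(sem, n):
    """Recover the standard deviation from the standard error of the mean."""
    return sem * math.sqrt(n)


def t_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Pooled-variance t statistic and degrees of freedom from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, df


# One row per group, in the column order N, mean, SD (illustrative values only).
rows = [(10, 5.0, 2.0), (10, 3.0, 2.0)]
t, df = t_from_summary(*rows[0], *rows[1])
print(t, df)
```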
Indexed Data
Indexed data places the groups or treatments in a factor column, and the
corresponding data points in a second column. Two way ANOVAs require
two factor columns and one data column.
! You can index raw data or convert indexed data to raw data using the Edit
menu Index and UnIndex commands (see Indexing Data on page 73).
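The relationship between the raw and indexed layouts can be sketched in Python. This only illustrates the two data arrangements. It is not how the Index and UnIndex commands are implemented, and the function names and sample values are hypothetical.

```python
def index_columns(raw):
    """Convert {group title: values} into a factor column and a data column."""
    factor, data = [], []
    for group, values in raw.items():
        for value in values:
            factor.append(group)
            data.append(value)
    return factor, data


def unindex_columns(factor, data):
    """Convert a factor column and a data column back into one column per group."""
    raw = {}
    for group, value in zip(factor, data):
        raw.setdefault(group, []).append(value)
    return raw


raw = {"Control": [4.1, 3.9, 4.4], "Treated": [5.2, 5.0]}
factor, data = index_columns(raw)
print(factor)
print(data)
```

Unindexing the result returns the original columns, so no information is lost in either direction.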
FIGURE 8–2
Data Format for a Two Way ANOVA with Two Factor Indexed Data
Column 1 is the first factor column, column 2 is the second factor column, and column 3 contains the data.
Unpaired t-Test
➤ You want to see if the means of two different samples are significantly
different.
➤ Your samples are drawn from normally distributed populations with the
same variances.
If you know that your data was drawn from a non-normal population, use
the Mann-Whitney Rank Sum Test. When there are more than two groups
to compare, do a One Way Analysis of Variance.
! Note that, depending on your t-test options settings (see Setting t-test
Options on page 208), if you attempt to perform a t-test on non-normal
populations or populations with unequal variances, SigmaStat will inform
you that the data is unsuitable for a t-test, and suggest the Mann-Whitney
Rank Sum Test instead (see Mann-Whitney Rank Sum Test on page 220).
About the Unpaired t-test
The unpaired t-test tests for a difference between two groups that is greater
than what can be attributed to random sampling variation. The null
hypothesis of an unpaired t-test is that the means of the populations that you
drew the samples from are the same. If you can confidently reject this
hypothesis, you can conclude that the means are different.
The Unpaired t-test is a parametric test based on estimates of the mean and
standard deviation parameters of the normally distributed populations from
which the samples were drawn.
2 If desired, set the t-test options using the Options for t-test dialog box
(see page 209).
3 Select t-test from the toolbar drop-down list, then click the button,
or choose the Statistics menu Compare Two Groups, t-test command.
4 Run the test by selecting the worksheet columns with the data you want
to test using the Pick Columns dialog box (see page 212).
5 View and interpret the t-test report and generate report graphs (pages
8-214 and 8-217).
FIGURE 8–3
Valid Data Formats for an Unpaired t-test
Columns 1 and 2 are arranged as raw data. Columns 3, 4, and 5 are arranged as descriptive statistics using the sample size, mean, and standard deviation. Columns 6 and 7 are arranged as group indexed data, with column 6 as the factor column and column 7 as the data column.
For more information on arranging data, see Data Format for Group
Comparison Tests on page 204 or Arranging Data for t-Tests and ANOVAs
on page 64. For information on how to select the data format for an unpaired
t-test, see Unpaired t-Test on page 206.
➤ Adjust the parameters of a test to relax or restrict the testing of your data
for normality and equal variance.
➤ Display the statistics summary and the confidence interval for the data in
the report and save residuals to a worksheet column.
➤ Compute the power or sensitivity of the test.
1 If you are going to run the test after changing test options, and want to
select your data before you run the test, drag the pointer over your data.
2 To open the Options for t-test dialog box, select t-test from the toolbar
drop-down list, then click the button, or choose the Statistics menu
Current Test Options... command. The Normality and Equal Variance
options appear (see Figure 8–4 on page 209).
3 Click the Result tab to view the Summary Table, Confidence Interval,
and Residual options (see Figure 8–5 on page 210), and the Post Hoc
Test tab to view the Power option (see Figure 8–6 on page 211). Click
the Assumption Checking tab to return to the Normality and Equal
Variance options.
4 Click a check box to enable or disable a test option. Options settings are
saved between SigmaStat sessions. For more information on each of the
test options, see pages 8-209 through 8-217.
5 To continue the test, click Run Test. The Pick Columns dialog box
appears (see Selecting Data Columns on page 105 for more
information).
6 To accept the current settings and close the options dialog box, click
OK. To accept the current setting without closing the options dialog
box, click Apply. To close the dialog box without changing any settings
or running the test, click Cancel.
! You can select Help at any time to access SigmaStat’s on-line help system.
Normality and Equal Variance Assumptions
Select the Assumption Checking tab from the options dialog box to view the
Normality and Equal Variance options. The normality assumption test
checks for a normally distributed population. The equal variance assumption
test checks the variability about the group means.
FIGURE 8–4
The Options for t-test
Dialog Box Displaying
the Assumption
Checking Options
Equal Variance Testing SigmaStat tests for equal variance by checking the
variability about the group means.
P Values for Normality and Equal Variance The P value determines the
probability of being incorrect in concluding that the data is not normally
distributed (P value is the risk of falsely rejecting the null hypothesis that the
data is normally distributed). If the P computed by the test is greater than the
P set here, the test passes.
! There are extreme conditions of data distribution that these tests cannot take
into account. For example, the Levene Median test fails to detect differences
in variance of several orders of magnitude. However, these conditions should
be easily detected by simply examining the data without resorting to the
automatic assumption tests.
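The manual names the Levene Median test but does not give its formula. As a rough illustration, the Python sketch below computes the median-centered Levene statistic (often called the Brown-Forsythe form): a one way ANOVA F statistic on the absolute deviations of each value from its group median. Treating this as the statistic SigmaStat uses is an assumption, and the function name is hypothetical.

```python
import statistics


def levene_median_statistic(groups):
    """F-type statistic on absolute deviations from each group's median."""
    devs = [[abs(x - statistics.median(g)) for x in g] for g in groups]
    k = len(devs)
    n_total = sum(len(d) for d in devs)
    grand_mean = sum(sum(d) for d in devs) / n_total
    group_means = [statistics.fmean(d) for d in devs]
    between = sum(len(d) * (m - grand_mean) ** 2 for d, m in zip(devs, group_means))
    within = sum((z - m) ** 2 for d, m in zip(devs, group_means) for z in d)
    return ((n_total - k) / (k - 1)) * between / within


# The second group is twice as spread out as the first (illustrative values).
w = levene_median_statistic([[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]])
print(w)
```

Larger values indicate less similar spreads. Turning the statistic into a P value requires the F distribution, which this sketch omits.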
Summary Table
Select the Results tab in the options dialog box to view the Summary Table
option. The Summary Table option displays the number of observations for a
column or group, the number of missing values for a column or group, the
average value for the column or group, the standard deviation of the column
or group, and the standard error of the mean for the column or group.
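The quantities in the summary table can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the column is held as text cells, missing values use the double-dash marker described earlier, N counts only non-missing observations, and the standard deviation is the sample (N − 1) form. The function name and dictionary keys are hypothetical.

```python
import math
import statistics

MISSING = "--"  # worksheet marker for a missing value


def summary_row(column):
    """Summary statistics for one worksheet column given as text cells."""
    values = [float(cell) for cell in column if cell != MISSING]
    n = len(values)
    sd = statistics.stdev(values)  # sample standard deviation (N - 1 denominator)
    return {
        "N": n,
        "Missing": len(column) - n,
        "Mean": statistics.fmean(values),
        "Std Dev": sd,
        "SEM": sd / math.sqrt(n),
    }


row = summary_row(["4.0", "6.0", "--", "8.0", "6.0"])
print(row)
```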
FIGURE 8–5
The Options for t-test
Dialog Box Displaying
the Summary Table,
Confidence Intervals,
and Residuals Options
Confidence Interval
Select the Results tab in the options dialog box to view the Confidence
Intervals option. The Confidence Intervals option displays the confidence
interval for the difference of the means. To change the interval, enter any
number from 1 to 99 (95 and 99 are the most commonly used intervals).
Click the selected check box if you do not want to include the confidence
interval in the report.
Residuals
Select the Results tab in the options dialog box to view the Residuals option.
Use the Residuals option to display residuals in the report and to save the
residuals of the test to the specified worksheet column. To change the column
the residuals are saved to, edit the number in or select a number from the
drop-down list.
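The manual does not define the residuals it saves. A common definition for a two-group comparison, assumed here, is each observation minus the mean of its own group. The Python sketch below illustrates that assumption with a hypothetical function name.

```python
import statistics


def group_residuals(groups):
    """Residual of each observation: its value minus its own group mean."""
    residuals = []
    for values in groups:
        group_mean = statistics.fmean(values)
        residuals.extend(value - group_mean for value in values)
    return residuals


residuals = group_residuals([[1.0, 2.0, 3.0], [10.0, 14.0]])
print(residuals)
```

Under this definition the residuals of each group sum to zero.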
Power
Select the Post Hoc Tests tab in the options dialog box to view the Power
option. The power or sensitivity of a test is the probability that the test will
detect a difference between the groups if there is really a difference.
Change the alpha value by editing the number in the Alpha Value box. Alpha
(α) is the acceptable probability of incorrectly concluding that there is a
difference. The suggested value is α = 0.05. This indicates that a one in
twenty chance of error is acceptable, or that you are willing to conclude there
is a significant difference when P < 0.05.
FIGURE 8–6
The Options for t-test
Dialog Box Displaying
the Power Option
Running a t-test
To run a t-test, you need to select the data to test. The Pick Columns dialog
box is used to select the worksheet columns with the data you want to test
and to specify how your data is arranged in the worksheet.
To run a t-test:
1 If you want to select your data before you run the test, drag the pointer
over your data.
2 Open the Pick Columns dialog box to start the t-test. You can either:
The Pick Columns dialog box appears prompting you to specify a data
format.
3 Select the appropriate data format from the Data Format drop-down
list. If your data is grouped in columns, select Raw. If your data is in the
form of a group index column(s) paired with a data column(s), select
Indexed. If your data was entered in the form of summary statistics for
each group, select either Sample Size, Mean, and Standard Deviation,
or Sample Size, Mean, and SEM (Standard Error of the Mean).
FIGURE 8–7
The Pick Columns
for t-test Dialog Box
Prompting You to
Specify a Data Format
For more information on arranging data, see Data Format for Group
Comparison Tests on page 204, or Arranging Data for t-Tests and
ANOVAs on page 64.
4 Click Next to pick the data columns for the test. If you selected
columns before you chose the test, the selected columns appear in the
Selected Columns list.
The first selected column is assigned to the first row in the Selected
Columns list, and all successively selected columns are assigned to
successive rows in the list. The title of selected columns appears in each
row. For raw and indexed data, you are prompted to select two
worksheet columns. For statistical summary data you are prompted to
select three columns.
FIGURE 8–8
The Pick Columns
for t-test Dialog Box
Prompting You to
Select Data Columns
6 To change your selections, select the assignment in the list, then select
a new column from the worksheet. You can also clear a column
assignment by double-clicking it in the Selected Columns list.
7 Click Finish to run the t-test on the selected columns. After the
computations are completed, the report appears. To edit the report, use
the Format menu commands; for information on editing reports, see
Editing Reports on page 137.
The t-test calculates the t statistic, degrees of freedom, and P value of the
specified data. These results are displayed in the t-test report which
automatically appears after the t-test is performed. The other results displayed
in the report are enabled and disabled in the Options for t-test dialog box (see
page 209).
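The three core results (t statistic, degrees of freedom, and P value) can be sketched in Python as follows. This is an illustration rather than SigmaStat's implementation: it assumes the standard pooled-variance unpaired t-test and a two-sided P value, and it integrates the Student t density numerically with Simpson's rule so that only the standard library is needed. All function names are hypothetical.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), integrating the density on [0, |t|] with Simpson's rule."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central_half)


def unpaired_t_test(x, y):
    """Return (t, degrees of freedom, two-sided P) for two raw data columns."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.fmean(x) - statistics.fmean(y)) / se_diff
    return t, df, two_sided_p(t, df)


t, df, p = unpaired_t_test([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(t, df, p)
```

Because `math.gamma` overflows for very large arguments, this sketch is only suitable for modest sample sizes.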
For descriptions of the derivations for t-test results, you can reference any
appropriate statistics reference. For a list of suggested references, see
References on page 12.
! The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the and
buttons in the formatting toolbar to move one page up and down in the
report.
Result Explanations
In addition to the numerical results, expanded explanations of the results may
also appear. To turn off this explanatory text, choose the Statistics menu
Report Options... command and uncheck the Explain Test Results option.
The number of decimal places displayed is also set in the Report Options
dialog box. For more information on setting report options, see Setting
Report Options on page 135.
Normality Test
Normality test results show whether the data passed or failed the test of the
assumption that the samples were drawn from normal populations and the P
value calculated by the test. All parametric tests require normally distributed
source populations.
This result is set in the Options for t-test dialog box (see page 209).
FIGURE 8–9
The t-test Report
Equal Variance Test
Equal Variance test results display whether or not the data passed or failed the
test of the assumption that the samples were drawn from populations with
the same variance and the P value calculated by the test. Equal variance of the
source population is assumed for all parametric tests. This result is set in the
Options for t-test dialog box (see page 209).
Summary Table
SigmaStat can generate a summary table listing the sizes N for the two
samples, number of missing values, means, standard deviations, and the
standard error of the means (SEM). This result is displayed unless you disable
the Summary Table option in the Options for t-test dialog box.
Mean The average value for the column. If the observations are normally
distributed the mean is the center of the distribution.
above or below the mean, and about 95% of the observations will fall within
two standard deviations above or below the mean.
The standard error of the difference is a measure of the precision with which
this difference can be estimated.
You can conclude from “large” absolute values of t that the samples were
drawn from different populations. A large t indicates that the difference
between the treatment group means is larger than what would be expected
from sampling variability alone (i.e., that the differences between the two
groups are statistically significant). A small t (near 0) indicates that there is no
significant difference between the samples.
Confidence Interval for the Difference of the Means
If the confidence interval does not include zero, you can conclude that there
is a significant difference between the means with the level of
confidence specified. This can also be described as P < α (alpha), where α is
the acceptable probability of incorrectly concluding that there is a difference.
The level of confidence is adjusted in the Options for t-test dialog box; this is
typically 100(1 − α), or 95%. Larger values of confidence result in wider
intervals and smaller values in narrower intervals. For a further explanation of
α, see Power below. This result is set in the Options for t-test dialog box (see page
209).
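A confidence interval for the difference of the means can be sketched in Python as the observed difference plus or minus a critical t value times the standard error of the difference. The sketch assumes the pooled-variance standard error and finds the critical value by bisection on a numerically integrated Student t density. The function names are hypothetical and this is not SigmaStat's code.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def t_central_area(a, df, steps=1000):
    """P(-a <= T <= a), using Simpson's rule on [0, a] and symmetry."""
    if a == 0:
        return 0.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3


def t_critical(confidence, df):
    """Value a with P(|T| <= a) equal to confidence, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_central_area(hi, df) < confidence:
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_central_area(mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return hi


def mean_difference_ci(x, y, confidence_percent=95):
    """Confidence interval for mean(x) - mean(y), pooled-variance form."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = statistics.fmean(x) - statistics.fmean(y)
    half_width = t_critical(confidence_percent / 100, df) * se_diff
    return diff - half_width, diff + half_width


ci95 = mean_difference_ci([1, 2, 3, 4, 5], [3, 4, 5, 6, 7], 95)
ci99 = mean_difference_ci([1, 2, 3, 4, 5], [3, 4, 5, 6, 7], 99)
print(ci95, ci99)
```

In this example the 95% interval contains zero, so the difference would not be declared significant at α = 0.05, and the 99% interval is wider than the 95% interval.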
Power
The power, or sensitivity, of a t-test is the probability that the test will detect
a difference between the groups if there really is a difference. The closer the
power is to 1, the more sensitive the test.
t-test power is affected by the sample size of both groups, the chance of
erroneously reporting a difference, α (alpha), the difference of the means, and
the standard deviation.
This result is set in the Options for t-test dialog box (see page 209).
The α value is set in the Options for t-test dialog box; a value of α = 0.05
indicates that a one in twenty chance of error is acceptable, or that you are
willing to conclude there is a significant difference when P < 0.05.
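To see how sample size, α, the difference of the means, and the standard deviation each affect power, the Python sketch below uses a large-sample normal approximation. This is an assumption made for brevity: an exact t-test power calculation uses the noncentral t distribution, and the manual does not say which method SigmaStat applies. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def approximate_power(n1, n2, mean_difference, sd, alpha=0.05):
    """Two-sided power of a two-group comparison, normal approximation."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected size of the test statistic when the true difference is mean_difference.
    shift = abs(mean_difference) / (sd * math.sqrt(1 / n1 + 1 / n2))
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


print(approximate_power(10, 10, 1.0, 1.0))              # baseline
print(approximate_power(20, 20, 1.0, 1.0))              # larger samples: more power
print(approximate_power(10, 10, 1.0, 1.0, alpha=0.01))  # stricter alpha: less power
```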
You can generate up to five graphs using the results from a t-test. They
include a:
Bar Chart
The t-test bar chart plots the group means as vertical bars with error bars
indicating the standard deviation. If the graph data is indexed, the levels in
the factor column are used as the tick marks for the bar chart bars, and the
column titles are used as the X and Y axis titles. If the graph data is in raw or
statistical format, the column titles are used as the tick marks for the bar chart
bars and default X Data and Y Data axis titles are assigned to the graph. For
an example of a bar chart, see Bar Charts of the Column Means on page 149.
Scatter Plot
The t-test scatter plot graphs the group means as single points with error bars
indicating the standard deviation. If the graph data is indexed, the levels in
the factor column are used as the tick marks for the scatter plot points, and
the column titles are used as the X and Y axis titles. If the graph data is in raw
or statistical format, the column titles are used as the tick marks for the
scatter plot points and default X Data and Y Data axis titles are assigned to
the graph. For an example of a scatter plot, see Scatter Plot on page 150.
Point Plot
The t-test point plot graphs all values in each column as a point on the graph.
If the graph data is indexed, the levels in the factor column are used as the
tick marks for the plot points, and the column titles are used as the X and Y
axis titles. If the graph data is in raw or statistical format, the column titles are
used as the tick marks for the plot points and default X Data and Y Data axis
titles are assigned to the graph. For an example of a point plot, see Point Plot
on page 150.
Histogram of Residuals
The t-test histogram plots the raw residuals in a specified range, using a
defined interval set. The residuals are divided into a number of evenly
incremented histogram intervals and plotted as histogram bars indicating the
number of residuals in each interval. The X axis represents the histogram
intervals, and the Y axis represents the number of residuals in each interval. For
an example of a histogram, see page 153.
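The binning behind such a histogram can be sketched in Python. The range, the number of intervals, and the rule that a value on the upper edge falls in the last interval are all assumptions for illustration, and the function name is hypothetical.

```python
def histogram_counts(residuals, low, high, n_intervals):
    """Count residuals in n_intervals evenly spaced intervals spanning [low, high]."""
    width = (high - low) / n_intervals
    counts = [0] * n_intervals
    for r in residuals:
        if low <= r <= high:
            # A value exactly on the upper edge is placed in the last interval.
            i = min(int((r - low) / width), n_intervals - 1)
            counts[i] += 1
    edges = [low + i * width for i in range(n_intervals + 1)]
    return edges, counts


residuals = [-1.8, -0.6, -0.4, 0.1, 0.3, 0.5, 1.9]
edges, counts = histogram_counts(residuals, -2.0, 2.0, 4)
print(edges)   # X axis: interval boundaries
print(counts)  # Y axis: number of residuals per interval
```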
Normal Probability Plot
The t-test probability plot graphs the frequency of the raw residuals. The
residuals are sorted and then plotted as points around a curve representing
the area of the Gaussian plotted on a probability axis. Plots with residuals that
fall along the Gaussian curve indicate that your data was taken from a normally
distributed population. The X axis is a linear scale representing the residual
values. The Y axis is a probability scale representing the cumulative frequency
of the residuals. For an example of a normal probability plot, see page 155.
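The coordinates of a normal probability plot can be sketched in Python. The plotting position (i − 0.5)/n used for the cumulative frequency is an assumption, since the manual does not state which convention SigmaStat uses, and the function name is hypothetical.

```python
from statistics import NormalDist


def probability_plot_points(residuals):
    """Return (residual, cumulative frequency, normal quantile) for each sorted residual."""
    n = len(residuals)
    standard_normal = NormalDist()
    points = []
    for i, r in enumerate(sorted(residuals), start=1):
        cumulative = (i - 0.5) / n  # assumed plotting position
        points.append((r, cumulative, standard_normal.inv_cdf(cumulative)))
    return points


points = probability_plot_points([0.3, -1.2, 0.0, 1.1, -0.2])
for residual, cumulative, quantile in points:
    print(residual, cumulative, quantile)
```

Plotting the residual against the normal quantile is equivalent to using a probability-scaled Y axis; points from a normal population fall close to a straight line.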
1 Click the toolbar button, or choose the Graph menu Create Graph
command when the t-test report is selected. The Create Graph dialog box
appears displaying the types of graphs available for the t-test results.
FIGURE 8–10
The Create Graph Dialog
Box
for the t-test Report
2 Select the type of graph you want to create from the Graph Type list,
then click OK, or double-click the desired graph in the list. The
selected graph appears in a graph window. For more information on
each of the graph types, see Chapter 8.
FIGURE 8–11
A Point Plot of the
Result Data for a t-test
➤ You want to see if the medians of two different samples are significantly
different.
➤ The samples are not drawn from normally distributed populations with
the same variances, or you do not want to assume that they were drawn
from normal populations.
If you know your data was drawn from a normally distributed population,
use the Unpaired t-test (page 206). When there are more than two groups to
compare, run a Kruskal-Wallis ANOVA on Ranks (page 310).
! Note that, depending on your Rank Sum Test options settings (see page
222), if you attempt to perform a rank sum test on normal populations with
equal variances, SigmaStat informs you that the data can be analyzed with
the more powerful Unpaired t-test instead.
About the Mann-Whitney Rank Sum Test
The Mann-Whitney Rank Sum Test is used to test for a difference between
two groups that is greater than what can be attributed to random sampling
variation. The null hypothesis is that the two samples were drawn from
populations with the same median.
The Rank Sum Test is a nonparametric procedure, which does not require
assuming normality or equal variance. It ranks all the observations from
smallest to largest without regard to which group each observation comes
from. The ranks for each group are summed and the rank sums compared.
If there is no difference between the two groups, the mean ranks should be
approximately the same. If they differ by a large amount, you can assume that
the low ranks tend to be in one group and the high ranks are in the other, and
conclude that the samples were drawn from different populations (i.e., that
there is a statistically significant difference).
2 If desired, set the Rank Sum options using the Options for Rank Sum
Test dialog box (page 232).
3 Select Rank Sum Test from the toolbar drop-down list, then click the
button, or choose the Statistics menu Compare Two Groups, Rank
Sum Test command.
4 Run the test by selecting the worksheet columns with the data you want
to test using the Pick Columns dialog box (page 99).
5 View and interpret the Rank Sum Test report and generate report
graphs (pages 8-226 and pages 8-228).
The format of the data to be tested can be raw data or indexed data; in either
case, the data is found in two worksheet columns. For more information on
arranging data, see Data Format for Group Comparison Tests on page 204,
or Arranging Data for t-Tests and ANOVAs on page 64. For information on
how to select the data format for a test, see Selecting a Test on page 98.
FIGURE 8–12
Valid Data Formats for a Mann-Whitney Rank Sum Test
Columns 1 and 2 are arranged as raw data. Columns 3 and 4 are arranged as group indexed data, with column 3 as the factor column.
➤ Adjust the parameters of the test to relax or restrict the testing of your
data for normality and equal variance.
➤ Display the summary table.
1 If you are going to run the test after changing test options, and want to
select your data before you run the test, drag the pointer over your data.
2 To open the Options for Rank Sum Test dialog box, select Rank Sum
Test from the toolbar drop-down list, then click the button, or
choose the Statistics menu Current Test Options... command. The
Normality and Equal Variance options appear (see Figure 8–20 on page
232).
3 Click the Results tab to view the Summary Table option (see Figure 8–
20 on page 232). Click the Assumption Checking tab to return to the
Normality and Equal Variance options.
4 Click a check box to enable or disable a test option. Options settings are
saved between SigmaStat sessions. For more information on each of the
test options, see pages 8-209 through pages 8-210.
5 To continue the test, click Run Test. The Pick Columns dialog box
appears (see Picking Data to Test on page 99 for more information).
6 To accept the current settings and close the options dialog box, click
OK. To accept the current setting without closing the options dialog
box, click Apply. To close the dialog box without changing any settings
or running the test, click Cancel.
! You can select Help at any time to access SigmaStat’s on-line help system.
Normality and Equal Variance Assumptions
Select the Assumption Checking tab from the options dialog box to view the
Normality and Equal Variance options. The normality assumption test
checks for a normally distributed population. The equal variance assumption
test checks the variability about the group means.
FIGURE 8–13
The Options for Rank
Sum Test Dialog Box
Displaying the Assumption
Checking Options
Equal Variance Testing SigmaStat tests for equal variance by checking the
variability about the group means.
P Values for Normality and Equal Variance The P value determines the
probability of being incorrect in concluding that the data is not normally
distributed (the P value is the risk of falsely rejecting the null hypothesis that
the data is normally distributed). If the P value computed by the test is
greater than the P set here, the test passes.
! There are extreme conditions of data distribution that these tests cannot take
into account. For example, the Levene Median test fails to detect differences
in variance of several orders of magnitude. However, these conditions should
be easily detected by simply examining the data without resorting to the
automatic assumption tests.
Summary Table
Select the Results tab to view the Summary Table option. The summary table
for a Rank Sum Test lists the medians, percentiles, and sample sizes N in
the Rank Sum Test report. If desired, change the percentile values by
editing the boxes. The 25th and the 75th percentiles are the suggested
percentiles.
FIGURE 8–14
The Options for Rank Sum
Test Dialog Box Displaying
the Summary Table Options
To run a test, you need to select the data to test. The Pick Columns dialog
box is used to select the worksheet columns with the data you want to test
and to specify how your data is arranged in the worksheet.
1 If you want to select your data before you run the test, drag the pointer
over your data.
2 Open the Pick Columns dialog box to start the Rank Sum Test. You can
either:
➤ Select Rank Sum Test from the toolbar drop-down list, then click the
button.
➤ Choose the Statistics menu Compare Two Groups, Rank Sum Test...
command.
➤ Click the Run Test button from the Options for Rank Sum Test
dialog box.
The Pick Columns dialog box appears prompting you to specify a data
format.
3 Select the appropriate data format from the Data Format drop-down
list. If your data is grouped in columns, select Raw. If your data is in the
form of a group index column(s) paired with a data column(s), select
Indexed.
FIGURE 8–15
The Pick Columns
for Rank Sum Test Dialog
Box
Prompting You to
Specify a Data Format
For more information on arranging data, see Data Format for Group
Comparison Tests on page 204, or Arranging Data for t-Tests and
ANOVAs on page 64.
4 Click Next to pick the data columns for the test. If you selected
columns before you chose the test, the selected columns appear in the
Selected Columns list.
The first selected column is assigned to the first row in the Selected
Columns list, and all successively selected columns are assigned to
successive rows in the list. The number or title of selected columns
appears in each row. For raw and indexed data, you are prompted to
select two worksheet columns.
6 To change your selections, select the assignment in the list, then select
a new column from the worksheet. You can also clear a column
assignment by double-clicking it in the Selected Columns list.
7 Click Finish to run the Rank Sum Test on the selected columns. If you
elected to test for normality and equal variance, SigmaStat performs the
test for normality (Kolmogorov-Smirnov) and the test for equal
variance (Levene Median). If your data pass both tests, SigmaStat
FIGURE 8–16
The Pick Columns
for Rank Sum Test Dialog
Box
Prompting You to
Select Data Columns
The Rank Sum Test computes the Mann-Whitney T statistic and the P value
for T. These results are displayed in the rank sum report which appears after
the rank sum test is performed. The other results displayed in the report are
enabled and disabled in the Options for Rank Sum Test dialog box (see
Setting Mann-Whitney Rank Sum Test Options on page 222).
FIGURE 8–17
The Mann-Whitney
Rank Sum Test
Report
For descriptions of the derivations for Rank Sum Test results, you can reference any
appropriate statistics reference. For a list of suggested references, see
References on page 12.
! The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the and
buttons in the formatting toolbar to move one page up and down in the
report.
Result Explanations
In addition to the numerical results, expanded explanations of the results may
also appear. To turn off this explanatory text, choose the Statistics menu
Report Options... command and uncheck the Explain Test Results option.
The number of decimal places displayed is also set in the Report Options
dialog box. For more information on setting report options, see Setting
Report Options on page 135.
Normality Test
Normality test results display whether the data passed or failed the test of the
assumption that they were drawn from a normal population and the P value
calculated by the test. For nonparametric procedures, this test can fail, as
nonparametric tests do not assume normally distributed source populations.
This result is set in the Options for Rank Sum Test dialog box (see page 232).
Equal Variance Test
Equal Variance test results display whether or not the data passed or failed the
test of the assumption that the samples were drawn from populations with
the same variance and the P value calculated by the test. Nonparametric tests
do not assume equal variance of the source populations. This result is set in
the Options for Rank Sum Test dialog box (see page 232).
Summary Table
SigmaStat generates a summary table listing the sample sizes N, number of
missing values, medians, and percentiles unless you disable the Display
Summary Table option in the Options for Rank Sum Test dialog box.
Percentiles The two percentile points that define the upper and lower tails
of the observed values.
T Statistic
The T statistic is the sum of the ranks in the smaller sample group or from
the first selected group, if both groups are the same size. This value is
compared to the population of all possible rankings to determine the
probability of this T occurring.
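For small samples, comparing T to all possible rankings can be done by direct enumeration, as in the Python sketch below. It follows the description above for choosing which group's ranks are summed. The average-rank treatment of ties and the two-sided definition of "as extreme" (distance from the expected rank sum) are assumptions, and the function names are hypothetical.

```python
from itertools import combinations


def rank_sum_t(group_a, group_b):
    """Return (T, all ranks, size of the group whose ranks are summed)."""
    # Use the smaller group, or the first group when the sizes are equal.
    small, large = (group_a, group_b) if len(group_a) <= len(group_b) else (group_b, group_a)
    pooled = sorted(small + large)

    def rank_of(value):
        first = pooled.index(value) + 1            # lowest rank held by this value
        return first + (pooled.count(value) - 1) / 2  # average rank across ties

    all_ranks = [rank_of(v) for v in pooled]
    return sum(rank_of(v) for v in small), all_ranks, len(small)


def exact_two_sided_p(t_stat, all_ranks, n_small):
    """Fraction of all possible rank assignments at least as extreme as t_stat."""
    expected = n_small * sum(all_ranks) / len(all_ranks)
    observed = abs(t_stat - expected)
    hits = total = 0
    for combo in combinations(all_ranks, n_small):
        total += 1
        if abs(sum(combo) - expected) >= observed - 1e-12:
            hits += 1
    return hits / total


t_stat, all_ranks, n_small = rank_sum_t([1.1, 2.3, 2.9], [3.5, 4.2, 5.0, 6.1])
p = exact_two_sided_p(t_stat, all_ranks, n_small)
print(t_stat, p)
```

The number of possible rankings grows very quickly with sample size, so enumeration is only practical for small groups.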
You can generate up to two graphs using the results from a Rank Sum Test.
They include a:
Box Plot
The Rank Sum Test box plot graphs the percentiles and the median of
column data. The ends of the boxes define the 25th and 75th percentiles,
with a line at the median and error bars defining the 10th and 90th
percentiles.
If the graph data is indexed, the levels in the factor column are used as the
tick marks for the box plot boxes, and the column titles are used as the axis
titles. If the graph data is in raw format, the column titles are used as the tick
marks for the box plot boxes, and no axis titles are assigned to the graph. For
an example of a box plot, see page 152 in the CREATING AND MODIFYING
GRAPHS chapter.
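The five values that define each box in the box plot (10th, 25th, 50th, 75th, and 90th percentiles) can be computed with Python's standard library, as sketched below. Percentile definitions differ between programs; the "inclusive" linear interpolation used here is an assumption and may not match SigmaStat's values exactly. The function name is hypothetical.

```python
import statistics


def box_plot_percentiles(values, percents=(10, 25, 50, 75, 90)):
    """Percentiles used for the whiskers, box ends, and median line."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in percents}  # cuts[0] is the 1st percentile


box = box_plot_percentiles(list(range(1, 12)))  # the values 1 through 11
print(box)
```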
Point Plot
The Rank Sum Test point plot graphs all values in each column as a point on
the graph. If the graph data is indexed, the levels in the factor column are
used as the tick marks for the plot points, and the column titles are used as
the X and Y axis titles. If the graph data is in raw or statistical format, the
column titles are used as the tick marks for the plot points and default X Data
and Y Data axis titles are assigned to the graph. For an example of a point
plot, see Point Plot on page 150.
FIGURE 8–18
The Create Graph Dialog
Box for the Rank Sum Test
Report
2 Select the type of graph you want to create from the Graph Type list,
then click OK, or double-click the desired graph in the list. For more
information on each of the graph types, see Chapter 8.
FIGURE 8–19
A Box Plot of the
Result Data for a
Rank Sum Test
One Way Analysis of Variance is a parametric test that assumes that all the
samples are drawn from normally distributed populations with the same
standard deviations (variances).
If you know that your data was drawn from non-normal populations, use the
Kruskal-Wallis ANOVA on Ranks. If you want to consider the effects of two
factors on your experimental groups, use Two Way ANOVA. When there are
only two groups to compare, you can do a t-test (depending on the type of
results you want). Performing an ANOVA for two groups yields exactly the
same P value as an unpaired t-test.
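The equivalence for two groups can be checked numerically: the One Way ANOVA F statistic equals the square of the pooled-variance t statistic, and an F distribution with 1 and df degrees of freedom is the distribution of a squared t with df degrees of freedom, so the P values agree. The Python sketch below is a minimal illustration with hypothetical function names and sample data.

```python
import math
import statistics


def pooled_t(x, y):
    """Pooled-variance unpaired t statistic."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    return (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(
        pooled_var * (1 / n1 + 1 / n2))


def one_way_f(groups):
    """One way ANOVA F statistic: between-group over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    within = sum((v - statistics.fmean(g)) ** 2 for g in groups for v in g)
    return (between / (k - 1)) / (within / (n_total - k))


x, y = [1, 2, 3, 4, 5], [3, 4, 5, 6, 7]
t = pooled_t(x, y)
f = one_way_f([x, y])
print(t ** 2, f)
```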
! Depending on your ANOVA options settings (see page 232), if you attempt
to perform an ANOVA on non-normal populations or populations with
unequal variances, SigmaStat informs you that the data is unsuitable for a
parametric test, and suggests the Kruskal-Wallis ANOVA on Ranks (see
page 310).
About a One The design for a One Way ANOVA is the same as an unpaired t-test except
Way ANOVA that there can be more than two experimental groups. The null hypothesis is
that there is no difference among the populations from which the samples
were drawn.
2 If desired, set the One Way ANOVA options using the Options for
One Way ANOVA dialog box (page 232).
3 Select One Way ANOVA from the toolbar drop-down list, then click
the button, or choose the Compare Many Groups, One Way ANOVA...
command from the Statistics menu.
4 Run the test by selecting the worksheet columns with the data you want
to test using the Pick Columns dialog box (page 99).
6 View and interpret the One Way ANOVA report and generate report
graphs (page 243 and page 250).
Data can be arranged as raw data, indexed data, or summary statistics. Raw
data is placed in as many columns as there are groups, up to 32; each column
contains the data for one group. Indexed data is placed in two worksheet
columns. Statistical summary data is placed in three columns. For more
information on arranging data, see Data Format for Group Comparison Tests
on page 204, or Arranging Data for Contingency Tables on page 69.
FIGURE 8–20
Valid Data Formats
for a One Way ANOVA
Columns 1 through 3 are
arranged as groups in
columns. Columns 4, 5,
and 6 are arranged as
descriptive statistics using
the mean, standard
deviation, and size. Columns
7 and 8 are arranged as
group indexed data, with
column 7 as the factor column.
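The three data arrangements described above can be illustrated with a short sketch. This is an illustration only, assuming plain Python; the group names and values are hypothetical, and the structures shown are not SigmaStat's internal worksheet format.

```python
from statistics import mean, stdev

# Raw format: one column per group (hypothetical values).
raw = {
    "Control": [4.1, 3.9, 4.4],
    "Low dose": [5.0, 5.3, 4.8],
    "High dose": [6.2, 5.9, 6.4],
}

# Indexed format: a factor column paired with a data column.
indexed = [(group, value) for group, values in raw.items() for value in values]

# Summary statistics format: mean, standard deviation, and size per group.
summary = {group: (mean(values), stdev(values), len(values))
           for group, values in raw.items()}

for factor_level, value in indexed:
    print(factor_level, value)
for group, (m, sd, n) in summary.items():
    print(f"{group}: mean = {m:.3f}, SD = {sd:.3f}, N = {n}")
```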
Selecting Data When running a One Way ANOVA you can either:
Columns
➤ Select the columns by dragging your mouse over the columns before
choosing the test.
➤ Select the columns while running the test.
➤ Adjust the parameters of the test to relax or restrict the testing of your
data for normality and equal variance.
➤ Display the statistics summary table and the confidence interval for the
data, and assign residuals to a worksheet column.
➤ Enable multiple comparisons.
➤ Compute the power, or sensitivity, of the test.
1 If you are going to run the test after changing test options, and want to
select your data before you run the test, drag the pointer over your data.
2 To open the Options for One Way ANOVA dialog box, select One Way
ANOVA from the toolbar drop-down, then click the button, or
choose the Statistics menu Current Test Options... command. The
Normality and Equal Variance options appear (see Figure 8–21 on page
232).
3 Click the Results tab to view the Summary Table, Confidence Intervals,
and Residuals in Column options (see Figure 8–22 on page 235). Click
the Post Hoc Test tab to view the Power and Multiple Comparisons
options (see Figure 8–23 on page 236). Click the Assumption Checking tab to
return to the Normality and Equal Variance options.
4 Click a check box to enable or disable a test option. Options settings are
saved between SigmaStat sessions. For more information on each of the
test options, see pages 233 through 236.
5 To continue the test, click Run Test. The Pick Columns dialog box
appears (see page 99 for more information).
6 To accept the current settings and close the options dialog box, click
OK. To accept the current settings without closing the options dialog
box, click Apply. To close the dialog box without changing any settings
or running the test, click Cancel.
! You can select Help at any time to access SigmaStat’s on-line help system.
Normality and Select the Assumption Checking tab from the options dialog box to view the
Equal Variance Normality and Equal Variance options. The normality assumption test
Assumptions checks for a normally distributed population. The equal variance assumption
test checks the variability about the group means.
FIGURE 8–21
The Options for
One Way ANOVA
Dialog Box Displaying
the Assumption
Checking Options
Equal Variance Testing SigmaStat tests for equal variance by checking the
variability about the group means.
P Values for Normality and Equal Variance The P value determines the
probability of being incorrect in concluding that the data is not normally
distributed (the P value is the risk of falsely rejecting the null hypothesis that
the data is normally distributed). If the P value computed by the test is
greater than the P set here, the test passes.
! There are extreme conditions of data distribution that these tests cannot take
into account. For example, the Levene Median test fails to detect differences
in variance of several orders of magnitude. However, these conditions should
be easily detected by simply examining the data without resorting to the
automatic assumption tests.
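The pass/fail rule described above (a test passes when its computed P value is greater than the P value set in the options) can be sketched as follows. This is a rough illustration assuming Python with SciPy. The Shapiro-Wilk test on the residuals stands in for the normality test, which is an assumption and may not be the method SigmaStat uses; the data and the `p_to_reject` name are hypothetical.

```python
from scipy import stats

# Hypothetical data for three groups.
groups = [
    [4.1, 3.9, 4.4, 4.0, 4.3],
    [5.0, 5.3, 4.8, 5.1, 4.9],
    [6.2, 5.9, 6.4, 6.0, 6.1],
]
p_to_reject = 0.05  # hypothetical threshold set in the options dialog box

# Assumption: normality is assessed on residuals (value minus group mean).
residuals = [x - sum(g) / len(g) for g in groups for x in g]
_, p_normal = stats.shapiro(residuals)

# Levene test centered on the group medians.
_, p_equal_var = stats.levene(*groups, center="median")

# A test passes when its P value is greater than the threshold.
normality_passes = p_normal > p_to_reject
equal_variance_passes = p_equal_var > p_to_reject

print(f"Normality:      P = {p_normal:.3f}, passed = {normality_passes}")
print(f"Equal variance: P = {p_equal_var:.3f}, passed = {equal_variance_passes}")
```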
Summary Table Select the Results tab in the options dialog box to view the summary table
option. The Summary Table option displays the number of observations for a
column or group, the number of missing values for a column or group, the
average value for the column or group, the standard deviation of the column
or group, and the standard error of the mean for the column or group.
FIGURE 8–22
The Options for One Way
ANOVA Dialog Box
Displaying the Summary
Table, Confidence Intervals,
and Residuals Options
Confidence Interval Select the Results tab in the options dialog box to view the Confidence
Interval option. The Confidence Intervals option displays the confidence
interval for the difference of the means. To change the interval, enter any
number from 1 to 99 (95 and 99 are the most commonly used intervals).
Click the selected check box if you do not want to include the confidence
interval in the report.
Residuals Select the Results tab in the options dialog box to view the Residuals option.
Use the Residuals option to display residuals in the report and to save the
residuals of the test to the specified worksheet column. To change the
column the residuals are saved to, edit the number in the drop-down list or
select a number from it.
Power Select the Post Hoc Tests tab in the options dialog box to view the Power
options. The power or sensitivity of a test is the probability that the test will
detect a difference between the groups if there is really a difference.
FIGURE 8–23
The Options for One
Way ANOVA Dialog Box
Displaying the
Power and Multiple
Comparison Options
Change the alpha value by editing the number in the Alpha Value box. Alpha
(α) is the acceptable probability of incorrectly concluding that there is a
difference. The suggested value is α = 0.05. This indicates that a one in
twenty chance of error is acceptable, or that you are willing to conclude there
is a significant difference when P < 0.05.
Multiple Select the Post Hoc Test tab in the Options dialog box to view the multiple
Comparisons comparisons options (see Figure 8–25 on page 239). One Way ANOVAs test
the hypothesis of no differences between the several treatment groups, but do
not determine which groups are different, or the sizes of these differences.
Multiple comparison procedures isolate these differences.
The P value used to determine if the ANOVA detects a difference is set in the
Report Options dialog box. If the P value produced by the One Way
ANOVA is less than the P value specified in the box, a difference in the
groups is detected and the multiple comparisons are performed. For more
information on specifying a P value for the ANOVA, see Setting Report
Options on page 232.
Significant Multiple Comparison Value Select either .05 or .01 from the
Significance Value for Multiple Comparisons drop-down list. This value
determines the likelihood of the multiple comparison being incorrect
in concluding that there is a significant difference in the treatments.
A value of .05 indicates that the multiple comparisons will detect a difference
if there is less than 5% chance that the multiple comparison is incorrect in
detecting a difference.
To run a test, you need to select the data to test. The Pick Columns dialog
box is used to select the worksheet columns with the data you want to test
and to specify how your data is arranged in the worksheet.
1 If you want to select your data before you run the test, drag the pointer
over your data.
2 Open the Pick Columns dialog box to start the One Way ANOVA. You
can either:
➤ Select One Way ANOVA from the toolbar drop-down list, then click
the button.
➤ Choose the Statistics menu Compare Many Groups, One Way
ANOVA... command.
➤ Click the Run Test button from the Options for One Way ANOVA
dialog box.
The Pick Columns dialog box appears prompting you to specify a data
format.
3 Select the appropriate data format from the Data Format drop-down
list. If your data is grouped in columns, select Raw. If your data is in the
form of a group index column(s) paired with a data column(s), select
Indexed. If your data was entered in the form of summary statistics for
each group, select either Sample Size, Mean, and Standard Deviation or
Sample Size, Mean, and SEM (Standard Error of the Mean).
For more information on arranging data, see Data Format for Group
Comparison Tests on page 204, or Arranging Data for Contingency
Tables on page 69.
FIGURE 8–24
The Pick Columns
for One Way ANOVA
Dialog Box Prompting You to
Specify a Data Format
4 Click Next to pick the data columns for the test. If you selected
columns before you chose the test, the selected columns appear in the
Selected Columns list.
The first selected column is assigned to the first row in the Selected
Columns list, and all successively selected columns are assigned to
successive rows in the list. The number or title of selected columns
appears in each row. You are prompted to pick a minimum of two and a
maximum of 64 columns for raw data, two columns for indexed data,
and three columns for statistical summary data.
6 To change your selections, select the assignment in the list, then select
a new column from the worksheet. You can also clear a column
assignment by double-clicking it in the Selected Columns list.
FIGURE 8–25
The Pick Columns
for One Way ANOVA
Dialog Box Prompting You to
Select Data Columns
7 Click Finish to perform the One Way ANOVA. If you elected to test
for normality and equal variance, and your data fails either test,
SigmaStat warns you and suggests continuing your analysis using the
nonparametric Kruskal-Wallis ANOVA on Ranks (see page 310).
The One Way ANOVA tests the hypothesis of no differences between the
several treatment groups, but does not determine which groups are different,
or the sizes of these differences. Multiple comparison tests isolate these
differences by running comparisons between the experimental groups.
There are seven multiple comparison tests to choose from for the One Way
ANOVA. You can choose to perform the:
➤ Holm-Sidak Test
➤ Tukey Test
➤ Student-Newman-Keuls Test
➤ Bonferroni t-test
➤ Fisher’s LSD
➤ Dunnett’s Test
➤ Duncan’s Multiple Range Test
There are two types of multiple comparisons available for the One Way
ANOVA. The types of comparison you can make depend on the selected
multiple comparison test, including:
Holm-Sidak Test Use the Holm-Sidak Test for both pairwise comparisons and comparisons
versus a control group. It is more powerful than the Tukey and Bonferroni
tests and, consequently, it is able to detect differences that these other tests do
not. It is recommended as the first-line procedure for pairwise comparison
testing.
When performing the test, the P values of all comparisons are computed and
ordered from smallest to largest. Each P value is then compared to a critical
level that depends upon the significance level of the test (set in the test
options), the rank of the P value, and the total number of comparisons made.
A P value less than the critical level indicates there is a significant difference
between the corresponding two groups.
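The ordering and critical-level comparison described above can be sketched in a few lines. This is a minimal sketch in plain Python, assuming the standard Holm-Sidak critical level 1 − (1 − α)^(1/(k − i + 1)) for the i-th smallest of k P values and the usual step-down rule (once one comparison fails, all larger P values also fail). Both are assumptions about the procedure, not a statement of SigmaStat's implementation; the function name is hypothetical.

```python
def holm_sidak(p_values, alpha=0.05):
    """Return (p, critical_level, significant) tuples ordered by P value."""
    k = len(p_values)
    results = []
    still_rejecting = True
    for rank, p in enumerate(sorted(p_values), start=1):
        # Critical level depends on alpha, the rank, and the number of comparisons.
        critical = 1.0 - (1.0 - alpha) ** (1.0 / (k - rank + 1))
        # Step-down rule: a failure stops all later (larger) P values.
        still_rejecting = still_rejecting and p < critical
        results.append((p, critical, still_rejecting))
    return results


# Hypothetical unadjusted P values from three pairwise comparisons.
for p, critical, significant in holm_sidak([0.02, 0.001, 0.04]):
    print(f"P = {p:.3f}, critical level = {critical:.4f}, significant = {significant}")
```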
Tukey Test The Tukey Test and the Student-Newman-Keuls test are conducted similarly
to the Bonferroni t-test, except that they use a table of critical values that is
computed based on a better mathematical model of the probability structure
of the multiple comparisons. The Tukey Test is more conservative than the
Student-Newman-Keuls test, because it controls the errors of all comparisons
simultaneously, while the Student-Newman-Keuls test controls errors among
tests of k means. Because it is more conservative, it is less likely to determine
that a given difference is statistically significant, and it is the recommended test
for all pairwise comparisons.
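For readers who want to reproduce an all pairwise Tukey comparison outside SigmaStat, the following sketch may help. It assumes Python with SciPy 1.8 or later, which provides `scipy.stats.tukey_hsd`; the group names and data are hypothetical, and the results are not guaranteed to match SigmaStat's report digit for digit.

```python
from scipy import stats

# Hypothetical data for three treatment groups.
control = [4.1, 3.9, 4.4, 4.0, 4.3]
low_dose = [5.0, 5.3, 4.8, 5.1, 4.9]
high_dose = [6.2, 5.9, 6.4, 6.0, 6.1]

# Tukey test on all pairs; pvalue[i, j] is the P value for groups i and j.
result = stats.tukey_hsd(control, low_dose, high_dose)

labels = ["control", "low_dose", "high_dose"]
for i in range(len(labels)):
    for j in range(i + 1, len(labels)):
        print(f"{labels[i]} vs {labels[j]}: P = {result.pvalue[i, j]:.4g}")
```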
Student-Newman- The Student-Newman-Keuls Test and the Tukey Test are conducted similarly
Keuls (SNK) Test to the Bonferroni t-test, except that they use a table of critical values that is
computed based on a better mathematical model of the probability structure
of the multiple comparisons. The Student-Newman-Keuls Test is less
conservative than the Tukey Test because it controls errors among tests of k
means, while the Tukey Test controls the errors of all comparisons
simultaneously. Because it is less conservative, it is more likely to determine
that a given difference is statistically significant. The Student-Newman-Keuls
Test is usually more sensitive than the Bonferroni t-test, and is only available
for all pairwise comparisons.
Bonferroni t-test The Bonferroni t-test performs pairwise comparisons with t-tests. The
P values are then multiplied by the number of comparisons that were made.
It can perform both all pairwise comparisons and multiple comparisons vs. a
control, and is the most conservative test for both comparison types. For
less conservative all pairwise comparison tests, see the Tukey and the Student-
Newman-Keuls tests, and for a less conservative multiple comparison vs. a
control test, see Dunnett’s Test.
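The Bonferroni adjustment itself is simple arithmetic: each P value is multiplied by the number of comparisons. The sketch below assumes Python with SciPy and uses ordinary two-sample t-tests for each pair, which is a simplification; SigmaStat may compute the t values differently (for example, from the ANOVA error term). The data and names are hypothetical.

```python
from itertools import combinations

from scipy import stats

# Hypothetical data for three treatment groups.
groups = {
    "control": [4.1, 3.9, 4.4, 4.0, 4.3],
    "low_dose": [5.0, 5.3, 4.8, 5.1, 4.9],
    "high_dose": [6.2, 5.9, 6.4, 6.0, 6.1],
}

pairs = list(combinations(groups, 2))  # all pairwise comparisons
adjusted = {}
for a, b in pairs:
    _, p = stats.ttest_ind(groups[a], groups[b])
    # Multiply by the number of comparisons, capping the result at 1.
    adjusted[(a, b)] = min(1.0, p * len(pairs))

for (a, b), p_adj in adjusted.items():
    print(f"{a} vs {b}: adjusted P = {p_adj:.4g}")
```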
Fisher’s Least The Fisher’s LSD Test is the least conservative all pairwise comparison test.
Significant Unlike the Tukey and the Student-Newman-Keuls tests, it makes no effort to
Difference Test control the error rate. Because it makes no attempt to control the error
rate when detecting differences between groups, it is not recommended.
Dunnett’s Test Dunnett's test is the analog of the Student-Newman-Keuls Test for the case of
multiple comparisons against a single control group. It is conducted similarly
to the Bonferroni t-test, but with a more sophisticated mathematical model
of the way the error accumulates in order to derive the associated table of
critical values for hypothesis testing. This test is less conservative than the
Bonferroni Test, and is only available for multiple comparisons vs. a control.
Duncan’s Duncan’s Test operates the same way as the Tukey and the Student-Newman-
Multiple Range Keuls tests, except that it is less conservative in determining whether the
difference between groups is significant by allowing a wider range for error
rates. Although it has a greater power to detect differences than the Tukey
and the Student-Newman-Keuls tests, it has less control over the Type I error
rate, and is, therefore, not recommended.
Performing a The multiple comparison test you choose depends on the treatments you are
Multiple Comparison testing. Click Cancel if you do not want to perform a multiple comparison
test.
8 The All Levels option under the Select Factors to Compare heading
determines whether or not multiple comparisons are performed. This
option is automatically selected if the P value produced by the ANOVA
(displayed in the upper left corner of the dialog box) is less than or
equal to the P value set in the Options dialog box, and multiple
comparisons are performed. If the P value displayed in the dialog box is
greater than the P value set in the Options dialog box, the All Factors
option is not selected and multiple comparisons are not performed.
You can disable multiple comparison testing for the groups by clicking
the selected All Factors check box.
9 Select the desired multiple comparison test from the Suggested Test
drop-down list. The Tukey and Student-Newman-Keuls tests are
recommended for determining the difference among all treatments. If
you have only a few treatments, you may want to select the simpler
Bonferroni t-test.
FIGURE 8–26
The Multiple Comparison
Options Dialog Box
! Note that in both cases the Bonferroni t-test is most sensitive with a small
number of groups. Dunnett’s test is not available if you have fewer than
six observations.
The One Way ANOVA report displays an ANOVA table describing the
source of the variation in the groups. This table displays the sum of squares,
degrees of freedom, and mean squares of the groups, as well as the F statistic
and the corresponding P value. The statistical summary table of the data and
other results displayed in the report are enabled and disabled in the Options
for One Way ANOVA dialog box (see Setting One Way ANOVA Options on
page 232).
FIGURE 8–27
The Multiple Comparison
Options Dialog Box
Prompting You to
Select a Control Group
For descriptions of the derivations for One Way ANOVA results, you can
reference any appropriate statistics reference. For a list of suggested references,
see References on page 12.
! The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the and
buttons in the formatting toolbar to move one page up and down in the
report.
Result Explanations In addition to the numerical results, expanded explanations of the results may
also appear. To turn off this explanatory text, choose the Statistics menu
Report Options... command and click the selected Explain Test Results check
box.
The number of decimal places displayed is also set in the Report Options
dialog box. For information on editing reports, see Chapter 6, Working with
Reports.
Normality Test Normality test results display whether the data passed or failed the test of the
assumption that they were drawn from a normal population and the P value
calculated by the test. Normally distributed source populations are required
for all parametric tests. This result is set in the Options for One Way
ANOVA dialog box (see page 233).
Equal Variance Test Equal Variance test results display whether or not the data passed or failed the
test of the assumption that the samples were drawn from populations with
the same variance, and the P value calculated by the test. Equal variance of
the source populations is assumed for all parametric tests.
FIGURE 8–28
The One Way ANOVA
Results Report
Summary Table If you enabled this option in the Options for One Way ANOVA dialog box,
SigmaStat generates a summary table listing the sample sizes N, number of
missing values, mean, standard deviation, differences of the means and
standard deviations, and standard error of the means.
Mean The average value for the column. If the observations are normally
distributed the mean is the center of the distribution.
Confidence Interval If the confidence interval does not include zero, you can conclude that there
for the Difference is a significant difference between the means with the level of
of the Means confidence specified. This can also be described as P < α (alpha), where α is
the acceptable probability of incorrectly concluding that there is a difference.
The level of confidence is adjusted in the options dialog box; this is typically
100(1 − α)%, or 95%. Larger values of confidence result in wider intervals and
smaller values in narrower intervals. For a further explanation of α, see the
following section, Power.
Power The power of the performed test is displayed unless you disable this option in
the Options for One Way ANOVA dialog box.
The power, or sensitivity, of a One Way ANOVA is the probability that the
test will detect a difference among the groups if there really is a difference.
The closer the power is to 1, the more sensitive the test.
ANOVA power is affected by the sample sizes, the number of groups being
compared, the chance of erroneously reporting a difference α (alpha), the
observed differences of the group means, and the observed standard
deviations of the samples.
The α value is set in the Options for One Way ANOVA dialog box; the
suggested value is α = 0.05 which indicates that a one in twenty chance of
error is acceptable. Smaller values of α result in stricter requirements before
concluding there is a significant difference, but a greater possibility of
concluding there is no difference when one exists (a Type II error). Larger
values of α make it easier to conclude that there is a difference but also
increase the risk of seeing a false difference (a Type I error).
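The way power depends on sample size, number of groups, α, the differences of the group means, and the standard deviation can be explored with a short calculation. This is a sketch under stated assumptions: it uses Python with SciPy, equal group sizes, a common standard deviation, and the noncentral F distribution. The function name `anova_power` and its parameters are hypothetical, and SigmaStat's own power computation may differ.

```python
from scipy import stats


def anova_power(group_means, sd, n_per_group, alpha=0.05):
    """Approximate power of a One Way ANOVA with equal group sizes."""
    k = len(group_means)
    grand_mean = sum(group_means) / k
    # Noncentrality parameter: n * sum((mean_i - grand_mean)^2) / sd^2.
    noncentrality = (
        n_per_group * sum((m - grand_mean) ** 2 for m in group_means) / sd ** 2
    )
    df_between = k - 1
    df_within = k * (n_per_group - 1)
    # Critical F value at the chosen alpha under the null hypothesis.
    f_critical = stats.f.ppf(1.0 - alpha, df_between, df_within)
    # Power: probability that F exceeds the critical value when means differ.
    return float(stats.ncf.sf(f_critical, df_between, df_within, noncentrality))


print(f"n = 5:  power = {anova_power([4.0, 5.0, 6.0], 1.0, 5):.3f}")
print(f"n = 10: power = {anova_power([4.0, 5.0, 6.0], 1.0, 10):.3f}")
```

Increasing the sample size or α raises the power; a larger standard deviation lowers it.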
ANOVA Table The ANOVA table lists the results of the one way ANOVA.
➤ The sum of squares between the groups measures the variability of the
average differences of the sample groups.
➤ The sum of squares within the groups (also called error or residual sum of
squares) measures the underlying variability of all individual samples.
➤ The total sum of squares measures the total variability of the observations
about the grand mean (mean of all observations).
The mean square within groups (also called the residual or error mean square)
is the sum of squares within the groups divided by the degrees of freedom
within the groups.
If the F ratio is around 1, you can conclude that there are no significant
differences between groups (i.e., the data groups are consistent with the null
hypothesis that all the samples were drawn from the same population).
If F is a large number, you can conclude that at least one of the samples was
drawn from a different population (i.e., the variability is larger than what is
expected from random variability in the population). To determine exactly
which groups are different, examine the multiple comparison results.
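The quantities in the ANOVA table can be computed by hand to see how they fit together. The following is a minimal sketch assuming Python with SciPy (used only for the F distribution); the data values and variable names are hypothetical.

```python
from scipy import stats

# Hypothetical data for three groups.
groups = [
    [4.1, 3.9, 4.4, 4.0, 4.3],
    [5.0, 5.3, 4.8, 5.1, 4.9],
    [6.2, 5.9, 6.4, 6.0, 6.1],
]
all_values = [x for g in groups for x in g]
grand_mean = sum(all_values) / len(all_values)
group_means = [sum(g) / len(g) for g in groups]

# Sums of squares: between groups, within groups (residual), and total.
ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

# Degrees of freedom.
df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# Mean squares, F ratio, and the corresponding P value.
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within
p_value = stats.f.sf(f_ratio, df_between, df_within)

print(f"Between: SS = {ss_between:.4f}, DF = {df_between}, MS = {ms_between:.4f}")
print(f"Within:  SS = {ss_within:.4f}, DF = {df_within}, MS = {ms_within:.4f}")
print(f"Total:   SS = {ss_total:.4f}")
print(f"F = {f_ratio:.3f}, P = {p_value:.3g}")
```

The between and within sums of squares add up to the total sum of squares, and the F ratio is the between-groups mean square divided by the within-groups mean square.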
Multiple If you selected to perform multiple comparisons (see page 236), a table of the
Comparisons comparisons between group pairs is displayed. The multiple comparison
procedure is activated in the Options for One Way ANOVA dialog box (see
page 236). The test used in the multiple comparison procedure is selected in
the Multiple Comparison Options dialog box (see page 239).
Bonferroni t-test Results The Bonferroni t-test lists the differences of the
means for each pair of groups, computes the t values for each pair, and
displays whether or not P < 0.05 for that comparison. The Bonferroni t-test
can be used to compare all groups or to compare versus a control.
You can conclude from “large” values of t that the difference of the two
groups being compared is statistically significant.
If the P value for the comparison is less than 0.05, the likelihood of
erroneously concluding that there is a significant difference is less than 5%. If
it is greater than 0.05, you cannot confidently conclude that there is a
difference.
The difference of the means is a gauge of the size of the difference between
the two groups.
Dunnett's test only compares a control group to all other groups. All tests
compute the q test statistic and display whether or not P < 0.05 for that pair
comparison.
You can conclude from “large” values of q that the difference of the two
groups being compared is statistically significant.
If the P value for the comparison is less than 0.05, the likelihood of being
incorrect in concluding that there is a significant difference is less than 5%. If
it is greater than 0.05, you cannot confidently conclude that there is a
difference.
The Difference of the Means is a gauge of the size of the difference between
the two groups.
p is a parameter used when computing q. The larger the p, the larger q needs
to be to indicate a significant difference. p is an indication of the differences
in the ranks of the group means being compared. Group means are ranked
in order from largest to smallest, and p is the number of means spanned in
the comparison. For example, if you are comparing four means, when
comparing the largest to the smallest p = 4, and when comparing the second
smallest to the smallest p = 2.
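The definition of p as the number of means spanned can be expressed as a small calculation. This is an illustrative sketch in plain Python; the function name `means_spanned` and the sample means are hypothetical.

```python
def means_spanned(group_means, i, j):
    """Number of ranked means spanned when comparing groups i and j."""
    # Rank the group means from largest to smallest.
    order = sorted(range(len(group_means)),
                   key=lambda idx: group_means[idx], reverse=True)
    rank = {idx: position for position, idx in enumerate(order)}
    # The span counts both endpoints, so adjacent means give p = 2.
    return abs(rank[i] - rank[j]) + 1


means = [6.1, 4.2, 5.0, 3.7]  # four hypothetical group means
print(means_spanned(means, 0, 3))  # largest vs. smallest
print(means_spanned(means, 1, 3))  # second smallest vs. smallest
```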
You can generate up to five graphs using the results from a One Way
ANOVA. They include a:
Bar Chart The One Way ANOVA bar chart plots the group means as vertical bars with
error bars indicating the standard deviation. If the graph data is indexed, the
levels in the factor column are used as the tick marks for the bar chart bars,
and the column titles are used as the X and Y axis titles. If the graph data is in
raw or statistical format, the column titles are used as the tick marks for the
bar chart bars and default X Data and Y Data axis titles are assigned to the
graph. For an example of a bar chart, see Bar Charts of the Column Means on
page 149.
Scatter Plot The One Way ANOVA scatter plot graphs the group means as single points
with error bars indicating the standard deviation. If the graph data is indexed,
the levels in the factor column are used as the tick marks for the scatter plot
points, and the column titles are used as the X and Y axis titles. If the graph
data is in raw or statistical format, the column titles are used as the tick marks
for the scatter plot points and default X Data and Y Data axis titles are
assigned to the graph. For an example of a scatter plot, see page 150.
Histogram of The One Way ANOVA histogram plots the raw residuals in a specified range,
Residuals using a defined interval set. The residuals are divided into a number of evenly
incremented histogram intervals and plotted as histogram bars indicating the
number of residuals in each interval. The X axis represents the histogram
intervals, and the Y axis represents the number of residuals in each group. For
an example of a histogram, see page 153.
Probability Plot The One Way ANOVA probability plot graphs the frequency of the raw
residuals. The residuals are sorted and then plotted as points around a curve
representing the area of the Gaussian plotted on a probability axis. Plots with
residuals that fall along the Gaussian curve indicate that your data was taken from
a normally distributed population. The X axis is a linear scale representing
the residual values. The Y axis is a probability scale representing the
cumulative frequency of the residuals. For an example of a probability plot, see
page 155.
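The coordinates behind a probability plot of residuals can be sketched as follows. This assumes Python with SciPy, defines each raw residual as the observation minus its group mean, and uses the plotting position (i − 0.5)/n for the cumulative frequency. That plotting position is a common convention chosen here as an assumption; SigmaStat may use a different one. The data are hypothetical.

```python
from scipy import stats

# Hypothetical data for three groups.
groups = [
    [4.1, 3.9, 4.4, 4.0, 4.3],
    [5.0, 5.3, 4.8, 5.1, 4.9],
    [6.2, 5.9, 6.4, 6.0, 6.1],
]

# Raw residuals: each value minus its group mean, sorted for plotting.
residuals = sorted(x - sum(g) / len(g) for g in groups for x in g)
n = len(residuals)

# Cumulative frequency of each sorted residual (assumed plotting position).
cumulative_frequency = [(i - 0.5) / n for i in range(1, n + 1)]

# Position on a probability (normal quantile) axis for each frequency.
normal_scores = [float(stats.norm.ppf(f)) for f in cumulative_frequency]

for r, f, z in zip(residuals, cumulative_frequency, normal_scores):
    print(f"residual = {r:+.3f}, cumulative frequency = {f:.3f}, score = {z:+.3f}")
```

Residuals from a normally distributed population fall close to a straight line when plotted against the normal scores.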
Multiple The One Way ANOVA multiple comparison graphs plot significant
Comparison Graphs differences between levels of a significant factor. There is one graph for every
significant factor reported by the specified multiple comparison test. If there
is one significant factor reported, one graph appears; if there are two
significant factors, two graphs appear, and so on. If a factor is not reported as
significant, a graph for the factor does not appear. For an example of a
multiple comparison graph, see page 160.
Graph dialog appears displaying the types of graphs available for the
One Way ANOVA report.
FIGURE 8–29
The Create Graph Dialog
Box for a One Way ANOVA
Report
2 Select the type of graph you want to create from the Graph Type list,
then click OK, or double-click the desired graph in the list. For more
information on each of the graph types, see Chapter 8. The specified
graph appears in a graph window or in the report.
FIGURE 8–30
A Normal Probability Plot
for a One Way ANOVA
About the In a two way or two factor analysis of variance, there are two
Two Way ANOVA experimental factors which are varied for each experimental group. A
two factor design is used to test for differences between samples grouped
according to the levels of each factor and for interactions between the
factors.
Two Way ANOVA is a parametric test that assumes that all the samples
were drawn from normally distributed populations with the same
variances.
2 If desired, set the Two Way ANOVA options using the Options for
Two Way ANOVA dialog box (page 258).
4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 99).
6 View and interpret the Two Way ANOVA report and generate the
report graph (pages 271 and 279).
The Two Way ANOVA tests for differences between samples grouped
according to the levels of each factor and the interactions between the
factors.
If your data is missing data points or even whole cells, SigmaStat detects
this and provides the correct solutions; see the following section, Missing
Data and Empty Cells.
For more information on arranging data, see Data Format for Group
Comparison Tests on page 204, or Arranging Data for Contingency
Tables on page 69.
Missing Data and Empty Ideally, the data for a Two Way ANOVA should be completely balanced,
Cells Data i.e., each group or cell in the experiment has the same number of
observations and there are no missing data. However, SigmaStat
properly handles all occurrences of missing and unbalanced data
automatically.
TABLE 3-2
Data for a Two Way ANOVA with a Missing Value in the Male/Drug A Cell
A general linear model approach is used in these situations.

Gender    Drug
          Drug A    Drug B
Male      3.8       1.5
          --        1.8
          3.5       2.2
Female    5.1       5.9
          4.9       6.1
          5.5       6.6
Empty Cells When there is an empty cell, i.e., there are no observations
for a combination of two factor levels, SigmaStat stops and suggests
either analysis of the data using a two way design with the added
If you treat the problem as a One Way ANOVA, each cell in the table is
treated as a different level of a single experimental factor. This approach
is the most conservative analysis because it requires no additional
assumptions about the nature of the data or experimental design.
TABLE 3-5
Example of Connected Data that You Can’t Draw a Series of Straight
Vertical and Horizontal Lines Through
1.2 8.3
2.4 6.2
5.8 1.0
4.8 .98
Entering A Two Way ANOVA can only be performed on two factor indexed data.
Worksheet Data Two factor indexed data is placed in three columns; a data point indexed
FIGURE 9–31
Valid Data Formats
for a Two Way ANOVA
Column 1 is the first factor
index, column 2 is the
second factor index, and
column 3 is the data. The
data for this worksheet is
taken from Table 3-2.
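The three-column layout for two factor indexed data can be illustrated with the values from Table 3-2. This is a sketch in plain Python; omitting the row for the missing Male/Drug A observation is an assumption made for illustration, since the worksheet may instead keep the row with a missing-value symbol. The variable names are hypothetical.

```python
# Values from Table 3-2; None marks the missing Male/Drug A observation.
table_3_2 = {
    ("Male", "Drug A"): [3.8, None, 3.5],
    ("Male", "Drug B"): [1.5, 1.8, 2.2],
    ("Female", "Drug A"): [5.1, 4.9, 5.5],
    ("Female", "Drug B"): [5.9, 6.1, 6.6],
}

# Two factor indexed rows: first factor index, second factor index, data.
indexed_rows = [
    (gender, drug, value)
    for (gender, drug), values in table_3_2.items()
    for value in values
    if value is not None
]

# Count the observations per cell to see whether the design is balanced.
cell_sizes = {
    cell: sum(value is not None for value in values)
    for cell, values in table_3_2.items()
}
balanced = len(set(cell_sizes.values())) == 1

for row in indexed_rows:
    print(*row)
print("Balanced design:", balanced)
```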
1 If you are going to run the test after changing test options and
want to select your data before you run the test, drag the pointer
over the data.
2 To open the Options for Two Way ANOVA dialog box, select Two
Way ANOVA from the toolbar drop-down list, then click the
button, or choose the Statistics menu Current Test Options...
5 To continue the test, click Run Test. The Pick Columns dialog
box appears (see Picking Data to Test on page 99 for more
information).
6 To accept the current settings and close the options dialog box,
click OK. To accept the current settings without closing the
options dialog box, click Apply. To close the dialog box without
changing any settings or running the test, click Cancel.
! You can click Help at any time to access SigmaStat’s on-line help
system.
Normality and Select the Assumption Checking tab from the options dialog box to view
Equal Variance the Normality and Equal Variance options. The normality assumption
Assumptions test checks for a normally distributed population. The equal variance
assumption test checks the variability about the group means.
FIGURE 9–32
The Options for Two
Way ANOVA Dialog Box
Displaying the Assumption
Checking Options
Summary Table Select the Results tab in the options dialog box to view the Summary
Table option. The Summary Table option displays the number of
observations for a column or group, the number of missing values for a
column or group, the average value for the column or group, the
standard deviation of the column or group, and the standard error of the
mean for the column or group.
Confidence Intervals Select the Results tab in the options dialog box to view the Confidence
Intervals option. The Confidence Intervals option displays the
confidence interval for the difference of the means. To change the
interval, enter any number from 1 to 99 (95 and 99 are the most
commonly used intervals). Click the selected check box if you do not
want to include the confidence interval in the report.
Residuals Select the Results tab in the options dialog box to view the Residuals
option. Use the Residuals option to display residuals in the report and to
save the residuals of the test to the specified worksheet column. To
change the column the residuals are saved to, edit the number in or
select a number from the drop-down list.
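As a rough illustration of what a residual column contains, the sketch below assumes the raw residual is an observation minus the mean of its cell (the usual definition when the model includes the interaction). The function name `cell_residuals` and the sample values are hypothetical; the manual does not state SigmaStat's exact computation.

```python
from collections import defaultdict
from statistics import mean

def cell_residuals(rows):
    """rows: list of (factor_a_level, factor_b_level, value) tuples."""
    cells = defaultdict(list)
    for a, b, value in rows:
        cells[(a, b)].append(value)
    cell_means = {key: mean(values) for key, values in cells.items()}
    # One residual per observation, in the original row order.
    return [value - cell_means[(a, b)] for a, b, value in rows]

# Hypothetical indexed data: gender, drug, observed reaction.
rows = [
    ("Male", "Drug A", 4.0), ("Male", "Drug A", 6.0),
    ("Male", "Drug B", 7.0), ("Male", "Drug B", 9.0),
    ("Female", "Drug A", 5.0), ("Female", "Drug A", 5.0),
    ("Female", "Drug B", 10.0), ("Female", "Drug B", 12.0),
]
residuals = cell_residuals(rows)
```

With this definition the residuals within each cell sum to zero.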
Power
Select the Post Hoc Tests tab in the options dialog box to view the Power
option. The power or sensitivity of a test is the probability that the test
will detect a difference between the groups if there is really a difference.
Change the alpha value by editing the number in the Alpha Value box.
Alpha (α) is the acceptable probability of incorrectly concluding that
there is a difference. The suggested value is α = 0.05. This indicates
that a one in twenty chance of error is acceptable, or that you are willing
to conclude there is a significant difference when P < 0.05.
FIGURE 9–34
The Options for Two
Way ANOVA Dialog Box
Displaying the
Power and Multiple
Comparisons Options
To run a Two Way ANOVA you need to select the data to test. The Pick
Columns dialog box is used to select the worksheet columns with the
data you want to test.
1 If you want to select your data before you run the test, drag the
pointer over your data.
2 Open the Pick Columns for Two Way ANOVA dialog box to start
the Two Way ANOVA. You can either
➤ Select Two Way ANOVA from the toolbar drop-down list, then
click the button.
➤ Choose the Statistics menu Compare Many Groups, Two Way
ANOVA... command.
➤ Click the Run Test button from the Options for Two Way
ANOVA dialog box.
The first selected column is assigned to the first row in the Selected
Columns list, and all successively selected columns are assigned to
successive rows in the list. The number or title of selected columns
FIGURE 9–35
The Pick Columns
for Two Way ANOVA Dialog Box
Prompting You to
Select Data Columns
5 Click Finish to perform the Two Way ANOVA. The Two Way
ANOVA report appears (see page 272) if:
➤ You elected to test for normality and equal variance, and your data
passes both tests.
➤ Your data has no missing data points or cells, and is not otherwise
unbalanced.
➤ You elected not to perform multiple comparisons, or you selected to
run multiple comparisons only when the P value is significant,
and the P value is not significant (see page 260).
6 If you elected to test for normality and equal variance, and your
data fails either test, either continue or transform your data, then
perform the Two Way ANOVA on the transformed data. For
information on how to transform data, see Chapter 14, Using
Transforms.
For more information on missing data point and cell handling, see
If There Were Missing Data Cells on page 272.
This dialog box displays the P values for each of the two experimental
factors and of the interaction between the two factors. Only the options
with P values less than or equal to the value set in the Options dialog
box are selected. You can disable multiple comparison testing for a factor
by clicking the selected option. If no factor is selected, multiple
comparison results are not reported.
There are seven multiple comparison tests to choose from for the Two
Way ANOVA. You can choose to perform the:
There are two types of multiple comparison available for the Two Way
ANOVA. The types of comparison you can make depends on the
selected multiple comparison test.
When comparing the two factors separately, the levels within one factor
are compared among themselves without regard to the second factor,
and vice versa. These results should be used when the interaction is not
statistically significant.
Holm-Sidak Test
The Holm-Sidak Test can be used for both pairwise comparisons and
comparisons versus a control group. It is more powerful than the Tukey
and Bonferroni tests and, consequently, it is able to detect differences
that these other tests do not. It is recommended as the first-line
procedure for pairwise comparison testing.
Tukey Test
The Tukey Test and the Student-Newman-Keuls test are conducted
similarly to the Bonferroni t-test, except that each uses a table of critical
values that is computed based on a better mathematical model of the
probability structure of the multiple comparisons. The Tukey Test is
more conservative than the Student-Newman-Keuls test, because it
controls the errors of all comparisons simultaneously, while the
Student-Newman-Keuls test controls errors among tests of k means. Because
it is more conservative, it is less likely to determine that a given
difference is statistically significant, and it is the recommended test for
all pairwise comparisons.
Student-Newman-Keuls (SNK) Test
The Student-Newman-Keuls Test and the Tukey Test are conducted
similarly to the Bonferroni t-test, except that each uses a table of critical
values that is computed based on a better mathematical model of the
probability structure of the multiple comparisons. The
Student-Newman-Keuls Test is less conservative than the Tukey Test because it
controls errors among tests of k means, while the Tukey Test controls the
errors of all comparisons simultaneously. Because it is less conservative,
it is more likely to determine that a given difference is statistically
significant. The Student-Newman-Keuls Test is usually more sensitive
than the Bonferroni t-test, and is only available for all pairwise
comparisons.
Bonferroni t-test
The Bonferroni t-test performs pairwise comparisons with paired t-tests.
The P values are then multiplied by the number of comparisons that
were made. It can perform both all pairwise comparisons and multiple
comparisons vs. a control, and is the most conservative test for both
comparison types. For less conservative all pairwise comparison tests, see
the Tukey and the Student-Newman-Keuls tests, and for a less
conservative multiple comparison vs. a control test, see Dunnett’s
Test.
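The Bonferroni adjustment described above (multiplying each P value by the number of comparisons) can be sketched as follows. The function name `bonferroni_adjust`, the sample P values, and the cap at 1.0 are assumptions of this sketch rather than documented SigmaStat behavior.

```python
def bonferroni_adjust(p_values):
    """Multiply each unadjusted P value by the number of comparisons.

    The result is capped at 1.0 so it remains a valid probability
    (the cap is an assumption of this sketch).
    """
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

unadjusted = [0.004, 0.020, 0.300]        # hypothetical pairwise P values
adjusted = bonferroni_adjust(unadjusted)
significant = [p < 0.05 for p in adjusted]
```

With three comparisons, only the first comparison remains significant at 0.05 after adjustment, which illustrates why the test is described as conservative.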
Dunnett’s Test
Dunnett's test is the analog of the Student-Newman-Keuls Test for the
case of multiple comparisons against a single control group. It is
conducted similarly to the Bonferroni t-test, but with a more
sophisticated mathematical model of the way the error accumulates in
order to derive the associated table of critical values for hypothesis
testing. This test is less conservative than the Bonferroni Test, and is
only available for multiple comparisons vs. a control.
Duncan’s Multiple Range
The Duncan’s Test is conducted the same way as the Tukey and the
Student-Newman-Keuls tests, except that it is less conservative in determining
whether the difference between groups is significant by allowing a wider
range for error rates. Although it has a greater power to detect
differences than the Tukey and the Student-Newman-Keuls tests, it has
less control over the Type I error rate, and is, therefore, not
recommended.
Performing a Multiple Comparison
The multiple comparison you choose to perform depends on the
treatments you are testing. Click Cancel if you do not want to perform
a multiple comparison procedure.
! Note that in both cases the Bonferroni t-test is most sensitive with a
small number of groups. Dunnett’s test is not available if you have
fewer than six observations.
FIGURE 9–36
The Multiple Comparison
Options Dialog Box for
a Two Way ANOVA
FIGURE 9–37
The Multiple Comparison
Options Dialog Box
Prompting You to
Select Control Groups
Summary tables of least square means for each factor and for both
factors together can also be generated. This result and additional results
are enabled in the Options for Two Way ANOVA dialog box (see Setting
Two Way ANOVA Options on page 258). Click a selected check box to
enable or disable a test option. All options are saved between SigmaStat
sessions.
For descriptions of the derivations for Two Way ANOVA results, you
can reference any appropriate statistics reference. For a list of suggested
references, see page 12.
! The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the page up and
page down buttons in the formatting toolbar.
Result Explanations
In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options... command and click the selected Explain Test
Results check box.
FIGURE 9–38
Two Way ANOVA
Report
If There Were Missing Data Cells
If your data contained missing values but no empty cells, the report
indicates the results were computed using a general linear model.
Dependent Variable
This is the data column title of the indexed worksheet data you are
analyzing with the Two Way ANOVA. Determining if the values in this
column are affected by the different factor levels is the objective of the
Two Way ANOVA.
Normality Test
Normality test results display whether the data passed or failed the test of
the assumption that they were drawn from a normal population and the
P value calculated by the test. Normally distributed source populations
are required for all parametric tests.
This result appears if you enabled normality testing in the Two Way
ANOVA Options dialog box (see page 260).
Equal Variance Test
Equal Variance test results display whether or not the data passed or
failed the test of the assumption that the samples were drawn from
populations with the same variance and the P value calculated by the
test. Equal variance of the source population is assumed for all
parametric tests.
This result appears if you enabled equal variance testing in the Two Way
ANOVA Options dialog box (see page 260).
ANOVA Table
The ANOVA table lists the results of the Two Way ANOVA.
! When there are missing data, the best estimate of these values is
automatically calculated using a general linear model.
If the F ratio is around 1, you can conclude that there are no significant
differences between factor levels or that there is no interaction between
factors (i.e., the data groups are consistent with the null hypothesis that
all the samples were drawn from the same population).
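To make the F ratio concrete, the sketch below computes the three F ratios of a balanced two factor design with no missing data. Each F is the mean square for a factor (or for the interaction) divided by the within-cell mean square. The function name `two_way_f_ratios` and the data are hypothetical; the sketch does not reproduce SigmaStat's general linear model for unbalanced data and does not compute P values.

```python
from statistics import mean

def two_way_f_ratios(cells):
    """Return F ratios for a balanced two factor design.

    cells maps (level_a, level_b) -> list of observations; every cell must
    hold the same number (at least two) of observations.
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))              # replicates per cell
    complete = len(cells) == len(levels_a) * len(levels_b)
    if n < 2 or not complete or any(len(v) != n for v in cells.values()):
        raise ValueError("this sketch handles balanced designs only")

    grand = mean(x for values in cells.values() for x in values)
    mean_a = {a: mean(x for b in levels_b for x in cells[(a, b)]) for a in levels_a}
    mean_b = {b: mean(x for a in levels_a for x in cells[(a, b)]) for b in levels_b}

    ss_a = len(levels_b) * n * sum((m - grand) ** 2 for m in mean_a.values())
    ss_b = len(levels_a) * n * sum((m - grand) ** 2 for m in mean_b.values())
    ss_cells = n * sum((mean(v) - grand) ** 2 for v in cells.values())
    ss_ab = ss_cells - ss_a - ss_b                   # interaction sum of squares
    ss_within = sum((x - mean(v)) ** 2 for v in cells.values() for x in v)

    df_a, df_b = len(levels_a) - 1, len(levels_b) - 1
    df_within = len(cells) * (n - 1)
    ms_within = ss_within / df_within                # residual mean square
    return {
        "A": (ss_a / df_a) / ms_within,
        "B": (ss_b / df_b) / ms_within,
        "A x B": (ss_ab / (df_a * df_b)) / ms_within,
    }

# Hypothetical data: factor A is gender, factor B is drug.
cells = {
    ("Male", "Drug A"): [4.0, 6.0],
    ("Male", "Drug B"): [7.0, 9.0],
    ("Female", "Drug A"): [5.0, 5.0],
    ("Female", "Drug B"): [10.0, 12.0],
}
f_ratios = two_way_f_ratios(cells)
```

For this data the drug factor gives F = 27, while gender and the interaction each give F = 3. Whether any of these is significant depends on the corresponding P value, which requires the F distribution and is not computed here.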
If F is a large number, you can conclude that at least one of the samples
for that factor or combination of factors was drawn from a different
population (i.e., the variability is larger than what is expected from
random variability in the population). To determine exactly which
Power
The power, or sensitivity, of a Two Way ANOVA is the probability that
the test will detect the observed difference among the groups if there
really is a difference. The closer the power is to 1, the more sensitive the
test. The power for the comparison of the groups within the two factors
and the power for the comparison of the interactions are all displayed.
These results are set in the Options for Two Way ANOVA dialog box.
The α value is set in the Options for Two Way ANOVA dialog box; the
suggested value is α = 0.05 which indicates that a one in twenty chance
of error is acceptable. Smaller values of α result in stricter requirements
before concluding there is a significant difference, but a greater
possibility of concluding there is no difference when one exists (a Type
II error). Larger values of α make it easier to conclude that there is a
difference, but also increase the risk of seeing a false difference (a Type I
error).
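The trade-off between α and power can be illustrated with a much simpler test than the Two Way ANOVA. The sketch below uses a one-sided z test under a normal approximation; it is a stand-in chosen for brevity, not the F-based power calculation SigmaStat performs, and the function name and input values are hypothetical.

```python
from statistics import NormalDist

def z_test_power(effect, std_error, alpha):
    """Approximate power of a one-sided z test (normal approximation)."""
    z = NormalDist()
    critical = z.inv_cdf(1.0 - alpha)       # stricter alpha -> larger cutoff
    return 1.0 - z.cdf(critical - effect / std_error)

# Same true effect, two choices of alpha.
power_05 = z_test_power(effect=2.0, std_error=1.0, alpha=0.05)
power_01 = z_test_power(effect=2.0, std_error=1.0, alpha=0.01)
```

Lowering α from 0.05 to 0.01 lowers the power for the same effect, so the chance of a Type II error (1 minus the power) rises.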
Summary Table
The least square means and standard error of the means are displayed for
each factor separately (summary table row and column), and for each
combination of factors (summary table cells). If there are missing values,
the least square means are estimated using a general linear model.
When there are no missing data, the least square means equal the cell
and marginal (row and column) means. When there are missing data,
the least square means provide the best estimate of these values, using a
general linear model. These means and standard errors are used when
performing multiple comparisons (see following section).
Multiple Comparisons
If a difference is found among the groups, multiple comparison tables
can be computed. Multiple comparison procedures are activated in the
Options for Two Way ANOVA dialog box (see page 260). The tests
used in the multiple comparisons are set in the Multiple Comparisons
Options dialog box (see page 270).
➤ Groups within each factor without regard to the other factor (this is
a marginal comparison, i.e., only the columns or rows in the table
are compared).
➤ All combinations of factors (all cells in the table are compared with
each other).
You can conclude from “large” values of t that the difference of the two
groups being compared is statistically significant.
If the P value for the comparison is less than 0.05, the likelihood of
erroneously concluding that there is a significant difference is less than
5%. If it is greater than 0.05, you cannot confidently conclude that
there is a difference.
Dunnett's test only compares a control group to all other groups. All
tests compute the q test statistic, and display whether or not P < 0.01 for
that pair comparison.
You can conclude from “large” values of q that the difference of the two
groups being compared is statistically significant.
If the P value for the comparison is less than 0.05, the likelihood of
being incorrect in concluding that there is a significant difference is less
You can generate up to six graphs using the results from a Two Way
ANOVA. They include a:
Histogram of Residuals
The Two Way ANOVA histogram plots the raw residuals in a specified
range, using a defined interval set. The residuals are divided into a
number of evenly incremented histogram intervals and plotted as
Probability Plot
The Two Way ANOVA probability plot graphs the frequency of the raw
residuals. The residuals are sorted and then plotted as points around a
curve representing the area of the Gaussian plotted on a probability axis.
Plots with residuals that fall along the Gaussian curve indicate that your
data was taken from a normally distributed population. The X axis is a
linear scale representing the residual values. The Y axis is a probability
scale representing the cumulative frequency of the residuals. For an
example of a probability plot, see page 155.
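The points of a probability plot can be sketched as follows. The manual does not say which cumulative frequency formula SigmaStat uses, so the plotting position (rank - 0.5) / n below is an assumption, as are the function name and the sample residuals.

```python
from statistics import NormalDist

def probability_plot_points(residuals):
    """Return (residual, cumulative frequency, normal quantile) triples."""
    ordered = sorted(residuals)
    n = len(ordered)
    standard_normal = NormalDist()
    points = []
    for rank, value in enumerate(ordered, start=1):
        cumulative = (rank - 0.5) / n       # assumed plotting position
        points.append((value, cumulative, standard_normal.inv_cdf(cumulative)))
    return points

# Hypothetical raw residuals.
points = probability_plot_points([1.0, -1.0, 0.0, 1.0, -1.0, 0.0, -1.0, 1.0])
```

Plotting the residual on the X axis against the cumulative frequency on a probability-scaled Y axis (equivalently, against the normal quantile on a linear axis) gives a roughly straight line when the residuals are normally distributed.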
Grouped Bar Chart
The Two Way ANOVA grouped bar chart plots the means of column
data as bars with error bars indicating the standard deviation of each
group. The levels in the first factor column are used as the tick marks
for the bar chart bars, and the column titles for the first factor column
and the data column are used as the axis titles. Each bar in the group
represents a different level in the second factor column. For an example
of a grouped bar chart, see page 157.
! If there are no interactions between factors, a graph for the Two Way
ANOVA is not generated. In this case, it is more appropriate to perform two
One Way ANOVAs.
3D Residual Scatter Plot
The Two Way ANOVA 3D residual scatter plot graphs the residuals of
the two columns of independent variable data. The X and the Y axes
represent the independent variables, and the Z axis represents the
residuals. For an example of a 3D residual scatter plot, see page 156.
3D Category Scatter Graph
The Two Way ANOVA 3D Category Scatter plot graphs the two factors
from the independent data columns along the X and Y axes against the
data of the dependent variable column along the Z axis. The tick marks
for the X and Y axes represent the two factors from the independent
variable columns, and the tick marks for the Z axis represent the data
from the dependent variable column. For an example of a 3D category
scatter plot, see page 158.
FIGURE 9–39
The Create Graph Dialog
Box for Two Way ANOVA Report
Graphs
2 Select the type of graph you want to create from the Graph Type
list, then click OK, or double-click the desired graph in the list.
FIGURE 9–40
A Multiple Comparison
for the Two Way ANOVA
➤ You want to see if two or more different experimental groups are affected
by three different factors which may or may not interact.
➤ Samples are drawn from normally distributed populations with equal
variances.
About the Three Way ANOVA
In a three way or three factor analysis of variance, there are three experimental
factors which are varied for each experimental group. A three factor design is
used to test for differences between samples grouped according to the levels
of each factor and for interactions between the factors.
Three Way ANOVA is a parametric test that assumes that all the samples
were drawn from normally distributed populations with the same variances.
3 Select Three Way ANOVA from the toolbar drop-down list, or choose
the Statistics menu Compare Many Groups, Three Way ANOVA...
command.
4 Run the test by selecting the worksheet columns with the data you want
to test using the Pick Columns dialog box (page 99).
6 View and interpret the Three Way ANOVA report and generate report
graphs (pages 9-300 and 9-308).
The Three Way ANOVA tests for differences between samples grouped
according to the levels of each factor and the interactions between the factors.
If your data is missing data points or even whole cells, SigmaStat detects this
and provides the correct solutions; see the following section, Missing Data
and Empty Cells.
Missing Data and Empty Cells
Ideally, the data for a Three Way ANOVA should be completely balanced,
i.e., each group or cell in the experiment has the same number of
observations and there are no missing data. However, SigmaStat properly
handles all occurrences of missing and unbalanced data automatically.
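The distinction between balanced data, unbalanced data, and data with empty cells can be checked with a short script before you run the test. This is a hedged sketch: the function name `classify_design`, the row layout, and the use of `None` for a missing observation are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

def classify_design(rows):
    """Classify three factor indexed data.

    rows: list of (factor1, factor2, factor3, value); a value of None marks
    a missing observation (an assumption of this sketch).
    """
    counts = Counter((a, b, c) for a, b, c, value in rows if value is not None)
    levels = [sorted({row[i] for row in rows}) for i in range(3)]
    sizes = [counts.get(cell, 0) for cell in product(*levels)]
    if min(sizes) == 0:
        return "empty cells"      # some factor level combination has no data
    if len(set(sizes)) > 1:
        return "unbalanced"       # every cell has data, but counts differ
    return "balanced"

# Hypothetical 2 x 2 x 2 design with two observations per cell.
combos = list(product(["Male", "Female"], ["Drug A", "Drug B"], ["Day 1", "Day 2"]))
balanced = [(a, b, c, 1.0) for a, b, c in combos for _ in range(2)]
unbalanced = [balanced[0][:3] + (None,)] + balanced[1:]   # one missing point
with_empty = balanced[2:]                                 # one whole cell missing
```

Calling `classify_design` on the three lists returns "balanced", "unbalanced", and "empty cells" respectively.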
Empty Cells When there is an empty cell, i.e., there are no observations for
a combination of three factor levels, a dialog box appears asking you if you
want to analyze the data using a two way or a one way design. If you select a
two way design, SigmaStat attempts to analyze your data using two
interactions. If there are no observations with two interactions, SigmaStat
runs a One Way ANOVA.
If you treat the problem as a Two Way ANOVA, a dialog box appears
prompting you to remove one of the factors. Select the factor you want to
remove, then click OK. The Two Way ANOVA is performed.
If you treat the problem as a One Way ANOVA, each cell in the table is
treated as a single experimental factor. This approach is the most
TABLE 3-10
Example of Drawing Straight Horizontal and Vertical Lines through
Connected Data
1.2 4.2 .05 1.4 .54
2.6 3.3 3.1
2.0
! It is important to note that failure to meet the above requirement does not
imply that the data is disconnected. The data in Table 3-11, for example, is
connected.
For descriptions of the concept of data connectivity, you can reference any
appropriate statistics reference. For a list of suggested references, see page 12.
TABLE 3-12
Disconnected Data
Because this data is not geometrically connected (they share no factor levels
in common), a Three Way ANOVA cannot be performed, even assuming no
interaction.
Gender    Male                                  Female
Drug      Drug A             Drug B             Drug A             Drug B
Days      Day 1 Day 2 Day 3  Day 1 Day 2 Day 3  Day 1 Day 2 Day 3  Day 1 Day 2 Day 3
Reaction  --    --    --     4     5     6      7     8     9      --    --    --
          --    --    --     16    17    18     19    20    21     --    --    --
          --    --    --     28    29    30     31    32    33     --    --    --
Entering Worksheet Data
A Three Way ANOVA can only be performed on three factor indexed data.
Three factor indexed data is placed in four columns; a data point indexed
three ways consists of the first factor in one column, the second factor in a
second column, the third factor in a third column, and the data in a fourth
column.
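The four-column indexed layout can be pictured with a short sketch that flattens grouped observations into three factor columns and one data column. The function name `to_indexed_columns`, the factor names, and the values are hypothetical and serve only to show the arrangement.

```python
def to_indexed_columns(cells):
    """Flatten {(factor1, factor2, factor3): [values]} into four columns."""
    factor1, factor2, factor3, data = [], [], [], []
    for (level1, level2, level3), values in cells.items():
        for value in values:
            # Each observation gets one row: three index entries plus the data.
            factor1.append(level1)
            factor2.append(level2)
            factor3.append(level3)
            data.append(value)
    return factor1, factor2, factor3, data

# Hypothetical observations, used only to show the layout.
cells = {
    ("Male", "Drug A", "Day 1"): [4.0, 5.0],
    ("Male", "Drug B", "Day 1"): [6.0],
    ("Female", "Drug A", "Day 2"): [7.0, 8.0],
}
gender, drug, day, reaction = to_indexed_columns(cells)
```

Each position across the four lists corresponds to one worksheet row: the first three entries index the observation and the fourth holds the measured value.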
Selecting Data Columns
When running a Three Way ANOVA you can either:
➤ Select the columns to test by dragging your mouse over the columns
before choosing the test.
➤ Select the columns while running the test (page 99).
➤ Adjust the parameters of the test to relax or restrict the testing of your
data for normality and equal variance.
➤ Include the statistics summary table and confidence interval for the data
in the report, and save residuals to the worksheet.
➤ Compute the power, or sensitivity, of the test.
➤ Enable multiple comparison testing.
1 If you are going to run the test after changing test options and want to
select your data before you run the test, drag the pointer over the data.
2 To open the Options for Three Way ANOVA dialog box, select Three
Way ANOVA from the toolbar drop-down list, then click the
button, or choose the Statistics menu Current Test Options...
command. The Normality and Equal Variance options appear (see
Figure 9–42 on page 290).
5 To continue the test, click Run Test. The Pick Columns dialog box
appears (see page 99 for more information).
6 To accept the current settings and close the options dialog box, click
OK. To accept the current setting without closing the options dialog
box, click Apply. To close the dialog box without changing any settings
or running the test, click Cancel.
! You can select Help at any time to access SigmaStat’s on-line help system.
Normality and Equal Variance Assumptions
Select the Assumption Checking tab from the options dialog box to view the
Normality and Equal Variance options. The normality assumption test
checks for a normally distributed population. The equal variance assumption
test checks the variability about the group means.
Equal Variance Testing SigmaStat tests for equal variance by checking the
variability about the group means.
! There are extreme conditions of data distribution that these tests cannot take
into account. For example, the Levene Median test fails to detect differences
in variance of several orders of magnitude. However, these conditions
should be easily detected by simply examining the data without resorting to
the automatic assumption tests.
Summary Table
Select the Results tab in the options dialog box to view the Summary Table
option. The Summary Table option displays the number of observations for
a column or group, the number of missing values for a column or group, the
average value for the column or group, the standard deviation of the column
or group, and the standard error of the mean for the column or group.
Confidence Intervals
Select the Results tab in the options dialog box to view the Confidence
Intervals option. The Confidence Intervals option displays the confidence
interval for the difference of the means. To change the interval, enter any
number from 1 to 99 (95 and 99 are the most commonly used intervals).
Click the selected check box if you do not want to include the confidence
interval in the report.
FIGURE 9–43
The Options for Three Way
ANOVA Dialog Box
Displaying
the Summary Table,
Confidence Intervals,
and Residual Options
Power
Select the Post Hoc Tests tab in the options dialog box to view the Power
option. The power or sensitivity of a test is the probability that the test will
detect a difference between the groups if there is really a difference.
Change the alpha value by editing the number in the Alpha Value box.
Alpha (α) is the acceptable probability of incorrectly concluding that there is
a difference. The suggested value is α = 0.05. This indicates that a one in
twenty chance of error is acceptable, or that you are willing to conclude there
is a significant difference when P < 0.05.
Multiple Comparisons
Select the Post Hoc Tests tab in the Options dialog box to view the multiple
comparisons options. Three Way ANOVAs test the hypothesis of no
differences between the several treatment groups, but do not determine
which groups are different, or the sizes of these differences. Use multiple
comparisons to isolate these differences whenever a Three Way ANOVA
detects a difference.
The P value used to determine if the ANOVA detects a difference is set in the
Report Options dialog box. If the P value produced by the Three Way
FIGURE 9–44
The Options for Three
Way ANOVA Dialog Box
Displaying the Power and
Multiple Comparisons
Options
Significant Multiple Comparison Value Select either .05 or .01 from the
Significance Value for Multiple Comparisons drop-down list. This value
determines the likelihood of the multiple comparison being incorrect in
concluding that there is a significant difference in the treatments.
A value of .05 indicates that the multiple comparisons will detect a difference
if there is less than 5% chance that the multiple comparison is incorrect in
detecting a difference. A value of .01 indicates that the multiple comparisons
will detect a difference if there is less than 1% chance that the multiple
comparison is incorrect in detecting a difference.
To run a Three Way ANOVA you need to select the data to test. The Pick
Columns dialog box is used to select the worksheet columns with the data
you want to test.
1 If you want to select your data before you run the test, drag the pointer
over your data.
2 Open the Pick Columns for Three Way ANOVA dialog box to start the
Three Way ANOVA. You can either:
➤ Select Three Way ANOVA from the drop-down list in the toolbar,
then click the button.
➤ Choose the Statistics menu Compare Many Groups, Three Way
ANOVA... command.
➤ Click the Run Test button from the Options for Three Way ANOVA
dialog box.
The Pick Columns for Three Way ANOVA dialog box appears. If you
selected columns before you chose the test, the selected columns appear
in the Selected Columns list. If you have not selected columns, the
dialog box prompts you to pick your data.
The first selected column is assigned to the first row in the Selected
Columns list, and all successively selected columns are assigned to
successive rows in the list. The number or title of selected columns
appear in each row. You are prompted to pick three factor columns and
one data column.
4 To change your selections, select the assignment in the list, then select
a new column from the worksheet. You can also clear a column
assignment by double-clicking it in the Selected Columns list.
5 Click Finish to perform the Three Way ANOVA. The Three Way
ANOVA report appears if:
To edit the report, use the Format menu commands; for information
on editing reports, see Editing Reports on page 137.
FIGURE 9–45
The Pick Columns for
Three Way ANOVA Dialog
Box
6 If you elected to test for normality and equal variance, and your data
fails either test, either continue or transform your data, then perform
the Three Way ANOVA on the transformed data. For information on
how to transform data, see Chapter 14, Using Transforms.
➤ If you are missing data points, but still have at least one observation in
each cell, SigmaStat automatically proceeds with the Three Way
ANOVA using a general linear model.
➤ If you are missing a cell, but the data is connected, you can proceed by
either performing a three way analysis assuming no interaction
between the factors, or converting the problem into a two way design
with each non-empty cell a different level of two factors.
➤ If your data is not geometrically connected, you cannot perform a
Three Way ANOVA. Either treat the problem as a Two Way
ANOVA, or cancel the test.
This dialog box displays the P values for each of the experimental factors and
of the interaction between the three factors. Only the options with P values
less than or equal to the value set in the Options dialog box are selected. You
can disable multiple comparison testing for a factor by clicking the selected
option. If no factor is selected, multiple comparison results are not reported.
There are seven multiple comparison tests to choose from for the Three Way
ANOVA. You can choose to perform the:
➤ Holm-Sidak Test
➤ Tukey Test
➤ Student-Newman-Keuls Test
➤ Bonferroni t-test
➤ Fisher’s LSD
➤ Dunnett’s Test
➤ Duncan’s Multiple Range Test
There are two types of multiple comparison available for the Three Way
ANOVA. The types of comparison you can make depends on the selected
multiple comparison test, such as:
All pairwise comparisons test the difference between each treatment or level
within the two factors separately (i.e., among the different rows and columns
of the data table). Multiple comparisons versus a control test the difference
between all the different combinations of the factors (i.e., all the cells in the
data table).
When comparing the two factors separately, the levels within one factor are
compared among themselves without regard to the second factor, and vice
versa. These results should be used when the interaction is not statistically
significant.
The result of both comparisons is a listing of the similar and different group
pairs, i.e., those groups that are and are not detectably different from each
other. Because no statistical test eliminates uncertainty, multiple comparison
procedures sometimes produce ambiguous groupings.
Holm-Sidak Test
The Holm-Sidak Test can be used for both pairwise comparisons and
comparisons versus a control group. It is more powerful than the Tukey and
Bonferroni tests and, consequently, it is able to detect differences that these
other tests do not. It is recommended as the first-line procedure for pairwise
comparison testing.
When performing the test, the P values of all comparisons are computed and
ordered from smallest to largest. Each P value is then compared to a critical
level that depends upon the significance level of the test (set in the test
options), the rank of the P value, and the total number of comparisons made.
A P value less than the critical level indicates there is a significant difference
between the corresponding two groups.
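The ordering and comparison steps just described can be sketched in Python. The critical level used here, 1 - (1 - α)^(1/(k - rank + 1)), is the standard Holm-Sidak formula, and stopping at the first non-significant comparison is the usual step-down rule. Both are assumptions about what SigmaStat does internally, and the function name and P values are hypothetical.

```python
def holm_sidak(p_values, alpha=0.05):
    """Return significance flags (in the original order) for k comparisons."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])   # smallest P first
    significant = [False] * k
    for rank, index in enumerate(order, start=1):
        # Critical level depends on alpha, the rank, and the total count.
        critical = 1.0 - (1.0 - alpha) ** (1.0 / (k - rank + 1))
        if p_values[index] >= critical:
            break   # step-down rule: all remaining comparisons are not significant
        significant[index] = True
    return significant

result = holm_sidak([0.030, 0.001, 0.200, 0.012])   # hypothetical P values
```

For these four P values, the two smallest (0.001 and 0.012) fall below their critical levels; 0.030 does not, so it and the larger value are reported as not significant.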
Tukey Test
The Tukey Test and the Student-Newman-Keuls test are conducted similarly
to the Bonferroni t-test, except that each uses a table of critical values that is
Student-Newman-Keuls (SNK) Test
The Student-Newman-Keuls Test and the Tukey Test are conducted similarly
to the Bonferroni t-test, except that each uses a table of critical values that is
computed based on a better mathematical model of the probability structure
of the multiple comparisons. The Student-Newman-Keuls Test is less
conservative than the Tukey Test because it controls errors among tests of k
means, while the Tukey Test controls the errors of all comparisons
simultaneously. Because it is less conservative, it is more likely to determine
that a given difference is statistically significant. The Student-Newman-Keuls
Test is usually more sensitive than the Bonferroni t-test, and is only available
for all pairwise comparisons.
Bonferroni t-test
The Bonferroni t-test performs pairwise comparisons with paired t-tests. The
P values are then multiplied by the number of comparisons that were made.
It can perform both all pairwise comparisons and multiple comparisons vs. a
control, and is the most conservative test for both comparison types. For
less conservative all pairwise comparison tests, see the Tukey and the
Student-Newman-Keuls tests, and for a less conservative multiple comparison
vs. a control test, see Dunnett’s Test.
Fisher’s Least Significant Difference Test
The Fisher’s LSD Test is the least conservative all pairwise comparison test.
Unlike the Tukey and the Student-Newman-Keuls tests, it makes no effort to
control the error rate. Because it makes no attempt to control the error
rate when detecting differences between groups, it is not recommended.
Dunnett’s Test
Dunnett's test is the analog of the Student-Newman-Keuls Test for the case
of multiple comparisons against a single control group. It is conducted
similarly to the Bonferroni t-test, but with a more sophisticated mathematical
model of the way the error accumulates in order to derive the associated table
of critical values for hypothesis testing. This test is less conservative than the
Bonferroni Test, and is only available for multiple comparisons vs. a control.
Performing a Multiple Comparison
The multiple comparison you choose to perform depends on the treatments
you are testing. Click Cancel if you do not want to perform a multiple
comparison procedure.
You can disable multiple comparison testing for a factor by clicking the
selected option.
FIGURE 9–46
The Multiple Comparison
Options Dialog Box for
a Three Way ANOVA
2 Select the desired multiple comparison test from the Suggested Test
drop-down list. The Tukey and Student-Newman-Keuls tests are
recommended for determining the difference among all treatments. If
you have only a few treatments, you may want to select the simpler
Bonferroni t-test.
! Note that in both cases the Bonferroni t-test is most sensitive with a small
number of groups. Dunnett’s test is not available if you have fewer than
six observations.
FIGURE 9–47
The Multiple Comparison
Options Dialog Box
Prompting You to
Select a Control Group
A full Three Way ANOVA report displays an ANOVA table describing the
variation associated with each factor and their interactions. This table
displays the degrees of freedom, sum of squares, and mean squares for each of
the elements in the data table, as well as the F statistics and the corresponding
P values.
Summary tables of least square means for each factor and for all three factors
together can also be generated. This result and additional results are enabled
in the Options for Three Way ANOVA dialog box (see Setting Three Way
ANOVA Options on page 288). Click a check box to enable or disable a test
option. All options are saved between SigmaStat sessions.
For descriptions of the derivations for Three Way ANOVA results, you can
reference any appropriate statistics reference. For a list of suggested
references, see page 12.
! The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the buttons in the formatting toolbar.
Result Explanations In addition to the numerical results, expanded explanations of the results may
also appear. To turn off this explanatory text, choose the Statistics menu
Report Options... command and click the selected Explain Test Results check
box.
The number of decimal places displayed is also set in the Report Options
dialog box. For more information on setting report options see Setting Test
Options on page 98.
If There Were If your data contained missing values but no empty cells, the report indicates
Missing Data Cells the results were computed using a general linear model.
If your data contained empty cells, you either analyzed the problem assuming
no interaction or treated the problem as a Two or One Way ANOVA.
FIGURE 9–48
Three Way ANOVA
Report
Normality Test Normality test results display whether the data passed or failed the test of the
assumption that they were drawn from a normal population and the P value
calculated by the test. Normally distributed source populations are required
for all parametric tests.
This result appears if you enabled normality testing in the Options for Three
Way ANOVA dialog box (see page 290).
Equal Variance Test Equal Variance test results display whether or not the data passed or failed the
test of the assumption that the samples were drawn from populations with
the same variance and the P value calculated by the test. Equal variance of
the source population is assumed for all parametric tests.
This result appears if you enabled equal variance testing in the Options for
Three Way ANOVA dialog box (see page 290).
ANOVA Table The ANOVA table lists the results of the Three Way ANOVA.
! When there are missing data, the best estimate of these values is
automatically calculated using a general linear model.
F Statistic The F test statistic is provided for comparisons within each factor
and between the factors.
If the F ratio is around 1, you can conclude that there are no significant
differences between factor levels or that there is no interaction between
factors (i.e., the data groups are consistent with the null hypothesis that all
the samples were drawn from the same population).
If F is a large number, you can conclude that at least one of the samples for
that factor or combination of factors was drawn from a different population
(i.e., the variability is larger than what is expected from random variability in
the population). To determine exactly which groups are different, examine
the multiple comparison results (see page 305).
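As a reminder of how the ANOVA table columns relate, the standard textbook form of the F ratio for a fixed-effects design is shown below. The subscripts "effect" and "residual" are labels chosen here for illustration; the report itself may name these rows differently.

```latex
MS = \frac{SS}{DF}, \qquad
F = \frac{MS_{\text{effect}}}{MS_{\text{residual}}}
```

When an effect contributes nothing beyond random variability, the two mean squares estimate the same variance, which is why F is then expected to be close to 1.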
Power The power, or sensitivity, of a Three Way ANOVA is the probability that the
test will detect the observed difference among the groups if there really is a
difference. The closer the power is to 1, the more sensitive the test. The
power for the comparison of the groups within the two factors and the power
for the comparison of the interactions are all displayed. These results are set
in the Options for Three Way ANOVA dialog box (see page 291).
The α value is set in the Options for Three Way ANOVA dialog box; the
suggested value is α = 0.05, which indicates that a one in twenty chance of
error is acceptable. Smaller values of α result in stricter requirements before
concluding there is a significant difference, but a greater possibility of
concluding there is no difference when one exists (a Type II error). Larger
values of α make it easier to conclude that there is a difference, but also
increase the risk of seeing a false difference (a Type I error).
Summary Table The least square means and standard error of the means are displayed for each
factor separately (summary table row and column), and for each combination
of factors (summary table cells). If there are missing values, the least square
means are estimated using a general linear model.
Mean The average value for the column. If the observations are normally
distributed the mean is the center of the distribution.
When there are no missing data, the least square means equal the cell and
marginal (row and column) means. When there are missing data, the least
squared means provide the best estimate of these values, using a general linear
model. These means and standard errors are used when performing multiple
comparisons (see below).
Multiple If a difference is found among the groups, multiple comparison tables can be
Comparisons computed. Multiple comparison procedures are activated in the Options for
Three Way ANOVA dialog box (see page 292). The tests used in the
multiple comparisons are set in the Multiple Comparisons Options dialog
box (see page 298).
➤ Groups within each factor without regard to the other factor (this is a
marginal comparison, i.e., only the columns or rows in the table are
compared).
➤ All combinations of factors (all cells in the table are compared with each
other).
Bonferroni t-test Results The Bonferroni t-test lists the differences of the
means for each pair of groups, computes the t values for each pair, and
displays whether or not P < 0.05 for that comparison. The Bonferroni t-test
can be used to compare all groups or to compare versus a control.
You can conclude from “large” values of t that the difference of the two
groups being compared is statistically significant.
If the P value for the comparison is less than 0.05, the likelihood of
erroneously concluding that there is a significant difference is less than 5%.
If it is greater than 0.05, you cannot confidently conclude that there is a
difference.
The Difference of Means is a gauge of the size of the difference between the
levels or cells being compared.
Dunnett's test only compares a control group to all other groups. All tests
compute the q test statistic and display whether or not P < 0.05 for that pair
comparison.
You can conclude from “large” values of q that the difference of the two
groups being compared is statistically significant.
If the P value for the comparison is less than 0.05, the likelihood of being
incorrect in concluding that there is a significant difference is less than 5%.
If it is greater than 0.05, you cannot confidently conclude that there is a
difference.
p is a parameter used when computing q. The larger the p, the larger q needs
to be to indicate a significant difference. p is an indication of the differences
in the ranks of the group means being compared. Group means are ranked
in order from largest to smallest, and p is the number of means spanned in
the comparison. For example, when comparing four means, comparing the
largest to the smallest p = 4, and when comparing the second smallest to the
smallest p = 2.
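The way p is counted can be illustrated with a short Python sketch. The function name `means_spanned`, the group labels, and the mean values are all hypothetical; only the counting rule comes from the paragraph above.

```python
def means_spanned(group_means, group_a, group_b):
    """Return p, the number of ranked means spanned by a comparison."""
    # Rank the group labels by their means, from largest to smallest.
    ranked = sorted(group_means, key=group_means.get, reverse=True)
    return abs(ranked.index(group_a) - ranked.index(group_b)) + 1


# Four illustrative group means (made-up values).
means = {"A": 12.1, "B": 9.4, "C": 7.8, "D": 5.0}
p_extremes = means_spanned(means, "A", "D")  # largest vs. smallest
p_adjacent = means_spanned(means, "C", "D")  # second smallest vs. smallest
```

Here `p_extremes` is 4 and `p_adjacent` is 2, matching the four-mean example in the text.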
The Difference of Means is a gauge of the size of the difference between the
groups or cells being compared.
You can generate up to three graphs using the results from a Three Way
ANOVA. They include a:
Histogram of The Three Way ANOVA histogram plots the raw residuals in a specified
Residuals range, using a defined interval set. The residuals are divided into a number of
evenly incremented histogram intervals and plotted as histogram bars
indicating the number of residuals in each interval. The X axis represents the
histogram intervals, and the Y axis represents the number of residuals in each
group. For an example of a histogram, see page 182.
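The binning behind such a histogram can be sketched as follows. This is a minimal illustration, not SigmaStat's own routine: the function name `histogram_counts` is hypothetical, and taking the range from the smallest to the largest residual is an assumption.

```python
def histogram_counts(residuals, n_intervals):
    """Count the residuals that fall in each of n_intervals equal-width intervals."""
    low, high = min(residuals), max(residuals)
    width = (high - low) / n_intervals
    counts = [0] * n_intervals
    for r in residuals:
        if width == 0:
            i = 0  # all residuals identical: count them in the first interval
        else:
            # The largest residual is counted in the last interval.
            i = min(int((r - low) / width), n_intervals - 1)
        counts[i] += 1
    return counts


# Illustrative (made-up) raw residuals, divided into four intervals.
residuals = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.0, 2.0]
counts = histogram_counts(residuals, 4)
```

Each entry of `counts` is the height of one histogram bar, so the entries always sum to the number of residuals.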
Probability Plot The Three Way ANOVA probability plot graphs the frequency of the raw
residuals. The residuals are sorted and then plotted as points around a curve
representing the area of the Gaussian plotted on a probability axis. Plots with
residuals that fall along the Gaussian curve indicate that your data was taken from
a normally distributed population. The X axis is a linear scale representing
the residual values. The Y axis is a probability scale representing the
cumulative frequency of the residuals. For an example of a probability plot, see
page 182.
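The coordinates of a normal probability plot can be sketched in Python using only the standard library. The plotting position (i - 0.5)/n used for the cumulative frequency is a common convention assumed here; the manual does not say which convention SigmaStat uses, and the function name is hypothetical.

```python
from statistics import NormalDist


def probability_plot_points(residuals):
    """Return (residual, cumulative frequency, normal quantile) for each sorted residual."""
    n = len(residuals)
    points = []
    for i, r in enumerate(sorted(residuals), start=1):
        cum_freq = (i - 0.5) / n            # assumed plotting position
        z = NormalDist().inv_cdf(cum_freq)  # where that frequency sits on a probability scale
        points.append((r, cum_freq, z))
    return points


# Illustrative (made-up) raw residuals.
points = probability_plot_points([0.3, -1.2, 0.9, -0.1])
```

If the residuals come from a normal population, plotting the residual against the normal quantile gives points that lie close to a straight line.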
Multiple The Three Way ANOVA multiple comparison graphs plot significant
Comparison Graphs differences between levels of a significant factor. There is one graph for every
significant factor reported by the specified multiple comparison test. If there
is one significant factor reported, one graph appears; if there are two
significant factors, two graphs appear, etc. If a factor is not reported as
significant, a graph for the factor does not appear. For an example of a
multiple comparison graph, see page 182.
FIGURE 9–49
The Create Graph Dialog
Box for a Three Way ANOVA
Report
2 Select the type of graph you want to create from the Graph Type list,
then click OK, or double-click the desired graph in the list. For more
information on each of the graph types, see Chapter 8. The specified
graph appears in a graph window or in the report.
FIGURE 9–50
A Normal Probability Plot
for a Three Way ANOVA
If you know that your data were drawn from normal populations with equal
variances, use One Way ANOVA. When there are only two groups to
compare, do a Mann-Whitney Rank Sum Test. There is no two or three
factor test for non-normal populations; however, you can transform your data
using Transform menu commands so that it fits the assumptions of a
parametric test. For more information on transforming your data, see
Chapter 14, USING TRANSFORMS.
About the The Kruskal-Wallis Analysis of Variance on Ranks compares several different
Kruskal-Wallis experimental groups that receive different treatments. This design is
ANOVA on Ranks essentially the same as a Mann-Whitney Rank Sum Test, except that there are
more than two experimental groups. If you try to perform an ANOVA on
Ranks on two groups, SigmaStat tells you to perform a Rank Sum Test
instead.
The null hypothesis you test is that there is no difference in the distribution
of values between the different groups.
4 Run the test by selecting the worksheet columns with the data you want
to test using the Pick Columns dialog box (page 316).
6 View and interpret the ANOVA on Ranks report and generate report
graphs (page 322 and page 326).
The format of the data to be tested can be raw data or indexed data. Raw
data is placed in as many columns as there are groups, up to 64; each column
contains the data for one group. Indexed data is placed in two worksheet
columns with at least three treatments. If you have fewer than three treatments,
you should use the Rank Sum Test (see page 220).
For more information on arranging data, see Data Format for Group
Comparison Tests on page 204, or Arranging Data for Contingency Tables
on page 69.
➤ Adjust the parameters of the test to relax or restrict the testing of your
data for normality and equal variance.
➤ Enable multiple comparison testing.
➤ Display the summary table.
1 If you are going to run the test after changing test options, and want to
select your data before you run the test, drag the pointer over your data.
2 To open the Options for ANOVA on Ranks dialog box, select ANOVA
on Ranks from the toolbar drop-down list, then click the button, or
choose the Statistics menu Current Test Options... command. The
Normality and Equal Variance options appear (see Figure 9–52 on page
313).
3 Click the Results tab to view the Summary Table option (see Figure 9–
53 on page 314) and the Post Hoc Tests tab to view the multiple
comparison option (see Figure 9–54 on page 315). Click the
Assumption Checking tab to return to the Normality and Equal
Variance options.
5 To continue the test, click Run Test. The Pick Columns dialog box
appears (see page 99 for more information).
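6 To accept the current settings and close the options dialog box, click
OK. To accept the current setting without closing the options dialog
box, click Apply. To close the dialog box without changing any
settings or running the test, click Cancel.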
! You can click Help at any time to access SigmaStat’s on-line help system.
Normality and Select the Assumption Checking tab from the options dialog box to view the
Equal Variance Normality and Equal Variance options. The normality assumption test
Assumptions checks for a normally distributed population. The equal variance assumption
test checks the variability about the group means.
FIGURE 9–52
The Options for
ANOVA on Ranks Dialog Box
Displaying the Assumption
Checking Options
Equal Variance Testing SigmaStat tests for equal variance by checking the
variability about the group means.
! There are extreme conditions of data distribution that these tests cannot take
into account. For example, the Levene Median test fails to detect differences
in variance of several orders of magnitude. However, these conditions
should be easily detected by simply examining the data without resorting to
the automatic assumption tests.
Summary Table The summary table for an ANOVA on Ranks lists the medians, percentiles, and
sample sizes N in the ANOVA on Ranks report. If desired, change the percentile
values by editing the boxes. The 25th and the 75th percentiles are the
suggested percentiles.
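The contents of such a summary row can be sketched with Python's standard library. The function name `rank_summary` is hypothetical, and the "inclusive" interpolation method is an assumption; the manual does not say how SigmaStat interpolates percentiles, so values may differ slightly from the report.

```python
from statistics import median, quantiles


def rank_summary(column):
    """Return the sample size, median, and 25th and 75th percentiles of one group."""
    # Assumption: "inclusive" interpolation; other percentile definitions exist.
    q25, _, q75 = quantiles(column, n=4, method="inclusive")
    return {"N": len(column), "Median": median(column), "25%": q25, "75%": q75}


# Illustrative (made-up) data for a single group.
row = rank_summary([1, 2, 3, 4, 5])
```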
FIGURE 9–53
The Options for
ANOVA on Ranks Dialog Box
Displaying the Summary
Table Option
Multiple Comparison Select the Post Hoc Tests tab in the Options dialog box to view the multiple
Options comparison options. An ANOVA on Ranks tests the hypothesis of no
differences between the several treatment groups, but does not determine
which groups are different, or the size of these differences. Multiple
comparisons isolate these differences.
The P value used to determine if the ANOVA detects a difference is set in the
Report Options dialog box. If the P value produced by the ANOVA on
Ranks is less than the P value specified in the box, a difference in the groups
is detected and the multiple comparisons are performed. For more
information on specifying a P value for the ANOVA, see Setting Report
Options on page 135.
FIGURE 9–54
The Options for ANOVA on
Ranks Dialog Box
Displaying the Multiple
Comparison Options
Select a value from the Significance Value for Multiple Comparisons drop-
down list. This value determines the likelihood of the multiple
comparison being incorrect in concluding that there is a significant difference
in the treatments.
A value of .05 indicates that the multiple comparisons will detect a difference
if there is less than 5% chance that the multiple comparison is incorrect in
detecting a difference.
To run a test, you need to select the data to test. The Pick Columns dialog
box is used to select the worksheet columns with the data you want to test
and to specify the format of the data you are testing.
1 If you want to select your data before you run the test, drag the pointer
over your data.
2 Open the Pick Columns for ANOVA on Ranks dialog box to start the
ANOVA on Ranks. You can either:
➤ Select ANOVA on Ranks from the toolbar drop-down list, then click
the button.
➤ Choose the Statistics menu Compare Many Groups, ANOVA on
Ranks... command.
➤ Click the Run Test button in the Options for ANOVA on
Ranks dialog box (see page 313).
The Pick Columns dialog box appears prompting you to specify a data
format.
3 Select the appropriate data format from the Data Format drop-down
list. If your data is grouped in columns, select Raw. If your data is in
the form of a group index column(s) paired with a data column(s),
select Indexed.
For more information on arranging data, see Data Format for Group
Comparison Tests on page 204, or Arranging Data for Contingency
Tables on page 69.
FIGURE 9–55
The Pick Columns
for ANOVA on
Ranks Dialog Box
Prompting You to
Specify A Data Format
The first selected column is assigned to the first row in the Selected
Columns list, and all successively selected columns are assigned to
successive rows in the list.
The number or title of each selected column appears in each row. You are
prompted to pick a minimum of two and a maximum of 64 columns
for raw data, and two columns with at least three treatments
for indexed data. If you have fewer than three treatments, a message
appears telling you to use the Rank Sum Test (see page 220).
FIGURE 9–56
The Pick Columns
for ANOVA on Ranks Dialog
Box
Prompting You to
Select Data Columns
6 To change your selections, select the assignment in the list, then select
a new column from the worksheet. You can also clear a column
assignment by double-clicking it in the Selected Columns list.
7 If you elected to test for normality and equal variance, and your data
fails either test, either continue or transform your data, then perform
the Two Way ANOVA on the transformed data. For information on
how to transform data, see Chapter 14, USING TRANSFORMS.
➤ Elected to test for normality and equal variance, and your data passes
both tests.
To edit the report, use the Format menu commands; for information
on editing reports, see page 137 in the WORKING WITH REPORTS
chapter.
This dialog box displays the P values for each of the two experimental factors
and of the interaction between the two factors. Only the options with P
values less than or equal to the value set in the Options dialog box are
selected. You can disable multiple comparison testing for a factor by clicking
the selected option. If no factor is selected, multiple comparison results are
not reported.
There are four multiple comparison tests to choose from for the ANOVA on
Ranks. You can choose to perform the:
➤ Dunn’s Test
➤ Dunnett’s Test
➤ Tukey Test
➤ Student-Newman-Keuls Test
Tukey Test The Tukey Test is the suggested all pairwise comparison test unless you have
missing values (see the Dunn’s test for data with empty cells). It uses a table
of critical values that is computed based on a mathematical model of the
probability structure of the multiple comparisons. The Tukey Test is more
conservative than the Student-Newman-Keuls test, because it controls the
errors of all comparisons simultaneously, while the Student-Newman-Keuls
test controls errors among tests of k means. Because it is more conservative, it
is less likely to determine that a given difference is statistically significant, and
it is the recommended test for all pairwise comparisons.
Student-Newman- Like the Tukey Test, the Student-Newman-Keuls test is an all pairwise
Keuls (SNK) Test comparison test that uses a table of critical values that is computed based on a
mathematical model of the probability structure of the multiple comparisons.
The Student-Newman-Keuls Test is less conservative than the Tukey Test,
because the Tukey test controls the errors of all comparisons simultaneously,
while the Student-Newman-Keuls test controls errors among tests of k means.
Because it is less conservative, it is more likely to determine that a given
difference is statistically significant.
The nonparametric SNK test requires equal sample sizes. Use Dunn's test if
the sample sizes are not equal; SigmaStat automatically selects this test when
sample sizes are unequal.
Dunnett's Test Dunnett's test is the analog of the SNK test for the case of multiple
comparisons against a single control group. The nonparametric Dunnett's
test requires equal sample sizes. Use Dunn's test if the sample sizes are not
equal; SigmaStat automatically selects this test when sample sizes are unequal.
Performing a The multiple comparison you choose to perform depends on the groups you
Multiple Comparison are testing. Click Cancel if you do not want to perform a multiple
comparison procedure.
You can disable multiple comparison testing for a factor by clicking the
selected option.
FIGURE 9–57
The Multiple Comparison
Options Dialog Box for
the ANOVA on Ranks
2 Select the desired multiple comparison test from the Suggested Test
drop-down list. The Tukey and Student-Newman-Keuls tests are
recommended for determining the difference among all treatments
when your sample sizes are equal. To perform an all pairwise comparison
with unequal sample sizes, select Dunn’s test.
! Note that in both cases SigmaStat defaults to Dunn’s test when your
sample sizes are unequal. You must use Dunn’s test for unequal sample
sizes.
FIGURE 9–58
The Multiple Comparison
Options Dialog Box
Prompting You to
Select a Control Group
The ANOVA on Ranks report displays the H statistic (corrected for ties) and
the corresponding P value for H. The other results displayed in the report are
enabled and disabled in the Options for ANOVA on Ranks dialog box (see
page 313).
For descriptions of the derivations for ANOVA on Ranks results, you can
reference any appropriate statistics reference. For a list of suggested
references, see page 12.
The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the
buttons in the formatting toolbar to move one page up and down in the
report.
Result Explanations In addition to the numerical results, expanded explanations of the results may
also appear. To turn off this explanatory text, choose the Statistics menu
Report Options... command and click the selected Explain Test Results
check box.
The number of decimal places displayed is also set in the Report Options
dialog box. For more information on setting report options, see page 135 in
Chapter 7.
Normality Test Normality test results display whether the data passed or failed the test of the
assumption that it was drawn from a normal population and the P value calculated by the test.
These results appear unless you disabled normality testing in the Options for
ANOVA on Ranks dialog box (see page 313).
Equal Variance Test Equal Variance test results display whether or not the data passed or failed the
test of the assumption that the samples were drawn from populations with
the same variance and the P value calculated by the test. Nonparametric tests
do not assume equal variances of the source populations.
These results appear unless you disabled equal variance testing in the Options
for ANOVA on Ranks dialog box (see page 313).
FIGURE 9–59
The ANOVA on Ranks
Results Report
Summary Table If you selected this option in the Options for ANOVA on Ranks dialog box,
SigmaStat generates a summary table listing the medians, the percentiles
defined in the Options dialog box, and sample sizes N.
Percentiles The two percentile points that define the upper and lower tails
of the observed values.
For large sample sizes, this value is compared to the chi-square distribution
(the estimate of all possible distributions of H) to determine the possibility of
this H occurring. For small sample sizes, the actual distribution of H is used.
If H is a large number, the variability among the average ranks is larger than
expected from random variability in the population, and you can conclude
that the samples were drawn from different populations (i.e., the differences
between the groups are statistically significant).
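For readers who want to see how H arises from the ranks, the sketch below computes the standard textbook Kruskal-Wallis statistic with the usual tie correction. Whether SigmaStat computes H in exactly this way is an assumption, and the function name `kruskal_wallis_h` is hypothetical.

```python
def kruskal_wallis_h(groups):
    """Return the Kruskal-Wallis H statistic, corrected for ties."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)

    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        t = j - i                              # number of tied observations
        rank_of[pooled[i]] = (i + 1 + j) / 2   # mean of ranks i+1 through j
        tie_term += t**3 - t
        i = j

    # Sum of (rank sum squared / group size) over the groups.
    total = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

    # Tie correction; it equals 1 when there are no ties.
    # Note: if every observation is identical the correction is 0 and H is undefined.
    correction = 1 - tie_term / (n_total**3 - n_total)
    return h / correction


# Three illustrative (made-up) groups with no overlap in their values.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

For these completely separated groups the rank sums are 6, 15, and 24, giving H = 7.2.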
Multiple If a difference is found among the groups, and you requested and elected to
Comparisons perform multiple comparisons, a table of the comparisons between group
pairs is displayed. The multiple comparison procedure is activated in the
Options for ANOVA on Ranks dialog box (see page 292). The test used in
the multiple comparison procedure is selected in the Multiple Comparison
Options dialog box (see page 298).
Multiple comparison results are used to determine exactly which groups are
different, since the ANOVA results only inform you that two or more of the
groups are different. The specific type of multiple comparison results
You can conclude from “large” values of q that the difference of the two
groups being compared is statistically significant.
If the P value for the comparison is less than 0.05, the probability of being
incorrect in concluding that there is a significant difference is less than 5%.
If it is greater than 0.05, you cannot confidently conclude that there is a
difference.
The Difference of Ranks is a gauge of the size of the real difference between
the two groups.
p is a parameter used when computing q. The larger the p, the larger q needs
to be to indicate a significant difference. p is an indication of the differences
in the ranks of the group means being compared. Group rank sums are
ranked in order from largest to smallest in an SNK or Dunnett’s test, so p is
the number of rank sums spanned in the comparison. For example, when
comparing four rank sums, comparing the largest to the smallest p = 4, and
when comparing the second smallest to the smallest p = 2.
You can conclude from “large” values of Q that the difference of the two
groups being compared is statistically significant.
If the P value for the comparison is less than 0.05, the likelihood of being
incorrect in concluding that there is a significant difference is less than 5%.
If it is greater than 0.05, you cannot confidently conclude that there is a
difference.
The Difference of Rank Means is a gauge of the size of the difference between
the two groups.
You can generate up to three graphs using the results from an ANOVA on
Ranks. They include a:
Point Plot The ANOVA on Ranks point plot graphs all values in each column as a point
on the graph. If the graph data is indexed, the levels in the factor column are
used as the tick marks for the plot points, and the column titles are used as
the X and Y axis titles. If the graph data is in raw or statistical format, the
column titles are used as the tick marks for the plot points and default X Data
and Y Data axis titles are assigned to the graph.
If the graph data is indexed, the levels in the factor column are used as the
tick marks for the box plot boxes, and the column titles are used as the axis
titles. If the graph data is in raw format, the column titles are used as the tick
marks for the box plot boxes, and default axis titles, X Axis and Y Axis, are
assigned to the graph. For an example of a box plot, see page 182 in the
CREATING AND MODIFYING GRAPHS chapter.
Multiple The multiple comparison graphs are available for all ANOVA reports. They
Comparison Graphs plot significant differences between levels of a significant factor. There is one
graph for every significant factor reported by the specified multiple
comparison test. If there is one significant factor reported, one graph
appears; if there are two significant factors, two graphs appear, etc. If a factor
is not reported as significant, a graph for the factor does not appear. For an
example of a multiple comparison graph, see page 182 in the CREATING AND
MODIFYING GRAPHS chapter.
1 Click the toolbar button, or choose the Graph menu Create Graph
command when the ANOVA on Ranks report is selected. The Create
Graph dialog box appears, displaying the types of graphs available for the
ANOVA on Ranks report.
FIGURE 9–60
The Create Graph Dialog
Box
for the ANOVA on
Ranks Report Graph
For information on manipulating graphs, see page 178 through page
202.
FIGURE 9–61
A Multiple Comparison
Matrix for an
ANOVA on Ranks
9 Comparing Repeated
Measurements of the Same
Individuals
See Choosing the Repeated Measures Test to Use on page 117 for more
information on when to use the different SigmaStat repeated measures
comparisons tests.
Parametric and Parametric tests assume treatment effects are normally distributed with
Nonparametric Tests the same variances (or standard deviations). Parametric tests are based
on estimates of the population means and standard deviations, the
parameters of a normal distribution.
Comparing Individuals Use before and after comparisons to test the effect of a single
Before and After a Single experimental treatment on the same individuals. There are two tests
Treatment available:
Comparing Individuals Use repeated measures procedures to test the effect of more than one
Before and After Multiple experimental treatment on the same individuals. There are three tests
Treatments available:
You cannot use the summary statistics for repeated measures tests.
Complete descriptions of data entry and formats can be found in
Chapter 4, USING THE DATA WORKSHEET.
Raw Data To enter data in raw data format, enter the data for each treatment in
separate worksheet columns. You can use raw data for all tests except
Two Way ANOVAs.
FIGURE 9–1
Valid Data Formats
for a Paired t-test
Columns 1 and 2 are
arranged as raw data.
Columns 3, 4, and 5 are
arranged as indexed data,
with column 3 as the subject
column, column 4 as the
factor column, and column 5
as the data column.
The worksheet columns for raw data must be the same length. If a missing
value is encountered, that individual is either ignored or, for parametric
ANOVAs, a general linear model is used to take advantage of all available
data.
Indexed Data Indexed data contains the treatments in one column and the
corresponding data points in another column. A One Way Repeated
Measures ANOVA requires a subject index in a third column. Two Way
Repeated Measures ANOVA requires an additional factor column, for a
total of four columns.
If you plan to compare only a portion of the data, put the treatment
index in the left column, followed by the second factor index (for Two
Way ANOVA only), then the subject index (for Repeated Measures
ANOVA), and finally the data in the rightmost column.
You can index raw data or convert indexed data to raw data using the Edit
menu Index and UnIndex commands.
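The relationship between the two formats can be sketched in Python. This is only an illustration of the idea, not the Index and UnIndex commands themselves: the function names are hypothetical, each worksheet row of raw data is assumed to be one subject, and the indexed column order (subject, factor, data) follows the Figure 9–1 caption.

```python
def index_columns(raw):
    """Convert raw data {treatment: [values]} to indexed rows (subject, treatment, value)."""
    rows = []
    for treatment, values in raw.items():
        # Assumption: row k of every raw column belongs to subject k.
        for subject, value in enumerate(values, start=1):
            rows.append((subject, treatment, value))
    return rows


def unindex_rows(rows):
    """Convert indexed rows back to raw data, one list of values per treatment."""
    raw = {}
    for subject, treatment, value in sorted(rows, key=lambda r: r[0]):
        raw.setdefault(treatment, []).append(value)
    return raw


# Illustrative (made-up) measurements on three subjects.
raw = {"Before": [5.1, 6.0, 5.5], "After": [5.9, 6.4, 6.1]}
indexed = index_columns(raw)
restored = unindex_rows(indexed)
```

Converting to indexed form and back returns the original columns, which is the round trip the two commands provide.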
Paired t-Test 10
If you know that the distribution of the observed effects is non-normal,
use the Wilcoxon Signed Rank Test. If you are comparing the effect of
multiple treatments on the same individuals, do a Repeated Measures
Analysis of Variance.
About the The Paired t-test examines the changes which occur before and after a
Paired t-test single experimental intervention on the same individuals to determine
whether or not the treatment had a significant effect. Examining the
changes rather than the values observed before and after the intervention
removes the differences due to individual responses, producing a more
sensitive, or powerful, test.
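The idea of testing the changes rather than the before and after values can be sketched with the standard textbook paired t statistic. The function name `paired_t` and the data are hypothetical, and the report produced by SigmaStat contains more than the two values returned here.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Return the paired t statistic and its degrees of freedom."""
    if len(before) != len(after):
        raise ValueError("raw data columns must be the same length")
    # Work with the change observed in each individual.
    changes = [a - b for b, a in zip(before, after)]
    n = len(changes)
    # t = mean change divided by the standard error of the mean change.
    t = mean(changes) / (stdev(changes) / sqrt(n))
    return t, n - 1


# Illustrative (made-up) measurements on four individuals.
before = [10, 12, 14, 16]
after = [11, 14, 15, 18]
t, df = paired_t(before, after)
```

The individuals differ widely from one another (10 to 16 before treatment), but the changes are all 1 or 2, so the test statistic is large even with only four subjects.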
2 If desired, set the Paired t-test options using the Options for Paired
t-test dialog box (page 336).
4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 338).
5 View and interpret the Paired t-test report and generate report
graphs (page 339 and page 343).
The format of the data to be tested can be raw data or indexed data. The
data is placed in two worksheet columns for raw data and three columns
(a subject, factor, and data column) for indexed data. The columns for
raw data must be the same length. If a missing value is encountered,
that individual is ignored. You cannot use statistical summary data for
repeated measures tests.
FIGURE 9–2
Valid Data Formats
for a Paired t-test
Columns 1 and 2 are
arranged as raw data.
Columns 3, 4, and 5 are
arranged as indexed data,
with column 3 as the subject
column and column 4 as the
factor column.
For more information on arranging data, see Data Format for Repeated
Measures Tests on page 330, or Arranging Paired t-test Data on page
333.
Selecting Data Columns When running a Paired t-test, you can either:
1 If you are going to run the test after changing test options, and
want to select your data before you run the test, drag the pointer
over your data.
2 To open the Options for Paired t-test dialog box, select Paired t-
test from the toolbar drop-down list, then click the button, or
choose the Statistics menu Current Test Options... command.
The Normality option appears (see Figure 9–3 on page 335).
5 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 338 for more information).
6 To accept the current settings and close the options dialog box,
click OK. To accept the current settings without closing the
options dialog box, click Apply. To close the dialog box without
changing any settings or running the test, click Cancel.
You can select Help at any time to access SigmaStat’s on-line help
system.
Normality Assumptions Select the Assumption Checking tab from the options dialog box to view
the Normality option. The normality assumption test checks for a
normally distributed population.
FIGURE 9–3
The Options for Paired t-test
Dialog Box Displaying the
Assumption Checking
Options
The Equal Variance option is not available for the Paired t-test because
Paired t-tests are based on changes in each individual rather than on different
individuals in the selected population, making equal variance testing
unnecessary.
Although the normality test is robust in detecting data from populations that
are non-normal, there are extreme conditions of data distribution that this
test cannot take into account. However, these conditions should be easily
detected by simply examining the data without resorting to the automatic
assumption test.
Summary Table Select the Results tab in the options dialog box to view the Summary
Table option. The Summary Table option displays the number of
observations for a column or group, the number of missing values for a
column or group, the average value for the column or group, the
standard deviation of the column or group, and the standard error of the
mean for the column or group.
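The quantities listed in the summary table can be reproduced with a short Python sketch. The helper name summary_row and the sample column are hypothetical. The sketch assumes that N counts only non-missing observations and that the standard deviation is the sample (n − 1) form; SigmaStat's own conventions are not stated in this section.

```python
import math
import statistics

def summary_row(column):
    """Return N, missing count, mean, standard deviation, and SEM for one column."""
    present = [x for x in column if x is not None]   # None marks a missing value
    n = len(present)
    sd = statistics.stdev(present)                   # sample standard deviation
    return {
        "N": n,
        "Missing": len(column) - n,
        "Mean": statistics.mean(present),
        "Std Dev": sd,
        "SEM": sd / math.sqrt(n),                    # standard error of the mean
    }

row = summary_row([4.0, 6.0, None, 8.0, 6.0])
print(row)
```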
Confidence Intervals Select the Results tab in the options dialog box to view the Confidence
Intervals option. The Confidence Intervals option displays the
confidence interval for the difference of the means. To change the
interval, enter any number from 1 to 99 (95 and 99 are the most
commonly used intervals). Click the selected check box if you do not
want to include the confidence interval in the report.
Residuals Select the Results tab in the options dialog box to view the Residuals
option. Use the Residuals option to display residuals in the report and to
save the residuals of the test to the specified worksheet column. To
change the column the residuals are saved to, edit the number in or
select a number from the drop-down list.
FIGURE 9–4
The Options for Paired
t-test Dialog Box Displaying
the Summary Table,
Confidence Intervals,
and Residuals Options
Power Select the Post Hoc Tests tab in the options dialog box to view the Power
option. The power or sensitivity of a test is the probability that the test
will detect a difference between the groups if there is really a difference.
Change the alpha value by editing the number in the Alpha Value box.
Alpha (α) is the acceptable probability of incorrectly concluding that
there is a difference. The suggested value is α = 0.05. This indicates
that a one in twenty chance of error is acceptable, or that you are willing
to conclude there is a significant difference when P < 0.05.
FIGURE 9–5
The Options for Paired
t-test Dialog Box Displaying
the Power Option
To run a test, you need to select the data to test. The Pick Columns
dialog box is used to select the worksheet columns with the data you
want to test and to specify how your data is arranged in the worksheet.
1 If you want to select your data before you run the test, drag the
pointer over your data.
2 Open the Pick Columns dialog box to start the Paired t-test. You
can either:
➤ Select Paired t-test from the toolbar drop-down list, then select
the button.
➤ Choose the Statistics menu Before and After command, then
choose Paired t-test.
➤ Click the Run Test button from the Options for Paired t-test
dialog box (see page 337).
3 Select the appropriate data format from the Data Format drop-
down list. If your data is grouped in columns, select Raw. If your
data is in the form of a group index column(s) paired with a data
column(s), select Indexed.
FIGURE 9–6
The Pick Columns
for Paired t-test Dialog Box
Prompting You to
Specify a Data Format
4 Select Next to pick the data columns for the test. If you selected
columns before you chose the test, the selected columns appear in
the Selected Columns list.
The first selected column is assigned to the first row in the Selected
Columns list, and all successively selected columns are assigned to
successive rows in the list. The number or title of selected columns
appear in each row. For raw data you are prompted for two data
columns and for indexed data, you are prompted to select three
worksheet columns.
FIGURE 9–7
The Pick Columns
for Paired t-test Dialog Box
Prompting You to
Select Data Columns
The Paired t-test report displays the t statistic, degrees of freedom, and P
value for the test. The other results displayed in the report are selected in
the Options for Paired t-test dialog box (see Setting Paired t-test Options
on page 333).
For descriptions of the derivations for paired t-test results, you can
consult an appropriate statistics reference. For a list of suggested
references, see page 12.
The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the
buttons in the formatting toolbar to move one page up or down in the
report.
Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options... command and uncheck the Explain Test
Results option.
Normality Test Normality test results display whether the data passed or failed the test of
the assumption that the changes observed in each subject are consistent
with a normally distributed population, and the P value calculated by the test.
This result appears unless you disabled normality testing in the Paired t-
test Options dialog box (see page 335).
FIGURE 9–8
The Paired t-test
Results Report
Summary Table SigmaStat can generate a summary table listing the sample size N,
number of missing values (if any), mean, standard deviation, and
standard error of the means (SEM). This result is displayed unless you
disabled it in the Paired
t-test Options dialog box (see page 336).
Mean The average value for the column. If the observations are
normally distributed the mean is the center of the distribution.
deviation above or below the mean, and about 95% of the observations
will fall within two standard deviations above or below the mean.
Difference The difference of the group before and after the treatment is described in
terms of the mean of the differences (changes) in the subjects before and
after the treatment, and the standard deviation and standard error of the
mean difference.
t Statistic The t-test statistic is computed by subtracting the value observed before the
intervention from the value observed after the intervention in each
experimental subject. The remaining analysis is conducted on these
differences.
You can conclude from large (greater than about 2) absolute values of t that
the treatment affected the variable of interest (you reject the null
hypothesis of no difference). A large t indicates that the difference in
observed value after and before the treatment is larger than would be
expected from effect variability alone (i.e., that the effect is statistically
significant). A small t (near 0) indicates that there is no significant
difference between the samples (little difference in the means before and
after the treatment).
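The calculation described above can be sketched in Python using the standard textbook form of the paired t statistic: the mean of the differences divided by the standard error of the mean difference, with n − 1 degrees of freedom. The function name paired_t and the sample values are hypothetical, and this is an illustration, not SigmaStat's own code.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t, degrees of freedom) for paired observations."""
    # Work on the per-subject changes, not on the raw values
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sem_diff = statistics.stdev(diffs) / math.sqrt(n)   # standard error of the mean difference
    return mean_diff / sem_diff, n - 1

# Hypothetical measurements on five subjects
before = [10.0, 12.0, 9.0, 11.0, 13.0]
after = [12.0, 15.0, 10.0, 14.0, 14.0]

t, df = paired_t(before, after)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

Here the differences are 2, 3, 1, 3, and 1, so the mean difference is 2 and its standard error is 1/√5, giving a t of about 4.47, well above the rough threshold of 2 mentioned above.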
Confidence Interval for the Difference of the Means
If the confidence interval does not include a value of zero, you can
conclude that there is a significant difference with that level of
confidence. Confidence can also be described as P < α, where α is the
acceptable probability of incorrectly concluding that there is an effect.
The level of confidence is adjusted in the Options for Paired t-test dialog
box; this is typically 100(1 − α), or 95%. Larger values of confidence
result in wider intervals.
This result is displayed unless you disabled it in the options for Paired t-
test dialog box (see page 337).
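A minimal Python sketch of this interval follows, assuming the usual textbook form: mean difference ± critical t × standard error of the mean difference. The function name mean_diff_ci and the data are hypothetical. The critical values 2.776 (95%) and 4.604 (99%) are taken from a standard two-tailed t table for 4 degrees of freedom.

```python
import math
import statistics

def mean_diff_ci(diffs, t_crit):
    """Confidence interval for the mean of paired differences."""
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    half_width = t_crit * statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff - half_width, mean_diff + half_width

diffs = [2.0, 3.0, 1.0, 3.0, 1.0]          # hypothetical after-minus-before changes
low95, high95 = mean_diff_ci(diffs, 2.776)  # 95% interval, 4 degrees of freedom
low99, high99 = mean_diff_ci(diffs, 4.604)  # 99% interval, 4 degrees of freedom

# A difference is significant at a given level when the interval excludes zero
significant_95 = not (low95 <= 0.0 <= high95)
significant_99 = not (low99 <= 0.0 <= high99)
print((low95, high95), significant_95)
print((low99, high99), significant_99)
```

With these values the 95% interval excludes zero, while the wider 99% interval does not, which illustrates why larger confidence levels make it harder to declare a difference.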
Power The power, or sensitivity, of a Paired t-test is the probability that the test
will detect a difference between treatments if there really is a difference.
The closer the power is to 1, the more sensitive the test.
This result is displayed unless you disabled it in the Options for Paired t-
test dialog box (see page 337).
The α value is set in the Options for Paired t-test dialog box; the
suggested value is α = 0.05 which indicates that a one in twenty chance
of error is acceptable. Smaller values of α result in stricter requirements
before concluding there is a significant difference, but a greater
possibility of concluding there is no difference when one exists (a Type
II error). Larger values of α make it easier to conclude that there is a
difference but also increase the risk of seeing a false difference (a Type I
error).
You can generate up to three graphs using the results from a paired t-test.
They include a:
Before and After Line Graph
The Paired t-test graph uses lines to plot a subject's change after each
treatment. If the graph plots raw data, the lines represent the rows in the
column, the column titles are used as the tick marks for the X axis and
the data is used as the tick marks for the Y axis.
If the graph plots indexed data, the lines represent the levels in the
subject column, the levels in the treatment column are used as the tick
marks for the X axis, the data is used as the tick marks for the Y axis, and
the treatment and data column titles are used as the axis titles. For an
example of a before and after graph, see page 159.
Histogram of Residuals The Paired t-test histogram plots the raw residuals in a specified range,
using a defined interval set. The residuals are divided into a number of
evenly incremented histogram intervals and plotted as histogram bars
indicating the number of residuals in each interval. The X axis
represents the histogram intervals, and the Y axis represents the number
of residuals in each group. For an example of a histogram, see page 172.
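The binning described above can be sketched in Python as follows. The function name histogram_counts, the choice of four intervals, and the sample residuals are hypothetical; SigmaStat's own rules for choosing the range and interval width are not described in this section.

```python
def histogram_counts(residuals, n_bins):
    """Count residuals in n_bins evenly incremented intervals spanning their range."""
    low, high = min(residuals), max(residuals)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    if width == 0:                      # all residuals identical: one occupied interval
        counts[0] = len(residuals)
        return counts
    for r in residuals:
        # The maximum value belongs to the last interval, not to a new one
        index = min(int((r - low) / width), n_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical residuals (each difference minus the mean difference)
residuals = [0.0, 1.0, -1.0, 1.0, -1.0]
counts = histogram_counts(residuals, 4)
print(counts)
```

Each entry of counts is the height of one histogram bar, read from the lowest interval to the highest.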
Probability Plot The Paired t-test probability plot graphs the frequency of the raw
residuals. The residuals are sorted and then plotted as points around a
curve representing the area of the Gaussian plotted on a probability axis.
Plots with residuals that fall along the Gaussian curve indicate that your data
was taken from a normally distributed population. The X axis is a
linear scale representing the residual values. The Y axis is a probability
scale representing the cumulative frequency of the residuals. For an
example of a probability plot, see page 176.
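The coordinates of such a plot can be sketched in Python. The function name probability_plot_points is hypothetical, and the cumulative frequency (i − 0.5)/n is one common plotting-position convention assumed here; the manual does not state which convention SigmaStat uses.

```python
from statistics import NormalDist

def probability_plot_points(residuals):
    """Return (residual, cumulative frequency, normal quantile) for each sorted residual."""
    ordered = sorted(residuals)
    n = len(ordered)
    standard_normal = NormalDist()
    points = []
    for i, r in enumerate(ordered, start=1):
        cumulative = (i - 0.5) / n                     # assumed plotting position
        quantile = standard_normal.inv_cdf(cumulative) # position on the probability axis
        points.append((r, cumulative, quantile))
    return points

# Hypothetical residuals
points = probability_plot_points([0.0, 1.0, -1.0, 1.0, -1.0])
for residual, cumulative, quantile in points:
    print(f"{residual:5.1f}  {cumulative:4.2f}  {quantile:6.3f}")
```

If the residuals come from a normal population, plotting the residual against the normal quantile gives points that lie close to a straight line.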
FIGURE 9–9
The Create Graph Dialog
Box for Paired t-test Report
Graphs
2 Select the type of graph you want to create from the Graph Type
list, then select OK, or double-click the desired graph in the list.
For more information on each of the graph types, see Chapter 8.
FIGURE 9–10
A Normal Probability
Plot of the Report Data
For information on modifying graphs, see pages 178 through
202.
If you know that the effects are normally distributed, use the Paired t-
test. When there are multiple treatments to compare, do a Friedman
Repeated Measures ANOVA on Ranks.
Note that, depending on your Signed Rank Test option settings (see page
347), if you attempt to perform a Signed Rank Test on a normal population,
SigmaStat suggests that the data can be analyzed with the more powerful
Paired t-test instead.
About the Signed Rank Test
A Signed Rank Test ranks all the observed treatment differences from
smallest to largest without regard to sign (based on their absolute value),
then attaches the sign of each difference to the ranks. The signed ranks
are summed and compared. This procedure uses the size of the
treatment effects and the sign.
The Wilcoxon Signed Rank Test tests the null hypothesis that a treatment has no
effect on the subject.
2 If desired, set the Signed Rank Test options using the Options for
Signed Rank Test dialog box (page 349).
3 Select Signed Rank Test from the toolbar, or choose the Statistics
menu Before and After command, then choose Signed Rank Test.
4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 350).
5 View and interpret the Signed Rank Test report and generate
report graphs (page 351 and page 353).
The format of the data to be tested can be raw data or indexed data; in
either case, the data is found in two worksheet columns. For more
information on arranging data, see Data Format for Repeated Measures
Tests on page 330, or Arranging Data for Contingency Tables on page
69. For information on how to select the data format for a test, see
Picking Data to Test on page 99.
FIGURE 9–11
Valid Data Formats for a
Wilcoxon Signed Rank Test
Columns 1 and 2 are
arranged as raw data.
Columns 3 and 4 are
arranged as indexed data,
with column 3 as the factor
column.
Selecting Data Columns
When running a Wilcoxon Signed Rank Test you can either:
➤ Select the columns to test by dragging your mouse over the columns
before choosing the test.
➤ Select the columns while running the test.
1 If you are going to run the test after changing test options, and
want to select your data before you run the test, drag the pointer
over your data.
2 To open the Options for Signed Rank Test dialog box, select
Signed Rank Test from the drop-down list in the toolbar, then
click the button, or choose the Statistics menu Current Test
Options... command. The Normality option appears (see Figure 9–12 on
page 349).
3 Click the Results tab to view the Summary Table option (see
Figure 9–13 on page 349). Click the Assumption Checking tab to
return to the Normality option.
5 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 350 for more information).
6 To accept the current settings and close the options dialog box,
click OK. To accept the current settings without closing the
options dialog box, click Apply. To close the dialog box without
changing any settings or running the test, click Cancel.
You can select Help at any time to access SigmaStat’s on-line help
system.
The Normality Assumption
Select the Assumption Checking tab from the options dialog box to view
the Normality option. The normality assumption
test checks for a normally distributed population.
The Equal Variance option is not available for the Signed Rank Test because
Signed Rank Tests are based on changes in each individual rather than on
different individuals in the selected population, making equal variance
testing unnecessary.
Summary Table The summary table for a Signed Rank Test lists the medians, percentiles,
and sample sizes N in the Signed Rank Test report. If desired, change the
percentile values by editing the boxes. The 25th and the 75th
percentiles are the suggested percentiles...
FIGURE 9–13
The Options for Signed
Rank Test Displaying
the Summary Table Option
To run a test, you need to select the data to test. The Pick Columns
dialog box is used to select the worksheet columns with the data you
want to test and to specify how your data is arranged in the worksheet.
1 If you want to select your data before you run the test, drag the
pointer over your data.
2 Open the Pick Columns dialog box to start the Signed Rank Test.
You can either:
➤ Select Signed Rank Test from the toolbar drop-down list, then
select the button.
➤ Choose the Statistics menu Before and After, Signed Rank
Test... command.
➤ Click the Run Test button from the Options for Signed Rank Test
dialog box (see page 349).
3 Select the appropriate data format from the Data Format drop-
down list. If your data is grouped in columns, select Raw. If your
data is in the form of a group index column(s) paired with a data
column(s), select Indexed.
FIGURE 9–14
The Pick Columns
for Signed Rank Test
Dialog Box Prompting You
to Specify a Data Format
4 Select Next to pick the data columns for the test. If you selected
columns before you chose the test, the selected columns appear in
the Selected Columns list.
The first selected column is assigned to the first row in the Selected
Columns list, and all successively selected columns are assigned to
successive rows in the list.
When the test is complete, the report appears displaying the results
of the Signed Rank Test (see Figure 9–15 on page 352).
The Signed Rank Test computes the Wilcoxon W statistic and the P value
for W. Additional results to be displayed are selected in the Options for
Signed Rank Test dialog box (see Setting the Signed Rank Options on
page 347).
The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the
buttons in the formatting toolbar to move one page up or down in the
report.
Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options... command and uncheck the Explain Test
Results option.
FIGURE 9–15
The Wilcoxon
Signed
Rank Test Results
Report
Normality Test Normality test results display whether the data passed or failed the test of
the assumption that the difference of the treatment originates from a
normal distribution, and the P value calculated by the test. For
nonparametric procedures this test can fail, since nonparametric tests do
not require normally distributed source populations. This result appears
unless you disabled normality testing in the Options for Signed Rank
Test dialog box (see page 349).
Summary Tables SigmaStat generates a summary table listing the sample sizes N, number
of missing values (if any), medians, and percentiles. All of these results
are displayed in the report unless you disable them in the Signed Rank
Test Options dialog box (see page 349).
Percentiles The two percentile points that define the upper and lower
tails of the observed values.
W Statistic The Wilcoxon test statistic W is computed by ranking all the differences
before and after the treatment based on their absolute value, then
attaching the signs of the difference to the corresponding ranks. The
signed ranks are summed and compared.
If the absolute value of W is “large”, you can conclude that there was a
treatment effect (i.e., the ranks tend to have the same sign, so there is a
statistically significant difference before and after the treatment).
If W is small, the positive ranks are similar to the negative ranks, and you
can conclude that there is no treatment effect.
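The ranking procedure described above can be sketched in Python. The function name signed_rank_w and the data are hypothetical. Two conventions are assumed here because the manual does not state them: zero differences are dropped, and tied absolute differences share the average of their ranks.

```python
def signed_rank_w(before, after):
    """Return (W, signed ranks), where W is the sum of the signed ranks."""
    # Assumption: subjects with no change (zero difference) are excluded
    diffs = [a - b for b, a in zip(before, after) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of tied absolute differences starting at position i
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1       # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    # Attach the sign of each difference to its rank
    signed = [r if d > 0 else -r for r, d in zip(ranks, diffs)]
    return sum(signed), signed

# Hypothetical measurements on five subjects
before = [10.0, 12.0, 9.0, 11.0, 13.0]
after = [12.0, 15.0, 8.5, 15.0, 14.0]

w, signed = signed_rank_w(before, after)
print(signed, w)
```

In this example four of the five signed ranks are positive, so W is far from zero, which is the pattern the text describes as evidence of a treatment effect.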
You can generate a line scatter graph of the changes after treatment for a
Signed Rank Test report.
Before and After Line Graph
The Signed Rank Test graph uses lines to plot a subject's change after
each treatment. If the graph plots raw data, the lines represent the rows
in the column, the column titles are used as the tick marks for the X axis
and the data is used as the tick marks for the Y axis.
If the graph plots indexed data, the lines represent the levels in the
subject column, the levels in the treatment column are used as the tick
marks for the X axis, the data is used as the tick marks for the Y axis, and
the treatment and data column titles are used as the axis titles.
FIGURE 9–16
The Create Graph
Dialog Box for the Signed
Rank Test Report
9 Select the type of graph you want to create from the Graph Type
list, then select OK, or double-click the desired graph in the list.
For more information on each of the graph types, see Chapter 8.
The specified graph appears in a graph window or in the report.
FIGURE 9–17
A Before & After
Scatter Graph
If you know that the treatment effects are not normally distributed, use
the Friedman Repeated Measures ANOVA on Ranks. If you want to
consider the effects of an additional factor on your experimental
treatments, use Two Way Repeated Measures ANOVA. When there is
only a single treatment, you can do a Paired t-test (depending on the
type of results you want).
About the One Way Repeated Measures ANOVA
A One Way or One Factor Repeated Measures ANOVA tests for
differences in the effect of a series of experimental interventions on the
same group of subjects by examining the changes in each individual.
Examining the changes rather than the values observed before and after
interventions removes the differences due to individual responses,
producing a more sensitive (or more powerful) test.
The design for a One Way Repeated Measures ANOVA is essentially the
same as a Paired t-test, except that there can be multiple treatments on
the same group. The null hypothesis is that there are no differences
among all the treatments.
One Way Analysis of Variance is a parametric test that assumes that all
treatment effects are normally distributed with the same standard
deviations (variances).
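For complete data (no missing values), the F statistic of a one way repeated measures design can be sketched in Python with the standard textbook partition of the total sum of squares into treatment, subject, and residual parts. The function name rm_anova_f and the data are hypothetical, and this is not the general linear model SigmaStat uses when values are missing.

```python
def rm_anova_f(data):
    """data[i][j] is the observation for subject i under treatment j.

    Returns (F, treatment degrees of freedom, residual degrees of freedom).
    """
    n = len(data)        # number of subjects
    k = len(data[0])     # number of treatments
    grand_mean = sum(sum(row) for row in data) / (n * k)
    treatment_means = [sum(row[j] for row in data) / n for j in range(k)]
    subject_means = [sum(row) / k for row in data]

    ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
    ss_treatment = n * sum((m - grand_mean) ** 2 for m in treatment_means)
    ss_subject = k * sum((m - grand_mean) ** 2 for m in subject_means)
    # Removing between-subject variation is what makes the test more sensitive
    ss_residual = ss_total - ss_treatment - ss_subject

    df_treatment = k - 1
    df_residual = (n - 1) * (k - 1)
    f = (ss_treatment / df_treatment) / (ss_residual / df_residual)
    return f, df_treatment, df_residual

# Hypothetical data: four subjects (rows), three treatments (columns)
data = [
    [1.0, 2.0, 4.0],
    [2.0, 3.0, 4.0],
    [3.0, 5.0, 6.0],
    [2.0, 4.0, 6.0],
]
f, df_treatment, df_residual = rm_anova_f(data)
print(f"F = {f:.2f} with {df_treatment} and {df_residual} degrees of freedom")
```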
4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 363).
The format of the data to be tested can be raw data or indexed data.
Raw data is placed in as many columns as there are treatments, up to 64;
each column contains the data for one treatment. The columns for raw
data must be the same length.
Selecting Data Columns When running a One Way Repeated Measures ANOVA you can either:
➤ Select the columns to test by dragging your mouse over the columns
before choosing the test.
➤ Select the columns while running the test (see page 99).
Missing Data Points If there are missing values, SigmaStat automatically handles the missing
data by using a general linear model. This approach constructs
hypothesis tests using the marginal sums of squares (also commonly
called the Type III or adjusted sums of squares). However, the columns
must still be equal in length.
1 If you are going to run the test after changing test options, and
want to select your data before you run the test, drag the pointer
over your data.
2 To open the Options for One Way RM ANOVA dialog box,
select One Way RM ANOVA from the toolbar drop-down list,
then click the button, or choose the Statistics menu Current
Test Options... command. The Normality and Equal Variance
options appear (see Figure 10–21 on page 358).
5 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 363 for more information).
6 To accept the current settings and close the options dialog box,
click OK. To accept the current settings without closing the
options dialog box, click Apply. To close the dialog box
without changing any settings or running the test, click Cancel.
You can select Help at any time to access SigmaStat’s on-line help
system.
Normality and Equal Variance Assumptions
Select the Assumption Checking tab from the options dialog box to
view the Normality and Equal Variance options. The normality
assumption test checks for a normally distributed population. The equal
variance assumption test checks the variability about the group means.
FIGURE 10–21
The Options for
One Way RM ANOVA
Dialog Box Displaying
the Assumption
Checking Options
Although the assumption tests are robust in detecting data from populations
that are non-normal or with unequal variances, there are extreme conditions
of data distribution that these tests cannot take into account. For example,
the Levene Median test fails to detect differences in variance of several orders
of magnitude. However, these conditions should be easily detected by
simply examining the data without resorting to the automatic assumption
tests.
Summary Table Select the Results tab in the options dialog box to view the
Summary Table option. The Summary Table option displays the
number of observations for a column or group, the number of missing
values for a column or group, the average value for the column or group,
the standard deviation of the column or group, and the standard error of
the mean for the column or group.
Confidence Intervals Select the Results tab in the options dialog box to view the
Confidence Intervals option. The Confidence Intervals option displays
the confidence interval for the difference of the means. To change the
interval, enter any number from 1 to 99 (95 and 99 are the most
commonly used intervals). Click the selected check box if you do not
want to include the confidence interval in the report.
Residuals Select the Results tab in the options dialog box to view the Residuals
option. Use the Residuals option to display residuals in the report and to
save the residuals of the test to the specified worksheet column. To
change the column the residuals are saved to, edit the number in or
select a number from the drop-down list.
FIGURE 10–22
The Options for One Way
ANOVA Dialog Box
Displaying the Summary
Table Options
Power Select the Post Hoc Tests tab in the options dialog box to view the
Power option. The power or sensitivity of a test is the probability that
the test will detect a difference between the groups if there is really a
difference.
Change the alpha value by editing the number in the Alpha Value box.
FIGURE 10–23
The Options for One Way
ANOVA Dialog Box
Displaying the Power and
Multiple Comparison
Options
Multiple Comparisons Select the Post Hoc Tests tab in the Options dialog box to view the
multiple comparisons options (see Figure 10–23 on page 361). A One
Way Repeated Measures ANOVA tests the hypothesis of no differences
between the several treatment groups, but does not determine which
groups are different, or the sizes of these differences. Multiple
comparison procedures isolate these differences.
To run a test, you need to select the data to test. The Pick Columns
dialog box is used to select the worksheet columns with the data you
want to test and to specify how your data is arranged in the worksheet.
1 If you want to select your data before you run the test, drag the
pointer over your data.
2 Open the Pick Columns dialog box to start the One Way
Repeated Measures ANOVA. You can either:
3 Select the appropriate data format from the Data Format drop-
down list. If your data is grouped in columns, select Raw. If your
data is in the form of a group index column(s) paired with a data
column(s), select Indexed.
FIGURE 10–24
The Pick Columns
for One Way RM ANOVA
Dialog Box Prompting You to
Specify a Data Format
4 Select Next to pick the data columns for the test. If you selected
columns before you chose the test, the selected columns appear in
the Selected Columns list.
The first selected column is assigned to the first row in the Selected
Columns list, and all successively selected columns are assigned to
successive rows in the list. The number or title of each selected column
appears in each row. You are prompted to pick a minimum of two
and a maximum of 64 columns for raw data, and three columns
(subject, factor, and data) for indexed data.
FIGURE 10–25
The Pick Columns
for One Way RM ANOVA
Dialog Box Prompting You
to Select Data Columns
There are seven kinds of multiple comparison tests available for the One
Way Repeated Measures ANOVA.
➤ Holm-Sidak Test
➤ Tukey Test
➤ Student-Newman-Keuls Test
➤ Bonferroni t-test
➤ Fisher’s LSD
➤ Dunnett’s Test
➤ Duncan’s Multiple Range Test
There are two types of multiple comparisons available for the One Way
Repeated Measures ANOVA. The types of comparison you can make
depend on the selected multiple comparison test.
Holm-Sidak Test The Holm-Sidak Test can be used for both pairwise comparisons and
comparisons versus a control group. It is more powerful than the Tukey
and Bonferroni tests and, consequently, it is able to detect differences
that these other tests do not. It is recommended as the first-line
procedure for pairwise comparison testing.
When performing the test, the P values of all comparisons are computed
and ordered from smallest to largest. Each P value is then compared to a
critical level that depends upon the significance level of the test (set in
the test options), the rank of the P value, and the total number of
comparisons made. A P value less than the critical level indicates there is
a significant difference between the corresponding two groups.
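The step-down comparison described above can be sketched in Python. The function name holm_sidak and the P values are hypothetical. The critical level 1 − (1 − α)^(1/(m − i + 1)) for the i-th smallest of m P values is the commonly published Holm-Sidak form, and stopping at the first non-significant comparison is the usual step-down convention; both are assumptions about how SigmaStat applies the test.

```python
def holm_sidak(p_values, alpha=0.05):
    """Return a list of booleans: True where the comparison is declared significant."""
    m = len(p_values)
    # Indices of the comparisons, ordered from smallest to largest P value
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, index in enumerate(order):        # rank 0 is the smallest P value
        remaining = m - rank                    # comparisons not yet tested
        critical_level = 1 - (1 - alpha) ** (1 / remaining)
        if p_values[index] < critical_level:
            significant[index] = True
        else:
            break                               # assumed: all larger P values fail too
    return significant

# Hypothetical unadjusted P values for four pairwise comparisons
p_values = [0.001, 0.04, 0.02, 0.30]
result = holm_sidak(p_values)
print(result)
```

With four comparisons the smallest P value is compared to about 0.0127 and the next to about 0.0170, so only the first comparison is declared significant even though three of the unadjusted P values are below 0.05.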
Tukey Test The Tukey Test and the Student-Newman-Keuls test are conducted
similarly to the Bonferroni t-test, except that they use a table of critical
values that is computed based on a better mathematical model of the
probability structure of the multiple comparisons. The Tukey Test is
more conservative than the Student-Newman-Keuls test, because it
controls the errors of all comparisons simultaneously, while the Student-
Newman-Keuls test controls errors among tests of k means. Because it is
more conservative, it is less likely to determine that a given difference is
statistically significant, and it is the recommended test for all pairwise
comparisons.
Student-Newman-Keuls (SNK) Test
The Student-Newman-Keuls Test and the Tukey Test are conducted
similarly to the Bonferroni t-test, except that they use a table of critical
values that is computed based on a better mathematical model of the
probability structure of the multiple comparisons. The Student-
Newman-Keuls Test is less conservative than the Tukey Test because it
controls errors among tests of k means, while the Tukey Test controls the
errors of all comparisons simultaneously. Because it is less conservative,
it is more likely to determine that a given difference is statistically
significant. The Student-Newman-Keuls Test is usually more sensitive
than the Bonferroni t-test, and is only available for all pairwise
comparisons.
Bonferroni t-test The Bonferroni t-test performs pairwise comparisons with paired t-tests.
The P values are then multiplied by the number of comparisons that
were made. It can perform both all pairwise comparisons and multiple
comparisons vs a control, and is the most conservative test for both
comparison types. For less conservative all pairwise comparison tests, see
the Tukey and the Student-Newman-Keuls tests, and for a less
conservative multiple comparison vs a control test, see Dunnett’s
Test.
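The adjustment described above can be sketched in a few lines of Python. The function name bonferroni_adjust and the P values are hypothetical. Capping the adjusted value at 1 is an assumption added so that the result remains a valid probability; the text only states that the P values are multiplied.

```python
def bonferroni_adjust(p_values):
    """Multiply each P value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical unadjusted P values from four paired t-tests
p_values = [0.001, 0.04, 0.02, 0.30]
adjusted = bonferroni_adjust(p_values)
print(adjusted)
```

After adjustment only the first comparison remains below 0.05, which shows how conservative the procedure becomes as the number of comparisons grows.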
Fisher’s Least Significant Difference Test
The Fisher’s LSD Test is the least conservative all pairwise comparison
test. Unlike the Tukey and the Student-Newman-Keuls tests, it makes no
effort to control the error rate. Because it makes no attempt to
control the error rate when detecting differences between groups, it
is not recommended.
Dunnett’s Test Dunnett's test is the analog of the Student-Newman-Keuls Test for the
case of multiple comparisons against a single control group. It is
conducted similarly to the Bonferroni t-test, but with a more
sophisticated mathematical model of the way the error accumulates in
order to derive the associated table of critical values for hypothesis
testing. This test is less conservative than the Bonferroni Test, and is
only available for multiple comparisons vs a control.
Duncan’s Multiple Range Test
Duncan’s Test is conducted the same way as the Tukey and the Student-
Newman-Keuls tests, except that it is less conservative in determining
whether the difference between groups is significant by allowing a wider
Performing a Multiple Comparison
The multiple comparison test you choose depends on the treatments
you are testing. Select Cancel if you do not want to perform a multiple
comparison test.
Note that in both cases the Bonferroni t-test is most sensitive with a
small number of groups. Dunnett’s test is not available if you have
fewer than six observations.
FIGURE 10–26
The Multiple Comparison
Options Dialog Box
FIGURE 10–27
The Multiple Comparison
Options Dialog Box
Prompting You to
Select a Control Group
The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the
buttons in the formatting toolbar to move one page up or down in the
report.
Result Explanations
In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options... command and uncheck the Explain Test
Options option.
FIGURE 10–28
Example of the
One Way Repeated
Measures ANOVA
Report
If There Were Missing Data Cells
If your data contained missing values, the report indicates the results
were computed using a general linear model. The ANOVA table
includes the degrees of freedom used to compute F, the estimated mean
square equations are listed, and the summary table displays the estimated
least square means.
This result appears unless you disabled equal variance testing in the
Options for One Way RM ANOVA dialog box (see page 358).
Equal Variance Test
Equal Variance test results display whether the data passed or
failed the test of the assumption that the differences of the changes
originate from a population with the same variance, and the P value
calculated by the test. Equal variances of the source populations are
assumed for all parametric tests.
This result appears unless you disabled equal variance testing in the
Options for One Way RM ANOVA dialog box (see page 358).
Summary Table
If you enabled this option in the Options for One Way RM ANOVA
dialog box, SigmaStat generates a summary table listing the sample sizes
N, number of missing values, mean, standard deviation, differences of
the means and standard deviations, and standard error of the means.
Mean
The average value for the column. If the observations are
normally distributed the mean is the center of the distribution.
Power
The power of the performed test is displayed unless you disable this
option in the Options for One Way RM ANOVA dialog box.
The α value is set in the Options for One Way RM ANOVA dialog box;
the suggested value is α = 0.05, which indicates that a one in twenty
chance of error is acceptable. Smaller values of α result in stricter
requirements before concluding there is a significant difference, but a
greater possibility of concluding there is no difference when one exists (a
Type II error). Larger values of α make it easier to conclude that there
is a difference but also increase the risk of seeing a false difference (a
Type I error).
ANOVA Table
The ANOVA table lists the results of the one way repeated measures
ANOVA.
F Statistic
The F test statistic is a ratio used to gauge the differences of the effects.
If there are no missing data, F is calculated as the ratio of the treatment
mean square to the residual mean square, $F = MS_{treat} / MS_{res}$.
If the F ratio is around 1, you can conclude that there are no differences
among treatments (the data is consistent with the null hypothesis that
there are no treatment effects).
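As a rough illustration of the F ratio for balanced data with no missing values, the following Python sketch computes the standard one way repeated measures F as the treatment mean square divided by the residual mean square. The function name and the example numbers are invented for illustration; this is not SigmaStat's own code or output.

```python
def one_way_rm_anova_f(data):
    """data[s][t] is the observation for subject s under treatment t.

    Assumes a balanced design with no missing values.
    """
    n = len(data)        # number of subjects
    k = len(data[0])     # number of treatments
    grand_mean = sum(sum(row) for row in data) / (n * k)
    treatment_means = [sum(row[t] for row in data) / n for t in range(k)]
    subject_means = [sum(row) / k for row in data]

    # Partition the total sum of squares into treatment, subject, and residual parts.
    ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
    ss_treat = n * sum((m - grand_mean) ** 2 for m in treatment_means)
    ss_subj = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_res = ss_total - ss_treat - ss_subj

    df_treat = k - 1
    df_res = (n - 1) * (k - 1)
    ms_treat = ss_treat / df_treat
    ms_res = ss_res / df_res
    return ms_treat / ms_res, df_treat, df_res


# Hypothetical data: three subjects, two treatments.
example = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
f_ratio, df_treat, df_res = one_way_rm_anova_f(example)
print(f_ratio, df_treat, df_res)
```

Removing the between-subject sum of squares from the error term is what distinguishes the repeated measures F from the ordinary one way ANOVA F.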
Expected Mean Squares
If there were missing data and a general linear model was used, the linear
equations for the expected mean squares computed by the model are
displayed. These equations are displayed only if a general linear model
was used.
Multiple Comparisons
If you selected to perform multiple comparisons (see page 364), a table
of the comparisons between group pairs is displayed. The multiple
comparison procedure is activated in the Options for One Way RM
ANOVA dialog box (see page 361). The test used in the multiple
comparison procedure is selected in the Multiple Comparison Options
dialog box (see page 368).
You can conclude from “large” values of t that the difference of the two
treatments being compared is statistically significant.
If the P value for the comparison is less than 0.05, the likelihood of
erroneously concluding that there is a significant difference is less than
5%. If it is greater than 0.05, you cannot confidently conclude that
there is a difference.
Dunnett's test only compares a control group to all other groups. All
tests compute the q test statistic, the number of means spanned in the
comparison p, and display whether or not P < 0.05 for that pair
comparison.
You can conclude from “large” values of q that the difference of the two
treatments being compared is statistically significant.
If the P value for the comparison is less than 0.05, the likelihood of
being incorrect in concluding that there is a significant difference is less
than 5%. If it is greater than 0.05, you cannot confidently conclude
that there is a difference.
You can generate up to three graphs using the results from a One Way
RM ANOVA. They include a:
Before and After Line Graph
The One Way Repeated Measures ANOVA uses lines to plot a subject's
change after each treatment. If the graph plots raw data, the lines
represent the rows in the column, the column titles are used as the tick
marks for the X axis and the data is used as the tick marks for the Y axis.
For an example of a before and after graph, see page 159.
If the graph plots indexed data, the lines represent the levels in the
subject column, the levels in the treatment column are used as the tick
marks for the X axis, the data is used as the tick marks for the Y axis, and
the treatment and data column titles are used as the axis titles.
Histogram of Residuals
The One Way Repeated Measures ANOVA histogram plots the raw
residuals in a specified range, using a defined interval set. The residuals
are divided into a number of evenly incremented histogram intervals and
plotted as histogram bars indicating the number of residuals in each
interval. The X axis represents the histogram intervals, and the Y axis
represents the number of residuals in each interval. For an example of a
histogram, see page 153.
Multiple Comparison Graphs
The One Way Repeated Measures ANOVA multiple comparison graphs
plot significant differences between levels of a significant factor. There is
one graph for every significant factor reported by the specified multiple
comparison test. If there is one significant factor reported, one graph
appears; if there are two significant factors, two graphs appear, etc. If a
factor is not reported as significant, a graph for the factor does not
appear. For an example of a multiple comparison graph, see page 361.
Creating a Graph
To generate a graph of One Way Repeated Measures ANOVA data:
FIGURE 10–29
The Create Graph Dialog
Box
for a One Way RM
ANOVA Report
2 Select the type of graph you want to create from the Graph Type
list, then select OK, or double-click the desired graph in the list.
FIGURE 10–30
A Normal Probability Plot
for a One Way RM ANOVA
SigmaStat performs Two Way Repeated Measures ANOVAs for one factor
repeated or both factors repeated. SigmaStat automatically determines if
one or both factors are repeated from the data, and uses the appropriate
procedures.
About the Two Way Repeated Measures ANOVA
In a two way or two factor repeated measures analysis of variance, there
are two experimental factors which may affect each experimental
treatment. Either or both of these factors are repeated treatments on the
same group of individuals. A two factor design tests for differences
between the different levels of each treatment and for interactions
between the treatments. For information on arranging data for the Two
Way Repeated Measures ANOVA, see page 380.
3 Select Two Way RM ANOVA from the toolbar, then select the
button, or choose the Statistics menu Repeated Measures
command, then choose Two Way Repeated Measures ANOVA.
4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 391).
Either or both of the two factors used in the Two Way Repeated
Measures ANOVA can be repeated on the same group of individuals.
For example, if you analyze the effect of changing salinity on the activity
of two different species of shrimp, you have a two factor experiment
with a single repeated treatment (salinity). Different salinity treatment
Missing Data and Empty Cells
Ideally, the data for a Two Way ANOVA should be completely balanced,
i.e., each group or cell in the experiment has the same number of
observations and there are no missing data. However, SigmaStat
properly handles all occurrences of missing and unbalanced data
automatically.
Note that assuming there is no interaction between the two factors in Two
Way ANOVA can be dangerous. Under some circumstances, this
assumption can lead to a meaningless analysis, particularly if you are
interested in studying the interaction effect.
TABLE 3-4
Data for a Two Way Repeated Factor ANOVA with Two Factors
(Temperature and Salinity) Repeated and a Missing Cell
Data with missing cells that still have repeated factor data for every
subject can be analyzed either by assuming no interaction or as a One
Way ANOVA.

Temperature  Subject  Salinity
                      10      15
25°          A        8.5     11.0
             B        8.5     10.5
             C        9.5     12.0
30°          A        9.0     --
             B        9.0     --
             C        10.0    --
If you treat the problem as One Way ANOVA, each cell in the table is
treated as a different level of a single experimental factor. This approach
TABLE 3-5
Data for a Two Way Repeated Factor ANOVA with Geometrically
Disconnected Data
This data cannot be analyzed with a two way repeated measures ANOVA.

Temperature  Subject  Salinity
                      10      15
25°          A        --      11.0
             B        --      10.5
             C        --      12.0
30°          A        9.0     --
             B        9.0     --
             C        10.0    --
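The idea of geometrically connected data can be sketched in Python. This sketch assumes that "connected" means every occupied cell can be reached from every other occupied cell by moving along rows and columns through occupied cells; that reading, the function name, and the cell encoding are assumptions made for illustration, not SigmaStat's documented algorithm.

```python
from collections import defaultdict


def is_geometrically_connected(occupied_cells):
    """occupied_cells: iterable of (row_level, column_level) pairs that contain data."""
    cells = list(occupied_cells)
    if not cells:
        return False
    # Treat row levels and column levels as nodes of a bipartite graph;
    # each occupied cell is an edge between its row level and its column level.
    neighbors = defaultdict(set)
    for row_level, col_level in cells:
        neighbors[("row", row_level)].add(("col", col_level))
        neighbors[("col", col_level)].add(("row", row_level))
    # Depth-first search from an arbitrary node.
    start = next(iter(neighbors))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    # Connected only if the search reached every level of both factors.
    return len(seen) == len(neighbors)


# Occupied (temperature, salinity) cells: one missing cell versus disconnected data.
one_missing_cell = [("25", "10"), ("25", "15"), ("30", "10")]
disconnected = [("25", "15"), ("30", "10")]
print(is_geometrically_connected(one_missing_cell))
print(is_geometrically_connected(disconnected))
```

In the first layout the 25° row links both salinity levels, so all cells are reachable; in the second layout the two occupied cells share neither a row nor a column.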
TABLE 3-6
Data for a Two Way Repeated Factor ANOVA with Two Factors
Repeated and No Data for One Level for a Subject
This data cannot be analyzed as a one way repeated measures
ANOVA problem.

Temperature  Subject  Salinity
                      10      15
25°          A        --      11.0
             B        --      --
             C        --      12.0
30°          A        9.0     12.5
             B        9.0     --
             C        10.0    13.0
FIGURE 10–31
Valid Data Formats for
a Two Way Repeated
Measures ANOVA with
One Factor Repeated
Column 1 is the subject
index, column 2 is the
non-repeated first factor,
column 3 is the repeated
second factor, and column 4
is the data (see Table 10-1,
on page 10-381).
SigmaStat performs Two Way Repeated Measures ANOVAs for one factor repeated or
both factors repeated. SigmaStat automatically determines if one or both
factors are repeated from the data, and uses the appropriate procedures.
➤ Select the columns to test by dragging your mouse over the columns
or cells before choosing the test.
➤ Select the columns while running the test.
1 If you are going to run the test after changing test options, and
want to select your data before you run the test, drag the pointer
over your data.
2 To open the Options for Two Way RM ANOVA dialog box, select
Two Way RM ANOVA from the toolbar drop-down list, then
click the button, or choose the Statistics menu Current Test
Options... command. The Normality and Equal Variance options
appear (see Figure 10–32 on page 387).
6 To accept the current settings and close the options dialog box,
click OK. To accept the current setting without closing the
options dialog box, click Apply. To close the dialog box without
changing any settings or running the test, click Cancel.
You can select Help at any time to access SigmaStat’s on-line help
system.
Normality and Equal Variance Assumptions
Select the Assumption Checking tab from the options dialog box to view
the Normality and Equal Variance options. The normality assumption
test checks for a normally distributed population. The equal variance
assumption test checks the variability about the group means.
FIGURE 10–32
The Options for Two
Way RM ANOVA Dialog Box
Displaying the Assumption
Checking Options
Although the assumption tests are robust in detecting data from populations
that are non-normal or with unequal variances, there are extreme conditions
of data distribution that these tests cannot take into account. For example,
the Levene Median test fails to detect differences in variance of several orders
of magnitude. However, these conditions should be easily detected by
simply examining the data without resorting to the automatic assumption
tests.
Summary Table
Select the Results tab in the options dialog box to view the summary
table option. The Summary Table option displays the number of
observations for a column or group, the number of missing values for a
column or group, the average value for the column or group, the
standard deviation of the column or group, and the standard error of the
mean for the column or group.
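The quantities listed for the summary table can be sketched in Python as follows. Representing missing values as `None`, the function name, and the dictionary keys are assumptions made for this illustration only; they are not SigmaStat's worksheet conventions.

```python
from math import sqrt
from statistics import mean, stdev


def summarize_column(column):
    """Return N, missing count, mean, standard deviation, and standard error of the mean."""
    present = [x for x in column if x is not None]
    n = len(present)
    sd = stdev(present)  # sample standard deviation (n - 1 denominator)
    return {
        "N": n,
        "Missing": len(column) - n,
        "Mean": mean(present),
        "Std Dev": sd,
        "SEM": sd / sqrt(n),  # standard error of the mean
    }


# Hypothetical column with one missing value.
print(summarize_column([2.0, 4.0, None, 6.0]))
```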
Confidence Interval
Select the Results tab in the options dialog box to view the Confidence
Interval option. The Confidence Intervals option displays the
confidence interval for the difference of the means. To change the
interval, enter any number from 1 to 99 (95 and 99 are the most
FIGURE 10–33
The Options for
Two Way ANOVA Dialog Box
Displaying the Summary
Table, Confidence Intervals,
and Residuals Options
Residuals
Select the Results tab in the options dialog box to view the Residuals
option. Use the Residuals option to display residuals in the report and
to save the residuals of the test to the specified worksheet column. To
change the column the residuals are saved to, edit the number in or
select a number from the drop-down list.
Power
Select the Post Hoc Tests tab in the options dialog box to view the Power
options. The power or sensitivity of a test is the probability that the test
will detect a difference between the groups if there is really a difference.
FIGURE 10–34
The Options for Two
Way RM ANOVA Dialog Box
Displaying the
Power and Multiple
Comparison Options
Change the alpha value by editing the number in the Alpha Value box.
Multiple Comparisons
Select the Post Hoc Test tab in the Options dialog box to view the
multiple comparisons options (see Figure 10–36 on page 396). The
Two Way Repeated Measures ANOVA tests the hypothesis of no
differences between the several treatment groups, but does not determine
which groups are different, or the sizes of these differences. Multiple
comparison procedures isolate these differences.
To run a test, you need to select the data to test. The Pick Columns
dialog box is used to select the worksheet columns with the data you
want to test and to specify how your data is arranged in the worksheet.
1 If you want to select your data before you run the test, drag the
pointer over your data.
If you selected columns before you chose the test, the selected
columns appear in the column list. If you have not selected
columns, the dialog box prompts you to pick your data.
The first selected column is assigned to the first row in the Selected
Columns list, and all successively selected columns are assigned to
successive rows in the list. The number or title of each selected column
appears in each row. You are prompted to pick four data columns;
the first factor in one column, the second factor in a second
FIGURE 10–35
The Pick Columns
for Two Way ANOVA
Dialog Box Prompting You to
Select Data Columns
6 If your data have empty cells, you are prompted to perform the
appropriate procedure.
➤ If you are missing a cell, but the data is still connected, you may
have to proceed by either assuming no interaction between the
factors, or by performing a one factor analysis on each cell.
➤ If your data is not geometrically connected, or if a subject is
missing data for one level, you cannot perform a Two Way
Repeated Measures ANOVA. Continue using a One Way
ANOVA, or cancel the test.
For more information on missing data point and cell handling, see
Missing Data and Empty Cells on page 381.
There are seven multiple comparison tests to choose from for the Two
Way Repeated Measures ANOVA. You can choose to perform the
➤ Holm-Sidak Test
➤ Tukey Test
There are two types of multiple comparisons available for the Two Way
Repeated Measures ANOVA. The types of comparison you can make
depend on the selected multiple comparison test.
When comparing the two factors separately, the treatments within one
factor are compared among themselves without regard to the second
factor, and vice versa. These results should be used when the interaction
is not statistically significant.
Holm-Sidak Test
The Holm-Sidak Test can be used for both pairwise comparisons and
comparisons versus a control group. It is more powerful than the Tukey
and Bonferroni tests and, consequently, it is able to detect differences
that these other tests do not. It is recommended as the first-line
procedure for pairwise comparison testing.
When performing the test, the P values of all comparisons are computed
and ordered from smallest to largest. Each P value is then compared to a
critical level that depends upon the significance level of the test (set in
the test options), the rank of the P value, and the total number of
comparisons made. A P value less than the critical level indicates there is
a significant difference between the corresponding two groups.
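The ordering and critical-level logic described above can be sketched in Python. The sketch assumes the usual Holm-Sidak step-down form, in which the P value of rank i out of k comparisons is compared with 1 - (1 - alpha)^(1/(k - i + 1)) and testing stops at the first comparison that is not significant. The exact formula and the stopping rule are assumptions here, since the text does not spell them out, and the function name is hypothetical.

```python
def holm_sidak(p_values, alpha=0.05):
    """Return a list of booleans: True where the comparison is declared significant."""
    k = len(p_values)
    # Indices of the comparisons, ordered from smallest to largest P value.
    order = sorted(range(k), key=lambda i: p_values[i])
    significant = [False] * k
    for rank, idx in enumerate(order, start=1):
        # Critical level depends on alpha, the rank, and the number of comparisons.
        critical = 1.0 - (1.0 - alpha) ** (1.0 / (k - rank + 1))
        if p_values[idx] < critical:
            significant[idx] = True
        else:
            break  # step-down rule: all larger P values are also not significant
    return significant


# Hypothetical P values for three pairwise comparisons.
print(holm_sidak([0.001, 0.04, 0.03]))
```

Because the critical level grows with the rank, the smallest P value faces the strictest threshold and later comparisons face progressively looser ones.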
Student-Newman-Keuls (SNK) Test
The Student-Newman-Keuls Test and the Tukey Test are conducted
similarly to the Bonferroni t-test, except that they use a table of critical
values that is computed based on a better mathematical model of the
probability structure of the multiple comparisons. The
Student-Newman-Keuls Test is less conservative than the Tukey Test because it
controls errors among tests of k means, while the Tukey Test controls the
errors of all comparisons simultaneously. Because it is less conservative,
it is more likely to determine that a given difference is statistically
significant. The Student-Newman-Keuls Test is usually more sensitive
than the Bonferroni t-test, and is only available for all pairwise
comparisons.
Bonferroni t-test
The Bonferroni t-test performs pairwise comparisons with paired t-tests.
The P values are then multiplied by the number of comparisons that
were made. It can perform both all pairwise comparisons and multiple
comparisons vs a control, and is the most conservative test for both
comparison types. For less conservative all pairwise comparison tests, see
the Tukey and the Student-Newman-Keuls tests, and for a less
conservative multiple comparisons vs a control test, see Dunnett’s
Test.
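The Bonferroni adjustment of the P values can be sketched in Python. The sketch takes the unadjusted P values from the paired t-tests as given and multiplies each by the number of comparisons; capping the result at 1 is a common convention assumed here, and the function name is hypothetical.

```python
def bonferroni_adjust(p_values):
    """Multiply each P value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical unadjusted P values from three pairwise comparisons.
adjusted = bonferroni_adjust([0.01, 0.04, 0.5])
print(adjusted)
```

A comparison is then judged against the chosen significance level using its adjusted P value, which is why the test becomes less sensitive as the number of groups grows.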
Fisher’s Least Significant Difference Test
The Fisher’s LSD Test is the least conservative all pairwise comparison
test. Unlike the Tukey and the Student-Newman-Keuls tests, it makes no
effort to control the error rate. Because it makes no attempt to
control the error rate when detecting differences between groups, it
is not recommended.
Duncan’s Multiple Range
Duncan’s Test is conducted the same way as the Tukey and the
Student-Newman-Keuls tests, except that it is less conservative in determining
whether the difference between groups is significant by allowing a wider
range for error rates. Although it has a greater power to detect
differences than the Tukey and the Student-Newman-Keuls tests, it has
less control over the Type I error rate, and is, therefore, not
recommended.
Performing a Multiple Comparison
The multiple comparison you choose to perform depends on the
treatments you are testing. Select Cancel if you do not want to perform
a multiple comparison procedure.
FIGURE 10–36
The Multiple Comparison
Options Dialog Box for
a Two Way ANOVA
Note that in both cases the Bonferroni t-test is most sensitive with a
small number of groups. Dunnett’s test is not available if you have
fewer than six observations.
FIGURE 10–37
The Multiple Comparison
Options Dialog Box
Prompting You To Select a
Control Group
Tables of least square means for each of the levels of each factor and for the
levels of both factors together are also generated for Two Way Repeated
Measures ANOVAs with either one or two repeated factors.
The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the
buttons in the formatting toolbar that move one page up and down in the
report.
Result Explanations
In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options... command and uncheck the Explain Test
Options option.
If There Were Missing Data or Empty Cells
If your data contained missing values but no empty cells, the report
indicates the results were computed using a general linear model. The
ANOVA table includes the approximate degrees of freedom used to
compute F, the estimated mean square equations are listed, and the
summary table displays the estimated least square means.
If your data contained empty cells, you either analyzed the problem
assuming no interaction, or treated the problem as a One Way ANOVA.
Dependent Variable
This is the column title of the indexed worksheet data you are analyzing
with the Two Way Repeated Measures ANOVA. Determining if the
values in this column are affected by the different factor levels is the
objective of the Two Way Repeated Measures ANOVA.
Normality Test
Normality test results display whether the data passed or failed the test of
the assumption that the differences of the changes originate from a
normal distribution, and the P value calculated by the test. A normally
distributed source is required for all parametric tests.
This result appears if you enabled normality testing in the Options for
Two Way RM ANOVA dialog box (see page 387).
Equal Variance Test
Equal Variance test results display whether the data passed or
failed the test of the assumption that the differences of the changes
originate from a population with the same variance, and the P value
calculated by the test. Equal variance of the source is assumed for all
parametric tests.
ANOVA Table
The ANOVA table lists the results of the two way repeated measures
ANOVA. The results are calculated for each factor, and then between
the factors.
If there are missing data or empty cells, SigmaStat automatically adjusts the F
computations to account for the offsets of the expected mean squares.
If the F ratio is around 1, the data is consistent with the null hypothesis
that there is no effect (i.e., no differences among treatments).
Power
The power of the performed test is displayed unless you disable this
option in the Options for Two Way RM ANOVA dialog box.
The α value is set in the Options for Two Way RM ANOVA dialog box;
the suggested value is α = 0.05, which indicates that a one in twenty
chance of error is acceptable. Smaller values of α result in stricter
requirements before concluding there is a significant difference, but a
greater possibility of concluding there is no difference when one exists (a
Type II error). Larger values of α make it easier to conclude that there
is a difference but also increase the risk of seeing a false difference (a
Type I error).
Expected Mean Squares
If there were missing data and a general linear model was used, the linear
equations for the expected mean squares computed by the model are
displayed. These equations are displayed only if a general linear model
was used.
Summary Table
The least square means and standard error of the means are displayed for
each factor separately (summary table row and column), and for each
combination of factors (summary table cells). If there are missing values,
the least square means are estimated using a general linear model.
Mean
The average value for the column. If the observations are
normally distributed the mean is the center of the distribution.
You can conclude from “large” values of t that the difference of the two
treatments being compared is statistically significant.
If the P value for the comparison is less than 0.05, the likelihood of
erroneously concluding that there is a significant difference is less than
5%. If it is greater than 0.05, you cannot confidently conclude that
there is a difference.
Dunnett's test only compares a control group to all other groups. All
tests compute the q test statistic, the number of means spanned in the
comparison p, and display whether or not P < 0.05 for that pair
comparison.
You can conclude from “large” values of q that the difference of the two
treatments being compared is statistically significant.
If the P value for the comparison is less than 0.05, the likelihood of
being incorrect in concluding that there is a significant difference is less
than 5%. If it is greater than 0.05, you cannot confidently conclude
that there is a difference.
SigmaStat does not apply the DNT logic to all pairwise comparisons because
of differences in the degrees of freedom between different cell pairs.
You can generate up to five graphs using the results from a Two Way
Repeated Measures ANOVA. They include a:
Histogram of Residuals
The Two Way Repeated Measures ANOVA histogram plots the raw
residuals in a specified range, using a defined interval set. The residuals
are divided into a number of evenly incremented histogram intervals and
plotted as histogram bars indicating the number of residuals in each
interval. The X axis represents the histogram intervals, and the Y axis
represents the number of residuals in each interval. For an example of a
histogram, see page 153.
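The binning behind a histogram of residuals can be sketched in Python. The range limits, the number of intervals, and the function name are hypothetical parameters chosen for illustration; the sketch assumes evenly spaced intervals and ignores residuals outside the specified range.

```python
def histogram_counts(residuals, low, high, n_intervals):
    """Count the residuals falling in each of n_intervals equal-width intervals on [low, high]."""
    width = (high - low) / n_intervals
    counts = [0] * n_intervals
    for r in residuals:
        if low <= r <= high:
            # The upper limit itself is counted in the last interval.
            index = min(int((r - low) / width), n_intervals - 1)
            counts[index] += 1
    return counts


# Hypothetical residuals binned into four intervals on [-1, 1].
print(histogram_counts([-1.0, -0.5, 0.0, 0.5, 1.0], -1.0, 1.0, 4))
```

Each count corresponds to the height of one histogram bar on the Y axis.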
Probability Plot
The Two Way Repeated Measures ANOVA probability plot graphs the
frequency of the raw residuals. The residuals are sorted and then plotted
as points around a curve representing the area of the Gaussian plotted on
a probability axis. Plots with residuals that fall along the Gaussian curve
indicate that your data was taken from a normally distributed
population. The X axis is a linear scale representing the residual values.
The Y axis is a probability scale representing the cumulative frequency of
the residuals. For an example of a probability plot, see page 155.
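The coordinates behind a normal probability plot can be sketched in Python. The plotting position (i - 0.5)/n used for the cumulative frequency is one common convention and is an assumption here; the text does not say which formula SigmaStat uses. The function name is hypothetical.

```python
from statistics import NormalDist


def probability_plot_points(residuals):
    """Return (residual, cumulative_frequency, normal_quantile) for each sorted residual."""
    ordered = sorted(residuals)
    n = len(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, r in enumerate(ordered, start=1):
        cumulative_frequency = (i - 0.5) / n  # assumed plotting position
        # Position on a probability axis: the matching standard normal quantile.
        z = standard_normal.inv_cdf(cumulative_frequency)
        points.append((r, cumulative_frequency, z))
    return points


# Hypothetical residuals.
for point in probability_plot_points([0.2, -0.1, 0.0]):
    print(point)
```

If the residuals come from a normal distribution, plotting the residual against the quantile gives points that lie close to a straight line.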
3D Residual Scatter Plot
The Two Way RM ANOVA 3D residual scatter plot graphs the residuals
of the two columns of independent variable data. The X and the Y axes
represent the independent variables, and the Z axis represents the
residuals. For an example of a 3D residual scatter plot, see page 156.
3D Category Scatter Graph
The Two Way Repeated Measures ANOVA 3D Category Scatter plot
graphs the two factors from the independent data columns along the X
and Y axes against the data of the dependent variable column along the
Z axis. The tick marks for the X and Y axes represent the two factors
Multiple Comparison Matrix
The Two Way Repeated Measures ANOVA multiple comparison graphs
plot significant differences between levels of a significant factor. There is
one graph for every significant factor reported by the specified multiple
comparison test. If there is one significant factor reported, one graph
appears; if there are two significant factors, two graphs appear, etc. If a
factor is not reported as significant, a graph for the factor does not
appear. For an example of a multiple comparison graph, see page 160.
Creating a Graph
To generate a graph of Two Way Repeated Measures ANOVA report
data:
FIGURE 10–39
The Create Graph Dialog
Box
for Two Way RM ANOVA
Report Graphs
2 Select the type of graph you want to create from the Graph Type
list, then select OK, or double-click the desired graph in the list.
FIGURE 10–40
Example of a Multiple
Comparison Graph
If you know the treatment effects are normally distributed, use One Way
Repeated Measures ANOVA. If there are only two treatments to
compare, do a Wilcoxon Signed Rank Test. There is no two factor test
for non-normally distributed treatment effects; however, you can
transform your data using Transform menu commands so that it fits the
assumptions of a parametric test.
About the Repeated Measures ANOVA on Ranks
The Friedman Repeated Measures Analysis of Variance on Ranks
compares effects of a series of different experimental treatments on a
single group. Each subject's responses are ranked from smallest to
largest without regard to other subjects, then the rank sums for the
treatments are compared.
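The ranking step can be sketched in Python. Each subject's responses are ranked separately, tied values share the average rank, and the ranks are summed per treatment. The chi-square formula shown is the textbook Friedman statistic without a tie correction; whether SigmaStat applies a tie correction is not stated in this section, so treat that part, the function names, and the example data as assumptions.

```python
def within_subject_ranks(values):
    """Rank one subject's responses from smallest to largest; ties get the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = average_rank
        pos = end + 1
    return ranks


def treatment_rank_sums(data):
    """data[s][t] is the response of subject s to treatment t."""
    sums = [0.0] * len(data[0])
    for row in data:
        for t, rank in enumerate(within_subject_ranks(row)):
            sums[t] += rank
    return sums


def friedman_chi_square(data):
    """Textbook Friedman statistic, without a correction for ties."""
    n, k = len(data), len(data[0])
    sums = treatment_rank_sums(data)
    return 12.0 / (n * k * (k + 1)) * sum(s * s for s in sums) - 3.0 * n * (k + 1)


# Hypothetical data: three subjects, three treatments.
example = [[8.5, 11.0, 9.0], [8.5, 10.5, 9.0], [9.5, 12.0, 10.0]]
print(treatment_rank_sums(example))
print(friedman_chi_square(example))
```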
2 If desired, set the rank sum options using the Options for RM
ANOVA on Ranks dialog box (see following section).
4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 415).
The format of the data to be tested can be raw data or indexed data.
Raw data is placed in as many columns as there are treatments,
up to 64; each column contains the data for one treatment and each row
contains the data for one subject. Indexed data is placed in three
worksheet columns: a factor column, a subject index column, and a data
column.
The columns for raw data must be the same length. If a missing value is
encountered, that individual is ignored.
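The relationship between the two formats can be sketched in Python. The sketch converts raw data (one column per treatment, one row per subject) into indexed rows of subject, factor level, and value, and drops any subject with a missing value, as described above. Using `None` for a missing value, numbering subjects from 1, and the function name are assumptions made for illustration.

```python
def raw_to_indexed(raw_columns):
    """raw_columns maps each treatment name to a list of values, one per subject.

    Returns a list of (subject, treatment, value) rows.
    """
    treatments = list(raw_columns)
    n_subjects = len(raw_columns[treatments[0]])
    rows = []
    for s in range(n_subjects):
        values = [raw_columns[name][s] for name in treatments]
        if any(v is None for v in values):
            continue  # a subject with any missing value is ignored
        for name, value in zip(treatments, values):
            rows.append((s + 1, name, value))
    return rows


# Hypothetical raw data: subject 2 has a missing value under treatment B.
raw = {"A": [1.0, 2.0, 3.0], "B": [4.0, None, 6.0]}
for row in raw_to_indexed(raw):
    print(row)
```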
For more information on arranging data, see Data Format for Repeated
Measures Tests on page 330, or Arranging Data for Contingency Tables
on page 69.
FIGURE 10–41
Valid Data Formats for
a Repeated Measures
ANOVA on Ranks
Columns 1 through 3 are
arranged as raw data.
Columns 4 through 6 are
arranged as indexed data,
with column 4 as the
subject column, column
5 as the factor column,
and column 6 as the data
column.
Selecting Data
Repeated Measures ANOVA on Ranks can be performed on two entire
columns or only a portion of two columns. When running a test you
can either:
➤ Select the columns to test by dragging your mouse over the columns
before choosing the test.
➤ Select the columns while running the test.
1 If you are going to run the test after changing test options, and
want to select your data before you run the test, drag the pointer
over your data.
3 Click the Results tab to view the Summary Table option (see
Figure 10–43 on page 413). Click the Post Hoc Test tab to view
the Power and Multiple Comparisons options (see Figure 10–44
on page 414). Click the Assumption Checking tab to return to the
Normality and Equal Variance options.
5 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 415 for more information).
6 To accept the current settings and close the options dialog box,
click OK. To accept the current setting without closing the
options dialog box, click Apply. To close the dialog box without
changing any settings or running the test, click Cancel.
You can select Help at any time to access SigmaStat’s on-line help
system.
Normality and Equal Variance Assumptions
Select the Assumption Checking tab from the options dialog box to view
the Normality and Equal Variance options. The normality assumption
test checks for a normally distributed population. The equal variance
assumption test checks the variability about the group means.
FIGURE 10–42
The Options for RM
ANOVA on Ranks
Dialog Box Displaying
the Assumption
Checking Options
Although the assumption tests are robust in detecting data from populations
that are non-normal or with unequal variances, there are extreme conditions
of data distribution that these tests cannot take into account. For example,
the Levene Median test fails to detect differences in variance of several orders
Summary Table
Select the Results tab to view the Summary Table option. The summary
table for an ANOVA on Ranks lists the medians, percentiles, and sample
sizes N in the ANOVA on Ranks report. If desired, change the
percentile values by editing the boxes. The 25th and the 75th
percentiles are the suggested percentiles.
FIGURE 10–43
The Options for RM ANOVA
on Ranks Dialog Box
Displaying the Summary
Table Options
Multiple Comparisons
Select the Post Hoc Test tab in the Options dialog box to view the
multiple comparisons options (see Figure 10–44 on page 414).
Repeated Measures ANOVA on Ranks tests the hypothesis of no
differences between the several treatment groups, but does not determine
which groups are different, or the sizes of these differences. Multiple
comparison procedures isolate these differences.
FIGURE 10–44
The Options for RM
ANOVA on Ranks Dialog Box
Displaying the Multiple
Comparison Options
1 If you want to select your data before you run the test, drag the
pointer over your data.
2 Open the Pick Columns dialog box to start the Repeated Measures
ANOVA on Ranks. You can either:
3 Select the appropriate data format from the Data Format drop-
down list. If your data is grouped in columns, select Raw. If your
data is in the form of a group index column(s) paired with a data
column(s), select Indexed.
FIGURE 10–45
The Pick Columns
for RM ANOVA on Ranks
Dialog Box Prompting You to
Specify a Data Format
4 Select Next to pick the data columns for the test. If you selected
columns before you chose the test, the selected columns appear in
the Selected Columns list.
The first selected column is assigned to the first row in the Selected
Columns list, and all successively selected columns are assigned to
successive rows in the list. The number or title of each selected column
appears in its row. For raw data, you are prompted for up to 64
data columns; for indexed data, you are prompted to select
three (Subject, Level, Data) worksheet columns.
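The relationship between the Raw and Indexed formats can be sketched in a few lines of Python. This is only an illustration of the worksheet layouts described above, not SigmaStat's own code; the function name `raw_to_indexed` and the sample values are hypothetical.

```python
def raw_to_indexed(columns):
    """Convert raw data (one column per treatment level, one row per subject)
    into indexed rows of (subject, level, data)."""
    rows = []
    for level, values in columns.items():
        for subject, value in enumerate(values, start=1):
            if value is not None:  # skip missing cells
                rows.append((subject, level, value))
    return rows


# Hypothetical raw worksheet: three treatment columns, three subjects.
raw = {
    "Before": [10, 12, 9],
    "Week 1": [11, 15, 9],
    "Week 2": [14, 16, 12],
}
indexed = raw_to_indexed(raw)
for row in indexed:
    print(row)
```

Each raw column becomes one level in the indexed Level column, so three raw columns with three subjects yield nine indexed rows.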
FIGURE 10–46
The Pick Columns
for RM ANOVA on Ranks
Dialog Box Prompting You to
Select Data Columns
This dialog box displays the P values for each of the two experimental
factors and of the interaction between the two factors. Only the options
with P values less than or equal to the value set in the Options dialog
box are selected. You can disable multiple comparison testing for a factor
by clicking the selected option. If no factor is selected, multiple
comparison results are not reported.
There are four multiple comparison tests to choose from for the
ANOVA on Ranks. You can choose to perform the
➤ Dunn’s Test
➤ Dunnett’s Test
➤ Tukey Test
➤ Student-Newman-Keuls Test
There are two kinds of multiple comparison procedures available for the
Repeated Measures ANOVA on Ranks.
Tukey Test
The Tukey Test and the Student-Newman-Keuls Test are conducted
similarly to the Bonferroni t-test, except that they use a table of critical
values that is computed based on a better mathematical model of the
probability structure of the multiple comparisons. The Tukey Test is
more conservative than the Student-Newman-Keuls Test, because it
controls the errors of all comparisons simultaneously, while the Student-
Newman-Keuls Test controls errors among tests of k means. Because it is
Student-Newman-Keuls (SNK) Test
The Student-Newman-Keuls Test and the Tukey Test are conducted
similarly to the Bonferroni t-test, except that they use a table of critical
values that is computed based on a better mathematical model of the
probability structure of the multiple comparisons. The Student-
Newman-Keuls Test is less conservative than the Tukey Test because it
controls errors among tests of k means, while the Tukey Test controls the
errors of all comparisons simultaneously. Because it is less conservative,
it is more likely to determine that a given difference is statistically
significant. The Student-Newman-Keuls Test is usually more sensitive
than the Bonferroni t-test, and is only available for all pairwise
comparisons.
Dunn's Test Dunn's test must be used for ANOVA on Ranks when the sample sizes
in the different treatment groups are different. You can perform both all
pairwise comparisons and multiple comparisons versus a control with
the Dunn’s test. The all pairwise Dunn’s test is the default for data with
missing values.
Dunnett’s Test Dunnett's test is the analog of the Student-Newman-Keuls Test for the
case of multiple comparisons against a single control group. It is
conducted similarly to the Bonferroni t-test, but with a more
sophisticated mathematical model of the way the error accumulates in
order to derive the associated table of critical values for hypothesis
testing. This test is less conservative than the Bonferroni Test, and is
only available for multiple comparisons vs a control.
Performing a Multiple Comparison
The multiple comparison you choose to perform depends on the
treatments you are testing. Select Cancel if you do not want to perform
a multiple comparison procedure.
FIGURE 10–47
The Multiple Comparison
Options Dialog Box for
the Repeated Measures
ANOVA on Ranks
Note that in both cases SigmaStat defaults to Dunn’s test when your
sample sizes are unequal. You must use Dunn’s test for unequal
sample sizes.
FIGURE 10–48
The Multiple Comparison
Options Dialog Box
Prompting You to
Select a Control Group
The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the
buttons in the formatting toolbar that move one page up and down in the
report.
Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options... command and uncheck the Explain Test
Results option.
Normality Test Normality test results display whether the data passed or failed the test of
the assumption that the differences of the treatments originate from a
normal distribution, and the P value calculated by the test. For
nonparametric procedures this test can fail, as nonparametric tests do
not require normally distributed source populations. This result appears
unless you disabled normality testing in the Options for RM ANOVA
on Ranks dialog box (see page 413).
Equal Variance Test Equal Variance test results display whether or not the data passed or
failed the test of the assumption that the differences of the treatments
originate from a population with the same variance, and the P value
calculated by the test. Nonparametric tests do not assume equal variance
of the source. This result appears unless you disabled equal variance
Summary Table SigmaStat can generate a summary table listing the sample sizes N,
number of missing values, medians, and percentiles defined in the
Options for RM ANOVA on Ranks dialog box.
Percentiles The two percentile points that define the upper and lower
tails of the observed values.
These results appear in the report unless you disable them in the
Options for RM ANOVA on Ranks dialog box (see page 413).
FIGURE 10–49
The Friedman Repeated Measures ANOVA on Ranks Results Report
Multiple Comparisons If a difference is found among the groups, and you requested and elected
to perform multiple comparisons, a table of the comparisons between
group pairs is displayed. The multiple comparison procedure is
activated in the Options for ANOVA on Ranks dialog box (see page
414). The test used in the multiple comparison procedure is selected in
the Multiple Comparison Options dialog box (see page 396).
You can conclude from “large” values of q that the difference of the two
treatments being compared is statistically significant.
If the P value for the comparison is less than 0.05, the likelihood of
being incorrect in concluding that there is a significant difference is less
than 5%. If it is greater than 0.05, you cannot confidently conclude
that there is a difference.
The difference of rank sums is a gauge of the size of the difference
between the two treatments.
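The rank sums mentioned above come from ranking each subject's values across the treatments and then adding the ranks for each treatment. The following is a minimal sketch, assuming the textbook Friedman statistic without a tie correction; this section does not describe SigmaStat's exact computation, and the function names and sample data are hypothetical.

```python
def within_subject_ranks(row):
    """Rank one subject's values (1 = smallest), averaging tied ranks."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sums(data):
    """data: one list per subject, holding one value per treatment."""
    sums = [0.0] * len(data[0])
    for row in data:
        for treatment, rank in enumerate(within_subject_ranks(row)):
            sums[treatment] += rank
    return sums


def friedman_statistic(data):
    """Textbook Friedman chi-square statistic (no tie correction)."""
    n, k = len(data), len(data[0])
    sums = rank_sums(data)
    return 12.0 / (n * k * (k + 1)) * sum(s * s for s in sums) - 3.0 * n * (k + 1)


# Hypothetical data: four subjects, three treatments.
data = [[10, 11, 14], [12, 15, 16], [9, 9.5, 12], [8, 7, 13]]
print(rank_sums(data))
print(friedman_statistic(data))
```

The difference between two entries of the rank-sum list is the quantity the multiple comparison table reports for that pair of treatments.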
SigmaStat does not apply the DNT logic to all pairwise comparisons because
of differences in the degrees of freedom between different cell pairs.
If the P value for the comparison is less than 0.05, the likelihood of
being incorrect in concluding that there is a significant difference is less
than 5%. If it is greater than 0.05, you cannot confidently conclude
that there is a difference.
The difference of rank sums is a gauge of the size of the difference
between the two treatments.
A result of DNT (do not test) appears for those comparison pairs whose
difference of rank means is less than the differences of the first
comparison pair which is found to be not significantly different.
You can generate up to three graphs using the results from a Repeated
Measures ANOVA on Ranks. They include a:
Box Plot The Repeated Measures ANOVA on Ranks box plot graphs each of the
groups being tested as boxes. The ends of the boxes define the 25th and
75th percentiles, with a line at the median and error bars defining the
10th and 90th percentiles.
If the graph data is indexed, the levels in the factor column are used as
the tick marks for the box plot boxes, and the column titles are used as
the axis titles. If the graph data is in raw format, the column titles are
used as the tick marks for the box plot boxes, and default axis titles, X
Axis and Y Axis, are assigned to the graph. For an example of a box plot,
see page 152.
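The five values that define each box can be computed as percentiles of the group's data. The sketch below uses Python's standard `statistics.quantiles` with the "inclusive" interpolation method; SigmaStat's own percentile interpolation rule is not stated here, so the exact values it plots may differ slightly. The function name `box_plot_summary` and the sample data are hypothetical.

```python
import statistics


def box_plot_summary(values):
    """Return the percentiles used to draw one box of a box plot."""
    # 99 cut points: cuts[p - 1] is the p-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {
        "p10": cuts[9],      # lower error bar
        "p25": cuts[24],     # bottom of the box
        "median": cuts[49],  # line inside the box
        "p75": cuts[74],     # top of the box
        "p90": cuts[89],     # upper error bar
    }


summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
print(summary)
```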
Before and After Line Plot
The Repeated Measures ANOVA on Ranks before and after line plot uses
lines to plot a subject's change after each treatment. If the graph plots
raw data, the lines represent the rows in the column, the column titles are
used as the tick marks for the X axis, and the data is used as the tick
marks for the Y axis. If the graph plots indexed data, the lines represent
the levels in the subject column, the levels in the treatment column are used as the tick
Multiple Comparison Graphs
The Repeated Measures ANOVA on Ranks multiple comparison graphs
plot significant differences between levels of a significant factor. There is
one graph for every significant factor reported by the specified multiple
comparison test. If there is one significant factor reported, one graph
appears; if there are two significant factors, two graphs appear, etc. If a
factor is not reported as significant, a graph for the factor does not
appear. For an example of a multiple comparison graph, see page 160.
FIGURE 10–50
The Create Graph Dialog
Box
for the Repeated Measures
ANOVA on Ranks Report
Graph
2 Select the type of graph you want to create from the Graph Type
list, then select OK, or double-click the desired graph in the list.
FIGURE 10–51
A Box Plot for a Repeated
Measures ANOVA on Ranks
Use rate and proportion tests to compare two or more sets of data for
differences in the number of individuals that fall into different classes or
categories. All these tests are found under the Statistics menu Rates and
Proportions command.
Rate and proportion tests are used when the data is measured on a
nominal scale. Rate and proportion comparisons test for significant
differences in the categorical distribution of the data beyond what can be
attributed to random variation.
See Choosing the Rate and Proportion Comparison to Use on page 122
for more information on when to use the different SigmaStat frequency,
rate, and proportion tests.
Contingency Tables Many rate and proportion tests utilize a contingency table which lists
the groups and/or categories to be compared as the table column and
row titles, and the number of observations for each combination of
category or group as the table cells. See Figure 11-1 on page 431 for an
example of a simple contingency table. A contingency table is used to
determine whether or not the distribution of a group is contingent on
the categories it falls in.
A 2 x 2 contingency table has two groups and two categories (i.e., two
rows and two columns). A 2 x 3 table has two groups and three
categories or three groups and two categories, etc.
Comparing the Proportions of Two Groups in One Category
Use a z-test to compare the proportions of two groups found within a
single category for a significant difference. The z-test is performed using
the Rates and Proportions, z-test command.
Comparing Proportions of Multiple Groups in Multiple Categories
You can use analysis of contingency tables to test if the distributions of
two or more groups within two or more categories are significantly
different.
Comparing Proportions of the Same Group to Two Treatments
You can test for differences in the proportions of the responses in the
same individuals to a series of two different treatments using McNemar's
Test for changes.
Yates Correction
The Yates Correction for continuity can be automatically applied to the
z-test and to all tests using 2 x 2 tables or comparisons with the χ²
distribution with one degree of freedom. It is generally accepted that the
Yates Correction yields a more accurately computed P value in these
cases.
For descriptions of the Yates Correction Factor, you can reference any
appropriate statistics reference. For a list of suggested references, see
page 12.
The exact format for each rate and proportion test varies from test to
test.
Note that whenever numbers of observations are listed, they must always be
integers.
z-test The data for a z-test is always placed in two worksheet rows by two
columns. The size (total number of observations) of each group is in
one column, and the corresponding proportion p of the observations
within the category is in a second column. The number of observations
must always be an integer, and the proportions p must be between 0 and
1.
χ² Analysis of Contingency Tables
The data can be arranged in the worksheet as either the contingency
table data or as indexed raw data.
Note that the order and location of the rows or columns corresponding
to the groups and categories is unimportant. You can use the rows for
category and the columns for group, or vice versa.
TABLE 11-1
A Contingency Table Describing the Number of Lowland and Alpine Species
Found at Different Locations

Species     Location
            Tundra    Foothills    Treeline
Lowland     125       16           6
Alpine      7         19           117
Raw Data You can report the group and category of each individual
observation by placing the group in one worksheet column and the
corresponding category in another column. Each row corresponds to a
single observation, so there should be as many rows of data as there are
total numbers of observations.
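The two arrangements carry the same information. As an illustration (not SigmaStat code), the following Python sketch expands the counts of the lowland and alpine species table into one raw row per observation, then tabulates the raw rows back into counts; the variable names and the helper `tabulate` are hypothetical.

```python
from collections import Counter

# Tabulated data: counts per (group, category) combination.
table = {
    "Lowland": {"Tundra": 125, "Foothills": 16, "Treeline": 6},
    "Alpine": {"Tundra": 7, "Foothills": 19, "Treeline": 117},
}

# Raw data: one (group, category) row for every individual observation.
raw = [
    (species, location)
    for species, row in table.items()
    for location, count in row.items()
    for _ in range(count)
]


def tabulate(rows):
    """Count how many raw rows fall into each (group, category) cell."""
    return Counter(rows)


counts = tabulate(raw)
print(len(raw))                      # total number of observations
print(counts[("Lowland", "Tundra")])
```

The raw form has as many rows as there are observations, which is why tabulated entry is usually more convenient for large counts.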
FIGURE 10–1
Worksheet Data
Arrangement for
Contingency Table Data
from Table 8-1
Columns 1 through 3 are in
tabular format, and columns
4 and 5 are raw data.
Fisher Exact Test The data must form a 2 x 2 contingency table, with the number of
observations in each cell. You can test tabulated data or raw data
observations.
TABLE 3-2
A 2 x 2 Contingency Table Describing the Number of Harbor Seals and
Sea Lions Found on Two Different Islands

Pinniped Species    Island
                    Island 1    Island 2
Sea Lions           5           1
Harbor Seals        2           7
FIGURE 10–2
Data Formats for
a Fisher Exact Test
Columns 1 and 2 are in
tabular format and
columns 3 and 4 are
raw data observations.
For information on specifying a data format for a Fisher Exact Test, see
page 454.
McNemar's Test
The data must form a table with the same number of rows and columns,
since both treatments must have the same number of categories. You
can test tabulated data or raw data observations.
FIGURE 10–3
Data Formats for
McNemar Test
Columns 1 through 3 are in
tabular format, and columns
4 through 6 are raw data
observations.
If you have data for the numbers of observations for each group that fall
in two categories, perform χ² analysis of contingency tables instead. This
will produce the same P value as the z-test. You can also run the χ²
analysis of contingency tables if you have more than two groups or
categories.
About the z-test The z-test comparison of proportions is used to determine if the
proportions of two groups within one category or class are significantly
different. The
z-test assumes that:
2 If desired, set the z-test options using the Options for z-test dialog
box
(page 435).
3 Select z-test from the toolbar, then click the button, or choose
the Statistics menu Rates and Proportions command, then choose
z-test.
4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 438).
To compare two proportions, enter the two sample sizes in one column
and the corresponding observed proportions p in a second column.
There must be exactly two rows and two columns. The sample sizes
must be whole numbers and the observed proportions must be between
0 and 1. For more information see Data Format for Rate and
Proportion Tests on page 431.
1 If you are going to run the test after changing test options, and
want to select your data before you run the test, drag the pointer
over your data.
2 To open the Options for z-test dialog box, select z-test from the
toolbar drop-down, then click the button, or choose the
Statistics menu Current Test Options... command. The Power,
Yates Correction Factor, and Confidence Interval options appear.
3 Click a check box to enable or disable a test option. All options are
saved between SigmaStat sessions.
FIGURE 10–4
The Options for z-test Dialog
Box
4 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 438 for more information).
5 To accept the current settings and close the options dialog box,
click OK. To accept the current setting without closing the
options dialog box, click Apply. To close the dialog box without
changing any settings or running the test, click Cancel.
You can select Help at any time to access SigmaStat’s on-line help
system.
Power Leave the Power option selected to detect the sensitivity of the test. The
power or sensitivity of a test is the probability that the test will detect a
Change the alpha value by editing the number in the Alpha Value box.
The Yates Correction Factor
When a statistical test uses a χ² distribution with one degree of freedom,
such as analysis of a 2 x 2 contingency table or McNemar's test, the χ²
calculated tends to produce P values which are too small, when
compared with the actual distribution of the χ² test statistic. The
theoretical χ² distribution is continuous, whereas the distribution of the
χ² test statistic is discrete.
Use the Yates Correction Factor to adjust the computed χ² value down
to compensate for this discrepancy. Using the Yates correction makes a
test more conservative, i.e., it increases the P value and reduces the
chance of a false positive conclusion. The Yates correction is applied to
2 x 2 tables and other statistics where the P value is computed from a χ²
distribution with one degree of freedom.
Click the selected check box to turn the Yates Correction Factor on or
off.
Confidence Interval This is the confidence interval for the difference of proportions. To
change the specified interval, select the box and type any number from 1
to 99 (95 and 99 are the most commonly used intervals).
Running a z-test
To run a test, you need to select the data to test. The Pick Columns
dialog box is used to select the worksheet columns with the data you
want to test and to specify how your data is arranged in the worksheet.
To run a z-test:
1 If you want to select your data before you run the test, drag the
pointer over your data.
2 Open the Pick Columns dialog box to start the z-test. You can
either:
➤ Select z-test from the toolbar drop-down list, then select the
button.
➤ Choose the Statistics menu Rates and Proportions command,
then choose z-test...
➤ Click the Run Test button from the Options for z-test dialog
box (see step 4 on page 436).
If you selected columns before you chose the test, the selected
columns appear in the column list. If you have not selected
columns, the dialog box prompts you to pick your data.
The first selected column is assigned to the Size row in the Selected
Columns list, and the second column is assigned to the Proportion
row in the list. The title of each selected column appears in its row.
You can only select one Size and one Proportion data column.
FIGURE 10–5
The Pick Columns
for z-test Dialog Box
Prompting You to
Select Data Columns
The z-test report displays a table of the statistical values used, the z
statistic, and the P value for the test. You can also display a confidence
interval for the difference of the proportions using the Options for z-test
dialog box (see Setting z-test Options on page 435).
For descriptions of the derivation for z-test results, you can reference any
appropriate statistics reference. For a list of suggested references, see
page 12.
The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the
buttons in the formatting toolbar that move one page up and down in the
report.
Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options... command and uncheck the Explain Test
Results option.
Statistical Summary The summary table for a z-test lists the sizes of the groups n and the
proportion of each group in the category p. These values are taken
directly from the data.
FIGURE 10–6
The z-test
Comparison of
Proportions
Results Report
You can conclude from “large” absolute values of z that the proportions
of the populations are different. A large z indicates that the difference
between the proportions is larger than what would be expected from
sampling variability alone (i.e., that the difference between the
proportions of the two groups is statistically significant). A small z (near
0) indicates that there is no significant difference between the
proportions of the two groups.
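As a worked illustration of how z relates to the worksheet values, the sketch below computes z from two sample sizes and two observed proportions using the standard pooled-proportion formula, with an optional Yates continuity correction, and derives a two-tailed P value from the normal distribution. This is a textbook formulation assumed for illustration; the manual does not give SigmaStat's formula, and the function name and example values are hypothetical.

```python
import math


def z_test_proportions(n1, p1, n2, p2, yates=False):
    """Return (|z|, two-tailed P) for the difference of two proportions."""
    # Pooled proportion under the hypothesis of no difference.
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    diff = abs(p1 - p2)
    if yates:
        # Continuity correction: shrink the difference, never below zero.
        diff = max(0.0, diff - 0.5 * (1 / n1 + 1 / n2))
    z = diff / std_err
    p_value = math.erfc(z / math.sqrt(2))  # two-tailed normal probability
    return z, p_value


# Hypothetical worksheet: sizes 50 and 50, proportions 0.6 and 0.4.
z_plain, p_plain = z_test_proportions(50, 0.6, 50, 0.4)
z_yates, p_yates = z_test_proportions(50, 0.6, 50, 0.4, yates=True)
print(z_plain, p_plain)
print(z_yates, p_yates)
```

With the correction enabled, z is smaller and the P value larger, which is the more conservative behavior described for the Yates correction.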
If you enabled the Yates correction in the Options for z-test dialog box,
the calculated value of z is slightly smaller, to account for the difference
between the theoretical and calculated values of z. For more
Confidence Interval for the Difference
If the confidence interval does not include zero, you can conclude that
there is a significant difference between the proportions with the level of
confidence specified. This can also be described as P < α, where α is the
acceptable probability of incorrectly concluding that there is a
difference.
This result is displayed unless you disable it in the Options for z-test
dialog box (see page 437).
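A confidence interval for the difference of two proportions can be sketched with the common normal-approximation (unpooled standard error) formula. The manual does not state which formula SigmaStat uses, so treat this as an assumption for illustration only; the function name and the example values are hypothetical.

```python
import math
from statistics import NormalDist


def diff_confidence_interval(n1, p1, n2, p2, level=95):
    """Normal-approximation confidence interval for p1 - p2.

    level is the confidence level in percent (1 to 99).
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 200)
    std_err = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_crit * std_err, diff + z_crit * std_err


low, high = diff_confidence_interval(50, 0.6, 50, 0.4, level=95)
print(low, high)
print("significant" if low > 0 or high < 0 else "not significant")
```

In this example the 95% interval lies entirely above zero, so the difference would be judged significant at that confidence level.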
Power The power, or sensitivity, of a z-test is the probability that the test will
detect a difference among the groups if there really is a difference. The
closer the power is to 1, the more sensitive the test. z-test power is
affected by the sample size and the observed proportions of the samples.
This result is displayed unless you disabled it in the Options for z-test
dialog box (see page 435).
The α value is set in the z-test Power dialog box (see page 719 in
Chapter 10, Comparing Frequencies, Rates, and Proportions); the
suggested value is α = 0.05, which indicates that a one in twenty chance
of error is acceptable. Smaller values of α result in stricter requirements
before concluding there is a difference in distribution, but a greater
A 2 x 2 contingency table has two groups and two categories (i.e., two
rows and two columns); a 2 x 3 table has two groups and three categories
or three groups and two categories, etc.
TABLE 3-4
A Contingency Table Describing the Number of Lowland and Alpine Species
Found at Different Locations

Species     Location
            Tundra    Foothills    Treeline
Lowland     125       16           6
Alpine      7         19           117
The χ² test uses the percentages of the row and column totals for each
cell to compute the expected number of observations per cell if the
2 If desired, set the Chi-Square options using the Options for Chi-
Square dialog box (page 444).
3 Select Chi-Square from the toolbar drop-down list, then select the
button, or choose the Statistics menu Rates and Proportions
command, then choose Chi-Square.
4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 446).
The data format used in the test is specified in the Pick Columns dialog
box. For more information on selecting a data format in the Pick
Columns dialog box, see page 447.
Tabulated Data
Tabulated data is arranged in a contingency table using the worksheet
rows and columns as the groups and categories. The number of
observations for each combination of group and category is entered into
the appropriate cell.
Raw Data Raw data uses a row for each individual observation, and places the
corresponding groups for the observations in one column and the
categories in a second column. SigmaStat automatically determines the
number of groups and categories used. For more information on
arranging data as indexed data, see Data Format for Rate and Proportion
Tests on page 431.
FIGURE 10–7
Valid Data Formats for a χ² Test
Columns 1 through 3 are arranged as a contingency table. Columns 4 and
5 are raw data for the observations. Each row corresponds to a single
observation.
Selecting Data Columns When running a Chi-Square test, you can either:
1 If you are going to run the test after changing test options, and
want to select your data before you run the test, drag the pointer
over your data.
3 Click a check box to enable or disable a test option. All options are
saved between SigmaStat sessions.
4 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 446 for more information).
5 To accept the current settings and close the options dialog box,
click OK. To accept the current setting without closing the
options dialog box, click Apply. To close the dialog box without
changing any settings or running the test, click Cancel.
You can select Help at any time to access SigmaStat’s on-line help
system.
Power Leave the Power option selected to detect the sensitivity of the test. The
power or sensitivity of a test is the probability that the test will detect a
difference between the proportions of two groups if there is really a
difference.
FIGURE 10–8
The Options for
Chi-Square Dialog Box
Change the alpha value by editing the value in the Alpha Value box.
The Yates Correction Factor
When a statistical test uses a χ² distribution with one degree of freedom,
such as analysis of a 2 x 2 contingency table or McNemar's test, the χ²
calculated tends to produce P values which are too small, when
compared with the actual distribution of the χ² test statistic. The
theoretical χ² distribution is continuous, whereas the χ² produced with
real data is discrete.
You can use the Yates Continuity Correction to adjust the computed χ²
value down to compensate for this discrepancy. Using the Yates
correction makes a test more conservative, i.e., it increases the P value
and reduces the chance of a false positive conclusion. The Yates
correction is applied to 2 x 2 tables and other statistics where the P value
is computed from a χ² distribution with one degree of freedom.
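The effect of the correction on a 2 x 2 table can be illustrated with a short Python sketch. It assumes the usual textbook form of the Yates correction (subtracting 0.5 from each absolute observed-minus-expected difference); the manual does not spell out SigmaStat's formula, and the function name and cell counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d, yates=False):
    """Return (chi-square, P) for the 2 x 2 table [[a, b], [c, d]]."""
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    n = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(0.0, diff - 0.5)  # continuity correction
            chi2 += diff * diff / expected
    # Upper-tail probability of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


plain = chi_square_2x2(30, 20, 20, 30)
corrected = chi_square_2x2(30, 20, 20, 30, yates=True)
print(plain)
print(corrected)
```

The corrected statistic is smaller and its P value larger, so a borderline result can move from significant to not significant when the correction is turned on.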
Click the check box to turn the Yates Correction Factor on or off.
To run a test, you need to select the data to test. The Pick Columns
dialog box is used to select the worksheet columns with the data you
want to test and to specify how your data is arranged in the worksheet.
1 If you want to select your data before you run the test, drag the
pointer over your data.
2 Open the Pick Columns dialog box to start the Chi-Square test.
You can either:
3 Select the appropriate data format from the Data Format drop-
down list. If you are testing contingency table data, select
Tabulated. If your data is arranged in raw format, select Raw (see
page 443).
FIGURE 10–9
The Pick Columns
for Chi-Square Test Dialog
Box
Prompting You to
Specify a Data Format
For more information on arranging data, see Data Format for Rate
and Proportion Tests on page 431, or Arranging Data for
Contingency Tables on page 69.
4 Select Next to pick the data columns for the test. If you selected
columns before you chose the test, the selected columns appear in
the Selected Columns list.
If you selected columns before you chose the test, the selected
columns appear in the column list. If you have not selected
columns, the dialog box prompts you to pick your data.
FIGURE 10–10
The Pick Columns
for Chi-Square Dialog Box
Prompting You to
Select Data Columns
7 Select Finish to run the test. If there are too many cells in a
contingency table with expected values below 5, SigmaStat either:
The report for a χ² test lists a summary of the contingency table data,
the χ² statistic calculated from the distributions, and the P value for χ².
For descriptions of the derivations for χ² test results, you can reference
any appropriate statistics reference. For a list of suggested references, see
page 12.
The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the
buttons in the formatting toolbar that move one page up and down in the
report.
Results Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options... command and click the selected Explain Test
Results check box.
FIGURE 10–11
A Chi-Square Test Results Report
Chi-Square (χ²)
χ² is the summed squared differences between the observed frequencies
in each cell of the table and the expected frequencies, or

\[
\chi^2 = \sum \frac{(\text{observed} - \text{expected numbers per cell})^2}{\text{expected numbers per cell}}
\]
This computation assumes that the rows and columns are independent.
If the value of χ² is large, you can conclude that the distributions are
different (i.e., that there is a large difference between the expected and
observed frequencies, indicating that the rows and columns are not
independent).
Values of χ² near zero indicate that the pattern in the contingency table
is no different from what one would expect if the counts were
distributed at random.
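The computation can be sketched in Python for a table of any size. The expected count for each cell is taken as (row total x column total) / grand total, which is the standard consequence of assuming independent rows and columns; the function name `chi_square` is hypothetical, and the counts are those of the lowland and alpine species table.

```python
def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


species = [[125, 16, 6], [7, 19, 117]]  # Lowland, Alpine by location
chi2, df = chi_square(species)
print(chi2, df)

# A table whose rows are exactly proportional gives a statistic of zero.
print(chi_square([[10, 20], [20, 40]]))
```

The species table produces a very large statistic on two degrees of freedom, reflecting that lowland and alpine species are distributed very differently across the three locations.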
Power The power, or sensitivity, of a Chi-Square test is the probability that the
test will detect a difference among the groups if there really is a
difference. The closer the power is to 1, the more sensitive the test.
Chi-Square power is affected by the sample size and the observed
proportions of the samples. This result is displayed if you selected this
option in the Options for Chi-Square dialog box.
The α value is set in the Power Option dialog box (see Determining the
Power of a Chi-Square Test on page 723). The suggested value is α =
0.05, which indicates that a one in twenty chance of error is acceptable.
Smaller values of α result in stricter requirements before concluding
there is a difference in distribution, but a greater possibility of
concluding there is no difference when one exists (a Type II error).
Larger values of α make it easier to conclude that there is a difference,
but also increase the risk of seeing a false difference (a Type I error).
If no cells have fewer than five expected observations, you can use a χ² test.
About the Fisher Exact Test
The Fisher Exact Test determines the exact probability of observing a
specific 2 x 2 contingency table (or a more extreme pattern). Use the
Fisher Exact Test instead of χ² analysis of a 2 x 2 contingency table when
the expected frequency of one or more cells is less than 5.
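The choice between the two tests depends on the expected, not the observed, cell counts. A minimal sketch of that check, using the harbor seal and sea lion counts; the function names are hypothetical, and the rule coded here is only the one stated above (any expected count below 5 points to the Fisher Exact Test).

```python
def expected_counts(table):
    """Expected counts for each cell, assuming independent rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def suggest_test(table):
    """Suggest a test for a 2 x 2 table based on its expected counts."""
    has_small_cell = any(e < 5 for row in expected_counts(table) for e in row)
    return "Fisher Exact Test" if has_small_cell else "Chi-Square"


seals = [[5, 1], [2, 7]]  # Sea Lions, Harbor Seals by island
print(expected_counts(seals))
print(suggest_test(seals))
print(suggest_test([[30, 20], [20, 30]]))
```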
2 Select Fisher Exact Test from the toolbar, then select the
button, or choose the Statistics menu Rates and Proportions
command, then choose Fisher Exact Test.
3 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 453).
4 View and interpret the Fisher Exact Test report (page 455).
The data for a Fisher Exact Test must form a 2 x 2 contingency table, that
is, exactly two rows by two columns. The data can be tabulated data in
a 2 x 2 table entered in the worksheet or two columns of raw data.
Tabulated Data Tabulated or contingency table data uses the rows to represent the two
groups, and the columns to represent the two categories, or vice versa.
The number of individuals that fall into each combination of groups
and categories is entered into each cell. There should be no more than
two rows and two columns.
Raw Data Raw data uses a row for each individual observation, and places the
corresponding groups for the observations in one column and the
FIGURE 10–12
Valid Data Formats for a
Fisher Exact Test
Columns 1 and 2 are
arranged as a 2 x 2
contingency table, and
columns 3 and 4 are the
raw observation data.
To run a test, you need to select the data to test. The Pick Columns
dialog box is used to select the worksheet columns with the data you
want to test and to specify how your data is arranged in the worksheet.
1 If you want to select your data before you run the test, drag the
pointer over your data.
2 Open the Pick Columns dialog box to start the Fisher Exact Test.
You can either:
➤ Select Fisher Exact Test from the toolbar drop-down list, then
select the button.
➤ Choose the Statistics menu Rates and Proportions command,
then choose Fisher Exact Test...
3 Select the appropriate data format from the Data Format drop-
down list. If you are testing contingency table data, select
Tabulated. If your data is arranged in raw format, select Raw (see
page 452).
FIGURE 10–13
The Pick Columns
for Fisher Exact Test
Dialog Box Prompting You to
Specify a Data Format
For more information on arranging data, see Data Format for Rate
and Proportion Tests on page 431, or Arranging Data for
Contingency Tables on page 69.
4 Select Next to pick the data columns for the test. If you selected
columns before you chose the test, the selected columns appear in
the Selected Columns list. If you have not selected columns, the
dialog box prompts you to pick your data.
FIGURE 10–14
The Pick Columns
for Fisher Exact Test
Dialog Box Prompting You to
Select Data Columns
7 Select Finish to run the test. If there are no cells in the table with
expected values below 5, SigmaStat suggests the χ² test instead (the
Fisher Exact Test can be used, but takes longer to compute).
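The expected value of a cell can be checked by hand before choosing a test. The following is a minimal sketch, assuming the usual definition of an expected count under independence (row total times column total divided by the grand total); the function names are hypothetical and are not part of SigmaStat.

```python
def expected_counts(table):
    """Expected cell counts under independence:
    row total * column total / grand total (assumed definition)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def any_expected_below(table, threshold=5):
    """True if at least one cell has an expected count below the threshold."""
    return any(e < threshold for row in expected_counts(table) for e in row)


small = [[8, 2], [1, 5]]      # some expected counts fall below 5
large = [[20, 30], [30, 20]]  # every expected count is 25
print(any_expected_below(small), any_expected_below(large))
```

A table like `small` is the kind of case where an exact test is the natural choice; for a table like `large`, the suggestion described in the step above would apply.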
For descriptions of the derivations for Fisher Exact Test results, you can
reference any appropriate statistics reference. For a list of suggested
references, see page 12.
The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the and
buttons in the formatting toolbar to move one page up and down in the
report.
Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
FIGURE 10–15
A Fisher Exact Test
Results Report
The Fisher Exact Test computes P directly using a two tailed probability.
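As an illustration of a directly computed two-tailed probability, the sketch below sums the hypergeometric probabilities of every 2 x 2 table that has the same row and column totals and is no more likely than the observed table. This is one common definition of the two-tailed P value; the manual does not state which definition SigmaStat uses, so treat the function (a hypothetical name) as an assumption rather than the program's algorithm.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher exact P for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the tables that are as likely as, or less likely than, the observed one
    # (a small relative tolerance guards against floating-point ties)
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))


print(fisher_exact_two_tailed(3, 0, 0, 3))
```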
McNemar’s Test
About the McNemar's Test is an analysis of contingency tables that have repeated
McNemar Test observations of the same individuals. These table designs are used when
2 If desired, set the McNemar Test options using the Options for
McNemar’s dialog box (page 459).
3 Select McNemar Test from the toolbar, then select the button,
or choose the Statistics menu Rates and Proportions command,
then choose McNemar Test.
4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 461).
The data for McNemar's Test must form a contingency table that has
exactly the same number of rows and columns. The data can be
tabulated in a table entered in the worksheet or from two columns of
raw data.
TABLE 3-5
A 3 x 3 Contingency Table Describing the Effect of a Report on the
Opinion of Surveyed People
The McNemar Test ignores people who didn’t change their opinion.

                            After Report
Before Report     Approve    Disapprove    Don’t Know
Approve              12          24             6
Disapprove            5          32             3
Don’t Know            4           6             4
Tabulated Data For tabulated or contingency table data, the worksheet rows correspond
to one set of treatment categories and the columns to the other set of
treatment categories. The number of individuals that fall into each
combination of categories is entered into each cell. The categories
assigned to the rows are assumed to be in the same order of occurrence as
the columns. Because the same set of categories is used for the two
different treatments, the number of rows and the number of columns in
the table are always the same.
Raw Data Raw data uses a row for each individual observation, and places the
corresponding groups for the first treatment category in one column and
The data format used when running a test is specified in the Pick
Columns dialog box. See page 461 for more information.
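The relationship between the two data formats can be sketched in a few lines. The example below, which is only an illustration and not SigmaStat code, counts paired raw observations into a square table whose rows and columns use the same category order. The function name and the sample responses are hypothetical.

```python
from collections import Counter


def tabulate(first, second):
    """Cross-tabulate two columns of raw paired observations.

    Rows are the categories of the first treatment, columns the categories
    of the second, both in the same sorted order.
    """
    categories = sorted(set(first) | set(second))
    counts = Counter(zip(first, second))
    table = [[counts[(r, c)] for c in categories] for r in categories]
    return categories, table


before = ["Approve", "Approve", "Disapprove", "Disapprove", "Approve"]
after = ["Approve", "Disapprove", "Disapprove", "Approve", "Disapprove"]
print(tabulate(before, after))
```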
FIGURE 10–16
Valid Data Formats for
McNemar Test
Columns 1 through 3
are arranged as a 3 x 3
contingency table, and
columns 4 and 5 are raw
observation data.
Use the McNemar Test options to enable the Yates Correction Factor.
1 If you are going to run the test after changing test options and
want to select your data before you run the test, drag the pointer
over your data.
3 Leave the Yates Correction check box selected to include the Yates
Correction Factor in the test report. Click the selected check box
4 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 461 for more information). To close the
options dialog box and accept the current settings without
continuing the test, click OK. Click Apply to accept the current
settings without closing the dialog box, and click Cancel to close
the dialog box without changing any settings or running the test.
You can select Help at any time to access SigmaStat’s on-line help
system.
The Yates When a statistical test uses a χ² distribution with one degree of freedom,
Correction Factor such as analysis of a 2 x 2 contingency table or McNemar's test, the χ²
calculated tends to produce P values which are too small when compared
with the actual distribution of the χ² test statistic. The theoretical χ²
distribution is continuous, whereas the χ² produced with real data is
discrete.
FIGURE 10–17
The Options for
McNemar Test Dialog Box
You can use the Yates Continuity Correction to adjust the computed χ²
value down to compensate for this discrepancy. Using the Yates
correction makes a test more conservative, i.e., it increases the P value
and reduces the chance of a false positive conclusion. The Yates
correction is applied to 2 x 2 tables and other statistics where the P value
is computed from a χ² distribution with one degree of freedom.
Click the check box to enable or disable the Yates Correction Factor.
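The effect of the correction can be seen with a small calculation. The sketch below assumes the textbook form of the Yates correction for a 2 x 2 table (subtract 0.5 from each absolute difference between observed and expected counts before squaring) and the standard identity for the upper tail of a χ² distribution with one degree of freedom. The function names are hypothetical, and the code is not taken from SigmaStat.

```python
from math import erfc, sqrt


def chi_square_2x2(table, yates=True):
    """Chi-square for a 2 x 2 table, with or without the Yates correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # continuity correction
            chi2 += diff ** 2 / expected
    return chi2


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square value with one degree of freedom."""
    return erfc(sqrt(chi2 / 2.0))


table = [[10, 20], [20, 10]]
for use_yates in (False, True):
    value = chi_square_2x2(table, yates=use_yates)
    print(use_yates, round(value, 4), round(p_value_1df(value), 4))
```

The corrected statistic is smaller than the uncorrected one, so its P value is larger, which is the conservative behavior described above.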
To run the McNemar Test, you need to select the data to test. The Pick
Columns dialog box is used to select the worksheet columns with the
data you want to test and to specify how your data is arranged in the
worksheet.
1 If you want to select your data before you run the test, drag the
pointer over your data.
2 Open the Pick Columns dialog box to start the McNemar Test.
You can either:
3 Select the appropriate data format from the Data Format drop-
down list. If you are testing contingency table data, select
Tabulated. If your data is arranged in raw format, select Raw (see
page 458).
For more information on arranging data, see Data Format for Rate
and Proportion Tests on page 431, or Arranging Data for
Contingency Tables on page 69.
4 Select Next to pick the data columns for the test. If you selected
columns before you chose the test, the selected columns appear in
the Selected Columns list.
FIGURE 10–18
The Pick Columns
for McNemar’s Test
Dialog Box Prompting You to
Specify a Data Format
If you selected columns before you chose the test, the selected
columns appear in the column list. If you have not selected
columns, the dialog box prompts you to pick your data.
FIGURE 10–19
The Pick Columns
for McNemar’s Test
Dialog Box Prompting You to
Select Data Columns
7 Select Finish to run the test. The McNemar’s test report appears.
The report for McNemar Test lists a summary of the contingency table
data, the χ² statistic calculated from the distributions, and the P value.
The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the and
buttons in the formatting toolbar to move one page up and down in the
report.
Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options... command and uncheck the Explain Test
Results option.
FIGURE 10–20
A McNemar Test
Results Report
Chi-Square (χ²) χ² is the summed squared differences between the observed frequencies
in each cell of the table and the expected frequencies, ignoring
observations on the diagonal cells of the table where the individuals
responded identically to the treatments.
$$\chi^2 = \sum \frac{(\text{observed} - \text{expected numbers per cell})^2}{\text{expected numbers per cell}}$$
Values of χ² near zero indicate that the pattern in the contingency table
is no different from what one would expect if the counts were
distributed at random.
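For a 2 x 2 table of paired responses, the statistic reduces to a short formula. The sketch below assumes the standard McNemar form, in which the expected count for each off-diagonal cell is the mean of the two off-diagonal counts, and applies the Yates correction as an option. How SigmaStat treats larger tables is not described here, so the function (a hypothetical name) covers the 2 x 2 case only.

```python
def mcnemar_chi_square(table, yates=True):
    """McNemar chi-square for a 2 x 2 table of paired responses.

    Only the off-diagonal cells (individuals whose response changed) are used.
    """
    b = table[0][1]  # first category before, second category after
    c = table[1][0]  # second category before, first category after
    if b + c == 0:
        raise ValueError("no individuals changed their response")
    diff = abs(b - c)
    if yates:
        diff = max(diff - 1, 0)  # continuity correction for one degree of freedom
    return diff ** 2 / (b + c)


# Illustrative counts: 24 changed one way, 5 changed the other way
table = [[12, 24], [5, 32]]
print(mcnemar_chi_square(table, yates=False), mcnemar_chi_square(table, yates=True))
```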
Contingency Each cell in the table is described with a set of statistics for that cell.
Table Summary
Observed Counts These are the number of observations per cell,
obtained from the contingency table data.
About Regression
For example, Simple Linear Regression uses the equation for a straight
line
y = b0 + b1 x
y = b0 + b1 x1 + b2 x2 + b3 x3 + ... + bk xk
where y is the dependent variable, x1, x2, x3, ..., xk are the k independent
variables, and b0, b1, b2, ..., bk are the regression coefficients. As the
value of xi increases by 1, the corresponding value for y either increases
or decreases by bi, depending on the sign of bi.
FIGURE 11–1
Graph of a Linear Regression with Data Points, Regression Line, and
Residuals Labeled
(Y axis: Dependent Variable, y; X axis: Independent Variable, x.)
Correlation
[Figure: scatter plot panels illustrating different degrees of correlation; two panels are labeled r = 0.75 and r = 0.05.]
See the Selecting Data Columns sections under each test for information
on selecting blocks of data instead of entire columns.
FIGURE 11–3
Data for a Multiple
Linear Regression
Temperature and pH are
the independent variables,
and Growth Rate is the
dependent variable.
About the Simple Linear Linear Regression assumes an association between the independent and
Regression dependent variable that, when graphed on a Cartesian coordinate
system, produces a straight line. Linear Regression finds the straight line
that most closely describes, or predicts, the value of the dependent
variable, given the observed value of the independent variable.
The equation used for a Simple Linear Regression is the equation for a
straight line, or
y = b0 + b1 x
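The coefficients of the straight line can be illustrated with the ordinary least-squares formulas. This is a minimal sketch, assuming the standard least-squares fit; the function name is hypothetical and the code is not SigmaStat's implementation.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


b0, b1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)
```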
2 If desired, set the Linear Regression options using the Options for
Linear Regression dialog box (page 471).
4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 482).
Place the data for the observed dependent variable in one column and
the data for the corresponding independent variable in a second column.
Observations containing missing values are ignored, and both columns
must be equal in length.
FIGURE 11–4
Data Format for a
Simple Linear Regression
1 If you are going to run the test after changing test options, and
want to select your data before you run the test, drag the pointer
over your data.
3 Click the Residuals tab to view the residual options (see Figure 11–6
on page 475), the More Statistics tab to view the confidence
intervals, PRESS Prediction Error, and Standardized Coefficients
options (see Figure 11–7 on page 477), and the Other Diagnostics tab to view
the Influence and Power options (see Figure 11–9 on page 480).
Click the Assumption Checking tab to return to the Normality,
Constant Variance, and Durbin-Watson options.
5 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 482 for more information).
6 To accept the current settings and close the options dialog box,
click OK. To accept the current setting without closing the
options dialog box, click Apply. To close the dialog box without
changing any settings or running the test, click Cancel.
You can select Help at any time to access SigmaStat’s on-line help system.
Assumption Checking Select the Assumption Checking tab from the options dialog box to view
Options the Normality, Constant Variance, and Durbin-Watson options. These
options test your data for its suitability for regression analysis by
checking three assumptions that a linear regression makes about the
data. A linear regression assumes:
FIGURE 11–5
The Options for
Linear Regression Dialog
Box Displaying the
Assumption Checking
Options
Although the assumption tests are robust in detecting data from populations
that are non-normal or with non-constant variances, there are extreme
conditions of data distribution that these tests cannot detect. However, these
conditions should be easily detected by visually examining the data without
resorting to the automatic assumption tests.
Difference from 2 Value Enter the acceptable deviation from 2.0 that
you consider as evidence of a serial correlation in the Difference for 2.0
box. If the computed Durbin-Watson statistic deviates from 2.0 more
than the entered value, SigmaStat warns you that the residuals may not
be independent. The suggested deviation value is 0.50, i.e., Durbin-
Watson Statistic values greater than 2.5 or less than 1.5 flag the residuals
as correlated.
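The check described above can be sketched as follows, assuming the standard definition of the Durbin-Watson statistic (the sum of squared differences between successive residuals divided by the sum of squared residuals). The function names and the default value of 0.50 mirror the text but are otherwise hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in observation order."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def flags_serial_correlation(residuals, difference=0.50):
    """True if the statistic deviates from 2.0 by more than the given value."""
    return abs(durbin_watson(residuals) - 2.0) > difference


print(durbin_watson([1, -1, 1, -1]), flags_serial_correlation([1, -1, 1, -1]))
print(durbin_watson([1, -1, -1, 1]), flags_serial_correlation([1, -1, -1, 1]))
```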
Residuals Select the Residuals tab in the options dialog box to view the Predicted
Values, Raw, Standardized, Studentized, Studentized Deleted, and
Report Flagged Values Only options.
FIGURE 11–6
The Options for
Linear Regression
Dialog Box Displaying the
Residuals Options
Predicted Values Use this option to calculate the predicted value of the dependent variable
for each observed value of the independent variable(s), then save the
results to the worksheet. Click the selected check box if you do not want
to include predicted values in the worksheet.
Raw Residuals The raw residuals are the differences between the
predicted and observed values of the dependent variables. To include
raw residuals in the report, make sure this check box is selected. Click
the selected check box if you do not want to include raw residuals in the
worksheet.
standardized residuals in the report, make sure this check box is selected.
Click the selected check box if you do not want to include standardized
residuals in the worksheet.
To include studentized residuals in the report, make sure this check box
is selected. Click the selected check box if you do not want to include
studentized residuals in the worksheet.
SigmaStat can automatically flag data points with “large” values of the
studentized deleted residual, i.e., outlying data points; the suggested data
points flagged lie outside the 95% confidence interval for the regression
population.
Note that both Studentized and Studentized deleted residuals use the same
confidence interval setting to determine outlying points.
Confidence Intervals Select the More Statistics tab in the options dialog box to view the
confidence interval options. You can set the confidence interval for the
population, regression, or both and then save them to the worksheet.
FIGURE 11–7
The Options for Linear
Regression Dialog Box
Displaying the Confidence
Intervals Options
Click the selected check box if you do not want to include the
confidence intervals for the population in the report.
PRESS Select the More Statistics tab in the options dialog box to view the
Prediction Error PRESS Prediction Error option (see Figure 11–7 on page 477). The
PRESS Prediction Error is a measure of how well the regression equation
fits the data. Leave this check box selected to evaluate the fit of the
equation using the PRESS statistic. Click the selected check box if you
do not want to include the PRESS statistic in the report.
Standardized Click the More Statistics tab in the options dialog box to view the
Coefficients (βi) Standardized Coefficients option (see Figure 11–7 on page 477). These
are the coefficients of the regression equation standardized to
dimensionless values,
$$\beta_i = b_i \frac{s_{x_i}}{s_y}$$
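As a quick illustration of this standardization, the sketch below multiplies a regression coefficient by the ratio of the sample standard deviations of the independent and dependent variables. The function name and the example data are hypothetical; the coefficient 1.9 is the least-squares slope for these particular values.

```python
from statistics import stdev


def standardized_coefficient(b, x, y):
    """Return b * s_x / s_y, where s_x and s_y are sample standard deviations."""
    return b * stdev(x) / stdev(y)


x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
print(standardized_coefficient(1.9, x, y))
```

With a single independent variable, the standardized slope equals the correlation coefficient between x and y, which makes a convenient check.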
Influence Options Select the Other Diagnostics tab in the options dialog box to view the
Influence options. Influence options automatically detect instances of
influential data points. Most influential points are data points which are
outliers, that is, they do not “line up” with the rest of the data
points. These points can have a disproportionately strong
influence on the calculation of the regression line. You can use several
influence tests to identify and quantify influential points.
FIGURE 11–8
A Graph with an Influential Outlying Point
The solid line shows the regression for the data including the outlier, and
the dotted line is the regression computed without the outlying point.
Predicted values that change by more than two standard errors when the
data point is removed are considered to be influential.
Check the DFFITS check box to compute this value for all points and
flag influential points, i.e., those with DFFITS greater than the value
specified in the Flag Values > edit box. The suggested value is 2.0
standard errors, which indicates that the point has a strong influence on
the data. To avoid flagging more influential points, increase this value;
to flag less influential points, decrease this value.
FIGURE 11–9
The Options for Linear
Regression Dialog Box
Displaying the Influence
and Power Options
Check the Leverage check box to compute the leverage for each point
and automatically flag potentially influential points, i.e., those points
that could have leverages greater than the specified value times the
expected leverage. The suggested value is 2.0 times the expected leverage
for the regression (i.e., $2(k+1)/n$). To avoid flagging more potentially
influential points, increase this value; to flag points with less potential
influence, lower this value.
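For a regression with one independent variable (k = 1), the leverage of each point has a simple closed form, which makes the flagging rule easy to illustrate. The sketch below assumes that standard closed form and an expected leverage of (k + 1)/n; the function names are hypothetical and the code is not SigmaStat's.

```python
def leverages(x):
    """Leverage of each point in a simple linear regression (assumes k = 1)."""
    n = len(x)
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]


def flag_high_leverage(x, multiple=2.0, k=1):
    """Flag points whose leverage exceeds multiple * (k + 1) / n."""
    cutoff = multiple * (k + 1) / len(x)
    return [h > cutoff for h in leverages(x)]


x = [1, 2, 3, 4, 20]  # the last point lies far from the others
print(leverages(x))
print(flag_high_leverage(x))
```

The leverages always sum to k + 1, so their average is the expected leverage used in the cutoff.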
Check the Cook's Distance check box to compute this value for all
points and flag influential points, i.e., those with a Cook's distance
greater than the specified value. The suggested value is 4.0. Cook's
distances above 1 indicate that a point is possibly influential. Cook's
distances exceeding 4 indicate that the point has a major effect on the
values of the parameter estimates. To avoid flagging more influential
points, increase this value; to flag less influential points, lower this value.
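Cook's distance combines the size of a residual with the leverage of the point. The following is a sketch for simple linear regression only, assuming the standard textbook formula; the function name is hypothetical, and the data are made up so that the last point is an obvious outlier.

```python
def cooks_distances(x, y):
    """Cook's distance for each point of a simple linear regression."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    b0 = y_mean - b1 * x_mean
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    p = 2  # number of estimated parameters: intercept and slope
    mse = sum(e ** 2 for e in residuals) / (n - p)
    distances = []
    for xi, e in zip(x, residuals):
        h = 1.0 / n + (xi - x_mean) ** 2 / sxx  # leverage of this point
        distances.append(e ** 2 / (p * mse) * h / (1.0 - h) ** 2)
    return distances


d = cooks_distances([1, 2, 3, 4, 5], [1, 2, 3, 4, 10])
print([round(value, 4) for value in d])
print([value > 1.0 for value in d])  # points that are possibly influential
```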
Power Select the Other Diagnostics tab in the options dialog box to view the
Power options (see Figure 11–9 on page 480). The power of a regression
is the power to detect the observed relationship in the data. The alpha
(α) is the acceptable probability of incorrectly concluding there is a
relationship.
Check the Power check box to compute the power for the linear
regression data. Change the alpha value by editing the number in the
Alpha Value edit box. The suggested value is α = 0.05. This indicates
that a one in twenty chance of error is acceptable, or that you are willing
to conclude there is a significant relationship when P < 0.05.
To run a Simple Linear Regression, you need to select the data to test.
The Pick Columns dialog box is used to select the worksheet columns
with the data you want to test.
1 If you want to select your data before you run the test, drag the
pointer over your data.
2 Open the Pick Columns dialog box to start the Linear Regression.
You can either:
If you selected columns before you chose the test, the columns
appear in the Selected Columns list. If you have not selected
columns, the dialog box prompts you to pick your data.
FIGURE 11–10
The Pick Columns
for Linear Regression
Dialog Box Prompting You to
Select Data Columns
The report for a Linear Regression displays the equation with the
computed coefficients for the line, R, R², and adjusted R², a table of
statistical values for the estimate of the dependent variable, and the P
values for the regression equation and for the individual coefficients.
The other results displayed in the report are enabled and disabled in the
Options for Linear Regression dialog box (see Setting Linear Regression
Options on page 471).
The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the and
buttons in the formatting toolbar to move one page up and down in the
report.
Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options... command and uncheck the Explain Test
Results option.
Regression Equation This is the equation for a line with the values of the coefficients—the
intercept (constant) and the slope—in place.
y = b0 + b1 x
FIGURE 11–11
An Example of the
Simple Linear
Regression Report
R equals 0 when the values of the independent variable do not allow any
prediction of the dependent variables, and equals 1 when you can
perfectly predict the dependent variable from the independent variable.
Adjusted R² The adjusted R², R²adj, is also a measure of how well the
regression model describes the data, but takes into account the number
of independent variables, which reflects the degrees of freedom. Larger
R²adj values (nearer to 1) indicate that the equation is a good description
of the relation between the independent and dependent variables.
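The adjustment can be illustrated numerically. The sketch below assumes the usual definitions: R² is one minus the ratio of the residual sum of squares to the total sum of squares, and the adjusted value rescales the unexplained fraction by (n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables. The function names and data are hypothetical.

```python
def r_squared(y, predicted):
    """R squared from observed and predicted values of the dependent variable."""
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Adjusted R squared for n observations and k independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


r2 = r_squared([1, 2, 3, 4], [1.5, 1.5, 3.5, 3.5])
print(r2, adjusted_r_squared(r2, n=4, k=1))
```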
Standard Error of The standard error of the estimate s y x is a measure of the actual
the Estimate ( s y x ) variability about the regression line of the underlying population. The
underlying population generally falls within about two standard errors of
the observed sample.
Statistical Coefficients The value for the constant (intercept) and coefficient of
Summary Table the independent variable (slope) for the regression model are listed.
Standard Error The standard errors of the intercept and slope are
measures of the precision of the estimates of the regression coefficients
(analogous to the standard error of the mean). The true regression
coefficients of the underlying population generally fall within about two
standard errors of the observed sample coefficients. These values are
used to compute t and confidence intervals for the regression.
t Statistic The t statistic tests the null hypothesis that the coefficient of
the independent variable is zero, that is, the independent variable does
not contribute to predicting the dependent variable. t is the ratio of the
regression coefficient to its standard error, or
$$t = \frac{\text{regression coefficient}}{\text{standard error of regression coefficient}}$$
You can conclude from “large” t values that the independent variable can
be used to predict the dependent variable (i.e., that the coefficient is not
zero).
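The ratio can be worked through for the slope of a simple linear regression. This is a sketch under the standard least-squares assumptions: the standard error of the slope is the standard error of the estimate divided by the square root of the sum of squared deviations of x. The function name and data are hypothetical.

```python
from math import sqrt


def slope_t_statistic(x, y):
    """Return the slope, its standard error, and t = slope / standard error."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_estimate = sqrt(ss_res / (n - 2))  # standard error of the estimate
    se_b1 = s_estimate / sqrt(sxx)       # standard error of the slope
    return b1, se_b1, b1 / se_b1


b1, se_b1, t = slope_t_statistic([1, 2, 3, 4], [2, 4, 5, 8])
print(b1, se_b1, t)
```

Squaring the resulting t gives the F value of the regression ANOVA for the same data, which is a useful consistency check in simple linear regression.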
$$\beta_1 = b_1 \frac{s_x}{s_y}$$
Analysis of Variance The ANOVA (analysis of variance) table lists the ANOVA statistics for
(ANOVA) Table the regression and the corresponding F value.
The residual mean square is also equal to the square of the standard error
of the estimate.
In simple linear regression, the P value for the ANOVA is identical to the P
value associated with the t of the slope coefficient, and F = t², where t is the t
value associated with the slope.
PRESS Statistic PRESS, the Predicted Residual Error Sum of Squares, is a gauge of
how well a regression model predicts new data. The smaller the PRESS
statistic, the better the predictive ability of the model.
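One way to see what PRESS measures is to compute it by brute force: remove each observation in turn, refit the line to the remaining points, predict the removed observation, and sum the squared prediction errors. The sketch below does this for simple linear regression. It illustrates the definition only, the function names are hypothetical, and SigmaStat may compute the statistic differently.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    return y_mean - b1 * x_mean, b1


def press_statistic(x, y):
    """Sum of squared leave-one-out prediction errors."""
    total = 0.0
    for i in range(len(x)):
        # Refit without observation i, then predict it
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total


print(press_statistic([1, 2, 3, 4, 5], [1, 2, 3, 4, 10]))
```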
Regression assumes that the residuals are independent of each other; the
Durbin-Watson test is used to check this assumption. If the Durbin-
Watson value deviates from 2 by more than the value set in the Options
for Linear Regression dialog box, a warning appears in the report. The
suggested trigger value is a difference of more than 0.50 (i.e., if the
Durbin-Watson statistic is below 1.5 or over 2.5).
Normality Test Normality test result displays whether the data passed or failed the test of
the assumption that the source population is normally distributed
around the regression line, and the P value calculated by the test. All
regressions assume a source population to be normally distributed about
the regression line. When this assumption may be violated, a warning
appears in the report. This result appears unless you disabled normality
testing in the Options for Linear Regression dialog box (see page 471).
Constant The constant variance test result displays whether or not the data passed
Variance Test or failed the test of the assumption that the variance of the dependent
variable in the source population is constant regardless of the value of
the independent variable, and the P value calculated by the test. When
the constant variance assumption may be violated, a warning appears in
the report.
If you receive this warning, you should consider trying a different model
(i.e., one that more closely follows the pattern of the data), or
transforming the independent variable to stabilize the variance and
obtain more accurate estimates of the parameters in the regression
equation. See Chapter 14, Using Transforms for more information on
the appropriate transform to use.
Power This result is displayed if you selected this option in the options dialog
box. The power, or sensitivity, of a performed regression is the
probability that the model correctly describes the relationship of the
variables, if there is a relationship.
The α value is set in the Power Options dialog box; the suggested value
is α = 0.05, which indicates that a one in twenty chance of error is
acceptable. Smaller values of α result in stricter requirements before
concluding the model is correct, but a greater possibility of concluding
the model is bad when it is really correct (a Type II error). Larger values
of α make it easier to conclude that the model is correct, but also
increase the risk of accepting a bad model (a Type I error).
Regression Diagnostics The regression diagnostic results display only the values for the predicted
values, residual results, and other diagnostics selected in the Options for
Regression dialog box (see page 471). All results that qualify as outlying
values are flagged with a < symbol. The trigger values to flag residuals as
outliers are set in the Options for Linear Regression dialog box.
If you selected Report Cases with Outliers Only, only those observations
that have one or more residuals flagged as outliers are reported; however,
all other results for that observation are also displayed.
Predicted Values This is the value for the dependent variable predicted
by the regression model for each observation.
Residuals These are the raw residuals, the difference between the
predicted and observed values for the dependent variables.
If the residuals are normally distributed about the regression line, about
68% of the standardized residuals have values between −1 and 1, and
about 95% of the standardized residuals have values between −2 and 2.
A larger standardized residual indicates that the point is far from the
regression line; the suggested value flagged as an outlier is 2.5.
Influence Diagnostics The influence diagnostic results display only the values for the results
selected in the Options dialog box under the Other Diagnostics tab (see
page 478). All results that qualify as outlying values are flagged with a <
symbol. The trigger values to flag data points as outliers are also set in
the Options dialog box under the Other Diagnostics tab.
If you selected Report Cases with Outliers Only, only observations that
have one or more observations flagged as outliers are reported; however,
all other results for that observation are also displayed.
Confidence Intervals These results are displayed if you selected them in the Regression
Options dialog box. If the confidence interval does not include zero,
you can conclude that the coefficient is different than zero with the level
of confidence specified. This can also be described as P < α (alpha),
where α is the acceptable probability of incorrectly concluding that the
coefficient is different than zero, and the confidence interval is 100(1 −
α).
The specified confidence level can be any value from 1 to 99; the
suggested confidence level for both intervals is 95%.
Predicted This is the value for the dependent variable predicted by the
regression model for each observation.
Regression The confidence interval for the regression line gives the
range of variable values computed for the region containing the true
relationship between the dependent and independent variables, for the
specified level of confidence.
Population The confidence interval for the population gives the range
of variable values computed for the region containing the population
from which the observations were drawn, for the specified level of
confidence.
You can generate up to five graphs using the results from a Simple Linear
Regression. They include a:
Histogram of Residuals The linear regression histogram plots the raw residuals in a specified
range, using a defined interval set. The residuals are divided into a
number of evenly incremented histogram intervals and plotted as
histogram bars indicating the number of residuals in each interval. The
X axis represents the histogram intervals, and the Y axis represent the
Scatter Plot of The Linear Regression scatter plot of the residuals plots the residuals of
the Residuals the independent variables as points relative to the standard deviations.
The X axis represents the independent variable values, the Y axis
represents the residuals of the variables, and the horizontal lines running
across the graph represent the standard deviations of the data. For an
example of a scatter plot, see page 152 in the CREATING AND
MODIFYING GRAPHS chapter.
Bar Chart of The Linear Regression bar chart of the standardized residuals plots the
the Standardized standardized residuals of the independent variables as points relative to
Residuals the standard deviations. The X axis represents the independent variable
values, the Y axis represents the residuals of the variables, and the
horizontal lines running across the graph represent the standard
deviations of the data. For an example of a bar chart of the residuals, see
page 153 in the CREATING AND MODIFYING GRAPHS chapter.
Line/Scatter Plot The Linear Regression graph plots the observations of the linear
of the Regression with regression as a line/scatter plot. The points represent the data dependent
Prediction and variables plotted against the independent variables, the solid line
Confidence Intervals running through the points represents the regression line, and the
dashed lines represent the prediction and confidence intervals. The X
axis represents the independent variables and the Y axis represents the
dependent variables. For an example of a line/scatter plot of the
regression, see page 156 in the CREATING AND MODIFYING GRAPHS
chapter.
FIGURE 11–12
The Create Graph Dialog
Box
for the Linear Regression
Report Graphs
2 Select the type of graph you want to create from the Graph Type
list, then select OK, or double-click the desired graph in the list.
For more information on each of the graph types, see pages 11-493
through 11-494. The specified graph appears in a graph window
or in the report.
If you know there is only one independent variable, use Simple Linear
Regression. If you are not sure if all independent variables should be
FIGURE 11–13
An Example of a Line/
Scatter Plot of the Linear
Regression Observations
with a Regression and
Confidence and Prediction
Interval Lines
About the Multiple Multiple Linear Regression assumes an association between the
Linear Regression dependent and k independent variables that fits the general equation for
a multidimensional plane:
y = b0 + b1 x1 + b2 x2 + b3 x3 + ... + bk xk
where y is the dependent variable, x1, x2, x3, ..., xk are the k independent
variables, and b0, b1, b2, ..., bk are the regression coefficients.
Multiple Linear Regression is a parametric test, that is, for a given set of
independent variable values, the possible values for the dependent
variable are assumed to be normally distributed and have constant
variance about the regression plane.
2 If desired, set the Linear Regression options using the Options for
Multiple Linear Regression dialog box (page 498).
3 Select Multiple Linear Regression from the toolbar, then select the
button, or choose the Statistics menu command, then choose
Regression.
4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 511).
Place the data for the observed dependent variable in one column and
the data for the corresponding independent variables in two or more
columns.
FIGURE 11–14
Data Format for a
Multiple Linear Regression
1 If you are going to run the test after changing test options and
want to select your data before you run the test, drag the pointer
over the data.
3 Click the Residuals tab to view the residual options (see Figure 11–
16 on page 502), More Statistics tab to view the confidence
intervals, PRESS Prediction Error, Standardized Coefficients
options (see Figure 11–17 on page 504), and Other Diagnostics to
view the Influence, Variance Inflation Factor, and Power options
(see Figure 11–19 on page 507). Click the Assumption Checking
tab to return to the Normality, Constant Variance, and Durbin-
Watson options.
5 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 498 for more information).
6 To accept the current settings and close the options dialog box,
click OK. To accept the current setting without closing the
options dialog box, click Apply. To close the dialog box without
changing any settings or running the test, click Cancel.
Note: You can click Help at any time to access SigmaStat’s online help system.
Assumption Checking Select the Assumption Checking tab from the options dialog box to view
Options the Normality, Constant Variance, and Durbin-Watson options. These
options test your data for its suitability for regression analysis by
FIGURE 11–15
The Options for Multiple
Linear Regression
Dialog Box Displaying
the Assumption
Checking Options
Note: Although the assumption tests are robust in detecting data from populations
that are non-normal or with non-constant variances, there are extreme
conditions of data distribution that these tests cannot detect. However, these
conditions should be easily detected by visually examining the data without
resorting to the automatic assumption tests.
Difference from 2 Value Enter the acceptable deviation from 2.0 that
you consider as evidence of a serial correlation in the Difference from 2.0
box. If the computed Durbin-Watson statistic deviates from 2.0 more
than the entered value, SigmaStat warns you that the residuals may not
be independent. The suggested deviation value is 0.50, i.e., Durbin-
Watson Statistic values greater than 2.5 or less than 1.5 flag the residuals
as correlated.
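The flagging rule described above can be sketched as follows. The Durbin-Watson statistic is computed with its usual textbook definition (sum of squared successive differences of the residuals divided by the sum of squared residuals); the function names are hypothetical and the residual values are made up for illustration.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squares."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def residuals_flagged(residuals, difference_from_2=0.50):
    """True when the statistic deviates from 2.0 by more than the setting."""
    return abs(durbin_watson(residuals) - 2.0) > difference_from_2


# Strictly alternating residuals: statistic well above 2.5, so flagged.
alternating = [1, -1, 1, -1]
# A pattern whose statistic is exactly 2.0, so not flagged.
balanced = [1, -1, -1, 1, 1, -1, -1, 1]

dw_alternating = durbin_watson(alternating)
dw_balanced = durbin_watson(balanced)
```

With the suggested deviation of 0.50, any statistic outside the range 1.5 to 2.5 is flagged.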
Residuals Select the Residuals tab in the options dialog box to view the Predicted
Values, Raw, Standardized, Studentized, Studentized Deleted, and
Report Flagged Values Only options.
FIGURE 11–16
The Options for Multiple
Linear Regression
Dialog Box Displaying
the Residual Options
Predicted Values Use this option to calculate the predicted value of the
dependent variable for each observed value of the independent
variable(s), then save the results to the data worksheet.
Raw Residuals The raw residuals are the differences between the
predicted and observed values of the dependent variables. To include
raw residuals in the report, make sure this check box is selected.
To include studentized residuals in the report, make sure this check box
is selected. Click the selected check box if you do not want to include
studentized residuals in the worksheet.
SigmaStat can automatically flag data points with “large” values of the
studentized deleted residual, i.e., outlying data points; the suggested data
points flagged lie outside the 95% confidence interval for the regression
population.
Note that both Studentized and Studentized Deleted residuals use the same
confidence interval setting to determine outlying points.
Report Flagged Values Only check box is selected. Uncheck this option
to include all standardized and studentized residuals in the report.
Confidence Intervals Select the More Statistics tab in the options dialog box to view the
confidence interval options. You can set the confidence interval for the
population, regression, or both and then save them to the data
worksheet.
FIGURE 11–17
The Options for Multiple
Linear Regression
Dialog Box Displaying
the Confidence
Interval Options
Click the selected check box if you do not want to include the
confidence intervals for the population in the report.
PRESS Select the More Statistics tab in the options dialog box to view the
Prediction Error PRESS Prediction Error option (see Figure 11–17 on page 504). The
PRESS Prediction Error is a measure of how well the regression equation
fits the data. Leave this check box selected to evaluate the fit of the
equation using the PRESS statistic. Click the selected check box if you
do not want to include the PRESS statistic in the report.
Standardized Coefficients (βi)
Click the More Statistics tab in the options dialog box to view the
Standardized Coefficients option (see Figure 11–17 on page 504).
These are the coefficients of the regression equation standardized to
dimensionless values,
$$\beta_i = b_i \frac{s_{x_i}}{s_y}$$
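A standardized coefficient scales a regression coefficient by the ratio of the spread of its independent variable to the spread of the dependent variable. The sketch below assumes that these spreads are sample standard deviations, which this section does not state explicitly; the function name and data are hypothetical.

```python
import statistics


def standardized_coefficient(b_i, x_i, y):
    """Scale b_i by stdev(x_i) / stdev(y) to get a dimensionless value."""
    return b_i * statistics.stdev(x_i) / statistics.stdev(y)


# Hypothetical data in which y is exactly 2 * x, so the coefficient is 2.
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
beta = standardized_coefficient(2.0, x, y)
```

Here y has exactly twice the standard deviation of x, so the coefficient of 2 standardizes to a value of 1 (up to rounding).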
Influence Options Select the Other Diagnostics tab in the options dialog box to view the
Influence options. Influence options automatically detect instances of
influential data points. Most influential points are outliers, that is,
data points that do not “line up” with the rest of the data
points. These points can have a disproportionately strong
influence on the calculation of the regression line. You can use several
influence tests to identify and quantify influential points.
FIGURE 11–18
A Graph with an Influential Outlying Point
The solid line shows the regression for the data including the outlier, and
the dotted line is the regression computed without the outlying point.
Predicted values that change by more than two standard errors when the
data point is removed are considered to be influential.
Check the DFFITS check box to compute this value for all points and
flag influential points, i.e., those with DFFITS greater than the value
specified in the Flag Values > edit box. The suggested value is 2.0
standard errors, which indicates that the point has a strong influence on
the data. To avoid flagging more influential points, increase this value;
to flag less influential points, decrease this value.
Check the Leverage check box to compute the leverage for each point
and automatically flag potentially influential points, i.e., those points
that could have leverages greater than the specified value times the
expected leverage. The suggested value is 2.0 times the expected leverage
for the regression (i.e., $2(k + 1)/n$). To avoid flagging more potentially
influential points, increase this value; to flag points with less potential
influence, lower this value.
FIGURE 11–19
The Options for
Multiple Linear
Regression Dialog Box
Displaying the Influence
Variance Inflation Factor,
and Power Options
Check the Cook's Distance check box to compute this value for all
points and flag influential points, i.e., those with a Cook's distance
greater than the specified value. The suggested value is 4.0. Cook's
distances above 1 indicate that a point is possibly influential. Cook's
distances exceeding 4 indicate that the point has a major effect on the
values of the parameter estimates. To avoid flagging more influential
points, increase this value; to flag less influential points, lower this value.
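The leverage and Cook's distance thresholds can be illustrated with a short sketch. The leverage limit follows the rule given in this section (a multiplier times the expected leverage, (k + 1)/n). The Cook's distance formula used here is the common textbook form and is an assumption, since this section does not give SigmaStat's exact computation. All names are hypothetical, and the residuals and leverages are illustrative values, not the output of an actual fit.

```python
def leverage_limit(k, n, multiplier=2.0):
    """Flagging limit: multiplier times the expected leverage (k + 1) / n."""
    return multiplier * (k + 1) / n


def flag_high_leverage(leverages, k, multiplier=2.0):
    """Return the indices of points whose leverage exceeds the limit."""
    limit = leverage_limit(k, len(leverages), multiplier)
    return [i for i, h in enumerate(leverages) if h > limit]


def cooks_distances(residuals, leverages, k):
    """Textbook Cook's distance from raw residuals and leverages."""
    n = len(residuals)
    p = k + 1  # number of fitted coefficients, including the constant
    mse = sum(e * e for e in residuals) / (n - p)
    return [(e * e / (p * mse)) * h / (1.0 - h) ** 2
            for e, h in zip(residuals, leverages)]


def flag_cooks(distances, limit=4.0):
    """Return the indices of points whose Cook's distance exceeds the limit."""
    return [i for i, d in enumerate(distances) if d > limit]


# Illustrative values for n = 8 observations and k = 1 independent variable.
residuals = [1.0, -1.0, 2.0, -2.0, 0.0, 1.0, -1.0, 0.0]
leverages = [0.1, 0.2, 0.6, 0.1, 0.3, 0.2, 0.3, 0.2]

high_leverage = flag_high_leverage(leverages, k=1)
distances = cooks_distances(residuals, leverages, k=1)
flagged_by_cooks = flag_cooks(distances)
```

In this example the third point exceeds the leverage limit of 0.5, but its Cook's distance stays below the suggested value of 4.0, so only the leverage test flags it.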
Power Select the Other Diagnostics tab in the options dialog box to view the
Power options (see Figure 11–19 on page 507). The power of a
regression is the power to detect the observed relationship in the data.
The alpha (α) is the acceptable probability of incorrectly concluding
there is a relationship.
Check the Power check box to compute the power for the multiple
linear regression data. Change the alpha value by editing the number in
the Alpha Value edit box. The suggested value is α = 0.05. This
indicates that a one in twenty chance of error is acceptable, or that you
are willing to conclude there is a significant relationship when P < 0.05.
Variance Select the Other Diagnostics tab in the options dialog box to view the
Inflation Factor Variance Inflation Factor option (see Figure 11–19 on page 507). Use
this option to measure the multicollinearity of the independent
variables, or the linear combination of the independent variables in the
fit.
FIGURE 11–20
A Graph with Multicollinear Data Points
Note that knowing the value of one of the independent variables allows you to
predict the other, so that the independent variables are not
statistically independent.
(3D plot; axes: Independent x1, Independent x2, Dependent y)
Flagging Multicollinear Data Use the value in the Flag Values > edit
box as a threshold for multicollinear variables. The default threshold
value is 4.0, meaning that any value greater than 4.0 will be flagged as
multicollinear. To make this test more sensitive to possible
multicollinearity, decrease this value. To allow greater correlation of the
independent variables before flagging the data as multicollinear, increase
this value.
When the variance inflation factor is large, there are redundant variables
in the regression model, and the parameter estimates may not be reliable.
Variance inflation factor values above 4 suggest possible
multicollinearity; values above 10 indicate serious multicollinearity.
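To make the thresholds concrete, the sketch below uses the standard definition of the variance inflation factor, 1 / (1 − R²), where R² comes from regressing one independent variable on the others. With only two independent variables this R² is the squared correlation between them. The definition is the textbook one and is assumed here, not taken from this section; the function names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_from_r_squared(r_squared):
    """Variance inflation factor for a predictor with the given R squared."""
    return 1.0 / (1.0 - r_squared)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two."""
    r = pearson_r(x1, x2)
    return vif_from_r_squared(r * r)


# Hypothetical predictor columns with a correlation of 0.6.
x1 = [1, 2, 3, 4]
x2 = [2, 1, 4, 3]
vif = vif_two_predictors(x1, x2)
flagged = vif > 4.0  # default Flag Values > threshold
```

Under this definition, the default threshold of 4.0 corresponds to an R² of 0.75 between a predictor and the remaining predictors, and a value of 10 corresponds to an R² of 0.90.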
Report Flagged Values Only To include only the points flagged
by the influential point tests and values exceeding the variance inflation
threshold in the report, make sure the Report Flagged Values Only check
box is selected. Uncheck this option to include all influential points in
the report.
To run a Multiple Linear Regression, you need to select the data to test.
Use the Pick Columns dialog box to select the worksheet columns with
the data you want to test.
1 If you want to select your data before you run the regression, drag
the pointer over your data.
2 Open the Pick Columns dialog box to start the Multiple Linear
Regression. You can either:
If you selected columns before you chose the test, the selected
columns appear in the column list. If you have not selected
columns, the dialog box prompts you to pick your data.
FIGURE 11–21
The Pick Columns
for Multiple Linear
Regression Dialog Box
The report for a Multiple Linear Regression displays the equation with
the computed coefficients, R, R2, and the adjusted R2, a table of
statistical values for the estimate of the dependent variable, and the P
value for the regression equation and for the individual coefficients.
The other results displayed in the report are enabled or disabled in the
Options for Multiple Linear Regression dialog box (see page 498).
Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options command and clear the Explain Results option.
Regression Equation This is the equation with the values of the coefficients in place. This
equation takes the form:
$$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots + b_k x_k$$
where y is the dependent variable, x1, x2, x3, ..., xk are the independent
variables, and b0, b1, b2, b3,...,bk are the regression coefficients.
R, R2, and Adj R2 R and R2 R, the correlation coefficient, and R2, the coefficient of
determination for multiple regression, are both measures of how well
the regression model describes the data. R values near 1 indicate that the
equation is a good description of the relation between the independent
and dependent variables.
R equals 0 when the values of the independent variable do not allow any
prediction of the dependent variables, and equals 1 when you can
perfectly predict the dependent variables from the independent
variables.
Adjusted R2 The adjusted R2, R2adj, is also a measure of how well the
regression model describes the data, but takes into account the number
of independent variables, which reflects the degrees of freedom. Larger
R2adj values (nearer to 1) indicate that the equation is a good description
of the relation between the independent and dependent variables.
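The relationship between R2 and the adjusted R2 can be sketched with the usual textbook formulas, which are assumed here because this section does not print them: R2 = 1 − SSres/SStot, and R2adj = 1 − (1 − R2)(n − 1)/(n − k − 1) for n observations and k independent variables. The function names and sample values are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_y) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Penalize R squared for the number of independent variables k."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical observed values and model predictions.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.0, 4.1, 4.9]
r2 = r_squared(observed, predicted)

# An R squared of 0.9 from 11 observations and 2 independent variables.
r2_adj = adjusted_r_squared(0.9, n=11, k=2)
```

The adjusted value is never larger than R2 when k is at least 1, and the gap widens as independent variables are added without a matching gain in fit.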
Standard Error of The standard error of the estimate Sy x is a measure of the actual
the Estimate ( Sy x ) variability about the regression plane of the underlying population. The
underlying population generally falls within about two standard errors of
the estimate of the observed sample.
Statistical Coefficients The value for the constant and coefficients of the
Summary Table independent variables for the regression model are listed.
These values are used to compute t and confidence intervals for the
regression.
t Statistic The t statistic tests the null hypothesis that the coefficient of
the independent variable is zero, that is, the independent variable does
not contribute to predicting the dependent variable. t is the ratio of the
regression coefficient to its standard error, or:
$$t = \frac{\text{regression coefficient}}{\text{standard error of regression coefficient}}$$
You can conclude from “large” t values that the independent variable can
be used to predict the dependent variable (i.e., that the coefficient is not
zero).
committing a Type I error, based on t). The smaller the P value, the
greater the probability that the variables are correlated.
FIGURE 11–22
An Example
of a Multiple Linear
Regression Report
Analysis of Variance The ANOVA (analysis of variance) table lists the ANOVA statistics for
(ANOVA) Table the regression and the corresponding F value.
PRESS Statistic PRESS, the Predicted Residual Error Sum of Squares, is a gauge of
how well a regression model predicts new data. The smaller the PRESS
statistic, the better the predictive ability of the model.
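For a linear least-squares fit, PRESS can be computed without refitting the model once per observation, because each deleted (leave-one-out) residual equals the raw residual divided by one minus the leverage of that point. The sketch below relies on that standard identity; whether SigmaStat computes PRESS this way is not stated here. The function name and the input values are hypothetical.

```python
def press_statistic(residuals, leverages):
    """Sum of squared deleted residuals, e_i / (1 - h_i)."""
    return sum((e / (1.0 - h)) ** 2 for e, h in zip(residuals, leverages))


# Illustrative raw residuals and leverages for two observations.
press = press_statistic([1.0, -2.0], [0.5, 0.2])
```

A point with high leverage inflates its deleted residual, so PRESS penalizes models that depend heavily on individual observations.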
Regression assumes that the residuals are independent of each other; the
Durbin-Watson test is used to check this assumption. If the Durbin-
Watson value deviates from 2 by more than the value set in the
Regression Options dialog box, a warning appears in the report. The
suggested trigger value is a difference of more than 0.50, i.e., the
Durbin-Watson statistic is below 1.50 or above 2.50.
Normality Test Normality test result displays whether the data passed or failed the test of
the assumption that the source population is normally distributed
around the regression, and the P value calculated by the test. All
regressions require a source population to be normally distributed about
the regression line. When this assumption may be violated, a warning
appears in the report. This result appears unless you disabled normality
testing in the Regression Options dialog box (see page 499).
Constant The constant variance test result displays whether or not the data passed
Variance Test or failed the test of the assumption that the variance of the dependent
variable in the source population is constant regardless of the value of
the independent variable, and the P value calculated by the test. When
the constant variance assumption may be violated, a warning appears in
the report.
If you receive this warning, you should consider trying a different model
(i.e., one that more closely follows the pattern of the data), or
transforming the independent variable to stabilize the variance and
obtain more accurate estimates of the parameters in the regression
equation. See Chapter 14, Using Transforms, for
more information on the appropriate transform to use.
Power This result is displayed if you selected this option in the Options for
Multiple Linear Regression dialog box.
The α value is set in the Power Options dialog box; the suggested value
is α = 0.05, which indicates that a one in twenty chance of error is
acceptable. Smaller values of α result in stricter requirements before
concluding the model is correct, but a greater possibility of concluding
the model is bad when it is really correct (a Type II error). Larger values
of α make it easier to conclude that the model is correct, but also
increase the risk of accepting a bad model (a Type I error).
Regression Diagnostics The regression diagnostic results display only the values for the predicted
values, residuals, and other diagnostic results selected in the Options for
Multiple Linear Regression dialog box (see page 498). All results that
qualify as outlying values are flagged with a # symbol. The trigger values
to flag residuals as outliers are set in the Options for Multiple Linear
Regression dialog box.
If you selected Report Cases with Outliers Only, only those observations
that have one or more residuals flagged as outliers are reported; however,
all other results for that observation are also displayed.
Predicted Values This is the value for the dependent variable predicted
by the regression model for each observation.
Residuals These are the raw residuals, the difference between the
predicted and observed values for the dependent variables.
If the residuals are normally distributed about the regression, about 68%
of the standardized residuals have values between −1 and 1, and about
95% of the standardized residuals have values between −2 and 2. A
larger standardized residual indicates that the point is far from the
regression; the suggested value flagged as an outlier is 2.5.
Influence Diagnostics The influence diagnostic results display only the values for the results
selected in the Options dialog box under the Other Diagnostics tab (see
page 505). All results that qualify as outlying values are flagged with a #
symbol. The trigger values to flag data points as outliers are also set in
Options dialog box under the Other Diagnostics tab.
If you selected Report Cases with Outliers Only, only observations that
have one or more observations flagged as outliers are reported; however,
all other results for that observation are also displayed.
Confidence Intervals These results are displayed if you selected them in the Options for
Multiple Linear Regression dialog box. If the confidence interval does
not include zero, you can conclude that the coefficient is different than
zero with the level of confidence specified. This can also be described as
P < α (alpha), where α is the acceptable probability of incorrectly
concluding that the coefficient is different than zero, and the confidence
interval is 100(1 − α).
The specified confidence level can be any value from 1 to 99; the
suggested confidence level for both intervals is 95%.
Predicted This is the value for the dependent variable predicted by the
regression model for each observation.
Regression The confidence interval for the regression gives the range of
variable values computed for the region containing the true relationship
between the dependent and independent variables, for the specified level
of confidence.
Population The confidence interval for the population gives the range
of variable values computed for the region containing the population
from which the observations were drawn, for the specified level of
confidence.
You can generate up to six graphs using the results from a Multiple
Linear Regression. They include a:
Histogram of Residuals The Multiple Linear Regression histogram plots the raw residuals in a
specified range, using a defined interval set. The residuals are divided
into a number of evenly incremented histogram intervals and plotted as
histogram bars indicating the number of residuals in each interval. The
X axis represents the histogram intervals, and the Y axis represents the
number of residuals in each group. For an example of a histogram, see
page 153 in the CREATING AND MODIFYING GRAPHS chapter.
Scatter Plot of The Multiple Linear Regression scatter plot of the residuals plots the
the Residuals residuals of the data in the selected independent variable column as
points relative to the standard deviations. The X axis represents the
independent variable values, the Y axis represents the residuals of the
variables, and the horizontal lines running across the graph represent the
standard deviations of the data. For an example of a scatter plot of the
residuals, see page 152 in the CREATING AND MODIFYING GRAPHS
chapter.
Bar Chart of The Multiple Linear Regression bar chart of the standardized residuals
the Standardized plots the standardized residuals of the data in the selected independent
Residuals variable column as points relative to the standard deviations. The X axis
represents the selected independent variable values, the Y axis represents
the residuals of the variables, and the horizontal lines running across the
graph represent the standard deviations of the data. For an example of a
bar chart of the residuals, see page 153 in the CREATING AND
MODIFYING GRAPHS chapter.
The residuals are sorted and then plotted as points around a curve
representing the area of the Gaussian. Plots with residuals that fall along
the Gaussian curve indicate that your data was taken from a normally
distributed population. The X axis is a linear scale representing the
residual values. The Y axis is a probability scale representing the
cumulative frequency of the residuals. For an example of a normal
probability plot, see page 155 in the CREATING AND MODIFYING
GRAPHS chapter.
Line/Scatter Plot The Multiple Linear Regression line/scatter graph plots the observations
of the Regression with of the linear regression for the data of the selected independent variable
Prediction and column as a line/scatter plot. The points represent the dependent
Confidence Intervals variable data plotted against the selected independent variable data, the
solid line running through the points represents the regression line, and
the dashed lines represent the prediction and confidence intervals. The
X axis represents the independent variables and the Y axis represents the
dependent variables. For an example of a line scatter plot of the
regression, see page 156 in the CREATING AND MODIFYING GRAPHS
chapter.
3D Residual The multiple linear regression 3D residual scatter plot graphs the
Scatter Plot residuals of the two selected columns of independent variable data. The
X and the Y axes represent the independent variables, and the Z axis
represents the residuals. For an example of a 3D scatter plot of the
residuals, see page 156 in the CREATING AND MODIFYING GRAPHS
chapter.
Creating Multiple Linear To generate a report graph of Multiple Linear Regression data:
Regression
Report Graphs 1 Click the toolbar button, or choose the Graph menu Create
Graph command when the Multiple Linear Regression report is
selected. The Create Graph dialog box appears displaying the
types of graphs available for the Multiple Linear Regression results.
2 Select the type of graph you want to create from the Graph Type
list, then select OK, or double-click the desired graph in the list.
For more information on each of the graph types, see pages 11-493
through 11-494.
FIGURE 11–23
The Create Graph Dialog
Box
for the Multiple Linear
Regression Report
FIGURE 11–24
The Dialog Box Prompting
You
to Select the Independent
Variable Column to Plot
3 Select the columns with the independent variables you want to use
in the graph, then select OK. The graph appears using the
specified independent variables.
FIGURE 11–25
A 3D Scatter Plot of
Multiple Comparison
Residuals
If your dependent variable data does not use dichotomous values, use a
Simple Linear Regression if you have one independent variable and a
Multiple Linear Regression if you have more than one independent
variable.
About the Multiple Logistic Regression
Multiple Logistic Regression assumes an association between the
dependent and k independent variables that fits the general equation for
a multidimensional plane:
$$P(y = 1) = \frac{1}{1 + e^{-(b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k)}}$$
4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 541).
FIGURE 12–28
Valid Raw Data Format
for a Multiple Logistic
Regression
Column 1 is the dependent
variable column and
columns 2 through 6
are the independent
variable columns.
Grouped Data The grouped data format enables you to specify the number of instances
a combination of dependent and independent variables appear in a data
set. This data format is useful if you have several instances of the same
variable combination, and you don’t want to enter every instance in the
worksheet.
To enter data in grouped format, place the data for the observed
dependent variable in one column and the data for the corresponding
independent variables in one or more columns. Only enter one instance
of each different combination of dependent and independent variables,
then specify the number of times the combination appears in the data set
in the corresponding row of another worksheet column.
For example, if there are three instances of the dependent variable 0 with
corresponding independent variables of 26 and 142, place 0 in the
dependent variable column, 26 and 142 in the corresponding rows of
the independent variable columns, and 3 in the corresponding row of
the count worksheet column.
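The relationship between the grouped and raw data formats can be sketched as follows. Each grouped row holds one combination of dependent and independent values followed by its count, and expanding it repeats the combination that many times. The function name and the second grouped row are hypothetical; the first row is the example given above.

```python
def expand_grouped(rows):
    """Expand (dependent, x1, ..., xk, count) rows into raw-format rows."""
    raw = []
    for *values, count in rows:
        # Repeat each variable combination once per counted instance.
        raw.extend([tuple(values)] * count)
    return raw


grouped = [
    (0, 26, 142, 3),  # the example above: three instances of (0, 26, 142)
    (1, 31, 150, 2),  # hypothetical second combination
]
raw = expand_grouped(grouped)
```

The expanded list has one row per instance, which is what the raw data format would contain in the worksheet.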
1 If you are going to run the test after changing test options and
want to select your data before you run the test, drag the pointer
over the data.
5 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 541 for more information).
6 To accept the current settings and close the options dialog box,
click OK. To accept the current setting without closing the
options dialog box, click Apply. To close the dialog box without
changing any settings or running the test, click Cancel.
Note: You can select Help at any time to access SigmaStat’s online help system.
Criterion Options Select the Criterion tab in the Options dialog box to set the criterion
options. Use these options to specify the criterion you want to use to
test how well your data fits the logistic regression equation.
FIGURE 12–30
The Options for
Multiple Logistic
Regression Dialog Box
Displaying the
Criterion Options
Statistics Options Select the More Statistics tab in the Options dialog box to view the
statistics options. These options help determine how well your data fits
the logistic regression equation using maximum likelihood as the
estimation criterion.
FIGURE 12–31
The Options for
Multiple Logistic
Regression Dialog Box
Displaying the
Statistics Options
Select the Wald statistic option to include the ratio of the observed
coefficient with the associated standard error in the report. The Wald
statistic can also be used to determine how significant the independent
variables are in predicting the dependent variable. For information on
using the Wald statistic to test whether your data fits the logistic
regression equation, see Wald Statistic on page 548.
Where P is the probability of the event happening. The odds ratio for
an independent variable is computed as
$$\text{odds ratio} = e^{\beta_i}$$
$$e^{\left(b_i - Z_{1-\alpha/2}\, s_{b_i}\right)}$$
Predicted Values Use this option to calculate the predicted value of the
dependent variable for each observed value of the independent
variable(s), then save the results to the data worksheet.
Flagging Multicollinear Data Use the value in the Flag Values > edit
box as a threshold for multicollinear variables. The default threshold
value is 4.0, meaning that any value greater than 4.0 will be flagged as
multicollinear. To make this test more sensitive to possible
multicollinearity, decrease this value. To allow greater correlation of the
independent variables before flagging the data as multicollinear, increase
this value.
When the variance inflation factor is large, there are redundant variables
in the regression model, and the parameter estimates may not be reliable.
Variance inflation factor values above 4 suggest possible
multicollinearity; values above 10 indicate serious multicollinearity.
Report Flagged Values Only To include only the points flagged
by the influential point tests and values exceeding the variance inflation
threshold in the report, make sure the Report Flagged Values Only check
box is selected. Uncheck this option to include all influential points in
the report.
Residuals Select the Residuals tab in the options dialog box to view the Residual
Type, Raw, Standardized, Studentized, Studentized Deleted, and Report
Flagged Values Only options.
Deviance residuals are used to calculate the likelihood ratio test statistic
to assess the overall goodness of fit of the logistic regression equation to
the data. The likelihood ratio test statistic is the sum of squared
deviance residuals. The deviance residual for each point is a measure of
how much that point contributes to the likelihood ratio test statistic.
Larger values of the deviance residual indicate a larger difference
between the observed and predicted values of the dependent variable.
FIGURE 12–32
The Options for
Multiple Logistic
Regression Dialog Box
Displaying the
Residuals Options
Raw Residuals The raw residuals are the differences between the
predicted and observed values of the dependent variables. To include
raw residuals in the report, make sure this check box is selected. Click
the selected check box if you do not want to include raw residuals in the
worksheet.
To include studentized residuals in the report, make sure this check box
is selected. Click the selected check box if you do not want to include
studentized residuals in the worksheet.
SigmaStat can automatically flag data points with “large” values of the
studentized deleted residual, i.e., outlying data points; the suggested data
points flagged lie outside the 95% confidence interval for the regression
population.
Note that both Studentized and Studentized Deleted residuals use the same
confidence interval setting to determine outlying points.
Influence Options Select the Residuals tab in the options dialog box to view the Influence
options (see Figure 12–32 on page 538). Influence options
automatically detect instances of influential data points. Most
influential points are data points which are outliers, that is, they do not
“line up” with the rest of the data points. These points can have a
potentially disproportionately strong influence on the calculation of the
regression line. You can use several influence tests to identify and
quantify influential points.
Check the Leverage check box to compute the leverage for each point
and automatically flag potentially influential points, i.e., those points
that could have leverages greater than the specified value times the
expected leverage. The suggested value is 2.0 times the expected leverage
for the regression (i.e., $2(k + 1)/n$). To avoid flagging more potentially
influential points, increase this value; to flag points with less potential
influence, lower this value.
Check the Cook's Distance check box to compute this value for all
points and flag influential points, i.e., those with a Cook's distance
greater than the specified value. The suggested value is 4.0. Cook's
distances above 1 indicate that a point is possibly influential. Cook's
distances exceeding 4 indicate that the point has a major effect on the
values of the parameter estimates. To avoid flagging more influential
points, increase this value; to flag less influential points, lower this value.
To run a Multiple Logistic Regression, you need to select the data to test.
The Pick Columns dialog box is used to select the worksheet columns
with the data you want to test.
1 If you want to select your data before you run the regression, drag
the pointer over your data.
2 Open the Pick Columns dialog box to start the Multiple Logistic
Regression. You can either:
3 Select the appropriate data format from the Data Format drop-down
list. If every instance of your dependent and independent
variable combination, including repeated combinations, is entered
in the worksheet, select Raw. If the number of repeated dependent
and independent variable combinations is indicated by a value in
a separate column, select Grouped.
FIGURE 12–34
The Pick Columns
for Multiple Logistic
Regression Dialog Box
Prompting You to
Specify a Data Format
4 Select Next to pick the data columns for the test. If you selected
columns before you chose the test, the selected columns appear in
the Selected Columns list.
If you selected Raw as your data format, you are prompted for one
dependent column and up to 64 independent columns. If you
selected Grouped as your data format, you are prompted for one
Count column. Select the column with the values indicating the
number of times a dependent and independent variable combination
is repeated as the Count column. The titles of the selected columns
appear in each row.
FIGURE 12–35
The Pick Columns
for Multiple Logistic
Regression Dialog Box
Prompting You to
Select Data Columns
The other results displayed in the report are enabled or disabled in the
Options for Multiple Logistic Regression dialog box (see page 530).
( The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the
buttons in the formatting toolbar that move one page up or down in the
report.
Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options command and uncheck the Explain Results
option.
$$P = \frac{1}{1 + e^{-(b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k)}}$$
$$\operatorname{Logit} P = \ln\left(\frac{P}{1 - P}\right)$$
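The two equations above can be checked numerically. This minimal Python sketch evaluates the logistic probability for one observation and confirms that the logit transform recovers the linear predictor. The coefficient and data values are hypothetical.

```python
import math


def logistic_probability(b0, b, x):
    """P = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk)))."""
    z = b0 + sum(bi * xi for bi, xi in zip(b, x))
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Logit P = ln(P / (1 - P)); undefined at P = 0 and P = 1."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients and one observation with two independent variables.
b0, b = -1.5, [0.8, 0.3]
x = [2.0, 1.0]
p = logistic_probability(b0, b, x)  # linear predictor is -1.5 + 1.6 + 0.3 = 0.4
linear_predictor = logit(p)         # recovers 0.4 up to rounding
```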
Estimation Criterion Logistic regression uses the maximum likelihood approach to find the
values of the coefficients (bi) in the Logistic Regression Equation that
were most likely to fit the observed data.
Dependent Variable This section of the report indicates which values in the dependent
variable column represent the positive response (1) and which value
represents the reference response (0).
Number of Unique Independent Variable Combinations
This value represents the number of unique combinations of the
independent variables and appears if you have the Number of
Independent Variable Combinations option in the Options for Logistic
Regression dialog box (see page 530) selected. The number of unique
independent variable combinations is compared to the actual number of
independent variables. If this value is less than the value specified for the
Number of Independent Variable Combinations option (page 533), a
warning message appears in the report that your results may be
unreliable.
Hosmer-Lemeshow P Value
The Hosmer-Lemeshow P value indicates how well the logistic regression
equation fits your data by comparing the number of individuals with
When the dataset is small, goodness of fit measures for the logistic
regression should be interpreted with great caution. All of the P values
are based on a chi-square probability distribution, which is not
recommended for use with small numbers of observations.
Pearson The Pearson Chi-Square statistic is the sum of the squared Pearson
Chi-Square Statistic residuals. It is a measure of the agreement between the observed and
predicted values of the dependent variable using a Chi-Square test
statistic. The Chi-Square test statistic is analogous to the residual sum of
squares in ordinary linear regression. Small values of the Chi-Square
(and corresponding large values of the associated P value) indicate a
good agreement between the logistic regression equation and the data
and large values of Chi-Square (and small values of P) indicate a poor
agreement. The Pearson Chi-Square option is set in the Options for
Multiple Logistic Regression dialog box (page 530).
Likelihood Ratio The Likelihood Ratio Test statistic is derived from the sum of the
Test Statistic squared deviance residuals. It indicates how well the logistic regression
equation fits your data by comparing the likelihood of obtaining
observations if the independent variables had no effect on the dependent
variable with the likelihood of obtaining the observations if the
independent variables had an effect on the dependent variables.
$$-2 \sum_{i=1}^{n} \left[ y_i \ln(\hat{y}_i) + (1 - y_i) \ln(1 - \hat{y}_i) \right]$$
where $y_i$ and $\hat{y}_i$ are respectively the observed and predicted values of
the dependent variable, and n is the number of observations. Note that
ln(1) is zero and the observed values must be 0 or 1. Thus the closer the
predicted values are to the observed, the closer this sum will be to zero.
The -2 log likelihood is also equal to the sum of the squared deviance
residuals.
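The relationship between the -2 log likelihood and the deviance residuals can be illustrated with a short Python sketch. The data are hypothetical, and the deviance residual for a reference response (y = 0) uses the standard textbook form, which is an assumption here because this section does not show it.

```python
import math


def minus_two_log_likelihood(y, p):
    """-2 * sum of y*ln(p) + (1 - y)*ln(1 - p) over all observations."""
    return -2.0 * sum(
        yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
        for yi, pi in zip(y, p)
    )


def deviance_residual(yi, pi):
    """Signed deviance residual for one observation with response 0 or 1."""
    if yi == 1:
        return math.sqrt(2.0 * math.log(1.0 / pi))
    # Assumed standard form for the reference response (y = 0).
    return -math.sqrt(2.0 * math.log(1.0 / (1.0 - pi)))


# Hypothetical observed responses and predicted probabilities.
y = [1, 0, 1, 1, 0]
p = [0.9, 0.2, 0.6, 0.7, 0.4]

ll = minus_two_log_likelihood(y, p)
sum_sq_dev = sum(deviance_residual(yi, pi) ** 2 for yi, pi in zip(y, p))
# ll and sum_sq_dev agree up to floating-point rounding.
```

Predictions closer to the observed 0 or 1 values give a smaller -2 log likelihood, which is why the statistic serves as a measure of fit.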
The -2 log likelihood (LL) statistic is related to the likelihood ratio (LR):
LR = LL – LL 0
Threshold Probability for The threshold probability value determines whether the response
Positive Classification predicted by the logistic model in the classification and probability
tables (see following sections) is a positive or a reference response. If the
estimated probability in the probability table (see page 532) exceeds the
specified threshold probability value, the predicted variable is assigned a
positive response (value of 1); probabilities less than or equal to the
specified value are assigned a value of 0 or a reference value. The
threshold probability value is set in the options dialog box (see page
532).
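The classification rule described above amounts to a single comparison per observation. In this Python sketch, the function name and the default threshold of 0.5 are assumptions chosen for illustration; use whatever threshold is set in the options dialog box.

```python
def classify(probabilities, threshold=0.5):
    """Assign 1 (positive response) when the estimated probability exceeds
    the threshold, and 0 (reference response) otherwise."""
    return [1 if p > threshold else 0 for p in probabilities]


# Hypothetical estimated probabilities from a probability table.
estimated = [0.20, 0.50, 0.51, 0.90]
predicted = classify(estimated, threshold=0.5)
```

Note that a probability exactly equal to the threshold is assigned the reference response, as stated in the text.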
Probability Table The Probability Table lists the actual responses of the dependent
variable, the estimated logistic probability of a positive response (a value
of 1), and the predicted response of the dependent variables. The
predicted responses are assigned values of 1 (positive response) or 0
(reference response) derived by comparing estimated logistic
probabilities to the specified threshold probability value (see preceding
section).
This table appears in the report if the Predicted Values option is selected
in the options dialog box (see page 535).
Statistical The summary table lists the coefficient, standard error, Wald Statistic,
Summary Table Odds Ratio, Odds Ratio Confidence, P value, and VIF for the
independent variables.
These values are used to compute the Wald statistic and confidence
intervals for the regression coefficients.
$$z = \frac{b_i}{s_{b_i}}$$
P value P is the P value calculated for the Wald statistic. The P value is
the probability of being wrong in concluding that there is a true
association between the variables. The P value is based on the chi-square
distribution with one degree of freedom. The smaller the P value, the
greater the probability that the independent variables affect the
dependent variable.
$$\text{Odds Ratio} = e^{\beta_i}$$
Odds Ratio Confidence These two values represent the lower and
upper ends of the confidence interval in which the true odds ratio lies.
The level of confidence (95%) is specified in the options dialog box (see
page 534).
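The quantities in the summary table can be sketched in Python for a single coefficient. The coefficient and standard error are hypothetical. The confidence interval uses the common Wald form, exponentiating the coefficient plus or minus a normal critical value times the standard error; this form is an assumption and may differ from SigmaStat's internal calculation.

```python
import math
import statistics


def wald_summary(b, se, confidence=0.95):
    """Return the Wald statistic, P value, odds ratio, and odds ratio
    confidence limits for one logistic regression coefficient."""
    z = b / se
    # P value from the chi-square distribution with one degree of freedom,
    # evaluated at z squared; this equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    odds_ratio = math.exp(b)
    z_crit = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    lower = math.exp(b - z_crit * se)  # assumed Wald-type interval
    upper = math.exp(b + z_crit * se)
    return z, p_value, odds_ratio, lower, upper


# Hypothetical coefficient and standard error.
z, p_value, odds_ratio, lower, upper = wald_summary(b=0.8, se=0.4)
```

If the interval from `lower` to `upper` excludes 1, the P value falls below 1 minus the confidence level, so the two results lead to the same conclusion.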
Residual The residual calculation method indicates how the residuals for the
Calculation Method logistic regression are calculated. You can choose Pearson or Deviance
residuals from the Options for Logistic Regression dialog box. This
choice does not affect the logistic regression itself, which minimizes the
sum of the squared deviance residuals, but does affect how the Studentized
residuals are calculated.
where $y_i$ and $\hat{y}_i$ are respectively the observed and predicted values for the
ith case.
$$+\sqrt{2 \ln \frac{1}{\hat{y}_i}} \quad \text{for } y_i = 1$$
Residuals Table The residuals table displays the raw, Pearson or Deviance, studentized,
and studentized deleted residuals if the associated options are selected in
the options dialog box (see page 537). All residuals that qualify as
outlying values are flagged with a # symbol. The trigger values to flag
residuals as outliers are also set in the Options for Multiple Logistic
Regression dialog box.
If you selected Report Cases with Outliers Only, only those observations
that have one or more residuals flagged as outliers are reported; however,
all other results for that observation are also displayed. The way the
residuals are calculated depends on whether Pearson or Deviance is
selected as the residual type in the options dialog box (see page 537).
Row This is the row number of the observation. Note that if your data
has a case with a value missing, the corresponding row is entirely
omitted from the table of residuals.
Raw Residuals Raw residuals are the difference between the predicted
and observed values for each of the subjects or cases.
Influence Diagnostics The influence diagnostic results display only the values for the results
selected in the Options dialog box under the More Statistics tab (see
page 539). All results that qualify as outlying values are flagged with a #
symbol.
If you selected Report Flagged Values Only, only observations that have
one or more values flagged as outliers are reported; however, all
other results for that observation are also displayed.
Row This is the row number of the observation. Note that if your data
has a case with a value missing, the corresponding row is entirely
omitted from the table of residuals.
About the Polynomial Polynomial Regression assumes an association between the independent
Regression and dependent variables that fits the general equation for a polynomial
of order k
$$y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \cdots + b_k x^k$$
4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 541).
Place the data for the dependent variable in one column and the
corresponding data for the observed independent variable in another
column.
FIGURE 12–37
Data Format for a
Polynomial Regression
Selecting Data Columns When running a Polynomial Regression, you can either:
1 If you are going to run the test after changing test options, and
want to select your data before you run the test, drag the pointer
over your data.
5 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 564 for more information).
6 To accept the current settings and close the options dialog box,
click OK. To accept the current setting without closing the
options dialog box, click Apply. To close the dialog box without
changing any settings or running the test, click Cancel.
( You can select Help at any time to access SigmaStat’s on-line help system.
Criterion Options Select the Criterion tab from the options dialog box to view the
Polynomial Order and Regression options. Use these options to specify
the polynomial order to use and the type of polynomial to use to
evaluate your data.
FIGURE 12–38
The Options for Polynomial
Regression Dialog Box
Displaying the
Criterion Options
Note this option does not display all regression results; instead, it is used
to evaluate the order for the best model to use. Once the order is
determined, run an order only polynomial regression to obtain complete
regression results.
Assumption Checking Select the Assumption Checking tab from the options dialog box to view
Options the Normality, Constant Variance, and Durbin-Watson options. These
options test your data for its suitability for regression analysis by
checking three assumptions that a polynomial regression makes about
the data. A polynomial regression assumes:
( Although the assumption tests are robust in detecting data from populations
that are non-normal or with non-constant variances, there are extreme
conditions of data distribution that these tests cannot detect. However, these
conditions should be easily detected by visually examining the data without
resorting to the automatic assumption tests.
FIGURE 12–39
The Options for Polynomial
Regression Dialog Box
Displaying the Assumption
Checking Options
Difference from 2 Value Enter the acceptable deviation from 2.0 that
you consider as evidence of a serial correlation in the Difference from 2.0
box. If the computed Durbin-Watson statistic deviates from 2.0 by more
than the entered value, SigmaStat warns you that the residuals may not
be independent. The suggested deviation value is 0.50, i.e., Durbin-Watson
statistic values greater than 2.5 or less than 1.5 flag the residuals
as correlated.
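For readers who want to see the check in concrete terms, the following Python sketch computes the Durbin-Watson statistic from a list of residuals using its standard definition and applies the deviation rule described above. The function names and residual values are hypothetical.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def flag_serial_correlation(dw, difference=0.50):
    """True when the statistic deviates from 2.0 by more than `difference`."""
    return abs(dw - 2.0) > difference


# Hypothetical residuals that alternate in sign (negative serial correlation).
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
dw = durbin_watson(alternating)        # 20 / 6, well above 2.5
flagged = flag_serial_correlation(dw)  # True
```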
Residuals Select the Residuals tab in the options dialog box to view the Predicted
Values, Raw, Standardized, Studentized, Studentized Deleted, and
Report Flagged Values Only options.
Predicted Values Use this option to calculate the predicted value of the
dependent variable for each observed value of the independent
variable(s), then save the results to the worksheet. Click the selected
check box if you do not want to include predicted values in the worksheet.
FIGURE 12–40
The Options for Polynomial
Regression Dialog Box
Displaying the
Residual Options
Raw Residuals The raw residuals are the differences between the
predicted and observed values of the dependent variables. To include
raw residuals in the report, make sure this check box is selected. Click
the selected check box if you do not want to include raw residuals in the
worksheet.
To include studentized residuals in the report, make sure this check box
is selected. Click the selected check box if you do not want to include
studentized residuals in the worksheet.
SigmaStat can automatically flag data points with “large” values of the
studentized deleted residual, i.e., outlying data points; the suggested data
points flagged lie outside the 95% confidence interval for the regression
population.
( Note that both Studentized and Studentized deleted residuals use the same
confidence interval setting to determine outlying points.
FIGURE 12–41
The Options for Polynomial
Regression Dialog Box
Displaying the
Confidence intervals,
PRESS Prediction Error,
and Standard
Coefficients Options
Click the selected check box if you do not want to include the
confidence intervals for the population in the report.
PRESS Select the More Statistics tab in the options dialog box to view the
Prediction Error PRESS Prediction Error option (see Figure 12–42 on page 564). The
PRESS Prediction Error is a measure of how well the regression equation
fits the data. Leave this check box selected to evaluate the fit of the
equation using the PRESS statistic. Click the selected check box if you
do not want to include the PRESS statistic in the report.
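As background, the PRESS statistic is conventionally defined as the sum of squared prediction errors obtained when each observation is left out in turn and predicted from a fit to the remaining data. The Python sketch below illustrates that definition for the simplest case, a straight line (order 1), to keep the code short. The function names and data are hypothetical, and the sketch is not SigmaStat's implementation.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def press(x, y):
    """Sum of squared leave-one-out prediction errors."""
    total = 0.0
    for i in range(len(x)):
        # Refit without observation i, then predict it.
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total


# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.3]
press_value = press(x, y)
```

Because each point is predicted from a fit that excludes it, PRESS is never smaller than the ordinary residual sum of squares; a PRESS value close to the residual sum of squares suggests the equation predicts new data about as well as it fits the existing data.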
Standardized Coefficients (βi)
Click the More Statistics tab in the options dialog box to view the
Standardized Coefficients option (see Figure 12–42 on page 564).
These are the coefficients of the regression equation standardized to
dimensionless values,
$$\beta_i = b_i \frac{s_{x_i}}{s_y}$$
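A minimal Python sketch of the standardization follows: the coefficient is multiplied by the standard deviation of its independent variable and divided by the standard deviation of the dependent variable. The function name and data are hypothetical.

```python
import statistics


def standardized_coefficient(b_i, x_i, y):
    """Regression coefficient scaled by s_x / s_y to a dimensionless value."""
    return b_i * statistics.stdev(x_i) / statistics.stdev(y)


# Hypothetical data lying exactly on the line y = 2x, so b = 2.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
beta = standardized_coefficient(2.0, x, y)
```

Because only the ratio of the two standard deviations enters the formula, the result is the same whether sample or population standard deviations are used.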
Power Select the Post Hoc Tests tab in the options dialog box to view the Power
option. If you can’t see the Post Hoc Tests tab in the options dialog box, click
the right pointing arrow at the right of the tabs to move the tab into view.
Use the left pointing arrow to move the other tabs back into view.
Check the Power check box to compute the power for the polynomial
regression data. Change the alpha value by editing the number in the
Alpha Value edit box. The suggested value is α = 0.05. This indicates
FIGURE 12–42
The Options for Polynomial
Regression Dialog Box
Displaying the
Power Option
To run a Polynomial Regression you need to select the data to test. The
Pick Columns dialog box is used to select the worksheet columns with
the data you want to test.
1 If you want to select your data before you run the regression, drag
the pointer over your data.
FIGURE 12–43
The Pick Columns
for Polynomial Regression
Dialog Box Prompting You to
Select Data Columns
( The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the
buttons in the formatting toolbar that move one page up or down in the
report.
Results Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Options command, choose Report, and unselect the Explain
Results option. The number of decimal places displayed is also set in the
Report Options dialog box. For more information on setting report
options, see page 135.
Regression Equation These are the regression equations for each order, with the values of the
coefficients in place. The equations take the form:
$$y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \cdots + b_k x^k$$
FIGURE 12–44
An Example of the
Incremental
Polynomial
Regression Report
Incremental Results MSres (Residual Mean Square) The residual mean square is a measure
of the variation of the residuals about the regression line.
$$\frac{\text{residual sum of squares}}{\text{residual degrees of freedom}} = \frac{SS_{res}}{DF_{res}} = MS_{res}$$
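The residual mean square can be sketched in Python as follows. The residual degrees of freedom are taken to be the number of observations minus the number of estimated coefficients (k + 1 for an order k polynomial with a constant term), which is the usual convention and an assumption here. The function name and data are hypothetical.

```python
def residual_mean_square(observed, predicted, n_coefficients):
    """MS_res = SS_res / DF_res, with DF_res = n - number of coefficients."""
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    df_res = len(observed) - n_coefficients
    return ss_res / df_res


# Hypothetical observed and predicted values from a first-order fit
# (two coefficients: b0 and b1).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
ms_res = residual_mean_square(observed, predicted, n_coefficients=2)
```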
Assumption Testing Normality Normality test result displays whether or not the polynomial
model passed or failed the test of the assumption that the source
population is normally distributed around the regression curve, and the
P value calculated by the test. All regression requires a source population
to be normally distributed about the regression curve.
Choosing the The smaller the residual sum of squares and mean square, the closer the
Best Model curve matches the data at those values of the independent variable. The
first model that has a significant increase in the incremental F value is
generally the best model to use. Because the R2 value increases as the
order increases, you also want to use the simplest model that adequately
describes the data.
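One common way to judge whether raising the polynomial order is worthwhile is the incremental F value: the drop in the residual sum of squares from adding one term, divided by the residual mean square of the higher-order model. The Python sketch below uses that textbook definition, which may differ in detail from the value SigmaStat reports. All sums of squares and the sample size are hypothetical.

```python
def incremental_f(ss_res_lower, ss_res_higher, df_res_higher):
    """F for adding one term: reduction in SS_res over MS_res of the
    higher-order model (1 numerator degree of freedom)."""
    ms_res_higher = ss_res_higher / df_res_higher
    return (ss_res_lower - ss_res_higher) / ms_res_higher


# Hypothetical residual sums of squares for orders 1, 2, and 3 with n = 20.
n = 20
ss_res = {1: 120.0, 2: 40.0, 3: 38.5}

f_order_2 = incremental_f(ss_res[1], ss_res[2], df_res_higher=n - 3)
f_order_3 = incremental_f(ss_res[2], ss_res[3], df_res_higher=n - 4)
```

In this made-up example the second-order term gives a large F value and the third-order term a small one, so the second-order model would be the simplest model that adequately describes the data. Judging significance requires comparing each F value with the F distribution for 1 and DF_res degrees of freedom.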
The other results displayed in the report are selected in the Options for
Polynomial Regression dialog box (see page 555).
( The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the
buttons in the formatting toolbar that move one page up or down in the
report.
Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options command and unselect the Explain Results
option.
Regression Equation This is the equation with the values of the coefficients in place. This
equation takes the form:
$$y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \cdots + b_k x^k$$
Analysis of MSres (Residual Mean Square) The mean square provides an estimate
Variance (ANOVA) of the population variance. The residual mean square is a measure of the
variation of the residuals about the regression curve, or
Standard Error of The standard error of the estimate s y x is a measure of the actual
the Estimate ( s y x ) variability about the regression line of the underlying population. The
Regression assumes that the residuals are independent of each other; the
Durbin-Watson test is used to check this assumption. If the Durbin-
Watson value deviates from 2 by more than the value set in the Options
for Polynomial Regression dialog box, a warning appears in the report.
The suggested trigger value is a difference of more than 0.50 (i.e., if the
Durbin-Watson statistic is below 1.5 or over 2.5).
Normality Test The normality test results display whether or not the polynomial model
passed or failed the test of the assumption that the source population is
normally distributed around the regression curve, and the P value
calculated by the test. All regression requires a source population to be
normally distributed about the regression curve.
This result appears unless you disabled normality testing in the Options
for Polynomial Regression dialog box (see page 555).
Constant The constant variance test result displays whether or not the polynomial
Variance Test model passed or failed the test of the assumption that the variance of the
dependent variable in the source population is constant regardless of the
value of the independent variable, and the P value calculated by the test.
If you receive this warning, you should consider trying a different model
(i.e., one that more closely follows the pattern of the data), or
transforming the independent variable to stabilize the variance and
obtain more accurate estimates of the parameters in the regression
equation. For more information on the appropriate transform to use,
see Using Quick Transforms to Linearize and Normalize Data on page
749.
This result appears unless you disabled constant variance testing in the
Options for Polynomial Regression dialog box (see page 557).
Regression Diagnostics The regression diagnostic results display only the values for the predicted
values, residual results, and other diagnostics selected in the Options for
Polynomial Regression dialog box (see page 559). All results that qualify
as outlying values are flagged with a # symbol. The trigger values to flag
residuals as outliers are set in the Options for Polynomial Regression
dialog box.
If you selected Report Cases with Outliers Only, only those observations
that have one or more residuals flagged as outliers are reported; however,
all other results for that observation are also displayed.
Residuals These are the raw residuals, the difference between the
predicted and observed values for the dependent variables.
If the residuals are normally distributed about the regression line, about
68% of the standardized residuals have values between −1 and 1, and
about 95% of the standardized residuals have values between −2 and 2.
A larger standardized residual indicates that the point is far from the
regression line; the suggested value flagged as an outlier is 2.5.
Confidence Intervals These results are displayed if you selected them in the Options for
Polynomial Regression dialog box. If the confidence interval does not
include zero, you can conclude that the coefficient is different from zero
with the level of confidence specified. This can also be described as P < α.
The specified confidence level can be any value from 1 to 99; the
suggested confidence level for both intervals is 95%.
Predicted This is the value for the dependent variable predicted by the
regression model for each observation.
Regression These are the values that define the region containing the
true relationship between the dependent and independent variables, for
the specified level of confidence, centered at the predicted value.
Population Confidence Interval These are the values that define the
region containing the population from which the observations were
drawn, for the specified level of confidence, centered at the predicted
value.
You can generate up to five graphs using the results from a Polynomial
Regression. They include a:
Scatter Plot of The polynomial regression scatter plot of the residuals plots the residuals
the Residuals of the independent variables data as points relative to the standard
deviations. The X axis represents the independent variable values, the Y
axis represents the residuals of the variables, and the horizontal lines
running across the graph represent the standard deviations of the data.
For an example of a scatter plot, see page 152.
Bar Chart of The Polynomial Regression bar chart of the standardized residuals plots
the Standardized the standardized residuals of the independent variable data as points
Residuals relative to the standard deviations. The X axis represents the
independent variable values, the Y axis represents the residuals of the
variable data, and the horizontal lines running across the graph represent
the standard deviations of the data. For an example of a bar chart, see
page 153.
Line/Scatter Plot The Polynomial Regression graph plots the observations of the
of the Regression with polynomial regression for the independent variables as a line/scatter plot.
Prediction and The points represent the data dependent variables plotted against the
Confidence Intervals selected independent variables, the solid line running through the points
represents the regression line, and the dashed lines represent the
prediction and confidence intervals. The X axis represents the
independent variables and the Y axis represents the dependent variables.
For an example of a line/scatter plot of the regression, see page 156.
2 Select the type of graph you want to create from the Graph Type
list, then select OK, or double-click the desired graph in the list.
The selected graph appears in a graph window. For more
information on each of the graph types, see pages 12-575 through
12-575.
FIGURE 12–46
A Line and Scatter Plot of
the Regression and
Confidence and Prediction
Intervals for a Polynomial
Regression Report
If you already know the independent variables you want to include, use
Multiple Linear Regression. If you want to find the few best equations
from all possible models, use Best Subsets Regression. If the
relationship is not a straight line or plane, use Polynomial or Nonlinear
Regression.
About Stepwise Linear Stepwise Regression is a technique for selecting independent variables
Regression for a Multiple Linear Regression equation from a list of candidate
variables. Using Stepwise Regression instead of regular Multiple Linear
Regression avoids using extraneous variables, or under specifying or over
specifying the model.
$$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots + b_k x_k$$
where y is the dependent variable, x1, x2, x3, ..., xk are the independent
variables, and b0, b1, b2, ..., bk are the regression coefficients. The
independent variable is the known, or predictor, variable. As the values
for xi vary, the corresponding value for y either increases or decreases,
depending on the sign of bi. Stepwise Regression determines which
independent variables to use by adding or removing selected
independent variables from the equation.
This process is repeated until adding or removing variables does not
significantly improve the prediction of the dependent variable.
This process is repeated until removing or adding variables does not
significantly improve the prediction of the dependent variable.
The data format for a Stepwise Linear Regression consists of the data for
the independent variables in one or more columns and the
corresponding data for the observed dependent variable in a single
column. Any observations containing missing values are ignored, and
the columns must be equal in length.
FIGURE 12–47
Data Format for a
Stepwise Linear Regression
1 If you are going to run the test after changing test options, and
want to select your data before you run the test, drag the pointer
over the data.
5 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 594 for more information).
( You can select Help at any time to access SigmaStat’s on-line help system.
Criterion Options Select the Criterion tab from the options dialog box to view the F-to-
Enter, F-to-Remove, and Number of Steps options. Use these options to
specify the independent variables that are entered into, replaced, or
removed from the regression equation during the stepwise regression,
and to specify when the stepwise algorithm stops.
( The F-to-Enter value should always be greater than or equal to the F-to-
Remove value, to avoid cycling variables in and out of the regression model.
( If you are performing backwards stepwise regression and you want any
variable that has been removed to remain deleted, increase the F-to-Enter
value to a large number, e.g., 100000.
( The F-to-Remove value should always be less than or equal to the F-to-Enter
value, to avoid cycling variables in and out of the regression model.
( If you are performing forwards stepwise regression and you want any variable
that has been entered to remain in the equation, set the F-to-Remove
value to zero.
Number of Steps Use this option to set the maximum number of steps
permitted before the stepwise algorithm stops. Note that if the
algorithm stops because it ran out of steps, the results are probably not
reliable. The suggested number of steps is 20 added or deleted
independent variables.
Assumption Checking Select the Assumption Checking tab from the options dialog box to view
Options the Normality, Constant Variance, and Durbin-Watson options. These
FIGURE 12–49
The Options for Stepwise
Regression Dialog Box
Displaying the Assumption
Checking Options
( Although the assumption tests are robust in detecting data from populations
that are non-normal or with non-constant variances, there are extreme
conditions of data distribution that these tests cannot detect. However, these
conditions should be easily detected by visually examining the data without
resorting to the automatic assumption tests.
Difference from 2 Value Enter the acceptable deviation from 2.0 that
you consider as evidence of a serial correlation in the Difference from 2.0
box. If the computed Durbin-Watson statistic deviates from 2.0 by more
than the entered value, SigmaStat warns you that the residuals may not
be independent. The suggested deviation value is 0.50, i.e., Durbin-Watson
statistic values greater than 2.5 or less than 1.5 flag the residuals
as correlated.
Residuals Select the Residuals tab in the options dialog box to view the Predicted
Values, Raw, Standardized, Studentized, Studentized Deleted, and
Report Flagged Values Only options.
Predicted Values Use this option to calculate the predicted value of the
dependent variable for each observed value of the independent
variable(s), then save the results to the data worksheet. Click the selected
check box if you do not want to include predicted values in the worksheet.
Raw Residuals The raw residuals are the differences between the
predicted and observed values of the dependent variables. To include
raw residuals in the report, make sure this check box is selected. Click
the selected check box if you do not want to include raw residuals in the
worksheet.
FIGURE 12–50
The Options for Stepwise
Regression Dialog Box
Displaying the Residuals
Options
To include studentized residuals in the report, make sure this check box
is selected. Click the selected check box if you do not want to include
studentized residuals in the worksheet.
SigmaStat can automatically flag data points with “large” values of the
studentized deleted residual, i.e., outlying data points; the suggested data
points flagged lie outside the 95% confidence interval for the regression
population.
( Note that both Studentized and Studentized deleted residuals use the same
confidence interval setting to determine outlying points.
Confidence Intervals Select the More Statistics tab in the options dialog box to view the
confidence interval options. You can set the confidence interval for the
population, regression, or both, and then save them to the worksheet.
Click the selected check box if you do not want to include the
confidence intervals for the population in the report.
PRESS Select the More Statistics tab in the options dialog box to view the
Prediction Error PRESS Prediction Error option (see Figure 12–48). The PRESS
Prediction Error is a measure of how well the regression equation fits the
data. Leave this check box selected to evaluate the fit of the equation
using the PRESS statistic. Click the selected check box if you do not
want to include the PRESS statistic in the report.
Standardized Coefficients (βi)
Click the More Statistics tab in the options dialog box to view the
Standardized Coefficients option (see Figure 12–48 on page 582).
These are the coefficients of the regression equation standardized to
dimensionless values,
$$\beta_i = b_i \frac{s_{x_i}}{s_y}$$
Influence Options Select the Other Diagnostics tab in the options dialog box to view the
Influence options (see Figure 12–48 on page 582). If Other Diagnostic
is hidden, click the right pointing arrow to the right of the tabs to move
it into view. Use the left pointing arrow to move the other tabs back
into view.
FIGURE 12–52
A Graph with an
Influential Outlying Point
The solid line shows the
regression for the data
including the outlier, and
the dotted line is the
regression computed
without the outlying point.
Predicted values that change by more than two standard errors when the
data point is removed are considered to be influential.
Check the DFFITS check box to compute this value for all points and
flag influential points, i.e., those with DFFITS greater than the value
specified in the Flag Values > edit box. The suggested value is 2.0
standard errors, which indicates that the point has a strong influence on
the data. To avoid flagging more influential points, increase this value;
to flag less influential points, decrease this value.
Check the Leverage check box to compute the leverage for each point
and automatically flag potentially influential points, i.e., those points
that could have leverages greater than the specified value times the
expected leverage. The suggested value is 2.0 times the expected leverage
for the regression (i.e., $\frac{2(k + 1)}{n}$). To avoid flagging more potentially
influential points, increase this value; to flag points with less potential
influence, lower this value.
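The leverage flag can be sketched in Python for the simplest case of one independent variable (k = 1). The closed-form leverage used here is the standard textbook formula for simple linear regression and is an assumption about the setting, not a description of SigmaStat's internal code; the function names and data are hypothetical.

```python
def leverages(x):
    """Leverage of each point for a regression with one independent variable."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]

def flag_high_leverage(x, flag_multiple=2.0, k=1):
    """Indices of points whose leverage exceeds flag_multiple * (k + 1) / n."""
    n = len(x)
    threshold = flag_multiple * (k + 1) / n
    return [i for i, h in enumerate(leverages(x)) if h > threshold]

# Hypothetical independent variable values; the last one lies far from the rest.
x = [100, 120, 140, 160, 180, 200, 220, 300]
flagged = flag_high_leverage(x)
print(flagged)
```

The leverages always sum to k + 1, which is why (k + 1)/n is the expected leverage of a single point.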
Check the Cook's Distance check box to compute this value for all
points and flag influential points, i.e., those with a Cook's distance
greater than the specified value. The suggested value is 4.0. Cook's
distances above 1 indicate that a point is possibly influential. Cook's
distances exceeding 4 indicate that the point has a major effect on the
values of the parameter estimates. To avoid flagging more influential
points, increase this value; to flag less influential points, lower this value.
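The following Python sketch computes Cook's distance for a regression with one independent variable and flags values above 4.0. It uses the standard textbook definition, which is assumed here rather than taken from SigmaStat; all names and data are hypothetical.

```python
def cooks_distances(x, y):
    """Cook's distance of each point for a straight-line least-squares fit."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - p)
    lev = [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    return [e * e * h / (p * mse * (1.0 - h) ** 2)
            for e, h in zip(residuals, lev)]

# Hypothetical data: the first five points lie on y = x, the last does not.
x = [1, 2, 3, 4, 5, 10]
y = [1, 2, 3, 4, 5, 20]
distances = cooks_distances(x, y)
flagged = [i for i, d in enumerate(distances) if d > 4.0]
print(flagged)
```

Only the last point is flagged: it combines a high leverage with a residual large enough to shift the fitted line.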
Variance Select the Other Diagnostics tab in the options dialog box to view the
Inflation Factor Variance Inflation Factor option (see Figure 12–48 on page 582). If
Other Diagnostics is hidden, click the right pointing arrow to the right of
the tabs to move it into view. Use the left pointing arrow to move the
other tabs back into view.
FIGURE 12–54
A Graph with Multicollinear Data Points
Note that knowing the value of one of the independent variables allows you to
predict the other so that the independent variables are not statistically
independent.
(Axes: Independent x1, Independent x2, Dependent y.)
Flagging Multicollinear Data Use the value in the Flag Values > edit
box as a threshold for multicollinear variables. The default threshold
value is 4.0, meaning that any value greater than 4.0 will be flagged as
multicollinear. To make this test more sensitive to possible
multicollinearity, decrease this value. To allow greater correlation of the
independent variables before flagging the data as multicollinear, increase
this value.
When the variance inflation factor is large, there are redundant variables
in the regression model, and the parameter estimates may not be reliable.
Variance inflation factor values above 4 suggest possible
multicollinearity; values above 10 indicate serious multicollinearity.
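For two independent variables, the variance inflation factor reduces to 1/(1 − r²), where r is the correlation between the two variables. The Python sketch below uses that textbook identity; it is an illustration under that assumption, and the function names and data are hypothetical.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

x1 = [1, 2, 3, 4, 5]
x2_loose = [2, 1, 4, 3, 5]   # moderately correlated with x1
x2_tight = [1, 2, 3, 4, 6]   # almost a linear function of x1

vif_loose = vif_two_predictors(x1, x2_loose)
vif_tight = vif_two_predictors(x1, x2_tight)
print(vif_loose > 4.0, vif_tight > 4.0)
```

With the threshold of 4.0, only the second pair is flagged as multicollinear.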
Report Flagged Values Only To include only the points flagged
by the influential point tests and values exceeding the variance inflation
threshold in the report, make sure the Report Flagged Values Only check
box is selected. Uncheck this option to include all influential points in
the report.
Power Select the Other Diagnostics tab in the options dialog box to view the
Power options (see Figure 12–48 on page 582). If Other Diagnostics is
hidden, click the right pointing arrow to the right of the tabs to move it
into view.
Check the Power check box to compute the power for the stepwise linear
regression data. Change the alpha value by editing the number in the
Alpha Value edit box. The suggested value is α = 0.05. This indicates
that a one in twenty chance of error is acceptable, or that you are willing
to conclude there is a significant relationship when P < 0.05.
To run a Stepwise Regression, you need to select the data to test. The
Pick Columns dialog box is used to select the worksheet columns with
the data you want to test and to specify which independent variables to
include in and omit from the regression equation.
1 If you want to select your data before you run the regression, drag
the pointer over your data.
FIGURE 12–55
The Pick Columns
for Stepwise Regression
Dialog Box Prompting You to
Select Data Columns
FIGURE 12–56
The Pick Columns
for Forward Stepwise
Regression Dialog Box
Prompting You to
Select Columns With
the Variables to
Force into the Equation
The report for both Forward and Backward Stepwise Regression displays
the variables that were entered or removed for that step, the regression
coefficients, an ANOVA table, and information about the variables in
and not in the model. Regression diagnostics, confidence intervals, and
predicted values are listed for the final regression model if these options
were selected in the Options for Forward or Backward Regression dialog
box. For more information on selecting regression options, see Setting
Stepwise Regression Options on page 579.
The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the
buttons in the formatting toolbar to move one page up and down in the
report.
Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options command and uncheck the Explain Test Result
option.
FIGURE 12–57
Example of a
Forward Stepwise
Regression Report
This is the worksheet column used as the dependent variable in the
regression computation.
F-to-Enter These are the F values specified in the Options for Stepwise Regression
F-to-Remove dialog box.
Step The step number, variable added or removed, R, R2, and the adjusted R2
for the equation, and standard error of the estimate are all listed under
this heading.
R equals 0 when the values of the independent variables do not allow
any prediction of the dependent variable, and equals 1 when you can
perfectly predict the dependent variable from the independent
variables.
Adjusted R2 The adjusted R2, R2adj, is also a measure of how well the
regression model describes the data, but takes into account the number
of independent variables, which reflects the degrees of freedom. Larger
R2adj values (nearer to 1) indicate that the equation is a good description
of the relation between the independent and dependent variables.
Analysis of Variance The ANOVA (analysis of variance) table lists the ANOVA statistics for
(ANOVA) Table the regression and the corresponding F value for each step.
The residual mean square is also equal to $s_{y|x}^2$.
Variables in Model Information about the independent variables used in the regression
equation for the current step is listed under this heading. The values of
the variable coefficients, standard errors, the F-to-Remove, and the
corresponding P value for the F-to-Remove are listed. These statistics
are displayed for each step. An asterisk (*) indicates variables that were
forced into the model.
Note that the F-to-Remove value is the cutoff that determines if a variable is
removed from or stays in the equation.
Variables The variables not entered or removed from the model are listed under
not in Model this heading, along with their corresponding F-to-Enter and P values.
Note that it is the F-to-Enter value that determines which variable is
entered into or remains out of the equation.
PRESS Statistic PRESS, the Predicted Residual Error Sum of Squares, is a gauge of how
well a regression model predicts new data. The smaller the PRESS
statistic, the better the predictive ability of the model.
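One way to see what PRESS measures is to compute it directly: leave each observation out in turn, refit the regression, and sum the squared errors made in predicting the omitted points. The Python sketch below does this for a single independent variable; it is a minimal illustration under that assumption, with hypothetical names and data.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one independent variable."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def press_statistic(x, y):
    """Sum of squared leave-one-out prediction errors."""
    total = 0.0
    for i in range(len(x)):
        # Refit the line without observation i, then predict observation i.
        intercept, slope = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (intercept + slope * x[i])) ** 2
    return total

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_exact = [2.0, 4.0, 6.0, 8.0, 10.0]   # lies exactly on a line
y_noisy = [2.1, 3.8, 6.3, 7.9, 10.4]   # scattered about the same line

print(press_statistic(x, y_exact) < press_statistic(x, y_noisy))
```

Data that lie exactly on a line give a PRESS of zero; scatter about the line increases it.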
Regression assumes that the residuals are independent of each other; the
Durbin-Watson test is used to check this assumption. If the Durbin-
Watson value deviates from 2 by more than the value set in the Options
for Stepwise Regression dialog box, a warning appears in the report. The
suggested trigger value is a difference of more than 0.50, i.e., when the
Durbin-Watson statistic is less than 1.5 or greater than 2.5.
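The warning rule can be sketched in Python as follows. The statistic is computed with the standard textbook definition (sum of squared differences between successive residuals, divided by the sum of squared residuals), which is assumed here; the function names and residual values are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in observation order."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

def residuals_look_correlated(residuals, max_difference=0.50):
    """True when the statistic deviates from 2 by more than max_difference."""
    return abs(durbin_watson(residuals) - 2.0) > max_difference

alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
paired = [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0]

print(residuals_look_correlated(alternating))
print(residuals_look_correlated(paired))
```

The alternating residuals give a statistic well above 2.5 and trigger the warning; the paired residuals give exactly 1.5, which does not exceed the 0.50 difference.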
Normality Test The Normality test result displays whether the data passed or failed the
test of the assumption that the source population is normally distributed
around the regression, and the P value calculated by the test. All
regression requires a source population to be normally distributed about
the regression. When this assumption may be violated, a warning
appears in the report. This result appears unless you disabled normality
testing in the Options for Stepwise Regression dialog box (see page
583).
Constant The constant variance test result displays whether or not the data passed
Variance Test or failed the test of the assumption that the variance of the dependent
variable in the source population is constant regardless of the value of
the independent variable, and the P value calculated by the test. When
the constant variance assumption may be violated, a warning appears in
the report.
If you receive this warning, you should consider trying a different model
(i.e., one that more closely follows the pattern of the data), or
transforming the independent variable to stabilize the variance and
obtain more accurate estimates of the parameters in the regression
equation. See Chapter 14, Using Transforms for more information on
the appropriate transform to use.
Power This result is displayed if you selected this option in the Options for
Stepwise Regression dialog box.
The α value is set in the Power Options dialog box; the suggested value
is α = 0.05, which indicates that a one in twenty chance of error is
acceptable. Smaller values of α result in stricter requirements before
concluding the model is correct, but a greater possibility of concluding
the model is bad when it is really correct (a Type II error). Larger values
of α make it easier to conclude that the model is correct, but also
increase the risk of accepting a bad model (a Type I error).
If you selected Report Cases with Outliers Only, only those observations
that have one or more residuals flagged as outliers are reported; however,
all other results for that observation are also displayed.
Predicted Values This is the value for the dependent variable predicted
by the regression model for each observation. If these values were saved
to the worksheet, they may be used to plot the regression using
SigmaPlot.
Residuals These are the raw residuals, the difference between the
predicted and observed values for the dependent variables.
If the residuals are normally distributed about the regression, about 68%
of the standardized residuals have values between −1 and 1, and about
95% of the standardized residuals have values between −2 and 2. A
larger standardized residual indicates that the point is far from the
regression; the suggested value flagged as an outlier is 2.5.
Influence Diagnostics The influence diagnostic results display only the values for the results
selected in the Options dialog box under the Other Diagnostics tab (see
page 588). All results that qualify as outlying values are flagged with a <
symbol. The trigger values to flag data points as outliers are also set in
the Options dialog box under the Other Diagnostics tab.
If you selected Report Cases with Outliers Only, only observations that
have one or more values flagged as outliers are reported; however,
all other results for that observation are also displayed.
Confidence Intervals These results are displayed if you selected them in the Options for
Stepwise Regression dialog box. If the confidence interval does not
include zero, you can conclude that the coefficient is different than zero
with the level of confidence specified. This can also be described as P <
α (alpha), where α is the acceptable probability of incorrectly
concluding that the coefficient is different than zero, and the confidence
interval is 100(1 − α).
The specified confidence level can be any value from 1 to 99; the
suggested confidence level for both intervals is 95%.
Mean The confidence interval for the regression gives the range of
variable values computed for the region containing the true relationship
between the dependent and independent variables, for the specified level
of confidence.
You can generate up to six graphs using the results from a Stepwise
Regression. They include a:
Histogram of Residuals The Stepwise Regression histogram plots the raw residuals in a specified
range, using a defined interval set. The residuals are divided into a
number of evenly incremented histogram intervals and plotted as
histogram bars indicating the number of residuals in each interval. The
X axis represents the histogram intervals, and the Y axis represents the
number of residuals in each group. For an example of a histogram, see
page 153.
Scatter Plot of The Stepwise Regression scatter plot of the residuals plots the residuals
the Residuals of the data in the selected independent variable column as points relative
to the standard deviations. The X axis represents the independent
variable values, the Y axis represents the residuals of the variables, and
the horizontal lines running across the graph represent the standard
deviations of the data. For an example of a scatter plot, see page 152.
Line/Scatter Plot The Stepwise Regression line/scatter graph plots the observations of the
of the Regression with stepwise regression for the data of the selected independent variable
Prediction and column as a line/scatter plot. The points represent the dependent
Confidence Intervals variable data plotted against the selected independent variable data, the
solid line running through the points represents the regression line, and
the dashed lines represent the prediction and confidence intervals. The
X axis represents the independent variables and the Y axis represents the
dependent variables. For an example of a line/scatter plot, see page 156.
3D Residual The stepwise regression 3D residual scatter plot graphs the residuals of
Scatter Plot the two selected columns of independent variable data. The X and the Y
axes represent the independent variables, and the Z axis represents the
residuals. For an example of a 3D residual scatter plot, see page 156.
FIGURE 12–58
The Create Graph Dialog
Box
for the Stepwise
Regression Report
2 Select the type of graph you want to create from the Graph Type
list, then select OK, or double-click the desired graph in the list.
For more information on each of the graph types, see pages 12-607
through 12-608.
FIGURE 12–59
Select X Independent
Variable Prompting you to
Select The Independent
Variable You Want to Plot
FIGURE 12–60
Example of a 3D Scatter
Plot of the Residuals for a
Stepwise Regression Report
$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots + b_k x_k$
where y is the dependent variable, x1, x2, x3, ..., xk are the independent
variables, and b0, b1, b2,...,bk are the regression coefficients. As the
values for xi vary, the corresponding value for y either increases or
decreases. Best subsets regression searches for those combinations of the
independent variables that give the “best” prediction of the dependent
variable. There are several criteria for “best,” and the results depend on
“Best” Subsets Criteria There are three statistics that can be used to evaluate which subsets of
variables best contribute to predicting the dependent variable. For a
further discussion of these statistics, consult an appropriate
statistics reference. For a list of suggested references, see page 12.
However, the number of variables used in the equation is not taken into
account. Consequently, equations with more variables will always have
higher R2 values, whether or not the additional variables really
contribute to the prediction.
Adjusted R2 The adjusted R2, R2adj, is a measure of how well the
regression model describes the data based on R2, but takes into account
the number of independent variables.
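The penalty for extra variables can be illustrated with the usual textbook formula, R2adj = 1 − (1 − R2)(n − 1)/(n − k − 1), where n is the number of observations and k the number of independent variables. The formula is assumed here to match the manual's definition, and the function name and numbers below are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R squared for n observations and k independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Hypothetical comparison: three extra variables raise R2 only slightly.
small_model = adjusted_r2(0.900, n=20, k=3)
large_model = adjusted_r2(0.905, n=20, k=6)
print(small_model > large_model)
```

Although the larger model has the higher R2, its adjusted R2 is lower, because the small gain in fit does not offset the three additional variables.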
Cp = p = k + 1
4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 618).
5 View and interpret the Best Subset Regression report (page 619).
Place the data for the observed dependent variable in a single column
and the corresponding data for the independent variables in one or more
columns. Rows containing missing values are ignored, and the columns
must be of equal length.
FIGURE 12–61
Data Format for a
Best Subset Regression
➤ Specify the criterion to use to predict the dependent variable and the
number of subsets used in the equation.
➤ Enable the variance inflation factor to identify potential difficulties
with the regression parameter estimates (multicollinearity).
1 If you are going to run the test after changing test options, and
want to select your data before you run the test, drag the pointer
over the data.
5 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 618 for more information).
6 To accept the current settings and close the options dialog box,
click OK. To accept the current settings without closing the
options dialog box, click Apply. To close the dialog box without
changing any settings or running the test, click Cancel.
Criterion Options Use the Best Criterion option to select the criterion used to determine
the best subsets and the Number of Subsets option to specify the
number of subsets to list.
FIGURE 12–62
The Options for Best Subset
Regression Dialog Box
Variance Use the Variance Inflation Factor option (see Figure 12–62 on page 615) to
Inflation Factor measure the multicollinearity of the independent variables, or the linear
combination of the independent variables in the fit.
FIGURE 12–63
A Graph with Multicollinear Data Points
Note that knowing the value of one of the independent variables allows you to
predict the other, so that the independent variables are not statistically
independent.
(Axes: Independent x1, Independent x2, Dependent y.)
Flagging Multicollinear Data Use the value in the Flag Values > edit
box as a threshold for multicollinear variables. The default threshold
value is 4.0, meaning that any value greater than 4.0 will be flagged as
multicollinear. To make this test more sensitive to possible
multicollinearity, decrease this value. To allow greater correlation of the
independent variables before flagging the data as multicollinear, increase
this value.
When the variance inflation factor is large, there are redundant variables
in the regression model, and the parameter estimates may not be reliable.
Variance inflation factor values above 4 suggest possible
multicollinearity; values above 10 indicate serious multicollinearity.
Report Flagged Values Only To include only the points flagged
by the influential point tests and values exceeding the variance inflation
threshold in the report, make sure the Report Flagged Values Only check
box is selected. Uncheck this option to include all influential points in
the report.
To run a Best Subset Regression, you need to select the data to test. The
Pick Columns dialog box is used to select the worksheet columns with
the data you want to test.
1 If you want to select your data before you run the regression, drag
the pointer over your data.
2 Open the Pick Columns dialog box to start the best subset
regression. You can either:
If you selected columns before you chose the test, the selected
columns appear in the column list. If you have not selected
columns, the dialog box prompts you to pick your data.
FIGURE 12–64
The Pick Columns
for Best Subset Regression
Dialog Box Prompting You to
Select Data Columns
No predicted values, residuals, or other test results are computed or placed
in the worksheet. To view results for models, note which independent
variables were used for that model, then perform a Multiple Linear
Regression using only those independent variables.
You cannot generate report graphs for Best Subsets Regression. To view a
graph, perform a Multiple Linear Regression using the variables in the
subset(s) of interest, and graph those results. For information on performing
Multiple Linear Regression, see page 495.
The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the
buttons in the formatting toolbar to move one page up and down in the
report.
Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options command and unselect the Explain Results
option.
Summary Table Regression Model This is the subset model number that corresponds
to the numbers of the more detailed regression equation statistics.
Variables The variables included in the subset are noted by asterisks (*)
which appear below the variable symbols on the right side of the table.
Cp = p = k + 1
also have Cp values close to k + 1 are good candidates for the best subset
of variables.
Subsets Results Tables of statistical results are listed for each regression equation
identified in the summary table.
Std Err (Standard Error) The standard errors are estimates of the uncertainty
in these regression coefficients (analogous to the standard error of the mean).
The true regression coefficients of the underlying population generally
fall within about two standard errors of the observed sample coefficients.
Large standard errors may indicate multicollinearity. These values are
used to compute t for the regression coefficients.
t Statistic The t statistic tests the null hypothesis that the coefficient of
each independent variable is zero, that is, the independent variable does
not contribute to predicting the dependent variable. t is the ratio of the
regression coefficient to its standard error, or:
$t = \frac{\text{regression coefficient}}{\text{standard error of regression coefficient}}$
You can conclude from “large” t values that the independent variable(s)
can be used to predict the dependent variable (i.e., that the coefficient is
not zero).
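A short Python sketch of the t ratio, together with the rough "about two standard errors" range mentioned under Std Err, is shown below. The function names and the example coefficient are hypothetical, and the range is only the rule of thumb from the text, not an exact confidence interval.

```python
def t_statistic(coefficient, standard_error):
    """Ratio of a regression coefficient to its standard error."""
    return coefficient / standard_error

def rough_coefficient_range(coefficient, standard_error):
    """Range of about two standard errors around the estimated coefficient."""
    return (coefficient - 2.0 * standard_error,
            coefficient + 2.0 * standard_error)

# Hypothetical coefficient of 1.5 with a standard error of 0.25.
t = t_statistic(1.5, 0.25)
low, high = rough_coefficient_range(1.5, 0.25)
print(t, low, high)
```

Here the rough range does not include zero, which agrees with the large t value.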
This result appears unless it was disabled in the Options for Best Subset
Regression dialog box (see page 623).
If you want to predict the value of one variable from another, use Simple
or Multiple Linear Regression. If you need to find the correlation of
data measured by rank or order, use the nonparametric Spearman Rank
Order Correlation.
Computing the Pearson To compute the Pearson Product Moment Correlation coefficient:
Product Moment
Correlation Coefficient 1 Enter or arrange your data appropriately in the data worksheet
(see following section).
3 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 625).
Place the data for each variable in a column. You must have at least two
columns of variables, with a maximum of 64 columns. Observations
FIGURE 12–66
Data for Computing a
Pearson Product Moment
Correlation Coefficient
To run a Pearson Product Moment test, you need to select the data to
test. The Pick Columns dialog box is used to select the worksheet
columns with the data you want to test.
1 If you want to select your data before you run the correlation, drag
the pointer over your data.
2 Open the Pick Columns dialog box to start the Pearson Product
Moment Correlation. You can either:
FIGURE 12–67
The Pick Columns
for Pearson Correlation
Dialog Box Prompting You to
Select Data Columns
The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the
buttons in the formatting toolbar to move one page up and down in the
report.
Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options command and unselect the Explain Results
option.
The number of decimal places displayed is also set in the Report Options
dialog box. For more information on setting report options, see page 135.
FIGURE 12–68
The Pearson
Product Moment
Correlation
Results Report
P Value The P value is the probability of being wrong in concluding that there is
a true association between the variables (i.e., the probability of falsely
rejecting the null hypothesis, or committing a Type I error). The smaller
the P value, the greater the probability that the variables are correlated.
Number of Samples This is the number of data points used to compute the correlation
coefficient. This number reflects samples omitted because of missing
values in one of the two variables used to compute each correlation
coefficient.
The first row of the matrix represents the first set of variables or the first
column of data, the second row of the matrix represents the second set of
variables or the second data column, and the third row of the matrix
represents the third set of variables or third data column. The X and Y
data for the graphs correspond to the column and row of the graph in
the matrix.
For example, the X data for the graphs in the first row of the matrix is
taken from the second column of tested data, and the Y data is taken
from the first column of tested data. The X data for the graphs in the
second row of the matrix is taken from the first column of tested data,
and the Y data is taken from the second column of tested data. The X
data for the graphs in the third row of the matrix is taken from the
second column of tested data, and the Y data is taken from the third
column of tested data, etc. The number of graph rows in the matrix is
equal to the number of data columns being tested.
Creating the Pearson To generate the graph of Pearson Product Moment report data:
Product Moment
Report Graph 1 Click the toolbar button, or choose the Graph menu Create
Graph command when the Pearson product moment report is
selected. The Create Graph dialog box appears displaying the
Scatter Matrix graph
FIGURE 12–69
The Create Graph Dialog
Box
for the Pearson Product
Moment Correlation Report
FIGURE 12–70
A Scatter Matrix Graph
of the Pearson Product
Moment Report Data
If you want to assume that the value of one variable affects the other, use
some form of regression. If you need to find the correlation of normally
distributed data, use the parametric Pearson Product Moment
Correlation.
About the When an assumption is made about the dependency of one variable on
Spearman Rank Order another, it affects the computation of the regression line. Reversing the
Correlation Coefficient assumption of the variable dependencies results in a different regression
line.
The Spearman Rank Order Correlation coefficient does not require the
variables to be assigned as independent and dependent. Instead, only
the strength of association is measured.
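A common way to compute a rank order correlation is to replace each value by its rank (averaging the ranks of tied values) and then compute the ordinary correlation coefficient of the ranks. The Python sketch below follows that textbook approach, which is assumed here rather than taken from SigmaStat; all names and data are hypothetical.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for position in range(i, j + 1):
            ranks[order[position]] = shared_rank
        i = j + 1
    return ranks

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

def spearman_rs(x, y):
    """Rank order correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [10, 20, 30, 40, 50]
y = [1, 4, 9, 16, 100]   # increases with x, but not linearly
print(spearman_rs(x, y), spearman_rs(y, x))
```

Swapping the two columns gives the same coefficient, which reflects the fact that neither variable is treated as dependent.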
3 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (12-632).
Place the data for each variable in a column. You must have at least two
columns of variables, with a maximum of 64 columns. Observations
containing missing values are ignored. However, rank order correlations
require columns of equal length.
FIGURE 12–71
Data for Computing a
Spearman Rank Order
Correlation Coefficient
➤ Select the columns to test from the worksheet before choosing the
test, or
➤ Select the columns while computing the coefficient
To run a Spearman Rank Order Correlation test, you need to select the
data to test. The Pick Columns dialog box is used to select the
worksheet columns with the data you want to test and to specify how
your data is arranged in the worksheet.
1 If you want to select your data before you run the correlation, drag
the pointer over your data.
If you selected columns before you chose the test, the selected
columns appear in the column list. If you have not selected
columns, the dialog box prompts you to pick your data.
FIGURE 12–72
The Pick Columns
for Spearman Correlation
Dialog Box Prompting You to
Select Data Columns
The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the
buttons in the formatting toolbar to move one page up and down in the
report.
Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Options command, choose Report, and unselect the Explain
Results option. The number of decimal places is also set in the Report
Options dialog box.
FIGURE 12–73
The Spearman Rank
Order Correlation
Report
Spearman Correlation The Spearman correlation coefficient rs quantifies the strength of the
Coefficient rs association between the variables. rs varies between −1 and 1. A
correlation coefficient near 1 indicates there is a strong positive
relationship between the two variables, with both always increasing
together. A correlation coefficient near −1 indicates there is a strong
negative relationship between the two variables, with one always
P Value The P value is the probability of being wrong in concluding that there is
a true association between the variables (i.e., the probability of falsely
rejecting the null hypothesis, or committing a Type I error). The smaller
the P value, the greater the probability that the variables are correlated.
Number of Samples This is the number of data points used to compute the correlation
coefficient. This number reflects samples omitted because of missing
values in one of the two variables used to compute each correlation
coefficient.
The first row of the matrix represents the first set of variables or the first
column of data, the second row of the matrix represents the second set of
variables or the second data column, and the third row of the matrix
represents the third set of variables or third data column. The X and Y
data for the graphs correspond to the column and row of the graph in
the matrix.
For example, the X data for the graphs in the first row of the matrix is
taken from the second column of tested data, and the Y data is taken
from the first column of tested data. The X data for the graphs in the
second row of the matrix is taken from the first column of tested data,
and the Y data is taken from the second column of tested data. The X
data for the graphs in the third row of the matrix is taken from the
second column of tested data, and the Y data is taken from the third
column of tested data, etc. The number of graph rows in the matrix is
equal to the number of data columns being tested.
2 Select Scatter Plot Matrix from the Graph Type list, then select
OK, or double-click the desired graph in the list.
FIGURE 12–74
The Create Graph Dialog
Box
for the Spearman
Correlation Report
Nonlinear Regression 10
$SS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
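The sum of squared residuals, SS, can be sketched in Python for any candidate set of parameter values. The exponential model, the function names, and the data below are hypothetical illustrations; they are not SigmaStat's fitting routine, which searches for the parameter values that make SS as small as possible.

```python
import math

def sum_of_squares(model, params, x_values, y_values):
    """Sum of squared differences between observed and predicted y values."""
    return sum((y - model(x, *params)) ** 2
               for x, y in zip(x_values, y_values))

def exponential_decay(x, a, b):
    """Hypothetical nonlinear model: y = a * exp(-b * x)."""
    return a * math.exp(-b * x)

x_data = [0.0, 1.0, 2.0, 3.0]
# Hypothetical observations generated from a = 2.0, b = 0.5 with no noise.
y_data = [exponential_decay(x, 2.0, 0.5) for x in x_data]

ss_true = sum_of_squares(exponential_decay, (2.0, 0.5), x_data, y_data)
ss_poor = sum_of_squares(exponential_decay, (1.0, 0.1), x_data, y_data)
print(ss_true < ss_poor)
```

Parameter values that reproduce the data exactly give an SS of zero; any other values give a larger SS.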
Data for Nonlinear Regressions consists of the data for the observed
dependent variable in one column, and the corresponding data for the
independent variables in one or more columns.
FIGURE 12–75
Data Format for a
Nonlinear Regression
Note that Nonlinear
Regression does not
necessarily require data in
the worksheet; variable
values can be specified using
ranges within the Nonlinear
Regression dialog box.
Note that Nonlinear Regression does not necessarily require data in the
worksheet; variable values can be specified using ranges within the
Nonlinear Regression dialog box under the [Variables] heading.
1 If you are going to run the test after changing test options, and
want to select your data before you run the test, drag the pointer
over your data.
3 Click the Residuals tab to view the Residual options (see Figure
12–77 on page 642), the More Statistics tab to view the
Confidence Interval, PRESS, and Standardized Coefficients
options (see Figure 12–78 on page 644), and the Other Diagnostics tab to
view Influence, VIF, and Power options (see Figure 12–80 on page
646). Click the Assumption Checking tab to return to the
Normality and Equal Variance options.
5 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 648 for more information).
6 To accept the current settings and close the options dialog box,
click OK. To apply the current settings without closing the
options dialog box, click Apply. To close the dialog box without
changing any settings or running the test, click Cancel.
You can select Help at any time to access SigmaStat’s online help
system.
Assumption Checking Select the Assumption Checking tab from the options dialog box to view
the Normality, Constant Variance, and Durbin-Watson options. These
options test your data for its suitability for regression analysis by
checking three assumptions that a nonlinear regression makes about the
data. A nonlinear regression assumes:
Only disable the assumption checking options if you are certain that the
data was sampled from normal populations with constant variance and
that the residuals are independent of each other.
FIGURE 12–76
The Options for
Nonlinear Regression
Dialog Box Displaying
the Assumption
Checking Options
Note: Although the assumption tests are robust in detecting data from populations
that are non-normal or have nonconstant variances, there are extreme
conditions of data distribution that these tests cannot detect. However, these
conditions should be easily detected by visually examining the data, without
resorting to the automatic assumption tests.
Difference from 2 Value Enter the acceptable deviation from 2.0 that
you consider as evidence of a serial correlation in the Difference for 2.0
box. If the computed Durbin-Watson statistic deviates from 2.0 more
than the entered value, SigmaStat warns you that the residuals may not
be independent. The suggested deviation value is 0.50, i.e., Durbin-
Watson Statistic values greater than 2.5 or less than 1.5 flag the residuals
as correlated.
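As an illustration of this check, the following sketch computes the Durbin-Watson statistic with its standard textbook formula (sum of squared successive differences of the residuals divided by the sum of squared residuals) and applies the suggested 0.50 difference. The function names and the sample residuals are hypothetical, and the sketch does not reproduce SigmaStat's internal code.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("all residuals are zero; the statistic is undefined")
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    return numerator / denominator

def residuals_flagged(residuals, difference_from_2=0.50):
    """True when the statistic deviates from 2.0 by more than the allowed difference."""
    return abs(durbin_watson(residuals) - 2.0) > difference_from_2

# Hypothetical residual sequences
trending = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]                # neighbours resemble each other
balanced = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]     # statistic is exactly 2.0

print(durbin_watson(trending), residuals_flagged(trending))
print(durbin_watson(balanced), residuals_flagged(balanced))
```

Values well below 2 arise when neighbouring residuals resemble each other, and values well above 2 arise when they tend to alternate in sign.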
Predicted Values Use this option to calculate the predicted value of the
dependent variable for each observed value of the independent
variable(s), then save the results to the worksheet. Click the selected
check box if you do not want to include predicted values in the worksheet.
FIGURE 12–77
The Options for
Nonlinear Regression
Dialog Box Displaying
the Residuals Options
Raw Residuals The raw residuals are the differences between the
predicted and observed values of the dependent variables. To include
raw residuals in the report, make sure this check box is selected. Click
the selected check box if you do not want to include raw residuals in the
worksheet.
To include studentized residuals in the report, make sure this check box
is selected. Click the selected check box if you do not want to include
studentized residuals in the worksheet.
SigmaStat can automatically flag data points with “large” values of the
studentized deleted residual, i.e., outlying data points; the suggested data
points flagged lie outside the 95% confidence interval for the regression
population.
Note: Both Studentized and Studentized deleted residuals use the same
confidence interval setting to determine outlying points.
Confidence Intervals Select the More Statistics tab in the options dialog box to view the
confidence interval options. You can set the confidence interval for the
population, regression, or both and then save them to the worksheet.
FIGURE 12–78
The Options for Nonlinear
Regression Dialog Box
Displaying the Confidence
Interval, PRESS Prediction
Error, and Standardized
Coefficient Options
PRESS Select the More Statistics tab in the options dialog box to view the
Prediction Error PRESS Prediction Error option (see Figure 12–78 on page 644). The
PRESS Prediction Error is a measure of how well the regression equation
fits the data. Leave this check box selected to evaluate the fit of the
equation using the PRESS statistic. Click the selected check box if you
do not want to include the PRESS statistic in the report.
Standardized Coefficients ($\beta_i$) Click the More Statistics tab in the options dialog
box to view the Standardized Coefficients option (see Figure 12–78 on page 644).
These are the coefficients of the regression equation standardized to
dimensionless values,
$$\beta_i = b_i \frac{s_{x_i}}{s_y}$$
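A minimal sketch of this standardization follows, assuming that the two s terms are the standard deviations of the independent variable and of the dependent variable. The function name and the data are hypothetical.

```python
import statistics

def standardized_coefficient(b_i, x_values, y_values):
    """Return b_i * s_x / s_y, using sample standard deviations (an assumption)."""
    s_x = statistics.stdev(x_values)
    s_y = statistics.stdev(y_values)
    return b_i * s_x / s_y

# Hypothetical data in which y = 2 * x exactly, so the raw coefficient is 2
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 6, 8, 10]
print(standardized_coefficient(2.0, x_values, y_values))
```

Because the result is a ratio of two standard deviations, it does not matter whether both are computed with n or with n - 1 in the denominator.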
Influence Options Select the Other Diagnostics tab in the options dialog box to view the
Influence options. Influence options automatically detect instances of
influential data points. Most influential points are outliers, that is, they
do not “line up” with the rest of the data points. These points can have a
disproportionately strong influence on the calculation of the regression
line. You can use several
influence tests to identify and quantify influential points.
Predicted values that change by more than two standard errors when the
data point is removed are considered to be influential.
Check the DFFITS check box to compute this value for all points and
flag influential points, i.e., those with DFFITS greater than the value
specified in the Flag Values > edit box. The suggested value is 2.0
standard errors, which indicates that the point has a strong influence on
the data. To avoid flagging more influential points, increase this value;
to flag less influential points, decrease this value.
FIGURE 12–80
The Options for Nonlinear
Regression Dialog Box
Displaying the Influence
and Power Options
Check the Leverage check box to compute the leverage for each point
and automatically flag potentially influential points, i.e., those points
that could have leverages greater than the specified value times the
expected leverage. The suggested value is 2.0 times the expected leverage
for the regression (i.e., $2(k + 1)/n$). To avoid flagging more potentially
influential points, increase this value; to flag points with less potential
influence, lower this value.
Check the Cook's Distance check box to compute this value for all
points and flag influential points, i.e., those with a Cook's distance
greater than the specified value. The suggested value is 4.0. Cook's
distances above 1 indicate that a point is possibly influential. Cook's
distances exceeding 4 indicate that the point has a major effect on the
values of the parameter estimates. To avoid flagging more influential
points, increase this value; to flag less influential points, lower this value.
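The three influence measures and their suggested flag values can be sketched for the simplest case, a straight-line fit with one independent variable (k = 1). This is a simplification chosen for illustration: the formulas are the standard linear-regression ones, the function name and data are hypothetical, and SigmaStat's nonlinear computations may differ. Comparing the absolute value of DFFITS with the cutoff is also an assumption.

```python
import math

def influence_diagnostics(x, y, leverage_factor=2.0, cooks_cutoff=4.0, dffits_cutoff=2.0):
    """Leverage, Cook's distance, and DFFITS for a straight-line fit y = b0 + b1*x."""
    n = len(x)
    p = 2  # k + 1 parameters: one independent variable (k = 1) plus the constant
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in residuals)
    mse = sse / (n - p)
    expected_leverage = p / n  # (k + 1) / n
    rows = []
    for xi, e in zip(x, residuals):
        h = 1.0 / n + (xi - x_bar) ** 2 / sxx
        cooks = (e * e / (p * mse)) * h / (1.0 - h) ** 2
        # Studentized deleted residual, then DFFITS
        t_deleted = e * math.sqrt((n - p - 1) / (sse * (1.0 - h) - e * e))
        dffits = t_deleted * math.sqrt(h / (1.0 - h))
        rows.append({
            "leverage": h,
            "cooks": cooks,
            "dffits": dffits,
            "leverage_flag": h > leverage_factor * expected_leverage,
            "cooks_flag": cooks > cooks_cutoff,
            "dffits_flag": abs(dffits) > dffits_cutoff,
        })
    return rows

# Hypothetical data: the last point lies far from the others along the X axis
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]
for row in influence_diagnostics(x, y):
    print(row)
```

In this hypothetical data set only the last point exceeds the leverage threshold of 2(k + 1)/n, because it is the only point far from the mean of the independent variable.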
Power Select the Other Diagnostics tab in the options dialog box to view the
Power options (see Figure 12–80 on page 646). The power of a
regression is the power to detect the observed relationship in the data.
The alpha (α) is the acceptable probability of incorrectly concluding
there is a relationship.
Check the Power check box to compute the power for the nonlinear
regression data. Change the alpha value by editing the number in the
Alpha Value edit box. The suggested value is α = 0.05. This indicates
that a one in twenty chance of error is acceptable, or that you are willing
to conclude there is a significant relationship when P < 0.05.
The equation should contain all of the variables you want to use as
independent variables, as well as the dependent variable to be fit to
the data.
FIGURE 12–81
The Edit Nonlinear
Regression Dialog Box with
Regression Settings Entered
Under the Appropriate
Headings
You can use this dialog box to edit the Nonlinear Regression, then run
the Nonlinear Regression again, check your parameter constraints, save
the results to the worksheet, and view the Nonlinear Regression report.
FIGURE 12–82
The Nonlinear Regression
Results Dialog Box
Select the Report button in the Nonlinear Regression Results dialog box
to view the detailed Nonlinear Regression report. The report for a
Nonlinear Regression lists all the settings entered into the Nonlinear
Regression dialog box, and a table of the values and statistics for the
regression parameters.
For descriptions of the derivations of these results, you can reference any
appropriate statistics reference. For a list of suggested references, see
page 12.
Note: The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the page
buttons in the formatting toolbar to move one page up or down in the
report.
Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options... command and click the selected Explain Test
Results check box.
Initial Settings These are the settings as entered into the Nonlinear Regressions dialog
box.
Fit Equations These are the regression model, fit statement, and
options weighting statement, as specified under the [Equations] heading
in the Nonlinear Regressions dialog box.
R and R Squared R, the multiple correlation coefficient, and R², the coefficient of
determination for Nonlinear Regression, are both measures of how well
the regression model describes the data. R values near 1 indicate that the
equation is a good description of the relation between the independent
and dependent variables.
R equals 0 when the values of the independent variables do not allow
any prediction of the dependent variable, and equals 1 when you can
perfectly predict the dependent variable from the independent
variables.
Standard Error of the Estimate ($S_{y|x}$) The standard error of the estimate $S_{y|x}$ is a
measure of the actual variability about the regression plane of the underlying
population. The underlying population generally falls within about two standard
errors of the observed sample.
Statistical The standard error, t and P values are approximations based on the final
Summary Table iteration of the Nonlinear Regression.
t Statistic The t statistic tests the null hypothesis that the coefficient of
the independent variable is zero, that is, the independent variable does
not contribute to predicting the dependent variable. t is the ratio of the
regression coefficient to its standard error, or
$$t = \frac{\text{regression coefficient}}{\text{standard error of regression coefficient}}$$
You can conclude from “large” t values that the independent variable can
be used to predict the dependent variable (i.e., that the coefficient is not
zero).
VIF The variance inflation factor (VIF) for each parameter is a measure
of the uncertainty with which the parameter can be estimated.
Parameters with large VIFs (much greater than 1.0) indicate that the
equation(s) used are “overparameterized.” There are too many
parameters to allow unique identification of the parameter values from
the available data, and a model with fewer parameters may be better.
Analysis of Variance The ANOVA (analysis of variance) table lists the ANOVA statistics for
(ANOVA ) Table the regression and the corresponding F value for each step.
The residual mean square is also equal to $S_{y|x}^2$.
PRESS Statistic PRESS, the Predicted Residual Error Sum of Squares, is a gauge of
how well a regression model predicts new data. The smaller the PRESS
statistic, the better the predictive ability of the model.
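The idea behind PRESS can be sketched by leaving out each observation in turn, refitting, and predicting the omitted point. For brevity the sketch below refits a straight line rather than a nonlinear model; the function names and data are hypothetical, and SigmaStat's own computation may use a different but equivalent shortcut.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope

def press_statistic(x, y):
    """Sum of squared prediction errors, each point predicted from a fit that omits it."""
    total = 0.0
    for i in range(len(x)):
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        intercept, slope = fit_line(x_rest, y_rest)
        total += (y[i] - (intercept + slope * x[i])) ** 2
    return total

# Hypothetical data (plain Python lists)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
print(press_statistic(x, y))
```

Because each point is predicted by a fit that never saw it, PRESS is never smaller than the ordinary residual sum of squares for the same data.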
Regression assumes that the residuals are independent of each other; the
Durbin-Watson test is used to check this assumption. If the Durbin-
Watson value deviates from 2 by more than the value set in the Options
for Nonlinear Regression dialog box, a warning appears in the report.
The suggested trigger value is a difference of more than 0.50, i.e., the
Durbin-Watson statistic is below 1.50 or above 2.50.
Normality Test The normality test results display whether the data passed or failed the
test of the assumption that the source population is normally distributed
around the regression, and the P value calculated by the test. All
regressions require a source population to be normally distributed about
the regression line. When this assumption may be violated, a warning
appears in the report. This result appears unless you disabled normality
testing in the Options for Nonlinear Regression dialog box (see page
640).
Constant The constant variance test result displays whether or not the data passed
Variance Test or failed the test of the assumption that the variance of the dependent
variable in the source population is constant regardless of the value of
the independent variable, and the P value calculated by the test. When
the constant variance assumption may be violated, a warning appears in
the report.
If you receive this warning, you should consider trying a different model
(i.e., one that more closely follows the pattern of the data) using a
weighted regression, or transforming the independent variable to
stabilize the variance and obtain more accurate estimates of the
parameters in the regression equation.
Power This result is displayed if you selected this option in the Options for
Nonlinear Regression dialog box.
The α value is set in the Options dialog box; the suggested value is
α = 0.05, which indicates that a one in twenty chance of error is
acceptable. Smaller values of α result in stricter requirements before
concluding the model is correct, but a greater possibility of concluding
the model is bad when it is really correct (a Type II error). Larger values
of α make it easier to conclude that the model is correct, but also
increase the risk of accepting a bad model (a Type I error).
Regression Diagnostics The regression diagnostic results display the values for the predicted
values, residuals, and other diagnostic results selected in the Options for
Nonlinear Regression dialog box (see page 644). All results that qualify
as outlying values are flagged with a # symbol. The trigger values to flag
residuals as outliers are set in the Options for Nonlinear Regression
dialog box.
If you selected Report Cases with Outliers Only, only those observations
that have one or more residuals flagged as outliers are reported; however,
all other results for that observation are also displayed.
If the residuals are normally distributed about the regression, about 68%
of the standardized residuals have values between −1 and 1, and about
95% of the standardized residuals have values between −2 and 2. A
larger standardized residual indicates that the point is far from the
regression; the suggested value flagged as an outlier is 2.5.
Influence Diagnostics The influence diagnostic results display only the values for the results
selected in the Options dialog box under the Other Diagnostics tab (see
page 645). All results that qualify as outlying values are flagged with a #
symbol. The trigger values to flag data points as outliers are also set in
the Options for Nonlinear Regression dialog box under the Other
Diagnostics tab.
If you selected Report Cases with Outliers Only, only observations that
have one or more results flagged as outliers are reported; however,
all other results for that observation are also displayed.
Confidence Intervals These results are displayed if you selected them in the Regression
Options dialog box. If the confidence interval does not include zero,
you can conclude that the coefficient is different than zero with the level
of confidence specified. This can also be described as P < α (alpha),
where α is the acceptable probability of incorrectly concluding that the
coefficient is different than zero, and the confidence interval is
100(1 − α)%.
The specified confidence level can be any value from 1 to 99; the
suggested confidence level for both intervals is 95%.
Predicted Values This is the value for the dependent variable predicted
by the regression model for each observation.
Regression The confidence interval for the regression gives the range of
variable values computed for the region containing the true relationship
between the dependent and independent variables, for the specified level
of confidence.
Population The confidence interval for the population gives the range
of variable values computed for the region containing the population
from which the observations were drawn, for the specified level of
confidence.
You can generate up to six graphs using the results from a Nonlinear
Regression. They include a:
Histogram of Residuals The Nonlinear Regression histogram plots the raw residuals in a
specified range, using a defined interval set. The residuals are divided
into a number of evenly incremented histogram intervals and plotted as
histogram bars indicating the number of residuals in each interval. The
X axis represents the histogram intervals, and the Y axis represents the
number of residuals in each group. For an example of a histogram, see
page 153.
Scatter Plot of The Nonlinear Regression scatter plot of the residuals plots the residuals
the Residuals of the data in the selected independent variable column as points relative
to the standard deviations. The X axis represents the independent
variable values, the Y axis represents the residuals of the variables, and
the horizontal lines running across the graph represent the standard
deviations of the data. For an example of a scatter plot, see page 152.
Bar Chart of The Nonlinear Regression bar chart of the standardized residuals plots
the Standardized the standardized residuals of the data in the selected independent
Residuals variable column as points relative to the standard deviations. The X axis
represents the selected independent variable values, the Y axis represents
the residuals of the variables, and the horizontal lines running across the
graph represent the standard deviations of the data. For an example of a
bar chart, see page 153.
Line/Scatter Plot of the Regression with Prediction and Confidence Intervals The
Nonlinear Regression line/scatter graph plots the observations of
the nonlinear regression for the data of the selected independent variable
column as a line/scatter plot. The points represent the dependent
variable data plotted against the selected independent variable data, the
solid line running through the points represents the regression line, and
the dashed lines represent the prediction and confidence intervals. The
X axis represents the independent variables and the Y axis represents the
dependent variables. For an example of a line/scatter plot, see page 156.
3D Residual Scatter Plot The Nonlinear Regression 3D residual scatter plot graphs the
residuals of the two selected columns of independent variable data. The X and
the Y axes represent the independent variables, and the Z axis represents
the residuals. For an example of a 3D residual scatter plot, see page 156.
FIGURE 12–84
The Create Graph Dialog
Box
for the Stepwise
Regression Report
2 Select the type of graph you want to create from the Graph Type
list, then select OK, or double-click the desired graph in the list.
For more information on each of the graph types, see pages 12-662
through 12-663.
FIGURE 12–85
Select X Independent
Variable Prompting you to
Select The Independent
Variable You Want to Plot
3 Select the columns with the independent variables you want to use
in the graph, then select OK. The graph appears using the
specified independent variables.
FIGURE 12–86
Example of a 2D Scatter
Plot of the Residuals for a
Nonlinear Regression
Report
12 Survival Analysis
Survival analysis studies the variable that is the time to some event. The
term survival originates from the event death. But the event need not be
death; it can be the time to any event. This could be the time to closure
of a vascular graft or the time when a mouse footpad swells from
infection. Of course it need not be medical or biological. It could be the
time a motor runs until it fails. For consistency we will use survival and
death (or failure) here.
Sometimes death doesn't occur during the length of the study or the
patient dies from some other cause or the patient relocates to another
part of the country. Though a death did not occur, this information is
useful since the patient survived up until the time he or she left the
study. When this occurs the patient is referred to as censored. This comes
from the expression censored from observation - the data has been lost
from view of the study. Examples of censored values are patients who
moved to another geographic location before the study ended and
patients who are alive when the study ended. Kaplan-Meier survival
analysis includes both failures (death) and censored values.
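How failures and censored values both enter a Kaplan-Meier estimate can be sketched as follows. The function name and data are hypothetical, the standard error uses Greenwood's formula (a common choice, assumed here rather than taken from this manual), and a subject censored at the same time as a failure is counted as still at risk at that time.

```python
import math

def kaplan_meier(times, events):
    """Return (time, n_events, n_at_risk, survival, std_error) for each event time.

    times  -- positive survival times
    events -- True for a failure, False for a censored value
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    greenwood = 0.0
    rows = []
    i = 0
    while i < len(data):
        t = data[i][0]
        same_time = [e for (u, e) in data[i:] if u == t]  # everyone leaving at time t
        deaths = sum(1 for e in same_time if e)
        if deaths:
            survival *= (at_risk - deaths) / at_risk
            if at_risk > deaths:
                greenwood += deaths / (at_risk * (at_risk - deaths))
            rows.append((t, deaths, at_risk, survival, survival * math.sqrt(greenwood)))
        # Failures and censored subjects both leave the risk set after time t.
        at_risk -= len(same_time)
        i += len(same_time)
    return rows

# Hypothetical data: five subjects, two of them censored
times = [1, 2, 2, 3, 4]
events = [True, True, False, True, False]
for row in kaplan_meier(times, events):
    print(row)
```

Censored subjects never reduce the survival probability directly, but they shrink the number at risk for later event times, which is why the information they carry is still used.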
Three Survival Tests Use the Survival statistic to obtain one of the following three tests.
➤ Single Group Use this to analyze and graph one survival curve.
➤ LogRank Use this to compare two or more survival curves. The
LogRank test assumes that all survival time data is equally accurate
and all data will be equally weighted in the analysis.
➤ Gehan-Breslow Use this to compare two or more survival curves
when you expect the early data to be more accurate than later. Use
this, for example, if there are many more censored values at the end
of the study than at the beginning.
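The difference in weighting between the LogRank and Gehan-Breslow tests can be sketched for two groups. The sketch uses the standard textbook form of the weighted statistic, with a weight of 1 at every event time for LogRank and a weight equal to the number of subjects at risk for Gehan-Breslow. The function name and data are hypothetical, no P value is computed, and SigmaStat's implementation may differ in detail.

```python
def weighted_logrank(times_a, events_a, times_b, events_b, gehan_breslow=False):
    """Chi-square statistic (1 degree of freedom) comparing two survival curves."""
    event_times = sorted({t for t, e in zip(times_a, events_a) if e} |
                         {t for t, e in zip(times_b, events_b) if e})
    numerator = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for u in times_a if u >= t)  # at risk in group A at time t
        n_b = sum(1 for u in times_b if u >= t)  # at risk in group B at time t
        d_a = sum(1 for u, e in zip(times_a, events_a) if e and u == t)
        d_b = sum(1 for u, e in zip(times_b, events_b) if e and u == t)
        n, d = n_a + n_b, d_a + d_b
        weight = n if gehan_breslow else 1.0
        # Observed minus expected events in group A
        numerator += weight * (d_a - d * n_a / n)
        if n > 1:
            variance += weight ** 2 * d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    return numerator ** 2 / variance

# Hypothetical data: every subject fails, group A earlier than group B
times_a, events_a = [1, 2, 3], [True, True, True]
times_b, events_b = [4, 5, 6], [True, True, True]
print(weighted_logrank(times_a, events_a, times_b, events_b))
print(weighted_logrank(times_a, events_a, times_b, events_b, gehan_breslow=True))
```

Because the number at risk is largest at the start of a study, the Gehan-Breslow weights emphasize early event times, which matches the guidance above for data that is less reliable late in the study.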
➤ Survival time
➤ Status
➤ Group
The survival times are the times when the event occurred. They must be
positive and all non-positive values will be considered missing values.
The data need not be sorted by survival time or group.
The group variable defines each individual survival data set (and curve).
➤ Raw data format Column pairs of survival time and status value for
each group.
➤ Indexed data format Data indexed to a group column.
Raw Data To enter the data in Raw data format, enter the survival time in one
column and the corresponding status in a second column. Do this for
each group. If you wish, you can identify each group with a column title
in the survival time column. If you do this then these group titles will be
used in the graph and report.
FIGURE 12–1
Raw Data Format for a
Survival Analysis
with Two Groups
Columns 1 and 2 are the
survival time and status
values for the first group -
Affected Node. Columns 3
and 4 are the same for the
second group - Total Node.
The report and the survival
curve graph will use the text
strings (“Affected Node”,
“Total Node”) found in the
survival time column titles.
Note: The worksheet columns for each group must be the same length. If they are
not, the extra cells in the longer column will be considered missing. All non-
positive survival times will also be considered missing. All status variable
values not defined as either a failure or a censored value will be considered
missing.
Indexed Data Indexed data is a three-column format. The survival time and status
variable in two columns are indexed on the group names in a third
column. Informative column titles are not necessary but are useful when
selecting columns in the wizard.
FIGURE 12–2
Indexed Data Format - a
Three-Column Format
Consisting of Group,
Survival Time, and Status
In this example group is in
column 1, survival time is in
column 2 and the status
variable is in column 3.
Note: The Transforms menu Index and Unindex commands are not designed for
converting between survival analysis data formats. To use these features you
must index and unindex the survival time and status variables separately and
then reorganize the resulting columns.
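The relationship between the two data formats can be sketched outside SigmaStat. In the sketch below, the function name, the status labels, and the data values are hypothetical; the group names are the ones used in the Raw data format figure. Unequal column lengths and non-positive times are handled as described above for missing values.

```python
def raw_to_indexed(groups):
    """Convert Raw-format column pairs to Indexed-format rows.

    groups -- dict mapping a group name to a (times, statuses) pair of lists
    Returns a list of (group, time, status) rows.
    """
    rows = []
    for name, (times, statuses) in groups.items():
        # zip stops at the shorter column, so extra cells in the longer
        # column are dropped, i.e. treated as missing.
        for t, s in zip(times, statuses):
            if t is None or s is None or t <= 0:
                continue  # empty cell or non-positive time: treated as missing
            rows.append((name, t, s))
    return rows

# Hypothetical worksheet contents
raw = {
    "Affected Node": ([5, 8, -1], ["died", "censored", "died"]),
    "Total Node": ([3, 12], ["died", "censored"]),
}
for row in raw_to_indexed(raw):
    print(row)
```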
The Single Group option analyzes the survival data from one group,
creates a report and a graph with a single survival curve. There is no
statistical test performed but statistics associated with the data, such as
the median survival time, are calculated and presented in the report.
2 If desired set the Single Group options using the Options for
Survival Single Group dialog. For more information, see Setting
Single Group Options on page 672.
3 Select Survival Single Group from the toolbar then click the
button, or choose the Statistics menu Survival command, then
choose Single Group.
4 Select the two worksheet columns with the survival times and
status values using the Pick Columns panel.
5 Click Next and select the Event and Censored labels. You may
select multiple labels for each.
6 Click Finish.
7 View and interpret the Single Group survival analysis report and
curve (see Interpreting Single Group Survival Results on page
677).
Two data columns are required: a column with survival times and a
column with status labels. These can be just two columns in a worksheet
or two columns from a multi-group data set. So you can select a single
pair of columns from the multiple groups in the Raw data format.
Note: Use this option to analyze all groups as a single group from an indexed
format data set. For example, select the last two columns in the worksheet
shown in Figure 12–2 to analyze both groups as one group. You cannot do
this directly with Raw data format since the groups are not concatenated in
two columns. You would need to use the Stack feature in Transforms to
concatenate the columns.
Selecting Data Columns When running a Single Group survival analysis you can either:
1 If you are going to analyze your survival curve after changing test
options, and want to select your data before you create the curve,
drag the pointer over your data.
FIGURE 12–3
The Options for Survival
Curve Dialog Displaying the
Graph Options
3 Click the Graph Options tab to view the graph symbol, line and
scaling options. Additional statistical graph elements may be
selected here.
FIGURE 12–4
The Options for Survival
Curve Dialog Displaying the
Report and Worksheet
Results Options
4 Click the Results tab to specify the survival time units and to
modify the content of the report and worksheet.
5 To continue the test, click the Run Test button. The Pick Columns
panel appears. For more information, see Running a Single Group
Survival Analysis on page 675.
6 To accept the current settings and close the options dialog, click
OK. To accept the current settings without closing the options
dialog, click Apply. To close the dialog without changing any
settings or running the test, click Cancel.
Note: You can click Help at any time to access SigmaStat's on-line help
system.
All options in these dialogs are “sticky” and will remain in the state that
you have selected until you change them.
Status Symbols All graph options apply to graphs that are created when the analysis is
run. You can use the Graph Properties dialog to modify the attributes of
the survival curves after they have been created.
Censored Symbols Select the Graph Options tab from the Options for
Survival Single Group dialog to view the status symbols options.
Censored symbols are graphed by default. Clear this option to not
display the censored symbols.
Failures Symbols Checking this box will display symbols at the failure
times. These symbols always occupy the inside corners of the steps in the
survival curve. As such they provide redundant information and need
not be displayed.
Group Color The color of the objects in a survival curve group may be changed with
this option. All objects, for example, survival line, symbols, confidence
interval lines, will be changed to the selected color. Use Graph Properties
to modify individual object colors after the graph has been created.
Survival Scale You can display the survival graph either using fractional values
(probabilities) or percents.
Fraction If you select this then the Y axis scaling will be from 0 to 1.
Note: The results in the report are always expressed in fractional terms no matter
which option is selected for the graph.
Additional Plot Statistics Two different types of graph elements may be added to your survival
curve.
95% Confidence Intervals Selecting this will add the upper and lower
confidence lines in a stepped line format.
Standard Error Bars Selecting this will add error bars for the standard
errors of the survival probability. These are placed at the failure times.
All of these elements will be graphed with the same color as the survival
curve. You may change these colors, and other graph attributes, from
Graph Properties after the graph has been created.
Worksheet 95% Confidence Intervals Select this to place the survival curve upper
and lower 95% confidence interval values into the worksheet. These will
be placed into the first empty worksheet columns.
Time Units Select a time unit from the drop-down list or enter a unit. These units
will be used in the graph axis titles and the survival report.
To run a single group survival analysis you need to select survival time
and status data columns to analyze. The Pick Columns panel is used to
select these two columns in the worksheet.
1 Specify any options for your graph and report. You can do this by
selecting Survival Single Group in the Select Test drop-down list
and either clicking the Test Options button or selecting Current
Test Options from the Statistics menu.
2 If you want to select your data before you run the test, drag the
pointer over your data. The Survival Time column must precede
and be adjacent to the Status column.
3 Open the Pick Columns panel to start the Single Group analysis.
You can either:
4 Click the Run Test button from the Options for Survival Single
Group dialog.
FIGURE 12–5
The Pick Columns for
Survival Single Group Panel
Prompting You to Select
Time and Status Columns
from the Data for Status drop-down list. The first selected column
is assigned to the first row (Time) in the Selected Columns list,
and the next selected column is assigned to the next row (Status) in
the list. The number or title of selected columns appears in each
row.
FIGURE 12–6
The Pick Columns for
Survival Single Group Panel
Prompting You to Select the
Status Variables.
FIGURE 12–7
The Pick Columns for
Survival Single Group Panel
Showing the Results of
Selecting the Status
Variables
You can have more than one Event label and more than one
Censored label. You must select one Event label in order to
proceed. You need not select a censored variable, though, and some
data sets will not have any censored values. You need not select all
8 Use the back arrow keys to remove labels from the Event and
Censored windows. This will place them back in the Status labels
in selected columns window.
The Event and Censored labels that you selected are saved for your
next analysis. If the next data set contains exactly the same status
labels, or if you are re-analyzing your present data set, then the
saved selections will appear in the Event and Censored windows.
9 Click Finish to create the survival graph and report. The results
you obtain will depend on the Test Options that you selected.
The Single Group survival analysis report displays information about the
origin of your data, a table containing the cumulative survival
probabilities and summary statistics of the survival curve.
Results Explanations The number of significant digits displayed in the report may be set in
the Report Options dialog. Use Tools, Options to display this dialog.
For more information on setting report options, see Setting Report
Options on page 135.
Report Header The report header includes the date and time that the analysis was
Information performed. The data source is identified by the worksheet title
containing the data being analyzed and the notebook name. In Figure
12–8 the Data source shows the worksheet title to be “standard,
squamous” and the notebook name to be “Survival Analysis Data”. The
event and censor labels used in this analysis are listed. Also, the time
units used are displayed.
Survival Cumulative The survival probability table lists all event times and, for each event
Probability Table time, the number of events that occurred, the number of subjects
remaining at risk, the cumulative survival probability and its standard
error. The upper and lower 95% confidence limits are not displayed but
these may be placed into the worksheet (see Figure 12–4). Censored times
are not shown but you can infer their existence from jumps in the
Number at Risk data and the summary table immediately below this
table.
You can turn the display of this table off by clearing this option in the
Results tab of Test Options. This is useful for large data sets.
Data Summary Table The data summary table shows the total number of cases. The sum of
the number of events, censored and missing values, shown below this,
will equal the total number of cases.
Statistical Summary The mean and percentile survival times and their statistics are listed in
Table this table. The median survival time is commonly used in publications.
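One common convention for reading the median survival time from a survival curve is to take the earliest event time at which the cumulative survival probability is 0.5 or less. The sketch below assumes that convention; SigmaStat may use a different rule (for example, interpolation), and the function name and curve values are hypothetical.

```python
def median_survival_time(curve):
    """Return the earliest time with survival <= 0.5, or None if never reached.

    curve -- list of (time, cumulative survival probability), sorted by time
    """
    for t, survival in curve:
        if survival <= 0.5:
            return t
    return None  # more than half of the subjects were still surviving at the end

# Hypothetical survival curves
print(median_survival_time([(1, 0.8), (2, 0.6), (3, 0.3)]))
print(median_survival_time([(1, 0.9), (2, 0.7)]))
```

When heavy censoring keeps the curve above 0.5, no median can be read from it, which is why the sketch returns `None` in that case.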
FIGURE 12–8
The Single Group
Survival Analysis
Results Report
You can control the graph in three ways. You can set the graph options
shown in Figure 12–3 and these options will become the default values
until they are changed. After the graph is created you can modify it using
SigmaStat's Graph Options. Each object in the graph is a separate plot
FIGURE 12–9
A single group
survival curve
The LogRank option analyzes survival data from multiple groups and
creates a report and a graph showing multiple survival curves. Statistics
associated with each group, such as the median survival time, are
calculated and presented in the report.
The LogRank statistic for two groups is formed from the square of the
sum, across all event times, of the difference of the observed and
estimated number of events (censored values removed) divided by the
2 If desired, set the LogRank options using the Options for Survival
LogRank dialog. For more information, see Setting LogRank
Survival Options on page 681.
6 Select the groups from the Group panel if you selected Indexed
data format and click Next.
7 Select the Event and Censored labels. You may select multiple
labels for each.
8 Click Finish.
Multiple Time, Status column pairs (two or more) are required for Raw
data format. Indexed data format requires three columns for Group,
Time and Status. You can preselect the data to have the column selection
panel automatically select the Time, Status column pairs if you organize
your worksheet with the Time column preceding the Status column and
have all columns be adjacent. For Indexed data format, placing the
Group, Time and Status variables in adjacent columns and in that order
also allows automatic column selection.
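The relationship between the two data formats can be sketched in Python. The example below is only an illustration of the layouts described above, not SigmaStat code: indexed_to_raw is a hypothetical helper, and the group names, times and status codes are made-up values.

```python
from collections import defaultdict

def indexed_to_raw(groups, times, statuses):
    """Split indexed columns (Group, Time, Status) into one
    (Time column, Status column) pair per group, as used by Raw data format."""
    pairs = defaultdict(lambda: ([], []))
    for g, t, s in zip(groups, times, statuses):
        pairs[g][0].append(t)  # Time column for this group
        pairs[g][1].append(s)  # Status column for this group
    return dict(pairs)

# Indexed format: three adjacent columns in the order Group, Time, Status.
groups   = ["small", "small", "large", "large", "large"]
times    = [30, 54, 12, 100, 177]
statuses = [1, 0, 1, 1, 0]

raw = indexed_to_raw(groups, times, statuses)
for name, (time_col, status_col) in raw.items():
    print(name, time_col, status_col)
```

Each group in the indexed layout becomes one Time, Status column pair in the raw layout, with Time preceding Status.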
1 If you are going to analyze your survival curve after changing test
options, and want to select your data before you create the curve,
then drag the pointer over your data.
FIGURE 12–10
The Options for Survival
LogRank Dialog Displaying
the Graph Options
3 Click the Graph Options tab to view the graph symbol, line and
scaling options. Additional statistical graph elements may also be
selected here.
4 Click the Results tab to specify the survival time units and to
modify the content of the report and worksheet.
FIGURE 12–11
The Options for Survival
LogRank Dialog Displaying
the Report and Worksheet
Results Options
Click the Post Hoc Tests tab to modify the multiple comparison
options.
FIGURE 12–12
The Options for Survival
LogRank Dialog Displaying
the Post Hoc Test Options
6 To accept the current settings and close the options dialog, click
OK. To accept the current settings without closing the options
dialog, click Apply. To close the dialog without changing any
settings or running the test, click Cancel.
All options in these dialogs are “sticky” and will remain in the state that you
have selected until you change them.
Status Symbols All graph options apply to graphs that are created when the analysis is
run. You can use the Graph Properties dialog to modify the attributes of
the survival curves after they have been created.
Censored Symbols Select the Graph Options tab from the Options
dialog to view the status symbols options. Censored symbols are graphed
by default. Clear this option to not display the censored symbols.
Failures Symbols Checking this box will display symbols at the failure
times. These symbols always occupy the inside corners of the steps in the
survival curve. As such they provide redundant information and need
not be displayed.
Group Color The color of the objects in a survival curve group may be changed with
this option. All objects, e.g., survival line, symbols, confidence interval
lines, will be changed to the selected color or color scheme. A four
density gray scale color scheme is used as the default. You may change
this to black, where all survival curves and their attributes will be black,
or to incrementing, which is a multi-color scheme. Use Graph Properties to
modify individual object colors after the graph has been created.
Survival Scale You can display the survival graph either using fractional values
(probabilities) or percents.
Fraction If you select this then the Y axis scaling will be from 0 to 1.
Additional Plot Statistics Two different types of graph elements may be added to your survival
curves.
95% Confidence Intervals Selecting this will add the upper and lower
confidence lines in a stepped line format.
Standard Error Bars Selecting this will add error bars for the standard
errors of the survival probability. These are placed at the failure times.
All of these elements will be graphed with the same color as the survival
curve. You may change these colors, and other graph attributes, from
Graph Properties after the graph has been created.
The critical P value for the LogRank test may also be changed.
Entering a different value for P Value for Significance at the Report
tab of Tools, Options does this. This is a global setting for the
critical P value and will affect all tests in SigmaStat.
Worksheet 95% Confidence Intervals Select this to place the survival curve upper
and lower 95% confidence intervals into the worksheet. These will be
placed into the first empty worksheet columns.
Time Units Select a time unit from the drop-down list or enter a unit. These units
will be used in the graph axis titles and the survival report.
Multiple Comparisons You can select when multiple comparisons are to be computed and
displayed in the report. LogRank tests the hypothesis of no differences
between survival groups but does not determine which groups are
different, or the sizes of these differences. Multiple comparison
procedures isolate these differences.
1 Specify any options for your graph, report and post-hoc tests (see
Figure 12–10, Figure 12–11, and Figure 12–12). You can do this
by selecting Survival LogRank in the Select Test drop-down list
and either clicking the Test Options button or selecting Current
Test Options from the Statistics menu.
2 To select your data before you run the test drag the pointer over
your data. The columns must be adjacent and in the correct order
(Time, Status for Raw data and Group, Time, Status for Indexed
data).
FIGURE 12–13
The Data Format Panel With
Raw Data Format Selected
5 Click Next to display the Pick Columns panel that prompts you to
select your data columns. If you selected columns before you chose
the test, the selected columns appear in the Selected Columns list.
FIGURE 12–14
The Pick Columns Panel for
Survival LogRank Raw Data
Format Prompting You to
Select Multiple Time and
Status Columns
FIGURE 12–15
The Pick Columns for
Survival LogRank Panel
Prompting You to Select the
Status Variables
Figure 12–15 shows an example with one event variable “1”, two
censored variables “0” and “censored” and one variable
“lobectomy” to be ignored. Select these and click the right arrow
buttons to place the event variable in the Event window and the
censored variables in the Censored window. The result of this
selection is shown in Figure 12–16. All data associated with
“lobectomy” will be considered missing.
FIGURE 12–16
The Pick Columns for
Survival LogRank Dialog
Showing the Results of
Selecting the Status
Variables
You can have more than one Event label and more than one
Censored label. You must select one Event label in order to
proceed. You need not select a censored variable, though, and some
data sets will not have any censored values. You need not select all
the variables; any data associated with unselected status variables
will be considered missing.
10 Use the back arrow keys to remove labels from the Event and
Censored windows. This will place them back in the Status labels
in selected columns window.
The Event and Censored labels that you selected are saved for your
next analysis. If the next data set contains exactly the same status
labels, or if you are re-analyzing your present data set, then the
saved selections will appear in the Event and Censored windows.
11 Click Finish to create the survival graph and report. The results
you obtain will depend on the Test Options that you selected.
If you selected Indexed data format then the Pick Columns panel
asks you to select the three columns in the worksheet for your
Group, Time and Status (Figure 12–17).
FIGURE 12–17
The Pick Columns Panel for
Survival LogRank Indexed
Data Format Prompting You
to Select Group, Time and
Status Columns
12 Click Next to select the groups you want to include in the analysis.
If you want to analyze all groups found in the Group column then
select the Select all groups checkbox. Otherwise select groups from
the Data for Group drop-down list. You can select subsets of all
groups and, as shown in Figure 12–18, select them in the order
that you wish to see them in the report.
FIGURE 12–18
The Group Selection Panel
for Survival LogRank
Indexed Data Format
Prompting You to Select
Groups to Analyze
Multiple Comparison Options LogRank tests the hypothesis of no
differences between the several survival groups, but does not determine
which groups are different, or the sizes of the differences. Multiple
comparison tests isolate these differences by running comparisons
between the experimental groups.
There are two multiple comparison tests to choose from for the
LogRank survival analysis:
➤ Holm-Sidak
➤ Bonferroni
Holm-Sidak Test The Holm-Sidak test is an improvement on the Bonferroni test that
avoids the low power and overconservatism that the Bonferroni test
yields. The Holm-Sidak test is a sequentially rejective procedure because
it applies an accept/reject criterion to a set of ordered null hypotheses
(Glantz, see page 1-12). The Bonferroni test is not sequential. The
Holm-Sidak test can be described by example using the VA Lung Cancer
data in Samples.jnb. The multiple comparison results are shown in
Figure 12–19.
FIGURE 12–19
Holm-Sidak Multiple
Comparison Results for VA
Lung Cancer Study
There are six comparisons of the four survival groups small, large, adeno
and squamous. The LogRank statistic is computed for all data pairs and
the corresponding P value (Unadjusted P Value) determined from the
chi-square distribution. The comparisons are ranked by ascending P
value and the critical P level computed (the critical P level depends only
on the rank, total number of comparisons and the family P value set in
Options). The unadjusted P value is compared to the critical level to
Bonferroni Test The Bonferroni test performs pairwise comparisons with paired chi-
square tests. It is computationally similar to the Holm-Sidak test except
that it is not sequential (the critical level used is fixed for all
comparisons). The critical level is the ratio of the family P value (set in
Post Hoc Test Options - Figure 12–12) to the number of comparisons.
It is a more conservative test than the Holm-Sidak test in that the chi-
square value required to conclude that a difference exists becomes much
larger than it really needs to be. Bonferroni multiple comparison results
for the VA Lung Cancer data from Samples.jnb are shown in
Figure 12–20.
FIGURE 12–20
Bonferroni Multiple
Comparison Results for VA
Lung Cancer Study
The critical level is constant at 0.05/6 = 0.00833. Since the critical level
does not increase, as it does for the Holm-Sidak test, there will tend to
be fewer comparisons with significant differences (they are the same for
this particular example but not for the Gehan-Breslow test,
Figure 12–33).
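The fixed Bonferroni critical level quoted above can be reproduced with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the strict "less than" comparison used to flag significance is an assumption rather than documented SigmaStat behavior.

```python
from math import comb

def bonferroni_critical_level(family_p, n_groups):
    """Family P value divided by the number of pairwise comparisons."""
    n_comparisons = comb(n_groups, 2)  # 4 groups give 6 pairwise comparisons
    return family_p / n_comparisons

def flag_significant(unadjusted_p_values, critical_level):
    """Every comparison is judged against the same critical level."""
    return [p < critical_level for p in unadjusted_p_values]

critical = bonferroni_critical_level(0.05, 4)
print(f"{critical:.5f}")  # prints 0.00833
```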
Results Explanations The number of significant digits displayed in the report may be set in
the Report Options dialog. For more information on setting report
options, see Setting Report Options on page 135.
Report Header Information The report header includes the date and
time that the analysis was performed. The data source is identified by
the worksheet title containing the data being analyzed and the notebook
name. In Figure 12–21 the Data source shows the worksheet title to be
“VA Lung Cancer Trial” and the notebook title to be “Survival Analysis
Data”. The event and censor labels used in this analysis are listed. Also,
the time units used are displayed.
Survival Cumulative Probability Table The survival probability table
lists all event times and, for each event time, the number of events that
occurred, the number of subjects remaining at risk, the cumulative
survival probability and its standard error. The upper and lower 95%
confidence limits are not displayed but these may be placed into the
worksheet (see Figure 12–11). Censored times are not shown but you can
infer their existence from jumps in the Number at Risk data and the
summary table immediately below this table.
You can turn the display of this table off by clearing this option in the
Results tab of Test Options. This is useful to keep the report a reasonable
length when you have large data sets.
Data Summary Table The data summary table shows the total number of cases. The sum of
the number of events, censored and missing values, shown below this,
will equal the total number of cases.
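The way status labels feed the data summary table can be sketched as follows. The labels are the ones described for Figure 12–15 (“1”, “0”, “censored” and “lobectomy”); the status column itself and the function name summarize_status are hypothetical.

```python
from collections import Counter

def summarize_status(status_column, event_labels, censored_labels):
    """Count events, censored values and missing values in a Status column.

    Any label that was not selected as an Event or Censored label is
    treated as missing.
    """
    counts = Counter(events=0, censored=0, missing=0)
    for label in status_column:
        if label in event_labels:
            counts["events"] += 1
        elif label in censored_labels:
            counts["censored"] += 1
        else:
            counts["missing"] += 1
    return counts

# Hypothetical Status column for six cases.
status_column = ["1", "0", "1", "censored", "lobectomy", "1"]
summary = summarize_status(status_column, {"1"}, {"0", "censored"})
print(summary)

# The three counts always add up to the total number of cases.
assert sum(summary.values()) == len(status_column)
```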
Statistical Summary Table The mean and percentile survival times and
their statistics are listed in this table. The median survival time is
commonly used in publications.
FIGURE 12–21
The LogRank Survival
Analysis Results Report
You can control the graph in three ways. You can set the graph options
shown in Figure 12–10 and these options will become the default values
until they are changed. After the graph is created you can modify it using
SigmaStat's Graph Options. Each object in the graph is a separate plot
(for example, survival curve, failure symbols, censored symbols, upper
confidence limit, etc.) so you have considerable control over the
appearance of your graph. If you also have SigmaPlot then you can use
Edit with SigmaPlot from the Graph menu and obtain additional
control over your graph.
FIGURE 12–22
LogRank Survival Curves
The default Test Options, gray
scale colors and solid circle
symbols, were used.
Squamous and large cell
carcinomas do not appear to
be significantly different (as
well as small cell and
adenocarcinoma). This is
confirmed by the Holm-Sidak
test (see Figure 12–19).
The Gehan-Breslow test assumes that the early survival times are known
more accurately than later times and weights the data accordingly. As an
example, you would want to use Gehan-Breslow if there were many late-
survival-time censored values. This is different from the LogRank test
that assumes there is no difference in the accuracy of the survival times.
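The difference in weighting between the two tests can be sketched for two groups in Python. The sketch uses the usual textbook definitions: a weight of 1 at every event time for LogRank, and a weight equal to the total number at risk for Gehan-Breslow, so early event times count more. Whether SigmaStat computes exactly these quantities is an assumption, and the function name and data are hypothetical.

```python
import math

def weighted_logrank(times1, events1, times2, events2, gehan_breslow=False):
    """Return (chi-square statistic with 1 degree of freedom, P value).

    times1, times2   -- lists of survival times for the two groups
    events1, events2 -- lists of booleans, True for a failure, False for censored
    """
    all_times = times1 + times2
    all_events = events1 + events2
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    u = 0.0    # weighted sum of (observed - expected) failures in group 1
    var = 0.0  # variance of u
    for t in event_times:
        n1 = sum(1 for x in times1 if x >= t)  # at risk in group 1
        n2 = sum(1 for x in times2 if x >= t)  # at risk in group 2
        d1 = sum(1 for x, e in zip(times1, events1) if e and x == t)
        d2 = sum(1 for x, e in zip(times2, events2) if e and x == t)
        n, d = n1 + n2, d1 + d2
        w = n if gehan_breslow else 1
        u += w * (d1 - d * n1 / n)
        if n > 1:
            var += w * w * d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    chi_square = u * u / var
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value

# Hypothetical data with no censored values.
lr = weighted_logrank([1, 3], [True, True], [2, 4], [True, True])
gb = weighted_logrank([1, 3], [True, True], [2, 4], [True, True], gehan_breslow=True)
print(lr, gb)
```

The sketch does not guard against a zero variance, which occurs when one of the groups is empty at every event time.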
6 Select the groups from the Group panel if you selected Indexed
data format and click Next.
7 Select the Event and Censored labels. You may select multiple
labels for each.
8 Click Finish.
Multiple Time, Status column pairs (two or more) are required for Raw
data format. Indexed data format requires three columns for Group,
Time and Status. You can preselect the data to have the column selection
panel automatically select the Time, Status column pairs if you organize
your worksheet with the Time column preceding the Status column and
have all columns be adjacent.
1 If you are going to analyze your survival curve after changing test
options, and want to select your data before you create the curve,
then drag the pointer over your data.
FIGURE 12–23
The Options for
Survival Gehan-Breslow
Dialog Displaying the
Graph Options
3 Click the Graph Options tab to view the graph symbol, line and
scaling options. Additional statistical graph elements may be
selected here.
Click the Results tab to specify the survival time units and to
modify the content of the report and worksheet.
FIGURE 12–24
The Options for Survival
Gehan-Breslow Dialog
Displaying the Report and
Worksheet Results Options
Click the Post Hoc Tests tab to modify the multiple comparison
options.
FIGURE 12–25
The Options for Survival
Gehan-Breslow Dialog
Displaying the Post Hoc Test
Options
5 To continue the test, click the Run Test button. The Pick Columns
panel appears (see Running a Gehan-Breslow Survival Analysis on
page 699 for more information).
6 To accept the current settings and close the options dialog, click
OK. To accept the current settings without closing the options
You can select Help at any time to access SigmaStat's on-line help
system.
All options in these dialogs are “sticky” and will remain in the state that you
have selected until you change them.
Status Symbols All graph options apply to graphs that are created when the analysis is
run. You can use the Graph Properties dialog to modify the attributes of
the survival curves after they have been created.
Censored Symbols Select the Graph Options tab from the Options
dialog to view the status symbols options. Censored symbols are graphed
by default. Clear this option to not display the censored symbols.
Failures Symbols Checking this box will display symbols at the failure
times. These symbols always occupy the inside corners of the steps in the
survival curve. As such they provide redundant information and need
not be displayed.
Group Color The color of the objects in a survival curve group may be changed with
this option. All objects, e.g., survival line, symbols, confidence interval
lines, will be changed to the selected color. A four density gray scale
color scheme is used as the default. You may change this to black, where
all survival curves and their attributes will be black, or to incrementing,
which is a multi-color scheme. Use Graph Properties to modify individual
object colors after the graph has been created.
Survival Scale You can display the survival graph either using fractional values
(probabilities) or percents.
Fraction If you select this then the Y axis scaling will be from 0 to 1.
Additional Plot Statistics Two different types of graph elements may be added to your survival
curves.
95% Confidence Intervals Selecting this will add the upper and lower
confidence lines in a stepped line format.
Standard Error Bars Selecting this will add error bars for the standard
errors of the survival probability. These are placed at the failure times.
All of these elements will be graphed with the same color as the survival
curve. You may change these colors, and other graph attributes, from
Graph Properties after the graph has been created.
The critical P value for the Gehan-Breslow test may also be changed.
Entering a different value for P Value for Significance at the Report tab of
Tools, Options does this. This is a global setting for the critical P value and
will affect all tests in SigmaStat.
Worksheet 95% Confidence Intervals Select this to place the survival curve upper
and lower 95% confidence intervals into the worksheet. These will be
placed into the first empty worksheet columns.
Time Units Select a time unit from the drop-down list or enter a unit.
These units will be used in the graph axis titles and the survival report.
Multiple Comparisons You can select when multiple comparisons are to be computed and
displayed in the report. Gehan-Breslow tests the hypothesis of no
differences between survival groups but does not determine which groups
are different, or the sizes of these differences. Multiple comparison
procedures isolate these differences.
If multiple comparisons are triggered, the report will show the results of the
comparison. You may elect to always show them by clearing Only when
Survival P Value is Significant.
1 Specify any options for your graph, report and post-hoc tests (see
Figure 12–23, Figure 12–24, and Figure 12–25). You can do this
by selecting Survival Gehan-Breslow in the Select Test drop-down
list and either clicking the Test Options button or selecting
Current Test Options from the Statistics menu.
2 If you want to select your data before you run the test then drag
the pointer over your data. The columns must be adjacent and in
the correct order (Time, Status for Raw data and Group, Time,
Status for Indexed data).
FIGURE 12–26
The Data Format Panel With
Raw Data Format Selected
5 Click Next to display the Pick Columns panel that prompts you to
select your data columns. If you selected columns before you chose
the test, the selected columns appear in the Selected Columns list.
FIGURE 12–27
The Pick Columns for
Survival Gehan-Breslow Panel
Prompting You to Select
Multiple Time and Status
Columns
FIGURE 12–28
The Pick Columns for
Survival Gehan-Breslow
Panel Prompting You to
Select the Status Variables
FIGURE 12–29
The Pick Columns for
Survival Gehan-Breslow
Dialog Showing the Results
of Selecting the Status
Variables
You can have more than one Event label and more than one
Censored label. You must select one Event label in order to
proceed. You need not select a censored variable, though, and some
data sets will not have any censored values. You need not select all
the variables; any data associated with unselected status variables
will be considered missing.
9 Use the back arrow keys to remove labels from the Event and
Censored windows. This will place them back in the Status labels
in selected columns window.
The Event and Censored labels that you selected are saved for your
next analysis. If the next data set contains exactly the same status
labels, or if you are re-analyzing your present data set, then the
saved selections will appear in the Event and Censored windows.
10 Click Finish to create the survival graph and report. The results
you obtain will depend on the Test Options that you selected.
11 If you selected Indexed data format then the Pick Columns panel
asks you to select the three columns in the worksheet for your
Group, Time and Status.
FIGURE 12–30
The Pick Columns Panel for
Survival Gehan-Breslow
Indexed Data Format
Prompting You to Select
Group, Time and Status
Columns
12 Click Next to select the groups you want to include in the analysis.
If you want to analyze all groups found in the Group column then
select the Select all groups checkbox. Otherwise select groups from
the Data for Group drop-down list. You can select subsets of all
groups and select them in the order that you wish to see them in
the report.
FIGURE 12–31
The Group Selection Panel
for Survival Gehan-Breslow
Indexed Data Format
Prompting You to Select
Groups to Analyze
Multiple Comparison Options Gehan-Breslow tests the hypothesis of no
differences between the several survival groups, but does not determine
which groups are different, or the sizes of the differences. Multiple
comparison tests isolate these differences by running comparisons
between the experimental groups.
the Options for Gehan-Breslow dialog (see Figure 12–25), the
multiple comparison results are displayed in the Report.
There are two multiple comparison tests to choose from for the
Gehan-Breslow survival analysis:
➤ Holm-Sidak
➤ Bonferroni
Holm-Sidak Test The Holm-Sidak test is an improvement on the Bonferroni test that
avoids the low power and overconservatism that the Bonferroni test
yields. The Holm-Sidak test is a sequentially rejective procedure because
it applies an accept/reject criterion to a set of ordered null hypotheses
(Glantz, page 1-12). The Bonferroni test is not sequential. The Holm-
Sidak test can be described by example using the VA Lung Cancer data
in Samples.jnb.
FIGURE 12–32
Holm-Sidak Multiple
Comparison Results for VA
Lung Cancer Study
There are six comparisons of the four survival groups small, large, adeno
and squamous. The Gehan-Breslow statistic is computed for all data
pairs and the corresponding P value (Unadjusted P Value) determined
from the chi-square distribution. The comparisons are ranked by
ascending P value and the critical P level computed (the critical P level
depends only on the rank, total number of comparisons and the family P
value set in Test Options). The unadjusted P value is compared to the
critical level to determine significance. Compare Figure 12–32 and
Figure 12–33 to see that one difference between the two tests is the
computation of the critical level. The Bonferroni critical level is constant
since it is not a sequential method.
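The ranked critical levels can be sketched in Python. The formula used here, 1 - (1 - family P)^(1 / (k - rank + 1)) for k comparisons, is the standard Holm-Sidak step-down rule and depends only on the rank, the number of comparisons and the family P value; that SigmaStat uses exactly this expression is an assumption. The function name is hypothetical and the P values are invented for illustration; they are not the values from Figure 12–32.

```python
def holm_sidak(unadjusted_p_values, family_p=0.05):
    """Return (rank, critical level, significant) for each comparison,
    in the same order as the input P values."""
    k = len(unadjusted_p_values)
    order = sorted(range(k), key=lambda i: unadjusted_p_values[i])
    results = [None] * k
    still_rejecting = True
    for rank, idx in enumerate(order, start=1):
        # The critical level grows as the rank increases.
        critical = 1.0 - (1.0 - family_p) ** (1.0 / (k - rank + 1))
        significant = still_rejecting and unadjusted_p_values[idx] < critical
        if not significant:
            # Sequentially rejective: once one comparison fails, all later ones fail.
            still_rejecting = False
        results[idx] = (rank, critical, significant)
    return results

# Invented P values for six pairwise comparisons.
p_values = [0.0001, 0.0005, 0.008, 0.011, 0.2, 0.9]
holm_sidak_results = holm_sidak(p_values)
for p, (rank, critical, significant) in zip(p_values, holm_sidak_results):
    print(rank, p, round(critical, 5), significant)
```

With these invented values, four comparisons pass the Holm-Sidak criterion, while only three fall below the constant Bonferroni level of 0.05/6.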
Bonferroni Test The Bonferroni test performs pairwise comparisons with paired chi-
square tests. It is computationally similar to the Holm-Sidak test except
that it is not sequential (the critical level used is fixed for all
comparisons). The critical level for the Bonferroni test is the ratio of the
FIGURE 12–33
Bonferroni Multiple
Comparison Results for VA
Lung Cancer Study
The critical level is constant at 0.05/6 = 0.00833. Since the critical level
does not increase, as it does for the Holm-Sidak test, there will tend to
be fewer comparisons with significant differences. This occurs here with
three significant comparisons as compared to four for the Holm-Sidak
case.
Results Explanations The number of significant digits displayed in the report may be set in
the Report Options dialog. For more information on setting report
options, see Setting Report Options on page 135.
Report Header Information The report header includes the date and
time that the analysis was performed. The data source is identified by
the worksheet title containing the data being analyzed and the notebook
name. In Figure 12–34 the Data source shows the worksheet title to be
“VA Lung Cancer Trial” and the notebook name to be “Survival Analysis
Data”. The event and censor labels used in this analysis are listed. Also,
the time units used are displayed.
You can turn the display of this table off by clearing this option in the
Results tab of Test Options. This is useful to keep the report a reasonable
length when you have large data sets.
Data Summary Table The data summary table shows the total number
of cases. The sum of the number of events, censored and missing values,
shown below this, will equal the total number of cases.
Statistical Summary Table The mean and percentile survival times and
their statistics are listed in this table. The median survival time is
commonly used in publications.
FIGURE 12–34
The Gehan-Breslow
Survival Analysis
Results Report
You can control the graph in three ways. You can set the graph options
shown in Figure 12–23 and these options will become the default values
until they are changed. After the graph is created you can modify it using
SigmaStat's Graph Options. Each object in the graph is a separate plot
(e.g., survival curve, failure symbols, censored symbols, upper
confidence limit, etc.) so you have considerable control over the
appearance of your graph. If you also have SigmaPlot then you can use
Run SigmaPlot from the Graph menu and obtain additional control
over your graph.
FIGURE 12–35
Gehan-Breslow
Survival Curves
Incrementing colors, percent
survival and 95% confidence
interval options were
selected from Test Options.
The Holm-Sidak test showed
these two curves to be
significantly different at the
0.001 level.
FIGURE 12–36
A contrived survival curve
with various combinations
of failures, censored values
and tied data that
graphically shows the
effects of these rules
Failures and censored values are shown in Figure 12–36 as open and
filled circles, respectively. A single failure is shown at time = 1.0. It is
located at the inner corner of the step curve. All failures will occur at the
inner corners so it is not necessary to display failure symbols. Failure
symbols may be displayed in SigmaStat but by default are not. Two tied
failures are shown at time = 2.0. They superimpose at the inner corner of
the step that has decreased roughly twice as much as the step for a single
failure. Four censored values, two of which are tied, are shown in the
time interval between 2.0 and 8.0. Censored values do not cause a
decrease in the survival curve and nothing unusual occurs at tied censor
values. Four tied values, two failures and two censored, are shown at
time = 8.0 (the censored values are slightly displaced for clarity). They
occur at the inside corner of the step since that is where failures are
located. The censored value at time = 19.0 prevents the survival curve
from touching the X-axis.
Test Options Figure 12–37 shows four variations that can be achieved by modifying
survival curve Test Options.
FIGURE 12–37
Four Variations of
Survival Graphs
that Can Be
Achieved by
Modifying
Survival Curve
Test Options
[Panels A, B, C and D: Survival (0.0 to 1.0) plotted against Time (Days) (0 to 40)]
Graph Properties Figure 12–38 shows modifications made from Graph Properties to the
graph in Figure 12–37, C. The confidence interval lines were changed
from small gray dashed to solid blue. The censored symbol type was also
changed from a solid circle to a square.
FIGURE 12–38
Modifications made from
Graph Properties to the
graph in Figure 12–37, C.
[Graph: Survival (0.0 to 1.0) plotted against Time (Days) (0 to 40)]
SigmaPlot If you have access to SigmaPlot you have complete control over your
graph. Figure 12–39 shows a graph generated in SigmaStat that has been
modified by using Run SigmaPlot from the Graph menu. The
background color and grid lines were added using custom colors. One of
the custom colors was used for the axis lines. Custom colors were used
for the survival curve lines and symbols. Tick marks were removed and
legend modifications made.
FIGURE 12–39
Modification of the
Survival Curve Graph
using SigmaPlot
[Graph: Irradiation Effectiveness; Relapse Free Percentage (0 to 100); legend: Affected Node, Total Node; X axis from 0 to 2500]
About Power
The power, or sensitivity, of a test is the probability that the test will
detect a difference or effect if there really is a difference or effect. The
closer the power is to 1, the more sensitive the test. Traditionally, you
want to achieve a power of 0.80, which means that there is an 80%
chance of detecting a specified effect with 1 − α confidence (i.e., 95%
confidence when α = 0.05). Power less than 0.001 is noted as "P = < 0.001."
FIGURE 13–1
The Power Computation
Commands Menu
You can estimate how big the sample size has to be in order to detect a
treatment effect or difference with a specified level of statistical
significance and power. All else being equal, the larger the sample size,
the greater the power of the test.
You can determine the power of an intended t-test. Unpaired t-tests are
used to compare two different samples from populations that are
normally distributed with equal variances among the individuals. For
more information on running t-tests, see Running a t-test on page 212.
FIGURE 13–2
The t-test Power Dialog Box
2 Enter the size of the difference between the means of the two
groups you want to be able to detect in the Expected Difference of
Means box. This can be the size you expect to see, as determined
from previous samples or experiments, or just an estimate.
4 Enter the expected sizes of each group in the Group 1 Size and
Group 2 Size boxes.
α error is also called a Type I error (a Type I error is when you
reject the hypothesis of no effect when this hypothesis is true).
The traditional α value used is 0.05. This indicates that a one in
twenty chance of error is acceptable, or that you are willing to
conclude there is a significant difference when P < 0.05.
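How the expected difference of means, the group sizes and α combine into a power value can be sketched with a normal approximation. This is not SigmaStat's computation: an exact calculation would use the noncentral t distribution, and the expected standard deviation argument is an assumed input that does not appear in the steps shown here. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def t_test_power_approx(expected_difference, expected_sd, n1, n2, alpha=0.05):
    """Approximate two-sided power of an unpaired t-test (normal approximation)."""
    z = NormalDist()
    standard_error = expected_sd * math.sqrt(1.0 / n1 + 1.0 / n2)
    shift = abs(expected_difference) / standard_error
    z_critical = z.inv_cdf(1.0 - alpha / 2.0)
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_critical) + z.cdf(-shift - z_critical)

# Hypothetical inputs: difference of 1.0, standard deviation of 1.0.
for n in (10, 20, 40):
    print(n, round(t_test_power_approx(1.0, 1.0, n, n), 3))
```

With a difference of zero the approximation returns α itself, and the power rises toward 1 as the group sizes increase.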
FIGURE 13–3
The t-test Power
Computation
Results
Viewed in the
Report
You can determine the power of a Paired t-test. Paired t-tests are used to
see if there is a change in the same individuals before and after a single
treatment or change in condition. The sizes of the treatment effects are
assumed to be normally distributed. For more information on
performing Paired t-tests, see page 337.
To determine the power for a Paired t-test, you need to set the
FIGURE 13–4
The Paired t-test
Power Dialog Box
2 Enter the size of the change before and after the treatment in the
Change to be Detected box. The size of the change is determined
by the difference of the means. This can be the size of the treatment
effect you expect to see, as determined from previous experiments,
or just an estimate.
FIGURE 13–5
The Paired t-test
Power
Computation
Results Viewed
in the Report
2 Enter the expected proportions that fall into the category for each
group. This can be the distribution you expect to see, as
determined from previous experiments, or just an estimate.
3 Enter the sizes of each group. This can be sample sizes you expect
to obtain, or just an estimate.
FIGURE 13–6
The Proportions
Power Dialog Box
& The Yates correction factor is used if this option is selected in the
Options for z-test dialog box. See page 435 for information on
setting z-test options.
FIGURE 13–7
The Proportion Power
Computation Results
Viewed in the Report
To determine the power for a One Way ANOVA, you need to specify
the:
FIGURE 13–8
The ANOVA Power Dialog
Box
4 Enter the expected number of groups and the expected size of each
group.
7 Select the # button to see the power of a One Way ANOVA at the
specified conditions. The power calculation appears at the top of
the dialog box. If desired, you can change any of the settings and
select the # button again to view the new power as many times as
desired.
FIGURE 13–9
The ANOVA Power
Computation Results
Viewed in the Report
Group      Category 1   Category 2   Category 3
Group 1        15           15           35
Group 2        15           30           10
& You only need to specify the pattern (distribution) of the number of
observations. The absolute numbers in the cells do not matter, only their
relative values.
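The note above can be checked numerically. The sketch below computes Cohen's effect size w from a table of expected counts and shows that multiplying every cell by a constant leaves it unchanged. Cohen's w is a common effect size for chi-square power calculations; whether SigmaStat uses it internally is an assumption, and the function name is hypothetical.

```python
from math import sqrt


def cohens_w(table):
    """Effect size w for a contingency table of expected counts."""
    total = sum(sum(row) for row in table)
    row_p = [sum(row) / total for row in table]
    col_p = [sum(col) / total for col in zip(*table)]
    w_squared = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            # Cell proportion expected if groups and categories were independent.
            independent_p = row_p[i] * col_p[j]
            w_squared += (count / total - independent_p) ** 2 / independent_p
    return sqrt(w_squared)


pattern = [[15, 15, 35], [15, 30, 10]]
scaled = [[3 * count for count in row] for row in pattern]
# Both calls give the same w, because only the relative pattern matters.
print(cohens_w(pattern), cohens_w(scaled))
```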
FIGURE 13–10
Contingency Table Data
Entered into the Worksheet
FIGURE 13–11
The Chi-square
Power Dialog Box
FIGURE 13–12
The Chi-square Power
Computation Results
Viewed in the Report
You can determine the power to detect a given Pearson Product Moment
Correlation Coefficient r. A correlation coefficient quantifies the
strength of association between the values of two variables. A correlation
coefficient of 1 means that as one variable increases, the other increases
exactly linearly. A correlation coefficient of -1 means that as one
variable increases, the other decreases exactly linearly. For more
information on computing the correlation coefficient, see page 467.
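As a small illustration of the two extreme cases, the sketch below computes the Pearson product moment correlation coefficient from its textbook definition for an exactly increasing and an exactly decreasing linear relationship. The function name and the data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product moment correlation coefficient of two columns."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
increasing = [2.0 * v + 1.0 for v in x]   # other variable rises linearly
decreasing = [10.0 - 3.0 * v for v in x]  # other variable falls linearly
print(pearson_r(x, increasing), pearson_r(x, decreasing))
```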
FIGURE 13–13
The Correlation
Power Dialog Box
3 Enter the desired number of data points. This can be the sample
size you expect to obtain, or just an estimate.
FIGURE 13–14
The Correlation
Coefficient Power
Computation
Results Viewed in
the Report
You can determine the minimum sample size for an intended t-test.
Unpaired t-tests are used to compare two different samples from
populations that are normally distributed with equal variances among
the individuals. For more information on running t-tests, see page 212.
To determine the sample size for a t-test, you need to specify the:
2 Enter the size of the difference between the means of the two
groups to be detected in the Expected Difference in Means box.
This can be the size you expect to see, as determined from previous
samples or experiments, or just an estimate.
FIGURE 13–15
The t-test Sample
Size Dialog Box
The traditional α value used is 0.05. This indicates that a one in
twenty chance of error is acceptable, or that you are willing to
conclude there is a significant difference when P < 0.05.
6 Select the # button to see the required sample size for a t-test at the
specified conditions. The sample size calculation appears at the
top of the dialog box. The sample size is the size of each of the
groups. If desired, you can change any of the settings and select
the # button again to view the new sample size as many times as
desired.
FIGURE 13–16
The t-test
Sample Size
Results Viewed
in the Report
For descriptions of computing the sample size for a t-test, you can
reference an appropriate statistics reference. For a list of suggested
references, see page 12.
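As a rough illustration of the kind of calculation such references describe, the sketch below estimates the size of each group from the difference to detect, the standard deviation, α, and the desired power. It uses the common normal approximation n = 2((z₁ + z₂)·sd/diff)², which is an assumption here and not SigmaStat's documented method; a t-based calculation usually asks for one or two more observations per group. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def t_test_sample_size_approx(diff, sd, alpha=0.05, power=0.80):
    """Approximate size of EACH group for a two-sided unpaired t-test."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)
    z_power = std_normal.inv_cdf(power)
    # Round up, because a fractional observation cannot be collected.
    return ceil(2.0 * ((z_alpha + z_power) * sd / diff) ** 2)


# Hypothetical inputs: detect a difference of one standard deviation.
print(t_test_sample_size_approx(diff=1.0, sd=1.0))
```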
You can determine the sample size for a Paired t-test. Paired t-tests are
used to see if there is a change in the same individuals before and after a
single treatment or change in condition. The sizes of the treatment
effects are assumed to be normally distributed. For more information on
running Paired t-tests, see page 337.
To determine the sample size for a Paired t-test, you need to estimate
the:
2 Enter the size of the change before and after the treatment in the
Change to be Detected box. This can be the size of the treatment
effect you expect to see, as determined from previous experiments,
or just an estimate.
FIGURE 13–17
The Paired t-test
Sample Size Dialog Box
6 Select the # button to see the required sample size for a paired t-
test at the specified conditions. The sample size calculation
appears at the top of the dialog box. If desired, you can change any
of the settings and select the # button again to view the new
sample size as many times as desired.
FIGURE 13–18
The Paired t-test
Sample Size Results
Viewed in the Report
For descriptions of computing the sample size for a paired t-test, you can
reference an appropriate statistics reference. For a list of suggested
references, see page 12.
1 Enter the expected proportions that fall into the category for each
group in the Group 1 and 2 Proportion boxes. This can be the
distribution you expect to see, as determined from previous
experiments, or just an estimate.
FIGURE 13–19
The Proportions
Sample Size Dialog Box
4 Select the # button to see the required sample size for a proportion
comparison at the specified conditions. The calculated sample size
appears at the top of the dialog box. If desired, you can change any
of the settings and select the # button again to view the new
sample size as many times as desired.
& The Yates correction factor is used if this option was selected in the
Options for z-test dialog box. See page 435 for information on z-
test options.
report. The estimated sample size is the sample size for each
group.
FIGURE 13–20
The Proportions
Sample Size Results
Viewed in the Report
For descriptions of computing the sample size for a z-test, you can
reference an appropriate statistics reference. For a list of suggested
references, see page 12.
You can determine the group sample size for a One Way ANOVA
(analysis of variance). One Way ANOVAs are used to see if there is a
difference among two or more samples taken from populations that are
normally distributed with equal variances among the individuals. For
more information on running a One Way ANOVA, see page 237.
To determine the sample size for a One Way ANOVA, you need to
specify the:
Determining the Minimum Sample Size for a One Way Anova 735
Computing Power and Sample Size
FIGURE 13–21
The ANOVA
Sample Size Dialog Box
6 Select the # button to see the required sample size for a One Way
ANOVA at the specified conditions. The sample size calculation
appears at the top of the dialog box. The sample size is the size of
each group. If desired, you can change any of the settings and
select the # button again to view the new sample size as many
times as desired.
FIGURE 13–22
The ANOVA Sample
Size Results Viewed
in the Report
For descriptions of computing the sample size for a One Way ANOVA,
you can reference an appropriate statistics reference. For a list of
suggested references, see page 12.
You can determine the sample size for a chi-square (χ²) analysis of a
contingency table. A Chi-square test compares the difference between
the expected and observed number of individuals of two or more
different groups that fall within two or more categories. For more
information on running Chi-square tests, see page 446.
TABLE 13-2
The Contingency Table with Expected Numbers of Observations of Two Groups in Three Categories

Group      Category 1   Category 2   Category 3
Group 1        15           15           35
Group 2        15           30           10
FIGURE 13–23
Contingency Table Data
Entered into the Worksheet
FIGURE 13–24
The Pick Columns for
Chi-square Dialog Box
FIGURE 13–25
The Chi-square Sample Size
Dialog Box
6 Select the # button to see the required sample size for a Chi-square
test at the specified conditions. The sample size calculation
appears at the top of the dialog box. If desired, you can change any
of the settings and select the # button again to view the new
sample size as many times as desired. However, if you want to
change the number of observations per category, you need to select
Close, edit the table, then repeat the sample size computation.
FIGURE 13–26
The Chi-square Sample Size
Computation Results
Viewed in the Report
You can determine the sample size necessary to detect a specified Pearson
Product Moment Correlation Coefficient r. A correlation coefficient
quantifies the strength of association between the values of two variables.
A correlation coefficient of 1 means that as one variable increases, the
other increases exactly linearly. A correlation coefficient of -1 means
that as one variable increases, the other decreases exactly linearly. For
more information on computing the correlation coefficient, see page
467.
FIGURE 13–27
The Correlation
Sample Size Dialog Box
FIGURE 13–28
The Correlation
Coefficient Sample
Size Results Viewed
in the Report
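To give a feel for how the required number of data points depends on the correlation coefficient to be detected, here is a minimal sketch based on the Fisher z transformation, n ≈ ((z₁ + z₂) / atanh(r))² + 3. This approximation and the function name are assumptions for illustration; SigmaStat's own computation is not described here and may give slightly different numbers.

```python
from math import atanh, ceil
from statistics import NormalDist


def correlation_sample_size_approx(r, alpha=0.05, power=0.80):
    """Approximate number of data points needed to detect correlation r.

    Assumes 0 < |r| < 1 and a two-sided test of r = 0.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)
    z_power = std_normal.inv_cdf(power)
    # Fisher's z = atanh(r) is roughly normal with variance 1 / (n - 3).
    return ceil(((z_alpha + z_power) / atanh(abs(r))) ** 2 + 3)


print(correlation_sample_size_approx(0.5))
```

Weaker correlations need many more data points, because atanh(r) shrinks toward zero as r does.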
14 Using Transforms
Types of Transforms
➤ Quick transforms
➤ A number of miscellaneous data transforms
➤ User-defined transforms
Quick Transforms
Use quick transforms to perform fast mathematical transforms on your
data and for linearizing and normalizing data.
You can often use transforms in order to use linear regression techniques
for data which does not fall along a straight line. SigmaStat provides a
number of Quick Transforms for linearizing data and stabilizing
(equalizing) non-constant variances.
The other option for handling nonlinear data is to use the appropriate
curved data regression technique. Both polynomial and general
nonlinear regression methods are provided. For a description of
polynomial regression, see Polynomial Regression on page 553. For
information on how to use nonlinear regression, see Chapter 11,
Prediction and Correlation.
➤ Add
➤ Subtract
➤ Divide
➤ Absolute Value
& If you select columns in the worksheet before you choose the sum
transform, the first two selected columns are automatically assigned
as the input columns, and the third column is assigned as the output
column.
1 Pick the first input column with the data you want to add by
clicking it in the worksheet or selecting it from the Data for Input
drop-down list.
2 Pick the column with the data you want to add the first column
values to, subtract the first column values from, or divide the first
columns by as the second input column.
FIGURE 14–1
The Pick Columns for
Add Transform Dialog Box
3 Select the column where you want to place the results of the
addition, subtraction, or division of the input columns as the
output column.
& If you specify an output column that contains data, a dialog box
appears asking you if you want to overwrite the column contents,
push the contents down, or cancel the transform.
FIGURE 14–2
The Output Columns
Are Not Empty Dialog Box
Finding the Absolute Values of Column Data
Use the Absolute Value quick transform to find the absolute values of
data in a worksheet column. Choose the Transforms menu Quick
Transforms, Absolute Value command. When the Pick Columns dialog
box appears prompting you for an input column, select the column with
the data you want to find the absolute values for.
Select the column you want to put the absolute value results in as the
output column.
For detailed instructions on how to pick input and output columns for
the Absolute Value transform, see steps 1 through 7 on page 749.
➤ Square x²
➤ Natural log ln(x)
➤ Log log(x)
➤ Reciprocal 1/x
➤ Exponential e^x
➤ Square root √x
➤ Arcsin square root transform arcsin(√x)
FIGURE 14–3
The Transforms menu Quick
Transforms Commands
The Pick Columns dialog box for the specified transform appears
and prompts you for an input column.
& If you select a column in the worksheet before you choose the
transform, the selected column is automatically assigned as the
input column, and you are prompted for the output column (see
step 3).
2 Pick the data column you want to apply the transform to as the
input column by clicking it in the worksheet or selecting it from
the Data for Input drop-down list.
3 Pick the column where you want the transform results to appear as
the output column by clicking it in the worksheet or selecting it
from the drop-down list. The number or title of the selected
column appears in the highlighted output row.
FIGURE 14–4
Picking Input and
Output Columns Using
the Pick Columns Dialog Box
& If you specify an output column that contains data, a dialog box
appears asking you if you want to erase the column contents, push
the contents down, or cancel the transform.
FIGURE 14–5
The Output Columns
Are Not Empty Dialog Box
Common Linearizing and Normalization Transforms
Here are some common linearization and normalizing transformations
for nonlinear data. You can use these when you want to use linear
regression or ANOVA on nonlinear data.
FIGURE 14–6
Sample Quadratic
Curve Shapes
Note that when squaring the independent variable, you can also
introduce multicollinearity. To avoid this, apply the Center transform
on the data before squaring the independent variable. For information
on centering variables, see page 755.
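The effect of centering before squaring can be seen in a small example. The sketch below compares the correlation between a variable and its square with and without centering; the data and helper name are hypothetical, and the calculation is done outside SigmaStat purely for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [float(v) for v in range(1, 11)]

# Without centering, x and x squared are almost collinear.
r_raw = pearson_r(x, [v * v for v in x])

# Center first (subtract the column mean), then square.
mean_x = sum(x) / len(x)
centered = [v - mean_x for v in x]
r_centered = pearson_r(centered, [v * v for v in centered])

print(r_raw, r_centered)
```

For evenly spaced data such as this, centering removes the linear association between the variable and its square entirely; for skewed data it is reduced rather than eliminated.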
& This procedure can be generalized to any order polynomial; use user-defined
transforms to create the higher-order polynomials. However, if you are
trying to fit data to higher-order curves than a quadratic or cubic, the
Polynomial Regression produces more reliable results. For information on
using Polynomial Regression, see page 553.
Power Functions
y = b0 x^b1
You can use transforms to linearize equations that raise the
independent variable to a constant power.
FIGURE 14–7
Sample Power
Function Shapes
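One standard way to linearize a power function y = b0 x^b1 is to take the natural log of both variables, which gives ln(y) = ln(b0) + b1 ln(x). The sketch below applies that idea to noise-free hypothetical data and recovers both coefficients with an ordinary least-squares line; the helper name is hypothetical, and the log-log approach is presented as an assumption about which quick transforms to apply, since the corresponding steps are not shown here.

```python
from math import exp, log


def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope


x = [1.0, 2.0, 4.0, 8.0]
y = [2.0 * v ** 1.5 for v in x]  # hypothetical data with b0 = 2, b1 = 1.5

# Natural log of both columns turns the curve into a straight line.
intercept, slope = fit_line([log(v) for v in x], [log(v) for v in y])
b0, b1 = exp(intercept), slope
print(b0, b1)
```

Both x and y must be positive for the logs to exist.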
FIGURE 14–8
Sample Exponential
Function Shapes
Hyperbola
y = x / (b0 + b1 x)
You can transform hyperbolas to make them suitable for
Simple Linear Regression.
1 Apply the Reciprocal (1/x) quick transform to both the dependent
variable y and the independent variable x.
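The reason the reciprocal of both variables works can be seen from a short derivation, sketched here assuming x and y are nonzero:

```latex
\begin{aligned}
y &= \frac{x}{b_0 + b_1 x} \\
\frac{1}{y} &= \frac{b_0 + b_1 x}{x} = b_1 + b_0 \cdot \frac{1}{x}
\end{aligned}
```

A straight-line fit of 1/y against 1/x therefore has b1 as its intercept and b0 as its slope.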
FIGURE 14–9
Sample Inverse
Exponential Function
Shapes
FIGURE 14–10
Sample Hyperbolic Function
Shapes
1 Apply the Arcsin Square Root quick transform to the data you
want to normalize.
FIGURE 14–11
Histograms of Data Before
and After the Arcsin Quick
Transform was Applied
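For readers who want to see what the transform does numerically, the following minimal sketch applies arcsin(√x) to a column. It assumes the values are proportions between 0 and 1 (percentages divided by 100) and returns radians; both are assumptions for illustration, since the units SigmaStat expects and returns are not stated here. The function name is hypothetical.

```python
from math import asin, pi, sqrt


def arcsin_sqrt(column):
    """Apply the arcsin square root transform to proportions in [0, 1]."""
    for value in column:
        if not 0.0 <= value <= 1.0:
            raise ValueError("values must be proportions between 0 and 1")
    return [asin(sqrt(value)) for value in column]


proportions = [0.0, 0.05, 0.5, 0.95, 1.0]
print(arcsin_sqrt(proportions))
```

The transform stretches values near 0 and 1 more than values near 0.5, which is what makes it useful for proportions bunched at either end of the range.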
Centering Data
The center transform subtracts the mean of a column from all values in
that column and places the result in a specified output column.
You can often use the center transform on data to eliminate or reduce
multicollinearity. For more information on centering data, you can
reference any appropriate statistics reference. For a list of suggested
references, see page 12.
To center a variable:
& If you select a column in the worksheet before you choose the
transform, the selected column is automatically assigned as the
input column, and you are prompted for the output column (see
step 3).
2 Pick the worksheet column with the data you want to center as the
input column by clicking it in the worksheet or selecting it from
the Data for Input drop-down list. The number or title of the
selected column appears in the highlighted input row, and you are
prompted for an output column.
FIGURE 14–12
The Pick Columns for
Center Transform Dialog Box
3 Pick the column where you want the centered variables to appear
as the output column by clicking it in the worksheet or selecting it
from the Data for Output drop-down list. The number or title of
the selected column appears in the highlighted output row.
& If you specify an output column that contains data, a dialog box
appears asking you if you want to erase the column contents, push
the contents down, or cancel the transform.
FIGURE 14–13
The Output Columns
Are Not Empty Dialog Box
The data from the source column is centered around its mean
value and placed in the specified output column.
FIGURE 14–14
Example of Center
Transform
The data in column 1
is the input column and
the data in column 2 is
the result data from
running the Center
transform on the data
in column 1.
Standardizing Data
To standardize a variable:
& If you select a column in the worksheet before you choose the
transform, the selected column is automatically assigned as the
input column in the Selected Columns list, and you are prompted
for the output column (see step 3).
2 Pick the worksheet column with the data you want to standardize
as the input column by clicking it in the worksheet or selecting it
from the Data for Input drop-down list. The number or title of
the selected column appears in the highlighted input row, and you
are prompted for an output column.
FIGURE 14–15
The Pick Columns
for Standardize
Transform Dialog Box
& If you specify an output column that contains data, a dialog box
appears asking you if you want to erase the column contents, push
the contents down, or cancel the transform.
FIGURE 14–16
The Output Columns
Are Not Empty Dialog Box
The data from the source column is standardized and placed in the
specified column.
FIGURE 14–17
Example of the
Standardized Transform
The data in column 1
is the input column and
the data in column 2 is
the result data from running
the Standardized
transform on the data
in column 1.
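As a point of reference, the sketch below standardizes a column under the usual definition: subtract the column mean, then divide by the column standard deviation. That definition, the use of the sample standard deviation (n - 1 in the denominator), and the function name are assumptions for illustration; they are not confirmed here as the exact computation SigmaStat performs.

```python
from statistics import mean, stdev


def standardize(column):
    """Return z-scores: (value - mean) / sample standard deviation."""
    column_mean = mean(column)
    column_sd = stdev(column)  # assumption: sample SD, n - 1 denominator
    return [(value - column_mean) / column_sd for value in column]


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z_scores = standardize(data)
print(z_scores)
```

The standardized column has a mean of zero and a standard deviation of one, which puts variables measured in different units on a common scale.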
Ranking Data
Use the rank transform to assign integer rank values to data. Ranking
data is useful if you want to know how the values are ranked, or to perform
two way ANOVA on the ranks of data that fails the normality or equal
variance tests.
To rank a variable:
1 Choose the Transform menu Rank command. The Pick Data for
Rank Transform dialog box appears and prompts you to select an
input column.
& If you select a column in the worksheet before you choose the
transform, the selected column is automatically assigned as the
input column, and you are prompted for the output column (see
step 3).
2 Pick the column with the data you want to rank as the input
column by clicking it in the worksheet or selecting it from the
Data for Input drop-down list. The number or title of the selected
column appears in the highlighted input row, and you are
prompted for an output column.
FIGURE 14–18
The Pick Columns for
Rank Transform Dialog Box
3 Pick the column where you want the ranked variables to appear as
the output column by clicking it in the worksheet or selecting it from
the Data for Output drop-down list. The number or title of the
selected column appears in the highlighted output row.
& If you specify an output column that contains data, a dialog box
appears asking you if you want to erase the column contents, push
the contents down, or cancel the transform.
FIGURE 14–19
The Output Columns
Are Not Empty Dialog Box
The data from the input column is ranked and the corresponding
rank values are placed in the specified column.
FIGURE 14–20
Example of the
Ranked Transform
The data in column 1
is the input column and
the data in column 2 is
the result data from
running the Ranked
transform on the
data in column 1.
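The ranking idea can be sketched in a few lines. The example below assigns rank 1 to the smallest value and gives tied values the average of the ranks they span, which is the convention most rank-based tests use. How SigmaStat treats ties is not stated here, so the tie rule and the function name are assumptions.

```python
def rank(values):
    """Rank values from smallest (1) to largest, averaging ranks for ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    position = 0
    while position < len(order):
        # Extend the run while the next sorted value equals the current one.
        end = position
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[position]]):
            end += 1
        average_rank = (position + end) / 2 + 1
        for k in range(position, end + 1):
            ranks[order[k]] = average_rank
        position = end + 1
    return ranks


print(rank([10, 30, 20]))
print(rank([5, 5, 1]))
```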
y = b0 + b1 x1 + b2 x2
you could add another variable to the equation equal to x1x2, e.g.,
y = b0 + b1 x1 + b2 x2 + b3 x1 x2
& Note that adding an interaction variable to a multiple linear regression can
induce multicollinearity. To avoid or reduce this problem, use the Center
& If you selected columns before you ran the transform, the selected
columns are assigned as the input and output columns in the order
they were selected in the worksheet.
2 Pick the first variable column with the data you want to factor into
the interaction by clicking it in the worksheet or selecting it from
the Data for Input drop-down list, then pick the second input
column. The number or title of the selected column appears in the
input row of the Selected Columns list.
FIGURE 14–21
The Pick Columns for
Interaction Transform Dialog
Box
3 Select the column where you want to place the interaction variable
as the output column by clicking it in the worksheet or selecting it
from the Data for Output drop-down list. The number or title of
the selected column appears in the highlighted output row.
& If you specify an output column that contains data, a dialog box
appears asking you if you want to erase the column contents, push
the contents down, or cancel the transform.
FIGURE 14–22
The Output Columns
Are Not Empty Dialog Box
The data from the input columns are factored together and placed
in the specified output column.
FIGURE 14–23
Example of the
Interaction Transform
The data in columns 1 and 2
are the input columns and
the data in column 3 is
the result data from
running the Interaction
transform on the
data in columns 1 and 2.
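The interaction variable in the equation y = b0 + b1 x1 + b2 x2 + b3 x1 x2 is simply the row-by-row product of the two input columns. The sketch below builds it directly and, as an optional step, centers the inputs first to reduce multicollinearity. The function names and data are hypothetical.

```python
def center(column):
    """Subtract the column mean from every value."""
    column_mean = sum(column) / len(column)
    return [value - column_mean for value in column]


def interaction(first, second):
    """Row-by-row product of two equal-length columns."""
    if len(first) != len(second):
        raise ValueError("input columns must have the same length")
    return [a * b for a, b in zip(first, second)]


x1 = [1.0, 2.0, 3.0]
x2 = [4.0, 5.0, 6.0]
print(interaction(x1, x2))                  # raw interaction column
print(interaction(center(x1), center(x2)))  # centered before multiplying
```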
Reference Coding
Reference coding sets the value of all dummy variables to zero when the
index variable corresponds to the indexed condition used, and codes all
other values of the index variable with a 1. The referenced condition is
always assigned a 0.
& Use reference coding when you want the constant to be the mean of the
dependent variable under a selected referenced condition, and the
coefficients computed for the dummy variable(s) to reflect the changes of the
constant value from reference condition dependent variable mean.
1 If necessary, create an index column for your data. These data can
consist of any numbers or strings. Each dependent variable value
that falls under a different condition is indexed with a different
label. For more information on indexing data, see page 73. Two
FIGURE 14–26
Example of Indexed Data
Column Created For
Dependent Variable Data
Column 3 is the indexed data.
& If you selected columns before you ran the transform, the selected
columns are assigned as the input and output columns in the order
they were selected in the worksheet.
3 Pick the column with the indexed data you want to create dummy
variables for as the input column by clicking it in the worksheet or
selecting it from the Data for Input drop-down list. The number
FIGURE 14–27
The Pick Columns for
Reference Transform Dialog
Box
FIGURE 14–28
The Select Reference
Index Dialog Box
7 Select the reference index value from the list to use as the reference
condition; no dummy variable is created using this value (this is
the condition that determines the constant value; the
corresponding dummy variable values for this condition are always
& If you specify an output column that contains data, a dialog box
appears asking you if you want to erase the column contents, push
the contents down, or cancel the transform.
FIGURE 14–29
The Output Columns
Are Not Empty Dialog Box
FIGURE 14–30
Example of Reference
Coded Dummy
Variable Values
The data in column 2 is the
input data and the data in
column 4 is the output data.
Effects Coding
In effects coding, the dummy variables are coded with -1, 0, and 1. The
reference condition is always coded with a -1. The value of other
dummy variables is set to zero when the index variable corresponds to
the indexed condition used, and set to 1 for all other values of the index
variable.
& Use effects coding when you want the constant term to be computed using
the value of the dependent variable under all indexed conditions, and you
want the coefficients of the dummy variables to quantify the size of changes
from this overall mean.
1 If necessary, create an index column for your data. This data can
consist of any numbers or strings. Each dependent variable value
that falls under a different condition is indexed with a different
label. For more information on indexing data, see page 73. Two
FIGURE 14–31
Example of Indexed Data
Columns Created For
Dependent Variable Data
Column 3 is the indexed data.
& If you selected columns before you ran the transform, the selected
columns are assigned as the input and output columns in the order
they were selected in the worksheet.
3 Pick the column with the indexed data you want to create dummy
variables for as the input column by clicking it in the worksheet or
by selecting it from the Data for Input drop-down list. The
FIGURE 14–32
The Pick Columns for
Effects Transform Dialog
Box
6 Select Finish to run the transform and open the Select Reference
Index dialog box.
FIGURE 14–33
The Select Reference
Index Dialog Box
7 Select the reference index value from the list to use as the reference;
no dummy variable is created for this value, and the corresponding
dummy variable values for this condition are always -1. All other
8 Select OK. Values in the index column that match the index value
used to evaluate the column are assigned a zero. Index values that
match the reference condition are assigned -1. All other values are
set to 1. One dummy variable column is produced for each index
value, except the index selected as the reference condition.
& If you specify an output column that contains data, a dialog box
appears asking you if you want to erase the column contents, push
the contents down, or cancel the transform.
FIGURE 14–34
The Output Columns
Are Not Empty Dialog Box
FIGURE 14–35
Example of Effects Coded
Dummy Variable Values
The data in column 3 is the
input data and the data in
columns 4 and 5 are the
output data columns.
Performing a Regression Using Dummy Variables
The equation used to evaluate the effect of a condition on the regression
model constant is:
y = b0 + b1 x + b2 d1 + b3 d2 + ... + bk–1 dk–1
Lagged variables are commonly used to create time series models, when
the effect of an independent variable on the dependent variable
corresponds more appropriately to the value of the dependent variable at
a later time.
6 Pick the column with the data you want to lag as the input column
by clicking it in the worksheet or selecting it from the Data for
Input drop-down list. The number or title of the selected column
appears in the highlighted input row of the Selected Columns list,
and you are prompted for an output column.
7 Pick the column where you want the lagged variables to appear as
the output column by clicking it in the worksheet or selecting it
from the Data for Output drop-down list. The number or title of
FIGURE 14–36
The Pick Columns for
Lagged Transform Dialog
Box
& If you specify an output column that contains data, a dialog box
appears asking you if you want to erase the column contents, push
the contents down, or cancel the transform.
FIGURE 14–37
The Output Columns
Are Not Empty Dialog Box
FIGURE 14–38
Example of Lagged
Variable Values
The data in column 2 is the
input data and the data in
column 3 is the output data.
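A lagged column is a copy of the input column shifted by a number of rows. The sketch below shifts values down by a given number of rows and fills the vacated rows at the top with a missing marker. The direction of the shift, the `None` marker, and the function name are assumptions for illustration; compare the result with the figure above before relying on it.

```python
def lag(column, periods=1, missing=None):
    """Shift a column down by `periods` rows, padding the top with `missing`."""
    if periods < 0:
        raise ValueError("periods must be zero or positive")
    kept = max(len(column) - periods, 0)
    padding = [missing] * min(periods, len(column))
    return padding + list(column[:kept])


series = [10, 20, 30, 40]
print(lag(series, periods=1))
print(lag(series, periods=2))
```

The output column always has the same number of rows as the input, so the lagged variable stays aligned with the dependent variable.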
You can isolate specified groups of data using both numeric and text
filters. The filter transform operates by selecting only the rows that
correspond to specified numbers or labels in a key column, then placing
these rows and the corresponding data in new columns.
3 Pick the column where you want the results of the key column to
appear as the output column by clicking it in the worksheet or
selecting it from the Data for Output drop-down list. The number
or title of the selected column appears in the highlighted output
row, and you are prompted for an input column.
FIGURE 14–39
The Pick Columns for
Filter Transform Dialog Box
6 Select Finish to run the Filter transform. The Set Filter dialog box
appears.
FIGURE 14–40
The Set Filter Dialog Box
8 Select Text Filter to sort the key column data according to a text
label in the key column. Enter the string exactly as it appears in
the worksheet in the Key Label box and select OK when you have
specified the appropriate filter.
& If you specify an output column that contains data, a dialog box
appears asking you if you want to erase the column contents, push
the contents down, or cancel the transform.
FIGURE 14–41
The Output Columns
Are Not Empty Dialog Box
FIGURE 14–42
Example of the Filter
Transform Using
a Numeric Filter
The data in columns 1
through 3 were filtered to
include a range of 1 through
3, using column 1 as the key
column, and placed in
columns 4 through 6.
FIGURE 14–43
Example of the
Filter Transform
Using the Text Filter
The data in columns 1
through 3 was filtered for
the label “Site 3,” using
column 3 as the key column,
and placed in columns 4
through 6.
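The two filter examples above amount to keeping the rows whose key-column value passes a test and copying those whole rows elsewhere. A minimal sketch of that logic follows; the function name and the sample rows are hypothetical and are not the data shown in the figures.

```python
def filter_rows(rows, key_index, keep):
    """Return the rows whose value in the key column satisfies `keep`."""
    return [row for row in rows if keep(row[key_index])]


# Hypothetical worksheet: each tuple is one row of columns 1 through 3.
rows = [
    (1, 2.5, "Site 1"),
    (2, 3.1, "Site 3"),
    (5, 4.0, "Site 3"),
    (3, 1.2, "Site 2"),
]

# Numeric filter: key column 1 (index 0), keep values from 1 through 3.
numeric_result = filter_rows(rows, 0, lambda value: 1 <= value <= 3)

# Text filter: key column 3 (index 2), keep the exact label "Site 3".
text_result = filter_rows(rows, 2, lambda value: value == "Site 3")

print(numeric_result)
print(text_result)
```

The text comparison is exact, which mirrors the instruction to enter the string exactly as it appears in the worksheet.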
& Input columns are not selected for the random number transform.
2 Pick the column where you want the random numbers to appear as
the output column by clicking it in the worksheet or selecting it
from the Data for Output drop-down list. The number or title of
the selected column appears in the highlighted output row of the
Selected Columns list.
FIGURE 14–44
The Pick Columns
for Uniform Random
Transform Dialog Box
6 Enter the seed for the random generator. This is the number used
to generate the random numbers. Select Random from the drop-
down list to use a random seed number.
& If you specify an output column that contains data, a dialog box
appears asking you if you want to erase the column contents, push
the contents down, or cancel the transform.
FIGURE 14–46
The Output Columns
Are Not Empty Dialog Box
Normally Distributed Random Numbers
To generate random data that follows a normal “bell” shaped
distribution curve:
& Input columns are not selected for the random number transform.
FIGURE 14–47
The Pick Columns
for Normal Random
Transform Dialog Box
FIGURE 14–48
The Normal Random
Number Generator Dialog
Box
5 Enter the mean used for the numbers. This is the “middle” or
“top” of the bell curve.
6 Enter the standard deviation for the data. The size of this value
determines the amount of variation about the mean of the data. A
relatively large standard deviation distributes data as a low, flat bell.
A relatively small standard deviation creates a tall, skinny bell.
& If you specify an output column that contains data, a dialog box
appears asking you if you want to erase the column contents, push
the contents down, or cancel the transform.
FIGURE 14–49
The Output Columns
Are Not Empty Dialog Box
FIGURE 14–50
Example of Values
Generated by The Random
Numbers Transform
Column 1 contains uniformly
randomly distributed
numbers. Column 2 contains
random numbers with
normally distributed values.
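The role of the seed, the mean, and the standard deviation can be illustrated with Python's standard `random` module. This sketch is an analogy only: the generator behind SigmaStat's transform is not described here, so the same seed will not reproduce SigmaStat's numbers. The function names are hypothetical.

```python
import random


def uniform_column(count, low, high, seed=None):
    """Generate `count` uniformly distributed values between low and high."""
    generator = random.Random(seed)  # seed=None picks a random seed
    return [generator.uniform(low, high) for _ in range(count)]


def normal_column(count, mean, sd, seed=None):
    """Generate `count` normally distributed values with the given mean and SD."""
    generator = random.Random(seed)
    return [generator.gauss(mean, sd) for _ in range(count)]


# The same seed always reproduces the same column.
first = uniform_column(5, 0.0, 1.0, seed=42)
second = uniform_column(5, 0.0, 1.0, seed=42)
print(first == second)

bell = normal_column(1000, mean=50.0, sd=5.0, seed=7)
print(sum(bell) / len(bell))  # close to the requested mean
```

Using a fixed seed is useful when a simulated data set needs to be regenerated exactly; a random seed gives a different column each time.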
& If you select columns before you choose the missing values
transform, the selected columns are assigned as the input and output
columns in the order they were selected in the worksheet.
2 Pick the columns with the strings you want to convert to missing
values as the input column by clicking it in the worksheet or
selecting it from the Data for Input drop-down list; then the
corresponding output column. You must pick an output column
for every input column you select. You can pick as many input
columns as desired.
FIGURE 14–51
The Pick Columns
for Missing Value
Transform Dialog Box
5 Specify the string to replace with missing value symbols. Enter the
string exactly as it appears in the worksheet, or select the string
from the drop-down list.
FIGURE 14–52
The Missing Value
Transform Dialog Box
& If you specify an output column that contains data, a dialog box
appears asking you if you want to erase the column contents, push
the contents down, or cancel the transform.
FIGURE 14–53
The Output Columns
Are Not Empty Dialog Box
FIGURE 14–54
Example of the
Missing Values
Transform Converting
Text Strings to
Missing Values
The string “N/A” in
columns 1 through 3
was converted to “--”
symbols in columns
4 through 6.
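The conversion shown in the figure can be sketched as an exact string replacement applied to each column, with the result written to a separate output column. The function name is hypothetical and the sample data is invented; the "--" marker is the missing value symbol mentioned in the caption above.

```python
MISSING = "--"  # missing value symbol, as shown in the figure caption


def to_missing(column, target):
    """Replace cells that exactly match `target` with the missing symbol."""
    return [MISSING if cell == target else cell for cell in column]


# Hypothetical input columns; each gets its own output column.
inputs = [
    [1.2, "N/A", 3.4],
    ["N/A", 5.6, "n/a"],
]
outputs = [to_missing(column, "N/A") for column in inputs]
print(outputs)
```

Because the match is exact, the lowercase "n/a" in the second column is left alone, which is why the string must be entered exactly as it appears in the worksheet.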
User-Defined Transforms
2 Type the transform instructions into the edit box. You can enter
up to 32K worth of text.
FIGURE 14–55
The User-Defined
Transform Dialog Box
A Glossary
Advisor The SigmaStat Advisor is designed to help you determine the appropriate SigmaStat test to
use to analyze your data. For more information, see the USING THE ADVISOR WIZARD chapter.
Alpha Value Alpha (α) is the acceptable probability of incorrectly rejecting the null hypothesis.
ANOVA on Ranks Also known as the Kruskal-Wallis analysis of variance on ranks. This
nonparametric test compares several different experimental groups that receive different treatments.
Arcsin Square Root Transform This transform is used to normalize percentage data that is linearly
distributed before performing a statistical procedure, by computing arcsin(√x).
ASCII File See text file. (ASCII stands for American Standard Code for Information Interchange.)
AUTOEXEC.BAT A DOS file that automatically executes a series of commands when DOS is
booted.
Axis In a Cartesian graph, an axis indicates the direction and range of X, Y, or Z values. In SigmaStat,
axes define the origin and scaling of a plot, and include tick and label definitions.
Backward Stepwise Regression One of two stepwise regression methods for selecting independent
variables. In backward stepwise regression, all variables are entered into the equation. The
independent variable that contributes the least to the prediction is removed, followed by the next least,
and so on.
Bar Chart A plot which graphs data as vertical or horizontal bars with bar lengths equal to the data
values.
Base (of an exponent) The number that is raised to the exponential power (for example, 10 or e).
Block A selected, rectangular region of worksheet cells. Blocks can be copied, deleted, pasted,
transposed, sorted, printed, and exported.
Box Plot A plot type that displays the 10th, 25th, 50th, 75th, and 90th percentiles as lines on a bar
centered about the mean, and the 5th and 95th percentiles as error bars. The mean line and data
points beyond the 5th and 95th percentiles can also be displayed.
Cell (worksheet) A location on the worksheet that holds a single data value or label, described by its
column and row number.
Center This transform is used to subtract the mean of a column from each of the values in that
column and place the results in a specified output column.
Chi-Square The Chi-Square statistic summarizes the difference between the expected and the
observed frequencies by summing the squared differences and dividing by the expected frequencies:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
It can be calculated wherever you have a set of observed values and a set of corresponding expected
values.
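As a worked illustration of the Chi-Square statistic, the following minimal Python sketch sums (O - E)² / E over paired observed and expected frequencies. The function name `chi_square` and the sample counts are hypothetical and chosen only for this example.

```python
def chi_square(observed, expected):
    """Sum (O - E)^2 / E over paired observed and expected frequencies."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts: three categories, each with an expected frequency of 20.
observed = [18, 22, 20]
expected = [20, 20, 20]
print(chi_square(observed, expected))
```

The first two categories each contribute 4/20 and the third contributes nothing, so the statistic here is 0.4.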
Click To press and release a mouse button, usually to select a menu or dialog box option, an item on
a list, a block of text, etc.
Clipboard The Windows data buffer where cut or copied data and text are stored. Press Ctrl+V or
use the Edit menu Paste command to place Clipboard contents in the worksheet or on the page. Note
that data and text are stored in the same Clipboard, so cutting additional data, text, or objects
overwrites current Clipboard contents. Cleared (deleted) data or text bypasses the Clipboard and
leaves the current contents intact.
Coefficient A real number that multiplies a variable in an algebraic expression. See also, Correlation
Coefficient and R.
Column The SigmaStat worksheet consists of columns and rows of cells. A column is a vertical
collection of cells which generally holds a range of numbers to be analyzed as a set.
Column Titles These are used to identify groups of data. The column title is displayed above the
data column in the worksheet.
Column Statistics A collection of statistics computed for each column. These are displayed on the
bottom half of the worksheet by choosing the View menu Column Statistics command.
Common Log Scale An axis type that plots data along a logarithmic scale with base 10. See also,
Natural Log Scale.
Confidence Interval Also known as confidence level, a specified confidence interval can be any
value from 1% to 99%; the suggested confidence level for both intervals is 95%. This can also be
described as P < α (alpha), where α is the acceptable probability of incorrectly concluding that the
coefficient is different than zero, and the confidence interval is 100(1 - α).
Confidence Interval for the Mean The range in which the true population mean will fall for a
percentage of all possible samples of a certain size drawn from the population.
CONFIG.SYS A DOS file that installs device drivers and sets system parameters when you turn on or
restart your computer.
Contingency Table A method of displaying the observed numbers of different groups that fall into
different categories. These tables are used to see if there is a difference between the expected and
observed distributions of the groups in the categories.
A contingency table associates the groups and categories with the rows and columns, and places the
number of observations for each combination in the cells. For more information about how to create
a contingency table, see page 69.
Constant Variance Also known as homoscedasticity, this is the assumption that the variance of the
dependent variable in the source population is constant regardless of the value of the independent
variables.
Copy To place selected worksheet data or graphic objects in the Windows Clipboard without
removing the data or objects, press Ctrl+C or use the Edit menu Copy command. The Clipboard
contents can be placed elsewhere on the worksheet or page by pressing Ctrl+V or selecting the Edit
menu Paste command.
Correlation Coefficient (R) R represents the measure of the relationship between two variables.
Specifically, it is the covariance divided by the product of the sample standard deviations. This
number varies between -1 and +1.
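The definition of R as the covariance divided by the product of the sample standard deviations can be written out directly. The sketch below is illustrative only; the function name `pearson_r` is an assumption, and both the covariance and the standard deviations use the n - 1 divisor.

```python
import math

def pearson_r(x, y):
    """Sample covariance divided by the product of the sample standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two columns of equal length with at least 2 rows")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (sd_x * sd_y)

# Hypothetical data: y rises exactly in step with x, so R is +1 (up to rounding).
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```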
Cut To remove selected data, graphs, or text and place them in the Windows Clipboard. Press
Ctrl+X or use the Edit menu Cut command to cut data, graphs, or text. Cut displaces any current
Clipboard contents. Only the last cut item can be pasted. Note that data, graphs, and text use the
same Clipboard.
The Clipboard contents can be placed at any selected worksheet or page location by pressing Ctrl+V or
choosing the Edit menu Paste command. Clipboard contents can also be pasted into other Windows
applications. See also, Paste.
Data Set A column, or set of worksheet columns, that have been chosen for analysis.
Degrees of Freedom Represents a measure of the sample size, which affects the power of a test.
Delimiter A symbol or character used to separate data fields within a data file format; for example,
white space, commas, semicolons, or colons.
Descriptive Statistics SigmaStat can describe your data by computing basic statistics, such as the
mean, median, standard deviation, percentiles, etc. that summarize the observed data.
Dialog Boxes Boxes of commands and options that appear on the screen. Use dialog box options to
view and change test and report settings.
DIF files A text-based data file format, recognized by SigmaStat, developed by Software Arts™ which
is used widely to exchange data between programs.
Drag Move the mouse while holding down the left mouse button.
Dummy Variables Also known as indicator variables, they can be used to determine if sets of data
share the same constant (intercept) value by determining if the constant is affected by conditional
changes specified by the dummy variables. Dummy variables can be defined with either effects coding
or reference coding.
Durbin-Watson Statistic This is a measure of serial correlation between the residuals. If the residuals
are not correlated, the Durbin-Watson statistic will be 2. Small values indicate positive serial
correlation among the residuals; large values indicate negative serial correlation.
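The glossary entry does not give a formula. The sketch below uses the standard textbook definition of the Durbin-Watson statistic: the sum of squared differences between successive residuals, divided by the sum of squared residuals. The function name `durbin_watson` is hypothetical, and this is not necessarily how SigmaStat computes the value internally.

```python
def durbin_watson(residuals):
    """Standard Durbin-Watson statistic for an ordered list of residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Residuals that alternate in sign are negatively correlated: the value exceeds 2.
print(durbin_watson([1, -1, 1, -1]))
# Residuals that repeat are positively correlated: the value falls toward 0.
print(durbin_watson([1, 1, 1, 1]))
```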
Equal Variance Test This test is used to determine if the variances are as close together as the samples.
The test is computed with a Levene Median test.
Exponential Transform This transform is used to calculate the values of the number e raised to the
values in a specified column.
Export Data Save worksheet data from SigmaStat to a file, for use with other programs. Choose the
File menu Export Data command to export files in Text, DIF, Excel, or other file formats. See also,
Text files and DIF.
Fills Fills include patterns of lines and colors that fill bar chart bars, pie chart slices, 3D graph mesh
grids, 3D bar fills, and drawn objects. Fill patterns affect the color of bar, box, and slice edges and
mesh grid lines. Fills and edges are specified using the Fills settings of the Graph Properties dialog box.
Filter Transform This transform is used to isolate specified groups of data using both numeric and
text filters. It operates by selecting only rows that correspond to specified numbers or labels in a key
column, then placing these rows and the corresponding data in new columns.
Fisher Exact Test This test determines the exact two-tailed probability of observing a specific 2 × 2
contingency table (or a more extreme pattern).
Font A style or type of character. TrueType fonts are available with the Windows system. Other
fonts, such as PostScript and Hewlett Packard fonts, are only available if the printer drivers are
installed.
Forward Stepwise Regression One of two stepwise regression methods for selecting independent
variables. In forward stepwise regression, the independent variable that produces the best prediction of
the dependent variable is entered into the equation first, the independent variable that adds the next
largest amount of information is entered second, and so on.
Gaussian Distribution A continuous probability distribution defined by two parameters, mean and
variance. Also called the normal distribution. You can use the gaussian transform function to generate
normally distributed data.
Help System A context-sensitive system of indexed screens providing on-line information about
SigmaStat commands and operations. Press F1 to view the Help Contents, or choose one of the Help
menu commands to get additional help information.
Related topics are linked through highlighted words on the screen; selecting these brings up the entry
for that topic.
Holm-Sidak Test This can be used for both pairwise comparisons and comparisons versus a control
group. It is more powerful than the Tukey and Bonferroni tests and, consequently, it is able to detect
differences that these other tests do not. It is recommended as the first-line procedure for pairwise
comparison testing.
Hotkey A quick method of selecting menu commands and dialog box options. A letter in the
command or option appears underlined; pressing that letter selects the command or option.
Import Data Transfer data from a file to the SigmaStat worksheet for testing or other operations.
SigmaStat recognizes text, DIF, Lotus 1-2-3, Excel, SigmaPlot, and other file formats.
Choose the File menu Import Data... command to select files to import. See also, Text Files, DIF, and
Lotus .WKS.
Indexed Data This data format places the group names in one column (a factor column), and the
corresponding data for each group in another column.
Interactions Transform This transform is used when you want to introduce an interaction variable
into a multiple linear regression model, i.e., a variable that takes into account the interaction between
two independent variables. The interaction transform computes the product of the values in two data
columns and places the results in an output column.
Insert A data entry mode where existing data is moved aside to make room for entered data. When
typing text labels on the page, you are always in insert mode.
When entering text, press the Insert key to toggle between insert and overwrite modes. In insert
mode, characters are moved to the right to make room for the new characters. See also Overwrite.
Kruskal-Wallis ANOVA on Ranks This is a nonparametric test that compares several different
experimental groups that receive different treatments. All values are ranked without regard to which
group they are in, then the rank sums for each group are compared.
Kurtosis A measure of how peaked or flat the distribution of observed values is, compared to a
normal distribution. A normal distribution has kurtosis equal to zero.
Label Any text string, including graph and axis titles, and text entered using the Tools menu Text
command. Any text labels on the page can be modified by double-clicking them or using the Edit
Text dialog box.
Lagged Variables The lagged variables transform lags the observations in one column by one row.
These variables are commonly used to create time series models, when the effect of an independent
variable on the dependent variable corresponds more appropriately to the value of the dependent
variable at a later time.
Legend An explanation of the symbols on a graph. Legends are edited by double-clicking them, or
choosing the Tools menu Text command, clicking the legend on the page, then selecting the
Symbols... button to specify the placement of symbols, and legend style to use for the legend.
Levene Median Test This test is a Levene Mean test using the median instead of the mean. In
SigmaStat, it is used to test the equivalency of variances.
Line Graph A plot type in which data points are connected by lines. Line graphs and trajectory
graphs are 2D Cartesian graphs or 3D Cartesian graphs using a line plot type.
Linear Axis Scale An axis scale in which values along the axis increment arithmetically.
Linear Regression A linear regression finds the straight line that most closely describes, or predicts,
the value of the dependent variable, given the observed value of the independent variable. See also,
Regression.
Link Use the Edit menu Paste Special command to place a linked object on the graph page. Linking
the object appears to place a copy of the object on the page, but actually only places a reference to the
original object file, and modifies the object every time the original file is changed.
Ln Transform This function returns a value or range of values consisting of the natural logarithm of
each number in the specified range.
Log Transform This function returns a value or range of values consisting of the base 10 logarithm of
each number in the specified range.
Logarithmic Scale A scale that represents numbers as a power of the base. See also, Common Log
and Natural Log.
$\text{Logit} = \ln\left(\frac{y}{100 - y}\right)$
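The logit formula takes y as a percentage, which is why the denominator is 100 - y. A minimal Python sketch under that reading follows; the function name `logit_percent` is hypothetical.

```python
import math

def logit_percent(y):
    """Return ln(y / (100 - y)) for a percentage y strictly between 0 and 100."""
    if not 0.0 < y < 100.0:
        raise ValueError("y must be strictly between 0 and 100")
    return math.log(y / (100.0 - y))

# A value of 50% maps to a logit of zero; values above 50% give positive logits.
print(logit_percent(50.0))
print(logit_percent(75.0))
```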
Lotus .WKS, .WK? Files Files created in Lotus 1-2-3 are recognized by SigmaStat and can be
imported. Lotus files have the file name extension .WKS or .WK?. Note that Lotus functions are not
imported.
Mann-Whitney Rank Sum Test This nonparametric test is used to test the null hypothesis that two
samples were not drawn from populations with different medians. A rank sum test ranks all the
observations from smallest to largest without regard to which group each observation comes from.
The ranks for each group are summed and the rank sums compared.
McNemar’s Test McNemar’s test is an analysis of contingency tables that have repeated observations
of the same individuals. Unlike a regular analysis of a contingency table, it ignores individuals who
responded the same way to the same treatments, and calculates the expected frequencies using the
remaining cells as the average number of individuals who responded differently to the treatments.
Mean The average value for a column. If the observations are normally distributed, the mean is the
center of the distribution.
Median The “middle” observation, computed by ordering all observations from smallest to largest,
then selecting the largest value of the smaller half of the observations.
Menu Bar A list of menus appearing at the top of the SigmaStat screen. These menus can be selected
with a mouse, or by pressing Alt and the first letter of the menu name. When one menu appears, the
adjacent menu can be pulled down by pressing the left or right arrow key.
Missing Values The number of missing observations in a worksheet column. A missing value is
different than a blank because it represents the attempt to record a value.
Multicollinearity Multicollinearity occurs when changing the parameters for two independent
variables has a similar effect on the fit; when this is serious, the estimates of the regression coefficients
become unreliable. This only applies to regressions involving multiple independent variables (not for
simple linear regression or polynomial regression).
Multiple Comparisons You can use multiple comparison procedures to isolate the differences
between groups, when running an ANOVA. There are two classes of multiple comparison procedures:
all pairwise comparisons, where every pair of groups are compared; and multiple comparisons versus a
control, where all treatment groups are compared with a single control group. For more information,
see Multiple Comparison Options on page 295.
Multiple Linear Regression Multiple linear regression is used when you want to predict the value of
one variable from the values of two or more other variables, by fitting a plane (or hyperplane) to the
data, and you know there are two or more independent variables, and want to find a model with these
independent variables.
Multiple Logistic Regression Multiple Logistic Regression is used when you want to predict a
qualitative dependent variable, such as the presence or absence of disease, from observations of one or
more independent variables, by fitting a logistic function to the data. For more information, see page
527.
Nonlinear Regression A nonlinear regression is used when the data follow a curve that is a nonlinear
function. Nonlinear regression solves the regression problem directly without transforming the data
and performing linear regression techniques. Nonlinear regression uses the Marquardt-Levenberg
algorithm to find the coefficients (parameters) of the independent variable(s) that give the “best fit”
between the equation and the data.
Nonparametric Tests These tests do not require that the data is normally distributed. They perform
a comparison on ranks of the observations.
Normality This refers to the assumption (contained within parametric tests) that a population follows
a standard, “bell” shaped Gaussian distribution, also known as a “normal” distribution.
Notebook File Notebook files are compound files that contain worksheets and graph pages.
Notebook files are provided as a means for automatic file organization, enabling you to keep separate
notebooks for separate groups of data.
Novice Prompting Messages alerting you to certain situations or which double check some choices
(for example, telling you that data contains missing values or asking for confirmation before clearing
data).
Observed Proportions This data format consists of the sample sizes of two groups and the
proportion of each group that falls into a single category. This data is used to see if there is a difference
between the proportion of two different groups that fall into the category.
OLE2 Objects pasted from the Clipboard to a graph page can be linked, embedded, or placed on the
page as a generic object without any kind of file reference. Linked and embedded objects use OLE2,
Object Linking and Embedding version 2. To learn about the differences between linking and
embedding, see PASTING GRAPHS AND OTHER OBJECTS page 190.
One Way ANOVA A one way ANOVA is used when you want to see if two or more different
experimental groups are affected by two different treatments and your samples are drawn from
normally distributed populations with equal variances.
One Way Repeated Measures ANOVA A one way RM ANOVA tests for differences in the effect of a
series of experimental interventions on the same group of subjects by examining the changes in each
individual. Examining the differences between the values rather than the absolute values removes any
differences due to individual responses, producing a more sensitive (or more powerful) test.
Open Load a file into SigmaStat, either a notebook file, worksheet, graph, report, or another
program’s file.
Overwrite A data or text entry mode in which newly typed characters replace characters already on
the screen. See also, Insert.
P Value This value is the probability of incorrectly rejecting the null hypothesis. A smaller P
value means a lower risk of incorrectly rejecting it; a larger P value means a higher risk.
Page Where reports and graphs are displayed and printed. The page displays the current report(s),
graph(s), or other objects as they appear when printed.
Paired t-test The paired t-test examines the changes that occur before and after a single experimental
intervention on the same individuals to determine whether or not the treatment had a significant
effect. Examining the changes rather than the values observed before and after the intervention
removes the differences due to individual responses, producing a more sensitive, or powerful, test.
Parametric Tests These tests are used to compare samples from normal populations, and are based on
estimates of the mean and standard deviation parameters of a normally distributed population.
Paste Place the contents of the Clipboard at the selected location. On the worksheet, the upper left
corner of the Clipboard data block appears at the highlighted cell. On the Page, the Clipboard
contents are offset from the original object’s position.
Press Ctrl+V or choose the Edit menu Paste command to paste data or graphics.
Paste Special Place the contents of the Clipboard as an object of specified file type, as an embedded
object, or as a linked file object.
Embedding or linking text is especially useful for placing equations on a page, enabling you to insert
equations created with the Microsoft Word Equation Editor, and edit them at a later date. For more
information on using Microsoft Word and the Equation Editor, refer to the Microsoft Word User’s
Guide. To learn about pasting text on a page, see PASTING GRAPHS AND OTHER OBJECTS page 190.
Pearson Product Moment Correlation The Pearson product moment correlation is used when you
want to measure the strength of association between pairs of variables without regard to which variable
is dependent or independent; the relationship, if any, between the variables is a straight line; and the
residuals (distances of the data points from the regression line) are normally distributed with constant
variance.
Percentiles The two percentile points which define the upper and lower ends (tails) of the data, as
specified by the Descriptive Statistics options.
Point (pt) A unit of measure used in typesetting. Seventy-two points equal one inch.
Pointer The tool controlled by the mouse used to choose commands, select dialog box options, select
data on the worksheet, and select and modify page objects. Sometimes called the cursor.
The pointer is usually arrow-shaped. On the page, the shape of the pointer changes according to its
current function.
Polynomial Regression A polynomial regression is used when you want to predict a trend in the
data, or predict the value of one variable from the value of another variable, by fitting a curve through
the data that does not follow a straight line, and know there is only one independent variable.
Power The power, or sensitivity, of a test is the probability that the test will detect a difference or
effect if there really is a difference or effect. The closer the power is to 1, the more sensitive the test.
Predicted Values The predicted values for the regression are the values computed for the dependent
variable by the regression equation for each observed value of the independent variables.
Preferences A set of options used to customize the appearance of SigmaStat worksheets and graph
pages and to set some defaults. Use the File menu Preferences... command to access the preferences
options.
PRESS Prediction Error The PRESS statistic (Predicted Residual Error Sum of Squares) is a measure
of how well the regression equation fits the data. The PRESS statistic is computed by removing the ith
data point from the data set, computing the regression equation without this data point, predicting
that point based on the regression equation, then computing the residual.
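The leave-one-out procedure described for PRESS can be sketched for the simplest case, a straight-line fit with one independent variable. The names `fit_line` and `press` are hypothetical. The sketch assumes that the squared leave-one-out residuals are summed to give the statistic (as the name Predicted Residual Error Sum of Squares implies), and it uses an ordinary least squares line fit rather than SigmaStat's own regression code.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def press(x, y):
    """Sum of squared residuals, each computed with its own point left out of the fit."""
    total = 0.0
    for i in range(len(x)):
        # Remove the ith data point and refit the regression equation.
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        intercept, slope = fit_line(x_rest, y_rest)
        # Predict the removed point and accumulate its squared residual.
        residual = y[i] - (intercept + slope * x[i])
        total += residual ** 2
    return total

# Hypothetical data passed as lists; points that lie exactly on a line give PRESS = 0.
print(press([1, 2, 3, 4], [3, 5, 7, 9]))
print(press([0, 1, 2], [0, 1, 0]))
```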
Probability Scale An axis scale in which a sigmoidally shaped curve identical to the Gaussian
cumulative distribution function appears as a straight line.
Probit Scale An axis scale identical to the probability scale, except that it is expressed in terms of
standard normal deviates increased by five. A probability of 0.5 (50%) corresponds to 0 standard
normal deviates, or five probits. One standard normal deviate on either side of zero encompasses
68.2% of the area under the normal curve. A probit of 6 (1+5) corresponds to the 84.1% probability
and a probit of 4 (-1+5) corresponds to the 15.9% probability (68.2% = 84.1% - 15.9%).
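The probit arithmetic above (standard normal deviate plus five) can be checked with Python's standard library. This is a sketch only; the function name `probit` is hypothetical, and `statistics.NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist

def probit(probability):
    """Standard normal deviate for the given probability, increased by five."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return NormalDist().inv_cdf(probability) + 5.0

# Probabilities from the entry above: 50%, 84.1%, and 15.9%.
for p in (0.5, 0.841, 0.159):
    print(p, round(probit(p), 2))
```

With the rounded probabilities used in the entry, the results are close to 5, 6, and 4 probits respectively.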
Quick Transforms Seven commonly-used functions used to linearize observations or stabilize the
variance, so that the resulting variables meet the requirements of statistical methods.
R Value The correlation coefficient, or square root of R². R² is sometimes called the coefficient of
determination and is a measure of the closeness of fit of a scatter graph to its regression line where R² =
1 is a perfect fit. See also, Correlation Coefficient and Regression.
Random Numbers A series of normally or uniformly distributed numbers created by the two random
number generating functions, or transforms, within SigmaStat.
Rank Transform This function is used to assign rank values to all observations in a column from
smallest to largest. Ties are assigned the average of the ranks that would be assigned if there were no
tied values. The rank transform assigns integer rank values to data.
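The tie-handling rule (tied values receive the average of the ranks they would otherwise occupy) is easiest to see in code. The following is an illustrative sketch with a hypothetical function name `rank_transform`; it is not SigmaStat's implementation.

```python
def rank_transform(values):
    """Rank values from smallest to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend the run [pos, end] over all positions holding the same value.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Ranks are 1-based, so the average of ranks pos+1 .. end+1 is (pos+end)/2 + 1.
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks

# The two tied values of 20 would occupy ranks 2 and 3, so each receives 2.5.
print(rank_transform([10, 20, 20, 30]))
```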
Raw Data This data format places the data for each group to be compared or analyzed in separate
columns.
Raw Residuals The raw residuals are the differences between the predicted and observed values of the
dependent variables.
Reciprocal Transform This function calculates the reciprocal, $\frac{1}{x}$, of the values in a specified column.
Regression These procedures use the values of one or more independent variables to predict the value
of a dependent variable. Regression assumes an association between the independent and dependent
variables that, when graphed on a Cartesian coordinate system, produces a straight line, plane, or
curve. Regression finds the equation that most closely describes the actual data.
Use the Statistics menu Regressions... command to perform regressions. See also Confidence Interval
and Nonlinear Regression.
Regression Coefficients These are the values of the constant and coefficients of the independent
variables for the regression model, as computed by the regression procedure. See also, Regression.
Repeated Measures ANOVA on Ranks The Friedman repeated measures analysis of variance on
ranks is a nonparametric test that compares effects of a series of different experimental treatments on a
single group. Each subject’s responses are ranked from smallest to largest without regard to other
subjects, then the rank sums for the subjects are compared.
Residuals These are the differences between the predicted and observed values of the dependent
variables. There are 4 types of residuals: Raw Residuals, Standardized Residuals, Studentized
Residuals, and Studentized Deleted Residuals.
Sample Size The sample size is the number of observations, both in each column or group, and taken
as a whole (all groups or treatments). All else being equal, the larger the sample size, the greater the
power of the test.
Save To write all data and graph settings to a file. Use the File menu Save command to save your
work.
Your data, report, and graph are saved to the same notebook file that was previously opened; if you
began a new session, you are prompted for a path and file name. Transform (.XFM) and nonlinear
regression (.FIT) files are saved using buttons in the Transform or Nonlinear Regression dialog boxes.
Save As Write all data, report, and graph settings to a new file. Use the File menu Save As...
command to specify a new file name and directory. Transform (.XFM) and nonlinear regression (.FIT)
files are saved using buttons in the Math Transform or Regression dialog boxes.
Scatter Graphs A graph type where a symbol represents each data point. Scatter plots are 2D or 3D
Cartesian graphs using a scatter plot type.
Scientific Notation A form for expressing numbers using the letter e to represent the power of 10.
For example, the scientific notation for 10.0 is 1.0e+001.
Scroll Box A dialog box option containing a list of items. You can scroll up or down to reveal more
selections. Selected scroll boxes have a scroll bar appearing along the right side. You can use the
mouse to drag the scroll bar up or down, or click the up and down arrow buttons.
Section Sections are a subdivision of the notebook file which is a compound file used to save all data
and graphs in SigmaStat. Notebook sections are individual “folders” that contain notebook items.
Notebook items are worksheets and graph pages you have created using SigmaStat. Each notebook
section may contain only one worksheet, but can contain up to ten graph pages. Within sections,
notebook items are indicated as worksheets or graph pages by icons that appear next to item names.
In addition, reports and their associated graphs, are saved to test sections “nested” within the section
containing the corresponding worksheet data. For more information, see NOTEBOOK FILE
STRUCTURE on page 17.
Select (Object) To choose an object on the page in order to perform an operation (such as move or
delete) on it. Graphs and text labels can be selected. Items can only be selected when the Tools menu
Select Object command is checked.
To select an object, click while the pointer is over the object. Selected objects are surrounded by square
handles or a dotted line. You can select multiple objects by dragging a dotted-line box completely
around the objects, or by holding down the Shift key while selecting individual objects.
Signed Rank Test The Wilcoxon signed rank test tests the null hypothesis that two samples were
drawn from populations with the same medians. A signed rank test is a nonparametric procedure that
ranks all the observed treatment differences from smallest to largest without regard to sign (based on
their absolute value), then attaches the sign of each difference to the ranks.
Skewness A measure of how symmetrically the observed values are distributed about the mean. A
normal distribution has a skewness equal to zero.
Sort To arrange items in an ascending or descending order. Selected blocks of worksheet data can be
sorted using the Edit menu Sort Selection... command. If you sort more than one column, all
columns are sorted according to the selected key column.
Spearman Rank Order Correlation This correlation is used when you want to measure the strength
of association between pairs of variables without specifying which variable is dependent or
independent, and the residuals (distances of the data points from the regression line) are not
normally distributed with constant variance.
Square Root Transform This function, $\sqrt{x}$, is used to calculate the square root of values in a
specified worksheet column.
Square Transform This function computes the squares, $x^2$, of the values in a specified worksheet
column.
Standard Deviation A measure of the spread of the data about the mean. The sample standard
deviation is the square root of the ratio of the sum of the squares of the residuals divided by the
number of data points, minus one.
$s = \left[ \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \right]^{1/2}$
Standard Error (of the Mean) The standard deviation of the mean, computed by dividing the
sample standard deviation by the square root of the sample size.
$\text{Std Err} = \frac{s}{\sqrt{n}}$
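The sample standard deviation and the standard error of the mean follow directly from their definitions: divide the sum of squared deviations from the mean by n - 1 and take the square root, then divide by the square root of n. The function names `sample_std` and `std_err` in this sketch are hypothetical.

```python
import math

def sample_std(values):
    """Square root of the sum of squared deviations from the mean over n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

def std_err(values):
    """Sample standard deviation divided by the square root of the sample size."""
    return sample_std(values) / math.sqrt(len(values))

# Hypothetical column of five observations.
column = [1, 2, 3, 4, 5]
print(sample_std(column), std_err(column))
```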
Standardize Transform The standardize transform, used before performing a statistical procedure,
subtracts the mean of a column from each value in that column, then divides the result by
the standard deviation, placing the results in a specified output column. By definition, standardized
data has a mean of zero and a standard deviation of one.
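Standardizing amounts to subtracting the column mean from each value and dividing by the standard deviation. The sketch below uses Python's `statistics` module and assumes the sample (n - 1) standard deviation; the function name `standardize` is hypothetical.

```python
import statistics

def standardize(values):
    """Return (x - mean) / standard deviation for each value in the column."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # Assumption: sample standard deviation (n - 1).
    if sd == 0:
        raise ValueError("cannot standardize a column with zero variance")
    return [(x - mean) / sd for x in values]

# Hypothetical column with mean 4 and standard deviation 2.
z = standardize([2, 4, 6])
print(z)
print(statistics.mean(z), statistics.stdev(z))
```

The output column has a mean of zero and a standard deviation of one, as the definition requires.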
Standardized Residuals The standardized residual is the residual divided by the standard error of the
estimate. The standard error of the residuals is essentially the standard deviation of the residuals, and
is a measure of variability around the regression line. See also, Residuals.
Statistical Summary Data This data format can be used to perform a t-test or one way ANOVA.
These statistics are in the form of the sample size, mean, and the standard deviation (or the standard
error of the mean) for each group.
Stepwise Regression A stepwise linear regression is used when you want to predict a trend in the data,
or predict the value of one variable from the values of one or more other variables, by fitting a line or
plane (or hyperplane) through the data, and when you do not know which independent variables
contribute to predicting the dependent variable. This procedure finds the model with suitable
independent variables by adding or removing independent variables from the equation.
Studentized Deleted Residuals Studentized deleted residuals are similar to the Studentized Residuals
except that the residual values are obtained by computing the regression equation without using the
data point in question. See also, Studentized Residuals.
Studentized Residuals Studentized residuals scale the standardized residuals by taking into account
the greater precision of the regression line near the middle of the data versus the extremes. The
Studentized residuals tend to be distributed according to the Student t distribution, so the t
distribution can be used to define “large” values of the Studentized residuals.
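The three kinds of scaled residuals described in the entries above can be compared side by side. The following Python sketch uses the usual textbook formulas for a simple linear regression (one independent variable); it is an assumption that these match the exact computations SigmaStat performs, and the function name, dictionary keys, and data are hypothetical.

```python
import math


def residual_diagnostics(x, y):
    """Raw, standardized, Studentized, and Studentized deleted residuals
    for a straight-line least squares fit of y on x."""
    n = len(x)
    p = 2  # fitted parameters: intercept and slope
    if n <= p + 1:
        raise ValueError("need more than three data points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    raw = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Standard error of the estimate: residual standard deviation.
    s = math.sqrt(sum(e * e for e in raw) / (n - p))
    rows = []
    for xi, e in zip(x, raw):
        # Leverage is smallest near the middle of the x data.
        h = 1.0 / n + (xi - x_bar) ** 2 / sxx
        standardized = e / s
        studentized = e / (s * math.sqrt(1.0 - h))
        # Equivalent to refitting the line without this data point.
        deleted = studentized * math.sqrt(
            (n - p - 1) / (n - p - studentized ** 2)
        )
        rows.append({
            "raw": e,
            "standardized": standardized,
            "studentized": studentized,
            "studentized_deleted": deleted,
        })
    return rows


# Hypothetical data with a suspiciously high last point.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 7.5]
rows = residual_diagnostics(x, y)
for row in rows:
    print({k: round(v, 3) for k, v in row.items()})
```

The deleted residual is computed here with a closed-form shortcut rather than by literally refitting the regression for each point; both approaches give the same value.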
Sum Refers to the sum of all observations. The mean equals the sum divided by the sample size.
Sum of Squares The sum of the squared observation values. In ANOVA and regression results, the
term refers to a sum of squared deviations from the mean.
Summary Table A summary table of basic statistics can be produced for all group comparison and
repeated measures tests. The summary table is displayed in the report if it was selected in the options
dialog box for the test.
Survival Analysis A statistical analysis that studies a variable measuring the time until some event occurs.
Symbol The figure (such as a circle or triangle) used to represent a data point in a line or scatter plot.
Plot symbols are modified using the Symbols settings of the Graph Properties dialog box. For more
information on symbol settings, see MODIFYING GRAPH ATTRIBUTES on page 184.
Tabulated Data Raw observation counts organized in a contingency table. This data format can be
used for the Chi-Square, McNemar’s, and Fisher exact tests. See also Raw Data.
t-test A parametric statistical test used to determine if there is a difference between two groups that is
greater than what can be attributed to random sampling variation. A t-test is based on estimates of the
mean and standard deviation parameters of the normally distributed populations from which the
samples were drawn. Also called Student's t-test. See also Paired t-test.
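To show how a t statistic can be obtained from statistical summary data (sample size, mean, and standard deviation for each group), here is a minimal Python sketch. It assumes the classical unpaired t-test with a pooled variance estimate; the function name and the numbers are hypothetical, and the P value lookup from the t distribution is left out.

```python
import math


def unpaired_t_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    df = n1 + n2 - 2
    if df < 1:
        raise ValueError("need at least three observations in total")
    # Pooled estimate of the common population variance.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Standard error of the difference between the two means.
    se_diff = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se_diff, df


# Hypothetical summary data for two groups of ten subjects each.
t, df = unpaired_t_from_summary(10, 5.0, 2.0, 10, 3.0, 2.0)
print(f"t = {t:.4f} with {df} degrees of freedom")
```

A large absolute value of t relative to the t distribution with the stated degrees of freedom indicates a difference greater than expected from random sampling variation.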
Text File A “plain text” file format widely used by word processing, desktop publishing, and
spreadsheet programs. SigmaStat can import and export text files.
Toolbar Toolbars are floating palettes containing buttons to execute many common File, Edit, View,
Format, Graph, and Statistics menu commands. These include running tests, the SigmaStat Advisor,
creating and editing graphs, and formatting and editing reports.
For more information on modifying the display and positioning of toolbars, and using the button
commands, see USING TOOLBARS on page 8.
See the Transforms and Nonlinear Regression reference for a complete description of transform
functions.
Transpose Switches the orientation of worksheet data so that columns become rows and rows become
columns. Use the Edit menu Transpose Paste command to paste Clipboard data with rows and
columns transposed.
Three Way ANOVA In a three way or three factor analysis of variance, there are three experimental
factors which are varied for each experimental group. A three factor design is used to test for
differences between samples grouped according to the levels of each factor, and for interactions
between the factors.
Transpose Paste Switches the orientation of worksheet data so that columns become rows and rows
become columns.
Use the Edit menu Transpose Paste command to paste Clipboard data with rows and columns
transposed.
Two Way ANOVA In a two way or two factor analysis of variance, there are two experimental factors
which are varied for each experimental group. A two factor design is used to test for differences
between samples grouped according to the levels of each factor, and for interactions between the
factors.
Two Way Repeated Measures ANOVA In a two way or two factor repeated measures analysis of
variance, there are two experimental factors which may affect each experimental treatment. Either or
both of these factors are repeated treatments of the same group of individuals. A two factor design
tests for differences between the different levels of each treatment, and for interactions between the
treatments.
User-Defined Transforms SigmaStat user-defined transforms are math functions and equations
which are applied to worksheet data. User-defined transforms provide extremely flexible data
manipulation, allowing powerful mathematical calculations to be performed on specific sets of data.
Yates Correction Factor The Yates correction is applied to 2 × 2 tables and other statistics where the
P value is computed from a chi-square distribution with one degree of freedom. Using the Yates
correction makes a test more conservative, i.e., it increases the P value and reduces the chance of a false
positive conclusion.
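The effect of the Yates correction on a 2 × 2 table can be seen in a short calculation. This is a minimal Python sketch using the standard textbook formula, not SigmaStat code; the function name and the cell counts are hypothetical. The P value uses the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square statistic and P value (1 degree of freedom) for the
    2 x 2 table [[a, b], [c, d]], optionally with the Yates correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column total must be positive")
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction: shrink the difference by n / 2, never below 0.
        diff = max(0.0, diff - n / 2.0)
    chi2 = n * diff ** 2 / denom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical observation counts.
chi2_plain, p_plain = chi_square_2x2(12, 8, 5, 15, yates=False)
chi2_yates, p_yates = chi_square_2x2(12, 8, 5, 15, yates=True)
print(f"uncorrected: chi2 = {chi2_plain:.4f}, P = {p_plain:.4f}")
print(f"Yates:       chi2 = {chi2_yates:.4f}, P = {p_yates:.4f}")
```

The corrected statistic is smaller and its P value larger, which is what makes the corrected test more conservative.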
z-test The z-test comparison of proportions is used to determine if the proportions of two groups
within one category or class are significantly different. It is used when there are two groups to
compare, the total sample size (number of observations) for each group is known, and the
proportion p of each group that falls within a single category is known.
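A z-test comparison of two proportions can be sketched from the sample size and proportion of each group. The following Python example uses the standard pooled-proportion formula without the Yates correction; it is an assumption that this matches the computation in the program, and the function name and the numbers are hypothetical.

```python
import math


def z_test_proportions(n1, p1, n2, p2):
    """z statistic and two-sided P value for the difference between
    two observed proportions, using a pooled proportion estimate."""
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    # Pooled proportion under the hypothesis of no difference.
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0:
        raise ValueError("pooled proportion must be strictly between 0 and 1")
    z = (p1 - p2) / se
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical groups: 30% of 100 subjects versus 45% of 100 subjects.
z, p_value = z_test_proportions(100, 0.30, 100, 0.45)
print(f"z = {z:.4f}, P = {p_value:.4f}")
```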
Zoom Enlarge or shrink the view of the current graph. Choose the View menu Zoom command to
change the zoom level. You can view a graph at 50%, 100%, 200%, 400%, or fit the page in the
current window.
Index

Symbols
.ASC files
  opening 23, 24
.CVS files
  opening 23, 24
.DBF files
  opening 23, 24
.DIF files
  importing 61
  opening 23, 24
.HTM files
  exporting 143
.JNB files
  saving 10, 19
.MOC files
  opening 23, 24
.PDF files
  exporting 143
.PRN files
  opening 23, 24
.SP5 files
  importing 61
  opening 23, 24
.SPG files
  importing 61
  opening 23, 24
.SPW files
  opening 23, 24
.TXT files
  opening 23, 24
.WK* files
  opening 23, 24
.WKS files
  defined 795

Numerics
2D graphs
  line/scatter 800
3D category scatter plot
  two way ANOVA 280
  two way RM ANOVA 406
3D graphs
  scatter 800
3D scatter plot
  two way ANOVA 280
3D scatter residual plot
  two way RM ANOVA 406
95% confidence interval 56
99% confidence interval 56

A
Adjusted R²
  best subset regression 612
  best subset regression results 621
  linear regression results 485
  multiple linear regression 513
  nonlinear regression results 654
  stepwise regression results 598
Advisor
  calculating power 88, 89, 91
  calculating sample size 84, 88, 91
  data format 91
  defining your goals 83
  determining sensitivity 85
  determining which test to use 83
  independent variables 94
  measuring data 85
  number of treatments 87
  repeated observations 86
  using 6
  see also type of prediction
Algorithm
  Marquardt-Levenberg 636
Aligning
  report text 139
Alignment
  text 186
All pairwise comparisons
  ANOVA on ranks 319
  ANOVA on ranks results 325
  one way ANOVA 240
  one way ANOVA results 248
  one way RM ANOVA 365, 374
  RM ANOVA on ranks 417, 423
  three way ANOVA 296
  three way ANOVA results 306
  two way ANOVA 267
  two way ANOVA results 277
  two way RM ANOVA 393, 404
Alpha
  defined 716
  in power 441, 451
Alpha value
  Chi-Square test 445
  defined 789
  editing 211, 236, 291, 481, 508, 563, 594, 648
  in power 133, 217
  linear regression 481, 594, 648
  linear regression results 490
  multiple linear regression 508
  one way ANOVA options 236
  one way RM ANOVA 360
  paired t-test 336
  point and column means plot 111
  point plot 111
  scatter plot 110
Deviance residuals
  regression results 537
DFFITS test
  linear regression results 492
  multiple linear regression results 522
  nonlinear regression results 661
  stepwise regression results 606
  when to use 479, 506, 589, 646
Diagnostics
  influence 491, 605, 660
  regression 520, 550, 573, 604
Dialogs
  defined 792
DIF files
  defined 792
  importing 61
Difference from 2 value
  linear regression 474
  multiple linear regression 501
  nonlinear regression 641
  polynomial regression 559
  stepwise regression 584
Difference of groups
  paired t-test results 341
Difference of means
  Bonferroni t-test results 278, 307
  Dunnett’s test results 279, 307
  one way ANOVA results 250
  one way RM ANOVA results 249
  Student-Newman-Keuls test results 279, 307
Difference of proportions
  confidence interval 437
  z-test results 439
Difference of ranks
  ANOVA on ranks results 325
Disconnected data
  three way ANOVA 286
  two way ANOVA 256
  two way RM ANOVA 383
Display summary table
  one way RM ANOVA 359
  rank sum test 224, 349, 413
  RM ANOVA on ranks 413
  two way ANOVA 261
  two way RM ANOVA 387
  unpaired t-test 210
Displaying
  data using a fixed decimal 43
  data using scientific notation 42, 43
  formatting toolbar 9
  page margins 178
  report ruler 137, 138
docking
  Notebook Manager 18
Dragging
  defined 792
dragging
  Notebook Manager 18
Dummy variables
  applying transform 765
  defined 792
  defining 765
  multiple logistic regression 528
  using 746
Duncan’s Multiple Range test
  three way ANOVA 298
  two way ANOVA 269
  two way RM ANOVA 395
Duncan’s test
  one way RM ANOVA 366
  two way RM ANOVA results 405
Dunn’s test
  ANOVA on ranks 320, 326, 418
  RM ANOVA on ranks results 424
Dunnett’s test
  ANOVA on ranks 319, 325
  one way ANOVA 241
  one way RM ANOVA 366, 375
  repeated measures ANOVA on ranks 418
  RM ANOVA on ranks results 424
  three way ANOVA 297, 307
  two way ANOVA 269, 278
  two way RM ANOVA 395
  two way RM ANOVA results 405
Durbin-Watson statistic
  defined 792
  linear regression 474
  multiple linear regression 501
  nonlinear regression 641
  polynomial regression 559
  stepwise regression 584
Durbin-Watson test
  linear regression results 489
  multiple linear regression results 518
  nonlinear regression results 657
  polynomial regression results 572
  stepwise regression results 602

E
Edit menu
  clear 141
  copy 33, 140
  cut 33, 140
  delete 34
  3D residual scatter plot 608, 663
  adjusted R² 598, 654
  ANOVA table 598
  bar chart of standardized residuals 608, 662
  confidence intervals 606
  constant variance test 603
  Cook’s Distance test 605
  creating a graph 608, 663
  DFFITS test 606
  Durbin-Watson statistic 602
  F statistic 599
  F-to-enter value 597
  F-to-remove value 598
  histogram of residuals 607, 662
  influence diagnostics 605
  leverage test 605
  line/scatter plot with prediction and confidence intervals 608, 663
  normality test 602
  P value 600
  power 603
  predicted values 585
  PRESS statistic 602
  probability plot 608, 662
  raw residuals 585
  regression diagnostics 604
  scatter plot of residuals 607, 662
  standard error of the estimate 598
  standardized residuals 586
  step number 598
  studentized deleted residuals 586
  studentized residuals 586
  sum of squares 598
  variables 600
Stop after steps
  entering value 582
Strings
  filtering 777
Structural multicollinearity
  defined 509, 536, 592, 617
Studentized deleted residuals
  defined 802
  multiple linear regression results 503
  multiple logistic regression results 551
  polynomial regression options 561
  regression diagnostic results 491
  regression results 476, 521, 539, 605, 643, 659
  stepwise regression results 586
Studentized residuals
  defined 802
  multiple linear regression results 503
  multiple logistic regression diagnostic results 551
  polynomial regression options 561
  regression diagnostic results 491, 520, 604, 659
  regression results 476, 538, 643
  stepwise regression results 586
Student-Newman-Keuls test
  ANOVA on ranks 319, 325
  one way ANOVA 241, 249, 268, 297, 418
  one way RM ANOVA 366, 375
  repeated measures ANOVA on ranks 418
  RM ANOVA on ranks results 424
  three way ANOVA 297, 307
  two way ANOVA 268, 278
  two way RM ANOVA 394
  two way RM ANOVA results 405
Subject column 66
Subscript 186
Sum
  column statistics 56
  descriptive statistics results 109
Sum of squares
  defined 567, 803
  descriptive statistics results 110
  incremental 568
  linear regression results 487
  multiple linear regression results 515
  nonlinear regression results 655
  one way ANOVA results 247
  one way RM ANOVA results 372
  residual 622
  stepwise regression results 598
  three way ANOVA results 302, 303
  two way ANOVA results 274
  two way RM ANOVA results 400
Summary table
  ANOVA on ranks results 323
  best subset regression results 620
  defined 803
  linear regression results 486
  multiple linear regression results 514
  multiple logistic regression results 548
  one way ANOVA 234
  one way ANOVA results 245
  one way RM ANOVA 359
  one way RM ANOVA results 371
  Paired t-test 336
  paired t-test results 340
  rank sum results 228
  rank sum test 224, 413
  RM ANOVA on ranks 413
  RM ANOVA on ranks results 422
  signed rank test results 352
  three way ANOVA 290
  three way ANOVA results 305
  two way ANOVA 261
  two way ANOVA results 273
  two way RM ANOVA 387
Y
Yates correction
  chi-square results 450
  chi-square test 446
  defined 430
  McNemar’s test 460
  setting 437, 446, 460
  when to use 437, 446, 460
Yates correction factor
  defined 804

Z
z statistic
  P value 441
  z-test results 441
Zooming in/out
  defined 804
z-test
  power 719
  sample size 733
  when to use 430
z-test
  alpha value 437
  arranging data 435
  calculating power/sample size 133
  confidence interval 437
  data format 431, 435
  defined 804