
SPSS INSTRUCTION CHAPTER 8

SPSS provides rather straightforward output for regression and correlation analysis. The program’s graph, regression, and correlation functions can respectively produce scatterplots, provide regression equation coefficients, and create correlation matrices. Within the outputs for these functions, you can also find information, such as coefficients of determination and significance values.

Preparing Regression and Correlation Analysis Data in SPSS

The first step in performing regression and correlation analyses in SPSS is, of course, inputting data into the program. Each variable should receive its own column on SPSS’s Data View screen. With this arrangement, each subject’s independent and dependent variable scores should fall into the same row.

Example 8.25 SPSS Data View Screen for Regression and Correlation Analysis

For a simple example, consider the five-subject sample introduced in Example 8.5 (selected for this example because its small sample size allows the entire data set to be shown easily). Figure 8.19 presents the data from this example as it would look in the SPSS Data View screen.

FIGURE 8.19 SPSS REGRESSION AND CORRELATION ANALYSIS DATA ARRANGEMENT
Data for the independent variable appear on the left and data for the dependent variable appear on the right. However, the variables do not need to appear in this order because, in forthcoming steps, SPSS asks the user to identify the independent and the dependent variable by name.

If your analysis involves more than two variables, you can simply include additional columns. In the commands that you provide to SPSS about the analysis that you wish to perform, you must specify which of these columns you wish to represent independent variables, dependent variables, and intervening variables.

Creating Scatterplots in SPSS

Basic scatterplots are most easily created through SPSS’s Graphs function. The SPSS instructions for Chapter 2 and Chapter 3 explain how to use this function to create bar graphs, pie charts, and frequency histograms. The process for creating scatterplots in SPSS begins the same way.

1. From the pull-down menu under the Graphs option at the top of the Data View or Variable View screen, select “Legacy Dialogs.” A listing of graphs and charts available through this method should appear.

2. Select “Scatter/Dot.” A window entitled Scatter/Dot should appear. The Scatter/Dot window contains various options for the graph. A two-variable situation requires a Simple Scatter. A three-variable situation, such as that described in Section 8.3.1, requires a 3-D Scatter. After selecting the name of the appropriate scatterplot, click “Define.”

a. For a simple scatterplot, a new window entitled Simple Scatterplot should appear.

FIGURE 8.20 SPSS SIMPLE SCATTERPLOT WINDOW
The user creates a two-variable scatterplot by identifying the independent (X) and dependent (Y) variables from those listed on the left side of the window. To do so, highlight the name of each variable and click on the arrow next to the box labeled with the appropriate axis name.

Identify the independent variable by moving its name from the box on the left to the box labeled “X Axis.” Identify the dependent variable by moving its name from the box on the left to the box labeled “Y Axis.”

b. For a 3-D scatterplot, a new window entitled 3-D Scatterplot should appear.

FIGURE 8.21 SPSS 3-D SCATTERPLOT WINDOW
The user creates a three-variable scatterplot by identifying the two independent variables and the dependent variable from those listed on the left side of the window. To do so, highlight the name of each variable and click on the arrow next to the box labeled with the appropriate axis name.

Move the names of the two independent variables and the dependent variable from the box on the left to a box on the right marked for one of the axes. The assignment of the three variables to the X, Y, and Z axes on the graph depends upon the user’s intentions and preference for the graph’s appearance.

3. Click OK.

Example 8.26 Simple Scatterplot in SPSS

The steps for producing a simple scatterplot can be applied to the examples from Section 8.2.1. The following graph results from moving the name of the independent variable, “students,” to the box labeled “X Axis” and moving the name of the dependent variable, “hedgers,” to the box labeled “Y Axis.”

FIGURE 8.22 SPSS SIMPLE SCATTERPLOT OUTPUT
The scale for independent-variable scores lies along the X axis and the scale for dependent-variable scores lies along the Y axis. Each point represents a particular independent and dependent variable score.

This particular scatterplot indicates that, as class size increases, teachers’ use of hedgers tends to increase. Thus, it suggests a positively sloped regression line.

The basic SPSS scatterplot does not show the regression line. If you would like the graph to include this line, you must use SPSS’s Chart Editor. To access the Chart Editor, you must double click on the scatterplot.

The Chart Editor refers to the least-squares regression line as a fit line. The pull-down menu for the Elements function in the Chart Editor contains a “Fit Line at Total” option. (Often, the lowest menu bar in the Chart Editor also contains a shortcut icon for this process.) Selecting this option begins the process for overlaying the regression line onto the existing scatterplot.

1. From the “Elements” pull-down menu in the Chart Editor, select “Fit Line at Total.”

2. A new window entitled Properties should appear.

FIGURE 8.23 SPSS CHART EDITOR PROPERTIES WINDOW
The choice of a fit method determines the line or curve that SPSS superimposes on the scatterplot. Simple analyses may require only a horizontal line to visually indicate the mean of all Y values. A linear fit produces a least-squares regression line. Loess, quadratic, and cubic fits refer to curvilinear relationships.

Select the appropriate Fit Method from the options provided. Most analyses require a linear fit. However, if you wish to investigate a possible curvilinear relationship, you may wish to request a cubic, quadratic, or loess fit.

3. Click CLOSE.

Example 8.27 Regression Line in SPSS

Figure 8.24 shows the scatterplot in Figure 8.22 with an added regression line, obtained by requesting a linear fit within the Chart Editor window. As expected, the line has a positive slope.

FIGURE 8.24 SPSS SIMPLE SCATTERPLOT WITH REGRESSION LINE OUTPUT
The regression line indicates the general linear trend of points. This particular line is the one that SPSS identifies as producing the smallest sum of squared residuals for all points on the scatterplot.

In this case, the points may fit a curvilinear path, particularly a cubic curve, slightly better than they fit a linear path. Requesting a cubic fit in the Chart Editor window produces Figure 8.26.

FIGURE 8.26 SPSS SIMPLE SCATTERPLOT WITH CUBIC CURVE OUTPUT
The curve that appears in Figure 8.26 indicates the general cubic trend of points. This particular cubic curve is the one SPSS identifies as producing the smallest sum of squared residuals for all points on the scatterplot.

This curve does, in fact, seem to fit the data better than Figure 8.24’s line does. The researcher may, therefore, wish to characterize the relationship between the number of students in a class and the number of hedgers used per hour by the teacher as curvilinear.
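The least-squares criterion mentioned in Example 8.27 can be illustrated outside SPSS. The following Python sketch is only an illustration: the five data points are hypothetical (they are not the chapter's 40-class data set), and the function names are invented for this example. It computes the line that minimizes the sum of squared residuals and confirms that a slightly different line produces a larger sum.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the line that minimizes squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the points and a line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data, used only to demonstrate the calculation.
class_sizes = [10, 20, 30, 40, 50]
hedgers_per_hour = [2, 3, 5, 4, 6]

intercept, slope = least_squares_line(class_sizes, hedgers_per_hour)
best = sum_squared_residuals(class_sizes, hedgers_per_hour, intercept, slope)

# A line with a slightly steeper slope fits worse by this criterion.
other = sum_squared_residuals(class_sizes, hedgers_per_hour, intercept, slope + 0.01)

print("intercept:", intercept, "slope:", slope)
print("least-squares sum:", best, "alternative line sum:", other)
```

The same criterion applies to the cubic fit; only the form of the fitted equation changes.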

Example 8.28 3-D Scatterplot in SPSS

A three-dimensional scatterplot can represent the two variables from Example 8.26 and Example 8.27 along with the questions/hour variable used to demonstrate calculation of the multiple correlation coefficient in Example 8.13. In Example 8.13, x corresponds to the number of students in a particular class, y corresponds to the number of hedgers used per hour by the teacher, and z corresponds to the number of student questions per hour. Assigning these three variables to the appropriate axes in the 3-D Scatterplot window produces the following scatterplot.

FIGURE 8.25 SPSS 3-D SCATTERPLOT OUTPUT
Scales for the two independent variables appear along the X axis and the Y axis. The scale for the dependent variable appears along the Z axis. The researcher, however, can assign the variables to the axes that suit his or her purposes. Each point represents a particular subject’s scores for the two independent variables and the dependent variable.

The points on this scatterplot seem to float in space. Actually, though, each point is situated at the intersection of the planes representing the enrollment for a particular class, the number of hedgers used per hour by the teacher of that class, and the number of questions asked per hour by students in the class.

You should know that methods of creating a scatterplot in SPSS other than the “Legacy Dialogs” option exist. The “Chart Builder” function within the “Graphs” menu, for instance, also leads you through steps that produce a scatterplot. With the Chart Builder, you gain more control over the appearance and components of the scatterplot than you have when using Legacy Dialogs. However, the process needed to use the Chart Builder is a bit more complicated.

If you need to create a scatterplot that uses data points other than raw values, you may wish to use a different approach. SPSS’s regression analysis function allows you to create such scatterplots. By clicking on the “Plots” button in the Linear Regression window, you can access a new window, entitled Linear Regression: Plots, which allows you to specify scales based upon standardized values, residuals, and predicted values. This function generally has the most value for somewhat advanced analyses.

Regression Analysis in SPSS

With the exception of the scatterplot itself, you can obtain all pairwise regression and correlation values by using SPSS’s “Regression” function. Output from the following steps includes regression equation coefficients, r, and r².

1. Select “Regression” from SPSS’s Analyze pull-down menu and then, assuming a linear regression is desired, select the “Linear” option.

2. A window entitled Linear Regression should appear. A box in the upper left of the window contains the names of all variables.

FIGURE 8.26 SPSS LINEAR REGRESSION WINDOW
The user obtains regression values by identifying the independent variable(s) and the dependent variable from those listed on the left side of the window. To do so, highlight the name of the variable and click on the arrow next to the appropriate box.

Move the names of the independent and dependent variables to the properly labeled boxes on the right. If the user moves the name of only one variable to the box labeled “independent variable(s),” SPSS performs a bivariate regression analysis. If the names of more than one variable are moved to the “independent variable(s)” box, SPSS performs a multiple regression analysis.

3. Click OK.

Four output tables result. The first of these tables simply identifies the variables used for the analysis. The other three tables provide the information that you need to assess the relationship between the independent and dependent variables. You can find the correlation coefficient and the coefficient of determination in the Model Summary table and coefficients for the regression equation in the Coefficients table’s column “B.” SPSS refers to the y-intercept as the constant and lists each slope next to its corresponding variable’s name.

The other table included in SPSS output provides ANOVA results. As explained in Section 8.6, some statisticians supplement regression and correlation analysis with an ANOVA. Although a regression and correlation analysis addresses the trend in changes between independent and dependent variable scores, it does not measure the sizes of differences between scores on either factor. So, even if a trend exists, differences in dependent-variable scores associated with changes in independent-variable scores may be so minuscule that the trend becomes negligible. Those concerned about this issue may use an ANOVA to determine whether the changes in dependent-variable scores associated with the independent variable are significant. In the regression output, the ANOVA divides the total variation in dependent-variable scores into a portion explained by the regression model and a residual portion that the model leaves unexplained. The F ratio compares the mean squares for these two portions. You can interpret the significance value for this test just as you would interpret the results of any ANOVA. (Please see Chapter 7 for information about ANOVAs.)

Example 8.29 SPSS Regression Output

To further understand how to locate and interpret relevant regression and correlation coefficients, consider the four output tables as they apply to the bivariate situation used for Example 8.26 and Example 8.27.

Variables Entered/Removed (b)

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | students (a) | . | Enter |

a. All requested variables entered.
b. Dependent Variable: hedgers

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .703 (a) | .494 | .481 | 2.59045 |

a. Predictors: (Constant), students

ANOVA (b)

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 249.405 | 1 | 249.405 | 37.167 | .000 (a) |
| | Residual | 254.995 | 38 | 6.710 | | |
| | Total | 504.400 | 39 | | | |

a. Predictors: (Constant), students
b. Dependent Variable: hedgers

Coefficients (a)

| Model | | Unstandardized Coefficients: B | Unstandardized Coefficients: Std. Error | Standardized Coefficients: Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 1.017 | .799 | | 1.272 | .211 |
| | students | .101 | .017 | .703 | 6.096 | .000 |

a. Dependent Variable: hedgers

TABLE 8.9, TABLE 8.10, TABLE 8.11, AND TABLE 8.12 SPSS LINEAR REGRESSION OUTPUT
SPSS output for the linear regression command includes four tables. Table 8.9, entitled “Variables Entered/Removed,” indicates the independent variables and footnotes the name of the dependent variable. Table 8.10, Table 8.11, and Table 8.12 provide information about the changes in variable scores. The correlation coefficient (r) and the coefficient of determination (r²), found in the Model Summary, indicate the strength of the linear trend between the variables. The significance value in the ANOVA table, when compared to a predetermined α, indicates whether changes in dependent-variable scores that accompany changes in independent-variable scores are significant. Finally, the Coefficients table provides the y-intercept and the slope for the regression equation.
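The values in the Model Summary and ANOVA tables are arithmetically linked, which offers a useful check when reading the output. The short Python sketch below is an illustration rather than part of SPSS: it takes the sums of squares and degrees of freedom from the ANOVA table in Example 8.29 and recomputes R Square, R, Adjusted R Square, the standard error of the estimate, and F using the standard formulas. The variable names are invented for this example.

```python
import math

# Values copied from the ANOVA table in Example 8.29.
ss_regression = 249.405
ss_residual = 254.995
df_regression = 1
df_residual = 38

ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual

# Coefficient of determination: share of total variation explained.
r_square = ss_regression / ss_total
# For a bivariate regression, R is the square root of R Square.
r = math.sqrt(r_square)

# Mean squares and the F ratio reported in the ANOVA table.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_ratio = ms_regression / ms_residual

# Remaining Model Summary values.
std_error_estimate = math.sqrt(ms_residual)
adjusted_r_square = 1 - ms_residual / (ss_total / df_total)

print(round(r, 3), round(r_square, 3), round(adjusted_r_square, 3))
print(round(f_ratio, 3), round(std_error_estimate, 3))
```

Small differences in the last decimal place can arise because the table values entered here are already rounded.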

The correlation coefficient of .703, from Table 8.10, suggests that the number of students in a class and the number of hedgers used per hour by the teacher have a strong (although barely so) linear relationship. For those who do not wish to square the correlation coefficient themselves, this table also includes the coefficient of determination, which indicates that differences in the number of students in the class can explain 49.4% of differences in teachers’ use of hedgers. Further, the ANOVA produces a p-value that SPSS displays as .000 (that is, a value below .0005), which lies below all commonly used α values. So, one could conclude that the number of hedgers used by teachers per hour changes significantly with respect to the number of students in the class. The regression equation helps to further describe this change. Using the regression equation of y = 1.017 + .101x, obtained from values in Table 8.12, one can predict the dependent-variable score for each independent-variable score. Each x value substituted into the equation and the y value that results provide an ordered pair that falls on the regression line. This process produces a best guess for the number of hedgers used based upon class size.
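As a brief illustration of using the regression equation for prediction, the Python sketch below substitutes a few class sizes into y = 1.017 + .101x. The coefficients come from the Coefficients table in Example 8.29; the class sizes and the function name are hypothetical choices made only for this example.

```python
# Coefficients from the "B" column of the Coefficients table in Example 8.29.
INTERCEPT = 1.017  # listed by SPSS as (Constant)
SLOPE = 0.101      # listed next to the variable name "students"


def predict_hedgers(students):
    """Predicted hedgers per hour for a class with the given enrollment."""
    return INTERCEPT + SLOPE * students


# Hypothetical class sizes chosen only to illustrate the substitution.
for class_size in (10, 25, 40):
    print(class_size, round(predict_hedgers(class_size), 3))
```

Each printed pair is a point on the regression line, so predictions are most trustworthy for class sizes within the range of the original data.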

If you input more than one variable name into the Linear Regression window’s Independent Variable(s) box, output looks similar to that shown in Example 8.29. In this case, however, the Model Summary provides the multiple correlation coefficient and the coefficient of multiple determination. Also, the “B” column in the Coefficients table includes a slope for each independent variable.

Correlation Matrices in SPSS

You may not always want to obtain all of the information provided by SPSS’s regression analysis. In some situations, correlation coefficients alone suffice. The Correlate function can not only provide these values without unneeded regression output, but can also display coefficients for more than one pair of variables at a time and can compute partial correlation coefficients. Coefficients appear in a correlation matrix. The following steps produce this output.

1. Select “Correlate” from SPSS’s Analyze pull-down menu. Then, indicate whether SPSS should calculate bivariate (pairwise) or partial correlation coefficients.

2. A window for the selected analysis should appear.

a. For a bivariate analysis, the window is entitled Bivariate Correlations.

FIGURE 8.27 SPSS BIVARIATE CORRELATIONS WINDOW
SPSS calculates correlation coefficients between each pair of variables with names appearing in the box labeled “Variables.” The user should move the name of each variable involved in the analysis from the box on the left of the window by highlighting the name of the variable and clicking on the arrow to the left of the “Variables” box.

Move the names of all variables that you would like to analyze from the list of variables on the left of the window to the box labeled “Variables.” The “Variables” box can contain as many variable names as needed. SPSS will calculate the pairwise correlation coefficient between each pair of variables listed. For instance, if the names of variables “x”, “y”, and “z” appear in the “Variables” box, SPSS calculates r_XY, r_XZ, and r_YZ.

b. For partial correlations, a new window entitled Partial Correlations should appear.

FIGURE 8.28 SPSS PARTIAL CORRELATIONS WINDOW
SPSS calculates correlation coefficients between each pair of variables with names appearing in the box labeled “Variables,” while removing the effects of any variables with names appearing in the box labeled “Controlling for.” The user should move the name of each variable involved in the analysis from the box on the left of the window by highlighting the name of the variable and clicking on the arrow to the left of the appropriate box.

The names of intervening variables should be moved from the list of variables on the left of the window to the box labeled “Controlling for.” Move the names of the variables involved in the correlation itself to the box labeled “Variables.” The “Variables” box can contain as many variable names as needed. SPSS will calculate the correlation coefficient between each pair of variables listed while holding steady the influence of the variable(s) appearing in the “Controlling for” box.

3. Click OK.

SPSS assigns each variable for which you requested a correlation to a row and column of the resulting correlation matrix. The coefficient for a particular linear relationship appears at the intersection of each relevant row and column, as shown in Table 8.13, based upon an analysis involving four variables, W, X, Y, and Z.

| | W | X | Y | Z |
|---|---|---|---|---|
| W | r_WW | r_WX | r_WY | r_WZ |
| X | r_XW | r_XX | r_XY | r_XZ |
| Y | r_YW | r_YX | r_YY | r_YZ |
| Z | r_ZW | r_ZX | r_ZY | r_ZZ |

TABLE 8.13 - BASIC CORRELATION MATRIX
The interior portion of the table contains correlation coefficients for all pairs of variables. Values along the diagonal, which represent associations between each variable and itself, equal +1.00. This diagonal also serves as a line of symmetry because r_WX = r_XW, r_WY = r_YW, etc.

SPSS’s correlation matrix contains correlation coefficients as well as significance values and sample sizes for the data used to analyze each pair of variables.
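The structure of a correlation matrix can also be seen by building one directly. The Python sketch below is an illustration only: the three columns of data are hypothetical, and the function names are invented for this example. It computes Pearson's r for every pair of variables and arranges the results by row and column, which makes the diagonal of 1.00 values and the symmetry about that diagonal easy to verify.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return a nested dict: matrix[row_name][column_name] = r."""
    names = list(columns)
    return {
        row: {col: pearson_r(columns[row], columns[col]) for col in names}
        for row in names
    }


# Hypothetical scores for three variables measured on five subjects.
data = {
    "W": [1, 2, 3, 4, 5],
    "X": [2, 4, 5, 4, 5],
    "Y": [5, 3, 4, 2, 1],
}

matrix = correlation_matrix(data)
for row_name, row in matrix.items():
    print(row_name, [round(value, 3) for value in row.values()])
```

Because the matrix is symmetric, only the coefficients on one side of the diagonal carry distinct information.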

In some cases, your analysis may focus entirely upon these pairwise correlation coefficients. Often, though, obtaining these values is just the first step in a multiple regression or correlation analysis.

Example 8.30 SPSS Correlation Matrix

One may wish to begin an investigation into the relationship between class size, the number of hedgers used per hour by a teacher, and the number of questions asked per hour by students by considering pairwise correlation coefficients. These values appear in Table 8.14.

Correlations

| | | students | hedgers | questions |
|---|---|---|---|---|
| students | Pearson Correlation | 1.000 | .703** | .592** |
| | Sig. (2-tailed) | | .000 | .000 |
| | N | 40 | 40 | 40 |
| hedgers | Pearson Correlation | .703** | 1.000 | .495** |
| | Sig. (2-tailed) | .000 | | .001 |
| | N | 40 | 40 | 40 |
| questions | Pearson Correlation | .592** | .495** | 1.000 |
| | Sig. (2-tailed) | .000 | .001 | |
| | N | 40 | 40 | 40 |

TABLE 8.14 - SPSS PAIRWISE CORRELATION MATRIX
The correlation coefficient for each pair of variables appears at the intersection of one variable’s row and the other variable’s column. Each variable correlates perfectly with itself, as evidenced by the coefficients of +1.00 at the intersection of a particular variable’s row and column.

The number of students in a class correlates strongly with the number of hedgers used per hour by the teacher of that class (r_XY = +.703). A moderate correlation exists between the number of students in a class and the number of questions asked per hour by students (r_XZ = +.592) as well as between the number of questions asked per hour by students and the number of hedgers used per hour by the teacher (r_YZ = +.495). The fact that all of these correlation coefficients have positive values indicates that increases in one variable correspond to increases in the other.
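Pairwise coefficients such as those in Table 8.14 are sufficient to compute a multiple correlation coefficient when there are two independent variables. The Python sketch below is an illustration under stated assumptions: it treats questions per hour (z) as the dependent variable and class size (x) and hedgers per hour (y) as the independent variables, following the axis assignment described in Example 8.28, and it applies the standard two-predictor formula. The function name is invented for this example, and the result is not a value reported in the chapter.

```python
import math

# Pairwise coefficients from Table 8.14.
r_zx = 0.592  # questions with students
r_zy = 0.495  # questions with hedgers
r_xy = 0.703  # students with hedgers


def multiple_r(r_dep_1, r_dep_2, r_12):
    """Multiple correlation of a dependent variable with two predictors.

    r_dep_1 and r_dep_2 are the correlations of the dependent variable
    with each predictor; r_12 is the correlation between the predictors.
    """
    r_square = (r_dep_1 ** 2 + r_dep_2 ** 2 - 2 * r_dep_1 * r_dep_2 * r_12) / (
        1 - r_12 ** 2
    )
    return math.sqrt(r_square)


big_r = multiple_r(r_zx, r_zy, r_xy)
print("multiple R:", round(big_r, 3))
print("coefficient of multiple determination:", round(big_r ** 2, 3))
```

The multiple correlation coefficient can never fall below the larger of the two pairwise coefficients involving the dependent variable, which provides a quick plausibility check.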

A table similar to Table 8.14 emerges from SPSS when you request partial correlation coefficients. In this case, SPSS informs the user that it has held constant the impact of intervening variables by including their names under a “control variable” heading in the output.
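For a single control variable, a partial correlation coefficient can be computed from three pairwise coefficients. The Python sketch below is an illustration only: it uses the values in Table 8.14 to find the correlation between class size and hedgers per hour while controlling for questions per hour, using the standard first-order partial correlation formula. The function name is invented for this example. The result should not be compared with the example that follows, which controls for a different variable and uses a different sample.

```python
import math

# Pairwise coefficients from Table 8.14.
r_xy = 0.703  # students with hedgers
r_xz = 0.592  # students with questions
r_yz = 0.495  # hedgers with questions


def partial_r(r_ab, r_ac, r_bc):
    """Correlation between a and b with the influence of c removed from both."""
    numerator = r_ab - r_ac * r_bc
    denominator = math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))
    return numerator / denominator


r_xy_given_z = partial_r(r_xy, r_xz, r_yz)
print("partial r (students, hedgers | questions):", round(r_xy_given_z, 3))
```

In this illustration the partial coefficient is smaller than the pairwise coefficient of .703, because part of the shared variation between the two variables is also shared with the control variable.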

Example 8.31 SPSS Partial Correlation Matrix

Table 8.15 shows the SPSS results comparable to the calculations in Example 8.16. The correlation matrix values describe the relationship between the number of students in a class and the number of hedgers used per hour by the teacher, independent of any influence of the percentage of factual information in course material.

Correlations

| Control Variables | | | students | hedgers |
|---|---|---|---|---|
| factualinfo | students | Correlation | 1.000 | .628 |
| | | Significance (2-tailed) | . | .372 |
| | | df | 0 | 2 |
| | hedgers | Correlation | .628 | 1.000 |
| | | Significance (2-tailed) | .372 | . |
| | | df | 2 | 0 |

TABLE 8.15 - SPSS PARTIAL CORRELATION MATRIX
By listing “factualinfo” as a control variable on the left side of the table, SPSS reminds the user that it removed any influence that the percentage of factual information in a course has upon the number of students in the class and the number of hedgers used per hour by the teacher.

Because it plays the role of an intervening variable, “factualinfo” is identified as a control variable in Table 8.15 rather than appearing as part of the main correlation matrix. The resulting partial correlation coefficient of +.628 also emerged from the calculations in Example 8.16. This value indicates a moderate tendency for the number of hedgers used by the teacher to increase as class enrollment increases when discounting the effects of the intervening variable upon both of the other two variables.

The “Linear Regression” box provides another method for obtaining partial correlations. Although this method only displays one partial correlation coefficient at a time, it also provides part correlation coefficients, which you cannot obtain in matrix form. If you need to include part correlation coefficients in your analysis, therefore, you may prefer the following procedure.

1. Select “Regression” from SPSS’s Analyze pull-down menu. Then, select the “Linear” option.

2. A window entitled Linear Regression should appear. Follow the procedure described earlier in this document for identifying the independent variable(s) and dependent variable for the analysis. However, include the intervening variable(s) among those on the independent variable list. Be sure to remember which of the variables is the true independent variable and which are intervening variables.

3. Click on the button marked “Statistics” on the right of the Linear Regression window. A new window entitled Linear Regression: Statistics should appear.

FIGURE 8.29 LINEAR REGRESSION: STATISTICS WINDOW
The prompt for part and partial correlations can be found in this window. With this option selected, SPSS calculates correlation coefficients between one independent variable and the dependent variable, independent of all other independent variables identified in the Linear Regression window.

4. Mark the box labeled “Part and partial correlations,” located on the right side of the window. Doing so tells SPSS to calculate the correlation between the dependent variable and each independent variable while holding constant the effects of all other independent variables.

5. Click the “Continue” button. You should return to the Linear Regression window.

6. Click OK.

The partial and part correlation coefficients appear in the output’s “Coefficients” table, under the heading “Correlations.”

Example 8.32 SPSS Part and Partial Correlation Output

Table 8.16 includes the partial correlation coefficient first presented in Example 8.31 as well as the comparable part correlation coefficient.

Coefficients (a)

| Model | | Unstandardized Coefficients: B | Unstandardized Coefficients: Std. Error | Standardized Coefficients: Beta | t | Sig. | Correlations: Zero-order | Correlations: Partial | Correlations: Part |
|---|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | 5.145 | 15.399 | | .334 | .770 | | | |
| | students | .152 | .133 | .771 | 1.142 | .372 | .773 | .628 | .512 |
| | factualinfo | .000 | .181 | -.003 | -.004 | .997 | -.580 | -.003 | -.002 |

a. Dependent Variable: hedgers

TABLE 8.16 SPSS COEFFICIENTS TABLE WITH PARTIAL AND PART CORRELATIONS
Partial and part correlation coefficients appear on the far right of the table. The values in the row labeled “students” pertain to the relationship between the number of students in the course and the number of hedgers used per hour by the teacher, independent of the amount of factual information in the course. The table also provides coefficients for the multiple regression equation that uses the number of students and the percentage of factual information in a course to predict the number of hedgers used per hour by the teacher.

Not surprisingly, Table 8.16 and the correlation matrix in Example 8.31 both identify the partial correlation as +.628. One would rely upon Table 8.16, however, to learn the part correlation coefficient. This value, +.512, describes the linear relationship between the number of students in a class and the number of hedgers used by the teacher, independent of any effect that the amount of factual information in the class has upon the former. This value lies below both the pairwise and the partial correlation coefficients, but still characterizes the relationship as moderately strong.
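The part and partial coefficients in Table 8.16 are related in a way that can be checked by hand. With a single control variable, the part correlation equals the partial correlation multiplied by the square root of one minus the squared zero-order correlation between the dependent variable and the control variable. The Python sketch below applies this standard identity to the rounded values in Table 8.16; the variable names are invented for this example, and the identity as written assumes exactly one control variable.

```python
import math

# Values from the "Correlations" columns of Table 8.16.
partial_students = 0.628          # Partial, row "students"
zero_order_factualinfo = -0.580   # Zero-order, row "factualinfo" (with hedgers)

# Part correlation removes the control variable from the independent
# variable only, so it is the partial correlation scaled back by the share
# of dependent-variable variation not associated with the control variable.
part_students = partial_students * math.sqrt(1 - zero_order_factualinfo ** 2)

print("part correlation for students:", round(part_students, 3))
```

The result agrees with the tabled part correlation of .512 to within the rounding of the input values, and it shows why a part correlation can never exceed the corresponding partial correlation in absolute value.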

Phi Analysis in SPSS

The request for a phi coefficient in SPSS takes place within the Crosstabulations context. To access this window and to instruct SPSS to include the phi coefficient along with its crosstabulation output, you should use the following steps.

1. Select “Descriptive Statistics” from SPSS’s Analyze pull-down menu.

2. A new menu, containing a “Crosstabs” option appears. Select this option.

3. A Crosstabs window should appear.

FIGURE 8.30 SPSS CROSSTABS WINDOW
The user should move the name of one variable from the box on the left of the window to the box labeled “Row(s)” and the name of another variable from the box on the left to the box labeled “Column(s).” Highlighting the name of the variable and clicking on the arrow to the left of the “Row(s)” or “Column(s)” box moves the variable name to the appropriate place.

Move the name of one variable involved in the analysis from the list on the left of the window to the “Row(s)” box. Move the name of the other variable involved in the analysis from the list on the left of the window to the “Column(s)” box.

4. Click the “Statistics” button, located on the right of the window. A new window entitled Crosstabs: Statistics should appear.

FIGURE 8.31 SPSS CROSSTABS: STATISTICS WINDOW
The user instructs SPSS to include the phi coefficient in its crosstabulation output by selecting the “Phi and Cramer’s V” option in the Crosstabs: Statistics window. The resulting value describes the trend in frequencies for categories of the variables in the crosstabulation.

Click on the open box next to the “Phi and Cramer’s V” listing. Be sure that this box contains a check mark.

5. Click the “Continue” button at the bottom of the page. You should return to the Crosstabs window.

6. Click OK.

The output that results from these steps consists of a crosstabulation table (discussed in Chapter 2) and a Symmetric Measures table. The second of these contains the phi coefficient.

Example 8.33 SPSS Crosstabulation Output Including Symmetric Measures

The output for a crosstabulation and phi analysis involving the student enrollment categories and the hedger use categories introduced in Section 8.4 of the chapter appears as follows.

Case Processing Summary

| | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent |
|---|---|---|---|---|---|---|
| studcats * hedgecats | 40 | 100.0% | 0 | .0% | 40 | 100.0% |

studcats * hedgecats Crosstabulation (Count)

| studcats | hedgecats: less than 5 | hedgecats: 5 or more | Total |
|---|---|---|---|
| fewer than 30 | 13 | 2 | 15 |
| 30 or more | 4 | 21 | 25 |
| Total | 17 | 23 | 40 |

Symmetric Measures

| | | Value | Approx. Sig. |
|---|---|---|---|
| Nominal by Nominal | Phi | .692 | .000 |
| | Cramer's V | .692 | .000 |
| N of Valid Cases | | 40 | |

TABLE 8.17, TABLE 8.18, AND TABLE 8.19 SPSS CROSSTABULATION AND PHI COEFFICIENT OUTPUT
Table 8.17 and Table 8.18 are part of SPSS’s standard crosstabulation output. The value of phi that appears in the Symmetric Measures table indicates the strength of the trend in frequencies of classes that fall into the two enrollment categories and the two hedger categories.

Table 8.19’s phi coefficient of +.692 indicates a moderate (close to strong) trend of larger values in the upper left and lower right cells than in the other two cells of Table 8.18’s crosstabulation.
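The phi coefficient for a two-by-two crosstabulation can be verified directly from the cell counts. The Python sketch below is an illustration rather than SPSS's own procedure: it applies the standard formula, (ad - bc) divided by the square root of the product of the four marginal totals, to the counts in the crosstabulation from Example 8.33. The function name is invented for this example.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table laid out as:

                 column 1   column 2
        row 1       a          b
        row 2       c          d
    """
    numerator = a * d - b * c
    # Product of the two row totals and the two column totals.
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


# Cell counts from the studcats * hedgecats crosstabulation.
phi = phi_coefficient(13, 2, 4, 21)
print("phi:", round(phi, 3))
```

A positive result reflects the concentration of classes in the upper left and lower right cells; reversing the order of either variable's categories would change only the sign.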