In this exercise, we will work an example of logistic regression as found in the literature:
Sandra L. Hanson and Douglas M. Sloane, "Young Children and Job Satisfaction." Journal
of Marriage and the Family, 54 (November, 1992), 799-811.
The data for this problem is: YoungChildrenJobSatisfaction.Sav.
Slide 1
Relationship to be analyzed
"We are interested in examining the effect of young children on the job satisfaction of
men and women involved in a variety of work and family roles to see how the presence
of family responsibilities affects their happiness at work. The research is comparative. It
involves contrasts between men and women in different work and marital statuses at
several points in time." (page 800)
Slide 2
In this problem, the model Chi-Square value of 57.153 has a reported significance of 0.000
(that is, p < 0.001), which is less than 0.05, so we conclude that there is a significant
relationship between the dependent variable and the set of independent variables.
Slide 28
Measures Analogous to R²
The next SPSS outputs indicate the strength of the relationship between the dependent
variable and the independent variables, analogous to the R² measures in multiple
regression.
The Cox and Snell R² measure operates like R², with higher values indicating greater
model fit. However, this measure is limited in that it cannot reach the maximum value
of 1, so Nagelkerke proposed a modification that ranges from 0 to 1. We will rely
upon Nagelkerke's measure as indicating the strength of the relationship.
Based on the interpretive criteria, we would characterize this model, with a Nagelkerke
R² of .135, as weak.
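The relationship between the two measures can be sketched in a few lines of Python. This is a minimal illustration using the standard textbook definitions, not SPSS's own code; the function name and the example log-likelihoods and sample size are hypothetical and are not taken from the job satisfaction data.

```python
import math

def pseudo_r_squared(ll_null, ll_model, n):
    """Return (Cox and Snell R^2, Nagelkerke R^2).

    ll_null  -- log-likelihood of the intercept-only model
    ll_model -- log-likelihood of the fitted model
    n        -- number of cases
    """
    # Cox and Snell: 1 - (L_null / L_model)^(2/n), written with log-likelihoods.
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # The largest value Cox and Snell can reach is 1 - L_null^(2/n), which is below 1.
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    # Nagelkerke rescales by that maximum so the measure can range from 0 to 1.
    nagelkerke = cox_snell / max_cox_snell
    return cox_snell, nagelkerke

# Hypothetical inputs, chosen only to illustrate the calculation.
cs, nk = pseudo_r_squared(ll_null=-100.0, ll_model=-80.0, n=200)
print(round(cs, 3), round(nk, 3))
```

Because the divisor is always less than 1, Nagelkerke's value is never smaller than the Cox and Snell value for the same model.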
Slide 29
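The goodness-of-fit measure has a value of 5.678, which is not statistically significant.
Nonsignificance is the desirable outcome here, because it indicates that the model's
predictions do not differ significantly from the observed outcomes.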
Young Children and Job Satisfaction
Slide 30
To evaluate the accuracy of the model, we compute the proportional by chance accuracy
rate and, if appropriate, the maximum by chance accuracy rate. Since the sizes of the
groups in this problem are equal to 46% and 54%, the proportional accuracy criterion is
appropriate because we do not have a dominant group.
The proportional by chance accuracy rate is equal to 0.503 (0.463^2 + 0.537^2). A 25%
increase over the by chance accuracy rate would equal 0.628.
Our model accuracy rate of 63.2% meets this criterion.
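The arithmetic behind this criterion can be checked with a short Python sketch. The group proportions and the model accuracy are the values quoted above; the function and variable names are illustrative only.

```python
def proportional_by_chance(proportions):
    """Sum of squared group proportions."""
    return sum(p * p for p in proportions)

# Group proportions and model accuracy as quoted in the text.
chance = proportional_by_chance([0.463, 0.537])
criterion = 1.25 * chance          # 25% improvement over chance
model_accuracy = 0.632

meets_criterion = model_accuracy >= criterion
print(f"{chance:.3f} {criterion:.3f} {meets_criterion}")
# prints: 0.503 0.628 True
```

Note that the criterion is computed from the unrounded by chance rate; multiplying the rounded 0.503 by 1.25 would give 0.629 instead.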
Slide 31
Stacked Histogram
SPSS provides a visual image of the classification accuracy in the stacked histogram as
shown below. To the extent to which the cases in one group cluster on the left and the
other group clusters on the right, the predictive accuracy of the model will be higher.
Slide 32
Slide 33
Presence of outliers
There are two outputs to alert us to outliers that we might consider excluding from the
analysis: listing of residuals and saving Cook's distance scores to the data set.
SPSS provides a casewise list of residuals that identifies cases whose residual is above or
below a certain number of standard deviation units. As in multiple regression, there are a
variety of ways to compute the residual. In logistic regression, the residual is the
difference between the observed probability of the dependent variable event and the
predicted probability based on the model. The standardized residual is the residual
divided by an estimate of its standard deviation. The deviance is calculated by taking
the square root of -2 x the log of the predicted probability for the observed group and
attaching a negative sign if the event did not occur for that case. Large absolute values
of the deviance indicate that the model does not fit the case well. The studentized residual
for a case is the change in the model deviance if the case is excluded. Discrepancies
between the deviance and the studentized residual may identify unusual cases. (See the
SPSS chapter on Logistic Regression Analysis for additional details.)
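The standardized residual and the deviance described above can be sketched for a single case as follows. This is a minimal illustration, not SPSS's implementation: it assumes the event is coded 1 and the non-event 0, and it assumes the standard deviation estimate for the standardized residual is sqrt(p(1 - p)). The function names are hypothetical.

```python
import math

def standardized_residual(y, p):
    """Residual (y - p) divided by an estimate of its standard deviation.

    Assumes the estimate is sqrt(p * (1 - p)), the binomial standard deviation.
    """
    return (y - p) / math.sqrt(p * (1.0 - p))

def deviance_residual(y, p):
    """Square root of -2 x log of the predicted probability for the observed group.

    y is 1 if the event occurred and 0 otherwise; p is the predicted
    probability of the event. The sign is negative when the event did not occur.
    """
    p_observed_group = p if y == 1 else 1.0 - p
    magnitude = math.sqrt(-2.0 * math.log(p_observed_group))
    return magnitude if y == 1 else -magnitude

# A case where the event did not occur although the model predicted it strongly.
print(round(deviance_residual(0, 0.9), 3))
```

A case that the model predicts poorly, such as the one in the example, gets a deviance larger than 2 in absolute value, which is the kind of case the casewise listing is meant to surface.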
In the output for our problem, SPSS listed one case that may be considered an
outlier, with a studentized residual greater than 2:
Slide 34
Cook's Distance
SPSS has an option to compute Cook's distance as a measure of influential cases and add
the score to the data editor. I am not aware of a precise formula for determining what
cutoff value should be used, so we will rely on the more traditional method for
interpreting Cook's distance, which is to identify cases that either have a score of 1.0 or
higher, or have a Cook's distance substantially different from the others. The
prescribed method for detecting unusually large Cook's distance scores is to create a
scatterplot of Cook's distance scores versus case id.
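The first half of that rule, the 1.0 cutoff, is easy to express in code once the scores have been saved. The sketch below assumes the Cook's distance scores are available as a mapping from case id to score; the case ids and scores shown are hypothetical, not values from this data set. The second helper only lists the largest scores so they can be compared by eye, since "substantially different from the others" remains a judgment call best made from the scatterplot.

```python
def flag_influential(cooks_by_case, cutoff=1.0):
    """Return the case ids whose Cook's distance is at or above the cutoff."""
    return [case_id for case_id, score in cooks_by_case.items() if score >= cutoff]

def largest_scores(cooks_by_case, k=3):
    """Return the k (case id, score) pairs with the largest Cook's distance."""
    ranked = sorted(cooks_by_case.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

# Hypothetical scores for illustration only.
example_scores = {101: 0.02, 102: 1.30, 103: 0.40, 104: 0.03}
print(flag_influential(example_scores))   # cases at or above 1.0
print(largest_scores(example_scores, 2))  # candidates for visual inspection
```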
Slide 35
Compute the Variable to Randomly Split the Sample into Two Halves
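Outside SPSS, the same idea can be sketched in Python: draw a uniform random number for each case and assign the case to one half or the other. This is an assumption about the general technique, not a transcription of the SPSS steps shown on the slides; the function name, the seed, and the 0.5 threshold are illustrative.

```python
import random

def assign_split(n_cases, seed=12345):
    """Return a list of 0/1 split values, one per case.

    Each case is assigned independently, so the two halves are only
    approximately equal in size. The seed makes the split reproducible.
    """
    rng = random.Random(seed)
    return [1 if rng.random() < 0.5 else 0 for _ in range(n_cases)]

split = assign_split(10)
print(split)
```

The model is then estimated on the cases with split = 0 and validated on the cases with split = 1, and the roles are reversed for the second run.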
Slide 45
Slide 46
Slide 47
Slide 48
                                    Full model        Split=0           Split=1
Model Chi-Square                    57.153, p=.0000   54.386, p<.0001   28.867, p=.0109
Nagelkerke R2                       .135              .246              .136
Accuracy rate (learning sample)     63.20%            72.12%            65.80%
Accuracy rate (validation sample)                     56.51%            59.85%

Significant Coefficients (p < 0.05)
Full model: GENHAPPY 'How Happy Generally'; PRESTIGE 'Job Characteristic Prestige';
CONVENIE 'Job Characteristic Convenience'; YEAR 'GSS Year for Respondent'
Split=0: GENHAPPY 'How Happy Generally'; PRESTIGE 'Job Characteristic Prestige';
CONVENIE 'Job Characteristic Convenience'; FAMILSAT 'Family Satisfaction'
Split=1: CONVENIE 'Job Characteristic Convenience'; YEAR 'GSS Year for Respondent';
JCINCOME 'Job Characteristic Income'
Only one predictor variable, CONVENIE 'Job Characteristic Convenience', has a stable,
statistically significant relationship to the dependent variable, Job Satisfaction.
In addition, the accuracy that we should evaluate in assessing our model is in the 56% to
59% range rather than in the 63% to 72% range. At this accuracy rate, the model does
not represent a 25% increase over the proportional by chance accuracy rate.
In sum, we do find a relationship between one of the independent variables and job
satisfaction. Our findings should be regarded as tentative or exploratory rather than
definitive because we would not meet the classification accuracy rate required for a
usable model.
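The stability check behind this conclusion amounts to intersecting the sets of significant predictors across the three analyses. The sketch below assumes the three lists of significant coefficients in the table above belong to the full model, Split=0, and Split=1, in that order; the dictionary keys are illustrative names.

```python
# Significant predictors (p < 0.05) per analysis, as read from the table.
significant = {
    "full_model": {"GENHAPPY", "PRESTIGE", "CONVENIE", "YEAR"},
    "split_0": {"GENHAPPY", "PRESTIGE", "CONVENIE", "FAMILSAT"},
    "split_1": {"CONVENIE", "YEAR", "JCINCOME"},
}

# A predictor is treated as stable only if it is significant in every analysis.
stable = set.intersection(*significant.values())
print(sorted(stable))
# prints: ['CONVENIE']
```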
Tabachnick and Fidell Sample Problem
Slide 49