Regression With Categorical Variables

Regression With
Categorical Variables
Overview
 Regression with Categorical Predictors
 Logistic Regression
Regression with a Categorical
Predictor Variable
 Recall that predictor variables must be
quantitative or dichotomous.
 Categorical variables that are not
dichotomous can be used, but first they must
be recoded to be either quantitative or
dichotomous.
Ways to Code a
Categorical Variable
 Dummy Coding
 Effect Coding
 Orthogonal Coding
 Criterion Coding
Dummy Coding
 Test for the overall effect of the predictor
variable
 1 indicates being in that category and 0
indicates not being in that category; need one
fewer dummy variables than categories
Dummy Coding Example
 We are trying to predict happiness rating
using region of the country as a predictor
variable
 Three regions:
 Northeast
 Southeast
 West
Dummy Coding Example
 Dummy Variable 1:
Northeast = 1 Southeast = 0 West = 0
 Dummy Variable 2:
Northeast = 0 Southeast = 1 West = 0
 We don’t need a third dummy variable,
because West is indicated by 0’s on both
dummy variables
Effect Coding
 Compare specific categories to each other
 Use weights to indicate the intended contrast
Orthogonal Coding
 Same as Effect Coding, except that the
contrasts are orthogonal to each other
 You can do a maximum of k-1 orthogonal
contrasts, where k is the number of
categories
Criterion Coding
 Overall relationship between predictor and
criterion variable
 Each individual is assigned the mean score
of the category
Review!
I want to predict level of course satisfaction
from whether each student is a commuter or
not. How could I recode the predictor variable?
Review!
What is an advantage of criterion coding
compared to dummy coding?
Regression with a Categorical
Outcome Variable
 Logistic Regression
 The outcome variable (Y) indicates whether
or not the individual falls in a particular
category
 0 = not in category
 1 = in category
Why is it Logistic?
 One of the assumptions for linear regression
is a linear relationship between X and Y
 When Y is categorical, it can’t have a linear
relationship with X
 A logarithmic transformation can make the
relationship appear linear
Logistic Regression Methods
 Similar to options for linear Multiple
Regression
 Use hierarchical/forced entry to test a theory
 Use stepwise (backward or forward) to
search for the best fitting model
Evaluating the Model
 Log-likelihood statistic measures amount of
unexplained data
 Compare model to baseline model
 Baseline model: predict that everyone will be
in the category that is most frequent
 Is there significant improvement in
prediction?
 -2LL is the log-likelihood statistic multiplied by
-2 so that it yields a c2 distribution and
significance can be determined
 Model chi-square indicates the difference
between -2LL with predictor(s) and -2LL in
the baseline model
 Significant model c2 means that the model is
helping to predict the outcome variable
 When there are multiple steps in the analysis,
the step c2 indicates whether there was
improvement in the model from the previous
step
 Positive value of R means that increases in X
(or combination of X variables) are
associated with increased probability of the
case being in the category (Y = 1)
 Nagelkerke’s R2 can be interpreted similar to
R2 in linear Multiple Regression
Evaluating Predictor Variables
 B is the regression coefficient
 The Wald Statistic indicates whether B is
significantly different from 0
B
Wald 
SE
Evaluating Predictor Variables
Exp (B) is the change in odds that the case
will be in the category from a one-unit change
in X
prob. (case in category)
odds 
prob. (case not in category)
odds after 1 - unit change in X

Exp(B) 
original odds
Review!
I am predicting whether or not you want to take
another statistics class from your grade in the
current class. I find that 16 out of 50 students
want to take another statistics class. What
would my baseline model predict?
Review!
I am predicting whether or not you want to take
another statistics class from your grade in the
current class. Explain what an Exp(B) of 1.5
would indicate.
Review!
TRUE OR FALSE?
If the model c2 is significant, it indicates that
there is less unexplained data when the
predictor variables are included than when the
predictors are not included.
Review!
TRUE OR FALSE?
If I use hierarchical entry and enter all of the
predictors at once, the model c2 will be the
same as the step c2.
Reporting a Logistic
Regression
We conducted a logistic regression to predict
likelihood of voting from age, education, and
TV watching. The model explained a
significant portion of variance, c2 (3) =
196.61, p < .001, Nagelkerke R2 = .18. As
shown in Table 1, all three variables were
significant predictors of voting behavior.
Choosing Stats
College students are asked to indicate whether
they have Facebook accounts or not and
whether they have engaged in binge drinking in
the last month or not. The researcher
hypothesizes that those with Facebook
accounts are more likely to have engaged in
binge drinking.

Regression With Categorical Variables

Caricato da

Informazioni sul documento

Descrizione originale:

Titolo originale

Copyright

Formati disponibili

Condividi questo documento

Condividi o incorpora il documento

Opzioni di condivisione

Hai trovato utile questo documento?

Questo contenuto è inappropriato?

Copyright:

Formati disponibili

Regression With Categorical Variables

Caricato da

Copyright:

Formati disponibili

Regression With

odds after 1 - unit change in X

Potrebbero piacerti anche