
Industrial Statistics

MS3001 Advanced Marketing Research


Faculty of Science, University of Colombo

Application of Multivariate Statistical Methods in Marketing Research


Session 3: Logistic Regression

December 25, 2013

Illustration
Who will win the next elections, the NDA or the UPA? Are factors such as location, religion, caste, education and past voting behavior decisive?

I am concerned about attrition among my customers. Can you help me predict who is likely to drop off in the next six months? What makes them attrite? What makes my loyal customers stay with me?

What determines contraceptive practice amongst ever-married women in rural areas? Their religion? Caste? Education? Awareness of various methods?

Logistic Regression


Logistic Regression
Useful in predicting an event:
- Victory or loss
- Will vote or will not vote
- Adoption or rejection

Or one of multiple events or categories:
- Accept, Reject, Defer
- Hindu, Muslim, Christian
- Doctorate, Postgraduate, Undergraduate

The interest is in knowing the probability of occurrence of an event or category.
- A single non-metric dependent variable (binary or multichotomous)
- Several (more than two) metric (interval or ratio) or non-metric (nominal) independent variables (predictors)
- Does not assume (or require) normally distributed data
- Often requires very large samples
- Results in an equation (or set of equations) from which probabilities can be computed and classifications made


Background to the course: What is Logistic Regression?


Logistic regression in a nutshell:
It is a multiple regression with an outcome (or dependent) variable that is categorical and dichotomous, and explanatory variables that can be either continuous or categorical.
In other words, the interest is in predicting which of two possible events is going to happen, given certain other information.
For example, in Political Science, logistic regression could be used to analyse the factors that determine whether or not an individual participates in a general election.


Why can't we use Simple Linear Regression?


Let us remember what we have learnt about Simple Linear Regression:
We used it when we had reasons (a theory) to assume causality between two variables: X → Y. Example:
X = Investment in R&D; Y = New products introduced


Simple Linear Regression


This sort of regression analysis provides us with useful information:
For example, for a certain confidence level (say 95%):
- How much the explained variable (Y) changes as a result of a change in the explanatory variable (X)
- With a regression we can predict the value of Y given the value of X


Simple Linear Regression: How is the impact of X on Y estimated?


- We assumed a linear relation between the two variables
- We introduced u, the unobserved factors affecting Y, which we are not going to account for in our model
- Then we postulated the following relation:

Yi = α + β·Xi + ui


Simple Linear Regression: How is the impact of X on Y estimated?


We made some assumptions about u (basically, that the ui are independently and identically distributed with zero mean and constant variance).

Then we estimated the parameters of the model (generally using Ordinary Least Squares).

Simple Linear Regression provides the best-fit line, i.e. the straight line that best describes the relationship between the two variables.
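To make the estimation step concrete, below is a minimal sketch of an OLS fit in Python. The data points are hypothetical and purely illustrative; the point is only that the fit returns estimates of the intercept α and slope β.

    # A minimal OLS sketch in Python. The data are hypothetical, standing in
    # for X = investment in R&D and Y = new products introduced.
    import numpy as np

    x = np.array([100, 250, 400, 550, 700])   # hypothetical R&D investment
    y = np.array([4, 12, 21, 27, 35])         # hypothetical new products

    # np.polyfit with degree 1 performs an ordinary least squares fit
    # and returns the slope and intercept of the best-fit line.
    beta, alpha = np.polyfit(x, y, 1)
    print(f"Best-fit line: Y = {alpha:.3f} + {beta:.3f} * X")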


Our example: R&D and New Products


How does investment in R&D affect the number of new products developed? We can postulate the following relation:

# of new products = α + β · Investment in R&D + u


Let us look at the scatter plot:

[Scatter plot: NEWPROD (number of new products, 0 to 50) against RD (investment in R&D, 0 to 800)]


Our example: Investment in R&D and the introduction of new products


It makes sense to assume a linear relation between X and Y in this case. The estimate for β is 0.049. This tells us that in order to increase the number of new products by one unit, we need to invest a little more than 20 monetary units in R&D. If a company invests 1000 in R&D, we would predict it to develop around 49 new products (a quick arithmetic check follows the plot below).
[Scatter plot with fitted regression line: NEWPROD against RD]
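The arithmetic behind those two statements, using only the estimated slope of 0.049 (the intercept is treated as negligible, which is what the prediction of around 49 products implies):

    # Quick check of the slide's claims, using the estimated slope of 0.049.
    # The intercept is assumed to be roughly zero, as the prediction of ~49 implies.
    beta = 0.049

    print(1 / beta)      # ~20.4: R&D needed for one additional new product
    print(beta * 1000)   # 49.0: predicted new products for an investment of 1000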


Another example: Failing or Passing an exam


Let us define a variable Outcome

We can reasonably assume that failing or passing an exam depends on the number of hours we spend studying. Note that in this case the dependent variable takes only two possible values. We will call it a dichotomous variable.

Outcome = 0 if the individual fails the exam
Outcome = 1 if the individual passes the exam


Regression analysis with dichotomous dependent variables


We will then be interested in inference about the probability of passing the exam.
Were we to use linear regression, we would postulate:

Prob(Outcome = 1) = α + β · Quantity of hours of study + u

As we are concerned with modelling the probability of the event occurring, this is a probability model. As we model the relation between the quantity of hours of study and the probability of passing the exam as linear, this is a linear model. We will call this model a Linear Probability Model (LPM).


Linear Probability Models (LPM)


Our dataset contains information about 14 students. Our statistical software (SPSS) will happily perform a linear regression of Outcome on the quantity of study hours (a Python sketch of the same fit follows the table below).

Student id   Outcome   Quantity of Study Hours
1            0         3
2            1         34
3            0         17
4            0         6
5            0         12
6            1         15
7            1         26
8            1         29
9            0         14
10           1         58
11           0         2
12           1         31
13           1         26
14           0         11
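For readers working in Python rather than SPSS, a sketch of the same fit (using the statsmodels package, assuming it is installed) is shown below; its intercept and slope should match the SPSS coefficients reported on the next slide.

    # A sketch of the linear probability model fitted in Python rather than SPSS,
    # using the 14 observations from the table above.
    import numpy as np
    import statsmodels.api as sm

    hours = np.array([3, 34, 17, 6, 12, 15, 26, 29, 14, 58, 2, 31, 26, 11])
    outcome = np.array([0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0])

    X = sm.add_constant(hours)     # adds the intercept term
    lpm = sm.OLS(outcome, X).fit()
    print(lpm.params)              # intercept and slope for hours of study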


Linear Probability Models (LPM): What is wrong with them?


Let us do a scatter plot and insert the regression line:
- The probability of Outcome = 1 can take values between 0 and 1
- But we do not observe probabilities, only the actual event happening
- A straight line will predict values between negative and positive infinity, outside the [0,1] interval!

[Scatter plot with fitted regression line: OUTCOME (-0.2 to 1.2) against HSTUDY (hours of study, 0 to 60)]


What is wrong with LPM?


Coefficients (Dependent Variable: OUTCOME)

Model            Unstandardized Coefficients        Sig.
                 B            Std. Error
1  (Constant)    -0.031861    0.161591              0.846994
   HSTUDY         0.026219    0.006483              0.001627

Above is the SPSS output for the linear regression of Outcome on hours of study. The results suggest that an increase of 1 hour of studying increases the probability of passing the exam, on average, by approx. 0.026, or 2.6%. So what would the model predict if we studied 100 hours for the exam?
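Plugging 100 hours into the fitted equation makes the problem explicit (a one-line computation using the coefficients above):

    # Predicted "probability" of passing after 100 hours of study,
    # using the LPM coefficients from the SPSS output above.
    intercept, slope = -0.031861, 0.026219
    print(intercept + slope * 100)   # ~2.59 -- far above 1, not a valid probability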


Linear Probability Models (LPM): What is wrong with them?


Basically, the linear relation we had postulated before between X and Y is not appropriate when our dependent variable is dichotomous. Predictions for the probability of the event occurring would lie outside the [0,1] interval, which is unacceptable.


Non Linear Probability Models


We want to be able to model the probability of the event occurring with an explanatory variable X, but we want the predicted probability to remain within the [0,1] bounds.
There should also be a threshold above which the probability hardly increases in reaction to changes in the explanatory variable.

Many functions meet these requirements (non-linearity and being bounded within the [0,1] interval). We will focus on the logistic function. The logistic curve relates the explanatory variable X to the probability of the event occurring. In our example, it will relate the number of study hours to the probability of passing the exam.
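A minimal sketch of the logistic function itself, showing that its output can never leave the (0, 1) interval, however extreme the input:

    # The logistic (sigmoid) function: F(z) = 1 / (1 + exp(-z)).
    # Its output always lies strictly between 0 and 1.
    import numpy as np

    def logistic(z):
        return 1.0 / (1.0 + np.exp(-z))

    print(logistic(-10), logistic(0), logistic(10))   # ~0.000045, 0.5, ~0.999955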


The Logit Model


A Logit Model states that:
Prob(Yi = 1) = F(α + β·Xi)
Prob(Yi = 0) = 1 - F(α + β·Xi)

where F(·) is the Logistic Function, F(z) = 1 / (1 + e^(-z)). So the probability of the event occurring is a logistic function of the independent variables.
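As an illustration, here is a sketch of the logit model fitted (with statsmodels, assuming it is available) to the same 14 students used for the LPM. With only 14 observations the estimates are illustrative at best, but every predicted probability now stays inside [0, 1], even for 100 hours of study.

    # A sketch of the logit model, Prob(pass) = F(alpha + beta * hours),
    # fitted to the same 14 students used for the linear probability model.
    import numpy as np
    import statsmodels.api as sm

    hours = np.array([3, 34, 17, 6, 12, 15, 26, 29, 14, 58, 2, 31, 26, 11])
    outcome = np.array([0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0])

    logit = sm.Logit(outcome, sm.add_constant(hours)).fit()

    # Predicted probabilities of passing after 10, 25 and 100 hours of study;
    # unlike the LPM, all predictions lie inside the [0, 1] interval.
    new_hours = sm.add_constant(np.array([10.0, 25.0, 100.0]))
    print(logit.predict(new_hours))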

