Illustration
Who will win the next elections, NDA or UPA? Are factors such as location, religion, caste, education, and past voting behavior decisive?
I am concerned about attrition among my customers. Can you help me predict who is likely to drop off in the next six months? What makes them leave? What makes my loyal customers stay with me?
What determines contraceptive practice among ever-married rural women? Their religion? Caste? Education? Awareness of various methods?
Logistic Regression
Useful for predicting an event:
    Victory or Loss
    Will vote or Will not vote
    Adoption or Rejection
Or one of multiple events or things:
    Accept, Reject, Defer
    Hindu, Muslim, Christian
    Doctorate, Postgraduate, Undergraduate
The interest lies in knowing the probability of occurrence of an event or thing.
Single non-metric dependent variable (binary or multichotomous).
Several (more than two) metric (interval or ratio) or non-metric (nominal) independent variables (predictors).
Does not assume (or require) normally distributed data.
Often requires very large samples.
Results in an equation (or set of equations) from which probabilities can be computed and classifications can be made.
Simple linear regression provides the best-fit line, i.e., the straight line that best describes the relationship between the two variables.
[Figure: scatter plots of NEWPROD (vertical axis) against RD (horizontal axis), both axes running from 10 to 50]
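For reference, the best-fit line is usually taken to mean the ordinary least squares line. The notation below is an assumption and does not come from the slides: x stands for the explanatory variable (RD in the plot), y for the response (NEWPROD), and a and b for the intercept and slope.

```latex
\hat{y} = a + b\,x,
\qquad
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x}
```

These are the values of a and b that minimise the sum of squared vertical distances between the observed points and the line.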
We can reasonably assume that failing or passing an exam depends on the number of hours spent studying. Note that in this case the dependent variable takes only two possible values. We will call it a dichotomous variable.
Outcome = 0 if the individual fails the exam
Outcome = 1 if the individual passes the exam
As we are concerned with modelling the probability of the event occurring, this is a probability model. As we model the relation between the number of hours of study and the probability of passing the exam as linear, this is a linear model. We will call this model a Linear Probability Model (LPM).
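Written out, the LPM for this example could look as follows. The symbols are assumptions introduced here for illustration: X_i is the number of hours student i studies, and beta_0 and beta_1 are the intercept and slope to be estimated.

```latex
P(\text{Outcome}_i = 1 \mid X_i) = \beta_0 + \beta_1 X_i
```

Because Outcome only takes the values 0 and 1, its expected value given X_i equals this probability, which is why an ordinary linear regression of Outcome on hours of study can be read as a model of the probability of passing.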
Student id       1   2   3   4   5   6   7   8   9  10  11  12  13  14
Outcome          0   1   0   0   0   1   1   1   0   1   0   1   1   0
Hours of study   -   -   -   -   -   -  26  29  14  58   2  31  26  11
Let us draw a scatter plot and add the regression line:
[Figure: scatter plot of OUTCOME (vertical axis, -0.2 to 1.2) against HSTUDY (horizontal axis, 0 to 60) with the fitted regression line]
The probability of Outcome = 1 can only take values between 0 and 1. However, we do not observe probabilities, only the actual event happening. A straight line will predict values between negative and positive infinity, outside the [0,1] interval!
Above is the SPSS output for the linear regression of Outcome on Hours of Study. The results suggest that each additional hour of study increases the probability of passing the exam, on average, by approximately 0.026, or 2.6 percentage points. So what would the model predict if we studied 100 hours for the exam?
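The 100-hour question can be checked with a few lines of arithmetic. The sketch below uses only the slope quoted in the text (0.026). The intercept is not given here, so the value 0.0 is a placeholder assumption, and the function name lpm_predict is hypothetical.

```python
SLOPE = 0.026  # from the text: change in predicted probability per extra hour of study


def lpm_predict(hours, intercept, slope=SLOPE):
    """Linear probability model prediction: intercept + slope * hours."""
    return intercept + slope * hours


# The fitted intercept is not reported in the text; 0.0 is a placeholder assumption.
ASSUMED_INTERCEPT = 0.0

for hours in (10, 30, 60, 100):
    p = lpm_predict(hours, ASSUMED_INTERCEPT)
    note = "" if 0.0 <= p <= 1.0 else "  (outside [0, 1])"
    print(f"{hours:>3} hours -> predicted 'probability' {p:.2f}{note}")
```

Whatever the true intercept is, the slope alone contributes 0.026 x 100 = 2.6 at 100 hours, so the prediction exceeds 1 unless the intercept is below -1.6. A value above 1 cannot be a probability, which is the weakness of the linear probability model.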
Many functions meet these requirements (non-linearity and being bounded within the [0,1] interval). We will focus on the logistic function. The logistic curve relates the explanatory variable X to the probability of the event occurring. In our example, it relates the number of study hours to the probability of passing the exam.
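A minimal sketch of the logistic curve, assuming the standard form P = 1 / (1 + e^-(b0 + b1*X)). The coefficients B0 and B1 below are hypothetical values chosen only to show the shape of the curve. They are not estimates from the exam data, and the function names are illustrative.

```python
import math


def logistic(z):
    """Logistic function 1 / (1 + e^(-z)), computed in a numerically stable way."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # for negative z, this form avoids overflow in exp
    return ez / (1.0 + ez)


def prob_pass(hours, b0, b1):
    """Probability of passing given study hours, under the assumed logistic model."""
    return logistic(b0 + b1 * hours)


# Hypothetical coefficients for illustration only (not fitted to the exam data).
B0, B1 = -3.0, 0.1

for hours in (0, 10, 30, 60, 100, 1000):
    print(f"{hours:>4} hours -> P(pass) = {prob_pass(hours, B0, B1):.4f}")
```

Unlike the straight line, the predicted value never leaves the [0, 1] interval however many hours are entered: it approaches 1 for very large X and 0 for very small X.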