Sei sulla pagina 1di 15

Data Mining Project

(Predict Election Winners)


- By Harshal Kolhatkar
Problem Statement

• An election is to be held in next month, ABC Corporation


a data analytics company wants to predict the future of
two largest parties in country.

• Two major parties are BJP & Congress.

• Goal : Use Polling data to predict state Winner.


Given Dataset

Instance represent a state in a given election

• State : Name of the state

• Year : Election year (2004,2008,2012)

Dependent Variable

• BJP : 1 if BJP won state, 0 if congress won.

Independent Variable

• Times now, India Today : Polled BJP% - Polled Congress%

• DiffCount : Polls with BJP winner – Polls with congress winner

• PropBJP : Polls with BJP winner / # polls


Data Cleaning
 Summary of Polling Data
Data Cleaning – Packages to handle Missing Values

List of R Packages
1. MICE (Multiple Imputation Via Chain Equation)
2. Amelia
3. miss Forest
4. Hmisc
5. mi
Data Cleaning
 Graphical Representation of Missing Value

Before Cleansing After Cleansing


Data Visualization
Mean = 0.2525253 Mean = 0.02858385
Standard Deviation = 14.27238 Standard Deviation = 1.026924

Before Normalizing After Normalizing


Data Visualization
Mean = 0.3838384 Mean = 0.02858385
Standard Deviation = 15.45745 Standard Deviation = 1.026924

Before Normalizing After Normalizing


Data Visualization – Checking Normality
Before Normalizing After Normalizing

Times Now
Data Visualization – Checking Normality
Before Normalizing After Normalizing

India Today
Data Modeling

 Collinearity is a linear association between two explanatory variables.

 Two variables are perfectly collinear if there is an exact linear relationship

between them.
Data Modeling (Using Train & Test)

Years : 2004, 2008, 2012

Train : 2004, 2008

Test : 2012
Data Model ( Logistic regression )
With India Today + Prop BJP With Prop BJP
Data Model ( Logistic regression )
• Train model ( Year 2004,2008)
• Accuracy of model = 94.11%
• Test Model (Year 2012)
• Accuracy of data = 96.77%
Conclusion

• Finally I conclude that the model that I have


made is performing well to predict data of
year 2012.
• So we can use this model to predict the state
winners.

Potrebbero piacerti anche