
MGTSC 645 Shivani Gupta

Assignment 2 1646112

Decision Tree:

The bank data contains several fields for each individual, including age, balance, job, education and
deposit. The decision tree portion of the assignment is written in Python.

The mean age for the dataset is: 41.232

The number of observations is: 11162

The number of individuals with age less than 65 is: 10737
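
As an illustration of how these three values can be computed, here is a minimal pandas sketch. The small `bank` DataFrame is a hypothetical stand-in for the real bank data, so its numbers do not match the results above.

```python
import pandas as pd

# Hypothetical miniature stand-in for the bank data (not the real dataset).
bank = pd.DataFrame({
    "age": [25, 41, 67, 38, 70, 52],
    "balance": [120, 2500, 900, 0, 4300, 310],
})

mean_age = bank["age"].mean()                # average of the age column
n_obs = len(bank)                            # number of rows (observations)
n_under_65 = int((bank["age"] < 65).sum())   # rows with age below 65

print(f"The mean age for the dataset is: {mean_age:.3f}")
print(f"The number of observations is: {n_obs}")
print(f"The number of individuals with age less than 65 is: {n_under_65}")
```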

The image below shows the output for the three values above:

After cleaning the data by removing all rows with unknown cells, the number of observations left is: 2675
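
A minimal sketch of this cleaning step is shown below. It assumes that missing values are stored as the literal string "unknown" and that a row is dropped if any of its cells is unknown. The toy DataFrame and its column values are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows; the real dataset has many more columns and rows.
bank = pd.DataFrame({
    "job": ["student", "unknown", "retired", "management"],
    "education": ["primary", "tertiary", "unknown", "secondary"],
    "poutcome": ["success", "failure", "other", "unknown"],
    "balance": [100, 2000, 350, 2700],
})

# Flag every row in which at least one cell holds the string "unknown".
has_unknown = bank.isin(["unknown"]).any(axis=1)

# Keep only the rows without unknown cells.
clean = bank[~has_unknown].reset_index(drop=True)

print("Observations before cleaning:", len(bank))
print("Observations after cleaning:", len(clean))
```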

The image below shows the output of the code after data cleaning:

After data cleaning, dummy variables are created for the columns Marital, Education, Housing, Loan,
Contact, Poutcome and Deposit, such that the number of dummy variables for each column is one
less than the number of categories in that column. For example, Marital has three possible values:
divorced, single and married. Therefore, two dummy variables are created for the Marital categorical variable.

This step brings the dataset to a total of 20 columns. A sample of the dataset is shown in the output
image below:

When a categorical variable is converted into dummy variables, a variable with k possible values needs
only (k-1) dummy variables to be completely defined, because the remaining category is implied when
all of its dummy variables are 0. Therefore, one dummy variable for each column is dropped.
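
The (k-1) rule can be sketched with `pandas.get_dummies` and `drop_first=True`. The toy data and lowercase column names are assumptions for illustration, and this may not be the exact call used in the assignment code.

```python
import pandas as pd

# Hypothetical toy rows with one 3-level and one 2-level categorical column.
df = pd.DataFrame({
    "marital": ["married", "single", "divorced", "single"],
    "housing": ["yes", "no", "yes", "yes"],
    "balance": [100, 2000, 350, 2700],
})

# drop_first=True drops one level per column, leaving k-1 dummies each.
encoded = pd.get_dummies(df, columns=["marital", "housing"], drop_first=True)

print(list(encoded.columns))
print(encoded)
```

Here "divorced" and "no" become the baseline levels: a divorced individual is the row in which both marital dummies are 0.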

The decision tree for the bank dataset will look like this:

Balance?
    Low: Own a house? (No / Yes)
    Medium: Education? (Primary / Secondary / Tertiary)
    High (>$2500): Marital Status? (Married / Single / Divorced)

Further splits lower in the tree:
    Job? (Management / Student / Retired)
    Any loan? (Yes / No)
    Payment Days? (<100 / >100)

Then the data is split for training and testing with 30% of the data to be used for testing the model.
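
A 70/30 split of this kind can be sketched with sklearn's `train_test_split`. The arrays below are hypothetical placeholders, and `random_state` and `stratify` are assumptions added only to make the sketch reproducible and balanced.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical placeholder data: 100 observations, 2 features, binary target.
X = np.arange(200).reshape(100, 2)
y = np.array([0, 1] * 50)

# test_size=0.3 holds out 30% of the rows for testing the model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print("Training rows:", X_train.shape[0])
print("Testing rows:", X_test.shape[0])
```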

The decision tree is then built using the sklearn DecisionTreeClassifier.

The decision tree model learns from 70% of the data (the training set) and takes into account the
columns age, marital, education, balance, housing, loan, contact, day, month, duration, campaign,
pdays, previous, poutcome and deposit. It then returns the confusion matrix. The confusion matrix
displays the number of observations for which the model's prediction matched the actual data, as
well as the number of observations for which the model predicted differently.
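
The workflow described above can be sketched as follows. The data is synthetic (random features with a made-up binary target), so the printed matrix has nothing to do with the assignment's results. The `random_state` values are assumptions for reproducibility.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded bank data (not the real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Train on 70% of the rows, test on the remaining 30%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)
y_pred = tree.predict(X_test)

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(cm)
```

The diagonal entries of `cm` count the test observations that were predicted correctly, and the off-diagonal entries count those that were predicted differently.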

Based on that, the classification report is generated, which gives the accuracy, precision, f1-score and
support values for the model.
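
To show how these metrics follow from a confusion matrix, here is a small worked sketch in plain Python. The four counts are hypothetical and are not taken from the assignment's output.

```python
# Hypothetical confusion-matrix counts (not the assignment's results).
tn, fp, fn, tp = 50, 5, 10, 35

total = tn + fp + fn + tp

accuracy = (tp + tn) / total                         # share of correct predictions
precision = tp / (tp + fp)                           # correct among predicted positives
recall = tp / (tp + fn)                              # correct among actual positives
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two
support = tp + fn                                    # number of actual positives

print(f"accuracy={accuracy:.4f} precision={precision:.4f}")
print(f"recall={recall:.4f} f1={f1:.4f} support={support}")
```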

The output image below shows the confusion matrix and the classification report.

The accuracy of the model is: 99.13%

The precision of the model is: 99.38%


Neural Network

For the neural network portion, the following variables are used to predict default: education, job,
balance, loan, deposit and housing. The structure of the neural network is as follows:

Input Layer: Education, Job, Balance, Loan, Deposit, Housing
Hidden Layer: 3 nodes (1, 2, 3)
Output Layer: Default
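
The shape of this network (6 inputs, 3 hidden nodes, 1 output) can be sketched as a forward pass in NumPy. The sigmoid activation, the random untrained weights and the random input rows are all assumptions for illustration. This is not the assignment's trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes: education, job, balance, loan, deposit, housing -> 3 hidden -> default
n_inputs, n_hidden, n_outputs = 6, 3, 1

# Untrained, randomly initialised weights and zero biases (assumption).
W1 = rng.normal(scale=0.5, size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, n_outputs))
b2 = np.zeros(n_outputs)


def sigmoid(z):
    """Squash values into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(x):
    """Pass inputs through the hidden layer and then the output layer."""
    hidden = sigmoid(x @ W1 + b1)
    return sigmoid(hidden @ W2 + b2)


# Five hypothetical individuals, each described by six numeric inputs.
x = rng.normal(size=(5, n_inputs))
p_default = forward(x)
print(p_default.shape)
```

Each row of `p_default` is a value between 0 and 1 that a trained network would interpret as the predicted probability of default.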

The output of the code is shown in the image below:

The accuracy of this model is: 98.45%

The accuracy of the neural network (98.45%) is a little lower than the accuracy of the decision tree
(99.13%). This is because the neural network code uses only a limited number of variables to predict
default, whereas the decision tree considers all the variables.
