April 4, 2019
In [3]: diabetes.describe()
             bmi  diab_pred        age      skin
std     7.884160   0.331329  11.760232  0.628517
min     0.000000   0.078000  21.000000  0.000000
25%    27.300000   0.243750  24.000000  0.000000
50%    32.000000   0.372500  29.000000  0.906200
75%    36.600000   0.626250  41.000000  1.260800
max    67.100000   2.420000  81.000000  3.900600
In [4]: diabetes.shape
In [5]: diabetes.isnull().values.any()
Out[5]: False
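In [6]: def plot_corr(df, size=10):
            # Draw the pairwise correlation matrix of df as a heat map.
            # Assumes `import matplotlib.pyplot as plt` ran in an earlier cell.
            # The default size is an assumption; the original value was lost.
            corr = df.corr()
            fig, ax = plt.subplots(figsize=(size, size))
            ax.matshow(corr)
            plt.xticks(range(len(corr.columns)), corr.columns)
            plt.yticks(range(len(corr.columns)), corr.columns)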
In [7]: plot_corr(diabetes)
In [8]: diabetes.corr()
                   bmi  diab_pred        age       skin  diabetes
num_preg      0.017683  -0.033523   0.544341  -0.081672  0.221898
glucose_conc  0.221071   0.137337   0.263514   0.057328  0.466581
diastolic_bp  0.281805   0.041265   0.239528   0.207371  0.065068
thickness     0.392573   0.183928  -0.113970   1.000000  0.074752
insulin       0.197859   0.185071  -0.042163   0.436783  0.130548
bmi           1.000000   0.140647   0.036242   0.392573  0.292695
diab_pred     0.140647   1.000000   0.033561   0.183928  0.173844
age           0.036242   0.033561   1.000000  -0.113970  0.238356
skin          0.392573   0.183928  -0.113970   1.000000  0.074752
diabetes      0.292695   0.173844   0.238356   0.074752  1.000000
In [10]: diabetes.head()
age diabetes
0 50 True
1 31 False
2 32 True
3 21 False
4 33 True
In [11]: plot_corr(diabetes)
In [12]: diabetes_map = {True:1, False:0}
In [14]: diabetes.head()
age diabetes
0 50 1
1 31 0
2 32 1
3 21 0
4 33 1
In [15]: num_true = len(diabetes.loc[diabetes['diabetes'] == True])
num_false = len(diabetes.loc[diabetes['diabetes'] == False])
print("Number of True Cases: {0} ({1:2.2f}%)".format(num_true, (num_true/(num_true + n
print("Number of False Cases: {0} ({1:2.2f}%)".format(num_false, (num_false/(num_true +
Number of True Cases: 268 (34.90%)
Number of False Cases: 500 (65.10%)
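The same class counts and percentages can also be read off with pandas' `value_counts`. The snippet below is a minimal sketch that uses a stand-in DataFrame built from the counts printed above, not the notebook's actual data; only the column name `diabetes` is taken from the notebook.

```python
import pandas as pd

# Stand-in for the notebook's DataFrame (an assumption): 268 positive and
# 500 negative labels, matching the counts printed above.
diabetes = pd.DataFrame({"diabetes": [1] * 268 + [0] * 500})

# Absolute counts per class and their share of the whole data set in percent.
counts = diabetes["diabetes"].value_counts()
shares = diabetes["diabetes"].value_counts(normalize=True) * 100

for label in (1, 0):
    print("Class {0}: {1} ({2:.2f}%)".format(label, counts.loc[label], shares.loc[label]))
```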
x = diabetes[features].values
y = diabetes[predict].values
split_test_size = 0.30
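The call that actually performs the split is not visible in this export. A minimal sketch of a 70/30 split with scikit-learn's `train_test_split` might look like the following. The random feature matrix, the number of feature columns, `random_state=42` and the `stratify` argument are all assumptions made for illustration; only `split_test_size` and the variable names come from the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data (an assumption): 768 rows, 8 feature columns, and the
# same class counts as the notebook (268 positive, 500 negative).
rng = np.random.RandomState(42)
x = rng.rand(768, 8)
y = np.array([1] * 268 + [0] * 500).reshape(-1, 1)

split_test_size = 0.30

# stratify keeps the class ratio roughly equal in both parts of the split.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=split_test_size, random_state=42, stratify=y.ravel())

print(x_train.shape, x_test.shape)
print("positive share in train: {0:.3f}, test: {1:.3f}".format(
    y_train.mean(), y_test.mean()))
```

With 768 rows this yields 537 training rows and 231 test rows, which agrees with the 231 samples summed in the confusion matrices later in the notebook.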
In [19]: diabetes.columns
x_train = fill_0.fit_transform(x_train)
x_test = fill_0.fit_transform(x_test)
/home/edutech/anaconda3/lib/python3.7/site-packages/sklearn/utils/deprecation.py:58: Deprecation
warnings.warn(msg, category=DeprecationWarning)
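The cell above calls `fit_transform` on both the training and the test set, so the test set is imputed with its own column means. A common alternative is to learn the means on the training data only and reuse them for the test data. The sketch below shows that pattern with `SimpleImputer`, the replacement for the older `Imputer` class in scikit-learn 0.20 and later. The tiny arrays are made-up values, and treating 0 as the missing-value marker is an assumption based on the name `fill_0`.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up example values; 0 stands for "missing" in both columns.
x_train = np.array([[148.0, 72.0], [0.0, 66.0], [183.0, 0.0], [89.0, 64.0]])
x_test = np.array([[0.0, 70.0], [137.0, 0.0]])

# Treat zeros as missing and replace them with the column mean.
fill_0 = SimpleImputer(missing_values=0, strategy="mean")

# Learn the column means from the training data only ...
x_train_filled = fill_0.fit_transform(x_train)
# ... and reuse those same means for the test data.
x_test_filled = fill_0.transform(x_test)

print(fill_0.statistics_)
print(x_test_filled)
```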
1 Naive Bayes - Gaussian
In [23]: from sklearn.naive_bayes import GaussianNB
nb_model = GaussianNB()
nb_model.fit(x_train, y_train.ravel())
print("Classification Report:")
print("{0}".format(metrics.classification_report(y_test, nb_predict_test)))
Confusion Matrix:
[[118 33]
[ 28 52]]
Classification Report:
precision recall f1-score support
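The rows of the classification report did not survive in this export, but the metrics for the positive class can be recomputed from the confusion matrix above. This is a small sketch, assuming scikit-learn's default layout (rows are true labels, columns are predicted labels, class 0 first).

```python
import numpy as np

# Confusion matrix printed above for Gaussian Naive Bayes.
cm = np.array([[118, 33],
               [28, 52]])

# Default scikit-learn layout: [[TN, FP], [FN, TP]].
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)   # how many predicted positives are correct
recall = tp / (tp + fn)      # how many actual positives are found
f1 = 2 * precision * recall / (precision + recall)

print("accuracy={0:.4f} precision={1:.4f} recall={2:.4f} f1={3:.4f}".format(
    accuracy, precision, recall, f1))
```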
2 Random Forest
In [27]: from sklearn.ensemble import RandomForestClassifier
rf_model = RandomForestClassifier(random_state=42)
rf_model.fit(x_train,y_train.ravel())
/home/edutech/anaconda3/lib/python3.7/site-packages/sklearn/ensemble/forest.py:246: FutureWarnin
"10 in version 0.20 to 100 in 0.22.", FutureWarning)
Classification Report
precision recall f1-score support
Accuracy on the training data is 98.70%, whereas on the test data it is 71%. This gap means
that our model is overfitting the training data.
One way to reduce overfitting is regularization. To apply it, the scikit-learn estimators
expose hyperparameters that can be set when the model is created.
Another way to reduce overfitting is cross-validation.
Cross-validation and regularization are not mutually exclusive: both can be used at the
same time.
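To make the two ideas concrete, the sketch below compares an unconstrained random forest with one whose trees are limited in depth and leaf size, and scores both with 5-fold cross-validation on the training data. The synthetic data set and the particular values of `max_depth`, `min_samples_leaf` and `n_estimators` are assumptions chosen for illustration, not settings from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data (an assumption), with some label noise.
X, y = make_classification(n_samples=768, n_features=8, n_informative=4,
                           flip_y=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)

configs = [
    ("unconstrained", {}),
    # Shallow trees with larger leaves act as a form of regularization.
    ("regularized", {"max_depth": 3, "min_samples_leaf": 10}),
]

results = {}
for name, params in configs:
    model = RandomForestClassifier(n_estimators=100, random_state=42, **params)
    # Cross-validated accuracy, estimated on the training data only.
    cv_scores = cross_val_score(model, X_train, y_train, cv=5)
    model.fit(X_train, y_train)
    results[name] = (model.score(X_train, y_train),
                     model.score(X_test, y_test),
                     cv_scores.mean())
    print("{0}: train={1:.3f} test={2:.3f} cv={3:.3f}".format(name, *results[name]))
```

A large difference between the training score and the cross-validated or test score is the symptom described above; the constrained forest usually shows a smaller difference.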
Now we will implement Logistic Regression and observe whether overfitting occurs.
3 Logistic Regression
In [31]: from sklearn.linear_model import LogisticRegression
lr_model = LogisticRegression(random_state=42)
lr_model_predict = lr_model.fit(x_train, y_train.ravel())
lr_predict_test = lr_model.predict(x_test)
Confusion Matrix
[[128 23]
[ 34 46]]
Classification Report
precision recall f1-score support
/home/edutech/anaconda3/lib/python3.7/site-packages/sklearn/linear_model/logistic.py:433: Future
FutureWarning)
Let us vary the regularization hyperparameter C and find the value that gives the
highest recall score.
In [32]: C_start = 0.1
C_end = 5
C_inc = 0.1
C_values, recall_scores=[],[]
C_val = C_start
best_recall_score=0
best_score_C_val = C_values[recall_scores.index(best_recall_score)]
print("1st max value of {0:.3f} occured at C = {1:.3f}".format(best_recall_score,best_s
/home/edutech/anaconda3/lib/python3.7/site-packages/sklearn/linear_model/logistic.py:433: Future
FutureWarning)
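The body of the search loop is missing from the cell above. The following is a minimal sketch of such a sweep over C, reusing the variable names that are visible in the notebook. The synthetic data, the explicit `solver="liblinear"` argument and the exact loop structure are assumptions, not the notebook's original code.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption) with roughly a 65/35 class ratio.
X, y = make_classification(n_samples=768, n_features=8,
                           weights=[0.65, 0.35], random_state=42)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)

C_start, C_end, C_inc = 0.1, 5, 0.1
C_values, recall_scores = [], []
best_recall_score, best_score_C_val = -1.0, None

C_val = C_start
while C_val < C_end:
    # Smaller C means stronger regularization.
    lr_model_loop = LogisticRegression(C=C_val, solver="liblinear",
                                       random_state=42)
    lr_model_loop.fit(x_train, y_train)
    score = recall_score(y_test, lr_model_loop.predict(x_test))
    C_values.append(C_val)
    recall_scores.append(score)
    # Strict ">" keeps the first C that reaches the maximum recall.
    if score > best_recall_score:
        best_recall_score, best_score_C_val = score, C_val
    C_val += C_inc

print("1st max value of {0:.3f} occurred at C = {1:.3f}".format(
    best_recall_score, best_score_C_val))
```

Naming the solver explicitly is one way to avoid warnings about a changing default solver. Note that this loop, like the notebook's, picks C by its score on the test set; selecting it on a separate validation split or by cross-validation gives a less optimistic estimate.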
Even with the best value of the regularization parameter, the recall is only 61.3%.
One of the main reasons is that the classes are not balanced: about 35% of the examples
have diabetes and about 65% do not. This biases the model toward the majority class.
To counter this, LogisticRegression has another hyperparameter, class_weight. Let us
apply it and see whether the result changes.
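To see what `class_weight='balanced'` does, the weights can be computed by hand. scikit-learn's balanced mode uses `n_samples / (n_classes * count_per_class)`, so the rarer class receives the larger weight. The sketch below uses the whole-data-set counts printed earlier (268 and 500) as a stand-in; the weights actually used during fitting depend on the class counts in the training split.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Stand-in labels (an assumption): the class counts of the full data set.
y = np.array([1] * 268 + [0] * 500)
classes = np.array([0, 1])

# Weights as scikit-learn computes them for class_weight="balanced".
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)

# The same formula written out: n_samples / (n_classes * count_per_class).
manual = len(y) / (len(classes) * np.bincount(y))

for label, weight in zip(classes, weights):
    print("class {0}: weight {1:.4f}".format(label, weight))
```

Errors on the positive class therefore count almost twice as much as errors on the negative class, which pushes the model toward higher recall at the cost of more false positives.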
C_val = C_start
best_recall_score = 0
best_recall_score = recall_score
best_lr_predict_test = lr_predict_loop_test
best_score_C_val = C_values[recall_scores.index(best_recall_score)]
print("1st max value of {0:.3f} occured at C = {1:.3f}".format(best_recall_score,best_s
/home/edutech/anaconda3/lib/python3.7/site-packages/sklearn/linear_model/logistic.py:433: Future
FutureWarning)
Here you can see that after balancing the class weights we get a recall score of 73.8%.
In [34]: from sklearn.linear_model import LogisticRegression
lr_model = LogisticRegression(class_weight='balanced', C=best_score_C_val, random_state
lr_model.fit(x_train, y_train.ravel())
lr_predict_test = lr_model.predict(x_test)
Accuracy: 0.7143
Confusion Matrix:
[[106 45]
[ 21 59]]
Classification Report
precision recall f1-score support
/home/edutech/anaconda3/lib/python3.7/site-packages/sklearn/linear_model/logistic.py:433: Future
FutureWarning)
In [36]: lr_cv_predict_test = lr_cv_model.predict(x_test)
Accuracy: 0.7013
Confusion Matrix:
[[108 43]
[ 26 54]]
Classification Report:
precision recall f1-score support
Recall Score:
0.675
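The cell that creates `lr_cv_model` is not part of this export. One plausible reading is that it is scikit-learn's `LogisticRegressionCV`, which picks C by cross-validation on the training data instead of by its score on the test set. The sketch below is written under that assumption; the synthetic data and every constructor argument are illustrative choices, not the notebook's settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption) with roughly a 65/35 class ratio.
X, y = make_classification(n_samples=768, n_features=8,
                           weights=[0.65, 0.35], random_state=42)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)

# Try 10 values of C with 5-fold cross-validation, scored by recall,
# with balanced class weights as in the previous section.
lr_cv_model = LogisticRegressionCV(Cs=10, cv=5, class_weight="balanced",
                                   scoring="recall", max_iter=1000)
lr_cv_model.fit(x_train, y_train)

lr_cv_predict_test = lr_cv_model.predict(x_test)
cv_recall = recall_score(y_test, lr_cv_predict_test)

print("Chosen C: {0:.4f}".format(float(np.ravel(lr_cv_model.C_)[0])))
print("Recall Score: {0:.3f}".format(cv_recall))
```

Because the test set plays no part in choosing C here, the recall measured on it is a fairer estimate of performance on unseen data, even if it comes out lower than the value found by tuning directly against the test set.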