
CBA Batch 11

PRACTICUM 2 PRESENTATION

DIABETES DATA ANALYSIS FOR READMISSION PREDICTION

Preeti Agarwal – 11810054


Raman Teja Venigalla - 11810027
Executive Summary
A brief understanding of the problem

Primary Hypothesis
The project analyzes healthcare data from 130 US hospitals (FY 1999-2008) to find meaningful insights.
We will also predict readmissions for diabetic patients and identify the important factors for readmission.
The main hypothesis we will test is:
"Are some features or factors from patient history and diagnosis strong indicators of hospital readmission in diabetic patients?"

Motivation
The motivation is to study the factors that are likely responsible for the readmission of diabetes patients.

Means
We used Python packages such as pydotplus and graphviz for visualization and to understand the data better.

Message
Based on our analysis, we have made a few recommendations to both patients and hospitals.

Models
We used Random Forests, Decision Tree and Logistic Regression with two sets of features.

Method
Analysis is done using feature engineering.
Business Problem
The pain-point we intend to solve

Business Context
To understand the behavior, patterns and
factors that lead to the readmission of
diabetes patients.

Business Expectations
We intend to find the patterns in readmission
of diabetes patients and propose preventive
action to reduce medical costs.
Business Decomposition

From the hospital/health center perspective, this analysis could help in procuring medicines, recruiting doctors based on need, and preventing the harmful effects of diabetes.

From a developer/analyst perspective, translating the business expectation into a model is essential. For this we consult domain experts and draw on their opinions and knowledge.

Connecting the dots
Based on the domain knowledge, we have to decide which models are suitable for the analysis, and we also need to validate the predictions in real time.

Problems
The primary roadblock is the availability of data. Data is not available in abundance because not all healthcare centers/hospitals maintain records.

Building a model
With the data collected, we must proceed to build an efficient model to predict diabetes readmission cases.
Data Requirements and Processing
What is the data needed and in what form is it needed for the analysis?

Data Requirement
Data provided by UCI Repository
is used in the analysis.

Data Cleaning
For data cleaning and pre-processing, we imputed values for age by substituting the midpoints of the age buckets and dropped records whose values are unknown or special characters.
We also dropped features that do not provide any meaningful insight.
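
The cleaning step above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code used in the project. It assumes age buckets are written like `[70-80)` and that unknown values appear as `?` or `Unknown/Invalid`; the record fields and sample values are made up for the example.

```python
import re

# Assumed markers for unknown values; the slides do not list them.
UNKNOWN_MARKERS = {"?", "Unknown/Invalid", ""}


def age_midpoint(bucket):
    """Return the midpoint of an age bucket such as '[70-80)', or None."""
    match = re.fullmatch(r"\[(\d+)-(\d+)\)", bucket.strip())
    if match is None:
        return None
    low, high = map(int, match.groups())
    return (low + high) / 2


def clean(records):
    """Drop records with unknown values and replace age buckets by midpoints."""
    cleaned = []
    for rec in records:
        if any(str(value).strip() in UNKNOWN_MARKERS for value in rec.values()):
            continue  # record contains an unknown or placeholder value
        midpoint = age_midpoint(str(rec["age"]))
        if midpoint is None:
            continue  # age bucket could not be parsed
        cleaned.append({**rec, "age": midpoint})
    return cleaned


# Made-up sample records, for illustration only.
sample = [
    {"age": "[70-80)", "gender": "Female"},
    {"age": "[40-50)", "gender": "Unknown/Invalid"},
    {"age": "[50-60)", "gender": "?"},
]
print(clean(sample))
```

With a pandas DataFrame, the same midpoint function could be applied to the age column with `Series.map`.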
Data Understanding
Making sense out of the data

Step 1
Data collection. We
have collected data
from UCI repository.

Data Cleaning
We imputed values for age by substituting the midpoints of the age buckets and dropped incomplete records.
Data Visualization
Python plotting libraries such as graphviz, matplotlib and seaborn were used to visualize the data.

Feature Engineering
New variables were created in the dataset, such as Service Utilization, clubbed diagnosis categories and medication changes, to provide more meaningful insights on the dataset.

Modelling and Inferences
We used five models, namely Random Forests (Gini and Entropy), Decision Tree (Gini and Entropy) and Logistic Regression, with two sets of features and evaluated them.
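
As a rough illustration of the engineered variables named on this slide (Service Utilization, clubbed diagnosis categories, medication changes), the sketch below derives them from a single record. The slides do not give the exact definitions, so everything here is an assumption: the column names, the idea that service utilization is the sum of prior visit counts, the `Up`/`Down` coding for a dose change, and the grouping of ICD-9 codes into a few broad categories.

```python
# Assumed column names; the slides do not list them.
VISIT_COLUMNS = ("number_outpatient", "number_emergency", "number_inpatient")
MEDICATION_COLUMNS = ("insulin", "repaglinide", "chlorpropamide")
CHANGED = {"Up", "Down"}  # assumed coding for a dose change


def diagnosis_category(code):
    """Club an ICD-9 code into a broad category (assumed grouping)."""
    if code.startswith("250"):
        return "Diabetes"
    try:
        value = float(code)
    except ValueError:
        return "Other"  # e.g. codes starting with a letter
    if 390 <= value <= 459:
        return "Circulatory"
    if 460 <= value <= 519:
        return "Respiratory"
    return "Other"


def add_features(rec):
    """Return a copy of the record with the three engineered variables."""
    out = dict(rec)
    out["service_utilization"] = sum(int(rec[c]) for c in VISIT_COLUMNS)
    out["medication_changes"] = sum(rec.get(c) in CHANGED for c in MEDICATION_COLUMNS)
    out["diag_1_category"] = diagnosis_category(str(rec["diag_1"]))
    return out


# Made-up record, for illustration only.
record = {
    "number_outpatient": 1, "number_emergency": 0, "number_inpatient": 2,
    "insulin": "Up", "repaglinide": "No", "chlorpropamide": "Steady",
    "diag_1": "250.83",
}
print(add_features(record))
```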
Modelling, Evaluation & Feedback
How good are the models

Step 1
We analyzed the requirements specific to the healthcare domain.

Step 2
We intend to use two sets of features, i.e. detailed and simple features, to compare the performance of the models.

Step 3
We chose five models with high interpretability for each of the feature sets.

Step 4
The split ratio of train and test data is 80:20.

Step 5
All the models are built using the train data and validated using the test data. The metric scores of the models are compared.

Step 6
Decision trees have good accuracy and better interpretability.
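
The workflow in Steps 2 to 5 could look something like the sketch below, using scikit-learn. It is not the project's code: the data is synthetic (generated with `make_classification`), the "simple" feature set is just the first four columns, the hyperparameters are arbitrary, and accuracy is used as the metric only because the slides do not say which scores were compared.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def build_models():
    """The five models named in the slides; hyperparameters are assumptions."""
    return {
        "random_forest_gini": RandomForestClassifier(
            criterion="gini", n_estimators=50, random_state=0),
        "random_forest_entropy": RandomForestClassifier(
            criterion="entropy", n_estimators=50, random_state=0),
        "decision_tree_gini": DecisionTreeClassifier(
            criterion="gini", max_depth=5, random_state=0),
        "decision_tree_entropy": DecisionTreeClassifier(
            criterion="entropy", max_depth=5, random_state=0),
        "logistic_regression": LogisticRegression(max_iter=1000),
    }


def evaluate(X, y):
    """Fit each model on an 80:20 split and return test accuracy per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0, stratify=y)
    scores = {}
    for name, model in build_models().items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in data; the real analysis used the UCI diabetes dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

results = {
    "detailed": evaluate(X, y),       # all columns
    "simple": evaluate(X[:, :4], y),  # assumed smaller feature subset
}
for feature_set, scores in results.items():
    for name, score in scores.items():
        print(f"{feature_set:9s} {name:22s} accuracy={score:.3f}")
```

For readmission data, where classes are often imbalanced, metrics such as recall or ROC AUC may be more informative than accuracy alone.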
Limitations, Further Work and references
Understanding the constraints and setting goals

Limitations
Finding large data sets is a primary limitation, as data is not recorded for all cases.
Since the health care sector contains personal information (PI), data is also not easily available due to various government and hospital policies.

Further Work
We intend to build an application and collect more data from the patients themselves by providing them health insights as a service.

References
• https://medium.com/
• https://kaggle.com
• https://github.com
Business Recommendation
Helpful insights that are derived from the analysis

For Hospitals
We highly recommend that hospitals decrease the usage of Repaglinide and Insulin for treating diabetic patients, as these appear to increase the odds of readmission. We also recommend the usage of Chlorpropamide, as it appears to decrease the odds.
We also recommend that hospitals avoid transferring patients to the extent possible.

For Patients
We highly recommend that patients follow medical advice and do not leave before the prescribed treatment is completed.
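
One common way to read "increases the odds" or "decreases the odds" from a logistic regression is to exponentiate a feature's coefficient to get an odds ratio. The slides do not state that the recommendations were derived this way, so the sketch below is only an assumption about how such a reading could be done. The feature names and coefficient values are hypothetical and are not results from the analysis.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase in a feature: exp(coefficient)."""
    return math.exp(coefficient)


# Hypothetical coefficients, not taken from the project.
hypothetical_coefficients = {"feature_a": 0.25, "feature_b": -0.40, "feature_c": 0.0}

for name, beta in hypothetical_coefficients.items():
    ratio = odds_ratio(beta)
    if ratio > 1:
        effect = "raises the odds"
    elif ratio < 1:
        effect = "lowers the odds"
    else:
        effect = "leaves the odds unchanged"
    print(f"{name}: odds ratio {ratio:.2f}, {effect} of readmission")
```

An odds ratio above 1 means the feature is associated with higher odds of readmission, and below 1 with lower odds; it describes association in the fitted model, not a causal effect.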
Thank You!
Have a great day!
