
Prediction of Heart Disease Using a Hybrid Technique in Data Mining Classification


Ankita Dewan
ITM University
Gurgaon, INDIA
ankita.dewan91@gmail.com

Meghna Sharma
Assistant Professor, ITM University
Gurgaon, INDIA
meghnasharma@itmindia.edu

Abstract - Heart disease prediction is treated as one of the most complicated tasks in the field of medical sciences. Thus there is a need to develop a decision support system for detecting heart disease in a patient. In this paper, we propose an efficient hybrid of the genetic algorithm and the back propagation technique for heart disease prediction. Today the medical field has come a long way in treating patients with various kinds of diseases. Among the most threatening is heart disease, which cannot be observed with the naked eye and strikes suddenly once its limits are reached. Bad clinical decisions can cause the death of a patient, which no hospital can afford. To achieve correct and cost-effective treatment, computer-based decision support systems can be developed to help make good decisions. Many hospitals use hospital information systems to manage their healthcare or patient data. These systems produce huge amounts of data in the form of images, text, charts and numbers. Sadly, this data is rarely used to support medical decision making. There is a wealth of hidden information in this data that is not yet explored, which raises an important question: how can useful information be extracted from the data? So there is a need for a system that will help practitioners predict heart disease before it occurs. The main objective of this paper is to develop a prototype that can determine and extract unknown knowledge (patterns and relations) related to heart disease from past heart disease database records. It can answer complicated queries for detecting heart disease and thus assist medical practitioners in making smart clinical decisions that traditional decision support systems were not able to make. By enabling efficient treatment, it can help reduce the cost of treatment.

Keywords – BP Neural Network, Data Mining, Genetic Algorithm, Heart Disease Prediction.

I. INTRODUCTION

Heart disease is one of the most common causes of death in India and other Asian countries. In 2003, approximately 17.3 million people died worldwide, of whom 10 million died of coronary heart disease alone. Along with changing lifestyles, there are many factors, such as smoking, alcohol, obesity, high blood pressure and diabetes, that are responsible for the risk of developing a heart problem. However, with recent studies and the introduction of artificial intelligence into medical sciences, we can actually help prevent such diseases. To support good decisions, machine learning helps extract relevant data from the huge databases available in hospitals. There are many classification techniques, such as K-nearest neighbor and decision trees like CART, C4.5, CHAID, J48 and the ID3 algorithm, but these are comparatively weak classifiers that need the help of bagging and boosting techniques to improve their performance.

In this paper, the various techniques that have been applied to the prediction or classification of heart disease are discussed, and a hybrid methodology is proposed that can be implemented in future work to achieve an accuracy close to 100%, i.e., with minimal error. The proposed system will be implemented in MATLAB R2012a.

II. DATA MINING TECHNIQUES

Data mining techniques are helpful in extracting and analyzing complicated medical data. Practitioners of medical science also use these mining techniques in other fields, such as the detection of cancer and stroke. Researchers have been applying various machine learning techniques, such as Artificial Neural Networks, the BP (Back-Propagation) algorithm, and genetic algorithms for optimization purposes.

[1] One system used Back-Propagation in a neural network, which is regarded as the best prediction algorithm when there is a non-linear relationship between the data and the target output. The characteristics of the BP algorithm are that it is adaptive and tolerant of noisy data and other outliers present in medical data.
The steps are as follows:
1. The normalized data is fed into the network and the corresponding result is computed.
2. The actual and computed results are compared to find the error.
3. According to the error, the weights and biases are adjusted.
4. If the error exceeds the tolerance, step 1 is repeated; otherwise, the process is terminated.
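The four steps above can be sketched as a plain batch gradient-descent training loop. This is a minimal illustration under stated assumptions, not the cited system. It assumes a 3-4-1 network with a tanh hidden layer and a linear output. The tanh function is numerically the same as MATLAB's tansig, and the linear output corresponds to purelin. It uses ordinary gradient descent rather than Levenberg-Marquardt. The weekly series, learning rate, tolerance and epoch limit are hypothetical values chosen only for illustration.

```python
import math
import random


def init_weights(n_in, n_hidden, rng):
    """Random starting weights in [-0.5, 0.5] for an n_in-n_hidden-1 network."""
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = rng.uniform(-0.5, 0.5)
    return w1, b1, w2, b2


def forward(x, w1, b1, w2, b2):
    """Hidden layer uses tanh (same function as tansig); output is linear (purelin)."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    y = sum(w * hj for w, hj in zip(w2, h)) + b2
    return h, y


def mse(data, params):
    return sum((t - forward(x, *params)[1]) ** 2 for x, t in data) / len(data)


def train_bp(data, params, lr=0.05, tolerance=1e-3, max_epochs=2000):
    # Copy so the caller's starting weights are not modified.
    w1 = [row[:] for row in params[0]]
    b1, w2, b2 = params[1][:], params[2][:], params[3]
    n = len(data)
    for epoch in range(1, max_epochs + 1):
        gw1 = [[0.0] * len(row) for row in w1]
        gb1 = [0.0] * len(b1)
        gw2 = [0.0] * len(w2)
        gb2 = 0.0
        for x, t in data:
            h, y = forward(x, w1, b1, w2, b2)          # step 1: compute the result
            err = y - t                                # step 2: compare with the target
            gb2 += err
            for j, hj in enumerate(h):
                gw2[j] += err * hj
                delta = err * w2[j] * (1.0 - hj * hj)  # back-propagate through tanh
                gb1[j] += delta
                for i, xi in enumerate(x):
                    gw1[j][i] += delta * xi
        # step 3: adjust weights and biases along the averaged negative gradient
        for j in range(len(w2)):
            w2[j] -= lr * gw2[j] / n
            b1[j] -= lr * gb1[j] / n
            for i in range(len(w1[j])):
                w1[j][i] -= lr * gw1[j][i] / n
        b2 -= lr * gb2 / n
        # step 4: stop once the error is within tolerance, else repeat from step 1
        error = mse(data, (w1, b1, w2, b2))
        if error <= tolerance:
            break
    return (w1, b1, w2, b2), epoch, error


# Hypothetical, already-normalized weekly series: 3 past weeks -> next week.
series = [0.1, 0.3, 0.2, 0.5, 0.4, 0.6, 0.8, 0.7, 0.9]
data = [(series[k:k + 3], series[k + 3]) for k in range(len(series) - 3)]

params = init_weights(3, 4, random.Random(0))
before = mse(data, params)
trained, epochs, after = train_bp(data, params)
print(f"MSE {before:.4f} -> {after:.4f} after {epochs} epochs")
```

Plain gradient descent was chosen here only because it maps one-to-one onto steps 1 to 4. Levenberg-Marquardt follows the same loop but uses a different weight-update rule in step 3.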

978-9-3805-4416-8/15/$31.00 ©2015 IEEE
In this system, data from the previous 3 weeks is taken as input and the data for the following week is produced as output. So the number of input neurons is 3 and the number of output neurons is 1. To determine the number of neurons in the hidden layer, various rules of thumb were applied; the resulting range was 2 to 8, and 4 was chosen as the optimum. So the system uses a 3-layer feed-forward network, with the tansig() activation function between the input and hidden layers and purelin() between the hidden and output layers. The Levenberg-Marquardt training algorithm was chosen, as it was considered the best algorithm, giving fast convergence and a high hit rate. The absolute error shown by this model is 25% for weekly price prediction, so its accuracy reaches 75%, as shown in Fig. 1.

Fig. 1. Weekly Price Prediction

[2] Chaitrali and Sulabha (2012) performed a comparative analysis of three data mining classification techniques, namely Naive Bayes, decision trees and Neural Networks, on a heart disease database using the data mining tool Weka 3.6.6. The dataset is taken from the UCI repository and includes 13 attributes, such as sex, blood pressure and cholesterol. Two more attributes, obesity and smoking, were added.

Data mining techniques for prediction:

A. Neural Network
An Artificial Neural Network, as discussed before, is a mathematical model of our biological neural network. It consists of 3 layers (an input layer, a hidden layer and an output layer) with interconnecting weights. The output is calculated as a function of the weighted inputs:
$$O_j = f\left(\sum_i W_{ji} X_i\right)$$
where $O_j$ is the output neuron, $X_i$ is the input neuron, $W_{ji}$ is the weight connecting $X_i$ and $O_j$, and $f$ is a sigmoidal function.

B. Decision Trees
There are many decision tree algorithms; among them, the most popular is J48, which uses a pruning technique to build a good decision tree. Pruning tries to eliminate branches that overfit the training data, i.e., branches that are not relevant to making a decision and lead to poor prediction. The result is a tree that balances flexibility and accuracy.

C. Naive Bayes
This classifier is based on Bayes' theorem and uses conditional independence, which states that, given the class, the value of an attribute does not depend on the values of the other attributes.

All 3 algorithms were applied to a dataset consisting of 303 records for training and 270 for testing. In preprocessing, the most common Weka tool, the Replace Missing Values filter, was used. From the results in Fig. 2, it can be concluded that the Neural Network provides better results in a non-linear application such as heart disease prediction.

Fig. 2. Graphical Representation of Accuracy for Each Method

[3] K. Srinivas (2010) made a comparative analysis of popular data mining techniques, namely decision trees, Naive Bayes and Neural Networks, for classifying a heart disease dataset. Among decision trees, C4.5 is the most common. The tree built using C4.5 consists of 3 predictor attributes (Age, Gender and Intensity of symptoms) and one goal attribute (whether or not the patient has the disease). The neural network is a multilayer feed-forward network consisting of 20 input nodes (the attributes), 10 hidden nodes chosen by trial and error, and 10 output nodes representing a range of diseases.
Bayesian Network - The major benefit of a Bayesian Network is that it requires less information, as it builds on pre-existing understanding of the dependencies among the system's variables.
Support Vector Machine - An SVM performs classification by maximizing the margin separating the two classes while minimizing the classification error.

Fig. 3. Comparison of Four Models, Neural Network (MLP) Providing Highest Accuracy
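The conditional-independence assumption described under C. Naive Bayes can be illustrated with a small categorical classifier. Each attribute contributes its own factor P(x_i | c), so a class's score is its log prior plus a sum of per-attribute log-probabilities. This is a sketch, not Weka's implementation used in [2]. The three attributes, the six records and the add-one (Laplace) smoothing are hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(records, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (attribute index, class) -> Counter of values
    domains = defaultdict(set)            # attribute index -> set of observed values
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def predict_nb(model, record):
    """Return the class maximizing log P(c) + sum_i log P(x_i | c)."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for label, n_c in class_counts.items():
        score = math.log(n_c / total)
        for i, value in enumerate(record):
            # Conditional independence: each attribute contributes its own factor.
            # Laplace (add-one) smoothing avoids log(0) for unseen values.
            count = value_counts[(i, label)][value]
            score += math.log((count + 1) / (n_c + len(domains[i])))
        if score > best_score:
            best_class, best_score = label, score
    return best_class


# Hypothetical records: (smoking, obesity, high blood pressure)
records = [
    ("yes", "yes", "yes"), ("yes", "no", "yes"), ("no", "yes", "yes"),
    ("no", "no", "no"), ("yes", "no", "no"), ("no", "no", "yes"),
]
labels = ["disease", "disease", "disease", "healthy", "healthy", "healthy"]

model = train_nb(records, labels)
print(predict_nb(model, ("yes", "yes", "yes")))  # disease
print(predict_nb(model, ("no", "no", "no")))     # healthy
```

Working in log space turns the product of independent factors into a sum, which avoids numeric underflow when there are many attributes.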

2015 2nd International Conference on Computing for Sustainable Global Development (INDIACom) 705
On the basis of the experimental chart in Fig. 3, we can say that Neural Network and Naive Bayes perform much better than the other two.

Fig. 4. Algorithm for Proposed Methodology

III. PROPOSED METHODOLOGY

It is observed that the best classification technique for our domain is Back Propagation, since it is particularly well suited to nonlinear relationships. But because of its drawback of getting stuck in local minima, we are not able to obtain the full benefit of this technique. To solve this problem, we can use an optimizer, the Genetic Algorithm, which applies mutation and crossover over successive generations. The weights used by BP can first be optimized by the GA and then given as input to our network to give much better results, as shown in Fig. 4.

IV. CONCLUSION

It can be concluded that the neural network is the best among these classification techniques for the prediction or classification of non-linear data. The BP algorithm, the best classifier among Artificial Neural Networks, which updates the weights by propagating the errors backward, can be used. However, it has the drawback of getting stuck in a local minimum, so to solve this problem we can use an efficient optimization technique to further improve its accuracy and apply it to predictions in various applications.

REFERENCES
[1] Chaitrali S. Dangare, Sulabha S. Apte, “Improved Study of Heart Disease Prediction System using Data Mining Classification Techniques”, International Journal of Computer Applications (0975-888), Vol. 47, No. 10, June 2012.
[2] K. Srinivas, Dr. G. Raghavendra Rao, Dr. A. Govardhan, “Analysis of Coronary Heart Disease and Prediction of Heart Attack in Coal Mining Regions Using Data Mining Techniques”, The 5th International Conference on Computer Science & Education, Hefei, China, August 24–27, 2010.
[3] Niti Guru, Anil Dahiya, Navin Rajpal, “Decision Support System for Heart Disease Diagnosis Using Neural Network”, Delhi Business Review, Vol. 8, No. 1, January–June 2007.
[4] Yanwei Xing, Jie Wang, Zhihong Zhao, “Combination data mining methods with new medical data to predicting outcome of Coronary Heart Disease”, 2007 International Conference on Convergence Information Technology.
[5] T. John Peter, K. Somasundaram, “An empirical study on prediction of heart disease using classification data mining techniques”, IEEE International Conference on Advances in Engineering, Science and Management (ICAESM-2012), March 30–31, 2012.
[6] Anchana Khemphila, Veera Boonjing, “Heart disease Classification using Neural Network and Feature Selection”, 2011 21st International Conference on Systems Engineering.
[7] UCI Machine Learning Repository [homepage on the Internet], Arlington: The Association, 2006 [updated 1996 Dec 3; cited 2011 Feb 2]. Available from: http://archive.ics.uci.edu/ml/datasets/Heart+Disease
[8] Jiawei Han, Micheline Kamber, Jian Pei, “Data Mining: Concepts and Techniques”, 3rd ed., 2011.
[9] Sellappan Palaniappan, Rafiah Awang, “Intelligent Heart Disease Prediction System Using Data Mining Techniques”, IJCSNS International Journal of Computer Science and Network Security, Vol. 8, No. 8, August 2008.
[10] Ms. Ishtake S.H., Prof. Sanap S.A., “Intelligent Heart Disease Prediction System Using Data Mining Techniques”, International J. of Healthcare & Biomedical Research, Vol. 1, Issue 3, April 2013, pp. 94–101.
[11] R. Chitra, V. Seenivasagam, “Review of heart disease prediction system using data mining and hybrid intelligent techniques”, ICTACT Journal on Soft Computing, Vol. 3, Issue 4, July 2013.
[12] G. M. Nasira, N. Hemageetha, “Vegetable Price Prediction Using Data Mining Classification Technique”, Proceedings of the International Conference on Pattern Recognition, Informatics and Medical Engineering, March 21–23, 2012.
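The hybrid proposed in Section III can be sketched as follows. In this hybrid, a GA first searches for good initial weights, and BP then refines them. The paper does not specify the encoding or the GA operators, so everything here is an assumption:

- The weights of a hypothetical 3-4-1 tanh/linear network are flattened into one chromosome.
- Fitness is the mean squared error on the training data.
- Selection is by tournament, crossover is single-point, and mutation is Gaussian.
- The best chromosome is carried over unchanged each generation (elitism).

The sketch is written in Python rather than the MATLAB R2012a the paper plans to use. The data and all parameter values are made up for illustration.

```python
import math
import random

N_IN, N_HID = 3, 4
# Flat chromosome layout: hidden weights, hidden biases, output weights, output bias.
N_GENES = N_IN * N_HID + N_HID + N_HID + 1


def decode(genes):
    """Split a flat chromosome into (w1, b1, w2, b2) for a 3-4-1 network."""
    k = N_IN * N_HID
    w1 = [genes[j * N_IN:(j + 1) * N_IN] for j in range(N_HID)]
    return w1, genes[k:k + N_HID], genes[k + N_HID:k + 2 * N_HID], genes[k + 2 * N_HID]


def weights_mse(genes, data):
    """Fitness to minimize: MSE of the (untrained) network with these weights."""
    w1, b1, w2, b2 = decode(genes)
    total = 0.0
    for x, t in data:
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
        y = sum(w * hj for w, hj in zip(w2, h)) + b2
        total += (t - y) ** 2
    return total / len(data)


def tournament(pop, data, rng):
    """Binary tournament: the fitter of two randomly drawn chromosomes wins."""
    a, b = rng.sample(pop, 2)
    return a if weights_mse(a, data) < weights_mse(b, data) else b


def ga_initial_weights(data, pop_size=30, generations=40, mutation_rate=0.1, sigma=0.3, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(N_GENES)] for _ in range(pop_size)]
    best = min(pop, key=lambda g: weights_mse(g, data))
    history = [weights_mse(best, data)]
    for _ in range(generations):
        children = [best[:]]                     # elitism: carry the best forward unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop, data, rng), tournament(pop, data, rng)
            cut = rng.randrange(1, N_GENES)      # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Gaussian mutation applied gene by gene with probability mutation_rate
            child = [g + rng.gauss(0.0, sigma) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
        best = min(pop, key=lambda g: weights_mse(g, data))
        history.append(weights_mse(best, data))
    return best, history


# Hypothetical, already-normalized series: 3 past values -> next value.
series = [0.1, 0.3, 0.2, 0.5, 0.4, 0.6, 0.8, 0.7, 0.9]
data = [(series[k:k + 3], series[k + 3]) for k in range(len(series) - 3)]

best, history = ga_initial_weights(data)
print(f"best MSE: {history[0]:.4f} (initial population) -> {history[-1]:.4f}")
w1, b1, w2, b2 = decode(best)  # starting weights to hand to ordinary BP training
```

Because of elitism, the best error never gets worse from one generation to the next. The decoded weights would replace random initialization as BP's starting point; the BP training itself is not repeated in this sketch.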
