Liver Disease Prediction Using Machine Learning

Abstract- Liver disease has become one of the most prominent diseases in our country. It accounts for about 2.4% of deaths per year in India. Predicting liver disease at an early stage is a challenge, and if it is not diagnosed early it becomes very hard to cure later on. Machine learning has helped considerably in the medical field. In this paper, we estimate which attributes are important and which are not. The classification techniques are applied to the training dataset. The main aim of the paper is to apply various machine learning algorithms, such as random forest, SVM (Support Vector Machine), and logistic regression, to the dataset and thus identify whether a patient has liver disease or not.

Index Terms- Liver disease, classification, dataset, random forest, SVM, logistic regression.

I. INTRODUCTION

The Internet is full of data, and people give their opinions about everything and anything. It is extremely difficult to make a correct choice when there is a large amount of data and no good method for making a decision. We need a method to extract the sentiment from the data and use it to make a sensible choice; sentiment analysis addresses this problem. Sentiment analysis is a kind of text classification based on the sentimental orientation (SO) of the opinions the text contains. Sentiment analysis of product reviews has recently become very popular in text mining and computational linguistics research.

DISEASE OF LIVER

The main causes of liver disease are excessive consumption of alcohol, fat accumulation in the liver, etc.
The common liver disorders are:
● Alcoholic hepatitis - This is swelling of the liver. It might be mild and last for years with no definite symptoms. In mild cases, it might be controlled by reducing the consumption of alcohol. If not controlled, it might become fatal.
● Fatty liver disease - In this disease, fat accumulates in the liver cells. It shows no symptoms. In most cases, it goes away when the person stops drinking alcohol.
● Cirrhosis - It generally occurs when healthy liver tissue is replaced by hard scar tissue. It is the most serious form of liver disease. It stops the liver from functioning properly, and the person may develop liver cancer. It cannot be reversed, but it can be halted if the consumption of alcohol is stopped.

II. LITERATURE SURVEY

This section summarizes the papers surveyed:

P. Rajeswari, G. Sophia Reena et al. [2010] introduced a classification based on liver diagnosis. The training dataset was created by collecting data from the UCI repository and consists of 345 instances and 7 distinct attributes. In that paper, the results were obtained by applying the Naïve Bayes algorithm, the K-star algorithm, and the FT tree algorithm. Of these three, the FT tree algorithm was the fastest, with an accuracy of 97.10%. Based on these results, the FT tree algorithm is the best of the three [2].

Sa’diyah Noor Novita Alfisahrin, Teddy Mantoro et al. [2013] proposed predicting whether a patient is suffering from liver disease on the basis of 10 attributes. The algorithms used were Naïve Bayes, decision tree, and NB tree. The results show that the NB tree algorithm has the highest accuracy, whereas the Naïve Bayes algorithm is the fastest to produce results. For further study, the target is to improve the accuracy of the NB tree algorithm by finding the most suitable factors for predicting whether a patient has liver disease [3].

S. Dhamodharan [2014] observed that many liver diseases require medical attention. Using different symptoms, three major liver diseases are predicted: hepatitis, cirrhosis, and liver cancer. The main aim of the paper is to predict the class, namely cirrhosis, liver cancer, hepatitis, or no disease. The algorithms used were Naïve Bayes and FT tree; comparing their accuracy shows that the Naïve Bayes algorithm performs better [4].

A.S. AnneshKumar, Dr. C. JothiVenkateswaran et al. [2015] described the categorization of liver disease using fuzzy K-means classification and feature selection. Various liver diseases have similar attribute values, so classifying the disease type correctly requires more effort. Fuzzy-based classification performed best in these classes and gave
above 94 percent accuracy for every type of liver disorder [6].

P. Thangarajul, R. Mehala et al. [2015] analyzed liver disease patient data using the particle swarm optimization (PSO) algorithm with K-Star classification in two ways to classify whether the disease is present or not. This algorithm improves accuracy compared with existing classification algorithms. The PSO-KStar algorithm is the best algorithm for liver disorder classification, as it improved prediction accuracy. The best data mining algorithm with respect to understandability, transformability, and accuracy is the PSO-KStar algorithm, with 100% accuracy [7].

Onwodi Gregory [2015] used two datasets of liver patients to build classification models for predicting liver disorder. Eleven data mining classification algorithms were applied to the datasets, and their performance was compared with respect to accuracy, recall, and precision. Based on those results, the FT tree algorithm was the best, with 78% accuracy, 86.4% sensitivity, 77.5% precision, and 38.2% specificity [8].

III. IMPLEMENTATION

i) DATASET

The Indian Liver Patients dataset comprises 11 attributes and 583 liver patients. Each patient in the dataset is labeled 1 or 2, indicating liver patient or not.
The dataset consists of records of liver patients from a hospital in Andhra Pradesh.
The dataset comprises the following attributes:
⮚ Age
⮚ Sex
⮚ Total_Bilirubin
⮚ Direct_Bilirubin
⮚ Alkaline Phosphatase
⮚ Alamine Phosphatase
⮚ Total_Protein
⮚ Albumin
⮚ Albumin and Globulin_Ratio
⮚ Result

ii) Data-preprocessing

Data-preprocessing means transforming the data before using it. It is done to convert the raw data into clean data. Data-preprocessing makes the data easier to process and thus gives better results. An example of data-preprocessing is filling the null values in the dataset.
The three techniques used in data-preprocessing are:
● Rescale data
● Binarize the data
● Standardize the data

iii) Machine learning algorithms used:

A. SVM (Support Vector Machine)

This algorithm aims to find the hyperplane in N-dimensional space that distinctly classifies the data points. Hyperplanes are the decision boundaries that separate the data points. Many candidate hyperplanes exist, and the algorithm chooses the one with the largest margin. The larger the margin, the smaller the error.

B. Random Forest algorithm

It is a flexible, easy-to-use machine learning algorithm. It is a supervised algorithm and can be used for both classification and regression. Random forest reduces the variance that disturbs the results. The outputs of the individual trees are combined through majority voting, which smooths out the variance and increases the accuracy of the results.

C. Linear Regression

This is a supervised algorithm. It models a target prediction based on the independent variable. It is called linear regression because it finds the linear relationship between X and Y, where X is the input and Y is the output. The best-fit line for the model is the regression line.

D. Logistic Regression

It is another machine learning technique borrowed from the field of statistics. It is used for binary classification. To squash the value between 0 and 1, it uses the logistic function, or sigmoid function, which has an S-shaped graph.

E. Naïve Bayes algorithm

Although this algorithm is simple, it is a very powerful algorithm for predictive modeling in classification. This classifier belongs to the family of probabilistic classifiers that use Bayes' theorem with a strong independence assumption between features. It considers each feature to
contribute independently to the probability regardless of any possible correlation between the attributes.

IV. RESULT AND ANALYSIS

In this section, the algorithms are compared with one another on the basis of the following factors:

i) Accuracy

It is the ratio of correct predictions to the total number of predictions.
The formula for accuracy is:
Accuracy = (Number of Correct Predictions)/(Total Number of Predictions)

ii) Recall or sensitivity

It is the ratio of true positives to the sum of true positives and false negatives.
It can be calculated as:
Sensitivity = (True Positive)/(True Positive + False Negative)

iii) Precision

It is the ratio of true positives to all positive results.
It is given by:
Precision = (True Positive)/(True Positive + False Positive)

Each algorithm gives its own result.
The result of each algorithm is shown in the table below:

No.  Name of the classification algorithm  Accuracy
1.   SVM                                   70.85%
2.   Logistic Regression                   71.42%
3.   Linear Regression                     65.71%
4.   Random forest                         65.71%
5.   Naïve Bayes                           58.85%
Result of algorithms

V. CONCLUSION

In this paper, various algorithms, such as random forest, SVM, and logistic regression, are applied to the dataset. These algorithms gave different accuracies for the dataset.
On analyzing the results, logistic regression turns out to be the best algorithm, with an accuracy of 71.42%.
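
The three measures defined in Section IV can be computed directly from the counts of true and false positives and negatives. The following is a minimal sketch in Python, not the implementation used in this paper; the function name classification_metrics, the choice of 1 as the positive label, and the sample labels are assumptions made purely for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, precision) for binary labels.

    `positive` is the label treated as "liver patient"; using 1 here is an
    assumption for illustration, following the 1/2 labeling of the dataset.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Accuracy: correct predictions over all predictions.
    accuracy = (tp + tn) / len(pairs)
    # Sensitivity (recall): true positives over all actual positives.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: true positives over all predicted positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, sensitivity, precision


# Hypothetical labels, not taken from the dataset: 1 = liver patient, 2 = not.
y_true = [1, 1, 1, 2, 2, 1, 2, 1]
y_pred = [1, 1, 2, 2, 1, 1, 2, 1]
accuracy, sensitivity, precision = classification_metrics(y_true, y_pred)
```

Returning 0.0 when a denominator is zero is a design choice of this sketch; an implementation could equally raise an error or report the value as undefined.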
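
The three preprocessing techniques listed in Section III (rescaling, binarizing, and standardizing) can likewise be sketched in plain Python. This is only an illustration under stated assumptions: the function names, the use of the population standard deviation, the binarization threshold, and the sample values are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def rescale(values):
    """Min-max rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def binarize(values, threshold):
    """Map each value to 1 if it is above the threshold, otherwise 0."""
    return [1 if v > threshold else 0 for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical Total_Bilirubin readings, not taken from the dataset.
total_bilirubin = [0.7, 10.9, 7.3, 1.0, 3.9]
rescaled = rescale(total_bilirubin)
flags = binarize(total_bilirubin, threshold=1.2)  # threshold is an assumption
standardized = standardize(total_bilirubin)
```

Each function works on a single attribute column, so applying it to a full dataset would mean calling it once per numeric attribute.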