
A Different Approach for Cardiac Arrhythmia Classification using Random Forest Algorithm
Allam Jaya prakash
Department of Electronics and Communication Engineering, National Institute of Technology, Rourkela, India
E-mail: allamjayaprakash@gmail.com

Published in Healthcare Technology Letters; Received on xxx; Revised on xxx.

The ECG plays a key role in detecting disorders of the heart. Cardiac arrhythmias occur over short durations and cannot be distinguished by the human eye. Finding arrhythmias is a tedious task, since slight changes in the electrocardiogram signal may indicate life-threatening conditions. Diagnosis and medication at an early stage of cardiac arrhythmia may help to reduce the mortality rate of heart patients. This paper presents an accurate system for the classification of five types of electrocardiogram arrhythmias, namely Paced (P), Premature ventricular contraction (V), Normal (N), Right bundle branch block (R) and Left bundle branch block (L). We propose an ECG arrhythmia classification method using a random forest classifier. The proposed heart-condition recognition system covers three stages: pre-processing, feature extraction and classification. In the first stage, the raw ECG signal is filtered and the R-peak locations of the ECG signal are found. In the second stage, the dual tree complex wavelet transform (DTCWT) is employed to extract the feature vector from the electrocardiogram signal. The final feature vector consists of the feature set extracted with the DTCWT and four other features of the ECG signal: skewness, timing information, kurtosis and AC power. The final feature vector is applied as the input to the classifier. The random forest classifier achieved an overall accuracy of 98.78% when tested on five beat types from the PhysioNet MIT-BIH arrhythmia database. The proposed framework effectively characterised the five kinds of ECG arrhythmias, and the random forest provides higher accuracy than other techniques for ECG arrhythmia classification.

1. Introduction: The electrocardiogram (ECG) is the recording of the electrical activity of the heart, and it reflects the variability of the heart rate. Any deviation from the normal heart rate, or any disturbance in the rhythm, site of origin or conduction of the cardiac electrical impulse, is considered an arrhythmia. Most arrhythmias occur infrequently and cannot be distinguished by the human eye. Long-duration ECG records, called ambulatory electrocardiograms, are required to identify the abnormalities present in a patient. Moreover, large variations in temporal and other morphological characteristics from one patient to another make the detection of abnormalities a challenging task. Hence it is very difficult for an expert cardiologist to analyse and recognise these arrhythmias manually, and an automatic computer-aided diagnostic (CAD) system that can quickly detect abnormalities in a patient is required.

In the literature, many algorithms have been proposed for the automatic classification of ECG arrhythmias. Most of them perform ECG classification in the following stages: i) pre-processing, ii) feature extraction and iii) classification. The feature extraction techniques include both time-domain and frequency-domain features. Temporal features include R-R intervals, Q-R intervals, QRS complex duration, R-S intervals and ST segments. These temporal features alone are not enough for classification, since there is a large variation in the other morphological patterns even within the same patient. Hence some researchers have used a mixture of frequency and time features, as reported in [1]. Transformation techniques such as the S transform, the Fourier transform (FT) and the discrete wavelet transform (DWT) [2] are used to extract features from the pre-processed data. The extracted features are given as input to a classifier, which assigns them to their respective classes. The classifiers used include artificial neural networks (ANN) [3] and support vector machines (SVM) [4].

Detection of cardiac arrhythmias using hidden Markov models is reported in [7]. This algorithm has shown promising results in classifying ventricular arrhythmias and in detecting low-amplitude P waves. In [8], the authors presented a customised electrocardiogram (ECG) beat classifier using a mixture of experts (MOE) methodology. Learning vector quantization (LVQ) and self-organizing maps (SOM) are two classification models based on clustering. LVQ is a supervised learning technique that classifies a feature vector according to the label of the cluster pattern (code word) into which it is clustered. In SOM, each cluster centre (prototype or code word) is represented by the weights of a neuron associated with a position in the feature map. The classifier is modelled using a mixture of the SOM and LVQ approaches. The network is designed such that LVQ gives superior classification performance for classes 1 and 3, whereas SOM gives superior performance for classes 2 and 4. An overall classification accuracy of 94% is reported using this mixture-of-experts approach.

A unique technique is proposed in [9] for patient-adapted ECG heartbeat classification, consisting of four stages, namely pre-processing, feature extraction, feature selection and classification. Features are extracted in the temporal and frequency domains. The frequency-domain features are coefficients obtained by applying the Stockwell transform. A vector of 184 samples around each R peak is formed by combining 4 temporal features (pre R-R, post R-R, average R-R and local R-R) with 180 frequency-domain samples, and better classification results were obtained, as reported in [9].

To address the above limitations, a novel approach based on the random forest algorithm is proposed for the classification of ECG arrhythmias. Random forest is a powerful supervised machine learning algorithm that can be used for both classification and regression problems. As the name suggests, the algorithm creates a forest of many decision trees.

The remainder of the paper is organised as follows: Section 2 presents the MIT-BIH database used to evaluate the performance of the proposed classification algorithm. Section 3 describes the proposed framework. Section 4 presents the results and discussion of the proposed algorithm. The conclusions of the work are given in Section 5.

Healthcare Technology Letters, pp. 1–5
© The Institution of Engineering and Technology 2018

2. ECG data Processing: The MIT-BIH ECG Arrhythmia database is used to assess the performance of the proposed technique. The database contains several life-threatening arrhythmias. Most of the
researchers consider it the standard database for the detection of cardiac arrhythmias. The database contains forty-eight files of ECG recordings, and every file includes a 30 min ECG segment selected from the 24-hour recordings of 48 different patients. The first 23 recordings correspond to routine clinical recordings, while the remaining recordings contain complex ventricular, junctional and supraventricular arrhythmias [16]. These ECG recordings are sampled at 360 Hz and band-pass filtered at 0.1-100 Hz. The annotation file contains a label for each beat, detected using a simple slope-sensitive detector, and two independent cardiologists cross-checked the labels for verification. These labels are used in the training and testing phases of the implemented algorithm. In this work, five different types of cardiac arrhythmias, namely paced (P), premature ventricular contraction (V), normal (N), right bundle branch block (RBBB) and left bundle branch block (LBBB), are classified using the proposed method.

[Figure 1. Plot of 100m ECG signal from PhysioNet]

[Figure 2. Block diagram of proposed methodology: ECG data acquisition → (i) pre-processing (normalization, filtering, R-peak detection) → (ii) feature extraction (select a window of 192 samples from right and left around the R-peak; temporal features: AC power, kurtosis, skewness, timing information; morphological features: DTCWT features) → total feature vector → random forest classifier → P, V, N, RBBB, LBBB]
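The label-selection step described in this section can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: it assumes the standard PhysioNet MIT-BIH beat annotation symbols ('N' normal, 'V' premature ventricular contraction, '/' paced, 'L' and 'R' left and right bundle branch block beats), and the helper name `select_beats` is hypothetical. In practice the sample indices and symbols could come from an annotation file read with a tool such as the `wfdb` Python package; here they are passed in directly so that the block is self-contained.

```python
from collections import Counter

# Illustrative sketch; the helper name is hypothetical.
# Assumption: standard MIT-BIH beat annotation symbols for the five classes
# used in this work; '/' is the symbol for paced beats.
SYMBOL_TO_CLASS = {"N": "N", "V": "V", "/": "P", "L": "L", "R": "R"}


def select_beats(samples, symbols):
    """Keep (R-peak sample index, class label) pairs for the five classes.

    Annotations with any other symbol (other beat types, rhythm or
    signal-quality marks) are dropped in this sketch.
    """
    return [
        (sample, SYMBOL_TO_CLASS[symbol])
        for sample, symbol in zip(samples, symbols)
        if symbol in SYMBOL_TO_CLASS
    ]


if __name__ == "__main__":
    # Hypothetical annotation excerpt: sample indices and their symbols.
    samples = [77, 370, 662, 946, 1231]
    symbols = ["+", "N", "V", "/", "A"]
    beats = select_beats(samples, symbols)
    print(beats)  # [(370, 'N'), (662, 'V'), (946, 'P')]
    print(Counter(label for _, label in beats))
```

Keeping the sample index next to each label lets the same list drive both R-peak-centred windowing and the training labels.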

3. Proposed Framework: The complete proposed methodology for arrhythmia classification is shown in Figure 2. It includes the acquisition of ECG data, pre-processing, feature extraction and classification.

3.1. Pre-processing and R-peak detection: The pre-processing stage consists of the following two steps: (i) the amplitude of each ECG signal is normalised to zero mean, which removes the baseline (DC) offset of every ECG signal; (ii) every ECG signal is filtered with a band-pass filter at 0.1-100 Hz to remove noise. The R-peaks are then located in the pre-processed ECG signal using the Pan-Tompkins algorithm.

3.2. Feature extraction: Extracting distinctive, good-quality features provides better classification accuracy. In this work, four temporal features, namely AC power, kurtosis, skewness and timing information, are extracted from the electrocardiogram signal, but these features alone are not sufficient to classify the arrhythmias. Therefore they are appended to the morphological features extracted with the dual tree complex wavelet transform (DTCWT). The feature extraction method is shown in the block diagram of Figure 2.

3.2.1. Temporal features: Temporal features are also called time-domain features, and they are easy to extract from the signal. AC power, kurtosis, skewness and timing information are extracted from the QRS complex of each cardiac cycle. Let $z[n]$ represent the QRS complex signal, with mean $\mu$ and standard deviation $\sigma$.

(1) AC power: the total power content of the electrocardiogram QRS complex signal,
$$p = E\left(z[n]^{2}\right) \tag{1}$$
(2) Kurtosis: a measure of the peakedness of the electrocardiogram QRS complex signal,
$$\mathrm{kurt}(z) = \frac{E\left[(z-\mu)^{4}\right]}{\sigma^{4}} \tag{2}$$
(3) Skewness: a measure of the lack of symmetry of the distribution or data set,
$$\mathrm{skew}(z) = \frac{E\left[(z-\mu)^{3}\right]}{\sigma^{3}} \tag{3}$$
(4) Timing information: a measure of the deviation from a constant beat rate, calculated as the R-R interval ratio
$$IR_j = \frac{T_j - T_{j-1}}{T_{j+1} - T_j} \tag{4}$$
where $T_j$ is the time at which beat $j$ occurs.

3.2.2. Morphological features: These features are extracted with the dual-tree complex wavelet transform. The PhysioNet MIT-BIH database ECG signals are sampled at 360 samples/s, so by the Nyquist criterion the frequency components of the ECG signal lie in the range 0-180 Hz. In this work, the DTCWT coefficients are calculated across the QRS complexes of the electrocardiogram signal. The DTCWT is a powerful technique: unlike the DWT and other techniques, it is approximately shift-invariant and largely free of coefficient oscillation and aliasing. Due to these abilities, the DTCWT is preferable for extracting the morphological features of the QRS complexes.
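The pre-processing of Section 3.1 and the temporal features of Eqs. (1)-(4) can be sketched with NumPy as below. This is an illustrative sketch, not the author's implementation: the band-pass filter is an ideal FFT mask (the paper does not specify its filter design), the R-peak picker is a simple threshold-plus-refractory-period rule rather than the full Pan-Tompkins algorithm, and all function names and the parameters `rel_threshold` and `refractory_s` are hypothetical.

```python
import numpy as np

# Illustrative sketch; function names and thresholds are hypothetical.
FS = 360  # MIT-BIH sampling rate in Hz


def preprocess(ecg, fs=FS, band=(0.1, 100.0)):
    """Step (i): zero-mean normalisation; step (ii): 0.1-100 Hz band-pass.

    Assumption: an ideal FFT-domain band-pass mask stands in for the
    unspecified filter used in the paper.
    """
    x = np.asarray(ecg, dtype=float)
    x = x - x.mean()
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return np.fft.irfft(spectrum * mask, n=x.size)


def detect_r_peaks(x, fs=FS, rel_threshold=0.5, refractory_s=0.2):
    """Simplified R-peak picker (NOT Pan-Tompkins): local maxima above a
    fraction of the global maximum, at most one per refractory period."""
    x = np.asarray(x, dtype=float)
    is_peak = (x[1:-1] > x[:-2]) & (x[1:-1] >= x[2:]) & (x[1:-1] > rel_threshold * x.max())
    candidates = np.flatnonzero(is_peak) + 1
    min_gap = int(refractory_s * fs)
    peaks = []
    for idx in candidates:
        if peaks and idx - peaks[-1] < min_gap:
            if x[idx] > x[peaks[-1]]:
                peaks[-1] = int(idx)  # keep the taller of two nearby maxima
            continue
        peaks.append(int(idx))
    return np.array(peaks, dtype=int)


def temporal_features(z):
    """AC power, kurtosis and skewness of one QRS segment z[n], Eqs. (1)-(3)."""
    z = np.asarray(z, dtype=float)
    mu, sigma = z.mean(), z.std()
    ac_power = np.mean(z ** 2)                  # Eq. (1)
    kurt = np.mean((z - mu) ** 4) / sigma ** 4  # Eq. (2)
    skew = np.mean((z - mu) ** 3) / sigma ** 3  # Eq. (3)
    return ac_power, kurt, skew


def timing_information(r_times):
    """R-R interval ratio IR_j of Eq. (4) for every beat with two neighbours."""
    t = np.asarray(r_times, dtype=float)
    return (t[1:-1] - t[:-2]) / (t[2:] - t[1:-1])


if __name__ == "__main__":
    # Synthetic 10 s "ECG": narrow Gaussian pulses once per second.
    n = np.arange(10 * FS)
    ecg = sum(np.exp(-0.5 * ((n - c) / 3.0) ** 2) for c in range(180, 10 * FS, FS))
    x = preprocess(ecg)
    peaks = detect_r_peaks(x)
    print(peaks)
    print(timing_information(peaks / FS))  # constant beat rate gives ratios of 1
    print(temporal_features(x[peaks[0] - 20: peaks[0] + 20]))
```

On real MIT-BIH records a proper Pan-Tompkins detector and a designed IIR or FIR band-pass filter would replace the two simplified helpers; the feature functions follow Eqs. (1)-(4) directly.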

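The shift-variance problem of the DWT mentioned above can be seen in a minimal NumPy example. The sketch uses a one-level orthonormal Haar DWT purely because it is short; it is neither the wavelet used in the paper nor an implementation of the DTCWT, and the helper names are hypothetical. Delaying a step edge by a single sample changes the energy in the detail coefficients, even though the signal is otherwise unchanged.

```python
import numpy as np

# Illustrative sketch with hypothetical helper names; Haar is used only for brevity.


def haar_dwt_level1(x):
    """One level of the orthonormal Haar DWT (input length must be even)."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # low-pass branch, down-sampled by 2
    detail = (even - odd) / np.sqrt(2.0)  # high-pass branch, down-sampled by 2
    return approx, detail


def step(length, onset):
    """Unit step that switches on at sample index `onset`."""
    x = np.zeros(length)
    x[onset:] = 1.0
    return x


def detail_energy(x):
    """Energy of the level-1 detail coefficients."""
    _, detail = haar_dwt_level1(x)
    return float(np.sum(detail ** 2))


if __name__ == "__main__":
    aligned = detail_energy(step(8, 4))  # edge falls between two sample pairs
    delayed = detail_energy(step(8, 5))  # same edge, delayed by one sample
    print(aligned, delayed)              # energies differ: the DWT is shift-variant
```

The aligned step gives zero detail energy and the delayed one gives about 0.5, because down-sampling makes the coefficients depend on where the edge falls relative to the sample pairs. The DTCWT reduces this effect by combining two filter trees into an approximately shift-invariant complex transform.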
The detection and reconstruction of a signal with singularities are very difficult with the discrete wavelet transform, because its coefficients oscillate between positive and negative values around the singularities. The dual tree complex wavelet transform overcomes this problem.

[Figure 3. Three levels of DTCWT]

The dual-tree CWT has a complex-valued wavelet and scaling function, given by
$$\psi_c(t) = \psi_{\mathrm{real}}(t) + j\,\psi_{\mathrm{im}}(t) \tag{5}$$
$$\phi_c(t) = \phi_{\mathrm{real}}(t) + j\,\phi_{\mathrm{im}}(t) \tag{6}$$
The dual-tree complex wavelet transform uses two real wavelet filter banks: one produces the real part and the other produces the imaginary part of the transform. Figure 3 shows three levels of the DTCWT, in which tree A computes the real part and tree B computes the imaginary part of the complex wavelet transform.

3.3. Classifier: Random forest is an effective tool for multi-class classification problems, with good classification accuracy. In general, the random forest classifier is an extension of the random tree classifier and consists of multiple decision trees. The random forest technique was developed by Leo Breiman of the University of California, United States. The random forest classifier can handle large amounts of input data. It integrates multiple decision trees and aggregates their individual predictions into an overall prediction.

[Figure 4. Generalised random forest structure]

The proposed random forest uses 300 random decision trees to classify arrhythmias based on the applied features, which are extracted using the DTCWT technique. During random forest training, each decision tree uses a subset of the features (along with the label information) to build its classification model. Each random tree produces a classification based on its trained model and votes for one arrhythmia class among the five classes (P, V, N, R, L), and the forest outputs the class with the most votes. In this work, 10-fold cross-validation is used to guard against overfitting: 10 different sub-models were built on 10 training sets, each consisting of 90% of the whole data, and the remaining 10% was used to test each model.

4. Experimental results:

4.1. Data selection: The performance of the proposed technique is evaluated using the 48 files of the MIT-BIH arrhythmia database; each file contains a 30 min ECG segment chosen from the 24-hour recordings of 48 different patients. All MIT-BIH signals are sampled at 360 Hz. The following five types of arrhythmias are classified: Paced (P), Premature ventricular contraction (V), Normal (N), Right bundle branch block (R) and Left bundle branch block (L).

4.2. Metrics: In this paper, the proposed method is compared with five popular arrhythmia classification methods in terms of accuracy (Acc), sensitivity (Sen), specificity (Spe) and positive predictivity (Ppr).
Accuracy is defined as the ratio of correctly classified patterns to the total number of patterns classified:
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN} \tag{7}$$
Sensitivity is defined as the rate of correctly classified events among all events:
$$\mathrm{Sen} = \frac{TP}{TP + FN} \tag{8}$$
Specificity is defined as the rate of correctly classified non-events among all non-events:
$$\mathrm{Spe} = \frac{TN}{TN + FP} \tag{9}$$
Positive predictivity is defined as the rate of correctly classified events among all detected events:
$$\mathrm{Ppr} = \frac{TP}{TP + FP} \tag{10}$$
where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively, which can be calculated from the confusion matrix.

The diagonal elements of the confusion matrix give the correctly classified instances of each class. From Table 2, 197 N beats, 398 V beats, 67 P beats, 310 L beats and 299 R beats are misclassified. The V beat is very important in the clinical diagnosis of cardiac arrhythmia patients, and compared with previous techniques the random forest classifies V beats with better accuracy. The proposed method classifies N, P, L, R and V beats with sensitivities of 99.74%, 99.05%, 96.16%, 95.88% and 94.42%, respectively. The proposed technique provides the best overall accuracy, 98.78%, among all the compared classifiers. The proposed technique with the random forest classifier gives a higher true positive rate (TPR) than false positive rate (FPR).

Table 1 compares the performance of the proposed work with recent works on ECG arrhythmia classification based on the PhysioNet database. The disadvantage of the De Chazal et al. technique is that its fixed classification approach does not account for variations in ECG patterns caused by personal or environmental differences. De Chazal et al. used the five AAMI standard heartbeat types and only 44 files in their experimental study, with a classification accuracy of 85.90%. In [2], the authors used Lyapunov exponents, wavelet coefficients and the power levels of the power spectral density (PSD) of the electrocardiogram signal as features for classifying four types of ECG beats from the MIT-BIH database. The experimental study in [2] was conducted on very little data, and it used the DWT, which lacks shift invariance due to the down-sampling operation at each stage of the DWT implementation. For these reasons, the classification accuracy achieved was only 93.90%.
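Eqs. (7)-(11) can be evaluated for every class at once from a multi-class confusion matrix by treating each class in turn as the "positive" class (one-vs-rest). The sketch below does this with NumPy on the counts of Table 2, assuming (as the misclassification counts above imply) that rows are true classes and columns are predicted classes; the function name `per_class_metrics` is hypothetical.

```python
import numpy as np

# Illustrative sketch; the function name is hypothetical.
CLASSES = ["N", "V", "P", "L", "R"]

# Confusion matrix of Table 2: rows = true class, columns = predicted class.
CM = np.array([
    [74788,  142,    9,   42,    4],
    [  357, 6730,   21,   17,    3],
    [   43,   23, 6953,    1,    0],
    [  250,   54,    3, 7759,    3],
    [  272,   16,    1,   10, 6952],
])


def per_class_metrics(cm):
    """One-vs-rest Acc, Sen, Spe, Ppr and F-score (in %) for every class."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp  # beats of this class predicted as another class
    fp = cm.sum(axis=0) - tp  # beats of other classes predicted as this class
    tn = total - tp - fn - fp
    return {
        "Acc": 100 * (tp + tn) / total,          # Eq. (7)
        "Sen": 100 * tp / (tp + fn),             # Eq. (8)
        "Spe": 100 * tn / (tn + fp),             # Eq. (9)
        "Ppr": 100 * tp / (tp + fp),             # Eq. (10)
        "F": 100 * 2 * tp / (2 * tp + fp + fn),  # Eq. (11)
    }


if __name__ == "__main__":
    metrics = per_class_metrics(CM)
    for i, name in enumerate(CLASSES):
        row = ", ".join(f"{key} {values[i]:.2f}" for key, values in metrics.items())
        print(f"{name}: {row}")
    print("misclassified:", (CM.sum(axis=1) - np.diag(CM)).tolist())  # [197, 398, 67, 310, 299]
    print(f"overall accuracy: {100 * np.trace(CM) / CM.sum():.2f}%")  # 98.78%
```

Recomputing from Table 2 in this way reproduces the per-class sensitivities and positive predictivities of Table 3 and the 98.78% overall accuracy. A few specificity and accuracy entries come out slightly different in the second decimal place (for example, about 96.87% rather than 96.85% for N specificity), which may reflect rounding in the original computation.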

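The evaluation protocol and voting rule of Section 3.3 can be sketched without any machine-learning library. The code below is a hypothetical illustration: `kfold_indices` produces ten train/test splits in which each sub-model is trained on 90% of the beats and tested on the remaining 10%, and `majority_vote` aggregates the class votes of individual trees (for example, the 300 trees used here) into the forest's prediction. Training the trees themselves is out of scope; a library implementation such as scikit-learn's `RandomForestClassifier` (with `n_estimators=300`) would normally be used for that step.

```python
import numpy as np

# Illustrative sketch; function names are hypothetical.


def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) for k folds; each test fold holds ~1/k of the data."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx


def majority_vote(tree_votes, n_classes=5):
    """Combine per-tree class votes, shape (n_trees, n_samples), into forest labels.

    Ties are broken in favour of the lowest class index.
    """
    votes = np.asarray(tree_votes)
    counts = np.stack([(votes == c).sum(axis=0) for c in range(n_classes)])
    return counts.argmax(axis=0)


if __name__ == "__main__":
    splits = list(kfold_indices(1000, k=10))
    print([(len(tr), len(te)) for tr, te in splits][:3])  # 900 train / 100 test per fold

    # Hypothetical votes of 300 trees for 2 beats (class indices 0..4 = N, V, P, L, R).
    votes = np.array([[1, 4]] * 200 + [[4, 3]] * 100)
    print(majority_vote(votes))  # [1 4]
```

Because each beat appears in exactly one test fold, the ten test-fold predictions can be pooled into a single confusion matrix such as Table 2.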
Table 1 Summary of the techniques on ECG arrhythmia classification using MIT-BIH database
Literature | Features | Classifier | Classes | Accuracy (%)
De Chazal et al. | Morphology and heart beat interval | Linear discriminant | 5 | 85.90
Ubeyli | Lyapunov exponents and wavelet coefficients | ANN classifier | 4 | 93.90
Hu et al. | Time domain features | Mixture of experts | 2 | 94.00
Manu Thomas et al. | DTCWT and morphological features | ANN classifier | 5 | 94.64
Proposed method | DTCWT and morphological features | Random forest | 5 | 98.78

In [3], Hu et al. classified only two classes of arrhythmia, using local and global classifiers with time-domain features, and attained an accuracy of 94%.

In [4], M. Thomas et al. selected a window of 128 samples to the right and left of the R-peak of the ECG signal. The final feature vector consists of the feature set extracted with the DTCWT and four other temporal features. The authors used a multilayer back-propagation neural network to classify the five types of ECG arrhythmias. A multilayer back-propagation neural network is very sensitive to the number of hidden-layer neurons: too few hidden neurons lead to underfitting, whereas too many lead to overfitting, so that the fitted curve shows uncontrolled oscillations. Network dysfunction also occurs when the weights are driven to very large values. The approach has the following disadvantages: (i) a fixed, handcrafted feature extraction method may not be suitable for extracting patient-specific features when classifying the beats of a particular person; (ii) the complex structure of the technique makes it difficult to implement. The DTCWT is a promising technique, but the drawbacks of the multilayer back-propagation network reduce the final classification accuracy. These drawbacks can be overcome with the random forest algorithm to obtain better accuracy.

Table 2 Classification results of MIT-BIH arrhythmia database: confusion matrix [DTCWT] (rows: true class; columns: predicted class)
Class | N | V | P | L | R
N | 74788 | 142 | 9 | 42 | 4
V | 357 | 6730 | 21 | 17 | 3
P | 43 | 23 | 6953 | 1 | 0
L | 250 | 54 | 3 | 7759 | 3
R | 272 | 16 | 1 | 10 | 6952

Table 3 shows the accuracy, sensitivity, specificity and positive predictivity obtained with the random forest classifier. All 48 files, including the four files containing paced beats, are used for the performance evaluation. Over the 48 files, N beat detection shows an accuracy of 98.93%, sensitivity of 99.74%, specificity of 96.85% and positive predictivity of 98.78%; V beat detection shows an accuracy of 99.39%, sensitivity of 94.42%, specificity of 99.76% and positive predictivity of 96.63%; P beat detection shows an accuracy of 99.90%, sensitivity of 99.05%, specificity of 99.96% and positive predictivity of 99.51%; L beat detection shows an accuracy of 99.63%, sensitivity of 96.16%, specificity of 99.93% and positive predictivity of 99.11%; and R beat detection shows an accuracy of 99.70%, sensitivity of 95.88%, specificity of 99.99% and positive predictivity of 99.86%.

Table 3 Classification performance of DTCWT with random forest (proposed method)
Class | Acc (%) | Sen (%) | Spe (%) | Ppr (%)
N | 98.93 | 99.74 | 96.85 | 98.78
V | 99.39 | 94.42 | 99.76 | 96.63
P | 99.90 | 99.05 | 99.96 | 99.51
L | 99.63 | 96.16 | 99.93 | 99.11
R | 99.70 | 95.88 | 99.99 | 99.86

The F-score is calculated from precision and recall, and it provides a realistic summary measure of classifier performance. Precision and recall values were calculated from the confusion matrix in Table 2. Precision is the fraction of elements correctly classified as positive out of all elements the algorithm classified as positive, whereas recall is the fraction of elements correctly classified as positive out of all positive elements. From the precision and recall values, the overall F-score of the classifier was calculated as 98.24 using the following formula:
$$F = \frac{2\,TP}{2\,TP + FP + FN} \tag{11}$$

5. Conclusion: In this letter, an automatic classification approach is proposed to classify the ECG arrhythmia types Paced (P), Premature ventricular contraction (V), Normal (N), Right bundle branch block (R) and Left bundle branch block (L). The experiments are conducted on the standard PhysioNet MIT-BIH dataset. Feature extraction and classification are the two important steps in detecting the ECG arrhythmia type. In the proposed method, the DTCWT extracts the morphological features of the ECG signal, and four temporal features (AC power, kurtosis, skewness and timing information) are also extracted from the ECG signal. The morphological features are appended to the temporal features to form the final feature vector, which is used as the input to the random forest classifier. Table 3 suggests that the chosen feature extraction method and classifier provide a high overall accuracy of 98.78%, sensitivity of 98.24%, specificity of 98.12% and positive predictivity of 97.89%. These results show that the proposed method improves on other recent methods for ECG arrhythmia classification.

6. References

[1] O. T. Inan, L. Giovangrandi, G. T. A. Kovacs: 'Robust neural network based classification of premature ventricular contractions using wavelet transform and timing interval features', IEEE Trans. Inf. Technol. Biomed., 2006, 53, (12), pp. 2507-2515
[2] D. Cvetkovic, E. D. Übeyli, I. Cosic: 'Wavelet transform feature extraction from human PPG, ECG, and EEG signal responses to ELF PEMF exposures: A pilot study', Digital Signal Processing, 2008, 18, (5), pp. 861-874
[3] J. J. Oresko, Z. Jin, J. Cheng, S. Huang, Y. Sun, H. Duschl, A. C. Cheng: 'A wearable smartphone-based platform for real-time cardiovascular disease detection via electrocardiogram processing', IEEE Trans. Inf. Technol. Biomed., 2010, 14, (3), pp. 734-740
[4] J. A. Nasiri, M. Naghibzadeh, H. S. Yazdi, B. Naghibzadeh: 'ECG arrhythmia classification with support vector machines and genetic algorithm', Third UKSim European Symposium on Computer Modeling and Simulation, 2009, pp. 187-192

[5] G. B. Moody, R. G. Mark: 'The impact of the MIT-BIH arrhythmia database', IEEE Engineering in Medicine and Biology Magazine, 2001, 20, (3), pp. 45-50
[6] D. A. Coast, R. M. Stern, G. G. Cano, S. A. Briller: 'An approach to cardiac arrhythmia analysis using hidden Markov models', IEEE Trans. Biomed. Eng., 1990, 37, (9), pp. 826-836
[7] Y. H. Hu, S. Palreddy, W. J. Tompkins: 'A patient-adaptable ECG beat classifier using a mixture of experts approach', IEEE Trans. Biomed. Eng., 1997, 44, (9), pp. 891-900
[8] M. K. Das, S. Ari: 'Patient-specific ECG beat classification technique', Healthcare Technology Letters, 2014, 1, (3), pp. 98-103
[9] G. B. Moody, R. G. Mark: 'The impact of the MIT-BIH arrhythmia database', IEEE Engineering in Medicine and Biology Magazine, 2001, 20, (3), pp. 45-50
[10] M. Thomas, M. K. Das, S. Ari: 'Automatic ECG arrhythmia classification using dual tree complex wavelet based features', AEU International Journal of Electronics and Communications, 2015, 69, (4), pp. 715-721
[11] P. De Chazal, M. O'Dwyer, R. B. Reilly: 'Automatic classification of heartbeats using ECG morphology and heartbeat interval features', IEEE Trans. Biomed. Eng., 2004, 51, (7), pp. 1196-1206
[12] E. D. Übeyli: 'Statistics over features of ECG signals', Expert Syst. Appl., 2009, 36, (5), pp. 8758-8767

