
Breast cancer diagnosis using an enhanced Extreme Learning Machine-based Neural Network


Mohamed NEMISSI, Halima SALAH, Hamid SERIDI
LabSTIC, University May 8th 1945, Guelma-Algeria
{nemissi_m, seridihamid, halima_sallah}@yahoo.fr

Abstract—Breast cancer has become one of the deadliest cancers among women all over the world. Fortunately, an early diagnosis of this type of cancer can considerably enhance the success of treatment. In this work, we propose a classification system for breast cancer based on neural networks. The proposed system is a neural network with a single hidden layer, trained using the extreme learning machine algorithm. The main contribution of this work lies in the use of different activation functions for the hidden neurons and their optimization using a genetic algorithm. To evaluate the performance of the proposed system, tests are carried out on the Wisconsin Diagnostic Breast Cancer database. The obtained results show an important enhancement compared to the conventional extreme learning machine and are promising compared to other state-of-the-art methods.

Keywords—Extreme learning machine; Breast cancer diagnosis; Classification; Neural networks; Genetic algorithm.

I. INTRODUCTION

Recently, many breast cancer deaths have been prevented, owing both to improvements in treatment and to early detection by mammography [1]. Given the importance of early detection of breast cancer, a variety of classification techniques have been established to deal with this problem. The published works in this field include many machine learning and pattern recognition methods.

Some of these works have been based on neural networks. For example, in [2] the authors proposed optimizing the neural classifier using a genetic algorithm. In their model, both the structure and the weights of the network are optimized. They introduced different crossover and mutation methods in order to overcome the drawbacks of the conventional operators. In [3], the authors proposed a neural classifier with two learning stages. In the first stage, the input features are learned in an unsupervised way using a deep belief network. Then, in the second stage, the weights of this network are updated using the back-propagation algorithm in a supervised mode.

Some other works have been based on fuzzy logic. For example, in [4] the authors proposed a classification model performed in three phases: sample selection, feature selection and classification. In the first stage, i.e. sample selection, a fuzzy-rough method is used to remove unusable or inaccurate samples. In the second stage, i.e. feature selection, a feature selection method founded on consistency is combined with a reranking algorithm. In the third stage, the classification is performed using the fuzzy-rough nearest neighbor algorithm. In [5], the authors proposed a fuzzy rule-based reasoning system developed in two stages. In the first one, the data is clustered into similar groups using Expectation Maximization (EM). Then, the fuzzy rules are generated using Classification and Regression Trees. The authors also incorporated Principal Component Analysis in order to avoid the problem of multi-collinearity.

Some other works have been based on Support Vector Machines (SVM). For example, in [6] the authors proposed a system based on a combination of the K-means clustering method and SVM. The K-means algorithm is applied separately on benign and malignant tumors in order to recognize possible hidden patterns. These patterns are then considered as new features for the SVM classifier.

Other works include Naïve Bayesian [7], the Adaptive Neuro-Fuzzy Inference System (ANFIS) [8], data mining techniques [9], etc.

In this work, we propose a neural classifier for breast cancer based on a single-hidden-layer neural network trained using the Extreme Learning Machine (ELM) algorithm. The advantages of this training algorithm over standard back-propagation are as follows [10-12]. First, it is faster. Second, it can be used with non-differentiable activation functions. Third, it does not require setting stopping criteria or a learning rate. Indeed, training only the output weights of a neural network is much simpler than training all weights [13]. Instead of using identical fixed activation functions, we use different activation functions for the hidden neurons and consider the choice of their parameters as an optimization problem, which we solve using a genetic algorithm. Indeed, the importance of using tunable functions has been widely investigated in the literature, and it has been noted that the success of the network is linked with determining the optimal functions [14][15].

The rest of the paper is organized as follows. In Section 2, we briefly present the formulation of the original ELM. In Section 3, we describe the proposed classification system. In Section 4, we discuss the classification results obtained on the Wisconsin Diagnostic Breast Cancer database. Finally, Section 5 concludes the paper.

II. EXTREME LEARNING MACHINE

The ELM algorithm, introduced by Huang et al. [10], is a learning approach for single-hidden-layer neural networks (SHNN). This algorithm is based on the random initialization of the input weights and biases and the analytic calculation of the output weights.
Therefore, the network can be trained in only a few steps. Fig. 1 illustrates an example of an ELM-based neural network.

[Figure 1: diagram of a single-hidden-layer feed-forward network with $n$ input neurons, $N$ hidden neurons and $m$ output neurons.]

Figure 1. An example of an ELM-NN with architecture $n \times N \times m$

For an $n$-dimensional classification problem with $P$ training samples $\{x^{(s)}, t^{(s)}\}$, $s = 1{:}P$, where $x^{(s)} \in \mathbb{R}^n$ and $t^{(s)} \in \mathbb{R}^m$, an ELM-based FFNN with activation function $g(\cdot)$ and $N$ hidden neurons is given by [10]:

$$\sum_{i=1}^{N} \beta_i \, g(w_i \cdot x^{(s)} + b_i) = o^{(s)}, \quad s = 1{:}P \qquad (1)$$

where $w_i = [w_{i1} \; \dots \; w_{in}]^T$ is the input weight vector that connects the input neurons to the $i$-th neuron of the hidden layer, $b_i$ is the bias of the $i$-th hidden neuron, and $\beta_i = [\beta_{i1} \; \dots \; \beta_{im}]^T$ is the weight vector that connects the $i$-th neuron of the hidden layer to the output layer. This can be written in matrix form as:

$$H\beta = T \qquad (2)$$

where

$$H = \begin{bmatrix} g(w_1 \cdot x^{(1)} + b_1) & \cdots & g(w_N \cdot x^{(1)} + b_N) \\ \vdots & & \vdots \\ g(w_1 \cdot x^{(P)} + b_1) & \cdots & g(w_N \cdot x^{(P)} + b_N) \end{bmatrix}_{P \times N},$$

$$\beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_N^T \end{bmatrix}_{N \times m}, \quad T = \begin{bmatrix} t^{(1)T} \\ \vdots \\ t^{(P)T} \end{bmatrix}_{P \times m}$$

Since the number of training samples is usually larger than the number of hidden neurons, i.e. $P \gg N$, $H$ is a nonsquare matrix. Therefore, an exact solution to $H\beta = T$ might not exist. The smallest-norm least-squares solution of this linear system is given by:

$$\hat{\beta} = H^{\dagger} T \qquad (3)$$

where $H^{\dagger}$ is the Moore–Penrose generalized inverse of the matrix $H$.

The ELM algorithm proceeds in the following three steps [10]:

Step 1: Random initialization of the input weights $w_i$ and biases $b_i$.

Step 2: Analytic calculation of the hidden-layer output matrix $H$.

Step 3: Analytic calculation of the output weights using $\hat{\beta} = H^{\dagger} T$.

III. THE PROPOSED METHOD

The aim of this work is to introduce a system for classifying breast cancer based on an enhanced single-hidden-layer neural network trained using the ELM algorithm. The proposed enhancement concerns the activation functions. Indeed, the activation functions play an important role in the performance of neural networks, a matter discussed in several works [10-15]. It was noted that "networks with any bounded piecewise continuous and non-constant activation function can approximate any continuous objective function and can separate arbitrary disconnected regions of any shape" [10-12]. On the other hand, it was noted that a network with different types of activation function has better generalization capacities [14][15]. In this work, we use the sigmoid activation function, which is the most commonly used. The sigmoid function is given by:

$$g(x) = \frac{1}{1 + \exp(-(a_s x + b_s))} \qquad (4)$$

In order to enhance the performance of the neural classifier, we propose using different sigmoid functions for the hidden neurons. More precisely, we propose using different values of the parameters $a_s$ and $b_s$. Fig. 2, parts (a) and (b), illustrates the effect of the values of $a_s$ and $b_s$, respectively, on the shape of the sigmoid function.

[Figure 2: two plots of $f(x)$ over $x \in [-5, 5]$; panel (a) for $a_s \in \{0.5, 1, 1.5, 2\}$, panel (b) for $b_s \in \{-1, 0, 1, 1.5\}$.]

Figure 2. Effect of the parameters of the sigmoid. (a) Effect of $a_s$. (b) Effect of $b_s$.
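The three-step ELM procedure and the parameterised sigmoid of Eq. (4) can be combined in a short NumPy sketch. This is an illustration, not the authors' implementation: the toy data, the number of hidden neurons and the weight ranges are arbitrary choices, and the per-neuron parameters $a_s$, $b_s$ are fixed here rather than optimized.

```python
import numpy as np

def sigmoid(x, a_s=1.0, b_s=0.0):
    """Parameterised sigmoid of Eq. (4): g(x) = 1 / (1 + exp(-(a_s*x + b_s)))."""
    return 1.0 / (1.0 + np.exp(-(a_s * x + b_s)))

def elm_train(X, T, n_hidden, a_s, b_s, rng):
    """Three-step ELM training for a single-hidden-layer network."""
    n_features = X.shape[1]
    # Step 1: random initialisation of the input weights
    # (the input biases are subsumed by the per-neuron offsets b_s here)
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    # Step 2: analytic calculation of the hidden-layer output matrix H (P x N);
    # each hidden neuron i uses its own sigmoid parameters a_s[i], b_s[i]
    H = sigmoid(X @ W, a_s, b_s)
    # Step 3: smallest-norm least-squares output weights, Eq. (3):
    # beta = pinv(H) @ T  (Moore-Penrose pseudoinverse)
    beta = np.linalg.pinv(H) @ T
    return W, beta

def elm_predict(X, W, beta, a_s, b_s):
    return sigmoid(X @ W, a_s, b_s) @ beta

# Toy two-class problem: 1-D inputs, one-hot targets
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
labels = (X[:, 0] > 0).astype(int)
T = np.eye(2)[labels]

N = 10
a_s = np.ones(N)      # conventional ELM: identical sigmoids for all neurons
b_s = np.zeros(N)
W, beta = elm_train(X, T, N, a_s, b_s, rng)
pred = elm_predict(X, W, beta, a_s, b_s).argmax(axis=1)
acc = (pred == labels).mean()
```

Because training reduces to one pseudoinverse, there is no iterative loop, no learning rate and no stopping criterion, which is the speed advantage over back-propagation noted above.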
[Figure 3: flowchart. Random initialisation of $w$ and $b$ → initial population → GA operations (selection, crossover, mutation) → evaluation (calculating $\beta$ on the training data, then the objective function on the validation data) → best population → test on the test data.]

Figure 3. Principal scheme of the proposed method

Therefore, we consider the problem of defining the parameters of the sigmoid function as an optimization problem, and we use the Genetic Algorithm (GA) to solve it.

Since the parameter $b_s$ of the sigmoid function and the bias $b$ in the input layer have almost the same effect, we propose either optimizing both $a_s$ and $b_s$ without using $b$, or optimizing only the parameter $a_s$. We therefore propose two optimization strategies.

Fig. 3 illustrates the principal scheme of the proposed system. First, the input weights are randomly initialized; then the GA is used to define the optimal values of the parameters $a_s$ and $b_s$ that correspond to the initial weights. The objective function of the GA is based on the accuracy of the system on the validation samples.

IV. TESTS AND EXPERIMENTS

To evaluate the performance of the proposed classifier, we used the Wisconsin Breast Cancer Dataset (WBCD) [16]. This database contains 699 records, 16 of which have missing data. To be consistent with the literature, we removed these records. The retained dataset thus consists of 683 records, of which 239 samples are malignant and 444 are benign. Nine integer features characterize each record.

In order to evaluate the generalization performance, we used ten-fold cross-validation. According to this strategy, the dataset is randomly split into ten subgroups; nine of them are used as training data and the remaining one is used as the test set. This procedure is repeated ten times so that every sample appears exactly once in a test set.

We carried out tests with different numbers of hidden neurons. Given the random initialization of the networks, it is difficult to make a fair comparison; therefore, we performed 20 runs for each test. Furthermore, the same initial random weights are used for both the conventional and the proposed neural classifiers.

Table I displays the classification results corresponding to the first optimization strategy, i.e. optimizing $a_s$ and $b_s$. In this case, the proposed classifier does not contain the initial bias $b$. We note that the performance of the conventional system is enhanced only in the case of a small number of hidden neurons.

TABLE I. RESULTS BASED ON THE FIRST STRATEGY

# hidden neurons | Conventional ELM-NN | Proposed GA-ELM-NN
5  | 95.32 ± 1.05 | 96.12 ± 0.57
8  | 96.26 ± 0.57 | 96.80 ± 0.37
12 | 96.67 ± 0.21 | 96.95 ± 0.24
20 | 96.69 ± 0.22 | 97.03 ± 0.30
25 | 96.87 ± 0.33 | 96.98 ± 0.74

Table II displays the classification results corresponding to the second optimization strategy, i.e. optimizing $a_s$ only. In this case, the proposed classifier contains the initial bias $b$. We note that the enhancement is more significant than with the first strategy.

TABLE II. RESULTS BASED ON THE SECOND STRATEGY

# hidden neurons | Conventional ELM-NN | Proposed GA-ELM-NN
5  | 95.16 ± 1.21 | 96.11 ± 0.58
8  | 96.54 ± 0.28 | 96.74 ± 0.35
12 | 96.82 ± 0.27 | 97.21 ± 0.22
20 | 96.87 ± 0.27 | 97.25 ± 0.33
25 | 96.88 ± 0.31 | 97.28 ± 0.23

Table III compares the proposed method with other works applied to the same database. These works include a variety of machine learning techniques: decision trees, fuzzy logic, neural networks, clustering, wavelets, GA, SVM, ELM, etc. We note that our classification system outperforms several of these works. Although some other works outperform our system, we consider the obtained results promising, especially in the case of a small number of hidden neurons. Indeed, the ELM algorithm has several advantages over the conventional back-propagation algorithm, but it requires more hidden neurons; the application of our approach helps to overcome this problem.
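The second optimization strategy described above (evolving only the slope parameters $a_s$, with validation accuracy as the GA objective) can be sketched as follows. This is a minimal illustration under stated assumptions: the data is a synthetic stand-in for WBCD, and the encoding, GA operators, population size and mutation scale are illustrative choices, not the authors' settings.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for WBCD: 9 features, binary labels (illustrative only)
X = rng.normal(size=(300, 9))
labels = (X.sum(axis=1) > 0).astype(int)
T = np.eye(2)[labels]
tr, va = slice(0, 200), slice(200, 300)    # simple train/validation split

N = 8                                       # number of hidden neurons
W = rng.uniform(-1, 1, size=(9, N))         # fixed random input weights
b = rng.uniform(-1, 1, size=N)              # input-layer biases (kept in strategy 2)

def accuracy(a_s):
    """GA objective: validation accuracy of the ELM trained with slopes a_s."""
    H_tr = 1.0 / (1.0 + np.exp(-a_s * (X[tr] @ W + b)))   # sigmoid of Eq. (4), b_s = 0
    beta = np.linalg.pinv(H_tr) @ T[tr]                   # analytic output weights
    H_va = 1.0 / (1.0 + np.exp(-a_s * (X[va] @ W + b)))
    pred = (H_va @ beta).argmax(axis=1)
    return (pred == labels[va]).mean()

# Simple real-coded GA over the N slope parameters
pop = rng.uniform(0.1, 2.0, size=(20, N))   # initial population of a_s vectors
for _ in range(30):
    fit = np.array([accuracy(ind) for ind in pop])
    parents = pop[np.argsort(-fit)[:10]]                # truncation selection
    i, j = rng.integers(0, 10, size=(2, 20))
    alpha = rng.random((20, 1))
    pop = alpha * parents[i] + (1 - alpha) * parents[j]  # arithmetic crossover
    pop += rng.normal(0, 0.05, size=pop.shape)           # Gaussian mutation
    pop[0] = parents[0]                                  # elitism: keep the best
best = max(pop, key=accuracy)
```

Note that each fitness evaluation is cheap because, for fixed input weights and slopes, training the network is a single pseudoinverse; this is what makes wrapping a GA around ELM practical.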
TABLE III. COMPARISON OF THE RESULTS

Authors, Year | Method | Accuracy (%) [cross validation]
Quinlan, 1996 [17] | Decision trees (C4.5) | 94.74 [10-CV]
Hamilton, 1996 [18] | Rule induction based on approximate classification | 95.00 [10-CV]
Nauck, 1999 [19] | Neuro-fuzzy classifier | 95.06 [10-CV]
Abonyi, 2003 [20] | Fuzzy clustering + fuzzy classifier | 95.57 [10-CV]
Örkcü, 2011 [21] | Real-coded GA + neural networks | 96.50 [10-CV]
Stoean, 2013 [22] | SVM + evolutionary algorithm | 97.07 (a)
Malmir, 2013 [23] | Imperialist competitive algorithm + neural networks | 97.75 (a)
Nguyen, 2015 [24] | Wavelet + type-2 fuzzy logic | 97.88 [5-CV]
Fadzil, 2015 [25] | Neural networks + genetic algorithm | 98.29 (a)
Karabatak, 2015 [7] | Weighted naïve Bayesian | 98.54 [5-CV]
Proposed | Extreme learning + genetic algorithm | 97.28 [10-CV]

(a) Cross validation not mentioned.

V. CONCLUSION

This paper introduced a neural classification system for breast cancer diagnosis. The proposed system consists of an extreme-learning-based single-hidden-layer neural network. We used sigmoid activation functions with different parameters for the hidden neurons, considered defining the optimal values of these parameters as an optimization problem, and used a genetic algorithm for this task. We used the Wisconsin Breast Cancer Dataset to evaluate the proposed system. Compared to the conventional extreme learning network, the proposed classification system provided higher generalization performance with fewer hidden neurons. This helps to overcome the main problem of the ELM algorithm, namely its large number of hidden neurons. The proposed system was also compared with other works and outperformed most of them. These results are promising for further enhancement of this classifier.

REFERENCES

[1] C. E. DeSantis, J. Ma, A. Goding Sauer, L. A. Newman and A. Jemal, "Breast cancer statistics, 2017, racial disparity in mortality by state", CA: A Cancer Journal for Clinicians, vol. 67, no. 6, pp. 439-448, 2017.
[2] Bhardwaj and A. Tiwari, "Breast cancer diagnosis using genetically optimized neural network model", Expert Systems with Applications, vol. 42, no. 10, pp. 4611-4620, 2015.
[3] M. Abdel-Zaher and A. M. Eldeib, "Breast cancer classification using deep belief networks", Expert Systems with Applications, vol. 46, pp. 139-144, 2016.
[4] Onan, "A fuzzy-rough nearest neighbor classifier combined with consistency-based subset evaluation and instance selection for automated diagnosis of breast cancer", Expert Systems with Applications, vol. 42, no. 20, pp. 6844-6852, 2015.
[5] M. Nilashi, O. Ibrahim, H. Ahmadi and L. Shahmoradi, "A knowledge-based system for breast cancer classification using fuzzy logic method", Telematics and Informatics, vol. 34, no. 4, pp. 133-144, 2017.
[6] Zheng, S. W. Yoon and S. S. Lam, "Breast cancer diagnosis based on feature extraction using a hybrid of K-means and support vector machine algorithms", Expert Systems with Applications, vol. 41, no. 4, pp. 1476-1482, 2014.
[7] M. Karabatak, "A new classifier for breast cancer detection based on Naïve Bayesian", Measurement, vol. 72, pp. 32-36, 2015.
[8] A. Addeh, H. Demirel and P. Zarbakhsh, "Early detection of breast cancer using optimized ANFIS and features selection", in Proc. 9th International Conference on Computational Intelligence and Communication Networks (CICN), IEEE, 2017, pp. 39-42.
[9] V. Chaurasia and S. Pal, "A novel approach for breast cancer detection using data mining techniques", 2017.
[10] G. B. Huang, Q. Y. Zhu and C. K. Siew, "Extreme learning machine: theory and applications", Neurocomputing, vol. 70, no. 1-3, pp. 489-501, 2006.
[11] G. B. Huang, "An insight into extreme learning machines: random neurons, random features and kernels", Cognitive Computation, vol. 6, no. 3, pp. 376-390, 2014.
[12] G. B. Huang, Q. Y. Zhu and X. Ding, "Extreme learning machine for regression and multiclass classification", IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 42, no. 2, pp. 513-529, 2012.
[13] M. Nemissi, H. Seridi and H. Akdag, "One-against-all and one-against-one based neuro-fuzzy classifiers", Journal of Intelligent & Fuzzy Systems, vol. 26, no. 6, pp. 2661-2670, 2014.
[14] B. Li, Y. Li and X. Rong, "The extreme learning machine learning algorithm with tunable activation function", Neural Computing and Applications, vol. 22, no. 3-4, pp. 531-539, 2013.
[15] Ö. F. Ertuğrul, "A novel type of activation function in artificial neural networks: Trained activation function", Neural Networks, vol. 99, pp. 148-157, 2018.
[16] K. Bache and M. Lichman, UCI Machine Learning Repository, http://archive.ics.uci.edu/ml, Irvine, CA: University of California, School of Information and Computer Science, 2013.
[17] J. Quinlan, "Improved use of continuous attributes in C4.5", Journal of Artificial Intelligence Research, vol. 4, pp. 77-90, 1996.
[18] H. J. Hamilton, N. Cercone and N. Shan, "RIAC: a rule induction algorithm based on approximate classification", Computer Science Department, University of Regina, 1996.
[19] D. Nauck and R. Kruse, "Obtaining interpretable fuzzy classification rules from medical data", Artificial Intelligence in Medicine, vol. 16, no. 2, pp. 149-169, 1999.
[20] J. Abonyi and F. Szeifert, "Supervised fuzzy clustering for the identification of fuzzy classifiers", Pattern Recognition Letters, vol. 24, no. 14, pp. 2195-2207, 2003.
[21] H. Örkcü and H. Bal, "Comparing performances of backpropagation and genetic algorithms in the data classification", Expert Systems with Applications, vol. 38, no. 4, pp. 3703-3709, 2011.
[22] R. Stoean and C. Stoean, "Modeling medical decision making by support vector machines, explaining by rules of evolutionary algorithms with feature selection", Expert Systems with Applications, vol. 40, no. 7, pp. 2677-2686, 2013.
[23] H. Malmir, F. Farokhi and R. Sabbaghi-Nadooshan, "Optimization of data mining with evolutionary algorithms for cloud computing application", in Proc. 3rd International eConference on Computer and Knowledge Engineering (ICCKE), IEEE, 2013, pp. 343-347.
[24] T. Nguyen, A. Khosravi, D. Creighton and S. Nahavandi, "Medical data classification using interval type-2 fuzzy logic system and wavelets", Applied Soft Computing, vol. 30, pp. 812-822, 2015.
[25] A. Fadzil, N. A. M. Isa, Z. Hussain, M. K. Osman and S. N. Sulaiman, "A GA-based feature selection and parameter optimization of an ANN in diagnosing breast cancer", Pattern Analysis and Applications, vol. 18, no. 4, pp. 861-870, 2015.
