
Neural Comput & Applic (2019) 31:171–188

DOI 10.1007/s00521-017-2988-6

ORIGINAL ARTICLE

Feature selection via a novel chaotic crow search algorithm


Gehad Ismail Sayed1 · Aboul Ella Hassanien1 · Ahmad Taher Azar2,3

Received: 15 November 2016 / Accepted: 27 March 2017 / Published online: 25 April 2017
© The Natural Computing Applications Forum 2017

Abstract  Crow search algorithm (CSA) is a new nature-inspired algorithm proposed by Askarzadeh in 2016. The main inspiration of CSA came from the mechanism crows use to hide their food. Like most optimization algorithms, CSA suffers from a low convergence rate and entrapment in local optima. In this paper, a novel meta-heuristic optimizer, namely the chaotic crow search algorithm (CCSA), is proposed to overcome these problems. The proposed CCSA is applied to the feature selection problem on 20 benchmark datasets. Ten chaotic maps are employed during the optimization process of CSA. The performance of CCSA is compared with other well-known and recent optimization algorithms. Experimental results reveal the capability of CCSA to find an optimal feature subset which maximizes the classification performance and minimizes the number of selected features. Moreover, the results show that CCSA is superior compared to CSA and the other algorithms. In addition, the experiments show that the sine chaotic map is the most appropriate map to significantly boost the performance of CSA.

Keywords  Crow search algorithm · Feature selection · Optimization algorithm · Chaos theory

Scientific Research Group in Egypt, http://www.egyptscience.net

Gehad Ismail Sayed
gehad.ismail@egyptscience.net

Aboul Ella Hassanien
aboitcairo@gmail.com

Ahmad Taher Azar
ahmad.azar@fci.bu.edu.eg; ahmad_t_azar@ieee.org

1 Faculty of Computers and Information, Cairo University, Cairo, Egypt
2 Faculty of Computers and Information, Benha University, Banha, Egypt
3 Nanoelectronics Integrated Systems Center (NISC), Nile University, Giza, Egypt

1 Introduction

Recently, evolutionary algorithms (EAs) have attracted great attention and proved their efficiency for solving optimization problems. Some of these algorithms are the genetic algorithm (GA) [15], particle swarm optimization (PSO) [26], many-objective particle swarm optimization (MOPSO) [9], and differential evolution (DE) [44]. Despite their different structures, EAs usually start from a random population and then evaluate it over the iterations. EAs are also similar in dividing the search into two main phases: exploitation and exploration. In many cases, EAs get stuck in local minima. This is due to improper balancing between exploitation and exploration and to the stochastic nature of EAs. Several studies have been presented in the literature to overcome these problems and to improve the performance of EAs. Chaos is one of the mathematical approaches recently employed to boost the performance of EAs.

In the last decade, a branch of mathematics and system science, namely chaos, has been developed. It has been applied intensively in different scientific fields such as synchronization [60], chaos control [57], and various optimization studies [45, 51]. Chaos has three main dynamic properties: (1) quasi-stochasticity, (2) sensitivity to initial conditions, and (3) ergodicity. The application of chaos in optimization research has attracted great attention in recent years. The chaotic optimization algorithm (COA) is one of these applications; it makes use of the nature of chaotic sequences [29]. It has been proved that replacing random variables with chaotic variables can enhance the performance of COA [29]. Thus, several studies combine chaos with other algorithms in order to improve their performance. Some of them are chaotic ant swarm optimization (CASO) [5], the chaotic genetic algorithm (CGA) [1, 49], chaotic particle swarm optimization (CPSO) [20, 21, 50], chaotic pattern search (CPS) [54], the chaotic simulated annealing algorithm (CSA) [33], the chaotic differential evolution algorithm (CDEA) [22], chaotic biogeography-based optimization (CBBO) [39], the chaotic krill herd algorithm (CKHA) [47], and the chaotic firefly algorithm (CFA) [13]. These algorithms have been proposed in the literature and applied in different domains.

Feature selection is considered a preprocessing step in machine learning. Selecting the most relevant feature subset is a challenging task for complex or large datasets. Discovering hidden patterns or valuable knowledge in large-scale data has become a pressing matter [23]. Feature selection has been proven to effectively remove irrelevant and redundant features. In addition, it can improve the performance of classifiers, reduce the computational cost, and reduce the required storage [28]. In recent years, data have become increasingly larger in both the number of attributes/features and the number of instances. Feature selection has been applied successfully in many applications such as text categorization [52], genome projects [4], customer relationship management [35], and image retrieval [37]. Thus, feature selection for high-dimensional data has become necessary for machine learning tasks.

Feature selection has also emerged as an effective component of medical classification and diagnostic support systems. Microarray technology is one of the recent breakthroughs in experimental molecular biology. This technology helps scientists monitor gene expression on a genomic scale, and gene expression data can significantly increase the possibility of cancer classification and diagnosis. However, many factors can degrade the outcome of the analysis. One of these factors is the huge number of genes found in the original data. Thus, determining the most discriminatory genes is considered a critical task to improve the speed and accuracy of prediction systems [16, 43]. The authors in [43] present a hybrid forward selection algorithm for cardiovascular disease diagnosis. The experimental results demonstrate that the proposed approach finds smaller feature subsets with high diagnostic accuracy compared to backward elimination and forward inclusion algorithms.

Feature selection can also have a great impact on clustering performance, as selecting relevant features can improve the learning quality and reduce computational time and memory. In addition, it can help in better data understanding and interpretation. Several studies on applying feature selection before clustering are found in the literature [6, 53]. It has been proven that clusters built from a subset of the salient features are more practical and interpretable than clusters built using all of the features, as some of these features include noise. Moreover, feature selection plays an important role in multi-label learning. Class label dependence in multi-label learning is another aspect that can be noisy and incomplete. Thus, the high dimensionality of multi-labeled data not only increases the computational costs and memory storage requirements of many learning algorithms but also, in real applications, limits their usage. Feature selection can be used to improve the performance of multi-label ELM-RBF by reducing the feature dimensionality while performing clustering analysis [58]. The authors in [24] proposed a multi-label informed feature selection approach; they used label correlations to select the most discriminating features across multiple labels.

Feature selection algorithms are divided into two main categories, namely wrapper-based and filter-based algorithms [27]. Wrapper-based algorithms depend on a machine learning algorithm for their evaluation, while filter-based algorithms use statistical methods for selecting a feature subset. Although wrapper-based algorithms obtain better results, they are computationally expensive; thus, an intelligent search algorithm is needed to reduce the computational time [17]. Several feature selection algorithms have been proposed, such as sequential backward selection (SBS) and sequential forward selection (SFS). However, these algorithms suffer from getting stuck in local optima and from high computational cost. EAs, using their search agents, can search the feature space adaptively to find the optimal solution. Some of these algorithms are the gradient descent algorithm [10], discrete particle swarm optimization (DPSO) [46], tabu search [56], elephant herding optimization (EHO) [11], the firefly algorithm (FA) [12], harmony search (HS) [14], charged system search (CSS) [25], the bird swarm algorithm (BSA) [32], animal migration optimization (AMO) [30], teaching–learning-based optimization (TLBO) [36], moth-flame optimization (MFO) [55], and the gray wolf optimizer (GWO) [34].

Although the selection of the best feature subset has been significantly improved, there is still a need to push the presented work further. This work proposes a novel hybrid approach in which chaos is embedded into CSA. The main contribution of this paper is a chaotic binary version of CSA, namely CCSA, proposed to enhance the performance of CSA. In this hybrid approach, the chaotic search methodology is adopted to select the optimal feature subset which maximizes the classification accuracy and minimizes the feature subset length.

Ten one-dimensional chaotic maps are adopted to replace the random movement parameters of CSA. In this study, CCSA is used as a feature selection algorithm. The performance of the proposed approach is tested on 20 benchmark datasets. In addition, the performance of CCSA is compared with that of several other meta-heuristic algorithms.

The organization of this paper is as follows. A brief overview of the basic CSA algorithm and the ten chaotic maps is given in Section 2. The detailed description of the proposed CCSA approach is provided in Section 3. Experimental results and discussions of the proposed CCSA are introduced in Section 4. Finally, a summary of the proposed work is presented in Section 5.

2 Basics and background

2.1 Crow search algorithm

2.1.1 Inspiration analysis

The CSA algorithm is a meta-heuristic algorithm proposed by Askarzadeh [2] in 2016. The main inspiration of this algorithm came from the mechanism crows use to hide their food. Crows are considered among the most intelligent birds. They have a large brain relative to their body size; their brain-to-body ratio is only slightly lower than that of humans. In addition, they are self-aware in mirror tests and are able to remember faces. When they encounter an unfriendly individual, they warn the other crows in a sophisticated way of communication. Additionally, they can store food and recall its location several months later. They are known to be thieves, as they steal the food of other birds, and they make use of their own experience as thieves to predict a pilferer's behavior. They are also very cautious: when a crow detects thievery, it moves its hiding places to avoid becoming a future victim.

The four main principles of CSA are defined as follows:

– Crows live in the form of a flock
– Crows memorize the hiding places of their food
– Crows follow each other to commit thievery
– Crows are very cautious against thievery; they protect their caches with a certain probability

2.1.2 Mathematical model of CSA

Suppose the number of crows in the flock is denoted by M, D is the total number of dimensions, y^{j,t} is the position of crow j at iteration t in the search space, where j = 1, 2, ..., M, and tMax is the maximum number of iterations. Each crow has a memory in which the position of its hiding place is memorized: N^{j,t} is the position of the hiding place of crow j at iteration t, i.e., the best position obtained so far by crow j.

At iteration t, crow j wants to follow crow z to its hiding place. In this situation, two cases may happen:

Case 1: Crow z does not know that crow j follows it. Crow j approaches the hiding place of crow z, and the updated position of crow j is defined as follows:

y^{j,t+1} = y^{j,t} + R_j × fl^{j,t} × (N^{z,t} − y^{j,t})   (1)

where fl denotes the flight length and R_j is a random number in [0, 1]. fl has a great influence on the searching capability: a small value of fl leads to local search, while a large value leads to global search.

Case 2: Crow z knows that crow j follows it. Crow z will change its position in the search space to protect its cache.

The previous two cases are mathematically defined as follows:

y^{j,t+1} = { y^{j,t} + R_j × fl^{j,t} × (N^{z,t} − y^{j,t}),   if R_z ≥ AP^{z,t}
            { a random position,                              otherwise          (2)

where R_z is a random number in [0, 1] and AP^{z,t} is the awareness probability of crow z at iteration t. AP controls the balance between exploration and exploitation: small values of AP lead to searching in local regions (exploitation), while large values lead to global search in the search space (exploration).

CSA starts by setting the parameters D, tMax, M, AP, and fl. Each crow position y is randomly initialized in the search space. At the beginning, the crows have no experience of hiding their food, so they hide it at their initial positions N. While the algorithm runs, each crow is evaluated using a predefined fitness function. Then, according to the fitness value, the crows update their positions using Eq. (2), and each new position is checked for feasibility. The crows update their memory as follows:

N^{j,t+1} = { y^{j,t+1},   if Fn(y^{j,t+1}) is better than Fn(N^{j,t})
            { N^{j,t},     otherwise                                      (3)

where Fn() is the objective function. Once the termination criterion is met, the best position found is reported as the optimal solution. The pseudo code of CSA is given in Algorithm 1.
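To make the update rules concrete, the following is a minimal Python sketch of a continuous CSA loop built directly from Eqs. (1)-(3). It assumes a minimization problem and reuses the parameter names introduced above (M crows, flight length fl, awareness probability AP, tMax iterations); the function name csa and the sphere test function are hypothetical, and this is an illustration only, not the authors' MATLAB implementation.

import numpy as np

def csa(fitness, dim, n_crows=30, t_max=50, ap=0.1, fl=2.0, lb=0.0, ub=1.0):
    """Minimal continuous crow search loop following Eqs. (1)-(3), minimizing `fitness`."""
    y = np.random.uniform(lb, ub, (n_crows, dim))    # crow positions
    mem = y.copy()                                    # hiding places N (memory)
    mem_fit = np.array([fitness(p) for p in mem])     # fitness of memorized positions
    for _ in range(t_max):
        for j in range(n_crows):
            z = np.random.randint(n_crows)            # crow j follows a randomly chosen crow z
            if np.random.rand() >= ap:                # case 1: crow z is unaware, Eq. (1)
                y[j] = y[j] + np.random.rand() * fl * (mem[z] - y[j])
            else:                                     # case 2: crow z is aware, jump to a random position
                y[j] = np.random.uniform(lb, ub, dim)
            y[j] = np.clip(y[j], lb, ub)              # feasibility check
            f = fitness(y[j])
            if f < mem_fit[j]:                        # memory update, Eq. (3)
                mem[j], mem_fit[j] = y[j].copy(), f
    best = int(np.argmin(mem_fit))
    return mem[best], mem_fit[best]

# Usage: minimize the sphere function in 5 dimensions
best_pos, best_val = csa(lambda x: float(np.sum(x ** 2)), dim=5, lb=-5.0, ub=5.0)
print(best_val)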

2.2 Chaotic map

Chaos is defined as a phenomenon in which any change in the initial condition may lead to a non-linear change in future behavior. Chaos optimization is one of the recent search approaches. Its main idea is to transform parameters/variables from the chaos space to the solution space; the search for the global optimum relies on properties of chaotic motion such as ergodicity, regularity, and quasi-stochasticity. The main privileges of COA are its fast convergence rate and its capability of avoiding local minima. All of these privileges can significantly improve the performance of evolutionary algorithms [59].

Chaotic maps are deterministic; no random factors are used. In this work, ten distinguished non-invertible one-dimensional maps are adopted to obtain chaotic sets. The adopted chaotic maps are defined in Table 1. In this table, q denotes the index of the chaotic sequence p, and p_q is the qth number in the chaotic sequence. The remaining parameters, including d, c, and μ, are control parameters which determine the chaotic behavior of the dynamic system. The initial point p_0 is set to 0.7 for all chaotic maps, as the initial value of a chaotic map can have a great influence on its fluctuation pattern; we used the initial values as in [39]. Figure 1 shows the visualization of these maps.

Table 1 The ten adopted chaotic maps

No.     Name         Definition                                                                                      Range
CCSA1   Chebyshev    p_{q+1} = cos(q cos^{-1}(p_q))                                                                  (−1, 1)
CCSA2   Circle       p_{q+1} = mod(p_q + d − (c/2π) sin(2π p_q), 1),  c = 0.5, d = 0.2                               (0, 1)
CCSA3   Gauss/mouse  p_{q+1} = 1 if p_q = 0;  1/mod(p_q, 1) otherwise                                                (0, 1)
CCSA4   Iterative    p_{q+1} = sin(cπ/p_q),  c = 0.7                                                                 (−1, 1)
CCSA5   Logistic     p_{q+1} = c p_q (1 − p_q),  c = 4                                                               (0, 1)
CCSA6   Piecewise    p_{q+1} = p_q/l for 0 ≤ p_q < l;  (p_q − l)/(0.5 − l) for l ≤ p_q < 0.5;
                     (1 − l − p_q)/(0.5 − l) for 0.5 ≤ p_q < 1 − l;  (1 − p_q)/l for 1 − l ≤ p_q < 1,  l = 0.4        (0, 1)
CCSA7   Sine         p_{q+1} = (c/4) sin(π p_q),  c = 4                                                              (0, 1)
CCSA8   Singer       p_{q+1} = μ (7.86 p_q − 23.31 p_q^2 + 28.75 p_q^3 − 13.302875 p_q^4),  μ = 1.07                 (0, 1)
CCSA9   Sinusoidal   p_{q+1} = c p_q^2 sin(π p_q),  c = 2.3                                                          (0, 1)
CCSA10  Tent         p_{q+1} = p_q/0.7 for p_q < 0.7;  (10/3)(1 − p_q) for p_q ≥ 0.7                                 (0, 1)

Fig. 1 Visualization of chaotic maps
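As an illustration of how such sequences are produced, the sketch below (Python; the paper's own experiments used MATLAB) iterates three of the maps from Table 1 starting from p_0 = 0.7. The function name chaotic_sequence is hypothetical, and only the logistic, sine, and tent maps are included here.

import numpy as np

def chaotic_sequence(name, length, p0=0.7, c=4.0):
    """Generate a chaotic sequence from a few of the maps in Table 1 (illustrative sketch)."""
    p, seq = p0, np.empty(length)
    for q in range(length):
        if name == "logistic":                        # CCSA5: p = c*p*(1-p), c = 4
            p = c * p * (1.0 - p)
        elif name == "sine":                          # CCSA7: p = (c/4)*sin(pi*p), c = 4
            p = (c / 4.0) * np.sin(np.pi * p)
        elif name == "tent":                          # CCSA10
            p = p / 0.7 if p < 0.7 else (10.0 / 3.0) * (1.0 - p)
        else:
            raise ValueError("map not implemented in this sketch")
        seq[q] = p
    return seq

# Example: the first ten values of the sine map starting from p0 = 0.7
print(chaotic_sequence("sine", 10))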

3 The proposed chaotic crow search algorithm

In this section, the random variables used for updating the crow positions are replaced with chaotic variables. Since updating the crow positions influences the optimal solution and the convergence rate, a chaotic sequence generated from a chaotic map is used. Such a combination of chaos with CSA is defined as CCSA. In this study, ten different chaotic maps are used for the optimization process: Chebyshev, sine, sinusoidal, tent, circle, singer, Gauss/mouse, logistic, iterative, and piecewise. The mathematical formulas of these maps are given in Section 2.2 (Table 1). These maps can significantly improve the performance and convergence rate of CSA, as will be demonstrated in the following sections. The CSA approach combined with chaotic sequences is described in Eq. (4):

y^{j,t+1} = { y^{j,t} + C_j × fl^{j,t} × (N^{z,t} − y^{j,t}),   if C_z ≥ AP^{z,t}
            { a random position,                              otherwise          (4)

where C_j is the value obtained from the chaotic map for crow j and C_z is the value obtained from the chaotic map for crow z at the current iteration.

In this work, a novel binary CCSA for the feature selection task is proposed. In binary CCSA, the solution pool is in binary form, where the solutions are restricted to the binary values {0, 1}. The agents are transferred from the continuous to the binary space using the following equations:

y^{j,t+1} = { 1   if s(y^{j,t+1}) ≥ rand()
            { 0   otherwise                   (5)

where

s(y^{j,t+1}) = 1 / (1 + e^{10 (y^{j,t+1} − 0.5)})   (6)

rand() is a random number drawn from the uniform distribution on [0, 1], and y^{j,t+1} on the left-hand side of Eq. (5) is the updated binary position.
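A small sketch of this continuous-to-binary transfer, assuming the position is held in a NumPy array; the helper name to_binary is hypothetical and simply applies Eq. (6) and then Eq. (5) element-wise.

import numpy as np

def to_binary(y_continuous, rng=np.random):
    """Map a continuous crow position to a binary feature mask using Eqs. (5)-(6)."""
    s = 1.0 / (1.0 + np.exp(10.0 * (y_continuous - 0.5)))      # transfer function, Eq. (6)
    return (s >= rng.rand(*y_continuous.shape)).astype(int)    # thresholding, Eq. (5)

# Example: a 6-dimensional continuous position mapped to a selection mask (1 = feature kept)
mask = to_binary(np.array([0.1, 0.9, 0.45, 0.6, 0.2, 0.8]))
print(mask)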

In this paper, CCSA is implemented as a wrapper-based feature selection algorithm in which a chaotic sequence is embedded in the searching iterations. The optimal feature subset describing the dataset is selected using CCSA. The purpose of feature selection is to improve the classification performance, reduce the feature subset length, and reduce the computational cost. The detailed description is as follows.

3.1 Parameter initialization

At the beginning, CCSA starts by setting the adjustable parameters and randomly initializing the crow positions (solutions) in the search space. Each position represents a feature subset with a different number of features and a different length. The initial parameter settings of CCSA are presented in Table 2.

Table 2 Parameter settings for CCSA

Parameter      Value
M              30
AP             0.1
fl             2
Lower bound    0
Upper bound    1
tMax           50
D              Same as the total number of features in the original dataset

3.2 Fitness function

At each iteration, each crow position is evaluated using a specified fitness function Fn_t. The data is divided randomly into two parts, namely training and testing sets, using m-fold cross validation; in this study, m is set to 10 to ensure the stability of the obtained results. Two objective criteria are used for evaluation, namely the classification accuracy and the number of selected features. The adopted fitness function combines the two criteria into one by setting a weight factor, as in Eq. (7). Acc is the classification accuracy, calculated by dividing the number of correctly classified instances by the total number of instances. K-nearest neighbor (KNN) with k equal to 3 and the mean absolute distance is the classifier used. KNN is a supervised learning algorithm which classifies a new instance based on its distance to the training instances [8]. In this study, KNN is used to determine the goodness of the selected features; the selection of k and the distance measure is based on trial and error. L_f is the length of the selected feature subset, L_t is the total number of features, and w_f is a weight factor with a value in [0, 1]. w_f is used to control the relative importance of the classification accuracy and the number of selected features. Improving the accuracy is the first objective for any classifier; thus, the weight factor is usually set to a value near one [38]. In this study, we set w_f to 0.8. The best solution is the one which maximizes the classification accuracy and minimizes the number of selected features.

Fn_t = maximize(Acc + w_f × (1 − L_f/L_t))   (7)
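The following sketch shows one way the wrapper fitness of Eq. (7) could be computed, assuming a scikit-learn 3-NN classifier with the Manhattan metric as a stand-in for the "mean absolute distance" mentioned above and 10-fold cross validation; the original study used MATLAB, so the function fitness and its signature are illustrative only.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fitness(mask, X, y, wf=0.8):
    """Wrapper fitness of Eq. (7): 10-fold CV accuracy of a 3-NN classifier on the
    selected features plus a reward for short subsets (to be maximized)."""
    mask = np.asarray(mask)
    if mask.sum() == 0:                              # an empty subset is invalid
        return 0.0
    knn = KNeighborsClassifier(n_neighbors=3, metric="manhattan")
    acc = cross_val_score(knn, X[:, mask == 1], y, cv=10).mean()   # mean accuracy over 10 folds
    lf, lt = mask.sum(), mask.size                   # selected and total feature counts
    return acc + wf * (1.0 - lf / lt)                # Eq. (7)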
3.3 Position updating

The crow positions of CCSA are updated according to Eqs. (2), (4), and (5).

3.4 Termination criteria

The optimization process terminates when it reaches the maximum number of iterations or when the best solution is found. In our case, we used the maximum number of iterations, which is set to 50 in all experiments. The pseudo code of CCSA and its corresponding flowchart are given in Algorithm 2 and Fig. 2, respectively.

4 Experimental results and discussion

4.1 Dataset description

Twenty benchmark datasets of various types, including medical/biology and business, are used in the experiments. The datasets are collected from the UCI machine learning repository [3]. A brief description of each adopted dataset is presented in Table 3. As can be seen, the adopted datasets contain missing values in some records. In this paper, all these missing values are replaced by the median of all known values of the feature within the given class, as defined in Eq. (8), where s_{i,j} is the missing value of the jth feature in an instance belonging to class W_r. For missing categorical values, the most frequently occurring value of the feature within the given class replaces the missing value.

s_{i,j} = median_{i: s_{i,j} ∈ W_r} (s_{i,j})   (8)
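A possible realization of this per-class imputation rule, sketched with pandas; the helper impute_by_class and the toy data frame are hypothetical, and the categorical case falls back to the per-class mode as described in the text.

import numpy as np
import pandas as pd

def impute_by_class(df, class_col):
    """Replace missing values with the per-class median (numeric features)
    or the per-class most frequent value (categorical features), as in Eq. (8)."""
    out = df.copy()
    for col in [c for c in df.columns if c != class_col]:
        if pd.api.types.is_numeric_dtype(out[col]):
            fill = out.groupby(class_col)[col].transform("median")
        else:
            fill = out.groupby(class_col)[col].transform(
                lambda s: s.mode().iloc[0] if not s.mode().empty else s.iloc[0])
        out[col] = out[col].fillna(fill)
    return out

# Example with a hypothetical two-class toy frame
toy = pd.DataFrame({"f1": [1.0, np.nan, 3.0, 4.0],
                    "f2": ["a", "b", None, "b"],
                    "cls": [0, 0, 1, 1]})
print(impute_by_class(toy, "cls"))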

Fig. 2 Chaotic crow search algorithm flowchart (initialize feature subsets randomly, assign each subset to a crow position, fill the crow memory with the initial positions, evaluate each position, generate a new position using Eq. (4) or select a random position, check feasibility, update the position and memory when the fitness improves, and repeat until the iteration limit is reached, then output the best position)

Table 3 Dataset description

ID Dataset No. of features No. of instances No. of classes Missing values Type

D1 Chess 36 3196 2 No Game


D2 Poker hand 10 25010 10 No Game
D3 German credit 24 1000 2 No Business
D4 Credit approval 15 690 2 Yes Business
D5 Cylinder bands 40 512 2 Yes Physical
D6 Abalone 8 4177 29 No Life
D7 Glass identification 10 214 6 No Physical
D8 Letter recognition 17 20000 26 No Computer
D9 Waveform 21 5000 3 No Physical
D10 Zoo 18 101 2 No Life
D11 Wisconsin Diagnosis Breast Cancer (WBCD) 32 596 2 No Clinical
D12 Mice Protein Expression Dataset (MPED) 82 1080 8 Yes Clinical
D13 Parkinson’s Disease Detection Dataset (PDD) 23 197 2 No Clinical
D14 Cardiotocography 23 2126 3 No Clinical
D15 Hepatitis 19 155 2 Yes Clinical
D16 Lung Cancer 56 32 3 Yes Clinical
D17 Single photon emission computed tomography (SPECT) 44 267 2 No Clinical
D18 Thoracic surgery 17 470 2 No Clinical
D19 Statlog (heart) 13 270 2 No Clinical
D20 Indian Liver Patient Dataset 10 583 2 No Clinical

4.2 Performance metrics

In this subsection, six different statistical measures are adopted: the worst, best, and mean fitness value, the standard deviation (SD), the average feature selection size (ASS), and the P value of Wilcoxon's rank sum test. Wilcoxon's rank sum test is a non-parametric statistical test with a 5% significance level [48]. A statistical test is needed in order to prove that the proposed algorithm provides a significant improvement compared to other algorithms [7]. Wilcoxon's rank sum test is more sensitive than the t test, as it assumes only proportionality of the differences between two paired samples. Moreover, it is safer than the t test because it does not assume normal distributions, and outliers affect Wilcoxon's test less than the t test [7]. Generally speaking, P values below 0.05 can be considered sufficient evidence against the null hypothesis. In this work, we used this test to evaluate the performance of each chaotic map and determine the best one. The worst, best, and mean fitness values, SD, and ASS are mathematically defined as follows:

SD = sqrt( Σ_{i=1}^{tMax} (BS_i − μ)^2 / tMax )   (9)

Best fitness = max_{i=1..tMax} BS_i   (10)

Worst fitness = min_{i=1..tMax} BS_i   (11)

Mean fitness = (1/tMax) Σ_{i=1}^{tMax} BS_i   (12)

ASS = (1/tMax) Σ_{i=1}^{tMax} length(BS_i)/L   (13)

where BS_i is the best score obtained so far at iteration i, μ is the mean of the BS_i values, and L is the number of features in the original dataset.
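The sketch below shows how these run statistics and the significance test could be computed, assuming the best score BS_i and the best feature mask are recorded at every iteration; the function name run_statistics is hypothetical, scipy.stats.ranksums is used for Wilcoxon's rank-sum test, and the arrays in the usage line are random placeholders rather than the paper's data.

import numpy as np
from scipy.stats import ranksums

def run_statistics(best_scores, best_masks, n_features):
    """Per-run statistics of Eqs. (9)-(13) from the best score BS_i and best mask of each iteration."""
    bs = np.asarray(best_scores, dtype=float)
    return {
        "best": bs.max(),                                                    # Eq. (10)
        "worst": bs.min(),                                                   # Eq. (11)
        "mean": bs.mean(),                                                   # Eq. (12)
        "sd": float(np.sqrt(((bs - bs.mean()) ** 2).sum() / bs.size)),       # Eq. (9)
        "ass": float(np.mean([m.sum() / n_features for m in best_masks])),   # Eq. (13)
    }

# Wilcoxon rank-sum test at the 5% level between two score samples (placeholder data)
p_value = ranksums(np.random.rand(50), np.random.rand(50)).pvalue
print(p_value < 0.05)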
4.3 Analysis and discussion

In this subsection, various experiments on the 20 benchmark datasets are carried out. These experiments aim to evaluate the performance of the proposed CCSA feature selection algorithm and to compare it with other meta-heuristic algorithms. All experiments are performed on the same PC with the same specification; the detailed settings are presented in Table 4.

Table 4 PC specification

Name                Detailed settings
Hardware
  CPU               Core(TM) i3
  Frequency         2.13 GHz
  RAM               2 GB
Software
  Operating system  Windows 7
  Language          MATLAB R2012R

Table 5 Statistical results

D1 Worst Best Mean SD ASS P val. D2 Worst Best Mean SD ASS P val.
CSA 1 1.39 1.23 0.07 0.96 CSA 0.77 1.05 0.97 0.07 1
CCSA1 1.04 1.32 1.28 0.06 0.53 2.05E-07 CCSA1 0.74 1.12 1.07 0.06 0.49 1.29E-05
CCSA2 1.09 1.38 1.31 0.07 0.92 1.25E-07 CCSA2 0.81 1.08 1.04 0.07 0.93 2.76E-06
CCSA3 1.02 1.32 1.27 0.07 0.45 7.23E-07 CCSA3 0.81 1.07 1.01 0.07 0.45 0.517
CCSA4 1.04 1.43 1.31 0.1 0.8 1.03E-05 CCSA4 0.81 1.07 1.03 0.06 0.8 9.20E-04
CCSA5 1.03 1.32 1.28 0.06 0.12 8.18E-07 CCSA5 0.82 1.12 1.04 0.06 0.86 4.78E-06
CCSA6 0.97 1.38 1.28 0.06 0.35 1.56E-04 CCSA6 0.69 1.12 1.04 0.07 0.55 4.00E-04
CCSA7 1.07 1.35 1.29 0.05 0.63 4.33E-07 CCSA7 0.82 1.13 1.08 0.06 0.43 3.04E-07
CCSA8 1.04 1.35 1.29 0.07 0.21 1.90E-06 CCSA8 0.77 1.12 1.07 0.07 0.62 1.06E-09
CCSA9 1.03 1.34 1.29 0.06 0.2 8.55E-08 CCSA9 0.76 1.13 1.06 0.08 0.59 1.07E-06
CCSA10 1.1 1.35 1.28 0.07 0.02 4.70E-02 CCSA10 0.82 1.12 1.04 0.07 0.41 1.19E-14

D3 Worst Best Mean SD ASS P val. D4 Worst Best Mean SD ASS P val.
CSA 0.97 1.4 1.33 0.12 0.36 CSA 1.14 1.27 1.24 0.04 1
CCSA1 1.03 1.47 1.42 0.08 0.54 1.57E-08 CCSA1 1.08 1.37 1.33 0.05 0.73 2.05E-14
CCSA2 0.96 1.47 1.41 0.11 0.8 2.28E-09 CCSA2 1.02 1.42 1.34 0.08 0.99 6.89E-10
CCSA3 1.02 1.45 1.35 0.14 0.57 0.002 CCSA3 1.02 1.43 1.33 0.09 0.95 1.85E-08
CCSA4 0.97 1.47 1.42 0.12 0.73 3.77E-12 CCSA4 1 1.42 1.33 0.09 0.99 1.44E-05
CCSA5 1 1.47 1.4 0.12 0.44 3.64E-09 CCSA5 0.99 1.39 1.33 0.08 0.8 3.59E-11
CCSA6 0.94 1.47 1.4 0.13 0.08 3.01E-08 CCSA6 1.15 1.48 1.37 0.08 0.94 1.25E-07
CCSA7 1.04 1.47 1.41 0.1 0.04 3.52E-05 CCSA7 1.05 1.4 1.31 0.05 0.68 1.20E-07
CCSA8 0.99 1.47 1.42 0.1 0.07 2.11E-09 CCSA8 1 1.35 1.31 0.07 0.32 5.22E-10
CCSA9 1.01 1.47 1.41 0.1 0.09 3.78E-06 CCSA9 1 1.39 1.32 0.08 0.48 1.55E-08
CCSA10 0.98 1.47 1.37 0.14 0.13 1.00E-02 CCSA10 1.06 1.37 1.32 0.06 0.12 4.29E-17

D5 Worst Best Mean SD ASS P val. D6 Worst Best Mean SD ASS P val.
CSA 1.01 1.37 1.3 0.11 0.21 CSA 0.46 0.52 0.77 0.1 0.5
CCSA1 1.03 1.4 1.36 0.08 0.07 1.11E-09 CCSA1 0.47 0.9 0.86 0.09 0.12 5.24E-09
CCSA2 1 1.45 1.38 0.09 0.26 9.86E-09 CCSA2 0.48 0.9 0.84 0.1 0.78 1.64E-05
CCSA3 0.99 1.44 1.32 0.14 0.75 6.00E-03 CCSA3 0.47 0.88 0.82 0.09 0.59 1.37E-08
CCSA4 0.98 1.45 1.37 0.1 0.82 4.38E-06 CCSA4 0.48 0.89 0.87 0.07 0.18 9.13E-14
CCSA5 1.01 1.39 1.34 0.08 0.46 1.26E-04 CCSA5 0.48 0.9 0.84 0.09 0.22 1.88E-09
CCSA6 1 1.41 1.34 0.1 0.46 8.48E-06 CCSA6 0.48 0.9 0.85 0.09 0.8 4.56E-08
CCSA7 1.02 1.45 1.34 0.08 0.18 1.27E-07 CCSA7 0.55 0.9 0.87 0.07 0.16 6.60E-13
CCSA8 1.02 1.41 1.36 0.08 0.02 6.00E-07 CCSA8 0.49 0.9 0.87 0.07 0.13 6.66E-13
CCSA9 1.01 1.41 1.36 0.08 0.06 5.43E-07 CCSA9 0.49 0.9 0.87 0.07 0.13 6.66E-13
CCSA10 1 1.45 1.4 0.09 1 1.24E-09 CCSA10 0.4 0.9 0.82 0.1 0.38 4.03E-07

D7 Worst Best Mean SD ASS P val. D8 Worst Best Mean SD ASS P val.
CSA 1.01 1.66 1.58 0.16 0.28 CSA 1.08 1.3 1.25 0.06 0.97
CCSA1 0.95 1.72 1.63 0.14 0.37 1.38E-05 CCSA1 1.13 1.3 1.25 0.05 0.66 5.14E-04
CCSA2 1.04 1.73 1.59 0.18 0.37 2.68E-02 CCSA2 1.07 1.36 1.29 0.07 0.99 1.08E-04
CCSA3 1.05 1.72 1.49 0.19 0.34 5.62E-02 CCSA3 1.09 1.28 1.24 0.04 1 4.79E-05
CCSA4 1.32 1.73 1.63 0.12 0.36 7.43E-04 CCSA4 0.09 1.3 1.25 0.05 0.69 1.70E-05
CCSA5 1.25 1.73 1.62 0.16 0.48 1.28E-04 CCSA5 1.05 1.34 1.26 0.07 0.83 5.40E-01
CCSA6 1.07 1.73 1.63 0.12 0.3 1.90E-02 CCSA6 1.06 1.35 1.28 0.07 0.95 2.50E-05
CCSA7 1.13 1.73 1.63 0.12 0.22 2.10E-05 CCSA7 1.13 1.36 1.28 0.05 0.79 1.20E-05
CCSA8 1.15 1.73 1.62 0.12 0.31 9.50E-03 CCSA8 1.3 1.33 1.25 0.68 0.68 1.13E-01
CCSA9 1.07 1.66 1.62 0.12 0.25 8.05E-04 CCSA9 1.06 1.29 1.24 0.05 0.89 9.13E-05
CCSA10 1.06 1.52 1.39 0.11 0.99 5.60E-03 CCSA10 0.98 1.36 1.27 0.09 1 3.40E-03

Table 5 (continued)

D9 Worst Best Mean SD ASS P val. D10 Worst Best Mean SD ASS P val.
CSA 1.05 1.3 1.25 0.07 0.55 CSA 1.18 1.49 1.39 0.1 0.54
CCSA1 1.06 1.36 1.29 0.07 0.68 1.99E-04 CCSA1 1.18 1.53 1.45 0.08 0.7 2.34E-04
CCSA2 1.05 1.38 1.32 0.08 0.98 5.47E-09 CCSA2 1.21 1.54 1.46 0.09 0.9 7.20E-04
CCSA3 1.01 1.36 1.31 0.08 0.97 6.17E-08 CCSA3 1.19 1.54 1.41 0.08 0.44 7.47E-02
CCSA4 1.04 1.38 1.33 0.08 0.94 6.46E-07 CCSA4 1.13 1.43 1.43 0.09 0.83 2.13E-04
CCSA5 1.07 1.37 1.29 0.08 0.78 4.23E-05 CCSA5 1.2 1.47 1.41 0.06 0.37 8.20E-01
CCSA6 1.07 1.38 1.31 0.07 0.84 7.04E-07 CCSA6 1.17 1.45 1.4 0.07 0.41 2.64E-04
CCSA7 1.08 1.38 1.32 0.07 0.65 4.46E-08 CCSA7 1.2 1.55 1.46 0.06 0.18 9.56E-05
CCSA8 1.04 1.36 1.31 0.07 0.74 2.06E-08 CCSA8 1.24 1.5 1.44 0.07 0.26 3.13E-05
CCSA9 1.03 1.4 1.3 0.08 0.78 1.21E-05 CCSA9 1.21 1.49 1.45 0.08 0.45 2.34E-05
CCSA10 1.03 1.36 1.32 0.07 1 3.35E-06 CCSA10 1.26 1.49 1.44 0.07 1 4.31E-02

D11 Worst Best Mean SD ASS P val. D12 Worst Best Mean SD ASS P val.
CSA 1.16 1.62 1.52 0.13 0.15 CSA 1.27 1.7 1.59 0.14 0.21
CCSA1 1.27 1.69 1.64 0.09 0.36 3.77E-08 CCSA1 1.26 1.75 1.66 0.1 0.2 9.90E-04
CCSA2 1.2 1.69 1.61 0.1 0.72 8.05E-06 CCSA2 1.27 1.73 1.63 0.09 0.18 4.15E-03
CCSA3 1.22 1.66 1.57 0.13 0.9 4.21E-01 CCSA3 1.31 1.58 1.5 0.07 0.33 9.19E-02
CCSA4 1.28 1.66 1.58 0.1 0.1 8.76E-05 CCSA4 1.29 1.7 1.61 0.1 0.26 2.79E-03
CCSA5 1.21 1.68 1.59 0.09 0.51 4.60E-02 CCSA5 1.27 1.77 1.68 0.13 0.25 1.23E-05
CCSA6 1.22 1.68 1.63 0.09 0.5 3.90E-07 CCSA6 1.25 1.77 1.67 0.13 0.19 2.62E-04
CCSA7 1.25 1.69 1.65 0.08 0.2 2.13E-08 CCSA7 1.3 1.76 1.68 0.09 0.27 6.14E-05
CCSA8 1.25 1.66 0.09 0.09 0.15 3.21E-02 CCSA8 1.33 1.77 1.67 0.1 0.3 9.81E-04
CCSA9 1.17 1.69 1.63 0.1 0.53 8.25E-08 CCSA9 1.26 1.74 1.65 0.11 0.12 3.90E-03
CCSA10 1.23 1.67 1.6 0.12 0.16 2.36E-04 CCSA10 1.3 1.62 1.54 0.09 1 1.65E-03

D13 Worst Best Mean SD ASS P val. D14 Worst Best Mean SD ASS P val.
CSA 1.07 1.52 1.4 0.12 0.39 CSA 1.17 1.5 1.42 0.1 0.33
CCSA1 1.18 1.64 1.59 0.1 0.04 2.86E-13 CCSA1 1.23 1.55 1.51 0.07 0.44 2.95E-02
CCSA2 1.09 1.64 1.57 0.11 0.25 5.64E-10 CCSA2 1.15 1.57 1.53 0.09 0.98 2.74E-08
CCSA3 1.15 1.58 1.51 0.12 0.51 1.78E-08 CCSA3 1.16 1.57 1.48 0.1 0.93 7.31E-02
CCSA4 1.13 1.64 1.55 0.13 0.18 9.12E-08 CCSA4 1.15 1.58 1.53 0.09 0.07 1.00E-06
CCSA5 1.16 1.64 1.57 0.11 0.14 4.72E-11 CCSA5 1.19 1.57 1.49 0.08 0.69 2.15E-02
CCSA6 1.16 1.64 1.55 0.15 0.21 3.31E-08 CCSA6 1.19 1.57 1.5 0.08 0.43 4.33E-03
CCSA7 1.18 1.64 1.59 0.1 0.2 2.04E-11 CCSA7 1.19 1.58 1.53 0.07 0.67 1.20E-06
CCSA8 1.2 1.63 1.56 0.09 0.09 3.95E-11 CCSA8 1.22 1.58 1.51 0.07 0.31 1.84E-03
CCSA9 1.18 1.64 1.58 0.1 0.08 1.16E-12 CCSA9 1.16 1.58 1.51 0.08 0.19 6.26E-03
CCSA10 1.14 1.63 1.52 0.14 0.47 3.70E-08 CCSA10 1.16 1.58 1.5 0.1 0.69 4.70E-04

D15 Worst Best Mean SD ASS P val. D16 Worst Best Mean SD ASS P val.
CSA 1.14 1.72 1.52 0.22 0.3 CSA 0.69 1.83 1.52 0.36 0.1
CCSA1 1.18 1.76 1.7 0.11 0.14 7.71E-04 CCSA1 0.71 1.84 1.64 0.25 0.1 4.24E-02
CCSA2 1.13 1.8 1.66 0.18 0.17 9.80E-04 CCSA2 0.9 1.84 1.49 0.31 0.32 9.12E-04
CCSA3 1.18 1.8 1.62 0.17 0.21 1.48E-04 CCSA3 0.83 1.86 1.58 0.3 0.06 4.55E-04
CCSA4 1.1 1.8 1.68 0.19 0.97 3.28E-06 CCSA4 0.81 1.85 1.59 0.36 0.16 2.27E-04
CCSA5 1.19 1.8 1.68 0.21 0.3 6.22E-07 CCSA5 0.89 1.84 1.62 0.28 0.1 4.55E-03
CCSA6 1.18 1.76 1.65 0.15 0.13 7.72E-03 CCSA6 0.75 1.85 1.62 0.3 0.15 3.32E-04
CCSA7 1.18 1.8 1.69 0.11 0.18 1.20E-04 CCSA7 0.91 1.85 1.64 0.24 0.11 2.58E-04
CCSA8 1.08 1.8 1.68 0.15 0.09 9.45E-03 CCSA8 0.59 1.84 1.64 0.29 0.09 6.06E-04
CCSA9 1.04 1.76 1.65 0.17 0.1 5.76E-03 CCSA9 0.77 1.84 1.69 0.29 0.11 9.52E-04
CCSA10 1.02 1.8 1.65 0.2 0.23 5.14E-05 CCSA10 0.62 1.85 1.66 0.34 0.1 2.27E-05

Table 5 (continued)

D17 Worst Best Mean SD ASS P val. D18 Worst Best Mean SD ASS P val.
CSA 1.04 1.73 1.5 0.2 0.12 CSA 1.09 1.61 1.51 0.15 0.37
CCSA1 1.06 1.81 1.63 0.2 0.49 6.21E-05 CCSA1 1.11 1.76 1.66 0.14 0.11 4.49E-08
CCSA2 1.03 1.81 1.66 0.21 0.06 6.66E-05 CCSA2 1.2 1.76 1.62 0.15 0.4 9.26E-05
CCSA3 1.08 1.77 1.43 0.15 0.43 1.31E-02 CCSA3 1.17 1.61 1.52 0.12 0.28 9.69E-02
CCSA4 1.08 1.81 1.65 0.2 0.17 1.99E-04 CCSA4 1.16 1.76 1.64 0.15 0.78 4.10E-07
CCSA5 1.08 1.81 1.66 0.19 0.11 6.29E-05 CCSA5 1.09 1.76 1.62 0.15 0.94 4.77E-05
CCSA6 1.12 1.81 1.69 0.19 0.07 3.38E-06 CCSA6 1.12 1.76 1.62 0.15 0.15 6.92E-05
CCSA7 1.14 1.81 1.69 0.17 0.07 5.36E-06 CCSA7 1.19 1.76 1.7 0.12 0.64 9.80E-09
CCSA8 1.09 1.53 1.49 0.09 0.04 7.82E-04 CCSA8 1.2 1.76 1.66 0.15 0.2 1.15E-08
CCSA9 1.1 1.75 1.65 0.16 0.07 2.81E-04 CCSA9 1.17 1.76 1.69 0.14 0.88 8.30E-09
CCSA10 1.04 1.81 1.65 0.21 0.15 5.25E-04 CCSA10 1.18 1.76 1.64 0.15 0.47 2.62E-06

D19 Worst Best Mean SD ASS P val. D20 Worst Best Mean SD ASS P val.
CSA 0.91 1.37 1.3 0.13 0.27 CSA 0.86 1.33 1.27 0.15 0.25
CCSA1 0.99 1.45 1.39 0.1 0.74 8.47E-08 CCSA1 0.98 1.44 1.39 0.13 0.19 3.02E-11
CCSA2 0.95 1.45 1.37 0.12 0.76 1.30E-05 CCSA2 0.94 1.44 1.37 0.14 0.29 2.63E-09
CCSA3 1 1.36 1.3 0.09 0.41 6.65E-05 CCSA3 0.96 1.41 1.32 0.14 0.19 7.11E-08
CCSA4 0.96 1.45 1.38 0.12 0.73 2.47E-08 CCSA4 0.92 1.44 1.4 0.11 0.33 3.81E-10
CCSA5 0.94 1.45 1.36 0.13 0.39 1.77E-05 CCSA5 0.99 1.44 1.38 0.11 0.18 4.34E-10
CCSA6 0.92 1.45 1.37 0.12 0.6 1.07E-07 CCSA6 0.93 1.44 1.38 0.12 0.45 2.18E-08
CCSA7 0.95 1.45 1.39 0.1 0.84 1.87E-08 CCSA7 0.94 1.44 1.41 0.11 0.17 1.98E-13
CCSA8 0.94 1.44 1.39 0.12 0.6 6.10E-10 CCSA8 0.95 1.44 1.39 0.1 0.59 1.99E-09
CCSA9 0.95 1.45 1.4 0.12 0.42 1.94E-10 CCSA9 1.03 1.44 1.41 0.1 0.15 2.66E-13
CCSA10 0.9 1.43 1.36 0.13 0.76 3.33E-05 CCSA10 0.97 1.44 1.38 0.12 0.26 4.42E-08

4.3.1 The performance of CCSA with different chaotic maps

The main objective of this experiment is to evaluate the performance of CCSA with different chaotic maps and to determine the optimal chaotic map.

Table 5 compares CCSA with the different chaotic maps against the original CSA in terms of mean, best, and worst fitness, ASS, SD, and the P value of Wilcoxon's rank sum test. The P value is used to compare CSA with the different versions of CCSA. The best results for the worst, mean, and best fitness, standard deviation, and average selection size are underlined, as are the P values where P < 0.05. It must be pointed out that CCSA1, CCSA2, ..., CCSA10 in Table 5 refer to the ten adopted chaotic maps as listed in Table 1, and that D1, D2, ..., D20 refer to the adopted benchmarks as listed in Table 3. As can be observed from Table 5, CCSA with the different chaotic maps overtakes the standard CSA. Moreover, it can be noticed that in most cases CCSA7 obtains the highest statistical results compared with the others. In addition, the obtained P values confirm this improvement, which means that the sine chaotic map can improve the performance of CSA. Thus, these results show that the sine map is statistically significantly better than CSA, as it obtains good classification performance with the minimum number of features. The highest classification performance with a small number of features is highly desirable in biology and medicine, since fewer tests are then needed to detect a disease or cancer in a patient, and the cost involved in each experiment can be reduced. However, CCSA3 obtains the worst results in most of the cases. Moreover, the P values show that there is no large difference between CSA and CCSA3, because the P value exceeds 0.05. According to these results, the sine chaotic map is selected as the most appropriate map for CCSA. In the following sections, CCSA with the sine chaotic map is evaluated further in more detail.

For further comparison of CCSA with the different chaotic maps, the convergence curves of the CCSA versions are analyzed as well. The convergence curves of CCSA with the different chaotic maps on the 20 benchmark datasets are shown in Fig. 3, where the number of iterations equals 50.

Fig. 3 Performance comparison (convergence curves) on datasets D1 to D20

As can be observed from this figure, almost all CCSA versions with different chaotic maps obtain better results, as their curves lie above that of the original CSA. In addition, it can be noticed that in most cases the CCSA algorithms converge towards the global optimum faster than CSA. Additionally, the CCSA7 curve obtains the highest results in most cases, while CCSA3 obtains the lowest. These results are consistent with those reported in Table 5. Such an improvement comes from embedding a chaotic sequence in the searching iterations of CSA, which helps the algorithm avoid local optima and reach the global optimum faster.

4.3.2 CCSA vs. other optimization algorithms

In this subsection, the performance of CCSA with the sine chaotic map is compared with other optimization algorithms proposed in the literature for solving the feature selection problem. These algorithms are particle swarm optimization (PSO) [31], artificial bee colony (ABC) [42], chicken swarm optimization (CSO) [18], the flower pollination algorithm (FPA) [41], moth-flame optimization (MFO) [55], the gray wolf optimizer (GWO) [8], the whale optimization algorithm (WOA) [40], the sine cosine algorithm (SCA) [19], and CSA.

Table 6 Meta-heuristic algorithm parameter settings

Algorithm  Parameter                                      Value
PSO        Inertia weight                                 1
           Inertia weight damping ratio                   0.9
           Personal learning coefficient                  1.5
           Global learning coefficient                    2.0
ABC        Colony size                                    10
           Number of food sources                         5
           Limit of trials                                5
CSO        Number of chickens updated                     10
           Percentage of roosters in the population       0.15
           Percentage of hens in the population           0.7
           Percentage of mother hens in the population    0.05
FPA        Probability switch                             0.6
MFO        a                                              −1
           b                                              1
GWO        a                                              2
WOA        a                                              2
           b                                              1
SCA        b                                              2

The parameter settings for all adopted optimization algorithms are shown in Table 6. In addition, the common settings for all algorithms are: maximum number of generations = 50, number of search agents = 30, lower bound = 1, and upper bound = number of features.

Table 7 compares the average score over 50 iterations. As can be seen, the performance of CCSA, WOA, and FPA is comparable. CCSA obtains the highest mean fitness value on 13 of the 20 benchmark datasets (D2, D3, D6, D8, D9, D11, D13, D14, D15, D17, D18, D19, and D20), whereas WOA obtains the same or higher results than CCSA on six benchmark datasets (D1, D4, D8, D10, D16, and D19), and FPA obtains the highest mean fitness on the remaining datasets (D4, D5, and D7). WOA is found to be the second most effective algorithm on the 20 benchmark datasets, while MFO obtains the worst results in most cases.

Table 8 compares the best scores obtained by the other meta-heuristic algorithms, CCSA, and CSA. As can be seen from this table, CCSA obtains the highest results in most of the cases, with WOA in second place and FPA in third place. CCSA outperforms the other algorithms on D2, D5, D6, D8, D9, D12, D13, D14, D16, D17, D18, and D19, while WOA outperforms on D4, D10, D11, D14, D15, and D20, and FPA on D1, D4, and D7. It can also be observed that CCSA improves the performance of CSA.

Table 9 compares the worst fitness values of the meta-heuristic algorithms. As can be observed, FPA obtains the highest results, while the CSA algorithm obtains almost the worst results. This is because the memory of each crow is set to its initial random position, as the crows are assumed to have no experience at the start. The algorithm begins with randomly initialized positions, followed by a random selection of one of the flock's crows to be followed; only then does a crow discover the position of the hidden food N^z. FPA, in contrast, starts with randomly initialized positions followed by sorting the solutions according to their fitness values, with the best solution set as the current optimum; the algorithm then improves on the results obtained from the first iteration. Therefore, FPA obtains the highest results in most cases. Other algorithms, such as PSO, have many parameters, and these parameters can strongly affect the obtained results.

Table 10 compares the stability of the competing algorithms with CCSA. As can be seen, MFO is superior to the others in this respect. However, despite this good stability, it obtains the worst results in terms of mean, best, and worst fitness. The reason is the weakness of such algorithms in exploitation due to their random parameters; thus, the initial solution improves only slowly. The remaining algorithms are comparable with each other.

The main privilege of CSA is that it has few parameters to adjust. Such parameters can strongly influence the performance of any optimization algorithm, and selecting the optimal value for each of them is a difficult task. This paper presents a new mechanism for choosing these parameter values by using chaotic maps, where the random parameters are replaced with values acquired from the chaotic map. The experimental results show that chaotic maps, especially the sine chaotic map, can increase the stability and the performance of CSA. In addition, the results show the repeatability and stability of CCSA and its ability to find feature subsets in the feature space better than the other evolutionary algorithms. This is because the adopted evolutionary algorithms employ random parameters in their searching process, which can significantly influence their performance; moreover, the random behavior of agents such as particles, ants, and fireflies can cause the optimization process to stagnate with high probability.

Table 7 Mean fitness value for meta-heuristic algorithms

PSO ABC CSO FPA MFO GWO WOA SCA CSA CCSA

D1 1.24 1.16 1.31 1.4 1.24 1.28 1.32 1.3 1.23 1.29
D2 1.04 1.03 0.98 1.07 0.95 0.99 0.99 1.04 0.97 1.08
D3 1.14 1.16 1.2 1.35 1.14 1.16 1.38 1.3 1.33 1.41
D4 1.35 1.34 1.29 1.38 1.35 1.28 1.38 1.34 1.24 1.31
D5 1.2 1.09 1.2 1.39 1.12 1.12 1.35 1.24 1.3 1.34
D6 0.76 0.74 0.77 0.81 0.64 0.69 0.81 0.77 0.77 0.87
D7 1.43 1.51 1.57 1.66 1.43 1.5 1.66 1.56 1.58 1.63
D8 1.24 1.18 1.15 1.19 1.19 1.24 1.28 1.18 1.25 1.28
D9 1.28 1.17 1.17 1.3 1.23 1.24 1.29 1.23 1.25 1.32
D10 1.4 1.35 1.32 1.47 1.28 1.36 1.48 1.37 1.39 1.46
D11 1.37 1.36 1.4 1.57 1.3 1.35 1.58 1.48 1.52 1.65
D12 1.42 1.26 1.42 1.59 1.28 1.32 1.61 1.5 1.59 1.68
D13 1.25 1.35 1.31 1.52 1.23 1.29 1.56 1.42 1.4 1.59
D14 1.35 1.35 1.36 1.5 1.21 1.3 1.52 1.4 1.42 1.53
D15 1.37 1.4 1.32 1.68 1.26 1.39 1.67 1.5 1.52 1.69
D16 1 1.12 1.18 1.42 0.93 1.02 1.7 1.3 1.52 1.64
D17 1.12 1.15 1.27 1.61 1.13 1.15 1.67 1.37 1.5 1.69
D18 1.42 1.48 1.37 1.6 1.35 1.41 1.47 1.42 1.51 1.7
D19 1.19 1.14 1.13 1.29 1.07 1.12 1.39 1.27 1.3 1.39
D20 1.24 1.26 1.22 1.36 1.14 1.2 1.37 1.28 1.27 1.41

Table 8 Best fitness value for meta-heuristic algorithms

PSO ABC CSO FPA MFO GWO WOA SCA CSA CCSA

D1 1.28 1.17 1.34 1.4 1.26 1.35 1.35 1.32 1.39 1.35
D2 1.05 1.08 1.05 1.09 0.96 1.05 1.08 1.08 1.05 1.13
D3 1.16 1.31 1.23 1.43 1.17 1.25 1.47 1.34 1.4 1.47
D4 1.41 1.43 1.35 1.46 1.36 1.36 1.46 1.4 1.27 1.4
D5 1.28 1.15 1.25 1.44 1.14 1.15 1.39 1.28 1.37 1.45
D6 0.82 0.8 0.8 0.87 0.66 0.71 0.88 0.78 0.52 0.9
D7 1.5 1.57 1.61 1.75 1.45 1.55 1.71 1.6 1.7 1.73
D8 1.28 1.23 1.15 1.21 1.2 1.28 1.34 1.2 1.33 1.36
D9 1.32 1.22 1.2 1.34 1.24 1.33 1.36 1.25 1.3 1.38
D10 1.44 1.39 1.37 1.53 1.3 1.42 1.57 1.4 1.51 1.55
D11 1.41 1.47 1.45 1.62 1.31 1.38 1.71 1.51 1.68 1.7
D12 1.45 1.3 1.45 1.69 1.29 1.38 1.74 1.55 1.73 1.76
D13 1.28 1.4 1.35 1.6 1.24 1.32 1.62 1.45 1.52 1.64
D14 1.38 1.41 1.38 1.55 1.29 1.35 1.58 1.45 1.5 1.58
D15 1.4 1.47 1.39 1.8 1.29 1.42 1.81 1.55 1.72 1.8
D16 1.02 1.18 1.25 1.78 1.02 1.07 1.8 1.38 1.86 1.85
D17 1.19 1.15 1.32 1.79 1.14 1.2 1.78 1.4 1.73 1.81
D18 1.48 1.53 1.42 1.7 1.36 1.42 1.58 1.48 1.66 1.76
D19 1.24 1.22 1.18 1.32 1.09 1.17 1.43 1.32 1.37 1.45
D20 1.3 1.32 1.24 1.4 1.18 1.3 1.45 1.3 1.33 1.44

Table 9 Worst fitness value for meta-heuristic algorithms

PSO ABC CSO FPA MFO GWO WOA SCA CSA CCSA

D1 1.19 1.16 1.26 1.39 1.23 1.2 1.24 1.15 1 1.07


D2 1 0.93 0.9 1.02 0.9 0.9 0.89 0.85 0.77 0.81
D3 1.01 1 1.03 1.1 1 1.07 1.05 1.06 0.97 1.04
D4 1.08 1.22 1.13 1.31 1.33 1.25 1.2 1.23 1.14 1.05
D5 1.11 1.02 1.07 1.22 1.09 1.07 1.07 1.04 1.01 1.02
D6 0.7 0.73 0.68 0.77 0.62 0.62 0.68 0.68 0.46 0.55
D7 1.31 1.41 1.45 1.52 1.42 1.44 1.41 1.33 1.01 1.13
D8 1.18 1.12 1.15 1.16 1.18 1.16 1.21 1.15 1.08 1.13
D9 1.15 1.09 1.15 1.28 1.21 1.13 1.15 1.13 1.05 1.08
D10 1.35 1.27 1.26 1.38 1.26 1.31 1.3 1.25 1.18 1.2
D11 1.31 1.27 1.3 1.27 1.27 1.3 1.27 1.3 1.16 1.25
D12 1.28 1.24 1.3 1.52 1.25 1.28 1.27 1.3 1.27 1.3
D13 1.2 1.29 1.23 1.36 1.2 1.21 1.25 1.17 1.07 1.18
D14 1.29 1.23 1.26 1.39 1.24 1.27 1.25 1.25 1.17 1.19
D15 1.29 1.4 1.25 1.52 1.24 1.21 1.2 1.25 1.14 1.18
D16 0.98 0.98 0.97 1.28 0.9 0.92 1.02 0.92 0.69 0.91
D17 1.13 1.15 1.13 1.38 1.1 1.13 1.12 0.98 1.04 1.14
D18 1.4 1.28 1.32 1.53 1.33 1.4 1.4 1.33 1.09 1.19
D19 1.13 1.01 1.14 1.2 1.05 1.05 1.14 1.15 0.91 0.95
D20 1.12 1.05 1.1 1.29 1.12 1.07 1.1 1.09 0.86 0.94

Table 10 Standard deviation for meta-heuristic algorithms

PSO ABC CSO FPA MFO GWO WOA SCA CSA CCSA

D1 0.03 0.01 0.05 0.01 0.01 0.03 0.03 0.05 0.07 0.05
D2 0.01 0.05 0.05 0.03 0.03 0.04 0.05 0.06 0.07 0.06
D3 0.02 0.1 0.09 0.1 0.04 0.06 0.15 0.09 0.12 0.1
D4 0.09 0.08 0.1 0.08 0.03 0.06 0.13 0.06 0.04 0.05
D5 0.06 0.07 0.07 0.05 0.03 0.05 0.1 0.08 0.11 0.08
D6 0.04 0.06 0.04 0.05 0.03 0.04 0.06 0.05 0.1 0.07
D7 0.06 0.07 0.08 0.07 0.02 0.05 0.09 0.09 0.16 0.12
D8 0.05 0.06 0 0.02 0.02 0.05 0.04 0.02 0.06 0.05
D9 0.06 0.06 0.04 0.04 0.02 0.09 0.06 0.05 0.07 0.07
D10 0.07 0.06 0.05 0.05 0.04 0.06 0.09 0.05 0.1 0.06
D11 0.06 0.08 0.07 0.09 0.01 0.04 0.09 0.08 0.13 0.08
D12 0.06 0.05 0.06 0.05 0.01 0.06 0.1 0.08 0.14 0.09
D13 0.05 0.09 0.04 0.07 0.02 0.06 0.1 0.08 0.12 0.1
D14 0.04 0.08 0.06 0.07 0.05 0.06 0.09 0.07 0.1 0.07
D15 0.05 0.06 0.06 0.07 0.02 0.08 0.18 0.1 0.22 0.11
D16 0.05 0.1 0.12 0.13 0.04 0.08 0.23 0.18 0.36 0.24
D17 0.06 0 0.08 0.21 0.01 0.07 0.24 0.1 0.2 0.17
D18 0.06 0.11 0.07 0.11 0.02 0.01 0.09 0.07 0.15 0.12
D19 0.09 0.1 0.06 0.05 0.02 0.08 0.1 0.08 0.13 0.1
D20 0.09 0.1 0.06 0.07 0.03 0.11 0.1 0.09 0.15 0.11

Table 11 Comparison between before applying CCSA and after applying CCSA

Before CCSA After CCSA

Lt Accuracy(%) Time (sec) Lf Accuracy(%) Time (sec)

D1 36 94.9467468 0.719863916 21 91.9814006 0.412878573


D2 10 52.1120568 3.898838573 5 56.4238614 1.960385134
D3 24 67.3024206 0.142678961 12 75.2986905 0.068773236
D4 15 81.6397423 0.095975524 9 89.1433695 0.066734388
D5 36 63.9529284 0.099454607 18 74.3023043 0.052294992
D6 8 21.1255002 0.511819257 4 23.9413022 0.426966692
D7 10 97.8919654 0.087094961 3 99.993271 0.04356745
D8 16 90.0411199 1.061973386 11 92.3475118 0.888438378
D9 21 79.3606111 0.123031527 10 83.1285304 0.102288399
D10 16 96.5858538 0.078921282 9 93.6982489 0.041023341
D11 30 94.6010058 0.096846979 14 90.2838101 0.047685087
D12 80 98.6991135 0.231579869 49 100 0.138074349
D13 22 87.2745655 0.095018416 15 90.7859824 0.035512628
D14 22 89.534504 0.291153506 13 91.0358514 0.198558899
D15 20 83.700571 0.093885187 2 100 0.038915681
D16 57 56.0296056 0.073634809 5 100 0.035579996
D17 45 75.8928936 0.099246728 20 81.5086657 0.037070276
D18 17 84.8157717 0.085629223 3 100 0.051169463
D19 13 63.6562316 0.097383518 4 78.8470018 0.043596322
D20 10 66.6812339 0.09964035 1 71.682268 0.051165132

4.3.3 Classification performance

In this subsection, the features selected by CCSA with the sine chaotic map are evaluated using three criteria: classification accuracy, number of selected features, and CPU processing time. A comparison between using the whole feature set and using the selected subset is shown in Table 11. In this table, L_t indicates the length of the original feature set and L_f indicates the length of the feature subset selected by CCSA. Tenfold cross validation is employed, and a KNN classifier with k = 3 and the Euclidean distance is adopted. The mean accuracy over the ten folds, the CPU processing time in seconds, and the number of features are recorded. As can be seen, using only a subset of the features can greatly increase the classification performance and reduce the computational time and memory storage. In addition, it can be observed that in some cases the accuracy reaches 100% with less computational time. These results demonstrate the superiority of CCSA as a feature selection algorithm. The selected features can further be used to improve clustering performance: as mentioned before, clusters built from a subset of salient features are more practical and interpretable than clusters built using all of the features, which include noise [6]. Thus, feature selection can help in better data understanding and interpretation.

5 Conclusion

In this paper, a novel hybridization of chaos with the CSA algorithm, namely CCSA, is proposed. Ten chaotic maps are used in this study to enhance the performance and convergence speed of CSA. CCSA is applied to one of the challenging problems, namely feature selection. The proposed CCSA feature selection algorithm has been validated on 20 benchmark datasets. Six different evaluation criteria are adopted in this study: the best, worst, and mean fitness value, SD, ASS, and the P value. In addition, the performance of CCSA is compared with popular and recent meta-heuristic algorithms, namely PSO, ABC, CSO, FPA, MFO, GWO, WOA, SCA, and CSA. The experimental results show that CCSA outperforms the other algorithms in terms of best and mean fitness value. Moreover, the results show that CCSA with the sine map can significantly enhance CSA in terms of classification performance, stability, number of selected features, and convergence speed. Further work on embedding chaotic maps into other meta-heuristic algorithms will be considered. For future verification, the performance of CCSA will be applied to more complex scientific and real-world engineering problems. Moreover, other chaotic maps are also worth applying to CSA.

Compliance with Ethical Standards

Conflict of interest  The authors declare that they have no conflict of interest.

References

1. Abdullah A, Enayatifa R, Lee M (2012) A hybrid genetic algorithm and chaotic function model for image encryption. Journal of Electronics and Communication 66(1):806–816
2. Askarzadeh A (2016) A novel metaheuristic method for solving constrained engineering optimization problems: crow search algorithm. Comput Struct 169:1–12
3. Bache K, Lichman M. UCI machine learning repository. http://archive.ics.uci.edu/ml. Retrieved July 19, 2016
4. Blum A, Langley P (1997) Selection of relevant features and examples in machine learning. Artif Intell 97:245–271
5. Cai JJ, Ma XQ, Li X (2007) Chaotic ant swarm optimization to economic dispatch. Electr Power Syst Res 77(10):1373–1380
6. Chen CH (2014) A hybrid intelligent model of analyzing clinical breast cancer data using clustering techniques with feature selection. Appl Soft Comput 20:4–14
7. Derrac J, García S, Molina D, Herrera F (2011) A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol Comput 1(1):3–18
8. Emary E, Zawbaa H, Hassanien A (2016) Binary gray wolf optimization approaches for feature selection. Neurocomputing 172:371–381
9. Figueiredo E, Ludermir T, Bastos C (2016) Many objective particle swarm optimization. Inf Sci 374:115–134
10. Gadat S, Younes L (2007) A stochastic algorithm for feature selection in pattern recognition. Journal of Machine Learning 8:509–547
11. Gai-Ge W, Suash D, Leandro D, Coelho S (2015) Elephant herding optimization. In: 3rd international symposium on computational and business intelligence (ISCBI), Bali, pp 1–5
12. Gandomi A, Yang X, Alavi A (2011) Mixed variable structural optimization using firefly algorithm. Comput Struct 89:2325–2336
13. Gandomi AH, Yang XS, Talatahari S, Alavi AH (2013) Firefly algorithm with chaos. Commun Nonlinear Sci Numer Simul 18(1):89–98
14. Geem Z, Kim J, Loganathan G (2001) A new heuristic optimization algorithm: harmony search. Simulation 76(2):60–68
15. Goldberg D (1989) Genetic algorithms in search, optimization and machine learning, 1st edn. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA. ISBN 0201157675
16. Golub TR (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531–537
17. Guyon I, Elisseeff A (2003) An introduction to variable and attribute selection. Machine Learning Research 3:1157–1182
18. Hafez AI, Zawbaa HM, Emary E, Mahmoud HA, Hassanien AE (2015) An innovative approach for feature selection based on chicken swarm optimization. In: 7th international conference of soft computing and pattern recognition (SoCPaR), pp 19–24
19. Hafez AI, Zawbaa HM, Emary E, Hassanien AE (2016) Sine cosine optimization algorithm for feature selection. In: International symposium on innovations in intelligent systems and applications (INISTA), pp 1–5
20. He YY, Zhou JZ, Zhou XQ (2009) Comparison of different chaotic maps in particle swarm optimization algorithm for long term cascaded hydroelectric system scheduling. Chaos Solitons Fractals 42:3169–1376
21. He YY, Zhou JZ, Li CS (2008) A precise chaotic particle swarm optimization algorithm based on improved tent map. ICNC 7:569–573
22. He Y, Zhou J, Lu N, Qin H, Lu Y (2010) Differential evolution algorithm combined with chaotic pattern search. Kybernetika 46(4):684–696
23. Jia H, Ding S, Du M, Xue Y (2016) Approximate normalized cuts without Eigen-decomposition. Inf Sci 374:135–150
24. Jian L, Li J, Shu K, Liu H (2016) Multi-label informed feature selection. In: Proceedings of the twenty-fifth international joint conference on artificial intelligence, pp 1627–1633
25. Kaveh A, Talatahari S (2010) A novel heuristic optimization method: charged system search. Acta Mech 213:267–289
26. Kennedy J, Eberhart R (1995) Particle swarm optimization. In: IEEE international conference on neural networks, vol 4, pp 1942–1948
27. Kohavi R, John G (1997) Wrappers for feature subset selection. Artif Intell 97(1):273–324
28. Lei Y, Huan L (2003) Feature selection for high-dimensional data: a fast correlation-based filter solution. In: Proceedings of the 20th international conference on machine learning (ICML-03), pp 856–863
29. Li B, Jiang W (1998) Optimizing complex functions by chaos search. Journal of Cybernetics and Systems 29:409–419
30. Li X, Zhang J, Yin M (2013) Animal migration optimization: an optimization algorithm inspired by animal migration behavior. Neural Comput Applic, pp 1–11
31. Lin S, Ying KS-C, Lee Z (2008) Particle swarm optimization for parameter determination and feature selection of support vector machines. Expert Syst Appl 35(4):1817–1824
32. Meng X, Gao XZ, Lu L, Liu Y, Zhang H (2016) A new bio-inspired optimisation algorithm: bird swarm algorithm. J Exp Theor Artif Intell 28(4):673–687
33. Mingjun J, Tang HW (2004) Application of chaos in simulated annealing optimization. Chaos Solitons Fractals 21:933–941
34. Mirjalili S, Seyed M, Lewis A (2014) Grey wolf optimizer. Adv Eng Softw 69:46–61
35. Ng K, Liu H (2000) Customer retention via data mining. AI Review 14:569–590
36. Repinsek M, Liu S, Mernik L (2012) A note on teaching–learning-based optimization algorithm. Inf Sci 212:79–93
37. Rui Y, Huang TS, Chang S (1999) Image retrieval: current techniques, promising directions and open issues. J Vis Commun Image Represent 10:39–62
38. Sarafrazi S (2013) Facing the classification of binary problems with a gsa-svm hybrid system. Math Comput Model 57:270–278
39. Saremi S, Mirjalili S, Lewis A (2014) Biogeography-based optimization with chaos. Neural Comput & Applic 25(5):1077–1097
40. Sayed G, Darwish A, Hassanien A, Pan S (2016) Breast cancer diagnosis approach based on meta-heuristic optimization algorithm inspired by bubble-net hunting strategy of whales. In: 10th international conference on genetic and evolutionary computing (ICGEC), Fujian, China, pp 306–313

41. Sayed S, Nabil E, Badr A (2016) A binary clonal flower pollination algorithm for feature selection. Pattern Recogn Lett 77:21–27
42. Schiezaro M, Pedrini H (2013) Data feature selection based on artificial bee colony algorithm. EURASIP Journal on Image and Video Processing 2013(1):1–8
43. Shilaskar S, Ghatol A (2013) Feature selection for medical diagnosis: evaluation for cardiovascular diseases. Expert Syst Appl 40(10):4146–4153
44. Storn R, Price K (1997) Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim 11(4):341–359
45. Tavazoei MS, Haeri M (2007) Comparison of different one-dimensional maps as chaotic search pattern in chaos optimization algorithms. Appl Math Comput 187:1076–1085
46. Unler A, Murat A (2010) A discrete particle swarm optimization method for feature selection in binary classification problems. Journal of Operation Research 206:528–539
47. Wanga G, Guo L, Gandomi A, Hao G, Wangb H (2014) Chaotic krill herd algorithm. Inf Sci 274:17–34
48. Wilcoxon F (1945) Individual comparisons by ranking methods. Biom Bull 1:80–83
49. Yuan XH, Yuan YB, Zhang YC (2002) A hybrid chaotic genetic algorithm for short-term hydro system scheduling. Math Comput Simul 59(4):319–327
50. Xiang T, Liao XF, Wong KW (2007) Comparison of different chaotic maps in particle swarm optimization algorithm for long term cascaded hydroelectric system scheduling. Appl Math Comput 190:1637–1645
51. Yang DX, Li G, Cheng GD (2007) On the efficiency of chaos optimization algorithms for global optimization. Chaos Solitons Fractals 34:1366–1375
52. Yang Y, Pederson JO (1997) A comparative study on feature selection in text categorization. In: Proceedings of the fourteenth international conference on machine learning, pp 412–420
53. Yu Z (2014) Hybrid clustering solution selection strategy. Pattern Recogn 47:3362–3375
54. Yuan XF, Wang YN, Wu LH (2007) Pattern search algorithm using chaos and its application. Journal of Hunan University, Natural Sciences 34(9):30–33
55. Zawbaa H, Emary E, Parv B, Shaarawi M (2016) Feature selection approach based on moth-flame optimization algorithm. In: IEEE congress on evolutionary computation, Vancouver, Canada, pp 24–29
56. Zhang H, Sun G (2002) Feature selection using tabu search method. Pattern Recogn 35:701–711
57. Zhang L, Zhang CJ (2008) Hopf bifurcation analysis of some hyperchaotic systems with time-delay controllers. Kybernetika 44(1):35–42
58. Zhang N, Ding S, Zhang J (2016) Multi layer elm-rbf for multi-label learning. Appl Soft Comput 43:535–545
59. Zhang Q, Li Z, Zhou CJ, Wei XP (2013) Bayesian network structure learning based on the chaotic particle swarm optimization algorithm. Genet Mol Res 12(4):4468–4479
60. Zhu ZL, Li SP, Yu H (2008) A new approach to generalized chaos synchronization based on the stability of the error system. Kybernetika 44(4):492–500
