
Expert Systems with Applications 37 (2010) 1302–1315


An evolutionary memetic algorithm for rule extraction


J.H. Ang, K.C. Tan *, A.A. Mamun
Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576, Singapore

Article info

Keywords:
Evolutionary Algorithms
Memetic search
Rule extraction
Artificial immune systems

Abstract
In this paper, an Evolutionary Memetic Algorithm (EMA), which uses a local search intensity scheme to complement the global search capability of Evolutionary Algorithms (EAs), is proposed for rule extraction. Two schemes for local search are studied, namely EMA-µGA, which uses a micro-Genetic Algorithm (µGA) based technique, and EMA-AIS, which is inspired by Artificial Immune Systems (AIS) and uses clonal selection for cell proliferation. The evolutionary memetic algorithm is complemented by the use of a variable-length chromosome structure, which allows the flexibility to model the number of rules required. In addition, advanced variation operators are used to improve different aspects of the algorithm. Real-world benchmarking problems are used to validate the performance of EMA, and results from simulations show that the proposed algorithm is effective.
© 2009 Elsevier Ltd. All rights reserved.

1. Introduction
Emphasis on information technology has created a greater reliance by experts on knowledge extracted using automated systems. In recent years, the field of automated data mining has emerged as an important area of applied research, dealing with the voluminous amounts of data collected in various industries as a result of the low cost and availability of large storage devices. Hence, this paper focuses on one of the major data mining tasks, classification. Classification is inherently evident in our daily life, in the fields of medical diagnosis, criminal investigation, finance, etc. (Bojarczuk, Lopes, Freitas, & Michalkiewicz, 2004; Doumpos & Zopounidis, 2002; Tan, Yu, Heng, & Lee, 2003). The use of automated systems for classification provides the advantages of lower processing cost, prompt analysis and the provision of a second opinion, since human decisions might be biased. In areas such as medical diagnosis and management decision support, where human experts play a critical role, information derived from automated systems is particularly useful to complement and expedite their decisions.
Several automated and statistical techniques for classification have been studied in the literature, including K-nearest neighbour (Lepistö, Kunttu, & Visa, 2006), discriminant analysis, computational intelligence methods like Neural Networks (NNs) and EAs, decision trees, etc. Among these, though methods like neural networks (Guan & Li, 2001; Stathakis & Vasilakos, 2006) are able to provide high predictive accuracies, they are not able to give any comprehensible knowledge, which is of utmost importance to decision makers.
* Corresponding author.
E-mail address: eletankc@nus.edu.sg (K.C. Tan).

On the other hand, rule-based algorithms like C4.5 (Quinlan, 1993), decision trees, fuzzy logic and Evolutionary Algorithms (EAs) (Bojarczuk et al., 2004; Tan et al., 2003; Tan, Yu, & Ang, 2006b) exhibit tremendous advantages over black-box methods, as they represent solutions in the form of high-level linguistic rules. Information extracted from databases is only useful if it can be presented in the form of explicit knowledge, like high-level linguistic rules, which clearly interpret the data. Among these rule-based techniques, EAs stand out as a promising search algorithm in various fields due to their easily implemented chromosome structures and their population-based global search capabilities. The ease of representing the rules using a set of chromosome structures for a given problem provides flexibility and adaptability. The genotype representation of EAs, in terms of the chromosome structure, encodes a set of parameters of the problem to be optimized, which allows flexibility in designing the problem representation. Ideally, the representation should clearly reflect the problem and be easy to implement, comprehend and manipulate, so as to exploit the problem structure well. In addition, unlike gradient-based methods, EAs are less likely to get trapped in local optima, as multiple searches are performed concurrently in a stochastic manner while converging towards the global optimum. Hence, this mathematically uncomplicated optimization method has been widely accepted by various researchers as an effective alternative to classical methodologies.
Though EAs are capable of global exploration and of locating new regions of the solution space to identify potential candidates, once a potential region is identified there is no further focus on micro-exploitation. Advanced schemes are required to complement the global search capabilities of the algorithm. Hence, memetic algorithms, which incorporate local improvement search into EAs, have been proposed (Ong & Keane, 2004). Experimental studies have shown that EA-Local Search (LS) hybrids, or hybrid EAs, are capable of more efficient search (Merz & Freisleben, 1999; Ong & Keane, 2004).
With these in mind, this paper proposes an Evolutionary Memetic Algorithm (EMA) for rule extraction. The main EA global search evolves the architecture of the rule set, while the LS is applied at every generation to fine-tune the rule parameters. This scheme is made possible by the use of a variable-length chromosome representation, which gives flexibility in the representation of the rule set and allows easy manipulation by the advanced variation operators, i.e. structural mutation and structural crossover (Tan, Chew, & Lee, 2006). Two local improvement search algorithms are used in this paper. The first scheme uses a micro-Genetic Algorithm (µGA); this simple method provides efficient and guided improvement over the generations. The second method incorporates ideas from AIS (Alves, Delgado, Lopes, & Freitas, 2004; Tan, Goh, Mamun, & Ei, 2008). AIS have been widely used for data analysis in unsupervised machine learning algorithms (Timmis & Neal, 2001; Timmis, Neal, & Hunt, 2000). Watkins and Boggess built on this further and developed Artificial Immune Recognition Systems (AIRS) (Watkins & Boggess, 2002a) as a supervised classifier following a K-nearest neighbor scheme. As a classification tool, AIRS has been shown to be very competitive with the existing algorithms in the literature in terms of classifier performance (Freitas & Timmis, 2007; Goodman, Boggess, & Watkins, 2002; Watkins & Boggess, 2002b).
The major evaluation metric for classification rule sets in the literature has been classification accuracy. However, other aspects are equally important when considering the performance of rule sets. Apart from how accurately the rule sets classify a dataset, their coverage of the dataset also matters. The ideas of support and confidence level, borrowed from association rule mining (Savasere, Omiecinski, & Navathe, 1995), are therefore incorporated into the fitness function evaluation, and the effects of these two factors are analyzed and discussed.
The necessary background information to appreciate the significance of the work presented in this paper is provided in Section 2. The details of the proposed features and operators of EMA are described in Section 3, while an overview of the algorithm is given in Section 4. Section 5 presents the local search algorithms. The experimental setup and datasets used are described in Section 6. Results from simulations on real-world benchmarking datasets are given and analyzed in Section 7. Conclusions, together with possible future work, are drawn in Section 8.
2. Background information
2.1. Rule-based knowledge
The discovered knowledge in rule-based systems is represented in the form of IF-THEN prediction rules, commonly known as the Michigan encoding (Michalewicz, 1994). These rules have the advantage of being high-level symbolic statements. Such rules state that the presence of one or more observations (antecedents) implies or predicts a certain consequence. A typical rule has the form

Rule: IF $A_1$ and $A_2$ and $\ldots$ and $A_n$ THEN $C$


where $A_i$, $\forall i \in \{1, 2, \ldots, n\}$, is an antecedent that leads to the prediction of $C$, the consequence. $A_i$ is seen as the independent variable and $C$ as the dependent variable. In a population of rules, each of the IF-THEN rules can be viewed as an autonomous piece of knowledge (Tan et al., 2003), and the insertion or deletion of rules in the population would not interfere with the performance of the other individual rules, but would collectively reflect the performance of the population. For a multi-class problem, each of the rules predicts a particular class, and the population of rules encompasses all the different classes of the problem set.
These decision rules can be combined to form a decision rule set (Pittsburgh encoding) (Michalewicz, 1994) of the form:

Rule Set: IF $A_{11}$ and $A_{12}$ and $\ldots$ and $A_{1n}$ THEN $C_1$
ELSE IF $A_{21}$ and $A_{22}$ and $\ldots$ and $A_{2n}$ THEN $C_2$
$\ldots$
ELSE $C_{default}$

where $A_{ij}$, $\forall j \in \{1, 2, \ldots, n\}$, is the antecedent part of rule $i$ that leads to consequence $C_i$. When a new instance is presented for evaluation, the first (topmost) rule is considered first. If the instance does not fit the rule's antecedent, the subsequent rule is considered. This process is repeated until a rule that matches the instance is found. In the case where none of the rules in the rule set matches the new instance, the instance is classified by a general/default rule.
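As a concrete illustration of this decision-list evaluation, the sketch below applies a Pittsburgh rule set to an instance. The tuple-based rule structure and the toy rules are assumptions for illustration (the paper's own implementation, described later, is in MATLAB), not the authors' code.

```python
def classify(rule_set, default_class, instance):
    """Evaluate a Pittsburgh rule set as a decision list: the first rule whose
    antecedents all hold decides the class; otherwise the default applies."""
    for antecedents, consequent in rule_set:
        if all(test(instance) for test in antecedents):
            return consequent
    return default_class

# Toy two-rule set over normalized features x[0], x[1]:
rule_set = [
    ([lambda x: x[0] >= 0.7], "positive"),
    ([lambda x: x[0] <= 0.4, lambda x: x[1] >= 0.5], "negative"),
]
print(classify(rule_set, "positive", [0.2, 0.9]))   # -> "negative"
```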
The advantages of Michigan encoding are its simplicity and ease of representation. It promotes a faster convergence rate, as the rules are more specific: the search space for each individual rule is smaller, since each rule searches for the optimal antecedents of one class only. Compared to Michigan encoding, the Pittsburgh search space is much larger, as each single rule set must provide a solution to the entire problem. However, unlike Pittsburgh encoding, Michigan encoding does not take rule interaction into account, as rule evaluation and fitness feedback are independent of the other rules. Hence, one of the main advantages of Pittsburgh encoding is that rule interaction is inherently considered, while the Michigan encoding method is able to find a small number of rules with higher fitness (Freitas, 2002; Noda, Freitas, & Lopes, 1999).
In this paper, Pittsburgh encoding is applied. The encoding of a rule set is much longer than that of the Michigan approach. With more parameters within a chromosome, the search space is inevitably enlarged, and the time taken for the Pittsburgh approach to find a good solution is also lengthened. Hence, an appropriate representation and specific variation operators are required to ease the implementation. The chromosome representation and operators used in relation to this are given in later sections.
2.2. Artificial immune systems
Artificial immune systems (AIS) (Coello Coello & Cortes, 2005; Luh, Chueh, & Liu, 2003) are a relatively new paradigm in the domain of Computational Intelligence (CI). They apply several features of the human immune system, like the recognition, detection and elimination of foreign threats and non-self entities, while having the ability to learn.
Similar to the human immune system's response to non-self entities, or antigens (Ag), an AIS archives and updates a pool of candidate solutions (antibodies) to the problem. The pool of candidate solutions is associated with the problem through an affinity measure which indicates their fitness (Tan et al., 2008).
In the field of AIS, the antigens can be defined as the optimal solutions, while the antibodies are the candidate solutions trying to match the antigens. In this paper, the design problem involves maximizing the modified weighted fitness function.

3. Algorithm operators
3.1. Variable length chromosome
The rule set chromosome structure most often used in the literature is the fixed-length representation. Fixed-length representations are not suitable for design problems where various parameters are to be concurrently evolved, as their use might cause invalid solutions after the variation operators are applied, and the representation itself inherently presents several constraints and limitations.
This paper proposes the use of a variable-length chromosome representation (Tan et al., 2006) to represent the rule set topology. This representation provides the essential flexibility for the simultaneous evolution of various aspects like the number of rules within a rule set, the boundary conditions for each feature, the masking operator, the variation operator and the predicted outcome. Each chromosome represents a rule set that consists of several rules. Each rule is separate and is coupled with its associated parameters, thus allowing easy manipulation by search operators for the addition or deletion of rules within each chromosome. Each chromosome may consist of a different number of rules, which reflects the complexity of the rule set; however, the number of parameters within each rule is fixed by the number of input features of the dataset. Such a representation is efficient and facilitates the design of problem-specific genetic operators. Each chromosome is made up of four allele strings:
3.1.1. Boundary string
The first string encodes the boundary values for each input feature of the dataset. The lower boundary for the $i$th input feature is denoted by $L_i$ while the upper boundary is denoted by $U_i$. These boundary values give the numeric thresholds for the input feature and are used in conjunction with the operator string.
3.1.2. Masking string
The masking string has the effect of feature selection, as not all the features present in the dataset are necessary for classification. Though several features are present in a dataset, very often only a fraction of them are useful for good classification accuracy (Cheng, Wei, & Tseng, 2006). The inclusion of all the features may even deteriorate the classification performance, as they may mislead or interfere with the good features. The masking allele is a binary bit string indicating whether a given input feature should appear in the rule. A bit 0 at the $i$th position indicates exclusion of feature $i$, in which case the corresponding operator and boundary conditions are excluded too. Similarly, a bit 1 indicates inclusion of the feature.

3.1.3. Operator string
The operator string indicates the mathematical operator to be applied on the boundary string. The four operators encoded are: greater than ($\geq$), less than ($\leq$), within a range, and less than or greater than (outside a range), given by $x \geq L$, $x \leq U$, $L \leq x \leq U$, and ($x \leq L$ or $x \geq U$), respectively, where $x$ is a given numeric attribute.
3.1.4. Class string
If any of the rules is able to classify the instance, the instance is assigned the corresponding class; otherwise it is classified as the default class. There are several methods used to determine the default class, such as taking the majority class of the training dataset, random assignment, or evolving it using EAs (Tan et al., 2003; Tan, Yu, & Ang, 2006a, 2006b). In this paper, the majority class is used.
Fig. 1 illustrates a generic rule set with $m$ rules for a dataset with $n$ input features.
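A minimal sketch of this four-string, variable-length representation is given below. The dict layout, field names and value ranges are assumptions chosen to mirror Fig. 1, not the authors' data structures; `rule_fires` shows how the mask and operator strings combine with the boundary values.

```python
import random

OPS = (">=", "<=", "in", "out")   # the four operator types of the operator string

def random_rule(n_features, n_classes):
    """One rule: per-feature boundaries, mask bits, operator codes, plus a class."""
    lower = [random.random() for _ in range(n_features)]
    return {
        "lower": lower,                                             # boundary string: L_i
        "upper": [random.uniform(l, 1.0) for l in lower],           # boundary string: U_i (>= L_i)
        "mask":  [random.randint(0, 1) for _ in range(n_features)], # masking string
        "ops":   [random.choice(OPS) for _ in range(n_features)],   # operator string
        "label": random.randrange(n_classes),                       # class string
    }

def random_rule_set(n_features, n_classes, max_rules=8):
    """Variable-length chromosome: the number of rules itself varies."""
    return [random_rule(n_features, n_classes) for _ in range(random.randint(1, max_rules))]

def rule_fires(rule, x):
    """Check every unmasked feature against its operator and boundaries."""
    for i in range(len(x)):
        if not rule["mask"][i]:
            continue                                  # feature excluded by the mask
        lo, hi, op = rule["lower"][i], rule["upper"][i], rule["ops"][i]
        ok = {">=": x[i] >= lo, "<=": x[i] <= hi,
              "in": lo <= x[i] <= hi, "out": x[i] <= lo or x[i] >= hi}[op]
        if not ok:
            return False
    return True
```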
3.2. Fitness evaluation
Several different metrics for rule evaluation are presented in the literature; the most commonly used is classification accuracy, which has the most direct relation to generalization accuracy. Sensitivity and specificity, which are statistical tests for binary classification, are also often used; however, these are only applicable to Michigan rules. Less commonly considered metrics are comprehensibility and interestingness. Comprehensibility is a subjective measure of how clearly and easily a rule can be interpreted by humans. Generally, rules that are incomprehensible to humans are of little use in data mining or knowledge discovery, because such rules do not benefit the users. Interestingness is a measure of how common a rule is; a common rule is deemed less interesting than an uncommon one.
In this paper, the fitness function used to evaluate a rule set has classification accuracy as its major component; in addition, ideas borrowed from the evaluation of association rules (Agrawal, Imielinski, & Swami, 1993) are also used. Association rules attempt to find interesting relationships among variables in a database. They are often used for databases that present trends within the variables, e.g. supermarket or earthquake databases. In a supermarket database, an association rule could look like: <if a customer buys flour and eggs, the customer will buy sugar too>.

Fig. 1. Genotype representation of a rule set chromosome.

The common metrics for association rules are the support and confidence level (Savasere et al., 1995). The support of $A$, $\sup(A)$, is a measure of the number of transactions that contain the item set within the database, whereas the confidence of an association rule $R$ is $\text{confidence}(R) = \sup(A \cup C)/\sup(A)$, where $\sup(A \cup C)$ is the support of the item set containing both $A$, the antecedent, and $C$, the consequent.
In the context of this paper, instead of applying the support and confidence level to a single rule, they are applied to a Pittsburgh rule set. The support is then modified to measure the coverage of the entire rule set, i.e. the ratio of the number of instances covered by the rule set to the total number of instances in the dataset:

$$\text{sup}(RS_i) = \frac{FA_i}{N} \quad (1)$$

where $RS_i$ is the $i$th rule set, $\forall i \in \{1, \ldots, M\}$, $M$ is the number of rule sets to be evaluated, $FA_i$ is the number of instances that fit the antecedents of the rule set, and $N$ is the total number of instances.
The confidence factor measures how accurate a rule set is on those instances that are captured by its antecedents:

$$\text{confidence}(RS_i) = \frac{CR_i}{FA_i} \quad (2)$$

where $CR_i$ is the number of instances correctly classified by the rule set.
The support factor and confidence factor provide additional indicators of the performance of the rule set.
In the evaluation of a rule, only individual performance is considered, while the evaluation of a rule set constitutes the collective performance of its rules. Classification accuracy is the major component of the fitness function, as it is ultimately the generalization accuracy that is of significant importance. This classification accuracy measure is given in Eq. (3):

$$f_i = \frac{C_i}{N} \times 100\% \quad (3)$$

where $f_i$ is the fitness of chromosome $i$, $C_i$ is the total number of correctly classified patterns obtained using chromosome $i$, and $N$ is the total number of patterns in the training set, i.e. $C_i \leq N$.
In order to integrate the three components of the fitness function, a weighted approach is used. Depending on the user's priority, the weights are set accordingly. The Modified Weighted Fitness Function (MWFF) is given in Eq. (4). In this paper, the classification accuracy is considered the most important component; hence $w_1$ is set as 1 while $w_2 = w_3 = 0.2$:

$$\text{MWFF} = w_1 \cdot CA + w_2 \cdot \text{Support} + w_3 \cdot \text{Confidence} \quad (4)$$
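The sketch below puts Eqs. (1)-(4) together. Expressing all three components as percentages is an assumption, chosen because the fitness levels plotted later (Figs. 10-12) sit well above 100; the (antecedents, consequent) rule representation is the illustrative one used earlier, not the authors' code.

```python
# Minimal sketch of the rule-set fitness of Eqs. (1)-(4); data structures are
# illustrative (rules as (antecedents, consequent) pairs), not the authors' code.
def mwff(rule_set, data, labels, default_class, w1=1.0, w2=0.2, w3=0.2):
    fired = correct_fired = correct_total = 0
    for x, y in zip(data, labels):
        # First matching rule decides; None means no rule covered the instance.
        pred = next((c for ants, c in rule_set if all(t(x) for t in ants)), None)
        if pred is not None:
            fired += 1
            correct_fired += (pred == y)
        correct_total += ((pred if pred is not None else default_class) == y)
    n = len(data)
    support = 100.0 * fired / n                                    # Eq. (1), in %
    confidence = 100.0 * correct_fired / fired if fired else 0.0   # Eq. (2), in %
    accuracy = 100.0 * correct_total / n                           # Eq. (3), in %
    return w1 * accuracy + w2 * support + w3 * confidence          # Eq. (4)
```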

3.3. Tournament selection


Offspring are created using structural crossover of the parents. Parents with higher fitness are selected for the crossover process using binary tournament selection with replacement. In binary tournament selection, two parents are randomly selected and a copy of the parent with the higher fitness is taken for the offspring creation process. The two parent chromosomes are then replaced into the parent population pool. The selection of the second parent for crossover follows the same procedure.
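A minimal sketch of this selection scheme, assuming fitness values are stored in a list parallel to the population:

```python
import random

def binary_tournament(population, fitnesses):
    """Binary tournament selection with replacement: sample two chromosomes
    at random, copy the fitter one, and leave both candidates in the pool."""
    i = random.randrange(len(population))
    j = random.randrange(len(population))
    winner = i if fitnesses[i] >= fitnesses[j] else j
    return list(population[winner])    # a copy; the originals stay in the pool
```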
3.4. Structural crossover
The crossover process is required so that useful information is exchanged between parents and passed to the offspring. Since a standard chromosome representation is not used, the application of standard genetic operators might not be suitable; hence, a problem-specific crossover operator is applied.
Firstly, crossover points are only allowed at the junctions between rules; crossover is thus carried out in terms of rule blocks. The crossover point must not occur inside a rule, which would break up the rule strings. Secondly, the crossover
process is done in a shuffled manner. This shuffled crossover process starts with combining the rules of both selected parents into a
common pool. The first selected rule is assigned to the first child and the second selected rule to the second child. The remaining rules are then distributed randomly between the two children. Since rules are considered sequentially within a rule set, rules at the bottom might never be considered, even if they are good rules, because the rules above them have already classified the instance. One major advantage, and a necessity, of the structural shuffling crossover is that it can bring rules from the bottom to the top of the rule set, giving them the opportunity to classify instances. Fig. 2 shows an example of the crossover process. It can be seen that after the crossover process the rules are randomly ordered, and the number of rules in a rule set may vary too.

Fig. 2. Structural crossover process.
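The following sketch captures the shuffled, rule-block crossover just described; treating rules as opaque items is an assumption that matches the constraint that crossover points fall only between whole rules.

```python
import random

def structural_crossover(parent_a, parent_b):
    """Pool the rules of both parents, then redistribute them at random.

    No rule string is ever split, offspring may receive different numbers of
    rules, and rules formerly at the bottom of a set can surface at the top.
    """
    pool = list(parent_a) + list(parent_b)
    random.shuffle(pool)
    child_a = [pool.pop()]       # each child receives one rule first...
    child_b = [pool.pop()]
    for rule in pool:            # ...the rest are distributed at random
        (child_a if random.random() < 0.5 else child_b).append(rule)
    return child_a, child_b
```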

3.5. Structural mutation

The mutation operator provides a mechanism for solutions to escape from local regions and to create more diversity. Because the encoding of Pittsburgh rule sets results in higher complexity than the Michigan approach, dedicated variation operators are needed for crossover and mutation. Structural mutation is applied based on the probability of mutation, $P_m$ (given in Section 3.6), and involves the addition or deletion of rules. Through such insertion and deletion, the resulting rule set can translate into a very different rule set, ensuring that new areas of the search space are constantly being explored. Fig. 3a shows an example of rule addition, while Fig. 3b shows an example of rule deletion. The positions at which rules are inserted or deleted are randomly chosen.
The algorithm first decides whether addition or deletion is to be done at each generation. The number of rules to be added or deleted is then based on a Gaussian distribution (Ang, Tan, & Mamun, 2008). Using the Gaussian distribution has the advantage of allowing more flexibility in the number of rules to be deleted or added, while keeping the distribution centered at the zero mean. Fig. 4 shows the decision based on the Gaussian distribution.

Fig. 3. Structural mutation (a) addition and (b) deletion.

Fig. 4. Gaussian distribution for structural mutation.

3.6. Probability of mutation

The rate of mutation $P_m$ plays an important role in the mutation process. If the mutation rate is minimal, there will be too many similar chromosomes. On the other hand, a large mutation rate directs the search towards random search and increases the possibility of disrupting the structure of chromosomes that may carry good solutions. Researchers have acknowledged the importance of this parameter and much work has been done on it. Generally, research has shown that using a fixed $P_m$ throughout the whole evolution process is neither optimal nor efficient. Holland (1992) proposed a time-dependent, deterministically scheduled mutation operator where the mutation probability decreases in value over time. Fogarty (1989) used a varying mutation rate and showed the effectiveness of a mutation rate that decreases exponentially over the generations. Similarly, in Tan, Goh, Yang, and Lee (2006), the mutation rate starts off with a high value to produce a diverse set of solutions for effective genetic exploration, and this value decreases as a function of the generation to meet the exploitation requirements of local fine-tuning. This mutation scheme is different from other adaptive mutation operators where the mutation rate decreases proportionally as the generation increases.
In this paper, the probability of mutation $P_m$ changes based on the scheme of Tan et al. (2006); however, it is modified to suit the problem discussed in this paper. The equation used is given in Eq. (5):

$$
P_m = \begin{cases}
0.7\left(1 - \dfrac{n}{genNum}\right)^2, & 0 \le n \le 0.2\,genNum,\\
0.2\left(\dfrac{n - genNum}{genNum}\right)^2 + 0.05, & 0.2\,genNum < n \le genNum,
\end{cases} \quad (5)
$$

where $n$ is the current generation of the evolution process and $genNum$ is the maximum generation number.
Fig. 5 shows the values of the mutation rate as the evolution proceeds through the generations. This ensures that during the initial stage of evolution there are large jumps in the search space (exploration), while at the latter stage the search narrows down to the local neighborhood (exploitation).

Fig. 5. Probability of mutation.
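The two-phase schedule of Eq. (5), together with a Gaussian-based add/delete step for the structural mutation, could look like the sketch below; the standard deviation of the Gaussian and the minimum rule-set size of one are assumptions not fixed by the paper.

```python
import random

def mutation_probability(n, gen_num):
    """Generational mutation probability P_m of Eq. (5)."""
    if n <= 0.2 * gen_num:
        return 0.7 * (1 - n / gen_num) ** 2
    return 0.2 * ((n - gen_num) / gen_num) ** 2 + 0.05

def structural_mutation(rule_set, make_rule, sigma=1.0):
    """Add or delete a Gaussian-distributed number of rules (sigma assumed).

    A draw from a zero-mean Gaussian decides both the direction (sign) and
    the number of rules to insert or remove, at random positions.
    """
    k = round(random.gauss(0.0, sigma))
    if k > 0:                                    # insertion
        for _ in range(k):
            rule_set.insert(random.randrange(len(rule_set) + 1), make_rule())
    elif k < 0:                                  # deletion, keep at least one rule
        for _ in range(min(-k, len(rule_set) - 1)):
            rule_set.pop(random.randrange(len(rule_set)))
    return rule_set
```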
3.7. Default class
There are papers in the literature that evolve the default class during the evolution process; however, since fitness is based on accuracy on the training dataset, this naturally gears the evolution towards the majority class of the training dataset. Hence, the default class used in this paper is the majority class of the training dataset, which provides a straightforward solution. Moreover, as the simulation results in the later sections show, the default class is not an issue in this paper, as the support of the evolved rule sets achieves almost 100% coverage on both the training and testing datasets.

3.8. Elitism and archiving


Elitism is used to prevent the loss of good individuals due to the stochastic nature of the optimization process. It is implemented by including the best parent chromosome in the children population for consideration as a parent for the next generation. Elitism has the effect of improving the convergence speed.
The archive acts as the candidate pool for the parents of the next generation. The chromosomes in the archive go through local search, with the tuned chromosomes emerging as the parents for the next generation. The size of the archive is set to the size of the parent population, µ.
4. Rule extraction algorithm overview
4.1. Training phase overview
The objective of the training phase is to have an algorithm that is capable of optimizing the architecture of the rule sets and the parameters of the rules concurrently. A few features, such as the variable-length chromosome representation, specialized genetic operators in the form of structural mutation (SM) and structural crossover (SC), and local search for effective exploitation, are incorporated to achieve this. The rules evolved should be high in support and confidence level and, most importantly, produce good classification accuracy.
Fig. 6 shows the flowchart of the algorithm. The algorithm first initializes a population of µ rule sets and evaluates their fitness. These rule sets act as the initial parent population for offspring creation, and the population goes through a series of binary tournament selections and structural crossovers based on $P_c$ (the crossover probability) to exchange good information to pass on to the children. The newly created offspring then undergo structural mutation based on $P_m$ (the mutation probability) in order to vary the number of rules in a rule set. Elitism is carried out every generation before local search is performed. The archive size selected is the same as the parent size µ; hence the fittest µ children in the offspring population of λ rule sets are archived. These archived individuals undergo one of the two local search operators presented in this paper, i.e. the µGA or the AIS local search. The fitter individuals emerging from the local search are brought forward to the next generation as parent chromosomes. This process continues until the overall stopping criterion is met, which is the maximum number of generations allowed.

Fig. 6. Evolutionary memetic rule extraction algorithm overview.
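One generation of the training loop just described might be sketched as follows. The operator callables correspond to the sketches in Section 3, and the exact evaluation/archiving details are simplified assumptions, not the authors' implementation.

```python
import random

def ema_generation(parents, fitness, select, crossover, mutate, local_search,
                   p_c, p_m, mu, lam):
    # Offspring creation: tournament selection + structural crossover (prob. p_c).
    offspring = []
    while len(offspring) < lam:
        a, b = select(parents, fitness), select(parents, fitness)
        if random.random() < p_c:
            a, b = crossover(a, b)
        offspring.extend([a, b])
    # Structural mutation (prob. p_m), then evaluate and archive the fittest mu.
    offspring = [mutate(c) if random.random() < p_m else c for c in offspring[:lam]]
    archive = sorted(offspring, key=fitness, reverse=True)[:mu]
    archive.append(max(parents, key=fitness))        # elitism: carry best parent
    # Local search (muGA or AIS) fine-tunes the archive; survivors become parents.
    tuned = [local_search(c) for c in archive]
    return sorted(tuned, key=fitness, reverse=True)[:mu]
```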


4.2. Testing phase

The rule set that has the highest fitness on the training dataset is applied to the test dataset. Each instance in the test dataset is presented to the rule set for classification. If the first rule does not capture the instance, the instance is considered by the subsequent rules. If none of the rules is able to classify the instance, it is assigned the default class, which in this paper is the majority class. Hence, it is important that rule sets have a high support level during the training phase, as high support on the training data will most probably translate to high coverage of the testing dataset.
5. Local search algorithm
Evolutionary Algorithms (EAs) have the ability to escape from local optima due to their inherent global search capability. Their concurrent search enables them to promptly explore and identify new promising regions of the solution space. Though EAs see the macro situation well, they do not exploit the search space thoroughly. Hence, local search is often used to complement EA optimization, which concentrates mainly on global exploration. Many researchers have complemented the global exploration ability of EAs by incorporating dedicated learning or local search heuristics, and experimental studies have shown that EA-LS hybrids, or memetic EAs, are capable of more efficient search (Merz & Freisleben, 1999; Ong & Keane, 2004).
This section introduces the two variants of local search operators that are incorporated into the main rule extraction algorithm. The main difference between the two lies in the way the children are created. The first local search operator uses the µGA, which involves crossover of chromosome alleles. The second, inspired by AIS, uses clonal selection for proliferation.
5.1. Micro-genetic algorithm (µGA) local search
The framework of the µGA is similar to a normal evolution process; however, it differs in its purpose, implementation and operation. The target of the main EA is to optimize the structure of the rule sets and exchange good rules among the rule sets through the use of structural mutation and structural crossover. In the micro-genetic algorithm local search, the structure of the rule sets is maintained throughout the evolution process, while the parameters of the rules are exchanged and mutated. In this manner, the parameter values are optimized to fit the rule set structure.
Fig. 7 shows the overview of the µGA local search. Each individual from the archive of the main algorithm goes through one round of the local search process. A parent goes through genetic crossover to produce the children population. After the crossover process, all the children undergo micro-mutation (Section 5.3).

Fig. 7. Micro-genetic algorithm local search overview.

The mutated children are evaluated, and the fittest child is selected as the parent to be used for the crossover process in the next generation. This process is repeated until the overall stopping criterion, the maximum number of generations of the local search, is met. When the local search process ends, the fittest child is returned to the main algorithm to be a parent for the next generation of the main algorithm.
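The µGA local search loop can be summarized as below. Following the description above, the fittest child simply seeds the next round; the population sizes default to the Table 2 settings, and the crossover/mutation operators are passed in as callables (sketch, not the authors' code).

```python
def micro_ga_local_search(chromosome, fitness, rule_crossover, micro_mutate,
                          n_children=5, n_generations=10):
    """Sketch of the muGA local search (parent 1, offspring 5, 10 generations).

    The rule-set structure stays fixed; only rule parameters are recombined
    (rule-level crossover) and micro-mutated."""
    parent = chromosome
    for _ in range(n_generations):
        children = [micro_mutate(rule_crossover(parent)) for _ in range(n_children)]
        parent = max(children, key=fitness)   # fittest child seeds the next round
    return parent                             # returned to the main algorithm
```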
5.1.1. Genetic crossover
The genetic crossover is done among the rules of a rule set, as depicted in Fig. 8. From one rule set, multiple different children can be created by choosing different rules and different crossover points. In the exceptional case where the chosen rule set contains only one rule, no crossover can be done and it is replaced by cloning in the local search. If the chosen rule set contains two rules, different children are created by choosing random crossover points between the two parent rules. The allowed crossover points in the boundary strings are at the intersections of the attributes, to prevent separating the lower and upper boundaries of an attribute. This constraint is applied to avoid any infeasible solution after the crossover. For the crossover to proceed, two random rules and one crossover point are chosen. Once the crossover point is selected, it is the same for both rules. Fig. 8 shows a rule set containing three rules for a four-input-feature problem: Rules 1 and 3 are selected for the crossover process, and the selected crossover point is between feature 1 and feature 2. No change is made to the class field of either rule.

Fig. 8. Micro-genetic algorithm crossover process.
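A sketch of this rule-level crossover, reusing the dict-based rule layout from the Section 3.1 sketch. The single-cut, attribute-aligned crossover point reflects the constraint described above, while the copying scheme is an implementation assumption.

```python
import random

def rule_crossover(rule_set):
    """One-point crossover between two rules of the same rule set (sketch)."""
    # Copy the rules so the parent rule set is left untouched.
    child = [{k: (v[:] if isinstance(v, list) else v) for k, v in r.items()}
             for r in rule_set]
    if len(child) < 2:
        return child                      # single-rule sets fall back to cloning
    i, j = random.sample(range(len(child)), 2)     # two distinct rules
    n_features = len(child[i]["mask"])
    cut = random.randrange(1, n_features)          # cut only at attribute boundaries,
    for key in ("lower", "upper", "mask", "ops"):  # so L_i/U_i pairs stay together
        child[i][key][cut:], child[j][key][cut:] = child[j][key][cut:], child[i][key][cut:]
    return child                                   # class fields are not changed
```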
5.2. AIS local search
Though AIS are widely used for classification, they are seldom used to provide high-level linguistic rules. A number of works, including Carvalho and Freitas (2000, 2001), use an immunological algorithm to classify examples belonging to small disjuncts; the set of small disjuncts is then able to cover a large set of examples. In Timmis and Neal (2001), AIS is used to discover fuzzy classification rules, whereas in Castro, Coelho, Caetano, and von Zuben (2005) an antibody represents a set of fuzzy classification rules. In addition, AIS are usually used as the main algorithm for classification; hence the construction of an AIS local search operator to fine-tune the rules in a rule set opens up new and interesting avenues for the hybridization of EAs with AIS.
Inspired by AIS, the local search here incorporates the idea of clonal selection. The principle of clonal selection forms the core of AIS, where the antibodies that fit the antigens best are cloned in order to replicate good individuals. In this paper, the immunological memory cell is the best antibody solution from the cloned
cell. The antigen is regarded as the optimal solution for the training dataset in terms of accuracy. The analogue of the antibodies of the natural immune system is the solution to the problem, i.e. the rule sets. The objective is for the antibodies to fit the antigen as closely as possible; in this case, for the evolving rule sets to get as close to the optimal solution as possible.
The flowchart of the AIS local search is given in Fig. 9. Similar to the µGA, a chromosome from the archive of the main algorithm is taken as a good antibody. This antibody proliferates by cloning based on a clone rate, C. In cloning, an exact replication of the antibody is made. Hyper-mutation, in the form of micro-mutation, is then applied to the cloned cells. At every encounter/iteration, the affinity between the antibodies and the antigen is evaluated. The affinity measurement is based on their Euclidean distance in the objective space: a smaller Euclidean distance means a better match, and hence a higher affinity. The cell with the highest affinity is selected as the antibody to be cloned in the next encounter/iteration. No crossover operation is done here.

Fig. 9. Artificial immune systems local search algorithm.
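A compact sketch of this clonal-selection loop follows; `affinity` is assumed to be a callable that increases as the Euclidean distance to the target in objective space shrinks, and the clone rate and iteration count follow Table 2. Keeping a separate memory of the best cell found is an assumption consistent with the memory-cell description above.

```python
import copy

def ais_local_search(antibody, affinity, micro_mutate, clone_rate=5, n_iterations=10):
    """Sketch of the AIS local search: clone, hyper-mutate, select by affinity.
    No crossover is used."""
    parent = antibody
    memory = antibody                        # immunological memory: best cell so far
    for _ in range(n_iterations):
        # Cloning: exact replication, then hyper-mutation via micro-mutation.
        clones = [micro_mutate(copy.deepcopy(parent)) for _ in range(clone_rate)]
        parent = max(clones, key=affinity)   # highest-affinity cell is cloned next
        if affinity(parent) > affinity(memory):
            memory = parent
    return memory
```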
5.3. Micro-mutation
By using the mutation operator, the algorithm is able to explore new areas of the search space and has the possibility of finding a better solution. As mentioned in Section 3.1, each rule consists of four fields, namely the boundary, mask, operator and class fields. The micro-mutation used here is field-specific and, due to the limitations of each field, constraints are also applied accordingly. The probability that an allele will be mutated is based on the micro-mutation probability $P_{mm}$. This mutation probability is generation/iteration-based; however, it is different from the one described


in Section 3.6. As the number of generations/iterations used in the local search is much smaller than in the main algorithm, the generational mutation scheme of Section 3.6 is less suitable. The mutation scheme used here is given in Eq. (6):

$$P_{mm} = 0.2 \cdot \frac{1}{n} + 0.05, \quad n \in \{1, \ldots, genNum\} \quad (6)$$

where $n$ is the current generation/iteration number and $genNum$ is the maximum number of generations/iterations allowed.
The mutation details for each string are given as follows:
5.3.1. Boundary string
Real-value mutation is used to change the values of the boundary conditions. A random value between [-0.1, 0.1] is added to the allele. If the value of the allele exceeds the range [0, 1] after undergoing mutation, the boundary conditions given in Eq. (7) are enforced:

$$
v_q = \begin{cases}
1, & \text{if } v_q > 1\\
0, & \text{if } v_q < 0
\end{cases} \quad (7)
$$

where $v_q$ is the mutated value of the allele occupying the $q$th position of the allele string.
5.3.2. Masking string
The mutation of the masking string takes the form of a bit-flip operator: a bit 1 is mutated to a bit 0 and vice versa.
5.3.3. Operator string
Mutation on an operator allele causes that allele to take on any of the other three types of operators.

5.3.4. Class string
The number of possible mutations depends on the number of output classes in the problem. A random class is used to replace the existing class during mutation.
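Putting Eqs. (6) and (7) together with the four field-specific operators, the micro-mutation could be sketched as below. Applying the same per-allele probability to every field is an assumption consistent with the description above; the rule layout is the one from the Section 3.1 sketch.

```python
import random

def micro_mutation_probability(n):
    """Iteration-based micro-mutation probability P_mm of Eq. (6)."""
    return 0.2 / n + 0.05

def micro_mutate(rule, n, n_classes, ops=(">=", "<=", "in", "out")):
    """Field-specific micro-mutation; each allele mutates with probability P_mm."""
    p = micro_mutation_probability(n)
    clamp = lambda v: min(1.0, max(0.0, v))          # Eq. (7) boundary conditions
    for i in range(len(rule["mask"])):
        if random.random() < p:                      # boundary string: add U(-0.1, 0.1)
            rule["lower"][i] = clamp(rule["lower"][i] + random.uniform(-0.1, 0.1))
        if random.random() < p:
            rule["upper"][i] = clamp(rule["upper"][i] + random.uniform(-0.1, 0.1))
        if random.random() < p:                      # masking string: bit flip
            rule["mask"][i] ^= 1
        if random.random() < p:                      # operator string: one of the other three
            rule["ops"][i] = random.choice([o for o in ops if o != rule["ops"][i]])
    if random.random() < p:                          # class string: random replacement class
        rule["label"] = random.randrange(n_classes)
    return rule
```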
6. Datasets and experimental setup

6.1. Datasets

Datasets from the medical and botanical fields are used to validate the proposed algorithm. The cancer, diabetes and iris datasets represent both binary and multi-class classification problems. These datasets are taken from the University of California Irvine Machine Learning Repository (Blake & Merz, 1998) benchmark collection and are collected from real-world problems. For each dataset, 75% of the data is used for training while the remaining 25% is used for testing. The values of the datasets are normalized to the range [0, 1].
The objective of the cancer problem is to diagnose breast cancer in patients by classifying a tumor as benign or malignant. The Breast Cancer Wisconsin dataset was originally collected at the University of Wisconsin Hospitals, Madison, by Dr. William H. Wolberg (Mangasarian & Wolberg, 1990). 458 (65.5%) of the patterns in the dataset are benign, while 241 (34.5%) are malignant. There are nine attributes/inputs (clump thickness, uniformity of cell size and shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli and mitoses).
The diabetes problem is to diagnose a Pima Indian individual based on personal data and a medical examination. There are eight attributes/inputs (number of times pregnant, plasma glucose concentration, diastolic blood pressure, triceps skin fold thickness, serum insulin, BMI and age) and two output classes (diabetes positive or diabetes negative).
The iris dataset is a botanical multi-class problem and has been one of the most widely used datasets in the pattern recognition literature (Duda & Hart, 1973). Among the three classes, one class is linearly separable from the other two, while those two are non-linearly separable from each other. The input features are the sepal length, sepal width, petal length and petal width. The three types of iris plant to be identified are iris setosa, versicolour and virginica.

6.2. Experimental setup

EMA is implemented using the MATLAB technical computing platform, and the corresponding simulations are performed on Intel Pentium 4 2.8 GHz computers. Twenty independent runs are performed for each simulation setup.
As different datasets have different characteristics, convergence rates and requirements, settings that suit each dataset are required. The parameter settings used in the simulations for both the main algorithm and the local search are tabulated in Tables 1 and 2. As local search is performed on each chromosome in the archive, the initial population size in the local search is 1 in each round. Local search focuses mainly on local fine-tuning, and the number of evaluations used is much smaller than in the main algorithm; therefore the type of mutation probability used differs from the main algorithm. The crossover operator for the main algorithm is chromosome-based, meaning it is done between two chromosomes, while that of the local search is rule-based, i.e. done among the rules within each rule set (chromosome).

Table 1
Parameter settings for EMA.

                   Cancer            Diabetes          Iris
Populations        Parent: 30,       Parent: 25,       Parent: 30,
                   Offspring: 120    Offspring: 150    Offspring: 120
Archive size       30                25                30
Crossover prob.    0.7               0.7               0.7
Crossover type     Chromosome        Chromosome        Chromosome
Generations (G)    50                100               50
Mutation prob.     Generational      Generational      Generational
Mutation type      Structural        Structural        Structural
Elitism            Yes               Yes               Yes

Table 2
Parameter settings for local search.

                             µGA                       AIS
Populations                  Parent: 1, Offspring: 5   Initial antibody: 1, Clone rate: 5
Crossover prob.              0.9                       No
Crossover type               Rule                      No
Generations (G)/Iterations   10                        10
Cloning                      No                        Yes
Mutation prob.               Generational              Iteration-based
Mutation type                Micro-mutation            Micro-mutation

7. Experimental results and analysis

In this section, simulation results are presented and discussed. Comparisons are made against rule extraction without local search and a well-known rule-based algorithm in the literature, PART (Frank & Witten, 1998). The parameter settings and operators used for rule extraction without local search (No LS) are similar to EMA, with the exception that there is no fine-tuning of the parameters by local search. The comparisons are not meant to be exhaustive but rather provide an indication of how well EMA performs, in addition to its advantage of being able to generate high-level, comprehensible linguistic rules.
7.1. Training phase
Figs. 10-12 show the fitness level, classification accuracy, support and confidence level obtained using EMA-µGA, EMA-AIS and No LS on the cancer, diabetes and iris training datasets.
7.1.1. Cancer dataset
From Fig. 10, it can be seen that the fitness of all the algorithms has converged by the end of the 50 generations. All the algorithms present a rather smooth convergence curve rather than a fluctuating, noisy one, meaning the algorithm improves the classification accuracy on the training set gradually and steadily, getting fitter with each generation. Compared to No LS, both EMA-µGA and EMA-AIS obtain higher fitness values. EMA-µGA and EMA-AIS have comparable performance on the training dataset. The highest classification accuracies obtained by EMA-µGA and EMA-AIS are around 97.6%, while that of No LS is around 95.7%. Since the support on the dataset is about 100% (Fig. 10c), this classification accuracy is due almost entirely to the performance of the evolved rule sets rather than to the default-class rule. In addition, when the support is almost 100%, the confidence level is reflective of the accuracy of the rule sets on the entire dataset.
7.1.2. Diabetes dataset
Fig. 11 shows that, when applied to the diabetes dataset, the algorithms require a larger number of generations for convergence. Once again, all the algorithms present a smooth graph, with fitness improving as the generations increase. The classification accuracies obtained by both EMA-µGA and EMA-AIS at the end of the generations are about 82%, while No LS gives 77.8%. The support on the training dataset increases sharply to 100% during the initial generations, which implies that most of the search effort after the initial phase is concentrated on increasing the accuracy.
7.1.3. Iris dataset
The classification accuracies obtained by both EMA-µGA and EMA-AIS at the end of the generations are about 98.9%, while No LS gives only 94.2%. The observations made on the iris dataset show trends similar to the cancer and diabetes datasets. The performance of No LS in terms of fitness and classification accuracy cannot match the two EMA local search algorithms, while EMA-µGA and EMA-AIS have comparable performance. This shows that local search is important in improving the results on the training dataset.
Generally, different datasets require different numbers of generations for convergence. In addition, the algorithms are able to improve the support on the training dataset rapidly, giving confidence that the accuracy is due to the decisions of the rule sets rather than to the default class being used.
7.2. Rule set generated
The parent population rule sets are of different lengths and use different numbers of features due to the masking operator. An example of a rule set generated for the iris dataset is given in Fig. 13. This rule set comprises six rules, with the last rule being the default rule. Each rule contains a different number of features; the first rule uses only the petal length, while the second rule contains the sepal length, petal length and petal width.

Fig. 10. Cancer dataset (a) fitness level, (b) accuracy, (c) support and (d) confidence.

Fig. 11. Diabetes dataset (a) fitness level, (b) accuracy, (c) support and (d) confidence.

Fig. 12. Iris dataset (a) fitness level, (b) accuracy, (c) support and (d) confidence.

if petal length <= 2.9575 then class is Setosa
elseif (sepal length <= 5.7553 or sepal length >= 5.9491) and (petal length >= 4.1471) and (petal width <= 1.0625 or petal width >= 1.6954) then class is Virginica
elseif petal length >= 4.9726 then class is Virginica
elseif (4.7692 <= sepal length <= 6.6939) and (petal width >= 1.9455) then class is Setosa
elseif 2.8685 <= petal length <= 5.2774 then class is Versicolour
else class is Versicolour

Fig. 13. Example rule set on iris dataset.

Each of these individual rules predicts a class; collectively they make up the whole rule set, and all three classes of the iris flower are predicted. Only rule sets that predict all the classes of the problem are useful: if a rule set does not cover all the classes, some classes will be left unidentified. Further observation of the rule set shows that not all the input features are required for decision making; the sepal width does not appear in the antecedent of any rule. This relates to the field of feature selection (Lepistö et al., 2006), which holds that not all the given features of a dataset are required for good classification. In fact, very often only a subset is required. The inclusion of all features might even deteriorate the performance of the algorithm, as some features are detrimental and might interfere with the decisions made on the other features. This could imply that the sepal width is not required for good classification on the iris problem.
The number of rules generated by the algorithms over the runs is shown as box plots in Fig. 14. The boxplot shows the


median (the line within the box) and the lower and upper quartiles, given by the lower and upper boundaries of the box. The smallest and largest observations are also given, with possible outliers marked by a cross. Generally, EMA-µGA and EMA-AIS have fewer rules in a rule set than No LS for all problems. In connection with the fitness graphs, an appropriate number of rules is what is required; a larger rule set does not guarantee better classification ability. For all the datasets, EMA-µGA and EMA-AIS evolve rule sets that are more consistent in terms of the number of rules, while No LS has larger deviations from the median. In addition, without local search, outliers are evident for all three datasets. The differences between the median numbers of rules for EMA-µGA and EMA-AIS are not very great: 13.25 vs. 12, 21.8 vs. 21 and 7.15 vs. 7.8 for the cancer, diabetes and iris datasets, respectively. Using local search results in less complex rule sets while achieving better performance.

Fig. 14. Average number of rules in a rule set (a) cancer, (b) diabetes and (c) iris.

7.3. Generalization and results comparison


In this section, the support and generalization performance on the test datasets are given. In addition, a comparison of generalization accuracy is made with PART, a rule-based algorithm from the literature (Frank & Witten, 1998).
7.3.1. Support on test datasets
The best training rule set is applied to the test set; Table 3 shows that the support on the test set is close to 100% for both the cancer and diabetes problems, and that 100% of the instances are covered for the iris problem. The deviation in the number of supported instances is very small or zero. Hence, for every run there is high confidence that the rule set evolved during the training phase covers most of the test set instances, and probably only one or two instances of the test dataset do not fit the rule set antecedents. When one rule set is able to support the test data with high probability, only one rule set needs to be selected from the training phase; if each rule set had low support on the test set, several rule sets would be needed to cover the entire testing dataset. Since the support on the test data is almost 100%, it is certain that the generalization accuracy figures in Table 4 truly reflect the performance of the rule sets on the test data, and the accuracy is also reflective of the confidence on the test data.
7.3.2. Generalization accuracy
EMA-µGA and EMA-AIS obtained, respectively, 97.4% and 97.6% mean generalization accuracy on the cancer dataset, the highest among the algorithms. All the algorithms proposed in this paper are able to outperform PART in mean accuracy. EMA-µGA, EMA-AIS and No LS have the same maximum accuracy, while No LS has a lower minimum accuracy. No LS also has a higher standard deviation compared to using LS. Though PART has the lowest standard deviation among all the presented algorithms, it should be noted that the minimum accuracies for EMA-µGA and EMA-AIS are still higher than the mean accuracy of PART. Hence, for EMA-µGA and EMA-AIS, the deviation from the mean accuracy will not give results that are lower than PART's.

Table 3
Support for dataset.

            Algorithm   Mean support (%)   Standard deviation
Cancer      EMA-µGA     99.71              0.659
            EMA-AIS     99.86              0.255
            No LS       99.60              0.497
Diabetes    EMA-µGA     99.87              0.373
            EMA-AIS     99.95              0.160
            No LS       99.84              0.382
Iris        EMA-µGA     100                0.00
            EMA-AIS     100                0.00
            No LS       100                0.00

Table 4
Generalization accuracy.

            Algorithm   Mean accuracy (%)   Maximum accuracy (%)   Minimum accuracy (%)   Standard deviation
Cancer      EMA-µGA     97.414              99.425                 95.402                 0.942
            EMA-AIS     97.557              99.425                 95.402                 1.002
            No LS       96.236              99.425                 93.678                 1.532
            PART        94.9                -                      -                      0.4
Diabetes    EMA-µGA     75.052              80.208                 71.875                 2.292
            EMA-AIS     75.235              77.604                 72.396                 1.439
            No LS       73.229              77.083                 68.75                  2.133
            PART        74.0                -                      -                      0.5
Iris        EMA-µGA     97.297              100                    94.595                 0.877
            EMA-AIS     97.027              97.297                 94.595                 0.832
            No LS       94.595              100                    83.784                 3.921
            PART        93.7                -                      -                      1.6

PART obtained a higher mean accuracy than No LS and has the lowest standard deviation on the diabetes problem. The algorithms presented in this paper have rather large standard deviations here. The best of them is EMA-AIS, which has the highest mean accuracy yet the lowest standard deviation. EMA-µGA achieves the highest maximum accuracy at 80.2%, a large difference from the mean accuracies of the rest of the algorithms.
The best algorithm on the iris dataset is EMA-µGA, which obtains the highest mean accuracy and the highest maximum and minimum accuracies. It also has a lower standard deviation than No LS and PART. Both EMA-µGA and EMA-AIS outperform PART significantly, since they have higher mean accuracies with lower standard deviations.
Generally, across all datasets, the following observations are made:

• No LS has the lowest minimum accuracies and large standard deviations; hence local search is important for a more robust algorithm.
• EMA-µGA and EMA-AIS have the best mean accuracies.

8. Conclusions and future work


A novel way of incorporating local search into Evolutionary Algorithms for rule extraction has been presented in this paper. Two local search operators, namely the µGA and the AIS, are proposed, and the results are analyzed in detail and discussed. The memetic algorithm is complemented by the use of a variable-length chromosome, which naturally represents the rules of the problems to be solved. Several advanced variation operators are used to improve the algorithms. The algorithms are applied to real-world benchmarking problems, and the simulations show that EMA-µGA and EMA-AIS have comparable results. The results also show that LS is generally important for better efficiency, as No LS is not able to attain the same level of performance as EMA-µGA and EMA-AIS. All the algorithms proposed in this paper manage to attain near 100% support level on the datasets. In addition, the algorithms also manage to outperform PART, a rule-based algorithm from the literature.
There is great potential in this area of research where memetic algorithms are used for rule extraction. Future work could concentrate on improving the various variation operators to address other aspects of the problem. In addition, real-life data are often interspersed with missing values and noise, which can lead to spurious results and mislead the users; proposing dedicated local search operators that are able to handle such situations would be advantageous.
References
Agrawal, R., Imielinski, T., & Swami, A. (1993). Mining association rules between sets of items in large databases. In Proceedings of SIGMOD conference (pp. 207–216).
Alves, R. T., Delgado, M. R., Lopes, H. S., & Freitas, A. A. (2004). An artificial immune system for fuzzy-rule induction in data mining. In Lecture notes in computer science, Proceedings of parallel problem solving from nature (Vol. 3242, pp. 1011–1020). Berlin, Germany: Springer-Verlag.
Ang, J. H., Tan, K. C., & Mamun, A. A. (2008). Training neural networks for classification using growth probability based evolution. Neurocomputing, 71, 3493–3508.
Blake, C. L., & Merz, C. J. (1998). UCI repository of machine learning databases, <http://www.ics.uci.edu/~mlearn/MLRepository.html>. Irvine, CA: University of California, Department of Information and Computer Science.
Bojarczuk, C. C., Lopes, H. S., Freitas, A. A., & Michalkiewicz, E. L. (2004). A constrained-syntax genetic programming system for discovering classification rules: Application to medical data sets. Artificial Intelligence in Medicine, 30(1), 27–48.
Carvalho, D. R., & Freitas, A. A. (2000). A hybrid decision tree/genetic algorithm for coping with the problem of small disjunct in data mining. In Proceedings of genetic and evolutionary computation conference (pp. 1061–1068). Las Vegas, NV, USA.
Carvalho, D. R., & Freitas, A. A. (2001). An immunological algorithm for discovering small-disjunct rules in data mining. In Proceedings of graduate student workshop at GECCO-2001 (pp. 401–404). San Francisco, USA.
Castro, P. D., Coelho, G. O., Caetano, M. F., & von Zuben, F. J. (2005). Designing ensembles of fuzzy classification systems: An immune-inspired approach. In Proceedings of fourth international conference on artificial immune systems (Vol. 3627, pp. 469–482). LNCS.
Cheng, T. H., Wei, C. P., & Tseng, V. S. (2006). Feature selection for medical data mining: Comparisons of expert judgment and automatic approaches. In Proceedings of 19th IEEE international symposium on computer-based medical systems (pp. 165–170).
Coello Coello, C. A., & Cortes, N. C. (2005). Solving multi-objective optimization problems using an artificial immune system. Genetic Programming and Evolvable Machines, 6(2), 163–190.
Doumpos, M., & Zopounidis, C. (2002). Multi-criteria classification methods in financial and banking decisions. International Transactions in Operational Research, 9(5), 567–581.
Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis. John Wiley & Sons. ISBN 0-471-22361-1.
Frank, E., & Witten, I. H. (1998). Generating accurate rule sets without global optimization. In Proceedings of the fifteenth international conference on machine learning (pp. 144–151).
Freitas, A. A. (2002). A survey of evolutionary algorithms for data mining and knowledge discovery. In A. Ghosh & S. Tsutsui (Eds.), Advances in evolutionary computation (pp. 819–845). Springer-Verlag. ISBN: 3-540-43330-9.
Freitas, A. A., & Timmis, J. (2007). Revisiting the foundations of artificial immune systems for data mining. IEEE Transactions on Evolutionary Computation, 11(4), 521–540.
Fogarty, T. (1989). Varying the probability of mutation in the genetic algorithm. In Proceedings of the third international conference on genetic algorithms (pp. 104–109). Morgan Kaufmann.
Goodman, D. E., Jr., Boggess, L. C., & Watkins, A. B. (2002). Artificial immune system classification of multiple-class problems. Intelligent Engineering Systems through Artificial Neural Networks, 12, 179–184.
Guan, S. U., & Li, S. C. (2001). Incremental learning with respect to new incoming input attributes. Neural Processing Letters, 14(3), 241–260.
Holland, J. H. (1992). Adaptation in natural and artificial systems. The MIT Press.
Lepistö, L., Kunttu, I., & Visa, A. (2006). Rock image classification based on k-nearest neighbour voting. Proceedings of IEE Vision, Image, and Signal Processing, 153(4), 475–482.
Luh, G. C., Chueh, C. H., & Liu, W. W. (2003). MOIA: Multi-objective immune algorithm. Engineering Optimization, 35(2), 143–164.
Mangasarian, O. L., & Wolberg, W. H. (1990). Cancer diagnosis via linear programming. SIAM News, 23(5), 1–18.
Merz, P., & Freisleben, B. (1999). A comparison of memetic algorithms, Tabu search, and ant colonies for the quadratic assignment problem. In Proceedings of the congress on evolutionary computation (Vol. 1, pp. 2063–2070).
Michalewicz, Z. (1994). Genetic algorithms + data structures = evolution programs. London: Kluwer Academic Publishers.
Noda, E., Freitas, A. A., & Lopes, H. S. (1999). Discovering interesting prediction rules with a genetic algorithm. In Proceedings of IEEE congress on evolutionary computation (pp. 1322–1329). Washington, DC, USA.
Ong, Y. S., & Keane, A. J. (2004). Meta-Lamarckian learning in memetic algorithms. IEEE Transactions on Evolutionary Computation, 8(2), 99–110.
Quinlan, J. R. (1993). C4.5: Programs for machine learning. Morgan Kaufmann Publishers.
Stathakis, D., & Vasilakos, A. (2006). Satellite image classification using granular neural networks. International Journal of Remote Sensing, 27(18), 3991–4003.
Savasere, A., Omiecinski, E., & Navathe, S. (1995). An efficient algorithm for mining association rules in large databases. In Proceedings of the 21st international conference on very large data bases (pp. 432–444).
Tan, K. C., Goh, C. K., Yang, Y. J., & Lee, T. H. (2006). Evolving better population distribution and exploration in evolutionary multi-objective optimization. European Journal of Operational Research, 171, 463–495.
Tan, K. C., Chew, Y. H., & Lee, T. H. (2006). A hybrid multi-objective evolutionary algorithm for solving vehicle routing problem with time windows. Computational Optimization and Applications, 34, 115–151.
Tan, K. C., Yu, Q., Heng, C. M., & Lee, T. H. (2003). Evolutionary computing for knowledge discovery in medical diagnosis. Artificial Intelligence in Medicine, 27(2), 129–154.
Tan, K. C., Yu, Q., & Ang, J. H. (2006a). A coevolutionary algorithm for rules discovery in data mining. International Journal of Systems Science, 37(12), 835–864.
Tan, K. C., Yu, Q., & Ang, J. H. (2006b). A dual-objective evolutionary algorithm for rules extraction in data mining. Computational Optimization and Applications, 34, 115–151.
Tan, K. C., Goh, C. K., Mamun, A. A., & Ei, E. Z. (2008). An evolutionary artificial immune system for multi-objective optimization. European Journal of Operational Research, 187(2), 371–392.
Timmis, J., & Neal, M. (2001). A resource limited artificial immune system for data analysis. Knowledge-Based Systems, 14, 121–130.
Timmis, J., Neal, M., & Hunt, J. (2000). An artificial immune system for data analysis. BioSystems, 55, 143–150.
Watkins, A. B., & Boggess, L. C. (2002a). A resource limited artificial immune classifier. In Proceedings of congress on evolutionary computation (Vol. 1, pp. 926–931).
Watkins, A. B., & Boggess, L. C. (2002b). A new classifier based on resource limited artificial immune systems. In Proceedings of the congress on evolutionary computation (Vol. 2, pp. 1546–1551).