
Delft University of Technology

Delft Center for Systems and Control

Technical report 09-001

Fuzzy ant colony optimization for optimal control∗

J. van Ast, R. Babuška, and B. De Schutter

If you want to cite this report, please use the following reference instead:
J. van Ast, R. Babuška, and B. De Schutter, “Fuzzy ant colony optimization
for optimal control,” Proceedings of the 2009 American Control Conference,
St. Louis, Missouri, pp. 1003–1008, June 2009.

Delft Center for Systems and Control


Delft University of Technology
Mekelweg 2, 2628 CD Delft
The Netherlands
phone: +31-15-278.51.19 (secretary)
fax: +31-15-278.66.79
URL: http://www.dcsc.tudelft.nl

This report can also be downloaded via http://pub.deschutter.info/abs/09_001.html
Fuzzy Ant Colony Optimization for Optimal Control
Jelmer van Ast, Robert Babuška, and Bart De Schutter

Abstract— Ant Colony Optimization (ACO) has proven to be a very powerful optimization heuristic for Combinatorial Optimization Problems. While being very successful for various NP-complete optimization problems, ACO is not trivially applicable to control problems. In this paper a novel ACO algorithm is introduced for the automated design of optimal control policies for continuous-state dynamic systems. The so-called Fuzzy ACO algorithm integrates the multi-agent optimization heuristic of ACO with a fuzzy partitioning of the state space of the system. A simulated control problem is presented to demonstrate the functioning of the proposed algorithm.

∗ This research is financially supported by Senter, Ministry of Economic Affairs of The Netherlands within the BSIK-ICIS project "Self-Organizing Moving Agents" (grant no. BSIK03024). Jelmer van Ast, Robert Babuška, and Bart De Schutter are with the Delft Center for Systems and Control of the Delft University of Technology, Mekelweg 2, 2628 CD Delft, The Netherlands (email: j.m.vanast@tudelft.nl, r.babuska@tudelft.nl, b@deschutter.info). Bart De Schutter is also with the Marine and Transport Technology Department of the Delft University of Technology.

I. INTRODUCTION

Ant Colony Optimization (ACO) is inspired by ants and their behavior of finding shortest paths from their nest to sources of food. Without any leader that could guide the ants to optimal trajectories, the ants manage to find these optimal trajectories over time in a distributed fashion. In an ACO algorithm, the metaphorical ants are agents programmed to find an optimal combination of elements of a given set that maximizes some utility function. The key ingredient in ACO and its biological counterpart is the pheromone. With real ants, pheromones are chemicals deposited by the ants and their concentration encodes a map of trajectories, where stronger concentrations represent better trajectories. ACO represents the class of metaheuristic optimization methods that use the concepts of distributed optimization and pheromone maps in solving Combinatorial Optimization Problems [1].

This paper introduces an ACO-based algorithm for the automated design of optimal control policies for continuous-state dynamic systems. The algorithm combines the concepts of multi-agent optimization and fuzzy approximation of the state space in a novel approach.

This paper is structured as follows. Section II describes the relation of the subjects covered in this paper to the state of the art. In Section III, the ACO heuristic is briefly reviewed. Section IV presents some preliminaries on the control problem and the fuzzy partitioning of the state space. In Section V the Fuzzy ACO algorithm is introduced and described in detail. Section VI demonstrates the functioning of the Fuzzy ACO algorithm on a simple control problem and Section VII concludes this paper.

II. RELATION TO THE STATE OF THE ART

The state of the art related to the subjects covered in this paper can be summarized as follows.

1) The original ACO algorithm in the form of the Ant System has been introduced in [2], with the Ant Colony System in [3] and the MAX-MIN Ant System in [4]. The basic ACO algorithm and its variants have successfully been applied to various optimization problems [5], [6], [7], [8]. A detailed description of ACO algorithms and their applications can be found in the survey [1] and the book [5].

2) One of the first real applications of the ACO framework to optimization problems in continuous search spaces is described in [9] and [10]. An earlier application of the ant metaphor to continuous optimization appears in [11], with more recent work like the Aggregation Pheromones System in [12] and the Differential Ant-Stigmergy Algorithm in [13].

3) The first application of ACO to the automated design of optimal control policies for continuous-state dynamic systems has been developed by the authors of this paper in [14]. The method, however, is hampered by the large number of bins needed to quantize the state space in order to capture the dynamics of the original system. This curse of dimensionality is a phenomenon widely encountered when an originally continuous-state system needs to be represented using a finite number of quantized states.

4) There are only a few publications that combine ACO with the concept of fuzzy control [15], [16], [17]. In all three publications fuzzy controllers are obtained using ACO, rather than presenting an actual fuzzy ACO algorithm, as introduced in this paper.

This paper contributes to the above four categories of the state of the art. It contributes to 1) by further exploring the applicability of ACO and to 2) by presenting a novel approach to the optimization in continuous spaces using ACO. It presents an extension to the work mentioned in 3) by curbing the curse of dimensionality through the use of a parametric interpolation to retain a continuous state space with a finite number of parameters, typically much smaller than the number of quantized states that would be necessary to achieve similar accuracy. The interpolation method of choice is fuzzy approximation. Finally, the work in this paper differs from that described in 4) as it presents an ACO algorithm that operates on the membership degrees of the ants to find the optimal pheromone map corresponding to the fuzzy partitioning of the state space, rather than optimizing the membership functions themselves, or the linguistic rules.
III. ANT COLONY OPTIMIZATION

A. Framework for ACO Algorithms

ACO algorithms have been developed to solve hard combinatorial optimization problems [1]. A combinatorial optimization problem can be represented as a tuple P = ⟨S, F⟩, where S is the solution space with s ∈ S a specific candidate solution and F : S → R+ is a fitness function assigning strictly positive values to candidate solutions, where higher values correspond to better solutions. The purpose of the algorithm is to find a solution s∗, or a set of solutions S∗, with s∗ ∈ S∗ ⊆ S, that maximizes the fitness function. The solution s∗ is then called an optimal solution and S∗ is called the set of optimal solutions.

In ACO, the combinatorial optimization problem is represented as a graph consisting of a set of vertices and a set of arcs connecting the vertices. A particular solution s consists of solution components, which are denoted by c_ij and which are pairs of a vertex and an arc. Here the subscript ij denotes that this solution component consists of a vertex i and an arc that connects this vertex to another vertex j. A particular solution s is thus a concatenation of solution components, and forms a tour from the initial vertex to the terminal vertex. How the terminal vertices are defined depends on the problem considered. For instance, in the traveling salesman problem¹, there are multiple terminal vertices, namely for each ant the terminal vertex is equal to its initial vertex, after visiting all other cities (i.e. vertices) exactly once. For the application to control problems, as considered in this paper, the terminal vertex corresponds to the desired steady state. Two values are associated with the arcs: a pheromone trail variable τ_ij and a heuristic variable η_ij. The pheromone trail represents the acquired knowledge about the optimal solution over time and the heuristic variable provides a priori information about the quality of the solution component, i.e., the quality of moving to a node j from a node i. In the case of the traveling salesman problem, the heuristic variables typically represent the distance between the respective pair of cities. In general, the heuristic variables represent a short-term quality measure of the solution component, while the task is to acquire a concatenation of solution components that overall form the optimal solution. The pheromone variables basically encode the measure of the long-term quality of adding the solution component. The trade-off between these two parameters is important for the performance of ACO.

¹ In the traveling salesman problem, there is a set of cities connected by roads of different lengths and the problem is to find the sequence of cities that takes the traveling salesman to all cities, visiting each city exactly once and bringing him back to his initial city with a minimum length of the tour.

B. The Ant System and Ant Colony System

The basic ACO algorithm works as follows. A set of M ants is randomly distributed over the vertices. The heuristic variables are set to encode the prior knowledge by favoring the choice of some vertices over others. For each ant c, the partial solution s_p,c is initially empty and all pheromone variables are set to some initial value τ0. In each iteration, each ant decides, based on some probability distribution, which solution component c_ij to add to its partial solution s_p,c. The probability p_c{j|i} for an ant c on a vertex i to move to a vertex j within its feasible neighborhood N_i is:

    p_c{j|i} = τ_ij^α η_ij^β / Σ_{l ∈ N_i} τ_il^α η_il^β ,   ∀ j ∈ N_i,    (1)

with α and β determining the relative importance of τ_ij and η_ij respectively. The neighborhood N_i is the set of not yet visited vertices that are connected to the vertex i. By moving from vertex i to vertex j, each ant adds the associated solution component c_ij to its partial solution s_p until it reaches its terminal vertex and completes its candidate solution. This candidate solution is evaluated using the fitness function F(s) and the resulting value is used to update the pheromone levels by:

    τ_ij ← (1 − ρ) τ_ij + ρ Σ_{s ∈ S_upd} Δτ_ij(s),    (2)

with ρ ∈ (0, 1) the evaporation rate and S_upd the set of solutions that are eligible to be used for the pheromone update, which will be explained further on in this section. The pheromone deposit Δτ_ij(s) is computed as:

    Δτ_ij(s) = F(s),  if c_ij ∈ s
             = 0,     otherwise.

The pheromone levels are a measure of how desirable it is to add the associated solution component to the partial solution. In order to incorporate forgetting, the pheromone levels decrease by some factor in each iteration. In this way it can be avoided that the algorithm prematurely converges to suboptimal solutions. In the next iteration, each ant repeats the previous steps, but now the pheromone levels have been updated and can be used to make better decisions about which vertex to move to. After some stopping criterion has been reached, the pheromone levels encode the solution of the optimization problem.
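To make these construction and update steps concrete, the following is a minimal Python sketch (not taken from the paper) of the proportional transition rule (1) and the global pheromone update (2) with the deposit defined above. The dense matrix encoding of τ and η, the fitness callback and all names are illustrative assumptions.

```python
import numpy as np

def choose_next_vertex(i, feasible, tau, eta, alpha=1.0, beta=2.0, rng=np.random):
    """Sample the next vertex j from the feasible neighborhood of vertex i, eq. (1)."""
    feasible = list(feasible)
    weights = np.array([tau[i, j] ** alpha * eta[i, j] ** beta for j in feasible])
    return rng.choice(feasible, p=weights / weights.sum())

def global_pheromone_update(tau, solutions, fitness, rho=0.1):
    """AS-style global update, eq. (2): evaporate, then deposit F(s) on every arc of s."""
    deposit = np.zeros_like(tau)
    for s in solutions:              # each s is a list of solution components (i, j)
        for i, j in s:
            deposit[i, j] += fitness(s)
    return (1.0 - rho) * tau + rho * deposit
```

Here τ and η are stored as dense matrices indexed by vertex pairs; in practice only the arcs that actually exist need to be represented.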
There exist various rules to construct S_upd, of which the most standard one is to use all the candidate solutions found in the trial S_trial². This update rule is typical for the first ACO algorithm, called the Ant System (AS) [2]. Other update rules have been shown to outperform the AS update rule. Most notably, the two most successful ACO variants in practice, the Ant Colony System (ACS) [3] and the MAX-MIN Ant System [4], respectively use the Global Best and the Iteration Best update rule. These methods result in a strong bias of the pheromone trail reinforcement towards solutions that have been proven to perform well and additionally reduce the computational complexity of the algorithm.

² In ACO literature, the term trial is seldom used. It is rather a term from the reinforcement learning (RL) community [18]. In our opinion it is also a more appropriate term for ACO and we will use it to denote the part of the algorithm from the initialization of the ants over the state space until the global pheromone update step. The corresponding term for a trial in ACO is iteration and the set of all candidate solutions found in each iteration is denoted as S_iter. In this paper, equivalently to RL, we prefer to use the word iteration to indicate one interaction step with the system.
An important element from the ACS algorithm is the local pheromone update rule, which occurs while iterating through the trial and is defined as follows:

    τ_ij ← (1 − γ) τ_ij + γ τ0,    (3)

where γ ∈ (0, 1) is a parameter similar to ρ, ij is the index of the solution component just added, and τ0 is the initial value of the pheromone trail. The effect of (3) is that during the trial visited solution components are made less attractive for other ants to take, in that way promoting the exploration of other, less frequently visited, solution components. In this paper, the introduced Fuzzy ACO algorithm is based on the AS combined with the local pheromone update rule of ACS.
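Purely as an illustration, the local update (3) would sit inside the solution-construction loop, roughly as in the sketch below (reusing the hypothetical choose_next_vertex from the earlier sketch; the bookkeeping of visited vertices and the terminal condition are simplified assumptions):

```python
def construct_solution(start, is_terminal, neighbors, tau, eta, tau0,
                       alpha=1.0, beta=2.0, gamma=0.1):
    """Build one candidate solution, applying the local update (3) after every step."""
    i, solution = start, []
    while not is_terminal(i):
        j = choose_next_vertex(i, neighbors(i), tau, eta, alpha, beta)
        solution.append((i, j))
        # Local pheromone update, eq. (3): make the visited arc less attractive
        # for the other ants during the remainder of the trial.
        tau[i, j] = (1.0 - gamma) * tau[i, j] + gamma * tau0
        i = j
    return solution
```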
IV. CONTROL PROBLEM AND FUZZY APPROXIMATION

A. The Optimal Control Problem

Assume we have a nonlinear system, characterized by a state vector x = [x1 x2 ... xn]^T. Also assume that the state can be controlled by an input u and can be measured at discrete time steps, with a sample time Ts, and that the system dynamics in discrete time can be denoted as:

    x(k + 1) = f(x(k), u(k)),

with k the discrete time index. The optimal control problem is to control the state of the system from any given initial state x(0) = x0 to a desired goal state x(K) = xg in an optimal way, where optimality is defined by minimizing the following quadratic cost function:

    J = Σ_{k=0}^{K−1} [ e^T(k + 1) Q e(k + 1) + u^T(k) R u(k) ],    (4)

with e(k + 1) = x(k + 1) − xg the error at time k + 1 and Q and R positive definite matrices of appropriate dimensions. The problem is to find a nonlinear mapping from states to input u(k) = g(x(k)) that, when applied to the system in x0, results in a sequence of state-action pairs (u(0), x(1)), (u(1), x(2)), ..., (u(K − 1), xg) that minimizes this cost function. The quadratic cost function in (4) is minimized if the goal is reached in minimum time given the dynamics of the system and restrictions on the size of the input. The matrices Q and R balance the importance of speed versus the aggressiveness of the controller.

In the case of an ACO implementation, only a finite set of input values, called the action set U, can be used. The goal is then actually to find the nonlinear mapping of states to the action set that minimizes (4).
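For illustration only, the cost (4) of a simulated trajectory could be evaluated as in the following sketch; the trajectory data, Q and R are placeholders:

```python
import numpy as np

def trajectory_cost(x_seq, u_seq, x_goal, Q, R):
    """Quadratic cost (4): sum of e^T(k+1) Q e(k+1) + u^T(k) R u(k) over k = 0..K-1.

    x_seq holds x(0)..x(K) and u_seq holds u(0)..u(K-1)."""
    J = 0.0
    for k in range(len(u_seq)):
        e = x_seq[k + 1] - x_goal          # e(k+1) = x(k+1) - x_g
        J += e @ Q @ e + u_seq[k] @ R @ u_seq[k]
    return J
```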
B. Quantization Issues

For a system with a continuous-valued state space, optimization algorithms like ACO can only be applied if the state space is quantized. The most straightforward way to do this is to divide the state space into a finite number of bins, such that each state value is assigned to exactly one bin. These bins can be enumerated and used as the vertices in the ACO graph. It is easy to see that the state transitions are stochastic. Namely, each bin corresponds to a range of states and an input to the system may drive its state to one bin, or another. In [14], a variation to the AS algorithm is introduced that is capable of solving an optimal control problem in this manner. However, the number of bins needed to accurately capture the dynamics of the original system may become very large even for simple systems with only two state variables. Moreover, the time complexity of the ACO algorithm grows exponentially with the number of bins, making the algorithm infeasible for realistic systems.

A much better alternative is not to quantize the state space at all, but to approximate it by a smooth parameterized function approximator. In that case, there is still a finite number of parameters, but this number is typically much smaller compared to using crisp quantization. The universal function approximator that is used in this paper is the fuzzy approximator.
C. Fuzzy Approximation

With fuzzy approximation, the domain of each state variable is partitioned using membership functions. We define the membership functions for the state variables to be triangular-shaped, such that the membership degrees for any value of the state on the domain always sum up to one. Only the centers of the membership functions have to be stored. Let A_i denote the membership functions for x1, with a_i their centers for i = 1, ..., N_A, with N_A the number of membership functions for x1. Similarly for x2, denote the membership functions by B_i, with b_i their centers for i = 1, ..., N_B, with N_B the number of membership functions for x2. In the same way, membership functions can be defined for the other state variables in x, but for the sake of notation, the discussion in this paper limits the number to two, without loss of generality. Note that in the example in Section VI, the order of the system is four.

The membership degrees of A_i and B_j are respectively denoted by μ_Ai(x1(k)) and μ_Bj(x2(k)) for a specific value of the state at time k. The degree of fulfillment is computed by multiplying the two membership degrees:

    β_ij(x(k)) = μ_Ai(x1(k)) · μ_Bj(x2(k)).

Let the vector of all degrees of fulfillment for a certain state at time k be denoted by:

    β(x(k)) = [β_11(x(k)) β_12(x(k)) ... β_1N_B(x(k))  β_21(x(k)) β_22(x(k)) ... β_2N_B(x(k))  ...  β_N_A N_B(x(k))]^T,    (5)

which is a vector containing β_ij for all combinations of ij. Each element will be associated to a vertex in the graph used by the Fuzzy ACO algorithm introduced in this paper. Most of the steps taken from the AS and ACS algorithms need to be reconsidered in order to deal with the β(x(k)) vectors from (5). This is the subject of the following section.
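The following sketch shows one way such triangular membership functions and the degree-of-fulfillment vector β(x(k)) of (5) could be computed for a two-dimensional state; the linear interpolation between adjacent centers and all names are assumptions, not code from the paper:

```python
import numpy as np

def triangular_memberships(x, centers):
    """Membership degrees of scalar x for triangular functions given by their centers.

    Adjacent functions overlap such that the degrees always sum up to one."""
    centers = np.asarray(centers, dtype=float)
    mu = np.zeros(len(centers))
    x = np.clip(x, centers[0], centers[-1])
    idx = np.searchsorted(centers, x)
    if idx == 0:
        mu[0] = 1.0
    else:
        left, right = centers[idx - 1], centers[idx]
        w = (x - left) / (right - left)
        mu[idx - 1], mu[idx] = 1.0 - w, w
    return mu

def degree_of_fulfillment(x1, x2, centers1, centers2):
    """Vector beta(x(k)) of eq. (5): products of the per-variable membership degrees."""
    mu1 = triangular_memberships(x1, centers1)   # mu_Ai(x1)
    mu2 = triangular_memberships(x2, centers2)   # mu_Bj(x2)
    return np.outer(mu1, mu2).ravel()            # beta_ij = mu_Ai(x1) * mu_Bj(x2)
```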
V. FUZZY ACO FOR OPTIMAL CONTROL

A. Outline of the Algorithm

In the original paper on ACO for optimal control [14], the continuous state variables were quantized into a finite number of bins, called the quantized states. All combinations of these quantized states for the different state variables corresponded to the nodes in the graph and the arcs corresponded to transitions from one quantized state to another. Because of the quantization, the resulting system was transformed into a stochastic decision problem. However, the pheromones were associated to these arcs as usual. In the fuzzy case, the state space is partitioned by membership functions, as described in Section IV-C, and the combinations of the indices of these membership functions for the different state variables correspond to the nodes in the construction graph. With the fuzzy interpolation, the system remains a deterministic decision problem, but the transition from node to node now does not directly correspond to a state transition. The pheromones are associated to the arcs as usual, but the updating needs to take into account the degree of fulfillment of the associated membership functions. This updating will be described in Sections V-E and V-F.

In [14], the vertex-to-vertex transitions of an ant are not deterministic. In Fuzzy ACO, an ant is not assigned to a certain vertex at a certain time, but to all vertices according to some degree of fulfillment at the same time, and a transition from vertex to vertex is not trivial either. Because of this, a solution component c_ij does not consist of a pair of a vertex and a next vertex, but of a state and an action. For this reason, a pheromone τ_ij is now denoted as τ_iu, with i the index of the vertex (i.e. the corresponding element of β) and u the action. For the sake of notation, no distinction will be made between the actual input u(k) and the index of the input (the action) u.

Similar to the definition of the vector of all degrees of fulfillment in (5), the vector of all pheromones for a certain action u at time k is denoted as:

    τ_u(k) = [τ_1u(k) τ_2u(k) ... τ_N_A N_B u(k)]^T.
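In a concrete implementation, this fuzzy pheromone map can simply be stored as a matrix with one row per vertex (one element of β) and one column per action in U, so that τ_u is a column of that matrix. A minimal sketch with assumed sizes:

```python
import numpy as np

N_A, N_B = 9, 5        # number of membership functions per state variable (assumed)
num_actions = 25       # size of the action set U (assumed)
tau0 = 0.01            # initial pheromone level (assumed)

# tau[i, u]: pheromone tau_iu for vertex i (an element of beta) and action u.
tau = np.full((N_A * N_B, num_actions), tau0)

def tau_for_action(tau, u):
    """The vector tau_u(k) defined above, i.e. all pheromones for action u."""
    return tau[:, u]
```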


B. Outline of a Trial

In the following sections, the fuzzy action selection and the local and global pheromone update are explained in more detail. Two more elements in this algorithm need special attention, namely the initialization of the ants and the determination whether an ant has reached the goal.

When using a Global Best pheromone update rule in an optimal control problem, all ants have to be initialized to the same state, as starting from states that require less time and less effort to reach the goal would always result in a better Global Best solution. Ultimately, initializing an ant exactly in the goal state would be the best possible solution, and no other solution, starting from more interesting states, would get the opportunity to update the pheromones in the global pheromone update phase. In order to find a control policy from any initial state to the goal state, the Global Best update rule cannot be used. By simply using all solutions of all ants in the update, as in the original AS algorithm, the resulting algorithm does allow for random initialization of the ants over the state space; this update rule is therefore used in the Fuzzy ACO algorithm.

Regarding the terminal condition for the ants, with the fuzzy implementation none of the vertices can be pointed out as being the terminal vertex. Rather, a set of membership functions has to be defined that can be used to determine to what degree the goal state has been reached. These membership functions can be used to express the linguistic fuzzy term of the state being close to the goal. If this condition has been satisfied, the ant has terminated its trial.

C. Parameter Setting

Some of the parameters are initialized similarly to those in the ACS. The global and local pheromone trail decay factors are set to a preferably small value, respectively ρ ∈ (0, 1) and γ ∈ (0, 1). There will be no heuristic parameter associated with the arcs in the construction graph, so only an exponential weighting factor for the pheromone trail α > 0 needs to be chosen. Increasing α leads to decisions that are more strongly biased towards the one corresponding to the highest pheromone level. Choosing α = 2 or 3 appears to be an appropriate choice. Furthermore, some control parameters for the algorithm need to be chosen, such as the maximum number of iterations per trial, Kmax, and the maximum number of trials, Tmax. The latter can be set to, e.g., 100 trials, while the former depends on the sample time Ts and a guess of the time needed to get from the initial state to the goal optimally, Tguess. A good choice for Kmax would be to take about 10 times the expected number of iterations Tguess · Ts⁻¹. Specific to the fuzzy implementation, the number of membership functions and their spacing over the state domain need to be determined. Furthermore, the pheromones are initialized as τ_iu = τ0 for all i, u, where τ0 is a small, positive value which, according to [3], can be chosen as τ0 = (n · Lguess)⁻¹, with n the number of nodes and Lguess a guess of the optimal tour length. Finally, the number of ants M must be chosen large enough such that the complete state space can be visited frequently enough.
algorithm from (3) can be modified to the fuzzy case as
global pheromone update phase. In order to find a control
follows:
policy from any initial state to the goal state, the Global
Best update rule cannot be used. Simply using all solutions τ u ←τ u (1 − β ) + ((1 − γ )τ u + γτ0 )β
of all ants in the updating, like in the original AS algorithm, = τ u (1 − γβ ) + τ0 (γβ ), (7)
the resulting algorithm does allow for random initialization
of the ants over the state space and is therefore used in the where all operations are performed element-wise.
Fuzzy ACO algorithm. As all ants update the pheromone levels associated with
Regarding the terminal condition for the ants, with the the state just visited and the action just taken in parallel,
fuzzy implementation, none of the vertices can be pointed one may wonder whether or not the order in which the
updates are done matters, when the algorithm is executed on VI. E XAMPLE : NAVIGATION WITH VARIABLE DAMPING
a standard CPU, where all operations are done in series. If it This section presents an example application of the Fuzzy
would matter, there would be a serious flaw in the algorithm. ACO algorithm to a continuous-state dynamic system. The
With crisp quantization, the ants may indeed sometimes visit dynamic system under consideration is a simulated two-
the same state and with fuzzy quantization, the ants may dimensional (2D) navigation problem and similar to the one
very well share some of the membership functions with a described in [19]. Note that it is not our purpose to demon-
membership degree larger than zero. We will show that in strate the superiority of Fuzzy ACO over any other method
both cases, the order of updates in series does not influence for this specific problem. Rather we want to demonstrate the
the final value of the pheromones after the joint update. In functioning of the algorithm.
the original, crisp case, the local pheromone update from (3)
may be rewritten as follows: A. Problem Formulation
(1)
A vehicle, modeled as a point-mass of 1 kg, has to be
τi j ←(1 − γ )τi j + γτ0 steered to the origin of a two-dimensional surface from any
= (1 − γ )(τi j − τ0 ) + τ0 . given initial position in an optimal manner. The vehicle ex-
periences a damping that varies non-linearly  over the surface.
T
After n updates, the pheromone level is reduced to: The state of the vehicle is defined as x = c1 v1 c2 v2 ,
with c1 , c2 and v1 , v2 the position and velocity in the direction
(n)
τi j ← (1 − γ )n (τi j − τ0 ) + τ0 , (8) of each of the two principal  axes respectively.
T
The control
input to the system u = u1 u2 is a two-dimensional
which shows that the order of the update is of no influence force. The dynamics are:
to the final value of the pheromone level.    
For the fuzzy case a similar derivation can be made. In 0 1 0 0 0 0
0 −b(c1 , c2 ) 0 0  x + 1 0 u,
  
general, after all the ants have performed the update, the ẋ = 0 0 0 1  0 0
pheromone vector is reduced to:
0 0 0 −b(c1 , c2 ) 0 1
τu ← ∏ {1 − γβ c }(τ u − τ0 ) + τ0 , (9) where the damping b(c1 , c2 ) in the experiments is modeled
c∈{1,2,...,M}
by an affine sum of two Gaussian functions, with means
where again all operations are performed element-wise. This (0, −2.3) and (4.7, 1) and standard deviations (2.5, 1.5) and
result also reveals that the final values of the pheromones are (1.5, 2) respectively. The damping profile can be seen in
invariant with respect to the order of updates. Furthermore, Fig. 1(b), where darker shading means more damping.
also note that when β c contains exactly one 1 and for the B. Fuzzy ACO Setup and Parameters
rest only zeros, corresponding to the crisp case, the fuzzy
The cores of the membership functions of the positions
local pheromone update from either (7) or (9) reduces to the
c1 , c2 are chosen to be {−5, −3.5, −2, −0.5, 0, 0.5, 2, 3.5, 5}
original case in respectively (3) or (8).
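This invariance is easy to verify numerically with the sketch above: applying (7) for several ants in two different orders gives the same pheromone vector as the closed form (9). A small illustrative check with arbitrary values:

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, tau0 = 0.01, 0.01
tau_u = rng.uniform(0.01, 0.1, size=6)                    # pheromones of one action
betas = [rng.dirichlet(np.ones(6)) for _ in range(4)]     # beta_c of four ants

def apply_in_order(tau_u, order):
    out = tau_u.copy()
    for c in order:
        out = out * (1.0 - gamma * betas[c]) + tau0 * gamma * betas[c]   # eq. (7)
    return out

forward = apply_in_order(tau_u, [0, 1, 2, 3])
reverse = apply_in_order(tau_u, [3, 2, 1, 0])
closed = np.prod([1.0 - gamma * b for b in betas], axis=0) * (tau_u - tau0) + tau0  # eq. (9)
assert np.allclose(forward, reverse) and np.allclose(forward, closed)
```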
F. Global Pheromone Update

The global pheromone update step is similar to (2), with the pheromone deposit defined as:

    Δτ_ij = J⁻¹(s) β(x),  if c_ij ∈ s ∈ S_trial
          = 0,            otherwise,

with J(s) the cost of the sequence of states and actions, according to (4).

As explained in Section V-B, for optimal control problems the appropriate update rule is to use all solutions by all ants in the trial S_trial. In the fuzzy case, the solutions s ∈ S_trial consist of sequences of states and actions, and the states can be fuzzified so that they are represented by sequences of vectors of degrees of fulfillment β. Instead of one pheromone level, in the fuzzy case a set of pheromone levels is updated to a certain degree. It can easily be seen that, as this update process is just a series of pheromone deposits, the final value of the pheromone levels relates to the sum of these deposits and is invariant with respect to the order of these deposits. This is also the case for this step in the AS or ACS algorithm.
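A sketch of this global update, reusing the evaporation of (2) with the fuzzy deposit J⁻¹(s)·β; the data layout (each solution as a list of (β, u) pairs together with its cost J) is an assumption:

```python
import numpy as np

def fuzzy_global_update(tau, trial_solutions, rho=0.1):
    """Evaporation as in (2) with deposits J(s)^-1 * beta for every (beta, u) visited in s."""
    deposit = np.zeros_like(tau)
    for visited, J in trial_solutions:       # visited: [(beta, u), ...]; J: cost (4) of s
        for beta, u in visited:
            deposit[:, u] += beta / J
    return (1.0 - rho) * tau + rho * deposit
```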
VI. EXAMPLE: NAVIGATION WITH VARIABLE DAMPING

This section presents an example application of the Fuzzy ACO algorithm to a continuous-state dynamic system. The dynamic system under consideration is a simulated two-dimensional (2D) navigation problem and is similar to the one described in [19]. Note that it is not our purpose to demonstrate the superiority of Fuzzy ACO over any other method for this specific problem. Rather, we want to demonstrate the functioning of the algorithm.

A. Problem Formulation

A vehicle, modeled as a point-mass of 1 kg, has to be steered to the origin of a two-dimensional surface from any given initial position in an optimal manner. The vehicle experiences a damping that varies non-linearly over the surface. The state of the vehicle is defined as x = [c1 v1 c2 v2]^T, with c1, c2 and v1, v2 the position and velocity in the direction of each of the two principal axes respectively. The control input to the system u = [u1 u2]^T is a two-dimensional force. The dynamics are:

    ẋ = [ 0    1            0    0
          0   −b(c1, c2)    0    0
          0    0            0    1
          0    0            0   −b(c1, c2) ] x  +  [ 0  0
                                                     1  0
                                                     0  0
                                                     0  1 ] u,

where the damping b(c1, c2) in the experiments is modeled by an affine sum of two Gaussian functions, with means (0, −2.3) and (4.7, 1) and standard deviations (2.5, 1.5) and (1.5, 2) respectively. The damping profile can be seen in Fig. 1(b), where darker shading means more damping.
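For reference, these dynamics can be simulated with a simple forward-Euler step at the sample time Ts. The means and standard deviations of the Gaussians follow the description above; the amplitudes, the constant offset of the affine sum and the assignment of the means to (c1, c2) are assumptions, since they are not listed in the text:

```python
import numpy as np

def damping(c1, c2, b0=0.5, a1=2.0, a2=2.0):
    """Affine sum of two Gaussians; b0, a1, a2 are assumed, means/stds from the text."""
    g1 = a1 * np.exp(-((c1 - 0.0) ** 2 / (2 * 2.5 ** 2) + (c2 + 2.3) ** 2 / (2 * 1.5 ** 2)))
    g2 = a2 * np.exp(-((c1 - 4.7) ** 2 / (2 * 1.5 ** 2) + (c2 - 1.0) ** 2 / (2 * 2.0 ** 2)))
    return b0 + g1 + g2

def step(x, u, Ts=0.2):
    """One forward-Euler step of the point-mass dynamics, with x = [c1, v1, c2, v2]."""
    c1, v1, c2, v2 = x
    b = damping(c1, c2)
    dx = np.array([v1, -b * v1 + u[0], v2, -b * v2 + u[1]])
    return x + Ts * dx
```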
B. Fuzzy ACO Setup and Parameters

The cores of the membership functions of the positions c1, c2 are chosen to be {−5, −3.5, −2, −0.5, 0, 0.5, 2, 3.5, 5} and those for the velocities v1, v2 are {−2, −0.5, 0, 0.5, 2}. The action set consists of 25 actions, namely the cross-product of the sets {−1, −0.5, 0, 0.5, 1} for both dimensions. The local and global pheromone decay factors are respectively γ = 0.01 and ρ = 0.1. Furthermore, α = 3 and the number of ants is 2000. The sampling time is Ts = 0.2 and the ants are randomly initialized over the complete state space at the start of each trial. An ant terminates its trial when its position and velocity in both dimensions are within a bound of ±0.25 and ±0.05 from the goal respectively.

C. Simulation Results

The convergence of the Fuzzy ACO algorithm is depicted in Fig. 1(a). It shows that the relative variation of the policy is already very low after about 20 trials. A slice of the resulting policy for zero velocity is depicted together with the damping profile in Fig. 1(b). The policy shows the mapping of the positions in both dimensions to the input on a fine grid. Fig. 1(c) presents the trajectories of the vehicle for various initial positions and zero initial velocity. It shows that the vehicles manage to drive quickly to the goal, while avoiding the regions of stronger damping to a certain extent. However, the trajectories are only close to optimal. Especially for the case where the vehicle starts in the bottom-left corner, the optimality of the trajectory can be questioned, as the vehicle drives straight to the goal without avoiding the region of larger damping at all. These results demonstrate that the algorithm is capable of converging quickly, but only to a suboptimal policy with the settings used in the experiments.
Fig. 1. Results. (a) Convergence of the algorithm in terms of the fraction of cores of the membership functions for which the policy changed at the end of the trial (policy variation versus trial number). (b) A slice of the resulting policy for zero velocity, showing the control input for a fine grid of positions c1, c2 [m]; the damping profile is shown, where darker shades mean more damping. (c) Trajectories of the vehicle under the resulting policy for various initial positions and zero velocity; the markers indicate the positions at the sampling instances.

VII. CONCLUSIONS AND FUTURE WORK

This paper has introduced the Fuzzy ACO algorithm for optimal control problems, which combines the framework of the AS and ACS algorithms with a fuzzy partitioning of the state space. The applicability of this algorithm to optimal control problems with continuous-valued states is outlined and demonstrated on the non-linear control problem of two-dimensional navigation with variable damping. The results show convergence of the algorithm to a suboptimal policy that drives the vehicle to the goal from any initial state. Future research must further develop the algorithm to deal with suboptimality in a better way and to theoretically prove its convergence.

REFERENCES

[1] M. Dorigo and C. Blum, "Ant colony optimization theory: a survey," Theoretical Computer Science, vol. 344, no. 2-3, pp. 243–278, November 2005.
[2] M. Dorigo, V. Maniezzo, and A. Colorni, "Ant system: optimization by a colony of cooperating agents," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 26, no. 1, pp. 29–41, 1996.
[3] M. Dorigo and L. Gambardella, "Ant Colony System: a cooperative learning approach to the traveling salesman problem," IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 53–66, 1997.
[4] T. Stützle and H. Hoos, "MAX-MIN Ant System," Future Generation Computer Systems, vol. 16, pp. 889–914, 2000.
[5] M. Dorigo and T. Stützle, Ant Colony Optimization. Cambridge, MA, USA: The MIT Press, 2004.
[6] P. K. Jain and P. K. Sharma, "Solving job shop layout problem using ant colony optimization technique," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Big Island, HI, USA, October 2005, pp. 288–292.
[7] M. T. Islam, P. Thulasiraman, and R. K. Thulasiram, "A parallel ant colony optimization algorithm for all-pair routing in MANETs," in Proceedings of the International Symposium on Parallel and Distributed Processing (IPDPS 2003), Nice, France, April 2003.
[8] Y. Hsiao, C. Chuang, and C. Chien, "Computer network load-balancing and routing by ant colony optimization," in Proceedings of the IEEE International Conference on Networks (ICON 2004), Singapore, November 2004, pp. 313–318.
[9] K. Socha and C. Blum, "An ant colony optimization algorithm for continuous optimization: application to feed-forward neural network training," Neural Computing & Applications, vol. 16, no. 3, pp. 235–247, May 2007.
[10] K. Socha and M. Dorigo, "Ant colony optimization for continuous domains," European Journal of Operational Research, vol. 185, no. 3, pp. 1155–1173, 2008.
[11] G. Bilchev and I. C. Parmee, "The ant colony metaphor for searching continuous design spaces," in Selected Papers from the AISB Workshop on Evolutionary Computing, ser. Lecture Notes in Computer Science, T. Fogarty, Ed., vol. 993. London, UK: Springer-Verlag, April 1995, pp. 25–39.
[12] S. Tsutsui, M. Pelikan, and A. Ghosh, "Performance of aggregation pheromone system on unimodal and multimodal problems," in Proceedings of the 2005 Congress on Evolutionary Computation (CEC 2005), September 2005, pp. 880–887.
[13] P. Korosec, J. Silc, K. Oblak, and F. Kosel, "The differential ant-stigmergy algorithm: an experimental evaluation and a real-world application," in Proceedings of the 2007 Congress on Evolutionary Computation (CEC 2007), September 2007, pp. 157–164.
[14] J. M. van Ast, R. Babuška, and B. De Schutter, "Ant colony optimization for optimal control," in Proceedings of the 2008 Congress on Evolutionary Computation (CEC 2008), Hong Kong, China, June 2008, pp. 2040–2046.
[15] J. Casillas, O. Cordón, and F. Herrera, "Learning fuzzy rule-based systems using ant colony optimization algorithms," in Proceedings of ANTS'2000: From Ant Colonies to Artificial Ants, Second International Workshop on Ant Algorithms, Brussels, Belgium, September 2000, pp. 13–21.
[16] B. Zhao and S. Li, "Design of a fuzzy logic controller by ant colony algorithm with application to an inverted pendulum system," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 2006, pp. 3790–3794.
[17] W. Zhu, J. Chen, and B. Zhu, "Optimal design of fuzzy controller based on ant colony algorithms," in Proceedings of the IEEE International Conference on Mechatronics and Automation, 2006, pp. 1603–1607.
[18] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
[19] L. Busoniu, D. Ernst, B. De Schutter, and R. Babuška, "Continuous-state reinforcement learning with fuzzy approximation," IEEE Transactions on Systems, Man and Cybernetics, Part C, vol. 38, pp. 156–172, 2008.
