
Applied Soft Computing 46 (2016) 187–203

Contents lists available at ScienceDirect

Applied Soft Computing


journal homepage: www.elsevier.com/locate/asoc

Parallel extremal optimization in processor load balancing for distributed applications

Ivanoe De Falco a, Eryk Laskowski b,∗, Richard Olejnik c, Umberto Scafuri a, Ernesto Tarantino a, Marek Tudruj b,d

a Institute of High Performance Computing and Networking, CNR, Naples, Italy
b Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
c University of Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France
d Polish-Japanese Academy of Information Technology, Warsaw, Poland

a r t i c l e   i n f o

Article history:
Received 15 July 2015
Received in revised form 25 April 2016
Accepted 26 April 2016
Available online 6 May 2016
Keywords:
Distributed programs
Load balancing
Extremal optimization

a b s t r a c t

The paper concerns parallel methods for extremal optimization (EO) applied in processor load balancing in execution of distributed programs. In these methods EO algorithms detect an optimized strategy of task migration leading to reduction of program execution time. We use an improved EO algorithm with guided state changes (EO-GS) that provides parallel search for the next solution state during solution improvement, based on some knowledge of the problem. The search is based on two-step stochastic selection using two fitness functions which account for computation and communication assessment of migration targets. Based on the improved EO-GS approach we propose and evaluate several versions of the parallelization methods of EO algorithms in the context of processor load balancing. Some of them use the crossover operation known in genetic algorithms. The quality of the proposed algorithms is evaluated by experiments with simulated load balancing in execution of distributed programs represented as macro data flow graphs. Load balancing based on so parallelized improved EO provides better convergence of the algorithm, a smaller number of task migrations to be done and reduced execution time of applications.

© 2016 Elsevier B.V. All rights reserved.

1. Introduction

Dynamic load balancing in parallel and distributed systems is a very important problem of computer engineering. It has accumulated a very rich bibliography, too numerous to be detailed in this paper, including a large number of survey papers [1–8].

Load balancing has been recently supported by many nature-inspired optimization methods, for which some representative papers are outlined in Section 2. Among nature-inspired methods for load balancing, no attention has been paid so far to extremal optimization (EO) [9], which is a fairly new optimization

This paper is an extended, improved version of the paper "Parallel Extremal Optimization with Guided State Changes Applied to Load Balancing" presented at EvoCOMNET 2015 and published in: Applications of Evolutionary Computing, Proc. of 18th European Conference, EvoApplications 2015, Copenhagen, Denmark, April 8–10, 2015, LNCS Vol. 9028, pp. 79–90, Springer 2015.
∗ Corresponding author. Tel.: +48 223800517.
E-mail addresses: ivanoe.defalco@na.icar.cnr.it (I. De Falco), laskowsk@ipipan.waw.pl (E. Laskowski), richard.olejnik@li.fr (R. Olejnik), umberto.scafuri@na.icar.cnr.it (U. Scafuri), ernesto.tarantino@na.icar.cnr.it (E. Tarantino), tudruj@ipipan.waw.pl (M. Tudruj).
http://dx.doi.org/10.1016/j.asoc.2016.04.033
1568-4946/© 2016 Elsevier B.V. All rights reserved.

technique with very interesting properties. The main elements of this technique are stochastic improvements of the worst components of a problem's solution, represented in a way similar to a chromosome in a genetic algorithm. EO follows the approach of self-organized criticality [10], which means that the quality of a problem solution improves after a number of iterative improvements due to a global auto-breeding effect in response to actions on the solution's worst components. Very important features of EO are its low operational and memory complexities, which make it a good candidate for on-line dynamic load balancing.

The idea of modifying the worst solution elements is not unique to EO; it is also present in the shuffled frog leaping algorithm (SFLA) [11]. However, the complexity of solution-improving actions is lower in EO than in SFLA. This is because SFLA is based on a population of solution representations, structured into groups and subgroups, in which sets of worst components are modified in each iteration. In EO, instead, just one solution individual exists at each generation, and only one of its components is modified at that generation. Another difference is that in EO a global fitness function and a local fitness function exist, whereas in SFLA fitness is computed for frogs (solutions) only, not for their components. Also, the mechanisms for acceptance of new individuals are different.


Classical mathematical programming approaches can be used to tackle load balancing tasks. Given the non-linearity of the problem, non-linear programming procedures should be used. They have the advantage of being able to quickly detect optima. Their drawback is that many of the nonlinear-programming solution procedures developed so far do not solve the problem of optimizing a function in its generality; rather, this can be achieved in a set of special cases [12]. In other words, most general-purpose nonlinear programming procedures are near-sighted and can do no better than determine local maxima, apart from some special cases, such as when the function under evaluation is convex [12].
In our previous papers [13–15], we have shown how to use sequential EO algorithms to support load balancing of processors in execution of distributed programs. In these papers, we have additionally modified the EO algorithm to improve its convergence and to better deal with the time constraints posed by this real-time problem for larger load balancing problems, for which we had noticed unsatisfactory EO behaviour. In the modified EO, we have replaced the fully random selection of a processor as the target of task migration in load balancing actions by a stochastic selection in which the probability used in the selection mechanism is guided by some knowledge of the problem. This constitutes the essence of the EO algorithm with guided state changes (EO-GS). The guidance is based on a formula which examines how well a migrated task fits a given target processor in terms of the global computational balance in the system and the processor communication loads. In [16], which is the basis of this publication, we have presented an initial approach to parallelization of EO. The algorithm was evaluated by simulation experiments in the discrete event simulation (DEVS) model [17]. The experiments have assessed the new algorithm against different parameters of the application program graphs and the kinds of load balancing algorithms.
In the current paper we are interested in parallelized versions of the EO-based processor load balancing algorithms applied in execution of distributed programs, thus extending our previous results from [16]. Parallelization is an interesting aspect of EO, since it additionally adapts this technique towards on-line dynamic optimization. In this respect, this research direction is fully convergent with this technique's low complexity. Generally speaking, parallelization of EO can be considered using two methods. The first method is to intensify the actions aiming at a possibly stronger improvement of the current EO solution, frequently with the introduction of a population-based representation or a multipoint strategy during solution improvement. It can be done using a really parallel system, or by modelling concurrency in a sequential system in which some components of an EO solution are identified based on multipoint selection and improved in a possibly concurrent way. The second method consists in using a population-based approach for solutions with parallel component improvement. Both approaches have already accumulated some non-negligible bibliography (see Section 2). In the mentioned papers, different parallel versions of EO were proposed based on the population-based and distributed approaches, including the island model. However, in these papers EO has not been applied in the context of load balancing, and they do not address the use of problem knowledge-guided search viewed from a theoretical point of view.
In this paper, we use a population-based EO approach in solving a processor load balancing problem for programs represented as layered graphs of tasks. In our approach, we first identify load imbalance in the functioning of the executive system. Then, we apply a parallel (population-based) EO-GS algorithm to select tasks which are to be migrated among processors to improve the general balance of processor loads. The EO-GS algorithm is performed a given number of times in the background of the application program to find a number of best logical migrations of tasks. When the iterations are over, the physical migrations worked out by EO-GS take place. In the parallel EO-GS algorithm we define and use an additional local fitness function as a base for a stochastic selection of the best solution state in the neighbourhood of the one chosen for improvement. Additionally, we verify the use of the crossover operation (known in genetic algorithms) in the algorithm steps leading to selection of the next solution state used in further EO iterations. Performance features of the proposed approach have been assessed by experiments performed with load balancing of applications represented by layered graphs. The experiments include mutual comparisons of the proposed variants of EO-based algorithms and tests of the algorithms related to scalability.
The paper is organized as follows: Section 2 surveys and discusses the state of the art in the domain of EO-based parallel optimization methods. Section 3 presents EO principles, including the guided state changes. Section 4 reports theoretical foundations for the load balancing method based on EO that we propose. Section 5 outlines the way in which we parallelize EO algorithms. Section 6 shows the experimental results which evaluate the proposed approach. Section 7 compares the proposed algorithms using a classical statistical technique based on non-parametric statistical tests.

2. State of the art


Processor load balancing is one of the most important research domains in the methodology of parallel and distributed systems. The number of papers covering this domain is beyond the possibility to be cited and discussed here, so we identify only leading relevant survey papers. Good surveys and classifications of general load balancing methods are presented in [1–3]. In [4] a survey on energy-efficient load balancing algorithms for multicore processors is presented. Surveys of load balancing techniques for application in cloud computing can be found in [6,7]. In our paper we address the application of a specific nature-inspired algorithm to load balancing. Nature-inspired algorithms applied to load balancing, including genetic algorithms [18–21], simulated annealing [22], swarm intelligence methods [23,24], ant colonies [25–27] and similar, have received attention in many papers. Good surveys of this subject can be found in [8,15]. Among relevant earlier papers enumerated in the surveys we have not spotted any reports on research on the application of EO to processor load balancing. The only papers on this subject are [13–15]; however, they propose only sequential EO algorithms for these purposes.
Our current paper concerns parallel EO-based methods, hence in the rest of this section we will discuss works done so far on EO algorithms suitable for parallel implementation, presenting a new methodology which has already accumulated some non-negligible bibliography. In [28] the authors propose an extended EO model called population-based EO, in which a single EO solution is replaced by a set of EO solutions which are improved using the general EO strategy. The set of these solutions is subject to parallel component selection and mutation, which provides a number of solution vectors next processed in parallel.
In [29] population-based EO was applied to solving numerical constrained optimization problems. This was done on a set of six benchmark problems in the domain of constrained nonlinear programming. Both coarse- and fine-grain search methods were tested. The local fitness function evaluated for the components of a solution was the mutation cost, measured as the sum of the deviation of the new solution after mutation from the currently known best solution plus the sum of penalties due to constraint violations. The experiments have shown a competitive performance of the proposed approach in comparison to other state-of-the-art approaches.


In [30] a population-based EO was applied to multi-objective optimization. The paper presents a Pareto-based EO algorithm, in which the local fitness assigned to a component of a solution is based on Pareto domination with respect to the set of assumed objectives. The algorithm maintains an archive of non-dominated solutions which contains the searched Pareto-optimal set of solutions, additionally used in comparisons for solution selection for the next EO cycles. First, the algorithm generates an initial population of solutions. For each solution in this population and for each component in each solution, an offspring solution is generated by mutation. For all offspring of each initial solution, a component ranking based on Pareto dominance is defined, in which the rank of a component is defined by the number of solutions its derivative solution is dominated by. The fitness function of a component in a parent solution is the number of solutions the offspring bound to the component are dominated by. The component related to a non-dominated solution gets the lowest fitness and a rank equal to zero. Such a component is selected for the mutation to produce a solution which replaces the parent solution in the population and is stored in the archive. In case of a tie between the zero-value components, the selection is governed by a crowding-distance metric which evaluates the density of the neighbouring offspring solutions in the space of problem objectives. When the capacity of the archive is exceeded, its contents are reduced using the crowding-distance metric.
In [31–34] the authors propose a rich set of EO improvements used for optimization of problems in molecular biology. The modified EO (MEO) algorithm starts with a single current solution. The modification consists in the generation of a set of neighbour solutions based on improvement (random mutation) of the current solution components chosen by roulette-wheel selection with probability proportional to a local fitness function. The best solution is selected among the multiple thus-formed new solutions with respect to the global fitness function. It becomes the new current solution if its global fitness is better than the current one. Otherwise a next MEO round is executed based on the same current solution, if the total number of iterations has not been met. When the assumed number of MEO iterations is completed, the registered best solution becomes the outcome of the MEO algorithm.
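As an illustration, one MEO round as described above can be sketched in Python. This is only a minimal sketch under our own assumptions: the fitness functions below are hypothetical stand-ins (not the molecular-biology objectives of [31–34]), and all function names are ours.

```python
import random

# Sketch of one MEO round: components with larger (worse) local fitness
# are more likely to be mutated; the best of the generated neighbours
# replaces the current solution only if its global fitness improves.

def global_fitness(solution):
    # Hypothetical global fitness, to be minimized.
    return sum(solution)

def local_fitness(solution, i):
    # Hypothetical per-component fitness: larger means worse here.
    return solution[i]

def roulette_pick(weights):
    # Roulette-wheel selection: index i is chosen with probability
    # proportional to weights[i].
    total = sum(weights)
    r = random.uniform(0, total)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1

def meo_round(solution, n_neighbours=5, value_range=(0, 9)):
    weights = [local_fitness(solution, i) for i in range(len(solution))]
    neighbours = []
    for _ in range(n_neighbours):
        i = roulette_pick(weights)
        neighbour = list(solution)
        neighbour[i] = random.randint(*value_range)  # random mutation
        neighbours.append(neighbour)
    best = min(neighbours, key=global_fitness)
    # Accept the best neighbour only if the global fitness improves.
    return best if global_fitness(best) < global_fitness(solution) else solution
```

Repeating such rounds and keeping the best solution seen so far yields the overall MEO outcome described above.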
The population-based modified EO (PMEO) discussed in the papers mentioned above combines a population-based approach to solution generation with the MEO approach to the selection of the best solution for further improvements. PMEO starts with a number (population) of current solutions, usually generated at random. All members of a population are improved using the MEO approach, which performs random mutation of sets of components chosen by roulette-wheel selection based on a local fitness function. The generated solutions copy a substructure of the solution that behaves well in the solution improvement, which is a form of injection of best solution features into the processed population. When the assumed number of PMEO iterations is completed, the best solution out of all best ones produced in all populations is returned as the result of the PMEO algorithm.
The third approach identified in the papers mentioned above is the distributed modified EO (DMEO), which is a combination of the PMEO approach and the distributed genetic algorithms methodology. DMEO starts with an initial family of solution sub-populations generated at random. They are next distributed into distinct islands of component improvements. The islands evolve using the PMEO method with execution of a number of mutations. After a number of mutation generations inside the islands are performed, peer-to-peer transfers of solution individuals between the islands are executed. Each island improves a set of solutions to find a best island solution. When the assumed number of DMEO iterations is completed, the best solution out of all registered in all islands becomes the outcome of the DMEO algorithm.


In [35,36] an island-model-based DMEO combined with tabu search was proposed and applied to molecular biology. From a general point of view, this algorithm behaves in a similar way to the standard DMEO algorithm and is composed of iterations of the MEO algorithm executed on solutions in distributed sub-populations. However, at the level of the internal MEO algorithms, tabu lists of already generated offspring solutions are maintained and updated. A solution can be introduced into a set of candidates for further improvement only if it is not on the tabu list. The proposed model showed better performance than the standard DMEO algorithm. In [37–39] optimal protein structure alignment was elaborated using MEO and DMEO approaches similar to those explained above.
To conclude the overview of the methods cited above, including the modified, population-based and distributed EO, we can say that we have not found any EO-based methods oriented towards processor load balancing for execution of distributed applications. In the analyzed algorithms, the component selection for improvements was univocally governed by local fitness functions expressing some problem properties. In the mutation of the selected solution components, a stochastic method of fully random replacement was mostly applied. In some cases, the change of the selected component was governed by inspection of some relations between possible values of the mutated components. No clearly visible interest was observed in a more elaborate analysis of the properties of the so-generated solutions with the use of additional fitness functions as a basis of the component mutation definition. In some of the population-based EO approaches, periodic migrations of solutions between sub-populations were applied. No direct use of the support of the crossover known in typical genetic algorithms was proposed nor assessed in selection and mutation operations in EO. In the cited papers no attention has been paid to the characteristics of the executive systems on which the population-based and distributed modified EO algorithms were executed. It is unclear if the algorithms were performed on a single-core or a multi-core processor or a multiprocessor cluster. Very rarely are any measurements given on the execution time of the EO algorithms, which testifies to weak interest so far in using EO for on-line or real-time optimization, especially in parallel computing environments.
This paper is an extended version of our initial paper presented at the EvoComnet 2015 conference [16]. Compared to the initial paper, the current version has been extended in several aspects. The number of discussed EO parallel algorithms applied for load balancing was increased from two in the initial paper to six in the current version. This includes the addition of three algorithms which apply crossover operations on EO solutions, and also selection based on the previous iterations' history in the algorithm phase where the starting solutions for the next EO iteration are selected or generated. The discussed algorithms have been experimentally assessed by experiments with simulated load balancing of execution of synthetic application graphs. Statistical comparisons of the behaviour of the six EO algorithms, in terms of the obtained application speedup and the migration cost due to load balancing based on EO algorithms, have been accomplished. Compared to the EO parallel algorithms earlier proposed in the literature and discussed in this state-of-the-art review, our proposed new algorithms partially extend the population-based EO, the modified EO (MEO) and the population-based modified EO (PMEO) by the introduction of the guided state changes approach in the parallel branches (inner loops) of the algorithm and of the solution crossover in the phase of defining the starting solutions for the next iterations of the outer loops of the algorithms. Partial development means that we have introduced new algorithmic elements at selected points of the algorithm (component mutation, starting solution set definition for next EO generations) while preserving the general population-based style of the algorithms.


3. Extremal optimization with guided state changes

Before we go to parallel versions of the EO algorithms, we will recall the basics of the sequential EO-GS in its general form. In classic sequential EO algorithms we use iterative updates of a single solution S built of a number of components s_i, which are variables of the problem. For each component, a value φ_i of a local fitness function φ is evaluated to select the worst variable s_w in the solution. In a generic EO, S is modified at each iteration step by randomly updating the worst variable. As a result, a solution S′ is created which belongs to the neighbourhood Neigh(S, s_w) of S. For S′ the global fitness function Φ(S′) is evaluated, which assesses the quality of S′. The new solution S′ replaces S if its global fitness is better than that of S. We can avoid staying in a local optimum in such EO by using the probabilistic version τ-EO proposed in [9]. It is based on a user-defined parameter τ, used in the stochastic selection of the updated component. In a minimization problem solved by τ-EO, the solution components are first assigned ranks k, 1 ≤ k ≤ n, where n is the number of the components, consistently with the increasing order of their local fitness values. This is done by a permutation π of the component labels i such that: φ_π(1) ≤ φ_π(2) ≤ ... ≤ φ_π(n). The worst component s_π(1) is of rank 1, while the best one is of rank n. Then, the component selection probability p_k over the ranks k is defined as proportional to k^(−τ), for a given value of the parameter τ. At each iteration, a component rank k is selected in the current solution S according to p_k, using a roulette-wheel method. Next, the respective component s_j with j = π(k) randomly changes its state and S moves to a neighbouring solution, S′ ∈ Neigh(S, s_j), unconditionally. The control parameters of τ-EO are: the total number of iterations N_iter and the probabilistic selection parameter τ.
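The rank-based roulette-wheel selection of τ-EO described above can be sketched as follows. This is a minimal Python illustration under our own naming (the function and parameter names are not taken from [9]); it assumes lower local fitness means worse, as in the ranking above.

```python
import random

# tau-EO component selection: rank k is drawn with probability
# proportional to k**(-tau); rank 1 corresponds to the worst component.

def pick_component(local_fitnesses, tau=1.5):
    n = len(local_fitnesses)
    # Permutation pi: pi[k-1] is the index of the component of rank k
    # (increasing order of local fitness, worst first).
    pi = sorted(range(n), key=lambda i: local_fitnesses[i])
    weights = [k ** (-tau) for k in range(1, n + 1)]
    total = sum(weights)
    r = random.uniform(0, total)
    acc = 0.0
    for k, w in enumerate(weights, start=1):
        acc += w
        if r <= acc:
            return pi[k - 1]
    return pi[-1]
```

Larger values of τ bias the selection more strongly towards the worst-ranked component, while τ → 0 approaches uniform selection.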
Algorithm 1. EO algorithm with guided state changes (EO-GS)

initialize configuration S at will
S_best ← S
while total number of iterations N_iter not reached do
    evaluate φ_i for each variable s_i of the current solution S
    rank the variables s_i based on their local fitness φ_i
    choose the rank k according to k^(−τ) so that the variable s_j with j = π(k) is selected
    evaluate ω_s for each neighbour S_v ∈ Neigh(S, s_j), generated by changing s_j in the current solution S
    rank neighbours S_v ∈ Neigh(S, s_j) based on the target function ω_s
    choose S′ ∈ Neigh(S, s_j) according to the exponential distribution
    accept S ← S′ unconditionally
    if Φ(S) < Φ(S_best) then
        S_best ← S
    end if
end while
return S_best and Φ(S_best)

To improve the convergence rate of τ-EO, we have proposed EO-GS [14,15]. In this approach, some knowledge of the problem properties is used for the next solution selection in consecutive EO iterations, with the help of an additional local target function ω_s. This function is evaluated for all neighbour solutions existing in Neigh(S, s_π(k)) for the selected rank k. Then, the neighbour solutions are sorted and assigned GS-ranks g with the use of the function ω_s. The new state S′ ∈ Neigh(S, s_π(k)) is selected in a stochastic way using a roulette-wheel method based on the exponential distribution, with the selection probability p ∼ Exp(g, λ) = λe^(−λg). Due to this, better neighbour solutions are more likely to be selected. The bias towards better neighbours is controlled by the λ parameter. The general scheme of the discussed EO-GS approach is shown as Algorithm 1. In this scheme the formulae for the local and global fitness functions (φ_i and Φ(S)), the local target function ω_s, as well as the way in which the neighbourhood of a solution S is determined and the values of τ and λ, depend on the optimized problem.
This choice has been made because of the time constraints posed by the problem we wish to face. In fact, our EO for load balancing should be run at set time instants during the execution of the distributed application, so its execution should last as short a time as possible. This leads to our decision of not using a totally random-based replacement strategy, but rather a best-based one. This approach of exploiting best-based search strategies is quite typical in problems that have to be faced in a quasi-real-time way, as this choice often leads to a reduction in the time needed to find a solution of acceptable quality; see for example [40,41]. Actually, the mechanism for the choice of the neighbour we introduce here, and hence the resulting strategy, are not strictly deterministic in the use of the best; rather, they are probabilistic, as they tend to favour the best neighbouring solutions, yet bad ones could also be selected, although with a low probability.
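The guided state change step, i.e. the exponential-distribution roulette over GS-ranks, can be sketched as follows. This is a simplified illustration under our own assumptions: the target function omega and all names here are hypothetical stand-ins for the problem-dependent ω_s.

```python
import math
import random

# Guided state change: neighbours are ranked by the target function
# (best first, GS-ranks g = 1, 2, ...) and one is drawn with probability
# proportional to exp(-lam * g), so better neighbours are favoured but
# worse ones can still be selected with low probability.

def choose_neighbour(neighbours, omega, lam=0.5):
    ranked = sorted(neighbours, key=omega)  # best (lowest omega) first
    weights = [math.exp(-lam * g) for g in range(1, len(ranked) + 1)]
    total = sum(weights)
    r = random.uniform(0, total)
    acc = 0.0
    for sol, w in zip(ranked, weights):
        acc += w
        if r <= acc:
            return sol
    return ranked[-1]
```

Increasing lam sharpens the bias towards the best-ranked neighbour, mirroring the role of the λ parameter above.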
It should be noted here that the EO term "component" is similar in meaning to the term "subcomponent" that is used in cooperative coevolution [42,43], and that the "local fitness function" term employed here is similar to the subcomponent evaluation utilized there. In fact, in both cases attention is paid to evolving solutions to complex problems in the form of interacting coadapted subcomponents that should emerge rather than being hand designed. As noted in [42]: "If a problem can be decomposed into subcomponents without interdependencies, clearly each can be evolved without regard to the others. Unfortunately, many problems can only be decomposed into subcomponents exhibiting complex interdependencies. The effect of changing one of these interdependent subcomponents is sometimes described as a deforming or warping of the fitness landscapes associated with each of the other interdependent subcomponents." In the problem we face within this paper, exactly as in [42], subcomponents are interdependent, evolution aims at improving subcomponents, and our local fitness function performs the evaluation of what is called a subcomponent in [42].
4. Load balancing based on the EO approach

In this section we will recall the basic theoretical foundations for the proposed EO-based load balancing. The proposed load balancing method is meant for a cluster of multicore processors interconnected by a message passing network. Load balancing actions for a program are controlled at the level of indivisible tasks, which are process threads.

We assume that the load balancing algorithms dynamically control the assignment of program tasks t_k, k ∈ {1 ... |T|}, to processors (computing nodes) n, n ∈ {0, 1, ..., |N| − 1}, where T and N are the sets of all the tasks and the computing nodes, respectively. The goal is the minimal total program execution time, achieved by task migration between processors. The load balancing method is based on a series of steps in which detection and correction of processor load imbalance is done, Fig. 1. The imbalance detection relies on some run-time infrastructure which observes the state of the executive computer system and the execution states of application programs. Processors (computing nodes) periodically report their current loads to the load balancing control, which monitors the current system load imbalance. When load imbalance is discovered, processor load correction actions are launched. For them an EO-GS algorithm is executed that identifies the tasks which need migration and the processor nodes which will be migration targets. Following this, the required physical task migrations are performed, with a return to the load imbalance detection.

In the solutions published in [13,14], the applied EO-GS algorithm was of a sequential character (see Section 3). In Section 5 of this paper we describe several parallel versions of EO-GS whose applications were studied in the context of load balancing. Their experimental evaluations are described in Section 6.
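The detection/correction cycle of Fig. 1 can be sketched as follows. This is only a high-level illustration under our own assumptions: monitoring, the EO-GS optimizer and the migration mechanism are stubbed, and all names are hypothetical.

```python
# Sketch of one detection/correction cycle of the load balancing control:
# nodes report CPU availability, imbalance is tested against a threshold,
# and if it holds, EO-GS computes logical migrations that are then applied.

def imbalance_detected(time_cpu, threshold=0.5):
    # Imbalance holds when the spread of CPU availability between the
    # most and least loaded nodes reaches the threshold (cf. inequality (1)).
    return max(time_cpu.values()) - min(time_cpu.values()) >= threshold

def load_balancing_cycle(monitor, eo_gs, migrate, threshold=0.5):
    time_cpu = monitor()              # periodic per-node load reports
    if imbalance_detected(time_cpu, threshold):
        migrations = eo_gs(time_cpu)  # best logical task migrations
        migrate(migrations)           # physical task migrations
```

In the real system this cycle runs periodically in the background of the application; here a single invocation is shown.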
To evaluate the load of the system two indicators are used.
The rst is a function returning the computing power of a node

I. De Falco et al. / Applied Soft Computing 46 (2016) 187203

191

tasks placed on different nodes. This function is normalized in the


range [0, 1]. In executive systems with homogeneous communication links it is a quotient of an absolute value of the total external
communication volume and the total communication volume of
all communications (when all tasks are placed on the same node
attrexttotal(S) = 0, when tasks are placed in the way that all communication is external attrexttotal(S) = 1); in heterogeneous executive
systems equivalent measures of the communication time are used:
attrexttotal(S) = totalext(S)/CT
where CT =

s,d T com(s,

d) and totalext(S) =


s,d T :s =
/ d

(3)
com

(s, d).
The function migration(S) is a migration costs metrics. The
value of this function is in the range [0, 1], i.e., it is equal to 0
when there is no migration, when all tasks have to be migrated
migration(S) = 1, otherwise 0 migration(S) 1:
migration(S) = |{t T : St =
/ S
t }|/|T |
Fig. 1. The general scheme of load balancing based on EO with guided state changes.

n: powerCPU(n), which is the sum of the potential computing powers of all the active cores on the node. The second is the percentage of the CPU power available for application threads on the node n: timeCPU(n), periodically estimated on computing nodes. The percentage of the CPU power available for a single thread is computed as the quotient of the time during which the CPU was allocated to a probe thread and the time interval of the measurement. The timeCPU(n) value is the sum of the percentages of CPU power available for a number of probe threads equal to the number of cores on the node.

System load imbalance I is a Boolean defined based on the difference of the CPU availability between the currently most heavily and the least heavily loaded computing nodes:

I = true, if max_{n∈N}(timeCPU(n)) − min_{n∈N}(timeCPU(n)) ≥ A; false, otherwise   (1)

The load imbalance equal to true requires a load correction. The value of A is set using an experimental approach (during experiments we set it between 25% and 75%).

An application is characterized by two programmer-supplied parameters based on the volume of computations and communication related to program tasks: com(ts, td) is a communication metrics related to a pair of tasks ts and td, and wp(t) is a computational load metrics introduced by a task t. The com(ts, td) and wp(t) metrics can provide exact values, e.g. for well-defined task sizes and inter-task communication in regular parallel applications, or only some predictions, e.g. when the execution time depends on the processed data.

A task mapping solution S is represented by a vector μ = (μ1, ..., μ|T|) of |T| integers ranging in the interval {0, 1, ..., |N| − 1}. μi = j means that the solution S under consideration maps the i-th task ti onto the computing node j. We assume that in S each task ti is assigned to only one computing node.

4.1. The global fitness function

The global fitness function Φ(S) is defined as follows:

Φ(S) = attrexttotal(S)·Δ1 + migration(S)·Δ2 + imbalance(S)·[1 − (Δ1 + Δ2)]   (2)

where 1 > Δ1 ≥ 0, 1 > Δ2 ≥ 0 and Δ1 + Δ2 < 1 hold.

The function attrexttotal(S) represents the impact of the total external communication between tasks on the quality of a given mapping S. By external we mean the communication between tasks placed on different computing nodes. The function migration(S) accounts for the cost of task migrations, where S is the currently considered solution and S∗ is the previous solution (or the initial solution in the algorithm).

The function imbalance(S) represents the numerical computational load imbalance metrics in the solution S. It is equal to 1 when in S there exists at least one unloaded (empty) computing node; otherwise it is equal to the normalized average absolute computational load deviation of tasks in S, determined by the definition below:

imbalance(S) = 1, if in S there exists at least one unloaded node; deviation(S)/(2·|N|·WT), otherwise   (5)

where

deviation(S) = Σ_{n=0,...,|N|−1} |nwp(S, n)/powerCPU(n) − WT|,
WT = Σ_{t∈T} wp(t) / Σ_{n=0,...,|N|−1} powerCPU(n),
nwp(S, n) = Σ_{t∈T: μt=n} wp(t).

4.2. The local fitness function

The local fitness (or per-component fitness) φ(t) of a task t in a solution S is defined using the formula (6) below, in such a way that it forces moving tasks away from overloaded nodes, at the same time preserving low external (inter-node) communication. A component of a solution is a pair task–processor, such that the index of the task is equal to the index of the component in the solution vector. The local fitness function of the component t is computed assuming that the task t is assigned to the processor μt. The parameter β (0 < β < 1) allows tuning the weight of the computational load metrics against the communication metrics concerned with a component t (task t assigned to processor μt):

φ(t) = β·load(μt) + (1 − β)·rank(t)   (6)

The function load(n) indicates how much the computational load of node n, which executes the task t, exceeds the average computational load of all nodes. It is normalized versus the heaviest computational load among all the nodes. The rank(t) function governs the selection of the best candidates for migration. A better chance for migration have those tasks which show low communication with their current node (attraction) and low computational load deviation from the average load:

rank(t) = 1 − (γ·attr(t) + (1 − γ)·ldev(t))   (7)

where γ is a parameter indicating the importance of the weight of the attraction metrics, 0 < γ < 1. The attraction attr(t) of the task t to the computing node to which t is assigned is defined as the amount of communication between task t and other tasks on the same node, normalized versus the maximal communication metrics inside the node. The computational load deviation compared to the average load, ldev(t), is defined as the absolute value of the difference between the computational load metrics of the task t and the minimum such load on the node, normalized versus the highest such difference for all tasks on the node.

As we have seen, in order to assess the performance of a problem variable, EO requires the definition of a per-variable fitness function called the local fitness. The relationship of such a local fitness to the global fitness is straightforward if there is a linear relationship between them. Actually, the problem we face in this paper is not linearly separable, yet this does not prevent us from defining and using a local fitness function for each of the problem variables, see formula (6). In fact, the inventors of EO, i.e. S. Boettcher and A. Percus, in their paper [44] state that "The cost C(S) is assumed to be a linear function of the fitness λi assigned to each variable xi (although that is not essential [...])". As examples of the fact that linearity is not required, they cite another paper of theirs [45] in which they report results achieved by using EO on problems that are not linearly separable. Also other authors have used EO to face nonlinear problems, e.g. in pattern recognition [46].
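The metrics above can be sketched in Python as follows. This is an illustrative sketch, not the authors' Java implementation: the data layout (the mapping vector mu and the wp and powerCPU metrics held in plain lists) and all function names are assumptions of this example, and the attrexttotal and migration terms are passed in as precomputed values.

```python
def wt(wp, power_cpu):
    """WT: total task weight per unit of total computing power."""
    return sum(wp) / sum(power_cpu)

def nwp(mu, wp, node):
    """nwp(S, n): summed computational weight of the tasks mapped on `node`."""
    return sum(w for t, w in enumerate(wp) if mu[t] == node)

def imbalance(mu, wp, power_cpu):
    """Formula (5): 1 if some node is unloaded, else the normalized deviation."""
    n_nodes = len(power_cpu)
    if any(nwp(mu, wp, n) == 0 for n in range(n_nodes)):
        return 1.0
    w_t = wt(wp, power_cpu)
    deviation = sum(abs(nwp(mu, wp, n) / power_cpu[n] - w_t)
                    for n in range(n_nodes))
    return deviation / (2 * n_nodes * w_t)

def global_fitness(mu, wp, power_cpu, attrexttotal, migration,
                   delta1=0.13, delta2=0.17):
    """Formula (2), with the Delta weights selected in Table 1."""
    return (attrexttotal * delta1 + migration * delta2
            + imbalance(mu, wp, power_cpu) * (1 - (delta1 + delta2)))
```

For instance, for two equally powerful nodes and the mapping μ = (0, 0, 0, 1) of four unit-weight tasks, imbalance returns 0.25, while a perfectly balanced mapping yields 0.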
4.3. The selection of the target node
We use the EO-GS algorithm to perform task and target node selection for migration. Target node selection is based on an additional biased stochastic approach, to favour some solutions over others. In our case, the valid solution state neighbourhood includes the use of all system nodes. Therefore, at each update of rank k, all nodes n = 0, 1, ..., |N| − 1 are sorted using the θ(n1, n2) function, n1, n2 = 0, 1, ..., |N| − 1, with the assignment of GS-ranks g to them. Then, one of the nodes is selected using the exponential distribution Exp(g, λ) = λ·e^(−λg).

We propose the following definition of θ(n1, n2) for the sorting algorithm, based on a pairwise ordering of the computing nodes n1, n2 as targets (hosts) for the migration of task j in the load balancing algorithm. It takes into account the load deviations loaddev(n) of the target nodes n from the current average processor load in the executive system and the communicational attraction attrext(j, n) of the migration candidate task j to the tasks at each of the target nodes. First, the current load deviations of the computing nodes are examined to set the order in a node pair (the less charged target computing node has a higher priority). In the case the load deviations of the target processor nodes in a pair are the same, the communicational attractions of the migrated task to the tasks placed on the target computing nodes are examined (the node with a higher attraction has priority, since it will eliminate the usually slow external data transmissions through the interprocessor network).


θ(n1, n2) = sgn(loaddev(n1) − loaddev(n2)), when loaddev(n1) ≠ loaddev(n2); sgn(attrext(j, n2) − attrext(j, n1)), otherwise   (8)

where

loaddev(n) = nwp(S, n)/powerCPU(n) − WT   (9)

attrext(j, n) = Σ_{e∈Tn} (com(e, j) + com(j, e))   (10)

and Tn = {t ∈ T : μt = n} is the set of threads placed on computing node n.
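The pairwise ordering (8) and the exponentially biased choice of the GS-rank can be sketched as follows. This is an illustrative Python sketch under the same assumed data layout as before (com is assumed to be a |T|×|T| matrix of the communication metrics); it is not the authors' implementation.

```python
import math
import random
from functools import cmp_to_key

def loaddev(node, mu, wp, power_cpu):
    """Formula (9): deviation of the node load from the average load WT."""
    w_t = sum(wp) / sum(power_cpu)
    load = sum(w for t, w in enumerate(wp) if mu[t] == node)
    return load / power_cpu[node] - w_t

def attrext(task, node, mu, com):
    """Formula (10): communication of `task` with the tasks placed on `node`."""
    return sum(com[e][task] + com[task][e]
               for e in range(len(mu)) if mu[e] == node)

def theta(n1, n2, task, mu, wp, power_cpu, com):
    """Formula (8): less loaded node first; on a tie, higher attraction first."""
    d1 = loaddev(n1, mu, wp, power_cpu)
    d2 = loaddev(n2, mu, wp, power_cpu)
    if d1 != d2:
        return (d1 > d2) - (d1 < d2)          # sgn(d1 - d2)
    a1 = attrext(task, n1, mu, com)
    a2 = attrext(task, n2, mu, com)
    return (a2 > a1) - (a2 < a1)              # sgn(a2 - a1)

def select_target(task, mu, wp, power_cpu, com, lam=0.5, rng=random):
    """Sort nodes by theta, then draw a GS-rank g with weight exp(-lam * g)."""
    nodes = sorted(range(len(power_cpu)),
                   key=cmp_to_key(lambda a, b: theta(a, b, task, mu, wp,
                                                     power_cpu, com)))
    weights = [math.exp(-lam * g) for g in range(len(nodes))]
    return rng.choices(nodes, weights=weights)[0]
```

With λ = 0.5 (the value selected for EO-GS in Section 6), the node ranked first by θ is the most likely migration target, but worse-ranked nodes keep a non-zero selection probability.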
5. Parallel extremal optimization applied to load balancing
In this section we describe parallel versions of the EO algorithms
that have been proposed and studied in the research reported in this
paper. The different parallel versions have been integrated into one

general scheme which is presented in Fig. 2. The scheme begins with an initialization of the EO starting best solution based on the current loads of all computing nodes in the distributed application. Next, a parallel part of the scheme starts, which includes iterative execution of EO algorithms in parallel branches.

Table 1
The control parameters of EO algorithm variants and the range of their values.

Parameter   Range      Studied values                    Selected value
A           [0, 1]     0.25 ... 0.75                     0.5
β           [0, 1]     0.5                               0.5
γ           [0, 1]     0.5, 0.6, 0.75, 0.95              0.75
Δ1          [0, 1]     0.25, 0.18, 0.13, 0.05            0.13
Δ2          [0, 1]     0.25, 0.22, 0.17, 0.05            0.17
τ           [0, ∞]     0.75, 1.5, 3.0                    1.5
λ           [0, ∞]     0.08, 0.15, 0.5, 1.0, 2.0, 4.0    0.5
Niter       {1, ...}   30 ... 750                        512
|N|         {1, ...}   2, 4, 8, 16, 32                   -
P           {1, ...}   1, 2, 4, 8                        -

Fitness function   Output value range
Φ(S)               [0, 1]
φ(S)               [0, 1]
The algorithms are constructed using two nested loops: the inner and the outer ones, which are terminated in Fig. 2 by the "End of inner iterations" and "End of outer iterations" blocks, respectively. The inner loop body in Fig. 2 represents the EO-GS or classic EO algorithms which were used in this research, and corresponds to the main loop of EO in Algorithm 1. In the inner loops based on EO-GS, the local fitness function φ(sj) values are evaluated for all components sj of all solutions Sp processed in parallel branches of the algorithm, using the formula (6) in Section 4.2, and the ranking of these components based on φ(sj) is constructed in each parallel branch of the scheme. Next, a component of S for the improvement is stochastically selected based on the local fitness function in each branch, with the highest probability of selecting the worst component, in the way explained in Section 3 with the parameter values given in Table 1 in Section 6. Then, the component is improved using the guided state changes approach (EO-GS) explained in Sections 3 and 4.3, formulae (8)-(10). For the obtained new solution S′, the global fitness function Φ(S′) is evaluated using the formula (2).
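A single iteration of the inner loop described above can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation; it assumes that a larger local fitness value marks a worse component (consistent with formulas (6)-(7)), that the global fitness Φ is minimized, and it abstracts the guided selection of the target node as a pick_target callback.

```python
import random

def eo_step(mu, local_fitness, global_fitness, pick_target, best,
            tau=1.5, rng=random):
    # rank components from worst to best (assumption: larger value = worse)
    order = sorted(range(len(mu)), key=local_fitness, reverse=True)
    # stochastic selection: rank k is chosen with probability ~ k**(-tau)
    weights = [(k + 1) ** (-tau) for k in range(len(order))]
    task = rng.choices(order, weights=weights)[0]
    # guided state change: move the selected task to a new target node
    candidate = list(mu)
    candidate[task] = pick_target(task, mu)
    # record the improved solution if its global fitness beats the best one
    if global_fitness(candidate) < global_fitness(best):
        best = candidate
    return candidate, best
```

The returned candidate becomes the base solution of the next inner iteration in the branch, while best records the so-far best mapping known in that branch.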
The improved solution with a global fitness better than the current best value known in a branch is selected as the base for the next parallel EO inner loop iteration in this branch. When all the parallel inner loops are terminated, the solution with the best global fitness value among those gathered from all parallel branches is registered as the current best one found in the current iteration of
the outer loop. Next, the algorithm enters the solution exchange
phase, in which an initial starting EO solution or a set of starting
solutions from previous iterations are identified or computed for the next iteration of the outer loop of the algorithm. The identified
starting solution or solutions are next distributed among parallel
branches of the scheme in a way proper for the selected type of the
algorithm (denoted by a chosen algorithm label). Then, the next
EO algorithm outer loop iteration starts in parallel branches of the
scheme.
The general scheme of the algorithm shown in Fig. 2 depicts in
fact six parallel versions of the EO algorithm applied in this paper for
load balancing of distributed programs. They are denoted by labels
PEO-A, PEO-GS-A, PEO-GS-B, PEO-GS-C, PEO-GS-D, PEO-GS-E. They
differ in the way in which the starting EO solutions for population-based improvement are generated and distributed among parallel
branches of the scheme (outer loop) as well as in details of the EO
algorithms iteratively executed inside the parallel branches (inner
loops). Most of the studied algorithm versions (all except PEO-A)
are based on a parallelized execution of the EO-GS algorithm.


Fig. 2. The general scheme of the parallel version of the EO algorithm.

In the inner loops of the PEO-A version, the selected solution components are improved using a random selection of a new component value. For each improved solution, the global fitness function Φ(S) is used in the definition of the starting solution for the next outer loop iterations.
When the iterations in all P parallel EO branches are completed, we have five variants of the rules which govern the decision on which solutions will be improved in the inner loops during the subsequent iteration of the outer loop, i.e., in a series of iterations of EO in the P parallel branches of the algorithm. Three of them (PEO-GS-C, PEO-GS-D, PEO-GS-E) include a single-point crossover on so-far obtained solutions in stochastically selected points.

Variants PEO-A and PEO-GS-A (without crossover) select only one globally best solution Sbest_p produced during the previous outer loop iteration to be next distributed to the P parallel branches for parallel improvement. Variant PEO-GS-B (without crossover) selects the solution which was produced in the previous outer loop iteration by the branch which has shown the best average quality of solutions over all iterations. Variant PEO-GS-C finds and selects the crossover of Sbest_p with a stochastically selected other final solution from the preceding iterations. Variant PEO-GS-D finds crossovers of Sbest_p with P − 1 other solutions from the preceding iterations and selects the one for which the global fitness function is the best. Variant PEO-GS-E finds crossovers of Sbest_p with P − 1 other final solutions from the preceding iterations, but it selects for further improvement Sbest_p and the P − 1 best solutions in all pairs of generated crossovers with respect to the values of Φ(S). In all crossovers, two offspring solutions are generated, out of which the one (Sx) is selected for which Φ(Sx) is better.
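The crossover used in the exchange phase of the PEO-GS-C/-D/-E variants can be sketched as follows (an illustrative Python sketch; it assumes that the global fitness Φ is minimized, and the helper names are hypothetical):

```python
import random

def crossover_best(s_best, other, global_fitness, rng=random):
    """Single-point crossover of two mapping vectors; of the two offspring,
    the one with the better (lower) global fitness is returned."""
    point = rng.randrange(1, len(s_best))     # stochastically selected point
    child_a = s_best[:point] + other[point:]
    child_b = other[:point] + s_best[point:]
    return min(child_a, child_b, key=global_fitness)

def exchange_peo_gs_d(s_best, others, global_fitness, rng=random):
    """PEO-GS-D style exchange: cross s_best with the P-1 other solutions
    and keep the offspring with the best global fitness."""
    offspring = [crossover_best(s_best, o, global_fitness, rng)
                 for o in others]
    return min(offspring, key=global_fitness)
```

The PEO-GS-C and PEO-GS-E variants reuse the same crossover primitive, differing only in how many crossover partners are drawn and how many resulting solutions are kept.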


The following equation determines the relationship between the numbers of iterations of the inner and the outer loops, denoted by Ninner and Nouter respectively, and the total number of EO iterations over all parallel branches, Niter:

Niter = Ninner · Nouter · P

So, the sum of the inner iterations of a single parallel branch is equal to Niter/P (i.e., the total number of iterations divided by the number of parallel branches, executed on processors which are in fact the cores of the multicore processor we have used).
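The relation can be checked against the settings used in the experiments: for example, with P = 2 parallel branches, Nouter = 16 outer iterations of Ninner = 16 inner steps yield the 512-iteration budget reported in Section 6, and P = 8, Nouter = 4 and Ninner = 14 yield the 448 iterations reported in Table 2 for the PEO-GS-D/-E variants.

```python
def total_iterations(n_inner, n_outer, p):
    """Niter = Ninner * Nouter * P."""
    return n_inner * n_outer * p

# settings reported in Section 6 and Table 2
assert total_iterations(16, 16, 2) == 512
assert total_iterations(14, 4, 8) == 448
```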

6. Experimental results
The goal of the experiments was to compare the presented variants of parallel EO-GS to classic (sequential) EO and EO-GS. The
experimental results have been obtained by simulated execution
of application programs in a distributed system, namely in a cluster of computing nodes (processors) interconnected by a message
passing network. For this purpose, a discrete event based simulator was used, designed on top of the DEVS formalism (discrete event system specification) [17]. The processors executing programs, including their data communication, were modelled using the DEVS formalism. The experiments were performed using a cluster of Intel i7-based workstations (i7-2600 3.40 GHz CPUs with 8 hardware threads), under control of the Linux operating system.
The DEVS program execution simulator was running as a thread
on an Intel workstation. During experiments, in parallel with the
simulated distributed execution of an exemplary application graph,
a dynamic load balancing algorithm was performed based on the
EO approach. The load balancing algorithm was designed as a number of threads executed inside a load balancing controller which
was a DEVS module. Simulated computing nodes were periodically reporting their loads to the load balancing controller and then,
depending on the states of the system and the application, appropriate actions were undertaken including activation of EO-based
load balancing actions. Each load balancing experiment with a parallel version of EO algorithm was run in such a way that parallel
branches of the algorithm were executed as threads inside the DEVS
load balancing controller module. The EO parallel threads were running on separate cores of a 8-core workstation. The experiments
were executed in parallel on workstations of the cluster we used.
The reference sequential EO algorithms were run as single threads
executed on separate cores of workstations, inside the cluster of
workstations.
The DEVS-based simulator and the load balancing algorithms
including the EO approach were written in Java, with thread-based
parallelization for multicore machines. Source codes of the software used for the experiments can be made available to readers
upon request.
The assumed simulated model of program execution corresponds to parallelization based on message-passing, using the MPI
library for communication. The MPI mechanism is simulated using
DEVS, including communication contentions which are modelled
at the level of the network interface of each computing node. In our
experiments, we used exemplary application graphs, which were
randomly generated in such a way that program tasks were set in
a number of phases. Tasks in a phase could communicate. At the
boundaries between phases there was also a global exchange of
data (Fig. 3). To generate a random application, we set the number of phases, the number of tasks in each phase, the precedence
relationship between phases and the lower and the upper limit
for duration of each phase. Then, the tasks of each phase are created from code and communication blocks of uniformly distributed
random weight, or transfer time, respectively, again within some

Fig. 3. The general structure of exemplary applications.

specified range, until the expected duration time of the phase is reached.
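The generation procedure described above can be sketched as follows (an illustrative Python sketch; the parameter names and the uniform ranges are assumptions of this example):

```python
import random

def generate_phase(expected_duration, weight_range, rng):
    """Create task weights for one phase until the expected duration is reached."""
    tasks = []
    while sum(tasks) < expected_duration:
        tasks.append(rng.uniform(*weight_range))
    return tasks

def generate_application(n_phases, duration_range, weight_range, seed=0):
    """A list of phases, each being a list of task computation weights."""
    rng = random.Random(seed)
    return [generate_phase(rng.uniform(*duration_range), weight_range, rng)
            for _ in range(n_phases)]
```

Communication blocks between tasks of a phase would be drawn in the same uniform manner; they are omitted here for brevity.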
We used two sets of synthetic exemplary programs: the first one consisted of 8 programs with the number of tasks varying from
128 to 576, the second one consisted of 4 larger exemplary applications, built of 1024, 5120, 10,240 or 20,480 parallel tasks. The
number of phases in the programs varies from 1 to 40, the average
is about 10. The communication/computation ratio for applications was in the range [0.10, 0.20]. The rst program is a regular
application which has fixed task execution times. In the regular application, load imbalance can appear due to non-optimized task placement or a change of runtime conditions. The rest of the exemplary programs from both sets are irregular applications in which
the execution time of tasks depends on the processed data. Thus,
they exhibit unpredictable execution time of tasks and of the communication scheme, and load imbalance can occur in computing
nodes.
All parallel EO-GS variants and the sequential EO and EO-GS used the same local and global fitness functions. The following parameters for load balancing control were used: A = 0.5, β = 0.5, γ = 0.75, Δ1 = 0.13, Δ2 = 0.17, τ = 1.5, and for EO-GS λ = 0.5. Other settings of the control parameters were presented in [13,14]. The values of the parameters shown above have been selected based on an approximate analysis of the influence of these parameters on the global fitness function. All considered parameters, with their respective ranges of values, are specified in Table 1. Definition of optimal parameter values requires tuning of the algorithm with inspection of the influence of the improvements of solution components, controlled by the local fitness function (parameters β, γ), on the global fitness function, controlled by the Δ1, Δ2 parameters. In the context of processor load balancing such tuning has to take into account the characteristics of the application program graphs, expressed in terms of the relations between the intensity of the communication and computation in the optimized applications. Each experiment was repeated 10 times, using a random unoptimized initial task placement for each run, and the results were averaged. Experiments were repeated for 1, 2, 4, or 8 threads for the parallel versions of EO.
In the first experiment, we investigated the quality of the presented load balancing algorithms, i.e., the parallel speedup of exemplary applications obtained with load balancing performed using our proposed algorithms. During the experiment, we simulated execution
of the rst set of exemplary application graphs in the executive
system consisting of 2, 4, 8, 16, or 32 processors with distributed


Table 2
The number of iterations and evaluations of the global fitness function in parallel variants of the EO algorithm.

Algorithm                    P   Nouter   Ninner   Niter   Evaluations of Φ(S)
PEO-GS-D, PEO-GS-E           8   4        14       448     504
                             4   8        15       480     528
                             2   16       15       480     512
PEO-GS-C                     8   4        16       512     520
                             4   8        16       512     528
                             2   16       15       480     512
PEO-A, PEO-GS-A, PEO-GS-B    8   4        16       512     512
                             4   8        16       512     512
                             2   16       16       512     512

memory. In order to obtain comparable results, we tested performance with standard and parallelized EO applied to load balancing with a fixed amount of search in all versions of the EO algorithms [47], equivalent to about 512 global fitness function evaluations. To accomplish this requirement, the number of iterations Niter has been set to a different value for each presented version of EO, see Table 2.
For population-based parallel versions, the numbers of iterations
in inner loops depended on the number of parallel branches in
the algorithm and the number of additional global tness evaluations performed in the exchange phase. The assumed exchange
rate of solutions between parallel branches was every 14, 15, or 16
inner iterations. For example, when P = 2, Niter was equal to 480 for
PEO-GS-C, PEO-GS-D and PEO-GS-E variants, and equal to 512 for
PEO-GS-A and PEO-GS-B. For sequential EO algorithms, we always
set the number of iterations Niter to 512.
Fig. 4(a) and (b) shows the average irregular application parallel
speedup (against sequential execution) and applications parallel
speedup improvement due to load balancing based on different versions of EO for the number of computing nodes in an application set
to 2, 4, 8, 16 and 32. The reference for the speedup improvement
was the speedup with load balancing based on the standard sequential EO. Load balancing with the PEO-A algorithm (based on single
executions of standard EO in parallel branches in the scheme from
Fig. 2) and with other parallel versions of EO-GS for the number of
computing nodes in the application up to 16, produced no meaningful speedup improvement in this experiment (see Fig. 4(b)). It is
because the standard sequential EO was sufcient for nding good
task migrations for these smaller numbers of processor nodes in
the application execution. For 32 computing nodes in the application execution, population-based parallelization of EO combined
with the guided search approach (algorithms PEO-GS-A, -B, -C, D, -E) gives meaningful speedup improvements due to repeated
selection/exchange of the best solutions among parallel branches
of the algorithm and nding better task migrations in load balancing. For 32 computing nodes used in the applications we achieved
the application average speedup improvement with different parallel versions of EO-GS of about 17%. We can see that application of
standard crossover (algorithms PEO-GS-C, -D, -E) in solution selection did not result in any spectacular speedup increase. The reason was the relatively small number of iterations of EO, compared with the relatively high iteration numbers applied in genetic algorithms.
For 32 computing nodes, the speedup improvement with parallel
EO-GS (PEO-GS-A) was better by about 4% than the improvement
with sequential EO-GS (EO-GS). These improvements needed no
additional computations, since all algorithms performed the same
search work. So, an increase of the iteration number was replaced
by a widening of the search area using parallel algorithm branches
and best solutions exchange.
We also investigated the changes of the migration number for different load balancing algorithms in irregular applications (Fig. 5). Except for PEO-A, the population-based parallel EO-GS algorithms (PEO-GS-A, -B, -C, -D, -E) achieved big reductions of the migration number. The reduction, in the range of 20-40%, is in all cases higher than the reduction for sequential EO-GS and PEO. We see that parallelization is able to


substantially reduce time overheads of EO-based load balancing.
The impact of the regularity of applications on the obtained results is presented in Fig. 6. Regular applications have shown speedup improvements within the range of 1%, regardless of the application size and the pattern of communication. On the other hand, they have shown a substantial reduction of the migration number, up to 85%. It confirms that parallelized EO-GS algorithms can be used for regular applications with some success even when other methods (such as a sequential EO) give speedup, since parallelized EO-GS is able to find smart load balancing decisions using a much lower number of migration steps.
During the experiments with irregular applications we have noticed that there exists a subset of such applications for which parallelized EO-GS achieved better results than the average for the whole test range. Programs in this subset show a higher degree of graph irregularity. They contain only small or no regular parts and a lower parallelization degree, i.e., a relatively small graph width. In effect, efficient execution of such applications places higher demands on the optimized mapping of tasks to computing nodes. Parallelized EO-GS enabled a speedup improvement of up to 25% for 32
computing nodes in the application. Even for smaller numbers of
computing nodes we have some speedup improvement over the
sequential EO (Fig. 7(a) and (b)). Better results are also visible when we compare the migration number for sequential and parallelized EO (Fig. 8). So, parallelized EO-GS methods are especially efficient for harder optimization problems, such as strongly irregular applications.
In the second experiment, we assessed the execution times of
presented load balancing algorithms as a function of the application size parameters. To do this, we simulated execution of load
balancing for the second set of exemplary applications containing
different numbers of tasks performed in executive systems consisting of 32, 64, or 128 processors. The number of iterations Niter in
constituent EO algorithm phases was set to 512, 2048, or 4096 for
all investigated kinds of algorithms. The average execution time of a single execution of EO and EO-GS is shown in Fig. 9. We can see that EO scales very well, i.e., we obtained a linear increase of the running time of the EO algorithms with an increasing number of application tasks. The EO-GS algorithm is slightly slower than classical EO
due to additional computations inside the guided search phase but
shows a similar linear behaviour. The running times depend very little on the number of computing nodes in the executive system (Fig. 10). Note that the execution times of EO are very small, which makes EO a very good constituent internal solution engine for load balancing algorithms.
The average parallel efficiency of the PEO algorithms as a function of the number of iterations, illustrated by PEO-GS-A running on two hardware threads (P = 2), is shown in Fig. 11. We can observe that an increase of the number of iterations slightly increases the efficiency too. On the other hand, when we increase the number of tasks in the exemplary applications, the efficiency slightly decreases, mostly due to a sequential initial preprocessing phase


Fig. 4. Irregular application average speedup (a) and average speedup improvement (b) with parallelized EO-GS against sequential EO versus computing node number.

which prepares the data for the parallelized EO. For a larger number of hardware threads (P = 4), the average parallel efficiency, measured but not shown in a graph, ranged from 32% for 512 iterations to about 50% for 4096 iterations (we do not cover the results of using 8 threads due to non-equivalent execution on 4-core i7 CPUs under HyperThreading).
Our experiments have revealed that the population-type parallelized EO-GS methods are able to find load balancing solutions of high quality for irregular applications, both in terms of application speedup and migration number. This positive result is consistent for all tested irregular applications. Thus, all population-based EO-GS variants (except PEO-A) give viable load balancing algorithms for irregular applications, providing a promising alternative. They gave a parallel application speedup improvement by using multicore CPUs without an increase of the iteration number. Since the differences between population-type parallelized EO-GS

Fig. 5. Average migration number change with parallelized EO-GS against sequential EO for irregular applications versus computing node number.


Fig. 6. Regular application average speedup improvement (a) and migration number change (b) with parallelized EO-GS against sequential EO versus computing node number.

methods in terms of application speedup and migration number are small, we decided to use statistical analysis of the obtained results in order to provide a better insight into the characteristics of the PEO-GS-A, -B, -C, -D, -E algorithms.
7. Statistical comparison of the presented algorithms
To carry out a statistical analysis of the behaviour of the six algorithms, and to compare them in terms of two parameters, i.e. speedup and migration cost, a classical approach based on nonparametric statistical tests has been adopted. This analysis is shown in the next subsections and has been performed by following [48,49]. The multiple comparison analysis has been carried out by means of the Friedman, Aligned Friedman, and Quade tests. A brief explanation of these statistical tests is reported in the Appendix. To fulfil this goal, the StatService [50] tool has been used. It is a free service, available on the internet at the address http://moses.us.es/statservice/, developed to compute the rankings for these tests, to carry out the related post-hoc procedures, and to compute the adjusted p-values.
The results for the one-to-all analysis are reported in the following for the two different parameters.
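The average rankings used by these tests are obtained by ranking the algorithms on each problem instance and averaging the ranks over all instances. A minimal sketch of this step for the Friedman test follows (with made-up data and without tie handling, which the real test resolves by assigning average ranks):

```python
def friedman_average_ranks(results):
    """results[i][j]: score of algorithm j on instance i (higher = better).
    Returns the average rank of each algorithm (rank 1 = best)."""
    n, k = len(results), len(results[0])
    avg = [0.0] * k
    for row in results:
        order = sorted(range(k), key=lambda j: row[j], reverse=True)
        for rank, j in enumerate(order, start=1):
            avg[j] += rank / n
    return avg
```

Lower average ranks indicate better algorithms, which is how the values in Table 3 below should be read.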
7.1. Speedup
Table 3 contains the results of the Friedman, Aligned Friedman, and Quade tests in terms of the average rankings obtained by all the algorithms. The last two rows show the statistic and the p-value for each test, respectively. For the Friedman and Aligned Friedman tests the statistic is distributed according to a chi-square distribution with 5 degrees of freedom, whereas for the Quade test it is distributed according to an F-distribution with 5 and 5245 degrees of freedom.

Table 3
Speedup: average rankings of the algorithms.

Algorithm   Friedman   Aligned Friedman   Quade
PEO-GS-B    3.372*     2943.491*          3.311
PEO-GS-E    3.380      2989.804           3.337
PEO-GS-D    3.412      2982.654           3.296*
PEO-GS-A    3.445      3071.527           3.332
PEO-GS-C    3.450      3025.880           3.337
PEO-A       3.941      3889.645           4.386
Statistic   71.694     895.922            1082.224
p-Value     0.000      0.000              0.000

In each of the three tests, the lower the value for an algorithm, the better the algorithm is. The three tests give different results. In the Friedman test PEO-GS-B turns out to be the best optimization algorithm and PEO-GS-E is the runner-up, whereas in the Aligned Friedman test PEO-GS-B is the winner followed by PEO-GS-D, and, finally, in the Quade test PEO-GS-D is the best and PEO-GS-B is the runner-up. For each test the best performance is marked with an asterisk in the table. For all the tests PEO-A achieves the worst ranking.
Furthermore, with the aim of examining whether some hypotheses of equivalence between the best performing algorithm and the other

Fig. 7. Strongly irregular application average speedup (a) and average speedup improvement (b) with parallelized EO-GS against sequential EO versus computing node number.

ones can be rejected, the complete statistical analysis based on the post-hoc procedures devised by Bonferroni, Holm, Holland, Rom, Finner, and Li has been carried out following [49].
Tables 4-6 report the results of this analysis performed at a level of significance α = 0.05 for the Friedman, Aligned Friedman, and Quade tests, respectively. The level of significance represents the maximum allowable probability of incorrectly rejecting a given null hypothesis. In our case, as an example, if it is equal to 0.05, this means that if an equivalence hypothesis is rejected, there is a 5% probability of making a mistake in rejecting it.
In these tables the other algorithms are ranked in terms of their distance from the best performing one, which is taken as the control method, and each algorithm is compared against the latter with the aim of investigating whether or not the equivalence hypothesis can be rejected. For each algorithm each table reports the z value, the unadjusted p-value, and the adjusted p-values according to the
Fig. 8. Average migration number change with parallelized EO-GS against sequential EO for strongly irregular applications versus computing node number.


Fig. 9. Average execution time of EO and EO-GS for 512 iterations versus the number of tasks in exemplary applications.

Fig. 10. Average execution time of EO versus the number of tasks and computing nodes in applications under load balancing.

Fig. 11. Average parallel efciency of PEO-GS-A (P = 2) for different number of tasks in applications versus the number of EO-GS iterations.
Table 4
Speedup: results of post-hoc procedures for Friedman test over all tools (at α = 0.05). Control method: PEO-GS-B.

i   Algorithm   z = (R0 − Ri)/SE   Unadjusted p   Bonferroni   Holm    Holland   Rom     Finner   Li
5   PEO-A       6.975              0.000          0.010        0.010   0.010     0.011   0.010    0.004
4   PEO-GS-C    0.956              0.339          0.010        0.013   0.013     0.013   0.020    0.004
3   PEO-GS-A    0.892              0.372          0.010        0.017   0.017     0.017   0.030    0.004
2   PEO-GS-D    0.490              0.624          0.010        0.025   0.025     0.025   0.040    0.004
1   PEO-GS-E    0.099              0.921          0.010        0.050   0.050     0.050   0.050    0.050
Th                                                0.010        0.013   0.013     0.011   0.020    0.004


Table 5
Speedup: results of post-hoc procedures for Aligned Friedman test over all tools (at α = 0.05). Control method: PEO-GS-B.

i   Algorithm   z = (R0 - Ri)/SE   p       Bonferroni   Holm    Holland   Rom     Finner   Li
5   PEO-A       11.919             0.000   0.010        0.010   0.010     0.011   0.010    0.020
4   PEO-GS-A    1.613              0.107   0.010        0.013   0.013     0.013   0.020    0.020
3   PEO-GS-C    1.038              0.299   0.010        0.017   0.017     0.017   0.030    0.020
2   PEO-GS-E    0.583              0.560   0.010        0.025   0.025     0.025   0.040    0.020
1   PEO-GS-D    0.493              0.622   0.010        0.050   0.050     0.050   0.050    0.050
Th                                         0.010        0.013   0.013     0.011   0.020    0.020

Table 6
Speedup: results of post-hoc procedures for Quade test over all tools (at α = 0.05). Control method: PEO-GS-D.

i   Algorithm   z = (R0 - Ri)/SE   p       Bonferroni   Holm    Holland   Rom     Finner   Li
5   PEO-A       7.315              0.000   0.010        0.010   0.010     0.011   0.010    0.004
4   PEO-GS-E    0.280              0.780   0.010        0.013   0.013     0.013   0.020    0.004
3   PEO-GS-C    0.279              0.780   0.010        0.017   0.017     0.017   0.030    0.004
2   PEO-GS-A    0.245              0.806   0.010        0.025   0.025     0.025   0.040    0.004
1   PEO-GS-B    0.104              0.917   0.010        0.050   0.050     0.050   0.050    0.050
Th                                         0.010        0.013   0.013     0.011   0.020    0.004

different post-hoc procedures. The variable z represents the test statistic for comparing the algorithms, and its definition depends on the main nonparametric test used. In [49] all the different definitions for z, corresponding to the different tests, are reported. The last row in the tables contains, for each procedure, the threshold value Th such that the procedure rejects those equivalence hypotheses that have an adjusted p-value lower than or equal to Th.
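As a concrete illustration, for the Friedman test the statistic for comparing a control algorithm against another one can be computed from the average ranks as z = (R0 - Ri)/SE with SE = sqrt(nA(nA + 1)/(6np)) [49]. A minimal sketch, using assumed rank values rather than those of the tables:

```python
import math

# Sketch of the z statistic for the Friedman post-hoc comparisons:
# z = (R0 - Ri) / SE, with SE = sqrt(nA * (nA + 1) / (6 * np)).
# The rank values below are illustrative, not taken from the tables.

def friedman_z(r_control, r_other, n_algorithms, n_problems):
    se = math.sqrt(n_algorithms * (n_algorithms + 1) / (6.0 * n_problems))
    return (r_control - r_other) / se

# Hypothetical average ranks of a control and a compared algorithm
# over 100 problems with 6 algorithms:
print(friedman_z(2.1, 3.4, n_algorithms=6, n_problems=100))
```

The z value is then converted to a normal-tail p-value before the post-hoc adjustment is applied.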
All the post-hoc procedures applied to the Friedman test cannot reject the hypothesis of statistical equivalence between PEO-GS-B and PEO-GS-E, whereas they all reject that between PEO-GS-B and PEO-A. All the procedures apart from Li cannot reject the equivalence of PEO-GS-B with PEO-GS-D and PEO-GS-A.
For the Aligned Friedman test all the post-hoc procedures cannot reject the hypothesis of statistical equivalence between PEO-GS-B and PEO-GS-D, whereas they all reject that between PEO-GS-B and PEO-A. All the procedures apart from Li cannot reject the equivalence of PEO-GS-B with PEO-GS-E and PEO-GS-C.
For the Quade test all the post-hoc procedures cannot reject the hypothesis of statistical equivalence between PEO-GS-D and PEO-GS-B, whereas they all reject that between PEO-GS-D and PEO-A. All the procedures apart from Li cannot reject the equivalence of PEO-GS-D with PEO-GS-A and PEO-GS-C.
Finally, Table 7 shows the pairwise comparison between these six algorithms. Two post-hoc procedures, i.e., those by Holm and Shaffer, are considered here.

Table 7
Speedup: pairwise comparison between the algorithms.

Algorithm                p       z       Holm    Shaffer
PEO-A vs. PEO-GS-B       0.000   6.975   0.003   0.003
PEO-A vs. PEO-GS-E       0.000   6.876   0.004   0.005
PEO-A vs. PEO-GS-D       0.000   6.485   0.004   0.005
PEO-A vs. PEO-GS-A       0.000   6.083   0.004   0.005
PEO-A vs. PEO-GS-C       0.000   6.019   0.005   0.005
PEO-GS-B vs. PEO-GS-C    0.339   0.956   0.005   0.005
PEO-GS-A vs. PEO-GS-B    0.372   0.892   0.006   0.006
PEO-GS-C vs. PEO-GS-E    0.391   0.857   0.006   0.006
PEO-GS-A vs. PEO-GS-E    0.428   0.793   0.007   0.007
PEO-GS-B vs. PEO-GS-D    0.624   0.490   0.008   0.008
PEO-GS-C vs. PEO-GS-D    0.641   0.467   0.010   0.010
PEO-GS-A vs. PEO-GS-D    0.687   0.402   0.013   0.013
PEO-GS-D vs. PEO-GS-E    0.696   0.391   0.017   0.017
PEO-GS-B vs. PEO-GS-E    0.921   0.099   0.025   0.025
PEO-GS-A vs. PEO-GS-C    0.949   0.064   0.050   0.050
Th                                       0.005   0.005

The equivalence between PEO-A and any other algorithm is rejected by both procedures, so PEO-A is statistically inferior to all the others in terms of speedup. Also the equivalence between PEO-GS-B and PEO-GS-C is rejected by both procedures. The equivalence hypothesis between any pair of algorithms from the set {PEO-GS-A, PEO-GS-B, PEO-GS-D, PEO-GS-E}, instead, is not rejected by either procedure.
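The step-down adjustment by Holm, used above together with Shaffer's procedure, can be sketched as follows; this is a generic illustration of the method, not the exact implementation behind the tables:

```python
# Sketch of Holm's step-down adjustment of k unadjusted p-values.
# The example input reuses the unadjusted p-values of Table 4 for
# illustration; the function itself is generic.

def holm_adjust(p_values):
    """Return Holm-adjusted p-values for a family of k comparisons."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    adjusted = [0.0] * k
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = (k - rank) * p_values[i]
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[i] = min(1.0, running_max)
    return adjusted

# Unadjusted p-values of the five comparisons against the control in
# Table 4; only the first comparison survives adjustment.
print(holm_adjust([0.000, 0.339, 0.372, 0.624, 0.921]))
# -> [0.0, 1.0, 1.0, 1.0, 1.0]
```

A hypothesis is rejected when its adjusted p-value falls below the chosen significance level α.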
7.2. Migration cost

Similarly to the way results have been presented in the previous subsection, also here Table 8 contains the results of the Friedman, Aligned Friedman, and Quade tests in terms of the average rankings obtained by all the algorithms. The last two rows show the statistic and the p-value for each test, respectively. For the Friedman and Aligned Friedman tests the statistic is distributed according to chi-square with 5 degrees of freedom, whereas for the Quade test it is distributed according to the F-distribution with 5 and 5245 degrees of freedom.
All the three tests give the same results: PEO-GS-C is the best optimization algorithm, and PEO-GS-B is the runner-up. For all the tests PEO-A achieves the worst ranking, immediately preceded by PEO-GS-D.
Furthermore, with the aim to examine whether some hypotheses of equivalence between the best performing algorithm and the other ones can be rejected, the same complete statistical analysis based on the post-hoc procedures as in Section 7.1 has been carried out. Tables 9-11 report the results of this analysis performed at a level of significance α = 0.05 for Friedman, Aligned Friedman, and Quade, respectively.
Similarly to the presentation followed in Section 7.1, in these tables the other algorithms are ranked in terms of distance from the
Table 8
Migrations: average rankings of the algorithms.

Algorithm   Friedman   Aligned Friedman   Quade
PEO-GS-C    3.205      2775.854           3.090
PEO-GS-B    3.285      2883.107           3.238
PEO-GS-E    3.306      2930.587           3.280
PEO-GS-A    3.311      2929.703           3.308
PEO-GS-D    3.343      3066.721           3.361
PEO-A       4.550      4317.029           4.723

Statistic   399.751    906.230            1073.112
p-Value     0.000      0.000              0.000


Table 9
Migrations: results of post-hoc procedures for Friedman test over all tools (at α = 0.05). Control method: PEO-GS-C.

i   Algorithm   z = (R0 - Ri)/SE   p       Bonferroni   Holm    Holland   Rom     Finner   Li
5   PEO-A       16.464             0.000   0.010        0.010   0.010     0.011   0.010    0.035
4   PEO-GS-D    1.685              0.092   0.010        0.013   0.013     0.013   0.020    0.035
3   PEO-GS-A    1.295              0.195   0.010        0.017   0.017     0.017   0.030    0.035
2   PEO-GS-E    1.236              0.216   0.010        0.025   0.025     0.025   0.040    0.035
1   PEO-GS-B    0.980              0.327   0.010        0.050   0.050     0.050   0.050    0.050
Th                                         0.010        0.013   0.013     0.011   0.020    0.035

Table 10
Migrations: results of post-hoc procedures for Aligned Friedman test over all tools (at α = 0.05). Control method: PEO-GS-C.

i   Algorithm   z = (R0 - Ri)/SE   p       Bonferroni   Holm    Holland   Rom     Finner   Li
5   PEO-A       19.415             0.000   0.010        0.010   0.010     0.011   0.010    0.043
4   PEO-GS-D    3.664              0.000   0.010        0.013   0.013     0.013   0.020    0.043
3   PEO-GS-E    1.949              0.051   0.010        0.017   0.017     0.017   0.030    0.043
2   PEO-GS-A    1.938              0.053   0.010        0.025   0.025     0.025   0.040    0.043
1   PEO-GS-B    1.351              0.177   0.010        0.050   0.050     0.050   0.050    0.050
Th                                         0.010        0.017   0.017     0.013   0.030    0.043

Table 11
Migrations: results of post-hoc procedures for Quade test over all tools (at α = 0.05). Control method: PEO-GS-C.

i   Algorithm   z = (R0 - Ri)/SE   p       Bonferroni   Holm    Holland   Rom     Finner   Li
5   PEO-A       10.961             0.000   0.010        0.010   0.010     0.011   0.010    0.036
4   PEO-GS-D    1.821              0.069   0.010        0.013   0.013     0.013   0.020    0.036
3   PEO-GS-A    1.466              0.143   0.010        0.017   0.017     0.017   0.030    0.036
2   PEO-GS-E    1.274              0.202   0.010        0.025   0.025     0.025   0.040    0.036
1   PEO-GS-B    0.993              0.321   0.010        0.050   0.050     0.050   0.050    0.050
Th                                         0.010        0.013   0.013     0.011   0.020    0.036

best performing one, which is taken as the control method, and each algorithm is compared against this latter with the aim to investigate whether or not the equivalence hypothesis can be rejected.
All the post-hoc procedures applied to the Friedman test cannot reject the hypothesis of statistical equivalence between PEO-GS-C and PEO-GS-B, whereas they all reject that between PEO-GS-C and PEO-A. All the procedures apart from Li cannot reject the equivalence of PEO-GS-C with PEO-GS-E and PEO-GS-A.
For the Aligned Friedman test all the post-hoc procedures cannot reject the hypothesis of statistical equivalence between PEO-GS-C and PEO-GS-B, whereas they all reject that between PEO-GS-C and PEO-A and that between PEO-GS-C and PEO-GS-D. All the procedures apart from Li cannot reject the equivalence of PEO-GS-C with PEO-GS-A.
For the Quade test all the post-hoc procedures cannot reject the hypothesis of statistical equivalence between PEO-GS-C and
Table 12
Migrations: pairwise comparison between the algorithms.

Algorithm                p       z        Holm    Shaffer
PEO-A vs. PEO-GS-C       0.000   16.464   0.003   0.003
PEO-A vs. PEO-GS-B       0.000   15.484   0.004   0.005
PEO-A vs. PEO-GS-E       0.000   15.228   0.004   0.005
PEO-A vs. PEO-GS-A       0.000   15.169   0.004   0.005
PEO-A vs. PEO-GS-D       0.000   14.779   0.005   0.005
PEO-GS-C vs. PEO-GS-D    0.092   1.685    0.005   0.005
PEO-GS-A vs. PEO-GS-C    0.195   1.295    0.006   0.006
PEO-GS-C vs. PEO-GS-E    0.216   1.236    0.006   0.006
PEO-GS-B vs. PEO-GS-C    0.327   0.980    0.007   0.007
PEO-GS-B vs. PEO-GS-D    0.480   0.706    0.008   0.008
PEO-GS-D vs. PEO-GS-E    0.653   0.449    0.010   0.010
PEO-GS-A vs. PEO-GS-D    0.696   0.391    0.013   0.013
PEO-GS-A vs. PEO-GS-B    0.753   0.315    0.017   0.017
PEO-GS-B vs. PEO-GS-E    0.797   0.257    0.025   0.025
PEO-GS-A vs. PEO-GS-E    0.953   0.058    0.050   0.050
Th                                        0.005   0.005

PEO-GS-B, whereas they all reject that between PEO-GS-C and PEO-A. All the procedures apart from Li cannot reject the equivalence of PEO-GS-C with PEO-GS-E and PEO-GS-A.
Finally, Table 12 shows the pairwise comparison between these six algorithms in terms of migrations. Two post-hoc procedures, i.e., those by Holm and Shaffer, are considered here.
The equivalence between PEO-A and any other algorithm is rejected by both procedures, so PEO-A is statistically inferior to all the others in terms of migration cost. Also the equivalence between PEO-GS-C and PEO-GS-D is rejected by both procedures. The equivalence hypothesis between any pair of algorithms from the set {PEO-GS-A, PEO-GS-B, PEO-GS-C, PEO-GS-E}, instead, is not rejected by either procedure.

8. Conclusions and future works

The paper has presented parallel algorithms for dynamic processor load balancing in execution of distributed programs. The proposed algorithms are based on parallel extensions of the EO method, a nature-inspired optimization technique exploiting self-organized criticality. In the overall load balancing procedure, the parallel EO algorithms determine candidates for task migrations to correct global imbalance of processor loads in the executive system.
The contribution of the paper is the proposal of a set of population-based, parallelized EO methods, including five variants of stochastic exchange of best solutions between parallel branches of the algorithm. They are additionally extended by a guided search based on some knowledge of the problem (EO-GS versions). Three of them are based on the crossover operator used in genetic algorithms.
The proposed parallel EO-GS algorithms have been assessed by experiments with simulated load balancing of distributed applications. During the experiments we have compared the proposed parallel load balancing methods to sequential EO-based versions. To better characterize the proposed parallel EO-GS methods, we have presented the results of statistical tests applied to the experimentally measured application execution speedups and migration numbers. Additionally, we have checked that the proposed EO-GS algorithms show very good scalability for the increasing number of tasks and computing units appearing in the load balancing problem.
The general conclusion resulting from our experiments is that the extension of the solution search space in EO-GS through parallel mutations in a population of multiple candidate solutions, even in its modest form of 8 parallel branches, provides satisfactory results. The obtained experimental results confirm that the application of parallel EO for load balancing is successful as an evolutionary strategy based on processing of best solutions in the optimization process by improvements of the fitness-supported selection of the worst components. The EO method, which is based on improvements of a single solution representation, has low memory and operational complexity, which is strongly convergent with the requirements of the load balancing problem. We were able to obtain satisfactory quality of processor load balancing without increasing the number of sequential iterations of EO but using the parallelized computing power of multicore CPUs. An additional profit of processing many parallel state changes in EO is a reduction in the number of task migrations needed to balance processor loads. Best results, in terms of improved application parallel speedup with a reduced number of migrations, have been obtained for irregular applications.
The statistical analysis of the tested algorithms for the obtained irregular application speedups has revealed that the variants PEO-GS-B and PEO-GS-D have provided the best results over the tested algorithms, however with a very small predominance. The variant PEO-GS-B was based on the solutions produced by the parallel branch with the best average quality of solutions over all iterations. The variant PEO-GS-D was based on the set of crossovers of the globally best solution obtained from so-far iterations with P - 1 other solutions from the preceding iterations. In terms of the migration number reduction for irregular applications, the variant PEO-GS-C (based on broadcast of the best crossover of the best solution obtained from so-far iterations with a randomly selected final solution in parallel branches) turned out to be the best, with a similarly slight predominance. For regular parallel applications, in which task execution and communication patterns are predictable, the proposed EO-GS algorithms were able to give only very small speedup improvements compared with standard EO, however with a strongly reduced number of task migrations, thus lowering the cost of load balancing.
Our further research on the proposed parallel EO-based application load balancing methods will concern tuning the proposed algorithms to the properties of the application graphs under optimization. In this research, we would like to compare the effects of such tuning with the solutions obtained using mathematical programming methods and multi-objective optimization. A multi-objective approach could be a fruitful continuation of the research presented in this paper. In fact, in the current paper we have assumed problem solving with the use of one global fitness function. Yet, as it has been described, it is actually a linear combination of three functions, i.e. attrxttotal, migration and imbalance. Consequently, we can take a multi-objective approach and define the three above functions as our objectives. Some approaches to multiobjective EO versions can already be found in the literature, e.g. [51,52]. An additional aspect of this research will be the assessment of the performance of the algorithms for a larger number of computing nodes.
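The single-objective formulation above can be illustrated by a weighted-sum scalarization of the three component functions; the weight values and the function names in this sketch are assumptions for illustration, not the paper's actual coefficients:

```python
# Hypothetical sketch of a global fitness built as a linear combination
# of three load-balancing objectives (lower is better). The weights
# below are assumed for illustration only.

W_ATTR, W_MIG, W_IMB = 0.5, 0.2, 0.3  # assumed weights

def global_fitness(attraction_total, migration, imbalance):
    """Weighted-sum scalarization of the three objectives."""
    return (W_ATTR * attraction_total
            + W_MIG * migration
            + W_IMB * imbalance)
```

A multi-objective variant would instead keep the three values as a vector and compare solutions by Pareto dominance rather than by this single scalar.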
Appendix A. Statistical tests introduction

A very basic explanation of the statistical tests reported in Section 7 is given here, for a case in which np problems and nA algorithms are considered.
The Friedman test ranks the algorithms for each problem separately; the best performing algorithm is assigned rank 1, the second best rank 2, and so on up to nA. The ranks for each algorithm are summed over the np faced problems, and the sum is divided by np.
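The ranking step just described can be sketched as follows; the example data are assumed for illustration, and ties are ignored for brevity:

```python
# Minimal sketch of the Friedman ranking scheme:
# results[i][j] = performance of algorithm j on problem i, where a
# lower value is better (e.g. execution time). Ties are ignored.

def friedman_average_ranks(results):
    n_p = len(results)
    n_a = len(results[0])
    rank_sums = [0.0] * n_a
    for row in results:
        # best (smallest) value gets rank 1, next one rank 2, ...
        order = sorted(range(n_a), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    return [s / n_p for s in rank_sums]

# Assumed example: 3 problems, 3 algorithms.
results = [[1.0, 2.0, 3.0],
           [1.5, 3.5, 2.5],
           [0.9, 2.1, 1.1]]
print(friedman_average_ranks(results))
```

Lower average ranks indicate better overall performance, as in Table 8.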
A drawback of the ranking scheme employed by the Friedman test is that, when the number of algorithms under comparison is small, it may pose a disadvantage, since comparability among problems is desirable.
To overcome this problem, in the Aligned Friedman test a value of location is computed as the average performance achieved by all algorithms in each problem. Then, the difference between the performance obtained by an algorithm and this value of location is computed. This step is repeated for each combination of algorithms and problems. The resulting differences are then ranked from 1 to nA * np relative to each other. Again, the ranks for each algorithm are summed over the faced problems, and the sum is divided by the number of problems.
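The aligned-rank computation described above can be sketched as follows (assumed example data; ties ignored for brevity):

```python
# Sketch of the Aligned Friedman ranking: subtract each problem's mean
# performance (the "value of location"), then rank all nA * np
# differences jointly across problems.

def aligned_friedman_ranks(results):
    n_p = len(results)
    n_a = len(results[0])
    diffs = []
    for i, row in enumerate(results):
        loc = sum(row) / n_a                 # value of location
        for j, v in enumerate(row):
            diffs.append((v - loc, i, j))
    diffs.sort()                             # smallest difference -> rank 1
    ranks = [[0] * n_a for _ in range(n_p)]
    for rank, (_, i, j) in enumerate(diffs, start=1):
        ranks[i][j] = rank
    return [sum(ranks[i][j] for i in range(n_p)) / n_p
            for j in range(n_a)]

# Assumed example: 2 problems, 2 algorithms.
print(aligned_friedman_ranks([[1.0, 3.0], [2.0, 6.0]]))
```

Because all aligned differences are ranked together, problems with large performance spreads contribute larger rank differences than under plain Friedman ranking.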
The main drawback of the two tests described above is that they consider all problems to be equally important. The Quade test takes into account the fact that some problems are more difficult, or that the differences registered in the runs of the various algorithms over them are larger. Therefore, ranks are assigned to the problems themselves according to the width of the result range in each problem. Thus, the problem with the smallest range is assigned rank 1, the second smallest rank 2, and so on to the problem with the largest range, which gets rank np. So, each problem i is given a rank Q_i. Next, for each problem the product S_i^j is computed as S_i^j = Q_i * (r_i^j - (nA + 1)/2), where r_i^j is the rank of algorithm j within problem i. By doing so, this value takes into account both the relative importance of each observation within the problem and the relative significance of the problem the observation refers to. Finally, a value S_j, computed as S_j = sum over i = 1, ..., np of S_i^j, for j = 1, 2, ..., nA, can be assigned to each algorithm.
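The Quade quantities above can be sketched as follows (assumed example data; ties ignored for brevity):

```python
# Sketch of the Quade weighting: each problem i gets a rank Q_i by its
# result range, and each within-problem rank r_i^j is centred and
# weighted: S_i^j = Q_i * (r_i^j - (nA + 1) / 2); S_j sums over problems.

def quade_S(results):
    n_p = len(results)
    n_a = len(results[0])
    # within-problem ranks r_i^j (rank 1 = best, i.e. smallest value)
    r = []
    for row in results:
        order = sorted(range(n_a), key=lambda j: row[j])
        row_ranks = [0] * n_a
        for rank, j in enumerate(order, start=1):
            row_ranks[j] = rank
        r.append(row_ranks)
    # problem ranks Q_i: smallest result range gets rank 1
    ranges = [max(row) - min(row) for row in results]
    order = sorted(range(n_p), key=lambda i: ranges[i])
    Q = [0] * n_p
    for rank, i in enumerate(order, start=1):
        Q[i] = rank
    centre = (n_a + 1) / 2.0
    return [sum(Q[i] * (r[i][j] - centre) for i in range(n_p))
            for j in range(n_a)]

# Assumed example: 2 problems, 2 algorithms; the second problem has
# the wider range, so it weighs more in the final S_j values.
print(quade_S([[1.0, 2.0], [1.0, 3.0]]))
```

More negative S_j values correspond to better-ranked algorithms when lower raw results are better.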
References

[1] R.Z. Khan, J. Ali, Classification of task partitioning and load balancing strategies in distributed parallel computing systems, Int. J. Comput. Appl. 60 (17) (2012) 48-53.
[2] Y. Jiang, A survey of task allocation and load balancing in distributed systems, IEEE Trans. Par. Distrib. Syst. (2016), http://dx.doi.org/10.1109/TPDS.2015.2407900.
[3] K. Barker, N. Chrisochoides, An evaluation of a framework for the dynamic load balancing of highly adaptive and irregular parallel applications, in: Proceedings of the ACM/IEEE Conference on Supercomputing, Phoenix, USA, ACM Press, 2003, p. 45.
[4] M. Zakarya, N. Dilawar, N. Khan, A survey on energy efficient load balancing algorithms over multicores, Int. J. Res. Comput. Appl. Inf. Technol. 1 (1) (2013) 60-68.
[5] S.V. Pius, T.S. Shilpa, Survey on load balancing in cloud computing, in: Proceedings of the International Conference on Computing, Communication and Energy Systems (ICCCES), Kerala, India, 2014, p. 277.
[6] T. Desai, J. Prajapati, A survey of various load balancing techniques and challenges in cloud computing, Int. J. Sci. Technol. Res. 2 (11) (2013) 158-161.
[7] N. Patel, N. Chauhan, A survey on load balancing and scheduling in cloud computing, IJIRST Int. J. Innov. Res. Sci. Technol. 1 (7) (2014) 185-189.
[8] M. Mishra, S. Agarwal, P. Mishra, S. Singh, Comparative analysis of various evolutionary techniques of load balancing: a review, Int. J. Comput. Appl. 63 (15) (2013) 8-13.
[9] S. Boettcher, A.G. Percus, Extremal optimization: methods derived from coevolution, in: Proceedings of the Genetic and Evolutionary Computation Conference, San Francisco, CA, Morgan Kaufmann, 1999, pp. 825-832.
[10] K. Sneppen, P. Bak, H. Flyvbjerg, M.H. Jensen, Evolution as a self-organized critical phenomenon, Proc. Natl. Acad. Sci. U. S. A. 92 (1995) 5209-5213.
[11] M. Eusuff, K. Lansey, F. Pasha, Shuffled frog-leaping algorithm: a memetic meta-heuristic for discrete optimization, Eng. Optim. 38 (2) (2006) 129-154.
[12] S.P. Bradley, A.C. Hax, T.L. Magnanti, Applied Mathematical Programming, Addison-Wesley Publishing Company, Reading, MA, 1977.
[13] I. De Falco, E. Laskowski, R. Olejnik, U. Scafuri, E. Tarantino, M. Tudruj, Load balancing in distributed applications based on extremal optimization, in: EvoCOMNET 2013, Vienna, Austria, Lect. Notes Comput. Sci. 7835, Springer, 2013, pp. 52-61.
[14] I. De Falco, E. Laskowski, R. Olejnik, U. Scafuri, E. Tarantino, M. Tudruj, Extremal optimization with guided state changes in load balancing of distributed programs, in: EvoCOMNET 2014, Granada, Spain, Lect. Notes Comput. Sci. 8602, Springer, 2014, pp. 51-62.
[15] I. De Falco, E. Laskowski, R. Olejnik, U. Scafuri, E. Tarantino, M. Tudruj, Extremal optimization applied to load balancing in execution of distributed programs, Appl. Soft Comput. 30 (5) (2015) 501-513.
[16] I. De Falco, E. Laskowski, R. Olejnik, U. Scafuri, E. Tarantino, M. Tudruj, Parallel extremal optimization with guided state changes applied to load balancing, in: EvoCOMNET 2015, Copenhagen, Denmark, Lect. Notes Comput. Sci. 9028, Springer, 2015, pp. 79-90.
[17] B. Zeigler, Hierarchical, modular discrete-event modelling in an object-oriented environment, Simulation 49 (5) (1987) 219-230.
[18] M. Munetomo, M.N.K. Takai, Y. Sato, A stochastic genetic algorithm for dynamic load balancing in distributed systems, in: Proceedings of the IEEE International Conference on Systems, Man and Cybernetics 4, IEEE Press, 1995, pp. 3795-3799.
[19] S.-H. Lee, C.-S. Hwang, A dynamic load balancing approach using genetic algorithm in distributed systems, in: Proceedings of the International Conference on Evolutionary Computation, IEEE Press, 1998, pp. 639-644.
[20] A.Y. Zomaya, Y.-H. Teh, Observations on using genetic algorithms for dynamic load-balancing, IEEE Trans. Par. Distrib. Syst. 12 (9) (2001) 899-911.
[21] S. Suresh, H. Huang, H.J. Kim, Hybrid real-coded genetic algorithm for data partitioning in multi-round load distribution and scheduling in heterogeneous systems, Appl. Soft Comput. 24 (11) (2014) 500-510.
[22] M. Paletta, P. Herrero, A simulated annealing method to cover dynamic load balancing in grid environment, Adv. Soft Comput. 50 (2009) 1-10.
[23] P. Visalakshi, S.N. Sivanandam, Dynamic task scheduling with load balancing using hybrid particle swarm optimization, Int. J. Open Problems Comput. Math. 2 (3) (2009) 475-488.
[24] V. Sesum-Cavic, E. Kuehn, Applying swarm intelligence algorithms for dynamic load balancing to a cloud based call center, in: Proceedings of the 4th International Conference on Self-Adaptive and Self-Organizing Systems, IEEE Press, 2010, pp. 255-256.
[25] L. Bai, Y.-L. Hu, S.-Y. Lao, W.-M. Zhang, Task scheduling with load balancing using multiple ant colonies optimization in grid computing, in: Proceedings of the 6th International Conference on Natural Computation 5, 2010, pp. 2715-2719.
[26] S.K. Goyal, M. Singh, Adaptive and dynamic load balancing in grid using ant colony optimization, Int. J. Eng. Sci. Technol. 4 (9) (2012) 167-174.
[27] L.D. Dhinesh Babu, P. Venkata Krishna, Honey bee behavior inspired load balancing of tasks in cloud computing environments, Appl. Soft Comput. 13 (5) (2013) 2292-2303.
[28] M. Randall, A. Lewis, An extended extremal optimisation model for parallel architectures, in: Proceedings of the 2nd IEEE International Conference on e-Science and Grid Computing 2006, e-Science06, IEEE Press, 2006, p. 114.
[29] M. Chen, Y. Lu, G. Yang, Population-based extremal optimization with adaptive Lévy mutation for constrained optimization, in: CIS 2006, Lect. Notes Artif. Int. 4456, Springer, 2007, pp. 144-156.
[30] M. Chen, Y. Lu, G. Yang, Multi-objective optimization using population-based extremal optimization, in: Proceedings of the 1st International Conference on Bio-Inspired Computing: Theory and Applications (BIC-TA 2006), Neural Computing and Applications 17, Springer, 2008, pp. 101-109.
[31] K. Tamura, H. Kitakami, A. Nakada, Reducing crossovers in reconciliation graphs with extremal optimization, Trans. Inf. Process. Soc. Jpn. 49 (4) (2008) 105-116 (in Japanese).
[32] N. Hara, K. Tamura, H. Kitakami, Modified EO-based evolutionary algorithm for reducing crossovers of reconciliation graph, in: Proceedings of the 2nd IEEE World Congress on Nature and Biologically Inspired Computing (NaBIC), IEEE Press, 2010, pp. 169-176.
[33] K. Tamura, H. Kitakami, A. Nakada, Distributed extremal optimization using island model for reducing crossovers in reconciliation graph, in: Proceedings of the International MultiConference of Engineers and Computer Scientists, Hong-Kong, 2013, pp. 1-6.
[34] K. Tamura, H. Kitakami, A. Nakada, Distributed modified extremal optimization using island model for reducing crossovers in reconciliation graph, Eng. Lett. 21 (2) (2013) 81-88.
[35] K. Tamura, H. Kitakami, Island-model-based distributed modified extremal optimization with tabu lists for reducing crossovers in reconciliation graphs, in: Proceedings of the MultiConference of Engineers and Computer Scientists 1, Hong-Kong, 2014, pp. 1-6.
[36] K. Tamura, H. Kitakami, A new distributed modified extremal optimization with tabu search mechanism for reducing crossovers in reconciliation graph and its performance evaluation, IAENG Int. J. Comput. Sci. 41 (2) (2014) 131-140.
[37] A. Nakada, K. Tamura, H. Kitakami, Optimal protein structure alignment using modified extremal optimization, in: Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, IEEE Press, 2012, pp. 697-702.
[38] A. Nakada, K. Tamura, H. Kitakami, Y. Takahashi, Population-based modified extremal optimization for contact map overlap maximization problem, in: Proceedings of the IIAI International Conference on Advanced Applied Informatics (IIAI-AAI), August, 2013, pp. 245-250.
[39] K. Tamura, H. Kitakami, Y. Takahashi, Bio-inspired heuristic for optimizing protein structure alignment using distributed modified extremal optimization, in: Proceedings of the 7th IEEE International Workshop on Computational Intelligence and Applications, IEEE Press, 2014, pp. 23-28.
[40] A. Zamuda, J.D. Hernandez Sosa, Differential evolution and underwater glider path planning applied to the short-term opportunistic sampling of dynamic mesoscale ocean structures, Appl. Soft Comput. 24 (2014) 95-108.
[41] A. Glotic, A. Zamuda, Short-term combined economic and emission hydrothermal optimization by surrogate differential evolution, Appl. Energy 141 (2015) 42-56.
[42] M.A. Potter, K.A. De Jong, Cooperative coevolution: an architecture for evolving coadapted subcomponents, Evol. Comput. 8 (1) (2001) 1-29.
[43] Z. Yang, K. Tang, X. Yao, Large scale evolutionary optimization using cooperative coevolution, Inform. Sci. 178 (15) (2008) 2985-2999.
[44] S. Boettcher, A.G. Percus, Extremal optimization: an evolutionary local-search algorithm, in: H.K. Bhargava, N. Ye (Eds.), Computational Modeling and Problem Solving in the Networked World: Interfaces in Computer Science and Operations Research, Kluwer Academic Publishers, 2003, pp. 61-77.
[45] S. Boettcher, A.G. Percus, Nature's way of optimizing, Artif. Intell. 119 (1) (2000) 275-286.
[46] S. Meshoul, M. Batouche, Robust point correspondence for image registration using optimization with extremal dynamics, in: Proceedings of the 24th DAGM Symposium on Pattern Recognition, Lect. Notes Comput. Sci. 2449, Springer, 2002, pp. 330-337.
[47] M. Črepinšek, S.-H. Liu, M. Mernik, Replication and comparison of computational experiments in applied evolutionary computing: common pitfalls and guidelines to avoid them, Appl. Soft Comput. 19 (2014) 161-170.
[48] J. Demšar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. 7 (2006) 1-30.
[49] J. Derrac, S. García, D. Molina, F. Herrera, A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms, Swarm Evol. Comput. 1 (2011) 3-18.
[50] J. Parejo, J. García, A. Ruiz-Cortés, J.C. Riquelme, STATService: Herramienta de análisis estadístico como soporte para la investigación con Metaheurísticas, in: Actas del VIII Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bio-inspirados, 2012 (in Spanish).
[51] M. Chen, Y. Lu, G. Yang, Multiobjective extremal optimization with applications to engineering design, J. Zhejiang Univ. Sci. A 8 (12) (2007) 1905-1911.
[52] I. De Falco, A. Della Cioppa, D. Maisto, U. Scafuri, E. Tarantino, A multiobjective extremal optimization algorithm for efficient mapping in grids, in: J. Mehnen, M. Köppen, A. Saad, A. Tiwari (Eds.), Applications of Soft Computing, Advances in Intelligent and Soft Computing 58, Springer, 2009, pp. 367-377.
