Article info
Article history:
Received 15 July 2015
Received in revised form 25 April 2016
Accepted 26 April 2016
Available online 6 May 2016
Keywords:
Distributed programs
Load balancing
Extremal optimization
Abstract
The paper concerns parallel methods for extremal optimization (EO) applied to processor load balancing in the execution of distributed programs. In these methods, EO algorithms detect an optimized strategy of task migration leading to a reduction of program execution time. We use an improved EO algorithm with guided state changes (EO-GS) that provides a parallel search for the next solution state during solution improvement, based on some knowledge of the problem. The search is based on a two-step stochastic selection using two fitness functions which account for computation and communication assessment of migration targets. Based on the improved EO-GS approach, we propose and evaluate several versions of parallelization methods of EO algorithms in the context of processor load balancing. Some of them use the crossover operation known from genetic algorithms. The quality of the proposed algorithms is evaluated by experiments with simulated load balancing in the execution of distributed programs represented as macro data flow graphs. Load balancing based on so-parallelized improved EO provides better convergence of the algorithm, a smaller number of task migrations to be done and reduced execution time of applications.
© 2016 Elsevier B.V. All rights reserved.
1. Introduction
Dynamic load balancing in parallel and distributed systems is a very important problem of computer engineering. It has accumulated a very rich bibliography, too numerous to be detailed in this paper, including a large number of survey papers [1-8].
Load balancing has recently been supported by many nature-inspired optimization methods, for which some representative papers are outlined in Section 2. Among nature-inspired methods for load balancing, no attention has been paid so far to extremal optimization (EO) [9], which is a fairly new optimization
should be run at set time instants during the execution of the distributed application, so its execution should take as little time as possible. This leads to our decision not to use a totally random-based replacement strategy, but rather a best-based one.
This approach of exploiting best-based search strategies is quite typical in problems that have to be faced in a quasi-real-time way, as this choice often leads to a reduction in the time needed to find a solution of acceptable quality, see for example [40,41]. Actually, the mechanism for the choice of the neighbour we introduce here, and hence the resulting strategy, are not strictly deterministic in the use of the best; rather, they are probabilistic, as they tend to favour the best neighbouring solutions, yet bad ones can also be selected, although with a low probability.
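Such a probabilistic, best-favouring selection can be sketched as a rank-based power-law draw in the spirit of τ-EO (an illustrative sketch only; the class name, the value τ = 1.5 and the use of plain ranks rather than the exact EO-GS distribution are our assumptions):

```java
import java.util.Random;

// Rank-based stochastic selection: rank 1 is the best candidate, and
// P(rank k) is proportional to k^(-tau), so good candidates are strongly
// favoured while worse ones keep a small chance of being picked.
public class PowerLawPick {
    static int pickRank(int n, double tau, Random rnd) {
        double[] w = new double[n];
        double sum = 0.0;
        for (int k = 1; k <= n; k++) {
            w[k - 1] = Math.pow(k, -tau);
            sum += w[k - 1];
        }
        double r = rnd.nextDouble() * sum;
        for (int k = 0; k < n; k++) {
            r -= w[k];
            if (r <= 0) return k + 1;
        }
        return n;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int[] hits = new int[11];
        for (int i = 0; i < 100000; i++) hits[pickRank(10, 1.5, rnd)]++;
        // The best rank is selected most often, yet every rank occurs.
        System.out.println(hits[1] > hits[10] && hits[10] > 0);
    }
}
```

With τ → ∞ the draw degenerates into a purely greedy best choice, while τ → 0 yields uniform random selection, which is exactly the trade-off discussed above.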
It should be noted here that the EO term "component" is similar in meaning to the term "subcomponent" used in cooperative coevolution [42,43], and that the "local fitness function" term employed here is similar to the "subcomponent evaluation" utilized there. In fact, in both cases attention is paid to evolving solutions to complex problems in the form of interacting co-adapted subcomponents that should emerge rather than being hand-designed. As noted in [42]: "If a problem can be decomposed into subcomponents without interdependencies, clearly each can be evolved without regard to the others. Unfortunately, many problems can only be decomposed into subcomponents exhibiting complex interdependencies. The effect of changing one of these interdependent subcomponents is sometimes described as a deforming or warping of the fitness landscapes associated with each of the other interdependent subcomponents." In the problem we face within this paper, exactly as in [42], subcomponents are interdependent, evolution aims at improving subcomponents, and our local fitness function performs the evaluation of what is called a subcomponent in [42].
4. Load balancing based on the EO approach
In this section we recall the basic theoretical foundations of the proposed EO-based load balancing. The proposed load balancing method is meant for a cluster of multicore processors interconnected by a message passing network. Load balancing actions for a program are controlled at the level of indivisible tasks which are process threads.
We assume that the load balancing algorithms dynamically control the assignment of program tasks t_k, k ∈ {1, …, |T|}, to processors (computing nodes) n, n ∈ {0, 1, …, |N| − 1}, where T and N are the sets of all the tasks and the computing nodes, respectively. The goal is the minimal total program execution time, achieved by task migration between processors. The load balancing method is based on a series of steps in which detection and correction of processor load imbalance is done, Fig. 1. The imbalance detection relies on some run-time infrastructure which observes the state of the executive computer system and the execution states of application programs. Processors (computing nodes) periodically report their current loads to the load balancing control, which monitors the current system load imbalance. When load imbalance is discovered, processor load correction actions are launched. For them, an EO-GS algorithm is executed that identifies the tasks which need migration and the processor nodes which will be migration targets. Following this, the required physical task migrations are performed, with a return to load imbalance detection.
In the solutions published in [13,14], the applied EO-GS algorithm was of a sequential character (see Section 3). In Section 5 of this paper we describe several parallel versions of EO-GS whose applications were studied in the context of load balancing. Their experimental evaluations are described in Section 6.
To evaluate the load of the system, two indicators are used. The first is a function returning the computing power of a node
totalext(S) = Σ_{s,d∈T: s≠d} com(s, d).    (3)
The function migration(S) is a migration cost metric. Its value lies in the range [0, 1]: it is equal to 0 when there is no migration, it is equal to 1 when all tasks have to be migrated, and otherwise 0 < migration(S) < 1:

migration(S) = |{t ∈ T : S_t ≠ S'_t}| / |T|.    (4)
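With a solution encoded as an array mapping each task index to its node (an encoding we assume here purely for illustration), the migration metric amounts to counting changed entries:

```java
public class MigrationMetric {
    // migration(S) = |{t in T : S_t != S'_t}| / |T|
    // cur[t] and next[t] hold the node assigned to task t before and
    // after the proposed load balancing step.
    static double migration(int[] cur, int[] next) {
        int moved = 0;
        for (int t = 0; t < cur.length; t++)
            if (cur[t] != next[t]) moved++;
        return (double) moved / cur.length;
    }

    public static void main(String[] args) {
        int[] cur  = {0, 0, 1, 1};
        int[] next = {0, 1, 1, 0};   // tasks 1 and 3 migrate
        System.out.println(migration(cur, next)); // prints 0.5
    }
}
```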
Fig. 1. The general scheme of load balancing based on EO with guided state changes.
The load imbalance detection returns I ∈ {true, false} based on the imbalance metric:

imbalance(S) =
    1,                               when an unloaded node n ∈ N exists,
    deviation(S) / (2 · |N| · WT),   otherwise,    (5)

where

deviation(S) = Σ_{n=0,…,|N|−1} |nwp(S, n)/powerCPU(n) − WT|,    (6)

WT = Σ_{t∈T} wp(t) / Σ_{n=0,…,|N|−1} powerCPU(n),    (7)

nwp(S, n) = Σ_{t∈T: S_t = n} wp(t).
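A minimal sketch of the non-degenerate branch of the imbalance metric, assuming the per-node aggregated loads nwp(S, n) are already available as an array (names and encoding are ours):

```java
public class Imbalance {
    // WT: ideal load per unit of computing power; deviation: accumulated
    // distance of each node's power-normalized load from WT; the result
    // normalizes the deviation into [0, 1].
    static double imbalance(double[] nodeLoad, double[] powerCPU) {
        int nNodes = nodeLoad.length;
        double totalWork = 0, totalPower = 0;
        for (int n = 0; n < nNodes; n++) {
            totalWork += nodeLoad[n];
            totalPower += powerCPU[n];
        }
        double wT = totalWork / totalPower;
        double deviation = 0;
        for (int n = 0; n < nNodes; n++)
            deviation += Math.abs(nodeLoad[n] / powerCPU[n] - wT);
        return deviation / (2 * nNodes * wT);
    }

    public static void main(String[] args) {
        double[] load  = {8, 0};   // all work on node 0
        double[] power = {1, 1};   // two equal nodes
        System.out.println(imbalance(load, power)); // prints 0.5
    }
}
```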
The comparison function used to order two candidate nodes n1 and n2 for a task j is:

cmp(n1, n2) =
    sgn(loaddev(n1) − loaddev(n2)),         when loaddev(n1) ≠ loaddev(n2),
    sgn(attrext(j, n2) − attrext(j, n1)),   otherwise,    (8)

where

loaddev(n) = nwp(S, n)/powerCPU(n) − WT,    (9)

attrext(j, n) = Σ_{e∈T_n} com(j, e),    (10)

with T_n denoting the set of tasks placed on node n.
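One plausible reading of Eq. (8) as a Java comparator puts less loaded nodes first and breaks ties by the stronger external attraction of task j (a sketch only; the direction of the ordering and the precomputed loaddev/attrext arrays are our assumptions):

```java
import java.util.Arrays;
import java.util.Comparator;

// Ordering of candidate migration targets for a task j: primarily by
// load deviation (less loaded first), with external attraction
// attrext(j, n) breaking ties (stronger attraction first).
public class TargetOrder {
    static Integer[] order(double[] loaddev, double[] attrext) {
        Integer[] nodes = new Integer[loaddev.length];
        for (int n = 0; n < nodes.length; n++) nodes[n] = n;
        Arrays.sort(nodes, Comparator
                .comparingDouble((Integer n) -> loaddev[n])
                .thenComparingDouble(n -> -attrext[n]));
        return nodes;
    }

    public static void main(String[] args) {
        double[] loaddev = {0.3, -0.2, -0.2};
        double[] attrext = {0.0, 1.0, 5.0};
        System.out.println(Arrays.toString(order(loaddev, attrext)));
        // prints [2, 1, 0]: nodes 1 and 2 are equally underloaded,
        // but node 2 attracts task j more strongly
    }
}
```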
Table 1
The control parameters of the EO algorithm variants and the ranges of their values.

Parameter          Range     Studied values                    Selected value
…1                 [0, 1]    0.25 … 0.75                       0.5
…2                 [0, 1]    0.5                               0.5
…                  [0, 1]    0.5, 0.6, 0.75, 0.95              0.75
…                  [0, 1]    0.25, 0.18, 0.13, 0.05            0.13
…                  [0, 1]    0.25, 0.22, 0.17, 0.05            0.17
…                  [0, ∞]    0.75, 1.5, 3.0                    1.5
…                  [0, ∞]    0.08, 0.15, 0.5, 1.0, 2.0, 4.0    0.5
Niter              {1, …}    30 … 750                          512
N                  {1, …}    2, 4, 8, 16, 32
P                  {1, …}    1, 2, 4, 8

Fitness function   Range
…(S)               [0, 1]
…(S)               [0, 1]
In inner loops of the PEO-A version, the selected solution components are improved using a random selection of a new component value. For each improved solution, the global fitness function is used in the definition of the starting solution for the next outer loop iterations.
When iterations in all P parallel EO branches are completed, we have 5 variants of the rules which govern the decision on which solutions will be improved in inner loops during the subsequent iteration of the outer loop, i.e., in a series of iterations of EO in P parallel branches of the algorithm. Three of them (PEO-GS-C, PEO-GS-D, PEO-GS-E) include a single-point crossover on so-far solutions at stochastically selected points.
Variants PEO-A and PEO-GS-A (without crossover) select only one globally best solution Sbest_p produced during the previous outer loop iteration to be next distributed to P parallel branches
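The single-point crossover used by the variants with crossover can be sketched on task-to-node assignment vectors as follows (the vector encoding and method names are our assumptions; only the cut-point choice is stochastic):

```java
import java.util.Arrays;
import java.util.Random;

// Single-point crossover of two task-to-node assignment vectors at a
// randomly chosen cut point: the child keeps the prefix of parent a
// and takes the suffix of parent b.
public class AssignmentCrossover {
    static int[] crossover(int[] a, int[] b, int cut) {
        int[] child = Arrays.copyOf(a, a.length);
        for (int t = cut; t < b.length; t++) child[t] = b[t];
        return child;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        int[] best  = {0, 0, 1, 1, 2, 2};
        int[] other = {2, 2, 1, 0, 0, 1};
        int cut = 1 + rnd.nextInt(best.length - 1); // stochastic cut point
        System.out.println(Arrays.toString(crossover(best, other, cut)));
    }
}
```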
6. Experimental results
The goal of the experiments was to compare the presented variants of parallel EO-GS to classic (sequential) EO and EO-GS. The
experimental results have been obtained by simulated execution
of application programs in a distributed system, namely in a cluster of computing nodes (processors) interconnected by a message
passing network. For this, a discrete event based simulator was used which was designed on top of the DEVS formalism (discrete event system specification) [17]. The processors executing programs, including their data communication, were modelled using the DEVS formalism. The experiments were performed using a cluster of Intel i7-based workstations (i7-2600, 3.40 GHz, 8 hardware threads), under control of the Linux operating system.
The DEVS program execution simulator was running as a thread on an Intel workstation. During the experiments, in parallel with the simulated distributed execution of an exemplary application graph, a dynamic load balancing algorithm was performed based on the EO approach. The load balancing algorithm was designed as a number of threads executed inside a load balancing controller which was a DEVS module. Simulated computing nodes were periodically reporting their loads to the load balancing controller and then, depending on the states of the system and the application, appropriate actions were undertaken, including activation of EO-based load balancing actions. Each load balancing experiment with a parallel version of the EO algorithm was run in such a way that parallel branches of the algorithm were executed as threads inside the DEVS load balancing controller module. The EO parallel threads were running on separate hardware threads of a workstation. The experiments were executed in parallel on workstations of the cluster we used. The reference sequential EO algorithms were run as single threads executed on separate cores of workstations inside the cluster of workstations.
The DEVS-based simulator and the load balancing algorithms
including the EO approach were written in Java, with thread-based
parallelization for multicore machines. Source codes of the software used for the experiments can be made available to readers
upon request.
The assumed simulated model of program execution corresponds to parallelization based on message-passing, using the MPI library for communication. The MPI mechanism is simulated using DEVS, including communication contentions, which are modelled
at the level of the network interface of each computing node. In our
experiments, we used exemplary application graphs, which were
randomly generated in such a way that program tasks were set in
a number of phases. Tasks in a phase could communicate. At the
boundaries between phases there was also a global exchange of
data, Fig. 3. To generate a random application, we set the number of phases, the number of tasks in each phase, the precedence
relationship between phases and the lower and the upper limit
for duration of each phase. Then, the tasks of each phase are created from code and communication blocks of uniformly distributed
random weight, or transfer time, respectively, again within some
Table 2
The number of iterations and evaluations of the global fitness function in parallel variants of the EO algorithm.

Algorithm             P    Nouter   Ninner   Niter   Evaluations of the global fitness function
PEO-GS-D, PEO-GS-E    8    4        14       448     504
                      4    8        15       480     528
                      2    16       15       480     512
PEO-GS-C              8    4        16       512     520
                      4    8        16       512     528
                      2    16       15       480     512
PEO-GS-A, PEO-GS-B    8    4        16       512     512
                      4    8        16       512     512
                      2    16       16       512     512
memory. In order to obtain comparable results, we tested performance with standard and parallelized EO applied to load balancing for a fixed search space in all versions of EO algorithms [47], equivalent to about 512 global fitness function evaluations. To accomplish this requirement, the number of iterations Niter has been set to a different value for each presented version of EO, see Table 2. For population-based parallel versions, the numbers of iterations in inner loops depended on the number of parallel branches in the algorithm and the number of additional global fitness evaluations performed in the exchange phase. The assumed exchange rate of solutions between parallel branches was every 14, 15, or 16 inner iterations. For example, when P = 2, Niter was equal to 480 for the PEO-GS-C, PEO-GS-D and PEO-GS-E variants, and equal to 512 for PEO-GS-A and PEO-GS-B. For sequential EO algorithms, we always set the number of iterations Niter to 512.
Fig. 4(a) and (b) shows the average irregular application parallel speedup (against sequential execution) and the applications' parallel speedup improvement due to load balancing based on different versions of EO, for the number of computing nodes in an application set to 2, 4, 8, 16 and 32. The reference for the speedup improvement was the speedup with load balancing based on the standard sequential EO. Load balancing with the PEO-A algorithm (based on single executions of standard EO in parallel branches in the scheme from Fig. 2) and with other parallel versions of EO-GS, for numbers of computing nodes in the application up to 16, produced no meaningful speedup improvement in this experiment (see Fig. 4(b)). This is because the standard sequential EO was sufficient for finding good task migrations for these smaller numbers of processor nodes in the application execution. For 32 computing nodes in the application execution, population-based parallelization of EO combined with the guided search approach (algorithms PEO-GS-A, -B, -C, -D, -E) gives meaningful speedup improvements due to repeated selection/exchange of the best solutions among parallel branches of the algorithm and finding better task migrations in load balancing. For 32 computing nodes used in the applications, we achieved an average application speedup improvement with different parallel versions of EO-GS of about 17%. We can see that the application of standard crossover (algorithms PEO-GS-C, -D, -E) in solution selection did not result in any spectacular speed-up increase. The reason was the relatively small number of iterations of EO, compared with the relatively high iteration numbers applied in genetic algorithms. For 32 computing nodes, the speedup improvement with parallel EO-GS (PEO-GS-A) was better by about 4% than the improvement with sequential EO-GS (EO-GS). These improvements needed no additional computations, since all algorithms performed the same search work. So, an increase of the iteration number was replaced by a widening of the search area using parallel algorithm branches and best-solutions exchange.
We also investigated changes of the migration number for different load balancing algorithms in irregular applications, Fig. 5. Except for PEO-A, population-based parallel EO-GS algorithms (PEO-GS-A, -B, -C, -D, -E) achieved big reductions of the migration number. The reduction, in the range 20-40%, is higher for all cases than the reduction
Fig. 4. Irregular application average speedup (a) and average speedup improvement (b) with parallelized EO-GS against sequential EO versus computing node number.
which prepares the data for the parallelized EO. For a larger number of hardware threads (P = 4), the average parallel efficiency, measured but not shown as a graph, was from 32% for 512 iterations to about 50% for 4096 iterations (we do not cover the results of using 8 threads due to non-equivalent execution on 4-core i7 CPUs under HyperThreading).
Our experiments have revealed that the population-type parallelized EO-GS methods are able to find load balancing solutions of high quality for irregular applications, both in terms of application speedup and the migration number. This positive result is consistent for all tested irregular applications. Thus, all population-based EO-GS variants (except PEO-A) give viable load balancing algorithms for irregular applications, providing a promising alternative. They gave parallel applications a speedup improvement by using multicore CPUs without an increase of the iteration number. Since the differences between population-type parallelized EO-GS
Fig. 5. Average migration number change with parallelized EO-GS against sequential EO for irregular applications versus computing node number.
Fig. 6. Regular application average speedup improvement (a) and migration number change (b) with parallelized EO-GS against sequential EO versus computing node
number.
Table 3
Speedup: average rankings of the algorithms.

Algorithm    Friedman   Aligned Friedman   Quade
PEO-GS-B     3.372      2943.491           3.311
PEO-GS-E     3.380      2989.804           3.337
PEO-GS-D     3.412      2982.654           3.296
PEO-GS-A     3.445      3071.527           3.332
PEO-GS-C     3.450      3025.880           3.337
PEO-A        3.941      3889.645           4.386

Statistic    71.694     895.922            1082.224
p-Value      0.000      0.000              0.000
algorithms. The last two rows show the statistic and the p-value for each test, respectively. For the Friedman and Aligned Friedman tests the statistic is distributed according to chi-square with 5 degrees of freedom, whereas for the Quade test it is distributed according to the F-distribution with 5 and 5245 degrees of freedom.
In each of the three tests, the lower the value for an algorithm, the better the algorithm is. The three tests give different results. In the Friedman test PEO-GS-B turns out to be the best optimization algorithm, and PEO-GS-E is the runner-up, whereas in the Aligned Friedman test PEO-GS-B is the winner followed by PEO-GS-D, and, finally, in the Quade test PEO-GS-D is the best and PEO-GS-B is the runner-up. For each test the best performance is shown in bold in the table. For all the tests PEO-A achieves the worst ranking.
Furthermore, with the aim to examine if some hypotheses of equivalence between the best performing algorithm and the other
Fig. 7. Strongly irregular application average speedup (a) and average speedup improvement (b) with parallelized EO-GS against sequential EO versus computing node
number.
Fig. 8. Average migration number change with parallelized EO-GS against sequential EO for strongly irregular applications versus computing node number.
Fig. 9. Average execution time of EO and EO-GS for 512 iterations versus the number of tasks in exemplary applications.
Fig. 10. Average execution time of EO versus the number of tasks and computing nodes in applications under load balancing.
Fig. 11. Average parallel efficiency of PEO-GS-A (P = 2) for different numbers of tasks in applications versus the number of EO-GS iterations.
Table 4
Speedup: results of post-hoc procedures for the Friedman test over all tools (at α = 0.05). Control method: PEO-GS-B.

i   Algorithm   z = (R0 − Ri)/SE   p       Bonferroni   Holm    Holland   Rom     Finner   Li
5   PEO-A       6.975              0.000   0.010        0.010   0.010     0.011   0.010    0.004
4   PEO-GS-C    0.956              0.339   0.010        0.013   0.013     0.013   0.020    0.004
3   PEO-GS-A    0.892              0.372   0.010        0.017   0.017     0.017   0.030    0.004
2   PEO-GS-D    0.490              0.624   0.010        0.025   0.025     0.025   0.040    0.004
1   PEO-GS-E    0.099              0.921   0.010        0.050   0.050     0.050   0.050    0.050
Th                                         0.010        0.013   0.013     0.011   0.020    0.004
Table 5
Speedup: results of post-hoc procedures for the Aligned Friedman test over all tools (at α = 0.05). Control method: PEO-GS-B.

i   Algorithm   z = (R0 − Ri)/SE   p       Bonferroni   Holm    Holland   Rom     Finner   Li
5   PEO-A       11.919             0.000   0.010        0.010   0.010     0.011   0.010    0.020
4   PEO-GS-A    1.613              0.107   0.010        0.013   0.013     0.013   0.020    0.020
3   PEO-GS-C    1.038              0.299   0.010        0.017   0.017     0.017   0.030    0.020
2   PEO-GS-E    0.583              0.560   0.010        0.025   0.025     0.025   0.040    0.020
1   PEO-GS-D    0.493              0.622   0.010        0.050   0.050     0.050   0.050    0.050
Th                                         0.010        0.013   0.013     0.011   0.020    0.020
Table 6
Speedup: results of post-hoc procedures for the Quade test over all tools (at α = 0.05). Control method: PEO-GS-D.

i   Algorithm   z = (R0 − Ri)/SE   p       Bonferroni   Holm    Holland   Rom     Finner   Li
5   PEO-A       7.315              0.000   0.010        0.010   0.010     0.011   0.010    0.004
4   PEO-GS-E    0.280              0.780   0.010        0.013   0.013     0.013   0.020    0.004
3   PEO-GS-C    0.279              0.780   0.010        0.017   0.017     0.017   0.030    0.004
2   PEO-GS-A    0.245              0.806   0.010        0.025   0.025     0.025   0.040    0.004
1   PEO-GS-B    0.104              0.917   0.010        0.050   0.050     0.050   0.050    0.050
Th                                         0.010        0.013   0.013     0.011   0.020    0.004
Table 7
Speedup: pairwise comparison between the algorithms.

Algorithm                 p       z       Holm    Shaffer
PEO-A vs. PEO-GS-B        0.000   6.975   0.003   0.003
PEO-A vs. PEO-GS-E        0.000   6.876   0.004   0.005
PEO-A vs. PEO-GS-D        0.000   6.485   0.004   0.005
PEO-A vs. PEO-GS-A        0.000   6.083   0.004   0.005
PEO-A vs. PEO-GS-C        0.000   6.019   0.005   0.005
PEO-GS-B vs. PEO-GS-C     0.339   0.956   0.005   0.005
PEO-GS-A vs. PEO-GS-B     0.372   0.892   0.006   0.006
PEO-GS-C vs. PEO-GS-E     0.391   0.857   0.006   0.006
PEO-GS-A vs. PEO-GS-E     0.428   0.793   0.007   0.007
PEO-GS-B vs. PEO-GS-D     0.624   0.490   0.008   0.008
PEO-GS-C vs. PEO-GS-D     0.641   0.467   0.010   0.010
PEO-GS-A vs. PEO-GS-D     0.687   0.402   0.013   0.013
PEO-GS-D vs. PEO-GS-E     0.696   0.391   0.017   0.017
PEO-GS-B vs. PEO-GS-E     0.921   0.099   0.025   0.025
PEO-GS-A vs. PEO-GS-C     0.949   0.064   0.050   0.050
Th                                        0.005   0.005
Table 8
Migrations: average rankings of the algorithms.

Algorithm    Friedman   Aligned Friedman   Quade
PEO-GS-C     3.205      2775.854           3.090
PEO-GS-B     3.285      2883.107           3.238
PEO-GS-E     3.306      2930.587           3.280
PEO-GS-A     3.311      2929.703           3.308
PEO-GS-D     3.343      3066.721           3.361
PEO-A        4.550      4317.029           4.723

Statistic    399.751    906.230            1073.112
p-Value      0.000      0.000              0.000
Table 9
Migrations: results of post-hoc procedures for the Friedman test over all tools (at α = 0.05). Control method: PEO-GS-C.

i   Algorithm   z = (R0 − Ri)/SE   p       Bonferroni   Holm    Holland   Rom     Finner   Li
5   PEO-A       16.464             0.000   0.010        0.010   0.010     0.011   0.010    0.035
4   PEO-GS-D    1.685              0.092   0.010        0.013   0.013     0.013   0.020    0.035
3   PEO-GS-A    1.295              0.195   0.010        0.017   0.017     0.017   0.030    0.035
2   PEO-GS-E    1.236              0.216   0.010        0.025   0.025     0.025   0.040    0.035
1   PEO-GS-B    0.980              0.327   0.010        0.050   0.050     0.050   0.050    0.050
Th                                         0.010        0.013   0.013     0.011   0.020    0.035
Table 10
Migrations: results of post-hoc procedures for the Aligned Friedman test over all tools (at α = 0.05). Control method: PEO-GS-C.

i   Algorithm   z = (R0 − Ri)/SE   p       Bonferroni   Holm    Holland   Rom     Finner   Li
5   PEO-A       19.415             0.000   0.010        0.010   0.010     0.011   0.010    0.043
4   PEO-GS-D    3.664              0.000   0.010        0.013   0.013     0.013   0.020    0.043
3   PEO-GS-E    1.949              0.051   0.010        0.017   0.017     0.017   0.030    0.043
2   PEO-GS-A    1.938              0.053   0.010        0.025   0.025     0.025   0.040    0.043
1   PEO-GS-B    1.351              0.177   0.010        0.050   0.050     0.050   0.050    0.050
Th                                         0.010        0.017   0.017     0.013   0.030    0.043
Table 11
Migrations: results of post-hoc procedures for the Quade test over all tools (at α = 0.05). Control method: PEO-GS-C.

i   Algorithm   z = (R0 − Ri)/SE   p       Bonferroni   Holm    Holland   Rom     Finner   Li
5   PEO-A       10.961             0.000   0.010        0.010   0.010     0.011   0.010    0.036
4   PEO-GS-D    1.821              0.069   0.010        0.013   0.013     0.013   0.020    0.036
3   PEO-GS-A    1.466              0.143   0.010        0.017   0.017     0.017   0.030    0.036
2   PEO-GS-E    1.274              0.202   0.010        0.025   0.025     0.025   0.040    0.036
1   PEO-GS-B    0.993              0.321   0.010        0.050   0.050     0.050   0.050    0.050
Th                                         0.010        0.013   0.013     0.011   0.020    0.036
best performing one, that is taken as the control method, and each
algorithm is compared against this latter with the aim to investigate
whether or not the equivalence hypothesis can be rejected.
All the post-hoc procedures applied to the Friedman test cannot reject the hypothesis of statistical equivalence between PEO-GS-C and PEO-GS-B, whereas they all can reject that between PEO-GS-C and PEO-A. All the procedures apart from Li cannot reject the equivalence of PEO-GS-C with PEO-GS-E and PEO-GS-A.
For the Aligned Friedman test all the post-hoc procedures cannot reject the hypothesis of statistical equivalence between PEO-GS-C and PEO-GS-B, whereas they all can reject that between PEO-GS-C and PEO-A and that between PEO-GS-C and PEO-GS-D. All the procedures apart from Li cannot reject the equivalence of PEO-GS-C with PEO-GS-A.
For the Quade test all the post-hoc procedures cannot reject
the hypothesis of statistical equivalence between PEO-GS-C and
Table 12
Migrations: pairwise comparison between the algorithms.

Algorithm                 p       z        Holm    Shaffer
PEO-A vs. PEO-GS-C        0.000   16.464   0.003   0.003
PEO-A vs. PEO-GS-B        0.000   15.484   0.004   0.005
PEO-A vs. PEO-GS-E        0.000   15.228   0.004   0.005
PEO-A vs. PEO-GS-A        0.000   15.169   0.004   0.005
PEO-A vs. PEO-GS-D        0.000   14.779   0.005   0.005
PEO-GS-C vs. PEO-GS-D     0.092   1.685    0.005   0.005
PEO-GS-A vs. PEO-GS-C     0.195   1.295    0.006   0.006
PEO-GS-C vs. PEO-GS-E     0.216   1.236    0.006   0.006
PEO-GS-B vs. PEO-GS-C     0.327   0.980    0.007   0.007
PEO-GS-B vs. PEO-GS-D     0.480   0.706    0.008   0.008
PEO-GS-D vs. PEO-GS-E     0.653   0.449    0.010   0.010
PEO-GS-A vs. PEO-GS-D     0.696   0.391    0.013   0.013
PEO-GS-A vs. PEO-GS-B     0.753   0.315    0.017   0.017
PEO-GS-B vs. PEO-GS-E     0.797   0.257    0.025   0.025
PEO-GS-A vs. PEO-GS-E     0.953   0.058    0.050   0.050
Th                                         0.005   0.005
PEO-GS-B, whereas they all can reject that between PEO-GS-C and PEO-A. All the procedures apart from Li cannot reject the equivalence of PEO-GS-C with PEO-GS-E and PEO-GS-A.
Finally, Table 12 shows the pairwise comparison between these six algorithms in terms of migrations. Two post-hoc procedures, i.e., those by Holm and Shaffer, are considered here.
The equivalence between PEO-A and any other algorithm is rejected by both procedures, so PEO-A is statistically inferior to all the others in terms of migration cost. Also the equivalence between PEO-GS-C and PEO-GS-D is rejected by both procedures. The equivalence hypothesis between any pair of algorithms from the set {PEO-GS-A, PEO-GS-B, PEO-GS-C, PEO-GS-E}, instead, is not rejected by either procedure.
have presented the results of statistical tests applied to experimentally measured application execution speedups and migration numbers. Additionally, we have checked that the proposed EO-GS algorithms show very good scalability for an increasing number of tasks and computing units appearing in the load balancing problem.
The general conclusion resulting from our experiments is that the extension of the solution search space in EO-GS through parallel mutations in a population of multiple candidate solutions, even in its modest form of 8 parallel branches, provides satisfactory results. The obtained experimental results confirm that the application of parallel EO for load balancing is successful as an evolutionary strategy based on processing of best solutions in the optimization process by improvements of the fitness-supported selection of the worst components. The EO method, which is based on improvements of a single solution representation, has low memory and operational complexities, which is strongly convergent with the requirements of the load balancing problem. We were able to obtain satisfactory quality of processor load balancing without increasing the number of sequential iterations of EO, but using the parallelized computing power of multicore CPUs. An additional benefit of processing many parallel state changes in EO is a reduction in the number of task migrations needed to balance processor loads. The best results, in terms of improved application parallel speedup with a reduced number of migrations, have been obtained for irregular applications.
The statistical analysis of the tested algorithms for the obtained irregular application speedups has revealed that the variants PEO-GS-B and PEO-GS-D provided the theoretically best results over the tested algorithms, however with a very small predominance. The variant PEO-GS-B was based on the solutions produced by the parallel branch with the best average quality of solutions over all iterations. The variant PEO-GS-D was based on the set of crossovers of the globally best solution obtained from so-far iterations with P − 1 other solutions from the preceding iterations. In terms of the migration number reduction for irregular applications, the variant PEO-GS-C (based on broadcast of the best crossover of the best solution obtained from so-far iterations with a randomly selected final solution in parallel branches) turned out to be the best, with a similarly slight predominance. For regular parallel applications, in which task execution and communication patterns are predictable, the proposed EO-GS algorithms were able to give very small speedup improvements compared with standard EO, however with a strongly reduced number of task migrations, thus lowering the cost of load balancing.
Our further research on the proposed parallel EO-based application load balancing methods will concern tuning the proposed algorithms to the properties of the application graphs under optimization. In this research, we would like to compare the effects of such tuning with the solutions obtained using mathematical programming methods and multi-objective optimization. A multi-objective approach could be a fruitful continuation of the research presented in this paper. In fact, in the current paper we have assumed problem solving with the use of one global fitness function. Yet, as has been described, it is actually a linear combination of three functions, i.e. attrexttotal, migration and imbalance. Consequently, we can take a multi-objective approach and define these three functions as our objectives. Some approaches to multiobjective EO versions can already be found in the literature, e.g. [51,52]. An additional aspect of this research will be the assessment of the performance of the algorithms for a larger number of computing nodes.
Appendix A. Statistical tests introduction
A very basic explanation of the statistical tests reported in
Section 7 is given here, for a case in which np problems and nA
algorithms are considered.
For each problem i and algorithm j, a value S_i^j is computed as: S_i^j = Q_i · (r_i^j − (nA + 1)/2), where r_i^j is the rank of algorithm j within problem i and Q_i weights problem i. By doing so, this value takes into account both the relative importance of each observation within the problem, and the relative significance of the problem the observation refers to. Finally, a value S_j computed as S_j = Σ_{i=1,…,np} S_i^j, for j = 1, 2, …, nA, can be assigned to each algorithm.
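The weighted rank sums can be illustrated with a tiny numeric example (the Q_i weights below are arbitrary; a real Quade test derives them from the sample ranges of the problems):

```java
// Weighted rank sums in the style of the Quade test: within each problem i,
// algorithm j gets rank r_ij; the problem weight Q_i scales the centred
// rank (r_ij - (nA + 1)/2), and the sums S_j are accumulated per algorithm.
public class QuadeRanks {
    static double[] weightedSums(double[][] rank, double[] q) {
        int np = rank.length, nA = rank[0].length;
        double[] s = new double[nA];
        for (int i = 0; i < np; i++)
            for (int j = 0; j < nA; j++)
                s[j] += q[i] * (rank[i][j] - (nA + 1) / 2.0);
        return s;
    }

    public static void main(String[] args) {
        double[][] rank = {{1, 2, 3}, {1, 3, 2}}; // ranks of 3 algorithms on 2 problems
        double[] q = {1, 2};                      // illustrative problem weights Q_i
        System.out.println(java.util.Arrays.toString(weightedSums(rank, q)));
        // prints [-3.0, 2.0, 1.0]: the first algorithm has the best (lowest) sum
    }
}
```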
References
[1] R.Z. Khan, J. Ali, Classication of task partitioning and load balancing
strategies in distributed parallel computing systems, Int. J. Comput. Appl. 60
(17) (2012) 48-53.
[2] Y. Jiang, A survey of task allocation and load balancing in distributed systems,
IEEE Trans. Par. Distrib. Syst. (2016), http://dx.doi.org/10.1109/TPDS.2015.
2407900.
[3] K. Barker, N. Chrisochoides, An evaluation of a framework for the dynamic
load balancing of highly adaptive and irregular parallel applications, in:
Proceedings of the ACM/IEEE Conference on Supercomputing, Phoenix, USA,
ACM Press, 2003, p. 45.
[4] M. Zakarya, N. Dilawar, N. Khan, A survey on energy efficient load balancing algorithms over multicores, Int. J. Res. Comput. Appl. Inf. Technol. 1 (1) (2013) 60-68.
[5] S.V. Pius, T.S. Shilpa, Survey on load balancing in cloud computing, in:
Proceedings of the International Conference on Computing, Communication
and Energy Systems (ICCCES), Kerala, India, 2014, p. 277.
[6] T. Desai, J. Prajapati, A survey of various load balancing techniques and
challenges in cloud computing, Int. J. Sci. Technol. Res. 2 (11) (2013) 158-161.
[7] N. Patel, N. Chauhan, A survey on load balancing and scheduling in cloud
computing, IJIRST Int. J. Innov. Res. Sci. Technol. 1 (7) (2014) 185-189.
[8] M. Mishra, S. Agarwal, P. Mishra, S. Singh, Comparative analysis of various
evolutionary techniques of load balancing: a review, Int. J. Comput. Appl. 63
(15) (2013) 8-13.
[9] S. Boettcher, A.G. Percus, Extremal optimization: methods derived from
coevolution, in: Proceedings of the Genetic and Evolutionary Computation
Conference, San Francisco, CA, Morgan Kaufmann, 1999, pp. 825-832.
[10] K. Sneppen, P. Bak, H. Flyvbjerg, M.H. Jensen, Evolution as a self-organized
critical phenomenon, Proc. Natl. Acad. Sci. U. S. A. 92 (1995) 5209-5213.
[11] M. Eusuff, K. Lansey, F. Pasha, Shuffled frog-leaping algorithm: a memetic meta-heuristic for discrete optimization, Eng. Optim. 38 (2) (2006) 129-154.
[12] S.P. Bradley, A.C. Hax, T.L. Magnanti, Applied Mathematical Programming,
Addison-Wesley Publishing Company, Reading, MA, 1977.
[13] I. De Falco, E. Laskowski, R. Olejnik, U. Scafuri, E. Tarantino, M. Tudruj, Load
balancing in distributed applications based on extremal optimization, in:
EvoCOMNET 2013, Vienna, Austria, Lect. Notes Comput. Sci. 7835, Springer,
2013, pp. 52-61.
[14] I. De Falco, E. Laskowski, R. Olejnik, U. Scafuri, E. Tarantino, M. Tudruj,
Extremal optimization with guided state changes in load balancing of
[15]
[16]
[17]
[18]
[19]
[20]
[21]
[22]
[23]
[24]
[25]
[26]
[27]
[28]
[29]
[30]
[31]
[32]
[33]
[47] M. Črepinšek, S.-H. Liu, M. Mernik, Replication and comparison of computational experiments in applied evolutionary computing: common pitfalls and guidelines to avoid them, Appl. Soft Comput. 19 (2014) 161-170.
[48] J. Demšar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. 7 (2006) 1-30.
[49] J. Derrac, S. García, D. Molina, F. Herrera, A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms, Swarm Evol. Comput. 1 (2011) 3-18.
[50] J. Parejo, J. García, A. Ruiz-Cortés, J.C. Riquelme, STATService: Herramienta de análisis estadístico como soporte para la investigación con Metaheurísticas, in: Actas del VIII Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bio-inspirados, 2012.
[51] M. Chen, Y. Lu, G. Yang, Multiobjective extremal optimization with applications to engineering design, J. Zhejiang Univ. Sci. A 8 (12) (2007) 1905-1911.
[52] I. De Falco, A. Della Cioppa, D. Maisto, U. Scafuri, E. Tarantino, A multiobjective extremal optimization algorithm for efficient mapping in grids, in: J. Mehnen, M. Köppen, A. Saad, A. Tiwari (Eds.), Applications of Soft Computing, Advances in Intelligent and Soft Computing 58, Springer, 2009, pp. 367-377.