Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Article history Abstract: The GeantV R&D approach is revisiting the standard particle
Received: 21-09-2018 transport simulation approach to be able to benefit from “Single Instruction,
Revised: 08-12-2018 Multiple Data” (SIMD) computational architectures or extremely parallel
Accepted: 7-01-2019 systems like coprocessors and GPUs. The goal of this work is to develop a
Corresponding Author:
mechanism for optimizing the programs used for High-Energy Physics
Oksana Shadura (HEP) particle transport simulations using a “black-box” optimization
CERN, Geneve, Switzerland approach. Taking in account that genetic algorithms are among the most
Email: oksana.shadura@cern.ch widely used “black-box” optimization methods, we analyzed a simplified
ksu.shadura@gmail.com model that allows precise mathematical definition and description of the
genetic algorithm. The work done in this article is focused on the studies of
evolutionary algorithms and particularly on stochastic optimization
algorithms and unsupervised machine learning methods for the
optimization of the parameters of the GeantV applications.
© 2019 Oksana Shadura, Federico Carminati and Anatoliy Petrenko. This open access article is distributed under a Creative
Commons Attribution (CC-BY) 3.0 license.
Oksana Shadura et al. / Journal of Computer Science 2019, 15 (1): 57.66
DOI: 10.3844/jcssp.2019.57.66
injected “operators” in genetic algorithms and if, thanks A Markov chain is defined by a transition matrix Tq , p
to them, it is possible to speed up the process of finding from the population p to q , where the vector describing
a Pareto front.
the population is defined as:
Materials and Methods
During the work on optimization of computing
{ p = ( p , p ,…, p ) ,
1 2 m
t
0 ≤ pα ≤ 1, ∑
m
α =1 }
pα = 1 , (2)
58
Oksana Shadura et al. / Journal of Computer Science 2019, 15 (1): 57.66
DOI: 10.3844/jcssp.2019.57.66
threads used in simulation or the size of the vectors of and for the matrix Wi , j = {w j } = ( wi ) j
we have the
transported particles etc. These can be described via a
orthogonality condition:
data matrix of size m × n:
Wˆ t ⋅ Wˆ = Iˆ. (8)
Xˆ = { X α ,i } = {( xα )i } = { xα } , (3)
1 ɶt Θ ɶ
Tˆ = Xˆ t ⋅ Xˆ t . (4) Θ i ,α α , j = δ i , j .
m
if w j
are eigenvectors of the matrix T̂ with the Using (12), (11) and (7) we see that Θɶ α , j = θɶj is the {}
corresponding eigenvalues tj:
matrix of the eigenvectors (θɶj )α of the matrix Kɶ = Xˆ ⋅ Xˆ t
Tˆ ⋅ w j = t j w j ,1 ≤ j ≤ n. (5) of size m × m:
wit ⋅ w j = δ i , j ,1 ≤ i, j ≤ n. (6)
From (10) we can derive the representation for the
Then: data matrix X̂ :
59
Oksana Shadura et al. / Journal of Computer Science 2019, 15 (1): 57.66
DOI: 10.3844/jcssp.2019.57.66
and the SVD representation of the data matrix is then selection operator selects genetic individuals from the
obtained: current population via the performance indicators,
f = { fα } ∈ R m , fα = f(α), α∈Ω:
ɶ ∆1/ 2W t .
X α ,i = mΘ (14)
α ,k k , j j ,i
diag ( f ) ⋅ p
F ( p) =
In the case when the matrix of non-central second t ,
f ⋅p
moments T̂ has the (n-q) smallest eigenvalues such that
tj << 1, q +1≤j≤n, we can use the “eigenvalue control
where, diag ( f ) is a diagonal matrix.
parameter” approximation of the data matrix and
approximate it by the output data matrix Xɶ α , j of rank q: The mutation operator
U : Λ → Λ,
ɶ ∆ɶ 1/ 2W t
Xɶ α ,i = mΘ α ,k k , j j ,i
(15)
( ɶ W t + ... + t1/ 2Θ
= m t11/ 2Θα ,1 1, i q
ɶ Wt ,
α , q q ,i ) is an m × m matrix with the (α,β)-th entry uα,β >0 for all
α, β and uα,β indicates the probability of individual β∈Ω
where, the eigenvalue matrix ∆ɶ k , j has rank q(tq+1 = tq+2 = mutating into α∈Ω. (U ⋅ p )α is the probability for an
⋅⋅⋅ = tn = 0). This approximation is the analog of the individual (type α) to appear after the mutation process
Hotelling transformation (Hotelling, 1936). is applied to the population p (Amadio et al., 2017).
We can estimate the mean square error ηq for the The crossover operator is defined as:
approximation (15):
C : Λ → Λ,
1 m n 2
ηq = ∑∑
mn α =1 i =1
( X α ,i − Xɶ α ,i )
(
C ( p ) = pt ⋅ Cˆ1 ⋅ p,..., p t ⋅ Cˆ m ⋅ p
)
2
1 m n n
ɶ W t = 1
n
= ∑∑ m ∑ tk Θ α , k k ,i ∑ .tk where, Cˆ1 ,..., Cˆ m is a sequence of symmetric non-negative
mn α =1 i =1 k = q +1 n k = q +1
N × N real-valued matrices. Cα ( p ) expresses
The minimum error is achieved if the matrix T̂ has probability that an individual α appears after the
crossover process is applied to the population p .
the (n-q) smallest eigenvalues such that tj << 1, q
+1≤j≤n.
At this point having already defined the UPCA G : Λ → Λ,
(16)
operator, we need to describe the genetic algorithm G ( p ) = C U F ( p ).
model for the injection of the operator we introduced
earlier. Recalling ideas presented in (Vose, 1999) and
Having defined G, we can express the stochastic
described in the previous section, we will develop the transition matrix, based on the probability of
optimization process (Amadio et al., 2017). transformation of the population p into q :
The GA operators are operating in the space Λ = (p1,
p2,⋅⋅⋅, pm)t of the population’s vectors.
( G ( p ))
mqα
The genetic operator Gα ( p ) is defined by the
= m!∏α ∈Ω
α
Tq, p
(17)
probability of generating an individual α (starting from ( mqα )!
the previous population p ). We define a map G: Λ→Λ,
function and Ω is the sample space. The map G is individual α in the next generation and mqα is the number
equivalent to the composition of selection, mutation and of individuals α in the population q with size m .
crossover maps. The genetic selection operator:
We can improve the convergence rate of genetic
algorithms (how fast genetic algorithms are converging
F : Λ → Λ,
to the optimum per generation) by adding a new genetic
F ( p ) = ∏α∈Ω Fα ( p )
operator P based on UPCA procedure. We will call this
procedure “UPCA noise cleanup operator and we will
defines the probability of a genetic individual of type α introduce it in the genetic algorithm’s map
(when the process of selection is applied to p ∈ Λ ). A
GP ( p ) = P C U F ( p ) .
60
Oksana Shadura et al. / Journal of Computer Science 2019, 15 (1): 57.66
DOI: 10.3844/jcssp.2019.57.66
Since we have a similar setup as it is done in the SGA on the front to which they belong to. Individuals in the
case (Schmitt and Rothlauf, 2001), we can observe that a first front are given a rank value of one, while
genetic algorithm convergence rate is defined by the individuals in the second front are assigned a rank value
eigenvalues that follow the highest one. That’s why we of two and so on. In addition to the rank value, an
had to apply the proposed operator P on the earlier additional parameter called crowding distance is
eigenvalues. calculated for each individual (Shadura, 2017). The
An interesting observation is that the eigenvector with crowding distance is a measure of how close an individual
the largest eigenvalues defines a subspace of solutions for is to its neighbors. Larger average value of crowding
multi-objective problem populating Pareto front. distances indicates significant population diversity.
In the next section we will assess whether using an NSGA-III (Deb et al., 2005) has a different algorithm
iterative procedure for an uncentered data matrix allows schema, built on the idea of improved reference points
us to reach the subspace of optimal solutions faster than selection and using an already defined set of reference
with a centered matrix. We will perform this test for the points to assure diversity for the population.
standard benchmarking problems used to validate The main problem of genetic algorithms is an
performance of genetic algorithms. absence of operators that enhance the convergence to a
global model optimum. We present an improved
Testing of the UPCA-Improved Genetic algorithm (Fig. 1) based on the combination of NSGA-II
Algorithm on NSGA-II and DTLZ and the UPCA operator defined in the previous section.
The DTLZ benchmarks (Vose, 1999) are a set of
Benchmarks numerical multi-objective problems, used for
We use NSGA-II (Allison et al., 2006) as a part of comparison/validation of GA algorithms. We present the
our testing setup since it is one of the most popular comparison of NSGA-II without and with the UPCA
genetic algorithms. It is based on fast non-dominance noise cleanup operator applied to DTLZ.
sorting for generated populations and provides an In Fig. 2 and 3 we present the tuning parameter
impressive convergence rate to the optimal Pareto set. distribution together with the mean and standard
The genetic algorithm population is initialized and it deviation values with and without the UPCA operator. In
is afterwards sorted into a group of fronts using the non- Fig. 3 and 5 the PCA-based noise-cleanup procedure is
domination principle. The first front is treated as a used, while in Fig. 2 and 4 it is not. We can observe a
completely non-dominant set and the individuals of the significant improvement of the convergence speed to the
first front are dominating the individuals of the second ideal values of the parameters in the first case when we
front only and so on until the last front. For each use the UPCA operator. Figure 5 shows the first images
individual, in each front, a rank value is assigned based of an early Pareto front.
2M×N
Crossover,
Mutation
Transition matrix
Transition matrix
61
Oksana Shadura et al. / Journal of Computer Science 2019, 15 (1): 57.66
DOI: 10.3844/jcssp.2019.57.66
200
150
100
50
0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
TGenes/bins
Fig. 2: Distribution of genetic population in the 20th generation of NSGA-II for DTLZ2 problem
Population distribution
PopDist20
z Entries 12000
Mean 0.482
120 Std Dev 0.1369
100
80
60
40
20
0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
TGenes/bins
Fig. 3: Distribution of the genetic population at the 20th generation for NSGA-II with UPCA operator for DTLZ2 problem
62
Oksana Shadura et al. / Journal of Computer Science 2019, 15 (1): 57.66
DOI: 10.3844/jcssp.2019.57.66
Y1/Y2/Y3
h3y20
Entries 2000
Mean x 0.5129
Mean y 0.6449
Mean z 0.6434
Std Dev x 0.2695
Std Dev y 0.255
Std Dev z 0.2393
1
1 0.9
\ 0.8
0.7
0.6
0.5
0.4
0.3
0.2 0
0.1 0.1
0.2
0 0.3
1 0.4
0.9 0.8 0.5
0.7 0.6 0.6
0.7
0.5 0.4 0.8
0.3 0.2 0.9
0.1 0 1
Fig. 4: Visualization of the Pareto Front for the 20th generation of NSGA-II for DTLZ2 problem
Y1/Y2/Y3
h3y20
Entries 1000
1 Mean x 0.5359
Mean y 0.5357
0.9 Mean z 0.5286
Std Dev x 0.2869
0.8
Std Dev y 0.27
0.7 Std Dev z 0.2766
0.6
0.5
0.4
0.3
0.2
0.1
0
0.10.2
0.3
0.4 0.5
0.6
0.70.8 0.9 1
0.9 1 0.6 0.7 0.8
0.3 0.4 0.5
0 0.1 0.2
Fig. 5: Visualization of the Pareto Front at the 40th generation of NSGA-II with UPCA operator for DTLZ2 problem
63
Oksana Shadura et al. / Journal of Computer Science 2019, 15 (1): 57.66
DOI: 10.3844/jcssp.2019.57.66
1600
1400
1200
1000
×10000
400
200
0
Number of idle Number Number of Number of
CPU cycles of cycles instructions branches
Fig. 6: Comparison of number of performance indicators for optimized and non optimized GeantV simulation
160
155
150
Thousands
140
135
Number of primaries events/second
Fig. 7: Comparison of number primaries events for optimized and non optimized GeantV simulations
64
Oksana Shadura et al. / Journal of Computer Science 2019, 15 (1): 57.66
DOI: 10.3844/jcssp.2019.57.66
We have of course carefully verified that the physics analysis helps us to get faster convergence rate using
output does not change by applying stochastic tuning simple noise cleanup techniques based on an
algorithms. Users will be able to collect data on the orthonormal transformation strategy. The performance
performance of their system and tune it while running enhancement reached with this tuning for particle
simulation jobs. transport simulations can significantly help to economize
In this study we demonstrated a proof of concept computing resources, improve scheduling strategies and
tuning of the performance of a complex multithreaded, provide energy economy for computing resources.
highly vectorized application (GeantV) using
evolutionary computation/genetic algorithms. This opens Acknowledgment
the possibility to use the same method to tune complex
The authors would like to thank the CERN for their
applications on supercomputers and High Performance
support in writing this article.
Computing (HPC) clusters. It could also help to analyze
the performance of compute-intensive jobs, running in
heterogeneous environments, with the objective of Author’s Contributions
reaching better scalability. All authors equally contributed in this work.
Conclusion Ethics
The work we presented is focused on the stochastic This article is original and contains unpublished
optimization of highly parallelized applications for material. The authors confirm that are no conflict of
simulation of radiation transport in complex detectors. interest involved.
During the work we introduced a new genetic operator
able to speedup convergence of the algorithm to the References
true Pareto Front.
We have shown that the idea of merging the classical Agostinelli, S., J. Allison, K.A. Amako, J. Apostolakis
implementation of evolutionary algorithms with and H. Araujo et al., 2003. GEANT4 - a simulation
unsupervised machine learning methods can create a toolkit. Nuclear Instruments Meth. Phys. Res., 506:
powerful symbiosis that can efficiently tune the 250-303. DOI: 10.1016/S0168-9002(03)01368-8
performance of complex algorithms depending on a large Allison, J., K. Amako, J. Apostolakis, H. Araujo and P.
number of correlated parameters and it could be easily Arce Dubois et al., 2006. Geant4 developments and
implemented in any framework. The “mutation” of applications. IEEE Trans. Nuclear Sci., 53: 270-278.
genetic algorithm coupled with principal component DOI: 10.1109/TNS.2006.869826
65
Oksana Shadura et al. / Journal of Computer Science 2019, 15 (1): 57.66
DOI: 10.3844/jcssp.2019.57.66
Amadio, G., A. Ananya, J. Apostolakis, A. Arora and M. Rowe, J.E., 2007. Genetic algorithm theory. Proceedings
Bandieramonte et al., 2016. GeantV: from CPU to of the 9th Annual Conference Companion on
accelerators. J. Phys., 762: 012019-012019. Genetic and Evolutionary Computation, Jul. 07-11,
Amadio, G., J. Apostolakis, M. Bandieramonte, S.P. ACM, London, UK, pp: 3585-3608.
Behera and R. Brun et al., 2017. Stochastic DOI: 10.1145/1274000.1274125
optimization of GeantV code by use of genetic Rudolph, G., 1997. Convergence Properties of
algorithms. J. Phys.: Conf. Series, 898: 042026- Evolutionary Algorithms. 1st Edn., Kovac, ISBN-0:
042026. DOI: 10.1088/1742-6596/898/4/042026 3860645544, pp: 286.
Cadima, J. and I. Jolliffe, 2009. On relationships Schmitt, F. and F. Rothlauf, 2001. On the importance of
between uncentred and column-centred principal
the second largest eigenvalue on the convergence
component analysis. Pak. J. Stat., 25: 473-503.
rate of genetic algorithms. Proceedings of the 3rd
Deb, K. and H. Jain, 2014. An evolutionary many-
Annual Conference on Genetic and Evolutionary
objective optimization algorithm using reference-
point-based nondominated sorting approach, part I: Computation, Jul. 07-11, Morgan Kaufmann
Solving problems with box constraints. IEEE Trans. Publishers Inc., pp: 559-564.
Evolut. Comput., 18: 577-601. Shadura, O. and F. Carminati, 2016. Stochastic
DOI: 10.1109/TEVC.2013.2281535 performance tuning of complex simulation applications
Deb, K., A. Pratap, S. Agarwal and T. Meyarivan, 2002. using unsupervised machine learning. Proceedings of
A fast and elitist multiobjective genetic algorithm: the IEEE Symposium Series on Computational
NSGA-II. IEEE Trans. Evolut. Comput., 6: 182-197. Intelligence, Dec. 6-9, IEEE Xplore Press, Athens,
DOI: 10.1109/4235.996017 Greece, pp: 1-8. DOI: 10.1109/SSCI.2016.7850200
Deb, K., L. Thiele, M. Laumanns and E. Zitzler, 2005. Shadura, O., 2017. Performance tuning and monitoring
Scalable Test Problems for Evolutionary using genetic optimization techniques. GeantV
Multiobjective Optimization. In: Evolutionary Spring Sprint.
Multiobjective Optimization, Abraham, A., L. Vose, M.D., 1999. The Simple Genetic Algorithm:
Jain and R. Goldberg (Eds.), Springer, London, Foundations and Theory. 1st Edn., MIT Press,
pp: 105-145. Cambridge, ISBN-10: 026222058X, pp: 251.
Hotelling, H., 1936. Relations between two sets of
variates. Biometrika, 28: 321-377.
DOI: 10.2307/2333955
66