
CS 807

TASK: C
A GPU-based Parallel Ant Colony Algorithm for Scientific
Workflow Scheduling.

Abstract
A scientific workflow system is a workflow management system designed specifically
to compose and execute a series of computational or data manipulation steps. A very
simple example of a scientific workflow is a set of scripts that read in data,
programs, and other inputs and produce outputs such as visualizations and analytical
results. Scheduling a scientific workflow is a combinatorial optimization problem
[1]. In real applications a scientific workflow generally has thousands of nodes, so
scheduling such large-scale workflows carries a very large computational overhead.
The method proposed in [1] runs ant colony optimization on a GPU to make scientific
workflow scheduling more efficient. This paper discusses that GPU approach to ant
colony optimization and how it improves the performance of scientific workflow
scheduling.
Introduction:
The ant colony optimization (ACO) algorithm was originally proposed by Marco
Dorigo in 1992 [2][3]. It works efficiently on combinatorial optimization problems
[4]. The ant colony algorithm involves large-scale, intensive computation, which
calls for parallel hardware, so parallel variants have been designed to run on
parallel platforms. Existing parallel versions include a cluster- and MPI-based
algorithm [5], a multi-core CPU and OpenMP-based algorithm [6], and GPU- and
CUDA-based algorithms [7-10].
Several solutions have been proposed for the scientific workflow scheduling problem,
which is a combinatorial optimization problem. Benedict et al. solved it using the
Niched Pareto Genetic Algorithm (NPGA) [11]. Jia Yu et al. took a genetic approach
to the same problem [12-13]. Plankensteiner et al. proposed a dynamic execution and
scheduling heuristic that supports a high degree of fault tolerance [13].
This paper discusses the workflow scheduling algorithm based on GPU-accelerated ant
colony optimization presented in [1]. The authors of [1] chose this approach because
GPU computation is generally faster than CPU computation for highly parallel
workloads. They implemented their parallel algorithm using CUDA [12] and tested it
on an NVIDIA Tesla M2070. The GPU implementation attained a speedup factor of 5.8 to
20.7 over the conventional CPU implementation.

Problem Description:

A scientific workflow can be modeled as a directed acyclic graph, DAG = <V, E>,
in which the nodes denote the tasks of the workflow and the edges denote the
relationships between tasks.
The i-th task is denoted Ti (1 <= i <= N).
If a task Tj depends on Ti, there is an edge between them, written <i, j> ∈ E.
Tj can start only after Ti has finished executing.
The predecessors of Ti are denoted Pred(Ti) and its successors are denoted Succ(Ti).
The workflow begins at T1, so Pred(T1) = ∅ (there is no task before T1), and it
ends at TN, so Succ(TN) = ∅.
Assume there are Mi candidate computing resource services that can be allocated to
run task Ti. The k-th such service is denoted Sik (1 <= k <= Mi).
Each task can choose only one computing resource service. Let tik be the execution
time and cik the cost of task Ti executed on Sik.
The decision variables of the workflow scheduling problem are xik: xik = 1 if task
Ti is allocated computing resource Sik, and xik = 0 otherwise.
The total makespan of the workflow is determined by the completion time of each
task and by the topology of the DAG (directed acyclic graph).
Let ti be the completion time of task Ti; then tN is the makespan of the whole
workflow.
The scheduling problem is to map every Ti onto some Sik so as to minimize the total
execution cost while completing the whole workflow within the execution deadline D.

The scientific workflow scheduling problem can be described by the following
mathematical equations:

[Image from [1]: equations 1 to 6 describing the scientific workflow scheduling
problem]
Explanation:
Equation 1: The objective function. It sums the cost of each task to give the total
cost of the scientific workflow.
Equation 2: The precedence constraint among tasks. Task Tj must wait until task Ti
has completed.
Equation 3: The completion time of each task node is nonnegative.
Equation 4: The total time of the workflow.
Equation 5: Each task node selects one and only one candidate computing resource.
Equation 6: The decision variables of the problem are Boolean.
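
Because the equations themselves appear only in the image from [1], the block below is a reconstruction from the definitions and explanations above, not a copy of the paper's formulas. It assumes that ti is the completion time of task Ti, that tik and cik are the execution time and cost of Ti on Sik, and that Equation 4 bounds the makespan tN by the deadline D.

```latex
\begin{align}
\min\; C &= \sum_{i=1}^{N} \sum_{k=1}^{M_i} c_{ik}\, x_{ik} \tag{1}\\
t_j &\ge t_i + \sum_{k=1}^{M_j} t_{jk}\, x_{jk}, \quad \forall \langle i, j \rangle \in E \tag{2}\\
t_i &\ge 0, \quad 1 \le i \le N \tag{3}\\
t_N &\le D \tag{4}\\
\sum_{k=1}^{M_i} x_{ik} &= 1, \quad 1 \le i \le N \tag{5}\\
x_{ik} &\in \{0, 1\}, \quad 1 \le i \le N,\; 1 \le k \le M_i \tag{6}
\end{align}
```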

Design and implementation of Parallel Ant colony Algorithm:


The ant colony algorithm has three main parts:

Data Initialization

Solution construction

Pheromone update

Data initialization covers the pheromone matrix, the random number generator, the
auxiliary variables and the heuristic information matrix. Every element of the
pheromone matrix is initialized to the same value, an algorithm parameter specified
by the user. The initial value of each element of the heuristic information matrix
is 1/(cik tik). This guides the ants towards low-cost computing resources with short
execution times and contributes significantly to solution construction. The random
number generator must be given a seed. If no seed is supplied, the system uses a
default seed, so the algorithm produces the same results in every run.
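
A minimal CPU-side sketch of this initialization step is given below in Python. It is an illustration only, not the code from [1]: the names init_data, tau0 and DEFAULT_SEED are hypothetical, and plain nested lists stand in for the matrices that the real implementation would keep in GPU memory.

```python
import random

DEFAULT_SEED = 0  # hypothetical fixed default, mirroring the "default seed" behaviour


def init_data(cost, time, tau0, seed=DEFAULT_SEED):
    """Build the pheromone matrix, the heuristic matrix and a seeded generator.

    cost[i][k] and time[i][k] are the cost and execution time of task i on its
    k-th candidate resource; tau0 is the user-specified initial pheromone value.
    """
    # Every pheromone entry starts from the same user-specified value.
    pheromone = [[tau0 for _ in row] for row in cost]
    # Heuristic information 1 / (c_ik * t_ik) favours cheap, fast resources.
    heuristic = [
        [1.0 / (c * t) for c, t in zip(cost_row, time_row)]
        for cost_row, time_row in zip(cost, time)
    ]
    # A fixed seed makes every run reproduce the same random sequence.
    rng = random.Random(seed)
    return pheromone, heuristic, rng
```

Passing a different seed on each run is what allows repeated runs to explore different solutions.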
Solution construction is the heart of the algorithm. In every iteration, several
ants build solutions in parallel. A solution is an allocation of computing resources
to workflow tasks, expressed as the vector X = (x1, x2, ..., xN), where xi = k means
that resource Sik is allocated to task Ti. When building a solution, an ant selects a
computing resource Sik for each task Ti, so a workflow with N tasks needs N
selections in total. Each selection is probabilistic.
The other important part of the algorithm is the pheromone update, which consists of
a local and a global update. The local pheromone update is performed while a
solution is being constructed, whereas the global pheromone update is performed once
per iteration. The aim of the pheromone update is to give the ants positive feedback
in the solution search space: when an excellent solution is found, the probability
that it will be chosen again increases.
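
The text does not give the update formulas, so the Python sketch below uses the standard Ant Colony System rules as an assumption; the rules actually used in [1] may differ. The parameters rho, alpha and tau0, and the choice of 1/best_cost as the deposited amount, are illustrative.

```python
def local_update(pheromone, solution, tau0, rho=0.1):
    """Local rule (assumed ACS form): pull each chosen entry back toward tau0.

    solution[i] = k means task i was assigned its k-th candidate resource.
    """
    for i, k in enumerate(solution):
        pheromone[i][k] = (1.0 - rho) * pheromone[i][k] + rho * tau0


def global_update(pheromone, best_solution, best_cost, alpha=0.1):
    """Global rule (assumed ACS form): reinforce the best solution found so far.

    Cheaper solutions deposit more pheromone because the deposit is 1 / best_cost.
    """
    for i, k in enumerate(best_solution):
        pheromone[i][k] = (1.0 - alpha) * pheromone[i][k] + alpha / best_cost
```

The local rule discourages ants in the same iteration from all choosing the same resources, while the global rule provides the positive feedback described above.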

The inputs of the algorithm are the computing resource cost matrix, the
directed-graph matrix of the workflow, the computing resource time-consumption
matrix and the workflow deadline.

Implementing algorithm
The algorithm is implemented in three core parts:

Initializing the values

Construction of the problem solution

The pheromone matrix update

The initialization part is executed by the CPU and the other two parts are executed
by the GPU.
Constructing Problem solution:
It consists of four kernel functions: ant_create_solution, evalue_total_time,
evalue_cost and select_node.
Pheromone matrix update:
It consists of two kernel functions: local_update and global_update.
Step by step implementation

The kernel function ant_create_solution mimics an ant constructing a solution to
the problem. It is declared with the __global__ keyword and is called from the host
code.
ant_create_solution is executed by several GPU threads, which search for solutions
in parallel. The kernel is essentially a loop with N steps (N = number of tasks in
the workflow).
Once a solution has been constructed, the execution time of the workflow is
evaluated. If the evaluated time meets the time requirement, the solution vector is
returned; otherwise it is discarded.
The select_node function selects a computing resource service from the candidate
resources; this kernel implements the node selection rule of the algorithm. It works
like the roulette wheel method, generating a random number for every selection.
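
The roulette wheel idea can be sketched on the CPU as follows. This is a Python illustration, not the CUDA kernel from [1]; the weighting of pheromone times heuristic raised to beta is the usual ACO convention and is assumed here, as are the parameter names.

```python
import random


def select_node(pheromone_row, heuristic_row, rng, beta=2.0):
    """Pick a candidate resource index with probability proportional to its weight.

    pheromone_row and heuristic_row hold the values for one task's candidates;
    rng is a random.Random instance. beta is an assumed weighting exponent.
    """
    weights = [tau * (eta ** beta) for tau, eta in zip(pheromone_row, heuristic_row)]
    # Spin the wheel: draw a point in [0, total) and find the slice it falls in.
    r = rng.random() * sum(weights)
    cumulative = 0.0
    for k, w in enumerate(weights):
        cumulative += w
        if r < cumulative:
            return k
    return len(weights) - 1  # guard against floating-point round-off
```

Resources with more pheromone or better heuristic values occupy a larger slice of the wheel, so they are chosen more often without ever excluding the others.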
The evalue_total_time function computes the overall makespan of the workflow. It
traverses the DAG of the workflow using breadth-first search.
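
One way to realize such a breadth-first evaluation is sketched below in Python. It is an assumption about the traversal, not the kernel from [1]: tasks are processed in breadth-first topological order, and the edge-list input and 0-based task indices are choices made for the example.

```python
from collections import deque


def evalue_total_time(num_tasks, edges, exec_time):
    """Return the makespan of a workflow DAG.

    edges is a list of (i, j) pairs meaning task j depends on task i (0-based);
    exec_time[i] is the execution time of task i on its chosen resource.
    """
    successors = [[] for _ in range(num_tasks)]
    indegree = [0] * num_tasks
    for i, j in edges:
        successors[i].append(j)
        indegree[j] += 1

    # Entry tasks have no predecessors and finish after their own execution time.
    entry = [i for i in range(num_tasks) if indegree[i] == 0]
    finish = [0.0] * num_tasks
    for i in entry:
        finish[i] = exec_time[i]

    queue = deque(entry)
    while queue:
        i = queue.popleft()
        for j in successors[i]:
            # A task finishes after its slowest predecessor plus its own time.
            finish[j] = max(finish[j], finish[i] + exec_time[j])
            indegree[j] -= 1
            if indegree[j] == 0:
                queue.append(j)
    return max(finish)
```

For a diamond-shaped workflow with edges (0,1), (0,2), (1,3), (2,3) and execution times 1, 2, 5 and 1, the longest path is 0, 2, 3 and the makespan is 7.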
The evalue_cost function calculates the total cost of the workflow. If the makespan
of the workflow is greater than the deadline D, the solution is unacceptable, and
the function adds a very large number to the cost.
These kernels form the core of the GPU implementation of the ant colony search
algorithm.
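
The penalty mechanism can be illustrated with the short Python sketch below. The constant PENALTY and the function signature are hypothetical; the text only states that a very large number is added when the deadline is missed.

```python
PENALTY = 1.0e9  # hypothetical "very large number"; the value used in [1] is not given


def evalue_cost(solution, cost, makespan, deadline, penalty=PENALTY):
    """Sum the cost of the chosen resources and penalize deadline violations.

    solution[i] = k means task i runs on its k-th candidate resource, whose
    cost is cost[i][k]; makespan is the evaluated total time of the workflow.
    """
    total = sum(cost[i][k] for i, k in enumerate(solution))
    if makespan > deadline:
        # Infeasible solutions stay comparable but can never beat a feasible one.
        total += penalty
    return total
```

Adding a penalty instead of rejecting the solution outright keeps the cost a single number, so the best solution can still be found by a simple minimum over all ants.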

Experimental results

The figure above, from [1], shows the performance of the algorithm in the CPU and
GPU environments.
When N = 1000, i.e. 1000 tasks in the scientific workflow, the best cost found by
the sequential algorithm is 4880.1 and the best cost found by the parallel algorithm
is 4214.28, about 13.6% lower. When N = 12, the cost found by the parallel algorithm
is higher than that found by the sequential algorithm. The parallel algorithm
therefore appears to perform better at larger values of N.

Conclusion:
The ant colony algorithm is one of the best algorithms for solving scheduling
problems in workflow management systems. In real applications a scientific workflow
has thousands of nodes, so the problem is very large. The commonly used sequential
algorithm needs a long time to obtain the optimal solution, which makes it
impractical for scientific workflow scheduling. The GPU-based parallel ant colony
optimization algorithm is a first step towards a parallel solution to workflow
scheduling, although much more work remains to be done in this field.

References
[1] P. Wang, H. Li and B. Zhang, "A GPU-based Parallel Ant Colony Algorithm for
Scientific Workflow Scheduling", International Journal of Grid and Distributed
Computing, vol. 8, no. 4, pp. 37-46, 2015.
[2] M. Dorigo, Optimization, learning and natural algorithms, Ph.D. dissertation,
Dipartimento di Elettronica, Politecnico di Milano, (1992).
[3] M. Dorigo, V. Maniezzo and A. Colorni, The ant system: Optimization by a colony
of cooperating agents, IEEE Transactions on Systems, Man, and Cybernetics, Part B,
vol. 26, no. 1, (1996).
[4] M. Dorigo, T. Stützle, Ant Colony Optimization, MIT Press, (2004).
[5] Y. Liu, G. Wu, Research on MPI-based parallel max-min ant system, Applied
Mechanics and Materials, (2012), pp. 198-199.
[6] U. Boryczka, J. Kozak and R. Skinderowicz, Parallel ant-miner, parallel
implementation of ACO techniques to discover classification rules with OpenMP,
Proceedings of 15th International Conference on Soft Computing, (2009) June 24-26,
Brno, Czech.
[7] A. Delévacq, P. Delisle, M. Gravel and M. Krajecki, Parallel ant colony
optimization on graphics processing units, Journal of Parallel and Distributed
Computing, vol. 73, no. 1, (2013).
[8] K. Kobashi, A. Fujii, T. Tanaka and K. Miyoshi, Acceleration of ant colony
optimization for the traveling salesman problem on a GPU, Proceedings of the IASTED
International Conference Parallel and Distributed Computing and Systems, (2011)
December 14-16, Dallas, TX, United States.
[9] J. M. Cecilia, J. M. García, A. Nisbet, M. Amos and M. Ujaldón, Enhancing data
parallelism for ant colony optimization on GPUs, Journal of Parallel and Distributed
Computing, vol. 73, no. 1, (2013).
[10] A. Uchida, Y. Ito and K. Nakano, An efficient GPU implementation of ant colony
optimization for the traveling salesman problem, Proceedings of 2012 3rd
International Conference on Networking and Computing, (2012) December 5-7, Naha,
Japan.
[11] S. Benedict and V. Vasudevan, Scheduling of scientific workflows using Niched
Pareto GA for grids, Proceedings of 2006 IEEE International Conference on Service
Operations and Logistics, and Informatics, (2006) June 21-23, Shanghai, China.
[12] NVIDIA Corp., CUDA Programming Guide Version 5.5, (2013).
[13] K. Plankensteiner, R. Prodan and T. Fahringer, Scheduling scientific workflows
to meet soft deadlines in the absence of failure models, Proceedings of 16th
International Euro-Par Conference on Parallel Processing, (2010) August 31 -
September 3, Ischia, Italy.
[14] "Applying Ant Colony Optimization Algorithms to Solve the Traveling Salesman
Problem - CodeProject", Codeproject.com, 2013.
http://www.codeproject.com/Articles/644067/Applying-Ant-Colony-OptimizationAlgorithms-to-Sol#_Toc364710425.
