
PARALLELIZATION OF TRAVELLING SALESMAN PROBLEM USING ROCKS CLUSTER

IZZATDIN AZIZ, MAZLINA MEHAT, LOW TAN JUNG


Department of Computer and Information Sciences
Universiti Teknologi Petronas
31750 Tronoh, Perak
MALAYSIA
{izzatdin, mazlinamehat, lowtanjung}@petronas.com.my

ABSTRACT

The Traveling Salesman Problem (TSP) is a well-known Nondeterministic Polynomial (NP) problem. A number of sequential algorithms have been well defined to solve TSP. In this paper, however, the authors demonstrate an alternative way of solving TSP in parallel by modifying Prim's algorithm and integrating the modified algorithm with a random algorithm, making the whole process more computationally intensive in visiting all the nodes/cities. This paper discusses converting the sequential algorithm into a parallel version to be computed on a High Performance Computer (HPC) using Message Passing Interface (MPI) libraries running on the ROCKS open-source system. The outcomes of load balancing and cluster performance are also discussed.

Keywords: ROCKS Cluster, Traveling Salesman Problem (TSP), Message Passing Interface (MPI), Prim's Algorithm, High Performance Computing (HPC).

1 Introduction

ROCKS is an open-source application that allows networked computing terminals to perceive each other as one cluster working in unison to solve a common problem [4,5]. The emphasis of this study is to measure the performance of the developed parallel algorithm on the ROCKS cluster based on the load distribution factor. It is not difficult to predict that using more computing nodes for a fixed number of vertices results in a shorter completion time.

The data used for this study is 500 cities representing vertices on the x:y axes. To illustrate the complexity of the situation, the number of possible paths a salesman could take is P = n!, where n is the number of cities and P the number of possibilities. For the case of three cities, six permutations representing viable paths are possible. For the case of 20 cities, 20! yields roughly 2.4 x 10^18 permutations. If we exhaustively searched through all the possible path traversals at a rate of 500 million per second, it would take over 150 years to examine them all [6]. Imagine, therefore, how immense a TSP instance becomes when dealing with 500 cities, i.e. 500!.
The traveling salesman problem is well known as a resource-intensive algorithm due to its complex computation. Given a number of cities representing vertices and edges representing paths, the algorithm computes the shortest path for a person to make a round trip visiting every city exactly once.

The algorithm has been applied in many engineering and technological applications [1,2,3] such as electronic circuit design, neuron chains, oil and gas transportation, DNA research and protein folding. Obtaining the shortest path in these computationally complex applications leads to operational optimality, thus achieving high-quality and accurate results.

It is fairly clear that the TSP algorithm needs to run on a high-performance computing facility in order to achieve comparable results within a short time. It is known to the mathematical and computing community that TSP is a time-consuming algorithm for finding the shortest path over anything from a few to an enormous number of vertices. The aims of this paper are to produce a parallel TSP algorithm using the Message Passing Interface (MPI) and to measure its performance on a ROCKS cluster in terms of ideal load distribution.

2 ROCKS Cluster

Twenty machines were used to form the cluster. Each was an off-the-shelf Intel-based machine with dual P3-733 MHz processors and 512 MB of memory. It should be noted, however, that only 5 machines/nodes (10 processors) were actually used in the performance measurement.

All machines were connected to a Fast Ethernet 100 Mbps switch, with a head node acting as the master node. The master node has multiple network interfaces [7].

Rocks Cluster Distribution is a Linux distribution intended for high-performance computing clusters. Rocks was initially based on the Red Hat Linux distribution. Rocks includes many tools, such as MPI, which are not part of CentOS but are integral components that make a group of computers into a cluster [9].

3 TSP Pseudocode and Algorithm

There could be various ways to achieve parallelism in solving TSP. The authors chose to apply modifications to Prim's algorithm and integrate it with a random algorithm, making it more computationally intensive in visiting all the nodes. Prim's algorithm was chosen so that the process visits all vertices before the master consolidates the results to determine the shortest path. The random algorithm was picked for its ability to randomly initiate the process on each computing node (slave) in order to discover the shortest path. The authors' approach to solving TSP in parallel, using the modified Prim's algorithm integrated with the random algorithm, is depicted in figure 1.

Figure 1: Proposed parallel algorithm of TSP

4 Decomposition or Job Distribution

The algorithm works by having the Master create a linked-list structure sorted from shortest to farthest distance in a 2D matrix. The Master then transforms the 2D matrix into a single-dimension array to be broadcast using MPI_Bcast (this array acts as a map for every slave). The Master then sends a random starting point (num_nodes) to each slave, as shown in figure 2.
Figure 2: The transformation of the 2D matrix to single-dimensional arrays.
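The Master-side preparation described above, building the pairwise-distance matrix and flattening it row-major so that a single buffer can be broadcast, can be sketched as follows. This is illustrative Python rather than the paper's C/MPI code, and the helper names are hypothetical:

```python
import math
import random

def build_distance_matrix(coords):
    """2D matrix of pairwise Euclidean distances between city coordinates."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)]
            for i in range(n)]

def flatten(matrix):
    """Row-major 2D -> 1D; this single array is what MPI_Bcast would ship."""
    return [d for row in matrix for d in row]

coords = [(0, 0), (3, 4), (6, 0)]    # toy stand-in for the 500-city data set
n = len(coords)
flat = flatten(build_distance_matrix(coords))

# A slave reconstructs map entry (i, j) as flat[i * n + j]
assert flat[0 * n + 1] == 5.0        # distance between (0, 0) and (3, 4)

start = random.randrange(n)          # Master picks a random start per slave
```

In the paper the flattened map is shipped with MPI_Bcast and each slave additionally receives its own random starting point; here those communication calls are left implicit.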

As soon as a slave receives the initial random start, it compares its distances to all connected nodes. Once the shortest edge is found, the slave follows that edge and marks the current node's flag to 1 so that it will not revisit or consider that node again. It is worth noting that this marking mechanism does not exist in the original Prim's algorithm.

Each node visited is stored in an array, and the first element of the array is the minimal path. This array is then sent to the Master for comparison with the other processors' results. Note that the send and receive operations use a non-blocking method.
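The slave-side walk described above (repeatedly following the shortest edge to an unvisited node, with the added flag so a node is never reconsidered) can be sketched as below. This is an illustrative Python sketch, not the authors' C/MPI code; the Master's comparison of the slaves' results is simulated by a plain loop:

```python
import math

def greedy_tour(coords, start):
    """Nearest-neighbour walk: follow the shortest edge to an unvisited
    node, flagging each visited node to 1 so it is never reconsidered
    (the marking mechanism absent from the original Prim's algorithm)."""
    n = len(coords)
    visited = [0] * n
    visited[start] = 1
    path, total, current = [start], 0.0, start
    for _ in range(n - 1):
        # shortest edge from the current node to any unflagged node
        nxt = min((j for j in range(n) if not visited[j]),
                  key=lambda j: math.dist(coords[current], coords[j]))
        total += math.dist(coords[current], coords[nxt])
        visited[nxt] = 1
        path.append(nxt)
        current = nxt
    total += math.dist(coords[current], coords[start])  # close the round trip
    path.append(start)
    return total, path

# Master side: each slave starts from its own random point; the Master
# then keeps the shortest of the returned tours.
coords = [(0, 0), (0, 1), (1, 1), (1, 0)]
results = [greedy_tour(coords, s) for s in range(len(coords))]
best_len, best_path = min(results)
```

In the parallel version each call to greedy_tour runs on a different slave, and the tours are returned to the Master with non-blocking sends rather than gathered in a list.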

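Because the broadcast map grows with the square of the number of cities, section 5 below proposes splitting it into smaller chunks and reassembling them at the slaves. A minimal sketch of that round trip (plain Python standing in for the MPI calls; CHUNK is a hypothetical tuning parameter):

```python
CHUNK = 4  # hypothetical chunk size; in MPI this would bound each broadcast

def split_into_chunks(flat_map, size=CHUNK):
    """Master side: cut the flattened map into bounded-size pieces."""
    return [flat_map[i:i + size] for i in range(0, len(flat_map), size)]

def reassemble(chunks):
    """Slave side: concatenate arriving chunks back into the full map."""
    return [x for chunk in chunks for x in chunk]

flat_map = list(range(10))
chunks = split_into_chunks(flat_map)
assert all(len(c) <= CHUNK for c in chunks)
assert reassemble(chunks) == flat_map   # the round trip preserves the map
```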
5 Problems Encountered

There were two weaknesses in the proposed algorithm. Firstly, the array map broadcast by the Master can be large and might overwhelm a slave's buffer. One possible solution is to break the array map into smaller chunks and reassemble them once they arrive at the slaves. Secondly, the Master's use of a linked list to add and delete nodes may slow down processing due to memory allocation and de-allocation. A possible solution is to use a fixed-size array instead of building a dynamic linked list.

Nevertheless, the strength of the linked list is that the dynamic tree is built only once, by the Master. This optimizes the execution of the program by not forcing each slave to iteratively rebuild the list, thus preventing a slave from redundantly visiting a node that has already been flagged to 1.

6 Results and Discussions

Two variables were investigated upon execution of the parallel TSP algorithm: load balance and CPU performance (processing time). The master processor residing in the HPC disseminates the jobs to each slave for computation according to the written algorithm.

6.1 Load balancing

From the experiment it can be seen in figure 3 that, to a certain extent, an ideal load balance is achieved. Throughout the test, the system was able to maintain a CPU load of 75%.

Figure 3: Load balancing for 5 processors

Towards the end of the processing time, the load dropped to an average of 10%, as depicted in figure 4. This is due to the reduction of all results from the slave processors back to the master processor, which decides which of the candidate shortest paths is to be concluded.

Figure 4: Load balance towards the end of the process

6.2 Cluster Performance

In figure 5, the master processor maintains a consistent performance of 100%, while the slaves maintain a consistent performance rate of 75%. This is due to the delegation of tasks by the master processor, which at the same time performs the same computation as the others.

Figure 5: CPU performance for 5 processors

It should be noted that the cluster performance measured in this paper refers to the CPU usage of each computing node.

7 Conclusion

A parallel algorithm was developed by modifying Prim's algorithm to speed up TSP processing time. This was possible because TSP was solved in unison by a collection of processors. The primary goal was to address the common problem of a sequential algorithm consuming a huge amount of time when dealing with massive processing needs or massive data sets.

It can be seen that with ROCKS running on a Beowulf cluster we can effectively disseminate tasks throughout the cluster, thereby performing the computation via the developed parallel algorithm with reasonable performance and load balancing.

8 References

[1] Solving the Travelling Salesman Problem in Parallel by Genetic Algorithm on Multicomputer Cluster, Plamenka Borovska, International Conference on Computer Systems and Technologies, 2006.

[2] Genetic Algorithms and the Traveling Salesman Problem, Kylie Bryant and Arthur Benjamin, Harvey Mudd College, December 2000.

[3] Parallel Ant Colony Optimization for the Traveling Salesman Problem, Max Manfrin, Mauro Birattari, Thomas Stützle and Marco Dorigo, IRIDIA, CoDE, Université Libre de Bruxelles, Brussels, Belgium.
[4] Technical Report: Modifying a Standard System Installer to Support User-Customizable Cluster Frontend Appliances, Greg Bruno, Mason J. Katz, Federico D. Sacerdoti and Philip M. Papadopoulos, July 2, 2004.

[5] Grid Systems Deployment & Management Using Rocks, Federico D. Sacerdoti, Sandeep Chandra and Karan Bhatia, IEEE International Conference on Cluster Computing, San Diego, September 2004.

[6] An Introduction to Genetic Algorithms in Java, Michael Lacy, http://java.sys-con.com/read/36224.htm, accessed 2nd January 2008.

[7] Performance Evaluation on Hybrid Cluster: The Integration of Beowulf and Single System Image, Dani Adhipta, Izzatdin Bin Abdul Aziz, Low Tan Jung and Nazleeni Samiha Binti Haron, The 2nd Information and Communication Technology Seminar (ICTS), Jakarta, August 2006.

[8] The Ganglia Distributed Monitoring System: Design, Implementation, and Experience, Matthew L. Massie, Brent N. Chun and David E. Culler, University of California, Berkeley, Computer Science Division, April 2004.

[9] SCI: Delivering Cyberinfrastructure: From Vision to Reality, Stephen Meacham, http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=0438741, accessed 2nd January 2008.
