
Parallel Graph Colouring Algorithms

for Shared-Memory Machines

Ismet Isnaini, B.Eng.

June 2002

Department of Computer Science


The University Of Adelaide,
South Australia

Supervisor: Dr Paul Coddington

Submitted in partial fulfillment of the requirements for the Master's Degree
in Computer Science

Abstract

Graph colouring is useful in many different kinds of applications. The Graph
Colouring Problem (GCP), which is known to be NP-hard, is usually part of a
larger computational problem, so a good solution to the GCP is required. Much
research has produced solutions in the form of sequential algorithms, which are
very useful for small-scale graphs. For large graphs, these sequential algorithms
might cause a bottleneck in the overall computation, particularly if the rest of
the computation is done in parallel. Hence, a parallel heuristic is required to
reduce the computation time of the GCP.

The lack of research on parallel heuristics for the GCP has motivated us to seek a
good solution to the problem. This project aims to implement and compare a variety
of sequential as well as parallel algorithms. Moreover, most existing parallel
algorithms have been implemented on distributed-memory machines and typically give
little or no speed-up. Therefore, the algorithms developed here are written using
Java Threads and run on shared-memory machines to achieve a good speed-up. The
performance of the different algorithms is compared on graphs of different types
and sizes to observe which algorithm is best for particular types of graphs.

Alhamdulillaahi Rabbil ’Alamiin

praise is only for Allah who is the Lord of all the Universes

Acknowledgements

I would like to thank my supervisor, Paul Coddington, who has been patient and has
given me a lot of encouragement and guidance throughout the project.

My gratitude and sympathy go to my family overseas and friends here, who always
wish me the best in my study.

My special thanks go to my wife for her understanding and support, and to my 2 little
daughters . . . seeing them makes me forget the due date of this Thesis . . .

Contents

1 Introduction
  1.1 Graph Colouring Problem
  1.2 Motivation
  1.3 Objective

2 Sequential Graph Colouring
  2.1 Common Graph Colouring Algorithms
  2.2 First–Fit (FF)
  2.3 Largest–Degree–First Algorithm (LDF)
  2.4 Smallest–Degree–Last (SDL)
  2.5 Incidence–Degree–Ordering (IDO)
  2.6 Saturation–Degree–Ordering (SDO)

3 Parallel Graph Colouring
  3.1 Parallel Graph Colouring Algorithm
  3.2 Synchronisation
  3.3 Independent Set
    3.3.1 Jones–Plassmann (JP)
    3.3.2 Largest–Degree–First Algorithm (LDF)
  3.4 Non-independent Set
    3.4.1 First Fit (FF)
    3.4.2 Gebremedhin–Manne (GEBMAN)
    3.4.3 Smallest–Degree–Last (SDL)
    3.4.4 Incidence–Degree–Ordering (IDO)
    3.4.5 Saturation–Degree–Ordering (SDO)
  3.5 Balanced Colouring

4 Implementation
  4.1 Previous Work
  4.2 Structure
    4.2.1 Java Thread
    4.2.2 Data Structures
    4.2.3 Class Structure
  4.3 Sequential version
  4.4 Parallel version
    4.4.1 Independent Set Vertices
    4.4.2 Non-Independent Set Vertices
  4.5 Balanced Colouring

5 Performance measurement and Analysis
  5.1 Experiment conditions
  5.2 Results
    5.2.1 Different types of graphs
    5.2.2 Graphs with same number of vertices and different number of edges
    5.2.3 Different number of processors
  5.3 Balanced Colouring Graph

6 Conclusions and Future Work
  6.1 Conclusions
  6.2 Future Work

List of Tables

5.1 Testing Graphs 1 : Random Graph
5.2 Testing Graphs 2 : Sparse Matrix
5.3 Speed up of all algorithms
5.4 Time taken for each algorithms (in second)
5.5 Number of colour used in the algorithms using 4 processors
5.6 Speed up of each algorithms on Random Graphs
5.7 Time taken (in second) of each algorithms for Random Graphs
5.8 Number of colours used in each algorithm for Random Graphs
5.9 Computation time for each algorithm for Graphs of same nodes and different edges
5.10 Number of colours in each algorithm for Graphs of same nodes and different edges
5.11 Computation time for each algorithm in different machines (TITAN)
5.12 Distribution of Colour before balancing for 4 processors using FF Algorithm in 4elt problem
5.13 Distribution of Colour after balancing for 4 processors using FF Algorithm in 4elt problem
5.14 Distribution of Colour before balancing for 4 processors using FF Algorithm in 4elt2 problem
5.15 Distribution of Colour after balancing for 4 processors using FF Algorithm in 4elt2 problem

List of Figures

1.1 Principle of Graph Colouring

2.1 First Fit (FF) Algorithm
2.2 Largest Degree First (LDF) Algorithm
2.3 Largest Degree First (LDF) Algorithm
2.4 Smallest Degree Last (SDL) Algorithm
2.5 Smallest Degree Last (SDL) Algorithm
2.6 Incidence Degree Ordering (IDO) Algorithm

3.1 Incorrect Graph Colouring
3.2 Jones–Plassmann (JP) Algorithm
3.3 Jones–Plassmann (JP) Algorithm
3.4 Largest Degree First (LDF) Algorithm
3.5 Largest Degree First (LDF) Algorithm
3.6 First Fit (FF) Algorithm
3.7 Gebremedhin–Manne (GEBMAN) Algorithm
3.8 Smallest Degree Last (SDL) Algorithm
3.9 Incidence Degree Ordering (IDO) Algorithm
3.10 Balanced Coloured Graph

4.1 Colour Balancing method

5.1 Computation time for 3elt problem
5.2 Computation time for 4elt2 problem
5.3 Speed up for 3elt problem
5.4 Speed up for Random Graph (250 Nodes)
5.5 Computation time for Random Graph (500 Nodes)
5.6 Computation Time for Random Graph (250 Nodes)
5.7 Computation time for Graph of different number of edges
5.8 Computation time for 3elt Graph in Titan
5.9 Speedup for 3elt Graph in Titan

Chapter 1

Introduction

Graph colouring is the process of assigning labels (called colours) to the vertices
of an arbitrary graph such that neighbouring vertices (i.e. those connected by an
edge of the graph) do not have the same colour [8]. In other words, we avoid having
two vertices of the same colour connected by an edge, which usually signifies a
relationship between the vertices. Vertices of the same colour are therefore in some
sense independent, which makes them easier to manipulate, for example by updating
them independently in parallel. Figure 1.1 shows a graph in which no vertex has the
same colour as any of its neighbouring vertices.

Graph colouring algorithms have been widely applied in many different kinds of
applications. Timetabling of courses at a university [20], for example, can be viewed
as a graph-colouring application that optimises the allocation of subjects, students,
rooms and lecturers. These entities correspond to the vertices of the graph, while
the relationships between the entities are the edges. Hence, for a given time period
(colour), the graph colouring algorithm makes sure there is no clash between rooms,
students and lecturers. The same idea can be applied to the scheduling of flights at
airports and the scheduling of running tasks on a multiprocessor machine. Another
application is printed circuit board testing, in which a graph colouring algorithm
was used to check whether any of the points on the board are short-circuited [9]. In
this case, the lines between the points on the board are the edges, while the points
themselves are the vertices. There are also other applications such as optimising
the solution of sparse Jacobian matrix problems [6], parallel numerical computation
[17] and register allocation [4].

Due to the importance of graph-colouring applications, much research has been
conducted to find the best algorithm for obtaining an optimal graph colouring.
Unfortunately, optimal graph colouring is an NP-hard problem [8]; it is therefore
generally infeasible to find an optimal solution with the minimum number of colours,
and we can only expect a good colouring with a small number of colours. However,
this is acceptable for virtually all applications. Many applications do not require
the smallest number of colours and are more interested in solving the problem in the
shortest period of time. A good algorithm therefore has two main goals: first, it
should use as few colours as possible, and secondly, it should colour all the
vertices in the graph in the shortest period of time [8]. Nevertheless, there is
always a trade-off between these two goals. In some applications, we might need to
emphasise minimum time while allowing a larger number of colours. On the other hand,
for some applications the minimum number of colours is more important than the time
constraint.

1.1 Graph Colouring Problem

The terminology in this thesis is defined as follows. Let $G = (V, E)$ be a graph
with vertex set $V$, where the number of vertices is $|V| = N(V)$, and edge set $E$,
where the number of edges is $|E| = N(E)$. Two vertices $u$ and $v$ in $G$ are said
to be adjacent (or neighbours of each other) if there exists an edge connecting them,
i.e. $(u, v) \in E$ with $u, v \in V$, and the set of vertices adjacent to $v$ is
denoted $\mathrm{neigh}(v)$. Every vertex $v$ in the graph has a degree $d(v)$,
defined as the number of adjacent vertices, $d(v) = |\mathrm{neigh}(v)|$. The maximum
and minimum vertex degrees in a graph are denoted $\max(d(v))$ and $\min(d(v))$.
In solving the Graph Colouring Problem, we need to form a set of vertices, denoted
$V'$, with the number of vertices in the set $|V'| = N(V')$. An independent set of
$G$ is a set of vertices $V'$ such that there is no edge between $u$ and $v$ for all
$u, v \in V'$. Conversely, a non-independent set of $G$ is a set of vertices $V'$
such that there is an edge between $u$ and $v$ for some $u, v \in V'$. In some
algorithms, a vertex $v$ may be assigned a random number, denoted $R(v)$, or given a
weight, denoted $w(v)$. Each vertex $v$ is assigned a colour, and the total number of
colours among the vertices adjacent to $v$ is written as a sum over
$\mathrm{neigh}(v)$.

1.2 Motivation

There has been relatively little research on parallel graph-colouring algorithms,
which has motivated this project to try to find improved algorithms. We have also
tried to achieve a balanced graph colouring, that is, minimising the number of
colours while at the same time requiring that each processor has approximately the
same number of vertices of each colour. This gives good load balancing when the
colouring is used by other parallel algorithms operating on the graph.

The fact that there are many good sequential algorithms which have not yet been
parallelized is also one of the reasons behind this project. We have looked at some
well-known sequential algorithms and parallelized them. In previous work, many of
the parallel algorithms gained little or no speedup [2, 14]. Jones and Plassmann [14]
report that they did not get any speedup for their algorithm. Most of these
algorithms were also written for distributed-memory machines. Therefore we would
like to implement shared-memory versions of parallel graph colouring algorithms,
which hopefully can achieve reasonably good performance in terms of speedup and the
number of colours used.

Figure 1.1: The Principle of Graph Colouring Algorithm

Recently Gebremedhin and Manne [11] implemented a parallel version of a standard
sequential algorithm and claimed to obtain good linear speedup. They also applied
their approach to a better colouring algorithm. This was done on a shared-memory
machine using OpenMP [11]. What we would like to know is whether their algorithm is
better than the other parallel algorithms that have given no speedup, or whether
shared-memory machines are simply better than distributed-memory machines for this
particular application.

This project is an extension of previous work on parallel graph-colouring algorithms
by Allwright et al. [2]. The programs in that work were written in old, non-standard
parallel programming languages. The message-passing programs were written in Express
Fortran and run on an Intel iPSC/860 computer. The data-parallel programs were
written in CM-Fortran and run on a 32-node Thinking Machines CM-5.

1.3 Objective

The aim of this project is to implement a variety of graph-colouring algorithms,
both sequential and parallel. We then compare their performance on different
parallel computers as well as on graphs with different numbers of vertices and edges.

The project concentrates on graph colouring algorithms for shared-memory parallel
computers. The programs are written in Java, which supports the Thread mechanism for
developing parallel programs.

This thesis is organised as follows. Chapter 2 introduces the algorithms for
sequential graph colouring, while Chapter 3 describes the parallel versions of those
sequential algorithms. Chapter 4 describes how these algorithms are implemented
using Java Threads. The results of the experiments comparing different algorithms
on graphs of different types and sizes are presented in Chapter 5. Chapter 6 draws
conclusions and suggests some future work.

Chapter 2

Sequential Graph Colouring

2.1 Common Graph Colouring Algorithms

Many studies have been conducted on sequential graph colouring algorithms. Some of
these algorithms have proven to be quite efficient and reliable, such as
Saturation-Degree-Ordering (SDO) [3], Incidence-Degree-Ordering (IDO) [6],
Smallest-Degree-Last (SDL) [19], Largest-Degree-First (LDF) [26] and First-Fit or
Greedy Ordering [1, 2, 16]. NP-hard problems, such as the timetabling problem [26],
can be given almost optimal solutions when solved using these algorithms.

2.2 First–Fit (FF)

The First–Fit (or Greedy) algorithm is the simplest algorithm of all. It starts by
taking an arbitrary vertex $v$ in the graph $G$ and colouring it with the lowest
available colour (which is 0 at the start). It then takes the next vertex arbitrarily
and colours it in the same fashion, until all vertices are coloured, as shown in
Figure 2.1.

for i = 0 to N(V) - 1 do
    find the lowest available colour for vertex v(i)
    assign that colour to vertex v(i)
end for

Figure 2.1: First Fit (FF) Algorithm
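
The loop of Figure 2.1 can be sketched in Java as follows. This is only an illustration, not the thesis implementation: the class name `FirstFitSketch` and the adjacency-list layout (`adj[v]` holds the neighbours of vertex `v`) are assumptions, the vertices are visited in index order, and the graph is assumed to have no self-loops.

```java
import java.util.Arrays;

// Illustrative sketch only: class name and graph layout are assumptions.
class FirstFitSketch {
    // adj[v] holds the neighbours of vertex v; the result maps each vertex to a colour >= 0.
    static int[] colour(int[][] adj) {
        int n = adj.length;
        int[] colour = new int[n];
        Arrays.fill(colour, -1);                 // -1 marks an uncoloured vertex
        boolean[] used = new boolean[n + 1];     // colours taken by the current vertex's neighbours
        for (int v = 0; v < n; v++) {
            for (int u : adj[v]) {
                if (colour[u] >= 0) used[colour[u]] = true;
            }
            int c = 0;
            while (used[c]) c++;                 // lowest colour not used by any neighbour
            colour[v] = c;
            for (int u : adj[v]) {
                if (colour[u] >= 0) used[colour[u]] = false;   // reset marks for the next vertex
            }
        }
        return colour;
    }
}
```

Because a vertex with d(v) neighbours can see at most d(v) distinct colours, the search for the lowest free colour never needs more than d(v) + 1 candidates.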

2.3 Largest–Degree–First Algorithm (LDF)

The Largest-Degree-First algorithm is described in Figure 2.2 and Figure 2.3. Every
vertex $v$ in the graph is assigned its degree $d(v)$, i.e. the total number of
neighbouring vertices connected to that vertex. The algorithm uses $d(v)$ to
determine which vertex is coloured first: the vertex with the highest degree among
its neighbouring vertices is coloured first.


while (not all vertices in V are coloured)
    for i = 0 to N(V) - 1 do
        if v(i) is uncoloured and d(v(i)) >= d(v(j)) for all uncoloured v(j) in neigh(v(i))
            find the lowest available colour for vertex v(i)
            assign that colour to vertex v(i)
        end if
    end for
end while

Figure 2.2: Largest Degree First (LDF) Algorithm
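
A minimal Java sketch of the sweep in Figure 2.2 is given below, under the assumption that only uncoloured neighbours take part in the degree comparison (otherwise a low-degree vertex next to a high-degree one would never be selected). The class and method names are hypothetical, and `adj[v]` is again an assumed adjacency-list layout.

```java
import java.util.Arrays;

// Illustrative sketch only: names and graph layout are assumptions.
class LargestDegreeFirstSketch {
    static int[] colour(int[][] adj) {
        int n = adj.length;
        int[] colour = new int[n];
        Arrays.fill(colour, -1);                       // -1 marks an uncoloured vertex
        int remaining = n;
        while (remaining > 0) {
            for (int v = 0; v < n; v++) {
                if (colour[v] >= 0) continue;
                boolean localMax = true;
                for (int u : adj[v]) {
                    // only uncoloured neighbours with a strictly larger degree block v
                    if (colour[u] < 0 && adj[u].length > adj[v].length) {
                        localMax = false;
                        break;
                    }
                }
                if (localMax) {
                    colour[v] = lowestFree(adj, colour, v);
                    remaining--;
                }
            }
        }
        return colour;
    }

    // Lowest colour not used by any coloured neighbour of v.
    static int lowestFree(int[][] adj, int[] colour, int v) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            if (colour[u] >= 0 && colour[u] < used.length) used[colour[u]] = true;
        }
        int c = 0;
        while (used[c]) c++;
        return c;
    }
}
```

In this sequential sketch two adjacent vertices of equal degree are simply coloured one after the other, so no tie-breaking is needed; the parallel version has to break such ties explicitly.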

2.4 Smallest–Degree–Last (SDL)

The Smallest-Degree-Last (SDL) algorithm, on the other hand, orders the vertices
differently. First, every vertex $v$ with the lowest degree, $\min(d(v))$, is
assigned a weight $w(v)$, as can be seen in Figure 2.4. This set of vertices $V'$ is
then removed from the graph, which reduces the degree of its neighbours. In the next
step, all remaining vertices of degree $d(v)$ are again removed and given the next
larger weight, $w(v) + 1$. If there is no vertex of degree $d(v)$, the algorithm
removes all vertices of degree $d(v) + 1$ and assigns them the next weight,
$w(v) + 1$. Neighbours of the removed vertices are pushed back to a later weight.
These steps are repeated until every vertex has been assigned a weight. Colouring
then proceeds as in the LDF algorithm, starting from the highest weight. The details
of the algorithm are shown in Figure 2.5.

2.5 Incidence–Degree–Ordering (IDO)

The IDO algorithm, shown in Figure 2.6, first identifies the highest degree among the
vertices $v \in G$ and then selects the set of vertices $V'$ with the highest degree,
$\max(d(v))$. The set

Figure 2.3: Largest Degree First (LDF) Algorithm

Figure 2.4: The first phase of Smallest Degree Last (SDL) Algorithm

find the lowest degree of vertex, min(d(v)), among all vertices
j = 0
while (not all vertices in V are weighted)
    for i = 0 to N(V) - 1 do
        if d(v(i)) = min(d(v)) + j
            assign it a weight, w(v(i)) = j
        end if
    end for
    increase j
end while

w = max(w(v))
while (not all vertices in V are coloured)
    for i = 0 to N(V) - 1 do
        if w(v(i)) = w
            find the lowest available colour for vertex v(i)
            assign that colour to vertex v(i)
        end if
    end for
    decrease w
end while

Figure 2.5: Smallest Degree Last (SDL) Algorithm
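
The two phases of SDL can be sketched in Java as shown below. The sketch follows the prose description in Section 2.4, in which removing a batch of vertices lowers the degree of their remaining neighbours; that degree update, the class name `SmallestDegreeLastSketch` and the `adj[v]` adjacency-list layout are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: names and graph layout are assumptions.
class SmallestDegreeLastSketch {
    // Phase 1: repeatedly remove every vertex of minimum remaining degree;
    // each removed batch receives the next larger weight.
    static int[] weights(int[][] adj) {
        int n = adj.length;
        int[] deg = new int[n];
        int[] weight = new int[n];
        Arrays.fill(weight, -1);                       // -1 marks "still in the graph"
        for (int v = 0; v < n; v++) deg[v] = adj[v].length;
        int remaining = n;
        for (int w = 0; remaining > 0; w++) {
            int min = Integer.MAX_VALUE;
            for (int v = 0; v < n; v++) {
                if (weight[v] < 0) min = Math.min(min, deg[v]);
            }
            List<Integer> batch = new ArrayList<>();
            for (int v = 0; v < n; v++) {
                if (weight[v] < 0 && deg[v] == min) batch.add(v);
            }
            for (int v : batch) weight[v] = w;
            for (int v : batch) {
                for (int u : adj[v]) {
                    if (weight[u] < 0) deg[u]--;       // removal lowers the neighbours' degree
                }
            }
            remaining -= batch.size();
        }
        return weight;
    }

    // Phase 2: colour greedily, highest weight first.
    static int[] colour(int[][] adj) {
        int n = adj.length;
        int[] weight = weights(adj);
        Integer[] order = new Integer[n];
        for (int v = 0; v < n; v++) order[v] = v;
        Arrays.sort(order, (a, b) -> Integer.compare(weight[b], weight[a]));
        int[] colour = new int[n];
        Arrays.fill(colour, -1);
        for (int v : order) {
            boolean[] used = new boolean[adj[v].length + 1];
            for (int u : adj[v]) {
                if (colour[u] >= 0 && colour[u] < used.length) used[colour[u]] = true;
            }
            int c = 0;
            while (used[c]) c++;
            colour[v] = c;
        }
        return colour;
    }
}
```

For a path of three vertices, the two end vertices are removed first and the middle vertex last, so the middle vertex is the first one to be coloured.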

find the highest degree of vertex, max(d(v)), v in V
for i = 0 to N(V) - 1 do
    if d(v(i)) = max(d(v))
        find the lowest available colour for vertex v(i)
        assign that colour to vertex v(i)
    end if
end for

while (not all vertices in V are coloured)
    for i = 0 to N(V) - 1 do
        count the coloured neighbours of v(i) (its incidence degree)
        if v(i) is uncoloured and its incidence degree >= that of every
           uncoloured v(j) in neigh(v(i))
            find the lowest available colour for vertex v(i)
            assign that colour to vertex v(i)
        end if
    end for
end while

Figure 2.6: Incidence Degree Ordering (IDO)

$V'$ is then coloured, with each member taking the lowest available colour. Once some
vertices are coloured, the algorithm selects the vertices that have the highest
incidence degree, i.e. the largest number of coloured neighbours, and colours them
with the lowest available colour. This step is repeated until all the vertices are
coloured.
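
As an illustration of incidence-degree ordering, the Java sketch below colours one vertex at a time, always choosing the uncoloured vertex with the most coloured neighbours. This is a simplified variant and not the batch formulation of Figure 2.6: breaking ties by plain degree, the class name `IncidenceDegreeSketch` and the `adj[v]` layout are all assumptions.

```java
import java.util.Arrays;

// Illustrative sketch only: a one-vertex-at-a-time variant with assumed names.
class IncidenceDegreeSketch {
    static int[] colour(int[][] adj) {
        int n = adj.length;
        int[] colour = new int[n];
        int[] incidence = new int[n];                  // number of coloured neighbours so far
        Arrays.fill(colour, -1);
        for (int step = 0; step < n; step++) {
            int best = -1;
            for (int v = 0; v < n; v++) {
                if (colour[v] >= 0) continue;
                if (best < 0
                        || incidence[v] > incidence[best]
                        || (incidence[v] == incidence[best]
                            && adj[v].length > adj[best].length)) {
                    best = v;                          // higher incidence degree, then higher degree
                }
            }
            boolean[] used = new boolean[adj[best].length + 1];
            for (int u : adj[best]) {
                if (colour[u] >= 0 && colour[u] < used.length) used[colour[u]] = true;
            }
            int c = 0;
            while (used[c]) c++;
            colour[best] = c;
            for (int u : adj[best]) incidence[u]++;    // neighbours gain one coloured neighbour
        }
        return colour;
    }
}
```

Since no vertex has a coloured neighbour at the start, the first vertex chosen is one of highest degree, which matches the first step described above.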

2.6 Saturation–Degree–Ordering (SDO)

Instead of counting the number of coloured neighbours as in IDO, SDO takes into
consideration the number of differently coloured neighbours. Therefore, a vertex $v$
whose $m$ coloured neighbours use only $k$ distinct colours has the same saturation
degree as a vertex $u$ that has only $k$ coloured neighbours, all of them coloured
differently. The pseudo-code of SDO is the same as that of IDO, except that it counts
the number of differently coloured neighbours. IDO and SDO take much longer than the
other colouring algorithms but usually give a lower number of colours.
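
The difference between the two measures can be made concrete with a short Java sketch. The helper names `incidenceDegree` and `saturationDegree` and the convention that `-1` marks an uncoloured vertex are assumptions for illustration only.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only: helper names and the -1 convention are assumptions.
class DegreeMeasuresSketch {
    // Incidence degree (IDO): number of neighbours that already have a colour.
    static int incidenceDegree(int[] neighbours, int[] colour) {
        int count = 0;
        for (int u : neighbours) {
            if (colour[u] >= 0) count++;
        }
        return count;
    }

    // Saturation degree (SDO): number of distinct colours among the coloured neighbours.
    static int saturationDegree(int[] neighbours, int[] colour) {
        Set<Integer> seen = new HashSet<>();
        for (int u : neighbours) {
            if (colour[u] >= 0) seen.add(colour[u]);
        }
        return seen.size();
    }
}
```

A vertex with three coloured neighbours that share two colours has incidence degree 3 but saturation degree 2, the same saturation degree as a vertex with two differently coloured neighbours.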

Chapter 3

Parallel Graph Colouring

In practice, the graph colouring problem is usually part of a larger computational
problem. If the graph colouring cannot be solved in a relatively short period of
time, it may slow down the whole computation [23]. For a small graph, sequential
algorithms might be attractive, but for large graphs the sequential solution might
become a bottleneck in the overall computation. Therefore we need parallel graph
colouring algorithms. Even if a parallel heuristic does not give as good a colouring
as the sequential version, it will reduce the time taken by the overall computation.

Studies on parallel graph colouring algorithms are very limited. Most parallel
algorithms originate from sequential algorithms that were parallelized. The basic
approach is to find an independent set of vertices to be updated [2]; in other
words, the algorithm must never update a pair of connected vertices simultaneously.
One of the first parallel algorithms was written by Luby [18] and is called the
Maximal Independent Set (MIS) algorithm. The MIS algorithm selects a maximal set of
independent vertices, i.e. vertices which are unconnected, which can then be coloured
and removed from the graph. The next step looks for the next such independent set,
and so on, until all vertices have been coloured.

Another parallel algorithm based on independent sets was developed by Jones and
Plassmann [14]; it is not derived from a sequential algorithm. Every vertex in the
graph is assigned a random number. A vertex is then coloured if none of its
neighbouring vertices has a higher random number. This selection creates an
independent set of vertices that can be coloured in parallel. The algorithm has some
deficiencies. First, the number of colours used by this heuristic is slightly higher
than the number of colours used by the best sequential heuristic. Secondly, it cannot
provide a balanced colouring, i.e. an approximately equal distribution of colours
among the threads, especially for graphs which have highly variable local structure
[11].

Other examples of parallel algorithms are the parallel versions of the two sequential
algorithms (LDF and SDL) described in Section 2.3 and Section 2.4, which were
parallelized by Allwright et al. [2]. They work on the same principle, namely
selecting a set of independent vertices to be coloured in parallel in the next stage.

Gjertsen, Jones and Plassmann worked on improving the earlier Jones-Plassmann
algorithm, trying to fix its deficiencies by introducing two new algorithms, namely
Parallel Deviance Reduction (PDR(k)) and Parallel Largest First (PLF(k)) [15]. These
two algorithms improve the balance of an existing colouring without increasing the
number of colours required.

Research on parallel implementations was halted for quite some time until the recent
work of Gebremedhin and Manne [11], who described a parallel algorithm which is
suited to shared-memory programming and gives a linear speed-up on the PRAM model.
Another heuristic developed by the same authors shows an improvement in the number
of colours used. The experiments with these algorithms were done on an SGI Origin
2000. Further work also shows that their approach is suitable for coarse-grained
multithreaded applications [10].

There is also one work implementing a parallel algorithm in Java Threads. Umland [24]
reports implementing a Java version of the First Fit algorithm with a reasonable
speedup. Nevertheless, the speedup reported in his paper is not linear: it reaches a
maximum of about 2 and slowly decreases for higher numbers of threads. Umland uses a
pipelined approach, which is not scalable and has overheads in filling the pipeline.

3.1 Parallel Graph Colouring Algorithm

As discussed in Chapter 1, graph colouring consists of finding sets of vertices in a
graph and colouring them in such a way that no two neighbouring vertices have the
same colour. If we examine the existing sequential graph colouring algorithms, in
some of them the selection of vertices creates an independent set of vertices, while
in the rest it creates a non-independent set. The first group contains JP and LDF,
which select vertices in such a manner that none of the selected vertices are
neighbours; random numbers are assigned to the vertices to break ties. The remaining
algorithms, such as SDL, SDO, IDO and FF, use non-independent sets, for which random
numbers might also be required.

Because the first group of algorithms produces independent sets of vertices, they are
easy to parallelise. The vertices in the set can be distributed among the processors
and coloured concurrently. Some of the algorithms in the second group can be
‘directed’ to produce an independent set of vertices. For example, the selection of
vertices in SDL can use a random number to break ties between two neighbours having
the same weight. However, for some algorithms it is still quite hard to produce an
independent set of vertices, for example the First Fit algorithm, due to the way it
selects vertices.

Parallel graph colouring algorithms need to ‘communicate’ between the processors.
Each processor needs to know the ‘condition’ (e.g. colour number, weighting, random
number) of the neighbours of its vertices, which might be on other processors. All
parallel algorithms need this information, which is why shared-memory machines should
be better than distributed-memory machines for this application.

This chapter describes the major component of this project, namely composing parallel
versions of the sequential algorithms of Chapter 2. In the parallel versions, the
vertices of the graph are distributed among the available processors. The
distribution is based on the number of vertices, $N(V)$, divided by the number of
processors available, $p$. Hence each processor colours approximately $N(V)/p$
vertices.
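
One way to realise this distribution is a contiguous block partition, sketched below in Java. The class name `BlockPartitionSketch`, the method `range` and the choice of contiguous index blocks are assumptions; the thesis only states that each processor receives about N(V)/p vertices.

```java
// Illustrative sketch only: a contiguous block partition with assumed names.
class BlockPartitionSketch {
    // Returns {start, end} (end exclusive) of the vertex indices owned by thread t,
    // where n is the number of vertices and p the number of threads.
    static int[] range(int n, int p, int t) {
        int start = (int) ((long) t * n / p);          // long arithmetic avoids int overflow
        int end = (int) ((long) (t + 1) * n / p);
        return new int[] {start, end};
    }
}
```

With 10 vertices and 4 threads the blocks are [0, 2), [2, 5), [5, 7) and [7, 10), so block sizes differ by at most one vertex when N(V) is not a multiple of p.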

In this chapter, the discussion of the parallel algorithms is divided according to
the approaches discussed above. The first section discusses the importance of
synchronisation in parallel graph colouring. The next section describes the
algorithms which produce sets of independent vertices, such as the
Largest-Degree-First (LDF) algorithm [26], the Jones-Plassmann (JP) algorithm [14]
and the Smallest-Degree-Last (SDL) algorithm [19], while the following section covers
the algorithms using the second approach, namely the First–Fit (FF) algorithm
[1, 2, 16], Incidence Degree Ordering (IDO) [6], Saturation Degree Ordering (SDO) [3]
and the Gebremedhin and Manne algorithm [11].

3.2 Synchronisation

Synchronisation plays an important role when developing a parallel version of an
algorithm. Proper synchronisation is required at certain stages of the algorithm in
order to avoid race conditions while keeping the running time to a minimum.

Synchronisation takes place in the following case: threads have to be synchronised
after forming the set of independent vertices. For example, after giving weights to
a set of vertices $V'$, a thread has to wait for the other still-running threads.
Otherwise, the selection of vertices will be wrong.

In most of the algorithms, colouring takes place just after forming the set of
vertices $V'$, so a synchronisation is required at that point. In the colouring
phase, all threads colour the vertices assigned to them concurrently. A race
condition might occur here when 2 adjacent vertices belonging to 2 different threads
are coloured at the same time with the same colour. Thread 1, for example, is trying
to find the lowest colour available for vertex $A$, and looks at $A$'s neighbour, say
$B$, which at this stage has not been coloured yet and is therefore ignored. At the
same time, thread 2 is trying to colour vertex $B$ and is searching for the lowest
available colour among $B$'s neighbours, one of which is $A$, which at this stage has
not been coloured yet and is therefore ignored. Hence, both threads might end up
giving both vertices the same colour; in other words, the colouring is wrong.
Figure 3.1 shows how this might happen when colouring a graph on a 4-processor
machine.

Figure 3.1: Incorrect Graph Colouring

Therefore, we need to make sure that the two threads do not assign the same colour
to both vertices. There are 2 proposals to correct this. The first proposal is to
make sure that thread 1 colours vertex $A$ before or after thread 2 colours vertex
$B$, and not at the same time. For this, the thread colouring vertex $A$ has to find
out whether any of $A$'s neighbours belongs to another thread (since the race
condition can only happen in this case). We also need to call the barrier
synchroniser to hold thread 2 back from checking the lowest available colour until
thread 1 has finished colouring vertex $A$. The drawback of this method is that if
such conflicts occur a significant number of times, the benefit of parallelism is
lost, since the method uses more resources in both time and memory.

The other proposal is to let those errors happen and afterwards check the whole
graph for adjacent vertices which have been given the same colour. These pairs of
vertices are then stored and fixed sequentially [11].
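
The second proposal can be sketched in Java as below: threads colour their own blocks of vertices without locking, and a sequential pass then repairs any clashes. This is a hedged illustration, not the thesis code or the exact method of [11]: the class name, the block partition, the use of `AtomicIntegerArray` for the shared colour array and repairing clashes during the scan (rather than storing the pairs first) are all assumptions.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Illustrative sketch only: names, partitioning and repair strategy are assumptions.
class SpeculativeColouringSketch {
    // Lowest colour not currently used by any coloured neighbour of v.
    static int lowestFree(int[][] adj, AtomicIntegerArray colour, int v) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            int c = colour.get(u);
            if (c >= 0 && c < used.length) used[c] = true;
        }
        int c = 0;
        while (used[c]) c++;
        return c;
    }

    // True if every vertex is coloured and no edge joins two vertices of the same colour.
    static boolean isProper(int[][] adj, int[] colour) {
        for (int v = 0; v < adj.length; v++) {
            if (colour[v] < 0) return false;
            for (int u : adj[v]) {
                if (colour[u] == colour[v]) return false;
            }
        }
        return true;
    }

    static int[] colour(int[][] adj, int threads) {
        int n = adj.length;
        AtomicIntegerArray colour = new AtomicIntegerArray(n);
        for (int v = 0; v < n; v++) colour.set(v, -1);

        // Phase 1: each thread colours its own block; clashes across blocks may occur.
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            int start = (int) ((long) t * n / threads);
            int end = (int) ((long) (t + 1) * n / threads);
            workers[t] = new Thread(() -> {
                for (int v = start; v < end; v++) {
                    colour.set(v, lowestFree(adj, colour, v));
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }

        // Phase 2: sequential check of the whole graph; recolour any clashing vertex.
        for (int v = 0; v < n; v++) {
            boolean clash = false;
            for (int u : adj[v]) {
                if (colour.get(u) == colour.get(v)) clash = true;
            }
            if (clash) colour.set(v, lowestFree(adj, colour, v));
        }

        int[] result = new int[n];
        for (int v = 0; v < n; v++) result[v] = colour.get(v);
        return result;
    }
}
```

The repair pass is sequential, so this approach only pays off when clashes are rare, which is the case when few edges cross between the blocks owned by different threads.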

Another issue that might create problems in synchronisation is that threads may need
different numbers of iterations. Once a thread has finished its part of one stage of
the algorithm, the barrier synchroniser tells this thread to wait for the other
threads that are still running their tasks. In these tasks, a thread might need to
synchronise its work with other threads and hence invoke the barrier synchroniser.
This call might cause the threads that have been put to sleep to be woken up and to
continue with the next step of the algorithm prematurely, which results in an
incorrectly coloured graph.

Nevertheless, synchronisation has a major drawback in terms of speed-up. We must be
very careful in selecting Java methods or classes, since some of them are
synchronised and therefore slow down the whole process.

3.3 Independent Set

3.3.1 Jones–Plassmann (JP)

The first phase of this algorithm assigns a random number to every vertex in the
graph. The algorithm then forms a set of independent vertices in the following manner:
each vertex looks at its neighbours and checks whether it has the highest random
number among them. The next step is to colour all these ‘highest’ vertices
with the lowest available colour (one that has not been used by any of their neighbours)
and remove them from the graph. The algorithm then chooses the next set of ‘highest
(random number)’ vertices and again colours them in the same manner. Figures 3.2 and
3.3 show how the algorithm works. All threads need to be synchronised once
they have formed the set of independent vertices V′, before moving on to the colouring step.
Similarly, once V′ has been coloured, all the threads need to be synchronised once more,
to avoid any wrong selection of vertices for the following V′. The algorithm then
iterates until all the vertices assigned to each thread are coloured.
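The rounds described above can be simulated in a single thread, which makes the two synchronisation points easy to see. The following is a sketch under stated assumptions: the class and method names are hypothetical, colour -1 stands for "uncoloured", and ties between equal random numbers are broken by vertex index (a detail the text does not specify).

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical single-threaded simulation of the Jones-Plassmann rounds. */
final class JonesPlassmannSketch {

    static int lowestAvailableColour(int[][] adj, int[] colour, int v) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            int c = colour[u];
            if (c >= 0 && c < used.length) {
                used[c] = true;
            }
        }
        int c = 0;
        while (used[c]) {
            c++;
        }
        return c;
    }

    /** True if v beats every still-uncoloured neighbour (assumed tie-break: vertex index). */
    static boolean isLocalMaximum(int[][] adj, int[] colour, double[] r, int v) {
        for (int u : adj[v]) {
            if (colour[u] >= 0) {
                continue;                     // coloured vertices count as removed
            }
            if (r[u] > r[v] || (r[u] == r[v] && u > v)) {
                return false;
            }
        }
        return true;
    }

    static int[] colourGraph(int[][] adj, long seed) {
        int n = adj.length;
        Random rng = new Random(seed);
        double[] r = new double[n];
        for (int v = 0; v < n; v++) {
            r[v] = rng.nextDouble();
        }
        int[] colour = new int[n];
        Arrays.fill(colour, -1);
        boolean[] inSet = new boolean[n];
        int remaining = n;
        while (remaining > 0) {
            // Step 1: form the independent set V' from the uncoloured vertices.
            for (int v = 0; v < n; v++) {
                inSet[v] = colour[v] < 0 && isLocalMaximum(adj, colour, r, v);
            }
            // (a threaded version would place a barrier here)
            // Step 2: colour V'; its members are never adjacent to each other.
            for (int v = 0; v < n; v++) {
                if (inSet[v]) {
                    colour[v] = lowestAvailableColour(adj, colour, v);
                    remaining--;
                }
            }
            // (a threaded version would place a second barrier here)
        }
        return colour;
    }
}
```

Each round always selects at least one vertex (the uncoloured vertex with the globally largest random number), so the loop terminates.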

3.3.2 Largest–Degree–First Algorithm (LDF)

The basic principle is similar to the sequential version, i.e. to form a set of vertices that
have the largest degree and colour them independently (see Figure 3.4). In the
parallel version, the vertices in each thread look at the degrees of all their neighbours,
even though these might belong to other threads. Any conflict (two adjacent vertices having
the same degree) is resolved by comparing their random numbers. Having formed the set
of independent vertices, all the threads need to be synchronised before moving
on to the colouring process. This synchronisation is essential for obtaining a correct
colouring; without it, two threads might colour two adjacent vertices with the same
colour and hence produce a mistake. This could happen when one thread has finished
finding its set of independent vertices while the others are still searching. After
synchronisation, the colouring phase takes place concurrently (since all the selected
vertices are independent and not connected to each other). Nevertheless, each vertex still has

assign random number R(v(i)) to each vertex v(i) ∈ V;
while (not all vertices in V are coloured)
    for i = 1 to N(V) do
        if R(v(i)) > R(v(j)), ∀ v(j) ∈ neigh(v(i))
            then v(i) ∈ V′
        end if
    end for
    for i = 1 to N(V′) do
        find the lowest available colour for vertex v(i);
        set the colour of vertex v(i) to that colour;
    end for
    SYNCHRONISE ALL THE THREADS;
end while

Figure 3.2: Jones–Plassmann (JP) Algorithm

to find out the lowest colour available (by looking at the colours of its neighbours).
The threads need to be synchronised once again before moving on to the next stage
of forming another set of independent vertices; otherwise, in the next step one thread
might select vertices that are not yet coloured but are about to be coloured by other
still-running threads. Figure 3.5 describes the colouring process of the parallel LDF
method.
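The selection rule of parallel LDF (largest degree first, random number as tie-break) might be expressed as below. This is an illustrative sketch: the names `LdfSelection`, `beats` and `select` are hypothetical, colour -1 means "uncoloured", and the final fall-back to the vertex index when both degree and random number are equal is an added assumption.

```java
/** Hypothetical sketch of forming one LDF independent set. */
final class LdfSelection {

    /** True if v has priority over u: larger degree, then larger random number, then index. */
    static boolean beats(int[][] adj, double[] r, int v, int u) {
        int dv = adj[v].length;
        int du = adj[u].length;
        if (dv != du) {
            return dv > du;
        }
        if (r[v] != r[u]) {
            return r[v] > r[u];
        }
        return v > u;   // assumed last-resort tie-break
    }

    /** Marks every uncoloured vertex that beats all of its uncoloured neighbours. */
    static boolean[] select(int[][] adj, int[] colour, double[] r) {
        boolean[] inSet = new boolean[adj.length];
        for (int v = 0; v < adj.length; v++) {
            if (colour[v] >= 0) {
                continue;               // already coloured, not a candidate
            }
            boolean wins = true;
            for (int u : adj[v]) {
                if (colour[u] < 0 && !beats(adj, r, v, u)) {
                    wins = false;
                    break;
                }
            }
            inSet[v] = wins;
        }
        return inSet;
    }
}
```

Because `beats` is a strict total order on the vertices, two adjacent uncoloured vertices can never both be selected, so the returned set is independent and can be coloured concurrently once all threads have finished selecting.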

3.4 Non-independent Set

The methods below form a non-independent set of vertices,
V′, and then start the colouring. When applied in parallel, most of these algorithms
will give an incorrectly coloured graph. This occurs when two threads happen to
access two adjacent vertices at the same time, each looking at the other’s colour (which
has yet to be assigned), and give them the same colour. In the previous algorithms this
cannot happen, since all the vertices in the set are independent. Therefore a step has to be
taken either to make sure that the colouring phase is synchronised for vertices that have
neighbours in other threads, or else to fix up the vertices that were assigned a wrong
colour after the entire colouring process has finished.

Figure 3.3: The colouring stages in Jones-Plassmann Algorithm

Figure 3.4: Largest Degree First (LDF) Algorithm

assign random number R(v(i)) to each vertex v(i);
assign N(V)/p vertices to each thread;
while (not all vertices in N(V)/p are coloured)
    for i = 1 to N(V)/p do
        if d(v(i)) > d(v(j)), ∀ v(j) ∈ neigh(v(i))
            then v(i) ∈ V′
        else if d(v(i)) = d(v(j)), v(j) ∈ neigh(v(i))
                and R(v(i)) > R(v(j))
            then v(i) ∈ V′
        end if
    end for
    for i = 1 to N(V′)/p do
        find the lowest colour available for vertex v(i);
        set the colour of vertex v(i) to that colour;
    end for
    synchronise all the threads;
end while

Figure 3.5: Largest Degree First (LDF) Algorithm

3.4.1 First Fit (FF)

As described in section 2.2, the First Fit algorithm colours the vertices by
choosing them arbitrarily. This also applies to the parallel version, so
wrong colours might occur here. As described previously, to prevent this
from happening we would have to synchronise all threads accessing two adjacent vertices,
which would add a large overhead to the overall computation time. Gebremedhin and Manne
[11] introduced a new approach in which we check for any wrongly coloured
vertices at the end of the colouring and give them an appropriate colour afterwards. This
part is done sequentially, to ensure there are no more race conditions between the
threads. As we can see in Figure 3.6, the threads need only be synchronised once the
colouring is done, before the checking commences.

distribute N(V)/p vertices to each thread;
while (not all N(V)/p vertices are coloured)
    select an arbitrary vertex v(i) in each thread t;
    give it the lowest colour available;
    synchronise all threads;
end while
for each thread, t
    check if the graph is correct;
    if not, store the incorrect vertices
end for
colour incorrect vertices sequentially

Figure 3.6: First Fit (FF) Algorithm
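The speculative colouring loop of Figure 3.6 could look roughly like this with Java threads. It is a hedged sketch rather than the thesis code: `ParallelFirstFit`, `pseudoColour` and the block-wise split of the vertices are assumptions, and the checking and sequential fix-up steps of the figure are omitted. The threads deliberately read and write the shared `colour` array without locking, so adjacent vertices owned by different threads may end up with the same colour.

```java
import java.util.Arrays;

/** Hypothetical sketch of the unsynchronised colouring phase of parallel First Fit. */
final class ParallelFirstFit {

    static int lowestAvailableColour(int[][] adj, int[] colour, int v) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            int c = colour[u];          // may be stale if another thread owns u
            if (c >= 0 && c < used.length) {
                used[c] = true;
            }
        }
        int c = 0;
        while (used[c]) {
            c++;
        }
        return c;
    }

    /** Returns a pseudo-colouring: every vertex is coloured, but conflicts are possible. */
    static int[] pseudoColour(int[][] adj, int threads) {
        int n = adj.length;
        int[] colour = new int[n];
        Arrays.fill(colour, -1);        // -1 marks "uncoloured"
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            // Each thread receives a contiguous block of about n / threads vertices.
            int lo = (int) ((long) n * t / threads);
            int hi = (int) ((long) n * (t + 1) / threads);
            workers[t] = new Thread(() -> {
                for (int v = lo; v < hi; v++) {
                    colour[v] = lowestAvailableColour(adj, colour, v);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();               // the only synchronisation: wait for all workers
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return colour;
    }
}
```

With one thread the result is an ordinary (correct) First Fit colouring; with several threads only the bound "colour of v is at most the degree of v" is guaranteed, which is why the checking phase is needed.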

3.4.2 Gebremedhin–Manne (GEBMAN)

Gebremedhin and Manne developed two algorithms. The first one is basically the
implementation of the FF algorithm in parallel. The other version (the GEBMAN algorithm)
involves another phase before coming to the checking and correcting stage. The first phase of
this algorithm works exactly the same as FF, but the result of the colouring is regarded
as a pseudo-colouring. We group the vertices V′ that have the same colour into a
ColourClass(i), with i running from 0 up to the highest colour. Hence, if the graph has 5
different colours, there will be 5 ColourClasses (see Figure 3.7). The second phase works on
the basis that if we re-apply the FF algorithm to the graph and start the colouring with the
ColourClass of the highest colour, we will be able to first colour the vertices which are

step 1: colour the graph as in FF;
        vertices are coloured from 0 to the highest colour;
step 2:
    for i = ColNum(highest colour) down to ColNum(0) do
        distribute ColourClass(i) evenly among the p threads;
        for each vertex v ∈ ColourClass(i)
            get the lowest colour available for v;
            set the colour of v to that colour;
        end for
    end for
step 3: same as before: check whether the graph is correct or not
step 4: correct the graph if it is wrong (sequentially)

Figure 3.7: Gebremedhin–Manne (GEBMAN) Algorithm

hardest to colour. In this manner, the graph is actually coloured in reverse
order [11]. This will hopefully reduce the number of colours.
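The second phase can be illustrated with a small single-threaded sketch. The names `ColourClassRecolouring`, `colourClasses` and `recolour` are hypothetical, and the example assumes that the first colouring is complete (no vertex left at -1) and that new colours are chosen by looking only at vertices already recoloured in this phase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of recolouring by colour class, highest colour first. */
final class ColourClassRecolouring {

    static int lowestAvailableColour(int[][] adj, int[] colour, int v) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            int c = colour[u];
            if (c >= 0 && c < used.length) {
                used[c] = true;
            }
        }
        int c = 0;
        while (used[c]) {
            c++;
        }
        return c;
    }

    /** Groups vertices by colour: result.get(k) holds the vertices with colour k. */
    static List<List<Integer>> colourClasses(int[] colour) {
        int max = -1;
        for (int c : colour) {
            max = Math.max(max, c);
        }
        List<List<Integer>> classes = new ArrayList<>();
        for (int k = 0; k <= max; k++) {
            classes.add(new ArrayList<>());
        }
        for (int v = 0; v < colour.length; v++) {
            classes.get(colour[v]).add(v);
        }
        return classes;
    }

    /** Re-applies First Fit, visiting the colour classes from the highest colour down to 0. */
    static int[] recolour(int[][] adj, int[] first) {
        List<List<Integer>> classes = colourClasses(first);
        int[] second = new int[adj.length];
        Arrays.fill(second, -1);
        for (int k = classes.size() - 1; k >= 0; k--) {
            // In a threaded version, classes.get(k) would be split among the threads.
            for (int v : classes.get(k)) {
                second[v] = lowestAvailableColour(adj, second, v);
            }
        }
        return second;
    }
}
```

For example, on the path a-b-c-d stored as vertices 0 (a), 1 (d), 2 (b), 3 (c), a First Fit pass in index order uses three colours, while the reverse pass over the colour classes needs only two.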

3.4.3 Smallest–Degree–Last (SDL)

The parallel version of SDL, as can be seen in Figure 3.8, is quite similar to its
sequential version, except in a few parts. The algorithm determines the lowest
vertex degree, min(d(v)), in the graph G and then searches for the vertices that have
this degree. The work is then distributed among p threads, in which each thread
looks for the vertices that have the degree d(v) and assigns them the lowest
weight, w(v). This set of vertices is then ‘removed’ from the graph, and the next
iteration finds another set of vertices with degree less than or equal to
d(v) + 1, which are given the next weight, w(v) + 1. This weighting stage continues until
all the vertices have been given a weight.

The next stage is the colouring phase, which starts from the vertices that have been
assigned the highest weight, max(w(v)), and works down to the lowest weight, min(w(v)).
The colouring phase uses the approach introduced by Gebremedhin and Manne, namely to
ignore any wrong colouring at first and correct it later on. The SDL algorithm could also
be ‘directed’ to produce a set of independent vertices by introducing a random number to
break ties between two adjacent vertices, similar to parallel LDF.

find the lowest degree of vertex, min(d(v)), in all vertices;
distribute the vertices into p number of threads;
j = 0;
while (not all vertices in N(V)/p weighted)
    for i = 0 to N(V)/p do
        if d(v(i)) ≤ min(d(v)) + j
            give v(i) a weight of w(v)
    end for
    increase w(v);
    j = j + 1;
end while;
SYNCHRONISE ALL THREADS;
w(v) = max(w(v));
while (not all vertices in N(V)/p coloured)
    for i = 0 to N(V)/p do
        if w(v(i)) = w(v)
            find the lowest colour available for vertex v(i);
            set the colour of v(i) to that colour
    end for
    decrease w(v);
    SYNCHRONISE ALL THREADS;
end while;
for each thread
    check if the graph is correct;
    if not, store the incorrect vertices
end for
fix up incorrect vertices sequentially

Figure 3.8: Smallest Degree Last (SDL) Algorithm
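The weighting stage can be sketched as follows. This is one possible reading, not the thesis code: the class name `SmallestLastWeights` is hypothetical, and the sketch assumes that ‘removing’ a weighted vertex lowers the remaining degree of its neighbours, which the figure does not state explicitly. The colouring stage (highest weight first) is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical single-threaded sketch of the SDL weighting stage. */
final class SmallestLastWeights {

    static int[] weights(int[][] adj) {
        int n = adj.length;
        int[] residual = new int[n];        // degree among vertices not yet weighted
        int minDeg = Integer.MAX_VALUE;
        for (int v = 0; v < n; v++) {
            residual[v] = adj[v].length;
            minDeg = Math.min(minDeg, residual[v]);
        }
        int[] weight = new int[n];
        Arrays.fill(weight, -1);            // -1 marks "not weighted yet"
        int assigned = 0;
        int w = 0;
        for (int j = 0; assigned < n; j++) {
            int threshold = minDeg + j;
            // Collect the whole batch first, so that removals within this
            // round do not influence which vertices are selected in it.
            List<Integer> batch = new ArrayList<>();
            for (int v = 0; v < n; v++) {
                if (weight[v] < 0 && residual[v] <= threshold) {
                    batch.add(v);
                }
            }
            for (int v : batch) {
                weight[v] = w;
                assigned++;
                for (int u : adj[v]) {
                    residual[u]--;          // assumed effect of 'removing' v
                }
            }
            w++;                            // the weight increases every round, as in the figure
        }
        return weight;
    }
}
```

Since the threshold grows by one each round, it eventually exceeds the maximum degree, so every vertex receives a weight and the loop terminates.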

3.4.4 Incidence–Degree–Ordering (IDO)

The first part of the parallel IDO algorithm is the search for the highest vertex degree,
max(d(v)), in the whole graph G. As Figure 3.9 shows, after this stage the work is
done in parallel among p threads. Once the first set of vertices has been
coloured (with the lowest available colour), we can start with the gist of the
algorithm, i.e. selecting vertices based on the total number of their coloured neighbours.
Each vertex in every thread looks at its neighbours and counts how
many of them are coloured, even though a neighbour might belong to another thread. The
vertices with the highest counts are then coloured with the lowest available colour. Again,
the colouring follows the Gebremedhin and Manne approach. The algorithm
iterates until all the vertices are coloured.

find the highest degree of vertex, max(d(v)), in graph G;
distribute the work on p number of threads;
while (not all vertices in N(V)/p coloured)
    ... same as the sequential version
end while
for each thread
    check if the graph is correct;
    if not, store the incorrect vertices
end for
fix up those vertices which are incorrect

Figure 3.9: Incidence Degree Ordering (IDO) Algorithm

3.4.5 Saturation–Degree–Ordering (SDO)

There is no significant difference between the parallel versions of IDO and SDO, except
that SDO takes into account the number of differently coloured neighbours (which must be
less than or equal to the number of coloured neighbours).

The algorithm can be seen in Figure 3.9. SDO and IDO are among the best graph colouring
algorithms because they give the lowest number of colours. These algorithms have not
been implemented in parallel before; this is therefore the first implementation of parallel
versions of IDO and SDO.
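The two selection measures differ only in what they count, as the following sketch illustrates. The class and method names are hypothetical, and colour -1 is assumed to mean "uncoloured".

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helpers for the IDO and SDO selection rules. */
final class DegreeMeasures {

    /** Incidence degree: the number of neighbours of v that are already coloured. */
    static int incidenceDegree(int[][] adj, int[] colour, int v) {
        int count = 0;
        for (int u : adj[v]) {
            if (colour[u] >= 0) {
                count++;
            }
        }
        return count;
    }

    /** Saturation degree: the number of distinct colours among the neighbours of v. */
    static int saturationDegree(int[][] adj, int[] colour, int v) {
        Set<Integer> seen = new HashSet<>();
        for (int u : adj[v]) {
            if (colour[u] >= 0) {
                seen.add(colour[u]);
            }
        }
        return seen.size();
    }

    /** Uncoloured vertex with the largest saturation degree, or -1 if none is left. */
    static int nextBySaturation(int[][] adj, int[] colour) {
        int best = -1;
        int bestSaturation = -1;
        for (int v = 0; v < adj.length; v++) {
            if (colour[v] < 0) {
                int s = saturationDegree(adj, colour, v);
                if (s > bestSaturation) {
                    best = v;
                    bestSaturation = s;
                }
            }
        }
        return best;
    }
}
```

Both measures read the colours of neighbours that may belong to other threads, which is why a stale read can lead to a different (though still repairable) selection in the parallel versions.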

3.5 Balanced Colouring

Obtaining the fastest colouring with the lowest number of colours is one of the aims for
each algorithm. Another aim of this project is to achieve a balanced graph colouring. There
are a few techniques that can be used to achieve this. We have looked at two techniques
for balanced colouring:

1. Balancing during colouring

Within the colouring phase, every thread should know how many
colours the other threads have used so far and how many vertices have each colour. Hence, a
public variable is required in the program so that every thread can know the number
of vertices of a given colour in other threads. Instead of assigning
the lowest colour available, we might then have to give a vertex a higher colour in order
to maintain the balance between colours. This might increase the
number of colours used. Some extra computation time might also be required to
check the other threads’ colour composition.

Figure 3.10: Balanced Coloured Graph

2. Balancing after colouring

We can also colour the graph initially with the lowest colour available, and then
check the composition of each colour in every thread. Having this information, we
can then sweep through every colour and ‘exchange’ the current colour c1 of a vertex for a
higher or lower colour c2 (one that is used by fewer vertices in the whole graph).
Here, we also have to make sure that the new colour conforms to the basic
requirement of graph colouring, i.e. none of the neighbours has the same colour.

Gjertsen, Jones and Plassmann implemented the second balanced colouring method in
their later algorithms and allow several passes over the graph to rebalance the
colouring. This is the ‘k’ factor in their PLF(k) and PDR(k) algorithms [15].

Chapter 4

Implementation

4.1 Previous Work

The Maximal Independent Set (MIS) algorithm by Luby [18] takes an average time of
O(log n) on the P-RAM model; however, it was not implemented on a real
machine. The next algorithm was introduced by Jones and Plassmann, who reported
no speedup for their algorithm, which used PVM on a distributed-memory machine. A
further implementation of the JP algorithm was developed by Gjertsen Jr. et al. [15], who
developed a set of new algorithms, PLF(k) and PDR(k), which require fewer colours
than the older JP algorithm but use slightly more execution time. This work also does
not report any speedup for the new algorithms, although they achieved a well-balanced
colouring. Allwright et al. [2] parallelised some well-known sequential
algorithms such as LDF and SDL, and implemented them on both SIMD and MIMD parallel
architectures. Unfortunately, their work also did not achieve any speedup for any of
these algorithms.

Most of these algorithms were implemented on distributed-memory machines. There are
also some recent works that have implemented the algorithms on a shared-memory
machine. Umland [24] implemented a parallel version of the First
Fit (FF) algorithm using Java Threads on a 4-processor machine and achieved a maximum
speedup of 2. Gebremedhin and Manne [11] developed two new
algorithms and claimed to have achieved an almost linear speedup as well as improving
the number of colours used compared to the standard FF algorithm. Their algorithms
were implemented in Fortran 90 using OpenMP on an SGI Origin 2000 supercomputer.
Since they only implemented one particular algorithm, namely First Fit, we would like to
find out whether their good speedup is due to the algorithm, or whether it shows that a
shared-memory machine performs better for parallel graph colouring algorithms than a
distributed-memory machine.

4.2 Structure

The algorithms in Chapter 2 and Chapter 3 are implemented using Java Threads.
Java was selected because an object-oriented programming language such
as Java is well suited to graph algorithms. Moreover, Java has inbuilt support for
shared-memory parallelism through its Thread class.

4.2.1 Java Thread

A thread is a part of a program which has a beginning, an execution sequence and an end,
just like any sequential program. Multithreading is a mechanism by which we can run
several jobs concurrently in one program. Java supports multithreaded programming, in
which we can assign several tasks to different threads at the same time. There are two
methods of implementing threads in Java [13, 21]:

• Subclassing Thread and overriding its run method

  The implementation should be a subclass of the Thread class and define a run method
  that overrides the run method of the Thread class. The run method is then
  invoked by calling the start method of the Thread class.

• Implementing the Runnable interface

  Instead of subclassing the Thread class, we can implement the Runnable interface,
  which means we have to implement the run method defined in the interface.
  This is very useful when our class has to subclass another class (other than Thread).

In our implementation, we initially chose the second method, since we had created
a class that subclasses the Thread class in the hope that it would be generic
and usable by all the other classes in our program. In the later stages of
development, however, we found that we needed an almost different Thread class for every
algorithm we developed. We therefore changed the implementation to use the first method.
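The two styles can be put side by side in a short example. The class names `WorkerThread` and `WorkerTask` and the trivial "work" each performs are purely illustrative and are not taken from the thesis code.

```java
/** Hypothetical side-by-side example of the two ways to define a thread body. */
final class ThreadStyles {

    /** Method 1: subclass Thread and override run(). */
    static final class WorkerThread extends Thread {
        private final int[] out;
        private final int slot;

        WorkerThread(int[] out, int slot) {
            this.out = out;
            this.slot = slot;
        }

        @Override
        public void run() {
            out[slot] = slot + 1;       // stand-in for "colour my share of the vertices"
        }
    }

    /** Method 2: implement Runnable and hand the object to a plain Thread. */
    static final class WorkerTask implements Runnable {
        private final int[] out;
        private final int slot;

        WorkerTask(int[] out, int slot) {
            this.out = out;
            this.slot = slot;
        }

        @Override
        public void run() {
            out[slot] = slot + 1;
        }
    }

    static int[] runBoth() {
        int[] out = new int[2];
        Thread first = new WorkerThread(out, 0);
        Thread second = new Thread(new WorkerTask(out, 1));
        first.start();                  // start() is what eventually calls run()
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return out;
    }
}
```

In both cases the work goes into `run()`, but the thread is launched with `start()`; calling `run()` directly would execute the body in the calling thread.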

4.2.2 Data Structures

Java does not have a graph class, and therefore we implemented our own.
The class contains the data structure of the graph, which stores the vertices and edges,
as well as various methods to access the data in the graph, such as
firstNode(), which returns the first Node in the list of vertices, firstEdgefrom(Node n),
which returns the first Edge of vertex n, and so on. The class also needs to read an input
file either in standard form (for sparse matrix graphs) or in a user-defined format (for the
random graphs). We therefore wrote two separate input parsers to do this.

Initially the graph was stored in a Vector, since a Vector
can grow by itself and we do not know in advance how many vertices or edges the input
file will have. This choice has a major drawback that affects the speedup: Vector
is synchronized. Hence, every time a thread accesses a particular vertex in the
graph, other threads have to wait until it has finished. This defeats the purpose of
parallel programming. We therefore changed the data structure to an array to avoid any
synchronisation. This is acceptable since most of the graphs are
static. The work of Gortz [12] also shows that using Vector as the data structure
causes unnecessary synchronisation.
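An array-based graph of the kind described might be built as in the sketch below. The class `ArrayGraph` and its method names are hypothetical and do not reproduce the thesis's Graph class; the point is only that, once the arrays are filled, any number of threads can read them without locking.

```java
/** Hypothetical array-backed graph: built once, then only read by the worker threads. */
final class ArrayGraph {

    private final int[][] adj;          // adj[v] lists the neighbours of vertex v

    /** Builds the adjacency arrays from an undirected edge list {u, v}. */
    ArrayGraph(int vertexCount, int[][] edges) {
        int[] degree = new int[vertexCount];
        for (int[] e : edges) {
            degree[e[0]]++;
            degree[e[1]]++;
        }
        adj = new int[vertexCount][];
        for (int v = 0; v < vertexCount; v++) {
            adj[v] = new int[degree[v]];    // exact size known, so no growing is needed
        }
        int[] next = new int[vertexCount];
        for (int[] e : edges) {
            adj[e[0]][next[e[0]]++] = e[1];
            adj[e[1]][next[e[1]]++] = e[0];
        }
    }

    int vertexCount() {
        return adj.length;
    }

    int degree(int v) {
        return adj[v].length;
    }

    /** Returns the neighbour array of v; callers are expected to treat it as read-only. */
    int[] neighbours(int v) {
        return adj[v];
    }
}
```

A growable list can still be used while parsing the input file; the edge list is converted into fixed arrays only once the sizes are known.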

4.2.3 Class Structure

The algorithms are implemented with Java Threads and organised in such a way that
common methods are collected in one class. The algorithms implemented are those
discussed in Chapter 2 and Chapter 3.

For every algorithm, a few classes are written:

1. Main file: contains the main method, a method for parsing the graph, a method
for distributing jobs to different threads, and the code that invokes the run method of
the Thread class.

2. Thread class: overrides the run method of the Thread class, which invokes the
methods in the colouring / algorithm class.

3. Algorithm class: consists of methods to form the set of vertices.

4. Colouring class: contains a method to colour the set of vertices. In simple
algorithms, this class is combined with the algorithm class in one class.

On top of these classes, there are also other general classes:

1. Graph Generator: creates input files of random graphs with certain parameters, e.g.
the number of vertices, the number of edges, the percentage of edges per vertex.

2. Graph Parser: reads the input file and forms the graph.

3. Barrier Synchronisation: used in the parallel version; contains a method that tells
a thread to wait until the other threads have finished running (synchronising
the threads).

4. Function class: a collection of common methods used in most of the algorithms,
for example finding the lowest/highest vertex degree, finding the lowest colour available,
checking the colour balance, etc.

Besides the Graph Generator class, which creates the random graph input files, we also
developed sets of test files for different numbers of threads on different machines.

4.3 Sequential version

Six sequential algorithms are implemented, namely Jones–Plassmann (JP), Largest Degree
First (LDF), Smallest Degree Last (SDL), Incidence Degree Ordering (IDO), Saturation
Degree Ordering (SDO) and First Fit (FF). In order of increasing complexity,
the algorithms are FF (the simplest), JP, LDF, SDL, IDO and SDO.
All of these algorithms choose the vertex to be coloured by following a set of rules. The
vertices are then coloured one after another (with the lowest colour available) until all the
vertices in the graph G are coloured. Note that the JP algorithm does not actually have a
sequential version, but we developed one (which follows the same principle
as the parallel version, i.e. using the biggest random number to choose the vertex to be
coloured) in order to measure the speedup achieved by the parallel algorithm.

4.4 Parallel version

The main issue with parallel colouring is that we cannot in general colour nodes
independently; otherwise we might get a wrong colouring, i.e. two adjacent vertices having
the same colour. In the sequential version, the vertices are coloured one after another, so
we can make sure that none of a vertex’s neighbours has the same colour. In contrast,
parallel colouring requires the colouring to be done simultaneously while, at the same
time, avoiding any mistake in the colouring phase. To achieve this we need a few
synchronisation methods at some stages of the program.

We have developed a barrier synchroniser which lets a thread know whether
it has to wait before executing the next part of the program. To do this, we use two Java
methods, wait() and notifyAll() (both defined in the Object class), to let other threads
know whether the caller wants them to wait or to be released from the waiting queue
[13, 7]. Once a thread invokes wait(), it waits until another thread calls
notifyAll(), at which point all the waiting threads are woken up and start executing the
next part of the program.
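A reusable barrier built only on wait() and notifyAll() might look like the sketch below. It is not the thesis's synchroniser: the class `SimpleBarrier`, its `generation` counter and the `demo` method are assumptions made for illustration. The counter is one common way to stop a thread that has already entered the next barrier call from being confused with threads still waiting at the previous one.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

/** Hypothetical reusable barrier based on wait() and notifyAll(). */
final class SimpleBarrier {

    private final int parties;
    private int waiting = 0;
    private int generation = 0;         // incremented every time the barrier opens

    SimpleBarrier(int parties) {
        this.parties = parties;
    }

    synchronized void await() throws InterruptedException {
        int arrivedIn = generation;
        waiting++;
        if (waiting == parties) {       // last thread to arrive opens the barrier
            waiting = 0;
            generation++;
            notifyAll();
            return;
        }
        while (generation == arrivedIn) {
            wait();                     // loop guards against spurious wake-ups
        }
    }

    /** Runs a small multi-phase test and returns the number of ordering violations seen. */
    static int demo(int threads, int phases) {
        SimpleBarrier barrier = new SimpleBarrier(threads);
        AtomicIntegerArray arrived = new AtomicIntegerArray(phases);
        AtomicInteger violations = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    for (int p = 0; p < phases; p++) {
                        arrived.incrementAndGet(p);
                        barrier.await();
                        // After the barrier, every thread must have finished phase p.
                        if (arrived.get(p) != threads) {
                            violations.incrementAndGet();
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    violations.incrementAndGet();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return violations.get();
    }
}
```

Later Java releases (from Java 5 onwards) also provide `java.util.concurrent.CyclicBarrier`, which offers the same behaviour ready-made.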

The barrier synchroniser is invoked mostly at three places:

• After the formation of the independent (or non-independent) set of vertices. However,
  this only applies to those algorithms which take into consideration the number
  of coloured neighbours, such as SDL, SDO and IDO. Other algorithms such as
  FF, JP and LDF need not be synchronised at this stage. To illustrate the importance
  of synchronisation, consider an example with the SDO algorithm. Say
  we have two threads, in which Thread 1 is faster than Thread 2. Having finished
  selecting its set of vertices, say V1′, for the first iteration, Thread 1 moves on to
  colouring that set. Thread 2, on the other hand, is still selecting the
  vertices which have the highest number of differently coloured neighbours, say V2′.
  While Thread 2 is selecting V2′, the vertices in V1′ (which might be
  neighbours of vertices in V2′) are being coloured. When Thread 2 examines them, the
  vertices of V1′ might not be coloured yet, but they might become so just after V2′ is
  formed. Hence the selection of V2′ is wrong.

• After the colouring phase. This synchronisation basically has the same function as
  the first one, that is, to avoid any possibility of one thread identifying a vertex as a
  member of the independent set while another thread is colouring one of its neighbours.

• After all the vertices in the graph are coloured and before we perform the
  check for wrongly coloured vertices. The reason for this is that
  an uncoloured vertex would be ignored by the check and might later be given a wrong colour.

4.4.1 Independent Set Vertices

The algorithms in this category produce a correctly coloured graph, since all the
vertices in the set are independent and hence no wrong colour can be given to any
adjacent vertices. The method of finding the lowest colour available plays a very important
role in making sure that all the vertices are correctly coloured. Nevertheless, a check
is performed at the end of the algorithm for debugging purposes. Since this check is not
required by the algorithm, the time it takes is not included
in the timing. Synchronisation for this set of algorithms takes place as mentioned
above, namely after the grouping of vertices and after the colouring of the set of vertices.

4.4.2 Non-Independent Set Vertices

For each algorithm, the set of vertices is coloured in the order in which it was
stored in the collection. Errors in which adjacent vertices are given the same colour are
likely to occur during this phase, since the threads are not forced to wait for the others
to finish colouring (see Figure 3.1). In the implementation, we chose the second
approach (as in section 4.4), i.e. letting the errors happen and fixing them afterwards.
Hence, checking is essential in the later stage of the
algorithm, in order to fix the colours of those vertices. The checking of the graph is done
in parallel, but the correction is done sequentially in order to avoid any further errors.

set the threshold value, t;
loop over vertices v(i) ∈ G
    Ni = Np / N;
    if |C(i)| > Ni and |C(j)| < Ni
        for i = 0 to N(V)/p
            if vertex v(i) has the colour c1 ∈ C(i)
                check if colour c2 ∈ C(j) exists in C(neigh(v(i)));
                if not then swap(v(i), c1, c2)
                if |C(i)| − Ni < t, the threshold
                    then stop
                else
                    iterate until (|C(i)| − Ni) < t or
                    |C(i)| = Ni, for all colours i
                end if
            end if
        end for
end loop

Figure 4.1: Colour Balancing method

4.5 Balanced Colouring

The balancing method used in the implementation is the second approach explained in
section 3.5, with some modifications. The algorithm is described in Figure 4.1. The method
first colours the graph as normal, so that we know the number of colours, N.
The number of vertices of each colour is stored in an array and then compared with the
‘ideal’ number. The ideal number of vertices per colour is defined as the number of vertices
per processor, Np, divided by the number of colours, N: Ni = Np / N. In the case where
the threads have different numbers of colours, we use the highest number among
all threads. Vertices that have been coloured with a colour which is used more often
than the ideal number have to be re-coloured with another colour
which is used less often than the ideal number. This swapping of colours
must also respect the main rule of graph colouring, namely that the new colour does not
belong to any of the adjacent vertices.

We also set a threshold to stop the re-colouring process in the case where the colour
of a vertex cannot be swapped with another colour (since all of the colours already
exist in the adjacent vertices). The threshold is a percentage of the ideal number
which we are trying to achieve for every thread. The method keeps checking:
if the excess of a given colour over the ideal number, within all the threads, is less than
the threshold, the swapping of colours is stopped. The drawback of this method is that
it sweeps the graph only once and stops even though some of the colours might not be
distributed evenly. Ideally, we might need a few sweeps across the graph to re-balance the
distribution of colours in the case where no further swaps can be done. This
balancing method is very simple and could be improved in many ways.
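A single balancing sweep along these lines is sketched below. It is a simplification under stated assumptions: the class `ColourBalancer` is hypothetical, the ideal count is computed over the whole graph rather than per thread, the threshold is an absolute number of vertices rather than a percentage, and the input is assumed to be a complete, valid colouring using colours 0 to numColours - 1.

```java
/** Hypothetical single-sweep colour balancer (whole graph, one thread). */
final class ColourBalancer {

    /** Counts how many vertices carry each colour. */
    static int[] classSizes(int[] colour, int numColours) {
        int[] size = new int[numColours];
        for (int c : colour) {
            size[c]++;
        }
        return size;
    }

    /** Moves vertices from over-full colours to under-full ones where the neighbours allow it. */
    static void balanceOnce(int[][] adj, int[] colour, int numColours, double threshold) {
        int[] size = classSizes(colour, numColours);
        double ideal = (double) colour.length / numColours;
        for (int v = 0; v < adj.length; v++) {
            int from = colour[v];
            if (size[from] - ideal <= threshold) {
                continue;               // this colour is close enough to the ideal
            }
            boolean[] used = new boolean[numColours];
            for (int u : adj[v]) {
                used[colour[u]] = true;
            }
            for (int to = 0; to < numColours; to++) {
                // The new colour must be legal for v and currently under-used.
                if (!used[to] && size[to] < ideal) {
                    colour[v] = to;
                    size[from]--;
                    size[to]++;
                    break;
                }
            }
        }
    }
}
```

As noted in the text, one sweep may leave some colours unevenly distributed; calling `balanceOnce` repeatedly until nothing changes would be one simple way to add further passes.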

Chapter 5

Performance measurement and Analysis

A major component of this project is to observe the performance of the newly developed
algorithms and find out whether they gain any speedup in
computation time. Most of the previous work has gained little or no speedup.
Allwright et al. report that they did not obtain any speedup [2]. Jones and Plassmann,
in the paper in which they describe the JP algorithm, do not report any speedup
for their algorithm [14]. The only work that has shown good speedup is that of Gebremedhin
and Manne [11], who used a shared-memory machine. This chapter describes the
performance of the algorithms we have developed, in terms of running time,
speedup gained and the number of colours used in the graph.

5.1 Experiment conditions

The algorithms were tested on a 4-processor shared-memory machine, a
Sun E420R (Orion) at the Physics Department, University of Adelaide. Orion is made up of
40 Sun E420R servers, in which each processor is a 450 MHz UltraSPARC II with
4 MB of level 2 cache [5]. The tests were run on a few nodes of the Orion machine.
We tried to make sure that no other jobs were running during the execution of the program,
in order to obtain reliable results. In the later part of the experiment, we also
tested the algorithms on a larger machine, Titan, a 20-processor SGI Power Challenge
with 195 MHz MIPS R10000 processors and 2 MB of level 2 cache [25].

The test graphs are of two different types:

• Random graphs: We developed a ‘graph generator’ to produce a random graph with
  a certain number of nodes and a certain percentage of edges per node. A few large
  graphs in the order of several hundred nodes were selected, with different numbers
  of edges.

• Sparse graphs: These were taken from the collection of standard sparse matrix graphs
  available on the internet [22].
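A random graph generator of the kind mentioned in the first item could be as simple as the sketch below. The thesis does not give its generator, so everything here is an assumption: the class name, the seed parameter, and in particular the model, in which each possible edge is included independently with a fixed probability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical random graph generator (independent edge probability model). */
final class RandomGraphGenerator {

    /** Returns an undirected edge list {u, v} with u < v over vertices 0..n-1. */
    static List<int[]> generate(int n, double edgeProbability, long seed) {
        Random rng = new Random(seed);  // fixed seed makes the test graph reproducible
        List<int[]> edges = new ArrayList<>();
        for (int u = 0; u < n; u++) {
            for (int v = u + 1; v < n; v++) {
                if (rng.nextDouble() < edgeProbability) {
                    edges.add(new int[] {u, v});
                }
            }
        }
        return edges;
    }
}
```

Using a fixed seed means the same graph can be regenerated for every algorithm and every thread count, so that timings are compared on identical inputs.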

Tables 5.1 and 5.2 show the number of vertices and edges for each test graph. The
tests were conducted for each algorithm on 1, 2, 3 and 4 processors, since the E420R has
only 4 processors. Any speedup shown in the graphs is the time taken by the sequential
version of the algorithm divided by the time taken by the parallel version.
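Written as a formula, the speedup measure used here is the following, where the symbols S(p), T_seq and T_par(p) are notation introduced for this illustration and p is the number of processors:

```latex
S(p) = \frac{T_{\mathrm{seq}}}{T_{\mathrm{par}}(p)}, \qquad
S(p) > p \;\Longrightarrow\; \mbox{super-linear speedup}
```

For instance, a reported speedup of 3.82 on p = 4 processors is just below linear, whereas a value of 5.9 on the same machine exceeds p and therefore counts as super-linear.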

Nodes   Edges
250     6062
500     9490
1000    19764

Table 5.1: Test Graphs 1: Random Graphs

Name    Nodes   Edges
3elt    4720    27444
4elt2   11143   65636
4elt    15606   91756

Table 5.2: Test Graphs 2: Sparse Matrix Graphs

5.2 Results

5.2.1 Different types of graphs

Sparse matrix graphs

For the sparse graphs, the algorithms show a good speedup. Tests were conducted
on a small graph (3elt) as well as larger graphs (4elt and 4elt2). In terms of the time taken
to solve the GCP, Figures 5.1 and 5.2 show that the FF algorithm took the smallest
amount of time, followed by its close relative, GEBMAN. The IDO and SDO algorithms
are the slowest, while JP and LDF are in between. In terms of speedup, however,
Table 5.3 shows that the FF algorithm, being the simplest and fastest,
gains a fairly reasonable speedup of 2-4, while SDO and IDO, which are the slowest,
gain a high speedup of 5-6. This gain might be due to the fact that the sequential
versions of these two algorithms are very slow and Orion might have had a heavy load
while the sequential algorithms were being tested. Alternatively, this might be the real
performance of SDO and IDO, which, when parallelised, could achieve a super-linear
speedup due to the greater use of cache memory. Overall, all the algorithms gained a
reasonable speedup of between 2 and 6. Figure 5.3 gives a clearer picture of the speedup
of each algorithm for each number of processors in solving the 3elt problem.

Problem   FF     SDL    GEBMAN   JP     LDF    IDO    SDO
3elt      3.82   2.88   3.76     3.45   3.75   5.83   5.9
4elt2     2.21   2.51   2.88     3.61   3.26
4elt      3.76   2.58   3.71     3.69   3.4
Tested on 4-processor Sun E420R (Orion)

Table 5.3: Speedup of all algorithms

Problem   FF     SDL    GEBMAN   JP     LDF    IDO     SDO
3elt      60     207    91       393    365    3967    3938
4elt2     663    1345   748      430    2074   54739   54154
4elt      1006   3527   1499     5635   5759
Tested on 4-processor Sun E420R (Orion)

Table 5.4: Time taken by each algorithm (in seconds)

In terms of colours used, SDO and IDO use the lowest number of colours, while FF, SDL
and the other algorithms use a slightly higher number. Unfortunately, the test graphs
used here have only 5-8 neighbours per vertex, so we cannot see much variation in
the number of colours used by each algorithm.

Problem FF SDL GEBMAN JP LDF IDO SDO


3elt 6 6 6 6 6 5 5
4elt2 6 6 6 6 7 5 5
4elt 6 6 7 7 7 5 5
Tested on 4-processors Sun E420R (Orion)

Table 5.5: Number of colours used by the algorithms using 4 processors

Figure 5.1: Computation time for 3elt problem

Figure 5.2: Computation time for 4elt2 problem

Figure 5.3: Speedup for 3elt problem

Random Graph

For the random graphs, a reasonably good speedup was also achieved for all of the
algorithms. Table 5.6 and figure 5.4 show that the IDO and SDO algorithms had the
highest gain of 4.5-6, while the SDL and LDF algorithms lie in between at 2.5-4. FF and
GEBMAN, being the fastest among them, only gain a speedup of the order of 2-3 on a
4-processor machine. The reason why IDO and SDO gained a speedup higher than 4 is
the same as for the sparse graphs.

Problem FF SDL GEBMAN JP LDF IDO SDO
1000 Nodes 19274 Edges 2.56 2.85 2.86 3.37 3.39 4.68 5.55
500 Nodes 9490 Edges 2.01 2.06 2.02 2.38 2.73 4.79 5.08
250 Nodes 6062 Edges 2.39 3.31 2.59 2.89 2.01 4.68 5.76
Tested on 4-processors Sun E420R (Orion)

Table 5.6: Speedup of each algorithm on Random Graphs

In terms of time, the FF algorithm is the fastest of all, followed by LDF and JP, while
IDO and SDO are the slowest. Again, these timings might have been subject to some
disturbance from other tasks running on the machine during the tests.

Problem FF SDL GEBMAN JP LDF IDO SDO
1000 Nodes 19274 Edges 44.8 230.8 61.4 616.6 234.6 716.0 588.8
500 Nodes 9490 Edges 12.9 55.9 12.9 146.2 58.3 141.6 124.4
250 Nodes 6062 Edges 4.4 18.8 5.8 56.5 31.5 27.7 26.6
Tested on 4-processors Sun E420R (Orion)

Table 5.7: Time taken (in seconds) by each algorithm for Random Graphs

Figure 5.5 and figure 5.6 show the timings achieved for test graphs of 500 vertices and
250 vertices respectively. As we can observe, the First–Fit algorithm and the GEBMAN
algorithm always have the smallest computation time. This shows that for any
application that needs a fast computation time (without worrying about the number of
colours used), we should use the First–Fit algorithm.
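
The greedy rule behind First–Fit can be sketched in a few lines. The following is a minimal sequential illustration only, not the implementation measured in this thesis: the class name FirstFitSketch, the adjacency-array representation and the natural vertex order 0, 1, 2, ... are all assumptions made for the example.

```java
import java.util.Arrays;

final class FirstFitSketch {

    /**
     * Greedy First-Fit colouring. adj[v] holds the neighbours of vertex v
     * (undirected graph, no self-loops). Returns the colour of each vertex.
     */
    static int[] firstFit(int[][] adj) {
        int n = adj.length;
        int[] colour = new int[n];
        Arrays.fill(colour, -1);              // -1 means "not coloured yet"
        boolean[] taken = new boolean[n + 1]; // scratch array, reused per vertex

        for (int v = 0; v < n; v++) {
            // Mark the colours already used by coloured neighbours.
            for (int u : adj[v]) {
                if (colour[u] >= 0) taken[colour[u]] = true;
            }
            // Pick the smallest colour that no neighbour uses.
            int c = 0;
            while (taken[c]) c++;
            colour[v] = c;
            // Clear the marks so the scratch array can be reused.
            for (int u : adj[v]) {
                if (colour[u] >= 0) taken[colour[u]] = false;
            }
        }
        return colour;
    }

    public static void main(String[] args) {
        // A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
        int[][] adj = {{1, 2}, {0, 2}, {0, 1, 3}, {2}};
        System.out.println(Arrays.toString(firstFit(adj)));
    }
}
```

Each vertex is visited once and only its own neighbour list is scanned, which is one reason the method is cheap compared with orderings such as SDO and IDO that must keep selecting the next vertex dynamically.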

Figure 5.4: Speedup for Random Graph (250 Nodes)

Figure 5.5: Computation Time for Random Graph (500 Nodes)

Figure 5.6: Computation Time for Random Graph (250 Nodes)

Problem FF SDL GEBMAN JP LDF IDO SDO
1000 Nodes 19274 Edges 10 10 11 11 10 11 11
250 Nodes 6062 Edges 12 11 12 12 12 11 11
500 Nodes 9490 Edges 11 10 10 11 10 10 10
Tested on 4-processors Sun E420R (Orion)

Table 5.8: Number of colours used in each algorithm for Random Graphs

5.2.2 Graphs with same number of vertices and different number of edges

The algorithms were also tested on graphs which have the same number of nodes
but different numbers of edges. We would like to see whether the number of edges per
vertex has any effect on the time taken to colour the graph, the number of colours used
and the speedup gained by each algorithm.

As we can observe from figure 5.7 and table 5.9, the larger the number of edges, the
longer it takes to colour the graph. An increased number of edges means a larger
number of edges per vertex, and therefore it may also mean a larger number of
colours needed for that graph. Hence, the performance (in terms of time) of the
algorithms for a given graph depends on the complexity of the graph itself. Table 5.10
shows the number of colours used for each graph. As we can see, the lower the number
of edges, the lower the number of colours used for that particular graph.
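
To make "edges per vertex" concrete: in an undirected graph the average degree is 2E/V, since every edge contributes to the degree of two vertices. The short sketch below applies this to the edge counts listed in table 5.9, taking the 500 vertices stated in the caption of figure 5.7. The class and method names are hypothetical.

```java
final class AverageDegree {

    /** Average vertex degree of an undirected graph: 2E / V. */
    static double averageDegree(int vertices, int edges) {
        return 2.0 * edges / vertices;
    }

    public static void main(String[] args) {
        int vertices = 500;                            // from the caption of figure 5.7
        int[] edgeCounts = {24726, 19650, 14024, 9550}; // rows of table 5.9
        for (int edges : edgeCounts) {
            System.out.printf("%5d edges: average degree %.1f%n",
                    edges, averageDegree(vertices, edges));
        }
    }
}
```

For 500 vertices, 9550 edges correspond to an average degree of 38.2 and 24726 edges to about 98.9, so the densest test graph has roughly two and a half times as many neighbours per vertex as the sparsest.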

Problem FF SDL GEBMAN JP LDF IDO SDO


24726 Edges 82.0 316.1 94.1 1455.2 504.8 636.3 629.9
19650 Edges 50.6 136.9 63.9 886.9 259.4 530.2 533.5
14024 Edges 27.3 105.5 34.0 395.4 154.6 243.0 243.3
9550 Edges 10.5 34.7 15.2 120.8 42.8 124.7 120.5
Tested on 4-processors Sun E420R (Orion)

Table 5.9: Computation time for each algorithm for Graphs of same nodes and different
edges

Problem FF SDL GEBMAN JP LDF IDO SDO


24726 Edges 19 19 19 19 18 19 18
19650 Edges 16 15 16 15 15 15 15
14024 Edges 13 12 13 12 13 13 12
9550 Edges 10 9 10 10 9 10 10
Tested on 4-processors Sun E420R (Orion)

Table 5.10: Number of colours in each algorithm for Graphs of same nodes and different
edges

Figure 5.7: Computation Time for Graph of same nodes (500 Nodes) and different number of Edges

5.2.3 Different number of processors

The algorithms were also tested on another machine, Titan, which has 20 processors but
less cache memory. The graph used in the test is the 3elt problem from the sparse
matrix set. If we compare the timing results for Titan in table 5.11 with those for Orion
in table 5.4 for the same number of processors (4), Orion performs better in terms of the
time taken, as Orion has faster processors (450 MHz compared to 195 MHz).

No of processors FF SDL GEBMAN JP LDF IDO SDO
1 922.8 2246.5 1492.4 5022.6 5660.7
2 493.2 1503.2 739.7 2848.2 2775.0
4 264.6 857.5 391.2 1521.0
8 133.4 839.3 220.6 776.0
12 91.2 883.3 192.3 555.3
16 114.7 1095.4 155.5 485.6

Table 5.11: Computation time for each algorithm in different machines (TITAN)

Figure 5.8 shows the overall performance of all the algorithms. The larger the number of
threads used to solve the problem, the smaller the time taken. Nevertheless, the speedup
tends to level off or even go down once the number of processors exceeds 12. Overall,
we can say that most of the algorithms scale well to more than 4 processors.
From figure 5.9 we can see that FF has an increasing speedup up to 12 processors,
while the LDF, GEBMAN and JP algorithms have an increasing speedup up to 16
processors. Interestingly, SDL has a low speedup, which might be because the
machine had a heavy load at the time of the run. The results obtained for 16 processors
might not be reliable either, for the same reason.
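
The speedup values discussed here follow from the timings by the usual definitions: speedup S_p = T_1 / T_p and efficiency E_p = S_p / p, where T_p is the time on p processors. The sketch below is illustrative only; the class name is hypothetical, and it assumes that the first numeric column of table 5.11 holds the FF timings.

```java
final class SpeedupSketch {

    /** Speedup S_p = T_1 / T_p. */
    static double speedup(double t1, double tp) {
        return t1 / tp;
    }

    /** Parallel efficiency E_p = S_p / p. */
    static double efficiency(double t1, double tp, int p) {
        return speedup(t1, tp) / p;
    }

    public static void main(String[] args) {
        int[] processors = {1, 2, 4, 8, 12, 16};
        // First numeric column of table 5.11, assumed to be the FF timings.
        double[] time = {922.8, 493.2, 264.6, 133.4, 91.2, 114.7};
        for (int i = 0; i < processors.length; i++) {
            System.out.printf("p=%2d  speedup=%5.2f  efficiency=%4.2f%n",
                    processors[i],
                    speedup(time[0], time[i]),
                    efficiency(time[0], time[i], processors[i]));
        }
    }
}
```

Under that assumption the speedup is about 10.1 on 12 processors and falls to about 8.0 on 16, which agrees with the behaviour described for FF.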

Figure 5.8: Computation Time for 3elt Graph in Titan

Figure 5.9: Speedup for 3elt Graph in Titan

5.3 Balanced Colouring Graph

The approach to balanced graph colouring uses the checking and fixing method
of section 4.5. Table 5.12 shows the colour distribution for the 4elt problem using the FF
algorithm before we apply the balancing method. Note that the standard deviation
measures the spread in the number of vertices of a given colour across the processors.
Hence a large standard deviation means that the number of vertices of a given colour in
each thread is not balanced. Table 5.13 shows the improvement after balancing for the
same graph problem. Before balancing, the deviations are large. After balancing, most of
the colours have zero standard deviation (all threads have the same number of vertices of
a given colour), while some still have a large standard deviation, although it remains of
the order of 10 percent of the ‘ideal’ number per colour. The fact that some threads have
more vertices of a colour than the rest (see colours 3 and 5 of table 5.13) arises because
the balancing method makes only one sweep of the graph. The method can be improved
by performing several sweeps of the graph and re-ordering the balancing to make sure
that the colours are evenly distributed.
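
The figures in the Std Deviation column are consistent with the sample standard deviation (dividing by n - 1) of the four per-thread counts. For example, the colour 0 row of table 5.12 (980, 603, 615, 662) gives approximately 178.49. A minimal sketch of that calculation follows; the class and method names are hypothetical.

```java
final class ColourSpread {

    /** Sample standard deviation (n - 1 in the denominator) of per-thread counts. */
    static double sampleStdDev(int[] counts) {
        double mean = 0.0;
        for (int c : counts) mean += c;
        mean /= counts.length;

        double sumSquares = 0.0;
        for (int c : counts) sumSquares += (c - mean) * (c - mean);

        return Math.sqrt(sumSquares / (counts.length - 1));
    }

    public static void main(String[] args) {
        int[] before = {980, 603, 615, 662}; // colour 0 row of table 5.12
        int[] after = {650, 650, 650, 650};  // colour 0 row of table 5.13
        System.out.printf("before: %.2f, after: %.2f%n",
                sampleStdDev(before), sampleStdDev(after));
    }
}
```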

Colour No Thread 1 Thread 2 Thread 3 Thread 4 Std Deviation


0 980 603 615 662 178.49
1 749 607 741 659 68.14
2 444 692 719 655 125.11
3 431 709 546 614 116.98
4 155 114 158 182 28.22
5 27 61 14 24.27

Table 5.12: Distribution of Colour before balancing for 4 processors using FF Algorithm
in 4elt problem

Colour No Thread 1 Thread 2 Thread 3 Thread 4 Std Deviation


0 650 650 650 650 0
1 650 650 650 650 0
2 650 652 651 650 0.96
3 694 650 650 774 58.47
4 650 650 650 650 0
5 608 650 650 527 57.98

Table 5.13: Distribution of Colour after balancing for 4 processors using FF Algorithm
in 4elt problem

The result of balancing for another problem, 4elt2, is very similar to that for 4elt. The
colours are evenly distributed for colours 0, 1, 2 and 4, while colours 3 and 5 (see
table 5.15) are imbalanced, especially in thread 4. This might cause the overall
calculation of thread 4 to be delayed by around 10 percent relative to the other threads.

The balancing method applied here is very simple: it makes only one sweep across the
graph. We therefore recommend improving it so that a better balanced colouring is
obtained. The method should be able to sweep across the graph several times to
re-order the distribution of the colours.
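
One possible shape for such a multi-sweep refinement is sketched below. This is a generic illustration and not the checking and fixing method of section 4.5: it balances the overall sizes of the colour classes rather than the per-thread counts, it is sequential, and the class name, method names and the maxSweeps parameter are all hypothetical. A vertex is moved only to a colour that none of its neighbours uses, so the colouring stays valid.

```java
final class BalanceSweeps {

    /**
     * One sweep over all vertices: a vertex in an over-full colour class is
     * moved to an under-full class if no neighbour has that colour.
     * Returns the number of vertices moved.
     */
    static int sweep(int[][] adj, int[] colour, int[] size, int target) {
        int moves = 0;
        int k = size.length;
        for (int v = 0; v < adj.length; v++) {
            int c = colour[v];
            if (size[c] <= target) continue;      // class is not over-full
            boolean[] used = new boolean[k];
            for (int u : adj[v]) used[colour[u]] = true;
            for (int d = 0; d < k; d++) {
                if (!used[d] && size[d] < target) {
                    colour[v] = d;                // legal move to a smaller class
                    size[c]--;
                    size[d]++;
                    moves++;
                    break;
                }
            }
        }
        return moves;
    }

    /** Repeats sweeps until one makes no move or maxSweeps is reached. */
    static int balance(int[][] adj, int[] colour, int k, int maxSweeps) {
        int[] size = new int[k];
        for (int c : colour) size[c]++;
        int target = (adj.length + k - 1) / k;    // ceiling of n / k
        int sweeps = 0;
        while (sweeps < maxSweeps) {
            sweeps++;
            if (sweep(adj, colour, size, target) == 0) break;
        }
        return sweeps;
    }
}
```

Every move takes a vertex out of a class above the target and into one below it, so the total excess shrinks with each move and the loop must stop. Whether extra sweeps help in practice depends on the graph.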

Colour No Thread 1 Thread 2 Thread 3 Thread 4 Std Deviation
0 1096 1015 1095 1112 43.7
1 1078 995 1056 1085 40.91
2 1042 948 994 989 38.48
3 630 774 694 637 66.65
4 50 167 59 71 54.19
5 6 2 4 7 2.22

Table 5.14: Distribution of Colour before balancing for 4 processors using FF Algorithm
in 4elt2 problem

Colour No Thread 1 Thread 2 Thread 3 Thread 4 Std Deviation


0 464 464 464 464 0
1 466 464 464 464 1
2 464 464 464 464 0
3 464 466 466 546 40.34
4 464 464 464 464 0
5 464 464 464 383 40.5

Table 5.15: Distribution of Colour after balancing for 4 processors using FF Algorithm
in 4elt2 problem

Chapter 6

Conclusions and Future Work

6.1 Conclusions

We have implemented some of the existing sequential graph colouring algorithms, namely
Smallest Degree Last (SDL), Largest Degree First (LDF), First Fit (FF), Saturation
Degree Ordering (SDO) and Incidence Degree Ordering (IDO), in Java. We have also
developed their parallel versions, together with the parallel algorithm of
Jones-Plassmann (JP), for shared-memory machines using Java Threads. The algorithms
were parallelised using two different approaches: forming an independent set, and using
a non-independent set. The algorithms were implemented in Java since it supports
parallel programming through its Thread class.

The performance of these algorithms shows that the FF algorithm is the fastest but gives
a larger number of colours. On the other hand, SDO and IDO are the slowest algorithms,
but give a better colouring in terms of the number of colours.

The choice of graph colouring algorithm depends on the sort of problem to be solved.
For problems in which we require the lowest number of colours, IDO and SDO will
probably be the most suitable (even though they are slow). But if the application
does not require this, the FF, SDL or JP algorithm will be sufficient (and they are faster
as well).

Most of the algorithms have shown a reasonably good speedup on shared-memory
machines, so we conclude that the good speedup is due to shared-memory machines
being better suited than distributed-memory machines to this sort of application, and
not to the algorithms as stated by Gebremedhin and Manne [11].

This project has also developed parallel implementations of the best sequential
algorithms, namely SDO and IDO. These are the first parallel implementations of these
algorithms.

6.2 Future Work
1. There are some algorithms, for example SDL, which can be parallelised by forming
an independent set of vertices. It would be interesting to compare the performance
of the two versions: one using a selection of independent vertices, and the other,
implemented here, using a non-independent set (which follows the
Gebremedhin-Manne approach of fixing up the incorrectly coloured vertices). We can
then compare which one is better in terms of time, speedup and the number of
colours used.

2. We also recommend that more effort be put into testing on larger machines
as well as on larger graphs. The performance of the algorithms should also be tested
against the complexity of the graphs, to classify which algorithms work best for
which types of graph.

3. There are also other sequential algorithms which are yet to be parallelised and
compared with the existing ones.

4. Balanced colouring is an interesting feature of graph colouring algorithms.
There are other methods besides the ones mentioned in this paper which
can be implemented. We strongly recommend improving the balanced colouring
method by allowing it to make several sweeps of the graph to refine the
distribution of vertices in each colour class.

Bibliography

[1] A. Aho, J. Hopcroft, and J. Ullman. Data Structures and Algorithms. Addison-
Wesley Publishing Company, 1983.

[2] J. R. Allwright, R. Bordawekar, P. Coddington, K. Dincer, and C. L. Martin. A
Comparison of Parallel Graph Colouring Algorithms. Technical Report SCCS-666,
Northeast Parallel Architecture Centre, Syracuse University, 1995.

[3] D. Brelaz. New Methods to Colour the Vertices of a Graph. Communications of
the ACM, 22:251, 1979.

[4] G. J. Chaitin, M. Auslander, A. K. Chandra, J. Cocke, M. E. Hopkins, and
P. Markstein. Register Allocation via Colouring. Computer Languages, 6:47–57, 1981.

[5] P. Coddington. DHPC Group’s Beowulf Cluster Projects.
http://www.dhpc.adelaide.edu.au/projects/beowulf/index.html, accessed online
on 24th June 2002.

[6] T. Coleman and J. J. More. Estimation of Sparse Jacobian Matrices and Graph
Colouring Problems. SIAM Journal on Numerical Analysis, 20:187–209, 1983.

[7] EPCC. The Java Grande Forum Multi–threaded Benchmarks.
http://www.epcc.ed.ac.uk/javagrande/threads/contents.html, accessed online
on 24th May 2002.

[8] M. Garey and D. Johnson. Computers and Intractability. W.H. Freeman, New
York, 1979.

[9] M. Garey, D. Johnson, and H. C. So. An Application of Graph Colouring to Printed
Circuit Testing. IEEE Transactions on Circuits and Systems, pages 591–599, 1976.

[10] A. Gebremedhin, I. Lassous, J. Gustedt, and J. Telle. Graph Colouring on a Coarse
Grained Multiprocessor. In Proceedings of WG 2000, 26th International Workshop
on Graph-Theoretic Concepts in Computer Science, Germany, 15–17 Jun 2000.

[11] A. H. Gebremedhin and F. Manne. Scalable Parallel Graph Colouring Algorithms.
Concurrency: Practice and Experience, 12:1131–1146, May 2000.

[12] J. Gortz. Java Tip 92: Use the JVM Profiler Interface for accurate timing.
http://www.javaworld.com/javatips/jwjavatip92p.html, accessed online on 23 May
2002.

[13] Sun Microsystems, Inc. Java Tutorial.
http://java.sun.com/docs/books/tutorial/essential/threads/index.html,
accessed online in April 2002.

[14] M. T. Jones and P. E. Plassmann. A Parallel Graph Colouring Heuristic. SIAM
Journal on Scientific Computing, 14:654, 1993.

[15] R. K. Gjertsen Jr., M. T. Jones, and P. E. Plassmann. Parallel Heuristic for Improved,
Balanced Graph Colouring. Journal of Parallel and Distributed Computing, 37:171–186,
1996.

[16] H. Kierstead and J. Qin. Coloring interval graphs with First–Fit. Discrete
Mathematics, 144:47–57, 1995.

[17] G. Lewandowski. Practical Implementation and Applications of Graph Colouring.
PhD thesis, Computer Science Department, University of Wisconsin-Madison, Aug
1994.

[18] M. Luby. A Simple Parallel Algorithm for the Maximal Independent Set Problem.
SIAM Journal on Computing, 4:1036, 1986.

[19] D. Matula, G. Marble, and J. D. Isaacson. Graph Colouring Algorithms. Graph
Theory and Computing, pages 104–122, 1972.

[20] S. K. Miner, S. Elmohamed, and H. W. Yau. Optimizing Timetabling Solutions
Using Graph Colouring. Technical report, NorthEast Parallel Architecture Center,
Syracuse University, 1995.

[21] S. Oaks and H. Wong. Java Threads. O’Reilly, 1997.

[22] F. Pellegrini. Scotch: static mapping, graph partitioning, and sparse matrix block
ordering package. http://www.labri.fr/Perso/pelegrin/scotch/, accessed online in
May 2002.

[23] R. K. Gjertsen Jr. Parallel Graph Coloring Heuristics. Master’s thesis,
University of Illinois, Urbana–Champaign, 1994.

[24] T. Umland. Parallel Graph Coloring using JAVA. Architectures, Languages and
Patterns, 10:211–217, 1998.

[25] F. Vaughan. The South Australian Centre for Parallel Computing.
http://www.cs.adelaide.edu.au/users/sacpc/, accessed online on 24th June
2002.

[26] D. Welsh and M. B. Powell. An Upper Bound for the Chromatic Number of a Graph
and its Application to Timetabling Problems. The Computer Journal, 10:85, 1967.
