Outline
Graph Terminology
G = (V, E)
W = weight matrix
wij = weight/length of edge (vi, vj)
wij = ∞ if vi and vj are not connected by an edge
wii = 0
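The weight-matrix conventions above can be sketched in a few lines of Python. The helper name `weight_matrix` and the three-vertex graph are illustrative assumptions and do not come from the slides.

```python
import math

INF = math.inf  # stands for "no edge between vi and vj"

def weight_matrix(n, edges):
    """Build the n x n weight matrix W from (i, j, weight) triples."""
    # wii = 0 on the diagonal, infinity everywhere else until an edge is added.
    w = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for i, j, weight in edges:
        w[i][j] = weight
    return w

# Hypothetical directed graph on three vertices (not the graph from the slides).
W = weight_matrix(3, [(0, 1, 4), (1, 2, -2), (2, 0, 3)])
```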
[Figure: example weighted directed graphs, including negative edge weights, shown with their weight matrices]
Johnson's algorithm
Appropriate for sparse graphs: |E| = O(|V|)
$O(V^2 \log V + VE)$ if using a Fibonacci heap
$O(VE \log V)$ if using a binary min-heap
Strassen's algorithm
(matrix multiplication)
Pettie (2002)
Allows real-weighted edges
$O(V^2 \log \log V + VE)$
Weight of a path $p = (v_0, v_1, \ldots, v_k)$: $w(p) = \sum_{i=1}^{k} w(v_{i-1}, v_i)$
Properties of Interest
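Let $d_{ij}^{k}$ denote the length of the shortest path from vi to vj that goes through at most k - 1 intermediate vertices (k hops)
$d_{ij}^{1} = w_{ij}$
[Figure: a shortest path from vi to vj of at most k hops ends with some edge (vl, vj), so its length is the minimum over l of a shortest (k - 1)-hop path to vl plus wlj]
$d_{ij}^{k} = \min_{l} \left( d_{il}^{k-1} + w_{lj} \right)$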
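As a minimal sketch of this property, the function below extends every shortest path by at most one edge: entry (i, j) of the result is the minimum over l of a (k - 1)-hop distance from vi to vl plus the weight of edge (vl, vj). The function name and the small matrix are assumptions chosen for illustration.

```python
import math

INF = math.inf

def extend_by_one_edge(d_prev, w):
    """Given the (k-1)-hop distances d_prev and the weight matrix w,
    return the k-hop distances: min over l of d_prev[i][l] + w[l][j]."""
    n = len(w)
    return [[min(d_prev[i][l] + w[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical weight matrix; INF marks a missing edge.
W = [[0, 4, INF],
     [INF, 0, -2],
     [3, INF, 0]]

# The 1-hop distances are W itself, so one extension gives the 2-hop distances.
D2 = extend_by_one_edge(W, W)
```

Here `D2[0][2]` becomes 2 (path 0, 1, 2) even though there is no direct edge from vertex 0 to vertex 2.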
Recurrence Definition
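For k > 1, $d_{ij}^{k} = \min_{l} \left( d_{il}^{k/2} + d_{lj}^{k/2} \right)$
[Figure: a shortest path from vi to vj with at most k vertices splits at vl into two shortest paths of at most k/2 vertices each]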
Similarity
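The recurrence has the same form as matrix multiplication, $c_{ij} = \sum_{l} a_{il} b_{lj}$, with the sum replaced by min and the product replaced by +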
Computing D
Let $D^k$ = matrix with entries $d_{ij}^{k}$ for $0 \le i, j \le n - 1$.
Given $D^1$, compute $D^2, D^4, \ldots, D^m$
$m = 2^{\lceil \log(n - 1) \rceil}$
$D = D^m$
To calculate $D^k$ from $D^{k/2}$, use a special form of matrix multiplication
Replace × with + and + with min
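The repeated-squaring scheme can be sketched sequentially as follows. This is an illustrative sketch only: the function names and the four-vertex graph are assumptions, and the graph is assumed to have no negative cycles.

```python
import math

INF = math.inf

def min_plus_square(d):
    """Matrix 'multiplication' of d with itself, using + for x and min for +."""
    n = len(d)
    return [[min(d[i][l] + d[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

def all_pairs_shortest_paths(w):
    """Compute D by squaring D^1 = W until the hop bound reaches n - 1."""
    n = len(w)
    d, k = w, 1
    while k < n - 1:
        d = min_plus_square(d)  # D^(2k) from D^k
        k *= 2
    return d

# Hypothetical 4-vertex graph; INF marks a missing edge.
W = [[0, 3, INF, 7],
     [8, 0, 2, INF],
     [5, INF, 0, 1],
     [2, INF, INF, 0]]
D = all_pairs_shortest_paths(W)
```

For n = 4 the loop performs two squarings (D^2, then D^4), matching the count of ⌈log(n - 1)⌉ iterations.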
Step 3: for m = 2q to 3q - 1 do
  for all r ∈ N (r_m = 0) dopar
    C_r = min(C_r, C_{r^(m)})
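Step 3 can be simulated sequentially as below. The sketch assumes that r^(m) denotes the processor index obtained from r by complementing bit m, and that processor r = (i, j, k) keeps index i in its q highest bits; both conventions, the function name, and the 2 x 2 matrices are assumptions made for illustration.

```python
def hypercube_min_reduce(c, q):
    """Simulate Step 3 on 2**(3*q) processors; c[r] is register C of processor r."""
    c = list(c)
    for m in range(2 * q, 3 * q):
        before = list(c)  # every processor reads values from before this step
        for r in range(len(c)):
            if (r >> m) & 1 == 0:            # only processors with bit m of r equal to 0
                partner = r ^ (1 << m)       # r with bit m complemented
                c[r] = min(before[r], before[partner])
    return c

# Hypothetical 2 x 2 inputs (q = 1, so n = 2 and there are 8 processors).
A = [[1, 2], [3, 4]]
B = [[-1, -2], [-3, -4]]
q = 1
n = 2 ** q
c = [0] * n ** 3
for i in range(n):
    for j in range(n):
        for k in range(n):
            # Processor (i, j, k) holds the "product" a_ji + b_ik before the reduction.
            c[(i * n + j) * n + k] = A[j][i] + B[i][k]

reduced = hypercube_min_reduce(c, q)
# Processors with i = 0 now hold the (min, +) product of A and B.
C = [[reduced[j * n + k] for k in range(n)] for j in range(n)]
```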
Modified Example
[Figure: three panels tracing the modified (min, +) multiplication of two 2 x 2 matrices A and B on an 8-processor hypercube (P000 to P111); each processor adds its pair of entries and the final summation is replaced by MIN to produce C]
Hypercube Setup
Begin with a hypercube of $n^3$ processors
Each has registers A, B, and C
Arrange them in an $n \times n \times n$ array (cube)
Setup Example
$D^1 = W$ is stored as $A(0, j, k) = w_{jk}$
[Figure: example graph on vertices v0 to v5 with its weight matrix]
An Example
[Figure: the matrices $D^1$, $D^2$, $D^4$, and $D^8$ computed for the example graph]
Analysis
Steps 1 and (2.2) require constant time
There are $\lceil \log(n - 1) \rceil$ iterations of Step (2.1)
Each requires O(log n) time
Efficiency is $E = \frac{T_1}{c(n)} = \frac{O(n^3)}{O(n^3 \log^2 n)} = O\left(\frac{1}{\log^2 n}\right)$
Recent Research
Jenq and Sahni (1987) compared various parallel
algorithms for solving APSP empirically
Kumar and Singh (1991) used the isoefficiency
metric (developed by Kumar and Rao) to analyze
the scalability of parallel APSP algorithms
Hardware vs. scalability
Memory vs. scalability
Isoefficiency
For scalable algorithms (efficiency increases
monotonically as p remains constant and problem
size increases), efficiency can be maintained as
the number of processors increases, provided that
the problem size also increases
Relates the problem size to the number of
processors necessary for an increase in speedup
in proportion to the number of processors used
Isoefficiency (cont)
Given an architecture, defines the
degree of scalability
Tells us the required growth in problem size to be able to efficiently
utilize an increasing number of processors
Ex:
Given an isoefficiency of $k p^3$
If p0 and w0, speedup = 0.8p0 (efficiency = 0.8)
If p1 = 2p0, to maintain efficiency of 0.8
$w_1 = 2^3 w_0 = 8 w_0$
Indicates the superiority of one algorithm over another only when
problem sizes are increased in the range between the two
isoefficiency functions
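The arithmetic of the example generalizes to any isoefficiency function of the form k p^x. The helper below is a small sketch; its name and the sample numbers are assumptions, not values from the slides.

```python
def required_work(w0, p0, p1, exponent):
    """Problem size needed on p1 processors to keep the efficiency obtained
    with problem size w0 on p0 processors, assuming an isoefficiency
    function of the form k * p**exponent."""
    return w0 * (p1 / p0) ** exponent

# Doubling the processors under a k * p**3 isoefficiency needs 8 times the work.
w1 = required_work(w0=1000.0, p0=16, p1=32, exponent=3)
```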
Architectures Discussed
Floyd Checkerboard
Floyd Pipelined Checkerboard
Floyd Striped
Dijkstra Source-Partition
Dijkstra Source-Parallel
Floyd Checkerboard
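[Figure: the n x n cost matrix partitioned into p square blocks of size $\frac{n}{\sqrt{p}} \times \frac{n}{\sqrt{p}}$, one block per processor]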
Floyd Striped
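[Figure: the n x n cost matrix partitioned into p stripes of width $\frac{n}{p}$, one stripe per processor]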
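The two Floyd partitionings differ only in which processor owns a given matrix element. A minimal sketch, assuming p is a perfect square whose root divides n for the checkerboard case, that p divides n for the striped case, and that stripes are taken by columns (the function names are hypothetical):

```python
import math

def checkerboard_owner(i, j, n, p):
    """(row, col) of the processor owning element (i, j) when the n x n matrix
    is split into p square blocks of side n / sqrt(p)."""
    side = math.isqrt(p)   # processors per row and per column of the grid
    block = n // side      # side length of each block
    return (i // block, j // block)

def striped_owner(j, n, p):
    """Index of the processor owning column j when the matrix is split
    into p stripes of n / p columns each."""
    return j // (n // p)

owner_cb = checkerboard_owner(5, 2, n=8, p=4)
owner_st = striped_owner(5, n=8, p=4)
```

With n = 8 and p = 4, element (5, 2) belongs to grid processor (1, 0) under the checkerboard scheme, and column 5 belongs to stripe 2 under the striped scheme.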
Dijkstra Source-Partition
Assumes the n source vertices of Dijkstra's Single-source Shortest Path are
distributed equally over p processors and executed in parallel
Each processor finds shortest paths from each vertex in its
set to all other vertices in the graph
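The source-partitioned scheme can be sketched as a sequential simulation in which each loop over `proc` stands for the independent work of one processor. The round-robin assignment of sources, the function names, and the sample graph are assumptions; edge weights must be non-negative for Dijkstra's algorithm.

```python
import heapq
import math

def dijkstra(adj, source):
    """Single-source shortest paths; adj[u] is a list of (v, weight) pairs."""
    dist = [math.inf] * len(adj)
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def source_partitioned_apsp(adj, p):
    """Simulate p processors, each running Dijkstra from its own set of sources."""
    n = len(adj)
    rows = [None] * n
    for proc in range(p):                 # each iteration is independent work
        for source in range(proc, n, p):  # sources assigned to processor proc
            rows[source] = dijkstra(adj, source)
    return rows

# Hypothetical 4-vertex graph with non-negative weights.
adj = [[(1, 3), (3, 7)], [(0, 8), (2, 2)], [(0, 5), (3, 1)], [(0, 2)]]
D = source_partitioned_apsp(adj, p=2)
```

No communication between processors is needed in this scheme, which is why at most n processors can be kept busy.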
Dijkstra Source-Parallel
Motivated by keeping more processors busy
Run n copies of Dijkstra's SSP
Each copy runs on $\frac{p}{n}$ processors (p > n)
[Figure: the p processors divided into n groups of $\frac{p}{n}$ processors each]
Calculating Isoefficiency
Example: Floyd Checkerboard
At most $n^2$ processors can be kept busy
n must grow as $\Theta(\sqrt{p})$ due to problem structure
By Floyd (sequential), $T_e = \Theta(n^3)$
Thus isoefficiency is $\Theta((\sqrt{p})^3) = \Theta(p^{1.5})$
But what about communication?
Hypercube:
$(t_s + t_w m) \log d$ = time to deliver m words
$2 (t_s + t_w m) \log p$ = barrier synchronization time (up & down tree)
$d = \sqrt{p}$
Mathematical Details
$T_o = p T_p - T_e$
$T_o = p \, n \left[ 2 \left( t_s + t_w \frac{n}{\sqrt{p}} \right) \log \sqrt{p} + 2 (t_s + t_w) \log p + t_c \frac{n^2}{p} \right] - t_c n^3$
$T_e = \frac{E}{1 - E} T_o$ gives an isoefficiency of $\Theta(p^{1.5} (\log p)^3)$
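The overhead expression can be explored numerically. The sketch below assumes that each of the n iterations of Floyd Checkerboard on a hypercube costs two broadcasts of n/√p words among √p processors, one barrier of 2(ts + tw) log p, and tc n²/p of computation; logarithms are taken base 2, and the constants ts, tw, tc are hypothetical.

```python
import math

def parallel_time(n, p, ts, tw, tc):
    """Assumed Tp for Floyd Checkerboard on a hypercube (n iterations)."""
    root_p = math.sqrt(p)
    broadcast = 2 * (ts + tw * n / root_p) * math.log2(root_p)  # row and column
    barrier = 2 * (ts + tw) * math.log2(p)                      # up & down tree
    compute = tc * n * n / p
    return n * (broadcast + barrier + compute)

def overhead(n, p, ts, tw, tc):
    """To = p * Tp - Te, with Te = tc * n**3."""
    return p * parallel_time(n, p, ts, tw, tc) - tc * n ** 3

def efficiency(n, p, ts, tw, tc):
    """E = Te / (p * Tp)."""
    return tc * n ** 3 / (p * parallel_time(n, p, ts, tw, tc))

# Hypothetical machine constants.
TS, TW, TC = 10.0, 1.0, 1.0
e_small = efficiency(256, 16, TS, TW, TC)
e_large = efficiency(1024, 16, TS, TW, TC)
```

For a fixed p = 16, the efficiency rises from about 0.91 at n = 256 to about 0.98 at n = 1024, which is the behavior the isoefficiency function quantifies.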
Mesh:
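Step 1 = $\frac{n}{\sqrt{p}} \sqrt{p}$
Step 2 = $\frac{n}{\sqrt{p}} \sqrt{p}$
Step 3 (barrier synch) = $\sqrt{p}$
Step 4 = Te
$T_p(\text{comm/sync}) = 2 \frac{n}{\sqrt{p}} \sqrt{p} + \sqrt{p} = \Theta(n + \sqrt{p})$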
Isoefficiency = $\Theta(p^3 + p^{2.25})$
= $\Theta(p^3)$
Parallel Variant               Architecture                 Isoefficiency
Dijkstra Source-Partitioned                                 p^3
Dijkstra Source-Parallel       SM, Cube                     (p log p)^1.5
                               Mesh, Mesh-CT, Mesh-CT-MC    p^1.8
Floyd Stripe                   SM                           p^3
                               Cube                         (p log p)^3
                               Mesh                         p^4.5
                               Mesh-CT                      (p log p)^3
                               Mesh-CT-MC                   p^3
Floyd Checkerboard             SM                           p^1.5
                               Cube                         p^1.5 (log p)^3
                               Mesh                         p^3
                               Mesh-CT                      p^2.25
                               Mesh-CT-MC                   p^2.25
Floyd Pipelined Checkerboard                                p^1.5
Comparing Metrics
We've used cost previously this semester
(cost = p × Tp)
But notice that the cost of all of the architecture-algorithm combinations discussed here is $\Theta(n^3)$
Clearly some are more scalable than others
Thus isoefficiency is a useful metric when
analyzing algorithms and architectures
References
Akl S. G. Parallel Computation: Models and Methods. Prentice
Hall, Upper Saddle River NJ, pp. 381-384, 1997.
Cormen T. H., Leiserson C. E., Rivest R. L., and Stein C.
Introduction to Algorithms (2nd Edition). The MIT Press, Cambridge
MA, pp. 620-642, 2001.
Jenq J. and Sahni S. All Pairs Shortest Path on a Hypercube
Multiprocessor. In International Conference on Parallel Processing.
pp. 713-716, 1987.
Kumar V. and Singh V. Scalability of Parallel Algorithms for the All
Pairs Shortest Path Problem. Journal of Parallel and Distributed
Computing, vol. 13, no. 2, Academic Press, San Diego CA, pp. 124-138, 1991.
Pettie S. A Faster All-pairs Shortest Path Algorithm for Real-weighted Sparse Graphs. In Proc. 29th Int'l Colloq. on Automata,
Languages, and Programming (ICALP'02), LNCS vol. 2380, pp. 85-97, 2002.