
Introduction to Algorithms

6.046J/18.401J
LECTURE 1
Analysis of Algorithms
Insertion sort
Merge sort

Prof. Charles E. Leiserson

Course information
1. Staff
2. Prerequisites
3. Lectures
4. Recitations
5. Handouts
6. Textbook (CLRS)
7. Extra help
8. Registration
9. Problem sets
10. Describing algorithms
11. Grading policy
12. Collaboration policy

Course information handout



Analysis of algorithms
The theoretical study of computer-program performance and resource usage.
What's more important than performance?
modularity
user-friendliness
correctness
programmer time
maintainability
simplicity
functionality
extensibility
robustness
reliability

Why study algorithms and performance?
Algorithms help us to understand scalability.
Performance often draws the line between what is feasible and what is impossible.
Algorithmic mathematics provides a language for talking about program behavior.
Performance is the currency of computing.
The lessons of program performance generalize to other computing resources.
Speed is fun!


The problem of sorting

Input: sequence ⟨a1, a2, …, an⟩ of numbers.
Output: permutation ⟨a′1, a′2, …, a′n⟩ such that a′1 ≤ a′2 ≤ ⋯ ≤ a′n.
Example:
Input: 8 2 4 9 3 6
Output: 2 3 4 6 8 9

Insertion sort

Pseudocode:

INSERTION-SORT(A, n)        ⊳ A[1 . . n]
    for j ← 2 to n
        do key ← A[j]
           i ← j − 1
           while i > 0 and A[i] > key
               do A[i+1] ← A[i]
                  i ← i − 1
           A[i+1] ← key


Insertion sort

(Figure: array A with its sorted prefix A[1 . . j−1] at the left and the current key about to be inserted into it.)
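For concreteness, here is a minimal executable rendering of the pseudocode in Python (an illustrative sketch, not part of the original slides; it uses 0-based indices rather than the slides' 1-based convention):

    def insertion_sort(a):
        """Sort list a in place; mirrors INSERTION-SORT(A, n) with 0-based indices."""
        for j in range(1, len(a)):
            key = a[j]
            i = j - 1
            # Shift elements of the sorted prefix a[0..j-1] that exceed key.
            while i >= 0 and a[i] > key:
                a[i + 1] = a[i]
                i -= 1
            a[i + 1] = key
        return a

    assert insertion_sort([8, 2, 4, 9, 3, 6]) == [2, 3, 4, 6, 8, 9]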

Example of insertion sort

(Animation: starting from 8 2 4 9 3 6, each pass inserts the next element into the sorted prefix, ending with 2 3 4 6 8 9 — done.)

Running time
The running time depends on the input: an
already sorted sequence is easier to sort.
Parameterize the running time by the size of
the input, since short sequences are easier to
sort than long ones.
Generally, we seek upper bounds on the
running time, because everybody likes a
guarantee.

Kinds of analyses
Worst-case: (usually)
T(n) = maximum time of the algorithm on any input of size n.
Average-case: (sometimes)
T(n) = expected time of the algorithm over all inputs of size n.
Needs an assumption of a statistical distribution of inputs.
Best-case: (bogus)
Cheat with a slow algorithm that works fast on some input.


Machine-independent time
What is insertion sort's worst-case time?
It depends on the speed of our computer:
relative speed (on the same machine),
absolute speed (on different machines).
BIG IDEA:
Ignore machine-dependent constants.
Look at the growth of T(n) as n → ∞.
Asymptotic analysis!

Θ-notation
Math:
Θ(g(n)) = { f(n) : there exist positive constants c1, c2, and n0 such that 0 ≤ c1 g(n) ≤ f(n) ≤ c2 g(n) for all n ≥ n0 }

Engineering:
Drop low-order terms; ignore leading constants.
Example: 3n³ + 90n² − 5n + 6046 = Θ(n³)


Asymptotic performance
When n gets large enough, a Θ(n²) algorithm always beats a Θ(n³) algorithm.

(Plot: T(n) versus n; the curves cross at n0, beyond which the Θ(n²) algorithm wins.)

We shouldn't ignore asymptotically slower algorithms, however. Real-world design situations often call for a careful balancing of engineering objectives. Asymptotic analysis is a useful tool to help to structure our thinking.


Insertion sort analysis

Worst case: input reverse sorted.
T(n) = Σ_{j=2}^{n} Θ(j) = Θ(n²)          [arithmetic series]

Average case: all permutations equally likely.
T(n) = Σ_{j=2}^{n} Θ(j/2) = Θ(n²)

Is insertion sort a fast sorting algorithm?
Moderately so, for small n.
Not at all, for large n.

Merge sort
MERGE-SORT A[1 . . n]
1. If n = 1, done.
2. Recursively sort A[1 . . ⌈n/2⌉] and A[⌈n/2⌉+1 . . n].
3. Merge the 2 sorted lists.
Key subroutine: MERGE
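For reference, a minimal executable rendering in Python (an illustrative sketch, not part of the original slides; it allocates new lists rather than merging in place):

    def merge(left, right):
        """MERGE: combine two sorted lists into one sorted list in Theta(n) time."""
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        return out + left[i:] + right[j:]

    def merge_sort(a):
        if len(a) <= 1:
            return a
        mid = (len(a) + 1) // 2          # ceil(n/2), as in the pseudocode
        return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))

    assert merge_sort([8, 2, 4, 9, 3, 6]) == [2, 3, 4, 6, 8, 9]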

Merging two sorted arrays

(Animation: at each step, compare the smallest remaining elements of the two sorted arrays and move the smaller one to the output.)

Time = Θ(n) to merge a total of n elements (linear time).

Analyzing merge sort

MERGE-SORT A[1 . . n]                                   T(n)
1. If n = 1, done.                                      Θ(1)
2. Recursively sort A[1 . . ⌈n/2⌉] and A[⌈n/2⌉+1 . . n].   2T(n/2)   (abuse)
3. Merge the 2 sorted lists.                            Θ(n)

Sloppiness: should be T(⌈n/2⌉) + T(⌊n/2⌋), but it turns out not to matter asymptotically.

Recurrence for merge sort

T(n) = Θ(1)             if n = 1;
       2T(n/2) + Θ(n)   if n > 1.

We shall usually omit stating the base case when T(n) = Θ(1) for sufficiently small n, but only when it has no effect on the asymptotic solution to the recurrence.
CLRS and Lecture 2 provide several ways to find a good upper bound on T(n).

Recursion tree
Solve T(n) = 2T(n/2) + cn, where c > 0 is constant.

(Animation: the tree unfolds level by level — the root costs cn; its two children cost cn/2 each; the four grandchildren cost cn/4 each; and so on down to Θ(1) leaves. Height h = lg n; each level sums to cn; #leaves = n, so the leaf level contributes Θ(n).)

Total = Θ(n lg n)

Conclusions
Θ(n lg n) grows more slowly than Θ(n²).
Therefore, merge sort asymptotically beats insertion sort in the worst case.
In practice, merge sort beats insertion sort for n > 30 or so.
Go test it out for yourself!


Introduction to Algorithms
6.046J/18.401J
LECTURE 2
Asymptotic Notation
O-, Ω-, and Θ-notation
Recurrences
Substitution method
Iterating the recurrence
Recursion tree
Master method
Prof. Charles E. Leiserson

Asymptotic notation
O-notation (upper bounds):
We write f(n) = O(g(n)) if there exist constants c > 0, n0 > 0 such that 0 ≤ f(n) ≤ cg(n) for all n ≥ n0.
EXAMPLE: 2n² = O(n³)   (c = 1, n0 = 2)
(This is a funny, "one-way" equality; the notation is about functions, not values.)

Set definition of O-notation

O(g(n)) = { f(n) : there exist constants c > 0, n0 > 0 such that 0 ≤ f(n) ≤ cg(n) for all n ≥ n0 }

EXAMPLE: 2n² ∈ O(n³)
(Logicians: λn.2n² ∈ O(λn.n³), but it's convenient to be sloppy, as long as we understand what's really going on.)

Macro substitution
Convention: A set in a formula represents an anonymous function in the set.
EXAMPLE:
f(n) = n³ + O(n²)
means
f(n) = n³ + h(n) for some h(n) ∈ O(n²).

Ω-notation (lower bounds)

O-notation is an upper-bound notation. It makes no sense to say f(n) is at least O(n²).

Ω(g(n)) = { f(n) : there exist constants c > 0, n0 > 0 such that 0 ≤ cg(n) ≤ f(n) for all n ≥ n0 }

EXAMPLE: √n = Ω(lg n)

Θ-notation (tight bounds)

Θ(g(n)) = O(g(n)) ∩ Ω(g(n))

EXAMPLE: (1/2)n² − 2n = Θ(n²)

Theorem. The leading constant and low-order terms don't matter.

Solving recurrences
The analysis of merge sort from Lecture 1 required us to solve a recurrence.
Solving recurrences is like solving integrals, differential equations, etc.: learn a few tricks.
Lecture 3: applications of recurrences to divide-and-conquer algorithms.

Substitution method
The most general method:
1. Guess the form of the solution.
2. Verify by induction.
3. Solve for constants.

EXAMPLE: T(n) = 4T(n/2) + n
[Assume that T(1) = Θ(1).]
Guess O(n³). (Prove O and Ω separately.)
Assume that T(k) ≤ ck³ for k < n.
Prove T(n) ≤ cn³ by induction.

Example of substitution
T(n) = 4T(n/2) + n
     ≤ 4c(n/2)³ + n
     = (c/2)n³ + n
     = cn³ − ((c/2)n³ − n)     [desired − residual]
     ≤ cn³
whenever (c/2)n³ − n ≥ 0, for example, if c ≥ 2 and n ≥ 1.
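A quick numeric sanity check of this bound (an illustrative sketch, not from the slides; it assumes the concrete base case T(1) = 1 and exact floor-halving, which only makes T smaller):

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def T(n):
        # T(n) = 4 T(n/2) + n with an assumed base case T(1) = 1.
        return 1 if n == 1 else 4 * T(n // 2) + n

    for k in range(1, 16):
        n = 2 ** k
        assert T(n) <= 2 * n ** 3        # the c = 2 cubic bound proved above
        assert T(n) <= 2 * n * n - n     # the tighter bound derived two slides below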

Example (continued)
We must also handle the initial conditions, that is, "ground" the induction with base cases.
Base: T(n) = Θ(1) for all n < n0, where n0 is a suitable constant.
For 1 ≤ n < n0, we have Θ(1) ≤ cn³, if we pick c big enough.
This bound is not tight!

A tighter upper bound?

We shall prove that T(n) = O(n²).
Assume that T(k) ≤ ck² for k < n:
T(n) = 4T(n/2) + n
     ≤ 4c(n/2)² + n
     = cn² + n
     = O(n²)   Wrong! We must prove the I.H.
     = cn² − (−n)     [desired − residual]
     ≤ cn² for no choice of c > 0. Lose!

A tighter upper bound!

IDEA: Strengthen the inductive hypothesis.
Subtract a low-order term.
Inductive hypothesis: T(k) ≤ c1k² − c2k for k < n.
T(n) = 4T(n/2) + n
     = 4(c1(n/2)² − c2(n/2)) + n
     = c1n² − 2c2n + n
     = c1n² − c2n − (c2n − n)
     ≤ c1n² − c2n if c2 ≥ 1.

Pick c1 big enough to handle the initial conditions.

Recursion-tree method
A recursion tree models the costs (time) of a recursive execution of an algorithm.
The recursion-tree method can be unreliable, just like any method that uses ellipses (…).
The recursion-tree method promotes intuition, however.
The recursion-tree method is good for generating guesses for the substitution method.

Example of recursion tree

Solve T(n) = T(n/4) + T(n/2) + n²:

(Animation: the root costs n²; its children cost (n/4)² and (n/2)²; the next level costs (n/16)², (n/8)², (n/8)², (n/4)²; and so on down to Θ(1) leaves. The level sums are n², (5/16)n², (25/256)n², ….)

Total = n² (1 + 5/16 + (5/16)² + (5/16)³ + ⋯)
      = Θ(n²)          [geometric series]
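As a sketch of why the total is Θ(n²), one can evaluate the recurrence directly (assuming integer floor division and a base case T(n) = 1 for n < 2, neither of which the slide specifies) and observe that T(n)/n² stays below the geometric-series constant 1/(1 − 5/16) = 16/11 ≈ 1.45:

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def T(n):
        # T(n) = T(n/4) + T(n/2) + n^2, with an assumed base case T(n) = 1 for n < 2.
        return 1 if n < 2 else T(n // 4) + T(n // 2) + n * n

    for n in (10, 100, 1000, 10_000, 100_000):
        print(n, T(n) / n**2)   # ratios stay below 16/11, consistent with Theta(n^2)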

The master method

The master method applies to recurrences of the form
T(n) = a T(n/b) + f(n),
where a ≥ 1, b > 1, and f is asymptotically positive.

Three common cases

Compare f(n) with n^(log_b a):
1. f(n) = O(n^(log_b a − ε)) for some constant ε > 0.
   f(n) grows polynomially slower than n^(log_b a) (by an n^ε factor).
   Solution: T(n) = Θ(n^(log_b a)).
2. f(n) = Θ(n^(log_b a) lg^k n) for some constant k ≥ 0.
   f(n) and n^(log_b a) grow at similar rates.
   Solution: T(n) = Θ(n^(log_b a) lg^(k+1) n).

Three common cases (cont.)

Compare f(n) with n^(log_b a):
3. f(n) = Ω(n^(log_b a + ε)) for some constant ε > 0.
   f(n) grows polynomially faster than n^(log_b a) (by an n^ε factor),
   and f(n) satisfies the regularity condition that a f(n/b) ≤ c f(n) for some constant c < 1.
   Solution: T(n) = Θ(f(n)).

Examples
EX. T(n) = 4T(n/2) + n
a = 4, b = 2 ⇒ n^(log_b a) = n²; f(n) = n.
CASE 1: f(n) = O(n^(2−ε)) for ε = 1.
∴ T(n) = Θ(n²).

EX. T(n) = 4T(n/2) + n²
a = 4, b = 2 ⇒ n^(log_b a) = n²; f(n) = n².
CASE 2: f(n) = Θ(n² lg⁰ n), that is, k = 0.
∴ T(n) = Θ(n² lg n).

EX. T(n) = 4T(n/2) + n³
a = 4, b = 2 ⇒ n^(log_b a) = n²; f(n) = n³.
CASE 3: f(n) = Ω(n^(2+ε)) for ε = 1, and 4(n/2)³ ≤ cn³ (reg. cond.) for c = 1/2.
∴ T(n) = Θ(n³).

EX. T(n) = 4T(n/2) + n²/lg n
a = 4, b = 2 ⇒ n^(log_b a) = n²; f(n) = n²/lg n.
Master method does not apply. In particular, for every constant ε > 0, we have n^ε = ω(lg n).
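These case checks are mechanical when the driving function is a plain polynomial times a log power. Below is a small illustrative helper (an assumption-laden sketch, not from the slides) that classifies recurrences with f(n) = n^d · lg^k n:

    import math

    def master(a, b, d, k=0):
        """Classify T(n) = a T(n/b) + Theta(n^d lg^k n) by the three master cases.
        Handles only this restricted form; beware exact float comparison of d
        against log_b(a) for non-integer critical exponents."""
        crit = math.log(a, b)            # critical exponent log_b(a)
        if d < crit:
            return f"Case 1: Theta(n^{crit:.3f})"
        if d == crit and k >= 0:
            return f"Case 2: Theta(n^{crit:.3f} lg^{k + 1} n)"
        if d > crit:                     # regularity holds here: a (n/b)^d = (a/b^d) n^d with a/b^d < 1
            return f"Case 3: Theta(n^{d} lg^{k} n)"
        return "Master method does not apply"   # e.g. k < 0, as in n^2 / lg n

    print(master(4, 2, 1))       # Case 1: Theta(n^2)
    print(master(4, 2, 2))       # Case 2: Theta(n^2 lg n)
    print(master(4, 2, 3))       # Case 3: Theta(n^3)
    print(master(4, 2, 2, -1))   # Master method does not apply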

Idea of master theorem

Recursion tree: the root costs f(n); its a children cost f(n/b) each; the a² grandchildren cost f(n/b²) each; and so on down to Θ(1) leaves at height h = log_b n. The level costs are f(n), a f(n/b), a² f(n/b²), …, and
#leaves = a^h = a^(log_b n) = n^(log_b a),
so the leaf level costs n^(log_b a) · Θ(1).

CASE 1: The weight increases geometrically from the root to the leaves. The leaves hold a constant fraction of the total weight. ⇒ Θ(n^(log_b a))
CASE 2 (k = 0): The weight is approximately the same on each of the log_b n levels. ⇒ Θ(n^(log_b a) lg n)
CASE 3: The weight decreases geometrically from the root to the leaves. The root holds a constant fraction of the total weight. ⇒ Θ(f(n))

Appendix: geometric series

1 + x + x² + ⋯ + xⁿ = (1 − x^(n+1)) / (1 − x)   for x ≠ 1

1 + x + x² + ⋯ = 1 / (1 − x)   for |x| < 1

Introduction to Algorithms
6.046J/18.401J
LECTURE 3
Divide and conquer
Binary search
Powering a number
Fibonacci numbers
Matrix multiplication
Strassen's algorithm
VLSI tree layout
Prof. Charles E. Leiserson

The divide-and-conquer design paradigm
1. Divide the problem (instance) into subproblems.
2. Conquer the subproblems by solving them recursively.
3. Combine subproblem solutions.

Merge sort
1. Divide: Trivial.
2. Conquer: Recursively sort 2 subarrays.
3. Combine: Linear-time merge.

T(n) = 2 T(n/2) + Θ(n)
(# subproblems = 2; subproblem size = n/2; work dividing and combining = Θ(n))

Master theorem (reprise)

T(n) = a T(n/b) + f(n)
CASE 1: f(n) = O(n^(log_b a − ε)), constant ε > 0 ⇒ T(n) = Θ(n^(log_b a)).
CASE 2: f(n) = Θ(n^(log_b a) lg^k n), constant k ≥ 0 ⇒ T(n) = Θ(n^(log_b a) lg^(k+1) n).
CASE 3: f(n) = Ω(n^(log_b a + ε)), constant ε > 0, and regularity condition ⇒ T(n) = Θ(f(n)).

Merge sort: a = 2, b = 2 ⇒ n^(log_b a) = n^(log₂ 2) = n
⇒ CASE 2 (k = 0) ⇒ T(n) = Θ(n lg n).

Binary search
Find an element in a sorted array:
1. Divide: Check middle element.
2. Conquer: Recursively search 1 subarray.
3. Combine: Trivial.

Example: Find 9

(Animation: the sorted array — 3 5 7 8 9 12 15 on the slide — is probed at its middle element; each comparison discards half of the remaining range until 9 is found.)
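A minimal iterative rendering in Python (an illustrative sketch; 0-based indices, returning −1 when the key is absent):

    def binary_search(a, key):
        """Return an index i with a[i] == key in sorted list a, or -1 if absent."""
        lo, hi = 0, len(a) - 1
        while lo <= hi:
            mid = (lo + hi) // 2       # divide: check the middle element
            if a[mid] == key:
                return mid
            if a[mid] < key:
                lo = mid + 1           # conquer: search the right subarray
            else:
                hi = mid - 1           # conquer: search the left subarray
        return -1

    assert binary_search([3, 5, 7, 8, 9, 12, 15], 9) == 4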

Recurrence for binary search

T(n) = 1 T(n/2) + Θ(1)
(# subproblems = 1; subproblem size = n/2; work dividing and combining = Θ(1))

n^(log_b a) = n^(log₂ 1) = n⁰ = 1 ⇒ CASE 2 (k = 0)
⇒ T(n) = Θ(lg n).

Powering a number
Problem: Compute aⁿ, where n ∈ ℕ.
Naive algorithm: Θ(n).
Divide-and-conquer algorithm:
aⁿ = a^(n/2) · a^(n/2)                 if n is even;
     a^((n−1)/2) · a^((n−1)/2) · a     if n is odd.

T(n) = T(n/2) + Θ(1) ⇒ T(n) = Θ(lg n).
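An executable sketch of the divide-and-conquer rule in Python (illustrative; one recursive call whose result is reused, which is what makes the recurrence T(n/2) + Θ(1) rather than 2T(n/2)):

    def power(a, n):
        """Compute a**n by recursive squaring in Theta(lg n) multiplications."""
        if n == 0:
            return 1
        half = power(a, n // 2)        # one recursive call, reused twice
        return half * half if n % 2 == 0 else half * half * a

    assert power(3, 13) == 3 ** 13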

Fibonacci numbers
Recursive definition:
Fₙ = 0              if n = 0;
     1              if n = 1;
     Fₙ₋₁ + Fₙ₋₂    if n ≥ 2.

0 1 1 2 3 5 8 13 21 34 ⋯

Naive recursive algorithm: Ω(φⁿ) (exponential time), where φ = (1 + √5)/2 is the golden ratio.

Computing Fibonacci numbers
Bottom-up:
Compute F₀, F₁, F₂, …, Fₙ in order, forming each number by summing the two previous.
Running time: Θ(n).
Naive recursive squaring:
Fₙ = φⁿ/√5 rounded to the nearest integer.
Recursive squaring: Θ(lg n) time.
This method is unreliable, since floating-point arithmetic is prone to round-off errors.

Recursive squaring

Theorem: [F_{n+1} F_n; F_n F_{n−1}] = [1 1; 1 0]ⁿ   (rows separated by semicolons).

Algorithm: recursive squaring. Time = Θ(lg n).

Proof of theorem. (Induction on n.)
Base (n = 1): [F₂ F₁; F₁ F₀] = [1 1; 1 0]¹.
Inductive step (n ≥ 2):
[F_{n+1} F_n; F_n F_{n−1}] = [F_n F_{n−1}; F_{n−1} F_{n−2}] · [1 1; 1 0]
                           = [1 1; 1 0]^(n−1) · [1 1; 1 0]
                           = [1 1; 1 0]ⁿ.  ∎
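A sketch of the theorem in action, using exact integer arithmetic so there is no round-off (illustrative; iterative repeated squaring rather than explicit recursion):

    def mat_mul(X, Y):
        """2x2 integer matrix product."""
        return ((X[0][0]*Y[0][0] + X[0][1]*Y[1][0], X[0][0]*Y[0][1] + X[0][1]*Y[1][1]),
                (X[1][0]*Y[0][0] + X[1][1]*Y[1][0], X[1][0]*Y[0][1] + X[1][1]*Y[1][1]))

    def fib(n):
        """F_n via repeated squaring of [[1,1],[1,0]]; Theta(lg n) matrix products."""
        if n == 0:
            return 0
        M = ((1, 1), (1, 0))
        result = ((1, 0), (0, 1))      # identity
        while n:
            if n & 1:
                result = mat_mul(result, M)
            M = mat_mul(M, M)
            n >>= 1
        return result[0][1]            # entry (0,1) of the power is F_n

    assert [fib(i) for i in range(10)] == [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]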

Matrix multiplication
Input: A = [a_ij], B = [b_ij].      i, j = 1, 2, …, n.
Output: C = [c_ij] = A · B.

c_ij = Σ_{k=1}^{n} a_ik · b_kj

Standard algorithm
for i ← 1 to n
    do for j ← 1 to n
        do c_ij ← 0
           for k ← 1 to n
               do c_ij ← c_ij + a_ik · b_kj

Running time = Θ(n³)

Divide-and-conquer algorithm
IDEA: n×n matrix = 2×2 matrix of (n/2)×(n/2) submatrices:

[r s; t u] = [a b; c d] · [e f; g h]        C = A · B

r = ae + bg
s = af + bh
t = ce + dg
u = cf + dh

8 recursive mults of (n/2)×(n/2) submatrices
4 adds of (n/2)×(n/2) submatrices

Analysis of D&C algorithm

T(n) = 8 T(n/2) + Θ(n²)
(# submatrices = 8; submatrix size = n/2; work adding submatrices = Θ(n²))

n^(log_b a) = n^(log₂ 8) = n³ ⇒ CASE 1 ⇒ T(n) = Θ(n³).

No better than the ordinary algorithm.

Strassen's idea
Multiply 2×2 matrices with only 7 recursive mults.

P1 = a · (f − h)
P2 = (a + b) · h
P3 = (c + d) · e
P4 = d · (g − e)
P5 = (a + d) · (e + h)
P6 = (b − d) · (g + h)
P7 = (a − c) · (e + f)

r = P5 + P4 − P2 + P6
s = P1 + P2
t = P3 + P4
u = P5 + P1 − P3 − P7

7 mults, 18 adds/subs.
Note: no reliance on commutativity of multiplication!

Check for r:
r = P5 + P4 − P2 + P6
  = (a + d)(e + h) + d(g − e) − (a + b)h + (b − d)(g + h)
  = ae + ah + de + dh + dg − de − ah − bh + bg + bh − dg − dh
  = ae + bg  ✓

Strassen's algorithm
1. Divide: Partition A and B into (n/2)×(n/2) submatrices. Form terms to be multiplied using + and −.
2. Conquer: Perform 7 multiplications of (n/2)×(n/2) submatrices recursively.
3. Combine: Form C using + and − on (n/2)×(n/2) submatrices.

T(n) = 7 T(n/2) + Θ(n²)
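A compact sketch of the recursion using NumPy for the submatrix adds (illustrative only — it assumes n is a power of two and omits the cutover-to-naive tuning that real implementations use):

    import numpy as np

    def strassen(A, B):
        """Multiply square matrices whose size n is a power of 2 via Strassen's 7 products."""
        n = A.shape[0]
        if n == 1:
            return A * B
        m = n // 2
        a, b, c, d = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
        e, f, g, h = B[:m, :m], B[:m, m:], B[m:, :m], B[m:, m:]
        P1 = strassen(a, f - h)
        P2 = strassen(a + b, h)
        P3 = strassen(c + d, e)
        P4 = strassen(d, g - e)
        P5 = strassen(a + d, e + h)
        P6 = strassen(b - d, g + h)
        P7 = strassen(a - c, e + f)
        r = P5 + P4 - P2 + P6
        s = P1 + P2
        t = P3 + P4
        u = P5 + P1 - P3 - P7
        return np.block([[r, s], [t, u]])

    A = np.random.randint(0, 10, (8, 8))
    B = np.random.randint(0, 10, (8, 8))
    assert (strassen(A, B) == A @ B).all()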

Analysis of Strassen
T(n) = 7 T(n/2) + Θ(n²)
n^(log_b a) = n^(log₂ 7) ≈ n^2.81 ⇒ CASE 1 ⇒ T(n) = Θ(n^(lg 7)).

The number 2.81 may not seem much smaller than 3, but because the difference is in the exponent, the impact on running time is significant. In fact, Strassen's algorithm beats the ordinary algorithm on today's machines for n ≥ 32 or so.

Best to date (of theoretical interest only): Θ(n^2.376…).

VLSI layout
Problem: Embed a complete binary tree with n leaves in a grid using minimal area.

(Figure: the naive embedding, with width W(n) and height H(n).)

H(n) = H(n/2) + Θ(1) = Θ(lg n)
W(n) = 2 W(n/2) + Θ(1) = Θ(n)
Area = Θ(n lg n)

H-tree embedding

(Figure: the H-tree layout, a square of side L(n) built from four quarter-size copies of side L(n/4) joined by Θ(1) wiring.)

L(n) = 2 L(n/4) + Θ(1) = Θ(√n)
Area = Θ(n)

Conclusion
Divide and conquer is just one of several powerful techniques for algorithm design.
Divide-and-conquer algorithms can be analyzed using recurrences and the master method (so practice this math).
The divide-and-conquer strategy often leads to efficient algorithms.

Introduction to Algorithms
6.046J/18.401J

Lecture 4
Prof. Piotr Indyk

Today
Randomized algorithms: algorithms that flip coins
Matrix product checker: is AB = C?
Quicksort:
Example of divide and conquer
Fast and practical sorting algorithm
Other applications on Wednesday

Randomized Algorithms
Algorithms that make random decisions
That is:
Can generate a random number x from some range {1…R}
Make decisions based on the value of x
Why would it make sense?

Two cups, one coin

If you always choose a fixed cup, the adversary will put the coin in the other one, so the expected payoff = $0.
If you choose a random cup, the expected payoff = $0.5.

Randomized Algorithms
Two basic types:
Typically fast (but sometimes slow): Las Vegas
Typically correct (but sometimes output garbage): Monte Carlo
The probabilities are defined by the random numbers of the algorithm! (not by random choices of the problem instance)

Matrix Product
Compute C = AB
Simple algorithm: O(n³) time
Multiply two 2×2 matrices using 7 mults ⇒ O(n^2.81) time [Strassen '69]
Multiply two 70×70 matrices using 143640 multiplications ⇒ O(n^2.795) time [Pan '78]
…
O(n^2.376) [Coppersmith-Winograd]

Matrix Product Checker

Given: n×n matrices A, B, C
Goal: is AB = C?
We will see an O(n²) algorithm that:
If answer = YES, then Pr[output = YES] = 1
If answer = NO, then Pr[output = YES] ≤ 1/2

The algorithm
Algorithm:
Choose a random binary vector x[1…n], such that Pr[x_i = 1] = 1/2 independently for i = 1…n
Check if ABx = Cx
Does it run in O(n²) time?
YES, because ABx = A(Bx)
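This is the classic Freivalds check; a short sketch with NumPy (illustrative, assuming exact integer matrices so equality tests are safe):

    import numpy as np

    def product_check(A, B, C, trials=1):
        """Return False only if AB != C is detected.
        Each trial errs (says YES on a NO instance) with probability <= 1/2."""
        n = A.shape[0]
        for _ in range(trials):
            x = np.random.randint(0, 2, size=n)          # random binary vector
            if not np.array_equal(A @ (B @ x), C @ x):   # O(n^2): three mat-vec products
                return False
        return True

    A = np.random.randint(0, 5, (50, 50))
    B = np.random.randint(0, 5, (50, 50))
    print(product_check(A, B, A @ B))                    # True
    print(product_check(A, B, A @ B + 1, trials=20))     # almost surely False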

Correctness
Let D = AB; we need to check whether D = C.
What if D = C?
Then Dx = Cx, so the output is YES.
What if D ≠ C?
Presumably there exists x such that Dx ≠ Cx.
We need to show there are many such x.

D ≠ C
(Figure: D and C differ in at least one entry — say row d of D and row c of C with d ≠ c.)

Vector product

Consider vectors d ≠ c (say, d_i ≠ c_i).
Choose a random binary x.
We have dx = cx iff (d − c)x = 0.
Pr[(d − c)x = 0] = ?

(d − c)·x = Σ_{j≠i} (d_j − c_j)x_j + (d_i − c_i)x_i

Analysis, ctd.
Let S1 = Σ_{j≠i} (c_j − d_j)x_j and S2 = S1 + (c_i − d_i).
If x_i = 0, then (c − d)x = S1.
If x_i = 1, then (c − d)x = S2 ≠ S1.
So, at least 1 of the 2 choices gives (c − d)x ≠ 0.
⇒ Pr[cx = dx] ≤ 1/2

Matrix Product Checker

Is AB = C?
We have an algorithm that:
If answer = YES, then Pr[output = YES] = 1
If answer = NO, then Pr[output = YES] ≤ 1/2
What if we want to reduce 1/2 to 1/4?
Run the algorithm twice, using independent random numbers.
Output YES only if both runs say YES.
Analysis:
If answer = YES, then Pr[output1 = YES, output2 = YES] = 1
If answer = NO, then
Pr[output = YES] = Pr[output1 = YES, output2 = YES]
                 = Pr[output1 = YES] · Pr[output2 = YES] ≤ 1/4

Quicksort
Proposed by C.A.R. Hoare in 1962.
Divide-and-conquer algorithm.
Sorts "in place" (like insertion sort, but not like merge sort).
Very practical (with tuning).
Can be viewed as a randomized Las Vegas algorithm.

Divide and conquer

Quicksort an n-element array:
1. Divide: Partition the array into two subarrays around a pivot x such that elements in the lower subarray ≤ x ≤ elements in the upper subarray.
(Diagram: lower subarray ≤ x | x | upper subarray ≥ x.)
2. Conquer: Recursively sort the two subarrays.
3. Combine: Trivial.
Key: Linear-time partitioning subroutine.

Pseudocode for quicksort

QUICKSORT(A, p, r)
    if p < r
        then q ← PARTITION(A, p, r)
             QUICKSORT(A, p, q − 1)
             QUICKSORT(A, q + 1, r)

Initial call: QUICKSORT(A, 1, n)

Partitioning subroutine
PARTITION(A, p, r)        ⊳ A[p . . r]
    x ← A[p]              ⊳ pivot = A[p]
    i ← p
    for j ← p + 1 to r
        do if A[j] ≤ x
             then i ← i + 1
                  exchange A[i] ↔ A[j]
    exchange A[p] ↔ A[i]
    return i

Invariant: x | ≤ x | ≥ x | ?   (positions p … i … j … r)
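An executable sketch of PARTITION plus the recursion in Python (illustrative; 0-based indices, pivot taken as the first element to match the pseudocode; the test reuses the array from the example below):

    def partition(a, p, r):
        """Partition a[p..r] around pivot x = a[p]; return the pivot's final index."""
        x = a[p]
        i = p
        for j in range(p + 1, r + 1):
            if a[j] <= x:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[p], a[i] = a[i], a[p]
        return i

    def quicksort(a, p=0, r=None):
        if r is None:
            r = len(a) - 1
        if p < r:
            q = partition(a, p, r)
            quicksort(a, p, q - 1)
            quicksort(a, q + 1, r)
        return a

    assert quicksort([6, 10, 13, 5, 8, 3, 2, 11]) == [2, 3, 5, 6, 8, 10, 11, 13]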

Example of partitioning

(Animation: starting from 6 10 13 5 8 3 2 11 with pivot 6, each element ≤ 6 is swapped into the growing left region as j scans right, yielding 2 5 3 6 8 13 10 11; the final exchange places the pivot 6 at its final position.)

Analysis of quicksort
Assume all input elements are distinct.
In practice, there are better partitioning algorithms for when duplicate input elements may exist.
What is the worst-case running time of quicksort?

Worst-case of quicksort
Input sorted or reverse sorted.
Partition around min or max element.
One side of partition always has no elements.

T(n) = T(0) + T(n−1) + Θ(n)
     = Θ(1) + T(n−1) + Θ(n)
     = T(n−1) + Θ(n)
     = Θ(n²)          [arithmetic series]

Worst-case recursion tree

T(n) = T(0) + T(n−1) + cn

(Animation: the tree degenerates to a path of height h = n; the levels cost cn, c(n−1), c(n−2), …, each with a Θ(1) leaf hanging off.)

Σ_{k=1}^{n} Θ(k) = Θ(n²)

Nice-case analysis
If we're lucky, PARTITION splits the array evenly:
T(n) = 2T(n/2) + Θ(n)
     = Θ(n lg n)          (same as merge sort)

What if the split is always 1/10 : 9/10?
T(n) = T(n/10) + T(9n/10) + Θ(n)

Analysis of nice case

(Recursion tree: the root costs cn and splits into T(n/10) and T(9n/10); the next level costs (1/100)cn, (9/100)cn, (9/100)cn, (81/100)cn; each full level sums to cn. The left spine bottoms out after log₁₀ n levels and the right spine after log₁₀/₉ n levels, so every level down to depth log₁₀ n costs exactly cn and later levels cost at most cn.)

cn log₁₀ n ≤ T(n) ≤ cn log₁₀/₉ n + Θ(n)

Randomized quicksort
Partition around a random element, i.e., around A[t], where t is chosen uniformly at random from {p…r}.
We will show that the expected time is O(n log n).

Paranoid quicksort
Will modify the algorithm to make it easier to analyze:
Repeat:
    Choose the pivot to be a random element of the array
    Perform PARTITION
Until the resulting split is lucky, i.e., not worse than 1/10 : 9/10
Recurse on both sub-arrays
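A direct sketch of paranoid quicksort in Python (illustrative; it reuses the partition from the earlier quicksort sketch, swaps the random pivot to the front, and treats a split as lucky when both sides hold at least n/10 elements):

    import random

    def paranoid_quicksort(a, p=0, r=None):
        if r is None:
            r = len(a) - 1
        if p < r:
            n = r - p + 1
            while True:
                t = random.randint(p, r)       # random pivot
                a[p], a[t] = a[t], a[p]
                q = partition(a, p, r)         # partition() from the earlier sketch
                left, right = q - p, r - q
                if left >= n // 10 and right >= n // 10:
                    break                      # lucky: not worse than 1/10 : 9/10
            paranoid_quicksort(a, p, q - 1)
            paranoid_quicksort(a, q + 1, r)
        return a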

Analysis
Let T(n) be an upper bound on the expected running time on any array of n elements.
Consider any input of size n.
The time needed to sort the input is bounded from above by the sum of:
The time needed to sort the left subarray
The time needed to sort the right subarray
The number of iterations until we get a lucky split, times cn

Expectations
By linearity of expectation:
T(n) ≤ max_i [ T(i) + T(n − i) ] + E[#partitions] · cn
where the maximum is taken over i ∈ [n/10, 9n/10].
We will show that E[#partitions] is at most 10/8.
Therefore:
T(n) ≤ max_i [ T(i) + T(n − i) ] + 2cn,   i ∈ [n/10, 9n/10]

Final bound
Can use the recursion-tree argument:
Tree depth is Θ(log n)
Total expected work at each level is at most 10/8 · cn
The total expected time is Θ(n log n)

Lucky partitions
The probability that a random pivot induces a lucky partition is at least 8/10
(we are not lucky if the pivot happens to be among the smallest/largest n/10 elements).
If we flip a coin with heads probability p = 8/10, the expected waiting time for the first head is 1/p = 10/8.

Quicksort in practice
Quicksort is a great general-purpose sorting algorithm.
Quicksort is typically over twice as fast as merge sort.
Quicksort can benefit substantially from code tuning.
Quicksort behaves well even with caching and virtual memory.
Quicksort is great!

More intuition
Suppose we alternate lucky, unlucky, lucky, unlucky, lucky, ….
L(n) = 2U(n/2) + Θ(n)    lucky
U(n) = L(n − 1) + Θ(n)   unlucky
Solving:
L(n) = 2(L(n/2 − 1) + Θ(n/2)) + Θ(n)
     = 2L(n/2 − 1) + Θ(n)
     = Θ(n lg n)   Lucky!
How can we make sure we are usually lucky?

Randomized quicksort analysis
Let T(n) = the random variable for the running time of randomized quicksort on an input of size n, assuming random numbers are independent.
For k = 0, 1, …, n−1, define the indicator random variable
X_k = 1 if PARTITION generates a k : n−k−1 split, 0 otherwise.
E[X_k] = Pr{X_k = 1} = 1/n, since all splits are equally likely, assuming elements are distinct.

Analysis (continued)
T(n) = T(0) + T(n−1) + Θ(n)   if 0 : n−1 split,
       T(1) + T(n−2) + Θ(n)   if 1 : n−2 split,
       ⋮
       T(n−1) + T(0) + Θ(n)   if n−1 : 0 split,

     = Σ_{k=0}^{n−1} X_k (T(k) + T(n−k−1) + Θ(n)).

Calculating expectation

E[T(n)] = E[ Σ_{k=0}^{n−1} X_k (T(k) + T(n−k−1) + Θ(n)) ]
          (take expectations of both sides)

        = Σ_{k=0}^{n−1} E[ X_k (T(k) + T(n−k−1) + Θ(n)) ]
          (linearity of expectation)

        = Σ_{k=0}^{n−1} E[X_k] · E[T(k) + T(n−k−1) + Θ(n)]
          (independence of X_k from other random choices)

        = (1/n) Σ_{k=0}^{n−1} E[T(k)] + (1/n) Σ_{k=0}^{n−1} E[T(n−k−1)] + (1/n) Σ_{k=0}^{n−1} Θ(n)
          (linearity of expectation; E[X_k] = 1/n)

        = (2/n) Σ_{k=1}^{n−1} E[T(k)] + Θ(n)
          (the two summations have identical terms)

Hairy recurrence

E[T(n)] = (2/n) Σ_{k=2}^{n−1} E[T(k)] + Θ(n)

(The k = 0, 1 terms can be absorbed in the Θ(n).)
Prove: E[T(n)] ≤ an lg n for constant a > 0.
Choose a large enough so that an lg n dominates E[T(n)] for sufficiently small n ≥ 2.
Use fact: Σ_{k=2}^{n−1} k lg k ≤ (1/2)n² lg n − (1/8)n²   (exercise).

Substitution method

E[T(n)] ≤ (2/n) Σ_{k=2}^{n−1} ak lg k + Θ(n)
          (substitute inductive hypothesis)
        ≤ (2a/n) ((1/2)n² lg n − (1/8)n²) + Θ(n)
          (use fact)
        = an lg n − (an/4 − Θ(n))
          (express as desired − residual)
        ≤ an lg n,
if a is chosen large enough so that an/4 dominates the Θ(n).

Running time = O(n) for n elements. Assume …

Randomized Algorithms
Algorithms that make decisions based on random coin flips.
Can "fool" the adversary.
The running time (or even correctness) is a random variable; we measure the expected running time.
We assume all random choices are independent.
This is not the average case!

Introduction to Algorithms
6.046J/18.401J

Lecture 5
Prof. Piotr Indyk

Today
Order statistics (e.g., finding the median)
Two O(n) time algorithms:
Randomized: similar to quicksort
Deterministic: quite tricky
Both are examples of divide and conquer

Order statistics
Select the ith smallest of n elements (the
element with rank i).
i = 1: minimum;
i = n: maximum;
i = ⌊(n+1)/2⌋ or ⌈(n+1)/2⌉: median.
How fast can we solve the problem ?
Min/max: O(n)
General i : O(n log n) by sorting
We will see how to do it in O(n) time
Piotr Indyk and Charles Leiserson

Introduction to Algorithms

September 22, 2004

L5.3

Randomized algorithm for
finding the ith element
Divide-and-conquer approach.
Main idea: PARTITION around a pivot x:

  [ ≤ x | x | ≥ x ]        ⊳ x lands at rank k

If i < k, recurse on the left
If i > k, recurse on the right
Otherwise, output x
Piotr Indyk and Charles Leiserson

Introduction to Algorithms

September 22, 2004

L5.4

Randomized Divide-and-Conquer
RAND-SELECT(A, p, r, i)
  if p = r then return A[p]
  q ← RAND-PARTITION(A, p, r)
  k ← q − p + 1              ⊳ k = rank(A[q])
  if i = k then return A[q]
  if i < k
    then return RAND-SELECT(A, p, q − 1, i)
    else return RAND-SELECT(A, q + 1, r, i − k)

  [ < A[q] | A[q] | ≥ A[q] ]    ⊳ the pivot A[q] has rank k

Piotr Indyk and Charles Leiserson

Introduction to Algorithms

September 22, 2004

L5.5
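To make RAND-SELECT concrete, here is a minimal runnable Python sketch (the function names, 0-based indexing, and the inline RAND-PARTITION are my additions, not the course's reference code):

  import random

  def rand_partition(A, p, r):
      """Partition A[p..r] around a uniformly random pivot; return its index."""
      s = random.randint(p, r)
      A[p], A[s] = A[s], A[p]           # move the random pivot to the front
      x, i = A[p], p
      for j in range(p + 1, r + 1):
          if A[j] <= x:
              i += 1
              A[i], A[j] = A[j], A[i]
      A[p], A[i] = A[i], A[p]
      return i

  def rand_select(A, p, r, i):
      """Return the i-th smallest (1-indexed) element of A[p..r]."""
      if p == r:
          return A[p]
      q = rand_partition(A, p, r)
      k = q - p + 1                     # rank of the pivot within A[p..r]
      if i == k:
          return A[q]
      if i < k:
          return rand_select(A, p, q - 1, i)
      return rand_select(A, q + 1, r, i - k)

  # Example: the 7th smallest of the slide's input
  print(rand_select([6, 10, 13, 5, 8, 3, 2, 11], 0, 7, 7))   # -> 11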

Example
Select the i = 7th smallest:

  6 10 13 5 8 3 2 11        ⊳ pivot = 6, i = 7

Partition:

  2 5 3 6 8 13 10 11        ⊳ k = 4

Select the 7 − 4 = 3rd smallest recursively.


Piotr Indyk and Charles Leiserson

Introduction to Algorithms

September 22, 2004

L5.6

Analysis
What is the worst-case running time?
Unlucky:
  T(n) = T(n − 1) + Θ(n)
       = Θ(n²)              ⊳ arithmetic series
Recall that a lucky partition splits into arrays
with size ratio at most 9:1.
What if all partitions are lucky?
Lucky:
  T(n) = T(9n/10) + Θ(n)
       = Θ(n)               ⊳ CASE 3: n^{log_{10/9} 1} = n⁰ = 1
Piotr Indyk and Charles Leiserson

Introduction to Algorithms

September 22, 2004

L5.7

Expected Running Time


The probability that a random pivot induces a lucky
partition is at least 8/10 (Lecture 4).
Let t_i be the number of partitions performed
between the (i−1)-th and the i-th lucky partition.
The total time is at most
  T = t₁ · n + t₂ · (9/10) n + t₃ · (9/10)² n + ⋯
The total expected time is at most:
  E[T] = E[t₁] · n + E[t₂] · (9/10) n + E[t₃] · (9/10)² n + ⋯
       = 10/8 · [ n + (9/10) n + ⋯ ]
       = O(n)
Piotr Indyk and Charles Leiserson

Introduction to Algorithms

September 22, 2004

L5.8

Digression: 9 to 1
Do we need to define a lucky partition as
9:1 balanced?
No. It suffices that both sides have size
≥ αn, for 0 < α < 1/2.
The probability of getting a lucky partition is then
1 − 2α.

Piotr Indyk and Charles Leiserson

Introduction to Algorithms

September 22, 2004

L5.9

How Does it Work In Practice?


Need 7 volunteers (a.k.a. elements)
Will choose the median according to height

Piotr Indyk and Charles Leiserson

Introduction to Algorithms

September 22, 2004

L5.10

Partitioning subroutine
PARTITION(A, p, r)        ⊳ A[p . . r]
  x ← A[p]                ⊳ pivot = A[p]
  i ← p
  for j ← p + 1 to r
    do if A[j] ≤ x
        then i ← i + 1
             exchange A[i] ↔ A[j]
  exchange A[p] ↔ A[i]
  return i

Invariant:  [ x | ≤ x | ≥ x | ? ]
              p       i       j    r

Piotr Indyk and Charles Leiserson

Introduction to Algorithms

September 22, 2004

L5.11
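As a quick sanity check of the partition invariant, this small Python demo (mine, not from the slides) runs the Lomuto-style PARTITION above and asserts its postcondition; it reproduces the partition step of the earlier example:

  def partition(A, p, r):
      """PARTITION from the slide: pivot x = A[p]; returns its final index."""
      x, i = A[p], p
      for j in range(p + 1, r + 1):
          if A[j] <= x:
              i += 1
              A[i], A[j] = A[j], A[i]
      A[p], A[i] = A[i], A[p]
      return i

  A = [6, 10, 13, 5, 8, 3, 2, 11]
  q = partition(A, 0, len(A) - 1)
  assert all(v <= A[q] for v in A[:q]) and all(v >= A[q] for v in A[q+1:])
  print(A, q)   # -> [2, 5, 3, 6, 8, 13, 10, 11] 3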

Summary of randomized
order-statistic selection
Works fast: linear expected time.
Excellent algorithm in practice.
But, the worst case is very bad: Θ(n²).
Q. Is there an algorithm that runs in linear
time in the worst case?
A. Yes, due to [Blum-Floyd-Pratt-Rivest-Tarjan '73].
IDEA: Generate a good pivot recursively.
Piotr Indyk and Charles Leiserson

Introduction to Algorithms

September 22, 2004

L5.12

Worst-case linear-time order statistics
SELECT(i, n)
1. Divide the n elements into groups of 5. Find
   the median of each 5-element group by hand.
2. Recursively SELECT the median x of the ⌊n/5⌋
   group medians to be the pivot.
3. Partition around the pivot x. Let k = rank(x).
4. if i = k then return x
   elseif i < k
     then recursively SELECT the ith
          smallest element in the lower part
     else recursively SELECT the (i−k)th
          smallest element in the upper part
Piotr Indyk and Charles Leiserson

Introduction to Algorithms

Same as
RAND-SELECT

September 22, 2004

L5.13
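A hedged Python sketch of SELECT follows (my code, not the course's; it assumes distinct keys, sorting each 5-element group stands in for finding medians by hand, and the cutoff of 50 mirrors the "minor simplification" on a later slide):

  def select(A, i):
      """Return the i-th smallest (1-indexed) of list A in worst-case O(n)."""
      if len(A) <= 50:                  # small base case: just sort
          return sorted(A)[i - 1]
      groups = [sorted(A[j:j + 5]) for j in range(0, len(A), 5)]
      medians = [g[len(g) // 2] for g in groups]
      x = select(medians, (len(medians) + 1) // 2)  # pivot = median of medians
      lower = [v for v in A if v < x]
      upper = [v for v in A if v > x]
      k = len(lower) + 1                # rank of x (keys assumed distinct)
      if i == k:
          return x
      if i < k:
          return select(lower, i)
      return select(upper, i - k)

  print(select(list(range(1000, 0, -1)), 500))   # -> 500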

Choosing the pivot

1. Divide the n elements into groups of 5. Find
   the median of each 5-element group by rote.
2. Recursively SELECT the median x of the ⌊n/5⌋
   group medians to be the pivot.

[Figure: the groups drawn as columns of 5, each
sorted with its median highlighted; "lesser"
elements below, "greater" elements above.]
Piotr Indyk and Charles Leiserson

Introduction to Algorithms

September 22, 2004

L5.17

Analysis

At least half the group medians are ≤ x, which
is at least ⌈ ⌈n/5⌉ / 2 ⌉ = ⌈n/10⌉ group medians.
Therefore, at least 3 ⌈n/10⌉ elements are ≤ x.
Similarly, at least 3 ⌈n/10⌉ elements are ≥ x.

[Figure: the group medians compared with x;
"lesser" / "greater".]

Piotr Indyk and Charles Leiserson

Introduction to Algorithms

September 22, 2004

L5.20

Developing the recurrence

T(n) = T(n/5) + T(7n/10) + Θ(n):

  Θ(n)     ⊳ Step 1. Divide the n elements into groups of 5.
              Find the median of each 5-element group by rote.
  T(n/5)   ⊳ Step 2. Recursively SELECT the median x of the
              ⌊n/5⌋ group medians to be the pivot.
  Θ(n)     ⊳ Step 3. Partition around the pivot x. Let k = rank(x).
  T(7n/10) ⊳ Step 4. if i = k then return x
              elseif i < k
                then recursively SELECT the ith
                     smallest element in the lower part
                else recursively SELECT the (i−k)th
                     smallest element in the upper part

Piotr Indyk and Charles Leiserson

Introduction to Algorithms

September 22, 2004

L5.21

Solving the recurrence

  T(n) = T( (1/5) n ) + T( (7/10) n ) + Θ(n)

Substitution:  T(n) ≤ cn

  T(n) ≤ (1/5) cn + (7/10) cn + Θ(n)
       = (18/20) cn + Θ(n)
       = cn − ( (2/20) cn − Θ(n) )
       ≤ cn ,

if c is chosen large enough to handle the Θ(n).


Piotr Indyk and Charles Leiserson

Introduction to Algorithms

September 22, 2004

L5.22

Minor simplification
For n ≥ 50, we have 3 ⌈n/10⌉ ≥ n/4.
Therefore, for n ≥ 50 the recursive call to
SELECT in Step 4 is executed recursively
on at most 3n/4 elements.
Thus, the recurrence for the running time
can assume that Step 4 takes time
T(3n/4) in the worst case.
For n < 50, we know that the worst-case
time is T(n) = Θ(1).
Piotr Indyk and Charles Leiserson

Introduction to Algorithms

September 22, 2004

L5.23

Conclusions
Since the work at each level of recursion
is a constant fraction (18/20) smaller, the
work per level is a geometric series
dominated by the linear work at the root.
In practice, this algorithm runs slowly,
because the constant in front of n is large.
The randomized algorithm is far more
practical.
Exercise: Why not divide into groups of 3?
Piotr Indyk and Charles Leiserson

Introduction to Algorithms

September 22, 2004

L5.24

Introduction to Algorithms
6.046J/18.401J

Lecture 6
Prof. Piotr Indyk

Today: sorting
Show that (n lg n) is the best possible
running time for a sorting algorithm.
Design an algorithm that sorts in O(n) time.
Hint: different models ?

Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.2

Comparison sort
All the sorting algorithms we have seen so far
are comparison sorts: only use comparisons to
determine the relative order of elements.
E.g., insertion sort, merge sort, quicksort,
heapsort.

Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.3

Partitioning subroutine
PARTITION(A, p, r)        ⊳ A[p . . r]
  x ← A[p]                ⊳ pivot = A[p]
  i ← p
  for j ← p + 1 to r
    do if A[j] ≤ x
        then i ← i + 1
             exchange A[i] ↔ A[j]
  exchange A[p] ↔ A[i]
  return i

Invariant:  [ x | ≤ x | ≥ x | ? ]
              p       i       j    r

Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms
September 27, 2004

L6.4

Comparison sort
All of our algorithms used comparisons
All of our algorithms have running time
Ω(n lg n) in the worst case
Is it the best that we can do using just
comparisons ?
Answer: YES, via decision trees

Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.5

Decision-tree example
Sort ⟨a₁, a₂, …, aₙ⟩, here n = 3:

                 1:2
               /     \
            2:3       1:3
           /   \     /   \
        123    1:3  213   2:3
              /   \      /   \
            132   312  231   321

Each internal node is labeled i:j for i, j ∈ {1, 2, …, n}.
• The left subtree shows subsequent comparisons if a_i ≤ a_j.
• The right subtree shows subsequent comparisons if a_i ≥ a_j.

Sorting ⟨a₁, a₂, a₃⟩ = ⟨9, 4, 6⟩ traces one root-to-leaf path:
• at 1:2, 9 ≥ 4, go right;
• at 1:3, 9 ≥ 6, go right;
• at 2:3, 4 ≤ 6, go left — reaching the leaf 231, i.e., 4 ≤ 6 ≤ 9.

Each leaf contains a permutation ⟨π(1), π(2), …, π(n)⟩ to
indicate that the ordering a_{π(1)} ≤ a_{π(2)} ≤ ⋯ ≤ a_{π(n)} has been
established.
Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.11

Decision-tree model
A decision tree can model the execution of
any comparison sort:
One tree for each input size n.
View the algorithm as splitting whenever it compares two
elements.
The tree contains the comparisons along all possible
instruction traces.
The number of comparisons done by the algorithm on a given
input =
the length of the path taken.
Worst-case number of comparisons =
max path length = height of tree.
Worst-case time ≥ worst-case number of comparisons.
Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.12

Lower bound for decision-tree sorting


Theorem. Any decision tree that can sort n
elements must have height (n lg n) .
Corollary. Any comparison sorting algorithm
has worst-case running time (n lg n).
Corollary 2. Merge sort and Heap Sort are
asymptotically optimal comparison sorting
algorithms.
Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.13

Lower bound for decision-tree sorting


Theorem. Any decision tree that can sort n
elements must have height (n lg n) .
Proof.
The tree must contain ≥ n! leaves, since there
are n! possible permutations.
A height-h binary tree has ≤ 2^h leaves.
Thus, 2^h ≥ #leaves ≥ n!, or h ≥ lg(n!).

Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.14

Proof, ctd.

2^h ≥ n!
     ≥ n · (n−1) · ⋯ · (n/2)
     ≥ (n/2)^{n/2}

h ≥ lg( (n/2)^{n/2} )
  = (n/2) (lg n − lg 2)
  = Ω(n lg n) .

Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.15

Example: sorting 3 elements

Recall h ≥ lg(n!):
  n = 3
  n! = 6
  log₂ 6 ≈ 2.58
Sorting 3 elements requires at least
3 comparisons in the worst case.

Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.16

Decision-tree for n = 3

Sort ⟨a₁, a₂, a₃⟩:

                 1:2
               /     \
            2:3       1:3
           /   \     /   \
        123    1:3  213   2:3
              /   \      /   \
            132   312  231   321

The height is 3, matching the ⌈lg 6⌉ = 3 lower bound.

Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.17

Sorting in linear time


Counting sort: no comparisons between elements.
Input: A[1 . . n], where A[j] ∈ {1, 2, …, k}.
Output: B[1 . . n], sorted.*
Auxiliary storage: C[1 . . k].

*Actually, we require the algorithm to construct a permutation of the input


array A that produces the sorted array B. This permutation can be obtained
by making small changes to the last loop of the algorithm.
Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.18

Counting sort
for i ← 1 to k
  do C[i] ← 0
for j ← 1 to n
  do C[A[j]] ← C[A[j]] + 1      ⊳ C[i] = |{key = i}|
for i ← 2 to k
  do C[i] ← C[i] + C[i−1]       ⊳ C[i] = |{key ≤ i}|
for j ← n downto 1
  do B[C[A[j]]] ← A[j]
     C[A[j]] ← C[A[j]] − 1
Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.19
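The four loops translate directly into Python; this sketch (mine) keeps the slide's 1-indexed count array C:

  def counting_sort(A, k):
      """Stable counting sort of A (values in 1..k); returns sorted list B."""
      n = len(A)
      C = [0] * (k + 1)                 # C[0] unused, matching the slide
      for key in A:                     # loop 2: C[i] = |{key = i}|
          C[key] += 1
      for i in range(2, k + 1):         # loop 3: C[i] = |{key <= i}|
          C[i] += C[i - 1]
      B = [None] * n
      for j in range(n - 1, -1, -1):    # loop 4: right-to-left keeps it stable
          B[C[A[j]] - 1] = A[j]         # -1 converts the 1-indexed slot
          C[A[j]] -= 1
      return B

  print(counting_sort([4, 1, 3, 4, 3], 4))   # -> [1, 3, 3, 4, 4]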

Counting-sort example

A: 4 1 3 4 3        (n = 5, k = 4)

Loop 1 (clear counts):
  C: 0 0 0 0
Loop 2 (count keys, j = 1 … 5):
  C: 0 0 0 1 → 1 0 0 1 → 1 0 1 1 → 1 0 1 2 → 1 0 2 2
Loop 3 (prefix sums, C[i] = |{key ≤ i}|):
  C′: 1 1 2 2 → 1 1 3 2 → 1 1 3 5
Loop 4 (place elements, j = n downto 1):
  j = 5: B[3] ← 3,   C′: 1 1 2 5,   B: _ _ 3 _ _
  j = 4: B[5] ← 4,   C′: 1 1 2 4,   B: _ _ 3 _ 4
  j = 3: B[2] ← 3,   C′: 1 1 1 4,   B: _ 3 3 _ 4
  j = 2: B[1] ← 1,   C′: 0 1 1 4,   B: 1 3 3 _ 4
  j = 1: B[4] ← 4,   C′: 0 1 1 3,   B: 1 3 3 4 4

for j ← n downto 1
  do B[C[A[j]]] ← A[j]
     C[A[j]] ← C[A[j]] − 1
Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.40

B vs. C

B:  1 3 3 4 4
C′: 1 1 3 5        (the prefix sums before Loop 4)

In the end, the elements with key i occupy the range
B[ C[i−1]+1 . . C[i] ].

Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.41

Analysis
Θ(k):  for i ← 1 to k
         do C[i] ← 0
Θ(n):  for j ← 1 to n
         do C[A[j]] ← C[A[j]] + 1
Θ(k):  for i ← 2 to k
         do C[i] ← C[i] + C[i−1]
Θ(n):  for j ← n downto 1
         do B[C[A[j]]] ← A[j]
            C[A[j]] ← C[A[j]] − 1

Total: Θ(n + k)
Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.42

Running time
If k = O(n), then counting sort takes Θ(n) time.
But, sorting takes Ω(n lg n) time!
Why?
Answer:
Comparison sorting takes Ω(n lg n) time.
Counting sort is not a comparison sort.
In fact, not a single comparison between
elements occurs!
Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.43

Stable sorting
Counting sort is a stable sort: it preserves
the input order among equal elements.
A: 4 1 3 4 3

B: 1 3 3 4 4

(The two 3s and the two 4s appear in B in the same
left-to-right order as in A.)

Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.44

Sorting integers
We can sort n integers from {1, 2, …, k} in
O(n + k) time.
This is nice if k = O(n).
What if, say, k = n² ?

Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.45

Radix sort
Origin: Herman Holleriths card-sorting
machine for the 1890 U.S. Census. (See
Appendix .)
Digit-by-digit sort.
Holleriths original (bad) idea: sort on
most-significant digit first.
Good idea: Sort on least-significant digit
first with auxiliary stable sort.
Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.46

Operation of radix sort


329        720        720        329
457        355        329        355
657        436        436        436
839   →    457   →    839   →    457
436        657        355        657
720        329        457        720
355        839        657        839
        sort on     sort on    sort on
        digit 1     digit 2    digit 3

Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms
September 27, 2004

L6.47

Correctness of radix sort

Induction on digit position t:
• Assume that the numbers are sorted by their
  low-order t − 1 digits.
• Sort on digit t:
  – Two numbers that differ in digit t are
    correctly sorted.
  – Two numbers equal in digit t are put in the
    same order as in the input ⇒ correct order,
    by stability and the inductive hypothesis.

  720        329
  329        355
  436        436
  839   →    457      (sorting on digit t = 3)
  355        657
  457        720
  657        839

Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.50

Analysis of radix sort


Assume counting sort is the auxiliary stable sort.
Sort n computer words of b bits each.
E.g., if we sort elements in {1 … n²}, then b = 2 lg n.
Each word can be viewed as having b/r base-2^r
digits.
Example: 32-bit word.
  r = 8 ⇒ b/r = 4 passes of counting sort on base-2⁸ digits;
  or r = 16 ⇒ b/r = 2 passes of counting sort on base-2¹⁶
  digits.

Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.51

Analysis (continued)
Recall: counting sort takes Θ(n + k) time to
sort n numbers in the range from 0 to k − 1.
If each b-bit word is broken into r-bit pieces,
each pass of counting sort takes Θ(n + 2^r) time.
Since there are b/r passes, we have

  T(n, b) = Θ( (b/r) (n + 2^r) ) .

Choose r to minimize T(n, b):
Increasing r means fewer passes, but as
r ≫ lg n, the time grows exponentially.
Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.52

Choosing r
T(n, b) = Θ( (b/r) (n + 2^r) )

Minimize T(n, b) by differentiating and setting to 0.
Or, just observe that we don't want 2^r ≫ n, and
there's no harm asymptotically in choosing r as
large as possible subject to this constraint.
Choosing r = lg n implies T(n, b) = Θ(bn / lg n).
For numbers in the range from 0 to n^d − 1, we
have b = d lg n ⇒ radix sort runs in Θ(dn) time.
Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.53
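Putting the last two slides together, here is a small Python sketch of LSD radix sort with a counting-sort pass per r-bit digit (the parameters b and r are the slide's; the scaffolding is mine):

  def radix_sort(A, b, r):
      """LSD radix sort of b-bit nonnegative integers, r bits per pass.
      Each pass is a stable counting sort on one base-2^r digit."""
      mask = (1 << r) - 1
      for shift in range(0, b, r):
          C = [0] * (1 << r)
          for a in A:                        # count this pass's digits
              C[(a >> shift) & mask] += 1
          for i in range(1, 1 << r):         # prefix sums
              C[i] += C[i - 1]
          B = [None] * len(A)
          for a in reversed(A):              # right-to-left keeps it stable
              d = (a >> shift) & mask
              C[d] -= 1
              B[C[d]] = a
          A = B
      return A

  print(radix_sort([329, 457, 657, 839, 436, 720, 355], 10, 5))
  # -> [329, 355, 436, 457, 657, 720, 839]   (10-bit words, 2 passes)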

Conclusions
In practice, radix sort is fast for large inputs, as
well as simple to code and maintain.
Example (32-bit numbers):
At most 3 passes when sorting ≥ 2000 numbers.
Merge sort and quicksort do at least ⌈lg 2000⌉ =
11 passes.
Downside: Unlike quicksort, radix sort displays
little locality of reference.

Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.54

Appendix: Punched-card
technology
Herman Hollerith (1860-1929)
Punched cards
Hollerith's tabulating system
Operation of the sorter
Origin of radix sort
Modern IBM card
Web resources on punched-card technology
Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

Return to last
slide viewed.
September 27, 2004

L6.55

Herman Hollerith
(1860-1929)
The 1880 U.S. Census took almost
10 years to process.
While a lecturer at MIT, Hollerith
prototyped punched-card technology.
His machines, including a card sorter, allowed
the 1890 census total to be reported in 6 weeks.
He founded the Tabulating Machine Company in
1896, which merged with other companies in 1924
to form International Business Machines.
Image removed due to copyright considerations.

Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.56

Punched cards
Punched card = data record.
Hole = value.
Algorithm = machine + human operator.

Image removed due to copyright considerations.

Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.57

Hollerith's
tabulating
system
Pantograph card
punch
Hand-press reader
Dial counters
Sorting box
Charles E. Leiserson and Piotr Indyk

Image removed due to copyright considerations.

Introduction to Algorithms

September 27, 2004

L6.58

Operation of the sorter


An operator inserts a card into
the press.
Pins on the press reach through
the punched holes to make
Image removed due to copyright considerations.
electrical contact with mercury-filled cups beneath the card.
Whenever a particular digit
value is punched, the lid of the
corresponding sorting bin lifts.
The operator deposits the card
into the bin and closes the lid.
When all cards have been processed, the front panel is opened, and
the cards are collected in order, yielding one pass of a stable sort.
Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.59

Origin of radix sort


Hollerith's original 1889 patent alludes to a most-significant-digit-first radix sort:
The most complicated combinations can readily be
counted with comparatively few counters or relays by first
assorting the cards according to the first items entering
into the combinations, then reassorting each group
according to the second item entering into the combination,
and so on, and finally counting on a few counters the last
item of the combination for each group of cards.

Least-significant-digit-first radix sort seems to be


a folk invention originated by machine operators.
Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.60

Modern IBM card


One character per column.

Image removed due to copyright considerations.

So, that's why text windows have 80 columns!


Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.61

Web resources on punched-card technology


Doug Jones's punched card index
Biography of Herman Hollerith
The 1890 U.S. Census
Early history of IBM
Pictures of Hollerith's inventions
Hollerith's patent application (borrowed
from Gordon Bell's CyberMuseum)
Impact of punched cards on U.S. history
Charles E. Leiserson and Piotr Indyk

Introduction to Algorithms

September 27, 2004

L6.62

Introduction to Algorithms
6.046J/18.401J

Lecture 7
Prof. Piotr Indyk

Data Structures
Role of data structures:
Encapsulate data
Support certain operations (e.g., INSERT,
DELETE, SEARCH)
What data structures do we know already ?
Yes, heap:
INSERT(x)
DELETE-MIN
Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.2

Dictionary problem
Dictionary T holding n records:

  record x:  key[x] | other fields containing satellite data

Operations on T:
INSERT(T, x)
DELETE(T, x)
SEARCH(T, k)

How should the data structure T be organized?


Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.3

Assumptions
Assumptions:
The set of keys is K ⊆ U = {0, 1, …, u−1}
Keys are distinct
What can we do ?

Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.4

Direct access table


Create a table T[0 . . u−1]:

  T[k] = x      if k ∈ K and key[x] = k,
         NIL    otherwise.

Benefit:
Each operation takes constant time
Drawbacks:
The range of keys can be large:
64-bit numbers (which represent
18,446,744,073,709,551,616 different keys),
character strings (even larger!)
Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.5

Hash functions
Solution: Use a hash function h to map the
universe U of all keys into T = {0, 1, …, m−1}.

[Figure: keys k₁, …, k₅ ∈ K ⊆ U hashed into the
slots of T; here h(k₂) = h(k₅).]

As each key is inserted, h maps it to a slot of T.
When a record to be inserted maps to an already
occupied slot in T, a collision occurs.
Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.6

Collision resolution by chaining
Records in the same slot are linked into a list.

  slot i:  49 → 86 → 52        h(49) = h(86) = h(52) = i

Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.7

Hash functions
Designing good hash functions is quite nontrivial.
For now, we assume they exist. Namely, we
assume simple uniform hashing:
  Each key k ∈ K is equally likely
  to be hashed to any slot of table T,
  independent of where other keys are
  hashed.
Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.8

Analysis of chaining
Let n be the number of keys in the table, and
let m be the number of slots.
Define the load factor of T to be
  α = n/m
    = average number of keys per slot.

Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.9

Search cost
Expected time to search for a record with
a given key = Θ(1 + α):
  Θ(1) to apply the hash function and access the slot,
  plus Θ(α) to search the list.
Expected search time = Θ(1) if α = O(1),
or equivalently, if n = O(m).

Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.10
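A minimal Python sketch of hashing with chaining (the class and method names are mine, and Python's built-in hash stands in for a good hash function h):

  class ChainedHashTable:
      def __init__(self, m):
          self.m = m
          self.slots = [[] for _ in range(m)]   # m chains

      def _h(self, k):
          return hash(k) % self.m               # stand-in for h

      def insert(self, k, v):                   # O(1): add to the chain
          self.slots[self._h(k)].append((k, v))

      def search(self, k):                      # Theta(1 + alpha) expected
          for key, v in self.slots[self._h(k)]:
              if key == k:
                  return v
          return None

      def delete(self, k):                      # without a pointer to x,
          chain = self.slots[self._h(k)]        # this does a search first
          self.slots[self._h(k)] = [(key, v) for key, v in chain if key != k]

  t = ChainedHashTable(8)
  t.insert(49, 'a'); t.insert(86, 'b')
  print(t.search(86), t.search(7))              # -> b None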

Other operations
Insertion time ?
Constant: hash and add to the list
Deletion time? Recall that we defined
DELETE(T, x).
Also constant, if x has a pointer to the
collision list and the list is doubly linked.
Otherwise, do SEARCH first.
Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.11

Delete
[Figure: the chain 49 → 86 → 52 in slot h(key[x]);
given a pointer to a record x in the chain, it is
spliced out of the doubly linked list in O(1) time.]

Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.12

Dealing with wishful thinking


The assumption of simple uniform hashing
is hard to guarantee, but several common
techniques tend to work well in practice as
long as their deficiencies can be avoided.
Desiderata:
A good hash function should distribute the
keys uniformly into the slots of the table.
Regularity in the key distribution (e.g., an
arithmetic progression) should not affect this
uniformity.
Charles Leiserson and Piotr Indyk
Introduction to Algorithms
September 29, 2004 L7.13

Hashing in practice

Leaving the realm of Provable


Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.14

Division method
Define
h(k) = k mod m.
Deficiency: Dont pick an m that has a small
divisor d. A preponderance of keys that are
congruent modulo d can adversely affect
uniformity.
Extreme deficiency: If m = 2^r, then the hash
doesn't even depend on all the bits of k:
If k = 1011000111011010₂ and r = 6, then
h(k) = 011010₂ .
Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.15

Division method (continued)


h(k) = k mod m.
Pick m to be a prime.
Annoyance:
Sometimes, making the table size a prime is
inconvenient.
But, this method is popular, although the next
method well see is usually superior.
Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.16

Multiplication method
Assume that all keys are integers, m = 2^r, and our
computer has w-bit words. Define
  h(k) = (A·k mod 2^w) rsh (w − r),
where rsh is the bit-wise right-shift operator
and A is an odd integer in the range 2^{w−1} < A < 2^w.
• Don't pick A too close to 2^w.
• Multiplication modulo 2^w is fast.
• The rsh operator is fast.
Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.17
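The multiplication method in runnable form, using the slide's w = 7, r = 3 example (a sketch; the variable names are mine):

  w, r = 7, 3                  # word size; table size m = 2^r = 8
  A = 0b1011001                # odd, with 2^(w-1) < A < 2^w

  def h(k):
      """Keep the top r bits of the low w bits of A*k."""
      return ((A * k) % (1 << w)) >> (w - r)

  print(h(0b1101011))          # k = 107 -> 3 (binary 011)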

Multiplication method
example
h(k) = (A·k mod 2^w) rsh (w − r)
Suppose that m = 8 = 2³ and that our computer
has w = 7-bit words:

      1011001₂ = A
    × 1101011₂ = k
  ----------------
  10010100110011₂

A·k mod 2⁷ = 0110011₂ ; shifting right by
w − r = 4 bits leaves h(k) = 011₂ .

[Figure: "modular wheel" picturing A, 2A, 3A, …
wrapping around modulo 2^w.]

Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.18

Back to the realm of Provable

Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.19

A weakness of hashing as we
saw it
Problem: For any hash function h, a set
of keys exists that can cause the average
access time of a hash table to skyrocket.
An adversary can pick all keys from
h⁻¹(i) = {k ∈ U : h(k) = i} for some slot i.
There is a slot i for which |h⁻¹(i)| ≥ u/m.

Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.20

Solution
Randomize!
Choose the hash function at random from
some family of functions, and independently
of the keys.
Even if an adversary can see your code, he
or she cannot find a bad set of keys, since
he or she doesnt know exactly which hash
function will be chosen.
What family of functions should we select ?
Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.21

Family of hash functions


Idea #1: Take the family of all functions
  h : U → {0, …, m−1}.
That is, choose each of h(0), h(1), …, h(u−1)
independently at random from {0, …, m−1}.
Benefit:
  The uniform hashing assumption is true!
Drawback:
  We need u random numbers to specify h.
  Where to store them?
Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.22

Universal hashing
Idea #2: Universal Hashing
Let H be a finite collection of hash
functions, each mapping U to {0, 1, …, m−1}.
We say H is universal if for all x, y ∈ U,
where x ≠ y, we have
  Pr_{h∈H}{ h(x) = h(y) } = 1/m .

Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.23

Universality is good
Theorem. Let h be a hash function chosen
(uniformly) at random from a universal set H
of hash functions. Suppose h is used to hash
n arbitrary keys into the m slots of a table T.
Then, for a given key x, we have
E[#collisions with x] < n/m.

Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.24

Proof of theorem
Proof. Let C_x be the random variable denoting
the total number of collisions of keys in T with
x, and let
  c_xy = 1 if h(x) = h(y),
         0 otherwise.
Note: E[c_xy] = 1/m  and  C_x = Σ_{y ∈ T−{x}} c_xy .

September 29, 2004

L7.25

Proof (continued)

E[C_x] = E[ Σ_{y ∈ T−{x}} c_xy ]     ⊳ take expectation of both sides
       = Σ_{y ∈ T−{x}} E[c_xy]       ⊳ linearity of expectation
       = Σ_{y ∈ T−{x}} 1/m           ⊳ E[c_xy] = 1/m
       = (n − 1)/m .                 ⊳ algebra

Charles Leiserson and Piotr Indyk
Introduction to Algorithms

September 29, 2004

L7.29

Constructing a set of universal
hash functions
Let m be prime.
Decompose key k into r + 1 digits, each with value in
the set {0, 1, …, m−1}.
That is, let k = ⟨k₀, k₁, …, k_r⟩, where 0 ≤ k_i < m.
Randomized strategy:
Pick a = ⟨a₀, a₁, …, a_r⟩ where each a_i is chosen
randomly from {0, 1, …, m−1}.

Define  h_a(k) = ( Σ_{i=0}^{r} a_i k_i ) mod m .

Denote H = { h_a : a as above }.


Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.30
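A Python sketch of drawing a random h_a from this dot-product family (the prime m = 997 and r = 3 are arbitrary illustrative choices of mine):

  import random

  m = 997                                    # a prime table size
  r = 3                                      # keys have r + 1 = 4 base-m digits

  def random_ha():
      """Draw h_a from the universal family H = {h_a}."""
      a = [random.randrange(m) for _ in range(r + 1)]
      def ha(k):
          # decompose k into base-m digits k_0 .. k_r, then dot with a
          digits = [(k // m**i) % m for i in range(r + 1)]
          return sum(ai * ki for ai, ki in zip(a, digits)) % m
      return ha

  h = random_ha()
  print(h(123456789), h(987654321))          # two keys, slots in 0..m-1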

Universality of dot-product
hash functions
Theorem. The set H = {ha} is universal.

Proof. Suppose that
x = ⟨x₀, x₁, …, x_r⟩ and
y = ⟨y₀, y₁, …, y_r⟩ are distinct keys. Thus,
they differ in at least one digit position, wlog
position 0. What is the probability that x and y
collide, that is, h_a(x) = h_a(y)?

Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.31

Proof (continued)
h_a(x) = h_a(y)
  ⇒ Σ_{i=0}^{r} a_i x_i ≡ Σ_{i=0}^{r} a_i y_i         (mod m)
  ⇒ Σ_{i=0}^{r} a_i (x_i − y_i) ≡ 0                   (mod m)
  ⇒ a₀(x₀ − y₀) + Σ_{i=1}^{r} a_i (x_i − y_i) ≡ 0     (mod m)
  ⇒ a₀(x₀ − y₀) ≡ − Σ_{i=1}^{r} a_i (x_i − y_i)       (mod m) .

Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.32

Recall PS 2
Theorem. Let m be prime. For any z ∈ Z_m
such that z ≠ 0, there exists a unique z⁻¹ ∈ Z_m
such that
  z · z⁻¹ ≡ 1   (mod m).

Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.33

Back to the proof


We have
  a₀(x₀ − y₀) ≡ − Σ_{i=1}^{r} a_i (x_i − y_i)   (mod m),
and since x₀ ≠ y₀, an inverse (x₀ − y₀)⁻¹ must exist,
which implies that
  a₀ ≡ − ( Σ_{i=1}^{r} a_i (x_i − y_i) ) · (x₀ − y₀)⁻¹   (mod m) .
Thus, for any choices of a₁, a₂, …, a_r, exactly
one choice of a₀ causes x and y to collide.
Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.34

Proof (completed)
Q. What is the probability that x and y
collide?
A. There are m choices for a₀, but exactly one
choice for a₀ causes x and y to collide,
namely
  a₀ = ( − ( Σ_{i=1}^{r} a_i (x_i − y_i) ) · (x₀ − y₀)⁻¹ ) mod m .
Thus, the probability of x and y colliding is
1/m.
Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.35

Recap
Showed how to implement dictionary so
that INSERT, DELETE, SEARCH work in
expected constant time under the uniform
hashing assumption
Relaxed the assumption to universal
hashing
Constructed a universal family of hash functions
for keys in {0 … m^r − 1}
Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.36

Perfect hashing
Given a set of n keys, construct a static hash
table of size m = O(n) such that SEARCH takes
Θ(1) time in the worst case.
IDEA: Two-level scheme
with universal hashing at
both levels.
No collisions at level 2!

[Figure: level-1 table T of size m = n; slot 1 holds
S₁ = {14, 27} in a 4-slot level-2 table (at level 2,
h₃₁(14) = h₃₁(27) would collide, so a different
level-2 function is used); slot 4 holds S₄ = {26};
slot 6 holds S₆ = {40, 37, 22} in a 9-slot level-2
table. Each slot stores the parameters of its own
level-2 hash function.]

Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.37

Collisions at level 2
Theorem. Let H be a class of universal hash
functions for a table of size m = n2. Then, if we
use a random h H to hash n keys into the table,
the expected number of collisions is at most 1/2.
Proof. By the definition of universality, the
probability that 2 given keys in the table collide
under h is 1/m = 1/n². Since there are C(n, 2) pairs
of keys that can possibly collide, the expected
number of collisions is

  C(n, 2) · (1/n²) = ( n(n−1)/2 ) · (1/n²) < 1/2 .
Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.38

No collisions at level 2
Corollary. The probability of no collisions
is at least 1/2.

Proof. Markov's inequality says that for any
nonnegative random variable X, we have
  Pr{X ≥ t} ≤ E[X]/t .
Applying this inequality with t = 1, we find
that the probability of 1 or more collisions is
at most 1/2.
Thus, just by testing random hash functions
in H, we'll quickly find one that works.
Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.39

Analysis of storage
For the level-1 hash table T, choose m = n, and
let n_i be the random variable for the number of keys
that hash to slot i in T. By using n_i² slots for the
level-2 hash table S_i, the expected total storage
required for the two-level scheme is therefore

  E[ Σ_{i=0}^{m−1} Θ(n_i²) ] = Θ(n) ,

since the analysis is identical to the analysis from
recitation of the expected running time of bucket
sort. (For a probability bound, apply Markov.)
Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.40

Resolving collisions by open


addressing
No storage is used outside of the hash table itself.
Insertion systematically probes the table until an
empty slot is found.
The hash function depends on both the key and
the probe number:
  h : U × {0, 1, …, m−1} → {0, 1, …, m−1}.
The probe sequence ⟨h(k,0), h(k,1), …, h(k,m−1)⟩
should be a permutation of {0, 1, …, m−1}.
The table may fill up, and deletion is difficult (but
not impossible).
Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.41

Example of open addressing

Insert key k = 496:
0. Probe h(496,0) — collision
1. Probe h(496,1) — collision
2. Probe h(496,2) — empty slot: insert 496

[Figure: a table with occupied slots 586, 133, 204,
481 and the three probes for 496.]

Search for key k = 496:
Search uses the same probe sequence, terminating
successfully if it finds the key and unsuccessfully
if it encounters an empty slot.
Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.45

Probing strategies
Linear probing:
Given an ordinary hash function h(k), linear
probing uses the hash function
h(k,i) = (h(k) + i) mod m.
This method, though simple, suffers from primary
clustering, where long runs of occupied slots build
up, increasing the average search time. Moreover,
the long runs of occupied slots tend to get longer.
Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.46

Probing strategies
Double hashing
Given two ordinary hash functions h1(k) and h2(k),
double hashing uses the hash function
h(k,i) = (h1(k) + ih2(k)) mod m.
This method generally produces excellent results,
but h2(k) must be relatively prime to m. One way
is to make m a power of 2 and design h2(k) to
produce only odd numbers.
Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.47
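A small Python sketch of the double-hashing probe sequence (h1 and h2 here are toy stand-ins of mine, chosen so that h2(k) is odd and hence relatively prime to the power-of-2 table size m):

  def probe_sequence(k, m, h1, h2):
      """Yield h(k,i) = (h1(k) + i*h2(k)) mod m for i = 0, 1, ..., m-1."""
      for i in range(m):
          yield (h1(k) + i * h2(k)) % m

  m = 16                                     # power of 2 ...
  h1 = lambda k: k % m
  h2 = lambda k: (k % (m - 1)) | 1           # ... so force h2(k) odd
  print(list(probe_sequence(496, m, h1, h2))[:5])   # first five probes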

Analysis of open addressing


We make the assumption of uniform hashing:
Each key is equally likely to have any one of
the m! permutations as its probe sequence.
Theorem. Given an open-addressed hash
table with load factor α = n/m < 1, the
expected number of probes in an unsuccessful
search is at most 1/(1−α).

Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.48

Proof of the theorem


Proof.
At least one probe is always necessary.
With probability n/m, the first probe hits an
occupied slot, and a second probe is necessary.
With probability (n−1)/(m−1), the second probe
hits an occupied slot, and a third probe is
necessary.
With probability (n−2)/(m−2), the third probe
hits an occupied slot, etc.
Observe that
  (n − i)/(m − i) < n/m = α   for i = 1, 2, …, n.
Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.49

Proof (continued)
Therefore, the expected number of probes is

1 + (n/m) ( 1 + ((n−1)/(m−1)) ( 1 + ((n−2)/(m−2)) ( ⋯ ( 1 + 1/(m−n+1) ) ⋯ )))
  ≤ 1 + α ( 1 + α ( 1 + α ( ⋯ (1 + α) ⋯ )))
  ≤ 1 + α + α² + α³ + ⋯
  = Σ_{i=0}^{∞} α^i
  = 1/(1 − α) .

The textbook has a more rigorous proof.

Charles Leiserson and Piotr Indyk
Introduction to Algorithms

September 29, 2004

L7.50

Implications of the theorem


If is constant, then accessing an openaddressed hash table takes constant time.
If the table is half full, then the expected
number of probes is 1/(10.5) = 2.
If the table is 90% full, then the expected
number of probes is 1/(10.9) = 10.

Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.51

Dot-product method
Randomized strategy:
Let m be prime. Decompose key k into r + 1
digits, each with value in the set {0, 1, …, m−1}.
That is, let k = ⟨k₀, k₁, …, k_r⟩, where 0 ≤ k_i < m.
Pick a = ⟨a₀, a₁, …, a_r⟩ where each a_i is chosen
randomly from {0, 1, …, m−1}.

Define  h_a(k) = ( Σ_{i=0}^{r} a_i k_i ) mod m .

Excellent in practice, but expensive to compute.


Charles Leiserson and Piotr Indyk

Introduction to Algorithms

September 29, 2004

L7.52

Introduction to Algorithms

6.046J/18.401J

Lecture 8

Prof. Piotr Indyk

Data structures

Previous lecture: hash tables

Insert, Delete, Search in (expected)


constant time
Works for integers from {0 … m^r − 1}
This lecture: Binary Search Trees
Insert, Delete, Search (Successor)
Works in comparison model
Piotr Indyk

Introduction to Algorithms

October 6, 2004

L7.2

Binary Search Tree

Each node x has:
• key[x]
• Pointers:
  left[x]
  right[x]
  p[x]

        9
       / \
      5   12
     / \
    1   6
         \
          7
           \
            8

Piotr Indyk

Introduction to Algorithms

October 6, 2004

L7.3

Binary Search Tree (BST)

Property: for any node x:
• For all nodes y in the left
  subtree of x:
    key[y] ≤ key[x]
• For all nodes y in the right
  subtree of x:
    key[y] ≥ key[x]

Given a set of keys, is the BST for
those keys unique?
(Tree as on the previous slide.)

Piotr Indyk

Introduction to Algorithms
October 6, 2004

L7.4

No uniqueness

[Figure: two different BSTs on the same set of
keys — one rooted at 5, one rooted at 9.]

Piotr Indyk

Introduction to Algorithms

October 6, 2004

L7.5

What can we do given a BST?

Sort!

Inorder-Walk(x):
  if x ≠ NIL then
    Inorder-Walk( left[x] )
    print key[x]
    Inorder-Walk( right[x] )

Output: 1 5 6 7 8 9 12

Piotr Indyk

Introduction to Algorithms
October 6, 2004

L7.6

Sorting, ctd.

What is the running time of
Inorder-Walk?
It is O(n), because:
• each link is traversed
  twice, and
• there are O(n) links.

Piotr Indyk

Introduction to Algorithms
October 6, 2004

L7.7

Sorting, ctd.

Does it mean that we can
sort n keys in O(n) time?
No.
It just means that building
a BST takes Ω(n log n) time
(in the comparison model).

Piotr Indyk

Introduction to Algorithms

October 6, 2004

L7.8

BST as a data structure

Operations:
Insert(x)
Delete(x)
Search(k)

Piotr Indyk

Introduction to Algorithms

October 6, 2004

L7.9

Search

Search(x, k):
  if x ≠ NIL then
    if key[x] = k then return x
    if k < key[x] then return
      Search( left[x], k )
    if k > key[x] then return
      Search( right[x], k )
  else return NIL

Search(8):   follows 9 → 5 → 6 → 7 → 8
Search(8.5): falls off the tree, returns NIL

Piotr Indyk

Introduction to Algorithms

October 6, 2004

L7.10

Predecessor/Successor

Can modify Search (into Search′) such that,
if k is not stored in the BST, we get x such that:
• either it has the largest key[x] < k, or
• it has the smallest key[x] > k.
Useful when k is prone to errors.
What if we always want a successor of k?
• x = Search′(k)
• if key[x] < k, then return Successor(x)
• else return x
Piotr Indyk

Introduction to Algorithms

October 6, 2004

L7.11

Successor

Successor(x):
  if right[x] ≠ NIL then
    return Minimum( right[x] )
  otherwise
    y ← p[x]
    while y ≠ NIL and x = right[y] do
      x ← y
      y ← p[y]
    return y

[Figure: the successor of 8 is found by walking up
8 → 7 → 6 → 5 until the first node that is a left
child, returning its parent 9.]

Piotr Indyk

Introduction to Algorithms

October 6, 2004

L7.12

Minimum

Minimum(x):
  while left[x] ≠ NIL do
    x ← left[x]
  return x

Piotr Indyk

Introduction to Algorithms

October 6, 2004

L7.13

Nearest Neighbor

Assuming keys are numbers:
For a key k, can we find x such that |k − key[x]| is
minimal?
Yes: key[x] must be either a predecessor or a
successor of k.
• y = Search′(k)        ⊳ y is either the succ or pred of k
• y′ = Successor(y)
• y″ = Predecessor(y)
• Report the closest of key[y], key[y′], key[y″]

Piotr Indyk

Introduction to Algorithms

October 6, 2004

L7.14

Analysis

How much time does all of
this take?
• Worst case: O(height)
• Height is really important:
  the tree had better be balanced.

Piotr Indyk

Introduction to Algorithms

October 6, 2004

L7.15

Constructing BST

Insert(z):
  y ← NIL
  x ← root
  while x ≠ NIL do
    y ← x
    if key[z] < key[x]
      then x ← left[x]
      else x ← right[x]
  p[z] ← y
  if key[z] < key[y]
    then left[y] ← z
    else right[y] ← z

Insert(5.5): ends with y = 6; 5.5 becomes the left child of 6.
Insert(8.5): 8.5 becomes the right child of 8.

Piotr Indyk

Introduction to Algorithms
October 6, 2004

L7.16
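The BST operations from the last few slides, collected into one runnable Python sketch (node fields follow the slides; the function signatures and the empty-tree handling in insert are my additions):

  class Node:
      def __init__(self, key):
          self.key, self.left, self.right, self.p = key, None, None, None

  def insert(root, z):                 # Insert(z) from the slide
      y, x = None, root
      while x is not None:
          y = x
          x = x.left if z.key < x.key else x.right
      z.p = y
      if y is None:
          return z                     # tree was empty: z is the new root
      if z.key < y.key:
          y.left = z
      else:
          y.right = z
      return root

  def search(x, k):                    # iterative form of Search
      while x is not None and x.key != k:
          x = x.left if k < x.key else x.right
      return x

  def minimum(x):
      while x.left is not None:
          x = x.left
      return x

  def successor(x):
      if x.right is not None:
          return minimum(x.right)
      y = x.p
      while y is not None and x is y.right:
          x, y = y, y.p
      return y

  root = None
  for k in [9, 5, 12, 1, 6, 7, 8]:
      root = insert(root, Node(k))
  print(search(root, 8).key, successor(search(root, 8)).key)   # -> 8 9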

Analysis

After we insert n elements,
what is the worst possible
BST height?
Pretty bad: n − 1
(e.g., inserting 1, 2, …, 6 in sorted order
produces a right-going chain)

Piotr Indyk

Introduction to Algorithms

October 6, 2004

L7.17

Average case analysis

Consider keys 1, 2, …, n, in a random order


Each permutation equally likely
For each key perform Insert
What is the likely height of the tree ?
It is O(log n)

Piotr Indyk

Introduction to Algorithms

October 6, 2004

L7.18

Creating a random BST


n = 9

1 2 3 4 5 6 7 8 9
1 2  |  4 5 6 7 8 9
4 5  |  7 8 9
        7  |  9

Each inserted key partitions the keys that
arrive after it, as in quicksort.

Piotr Indyk

Introduction to Algorithms
October 6, 2004

L7.19

Observations

Each edge corresponds to a random
partition.
Element x has height h ⇒ x participated in
h partitions.
Let h_x be a random variable denoting the height
of x.
What is Pr[h_x > t], where t = c lg n ?
Piotr Indyk

Introduction to Algorithms

October 6, 2004

L7.20

Partitions

A partition is lucky if the ratio is at least 1:3, i.e.,
each side has size ≥ 25%.
The probability of a lucky partition is ≥ 1/2.
After log_{4/3} n lucky partitions the element becomes a
leaf.
h_x > t ⇒ in t = c log_{4/3} n partitions we had < log_{4/3} n
lucky ones.
Toss t = c log_{4/3} n coins; what is the probability you
get < k = log_{4/3} n heads?
Piotr Indyk

Introduction to Algorithms

October 6, 2004

L7.21

Concentration inequalities

CLRS, p. 1118: the probability of at most k heads
in t trials is at most C(t, k) / 2^{t−k}.

Pr[h_x > t] ≤ C(t, k) / 2^{t−k}
            ≤ (et/k)^k / 2^{t−k}
            = (ce)^{log_{4/3} n} / 2^{(c−1) log_{4/3} n}
            = 2^{lg(ce) · log_{4/3} n} / 2^{(c−1) log_{4/3} n}
            = 2^{[lg(ce) − (c−1)] · (lg n) / lg(4/3)}
            ≤ 2^{−1.1 lg n} = 1/n^{1.1},  for sufficiently large c.

Piotr Indyk

Introduction to Algorithms

October 6, 2004

L7.22

Final Analysis

We know that for each x, Pr[h_x > t] ≤ 1/n^{1.1}.
We want Pr[h₁ > t or h₂ > t or … or h_n > t].
This is at most
  Pr[h₁ > t] + Pr[h₂ > t] + … + Pr[h_n > t]
  ≤ n · 1/n^{1.1}
  = 1/n^{0.1}
As n grows, the probability of height > c lg n
becomes arbitrarily small.
Piotr Indyk

Introduction to Algorithms

October 6, 2004

L7.23

Summing up

We have seen BSTs

Support Search, Successor, Nearest


Neighbor etc, as well as Insert
Worst case: O(n)
But O(log n) on average
Next week: O(log n) worst case

Piotr Indyk

Introduction to Algorithms

October 6, 2004

L7.24

Introduction to Algorithms

6.046J/18.401J

Lecture 9

Prof. Piotr Indyk

Today

Balanced search trees,
or how to avoid this:
  2 → 3 → 4 → 5 → 6   (a long chain)
even in the worst case.

Piotr Indyk and Charles E. Leiserson

Introduction to Algorithms

October 13, 2004

L9.2

Balanced search trees

Balanced search tree: A search-tree data


structure for which a height of O(lg n) is
guaranteed when implementing a dynamic
set of n items.

Examples:

Piotr Indyk and Charles E. Leiserson

AVL trees
2-3 trees
2-3-4 trees
B-trees
Red-black trees

Introduction to Algorithms

October 13, 2004

L9.3

Red-black trees

BSTs with an extra one-bit color field in


each node.
Red-black properties:
1. Every node is either red or black.
2. The root and leaves (NILs) are black.
3. If a node is red, then its parent is black.

4. All simple paths from any node x to a


descendant leaf have the same number
of black nodes.
Piotr Indyk and Charles E. Leiserson

Introduction to Algorithms

October 13, 2004

L9.4

Example of a red-black tree

[Figure: a red-black tree on the keys
7, 3, 18, 10, 8, 11, 22, 26, with all NIL
leaves black and every red node having a
black parent.]

Piotr Indyk and Charles E. Leiserson

Introduction to Algorithms

October 13, 2004

L9.5

Use of red-black trees

What properties would we like to prove about


red-black trees ?
They always have O(log n) height

There is an O(log n)-time insertion
procedure which preserves the red-black
properties.
Is it true that, after we add a new element to a
tree (as in the previous lecture), we can always
recolor the tree to keep it red-black?
Piotr Indyk and Charles E. Leiserson

Introduction to Algorithms

October 13, 2004

L9.6

Example of a red-black tree

[Figure: the same tree after inserting key 7.5 —
recoloring alone cannot restore the red-black
properties.]

Piotr Indyk and Charles E. Leiserson

Introduction to Algorithms

October 13, 2004

L9.7

Use of red-black trees

What properties would we like to prove about
red-black trees?
• They always have O(log n) height.
• There is an O(log n)-time insertion procedure
  which preserves the red-black properties.
Is it true that, after we add a new element to a tree (as
in the previous lecture), we can always recolor the
tree to keep it red-black?
NO.
After insertions, sometimes we need to juggle nodes
around.
around
Piotr Indyk and Charles E. Leiserson

Introduction to Algorithms

October 13, 2004

L9.8

Rotations

        B      RIGHT-ROTATE(B)      A
       / \     --------------->    / \
      A   γ                       α   B
     / \       <---------------      / \
    α   β      LEFT-ROTATE(A)      β   γ

Rotations maintain the inorder ordering of keys:
  a ∈ α, b ∈ β, c ∈ γ  ⇒  a ≤ A ≤ b ≤ B ≤ c.
A rotation can be performed in O(1) time.

Piotr Indyk and Charles E. Leiserson

Introduction to Algorithms

October 13, 2004

L9.9

Rotations can reduce height

[Figure: LEFT-ROTATE(A) rebalances a
right-leaning subtree, reducing its height.]

Piotr Indyk and Charles E. Leiserson

Introduction to Algorithms

October 13, 2004

L9.10

Red-black tree wrap-up

Can show how

O(log n) re-colorings

1 rotation

can restore red-black properties after an


insertion
Instead, we will see 2-3 trees (but will come
back to red-black trees at the end)

Piotr Indyk and Charles E. Leiserson

Introduction to Algorithms

October 13, 2004

L9.11

2-3 Trees

The simplest balanced trees on the planet!


Although a little bit more wasteful

Piotr Indyk and Charles E. Leiserson

Introduction to Algorithms

October 13, 2004

L9.12

2-3 Trees

• Degree of each node is
  either 2 or 3
• Keys are in the leaves
• All leaves have equal
  depth
• Leaves are sorted
• Each node x contains the
  maximum key in its
  sub-tree, denoted x.max

[Figure: a 2-3 tree with leaves 1, 5, 6, 8, 9, 12;
each internal node stores the maximum of its
subtree.]

Piotr Indyk and Charles E. Leiserson

Introduction to Algorithms

October 13, 2004

L9.13

Internal nodes

Internal nodes:
• Values:
  x.max: maximum key in the sub-tree
• Pointers:
  left[x]
  mid[x]
  right[x] : can be NIL
  p[x] : can be NIL for the root

Leaves:
• x.max : the key

Introduction to Algorithms

October 13, 2004

L9.14

Height of 2-3 tree

What is the maximum height h of a 2-3 tree
with n nodes?
Alternatively, what is the minimum number of
nodes in a 2-3 tree of height h?
It is 1 + 2 + 2² + 2³ + … + 2^h = 2^{h+1} − 1.
n ≥ 2^{h+1} − 1 ⇒ h = O(log n).
The full binary tree is the worst-case example!

Piotr Indyk and Charles E. Leiserson

Introduction to Algorithms

October 13, 2004

L9.15

Searching

How can we search for a key k?

Search(x, k):
  if x = NIL then return NIL
  else if x is a leaf then
    if x.max = k then return x
    else return NIL
  else
    if k ≤ left[x].max
      then Search( left[x], k )
    else if k ≤ mid[x].max
      then Search( mid[x], k )
    else Search( right[x], k )

Search(8):  found (the leaf with key 8)
Search(13): not found — returns NIL

Piotr Indyk and Charles E. Leiserson

Introduction to Algorithms

October 13, 2004

L9.16
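A minimal Python sketch of 2-3 tree SEARCH on a hand-built tree (the Leaf/Internal classes are mine; the loop over children[:-1] generalizes the left/mid/right cases, and insertion/splitting are not implemented here):

  class Leaf:
      def __init__(self, key):
          self.max = key                 # for a leaf, max is the key itself

  class Internal:
      def __init__(self, children):      # 2 or 3 children, already sorted
          self.children = children
          self.max = children[-1].max

  def search(x, k):
      """SEARCH from the slide."""
      if x is None:
          return None
      if isinstance(x, Leaf):
          return x if x.max == k else None
      for c in x.children[:-1]:          # if k <= c.max, descend into c
          if k <= c.max:
              return search(c, k)
      return search(x.children[-1], k)   # otherwise the rightmost child

  leaves = {k: Leaf(k) for k in [1, 5, 6, 8, 9, 12]}
  root = Internal([Internal([leaves[1], leaves[5], leaves[6]]),
                   Internal([leaves[8], leaves[9], leaves[12]])])
  print(search(root, 8).max, search(root, 13))   # -> 8 None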

Insertion

How to insert x?
• Perform Search for the key of x.
• Let y be the last internal node.
• Insert x into y in sorted order.
• At the end, update the max values on the
  path to the root.
(continued on the next slide)

[Example: Insert(7.5), Insert(5.5), and Insert(13)
into the running 2-3 tree; inserting 13 updates the
maxes 12 → 13 along its root path.]

Piotr Indyk and Charles E. Leiserson

Introduction to Algorithms

October 13, 2004

L9.17

Insertion, ctd.

(continued from the previous slide)
If y now has 4 children, then Split(y).

[Example: after the insertions, the node y holding
7.5 has a fourth child and must be split.]

Piotr Indyk and Charles E. Leiserson

Introduction to Algorithms

October 13, 2004

L9.18

Split

• Split y into two nodes y₁, y₂.
• Both are linked to z = parent(y).*
• If z has 4 children, split z.

  z: ⟨… y …⟩          →     z: ⟨… y₁ y₂ …⟩
  y: ⟨a b c d⟩              y₁: ⟨a b⟩, y₂: ⟨c d⟩

*If y is a root, then create a
new parent(y) = new root.

Piotr Indyk and Charles E. Leiserson

Introduction to Algorithms

October 13, 2004

L9.19

Split (example)

[Figure, two snapshots: the overfull node in the
running example is split into two nodes of 2
children each, and the new siblings are linked
into the parent.]

Piotr Indyk and Charles E. Leiserson

Introduction to Algorithms

October 13, 2004

L9.21

Split

[Figure: the split propagates — if the parent now
has 4 children, it is split in turn, possibly
creating a new root.]

• Insert and Split preserve heights, unless a new root is
  created, in which case all heights are increased by 1.
• After Split, all nodes have 2 or 3 children.
• Everything takes O(log n) time.

Piotr Indyk and Charles E. Leiserson

Introduction to Algorithms

October 13, 2004

L9.22

Delete

How to delete x?
• Let y = p(x)
• Remove x from y
• If y has 1 child left:
  – Remove y
  – Attach y's remaining child to y's sibling z

[Example: Delete(8) from the running 2-3 tree.]

Piotr Indyk and Charles E. Leiserson

Introduction to Algorithms

October 13, 2004

L9.23

Delete

How to delete x?
• Let y = p(x)
• Remove x from y
• If y has 1 child left:
  – Remove y
  – Attach y's remaining child to y's sibling z
  – If z now has 4 children, then Split(z)

[Example: Delete(8), continued.]

INCOMPLETE — SEE THE END FOR THE FULL VERSION

Piotr Indyk and Charles E. Leiserson

Introduction to Algorithms

October 13, 2004

L9.24

Summing up

2-3 Trees:
O(log n) depth Search in O(log n) time

Insert, Delete (and Split) in O(log n) time

We will now see 2-3-4 trees


Same idea, but:
Each parent has 2,3 or 4 children
Keys in the inner nodes
More complicated procedures

Piotr Indyk and Charles E. Leiserson

Introduction to Algorithms

October 13, 2004

L9.25

2-3-4 Trees

          ⟨ 5 | 9 ⟩
         /    |    \
   ⟨1 2 4⟩  ⟨7 8⟩  ⟨10 12⟩

Piotr Indyk and Charles E. Leiserson

Introduction to Algorithms

October 13, 2004

L9.26


Height of a red-black tree

Theorem. A red-black tree with n keys has height h ≤ 2 lg(n + 1).

INTUITION:
Merge red nodes into their black parents.
This process produces a tree in which each node has 2, 3, or 4 children.
The 2-3-4 tree has uniform depth h′ of leaves.

[Figure: a red-black tree of height h collapses into a 2-3-4 tree of height h′.]
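Filling in the arithmetic behind this intuition (my own completion of the steps the slides leave implicit):

% At least half of the nodes on any root-to-leaf path of a red-black
% tree are black, so the merged 2-3-4 tree has height h' >= h/2.
% A 2-3-4 tree with all leaves at uniform depth h' has at least
% 2^{h'} - 1 internal nodes, so
n \;\ge\; 2^{h'} - 1 \;\ge\; 2^{h/2} - 1
\quad\Longrightarrow\quad h \;\le\; 2\lg(n+1).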

Summing up

We have seen:
Red-black trees
2-3 trees (in detail)
2-3-4 trees
Red-black trees are undercover 2-3-4 trees
In most cases, it does not matter which one you use

2-3 Trees: Deletions

Problem: there is an internal node that has only 1 child

[Figure: after a deletion, an internal node near the root is left with a single child.]

Full procedure for Delete(x)

Special case: x is the only element in the tree:
delete everything, leaving the empty tree NIL

Not-so-special case: x is one of two elements in the tree.
In this case, the procedure on the next slide will delete x,
leaving the single remaining leaf y

Both NIL and a lone leaf y are special 2-3 trees

Procedure for Delete(x)

Let y = p(x)
Remove x
If y ≠ root then
    Let z be the sibling of y
    (Assume z is the right sibling of y; otherwise the code is symmetric.)
    If y has only 1 child w left
        Case 1: z has 3 children
            Attach left[z] as the rightmost child of y
            Update y.max and z.max
        Case 2: z has 2 children
            Attach the child w of y as the leftmost child of z
            Update z.max
            Delete(y) (recursively*)
    Else
        Update max of y, p(y), p(p(y)), and so on up to the root
Else
    If the root has only one child u
        Remove the root
        Make u the new root

*Note that the input of Delete does not have to be a leaf

Example

[Figures: four snapshots of Delete(8) on the running example (leaves 1 5 5.5 6 7 8 9 12): 8 is removed; its parent is left with only the child 7, which is attached to the sibling node; the now-empty parent is removed and max values are updated up the tree.]

Introduction to Algorithms
6.046J/18.401J/SMA5503

LECTURE 10
Prof. Piotr Indyk

Today
A data structure for a new problem
Amortized analysis

2-3 Trees: Deletions

Problem: there is an internal node that has only 1 child
Solution: delete recursively

[Figure: the same single-child situation as before; the fix cascades up the tree.]



2-3 Trees
The simplest balanced trees on the planet!
(but, nevertheless, not completely trivial)

Dynamic Maintenance of Sets

Assume we have a collection of elements
The elements are clustered
Initially, each element forms its own cluster/set
We want to enable two operations:
FIND-SET(x): report the cluster containing x
UNION(C1, C2): merge the clusters C1, C2

[Figure: elements 1, 2, 3, 6 grouped into clusters.]

Disjoint-set data structure
(Union-Find)

Problem:
Maintain a collection of pairwise-disjoint sets S = {S1, S2, …, Sr}.
Each Si has one representative element x = rep[Si].
Must support three operations:
MAKE-SET(x): adds a new set {x} to S with rep[{x}] = x (for any x ∉ Si for all i).
WEAKUNION(x, y): replaces sets Sx, Sy with Sx ∪ Sy in S, for any representatives x, y in distinct sets Sx, Sy.
FIND-SET(x): returns the representative rep[Sx] of the set Sx containing element x.

Quiz
If we have a WEAKUNION(x, y) that works only if x, y are representatives, how can we implement UNION that works for any x, y?

UNION(x, y) = WEAKUNION(FIND-SET(x), FIND-SET(y))

Representation

[Figure: an element record x with a Data field and other fields containing data of our choice.]

Applications

Data clustering
Killer App: Minimum Spanning Tree (Lecture 13)
Amortized analysis

[Figure: the clustered elements from before.]

Ideas ?
How can we implement this data structure efficiently ?
MAKE-SET
UNION
FIND-SET

Bad case for UNION or FIND

[Figure: a worst-case instance with elements 1, …, n+1, …, 2n arranged so that repeated UNIONs or FINDs traverse long chains.]

Simple linked-list solution

Store set Si = {x1, x2, …, xk} as an (unordered) doubly linked list. Define representative element rep[Si] to be the front of the list, x1.

[Figure: list Si with head x1 = rep[Si], then x2, …, xk.]

MAKE-SET(x) initializes x as a lone node.  Θ(1)
FIND-SET(x) walks left in the list containing x until it reaches the front of the list.  Θ(n)
UNION(x, y) concatenates the lists containing x and y, leaving rep. as FIND-SET[x].  Θ(n)

How can we improve it?

Augmented linked-list solution

Store set Si = {x1, x2, …, xk} as an unordered doubly linked list. Each xj also stores a pointer rep[xj] to the head.

[Figure: list Si with each element's rep pointer aimed at x1 = rep[Si].]

FIND-SET(x) returns rep[x].
UNION(x, y) concatenates the lists containing x and y, and updates the rep pointers for all elements in the list containing y.

Example of augmented linked-list solution

[Figures: lists Sx = (x1, x2) and Sy = (y1, y2, y3); UNION concatenates them into Sx ∪ Sy = (x1, x2, y1, y2, y3) and redirects the rep pointers of y1, y2, y3 to x1 = rep[Sx ∪ Sy].]

Augmented linked-list solution

Store set Si = {x1, x2, …, xk} as an unordered doubly linked list. Each xj also stores a pointer rep[xj] to the head.

FIND-SET(x) returns rep[x].  Θ(1)
UNION(x, y) concatenates the lists containing x and y, and updates the rep pointers for all elements in the list containing y.  Θ(n) ?

Amortized analysis
So far, we focused on the worst-case time of each operation.
E.g., UNION takes Θ(n) time for some operations
Amortized analysis: count the total time spent by any sequence of operations
Total time is always at most
    worst-case-time-per-operation · #operations
but it can be much better!
E.g., if the times are 1, 1, 1, …, 1, n, 1, …, 1
Can we modify the linked-list data structure so that any sequence of m MAKE-SET, FIND-SET, UNION operations costs less than m · Θ(n) time?

Alternative concatenation

UNION(x, y) could instead:
concatenate the lists containing y and x, and
update the rep pointers for all elements in the list containing x.

[Figures: Sy = (y1, y2, y3) followed by Sx = (x1, x2); after the union, x1 and x2 get new rep pointers aimed at y1 = rep[Sx ∪ Sy].]

Smaller into larger

Concatenate the smaller list onto the end of the larger list (each list stores its weight = # elements)
Cost = Θ(length of the smaller list)
Let n denote the overall number of elements (equivalently, the number of MAKE-SET operations)
Let m denote the total number of operations
Theorem: The cost of all UNIONs is O(n lg n)
Corollary: Total cost is O(m + n lg n)
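A Python sketch of the augmented linked-list structure with the smaller-into-larger heuristic (my own naming, and plain Python lists stand in for the doubly linked lists): each element stores its representative, and UNION relabels only the smaller set's elements.

class DisjointSets:
    def __init__(self):
        self.members = {}            # representative -> list of its set's elements
        self.rep = {}                # element -> representative

    def make_set(self, x):
        self.members[x] = [x]
        self.rep[x] = x

    def find_set(self, x):
        return self.rep[x]           # O(1): just follow the stored pointer

    def union(self, x, y):
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:
            return
        if len(self.members[rx]) < len(self.members[ry]):
            rx, ry = ry, rx          # smaller into larger
        for z in self.members[ry]:   # cost = length of the smaller list
            self.rep[z] = rx
        self.members[rx] += self.members[ry]
        del self.members[ry]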

Total UNION cost is O(n lg n)

Proof:
Monitor an element x and the set Sx containing it
After the initial MAKE-SET(x), weight[Sx] = 1
Consider any time when Sx is merged with a set Sy
If weight[Sy] ≥ weight[Sx]:
    pay 1 to update rep[x]
    weight[Sx] at least doubles (increasing by weight[Sy])
Otherwise:
    pay nothing
    weight[Sx] only increases
Thus:
Each time we pay 1, the weight doubles
The maximum possible weight is n
So we pay at most lg n for x, or O(n log n) overall

Final Result
We have a data structure for dynamic sets which supports:
MAKE-SET: O(1) worst case
FIND-SET: O(1) worst case
UNION:
Any sequence of any m operations* takes O(m log n) time, or:
the amortized complexity of the operations* is O(log n)

* I.e., MAKE-SET, FIND-SET or UNION

Amortized vs Average
What is the difference between average-case complexity and amortized complexity?
Average case assumes a random distribution over the input (e.g., a random sequence of operations)
Amortized means we count the total time taken by any sequence of m operations (and divide it by m)

Can we do better ?
One can do:
MAKE-SET: O(1) worst case
FIND-SET: O(lg n) worst case
WEAKUNION: O(1) worst case
Thus, UNION: O(lg n) worst case

Representing sets as trees

Each set Si = {x1, x2, …, xk} is stored as a tree; rep[Si] is the tree root.
MAKE-SET(x) initializes x as a lone node.
FIND-SET(x) walks up the tree containing x until it reaches the root.
UNION(x, y) concatenates the trees containing x and y.

[Figure: S1 = {x1, x2, x3, x4, x5, x6} as a tree rooted at x1 = rep[S1]; S2 = {x7} as a lone node; UNION(rep[S1], rep[S2]) links the roots, giving rep[S1 ∪ S2].]

Time Analysis
MAKE-SET(x) initializes x as a lone node.  O(1)
FIND-SET(x) walks up the tree containing x until it reaches the root.  O(depth) = ?
WEAKUNION(x, y) concatenates the trees containing x and y.  O(1)

Smaller into Larger in trees

Algorithm: Merge the tree with smaller weight into the tree with larger weight.
The height of a tree increases only when its size doubles, so
the height is logarithmic in the weight.

[Figure: the smaller tree rooted at y1 is linked under the larger tree's root x1.]

Smaller into Larger in trees

Proof:
Monitor the height of an element z
Each time the height of z increases, the weight of its tree doubles
The maximum weight is n
Thus, the height of z is ≤ log n

Tree implementation
We have:
MAKE-SET: O(1) worst case
FIND-SET: O(depth) = O(lg n) worst case
WEAKUNION: O(1) worst case
Can amortized analysis buy us anything ?
Need another trick

Trick 2: Path compression

When we execute a FIND-SET operation and walk up a path p to the root, we know the representative for all the nodes on path p.
Path compression makes all of those nodes direct children of the root.
Cost of FIND-SET(x) is still Θ(depth[x]).

[Figures: FIND-SET(y2) walks up from y2 to the root x1; afterwards every node on the walked path hangs directly off x1.]

The Theorem
Theorem: In general, the amortized cost is O(α(n)), where α(n) grows really, really, really slowly.

Ackermann's function A

Define A_k(j) = j + 1                  if k = 0
                A_{k-1}^{(j+1)}(j)     if k ≥ 1   (iterate A_{k-1}(·) j+1 times)

A_0(j) = j + 1                                A_0(1) = 2
A_1(j) = A_0^{(j+1)}(j) ~ 2j                  A_1(1) = 3
A_2(j) = A_1^{(j+1)}(j) ~ 2^j · j             A_2(1) = 7
A_3(j) > 2^{2^{·^{·^2}}} (a tower of 2s)      A_3(1) = 2047
A_4(j) is a lot bigger.                       A_4(1) > 2^{2^{·^{·^2}}} (a tower of 2048 2s)

Define α(n) = min {k : A_k(1) ≥ n}.
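A direct (and wildly impractical) Python transcription of the recurrence, just to make the definition concrete; only tiny arguments terminate quickly.

def A(k, j):
    if k == 0:
        return j + 1
    x = j
    for _ in range(j + 1):      # iterate A_{k-1}(.) exactly j+1 times
        x = A(k - 1, x)
    return x

# The base values quoted on the slide:
assert A(0, 1) == 2 and A(1, 1) == 3 and A(2, 1) == 7 and A(3, 1) == 2047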

The Theorem
Theorem: In general, the amortized cost is O(α(n)), where α(n) grows really, really, really slowly.
Proof: Really, really, really long (CLRS, p. 509)

Application: Dynamic connectivity

Sets of vertices represent connected components.
Suppose a graph is given to us incrementally by
ADD-VERTEX(v): MAKE-SET(v)
ADD-EDGE(u, v): if not CONNECTED(u, v) then UNION(u, v)
and we want to support connectivity queries:
CONNECTED(u, v): FIND-SET(u) = FIND-SET(v)
Are u and v in the same connected component?
For example, we want to maintain a spanning forest, so we check whether each new edge connects a previously disconnected pair of vertices.
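How the slide's mapping looks in code, reusing the DisjointSets sketch from the linked-list section above (any implementation with the same interface would do):

ds = DisjointSets()

def add_vertex(v):
    ds.make_set(v)

def connected(u, v):
    return ds.find_set(u) == ds.find_set(v)

def add_edge(u, v):
    if not connected(u, v):      # a new edge merges two components
        ds.union(u, v)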

Simple balanced-tree solution

Store each set Si = {x1, x2, …, xk} as a balanced tree (ignoring keys). Define representative element rep[Si] to be the root of the tree.

MAKE-SET(x) initializes x as a lone node.  Θ(1)
FIND-SET(x) walks up the tree containing x until it reaches the root.  Θ(lg n)
UNION(x, y) concatenates the trees containing x and y, changing rep.  Θ(lg n)

[Figure: Si = {x1, x2, x3, x4, x5} as a balanced tree with root rep[Si] = x1.]

Plan of attack
We will build a simple disjoint-union data structure that, in an amortized sense, performs significantly better than Θ(lg n) per op., even better than Θ(lg lg n), Θ(lg lg lg n), etc., but not quite Θ(1).
To reach this goal, we will introduce two key tricks.
Each trick converts a trivial Θ(n) solution into a simple Θ(lg n) amortized solution. Together, the two tricks yield a much better solution.
First trick arises in an augmented linked list.
Second trick arises in a tree structure.

Recall Trick 1 (augmented linked list): each element xj stores a pointer rep[xj] to rep[Si], and UNION(x, y) concatenates the lists containing x and y, updating the rep pointers for all elements in the list containing y.

Analysis of Trick 2 alone

Theorem: The total cost of FIND-SETs is O(m lg n).
Proof: Amortization by a potential function.
The weight of a node x is the # of nodes in its subtree.
Define Φ(x1, …, xn) = Σi lg weight[xi].
UNION(xi, xj) increases the potential of root FIND-SET(xi) by at most lg weight[root FIND-SET(xj)] ≤ lg n.
Each step down p → c made by FIND-SET(xi), except the first, moves c's subtree out of p's subtree.
Thus if weight[c] ≥ ½ weight[p], Φ decreases by ≥ 1, paying for the step down. There can be at most lg n steps p → c for which weight[c] < ½ weight[p].

Analysis of Trick 2 alone

Theorem: If all UNION operations occur before all FIND-SET operations, then the total cost is O(m).
Proof: If a FIND-SET operation traverses a path with k nodes, costing O(k) time, then k − 2 nodes are made new children of the root. This change can happen only once for each of the n elements, so the total cost of FIND-SET is O(f + n), where f is the number of FIND-SET operations; since f, n ≤ m, this is O(m).

UNION(x, y)
Every tree has a rank
Rank is an upper bound for height
When we take UNION(x, y):
If rank[x] > rank[y] then link y to x
If rank[x] < rank[y] then link x to y
If rank[x] = rank[y] then
    link x to y
    rank[y] = rank[y] + 1
Can show that 2^rank(x) ≤ #elements in x's tree (Exercise 21.4-2)
Therefore, the height is O(log n)
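A compact Python sketch of the forest representation with union by rank and path compression, matching the two tricks above (module-level dicts are my own simplification):

parent, rank = {}, {}

def make_set(x):
    parent[x] = x
    rank[x] = 0

def find_set(x):
    if parent[x] != x:
        parent[x] = find_set(parent[x])   # path compression: hang x off the root
    return parent[x]

def union(x, y):
    x, y = find_set(x), find_set(y)
    if x == y:
        return
    if rank[x] > rank[y]:                 # link the lower-rank root under the higher
        x, y = y, x
    parent[x] = y
    if rank[x] == rank[y]:
        rank[y] += 1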

Introduction to Algorithms
6.046J/18.401J

LECTURE 11
Amortized analysis
Dynamic tables
Aggregate method
Accounting method
Potential method
Prof. Charles E. Leiserson

How large should a hash table be?

Goal: Make the table as small as possible, but large enough so that it won't overflow (or otherwise become inefficient).
Problem: What if we don't know the proper size in advance?
Solution: Dynamic tables.
IDEA: Whenever the table overflows, "grow" it by allocating (via malloc or new) a new, larger table. Move all items from the old table into the new one, and free the storage for the old table.
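A minimal Python sketch of this table-doubling idea (illustrative only; the names are mine):

class DynamicTable:
    def __init__(self):
        self.size = 1
        self.num = 0
        self.slots = [None]

    def insert(self, item):
        if self.num == self.size:              # overflow: grow
            bigger = [None] * (2 * self.size)  # allocate a new, larger table
            bigger[:self.num] = self.slots     # move all items from the old table
            self.slots = bigger                # (the old table is then freed)
            self.size *= 2
        self.slots[self.num] = item
        self.num += 1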

Example of a dynamic table

1. INSERT: a table of size 1 holds item 1.
2. INSERT: overflow. Allocate a table of size 2, move item 1, insert item 2.
3. INSERT: overflow. Allocate a table of size 4, move items 1 and 2, insert item 3.
4. INSERT: item 4 fits.
5. INSERT: overflow. Allocate a table of size 8, move items 1 through 4, insert item 5.
6.-7. INSERT: items 6 and 7 fit.

Worst-case analysis
Consider a sequence of n insertions. The worst-case time to execute one insertion is Θ(n). Therefore, the worst-case time for n insertions is n · Θ(n) = Θ(n²).
WRONG! In fact, the worst-case cost for n insertions is only Θ(n) ≪ Θ(n²).
Let's see why.

Tighter analysis

Let ci = the cost of the i th insertion
       = i   if i − 1 is an exact power of 2,
         1   otherwise.

 i      1  2  3  4  5  6  7  8  9  10
 sizei  1  2  4  4  8  8  8  8  16 16
 ci     1  2  3  1  5  1  1  1  9  1
      = 1  1  1  1  1  1  1  1  1  1
      +    1  2     4           8

Tighter analysis (continued)

Cost of n insertions = \sum_{i=1}^{n} c_i
                     \le n + \sum_{j=0}^{\lceil \lg(n-1) \rceil} 2^j
                     \le 3n
                     = \Theta(n).

Thus, the average cost of each dynamic-table operation is Θ(n)/n = Θ(1).

Amortized analysis
An amortized analysis is any strategy for analyzing a sequence of operations to show that the average cost per operation is small, even though a single operation within the sequence might be expensive.
Even though we're taking averages, however, probability is not involved!
An amortized analysis guarantees the average performance of each operation in the worst case.

Types of amortized analyses

Three common amortization arguments:
the aggregate method,
the accounting method,
the potential method.
We've just seen an aggregate analysis.
The aggregate method, though simple, lacks the precision of the other two methods. In particular, the accounting and potential methods allow a specific amortized cost to be allocated to each operation.

Accounting method
Charge the i th operation a fictitious amortized cost ĉi, where $1 pays for 1 unit of work (i.e., time).
This fee is consumed to perform the operation.
Any amount not immediately consumed is stored in the bank for use by subsequent operations.
The bank balance must not go negative! We must ensure that
    \sum_{i=1}^{n} c_i \le \sum_{i=1}^{n} \hat{c}_i
for all n.
Thus, the total amortized costs provide an upper bound on the total true costs.

Accounting analysis of dynamic tables

Charge an amortized cost of ĉi = $3 for the i th insertion.
$1 pays for the immediate insertion.
$2 is stored for later table doubling.
When the table doubles, $1 pays to move a recent item, and $1 pays to move an old item.

[Figures: a table of size 8 whose four newest items carry $2 of credit each; on overflow, the stored credit pays for moving all eight items into the new table of size 16, whose items start over at $0 and accumulate $2 each as new insertions arrive.]

Accounting analysis (continued)
Key invariant: Bank balance never drops below 0.
Thus, the sum of the amortized costs provides an upper bound on the sum of the true costs.

 i      1  2  3  4  5  6  7  8  9  10
 sizei  1  2  4  4  8  8  8  8  16 16
 ci     1  2  3  1  5  1  1  1  9  1
 ĉi     2* 3  3  3  3  3  3  3  3  3
 banki  1  2  2  4  2  4  6  8  2  4

*Okay, so I lied. The first operation costs only $2, not $3.

Potential method
IDEA: View the bank account as the potential energy (à la physics) of the dynamic set.
Framework:
Start with an initial data structure D0.
Operation i transforms Di−1 to Di.
The cost of operation i is ci.
Define a potential function Φ : {Di} → R, such that Φ(D0) = 0 and Φ(Di) ≥ 0 for all i.
The amortized cost ĉi with respect to Φ is defined to be ĉi = ci + Φ(Di) − Φ(Di−1).

Understanding potentials
ĉi = ci + Φ(Di) − Φ(Di−1), where Φ(Di) − Φ(Di−1) is the potential difference ΔΦi.

If ΔΦi > 0, then ĉi > ci. Operation i stores work in the data structure for later use.
If ΔΦi < 0, then ĉi < ci. The data structure delivers up stored work to help pay for operation i.

The amortized costs bound the true costs

The total amortized cost of n operations is
\sum_{i=1}^{n} \hat{c}_i = \sum_{i=1}^{n} \left( c_i + \Phi(D_i) - \Phi(D_{i-1}) \right)
                         = \sum_{i=1}^{n} c_i + \Phi(D_n) - \Phi(D_0)   (the series telescopes)
                         \ge \sum_{i=1}^{n} c_i                          since \Phi(D_n) \ge 0 and \Phi(D_0) = 0.

Potential analysis of table doubling

Define the potential of the table after the i th insertion by Φ(Di) = 2i − 2^⌈lg i⌉. (Assume that 2^⌈lg 0⌉ = 0.)
Note:
Φ(D0) = 0,
Φ(Di) ≥ 0 for all i.
Example: with i = 6 items in a table of size 8, Φ = 2·6 − 2³ = 4
(= the $4 of credit in the bank under the accounting method)

Calculation of amortized costs

The amortized cost of the i th insertion is
ĉi = ci + Φ(Di) − Φ(Di−1)

   = i + (2i − 2^⌈lg i⌉) − (2(i−1) − 2^⌈lg (i−1)⌉)   if i − 1 is an exact power of 2,
     1 + (2i − 2^⌈lg i⌉) − (2(i−1) − 2^⌈lg (i−1)⌉)   otherwise.

Calculation (Case 1)
Case 1: i − 1 is an exact power of 2.

ĉi = i + (2i − 2^⌈lg i⌉) − (2(i−1) − 2^⌈lg (i−1)⌉)
   = i + 2 − (2^⌈lg i⌉ − 2^⌈lg (i−1)⌉)
   = i + 2 − (2(i − 1) − (i − 1))
   = i + 2 − 2i + 2 + i − 1
   = 3

Calculation (Case 2)
Case 2: i − 1 is not an exact power of 2.

ĉi = 1 + (2i − 2^⌈lg i⌉) − (2(i−1) − 2^⌈lg (i−1)⌉)
   = 1 + 2 − (2^⌈lg i⌉ − 2^⌈lg (i−1)⌉)
   = 3   (since ⌈lg i⌉ = ⌈lg (i−1)⌉ in this case)

Therefore, n insertions cost Θ(n) in the worst case.
Exercise: Fix the bug in this analysis to show that the amortized cost of the first insertion is only 2.

Conclusions
Amortized costs can provide a clean abstraction
of data-structure performance.
Any of the analysis methods can be used when
an amortized analysis is called for, but each
method has some situations where it is arguably
the simplest.
Different schemes may work for assigning
amortized costs in the accounting method, or
potentials in the potential method, sometimes
yielding radically different bounds.

Introduction to Algorithms
6.046J/18.401J

LECTURE 12
Dynamic programming
Longest common
subsequence
Optimal substructure
Overlapping subproblems

Prof. Charles E. Leiserson

Dynamic programming
Design technique, like divide-and-conquer.
Example: Longest Common Subsequence (LCS)
Given two sequences x[1 . . m] and y[1 . . n], find a longest subsequence common to them both.
("a", not "the": it need not be unique)

x: A B C B D A B
y: B D C A B A
BCBA = LCS(x, y)
(functional notation, but not a function)

Brute-force LCS algorithm

Check every subsequence of x[1 . . m] to see if it is also a subsequence of y[1 . . n].

Analysis
Checking = O(n) time per subsequence.
2^m subsequences of x (each bit-vector of length m determines a distinct subsequence of x).
Worst-case running time = O(n 2^m) = exponential time.

Towards a better algorithm

Simplification:
1. Look at the length of a longest-common subsequence.
2. Extend the algorithm to find the LCS itself.

Notation: Denote the length of a sequence s by | s |.

Strategy: Consider prefixes of x and y.
Define c[i, j] = | LCS(x[1 . . i], y[1 . . j]) |.
Then, c[m, n] = | LCS(x, y) |.

Recursive formulation

Theorem.
c[i, j] = c[i−1, j−1] + 1             if x[i] = y[j],
          max{c[i−1, j], c[i, j−1]}   otherwise.

Proof. Case x[i] = y[j]:

[Figure: prefixes x[1 . . i] and y[1 . . j] ending in the same symbol.]

Let z[1 . . k] = LCS(x[1 . . i], y[1 . . j]), where c[i, j] = k. Then, z[k] = x[i], or else z could be extended. Thus, z[1 . . k−1] is a CS of x[1 . . i−1] and y[1 . . j−1].

Proof (continued)
Claim: z[1 . . k−1] = LCS(x[1 . . i−1], y[1 . . j−1]).
Suppose w is a longer CS of x[1 . . i−1] and y[1 . . j−1], that is, | w | > k−1. Then, cut and paste: w || z[k] (w concatenated with z[k]) is a common subsequence of x[1 . . i] and y[1 . . j] with | w || z[k] | > k. Contradiction, proving the claim.
Thus, c[i−1, j−1] = k−1, which implies that c[i, j] = c[i−1, j−1] + 1.
Other cases are similar.

Dynamic-programming hallmark #1

Optimal substructure
An optimal solution to a problem (instance) contains optimal solutions to subproblems.

If z = LCS(x, y), then any prefix of z is an LCS of a prefix of x and a prefix of y.

Recursive algorithm for LCS

LCS(x, y, i, j)
    if x[i] = y[j]
        then c[i, j] ← LCS(x, y, i−1, j−1) + 1
        else c[i, j] ← max{ LCS(x, y, i−1, j), LCS(x, y, i, j−1) }

Worst case: x[i] ≠ y[j], in which case the algorithm evaluates two subproblems, each with only one parameter decremented.

Recursion tree
m = 3, n = 4:

[Figure: the call tree rooted at (3,4), branching to (2,4) and (3,3), then to (1,4), (2,3), (2,3), (3,2), and so on; the subproblem (2,3), among others, appears multiple times.]

Height = m + n ⇒ work potentially exponential, but we're solving subproblems already solved!

Dynamic-programming hallmark #2

Overlapping subproblems
A recursive solution contains a "small" number of distinct subproblems repeated many times.

The number of distinct LCS subproblems for two strings of lengths m and n is only mn.

Memoization algorithm
Memoization: After computing a solution to a subproblem, store it in a table. Subsequent calls check the table to avoid redoing work.

LCS(x, y, i, j)
    if c[i, j] = NIL
        then if x[i] = y[j]
            then c[i, j] ← LCS(x, y, i−1, j−1) + 1      ⊳ same as before
            else c[i, j] ← max{ LCS(x, y, i−1, j), LCS(x, y, i, j−1) }

Time = Θ(mn) = constant work per table entry. Space = Θ(mn).
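A runnable Python version of the memoized recurrence (0-indexed strings, with functools standing in for the NIL-initialized table):

from functools import lru_cache

def lcs_length(x, y):
    @lru_cache(maxsize=None)                  # the memo table
    def c(i, j):
        if i == 0 or j == 0:
            return 0
        if x[i - 1] == y[j - 1]:
            return c(i - 1, j - 1) + 1
        return max(c(i - 1, j), c(i, j - 1))
    return c(len(x), len(y))

assert lcs_length("ABCBDAB", "BDCABA") == 4   # |BCBA|, the slides' example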

Dynamic-programming algorithm

IDEA: Compute the table bottom-up.
Time = Θ(mn).
Reconstruct the LCS by tracing backwards.
Space = Θ(mn).
Exercise: O(min{m, n}).

       A  B  C  B  D  A  B
    0  0  0  0  0  0  0  0
 B  0  0  1  1  1  1  1  1
 D  0  0  1  1  1  2  2  2
 C  0  0  1  2  2  2  2  2
 A  0  1  1  2  2  2  3  3
 B  0  1  2  2  3  3  3  4
 A  0  1  2  2  3  3  4  4

(Tracing backwards from the bottom-right corner spells out the LCS B C B A.)
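A bottom-up Python sketch that fills the c table and then reconstructs one LCS by tracing backwards, as described above:

def lcs(x, y):
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):                     # fill the table bottom-up
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    out, i, j = [], m, n                          # trace backwards from c[m][n]
    while i and j:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1]); i -= 1; j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

print(lcs("ABCBDAB", "BDCABA"))                   # prints "BCBA"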

Introduction to Algorithms
6.046J/18.401J
LECTURE 13
Graph algorithms
Graph representation
Minimum spanning trees
Greedy algorithms
Optimal substructure
Greedy choice
Prims greedy MST
algorithm
Prof. Charles E. Leiserson

Graphs (review)
Definition. A directed graph (digraph) G = (V, E) is an ordered pair consisting of
a set V of vertices (singular: vertex),
a set E ⊆ V × V of edges.
In an undirected graph G = (V, E), the edge set E consists of unordered pairs of vertices.
In either case, we have | E | = O(V²). Moreover, if G is connected, then | E | ≥ | V | − 1, which implies that lg | E | = Θ(lg V).
(Review CLRS, Appendix B.)

Adjacency-matrix representation
The adjacency matrix of a graph G = (V, E), where V = {1, 2, …, n}, is the matrix A[1 . . n, 1 . . n] given by
A[i, j] = 1 if (i, j) ∈ E,
          0 if (i, j) ∉ E.

Example (digraph with edges 1→2, 1→3, 2→3, 4→3):

A  1 2 3 4
1  0 1 1 0
2  0 0 1 0
3  0 0 0 0
4  0 0 1 0

Θ(V²) storage: a dense representation.

Adjacency-list representation
An adjacency list of a vertex v ∈ V is the list Adj[v] of vertices adjacent to v.

For the same digraph:
Adj[1] = {2, 3}
Adj[2] = {3}
Adj[3] = {}
Adj[4] = {3}

For undirected graphs, | Adj[v] | = degree(v).
For digraphs, | Adj[v] | = out-degree(v).

Handshaking Lemma: Σv∈V degree(v) = 2 | E | for undirected graphs ⇒ adjacency lists use Θ(V + E) storage: a sparse representation (for either type of graph).
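A small Python sketch of both representations for the 4-vertex digraph in the figures:

V = [1, 2, 3, 4]
E = [(1, 2), (1, 3), (2, 3), (4, 3)]

A = [[0] * len(V) for _ in V]              # adjacency matrix: Theta(V^2) storage
for i, j in E:
    A[i - 1][j - 1] = 1

adj = {v: [] for v in V}                   # adjacency lists: Theta(V + E) storage
for i, j in E:
    adj[i].append(j)

assert adj[1] == [2, 3] and adj[3] == []   # matches Adj[1] and Adj[3] on the slide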

Minimum spanning trees

Input: A connected, undirected graph G = (V, E) with weight function w : E → R.
For simplicity, assume that all edge weights are distinct. (CLRS covers the general case.)

Output: A spanning tree T (a tree that connects all vertices) of minimum weight:
w(T) = \sum_{(u,v) \in T} w(u, v).

Example of MST

[Figure: a graph with edge weights 3, 5, 6, 8, 9, 10, 12, 14, 15; the MST edges are highlighted.]

Optimal substructure
MST T: (Other edges of G are not shown.)

[Figure: tree T; removing an edge (u, v) ∈ T partitions it into subtrees T1 and T2.]

Remove any edge (u, v) ∈ T. Then, T is partitioned into two subtrees T1 and T2.
Theorem. The subtree T1 is an MST of G1 = (V1, E1), the subgraph of G induced by the vertices of T1:
V1 = vertices of T1,
E1 = { (x, y) ∈ E : x, y ∈ V1 }.
Similarly for T2.

Proof of optimal substructure
Proof. Cut and paste:
w(T) = w(u, v) + w(T1) + w(T2).
If T1′ were a lower-weight spanning tree than T1 for G1, then T′ = {(u, v)} ∪ T1′ ∪ T2 would be a lower-weight spanning tree than T for G.
Do we also have overlapping subproblems?
Yes.
Great, then dynamic programming may work!
Yes, but MST exhibits another powerful property which leads to an even more efficient algorithm.

Hallmark for greedy algorithms

Greedy-choice property
A locally optimal choice is globally optimal.

Theorem. Let T be the MST of G = (V, E), and let A ⊆ V. Suppose that (u, v) ∈ E is the least-weight edge connecting A to V − A. Then, (u, v) ∈ T.

Proof of theorem
Proof. Suppose (u, v) ∉ T. Cut and paste.

[Figure: tree T with the cut (A, V − A); (u, v) is the least-weight edge connecting A to V − A.]

Consider the unique simple path from u to v in T.
Swap (u, v) with the first edge on this path that connects a vertex in A to a vertex in V − A.
A lighter-weight spanning tree than T results, contradicting the minimality of T.

Prim's algorithm
IDEA: Maintain V − A as a priority queue Q. Key each vertex in Q with the weight of the least-weight edge connecting it to a vertex in A.

Q ← V
key[v] ← ∞ for all v ∈ V
key[s] ← 0 for some arbitrary s ∈ V
while Q ≠ ∅
    do u ← EXTRACT-MIN(Q)
       for each v ∈ Adj[u]
           do if v ∈ Q and w(u, v) < key[v]
                 then key[v] ← w(u, v)    ⊳ DECREASE-KEY
                      π[v] ← u

At the end, {(v, π[v])} forms the MST.
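A Python sketch of Prim's algorithm using heapq as the priority queue. heapq has no DECREASE-KEY, so this version pushes updated keys and skips stale entries ("lazy deletion"); the asymptotics match the binary-heap row of the analysis below, O(E lg V).

import heapq

def prim_mst(adj, s):
    """adj: {u: [(v, w), ...]} undirected weighted graph; s: start vertex.
    Returns pi, mapping each vertex to its MST parent."""
    key = {v: float("inf") for v in adj}
    pi = {v: None for v in adj}
    key[s] = 0
    in_tree = set()
    q = [(0, s)]
    while q:
        k, u = heapq.heappop(q)
        if u in in_tree:
            continue                            # stale heap entry: skip
        in_tree.add(u)
        for v, w in adj[u]:
            if v not in in_tree and w < key[v]:
                key[v] = w
                pi[v] = u
                heapq.heappush(q, (w, v))       # stands in for DECREASE-KEY
    return pi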

Example of Prim's algorithm

[Figures: fourteen snapshots of Prim's algorithm on an example graph with edge weights 3, 5, 6, 7, 8, 9, 10, 12, 14, 15. Starting from key[s] = 0, each step extracts the minimum-key vertex, moves it from V − A into A, and lowers the keys of its neighbors still in Q; the π pointers traced out form the MST.]

Analysis of Prim

Q ← V                                        ⊳ Θ(V) total for initialization
key[v] ← ∞ for all v ∈ V
key[s] ← 0 for some arbitrary s ∈ V
while Q ≠ ∅                                  ⊳ |V| iterations
    do u ← EXTRACT-MIN(Q)
       for each v ∈ Adj[u]                   ⊳ degree(u) iterations
           do if v ∈ Q and w(u, v) < key[v]
                 then key[v] ← w(u, v)
                      π[v] ← u

Handshaking Lemma ⇒ Θ(E) implicit DECREASE-KEYs.

Time = Θ(V)·T_EXTRACT-MIN + Θ(E)·T_DECREASE-KEY

Analysis of Prim (continued)

Time = Θ(V)·T_EXTRACT-MIN + Θ(E)·T_DECREASE-KEY

Q               T_EXTRACT-MIN      T_DECREASE-KEY    Total
array           O(V)               O(1)              O(V²)
binary heap     O(lg V)            O(lg V)           O(E lg V)
Fibonacci heap  O(lg V) amortized  O(1) amortized    O(E + V lg V) worst case

MST algorithms
Kruskal's algorithm (see CLRS):
Uses the disjoint-set data structure (Lecture 10).
Running time = O(E lg V).
Best to date:
Karger, Klein, and Tarjan [1993].
Randomized algorithm.
O(V + E) expected time.

Introduction to Algorithms
6.046J/18.401J

LECTURE 14
Shortest Paths I
Properties of shortest paths
Dijkstras algorithm
Correctness
Analysis
Breadth-first search
Prof. Charles E. Leiserson

Paths in graphs
Consider a digraph G = (V, E) with edge-weight function w : E → R. The weight of path p = v1 → v2 → ⋯ → vk is defined to be
w(p) = \sum_{i=1}^{k-1} w(v_i, v_{i+1}).

Example:
[Figure: a path v1 → v2 → v3 → v4 → v5 whose edge weights sum to w(p) = −2.]

Shortest paths
A shortest path from u to v is a path of minimum weight from u to v. The shortest-path weight from u to v is defined as
δ(u, v) = min{ w(p) : p is a path from u to v }.
Note: δ(u, v) = ∞ if no path from u to v exists.

Optimal substructure
Theorem. A subpath of a shortest path is a shortest path.

Proof. Cut and paste:
[Figure: if a subpath had a lighter alternative, splicing it into the whole path would yield a lighter overall path.]

Triangle inequality
Theorem. For all u, v, x ∈ V, we have
δ(u, v) ≤ δ(u, x) + δ(x, v).

Proof.
[Figure: the shortest u-to-v path weighs no more than the shortest u-to-x and x-to-v paths combined, since their concatenation is one particular u-to-v path.]
L14.9

Well-definedness of shortest
paths
If a graph G contains a negative-weight cycle,
then some shortest paths may not exist.

20014 by Charles E. Leiserson

Introduction to Algorithms

November 1, 2004

L14.10

Well-definedness of shortest
paths
If a graph G contains a negative-weight cycle,
then some shortest paths may not exist.
Example:

<0

uu
20014 by Charles E. Leiserson

vv
Introduction to Algorithms

November 1, 2004

L14.11

Single-source shortest paths

Problem. From a given source vertex s ∈ V, find the shortest-path weights δ(s, v) for all v ∈ V.
If all edge weights w(u, v) are nonnegative, all shortest-path weights must exist.
IDEA: Greedy.
1. Maintain a set S of vertices whose shortest-path distances from s are known.
2. At each step add to S the vertex v ∈ V − S whose distance estimate from s is minimal.
3. Update the distance estimates of vertices adjacent to v.

Dijkstra's algorithm

d[s] ← 0
for each v ∈ V − {s}
    do d[v] ← ∞
S ← ∅
Q ← V    ⊳ Q is a priority queue maintaining V − S
while Q ≠ ∅
    do u ← EXTRACT-MIN(Q)
       S ← S ∪ {u}
       for each v ∈ Adj[u]
           do if d[v] > d[u] + w(u, v)
                 then d[v] ← d[u] + w(u, v)    ⊳ relaxation step; implicit DECREASE-KEY
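A Python sketch of this pseudocode with a heapq-based priority queue, again emulating DECREASE-KEY by lazy deletion of stale entries:

import heapq

def dijkstra(adj, s):
    """adj: {u: [(v, w), ...]} with nonnegative weights; returns d, the
    shortest-path estimates from s (equal to delta(s, v) on termination)."""
    d = {v: float("inf") for v in adj}
    d[s] = 0
    done = set()                           # the set S of finished vertices
    q = [(0, s)]
    while q:
        du, u = heapq.heappop(q)
        if u in done:
            continue                       # stale heap entry: skip
        done.add(u)
        for v, w in adj[u]:
            if d[v] > du + w:              # relaxation step
                d[v] = du + w
                heapq.heappush(q, (d[v], v))
    return d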

Example of Dijkstra's algorithm

Graph with nonnegative edge weights:
[Figure: vertices A, B, C, D, E joined by edges with weights 10, 3, 1, 4, 2, 8, 2, 7, 9.]

Initialize: d = (0, ∞, ∞, ∞, ∞) for (A, B, C, D, E); Q = ⟨A, B, C, D, E⟩; S = { }.
A ← EXTRACT-MIN(Q); S = { A }. Relax all edges leaving A: d[B] = 10, d[C] = 3.
C ← EXTRACT-MIN(Q); S = { A, C }. Relax all edges leaving C: d[B] = 7, d[D] = 11, d[E] = 5.
E ← EXTRACT-MIN(Q); S = { A, C, E }. Relax all edges leaving E: no estimate improves.
B ← EXTRACT-MIN(Q); S = { A, C, E, B }. Relax all edges leaving B: d[D] = 9.
D ← EXTRACT-MIN(Q); S = { A, C, E, B, D }. Done: d = (0, 7, 3, 9, 5).

Correctness, Part I
Lemma. Initializing d[s] ← 0 and d[v] ← ∞ for all v ∈ V − {s} establishes d[v] ≥ δ(s, v) for all v ∈ V, and this invariant is maintained over any sequence of relaxation steps.

Proof. Suppose not. Let v be the first vertex for which d[v] < δ(s, v), and let u be the vertex that caused d[v] to change: d[v] = d[u] + w(u, v). Then,
d[v] < δ(s, v)              (supposition)
     ≤ δ(s, u) + δ(u, v)    (triangle inequality)
     ≤ δ(s, u) + w(u, v)    (shortest path ≤ specific path)
     ≤ d[u] + w(u, v)       (v is the first violation)
contradicting d[v] = d[u] + w(u, v). Contradiction.

Correctness, Part II
Lemma. Let u be v's predecessor on a shortest path from s to v. Then, if d[u] = δ(s, u) and edge (u, v) is relaxed, we have d[v] = δ(s, v) after the relaxation.

Proof. Observe that δ(s, v) = δ(s, u) + w(u, v). Suppose that d[v] > δ(s, v) before the relaxation. (Otherwise, we're done.) Then, the test d[v] > d[u] + w(u, v) succeeds, because d[v] > δ(s, v) = δ(s, u) + w(u, v) = d[u] + w(u, v), and the algorithm sets d[v] = d[u] + w(u, v) = δ(s, v).

Correctness, Part III
Theorem. Dijkstra's algorithm terminates with d[v] = δ(s, v) for all v ∈ V.

Proof. It suffices to show that d[v] = δ(s, v) for every v ∈ V when v is added to S. Suppose u is the first vertex added to S for which d[u] > δ(s, u). Let y be the first vertex in V − S along a shortest path from s to u, and let x be its predecessor:
[Figure: the set S, just before adding u, contains s and x; the shortest path from s to u leaves S along edge (x, y) and continues on to u.]
Since u is the first vertex violating the claimed invariant, we have d[x] = δ(s, x). When x was added to S, the edge (x, y) was relaxed, which implies that d[y] = δ(s, y) ≤ δ(s, u) < d[u]. But d[u] ≤ d[y] by our choice of u. Contradiction.

Analysis of Dijkstra
while Q ≠ ∅                                  ⊳ executed |V| times
    do u ← EXTRACT-MIN(Q)
       S ← S ∪ {u}
       for each v ∈ Adj[u]                   ⊳ degree(u) times
           do if d[v] > d[u] + w(u, v)
                 then d[v] ← d[u] + w(u, v)

Handshaking Lemma ⇒ Θ(E) implicit DECREASE-KEYs.
Time = Θ(V·T_EXTRACT-MIN + E·T_DECREASE-KEY)
Note: Same formula as in the analysis of Prim's minimum-spanning-tree algorithm.

Analysis of Dijkstra (continued)
Time = Θ(V)·T_EXTRACT-MIN + Θ(E)·T_DECREASE-KEY

Q               T_EXTRACT-MIN       T_DECREASE-KEY     Total
array           O(V)                O(1)               O(V²)
binary heap     O(lg V)             O(lg V)            O(E lg V)
Fibonacci heap  O(lg V) amortized   O(1) amortized     O(E + V lg V) worst case

Unweighted graphs
Suppose that w(u, v) = 1 for all (u, v) ∈ E. Can Dijkstra's algorithm be improved?
Use a simple FIFO queue instead of a priority queue.

Breadth-first search
while Q ≠ ∅
    do u ← DEQUEUE(Q)
       for each v ∈ Adj[u]
           do if d[v] = ∞
                 then d[v] ← d[u] + 1
                      ENQUEUE(Q, v)

Analysis: Time = O(V + E). (A Python sketch follows.)
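A corresponding runnable sketch of BFS distances, under the same assumed adjacency-dictionary representation as the Dijkstra sketch above (here each vertex maps to a plain list of neighbors, since all weights are 1):

from collections import deque

def bfs_distances(adj, s):
    """Unweighted single-source shortest paths via breadth-first search.

    adj: dict mapping each vertex to a list of neighbors.
    Returns dict d with d[v] = number of edges on a shortest s-to-v path.
    """
    d = {v: float('inf') for v in adj}
    d[s] = 0
    Q = deque([s])              # a simple FIFO queue replaces the priority queue
    while Q:
        u = Q.popleft()         # DEQUEUE
        for v in adj[u]:
            if d[v] == float('inf'):
                d[v] = d[u] + 1
                Q.append(v)     # ENQUEUE
    return d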

Example of breadth-first search
[Figure: a graph on vertices a–i. The slides step through the run with source a, recording each vertex's distance as it is enqueued: a at distance 0; b and d at distance 1; c and e at distance 2; g and i at distance 3; f and h at distance 4. The final queue history is Q: a b d c e g i f h, with d-values 0, 1, 1, 2, 2, 3, 3, 4, 4.]

Correctness of BFS
while Q ≠ ∅
    do u ← DEQUEUE(Q)
       for each v ∈ Adj[u]
           do if d[v] = ∞
                 then d[v] ← d[u] + 1
                      ENQUEUE(Q, v)

Key idea: The FIFO Q in breadth-first search mimics the priority queue Q in Dijkstra.
Invariant: v comes after u in Q implies that d[v] = d[u] or d[v] = d[u] + 1.

Introduction to Algorithms
6.046J/18.401J

LECTURE 15
Shortest Paths II
Bellman-Ford algorithm
DAG shortest paths
Linear programming and difference constraints
VLSI layout compaction
Prof. Charles E. Leiserson

Negative-weight cycles
Recall: If a graph G = (V, E) contains a negative-weight cycle, then some shortest paths may not exist.
Example:
[Figure: a path from u to v through a cycle of weight < 0.]
Bellman-Ford algorithm: Finds all shortest-path lengths from a source s ∈ V to all v ∈ V, or determines that a negative-weight cycle exists.

Bellman-Ford algorithm
d[s] ← 0
for each v ∈ V − {s}                       ⊳ initialization
    do d[v] ← ∞
for i ← 1 to |V| − 1
    do for each edge (u, v) ∈ E
           do if d[v] > d[u] + w(u, v)     ⊳ relaxation step
                 then d[v] ← d[u] + w(u, v)
for each edge (u, v) ∈ E
    do if d[v] > d[u] + w(u, v)
          then report that a negative-weight cycle exists

At the end, d[v] = δ(s, v), if no negative-weight cycles exist.
Time = O(VE). (A runnable sketch follows.)
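A minimal runnable sketch, assuming the graph is given as a list of (u, v, w) edge triples; the names bellman_ford, vertices, and edges are illustrative conventions, not from the slides.

def bellman_ford(vertices, edges, s):
    """Bellman-Ford: shortest paths with possibly negative edge weights.

    vertices: iterable of vertices; edges: list of (u, v, w) triples.
    Returns dict d, or raises ValueError if a negative-weight cycle
    reachable from s exists.
    """
    d = {v: float('inf') for v in vertices}
    d[s] = 0
    for _ in range(len(d) - 1):          # |V| - 1 passes over all edges
        for u, v, w in edges:
            if d[u] + w < d[v]:          # relaxation step
                d[v] = d[u] + w
    for u, v, w in edges:                # one more pass detects cycles
        if d[u] + w < d[v]:
            raise ValueError("negative-weight cycle reachable from source")
    return d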

Example of Bellman-Ford
[Figure: a five-vertex digraph on A, B, C, D, E that includes negative-weight edges. The slides fix an order of edge relaxation, initialize d[A] = 0 and all other d-values to ∞, and then step through the passes: the d-values improve during passes 1 and 2, and "End of pass 2 (and 3 and 4)" records that the later passes change nothing, so the values have converged. (The exact edge weights and intermediate d-values are not recoverable from the extracted slides.)]

Correctness
Theorem. If G = (V, E) contains no negative-weight cycles, then after the Bellman-Ford algorithm executes, d[v] = δ(s, v) for all v ∈ V.

Proof. Let v ∈ V be any vertex, and consider a shortest path p from s to v with the minimum number of edges:
p: v0 → v1 → v2 → ⋯ → vk, where v0 = s and vk = v.
Since p is a shortest path, we have
δ(s, vi) = δ(s, vi−1) + w(vi−1, vi).

Correctness (continued)
Initially, d[v0] = 0 = δ(s, v0), and d[v0] is unchanged by subsequent relaxations (because of the lemma from Lecture 14 that d[v] ≥ δ(s, v)).
After 1 pass through E, we have d[v1] = δ(s, v1).
After 2 passes through E, we have d[v2] = δ(s, v2).
⋮
After k passes through E, we have d[vk] = δ(s, vk).
Since G contains no negative-weight cycles, p is simple. The longest simple path has at most |V| − 1 edges, so |V| − 1 passes suffice.

Detection of negative-weight cycles
Corollary. If a value d[v] fails to converge after |V| − 1 passes, there exists a negative-weight cycle in G reachable from s.

Linear programming
Let A be an m×n matrix, b be an m-vector, and c be an n-vector. Find an n-vector x that maximizes cᵀx subject to Ax ≤ b, or determine that no such solution exists.
[Figure: the m×n matrix A multiplying the n-vector x, compared componentwise (≤) with the m-vector b, while maximizing the objective cᵀx.]

Linear-programming algorithms
Algorithms for the general problem:
Simplex methods: practical, but worst-case exponential time.
Interior-point methods: polynomial time, and competitive with simplex in practice.

Feasibility problem: No optimization criterion. Just find x such that Ax ≤ b.
In general, just as hard as ordinary LP.

Solving a system of difference constraints
Linear programming where each row of A contains exactly one 1, one −1, and the rest 0's. Each constraint has the form xj − xi ≤ wij.
Example:            Solution:
x1 − x2 ≤ 3         x1 = 3
x2 − x3 ≤ −2        x2 = 0
x1 − x3 ≤ 2         x3 = 2
Constraint graph: each constraint xj − xi ≤ wij becomes an edge vi → vj of weight wij. (The A matrix has dimensions |E| × |V|.)

Unsatisfiable constraints
Theorem. If the constraint graph contains a negative-weight cycle, then the system of differences is unsatisfiable.

Proof. Suppose that the negative-weight cycle is v1 → v2 → ⋯ → vk → v1. Then, we have
x2 − x1 ≤ w12
x3 − x2 ≤ w23
⋮
xk − xk−1 ≤ wk−1,k
x1 − xk ≤ wk1
Summing, the left-hand sides telescope to 0, while the right-hand sides sum to the weight of the cycle, which is < 0. Therefore, no values for the xi can satisfy the constraints.

Satisfying the constraints
Theorem. Suppose no negative-weight cycle exists in the constraint graph. Then, the constraints are satisfiable.

Proof. Add a new vertex s to V with a 0-weight edge to each vertex vi ∈ V.
[Figure: the constraint graph on v1, v3, v4, v7, v9, …, with a 0-weight edge from the new source s to every vertex.]
Note: No negative-weight cycles are introduced, so shortest paths from s exist.

Proof (continued)
Claim: The assignment xi = δ(s, vi) solves the constraints.
Consider any constraint xj − xi ≤ wij, and consider the shortest paths from s to vj and vi:
[Figure: shortest paths of weight δ(s, vi) and δ(s, vj) from s, with the edge vi → vj of weight wij.]
The triangle inequality gives us δ(s, vj) ≤ δ(s, vi) + wij. Since xi = δ(s, vi) and xj = δ(s, vj), the constraint xj − xi ≤ wij is satisfied.

Bellman-Ford and linear programming
Corollary. The Bellman-Ford algorithm can solve a system of m difference constraints on n variables in O(mn) time.
Single-source shortest paths is a simple LP problem.
In fact, Bellman-Ford maximizes x1 + x2 + ⋯ + xn subject to the constraints xj − xi ≤ wij and xi ≤ 0 (exercise).
Bellman-Ford also minimizes maxi{xi} − mini{xi} (exercise).
(A sketch of the reduction in code appears below.)
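The corollary translates directly into code. The sketch below builds the constraint graph with the extra 0-weight source and reuses the bellman_ford sketch from earlier; the encoding of constraints as (j, i, w) triples meaning xj − xi ≤ w is an assumption made for illustration.

def solve_difference_constraints(n, constraints):
    """Solve constraints x_j - x_i <= w via Bellman-Ford, as in the corollary.

    n: number of variables x_1..x_n (vertices 1..n).
    constraints: list of (j, i, w) triples meaning x_j - x_i <= w.
    Returns a satisfying assignment [x_1, ..., x_n], or None if unsatisfiable.
    """
    # Constraint graph: edge v_i -> v_j of weight w for each constraint,
    # plus 0-weight edges from a new source vertex 0 to every v_i.
    edges = [(i, j, w) for (j, i, w) in constraints]
    edges += [(0, v, 0) for v in range(1, n + 1)]
    try:
        d = bellman_ford(range(n + 1), edges, 0)   # sketch defined earlier
    except ValueError:                             # negative-weight cycle
        return None
    return [d[v] for v in range(1, n + 1)]

For the example above, solve_difference_constraints(3, [(1, 2, 3), (2, 3, -2), (1, 3, 2)]) returns the assignment x = (0, −2, 0), which also satisfies all three constraints (a solution may differ from the one on the slide by a uniform shift).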

Application to VLSI layout compaction
[Figure: two integrated-circuit features at x-positions x1 and x2, the first of width d1, which must remain at least the minimum separation λ apart.]
Problem: Compact (in one dimension) the space between the features of a VLSI layout without bringing any features too close together.
Constraint: x2 − x1 ≥ d1 + λ, i.e., x1 − x2 ≤ −(d1 + λ), a difference constraint.
Bellman-Ford minimizes maxi{xi} − mini{xi}, which compacts the layout in the x-dimension.

Introduction to Algorithms
6.046J/18.401J

LECTURE 16
Shortest Paths III
All-pairs shortest paths
Matrix-multiplication algorithm
Floyd-Warshall algorithm
Johnson's algorithm
Prof. Charles E. Leiserson

Shortest paths
Single-source shortest paths
Nonnegative edge weights: Dijkstra's algorithm, O(E + V lg V).
General: Bellman-Ford algorithm, O(VE).
DAG: One pass of Bellman-Ford, O(V + E).

All-pairs shortest paths
Nonnegative edge weights: Dijkstra's algorithm |V| times, O(VE + V² lg V).
General: Three algorithms today.

All-pairs shortest paths
Input: Digraph G = (V, E), where V = {1, 2, …, n}, with edge-weight function w : E → ℝ.
Output: n×n matrix of shortest-path lengths δ(i, j) for all i, j ∈ V.

IDEA: Run Bellman-Ford once from each vertex.
Time = O(V²E).
Dense graph (n² edges) ⇒ Θ(n⁴) time in the worst case.
Good first try!

Dynamic programming
Consider the n×n adjacency matrix A = (aij) of the digraph, and define
dij(m) = weight of a shortest path from i to j that uses at most m edges.
Claim: We have
dij(0) = 0 if i = j, and dij(0) = ∞ if i ≠ j;
and for m = 1, 2, …, n − 1,
dij(m) = mink {dik(m−1) + akj}.

Proof of claim
dij(m) = mink {dik(m−1) + akj}
[Figure: all ways of decomposing a path of at most m edges from i to j into at most m − 1 edges from i to some vertex k, followed by the single edge (k, j).]
Relaxation!
for k ← 1 to n
    do if dij > dik + akj
          then dij ← dik + akj
Note: No negative-weight cycles implies
δ(i, j) = dij(n−1) = dij(n) = dij(n+1) = ⋯

Matrix multiplication
Compute C = A · B, where C, A, and B are n×n matrices:
cij = Σk=1..n aik bkj.
Time = Θ(n³) using the standard algorithm.
What if we map "+" → min and "·" → "+"?
cij = mink {aik + bkj}.
Thus, D(m) = D(m−1) "×" A.
Identity matrix = I = the matrix with 0's on the diagonal and ∞ elsewhere = D(0) = (dij(0)).

Matrix multiplication (continued)
The (min, +) multiplication is associative, and with the real numbers it forms an algebraic structure called a closed semiring.
Consequently, we can compute
D(1) = D(0) "×" A = A¹
D(2) = D(1) "×" A = A²
⋮
D(n−1) = D(n−2) "×" A = Aⁿ⁻¹,
yielding D(n−1) = (δ(i, j)).
Time = Θ(n·n³) = Θ(n⁴). No better than n runs of Bellman-Ford.

Improved matrix multiplication algorithm
Repeated squaring: A^(2k) = A^k × A^k.
Compute A², A⁴, …, A^(2^⌈lg(n−1)⌉): O(lg n) squarings.
Note: Aⁿ⁻¹ = Aⁿ = Aⁿ⁺¹ = ⋯.
Time = Θ(n³ lg n).
To detect negative-weight cycles, check the diagonal for negative values in O(n) additional time.
(A sketch appears below.)
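A sketch of the repeated-squaring idea over the (min, +) semiring, assuming the adjacency matrix uses float('inf') for missing edges and 0 on the diagonal, and that no negative-weight cycles are present (otherwise one would also check the diagonal, as noted above).

INF = float('inf')

def min_plus_multiply(A, B):
    """(min, +) matrix 'product': C[i][j] = min_k (A[i][k] + B[k][j])."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def all_pairs_by_squaring(A):
    """All-pairs shortest paths by repeated (min, +) squaring: Theta(n^3 lg n).

    A: n x n matrix with A[i][j] = edge weight (INF if no edge, 0 on the
    diagonal). Returns the matrix of shortest-path weights.
    """
    n = len(A)
    D, m = A, 1
    while m < n - 1:           # after squaring, D covers paths of <= 2m edges
        D = min_plus_multiply(D, D)
        m *= 2
    return D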

Floyd-Warshall algorithm
Also dynamic programming, but faster!
Define cij(k) = weight of a shortest path from i to j with intermediate vertices belonging to the set {1, 2, …, k}.
[Figure: a path from i to j whose intermediate vertices are all in {1, 2, …, k}.]
Thus, δ(i, j) = cij(n). Also, cij(0) = aij.

Floyd-Warshall recurrence
cij(k) = min {cij(k−1), cik(k−1) + ckj(k−1)}
[Figure: either the shortest i-to-j path avoids vertex k (weight cij(k−1)), or it passes through k, splitting into an i-to-k piece of weight cik(k−1) and a k-to-j piece of weight ckj(k−1), with intermediate vertices in {1, 2, …, k}.]

Pseudocode for Floyd-Warshall
for k ← 1 to n
    do for i ← 1 to n
           do for j ← 1 to n
                  do if cij > cik + ckj          ⊳ relaxation
                        then cij ← cik + ckj

Notes:
Okay to omit superscripts, since extra relaxations can't hurt.
Runs in Θ(n³) time.
Simple to code.
Efficient in practice.
(A Python rendering follows.)
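The pseudocode is essentially executable; here is a Python rendering under the same matrix conventions as the squaring sketch above.

def floyd_warshall(A):
    """Floyd-Warshall all-pairs shortest paths in Theta(n^3) time.

    A: n x n matrix with A[i][j] = edge weight (float('inf') if no edge,
    0 on the diagonal). Returns the matrix of shortest-path weights.
    Assumes no negative-weight cycles.
    """
    n = len(A)
    c = [row[:] for row in A]      # work on a copy; superscripts omitted
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if c[i][j] > c[i][k] + c[k][j]:    # relaxation
                    c[i][j] = c[i][k] + c[k][j]
    return c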

Transitive closure of a directed graph
Compute tij = 1 if there exists a path from i to j, and 0 otherwise.
IDEA: Use Floyd-Warshall, but with (∨, ∧) instead of (min, +):
tij(k) = tij(k−1) ∨ (tik(k−1) ∧ tkj(k−1)).
Time = Θ(n³).

Graph reweighting
Theorem. Given a function h : V → ℝ, reweight each edge (u, v) ∈ E by wh(u, v) = w(u, v) + h(u) − h(v). Then, for any two vertices, all paths between them are reweighted by the same amount.

Proof. Let p = v1 → v2 → ⋯ → vk be a path in G. We have
wh(p) = Σi=1..k−1 wh(vi, vi+1)
      = Σi=1..k−1 (w(vi, vi+1) + h(vi) − h(vi+1))
      = Σi=1..k−1 w(vi, vi+1) + h(v1) − h(vk)    (the h-terms telescope)
      = w(p) + h(v1) − h(vk),
the same amount for every path from v1 to vk.

Shortest paths in reweighted graphs
Corollary. δh(u, v) = δ(u, v) + h(u) − h(v).
IDEA: Find a function h : V → ℝ such that wh(u, v) ≥ 0 for all (u, v) ∈ E. Then, run Dijkstra's algorithm from each vertex on the reweighted graph.
NOTE: wh(u, v) ≥ 0 iff h(v) − h(u) ≤ w(u, v).

Johnson's algorithm
1. Find a function h : V → ℝ such that wh(u, v) ≥ 0 for all (u, v) ∈ E by using Bellman-Ford to solve the difference constraints h(v) − h(u) ≤ w(u, v), or determine that a negative-weight cycle exists.
   Time = O(VE).
2. Run Dijkstra's algorithm using wh from each vertex u ∈ V to compute δh(u, v) for all v ∈ V.
   Time = O(VE + V² lg V).
3. For each (u, v) ∈ V × V, compute δ(u, v) = δh(u, v) − h(u) + h(v).
   Time = O(V²).
Total time = O(VE + V² lg V). (A sketch combining the earlier code follows.)
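A sketch chaining the three steps together, reusing the bellman_ford and dijkstra sketches from earlier; the edge-list and adjacency-dictionary conventions are assumptions carried over from those sketches.

def johnson(vertices, edges):
    """Johnson's all-pairs shortest paths, built from the earlier sketches.

    vertices: list of vertices; edges: list of (u, v, w) triples.
    Returns dict delta with delta[u][v] = shortest-path weight.
    Raises ValueError if a negative-weight cycle exists (step 1).
    """
    # Step 1: h(v) = delta(s, v) from a new source s with 0-weight edges,
    # i.e., Bellman-Ford solving h(v) - h(u) <= w(u, v).
    s = object()                        # fresh vertex not in the graph
    h = bellman_ford(list(vertices) + [s],
                     edges + [(s, v, 0) for v in vertices], s)
    # Step 2: Dijkstra from each vertex on the reweighted graph.
    adj = {v: [] for v in vertices}
    for u, v, w in edges:
        adj[u].append((v, w + h[u] - h[v]))    # w_h(u, v) >= 0
    delta = {}
    for u in vertices:
        d_h = dijkstra(adj, u)
        # Step 3: undo the reweighting.
        delta[u] = {v: d_h[v] - h[u] + h[v] for v in vertices}
    return delta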

Introduction to Algorithms

6.046J/18.401

Lecture 18

Prof. Piotr Indyk

Today
We have seen algorithms for:
numerical data (sorting, median)
graphs (shortest path, MST)
Today and the next lecture: algorithms for geometric data.

Computational Geometry
Algorithms for geometric problems.
Applications: CAD, GIS, computer vision, ….
E.g., the closest pair problem:
Given: a set of points P = {p1, …, pn} in the plane, such that pi = (xi, yi).
Goal: find a pair pi ≠ pj that minimizes ||pi − pj||, where ||p − q|| = [(px − qx)² + (py − qy)²]^(1/2).
We will see more examples in the next lecture.

Closest Pair
Find a closest pair among p1, …, pn.
Easy to do in O(n²) time: for all pi ≠ pj, compute ||pi − pj|| and choose the minimum.
We will aim for O(n log n) time.

Divide and conquer
Divide:
Compute the median of x-coordinates.
Split the points into PL and PR, each of size n/2.
Conquer: compute the closest pairs for PL and PR.
Combine the results (the hard part).

Combine
Let d = min(d1, d2), where d1 and d2 are the closest-pair distances found in PL and PR.
Observe:
Need to check only pairs which cross the dividing line.
Only interested in pairs within distance < d.
It suffices to look at points in the 2d-width strip around the median line.
[Figure: the 2d-wide vertical strip centered on the dividing line, with closest-pair distances d1 and d2 on the two sides.]

Scanning the strip
Sort all points in the strip by their y-coordinates, forming q1, …, qk, k ≤ n. Let yi be the y-coordinate of qi.
dmin ← d
for i ← 1 to k
    j ← i − 1
    while yi − yj < d
        if ||qi − qj|| < dmin then dmin ← ||qi − qj||
        j ← j − 1
Report dmin (and the corresponding pair).

Analysis
Correctness: easy.
The running time is more involved: can we have many qj's that are within distance d from qi? No.
Proof by packing argument.

Analysis, ctd.
Theorem: there are at most 7 points qj such that yi − yj ≤ d.
Proof:
Each such qj must lie either in the left or in the right d × d square below qi.
Within each square, all points have distance ≥ d from one another.
We can pack at most 4 such points into one square, so we have at most 8 points total (incl. qi).

Packing bound
Proving the bound of 4 is not easy; we will prove a bound of 5 instead.
Draw a disk of radius d/2 around each point; the disks are disjoint.
Each disk-square intersection has area ≥ (1/4)·π(d/2)² = (π/16)d².
The square has area d².
So we can pack at most 16/π ≈ 5.1 points into the square.

Running time
Divide: O(n).
Combine: O(n log n), because we sort by y.
However, we can:
Sort all points by y at the beginning.
Divide preserves the y-order of points.
Then combine takes only O(n).
We get T(n) = 2T(n/2) + O(n), so T(n) = O(n log n). (A full sketch follows.)
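Putting the divide, combine, and strip-scan steps together, here is a compact runnable sketch. It assumes the input points are distinct (in the spirit of the lecture's distinctness assumptions), and for brevity it splits the y-sorted list with a set-membership test rather than a linear unmerge.

import math

def closest_pair(points):
    """Divide-and-conquer closest pair in O(n log n).

    points: list of distinct (x, y) tuples. Returns the smallest
    pairwise distance (float('inf') if fewer than two points).
    """
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    def solve(px, py):
        n = len(px)
        if n <= 3:                            # base case: brute force
            return min((dist(px[i], px[j])
                        for i in range(n) for j in range(i + 1, n)),
                       default=float('inf'))
        mid = n // 2
        x_mid = px[mid][0]
        left, right = px[:mid], px[mid:]
        lset = set(left)
        py_l = [p for p in py if p in lset]   # divide preserves y-order
        py_r = [p for p in py if p not in lset]
        d = min(solve(left, py_l), solve(right, py_r))
        # Scan the 2d-wide strip around the dividing line in y-order.
        strip = [p for p in py if abs(p[0] - x_mid) < d]
        for i, q in enumerate(strip):
            for r in strip[i + 1:]:
                if r[1] - q[1] >= d:          # packing bound: few checks
                    break
                d = min(d, dist(q, r))
        return d

    return solve(sorted(points), sorted(points, key=lambda p: p[1]))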

Close pair
Given: P = {p1, …, pn}.
Goal: check if there is any pair pi ≠ pj within distance R from each other.
We will give an O(n) time algorithm, using radix sort! (Assuming coordinates are small integers.)

Algorithm
Impose a square grid onto the plane, where each cell is an R × R square.
Put each point into a bucket corresponding to the cell it belongs to. That is:
For each point p = (x, y), compute its bucket ID b(p) = (⌊x/R⌋, ⌊y/R⌋).
Radix sort all the b(p)'s.
Each run of equal b(p)'s forms a bucket, e.g.:
(1,1), (1,2), (1,2), (2,1), (2,2), (2,2), (2,3), (3,1), (3,2)
If there is a bucket with > 4 points in it, answer YES and exit.
Otherwise, for each p ∈ P:
Let c = b(p).
Let C be the set of bucket IDs of the 8 cells adjacent to c.
For all points q from buckets in C ∪ {c}: if ||p − q|| ≤ R, then answer YES and exit.
Answer NO.
(A sketch follows.)
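A sketch of the grid algorithm; a Python dictionary of buckets stands in for the radix-sort step (hashing, as the next slide notes, is one way to implement bucket access), so this version runs in expected rather than worst-case linear time.

import math

def has_close_pair(points, R):
    """Grid-based close-pair test: is some pair within distance R?

    points: list of distinct (x, y) tuples. Returns True/False.
    """
    buckets = {}
    for (x, y) in points:
        b = (math.floor(x / R), math.floor(y / R))   # bucket ID b(p)
        cell = buckets.setdefault(b, [])
        cell.append((x, y))
        if len(cell) > 4:            # a heavy bucket forces a close pair
            return True
    for (x, y) in points:
        bx, by = math.floor(x / R), math.floor(y / R)
        for dx in (-1, 0, 1):        # the cell c and its 8 adjacent cells
            for dy in (-1, 0, 1):
                for (qx, qy) in buckets.get((bx + dx, by + dy), []):
                    if (qx, qy) != (x, y) and math.hypot(x - qx, y - qy) <= R:
                        return True
    return False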

Bucket access
Given a bucket ID c, how can we quickly retrieve all points p such that b(p) = c?
This is exactly the dictionary problem (Lecture 7). E.g., we can use hashing.

Analysis
Running time:
Putting points into the buckets: O(n) time.
Checking if there is a heavy bucket: O(n).
Checking the cells: 9 · 4 · n = O(n).
Overall: linear time.

Computational Model
In these two lectures, we assume that:
The input (e.g., point coordinates) consists of real numbers.
We can perform (natural) operations on them in constant time, with perfect precision.
Advantage: simplicity.
Drawbacks: highly non-trivial issues:
Theoretical: if we allow arbitrary operations on reals, we can compress n numbers into one number.
Practical: algorithms designed for infinite precision sometimes fail on real computers.

Introduction to Algorithms

6.046J/18.401

Lecture 17

Prof. Piotr Indyk

Computational Geometry ctd.
Segment intersection problem:
Given: a set of n distinct segments s1, …, sn, represented by the coordinates of their endpoints.
Detection: detect if there is any pair si ≠ sj that intersects.
Reporting: report all pairs of intersecting segments.

Segment intersection
Easy to solve in O(n²) time.
Is it possible to get a better algorithm for the reporting problem? NO (in the worst case).
However:
We will see we can do better for the detection problem.
Moreover, the number of intersections P is usually small. Then, we would like an output-sensitive algorithm, whose running time is low if P is small.

Result
We will show:
O(n log n) time for detection.
O((n + P) log n) time for reporting.
We will use (no, not divide and conquer) binary search trees. Specifically: the line sweep approach.

Orthogonal segments
All segments are either horizontal (H-segments) or vertical (V-segments).
Assumption: all coordinates are distinct.
Therefore, only vertical-horizontal intersections exist.

Orthogonal segments
Sweep line:
A vertical line sweeps the plane from left to right.
It stops at all important x-coordinates, i.e., when it hits a V-segment or endpoints of an H-segment.
Invariant: all intersections on the left side of the sweep line have already been reported.

Orthogonal segments ctd.
We maintain the sorted y-coordinates of the H-segments currently intersected by the sweep line (using a balanced BST V).
When we hit the left endpoint of an H-segment, we add its y-coordinate to V.
When we hit the right endpoint of an H-segment, we delete its y-coordinate from V.
[Figure: the sweep line crossing H-segments with y-coordinates 17 and 12 stored in V.]

Orthogonal segments ctd.
Whenever we hit a V-segment with y-coordinates spanning [ybot, ytop], we report all H-segments in V with y-coordinates in [ybot, ytop].
[Figure: a V-segment from ybot to ytop crossing the stored H-segments at y = 12 and y = 17.]

Algorithm
Sort all V-segments and endpoints of H-segments by their x-coordinates; this gives the trajectory of the sweep line.
Scan the elements in the sorted list:
Left endpoint: add the segment to the tree V.
Right endpoint: remove the segment from V.
V-segment: report intersections with the H-segments stored in V.
(A sketch in Python appears below.)
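A runnable sketch of the orthogonal sweep. A sorted Python list maintained with bisect stands in for the balanced BST V (so insert/delete cost O(n) here, versus O(log n) with a real balanced tree), and the (x1, x2, y) and (x, y_bot, y_top) segment encodings are assumptions made for illustration.

import bisect

def orthogonal_intersections(h_segments, v_segments):
    """Sweep-line reporting of H-V segment intersections.

    h_segments: list of (x1, x2, y) with x1 < x2.
    v_segments: list of (x, y_bot, y_top) with y_bot < y_top.
    Returns list of (h, v) intersecting pairs. Assumes distinct coordinates.
    """
    events = []
    for h in h_segments:
        x1, x2, y = h
        events.append((x1, 0, h))        # left endpoint: insert into V
        events.append((x2, 2, h))        # right endpoint: delete from V
    for v in v_segments:
        events.append((v[0], 1, v))      # V-segment: range query on V
    events.sort(key=lambda e: (e[0], e[1]))

    V, ys, out = [], [], []              # H-segments sorted by y-coordinate
    for x, kind, seg in events:
        if kind == 0:
            i = bisect.bisect_left(ys, seg[2])
            ys.insert(i, seg[2]); V.insert(i, seg)
        elif kind == 2:
            i = bisect.bisect_left(ys, seg[2])
            del ys[i]; del V[i]
        else:
            _, y_bot, y_top = seg
            for i in range(bisect.bisect_left(ys, y_bot),
                           bisect.bisect_right(ys, y_top)):
                out.append((V[i], seg))  # report each H-segment in range
    return out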

Analysis
Sorting: O(n log n).
Adding/deleting H-segments to/from the vertical data structure V: O(log n) per operation, O(n log n) total.
Processing V-segments: O(log n) per intersection (see the next slide), O(P log n) total.
Overall: O((P + n) log n) time. Can be improved to O(P + n log n).

Analyzing intersections
Given: a BST V containing y-coordinates, and an interval I = [ybot, ytop].
Goal: report all y's in V that belong to I.
Algorithm:
y ← SUCCESSOR(ybot)
while y ≤ ytop
    report y
    y ← SUCCESSOR(y)
Time: (number of reported y's)·O(log n) + O(log n).

The general case
Assumption: all coordinates of endpoints and intersections are distinct. In particular:
No vertical segments.
No three segments intersect at one point.

Sweep line
Invariant (as before): all intersections on the left of the sweep line have already been reported.
The sweep line stops at all important x-coordinates, i.e., when it hits endpoints or intersections.
We do not know the intersections in advance! The list of intersection x-coordinates is constructed and maintained dynamically (in a horizontal data structure H).

Sweep line
We also need to maintain information about the segments intersecting the sweep line.
We cannot keep the values of the y-coordinates of the segments!
Instead, we maintain their order. I.e., at any point, we maintain all segments intersecting the sweep line, sorted by the y-coordinates of the intersections (in a vertical data structure V).

Algorithm
Initialize the vertical BST V (to empty).
Initialize the horizontal priority queue H (to contain the segments' endpoints sorted by x-coordinates).
Repeat:
Take the next event p from H.    // Update V
If p is the left endpoint of a segment, add the segment to V.
If p is the right endpoint of a segment, remove the segment from V.
If p is the intersection point of s and s′, swap the order of s and s′ in V, and report p.

Algorithm ctd.
// Update H
For each new pair of neighbors s and s′ in V:
Check if s and s′ intersect on the right side of the sweep line.
If so, add their intersection point to H.
Remove the possible duplicates in H.
Until H is empty.

Analysis
Initializing H: O(n log n).
Updating V: O(log n) per operation, O((P + n) log n) total.
Updating H: O(log n) per intersection, O(P log n) total.
Overall: O((P + n) log n) time.

Correctness
All reported intersections are correct.
Assume there is an intersection not reported. Let p = (x, y) be the first such unreported intersection (of s and s′), and let x′ be the last event before p. Observe that:
At time x′, segments s and s′ are neighbors on the sweep line.
Since no intersections were missed until then, V maintained the right order of the intersecting segments.
Thus, s and s′ were neighbors in V at time x′, so their intersection should have been detected.

Changes
[Figure: two segments swap their y-order as the sweep line passes their intersection point.]

Introduction to Algorithms

6.046J/18.401J

LECTURE 19

Take-home exam
Instructions
Academic honesty
Strategies for doing well

Prof. Charles E. Leiserson

Take-home quiz

The take-home quiz contains 5 problems worth 25 points each, for a total of 125 points:
1 easy
2 moderate
1 hard
1 very hard

End of quiz
Your exam is due between 10:00 and 11:00 A.M. on Monday, November 22, 2004.
Late exams will not be accepted unless you obtain a Dean's Excuse or make prior arrangements with your recitation instructor.
You must hand in your own exam in person.

Planning
The quiz should take you about 12 hours to do, but you have five days in which to do it.
Plan your time wisely. Do not overwork, and get enough sleep.
Ample partial credit will be given for good solutions, especially if they are well written.
The better your asymptotic running-time bounds, the higher your score.
Bonus points will be given for exceptionally efficient or elegant solutions.

Format
Each problem should be answered on a separate sheet (or sheets) of 3-hole punched paper.
Mark the top of each problem with
your name,
6.046J/18.410J,
the problem number,
your recitation time,
and your TA.

Executive summary
Your solution to a problem should start with a topic paragraph that provides an executive summary of your solution.
This executive summary should describe
the problem you are solving,
the techniques you use to solve it,
any important assumptions you make, and
the running time your algorithm achieves.

Solutions
Write up your solutions cleanly and concisely to maximize the chance that we understand them.
Be explicit about running time and algorithms. For example, don't just say you sort n numbers; state that you are using heapsort, which sorts the n numbers in O(n lg n) time in the worst case.
When describing an algorithm, give an English description of the main idea of the algorithm. Use pseudocode only if necessary to clarify your solution.

Solutions
Give examples, and draw figures.
Provide succinct and convincing arguments for the correctness of your solutions.
Do not regurgitate material presented in class.
Cite algorithms and theorems from CLRS, lecture, and recitation to simplify your solutions.

Assumptions
Part of the goal of this exam is to test engineering common sense.
If you find that a question is unclear or ambiguous, make reasonable assumptions in order to solve the problem. State clearly in your write-up what assumptions you have made.
Be careful what you assume, however, because you will receive little credit if you make a strong assumption that renders a problem trivial.

Bugs, etc.
If you think that you've found a bug, please send email.
Corrections and clarifications will be sent to the class via email. Check your email daily to avoid missing potentially important announcements.
If you did not receive an email last night reminding you about Quiz 2, then you are not on the class email list. Please let your recitation instructor know immediately.

Academic honesty
This quiz is "limited open book." You may use:
your course notes,
the CLRS textbook,
lecture videos,
basic reference materials such as dictionaries, and
any of the handouts posted on the server.
No other sources whatsoever may be consulted!

Academic honesty
For example, you may not use notes or solutions from other times that this course or other related courses have been taught, or materials on the server. These materials will not help you, but you may not use them anyhow.
You may not communicate with any person except members of the 6.046 staff about any aspect of the exam until after noon on Monday, November 22, even if you have already handed in your exam.

Academic honesty
If at any time you feel that you may have violated this policy, it is imperative that you contact the course staff immediately. It will be much the worse for you if third parties divulge your indiscretion.
If you have any questions about what resources may or may not be used during the quiz, send email.

Poll of 78 quiz takers

Question 1: Did you cheat?

76 No.
1 Yes.
1 Abstain.


Poll of 78 quiz takers
Question 2: How many people do you know who cheated?
72 None.
2 "3 people compared answers."
1 "Suspect 2, but don't know."
1 "Either 0 or 2."
1 Abstain.
1 "10" (the cheater).

Reread instructions
Please reread the exam instructions in their entirety at least once a day during the exam.

Test-taking strategies
Manage your time.
Manage your psyche.
Brainstorm.
Write up early and often.

Manage your time
Work on all problems the first day.
Budget time for write-ups and debugging.
Don't get sucked into one problem at the expense of others.
Replan your strategy every day.

Manage your psyche
Get enough sleep.
Maintain a patient, persistent, and positive attitude.
Use adrenaline productively.
Relax, and have fun. It's not the end of the world!

Brainstorm
Get an upper bound, even if it is loose.
Look for analogies with problems you've seen.
Exploit special structure.
Solve a simpler problem.
Draw diagrams.
Contemplate.
Be wary of self-imposed constraints: think out of the box.
Work out small examples, and abstract.
Understand things in two ways: sanity checks.

Write up early and often
Write up partial solutions.
Groom your work every day.
Work on shortening and simplifying.
Provide an executive summary.
Ample partial credit will be given!
Unnecessarily long answers will be penalized.
L19.21

Positive attitude

20014 by Charles E. Leiserson

Introduction to Algorithms

November 17, 2004

L19.22

Introduction to Algorithms

6.046J/18.401J

LECTURE 20

Network Flow I

Flow networks

Maximum-flow problem
Flow notation

Properties of flow
Cuts
Residual networks
Augmenting paths
Prof. Charles E. Leiserson

Flow networks
Definition. A flow network is a directed graph G = (V, E) with two distinguished vertices: a source s and a sink t. Each edge (u, v) ∈ E has a nonnegative capacity c(u, v). If (u, v) ∉ E, then c(u, v) = 0.
Example:
[Figure: a flow network from s to t whose edges carry capacities 1, 2, 2, 2, 3, 3, 3.]

Flow networks
Definition. A positive flow on G is a function p : V × V → ℝ satisfying the following:
Capacity constraint: For all u, v ∈ V, 0 ≤ p(u, v) ≤ c(u, v).
Flow conservation: For all u ∈ V − {s, t},
Σ_{v∈V} p(u, v) − Σ_{v∈V} p(v, u) = 0.
The value of a flow is the net flow out of the source:
Σ_{v∈V} p(s, v) − Σ_{v∈V} p(v, s).

A flow on a network
[Figure: the example network with each edge labeled "positive flow : capacity" (2:2, 2:3, 1:3, 0:1, 1:1, 2:3, 2:3, 1:2, 1:2).]
Flow conservation (like Kirchhoff's current law), checked at the internal vertex u of the figure:
Flow into u is 2 + 1 = 3.
Flow out of u is 0 + 1 + 2 = 3.
The value of this flow is 1 − 0 + 2 = 3.

The maximum-flow problem
Maximum-flow problem: Given a flow network G, find a flow of maximum value on G.
[Figure: the example network with flow labels 2:2, 2:3, 2:3, 0:1, 0:3, 1:1, 2:3, 3:3, 1:2, 2:2.]
The value of the maximum flow is 4.

Flow cancellation
Without loss of generality, positive flow goes either from u to v, or from v to u, but not both.
[Figure: flows 2:3 from u to v and 1:2 from v to u cancel to 1:3 from u to v and 0:2 from v to u. The net flow from u to v in both cases is 1.]
The capacity constraint and flow conservation are preserved by this transformation.
INTUITION: View flow as a rate, not a quantity.

A notational simplification
IDEA: Work with the net flow between two vertices, rather than with the positive flow.
Definition. A (net) flow on G is a function f : V × V → ℝ satisfying the following:
Capacity constraint: For all u, v ∈ V, f(u, v) ≤ c(u, v).
Flow conservation: For all u ∈ V − {s, t}, Σ_{v∈V} f(u, v) = 0. (One summation instead of two.)
Skew symmetry: For all u, v ∈ V, f(u, v) = −f(v, u).

Equivalence of definitions
Theorem. The two definitions are equivalent.

Proof. (⇒) Let f(u, v) = p(u, v) − p(v, u).
Capacity constraint: Since p(u, v) ≤ c(u, v) and p(v, u) ≥ 0, we have f(u, v) ≤ c(u, v).
Flow conservation:
Σ_{v∈V} f(u, v) = Σ_{v∈V} (p(u, v) − p(v, u)) = Σ_{v∈V} p(u, v) − Σ_{v∈V} p(v, u) = 0.
Skew symmetry:
f(u, v) = p(u, v) − p(v, u) = −(p(v, u) − p(u, v)) = −f(v, u).

Proof (continued)
(⇐) Let p(u, v) = f(u, v) if f(u, v) > 0, and p(u, v) = 0 if f(u, v) ≤ 0.
Capacity constraint: By definition, p(u, v) ≥ 0. Since f(u, v) ≤ c(u, v), it follows that p(u, v) ≤ c(u, v).
Flow conservation: If f(u, v) > 0, then p(u, v) − p(v, u) = f(u, v). If f(u, v) ≤ 0, then p(u, v) − p(v, u) = −f(v, u) = f(u, v) by skew symmetry. Therefore,
Σ_{v∈V} p(u, v) − Σ_{v∈V} p(v, u) = Σ_{v∈V} f(u, v).

Notation
Definition. The value of a flow f, denoted by |f|, is given by
|f| = Σ_{v∈V} f(s, v) = f(s, V).
Implicit summation notation: A set used in an arithmetic formula represents a sum over the elements of the set.
Example (flow conservation): f(u, V) = 0 for all u ∈ V − {s, t}.

Simple properties of flow
Lemma.
f(X, X) = 0,
f(X, Y) = −f(Y, X),
f(X ∪ Y, Z) = f(X, Z) + f(Y, Z) if X ∩ Y = ∅.

Theorem. |f| = f(V, t).
Proof.
|f| = f(s, V)
    = f(V, V) − f(V−s, V)    (omit braces: V−s means V−{s})
    = f(V, V−s)
    = f(V, t) + f(V, V−s−t)
    = f(V, t).

Flow into the sink
[Figure: the example network after flow cancellation; the net flow out of the source is |f| = f(s, V) = 4, and the net flow into the sink is f(V, t) = 4.]

Cuts
Definition. A cut (S, T) of a flow network G = (V, E) is a partition of V such that s ∈ S and t ∈ T. If f is a flow on G, then the flow across the cut is f(S, T).
[Figure: the example network with the cut (S, T) drawn through it.]
f(S, T) = (2 + 2) + (−2 + 1 − 1 + 2) = 4.

Another characterization of flow value
Lemma. For any flow f and any cut (S, T), we have |f| = f(S, T).
Proof.
f(S, T) = f(S, V) − f(S, S)
        = f(S, V)
        = f(s, V) + f(S−s, V)
        = f(s, V)
        = |f|.

Capacity of a cut
Definition. The capacity of a cut (S, T) is c(S, T).
[Figure: the same cut in the example network; only forward capacities count.]
c(S, T) = (3 + 2) + (1 + 2 + 3) = 11.

Upper bound on the maximum flow value
Theorem. The value of any flow is bounded above by the capacity of any cut.
Proof.
|f| = f(S, T)
    = Σ_{u∈S} Σ_{v∈T} f(u, v)
    ≤ Σ_{u∈S} Σ_{v∈T} c(u, v)
    = c(S, T).

Residual network
Definition. Let f be a flow on G = (V, E). The residual network Gf = (V, Ef) is the graph with strictly positive residual capacities
cf(u, v) = c(u, v) − f(u, v) > 0.
Edges in Ef admit more flow.
Example:
[Figure: in G, an edge u → v carries flow 3 against capacity 5 (labeled 3:5), with a reverse edge of capacity 1 carrying flow 0 (labeled 0:1); in Gf, the edge u → v has residual capacity 5 − 3 = 2, and the reverse edge v → u gains residual capacity from the flow that can be cancelled.]
Lemma. |Ef| ≤ 2|E|.

Augmenting paths

Definition. Any path from s to t in Gf is an augmenting path in G with respect to f. The flow
value can be increased along an augmenting
path p by c f ( p) = min {c f (u , v)}.
(u ,v )p

20014 by Charles E. Leiserson

Introduction to Algorithms

November 24, 2004

L20.28

Augmenting paths

Definition. Any path from s to t in Gf is an augmenting path in G with respect to f. The flow
value can be increased along an augmenting
path p by c f ( p) = min {c f (u , v)}.
(u ,v )p

Ex.:

[Figure omitted: a flow network G containing an augmenting path p with cf (p) = 2, and the corresponding residual network Gf .]



Max-flow, min-cut theorem

Theorem. The following are equivalent:

1. f is a maximum flow.
2. Gf contains no augmenting paths.
3. | f | = c(S, T) for some cut (S, T) of G.
Proof (and algorithms). Next time.


Introduction to Algorithms

6.046J/18.401J

LECTURE 21
Network Flow II
Max-flow, min-cut theorem
Ford-Fulkerson algorithm and analysis
Edmonds-Karp algorithm and analysis
Best algorithms to date
Prof. Charles E. Leiserson

Recall from Lecture 20

Flow value: | f | = f (s, V).
Cut: Any partition (S, T) of V such that s ∈ S and t ∈ T.
Lemma. | f | = f (S, T) for any cut (S, T).
Corollary. | f | ≤ c(S, T) for any cut (S, T).
Residual graph: The graph Gf = (V, Ef ) with strictly positive residual capacities cf (u, v) = c(u, v) − f (u, v) > 0.
Augmenting path: Any path from s to t in Gf .
Residual capacity of an augmenting path: cf (p) = min {cf (u, v) : (u, v) ∈ p}.




Max-flow, min-cut theorem

Theorem. The following are equivalent:

1. | f | = c(S, T) for some cut (S, T).
2. f is a maximum flow.
3. f admits no augmenting paths.

Proof.
(1) ⇒ (2): Since | f | ≤ c(S, T) for any cut (S, T) (by the corollary from Lecture 20), the assumption that | f | = c(S, T) implies that f is a maximum flow.
(2) ⇒ (3): If there were an augmenting path, the flow value could be increased, contradicting the maximality of f.

Proof (continued)

(3) ⇒ (1): Suppose that f admits no augmenting paths.
Define S = {v ∈ V : there exists a path in Gf from s to v}, and let T = V − S. Observe that s ∈ S and t ∈ T, and thus (S, T) is a cut. Consider any vertices u ∈ S and v ∈ T.

[Figure omitted: a path in Gf from s to u, with u ∈ S and v ∈ T on opposite sides of the cut.]

We must have cf (u, v) = 0, since if cf (u, v) > 0, then v ∈ S, not v ∈ T as assumed. Thus, f (u, v) = c(u, v), since cf (u, v) = c(u, v) − f (u, v). Summing over all u ∈ S and v ∈ T yields f (S, T) = c(S, T), and since | f | = f (S, T), the theorem follows.

Ford-Fulkerson max-flow
algorithm
Algorithm:

f [u, v] ← 0 for all u, v ∈ V
while an augmenting path p in G with respect to f exists
    do augment f by cf (p)
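A hedged Python sketch of this loop (an illustration under assumed representations, not the lecture's code); augmenting paths are found here by depth-first search in the residual network, which suffices for the generic method:

from collections import defaultdict

def ford_fulkerson(vertices, c, s, t):
    # c maps (u, v) -> capacity; f keeps skew symmetry f[(v, u)] == -f[(u, v)].
    f = defaultdict(int)

    def find_augmenting_path():
        # Any s-to-t path in the residual network will do.
        stack, parent = [s], {s: None}
        while stack:
            u = stack.pop()
            if u == t:
                path = []
                while parent[u] is not None:
                    path.append((parent[u], u))
                    u = parent[u]
                return path[::-1]
            for v in vertices:
                if v not in parent and c.get((u, v), 0) - f[(u, v)] > 0:
                    parent[v] = u
                    stack.append(v)
        return None

    while (p := find_augmenting_path()) is not None:
        cf_p = min(c.get((u, v), 0) - f[(u, v)] for (u, v) in p)
        for (u, v) in p:
            f[(u, v)] += cf_p        # augment f by cf(p)
            f[(v, u)] -= cf_p        # maintain skew symmetry
    return sum(f[(s, v)] for v in vertices)

With integer capacities this terminates, but as the next example shows, an unlucky choice of augmenting paths can force a number of iterations proportional to the flow value itself.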


Can be slow:

[Figure omitted: a four-vertex network. The source s has capacity-10^9 edges to two middle vertices (call them u and v), u and v each have a capacity-10^9 edge to the sink t, and a capacity-1 edge joins u and v. The original slides step through the execution frame by frame: each augmentation routes one unit through the middle edge, alternating between the paths s, u, v, t and s, v, u, t, so | f | grows by only 1 per iteration.]

2 billion iterations on a graph with 4 vertices!

Edmonds-Karp algorithm

Edmonds and Karp noticed that many people's implementations of Ford-Fulkerson augment along a breadth-first augmenting path: a shortest path in Gf from s to t where each edge has weight 1. These implementations would always run relatively fast.
Since a breadth-first augmenting path can be found in O(E) time, their analysis, which provided the first polynomial-time bound on maximum flow, focuses on bounding the number of flow augmentations. (In independent work, Dinic also gave polynomial-time bounds.)


Monotonicity lemma

Lemma. Let δ(v) = δf (s, v) be the breadth-first distance from s to v in Gf . During the Edmonds-Karp algorithm, δ(v) increases monotonically.
Proof. Suppose that augmenting a flow f on G produces a new flow f ′. Let δ′(v) = δf ′ (s, v). We'll show that δ(v) ≤ δ′(v) by induction on δ′(v). For the base case, δ′(v) = 0 implies v = s, and since δ(s) = 0, we have δ(v) ≤ δ′(v).
For the inductive case, consider a breadth-first path s → ⋯ → u → v in Gf ′. We must have δ′(v) = δ′(u) + 1, since subpaths of shortest paths are shortest paths. Hence, we have δ(u) ≤ δ′(u) by induction, because δ′(v) > δ′(u). Certainly, (u, v) ∈ Ef ′.

Proof of Monotonicity Lemma
Case 1

Consider two cases depending on whether (u, v) ∈ Ef .
Case 1: (u, v) ∈ Ef .
We have
δ(v) ≤ δ(u) + 1      (triangle inequality)
     ≤ δ′(u) + 1     (induction)
     = δ′(v)         (breadth-first path),
and thus monotonicity of δ(v) is established.


Proof of Monotonicity Lemma
Case 2

Case 2: (u, v) ∉ Ef .
Since (u, v) ∈ Ef ′, the augmenting path p that produced f ′ from f must have included (v, u). Moreover, p is a breadth-first path in Gf :
p = s → ⋯ → v → u → ⋯ → t .
Thus, we have
δ(v) = δ(u) − 1      (breadth-first path)
     ≤ δ′(u) − 1     (induction)
     = δ′(v) − 2     (breadth-first path)
     < δ′(v) ,
thereby establishing monotonicity for this case, too.


Counting flow augmentations

Theorem. The number of flow augmentations in the Edmonds-Karp algorithm (Ford-Fulkerson with breadth-first augmenting paths) is O(V E).
Proof. Let p be an augmenting path, and suppose that we have cf (u, v) = cf (p) for edge (u, v) ∈ p. Then, we say that (u, v) is critical, and it disappears from the residual graph after flow augmentation.


Example:

[Figure omitted: a residual network Gf with an augmenting path p of residual capacity cf (p) = 2; the edge with cf (u, v) = 2 = cf (p) is critical and disappears after the augmentation.]

Counting flow augmentations (continued)

The first time an edge (u, v) is critical, we have δ(v) = δ(u) + 1, since p is a breadth-first path. We must wait until (v, u) is on an augmenting path before (u, v) can be critical again. Let δ′ be the distance function when (v, u) is on an augmenting path. Then, we have
δ′(u) = δ′(v) + 1     (breadth-first path)
      ≥ δ(v) + 1      (monotonicity)
      = δ(u) + 2      (breadth-first path).


Example:

[Figure omitted: an edge (u, v) on a breadth-first augmenting path from s to t. When (u, v) is first critical, δ(u) = 5 and δ(v) = 6. Before (u, v) can be critical again, (v, u) must lie on an augmenting path, by which time δ(u) ≥ 7; the next time (u, v) is critical, δ(v) ≥ 8. The original slides step through these frames one by one.]


Running time of Edmonds-Karp

Distances start out nonnegative, never decrease, and are at most |V| − 1 until the vertex becomes unreachable. Thus, (u, v) occurs as a critical edge O(V) times, because δ(v) increases by at least 2 between occurrences. Since the residual graph contains O(E) edges, the number of flow augmentations is O(V E).

Corollary. The Edmonds-Karp maximum-flow algorithm runs in O(V E^2) time.
Proof. Breadth-first search runs in O(E) time, and all other bookkeeping is O(V) per augmentation.
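A compact Python sketch of Edmonds-Karp (my own illustration, reusing the representation from the Ford-Fulkerson sketch above): the only change is that the augmenting path is found by breadth-first search, which is what yields the O(V E) bound on augmentations.

from collections import defaultdict, deque

def edmonds_karp(vertices, c, s, t):
    f = defaultdict(int)                     # skew-symmetric flow

    def bfs_path():
        # Shortest augmenting path in the residual network (unit edge weights).
        parent = {s: None}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in vertices:
                if v not in parent and c.get((u, v), 0) - f[(u, v)] > 0:
                    parent[v] = u
                    if v == t:
                        path = []
                        while parent[v] is not None:
                            path.append((parent[v], v))
                            v = parent[v]
                        return path[::-1]
                    q.append(v)
        return None

    while (p := bfs_path()) is not None:
        cf_p = min(c.get((u, v), 0) - f[(u, v)] for (u, v) in p)
        for (u, v) in p:
            f[(u, v)] += cf_p
            f[(v, u)] -= cf_p
    return sum(f[(s, v)] for v in vertices)

# The slow example from earlier now needs 2 augmentations instead of 2 billion:
V = {'s', 'u', 'v', 't'}
cap = {('s', 'u'): 10**9, ('s', 'v'): 10**9,
       ('u', 't'): 10**9, ('v', 't'): 10**9, ('u', 'v'): 1}
print(edmonds_karp(V, cap, 's', 't'))        # 2000000000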

Best to date

The asymptotically fastest algorithm to date for maximum flow, due to King, Rao, and Tarjan, runs in O(V E log_{E/(V lg V)} V) time.
If we allow running times as a function of edge weights, the fastest algorithm for maximum flow, due to Goldberg and Rao, runs in time
O(min{V^{2/3}, E^{1/2}} E lg (V^2/E + 2) lg C),
where C is the maximum capacity of any edge in the graph.

Introduction to Algorithms

6.046J/18.401J

Lecture 22

Prof. Piotr Indyk

Today

String matching problems


HKN Evaluations (last 15 minutes)
Graded Quiz 2 (outside)


String Matching

Input: Two strings T[1…n] and P[1…m], containing symbols from alphabet Σ.
E.g.: Σ = {a, b, …, z}
T[1…18] = "to be or not to be"
P[1…2] = "be"
Goal: find all shifts 0 ≤ s ≤ n − m such that T[s+1…s+m] = P
E.g. 3, 16

Simple Algorithm

for s ← 0 to n − m
    Match ← 1
    for j ← 1 to m
        if T[s+j] ≠ P[j] then
            Match ← 0
            exit loop
    if Match = 1 then output s
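In Python, a direct transcription of this pseudocode using 0-based shifts (my sketch, not lecture code):

def naive_match(T, P):
    # Report every shift s, 0 <= s <= n - m, with T[s:s+m] == P.
    n, m = len(T), len(P)
    return [s for s in range(n - m + 1) if T[s:s+m] == P]

print(naive_match("to be or not to be", "be"))   # [3, 16]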

Results

Running time of the simple algorithm:
Worst-case: O(nm)
Average-case (random text): O(n)
    Ts = time spent on checking shift s
    E[Ts] ≤ 2
    E[Σs Ts] = Σs E[Ts] = O(n)

Worst-case

Is it possible to achieve O(n) for any input?
Knuth-Morris-Pratt '77: deterministic
Karp-Rabin '81: randomized


Karp-Rabin Algorithm

A very elegant use of an idea that we have encountered before, namely HASHING!
Idea:
Hash all substrings T[1…m], T[2…m+1], …, T[n−m+1…n]
Hash the pattern P[1…m]
Report the substrings that hash to the same value as P
Problem: how to hash n − m substrings, each of length m, in O(n) time?

Attempt 0

In Lecture 7, we have seen
ha(x) = Σi ai xi mod q
where a = (a1, …, ar), x = (x1, …, xr)
To implement it, we would need to compute
ha(T[s…s+m−1]) = Σi ai T[s+i] mod q
for s = 0 … n − m
How to compute it in O(n) time?
A big open problem!

Attempt 1

Assume Σ = {0, 1}
Think about each Ts = T[s+1…s+m] as a number in binary representation, i.e.,
ts = T[s+1]·2^{m−1} + T[s+2]·2^{m−2} + … + T[s+m]·2^0
Find a fast way of computing ts+1 given ts
Output all s such that ts is equal to the number p represented by P


The great formula

How to transform
ts = T[s+1]·2^{m−1} + T[s+2]·2^{m−2} + … + T[s+m]·2^0
into
ts+1 = T[s+2]·2^{m−1} + T[s+3]·2^{m−2} + … + T[s+m+1]·2^0 ?
Three steps:
Subtract T[s+1]·2^{m−1}
Multiply by 2 (i.e., shift the bits by one position)
Add T[s+m+1]·2^0
Therefore: ts+1 = (ts − T[s+1]·2^{m−1})·2 + T[s+m+1]·2^0
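A quick Python check of this update rule (my own illustration; the bit string and window length are arbitrary):

# Verify t_{s+1} = (t_s - T[s+1] * 2^{m-1}) * 2 + T[s+m+1] on a small example.
T = [1, 0, 1, 1, 0, 1]          # arbitrary bits
m = 3

def window_value(s):
    # t_s = T[s+1 .. s+m] read as an m-bit binary number (slide's 1-based
    # indexing; in 0-based Python this is T[s : s+m]).
    return int(''.join(map(str, T[s:s+m])), 2)

t = window_value(0)
for s in range(len(T) - m):
    t = (t - T[s] * 2**(m - 1)) * 2 + T[s + m]   # the rolling update
    assert t == window_value(s + 1)              # matches recomputing from scratch
print('update rule verified')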

Algorithm

ts+1 = (ts − T[s+1]·2^{m−1})·2 + T[s+m+1]·2^0
Can compute ts+1 from ts using 3 arithmetic operations
Therefore, we can compute all t0, t1, …, tn−m using O(n) arithmetic operations
We can compute the number p corresponding to P using O(m) arithmetic operations
Are we done?

Problem

To get O(n) time, we would need to perform each arithmetic operation in O(1) time
However, the arguments are m bits long!
If m is large, it is unreasonable to assume that operations on such big numbers can be done in O(1) time
We need to reduce the number range to something more manageable

Attempt 2: Hashing

We will instead compute
ts = T[s+1]·2^{m−1} + T[s+2]·2^{m−2} + … + T[s+m]·2^0 mod q
where q is an appropriate prime number
One can still compute ts+1 from ts :
ts+1 = (ts − T[s+1]·2^{m−1})·2 + T[s+m+1]·2^0 mod q
If q is not large, i.e., has O(log n) bits, we can compute all ts (and p) in O(n) time

Problem

Unfortunately, we can have false positives, i.e., Ts ≠ P but ts mod q = p mod q
Need to use a random q
We will show that the probability of a false positive is small ⇒ randomized algorithm


False positives

Consider any ts ≠ p. We know that both numbers are in the range {0 … 2^m − 1}
How many primes q are there such that
ts mod q = p mod q, i.e., (ts − p) = 0 mod q ?
Such a prime has to divide x = (ts − p) ≤ 2^m
Represent x = p1^{e1} p2^{e2} … pk^{ek}, pi prime, ei ≥ 1
What is the largest possible value of k?
Since 2 ≤ pi , we have x ≥ 2^k
But x ≤ 2^m
⇒ k ≤ m
There are ≤ m primes dividing x

Algorithm

Algorithm:
Let Π be a set of 2nm primes, each having O(log n) bits
Choose q uniformly at random from Π
Compute t0 mod q, t1 mod q, …, and p mod q
Report s such that ts mod q = p mod q
Analysis:
For each s, the probability that Ts ≠ P but ts mod q = p mod q is at most m/2nm = 1/2n
The probability of any false positive is at most (n − m)/2n ≤ 1/2


Details

How do we know that such a Π exists?
(That is, a set of 2nm primes, each having O(log n) bits)
How do we choose a random prime from Π in O(n) time?


Prime density

Primes are dense. I.e., if PRIMES(N) is the set of primes smaller than N, then asymptotically
|PRIMES(N)| / N ~ 1 / ln N
If N is large enough, then
|PRIMES(N)| ≥ N / (2 ln N)
Proof: Trust me.



Prime density continued

Set N = C mn ln(mn)
There exists C = O(1) such that
N / (2 ln N) ≥ 2mn
(Note: for such N we have |PRIMES(N)| ≥ 2mn)
Proof:
C mn ln(mn) / [2 ln(C mn ln(mn))]
≥ C mn ln(mn) / [2 ln(C (mn)^2)]
≥ C mn ln(mn) / (4 [ln(C) + ln(mn)])
≥ 2mn, for a suitable constant C
All elements of PRIMES(N) are ≤ log N = O(log n) bits long

Prime selection

Still need to find a random element of PRIMES(N)
Solution:
Choose a random element from {1 … N}
Check if it is prime
If not, repeat
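A sketch of this rejection-sampling loop in Python (my illustration; the Miller-Rabin routine is a standard probabilistic primality test, an assumption on my part since the slide only says "check if it is prime"):

import random

def is_probable_prime(q, rounds=20):
    # Miller-Rabin probabilistic primality test.
    if q < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if q % p == 0:
            return q == p
    d, r = q - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, q - 1)
        x = pow(a, d, q)
        if x in (1, q - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, q)
            if x == q - 1:
                break
        else:
            return False                 # composite witness found
    return True

def random_prime(N):
    # Choose a random element of {1..N}; repeat until it is prime (N >= 2).
    while True:
        q = random.randrange(1, N + 1)
        if is_probable_prime(q):
            return q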


Prime selection analysis

A random element q from {1 … N} is prime with probability ~ 1 / ln N
We can check if q is prime in time polynomial in log N:
Randomized: Rabin, Solovay-Strassen in 1976
Deterministic: Agrawal et al in 2002
Therefore, we can generate a random prime q in o(n) time

Final Algorithm

Set N = C mn ln(mn)
Repeat
    Choose q uniformly at random from {1 … N}
Until q is prime
Compute t0 mod q, t1 mod q, …, and p mod q
Report s such that ts mod q = p mod q
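Putting the pieces together, a Python sketch of the whole matcher for bit strings (my illustration under the slide's Σ = {0, 1} assumption; random_prime is the helper sketched above):

import math

def karp_rabin(T, P, C=16):
    # Report all shifts s with T[s:s+m] == P (0-based); T, P are 0/1 lists.
    n, m = len(T), len(P)
    N = max(4, int(C * m * n * math.log(m * n)))
    q = random_prime(N)                  # random prime with O(log n) bits
    high = pow(2, m - 1, q)              # 2^{m-1} mod q

    p = t = 0                            # hash P and the first window in O(m)
    for j in range(m):
        p = (2 * p + P[j]) % q
        t = (2 * t + T[j]) % q

    matches = []
    for s in range(n - m + 1):
        if t == p:                       # candidate match; false positive
            matches.append(s)            # probability <= 1/2 over choice of q
        if s < n - m:
            t = ((t - T[s] * high) * 2 + T[s + m]) % q   # roll the hash

    return matches

T = [0, 1, 1, 0, 1, 1, 0]
P = [1, 1, 0]
print(karp_rabin(T, P))   # [1, 4] with high probability over the choice of q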

Introduction to Algorithms

6.046J/18.401J

Lecture 24

Prof. Piotr Indyk

Dealing with Hard Problems

What to do if:
Divide and conquer
Dynamic programming
Greedy
Linear Programming/Network Flows
do not give a polynomial time algorithm?


Dealing with Hard Problems

Solution I: Ignore the problem
Can't do it! There are thousands of problems for which we do not know polynomial time algorithms
For example:
Traveling Salesman Problem (TSP)
Set Cover

Traveling Salesman Problem (TSP)

Input: undirected graph with lengths on edges
Output: shortest cycle that visits each vertex exactly once
Best known algorithm: O(n^2 2^n) time.


Set Covering

Set Cover:
Input: subsets S1 … Sn of X, with ∪i Si = X, |X| = m
Output: C ⊆ {1 … n} such that ∪i∈C Si = X and |C| is minimal
Best known algorithm: O(2^n m) time (?)

Bank robbery problem:
X = {plan, shoot, safe, drive, scary}
Sets:
SJoe = {plan, safe}
SJim = {shoot, scary, drive}

Dealing with Hard Problems

Exponential time algorithms for small inputs. E.g., (100/99)^n time is not bad for n < 1000.
Polynomial time algorithms for some (e.g., average-case) inputs
Polynomial time algorithms for all inputs, but which return approximate solutions


Approximation Algorithms

An algorithm A is α-approximate if, on any input of size n:
the cost CA of the solution produced by the algorithm, and
the cost COPT of the optimal solution
are such that CA ≤ α · COPT
We will see:
2-approximation algorithm for TSP in the plane
ln(m)-approximation algorithm for Set Cover

Comments on Approximation

CA ≤ α · COPT makes sense only for minimization problems
For maximization problems, replace it by CA ≥ (1/α) · COPT
Additive approximation, CA ≤ COPT + α, also makes sense, although it is difficult to achieve


2-approximation for TSP

Compute MST T
    An edge between any pair of points
    Weight = distance between endpoints
Compute a tree-walk W of T
    Each edge visited twice
Convert W into a cycle C using shortcuts
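A small Python sketch of this 2-approximation (my own illustration; the point list and the Prim-style MST routine are assumptions, not lecture code):

import math

def tsp_2_approx(points):
    # 2-approximate metric TSP: MST by Prim's algorithm + preorder walk,
    # which applies the shortcuts implicitly.
    n = len(points)
    dist = lambda i, j: math.dist(points[i], points[j])

    in_tree = [False] * n
    best = [math.inf] * n
    parent = [None] * n
    best[0] = 0.0
    children = {i: [] for i in range(n)}
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and dist(u, v) < best[v]:
                best[v], parent[v] = dist(u, v), u

    # Preorder walk of the MST = the tree-walk W with shortcuts applied.
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour + [0]                    # close the cycle

print(tsp_2_approx([(0, 0), (0, 1), (1, 1), (1, 0)]))   # [0, 1, 2, 3, 0]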


2-approximation: Proof

Let COPT be the optimal cycle
Cost(T) ≤ Cost(COPT)
    Removing an edge from COPT gives a spanning tree, and T is a spanning tree of minimum cost
Cost(W) = 2 Cost(T)
    Each edge visited twice
Cost(C) ≤ Cost(W)
    Triangle inequality
⇒ Cost(C) ≤ 2 Cost(COPT)

Approximation for Set Cover

Greedy algorithm:
Initialize C ← ∅
Repeat until all elements are covered:
    Choose the Si which contains the largest number of yet-not-covered elements
    Add i to C
    Mark all elements in Si as covered
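The greedy rule in Python (a minimal sketch of the pseudocode above, not lecture code; the dictionary input format is an assumption):

def greedy_set_cover(X, sets):
    # sets maps an index/name to a subset of X; assumes the union of all
    # sets equals X, so the loop terminates.
    uncovered = set(X)
    C = []
    while uncovered:
        # Choose the set covering the most yet-not-covered elements.
        i = max(sets, key=lambda j: len(sets[j] & uncovered))
        C.append(i)
        uncovered -= sets[i]
    return C

X = {1, 2, 3, 4, 5, 6}
S = {1: {1, 2}, 2: {3, 4}, 3: {5, 6}, 4: {1, 3, 5}}
print(greedy_set_cover(X, S))   # [4, 1, 2, 3] -- matches the example that follows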

Greedy Algorithm: Example

X = {1, 2, 3, 4, 5, 6}
Sets:
S1 = {1, 2}
S2 = {3, 4}
S3 = {5, 6}
S4 = {1, 3, 5}
Algorithm picks C = {4, 1, 2, 3}
Not optimal!

ln(m)-approximation

Notation:
COPT = optimal cover
k = |COPT|
Fact: At any iteration of the algorithm, there exists an Sj which contains ≥ a 1/k fraction of the yet-not-covered elements
Proof: by contradiction.
If all sets cover < a 1/k fraction of the yet-not-covered elements, there is no way to cover them using k sets
But COPT does exactly that!
Therefore, at each iteration greedy covers ≥ a 1/k fraction of the yet-not-covered elements

ln(m)-approximation

Let ui be the number of yet-not-covered elements at the end of step i = 0, 1, 2, …
We have
ui+1 ≤ ui (1 − 1/k)
u0 = m
Therefore, after t = k ln m steps, we have
ut ≤ u0 (1 − 1/k)^t = m (1 − 1/k)^{k ln m} < m · 1/e^{ln m} = 1
I.e., all elements are covered by the k ln m sets chosen by the greedy algorithm
Opt size is k ⇒ greedy is ln(m)-approximate

Approximation Algorithms

Very rich area
Algorithms use greedy, linear programming, dynamic programming, …
E.g., 1.01-approximate TSP in the plane
Sometimes one can show that approximating a problem is as hard as finding the exact solution!
E.g., 0.99 ln(m)-approximate Set Cover
