Data Structure and Algorithms - Four

CS514: Design and Analysis of Algorithms
Quicksort
Analyzing Quicksort
• What will be the worst case for the algorithm?
  - Partition is always unbalanced
• What will be the best case for the algorithm?
  - Partition is perfectly balanced
• Which is more likely?
  - The latter, by far, except...
• Will any particular input elicit the worst case?
  - Yes: Already-sorted input
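The gap between the two cases can be made concrete by counting comparisons. The sketch below is only an illustration: it assumes a simplified quicksort that always pivots on the first element, and the helper name count_comparisons is hypothetical (it does not come from the slides).

```python
import random

def count_comparisons(a):
    """Comparisons made by a first-element-pivot quicksort on a (assumed scheme)."""
    if len(a) <= 1:
        return 0
    pivot, rest = a[0], a[1:]
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    # Each partition step compares every non-pivot element with the pivot once.
    return len(rest) + count_comparisons(left) + count_comparisons(right)

n = 200
sorted_input = list(range(n))
shuffled_input = sorted_input[:]
random.Random(0).shuffle(shuffled_input)  # fixed seed so the run is repeatable

worst = count_comparisons(sorted_input)      # every split is 0 : n-1
typical = count_comparisons(shuffled_input)  # a mix of good and bad splits
print(worst, typical)
```

On already-sorted input every split is as unbalanced as possible, so the count is exactly n(n-1)/2; the shuffled input should come out far lower.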
Improving Quicksort
• The real liability of quicksort is that it runs in O(n^2) on already-sorted input
• The book discusses two solutions:
  - Randomize the input array, OR
  - Pick a random pivot element
• How will these solve the problem?
  - By ensuring that no particular input can be chosen to make quicksort run in O(n^2) time
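The first solution, randomizing the input array, can be sketched as follows. This is a minimal illustration under stated assumptions: the quicksort here is a simplified first-element-pivot version, and the names quicksort_depth and shuffled_quicksort_depth are hypothetical. Recursion depth is reported as a rough proxy for how unbalanced the splits were.

```python
import random

def quicksort_depth(a):
    """Sort with a first-element pivot; return (sorted list, recursion depth)."""
    if len(a) <= 1:
        return list(a), 0
    pivot, rest = a[0], a[1:]
    left, depth_left = quicksort_depth([x for x in rest if x < pivot])
    right, depth_right = quicksort_depth([x for x in rest if x >= pivot])
    return left + [pivot] + right, 1 + max(depth_left, depth_right)

def shuffled_quicksort_depth(a, rng=random):
    """Shuffle a copy of the input first, then run the same deterministic sort."""
    b = list(a)
    rng.shuffle(b)
    return quicksort_depth(b)

already_sorted = list(range(300))
print(quicksort_depth(already_sorted)[1])           # depth on sorted input
print(shuffled_quicksort_depth(already_sorted)[1])  # depth after shuffling
```

After shuffling, the running time depends on the random permutation rather than on the order in which the caller supplied the data, which is the point made on the slide.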
Analyzing Quicksort: Average Case
• Assuming random input, average-case running time is much closer to O(n lg n) than O(n^2)
• First, a more intuitive explanation/example:
  - Suppose that partition() always produces a 9-to-1 split. This looks quite unbalanced!
  - The recurrence is then T(n) = T(9n/10) + T(n/10) + Θ(n); the recursion tree has depth log_{10/9} n = Θ(lg n) with at most Θ(n) work per level, so the total is still O(n lg n)
• Intuitively, a real-life run of quicksort will produce a mix of "bad" and "good" splits
  - Randomly distributed among the recursion tree
  - Pretend for intuition that they alternate between best-case (n/2 : n/2) and worst-case (n-1 : 1)
  - What happens if we bad-split the root node, then good-split the resulting size (n-1) node?
  - We end up with three subarrays of sizes 1, (n-1)/2, (n-1)/2, at a combined split cost of n + (n-1) = 2n - 1 = O(n)
• Intuitively, the O(n) cost of a bad split (or 2 or 3 bad splits) can be absorbed into the O(n) cost of each good split
• Thus running time of alternating bad and good splits is still O(n lg n), with slightly higher constants
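The 9-to-1 intuition can be checked numerically. The sketch below assumes that partitioning n elements costs exactly n (a stand-in for the linear partition cost) and that every split sends roughly one tenth of the elements to one side; the function name cost_9_to_1 is hypothetical.

```python
from functools import lru_cache
from math import log2

@lru_cache(maxsize=None)
def cost_9_to_1(n):
    """Total partitioning cost when every split is (roughly) 9-to-1.

    Assumption: partitioning n elements costs exactly n.
    """
    if n <= 1:
        return 0
    small = max(1, n // 10)   # the "1" side of the 9-to-1 split
    large = n - small         # the "9" side
    return n + cost_9_to_1(small) + cost_9_to_1(large)

for n in (10**3, 10**4, 10**5):
    ratio = cost_9_to_1(n) / (n * log2(n))
    print(n, round(ratio, 2))  # cost relative to n lg n
```

If the unbalanced split really were quadratic, the printed ratio would grow roughly like n / lg n; instead it should stay within a small constant factor.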
• How can we be more rigorous?
Randomized PARTITION
RANDOMIZED-PARTITION(A, p, r)
    i ← RANDOM(p, r)
    exchange A[p] ↔ A[i]
    return PARTITION(A, p, r)
Randomized Quicksort
RANDOMIZED-QUICKSORT(A, p, r)
    if p < r
        then q ← RANDOMIZED-PARTITION(A, p, r)
             RANDOMIZED-QUICKSORT(A, p, q)
             RANDOMIZED-QUICKSORT(A, q + 1, r)
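A runnable version of the pseudocode might look like the sketch below. The slides do not show PARTITION itself, so this assumes a Hoare-style partition around A[p], which is consistent with recursing on (p, q) and (q + 1, r); the Python function names are illustrative.

```python
import random

def hoare_partition(a, p, r):
    """Assumed PARTITION: Hoare-style split of a[p..r] around the pivot a[p].

    Returns j with p <= j < r such that every element of a[p..j] is <= every
    element of a[j+1..r].
    """
    x = a[p]
    i, j = p - 1, r + 1
    while True:
        j -= 1
        while a[j] > x:      # scan from the right for an element <= pivot
            j -= 1
        i += 1
        while a[i] < x:      # scan from the left for an element >= pivot
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j

def randomized_partition(a, p, r):
    i = random.randint(p, r)   # RANDOM(p, r), both ends inclusive
    a[p], a[i] = a[i], a[p]    # move the random pivot to the front
    return hoare_partition(a, p, r)

def randomized_quicksort(a, p=0, r=None):
    """Sort a[p..r] in place; defaults cover the whole list."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = randomized_partition(a, p, r)
        randomized_quicksort(a, p, q)
        randomized_quicksort(a, q + 1, r)

data = [5, 2, 9, 1, 5, 6]
randomized_quicksort(data)
print(data)
```

Swapping the random element into position p before partitioning keeps the partition routine itself unchanged, which is the design the pseudocode uses.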
• For simplicity, assume:
  - All inputs distinct (no repeats)
  - Slightly different partition() procedure
    - partition around a random element, which is not included in subarrays
    - all splits (0:n-1, 1:n-2, 2:n-3, … , n-1:0) equally likely
• What is the probability of a particular split happening?
  - Answer: 1/n
• So partition generates splits (0:n-1, 1:n-2, 2:n-3, … , n-2:1, n-1:0) each with probability 1/n
• If T(n) is the expected running time,
\[
T(n) = \frac{1}{n} \sum_{k=0}^{n-1} \bigl[ T(k) + T(n-1-k) \bigr] + \Theta(n)
\]
• What is each term under the summation for?
• What is the Θ(n) term for?
• So…
\begin{align*}
T(n) &= \frac{1}{n} \sum_{k=0}^{n-1} \bigl[ T(k) + T(n-1-k) \bigr] + \Theta(n) \\
     &= \frac{2}{n} \sum_{k=0}^{n-1} T(k) + \Theta(n) && \text{each $T(k)$ appears twice in the sum}
\end{align*}
• We can solve this recurrence using the dreaded substitution method
  - Guess the answer
    - What's the answer? T(n) = O(n lg n)
  - Assume that the inductive hypothesis holds
    - What's the inductive hypothesis? T(n) ≤ an lg n + b for some constants a and b
  - Substitute it in for some value < n
    - What value? The value k in the recurrence
  - Prove that it follows for n
    - Grind through it…
\begin{align*}
T(n) &= \frac{2}{n} \sum_{k=0}^{n-1} T(k) + \Theta(n) && \text{The recurrence to be solved} \\
     &\le \frac{2}{n} \sum_{k=0}^{n-1} (ak \lg k + b) + \Theta(n) && \text{Plug in inductive hypothesis} \\
     &\le \frac{2}{n} \Bigl[ b + \sum_{k=1}^{n-1} (ak \lg k + b) \Bigr] + \Theta(n) && \text{Expand out the $k = 0$ case} \\
     &= \frac{2}{n} \sum_{k=1}^{n-1} (ak \lg k + b) + \frac{2b}{n} + \Theta(n) && \text{$2b/n$ is just a constant, so fold it into $\Theta(n)$} \\
     &= \frac{2}{n} \sum_{k=1}^{n-1} (ak \lg k + b) + \Theta(n) && \text{Note: leaving the same recurrence as the book}
\end{align*}
\begin{align*}
T(n) &\le \frac{2}{n} \sum_{k=1}^{n-1} (ak \lg k + b) + \Theta(n) && \text{The recurrence to be solved} \\
     &= \frac{2}{n} \sum_{k=1}^{n-1} ak \lg k + \frac{2}{n} \sum_{k=1}^{n-1} b + \Theta(n) && \text{Distribute the summation} \\
     &= \frac{2a}{n} \sum_{k=1}^{n-1} k \lg k + \frac{2b}{n}(n-1) + \Theta(n) && \text{Evaluate the summation: $b + b + \dots + b = b(n-1)$} \\
     &\le \frac{2a}{n} \sum_{k=1}^{n-1} k \lg k + 2b + \Theta(n) && \text{Since $n - 1 < n$, $2b(n-1)/n < 2b$}
\end{align*}
This summation gets its own set of slides later
\begin{align*}
T(n) &\le \frac{2a}{n} \sum_{k=1}^{n-1} k \lg k + 2b + \Theta(n) && \text{The recurrence to be solved} \\
     &\le \frac{2a}{n} \Bigl( \frac{1}{2} n^2 \lg n - \frac{1}{8} n^2 \Bigr) + 2b + \Theta(n) && \text{We'll prove this later} \\
     &= an \lg n - \frac{a}{4} n + 2b + \Theta(n) && \text{Distribute the $(2a/n)$ term} \\
     &= an \lg n + b + \Bigl( \Theta(n) + b - \frac{a}{4} n \Bigr) && \text{Remember, our goal is to get $T(n) \le an \lg n + b$} \\
     &\le an \lg n + b && \text{Pick $a$ large enough that $an/4$ dominates $\Theta(n) + b$}
\end{align*}
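One way to see that a large enough a exists: assume, purely for illustration, that the Θ(n) term is at most cn for all n ≥ 1 and that b ≥ 0 (the constant c is not named in the slides). Then the condition that an/4 dominates Θ(n) + b can be worked out explicitly:

```latex
\begin{align*}
\Theta(n) + b - \frac{a}{4} n
  &\le cn + b - \frac{a}{4} n   && \text{assumed bound on the $\Theta(n)$ term} \\
  &\le (c + b)\, n - \frac{a}{4} n && \text{since $b \le bn$ for $n \ge 1$} \\
  &\le 0                         && \text{whenever $a \ge 4(c + b)$}
\end{align*}
```

So any a ≥ 4(c + b) works under these assumptions, and the constant only needs to be chosen once, independent of n.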
• So T(n) ≤ an lg n + b for certain a and b
  - Thus the induction holds
  - Thus T(n) = O(n lg n)
  - Thus quicksort runs in O(n lg n) time on average
Tightly Bounding the Key Summation
\begin{align*}
\sum_{k=1}^{n-1} k \lg k
  &= \sum_{k=1}^{\lceil n/2 \rceil - 1} k \lg k + \sum_{k=\lceil n/2 \rceil}^{n-1} k \lg k && \text{Split the summation for a tighter bound} \\
  &\le \sum_{k=1}^{\lceil n/2 \rceil - 1} k \lg k + \sum_{k=\lceil n/2 \rceil}^{n-1} k \lg n && \text{The $\lg k$ in the second term is bounded by $\lg n$} \\
  &= \sum_{k=1}^{\lceil n/2 \rceil - 1} k \lg k + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k && \text{Move the $\lg n$ outside the summation}
\end{align*}
Tightly Bounding the Key Summation
\begin{align*}
\sum_{k=1}^{n-1} k \lg k
  &\le \sum_{k=1}^{\lceil n/2 \rceil - 1} k \lg k + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k && \text{The summation bound so far} \\
  &\le \sum_{k=1}^{\lceil n/2 \rceil - 1} k \lg (n/2) + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k && \text{The $\lg k$ in the first term is bounded by $\lg (n/2)$} \\
  &= \sum_{k=1}^{\lceil n/2 \rceil - 1} k (\lg n - 1) + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k && \text{$\lg (n/2) = \lg n - 1$} \\
  &= (\lg n - 1) \sum_{k=1}^{\lceil n/2 \rceil - 1} k + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k && \text{Move $(\lg n - 1)$ outside the summation}
\end{align*}
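Tightly Bounding the Key Summation
\begin{align*}
\sum_{k=1}^{n-1} k \lg k
  &\le (\lg n - 1) \sum_{k=1}^{\lceil n/2 \rceil - 1} k + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k && \text{The summation bound so far} \\
  &= \lg n \sum_{k=1}^{\lceil n/2 \rceil - 1} k - \sum_{k=1}^{\lceil n/2 \rceil - 1} k + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k && \text{Distribute the $(\lg n - 1)$} \\
  &= \lg n \sum_{k=1}^{n-1} k - \sum_{k=1}^{\lceil n/2 \rceil - 1} k && \text{The two $\lg n$ summations cover adjacent ranges; combine them} \\
  &= \lg n \Bigl( \frac{(n-1)\,n}{2} \Bigr) - \sum_{k=1}^{\lceil n/2 \rceil - 1} k && \text{The Gaussian series}
\end{align*}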
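Tightly Bounding the Key Summation
\begin{align*}
\sum_{k=1}^{n-1} k \lg k
  &\le \Bigl( \frac{(n-1)\,n}{2} \Bigr) \lg n - \sum_{k=1}^{\lceil n/2 \rceil - 1} k && \text{The summation bound so far} \\
  &\le \frac{1}{2} n(n-1) \lg n - \sum_{k=1}^{n/2 - 1} k && \text{Rearrange first term, place upper bound on second} \\
  &= \frac{1}{2} n(n-1) \lg n - \frac{1}{2} \Bigl( \frac{n}{2} \Bigr) \Bigl( \frac{n}{2} - 1 \Bigr) && \text{Gaussian series} \\
  &= \frac{1}{2} \bigl( n^2 \lg n - n \lg n \bigr) - \frac{1}{8} n^2 + \frac{n}{4} && \text{Multiply it all out}
\end{align*}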
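Tightly Bounding the Key Summation
\begin{align*}
\sum_{k=1}^{n-1} k \lg k
  &\le \frac{1}{2} n^2 \lg n - \frac{1}{2} n \lg n - \frac{1}{8} n^2 + \frac{n}{4} \\
  &\le \frac{1}{2} n^2 \lg n - \frac{1}{8} n^2 \quad \text{when } n \ge 2
\end{align*}
Done!!!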
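The final bound, Σ k lg k ≤ (1/2) n^2 lg n - (1/8) n^2 for n ≥ 2, is easy to sanity-check numerically. This is only a spot check over a few values, not a substitute for the proof; key_sum and key_bound are hypothetical helper names.

```python
from math import log2

def key_sum(n):
    """Left-hand side: sum of k * lg k for k = 1 .. n-1."""
    return sum(k * log2(k) for k in range(1, n))

def key_bound(n):
    """Right-hand side: (1/2) n^2 lg n - (1/8) n^2."""
    return 0.5 * n * n * log2(n) - 0.125 * n * n

for n in (2, 3, 10, 100, 1000):
    print(n, round(key_sum(n), 2), round(key_bound(n), 2), key_sum(n) <= key_bound(n))
```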
