
CSD 205 – Design and Analysis of Algorithms

Instructor: Dr. M. Hasan Jamal


Lecture# 03: Sorting Algorithms

Types of Sorting Algorithms
• Non-recursive/incremental comparison sorting
  • Selection sort
  • Bubble sort
  • Insertion sort
• Recursive comparison sorting
  • Merge sort
  • Quick sort
  • Heap sort
• Non-comparison linear sorting
  • Count sort
  • Radix sort
  • Bucket sort
Selection Sort
• Idea:
• Find the smallest element in the array
• Exchange it with the element in the first position
• Find the second smallest element and exchange it with the
element in the second position
• Continue until the array is sorted

Initial:      8 4 6 9 2 3 1
After pass 1: 1 4 6 9 2 3 8
After pass 2: 1 2 6 9 4 3 8
After pass 3: 1 2 3 9 4 6 8
After pass 4: 1 2 3 4 9 6 8
After pass 5: 1 2 3 4 6 9 8
After pass 6: 1 2 3 4 6 8 9
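
The exchange-based idea above maps directly to code. A minimal Python sketch (the function name is ours; assumes a 0-indexed list, sorted in place):

def selection_sort(a):
    """Repeatedly select the smallest remaining element and swap it into place."""
    n = len(a)
    for j in range(n - 1):
        smallest = j
        for i in range(j + 1, n):
            if a[i] < a[smallest]:
                smallest = i
        a[j], a[smallest] = a[smallest], a[j]  # exchange A[j] <-> A[smallest]
    return a

print(selection_sort([8, 4, 6, 9, 2, 3, 1]))  # [1, 2, 3, 4, 6, 8, 9]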
Selection Sort Analysis
Alg.: SELECTION-SORT(A)                  Cost   Steps
  n ← length[A]                          c1     1
  for j ← 0 to n-2                       c2     n-1
    smallest ← j                         c3     n-1
    for i ← j+1 to n-1                   c4     Σ_{j=0}^{n-2} t_j
      if A[i] < A[smallest]              c5     Σ_{j=0}^{n-2} t_j
        smallest ← i                     c6     Σ_{j=0}^{n-2} t_j
    exchange A[j] ↔ A[smallest]          c7     n-1

t_j: # of times the inner for loop statement is executed at iteration j

T(n) = c1 + c2(n-1) + c3(n-1) + c4·Σ_{j=0}^{n-2} t_j + c5·Σ_{j=0}^{n-2} t_j + c6·Σ_{j=0}^{n-2} t_j + c7(n-1)
Best/Worst/Average Case Analysis
The inner loop always scans to the end of the array, so t_j = n-1-j for every input:

Σ_{j=0}^{n-2} t_j = Σ_{j=0}^{n-2} (n-1-j) = n(n-1)/2

T(n) = c1 + c2(n-1) + c3(n-1) + (c4 + c5 + c6)·n(n-1)/2 + c7(n-1)
     = an² + bn + c

T(n) = Θ(n²)

Disadvantage:
Θ(n²) running time in all cases (best, worst, and average)
Bubble Sort
Given a list of N elements, repeat the following steps N-1 times:
• For each pair of adjacent numbers, if the number on the left is greater than the number on the right, swap them.
• Each pass "bubbles" the largest remaining value to the end using pair-wise comparisons and swapping.

Initial:       12  6 22 14  8 17
After pass 1:   6 12 14  8 17 22
After pass 2:   6 12  8 14 17 22
After pass 3:   6  8 12 14 17 22   (sorted)
Bubble Sort Analysis
BubbleSort(A)                      cost   times
1. n = Length[A]                   c1     1
2. for j = 0 to n-2                c2     n-1
3.   for i = 0 to n-j-2            c3     Σ_{j=0}^{n-2} t_j
4.     if A[i] > A[i+1]            c4     Σ_{j=0}^{n-2} t_j
5.       temp = A[i]               c5     Σ_{j=0}^{n-2} t_j
6.       A[i] = A[i+1]             c6     Σ_{j=0}^{n-2} t_j
7.       A[i+1] = temp             c7     Σ_{j=0}^{n-2} t_j
8. return A                        c8     1

t_j: # of times the inner for loop statement is executed at iteration j

T(n) = c1 + c2(n-1) + (c3 + c4 + c5 + c6 + c7)·Σ_{j=0}^{n-2} t_j + c8
Best/Worst/Average Case Analysis
• Since we don't know at which iteration the list becomes completely sorted, the nested loops always execute completely:

Σ_{j=0}^{n-2} t_j = n(n-1)/2

T(n) = c1 + c2(n-1) + (c3 + c4 + c5 + c6 + c7)·n(n-1)/2 + c8
     = an² + bn + c   (a quadratic function of n)

• T(n) = Θ(n²): the order of growth is n²
Bubble Sort Modified
BubbleSort(A)
1. n = Length[A]
2. for j = 0 to n-2
3.   swap = 0
4.   for i = 0 to n-j-2
5.     if A[i] > A[i+1]
6.       temp = A[i]
7.       A[i] = A[i+1]
8.       A[i+1] = temp
9.       swap = swap + 1
10.  if swap = 0
11.    break
12. return A
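
For reference, a runnable Python version of the modified pseudocode (naming is ours; the swap counter provides the early exit on an already-sorted list):

def bubble_sort(a):
    """Bubble sort with the early-exit swap counter from the modified algorithm."""
    n = len(a)
    for j in range(n - 1):
        swaps = 0
        for i in range(n - j - 1):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swaps += 1
        if swaps == 0:  # no swaps in a full pass: list is sorted, stop early
            break
    return a

print(bubble_sort([12, 6, 22, 14, 8, 17]))  # [6, 8, 12, 14, 17, 22]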
Best Case Analysis (Modified)
• The array is already sorted
• Keep track of the number of swaps done in an iteration
• Can finish early if no swapping occurs
• Only one pass runs: n-1 comparisons, no swaps, then the break fires

T(n) = c1 + c2(n-1) + c3(n-1) + c4(n-1) + c8
     = (c2 + c3 + c4)n + (c1 - c2 - c3 - c4 + c8)
     = an + b = Θ(n)
Worst Case Analysis (Modified)
• The array is in reverse sorted order
• Always A[i] > A[i+1], so every comparison causes a swap and the early exit never triggers

Σ_{j=0}^{n-2} t_j = n(n-1)/2

T(n) = c1 + c2(n-1) + (c3 + c4 + c5 + c6 + c7)·n(n-1)/2 + c8
     = an² + bn + c   (a quadratic function of n)

T(n) = Θ(n²): the order of growth is n²

Advantages:
• Simple to code
• Requires little memory (in-place)

Disadvantages:
• Θ(n²) running time in worst & average case
• Nobody "EVER" uses bubble sort (EXTREMELY INEFFICIENT)
Insertion Sort
• Idea: like sorting a hand of playing cards
  • Start with an empty left hand and the cards face down on the table.
  • Remove one card at a time from the table and insert it into the correct position in the left hand
    • compare it with each card already in the hand, from right to left
  • The cards held in the left hand are sorted
    • these cards were originally the top cards of the pile on the table
Insertion Sort
InsertionSort(A, n)
1. for i = 1 to n-1
2.   key = A[i]
3.   j = i - 1
4.   while (j ≥ 0) and (A[j] > key)
5.     A[j+1] = A[j]
6.     j = j - 1
7.   A[j+1] = key

Trace (the array, indices 0-9, after each iteration i of the outer loop):

Initial:            2.78 7.42 0.56 1.12 1.17 0.32 6.21 4.42 3.14 7.71
i = 1 (key = 7.42): 2.78 7.42 0.56 1.12 1.17 0.32 6.21 4.42 3.14 7.71   (no shift)
i = 2 (key = 0.56): 0.56 2.78 7.42 1.12 1.17 0.32 6.21 4.42 3.14 7.71
i = 3 (key = 1.12): 0.56 1.12 2.78 7.42 1.17 0.32 6.21 4.42 3.14 7.71
i = 4 (key = 1.17): 0.56 1.12 1.17 2.78 7.42 0.32 6.21 4.42 3.14 7.71
i = 5 (key = 0.32): 0.32 0.56 1.12 1.17 2.78 7.42 6.21 4.42 3.14 7.71
i = 6 (key = 6.21): 0.32 0.56 1.12 1.17 2.78 6.21 7.42 4.42 3.14 7.71
i = 7 (key = 4.42): 0.32 0.56 1.12 1.17 2.78 4.42 6.21 7.42 3.14 7.71
i = 8 (key = 3.14): 0.32 0.56 1.12 1.17 2.78 3.14 4.42 6.21 7.42 7.71
i = 9 (key = 7.71): 0.32 0.56 1.12 1.17 2.78 3.14 4.42 6.21 7.42 7.71   (no shift; done)
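
The same procedure in runnable Python (our function name; 0-indexed, in place, matching the pseudocode above):

def insertion_sort(a):
    """Grow a sorted prefix; insert each key by shifting larger elements right."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:  # scan the sorted prefix right-to-left
            a[j + 1] = a[j]           # shift the larger element one slot right
            j -= 1
        a[j + 1] = key
    return a

print(insertion_sort([2.78, 7.42, 0.56, 1.12, 1.17, 0.32, 6.21, 4.42, 3.14, 7.71]))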


Insertion Sort Analysis
InsertionSort(A, n) {                      cost   times
  for i = 1 to n-1 {                       c1     n-1
    key = A[i]                             c2     n-1
    j = i - 1                              c3     n-1
    while (j ≥ 0) and (A[j] > key) {       c4     Σ_{i=1}^{n-1} t_i
      A[j+1] = A[j]                        c5     Σ_{i=1}^{n-1} t_i
      j = j - 1                            c6     Σ_{i=1}^{n-1} t_i
    }
    A[j+1] = key                           c7     n-1
  }
}

t_i: # of times the while statement is executed at iteration i

T(n) = c1(n-1) + c2(n-1) + c3(n-1) + c4·Σ_{i=1}^{n-1} t_i + c5·Σ_{i=1}^{n-1} t_i + c6·Σ_{i=1}^{n-1} t_i + c7(n-1)
Best Case Analysis
"while (j ≥ 0) and (A[j] > key)"
• The array is already sorted in ascending order
• The inner loop body is never executed (the test A[j] > key fails immediately)
• The number of moves: 2(n-1)   (each key is copied out and back)
• The number of key comparisons: n-1

T(n) = c1(n-1) + c2(n-1) + c3(n-1) + c4(n-1) + c7(n-1)
     = (c1 + c2 + c3 + c4 + c7)n - (c1 + c2 + c3 + c4 + c7)
     = an + b = Θ(n)
Worst Case Analysis
"while (j ≥ 0) and (A[j] > key)"
• The array is in reverse sorted order
• The number of moves: 2(n-1) + (1 + 2 + ... + (n-1)) = 2(n-1) + n(n-1)/2
• The number of key comparisons: 1 + 2 + ... + (n-1) = n(n-1)/2
• A[j] > key always holds in the while-loop test
• The key has to be compared with all i elements to its left ⇒ t_i = i

Σ_{i=1}^{n-1} t_i = n(n-1)/2

T(n) = c1(n-1) + c2(n-1) + c3(n-1) + (c4 + c5 + c6)·n(n-1)/2 + c7(n-1)
     = an² + bn + c   (a quadratic function of n)

T(n) = Θ(n²): the order of growth is n²
Insertion Sort Summary
• Running time depends not only on the size of the array but also on its contents.

• Average case: Θ(n²)
  • We have to look at all possible initial data organizations.

• Advantages
  • Good running time for "almost sorted" arrays: Θ(n)

• Disadvantages
  • Θ(n²) running time in worst and average case
  • ≈ n²/2 comparisons and exchanges
Types of Sorting Algorithms
• Non-recursive/incremental comparison sorting
  • Selection sort
  • Bubble sort
  • Insertion sort
• Recursive comparison sorting
  • Merge sort
  • Quick sort
  • Heap sort
• Non-comparison linear sorting
  • Count sort
  • Radix sort
  • Bucket sort
Recursive Comparison Sorting
There are many ways to design algorithms:

Insertion/bubble/selection sort use an incremental approach:
• having sorted the subarray A[i..j],
• we insert the single element A[j+1] into its proper place to get a sorted array A[i..j+1]

An alternative design approach is "Divide and Conquer":
• Divide the problem into a number of sub-problems.
• Conquer the sub-problems by solving them recursively. If the sub-problem sizes are small enough, just solve them in a straightforward manner.
• Combine the solutions of the sub-problems into the solution for the original problem.
Merge Sort
• Merge sort: orders a list of values by recursively dividing the list in half
until each sub-list has one element, then recombining
• Invented by John von Neumann in 1945
• More specifically:
  • Divide the n-element sequence to be sorted into two subsequences of n/2 elements each.
  • Conquer: sort the two subsequences recursively.
  • Combine: merge the two sorted subsequences to produce the sorted answer.

mergeSort(0, n/2-1) sorts the left half, mergeSort(n/2, n-1) sorts the right half, and merge(0, n/2, n-1) combines them.
Merge Sort
Base case: when the sequence to be sorted has length 1.

Unsorted: 66 108 56 14 89 12 34

Divide:             [66 108 56 14]  [89 12 34]
Divide:             [66 108]  [56 14]  [89 12]  [34]
Divide (base case): [66]  [108]  [56]  [14]  [89]  [12]  [34]
Merge:              [66 108]  [14 56]  [12 89]  [34]
Merge:              [14 56 66 108]  [12 34 89]
Merge:              [12 14 34 56 66 89 108]

Sorted: 12 14 34 56 66 89 108
Merge Function
• Given two sorted arrays, the merge operation produces a sorted array with all the elements of the two arrays
• A temporary array is used to store the sorted elements from the two arrays

A: 6 13 18 21      B: 4 8 9 20

C: 4 6 8 9 13 18 20 21

• Running time of merge: O(n), where n is the number of elements in the merged array.
Merge Sort Pseudo Code
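
The slides' pseudocode figures are not reproduced here; below is a minimal Python sketch of the scheme just described (function names are ours; returns a new list rather than sorting in place):

def merge(left, right):
    """Merge two sorted lists into one sorted list in O(n) time."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])   # at most one of these two
    out.extend(right[j:])  # tails is non-empty
    return out

def merge_sort(a):
    """Base case: length <= 1. Otherwise split in half, sort halves, merge."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))

print(merge_sort([66, 108, 56, 14, 89, 12, 34]))  # [12, 14, 34, 56, 66, 89, 108]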
Merge Sort Analysis

Let T(n) be the time taken by this algorithm to sort an array of n elements by dividing A into sub-arrays A1 and A2. The merge operation takes linear time. Consequently,

T(n) = T(⌈n/2⌉) + T(⌊n/2⌋) + Θ(n)
T(n) = 2T(n/2) + Θ(n)

The above recurrence relation is non-homogeneous and can be solved by any of these methods:
• Substitution
• Recursion tree
• Master method
Merge Sort Analysis
Level 0: merge n items:                  O(n)
Level 1: merge two lists of n/2 items:   O(n)
Level 2: merge four lists of n/4 items:  O(n)
...
Bottom level: n lists of 1 item each

Each level requires O(n) operations; tree height: log2 n

Each level O(n) operations & O(log2 n) levels ⇒ O(n·log2 n)
Merge Sort Analysis
• Worst case: O(n·log2 n)
• Average case: O(n·log2 n)

• Performance is independent of the initial order of the array items.

• Advantage:
  • Merge sort is an extremely fast algorithm.

• Disadvantage:
  • Merge sort requires a second array as large as the original array.
Quick sort
• Quick Sort: orders a list of values by partitioning the list
around one element called a pivot, then sorting each
partition.

• Key Points:
• choose one element in the list to be the pivot (= partition
element)
• organize the elements so that all elements less than the
pivot are to its left and all greater are to its right
• apply the quick sort algorithm (recursively) to both
partitions
Quicksort Outline
• Divide and conquer approach
• Given array S to be sorted
• If size of S ≤ 1 then done
• Pick any element v in S as the pivot
• Partition S - {v} (remaining elements in S) into two groups
  • S1 = {all elements in S - {v} that are smaller than v}
  • S2 = {all elements in S - {v} that are larger than v}
• Return {quicksort(S1), followed by v, followed by quicksort(S2)}
• The trick lies in handling the partitioning:
  • picking a good pivot
  • efficiently partitioning in-place
Quick sort illustrated
pick a pivot: 40 18 37 2 10 32 6 35 12 17   (pivot = 17)

partition:    [6 10 12 2]   17   [40 37 18 32 35]

quicksort:    [2 6 10 12]   17   [18 32 35 37 40]

combine:      2 6 10 12 17 18 32 35 37 40
Pseudocode

What is the running time of partition()?
⇒ partition() runs in O(n) time
Partitioning
• To partition a[left...right]:
1. Set p = a[left], l = left + 1, r = right;
2. while l < r, do
2.1. while l < right & a[l] < p, set l = l + 1
2.2. while r > left & a[r] >= p, set r = r - 1
2.3. if l < r, swap a[l] and a[r]
3. Set a[left] = a[r], a[r] = p
4. Terminate

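
A Python rendering of these partitioning steps, plus the recursive driver (a sketch of the slides' scheme with pivot p = a[left]; names are ours):

def partition(a, left, right):
    """Partition a[left..right] around p = a[left]; return the pivot's final index."""
    p = a[left]
    l, r = left + 1, right
    while l < r:
        while l < right and a[l] < p:   # step 2.1: advance l past elements < p
            l += 1
        while r > left and a[r] >= p:   # step 2.2: retreat r past elements >= p
            r -= 1
        if l < r:                       # step 2.3: swap the out-of-place pair
            a[l], a[r] = a[r], a[l]
    a[left], a[r] = a[r], p             # step 3: place the pivot between the groups
    return r

def quicksort(a, left=0, right=None):
    if right is None:
        right = len(a) - 1
    if left < right:
        m = partition(a, left, right)
        quicksort(a, left, m - 1)       # elements < pivot
        quicksort(a, m + 1, right)      # elements >= pivot
    return a

print(quicksort([6, 1, 4, 9, 0, 3, 5, 2, 7, 8]))  # [0, 1, 2, ..., 9]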
Example of partitioning
• choose pivot (p = 4):  4 3 6 9 2 4 3 1 2 1 8 9 3 5 6
• search:                4 3 6 9 2 4 3 1 2 1 8 9 3 5 6   (l stops at 6, r stops at the last-but-two 3)
• swap:                  4 3 3 9 2 4 3 1 2 1 8 9 6 5 6
• search:                4 3 3 9 2 4 3 1 2 1 8 9 6 5 6   (l stops at 9, r stops at the second 1)
• swap:                  4 3 3 1 2 4 3 1 2 9 8 9 6 5 6
• search:                4 3 3 1 2 4 3 1 2 9 8 9 6 5 6   (l stops at 4, r stops at the second 2)
• swap:                  4 3 3 1 2 2 3 1 4 9 8 9 6 5 6
• search:                4 3 3 1 2 2 3 1 4 9 8 9 6 5 6   (l and r have crossed)
• swap with pivot:       1 3 3 1 2 2 3 4 4 9 8 9 6 5 6
Quicksort: Best Case Analysis
• Assume that keys are random, uniformly distributed.
• What is best case running time?
• Best case: pivot is the median
• Recursion:
1. Partition splits array in two sub-arrays of size n/2
2. Quicksort each sub-array
• Depth of recursion tree? O(log2n)
• Number of accesses in partition? O(n)

Recursion Tree for Best Case
Nodes contain the problem size; partition does n comparisons per level:

n                                          → n
n/2  n/2                                   → n
n/4  n/4  n/4  n/4                         → n
n/8  n/8  n/8  n/8  n/8  n/8  n/8  n/8     → n
...                                        → n

T(n) = 2T(n/2) + cn
T(n) = cn·log n + n = O(n log n)


Quicksort: Average Case Analysis
• Assume that keys are random, uniformly distributed.
• What is average case running time?
• Many Recursions:
• Depth of recursion tree? O(log_x n)
  • x depends on how the partition is split
• Number of accesses in partition? O(n)

⇒ O(n lg n)
Quicksort: Worst Case Analysis
• Assume that keys are random, uniformly distributed.
• What is worst case running time?
• Pivot is the smallest (or largest) element all the time.
• Recursion:
1. Partition splits array in three sub-arrays:
• one sub-array of size 0
• one sub-array with the pivot itself
• the other sub-array of size n-1
2. Quicksort each sub-array

• Depth of recursion tree? O(n)


• Number of accesses per partition? O(n)
Worst Case Intuition
Each partition call does linear work, t(n) = n-1 comparisons, and leaves sub-problems of size 0 and n-1:

n-1
0    n-2
     0    n-3
          0    n-4
               ...
               0    1

T(n)   = T(n-1) + cn
T(n-1) = T(n-2) + c(n-1)
T(n-2) = T(n-3) + c(n-2)
...
T(2)   = T(1) + 2c

T(N) = T(1) + c·Σ_{i=2}^{N} i = O(N²)
Picking the Pivot
• How would you pick one?

• Strategy 1: Pick the first element in S

• Works only if input is random

• What if input S is sorted, or even mostly sorted?


• All the remaining elements would go into either S1 or S2!
• Terrible performance!

• Why worry about sorted input?


  • Remember: Quicksort is recursive, so sub-problems could be sorted
  • Plus, mostly sorted input is quite frequent
Picking the Pivot (contd.)
• Strategy 2: Pick the pivot randomly

• Would usually work well, even for mostly sorted input

• Unless the random number generator is not quite random!

• Plus random number generation is an expensive operation

Picking the Pivot (contd.)
• Strategy 3: Median-of-three Partitioning

• Ideally, the pivot should be the median of input array S


• Median = element in the middle of the sorted sequence

• Would divide the input into two almost equal partitions

• Unfortunately, it's hard to calculate the median quickly without sorting first!

• So find the approximate median


• Pivot = median of the left-most, right-most and center element of the
array S
• Solves the problem of sorted input

Picking the Pivot (contd.)
• Example: Median-of-three Partitioning

• Let input S = {6, 1, 4, 9, 0, 3, 5, 2, 7, 8}

• left=0 and S[left] = 6

• right=9 and S[right] = 8

• center = (left+right)/2 = 4 and S[center] = 0

• Pivot
• = Median of S[left], S[right], and S[center]
• = median of 6, 8, and 0
• = S[left] = 6
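
A small sketch of this pivot selection (the helper is ours; it returns the index of the median of S[left], S[center], S[right]):

def median_of_three(s, left, right):
    """Index of the median of the left-most, center, and right-most elements."""
    center = (left + right) // 2
    trio = sorted([(s[left], left), (s[center], center), (s[right], right)])
    return trio[1][1]  # index of the middle value

s = [6, 1, 4, 9, 0, 3, 5, 2, 7, 8]
i = median_of_three(s, 0, len(s) - 1)
print(i, s[i])  # 0 6  (the median of 6, 8, and 0 is 6, at index 0)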
Partitioning Algorithm
• Original input : S = {6, 1, 4, 9, 0, 3, 5, 2, 7, 8}

• Get the pivot out of the way by swapping it with the last element:

8 1 4 9 0 3 5 2 7 6   (pivot 6 now at the end)

• Have two 'iterators', i and j:
  • i starts at the first element and moves forward
  • j starts at the last element and moves backwards
Partitioning Algorithm (contd.)
• While (i < j):
  1. Move i to the right till we find a number greater than the pivot
  2. Move j to the left till we find a number smaller than the pivot
  3. If (i < j), swap(S[i], S[j])
  (The effect is to push larger elements to the right and smaller elements to the left)
• When i and j cross, swap the pivot with S[i]
Partitioning Algorithm Illustrated
start:  8 1 4 9 0 3 5 2 7 6   (i at 8, j at 7; pivot = 6)
move:   8 1 4 9 0 3 5 2 7 6   (i stops at 8, j stops at 2)
swap:   2 1 4 9 0 3 5 8 7 6
move:   2 1 4 9 0 3 5 8 7 6   (i stops at 9, j stops at 5)
swap:   2 1 4 5 0 3 9 8 7 6
move:   2 1 4 5 0 3 9 8 7 6   (i stops at 9, j stops at 3: i and j have crossed)
swap S[i] with pivot:
        2 1 4 5 0 3 6 8 7 9
Dealing with small arrays
• For small arrays (N ≤ 20),
• Insertion sort is faster than quicksort

• Quicksort is recursive
• So it can spend a lot of time sorting small arrays

• Hybrid algorithm:
• Switch to using insertion sort when problem size is small
(say for N < 20)

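A sketch of the hybrid (the cutoff value and names are ours; partition() is the earlier sketch):

CUTOFF = 20  # assumed threshold; tune empirically

def hybrid_quicksort(a, left=0, right=None):
    """Quicksort that hands small sub-arrays to insertion sort."""
    if right is None:
        right = len(a) - 1
    if right - left + 1 <= CUTOFF:
        for i in range(left + 1, right + 1):  # insertion sort on a[left..right]
            key, j = a[i], i - 1
            while j >= left and a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = key
    else:
        m = partition(a, left, right)  # partition() from the earlier sketch
        hybrid_quicksort(a, left, m - 1)
        hybrid_quicksort(a, m + 1, right)
    return a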
Special cases
• What happens when the array contains many duplicate
elements?

• What happens when the array is already sorted (or nearly sorted) to begin with?
Quick sort: Final Comments
• If the array is sorted to begin with, Quicksort is terrible: O(n2)
• It is possible to construct other bad cases
• However, Quicksort is usually O(n log2n)
• The constants are so good that Quicksort is generally the
fastest algorithm known
• Most real-world sorting is done by Quicksort
• For optimum efficiency, the pivot must be chosen carefully
• “Median of three” is a good technique for choosing the pivot
• However, no matter what you do, there will be some cases
where Quicksort runs in O(n2) time
Heapsort
• Combines the better attributes of merge sort and insertion sort:
  • Like merge sort, but unlike insertion sort, running time is O(n lg n).
  • Like insertion sort, but unlike merge sort, sorts in place.

• Introduces an algorithm design technique:
  • use of a data structure (heap) to manage information during execution of an algorithm.

• The heap has other applications besides sorting:
  • priority queues
Special Types of Trees
• Def: Full binary tree = a binary tree in which each node is either a leaf or has degree exactly 2.

(Figure: an example full binary tree.)

• Def: Complete binary tree = a binary tree in which all leaves are on the same level and all internal nodes have degree 2.

(Figure: an example complete binary tree.)
The Heap Data Structure
• Def: A heap is a nearly complete binary tree with the following two properties:
  • Structural property: all levels are full, except possibly the last one, which is filled from left to right
  • Order (heap) property: for any node x, Parent(x) ≥ x

From the heap property, it follows that: "The root is the maximum element of the heap!"

(Figure: a heap with root 8, children 7 and 4, and leaves 5 and 2.)

A heap is a binary tree that is filled in order


(Figure: a tree with root 20, children 15 and 8; 15 has children 4 and 10, and 8 has children 7 and 9. The leaf 9 is larger than its parent 8, so this is not a heap.)

(Figure: a tree with root 25 whose last level is missing a left child: not a complete tree, hence not a heap.)


Array Representation of Heaps
• A heap can be stored as an array A.
  • Root of tree is A[1]
  • Left child of A[i] = A[2i]
  • Right child of A[i] = A[2i + 1]
  • Parent of A[i] = A[⌊i/2⌋]
  • Heapsize[A] ≤ length[A]
• The elements in the subarray A[(⌊n/2⌋+1) .. n] are leaves
Heap Types
• Max(Min)-Heap
  • For every node excluding the root, the value is at most (at least) that of its parent: the heap property.
    • A[Parent(i)] ≥ A[i]   (max-heap)
    • A[Parent(i)] ≤ A[i]   (min-heap)
  • The largest (smallest) element is stored at the root.
  • The subtree rooted at a node contains values no larger (smaller) than that at the node.
Adding/Deleting Nodes
• New nodes are always inserted at the bottom level (left to right)
• Nodes are removed from the bottom level (right to left)

Heaps – Example
Max-heap as an array:

Index: 1  2  3  4  5  6  7  8  9  10
Value: 26 24 20 18 17 19 13 12 14 11

Max-heap as a binary tree (last row filled from left to right):

            26
        24      20
      18  17  19  13
    12  14  11
Heap Properties
• Height = ⌊lg n⌋
• No. of leaves = ⌈n/2⌉
• No. of nodes of height h ≤ ⌈n/2^(h+1)⌉
Operations on Heaps
• Maintain/Restore the max-heap property
• MAX-HEAPIFY
• Create a max-heap from an unordered array
• BUILD-MAX-HEAP
• Sort an array in place
• HEAPSORT
• Priority queues

Maintaining the Heap Property
• Suppose a node is smaller than a child
• Left and Right subtrees of i are max-heaps
• To eliminate the violation:
• Exchange with larger child
• Move down the tree
• Continue until node is not smaller than children

Example
MAX-HEAPIFY(A, 2, 10)

• A[2] violates the heap property ⇒ exchange A[2] ↔ A[4]
• Now A[4] violates the heap property ⇒ exchange A[4] ↔ A[9]
• Heap property restored
Maintaining the Heap Property
Assumptions:
• LEFT(i) and RIGHT(i) are max-heaps
• A[i] may be smaller than its children

Alg: MAX-HEAPIFY(A, i, n)
1. l ← LEFT(i)
2. r ← RIGHT(i)
3. if l ≤ n and A[l] > A[i]
4.   then largest ← l
5.   else largest ← i
6. if r ≤ n and A[r] > A[largest]
7.   then largest ← r
8. if largest ≠ i
9.   then exchange A[i] ↔ A[largest]
10.       MAX-HEAPIFY(A, largest, n)

Time to fix node i and its children = Θ(1), PLUS time to fix the subtree rooted at one of i's children = T(size of subtree at largest)
Running Time for MaxHeapify(A, n)
• Intuitively, MAX-HEAPIFY follows a single root-to-leaf path, so its cost is bounded by the tree height h.
• T(n) = T(largest) + Θ(1)
• largest ≤ 2n/3 (the worst case occurs when the last row of the tree is exactly half full)
• T(n) ≤ T(2n/3) + Θ(1) ⇒ T(n) = O(lg n)
• Alternately, MaxHeapify takes O(h) where h is the height of the node where MaxHeapify is applied; since the height of the heap is ⌊lg n⌋, this is O(lg n)
Building a Heap
• Convert an array A[1 … n] into a max-heap (n = length[A])
• The elements in the subarray A[(⌊n/2⌋+1) .. n] are leaves
• Apply MAX-HEAPIFY on elements between 1 and ⌊n/2⌋

Alg: BUILD-MAX-HEAP(A)
1. n = length[A]
2. for i ← ⌊n/2⌋ downto 1
3.   do MAX-HEAPIFY(A, i, n)

A: 4 1 3 2 16 9 10 14 8 7
Example: A = 4 1 3 2 16 9 10 14 8 7

• i = 5: A[5] = 16 is not smaller than its child, no change:  4 1 3 2 16 9 10 14 8 7
• i = 4: exchange 2 ↔ 14:                                     4 1 3 14 16 9 10 2 8 7
• i = 3: exchange 3 ↔ 10:                                     4 1 10 14 16 9 3 2 8 7
• i = 2: exchange 1 ↔ 16, then 1 ↔ 7:                         4 16 10 14 7 9 3 2 8 1
• i = 1: exchange 4 ↔ 16, then 4 ↔ 14, then 4 ↔ 8:            16 14 10 8 7 9 3 2 4 1
Running Time of BUILD MAX HEAP
Alg: BUILD-MAX-HEAP(A)
1. n = length[A]
2. for i ← ⌊n/2⌋ downto 1        O(n) iterations
3.   do MAX-HEAPIFY(A, i, n)     O(lg n) each

⇒ Running time: O(n lg n)
• This is not an asymptotically tight upper bound
Running Time of BUILD MAX HEAP
• HEAPIFY takes O(h) ⇒ the cost of HEAPIFY on a node i is proportional to the height of node i in the tree:

T(n) = Σ_{i=0}^{h} n_i·h_i = Σ_{i=0}^{h} 2^i·(h-i) = O(n)

Height       Level            No. of nodes
h_0 = 3      i = 0            2^0
h_1 = 2      i = 1            2^1
h_2 = 1      i = 2            2^2
h_3 = 0      i = 3 (= lg n)   2^3
Running Time of BUILD MAX HEAP
T(n) = Σ_{i=0}^{h} n_i·h_i              cost of HEAPIFY at level i × number of nodes at that level
     = Σ_{i=0}^{h} 2^i·(h-i)            substitute the values of n_i and h_i computed before
     = Σ_{i=0}^{h} 2^h·(h-i)/2^(h-i)    multiply and divide by 2^h, writing 2^i as 2^h/2^(h-i)
     = 2^h·Σ_{k=0}^{h} k/2^k            change variables: k = h - i
     ≤ n·Σ_{k=0}^{∞} k/2^k              extending the sum to ∞ only increases it, and 2^h ≤ n
     = O(n)                             since Σ_{k=0}^{∞} k/2^k = 2

Running time of BUILD-MAX-HEAP: T(n) = O(n)
Heap Sort
• Goal:
• Sort an array using heap representation
• Idea
• Build a max-heap from the array (BuildMaxHeap)
• Swap the first and last (root) elements of the array.
• Now, the largest element is in the last position (where it belongs)
• “Discard” the last node by decreasing the heap size
• Call (MaxHeapify) on the new root
  • Repeat this process until only one node remains.
Example: A=[7, 4, 3, 1, 2]

Build the max-heap, then repeatedly swap the root with the last element, shrink the heap, and re-heapify:
MAX-HEAPIFY(A, 1, 4), MAX-HEAPIFY(A, 1, 3), MAX-HEAPIFY(A, 1, 2), MAX-HEAPIFY(A, 1, 1)
Alg: HEAPSORT(A)
1. BUILD-MAX-HEAP(A)                  O(n)
2. for i ← length[A] downto 2         n-1 times
3.   do exchange A[1] ↔ A[i]
4.      MAX-HEAPIFY(A, 1, i - 1)      O(lg n)

• In-place
• BUILD-MAX-HEAP takes O(n) and each of the n-1 calls to MAX-HEAPIFY takes time O(lg n).
• Therefore, T(n) = O(n lg n)
Heap Procedures for Sorting
• MaxHeapify O(lg n)
• BuildMaxHeap O(n)
• HeapSort O(n lg n)

Priority Queue
• Popular application of heaps.
• Max and min priority queues.
• Maintains a dynamic set S of elements.
• Each set element has a key – an associated value.
• Goal is to support insertion and extraction efficiently.
• Applications:
• Ready list of processes in operating systems by their priorities – the
list is highly dynamic
• In event-driven simulators to maintain the list of events to be
simulated in order of their time of occurrence.

Basic Operations
• Operations on a max-priority queue:
• Insert(S, x) - inserts the element x into the set S
• S  S  {x}.
• Maximum(S) - returns the element of S with the
largest key.
• Extract-Max(S) - removes and returns the element of S
with the largest key.
• Increase-Key(S, x, k) – increases the value of element
x’s key to the new value k.
• Min-priority queue supports Insert, Minimum,
Extract-Min, and Decrease-Key.
• A heap gives a good compromise: simpler structures offer fast insertion but slow extraction, or vice versa, while a heap does both in O(lg n).
HEAP-MAXIMUM
Goal:
• Return the largest element of the heap

Alg: HEAP-MAXIMUM(A)
1. return A[1]

Running time: O(1)

(Figure: a heap A with root 7; Heap-Maximum(A) returns 7.)
HEAP-EXTRACT-MAX
Goal:
• Extract the largest element of the heap (i.e., return the max value and also remove that element from the heap)
Idea:
• Exchange the root element with the last
• Decrease the size of the heap by 1 element
• Call MAX-HEAPIFY on the new root, on a heap of size n-1

(Figure: heap A; the root is the largest element.)
Example: HEAP-EXTRACT-MAX
Heap:  16 14 10 8 7 9 3 2 4 1   ⇒ max = 16

Move the last element (1) to the root; heap size decreased by 1:
       1 14 10 8 7 9 3 2 4

Call MAX-HEAPIFY(A, 1, n-1):
       14 8 10 4 7 9 3 2 1
HEAP-EXTRACT-MAX
Alg: HEAP-EXTRACT-MAX(A, n)
1. if n < 1
2.   then error "heap underflow"
3. max ← A[1]
4. A[1] ← A[n]
5. MAX-HEAPIFY(A, 1, n-1)     // remakes heap
6. return max

Running time: O(lg n)
HEAP-INCREASE-KEY
• Goal:
• Increases the key of an element i in the heap
• Idea:
• Increment the key of A[i] to its new value
• If the max-heap property does not hold anymore: traverse a path
toward the root to find the proper place for the newly increased key

(Figure: heap 16, 14, 10, 8, 7, 9, 3, 2, 4, 1 with node i holding the value 4; Key[i] ← 15.)
Example: HEAP-INCREASE-KEY
Key[i] ← 15 (node i initially holds 4):

• Set A[i] = 15:               16 14 10 8 7 9 3 2 15 1
• 15 > parent 8, exchange:     16 14 10 15 7 9 3 2 8 1
• 15 > parent 14, exchange:    16 15 10 14 7 9 3 2 8 1
• 15 < 16, stop: heap property restored
HEAP-INCREASE-KEY
Alg: HEAP-INCREASE-KEY(A, i, key)
1. if key < A[i]
2.   then error "new key is smaller than current key"
3. A[i] ← key
4. while i > 1 and A[PARENT(i)] < A[i]
5.   do exchange A[i] ↔ A[PARENT(i)]
6.      i ← PARENT(i)

Running time: O(lg n)
MAX-HEAP-INSERT
• Goal:
  • Insert a new element into a max-heap
• Idea:
  • Expand the max-heap with a new leaf whose key is -∞
  • Call HEAP-INCREASE-KEY to set the key of the new node to its correct value and maintain the max-heap property
Example: MAX-HEAP-INSERT
Insert value 15:
• Start by inserting -∞ as a new leaf:            16 14 10 8 7 9 3 2 4 1 -∞
• Increase the key to 15:
  call HEAP-INCREASE-KEY on A[11] = 15:           16 14 10 8 7 9 3 2 4 1 15
• The restored heap containing the newly added element:
                                                  16 15 10 8 14 9 3 2 4 1 7
MAX-HEAP-INSERT

Alg: MAX-HEAP-INSERT(A, key, n)
1. heap-size[A] ← n + 1
2. A[n + 1] ← -∞
3. HEAP-INCREASE-KEY(A, n + 1, key)

Running time: O(lg n)
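
For contrast with the pseudocode above, a max-priority queue can be sketched in Python on top of the standard library's heapq (a min-heap) by negating keys; INCREASE-KEY needs extra index bookkeeping and is omitted here:

import heapq

class MaxPQ:
    """Max-priority queue via a min-heap of negated keys."""
    def __init__(self):
        self._h = []
    def insert(self, key):               # O(lg n)
        heapq.heappush(self._h, -key)
    def maximum(self):                   # O(1)
        return -self._h[0]
    def extract_max(self):               # O(lg n)
        return -heapq.heappop(self._h)

pq = MaxPQ()
for k in (2, 8, 5):
    pq.insert(k)
print(pq.extract_max(), pq.extract_max(), pq.extract_max())  # 8 5 2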
Summary
• We can perform the following operations on heaps:
  • MAX-HEAPIFY         O(lg n)
  • BUILD-MAX-HEAP      O(n)
  • HEAP-SORT           O(n lg n)
  • MAX-HEAP-INSERT     O(lg n)
  • HEAP-EXTRACT-MAX    O(lg n)
  • HEAP-INCREASE-KEY   O(lg n)
  • HEAP-MAXIMUM        O(1)
Priority Queue Using Linked List

(Figure: a linked list, e.g. 12 → 4.)

• Insert: O(n) on average
• Increase key: O(n)
• Extract max key: O(1)
Sorting So Far
• Insertion sort:
• Easy to code
• Fast on small inputs (less than ~50 elements)
• Fast on nearly-sorted inputs
• O(n2) worst case
• O(n2) average (equally-likely inputs) case
• O(n2) reverse-sorted case

Sorting So Far
• Merge sort:
• Divide-and-conquer:
• Split array in half
• Recursively sort subarrays
• Linear-time merge step
• O(n lg n) worst case
• Doesn’t sort in place

Sorting So Far
• Quick sort:
• Divide-and-conquer:
• Partition array into two subarrays, recursively sort
• All of first subarray < all of second subarray
• No merge step needed!
• O(n lg n) average case
• Fast in practice
• O(n2) worst case
• Naïve implementation: worst case on sorted input
• Address this with picking pivot using “median of three partitioning”

Sorting So Far
• Heap sort:
• Uses the very useful heap data structure
• Complete binary tree
• Heap property: parent key > children’s keys
• O(n lg n) worst case
• Sorts in place
• Fair amount of shuffling memory around

How Fast Can We Sort?
• We will provide a lower bound, then beat it
• How do you suppose we’ll beat it?
• First, an observation: all of the sorting algorithms so far are
comparison sorts
• The only operation used to gain ordering information about a
sequence is the pairwise comparison of two elements

Theorem (Comparison based Sorting)
Theorem: all comparison sorts are Ω(n lg n)
Proof:
1. Draw the decision tree
2. Calculate the asymptotic height of any decision tree for sorting n elements

• Decision trees provide an abstraction of comparison sorts
• A decision tree represents the comparisons made by a comparison sort; everything else is ignored
• Decision trees can model comparison sorts. For a given algorithm:
  • One tree for each n
  • Tree paths are all possible execution traces
Decision Trees (Example)
Consider sorting three numbers a1, a2, a3.
There are 3! = 6 possible orderings:
(a1, a2, a3), (a1, a3, a2), (a3, a2, a1), (a3, a1, a2), (a2, a1, a3), (a2, a3, a1)
Decision Trees
• What’s the longest path in a decision tree for insertion sort?
For merge sort?
• What is the asymptotic height of any decision tree for sorting
n elements?

• Theorem: Any decision tree that sorts n elements has height Ω(n lg n)

• What's the minimum # of leaves? (one leaf per permutation: at least n!)
• What's the maximum # of leaves of a binary tree of height h? (2^h)
• Clearly the minimum # of leaves is less than or equal to the maximum # of leaves
Lower Bound For Comparison Sorting
• The max no. of leaves of a binary tree of height h is 2^h
• So we have: n! ≤ 2^h
• Taking logarithms: lg(n!) ≤ h
• Stirling's approximation tells us: n! > (n/e)^n
• Therefore:

h ≥ lg (n/e)^n = n·lg(n/e) = n·lg n - n·lg e = Ω(n lg n)

• Thus the minimum height of a decision tree is Ω(n lg n)
Lower Bound For Comparison Sorting
• Thus the time to comparison sort n elements is Ω(n lg n)

• Corollary: Heapsort and Mergesort are asymptotically optimal comparison sorts

• Can we do any better than Ω(n lg n)?
Types of Sorting Algorithms
• Non-recursive/incremental comparison sorting
  • Selection sort
  • Bubble sort
  • Insertion sort
• Recursive comparison sorting
  • Merge sort
  • Quick sort
  • Heap sort
• Non-comparison linear sorting
  • Count sort
  • Radix sort
  • Bucket sort
Counting Sort
Non-comparison sort.
Assumption
n numbers in the range 1..k.

Key idea
Input: A[1..n], where A[j] ∈ {1, 2, 3, …, k}
Output: B[1..n], sorted (notice: not sorting in place)
Also: Array C[1..k] for auxiliary storage
For each x, count the number C(x) of elements ≤ x.
Insert x at output position C(x) and decrement C(x).
https://www.cs.usfca.edu/~galles/visualization/CountingSort.html
Counting Sort
1 CountingSort(A, B, k)
2 for i=1 to k
3 C[i]= 0;
4 for j=1 to n
5 C[A[j]] += 1;
6 for i=2 to k
7 C[i] = C[i] + C[i-1];
8 for j=n downto 1
9 B[C[A[j]]] = A[j];
10 C[A[j]] -= 1;

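The same procedure in Python (a sketch for values in 1..k; C is conceptually 1-indexed, so we allocate k+1 slots):

def counting_sort(a, k):
    """Stable counting sort for values in 1..k; returns a new sorted list."""
    c = [0] * (k + 1)
    for x in a:                  # count occurrences of each value
        c[x] += 1
    for i in range(2, k + 1):    # prefix sums: c[x] = # of elements <= x
        c[i] += c[i - 1]
    b = [0] * len(a)
    for x in reversed(a):        # right-to-left scan keeps equal keys stable
        b[c[x] - 1] = x          # c[x] is a 1-based output position
        c[x] -= 1
    return b

print(counting_sort([4, 1, 3, 4, 3], k=4))  # [1, 3, 3, 4, 4]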
Example of Counting Sort

Work through the example: A = {4, 1, 3, 4, 3}, k = 4.
(C after counting: [1, 0, 2, 2]; after prefix sums: [1, 1, 3, 5]; output B = 1 3 3 4 4.)
Analysis of Counting Sort
Counting-Sort(A, B, k)
for i = 1 to k
  do C[i] = 0                       // Θ(k)
for j = 1 to length[A]
  do C[A[j]] = C[A[j]] + 1          // Θ(n)
// C[i] now contains the number of elements = i
for i = 2 to k
  do C[i] = C[i] + C[i-1]           // Θ(k)
// C[i] now contains the number of elements ≤ i
for j = length[A] downto 1
  do B[C[A[j]]] = A[j]
     C[A[j]] = C[A[j]] - 1          // Θ(n)

Running time is Θ(n + k) (or Θ(n) if k = O(n))!


Counting Sort
• Total time: O(n + k)
• Usually, k = O(n)
• Thus counting sort runs in O(n) time
• But comparison sorting is Ω(n lg n)!
• No contradiction: this is not a comparison sort (in fact, there are no comparisons at all!)
• Notice that this algorithm is stable
Counting Sort
• Cool! Why don’t we always use counting sort?
• Because it depends on range k of elements
• Could we use counting sort to sort 32 bit integers? Why or
why not?
• Answer: no, k is too large (2^32 = 4,294,967,296)
Stability
Counting sort is stable: input order is maintained among items with equal keys.

Stability is important because:
• data are often carried with the keys being sorted
• radix sort (which uses counting sort as a subroutine) relies on it to work correctly

How is stability realized?

for j = length[A] downto 1
  do B[C[A[j]]] = A[j]
     C[A[j]] = C[A[j]] - 1
Radix Sort
• Sort on the most significant digit, then the second msd, etc.
• Problem: lots of intermediate piles of cards (read: scratch
arrays) to keep track of
• Key idea: sort the least significant digit first
RadixSort(A, d)
for i=1 to d
StableSort(A) on digit i

https://www.cs.usfca.edu/~galles/visualization/RadixSort.html
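
A minimal LSD radix sort sketch in Python (names are ours; stable per-digit bucketing plays the role of the stable counting-sort pass):

def radix_sort(a, d, base=10):
    """Sort non-negative integers of at most d digits, least significant digit first."""
    for i in range(d):  # digit 0 is the least significant
        buckets = [[] for _ in range(base)]
        for x in a:
            buckets[(x // base ** i) % base].append(x)  # appending keeps it stable
        a = [x for b in buckets for x in b]
    return a

print(radix_sort([329, 457, 657, 839, 436, 720, 355], d=3))
# [329, 355, 436, 457, 657, 720, 839]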
Radix Sort
• Algorithm used by the card-sorting machines you now find
only in computer museums

• How did IBM get rich originally?


• Answer: punched card readers for census tabulation in early 1900’s.
• In particular, a card sorter that could sort cards into different bins
• Each column can be punched in 12 places
• Decimal digits use 10 places
• Problem: only one column can be sorted on at a time

Radix Sort

(Figure: the operation of radix sort on a list of seven 3-digit numbers. The leftmost column is the input; the remaining columns show the list after successive sorts on increasingly significant digit positions. Shading indicates the digit position sorted on to produce each list from the previous one.)
Radix Sort
RADIX-SORT(A, d)
1 for i ← 1 to d
2   do use a stable sort to sort array A on digit i

• Sort n d-digit numbers; let k be the range of each digit.
• Each pass takes time Θ(n + k).   // use counting sort
• There are d passes in total.
• The running time for radix sort is Θ(dn + dk).
• Linear running time when d is a constant and k = O(n).


Radix Sort
• Can we prove it will work?
• Sketch of an inductive argument (induction on the number of
passes):
• Assume the lower-order digits {j : j < i} are sorted
• Show that sorting next digit i leaves array correctly sorted
• If two digits at position i are different, ordering numbers by that digit
is correct (lower-order digits irrelevant)
• If they are the same, numbers are already sorted on the lower-order
digits. Since we use a stable sort, the numbers stay in the right order

Radix Sort
• Problem: sort 1 million 64-bit numbers
  • Treat them as four-digit radix-2^16 numbers
  • Can sort in just four passes with radix sort!
• Compares well with a typical O(n lg n) comparison sort, which requires approx. lg n = 20 operations per number being sorted
• So why would we ever use anything but radix sort?

• In general, radix sort based on counting sort is
  • Fast, asymptotically (i.e., O(n))
  • Simple to code
  • A good choice
• To think about: can radix sort be used on floating-point numbers?
Bucket Sort
• Assumption: uniform distribution
• Input numbers are uniformly distributed in [0,1).
• Suppose input size is n.
• Idea:
• Divide [0,1) into n equal-sized subintervals (buckets).
• Distribute n numbers into buckets
• Expect that each bucket contains few numbers.
• Sort numbers in each bucket (insertion sort as default).
  • Then go through the buckets in order, listing the elements.
Bucket Sort
BUCKET-SORT(A)
1 n ← length[A]
2 for i ← 1 to n
3 do insert A[i] into list B[⌊n A[i]⌋]
4 for i ← 0 to n - 1
5 do sort list B[i] with insertion sort
6 concatenate the lists B[0], B[1], . . ., B[n - 1]

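A runnable Python sketch of this pseudocode (names are ours; Python's built-in sort stands in for the per-bucket insertion sort):

def bucket_sort(a):
    """Bucket sort for keys assumed uniform over [0, 1)."""
    n = len(a)
    buckets = [[] for _ in range(n)]
    for x in a:
        buckets[int(n * x)].append(x)  # drop x into bucket floor(n*x)
    out = []
    for b in buckets:
        b.sort()                       # per-bucket sort (insertion sort in the slides)
        out.extend(b)                  # concatenate B[0], B[1], ..., B[n-1]
    return out

print(bucket_sort([.78, .17, .39, .26, .72, .94, .21, .12, .23, .68]))
# [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94]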
Bucket Sort
Assumption: input elements are uniformly distributed over [0, 1); the n inputs are dropped into n equal-sized subintervals of [0, 1).

Input: .78 .17 .39 .26 .72 .94 .21 .12 .23 .68

Step 1: insertion sort within each list:
B[1]: .12 .17
B[2]: .21 .23 .26
B[3]: .39
B[6]: .68
B[7]: .72 .78
B[9]: .94
(all other buckets are empty)

Step 2: concatenate all lists: .12 .17 .21 .23 .26 .39 .68 .72 .78 .94

Bucket sort runs in Θ(n) expected time.
Bucket Sort Analysis
BUCKET-SORT(A)
1 n ← length[A]                                      Θ(1)
2 for i ← 1 to n                                     O(n)
3   do insert A[i] into list B[⌊n·A[i]⌋]             Θ(1) each (total O(n))
4 for i ← 0 to n - 1                                 O(n)
5   do sort list B[i] with insertion sort            O(n_i²)
6 concatenate the lists B[0], B[1], ..., B[n - 1]    O(n)

where n_i is the size of bucket B[i].

Thus T(n) = Θ(n) + Σ_{i=0}^{n-1} O(n_i²)
          = Θ(n) + O(n) = Θ(n) expected time, beating Ω(n lg n)!
Comparison of Sorting Algorithms
Insertion sort: suitable only for small n.
Merge sort: guaranteed to be fast even in its worst case; stable.
Quicksort: the most useful general-purpose sort: very little memory requirement and fastest average time (choose the median of three elements as pivot in practice :-).
Heapsort: requires minimum memory and guaranteed to run fast; average and maximum time both roughly twice the average time of quicksort.
Counting sort: very useful when the keys have a small range; stable; needs memory space for counters and for 2n records.
Radix sort: appropriate for keys that are either rather short or have a lexicographic collating sequence.
Bucket sort: assumes keys have a uniform distribution.
