Comparison Sort Design and Analysis of Algorithm

Comparison Sort
A comparison sort is a type of sorting algorithm that only reads the list elements through a single abstract comparison operation
(often a "less than or equal to" operator or a three-way comparison) that determines which of two elements
should occur first in the final sorted list. The only requirement is that the operator obey two of the properties of
a total order:
1. if a <= b and b <= c then a <= c (transitivity)
2. for all a and b, either a <= b or b <= a (totalness or trichotomy).
It is possible that both a <= b and b <= a; in this case either may come first in the sorted list. In a stable sort, the input
order determines the sorted order in this case.
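As a small illustration of a three-way comparison and of stability, the sketch below sorts words by length only. The comparator name by_length and the word list are made up for this example; Python's built-in sorted() is documented to be stable, so words of equal length keep their input order.

```python
from functools import cmp_to_key

def by_length(a, b):
    """Three-way comparison: negative if a goes first, positive if b does, 0 on a tie."""
    return len(a) - len(b)

words = ["pear", "fig", "kiwi", "plum", "date", "yam"]

# sorted() only learns about the words through by_length.
result = sorted(words, key=cmp_to_key(by_length))
print(result)  # ties ("pear", "kiwi", "plum", "date") stay in input order
```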
There are limitations in Comparison Sorting:

*There are fundamental limits on the performance of comparison sorts.
*To sort n elements, a comparison sort must make Omega(n log n) comparisons in the worst case.
*That is, a comparison sort must have a lower bound of Omega(n log n) comparison operations, which is known as
linearithmic time.
*This is a consequence of the limited information available through comparisons alone.
Some of the most well-known Comparison Sorts include:

Quick sort
Quicksort (sometimes called partition-exchange sort) is an efficient sorting algorithm, serving as a systematic method
for placing the elements of an array in order. Developed by Tony Hoare in 1959,[1] with his work published in 1961,[2]
it is still a commonly used algorithm for sorting. When implemented well, it can be about two or three times faster than
its main competitors, merge sort and heapsort.[3]
Quicksort is a comparison sort, meaning that it can sort items of any type for which a "less-than" relation (formally, a
total order) is defined. In efficient implementations it is not a stable sort, meaning that the relative order of equal sort
items is not preserved. Quicksort can operate in-place on an array, requiring small additional amounts of memory to
perform the sorting.
Mathematical analysis of quicksort shows that, on average, the algorithm takes O(n log n) comparisons to sort n
items. In the worst case, it makes O(n^2) comparisons, though this behavior is rare.
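A minimal in-place quicksort might look like the following. This is only a sketch: it uses the Lomuto partition scheme with a random pivot, which is one common variant and not necessarily Hoare's original formulation, and the function name and the less parameter are assumptions made for the example.

```python
import random

def quicksort(items, less=lambda a, b: a < b):
    """Sort a list in place, looking at elements only through `less`."""
    def sort(lo, hi):
        if lo >= hi:
            return
        # A random pivot makes the O(n^2) case unlikely on any fixed input.
        p = random.randint(lo, hi)
        items[p], items[hi] = items[hi], items[p]
        pivot = items[hi]
        i = lo
        for j in range(lo, hi):
            if less(items[j], pivot):
                items[i], items[j] = items[j], items[i]
                i += 1
        items[i], items[hi] = items[hi], items[i]  # pivot lands in its final place
        sort(lo, i - 1)
        sort(i + 1, hi)
    sort(0, len(items) - 1)
    return items

data = [5, 2, 9, 1, 5, 6]
print(quicksort(data))
```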
Sorting
How do we alphabetize a list of words? Sort a list of numbers? Sort some other information? We saw last time one reason for
doing this (so we can apply binary search), and the same problem comes up over and over again in programming.
Comparison sorting
This is a very abstract model of sorting. We assume we are given a list of objects to sort, and that there is some
particular order in which they should be sorted. What is the minimum amount of information we can get away with
and still be able to sort them?
As a particular case of the sorting problem, we should be able to sort lists of two objects. But this is the same as
comparing any two objects, to determine which comes first in the sorted order. (For now, we assume no two objects
are equal, so one should always go before the other; most sorting algorithms can also handle objects that are "the
same" but it complicates the problem.)
Algorithms that sort a list based only on comparisons of pairs (and not using other information about what is being
sorted, for instance arithmetic on numbers) are called comparison sorting algorithms.
Why do we care about this abstract and restrictive model of sorting?

We only have to write one sorting routine, which can be used over and over again without having to rewrite it and
re-debug it for each new sorting problem we need to solve.
In fact we don't even have to write that one routine: it is provided as the qsort() routine in the Unix library.
For some problems, it is not obvious how to do anything other than comparisons. (I gave an example from my own
research, on a geometric problem of quadtree construction, which involved comparing points (represented as pairs of
coordinates) by computing bitwise exclusive ors of the coordinates, comparing those numbers, and using the result to
determine which coordinates to compare.)
It's easier to design and analyze algorithms without having to think about unnecessary problem-specific details.
Some comparison sorting algorithms work quite well, so there is not so much need to do something else.
Sorting algorithms
There are dozens of sorting algorithms. Baase covers around seven. We'll probably have time only for four: heapsort,
merge sort, quicksort, and bucket sort. Each of these is useful as an algorithm, but also helps introduce some new
ideas:
Heapsort shows how one can start with a slow algorithm (selection sort) and by adding some simple data structures
transform it into a much better one.
Merge sort and quicksort are different examples of divide and conquer, a very general algorithm design technique in
which one partitions an input into parts, solves the parts recursively, then recombines the subproblem solutions into
one overall solution. The two differ in how they do the partition and recombination: merge sort allows any partition,
but the result of the recursive solution to the parts is two interleaved sorted lists, which we must combine into one in a
somewhat complicated way. Quicksort instead does a more complicated partition so that one subproblem contains
all objects less than some value, and the other contains all objects greater than that value, but then the
recombination stage is trivial (just concatenate).
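To make the contrast concrete, here is a rough merge sort sketch in which the partition is trivial (split in the middle) and all the work happens in the recombination step. The function names merge and merge_sort are chosen for this example only.

```python
def merge(left, right):
    """Combine two sorted lists into one sorted list."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Take from the right list only when it is strictly smaller,
        # so equal items keep their original order (stable).
        if right[j] < left[i]:
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out

def merge_sort(items):
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2          # any split would do; the middle balances the recursion
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))

print(merge_sort([8, 3, 5, 1, 9, 2]))
```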
Quick sort is an example of randomization and average case analysis.
Bucket sort shows how abstraction is not always a good idea -- we can derive improved sorting algorithms for both
numbers and alphabetical words by looking more carefully at the details of the objects being sorted.
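For instance, when the objects are known to be small non-negative integers, we can sort without ever comparing two elements to each other. The sketch below is the simple counting form of bucket sort; the function name and the max_value parameter are assumptions made for this illustration.

```python
def bucket_sort(numbers, max_value):
    """Sort integers in the range 0..max_value without pairwise comparisons."""
    buckets = [0] * (max_value + 1)
    for x in numbers:
        buckets[x] += 1            # use the value itself as an index
    out = []
    for value, count in enumerate(buckets):
        out.extend([value] * count)
    return out

print(bucket_sort([3, 0, 2, 3, 1], max_value=3))
```

The running time is proportional to the list length plus max_value, which is why this only pays off when the range of values is not much larger than the list.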
Sorting time bounds
What sort of time bounds should we expect? First, how should we measure time? If we have a comparison sorting
algorithm, we can't really say how many machine instructions it will take, because it will vary depending on how
complicated the comparisons are. Since the comparisons usually end up dominating the overall time bound, we'll
measure time in terms of the number of comparisons made.
Sorting algorithms have a range of time bounds, but for some reason there are two typical time bounds for
comparison sorting: mergesort, heapsort, and (the average case of) quicksort all take O(n log n), while insertion sort,
selection sort, and the worst case of quicksort all take O(n^2). As we'll see, O(n log n) is the best you could hope to
achieve, while O(n^2) is the worst -- it describes the amount of time taken by an algorithm that performs every
possible comparison it could.
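Measuring time by counting comparisons is easy to try out. The following sketch instruments a plain insertion sort (the function name is made up for the example) and shows that a reversed list of n items forces all n(n-1)/2 possible comparisons, while an already sorted list needs only n-1.

```python
def insertion_sort_comparisons(items):
    """Return (sorted copy, number of element comparisons made)."""
    a = list(items)
    count = 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            count += 1                      # one comparison of a[j] with a[j-1]
            if a[j] < a[j - 1]:
                a[j], a[j - 1] = a[j - 1], a[j]
                j -= 1
            else:
                break
    return a, count

n = 10
_, worst = insertion_sort_comparisons(range(n, 0, -1))   # reversed input
_, best = insertion_sort_comparisons(range(n))           # already sorted
print(worst, n * (n - 1) // 2, best)
```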
O(n log n) is significantly faster than O(n^2):
n       n log n     n^2
--      -------     ---
10      33          100
100     665         10K
1000    10^4        10^6
10^6    2 x 10^7    10^12
10^9    3 x 10^10   10^18
So even if you're sorting small lists it pays to use a good algorithm such as quicksort instead of a poor one like
bubblesort. You don't even have the excuse that bubblesort is easier, since to get a decent sorting algorithm in a
program you merely have to call qsort.
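The figures in the table can be reproduced with a few lines of code. This sketch assumes the logarithm is base 2 (which matches the value 33 for n=10); the helper name growth_row is hypothetical.

```python
import math

def growth_row(n):
    """Return (n, n*log2(n), n^2) for one row of the comparison table."""
    return n, n * math.log2(n), n * n

for n in (10, 100, 1000, 10**6, 10**9):
    size, nlogn, squared = growth_row(n)
    print(f"{size:>12} {nlogn:>16.0f} {squared:>22}")
```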
Lower bounds
A lower bound is a mathematical argument saying you can't hope to go faster than a certain amount. More precisely,
every algorithm within a certain model of computation has a running time at least that amount. (This is usually proved
for worst case running times but you could also do the same sort of thing for average case or best case if you want
to.) This doesn't necessarily mean faster algorithms are completely impossible, but only that if you want to go faster,
you can't stick with the abstract model, you have to look more carefully at the problem. So the linear time bound we'll
see later for bucketsort won't contradict the n log n lower bounds we'll prove now.
Lower bounds are useful for two reasons: First, they give you some idea of how good an algorithm you could expect
to find (so you know if there is room for further optimization). Second, if your lower bound is slower than the amount
of time you want to actually spend solving a problem, the lower bound tells you that you'll have to break the
assumptions of the model of computation somehow.
We'll prove lower bounds for sorting in terms of the number of comparisons. Suppose you have a sorting algorithm
that only examines the data by making comparisons between pairs of objects (and doesn't use any random numbers;
the model we describe can be extended to deal with randomized algorithms but it gets more complicated). We
assume that we have some particular comparison sorting algorithm A, but that we don't know anything more about
how it runs. Using that assumption, we'll prove that the worst case time for A has to be at least a certain amount, but
since the only assumption we make on A is that it's a comparison sorting algorithm, this fact will be true for all such
algorithms.
Decision trees
Given a comparison sorting algorithm A, and some particular number n, we draw a tree corresponding to the different
sequences of comparisons A might make on an input of length n.
If the first comparison the algorithm makes is between the objects at positions a and b, then it will make the same
comparison no matter what other list of the same length is input, because in the comparison model we do not have
any other information than n so far on which to make a decision.
Then, for all lists in which a<b, the second comparison will always be the same, but the algorithm might do something
different if the result of the first comparison is that a>b.
So we can draw a tree, in which each node represents the positions involved at some comparison, and each path in
the tree describes the sequence of comparisons and their results from a particular run of the algorithm. Each node
will have two children, representing the possible behaviors of the program depending on the result of the comparison
at that node. Here is an example for n=3.
                1:2
               /   \
            < /     \ >
             /       \
          2:3         1:3
         /   \       /   \
      < /     \ > < /     \ >
       /       \   /       \
   1,2,3      1:3 2,1,3     2:3
             /   \         /   \
          < /     \ >   < /     \ >
           /       \     /       \
       1,3,2     3,1,2 2,3,1     3,2,1

This tree describes an algorithm in which the first comparison is always between the first and second positions in the
list (this information is denoted by the "1:2" at the root of the tree). If the object in position one is less than the object
in position two, the next comparison will always be between the second and third positions in the list (the "2:3" at the
root of the left subtree). If the second is less than the third, we can deduce that the input is already sorted, and we
write "1,2,3" to denote the permutation of the input that causes it to be sorted. But if the second is greater than the
third, there still remain two possible permutations to be distinguished between, so we make a third comparison "1:3",
and so on.
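The n=3 tree described above can be written directly as nested comparisons. In this sketch (the function name sort3 is made up, and the three inputs are assumed to be distinct), each if corresponds to one node of the tree and each return to one leaf; the second returned value is the number of comparisons on that path.

```python
from itertools import permutations

def sort3(a):
    """Follow the decision tree for n=3; return (sorted list, comparisons used)."""
    x, y, z = a
    if x < y:                      # node 1:2
        if y < z:                  # node 2:3
            return [x, y, z], 2    # leaf 1,2,3
        if x < z:                  # node 1:3
            return [x, z, y], 3    # leaf 1,3,2
        return [z, x, y], 3        # leaf 3,1,2
    if x < z:                      # node 1:3
        return [y, x, z], 2        # leaf 2,1,3
    if y < z:                      # node 2:3
        return [y, z, x], 3        # leaf 2,3,1
    return [z, y, x], 3            # leaf 3,2,1

results = [sort3(p) for p in permutations([1, 2, 3])]
worst = max(count for _, count in results)
print(worst)
```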
Any comparison sorting algorithm can always be put in this form, since the comparison it chooses to make at any
point in time can only depend on the answers to previously asked comparisons. And conversely, a tree like this can
be used as a sorting algorithm: for any given list, follow a path in the tree to determine which comparisons are to be made
and which permutation of the input gives a sorted order. This is a reasonable way to represent algorithms for sorting
very small lists (such as the case n=3 above), but for larger values of n it works better to use pseudo-code. However,
this tree is also useful for discovering various properties of our original algorithm A.
The worst case number of comparisons made by algorithm A is just the length of the longest path in the tree.
One can also determine the average case number of comparisons made, but this is more complicated.
At each leaf in the tree, there are no more comparisons to be made -- therefore we know what the sorted order is. Each
possible sorted order corresponds to a permutation, so there are at least n! leaves. (There might be more if for
instance we have a stupid algorithm that tests whether a<c even after it has already discovered that a<b and b<c.)
The Sorting Lower Bound

What is the longest path in a binary tree with k leaves? At least log k. (Proof: one of the two subtrees has at least half the
leaves, so LP(k) >= 1 + LP(k/2); the result follows by induction.)
So the number of comparisons to sort is at least log n!. This turns out to be roughly n log n; to distinguish lower
bounds from upper bounds we write them a little differently, with a big Omega rather than a big O, so we write this
lower bound as Omega(n log n). More precisely,
log n! = n log n - O(n).
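For small n the counting argument can be checked by brute force. The sketch below treats Python's built-in sorted() as the algorithm A (an arbitrary choice for illustration), records the sequence of comparison outcomes for every permutation, and compares the number of distinct paths and the longest path with n! and log n!. The function name decision_paths is hypothetical.

```python
from functools import cmp_to_key
from itertools import permutations
from math import ceil, factorial, log2

def decision_paths(n):
    """Run sorted() on every permutation of n distinct items and record
    the sequence of comparison outcomes (one root-to-leaf path each)."""
    paths = set()
    for perm in permutations(range(n)):
        outcomes = []
        def cmp(a, b):
            result = -1 if a < b else 1
            outcomes.append(result)
            return result
        sorted(perm, key=cmp_to_key(cmp))
        paths.add(tuple(outcomes))
    return paths

for n in range(2, 7):
    paths = decision_paths(n)
    # columns: n, leaves reached, longest path, ceil(log2 n!)
    print(n, len(paths), max(map(len, paths)), ceil(log2(factorial(n))))
```

Every permutation ends at its own leaf, and the longest path is never shorter than the bound.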

A reasonably simple proof follows:
n! = prod_{i=1}^{n} i
so
log n! = sum_{i=1}^{n} log i
       = sum_{i=1}^{n} log(n * i/n)
       = sum_{i=1}^{n} (log n - log(n/i))
       = n log n - sum_{i=1}^{n} log(n/i).
Let f(n) be the last term above, sum log(n/i); then we can write down a recurrence bounding f(n):
f(n) = sum_{i=1}^{n} log(n/i)
     = sum_{i=1}^{n/2} log(n/i) + sum_{i=n/2+1}^{n} log(n/i)
Each term in the first sum is equal to log(2 (n/2)/i) = 1 + log((n/2)/i), so the first sum is n/2 + f(n/2). The n/2 terms
in the second sum are logs of numbers between 1 and 2, and so are themselves numbers between 0 and 1, contributing at
most n/2 in total. So we can simplify this equation to
f(n) <= n + sum_{i=1}^{n/2} log((n/2)/i)
      = n + f(n/2)
which solves to f(n) <= 2n and completes the proof that log n! >= n log n - 2n.
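The bound is easy to check numerically. This sketch (logs taken base 2, and powers of two chosen so that n/2 is always an integer) evaluates f(n) directly and confirms both f(n) <= 2n and the identity log n! = n log n - f(n).

```python
from math import factorial, log2

def f(n):
    """f(n) = sum over i = 1..n of log2(n / i)."""
    return sum(log2(n / i) for i in range(1, n + 1))

for n in (8, 64, 1024):
    exact = log2(factorial(n))
    # columns: n, f(n), the bound 2n, log2(n!), n log n - 2n
    print(n, round(f(n), 1), 2 * n, round(exact, 1), round(n * log2(n) - 2 * n, 1))
```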
(Note: in class I got this argument slightly wrong and lost a factor of two in the recurrence for f(n).) We can get a
slightly more accurate formula from Stirling's formula (which I won't prove):
n! ~ sqrt(2 pi n) (n/e)^n
so
log n! ~ n log n - 1.4427 n + 1/2 log n + 1.326
Let's compute a couple examples to see how accurate this is:

         log n!     formula gives
n=10     21.8       33.22 - 14.43 ~ 18.8
n=100    524.8      664.4 - 144.3 ~ 520.1
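For comparison, the sketch below evaluates all four terms of the standard form of Stirling's approximation, n! ~ sqrt(2 pi n) (n/e)^n, with logs taken base 2. The function name stirling_log2 is made up for this example.

```python
from math import e, factorial, log2, pi

def stirling_log2(n):
    """log2 of sqrt(2*pi*n) * (n/e)**n, expanded term by term to avoid overflow."""
    return n * log2(n) - n * log2(e) + 0.5 * log2(n) + 0.5 * log2(2 * pi)

for n in (10, 100):
    exact = log2(factorial(n))
    two_terms = n * log2(n) - n * log2(e)
    # columns: n, exact log2(n!), first two terms only, full approximation
    print(n, round(exact, 1), round(two_terms, 1), round(stirling_log2(n), 1))
```

The first two terms alone are off by a few units, while the full expression agrees with log n! to within a small fraction of a unit at these sizes.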
Reference:
https://en.wikipedia.org/wiki/Comparison_sort
https://www.ics.uci.edu/~eppstein/161/960116.html
