d(S, T) = min{d(S, A) + d(A, T), d(S, B) + d(B, T), d(S, C) + d(C, T)}
The question is: how do we find the shortest route from, say, vertex A to vertex T? Note that we can
use the same principle to find a shortest route from A to T. That is, the problem of finding a shortest
route from A to T is the same as the problem of finding a shortest route from S to T, except that the
size of the problem is now smaller.
The shortest route finding problem can now be solved systematically as follows:
d(S, T) = min{d(S, A) + d(A, T), d(S, B) + d(B, T), d(S, C) + d(C, T)}    (2.1)
d(S, T) = min{15 + d(A, T), 18 + d(B, T), 3 + d(C, T)}
d(A, T) = min{d(A, D) + d(D, T), d(A, E) + d(E, T)}    (2.2)
d(A, T) = min{11 + d(D, T), 10 + d(E, T)}
d(A, T) = min{11 + 41, 10 + 21} = 31
d(B, T) = min{d(B, E) + d(E, T), d(B, F) + d(F, T), d(B, G) + d(G, T)}    (2.3)
d(B, T) = min{9 + d(E, T), 1 + d(F, T), 2 + d(G, T)}
d(B, T) = min{9 + 21, 1 + 3, 2 + 21} = 4
d(C, T) = min{d(C, G) + d(G, T), d(C, H) + d(H, T)}    (2.4)
d(C, T) = min{14 + 21, 16 + 27} = 35
Substituting (2.2), (2.3) and (2.4) into (2.1), we obtain that d(S, T) = min{15 + 31, 18 + 4, 3 + 35}
= 22, which implies that the shortest route from S to T is S → B → F → T. As shown above,
the basic idea of the dynamic programming strategy is to decompose a large problem into several sub-
problems. Each sub-problem is identical to the original problem except that its size is smaller. Thus the
dynamic programming strategy always solves a problem recursively.
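The recursive computation above can be sketched in a few lines of code. This is only an illustrative sketch: the edge lengths are the ones appearing in the computation above, and the original figure may contain edges not used here.

```python
from functools import lru_cache

# Edge lengths taken from the computation above (the figure may contain more edges).
edges = {
    'S': {'A': 15, 'B': 18, 'C': 3},
    'A': {'D': 11, 'E': 10},
    'B': {'E': 9, 'F': 1, 'G': 2},
    'C': {'G': 14, 'H': 16},
    'D': {'T': 41}, 'E': {'T': 21}, 'F': {'T': 3},
    'G': {'T': 21}, 'H': {'T': 27},
}

@lru_cache(maxsize=None)
def d(v):
    """Length of a shortest route from vertex v to T."""
    if v == 'T':
        return 0
    # d(v, T) = min over each neighbour w of v of d(v, w) + d(w, T)
    return min(length + d(w) for w, length in edges[v].items())
```

Here d('S') evaluates to 22, matching the value obtained above; the memoization supplied by lru_cache keeps each sub-problem from being solved more than once.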
In the next section, we go back to the longest common subsequence problem and show how the
dynamic programming strategy can be applied to solve the problem.
2.4 Application of the Dynamic Programming Strategy to Solve the
Longest Common Subsequence Problem
The longest common subsequence problem was presented in Section 2.2. It was also pointed out that
we cannot solve the problem in any naive and unsophisticated way. In this section, we shall show
that this problem can be solved elegantly by using the dynamic programming strategy.
We are given two sequences: S1 = a1 a2 ... am and S2 = b1 b2 ... bn. Consider am and bn. There are
two cases:
Case 1: am = bn. In this case am, which is equal to bn, must be included in the longest
common subsequence. The longest common subsequence of S1 and S2 is the longest common
subsequence of a1 a2 ... a(m-1) and b1 b2 ... b(n-1), plus am.
Case 2: am ≠ bn. Then we find two longest common subsequences: that of a1 a2 ... am and
b1 b2 ... b(n-1), and that of a1 a2 ... a(m-1) and b1 b2 ... bn. Among these two, we choose the longer one,
and the longest common subsequence of S1 and S2 must be this longer one.
To summarize, the dynamic programming strategy decomposes the longest common subsequence
problem into three identical sub-problems and each of them is of smaller size. Each sub-problem can
now be solved recursively.
In the following, to simplify the discussion, let us concentrate on finding the length of a
longest common subsequence. It will be obvious that our algorithm can be easily extended to find a
longest common subsequence.
Let LCS(i, j) denote the length of a longest common subsequence of a1 a2 ... ai and b1 b2 ... bj.
LCS(i, j) can be found by the following formula:

LCS(i, j) = LCS(i - 1, j - 1) + 1 if ai = bj
LCS(i, j) = max{LCS(i - 1, j), LCS(i, j - 1)} if ai ≠ bj
LCS(i, 0) = LCS(0, j) = LCS(0, 0) = 0
The following is an algorithm to find the length of a longest common subsequence based upon the
dynamic programming strategy:
Algorithm 2.4 An Algorithm to Find the Length of a Longest Common Subsequence Based upon the
Dynamic Programming Strategy
Input: A = a1 a2 ... am and B = b1 b2 ... bn
Output: The length of a longest common subsequence of A and B, denoted as LCS(m, n)
Step 1: LCS(i, 0) = LCS(0, j) = LCS(0, 0) = 0 for all i and j
Step 2: for i = 1 to m do
    for j = 1 to n do
        if ai = bj
        then LCS(i, j) = LCS(i - 1, j - 1) + 1
        else LCS(i, j) = max{LCS(i - 1, j), LCS(i, j - 1)}
    end for
end for
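Algorithm 2.4 translates almost directly into Python. The following is a sketch of ours (the function and variable names are not from the text):

```python
def lcs_length(a, b):
    """Length of a longest common subsequence of sequences a and b."""
    m, n = len(a), len(b)
    # LCS[i][j] holds the length of a longest common subsequence of
    # a1 a2 ... ai and b1 b2 ... bj; Step 1 initialises row 0 and column 0 to 0.
    LCS = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):          # Step 2
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                LCS[i][j] = LCS[i - 1][j - 1] + 1
            else:
                LCS[i][j] = max(LCS[i - 1][j], LCS[i][j - 1])
    return LCS[m][n]
```

For the examples that follow, lcs_length("AGCT", "CGT") returns 2 and lcs_length("aabcdec", "badea") returns 3.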
Let us consider an example.
A = AGCT
and B = CGT
The entire process of finding the length of a longest common subsequence of A and B is now
illustrated in the following table, Table 2.1. By tracing back, we can find two longest common
subsequences, CT and GT.
Let us consider another example: A = aabcdec and B = badea. Table 2.2 illustrates the process.
Again, it can be seen that we have two longest common subsequences, namely bde and ade.
Table 2.1: The Process of Finding the Length of Longest Common Subsequence of AGCT and CGT
Table 2.2: The Process of Finding the Length of a Longest Common Subsequence of A = aabcdec and
B = badea
2.5 The Time-Complexity of Algorithms
In the above sections, we showed that it is important to be able to design efficient algorithms. Or, to
put it another way, we may say that many problems can hardly be solved if we cannot design
efficient algorithms. Therefore, we now come to a critical question: How do we measure the
efficiency of algorithms?
We usually say that an algorithm is efficient if the program based upon this algorithm runs very fast.
But whether a program runs fast or not sometimes depends on the hardware and also on the skill of the
programmer, both of which are irrelevant to the algorithm itself.
In algorithm analysis, we always choose a particular step of the algorithm. Then we try to see how
many such steps are needed to complete the program. For instance, in all sorting algorithms, the
comparison of data cannot be avoided. Therefore, we often use the number of comparisons of data as
the time-complexity of a sorting algorithm.
Let us consider the straight insertion sort algorithm. We are given a sequence of numbers x1 x2 ... xn.
The straight insertion sort algorithm scans this sequence. If xi is found to be smaller than x(i-1), we
put xi to the left of x(i-1). This process is continued until the number to the left of xi is not smaller
than xi.
Algorithm 2.5 The Straight Insertion Sort Algorithm
Input: A sequence of numbers x1 x2 ... xn
Output: The sorted sequence of x1 x2 ... xn
for j = 2 to n do
    i = j - 1
    x = xj
    while x < xi and i > 0 do
        x(i+1) = xi
        i = i - 1
    end while
    x(i+1) = x
end for
Suppose that the input sequence is 9, 17, 1, 5, 10. The straight insertion sort sorts this sequence into a
sorted sequence as follows:
9
9,17
1,9,17
1,5,9,17
1,5,9,10,17
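The pseudocode above carries over almost line for line into Python. This is an illustrative sketch (0-based indexing, sorting a copy of the input):

```python
def insertion_sort(seq):
    """Straight insertion sort; returns a new sorted list."""
    xs = list(seq)
    for j in range(1, len(xs)):      # for j = 2 to n
        x = xs[j]                    # x = xj
        i = j - 1
        while i >= 0 and x < xs[i]:  # while x < xi and i > 0
            xs[i + 1] = xs[i]        # shift xi one position to the right
            i -= 1
        xs[i + 1] = x                # drop x into its final place
    return xs
```

On the example input, insertion_sort([9, 17, 1, 5, 10]) returns [1, 5, 9, 10, 17].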
In this sorting algorithm, the dominating steps are data movements. There are three data movement
operations, namely x = xj, x(i+1) = xi and x(i+1) = x. We can use the number of data movements to
measure the time-complexity of the algorithm. In the algorithm, there is one outer loop and one inner
loop. For the outer loop, the data movement operations x = xj and x(i+1) = x are always executed no
matter what the input is. For the inner loop, there is only one data movement, namely x(i+1) = xi. This
operation is executed only if this inner loop is executed. In other words, whether this operation is
executed depends on the input data. Let us denote the number of data movements executed for the
inner loop by di. The total number of data movements for the straight insertion sort is

X = Σ_{i=2}^{n} (2 + di) = 2(n - 1) + Σ_{i=2}^{n} di
Worst Case:
The worst case occurs when the input sequence is sorted in reverse order. In such a case,

d2 = 1, d3 = 2, ..., dn = n - 1
Thus,

Σ_{i=2}^{n} di = n(n - 1)/2

X = 2(n - 1) + n(n - 1)/2 = (n - 1)(n + 4)/2 = O(n^2)
Average Case:
To conduct an analysis of the average case, note that when xi is being considered, the (i - 1) numbers
to its left have already been sorted. Imagine that xi is the largest among these i numbers. Then the
inner loop will not be executed. If xi is the jth largest number among the i numbers, there will be j - 1
data movements executed in the inner loop. The probability that xi is the jth largest among the i numbers
is 1/i for 1 ≤ j ≤ i. Therefore the average number of data movements for the ith iteration is
2 + (1/i)(0 + 1 + ... + (i - 1)) = 2 + (i - 1)/2 = (i + 3)/2

where the leading 2 counts the two data movements of the outer loop.
The average case time-complexity for straight insertion sort is

X = Σ_{i=2}^{n} (i + 3)/2 = (n - 1)(n + 8)/4 = O(n^2)
T(n) = T((1 - f)n) + O(n^k) for n > 5
T(n) = c for n ≤ 5

For sufficiently large n, we have

T(n) ≤ T((1 - f)n) + cn^k
     ≤ T((1 - f)^2 n) + c(1 - f)^k n^k + cn^k
     ...
     ≤ c' + cn^k + c(1 - f)^k n^k + ... + c(1 - f)^(pk) n^k
     = c' + cn^k (1 + (1 - f)^k + (1 - f)^(2k) + ... + (1 - f)^(pk))

Since (1 - f) < 1, as n → ∞,

T(n) = O(n^k)
We now explain why the prune and search strategy can be applied to solve the selection problem.
Given a set S of n numbers, suppose that there is a number p which divides S into three subsets
S1, S2 and S3: S1 containing all numbers smaller than p, S2 containing all numbers equal to p, and S3
containing all numbers greater than p. Then we have the following cases:
Case 1: The size of S1 is greater than or equal to k. In this case, the kth smallest number of S must be
located in S1 and we can prune away S2 and S3.
Case 2: The condition of Case 1 is not valid, but the total size of S1 and S2 is greater than or equal to
k. In this case, the kth smallest number of S must be equal to p.
Case 3: Neither the condition of Case 1 nor that of Case 2 is valid. In this case, the kth smallest number
of S must be located in S3 and we can prune away S1 and S2.
The problem is to determine an appropriate p. This number p must guarantee that a constant fraction
of numbers can be eliminated. Algorithm 2.7 can be used to find that p.
Algorithm 2.7 A Subroutine to Find p from n Numbers for the Selection Problem
Input: A set S of n numbers.
Output: The number p which is to be used in the algorithm to find the kth smallest number based
upon the prune and search strategy.
Step 1: Divide S into ⌈n/5⌉ subsets of 5 numbers each, adding dummy elements to the last subset if
necessary.
Step 2: Sort each of the 5-number subsets.
Step 3: Find the median mi of the ith subset. Recursively find the median of m1, m2, ..., m⌈n/5⌉ by
using the selection algorithm. Let p be this median.
Figure 2.15: The Execution of Algorithm 2.7
That the selected p can guarantee that 1/4 of the input data can be eliminated is illustrated in Figure
2.15.
Some points about Algorithm 2.7 are in order. First, it is not essential that the input set be
divided into subsets containing 5 numbers. We may divide it into subsets each containing, say, 7
numbers. Our algorithm would work as long as each subset contains a constant number of numbers.
Note that as long as the input size is a constant, it takes O(1), that is, a constant number of steps,
to complete the algorithm. Thus, each sorting performed in Step 2 takes a constant number of steps.
For Step 3, p will be found by using the selection algorithm itself recursively.
The following is the algorithm based upon the prune and search strategy to find the kth smallest
number.
Algorithm 2.8 A Prune and Search Algorithm to Find the kth Smallest Number
Input: A set S of n numbers.
Output: The kth smallest number of S.
Step 1: Divide S into ⌈n/5⌉ subsets of 5 numbers; if n is not a multiple of 5, add some dummy
elements to the last subset so that it contains five elements.
Step 2: Sort each subset of elements.
Step 3: Use Algorithm 2.7 to determine p.
Step 4: Partition S into three subsets S1, S2 and S3, containing the numbers less than p, equal to p and
larger than p, respectively.
Step 5: If |S1| ≥ k, discard S2 and S3 and select the kth smallest number of S1 in the next
iteration; else if |S1| + |S2| ≥ k, p is the kth smallest number of S; otherwise, let
k' = k - |S1| - |S2|. Solve the problem that selects the k'th smallest number from S3
during the next iteration.
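Algorithms 2.7 and 2.8 can be combined into one recursive routine. The following Python sketch is ours, not the textbook's own code; it uses list comprehensions for the partitioning step and treats small inputs directly rather than padding with dummy elements.

```python
def select(S, k):
    """Return the kth smallest number of S (k is 1-based)."""
    if len(S) <= 5:
        return sorted(S)[k - 1]
    # Steps 1-3 / Algorithm 2.7: medians of the 5-number subsets, then their median p
    subsets = [S[i:i + 5] for i in range(0, len(S), 5)]
    medians = [sorted(sub)[len(sub) // 2] for sub in subsets]
    p = select(medians, (len(medians) + 1) // 2)
    # Step 4: partition S around p
    S1 = [x for x in S if x < p]
    S2 = [x for x in S if x == p]
    S3 = [x for x in S if x > p]
    # Step 5: prune away the subsets that cannot contain the answer
    if len(S1) >= k:
        return select(S1, k)
    if len(S1) + len(S2) >= k:
        return p
    return select(S3, k - len(S1) - len(S2))
```

Since p always lands in S2, each recursive call works on a strictly smaller set, so the recursion terminates even with duplicate values.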
Let T(n) denote the time-complexity of the algorithm. Then

T(n) ≤ T(3n/4) + T(n/5) + O(n)
The first term T(3n/4) is due to the fact that at least 1/4 of the input data will be eliminated after each
iteration. The second term T(n/5) is due to the fact that during the execution of Algorithm 2.7, we
have to solve a selection problem involving n/5 numbers. The third term O(n) is due to the fact
that dividing n numbers into n/5 subsets takes O(n) steps.
It can be proved that T(n) = O(n). The proof is rather complicated and is omitted here.
Although we may dislike time-complexities such as n^3 and n^5, they are not so bad as compared
with time-complexities such as 2^n or n!. When n = 10000, 2^n is an exceedingly large number. If an
algorithm has such a time-complexity, then the problem can never be solved by any computer when n
is large. An algorithm is a polynomial algorithm if its time-complexity is O(p(n)) where p(n) is a
polynomial function, such as n^2 or n^4. An algorithm is an exponential algorithm if its time-
complexity cannot be bounded by a polynomial function. There are many problems which have
polynomial algorithms. The sorting problem, the minimal spanning tree problem and the longest
common subsequence problem all have polynomial algorithms. A problem is called a
polynomial problem if there exist polynomial algorithms to solve it. Unfortunately, there are many
problems which, up to now, have no polynomial algorithms to solve them. We are interested in one
question: Is it possible that in the future, some polynomial algorithms will be found for them? This
question will be answered in the next section.
2.8 The NP-Complete Problems
The concept of NP-completeness is perhaps the most difficult one in the field of design and analysis
of algorithms. It is impossible to present this idea formally. We shall instead present an informal
discussion of these concepts.
Let us first define some problems.
The Partition Problem: We are given a set S of numbers and we are asked to determine whether S
can be partitioned into two subsets S1 and S2 such that the sum of the elements in S1 is equal to
the sum of the elements in S2.
For example, let S = {13, 2, 17, 20, 8}. The answer to this problem instance is "yes" because we can
partition S into S1 = {13, 17} and S2 = {2, 20, 8}.
The Sum of Subset Problem: We are given a set S of numbers and a constant c, and we are asked to
determine whether there exists a subset S' of S such that the sum of the elements in S' is equal to c.
For example, let S = {12, 9, 33, 42, 7, 10, 5} and c = 24. The answer to this problem instance is "yes"
as there exists S' = {9, 10, 5} and the sum of the elements in S' is equal to 24. If c is 6, the answer
will be "no".
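The sum of subset problem admits the same kind of brute-force decision procedure; again this is an illustrative sketch of ours:

```python
from itertools import combinations

def sum_of_subset(S, c):
    """Decide whether some non-empty subset of S sums to c (exponential time)."""
    return any(sum(subset) == c
               for r in range(1, len(S) + 1)
               for subset in combinations(S, r))
```

Here sum_of_subset([12, 9, 33, 42, 7, 10, 5], 24) returns True, while the same set with c = 6 gives False.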
The Satisfiability Problem: We are given a Boolean formula X and we are asked whether there exists
an assignment of true or false to the variables in X which makes X true.
For example, let X be (x1 ∨ x2 ∨ x3) ∧ (¬x1 ∨ x2) ∧ (¬x2 ∨ x3). Then the following
assignment will make X true and the answer will be "yes":

x1 = F, x2 = F, x3 = T

If X is x1 ∧ ¬x1, there will be no assignment which can make X true and the answer will be "no".
The Minimal Spanning Tree Problem: Given a graph G, find a spanning tree T of G with the
minimum length.
The Traveling Salesperson Problem: Given a graph G = (V, E), find a cycle of edges of this graph
such that every vertex in the graph is visited exactly once and the total length of the cycle is minimum.
For example, consider Figure 2.16. There are two cycles satisfying our condition. They are
C1: a → b → e → d → c → f → a and C2: a → c → b → e → d → f → a. C1 is
shorter and is the solution of this problem instance.
Figure 2.16: A Graph
For the partition problem, the sum of subset problem and the satisfiability problem, their solutions are
either "yes" or "no". They are called decision problems. The minimal spanning tree problem and the
traveling salesperson problem are called optimization problems.
For an optimization problem, there is always a decision problem corresponding to it. For instance,
consider the minimal spanning tree problem; we can define a decision version of it as follows: Given
a graph G, determine whether there exists a spanning tree of G whose total length is less than a given
constant c. This decision version of the minimal spanning tree can be solved after the minimal
spanning tree problem, which is an optimization problem, is solved. Suppose the total length of the
minimal spanning tree is a. If a < c, the answer is "yes"; otherwise, its answer is "no". The decision
version of this minimal spanning tree problem is called the minimal spanning tree decision problem.
Similarly, we can define the longest common subsequence decision problem as follows: Given two
sequences, determine whether there exists a common subsequence of them whose length is greater
than a given constant c. We again call this decision problem the longest common subsequence
decision problem. The decision version will be solved as soon as the
optimization problem is solved.
In general, optimization problems are more difficult than decision problems. To investigate whether
an optimization problem is difficult to solve, we merely have to see whether its decision version is
difficult or not. If the decision version is difficult already, the optimization version must be difficult.
Before discussing NP-complete problems, note that there is a term called NP problem. We cannot
formally define NP problems here as it is too complicated to do so. The reader may just remember
the following: (1) NP problems are all decision problems. (2) Nearly all of the decision problems are
NP problems. Among the NP problems, there are many problems which have polynomial algorithms.
They are called P problems. For instance, the minimal spanning tree decision problem and the longest
common subsequence decision problem are all P problems. There is also a large set of problems
which, up to now, have no polynomial algorithms.
Figure 2.17: NP Problems
NP-complete problems constitute a subset of NP problems, as shown in Figure 2.17. A precise and
formal definition of NP-complete problems cannot be given in this book. But some important
properties of NP-complete problems can be stated as follows:
(1) Up to now, no NP-complete problem has any worst case polynomial algorithm.
(2) If any NP-complete problem can be solved in polynomial time in worst case, all NP problems,
including all NP-complete problems, can be solved in polynomial time in worst case.
(3) Whether a problem is NP-complete or not has to be formally proved and there are thousands of
problems proved to be NP-complete problems.
(4) If the decision version of an optimization problem is NP-complete, this optimization problem is
called NP-hard.
Based upon the above facts, we can conclude that all NP-complete and NP-hard problems must be
difficult problems. Not only do they not have polynomial algorithms at present, it is quite unlikely
that they can have polynomial algorithms in the future because of the second property stated above.
The satisfiability problem is a famous NP-complete problem. The traveling salesperson problem is an
NP-hard problem. Many other problems, such as the chromatic number problem, vertex covering
problem, bin packing problem, 0/1 knapsack problem and the art museum problem are all NP-hard. In
the future, we will often claim that a certain problem is NP-complete without giving a formal proof.
Once a problem is said to be NP-complete, it means it is quite unlikely that a polynomial algorithm
can be designed for it. In fact, the reader should never even try to find a polynomial algorithm for it.
But the reader must understand that we cannot say that there exist no polynomial algorithms for NP-
complete problems. We are merely saying that the chance of having such algorithms is very, very
small.
It should be noted here that NP-completeness refers to worst cases. Thus, it is still possible to find an
algorithm for an NP-complete problem which has polynomial time-complexity in average cases. It is
our experience that this is also quite difficult as the analysis of average cases is usually quite difficult
to begin with. It is also possible to design some algorithms which perform rather well although we
can not have an average case analysis of them.
Should we give up hope when we have proved that a problem is NP-hard? No, we should not. In the
next section, we shall introduce the concept of approximation algorithms. Whenever we have proved
a problem to be NP-complete, we should try to design an approximation algorithm for it. This will be
discussed in the following section.
2.9 Approximation Algorithms
As indicated in the previous section, many optimization problems are NP-hard problems. This means
that it is quite unlikely that polynomial algorithms can be designed for these problems. Thus it is
desirable to have approximation algorithms which will produce approximate solutions with
polynomial time-complexities.
Figure 2.18: A Graph
Let us consider the vertex covering problem. Given a graph G = (V,E), the vertex covering problem
requires us to find a minimum number of vertices from V which covers all edges in E. For instance,
for the graph in Figure 2.18, vertex a covers all edges. The solution is {a}. For the graph in Figure
2.19, the solution is {b,d}.
It has been proved that the vertex covering problem is NP-complete. Algorithm 2.9 is an
approximation algorithm for this problem.
Figure 2.19: A Graph
Let us apply this approximation algorithm to the graph in Figure 2.18. Suppose we pick up edge (a,d).
We can see that all other edges are incident to {a,d}. Thus {a,d} is the approximate solution. Note
that the optimum solution is {a}. Thus the size of the approximate solution is twice as large as that of
the optimum solution.
Now we apply the algorithm to the graph in Figure 2.19. Suppose we pick up (c,d). S will be {c,d}
and edges (b,c) and (d,e) will be eliminated. Edge (a,b) still remains. We pick up (a,b). The final S
will be {c,d,a,b}. It was pointed out that the optimum solution is {b,d}. Thus the approximation
algorithm has again produced an approximate solution whose size is twice that of the optimum
solution.
Algorithm 2.9 An Approximation Algorithm to Solve the Vertex Covering Problem
Input: A graph G = (V,E).
Output: An approximate solution for the vertex covering problem with performance ratio 2.
Step 1: Pick up any edge e. Put the two end vertices u and v of e into S.
Step 2: Eliminate all edges which are incident to u or v.
Step 3: If there is no edge left, output S as the approximate solution. Otherwise, go to Step 1.
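Algorithm 2.9 is only a few lines in any language. The Python sketch below takes a list of edges; the example in the usage note is a small hypothetical star graph in the spirit of Figure 2.18 (the figure itself is not reproduced here):

```python
def vertex_cover_approx(edges):
    """Approximation algorithm for vertex covering with performance ratio 2."""
    S = set()
    for u, v in edges:
        # Step 1: pick an edge not yet covered and put both of its ends into S.
        # Step 2 is implicit: edges already incident to S fail this test.
        if u not in S and v not in S:
            S.add(u)
            S.add(v)
    return S
```

On the hypothetical star graph [('a', 'b'), ('a', 'c'), ('a', 'd')], the first edge puts {a, b} into S and every remaining edge is already covered, so the output is twice the size of the optimum cover {a}.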
It is by no means accidental that in each case, the size of the approximate solution is twice as
large as that of the optimum solution. We shall prove later that Algorithm 2.9 will always perform in
such a way.
Let App be the size of the solution produced by an approximation algorithm and let Opt be the size of
an optimal solution. The performance ratio of the approximation algorithm is defined as App/Opt. For
some approximation algorithms, the performance ratio