
Pattern Matching

[Title figure: the pattern a b a c a b aligned at successive shifts under the text a b a c a a b …, with numbered comparisons]

Strings

A string is a sequence of characters

Examples of strings:
- C program
- HTML document
- DNA sequence
- Digitized image

An alphabet Σ is the set of possible characters for a family of strings

Examples of alphabets:
- ASCII
- Unicode
- {0, 1}
- {A, C, G, T}

Let P be a string of size m
- A substring P[i .. j] of P is the subsequence of P consisting of the characters with ranks between i and j
- A prefix of P is a substring of the type P[0 .. i]
- A suffix of P is a substring of the type P[i .. m − 1]

Given strings T (text) and P (pattern), the pattern matching problem consists of finding a substring of T equal to P

Applications:
- Text editors
- Search engines
- Biological research
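The substring, prefix, and suffix definitions above can be sketched in Python. This is only an illustration: the helper names `substring`, `prefixes`, and `suffixes` are hypothetical, and note that the slide notation P[i .. j] includes rank j, whereas a Python slice excludes its upper bound.

```python
def substring(P, i, j):
    """Return P[i .. j]: the characters with ranks i through j, inclusive."""
    return P[i:j + 1]  # Python slices exclude the end index, hence j + 1


def prefixes(P):
    """Return every prefix P[0 .. i] for 0 <= i <= m - 1."""
    return [P[:i + 1] for i in range(len(P))]


def suffixes(P):
    """Return every suffix P[i .. m - 1] for 0 <= i <= m - 1."""
    return [P[i:] for i in range(len(P))]


if __name__ == "__main__":
    P = "abacab"                 # pattern used in later slides
    print(substring(P, 1, 3))    # characters with ranks 1, 2, 3
    print(prefixes(P))
    print(suffixes(P))
```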

Brute-Force Algorithm

The brute-force pattern matching algorithm compares the pattern P with the text T for each possible shift of P relative to T, until either
- a match is found, or
- all placements of the pattern have been tried

Brute-force pattern matching runs in time O(nm)

Example of worst case:
- T = aaa … ah
- P = aaah
- may occur in images and DNA sequences
- unlikely in English text

Brute Force

function BruteForceMatch(T, P, m, n)
    Input text T of size n and pattern P of size m
    Output starting index of a substring of T equal to P or −1 if no such substring exists

    /* only shifts 0 .. n − m keep T[i + j] inside the text */
    for (i = 0; i <= n − m; i++) {
        /* test shift i of the pattern */
        j = 0;
        while (j < m && T[i + j] == P[j])
            j = j + 1;
        if (j == m)
            return i;    /* match at i */
    }
    return −1;           /* no match */
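A minimal runnable sketch of the brute-force idea in Python, assuming the text and pattern are ordinary strings. The function name `brute_force_match` is an illustrative choice, and the loop bound n − m + 1 is used so that T[i + j] never runs past the end of the text.

```python
def brute_force_match(T, P):
    """Return the first index where P occurs in T, or -1 if it does not occur."""
    n, m = len(T), len(P)
    # Try every shift i at which the whole pattern still fits inside the text.
    for i in range(n - m + 1):
        j = 0
        # Extend the match one character at a time.
        while j < m and T[i + j] == P[j]:
            j += 1
        if j == m:
            return i  # all m characters matched at shift i
    return -1  # every placement was tried without success


if __name__ == "__main__":
    print(brute_force_match("abacaabadcabacabaabb", "abacab"))
    print(brute_force_match("aaaaaaah", "aaah"))
```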

Brute Force-Complexity

Given a pattern M characters in length, and a text N characters in length...

Worst case: compares pattern to each substring of text of length M. For example, M=5.
This kind of case can occur for image data.

Total number of comparisons: M (N-M+1)
Worst case time complexity: O(MN)

Brute Force-Complexity (cont.)

Given a pattern M characters in length, and a text N characters in length...

Best case if pattern found: Finds pattern in first M positions of text. For example, M=5.

Total number of comparisons: M
Best case time complexity: O(M)
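The two comparison counts above can be checked with a small instrumented version of brute-force matching. This is a sketch under assumptions: `count_comparisons` is a hypothetical helper, and the inputs (N = 20, M = 5) are chosen only to mirror the M=5 example on the slides.

```python
def count_comparisons(T, P):
    """Brute-force match that also counts character comparisons.

    Returns (index, comparisons), where index is -1 if P does not occur in T.
    """
    n, m = len(T), len(P)
    comparisons = 0
    for i in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1          # one comparison of T[i + j] with P[j]
            if T[i + j] != P[j]:
                break
            j += 1
        if j == m:
            return i, comparisons
    return -1, comparisons


if __name__ == "__main__":
    N, M = 20, 5
    # Worst case: every shift compares all M characters, M * (N - M + 1) in total.
    worst_text = "a" * (N - 1) + "h"
    print(count_comparisons(worst_text, "aaaah"), M * (N - M + 1))
    # Best case when found: the pattern sits at the very start, M comparisons.
    best_text = "aaaah" + "x" * (N - M)
    print(count_comparisons(best_text, "aaaah"), M)
```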

Brute Force-Complexity (cont.)

Given a pattern M characters in length, and a text N characters in length...

Best case if pattern not found: Always mismatch on first character. For example, M=5.

Total number of comparisons: at most N (one per shift)
Best case time complexity: O(N)

Boyer-Moore’s Algorithm (1)

The Boyer-Moore pattern matching algorithm is based on two heuristics

Looking-glass heuristic: Compare P with a subsequence of T moving backwards

Character-jump heuristic: When a mismatch occurs at T[i] = c
- If P contains c, shift P to align the last occurrence of c in P with T[i]
- Else, shift P to align P[0] with T[i + 1]

Example

T = a p a t t e r n   m a t c h i n g   a l g o r i t h m
P = r i t h m

[Figure: P aligned at seven successive shifts under T; the comparisons are numbered 1 to 11, with comparisons 7 to 11 matching m, h, t, i, r at the final alignment]

Last-Occurrence Function

Boyer-Moore’s algorithm preprocesses the pattern P and the alphabet Σ to build the last-occurrence function L mapping Σ to integers, where L(c) is defined as
- the largest index i such that P[i] = c or
- −1 if no such index exists

Example:
- Σ = {a, b, c, d}
- P = abacab

    c      a   b   c   d
    L(c)   4   5   3   −1

The last-occurrence function can be represented by an array indexed by the numeric codes of the characters

The last-occurrence function can be computed in time O(m + s), where m is the size of P and s is the size of Σ

Boyer-Moore’s Algorithm (2)

function BoyerMooreMatch(T, P, Σ)
    L = lastOccurenceFunction(P, Σ);
    i = m − 1;
    j = m − 1;
    repeat {
        if (T[i] == P[j])
            if (j == 0)
                return i;    /* match at i */
            else {
                i−−;
                j−−;
            }
        else {
            /* character-jump */
            l = L[T[i]];
            i = i + m − min(j, 1 + l);
            j = m − 1;
        }
    } until (i > n − 1);
    return −1;               /* no match */

Case 1: j ≤ 1 + l
[Figure: the mismatched text character a at T[i] last occurs in P at index l, at or to the right of position j; i advances by m − j]

Case 2: 1 + l ≤ j
[Figure: the mismatched text character a at T[i] last occurs in P at index l, to the left of position j; i advances by m − (1 + l)]
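The last-occurrence function and the matching loop can be sketched in Python as follows. Two choices here are assumptions rather than part of the slides: the function L is stored in a dictionary instead of an array indexed by character codes, and the repeat-until loop is written as a `while` loop so that a pattern longer than the text is rejected before T[i] is read. The names `last_occurrence` and `boyer_moore_match` are illustrative.

```python
def last_occurrence(P, alphabet):
    """Map each character c of the alphabet to the largest i with P[i] == c, else -1."""
    L = {c: -1 for c in alphabet}     # s steps: default for every character
    for i, c in enumerate(P):         # m steps: later occurrences overwrite earlier ones
        L[c] = i
    return L


def boyer_moore_match(T, P, alphabet):
    """Return the index of a substring of T equal to P, or -1 if none exists."""
    n, m = len(T), len(P)
    if m == 0:
        return 0                      # assumption: the empty pattern matches at 0
    L = last_occurrence(P, alphabet)
    i = j = m - 1                     # start at the last character of the pattern
    while i <= n - 1:
        if T[i] == P[j]:
            if j == 0:
                return i              # the whole pattern matched, starting at i
            i -= 1                    # looking-glass: keep comparing backwards
            j -= 1
        else:
            l = L.get(T[i], -1)       # characters outside the alphabet behave like -1
            i += m - min(j, 1 + l)    # character-jump
            j = m - 1
    return -1


if __name__ == "__main__":
    print(last_occurrence("abacab", "abcd"))
    print(boyer_moore_match("abacaabadcabacabaabb", "abacab", "abcd"))
```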

Example

T = a b a c a a b a d c a b a c a b a a b b
P = a b a c a b

[Figure: P aligned at successive shifts under T by Boyer-Moore’s algorithm; the comparisons are numbered 1 to 13]

Analysis

Boyer-Moore’s algorithm runs in time O(nm + s)

Example of worst case:
- T = aaa … a
- P = baaa

The worst case may occur in images and DNA sequences but is unlikely in English text

Boyer-Moore’s algorithm is significantly faster than the brute-force algorithm on English text

[Figure: worst case with T = a a a a a a a a a and P = b a a a a a; four alignments of six comparisons each, numbered 1 to 24]
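The comparison counts shown in the two figures can be reproduced with an instrumented sketch of the Boyer-Moore loop. The helper name `boyer_moore_comparisons` is hypothetical, the pattern is assumed to be non-empty, and the last-occurrence table is built as a dictionary over the pattern's own characters, with every other character treated as −1.

```python
def boyer_moore_comparisons(T, P):
    """Run the Boyer-Moore loop and count character comparisons.

    Returns (index, comparisons); index is -1 when P does not occur in T.
    Assumes P is non-empty.
    """
    n, m = len(T), len(P)
    L = {c: k for k, c in enumerate(P)}   # last occurrence of each pattern character
    comparisons = 0
    i = j = m - 1
    while i <= n - 1:
        comparisons += 1                  # one comparison of T[i] with P[j]
        if T[i] == P[j]:
            if j == 0:
                return i, comparisons
            i -= 1
            j -= 1
        else:
            i += m - min(j, 1 + L.get(T[i], -1))
            j = m - 1
    return -1, comparisons


if __name__ == "__main__":
    # Example slide: text and pattern taken from the figure.
    print(boyer_moore_comparisons("abacaabadcabacabaabb", "abacab"))
    # Worst-case slide: every alignment is scanned in full before the mismatch on b.
    print(boyer_moore_comparisons("a" * 9, "baaaaa"))
```

In the worst-case input each alignment costs m comparisons and the pattern advances by only one position, which is where the nm term in the running time comes from.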

KMP’s Algorithm (1)

Knuth-Morris-Pratt’s algorithm preprocesses the pattern to find matches of prefixes of the pattern with the pattern itself

The failure function F(j) is defined as the size of the largest prefix of P[0..j] that is also a suffix of P[1..j]

Knuth-Morris-Pratt’s algorithm modifies the brute-force algorithm so that if a mismatch occurs at P[j] ≠ T[i] we set j ← F(j − 1)

    j      0   1   2   3   4   5
    P[j]   a   b   a   a   b   a
    F(j)   0   0   1   1   2   3

[Figure: text . . a b a a b x . . . . . with the pattern a b a a b a mismatching at position j; the pattern is then shifted so that its prefix of length F(j − 1) lines up with the text already matched]

KMP’s Algorithm (2)

The failure function can be represented by an array and can be computed in O(m) time

function FailureFunction(P)
    i = 1;
    j = 0;
    F[0] = 0;
    while (i < m) {
        if (P[i] == P[j]) {
            F[i] = j + 1;
            i++;
            j++;
        }
        else if (j > 0)
            j = F[j − 1];
        else {
            F[i] = 0;
            i++;
        }
    }
    return F;
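A direct Python transcription of the failure-function computation, offered as a sketch: the name `failure_function` and the choice of a Python list for F are assumptions, and an empty pattern simply yields an empty list.

```python
def failure_function(P):
    """Return F, where F[j] is the length of the longest prefix of P[0..j]
    that is also a suffix of P[1..j]."""
    m = len(P)
    F = [0] * m
    i, j = 1, 0
    while i < m:
        if P[i] == P[j]:
            F[i] = j + 1      # the matched prefix grows by one character
            i += 1
            j += 1
        elif j > 0:
            j = F[j - 1]      # fall back to the next shorter candidate prefix
        else:
            F[i] = 0          # no prefix of P ends at position i
            i += 1
    return F


if __name__ == "__main__":
    print(failure_function("abaaba"))   # pattern from the table on the slide
```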

KMP’s Algorithm (3)

At each iteration of the while-loop, either
- i increases by one, or
- the shift amount i − j increases by at least one (observe that F(j − 1) < j)

Hence, there are no more than 2n iterations of the while-loop

Thus, KMP’s algorithm runs in optimal time O(m + n)

function KMPMatch(T, P)
    F = failureFunction(P);
    i = 0;
    j = 0;
    while (i < n) {
        if (T[i] == P[j])
            if (j == m − 1)
                return (i − j);    /* match */
            else {
                i++;
                j++;
            }
        else
            if (j > 0)
                j = F[j − 1];
            else
                i++;
    }
    return −1;                     /* no match */

Example

T = a b a c a a b a c c a b a c a b a a b b

    j      0   1   2   3   4   5
    P[j]   a   b   a   c   a   b
    F(j)   0   0   1   0   1   2

[Figure: P = a b a c a b aligned at successive shifts under T by KMP’s algorithm; the comparisons are numbered 1 to 19]
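The matcher and the iteration bound can be tried out with the following Python sketch. It is self-contained, so it repeats the failure-function computation; the names `failure_function` and `kmp_match` are illustrative, returning the number of loop iterations alongside the index is an addition for demonstration only, and treating the empty pattern as matching at index 0 is an assumption.

```python
def failure_function(P):
    """F[j] = length of the longest prefix of P[0..j] that is a suffix of P[1..j]."""
    m = len(P)
    F = [0] * m
    i, j = 1, 0
    while i < m:
        if P[i] == P[j]:
            F[i] = j + 1
            i += 1
            j += 1
        elif j > 0:
            j = F[j - 1]
        else:
            i += 1            # F[i] stays 0
    return F


def kmp_match(T, P):
    """Return (index, iterations): the match position (or -1) and loop iterations used."""
    n, m = len(T), len(P)
    if m == 0:
        return 0, 0           # assumption: the empty pattern matches at 0
    F = failure_function(P)
    i = j = 0
    iterations = 0
    while i < n:
        iterations += 1       # each iteration performs one comparison
        if T[i] == P[j]:
            if j == m - 1:
                return i - j, iterations
            i += 1
            j += 1
        elif j > 0:
            j = F[j - 1]      # reuse the prefix that is already known to match
        else:
            i += 1
    return -1, iterations


if __name__ == "__main__":
    T = "abacaabaccabacabaabb"
    index, iterations = kmp_match(T, "abacab")
    print(index, iterations, 2 * len(T))   # iterations stay within the 2n bound
```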
