
The Knuth-Morris-Pratt algorithm
KMP Matcher Algorithm
Prefix Function Algorithm
Alternative Prefix function algorithm
Input: pattern P of length m
Overlap[1] := 0
for k := 1 to m-1                      // consider P[1..k+1]
    c := P[k+1]                        // current character of P
    v := Overlap[k]
    while P[v+1] ≠ c and v ≠ 0         // until the overlap can be extended
        v := Overlap[v]                // fall back to the next-largest precomputed overlap
    if P[v+1] = c then
        Overlap[k+1] := v+1            // extend the current overlap
    else
        Overlap[k+1] := 0              // no overlap exists
return Overlap
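The pseudocode above can be sketched in Python roughly as follows. The function name `compute_overlap` and the padding entry at index 0 are assumptions made here so that the list indices line up with the 1-based `Overlap` table on the slide.

```python
def compute_overlap(pattern: str) -> list[int]:
    """Return a list `overlap` of length m+1 where overlap[k] is the length of
    the longest proper prefix of pattern[:k] that is also a suffix of it.
    Index 0 is padding (assumed 0) so indices match the 1-based pseudocode."""
    m = len(pattern)
    overlap = [0] * (m + 1)
    for k in range(1, m):
        c = pattern[k]                      # P[k+1] in 1-based notation
        v = overlap[k]
        # P[v+1] in the pseudocode is pattern[v] with 0-based strings.
        while v != 0 and pattern[v] != c:
            v = overlap[v]                  # next-largest precomputed overlap
        overlap[k + 1] = v + 1 if pattern[v] == c else 0
    return overlap


print(compute_overlap("ATTATACA")[1:])  # [0, 0, 0, 1, 2, 1, 0, 1]
```

The printed row agrees with the Overlap(j) values tabulated for the pattern ATTATACA in the worked example.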
Matching Algorithm
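Input: text T of length n, pattern P of length m, table Overlap (with Overlap(0) taken as 0)
i=1, j=1, k=1                          // k is the current candidate start position in T
while (n-k) ≥ m-1 do                   // P still fits in T[k..n]
    while j ≤ m and T[i] = P[j] do
        i++, j++
    if j > m then output k             // P occurs at T[k..k+m-1]
    if Overlap(j-1) > 0 then
        k = i - Overlap(j-1)
    else
        if i == k then i++
        k = i
    if j > 1 then j = Overlap(j-1) + 1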
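A minimal Python sketch of this matcher is given below. It keeps the slide's 1-based variables i, j, k. Two details are assumptions made here rather than something the slide states: Overlap(0) is treated as 0, and the outer loop runs while n - k ≥ m - 1 so that an occurrence ending at the last text character is still reported. The names `compute_overlap` and `kmp_match` are illustrative.

```python
def compute_overlap(pattern: str) -> list[int]:
    """Overlap table with a padding 0 at index 0 (1-based indexing)."""
    m = len(pattern)
    overlap = [0] * (m + 1)
    for q in range(1, m):
        v = overlap[q]
        while v != 0 and pattern[v] != pattern[q]:
            v = overlap[v]
        overlap[q + 1] = v + 1 if pattern[v] == pattern[q] else 0
    return overlap


def kmp_match(text: str, pattern: str) -> list[int]:
    """Return the 1-based start positions of every occurrence of pattern."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []                           # assumption: empty pattern yields no output
    overlap = compute_overlap(pattern)
    matches = []
    i = j = k = 1                           # invariant: i == k + j - 1
    while n - k >= m - 1:                   # pattern still fits at position k
        while j <= m and text[i - 1] == pattern[j - 1]:
            i += 1
            j += 1
        if j > m:
            matches.append(k)
        if overlap[j - 1] > 0:
            k = i - overlap[j - 1]          # shift so the overlap stays aligned
        else:
            if i == k:
                i += 1
            k = i
        if j > 1:
            j = overlap[j - 1] + 1          # resume after the overlap
    return matches


print(kmp_match("ATCGCACATTATACATTATTATACAT", "ATTATACA"))  # [8, 18]
```

The text pointer i never moves backwards; only k and j are adjusted after a mismatch, which is what keeps the scan linear.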
Computation of Prefix and Matching
j            1  2  3  4  5  6  7  8
P[j]         A  T  T  A  T  A  C  A
Overlap(j)   0  0  0  1  2  1  0  1

i       1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
T[i]    A  T  C  G  C  A  C  A  T  T  A  T  A  C  A  T  T  A  T  T  A  T  A  C  A  T
Example
• i=1, j=1, k=1 – match
• i=2, j=2 – match
• i=3, j=3 – no match
• Since Overlap(j-1) = Overlap(2) = 0, k = i = 3 and j = Overlap(j-1) + 1 = 1
• i=3, j=1 – no match
• Since i = k, i++ => i=4, k=4, j=1 – no match
• Since i = k, i++ => i=5, k=5, j=1 – no match
• Since i = k, i++ => i=6, k=6, j=1 – match
• i=7, j=2 – no match
• Since Overlap(j-1) = Overlap(1) = 0, k = i = 7 and j = Overlap(j-1) + 1 = 1
• i=7, j=1 – no match
Example (continued)
• Since i = k, i++ => i=8, k=8, j=1 – match
• i=9, j=2 – match
• i=10, j=3 – match
• i=11, j=4 – match
• i=12, j=5 – match
• i=13, j=6 – match
• i=14, j=7 – match
• i=15, j=8 – match
• i=16, j=9 – j > m => output k = 8 (the position where the pattern is found)
• Overlap(j-1) = Overlap(8) = 1 > 0 => k = i - Overlap(j-1) = 16 - 1 = 15
• Resume matching at j = Overlap(j-1) + 1 = 1 + 1 = 2
Example (continued)
• i=16, j=2 – match
• i=17, j=3 – match
• i=18, j=4 – match
• i=19, j=5 – match
• i=20, j=6 – no match
• Overlap(j-1) = Overlap(5) = 2 > 0 => k = i - Overlap(j-1) = 20 - 2 = 18; j = Overlap(j-1) + 1 = 2 + 1 = 3
• i=20, j=3 – match
• i=21, j=4 – match
• i=22, j=5 – match
• i=23, j=6 – match
• i=24, j=7 – match
• i=25, j=8 – match
• i=26, j=9 – j > m => output k = 18
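As an independent cross-check of the two reported positions, the same text and pattern can be scanned with Python's built-in `str.find`. This is a naive search, not KMP, and is used here only to confirm the outcome of the trace.

```python
text = "ATCGCACATTATACATTATTATACAT"
pattern = "ATTATACA"

positions = []
start = text.find(pattern)
while start != -1:
    positions.append(start + 1)             # convert to the slide's 1-based positions
    start = text.find(pattern, start + 1)   # step by 1 to allow overlapping occurrences

print(positions)  # [8, 18]
```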
Example 2
Running time
• For prefix computation – Θ(m)
• For matching - Θ(n)
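One way to see the Θ(n) matching bound empirically is to count character comparisons. Every successful comparison advances i, and every failed comparison advances k, so the count should not exceed roughly 2n. The sketch below instruments the matcher from the Matching Algorithm slide; the function name `count_comparisons`, the loop bound n - k ≥ m - 1, and the test inputs are assumptions chosen for illustration.

```python
def count_comparisons(text: str, pattern: str) -> int:
    """Run the slide-style matcher and count T[i]/P[j] comparisons.
    Assumes a non-empty pattern."""
    n, m = len(text), len(pattern)
    overlap = [0] * (m + 1)                 # index 0 is padding
    for q in range(1, m):
        v = overlap[q]
        while v != 0 and pattern[v] != pattern[q]:
            v = overlap[v]
        overlap[q + 1] = v + 1 if pattern[v] == pattern[q] else 0

    comparisons = 0
    i = j = k = 1
    while n - k >= m - 1:
        while j <= m:
            comparisons += 1                # one text/pattern comparison
            if text[i - 1] != pattern[j - 1]:
                break
            i += 1
            j += 1
        if overlap[j - 1] > 0:
            k = i - overlap[j - 1]
        else:
            if i == k:
                i += 1
            k = i
        if j > 1:
            j = overlap[j - 1] + 1
    return comparisons


# A text of repeated 'A' against 'AAAAAAAAAB' forces a mismatch at every shift.
for n in (10, 100, 1000):
    c = count_comparisons("A" * n, "A" * 9 + "B")
    print(n, c, c <= 2 * n)
```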
Lemma 32.5 (Prefix function iteration lemma)
• π*[q] = {π[q], π^(2)[q], …, π^(t)[q]} is the set of all values obtained by
repeatedly applying the prefix function π to q, stopping when π^(t)[q] = 0.
• Notation: P_k is the prefix P[1..k], and x ⊐ y means that x is a suffix of y.
• Lemma: Let P be a pattern of length m with prefix
function π. Then, for q = 1, 2, …, m, we have π*[q]
= {k : k < q and P_k ⊐ P_q}.
• Proof: We first prove that i ∈ π*[q] implies P_i ⊐ P_q.
• If i ∈ π*[q], then i = π^(u)[q] for some u > 0. We prove
the claim by induction on u.
• For u = 1, we have i = π[q], and the claim follows
since i < q and P_π[q] ⊐ P_q.
Lemma 32.5 (continued)
• For u > 1, the relations π[i] < i and P_π[i] ⊐ P_i, together with the
transitivity of < and ⊐, establish the claim for
all i in π*[q].
• Therefore, π*[q] ⊆ {k : k < q and P_k ⊐ P_q}.
• We prove that {k : k < q and P_k ⊐ P_q} ⊆ π*[q] by
contradiction.
• Suppose to the contrary that there is an
integer in the set {k : k < q and P_k ⊐ P_q} − π*[q],
and let j be the largest such value.
Lemma 32.5 (continued)
• Because π[q] is the largest value in {k : k < q
and P_k ⊐ P_q} and π[q] ∈ π*[q], we must have j <
π[q], and so we let j' denote the smallest
integer in π*[q] that is greater than j.
• We can choose j' = π[q] if there is no other
number in π*[q] that is greater than j.
• We have P_j ⊐ P_q because j ∈ {k : k < q and P_k ⊐
P_q}, and we have P_j' ⊐ P_q because j' ∈ π*[q].
Lemma 32.5 (continued)
• Thus, P_j ⊐ P_j' by Lemma 32.1, and j is the largest
value less than j' with this property.
• Therefore, we must have π[j'] = j and, since j' ∈
π*[q], we must have j ∈ π*[q] as well.
• This contradiction proves the lemma.
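The statement of Lemma 32.5 can be checked mechanically on small patterns. The sketch below computes π with the usual linear-time loop, builds π*[q] by repeated application, and compares it with the set {k : k < q and P_k is a suffix of P_q} found by brute force. The function names are illustrative, and index 0 of the `pi` list is padding so that indices stay 1-based.

```python
def prefix_function(p: str) -> list[int]:
    """pi[q] for q = 1..m (pi[0] is padding)."""
    m = len(p)
    pi = [0] * (m + 1)
    k = 0
    for q in range(2, m + 1):
        while k > 0 and p[k] != p[q - 1]:
            k = pi[k]
        if p[k] == p[q - 1]:
            k += 1
        pi[q] = k
    return pi


def pi_star(pi: list[int], q: int) -> set[int]:
    """All values reached by applying pi repeatedly to q, ending with 0."""
    values = set()
    while q > 0:
        q = pi[q]
        values.add(q)
    return values


def suffix_prefix_lengths(p: str, q: int) -> set[int]:
    """Brute force: every k < q such that P_k is a suffix of P_q."""
    return {k for k in range(q) if p[:q].endswith(p[:k])}


pattern = "ATTATACA"
pi = prefix_function(pattern)
for q in range(1, len(pattern) + 1):
    assert pi_star(pi, q) == suffix_prefix_lengths(pattern, q)
print(sorted(pi_star(pi, 5)))  # [0, 2]
```

For q = 5 the prefix ATTAT has the prefixes of length 0 and 2 as suffixes, which is what iterating π from 5 visits.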
Lemma 32.6
• Let P be a pattern of length m, and let π be the
prefix function for P. For q = 1, 2, …, m, if π[q] >
0, then π[q] − 1 ∈ π*[q − 1].
• Proof: If r = π[q] > 0, then r < q and P_r ⊐ P_q; thus
r − 1 < q − 1 and P_(r−1) ⊐ P_(q−1) (by dropping the last
character from P_r and P_q).
• By Lemma 32.5, therefore, π[q] − 1 = r − 1 ∈
π*[q − 1].
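Lemma 32.6 can also be spot-checked directly from the definitions. The sketch below deliberately uses a brute-force π (the longest proper prefix of P_q that is also a suffix of P_q) rather than the linear-time algorithm, so that the check does not depend on the code it is meant to justify. All function names are illustrative.

```python
def pi_brute(p: str, q: int) -> int:
    """Largest k < q such that p[:k] is a suffix of p[:q] (requires q >= 1)."""
    return max(k for k in range(q) if p[:q].endswith(p[:k]))


def pi_star(p: str, q: int) -> set[int]:
    """Values obtained by repeatedly applying pi to q, ending with 0."""
    values = set()
    while q > 0:
        q = pi_brute(p, q)
        values.add(q)
    return values


def lemma_32_6_holds(p: str) -> bool:
    """Check: whenever pi[q] > 0, pi[q] - 1 is in pi*[q - 1]."""
    return all(
        pi_brute(p, q) - 1 in pi_star(p, q - 1)
        for q in range(1, len(p) + 1)
        if pi_brute(p, q) > 0
    )


print(lemma_32_6_holds("ATTATACA"))  # True
```

For example, with P = ATTATACA and q = 5 we have π[5] = 2 and π*[4] = {1, 0}, so π[5] − 1 = 1 is indeed a member.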
