
World Academy of Science, Engineering and Technology

Vol:2 2008-06-27

RB-Matcher: String Matching Technique


Rajender Singh Chillar, Barjesh Kochar


International Science Index Vol:2, No:6, 2008 waset.org/Publication/10088

Abstract: All text processing systems allow their users to search for
a pattern string in a given text. String matching is fundamental to
database and text processing applications. Every text editor must
contain a mechanism to search the current document for arbitrary
strings. Spelling checkers scan an input text for words in the
dictionary and reject any strings that do not match. We store our
information in databases so that we can retrieve it later, and this
retrieval can be done using various string matching algorithms. This
paper describes a new string matching algorithm for various
applications. The new algorithm has been designed with the help of
the Rabin-Karp matcher to improve the string matching process.

Keywords: Algorithm, Complexity, Matching-patterns, Pattern,
Rabin-Karp, String, Text-processing.

I. INTRODUCTION

STRING-MATCHING is a technique to find a pattern in a given text.
Let Σ be an alphabet, a nonempty finite set. Elements of Σ are called
symbols or characters. A string (or word) over Σ is any finite
sequence of characters from Σ. For example, if Σ = {a,b}, then abab is
a string over Σ. String-matching is a very important topic in the
domain of text processing.[2] String-matching consists in finding
one, or more generally all, of the occurrences of a string (more
generally called a pattern) in a text. The pattern is denoted by
P = P[0 ... m-1]; its length is equal to m. The text is denoted by
T = T[0 ... n-1]; its length is equal to n. Both strings are built
over a finite set of characters called an alphabet, denoted by Σ. [3]
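
As a baseline for the definitions above, the following minimal sketch reports every shift at which P occurs in T by direct comparison. The function name naive_occurrences is an assumption made for illustration and does not appear in the paper; shifts are 0-based, matching the P[0 ... m-1] and T[0 ... n-1] notation.

```python
def naive_occurrences(text, pattern):
    """Return every 0-based shift s where pattern equals text[s:s+m]."""
    n, m = len(text), len(pattern)
    # There are n - m + 1 candidate shifts; each is checked character by character.
    return [s for s in range(n - m + 1) if text[s:s + m] == pattern]


print(naive_occurrences("abab", "ab"))  # [0, 2]
```

Every one of the n-m+1 shifts costs up to m character comparisons here, which is the cost that remainder-based filtering tries to avoid.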

Dr. Rajender Singh Chillar is Reader, Formal HOD (CS) with Maharishi
Dayanand University, India (phone: +919416277507; e-mail:
chillar01@rediffmail.com).
Barjesh Kochar is working as HOD-IT with GNIM, New Delhi, India
(phone: +919212505801; e-mail: barjeshkochar@gmail.com).
Garima Singh (Jr. Author) is pursuing a B.Tech at GGSIPU
University, New Delhi, India (e-mail: garima_20_04@yahoo.co.in).
Kanwaldeep Singh (Jr. Author) is pursuing a B.Tech at GGSIPU
University, New Delhi, India (e-mail: kawal_deep87@yahoo.co.in).

International Scholarly and Scientific Research & Innovation 2(6) 2008

The RABIN KARP matcher is one of the most effective string matching
algorithms. To find a numeric pattern P in a given text T, it first
divides the pattern by a predefined prime number q to calculate the
remainder of pattern P. Then it takes the first m characters (where m
is the length of pattern P) from text T at the first shift s and
computes the remainder of these m characters. Only if the remainder
of the pattern and the remainder of the text window are equal do we
compare the text with the pattern; otherwise there is no need for the
comparison.[1] The reason is that if the remainders of two numbers
are not equal, then the numbers themselves cannot be equal. We repeat
the process for the next set of characters from the text for all
possible shifts, from s=0 to n-m (where n denotes the length of the
text and m denotes the length of P). Accordingly, two numbers n1 and
n2 can only be equal if
REM (n1/q) = REM (n2/q) [1]
After the division there are three cases:
Case 1:
Successful hit: REM (n1) = REM (n2) and the characters of n1 also
match the characters of n2.
Case 2:
Spurious hit: REM (n1) = REM (n2) but the characters of n1 are not
equal to the characters of n2.
Case 3:
Unsuccessful hit: REM (n1) is not equal to REM (n2), so there is no
need to compare n1 and n2.
Example: For a given text T, pattern P and prime number q, to find
the pattern in the text we take as many characters from the text as
there are in the pattern, divide them by the predefined number q, and
divide the pattern by the same number q. We then compare the two
remainders to decide whether to compare the text with the pattern.
Rem (Text) = 234567 mod 11 = 3
Rem (Pattern) = 667888 mod 11 = 1
As the remainders are not equal, there is no need to compare the
text with the pattern. We move on to the next set of characters from
the text and repeat the procedure.[1] Only if the remainders match do
we compare that part of the text with the pattern. We maintain three
variables: Successful Hit, Spurious Hit and Unsuccessful Hit.
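
The three cases can be illustrated with a short sketch that classifies a single text window against the pattern. The function name classify is hypothetical, and the window is converted to an integer directly rather than through a rolling remainder, purely to keep the illustration small.

```python
def classify(window, pattern, q):
    """Classify one text window against the pattern using remainders mod q."""
    if int(window) % q != int(pattern) % q:
        return "unsuccessful hit"   # Case 3: remainders differ, no comparison needed
    if window == pattern:
        return "successful hit"     # Case 1: remainders and characters agree
    return "spurious hit"           # Case 2: remainders agree, characters differ


print(classify("234567", "667888", 11))  # remainders are 3 and 1
print(classify("1221", "1441", 11))      # both remainders are 0
print(classify("1441", "1441", 11))
```

The first call uses the numbers from the example above; the second uses a window and pattern whose remainders coincide although the digits differ, which is the situation the spurious-hit counter records.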
Rabin Karp Matcher Algorithm
Rabin_Matcher (T,P,d,q)
{
  n = Length (T)
  m = Length (P)
  t0 = 0
  p = 0
  h = d^(m-1) mod q
  For i = 1 to m
  {
    p = (d*p + P[i]) mod q
    t0 = (d*t0 + T[i]) mod q
  }
  For s = 0 to n-m
  {
    If ts = p
    {
      if P[1..m] = T[s+1..s+m]
        then print pattern matches at shift s
    }
    if s < n-m
      ts+1 = (d*(ts - h*T[s+1]) + T[s+1+m]) mod q
  }
}[1]
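
A runnable sketch of the matcher above follows, under the assumption that the text and pattern are strings of decimal digits (so d = 10). The function name rabin_karp and the choice to return the spurious shifts alongside the matches are illustrative additions, and shifts are 0-based.

```python
def rabin_karp(text, pattern, d=10, q=11):
    """Rabin-Karp over digit strings; returns (match_shifts, spurious_shifts)."""
    n, m = len(text), len(pattern)
    matches, spurious = [], []
    if m == 0 or m > n:
        return matches, spurious
    h = pow(d, m - 1, q)                 # d^(m-1) mod q
    p = t = 0
    for i in range(m):                   # preprocessing: remainders of P and first window
        p = (d * p + int(pattern[i])) % q
        t = (d * t + int(text[i])) % q
    for s in range(n - m + 1):
        if t == p:                       # remainders agree: verify the characters
            if text[s:s + m] == pattern:
                matches.append(s)
            else:
                spurious.append(s)
        if s < n - m:                    # roll the window one digit to the right
            t = (d * (t - h * int(text[s])) + int(text[s + m])) % q
    return matches, spurious


print(rabin_karp("14412217356431121441", "1441", 10, 11))  # ([0, 16], [3, 8, 9])
```

With the text, pattern and q = 11 taken from the paper's Rabin Karp example, this reports two matches and three spurious hits; the remaining twelve shifts are unsuccessful hits.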


II. IMPROVED STRING MATCHING ALGORITHM


A. Theory
As we can see, a spurious hit is an extra burden on the algorithm
that increases its time complexity, because we have to compare the
text with the pattern and still do not find the pattern at that
shift. To avoid this extra matching, RB_Matcher compares the
quotients along with the remainders:
REM (n1/q) = REM (n2/q) and
QUOTIENT (n1/q) = QUOTIENT (n2/q)
So, according to this method, along with the remainder we also
compute the quotient. If both the remainder and the quotient of the
text window match those of the pattern, it is a successful hit;
otherwise it is an unsuccessful hit and there is no need to compare
the characters. That means there is no extra computation for spurious
hits: if both values are the same the pattern is found, otherwise it
is not found.

B. Algorithms
The modified algorithm is as follows:
RB_Matcher (T,P,d,q)
{
  n = Length (T)
  m = Length (P)
  t0 = 0
  p = 0
  Q = 0
  pq = 0
  h = d^(m-1) mod q
  For i = 1 to m
  {
    p = (d*p + P[i]) mod q
    t0 = (d*t0 + T[i]) mod q
  }
  pq = P[1..m] DIV q
  For s = 0 to n-m
  {
    Q = T[s+1..s+m] DIV q
    If (ts = p and Q = pq)
    {
      print pattern matches at shift s
    }
    if s < n-m
      ts+1 = (d*(ts - h*T[s+1]) + T[s+1+m]) mod q
  }
} [3]

III. IMPROVEMENTS
The Rabin Karp matcher algorithm computes a remainder, on the basis
of which it concludes whether the pattern may have been found in the
text. So there is extra computation when processing spurious hits. In
the Modified RB matcher there is no chance of spurious hits, because
a window is reported only in the case of a successful hit.

A. Comparisons in terms of Time Complexity
To compare the previous work with the new one we applied both
algorithms to a large amount of fictitious data, and the results show
that the Modified RB_Matcher algorithm has lower time complexity than
the Rabin Karp Matcher. The worst case time complexity of the
Rabin-Karp matcher is O((n-m+1) m), while the worst case time
complexity of the Modified RB Matcher is O(n-m+1) (this time
complexity can be further improved if q=m), where n denotes the total
number of characters in text T and m denotes the total number of
characters in pattern P.[9]
Graphs:
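
The two worst-case expressions quoted in the time-complexity comparison, (n-m+1)m for Rabin Karp and n-m+1 for the modified matcher, can be tabulated with a small script. This is only a sketch that evaluates the stated formulas; it does not measure running time, and the pattern lengths chosen for m are illustrative assumptions rather than values from the paper.

```python
def worst_case_counts(n, m):
    """Return ((n-m+1)*m, n-m+1): the two worst-case expressions being compared."""
    shifts = n - m + 1
    return shifts * m, shifts


for n in (9, 20, 100):                   # text lengths used in the figures
    for m in (1, n // 2, n):             # illustrative pattern lengths (assumption)
        rk, rb = worst_case_counts(n, m)
        print(f"n={n:3d} m={m:3d}  Rabin-Karp={rk:5d}  RB_Matcher={rb:3d}")
```

For example, n = 100 and m = 10 give 910 against 91.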


Fig. 1 shows Time complexities of Rabin Karp and RB Matcher if n=9

Fig. 2 shows Time complexities of Rabin Karp and RB Matcher if n=20

Fig. 3 shows Time complexities of Rabin Karp and RB Matcher if n=100

B. Comparison in terms of Example
Rabin Karp Example: Text = 14412217356431121441, to find Pattern
P = 1441, q = 11. The remainder of Pattern P is p = 0.

Fig. 4 Example Rabin Karp

The rest of the cases are Unsuccessful Hits. So in Rabin Karp, to
find Pattern P we encounter Spurious Hits, which is extra processing.

Modified Rabin Karp Example:
(Q denotes Quotient)
In this algorithm a comparison of pattern and text will always lead
to a successful hit.
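
The quotient-and-remainder test of the modified matcher can be sketched on the same example text. The function name rb_matcher is an assumption, and the sketch simplifies the pseudocode by recomputing the quotient and remainder of each window with divmod instead of maintaining a rolling remainder. It relies on Python's arbitrary-precision integers; converting an m-digit window to a number at every shift is not a constant-time step when m is large.

```python
def rb_matcher(text, pattern, q=11):
    """Return 0-based shifts where quotient and remainder of the window equal the pattern's."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    pq, p = divmod(int(pattern), q)            # quotient and remainder of the pattern
    hits = []
    for s in range(n - m + 1):
        # Simplification (assumption): the whole window is converted at each shift.
        Q, t = divmod(int(text[s:s + m]), q)
        if t == p and Q == pq:                 # both agree, so the two numbers are equal
            hits.append(s)
    return hits


print(rb_matcher("14412217356431121441", "1441", 11))  # [0, 16]
```

Because a number is determined by its quotient and remainder for a fixed q, windows such as 1221, 3564 and 5643, whose remainders are also 0, are rejected without a character comparison.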


C. Test Cases
We will study some test cases for the modified algorithm's
complexity.

TABLE I FOR MODIFIED RABIN KARP MATCHER (CONSTANT TEXT LENGTH)

Length of text (n) | Length of pattern (m) | Value of q                     | Complexity (n-m+1)
100                | 10                    | 2, 3, 5, 7, 11, 13, 17, 19, 23 | 91 for each q
100                | 15                    | 2, 3, 5, 7, 11, 13, 17, 19, 23 | 86 for each q
100                | 20                    | 2, 3, 5, 7, 11, 13, 17, 19, 23 | 81 for each q
100                | 50                    | 2, 3, 5, 7, 11, 13, 17, 19, 23 | 51 for each q
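
The complexity column of the test cases is the number of candidate shifts, n-m+1, which does not depend on q. A minimal sketch that recomputes it for the listed text and pattern lengths follows; the function name shifts is an assumption made for illustration.

```python
def shifts(n, m):
    """Number of candidate shifts, n - m + 1, for text length n and pattern length m."""
    return n - m + 1


primes = (2, 3, 5, 7, 11, 13, 17, 19, 23)

# Constant text length n = 100, varying pattern length m.
for m in (10, 15, 20, 50):
    print(100, m, [shifts(100, m) for _ in primes])

# Constant pattern length m = 10, varying text length n.
for n in (100, 200, 300, 400):
    print(n, 10, [shifts(n, 10) for _ in primes])
```

Each printed list repeats a single value because q does not enter the expression.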

TABLE II FOR MODIFIED RABIN KARP (FOR CONSTANT PATTERN LENGTH)

Length of text (n) | Length of pattern (m) | Value of q                     | Complexity (n-m+1)
100                | 10                    | 2, 3, 5, 7, 11, 13, 17, 19, 23 | 91 for each q
200                | 10                    | 2, 3, 5, 7, 11, 13, 17, 19, 23 | 191 for each q
300                | 10                    | 2, 3, 5, 7, 11, 13, 17, 19, 23 | 291 for each q
400                | 10                    | 2, 3, 5, 7, 11, 13, 17, 19, 23 | 391 for each q

IV. CONCLUSION
From the above, we conclude that any numeric pattern can be found in
a given text T by following the Modified RB matcher in an effective
and efficient way. We also invite other research scholars to work on
the same problem and find better ways to find a string in a given
text T.

REFERENCES
[1] Algorithm design and analysis by T. H. Cormen.
[2] Algorithm design by Aho, Ullman and Hopcroft.
[3] www.algodesign.com
[4] Richard M. Karp. An Introduction to Randomized Algorithms.
    Discrete Applied Mathematics, 34:165-201, 1991.
