10/16/2016

Boyer-Moore algorithm

String matching

Idea
The algorithm of Boyer and Moore [BM77] compares the pattern with the text from right to left. If the
text symbol that is compared with the rightmost pattern symbol does not occur in the pattern at all, then
the pattern can be shifted by m positions behind this text symbol, where m is the length of the pattern.
The following example illustrates this situation.
Example:
0 1 2 3 4 5 6 7 8 9 ...
a b b a d a b a c b a
b a b a c
          b a b a c

The first comparison d-c at position 4 produces a mismatch. The text symbol d does not occur in the
pattern. Therefore, the pattern cannot match at any of the positions 0, ..., 4, since all corresponding
windows contain a d. The pattern can be shifted to position 5.
The best case for the Boyer-Moore algorithm is attained if at each attempt the first compared text symbol
does not occur in the pattern. Then the algorithm requires only O(n/m) comparisons, where n is the length
of the text.

Bad character heuristics
This method is called bad character heuristics. It can also be applied if the bad character, i.e. the text
symbol that causes a mismatch, occurs somewhere else in the pattern. Then the pattern can be shifted so
that it is aligned to this text symbol. The next example illustrates this situation.
Example:
0 1 2 3 4 5 6 7 8 9 ...
a b b a b a b a c b a
b a b a c
    b a b a c

Comparison b-c causes a mismatch. Text symbol b occurs in the pattern at positions 0 and 2. The pattern
can be shifted so that the rightmost b in the pattern is aligned to text symbol b.
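
The shift that the bad character heuristics suggests can be written as j - occ, where j is the pattern
position of the mismatch and occ is the rightmost position of the bad character in the pattern (-1 if it
does not occur). The following is a minimal sketch of that arithmetic, not a full matcher. The class name
BadCharacterShift, its method names and the 8-bit alphabet size are assumptions made for illustration.

```java
import java.util.Arrays;

// Illustrative sketch only: the names and the alphabet size of 256 are assumptions.
class BadCharacterShift {
    // occ[c] = rightmost position of symbol c in the pattern, or -1 if c does not occur.
    static int[] occurrenceTable(String p) {
        int[] occ = new int[256];
        Arrays.fill(occ, -1);
        for (int j = 0; j < p.length(); j++) {
            occ[p.charAt(j)] = j;
        }
        return occ;
    }

    // Shift suggested after a mismatch at pattern position j against text symbol c.
    // The value can be zero or negative when c occurs to the right of position j.
    static int shift(String p, int j, char c) {
        return j - occurrenceTable(p)[c];
    }

    public static void main(String[] args) {
        System.out.println(shift("babac", 4, 'd')); // 5: d does not occur in the pattern
        System.out.println(shift("babac", 4, 'b')); // 2: rightmost b is at position 2
        System.out.println(shift("cabab", 2, 'a')); // -1: rightmost a is right of the mismatch
    }
}
```

The last call shows why the value cannot be used on its own: a non-positive result would not move the
pattern forward.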

Good suffix heuristics
Sometimes the bad character heuristics fails. In the following situation the comparison a-b causes a
mismatch. An alignment of the rightmost occurrence of the pattern symbol a with the text symbol a would
produce a negative shift. Instead, a shift by 1 would be possible. However, in this case it is better to
derive the maximum possible shift distance from the structure of the pattern. This method is called good
suffix heuristics.
Example:
0 1 2 3 4 5 6 7 8 9 ...
a b a a b a b a c b a
c a b a b
    c a b a b

The suffix ab has matched. The pattern can be shifted until the next occurrence of ab in the pattern is
aligned to the text symbols ab, i.e. to position 2.
In the following situation the suffix ab has matched. There is no other occurrence of ab in the pattern.
Therefore, the pattern can be shifted behind ab, i.e. to position 5.
Example:
0 1 2 3 4 5 6 7 8 9 ...
a b c a b a b a c b a
c b a a b
          c b a a b

In the following situation the suffix bab has matched. There is no other occurrence of bab in the pattern.
But in this case the pattern cannot be shifted to position 5 as before, but only to position 3, since a
prefix of the pattern (ab) matches the end of bab. We refer to this situation as case 2 of the good suffix
heuristics.
Example:
0 1 2 3 4 5 6 7 8 9 ...
a a b a b a b a c b a
a b b a b
      a b b a b

The pattern is shifted by the larger of the two distances that are given by the bad character and the good
suffix heuristics.
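
The three good suffix examples above can be checked by brute force: try each shift distance in turn and
accept the first one at which the shifted pattern agrees with the matched suffix. The sketch below does
this directly from that description. It is not the efficient preprocessing, and the class and method
names are assumptions. It also assumes the stricter rule that the symbol moved under the mismatch
position must differ from the one that just failed.

```java
// Illustrative brute-force sketch; names are assumptions, not taken from the text.
class GoodSuffixByDefinition {
    // Smallest shift k >= 1 after the suffix p[i..m-1] has matched and
    // position i-1 has mismatched (i == 0 means the whole pattern matched).
    static int shift(String p, int i) {
        int m = p.length();
        for (int k = 1; k < m; k++) {
            if (compatible(p, i, k)) {
                return k;
            }
        }
        return m; // shifting the pattern completely past the window is always possible
    }

    static boolean compatible(String p, int i, int k) {
        int m = p.length();
        // Wherever the shifted pattern overlaps the matched suffix, the symbols must agree.
        for (int q = Math.max(i, k); q < m; q++) {
            if (p.charAt(q - k) != p.charAt(q)) {
                return false;
            }
        }
        // The symbol moved under the mismatch position must not repeat the failed symbol.
        if (i >= 1 && i - 1 - k >= 0 && p.charAt(i - 1 - k) == p.charAt(i - 1)) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(shift("cabab", 3)); // 2: ab occurs again in the pattern
        System.out.println(shift("cbaab", 3)); // 5: no other occurrence of ab
        System.out.println(shift("abbab", 2)); // 3: prefix ab matches the end of bab
    }
}
```

Since the pattern starts at position 0 in each example, the shift distances 2, 5 and 3 equal the new
starting positions given in the text.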

Preprocessing for the bad character heuristics
For the bad character heuristics a function occ is required which yields, for each symbol of the alphabet,
the position of its rightmost occurrence in the pattern, or -1 if the symbol does not occur in the pattern.
Definition: Let A be the underlying alphabet.
The occurrence function occ : A* × A → ℤ is defined as follows:
Let p ∈ A* with p = p_0 ... p_{m-1} be the pattern and a ∈ A an alphabet symbol. Then

occ(p, a) = max{ j | p_j = a }

Here max(∅) is set to -1.
Example:
occ(text, x) = 2
occ(text, t) = 3

The rightmost occurrence of symbol 'x' in the string 'text' is at position 2. Symbol 't' occurs at
positions 0 and 3, the rightmost occurrence is at position 3.
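
The definition translates almost literally into code. The following sketch evaluates occ(p, a) for a
single symbol by scanning the pattern once. The class name OccurrenceFunction is an assumption made for
illustration.

```java
// Illustrative sketch; the class name is an assumption.
class OccurrenceFunction {
    // occ(p, a) = max{ j | p_j = a }, with -1 standing for the maximum of the empty set.
    static int occ(String p, char a) {
        int result = -1;
        for (int j = 0; j < p.length(); j++) {
            if (p.charAt(j) == a) {
                result = j; // a later occurrence replaces an earlier one
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(occ("text", 'x')); // 2
        System.out.println(occ("text", 't')); // 3
        System.out.println(occ("text", 'z')); // -1
    }
}
```

In Java the same value is returned by p.lastIndexOf(a). The explicit loop is shown only to mirror the
definition.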
The occurrence function for a certain pattern p is stored in an array occ which is indexed by the alphabet
symbols. For each symbol a ∈ A the corresponding value occ(p, a) is stored in occ[a].
The following function bmInitocc computes the occurrence function for a given pattern p.
Bad character preprocessing
void bmInitocc()
{
    char a;
    int j;

    // initially, no symbol is known to occur in the pattern
    for (a=0; a<alphabetsize; a++)
        occ[a]=-1;

    // later occurrences overwrite earlier ones, leaving the rightmost position
    for (j=0; j<m; j++)
    {
        a=p[j];
        occ[a]=j;
    }
}

Preprocessing for the good suffix heuristics
For the good suffix heuristics an array s is used. Each entry s[i] contains the shift distance of the
pattern if a mismatch at position i-1 occurs, i.e. if the suffix of the pattern starting at position i has
matched. In order to determine the shift distance, two cases have to be considered.
Case 1: The matching suffix occurs somewhere else in the pattern (Figure 1).
Case 2: Only a part of the matching suffix occurs at the beginning of the pattern (Figure 2).

Figure 1: The matching suffix (gray) occurs somewhere else in the pattern

Figure 2: Only a part of the matching suffix occurs at the beginning of the pattern

Case 1:
The situation is similar to the Knuth-Morris-Pratt preprocessing. The matching suffix is a border of a
suffix of the pattern. Thus, the borders of the suffixes of the pattern have to be determined. However, now
the inverse mapping is needed between a given border and the shortest suffix of the pattern that has this
border.
Moreover, it is necessary that the border cannot be extended to the left by the same symbol, since this
would cause another mismatch after shifting the pattern.
In the following first part of the preprocessing algorithm an array f is computed. Each entry f[i] contains
the starting position of the widest border of the suffix of the pattern beginning at position i. The empty
suffix beginning at position m has no border, therefore f[m] is set to m+1.
Similar to the Knuth-Morris-Pratt preprocessing algorithm, each border is computed by checking if a shorter
border that is already known can be extended to the left by the same symbol.
However, the case when a border cannot be extended to the left is also interesting, since it leads to a
promising shift of the pattern if a mismatch occurs. Therefore, the corresponding shift distance is saved
in an array s, provided that this entry is not already occupied. The latter is the case when a shorter
suffix has the same border.
Good suffix preprocessing case 1
void bmPreprocess1()
{
    int i=m, j=m+1;
    f[i]=j;
    while (i>0)
    {
        // the border starting at j cannot be extended to the left by p[i-1]
        while (j<=m && p[i-1]!=p[j-1])
        {
            if (s[j]==0) s[j]=j-i;   // save the shift unless the entry is occupied
            j=f[j];                  // continue with the next shorter border
        }
        i--; j--;
        f[i]=j;
    }
}
A visualization of the preprocessing algorithm is given in [Web3]. The following example shows the values
in array f and in array s.
Example:
i:  0  1  2  3  4  5  6  7
p:  a  b  b  a  b  a  b
f:  5  6  4  5  6  7  7  8
s:  0  0  0  0  2  0  4  1

The widest border of suffix babab beginning at position 2 is bab, beginning at position 4. Therefore,
f[2] = 4. The widest border of suffix ab beginning at position 5 is the empty string ε, beginning at
position 7. Therefore, f[5] = 7.
The values of array s are determined by the borders that cannot be extended to the left.
The suffix babab beginning at position 2 has border bab, beginning at position 4. This border cannot be
extended to the left since p[1] ≠ p[3]. The difference 4 - 2 = 2 is the shift distance if bab has matched
and then a mismatch occurs. Therefore, s[4] = 2.
The suffix babab beginning at position 2 has border b, too, beginning at position 6. This border cannot be
extended either. The difference 6 - 2 = 4 is the shift distance if b has matched and then a mismatch
occurs. Therefore, s[6] = 4.
The suffix b beginning at position 6 has border ε, beginning at position 7. This border cannot be extended
to the left. The difference 7 - 6 = 1 is the shift distance if nothing has matched, i.e. if a mismatch
occurs in the first comparison. Therefore, s[7] = 1.
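
The entries of array f can be cross-checked against the definition of a border. The sketch below computes
them by brute force for the example pattern abbabab: for each suffix it looks for the longest proper
suffix that is also a prefix of that suffix. The class and method names are assumptions, and the
quadratic search is for illustration only. It is not the linear-time preprocessing described above.

```java
import java.util.Arrays;

// Illustrative brute-force sketch; names are assumptions.
class BorderTableByDefinition {
    // f[i] = starting position of the widest border of the suffix p[i..m-1];
    // f[m] = m + 1 by convention, since the empty suffix has no border.
    static int[] borderStarts(String p) {
        int m = p.length();
        int[] f = new int[m + 1];
        f[m] = m + 1;
        for (int i = 0; i < m; i++) {
            int j = i + 1;
            // p.substring(j) is a proper suffix of p[i..m-1]; it is a border if the
            // suffix starting at i also begins with it. The empty string (j == m) always fits.
            while (!p.startsWith(p.substring(j), i)) {
                j++;
            }
            f[i] = j; // the first hit is the widest border
        }
        return f;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(borderStarts("abbabab")));
        // prints [5, 6, 4, 5, 6, 7, 7, 8]
    }
}
```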
Case 2:
In this situation, a part of the matching suffix of the pattern occurs at the beginning of the pattern.
This means that this part is a border of the pattern. The pattern can be shifted as far as its widest
matching border allows (Figure 2).
In the preprocessing for case 2, for each suffix the widest border of the pattern that is contained in that
suffix is determined.
The starting position of the widest border of the entire pattern is stored in f[0]. In the example above
this is 5 since the border ab starts at position 5.
In the following preprocessing algorithm, this value f[0] is stored initially in all free entries of array
s. But when the suffix of the pattern becomes shorter than the border starting at f[0], the algorithm
continues with the next-widest border of the pattern, i.e. with f[j].
Good suffix preprocessing case 2
void bmPreprocess2()
{
    int i, j;
    j=f[0];
    for (i=0; i<=m; i++)
    {
        if (s[i]==0) s[i]=j;   // fill only the entries left free by case 1
        if (i==j) j=f[j];      // the suffix is now shorter than the current border
    }
}
A visualization of the execution of the algorithm is given in [Web3]. The following example shows the final
values of array s.
Example:
i:  0  1  2  3  4  5  6  7
p:  a  b  b  a  b  a  b
f:  5  6  4  5  6  7  7  8
s:  5  5  5  5  2  5  4  1
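
To trace how the case 2 pass produces the final row s, the sketch below applies the same loop to the
arrays f and s as they stand after case 1 for the pattern abbabab. The values are hard-coded from the
example, and the class and method names are assumptions. The method works on a copy so that the input
stays visible for comparison.

```java
import java.util.Arrays;

// Illustrative trace of the case 2 pass; names are assumptions.
class GoodSuffixCase2Trace {
    // Fills the entries of s that case 1 left at 0, following the chain of borders in f.
    static int[] fillCase2(int[] f, int[] s) {
        int[] result = s.clone();
        int m = result.length - 1;
        int j = f[0]; // start of the widest border of the whole pattern
        for (int i = 0; i <= m; i++) {
            if (result[i] == 0) {
                result[i] = j;
            }
            if (i == j) {
                j = f[j]; // move on to the next border in the chain
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] f = {5, 6, 4, 5, 6, 7, 7, 8};
        int[] s = {0, 0, 0, 0, 2, 0, 4, 1};
        System.out.println(Arrays.toString(fillCase2(f, s)));
        // prints [5, 5, 5, 5, 2, 5, 4, 1]
    }
}
```

The entries 2, 4 and 1 set by case 1 are left untouched. All free entries receive 5 because the border
ab, starting at position 5, is the widest border of the pattern and i reaches 5 only at the last free
entry.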

The entire preprocessing algorithm of the Boyer-Moore algorithm consists of the bad character preprocessing
and both parts of the good suffix preprocessing.
Boyer-Moore preprocessing
void bmPreprocess()
{
    f=new int[m+1];   // auxiliary array; must be visible to bmPreprocess1 and bmPreprocess2
    bmInitocc();
    bmPreprocess1();
    bmPreprocess2();
}

Searching algorithm
The searching algorithm compares the symbols of the pattern from right to left with the text. After a
complete match the pattern is shifted according to how much its widest border allows. After a mismatch the
pattern is shifted by the maximum of the values given by the good suffix and the bad character heuristics.

Boyer-Moore searching algorithm
void bmSearch()
{
    int i=0, j;
    while (i<=n-m)
    {
        // compare from right to left
        j=m-1;
        while (j>=0 && p[j]==t[i+j]) j--;
        if (j<0)
        {
            report(i);       // complete match at position i
            i+=s[0];
        }
        else
            i+=Math.max(s[j+1], j-occ[t[i+j]]);
    }
}
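
For experimentation, the preprocessing and searching steps can be combined into one self-contained
method. The following is a sketch under a few assumptions: the class name BoyerMooreSketch and the method
search are hypothetical, matches are collected in a list rather than passed to report, and the occ table
covers the full range of Java char values. The loops themselves follow the functions shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch; class name, method name and return type are assumptions.
class BoyerMooreSketch {
    static List<Integer> search(String text, String pattern) {
        char[] t = text.toCharArray();
        char[] p = pattern.toCharArray();
        int n = t.length, m = p.length;

        // bad character preprocessing
        int[] occ = new int[Character.MAX_VALUE + 1];
        Arrays.fill(occ, -1);
        for (int k = 0; k < m; k++) {
            occ[p[k]] = k;
        }

        // good suffix preprocessing, case 1
        int[] f = new int[m + 1];
        int[] s = new int[m + 1];
        int i = m, j = m + 1;
        f[i] = j;
        while (i > 0) {
            while (j <= m && p[i - 1] != p[j - 1]) {
                if (s[j] == 0) s[j] = j - i;
                j = f[j];
            }
            i--; j--;
            f[i] = j;
        }

        // good suffix preprocessing, case 2
        j = f[0];
        for (i = 0; i <= m; i++) {
            if (s[i] == 0) s[i] = j;
            if (i == j) j = f[j];
        }

        // searching phase
        List<Integer> matches = new ArrayList<>();
        i = 0;
        while (i <= n - m) {
            j = m - 1;
            while (j >= 0 && p[j] == t[i + j]) j--;
            if (j < 0) {
                matches.add(i);
                i += s[0];
            } else {
                i += Math.max(s[j + 1], j - occ[t[i + j]]);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(search("abababab", "abab")); // prints [0, 2, 4]
    }
}
```

Overlapping matches are found because the shift after a complete match is s[0], which for abab is 2.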

Analysis
If there are only a constant number of matches of the pattern in the text, the Boyer-Moore searching
algorithm performs O(n) comparisons in the worst case. The proof of this is rather difficult.
In general Θ(n·m) comparisons are necessary, e.g. if the pattern is a^m and the text a^n. By a slight
modification of the algorithm the number of comparisons can be bounded to O(n) even in the general case.
If the alphabet is large compared to the length of the pattern, the algorithm performs O(n/m) comparisons
on the average. This is because often a shift by m is possible due to the bad character heuristics.
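
The worst-case example can be made concrete with a short count. This is a sketch that assumes the search
proceeds exactly as in bmSearch. For the pattern a^m the widest border a^(m-1) starts at position 1, so
s[0] = 1 and the pattern advances by one position after every match.

```latex
\begin{align*}
\text{windows examined} &= n - m + 1 && \text{(shift } s[0] = 1 \text{ after every match)}\\
\text{comparisons per window} &= m && \text{(every symbol matches)}\\
\text{total comparisons} &= m\,(n - m + 1) \;\ge\; \tfrac{1}{2}\,n\,m \quad \text{for } m \le n/2
\end{align*}
```

By contrast, if every attempt fails at its first comparison and allows a shift by m, the windows start at
positions 0, m, 2m, ..., which gives only ⌊n/m⌋ comparisons.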

Conclusions
The Boyer-Moore algorithm uses two different heuristics for determining the maximum possible shift distance
in case of a mismatch: the "bad character" and the "good suffix" heuristics. Both heuristics can lead to a
shift distance of m. For the bad character heuristics this is the case if the first comparison causes a
mismatch and the corresponding text symbol does not occur in the pattern at all. For the good suffix
heuristics this is the case if only the first comparison was a match, but that symbol does not occur
elsewhere in the pattern.
The preprocessing for the good suffix heuristics is rather difficult to understand and to implement.
Therefore, versions of the Boyer-Moore algorithm are sometimes found in which the good suffix heuristics is
omitted. The argument is that the bad character heuristics would be sufficient and the good suffix
heuristics would not save many comparisons. However, this is not true for small alphabets.
If for simplicity one wants to restrict oneself to the bad character heuristics, the Horspool algorithm
[Hor80] or the Sunday algorithm [Sun90] are better suited.

References
[BM77]   R.S. Boyer, J.S. Moore: A Fast String Searching Algorithm. Communications of the ACM, 20, 10,
         762-772 (1977)
[Hor80]  R.N. Horspool: Practical Fast Searching in Strings. Software - Practice and Experience 10,
         501-506 (1980)
[Sun90]  D.M. Sunday: A Very Fast Substring Search Algorithm. Communications of the ACM, 33, 8,
         132-142 (1990)
[Web1]   http://www-igm.univ-mlv.fr/~lecroq/string/
[Web2]   http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/stringmatchingclasses/BmStringMatcher.java
         Boyer-Moore algorithm as a Java class source file
[Web3]   http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/bmPreprocess.xls
         Boyer-Moore good suffix preprocessing visualization in Excel

H.W. Lang, Hochschule Flensburg, lang@hs-flensburg.de. Created: 22.02.2001, Updated: 29.05.2016

http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/bmen.htm