Sei sulla pagina 1di 30

BloggerTank.com http://www.bloggertank.

com

Text search and closure properties


Vikash Shankar http://www.bloggertank.com/search/label/Notes

Text search

The program grep


grep -E regexp file Searches for the occurrence of patterns matching a regular expression
cat|12 [abc] [ab][12] (ab)* [ab]? (cat)+ [ab]{2} {cat, 12} {a, b, c} {a1, a2, b1, b2} {e, ab, abab, ...} {e, a, b} {cat, catcat, ...} {aa, ab, ba, bb} union shorthand for a|b|c concatenation star zero or one one or more {n} copies

Searching with grep


Words containing savor or savour grep E `savou?r` words
outsavor savor savored savorer savorily savoriness savoringly savorless savorous savorsome savory savour unsavored unsavoredly unsavoredness unsavorily unsavoriness unsavory

Words with 5 consecutive a or b grep E `[ab]{5}` words


grabbable

grep E `zo+zo+` words


zoozoo

More grep commands


. any symbol [a-z] anything in a range \< $ beginning of line end of line 3 1 If a DFA is too hard, I do an ... 2 CSCI 3130 homework makes me ... 3 If a DFA likes it, it is ... 4 $10000000 = $10? 5 I study 3130 hard because it will make me ... grep E `\<.ff.u..t` words 4 5

n 2 s u
1

r e g

a f f l u e n t

a f e r l a r

a r

how do you look for...


Words that start in cat and have another cat
grep E `\<cat.*cat` words

Words with at least ten vowels?


grep E `([aeiouy].*){10}` words

Words without any vowels? Words with exactly ten vowels?

[^R]

does not contain R

grep E `\<[^AEIOUYaeiouy]*$` words

grep E `\<[^AEIOUYaeiouy]* ([aeiouy][^AEIOUYaeiouy]*){10}$` words

How grep (could) work


regular expression text file NFA NFA
without e

DFA

differences
[ab]?, a+, (cat){3}

in class not allowed matches whole accept/reject

in grep allowed looks for pattern finds pattern

input handling output

Implementation of grep
How do you handle expressions like:
[ab]? ()|[ab] zero or one R? e|R one or more R+ RR* {n} copies R{n} RR...R
n times

(cat)+ (cat)(cat)*

a{3}

aaa

[^aeiouy]

not containing any

Closure properties

Example
The language L of strings that end in 101 is regular (0+1)*101 How about the language L of strings that do not end in 101?

Example
Hint: w does not end in 101 if and only if it ends in: 000, 001, 010, 011, 100, 110 or 111
or it has length 0, 1, or 2

So L can be described by the regular expression


(0+1)*(000+001+010+010+100+110+111) + e + (0 + 1) + (0 + 1)(0 + 1)

Complement
The complement L of a language L is the set of all strings that are not in L Examples (S = {0, 1})
L1 = all strings that end in 101 L1 = all strings that do not end in 101 = all strings end in 000, , 111 or have length 0, 1, or 2 L2 = 1* = {e, 1, 11, 111, } L2 = all strings that contain at least one 0 = (0 + 1)*0(0 + 1)*

Example
The language L of strings that contain 101 is regular (0+1)*101(0+1)* How about the language L of strings that do not contain 101? You can write a regular expression, but it is a lot of work!

Closure under complement


If L is a regular language, so is L. To argue this, we can use any of the equivalent definitions for regular languages:
regular expression NFA DFA

The DFA definition will be most convenient


We assume L has a DFA, and show L also has a DFA

Arguing closure under complement


Suppose L is regular, then it has a DFA M
accepts L

Now consider the DFA M with the accepting and rejecting states of M reversed
accepts strings not in L this is exactly L

Food for thought


Can we do the same thing with an NFA?
0, 1 q0 1 q1 0 q2

NO!

(0+1)*10

0, 1 q0 1 q1 0 q2

(0+1)*

Intersection
The intersection L L is the set of strings that are in both L and L Examples:
L = (0 + 1)*11 L = (0 + 1)*10 L = 1* L = 1* L L = 1*11 L L =

If L, L are regular, is L L also regular?

Closure under intersection


If L and L are regular languages, so is L L. To argue this, we can use any of the equivalent definitions for regular languages:
regular expression NFA DFA

Suppose L and L have DFAs, call them M and M Goal: Construct a DFA (or NFA) for L L

An example
M
r0 1 0 0 r1 1

M
s0

0 1 1 s1

L = even number of 0s
r0, s0

L = odd number of 1s
1 1
r0, s1

0 1 1

r1, s0

r1, s1

L L = even number of 0s and odd number of 1s

Closure under intersection


M and M states Q = {r1, ..., rn} Q = {s1, ..., sn} DFA for L L Q Q = {(r1, s1), (r1, s2), ..., (r2, s1), ..., (rn, sn)} (ri, sj) F F = {(ri, sj): ri  F, sj  F}

start state ri for M sj for M accepting states F for M F for M

Whenever M is in state ri and M is in state sj, the DFA for L L will be in state (ri, sj)

Closure under intersection


M and M transitions
ri si a a rj sj

DFA for L L in M in M
ri, si a rj, sj

Reversal
The reversal wR of a string w is w written backwards
w = cave wR = evac

The reversal LR of a language L is the language obtained by reversing all its strings
L = {cat, dog} LR = {tac, god}

Reversal of regular languages


L = all strings that end in 101 is regular (0+1)*101 How about LR? This is the language of all strings beginning in 101 Yes, because it is represented by 101(0+1)*

Closure under reversal


If L is a regular language, so is LR.

How do we argue?
regular expression NFA DFA

Arguing closure under reversal


Take a regular expression E for L We will show how to reverse E A regular expression can be of the following types:
The special symbols and e Alphabet symbols like a, b The union, concatenation, or star of simpler expressions

Proof of closure under reversal


regular expression E reversal ER

e
a (alphabet symbol) E1 + E2 E1E2 E1*

e
a E1R + E2R E2RE1R (E1R)*

A question
LDUP = {ww: w L}
Ex.

L = {cat, dog} LDUP = {catcat, dogdog}

If L is regular, is LDUP also regular?

regular expression

NFA

DFA

A question
Lets try with regular expression:

LDUP = LL
Lets try with NFA:
q0 e NFA for L e

L = {a, b} LDUP = {aa, bb} LL = {aa, ab, ba, bb}

NFA for L

q1

An example
L = 0*1 is regular LDUP = {1, 01, 001, 0001, ...} LDUP = {11, 0101, 001001, 00010001, ...} = {0n10n1: n 0}

Lets try to design an NFA for LDUP

An example
0 1 1 1 01 0 1 001 0 1 0001 0

LDUP = {11, 0101, 001001, 00010001, ...} = {0n10n1: n 0}