
Regular Expressions

Regular Expressions
A regular expression is used to specify a language, and it does
so precisely.
Regular expressions are very intuitive.
Regular expressions are very useful in a variety of contexts.
Given a regular expression, an NFA-ε (an NFA with ε-moves) can be constructed from it
automatically.
Thus, so can an NFA, a DFA, and a corresponding program, all
automatically!


Definition of a Regular Expression
Let Σ be an alphabet. The regular expressions over Σ are:

∅ Represents the empty set { }
ε Represents the set {ε}
a Represents the set {a}, for any symbol a in Σ

Let r and s be regular expressions that represent the sets R and S,
respectively.

r+s Represents the set R ∪ S (precedence 3, lowest)
rs Represents the set RS (precedence 2)
r* Represents the set R* (highest precedence)
(r) Represents the set R (not an op, provides precedence)

If r is a regular expression, then L(r) is used to denote the corresponding
language.
Identities:

1. ∅u = u∅ = ∅ Multiply by 0
2. εu = uε = u Multiply by 1
3. ∅* = ε, since L* = ⋃_{i=0}^{∞} L^i = L^0 ∪ L^1 ∪ L^2 ∪ … and, for L = ∅, this union is just L^0 = {ε}
4. ε* = ε
5. u+v = v+u
6. u + ∅ = u
7. u + u = u
8. u* = (u*)*
9. u(v+w) = uv+uw
10. (u+v)w = uw+vw
11. (uv)*u = u(vu)*
12. (u+v)* = (u*+v)*
= u*(u+v)*
= (u+vu*)*
= (u*v*)*
= u*(vu*)*
= (u*v)*u*
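
The identities can be sanity-checked mechanically. The sketch below is an illustration, not part of the original slides: it writes the expressions in Python's `re` syntax (where union `+` becomes `|`), instantiates u = a and v = b, and compares two expressions on every string over {a, b} up to a small length. Agreement on short strings is evidence for an identity, not a proof of it. The helper names `language` and `same_up_to` are hypothetical.

```python
import re
from itertools import product

def language(pattern, alphabet="ab", max_len=6):
    """Set of strings of length <= max_len that the whole pattern matches."""
    compiled = re.compile(pattern)
    return {
        "".join(chars)
        for n in range(max_len + 1)
        for chars in product(alphabet, repeat=n)
        if compiled.fullmatch("".join(chars))
    }

def same_up_to(p1, p2, alphabet="ab", max_len=6):
    """True if p1 and p2 accept exactly the same strings up to max_len."""
    return language(p1, alphabet, max_len) == language(p2, alphabet, max_len)

# Identity 11 with u = a, v = b: (uv)*u = u(vu)*
print(same_up_to(r"(ab)*a", r"a(ba)*"))

# Identity 12 with u = a, v = b: all seven forms should agree
forms = [r"(a|b)*", r"(a*|b)*", r"a*(a|b)*", r"(a|ba*)*",
         r"(a*b*)*", r"a*(ba*)*", r"(a*b)*a*"]
print(all(same_up_to(forms[0], f) for f in forms[1:]))

# A non-identity for contrast: a*b* and (ab)* differ (for example on "a")
print(same_up_to(r"a*b*", r"(ab)*"))
```

A bounded check like this is mainly useful for catching a mistyped identity quickly; a counterexample it finds is conclusive, while agreement is only suggestive.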
Regular grammar and regular expression
They are equivalent
Every regular expression can be expressed by a regular grammar
Every regular grammar can be expressed by a regular expression
Different ways to express the same thing
RE is more concise

Operations on Languages
Let L, L1, L2 be subsets of Σ*

Concatenation: L1L2 = {xy | x is in L1 and y is in L2}

Concatenating a language with itself: L^0 = {ε}
L^i = L L^(i-1), for all i >= 1

Kleene Closure: L* = ⋃_{i=0}^{∞} L^i = L^0 ∪ L^1 ∪ L^2 ∪ …

Positive Closure: L+ = ⋃_{i=1}^{∞} L^i = L^1 ∪ L^2 ∪ …

Question: Does L+ contain ε?
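
These operations can be tried out on small finite languages. The following is a minimal sketch, assuming languages are represented as Python sets of strings with `""` standing for ε; because L* and L+ are infinite in general, the functions `kleene_upto` and `positive_upto` (hypothetical names) only build the union up to a chosen power k.

```python
def concat(l1, l2):
    """L1L2 = {xy | x in L1 and y in L2}."""
    return {x + y for x in l1 for y in l2}

def power(lang, i):
    """L^0 = {ε}; L^i = L L^(i-1) for i >= 1."""
    result = {""}
    for _ in range(i):
        result = concat(lang, result)
    return result

def kleene_upto(lang, k):
    """Finite approximation of L*: L^0 ∪ L^1 ∪ ... ∪ L^k."""
    out = set()
    for i in range(k + 1):
        out |= power(lang, i)
    return out

def positive_upto(lang, k):
    """Finite approximation of L+: L^1 ∪ ... ∪ L^k."""
    out = set()
    for i in range(1, k + 1):
        out |= power(lang, i)
    return out

by_length = lambda s: (len(s), s)

L = {"a", "b"}
M = {"0", "1"}
print(sorted(concat(L, M)))                      # ['a0', 'a1', 'b0', 'b1']
print(sorted(kleene_upto(L, 2), key=by_length))  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']

# L+ contains ε exactly when L itself contains ε:
print("" in positive_upto({"a", "b"}, 3))   # False
print("" in positive_upto({"", "a"}, 3))    # True
```

The last two lines bear on the question above: ε can only arise as a concatenation of one or more strings from L if ε is itself in L.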
abc, abbbc, abbbbccc, bc

a, aa, aaa, aaaa, aaaaa

ε, b, bb, bbb, bbbb, bbbbb

RE operations
(Examples use L = {a, b} and M = {0, 1}.)

Operation: union of L and M
  Notation: L ∪ M
  Definition: L ∪ M = {s | s is in L or s is in M}
  Example: {a, b, 0, 1}

Operation: concatenation of L and M
  Notation: LM
  Definition: LM = {st | s is in L and t is in M}
  Example: {a0, a1, b0, b1}

Operation: Kleene closure of L
  Notation: L*
  Definition: L* denotes zero or more concatenations of L
  Example: All the strings consisting of a and b, plus the empty string: {ε, a, aa, bb, ab, ba, aaa, …}

Operation: positive closure of L
  Notation: L+
  Definition: L+ denotes one or more concatenations of L
  Example: All the strings consisting of a and b.
Examples: Let Σ = {0, 1}

(0 + 1)* All strings of 0s and 1s

0(0 + 1)* All strings of 0s and 1s, beginning with a 0

(0 + 1)*1 All strings of 0s and 1s, ending with a 1

(0 + 1)*0(0 + 1)* All strings of 0s and 1s containing at least one 0

(0 + 1)*0(0 + 1)*0(0 + 1)* All strings of 0s and 1s containing at least two 0s

(0 + 1)*01*01* All strings of 0s and 1s containing at least two 0s

(1 + 01*0)* All strings of 0s and 1s containing an even number
of 0s

1*(01*01*)* All strings of 0s and 1s containing an even number
of 0s

(1*01*0)*1* All strings of 0s and 1s containing an even number
of 0s
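
The descriptions above can be checked against the expressions by brute force. The sketch below is illustrative only: it translates the slide notation to Python `re` syntax (`+` becomes `|`, spaces dropped) and compares each expression with a plain-Python predicate on every binary string up to a small length. The names `to_python`, `agrees`, and `EXAMPLES` are hypothetical, and agreement on short strings does not prove the description correct.

```python
import re
from itertools import product

def to_python(regex):
    """Translate slide notation to Python re syntax ('+' means union)."""
    return regex.replace(" ", "").replace("+", "|")

# (expression in slide notation, predicate stating the English description)
EXAMPLES = [
    ("0(0 + 1)*", lambda w: w.startswith("0")),
    ("(0 + 1)*1", lambda w: w.endswith("1")),
    ("(0 + 1)*0(0 + 1)*", lambda w: w.count("0") >= 1),
    ("(0 + 1)*0(0 + 1)*0(0 + 1)*", lambda w: w.count("0") >= 2),
    ("(0 + 1)*01*01*", lambda w: w.count("0") >= 2),
    ("(1 + 01*0)*", lambda w: w.count("0") % 2 == 0),
    ("1*(01*01*)*", lambda w: w.count("0") % 2 == 0),
    ("(1*01*0)*1*", lambda w: w.count("0") % 2 == 0),
]

def agrees(regex, predicate, max_len=8):
    """True if regex and predicate accept the same strings of length <= max_len."""
    pattern = re.compile(to_python(regex))
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            w = "".join(chars)
            if bool(pattern.fullmatch(w)) != predicate(w):
                return False
    return True

for regex, predicate in EXAMPLES:
    print(regex, agrees(regex, predicate))
```

Using `fullmatch` matters here: a regular expression in the formal sense describes whole strings, whereas `re.search` would accept any string containing a matching substring.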


Regular expressions
R is a regular expression if R is
a, for some a ∈ Σ
ε, the empty string
∅, the empty set
(R1 ∪ R2), where R1 and R2 are reg. exprs.
(R1 ∘ R2), where R1 and R2 are reg. exprs.
(R1*), where R1 is a regular expression
A reg. expression R describes the language L(R).
Regular expressions
example: R = (0 ∪ 1)
if Σ = {0,1} then use Σ as shorthand for R

example: R = 0 ∘ Σ*
shorthand: omit ∘, giving R = 0Σ*
precedence: *, then ∘, then ∪, unless overridden by parentheses
in example R = 0(Σ*), not R = (0Σ)*
Some examples
(alphabet Σ = {0,1})
{w : w has at least one 1}
= Σ*1Σ*
{w : w starts and ends with same symbol}
= 0Σ*0 ∪ 1Σ*1 ∪ 0 ∪ 1
{w : |w| ≤ 5}
= (ε ∪ Σ)(ε ∪ Σ)(ε ∪ Σ)(ε ∪ Σ)(ε ∪ Σ)
{w : every 3rd position of w is 1}
= (1ΣΣ)*(ε ∪ 1 ∪ 1Σ)
Manipulating regular expressions
The empty set and the empty string:
R ∪ ∅ = R
Rε = εR = R
R∅ = ∅R = ∅
∪ and ∘ behave like +, x; ∅, ε behave like 0,1
additional identities:
R ∪ R = R (here + and ∪ differ)
(R1*R2)*R1* = (R1 ∪ R2)*
R1(R2R1)* = (R1R2)*R1

Languages of Regular Expressions

L(r): language of regular expression r

Example

L((a + b·c)*) = {ε, a, bc, aa, abc, bca, ...}
Definition (continued)

For regular expressions r1 and r2

L(r1 + r2) = L(r1) ∪ L(r2)
L(r1 · r2) = L(r1) L(r2)
L(r1*) = (L(r1))*
L((r1)) = L(r1)
Example
Regular expression: (a + b)·a*
L((a + b)·a*) = L((a + b)) L(a*)
= L(a + b) L(a*)
= (L(a) ∪ L(b)) (L(a))*
= ({a} ∪ {b}) ({a})*
= {a, b}{ε, a, aa, aaa, ...}
= {a, aa, aaa, ..., b, ba, baa, ...}
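
The clause-by-clause definition of L(r) translates directly into a recursive function. The sketch below is an illustration under stated assumptions: regular expressions are encoded as nested tuples (a representation chosen here, not taken from the slides), `""` stands for the empty string, and because L(r) may be infinite the hypothetical function `lang` returns only the strings of length at most `max_len`.

```python
def lang(r, max_len):
    """Strings of L(r) with length <= max_len, one case per clause of the definition."""
    kind = r[0]
    if kind == "empty":                      # the empty set
        return set()
    if kind == "eps":                        # the empty string
        return {""}
    if kind == "sym":                        # a single symbol
        return {r[1]} if max_len >= 1 else set()
    if kind == "union":                      # L(r1 + r2) = L(r1) ∪ L(r2)
        return lang(r[1], max_len) | lang(r[2], max_len)
    if kind == "concat":                     # L(r1 · r2) = L(r1) L(r2)
        left, right = lang(r[1], max_len), lang(r[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if kind == "star":                       # L(r1*) = (L(r1))*
        base = lang(r[1], max_len)
        result, frontier = {""}, {""}
        while frontier:                      # stop when no new short strings appear
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node kind: {kind}")

a, b, c = ("sym", "a"), ("sym", "b"), ("sym", "c")
by_length = lambda s: (len(s), s)

# (a + b)·a*
r = ("concat", ("union", a, b), ("star", a))
print(sorted(lang(r, 3), key=by_length))   # ['a', 'b', 'aa', 'ba', 'aaa', 'baa']

# (a + b·c)*
r2 = ("star", ("union", a, ("concat", b, c)))
print(sorted(lang(r2, 3), key=by_length))  # ['', 'a', 'aa', 'bc', 'aaa', 'abc', 'bca']
```

Parentheses need no case of their own in this encoding, since the tuple nesting already records grouping; that mirrors the clause L((r1)) = L(r1).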
Example

Regular expression
r = (aa)*(bb)*b
L(r) = {a^(2n) b^(2m) b : n, m ≥ 0}
Example

Regular expression
r = (0 + 1)*00(0 + 1)*
L(r) = { all strings with at least two consecutive 0 }
Example

Regular expression
r = (1 + 01)*(0 + ε)
L(r) = { all strings without two consecutive 0 }
Equivalent Regular Expressions

Definition:

Regular expressions r1 and r2 are equivalent if L(r1) = L(r2)
Example

L = { all strings without two consecutive 0 }
r1 = (1 + 01)*(0 + ε)
r2 = (1*011*)*(0 + ε) + 1*(0 + ε)
L(r1) = L(r2) = L
r1 and r2 are equivalent regular expr.
Therefore:
L(r1 + r2) = L(r1) ∪ L(r2)
L(r1 · r2) = L(r1) L(r2)
L(r1*) = (L(r1))*
are regular languages
1. All words that begin with a, end with a, and have only b's in between.


2. Is a*b* = (ab)*?


3. Define a language in which every word contains at least one double letter.


Answer to 1: a+ab*a

Answer to 3: (a+b)*(aa+bb)(a+b)*
Regular expression exercises
Can the string baa be created from the regular expression a*b*a*b* ?

Describe the language (in words) represented by (a*a)b|b.

Write the regular expression that represents:
All strings over Σ={a, b} that end in a.
All strings over Σ={0,1} of even length.
Regular expressions and FA
A language L is recognized by a FA if and only if L is described by a regular expression.
Must prove two directions:
(⇒) L is recognized by a FA implies L is described by a regular expression
(⇐) L is described by a regular expression implies L is recognized by a FA.
Regular expressions and FA
(⇐) L is described by a regular expression implies L is recognized by a FA

Proof: given a regular expression R we will build an NFA that recognizes L(R).

Then NFA, FA equivalence implies a FA for L(R).
RE to NFA
Thompson construction
Introduced by Ken Thompson.
Key idea:
NFA pattern for each symbol and operator;
Join them with ε-moves;
Based on the inductive definition of RE.
Basis: OP(r) = 0

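Then r is either ∅, ε, or a, for some symbol a in Σ

For ∅:
[diagram: start state q0 and final state qf, with no transitions]

For ε:
[diagram: a single state qf that is both the start state and the final state]

For a:
[diagram: q0 --a--> qf]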
Thompson construction (basis)
For epsilon:
The NFA for the ε expression has an arc labeled ε from its start node (i) to its end node (f).
[diagram: i --ε--> f]
For c:
The NFA for the regular expression c, for any character c, has an arc labeled c from its start node (i) to its end node (f).
[diagram: i --c--> f]
Induction step in Thompson construction: s|t
Given REs s and t, suppose N(s) and N(t) are NFAs for s and t.
NFA(s | t) is:
[diagram: new start state i with ε-arcs into N(s) and N(t); ε-arcs from N(s) and N(t) into new final state f]
Add two new states i and f.
Add two ε-transitions from i to the start states of N(s) and N(t);
Add two ε-transitions from the final states of N(s) and N(t) to f.
Induction step for st
Given REs s and t, suppose N(s) and N(t) are NFAs for s and t.
New start state: start state of N(s);
New final state: final state of N(t);
Final state of N(s) is merged with the start state of N(t);

Q: What if there are multiple final states in N(s)?

[diagram: i, then N(s), then N(t), then f, in sequence]
Induction step for s*
N(s) is the NFA for s;
Add two new states: start state i and final state f;
The NFA for the regular expression s* has empty (ε) arcs from i to f, from i to s.i, from s.f to s.i, and from s.f to f.
[diagram: new states i and f around N(s), joined by the four ε-arcs listed above]
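
The basis and the three induction steps can be sketched in code. This is an illustrative implementation under assumptions, not the slides' own: each fragment has exactly one start and one final state, edges are stored as (source, label, target) triples with `None` as the ε label, and all class and function names are hypothetical. Concatenation merges the final state of N(s) with the start state of N(t) by renaming.

```python
from itertools import count

_ids = count()  # generator of fresh state numbers

class NFA:
    """NFA-ε fragment with one start state and one final state."""
    def __init__(self, start, final, edges):
        self.start, self.final, self.edges = start, final, edges

def symbol(c):
    """Basis: arc labeled c from new state i to new state f."""
    i, f = next(_ids), next(_ids)
    return NFA(i, f, [(i, c, f)])

def union(s, t):
    """s|t: new i and f, ε-arcs from i to both starts and from both finals to f."""
    i, f = next(_ids), next(_ids)
    extra = [(i, None, s.start), (i, None, t.start),
             (s.final, None, f), (t.final, None, f)]
    return NFA(i, f, s.edges + t.edges + extra)

def concat(s, t):
    """st: merge the final state of N(s) with the start state of N(t)."""
    rename = lambda q: s.final if q == t.start else q
    merged = [(rename(a), c, rename(b)) for a, c, b in t.edges]
    return NFA(s.start, t.final, s.edges + merged)

def star(s):
    """s*: ε-arcs i->f, i->s.start, s.final->s.start, s.final->f."""
    i, f = next(_ids), next(_ids)
    extra = [(i, None, f), (i, None, s.start),
             (s.final, None, s.start), (s.final, None, f)]
    return NFA(i, f, s.edges + extra)

def eps_closure(nfa, states):
    """All states reachable from `states` using ε-arcs only."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for a, c, b in nfa.edges:
            if a == q and c is None and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

def accepts(nfa, word):
    """Simulate the NFA-ε on word by tracking the set of current states."""
    current = eps_closure(nfa, {nfa.start})
    for ch in word:
        moved = {b for a, c, b in nfa.edges if a in current and c == ch}
        current = eps_closure(nfa, moved)
    return nfa.final in current

# (a|b)*abb
nfa = concat(concat(concat(star(union(symbol("a"), symbol("b"))),
                           symbol("a")), symbol("b")), symbol("b"))
print(accepts(nfa, "aababb"))  # True
print(accepts(nfa, "abab"))    # False
```

Merging is safe in this sketch because every fragment it builds has a start state with no incoming arcs and a final state with no outgoing arcs, so identifying the two states cannot create unintended paths.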


Example for constructing (a|b)*abb
Recall the DFA and NFA. We have seen how to transform the NFA to a DFA. But how can the NFA be constructed automatically?
[diagrams: the NFA and the DFA for (a|b)*abb, each drawn with states 0 to 3 and arcs labeled a and b]
Another example for Thompson construction
Construct NFA for a, b, and c.
[diagrams: one single-arc NFA each for a, b, and c]

Construct b|c
[diagram: NFA for b|c]

(b|c)*
[diagram: NFA for (b|c)*]
Example:

r = 0(0+1)*

r = r1r2

r1 = 0

r2 = (0+1)*

r2 = r3*

r3 = 0+1

r3 = r4 + r5

r4 = 0

r5 = 1

[diagram: the resulting NFA-ε, with states q0 through q6 and qf, arcs labeled 0 and 1, and ε-arcs]

Definitions Required to Convert a DFA to a Regular Expression

Let M = (Q, Σ, δ, q1, F) be a DFA with state set Q = {q1, q2, …, qn}, and define:

R_{i,j} = { x | x is in Σ* and δ(q_i, x) = q_j }

R_{i,j} is the set of all strings that define a path in M from q_i to q_j.
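
To make the definition of R_{i,j} concrete, the sketch below enumerates its members up to a length bound for a small DFA. Everything specific here is an assumption made for illustration: the two-state DFA, the dictionary encoding of δ, and the function names `delta_hat` and `R`. Applying δ to a whole string is read as applying it one symbol at a time.

```python
from itertools import product

def delta_hat(delta, state, word):
    """Extend δ from single symbols to strings by following the path."""
    for symbol in word:
        state = delta[(state, symbol)]
    return state

def R(delta, alphabet, i, j, max_len):
    """Strings x with |x| <= max_len such that δ(q_i, x) = q_j."""
    return {
        "".join(chars)
        for n in range(max_len + 1)
        for chars in product(alphabet, repeat=n)
        if delta_hat(delta, i, "".join(chars)) == j
    }

# Hypothetical DFA over {0, 1}: state 1 means an even number of 0s read so far,
# state 2 means an odd number of 0s read so far.
delta = {(1, "0"): 2, (1, "1"): 1, (2, "0"): 1, (2, "1"): 2}
by_length = lambda s: (len(s), s)

print(sorted(R(delta, "01", 1, 1, 2), key=by_length))  # ['', '1', '00', '11']
print(sorted(R(delta, "01", 1, 2, 2), key=by_length))  # ['0', '01', '10']
```

Note that the empty string always belongs to R_{i,i}, since reading nothing leaves the DFA in q_i.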



Identities for regular expression
