Week 3

Fall 1999 Formal Language Theory Dr. R. Boyer
Week Three:
Equivalence between Finite Automata and Regular Expressions
1. Example of Conversion of an NFA with e-moves to a DFA
We summarize the example from the last lecture. Consider the NFA $M = (Q, \Sigma, \delta, q_0, \{q_3\})$, where $Q = \{q_0, q_1, q_2, q_3, q_4\}$, with the two-letter alphabet $\Sigma = \{a, b\}$ and the single accepting state $q_3$. Its transitions are given by the following table:
NFA Transitions

state   a   b   e
0       1   -   2
1       -   -   3
2       4   -   1
3       -   4   -
4       1   -   3

Computation of e-closures

state   e-closure
$q_0$   $\{q_0, q_1, q_2, q_3\}$
$q_1$   $\{q_1, q_3\}$
$q_2$   $\{q_1, q_2, q_3\}$
$q_3$   $\{q_3\}$
$q_4$   $\{q_3, q_4\}$
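Computation of the states and transitions of the equivalent DFA: Since the original NFA has initial state $q_0$, the corresponding DFA has as its initial state the e-closure $E(0) = \{0, 1, 2, 3\}$. Here $E(i)$ denotes the e-closure of $q_i$, and states are written by their indices. For example, on input $a$ the states 0 and 2 of $\{0, 1, 2, 3\}$ move to 1 and 4 respectively, so the target set is $E(1) \cup E(4)$.

DFA Transitions

set of states        a                                   b
$\{0, 1, 2, 3\}$     $E(1) \cup E(4) = \{1, 3, 4\}$      $E(4) = \{3, 4\}$
$\{1, 3, 4\}$        $E(1) = \{1, 3\}$                   $E(4) = \{3, 4\}$
$\{3, 4\}$           $E(1) = \{1, 3\}$                   $E(4) = \{3, 4\}$
$\{1, 3\}$           $\emptyset$                         $E(4) = \{3, 4\}$
$\emptyset$          $\emptyset$                         $\emptyset$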
The accepting states are any sets of states that contain an accepting state of the original NFA. So, the accepting sets of the equivalent DFA are $\{0, 1, 2, 3\}$, $\{1, 3, 4\}$, $\{3, 4\}$ and $\{1, 3\}$.
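The computation above can be replayed mechanically. The following is a minimal Python sketch, not part of the original notes: the dictionary encoding of the NFA, the marker `EPS`, and the function names are choices made here for illustration.

```python
from collections import deque

EPS = "e"  # marker for e-moves (an encoding chosen for this sketch)

# The NFA of the example: NFA[state][label] is the set of successor states.
NFA = {
    0: {"a": {1}, EPS: {2}},
    1: {EPS: {3}},
    2: {"a": {4}, EPS: {1}},
    3: {"b": {4}},
    4: {"a": {1}, EPS: {3}},
}
ACCEPTING = {3}
ALPHABET = ("a", "b")


def e_closure(states):
    """All states reachable from `states` using only e-moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for p in NFA[q].get(EPS, ()):
            if p not in closure:
                closure.add(p)
                stack.append(p)
    return frozenset(closure)


def subset_construction(start):
    """Return (initial DFA state, DFA transition table, accepting DFA states)."""
    initial = e_closure({start})
    table = {}
    queue = deque([initial])
    while queue:
        current = queue.popleft()
        if current in table:
            continue
        table[current] = {}
        for a in ALPHABET:
            moved = set()
            for q in current:
                moved |= NFA[q].get(a, set())
            target = e_closure(moved)
            table[current][a] = target
            queue.append(target)
    accepting = {S for S in table if S & ACCEPTING}
    return initial, table, accepting


initial, table, accepting = subset_construction(0)
```

Only the subsets reachable from the initial set are generated, which is why the table has five rows rather than all $2^5$ subsets.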
2. Proposition. Let $r$ be a regular expression. Then there exists an NFA that accepts $L(r)$.
The construction uses induction on the number of operators in $r$.

It is convenient to normalize the type of NFA used in the construction. We require that the NFA has one final state and no transitions out of this unique final state. It is easy to verify that this does not restrict the languages that will be recognized.
Base Case: $r$ contains zero operators. Then either $r = \emptyset$, $r = e$, or $r = a$ for some $a \in \Sigma$. For $r = \emptyset$, the NFA has two states $q_0$ and $q_f$, where $q_0$ is the start state and $q_f$ is the accepting state. There is no edge connecting them. For $r = e$, again there are two states $q_0$ and $q_f$ as before. There is one edge connecting them, labeled with $e$. The case $r = a$ is handled similarly, with the edge labeled $a$.
The induction hypothesis is that the result holds for all regular expressions with k operators
or less.
Now, assume r is a regular expression containing k + 1 operators. There are three cases to
consider.
Case One: $r = r_1 \cup r_2$. We first state the construction informally. By the induction hypothesis, $L(r_1) = L(M_1)$, where $M_1$ has initial state $q_1$ and unique accepting state $f_1$, and $L(r_2) = L(M_2)$, where $M_2$ has initial state $q_2$ and unique accepting state $f_2$. Then the NFA that accepts the union $r$ will have initial state $q_0$ and unique accepting state $f_0$; further, there are e-transitions from $q_0$ to both initial states $q_1$ and $q_2$, and e-transitions from the final states $f_1$ and $f_2$ to $f_0$.
In detail, let the NFA's be given as $M_1 = (Q_1, \Sigma_1, \delta_1, q_1, \{f_1\})$ and $M_2 = (Q_2, \Sigma_2, \delta_2, q_2, \{f_2\})$, where $Q_1 \cap Q_2 = \emptyset$.
We create a new initial state $q_0$ and a new final state $f_0$. Take $M = (Q_1 \cup Q_2 \cup \{q_0, f_0\}, \Sigma_1 \cup \Sigma_2, \delta, q_0, \{f_0\})$, where
$\delta(q_0, e) = \{q_1, q_2\}$;
$\delta(q, a) = \delta_1(q, a)$ for $q \in Q_1 \setminus \{f_1\}$, $a \in \Sigma_1 \cup \{e\}$;
$\delta(q, a) = \delta_2(q, a)$ for $q \in Q_2 \setminus \{f_2\}$, $a \in \Sigma_2 \cup \{e\}$;
$\delta(f_1, e) = \delta(f_2, e) = \{f_0\}$.
Case Two: $r = r_1 r_2$. Let $M_1$ and $M_2$ be as above. Informally, the NFA that accepts $r$ is obtained by connecting the final state of $M_1$ to the initial state of $M_2$. More precisely, $M = (Q_1 \cup Q_2, \Sigma_1 \cup \Sigma_2, \delta, q_1, \{f_2\})$, where
$\delta(q, a) = \delta_1(q, a)$ for $q \in Q_1 \setminus \{f_1\}$, $a \in \Sigma_1 \cup \{e\}$;
$\delta(f_1, e) = \{q_2\}$;
$\delta(q, a) = \delta_2(q, a)$ for $q \in Q_2 \setminus \{f_2\}$, $a \in \Sigma_2 \cup \{e\}$.
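Case Three: $r = (r_1)^*$. Let the NFA $M = (Q_1 \cup \{q_0, f_0\}, \Sigma_1, \delta, q_0, \{f_0\})$, where
$\delta(q_0, e) = \{q_1, f_0\}$, $\delta(f_1, e) = \{q_1, f_0\}$, and $\delta(q, a) = \delta_1(q, a)$ for $q \in Q_1 \setminus \{f_1\}$, $a \in \Sigma_1 \cup \{e\}$.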
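The three inductive cases and the base case translate almost line by line into code. Below is a small Python sketch under stated assumptions: the `NFA` class, the edge-list representation, the use of `None` as the e label, and the helper `accepts` are all introduced here for illustration and do not come from the notes.

```python
import itertools

EPS = None                 # label used for e-moves in this sketch
_fresh = itertools.count() # supplies new, pairwise distinct state names


class NFA:
    """Normalized NFA: one start state, one final state with no outgoing edges."""

    def __init__(self, start, final, edges):
        self.start, self.final, self.edges = start, final, edges  # edges: (p, label, q)


def empty():
    """Base case r = emptyset: two states and no edge."""
    return NFA(next(_fresh), next(_fresh), [])


def symbol(a):
    """Base case r = a; call symbol(EPS) for r = e."""
    q0, qf = next(_fresh), next(_fresh)
    return NFA(q0, qf, [(q0, a, qf)])


def union(m1, m2):
    """Case One: new start q0 and new final f0, joined by e-moves."""
    q0, f0 = next(_fresh), next(_fresh)
    extra = [(q0, EPS, m1.start), (q0, EPS, m2.start),
             (m1.final, EPS, f0), (m2.final, EPS, f0)]
    return NFA(q0, f0, m1.edges + m2.edges + extra)


def concat(m1, m2):
    """Case Two: an e-move from the final state of m1 to the start of m2."""
    return NFA(m1.start, m2.final, m1.edges + m2.edges + [(m1.final, EPS, m2.start)])


def star(m1):
    """Case Three: q0 and f1 each have e-moves to q1 and to f0."""
    q0, f0 = next(_fresh), next(_fresh)
    extra = [(q0, EPS, m1.start), (q0, EPS, f0),
             (m1.final, EPS, m1.start), (m1.final, EPS, f0)]
    return NFA(q0, f0, m1.edges + extra)


def accepts(m, w):
    """Simulate m on w by tracking the e-closed set of current states."""
    def closure(states):
        states = set(states)
        stack = list(states)
        while stack:
            p = stack.pop()
            for (x, label, y) in m.edges:
                if x == p and label is EPS and y not in states:
                    states.add(y)
                    stack.append(y)
        return states

    current = closure({m.start})
    for c in w:
        current = closure({y for (x, label, y) in m.edges if x in current and label == c})
    return m.final in current


# NFA for 0*1 U 1; each occurrence of a symbol gets its own fresh machine.
m = union(concat(star(symbol("0")), symbol("1")), symbol("1"))
```

Each call to `symbol` allocates fresh states, which is how the sketch keeps the state sets of the component machines disjoint, as the construction requires.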
3. Example. We apply the above construction to $r = 0^* 1 \cup 1$.
We will build up the NFA as follows:
$1$; $1$; $0^*(1)$; $0^*(1) \cup 1$.
4. Proposition. The class of languages accepted by DFA's is closed under the following operations:
(1) union, concatenation, Kleene star;
(2) complementation;
(3) intersection.
Comments on the Proof: For (1) we can use the same constructions that showed that regular languages are accepted by NFA's, and then convert the resulting NFA into a DFA. So we leave (1) to the reader.
(2) Suppose the language $L$ is accepted by the DFA $M = (Q, \Sigma, \delta, s, F)$. Then the complementary language $\Sigma^* \setminus L$ is accepted by the machine $\overline{M} = (Q, \Sigma, \delta, s, Q \setminus F)$.
(3) We shall present two different constructions for intersection. The first will be somewhat implicit and makes use of (2), while the second construction will use the cross product of automata.
Let $M_1 = (Q_1, \Sigma, \delta_1, s_1, F_1)$ and $M_2 = (Q_2, \Sigma, \delta_2, s_2, F_2)$.
We can use the set-theoretic identity (De Morgan's law) $L(M_1) \cap L(M_2) = \overline{\overline{L(M_1)} \cup \overline{L(M_2)}}$, where the bar denotes complementation in $\Sigma^*$. Since the class of languages accepted by DFA's is closed under unions and complementations, we see that the intersection of such languages is also accepted by a DFA.
Now for the second approach. We let $M = (Q_1 \times Q_2, \Sigma, \delta, (s_1, s_2), F)$ where $\delta((q, q'), a) = (\delta_1(q, a), \delta_2(q', a))$ for $q \in Q_1$, $q' \in Q_2$, $a \in \Sigma$, and $F = F_1 \times F_2 = \{(f, f') : f \in F_1, f' \in F_2\}$.
The idea of the cross product construction is that a computation in $M$ traces out the computations in the two machines $M_1$ and $M_2$ simultaneously.
Note: if the collection of accepting states is changed to $(F_1 \times Q_2) \cup (Q_1 \times F_2)$, then $M$ would accept the union $L(M_1) \cup L(M_2)$. So, this construction will produce a DFA that accepts the union rather than an NFA.
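The cross product is short enough to write out. The sketch below is illustrative only: the tuple encoding `(states, alphabet, delta, start, finals)` of a DFA, the `mode` parameter, and the two sample machines `EVEN_A` and `ENDS_B` are assumptions made here, not objects from the notes.

```python
from itertools import product

# A DFA is encoded as (states, alphabet, delta, start, finals),
# with delta[(state, letter)] giving the next state.


def product_dfa(d1, d2, mode="intersection"):
    """Cross product of two DFA's over the same alphabet."""
    Q1, sigma, delta1, s1, F1 = d1
    Q2, _, delta2, s2, F2 = d2
    Q = set(product(Q1, Q2))
    delta = {((p, q), a): (delta1[(p, a)], delta2[(q, a)])
             for (p, q) in Q for a in sigma}
    if mode == "intersection":
        F = {(p, q) for (p, q) in Q if p in F1 and q in F2}   # F1 x F2
    else:
        F = {(p, q) for (p, q) in Q if p in F1 or q in F2}    # (F1 x Q2) U (Q1 x F2)
    return Q, sigma, delta, (s1, s2), F


def run(dfa, w):
    """Trace the DFA on w and report acceptance."""
    _, _, delta, q, finals = dfa
    for c in w:
        q = delta[(q, c)]
    return q in finals


# Hypothetical sample machines: even number of a's; strings ending in b.
EVEN_A = ({0, 1}, "ab", {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}, 0, {0})
ENDS_B = ({0, 1}, "ab", {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}, 0, {1})

both = product_dfa(EVEN_A, ENDS_B, "intersection")
either = product_dfa(EVEN_A, ENDS_B, "union")
```

Only the accepting set differs between the two modes; the states and transitions of the product machine are the same.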
5. There are algorithms to answer the following questions about DFA's:
(1) Given $w \in \Sigma^*$, is $w \in L(M)$?
(2) Is $L(M) = \emptyset$?
(3) Is $L(M) = \Sigma^*$?
(4) Given two DFA's $M_1$ and $M_2$, is $L(M_1) \subseteq L(M_2)$?
(5) Given two DFA's $M_1$ and $M_2$, is $L(M_1) = L(M_2)$?
Comments on the Proof: (1) We can trace out the operation of the DFA $M$ in $|w|$ steps to determine if $w$ is accepted.
(2) The state diagram of the DFA $M$ is finite. We can use a graph searching algorithm to decide if there is a path from the start state to some final state. Since we need only consider paths of length $|Q|$ or less (where $Q$ is the state space of the DFA), this algorithm has complexity $O(|Q||\Sigma|)$.
(3) Let $\overline{M}$ be the DFA that accepts $\Sigma^* \setminus L(M)$. Use part (2) to decide if $L(\overline{M})$ is empty.
(4) We can use (2) to decide if $(\Sigma^* \setminus L(M_2)) \cap L(M_1) = \emptyset$.
(5) We can use (4) to decide if $L(M_1) \subseteq L(M_2)$ and $L(M_2) \subseteq L(M_1)$. There is a better algorithm that we shall see later.
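Parts (2) through (5) all reduce to the emptiness test. A minimal Python sketch of these reductions follows; the tuple encoding of a DFA and the sample machines `EVEN_A`, `MULT4_A`, `ALL` and `NONE` are assumptions introduced for illustration.

```python
from collections import deque

# A DFA is encoded as (states, alphabet, delta, start, finals).


def is_empty(dfa):
    """(2): breadth-first search for a final state reachable from the start."""
    _, sigma, delta, start, finals = dfa
    seen, queue = {start}, deque([start])
    while queue:
        q = queue.popleft()
        if q in finals:
            return False
        for a in sigma:
            p = delta[(q, a)]
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return True


def complement(dfa):
    """Swap accepting and non-accepting states."""
    states, sigma, delta, start, finals = dfa
    return states, sigma, delta, start, set(states) - set(finals)


def intersect(d1, d2):
    """Cross product with accepting set F1 x F2."""
    Q1, sigma, t1, s1, F1 = d1
    Q2, _, t2, s2, F2 = d2
    Q = {(p, q) for p in Q1 for q in Q2}
    delta = {((p, q), a): (t1[(p, a)], t2[(q, a)]) for (p, q) in Q for a in sigma}
    F = {(p, q) for (p, q) in Q if p in F1 and q in F2}
    return Q, sigma, delta, (s1, s2), F


def is_universal(dfa):
    """(3): L(M) is everything iff the complement is empty."""
    return is_empty(complement(dfa))


def is_subset(d1, d2):
    """(4): L(M1) is inside L(M2) iff L(M1) meets the complement of L(M2) in nothing."""
    return is_empty(intersect(d1, complement(d2)))


def is_equal(d1, d2):
    """(5): two inclusions."""
    return is_subset(d1, d2) and is_subset(d2, d1)


# Hypothetical sample machines over {a, b}.
EVEN_A = ({0, 1}, "ab", {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}, 0, {0})
MULT4_A = (set(range(4)), "ab",
           {**{(i, "a"): (i + 1) % 4 for i in range(4)}, **{(i, "b"): i for i in range(4)}},
           0, {0})
ALL = ({0}, "ab", {(0, "a"): 0, (0, "b"): 0}, 0, {0})
NONE = ({0}, "ab", {(0, "a"): 0, (0, "b"): 0}, 0, set())
```

The search in `is_empty` visits each state once and inspects one edge per letter, which matches the $O(|Q||\Sigma|)$ bound stated in (2).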
6. Proposition. If $L = L(M)$, where $M$ is a DFA, then $L$ is a regular language; that is, there is a regular expression $r$ such that $L = L(r)$.
We shall study two constructions for this correspondence. The first, described in the text by Sipser, is a graph-based algorithm that builds up the regular expression as nodes are deleted from the state diagram. The second is a variant of it that is often found in other texts.
7. To present the graph-oriented algorithm, we need to introduce an even more general notion of NFA. It will have the expanded property that its edges may be labeled by regular expressions, not simply by $a \in \Sigma$ or $e$. We will denote this new class of automata by GNFA.
Method of Acceptance: In a usual NFA, the machine matches an input symbol with an edge label in order to make a move. For a GNFA, the automaton may consume more than one symbol in order to make the next move. It will consume a substring that belongs to the regular language denoted by the edge label.
A detailed description of this mode of acceptance follows. Let $M$ be a GNFA, and let $w \in \Sigma^*$. Then $w \in L(M)$ if and only if $w = w_1 w_2 \ldots w_k$ with each $w_i \in \Sigma^*$, and there is a sequence of states $q_0 = s, q_1, \ldots, q_k = f$ such that $w_i \in L(R_i)$, where $R_i = \delta(q_{i-1}, q_i)$ is the regular expression labeling the edge from $q_{i-1}$ to $q_i$.
As a consequence, when working with a GNFA, we represent the transition relation as a
square matrix whose entries are regular expressions. The size of this matrix is the number
of states of the automaton.
Normalization Condition: we shall assume that the GNFA $M$ has a start state $s$ that has NO edge coming into it and that $M$ has a unique final state $f$ with NO edges leaving it. Further, assume $f \neq s$. Also, it will be convenient to write the transitions in the form $\delta(q, q') = R$, where $q$ and $q'$ are states and $R$ is a regular expression.
9. Graph Theoretical Algorithm for converting an NFA into an equivalent regular expression.
Step 1: Convert the given NFA $M$ into a GNFA $M'$ by introducing a new start state, a new final state, and the necessary transitions. In particular, if there is no edge between two states in the NFA, then there is an edge in the corresponding GNFA whose label is $\emptyset$, the empty set. If there is no looping edge for a state in the NFA, then we need to introduce a loop in the GNFA labeled by the empty string $e$. Further, as indicated above, the GNFA has two additional states $s$ and $f$. Suppose $M'$ has $k$ states.
Step 2: If $k = 2$, then $M'$ has just the start state and the unique final state. It is clear that $L(M') = L(R)$, where $R = \delta'(s, f)$.
Step 3: "Node Reduction Step." If $k > 2$, we remove a node to produce an equivalent automaton $M''$. In particular, select a node $q'' \neq s, f$. Then define $M'' = (Q \setminus \{q''\}, \Sigma, \delta'', s, \{f\})$, where $\delta''(q_i, q_j) = \delta'(q_i, q_j)$ if either $\delta'(q_i, q'') = \emptyset$ or $\delta'(q'', q_j) = \emptyset$; otherwise, $\delta''(q_i, q_j) = R_1 (R_2)^* R_3 \cup R_4$, where $R_1 = \delta'(q_i, q'')$, $R_2 = \delta'(q'', q'')$, $R_3 = \delta'(q'', q_j)$, and $R_4 = \delta'(q_i, q_j)$.
Step 4: Repeat Step 3 until $M''$ has two states: $s$ and $f$.
We need to verify that Step 3 does indeed produce an equivalent automaton. We can argue by visualizing paths through the state diagram of the GNFA. In particular, it is sufficient to observe that if a GNFA $M_3$, with transition function $\delta_3$, had just three states $q_1$, $q_2$ and $q_3$, then there is an equivalent GNFA $M_2$, with transition function $\delta_2$, with two states, where:

$\delta_2(q_1, q_2) = R_{1,3} (R_{3,3})^* R_{3,2} \cup R_{1,2}$.
Here, we write $R_{i,j}$ for $\delta_3(q_i, q_j)$.

We can describe this method as an algorithm as follows. We let $G$ be a GNFA with start state $s$ and unique accepting state $f$, with the remaining states given as $q_1, q_2, \ldots, q_n$.
The algorithm given below successively removes the states $q_1$, $q_2$, and so on, one at a time, producing an equivalent GNFA. When the looping terminates, the resulting GNFA has only two states $s$ and $f$. The language that it accepts is denoted by the regular expression given by the label $\delta(s, f)$.
The correctness of the algorithm is shown by observing that the language accepted by the
GNFA is unchanged as each node is deleted.
Recursive Algorithm for the computation of the regular expression $\delta(s, f)$:
We shall write $r(i, j, k + 1)$ for the regular expression that denotes the set of all strings that start at state $q_i$ and terminate at state $q_j$ whose interior states may range over $\{q_{n-k+1}, q_{n-k+2}, \ldots, q_n\}$. Here, we take $s = q_0$ and $f = q_{n+1}$. In particular, $r(0, n+1, n+1)$ is the regular expression for the language accepted by the automaton.
function $r(i, j, k + 1)$
  if $k = 0$ then
    case $i = j$ : RETURN( $\delta(q_i, q_j) \cup \{e\}$ )
    case $i \neq j$ : RETURN( $\delta(q_i, q_j)$ )
  else
    % delete node $n - k + 1$
    $r(i, j, k + 1) := r(i, j, k) \cup r(i, n-k+1, k) \, r(n-k+1, n-k+1, k)^* \, r(n-k+1, j, k)$;
end
Note: the length of the resulting regular expression can grow exponentially in the number of states of the DFA.
Algorithm for the computation of the regular expression from a DFA
Since the recursive algorithm has many common subcases, we can rewrite the algorithm iteratively using the method of dynamic programming.
for $k = 1..n$ do
  for $i, j = k + 1..n$ do
    new$\delta(q_i, q_j) := \delta(q_i, q_j) \cup \delta(q_i, q_k) \, \delta(q_k, q_k)^* \, \delta(q_k, q_j)$
  od;
  for $i = k + 1..n$ do
    new$\delta(s, q_i) := \delta(s, q_i) \cup \delta(s, q_k) \, \delta(q_k, q_k)^* \, \delta(q_k, q_i)$;
    new$\delta(q_i, f) := \delta(q_i, f) \cup \delta(q_i, q_k) \, \delta(q_k, q_k)^* \, \delta(q_k, f)$
  od;
  new$\delta(s, f) := \delta(s, f) \cup \delta(s, q_k) \, \delta(q_k, q_k)^* \, \delta(q_k, f)$;
  $\delta :=$ new$\delta$;
od;
Notes: the outer loop deletes the nodes $q_1, \ldots, q_n$ one at a time; iteration $k$ deletes $q_k$. The inner loops update the transition relation in the GNFA. When the outer loop variable $k$ reaches the value $n$, the ranges of the inner loops are empty and only $\delta(s, f)$ is updated.
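The node-elimination loop can be tried out directly. The following Python sketch is one possible rendering, with several assumptions that are not in the notes: regular expressions are produced as Python `re` pattern strings, `None` stands for $\emptyset$ and the empty pattern `""` for $e$, missing self-loops are left as $\emptyset$ (harmless, since $\emptyset^* = e$), letters are assumed to be plain alphanumeric characters, and the helper names are invented here.

```python
import re
from itertools import product


def union(x, y):
    """Pattern for L(x) U L(y); None is the empty set, "" is e."""
    if x is None:
        return y
    if y is None:
        return x
    if x == y:
        return x
    if x == "":
        return f"(?:{y})?"
    if y == "":
        return f"(?:{x})?"
    return f"(?:{x}|{y})"


def concat(x, y):
    """Pattern for L(x) L(y); concatenating with the empty set gives the empty set."""
    return None if x is None or y is None else x + y


def star(x):
    """Pattern for L(x)*; both the empty set and e star to e."""
    return "" if x is None or x == "" else f"(?:{x})*"


def dfa_to_regex(n, delta, start, finals):
    """States are 1..n and delta[(i, a)] = j. Returns a pattern, or None for the empty language."""
    s, f = 0, n + 1                       # new start and final states of the GNFA
    d = {}
    for (i, a), j in delta.items():
        d[(i, j)] = union(d.get((i, j)), a)
    d[(s, start)] = ""
    for j in finals:
        d[(j, f)] = ""
    for k in range(1, n + 1):             # delete node q_k
        loop = star(d.get((k, k)))
        remaining = [s, f] + list(range(k + 1, n + 1))
        new = {}
        for i in remaining:
            for j in remaining:
                if i == f or j == s:      # no edges leave f or enter s
                    continue
                via = concat(d.get((i, k)), concat(loop, d.get((k, j))))
                label = union(d.get((i, j)), via)
                if label is not None:
                    new[(i, j)] = label
        d = new
    return d.get((s, f))


def agrees(pattern, delta, start, finals, alphabet, max_len):
    """Brute-force comparison of the pattern with the DFA on all short strings."""
    compiled = re.compile(pattern)
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            w, q = "".join(letters), start
            for c in w:
                q = delta[(q, c)]
            if bool(compiled.fullmatch(w)) != (q in finals):
                return False
    return True


# A three-state DFA over {a, b} with start state 1 and accepting states 2 and 3.
DELTA = {(1, "a"): 2, (1, "b"): 3, (2, "a"): 1, (2, "b"): 3, (3, "a"): 2, (3, "b"): 2}
pattern = dfa_to_regex(3, DELTA, 1, {2, 3})
```

The brute-force check `agrees(pattern, DELTA, 1, {2, 3}, "ab", 7)` is not a proof of correctness, but it is a cheap way to catch an indexing slip in the update rules.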
This is the algorithm given in Sipser's book. We also describe another common algorithm for this conversion, which may be compared with the Floyd-Warshall algorithm for solving the "all-pairs shortest path" problem that you studied in algorithms.
As usual, the language $L$ is accepted by the DFA
$M = (\{q_1, \ldots, q_n\}, \Sigma, \delta, q_1, F)$.
We let $R^k_{ij}$ denote the set of all strings that take the automaton $M$ from state $q_i$ to state $q_j$ without going through any state numbered $k$ or larger; that is, only strings are allowed that start at $q_i$ and end at $q_j$ and only use states $q_1, \ldots, q_{k-1}$ as intermediate states.
Such sets of strings satisfy an important inductive identity:
$R^{k+1}_{ij} = R^k_{ij} \cup R^k_{ik} (R^k_{kk})^* R^k_{kj}$.
To understand this identity, consider the following. Any string $w$ that is contained in $R^{k+1}_{ij}$ either uses state $q_k$ as an intermediate state or not. If $w$ does not use state $q_k$, then it must lie in $R^k_{ij}$. So, we must examine how state $q_k$ is used by the string $w$. Of course, $q_k$ must be used as some intermediate state. So, $w$ follows a path from state $q_i$ to $q_k$ and then from $q_k$ to $q_j$, such that only states $q_1, \ldots, q_k$ are used as intermediate states. So, it seems that we must add the set $R^k_{ik} R^k_{kj}$. In fact, there is a further possibility. We can also use paths that cycle through state $q_k$. So, the set of strings that use state $q_k$ is $R^k_{ik} (R^k_{kk})^* R^k_{kj}$.
One detail to consider is that $(R^k_{kk})^*$ may be properly larger than $R^k_{kk}$ itself, because $q_k$ is not allowed as an intermediate state for strings from $R^k_{kk}$.
We can easily describe the sets of strings $R^1_{ij}$. For $i \neq j$, $R^1_{ij} = \{a : \delta(q_i, a) = q_j\}$, while for $i = j$, $R^1_{ii} = \{a : \delta(q_i, a) = q_i\} \cup \{e\}$.
We show by induction that $R^k_{ij}$ is denoted by a regular expression $r^k_{ij}$.

For $k = 1$, we let, for $i \neq j$, $r^1_{ij} = a_1 \cup \ldots \cup a_p$; for $i = j$, $r^1_{ii} = e \cup a_1 \cup \ldots \cup a_p$, where $\delta(q_i, a_\ell) = q_j$, $1 \le \ell \le p$. If this set of letters is empty, then $r^1_{ij} = \emptyset$ for $i \neq j$, while for $i = j$, $r^1_{ii} = e$.
We conclude that $R^1_{ij} = L(r^1_{ij})$.
We now assume that the result holds for the value $k$. That is, for any set $R^k_{\ell m}$, there exists a regular expression $r^k_{\ell m}$ such that $R^k_{\ell m} = L(r^k_{\ell m})$. We need to show that any set $R^{k+1}_{ij}$ is given by a regular expression. By the inductive property of the set $R^{k+1}_{ij}$, we know that
$R^{k+1}_{ij} = R^k_{ij} \cup R^k_{ik} (R^k_{kk})^* R^k_{kj}$.
By the induction hypothesis, we have:
$R^k_{ij} \cup R^k_{ik} (R^k_{kk})^* R^k_{kj} = L(r^k_{ij}) \cup L(r^k_{ik}) L((r^k_{kk})^*) L(r^k_{kj})$
$= L(r^k_{ij} \cup r^k_{ik} (r^k_{kk})^* r^k_{kj})$.
This finishes the induction.

Finally, we observe that $L(M) = \bigcup \{R^{n+1}_{1j} : q_j \in F\}$.
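The inductive identity gives a triple loop over $k$, $i$ and $j$. Here is a hedged Python sketch of that dynamic program; as assumptions of this sketch (not of the notes), expressions are built as Python `re` pattern strings with `None` for $\emptyset$ and `""` for $e$, letters are plain alphanumeric characters, and the sample DFA and helper names are invented for illustration.

```python
import re
from itertools import product


def union(x, y):
    """Pattern for L(x) U L(y); None is the empty set, "" is e."""
    if x is None or y is None:
        return y if x is None else x
    if x == y:
        return x
    if x == "" or y == "":
        return f"(?:{x or y})?"
    return f"(?:{x}|{y})"


def concat(x, y):
    """Pattern for L(x) L(y)."""
    return None if x is None or y is None else x + y


def star(x):
    """Pattern for L(x)*."""
    return "" if x is None or x == "" else f"(?:{x})*"


def kleene(n, delta, finals):
    """States are 1..n with start state 1; delta[(i, a)] = j."""
    states = range(1, n + 1)
    # Base case k = 1: single letters, plus e on the diagonal.
    R = {(i, j): ("" if i == j else None) for i in states for j in states}
    for (i, a), j in delta.items():
        R[(i, j)] = union(R[(i, j)], a)
    # Inductive step: R^{k+1}_{ij} = R^k_{ij} U R^k_{ik} (R^k_{kk})* R^k_{kj}.
    for k in states:
        R = {(i, j): union(R[(i, j)],
                           concat(R[(i, k)], concat(star(R[(k, k)]), R[(k, j)])))
             for i in states for j in states}
    # L(M) is the union of R^{n+1}_{1j} over accepting q_j.
    result = None
    for j in sorted(finals):
        result = union(result, R[(1, j)])
    return result


def run(delta, finals, w):
    """Trace the DFA from state 1."""
    q = 1
    for c in w:
        q = delta[(q, c)]
    return q in finals


# Hypothetical sample DFA: accepting states 2 and 3.
DELTA = {(1, "a"): 2, (1, "b"): 3, (2, "a"): 1, (2, "b"): 3, (3, "a"): 2, (3, "b"): 2}
FINALS = {2, 3}
pattern = kleene(3, DELTA, FINALS)
mismatches = [w for n in range(7) for w in map("".join, product("ab", repeat=n))
              if bool(re.fullmatch(pattern, w)) != run(DELTA, FINALS, w)]
```

The dictionary comprehension reads the old table `R` throughout and only rebinds `R` once the new table is complete, which mirrors the role of the superscript $k$ in the identity. No simplification beyond trivial cases is attempted, so the pattern grows quickly with $n$.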
Example:

State   a   b
1       2   3
2       1   3
3       2   2

The accepting states are $\{q_2, q_3\}$ and the initial state is $q_1$.
We first outline the "node elimination algorithm." We need to introduce two new nodes $s = q_0$ and $f = q_4$. Further, if there is no explicit edge connecting two distinct states, then it has a label of the empty set; there is always a contribution of the empty string to the loop connecting a state with itself. So the new transition table and the successive tables are:
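Standardized Form

state   0             1             2             3             4
0       $e$           $e$           $\emptyset$   $\emptyset$   $\emptyset$
1       $\emptyset$   $e$           $a$           $b$           $\emptyset$
2       $\emptyset$   $a$           $e$           $b$           $e$
3       $\emptyset$   $\emptyset$   $a \cup b$    $e$           $e$
4       $\emptyset$   $\emptyset$   $\emptyset$   $\emptyset$   $e$

We use the rule $\delta'(q_i, q_j) = \delta(q_i, q_j) \cup \delta(q_i, q_1) [\delta(q_1, q_1)]^* \delta(q_1, q_j)$ to obtain:

Elimination of Node 1

state   0             2              3              4
0       $e$           $a$            $b$            $\emptyset$
2       $\emptyset$   $e \cup a^2$   $b \cup ab$    $e$
3       $\emptyset$   $a \cup b$     $e$            $e$
4       $\emptyset$   $\emptyset$    $\emptyset$    $e$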
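We use the rule $\delta'(q_i, q_j) = \delta(q_i, q_j) \cup \delta(q_i, q_2) [\delta(q_2, q_2)]^* \delta(q_2, q_j)$ to obtain:

Elimination of Node 2

state   0             3                                                     4
0       $e$           $b \cup a(e \cup a^2)^*(b \cup ab)$                   $a(e \cup a^2)^*$
3       $\emptyset$   $e \cup (a \cup b)(e \cup a^2)^*(b \cup ab)$          $e \cup (a \cup b)(e \cup a^2)^*$
4       $\emptyset$   $\emptyset$                                           $e$

We use the rule $\delta'(q_0, q_4) = \delta(q_0, q_4) \cup \delta(q_0, q_3) [\delta(q_3, q_3)]^* \delta(q_3, q_4)$ to obtain:

Elimination of Node 3

state   0             4
0       $e$           $a(e \cup a^2)^* \cup (b \cup a(e \cup a^2)^*(b \cup ab)) (e \cup (a \cup b)(e \cup a^2)^*(b \cup ab))^* (e \cup (a \cup b)(e \cup a^2)^*)$
4       $\emptyset$   $e$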
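A long expression like the final label from $q_0$ to $q_4$ is easy to mistranscribe, so a brute-force comparison against the DFA is a useful sanity check. The sketch below is an illustration added here, not part of the notes: it transcribes the expression into Python `re` syntax, writing $(e \cup a^2)^*$ as `(?:aa)*` and $(e \cup X)^*$ as `X*`, which denote the same languages, and the names `EVEN`, `LOOP3` and `agree_up_to` are invented for the purpose.

```python
import re
from itertools import product

# The DFA of the example: start state 1, accepting states 2 and 3.
DELTA = {(1, "a"): 2, (1, "b"): 3, (2, "a"): 1, (2, "b"): 3, (3, "a"): 2, (3, "b"): 2}
START, FINALS = 1, {2, 3}


def dfa_accepts(w):
    """Trace the DFA on w."""
    q = START
    for c in w:
        q = DELTA[(q, c)]
    return q in FINALS


EVEN = "(?:aa)*"                                   # (e U a^2)*
LOOP3 = f"(?:(?:a|b){EVEN}(?:b|ab))*"              # (e U (a U b)(e U a^2)*(b U ab))*
REGEX = re.compile(
    f"a{EVEN}"                                     # a (e U a^2)*
    f"|(?:b|a{EVEN}(?:b|ab)){LOOP3}"               # (b U a(e U a^2)*(b U ab)) (loop at node 3)*
    f"(?:(?:a|b){EVEN})?"                          # (e U (a U b)(e U a^2)*)
)


def agree_up_to(max_len):
    """True if the regex and the DFA agree on every string of length <= max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            if bool(REGEX.fullmatch(w)) != dfa_accepts(w):
                return False
    return True
```

Calling `agree_up_to(8)` compares the two on all 511 strings of length at most 8; agreement on short strings is evidence, not proof, that the tables were computed correctly.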
We now compute using the second method.

We compute the regular expressions $r^k_{ij}$ for $k = 1, 2, 3, 4$. Recall:
$r^{k+1}_{ij} = r^k_{ij} \cup r^k_{ik} (r^k_{kk})^* r^k_{kj}$.
For $k = 1$, we are in the base case.

$r^1_{11} = e$, $r^1_{12} = a$, $r^1_{13} = b$;
$r^1_{21} = a$, $r^1_{22} = e$, $r^1_{23} = b$;
$r^1_{31} = \emptyset$, $r^1_{32} = a \cup b$, $r^1_{33} = e$.
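For the remaining sets, we need to use the inductive set identity.
$r^2_{11} = r^1_{11} \cup r^1_{11} (r^1_{11})^* r^1_{11} = e$
$r^2_{12} = r^1_{12} \cup r^1_{11} (r^1_{11})^* r^1_{12} = a \cup e e^* a = a$
$r^2_{13} = r^1_{13} \cup r^1_{11} (r^1_{11})^* r^1_{13} = b \cup e e^* b = b$
$r^2_{21} = r^1_{21} \cup r^1_{21} (r^1_{11})^* r^1_{11} = a \cup a e^* e = a$
$r^2_{22} = r^1_{22} \cup r^1_{21} (r^1_{11})^* r^1_{12} = e \cup a e^* a = e \cup aa$
$r^2_{23} = r^1_{23} \cup r^1_{21} (r^1_{11})^* r^1_{13} = b \cup a e^* b = b \cup ab$
$r^2_{31} = r^1_{31} \cup r^1_{31} (r^1_{11})^* r^1_{11} = \emptyset \cup \emptyset e^* e = \emptyset$
$r^2_{32} = r^1_{32} \cup r^1_{31} (r^1_{11})^* r^1_{12} = (a \cup b) \cup \emptyset e^* a = a \cup b$
$r^2_{33} = r^1_{33} \cup r^1_{31} (r^1_{11})^* r^1_{13} = e \cup \emptyset e^* b = e$

$r^3_{11} = r^2_{11} \cup r^2_{12} (r^2_{22})^* r^2_{21} = e \cup a(e \cup aa)^* a = (aa)^*$
$r^3_{12} = r^2_{12} \cup r^2_{12} (r^2_{22})^* r^2_{22} = a \cup a(e \cup aa)^* (e \cup aa) = a(aa)^*$
$r^3_{13} = r^2_{13} \cup r^2_{12} (r^2_{22})^* r^2_{23} = b \cup a(e \cup aa)^* (b \cup ab)$
$r^3_{21} = r^2_{21} \cup r^2_{22} (r^2_{22})^* r^2_{21} = a \cup (e \cup aa)(e \cup aa)^* a = a(aa)^*$
$r^3_{22} = r^2_{22} \cup r^2_{22} (r^2_{22})^* r^2_{22} = (e \cup aa) \cup (e \cup aa)(e \cup aa)^* (e \cup aa)$
$r^3_{23} = r^2_{23} \cup r^2_{22} (r^2_{22})^* r^2_{23} = (b \cup ab) \cup (e \cup aa)(e \cup aa)^* (b \cup ab)$
$r^3_{31} = r^2_{31} \cup r^2_{32} (r^2_{22})^* r^2_{21} = \emptyset \cup (a \cup b)(e \cup aa)^* a = (a \cup b)(aa)^* a$
$r^3_{32} = r^2_{32} \cup r^2_{32} (r^2_{22})^* r^2_{22} = (a \cup b) \cup (a \cup b)(e \cup aa)^* (e \cup aa)$
$r^3_{33} = r^2_{33} \cup r^2_{32} (r^2_{22})^* r^2_{23} = e \cup (a \cup b)(e \cup aa)^* (b \cup ab) = e \cup (a \cup b) a^* b$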
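The final simplification of $r^3_{33}$ skips a step. A short derivation, using only the standard identities $(e \cup x)^* = x^*$ and $(aa)^*(e \cup a) = a^*$ (an even number of $a$'s, optionally followed by one more, is any number of $a$'s), could read:

```latex
\begin{align*}
(e \cup aa)^* (b \cup ab)
  &= (aa)^* (e \cup a)\, b \\
  &= a^* b, \\
\text{so}\quad r^3_{33}
  &= e \cup (a \cup b)(e \cup aa)^*(b \cup ab)
   = e \cup (a \cup b)\, a^* b .
\end{align*}
```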
Finally, we only need to compute $r^4_{12}$ and $r^4_{13}$, as $r^4_{12} = r^3_{12} \cup r^3_{13} (r^3_{33})^* r^3_{32}$ and $r^4_{13} = r^3_{13} \cup r^3_{13} (r^3_{33})^* r^3_{33}$.
It is perhaps more convenient to collect these calculations in a table. It is left to the reader to fill in all the columns.

Reg. Expr.    k = 1          k = 2            k = 3
$r^k_{11}$    $e$            $e$
$r^k_{12}$    $a$            $a$
$r^k_{13}$    $b$            $b$
$r^k_{21}$    $a$            $a$
$r^k_{22}$    $e$            $e \cup a^2$
$r^k_{23}$    $b$            $b \cup ab$
$r^k_{31}$    $\emptyset$    $\emptyset$
$r^k_{32}$    $a \cup b$     $a \cup b$
$r^k_{33}$    $e$            $e$