
Lecture 3 Graph Representation for Regular Expressions

digraph (directed graph)

A digraph is a pair of sets (V, E) such that each element of E is an ordered pair of elements of V. A path is an alternating sequence of vertices and edges v0, e1, v1, ..., ek, vk such that each edge ei = (v(i-1), vi) points forward along the sequence, so all edges on the path have the same direction.

string-labeled digraph
A string-labeled digraph is a digraph in which each edge is labeled by a string. In a string-labeled digraph, every path is associated with the string obtained by concatenating the labels of its edges in the order they occur along the path. This string is called the label of the path.
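A minimal sketch of the path-label definition, assuming a string-labeled digraph stored as a Python dict from ordered pairs (u, v) to labels (so parallel edges are not allowed) and a path given by its vertex sequence. `path_label` is a hypothetical helper, not something defined in the lecture.

```python
def path_label(labels: dict[tuple[str, str], str], path: list[str]) -> str:
    """Return the label of the path v0, v1, ..., vk given by its vertices.

    labels maps each edge (u, v) to its string label; "" is allowed.
    Raises ValueError if a consecutive pair (v(i-1), vi) is not an edge.
    """
    pieces = []
    for u, v in zip(path, path[1:]):
        if (u, v) not in labels:
            raise ValueError(f"({u}, {v}) is not an edge")
        pieces.append(labels[(u, v)])
    # Concatenate the edge labels in the order the path traverses them.
    return "".join(pieces)


labels = {("s", "a"): "ab", ("a", "b"): "", ("b", "f"): "c"}
print(path_label(labels, ["s", "a", "b", "f"]))  # abc
```

A path consisting of a single vertex has no edges, so its label is the empty string.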

G(r)
For each regular expression r, we can construct a digraph G(r), with edges labeled by symbols and ε, as follows. If r=, then

If r, then

Theorem 1
G(r) has the property that a string x belongs to r if and only if x is the label of a path from the initial vertex to the final vertex. The proof is by induction on the structure of r.

Graph Representation
A graph representation of a regular expression r is a string-labeled digraph with an initial vertex s and a final vertex f such that a string x belongs to r if and only if x is the label of some path from s to f.
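To make the "x is the label of some path from s to f" condition concrete, here is a minimal sketch that decides it for a given string-labeled digraph. The edge-list format and the name `has_path_with_label` are assumptions for illustration. The search runs over pairs (vertex, i), meaning "this vertex has been reached after reading x[:i]"; there are finitely many such pairs, so it terminates even when ε-edges (empty labels) form cycles.

```python
from collections import deque


def has_path_with_label(edges, s, f, x):
    """Decide whether some path from s to f has label exactly x.

    edges: list of (u, label, v) triples; label "" marks an epsilon-edge.
    """
    out = {}
    for u, label, v in edges:
        out.setdefault(u, []).append((label, v))
    start = (s, 0)
    seen = {start}
    queue = deque([start])
    while queue:
        u, i = queue.popleft()
        if u == f and i == len(x):
            return True
        for label, v in out.get(u, []):
            # Follow the edge only if its label matches x starting at position i.
            if x.startswith(label, i):
                nxt = (v, i + len(label))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


# A hypothetical graph representation of (ab)*c, with one epsilon-edge.
edges = [("s", "ab", "s"), ("s", "", "t"), ("t", "c", "f")]
print(has_path_with_label(edges, "s", "f", "ababc"))  # True
print(has_path_with_label(edges, "s", "f", "abac"))   # False
```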

Corollary 2
For any regular expression r, there exists a string-labeled digraph with two special vertices, an initial vertex s and a final vertex f, such that a string x belongs to r if and only if x is the label of a path from s to f.

Puzzle: If a regular expression r contains u +'s, v ·'s, and w *'s, how many ε-edges does G(r) contain?
Question: How can the number of ε-edges be reduced?

Theorem 3
An ε-edge (u, v) in G(r) that is the unique out-edge of a nonfinal vertex u, or the unique in-edge of a noninitial vertex v, can be shrunk so that u and v become a single vertex. (If one of u and v is the initial vertex or the final vertex, so is the resulting vertex.) Remark: shrinking should be done one edge at a time.
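The shrinking rule of Theorem 3 can be sketched as a loop that contracts one qualifying ε-edge at a time and then re-checks the condition on the updated graph. This is a minimal sketch under assumptions of our own: edges are (u, label, v) triples with "" for ε, the merged vertex keeps the name u, and ε-loops (u = v) are simply skipped. `shrink_epsilon_edges` is a hypothetical name.

```python
def shrink_epsilon_edges(edges, s, f):
    """Repeatedly shrink epsilon-edges allowed by Theorem 3, one at a time.

    edges: list of (u, label, v); "" marks an epsilon-edge.
    Returns (edges, s, f) describing the shrunk graph.
    """
    edges = list(edges)
    while True:
        for idx, (u, label, v) in enumerate(edges):
            if label != "" or u == v:
                continue
            out_deg = sum(1 for a, _, _ in edges if a == u)
            in_deg = sum(1 for _, _, b in edges if b == v)
            # Unique out-edge of a nonfinal u, or unique in-edge of a noninitial v.
            if (u != f and out_deg == 1) or (v != s and in_deg == 1):
                break
        else:
            return edges, s, f  # no shrinkable epsilon-edge is left
        # Delete the edge and merge v into u by renaming v to u everywhere.
        del edges[idx]
        edges = [(u if a == v else a, lab, u if b == v else b) for a, lab, b in edges]
        # The merged vertex inherits initial/final status from v (u keeps its own).
        if s == v:
            s = u
        if f == v:
            f = u


print(shrink_epsilon_edges([("s", "", "a"), ("a", "0", "f")], "s", "f"))
# ([('s', '0', 'f')], 's', 'f')
```

In a graph for ε + 1, with edges ("s", "", "f") and ("s", "1", "f"), neither condition holds (s has two out-edges and f has two in-edges), so the ε-edge is kept.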

Lecture 4 Deterministic Finite Automata (DFA)

DFA

[Figure: a DFA with its tape and finite control]

The tape is divided into finitely many cells. Each cell contains a symbol in an alphabet Σ.

The head scans a cell on the tape and reads the symbol in that cell. In each move, the head moves one cell to the right.

The finite control has finitely many states, which form a set Q. In each move, the state changes according to a transition function δ: Q × Σ → Q.

δ(q, a) = p means that if the finite control is in state q and the head reads symbol a, then the next state is p, and the head moves one cell to the right.

There are some special states: an initial state s and a set F of final states. Initially, the DFA is in the initial state s and the head scans the leftmost cell. The tape holds an input string.

When the head moves off the right end of the tape, the DFA stops. An input string x is accepted by the DFA if the DFA stops in a final state; otherwise, x is rejected.

The DFA can be represented by M = (Q, Σ, δ, s, F), where Σ is the alphabet of input symbols. The set of all strings accepted by a DFA M is denoted by L(M). We also say that the language L(M) is accepted by M.
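The run of a DFA described above can be sketched directly from the 5-tuple M = (Q, Σ, δ, s, F). This is a minimal illustration, assuming δ is stored as a Python dict keyed by (state, symbol); the `DFA` class and the example machine `even_ones` (strings over {0, 1} with an even number of 1s) are hypothetical and not from the lecture.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    """M = (Q, Sigma, delta, s, F), with delta a dict mapping (q, a) to p."""
    states: frozenset
    alphabet: frozenset
    delta: dict
    start: str
    finals: frozenset

    def accepts(self, x: str) -> bool:
        q = self.start  # initially in state s, head on the leftmost cell
        for a in x:     # each move reads one symbol and moves the head right
            if a not in self.alphabet:
                raise ValueError(f"symbol {a!r} is not in the alphabet")
            q = self.delta[(q, a)]
        return q in self.finals  # head is off the tape: accept iff q is in F


even_ones = DFA(
    states=frozenset({"e", "o"}),
    alphabet=frozenset({"0", "1"}),
    delta={("e", "0"): "e", ("e", "1"): "o", ("o", "0"): "o", ("o", "1"): "e"},
    start="e",
    finals=frozenset({"e"}),
)
print(even_ones.accepts("1011"))  # False (three 1s)
```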

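The transition diagram of a DFA is an alternative way to represent the DFA. For M = (Q, Σ, δ, s, F), the transition diagram of M is a symbol-labeled digraph G = (V, E) such that V = Q, where the initial vertex s and every final vertex f ∈ F are specially marked, and E = {(q, p) with label a | δ(q, a) = p}; that is, G has an edge from q to p labeled a exactly when δ(q, a) = p.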

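Example. Consider the DFA M with Q = {s, p, q}, Σ = {0, 1}, initial state s, F = {q}, and the transition function δ given by the table below.

δ    s   p   q
0    p   q   q
1    s   s   q

[Figure: transition diagram of M]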

L(M) = (0+1)*00(0+1)*.
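One way to gain confidence in the claim L(M) = (0+1)*00(0+1)* is to simulate M from the table and compare it against the regular expression on every short string. This sketch takes F = {q}, which is what the stated language requires, and writes (0+1)*00(0+1)* in Python's `re` syntax as `(0|1)*00(0|1)*`; checking strings up to length 8 is an arbitrary cutoff, not a proof.

```python
import itertools
import re

# delta for the example M, read off the table: delta[(state, symbol)] = next state.
delta = {
    ("s", "0"): "p", ("p", "0"): "q", ("q", "0"): "q",
    ("s", "1"): "s", ("p", "1"): "s", ("q", "1"): "q",
}
start, finals = "s", {"q"}


def accepts(x):
    q = start
    for a in x:
        q = delta[(q, a)]
    return q in finals


# (0+1)*00(0+1)* in Python regex syntax.
pattern = re.compile(r"(0|1)*00(0|1)*")
for n in range(9):
    for t in itertools.product("01", repeat=n):
        x = "".join(t)
        assert accepts(x) == bool(pattern.fullmatch(x)), x
print("agree on all strings of length <= 8")
```

Intuitively, s means "no 00 seen and the last symbol (if any) is 1", p means "no 00 seen and the last symbol is 0", and q means "00 has been seen".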

The transition diagram of the DFA M has the following properties: for every vertex q and every symbol a, there is exactly one edge with label a leaving q. Hence, for each string x, there is exactly one path starting from the initial state s whose label is x. A string x is accepted by M if and only if this path ends at a final state.
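These properties can be checked mechanically. The sketch below builds the edge set E = {(q, a, p) | δ(q, a) = p} of the example M, verifies that each vertex has exactly one out-edge per symbol, and follows the unique path labeled x from s. The helper names `transition_diagram`, `is_deterministic`, and `unique_path` are hypothetical, and F = {q} is assumed as in the example.

```python
def transition_diagram(states, alphabet, delta):
    """Edge set E = {(q, a, p) : delta(q, a) = p} of the transition diagram."""
    return {(q, a, delta[(q, a)]) for q in states for a in alphabet}


def is_deterministic(states, alphabet, edges):
    """True if every vertex has exactly one edge labeled a leaving it, for each a."""
    return all(
        sum(1 for (u, b, _) in edges if u == q and b == a) == 1
        for q in states
        for a in alphabet
    )


def unique_path(edges, start, x):
    """Vertices of the unique path from start whose label is x."""
    nxt = {(u, a): v for (u, a, v) in edges}
    path = [start]
    for a in x:
        path.append(nxt[(path[-1], a)])
    return path


states, alphabet, finals = {"s", "p", "q"}, {"0", "1"}, {"q"}
delta = {
    ("s", "0"): "p", ("p", "0"): "q", ("q", "0"): "q",
    ("s", "1"): "s", ("p", "1"): "s", ("q", "1"): "q",
}
E = transition_diagram(states, alphabet, delta)
path = unique_path(E, "s", "1001")
print(path, path[-1] in finals)  # ['s', 's', 'p', 'q', 'q'] True
```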