
MODULE III MCA-303 SYSTEM SOFTWARE ADMN 2011-‘12

3.1 INTRODUCTION
Computer hardware can understand only machine-level instructions, so a program written in a
high-level language must be converted to machine instructions before the computer can execute
it. This job is carried out by the compiler. A compiler can therefore be defined as a computer
program that translates a program in a source language into an equivalent program in a target
language. This is shown in Figure 3.1.

Figure 3.1: A compiler

A source program/code is a program/code written in the source language, which is usually a
high-level language such as C or C++. A target program/code is a program/code written in the
target language, which is often a machine language or an intermediate code. An important role
of the compiler is to report any errors in the source program that it detects during the
translation process.
If the target program is an executable machine-language program, it can then be
called by the user to process inputs and produce outputs; see Figure 3.2.

Figure 3.2 Running the target program

The first FORTRAN compiler took 18 man-years to implement. A compiler can translate only
those source programs which have been written in the language for which the compiler is meant.
For example, a FORTRAN compiler can translate only source programs written in FORTRAN.
Therefore, each machine requires a separate compiler for each high-level language.
A compiler cannot diagnose logical errors. It can only diagnose grammatical
(syntactical) errors in the program. For example, if one has wrongly typed -25 as the age of a
person, when he actually intended +25, the compiler cannot diagnose this. Programs containing

Dept.of Computer Science And Applications, SJCET, Palai P a g e | 60



such errors will be successfully compiled and the object code will be obtained without any error
message. But, such programs when executed will not produce the right answers.

3.2 STRUCTURE OF A COMPILER


A compiler takes as input a source program and produces as output an equivalent sequence of
machine instructions. The compilation process is too complex to be treated as a single step, so
it is partitioned into a series of subprocesses called phases (as shown in Figure 3.3), each
performing one specific task. A phase is a logically cohesive operation that takes as input one
representation of the source program and produces as output another representation.
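The complete compilation procedure can be divided into six phases, and these phases can be
grouped into two parts, as follows:
1. Analysis: This part analyses the source program and builds an intermediate representation of
it. Analysis is done in three phases:
a) Lexical Analysis
b) Syntax Analysis
c) Semantic Analysis
2. Synthesis: This part constructs the target program from the intermediate representation.
Synthesis is done in the remaining three phases shown in Figure 3.3:
a) Intermediate Code Generation
b) Code Optimization
c) Code Generation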

3.2.1 Lexical Analyzer or Scanner


This module separates the characters of the source program into groups that logically belong
together; these groups are called tokens. A token may be composed of a single character or a
sequence of characters.
Examples of tokens are keywords such as IF and DO, identifiers such as X or NUM, operator
symbols such as <= or +, and punctuation symbols such as parentheses or commas. The output of
the lexical analyzer is a stream of tokens, which is passed to the next phase.

Figure 3.3: Phases of a compiler (Lexical Analysis, Syntax Analysis, Semantic Analysis,
Intermediate Code Generation, Code Optimization, Code Generation; Table Management and Error
Handling interact with every phase)

As an example, consider the following line of code:

sum = old_sum+ value/100;

Ignoring the terminating semicolon, this code contains 7 tokens:

sum        identifier
=          assignment operator
old_sum    identifier
+          addition operator
value      identifier
/          division operator
100        integer constant
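
The grouping of characters into tokens can be sketched in Python. This is only an illustration under assumed token classes (the class names and the patterns below are not taken from the text); it uses the standard `re` module to try each pattern at the current input position.

```python
import re

# Hypothetical token classes for this sketch; names and patterns are assumptions.
TOKEN_SPEC = [
    ("integer constant", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("assignment operator", r"="),
    ("operator", r"[+\-*/]"),
    ("punctuation", r"[;(),]"),
    ("whitespace", r"\s+"),
]
# One alternation with a named group g0, g1, ... per token class.
MASTER = re.compile("|".join(f"(?P<g{i}>{pat})" for i, (_, pat) in enumerate(TOKEN_SPEC)))

def tokenize(source):
    """Return (lexeme, token class) pairs, skipping white space."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"lexical error at position {pos}: {source[pos]!r}")
        kind = TOKEN_SPEC[int(m.lastgroup[1:])][0]
        if kind != "whitespace":
            tokens.append((m.group(), kind))
        pos = m.end()
    return tokens

for lexeme, kind in tokenize("sum = old_sum+ value/100;"):
    print(f"{lexeme:10} {kind}")
```

Unlike the hand-made list, this sketch also reports the terminating semicolon as a punctuation token.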

3.2.2 Syntax Analyzer(Parser)


The syntax analyzer takes tokens from the lexical analyzer and performs syntax analysis, which
determines the structure of the program. This is similar to performing grammatical analysis on
a sentence in a natural language. Syntax analysis determines the structural elements of the
program as well as their relationships. The result of syntax analysis is usually represented as
a parse tree or a syntax tree.
For example, the line of code a=b+i can be represented as a parse tree as follows:

Figure 3.4 A parse tree

3.2.3 Semantic Analyzer


The semantic analyzer gathers type information and checks the tree produced by the syntax
analyzer for semantic errors. Consider the following statement, assuming that rate is declared
as float:
position = initial + rate*60;
Here the semantic analyzer might add a type conversion node, say inttoreal, to the syntax tree
to convert the integer constant 60 to a real quantity. The output of the semantic analyzer is
shown in Figure 3.5.

3.2.4 Intermediate Code Generation


The intermediate code generator uses the structure produced by the syntax analyzer to create a
stream of simple instructions. Many styles of intermediate code are possible. Syntax trees are
one form of intermediate representation. An intermediate representation should have two
properties: it should be easy to produce, and it should be easy to translate into the target
machine language. Another form is three-address code, in which each instruction has at most
three operands:
temp1 = inttoreal(60)
temp2 = rate*temp1
temp3 = initial+temp2
position = temp3
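
One way to see how three-address code arises is a post-order walk of the syntax tree that introduces a temporary for every interior node. The sketch below assumes a tree encoded as nested Python tuples and a target named position (as in the statement of section 3.2.3); the function name and the encoding are illustrative choices, not part of the text.

```python
import itertools

def gen_three_address(target, tree):
    """Emit three-address code for `target = tree`.

    `tree` is a leaf (a name or constant, as a string), a tuple
    (op, left, right) for a binary operator, or (func, operand)
    for a unary conversion such as inttoreal.
    """
    code = []
    counter = itertools.count(1)

    def visit(node):
        if isinstance(node, str):              # leaf: use the operand directly
            return node
        operands = [visit(child) for child in node[1:]]
        temp = f"temp{next(counter)}"          # new temporary for this node
        if len(operands) == 1:                 # unary conversion node
            code.append(f"{temp} = {node[0]}({operands[0]})")
        else:
            left, right = operands
            code.append(f"{temp} = {left} {node[0]} {right}")
        return temp

    result = visit(tree)
    code.append(f"{target} = {result}")
    return code

# Assumed encoding of: position = initial + rate * inttoreal(60)
tree = ("+", "initial", ("*", "rate", ("inttoreal", "60")))
print("\n".join(gen_three_address("position", tree)))
```

The operands are visited left to right, so the innermost conversion receives the first temporary.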

Figure 3.5: The output of semantic analysis (parse tree)

3.2.5 Code Optimization


Code optimization is an optional phase designed to improve the intermediate code so that the
final object program runs faster and/or takes less space. Its output is another intermediate
code program that does the same job as the original, but in a way that saves time and/or space.
For the three-address code above, the conversion of 60 to a real number can be done once at
compile time, and the copy through temp3 is unnecessary:
temp1 = rate * 60.0
position = initial + temp1

3.2.6 Code Generation


The final phase of a compiler is code generation. It takes as input an intermediate
representation of the source program and maps it into the target language. If the target
language is machine code, registers or memory locations are selected for each of the variables
used by the program. Some compilers can generate target code for several different machines,
while others generate target code only for one specific machine. Using registers R1 and R2,
with id1, id2 and id3 standing for position, initial and rate, the optimized code might be
translated as:
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1

3.2.7 Table Management


The table management, or bookkeeping, portion of the compiler keeps track of the names used by
the program and records essential information about each, such as its type (integer, real,
etc.). The data structure used to record this information is called a symbol table.
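
A symbol table can be sketched as a dictionary from names to attribute records. The class and method names below are hypothetical, and only the type of rate comes from the running example; real compilers store more attributes and usually handle nested scopes.

```python
class SymbolTable:
    """Minimal symbol table: maps each name to a record of attributes."""

    def __init__(self):
        self._entries = {}

    def insert(self, name, **attributes):
        """Add a name, or update the attributes of an existing entry."""
        entry = self._entries.setdefault(name, {})
        entry.update(attributes)
        return entry

    def lookup(self, name):
        """Return the record for `name`, or None if it was never entered."""
        return self._entries.get(name)

table = SymbolTable()
table.insert("position")             # entered by the lexical analyzer, type not yet known
table.insert("rate", type="real")    # type recorded once the declaration is analysed
print(table.lookup("rate"))
print(table.lookup("undeclared"))
```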


3.2.8 Error Handler


The error handler is invoked when a flaw in the source program is detected. It must warn the
programmer by issuing a diagnostic, and adjust the information being passed from phase to phase
so that each phase can proceed. It is desirable that compilation of a flawed program be
completed, at least through the syntax analysis phase, so that as many errors as possible can be
detected in one compilation. Both the table management and error handling routines interact
with all phases of the compiler.

x=y+z;

p=q+t
else
x=y–z;
p=q*t
endif

3.3 LEXICAL ANALYSIS


Lexical analysis is the process of analyzing the lexical units in a source string. The function
of the lexical analyzer is to read the source program one character at a time and to translate
it into a sequence of primitive units called tokens. Keywords, identifiers, constants and
operators are examples of tokens.
Two issues arise in designing a lexical analyzer. First, we need a notation to describe the
tokens; regular expressions can describe essentially all the tokens of programming languages.
Second, having decided what the tokens are, we need some mechanism to recognize these tokens in
the input stream. Transition diagrams and finite automata are convenient ways of designing
token recognizers.
One advantage of using regular expressions to specify tokens is that from a regular expression
we can automatically construct a recognizer for the tokens denoted by that regular expression.


3.3.1 THE ROLE OF LEXICAL ANALYZER

Figure 3.7: Interactions between the lexical analyzer and the parser

The main task of the lexical analyzer is to read the input characters of the source program,
group them into lexemes, and produce as output a token for each lexeme in the source program.
The stream of tokens is then sent to the parser for syntax analysis. The syntax analyzer is
actually the master program: it sends a request to the lexical analyzer for the next token. The
lexical analyzer reads the input string, converts the next lexeme into a token and passes the
result to the syntax analyzer. The syntax analyzer checks the token against the grammar and
then requests the next token. This process repeats until the entire input string is consumed.
Keeping lexical analysis separate simplifies the overall design of the compiler. It is easier
to specify the structure of tokens than the syntactic structure of the source program.
Therefore we can construct a more specialized, and hence more efficient, recognizer for tokens
than for syntactic structure.
Another function of the lexical analyzer is stripping out comments and white space (blank,
newline, and tab). Also, if the program contains erroneous input, the lexical analyzer
correlates the error with the source file and the line number.
The lexical analyzer is kept as an independent module for the following reasons:
• The overall design of the compiler is simplified.
• Compiler efficiency is improved.
• Compiler portability is enhanced.


3.3.2 INPUT BUFFERING


Because a large number of characters must be processed during the compilation of a large source
program, specialized buffering techniques have been developed to reduce the overhead required
to process a single input character. In addition, the lexical analyzer sometimes needs to look
ahead some symbols to decide which token to return. For example, in C, single-character
operators like -, =, or < could also be the beginning of a two-character operator like ->, ==,
or <=. An input buffer is used to handle this situation.

Figure 3.8 Input Buffer


The input buffer consists of two buffers that are reloaded alternately. Each buffer is of the
same size N, and N is usually the size of a disk block, e.g., 4096 bytes. Using one system read
command we can read N characters into a buffer, rather than using one system call per
character. If fewer than N characters remain in the input file, then a special character,
represented by eof, marks the end of the source file; eof is different from any possible
character of the source program.
Two pointers to the input are maintained:
• Pointer lexemeBegin marks the beginning of the current lexeme, whose extent we are attempting
to determine.
• Pointer forward scans ahead until a pattern match is found.

Once the next lexeme is determined, forward is set to the character at its right end. Then,
after the lexeme is recorded as an attribute value of a token returned to the parser,
lexemeBegin is set to the character immediately after the lexeme just found. In Figure 3.8,
forward has passed the end of the next lexeme, ** (the FORTRAN exponentiation operator), and
must be retracted one position to its left.

Advancing forward requires that we first test whether we have reached the end of one of the
buffers; if so, we must reload the other buffer from the input and move forward to the
beginning of the newly loaded buffer. As long as we never need to look so far ahead of the
actual lexeme that the sum of the lexeme's length plus the distance we look ahead is greater
than N, we shall never overwrite the lexeme in its buffer before determining it.
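
The buffer scheme can be sketched as follows. This is a simplified illustration, not the implementation assumed by the text: the class name, the use of "\0" as the eof sentinel, and the bookkeeping by character counts are all assumptions made for the example. Each half is filled by a single read of N characters, and forward may be retracted without triggering another read.

```python
import io

EOF = "\0"   # assumed sentinel: a character that cannot occur in the source

class BufferPair:
    """Two N-character buffers that are reloaded alternately."""

    def __init__(self, stream, n=4096):
        self.stream, self.n = stream, n
        self.buf = [EOF] * (2 * n)   # halves: [0, n) and [n, 2n)
        self.loaded = 0              # total characters (or sentinels) loaded so far
        self.forward = 0             # total characters scanned so far
        self.reads = 0               # number of block reads issued

    def _load_next_half(self):
        """Fill the next half with one read; pad a short read with EOF."""
        chunk = self.stream.read(self.n)
        self.reads += 1
        start = self.loaded % (2 * self.n)
        for i in range(self.n):
            self.buf[start + i] = chunk[i] if i < len(chunk) else EOF
        self.loaded += self.n

    def next_char(self):
        """Return the next character, reloading a half only when needed."""
        if self.forward == self.loaded:          # reached the end of loaded input
            self._load_next_half()
        c = self.buf[self.forward % (2 * self.n)]
        if c != EOF:
            self.forward += 1
        return c

    def retract(self):
        """Move forward one position to the left (at most N positions in total)."""
        self.forward -= 1

buffers = BufferPair(io.StringIO("a ** b"), n=4)
chars = []
while (c := buffers.next_char()) != EOF:
    chars.append(c)
print("".join(chars), "read in", buffers.reads, "block reads")
```

With N = 4 the six input characters are delivered using two block reads instead of six single-character reads.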

3.3.3 A SIMPLE APPROACH TO THE DESIGN OF LEXICAL ANALYZERS


One way to begin the design of any program is to describe the behavior of the program by a flow
chart. This approach is particularly useful when the program is a lexical analyzer, because the
action taken is highly dependent on what characters have been seen recently. The specialized
kind of flow chart used for lexical analyzers is called a transition diagram. In a transition
diagram, the boxes of the flow chart are drawn as circles and called states. The states are
connected by arrows called edges. The labels on the edges leaving a state indicate the input
characters that can appear after that state.

Figure 3.9 Transition diagram for identifier

Figure 3.9 shows a transition diagram for an identifier, defined to be a letter followed by any
number of letters or digits. The starting state of the transition diagram is state 0, the edge
from which indicates that the first input character must be a letter. If this is the case, we
enter state 1 and look at the next input character. We continue in this way until the next
input character is a delimiter for an identifier, which we assume to be any character that is
not a letter or a digit. On reading the delimiter, we enter state 2.
To turn a transition diagram into a program, we write a segment of code for each state. The
first step in the code for any state is to obtain the next character from the input buffer. For
this, we use the function GETCHAR, which returns the next character, advancing the forward
pointer at each call. The next step is to determine which edge, if any, out of the state is
labeled by a character or class of characters that includes the character just read. If such an
edge is found, control is transferred to the state pointed to by that edge. If no such edge is
found, and the state is not one which indicates that a token has been found (indicated by a
double circle), we have failed to find this token. The forward pointer must be retracted to
where the beginning pointer is, and another token must be searched for, using another
transition diagram. If all the transition diagrams have been tried without success, a lexical
error has been detected, and an error correction routine must be called.

Consider the transition diagram given in Figure 3.9. The code for state 0 might be:
state 0 : C := GETCHAR();
          if LETTER(C) then goto state 1
          else FAIL()

Here LETTER is a procedure which returns true if and only if C is a letter. FAIL is a routine
which retracts the forward pointer and starts up the next transition diagram. The code for
state 1 is:

state 1 : C := GETCHAR();
          if LETTER(C) or DIGIT(C) then goto state 1
          else if DELIMITER(C) then goto state 2
          else FAIL()

DIGIT is a procedure which returns true if and only if C is one of the digits 0, 1, …, 9.
DELIMITER is a procedure which returns true whenever C is a character that is not a letter or
digit. State 2 indicates that an identifier has been found. Since the delimiter is not a part
of the identifier, we must retract the forward pointer one character.
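
The state-by-state code can be collected into one runnable routine. The Python sketch below follows the identifier diagram with an explicit state variable; the function name, the use of string indexing in place of GETCHAR, and the treatment of the end of input as a delimiter are assumptions made for the example.

```python
def recognize_identifier(text, begin=0):
    """Follow the identifier transition diagram starting at text[begin].

    Returns (lexeme, position after the lexeme), or None when state 0 fails.
    """
    forward = begin
    state = 0
    while True:
        # GETCHAR: read the next character and advance forward;
        # the end of the input is treated as a delimiter ("").
        c = text[forward] if forward < len(text) else ""
        forward += 1
        if state == 0:
            if c.isalpha():
                state = 1
            else:
                return None          # FAIL: the caller would try another diagram
        elif state == 1:
            if not c.isalnum():
                state = 2            # delimiter seen
        if state == 2:
            forward -= 1             # retract: the delimiter is not part of the lexeme
            return text[begin:forward], forward

print(recognize_identifier("num1+x"))
print(recognize_identifier("9lives"))
```

Note that `str.isalpha` and `str.isalnum` also accept non-ASCII letters and digits; a stricter LETTER and DIGIT test would be needed for a language restricted to ASCII.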

3.3.4 REGULAR EXPRESSIONS


The lexical analyzer uses regular expressions as a notation for describing the possible tokens
that can appear in the input stream. These regular expressions can be converted automatically
into finite automata, which are formal specifications of transition diagrams.
The notions used in defining regular expressions include:
Alphabets
An alphabet is a finite, nonempty set of symbols. Conventionally we use the symbol Σ for an
alphabet. Common alphabets include:

1. Σ = {0,1}, the binary alphabet

2. Σ = {a,b,…,z}, the set of all lower-case letters

3. The set of all ASCII characters, or the set of all printable ASCII characters.
Strings
A string (or word) is a finite sequence of symbols chosen from some alphabet. For example,
01101 is a string from the binary alphabet Σ = {0,1}. The string 111 is another string chosen
from this alphabet.

The Empty String
The empty string is the string with zero occurrences of symbols. It is denoted by ε.
Length of a string
The length of a string is the number of positions for symbols in the string. The standard
notation for the length of a string w is |w|. For example, |0110| = 4 and |ε| = 0.


Powers of an alphabet
The set of all strings of a certain length from an alphabet can be represented using
exponential notation. Let Σ^k be the set of strings of length k over Σ.
For example, if Σ = {0,1}, then Σ^1 = {0,1}, Σ^2 = {00,01,10,11}, and Σ^0 = {ε}.
The set of all strings over an alphabet Σ is denoted by Σ*. For example,
{0,1}* = {ε,0,1,00,01,10,11,000,…}. This can be written as:

Σ* = Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ …

Sometimes we want to exclude the empty string from the set of strings. The set of nonempty
strings over the alphabet Σ is denoted by Σ+. Thus,

Σ+ = Σ^1 ∪ Σ^2 ∪ Σ^3 ∪ …
Σ* = Σ+ ∪ {ε}
Concatenation of Strings
Let x and y be strings. Then xy or x.y denotes the concatenation of x and y, that is, the
string formed by making a copy of x and following it by a copy of y. For example, if x is the
string composed of i symbols, x = a1a2…ai, and y is the string composed of j symbols,
y = b1b2…bj, then xy is the string of length i+j: xy = a1a2…aib1b2…bj.
The empty string is the identity under concatenation; more formally, εx = xε = x for any
string x.
We can also take powers of strings: x^1 = x, x^2 = xx, x^3 = xxx and so on. In general, x^i is
the string x repeated i times. We define x^0 to be ε for any string x. Thus ε plays the role
that 1 plays in multiplication.
A prefix of a string s is any string obtained by removing zero or more symbols from the end of
s. For example, ban, banana, and ε are prefixes of banana. A suffix of a string s is any string
obtained by removing zero or more symbols from the beginning of s. For example, nana, banana,
and ε are suffixes of banana. A substring of s is obtained by deleting any prefix and any
suffix from s.
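
These definitions are easy to check on small cases. The following Python sketch uses ordinary strings, with "" standing for the empty string; the helper names power, prefixes and suffixes are made up for the illustration.

```python
from itertools import product

def power(alphabet, k):
    """All strings of length k over the alphabet, as a set."""
    return {"".join(symbols) for symbols in product(sorted(alphabet), repeat=k)}

def prefixes(s):
    """Strings obtained by removing zero or more symbols from the end of s."""
    return {s[:i] for i in range(len(s) + 1)}

def suffixes(s):
    """Strings obtained by removing zero or more symbols from the beginning of s."""
    return {s[i:] for i in range(len(s) + 1)}

sigma = {"0", "1"}
print(power(sigma, 0))              # the set containing only the empty string
print(sorted(power(sigma, 2)))
print("ban" in prefixes("banana"), "nana" in suffixes("banana"))
print(repr("ab" * 0), "ab" * 3)     # string powers: x^0 is the empty string
```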
Languages
A language is any countable set of strings over some fixed alphabet. Abstract languages like ∅,
the empty set, or {ε}, the set containing only the empty string, are languages under this
definition. So are the set of all syntactically well-formed C programs and the set of all
grammatically correct English sentences.
Concatenation can also be applied to languages. If L and M are two languages, then L.M, or just
LM, is the language consisting of all strings xy which can be formed by selecting a string x
from L and a string y from M and concatenating them in that order.
That is, LM = {xy | x is in L and y is in M}.
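
For finite languages the definition of LM translates directly into a set comprehension. The example languages L and M below are arbitrary choices for illustration.

```python
def concat(L, M):
    """LM = {xy | x is in L and y is in M}, for finite languages given as sets."""
    return {x + y for x in L for y in M}

L = {"a", "ab"}
M = {"", "b"}            # "" stands for the empty string
print(sorted(concat(L, M)))
```

Here the string ab arises twice (as a followed by b, and as ab followed by the empty string) but appears once in the result, since a language is a set.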
Definition of Regular Expression
A regular expression is built up from simpler regular expressions using a set of defining
rules. Regular expressions allow us to define the lexical units precisely. For example:

identifier = letter (letter | digit)*

With this notation we define an identifier as a string that begins with a letter and may be
followed by any number of letters or digits. The vertical bar means "or". The parentheses are
used to group subexpressions. The '*' means zero or more instances of the parenthesized
expression, and the juxtaposition of letter with the remainder of the expression means
concatenation.
Each regular expression r denotes a language L(r). The defining rules specify how L(r) is
formed by combining, in various ways, the languages denoted by the subexpressions of r.
1. ε is a regular expression that denotes {ε}, i.e., the set containing only the empty string.
2. If a is a symbol in Σ, then a is a regular expression that denotes {a}, i.e., the set
containing the string a.
3. If r and s are regular expressions denoting the languages L(r) and L(s), then
a. (r) | (s) is a regular expression denoting L(r) ∪ L(s).
b. (r)(s) is a regular expression denoting L(r)L(s).
c. (r)* is a regular expression denoting (L(r))*.

A language denoted by a regular expression is said to be a regular set. The unary operator *
has the highest precedence, followed by concatenation; | has the lowest precedence.

Example: let Σ = {a, b}.

1. The regular expression a|b denotes the set {a,b}.

2. The regular expression (a|b)(a|b) denotes {aa,ab,ba,bb}.

3. The regular expression a* denotes the set {ε, a, aa, aaa,…}.

4. The regular expression (a|b)* denotes the set of all strings containing zero or more
instances of a or b, i.e., the set of all strings of a's and b's.

5. The regular expression a|a*b denotes the language {a, b, ab, aab, aaab,…}, i.e., the string
a and all strings consisting of zero or more a's and ending in b.

We can give names to certain regular expressions and use those names in subsequent expressions,
as if the names were themselves symbols. Such a sequence of named expressions is called a
regular definition.
Example: C identifiers are strings of letters, digits, and underscores. Here is a regular
definition for the language of C identifiers, in which the underscore is treated as a letter:
letter → A | B | … | Z | a | b | … | z | _
digit → 0 | 1 | … | 9
id → letter (letter | digit)*
Example: Unsigned numbers (integer or floating point) are strings such as 5280, 0.01234,
6.336E4, or 1.89E-4. The regular definition is:
digit → 0 | 1 | … | 9
digits → digit digit*
optionalFraction → . digits | ε
optionalExponent → ( E ( + | - | ε ) digits ) | ε
number → digits optionalFraction optionalExponent
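
Regular definitions of this kind map closely onto the pattern syntax of practical regular-expression libraries. The sketch below transcribes the two definitions piece by piece into Python's `re` syntax; it assumes that letter includes the underscore (as C identifiers require), and the variable names and the helper matches are illustrative.

```python
import re

# Each name is defined in terms of the names before it, as in a regular definition.
letter = r"[A-Za-z_]"
digit = r"[0-9]"
ident = rf"{letter}({letter}|{digit})*"

digits = rf"{digit}{digit}*"
optional_fraction = rf"(\.{digits})?"          # . digits | ε
optional_exponent = rf"(E[+-]?{digits})?"      # ( E ( + | - | ε ) digits ) | ε
number = rf"{digits}{optional_fraction}{optional_exponent}"

def matches(pattern, text):
    """True if the whole of `text` is in the language denoted by `pattern`."""
    return re.fullmatch(pattern, text) is not None

for sample in ["5280", "0.01234", "6.336E4", "1.89E-4", "1.", ".5"]:
    print(sample, matches(number, sample))
```

The optional parts written as "| ε" in the definition become the `?` operator in the library syntax.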

3.4 FINITE AUTOMATA


Finite automata are recognizers. A recognizer for a language is a program that takes a string x
as input and answers "yes" if x is a sentence of the language and "no" otherwise. We convert a
regular expression into a recognizer by constructing a generalized transition diagram called a
finite automaton. A finite automaton can be deterministic or nondeterministic.
Nondeterministic means that more than one transition out of a state may be possible for the
same input symbol.
Both deterministic and nondeterministic finite automata are capable of recognizing precisely
the regular sets. Deterministic finite automata can lead to faster recognizers than
nondeterministic automata, but a deterministic automaton can be much bigger than an equivalent
nondeterministic automaton.

3.4.1 NONDETERMINISTIC FINITE AUTOMATA (NFA)

An NFA (Nondeterministic Finite Automaton) is a mathematical model that consists of
• A set of input symbols Σ, the input alphabet, such as {a, b, c, …}.
• A finite set of states S.
• A transition function that gives, for each state and for each symbol in Σ ∪ {ε}, a set of
next states.
• A state s0 that is distinguished as the start state or initial state.
• A set of states F that are distinguished as the accepting states or final states.

An NFA can be represented diagrammatically by a labeled directed graph, called a transition
graph, in which the nodes are the states and the labeled edges represent the transition
function. This graph looks like a transition diagram, but edges can be labeled by ε as well as
by characters, and the same character can label two or more transitions out of one state. One
state (0 in Figure 3.10) is distinguished as the start state, and one or more states may be
distinguished as accepting states (or final states). In Figure 3.10, state 3 is accepting, as
indicated by the double circle.

Figure 3.10 Nondeterministic finite automaton

1) Set of input symbols: {a, b}
2) Set of states: {0, 1, 2, 3}
3) Moves on input abb: 0 → 1 → 2 → 3
4) Start state: 0
5) Final state: 3

Transition Table

State   Input symbol a   Input symbol b
0       {0,1}            {0}
1       ∅                {2}
2       ∅                {3}
3       ∅                ∅

Figure 3.11 Transition Table

The transitions of an NFA can be conveniently represented in tabular form by means of a
transition table. The transition table for the NFA of Figure 3.10 is shown in Figure 3.11. In
the transition table there is a row for each state and a column for each input symbol and, if
necessary, for ε. The entry for row i and symbol a is the set of states that can be reached by
a transition from state i on input a. Figure 3.12 shows an NFA accepting L(aa*|bb*).

The transition table has the advantage that we can easily find the transitions on a given state
and input. Its disadvantage is that it takes a lot of space when the input alphabet is large
and most transitions are to the empty set.

In an NFA, we can choose either an ε-transition or a transition on an alphabet character, and
if there are several transitions with the same symbol, we can choose between these. This makes
the automaton nondeterministic, as the choice of action is not determined solely by looking at
the current state and input. It may be that some choices lead to an accepting state while
others do not. This does not, however, mean that the string is sometimes in the language and
sometimes not: we include a string in the language if it is possible to make a sequence of
choices that makes the string lead to an accepting state.

Acceptance of Input Strings by Automata

An NFA accepts an input string x if and only if there is some path in the transition graph from
the start state to one of the accepting states, such that the symbols along the path spell out
x. Note that ε labels along the path are effectively ignored, since the empty string does not
contribute to the string constructed along the path.

Figure 3.12 NFA accepting aa* | bb*

Example: The string aabb is accepted by the NFA of Figure 3.10. The path labeled by aabb from
state 0 to state 3 is:

0 -a-> 0 -a-> 1 -b-> 2 -b-> 3

The language defined (or accepted) by an NFA is the set of input strings it accepts.

Figure 3.12 is an NFA accepting L(aa* | bb*). The string aaa is accepted because there is a
path from the start state to an accepting state whose labels, ignoring ε, spell out aaa.
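
Acceptance by an NFA can be checked mechanically by following all possible paths at once, that is, by keeping the set of states reachable after each input symbol. The sketch below encodes the transition table of Figure 3.11 as a Python dictionary (missing entries stand for the empty set); the names NFA_MOVES and nfa_accepts are illustrative, and ε-transitions are not handled because this NFA has none.

```python
# Transition table of the NFA for (a|b)*abb; missing entries mean the empty set.
NFA_MOVES = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START, ACCEPTING = 0, {3}

def nfa_accepts(x):
    """True if some path from START spelling x ends in an accepting state."""
    current = {START}
    for symbol in x:
        # Follow every transition on `symbol` from every state we might be in.
        current = {t for s in current for t in NFA_MOVES.get((s, symbol), set())}
    return bool(current & ACCEPTING)

print(nfa_accepts("aabb"), nfa_accepts("abab"))
```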

3.4.2 DETERMINISTIC FINITE AUTOMATA (DFA)

A DFA (Deterministic Finite Automaton) D consists of an alphabet Σ, a set of states S, a
transition function T: S × Σ → S, a start state s0 ∈ S, and a set of accepting states A ⊆ S.
The language accepted by D is denoted by L(D). A DFA is a special case of an NFA in which:

1. There are no moves on input ε.

2. For each state s and input symbol a, there is exactly one edge out of s labeled a.

If we use a transition table to represent the transition function of a DFA, then each entry in
the transition table is a single state. As a consequence, it is very easy to determine whether
a DFA accepts an input string, since there is at most one path from the start state labeled by
that string. The following algorithm shows how to apply a DFA to a string.

Algorithm 3.2: Simulating a DFA

INPUT: An input string x terminated by an end-of-file character eof. A DFA D with start state
s0, accepting states F, and transition function move.
OUTPUT: Answer "yes" if D accepts x; "no" otherwise.
METHOD: Apply the algorithm below to the input string x. The function move(s, c) gives the
state to which there is an edge from state s on input c. The function nextChar returns the next
character of the input string x.

s = s0;
c = nextChar();
while ( c != eof )
{
    s = move(s, c);
    c = nextChar();
}
if ( s is in F ) return "yes";
else return "no";

Figure 3.13: DFA accepting (a|b)*abb

Example: Figure 3.13 shows the transition graph of a DFA accepting the language (a|b)*abb, the
same language as that accepted by the NFA of Figure 3.10. Given the input string ababb, this
DFA enters the sequence of states 0, 1, 2, 1, 2, 3 and returns "yes".
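
The simulation algorithm can be tried out in Python. The transition table below is an assumption: it is one DFA for (a|b)*abb whose behaviour is consistent with the state sequence quoted in the example, since the edges of Figure 3.13 are not listed in the text. The end of the Python string plays the role of eof.

```python
# Assumed transition table for a DFA accepting (a|b)*abb.
MOVE = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
S0, F = 0, {3}

def simulate_dfa(x):
    """Return (answer, sequence of states visited) for input string x."""
    s = S0
    visited = [s]
    for c in x:                  # one table lookup per input character
        s = MOVE[(s, c)]
        visited.append(s)
    return ("yes" if s in F else "no"), visited

print(simulate_dfa("ababb"))
```

Because each table entry is a single state, the loop does a constant amount of work per input character.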

3.5 CONSTRUCTION OF AN NFA FROM A REGULAR EXPRESSION
There are many strategies for building a recognizer from a regular expression. One strategy,
used in a number of text-editing programs, is to construct an NFA from the regular expression.
If run-time speed is essential, we then convert the NFA into a DFA using the subset
construction. To construct an NFA from a regular expression, we break the regular expression
into subexpressions, construct an NFA fragment for each, and then combine these fragments. A
fragment is not a complete NFA, so we complete the construction by adding the components
necessary to make a complete NFA. The algorithm is syntax-directed in that it uses the
syntactic structure of the regular expression to guide the construction process.

THOMPSON’S CONSTRUCTION

To construct an NFA from a regular expression r, we decompose r into subexpressions and
construct an NFA for each of the basic symbols in r.
Input: A regular expression r over an alphabet Σ
Output: An NFA N accepting the language L(r)
Method: First, we decompose r into subexpressions. Then, using rules 1 and 2 below, we
construct NFAs for each of the basic symbols in r. Finally, guided by the syntactic structure
of r, we combine these NFAs using rules 3 to 5 until we obtain the NFA for the entire
expression.

Each intermediate NFA produced during the course of the construction corresponds to a
subexpression of r and has several important properties:

• It has exactly one final state.
• No edge enters the start state and no edge leaves the final state.

1. For ε in r, we construct an NFA with a start state and a final state joined by a single edge
labeled ε.

2. For a in r, we construct an NFA with a start state and a final state joined by a single edge
labeled a.


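3. For a | b, we add a new start state with ε-edges to the start states of the NFAs for a and
b, and a new final state with ε-edges from their final states.

4. For ab, we merge the final state of the NFA for a with the start state of the NFA for b.

5. For a*, we add a new start state i and a new final state f, with ε-edges from i to the start
state of the NFA for a and to f, and from the final state of the NFA for a back to its start
state and to f.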


Construct an NFA for the regular expression (a | b)* abb

Construct an NFA for the regular expression a*b*

Construct an NFA for the regular expression aba*b
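
The five construction rules can be sketched as a small Python class in which each rule returns a fragment (start state, final state). This is an illustrative sketch, not the notes' own algorithm text: the class and method names are invented, ε is represented by None, and the regular expression is combined by explicit method calls rather than parsed from a string. A helper that simulates the resulting NFA is included so the result can be checked.

```python
from collections import defaultdict

class Thompson:
    """Builds NFA fragments; the label None stands for an ε-transition."""

    def __init__(self):
        self.edges = defaultdict(set)   # (state, label) -> set of next states
        self.count = 0

    def _new_state(self):
        self.count += 1
        return self.count - 1

    def symbol(self, a):
        """Rules 1 and 2: one edge labeled a (or ε when a is None)."""
        i, f = self._new_state(), self._new_state()
        self.edges[(i, a)].add(f)
        return i, f

    def union(self, s, t):
        """Rule 3, s|t: new start and final states joined by ε-edges."""
        i, f = self._new_state(), self._new_state()
        self.edges[(i, None)] |= {s[0], t[0]}
        self.edges[(s[1], None)].add(f)
        self.edges[(t[1], None)].add(f)
        return i, f

    def concat(self, s, t):
        """Rule 4, st: merge the final state of s with the start state of t."""
        for (state, label) in list(self.edges):
            if state == t[0]:
                self.edges[(s[1], label)] |= self.edges.pop((state, label))
        return s[0], t[1]

    def star(self, s):
        """Rule 5, s*: ε-edges allow skipping s or repeating it."""
        i, f = self._new_state(), self._new_state()
        self.edges[(i, None)] |= {s[0], f}
        self.edges[(s[1], None)] |= {s[0], f}
        return i, f

def accepts(builder, nfa, x):
    """Simulate the NFA fragment `nfa` on x, following ε-edges freely."""
    start, final = nfa

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            for t in builder.edges.get((stack.pop(), None), ()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    current = closure({start})
    for c in x:
        current = closure({t for s in current for t in builder.edges.get((s, c), ())})
    return final in current

builder = Thompson()
a_or_b_star = builder.star(builder.union(builder.symbol("a"), builder.symbol("b")))
nfa = a_or_b_star
for ch in "abb":                                  # (a|b)* followed by a, b, b
    nfa = builder.concat(nfa, builder.symbol(ch))
print(accepts(builder, nfa, "aabb"), accepts(builder, nfa, "ab"))
```

Merging states in `concat` relies on the property stated above: no edge enters the start state of a fragment, so only its outgoing edges need to be moved.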


3.6 CONVERSION OF AN NFA TO A DFA


Since an NFA often has a choice of moves on an input symbol or on ε, or even a choice between
making a transition on ε and on a real input symbol, its simulation is less straightforward
than that of a DFA. Thus it is often important to convert an NFA to a DFA that accepts the same
language.

To convert an NFA to a DFA we use a technique called the 'subset construction'. The general
idea behind the subset construction is that each state of the constructed DFA corresponds to a
set of NFA states. After reading input a1a2…an, the DFA is in the state which corresponds to
the set of states that the NFA can reach from its start state by following paths labeled
a1a2…an. In the worst case the number of DFA states can be exponential in the number of NFA
states, but for the languages that arise in lexical analysis the NFA and DFA usually have
approximately the same number of states.

The algorithm constructs a transition table Dtran for the DFA D. Each state of D is a set of
NFA states. The algorithm uses the following operations:
• ε-closure(s): the set of NFA states reachable from NFA state s on ε-transitions alone.
• ε-closure(T): the set of NFA states reachable from some NFA state s in the set T on
ε-transitions alone.
• move(T, a): the set of NFA states to which there is a transition on input symbol a from some
state s in T.

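ALGORITHM: The subset construction of a DFA from an NFA

INPUT: An NFA N.
OUTPUT: A DFA D accepting the same language as N.
METHOD: The start state of D is ε-closure(s0), where s0 is the start state of N. For each state
T of D and each input symbol a, the algorithm computes U = ε-closure(move(T, a)), adds U to the
states of D if it is not already present, and sets Dtran[T, a] = U. This is repeated until no
new states are added. A state of D is accepting if it contains at least one accepting state
of N.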

Example:

Construct a DFA from the NFA shown in Figure 3.14.

Figure 3.14 NFA N for (a|b)*abb

Here, the start state of this NFA is state 0. We have to find ε-closure(start state) in order
to find the start state of the DFA, D.
1) ε-closure(0) = {0,1,2,4,7} = A
2) move(A, a) = {3,8}; ε-closure({3,8}) = {1,2,3,4,6,7,8} = B
3) move(A, b) = {5}; ε-closure({5}) = {1,2,4,5,6,7} = C
4) move(B, a) = {3,8}; ε-closure({3,8}) = B
5) move(B, b) = {5,9}; ε-closure({5,9}) = {1,2,4,5,6,7,9} = D
6) move(C, a) = {3,8}; ε-closure({3,8}) = B
7) move(C, b) = {5}; ε-closure({5}) = C
8) move(D, a) = {3,8}; ε-closure({3,8}) = B
9) move(D, b) = {5,10}; ε-closure({5,10}) = {1,2,4,5,6,7,10} = E
10) move(E, a) = {3,8}; ε-closure({3,8}) = B
11) move(E, b) = {5}; ε-closure({5}) = C

Transition Table:

List of NFA states     DFA state   a   b
{0,1,2,4,7}            A           B   C
{1,2,3,4,6,7,8}        B           B   D
{1,2,4,5,6,7}          C           B   C
{1,2,4,5,6,7,9}        D           B   E
{1,2,4,5,6,7,10}       E           B   C

The corresponding DFA has the five states A to E, with start state A; E, the state reached from
A on input abb, is the accepting state.
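
The whole computation can be reproduced with a short Python sketch of the subset construction. The edge list below is an assumption: Figure 3.14 is not reproduced in the text, so the sketch uses the usual Thompson-style NFA for (a|b)*abb with states 0 to 10, which is consistent with the move sets listed in the example. ε is represented by None, and the function names are illustrative.

```python
EPS = None
# Assumed edges of NFA N for (a|b)*abb (states 0..10, accepting state 10).
NFA = {
    (0, EPS): {1, 7}, (1, EPS): {2, 4}, (2, "a"): {3}, (4, "b"): {5},
    (3, EPS): {6}, (5, EPS): {6}, (6, EPS): {1, 7},
    (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10},
}

def eps_closure(states):
    """NFA states reachable from `states` on ε-transitions alone."""
    stack, closure = list(states), set(states)
    while stack:
        for t in NFA.get((stack.pop(), EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def move(T, a):
    """NFA states reachable on input a from some state in T."""
    return {t for s in T for t in NFA.get((s, a), ())}

def subset_construction(start, alphabet):
    start_state = eps_closure({start})
    dstates, unmarked, dtran = [start_state], [start_state], {}
    while unmarked:
        T = unmarked.pop(0)                  # mark T
        for a in alphabet:
            U = eps_closure(move(T, a))
            if U not in dstates:             # a new DFA state
                dstates.append(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    return dstates, dtran

dstates, dtran = subset_construction(0, "ab")
names = {T: "ABCDE"[i] for i, T in enumerate(dstates)}
for T in dstates:
    print(names[T], sorted(T), names[dtran[(T, "a")]], names[dtran[(T, "b")]])
```

Each DFA state is stored as a frozenset so that it can be used as a dictionary key; the names A to E are assigned in order of discovery.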
