Module 3

MODULE III MCA-303 SYSTEM SOFTWARE ADMN 2011-‘12
3.1 INTRODUCTION
Since computer hardware can understand only machine-level instructions, a program written
in a high-level language must be converted to machine instructions before the computer can
execute it. This job is carried out by the compiler. A compiler can therefore be defined as a
computer program that translates a program in a source language into an equivalent program
in a target language. This is shown in figure 3.1.
Figure 3.1: A compiler
A source program/code is a program/code written in the source language, which is
usually a high-level language such as C or C++. A target program/code is a program/code
written in the target language, which is often a machine language or an intermediate code. An
important role of the compiler is to report any errors in the source program that it detects during
the translation process.
If the target program is an executable machine-language program, it can then be
called by the user to process inputs and produce outputs; see Figure 3.2.
Figure 3.2 Running the target program
The first FORTRAN compiler took 18 man-years to implement. A compiler can
translate only those source programs which have been written in the language for which the
compiler is meant. For example, a FORTRAN compiler is only capable of translating source
programs which have been written in FORTRAN. Therefore, each machine requires a separate
compiler for each high-level language.
A compiler cannot diagnose logical errors. It can only diagnose errors that violate the
rules of the language, such as grammatical (syntactical) errors. For example, if the programmer
has wrongly typed -25 as the age of a person when +25 was intended, the compiler cannot
diagnose this. Programs containing
Dept.of Computer Science And Applications, SJCET, Palai P a g e | 60

such errors will be successfully compiled and the object code will be obtained without any error
message. But, such programs when executed will not produce the right answers.
3.2 STRUCTURE OF A COMPILER

A compiler takes as input a source program and produces as output an equivalent
sequence of machine instructions. The compilation process is too complex to be treated as a
single step, so it is partitioned into a series of subprocesses called phases (as shown in
figure 3.3), each performing one specific task. A phase is a logically cohesive operation that
takes as input one representation of the source program and produces as output another
representation.
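The complete compilation procedure can be divided into six phases, and these
phases can be grouped into two parts, as follows:
1. Analysis: This part analyses the source program and produces an intermediate
representation of it. Analysis is done in three phases:
a) Lexical Analysis
b) Syntax Analysis
c) Semantic Analysis
2. Synthesis: This part builds the target program from the intermediate representation. It
covers the remaining three phases of figure 3.3:
a) Intermediate Code Generation
b) Code Optimization
c) Code Generation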
3.2.1 Lexical Analyzer or Scanner

This module separates the characters of the source language into groups that
logically belong together; these groups are called tokens. A token may be composed of a single
character or a sequence of characters.
Examples of tokens are keywords such as IF and DO, identifiers such as X or NUM,
operator symbols such as <= or +, and punctuation symbols such as parentheses or commas. The
output of the lexical analyzer is a stream of tokens, which is passed to the next phase.
As an example, consider the following line of code:
sum = old_sum + value/100;
Leaving aside the terminating semicolon, this code contains 7 tokens:

sum        identifier
=          assignment operator
old_sum    identifier
+          addition operator
value      identifier
/          division operator
100        integer constant

Figure 3.3: Phases of a compiler (Lexical Analysis, Syntax Analysis, Semantic Analysis,
Intermediate Code Generation, Code Optimization, Code Generation; Table Management and
Error Handling interact with every phase)
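The grouping of characters into tokens described above can be sketched with Python's standard `re` module. This is only an illustrative sketch: the token class names, the `TOKEN_SPEC` table and the `tokenize` function are assumptions made for this example, and a real scanner would recognize many more token classes.

```python
import re

# Token classes for this sketch only (hypothetical names, not from the notes).
TOKEN_SPEC = [
    ("integer constant", r"\d+"),
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("assignment operator", r"="),
    ("addition operator", r"\+"),
    ("division operator", r"/"),
    ("semicolon", r";"),
    ("whitespace", r"\s+"),
]

# One alternation with a named group per token class: g0, g1, ...
MASTER = re.compile(
    "|".join(f"(?P<g{i}>{pattern})" for i, (_, pattern) in enumerate(TOKEN_SPEC))
)


def tokenize(source):
    """Return a list of (lexeme, token class) pairs, skipping white space."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"lexical error at position {pos}")
        kind = TOKEN_SPEC[int(match.lastgroup[1:])][0]
        if kind != "whitespace":
            tokens.append((match.group(), kind))
        pos = match.end()
    return tokens


for lexeme, kind in tokenize("sum = old_sum+ value/100;"):
    print(f"{lexeme:10}{kind}")
```

The sketch reports the semicolon as an eighth token; dropping it gives the seven tokens listed above.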
3.2.2 Syntax Analyzer(Parser)

The syntax analyzer takes tokens from the lexical analyzer and performs syntax analysis,
which determines the structure of the program. This is similar to performing grammatical
analysis on a sentence in a natural language. Syntax analysis determines the structural elements
of the program as well as their relationships. The result of syntax analysis is usually represented
as a parse tree or a syntax tree.
For example, the line of code a=b+i can be represented as a parse tree as follows:
Figure 3.4 A parse tree
3.2.3 Semantic Analyzer

The semantic analyzer gathers type information and checks the tree produced by the
syntax analyzer for semantic errors. Consider the following statement, where rate is a float:
position = initial + rate*60;
Here the semantic analyzer might add a type conversion node, say inttoreal, to the syntax
tree to convert the integer 60 to a real quantity. The output of the semantic analyzer is
shown in figure 3.5.
3.2.4 Intermediate Code Generation

The intermediate code generator uses the structure produced by the syntax
analyzer to create a stream of simple instructions. Many styles of intermediate code are possible.
Syntax trees are one form of intermediate representation. An intermediate representation should
have two properties: it should be easy to produce and it should be easy to translate into the
target machine code. Another form is three-address code. For the assignment
position = initial + rate*60, with rate a real quantity, it could be:
temp1 = inttoreal(60)
temp2 = rate * temp1
temp3 = initial + temp2
position = temp3

Figure 3.5: The output of semantic analysis
3.2.5 Code Optimization

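Code optimization is an optional phase designed to improve the intermediate code
so that the ultimate object program runs faster and/or takes less space. Its output is another
intermediate code program that does the same job as the original, but perhaps in a way that
saves time and/or space. For example, the three-address code for position = initial + rate*60
can be shortened by converting 60 to 60.0 at compile time and removing the extra temporaries:
temp1 = rate * 60.0
position = initial + temp1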
3.2.6 Code Generation

The final phase of a compiler is code generation. It takes as input an intermediate
representation of the source program and maps it into the target language. If the target language
is machine code, registers or memory locations are selected for each of the variables used by the
program. Some compilers can generate target code for several machines, while others generate
target code only for a specific machine. In the code below, id1, id2 and id3 stand for
position, initial and rate respectively:
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
3.2.7 Table Management

The table management, or bookkeeping, portion of the compiler keeps track of the
names used by the program and records essential information about each, such as its type
(integer, real etc.). The data structure used to record this information is called a symbol table.

3.2.8 Error Handler

The error handler is invoked when a flaw in the source program is detected. It must
warn the programmer by issuing a diagnostic, and adjust the information being passed from
phase to phase so that each phase can proceed. It is desirable that compilation be completed on
flawed programs, at least through the syntax analysis phase, so that as many errors as possible
can be detected in one compilation. Both the table management and the error handling routines
interact with all phases of the compiler.
x=y+z;
p=q+t
else
x=y–z;
p=q*t
endif
3.3 LEXICAL ANALYSIS

Lexical analysis is the process of analyzing the lexical units in a source string. The
function of the lexical analyzer is to read the source program, one character at a time, and to
translate it into a sequence of primitive units called tokens. Keywords, identifiers, constants and
operators are examples of tokens.
First, we use regular expressions as the notation to describe essentially all the
tokens of programming languages. Second, having decided what the tokens are, we need some
mechanism to recognize these tokens in the input stream. Transition diagrams and finite
automata are convenient ways of designing token recognizers.
One advantage of using regular expressions to specify tokens is that from a regular
expression we can automatically construct a recognizer for the tokens denoted by that regular
expression.

3.3.1 THE ROLE OF LEXICAL ANALYZER
Figure 3.7: Interactions between the lexical analyzer and the parser
The main task of the lexical analyzer is to read the input characters of the source
program, group them into lexemes, and produce as output a token for each lexeme in the
source program. The stream of tokens is then sent to the parser for syntax analysis.
The syntax analyzer is actually the master program. It sends a request to the lexical analyzer
for the next token. The lexical analyzer reads the input string, converts it into a token
and passes the result to the syntax analyzer. The syntax analyzer does the grammar check on
the token and then sends the next request to the lexical analyzer. This process is repeated
until the entire input string is consumed.
Lexical analysis is needed to simplify the overall design of the compiler. It is
easier to specify the structure of tokens than the syntactic structure of the source
program. Therefore we can construct a more specialized, and hence more efficient, recognizer
for tokens than for syntactic structure.
Another function of the lexical analyzer is stripping out comments and white space
(blank, newline, and tab). Also, if the program contains erroneous input,
the lexical analyzer will correlate that error with the source file and the line number.
The lexical analyzer is kept as an independent module for the following reasons:
• The overall design of the compiler is simplified.
• Compiler efficiency is improved.
• Compiler portability is enhanced.

3.3.2 INPUT BUFFERING

Because of the amount of time taken to process characters and the large
number of characters that must be processed during the compilation of a large source program,
specialized buffering techniques have been developed to reduce the amount of overhead required
to process a single input character. Also, the lexical analyzer sometimes needs to look ahead
some symbols to decide which token to return. For example, in C, single-character operators
like -, =, or < could also be the beginning of a two-character operator like ->, ==, or <=.
Thus we use an input buffer to handle this situation.
Figure 3.8 Input Buffer

The scheme uses two buffers that are reloaded alternately. Each buffer is of the same
size N, and N is usually the size of a disk block, e.g., 4096 bytes. Using one system read
command we can read N characters into a buffer, rather than using one system call per
character. If fewer than N characters remain in the input file, then a special character,
represented by eof, marks the end of the source file and is different from any possible
character of the source program.
Two pointers to the input are maintained:
• Pointer lexemeBegin marks the beginning of the current lexeme, whose extent we
are attempting to determine.
• Pointer forward scans ahead until a pattern match is found.
Once the next lexeme is determined, forward is set to the character at its right
end. Then, after the lexeme is recorded as an attribute value of a token returned to the parser,
lexemeBegin is set to the character immediately after the lexeme just found. In Fig. 3.8, we see
that forward has passed the end of the next lexeme, ** (the FORTRAN exponentiation operator),
and must be retracted one position to its left.
Advancing forward requires that we first test whether we have reached the end
of one of the buffers, and if so, we must reload the other buffer from the input, and move
forward to the beginning of the newly loaded buffer. As long as we never need to look so far
ahead of the actual lexeme that the sum of the lexeme's length plus the distance we look ahead
is greater than N, we shall never overwrite the lexeme in its buffer before determining it.
3.3.3 A SIMPLE APPROACH TO THE DESIGN OF LEXICAL ANALYZERS

One way to begin the design of any program is to describe the behavior of the
program by a flow chart. This approach is particularly useful when the program is a lexical
analyzer, because the action taken is highly dependent on what characters have been seen
recently. The specialized kind of flow chart used for lexical analyzers is called a transition
diagram. In a transition diagram, the boxes of the flow chart are drawn as circles and called
states. The states are connected by arrows called edges. The labels on the various edges leaving
a state indicate the input characters that can appear after that state.
Figure 3.9 Transition diagram for identifier
Figure 3.9 shows a transition diagram for an identifier, defined to be a letter
followed by any number of letters or digits. The starting state of the transition diagram is
state 0, the edge from which indicates that the first input character must be a letter. If this is
the case, we enter state 1 and look at the input character after that. We continue in this way
until the next input character is a delimiter for an identifier, which we assume to be any
character that is not a letter or a digit. On reading the delimiter, we enter state 2.
To turn a transition diagram into a program, we can write a segment of
code for each state. The first step to be done in the code for any state is to obtain the next
character from the input buffer. For this, we use the GETCHAR function, which returns the next
character, advancing the forward pointer at each call. The next step is to determine which edge,
if any, out of the state is labeled by a character or class of characters that includes the
character just read. If such an edge is found, control is transferred to the state pointed to by
that edge. If no such edge is found, and the state is not one which indicates that a token has
been found (indicated by a double circle), we have failed to find this token. The forward pointer
must be retracted to where the beginning pointer is, and another token must be searched for,
using another transition diagram. If all the transition diagrams have been tried without
success, a lexical error has been detected, and an error correction routine must be called.
Consider the transition diagram given in figure 3.9. The code for state 0 might be:
state 0 : C := GETCHAR();
          if LETTER(C) then goto state 1
          else FAIL()
Here LETTER is a procedure which returns true if and only if C is a letter.
FAIL is a routine which retracts the forward pointer and starts up the next transition diagram.
The code for state 1 is:
state 1 : C := GETCHAR();
          if LETTER(C) or DIGIT(C) then goto state 1
          else if DELIMITER(C) then goto state 2
          else FAIL()
DIGIT is a procedure which returns true if and only if C is one of the digits
0,1,…,9. DELIMITER is a procedure which returns true whenever C is a character that is not a
letter or digit. State 2 indicates that an identifier has been found. Since the delimiter is not
a part of the identifier, we must retract the forward pointer one character.
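The state-by-state code above can be sketched as a small Python function. This is a hedged illustration, not the notes' own implementation: `scan_identifier`, its parameters and the use of `str.isalpha`/`str.isdigit` in place of LETTER and DIGIT are assumptions of the example, and the empty string stands in for end of input.

```python
def scan_identifier(text, begin=0):
    """Follow the identifier transition diagram starting at text[begin].

    Returns (lexeme, position after the lexeme), or None when the diagram
    fails. The names and the end-of-input convention are illustrative.
    """
    forward = begin
    state = 0
    while True:
        # GETCHAR: read the next character and advance the forward pointer.
        c = text[forward] if forward < len(text) else ""  # "" = end of input
        forward += 1
        if state == 0:
            if c.isalpha():          # LETTER(C)
                state = 1
            else:
                return None          # FAIL(): no identifier starts here
        else:  # state == 1
            if c.isalpha() or c.isdigit():
                state = 1            # stay in state 1
            else:
                # Delimiter seen: this is state 2. The delimiter is not part
                # of the identifier, so retract the forward pointer by one.
                forward -= 1
                return text[begin:forward], forward


print(scan_identifier("num1+2"))   # ('num1', 4)
print(scan_identifier("9x"))       # None
```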
3.3.4 REGULAR EXPRESSIONS

The lexical analyzer uses regular expressions to describe the possible tokens
that can appear in the input stream, so a regular expression serves as a notation for describing
tokens. These regular expressions can be converted automatically into finite automata, which
are formal specifications of transition diagrams.
The notations used in regular expressions include:
Alphabets
An alphabet is a finite, nonempty set of symbols. Conventionally we use the
symbol Σ for an alphabet. Common alphabets include:
1. Σ = {0,1}, the binary alphabet
2. Σ = {a,b,…,z}, the set of all lower-case letters
3. The set of all ASCII characters, or the set of all printable ASCII characters.
Strings
A string (or word) is a finite sequence of symbols chosen from some alphabet.
For example, 01101 is a string from the binary alphabet Σ = {0,1}. The string 111 is another
string chosen from this alphabet.
The Empty String
The empty string is the string with zero occurrences of symbols. It is denoted
by ε.
Length of a string
It is the number of positions for symbols in the string. The standard notation
for the length of a string w is |w|. For example, |0110| = 4 and |ε| = 0.

Powers of an alphabet
The set of all strings of a certain length from an alphabet can be represented
using exponential notation. We define Σ^k to be the set of strings of length k over Σ.
For example, if Σ = {0,1}, then Σ^1 = {0,1}, Σ^2 = {00,01,10,11} and Σ^0 = {ε}.
The set of all strings over an alphabet Σ is denoted by Σ*. For example,
{0,1}* = {ε,0,1,00,01,10,11,000,…}. This can be written as:
Σ* = Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ …
Sometimes we want to exclude the empty string from the set of strings. The set of nonempty
strings over the alphabet Σ is denoted by Σ+. Thus,
Σ+ = Σ^1 ∪ Σ^2 ∪ Σ^3 ∪ …
Σ* = Σ+ ∪ {ε}
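The powers of an alphabet can be enumerated directly for small k. The sketch below uses `itertools.product` from the Python standard library; the function names `power` and `strings_up_to` are assumptions of this example, and the empty Python string plays the role of the empty string.

```python
from itertools import product


def power(sigma, k):
    """Return the set of all strings of length k over the alphabet sigma."""
    return {"".join(symbols) for symbols in product(sorted(sigma), repeat=k)}


def strings_up_to(sigma, n):
    """Finite part of the set of all strings: the union of powers 0 to n."""
    result = set()
    for k in range(n + 1):
        result |= power(sigma, k)
    return result


binary = {"0", "1"}
print(sorted(power(binary, 2)))          # ['00', '01', '10', '11']
print(power(binary, 0))                  # {''}  (only the empty string)
print(len(strings_up_to(binary, 3)))     # 15 = 1 + 2 + 4 + 8
```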
Concatenation of Strings
Let x and y be strings. Then xy or x.y denotes the concatenation of x and y,
that is, the string formed by making a copy of x and following it by a copy of y. For example,
if x is the string composed of i symbols, x = a1a2…ai, and y is the string composed of j
symbols, y = b1b2…bj, then xy is the string of length i+j: xy = a1a2…aib1b2…bj.
The empty string is the identity under concatenation; more formally, εx = xε = x.
We can also take powers of strings, i.e., x^1 = x, x^2 = xx, x^3 = xxx and so on. In
general, x^i is the string x repeated i times. We define x^0 to be ε for any string x. Thus,
ε plays the role that 1 plays in multiplication.
A prefix of a string s is any string obtained by removing zero or more symbols
from the end of s. For example, ban, banana, and ε are prefixes of banana. A suffix of a
string s is any string obtained by removing zero or more symbols from the beginning of s. For
example, nana, banana, and ε are suffixes of banana. A substring of s is obtained by deleting
any prefix and any suffix from s.
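These string operations map directly onto Python string slicing and repetition. The helper names below are assumptions made for this small sketch.

```python
def prefixes(s):
    """All strings obtained by removing zero or more symbols from the end."""
    return {s[:i] for i in range(len(s) + 1)}


def suffixes(s):
    """All strings obtained by removing zero or more symbols from the start."""
    return {s[i:] for i in range(len(s) + 1)}


def substrings(s):
    """All strings obtained by deleting a prefix and a suffix."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}


x = "ab"
print(x * 3)                               # ababab, i.e. x repeated 3 times
print(x * 0 == "")                         # True: the zeroth power is empty
print("ban" in prefixes("banana"))         # True
print("nana" in suffixes("banana"))        # True
print("nan" in substrings("banana"))       # True
```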
Languages
A language is any countable set of strings over some fixed alphabet. Abstract
languages like ∅, the empty set, or {ε}, the set containing only the empty string, are languages
under this definition. So too are the set of all syntactically well-formed C programs
and the set of all grammatically correct English sentences.
Concatenation can also be applied to languages. If L and M are two
languages, then L.M, or just LM, is the language consisting of all strings xy which can be
formed by selecting a string x from L, a string y from M, and concatenating them in that order.
That is, LM = { xy | x is in L and y is in M }
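For finite languages the definition of LM can be tried out directly with Python sets. The function name `concat` and the sample languages are assumptions of this sketch.

```python
def concat(L, M):
    """Concatenation of two finite languages given as sets of strings."""
    return {x + y for x in L for y in M}


L = {"a", "ab"}
M = {"0", "1"}
print(sorted(concat(L, M)))      # ['a0', 'a1', 'ab0', 'ab1']
print(concat(L, {""}) == L)      # True: the language {""} acts as identity
print(concat(L, set()))          # set(): the empty language yields nothing
```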
Definition of Regular Expression

A regular expression is built up from simpler regular expressions using a set of
defining rules. Regular expressions allow us to define the lexical units precisely. For example:
identifier = letter (letter | digit)*
With this notation we define an identifier as a string that begins with a letter and
may continue with any number of letters or digits. The vertical bar means "or". The parentheses
are used to group subexpressions. The "*" means zero or more instances of the parenthesized
expression, and the juxtaposition of letter with the remainder of the expression means
concatenation.
Each regular expression r denotes a language L(r). The defining rules specify
how L(r) is formed by combining, in various ways, the languages denoted by the subexpressions
of r.
1. ε is a regular expression that denotes {ε}, i.e., the set containing only the empty string.
2. If a is a symbol in Σ, then a is a regular expression that denotes {a}, i.e., the set
containing the string a.
3. If r and s are regular expressions denoting the languages L(r) and L(s), then
a. (r) | (s) is a regular expression denoting L(r) ∪ L(s).
b. (r)(s) is a regular expression denoting L(r)L(s).
c. (r)* is a regular expression denoting (L(r))*.
A language denoted by a regular expression is said to be a regular set. The unary
operator * has the highest precedence, concatenation comes next, and | has the lowest precedence.
Example: let Σ = {a, b}.
1. The regular expression a|b denotes the set {a, b}.
2. The regular expression (a|b)(a|b) denotes {aa, ab, ba, bb}.
3. The regular expression a* denotes the set {ε, a, aa, aaa, …}.
4. The regular expression (a|b)* denotes the set of all strings containing zero or more
instances of a or b, i.e., the set of all strings of a's and b's.
5. The regular expression a|a*b denotes the language {a, b, ab, aab, aaab, …}, i.e., the
string a and all strings consisting of zero or more a's and ending in b.
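The examples above can be checked mechanically, because Python's standard `re` module uses the same symbols for alternation, grouping and closure, with the same precedence. The helper name `denotes` is an assumption of this sketch; `re.fullmatch` is used so that the whole string, not just a part of it, must belong to the language.

```python
import re


def denotes(regex, s):
    """True if the string s belongs to the language denoted by regex."""
    return re.fullmatch(regex, s) is not None


# (a|b)(a|b) denotes exactly the four strings of length two.
print([s for s in ["aa", "ab", "ba", "bb", "a", "aba"]
       if denotes("(a|b)(a|b)", s)])

# a* contains the empty string.
print(denotes("a*", ""))

# a|a*b is read as a | ((a*)b) because * binds tightest and | loosest.
print([s for s in ["a", "b", "ab", "aab", "aa", "ba"]
       if denotes("a|a*b", s)])
```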
We can give names to certain regular expressions and use those names in
subsequent expressions, as if the names were themselves symbols.
Example: C identifiers are strings of letters, digits, and underscores. Here is a regular
definition for the language of C identifiers.
letter_ → A | B | … | Z | a | b | … | z | _
digit → 0 | 1 | … | 9
id → letter_ ( letter_ | digit )*
Example: Unsigned numbers (integer or floating point) are strings such as 5280, 0.01234,
6.336E4, or 1.89E-4. The regular definition is:
digit → 0 | 1 | … | 9
digits → digit digit*
optionalFraction → . digits | ε
optionalExponent → ( E ( + | - | ε ) digits ) | ε
number → digits optionalFraction optionalExponent
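The regular definition for unsigned numbers can be transcribed almost name for name into Python's `re` syntax. This is a sketch under that assumption: the constant names mirror the definition, `is_number` is a hypothetical helper, and an optional part is written with `?`, which stands for "this or the empty string".

```python
import re

DIGITS = r"[0-9]+"                              # digit digit*
OPTIONAL_FRACTION = rf"(\.{DIGITS})?"           # . digits | empty
OPTIONAL_EXPONENT = rf"(E[+-]?{DIGITS})?"       # (E (+|-|empty) digits) | empty
NUMBER = re.compile(DIGITS + OPTIONAL_FRACTION + OPTIONAL_EXPONENT)


def is_number(s):
    """True if s is an unsigned number according to the regular definition."""
    return NUMBER.fullmatch(s) is not None


for text in ["5280", "0.01234", "6.336E4", "1.89E-4", "1.", ".5", "E4"]:
    print(f"{text:10}{is_number(text)}")
```

The last three strings are rejected because the definition requires digits before the point, after the point, and before the exponent.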
3.4 FINITE AUTOMATA

Finite automata are recognizers. A recognizer for a language is a program
that takes a string x as input and answers "yes" if x is a sentence of the language and "no"
otherwise. We convert a regular expression into a recognizer by constructing a generalized
transition diagram called a finite automaton. A finite automaton can be deterministic or
nondeterministic. Nondeterministic means that more than one transition out of a state may be
possible on the same input symbol.
Both deterministic and nondeterministic finite automata are capable of
recognizing precisely the regular sets. Deterministic finite automata can lead to faster
recognizers than nondeterministic automata, but a deterministic automaton can be much bigger
than an equivalent nondeterministic automaton.
3.4.1 NONDETERMINISTIC FINITE AUTOMATA (NFA)
An NFA (Nondeterministic Finite Automaton) is a mathematical model that
consists of:
• A set of input symbols Σ, the input alphabet, such as {a, b, c, …}
• A finite set of states S
• A transition function that gives, for each state and for each symbol in Σ ∪ {ε}, a
set of next states
• A state s0 that is distinguished as the start state or initial state
• A set of states F that are distinguished as the final states or accepting states
An NFA can be represented diagrammatically by a labeled directed transition
graph, in which the nodes are the states and the labeled edges represent the transition function.
This graph looks like a transition diagram, but edges can be labeled by ε as well as by
characters, and the same character can label two or more transitions out of one state. One state
(0 in figure 3.10) is distinguished as the start state, and one or more states may be
distinguished as accepting states (or final states). In figure 3.10, state 3 is accepting, as
indicated by the double circle.
Figure 3.10 Nondeterministic finite Automata
1). {a, b} set of symbols

2). set of states = {0, 1,2,3}
3). moves abb (0  1  2  3)
4) start state 0
5) final state 3
Transition Table

State    Input symbol a    Input symbol b
0        {0,1}             {0}
1        ∅                 {2}
2        ∅                 {3}
3        ∅                 ∅

Figure 3.11 Transition Table
The transitions of an NFA can be conveniently represented in tabular form by
means of a transition table. The transition table for the NFA of figure 3.10 is shown in figure
3.11. In the transition table there is a row for each state and a column for each input symbol
and, if necessary, for ε. The entry for row i and symbol a is the set of states that can be
reached by a transition from state i on input a. Figure 3.12 shows an NFA accepting L(aa*|bb*).
The transition table has the advantage that we can easily find the transitions on a
given state and input. Its disadvantage is that it takes a lot of space when the input alphabet
is large and most transitions are to the empty set.
In an NFA, we can choose either an ε-transition or a transition on an alphabet
character, and if there are several transitions on the same symbol, we can choose
between these. This makes the automaton nondeterministic, as the choice of action is not
determined solely by looking at the current state and input. It may be that some choices lead to
an accepting state while others do not. This does not, however, mean that the string is sometimes
in the language and sometimes not: we include a string in the language if it is possible to
make a sequence of choices that leads the string to an accepting state.
Acceptance of Input Strings by Automata

An NFA accepts an input string x if and only if there is some path in the
transition graph from the start state to one of the accepting states, such that the symbols along
the path spell out x. Note that ε labels along the path are effectively ignored, since the empty
string does not contribute to the string constructed along the path.
Figure 3.12 NFA accepting aa* | bb*
Example: The string aabb is accepted by the NFA of figure 3.10. The path labeled by aabb from
state 0 to state 3 is: 0 → 0 → 1 → 2 → 3 (on inputs a, a, b, b).
The language defined (or accepted) by an NFA is the set of input strings it accepts.
Figure 3.12 is an NFA accepting L(aa* | bb*). The string aaa is accepted because there is a path
from the start state to an accepting state whose labels spell out aaa.
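The acceptance condition can be tried out on the transition table of figure 3.11 by tracking the set of all states the NFA could be in after each input symbol. This is a minimal sketch: the dictionary encoding and the name `nfa_accepts` are assumptions of the example, and ε-transitions are not handled because this particular NFA has none.

```python
# Transition table of figure 3.11: state -> {input symbol -> set of next states}.
# Missing entries stand for the empty set.
NFA = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
    3: {},
}
START = 0
ACCEPTING = {3}


def nfa_accepts(x):
    """True if some path from START spelling out x ends in an accepting state."""
    current = {START}
    for c in x:
        # Follow every possible transition on c from every current state.
        current = {t for s in current for t in NFA[s].get(c, set())}
    return bool(current & ACCEPTING)


for text in ["aabb", "abb", "babb", "ab", ""]:
    print(f"{text!r:8}{nfa_accepts(text)}")
```

Keeping a set of states avoids guessing which choice to make: the string is accepted if any one of the possible choices ends in state 3.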
3.4.2 DETERMINISTIC FINITE AUTOMATA (DFA)
A DFA (Deterministic Finite Automaton) D consists of an alphabet Σ, a set of states
S, a transition function T: S × Σ → S, a start state s0 ∈ S, and a set of accepting states
F ⊆ S. The language accepted by D is represented by L(D). A DFA is a special case of an NFA
where:
1. It has no moves on input ε.
2. For each state s and input symbol a, there is exactly one edge out of s labeled a.
If we are using a transition table to represent the transition function of a DFA, then
each entry in the transition table is a single state. As a consequence, it is very easy to
determine whether a DFA accepts an input string, since there is at most one path from the start
state labeled by that string. The following algorithm shows how to apply a DFA to a string.
Algorithm 3.2: Simulating a DFA

INPUT: An input string x terminated by an end-of-file character eof. A DFA D with start
state s0, accepting states F, and transition function move.
OUTPUT: Answer "yes" if D accepts x; "no" otherwise.
METHOD: Apply the algorithm below to the input string x. The function move(s, c) gives the
state to which there is an edge from state s on input c. The function nextChar returns the next
character of the input string x.
s = s0;
c = nextChar();
while (c != eof)
{
    s = move(s, c);
    c = nextChar();
}
if (s is in F) return "yes";
else return "no";
Figure 3.13: DFA accepting (a|b)*abb

Example: Figure 3.13 shows the transition graph of a DFA accepting the language (a|b)*abb,
the same language as that accepted by the NFA of figure 3.10. Given the input string ababb,
this DFA enters the sequence of states 0,1,2,1,2,3 and returns "yes".
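The simulation algorithm can be sketched in Python as follows. The transition table below is an assumption: it is the usual four-state DFA for (a|b)*abb, chosen because it reproduces the state sequence 0,1,2,1,2,3 quoted in the example, but the figure itself is not reproduced in the text. The names `MOVE`, `dfa_trace` and `dfa_accepts` are illustrative.

```python
# Assumed transition function: (state, input symbol) -> next state.
MOVE = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
START = 0
ACCEPTING = {3}


def dfa_trace(x):
    """Return the sequence of states entered while reading x."""
    states = [START]
    for c in x:
        states.append(MOVE[(states[-1], c)])
    return states


def dfa_accepts(x):
    """Answer True ("yes") if the DFA ends in an accepting state."""
    return dfa_trace(x)[-1] in ACCEPTING


print(dfa_trace("ababb"))     # [0, 1, 2, 1, 2, 3]
print(dfa_accepts("ababb"))   # True
print(dfa_accepts("abab"))    # False
```

Because each table entry is a single state, the loop does a constant amount of work per input character.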
3.5 CONSTRUCTION OF AN NFA FROM A REGULAR EXPRESSION
There are many strategies for building a recognizer from a regular expression. One
strategy, used in a number of text-editing programs, is to construct an NFA from the regular
expression and simulate it directly. If run-time speed is essential, we convert the NFA into a
DFA using the subset construction.
In order to construct an NFA from a regular expression, we break the regular expression
into subexpressions, construct an NFA fragment for each, and then combine these fragments. A
fragment is not a complete NFA, so we complete the construction by adding the necessary
components to make a complete NFA. The algorithm is syntax-directed in that it uses the
syntactic structure of the regular expression to guide the construction process.
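THOMPSON'S CONSTRUCTION
To construct an NFA from a regular expression r, we decompose r into subexpressions,
construct an NFA for each of the basic symbols in r, and then combine these NFAs.
Input: A regular expression r over an alphabet Σ
Output: An NFA N accepting the language L(r)
Method: First, we decompose r into its subexpressions. Then, using rules 1 and 2 below, we
construct NFAs for each of the basic symbols in r. Finally, guided by the syntactic structure of
r, we combine these NFAs using rules 3 to 5 until we obtain the NFA for the entire expression.
Each intermediate NFA produced during the course of the construction corresponds to a
subexpression of r and has several important properties:
• It has exactly one final state.
• No edge enters the start state and no edge leaves the final state.
1. For ε in r, construct an NFA with a new start state i, a new final state f, and a single
edge labeled ε from i to f.
2. For a symbol a in r, construct an NFA with a new start state i, a new final state f, and a
single edge labeled a from i to f.
3. For a | b, add a new start state with ε-edges to the start states of the NFAs for a and b,
and a new final state reached by ε-edges from their final states.
4. For ab, merge the final state of the NFA for a with the start state of the NFA for b.
5. For a*, add a new start state i and a new final state f, with ε-edges from i to the start
state of the NFA for a, from i to f, from the final state of the NFA for a back to its start
state, and from that final state to f.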

Construct an NFA for the regular expression (a | b)* abb
Construct an NFA for the regular expression a*b*
Construct an NFA for the regular expression aba*b
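The construction rules can be sketched in Python with one method per rule. This is an illustrative sketch rather than a full implementation: there is no regular expression parser, so the expression is assembled by nested calls; the class name `Thompson`, its methods, and the use of `None` as the ε label are all assumptions of the example. Each fragment is a pair (start state, final state).

```python
from itertools import count

EPS = None  # label used for epsilon edges in this sketch


class Thompson:
    def __init__(self):
        self._ids = count()
        self.edges = {}  # state -> list of (label, target state)

    def _new_state(self):
        s = next(self._ids)
        self.edges[s] = []
        return s

    def symbol(self, a):
        """Rules 1 and 2: one edge labeled a (or EPS) from start to final."""
        i, f = self._new_state(), self._new_state()
        self.edges[i].append((a, f))
        return i, f

    def union(self, n1, n2):
        """Rule 3: new start and final states joined by epsilon edges."""
        i, f = self._new_state(), self._new_state()
        self.edges[i] += [(EPS, n1[0]), (EPS, n2[0])]
        self.edges[n1[1]].append((EPS, f))
        self.edges[n2[1]].append((EPS, f))
        return i, f

    def concat(self, n1, n2):
        """Rule 4: merge the final state of n1 with the start state of n2."""
        # No edge enters n2's start state, so moving its edges is enough.
        self.edges[n1[1]].extend(self.edges.pop(n2[0]))
        return n1[0], n2[1]

    def star(self, n):
        """Rule 5: closure with new start and final states."""
        i, f = self._new_state(), self._new_state()
        self.edges[i] += [(EPS, n[0]), (EPS, f)]
        self.edges[n[1]] += [(EPS, n[0]), (EPS, f)]
        return i, f

    def closure(self, states):
        """All states reachable from `states` on epsilon edges alone."""
        stack, seen = list(states), set(states)
        while stack:
            s = stack.pop()
            for label, t in self.edges[s]:
                if label is EPS and t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    def accepts(self, nfa, x):
        """Simulate the NFA fragment `nfa` on the input string x."""
        start, final = nfa
        current = self.closure({start})
        for c in x:
            moved = {t for s in current for label, t in self.edges[s] if label == c}
            current = self.closure(moved)
        return final in current


# (a | b)* abb, assembled by hand following the structure of the expression.
t = Thompson()
nfa = t.star(t.union(t.symbol("a"), t.symbol("b")))
for ch in "abb":
    nfa = t.concat(nfa, t.symbol(ch))

print(len(t.edges))              # 11 states
print(t.accepts(nfa, "aabb"))    # True
print(t.accepts(nfa, "ab"))      # False
```

The state numbers produced here depend on the order of the calls and need not match the numbering used in any figure.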

3.6 CONVERSION OF AN NFA TO A DFA

Since an NFA often has a choice of move on an input symbol or on ε, or even a
choice between making a transition on ε and on a real input symbol, its simulation is less
straightforward than that of a DFA. Thus it is often important to convert an NFA to a DFA that
accepts the same language.
To convert an NFA to a DFA we use a technique called the 'subset
construction'. The general idea behind the subset construction is that each state of the
constructed DFA corresponds to a set of NFA states. After reading input a1a2…an, the DFA is in
the state which corresponds to the set of states that the NFA can reach, from its start state,
following paths labeled a1a2…an. In the worst case the DFA can have exponentially more states
than the NFA, but for the token patterns of typical programming languages the NFA and DFA have
approximately the same number of states.
The algorithm constructs a transition table Dtran for the DFA D. Each state of D is a
set of NFA states. The algorithm uses the following notation: ε-closure(s) is the set of NFA
states reachable from NFA state s on ε-transitions alone; ε-closure(T) is the set of NFA states
reachable from some NFA state s in the set T on ε-transitions alone; and move(T, a) is the set
of NFA states to which there is a transition on input symbol a from some state s in T.
ALGORITHM: The subset construction of a DFA from an NFA

INPUT: An NFA N.
OUTPUT: A DFA D accepting the same language as N.
METHOD: The algorithm for computing ε-closure(T):

The algorithm for subset construction is as follows:
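Example:
Construct a DFA from the NFA shown in figure 3.14.
Figure 3.14 NFA N for (a|b)*abb
Here, the start state of this NFA is state 0. We have to find ε-closure(0) in order to find
the start state of the DFA D.
1). ε-closure(0) = {0,1,2,4,7} = A
2). Then we have to find move(A, a) = {3,8}
    ε-closure({3,8}) = {1,2,3,4,6,7,8} = B
3). move(A, b) = {5}
    ε-closure({5}) = {1,2,4,5,6,7} = C
4). move(B, a) = {3,8}
    ε-closure({3,8}) = B
5). move(B, b) = {5,9}
    ε-closure({5,9}) = {1,2,4,5,6,7,9} = D
6). move(C, a) = {3,8}
    ε-closure({3,8}) = B
7). move(C, b) = {5}
    ε-closure({5}) = C
8). move(D, a) = {3,8}
    ε-closure({3,8}) = B
9). move(D, b) = {5,10}
    ε-closure({5,10}) = {1,2,4,5,6,7,10} = E
10). move(E, a) = {3,8}
    ε-closure({3,8}) = B
11). move(E, b) = {5}
    ε-closure({5}) = C
Transition Table:

List of NFA states    DFA state    a    b
{0,1,2,4,7}           A            B    C
{1,2,3,4,6,7,8}       B            B    D
{1,2,4,5,6,7}         C            B    C
{1,2,4,5,6,7,9}       D            B    E
{1,2,4,5,6,7,10}      E            B    C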
and the corresponding DFA is as follows:
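The subset construction for this example can be sketched in Python as shown below. The NFA edges are an assumption: they follow the usual Thompson-style numbering of the NFA for (a|b)*abb with states 0 to 10, which agrees with the move sets {3,8}, {5}, {5,9} and {5,10} in the example, but the figure is not reproduced in the text. The function names mirror the notation of the algorithm, and `None` stands for ε.

```python
EPS = None  # label for epsilon edges in this sketch

# Assumed NFA for (a|b)*abb: state -> list of (label, target state).
NFA = {
    0: [(EPS, 1), (EPS, 7)],
    1: [(EPS, 2), (EPS, 4)],
    2: [("a", 3)],
    3: [(EPS, 6)],
    4: [("b", 5)],
    5: [(EPS, 6)],
    6: [(EPS, 1), (EPS, 7)],
    7: [("a", 8)],
    8: [("b", 9)],
    9: [("b", 10)],
    10: [],
}


def eps_closure(T):
    """Set of NFA states reachable from the set T on epsilon edges alone."""
    stack, closure = list(T), set(T)
    while stack:
        s = stack.pop()
        for label, t in NFA[s]:
            if label is EPS and t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(T, a):
    """Set of NFA states reached from some state in T on input symbol a."""
    return {t for s in T for label, t in NFA[s] if label == a}


def subset_construction(start, alphabet):
    """Return (list of DFA states in order of discovery, Dtran)."""
    dstates = [eps_closure({start})]
    dtran = {}
    unmarked = [dstates[0]]
    while unmarked:
        T = unmarked.pop(0)          # mark T
        for a in alphabet:
            U = eps_closure(move(T, a))
            if U not in dstates:     # a new DFA state was discovered
                dstates.append(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    return dstates, dtran


dstates, dtran = subset_construction(0, "ab")
names = {T: chr(ord("A") + i) for i, T in enumerate(dstates)}
table = {names[T]: tuple(names[dtran[(T, a)]] for a in "ab") for T in dstates}
for T in dstates:
    print(f"{sorted(T)!s:28}{names[T]}  {table[names[T]][0]}  {table[names[T]][1]}")
```

In this example every move set is nonempty; for an NFA where some move(T, a) is empty, the sketch would add the empty set as an extra dead state.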

