2170701 Unit-1
UNIT -1
Introduction to Compiler
1. TRANSLATORS
A translator is a program that takes a program in one form (input) and converts
it into another form (output). The language of the input program is called the
source language and the language of the output program is called the target language.
The source language can be a high-level language like C, C++, Java, FORTRAN,
and so on.
The target language can be a low-level language (assembly language) or a
machine language (a set of instructions executed directly by a CPU).
The main kinds of translators are:
(1). Compilers
(2). Interpreters
(3). Assemblers
COMPILER:-
Prepared By:
Prof. Ankita Chauhan
CE Department
MBICT-New Vvnagar
Compiler Design
2170701 Unit-1
A compiler is a translator that converts a high-level language into a low-level
language. It also reports error messages for the problems it finds in the
source program.
INTERPRETER:-
An interpreter is another common kind of language processor. Instead of
producing a target program as a translation, an interpreter appears to directly
execute the operations specified in the source program on inputs supplied by
the user, as shown in Figure 1.4.
ASSEMBLER:-
An assembler translates assembly language into machine code.
One-pass assemblers go through the source code once and assume that all
symbols will be defined before any instruction that references them.
Two-pass assemblers create a table with all symbols and their values in the
first pass, and then use the table in a second pass to generate code. The
assembler must at least be able to determine the length of each instruction on
the first pass so that the addresses of symbols can be calculated.
Example: assemblers for the 8085 and 8086 microprocessors.
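As a rough illustration of the two-pass scheme, the Python sketch below assembles a made-up instruction set. The mnemonics, the instruction lengths in LENGTHS and all function names are assumptions for this example only; they do not correspond to the 8085 or 8086.

```python
# Toy two-pass assembler for a hypothetical instruction set.
# LENGTHS gives the size of each instruction so that pass 1 can
# compute the address of every label.
LENGTHS = {"LOAD": 2, "ADD": 2, "JMP": 2, "HALT": 1}


def split_line(line):
    """Split 'label: MNEMONIC operand' into (label, mnemonic, operand)."""
    label, _, rest = line.rpartition(":")
    fields = rest.split()
    operand = fields[1] if len(fields) > 1 else None
    return label.strip(), fields[0], operand


def first_pass(lines):
    """Pass 1: build the symbol table (label -> address)."""
    symbols, address = {}, 0
    for line in lines:
        label, mnemonic, _ = split_line(line)
        if label:
            symbols[label] = address
        address += LENGTHS[mnemonic]
    return symbols


def second_pass(lines, symbols):
    """Pass 2: generate code, replacing labels by their addresses."""
    code, address = [], 0
    for line in lines:
        _, mnemonic, operand = split_line(line)
        if operand is None:
            value = None
        elif operand in symbols:
            value = symbols[operand]
        else:
            value = int(operand)
        code.append((address, mnemonic, value))
        address += LENGTHS[mnemonic]
    return code


source = [
    "start: LOAD 5",
    "       JMP end",
    "       ADD 1",
    "end:   HALT",
]
symbols = first_pass(source)
program = second_pass(source, symbols)
print(symbols)
print(program)
```

The instruction JMP end refers to a label that is defined later in the source. A one-pass assembler that requires symbols to be defined before use could not handle it directly, while the symbol table built in the first pass resolves it without difficulty.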
Hybrid Compiler
A hybrid compiler combines compilation and interpretation.
Java language processors combine compilation and interpretation as shown in
Figure 1.6.
A Java source program is first compiled into an intermediate form called bytecodes.
The bytecodes are then interpreted by a virtual machine.
A benefit of this arrangement is that bytecodes compiled on one machine can be
interpreted on another machine.
Compiler                                     | Interpreter
Scans the entire program and translates it   | Translates the program one statement
as a whole into machine code.                | at a time.
Saves machine code for future use.           | Doesn't store translated statements.
Figure 2.1: A language-processing system
Note: Preprocessors, assemblers, linkers and loaders are collectively called the cousins of
the compiler.
Preprocessor:
A preprocessor is a program that processes its input data (the source program) to produce
output that is used as input to another program. The output is said to be a modified source
program. A preprocessor produces input for a compiler. It may perform the following
functions.
1. Macro processing: A preprocessor may allow a user to define macros that are
shorthands for longer constructs, using macro definitions (#define, #undef).
2. File inclusion: A preprocessor may include header files into the program text.
When the preprocessor finds an #include directive, it replaces it with the entire
content of the specified file, e.g. #include <file>.
ASSEMBLER:
Two-pass assemblers create a table with all symbols and their values in the first pass, and
then use the table in a second pass to generate code.
LINKER:
Large programs are often compiled in pieces, so the relocatable machine code may have
to be linked together with other relocatable object files and library files into the code that
actually runs on the machine.
The three tasks of the linker are:
1. It searches the program to find the library routines used by the program, e.g. printf() and math
routines.
2. It determines the memory locations that the code from each module will occupy and
relocates its instructions by adjusting absolute references.
3. It resolves references among files.
LOADER:
The loader then places all of the executable object files into memory for
execution. It also performs relocation of the object code. A loader is the part of an
operating system that is responsible for loading programs into memory, one of the essential
stages in the process of starting a program.
The linker and loader together produce absolute machine code.
The synthesis part constructs the desired target program from the intermediate
representation.
The synthesis part is carried out in two phases: code optimization and
code generation. The synthesis part is called the back end of the compiler.
A compiler operates in phases, each of which transforms the source program from one
representation into another. The following are the phases of the compiler:
Main phases:
1) Lexical analysis
2) Syntax analysis
3) Semantic analysis
4) Intermediate code generation
5) Code optimization
6) Code generation
LEXICAL ANALYSIS:
The first phase of a compiler is called lexical analysis or scanning or linear
analysis.
The lexical analyzer reads the stream of characters making up the source program
and groups the characters into meaningful sequences called lexemes.
For each lexeme, the lexical analyzer produces output as a token.
For example, suppose a source program contains the assignment statement
position = initial + rate * 60        (3.1)
The characters in this assignment statement could be grouped into the following
lexemes and mapped into the following tokens.
(1) position is a lexeme that would be mapped into a token < id1 >, where id1 is
an abstract symbol standing for identifier.
(2) The assignment symbol = is a lexeme that is mapped into the token < = >.
(3) initial is a lexeme that is mapped into the token < id2 >.
(4) + is a lexeme that is mapped into the token < + >.
(5) rate is a lexeme that is mapped into the token < id3 >.
(6) * is a lexeme that is mapped into the token < * >.
(7) 60 is a lexeme that is mapped into the token < 60 >.
Blanks separating the lexemes are discarded by the lexical analyzer. After
lexical analysis, the statement is represented by the following sequence of tokens:
id1 = id2 + id3 * 60        (3.2)
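The grouping of characters into lexemes can be sketched with regular expressions. The following Python fragment is only an illustration: the token classes in TOKEN_SPEC, the function name tokenize and the use of a plain list as the symbol table are assumptions made for this example, not part of any particular compiler.

```python
import re

# Each token class is described by a regular expression (assumed for this sketch).
TOKEN_SPEC = [
    ("num", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[=+*]"),
    ("skip", r"\s+"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def tokenize(text):
    """Return (tokens, symbol_table) for one source statement."""
    symbol_table = []  # the n-th identifier seen becomes token idn
    tokens = []
    pos = 0
    while pos < len(text):
        match = PATTERN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
        pos = match.end()
        kind, lexeme = match.lastgroup, match.group()
        if kind == "skip":
            continue  # blanks are discarded
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(f"id{symbol_table.index(lexeme) + 1}")
        else:
            tokens.append(lexeme)  # operators and numbers stand for themselves
    return tokens, symbol_table


tokens, names = tokenize("position = initial + rate * 60")
print(" ".join(tokens))
print(names)
```

For statement (3.1) the sketch yields the tokens id1 = id2 + id3 * 60, and the list of names records which identifier each idn stands for.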
Syntax Analysis
Syntax analysis is also called parsing or hierarchical analysis. It involves grouping the
tokens of the source program into grammatical phrases that are used by the compiler to
synthesize output.
These phrases are represented using a syntax tree as shown below:
A syntax tree is the tree generated as a result of syntax analysis, in which the
interior nodes are the operators and the leaf nodes are the operands.
The syntax analyzer reports an error when the syntax is incorrect.
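One common way to build such a tree is a recursive-descent parser. The Python sketch below is a minimal example under stated assumptions: the grammar handles only a single assignment with + and *, the tree is stored as nested tuples (operator, left, right), and the function name parse_assignment is made up for this illustration.

```python
def parse_assignment(tokens):
    """Parse 'id = expr' and return a syntax tree as nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def factor():
        # A factor is an operand: an identifier or a number.
        tok = take()
        if tok in ("=", "+", "*"):
            raise SyntaxError(f"operand expected, found {tok!r}")
        return tok

    def term():
        # term -> factor ( '*' factor )*
        node = factor()
        while peek() == "*":
            take()
            node = ("*", node, factor())
        return node

    def expr():
        # expr -> term ( '+' term )*   so '*' binds tighter than '+'
        node = term()
        while peek() == "+":
            take()
            node = ("+", node, term())
        return node

    target = factor()
    if take() != "=":
        raise SyntaxError("'=' expected")
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


tree = parse_assignment(["id1", "=", "id2", "+", "id3", "*", "60"])
print(tree)
```

The operators end up as interior nodes and the operands as leaves. Because term is called from expr, the multiplication is grouped before the addition. An incorrect token sequence such as id1 = + raises SyntaxError.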
Semantic Analysis :
It is the third phase of the compiler.
The semantic analyzer uses the syntax tree and the information in the symbol
table to check the source program for semantic consistency with the language
definition.
It also gathers type information and saves it in either the syntax
tree or the symbol table, for subsequent use during intermediate-code generation.
An important part of semantic analysis is type checking, where the compiler
checks that each operator has matching operands.
For example, many programming language definitions require an array index to
be an integer; the compiler must report an error if a floating-point number is used
to index an array.
Suppose that position, initial and rate have been declared to be floating-point
numbers, and that the lexeme 60 by itself forms an integer.
The type checker in the semantic analyzer discovers that the operator * is applied
to a floating-point number rate and an integer 60.
In this case, the integer may be converted into a floating-point number. In Fig.
3.3, notice that the output of the semantic analyzer has an extra node for the
operator inttofloat, which explicitly converts its integer argument into a floating-
point number.
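The type check and the inserted conversion can be sketched as a walk over the syntax tree. In the Python fragment below the tuple representation of the tree, the types dictionary and the function names check and coerce are assumptions for illustration. The rule that a float may not be assigned to an int is also an assumption of this sketch, not a rule stated in these notes.

```python
def coerce(node, node_type):
    """Wrap an integer operand in an inttofloat node; leave floats alone."""
    return ("inttofloat", node) if node_type == "int" else node


def check(node, types):
    """Return (annotated_tree, type) for a tree of nested tuples."""
    if isinstance(node, str):  # leaf: number or identifier
        if node.isdigit():
            return node, "int"
        if node not in types:
            raise TypeError(f"undeclared name {node!r}")
        return node, types[node]

    op, left, right = node
    left, ltype = check(left, types)
    right, rtype = check(right, types)

    if op == "=":
        if ltype == "int" and rtype == "float":
            raise TypeError("cannot assign float to int")
        if ltype != rtype:
            right = coerce(right, rtype)
        return (op, left, right), ltype

    if ltype != rtype:  # mixed int/float arithmetic
        left, right = coerce(left, ltype), coerce(right, rtype)
        ltype = "float"
    return (op, left, right), ltype


declared = {"id1": "float", "id2": "float", "id3": "float"}
tree = ("=", "id1", ("+", "id2", ("*", "id3", "60")))
typed_tree, result_type = check(tree, declared)
print(typed_tree)
```

The only change to the tree is the extra inttofloat node around the leaf 60, because * is applied to the floating-point id3 and an integer.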
Code Optimization
The machine-independent code-optimization phase attempts to improve the
intermediate code so that better target code will result. Usually better means
faster.
Optimization improves the efficiency of the code so that the running time and
memory consumption of the target program are reduced.
The optimizer can deduce that the conversion of 60 from integer to floating point
can be done once and for all at compile time, so the inttofloat operation can be
eliminated by replacing the integer 60 by the floating-point number 60.0.
Moreover, t3 is used only once to transmit its value to id1, so the optimizer can
transform (eq. 3.3) into the shorter sequence
t1 = id3 * 60.0
id1 = id2 + t1        (3.4)
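The two improvements described above can be sketched in Python. Note the assumptions: the unoptimized three-address code is taken to be t1 = inttofloat(60), t2 = id3 * t1, t3 = id2 + t2, id1 = t3, which is the usual textbook form and is not reproduced in these notes. The tuple format (destination, operator, arguments) and the function names are likewise made up for this example.

```python
from collections import Counter


def is_temp(name):
    """Temporaries are named t1, t2, ... in this sketch."""
    return name[0] == "t" and name[1:].isdigit()


def optimize(code):
    # Step 1: evaluate inttofloat(<integer literal>) at compile time
    # and substitute the resulting constant wherever it is used.
    constants, folded = {}, []
    for dest, op, args in code:
        args = tuple(constants.get(a, a) for a in args)
        if op == "inttofloat" and args[0].isdigit():
            constants[dest] = str(float(args[0]))  # "60" -> "60.0"
        else:
            folded.append((dest, op, args))

    # Step 2: a temporary that is used only once, by the copy that
    # directly follows its definition, is replaced by the copy's target.
    uses = Counter(a for _, _, args in folded for a in args)
    merged = []
    for dest, op, args in folded:
        if (op == "copy" and merged and merged[-1][0] == args[0]
                and is_temp(args[0]) and uses[args[0]] == 1):
            _, prev_op, prev_args = merged.pop()
            merged.append((dest, prev_op, prev_args))
        else:
            merged.append((dest, op, args))

    # Step 3: renumber the remaining temporaries from t1.
    names, result = {}, []
    for dest, op, args in merged:
        args = tuple(names.get(a, a) for a in args)
        if is_temp(dest):
            names[dest] = f"t{len(names) + 1}"
        result.append((names.get(dest, dest), op, args))
    return result


unoptimized = [
    ("t1", "inttofloat", ("60",)),
    ("t2", "*", ("id3", "t1")),
    ("t3", "+", ("id2", "t2")),
    ("id1", "copy", ("t3",)),
]
optimized = optimize(unoptimized)
for dest, op, args in optimized:
    print(dest, "=", f" {op} ".join(args))
```

Under these assumptions the four instructions shrink to the two instructions of (3.4).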
Code Generation
The code generator takes as input an intermediate representation of the source
program and maps it into the target language.
If the target language is machine code, then the registers or memory locations are
selected for each of the variables used by the program.
The intermediate instructions are translated into sequences of machine
instructions.
For example, using registers R1 and R2, the intermediate code in (3.4) might get
translated into the machine code
LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1        (3.5)
The first operand of each instruction specifies a destination. The F in each
instruction tells us that it deals with floating-point numbers.
The above code loads the contents of address id3 into register R2, and then
multiplies it with floating-point constant 60.0. The # signifies that 60.0 is to be
treated as an immediate constant.
The third instruction moves id2 into register R1 and the fourth adds to it the value
previously computed in register R2.
Finally, the value in register R1 is stored into the address of id1, so the code
correctly implements the assignment statement (eq. 3.1).
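A very simple code generator for this instruction style might look like the Python sketch below. It is an illustration only: the tuple format of the intermediate code, the OPCODES table and the naive policy of handing out registers in the order R1, R2, ... are assumptions, so the register numbers differ from (3.5) although the instruction sequence is equivalent. The left operand of each instruction is assumed to be a name, not a constant.

```python
OPCODES = {"*": "MULF", "+": "ADDF"}


def is_temp(name):
    """Temporaries are named t1, t2, ... in this sketch."""
    return name[0] == "t" and name[1:].isdigit()


def generate(code):
    """Translate three-address code into a list of machine instructions."""
    out = []
    reg_of = {}  # temporary name -> register that holds its value
    count = 0

    def new_reg():
        nonlocal count
        count += 1
        return f"R{count}"

    def load(name):
        """Return a register holding `name`, emitting LDF if necessary."""
        if name in reg_of:
            return reg_of[name]
        reg = new_reg()
        out.append(f"LDF {reg}, {name}")
        return reg

    for dest, op, (left, right) in code:
        reg = load(left)
        if right[0].isdigit():
            src = f"#{right}"  # immediate constant
        else:
            src = load(right)
        out.append(f"{OPCODES[op]} {reg}, {reg}, {src}")
        if is_temp(dest):
            reg_of[dest] = reg  # temporaries stay in registers
        else:
            out.append(f"STF {dest}, {reg}")  # program variables go to memory
    return out


intermediate = [
    ("t1", "*", ("id3", "60.0")),
    ("id1", "+", ("id2", "t1")),
]
machine_code = generate(intermediate)
print("\n".join(machine_code))
```

A real code generator would choose registers much more carefully. The point here is only that each intermediate instruction maps to a short sequence of load, operate and store instructions.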
Symbol-Table Management
The symbol table, which stores information about the entire source program, is
used by all phases of the compiler.
An essential function of a compiler is to record the variable names used in the
source program and collect information about various attributes of each name.
These attributes may provide information about the storage allocated for a name,
its type, its scope.
In the case of procedure names, such things as the number and types of its
arguments, the method of passing each argument (for example, by value or by
reference), and the type returned are maintained in symbol table.
The symbol table is a data structure containing a record for each variable name,
with fields for the attributes of the name. The data structure should be designed
to allow the compiler to find the record for each name quickly and to store or
retrieve data from that record quickly.
A symbol table can be implemented in one of the following ways:
Linear (sorted or unsorted) list
Binary Search Tree
Hash table
Of these, hash tables are the most common implementation: the source code
symbol itself is treated as the key for the hash function and the value
returned is the information about the symbol.
A symbol table may serve the following purposes depending upon the language
in hand:
To store the names of all entities in a structured form at one place.
To verify if a variable has been declared.
To implement type checking, by verifying assignments and
expressions.
To determine the scope of a name (scope resolution).
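A hash-table symbol table with nested scopes can be sketched in a few lines of Python, whose built-in dict is a hash table. The class name SymbolTable, its method names and the attribute names used below are assumptions made for this illustration.

```python
class SymbolTable:
    """Stack of hash tables, one per scope; the innermost scope is last."""

    def __init__(self):
        self.scopes = [{}]

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, **attributes):
        """Record a name and its attributes in the current scope."""
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"{name!r} is already declared in this scope")
        scope[name] = attributes

    def lookup(self, name):
        """Search from the innermost scope outwards; None if undeclared."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.declare("rate", type="float")
table.enter_scope()
table.declare("rate", type="int")      # hides the outer declaration
inner = table.lookup("rate")["type"]
table.exit_scope()
outer = table.lookup("rate")["type"]
missing = table.lookup("speed")        # never declared
print(inner, outer, missing)
```

The lookup method covers two of the purposes listed above: a result of None shows that a variable has not been declared, and searching the scopes from the inside out performs scope resolution.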
The syntax and semantic phases handle a large number of the errors in the
compilation process.
The error handler handles all types of errors, such as lexical errors, syntax errors,
semantic errors and logical errors.
Lexical errors:
The lexical analyzer detects errors in the input characters.
The name of a keyword or identifier is typed incorrectly.
Example: switch is written as swich.
Syntax errors:
Syntax errors are detected by the syntax analyzer.
Errors like a missing semicolon or unbalanced parentheses.
Example: ((a+b* (c-d)). In this statement, a ) is missing after b.
Semantic errors:
Data type mismatch errors are handled by the semantic analyzer.
Incompatible data type value assignment.
Example: Assigning a string value to an integer.
Logical errors:
Unreachable code and infinite loops.
Misuse of operators. Code written after the end of the main() block.
1. Helps focus on the problem. Compilation is quite a complex task. Hence
separating it into stages makes it easier and helps to focus on the problem. It also
makes it easier to engineer and have different teams focus on each part.
2. Makes it modular. The frontend is specific to the source language and the backend
is specific to the target machine. An issue of the target architecture does not show
up in the frontend and an issue of the source language does not show up in the
backend. If someone comes up with a better frontend, he or she can replace the
frontend without any changes to the backend. The same holds for the backend.
3. Makes it portable across languages and architectures. The intermediate code is
independent of the source language and the machine code. Hence the compiler
can easily be ported to a different source language (by writing a different frontend)
and a different target architecture (by writing a different backend).
Compiler passes
A pass is a collection of phases. The collection is carried out only once
(single pass) or multiple times (multi pass).
Single pass:
Usually requires everything to be defined before being used in source
program.
Multi pass:
Compiler may have to keep entire program representation in memory.
Several phases can be grouped into one single pass and the activities of
these phases are interleaved during the pass. For example, lexical analysis,
syntax analysis, semantic analysis and intermediate code generation might
be grouped into one pass.
Difference:
1. A one-pass compiler is faster than a multi-pass compiler, because each pass
of a multi-pass compiler reads and writes an intermediate file.
2. A one-pass compiler has a limited scope of passes, whereas a multi-pass compiler
has a wide scope of passes.
3. Multi-pass compilers are sometimes called wide compilers, whereas one-pass
compilers are sometimes called narrow compilers.
4. A multi-pass compiler can be made to use less space than a single-pass
compiler, since the memory used by one pass can be reused by the next.
5. A single-pass compiler is not portable, whereas a multi-pass compiler supports
portability.
6. Many programming languages cannot be compiled in a single pass. For
example, Pascal can be implemented with a single-pass compiler, whereas
languages like Java require a multi-pass compiler.
Loading is one of the essential stages in the process of starting a program, because it
places programs into memory and prepares them for execution.
Loading a program involves reading the contents of executable file into memory.
Once loading is complete, the operating system starts the program by passing
control to the loaded program code.
All operating systems that support program loading have loaders. In many
operating systems the loader is permanently resident in memory.
Assignment-1
Introduction to Compiler
1. What is a compiler? List the major functions performed by a compiler.
2. Differentiate between a compiler, an interpreter and an assembler.
3. Explain the analysis synthesis model of compilation. List the factors that affect the
design of compiler. Or (explain structure of compiler.) Or (Draw different phases
of Compiler with example. Also explain all Phase in brief. )
4. Find errors and identify the phase of compiler detecting them for following C
program segment. Justify your answers.
int fi( int);
char a[10], * cptr;
int k = 1 ;
int j = 2;
float f;
cptr = a;
if (k);
fi(k);
fi( j )
++k;
*(cptr + 1 ) = 0 ;
++ a;
n + *k ;
5. Explain cousins of compiler or context of compiler.
6. What is a symbol table? Discuss any two data structures suitable for it & compare
their merits / demerits. Also compare one pass & two pass compilers.
7. Explain linker and loader in details.
8. Differentiate single pass and multi-pass compiler.
9. Explain front end and back end in details.
19. An assembler is
20. An analysis, which determines the syntactic structure of the source statement, is
called
(A) Assembler (B) Compiler (C) Interpreter (D) All of the above
22. A system program that combines separately compiled modules of a program into a
form suitable for execution is __________
(A) Assembler
(B) Linking loader
(C) Linker
(D) None of the Above
23. A compiler that runs on a particular platform and is capable of generating executable
code for another platform is called a _______
A. Assembler
B. Linking loader
C. Loader
D. Cross-Compiler
A. Analysis
B. Synthesis
C. Analysis & Synthesis
D. None of above
25. Comments may appear in a special font is an example of
A. Structure Editor
B. Pretty printers
C. Static checker
D. Interpreter
26. Certain variables might be used before being defined is an example of
A. Structure Editor
B. Pretty printers
C. Static checker
D. Interpreter
27. Which of the following may detect that parts of the program can never be
executed?
A. Structure Editor
B. Pretty printers
C. Static checker
D. Interpreter
28. Analysis of source program consist of
A. Linear analysis
B. Hierarchical analysis
C. Semantic analysis
D. All of above
29. Output of compiler is
A. Source program
B. Relocatable machine code
C. Target assembly program
D. Absolute machine code
30. Output of Assembler is
a. Source program
b. Relocatable machine code
c. Target assembly program
d. Absolute machine code
31. Output of preprocessor is
a. Source program
b. Relocatable machine code
c. Target assembly program
d. Absolute machine code
32. In which phase are identifiers entered into the symbol table?
a. Lexical Analysis
b. Syntax Analysis
c. Semantic Analysis
d. Intermediate Code Generation
33. Which of the following are compiler construction tools?
a. Parser generators
b. Scanner generators
c. Data flow engines
d. All of above
34. Which of the following is a phase of compilation
a. Lexical Analysis
b. Code generation
c. Static analysis
d. Both a & b
35. An ideal compiler should
a. Detect error
b. Detect & repair error
c. Detect, repair & correct error
d. None of these
36. Which of the following is/are phases of analysis
a. Lexical Analysis
b. Code generation
c. Code optimization
d. None of above
37. Which of the following phases of the compilation process is optional?
a. Lexical analysis
b. Syntax analysis
c. Code optimization
d. Code generation