
Compiler Design (CSE306)

Dr. Murali Krishna Enduri


Department of CSE

Syllabus
SUBJECT CODE   SUBJECT TITLE     CORE/ELECTIVE   L   T   P   C
CSE 306        Compiler Design   C               3   0   2   4

UNIT I: Introduction to Compilers


 
Translators-Compilation and Interpretation-Language processors-The Phases of Compiler-
Errors Encountered in Different Phases-The Grouping of Phases-Compiler Construction
Tools - Programming Language basics.

UNIT II : Lexical Analysis


Need and Role of Lexical Analyzer-Lexical Errors-Expressing Tokens by Regular Expressions-
Converting Regular Expression to DFA- Minimization of DFA-Language for Specifying Lexical
Analyzers-LEX-Design of Lexical Analyzer for a sample Language.
UNIT III: Syntax Analysis
Need and Role of the Parser-Context Free Grammars -Top Down Parsing -General
Strategies-Recursive Descent Parser-Predictive Parser-LL(1) Parser-Shift Reduce Parser-LR
Parser-LR(0) Items-Construction of SLR Parsing Table-Introduction to LALR Parser-Error
Handling and Recovery in Syntax Analyzer-YACC-Design of a syntax Analyzer for a Sample
Language
 
UNIT IV : Syntax Directed Translation and Run Time Environment
Syntax directed Definitions-Construction of Syntax Tree-Bottom-up Evaluation of S-Attribute
Definitions- Design of predictive translator - Type Systems-Specification of a simple type
checker- Equivalence of Type Expressions-Type Conversions. RUN-TIME ENVIRONMENT:
Source Language Issues-Storage Organization-Storage Allocation- Parameter Passing-
Symbol Tables-Dynamic Storage Allocation-Storage Allocation in FORTRAN.
 
UNIT V : Code Optimization and Code Generation
Principal Sources of Optimization-DAG- Optimization of Basic Blocks-Global Data Flow Analysis-
Efficient Data Flow Algorithms-Issues in Design of a Code Generator - A Simple Code Generator
Algorithm.

Books of Study
1. Compilers: Principles, Techniques and Tools, Alfred V. Aho, Monica S. Lam, Ravi Sethi and
Jeffrey D. Ullman, 2nd Edition, Pearson Education, 2007.
Books of References
2. Optimizing Compilers for Modern Architectures: A Dependence-based Approach, Randy
Allen, Ken Kennedy, Morgan Kaufmann Publishers, 2002.
3. Advanced Compiler Design and Implementation, Steven S. Muchnick, Morgan Kaufmann
Publishers - Elsevier Science, India, Indian Reprint 200
4. Engineering a Compiler, Keith D. Cooper and Linda Torczon, Morgan Kaufmann Publishers
Elsevier Science, 2004.
5. Crafting a Compiler with C, Charles N. Fischer, Richard J. LeBlanc, Pearson Education, 2008.
The Course covers

o Compiler Basics
o Lexical Analysis
o Syntax Analysis
o Semantic Analysis
o Runtime environments
o Code Generation
o Code Optimization
Distribution of Marks
Internal assessment
Component   Assessment Tool    Conducting Marks   Converting Marks   Final Conversion
Theory      Mid-term-I         25                 10                 30
Theory      Mid-term-II        25                 10
Theory      CLA 1              30                 5
Theory      CLA 2              30                 5
Practical   Lab Performance    10                 10                 20
Practical   Observation note   10                 10
Total                                                                50

External assessment
Component                     Assessment Tool   Conducting Marks   Final Conversion
End semester theory exam      Final exam        100                30
End semester practical exam   Project           100                20
Total                                                              50
Playing it safe in CSE-306

If you follow these 4 simple rules during the class, you'll make
sure that you do well in the course:
1. Attend every theory and lab class.
2. Read the course material (textbook sections assigned +
slides).
3. Submit everything (Assignments, Quizzes, Exams) on time -
don't be late.
4. Don't cheat.
Basics of Compiler Design

Translators
What do you understand by the term translator?

A translator or language processor is a program that translates an input program written in a
programming language into an equivalent program in another language.

The source language can be a low-level language like assembly language or a high-level
language like C, C++, Java, FORTRAN, and so on.
The target language can be a low-level language (assembly language) or a machine
language (the set of instructions executed directly by a CPU).

Different types of Translators are:

o Compiler
o Interpreter
o Assembler
o Cross-Compiler
o Language Translator / Source to source translator / Language Converter
o Language Rewriter
o Decompiler
o Compiler-Compiler
o Linker
o Loader
What is a Compiler

A compiler is a type of translator that takes a program written in a high-level programming
language as input and translates it into an equivalent program in a low-level language such as
machine language or assembly language.

The program written in the high-level language is known as the source program, and the program
converted into the low-level language is known as the object (or target) program.
[Figure: a compiler takes a Source Program and produces a Target Program]

Why Study Compilers?

• It is essential for computer engineering students to understand the heart of programming.
• A compiler may need to be designed for a new language in the course of professional work.
• It increases understanding of language semantics.
• It helps in handling language performance issues.
• It offers an opportunity for a non-trivial programming project.

History

• Software for early computers was written in assembly language.
• The need for code reusability gave rise to programming languages.
• This need grew large enough to justify the cost of building compilers.

1. The concept of machine-independent programming created the need for compilers in the 1950s.
2. The first compiler was written by Grace Hopper in 1952 for the A-0 programming language.
3. The first complete compiler was developed by the FORTRAN team led by John Backus at IBM in 1957.
4. COBOL was the first language to be compiled on multiple platforms, in 1960.
5. Earlier compilers were written in assembly language.
6. The first compiler written in a high-level language was created for LISP by Tim Hart and Mike
Levin at MIT, USA in 1962; it was a self-hosting compiler.
7. Many compilers have been written in C or Pascal.
8. However, the trend is moving towards self-hosting compilers, which can compile the source
code of the same language in which they are written.

Translator and Interpreter
A translator or language processor is a program that translates an input program written in a
programming language into an equivalent program in another language.

An interpreter is another common kind of translator.

Interpreter: This software translates the high-level language program line by line and
executes each line as soon as it is translated.

Interpreter

1. It takes less memory than a compiler.
2. It takes more time to run a program than compiled code does.
3. An interpreter, however, can usually give better error diagnostics
than a compiler, because it executes the source program statement
by statement.
4. It takes more time to build.
5. The machine-language target program produced by a compiler is
usually much faster than an interpreter at mapping inputs to
outputs.

[Figure: Difference between compiler and interpreter]

Hybrid Compiler
A hybrid compiler combines compilation and interpretation. Java language
processors are an example.

A Java source program is first compiled into an intermediate form called bytecodes. The bytecodes are then
interpreted by a virtual machine.

A benefit of this arrangement is that bytecodes compiled on one machine can be interpreted
on another machine.

To achieve faster processing of inputs to outputs, some Java compilers, called just-in-time
compilers, translate the bytecodes into machine language immediately before they run.

[Figure: Translation]

Other Types of Translators

Assembler: This software converts assembly language (assembly instruction mnemonics) into
machine-level language (opcodes, i.e. sequences of 0s and 1s).

• It offers reusability of assembly code on different machine platforms.

Cross-Compiler: If the compiled program can run on a computer whose CPU or operating system
is different from the one on which the compiler runs, the compiler is known as a
cross-compiler.

Language Translator / Source-to-source translator / Language Converter: It converts programs
in one high-level language to another high-level language.
Ex: From Java code to ASP.Net code, or from a C program to a Python program.

Language Rewriter: It is a program that changes the form of expression within the same
language. It does not change the language of the source code.

Decompiler: It is a piece of software that converts a low-level language to a high-level
language. It is less well known, but can sometimes be a useful tool.

Compiler-Compiler: It is a tool that creates a compiler, interpreter or parser from a formal
description of a language. The earliest and most common type is the parser generator.

Linker: It is a program that combines object modules (object programs) to form an executable
program. Each module of the software is compiled separately to produce an object program. The
linker combines all the object modules and passes the result to the loader for loading into memory.

Loader: It is a program that accepts linked modules as input and loads them into main memory
for execution. It copies modules from secondary memory to main memory. It may also replace
virtual addresses with physical addresses. The roles of the linker and the loader may overlap.

Language processing system

Preprocessor: The preprocessor collects the source program, which may be divided into
modules stored in separate files. The preprocessor may also expand shorthands called
macros into source language statements. E.g. #include <math.h>, #define PI 3.14

Compiler: The modified source program is then fed to a compiler. The compiler may produce
an assembly-language program as its output, because assembly language is easier to produce
as output and is easier to debug.

Assembler: The assembly language is then processed by a program called an assembler that
produces relocatable machine code as its output.

Steps involved in the analysis of a source program

The preprocessor modifies the source code by replacing header file inclusions with the
content of those files.

The compiler translates the modified high-level source program into the target program.

The assembler translates the assembly language code into relocatable machine language code.

The linker links the relocatable code with the library files and the other relocatable
objects, and the loader loads the integrated code into memory for execution.

The steps in compiling C programs are preprocessing, compilation, and linking.

Reference: https://jsommers.github.io/cbook/programstructure.html
Structure of Compiler

Compilers bridge the gap between high level language & machine
hardware.
• A compiler is required to:
1. Find errors in the syntax of the program.
2. Generate correct and efficient object code.
3. Handle run-time organization.
4. Format its output as the linker or assembler expects.


Analysis Part (Front End, machine independent)
• Lexical Analysis
• Syntax Analysis
• Semantic Analysis
• Intermediate Code Generation

Synthesis Part (Back End, machine dependent)
• Code Optimization
• Code Generation

The analysis part (front end):
• Breaks up the source code into basic pieces, while storing information in the symbol table.
• Determines the operations implied by the source program, which are recorded in a tree
structure called the syntax tree.

The synthesis part (back end):
• Constructs the target code from the syntax tree and from the information in the symbol table.
• Here, code optimization makes the generated code efficient, so that it uses as few resources
as possible.

Lexical Analysis

• This is the initial part: reading and analyzing the program text.
• The text is read and divided into tokens, each of which corresponds to
a symbol in the programming language.
• Ex: variables, keywords, delimiters or digits.
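
The division of text into tokens can be sketched in C as follows. This is a minimal illustration, not a production scanner: the token classes, the three-word keyword set, and the names Token and next_token are assumptions made for this example, and every non-alphanumeric character is treated as a one-character delimiter.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token classes for a toy language; the names are illustrative only. */
typedef enum { TOK_KEYWORD, TOK_IDENT, TOK_NUMBER, TOK_DELIM, TOK_END } TokenKind;

typedef struct {
    TokenKind kind;
    char text[32];              /* the lexeme, truncated to 31 characters */
} Token;

/* A tiny, assumed keyword set. */
static int is_keyword(const char *s) {
    return strcmp(s, "int") == 0 || strcmp(s, "if") == 0 || strcmp(s, "while") == 0;
}

/* Reads one token starting at *src and advances *src past it. */
Token next_token(const char **src) {
    Token t = { TOK_END, "" };
    const char *p = *src;
    size_t n = 0;

    while (isspace((unsigned char)*p)) p++;              /* skip white space */

    if (*p == '\0') {
        /* end of input: t already has kind TOK_END */
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        /* identifier or keyword: letter or '_' followed by letters, digits, '_' */
        while (isalnum((unsigned char)*p) || *p == '_') {
            if (n < sizeof t.text - 1) t.text[n++] = *p;
            p++;
        }
        t.text[n] = '\0';
        t.kind = is_keyword(t.text) ? TOK_KEYWORD : TOK_IDENT;
    } else if (isdigit((unsigned char)*p)) {
        /* number: a run of digits */
        while (isdigit((unsigned char)*p)) {
            if (n < sizeof t.text - 1) t.text[n++] = *p;
            p++;
        }
        t.text[n] = '\0';
        t.kind = TOK_NUMBER;
    } else {
        /* anything else is a one-character delimiter or operator */
        t.text[0] = *p++;
        t.text[1] = '\0';
        t.kind = TOK_DELIM;
    }
    *src = p;
    return t;
}
```

Calling next_token repeatedly on the text "int x1 = 42;" yields a keyword, an identifier, the delimiter =, a number, the delimiter ;, and finally the end-of-input token.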

[Figure: Lexical Analysis - Example]

Syntax Analysis
• It takes the list of tokens produced by lexical analysis.
• These tokens are then arranged in a tree-like structure (the syntax tree), which reflects the
program structure.
• It is also known as parsing.

In syntax analysis, the tokens are grouped together and checked, if they form a valid sequence
as defined in the programming language.

A context-free grammar specifies the rules or productions for identifying constructs that are
valid in a programming language.

The productions can be compared to the rules of grammar in natural languages.
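
As a hedged illustration of checking input against productions, the sketch below recognizes arithmetic expressions with one C function per nonterminal (a recursive-descent style). The grammar, the function names, and the choice to work directly on characters instead of on a token list are assumptions made to keep the example short.

```c
#include <ctype.h>

/* Assumed toy grammar:
 *   expr   -> term   { '+' term }
 *   term   -> factor { '*' factor }
 *   factor -> NUMBER | '(' expr ')'
 */

static const char *cur;                 /* next unread input character */

static int parse_expr(void);

static void skip_spaces(void) { while (*cur == ' ') cur++; }

/* factor -> NUMBER | '(' expr ')' */
static int parse_factor(void) {
    skip_spaces();
    if (isdigit((unsigned char)*cur)) {
        while (isdigit((unsigned char)*cur)) cur++;
        return 1;
    }
    if (*cur == '(') {
        cur++;
        if (!parse_expr()) return 0;
        skip_spaces();
        if (*cur != ')') return 0;      /* missing closing parenthesis */
        cur++;
        return 1;
    }
    return 0;                           /* no production matches */
}

/* term -> factor { '*' factor } */
static int parse_term(void) {
    if (!parse_factor()) return 0;
    skip_spaces();
    while (*cur == '*') {
        cur++;
        if (!parse_factor()) return 0;
        skip_spaces();
    }
    return 1;
}

/* expr -> term { '+' term } */
static int parse_expr(void) {
    if (!parse_term()) return 0;
    skip_spaces();
    while (*cur == '+') {
        cur++;
        if (!parse_term()) return 0;
        skip_spaces();
    }
    return 1;
}

/* Returns 1 if the whole string is a valid expression, 0 otherwise. */
int is_valid_expression(const char *text) {
    cur = text;
    if (!parse_expr()) return 0;
    skip_spaces();
    return *cur == '\0';
}
```

With this grammar, is_valid_expression accepts "(1 + 2) * 3" and rejects "1 + * 3", because no production for factor can start with '*'.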

[Figure: Syntax Analysis - Example]

Semantic Analysis
• It validates the syntax tree by applying the rules of the source language.
• It performs type checking, scope resolution, checks on variable declarations, etc.
• It decorates the syntax tree with data types, values, etc.

In semantic analysis, we check whether the syntactically correct statements are meaningful.
For example, the statement 'x = y + 2;' in the input source program is not meaningful if,
say, x is the name of a function or array and y is a variable of type float. This statement
may be accepted by the productions of the context-free grammar during syntax analysis, but
it is rejected during semantic analysis because the data types of x and y are not compatible.

Intermediate Code Generation
• The program is translated to a simple machine-independent intermediate language.
• Register allocation, which maps the variables of the intermediate code to machine
registers, is done afterwards, as part of generating target code.

Code Optimization
• It aims to reduce the running time of the program.
• It produces more efficient code.
• It is an optional phase.
Typical optimizations include:
• Removing unreachable code.
• Getting rid of unused variables.
• Eliminating multiplication by 1 and addition of 0.
• Moving statements whose results do not change inside a loop out of the loop.
• Common sub-expression elimination.
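
Two of the optimizations listed above, eliminating multiplication by 1 and addition of 0, can be sketched on a small expression tree, together with folding of constant operands. The Node type and the simplify function are hypothetical names for this illustration; the sketch assumes every ADD or MUL node has two non-null children.

```c
#include <stddef.h>

typedef enum { NODE_CONST, NODE_VAR, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;                   /* used when kind == NODE_CONST */
    char name;                   /* used when kind == NODE_VAR */
    struct Node *left, *right;   /* used by NODE_ADD and NODE_MUL */
} Node;

static int is_const(const Node *n, int v) {
    return n != NULL && n->kind == NODE_CONST && n->value == v;
}

/* Simplifies the tree bottom-up and returns the node that replaces n.
 * Nodes are edited in place; nothing is allocated or freed. */
Node *simplify(Node *n) {
    if (n == NULL || n->kind == NODE_CONST || n->kind == NODE_VAR)
        return n;

    n->left = simplify(n->left);
    n->right = simplify(n->right);

    /* Constant folding: both operands are known at compile time. */
    if (n->left->kind == NODE_CONST && n->right->kind == NODE_CONST) {
        int l = n->left->value, r = n->right->value;
        n->value = (n->kind == NODE_ADD) ? l + r : l * r;
        n->kind = NODE_CONST;
        n->left = n->right = NULL;
        return n;
    }

    if (n->kind == NODE_ADD) {
        if (is_const(n->left, 0)) return n->right;    /* 0 + e  =>  e */
        if (is_const(n->right, 0)) return n->left;    /* e + 0  =>  e */
    } else {                                          /* NODE_MUL */
        if (is_const(n->left, 1)) return n->right;    /* 1 * e  =>  e */
        if (is_const(n->right, 1)) return n->left;    /* e * 1  =>  e */
    }
    return n;
}
```

Applied to the tree for (x * 1) + 0, simplify returns the node for x alone; applied to (2 * 3) + 4, it returns a constant node with value 10.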

Code Generation
• Target program is generated in the machine language of the target architecture.
• Memory locations are selected for each variable.
• Instructions are chosen for each operation
• Individual tree nodes are translated into sequences of machine-language instructions.

Symbol Table
• It stores the identifiers found during lexical analysis.
• Type and scope information is added to it during syntax and semantic analysis.
• It is also used for liveness analysis in optimization.
• This information is used in code generation to decide which instructions to use.
• A symbol table is a data structure used by the compiler to record and collect
information about source program constructs such as variable names and their attributes
(for example the name, type, and scope of a variable, and the storage space it occupies).
• A symbol table should be designed so that the compiler can locate the record for each
name quickly and can store or retrieve data from that record quickly.
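
A minimal sketch of such a data structure in C is shown below. It is an assumption-laden illustration: the names SymbolTable, symtab_insert and symtab_lookup are hypothetical, attributes are limited to name, type and scope level, and lookup is a linear search. A real compiler would typically use a hash table for fast access and would also remove entries when a scope is left.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 64

typedef struct {
    char name[32];
    char type[16];
    int scope_level;             /* 0 = global, 1 = first nested block, ... */
} Symbol;

typedef struct {
    Symbol entries[MAX_SYMBOLS];
    int count;
} SymbolTable;

/* Returns the most recently inserted entry for name, or NULL if absent.
 * Searching backwards makes inner declarations hide outer ones. */
Symbol *symtab_lookup(SymbolTable *t, const char *name) {
    for (int i = t->count - 1; i >= 0; i--)
        if (strcmp(t->entries[i].name, name) == 0)
            return &t->entries[i];
    return NULL;
}

/* Adds a declaration. Returns 1 on success, 0 if the table is full, a
 * string is too long, or name is already declared at the same scope level. */
int symtab_insert(SymbolTable *t, const char *name, const char *type, int scope_level) {
    Symbol *s;
    if (t->count >= MAX_SYMBOLS) return 0;
    if (strlen(name) >= sizeof s->name || strlen(type) >= sizeof s->type) return 0;

    s = symtab_lookup(t, name);
    if (s != NULL && s->scope_level == scope_level) return 0;   /* redeclaration */

    s = &t->entries[t->count++];
    strcpy(s->name, name);
    strcpy(s->type, type);
    s->scope_level = scope_level;
    return 1;
}
```

Inserting x as int at level 0 succeeds, a second declaration of x at level 0 is rejected, and a declaration of x as float at level 1 succeeds and is the one that symtab_lookup then returns.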

Mini QUIZ-4 & 5:
https://bit.ly/2Cit03a

Error Handler
• It detects and reports errors during many phases.
• Error detection and error reporting are important functions of the compiler.
• Ex: Invalid character sequences in scanning, invalid token sequences in parsing, type and
scope errors in semantic analysis.
1. In the lexical analysis phase, errors can occur due to misspelled tokens, unrecognized
characters, etc. These are mostly typing errors.
2. In the syntax analysis phase, errors can occur due to syntactic violations of the language.
3. In the intermediate code generation phase, errors can occur due to incompatible operand
types for an operator.
4. In the code optimization phase, errors can occur during control flow analysis due to
unreachable statements.
5. In the code generation phase, errors can occur due to incompatibility with the computer
architecture during the generation of machine code. For example, a constant created by the
compiler may be too large to fit in a word of the target machine.
6. In the symbol table, errors can occur during the bookkeeping routine, due to multiple
declarations of an identifier with ambiguous attributes.

[Figure: Structure of Compiler - Example]

Compiler Construction Tools

• Compiler construction tools were introduced once computers became widespread.
• They are also known as compiler-compilers, compiler generators or translator writing systems.
• These tools may use sophisticated algorithms or specialized languages for specifying and
implementing a compiler component.

Scanner Generators: These generate lexical analyzers.
• The lexical analyzer is built as a finite automaton, generated from a description of the
tokens given as regular expressions.
• Ex: LEX on Unix.

Parser Generators: These automatically produce syntax analyzers (parsers) from a grammatical
description of a programming language. Unix has a tool called YACC, which is a parser generator.
Syntax analysis used to be one of the most difficult phases to develop; with parser
generators it is now much easier to develop and implement.

Automatic Code Generators: These take intermediate code as input and produce machine
language as output.
• They can fetch data from various storage locations such as registers, static memory and
the stack.
• The basic technique used here is 'template matching'.

Data Flow Engines: These are tools used for code optimization.
• The user supplies information, and the intermediate code is analyzed against it to find
the relations between its parts.
• They perform data-flow analysis, i.e. finding out how values are transmitted from one part
of the program to another.

Code-Generator Generators: These produce a code generator from a collection of rules for
translating each operation of the intermediate language into the machine language for a
target machine.

Compiler-Construction Toolkits: These provide an integrated set of routines for constructing
various phases of a compiler.

PROGRAMMING LANGUAGE BASICS

To design an efficient compiler, we should know some language basics.

1. The Static/Dynamic Distinction
2. Environments and States
3. Static Scope and Block Structure
4. Explicit Access Control
5. Dynamic Scope
6. Parameter Passing Mechanisms
7. Aliasing

The Static/Dynamic Distinction
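If a language uses a policy that allows the compiler to decide an issue, we say that the
language uses a static policy, or that the issue can be decided at compile time. On the
other hand, a policy that only allows a decision to be made when we execute the program is
said to be a dynamic policy, or to require a decision at run time.

Example:

public static int x;

In Java, the keyword static makes x a class variable: there is only one copy of x, and the
compiler can determine the location in memory where it is held.
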
Environments and States
As a program runs, changes can affect either the values of data elements or the
interpretation of names for that data. For example, the execution of an assignment
such as x = y + 1 changes the value denoted by the name x. More specifically, the assignment
changes the value in whatever location is denoted by x.

[Figure: example program, with output 13 11 13 11]

Static Scope and Block Structure
The scope rules for C are based on program structure; the scope of a declaration is determined implicitly by where
the declaration appears in the program. Later languages, such as C++, Java, and C#, also provide explicit control
over scopes through the use of keywords like public, private, and protected.

A block is a grouping of declarations and statements. C uses braces { and } to delimit a block; some other
languages use begin and end instead.
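
A short C sketch of block structure follows. The function name block_scope_demo and the labels D1 and D2 are made up for this illustration; the point is only that each use of x refers to the declaration in the innermost enclosing block.

```c
/* Returns 10 * (x seen inside the inner block) + (x seen after the block). */
int block_scope_demo(void) {
    int x = 1;              /* declaration D1: visible in the whole function body */
    int inner, outer;
    {
        int x = 2;          /* declaration D2: hides D1 inside this block only */
        inner = x;          /* refers to D2, so inner is 2 */
    }
    outer = x;              /* D2 is out of scope here; refers to D1, so outer is 1 */
    return inner * 10 + outer;
}
```

The function returns 21: the inner block sees the x declared in that block, and after the closing brace the name x once again refers to the outer declaration. Which declaration applies is decided from the program text alone, at compile time.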

Explicit Access Control

Classes and structures introduce a new scope for their members. If p is an object of a class with a field (member) x, then the use
of x in p.x refers to field x in the class definition. The scope of a member declaration x in a class C extends to any subclass C',
except if C' has a local declaration of the same name x.

Through the use of keywords like public, private, and protected, object-oriented languages such as C++ or Java provide
explicit control over access to member names in a superclass. These keywords support encapsulation by restricting access.

Dynamic Scope
Technically, any scoping policy is dynamic if it is based on factor(s) that can be known only when the program executes. The
term dynamic scope, however, usually refers to the following policy: a use of a name x refers to the declaration of x in the most
recently called procedure with such a declaration.
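
C itself is statically scoped, so the policy above has to be simulated to be shown in C. The sketch below keeps an explicit stack of bindings for the name x: a procedure that "declares" x pushes a binding on entry and pops it on exit, and a use of x reads the top of the stack. All names here (declare_x, leave_x, use_x, show, proc_p, proc_q, run) are hypothetical helpers for this illustration.

```c
#define MAX_DEPTH 16

static int x_bindings[MAX_DEPTH];   /* stack of active declarations of x */
static int x_depth = 0;

/* Enter a procedure that declares x (assumes fewer than MAX_DEPTH active). */
static void declare_x(int v) { x_bindings[x_depth++] = v; }

/* Leave that procedure: its declaration of x disappears. */
static void leave_x(void) { x_depth--; }

/* A use of x: the declaration in the most recently called procedure that
 * has one (assumes at least one declaration is active). */
static int use_x(void) { return x_bindings[x_depth - 1]; }

/* show uses x but has no declaration of its own. */
int show(void) { return use_x(); }

int proc_p(void) { int r; declare_x(1); r = show(); leave_x(); return r; }
int proc_q(void) { int r; declare_x(2); r = show(); leave_x(); return r; }

/* Outermost declaration x = 0, then show is reached three different ways. */
void run(int out[3]) {
    declare_x(0);
    out[0] = show();     /* called directly: sees the outermost x */
    out[1] = proc_p();   /* called through proc_p: sees proc_p's x */
    out[2] = proc_q();   /* called through proc_q: sees proc_q's x */
    leave_x();
}
```

After run(out), the array holds 0, 1, 2. Under static scope, show would refer to the same declaration of x in all three calls; under dynamic scope the answer depends on which procedure called it.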

Parameter Passing Mechanisms
All programming languages have a notion of a procedure, but they can differ in how these procedures get their arguments.

In call-by-value, the actual parameter is evaluated (if it is an expression) or copied (if it is a variable). The value is placed in the
location belonging to the corresponding formal parameter of the called procedure. This method is used in C and Java.

In call-by-reference, the address of the actual parameter is passed to the callee as the value of the corresponding formal parameter.
Uses of the formal parameter in the code of the callee are implemented by following this pointer to the location indicated by the
caller. Changes to the formal parameter thus appear as changes to the actual parameter.

Call by value

void swap(int a, int b)
{
    int temp = a;
    a = b;
    b = temp;
}

When called with swap(x, y), x and y remain the same.

Call by reference

void swap(int *a, int *b)
{
    int temp = *a;
    *a = *b;
    *b = temp;
}

When called with swap(&x, &y), where x and y are int values, a points to x and b points
to y, so the values pointed to are swapped, not the pointers themselves.

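Call by value: worked example

void swap(int a, int b)
{
    int temp;
    temp = a;      /* only the local copies a and b are exchanged */
    a = b;
    b = temp;
}

int main(void)
{
    int x = 10, y = 5;
    swap(x, y);    /* x is still 10 and y is still 5 afterwards */
    return 0;
}

[Figure: x = 10 and y = 5 in main; a and b in swap are separate copies holding 10 and 5]
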
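Call by reference: worked example

void swap(int *a, int *b)
{
    int temp;
    temp = *a;     /* *a is x and *b is y, so the caller's variables change */
    *a = *b;
    *b = temp;
}

int main(void)
{
    int x = 10, y = 5;
    swap(&x, &y);  /* x is 5 and y is 10 afterwards */
    return 0;
}

[Figure: x = 10 is stored at address 1400 and y = 5 at address 1500; a holds 1400 and b holds 1500, so *a is x and *b is y]
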
Aliasing
There is an interesting consequence of call-by-reference parameter passing or its simulation, as in Java, where references to
objects are passed by value. It is possible that two formal parameters can refer to the same location; such variables are said to
be aliases of one another. As a result, any two variables, which may appear to take their values from two distinct formal
parameters, can become aliases of each other.

Example: Suppose a is an array belonging to a procedure p, and p calls another procedure q(x, y) with a call q(a, a). Suppose
also that parameters are passed by value, but that array names are really references to the location where the array is stored, as in
C or similar languages. Now, x and y have become aliases of each other. The important point is that if within q there is an
assignment x[10] = 2, then the value of y[10] also becomes 2.
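
The example can be written out in C roughly as follows. The array size of 16 and the function name p_demo, standing in for procedure p, are assumptions made for this sketch.

```c
/* q receives two array parameters; in C each one is really a pointer to the
 * first element of the caller's array. */
int q(int x[], int y[]) {
    x[10] = 2;          /* write through x ... */
    return y[10];       /* ... and read the same element through y */
}

/* Plays the role of procedure p: it owns the array a and calls q(a, a). */
int p_demo(void) {
    int a[16] = {0};
    return q(a, a);     /* x and y are aliases of each other inside q */
}
```

Here p_demo returns 2, because x and y refer to the same storage. If q is called with two distinct arrays, the write through x does not affect y. A compiler that optimizes q therefore cannot assume that y[10] is unchanged by the assignment to x[10].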

Mini QUIZ-6:
https://bit.ly/2XQXbpP
