
TK3163 – Compiler Construction

1: Introduction to Compilers

Dr. Bahari Idrus


Agenda for Today
• Introductions
• Language processors
• Why study compilers
• The structure of a compiler
  • Front end
  • Back end
Introduction
• Which one is correct?
A. Nama saya Ahmad. (Malay: "My name is Ahmad.")
B. Na masa yaAh mad.
C. Ahmad saya Nama. (word for word: "Ahmad my name")

9/30/2002 © 2002 Hal Perkins & UW CSE A-3


Introduction
• Which one is a correct addition operation?
A. 12 + 3 = 15
B. X+Y=Z
C. 2x + Y = Z
D. +a=bc



Introduction
if x == y then
    z = 1;
else
    z = 2;



Introduction
• Programming languages are notations for describing computations to people and to machines (computers).
• Before a program can be run, it must first be translated into a form that a computer can execute.
• The software systems that do this translation are called compilers.
Language Processors – What is a compiler?
• A program that reads a program written in one language (the source language) and translates it into an equivalent program in another language (the target language).

Source language → Compiler → Target language

• An important role of the compiler is to report any errors in the source program that it detects during the translation process.
Language Processors (cont.)
• If the target program is an executable machine-language program, the user can then call it to process inputs and produce outputs.

Input → Target program → Output
Language Processors – Interpreter
• An interpreter is another common kind of language processor.
• Instead of producing a target program as a translation, an interpreter appears to directly execute the operations specified in the source program on inputs supplied by the user.

Source program, Input → Interpreter → Output, Error messages
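To make the compiler/interpreter contrast concrete, here is a minimal Python sketch over a hypothetical toy language of arithmetic expressions written as nested tuples. `compile_expr` translates the expression once into a target program for an assumed stack machine (`PUSH`, `LOAD` and `OP` are made-up instructions), which `run` then executes on the user's input; `interpret` instead executes the source tree directly. It illustrates the idea only and is not how any particular compiler or interpreter is built.

```python
# Toy source language (hypothetical, for illustration only): an expression is an
# int literal, a variable name (str), or a tuple (op, left, right).
OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def interpret(expr, env):
    """Interpreter: walk the source tree and execute it directly on the input env."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    return OPS[op](interpret(left, env), interpret(right, env))


def compile_expr(expr):
    """Compiler: translate the tree once into a target program for a stack machine."""
    if isinstance(expr, int):
        return [("PUSH", expr)]
    if isinstance(expr, str):
        return [("LOAD", expr)]
    op, left, right = expr
    if op not in OPS:
        # Errors in the source program are reported during translation.
        raise SyntaxError(f"unknown operator {op!r}")
    return compile_expr(left) + compile_expr(right) + [("OP", op)]


def run(program, env):
    """Execute a compiled target program on the user's input env."""
    stack = []
    for instr, arg in program:
        if instr == "PUSH":
            stack.append(arg)
        elif instr == "LOAD":
            stack.append(env[arg])
        else:  # "OP": pop the right operand, then the left, push the result
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[arg](left, right))
    return stack.pop()


source = ("+", "x", ("*", "y", 2))          # x + y * 2
target = compile_expr(source)                # translate once ...
print(run(target, {"x": 1, "y": 3}))         # ... then run on any input: 7
print(interpret(source, {"x": 1, "y": 3}))   # 7
```

The compiled program can be run again on different inputs without re-reading the source, which is one reason compiled code tends to map inputs to outputs faster.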
Common Issues
• Compilers and interpreters must both read the input (a stream of characters) and “understand” it; this is analysis.

w h i l e ( k < l e n g t h ) { <nl> <tab> i f ( a [ k ] > 0 ) <nl> <tab> <tab> { n P o s + + ; } <nl> <tab> }
Typical Implementations
• Compilers
  • FORTRAN, C, C++, Java, COBOL, etc.
  • The target program produced by a compiler is usually much faster than an interpreter at mapping inputs to outputs.
• Interpreters
  • Perl, Python, PostScript printers, the Java VM
  • An interpreter usually gives better error diagnostics than a compiler, because it executes the source program statement by statement.
Task 1

1. What is the difference between a compiler and an interpreter?
2. What are the advantages of (a) a compiler over an interpreter, and (b) an interpreter over a compiler?
Why Study Compilers? (1)
• Why study compilers?
  • Become a better programmer(!)
  • Insight into the interaction between languages, compilers, and hardware
  • Understanding of implementation techniques
  • Better intuition about what your code does
Why Study Compilers? (2)
• Compiler techniques are everywhere
  • Parsing (little languages, interpreters)
  • Database engines
  • AI: domain-specific languages
  • Text processing
    • TeX/LaTeX -> DVI -> PostScript -> PDF
  • Hardware: VHDL; model-checking tools
  • Mathematics (Mathematica, MATLAB)
Why Study Compilers? (3)
• A fascinating blend of theory and engineering
  • Direct applications of theory to practice
    • Parsing, scanning, static analysis
  • Some very difficult problems (NP-hard or worse)
    • Resource allocation, “optimization”, etc.
    • Need to come up with good-enough solutions
Structure of a compiler
• Front end: analysis
  • Reads the source program and understands its structure and meaning
• Back end: synthesis
  • Generates an equivalent target-language program

Source language → Front end (language specific) → Intermediate language → Back end (machine specific) → Target language
Compiler Architecture
Front end (language specific): analysis
Source language → Scanner (lexical analysis) → tokens → Parser (syntax analysis) → syntactic structure → Semantic Analysis (IC generator) → Intermediate language
Back end (machine specific): synthesis
Intermediate language → Code Optimizer → Code Generator → Target language
All phases share the Symbol Table.
Implications
• Must generate correct code
• Must manage storage of all variables
• Must agree with the OS & linker on the target format

Source → Front End → Back End → Target


More Implications
• Need some sort of intermediate representation (IR)
• Front end maps source into IR
• Back end maps IR to target machine code

Source → Front End → IR → Back End → Target

Front End
source → Scanner → tokens → Parser → IR
• Split into two parts
  • Scanner: responsible for converting the character stream to a token stream
    • Also strips out white space and comments
  • Parser: reads the token stream and generates IR
• Both of these can be generated automatically
  • The source language is specified by a formal grammar
  • Tools read the grammar and generate the scanner & parser (either table-driven or hard-coded)
Lexical Analysis - Scanning
[Figure: compiler architecture, Scanner (lexical analysis) highlighted; it reads the source language and passes tokens to the Parser, consulting the Symbol Table]
• Tokens are described formally
• Breaks the input into tokens
• Removes white space
The lexical analyzer reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes. For each lexeme, the lexical analyzer produces as output a token of the form:
<token-name, attribute-value>
Tokens
• Token stream: each significant lexical chunk of the program is represented by a token
  • Operators & punctuation: {}[]!+-=*;: …
  • Keywords: if, while, return, goto
  • Identifiers: id & actual name
  • Constants: kind & value; int, floating-point, character, string, …
Scanner Example (1)

Source program ---------------------------- lexical analysis

position:=initial+rate*60 <id,1><=><id,2><+><id,3><*><60>
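A minimal sketch of a scanner for this example, assuming a tiny token set (identifiers, integers, `:=` or `=`, and `+ - * /`). Each identifier lexeme is entered into a symbol table (here just a Python list), and the token's attribute value is its 1-based position in that table; following the slide, the assignment operator is emitted as `<=>`. The regular-expression approach and the names `TOKEN_RE` and `scan` are illustrative choices, not part of the original notes.

```python
import re

# Assumed token classes for this example; the alternation order matters.
TOKEN_RE = re.compile(
    r"(?P<id>[A-Za-z_]\w*)|(?P<num>\d+)|(?P<assign>:=|=)|(?P<op>[+\-*/])"
)


def scan(source):
    """Return (tokens, symbol_table) for a simple assignment statement."""
    symtab = []   # symbol table: entry k (1-based) is the k-th distinct identifier
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():            # white space only separates lexemes
            pos += 1
            continue
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "id":
            if lexeme not in symtab:
                symtab.append(lexeme)         # install a new identifier
            tokens.append(f"<id,{symtab.index(lexeme) + 1}>")
        elif kind == "assign":
            tokens.append("<=>")
        else:                                 # numbers and operators stand for themselves
            tokens.append(f"<{lexeme}>")
    return tokens, symtab


tokens, symtab = scan("position:=initial+rate*60")
print("".join(tokens))  # <id,1><=><id,2><+><id,3><*><60>
print(symtab)           # ['position', 'initial', 'rate']
```

A repeated identifier reuses its existing entry, so in `a = a + b` both occurrences of `a` become `<id,1>`.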
Scanner Example (2)
• Input text
// this statement does very little
if (x >= y) y = 42;
• Token stream
IF LPAREN ID(x) GEQ ID(y) RPAREN ID(y) BECOMES INT(42) SCOLON
• Note: tokens are atomic items, not character strings
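The same idea applied to the second example, which adds keywords, a two-character operator and a comment. This is a sketch assuming a small token set chosen to cover just this input: the patterns are tried in order at the current position, so `>=` is listed before `>`, and reserved words are recognized by looking identifiers up in a keyword table. The names `TOKEN_SPEC`, `KEYWORDS` and `tokenize` are made up for the example, and tokens are rendered as strings only so the output matches the slide.

```python
import re

KEYWORDS = {"if": "IF", "while": "WHILE", "return": "RETURN", "goto": "GOTO"}

# (token name, regex) pairs, tried in order; longer operators come first.
TOKEN_SPEC = [
    ("SKIP",    r"\s+|//[^\n]*"),   # white space and // comments produce no token
    ("INT",     r"\d+"),
    ("NAME",    r"[A-Za-z_]\w*"),
    ("GEQ",     r">="),
    ("GT",      r">"),
    ("EQ",      r"=="),
    ("BECOMES", r"="),
    ("LPAREN",  r"\("),
    ("RPAREN",  r"\)"),
    ("SCOLON",  r";"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at offset {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "SKIP":
            continue
        if kind == "NAME":          # reserved word or identifier
            tokens.append(KEYWORDS.get(lexeme, f"ID({lexeme})"))
        elif kind == "INT":
            tokens.append(f"INT({lexeme})")
        else:                       # operators and punctuation: the name is enough
            tokens.append(kind)
    return tokens


source = "// this statement does very little\nif (x >= y) y = 42;"
print(" ".join(tokenize(source)))
# IF LPAREN ID(x) GEQ ID(y) RPAREN ID(y) BECOMES INT(42) SCOLON
```

A real scanner would usually return (kind, value) records rather than strings, in keeping with the note that tokens are atomic items.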
Syntax Analysis - Parsing
[Figure: compiler architecture, Parser (syntax analysis) highlighted; it reads tokens from the Scanner and produces the syntactic structure]
• Tokens are organized into a syntax tree that describes the program's structure
• Error checking (syntax)
• A common output from a parser is an abstract syntax tree
Parser Example (1)

lexical analysis ---------------------------- syntax analysis

<id,1><=><id,2><+><id,3><*><60>

<id,1> +

<id,2> *

<id,3> 60
Parser Example (2)
• Token stream input → abstract syntax tree

IF LPAREN ID(x) GEQ ID(y) RPAREN ID(y) BECOMES INT(42) SCOLON

            ifStmt
           /      \
         >=        assign
        /  \       /     \
    ID(x)  ID(y) ID(y)   INT(42)
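A recursive-descent parser is one straightforward way to build this tree. The sketch below assumes a deliberately tiny grammar that covers only this example (an `if` with a `>=` test, and assignment of an identifier or integer); tokens are (kind, value) pairs, and the tree is nested tuples whose labels mirror the slide (`ifStmt`, `>=`, `assign`). The grammar and the `Parser` class are illustrative assumptions, not a description of a particular compiler.

```python
# Token stream for  if (x >= y) y = 42;  as (kind, value) pairs.
TOKENS = [("IF", None), ("LPAREN", None), ("ID", "x"), ("GEQ", None),
          ("ID", "y"), ("RPAREN", None), ("ID", "y"), ("BECOMES", None),
          ("INT", 42), ("SCOLON", None)]


class Parser:
    """Recursive-descent parser for a tiny grammar assumed for this example:
         stmt ::= IF LPAREN expr GEQ expr RPAREN stmt
                | ID BECOMES expr SCOLON
         expr ::= ID | INT
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Kind of the next token, or "EOF" at the end of the stream."""
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def expect(self, kind):
        """Consume a token of the given kind and return its value."""
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()}")
        value = self.tokens[self.pos][1]
        self.pos += 1
        return value

    def stmt(self):
        if self.peek() == "IF":
            self.expect("IF")
            self.expect("LPAREN")
            left = self.expr()
            self.expect("GEQ")
            right = self.expr()
            self.expect("RPAREN")
            return ("ifStmt", (">=", left, right), self.stmt())
        target = self.expect("ID")
        self.expect("BECOMES")
        value = self.expr()
        self.expect("SCOLON")
        return ("assign", ("ID", target), value)

    def expr(self):
        kind = self.peek()
        if kind not in ("ID", "INT"):
            raise SyntaxError(f"expected ID or INT, found {kind}")
        return (kind, self.expect(kind))


print(Parser(TOKENS).stmt())
# ('ifStmt', ('>=', ('ID', 'x'), ('ID', 'y')), ('assign', ('ID', 'y'), ('INT', 42)))
```

Each grammar rule becomes one method, and the parser chooses an alternative by peeking at the next token; a syntax error is reported as soon as an unexpected token appears.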


Parser Example (3)
Grammar:
Exp    ::= Exp '+' Exp
         | Exp '-' Exp
         | Exp '*' Exp
         | Exp '/' Exp
         | ID
Assign ::= ID '=' Exp

Parse tree for ID = ID + ID * ID / ID:
Assign
  ID
  '='
  Exp
    Exp → ID
    '+'
    Exp
      Exp → ID
      '*'
      Exp
        Exp → ID
        '/'
        Exp → ID
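This grammar is ambiguous: it does not say whether `*` binds tighter than `+`, nor how `a - b - c` groups, so one token string can have several parse trees. A common fix, sketched below, is to stratify the grammar into precedence levels (`Exp`, `Term`, `Factor`) and parse it by recursive descent, using loops to get left associativity. The stratified grammar, allowing numbers as factors, the crude regex scanner and the name `parse_assign` are all assumptions for illustration; applied to `position = initial + rate * 60` it builds the same tree as Parser Example (1).

```python
import re

OPERATORS = {"=", "+", "-", "*", "/"}


def parse_assign(text):
    """Parse  Assign ::= ID '=' Exp  with the stratified grammar
         Exp    ::= Term   (('+' | '-') Term)*
         Term   ::= Factor (('*' | '/') Factor)*
         Factor ::= ID | NUM
    so '*' and '/' bind tighter than '+' and '-', and all four group left to right."""
    toks = re.findall(r"\d+|\w+|[=+\-*/]", text)   # crude scanner for this sketch
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def advance():
        nonlocal pos
        if pos >= len(toks):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return toks[pos - 1]

    def factor():
        if peek() is None or peek() in OPERATORS:
            raise SyntaxError(f"expected identifier or number, found {peek()!r}")
        tok = advance()
        return int(tok) if tok.isdigit() else tok

    def term():
        node = factor()
        while peek() in ("*", "/"):
            node = (advance(), node, factor())   # left-associative
        return node

    def exp():
        node = term()
        while peek() in ("+", "-"):
            node = (advance(), node, term())
        return node

    target = advance()
    if advance() != "=":
        raise SyntaxError("expected '='")
    tree = ("=", target, exp())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse_assign("position = initial + rate * 60"))
# ('=', 'position', ('+', 'initial', ('*', 'rate', 60)))
```

Note that this parser groups `ID * ID / ID` as `(ID * ID) / ID`, one of the trees the ambiguous grammar also allows; the stratified grammar simply picks one of them by design.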
Semantic Analysis
[Figure: compiler architecture, Semantic Analysis (IC generator) highlighted; it takes the syntactic structure from the Parser and passes a syntactic/semantic structure to the Code Optimizer]
• “Meaning”
• Type/error checking
• Intermediate code generation (for an abstract machine)
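Type checking is a typical semantic check: the tree may be syntactically fine yet combine operands whose types do not fit. As a sketch, assume a language with just `int` and `float`, where an `int` operand meeting a `float` is widened by inserting an `int2fp` node (the conversion that appears in the phases table at the end of these notes), and assigning a float to an int variable is reported as an error. The type rules, the `SYMBOLS` table and the function `check` are hypothetical.

```python
# Hypothetical declarations: B is an int, A and C are floats, which is what the
# int2fp conversion in the phases-table sample implies.
SYMBOLS = {"A": "float", "B": "int", "C": "float"}


def check(node, symbols):
    """Type-check a tree of nested tuples; return (annotated_tree, type)."""
    if isinstance(node, str):                        # identifier
        if node not in symbols:
            raise TypeError(f"undeclared identifier {node!r}")
        return node, symbols[node]
    op, left, right = node
    left, left_type = check(left, symbols)
    right, right_type = check(right, symbols)
    if op == "=":
        if left_type == "int" and right_type == "float":
            raise TypeError("cannot assign a float to an int variable")
        if left_type == "float" and right_type == "int":
            right = ("int2fp", right)                # widen the value being stored
        return (op, left, right), left_type
    if left_type != right_type:                      # mixed int/float arithmetic
        if left_type == "int":
            left = ("int2fp", left)
        else:
            right = ("int2fp", right)
        return (op, left, right), "float"
    return (op, left, right), left_type


tree, ty = check(("=", "A", ("+", "B", "C")), SYMBOLS)
print(tree, ty)   # ('=', 'A', ('+', ('int2fp', 'B'), 'C')) float
```

Undeclared identifiers and the float-to-int assignment are reported as errors here; other languages choose different coercion rules.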
Back End
• Responsibilities
  • Translate IR into target machine code
  • Should produce fast, compact code
  • Should use machine resources effectively
    • Registers
    • Instructions
    • Memory hierarchy



Back End Structure
• Typically split into two major parts, each with subphases
  • “Optimization”: code improvements
    • May well translate the parser's IR into another IR
  • Code generation
    • Instruction selection & scheduling
    • Register allocation
The Result
• Input
if (x >= y)
    y = 42;
• Output
    mov eax,[ebp+16]
    cmp eax,[ebp-8]
    jl  L17
    mov [ebp-8],42
L17:
Optimization

[Figure: compiler architecture, Code Optimizer highlighted; it improves the syntactic/semantic structure before code generation]
• Improving efficiency (machine independent)
• Finding optimal code is NP-hard or worse
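Constant folding is one common machine-independent improvement: an operation whose operands are all known at compile time is evaluated once by the compiler instead of every time the program runs. Below is a minimal sketch over expression trees written as nested tuples (an assumed representation); division is deliberately left out so the sketch never has to decide what to do about division by zero at compile time.

```python
import operator

# Operators this sketch is willing to evaluate at compile time (no division).
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Bottom-up constant folding over nested-tuple (op, left, right) trees."""
    if not isinstance(node, tuple):                  # name or numeric constant
        return node
    op, left, right = node
    left, right = fold(left), fold(right)            # fold the children first
    if (op in FOLDABLE
            and isinstance(left, (int, float))
            and isinstance(right, (int, float))):
        return FOLDABLE[op](left, right)             # compute the value now
    return (op, left, right)


# position = initial + rate * (60 * 2): the product 60 * 2 becomes 120
print(fold(("=", "position", ("+", "initial", ("*", "rate", ("*", 60, 2))))))
# ('=', 'position', ('+', 'initial', ('*', 'rate', 120)))
```

Written without parentheses, `rate * 60 * 2` groups as `(rate * 60) * 2` and is not folded by this sketch; catching it needs reassociation, a separate transformation that is not always safe for floating point.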
Code Generation
[Figure: compiler architecture, Code Generator highlighted; it produces the target language]
• Translates intermediate code (IC) into real machine code
• Memory management, register allocation, instruction selection, instruction scheduling, …
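A minimal code-generation sketch: it translates quadruples (op, arg1, arg2, result) into assembly for a hypothetical load/store machine. The mnemonics `LOAD`, `STORE`, `ADD`, `CVTIF` and so on are invented for illustration and are not a real instruction set. Instruction selection is a table lookup, and register allocation is as naive as it gets: every value gets a fresh register, with no reuse and no limit. Real code generators also schedule instructions and must fit values into a finite register set, spilling to memory when they run out.

```python
import itertools

# Hypothetical opcodes: the instruction-selection table for this sketch.
OPCODE = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV", "int2fp": "CVTIF"}


def codegen(quads):
    """Translate (op, arg1, arg2, result) quads into hypothetical assembly."""
    asm = []
    reg_of = {}                                      # temporary name -> register
    regs = (f"r{i}" for i in itertools.count(1))     # unlimited fresh registers

    def operand(name):
        """Register holding `name`; variables and constants are loaded first."""
        if name in reg_of:
            return reg_of[name]
        reg = next(regs)
        asm.append(f"LOAD {reg}, {name}")
        return reg

    for op, arg1, arg2, result in quads:
        if op == ":=":                               # copy into a named variable
            asm.append(f"STORE {result}, {operand(arg1)}")
        elif arg2 is None:                           # unary operator such as int2fp
            src = operand(arg1)
            dst = next(regs)
            asm.append(f"{OPCODE[op]} {dst}, {src}")
            reg_of[result] = dst
        else:                                        # binary operator
            src1, src2 = operand(arg1), operand(arg2)
            dst = next(regs)
            asm.append(f"{OPCODE[op]} {dst}, {src1}, {src2}")
            reg_of[result] = dst
    return asm


quads = [("int2fp", "B", None, "t1"), ("+", "t1", "C", "t2"), (":=", "t2", None, "A")]
print("\n".join(codegen(quads)))
# LOAD r1, B
# CVTIF r2, r1
# LOAD r3, C
# ADD r4, r2, r3
# STORE A, r4
```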
Translation of a Statement
The Phases of a Compiler
(Phase: output, followed by the sample for the statement A=B+C;)

Programmer: source string
    A=B+C;
Scanner (performs lexical analysis): token string, and a symbol table for identifiers
    'A', '=', 'B', '+', 'C', ';'
Parser (performs syntax analysis based on the grammar of the programming language): parse tree or abstract syntax tree
        ;
        |
        =
       / \
      A   +
         / \
        B   C
Semantic analyzer (type checking, etc.): parse tree or abstract syntax tree
Intermediate code generator: three-address code, quads, or RTL
    int2fp B t1
    + t1 C t2
    := t2 A
Optimizer: three-address code, quads, or RTL
    int2fp B t1
    + t1 #2.3 A
Code generator: assembly code
    MOVF #2.3,r1
    ADDF2 r1,r2
    MOVF r2,A
Peephole optimizer: assembly code
    ADDF2 #2.3,r2
    MOVF r2,A
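The intermediate code generator row of the table can be reproduced with a short tree walk. This sketch assumes the type-checked tree for `A = B + C` is given as nested tuples with the `int2fp` conversion already inserted by semantic analysis; each operator node gets a fresh temporary (`t1`, `t2`, and so on), and each instruction is stored as a quadruple (op, arg1, arg2, result). The tree encoding and the names `gen_quads` and `show` are illustrative.

```python
import itertools


def gen_quads(tree):
    """Return quads (op, arg1, arg2, result) for a type-checked expression tree.

    Trees are nested tuples: ("=", target, expr), (op, left, right),
    ("int2fp", expr), or a plain identifier string."""
    quads = []
    temps = (f"t{i}" for i in itertools.count(1))   # fresh temporaries t1, t2, ...

    def gen(node):
        """Emit code for `node` and return the name that holds its value."""
        if isinstance(node, str):                   # identifier: already has a name
            return node
        if node[0] == "int2fp":                     # unary conversion
            arg = gen(node[1])
            result = next(temps)
            quads.append(("int2fp", arg, None, result))
            return result
        op, left, right = node
        if op == "=":                               # assignment: copy into the target
            quads.append((":=", gen(right), None, left))
            return left
        arg1, arg2 = gen(left), gen(right)
        result = next(temps)
        quads.append((op, arg1, arg2, result))
        return result

    gen(tree)
    return quads


def show(quads):
    """Table-style form: the op followed by the non-empty fields."""
    return [" ".join(field for field in quad if field is not None) for quad in quads]


for line in show(gen_quads(("=", "A", ("+", ("int2fp", "B"), "C")))):
    print(line)
# int2fp B t1
# + t1 C t2
# := t2 A
```

The optimizer row then shortens this sequence, for example by storing the sum straight into `A` instead of going through `t2`.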
