
Compiler Construction

Vana Doufexi

1
Administrative info
 Instructor
 Name: Vana Doufexi
 E-mail: vdoufexi@cs.northwestern.edu
 Office: Ford Building, #2-229
 Hours: E-mail to set up appointment

 Teaching Assistant
 TBA

2
Administrative info
 Course webpage
 http://www.cs.northwestern.edu/academics/courses/322
 contains:
 news
 staff information
 lecture notes & other handouts
 homeworks & manuals
 policies, grades
 newsgroup info
 useful links

 Newsgroup
 Name: cs.322
 nntp: news.cs.northwestern.edu
3
What is a compiler
 A program that reads a program written in some
language and translates it into a program written in
some other language
 Modula-2 to C
 Java to bytecodes
 COOL to MIPS code

 How was the first compiler created?

4
Why study compilers?
 Application of a wide range of theoretical techniques
 Data Structures
 Theory of Computation
 Algorithms
 Computer Architecture

 Good software engineering experience


 Better understanding of programming languages

5
Features of compilers
 Correctness
 preserve the meaning of the code

 Speed of target code


 Speed of compilation
 Good error reporting/handling
 Cooperation with the debugger
 Support for separate compilation

6
Compiler structure

source code → Front End → IR → Back End → target code

 Use intermediate representation


 Why?

7
Compiler Structure
 Front end
 Recognize legal/illegal programs
 report/handle errors

 Generate IR
 The process can be automated

 Back end
 Translate IR into target code
 instruction selection

 register allocation

 instruction scheduling

 many of these problems are NP-complete, so compilers use approximations (heuristics)

8
Compiler Structure
 Optimization
 goals
 improve running time of generated code

 improve space, power consumption, etc.

 how?
 perform a number of transformations on the IR

 multiple passes

 important: preserve meaning of code
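
One meaning-preserving transformation on an IR can be sketched as follows. Constant folding is chosen here purely as an illustration; the slides do not name a specific optimization, and the nested-tuple IR and the `fold_constants` name are assumptions.

```python
import operator

# Assumed IR: an expression is an int constant, a variable name (str),
# or a tuple (op, left, right).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(node):
    """Replace sub-expressions whose operands are both constants by their value."""
    if not isinstance(node, tuple):
        return node                      # constant or variable: nothing to do
    op, left, right = node
    left = fold_constants(left)
    right = fold_constants(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)      # evaluate at compile time
    return (op, left, right)

print(fold_constants(("+", "x", ("*", 2, 3))))   # ('+', 'x', 6)
```

The folded expression computes the same value as the original for every value of `x`, which is the "preserve meaning" requirement in miniature.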

9
The Front End
 Scanning (a.k.a. lexical analysis)
 recognize "words" (tokens)

 Parsing (a.k.a. syntax analysis)


 check syntax

 Semantic analysis
 examine meaning (e.g. type checking)

 Other issues:
 symbol table (to keep track of identifiers)
 error detection/reporting/recovery

10
The Scanner
 Its job:
 given a character stream, recognize words (tokens)
 e.g. x = 1 becomes IDENTIFIER EQUAL INTEGER

 collect identifier information


 e.g. IDENTIFIER corresponds to a lexeme (the actual
word x) and its type (acquired from the declaration of x).
 ignore white space and comments
 report errors

 Good news
 the process can be automated
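
As a minimal sketch of the scanner's job, the Python below turns a character stream into (token, lexeme) pairs using the standard `re` module. The token names follow the `x = 1` example above; the `#` comment syntax, the `TOKEN_SPEC` table, and the `scan` function are assumptions made for illustration. It does not attach a type to each identifier, since that would need information from declarations.

```python
import re

# Order matters only where patterns overlap; these do not.
TOKEN_SPEC = [
    ("INTEGER",    r"\d+"),
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),
    ("EQUAL",      r"="),
    ("SKIP",       r"[ \t\n]+"),     # white space is ignored
    ("COMMENT",    r"#[^\n]*"),      # assumed comment syntax, also ignored
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(text):
    """Return a list of (token_kind, lexeme) pairs; raise on an illegal character."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup not in ("SKIP", "COMMENT"):
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(scan("x = 1"))
# [('IDENTIFIER', 'x'), ('EQUAL', '='), ('INTEGER', '1')]
```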

11
The Parser
 Its job:
 Check and verify syntax based on specified syntax rules
 e.g. IDENTIFIER LPAREN RPAREN make up an
EXPRESSION.
 Coming soon: how context-free grammars specify syntax

 Report errors
 Build IR
 often a syntax tree

 Good news
 the process can be automated
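
The parser's job can be sketched with a small hand-written recursive-descent parser. The two grammar rules in the comments are assumptions chosen to include the IDENTIFIER LPAREN RPAREN example above; the tuple-shaped syntax tree and the `parse_statement` name are likewise hypothetical.

```python
def parse_statement(tokens):
    """Parse a list of (kind, lexeme) tokens for: statement -> IDENTIFIER EQUAL expression."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            found = peek() or "end of input"
            raise SyntaxError(f"expected {kind}, found {found}")
        lexeme = tokens[pos][1]
        pos += 1
        return lexeme

    def expression():
        # expression -> INTEGER | IDENTIFIER | IDENTIFIER LPAREN RPAREN
        if peek() == "INTEGER":
            return ("int", expect("INTEGER"))
        name = expect("IDENTIFIER")
        if peek() == "LPAREN":
            expect("LPAREN")
            expect("RPAREN")
            return ("call", name)
        return ("var", name)

    target = expect("IDENTIFIER")
    expect("EQUAL")
    value = expression()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos][0]}")
    return ("assign", target, value)       # the syntax tree for the statement

tokens = [("IDENTIFIER", "x"), ("EQUAL", "="),
          ("IDENTIFIER", "f"), ("LPAREN", "("), ("RPAREN", ")")]
print(parse_statement(tokens))             # ('assign', 'x', ('call', 'f'))
```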

12
Semantic analysis
 Its job:
 Check the meaning of the program
 e.g. In x=y, is y defined before being used? Are x and y
declared?
 e.g. In x=y, are the types of x and y such that you can
assign one to the other?
 Meaning may depend on context
 Report errors
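
The two checks for `x=y` can be sketched against a symbol table, modeled here as a plain dictionary from identifier to type. The type names and the rule that an int may be assigned to a float are assumptions for illustration, not rules of any particular language.

```python
def check_assignment(target, source, symbols):
    """Return a list of error messages for the statement `target = source`."""
    errors = []
    for name in (target, source):
        if name not in symbols:
            errors.append(f"'{name}' is not declared")
    if errors:
        return errors
    t_type, s_type = symbols[target], symbols[source]
    # Assumed rule: identical types are assignable, and int widens to float.
    if t_type != s_type and not (t_type == "float" and s_type == "int"):
        errors.append(f"cannot assign {s_type} to {t_type}")
    return errors

print(check_assignment("x", "y", {"x": "float", "y": "int"}))   # []
print(check_assignment("x", "y", {"x": "int", "y": "float"}))   # ['cannot assign float to int']
print(check_assignment("x", "y", {"x": "int"}))                 # ["'y' is not declared"]
```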

13
IRs
 Graphical
 e.g. parse tree, DAG

 Linear
 e.g. three-address code

 Hybrid
 e.g. linear for blocks of straight-line code, a graph to
connect blocks
 Low-level or high-level
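
To make the graphical/linear distinction concrete, the sketch below walks an expression tree and emits three-address code. The tuple encoding of the tree, the temporary names `t1`, `t2`, ..., and the `to_three_address` function are assumptions for illustration.

```python
import itertools

def to_three_address(tree):
    """Flatten a tree such as ('+', 'a', ('*', 'b', 'c')) into three-address code."""
    code = []
    counter = itertools.count(1)

    def walk(node):
        if isinstance(node, str):        # leaf: a variable name or a constant
            return node
        op, left, right = node
        l = walk(left)
        r = walk(right)
        temp = f"t{next(counter)}"       # fresh temporary for this sub-expression
        code.append(f"{temp} = {l} {op} {r}")
        return temp

    result = walk(tree)
    return code, result

code, result = to_three_address(("+", "a", ("*", "b", "c")))
print(code)     # ['t1 = b * c', 't2 = a + t1']
print(result)   # t2
```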

14
The scanning process
 Main goal: recognize words
 How? by recognizing patterns
 e.g. an identifier is a sequence of letters or digits that starts
with a letter.
 Lexical patterns form a regular language
 Regular languages are described using regular
expressions (REs)
 Can we create an automatic RE recognizer?
 Yes! (Hold that thought)

15
The scanning process
 Definition: Regular expressions (over alphabet Σ)
 ε is an RE denoting {ε}
 If a ∈ Σ, then a is an RE denoting {a}
 If r and s are REs, then
 (r) is an RE denoting L(r)

 r|s is an RE denoting L(r) ∪ L(s)

 rs is an RE denoting L(r)L(s)

 r* is an RE denoting the Kleene closure of L(r)

 Property: regular languages are closed under many operations (e.g. union, concatenation, Kleene closure)


 This allows us to build complex REs.
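
The inductive definition can be turned directly into a (slow but simple) membership test. The tuple encoding of REs and the `matches` function are assumptions for illustration; the example alphabet {a, b, 0, 1} is kept tiny on purpose, and real scanners do not match this way.

```python
def matches(r, s):
    """Decide whether string s is in L(r), following the definition case by case."""
    kind = r[0]
    if kind == "eps":                    # epsilon denotes the empty string only
        return s == ""
    if kind == "sym":                    # a single alphabet symbol
        return s == r[1]
    if kind == "alt":                    # r|s: union of the two languages
        return matches(r[1], s) or matches(r[2], s)
    if kind == "cat":                    # rs: try every split point
        return any(matches(r[1], s[:i]) and matches(r[2], s[i:])
                   for i in range(len(s) + 1))
    if kind == "star":                   # r*: empty, or a non-empty piece then r* again
        if s == "":
            return True
        return any(matches(r[1], s[:i]) and matches(r, s[i:])
                   for i in range(1, len(s) + 1))
    raise ValueError(f"unknown RE node {kind!r}")

# identifier = letter (letter | digit)*
letter = ("alt", ("sym", "a"), ("sym", "b"))
digit = ("alt", ("sym", "0"), ("sym", "1"))
ident = ("cat", letter, ("star", ("alt", letter, digit)))

print(matches(ident, "ab01"))   # True
print(matches(ident, "0a"))     # False
```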

16
The scanning process
 Definition: Deterministic Finite Automaton
 a five-tuple (Σ, S, δ, s0, F) where
 Σ is the alphabet

 S is the set of states

 δ is the transition function (δ: S × Σ → S)

 s0 is the starting state

 F is the set of final states (F ⊆ S)

 Notation:
 Use a transition diagram to describe a DFA

 DFAs are equivalent to REs: both describe exactly the regular languages


 Hey! We just came up with a recognizer!
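
The five-tuple maps almost directly onto code. The sketch below builds a DFA for identifiers (a letter followed by letters or digits) and runs it on a string. The state names, the dictionary encoding of δ, and the `accepts` helper are assumptions; missing table entries stand in for an implicit error state, so δ is stored as a partial function.

```python
import string

def make_identifier_dfa():
    """Return (sigma, states, delta, s0, finals) for letter (letter | digit)*."""
    letters = set(string.ascii_letters)
    digits = set(string.digits)
    sigma = letters | digits
    states = {"s0", "s1"}
    delta = {}
    for c in letters:
        delta[("s0", c)] = "s1"          # the first character must be a letter
    for c in sigma:
        delta[("s1", c)] = "s1"          # then any letter or digit
    return sigma, states, delta, "s0", {"s1"}

def accepts(dfa, text):
    """Run the DFA on text; a missing transition means the input is rejected."""
    sigma, states, delta, state, finals = dfa
    for ch in text:
        if (state, ch) not in delta:
            return False
        state = delta[(state, ch)]
    return state in finals

dfa = make_identifier_dfa()
print(accepts(dfa, "x1"))   # True
print(accepts(dfa, "1x"))   # False
```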

17
The scanning process
 Goal: automate the process
 Idea:
 Start with an RE
 Build a DFA
 How?

 We can build a non-deterministic finite automaton


(Thompson's construction)
 Convert that to a deterministic one
(Subset construction)
 Minimize the DFA
(Hopcroft's algorithm)
 Implement it
 Existing scanner generator: flex
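
The middle step of this pipeline, the subset construction, can be sketched in a few lines. The NFA below for `ab*` is written by hand in the style Thompson's construction would produce (with ε-moves stored under the key `None`); Thompson's construction itself and DFA minimization are left out. The dictionary encoding and all function names are assumptions for illustration.

```python
def epsilon_closure(nfa, states):
    """All NFA states reachable from `states` using only epsilon-moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get(s, {}).get(None, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def subset_construction(nfa, start, finals, alphabet):
    """Build a DFA whose states are sets of NFA states."""
    d_start = epsilon_closure(nfa, {start})
    d_trans = {}
    seen = {d_start}
    worklist = [d_start]
    while worklist:
        current = worklist.pop()
        for a in alphabet:
            moved = set()
            for s in current:
                moved |= set(nfa.get(s, {}).get(a, ()))
            if not moved:
                continue                 # no transition: implicit error state
            target = epsilon_closure(nfa, moved)
            d_trans[(current, a)] = target
            if target not in seen:
                seen.add(target)
                worklist.append(target)
    d_finals = {q for q in seen if q & finals}
    return d_start, d_trans, d_finals

def dfa_accepts(dfa, text):
    state, trans, finals = dfa
    for ch in text:
        state = trans.get((state, ch))
        if state is None:
            return False
    return state in finals

# Hand-written NFA for ab*: state 4 is the only final state.
NFA = {0: {"a": {1}}, 1: {None: {2, 4}}, 2: {"b": {3}}, 3: {None: {2, 4}}}
dfa = subset_construction(NFA, 0, {4}, "ab")
print(dfa_accepts(dfa, "abb"))   # True
print(dfa_accepts(dfa, "ba"))    # False
```

A scanner generator such as flex performs this kind of construction automatically from the regular expressions in its input file.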

18
