
Compiler Construction

(CS-460)

1
Outline

1. Semantic Analysis
2. Attributes & Attribute Grammars
3. The Symbol Table
1. The Structure of the Symbol Table
2. Declarations
3. Scope Rules and Block Structure
4. Interaction of Same Level Declarations
4. Data Types & Type Checking
5. Summary

2
Semantic Analysis

Lecture: 19-20

3
The Compiler So Far

 Scanner - Lexical analysis
   Detects inputs with illegal tokens
   e.g.: int @#5$;

 Parser - Syntactic analysis
   Detects inputs with ill-formed parse trees
   e.g.: missing semicolons

 Semantic analysis
   Last “front end” analysis phase
   Catches all remaining errors

4
Semantic Analysis
Source code → Lexical Analysis (reports lexical errors) → tokens
→ Syntactic Analysis (reports syntax errors) → AST
→ Semantic Analysis (reports semantic errors) → AST’
→ Intermediate Code Gen

5
Beyond Syntax
What’s wrong with this code? (Note: it parses perfectly)

foo(int a, char * s){ … }

int bar() {
  int f[3];
  int i, j, k;
  char *p;
  float k;
  foo(f[6], 10, j);
  break;
  i->val = 5;
  j = i + k;
  printf("%s,%s.\n", p, q);
  goto label23;
}

6
Semantic Analysis

 Semantic analysis computes additional information about the program that is beyond the capabilities of context-free grammars and parsing algorithms
 The computed information is closely related to the eventual meaning, or semantics, of the program being translated
 Since this analysis performed by the compiler is by definition static, it is also called static semantic analysis
 Static semantic analysis involves both a description of the analyses to be performed and the implementation of those analyses using appropriate algorithms

7
Semantic Analysis (Continue…)

 One method of describing semantic analysis is to identify attributes of the language entities
 After identifying the attributes, write attribute equations, or semantic rules, that express how the computation of these attributes is related to the grammar rules of the language
 Such a set of attributes and equations is called an attribute grammar
 Attribute grammars are most useful for languages that obey the principle of syntax-directed semantics
8
Attributes

 An attribute is any property of a programming language construct, for example:
   The data type of a variable
   The value of an expression
   The location of a variable in memory
   The object code of a procedure
   The number of significant digits in a number
 Attributes may be fixed prior to the compilation process, or they may be determinable only during program execution

9
Attributes (Continue…)

 The process of computing an attribute and associating the computed value with the language construct in question is called the binding of the attribute
 The time during the compilation/execution process when the binding of an attribute occurs is called its binding time
 Attributes that can be bound prior to execution are static; attributes that can only be bound during execution are dynamic

10
Attribute Grammars

 In syntax-directed semantics, attributes are associated directly with the grammar symbols of the language
 If X is a grammar symbol and a is an attribute associated with X, then we write X.a for the value of a associated with X
 Given a collection of attributes a1,…,ak, the principle of syntax-directed semantics implies that for each grammar rule X0 → X1 X2 … Xn, the values of the attributes Xi.aj of each grammar symbol Xi are related to the values of the attributes of the other symbols in the rule
11
Attribute Grammars (Continue…)

 Should the same symbol Xi appear more than once in a grammar rule, each occurrence must be distinguished from the others by suitable subscripts
 Each relationship is specified by an attribute equation or semantic rule of the form:
   Xi.aj = fij(X0.a1, …, X0.ak, X1.a1, …, X1.ak, …, Xn.a1, …, Xn.ak)
 An attribute grammar for the attributes a1,…,ak is the collection of all such equations for all the grammar rules of the language
12
Attribute Grammars (Continue…)

 Stated in this generality, attribute grammars may appear extremely complex, but in practice this is not the case, because the functions fij are usually quite simple
 It is rare for attributes to depend on a large number of other attributes, so attributes can usually be separated into small independent sets, and an attribute grammar can be written for each set
 Typically, attribute grammars are written in tabular form, with each grammar rule listed together with the set of attribute equations associated with that rule
13
Attribute Grammars (Continue…)

 Below is the general form of an attribute grammar in tabular form:

   Grammar Rule    Semantic Rules
   Rule 1          associated attribute equations
     .               .
     .               .
   Rule n          associated attribute equations

14
Example – Attribute Grammars

 Consider the following grammar for unsigned numbers:

   number → number digit | digit
   digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

 The most significant attribute of a digit is its value, to which we give the name val
 Each digit has a value that we compute directly from the actual digit it represents
   Thus the grammar rule digit → 0 implies that digit has the value 0

15
Example – Attribute Grammars
(Continue…)

 For the grammar rule digit → 0 we can write the attribute equation digit.val = 0 and associate it with that rule
 Furthermore, each number has a value based on the digits it contains. For example, the rule number → digit shows that the number consists of just one digit, and in this case the attribute equation is number.val = digit.val
 If the number contains more than one digit, then it is derived using the grammar rule number → number digit

16
Example – Attribute Grammars
(Continue…)

 Here the number on the left-hand side differs from the number on the right-hand side, and the two have different values, so we distinguish them with subscripts:
   number1 → number2 digit
 Now, to build up a two-digit number such as 34, we must multiply number2.val by 10 before adding the new digit, so the attribute equation for the rule number → number digit is:
   number1.val = number2.val * 10 + digit.val

17
Example – Attribute Grammars
(Continue…)

Grammar Rule              Semantic Rule
number1 → number2 digit   number1.val = number2.val * 10 + digit.val
number → digit            number.val = digit.val
digit → 0                 digit.val = 0
digit → 1                 digit.val = 1
digit → 2                 digit.val = 2
digit → 3                 digit.val = 3
digit → 4                 digit.val = 4
digit → 5                 digit.val = 5
digit → 6                 digit.val = 6
digit → 7                 digit.val = 7
digit → 8                 digit.val = 8
digit → 9                 digit.val = 9

18
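As an illustrative sketch, the attribute equations above translate directly into a small C routine (the function name number_val is hypothetical) that computes number.val in a syntax-directed way, scanning the digits from left to right:

   /* Minimal sketch: compute number.val for a string of digits,
    * mirroring number1.val = number2.val * 10 + digit.val and
    * digit.val = 0, ..., 9. */
   #include <ctype.h>
   #include <stdio.h>

   int number_val(const char *s) {
       int val = 0;                          /* val of the number derived so far */
       for (; isdigit((unsigned char)*s); s++)
           val = val * 10 + (*s - '0');      /* number1.val = number2.val * 10 + digit.val */
       return val;
   }

   int main(void) {
       printf("%d\n", number_val("34"));     /* prints 34 */
       return 0;
   }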
Exercise

 Construct a parse tree, along with its attribute equations, to derive the string 289 and compute its value using the following grammar:

   number → number digit | digit
   digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

19
Example 2 – Attribute Grammars

 Consider the following grammar for simple integer arithmetic expressions:

   exp → exp + term | exp - term | term
   term → term * factor | factor
   factor → ( exp ) | number

20
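One natural set of val attribute equations for this expression grammar, given here only as an illustrative sketch in the same tabular style:

Grammar Rule              Semantic Rule
exp1 → exp2 + term        exp1.val = exp2.val + term.val
exp1 → exp2 - term        exp1.val = exp2.val - term.val
exp → term                exp.val = term.val
term1 → term2 * factor    term1.val = term2.val * factor.val
term → factor             term.val = factor.val
factor → ( exp )          factor.val = exp.val
factor → number           factor.val = number.val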
The Symbol Table

 The symbol table is a major data structure in a compiler, second only to the syntax tree
 In some languages the symbol table is already involved during parsing, and even during lexical analysis, when information must be added to it or looked up in it
 But in a carefully designed language such as Pascal or Ada, it is possible and reasonable to put off symbol table operations until after a complete parse, when the program being translated is known to be syntactically correct
21
The Symbol Table (Continue…)

 The principal symbol table operations are:
   Insert – used to store the information provided by name declarations
   Lookup – needed to retrieve the information associated with a name
   Delete – needed to remove the information provided by a declaration when that declaration no longer applies
 Typically the symbol table stores data type information, information on the region of applicability (scope), and information on the eventual location in memory

22
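A minimal sketch of such a symbol table interface in C (the names and the attribute fields are hypothetical, chosen only to illustrate the three operations and the attributes just listed):

   /* One entry binds a name to its attributes. */
   typedef struct attributes {
       int data_type;      /* code for int, char, float, ... */
       int scope_level;    /* region of applicability (scope) */
       int mem_location;   /* eventual location in memory */
   } Attributes;

   void        st_insert(const char *name, Attributes attrs);  /* called for a declaration */
   Attributes *st_lookup(const char *name);                     /* called for a use of the name */
   void        st_delete(const char *name);                     /* called when the declaration no longer applies */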
The Structure of the Symbol Table

 The symbol table in a compiler is a typical dictionary data structure
   The efficiency of the three basic operations insert, lookup, and delete varies with the organization of the data structure
 Typical implementations of dictionary structures include linear lists, various search tree structures, and hash tables
 Linear lists are a good basic data structure that provides an easy and direct implementation of the three basic operations
   Constant time for insert; time linear in the size of the list for lookup and delete
23
The Structure of the Symbol Table
(Continue…)

 Linear lists can be a good choice for implementations in which compilation speed is not a major concern
 Search tree structures are somewhat less useful for the symbol table, not only because of the insert operation, but also because of the complexity of the delete operation
 The hash table often provides the best choice for implementing the symbol table
   All three basic operations can be performed in almost constant time, and it is the organization used most frequently in practice

24
The Structure of the Symbol Table
(Continue…)

 A hash table is an array of entries, called buckets, indexed by an integer range, usually 0 to the table size minus one
 A hash function turns the search key (the identifier name) into an integer hash value in the index range, and the item corresponding to the search key is stored in the bucket at this index
 The hash function should distribute the key indices as uniformly as possible over the index range, since hash collisions cause performance degradation in the lookup and delete operations
25
The Structure of the Symbol Table
(Continue…)

 An important question is how the hash table deals with collisions (collision resolution)
 One method allocates only enough space for a single item in each bucket and resolves collisions by inserting new items in successive buckets (this is sometimes called open addressing)
   In this case the contents of the hash table are limited by the size of the array used for the table, and as the array fills, collisions become more and more frequent
 The best choice for compilers is the alternative to open addressing, called separate chaining
26
The Structure of the Symbol Table
(Continue…)

 In the separate chaining method, each bucket is actually a linear list
   Collisions are resolved by inserting the new item into the bucket’s list

27
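A minimal sketch of separate chaining in C (layout and names are hypothetical; the hash value h is assumed to come from a hash function like the one sketched on the next slide):

   #include <stdlib.h>
   #include <string.h>

   #define TABLE_SIZE 211                  /* a prime table size is a common choice */

   typedef struct bucket_entry {
       char *name;                         /* the identifier stored in this entry */
       struct bucket_entry *next;          /* next entry hashing to the same bucket */
   } BucketEntry;

   static BucketEntry *table[TABLE_SIZE];  /* each bucket is a linear list */

   /* Resolve a collision by simply prepending the new item to the bucket’s list. */
   static void chain_insert(int h, const char *name) {
       BucketEntry *e = malloc(sizeof *e);
       e->name = strdup(name);
       e->next = table[h];
       table[h] = e;
   }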
The Structure of the Symbol Table
(Continue…)

 One question still remains: how does the hash function work?
 The hash function converts a character string (the identifier name) into an integer in the range 0…size-1 in three steps
   First, each character in the string is converted into a nonnegative integer
   Second, these nonnegative integers are combined in some way to form a single integer
   Finally, the resulting integer is scaled into the range 0…size-1

28
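One common way to realize these three steps in C is a shift-and-add hash; this sketch uses assumed constants and a hypothetical function name, not anything prescribed above:

   /* Step 1: each character is already a nonnegative integer (its character code).
    * Step 2: combine the codes by repeated shift-and-add.
    * Step 3: scale the result into 0…size-1 with the remainder operator. */
   static int hash(const char *key, int size) {
       unsigned long h = 0;
       for (; *key != '\0'; key++)
           h = (h << 4) + (unsigned char)*key;
       return (int)(h % size);
   }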
Declarations

 The behavior of the symbol table depends heavily on the properties of declarations in the language being translated
   How the insert and delete operations act on the symbol table, when these operations need to be called, and what attributes are inserted into the table
 There are four basic kinds of declarations
  1. Constant declarations
  2. Type declarations
  3. Variable declarations
  4. Procedure/function declarations

29
Declarations (Continue…)

 It is easiest to use one symbol table to hold the names from all the different kinds of declarations
   This works when the programming language prohibits the use of the same name in different kinds of declarations
 Occasionally it is easier to use a different symbol table for each kind of declaration
   For example, all type declarations are kept in one symbol table, all variable declarations in a different symbol table, and so on

30
Declarations (Continue…)

 The attributes bound to a name by a declaration vary with the kind of declaration
   Constant declarations associate values with names; for this reason constant declarations are sometimes called value bindings
   Type declarations bind names to newly constructed types and may also create aliases for existing named types
   Variable declarations most often bind names to data types. Besides the data type, they may implicitly bind further attributes, e.g. the scope of the variable
   Procedure/function declarations may bind the return type and the parameters as attributes

31
Scope Rules and Block Structure

 Scope rules vary widely from language to language, but some rules are common to many languages
 Here we discuss two of these rules: declaration before use, and the most closely nested rule for block structure
 Declaration before use
   A name must be declared in the text of the program prior to any reference to that name
   This rule permits the symbol table to be built as parsing proceeds, and lookup to be performed as soon as a name reference is encountered in the code

32
Scope Rules and Block Structure
(Continue…)

 Block structure
   Block structure is a common property of programming languages
   A language is block structured if it permits the nesting of blocks inside other blocks, and if the scope of the declarations in a block is limited to that block and to the blocks contained in it, subject to the most closely nested rule:
     Given several different declarations for the same name, the declaration that applies to a reference is the one in the most closely nested block enclosing the reference

33
Scope Rules and Block Structure
(Continue…)

 To implement nested scopes and the most closely nested rule, the symbol table insert operation must not overwrite previous declarations
   Insert should instead hide the previous declaration, so that lookup finds only the most recent one
 The delete operation must not delete all declarations corresponding to a name, but only the most recent one, uncovering any previous declaration
 Symbol table construction can then proceed by performing insert operations for all declared names on entry into each block and delete operations on exit from the block
34
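A sketch of how this can be arranged in C (names and layout are hypothetical, building on the separate-chaining idea above): every entry records the nesting level of the block that declared it, insert prepends to the bucket list so lookup finds the most closely nested declaration first, and block exit removes only the entries of the exiting level, uncovering the outer ones.

   #include <stdlib.h>

   #define TABLE_SIZE 211

   typedef struct scoped_entry {
       char *name;
       int   level;                      /* nesting level of the declaring block */
       struct scoped_entry *next;
   } ScopedEntry;

   static ScopedEntry *table[TABLE_SIZE];
   static int current_level = 0;

   /* On block entry: bump the level; subsequent inserts carry this level
    * and are prepended to their bucket list, hiding outer declarations. */
   void enter_block(void) { current_level++; }

   /* On block exit: delete only the declarations of the exiting level. */
   void exit_block(void) {
       for (int i = 0; i < TABLE_SIZE; i++) {
           while (table[i] != NULL && table[i]->level == current_level) {
               ScopedEntry *old = table[i];
               table[i] = old->next;     /* uncover any previous declaration */
               free(old->name);
               free(old);
           }
       }
       current_level--;
   }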
Scope Rules and Block Structure
(Continue…)

 Build the symbol table for the following code:

   int i, j;
   int f(int size)
   { char i, temp;
     { double j;
     }
     { char * j;
     }
   }

35
Interaction of Same Level
Declarations
 One main issue that relates to scope is the interaction among declarations at the same level
 One typical requirement in many languages is that there can be no reuse of the same name in declarations at the same level
   To check this requirement, a compiler must perform a lookup before each insert and determine, by some mechanism, whether a preexisting declaration with the same name is at the same level or not
 Somewhat more difficult is the question of how much information the declarations in a sequence at the same level have available about each other
36
Interaction of Same Level
Declarations (Continue…)
 Consider the following code:
   int a = 1;
   void f(void)
   { int a = 2, j = a + 1;

   } // which ‘a’ is used to assign a value to ‘j’?
 If each declaration is added to the symbol table as it is processed, the declarations are sequential; in C, which uses sequential declarations, the inner a (with value 2) is already in the table when the initializer of j is processed, so j becomes 3
 If all the declarations are processed simultaneously and added to the symbol table at once at the end of the declaration section, the declarations are collateral; in that case the reference to a would find the outer a (with value 1)

37
Interaction of Same Level
Declarations (Continue…)
 For each recursive declaration of a function or procedure, the compiler must insert the name of the function or procedure into the symbol table before it processes the body of the declaration; otherwise the compiler would treat a recursive call as an error
   (an error of use before declaration)

38
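A small C illustration of this point (the function is hypothetical): the name fact must already be in the symbol table while the body is being analyzed, otherwise the recursive call in the last line would be reported as a use before declaration.

   int fact(int n) {
       if (n <= 1) return 1;
       return n * fact(n - 1);   /* recursive reference to fact itself */
   }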
Data Types & Type Checking

 One of the principal tasks of a compiler is the computation and maintenance of information on data types (type inference)
 The compiler uses this information to ensure that each part of the program makes sense under the type rules of the language (type checking)
 Data type information can occur in a program in several different forms
 Theoretically, a data type is a set of values, or more precisely a set of values together with certain operations on those values
39
Summary

Any Questions?

40
