
STUDY MATERIAL

SUBJECT  : INTRODUCTION TO COMPILER DESIGN
COURSE   : III-BCA
SEMESTER : V
UNIT     : IV
STAFF    : C. Gayathri

UNIT IV
SYLLABUS
Runtime organization: source language issues, storage organization, storage allocation strategies. Intermediate code generation: intermediate languages, declarations, assignment statements.

RUN-TIME ORGANIZATION

The allocation and de-allocation of data objects is managed by the run-time support package, which consists of routines loaded with the generated target code. The design of the run-time support package is influenced by the semantics of procedures.

Execution of a procedure is referred to as an activation of the procedure. If the procedure is recursive, several of its activations may be alive at the same time. In Pascal, each call of a procedure leads to an activation that may manipulate data objects allocated for its use.

The representation of a data object at run time is determined by its type. Elementary data types are represented by equivalent data objects in the target machine. Example: characters, integers and reals. Aggregates are represented by collections of primitive objects. Example: arrays, strings and structures.

SOURCE LANGUAGE ISSUES

Procedures

A procedure definition is a declaration that associates an identifier with a statement. The identifier is the procedure name, and the statement is the procedure body. Certain identifiers appearing in a procedure definition are designated as the formal parameters of the procedure. Arguments, known as actual parameters, may be passed to a called procedure; they are substituted for the formals in the body.

Activation Trees

Two assumptions are made about the flow of control among procedures during the execution of a program:
1. Control flows sequentially; that is, the execution of a program consists of a sequence of steps, with control being at some specific point in the program at each step.
2. Each execution of a procedure starts at the beginning of the procedure body and eventually returns control to the point immediately following the place where the procedure was called. This means the flow of control between procedures can be depicted using trees.

Each execution of a procedure body is referred to as an activation of the procedure. The lifetime of an activation is the consecutive sequence of steps it occupies during the execution of the program, including the time spent in procedures it calls. A procedure is recursive if a new activation can begin before an earlier activation of the same procedure has ended.

An activation tree depicts the way control enters and leaves activations. In an activation tree:
1. Each node represents an activation of a procedure.
2. The root represents the activation of the main program.
3. The node for a is the parent of the node for b if and only if control flows from activation a to b.
4. The node for a is to the left of the node for b if and only if the lifetime of a occurs before the lifetime of b.

Since each node represents a unique activation, it is convenient to say that control is at a node when it is in the activation represented by that node.

Control Stacks

The flow of control in a program corresponds to a depth-first traversal of the activation tree that starts at the root, visits a node before its children, and recursively visits the children at each node in left-to-right order. A control stack is used to keep track of live procedure activations. The idea is to push the node for an activation onto the control stack as the activation begins and to pop the node when the activation ends. The contents of the control stack are then related to paths to the root of the activation tree: when node n is at the top of the stack, the stack contains the nodes along the path from n to the root.

The Scope of a Declaration

A declaration in a language is a syntactic construct that associates information with a name. The scope rules of a language determine which declaration of a name applies when the name appears in the text of a program. The portion of the program to which a declaration applies is called the scope of that declaration. An occurrence of a name in a procedure is said to be local to the procedure if it is in the scope of a declaration within the procedure; otherwise, the occurrence is said to be nonlocal.

At compile time, the symbol table can be used to find the declaration that applies to an occurrence of a name. When a declaration is seen, a symbol-table entry is created for it.
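The local/nonlocal distinction can be illustrated with a small sketch. It assumes lexical, most-closely-nested scoping (as in Pascal), and the class SymbolTable, its methods, and the procedure and variable names are hypothetical names chosen for this example, not part of the notes.

```python
class SymbolTable:
    """One table per procedure; 'previous' links to the enclosing scope (assumed design)."""

    def __init__(self, owner, previous=None):
        self.owner = owner
        self.previous = previous
        self.entries = {}

    def declare(self, name, info):
        # When a declaration is seen, an entry is created for it.
        self.entries[name] = info

    def lookup(self, name):
        """Return (info, 'local' or 'nonlocal'), or None if the name is undeclared."""
        table = self
        while table is not None:
            if name in table.entries:
                kind = "local" if table is self else "nonlocal"
                return table.entries[name], kind
            table = table.previous
        return None


main = SymbolTable("main")
main.declare("a", "integer array")
main.declare("n", "integer")

quicksort = SymbolTable("quicksort", previous=main)
quicksort.declare("n", "real")      # hides main's n inside quicksort
quicksort.declare("i", "integer")

for name in ("i", "n", "a", "x"):
    print(name, quicksort.lookup(name))
```

Searching the innermost table first is what makes the inner declaration of n win over the outer one; a name found only in an enclosing table is reported as nonlocal.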

Within the scope of the declaration, its entry is returned when the name in it is looked up.

Bindings of Names

Even if each name is declared once in a program, the same name may denote different data objects at run time. A data object corresponds to a storage location that can hold values.

The term environment refers to a function that maps a name to a storage location, and the term state refers to a function that maps a storage location to the value held there. Environments and states are different: an assignment changes the state, but not the environment. An environment maps a name to an l-value, and a state maps the l-value to an r-value.

        environment            state
name ----------------> storage -------> value

Fig. Two-stage mapping from names to values

When an environment associates a storage location with a name, the association itself is referred to as a binding of the name. A binding is the dynamic counterpart of a declaration. Because more than one activation of a recursive procedure can be alive at the same time, a local name can be bound to a different storage location in each activation.

STATIC NOTION                  DYNAMIC COUNTERPART
Definition of a procedure      Activations of the procedure
Declaration of a name          Bindings of the name
Scope of a declaration         Lifetime of a binding

STORAGE ORGANIZATION

Subdivision of Run-time Memory

The compiler obtains a block of storage from the operating system for the compiled program to run in. This run-time storage can be subdivided to hold:
a) The generated target code
b) Data objects
c) A counterpart of the control stack to keep track of procedure activations.

This organization of run-time storage can be used for languages such as Fortran, Pascal and C. The size of the generated target code is fixed at compile time, so the compiler can place it in a statically determined area, perhaps in the low end of memory. Similarly, the size of some data objects may be known at compile time, and these too can be placed in a statically determined area. The addresses of statically allocated data objects can be compiled into the target code.

When a call occurs, execution of an activation is interrupted and information about the status of the machine, such as the values of the program counter and machine registers, is saved on the stack. When control returns from the call, this activation can be restarted after restoring the values of relevant registers and setting the program counter to the point immediately after the call. Data objects whose lifetimes are contained in that of an activation can be allocated on the stack, along with other information associated with the activation.

A separate area of run-time memory, called the heap, holds all other information. Implementations of languages in which the lifetimes of activations cannot be represented by an activation tree may use the heap to keep information about activations. The controlled way in which data is allocated and de-allocated on a stack makes it cheaper to place data on the stack than on the heap.

The sizes of the stack and the heap can change as the program executes, so they are shown at opposite ends of memory, where they can grow toward each other. By convention, stacks grow down; that is, the top of the stack is drawn toward the bottom of the page. Since memory addresses increase going down a page, downward-growing means toward higher addresses. If top marks the top of the stack, offsets from the top of the stack can be computed by subtracting the offset from top.

Code
Static Data
Stack (grows toward the heap)
(free memory)
Heap (grows toward the stack)

Fig. Subdivision of run-time memory into code and data areas.

Activation Records

The information needed by a single execution of a procedure is managed using a contiguous block of storage called an activation record or frame, consisting of the collection of fields shown below. The activation record of a procedure is pushed onto the run-time stack when the procedure is called and popped off the stack when control returns to the caller.

Returned value
Actual parameters
Optional control link
Optional access link
Saved machine status
Local data
Temporaries

Fig. A general activation record

The field for temporaries holds temporary values, such as those arising in the evaluation of expressions.
The field for local data holds data that is local to an execution of a procedure.
The field for saved machine status holds information about the state of the machine just before the procedure is called. This information includes the values of the program counter and machine registers that have to be restored when control returns from the procedure.

The optional access link is used to refer to nonlocal data held in other activation records.
The optional control link points to the activation record of the caller.
The field for actual parameters is used by the calling procedure to supply parameters to the called procedure. In practice, parameters are often passed in machine registers for greater efficiency.
The field for the returned value is used by the called procedure to return a value to the calling procedure. In practice, this value is often returned in a register.

The sizes of almost all of these fields can be determined at compile time. An exception occurs if a procedure has a local array whose size is determined by the value of an actual parameter, which is available only when the procedure is called at run time.

Compile-time Layout of Local Data

Multi-byte objects are stored in consecutive bytes and given the address of the first byte. An elementary data type, such as a character, integer or real, can usually be stored in an integral number of bytes. An aggregate, such as an array or record, must be large enough to hold all its components; it is allocated in one contiguous block of bytes.

The field for local data is laid out as the declarations in a procedure are examined at compile time. Variable-length data is kept outside this field. A count is kept of the memory locations that have been allocated for previous declarations. From this count, the compiler determines a relative address for the storage of each local with respect to some position, such as the beginning of the activation record. The relative address, or offset, is the difference between the addresses of that position and the data object.

Space left unused due to alignment considerations is referred to as padding. Example: although an array of 10 characters needs only enough bytes to hold ten characters, a compiler may allocate 12 bytes, leaving 2 bytes unused. When space is at a premium, a compiler may pack data so that no padding is left.

STORAGE ALLOCATION STRATEGIES

A different storage-allocation strategy is used in each of the three data areas of the organization described above. The three strategies are static allocation, stack allocation and heap allocation.

Static Allocation

Static allocation lays out storage for all data objects at compile time. Since the bindings do not change at run time, the addresses at which information is to be saved when a procedure call occurs are also known at compile time.

Limitations of static allocation:
1. The size of a data object and constraints on its position in memory must be known at compile time.
2. Recursive procedures are restricted, because all activations of a procedure use the same bindings for local names.
3. Data structures cannot be created dynamically, since there is no mechanism for storage allocation at run time.
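The relative-address computation described under Compile-time Layout of Local Data can be sketched as follows. This is a simplified illustration, not a real compiler's layout rule: it assumes every object is padded up to a multiple of one fixed alignment, and the function name layout_locals and the variable names are hypothetical.

```python
def layout_locals(declarations, alignment=4):
    """Assign relative addresses to locals.

    declarations is a list of (name, size_in_bytes) pairs in declaration order.
    Returns (offsets, total_bytes, padding_bytes).
    """
    offsets = {}
    offset = 0      # count of bytes allocated for previous declarations
    padding = 0
    for name, size in declarations:
        offsets[name] = offset
        # Round the size up to a multiple of the alignment (assumed policy).
        padded = -(-size // alignment) * alignment
        padding += padded - size
        offset += padded
    return offsets, offset, padding


locals_ = [("buf", 10), ("count", 4), ("flag", 1)]
print(layout_locals(locals_))               # 4-byte alignment: buf occupies 12 bytes
print(layout_locals(locals_, alignment=1))  # packed layout: no padding
```

With 4-byte alignment the 10-character array takes 12 bytes, matching the padding example in the text; with alignment 1 the data is packed and nothing is wasted.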

Stack Allocation

Stack allocation manages the run-time storage as a stack. It is based on the idea of a control stack: storage is organized as a stack, and activation records are pushed and popped as activations begin and end, respectively.
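A minimal simulation of this push/pop discipline is sketched below. The class RuntimeStack, the fixed frame size of 16, and the recursive fact procedure are all assumptions made for the illustration; addresses increase as the stack grows.

```python
class RuntimeStack:
    """Toy run-time stack: 'top' is the address of the first free location."""

    def __init__(self):
        self.records = []   # live activation records, oldest first
        self.top = 0
        self.trace = []     # (event, procedure, base address) for inspection

    def push(self, proc, size):
        record = {"proc": proc, "base": self.top, "size": size}
        self.records.append(record)
        self.top += size    # allocate by incrementing top by the record size
        return record

    def pop(self):
        record = self.records.pop()
        self.top -= record["size"]   # de-allocate by decrementing top
        return record


def fact(stack, n, frame_size=16):
    # Each activation gets its own record, pushed when the call begins.
    record = stack.push(f"fact({n})", frame_size)
    stack.trace.append(("enter", record["proc"], record["base"]))
    result = 1 if n <= 1 else n * fact(stack, n - 1, frame_size)
    stack.pop()             # popped when the activation ends
    return result


stack = RuntimeStack()
value = fact(stack, 3)
print(value, stack.top, stack.trace)
```

The trace shows three activations of the same procedure alive at once, each with a different base address, and the stack is empty again after the outermost call returns.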

Locals are bound to fresh storage in each activation, because a new activation record is pushed onto the stack when a call is made. The values of locals are deleted when the activation ends; that is, the values are lost because the storage for locals disappears when the activation record is popped. At run time, an activation record can be allocated and de-allocated by incrementing and decrementing the top of the stack, respectively, by the size of the record.

Calling Sequences

Procedure calls are implemented by generating what are known as calling sequences in the target code. A call sequence allocates an activation record and enters information into its fields. A return sequence restores the state of the machine so the calling procedure can continue execution. The code in a calling sequence is often divided between the calling procedure (the caller) and the procedure it calls (the callee).

A principle that aids the design of calling sequences and activation records is that fields whose sizes are fixed early are placed in the middle.

Advantage: placing the fields for parameters and a potential returned value next to the activation record of the caller lets the caller access these fields using offsets from the end of its own activation record, without knowing the complete layout of the record for the callee.

The register top-sp points to the end of the machine-status field in an activation record. This position is known to the caller, so the caller can be made responsible for setting top-sp before control flows to the called procedure. The code for the callee can access its temporaries and local data using offsets from top-sp.

Caller's activation record:
  Parameters and returned value
  Control link
  Links and saved status
  Temporaries and local data
Callee's activation record:
  Parameters and returned value
  Control link
  Links and saved status          <-- top-sp
  Temporaries and local data

Everything up to top-sp is the caller's responsibility; the fields beyond top-sp are the callee's responsibility.

Fig. Division of tasks between caller and callee.

The call sequence is:
1. The caller evaluates the actuals.
2. The caller stores a return address and the old value of top-sp into the callee's activation record. The caller then increments top-sp to its new position; that is, top-sp is moved past the caller's local data and temporaries and the callee's parameter and status fields.
3. The callee saves register values and other status information.
4. The callee initializes its local data and begins execution.

The return sequence is:
1. The callee places a return value next to the activation record of the caller.
2. Using the information in the status field, the callee restores top-sp and other registers and branches to a return address in the caller's code.
3. Although top-sp has been decremented, the caller can copy the returned value into its own activation record and use it to evaluate an expression.

These calling sequences allow the number of arguments of the called procedure to depend on the call. At compile time, the target code of the caller knows the number of arguments it is supplying to the callee. Hence the caller knows the size of the parameter field. The target code of the callee, however, must be prepared to handle other calls as well, so it waits until it is called and then examines the parameter field.

Variable-Length Data

A common strategy for handling variable-length data is to keep the storage for a procedure's variable-length arrays outside its activation record. In the figure below, procedure p has three local arrays A, B and C; only a pointer to the beginning of each array appears in the activation record for p. The relative addresses of these pointers are known at compile time, so the target code can access array elements through the pointers. The figure also shows a procedure q called by p. The activation record for q begins after the arrays of p, and the variable-length arrays of q begin beyond that.

Activation record for p:
  Control link
  Pointer to A
  Pointer to B
  Pointer to C
Arrays of p:
  Array A
  Array B
  Array C
Activation record for q called by p:
  Control link                    <-- top-sp
Arrays of q                       <-- top

Fig. Access to dynamically allocated arrays

Access to data on the stack is through two pointers, top and top-sp. The first marks the actual top of the stack; it points to the position at which the next activation record will begin. The second is used to find local data. For consistency with the organization described earlier, suppose top-sp points to the end of the machine-status field. In the figure, top-sp points to the end of this field in the activation record for q. Within the field is a control link to the previous value of top-sp when control was in the calling activation of p. The code to reposition top and top-sp can be generated at compile time, using the sizes of the fields in the activation records. When q returns, the new value of top is top-sp minus the length of the machine-status and parameter fields in q's activation record. This length is known at compile time, at least to the caller. After adjusting top, the new value of top-sp can be copied from the control link of q.

Dangling References

A dangling reference occurs when there is a reference to storage that has been de-allocated.

Heap Allocation

Heap allocation allocates and de-allocates storage as needed at run time from a data area known as a heap. Stack allocation cannot be used if either of the following is possible:
1. The values of local names must be retained when an activation ends.
2. A called activation outlives the caller. This possibility cannot occur for those languages where activation trees correctly depict the flow of control between procedures.

In both cases activation records cannot be de-allocated in last-in, first-out order, so they are kept in the heap and linked to one another through their control links.

To reduce the overhead of a general heap manager, small activation records or records of a predictable size can be handled as a special case:
1. For each size of interest, keep a linked list of free blocks of that size.
2. If possible, fill a request for size s with a block of size s', where s' is the smallest size greater than or equal to s. When the block is eventually de-allocated, it is returned to the linked list it came from.
3. For large blocks of storage, use the heap manager.

This approach makes allocation and de-allocation of small amounts of storage fast, since taking a block from a linked list and returning it are efficient operations. For large amounts of storage, the computation is expected to take some time to use up the storage, so the time taken by the allocator is negligible compared with the time taken to do the computation.
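The special-case scheme for small blocks can be sketched as below. Python lists stand in for the linked lists of free blocks, a block is represented only by a (size, id) pair, and the fallback tuple ("heap", request) is a placeholder for a call to the general heap manager; the class name SizeClassAllocator and all other names are hypothetical.

```python
class SizeClassAllocator:
    """Free lists for a few block sizes, with a fallback to a general heap."""

    def __init__(self, sizes, blocks_per_size=2):
        self.sizes = sorted(sizes)
        # One free list per size of interest.
        self.free = {s: [(s, i) for i in range(blocks_per_size)] for s in self.sizes}

    def allocate(self, request):
        # Find the smallest size s' with s' >= request.
        for s in self.sizes:
            if s >= request:
                if self.free[s]:
                    return self.free[s].pop()
                break   # that list is exhausted: fall back to the heap
        return ("heap", request)   # stand-in for the general heap manager

    def release(self, block):
        size, _ = block
        if size != "heap":
            # A block goes back to the list it came from.
            self.free[size].append(block)


allocator = SizeClassAllocator([16, 32, 64], blocks_per_size=1)
block = allocator.allocate(20)     # served from the 32-byte list
print(block, allocator.allocate(100))
allocator.release(block)
```

Both allocate and release touch only the end of one list, which is why the text calls them efficient operations; requests that no size class can serve are passed on unchanged.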

INTERMEDIATE CODE GENERATION

The front end translates a source program into an intermediate representation from which the back end generates target code. Although a source program can be translated directly into the target language, using a machine-independent intermediate form has these benefits:
1. Retargeting is facilitated; a compiler for a different machine can be created by attaching a back end for the new machine to an existing front end.
2. A machine-independent code optimizer can be applied to the intermediate representation.

The syntax-directed method can be used to translate into an intermediate form programming language constructs such as declarations, assignments and flow-of-control statements. It is assumed that the source program has already been parsed and statically checked. Intermediate code generation can be folded into parsing, if desired.

Parser -> Static Checker -> Intermediate Code Generator -> (intermediate code) -> Code Generator

Fig. Position of intermediate code generator.

Intermediate Languages

Syntax trees, postfix notation and three-address code are three kinds of intermediate representation. The semantic rules for generating three-address code from common programming language constructs are similar to those for constructing syntax trees or for generating postfix notation.

Graphical Representations

A syntax tree depicts the hierarchical structure of a source program. A DAG gives the same information but in a more compact way because common subexpressions are identified. For example, a syntax tree and DAG for the assignment statement a := b * -c + b * -c are shown below as indented outlines.

a) Syntax tree

assign
  a
  +
    *
      b
      uminus
        c
    *
      b
      uminus
        c

b) DAG

assign
  a
  +   (both operands point to the same * node)
    *
      b
      uminus
        c

Fig. Graphical representation of a := b * -c + b * -c

Postfix notation is a linearized representation of a syntax tree; it is a list of the nodes of the tree in which a node appears immediately after its children. The postfix notation for the syntax tree above, writing - for unary minus, is

a b c - * b c - * + :=

Syntax trees for assignment statements are produced by the following syntax-directed definition. Nonterminal S generates an assignment statement.

PRODUCTION         SEMANTIC RULE
S -> id := E       S.nptr := mknode(':=', mkleaf(id, id.place), E.nptr)
E -> E1 + E2       E.nptr := mknode('+', E1.nptr, E2.nptr)
E -> E1 * E2       E.nptr := mknode('*', E1.nptr, E2.nptr)
E -> - E1          E.nptr := mknode('-', E1.nptr)
E -> id            E.nptr := mkleaf(id, id.place)

Fig. Syntax-directed definition to produce syntax trees for assignment statements
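The semantic rules above can be mimicked by hand to build the tree for a := b * -c + b * -c. The sketch below borrows the names mknode and mkleaf from the definition; the Node class, the use of the label uminus for unary minus, and the helpers postfix and count_nodes are assumptions added for illustration.

```python
class Node:
    def __init__(self, op, children=(), value=None):
        self.op = op
        self.children = tuple(children)
        self.value = value          # only used by leaves


def mkleaf(kind, value):
    return Node(kind, value=value)


def mknode(op, *children):
    return Node(op, children)


def postfix(node):
    """List each node immediately after its children."""
    out = []
    for child in node.children:
        out.extend(postfix(child))
    out.append(node.value if node.op == "id" else node.op)
    return out


def count_nodes(node, seen=None):
    """Count distinct node objects, so shared subtrees are counted once."""
    seen = set() if seen is None else seen
    if id(node) in seen:
        return 0
    seen.add(id(node))
    return 1 + sum(count_nodes(child, seen) for child in node.children)


def product():
    return mknode("*", mkleaf("id", "b"), mknode("uminus", mkleaf("id", "c")))


# Syntax tree: the common subexpression b * -c is built twice.
tree = mknode(":=", mkleaf("id", "a"), mknode("+", product(), product()))

# DAG: the same node object is used for both operands of +.
shared = product()
dag = mknode(":=", mkleaf("id", "a"), mknode("+", shared, shared))

print(" ".join(postfix(tree)))
print(count_nodes(tree), count_nodes(dag))
```

Both structures yield the same postfix string, but the tree has 11 nodes while this DAG has 7, which is the compactness the text attributes to identifying common subexpressions.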

There are two representations of the syntax tree:
a) Each node is represented as a record with a field for its operator and additional fields for pointers to its children.
b) Nodes are allocated from an array of records, and the index or position of the node serves as the pointer to the node.

Representation (b) for a := b * -c + b * -c, where the arguments are indices of child nodes:

index   op    arg1   arg2
0       id    b
1       id    c
2       -     1
3       *     0      2
4       id    b
5       id    c
6       -     5
7       *     4      6
8       +     3      7
9       id    a
10      :=    9      8

Fig. Two representations of the syntax tree: (a) linked records, (b) array of records.

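Three-Address Code

Three-address code is a sequence of statements of the general form x := y op z, where x, y and z are names, constants or compiler-generated temporaries, and op stands for any operator. No built-up arithmetic expressions are permitted, as there is only one operator on the right side of a statement. For example, x + y * z might be translated into t1 := y * z followed by t2 := x + t1, where t1 and t2 are compiler-generated temporaries. The name three-address code reflects the fact that each statement usually contains three addresses, two for the operands and one for the result.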

Types of Three-Address Statements

Three-address statements can have symbolic labels, and there are statements for flow of control. A symbolic label represents the index of a three-address statement in the array holding intermediate code. Actual indices can be substituted for the labels either by making a separate pass or by using backpatching.

1. Assignment statements
   a. x := y op z, where op is a binary arithmetic or logical operation.
   b. x := op y, where op is a unary operation, such as unary minus, logical negation, shift operators, and conversion operators that, for example, convert a fixed-point number to a floating-point number.
2. Copy statements
   x := y, where the value of y is assigned to x.
3. Unconditional jump
   goto L, where the three-address statement with label L is the next to be executed.
4. Conditional jump
   if x relop y goto L, which applies a relational operator (<, =, >=, etc.) to x and y and executes the statement with label L next if x stands in relation relop to y. Otherwise, the statement following the jump is executed next.
5. Procedure calls
   param x and call p, n for a procedure call, and return y, where y, representing a returned value, is optional.
6. Indexed assignments
   x := y[i] and x[i] := y, where x, y and i refer to data objects.
7. Address and pointer assignments
   x := &y, x := *y and *x := y.

Syntax-Directed Translation into Three-Address Code

When three-address code is generated, temporary names are made up for the interior nodes of a syntax tree. The synthesized attribute S.code represents the three-address code for the assignment S. The nonterminal E has two attributes:
a) E.place, the name that will hold the value of E, and
b) E.code, the sequence of three-address statements evaluating E.

Three-address statements may be sent to an output file, rather than built up into the code attributes. Flow-of-control statements can be added to the language of assignments by adding productions and semantic rules. Postfix notation can be obtained instead by having the productions concatenate only the operator after the code for the operands. In general, the intermediate form produced by the syntax-directed translations can be changed by making similar modifications to the semantic rules.
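The translation scheme can be sketched on a tiny expression representation. Here an expression is either a name (a string) or a tuple (op, operand, ...); that encoding, the helper make_newtemp, and the functions gen and gen_assign are assumptions for this example, with the returned string playing the role of the place attribute and the list playing the role of the code attribute.

```python
import itertools


def make_newtemp():
    """Return a function that yields t1, t2, ... on successive calls."""
    counter = itertools.count(1)
    return lambda: f"t{next(counter)}"


def gen(expr, code, newtemp):
    """Append statements evaluating expr to code; return the place holding its value."""
    if isinstance(expr, str):       # a name: its place is the name itself
        return expr
    op, *args = expr
    places = [gen(arg, code, newtemp) for arg in args]
    temp = newtemp()                # one temporary per interior node
    if len(places) == 1:
        code.append(f"{temp} := {op} {places[0]}")
    else:
        code.append(f"{temp} := {places[0]} {op} {places[1]}")
    return temp


def gen_assign(target, expr):
    code = []
    place = gen(expr, code, make_newtemp())
    code.append(f"{target} := {place}")
    return code


term = ("*", "b", ("uminus", "c"))
for statement in gen_assign("a", ("+", term, term)):
    print(statement)
```

For a := b * -c + b * -c this emits five temporaries, one per interior node of the syntax tree, followed by the final copy into a.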

Implementations of Three-Address Statements

A three-address statement is an abstract form of intermediate code. In a compiler, these statements can be implemented as records with fields for the operator and the operands.

a. Quadruples
A quadruple is a record structure with four fields, called op, arg1, arg2 and result. The op field contains an internal code for the operator. The contents of fields arg1, arg2 and result are normally pointers to the symbol-table entries for the names represented by these fields. If so, temporary names must be entered into the symbol table as they are created.

b. Triples
To avoid entering temporary names into the symbol table, a temporary value can be referred to by the position of the statement that computes it. Three-address statements can then be represented by records with only three fields: op, arg1 and arg2. The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol table or pointers into the triple structure. Since three fields are used, this intermediate code format is known as triples; it is sometimes also referred to as two-address code. Parenthesized numbers represent pointers into the triple structure.

c. Indirect triples

Another implementation of three-address code that has been considered is that of listing pointers to triples, rather than listing the triples themselves. This implementation is called indirect triples.

Comparison of Representations

Quadruples: a three-address statement defining or using a temporary can immediately access the location for that temporary via the symbol table. The symbol table interposes an extra degree of indirection between the computation of a value and its use. Benefit: statements can be moved around without affecting their uses, which helps in an optimizing compiler.

Triples: allocation of storage to those temporaries needing it must be deferred to the code generation phase. Moving a statement that defines a temporary value requires changing all references to that statement. Problem: triples are difficult to use in an optimizing compiler.

Indirect triples: they can save space compared with quadruples if the same temporary value is used more than once, because two or more entries in the statement array can point to the same line of the op-arg1-arg2 structure.

DECLARATIONS

As the declarations in a procedure are examined, each local name is given a relative address. The relative address consists of an offset from the base of the static data area or the field for local data in an activation record.

Declarations in a Procedure

A global variable, say offset, can keep track of the next available relative address. Each time a name is declared, it is entered with the current value of offset, and offset is then incremented by the width of the data object denoted by that name. The procedure enter(name, type, offset) creates a symbol-table entry for name, gives it type type and relative address offset in its data area. Attribute type represents a type expression constructed from the basic types integer and real by applying the type constructors pointer and array.

Keeping Track of Scope Information

A separate symbol table is used for each procedure, and the same approach works for field names in records. The semantic rules are defined in terms of the following operations:
a. mktable(previous) creates a new symbol table and returns a pointer to the new table. The argument previous points to a previously created symbol table. The pointer previous is placed in a header for the new symbol table.
b. enter(table, name, type, offset) creates a new entry for name name in the symbol table pointed to by table. Again, enter places type type and relative address offset in fields within the entry.
c. addwidth(table, width) records the cumulative width of all the entries in table in the header associated with this symbol table.
d. enterproc(table, name, newtable) creates a new entry for procedure name in the symbol table pointed to by table. The argument newtable points to the symbol table for this procedure name.

ASSIGNMENTS

The lexeme for the name represented by id is given by attribute id.name. Operation lookup(id.name) checks if there is an entry for this occurrence of the name in the symbol table. If so, a pointer to the entry is returned; otherwise, lookup returns nil to indicate that no entry was found.

Reusing Temporary Names

The function newtemp generates a new temporary name each time a temporary is needed. The temporaries used to hold intermediate values clutter up the symbol table, and space has to be allocated to hold their values. Temporaries can be reused by changing newtemp.

Temporaries may also be assigned or used more than once. Problem: temporaries that are defined or used more than once arise when code optimization is performed, such as combining common subexpressions or moving a computation out of a loop. A simple way to handle them is to create a new name whenever an additional definition or use is created for a temporary or its computation is moved.

Addressing Array Elements

Elements of an array can be accessed quickly if the elements are stored in a block of consecutive locations. If the width of each array element is w, then the i-th element of the array begins at location

base + (i - low) × w

where low is the lower bound on the subscript and base is the relative address of the storage allocated for the array.

For records, the types and relative addresses of the fields are kept in symbol-table entries for the field names. The advantage of keeping this information in symbol-table entries is that the routine for looking up names in the symbol table can also be used for field names.
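The array-addressing formula can be checked with a short sketch. The function names are hypothetical; the second function uses the algebraically equivalent form i × w + (base - low × w), in which the parenthesized part does not depend on i and so could be evaluated once at compile time.

```python
def element_address(base, low, w, i):
    """Relative address of element i: base + (i - low) * w."""
    return base + (i - low) * w


def precomputed_address(base, low, w):
    """Return a function of i that uses a constant computed ahead of time."""
    c = base - low * w              # independent of i
    return lambda i: i * w + c


# An array with subscripts starting at 1, 4-byte elements, stored at offset 100.
address_of = precomputed_address(base=100, low=1, w=4)
for i in (1, 2, 3):
    print(i, element_address(100, 1, 4, i), address_of(i))
```

Both forms give the same address for every subscript; the second simply moves the subtraction of low out of the per-access computation.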
