

Expt No.: 01

Aim: Write a C program to implement a Lexical Analyzer for simple arithmetic operations.

Objectives:
1. To study the lexical analyzer phase of a compiler.
2. To understand the identifier table, literal table, symbol table, uniform symbol table, etc.

Theory:

What is meant by lexical analysis?
1. Lexical analysis identifies the lexical units in a source statement. It then classifies these units into different lexical classes, i.e. tokens or lexemes, e.g. identifiers, constants, keywords, etc.
2. It enters them into different tables. This classification may be based on the nature of a string or on the specification of the source language.
3. A token is basically a descriptor containing two fields, viz. the lexical class of the unit and the number of its entry in the table for that class.
4. The statement k = i + j; is represented as a string of such tokens.

Why Lexical Analysis?
1. Three types of activities are carried out during the analysis phase of any language processor.
2. These activities are: (a) linear analysis, (b) hierarchical analysis, (c) semantic analysis.
3. Linear analysis involves scanning the source program and grouping its characters into proper classes (tokens); this is what is called lexical analysis.
4. It is the recognition of the input strings and the categorization of these strings into basic elements like identifiers, operators, keywords, etc.
5. These basic elements are placed into tables, which are used and modified by the other phases of the compiler.
6. Since the other phases of the compiler use the attributes of these basic elements, they must be able to access this information.
7. There are two ways to pass this information: (a) all information about each element is passed to the other phases, or (b) the source string itself is converted into a string of uniform symbols.
8. However, for this conversion to take place, one has to scan the entire program statement by statement.
9. This is the reason why we make a lexical scan over the entire program.
10. Thus we can say that lexical analysis is required for:
a. Parsing the source program into lexemes and converting these lexemes into tokens.


b. Building tables of information.
c. Passing this information on to the other phases by converting the source program into uniform symbols.
11. The reasons why we separate the analysis phase into lexical analysis and parsing are as follows: (a) The separation allows us to construct a specialized processor, which increases the efficiency of the compiler. This is so because a large amount of time is spent reading the source program and recognizing tokens; we can reduce this time by using specialized buffering techniques. (b) Compiler portability is enhanced, because we can then restrict the language-dependent features to the lexical analyzer alone. (c) The overall design becomes much simpler.

Algorithm for Lexical Analysis
1. Parse the input characters into tokens. Note that a token is a substring of the input string, and these tokens represent basic elements like identifiers, operators, keywords, etc.
2. The input string is separated into tokens by break characters.
3. Read the source characters, check their validity, and see whether they are break characters.
4. If they are non-break characters, they are accumulated into tokens.
5. There are three types of tokens we need to look out for: (a) terminal symbols, (b) possible identifiers, (c) literals.
6. Check all the tokens by comparing them with the entries in the terminal table. If a match is found, classify the token as a terminal symbol and create an entry of type TRM in the uniform symbol table.
7. If the token is not a terminal symbol, the scanner has to check whether it is an identifier or a literal.
8. There are certain lexical rules for validating an identifier. The most common of these rules is that an identifier is an alphanumeric string that always starts with an alphabetic character.
9. We have come across this rule innumerable times while studying the C language. Note that in C too, the usage of reserved keywords as identifiers is not allowed, and identifiers must start with an alphabetic character.
10. With these lexical rules we can identify an identifier. When we are done with this, we have to scan through the entire identifier table to rule out any possibility of a duplicate identifier.
11. But our work does not end here. For an identifier, we also have to record additional information, i.e. its attributes. For example, the lexical pattern num matches both the strings 0 and 1, but it is essential to know the exact nature of the string.
12. For this additional information about possible identifiers, we must know their attributes. Identifiers have attributes like data type, base, precision, etc.
13. The identifier table is examined. If this token has not been encountered before, a new entry is made.
14. All of these go into the identifier table.


15. Lexical analysis creates a uniform symbol of type IDN.
16. Literals are analyzed next, e.g. numbers, quoted character strings, or any self-defining data.
17. Search the literal table for this token; if it is not found, a new entry is made.
18. The information regarding the literals is available in the lexical phase only. This information is obtained by examining the characters representing the literal.
19. Remember that whenever we encounter a literal, an entry of type LIT is made in the uniform symbol table.
20. In the terminal table we have one column indicating whether the token is a break character or an operator. The significance of this column is to pass information to the parser about the operators encountered in the program. The parser then replaces these operators with their respective subroutines.
21. The process of lexical scanning is thus quite simple, but there are several language-dependent features that must be taken care of. For example, in C, statements preceded by # are not treated as identifiers; in FORTRAN, a number in columns 1-5 of a source statement is treated as a statement number and not a literal.
22. Unusual cases must be considered.
23. Unusual cases are handled by the scanner using the look-ahead technique.
24. In the C language too, we deploy this technique to resolve ambiguity for tokens like += versus +. On seeing the character +, the scanner does not immediately announce the token +; it looks ahead at the next character to check whether a longer token such as += is possible.
25. A number of tools have been developed for automatically constructing lexical scanners from specifications stated in a special-purpose language.

Conclusion:
