
Syntax Analysis

Constructing LR(1) parsing tables

An LR(1) (or CLR(1)) parsing table is more powerful than an SLR table. The algorithm for constructing LR(1) parsing tables is similar to the one for LR(0), but the definition of an LR(1) item is more sophisticated. An LR(1), or canonical LR (CLR), item attaches a lookahead to each rule in a state of the LR(0) automaton. The general form of an LR(1) item is [A → α.β, t], where A → α.β is a grammar rule with a dot and t is a lookahead terminal symbol or the end marker $. The 1 in LR(1) refers to the length of the second component t, called the lookahead of the item. The lookahead has no effect on items of the form [A → α.β, t] where β is not ε. It does affect all items of the form [A → α., t]: such an item calls for a reduction by A → α only if the next input symbol is t. So an item is reduced only if the next input token matches its lookahead t (not on any lookahead, as in LR(0), and not on every symbol of Follow(A), as in SLR).

For example, consider the grammar:

S → bCd | A
A → a
C → Ab | b

Computing Follow(A) gives Follow(A) = {$, b}. Suppose an item A → a. is placed in some state I, and the state before I is a state J containing the rule C → .Ab:

J: C → .Ab      --a-->      I: A → a.
   A → .a

Then the parser should reduce by A → a only if the next input token is b: in J, the item A → .a was added by the closure of C → .Ab, so its lookahead is b. An SLR parser would also reduce on $, because $ ∈ Follow(A).

An LR(1) state is a set of LR(1) items. Every LR(1) item [A → α.β, t/k] is built from two parts:
- the current rule, with a dot separating the parsed part from the unparsed part: A → α.β;
- the symbols that can follow once the current rule is reduced: t or k.
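To make the two-part structure concrete, here is a minimal C sketch of an LR(1) item, assuming terminals are numbered and a lookahead set is stored as a bitmask. The names Item, reduce_on and the T_ constants are illustrative only, not taken from any tool. It encodes the rule above: a complete item reduces only on its own lookaheads.

```c
#include <assert.h>
#include <stdbool.h>

/* Terminals of the example grammar, numbered as bit positions of a
   lookahead set; T_END stands for the end marker $. (Illustrative names.) */
enum { T_a, T_b, T_d, T_END };

/* An LR(1) item [A -> alpha . beta, t/k]: the production, the dot position
   (how many right-hand-side symbols are already parsed) and the set of
   terminals that may follow once the rule is reduced. */
typedef struct {
    int prod;
    int dot;
    unsigned lookahead;   /* bit t is set <=> t is a lookahead of the item */
} Item;

/* A complete item (dot at the end of a right-hand side of length rhs_len)
   calls for a reduction only when the next token is one of its lookaheads. */
bool reduce_on(const Item *it, int rhs_len, int next_token) {
    return it->dot == rhs_len && (it->lookahead & (1u << next_token)) != 0;
}
```

For the item [A → a., b] of state I, reduce_on is true for b and false for $, whereas an SLR parser would reduce on all of Follow(A) = {$, b}.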

The sets of LR(1) items are constructed with the closure and goto functions of the LR(1) automaton.

Closure of an LR(1) state

An algorithm for computing Closure(I):

repeat
    for each item [A → α.Bβ, t] in I
        for each production B → γ
            for each terminal w in First(βt)
                I = I ∪ { [B → .γ, w] }
until I does not change
return I

Example: given the grammar G

S' → S
S → AaAb
S → BbBa
A → ε
B → ε

and the state I0: [S' → .S, $], compute Closure(I0).

Closure(I0):

I0: [S' → .S, $]
    [S → .AaAb, $]     First(βt) = First($) = {$}
    [S → .BbBa, $]     First(βt) = First($) = {$}
    [A → ., a]         First(βt) = First(aAb$) = {a}
    [B → ., b]         First(βt) = First(bBa$) = {b}
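The closure algorithm can be sketched in C for this grammar G. This is an illustration under stated assumptions: symbols are small integers with terminals numbered first, each lookahead set is a bitmask, and items with the same core share one entry whose lookahead set grows (equivalent to keeping one item per lookahead). First sets are computed by a fixed-point loop, because A and B derive ε.

```c
#include <assert.h>
#include <stdbool.h>

/* Grammar G: 0: S'->S  1: S->AaAb  2: S->BbBa  3: A->eps  4: B->eps.
   Terminals come first, so a symbol s is a terminal iff s < NTERM. */
enum { T_a, T_b, T_END, N_Sp, N_S, N_A, N_B, NSYM };
#define NTERM 3
#define NPROD 5
#define MAXITEMS 16
static const int lhs[NPROD]    = { N_Sp, N_S, N_S, N_A, N_B };
static const int rhs[NPROD][4] = { {N_S}, {N_A, T_a, N_A, T_b}, {N_B, T_b, N_B, T_a}, {0}, {0} };
static const int rlen[NPROD]   = { 1, 4, 4, 0, 0 };

/* [A -> alpha . beta, la]: production, dot position, lookahead bitmask. */
typedef struct { int prod, dot; unsigned la; } Item;

static unsigned first[NSYM];   /* First sets as terminal bitmasks */
static bool nullable[NSYM];    /* true if the symbol derives epsilon */

static void compute_first(void) {
    for (int t = 0; t < NTERM; t++) first[t] = 1u << t;
    for (bool changed = true; changed; ) {
        changed = false;
        for (int p = 0; p < NPROD; p++) {
            unsigned f = first[lhs[p]];
            bool all_nullable = true;           /* all symbols seen so far derive epsilon */
            for (int i = 0; i < rlen[p] && all_nullable; i++) {
                f |= first[rhs[p][i]];
                all_nullable = nullable[rhs[p][i]];
            }
            if (f != first[lhs[p]]) { first[lhs[p]] = f; changed = true; }
            if (all_nullable && !nullable[lhs[p]]) { nullable[lhs[p]] = true; changed = true; }
        }
    }
}

/* First(beta t) where beta is rhs[p][from..] and t ranges over the set la. */
static unsigned first_of(int p, int from, unsigned la) {
    unsigned f = 0;
    for (int i = from; i < rlen[p]; i++) {
        f |= first[rhs[p][i]];
        if (!nullable[rhs[p][i]]) return f;
    }
    return f | la;                              /* beta derives epsilon: add the lookaheads */
}

/* Closure of the n items in I (capacity MAXITEMS); returns the new size. */
static int closure(Item *I, int n) {
    for (bool changed = true; changed; ) {
        changed = false;
        for (int i = 0; i < n; i++) {
            Item x = I[i];
            if (x.dot >= rlen[x.prod] || rhs[x.prod][x.dot] < NTERM) continue;
            int B = rhs[x.prod][x.dot];         /* item [A -> alpha . B beta, t] */
            unsigned w = first_of(x.prod, x.dot + 1, x.la);
            for (int p = 0; p < NPROD; p++) {
                if (lhs[p] != B) continue;      /* each production B -> gamma */
                int k = 0;
                while (k < n && !(I[k].prod == p && I[k].dot == 0)) k++;
                if (k == n) { assert(n < MAXITEMS); I[n++] = (Item){ p, 0, 0 }; changed = true; }
                if ((I[k].la | w) != I[k].la) { I[k].la |= w; changed = true; }
            }
        }
    }
    return n;
}
```

Calling compute_first() and then closure on the single item [S' → .S, $] yields the five items of I0 listed above, in that order.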

Goto of an LR(1) state

goto(I, X), where I is a state of the LR(1) automaton and X is a grammar symbol, is the closure of the set of all items [A → αX.β, t] such that [A → α.Xβ, t] is in I. An algorithm for computing goto(I, X):

J = ∅
for each item [A → α.Xβ, t] in I
    add [A → αX.β, t] to J
return Closure(J)

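Example: for the grammar G above (S' → S, S → AaAb, S → BbBa, A → ε, B → ε) and the state

I0: [S' → .S, $], [S → .AaAb, $], [S → .BbBa, $], [A → ., a], [B → ., b]

compute goto(I0, A) and goto(I0, B):

goto(I0, A) = J1: [S → A.aAb, $]
goto(I0, B) = J2: [S → B.bBa, $]

The closure adds no items to J1 or J2, because in both the dot stands before a terminal.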
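The first half of goto(I, X), the loop that builds J, can be sketched in C as below. The symbol encoding and bitmask lookaheads are assumptions of this illustration, and the function only returns the kernel J: the full goto(I, X) still applies Closure(J), which for J1 and J2 adds nothing.

```c
#include <assert.h>

/* Grammar G: 0: S'->S  1: S->AaAb  2: S->BbBa  3: A->eps  4: B->eps */
enum { T_a, T_b, T_END, N_Sp, N_S, N_A, N_B };
#define NPROD 5
static const int rhs[NPROD][4] = { {N_S}, {N_A, T_a, N_A, T_b}, {N_B, T_b, N_B, T_a}, {0}, {0} };
static const int rlen[NPROD]   = { 1, 4, 4, 0, 0 };

typedef struct { int prod, dot; unsigned la; } Item;   /* la: lookahead bitmask */

/* Kernel of goto(I, X): for every item [A -> alpha . X beta, t] of I, add
   [A -> alpha X . beta, t] to J. The lookahead is copied unchanged.
   Returns the number of items written to J. */
int goto_kernel(const Item *I, int n, int X, Item *J) {
    int m = 0;
    for (int i = 0; i < n; i++)
        if (I[i].dot < rlen[I[i].prod] && rhs[I[i].prod][I[i].dot] == X)
            J[m++] = (Item){ I[i].prod, I[i].dot + 1, I[i].la };
    return m;
}
```

Applied to I0 with X = A it returns the single item [S → A.aAb, $] (J1); with X = a it returns nothing, since no item of I0 has its dot before a.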

Constructing an LR(1) automaton

An algorithm for constructing C, the canonical collection of sets of LR(1) items, for an augmented grammar G':

C = { Closure({[S' → .S, $]}) }
repeat
    for each set of items I in C and each grammar symbol X
        such that goto(I, X) is not empty and not in C
            add goto(I, X) to C        // goto already returns a closed set
until no more sets of items can be added to C
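Example: build the LR(1) automaton for the grammar G: S' → S, S → AaAb | BbBa, A → ε, B → ε.

I0: [S' → .S, $], [S → .AaAb, $], [S → .BbBa, $], [A → ., a], [B → ., b]
I1 = goto(I0, S): [S' → S., $]
I2 = goto(I0, A): [S → A.aAb, $]
I3 = goto(I2, a): [S → Aa.Ab, $], [A → ., b]
I4 = goto(I3, A): [S → AaA.b, $]
I5 = goto(I4, b): [S → AaAb., $]
I6 = goto(I0, B): [S → B.bBa, $]
I7 = goto(I6, b): [S → Bb.Ba, $], [B → ., a]
I8 = goto(I7, B): [S → BbB.a, $]
I9 = goto(I8, a): [S → BbBa., $]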
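Putting closure, goto and the worklist loop together gives a small C program that builds the canonical LR(1) collection for G. As in the other sketches, the symbol encoding and bitmask lookaheads are assumptions of this illustration; states are numbered in discovery order, which need not match the numbering I0 to I9 used above.

```c
#include <assert.h>
#include <stdbool.h>

/* Grammar G: 0: S'->S  1: S->AaAb  2: S->BbBa  3: A->eps  4: B->eps */
enum { T_a, T_b, T_END, N_Sp, N_S, N_A, N_B, NSYM };   /* terminals first */
#define NTERM 3
#define NPROD 5
#define MAXITEMS 16
#define MAXSTATES 32
static const int lhs[NPROD]    = { N_Sp, N_S, N_S, N_A, N_B };
static const int rhs[NPROD][4] = { {N_S}, {N_A, T_a, N_A, T_b}, {N_B, T_b, N_B, T_a}, {0}, {0} };
static const int rlen[NPROD]   = { 1, 4, 4, 0, 0 };

typedef struct { int prod, dot; unsigned la; } Item;   /* la: lookahead bitmask */
typedef struct { int n; Item it[MAXITEMS]; } State;    /* at most one item per core */
static unsigned first[NSYM];   /* First sets as terminal bitmasks */
static bool nullable[NSYM];

static void compute_first(void) {
    for (int t = 0; t < NTERM; t++) first[t] = 1u << t;
    for (bool changed = true; changed; ) {
        changed = false;
        for (int p = 0; p < NPROD; p++) {
            unsigned f = first[lhs[p]];
            bool eps = true;                   /* all symbols seen so far are nullable */
            for (int i = 0; i < rlen[p] && eps; i++) { f |= first[rhs[p][i]]; eps = nullable[rhs[p][i]]; }
            if (f != first[lhs[p]]) { first[lhs[p]] = f; changed = true; }
            if (eps && !nullable[lhs[p]]) { nullable[lhs[p]] = true; changed = true; }
        }
    }
}

/* First(beta t) for beta = rhs[p][from..] and every t in la. */
static unsigned first_of(int p, int from, unsigned la) {
    unsigned f = 0;
    for (int i = from; i < rlen[p]; i++) {
        f |= first[rhs[p][i]];
        if (!nullable[rhs[p][i]]) return f;
    }
    return f | la;
}

static void closure(State *s) {
    for (bool changed = true; changed; ) {
        changed = false;
        for (int i = 0; i < s->n; i++) {
            Item x = s->it[i];
            if (x.dot >= rlen[x.prod] || rhs[x.prod][x.dot] < NTERM) continue;
            unsigned w = first_of(x.prod, x.dot + 1, x.la);
            for (int p = 0; p < NPROD; p++) {
                if (lhs[p] != rhs[x.prod][x.dot]) continue;   /* productions B -> gamma */
                int k = 0;
                while (k < s->n && !(s->it[k].prod == p && s->it[k].dot == 0)) k++;
                if (k == s->n) { assert(s->n < MAXITEMS); s->it[s->n++] = (Item){ p, 0, 0 }; changed = true; }
                if ((s->it[k].la | w) != s->it[k].la) { s->it[k].la |= w; changed = true; }
            }
        }
    }
}

/* goto(I, X): move the dot over X in every item of I, then close. */
static State go(const State *I, int X) {
    State J = { 0 };
    for (int i = 0; i < I->n; i++) {
        Item x = I->it[i];
        if (x.dot < rlen[x.prod] && rhs[x.prod][x.dot] == X)
            J.it[J.n++] = (Item){ x.prod, x.dot + 1, x.la };
    }
    closure(&J);
    return J;
}

/* Equal states: same items with the same lookaheads. */
static bool same_state(const State *s, const State *t) {
    if (s->n != t->n) return false;
    for (int i = 0; i < s->n; i++) {
        int k = 0;
        while (k < t->n && !(t->it[k].prod == s->it[i].prod && t->it[k].dot == s->it[i].dot
                             && t->it[k].la == s->it[i].la)) k++;
        if (k == t->n) return false;
    }
    return true;
}

static State C[MAXSTATES];    /* the canonical collection */
static int nstates;

static int build_collection(void) {
    compute_first();
    C[0] = (State){ 1, { { 0, 0, 1u << T_END } } };   /* Closure({[S' -> .S, $]}) */
    closure(&C[0]);
    nstates = 1;
    for (int i = 0; i < nstates; i++)                  /* also visits states added later */
        for (int X = 0; X < NSYM; X++) {
            State J = go(&C[i], X);
            if (J.n == 0) continue;
            int k = 0;
            while (k < nstates && !same_state(&C[k], &J)) k++;
            if (k == nstates) { assert(nstates < MAXSTATES); C[nstates++] = J; }
        }
    return nstates;
}
```

For G, build_collection() finds the ten states I0 to I9.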

Building action and goto tables for an LR(1) parser

Input: an augmented grammar G'.
Output: the action and goto tables.

1. Construct C = {I0, I1, ..., In}, the collection of sets of LR(1) items for G'.
2. State k (row k of the table) is constructed from Ik. The parsing actions for state k are determined as follows:
   a. If [B → γ., t] is in Ik, set action[k, t] to reduce B → γ (where B ≠ S').
   b. If [S' → S., $] is in Ik, set action[k, $] to accept.
   c. If [A → α.vβ, t] is in Ik and goto(Ik, v) = Ij, set action[k, v] to shift j (where v is a terminal symbol).
3. The goto transitions for state k are constructed for all nonterminals A using the rule: if goto(Ik, A) = Ij, then goto[k, A] = j.
4. All entries not defined by steps 2 and 3 are errors.
5. The initial state of the parser is the one constructed from the set of items containing [S' → .S, $].

Example: build the action and goto tables for the grammar G:

1. S' → S
2. S → AaAb
3. S → BbBa
4. A → ε
5. B → ε

First construct the LR(1) automaton (previous example). The number of rows in the action and goto tables equals the number of states of the LR(1) automaton: I0 to I9. The action table has 3 columns for the terminal symbols a, b and $; the goto table has 3 columns for the variables S, A and B. The resulting table is (sj: shift and go to state j; rk: reduce by production k):

         action              goto
State    a     b     $       S     A     B
0        r4    r5            1     2     6
1                    acc
2        s3
3              r4                  4
4              s5
5                    r2
6              s7
7        r5                              8
8        s9
9                    r3
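The table can be exercised with the usual LR driver loop. The C sketch below hard-codes the action and goto tables above; the Action struct with kinds 's', 'r' and 'a' is an encoding chosen for this sketch. A shift pushes a state; a reduce pops one state per right-hand-side symbol and then pushes goto[top, lhs].

```c
#include <assert.h>
#include <stdbool.h>

enum { T_a, T_b, T_END };   /* action-table columns */
enum { N_S, N_A, N_B };     /* goto-table columns */

/* kind: 's' = shift to state n, 'r' = reduce by production n,
   'a' = accept, 0 = error (empty entry) */
typedef struct { char kind; int n; } Action;

/* Productions 1: S'->S 2: S->AaAb 3: S->BbBa 4: A->eps 5: B->eps.
   Production 1 is never reduced (state 1 accepts instead). */
static const int prod_lhs[6] = { 0, -1, N_S, N_S, N_A, N_B };
static const int prod_len[6] = { 0, 1, 4, 4, 0, 0 };

static const Action action[10][3] = {
    /*          a          b          $        */
    /* 0 */ { {'r', 4}, {'r', 5}, {0}        },
    /* 1 */ { {0},      {0},      {'a', 0}   },
    /* 2 */ { {'s', 3}, {0},      {0}        },
    /* 3 */ { {0},      {'r', 4}, {0}        },
    /* 4 */ { {0},      {'s', 5}, {0}        },
    /* 5 */ { {0},      {0},      {'r', 2}   },
    /* 6 */ { {0},      {'s', 7}, {0}        },
    /* 7 */ { {'r', 5}, {0},      {0}        },
    /* 8 */ { {'s', 9}, {0},      {0}        },
    /* 9 */ { {0},      {0},      {'r', 3}   },
};

/* goto[state][nonterminal]; -1 marks an empty entry */
static const int goto_tab[10][3] = {
    { 1, 2, 6 }, { -1, -1, -1 }, { -1, -1, -1 }, { -1, 4, -1 }, { -1, -1, -1 },
    { -1, -1, -1 }, { -1, -1, -1 }, { -1, -1, 8 }, { -1, -1, -1 }, { -1, -1, -1 },
};

/* Runs the LR driver on a token array terminated by T_END. */
bool lr_parse(const int *input) {
    int stack[64], top = 0;
    stack[0] = 0;                                   /* initial state I0 */
    for (;;) {
        Action act = action[stack[top]][*input];
        if (act.kind == 's') {                      /* shift: push state n, advance */
            assert(top < 63);
            stack[++top] = act.n;
            input++;
        } else if (act.kind == 'r') {               /* reduce by production n */
            top -= prod_len[act.n];                 /* pop one state per rhs symbol */
            int g = goto_tab[stack[top]][prod_lhs[act.n]];
            if (g < 0) return false;
            assert(top < 63);
            stack[++top] = g;
        } else {
            return act.kind == 'a';                 /* accept, or 0 = error entry */
        }
    }
}
```

The grammar generates exactly the strings ab and ba. For ab the parser first reduces by A → ε in state 0 on a (the r4 entry of row 0), then shifts a into state 3.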

A grammar is LR(1) if its LR(1) automaton has no conflicts:
- shift-reduce conflict: for any item [A → α.xβ, t] in a state I, with x a terminal, there must be no item of the form [B → γ., x] in I;
- reduce-reduce conflict: there must be no two items of the form [A → α., t] and [B → β., t] in the same state I.

LALR

LR(1) parsing tables can be very large. The LR(1) automaton often contains states that repeat the same items and differ only in their lookahead components. A smaller table is produced by merging any two states whose items are identical except for the lookahead sets. Example:

J: [A → α., b]  [B → β., c]
K: [A → α., d]  [B → β., a]

States J and K can be merged because they contain the same items; only the lookahead symbols differ. The merged state JK contains:

JK: [A → α., b/d]  [B → β., c/a]

An LR(1) automaton with merged states is called an LALR(1) automaton.
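Merging can be sketched in C with lookahead sets as bitmasks: two states merge when they have the same cores (the same items ignoring lookaheads), and each merged item receives the union of the two lookahead sets. The State layout and the production numbers used in the test (0 for A → α, 1 for B → β) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Terminals as bit positions of a lookahead set (illustrative). */
enum { T_a, T_b, T_c, T_d };

/* An LR(1) item: its core (production, dot) plus a lookahead bitmask. */
typedef struct { int prod, dot; unsigned la; } Item;
typedef struct { int n; Item it[8]; } State;

/* Index of the item of s with the given core, or -1. */
static int find_core(const State *s, int prod, int dot) {
    for (int i = 0; i < s->n; i++)
        if (s->it[i].prod == prod && s->it[i].dot == dot) return i;
    return -1;
}

/* Two states can be merged when they have the same cores.
   Assumes no core occurs twice within one state. */
bool same_core(const State *j, const State *k) {
    if (j->n != k->n) return false;
    for (int i = 0; i < j->n; i++)
        if (find_core(k, j->it[i].prod, j->it[i].dot) < 0) return false;
    return true;
}

/* Merge k into j: every item of j gets the union of both lookahead sets. */
void merge_states(State *j, const State *k) {
    assert(same_core(j, k));
    for (int i = 0; i < k->n; i++)
        j->it[find_core(j, k->it[i].prod, k->it[i].dot)].la |= k->it[i].la;
}
```

Merging K into J this way gives exactly the state JK: lookaheads b/d for A → α. and c/a for B → β.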

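LALR stands for LookAhead LR, and the 1 in LALR(1) means one token of lookahead. An LALR(1) automaton has the same number of states as the SLR automaton, which is typically about ten times fewer than the LR(1) automaton.

Do we get an SLR table after merging the states of the LR(1) automaton? No. Sometimes merging two states yields lookahead sets equal to the complete Follow set, but sometimes the merged lookaheads are still a proper subset of Follow. Example:

S' → S
S → Bb | aa | bBa
B → a

Follow(B) = {b, a}

In the SLR table we have:

I0: S' → .S, S → .Bb, S → .aa, S → .bBa, B → .a
I2 = goto(I0, a): S → a.a, B → a.

so action[2, a] has a shift-reduce conflict: shift a for S → a.a, and reduce by B → a because a ∈ Follow(B).

In the LALR table we have:

I0: [S' → .S, $], [S → .Bb, $], [S → .aa, $], [S → .bBa, $], [B → .a, b]
I2 = goto(I0, a): [S → a.a, $], [B → a., b]

There are no conflicts: B → a is reduced only on b, while a is shifted.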
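The difference between the two tables can be checked mechanically. In this C sketch (the Item encoding is an assumption made for illustration), each item records the symbol after its dot (-1 when the item is complete) and, for a complete item, the terminals on which it reduces. A shift-reduce conflict is a terminal that is both shifted and in some reduce set; a reduce-reduce conflict is a terminal in two reduce sets. For I2 the SLR reduce set of B → a. is Follow(B) = {a, b}, while the LALR set is the lookahead {b}.

```c
#include <assert.h>
#include <stdbool.h>

/* Terminals of S -> Bb | aa | bBa, B -> a; nonterminals would be >= NTERM. */
enum { T_a, T_b, T_END };
#define NTERM 3

/* next: symbol after the dot, or -1 for a complete item;
   reduce_on: terminals on which a complete item reduces (bitmask). */
typedef struct { int next; unsigned reduce_on; } Item;

/* Some item shifts terminal x while a complete item reduces on x. */
bool shift_reduce_conflict(const Item *it, int n) {
    unsigned shifts = 0, reduces = 0;
    for (int i = 0; i < n; i++) {
        if (it[i].next >= 0 && it[i].next < NTERM) shifts |= 1u << it[i].next;
        if (it[i].next < 0) reduces |= it[i].reduce_on;
    }
    return (shifts & reduces) != 0;
}

/* Two complete items reduce on the same terminal. */
bool reduce_reduce_conflict(const Item *it, int n) {
    unsigned seen = 0;
    for (int i = 0; i < n; i++)
        if (it[i].next < 0) {
            if (seen & it[i].reduce_on) return true;
            seen |= it[i].reduce_on;
        }
    return false;
}
```

Both versions of I2 hold the same two items S → a.a and B → a.; only the reduce set of B → a. changes.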

An LALR table construction

An LALR(1) table can be constructed as follows:
1. Construct the automaton of LR(1) items.
2. Merge the states that have the same core (identical items apart from the lookaheads).
3. Build the action and goto tables.

Another way to construct an LALR(1) table is step-by-step merging: merge states as soon as each new state is constructed, so the full LR(1) automaton is never built. A grammar is said to be LALR(1) if its LALR parsing table contains no conflicts. The syntax of most practical programming languages can be described by an LALR(1) grammar, and many parser-generator tools are available for LALR(1) grammars.

LALR example

Given the grammar G:

1. S' → S
2. S → E
3. E → E+T
4. E → T
5. T → (E)
6. T → num

First compute the First and Follow sets of the variables of G:

          S'       S        E          T
First     num (    num (    num (      num (
Follow    $        $        $ + )      $ + )
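The First and Follow sets above can be computed by a fixed-point iteration. Here is a C sketch under one simplifying assumption that holds for this grammar: there are no ε-productions, so First of a right-hand side is First of its first symbol, and a nonterminal at the end of a rule inherits Follow of the left-hand side. The symbol encoding is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Grammar: 1: S'->S 2: S->E 3: E->E+T 4: E->T 5: T->(E) 6: T->num (stored 0-based).
   Terminals come first, so a symbol s is a terminal iff s < NTERM. */
enum { T_num, T_plus, T_lp, T_rp, T_END, N_Sp, N_S, N_E, N_T, NSYM };
#define NTERM 5
#define NPROD 6
static const int lhs[NPROD]    = { N_Sp, N_S, N_E, N_E, N_T, N_T };
static const int rhs[NPROD][3] = { {N_S}, {N_E}, {N_E, T_plus, N_T}, {N_T}, {T_lp, N_E, T_rp}, {T_num} };
static const int rlen[NPROD]   = { 1, 1, 3, 1, 3, 1 };

static unsigned first[NSYM], follow[NSYM];   /* sets of terminals as bitmasks */

/* Assumes no epsilon-productions (true here): First(X1..Xn) = First(X1). */
static void compute_first_follow(void) {
    for (int s = 0; s < NSYM; s++) {
        first[s] = s < NTERM ? 1u << s : 0;
        follow[s] = 0;
    }
    follow[N_Sp] = 1u << T_END;                  /* $ follows the start symbol */
    for (bool changed = true; changed; ) {
        changed = false;
        for (int p = 0; p < NPROD; p++) {
            unsigned f = first[lhs[p]] | first[rhs[p][0]];
            if (f != first[lhs[p]]) { first[lhs[p]] = f; changed = true; }
            for (int i = 0; i < rlen[p]; i++) {
                int B = rhs[p][i];
                if (B < NTERM) continue;         /* Follow is defined for nonterminals */
                /* First of the next symbol, or Follow(lhs) if B ends the rule */
                unsigned add = i + 1 < rlen[p] ? first[rhs[p][i + 1]] : follow[lhs[p]];
                if ((follow[B] | add) != follow[B]) { follow[B] |= add; changed = true; }
            }
        }
    }
}
```

After compute_first_follow(), every variable has First = {num, (}, Follow(S') = Follow(S) = {$}, and Follow(E) = Follow(T) = {$, +, )}, matching the table.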

Afterwards build the LR(1) item automaton (the canonical LR(1) collection), merging the states that have the same items (cores) and differ only in their lookaheads:

I1: [S' → .S, $], [S → .E, $], [E → .E+T, $/+], [E → .T, $/+], [T → .(E), $/+], [T → .num, $/+]
I2 = goto(I1, S): [S' → S., $]
I3 = goto(I1, E): [S → E., $], [E → E.+T, $/+]
I4 = goto(I1, T) = goto(I5, T): [E → T., $/+/)]
I5 = goto(I1, '(') = goto(I5, '(') = goto(I7, '('): [T → (.E), $/+/)], [E → .E+T, )/+], [E → .T, )/+], [T → .(E), )/+], [T → .num, )/+]
I6 = goto(I1, num) = goto(I5, num) = goto(I7, num): [T → num., $/+/)]
I7 = goto(I3, '+') = goto(I8, '+'): [E → E+.T, $/+/)], [T → .(E), $/+/)], [T → .num, $/+/)]
I8 = goto(I5, E): [T → (E.), $/+/)], [E → E.+T, )/+]
I9 = goto(I7, T): [E → E+T., $/+/)]
I10 = goto(I8, ')'): [T → (E)., $/+/)]

States I4 to I10 are merged states (the canonical LR(1) automaton for this grammar has 17 states). The resulting LALR action/goto table is:

         action                               goto
State    num    +      (      )      $        S     E     T
1        s6            s5                     2     3     4
2                                    acc
3               s7                   r2
4               r4            r4     r4
5        s6            s5                           8     4
6               r6            r6     r6
7        s6            s5                                 9
8               s7            s10
9               r3            r3     r3
10              r5            r5     r5
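To check the LALR table, here is a C sketch of an LR driver that runs it and also evaluates the expression, keeping a value stack next to the state stack, much as a Bison parser does for $$ and $i. The token encoding, the S()/R() table macros and the semantic values attached to the rules (sum for E → E+T, pass-through otherwise) are assumptions for illustration. Note that state 4 must reduce by E → T on ')' as well as on + and $: it is reached on T both from I1 and from I5, so input such as (num) needs that entry.

```c
#include <assert.h>
#include <stdbool.h>

enum { T_num, T_plus, T_lp, T_rp, T_END };   /* action-table columns */
enum { N_S, N_E, N_T };                      /* goto-table columns */

typedef struct { char kind; int n; } Action; /* 's' shift, 'r' reduce, 'a' accept, 0 error */
typedef struct { int kind, value; } Token;   /* value is used only by num */

/* Productions: 1: S'->S 2: S->E 3: E->E+T 4: E->T 5: T->(E) 6: T->num */
static const int plhs[7] = { 0, 0, N_S, N_E, N_E, N_T, N_T };
static const int plen[7] = { 0, 1, 1, 3, 1, 3, 1 };

#define S(j) {'s', j}
#define R(p) {'r', p}
static const Action action[11][5] = {        /* row 0 unused: states are 1..10 */
    [1]  = { [T_num] = S(6), [T_lp] = S(5) },
    [2]  = { [T_END] = {'a', 0} },
    [3]  = { [T_plus] = S(7), [T_END] = R(2) },
    [4]  = { [T_plus] = R(4), [T_rp] = R(4), [T_END] = R(4) },
    [5]  = { [T_num] = S(6), [T_lp] = S(5) },
    [6]  = { [T_plus] = R(6), [T_rp] = R(6), [T_END] = R(6) },
    [7]  = { [T_num] = S(6), [T_lp] = S(5) },
    [8]  = { [T_plus] = S(7), [T_rp] = S(10) },
    [9]  = { [T_plus] = R(3), [T_rp] = R(3), [T_END] = R(3) },
    [10] = { [T_plus] = R(5), [T_rp] = R(5), [T_END] = R(5) },
};
static const int goto_tab[11][3] = {          /* 0 marks an empty entry */
    [1] = { [N_S] = 2, [N_E] = 3, [N_T] = 4 },
    [5] = { [N_E] = 8, [N_T] = 4 },
    [7] = { [N_T] = 9 },
};

/* Parses the tokens and evaluates the sum; returns false on a syntax error. */
bool lalr_eval(const Token *in, int *result) {
    int st[64], val[64], top = 0;
    st[0] = 1; val[0] = 0;                    /* state 1 is the initial state */
    for (;;) {
        Action a = action[st[top]][in->kind];
        if (a.kind == 's') {
            assert(top < 63);
            st[++top] = a.n; val[top] = in->value; in++;
        } else if (a.kind == 'r') {
            /* $$ for the left-hand side, computed from the top plen values */
            int v = a.n == 3 ? val[top - 2] + val[top]   /* E -> E+T : $1 + $3 */
                  : a.n == 5 ? val[top - 1]              /* T -> (E) : $2      */
                  : val[top];                            /* otherwise $$ = $1  */
            top -= plen[a.n];
            int g = goto_tab[st[top]][plhs[a.n]];
            if (g == 0) return false;
            st[++top] = g; val[top] = v;
        } else if (a.kind == 'a') {
            *result = val[top];
            return true;
        } else {
            return false;                     /* empty entry: syntax error */
        }
    }
}
```

For the tokens of 2+(3+4) the driver accepts with the value 9; for num + followed by the end marker it stops at the empty entry action[7, $].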


A hierarchy of grammar classes

[Figure: nested classes of grammars, all inside the unambiguous grammars: LR(0) ⊂ SLR ⊂ LALR ⊂ LR(1) ⊂ LR(k), and LL(0) ⊂ LL(1) ⊂ LL(k) ⊂ LR(k).]

Bison parser generator

Bison is a YACC-compatible parser generator from the GNU project, available on PCs as well as on UNIX. YACC (Yet Another Compiler-Compiler) is a classic and widely used parser generator, written by S. C. Johnson in the early 1970s; it is available as a command on UNIX systems. A Bison input file has the extension .y:

filename.y → bison → filename_tab.c

Bison transforms filename.y into a C program that implements an LALR(1) parser for the grammar (GNU Bison's default output name is filename.tab.c). The input file for Bison is a grammar file written in Bison's syntax. Its general form is:

%{
C user declarations
%}
Bison parser declarations
%%
grammar rules
%%
additional C code

The C user declarations are copied verbatim into the generated C file.

The parser declarations include the lists of terminal symbols (tokens) and nonterminal symbols and the priorities of operators. For example, a token declaration:

%token MY_TOKEN

The grammar rules are productions of the form

exp : exp PLUS exp { semantic action }

where exp is a grammar variable producing the right-hand side exp PLUS exp, and PLUS is a terminal token. Semantic actions are written in C and are performed whenever the parser reduces by that rule. The additional C code is also copied verbatim into the generated file; for example, the function yylex().

Example:

%{
/* The next two lines are needed for the generated parser to compile in Visual Studio */
#define alloca _alloca
#include <malloc.h>
#include <stdio.h>
int yylex(void);
void yyerror(const char *s);
%}
%token PLUS   /* tokens returned by yylex */
%token DOT
%token NUM
%%
input : NUM PLUS NUM DOT { $$ = $1 + $3; printf("\n%d\n", $$); } input
      | /* empty */
      ;       /* ';' ends the rule */
%%
#include "lex.yy.c"   /* produced by flex, or written by hand; contains yylex() */
int main(void) { return yyparse(); }
void yyerror(const char *s) { printf("%s\n", s); }

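where the input file for flex is:

%{
#include <stdlib.h>                /* atoi */
int yywrap(void) { return 1; }     /* no further input files */
%}
%%
[0-9]+    { yylval = atoi(yytext); return NUM; }
"+"       { return PLUS; }
<<EOF>>   { return 0; }            /* 0 tells yyparse that the input has ended */
"."       { return DOT; }
.         ;                        /* ignore any other character */
%%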

Stages in using Bison

1. Formally specify the grammar in a form recognized by Bison. For each grammar rule of the language, describe the action to be taken when an instance of that rule is recognized; the action is a sequence of C statements.
2. Write a lexical analyzer to process the input and pass tokens to the parser. The lexical analyzer may be written by hand in C, or produced with flex.
3. Write the error-reporting routines (yyerror).

Integration with the lexer

The Bison parser is a C function named yyparse(). yyparse() calls yylex() whenever it needs a new token, and the value returned by yylex() tells the parser which token was read. For example:

in the flex file       in the Bison file
return (TOKEN);        %token TOKEN   (TOKEN is then used in the productions)
return ('a');          'a' is used directly in a production rule

The global variable yylval is used to return a more interesting value along with the token, such as the value of a number. yylval is an int unless you declare another type.

Conflicts policy

Bison reports shift-reduce and reduce-reduce conflicts. Unless operator precedence declarations decide otherwise, it resolves a shift-reduce conflict by shifting, and a reduce-reduce conflict by reducing with the rule that appears earlier in the grammar.

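Using attributes in Bison

Actions can be associated with Bison productions:

a : b c d  { action for this production }
  | e { action for this production } f
  ;

An action is written in curly braces { and } and is executed at reduce time. An action can also be embedded in the middle of a rule, as in the second alternative. Elements of a production are referred to with the $ sign: $$ denotes the semantic value of the nonterminal on the left-hand side of the rule (of type YYSTYPE, the type of yylval), and $i denotes the value of the i-th element of the right-hand side, where an embedded action counts as an element. In the example, a is $$, b is $1, c is $2, d is $3, e is $1 and f is $3.

In our example:

input : NUM PLUS NUM DOT { $$=$1+$3; printf("\n%d\n",$$); } input

$1 is the value of the first NUM token, $3 is the value of the second NUM token and $4 is the value of the DOT token. Because the action is followed by input, it is an embedded (mid-rule) action, so $$ here is the value of the action itself rather than the value of input.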
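The mapping from $$ and $i to stack positions can be sketched in C. Assuming a value stack vs[0..top] like the one a Bison-generated parser keeps (the names ValueStack, push, dollar and reduce are hypothetical, not Bison's internals), $i of an n-symbol right-hand side is vs[top - n + i], and a reduction pops n values and pushes $$. With i = 0 the same formula reads the value just below the rule, which is what $0 refers to.

```c
#include <assert.h>

/* Hypothetical value stack: vs[0..top] holds the semantic values of the
   symbols currently on the parser stack (top == -1 means empty). */
typedef struct { int vs[64]; int top; } ValueStack;

static void push(ValueStack *s, int v) {
    assert(s->top < 63);
    s->vs[++s->top] = v;
}

/* $i for a right-hand side of n symbols; i == 0 reads the value below the rule ($0). */
static int dollar(const ValueStack *s, int n, int i) {
    int k = s->top - n + i;
    assert(0 <= i && i <= n && k >= 0);
    return s->vs[k];
}

/* Reduction: pop the n right-hand-side values and push $$. */
static void reduce(ValueStack *s, int n, int dd) {
    assert(n <= s->top + 1);
    s->top -= n;
    push(s, dd);
}
```

For a four-symbol rule such as NUM PLUS NUM DOT with values 2, 0, 3, 0 on the stack, dollar(s, 4, 1) + dollar(s, 4, 3) is 5, and reduce(s, 4, 5) leaves the single value 5.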


Another example:

%{
#define alloca _alloca   /* needed for Visual Studio */
#include <malloc.h>
#include <stdio.h>
#include <ctype.h>
/* several possible data types for semantic values
   (the Bison declaration %union serves the same purpose) */
typedef union {
    char *name;
    int type;
} STYPE;
/* YYSTYPE is the macro for the data type of semantic values; int by default */
#define YYSTYPE STYPE
void add2symtable(char *name, int type);
int yylex(void);
%}
%token <type> TYPE   /* <member> specifies the type of a token or variable */
%token <name> ID
%type <type> var
%%
dec : TYPE var ';' { $<type>2 = $<type>1; }
    ;
var : var ',' ID { $$ = $1; add2symtable($3, $$); }
    | ID { $<type>$ = $<type>0;   /* $<type>$ selects an explicit type for $$; $0 is TYPE, just below var */
           add2symtable($<name>1, $<type>$); }
    ;
%%
void add2symtable(char *name, int type)
{
    /* puts name in the symbol table and attaches type to it */
}
int yylex(void) { ... }
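The body of add2symtable is left open above. One possible minimal version is a linked list of (name, type) pairs; this is a hypothetical sketch, and lookup_type is an invented helper for reading the table back. The name is copied because a lexer usually reuses its text buffer (yytext) for the next token.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol table: a linked list of (name, type) pairs. */
typedef struct Symbol { char *name; int type; struct Symbol *next; } Symbol;
static Symbol *symtable = NULL;

/* Puts name in the symbol table and attaches type to it.
   A newer entry shadows an older one with the same name. */
void add2symtable(char *name, int type) {
    Symbol *s = malloc(sizeof *s);
    assert(s != NULL);
    s->name = malloc(strlen(name) + 1);   /* copy: the lexer may reuse its buffer */
    assert(s->name != NULL);
    strcpy(s->name, name);
    s->type = type;
    s->next = symtable;
    symtable = s;
}

/* Hypothetical helper: the type attached to name, or -1 if it is unknown. */
int lookup_type(const char *name) {
    for (Symbol *s = symtable; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0) return s->type;
    return -1;
}
```

With this version, a declaration such as a TYPE token followed by x, y leaves both names in the table with the type carried by TYPE.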


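Integer arithmetic example:

%{
#define alloca _alloca   /* needed for Visual Studio */
#include <malloc.h>
#include <stdio.h>
int yylex(void);
void yyerror(const char *s);
%}
%token INTEGER
%%
S : E             { printf("%d\n", $1); }
  ;
E : T             { $$ = $1; }
  | E A T         { switch ($2) {
                      case '+': $$ = $1 + $3; break;
                      case '-': $$ = $1 - $3; break;
                    } }
  ;
T : F             { $$ = $1; }
  | T M F         { switch ($2) {
                      case '*': $$ = $1 * $3; break;
                      case '/': $$ = $1 / $3; break;
                    } }
  ;
F : '(' E ')'     { $$ = $2; }
  | INTEGER       { $$ = $1; }
  | '-' INTEGER   { $$ = -$2; }
  | '-' '(' E ')' { $$ = -$3; }
  ;
A : '+'           { $$ = '+'; }
  | '-'           { $$ = '-'; }
  ;
M : '*'           { $$ = '*'; }
  | '/'           { $$ = '/'; }
  ;
%%
#include "lex.yy.c"
void yyerror(const char *s) { printf("%s\n", s); }
int main(void)
{
    if (!yyparse()) printf("Parsed correctly\n");
    else printf("Illegal expression\n");
    return 0;
}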
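This example includes lex.yy.c but does not show the lexer. A hand-written yylex() for it could look like the C sketch below: it returns INTEGER with the number in yylval, and the character itself for the single-character tokens such as '+' or '('. For a standalone test it assumes a hypothetical token value for INTEGER, defines yylval itself and reads from the string lex_input; in a real build INTEGER and yylval come from the Bison-generated parser (so the two definitions must be dropped) and input would normally come from stdin.

```c
#include <assert.h>
#include <ctype.h>

/* Standalone assumptions: hypothetical token value and our own yylval;
   a real build takes both from the Bison output. */
#define INTEGER 258
int yylval;
static const char *lex_input = "";   /* hypothetical input source */

/* Returns INTEGER (value in yylval) for a run of digits, the character
   itself for + - * / ( ), and 0 at end of input. Blanks are skipped. */
int yylex(void) {
    while (*lex_input == ' ' || *lex_input == '\t' || *lex_input == '\n') lex_input++;
    if (*lex_input == '\0') return 0;                   /* end of input for yyparse */
    if (isdigit((unsigned char)*lex_input)) {
        yylval = 0;
        while (isdigit((unsigned char)*lex_input))
            yylval = yylval * 10 + (*lex_input++ - '0');
        return INTEGER;
    }
    return (unsigned char)*lex_input++;                 /* single-character token */
}
```

Returning the character code for operators matches the grammar, which writes those tokens as character literals ('+', '(' and so on).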

