
Parsing

Outline
Top-down vs. bottom-up
Top-down parsing
  Recursive-descent parsing
  LL(1) parsing
    LL(1) parsing algorithm
    First and follow sets
    Constructing LL(1) parsing table
    Error recovery
Bottom-up parsing
  Shift-reduce parsers
  LR(0) parsing
    LR(0) items
    Finite automata of items
    LR(0) parsing algorithm
    LR(0) grammar
  SLR(1) parsing
    SLR(1) parsing algorithm
    SLR(1) grammar
    Parsing conflict

2301373

Chapter 4 Parsing

Introduction
Parsing is a process that constructs a syntactic structure (i.e. a parse tree) from the stream of tokens. We have already learned how to describe the syntactic structure of a language using a (context-free) grammar. So, does a parser only need to do this?

Stream of tokens + Context-free grammar -> Parser -> Parse tree

Top Down Parsing
A parse tree is created from root to leaves
The traversal of parse trees is a preorder traversal
Tracing leftmost derivation
Two types:
  Backtracking parser: try different structures and backtrack if they do not match the input
  Predictive parser: guess the structure of the parse tree from the next input

Bottom Up Parsing
A parse tree is created from leaves to root
The traversal of parse trees is a reversal of postorder traversal
Tracing rightmost derivation
More powerful than top-down parsing

Parse Trees and Derivations


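[Figure: parse tree of id + id * id, built from the root E downwards in top-down parsing and from the id leaves upwards in bottom-up parsing]

Top-down parsing (leftmost derivation):
  E => E + E => id + E => id + E * E => id + id * E => id + id * id

Bottom-up parsing (rightmost derivation, traced in reverse):
  E => E + E => E + E * E => E + E * id => E + id * id => id + id * id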

Top-down Parsing
What does a parser need to decide?
  Which production rule is to be used at each point of time?
How to guess?
What is the guess based on?
  What is the next token?
    Reserved word if, open parentheses, etc.
  What is the structure to be built?
    If statement, expression, etc.

Top-down Parsing
Why is it difficult?
  Cannot decide until later
    Next token: if
    Structure to be built: St
      St -> MatchedSt | UnmatchedSt
      UnmatchedSt -> if (E) St | if (E) MatchedSt else UnmatchedSt
      MatchedSt -> if (E) MatchedSt else MatchedSt | ...
  Production with empty string
    Next token: id
    Structure to be built: par
      par -> parList | ε
      parList -> exp , parList | exp

Recursive-Descent
Write one procedure for each set of productions with the same nonterminal in the LHS.
Each procedure recognizes a structure described by a nonterminal.
A procedure calls other procedures if it needs to recognize other structures.
A procedure calls the match procedure if it needs to recognize a terminal.

Recursive-Descent: Example
For this grammar:
  E -> E O F | F
  O -> + | -
  F -> ( E ) | id

We cannot decide which rule to use for E, and if we choose E -> E O F, it leads to infinitely recursive loops.
  procedure E
  { E; O; F; }

Rewrite the grammar into EBNF:
  E ::= F {O F}
  O ::= + | -
  F ::= ( E ) | id

  procedure E
  { F;
    while (token=+ or token=-)
    { O; F; }
  }

  procedure F
  { switch token
    { case (:  match((); E; match());
      case id: match(id);
      default: error;
    }
  }
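The EBNF version of the grammar can be turned into code almost line by line. The following is a minimal Python sketch, not the course's reference implementation: the class name `Parser`, the helper `accepts`, the `ParseError` exception, and the use of "$" as an end-of-input marker are all assumptions made for illustration, and tokens are assumed to arrive already split into strings.

```python
class ParseError(Exception):
    """Raised when the next token does not fit the grammar."""


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks end of input (assumption)
        self.pos = 0

    @property
    def token(self):
        return self.tokens[self.pos]

    def match(self, expected):
        # The token is consumed only when it is the expected one.
        if self.token == expected:
            self.pos += 1
        else:
            raise ParseError(f"expected {expected!r}, found {self.token!r}")

    def parse_E(self):  # E ::= F {O F}
        self.parse_F()
        while self.token in ("+", "-"):
            self.parse_O()
            self.parse_F()

    def parse_O(self):  # O ::= + | -
        if self.token in ("+", "-"):
            self.match(self.token)
        else:
            raise ParseError(f"expected an operator, found {self.token!r}")

    def parse_F(self):  # F ::= ( E ) | id
        if self.token == "(":
            self.match("(")
            self.parse_E()
            self.match(")")
        elif self.token == "id":
            self.match("id")
        else:
            raise ParseError(f"unexpected token {self.token!r}")


def accepts(tokens):
    """Return True if the whole token list is a valid E."""
    parser = Parser(tokens)
    try:
        parser.parse_E()
        parser.match("$")
        return True
    except ParseError:
        return False
```

For example, `accepts(["id", "+", "(", "id", "-", "id", ")"])` succeeds, while a dangling operator such as `["id", "+"]` is rejected because procedure F finds the end-of-input marker.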

Match procedure
procedure match(expTok)
{ if (token==expTok)
  then getToken
  else error
}

The token is not consumed until getToken is executed.

Problems in Recursive-Descent
Difficult to convert grammars into EBNF
Cannot decide which production to use at each point
Cannot decide when to use an ε-production A -> ε

LL(1) Parsing
LL(1)
  Read input from (L) left to right
  Simulate (L) leftmost derivation
  1 lookahead symbol
Use stack to simulate leftmost derivation
  Part of the sentential form produced in the leftmost derivation is stored in the stack.
  Top of stack is the leftmost nonterminal symbol in the fragment of the sentential form.

Concept of LL(1) Parsing


Simulate leftmost derivation of the input.
Keep part of the sentential form in the stack.
If the symbol on the top of stack is a terminal, try to match it with the next input token and pop it out of the stack.
If the symbol on the top of stack is a nonterminal X, replace it with Y if we have a production rule X -> Y.
  Which production will be chosen, if there are both X -> Y and X -> Z?

Example of LL(1) Parsing


Grammar:
  E -> T X
  X -> A T X | ε
  A -> + | -
  T -> F N
  N -> M F N | ε
  M -> *
  F -> ( E ) | n

Input: ( n + ( n ) ) * n $

Leftmost derivation simulated on the stack:
  E => TX => FNX => (E)NX => (TX)NX => (FNX)NX => (nNX)NX => (nX)NX
    => (nATX)NX => (n+TX)NX => (n+FNX)NX => (n+(E)NX)NX => (n+(TX)NX)NX
    => (n+(FNX)NX)NX => (n+(nNX)NX)NX => (n+(nX)NX)NX => (n+(n)NX)NX
    => (n+(n)X)NX => (n+(n))NX => (n+(n))MFNX => (n+(n))*FNX
    => (n+(n))*nNX => (n+(n))*nX => (n+(n))*n

Finished

LL(1) Parsing Algorithm


Push $ and then the start symbol onto the stack
REPEAT
  SWITCH (Top of stack, next token)
    CASE (terminal a, a):
      Pop stack; Get next token
    CASE (nonterminal A, token a):
      IF the parsing table entry M[A, a] is not empty THEN
        Get A -> X1 X2 ... Xn from the parsing table entry M[A, a]
        Pop stack; Push Xn ... X2 X1 onto the stack in that order (X1 ends up on top)
      ELSE Error
    CASE ($, $): Accept
    OTHER: Error
UNTIL Accept or Error
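The stack-driven loop can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the course's code: the names `TABLE`, `NONTERMINALS`, and `ll1_parse` are made up here, and the table entries were filled in by hand for the example grammar E -> T X, X -> A T X | ε, A -> + | -, T -> F N, N -> M F N | ε, M -> *, F -> ( E ) | n. An empty list stands for an ε-production.

```python
# Hand-built LL(1) table for the example grammar (an assumption of this sketch).
# Key: (nonterminal on top of stack, next token); value: RHS to push.
TABLE = {
    ("E", "("): ["T", "X"], ("E", "n"): ["T", "X"],
    ("X", "+"): ["A", "T", "X"], ("X", "-"): ["A", "T", "X"],
    ("X", ")"): [], ("X", "$"): [],
    ("A", "+"): ["+"], ("A", "-"): ["-"],
    ("T", "("): ["F", "N"], ("T", "n"): ["F", "N"],
    ("N", "*"): ["M", "F", "N"],
    ("N", "+"): [], ("N", "-"): [], ("N", ")"): [], ("N", "$"): [],
    ("M", "*"): ["*"],
    ("F", "("): ["(", "E", ")"], ("F", "n"): ["n"],
}
NONTERMINALS = {"E", "X", "A", "T", "N", "M", "F"}


def ll1_parse(tokens, start="E"):
    """Return True if the tokens can be derived from start, False on error."""
    input_ = list(tokens) + ["$"]
    stack = ["$", start]          # top of stack is the end of the list
    pos = 0
    while True:
        top, tok = stack[-1], input_[pos]
        if top == "$" and tok == "$":
            return True           # CASE ($, $): accept
        if top in NONTERMINALS:
            rhs = TABLE.get((top, tok))
            if rhs is None:
                return False      # empty table entry: error
            stack.pop()
            stack.extend(reversed(rhs))  # push Xn ... X1, so X1 is on top
        elif top == tok:
            stack.pop()           # terminal matches: pop and advance
            pos += 1
        else:
            return False          # mismatch: error
```

Calling `ll1_parse(list("(n+(n))*n"))` walks through the same leftmost derivation as the worked example. Keeping the table as a dictionary keyed by (nonterminal, token) makes an empty entry simply a missing key.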

LL(1) Parsing Table


If the nonterminal N is on the top of stack and the next token is t, which production rule to use?
Choose a rule N -> X such that
  X =>* t Y, or
  X =>* ε and S =>* W N t Y
(that is, either t can start a string derived from X, or X can derive the empty string and t can follow N)

First Set
Let X be ε or be in V or T.
First(X) is the set of the first terminals in any sentential form derived from X.
  If X is a terminal or ε, then First(X) = {X}.
  If X is a nonterminal and X -> X1 X2 ... Xn is a rule, then
    First(X1) - {ε} is a subset of First(X)
    First(Xi) - {ε} is a subset of First(X) if for all j<i First(Xj) contains ε
    ε is in First(X) if for all j<=n First(Xj) contains ε

Examples of First Set


Grammar 1:
  st -> ifst | other
  ifst -> if ( exp ) st elsepart
  elsepart -> else st | ε
  exp -> 0 | 1

  First(exp) = {0, 1}
  First(elsepart) = {else, ε}
  First(ifst) = {if}
  First(st) = {if, other}

Grammar 2:
  exp -> exp addop term | term
  addop -> + | -
  term -> term mulop factor | factor
  mulop -> *
  factor -> ( exp ) | num

  First(addop) = {+, -}
  First(mulop) = {*}
  First(factor) = {(, num}
  First(term) = {(, num}
  First(exp) = {(, num}

Algorithm for finding First(A)


For all terminals a, First(a) = {a}
For all nonterminals A, First(A) := {}
While there are changes to any First(A)
  For each rule A -> X1 X2 ... Xn
    For each Xi in {X1, X2, ..., Xn}
      If for all j<i First(Xj) contains ε,
      Then add First(Xi) - {ε} to First(A)
    If ε is in First(X1), First(X2), ..., and First(Xn)
    Then add ε to First(A)

Rules behind the algorithm:
  If A is a terminal or ε, then First(A) = {A}.
  If A is a nonterminal, then for each rule A -> X1 X2 ... Xn, First(A) contains First(X1) - {ε}.
  If also for some i<n, First(X1), First(X2), ..., and First(Xi) contain ε, then First(A) contains First(Xi+1) - {ε}.
  If First(X1), First(X2), ..., and First(Xn) contain ε, then First(A) also contains ε.
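The fixed-point iteration for First sets can be sketched as follows. This is an illustrative Python sketch: the dictionary layout `GRAMMAR` (nonterminal mapped to a list of right-hand sides, with an empty list standing for ε), the constant `EPS`, and the function name `first_sets` are assumptions of the example. Any symbol that is not a key of the dictionary is treated as a terminal.

```python
EPS = "ε"

# Expression grammar after left-recursion removal; [] is an ε-production.
GRAMMAR = {
    "exp":    [["term", "exp'"]],
    "exp'":   [["addop", "term", "exp'"], []],
    "addop":  [["+"], ["-"]],
    "term":   [["factor", "term'"]],
    "term'":  [["mulop", "factor", "term'"], []],
    "mulop":  [["*"]],
    "factor": [["(", "exp", ")"], ["num"]],
}


def first_sets(grammar):
    """Iterate until no First set changes any more."""
    first = {a: set() for a in grammar}

    def first_of(sym):
        # First of a terminal is the terminal itself.
        return first[sym] if sym in grammar else {sym}

    changed = True
    while changed:
        changed = False
        for a, rules in grammar.items():
            for rhs in rules:
                before = len(first[a])
                all_nullable = True
                for x in rhs:
                    fx = first_of(x)
                    first[a] |= fx - {EPS}
                    if EPS not in fx:
                        all_nullable = False
                        break      # later symbols cannot contribute
                if all_nullable:
                    first[a].add(EPS)
                if len(first[a]) != before:
                    changed = True
    return first
```

Here `first_sets(GRAMMAR)["exp'"]` contains `+`, `-`, and ε, and the sets for `exp`, `term`, and `factor` all come out as `(` and `num`.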

Finding First Set: An Example


exp -> term exp'
exp' -> addop term exp' | ε
addop -> + | -
term -> factor term'
term' -> mulop factor term' | ε
mulop -> *
factor -> ( exp ) | num

            First
  exp       ( num
  exp'      ε + -
  addop     + -
  term      ( num
  term'     ε *
  mulop     *
  factor    ( num

Follow Set
Let $ denote the end of input tokens.
If A is the start symbol, then $ is in Follow(A).
If there is a rule B -> X A Y, then First(Y) - {ε} is in Follow(A).
If there is a production B -> X A Y and ε is in First(Y), then Follow(A) contains Follow(B).

Algorithm for Finding Follow(A)


Follow(S) = {$}
FOR each A in V-{S}
  Follow(A) = {}
WHILE change is made to some Follow sets
  FOR each production A -> X1 X2 ... Xn
    FOR each nonterminal Xi
      Add First(Xi+1 Xi+2 ... Xn) - {ε} into Follow(Xi)
      (NOTE: If i=n, Xi+1 Xi+2 ... Xn = ε)
      IF ε is in First(Xi+1 Xi+2 ... Xn) THEN
        Add Follow(A) to Follow(Xi)

Rules behind the algorithm:
  If A is the start symbol, then $ is in Follow(A).
  If there is a rule A -> Y X Z, then First(Z) - {ε} is in Follow(X).
  If there is a production B -> X A Y and ε is in First(Y), then Follow(A) contains Follow(B).
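The Follow computation can be sketched in the same style. To keep the example short, this Python sketch assumes the First sets of the expression grammar are already known and hard-codes them in `FIRST`; the names `GRAMMAR`, `FIRST`, `first_of_seq`, and `follow_sets` are illustrative, not taken from the course material.

```python
EPS = "ε"

GRAMMAR = {
    "exp":    [["term", "exp'"]],
    "exp'":   [["addop", "term", "exp'"], []],
    "addop":  [["+"], ["-"]],
    "term":   [["factor", "term'"]],
    "term'":  [["mulop", "factor", "term'"], []],
    "mulop":  [["*"]],
    "factor": [["(", "exp", ")"], ["num"]],
}

# First sets of the nonterminals, assumed to be computed beforehand.
FIRST = {
    "exp": {"(", "num"}, "exp'": {"+", "-", EPS}, "addop": {"+", "-"},
    "term": {"(", "num"}, "term'": {"*", EPS}, "mulop": {"*"},
    "factor": {"(", "num"},
}


def first_of_seq(symbols):
    """First set of a string of symbols; the empty string yields {ε}."""
    result = set()
    for x in symbols:
        fx = FIRST.get(x, {x})     # a terminal is its own First set
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    result.add(EPS)
    return result


def follow_sets(grammar, start):
    follow = {a: set() for a in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for a, rules in grammar.items():
            for rhs in rules:
                for i, x in enumerate(rhs):
                    if x not in grammar:
                        continue               # only nonterminals have Follow
                    rest = first_of_seq(rhs[i + 1:])
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[a]       # the rest can vanish
                    if not new <= follow[x]:
                        follow[x] |= new
                        changed = True
    return follow
```

With `start="exp"`, the result gives `$` and `)` for `exp` and `exp'`, and `*`, `+`, `-`, `)`, `$` for `factor`.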

Finding Follow Set: An Example


exp -> term exp'
exp' -> addop term exp' | ε
addop -> + | -
term -> factor term'
term' -> mulop factor term' | ε
mulop -> *
factor -> ( exp ) | num

            First        Follow
  exp       ( num        $ )
  exp'      ε + -        $ )
  addop     + -          ( num
  term      ( num        + - $ )
  term'     ε *          + - $ )
  mulop     *            ( num
  factor    ( num        * + - $ )

Constructing LL(1) Parsing Tables


FOR each nonterminal A and a production A -> X
  FOR each token a in First(X)
    Add A -> X to M(A, a)
  IF ε is in First(X) THEN
    FOR each element a in Follow(A)
      Add A -> X to M(A, a)
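The table construction can be sketched directly from these two loops. The Python sketch below is illustrative only: it hard-codes the First and Follow sets of the expression grammar instead of computing them, and the names `PRODUCTIONS`, `FIRST`, `FOLLOW`, `first_of_seq`, and `build_table` are assumptions of the example. Each table entry is a list of production numbers, so an entry with more than one number signals that the grammar is not LL(1).

```python
EPS = "ε"

# Numbered productions; an empty RHS stands for ε.
PRODUCTIONS = {
    1: ("exp", ["term", "exp'"]),
    2: ("exp'", ["addop", "term", "exp'"]),
    3: ("exp'", []),
    4: ("addop", ["+"]),
    5: ("addop", ["-"]),
    6: ("term", ["factor", "term'"]),
    7: ("term'", ["mulop", "factor", "term'"]),
    8: ("term'", []),
    9: ("mulop", ["*"]),
    10: ("factor", ["(", "exp", ")"]),
    11: ("factor", ["num"]),
}
FIRST = {
    "exp": {"(", "num"}, "exp'": {"+", "-", EPS}, "addop": {"+", "-"},
    "term": {"(", "num"}, "term'": {"*", EPS}, "mulop": {"*"},
    "factor": {"(", "num"},
}
FOLLOW = {
    "exp": {"$", ")"}, "exp'": {"$", ")"}, "addop": {"(", "num"},
    "term": {"+", "-", ")", "$"}, "term'": {"+", "-", ")", "$"},
    "mulop": {"(", "num"}, "factor": {"*", "+", "-", ")", "$"},
}


def first_of_seq(symbols):
    """First set of a right-hand side; the empty RHS yields {ε}."""
    result = set()
    for x in symbols:
        fx = FIRST.get(x, {x})
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    result.add(EPS)
    return result


def build_table():
    """Map (nonterminal, token) to the list of applicable production numbers."""
    table = {}
    for num, (lhs, rhs) in PRODUCTIONS.items():
        first_rhs = first_of_seq(rhs)
        targets = first_rhs - {EPS}
        if EPS in first_rhs:
            targets |= FOLLOW[lhs]     # ε-case: use Follow of the LHS
        for tok in targets:
            table.setdefault((lhs, tok), []).append(num)
    return table


def is_ll1(table):
    return all(len(entry) == 1 for entry in table.values())
```

For this grammar every entry holds exactly one production, so `is_ll1(build_table())` is true.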

Example: Constructing LL(1) Parsing Table


            First        Follow
  exp       {(, num}     {$, )}
  exp'      {+, -, ε}    {$, )}
  addop     {+, -}       {(, num}
  term      {(, num}     {+, -, ), $}
  term'     {*, ε}       {+, -, ), $}
  mulop     {*}          {(, num}
  factor    {(, num}     {*, +, -, ), $}

1 exp -> term exp'
2 exp' -> addop term exp'
3 exp' -> ε
4 addop -> +
5 addop -> -
6 term -> factor term'
7 term' -> mulop factor term'
8 term' -> ε
9 mulop -> *
10 factor -> ( exp )
11 factor -> num

Parsing table (entries are production numbers):
           (    )    +    -    *    num  $
  exp      1                        1
  exp'          3    2    2              3
  addop              4    5
  term     6                        6
  term'         8    8    8    7         8
  mulop                        9
  factor   10                       11

LL(1) Grammar
A grammar is an LL(1) grammar if its LL(1) parsing table has at most one production in each table entry.


LL(1) Parsing Table for non-LL(1) Grammar


1 exp -> exp addop term
2 exp -> term
3 term -> term mulop factor
4 term -> factor
5 factor -> ( exp )
6 factor -> num
7 addop -> +
8 addop -> -
9 mulop -> *

First(exp) = { (, num }
First(term) = { (, num }
First(factor) = { (, num }
First(addop) = { +, - }
First(mulop) = { * }

Parsing table (entries are production numbers; two numbers in one entry show the grammar is not LL(1)):
           (    )    +    -    *    num  $
  exp      1,2                      1,2
  term     3,4                      3,4
  factor   5                        6
  addop              7    8
  mulop                        9

Causes of Non-LL(1) Grammar


What causes a grammar to be non-LL(1)?
  Left recursion
  Left factor

Left Recursion
Immediate left recursion
  A -> A X | Y
    A = Y X*
  A -> A X1 | A X2 | ... | A Xn | Y1 | Y2 | ... | Ym
    A = {Y1, Y2, ..., Ym} {X1, X2, ..., Xn}*

  Can be removed very easily
    A -> Y A',  A' -> X A' | ε
    A -> Y1 A' | Y2 A' | ... | Ym A',  A' -> X1 A' | X2 A' | ... | Xn A' | ε

General left recursion
  A => X =>* A Y
  Can be removed when there is no empty-string production and no cycle in the grammar
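The rewriting rule for immediate left recursion is mechanical enough to sketch in code. The following Python function is a minimal illustration, assuming each alternative is given as a list of symbols and that the primed name (A followed by an apostrophe) is not already used in the grammar; the function name `remove_immediate_left_recursion` is made up for this example.

```python
def remove_immediate_left_recursion(a, alternatives):
    """Rewrite A -> A X1 | ... | A Xn | Y1 | ... | Ym
    as A -> Yi A' and A' -> Xi A' | ε (ε is the empty list)."""
    # The Xi: tails of the alternatives that start with A itself.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == a]
    # The Yi: every other alternative.
    others = [alt for alt in alternatives if not alt or alt[0] != a]
    if not recursive:
        return {a: alternatives}        # nothing to do
    a_prime = a + "'"                   # assumed to be a fresh name
    return {
        a: [y + [a_prime] for y in others],
        a_prime: [x + [a_prime] for x in recursive] + [[]],
    }
```

Applied to `exp -> exp + term | exp - term | term`, the function returns `exp -> term exp'` together with `exp' -> + term exp' | - term exp' | ε`.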

Removal of Immediate Left Recursion
  exp -> exp + term | exp - term | term
  term -> term * factor | factor
  factor -> ( exp ) | num

Remove left recursion
  exp = term (± term)*
    exp -> term exp'
    exp' -> + term exp' | - term exp' | ε
  term = factor (* factor)*
    term -> factor term'
    term' -> * factor term' | ε
  factor -> ( exp ) | num

General Left Recursion
Bad News!
  Can only be removed when there is no empty-string production and no cycle in the grammar.
Good News!!!!
  Never seen in grammars of any programming languages

Left Factoring
Left factor causes non-LL(1)
  Given A -> X Y | X Z. Both A -> X Y and A -> X Z can be chosen when A is on top of stack and a token in First(X) is the next token.

A -> X Y | X Z
can be left-factored as
A -> X A' and A' -> Y | Z

Example of Left Factor


ifSt -> if ( exp ) st else st | if ( exp ) st
can be left-factored as
  ifSt -> if ( exp ) st elsePart
  elsePart -> else st | ε

seq -> st ; seq | st
can be left-factored as
  seq -> st seq'
  seq' -> ; seq | ε
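One round of left factoring can be sketched as follows. This Python sketch is only an illustration under simplifying assumptions: alternatives are grouped by their first symbol, the longest common prefix of each group is factored out, and the new nonterminal is named by appending apostrophes (so the if-statement example yields `ifSt'` where the slide uses `elsePart`). The function names `common_prefix` and `left_factor` are hypothetical.

```python
def common_prefix(seqs):
    """Longest prefix shared by all symbol lists in seqs."""
    prefix = []
    for column in zip(*seqs):
        if any(sym != column[0] for sym in column):
            break
        prefix.append(column[0])
    return prefix


def left_factor(a, alternatives):
    """Factor A -> X Y | X Z into A -> X A' and A' -> Y | Z (one round)."""
    groups = {}
    for alt in alternatives:
        # Group by first symbol; ε-alternatives get the key None.
        groups.setdefault(alt[0] if alt else None, []).append(alt)
    result = {a: []}
    fresh = a
    for head, alts in groups.items():
        if head is None or len(alts) == 1:
            result[a].extend(alts)      # nothing to factor in this group
            continue
        prefix = common_prefix(alts)
        fresh += "'"                    # assumed to be an unused name
        result[a].append(prefix + [fresh])
        result[fresh] = [alt[len(prefix):] for alt in alts]
    return result
```

A single round is enough for the two examples above; a grammar whose factored remainders again share a prefix would need the function applied repeatedly.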

Bottom-up Parsing
Use explicit stack to perform a parse
Simulate rightmost derivation (R) from left (L) to right, thus called LR parsing
More powerful than top-down parsing
  Left recursion does not cause problem
Two actions
  Shift: take next input token into the stack
  Reduce: replace a string B on top of stack by a nonterminal A, given a production A -> B

Example of Shift-reduce Parsing


Grammar
  S' -> S
  S -> (S)S | ε

Parsing actions
      Stack      Input     Action                  Stack + input
  1   $          (())$     shift                   (())
  2   $(         ())$      shift                   (())
  3   $((        ))$       reduce S -> ε           (())
  4   $((S       ))$       shift                   ((S))
  5   $((S)      )$        reduce S -> ε           ((S))
  6   $((S)S     )$        reduce S -> ( S ) S     ((S)S)
  7   $(S        )$        shift                   (S)
  8   $(S)       $         reduce S -> ε           (S)
  9   $(S)S      $         reduce S -> ( S ) S     (S)S
  10  $S         $         accept                  S

Read from top to bottom, the last column is the reverse of the rightmost derivation, built from left to right.

Example of Shift-reduce Parsing


In the trace for the grammar S' -> S, S -> (S)S | ε on the input (()):
  Viable prefix: the contents of the Stack column at each step
  Handle: the string on top of the stack, together with its position in the right sentential form, that the next reduce action replaces (e.g. ( S ) S in step 6, reduced with S -> ( S ) S)

Terminologies
Right sentential form
  sentential form in a rightmost derivation
  e.g. (S)S, ((S)S)

Viable prefix
  sequence of symbols on the parsing stack
  e.g. ( S ) S, ( S ), ( S, (
       ( ( S ) S, ( ( S ), ( ( S, ( (, (

Handle
  right sentential form + position where reduction can be performed + production used for reduction
  e.g. ( S ) S . with S -> ε
       ( ( S ) S . ) with S -> ( S ) S

LR(0) item
  production with distinguished position in its RHS
  e.g. S -> ( S ) S .
       S -> ( S ) . S
       S -> ( S . ) S
       S -> ( . S ) S
       S -> . ( S ) S

Shift-reduce parsers
There are two possible actions:
  shift and reduce
Parsing is completed when
  the input stream is empty and
  the stack contains only the start symbol
The grammar must be augmented
  a new start symbol S' is added
  a production S' -> S is added
  This makes sure that parsing is finished when S' is on top of stack, because S' never appears on the RHS of any production.

LR(0) parsing
Keep track of what is left to be done in the parsing process by using finite automata of items
An item A -> w . B y means:
  A -> w B y might be used for the reduction in the future,
  at this time, we know we have already constructed w in the parsing process,
  if B is constructed next, we get the new item A -> w B . y

LR(0) items
LR(0) item
  production with a distinguished position in the RHS
Initial Item
  Item with the distinguished position on the leftmost of the production
Complete Item
  Item with the distinguished position on the rightmost of the production
Closure Item of x
  Item x together with items which can be reached from x via ε-transition
Kernel Item
  Original item, not including closure items

Finite automata of items


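Grammar:
  S' -> S
  S -> (S)S
  S -> ε

Items:
  S' -> .S
  S' -> S.
  S -> .(S)S
  S -> (.S)S
  S -> (S.)S
  S -> (S).S
  S -> (S)S.
  S -> .

Transitions (NFA of items):
  S' -> .S     --S-->  S' -> S.
  S' -> .S     --ε-->  S -> .(S)S,  S -> .
  S -> .(S)S   --(-->  S -> (.S)S
  S -> (.S)S   --S-->  S -> (S.)S
  S -> (.S)S   --ε-->  S -> .(S)S,  S -> .
  S -> (S.)S   --)-->  S -> (S).S
  S -> (S).S   --S-->  S -> (S)S.
  S -> (S).S   --ε-->  S -> .(S)S,  S -> .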

DFA of LR(0) Items


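The NFA of items for S' -> S, S -> (S)S | ε is converted into a DFA by merging each item with its closure items (the items reachable via ε-transitions).

States:
  0: S' -> .S,  S -> .(S)S,  S -> .
  1: S' -> S.
  2: S -> (.S)S,  S -> .(S)S,  S -> .
  3: S -> (S.)S
  4: S -> (S).S,  S -> .(S)S,  S -> .
  5: S -> (S)S.

Transitions:
  0 --S--> 1
  0 --(--> 2
  2 --(--> 2
  2 --S--> 3
  3 --)--> 4
  4 --(--> 2
  4 --S--> 5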
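The DFA of LR(0) items can be built with two small operations, closure and goto. The Python sketch below is illustrative and makes its own representation choices: an item is a pair (production index, dot position), and the names `GRAMMAR`, `closure`, `goto`, and `build_dfa` are assumptions of the example. The state numbers it assigns depend on the order in which states are discovered, so they may differ from the numbering used on the slides.

```python
GRAMMAR = [                      # production 0 is the augmented start rule
    ("S'", ("S",)),
    ("S", ("(", "S", ")", "S")),
    ("S", ()),                   # S -> ε
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add every initial item B -> .w whenever the dot stands before B."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)


def goto(state, symbol):
    """Move the dot over symbol in every item that allows it."""
    moved = {(p, d + 1) for p, d in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved) if moved else None


def build_dfa():
    start = closure({(0, 0)})
    states, transitions = [start], {}
    for state in states:         # the list grows while it is being scanned
        symbols = {GRAMMAR[p][1][d] for p, d in state
                   if d < len(GRAMMAR[p][1])}
        for sym in sorted(symbols):
            target = goto(state, sym)
            if target not in states:
                states.append(target)
            transitions[(states.index(state), sym)] = states.index(target)
    return states, transitions
```

For this grammar `build_dfa()` produces six states and seven transitions, matching the diagram up to renumbering.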

LR(0) parsing algorithm


Item in state                      Token     Action
A -> x.By where B is terminal      B         shift B and push state s containing A -> xB.y
A -> x.By where B is terminal      not B     error
A -> x.                            any       reduce with A -> x (i.e. pop x, backup to the state s on top of stack) and push A with new state d(s,A)
S' -> S.                           none      accept
S' -> S.                           any       error

LR(0) Parsing Table


Grammar:
  A' -> A
  A -> (A) | a

DFA of LR(0) items:
  0: A' -> .A,  A -> .(A),  A -> .a
  1: A' -> A.
  2: A -> a.
  3: A -> (.A),  A -> .(A),  A -> .a
  4: A -> (A.)
  5: A -> (A).

  0 --A--> 1    0 --a--> 2    0 --(--> 3
  3 --(--> 3    3 --a--> 2    3 --A--> 4
  4 --)--> 5

State  Action   Rule          (    a    )    A
0      shift                  3    2         1
1      reduce   A' -> A
2      reduce   A -> a
3      shift                  3    2         4
4      shift                            5
5      reduce   A -> (A)

Example of LR(0) Parsing


Using the LR(0) parsing table for A' -> A, A -> (A) | a on the input ((a)):

Stack           Input     Action
$0              ((a))$    shift
$0(3            (a))$     shift
$0(3(3          a))$      shift
$0(3(3a2        ))$       reduce
$0(3(3A4        ))$       shift
$0(3(3A4)5      )$        reduce
$0(3A4          )$        shift
$0(3A4)5        $         reduce
$0A1            $         accept
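The trace can be reproduced with a small table-driven loop. This is a hedged Python sketch rather than the course's implementation: the dictionaries `SHIFT`, `GOTO`, and `REDUCE` encode the LR(0) table for A' -> A, A -> (A) | a by hand, the function name `lr0_parse` is made up, and the stack holds state numbers only, since the grammar symbols are implied by the states.

```python
# LR(0) table for A' -> A, A -> (A) | a, transcribed by hand.
SHIFT = {0: {"(": 3, "a": 2}, 3: {"(": 3, "a": 2}, 4: {")": 5}}
GOTO = {0: {"A": 1}, 3: {"A": 4}}
REDUCE = {2: ("A", 1), 5: ("A", 3)}   # state -> (LHS, length of RHS)
ACCEPT_STATE = 1                      # state containing A' -> A.


def lr0_parse(tokens):
    """Return True if the tokens form a sentence of the grammar."""
    stack = [0]                       # stack of states
    input_ = list(tokens) + ["$"]
    pos = 0
    while True:
        state = stack[-1]
        if state == ACCEPT_STATE:
            # Accept only when no input is left, otherwise error.
            return input_[pos] == "$"
        if state in REDUCE:
            lhs, length = REDUCE[state]
            del stack[-length:]       # pop the handle
            stack.append(GOTO[stack[-1]][lhs])
        else:
            nxt = SHIFT[state].get(input_[pos])
            if nxt is None:
                return False          # no shift possible: error
            stack.append(nxt)
            pos += 1
```

Because the grammar is LR(0), a reduce state never needs to look at the next token; only the accept state checks that the input is exhausted.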

Non-LR(0)Grammar
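Conflict
  Shift-reduce conflict
    A state contains a complete item A -> x. and a shift item A -> x.By
  Reduce-reduce conflict
    A state contains more than one complete item.
A grammar is an LR(0) grammar if there is no conflict in the grammar.

Example: DFA of LR(0) items for S' -> S, S -> (S)S | ε
  0: S' -> .S,  S -> .(S)S,  S -> .
  1: S' -> S.
  2: S -> (.S)S,  S -> .(S)S,  S -> .
  3: S -> (S.)S
  4: S -> (S).S,  S -> .(S)S,  S -> .
  5: S -> (S)S.
States 0, 2 and 4 each contain the complete item S -> . together with the shift item S -> .(S)S, so this grammar has shift-reduce conflicts and is not LR(0).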

SLR(1) parsing
Simple LR with 1 lookahead symbol
Examine the next token before deciding to shift or reduce
  If the next token is the token expected in an item, then it can be shifted into the stack.
  If a complete item A -> x. is constructed and the next token is in Follow(A), then reduction can be done using A -> x.
  Otherwise, an error occurs.
Can avoid conflict

SLR(1) parsing algorithm


Item in state                    Token                 Action
A -> x.By (B is terminal)        B                     shift B and push state s containing A -> xB.y
A -> x.By (B is terminal)        not B                 error
A -> x.                          in Follow(A)          reduce with A -> x (i.e. pop x, backup to the state s on top of stack) and push A with new state d(s,A)
A -> x.                          not in Follow(A)      error
S' -> S.                         none                  accept
S' -> S.                         any                   error

SLR(1) grammar
Conflict
  Shift-reduce conflict
    A state contains a shift item A -> x.Wy such that W is a terminal and a complete item B -> z. such that W is in Follow(B).
  Reduce-reduce conflict
    A state contains more than one complete item whose Follow sets have a token in common.
A grammar is an SLR(1) grammar if there is no conflict in the grammar.

SLR(1) Parsing Table


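Grammar:
  A' -> A
  1 A -> (A)
  2 A -> a

DFA of LR(0) items:
  0: A' -> .A,  A -> .(A),  A -> .a
  1: A' -> A.
  2: A -> a.
  3: A -> (.A),  A -> .(A),  A -> .a
  4: A -> (A.)
  5: A -> (A).

  0 --A--> 1    0 --a--> 2    0 --(--> 3
  3 --(--> 3    3 --a--> 2    3 --A--> 4
  4 --)--> 5

Follow(A) = { ), $ }, so reductions are entered under ) and $.

State  (    a    )    $    A
0      S3   S2             1
1                     AC
2                R2   R2
3      S3   S2             4
4                S5
5                R1   R1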

SLR(1) Grammar not LR(0)


Grammar:
  S' -> S
  1 S -> (S)S
  2 S -> ε

DFA of LR(0) items:
  0: S' -> .S,  S -> .(S)S,  S -> .
  1: S' -> S.
  2: S -> (.S)S,  S -> .(S)S,  S -> .
  3: S -> (S.)S
  4: S -> (S).S,  S -> .(S)S,  S -> .
  5: S -> (S)S.

  0 --S--> 1    0 --(--> 2
  2 --(--> 2    2 --S--> 3
  3 --)--> 4
  4 --(--> 2    4 --S--> 5

State  (    )    $    S
0      S2   R2   R2   1
1                AC
2      S2   R2   R2   3
3           S4
4      S2   R2   R2   5
5           R1   R1

Disambiguating Rules for Parsing Conflict

Shift-reduce conflict
  Prefer shift over reduce
    In case of nested if statements, preferring shift over reduce implies the most closely nested rule for dangling else
Reduce-reduce conflict
  Error in design

Dangling Else
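Grammar:
  S' -> S
  1 S -> I
  2 S -> other
  3 I -> if S
  4 I -> if S else S

DFA of LR(0) items:
  0: S' -> .S,  S -> .I,  S -> .other,  I -> .if S,  I -> .if S else S
  1: S' -> S.
  2: S -> I.
  3: S -> other.
  4: I -> if .S,  I -> if .S else S,  S -> .I,  S -> .other,  I -> .if S,  I -> .if S else S
  5: I -> if S.,  I -> if S. else S
  6: I -> if S else .S,  S -> .I,  S -> .other,  I -> .if S,  I -> .if S else S
  7: I -> if S else S.

  0 --S--> 1    0 --I--> 2    0 --other--> 3    0 --if--> 4
  4 --S--> 5    4 --I--> 2    4 --other--> 3    4 --if--> 4
  5 --else--> 6
  6 --S--> 7    6 --I--> 2    6 --other--> 3    6 --if--> 4

State  if    else  other  $     S    I
0      S4          S3           1    2
1                         ACC
2            R1           R1
3            R2           R2
4      S4          S3           5    2
5            S6           R3
6      S4          S3           7    2
7            R4           R4

In state 5 the token else allows both shifting and reducing with I -> if S; the table keeps the shift (S6).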
