Goal of a parser: build a derivation.
Top-down parser: builds a derivation by working from the start symbol towards the input.
  Builds the parse tree from root to leaves.
  Builds a leftmost derivation.
Bottom-up parser: builds a derivation by working from the input back toward the start symbol.
  Builds the parse tree from leaves to root.
  Builds a reverse rightmost derivation.
Bottom-up parsing
The parser looks for a substring of the parse tree's frontier
  ...that matches the rhs of a production, and
  ...whose reduction to the non-terminal on the lhs represents one step along the reverse of a rightmost derivation.
Such a substring is called a handle. Important: Not all substrings that match a rhs are handles.
Shift-Reduce parsing
A shift-reduce parser has 4 actions:
  Shift: the next input symbol is shifted onto the stack.
  Reduce: a handle is at the top of the stack; pop the handle and push the appropriate lhs.
  Accept: stop parsing and report success.
  Error: call an error reporting/recovery routine.
Shift-Reduce parsing
How can we know when we have found a handle?
  Analyze the grammar beforehand.
  Build tables.
  Look ahead in the input.
LR(1) parsers recognize precisely those languages in which one symbol of lookahead suffices to determine whether to shift or reduce.
  L: left-to-right parse of the input
  R: reverse rightmost derivation
  1: one symbol of lookahead
LR parsing techniques
SLR (not in the book)
  Simple LR parsing
  Easy to implement, not strong enough
  Uses LR(0) items
Canonical LR
  Larger parser but powerful
  Uses LR(1) items
LALR (not in the book)
  Condensed version of canonical LR
  May introduce conflicts
  Uses LR(1) items
Class examples
  E' → E
  E → E+T
  E → T
  T → T*F
  T → F
  F → id

  S' → S
  S → L=R
  S → R
  L → *R
  L → id
  R → L
Finding handles
As a shift/reduce parser processes the input, it must keep track of all potential handles.
For example, consider the usual expression grammar and the input string x+y.
Suppose the parser has processed x and reduced it to E. Then the current state can be represented by E•+E, where • means that an E has already been parsed and that +E is a potential suffix which, if found, will result in a successful parse.
Our goal is to eventually reach state E+E•, which represents an actual handle and should result in the reduction E → E+E.
LR parsing
Typically, LR parsing works by building an automaton where
each state represents what has been parsed so far and what we hope to parse in the future.
States containing handles (meaning the dot is all the way to the
right end of the production) lead to actual reductions depending on the lookahead.
SLR parsing
SLR parsers build automata where states contain items (a.k.a. LR(0) items) and reductions are decided based on FOLLOW set information.
We will build an SLR table for the augmented grammar
  S' → S
  S → L=R
  S → R
  L → *R
  L → id
  R → L
SLR parsing
When parsing begins, we have not parsed any input at all and we hope to parse an S. This is represented by S' → •S.
Note that in order to parse that S, we must either parse an L=R or an R. This is represented by S → •L=R and S → •R.
Closure of a state: if A → α•Bβ represents the current state and B → γ is a production, then add B → •γ to the state.
  Justification: α•Bβ means that we hope to see a B next. But parsing a B is equivalent to parsing a γ, so we can say that we hope to see a γ next.
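The closure rule can be sketched in a few lines of Python. This is only an illustration under assumptions of our own: items are encoded as (production index, dot position) pairs, and the names GRAMMAR, closure and show are hypothetical helpers, not part of the slides.

```python
# Augmented grammar from the slides; production 0 is S' -> S.
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("L", "=", "R")),
    ("S", ("R",)),
    ("L", ("*", "R")),
    ("L", ("id",)),
    ("R", ("L",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Return the LR(0) closure of a set of (production index, dot) items."""
    result = set(items)
    worklist = list(items)
    while worklist:
        prod, dot = worklist.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            # The dot precedes a non-terminal B: add B -> . gamma
            # for every production B -> gamma.
            for index, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    worklist.append((index, 0))
    return frozenset(result)


def show(item):
    """Format an item with '.' marking the dot position."""
    prod, dot = item
    lhs, rhs = GRAMMAR[prod]
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"


I0 = closure({(0, 0)})
for item in sorted(I0):
    print(show(item))
```

Starting from the single item S' → •S, the closure pulls in one item for every production of the grammar, because S, L and R can all appear first.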
SLR parsing
Use the closure operation to define states containing LR(0) items. The first state is the closure of S' → •S.
From this state, if we parse, say, an id, then we go to state
  L → id•
If, after some steps, we parse input that reduces to an L, then we go to state
  S → L•=R
  R → L•
SLR parsing
Continuing the same way, we define all LR(0) item states:
  I0: S' → •S
      S → •L=R
      S → •R
      L → •*R
      L → •id
      R → •L
  I1 (from I0 on S): S' → S•
  I3 (from I0 on id): L → id•
  I4 (from I0 on R): S → R•
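As a rough sketch of how the full set of states could be generated mechanically, the code below repeatedly applies a goto step (move the dot over one symbol, then take the closure) until no new states appear. The helper names and the item encoding are assumptions made for this example, and the state numbers it produces depend on the order in which symbols are tried, so they need not match the numbering used on the slides.

```python
GRAMMAR = [
    ("S'", ("S",)), ("S", ("L", "=", "R")), ("S", ("R",)),
    ("L", ("*", "R")), ("L", ("id",)), ("R", ("L",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """LR(0) closure over (production index, dot) items."""
    result, worklist = set(items), list(items)
    while worklist:
        prod, dot = worklist.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for index, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    worklist.append((index, 0))
    return frozenset(result)


def goto(state, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(p, d + 1) for p, d in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved) if moved else None


def build_states():
    """Return (list of states, transitions dict keyed by (state, symbol))."""
    states = [closure({(0, 0)})]
    transitions = {}
    symbols = sorted({sym for _, rhs in GRAMMAR for sym in rhs})
    i = 0
    while i < len(states):
        for sym in symbols:
            target = goto(states[i], sym)
            if target is None:
                continue
            if target not in states:
                states.append(target)
            transitions[(i, sym)] = states.index(target)
        i += 1
    return states, transitions


states, transitions = build_states()
print(len(states), "states,", len(transitions), "transitions")
```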
SLR parsing
The automaton and the FOLLOW sets tell us how to build the
parsing table:
  Shift actions: if from state i you can go to state j when parsing a token t, then slot [i,t] of the table should contain action "shift and go to state j", written sj.
  Reduce actions: if a state i contains a handle A → α•, then slot [i,t] of the table should contain action "reduce using A → α" for all tokens t that are in FOLLOW(A). This is written r(A → α).
The reasoning is that if the lookahead is a symbol that may follow A, then a reduction A → α should lead closer to a successful parse.
SLR parsing
The automaton and the FOLLOW sets tell us how to build the
parsing table:
  Reduce actions, continued: transitions on non-terminals represent several steps together that have resulted in a reduction.
  For example, if we are in state 0 and parse a bit of input that ends up being reduced to an L, then we should go to state 2.
  Such actions are recorded in a separate part of the parsing table, called the GOTO part.
SLR parsing
Before we can build the parsing table, we need to compute the
FOLLOW sets:
  S' → S
  S → L=R
  S → R
  L → *R
  L → id
  R → L

  FOLLOW(S') = {$}
  FOLLOW(S) = {$}
  FOLLOW(L) = {$, =}
  FOLLOW(R) = {$, =}
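The FOLLOW sets above can be checked with a small fixed-point computation. The sketch below relies on a simplifying assumption that happens to hold for this grammar: no production derives the empty string, so FIRST of a symbol string is just FIRST of its first symbol. The function names are illustrative only.

```python
GRAMMAR = [
    ("S'", ("S",)), ("S", ("L", "=", "R")), ("S", ("R",)),
    ("L", ("*", "R")), ("L", ("id",)), ("R", ("L",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST for each non-terminal (assumes no empty productions)."""
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets():
    """FOLLOW for each non-terminal, iterated until nothing changes."""
    first = first_sets()
    follow = {nt: set() for nt in NONTERMINALS}
    follow["S'"].add("$")  # end-of-input follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:
                    # sym ends the production: whatever follows lhs follows sym
                    new = follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


FOLLOW = follow_sets()
for nt in sorted(FOLLOW):
    print(nt, sorted(FOLLOW[nt]))
```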
SLR parsing
        action                                goto
state   id    =            *     $            S   L   R
0       s3                 s5                 1   2   4
1                                accept
2             s6/r(R→L)          r(R→L)
3             r(L→id)            r(L→id)
4                                r(S→R)
5       s3                 s5                     7   8
6       s3                 s5                     7   9
7             r(R→L)             r(R→L)
8             r(L→*R)            r(L→*R)
9                                r(S→L=R)
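To see where the double entry in the table comes from, the following sketch fills the ACTION part using the shift and reduce rules stated earlier and reports any slot that receives more than one action. The FOLLOW sets are copied from the slides; the helper names, the item encoding and the resulting state numbers are assumptions of this example and may differ from the slide numbering.

```python
GRAMMAR = [
    ("S'", ("S",)), ("S", ("L", "=", "R")), ("S", ("R",)),
    ("L", ("*", "R")), ("L", ("id",)), ("R", ("L",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
# FOLLOW sets as given on the slides.
FOLLOW = {"S'": {"$"}, "S": {"$"}, "L": {"$", "="}, "R": {"$", "="}}


def closure(items):
    result, worklist = set(items), list(items)
    while worklist:
        prod, dot = worklist.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for index, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    worklist.append((index, 0))
    return frozenset(result)


def goto(state, symbol):
    moved = {(p, d + 1) for p, d in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved) if moved else None


def build_states():
    states, transitions = [closure({(0, 0)})], {}
    symbols = sorted({sym for _, rhs in GRAMMAR for sym in rhs})
    i = 0
    while i < len(states):
        for sym in symbols:
            target = goto(states[i], sym)
            if target is None:
                continue
            if target not in states:
                states.append(target)
            transitions[(i, sym)] = states.index(target)
        i += 1
    return states, transitions


def slr_actions(states, transitions):
    """Map (state, token) to the set of actions the SLR rules put there."""
    table = {}
    for (i, sym), j in transitions.items():
        if sym not in NONTERMINALS:  # shift on terminals only
            table.setdefault((i, sym), set()).add(f"s{j}")
    for i, state in enumerate(states):
        for prod, dot in state:
            lhs, rhs = GRAMMAR[prod]
            if dot != len(rhs):
                continue  # not a handle
            if prod == 0:
                table.setdefault((i, "$"), set()).add("accept")
            else:
                for token in FOLLOW[lhs]:
                    action = f"r({lhs} -> {' '.join(rhs)})"
                    table.setdefault((i, token), set()).add(action)
    return table


states, transitions = build_states()
table = slr_actions(states, transitions)
conflicts = {slot: acts for slot, acts in table.items() if len(acts) > 1}
for (state, token), acts in conflicts.items():
    print(f"state {state}, lookahead {token}: {sorted(acts)}")
```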
Conflicts in LR parsing
There are two types of conflicts in LR parsing:
  shift/reduce
    On some particular lookahead it is possible to shift or reduce.
    The if/else ambiguity would give rise to a shift/reduce conflict.
  reduce/reduce
    This occurs when a state contains more than one handle that may be reduced on the same lookahead.
The SLR table for our grammar contains a shift/reduce conflict in state 2: it seems to occur when we have parsed an L and are seeing an =. A reduce at that point would turn the L into an R. However, note that a reduction at that point would never actually lead to a successful parse. In practice, L should only be reduced to an R when the lookahead is EOF ($). An easy way to understand this is by considering that L represents l-values while R represents r-values.
The conflict arises because an SLR parser decides whether to reduce based on what token may follow a non-terminal at any time. However, the fact that a token t may follow a non-terminal N in some derivation does not necessarily imply that t will follow N in some other derivation. SLR parsing does not make a distinction.
Instead of using general FOLLOW information, try to keep track of exactly what tokens may follow a non-terminal in each possible derivation and perform reductions based on that knowledge. Save this information in the states. This gives rise to LR(1) items:
Before we have read any input (S' → •S), we hope to parse an S, and after that we should expect to see a $ as lookahead. We write this as: S' → •S, $
Now, consider a general item A → α•Bβ, x. It means that we have parsed an α, we hope to parse Bβ, and after those we should expect an x. Recall that if there is a production B → γ, we should add B → •γ to the state.
What kind of lookahead should we expect to see after we have parsed γ?
  We should expect to see whatever starts a β. If β is empty or can vanish, then we should expect to see an x after we have parsed γ (and reduced it to B).
The closure function for LR(1) items is defined as follows:
  For each item A → α•Bβ, x in state I, each production B → γ in the grammar, and each terminal b in FIRST(βx), add B → •γ, b to I.
  If a state contains core item B → •γ with multiple possible lookaheads b1, b2, ..., we write B → •γ, b1/b2 as shorthand for B → •γ, b1 and B → •γ, b2.
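A possible rendering of this closure rule in Python is sketched below. Items are encoded as (production index, dot position, lookahead) triples, which is a choice made for this example. The helper first_of computes FIRST(βx) under the assumption that no production derives the empty string, which is true for the grammar used here but not in general.

```python
GRAMMAR = [
    ("S'", ("S",)), ("S", ("L", "=", "R")), ("S", ("R",)),
    ("L", ("*", "R")), ("L", ("id",)), ("R", ("L",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_of(symbols):
    """FIRST of a non-empty symbol string (assumes no empty productions)."""
    head = symbols[0]
    if head not in NONTERMINALS:
        return {head}
    result, seen, stack = set(), set(), [head]
    while stack:
        nt = stack.pop()
        if nt in seen:
            continue
        seen.add(nt)
        for lhs, rhs in GRAMMAR:
            if lhs == nt:
                if rhs[0] in NONTERMINALS:
                    stack.append(rhs[0])
                else:
                    result.add(rhs[0])
    return result


def closure(items):
    """LR(1) closure over (production index, dot, lookahead) items."""
    result, worklist = set(items), list(items)
    while worklist:
        prod, dot, look = worklist.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            # Item is A -> alpha . B beta, x: lookaheads come from FIRST(beta x).
            for b in first_of(rhs[dot + 1:] + (look,)):
                for index, (lhs, _) in enumerate(GRAMMAR):
                    new = (index, 0, b)
                    if lhs == rhs[dot] and new not in result:
                        result.add(new)
                        worklist.append(new)
    return frozenset(result)


I0 = closure({(0, 0, "$")})
lookaheads = {}
for prod, dot, look in I0:
    lookaheads.setdefault((prod, dot), set()).add(look)
for (prod, dot), looks in sorted(lookaheads.items()):
    lhs, rhs = GRAMMAR[prod]
    body = " ".join(rhs[:dot] + (".",) + rhs[dot:])
    print(f"{lhs} -> {body}, {'/'.join(sorted(looks))}")
```

In the resulting initial state, the items for L carry both = and $ as lookaheads, while the item for R carries only $. This is the distinction that SLR's FOLLOW sets cannot express.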
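[Figure: the part of the LR(1) automaton reached after shifting =]
  I6: S → L=•R, $
      R → •L, $
      L → •*R, $
      L → •id, $
  I9 (from I6 on R): S → L=R•, $
  I3' (on id): L → id•, $
  I7' (on L): R → L•, $
  State reached on *:
      L → *•R, $
      R → •L, $
      L → •id, $
      L → •*R, $
  I8' (from that state on R): L → *R•, $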
The LR(1) parsing table is built like the SLR table, except that for reductions we now use the possible lookahead tokens saved in each state, instead of the FOLLOW sets. Note that the conflict that had appeared in the SLR parser is now gone. However, the LR(1) parser has many more states. This is not very practical.
LALR(1) parsing
This is the result of an effort to reduce the number of
states in an LR(1) parser. We notice that some states in our LR(1) automaton have the same core items and differ only in the possible lookahead information. Furthermore, their transitions are similar.
We shrink our parser by merging such states. SLR : 10 states, LR(1): 14 states, LALR(1) : 10 states
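The state counts quoted above can be reproduced with a short experiment: build the canonical LR(1) collection, then group states by their core (the items with lookaheads stripped) and union the lookaheads within each group. The code is a sketch with hypothetical helper names, and first_of again assumes a grammar without empty productions.

```python
GRAMMAR = [
    ("S'", ("S",)), ("S", ("L", "=", "R")), ("S", ("R",)),
    ("L", ("*", "R")), ("L", ("id",)), ("R", ("L",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_of(symbols):
    """FIRST of a non-empty symbol string (assumes no empty productions)."""
    head = symbols[0]
    if head not in NONTERMINALS:
        return {head}
    result, seen, stack = set(), set(), [head]
    while stack:
        nt = stack.pop()
        if nt not in seen:
            seen.add(nt)
            for lhs, rhs in GRAMMAR:
                if lhs == nt and rhs[0] in NONTERMINALS:
                    stack.append(rhs[0])
                elif lhs == nt:
                    result.add(rhs[0])
    return result


def closure(items):
    result, worklist = set(items), list(items)
    while worklist:
        prod, dot, look = worklist.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for b in first_of(rhs[dot + 1:] + (look,)):
                for index, (lhs, _) in enumerate(GRAMMAR):
                    new = (index, 0, b)
                    if lhs == rhs[dot] and new not in result:
                        result.add(new)
                        worklist.append(new)
    return frozenset(result)


def goto(state, symbol):
    moved = {(p, d + 1, a) for p, d, a in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved) if moved else None


def lr1_states():
    states = [closure({(0, 0, "$")})]
    symbols = sorted({sym for _, rhs in GRAMMAR for sym in rhs})
    i = 0
    while i < len(states):
        for sym in symbols:
            target = goto(states[i], sym)
            if target is not None and target not in states:
                states.append(target)
        i += 1
    return states


def core(state):
    """Drop the lookaheads, keeping only (production, dot) pairs."""
    return frozenset((p, d) for p, d, _ in state)


lr1 = lr1_states()
lalr = {}
for state in lr1:
    # States with the same core are merged; their lookaheads are unioned.
    lalr.setdefault(core(state), set()).update(state)
print(len(lr1), "LR(1) states,", len(lalr), "LALR(1) states")
```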
LALR(1) parsing
  I0: S' → •S, $
      S → •L=R, $
      S → •R, $
      L → •*R, =/$
      L → •id, =/$
      R → •L, $
  I1 (from I0 on S): S' → S•, $
  I2 (from I0 on L): S → L•=R, $
                     R → L•, $
  I3 (on id): L → id•, =/$
  I4 (from I0 on R): S → R•, $
  I5 (on *): L → *•R, =/$
             R → •L, =/$
             L → •id, =/$
             L → •*R, =/$
  I6 (from I2 on =): S → L=•R, $
                     R → •L, $
                     L → •*R, $
                     L → •id, $
  I7 (from I5 or I6 on L): R → L•, =/$
  I8 (from I5 on R): L → *R•, =/$
  I9 (from I6 on R): S → L=R•, $
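With the conflict gone, the table can drive an ordinary shift-reduce loop. The sketch below hard-codes ACTION and GOTO tables that we derived by hand from the LALR(1) states, using the state numbers of the earlier SLR table; the encoding of actions as tuples and the function name parse are assumptions of this example. It returns the sequence of production numbers used for reductions, which is the rightmost derivation in reverse.

```python
# Productions: index -> (lhs, length of rhs).
# 0: S' -> S   1: S -> L=R   2: S -> R   3: L -> *R   4: L -> id   5: R -> L
PRODUCTIONS = [("S'", 1), ("S", 3), ("S", 1), ("L", 2), ("L", 1), ("R", 1)]

# ("s", j) = shift and go to state j, ("r", k) = reduce by production k.
ACTION = {
    (0, "id"): ("s", 3), (0, "*"): ("s", 5),
    (1, "$"): ("acc",),
    (2, "="): ("s", 6), (2, "$"): ("r", 5),
    (3, "="): ("r", 4), (3, "$"): ("r", 4),
    (4, "$"): ("r", 2),
    (5, "id"): ("s", 3), (5, "*"): ("s", 5),
    (6, "id"): ("s", 3), (6, "*"): ("s", 5),
    (7, "="): ("r", 5), (7, "$"): ("r", 5),
    (8, "="): ("r", 3), (8, "$"): ("r", 3),
    (9, "$"): ("r", 1),
}
GOTO = {
    (0, "S"): 1, (0, "L"): 2, (0, "R"): 4,
    (5, "L"): 7, (5, "R"): 8,
    (6, "L"): 7, (6, "R"): 9,
}


def parse(tokens):
    """Run the LR driver; return the list of productions reduced, in order."""
    tokens = list(tokens) + ["$"]
    stack = [0]          # stack of states; state 0 is the start state
    reductions = []
    pos = 0
    while True:
        action = ACTION.get((stack[-1], tokens[pos]))
        if action is None:
            # Error action: an empty table slot.
            raise ValueError(f"unexpected {tokens[pos]!r} in state {stack[-1]}")
        if action[0] == "s":
            # Shift: push the target state and consume the token.
            stack.append(action[1])
            pos += 1
        elif action[0] == "r":
            # Reduce: pop one state per rhs symbol, then follow GOTO on the lhs.
            lhs, length = PRODUCTIONS[action[1]]
            del stack[len(stack) - length:]
            stack.append(GOTO[(stack[-1], lhs)])
            reductions.append(action[1])
        else:
            # Accept.
            return reductions


print(parse(["id", "=", "*", "id"]))
```

Keeping only state numbers on the stack is a common simplification: the grammar symbols are implied by the states, so they do not need to be stored separately.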
Note that the conflict that had disappeared when we created the LR(1) parser has not reappeared.
Can LALR(1) parsers introduce conflicts that did not exist in the LR(1) parser?
  Unfortunately YES.
  BUT, only reduce/reduce conflicts.
LALR(1) parsers cannot introduce shift/reduce conflicts.
  Such conflicts are caused when a lookahead is the same as a token on which we can shift. They depend on the core of the item. But we only merge states that had the same core to begin with. The only way for an LALR(1) parser to have a shift/reduce conflict is if one existed already in the LR(1) parser.
LALR(1) parsers can introduce reduce/reduce conflicts. Here's a situation when this might happen:
  A → B•, x
  A → C•, y
merges with
  A → B•, y
  A → C•, x
to give:
  A → B•, x/y
  A → C•, x/y
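The merge above is small enough to replay directly. In this sketch a state is modelled, for illustration only, as a dictionary from a completed item to its set of lookaheads; merge and reduce_reduce_conflicts are hypothetical helper names.

```python
from collections import Counter


def merge(state_a, state_b):
    """Merge two states that share the same core by unioning lookaheads."""
    assert state_a.keys() == state_b.keys(), "cores must be identical"
    return {item: state_a[item] | state_b[item] for item in state_a}


def reduce_reduce_conflicts(state):
    """Return the lookaheads on which more than one handle could be reduced."""
    counts = Counter(tok for lookaheads in state.values() for tok in lookaheads)
    return {tok for tok, n in counts.items() if n > 1}


state_1 = {"A -> B .": {"x"}, "A -> C .": {"y"}}
state_2 = {"A -> B .": {"y"}, "A -> C .": {"x"}}
merged = merge(state_1, state_2)

print(reduce_reduce_conflicts(state_1))
print(reduce_reduce_conflicts(state_2))
print(sorted(reduce_reduce_conflicts(merged)))
```

Each original state picks a unique reduction for every lookahead, but after merging both handles are candidates on x and on y.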