
• Syntax analysis is also often termed parsing.

• In computing, parsing, or syntactic analysis, is the process of analyzing a
text, made of a sequence of tokens (for example, words), to determine its
grammatical structure with respect to a given (more or less) formal grammar.


In computing, a parser is one of the components in an interpreter or compiler.
It checks for correct syntax and builds a data structure (often some kind of
parse tree, abstract syntax tree or other hierarchical structure) implicit in
the input tokens. The parser often uses a separate lexical analyzer to create
tokens from the sequence of input characters.
Parsers may be programmed by hand or may be (semi-)automatically generated
(in some programming languages) by a tool.
Syntax Analysis (Parsing)
• Input
  - Sequence of tokens
• Output
  - Abstract syntax tree
• Report syntax errors
  - e.g. unbalanced parentheses
• [Create "symbol table"] and parse tree
• In some cases the tree need not be generated
  (one-pass compilers)
The first stage is token generation, or lexical analysis, by which the input
character stream is split into meaningful symbols defined by a grammar of
regular expressions.
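The token-generation stage can be sketched as follows. This is only a minimal illustration, not a production lexer: the token classes (ID, NUM, OP, LPAREN, RPAREN) and the function name `tokenize` are assumptions chosen for a small expression language, not something the slides define.

```python
import re

# Hypothetical token classes for a small expression language (an assumption).
TOKEN_SPEC = [
    ("ID",     r"[A-Za-z_]\w*"),
    ("NUM",    r"\d+"),
    ("OP",     r"[+\-*/]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),          # whitespace separates tokens and is dropped
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Split the input character stream into (kind, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Each regular expression describes one class of symbol; the parser then works on the resulting token sequence instead of on raw characters.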
The next stage is parsing, or syntactic analysis, which checks that the tokens
form an allowable expression. This is usually done with reference to a
context-free grammar, which recursively defines the components that can make
up an expression and the order in which they must appear.
[Figure: source program → lexical analyzer → token → parser → parse tree → rest of front end; the parser sends a "request for token" back to the lexical analyzer.]
• We categorize parsers into two groups:
  1. Top-down parser
     - the parse tree is created top to bottom, starting from the root.
  2. Bottom-up parser
     - the parse tree is created bottom to top, starting from the leaves.
• Both top-down and bottom-up parsers scan the input from left to right
  (one symbol at a time).
• Efficient top-down and bottom-up parsers can be implemented only for
  sub-classes of context-free grammars:
  - LL for top-down parsing
  - LR for bottom-up parsing
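To make "created top to bottom, starting from the root" concrete, here is a minimal sketch of a hand-written top-down (recursive-descent) parser. It assumes a small LL(1) grammar chosen for illustration, E → - E | ( E ) | id, and a token list of plain strings; the names `parse` and `parse_E` are hypothetical.

```python
def parse_E(tokens, pos=0):
    """Parse one E starting at tokens[pos]; return (tree, next_pos).

    The node for E is created first, then its children, so the tree
    grows from the root downwards while the input is read left to right.
    """
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "-":                                   # E -> - E
        child, pos = parse_E(tokens, pos + 1)
        return ("E", "-", child), pos
    if tok == "(":                                   # E -> ( E )
        child, pos = parse_E(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError(f"expected ')' at token {pos}")
        return ("E", "(", child, ")"), pos + 1
    if tok == "id":                                  # E -> id
        return ("E", "id"), pos + 1
    raise SyntaxError(f"unexpected token {tok!r} at position {pos}")

def parse(tokens):
    """Parse the whole token list and reject trailing input."""
    tree, pos = parse_E(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"extra input at token {pos}")
    return tree
```

One token of lookahead is enough to pick the production here, which is what makes this grammar suitable for a simple LL-style parser; an unbalanced input such as `( id` is reported as a syntax error.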
Context-Free Grammars (CFG)
• Inherently recursive structures of a programming language are defined by a CFG.
• In a CFG, we have:
  - A finite set of terminals (in our case, the set of tokens)
  - A finite set of non-terminals (syntactic variables)
  - A finite set of production rules of the form
      A → α
    where A is a non-terminal and α is a string of terminals and
    non-terminals (possibly the empty string)
  - A start symbol (one of the non-terminal symbols)
• Example:
    E → E + E | E - E | E * E | E / E | - E
    E → ( E )
    E → id
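The four components of a CFG map directly onto a small data structure. The sketch below is one possible encoding, not a standard API: the class name `Grammar`, its field names and the `validate` method are assumptions. It takes the example productions to be E → E+E | E-E | E*E | E/E | -E, E → (E), E → id.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # the tokens
    nonterminals: frozenset   # the syntactic variables
    productions: tuple        # (head, body) pairs; body is a tuple of symbols, () is the empty string
    start: str                # one of the non-terminals

    def validate(self):
        """Raise ValueError if the four components are inconsistent."""
        if self.terminals & self.nonterminals:
            raise ValueError("terminals and non-terminals must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("start symbol must be a non-terminal")
        symbols = self.terminals | self.nonterminals
        for head, body in self.productions:
            if head not in self.nonterminals:
                raise ValueError(f"head {head!r} is not a non-terminal")
            for symbol in body:
                if symbol not in symbols:
                    raise ValueError(f"unknown symbol {symbol!r} in a body")

EXPR_GRAMMAR = Grammar(
    terminals=frozenset({"+", "-", "*", "/", "(", ")", "id"}),
    nonterminals=frozenset({"E"}),
    productions=(
        ("E", ("E", "+", "E")),
        ("E", ("E", "-", "E")),
        ("E", ("E", "*", "E")),
        ("E", ("E", "/", "E")),
        ("E", ("-", "E")),
        ("E", ("(", "E", ")")),
        ("E", ("id",)),
    ),
    start="E",
)
EXPR_GRAMMAR.validate()
```

The alternatives separated by | are simply stored as separate productions with the same head.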

Derivations
E ⇒ E+E
• E+E derives from E
  - we can replace E by E+E
  - to be able to do this, we must have a production rule E → E+E in our grammar.
E ⇒ E+E ⇒ id+E ⇒ id+id
• A sequence of replacements of non-terminal symbols is called a derivation;
  the sequence above is a derivation of id+id from E.

• In general, a derivation step is
    αAβ ⇒ αγβ   if there is a production rule A → γ in our grammar,
  where α and β are arbitrary strings of terminal and non-terminal symbols.

    α1 ⇒ α2 ⇒ ... ⇒ αn   (αn derives from α1, or α1 derives αn)

    ⇒  : derives in one step
    ⇒* : derives in zero or more steps
    ⇒+ : derives in one or more steps
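A single derivation step αAβ ⇒ αγβ is just a replacement inside a sequence of symbols. The following sketch assumes sentential forms are stored as tuples of symbol strings and productions as (head, body) pairs; the helper name `derive_step` is hypothetical.

```python
def derive_step(form, index, production):
    """Apply A -> gamma to the A at form[index]: alpha A beta => alpha gamma beta."""
    head, body = production
    if form[index] != head:
        raise ValueError(f"symbol at position {index} is {form[index]!r}, not {head!r}")
    return form[:index] + body + form[index + 1:]

E_PLUS = ("E", ("E", "+", "E"))   # E -> E + E
E_ID = ("E", ("id",))             # E -> id

# Reproduce the derivation E => E+E => id+E => id+id.
form = ("E",)
steps = [form]
for index, production in [(0, E_PLUS), (0, E_ID), (2, E_ID)]:
    form = derive_step(form, index, production)
    steps.append(form)
```

Here `steps` holds every sentential form of the derivation, so the first element derives the last in three steps (one or more steps, hence also zero or more).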
Derivations
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
OR
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)

• At each derivation step, we can choose any of the non-terminals in the
  sentential form of G for the replacement.
• If we always choose the left-most non-terminal in each derivation step,
  the derivation is called a left-most derivation.
• If we always choose the right-most non-terminal in each derivation step,
  the derivation is called a right-most derivation.
Left-Most and Right-Most Derivation
Left-Most Derivation
E ⇒lm -E ⇒lm -(E) ⇒lm -(E+E) ⇒lm -(id+E) ⇒lm -(id+id)

Right-Most Derivation
E ⇒rm -E ⇒rm -(E) ⇒rm -(E+E) ⇒rm -(E+id) ⇒rm -(id+id)
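The two derivations of -(id+id) differ only in which non-terminal is replaced at each step. A minimal sketch, assuming E is the only non-terminal and using a hypothetical helper `derive` that applies a fixed list of productions either always left-most or always right-most:

```python
NONTERMINALS = {"E"}

def derive(productions, leftmost=True, start="E"):
    """Apply the productions in order to the left-most (or right-most) non-terminal."""
    form = [start]
    steps = ["".join(form)]
    for head, body in productions:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        if not positions:
            raise ValueError("no non-terminal left to replace")
        i = positions[0] if leftmost else positions[-1]
        if form[i] != head:
            raise ValueError(f"cannot apply a production for {head!r} to {form[i]!r}")
        form[i:i + 1] = body          # replace the chosen non-terminal by the body
        steps.append("".join(form))
    return steps

NEG = ("E", ["-", "E"])           # E -> - E
PAREN = ("E", ["(", "E", ")"])    # E -> ( E )
PLUS = ("E", ["E", "+", "E"])     # E -> E + E
ID = ("E", ["id"])                # E -> id

sequence = [NEG, PAREN, PLUS, ID, ID]
left = derive(sequence, leftmost=True)
right = derive(sequence, leftmost=False)
```

Both runs use the same productions and end in the same sentence; only the fifth sentential form differs (-(id+E) versus -(E+id)).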
• Top-down parsers try to find the left-most derivation of the given source
  program.
• Bottom-up parsers try to find the right-most derivation of the given source
  program in reverse order.
Parse Tree
• Inner nodes of a parse tree are non-terminal symbols.
• The leaves of a parse tree are terminal symbols.
• A parse tree can be seen as a graphical representation of a derivation.


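[Figure: the parse tree for -(id+id), grown step by step alongside the derivation E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id). In the final tree the root E has children - and E; that E has children (, E and ); the inner E has children E, + and E; each of those two E nodes has the single leaf id.]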
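A parse tree can be written down as a nested structure. In this sketch (one possible encoding, an assumption rather than anything the slides prescribe) an inner node is a tuple whose first element is the non-terminal label, and a leaf is a plain string holding a terminal.

```python
# Parse tree for -(id+id): inner nodes are tuples (label, child, ...), leaves are strings.
TREE = ("E", "-",
        ("E", "(",
         ("E", ("E", "id"), "+", ("E", "id")),
         ")"))

def leaves(node):
    """Return the terminals at the leaves, read left to right."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node[1:]:
        result.extend(leaves(child))
    return result

def inner_labels(node):
    """Return the labels of all inner nodes in pre-order."""
    if isinstance(node, str):
        return []
    labels = [node[0]]
    for child in node[1:]:
        labels.extend(inner_labels(child))
    return labels
```

Reading the leaves from left to right gives back the sentence, and every inner label is a non-terminal, matching the two properties listed above.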
Ambiguity
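• A grammar that produces more than one parse tree for some sentence is
  called an ambiguous grammar.

E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
  [Parse tree 1: the root E expands to E + E; the left E yields id, and the right E expands to E * E, each E yielding id.]

E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
  [Parse tree 2: the root E expands to E * E; the left E expands to E + E, each E yielding id, and the right E yields id.]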
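The ambiguity can be checked mechanically by counting parse trees. The sketch below assumes the restricted grammar E → E + E | E * E | id (a subset of the example grammar) and a token list of strings; `count_parse_trees` is a hypothetical helper that tries every operator as the root of the tree.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count the parse trees of `tokens` under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct parse trees deriving tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                # tokens[k] is the operator at the root of this subtree.
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))
```

For `["id", "+", "id", "*", "id"]` the count is 2, one tree with + at the root and one with * at the root, which is exactly the pair of trees shown above.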
Ambiguity (cont.)
• For most parsers, the grammar must be unambiguous.
• An unambiguous grammar gives a unique parse tree for each sentence.
Ambiguity (cont.)
• We should eliminate ambiguity from the grammar during the design phase of
  the compiler.
• An unambiguous grammar should be written to replace the ambiguous one.
• For a sentence that has several parse trees under the ambiguous grammar,
  we pick the preferred tree and write the unambiguous grammar so that it
  allows only that choice.
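One common way to restrict the grammar to a single choice is to encode operator precedence in it. The sketch below assumes the conventional preference (not stated on these slides) that * binds tighter than + and that both are left-associative, using the grammar E → E + T | T, T → T * F | F, F → ( E ) | id. The left-recursive rules are implemented as loops, and the function returns a compact operator tree rather than a full parse tree; all names are hypothetical.

```python
def parse_expression(tokens):
    """Parse with E -> E + T | T,  T -> T * F | F,  F -> ( E ) | id."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def factor():                       # F -> ( E ) | id
        tok = peek()
        if tok == "id":
            advance()
            return "id"
        if tok == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():                         # T -> T * F | F, as a left-associative loop
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():                         # E -> E + T | T, as a left-associative loop
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"extra input at token {pos}")
    return tree
```

With this grammar id+id*id has only one tree, the one with + at the root and id*id as its right operand.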
