ii. Find a variable in the string written down so far and a rule whose
left-hand side (lhs) is that variable. Replace that variable with the
right-hand side (rhs) of the rule.
We also say that uwv is directly derived from uAv using the
rule A → w, written uAv ⇒ uwv.
Language specified by G:
If G = (V, Σ, R, S) is a CFG, then the language specified by G (or the
language of G) is L(G) = {w ∈ Σ* | S ⇒* w}
7/21/2017 10:52 AM CS 120 SemesterII-2013 19
Example 3: CFG G3
Consider the grammar:
G3 = ({S}, {a, b}, {S → aSb | SS | ε}, S)
L(G3) contains strings such as:
abab
aaabbb
aababb
Note: if one thinks of a and b as ( and ), then
L(G3) is the language of all strings of properly nested
parentheses.
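The claim about L(G3) can be checked for short strings with a small search. The following is a minimal sketch, not part of the original slides: it expands the leftmost variable of each sentential form breadth-first and collects the terminal strings it finds. The function name `generate`, the use of `""` for ε, and the cap on the length of sentential forms are assumptions made for this illustration; the cap is chosen to be large enough for G3, not for arbitrary grammars.

```python
from collections import deque

# Rules of G3: S -> aSb | SS | epsilon (epsilon is written as "").
RULES = {"S": ["aSb", "SS", ""]}

def generate(rules, start, max_len):
    """Return the terminal strings of length <= max_len found by leftmost expansion."""
    seen, found = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, or None if the form is all terminals.
        idx = next((i for i, c in enumerate(form) if c in rules), None)
        if idx is None:
            found.add(form)
            continue
        for rhs in rules[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(1 for c in new if c not in rules)
            # Terminals never disappear, so forms with too many are dead ends.
            # The cap on the total length keeps the search finite (assumed
            # generous enough for G3).
            if terminals <= max_len and len(new) <= 2 * max_len + 2 and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

print(sorted(generate(RULES, "S", 4)))  # ['', 'aabb', 'ab', 'abab']
```

Reading a as ( and b as ), every string printed is properly nested, and strings such as ba never appear.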
[Figure: diagram relating the regular languages to the context-free languages (CFL).]
Then, we can show that the above CFG generates exactly the
same language as D (how to show?)
Regular Language & CFG (Example)
[Figure: DFA with states q0 (start) and q1 and transitions labelled 0 and 1.]
CFG G = ({V0, V1}, {0, 1}, R, V0), where R is
V0 → 0V0 | 1V1 | ε
V1 → 1V1 | 0V0
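The step from a DFA to such a grammar can be sketched in a few lines: one variable Vi per state qi, one rule Vi → aVj per transition from qi to qj on symbol a, and Vi → ε for each accepting state. The transition table `DELTA` and the accepting set {q0} below are assumptions read off from the rules of G (the DFA diagram itself is not recoverable here), and the helper name `dfa_to_cfg` is hypothetical.

```python
def dfa_to_cfg(transitions, accepting):
    """Build right-linear rules from a DFA given as {(state, symbol): next_state}.

    States are assumed to be named q0, q1, ...; state qi becomes variable Vi.
    """
    rules = {}
    for (state, symbol), target in sorted(transitions.items()):
        rules.setdefault("V" + state[1:], []).append(symbol + "V" + target[1:])
    for state in sorted(accepting):
        rules.setdefault("V" + state[1:], []).append("")  # "" stands for epsilon
    return rules

# Assumed transition table, chosen to be consistent with the rules of G.
DELTA = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q0", ("q1", "1"): "q1",
}
RULES = dfa_to_cfg(DELTA, {"q0"})
print(RULES)  # {'V0': ['0V0', '1V1', ''], 'V1': ['0V0', '1V1']}
```

Up to the order of alternatives, these are the rules of G.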
Leftmost Derivation
A derivation which always replaces the leftmost
variable in each step is called a leftmost derivation.
E.g., consider the CFG for properly nested
parentheses ({S}, {(, )}, R, S) with rules R:
S → (S) | SS | ε
Then S ⇒ SS ⇒ (S)S ⇒ ( )S ⇒ ( )(S) ⇒ ( )( )
is a leftmost derivation.
But S ⇒ SS ⇒ S(S) ⇒ (S)(S) ⇒ ( )(S) ⇒ ( )( )
is not a leftmost derivation.
However, we note that both derivations correspond to
the same parse tree
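Whether a given derivation is leftmost can be checked mechanically. The sketch below is an illustration only: the function name `is_leftmost`, the list-of-strings encoding of a derivation, and writing ε as `""` are assumptions, not notation from the slides. It accepts a derivation only if every step rewrites the leftmost variable with one of the rules.

```python
# Rules of the parentheses grammar: S -> (S) | SS | epsilon ("" stands for epsilon).
RULES = {"S": ["(S)", "SS", ""]}

def is_leftmost(derivation, rules):
    """True if every step rewrites the leftmost variable using some rule."""
    for before, after in zip(derivation, derivation[1:]):
        idx = next((i for i, c in enumerate(before) if c in rules), None)
        if idx is None:
            return False  # no variable left to rewrite
        # All forms obtainable by rewriting the leftmost variable once.
        options = {before[:idx] + rhs + before[idx + 1:] for rhs in rules[before[idx]]}
        if after not in options:
            return False
    return True

first = ["S", "SS", "(S)S", "()S", "()(S)", "()()"]
second = ["S", "SS", "S(S)", "(S)(S)", "()(S)", "()()"]
print(is_leftmost(first, RULES))   # True
print(is_leftmost(second, RULES))  # False
```

The second derivation fails at its second step, where the right-hand S is expanded while the left-hand S is still present.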
[Figure: two parse trees with root S, one branching as S + S and the other as S x S, with leaves a; further parse trees over the variables A and B with leaves a and b. Infinitely many others are possible.]
Grammar Ambiguity
If a string has two or more leftmost (or rightmost)
derivations in a CFG G, we say the string is derived
ambiguously in G.
A grammar is ambiguous if some string is derived
ambiguously.
Note that the two leftmost derivations in the previous
example correspond to different parse trees (see previous
slide)
In fact, each leftmost derivation corresponds to a
unique parse tree.
S → S + M | M
M → M * T | T
T → (S) | number
Programming languages are (should be) designed to make
parsing easy, efficient, and unambiguous.
[Figure: derivation and parse tree for 3 + 2 * 1 under this grammar.]
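To see how this grammar fixes precedence, here is a small evaluator sketch. It is not the slides' method: a recursive-descent parser cannot use the left-recursive rules S → S + M and M → M * T directly, so the sketch swaps in the usual loop form, M (+ M)* and T (* T)*, which accepts the same strings and keeps left associativity. The function names and the regular-expression tokenizer are assumptions made for the illustration, and error handling is minimal.

```python
import re

def evaluate(text):
    """Evaluate an expression of the grammar S -> S+M | M, M -> M*T | T, T -> (S) | number."""
    tokens = re.findall(r"\d+|[()+*]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]  # raises IndexError if the input ends too early
        pos += 1
        return tok

    def parse_s():  # S -> S + M | M, handled as M (+ M)*
        value = parse_m()
        while peek() == "+":
            take()
            value += parse_m()
        return value

    def parse_m():  # M -> M * T | T, handled as T (* T)*
        value = parse_t()
        while peek() == "*":
            take()
            value *= parse_t()
        return value

    def parse_t():  # T -> (S) | number
        tok = take()
        if tok == "(":
            value = parse_s()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return value
        return int(tok)

    result = parse_s()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return result

print(evaluate("3 + 2 * 1"))    # 5
print(evaluate("(3 + 2) * 4"))  # 20
```

Because * is only introduced below + in the grammar, 2 * 1 is grouped before the addition without any extra precedence table.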
Easy and Efficient Parsing
Example: Consider the CFG
S → a | Xb | aYa
X → Y | ε
Y → b | X
Example: Consider the CFG
S → Xa
X → aX | bX | ε
X → bX     X → b
Example
S → XY
X → Zb
Y → bW
Z → AB
W → Z
A → aA | bA | ε
B → Ba | Bb | ε
Null-able non-terminals are? A, B, Z and W
Example Contd.
S → XY
X → Zb
Y → bW
Z → AB
W → Z
A → aA | bA | ε
B → Ba | Bb | ε

Old nullable production    New production
X → Zb                     X → b
Y → bW                     Y → b
Z → AB                     Z → A and Z → B
W → Z                      Nothing new
A → aA                     A → a
A → bA                     A → b
B → Ba                     B → a
B → Bb                     B → b

So the new CFG is
S → XY
X → Zb | b
Y → bW | b
Z → AB | A | B
W → Z
A → aA | bA | a | b
B → Ba | Bb | a | b
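The two steps of this example, finding the nullable non-terminals and then adding the new productions, can be sketched as follows. This is an illustrative sketch under stated assumptions: variables are single uppercase letters, each right-hand side is a string, ε is written as `""`, and the function names are hypothetical.

```python
from itertools import product

GRAMMAR = {
    "S": ["XY"], "X": ["Zb"], "Y": ["bW"], "Z": ["AB"], "W": ["Z"],
    "A": ["aA", "bA", ""], "B": ["Ba", "Bb", ""],
}

def nullable_variables(grammar):
    """Variables that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for var, bodies in grammar.items():
            # A body is nullable if every symbol in it is nullable ("" trivially is).
            if var not in nullable and any(all(c in nullable for c in body) for body in bodies):
                nullable.add(var)
                changed = True
    return nullable

def remove_epsilon(grammar):
    """Drop epsilon-productions, adding the variants with nullable symbols left out."""
    nullable = nullable_variables(grammar)
    result = {}
    for var, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            # Each nullable symbol may be kept or dropped; try every combination.
            choices = [(c, "") if c in nullable else (c,) for c in body]
            for combo in product(*choices):
                candidate = "".join(combo)
                if candidate and candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[var] = new_bodies
    return result

print(sorted(nullable_variables(GRAMMAR)))  # ['A', 'B', 'W', 'Z']
print(remove_epsilon(GRAMMAR)["Z"])         # ['AB', 'A', 'B']
```

One caveat of this sketch: if the start variable itself were nullable, ε would be lost from the language. That does not happen here, since S is not nullable.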
Killing unit-productions
Consider the CFG
S → A | bb
A → B | b
B → S | a
The non-unit productions are
S → bb, A → b, B → a
And the unit productions are
S → A
A → B
B → S
Killing unit-productions: Example contd.
Let's list all unit productions and their sequences and create new
productions:
S → A gives S → b
S → A → B gives S → a
A → B gives A → a
A → B → S gives A → bb
B → S gives B → bb
B → S → A gives B → b
Eliminating all unit productions, the new CFG is
S → bb | b | a
A → b | a | bb
B → a | bb | b
This CFG generates a finite language, since no right-hand side contains a
non-terminal: every string produced from S is one of the terminal strings listed.
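The listing of unit-production sequences can be automated. In the sketch below, which is an illustration rather than the slides' own procedure, each variable first collects every variable it can reach through unit productions alone and then inherits their non-unit right-hand sides. It assumes single-letter variables and string right-hand sides; the function name is hypothetical.

```python
GRAMMAR = {"S": ["A", "bb"], "A": ["B", "b"], "B": ["S", "a"]}

def remove_unit_productions(grammar):
    """Replace unit productions by the non-unit bodies they lead to."""
    result = {}
    for var in grammar:
        # Variables reachable from var using unit productions only.
        reachable, stack = {var}, [var]
        while stack:
            current = stack.pop()
            for body in grammar[current]:
                if body in grammar and body not in reachable:  # body is a single variable
                    reachable.add(body)
                    stack.append(body)
        # var inherits every non-unit body of the variables it can reach.
        bodies = []
        for other in sorted(reachable):
            for body in grammar[other]:
                if body not in grammar and body not in bodies:
                    bodies.append(body)
        result[var] = bodies
    return result

print(remove_unit_productions(GRAMMAR))
# {'S': ['b', 'a', 'bb'], 'A': ['b', 'a', 'bb'], 'B': ['b', 'a', 'bb']}
```

Up to the order of alternatives, this is the grammar S → bb | b | a, A → b | a | bb, B → a | bb | b.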
Useless Symbols
Let G be a CFG. A symbol X ∈ (V ∪ Σ) is useful if there is a derivation
S ⇒* UXV ⇒* w   (both derivations in G)
where U, V ∈ (V ∪ Σ)* and w ∈ Σ*. A symbol that is not useful is
useless.
A terminal is useful if it occurs in a string of the language of G.
A variable is useful if it occurs in a derivation that begins from S and
generates a terminal string.
S ⇒ A ⇒ aA ⇒ aaA ⇒ aaaA
Another grammar:
S → A
A → aA
A → ε
B → bA   (useless production)
In general:
if S ⇒* xAy ⇒* w, where w ∈ L(G) (w contains only terminals),
then the variable A is useful.
A production A → x is useless
if any of its variables is useless

Production    Variable on the lhs    Production
S → aSb
S → ε
S → A                                useless
A → aA        A useless              useless
B → C         B useless              useless
C → D         C useless              useless
Removing Useless Productions
Example grammar:
S → aS | A | C
A → a
B → aa
C → aCb
First: find all variables that can produce
strings with only terminals
S → aS | A | C
A → a
B → aa
C → aCb
Round 1: {A, B}
Round 2: {A, B, S} (using S → A)
Keep only the variables
that produce terminal symbols: {A, B, S}
(the remaining variables are useless)
Before:
S → aS | A | C
A → a
B → aa
C → aCb
After removing useless productions:
S → aS | A
A → a
B → aa
Second: find all variables
reachable from S
S → aS | A
A → a
B → aa
Dependency graph: S → A; B is not reachable.
Keep only the variables
reachable from S
(the remaining variables are useless)
Before:
S → aS | A
A → a
B → aa
Final grammar:
S → aS | A
A → a
Set of variables that derive terminal strings
Input = CFG (V, Σ, P, S)
TERM = {A | there is a rule A → w ∈ P with w ∈ Σ*}
repeat
    PREV = TERM
    for each variable A ∈ V do
        if there is a rule A → w and w ∈ (PREV ∪ Σ)* then
            TERM = TERM ∪ {A}
until PREV = TERM
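A Python sketch of this procedure, together with the reachability pass from the "Second" step, is given here. It is run on the example grammar S → aS | A | C, A → a, B → aa, C → aCb. The encoding (single-letter variables, string right-hand sides) and the function names are assumptions made for the illustration.

```python
GRAMMAR = {"S": ["aS", "A", "C"], "A": ["a"], "B": ["aa"], "C": ["aCb"]}

def generating(grammar):
    """TERM: the variables that derive some string of terminals."""
    term = set()
    while True:
        prev = set(term)
        for var, bodies in grammar.items():
            # A rule qualifies if every symbol is a terminal or already in PREV.
            if any(all(c in prev or c not in grammar for c in body) for body in bodies):
                term.add(var)
        if term == prev:
            return term

def reachable(grammar, start):
    """Variables that occur in some sentential form derived from start."""
    seen, stack = {start}, [start]
    while stack:
        for body in grammar[stack.pop()]:
            for c in body:
                if c in grammar and c not in seen:
                    seen.add(c)
                    stack.append(c)
    return seen

def remove_useless(grammar, start):
    """First drop non-generating variables, then unreachable ones."""
    term = generating(grammar)
    kept = {v: [b for b in bodies if all(c in term or c not in grammar for c in b)]
            for v, bodies in grammar.items() if v in term}
    if start not in kept:
        return {}  # the start variable derives no terminal string
    live = reachable(kept, start)
    return {v: bodies for v, bodies in kept.items() if v in live}

print(sorted(generating(GRAMMAR)))   # ['A', 'B', 'S']
print(remove_useless(GRAMMAR, "S"))  # {'S': ['aS', 'A'], 'A': ['a']}
```

The order of the two passes matters: removing non-generating variables first can make further variables unreachable, which the second pass then catches.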
Example
Removing All
Chomsky Normal Form (CNF)
A CFG is in Chomsky Normal Form if each rule is of the
form
A → BC
A → a
where
a is any terminal
A, B, C are variables
B, C cannot be the start variable
However, S → ε is allowed
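These conditions translate directly into a small checker. The sketch below is only an illustration: rules are encoded as (head, body) pairs with the body a tuple of symbols, an empty tuple stands for ε, and the function name `is_cnf` is hypothetical. The two test grammars are a CNF grammar over the variables S, A, B, V1, V2, Ta, Tb, Tc and the non-CNF grammar S → ABa, A → aab, B → Ac.

```python
def is_cnf(rules, variables, start):
    """Check the CNF conditions; rules is a list of (head, body-tuple) pairs."""
    for head, body in rules:
        if body == ():
            if head != start:          # only S -> epsilon is allowed
                return False
        elif len(body) == 1:
            if body[0] in variables:   # A -> a requires a terminal
                return False
        elif len(body) == 2:
            # A -> BC: both symbols are variables and neither is the start variable.
            if any(s not in variables or s == start for s in body):
                return False
        else:
            return False               # bodies longer than two symbols
    return True

CNF_RULES = [
    ("S", ("A", "V1")), ("V1", ("B", "Ta")), ("A", ("Ta", "V2")),
    ("V2", ("Ta", "Tb")), ("B", ("A", "Tc")),
    ("Ta", ("a",)), ("Tb", ("b",)), ("Tc", ("c",)),
]
CNF_VARS = {"S", "A", "B", "V1", "V2", "Ta", "Tb", "Tc"}
NOT_CNF = [("S", ("A", "B", "a")), ("A", ("a", "a", "b")), ("B", ("A", "c"))]

print(is_cnf(CNF_RULES, CNF_VARS, "S"))      # True
print(is_cnf(NOT_CNF, {"S", "A", "B"}, "S"))  # False
```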
Chomsky Normal Form (CNF)
Why should we care about CNF? Well, it's an effective
grammar, in the sense that every variable that is
expanded (that is, every node in a parse tree) is guaranteed
to generate a letter in the final string (apart from the optional rule S → ε).
Not Chomsky
Normal Form
Introduce variables for terminals: Ta, Tb, Tc
Before:
S → ABa
A → aab
B → Ac
After:
S → ABTa
A → TaTaTb
B → ATc
Ta → a
Tb → b
Tc → c
Introduce intermediate variable: V1
Before:
S → ABTa
A → TaTaTb
B → ATc
Ta → a
Tb → b
Tc → c
After:
S → AV1
V1 → BTa
A → TaTaTb
B → ATc
Ta → a
Tb → b
Tc → c
Introduce intermediate variable: V2
Before:
S → AV1
V1 → BTa
A → TaTaTb
B → ATc
Ta → a
Tb → b
Tc → c
After:
S → AV1
V1 → BTa
A → TaV2
V2 → TaTb
B → ATc
Ta → a
Tb → b
Tc → c
Initial grammar:
S → ABa
A → aab
B → Ac
Final grammar in Chomsky Normal Form:
S → AV1
V1 → BTa
A → TaV2
V2 → TaTb
B → ATc
Ta → a
Tb → b
Tc → c
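The two transformations used in this example, replacing terminals by new variables and splitting long right-hand sides with intermediate variables, can be sketched as follows. The sketch assumes the input has no ε-rules or unit rules, and that the generated names Ta, Tb, ... and V1, V2, ... do not clash with existing variables. The function name `to_cnf` and the tuple encoding of rules are hypothetical.

```python
RULES = [("S", ("A", "B", "a")), ("A", ("a", "a", "b")), ("B", ("A", "c"))]
VARIABLES = {"S", "A", "B"}

def to_cnf(rules, variables):
    """Convert rules (head, body-tuple) to CNF; assumes no epsilon or unit rules."""
    term_var = {}  # terminal -> name of the variable that replaces it

    # Step 1: in bodies longer than one symbol, replace each terminal a by Ta.
    step1 = []
    for head, body in rules:
        if len(body) > 1:
            body = tuple(s if s in variables else term_var.setdefault(s, "T" + s)
                         for s in body)
        step1.append((head, body))

    # Step 2: split bodies longer than two symbols using V1, V2, ...
    out, counter = [], 0
    for head, body in step1:
        while len(body) > 2:
            counter += 1
            fresh = "V" + str(counter)
            out.append((head, (body[0], fresh)))
            head, body = fresh, body[1:]
        out.append((head, body))

    # Finally add the rules Ta -> a for the introduced variables.
    out.extend((name, (terminal,)) for terminal, name in term_var.items())
    return out

for head, body in to_cnf(RULES, VARIABLES):
    print(head, "->", " ".join(body))
```

On the initial grammar S → ABa, A → aab, B → Ac this prints the eight rules of the final grammar, from S -> A V1 through Tc -> c.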
General Conversion Steps [Step 1]
After the change, the string on the right side of any rule
is either of length 1 (a terminal) or length 2 (two
variables, or 1 variable + 1 terminal, or two terminals)
After the change, the string on the right side of any rule
is exactly a terminal or two variables
Original grammar:
S → ASA | aB
A → B | S
B → b | ε

After adding the new start variable S0:
S0 → S
S → ASA | aB
A → B | S
B → b | ε

After removing B → ε:
S0 → S
S → ASA | aB | a
A → B | S | ε
B → b

After removing A → ε and the resulting rule S → S:
S0 → S
S → ASA | aB | a | SA | AS
A → B | S
B → b

After removing S0 → S:
S0 → ASA | aB | a | SA | AS
S → ASA | aB | a | SA | AS
A → B | S
B → b

Then, we remove A → B:
S0 → ASA | aB | a | SA | AS
S → ASA | aB | a | SA | AS
A → b | S
B → b

Then, we remove A → S:
S0 → ASA | aB | a | SA | AS
S → ASA | aB | a | SA | AS
A → b | ASA | aB | a | SA | AS
B → b