Context-free grammar
This is a different model for describing
languages
The language is specified by productions
(substitution rules) that tell how strings can
be obtained, e.g.
A → 0A1
A → B
B → #
A, B are variables
0, 1, # are terminals
A is the start variable
Natural languages
CFGs were first used for natural languages
a girl with a flower likes the boy
(Figure: parse tree of this sentence. The words are labeled ART, NOUN, PREP, and VERB; these combine into CMPLX-NOUN, PREP-PHRASE, NOUN-PHRASE, CMPLX-VERB, and VERB-PHRASE, with SENTENCE at the root.)
Some examples
We can describe parts of English like this:
SENTENCE → NOUN-PHRASE VERB-PHRASE
NOUN-PHRASE → CMPLX-NOUN
NOUN-PHRASE → CMPLX-NOUN PREP-PHRASE
VERB-PHRASE → CMPLX-VERB
VERB-PHRASE → CMPLX-VERB PREP-PHRASE
PREP-PHRASE → PREP CMPLX-NOUN
CMPLX-NOUN → ARTICLE NOUN
CMPLX-VERB → VERB NOUN-PHRASE
variables: SENTENCE, NOUN-PHRASE, VERB-PHRASE, PREP-PHRASE,
CMPLX-NOUN, CMPLX-VERB, ARTICLE, NOUN, VERB, PREP
terminals: a, the, boy, girl, flower, likes,
touches, sees, with
start variable: SENTENCE
ARTICLE → a
ARTICLE → the
NOUN → boy
NOUN → girl
NOUN → flower
VERB → likes
VERB → touches
VERB → sees
PREP → with
Derivations
(1) SENTENCE → NOUN-PHRASE VERB-PHRASE
(2) NOUN-PHRASE → CMPLX-NOUN
(3) NOUN-PHRASE → CMPLX-NOUN PREP-PHRASE
(4) VERB-PHRASE → CMPLX-VERB
(5) VERB-PHRASE → CMPLX-VERB PREP-PHRASE
(6) PREP-PHRASE → PREP CMPLX-NOUN
(7) CMPLX-NOUN → ARTICLE NOUN
(8) CMPLX-VERB → VERB NOUN-PHRASE
(9) CMPLX-VERB → VERB
(10) ARTICLE → a
(11) ARTICLE → the
(12) NOUN → boy
(13) NOUN → girl
(14) NOUN → flower
(15) VERB → likes
(16) VERB → touches
(17) VERB → sees
(18) PREP → with
SENTENCE
⇒ NOUN-PHRASE VERB-PHRASE (1)
⇒ CMPLX-NOUN VERB-PHRASE (2)
⇒ ARTICLE NOUN VERB-PHRASE (7)
⇒ a NOUN VERB-PHRASE (10)
⇒ a boy VERB-PHRASE (12)
⇒ a boy CMPLX-VERB (4)
⇒ a boy VERB (9)
⇒ a boy sees (17)
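The numbered derivation above can be replayed mechanically. The following is a minimal sketch, assuming the rule numbering (1) to (18) as reconstructed from the derivation labels; the names `RULES` and `derive` are illustrative and not part of the slides. Each step rewrites the leftmost occurrence of the rule's head variable.

```python
# Rules numbered as on the slide: number -> (head, body).
RULES = {
    1: ("SENTENCE", ["NOUN-PHRASE", "VERB-PHRASE"]),
    2: ("NOUN-PHRASE", ["CMPLX-NOUN"]),
    3: ("NOUN-PHRASE", ["CMPLX-NOUN", "PREP-PHRASE"]),
    4: ("VERB-PHRASE", ["CMPLX-VERB"]),
    5: ("VERB-PHRASE", ["CMPLX-VERB", "PREP-PHRASE"]),
    6: ("PREP-PHRASE", ["PREP", "CMPLX-NOUN"]),
    7: ("CMPLX-NOUN", ["ARTICLE", "NOUN"]),
    8: ("CMPLX-VERB", ["VERB", "NOUN-PHRASE"]),
    9: ("CMPLX-VERB", ["VERB"]),
    10: ("ARTICLE", ["a"]),
    11: ("ARTICLE", ["the"]),
    12: ("NOUN", ["boy"]),
    13: ("NOUN", ["girl"]),
    14: ("NOUN", ["flower"]),
    15: ("VERB", ["likes"]),
    16: ("VERB", ["touches"]),
    17: ("VERB", ["sees"]),
    18: ("PREP", ["with"]),
}


def derive(start, rule_numbers):
    """Apply the given rules in order, each to the leftmost occurrence of its head."""
    form = [start]
    for number in rule_numbers:
        head, body = RULES[number]
        i = form.index(head)      # raises ValueError if the rule does not apply
        form[i:i + 1] = body      # replace the variable by the rule body
    return " ".join(form)


sentence = derive("SENTENCE", [1, 2, 7, 10, 12, 4, 9, 17])
```

With the rule sequence used on the slide, `sentence` is the string "a boy sees".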
Programming languages
Context-free grammars are also used to
describe (parts of) programming languages
For instance, expressions like (2 + 3) * 5 or
3 + 8 + 2 * 7 can be described by the CFG
E → E + E
E → E * E
E → (E)
E → 0
E → 1
...
E → 9
Variables: E
Terminals: + * ( ) 0 1 2 3 4 5 6 7 8 9
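Definition of context-free grammar
A context-free grammar (CFG) is a 4-tuple
(V, Σ, R, S) where
V is a finite set of variables
Σ is a finite set of terminals, disjoint from V
R is a finite set of productions of the form A → α, with A in V and α in (V ∪ Σ)*
S in V is the start variable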
Example:
E → E + E
E → E * E
E → (E)
E → N
N → 0N
N → 1N
N → 0
N → 1
Variables: E, N
Terminals: +, *, (, ), 0, 1
Start variable: E
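The 4-tuple can be written down directly as a data structure. This is a minimal sketch, not a standard API: the class `CFG`, the method `is_well_formed`, and the helper `rule` are illustrative names, and the E productions are assumed to be the ones used in the derivation that follows (E → E + E, E → E * E, E → (E), E → N).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    variables: frozenset   # V
    terminals: frozenset   # Σ
    rules: tuple           # R, as (head, body) pairs; body is a tuple of symbols
    start: str             # S

    def is_well_formed(self) -> bool:
        """Check the side conditions of the 4-tuple definition."""
        symbols = self.variables | self.terminals
        return (
            self.start in self.variables
            and not (self.variables & self.terminals)
            and all(
                head in self.variables and all(s in symbols for s in body)
                for head, body in self.rules
            )
        )


def rule(head, body):
    # Hypothetical helper: rule("E", "E+E") -> ("E", ("E", "+", "E"))
    return (head, tuple(body))


G = CFG(
    variables=frozenset("EN"),
    terminals=frozenset("+*()01"),
    rules=(
        rule("E", "E+E"), rule("E", "E*E"), rule("E", "(E)"), rule("E", "N"),
        rule("N", "0N"), rule("N", "1N"), rule("N", "0"), rule("N", "1"),
    ),
    start="E",
)
```

Single-character symbols keep the sketch short; a grammar with multi-character variable names would pass the bodies as explicit tuples instead.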
Derivation
A derivation is a sequential application of productions:
E ⇒ E * E
⇒ (E) * E
⇒ (E) * N
⇒ (E + E) * N
⇒ (E + E) * 1
⇒ (E + N) * 1
⇒ (N + N) * 1
⇒ (N + 1N) * 1
⇒ (N + 10) * 1
⇒ (1 + 10) * 1
Each ⇒ step applies one production; the first step uses E → E * E
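A derivation step is just a string rewrite: pick one variable occurrence and replace it by the body of one of its productions. The sketch below replays the derivation of (1 + 10) * 1 without spaces; `apply` and `derive` are illustrative names, and the positions in `steps` were chosen by hand to match the slide.

```python
def apply(form: str, index: int, head: str, body: str) -> str:
    """Rewrite the variable at position `index` of `form` using head -> body."""
    if form[index] != head:
        raise ValueError(f"expected {head!r} at position {index} of {form!r}")
    return form[:index] + body + form[index + 1:]


def derive(start, steps):
    """Return the list of sentential forms produced by applying `steps` in order."""
    forms = [start]
    for index, head, body in steps:
        forms.append(apply(forms[-1], index, head, body))
    return forms


steps = [
    (0, "E", "E*E"),   # E       => E*E
    (0, "E", "(E)"),   #         => (E)*E
    (4, "E", "N"),     #         => (E)*N
    (1, "E", "E+E"),   #         => (E+E)*N
    (6, "N", "1"),     #         => (E+E)*1
    (3, "E", "N"),     #         => (E+N)*1
    (1, "E", "N"),     #         => (N+N)*1
    (3, "N", "1N"),    #         => (N+1N)*1
    (4, "N", "0"),     #         => (N+10)*1
    (1, "N", "1"),     #         => (1+10)*1
]

forms = derive("E", steps)
```

Note that this derivation does not always rewrite the leftmost variable; any order of rewriting is allowed in a derivation.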
Language of a CFG
The language of a CFG is the set of all
strings of terminals that can be derived
from the start variable
L(G) = { w | w ∈ Σ* and S ⇒* w }
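For a small grammar, the definition can be explored by brute force: search over sentential forms and collect those that contain only terminals. The sketch below does this for the grammar A → 0A1 | B, B → # up to a length bound. It assumes no production has an empty body, so forms never shrink and pruning by length is safe; `language_up_to` is an illustrative name. Expanding only the leftmost variable suffices, because every derivable terminal string has a leftmost derivation.

```python
from collections import deque


def language_up_to(rules, start, max_len):
    """All terminal strings of length <= max_len derivable from `start`.

    Assumes every production body is nonempty (no ε-productions).
    """
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, or None if the form is all terminals.
        pos = next((i for i, s in enumerate(form) if s in rules), None)
        if pos is None:
            words.add(form)
            continue
        for body in rules[form[pos]]:
            new = form[:pos] + body + form[pos + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words


rules = {"A": ["0A1", "B"], "B": ["#"]}
words = language_up_to(rules, "A", 5)
```

For this grammar and bound, `words` contains exactly #, 0#1, and 00#11.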
Example 1
A → 0A1 | B
B → #
variables: A, B
terminals: 0, 1, #
start variable: A
00#11 (in the language)
00#111 (not in the language)
00##11 (not in the language)
Example 2
S → SS | (S) | ε
convention: variables in uppercase,
terminals in lowercase, start variable first
S ⇒ (S)
⇒ (SS)
⇒ ((S)S)
⇒ ((S)(S))
⇒ (()(S))
⇒ (()())
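The grammar S → SS | (S) | ε generates exactly the balanced strings of parentheses, so membership can be checked without building a derivation. A minimal sketch using a depth counter (the function name `is_balanced` is illustrative):

```python
def is_balanced(s: str) -> bool:
    """True if s consists only of parentheses and they are properly nested."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:        # a ")" with no matching "(" before it
                return False
        else:
            return False         # not a terminal of this grammar
    return depth == 0            # every "(" must have been closed
```

The counter plays the role of the nesting of (S) productions in the derivation: each "(" opens one level and each ")" closes one.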
Design examples
L = {0^n 1^n | n ≥ 0}
These strings have recursive structure:
0000011111
00001111
000111
0011
01
S → 0S1 | ε
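The recursive structure translates directly into a recursive membership test, with one branch per production of S → 0S1 | ε. This is a sketch for illustration; `in_language` is a made-up name.

```python
def in_language(w: str) -> bool:
    """True if w is in {0^n 1^n | n >= 0}, following S -> 0S1 | ε."""
    if w == "":
        return True                        # S -> ε
    if len(w) >= 2 and w[0] == "0" and w[-1] == "1":
        return in_language(w[1:-1])        # S -> 0S1: strip the outer 0 and 1
    return False
```

Each recursive call peels off one outer 0...1 pair, which is the same nesting shown in the list of strings above.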
Design examples
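L = numbers without leading zeros
allowed: 0, 109, 2, 23
not allowed: ε, 01, 003
S → 0 | LN
N → NA | ε
A → 0 | L
L → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Example: in 1052870032, the leading digit 1 comes from L and the remaining digits 052870032 (any number) come from N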
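This particular language happens to be regular as well, so the grammar can be cross-checked against a regular expression. The sketch below assumes the pattern `0|[1-9][0-9]*` as the regular-expression counterpart of S → 0 | LN; the function name is illustrative.

```python
import re

# "0" alone, or a nonzero leading digit followed by any digits.
NO_LEADING_ZEROS = re.compile(r"0|[1-9][0-9]*")


def is_number_without_leading_zeros(s: str) -> bool:
    """True if s matches the whole pattern (mirrors S -> 0 | LN)."""
    return NO_LEADING_ZEROS.fullmatch(s) is not None
```

The two alternatives of the pattern correspond to the two productions for S: the single digit 0, or a leading digit L followed by any digit string N.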
Design examples
L = {0^n 1^n 0^m 1^m | n ≥ 0, m ≥ 0}
These strings have two parts:
L = L1 L2
L1 = {0^n 1^n | n ≥ 0}
L2 = {0^m 1^m | m ≥ 0}
rules for L1: S1 → 0 S1 1 | ε
L2 is the same as L1
S → S1 S1
S1 → 0 S1 1 | ε
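The production S → S1 S1 says a string is in L when it splits into two pieces that are each in L1. A direct, unoptimized sketch of that idea tries every split point; `in_L1` and `in_L` are illustrative names.

```python
def in_L1(w: str) -> bool:
    """True if w is in {0^n 1^n | n >= 0}."""
    n = len(w) // 2
    return w == "0" * n + "1" * n      # odd lengths fail the comparison


def in_L(w: str) -> bool:
    """True if w = uv with u and v both in L1 (mirrors S -> S1 S1)."""
    return any(in_L1(w[:i]) and in_L1(w[i:]) for i in range(len(w) + 1))
```

Trying all split points is the brute-force reading of concatenation; it costs quadratic time here but makes the correspondence with the grammar explicit.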
Design examples
L = {0^n 1^m 0^m 1^n | n ≥ 0, m ≥ 0}
These strings have nested structure:
outer part: 0^n 1^n
inner part: 1^m 0^m
S → 0S1 | I
I → 1I0 | ε
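One function per variable gives a membership test that follows the grammar S → 0S1 | I, I → 1I0 | ε line by line. This is a sketch; the names `derives_I` and `derives_S` are illustrative.

```python
def derives_I(w: str) -> bool:
    """True if I derives w, i.e. w = 1^m 0^m for some m >= 0."""
    if w == "":
        return True                                   # I -> ε
    return (len(w) >= 2 and w[0] == "1" and w[-1] == "0"
            and derives_I(w[1:-1]))                   # I -> 1I0


def derives_S(w: str) -> bool:
    """True if S derives w, i.e. w = 0^n 1^m 0^m 1^n."""
    if derives_I(w):
        return True                                   # S -> I
    return (len(w) >= 2 and w[0] == "0" and w[-1] == "1"
            and derives_S(w[1:-1]))                   # S -> 0S1
```

The outer function strips 0...1 pairs (the outer part) until what remains is derivable from I (the inner part).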
Design examples
L = strings with two blocks of 0s of the same length
10010011010010110 = 1001 | 00110100 | 10110
initial part | middle part | final part
A: ε, or ends in 1
C: ε, or begins with 1
Design examples
10010011010010110 = 1001 (A) 00110100 (B) 10110 (C)
S → ABC
A → ε | U1 (A: ε, or ends in 1)
U → 0U | 1U | ε
C → ε | 1U (C: ε, or begins with 1)
B → 0D0 | 0B0 (at least one 0 on each side)
D → 1U1 | 1 (in 00110100, D derives 1101)
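Reading the grammar above, the middle part B is 0^k D 0^k with k ≥ 1, where D begins and ends with 1, and the parts A and C keep both blocks of 0s from being extended. Under that reading (an interpretation of the grammar, not a statement taken from the slides), a string is generated exactly when two of its maximal blocks of 0s have the same length. A sketch of a direct check, with an illustrative function name:

```python
import re
from collections import Counter


def has_two_equal_zero_blocks(w: str) -> bool:
    """True if two maximal runs of 0s in w have the same length."""
    # re.findall with "0+" returns the maximal runs of 0s, left to right.
    lengths = Counter(len(block) for block in re.findall(r"0+", w))
    return any(count >= 2 for count in lengths.values())
```

For 10010011010010110 the runs of 0s have lengths 2, 2, 1, 2, 1, 1, so the check succeeds; the split shown on the slide uses two of the runs of length 2.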
Parse tree
Derivations can also be represented using parse trees
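E → E + E | E - E | (E) | V
V → x | y | z
E ⇒ E + E
⇒ V + E
⇒ x + E
⇒ x + (E)
⇒ x + (E - E)
⇒ x + (V - E)
⇒ x + (y - E)
⇒ x + (y - V)
⇒ x + (y - z)
Parse tree, in bracketed form: [E [E [V x]] + [E ( [E [E [V y]] - [E [V z]]] )]]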
Left derivation
Always derive the leftmost variable first:
E ⇒ E + E
⇒ V + E
⇒ x + E
⇒ x + (E)
⇒ x + (E - E)
⇒ x + (V - E)
⇒ x + (y - E)
⇒ x + (y - V)
⇒ x + (y - z)
Parse tree, in bracketed form: [E [E [V x]] + [E ( [E [E [V y]] - [E [V z]]] )]]
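A parse tree determines its leftmost derivation: repeatedly replace the leftmost variable node by its children. The sketch below assumes a tree encoding chosen for illustration, where a variable node is a pair (symbol, children) and a terminal is a plain string; `leftmost_derivation` is an illustrative name.

```python
def leftmost_derivation(tree):
    """Return the sentential forms of the leftmost derivation encoded by `tree`."""
    def show(form):
        # Variables are (symbol, children) pairs; terminals are strings.
        return " ".join(n if isinstance(n, str) else n[0] for n in form)

    form = [tree]
    steps = [show(form)]
    while True:
        pos = next((i for i, n in enumerate(form) if not isinstance(n, str)), None)
        if pos is None:                   # only terminals left
            return steps
        form[pos:pos + 1] = form[pos][1]  # expand the leftmost variable
        steps.append(show(form))


# Parse tree of x + (y - z)
tree = ("E", [
    ("E", [("V", ["x"])]),
    "+",
    ("E", ["(",
           ("E", [("E", [("V", ["y"])]), "-", ("E", [("V", ["z"])])]),
           ")"]),
])

steps = leftmost_derivation(tree)
```

The resulting list starts with E and ends with x + ( y - z ), passing through the same forms as the derivation on the slide (with symbols separated by spaces).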
Ambiguity
A grammar is ambiguous if some
strings have more than one parse tree
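Example: E → E + E | E - E | (E) | V
V → x | y | z
The string x + y + z has two parse trees:
[E [E [V x]] + [E [E [V y]] + [E [V z]]]]
[E [E [E [V x]] + [E [V y]]] + [E [V z]]]
With the production E → E * E in the grammar, the string x * y + z also has two parse trees:
[E [E [V x]] * [E [E [V y]] + [E [V z]]]] (first y + z, then x *)
[E [E [E [V x]] * [E [V y]]] + [E [V z]]] (first x * y, then + z)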
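Ambiguity of a particular string can be checked by counting its parse trees. The sketch below counts trees for the grammar E → E + E | E - E | (E) | V, V → x | y | z; the second operator is assumed to be minus, the input is assumed to contain no spaces, and `count_parse_trees` is an illustrative name. A string with a count above 1 witnesses that the grammar is ambiguous.

```python
from functools import lru_cache


def count_parse_trees(s: str) -> int:
    """Number of parse trees of s under E -> E+E | E-E | (E) | V, V -> x|y|z."""

    @lru_cache(maxsize=None)
    def count_E(i, j):
        # Number of ways E derives the substring s[i:j].
        if i >= j:
            return 0
        total = 0
        if j - i == 1 and s[i] in "xyz":
            total += 1                                  # E -> V -> x | y | z
        if s[i] == "(" and s[j - 1] == ")":
            total += count_E(i + 1, j - 1)              # E -> (E)
        for k in range(i + 1, j - 1):
            if s[k] in "+-":
                # E -> E op E, with the top-level operator at position k
                total += count_E(i, k) * count_E(k + 1, j)
        return total

    return count_E(0, len(s))
```

For x+y+z the count is 2, one tree for each choice of the top-level plus sign, while x+(y-z) has a single tree because the parentheses fix the grouping.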
Examples
Which of these are ambiguous:
L = {0^n 1^n | n ≥ 0}
S → 0S1 | ε

L = {0^n 1^n 0^m 1^m | n, m ≥ 0}
S → S1 S1
S1 → 0 S1 1 | ε

S → 0 | LN
N → NA | ε
A → 0 | L
L → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

S → ABC
A → ε | U1
C → ε | 1U
B → 0D0 | 0B0
D → 1U1 | 1
U → 0U | 1U | ε
Disambiguation
Sometimes we can rewrite the grammar to
remove the ambiguity
E → E + E | E - E | E * E | (E) | V
V → x | y | z
Disambiguation
Example
E → T | E + T | E - T
T → F | T * F
F → (E) | V
V → x | y | z
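A grammar layered into expressions, terms, and factors can be parsed deterministically. The sketch below is a hand-written recursive-descent parser, assuming the layered grammar E → T | E + T | E - T, T → F | T * F, F → (E) | V; the left-recursive productions are implemented as loops, and the result is a nested tuple (operator, left, right) rather than a full parse tree. All function names are illustrative, and the input is assumed to contain no spaces.

```python
def parse(s: str):
    """Parse s and return a nested tuple showing the unique grouping."""
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def parse_E():                       # E -> T | E + T | E - T
        nonlocal pos
        left = parse_T()
        while peek() in ("+", "-"):
            op = s[pos]
            pos += 1
            left = (op, left, parse_T())  # groups to the left
        return left

    def parse_T():                       # T -> F | T * F
        nonlocal pos
        left = parse_F()
        while peek() == "*":
            pos += 1
            left = ("*", left, parse_F())
        return left

    def parse_F():                       # F -> (E) | V
        nonlocal pos
        ch = peek()
        if ch == "(":
            pos += 1
            inner = parse_E()
            if peek() != ")":
                raise SyntaxError(f"expected ')' at position {pos}")
            pos += 1
            return inner
        if ch in ("x", "y", "z"):
            pos += 1
            return ch
        raise SyntaxError(f"unexpected symbol at position {pos}")

    tree = parse_E()
    if pos != len(s):
        raise SyntaxError(f"unexpected symbol at position {pos}")
    return tree
```

Because * is handled one level below + and -, x*y+z can only be grouped as (x*y)+z, and x+y+z only as (x+y)+z, so each string gets a single structure.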
Disambiguation
In general, disambiguation is not always possible
There exist inherently ambiguous languages:
every CFG for such a language is ambiguous
There is no general procedure that can tell if
a grammar is ambiguous (the problem is undecidable)
However, grammars used in programming
languages can typically be disambiguated
Analysis example
S → aB | bA
A → a | aS | bAA
B → b | bS | aBB
ab
baba
abbbaa
a
bba
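The candidate strings can be tested mechanically. In this grammar every production body starts with a terminal and none is empty, so a simple top-down search terminates: expand the leftmost variable, match the leading terminal, and give up when the form is longer than the remaining input. This is a sketch specific to grammars of that shape; `RULES` and `derives` are illustrative names.

```python
from functools import lru_cache

RULES = {
    "S": ("aB", "bA"),
    "A": ("a", "aS", "bAA"),
    "B": ("b", "bS", "aBB"),
}


@lru_cache(maxsize=None)
def derives(form: str, w: str) -> bool:
    """True if the sentential form `form` derives the terminal string w."""
    if len(form) > len(w):
        return False              # every symbol yields at least one terminal
    if not form:
        return w == ""
    head, rest = form[0], form[1:]
    if head in RULES:             # leftmost symbol is a variable: try each body
        return any(derives(body + rest, w) for body in RULES[head])
    # Leftmost symbol is a terminal: it must match the next input symbol.
    return w[0] == head and derives(rest, w[1:])


results = {w: derives("S", w) for w in ["ab", "baba", "abbbaa", "a", "bba"]}
```

Under this check, ab, baba, and abbbaa are derivable from S, while a and bba are not.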