
Context-free languages

Context-free grammar
This is a different model for describing
languages
The language is specified by productions
(substitution rules) that tell how strings can
be obtained, e.g.
A → 0A1
A → B
B → #

A, B are variables
0, 1, # are terminals
A is the start variable

Using these rules, we can derive strings like this:
A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000#111
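The derivation above can be replayed mechanically. A minimal sketch, assuming the rules A → 0A1, A → B, B → # and a hypothetical helper `derive(n)` that applies A → 0A1 exactly n times before finishing with A → B and B → #:

```python
def derive(n: int) -> list[str]:
    """Return the sentential forms of the derivation of 0^n # 1^n from A.

    Hypothetical helper: applies A → 0A1 n times, then A → B, then B → #.
    Every form that gets rewritten contains exactly one variable, so
    str.replace rewrites exactly that variable.
    """
    forms = ["A"]
    for _ in range(n):
        forms.append(forms[-1].replace("A", "0A1"))  # A → 0A1
    forms.append(forms[-1].replace("A", "B"))         # A → B
    forms.append(forms[-1].replace("B", "#"))         # B → #
    return forms


print(" ⇒ ".join(derive(3)))
# A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000#111
```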

Natural languages
CFGs were first used for natural languages
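a girl with a flower likes the boy
(Figure: parse tree of this sentence. SENTENCE splits into a NOUN-PHRASE for "a girl with a flower" and a VERB-PHRASE for "likes the boy"; the inner nodes are CMPLX-NOUN, PREP-PHRASE and CMPLX-VERB, with ART, NOUN, PREP and VERB directly above the words.)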

Some examples
We can describe parts of English like this:
SENTENCE → NOUN-PHRASE VERB-PHRASE
NOUN-PHRASE → CMPLX-NOUN
NOUN-PHRASE → CMPLX-NOUN PREP-PHRASE
VERB-PHRASE → CMPLX-VERB
VERB-PHRASE → CMPLX-VERB PREP-PHRASE
PREP-PHRASE → PREP CMPLX-NOUN
CMPLX-NOUN → ARTICLE NOUN
CMPLX-VERB → VERB NOUN-PHRASE
CMPLX-VERB → VERB
ARTICLE → a
ARTICLE → the
NOUN → boy
NOUN → girl
NOUN → flower
VERB → likes
VERB → touches
VERB → sees
PREP → with

variables: SENTENCE, NOUN-PHRASE, VERB-PHRASE, PREP-PHRASE, CMPLX-NOUN, CMPLX-VERB, ARTICLE, NOUN, VERB, PREP
terminals: a, the, boy, girl, flower, likes, touches, sees, with
start variable: SENTENCE
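No variable in this grammar can derive itself, so its language is finite and can be listed exhaustively. A minimal sketch, assuming the rules are stored in a hypothetical dictionary `GRAMMAR` (it includes CMPLX-VERB → VERB, which the derivation of "a boy sees" on the next slide uses) and a hypothetical function `expand`:

```python
from functools import lru_cache
from itertools import product

# Each variable maps to its right-hand sides; anything not a key is a terminal word.
GRAMMAR = {
    "SENTENCE": [["NOUN-PHRASE", "VERB-PHRASE"]],
    "NOUN-PHRASE": [["CMPLX-NOUN"], ["CMPLX-NOUN", "PREP-PHRASE"]],
    "VERB-PHRASE": [["CMPLX-VERB"], ["CMPLX-VERB", "PREP-PHRASE"]],
    "PREP-PHRASE": [["PREP", "CMPLX-NOUN"]],
    "CMPLX-NOUN": [["ARTICLE", "NOUN"]],
    "CMPLX-VERB": [["VERB", "NOUN-PHRASE"], ["VERB"]],
    "ARTICLE": [["a"], ["the"]],
    "NOUN": [["boy"], ["girl"], ["flower"]],
    "VERB": [["likes"], ["touches"], ["sees"]],
    "PREP": [["with"]],
}


@lru_cache(maxsize=None)
def expand(symbol: str) -> frozenset:
    """All word sequences derivable from symbol (terminates: no recursive rules)."""
    if symbol not in GRAMMAR:
        return frozenset({(symbol,)})
    results = set()
    for rhs in GRAMMAR[symbol]:
        # Pick one derivable sequence for each symbol of the right-hand side.
        for parts in product(*(expand(s) for s in rhs)):
            results.add(tuple(word for part in parts for word in part))
    return frozenset(results)


sentences = {" ".join(words) for words in expand("SENTENCE")}
print("a girl with a flower likes the boy" in sentences)  # True
```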

Derivations
(1) SENTENCE → NOUN-PHRASE VERB-PHRASE
(2) NOUN-PHRASE → CMPLX-NOUN
(3) NOUN-PHRASE → CMPLX-NOUN PREP-PHRASE
(4) VERB-PHRASE → CMPLX-VERB
(5) VERB-PHRASE → CMPLX-VERB PREP-PHRASE
(6) PREP-PHRASE → PREP CMPLX-NOUN
(7) CMPLX-NOUN → ARTICLE NOUN
(8) CMPLX-VERB → VERB NOUN-PHRASE
(9) CMPLX-VERB → VERB
(10) ARTICLE → a
(11) ARTICLE → the
(12) NOUN → boy
(13) NOUN → girl
(14) NOUN → flower
(15) VERB → likes
(16) VERB → touches
(17) VERB → sees
(18) PREP → with

SENTENCE ⇒ NOUN-PHRASE VERB-PHRASE (1)
⇒ CMPLX-NOUN VERB-PHRASE (2)
⇒ ARTICLE NOUN VERB-PHRASE (7)
⇒ a NOUN VERB-PHRASE (10)
⇒ a boy VERB-PHRASE (12)
⇒ a boy CMPLX-VERB (4)
⇒ a boy VERB (9)
⇒ a boy sees (17)
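The numbered derivation can be replayed by rewriting, at each step, the leftmost occurrence of the chosen rule's variable. A minimal sketch, assuming the rules (1) to (18) in the order listed above, stored in a hypothetical dictionary `RULES`, and a hypothetical helper `apply_rule`:

```python
# Numbered rules (1)-(18) as (left-hand side, right-hand side) pairs.
RULES = {
    1: ("SENTENCE", ["NOUN-PHRASE", "VERB-PHRASE"]),
    2: ("NOUN-PHRASE", ["CMPLX-NOUN"]),
    3: ("NOUN-PHRASE", ["CMPLX-NOUN", "PREP-PHRASE"]),
    4: ("VERB-PHRASE", ["CMPLX-VERB"]),
    5: ("VERB-PHRASE", ["CMPLX-VERB", "PREP-PHRASE"]),
    6: ("PREP-PHRASE", ["PREP", "CMPLX-NOUN"]),
    7: ("CMPLX-NOUN", ["ARTICLE", "NOUN"]),
    8: ("CMPLX-VERB", ["VERB", "NOUN-PHRASE"]),
    9: ("CMPLX-VERB", ["VERB"]),
    10: ("ARTICLE", ["a"]),
    11: ("ARTICLE", ["the"]),
    12: ("NOUN", ["boy"]),
    13: ("NOUN", ["girl"]),
    14: ("NOUN", ["flower"]),
    15: ("VERB", ["likes"]),
    16: ("VERB", ["touches"]),
    17: ("VERB", ["sees"]),
    18: ("PREP", ["with"]),
}


def apply_rule(form: list[str], number: int) -> list[str]:
    """Rewrite the leftmost occurrence of the rule's left-hand side."""
    lhs, rhs = RULES[number]
    i = form.index(lhs)  # raises ValueError if the variable does not occur
    return form[:i] + rhs + form[i + 1:]


steps = [["SENTENCE"]]
for number in [1, 2, 7, 10, 12, 4, 9, 17]:
    steps.append(apply_rule(steps[-1], number))
    print("⇒", " ".join(steps[-1]), f"({number})")
```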

Programming languages
Context-free grammars are also used to
describe (parts of) programming languages
For instance, expressions like (2 + 3) * 5 or
3 + 8 + 2 * 7 can be described by the CFG
E → E + E
E → E * E
E → (E)
E → 0
E → 1
…
E → 9

Variables: E
Terminals: + * ( ) 0 1 2 3 4 5 6 7 8 9
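"Can be described by the CFG" can be made concrete as a membership test. A minimal sketch, assuming single-digit numbers, spaces as the only extra characters, and a hypothetical function `in_language`: it asks whether E derives each substring by trying every production, memoizing the answer per substring.

```python
from functools import lru_cache


def in_language(expr: str) -> bool:
    """Decide whether expr is generated by E → E + E | E * E | (E) | 0 | ... | 9."""
    s = expr.replace(" ", "")

    @lru_cache(maxsize=None)
    def derives(i: int, j: int) -> bool:
        """Can E derive the nonempty substring s[i:j]?"""
        if j - i == 1:
            return s[i] in "0123456789"                      # E → 0 | ... | 9
        if j - i >= 3 and s[i] == "(" and s[j - 1] == ")" and derives(i + 1, j - 1):
            return True                                      # E → (E)
        # E → E + E or E → E * E: try each operator position as the split point.
        return any(s[k] in "+*" and derives(i, k) and derives(k + 1, j)
                   for k in range(i + 1, j - 1))

    return len(s) > 0 and derives(0, len(s))


print(in_language("(2 + 3) * 5"), in_language("3 + 8 + 2 * 7"))  # True True
```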

Motivation for studying CFGs
Context-free grammars are essential for
understanding the meaning of computer programs
code: (2 + 3) * 5
E ⇒ E * E
⇒ (E) * E
⇒ (E + E) * E
⇒ (2 + E) * E
⇒ (2 + 3) * E
⇒ (2 + 3) * 5

meaning: add 2 and 3, and then multiply by 5
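The meaning can be read off the parse tree by evaluating it bottom-up. A minimal sketch, assuming the tree is written as hypothetical nested tuples (operator, left, right) with digit strings as leaves; parentheses only group, so E → (E) adds no node:

```python
# Parse tree of (2 + 3) * 5: the + node sits below the * node.
TREE = ("*", ("+", "2", "3"), "5")


def evaluate(node):
    """Compute the value a parse tree denotes, children before parents."""
    if isinstance(node, str):
        return int(node)                  # E → 0 | 1 | ... | 9
    op, left, right = node
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a * b  # E → E + E or E → E * E


print(evaluate(TREE))  # 25
```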

Definition of context-free grammar
A context-free grammar (CFG) is a 4-tuple
(V, Σ, R, S) where

V is a finite set of variables or non-terminals
Σ is a finite set of terminals (V ∩ Σ = ∅)
R is a finite set of productions or substitution rules of
the form
A → α
where A is a variable (A ∈ V) and α is a (possibly empty) string of
variables and terminals
S ∈ V is a variable called the start variable
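The 4-tuple translates directly into a data structure. A minimal sketch, assuming a hypothetical `CFG` class whose `problems` method checks the conditions of the definition (V and Σ disjoint, S ∈ V, every rule A → α with A ∈ V and α made of known symbols):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    variables: frozenset[str]                         # V
    terminals: frozenset[str]                         # Σ
    rules: tuple[tuple[str, tuple[str, ...]], ...]    # R: pairs (A, α)
    start: str                                        # S

    def problems(self) -> list[str]:
        """List violated conditions of the definition; empty means none."""
        issues = []
        if self.variables & self.terminals:
            issues.append("V and Σ are not disjoint")
        if self.start not in self.variables:
            issues.append("start variable is not in V")
        symbols = self.variables | self.terminals
        for lhs, rhs in self.rules:
            if lhs not in self.variables:
                issues.append(f"left-hand side {lhs!r} is not a variable")
            issues.extend(f"unknown symbol {s!r}" for s in rhs if s not in symbols)
        return issues


# The grammar A → 0A1 | B, B → # from the first slide.
G = CFG(
    variables=frozenset({"A", "B"}),
    terminals=frozenset({"0", "1", "#"}),
    rules=(("A", ("0", "A", "1")), ("A", ("B",)), ("B", ("#",))),
    start="A",
)
print(G.problems())  # []
```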

Shorthand notation for productions

When we have multiple productions with the same variable on the left like
E → E + E
E → E * E
E → (E)
E → N

N → 0N
N → 1N
N → 0
N → 1

Variables: E, N
Terminals: +, *, (, ), 0, 1
Start variable: E

we can write this in shorthand as

E → E + E | E * E | (E) | N
N → 0N | 1N | 0 | 1

Derivation
A derivation is a sequential application of productions:

E ⇒ E * E
⇒ (E) * E
⇒ (E) * N
⇒ (E + E) * N
⇒ (E + E) * 1
⇒ (E + N) * 1
⇒ (N + N) * 1
⇒ (N + 1N) * 1
⇒ (N + 10) * 1
⇒ (1 + 10) * 1

α ⇒ β means β can be obtained from α with one production
α ⇒* β means β can be obtained from α after zero or more productions
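The one-step relation α ⇒ β can be checked directly, and α ⇒* β then holds along any chain of single steps. A minimal sketch, assuming single-letter uppercase variables, strings written without spaces, and a hypothetical function `one_step`:

```python
# Productions E → E + E | E * E | (E) | N and N → 0N | 1N | 0 | 1.
RULES = [("E", "E+E"), ("E", "E*E"), ("E", "(E)"), ("E", "N"),
         ("N", "0N"), ("N", "1N"), ("N", "0"), ("N", "1")]


def one_step(alpha: str, beta: str) -> bool:
    """True if beta is obtained from alpha by applying one production."""
    for i, sym in enumerate(alpha):
        for lhs, rhs in RULES:
            if sym == lhs and alpha[:i] + rhs + alpha[i + 1:] == beta:
                return True
    return False


derivation = ["E", "E*E", "(E)*E", "(E)*N", "(E+E)*N", "(E+E)*1",
              "(E+N)*1", "(N+N)*1", "(N+1N)*1", "(N+10)*1", "(1+10)*1"]
# Every consecutive pair is a single step, so E ⇒* (1+10)*1.
print(all(one_step(a, b) for a, b in zip(derivation, derivation[1:])))  # True
```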

Language of a CFG
The language of a CFG G is the set of all
strings of terminals that can be derived
from the start variable:
L(G) = { w | w ∈ Σ* and S ⇒* w }

Such languages are called context-free
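For small lengths, L(G) can be listed by searching over sentential forms. Expanding only the leftmost variable loses nothing, since every derivable string has a leftmost derivation. A minimal sketch, assuming single-letter uppercase variables, no ε-productions (so forms never shrink and long forms can be pruned), and a hypothetical function `language_up_to`:

```python
from collections import deque


def language_up_to(rules, start, max_len):
    """Terminal strings of length <= max_len derivable from start.

    Assumes variables are single uppercase letters and there are no
    ε-productions, so any form longer than max_len can be discarded.
    """
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, or None if form is all terminals.
        i = next((k for k, c in enumerate(form) if c.isupper()), None)
        if i is None:
            words.add(form)
            continue
        for lhs, rhs in rules:
            if lhs == form[i]:
                new = form[:i] + rhs + form[i + 1:]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
    return words


# A → 0A1 | B, B → #
RULES = [("A", "0A1"), ("A", "B"), ("B", "#")]
print(sorted(language_up_to(RULES, "A", 5)))  # ['#', '0#1', '00#11']
```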

Example 1
A → 0A1 | B
B → #

variables: A, B
terminals: 0, 1, #
start variable: A

Can you derive:
00#11
00#111
00##11

What is the language of this CFG?

L = {0ⁿ#1ⁿ : n ≥ 0}

Example 2
S → SS | (S) | ε
convention: variables in uppercase,
terminals in lowercase, start variable first

Give derivations of (), (()())

Numbering the rules (1) S → SS, (2) S → (S), (3) S → ε:
S ⇒ (S) (2)
⇒ () (3)

How about ())?

S ⇒ (S)
⇒ (SS)
⇒ ((S)S)
⇒ ((S)(S))
⇒ (()(S))
⇒ (()())
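The grammar S → SS | (S) | ε generates exactly the balanced strings of parentheses, so membership can be decided with a depth counter instead of a derivation. A minimal sketch, assuming a hypothetical function `balanced`:

```python
def balanced(s: str) -> bool:
    """True if s is a balanced string of parentheses (a string of L(S))."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:      # a ")" with no matching "(" before it
                return False
        else:
            return False       # only ( and ) are terminals
    return depth == 0


print(balanced("(()())"), balanced("())"))  # True False
```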

Design examples
L = {0ⁿ1ⁿ | n ≥ 0}
These strings have recursive structure: each one is a 0, a shorter string of L, and a 1:
000000111111
0000011111
00001111
000111
0011
01

S → 0S1 | ε
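The recursive structure translates directly into a recursive recognizer that undoes S → 0S1 one step at a time. A minimal sketch, assuming a hypothetical function `derives_S`:

```python
def derives_S(s: str) -> bool:
    """Mirror S → 0S1 | ε: peel a 0 off the front and a 1 off the back."""
    if s == "":
        return True                                   # S → ε
    return (len(s) >= 2 and s[0] == "0" and s[-1] == "1"
            and derives_S(s[1:-1]))                   # S → 0S1


print(derives_S("000111"), derives_S("0101"))  # True False
```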

Design examples
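L = numbers without leading zeros
allowed: 0, 109, 2, 23
not allowed: ε, 01, 003

S → 0 | LN
N → NA | ε
A → 0 | L
L → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Example: in 1052870032, the leading digit 1 comes from L, and the rest, 052870032, is any digit string generated by N.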
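The grammar says: either the single digit 0, or a nonzero leading digit (L) followed by any digit string (N). A minimal recognizer following that reading, assuming hypothetical names `LEADING`, `DIGITS` and `is_number`:

```python
LEADING = set("123456789")   # L → 1 | 2 | ... | 9
DIGITS = LEADING | {"0"}     # A → 0 | L


def is_number(s: str) -> bool:
    """S → 0 | LN, where N → NA | ε generates any (possibly empty) digit string."""
    if s == "0":
        return True
    return len(s) >= 1 and s[0] in LEADING and all(c in DIGITS for c in s[1:])


print([is_number(x) for x in ["0", "109", "", "01", "003"]])
# [True, True, False, False, False]
```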

Design examples
L = {0ⁿ1ⁿ0ᵐ1ᵐ | n ≥ 0, m ≥ 0}
These strings have two parts:
L = L₁L₂
L₁ = {0ⁿ1ⁿ | n ≥ 0}
L₂ = {0ᵐ1ᵐ | m ≥ 0}
rules for L₁: S₁ → 0S₁1 | ε
L₂ is the same as L₁

S → S₁S₁
S₁ → 0S₁1 | ε
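The rule S → S₁S₁ suggests a recognizer that tries every split point and checks both halves against L₁. A minimal sketch, assuming hypothetical functions `in_L1` and `in_L`:

```python
def in_L1(s: str) -> bool:
    """S₁ → 0S₁1 | ε: strip a 0 from the front and a 1 from the back."""
    if s == "":
        return True
    return len(s) >= 2 and s[0] == "0" and s[-1] == "1" and in_L1(s[1:-1])


def in_L(s: str) -> bool:
    """S → S₁S₁: try every split point into two strings of L₁."""
    return any(in_L1(s[:k]) and in_L1(s[k:]) for k in range(len(s) + 1))


print(in_L("001101"), in_L("0110"))  # True False
```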

Design examples
L = {0ⁿ1ᵐ0ᵐ1ⁿ | n ≥ 0, m ≥ 0}
These strings have nested structure:
outer part: 0ⁿ…1ⁿ
inner part: 1ᵐ0ᵐ

S → 0S1 | I
I → 1I0 | ε
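The nesting maps onto two mutually layered recognizers: S peels outer 0…1 pairs and then hands the middle to I, which peels 1…0 pairs. A minimal sketch, assuming hypothetical functions `derives_S` and `derives_I`:

```python
def derives_S(s: str) -> bool:
    """S → 0S1 | I: peel matching outer 0...1 pairs, then hand over to I."""
    if len(s) >= 2 and s[0] == "0" and s[-1] == "1" and derives_S(s[1:-1]):
        return True
    return derives_I(s)


def derives_I(s: str) -> bool:
    """I → 1I0 | ε: peel matching 1...0 pairs until nothing is left."""
    if s == "":
        return True
    return len(s) >= 2 and s[0] == "1" and s[-1] == "0" and derives_I(s[1:-1])


print(derives_S("001011"), derives_S("0110"))  # True False
```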

Design examples

L = {x : x has two 0-blocks with the same number of 0s}

allowed: 01011, 001011001, 10010101001
not allowed: 01001000, 01111

Example: 10010011010010110 splits as 1001 (initial part), 00110100 (middle part), 10110 (final part)
A: ε, or ends in 1
C: ε, or begins with 1

Design examples
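A: ε, or ends in 1
C: ε, or begins with 1

S → ABC
A → ε | U1
U → 0U | 1U | ε
C → ε | 1U
B → 0D0 | 0B0
D → 1U1 | 1

U generates any binary string.
B has recursive structure: 00110100 = 0 0 D 0 0 with D = 1101; D begins and ends with 1, and each side of B has at least one 0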
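A direct check of the definition (not derived from the grammar) is handy for testing such a grammar: a 0-block is a maximal run of 0s, and x is in L when two blocks have the same length. A minimal sketch, assuming hypothetical functions `zero_blocks` and `has_equal_zero_blocks`:

```python
import re


def zero_blocks(s: str) -> list[int]:
    """Lengths of the maximal runs of 0s in s, from left to right."""
    return [len(run) for run in re.findall(r"0+", s)]


def has_equal_zero_blocks(s: str) -> bool:
    """True if two different 0-blocks of s contain the same number of 0s."""
    blocks = zero_blocks(s)
    return len(blocks) != len(set(blocks))


print(zero_blocks("10010011010010110"))  # [2, 2, 1, 2, 1, 1]
```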

Parse tree
Derivations can also be represented using parse trees
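E → E + E | E - E | (E) | V
V → x | y | z

E ⇒ E + E
⇒ V + E
⇒ x + E
⇒ x + (E)
⇒ x + (E - E)
⇒ x + (V - E)
⇒ x + (y - E)
⇒ x + (y - V)
⇒ x + (y - z)

(Figure: parse tree for x + (y - z). The root E has children E, +, E; the left E derives V and then x; the right E derives ( E ), and that inner E derives E - E, whose two sides derive V → y and V → z.)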
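A parse tree can be stored as nested tuples, and reading its leaves from left to right (its yield) gives back the derived string. A minimal sketch; the format (label, children) with terminals as plain strings, and the name `tree_yield`, are assumptions for illustration:

```python
# Parse tree for x + (y - z): a node is (label, children); terminals are strings.
TREE = ("E", [("E", [("V", ["x"])]),
              "+",
              ("E", ["(",
                     ("E", [("E", [("V", ["y"])]), "-", ("E", [("V", ["z"])])]),
                     ")"])])


def tree_yield(node) -> str:
    """Concatenate the terminals at the leaves, from left to right."""
    if isinstance(node, str):
        return node
    _, children = node
    return "".join(tree_yield(child) for child in children)


print(tree_yield(TREE))  # x+(y-z)
```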

Left derivation
Always derive the leftmost variable first:
E ⇒ E + E
⇒ V + E
⇒ x + E
⇒ x + (E)
⇒ x + (E - E)
⇒ x + (V - E)
⇒ x + (y - E)
⇒ x + (y - V)
⇒ x + (y - z)

(Figure: the parse tree for x + (y - z), as on the previous slide.)

This corresponds to a left-to-right traversal of the parse tree
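The correspondence can be checked mechanically: repeatedly replacing the leftmost unexpanded node of the tree by its children reproduces the leftmost derivation above (written without spaces). A minimal sketch, assuming a hypothetical nested-tuple format (label, children) with terminals as plain strings, and a hypothetical function `leftmost_derivation`:

```python
# Parse tree for x + (y - z): a node is (label, children); terminals are strings.
TREE = ("E", [("E", [("V", ["x"])]),
              "+",
              ("E", ["(",
                     ("E", [("E", [("V", ["y"])]), "-", ("E", [("V", ["z"])])]),
                     ")"])])


def leftmost_derivation(tree) -> list[str]:
    """Sentential forms obtained by always expanding the leftmost variable node."""
    frontier = [tree]  # current sentential form: unexpanded nodes and terminals

    def render() -> str:
        return "".join(n if isinstance(n, str) else n[0] for n in frontier)

    forms = [render()]
    while True:
        # Index of the leftmost node that is still a variable (a tuple).
        i = next((k for k, n in enumerate(frontier) if not isinstance(n, str)), None)
        if i is None:
            return forms
        _, children = frontier[i]
        frontier[i:i + 1] = children  # apply the production used at this node
        forms.append(render())


print("\n⇒ ".join(leftmost_derivation(TREE)))
```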

Ambiguity
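A grammar is ambiguous if some
strings have more than one parse tree
Example: E → E + E | E - E | (E) | V
V → x | y | z

(Figure: two parse trees for x + y + z. Both have root E → E + E; in one, the right E derives y + z, in the other, the left E derives x + y.)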
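Ambiguity of a particular string can be checked by counting its parse trees. A minimal sketch for the grammar E → E + E | E - E | (E) | V, V → x | y | z, assuming a hypothetical function `count_parse_trees`: for each substring it adds up the trees contributed by every production that can produce it.

```python
from functools import lru_cache


def count_parse_trees(expr: str) -> int:
    """Number of parse trees of expr from E (0 if expr is not derivable)."""
    s = expr.replace(" ", "")

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        total = 0
        if j - i == 1 and s[i] in "xyz":
            total += 1                                   # E → V, V → x | y | z
        if j - i >= 3 and s[i] == "(" and s[j - 1] == ")":
            total += count(i + 1, j - 1)                 # E → (E)
        for k in range(i + 1, j - 1):
            if s[k] in "+-":
                total += count(i, k) * count(k + 1, j)   # E → E + E | E - E
        return total

    return count(0, len(s)) if s else 0


print(count_parse_trees("x + y + z"))  # 2
```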

Why ambiguity matters


The parse tree represents the intended meaning of x + y + z:

Tree in which the right E derives y + z: first add y and z, and then add this to x

Tree in which the left E derives x + y: first add x and y, and then add z to this

Why ambiguity matters


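Suppose we also had multiplication:
E → E + E | E * E | E - E | (E) | V
V → x | y | z

Two parse trees for x * y + z:

Tree with root E * E, whose right E derives y + z: first y + z, then x *

Tree with root E + E, whose left E derives x * y: first x * y, then + z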
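With multiplication, the two readings give different values, which is why a compiler cannot accept an ambiguous grammar as is. A minimal sketch, assuming both trees written as hypothetical nested tuples (operator, left, right) and arbitrarily chosen sample values for x, y, z:

```python
# The two parse trees of x * y + z.
tree_a = ("*", "x", ("+", "y", "z"))   # first y + z, then x *
tree_b = ("+", ("*", "x", "y"), "z")   # first x * y, then + z


def evaluate(node, env: dict[str, int]) -> int:
    """Evaluate a tree bottom-up, looking variables up in env."""
    if isinstance(node, str):
        return env[node]
    op, left, right = node
    a, b = evaluate(left, env), evaluate(right, env)
    return {"+": a + b, "-": a - b, "*": a * b}[op]


env = {"x": 2, "y": 3, "z": 4}   # sample values, chosen arbitrarily
print(evaluate(tree_a, env), evaluate(tree_b, env))  # 14 10
```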

Examples
Which of these are ambiguous:

L = {0ⁿ1ⁿ | n ≥ 0}
S → 0S1 | ε

L = numbers without leading 0s
S → 0 | LN
N → NA | ε
A → 0 | L
L → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

L = {0ⁿ1ⁿ0ᵐ1ᵐ | n, m ≥ 0}
S → S₁S₁
S₁ → 0S₁1 | ε

L = {0ⁿ1ᵐ0ᵐ1ⁿ | n, m ≥ 0}
S → 0S1 | I
I → 1I0 | ε

L = {x : x has two 0-blocks with the same number of 0s}
S → ABC
A → ε | U1
C → ε | 1U
B → 0D0 | 0B0
D → 1U1 | 1
U → 0U | 1U | ε

Disambiguation
Sometimes we can rewrite the grammar to
remove the ambiguity
E → E + E | E * E | E - E | (E) | V
V → x | y | z

Rewrite the grammar so that * cannot be broken by +:
E → T | E + T | E - T
T → F | T * F
F → (E) | V
V → x | y | z

T stands for term: x * (y + z)
F stands for factor: x, (y + z)
A term always splits into factors
A factor is either a variable or a parenthesized expression
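The rewritten grammar is unambiguous enough to drive a parser directly. A minimal recursive-descent sketch, assuming a hypothetical function `parse` that returns nested tuples (op, left, right); since E → E + T and T → T * F are left-recursive, it replaces that recursion with loops, a common transformation that keeps +, - and * left-associative:

```python
def parse(expr: str):
    """Parse with E → T | E + T | E - T, T → F | T * F, F → (E) | V, V → x | y | z."""
    tokens = [c for c in expr if not c.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"unexpected {tok!r} at position {pos}")
        pos += 1
        return tok

    def parse_E():
        node = parse_T()
        while peek() in ("+", "-"):     # E → E + T | E - T, as a loop
            op = take()
            node = (op, node, parse_T())
        return node

    def parse_T():
        node = parse_F()
        while peek() == "*":            # T → T * F, as a loop
            take()
            node = ("*", node, parse_F())
        return node

    def parse_F():
        if peek() == "(":               # F → (E)
            take("(")
            node = parse_E()
            take(")")
            return node
        tok = take()
        if tok not in ("x", "y", "z"):  # F → V, V → x | y | z
            raise SyntaxError(f"unexpected {tok!r}")
        return tok

    tree = parse_E()
    if pos != len(tokens):
        raise SyntaxError(f"trailing input at position {pos}")
    return tree


print(parse("x * y + z"))  # ('+', ('*', 'x', 'y'), 'z')
```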

Disambiguation
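Example
E → T | E + T | E - T
T → F | T * F
F → (E) | V
V → x | y | z

(Figure: a parse tree in the rewritten grammar; only a path through the nodes E, T, F and V down to the terminal x is recoverable.)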

Disambiguation
In general, disambiguation is not always possible
There exist inherently ambiguous languages:
every CFG for such a language is ambiguous
There is no general procedure that can tell whether
an arbitrary grammar is ambiguous
However, grammars used in programming
languages can typically be disambiguated

Analysis example
S → aB | bA
A → a | aS | bAA
B → b | bS | aBB

What is the language?

Is it ambiguous?

Strings to consider: ab, baba, abbbaa, a, bba
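A brute-force parse-tree counter helps with both questions. A minimal sketch, assuming a hypothetical function `count_trees` that relies on every right-hand side of this grammar starting with a terminal (so each recursive call works on a strictly shorter substring). Trying it on short strings is consistent with the language being the nonempty strings with equally many a's and b's; that is evidence from finitely many strings, not a proof. It also finds two parse trees for aabbab, and one such string is enough to show the grammar is ambiguous.

```python
from functools import lru_cache

# S → aB | bA,  A → a | aS | bAA,  B → b | bS | aBB
GRAMMAR = {
    "S": ["aB", "bA"],
    "A": ["a", "aS", "bAA"],
    "B": ["b", "bS", "aBB"],
}


def count_trees(s: str, symbol: str = "S") -> int:
    """Number of parse trees deriving s from symbol (0: s is not derivable)."""

    @lru_cache(maxsize=None)
    def count(sym: str, i: int, j: int) -> int:
        return sum(count_seq(rhs, i, j) for rhs in GRAMMAR[sym])

    @lru_cache(maxsize=None)
    def count_seq(rhs: str, i: int, j: int) -> int:
        """Ways to derive s[i:j] from the symbol sequence rhs."""
        if rhs == "":
            return 1 if i == j else 0
        head, rest = rhs[0], rhs[1:]
        if head not in GRAMMAR:  # terminal: must match the next character
            return count_seq(rest, i + 1, j) if i < j and s[i] == head else 0
        # No ε-productions: every symbol of rest needs at least one character.
        return sum(count(head, i, k) * count_seq(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    return count(symbol, 0, len(s))


print([count_trees(w) for w in ["ab", "baba", "a", "bba", "aabbab"]])
# [1, 1, 0, 0, 2]
```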
