
CS 25

Automata Theory & Formal Languages

Lecture 5

CONTEXT-FREE GRAMMARS
(CFGs)

A notation for describing languages.
More powerful than finite automata or regular expressions (REs), but still cannot define all possible languages.
Can describe features with recursive structure.
First used in the study of human language.
Important application: specification and compilation of programming languages.

CONTEXT-FREE GRAMMARS
(CFGs)
Example: CFG for { 0^n 1^n | n ≥ 1 }
G1:
S -> 01
S -> 0S1

Consists of PRODUCTIONS (a collection of substitution rules).
Each rule has a VARIABLE and a STRING (of variables and other symbols called terminals) separated by an arrow.

FORMAL DEFINITION OF
CONTEXT-FREE GRAMMAR
A context-free grammar is a 4-tuple (V, Σ, R, S), where
1. V is a finite set called the VARIABLES
2. Σ is a finite set, disjoint from V, called the TERMINALS
3. R is a finite set of RULES/PRODUCTIONS
4. S ∈ V is the START SYMBOL

CFGs: Productions

A production has the form variable -> string of variables and terminals.
Convention:
A, B, C, ... are variables.
a, b, c, ... are terminals.
α, β, γ, ... are strings of terminals and/or variables.

For convenience, we abbreviate rules with the same left-hand variable into a single line.

Ex: A -> 0A1 | B

FORMAL DEFINITION OF
CONTEXT-FREE GRAMMAR
Example: CFG G for { 0^n 1^n | n ≥ 1 }.
G = (V, Σ, R, S), where
1. V = {S}
2. Σ = {0, 1}
3. R = {S -> 01,
S -> 0S1}
4. S is the start symbol
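
As a rough illustration, the 4-tuple can be written down directly as a data structure. The sketch below is in Python; the names `CFG` and `is_well_formed` are made up for this example and are not part of the lecture. It checks only the side conditions stated in the definition.

```python
from typing import NamedTuple

class CFG(NamedTuple):          # hypothetical container for (V, Σ, R, S)
    variables: frozenset        # V
    terminals: frozenset        # Σ
    rules: tuple                # R, as (head, body) pairs; body is a tuple of symbols
    start: str                  # S

def is_well_formed(g: CFG) -> bool:
    """Check the side conditions of the 4-tuple definition."""
    symbols = g.variables | g.terminals
    return (
        g.variables.isdisjoint(g.terminals)                      # Σ disjoint from V
        and g.start in g.variables                               # S is a variable
        and all(head in g.variables for head, _ in g.rules)      # left side is a variable
        and all(sym in symbols for _, body in g.rules for sym in body)
    )

# The grammar G for { 0^n 1^n | n >= 1 } from the example.
G = CFG(
    variables=frozenset({"S"}),
    terminals=frozenset({"0", "1"}),
    rules=(("S", ("0", "1")), ("S", ("0", "S", "1"))),
    start="S",
)
print(is_well_formed(G))
```

Storing each right side as a tuple of symbols, rather than a single string, keeps the sketch usable for grammars whose symbols are longer than one character.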

DERIVATION from CFG's

A DERIVATION is the sequence of substitutions used to obtain a string.

We derive strings in the language of a CFG by starting with the start symbol and repeatedly replacing some variable A by the right side of one of its productions.

The productions for A are those that have A on the left side of the ->.

All strings generated by the grammar constitute the LANGUAGE OF THE GRAMMAR.

L(G1) is the language of grammar G1.



DERIVATION from CFG's

Formally, we say αAβ => αγβ if A -> γ is a production.

=>* means zero or more derivation steps.

Example: S -> 01; S -> 0S1


S => 0S1 => 00S11 => 000111.
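
The derivation above can be replayed mechanically. The following Python sketch assumes single-character symbols and a grammar whose only variable is S; `RULES`, `step` and `derive` are illustrative names, not notation from the lecture.

```python
RULES = {"S": ["01", "0S1"]}    # S -> 01 | 0S1

def step(form: str, position: int, body: str) -> str:
    """One derivation step: replace the variable at `position` with `body`."""
    return form[:position] + body + form[position + 1:]

def derive(start: str, choices: list) -> list:
    """Apply the chosen productions of S in order and return every
    sentential form visited, starting with the start symbol."""
    forms = [start]
    for choice in choices:
        current = forms[-1]
        position = current.index("S")   # the only variable in this grammar
        forms.append(step(current, position, RULES["S"][choice]))
    return forms

# Use S -> 0S1 twice, then S -> 01.
print(" => ".join(derive("S", [1, 1, 0])))
# S => 0S1 => 00S11 => 000111
```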

DERIVATION from CFG's


LEFTMOST DERIVATION - at every step, the leftmost
remaining variable is the one replaced
RIGHTMOST DERIVATION - at every step, the rightmost
remaining variable is the one replaced

DERIVATION from CFG's


G2 = (V, Σ, R, <EXPR>) where
V is {<EXPR>, <TERM>, <FACTOR>},
Σ is {a, +, X, (, )},
R is <EXPR> -> <EXPR> + <TERM> | <TERM>
     <TERM> -> <TERM> X <FACTOR> | <FACTOR>
     <FACTOR> -> (<EXPR>) | a
Leftmost derivation of a + a X a:
<EXPR> => <EXPR> + <TERM>
       => <TERM> + <TERM>
       => <FACTOR> + <TERM>
       => a + <TERM>
       => a + <TERM> X <FACTOR>
       => a + <FACTOR> X <FACTOR>
       => a + a X <FACTOR>
       => a + a X a

DERIVATION from CFG's


G2 = (V, Σ, R, <EXPR>) where
V is {<EXPR>, <TERM>, <FACTOR>},
Σ is {a, +, X, (, )},
R is <EXPR> -> <EXPR> + <TERM> | <TERM>
     <TERM> -> <TERM> X <FACTOR> | <FACTOR>
     <FACTOR> -> (<EXPR>) | a
Rightmost derivation of a + a X a:
<EXPR> => <EXPR> + <TERM>
       => <EXPR> + <TERM> X <FACTOR>
       => <EXPR> + <TERM> X a
       => <EXPR> + <FACTOR> X a
       => <EXPR> + a X a
       => <TERM> + a X a
       => <FACTOR> + a X a
       => a + a X a
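
Both derivation orders can be checked with a short Python sketch. It assumes G2 is stored as a dictionary from variables to lists of right sides; `derive` and the production indices passed to it are illustrative choices, not notation from the lecture.

```python
# G2 from the slides; each right side is a list of symbols.
G2 = {
    "<EXPR>":   [["<EXPR>", "+", "<TERM>"], ["<TERM>"]],
    "<TERM>":   [["<TERM>", "X", "<FACTOR>"], ["<FACTOR>"]],
    "<FACTOR>": [["(", "<EXPR>", ")"], ["a"]],
}

def derive(choices, leftmost=True, start="<EXPR>"):
    """Replay a derivation: choices[i] is the index of the production
    applied to the leftmost (or rightmost) variable at step i."""
    form = [start]
    forms = [form]
    for choice in choices:
        positions = [i for i, sym in enumerate(form) if sym in G2]
        i = positions[0] if leftmost else positions[-1]
        form = form[:i] + G2[form[i]][choice] + form[i + 1:]
        forms.append(form)
    return forms

left = derive([0, 1, 1, 1, 0, 1, 1, 1], leftmost=True)
right = derive([0, 0, 1, 1, 1, 1, 1, 1], leftmost=False)
for form in right:
    print(" ".join(form))
```

The two runs pass through different sentential forms but end in the same terminal string, which is what the two derivations on the slides show.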

SENTENTIAL FORMS

Any string of variables and/or terminals derived from the start symbol is called a sentential form.
Formally, α is a sentential form iff S =>* α.


LANGUAGE OF A GRAMMAR

If G is a CFG, then L(G), the language of G, is {w | S =>* w}.

Note: w must be a terminal string, and S is the start symbol.

Example: G has productions S -> ε and S -> 0S1.
L(G) = { 0^n 1^n | n ≥ 0 }.

Note: ε is a legitimate right side.
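
One way to get a feel for L(G) is to enumerate it up to a length bound. The Python sketch below does a breadth-first search over sentential forms for the example grammar; it assumes single-character symbols and a single variable S, and `language_up_to` is a made-up helper name.

```python
from collections import deque

RULES = {"S": ["", "0S1"]}      # S -> ε | 0S1

def language_up_to(max_len: int) -> list:
    """All terminal strings w with S =>* w and len(w) <= max_len."""
    seen, found = {"S"}, set()
    queue = deque(["S"])
    while queue:
        form = queue.popleft()
        if "S" not in form:             # no variables left: a terminal string
            found.add(form)
            continue
        i = form.index("S")
        for body in RULES["S"]:
            new = form[:i] + body + form[i + 1:]
            # Terminals are never erased, so a form that already carries
            # more than max_len terminals cannot lead to a short string.
            if len(new.replace("S", "")) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(found, key=len)

print(language_up_to(6))
# ['', '01', '0011', '000111']
```

The empty string appears in the output because ε is a legitimate right side.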


CONTEXT-FREE LANGUAGES

A language that is defined by some CFG is called a context-free language (CFL).
There are CFLs that are not regular languages, such as the example just given.
But not all languages are CFLs.
Intuitively: CFLs can count two things, not three (for example, { 0^n 1^n 2^n | n ≥ 0 } is not a CFL).
PUSHDOWN AUTOMATA (PDAs) are the class of machines that recognize exactly the context-free languages (CFLs).
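
The slides do not define PDAs formally, so the following is only a sketch of the underlying idea: a stack is enough to match the 0s against the 1s in { 0^n 1^n | n ≥ 0 }. The function `accepts` is an illustrative Python stand-in for such a machine, not a formal PDA.

```python
def accepts(w: str) -> bool:
    """Stack-based recognizer for { 0^n 1^n | n >= 0 }."""
    stack = []
    seen_one = False
    for ch in w:
        if ch == "0":
            if seen_one:            # a 0 after a 1 breaks the 0...01...1 shape
                return False
            stack.append("0")       # remember one unmatched 0
        elif ch == "1":
            seen_one = True
            if not stack:           # more 1s than 0s
                return False
            stack.pop()             # match this 1 with an earlier 0
        else:
            return False            # symbol outside the alphabet {0, 1}
    return not stack                # every 0 must have been matched

print([w for w in ["", "01", "0011", "001", "10", "0101"] if accepts(w)])
```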

EXERCISES:
1. Using the grammar G2 defined earlier, show the leftmost and rightmost derivations, with their corresponding parse trees, for the string (a + a) X a.
2. Given the following grammar:
<SENTENCE> -> <NOUN-PHRASE> <VERB-PHRASE>
<NOUN-PHRASE> -> <COMPLX-NOUN> | <COMPLX-NOUN> <PREP-PHRASE>
<VERB-PHRASE> -> <COMPLX-VERB> | <COMPLX-VERB> <PREP-PHRASE>
<PREP-PHRASE> -> <PREP> <COMPLX-NOUN>
<COMPLX-NOUN> -> <ARTICLE> <NOUN>
<COMPLX-VERB> -> <VERB> | <VERB> <NOUN-PHRASE>
<ARTICLE> -> a | the
<NOUN> -> boy | girl | flower
<VERB> -> touches | likes | sees
<PREP> -> with

a. Give the formal definition.
b. Show the leftmost derivations for:
the girl sees a flower
a boy with a flower likes the girl

PARSE TREES

A parse tree represents the (syntactic) structure of a string w.
It is an alternative representation to derivations and recursive inferences.
There can be several parse trees for the same string.
Ideally there should be only one parse tree (the true structure) for each string, i.e. the grammar should be unambiguous.


AMBIGUITY

Sometimes a grammar generates a string in several different ways (several parse trees and hence different meanings).

Ex: <EXPR> -> <EXPR> + <EXPR> | <EXPR> X <EXPR> | (<EXPR>) | a

You should be able to prove that a + a X a is generated ambiguously.
A grammar generates a string ambiguously if the string has two different parse trees, NOT merely two different derivations.
Equivalently, a string is derived ambiguously in a CFG if it has two or more different leftmost derivations.
A grammar is ambiguous if it generates some string ambiguously.
INHERENTLY AMBIGUOUS - languages that can only be generated by ambiguous grammars.

Ex: { 0^i 1^j 2^k | i = j or j = k }
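
To see the ambiguity of the expression grammar above concretely, one can count parse trees. The Python sketch below is specific to the rules <EXPR> -> <EXPR> + <EXPR> | <EXPR> X <EXPR> | (<EXPR>) | a and assumes single-character tokens with no spaces; `count_trees` is an illustrative name.

```python
from functools import lru_cache

def count_trees(text: str) -> int:
    """Number of distinct parse trees deriving `text` from <EXPR>."""
    tokens = tuple(text)

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # Parse trees for tokens[i:j], one case per production.
        total = 0
        if j - i == 1 and tokens[i] == "a":                       # <EXPR> -> a
            total += 1
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)                          # <EXPR> -> (<EXPR>)
        for k in range(i + 1, j - 1):                             # operator at position k
            if tokens[k] in ("+", "X"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_trees("a+aXa"))
# 2
```

A count greater than one means the string is generated ambiguously; here the two trees correspond to grouping the string as (a + a) X a and as a + (a X a).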


EXERCISES:
Given the following CFG,
E -> E + E | E * E | (E) | id
a.) Show that the given grammar is ambiguous by showing the
leftmost derivations and parse trees for the string id + id * id.
b.) Construct an equivalent unambiguous grammar

DESIGNING CONTEXT-FREE
GRAMMARS

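Many CFLs are unions of simpler CFLs. Construct CFGs for the simpler languages, then combine them with the rule S -> S1 | S2 | ... | Sk, where each Si is the start variable of one of the simpler grammars.

Ex: { 0^n 1^n | n ≥ 0 } ∪ { 1^n 0^n | n ≥ 0 }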
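
The union construction is small enough to write out. The Python sketch below assumes each grammar is a dictionary from variables to lists of right sides and that the two grammars use different variable names; `union`, `G_A` and `G_B` are illustrative names.

```python
def union(g1: dict, start1: str, g2: dict, start2: str, new_start: str = "S") -> dict:
    """Grammar for L(g1) ∪ L(g2): add the rule new_start -> start1 | start2."""
    if g1.keys() & g2.keys():
        raise ValueError("the two grammars must use disjoint variable names")
    if new_start in g1 or new_start in g2:
        raise ValueError("the new start variable must be fresh")
    combined = {new_start: [[start1], [start2]]}
    combined.update(g1)
    combined.update(g2)
    return combined

G_A = {"S1": [["0", "S1", "1"], []]}    # { 0^n 1^n | n >= 0 }; [] stands for ε
G_B = {"S2": [["1", "S2", "0"], []]}    # { 1^n 0^n | n >= 0 }
G = union(G_A, "S1", G_B, "S2")
for head, bodies in G.items():
    print(head, "->", " | ".join(" ".join(b) or "ε" for b in bodies))
```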

If the language is regular, construct a DFA first and then convert it to a CFG:

1. Make a rule Ri -> aRj if δ(qi, a) = qj.
2. Add the rule Ri -> ε if qi is an accept state.
3. Make R0 the start variable, where q0 is the start state.
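
The three steps above can be sketched in Python. The example DFA (even number of 1s over {0, 1}) and the name `dfa_to_cfg` are assumptions made for illustration; the start state is listed first so that it is named R0.

```python
def dfa_to_cfg(states, delta, start, accepting):
    """delta maps (state, symbol) -> state. Returns (rules, start_variable),
    where rules maps each variable to a list of right sides."""
    name = {q: f"R{i}" for i, q in enumerate(states)}
    rules = {name[q]: [] for q in states}
    for (q, a), r in delta.items():
        rules[name[q]].append([a, name[r]])     # step 1: Ri -> a Rj
    for q in states:
        if q in accepting:
            rules[name[q]].append([])           # step 2: Ri -> ε
    return rules, name[start]                   # step 3: start variable

# Hypothetical DFA accepting strings with an even number of 1s.
states = ["q0", "q1"]
delta = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q1", ("q1", "1"): "q0",
}
rules, start_var = dfa_to_cfg(states, delta, "q0", {"q0"})
for head, bodies in rules.items():
    print(head, "->", " | ".join(" ".join(b) or "ε" for b in bodies))
```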

Two linked substrings: use a rule of the form R -> uRv.

Ex: { 0^n 1^n | n ≥ 0 }

DESIGNING CONTEXT-FREE
GRAMMARS

Strings may contain certain structures that appear recursively as part of other structures.

Place the variable generating the structure at the location in the rules where that structure may recursively appear.

Example: G2 = (V, Σ, R, <EXPR>) where
V is {<EXPR>, <TERM>, <FACTOR>},
Σ is {a, +, X, (, )},
R is <EXPR> -> <EXPR> + <TERM> | <TERM>
     <TERM> -> <TERM> X <FACTOR> | <FACTOR>
     <FACTOR> -> (<EXPR>) | a

CHOMSKY NORMAL FORM (CNF)

Simplified form

Form:

A -> BC
A -> a
S -> ε

where A, B and C are any variables, except that B and C may not be the start variable; a is any terminal; and S is the start variable.
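
The three permitted rule shapes translate into a simple check. The Python sketch below assumes a grammar is a dictionary from variables to lists of right sides (tuples of symbols); `is_cnf` is an illustrative name, and `G1_CNF` is a hand-written grammar intended to generate { 0^n 1^n | n ≥ 1 }, with variable names chosen for this example.

```python
def is_cnf(rules: dict, start: str) -> bool:
    """True if every rule is A -> BC, A -> a, or start -> ε."""
    variables = set(rules)
    for head, bodies in rules.items():
        for body in bodies:
            if len(body) == 0:
                ok = head == start                      # only S -> ε is allowed
            elif len(body) == 1:
                ok = body[0] not in variables           # A -> a
            elif len(body) == 2:                        # A -> BC, B and C not the start
                ok = all(s in variables and s != start for s in body)
            else:
                ok = False
            if not ok:
                return False
    return True

G1 = {"S": [("0", "1"), ("0", "S", "1")]}               # not in CNF
G1_CNF = {
    "S0": [("Z", "O"), ("Z", "A")],
    "T":  [("Z", "O"), ("Z", "A")],
    "A":  [("T", "O")],
    "Z":  [("0",)],
    "O":  [("1",)],
}
print(is_cnf(G1, "S"), is_cnf(G1_CNF, "S0"))
# False True
```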

CHOMSKY NORMAL FORM (CNF)


Theorem:

Any context-free language is generated by a context-free grammar in Chomsky normal form.

CONVERSION OF ANY GRAMMAR
TO CNF
1. Add a new start symbol S0 with the rule S0 -> S.
2. Eliminate all rules of the form A -> ε, where A is not the start variable.

(If R -> uAv and A -> ε, then R -> uAv | uv.

If R -> uAvAw and A -> ε, then R -> uAvAw | uvAw | uAvw | uvw.

If R -> A and A -> ε, then add R -> ε and do the same for R.)

3. Eliminate all unit rules of the form A -> B.
(If B -> u and A -> B, then A -> u.)
4. Convert the remaining rules into the proper form.

(Replace A -> u1u2...uk, where k ≥ 3 and each ui is a variable or a terminal symbol, with A -> u1A1, A1 -> u2A2, A2 -> u3A3, ..., Ak-2 -> uk-1uk. If k ≥ 2, replace each terminal ui with a new variable Ui and add Ui -> ui.)
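
The core of step 2 is generating every right side obtained by deleting some occurrences of the variable A. A minimal Python sketch of just that sub-step follows; `drop_occurrences` is a made-up helper, and the full conversion needs more bookkeeping than is shown here.

```python
from itertools import product

def drop_occurrences(body: tuple, var: str) -> list:
    """All right sides obtained from `body` by deleting any subset of the
    occurrences of `var` (including deleting none of them)."""
    # Each occurrence of var may be kept or dropped; other symbols are kept.
    options = [((sym,), ()) if sym == var else ((sym,),) for sym in body]
    results = []
    for pick in product(*options):
        new_body = tuple(s for part in pick for s in part)
        if new_body not in results:
            results.append(new_body)
    return results

for body in drop_occurrences(("u", "A", "v", "A", "w"), "A"):
    print("R ->", " ".join(body))
```

For the body (A,) alone the result contains the empty tuple, which corresponds to the case R -> A giving R -> ε.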

CONVERSION OF ANY GRAMMAR
TO CNF
Example:
S -> ASA | aB
A -> B | S
B -> b | ε

EXERCISES
1. Find CFGs that generate these regular languages over the alphabet Σ = {a, b}:
a) The language defined by (aaa + b)*
b) All strings with an even number of a's
c) All strings with an odd number of a's or an even number of b's
2. Give a CFG that generates the non-regular language
PALINDROME (strings that read the same forward and
backward) over alphabet {a, b}.

EXERCISES
3. Given the following CFG,
G5 :

S -> aM
S -> bS
M -> aF
M -> bS
F -> bF
F -> ε

What is L(G5) ?
4. Find an equivalent grammar in CNF for each of the following CFGs:
i. S -> SS | A
   A -> SS | AS | a
ii. S -> aA | bB
    A -> aAA | bS | b
    B -> bBB | aS | a
