
UNIT II

GRAMMARS

Grammar Introduction – Types of Grammar – Context Free Grammars and Languages – Derivations and Languages – Ambiguity – Relationship between derivations and derivation trees – Simplification of CFG – Elimination of Useless Symbols – Unit Productions – Null Productions – Greibach Normal Form – Chomsky Normal Form – Problems related to CNF and GNF.

2.1 Grammar Introduction

Let G be the grammar S → 0B | 1A, A → 0 | 0S | 1AA, B → 1 | 1S | 0BB. For the string 00110101, let us discuss its leftmost derivation, rightmost derivation and derivation tree.

Solution:

(a)Leftmost derivation:

S ⇒ 0B

⇒ 00BB (B→0BB)

⇒ 001B (B→1)

⇒ 0011S (B→1S)

⇒ 00110B (S→0B)

⇒ 001101S (B→1S)

⇒ 0011010B (S→0B)

⇒ 00110101 (B→1)

(b)Rightmost derivation:
S ⇒ 0B

⇒ 00BB (B→0BB)

⇒ 00B1S (B→1S)

⇒ 00B10B (S→0B)

⇒ 00B101S (B→1S)

⇒ 00B1010B (S→0B)

⇒ 00B10101 (B→1)

⇒ 00110101 (B→1)

(c)Derivation tree:

2.2 Types of Grammar

Type 0 grammars:

Type 0 grammars are those where the rules are of the form

α → β

where α, β ∈ (Σ ∪ V)* and α contains at least one variable.

Example:

Consider the grammar G with Σ = {a} and rules

S → $Ca# | a | ε

Ca → aaC

$D → $C

C# → D# | E

aD → Da

aE → Ea

$E → ε

Type 1 Grammars:

The rules in a type 1 grammar are of the form

α→β

where α, β ∈ (Σ ∪ V)* and |α| ≤ |β|.

In every derivation, the length of the string never decreases.

Example:

Consider the grammar G with Σ = {a, b, c}, V = {S, B, C, H} and

S → aSBC | aBC

CB → HB

HB → HC

HC → BC

aB → ab

bB → bb

bC → bc

cC → cc

L(G) = {a^n b^n c^n | n ≥ 1}

Type 2 Grammars:

The rules in a type 2 grammar are of the form

A→β

where A ∈ V and β ∈ (Σ ∪ V)*.

Type 2 grammars describe context-free languages.

Example:

Consider G over Σ = {0, 1} with rules

S → ε | 0S1

L(G) = {0^n 1^n | n ≥ 0}
Type 3 Grammars:

The rules in a type 3 grammar are of the form

A → aB or A → a

where A, B ∈ V and a ∈ Σ ∪ {ε}.

Example:

Consider the grammar over Σ = {0, 1} with rules

S → 1S | 0A

A → ε | 1A | 0S

L(G) = {w ∈ {0, 1}* | w has an odd number of 0s}
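The productions of this type 3 grammar can be read directly as a finite automaton whose states are the variables. A minimal Python sketch (the function name and test strings are my own, not part of the text) that checks membership in L(G):

def accepts(w: str) -> bool:
    # Start symbol S is the start state; S -> 1S, A -> 1A keep the state,
    # S -> 0A, A -> 0S swap it; A -> eps makes A the accepting state.
    state = "S"
    for ch in w:
        if ch == "1":
            pass
        elif ch == "0":
            state = "A" if state == "S" else "S"
        else:
            return False            # symbol outside {0, 1}
    return state == "A"

assert accepts("0") and accepts("11011")        # odd number of 0s
assert not accepts("1001") and not accepts("")  # even number of 0s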

2.3 Context Free Grammars and Languages

Let us see an example of a context-free language:

S→aSb | aAb

A→bAa

A→ba

Some derivations in this grammar:

S ⇒ aAb ⇒ abab

S ⇒ aSb ⇒ aaAbb ⇒ aababb

S ⇒ aSb ⇒ aaSbb ⇒ aaaAbbb ⇒ aaababbb

S ⇒ aAb ⇒ abAab ⇒ abbaab

L is the set of strings over Σ = {a, b} that start with a, end with b and contain the substring ba.
Let us construct a CFG over {a, b} generating a language consisting of an equal number of a's and b's:

S → aSb | ab | bSa | ba

Next, let us discuss whether the language {a^m b^m c^m | m ≥ 0} is context free or not.

The given language is L = {a^n b^n c^n | n ≥ 0}.

Solution:

Let z be any string that belongs to L, say z = a^p b^p c^p ∈ L with |z| > n, where n is the pumping lemma constant.

According to the pumping lemma, z can be written as

z = uvwxy, with |vwx| ≤ n and |vx| ≥ 1.

Consider the case in which vwx lies entirely within the block of b's, so that

u = a^p

vwx is a substring of b^p, with |vwx| ≤ n

vx = b^m, with m = |vx| ≥ 1

y = c^p (preceded by the remaining b's)

Substituting these values in uv^i wx^i y and pumping with i = 0 gives

uv^0 wx^0 y = uwy = a^p b^(p−m) c^p

Since m ≥ 1, the number of b's no longer equals the number of a's and c's, so a^p b^(p−m) c^p ∉ L.

Hence L is not a context-free language.
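The conclusion can be checked numerically for small cases. A small Python sketch (the helper names are mine, not from the text) that pumps inside the b-block of a^p b^p c^p and verifies that only i = 1 keeps the string in L:

import re

def in_L(s: str) -> bool:
    # Membership test for L = { a^n b^n c^n | n >= 0 }.
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return m is not None and len(m.group(1)) == len(m.group(2)) == len(m.group(3))

p, k = 5, 2                                     # vx = b^k with k >= 1, inside the b-block
for i in range(4):                              # pump i = 0, 1, 2, 3 copies of vx
    pumped = "a" * p + "b" * (p - k + i * k) + "c" * p
    assert in_L(pumped) == (i == 1)             # only the unpumped string stays in L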

Let us construct a CFG for the set {a^i b^j c^k | i ≠ j or j ≠ k}

Given:

{0^i 1^j 2^k | i ≠ j or j ≠ k}, i.e. the same language written over the alphabet {0, 1, 2}

Solution:

S → AC | BC | DE | DF

A → 0 | 0A | 0A1

B → 1 | B1 | 0B1

C → 2 | 2C

D → 0| 0D

E → 1 | 1E | 1E2

F → 2 | F2 | 1F2

Let us construct CFGs for the following languages:

(1) L(G) = {a^m b^n | m ≠ n, m, n > 0}

(2) L(G) = {a^n b a^n | n ≥ 1}.

(1)Given:

L(G) = {a^m b^n | m ≠ n, m, n > 0}

Solution:

CFG:
S → aSb

S → aC|a|bD|b

C → aC|a

D → bD|b

(2)Given:

L(G) = {a^n b a^n | n ≥ 1}

Solution:

CFG:

S →aSa

S → aba.

Let us discuss the closure properties of context-free languages.

1. Regular vs context-free languages:

Theorem:

Every regular language is context-free.

Proof:

(i)Let L be regular.

(ii)Given a DFA (Finite Automata) for L, add a stack, but do not use the stack.

(iii)That is, change each DFA transition (p, a, q) into the PDA transition

δ(p, a, z) = {(q, z)}, which leaves the stack symbol z unchanged.

(iv)The result is a PDA whose language is L.

(v)Therefore L is context-free

2.Closure under Union:

Theorem:
Let L1 and L2 be CFLs. Then L1∪L2 is also a CFL.

Proof:

(i)Let L1 have grammar (V1, T1, P1, S1) and let L2 have grammar (V2, T2, P2, S2)

(ii)Then L1∪L2 has grammar (V3,T3,P3,S3)

where

•V3 = V1 ∪ V2 ∪ {S3} (assuming V1 and V2 are disjoint; rename variables if necessary)

•T3 = T1 ∪ T2

•S3 = new start symbol

•P3 = P1 ∪ P2 ∪ {S3 → S1 | S2}

(iii)Therefore L1∪L2 is CFL.
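The construction in this proof is easy to mechanize. A rough Python sketch (the dictionary representation of a grammar is my own choice, not from the text); it assumes the two variable sets are disjoint, and an empty body [] stands for ε:

def union_grammar(p1, s1, p2, s2, new_start="S3"):
    # A grammar is a dict mapping each variable to a list of bodies,
    # where a body is a list of symbols.
    p3 = {}
    p3.update(p1)
    p3.update(p2)
    p3[new_start] = [[s1], [s2]]                # S3 -> S1 | S2
    return p3, new_start

# Example: L1 = { a^n b^n | n >= 0 }, L2 = { c^n d^n | n >= 0 }
P1 = {"S1": [["a", "S1", "b"], []]}
P2 = {"S2": [["c", "S2", "d"], []]}
P3, S3 = union_grammar(P1, "S1", P2, "S2")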

3.Closure under Concatenation Theorem:

Let L1 and L2 be CFLs. Then L1L2 is also CFL.

Proof:

(i)Let L1 have grammar (V1, T1, P1, S1) and L2 have grammar (V2, T2, P2, S2)

(ii)Then L1L2 has grammar (V, T, P, S), where

V = V1 ∪ V2 ∪ {S}

T = T1 ∪ T2

P = P1 ∪ P2 ∪ {S→S1S2}

S = start symbol

(iii)Therefore L1L2 is a CFL.

4.Closure under Kleene star:

Theorem:

Let L be a CFL. Then L* is also a CFL.


Proof:

(i)Let L have grammar (V1, T1, P1, S1)

(ii)Then L* has a grammar (V, T, P, S)

where

V = V1 ∪ {S}

T = T1

P = P1 ∪ {S → ε, S → SS1}

S = start symbol

(iii)Therefore L* is a CFL.

5.Intersection of a CFL and RE:

Theorem:

Intersection of a CFL and a Regular Language is a CFL.

Proof:

(i)Given: Let L1 = L(M1) for some PDA,

M1 = (Q1, Σ1, Γ1, δ1, S1, F1)

and L2 = L(M2) for some DFA

M2 = (Q2, Σ2, δ2, S2, F2)

(ii)Need to show:

L1∩L2 = L(M) for some PDA, M

where M = (Q, Σ, Γ, δ, S, F)

(iii)Idea:

Construct a PDA, M that operates in the same way as M1 except that it also keeps
track of the change in state in M2 caused by reading the same input.

(iv)Construction:
Q = Q1 × Q2

Σ = Σ1∪Σ2

Γ = Γ1

S = (S1, S2)

F = F1 × F2

- for each transition ((q1, a, β), (p1, γ)) ∈ δ1 and for each state q2 ∈ Q2, add to δ the transition

(((q1, q2), a, β), ((p1, δ2(q2, a)), γ))

- for each transition ((q1, λ, β), (p1, γ)) ∈ δ1 and for each state q2 ∈ Q2, add to δ the transition

(((q1, q2), λ, β), ((p1, q2), γ))


In effect, the DFA and the PDA run in parallel on the same input.

6.Complementation and Intersection:

(i)The complement of a context-free language is not necessarily context-free.

(ii)The intersection of two context-free languages is not necessarily context-free.

7.Property of CFL (Fanout & Height):

Let G = (V, T, P, S) be a CFG.

•The fanout of G, φ(G), is the largest number of symbols on the RHS of any rule in P.

•The height of a parse tree is defined as the length of the longest path from the
root to some leaf.


2.4 Derivations and Languages

Let us convert the grammar S → AB | aB, A → aab | ε, B → bbA into CNF.

Given:

S→AB | aB

A→aab | ε

B → bbA

Solution:
Step 1:

No unit productions in the given P.

The null production A→ε is eliminated and the resultant productions are:

S→AB|aB|B

A→aab

B→bbA|bb

Eliminate the unit production S→B.

The resultant productions are:

S→AB | aB|bbA|bb

A → aab

B → bbA | bb

Step 2:

Let G1 = (N', {a, b}, S, P'), where P' and N' are constructed as follows:

(i)S→AB added to P'

(ii)The remaining productions are:

S→CaB|CbCbA|CbCb

A→CaCaCb

B→CbCbA|CbCb

Where Ca →a, Cb→b

∴ N' = {S, A, B, Ca, Cb}

Step 3:

Let G2 = (N", {a, b}, S, P") where P" and N" are constructed as follows:

(i)S→AB, S→CaB|CbCb, B→CbCb


Ca→a, Cb→b are added to P"

(ii)S→CbCbA is replaced as:

S→CbD1

D1→CbA

A→CaCaCb is replaced as:

A→CaD2

D2→CaCb

B→CbCbA is replaced as:

B→CbD1

D1→CbA

∴ N"={S, A, B, Ca, Cb, D1, D2}

P" consists of:

S→AB | CaB|CbCb|CbD1

A→CaD2

B→CbCb|CbD1

D1→CbA

D2→CaCb

Ca→a

Cb→b

Hence G2 is in CNF for the given grammar.
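Steps 2 and 3 of such a conversion (hiding terminals behind Ca-style variables, then binarizing long right-hand sides) can be sketched in Python as follows; the helper name and the tuple representation of productions are my own assumptions, not the text's:

def cnf_steps(prods):
    # prods: list of (head, body) pairs, body a tuple of symbols.
    # Variables start with an upper-case letter, terminals are lower-case letters.
    term_var, with_vars = {}, []
    for head, body in prods:
        if len(body) >= 2:                                   # step 2: hide terminals
            body = tuple(s if s[0].isupper() else term_var.setdefault(s, "C" + s)
                         for s in body)
        with_vars.append((head, body))
    with_vars += [(v, (t,)) for t, v in term_var.items()]    # add Ca -> a, Cb -> b, ...
    result, counter = [], 0
    for head, body in with_vars:                             # step 3: binarize long bodies
        while len(body) > 2:
            counter += 1
            fresh = "D%d" % counter
            result.append((head, (body[0], fresh)))
            head, body = fresh, body[1:]
        result.append((head, body))
    return result

# cnf_steps([("S", ("b", "b", "A"))]) gives S -> Cb D1, D1 -> Cb A, Cb -> b,
# which is exactly the treatment of S -> bbA in the example above.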

2.5 Ambiguity

Let us discuss a problem with the following :

Consider the grammar:


(i)S → i C t S

(ii)S → i C t S e S

(iii)S → a

(iv)C → b

where i, t, and e stand for if, then, and else, and C and S for “conditional” and “statement” respectively.

(1)Construct a leftmost derivation for the sentence w = i b t i b t a e a.

(2)Show the corresponding parse tree for the above sentence.

(3)Is the above grammar ambiguous? If so, prove it.

(4)Remove ambiguity if any and prove that both the grammar produces the
same language.

Solution :

w=ibtibtaea

Leftmost derivation 1:

S ⇒ i C t S

⇒ i b t S (C→b)

⇒ i b t i C t S e S (S→i C t S e S)

⇒ i b t i b t S e S (C→b)

⇒ i b t i b t a e S (S→a)

⇒ i b t i b t a e a (S→a)

(2)
(3)Leftmost derivation 2:

S⇒ i C t S e S

⇒ i b t S e S (C→b)

⇒ i b t i C t S e S (S→iCtS)

⇒ i b t i b t S e S (C →b)

⇒ i b t i b t a e S (S→a)

⇒ i b t i b t a e a (S→a)

For the string w= i b t i b t a e a, the given grammar has two leftmost derivations.
Therefore it is ambiguous.

Let us see whether the grammar E → E + E | id is ambiguous.

Solution:
The given grammar is ambiguous.

The sentence id+id+id has more than one leftmost derivation (and correspondingly more than one rightmost derivation and parse tree).

Leftmost derivations:

(i)E ⇒ E + E

⇒ id + E (E→id)

⇒ id + E + E (E→E + E)

⇒ id + id + E (E→id)

⇒ id + id + id (E→id)

(ii)E ⇒ E + E

⇒ E + E + E (E→E + E)
⇒ id + E + E (E→id)

⇒ id + id + E (E→id)

⇒ id + id + id (E→id)
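The ambiguity can also be confirmed by counting parse trees. A short Python sketch (mine, not from the text) for the grammar E → E + E | id and the sentence id + id + id:

from functools import lru_cache

tokens = ("id", "+", "id", "+", "id")

@lru_cache(maxsize=None)
def parses(i: int, j: int) -> int:
    # Number of parse trees deriving tokens[i:j] from E.
    count = 1 if j - i == 1 and tokens[i] == "id" else 0     # E -> id
    for k in range(i + 1, j - 1):                            # E -> E + E with '+' at k
        if tokens[k] == "+":
            count += parses(i, k) * parses(k + 1, j)
    return count

print(parses(0, len(tokens)))   # prints 2: the two parse trees of id + id + id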

Let us see that the grammar S → a | Sa | bSS | SSb | SbS is ambiguous.

Given:

S → a | Sa | bSS | SSb | SbS

Let w = aabbaa

Solution:
Left most derivations

S⇒SbS

⇒SabS (S → Sa)

⇒aabS (S → a)

⇒aabbSS (S → bSS)

⇒aabbaS (S → a)

⇒aabbaa(S → a)

S⇒SbS

⇒SSbbS (S → SSb)

⇒aSbbS (S → a)

⇒aabbS (S → a)

⇒aabbSa (S → Sa)

⇒aabbaa (S → a)

The given grammar is ambiguous.

Since it has two left most derivations for the string w = aabbaa.

Let us prove that the following expression grammar is ambiguous:

E → E + E | E*E | (E) | a

Given:

E→ E+E | E*E | (E) | a

Solution:

Leftmost derivations
(1)E ⇒ (E)

⇒ (E + E)

⇒ (a + E) (E→ a)

⇒ (a + E * E) (E → E*E)

⇒ (a + a * E) (E→ a)

⇒ (a + a * a) (E→ a)

(2)E ⇒ (E)

⇒ (E * E)

⇒ (E + E * E) (E → E+E)

⇒ (a + E * E) (E→ a)

⇒ (a + a * E) (E→ a)

⇒ (a + a * a) (E→ a)

Therefore the string w = (a + a * a) has two distinct leftmost derivations, and the grammar is ambiguous.

Let us now summarize ambiguity:

A context free grammar G is said to be ambiguous if there exists some w ∈ L(G) that has at least two distinct derivation trees.

Equivalently, ambiguity implies the existence of two or more leftmost (or rightmost) derivations of the same string.

Consider G = ({S}, {a, b, +, *}, P, S) where P contains S → S+S and S → a. From the given G, two leftmost derivations of the same string are induced, beginning with:

S ⇒ S+S

⇒ a + S (S→a)

The corresponding derivation trees differ; therefore G is ambiguous.


2.6 Relationship between derivation and derivation trees

Let us prove that if ‘w’ is a string of a language then there is a parse tree
with yield ‘w’ and also prove that if A => w then it implies that ‘w’ is a string
of the language L defined by a CFG.

Theorem:

Let G = (VN, Σ, P, S) be a context free grammar (CFG). Then S ⇒* α if and only if there is a derivation tree for G with yield α.

Proof:

Step 1:

We prove that A ⇒* α if and only if there is an A-tree which derives α. Once this is proved, the theorem follows by taking A = S.

Let α be the yield of an A-tree T. We prove that A ⇒* α by induction on the number of internal vertices in T.
When the tree has only one internal vertex, the remaining vertices are leaves and
are the sons of the root.

By condition (iv) of the definition of a derivation tree, A → A1A2…Am = α is a production in G, i.e. A ⇒ α. This is the basis step of the induction (k = 1). Now assume the result is true for trees with fewer than k internal vertices (k > 1).

Let T be an A-tree with k internal vertices (k≥2). Let v1, v2,..... vm be the sons of
the root in the left-to-right ordering.

Let their labels be X1, X2, …, Xm. By condition (iv) of the definition of a derivation tree, A → X1X2…Xm is one of the productions in P. Therefore:

A ⇒ X1X2…Xm

As k ≥ 2, at least one of the sons is an internal vertex.

By the left-to-right ordering of leaves, α can be written as α1α2…αm, where αi is obtained as follows:

(a)If vi is an internal vertex, αi is the concatenation of the labels of the leaves which are descendants of vi; the subtree rooted at vi is an Xi-tree with fewer than k internal vertices and yield αi, so by the induction hypothesis Xi ⇒* αi.

(b)If vi is not an internal vertex, i.e. a leaf, then Xi = αi.

Hence A ⇒ X1X2…Xm ⇒* α1α2…αm = α.

Step 2:

To prove the converse, assume A ⇒* α; we construct an A-tree with yield α by induction on the number of steps in the derivation.

When A ⇒ α in one step, A → α is a production in P. If α = X1X2…Xm, the A-tree whose root has sons labelled X1, X2, …, Xm has yield α. This is the basis for the induction.

Assume the result for derivations in at most k steps, and split a derivation of k + 1 steps as:

A ⇒ X1X2…Xm ⇒* α, where A → X1X2…Xm is a production in P.

In the remaining derivation, for each i either

(a)Xi is not changed throughout the derivation, i.e. Xi = αi, or

(b)Xi is changed in some subsequent step, i.e. Xi ⇒* αi in at most k steps.

As G is context free, every step replaces a single variable, so α = α1α2…αm with Xi ⇒* αi. By the induction hypothesis there is an Xi-tree with yield αi for each variable Xi; attaching these trees below the corresponding sons of the root gives an A-tree with yield α.

Let us construct a parse tree of (a+b)*c for the grammar E → E+E | E*E | (E) | id, treating a, b and c as identifiers.

String = (a+b)*c

Solution:

Parse tree representation
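Since the figure is not reproduced here, the same parse tree can be written down as a nested structure. A small Python sketch (the tuple representation and helper are my own, and a, b, c are treated as instances of id):

# An internal node is (variable, [children]); a leaf is a terminal string.
tree = ("E", [("E", ["(", ("E", [("E", ["a"]), "+", ("E", ["b"])]), ")"]),
              "*",
              ("E", ["c"])])

def yield_of(node) -> str:
    # Concatenate the leaves of the tree from left to right.
    if isinstance(node, str):
        return node
    _, children = node
    return "".join(yield_of(c) for c in children)

assert yield_of(tree) == "(a+b)*c"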


Let G be the grammar S → aB | bA, A → a | aS | bAA, B → b | bS | aBB. For the string baaabbabba, let us discuss its leftmost derivation, rightmost derivation and parse tree.

(i)Leftmost derivation

S ⇒ bA

⇒ baS (A → aS)

⇒ baaB (S → aB)

⇒ baaaBB (B → aBB)
⇒ baaabSB (B → bS)

⇒ baaabbAB (S → bA)

⇒ baaabbaB (A → a)

⇒ baaabbabS (B → bS)

⇒ baaabbabbA (S → bA)

⇒ baaabbabba (A → a)

(ii)Rightmost derivation

S ⇒ bA

⇒ baS (A → aS)

⇒ baaB (S → aB)

⇒ baaaBB (B → aBB)

⇒ baaaBbS (B → bS)

⇒ baaaBbbA (S → bA)

⇒ baaaBbba (A → a)

⇒ baaabSbba (B → bS)

⇒ baaabbAbba (S → bA)

⇒ baaabbabba (A → a)

(iii)Parse trees:
2.7 Simplification of CFG

Let us discuss the steps involved in converting a CFG to a PDA.

Theorem:

For any context free language L, there exists a PDA M such that

L = L(M).

Proof:

Let G = (V, T, P, S) be a grammar for L in Greibach Normal Form. Then we can construct a PDA which simulates leftmost derivations in this grammar.

M= (Q, Σ, Γ, δ,q0, z, F), where

Q = {q0, q1, qf} = set of states

Σ = terminal of grammar G

Γ = V ∪ {z} where V is the variable in grammar G

F = {qf} = final state.

The transition function will include

δ(q0, λ, z) = {(q1, Sz)}, so that after the first move of M, the stack contains the start
symbol S of the derivation. (The stack symbol z is a marker to allow us to detect
the end of the derivation)

In addition, the set of transition rules is such that

(i)δ(q1, λ, A) = {(q1, α)} for each A → α in P

(ii)δ(q1, a, a) = {(q1, λ)} for each a ∈ Σ

(iii)δ(q1, λ, z) = {(qf, z)}, so that M accepts by final state once the simulated derivation has been matched against the whole input.
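This construction of δ can be sketched in Python as follows (the data layout is my own assumption, not the text's: productions map a variable to a list of bodies, variables are upper-case letters, terminals are lower-case letters, and "" stands for λ):

def cfg_to_pda(productions, start="S", bottom="z"):
    delta = {}                                        # (state, input, stack top) -> set of (state, push)
    def add(key, value):
        delta.setdefault(key, set()).add(value)
    add(("q0", "", bottom), ("q1", start + bottom))   # first move: push the start symbol
    terminals = set()
    for head, bodies in productions.items():
        for body in bodies:
            add(("q1", "", head), ("q1", body))       # expand a variable on top of the stack
            terminals |= {s for s in body if s.islower()}
    for a in sorted(terminals):
        add(("q1", a, a), ("q1", ""))                 # match a terminal and pop it
    add(("q1", "", bottom), ("qf", bottom))           # derivation finished: accept
    return delta

delta = cfg_to_pda({"S": ["aSb", "ab"]})              # PDA for { a^n b^n | n >= 1 }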

Let us consider the following grammar G with productions S → ABC | BaB, A → aA | BaC | aaa, B → bBb | a, C → CA | AC, and give a CFG with no useless variables that generates the same language.

Given:

S→ABC|BaB

A→aA | BaC | aaa

B→bBb|a
C → CA | AC

Solution:

The variables A and B are generating (each derives a terminal string), but C is not, since every C-production keeps a C on its right-hand side. Therefore C is a useless variable. After removing C and every production containing it, the CFG is:

S→BaB

A→aA | aaa

B→bBb|a

Now A is not reachable from S, so A is also useless. Removing it, the CFG with no useless variables is:

S→BaB

B→bBb|a.

2.8 Elimination of Useless symbols

Eliminating useless symbols from the productions in Context Free


Grammar:

Useless symbols are those that do not participate in the derivation of any terminal string; we remove them, together with the productions that use them, from the grammar.

A symbol X is useful if it is both generating and reachable:

X is generating if X =>* w for some terminal string w, i.e. X leads to a string of terminal symbols.

X is reachable if there is a derivation S =>* αXβ for some α and β.

For the reduction of a given grammar G (a sketch of both steps in code is given below):

1. Identify the non-generating symbols in the given CFG and eliminate the productions which contain non-generating symbols.

2. Identify the non-reachable symbols and eliminate the productions which contain non-reachable symbols.
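Both steps can be computed by simple fixed-point iterations. A Python sketch (representation and names are mine, not the text's), applied to the grammar of the next example; bodies are strings whose upper-case letters are variables:

def generating(prods):
    gen, changed = set(), True
    while changed:
        changed = False
        for head, bodies in prods.items():
            if head not in gen and any(all(s.islower() or s in gen for s in body)
                                       for body in bodies):
                gen.add(head)
                changed = True
    return gen

def reachable(prods, start):
    reach, stack = {start}, [start]
    while stack:
        for body in prods.get(stack.pop(), []):
            for s in body:
                if s.isupper() and s not in reach:
                    reach.add(s)
                    stack.append(s)
    return reach

P = {"S": ["aB", "bX"], "A": ["Bad", "bSX", "a"], "B": ["aSB", "bBX"], "X": ["SBD", "aBX", "ad"]}
gen = generating(P)                                   # {'S', 'A', 'X'}
P1 = {h: [b for b in bs if all(s.islower() or s in gen for s in b)]
      for h, bs in P.items() if h in gen}             # S -> bX, A -> bSX | a, X -> ad
keep = reachable(P1, "S")                             # {'S', 'X'}: A is dropped as unreachable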

Example: Remove the useless symbols from the given context free grammar:
S -> aB / bX
A -> Bad / bSX / a
B -> aSB / bBX
X -> SBD / aBX / ad

Solution:

A and X directly derive the terminal strings a and ad, hence they are generating.

Since X is generating, S is also generating, as S -> bX.

But B does not derive any terminal string (every B-production contains B), and D has no productions at all, so B and D are non-generating symbols.

So, eliminating the productions that contain B or D, we get
S -> bX
A -> bSX / a
X -> ad
In the reduced grammar A is a non-reachable symbol, so we remove it. The final grammar after elimination of the useless symbols is
S -> bX
X -> ad

Example: Find an equivalent grammar with no useless symbols for the given grammar
A -> xyz / Xyzz
X -> Xz / xYz
Y -> yYy / Xz
Z -> Zy / z

Solution:

A and Z are generating symbols, since each derives a string of terminal symbols (Z -> z and A -> xyz).

X and Y are non-generating (every X- and Y-production contains X or Y), so all productions containing X or Y are removed to eliminate the non-generating symbols. The grammar then becomes
A -> xyz
Z -> Zy / z
Since A is the start symbol, Z is now non-reachable. So we remove it to get a grammar free of useless symbols:
A -> xyz.
2.9 Unit productions

Eliminating the unit productions from the productions in the Context


Free Grammar.

A unit production is a production A -> B where both A and B are non-


terminals.

Unit productions are redundant and hence they should be removed.

Follow these steps to remove the unit productions (a variant in code is sketched below):

Repeat the following while there is a unit production:

Select a unit production A -> B such that there exists a production B -> α where α is not a single non-terminal (i.e. B -> α is not itself a unit production).

For every non-unit production B -> α, add the production A -> α to the grammar.

Eliminate A -> B from the grammar.
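A Python sketch of this elimination (using the standard transitive-closure variant of the loop above; the representation is my own, not the text's), applied to the grammar of the next example:

def eliminate_units(prods):
    # prods maps each variable to a set of bodies; a body is a tuple of symbols.
    variables = set(prods)
    def is_unit(body):
        return len(body) == 1 and body[0] in variables
    pairs = {(a, a) for a in variables}               # all pairs (A, B) with A =>* B by unit steps
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for body in prods.get(b, set()):
                if is_unit(body) and (a, body[0]) not in pairs:
                    pairs.add((a, body[0]))
                    changed = True
    new = {a: set() for a in variables}
    for a, b in pairs:                                # copy B's non-unit bodies to A
        new[a] |= {body for body in prods.get(b, set()) if not is_unit(body)}
    return new

P = {"S": {("A", "B")}, "A": {("a",)}, "B": {("C",), ("b",)},
     "C": {("D",)}, "D": {("E",)}, "E": {("a",)}}
# eliminate_units(P)["B"] == {("a",), ("b",)}, as derived step by step below.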

Example: Remove the unit productions from the following grammar


S -> AB
A -> a
B -> C / b
C -> D
D -> E
E -> a

Solution:

There are 3 unit productions in the grammar:


B -> C
C -> D
D -> E
For production D -> E there is E -> a, so we add D -> a to the grammar and remove D -> E from the grammar.

Now we have C -> D so we add a production C -> a to the grammar and delete
C -> D from the grammar.
Similarly we have B -> C by adding B -> a and removing B -> C we get the final
grammar free of unit production as:
S -> AB
A -> a
B -> a / b
C -> a
D -> a
E -> a
We can see that C, D and E are now unreachable symbols, so to get a completely reduced grammar we remove them from the CFG. The final CFG is:
S -> AB
A -> a
B -> a / b

Example: Identify and remove the unit productions from the following
CFG
S -> S + T/ T
T -> T * F/ F
F -> (S)/a

Solution:

S -> T and T -> F are the two unit productions in the CFG.

For productions T -> F we have F -> (S)/a so we add T -> (S)/a to the grammar
and remove T-> F from the grammar.

Now for production S -> T we have production T -> T * F/(S)/a so we add S -> T
* F/(S)/a to the grammar.

So the grammar after removal of the unit productions is:

S -> S + T/ T * F/ (S)/ a
T -> T * F/ (S)/ a
F -> (S)/ a

2.10 Null productions

Eliminating null production from the productions in the Context Free


Grammar:

Null productions are of the form A -> ϵ.


Here we will learn to remove the null productions from the grammar.

We cannot remove all ϵ-productions from a grammar if the language contains ϵ


as a word, but if it doesn’t we can remove all.

In a given CFG, we call a non-terminal N nullable if there is a production N -> ϵ


or there is a derivation that starts at N and leads to ϵ:

N => … => ϵ.

If A -> ϵ is a production to be eliminated, then we look at all productions whose right-hand side contains A, and for each of them we add the new productions obtained by deleting the occurrences of A in every possible combination (a sketch in code is given below).

These resultant non ϵ-productions must be added to the grammar to keep the language the same.
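A Python sketch of both the nullable computation and the replacement step (the representation is mine, not the text's); applied to the grammar of the example below, it reproduces exactly the productions derived there:

from itertools import combinations

def nullable_set(prods):
    # prods maps each variable to a set of bodies (tuples); () is an epsilon production.
    nullable, changed = set(), True
    while changed:
        changed = False
        for head, bodies in prods.items():
            if head not in nullable and any(all(s in nullable for s in b) for b in bodies):
                nullable.add(head)
                changed = True
    return nullable

def remove_null(prods):
    nullable = nullable_set(prods)
    new = {}
    for head, bodies in prods.items():
        out = set()
        for body in bodies:
            spots = [i for i, s in enumerate(body) if s in nullable]
            for r in range(len(spots) + 1):           # delete nullable symbols in all possible ways
                for drop in combinations(spots, r):
                    reduced = tuple(s for i, s in enumerate(body) if i not in drop)
                    if reduced:                       # never add an epsilon production
                        out.add(reduced)
        new[head] = out
    return new

P = {"S": {("A", "B", "A", "C")}, "A": {("a", "A"), ()}, "B": {("b", "B"), ()}, "C": {("c",)}}
# remove_null(P)["S"] is exactly {ABAC, ABC, BAC, BC, AAC, AC, C}, as in the example below.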

Example: Remove the null productions from the following grammar


S -> ABAC
A -> aA / ϵ
B -> bB / ϵ
C -> c.

Solution:

We have two null productions in the grammar A -> ϵ and B -> ϵ.

To eliminate A -> ϵ we have to change the productions containing A in the right


side.

Those productions are S -> ABAC and A -> aA.

So, by deleting the occurrences of A in all possible ways, we get four new productions.


S -> ABC / BAC / BC
A -> a
Add these productions to the grammar and eliminate A -> ϵ.
S -> ABAC / ABC / BAC / BC
A -> aA / a
B -> bB / ϵ
C -> c
To eliminate B -> ϵ we have to change the productions containing B on the
right side.
Doing this we generate these new productions:
S -> AAC / AC / C
B -> b
Add these productions to the grammar and remove the production B -> ϵ from
the grammar. The new grammar after removal of ϵ – productions is:
S -> ABAC / ABC / BAC / BC / AAC / AC / C
A -> aA / a
B -> bB / b
C -> c.

2.11 Greibach Normal Form

Let us convert the grammar G with productions S → ABb | a, A → aaA | B, B → bAb into Greibach normal form.

Given:

S → ABb | a

A → aaA | B

B → bAb

Solution:

Replace S with A1, A with A2 and B with A3

A1 → A2A3b | a

A2 → aaA2| A3

A3 → bA2b

Here A2 and A3 are useless (non-generating) symbols, so every production containing them is removed.

Therefore the required GNF form is:

A1 → a.

Let us discuss the Greibach Normal Form.

A context free grammar G is in GNF if every production is of the form A → aα, where α ∈ N* and a ∈ T (α may be λ), and S → λ is in G if λ ∈ L(G), in which case S does not appear on the RHS of any production.

Let us find a GNF equivalent of the grammar S → AA | 0, A → SS | 1.

Solution:

Step 1:

The given grammar has no null productions and is already in CNF.

The variables S and A are renamed as A1 and A2, and the terminals 0 and 1 are written as a and b respectively. Hence the productions are:

A1 → A2A2 | a, A2 → A1A1 | b

Step 2:

Derive productions of the form Ai → aγ or Ai → Ajγ with j > i.

Applying A1 → A2A2 | a in A2 → A1A1 derives A2 → A2A2A1 and A2 → aA1.

Therefore A2 productions are:

A2→A2A2A1, A2→aA1, A2→b

Step 3:

To derive productions of the form A2 → aγ from the left-recursive productions A2 → A2γ.

Let z2 be the new variable to apply in A2→A2A2A1 as per lemma 2 of GNF.

Therefore the resulting productions are:

A2 → aA1 | b

A2 → aA1z2 | bz2

z2 → A2A1 | A2A1z2

Lemma 2:
Let G = (N, T, S, P) be a CFG. Let the set of A-productions be A → Aα1 | Aα2 | … | Aαr | β1 | β2 | … | βs (where no βi begins with A).

Let B be a new variable. Then G1 = (N ∪ {B}, T, P', S), where P' is defined as follows:

(a)the set of A-productions in P' is

A → β1 | β2 | … | βs

A → β1B | β2B | … | βsB

(b)the set of B-productions in P' is

B → α1 | α2 | … | αr

B → α1B | α2B | … | αrB

(c)the productions for the other variables are as in P.

Then G1 is a CFG equivalent to G.

Step 4:

Apply the same steps (2, 3) for A1 also.

(a)A2 productions are:

A2→aA1 | b | aA1z2 | bz2

(b)A1 productions are:

A1→a, retained as it is.

A1→A2A2 is modified as:

A1→aA1A2 | bA2

A1→aA1z2A2 | bz2A2

∴ The complete set of A1 productions is:

A1→a | aA1A2 | bA2 | aA1z2A2 | bz2A2

Step 5:

Modification of the new-variable productions to the form zi → aγ.

The z2 productions (z2 → A2A1, z2 → A2A1z2) are modified by substituting the A2 productions for the leading A2:

z2 → aA1A1 | bA1 | aA1z2A1 | bz2A1

z2 → aA1A1z2 | bA1z2 | aA1z2A1z2 | bz2A1z2

Hence the equivalent grammar is:

G1 = ({A1, A2, z2}, {a, b}, P1, A1)

where P1 consists of:

A1 → a | aA1A2 | bA2 | aA1z2A2 | bz2A2

A2 → aA1 | b | aA1z2 | bz2

z2 → aA1A1 | bA1 | aA1z2A1 | bz2A1

z2 → aA1A1z2 | bA1z2 | aA1z2A1z2 | bz2A1z2.
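The key step of this conversion, Lemma 2 (removal of immediate left recursion), can be sketched in Python as follows; the representation of bodies as tuples and the function name are my own assumptions, not the text's:

def remove_left_recursion(bodies, a, new_var):
    # bodies: the A-productions as tuples of symbols; a: the variable A; new_var: the fresh variable.
    alphas = [b[1:] for b in bodies if b and b[0] == a]         # A -> A alpha_i
    betas = [b for b in bodies if not b or b[0] != a]           # A -> beta_j
    a_prods = set(betas) | {beta + (new_var,) for beta in betas}
    z_prods = set(alphas) | {alpha + (new_var,) for alpha in alphas}
    return a_prods, z_prods

# Step 3 above: A2 -> A2 A2 A1 | a A1 | b with the new variable z2
a2, z2 = remove_left_recursion({("A2", "A2", "A1"), ("a", "A1"), ("b",)}, "A2", "z2")
# a2 == { (a, A1), (b,), (a, A1, z2), (b, z2) }
# z2 == { (A2, A1), (A2, A1, z2) }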

2.12 Chomsky normal form

Let us discuss the conversion of the following in to Chomsky Normal Form:

S → A|CB

A → C|D

B → 1B|1

C → 0C|0

D → 2D|2

Given:

S → A|CB

A → C|D

B → 1 B|1

C → 0C|0
D → 2D|2

Solution:

1.There are no null productions; the unit productions exist in the form of the chains S → A → C and S → A → D.

S→A

A→C

C → 0C|0 can be rewritten as

S → 0C|0

Similarly

S→A

A→D

D → 2D|2 can be rewritten as

S → 2D|2

The new set of productions (A is no longer reachable and is dropped) is:

S → 0C|2D|CB|0|2

B → 1B | 1

C → 0C|0

D → 2D|2

2.Let G1 = (N1, {0, 1, 2}, S, P1), where P1 is constructed as follows:

S → 0 | 2 | CB,

B → 1, C → 0, D → 2 are added to P1.

Introducing A1 → 0, A2 → 1, A3 → 2, the productions S → 0C | 2D, B → 1B, C → 0C, D → 2D yield

S → A1C | A3D, B → A2B, C → A1C, D → A3D

∴ N1 = {S, A1, A2, A3, B, C, D}

Here G1 is in CNF for the given grammar.

Let us discuss the Chomsky normal form:

If the height of a parse tree for a string w (the length of the longest root-to-leaf path) in a CNF grammar is n, then |w| ≤ 2^(n−1).
Let us discuss and prove the Chomsky normal form for CFL.

Theorem:

For every context free grammar, there is an equivalent grammar in Chomsky


Normal Form (CNF).

Proof:

Step 1: Eliminate the null productions; then apply the corresponding theorem to eliminate the chain (unit) productions.

Let the grammar thus obtained be G = (N, T, P, S).

Step 2: Eliminate terminals on the R.H.S. We define G1 = (N1, T, P1, S),

where P1 and N1 are constructed as follows:

(i)All the productions in P of the form A → a or A → BC are included in P1. All the variables in N are included in N1.

(ii)Consider A → X1X2…Xn with some terminal on the R.H.S. If Xi is a terminal, say ai, add a new variable Cai to N1 and Cai → ai to P1.

In the production A → X1X2…Xn, every terminal on the R.H.S. is replaced by the corresponding new variable, and the variables on the R.H.S. are retained.

The resulting production is added to P1. Thus we get G1 = (N1, T, P1, S).

Step 3: Restricting the number of variables on R.H.S. For any production in


P1, the R.H.S. consists of either a single terminal (or λ in S → λ) or two or more
variables.
We define G2 = (N", T, P2, S) as follows:

(i)All productions in P1 are added to P2 if they are in the required form. All the
variables in N1 are added to N".

(ii)Consider A → A1A2…Am, where m ≥ 3. We introduce the new productions

A → A1C1, C1 → A2C2, …, Cm−2 → Am−1Am,

and the new variables C1, C2, …, Cm−2. These are added to P2 and N" respectively.

Thus we get G2 in Chomsky Normal Form.

Step 4:

To complete the proof we have to show that L(G) = L(G1) = L(G2).

To show that L(G) ⊆ L(G1), we start with w ∈ L(G). If A → X1X2…Xn is used in the derivation of w, the same effect can be achieved by using the corresponding production in P1 together with the productions Cai → ai involving the new variables. Hence w ∈ L(G1).

Let w ∈ L(G1). To show that w ∈ L(G), it is enough to prove:

① If A ⇒* w in G1 for A ∈ N and w ∈ T*, then A ⇒* w in G.

We prove ① by induction on the number of steps in the derivation.

If A ⇒ w in one step, then A → w is a production in P1. By the construction of P1, w is a single terminal, so A → w is in P, i.e. A ⇒ w in G. This is the basis for the induction.

Let us assume ① for derivations in at most k steps.

Let A ⇒* w in k + 1 steps. We can split this derivation as

A ⇒ A1A2…Am ⇒* w1w2…wm = w, where Ai ⇒* wi in at most k steps.

Each Ai is either in N or a new variable, say Cai. When Ai ∈ N, Ai ⇒* wi is a derivation in at most k steps, and so by the induction hypothesis Ai ⇒* wi in G; when Ai = Cai, wi is the terminal ai, which the corresponding production of P produces directly. Thus ① is true for all derivations. Therefore L(G) = L(G1).

The effect of applying A → A1A2…Am in a derivation for w ∈ L(G1) can be achieved by applying the productions A → A1C1, C1 → A2C2, …, Cm−2 → Am−1Am in P2.

Hence it is easy to see that L(G1) ⊆ L(G2).

To prove L(G2) ⊆ L(G1), we prove the auxiliary result:

② If A ⇒* w in G2 for A ∈ N1 and w ∈ T*, then A ⇒* w in G1.

Condition ② can be proved by induction on the number of steps in the derivation, exactly as above.

Applying ② to S, we get L(G2) ⊆ L(G1).

Thus L(G) = L(G1) = L(G2).

2.13 Problems related to CNF and GNF.

Let us reduce the grammar G to CNF, where G is

S → aAD, A → aB | bAB, B → b, D → d

Solution:

Step 1:

As there are no null productions or unit productions, proceed to step 2.

Step 2:

Let G1 = (V1, {a, b, d}, P1, S), where P1, V1 are constructed as follows:

(i)B→b, D→d are included in P1

(ii)S→aAD gives rise to S→CaAD and Ca→a.

A→aB gives rise to A→CaB


A→ bAB gives rise to A→CbAB and Cb→b

∴ V1 = {S,A,B,D,Ca,Cb}

∴ P1 = {S → CaAD,

A → CaB | CbAB,

B → b,

D → d,

Ca → a,

Cb → b}

Step 3:

Let G2 = (V2, {a, b, d}, P2, S), where P2 and V2 are constructed as follows:

A→CaB, B→b, D→d, Ca→a, Cb→b are added to P2

S→CaAD is replaced by S→CaC1 and C1→AD

A→CbAB is replaced by A→CbC2 and C2→AB

∴ V2 = {S, A, B, D, Ca, Cb, C1, C2}

∴ P2 = {S → CaC1

A → CaB | CbC2

C1 → AD

C2 → AB

B → b,

D → d,

Ca → a,

Cb → b }

Hence G2 is in CNF and it is equivalent to G.


Let us reduce the following grammar to Chomsky normal form.

S → a | AAB, A → ab | aB | ε, B → aba | ε

Step 1:

Remove all ε-productions.

S → a | AAB | AB |AA

A → ab | aB | a

B → aba

Step 2:

Let G1 = {N1, {a, b} S, p1} where

p1 & N1 are constructed as:

(i)S → a, A → a, S → AB | AA are added to p1

(ii)S → AAB, A → ab | aB,

A → aba yields

S → AAB, A → C1 C2 | C1 B, B → C1 C2 C1

where C1 → a, C2 → b

Step 3:

Let G2 = {N11, {a, b} S, p2} where

p2 & N11 are constructed as:

(i)S → a, A → a, S → AB | AA,

A → C1 C2 | C1 B, C1 → a, C2 → b are added to p2

(ii)S → AAB is replaced by:

S → AD1, D1 → AB

B → C1C2C1 is replaced by:

B → C1D2, D2 → C2C1

Hence G2 is in CNF for the given grammar.

Let us try to convert the following grammar to Greibach Normal Form:

S → a | AB, A → a | BC, B → b, C → b.

Solution:

Rewrite the given rules with S = A1, A = A2, B = A3, C = A4:

A1 → A2A3 | a

A2 → A3A4 | a

A3 → b

A4 → b

1.Substitution lemma: if A → Bv is a production and the B-productions are B → β1 | β2 | … | βs, then

P' = (P – {A → Bv}) ∪ {A → βiv | 1 ≤ i ≤ s}

2.Replace the leading variables by terminals using this lemma:

A1 → aA3 | bA4A3 | a

A2 → bA4 | a

A3 → b

A4 → b

Now the productions are in the required GNF.
