CHAPTER
1. Introduction
2. Finite Automata
3. Regular Expression
8. Undecidability
Introduction to Theory of Computation
In this subject, the central aim is to establish the following statement:
Or, in other words, a problem that can be solved using an algorithm can also be solved by a computer.
To understand this subject, we will study machine models such as finite automata, pushdown
automata and Turing machines. Using these models, we will solve those problems that have an
algorithm behind them, for example mathematical functions and logical operations. Multiplying two
numbers follows a well-defined procedure, so a computer can perform it.
But suppose I have 100 numbers on a screen and I choose one of them in my mind. The
computer cannot guess that number, because there is no logic behind my choice: I can
choose any number I want.
Example: 0,1
Example: 011111… is an infinite string, i.e., a string of infinite length. (Infinite strings are not
used in any formal language.)
Language: A language is a collection of sentences of finite length all constructed from a finite
alphabet of symbols.
Example: L = {00, 010, 00000, 110000} is a language over input alphabet ∑ = {0, 1}
Formal Language: A language in which the form of the strings over a given alphabet is restricted by some rule.
Example:
Set of all strings where each string starts with 1 over binary alphabet.
L={1, 10, 11, ...} over 0's and 1's.
Empty String (Λ or ε or λ): A string of length zero is called the empty string
or void string.
Kleene Closure:
If ∑ is an alphabet, then the language in which every string of letters from ∑ is a
word, including the null string, is called the closure of the alphabet.
It is denoted by writing * (asterisk) after the name of the alphabet, i.e., ∑*. This notation is also
known as the Kleene star.
If ∑ = {a, b}, then ∑* = {ε, a, b, aa, ab, ba, bb, aaa, …}
∑* = ∑^0 ∪ ∑^1 ∪ ∑^2 ∪ …, where ∑^k is the set of all strings of length k over ∑ and ∑^0 = {ε}
∑* = ∑+ ∪ {ε}
Positive Closure:
The + (plus) operation is sometimes called positive closure.
If ∑ = {a}, then ∑+ = {a, aa, aaa, ...} = the set of nonempty strings from ∑
∑+ = ∑* - {ε}
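Because ∑* is infinite, a program can only list it up to some length bound. The Python sketch below builds ∑* and ∑+ as unions of the sets ∑^k. The function names and the `max_len` cut-off are illustrative assumptions.

```python
from itertools import product

def sigma_k(alphabet, k):
    """∑^k: all strings of length exactly k over the alphabet."""
    return ["".join(p) for p in product(sorted(alphabet), repeat=k)]

def kleene_star_up_to(alphabet, max_len):
    """∑* cut off at max_len: ∑^0 ∪ ∑^1 ∪ ... ∪ ∑^max_len (∑^0 = {ε} = {''})."""
    strings = []
    for k in range(max_len + 1):
        strings.extend(sigma_k(alphabet, k))
    return strings

def positive_closure_up_to(alphabet, max_len):
    """∑+ cut off the same way: ∑* without the empty string ε."""
    return [w for w in kleene_star_up_to(alphabet, max_len) if w != ""]

print(kleene_star_up_to({"a", "b"}, 2))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```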
Concatenation of strings:
If x, y ∈ ∑*, then x concatenated with y is the word formed by the symbols of x followed
by the symbols of y.
This is denoted by x.y, or simply xy.
Substring of a string:
A string v is a substring of a string ω if and only if there are some strings x and y such
that ω = xvy.
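Suffix of a string:
A string v is a suffix of a string ω if and only if there is some string x such that ω = xv.
Prefix of a string:
A string v is a prefix of a string ω if and only if there is some string y such that ω = vy.
Reversal of a string:
Given a string ω, its reversal, denoted by ω^R, is the string spelled backwards.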
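The string relations above translate directly into code. Here is a minimal Python sketch on ordinary Python strings; the function names are hypothetical. Each function checks one definition.

```python
def is_substring(v, w):
    """v is a substring of w iff w = x v y for some strings x and y."""
    return any(w[i:i + len(v)] == v for i in range(len(w) - len(v) + 1))

def is_prefix(v, w):
    """v is a prefix of w iff w = v y for some string y."""
    return w[:len(v)] == v

def is_suffix(v, w):
    """v is a suffix of w iff w = x v for some string x."""
    return len(v) <= len(w) and w[len(w) - len(v):] == v

def reverse(w):
    """w^R: the string w spelled backwards."""
    return w[::-1]

w = "abba"
print(is_prefix("ab", w), is_suffix("ba", w), is_substring("bb", w), reverse("abc"))
# True True True cba
```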
Grammar:
Unrestricted grammars generate the recursively enumerable languages, which are accepted by Turing machines.
Pushdown Automata: a finite automaton with unbounded structured memory in the form of a pushdown stack;
it accepts context-free languages.
Application of PDA: useful for modeling parsing, compilers, postfix evaluation, etc.
A Finite State Machine (FSM) or finite state automaton is an abstract machine used in the study of
computation and language that has only a finite, constant amount of memory.
A finite automaton is defined as a 5-tuple (Q, Σ, δ, q0, F), where Q is a finite non-empty set of states, Σ is a
finite non-empty set of input symbols (the input alphabet), δ is the transition function, which maps Q × Σ into Q,
q0 is the initial state and q0 ∈ Q, and F is the set of final states, F ⊆ Q.
A transition diagram is a finite directed labelled graph in which each vertex represents a state and directed
edges indicate the transitions from one state to another. Edges are labelled with input/output. In a transition
diagram, the initial state is represented by a circle with an arrow pointing to it, a final state by two
concentric circles, and intermediate states by a single circle.
1. The finite automaton M1 has three states, labeled q1, q2 and q3, since these are the labels found in
the three circles. Thus, Q = { q1, q2, q3 }.
2. The input symbols for M1 are 0 and 1, as these are the only labels found on the arrows that
represent the transitions. Thus, Σ = { 0, 1 }.
3. The start state is q1, the state shown with an unlabeled input arrow coming from nowhere.
4. The only state marked with the double circle is q3, so F = {q3}.
5. Transitions:
In the given transition diagram, vertex A is the initial state of the finite automata and C is the final state.
Transition Table:
In this representation, the initial (start) state is marked by an arrow pointing to it, and a final state is
marked by a circle or prefixed with a star (*).
A string ω is accepted by a finite automaton M = (Q, Σ, δ, q0, F) if δ(q0, ω) ∈ F (with δ extended to strings),
where q0 is the initial (start) state and F is the set of final states. The automaton starts in the initial state,
takes one transition for every input symbol of the string, and accepts if it is in a final state after reading the
complete string.
Transition Function
It takes two arguments, a state and an input symbol: δ(q, a) is the state that a DFA (Deterministic Finite
Automaton) in state q moves to when it receives the input symbol a.
The transition function is extended to strings by δ̂(q, ε) = q and δ̂(q, ω) = δ(δ̂(q, x), a),
where ω is a string of the form ω = xa, in which a is a single symbol (the last one) and x is the remaining
string without that last symbol.
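Here is a minimal Python sketch of the 5-tuple. It stores δ in a dictionary and uses a small example DFA of our own that accepts strings over {0, 1} ending in 01 (this is not the M1 of these notes). The recursive `delta_hat` follows the extended transition function δ̂(q, ε) = q, δ̂(q, xa) = δ(δ̂(q, x), a) literally.

```python
from dataclasses import dataclass

@dataclass
class DFA:
    states: frozenset    # Q
    alphabet: frozenset  # Σ
    delta: dict          # δ: (state, symbol) -> next state
    start: str           # q0 ∈ Q
    finals: frozenset    # F ⊆ Q

    def __post_init__(self):
        # A DFA has exactly one next state for every (state, symbol) pair.
        for q in self.states:
            for a in self.alphabet:
                if self.delta.get((q, a)) not in self.states:
                    raise ValueError(f"delta({q}, {a}) is missing or not in Q")
        if self.start not in self.states or not self.finals <= self.states:
            raise ValueError("q0 must be in Q and F must be a subset of Q")

    def delta_hat(self, q, w):
        """Extended transition function: delta_hat(q, ε) = q and
        delta_hat(q, xa) = δ(delta_hat(q, x), a)."""
        if w == "":
            return q
        x, a = w[:-1], w[-1]
        return self.delta[(self.delta_hat(q, x), a)]

    def accepts(self, w):
        """ω is accepted iff delta_hat(q0, ω) ∈ F."""
        return self.delta_hat(self.start, w) in self.finals

# Hypothetical example: binary strings that end in 01.
M = DFA(
    states=frozenset({"q0", "q1", "q2"}),
    alphabet=frozenset({"0", "1"}),
    delta={
        ("q0", "0"): "q1", ("q0", "1"): "q0",
        ("q1", "0"): "q1", ("q1", "1"): "q2",
        ("q2", "0"): "q1", ("q2", "1"): "q0",
    },
    start="q0",
    finals=frozenset({"q2"}),
)
print(M.accepts("1101"), M.accepts("0110"))  # True False
```

In practice one would loop over the symbols instead of recursing. A loop avoids Python's recursion limit on long inputs.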
In contrast to the NFA (NDFA), the Deterministic Finite Automaton (DFA) has exactly one next state for each state and input symbol.
There are two types of finite automata with output: the Moore machine and the Mealy machine.
Moore Machine
Mealy Machine
It is a finite automaton in which the output depends on both the present input and the present state. A Mealy
machine is also a 6-tuple (Q, Σ, Δ, δ, λ, q0), where all symbols except λ have the same meaning as in the
Moore machine. λ is the output function mapping Q × Σ into Δ.
Sequence detector 11011 using Mealy machine:
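The state diagram of the detector is not reproduced here. The following Python sketch encodes one possible Mealy machine for 11011 and assumes that overlapping occurrences should be detected. The state names S0 to S4 are hypothetical labels: S_k means the last k input bits match the first k bits of 11011.

```python
# (state, input) -> (next state, output); the output depends on both.
MEALY = {
    ("S0", "0"): ("S0", "0"), ("S0", "1"): ("S1", "0"),
    ("S1", "0"): ("S0", "0"), ("S1", "1"): ("S2", "0"),
    ("S2", "0"): ("S3", "0"), ("S2", "1"): ("S2", "0"),  # 111 still ends in 11
    ("S3", "0"): ("S0", "0"), ("S3", "1"): ("S4", "0"),
    ("S4", "0"): ("S0", "0"),
    ("S4", "1"): ("S2", "1"),  # 11011 seen: output 1, keep the overlapping 11
}

def run_mealy(bits, start="S0"):
    """Feed the input bits one at a time and collect one output bit per input."""
    state, out = start, []
    for b in bits:
        state, o = MEALY[(state, b)]
        out.append(o)
    return "".join(out)

print(run_mealy("11011011"))  # 00001001
```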
Study Notes on Regular Expressions
Regular Expression
Regular expressions represent certain sets of strings in an algebraic fashion. A regular
expression over the alphabet Σ is defined as follows:
Regular Language
The languages accepted by FA are regular languages and these languages are easily described by simple
expressions called regular expressions.
For any regular expressions r and s over Σ corresponding to the languages L_r and L_s respectively, each of
the following is a regular expression corresponding to the language indicated.
The following are some identities for regular expressions.
ϕ + R = R + ϕ = R
εR = Rε = R
R + R = R, where R is any regular expression.
(R*)* = R*
ϕR = Rϕ = ϕ
ε* = ε and ϕ* = ε
RR* = R*R = R+
R*R* = R*
(P + Q)* = (P*Q*)* = (P* + Q*)*, where P and Q are regular expressions.
R (P + Q) = RP + RQ and (P + Q)R = PR + QR
P(QP)* = (PQ)*P
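Identities like these can be spot-checked by comparing the strings two expressions generate up to a fixed length. The sketch below uses Python's `re` module, which writes union as `|` where these notes write +. It picks P = ab, Q = b and R = a arbitrarily. A bounded check like this can expose a false identity, but it does not prove a true one.

```python
import re
from itertools import product

def language(pattern, alphabet="ab", max_len=6):
    """All strings over `alphabet` of length <= max_len that fully match `pattern`."""
    compiled = re.compile(pattern)
    return {
        "".join(t)
        for n in range(max_len + 1)
        for t in product(alphabet, repeat=n)
        if compiled.fullmatch("".join(t))
    }

def same_language(r1, r2):
    return language(r1) == language(r2)

# P = ab, Q = b, R = a (arbitrary choices for illustration)
checks = {
    "R + R = R":             ("a|a", "a"),
    "(R*)* = R*":            ("(a*)*", "a*"),
    "RR* = R*R":             ("aa*", "a*a"),
    "R*R* = R*":             ("a*a*", "a*"),
    "(P + Q)* = (P*Q*)*":    ("(ab|b)*", "((ab)*b*)*"),
    "(P + Q)* = (P* + Q*)*": ("(ab|b)*", "((ab)*|b*)*"),
    "P(QP)* = (PQ)*P":       ("ab(bab)*", "(abb)*ab"),
    "R(P + Q) = RP + RQ":    ("a(ab|b)", "aab|ab"),
}
for name, (lhs, rhs) in checks.items():
    print(f"{name:24} {same_language(lhs, rhs)}")
```

A pair that is not an identity, such as (PQ)* against P*Q*, shows up as `False` under the same check.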
Arden’s Theorem
If P and Q are two regular expressions over an alphabet Σ such that P does not contain ε, then the
equation R = Q + RP
has the unique solution R = QP*. Arden's theorem is used to determine the
regular expression represented by a transition diagram.
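A short derivation of why R = QP* is the solution. It repeatedly substitutes the right-hand side into itself:

```latex
\begin{aligned}
R &= Q + RP \\
  &= Q + (Q + RP)P = Q + QP + RP^{2} \\
  &= Q + QP + QP^{2} + \cdots + QP^{k} + RP^{k+1} \qquad (k \ge 0) \\
  &= Q(\varepsilon + P + P^{2} + \cdots + P^{k}) + RP^{k+1}
\end{aligned}
```

Since P does not contain ε, every string in RP^{k+1} has length at least k + 1. So a string of length k lies in R only if it lies in Q(ε + P + ⋯ + P^k) ⊆ QP*. Conversely, every QP^i ⊆ R by the same expansion. Hence R = QP*. For example, R = b + Ra gives R = ba*.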
Regular languages are closed under the following operations:
1. Union
2. Concatenation
3. Kleene closure
4. Complementation
5. Transpose (reversal)
6. Intersection
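Closure under intersection can be shown with the product construction: run two DFAs side by side and accept when both accept. The Python sketch below assumes complete DFAs over the same alphabet, written as (states, alphabet, delta, start, finals) tuples. The two example machines are made up for illustration.

```python
from itertools import product

def intersect(d1, d2):
    """Product construction for L(d1) ∩ L(d2).
    Each DFA is (states, alphabet, delta, start, finals) with delta[(q, a)] -> q';
    both must be complete over the same alphabet."""
    Q1, sigma, delta1, s1, F1 = d1
    Q2, _, delta2, s2, F2 = d2
    states = set(product(Q1, Q2))
    delta = {((p, q), a): (delta1[(p, a)], delta2[(q, a)])
             for (p, q) in states for a in sigma}
    finals = {(p, q) for (p, q) in states if p in F1 and q in F2}
    return states, sigma, delta, (s1, s2), finals

def complement(dfa):
    """Swap final and non-final states: accepts exactly the rejected strings."""
    states, sigma, delta, start, finals = dfa
    return states, sigma, delta, start, set(states) - set(finals)

def accepts(dfa, w):
    _, _, delta, q, finals = dfa
    for a in w:
        q = delta[(q, a)]
    return q in finals

# Two made-up DFAs over {a, b}: an even number of a's, and ending in b.
even_a = ({"E", "O"}, {"a", "b"},
          {("E", "a"): "O", ("O", "a"): "E", ("E", "b"): "E", ("O", "b"): "O"},
          "E", {"E"})
ends_b = ({"N", "Y"}, {"a", "b"},
          {("N", "a"): "N", ("Y", "a"): "N", ("N", "b"): "Y", ("Y", "b"): "Y"},
          "N", {"Y"})
both = intersect(even_a, ends_b)
print(accepts(both, "aab"), accepts(both, "ab"))  # True False
```

Swapping F for Q − F in the same representation, as `complement` does, gives closure under complementation.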
Study Notes on Context-Free Grammars and Push-Down Automata
Context Free Language
The languages which are generated by context-free grammars are called Context-Free Languages
(CFLs).
CFLs are accepted by Push down Automata.
Since they are accepted by non-deterministic PDAs, CFLs are also called non-deterministic CFLs (NCFLs).
Example: Consider the context-free grammar G = ({S}, {0, 1}, P, S), where the productions are:
(i) S → 0S1
(ii) S →ε
Derivations are: S ⇒ ε, S ⇒ 0S1 ⇒ 01, S ⇒ 0S1 ⇒ 00S11 ⇒ 0011, and so on, so L(G) = {0^n 1^n | n ≥ 0}.
Derivations:
Leftmost derivation: in each sentential form, the leftmost non-terminal is substituted first to derive a string
from the start symbol.
That is, a derivation is leftmost if at each step a production is applied to the leftmost
non-terminal in the sentential form.
Rightmost derivation: in each sentential form, the rightmost non-terminal is substituted first to derive a string
from the start symbol.
That is, a derivation is rightmost if at each step a production is applied to the rightmost
non-terminal in the sentential form.
Example:
S ⇒ AB
⇒ aAAB
⇒ aaAB
⇒ aaaB
⇒ aaab
For comparison, a leftmost and a rightmost derivation start as:
S ⇒ AB ⇒ aAAB ⇒ … (leftmost)
S ⇒ AB ⇒ Ab ⇒ … (rightmost)
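The grammar of the aaab example is not stated explicitly. From the steps it appears to be S → AB, A → aAA | a, B → b, and the sketch below assumes exactly that. It searches breadth-first over leftmost derivations (always rewriting the leftmost non-terminal) until it reaches the target string.

```python
from collections import deque

# Grammar inferred from the derivation steps (an assumption):
# S -> AB, A -> aAA | a, B -> b. Upper-case letters are non-terminals.
GRAMMAR = {"S": ["AB"], "A": ["aAA", "a"], "B": ["b"]}

def leftmost_derivation(target, start="S"):
    """Breadth-first search over leftmost derivations. Returns the list of
    sentential forms from `start` to `target`, or None if there is none.
    Pruning by length is safe because this grammar has no ∧-productions."""
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        form = path[-1]
        if form == target:
            return path
        # position of the leftmost non-terminal, if any
        i = next((k for k, c in enumerate(form) if c.isupper()), None)
        if i is None or len(form) > len(target) or form[:i] != target[:i]:
            continue  # dead end: no variable left, too long, or wrong terminal prefix
        for rhs in GRAMMAR[form[i]]:
            queue.append(path + [form[:i] + rhs + form[i + 1:]])
    return None

print(" ⇒ ".join(leftmost_derivation("aaab")))
# S ⇒ AB ⇒ aAAB ⇒ aaAB ⇒ aaaB ⇒ aaab
```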
A derivation tree (or parse tree) can be defined with any non-terminal as the root, internal nodes
are non-terminals and leaf nodes are terminals.
Every derivation corresponds to one derivation tree.
If a vertex A has k children with labels A1, A2, A3,…Ak, then A → A1 A2 A3…Ak will be a
production in context-free grammar G.
Example:
Ambiguous Grammar
A context-free grammar G is ambiguous if there is at least one string in L(G) having two or more distinct
derivation trees (or, equivalently, two or more distinct leftmost derivations, or two or more distinct rightmost
derivations).
For example, consider the context-free grammar G with productions E → E + E | a. The string a + a + a has two
leftmost derivations:
E ⇒ E + E ⇒ a + E ⇒ a + E + E ⇒ a + a + E ⇒ a + a + a
E ⇒ E + E ⇒ E + E + E ⇒ a + E + E ⇒ a + a + E ⇒ a + a + a
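One way to see the ambiguity is to enumerate every parse tree the grammar E → E + E | a allows for a given string. The Python sketch below assumes the input is already split into tokens. Each tree is shown as a fully bracketed expression.

```python
from functools import lru_cache

def parse_trees(tokens):
    """All parse trees, as bracketed strings, that E -> E + E | a assigns
    to a token tuple such as ('a', '+', 'a', '+', 'a')."""
    @lru_cache(maxsize=None)
    def trees(i, j):  # trees for tokens[i:j]
        if j - i == 1:
            return ("a",) if tokens[i] == "a" else ()
        result = []
        for k in range(i + 1, j - 1):
            if tokens[k] == "+":  # E -> E + E, splitting at this '+'
                for left in trees(i, k):
                    for right in trees(k + 1, j):
                        result.append(f"({left}+{right})")
        return tuple(result)
    return list(trees(0, len(tokens)))

print(parse_trees(tuple("a+a+a")))
# ['(a+(a+a))', '((a+a)+a)']
```

Each bracketing corresponds to one derivation tree, and therefore to one leftmost derivation. The count grows as the Catalan numbers; for example, a + a + a + a has 5 trees.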
• Eliminate ambiguity.
• Eliminate useless symbols.
• Eliminate ∧-productions: A → ∧
• Eliminate unit productions: A → B
Left recursion is removed and left factoring is applied to make a grammar suitable for top-down parsing; these
transformations do not by themselves remove ambiguity.
Left Recursion
A production of the context-free grammar G = (V_N, Σ, P, S) is said to be left recursive if it is of the form
A → Aα, where A ∈ V_N and
α ∈ (V_N ∪ Σ)*.
If the A-productions are
(i) A → Aα1 | Aα2 | Aα3 | … | Aαn | β1 | β2 | β3 | … | βm,
where β1, β2, …, βm do not begin with A, then we replace the A-productions by
(ii) A → β1A' | β2A' | … | βmA', where
A' → α1A' | α2A' | α3A' | … | αnA' | ∧
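Here is a Python sketch of this transformation for immediate left recursion only. Indirect left recursion needs an extra substitution step that is not shown. Alternatives are tuples of symbols, and () stands for ∧. The fresh variable is named by appending a prime, which is an illustrative convention.

```python
def eliminate_left_recursion(head, alternatives):
    """A -> Aα1 | ... | Aαn | β1 | ... | βm becomes
    A -> β1A' | ... | βmA' and A' -> α1A' | ... | αnA' | ∧."""
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    betas = [alt for alt in alternatives if not alt or alt[0] != head]
    if not alphas:
        return {head: alternatives}  # not left recursive: nothing to do
    new = head + "'"  # hypothetical naming scheme for the fresh variable
    return {
        head: [beta + (new,) for beta in betas],
        new: [alpha + (new,) for alpha in alphas] + [()],
    }

# E -> E + T | T (a classic left-recursive example)
result = eliminate_left_recursion("E", [("E", "+", "T"), ("T",)])
for lhs, alts in result.items():
    print(lhs, "->", " | ".join(" ".join(a) or "∧" for a in alts))
# E -> T E'
# E' -> + T E' | ∧
```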
Left Factoring
Two or more productions of a variable A of the grammar G = (V_N, Σ, P, S) are said to need left factoring
if the productions are of the form
A → αβ1 | αβ2 | … | αβn, where β1, …, βn ∈ (V_N ∪ Σ)*.
If A → αβ1 | αβ2 | … | αβn | γ1 | γ2 | γ3 | … | γm,
where γ1, γ2, …, γm do not have α as a prefix, then we replace the productions as follows:
A → αA' | γ1 | γ2 | … | γm, where
A' → β1 | β2 | … | βn
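Here is a Python sketch of one round of left factoring. It groups alternatives by their first symbol and pulls out the longest common prefix α. Alternatives are tuples of symbols, () stands for ∧, and the primed names for fresh variables are an illustrative convention. Repeat the step if the new variable's alternatives still share a prefix.

```python
def common_prefix(seqs):
    """Longest common prefix of a list of symbol tuples."""
    prefix = seqs[0]
    for s in seqs[1:]:
        n = 0
        while n < min(len(prefix), len(s)) and prefix[n] == s[n]:
            n += 1
        prefix = prefix[:n]
    return prefix

def left_factor(head, alternatives):
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[:1], []).append(alt)  # key: first symbol, or () for ∧
    result = {head: []}
    for first, group in groups.items():
        if not first or len(group) < 2:
            result[head].extend(group)              # no common prefix to factor
            continue
        alpha = common_prefix(group)
        new = head + "'" * len(result)              # A', A'', ... (hypothetical names)
        result[head].append(alpha + (new,))
        result[new] = [alt[len(alpha):] for alt in group]
    return result

# S -> iEtS | iEtSeS | a (the classic if-then-else grammar)
out = left_factor("S", [tuple("iEtS"), tuple("iEtSeS"), ("a",)])
for lhs, alts in out.items():
    print(lhs, "->", " | ".join(" ".join(a) or "∧" for a in alts))
# S -> i E t S S' | a
# S' -> ∧ | e S
```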
Symbols that cannot take part in the derivation of any terminal string, either because they cannot be reached
from the start symbol or because they cannot derive a terminal string, are known as useless symbols.
S → aS |A| C
A → a
B → aa
C → aCb
The variables that can derive a terminal string are U = {A, B, S}.
Because C cannot derive any terminal string, its productions are deleted. Now the modified
productions are
S → aS |A
A → a
B → aa
(Dependency graph of the variables S, A and B, not reproduced here.)
In this graph, B is not reachable from S, so its production is deleted as well. Now the productions are
S → aS |A
A → a
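The two steps of this example can be sketched in Python: first keep the variables that derive terminal strings, then keep those reachable from S. Single upper-case letters are taken to be variables. The sketch takes C → aCb, which is consistent with the statement that C derives no terminal string.

```python
def remove_useless(grammar, start="S"):
    """grammar maps each variable to a list of right-hand sides (strings);
    upper-case letters are variables, everything else is a terminal."""
    # Step 1: find the variables that can derive a terminal string (the set U).
    generating = set()
    changed = True
    while changed:
        changed = False
        for var, rhss in grammar.items():
            if var not in generating and any(
                all(not c.isupper() or c in generating for c in rhs) for rhs in rhss
            ):
                generating.add(var)
                changed = True
    g1 = {v: [r for r in rhss if all(not c.isupper() or c in generating for c in r)]
          for v, rhss in grammar.items() if v in generating}
    # Step 2: keep only the variables reachable from the start symbol.
    reachable, stack = {start}, [start]
    while stack:
        for rhs in g1.get(stack.pop(), []):
            for c in rhs:
                if c.isupper() and c not in reachable:
                    reachable.add(c)
                    stack.append(c)
    return {v: rhss for v, rhss in g1.items() if v in reachable}, generating

G = {"S": ["aS", "A", "C"], "A": ["a"], "B": ["aa"], "C": ["aCb"]}
cleaned, U = remove_useless(G)
print(sorted(U))  # ['A', 'B', 'S']
print(cleaned)    # {'S': ['aS', 'A'], 'A': ['a']}
```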
(Null productions)
S → ABaC
A → BC
B → b|∧
C → D|∧
D → d
Step 1: find the nullable variables. Initially, the set is empty:
N = {}
N = {B, C}
N = {A, B, C}
Since A → BC and both B and C are nullable, A is also a nullable variable.
A → B | C
B → b
C → D
D → d
The productions above are all the combinations obtained by omitting nullable variables, except ∧. Adding them
to the original grammar and deleting the ∧-productions gives the ∧-free grammar.
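Here is a Python sketch of the whole procedure for the grammar S → ABaC, A → BC, B → b | ∧, C → D | ∧, D → d. First it computes the nullable set by the same iteration. Each step uses only the previous set, which reproduces the sequence {}, {B, C}, {A, B, C}. Then it adds every combination of each right-hand side with nullable variables omitted and drops ∧. Here "" stands for ∧, and single upper-case letters are variables.

```python
from itertools import combinations

def nullable_trace(grammar):
    """Successive sets N0 = {}, N1, N2, ... where
    N(i+1) = {X | X -> α and every symbol of α is in N(i)}; stops when N stops changing."""
    trace = [set()]
    while True:
        prev = trace[-1]
        nxt = {v for v, rhss in grammar.items()
               if any(all(c in prev for c in rhs) for rhs in rhss)}
        if nxt == prev:
            return trace
        trace.append(nxt)

def remove_null(grammar):
    """Grammar without ∧-productions that generates L(G) minus {∧}."""
    nullable = nullable_trace(grammar)[-1]
    new = {}
    for v, rhss in grammar.items():
        out = []
        for rhs in rhss:
            positions = [i for i, c in enumerate(rhs) if c in nullable]
            # every way of omitting some subset of the nullable occurrences
            for r in range(len(positions) + 1):
                for drop in combinations(positions, r):
                    cand = "".join(c for i, c in enumerate(rhs) if i not in drop)
                    if cand and cand not in out:  # skip ∧ and duplicates
                        out.append(cand)
        new[v] = out
    return new

G = {"S": ["ABaC"], "A": ["BC"], "B": ["b", ""], "C": ["D", ""], "D": ["d"]}
print([sorted(n) for n in nullable_trace(G)])  # [[], ['B', 'C'], ['A', 'B', 'C']]
print(remove_null(G)["A"])                     # ['BC', 'C', 'B']
```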
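Unit productions: ∵ S → B and B → A, ∴ S ⇒* A.
Non-unit productions kept: S → Aa, B → bb, A → a | bc
New productions added: S → bb | a | bc, A → bb, B → a | bc
The resulting grammar is: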
S → Aa | bb | a | bc
B → bb | a | bc
A → a | bc | bb
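The original grammar of this example is not shown. The productions S → Aa | B, B → A | bb, A → a | bc | B are an inference that reproduces the resulting grammar, and the sketch assumes them. For each variable it collects every variable reachable through unit productions alone and copies their non-unit productions.

```python
def remove_unit(grammar):
    """grammar: variable -> list of right-hand sides (upper case = variable).
    A unit production has a single variable as its right-hand side."""
    def is_unit(rhs):
        return len(rhs) == 1 and rhs.isupper()

    new = {}
    for a in grammar:
        # every B with A ⇒* B using unit productions only (A itself included)
        reach, stack = {a}, [a]
        while stack:
            for rhs in grammar[stack.pop()]:
                if is_unit(rhs) and rhs not in reach:
                    reach.add(rhs)
                    stack.append(rhs)
        # A inherits the non-unit productions of every such B
        out = []
        for b in grammar:  # grammar order keeps the result stable
            if b in reach:
                for rhs in grammar[b]:
                    if not is_unit(rhs) and rhs not in out:
                        out.append(rhs)
        new[a] = out
    return new

# Grammar inferred from the result (an assumption):
G = {"S": ["Aa", "B"], "B": ["A", "bb"], "A": ["a", "bc", "B"]}
for var, rhss in remove_unit(G).items():
    print(var, "->", " | ".join(rhss))
# S -> Aa | bb | a | bc
# B -> bb | a | bc
# A -> bb | a | bc
```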
Ambiguity is the undesirable property of a context-free grammar that we might wish to eliminate. To
convert a context-free grammar into normal form, we start by trying to eliminate null productions of the
form A → ∧ and the unit productions of the form B → C.
A context-free grammar G is said to be in Chomsky Normal Form, if every production is of the form
either A → a, (exactly a single terminal in the right hand side of the production) or A → BC (exactly two
variables on the right-hand side of the production).
e.g., the context-free grammar G with productions S → AB, A → a, B → b is in Chomsky normal form.
For a grammar in CNF, every derivation of a string ω of length n ≥ 1 takes exactly 2n – 1 steps:
n – 1 applications of productions A → BC (turning one variable into n variables) and n applications of
productions A → a.
The minimum height of a derivation tree of any ω of length n is ⌈log₂ n⌉ + 1.
The maximum height of a derivation tree of any ω of length n is n.
A context-free grammar is said to be in Greibach Normal Form if every production is of the form
A → aα, where a ∈ Σ and α ∈ V_N*.
The set of deterministic context-free languages is a proper subset of the set of context-free languages that
possess an unambiguous context-free grammar.
Key Points
Pushdown Automata (PDA)
A Pushdown Automaton (PDA) is essentially an NFA with a stack. A PDA is non-deterministic by definition.
To handle a language like {a^n b^n | n ≥ 0}, the machine needs to remember the number of a's it has read. To do
this, we use a stack. So, a PDA is a finite automaton with a stack. A stack is a data structure that can
contain any number of elements but for which only the top element may be accessed.
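The stack idea can be sketched directly in Python. This is a deterministic simulation written for illustration, not a general PDA simulator. The bottom marker Z0 and the state names are assumptions, and acceptance is by empty stack (only Z0 left) after the whole input is read.

```python
def accepts_anbn(w):
    """Accept {a^n b^n | n >= 0} with an explicit stack, PDA style:
    push an A for each a, pop an A for each b, accept by empty stack."""
    stack = ["Z0"]          # Z0 marks the bottom of the stack
    state = "push"          # becomes "pop" at the first b; no more a's allowed
    for c in w:
        if c == "a" and state == "push":
            stack.append("A")
        elif c == "b" and stack[-1] == "A":
            state = "pop"
            stack.pop()
        else:
            return False    # no move defined for this input: reject
    return stack == ["Z0"]  # only the bottom marker left after all input

print(accepts_anbn("aabb"), accepts_anbn("aab"))  # True False
```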
Definition of PDA
Acceptance of PDA
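The input tape is divided into finitely many cells, and each cell contains a symbol of the input alphabet Σ. The
stack head always scans the top symbol of the stack, as shown in the figure.
A move δ(q, a, v) = (p, u) means that if the tape head reads input a, the stack head reads v and the finite
control is in state q, then one of the possible moves is that the next state is p, v is replaced by u on the stack,
and the tape head moves one cell to the right.
δ (q, ε, v) = (p, u)
This is an ε-move: the state changes from q to p and v is replaced by u without reading any input, so the tape
head does not move.
δ (q, a, ε) = (p, u)
It means that a push operation is performed on the stack.
δ (q, a, v) = (p, ε)
It means that a pop operation is performed on the stack (v is removed).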
PDA acceptance by Empty Stack: If the stack is empty after reading the entire input string, the
PDA accepts the string; otherwise it rejects it.
PDA acceptance by Final State: If the PDA reaches a final state after reading the entire input string, the
PDA accepts the string; otherwise it rejects it.
PDA acceptance by Final State and Empty Stack: If the PDA reaches a final state and the stack is also
empty after reading the entire input string, the PDA accepts the string; otherwise it rejects it.
Non-deterministic PDA: Like an NFA, a non-deterministic PDA (NPDA) may have several choices of move for
a given input. An NPDA accepts an input if some sequence of choices leads to a final state or causes the PDA
to empty its stack.
Deterministic PDA
A deterministic PDA (DPDA) is a pushdown automaton whose action in every situation is fully determined,
rather than facing a choice between multiple alternative actions. Every language accepted by a DPDA has an
unambiguous grammar, so DPDAs cannot handle inherently ambiguous languages. A deterministic context-free
language is a language recognised by some deterministic pushdown automaton. Examples:
L = {a^n b^n : n ≥ 0}
L = {a^n c b^{2n} : n ≥ 0}
L = {ωcω^R : ω ∈ (a + b)*}, but not L = {ωω^R : ω ∈ (a + b)*}, which is context-free but not deterministic.
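The centre marker c is what makes {ωcω^R} deterministic: the machine always knows when to stop pushing and start matching. Here is a Python sketch of such a DPDA. The state names q0 (push) and q1 (match and pop) and the bottom marker Z0 are hypothetical.

```python
def dpda_wcwr(w):
    """Deterministic PDA for {ω c ω^R | ω ∈ (a + b)*}.
    In q0 every a or b is pushed; the marker c switches to q1, where each
    symbol must equal the top of the stack and is popped. At most one move
    applies in every situation, so the machine is deterministic."""
    stack, state = ["Z0"], "q0"
    for c in w:
        if state == "q0" and c in ("a", "b"):
            stack.append(c)
        elif state == "q0" and c == "c":
            state = "q1"
        elif state == "q1" and c in ("a", "b") and c == stack[-1]:
            stack.pop()
        else:
            return False  # no move defined: reject
    return state == "q1" and stack == ["Z0"]

print(dpda_wcwr("abcba"), dpda_wcwr("abcab"))  # True False
```

Without the marker, as in {ωω^R}, a machine would have to guess where the middle is. That is why that language needs non-determinism.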
For every regular set L, there exists a CFG G such that L = L(G).
Every regular language is a CFL.
Let G1 and G2 be context-free grammars. Then, G1 and G2 are equivalent if and only if L(G1) = L(G2).
CFLs are closed under:
Union
Concatenation
Kleene closure
Positive closure
Substitution
Homomorphism
Inverse homomorphism
Reversal
Intersection with regular
Union with regular
Difference with regular
CFLs are not closed under:
Intersection
Complementation
Difference
Decidable Problems:
Undecidable Problems:
Regular Languages
1. {w | w ∈ {a, b }* }
2. {aw | w ∈ {a, b }* }
3. {bw | w ∈ {a, b }* }
4. {wa | w ∈ {a, b }* }
5. {awb | w ∈ {a, b }* }
6. {w1abw2 | w1,w2 ∈ {a, b }* }
7. { a^m b^n | m, n > 0 }
8. { a^m b^n c^k | m, n, k >= 0 }
9. { a^{2n} | n >= 0 }
Non-Regular Languages
CFL’s (NCFL’s)
Non-CFL’s
Important Properties:
Note: Union, intersection or difference with a regular language does not change the class of a language, i.e.,
let (AnyLANG) below represent any of the language classes DCFL, CFL, CSL, RE or REC; then
Regular ∩ AnyLANG = AnyLANG
The Pumping Lemma gives a necessary but not a sufficient condition: even if there is an integer n that
satisfies the conditions of the Pumping Lemma, the language is not necessarily regular.
Suppose that L is regular and let n be the number of states of an FA that accepts L.
Consider the string x = a^n b^n for that n.
Then there must be strings u, v and w such that
x = uvw, |uv| ≤ n, |v| > 0, and for every m ≥ 0, uv^m w ∈ L.
Since |v| > 0, v has at least one symbol.
Also, since |uv| ≤ n, v = a^p for some p > 0.
Let us now consider the string uv^m w for m = 2.
Then uv^2 w contains (n − p) + 2p = n + p a's followed by n b's, i.e., uv^2 w = a^{n+p} b^n. Since p > 0, n + p ≠ n.
Hence a^{n+p} b^n cannot be in the language L = {a^k b^k | k ≥ 0}.
This violates the condition that for every m ≥ 0, uv^m w ∈ L.
Hence L is not a regular language.
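The counting step of the proof can be checked mechanically for small n. The Python sketch below tries every split x = uvw with |uv| ≤ n and |v| > 0 and confirms that uv^2 w falls outside L. It only illustrates particular values of n, while the proof covers all of them.

```python
def in_L(s):
    """Membership in L = {a^k b^k | k >= 0}."""
    k = len(s) // 2
    return s == "a" * k + "b" * k

def pumping_fails(n):
    """True if, for x = a^n b^n, every split x = uvw with |uv| <= n and
    |v| > 0 gives uv^2 w outside L (uv^2 w = a^(n+p) b^n with p = |v|)."""
    x = "a" * n + "b" * n
    for i in range(n + 1):             # i = |u|
        for j in range(i + 1, n + 1):  # j = |uv|, so |v| = j - i > 0
            u, v, w = x[:i], x[i:j], x[j:]
            if in_L(u + v * 2 + w):
                return False           # this split would survive pumping
    return True

print(all(pumping_fails(n) for n in range(1, 12)))  # True
```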
Let L be a CFL. Then there exists a constant N such that if z ∈ L and |z| ≥ N, then we can
write z = uvwxy, where
|vwx| ≤ N,
vx ≠ ε, and
for all k ≥ 0: uv^k wx^k y ∈ L.
Study Notes on Turing Machines
Turing Machine
The languages accepted by Turing machine are said to be recursively enumerable. A Turing
Machine (TM) is a device with a finite amount of read only hard memory (states) and an
unbounded amount of read/write tape memory.
For simulating even a simple behaviour, a Universal Turing Machine must have a large number
of states. If we modify our basic model by increasing the number of read/write heads, the number
of dimensions of input tape and adding a special purpose memory, then we can design a
Universal Turing Machine.
A Turing machine is a 7-tuple M = (Q, Σ, Γ, δ, q0, b, F), where Q is a finite non-empty set of states, Σ is a
non-empty set of input symbols (alphabet), which is a subset of Γ with b ∉ Σ, Γ is a finite non-empty set of
tape symbols, δ is the transition function, which maps (Q × Γ) to (Q × Γ × {L, R}), q0 is the initial state and
q0 ∈ Q, b is the blank symbol and b ∈ Γ, and F is the set of final states, F ⊆ Q.
The transition function Q × Γ → Q × Γ × {L, R} states that if a Turing machine is in some state
(from set Q) and reads a tape symbol (from set Γ), it goes to some next state (from set Q), overwrites
(replaces) the current symbol with another or the same symbol, and moves the read/write head
one cell either left (L) or right (R) along the tape.
3. Construct a TM that accepts the language A = {0^(2^n) | n ≥ 0}.
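One standard way to decide this language is to sweep across the tape, crossing off every other 0. If a pass meets an odd number of 0s greater than one, the machine rejects; if only one 0 remains, it accepts. The Python sketch below simulates a single-tape TM with that strategy. The state names, the cross-off symbol x and the blank _ are illustrative choices.

```python
BLANK = "_"

# (state, symbol read) -> (next state, symbol written, head move)
DELTA = {
    ("q1", "0"): ("q2", BLANK, "R"),        # blank out the first 0: it marks the left end
    ("q1", BLANK): ("reject", BLANK, "R"),  # empty input
    ("q1", "x"): ("reject", "x", "R"),
    ("q2", "x"): ("q2", "x", "R"),
    ("q2", BLANK): ("accept", BLANK, "R"),  # only the marker's 0 was left: accept
    ("q2", "0"): ("q3", "x", "R"),          # cross off a 0
    ("q3", "x"): ("q3", "x", "R"),
    ("q3", "0"): ("q4", "0", "R"),          # keep this 0
    ("q3", BLANK): ("q5", BLANK, "L"),      # even count this pass: return left
    ("q4", "x"): ("q4", "x", "R"),
    ("q4", "0"): ("q3", "x", "R"),          # cross off a 0
    ("q4", BLANK): ("reject", BLANK, "R"),  # odd count greater than 1: reject
    ("q5", "0"): ("q5", "0", "L"),
    ("q5", "x"): ("q5", "x", "L"),
    ("q5", BLANK): ("q2", BLANK, "R"),      # back at the left end: next pass
}

def run_tm(word, delta=DELTA, start="q1", max_steps=10_000):
    """Simulate a single-tape TM; missing cells read as blank."""
    tape = dict(enumerate(word))
    state, head = start, 0
    for _ in range(max_steps):
        if state in ("accept", "reject"):
            return state == "accept"
        key = (state, tape.get(head, BLANK))
        if key not in delta:
            return False  # no move defined: the machine crashes
        state, tape[head], move = delta[key]
        head += 1 if move == "R" else -1
        if head < 0:
            return False  # moved off the left end: crash
    raise RuntimeError("step limit reached")

print([k for k in range(10) if run_tm("0" * k)])  # [1, 2, 4, 8]
```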
A TM can be used as a language recogniser. TMs recognise exactly the Type-0 (recursively enumerable)
languages, a class that includes the regular languages, CFLs and CSLs.
There are several ways in which an input string might fail to be accepted by a Turing machine:
It can lead to some non-accepting configuration from which the Turing machine cannot
move (no transition is defined).
At some point in the processing of the string, the tape head is scanning the first cell and
the next move specifies moving the head left off the end of the tape.
In either of these cases, we say that the Turing machine crashes. (A third possibility is that the machine
never halts.)
Halting Problem of Turing Machine: A class of problems with two outputs (true/false) is called
solvable (or decidable) if there exists some definite algorithm that always halts (also
called terminates); otherwise the class of problems is called unsolvable (or undecidable). The halting problem
(given a TM M and an input ω, does M halt on ω?) is undecidable.
A language L is said to be recursively enumerable, if there exists a Turing machine that accepts
it.
A language is recursive if and only if there exists a membership algorithm for it. Therefore, a
language L on Σ is said to be recursive if there exists a Turing machine that accepts the language
L and halts on every ω ∈ Σ*.
Recursively enumerable languages are closed under union, intersection, concatenation and
Kleene closure and these languages are not closed under complementation.
An infinite set is countable if and only if there is a one-to-one correspondence between its
elements and the natural numbers. Otherwise it is said to be uncountable.
1. Halting TM (accepts recursive languages): TMs that always halt, whether they accept or not
(decidable problems).
2. TM (accepts recursively enumerable languages): TMs that are guaranteed to halt only on acceptance.
On a string they do not accept, they may or may not halt (i.e., they could loop forever). (Either decidable
or partially decidable problems.)
Decidable Problem
A problem is called decidable if there is a Turing machine that decides it.
A decision problem that can be solved by an algorithm that halts on all inputs in a finite number
of steps.
A problem is decidable, if there is an algorithm that can answer either yes or no.
A language for which membership can be decided by an algorithm that halts on all inputs in a
finite number of steps.
A decidable problem is also called totally decidable, algorithmically solvable or
recursively solvable.
Undecidable Problem
A problem that cannot be solved for all cases by any algorithm whatsoever.
Equivalently, the corresponding language cannot be recognized by a Turing machine that halts on all inputs.
Undecidable problems are of two types: partially decidable (semi-decidable) and totally not decidable.
Semi-decidable: A problem is semi-decidable if there is an algorithm that says yes when the answer
is yes; however, it may loop forever if the answer is no.
Totally not decidable (not partially decidable): A problem is totally not decidable if we can prove that
there is no algorithm that will deliver an answer.
Decidability table for formal languages (Y = decidable, N = undecidable; in the last two rows, Y = closed, N = not closed):
Problem                                RL   DCFL  CFL  REC  RE
Membership                             Y    Y     Y    Y    N
Finiteness                             Y    Y     Y    N    N
Emptiness                              Y    Y     Y    N    N
Equivalence                            Y    Y     N    N    N
Is L1 ⊆ L2? (subset)                   Y    N     N    N    N
Is L regular?                          Y    Y     N    N    N
Is L ambiguous?                        Y    N     N    N    N
L = ∑*? (universality)                 Y    Y     N    N    N
L1 ∩ L2 = Ф? (disjointness)            Y    N     N    N    N
Is L1 ∩ L2 of the same type?           Y    N     N    Y    Y
Is L' (complement) of the same type?   Y    Y     N    Y    N