
INTRODUCTION TO AUTOMATA THEORY

Contents

1. Finite-state machine
1.1. State diagram
1.2. Deterministic finite automaton
1.3. Nondeterministic finite automaton
1.4. DFA minimization
1.5. NDFA to DFA conversion algorithm
1.6. Finite-state transducer
1.6.1. Moore machine
1.6.2. Mealy machine
2. Regular grammar
2.1. Regular expression
2.2. Regular language
2.3. Pumping lemma for regular languages
3. Context-free grammar
3.1. Production (computer science)
3.2. Context-free language
3.3. Ambiguous grammar
3.4. Chomsky normal form
3.5. Greibach normal form
3.6. Pumping lemma for context-free languages
4. Pushdown automaton
4.1. Nested stack automaton
5. Turing machine
5.1. Linear bounded automaton
5.2. Multitape Turing machine
5.3. Multi-track Turing machine
5.4. Non-deterministic Turing machine
6. Recursive language
6.1. Recursive set
6.2. Decision problem
6.3. Undecidable problem
6.3.1. Halting problem
6.4. P (complexity)
6.5. R (complexity)
6.6. RP (complexity)
6.7. Recursively enumerable set
6.8. Rice's theorem

1. Finite-state machine
A finite state machine (FSM) or finite-state automaton (FSA, plural: automata), finite automaton, or
simply a state machine, is a mathematical model of computation. It is an abstract machine that can be in
exactly one of a finite number of states at any given time. The FSM can change from one state to another
in response to some external inputs; the change from one state to another is called a transition. An FSM is
defined by a list of its states, its initial state, and the conditions for each transition. Finite state machines
are of two types: deterministic finite state machines and non-deterministic finite state machines.[1]

The behavior of state machines can be observed in many devices in modern society that perform a
predetermined sequence of actions depending on a sequence of events with which they are presented.
Examples are vending machines, which dispense products when the proper combination of coins is
deposited, elevators, whose sequence of stops is determined by the floors requested by riders, traffic
lights, which change sequence when cars are waiting, and combination locks, which require the input of
combination numbers in the proper order.

The finite state machine has less computational power than some other models of computation such as the
Turing machine.[2] The computational power distinction means there are computational tasks that a
Turing machine can do but an FSM cannot. This is because an FSM's memory is limited by the number of
states it has.

Example: coin-operated turnstile

State diagram for a turnstile

A turnstile

An example of a simple mechanism that can be modeled by a state machine is a turnstile.[3][4] A turnstile,
used to control access to subways and amusement park rides, is a gate with three rotating arms at waist
height, one across the entryway. Initially the arms are locked, blocking the entry, preventing patrons from
passing through. Depositing a coin or token in a slot on the turnstile unlocks the arms, allowing a single
customer to push through. After the customer passes through, the arms are locked again until another
coin is inserted.

Considered as a state machine, the turnstile has two possible states: Locked and Unlocked.[3] There are
two possible inputs that affect its state: putting a coin in the slot (coin) and pushing the arm (push). In
the locked state, pushing on the arm has no effect; no matter how many times the input push is given, it
stays in the locked state. Putting a coin in – that is, giving the machine a coin input – shifts the state from
Locked to Unlocked. In the unlocked state, putting additional coins in has no effect; that is, giving
additional coin inputs does not change the state. However, a customer pushing through the arms, giving a
push input, shifts the state back to Locked.

The turnstile state machine can be represented by a state transition table, showing for each possible state,
the transitions between them (based upon the inputs given to the machine) and the outputs resulting from
each input:

Current State   Input   Next State   Output
Locked          coin    Unlocked     Unlocks the turnstile so that the customer can push through.
Locked          push    Locked       None
Unlocked        coin    Unlocked     None
Unlocked        push    Locked       When the customer has pushed through, locks the turnstile.

The turnstile state machine can also be represented by a directed graph called a state diagram (above).
Each state is represented by a node (circle). Edges (arrows) show the transitions from one state to
another. Each arrow is labeled with the input that triggers that transition. An input that doesn't cause a
change of state (such as a coin input in the Unlocked state) is represented by a circular arrow returning
to the original state. The arrow into the Locked node from the black dot indicates it is the initial state.
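
The table above translates directly into code. The following is a minimal Python sketch of the turnstile state machine, using the state and input names from the table; the dictionary encoding and function name are illustrative choices, not part of the original model.

    # Transition table: (current state, input) -> next state
    TRANSITIONS = {
        ("Locked",   "coin"): "Unlocked",
        ("Locked",   "push"): "Locked",
        ("Unlocked", "coin"): "Unlocked",
        ("Unlocked", "push"): "Locked",
    }

    def run_turnstile(events, state="Locked"):
        # Follow one transition per input event, starting from the initial state.
        for event in events:
            state = TRANSITIONS[(state, event)]
        return state

    print(run_turnstile(["coin", "push", "push"]))  # prints "Locked"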

Concepts and terminology


A state is a description of the status of a system that is waiting to execute a transition. A transition is a set
of actions to be executed when a condition is fulfilled or when an event is received. For example, when
using an audio system to listen to the radio (the system is in the "radio" state), receiving a "next" stimulus
results in moving to the next station. When the system is in the "CD" state, the "next" stimulus results in
moving to the next track. Identical stimuli trigger different actions depending on the current state.

In some finite-state machine representations, it is also possible to associate actions with a state:

an entry action: performed when entering the state, and


an exit action: performed when exiting the state.

Representations

Fig. 1: UML state chart example (a toaster oven)

Fig. 2: SDL state machine example

Fig. 3: Example of a simple finite state machine

For an introduction, see State diagram.

State/Event table

Several state transition table types are used. The most common representation is shown below: the
combination of current state (e.g. B) and input (e.g. Y) shows the next state (e.g. C). The complete action's
information is not directly described in the table and can only be added using footnotes. An FSM definition
including the full action information is possible using state tables (see also virtual finite-state machine).

State transition table


Input \ Current state   State A   State B   State C
Input X                   …         …         …
Input Y                   …       State C     …
Input Z                   …         …         …

Classification

Finite state machines can be subdivided into transducers, acceptors, classifiers and sequencers.[5]

Acceptors and recognizers

Fig. 4: Acceptor FSM parsing the string "nice"

Acceptors, also called recognizers and sequence detectors, produce binary output, indicating whether
or not the received input is accepted. Each state of an FSM is either "accepting" or "not accepting". Once
all input has been received, if the current state is an accepting state, the input is accepted; otherwise it is
rejected. As a rule, input is a sequence of symbols (characters); actions are not used. The example in
figure 4 shows a finite state machine that accepts the string "nice". In this FSM, the only accepting state is
state 7.

A (possibly infinite) set of symbol sequences, also known as a formal language, is called a regular language
if there is some finite state machine that accepts exactly that set. For example, the set of binary strings
with an even number of zeroes is a regular language (cf. Fig. 5), while the set of all strings whose length is
a prime number is not.[6]:18,71

A machine could also be described as defining a language: the language that contains every string accepted
by the machine and none of the rejected ones; that language is "accepted" by the machine. By definition,
the languages accepted by FSMs are the regular languages; a language is regular if there is some FSM that
accepts it.

The problem of determining the language accepted by a given finite state acceptor is an instance of the
algebraic path problem—itself a generalization of the shortest path problem to graphs with edges
weighted by the elements of an (arbitrary) semiring.[7][8][9]

Fig. 5: Representation of a finite-state machine; this example shows one that determines whether a binary
number has an even number of 0s (the accepting state is marked in the figure).

The start state can also be an accepting state, in which case the automaton accepts the empty string.

Classifiers

A classifier is a generalization of a finite state machine that, similar to an acceptor, produces a single
output on termination but has more than two terminal states.

Transducers

Fig. 6: Transducer FSM, Moore model example
Main article: Finite-state transducer

Transducers generate output based on a given input and/or a state using actions. They are used for
control applications and in the field of computational linguistics.

In control applications, two types are distinguished:

Moore machine
The FSM uses only entry actions, i.e., output depends only on the state. The advantage of the Moore
model is a simplification of the behaviour. Consider an elevator door. The state machine recognizes
two commands: "command_open" and "command_close", which trigger state changes. The entry
action (E:) in state "Opening" starts a motor opening the door, the entry action in state "Closing"
starts a motor in the other direction closing the door. States "Opened" and "Closed" stop the motor
when fully opened or closed. They signal to the outside world (e.g., to other state machines) the
situation: "door is open" or "door is closed".

Fig. 7: Transducer FSM, Mealy model example

Mealy machine
The FSM also uses input actions, i.e., output depends on input and state. The use of a Mealy FSM
leads often to a reduction of the number of states. The example in figure 7 shows a Mealy FSM
implementing the same behaviour as in the Moore example (the behaviour depends on the
implemented FSM execution model and will work, e.g., for virtual FSM but not for event-driven
FSM). There are two input actions (I:): "start motor to close the door if command_close arrives" and
"start motor in the other direction to open the door if command_open arrives". The "opening" and
"closing" intermediate states are not shown.

Generators

Sequencers, or generators, are a subclass of the acceptor and transducer types that have a single-letter
input alphabet. They produce only one sequence, which can be seen as an output sequence of acceptor or
transducer outputs.

Determinism

A further distinction is between deterministic (DFA) and non-deterministic (NFA, GNFA) automata. In a
deterministic automaton, every state has exactly one transition for each possible input. In a
non-deterministic automaton, an input can lead to one, more than one, or no transition for a given state.
The powerset construction algorithm can transform any nondeterministic automaton into a (usually more
complex) deterministic automaton with identical functionality.

A finite state machine with only one state is called a "combinatorial FSM". It only allows actions upon
transition into a state. This concept is useful in cases where a number of finite state machines are required
to work together, and when it is convenient to consider a purely combinatorial part as a form of FSM to
suit the design tools.[10]

1.1. State diagram

A state diagram for a door


that can only be opened
and closed

A state diagram is a type of diagram used in computer science and related fields to describe the behavior
of systems. State diagrams require that the system described is composed of a finite number of states;
sometimes, this is indeed the case, while at other times this is a reasonable abstraction. Many forms of
state diagrams exist, which differ slightly and have different semantics.

State diagrams are used to give an abstract description of the behavior of a system. This behavior is
analyzed and represented as a series of events that can occur in one or more possible states. Hereby "each
diagram usually represents objects of a single class and track the different states of its objects through the
system".[1]

State diagrams can be used to graphically represent finite state machines. This was introduced by C.E.
Shannon and W. Weaver in their 1949 book "The Mathematical Theory of Communication". Another source
is Taylor Booth in his 1967 book "Sequential Machines and Automata Theory". Another possible
representation is the State transition table.

1.2. Deterministic finite automaton

A deterministic finite automaton (DFA) is a finite state machine that accepts or rejects strings of
symbols and produces a unique computation (or run) of the automaton for each input string.[1] A DFA is
also known as a deterministic finite acceptor, a deterministic finite state machine
(DFSM), or a deterministic finite state automaton (DFSA). Deterministic refers to the uniqueness of
the computation. In search of the simplest models to capture finite-state machines, Warren McCulloch and
Walter Pitts were among the first researchers to introduce a concept similar to finite automata in
1943.[2][3]

An example of a deterministic
finite automaton that accepts
only binary numbers that are
multiples of 3. The state S0 is
both the start state and an
accept state.

The figure illustrates a deterministic finite automaton using a state diagram. In the automaton, there are
three states: S0, S1, and S2 (denoted graphically by circles). The automaton takes a finite sequence of 0s
and 1s as input. For each state, there is a transition arrow leading out to a next state for both 0 and 1.
Upon reading a symbol, a DFA jumps deterministically from one state to another by following the
transition arrow. For example, if the automaton is currently in state S0 and the current input symbol is 1,
then it deterministically jumps to state S1. A DFA has a start state (denoted graphically by an arrow
coming in from nowhere) where computations begin, and a set of accept states (denoted graphically by a
double circle) which help define when a computation is successful.

A DFA is defined as an abstract mathematical concept, but is often implemented in hardware and software
for solving various specific problems. For example, a DFA can model software that decides whether or not
online user input such as email addresses are valid.[4]

Formal definition
A deterministic finite automaton M is a 5-tuple, (Q, Σ, δ, q0, F), consisting of

a finite set of states (Q)


a finite set of input symbols called the alphabet (Σ)
a transition function (δ : Q × Σ → Q)
an initial or start state (q0 ∈ Q)
a set of accept states (F ⊆ Q)

Let w = a1a2 ... an be a string over the alphabet Σ. The automaton M accepts the string w if a sequence of
states, r0,r1, ..., rn, exists in Q with the following conditions:

1. r0 = q0
2. ri+1 = δ(ri, ai+1), for i = 0, ..., n−1
3. rn ∈ F.

In words, the first condition says that the machine starts in the start state q0. The second condition says
that given each character of string w, the machine will transition from state to state according to the
transition function δ. The last condition says that the machine accepts w if the last input of w causes the
machine to halt in one of the accepting states. Otherwise, it is said that the automaton rejects the string.
The set of strings that M accepts is the language recognized by M and this language is denoted by L(M).

A deterministic finite automaton without accept states and without a starting state is known as a transition
system or semiautomaton.

Complete and incomplete


According to the above definition, deterministic finite automata are always complete: they define a
transition for each state and each input symbol.

While this is the most common definition, some authors use the term deterministic finite automaton for a
slightly different notion: an automaton that defines at most one transition for each state and each input
symbol; the transition function is allowed to be partial. When no transition is defined, such
an automaton halts.

Example
The following example is of a DFA M, with a binary alphabet, which requires that the input contains an
even number of 0s.

The state diagram for M

M = (Q, Σ, δ, q0, F) where

Q = {S1, S2},
Σ = {0, 1},
q0 = S1,
F = {S1}, and
δ is defined by the following state transition table:

0 1
S1 S2 S1
S2 S1 S2

The state S1 represents that there has been an even number of 0s in the input so far, while S2 signifies an
odd number. A 1 in the input does not change the state of the automaton. When the input ends, the state
will show whether the input contained an even number of 0s or not. If the input did contain an even
number of 0s, M will finish in state S1, an accepting state, so the input string will be accepted.
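
M can be simulated with a few lines of Python; this is a minimal sketch, and the dictionary encoding below is an assumption of the sketch, not part of the formal definition:

    DELTA = {("S1", "0"): "S2", ("S1", "1"): "S1",
             ("S2", "0"): "S1", ("S2", "1"): "S2"}
    ACCEPTING = {"S1"}

    def accepts(w, state="S1"):
        # Apply δ once per input symbol, then test membership in F.
        for symbol in w:
            state = DELTA[(state, symbol)]
        return state in ACCEPTING

    print(accepts("1001"))  # True: two 0s, an even number
    print(accepts("10"))    # False: one 0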

Closure properties
DFAs are said to be closed under an operation if, whenever the operation is applied to languages
recognizable by DFAs, the resulting language is also recognizable by a DFA. The DFAs are closed under the
following operations:

Union
Intersection
Concatenation
Negation
Kleene closure
Reversal
Init
Quotient
Substitution
Homomorphism

For each operation, an optimal construction with respect to the number of states has been determined in
the state complexity research. Since DFAs are equivalent to nondeterministic finite automata (NFA), these
closures may also be proved using closure properties of NFA.

Advantages and disadvantages


DFAs are one of the most practical models of computation, since there is a trivial linear time, constant-
space, online algorithm to simulate a DFA on a stream of input. Also, there are efficient algorithms to find
a DFA recognizing:

the complement of the language recognized by a given DFA.


the union/intersection of the languages recognized by two given DFAs.
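
For instance, the intersection can be computed by a product construction that runs both DFAs in lockstep. The sketch below assumes each DFA is given as a transition dict, a start state, and a set of accepting states over a shared alphabet; all names are illustrative, not a standard API:

    def intersect(d1, s1, a1, d2, s2, a2, sigma):
        # Pair states of the two DFAs; only build pairs reachable from the start.
        start = (s1, s2)
        delta, todo, seen = {}, [start], {start}
        while todo:
            p, q = todo.pop()
            for x in sigma:
                r = (d1[(p, x)], d2[(q, x)])
                delta[((p, q), x)] = r
                if r not in seen:
                    seen.add(r)
                    todo.append(r)
        # A pair accepts exactly when both component DFAs accept.
        accepting = {(p, q) for (p, q) in seen if p in a1 and q in a2}
        return delta, start, accepting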

Because DFAs can be reduced to a canonical form (minimal DFAs), there are also efficient algorithms to
determine:

whether a DFA accepts any strings


whether a DFA accepts all strings
whether two DFAs recognize the same language
the DFA with a minimum number of states for a particular regular language

DFAs are equivalent in computing power to nondeterministic finite automata (NFAs). This is because,
first, any DFA is also an NFA, so an NFA can do what a DFA can do. Conversely, given an NFA, using the
powerset construction one can build a DFA that recognizes the same language as the NFA, although the
DFA could have an exponentially larger number of states than the NFA.[11][12]

On the other hand, finite state automata are of strictly limited power in the languages they can recognize;
many simple languages, including any problem that requires more than constant space to solve, cannot be
recognized by a DFA. The classic example of a simply described language that no DFA can recognize is the
bracket or Dyck language, i.e., the language that consists of properly paired brackets such as the word
"(()())". Intuitively, no DFA can recognize the Dyck language because DFAs are not capable of counting: a
DFA-like automaton would need a state for every possible number of "currently open" parentheses,
meaning it would need an unbounded number of states. Another, simpler example is the language
consisting of strings of the form a^n b^n: some finite but arbitrary number of a's, followed by an equal
number of b's.[13]

1.3. Nondeterministic finite automaton

A finite state machine is called a deterministic finite automaton (DFA), if

each of its transitions is uniquely determined by its source state and input symbol, and
reading an input symbol is required for each state transition.

A nondeterministic finite automaton (NFA), or nondeterministic finite state machine, does not need to
obey these restrictions. In particular, every DFA is also an NFA. Sometimes the term NFA is used in a
narrower sense, referring to an NFA that is not a DFA, but not in this article.

NFAs were introduced in 1959 by Michael O. Rabin and Dana Scott,[2] who also showed their equivalence
to DFAs.

An NFA, similar to a DFA, consumes a string of input symbols. For each input symbol, it transitions to a
new state until all input symbols have been consumed. Unlike a DFA, it is non-deterministic, i.e., for some
state and input symbol there may be zero, one, or several possible next states. Thus, in the
formal definition, the next state is an element of the power set of the states, which is a set of states to be
considered at once. The notion of accepting an input is similar to that for the DFA. When the last input
symbol is consumed, the NFA accepts if and only if there is some set of transitions that will take it to an
accepting state. Equivalently, it rejects, if, no matter what transitions are applied, it would not end in an
accepting state.

Formal definition
An NFA is represented formally by a 5-tuple, (Q, Σ, Δ, q0, F), consisting of

a finite set of states Q


a finite set of input symbols Σ
a transition function Δ : Q × Σ → P(Q).
an initial (or start) state q0 ∈ Q
a set of states F distinguished as accepting (or final) states F ⊆ Q.

Here, P(Q) denotes the power set of Q. Let w = a1a2 ... an be a word over the alphabet Σ. The automaton
M accepts the word w if a sequence of states, r0,r1, ..., rn, exists in Q with the following conditions:

1. r0 = q0
2. ri+1 ∈ Δ(ri, ai+1), for i = 0, ..., n−1
3. rn ∈ F.

In words, the first condition says that the machine starts in the start state q0. The second condition says
that given each character of string w, the machine will transition from state to state according to the
transition function Δ. The last condition says that the machine accepts w if the last input of w causes the
machine to halt in one of the accepting states. For w to be accepted by M it is not required that
every state sequence end in an accepting state; it is sufficient if one does. Otherwise, i.e. if it is
impossible to get from q0 to a state in F by following w, it is said that the automaton rejects the
string. The set of strings M accepts is the language recognized by M and this language is denoted by L(M).

We can also define L(M) in terms of Δ*: Q × Σ* → P(Q) such that:

1. Δ*(r, ε)= {r} where ε is the empty string, and


2. If x ∈ Σ*, a ∈ Σ, and Δ*(r, x)={r1, r2,..., rk} then Δ*(r, xa)= Δ(r1, a)∪...∪Δ(rk, a).

Now L(M) = {w | Δ*(q0, w) ∩ F ≠ ∅}.

Note that this definition uses a single initial state, which is not strictly necessary. Sometimes, NFAs are
defined with a set of initial states. There is an easy construction that translates an NFA with multiple
initial states into an NFA with a single initial state, so this provides a convenient notation.

For a more elementary introduction of the formal definition see automata theory.

Example

The state diagram for M. It is
not deterministic since in state
p reading a 1 can lead to p or
to q.

Let M be an NFA, with a binary alphabet, that determines if the input ends with a 1.

In formal notation, let M = ({p, q}, {0, 1}, Δ, p, {q}) where the transition function Δ can be defined by
this state transition table:

State    Input 0    Input 1
p        {p}        {p, q}
q        ∅          ∅

Note that Δ(p,1) contains more than one state; therefore M is nondeterministic. Some possible state sequences
for the input word "1011" are:

Input: 1 0 1 1
State sequence 1: p q ?
State sequence 2: p p p q ?
State sequence 3: p p p p q

The word is accepted by M since state sequence 3 satisfies the above definition; it doesn't matter that
sequences 1 and 2 fail to do so. In contrast, the word "10" is rejected by M, since there is no way to reach
the only accepting state, q, by reading the final 0 symbol or by an ε-transition.
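
A hedged Python sketch of this example: the simulation below tracks the set of states M could be in, mirroring the three state sequences above; the dict encoding is an illustrative assumption:

    DELTA = {("p", "0"): {"p"}, ("p", "1"): {"p", "q"},
             ("q", "0"): set(), ("q", "1"): set()}

    def accepts(w):
        current = {"p"}                        # the single start state
        for symbol in w:
            # Take the union of Δ over all states M could currently be in.
            current = set().union(*(DELTA[(s, symbol)] for s in current))
        return bool(current & {"q"})           # accept iff some run ends in q

    print(accepts("1011"))  # True
    print(accepts("10"))    # False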

Equivalence to DFA
A deterministic finite automaton (DFA) can be seen as a special kind of NFA in which, for each state and
input symbol, the transition function yields exactly one state. Thus, it is clear that every formal language
that can be recognized by a DFA can be recognized by an NFA.

Conversely, for each NFA, there is a DFA such that it recognizes the same formal language. The DFA can
be constructed using the powerset construction.

This result shows that NFAs, despite their additional flexibility, are unable to recognize languages that
cannot be recognized by some DFA. It is also important in practice for converting easier-to-construct NFAs
into more efficiently executable DFAs. However, if the NFA has n states, the resulting DFA may have up to
2^n states, which sometimes makes the construction impractical for large NFAs.

NFA with ε-moves


A nondeterministic finite automaton with ε-moves (NFA-ε) is a further generalization of the NFA. This
automaton replaces the transition function with one that allows the empty string ε as a possible input.
The transitions without consuming an input symbol are called ε-transitions. In the state diagrams, they are
usually labeled with the Greek letter ε. ε-transitions provide a convenient way of modeling the systems
whose current states are not precisely known: i.e., if we are modeling a system and it is not clear whether
the current state (after processing some input string) should be q or q', then we can add an ε-transition
between these two states, thus putting the automaton in both states simultaneously.

Formal definition

An NFA-ε is represented formally by a 5-tuple, (Q, Σ, Δ, q0, F), consisting of

a finite set of states Q


a finite set of input symbols called the alphabet Σ
a transition function Δ : Q × (Σ ∪ {ε}) → P(Q)
an initial (or start) state q0 ∈ Q
a set of states F distinguished as accepting (or final) states F ⊆ Q.

Here, P(Q) denotes the power set of Q and ε denotes empty string.

For a q ∈ Q, let E(q) denote the set of states that are reachable from state q by following ε-transitions in
the transition function Δ, i.e., p ∈ E(q) if there is a sequence of states q1,..., qk such that

q1 = q,
qi+1 ∈ Δ(qi, ε) for each 1 ≤ i < k, and
qk = p.

E(q) is known as the ε-closure of q.

ε-closure is also defined for a set of states. The ε-closure of a set of states, P, of an NFA is defined as the
set of states reachable from any state in P following ε-transitions. Formally, for P ⊆ Q, E(P) = ∪q∈P E(q).
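
In code, E(P) can be computed with a standard graph search. The sketch below is a minimal Python version, assuming the transition function is a dict mapping (state, symbol) pairs to sets of states, with ε spelled "eps"; these encodings are assumptions of the sketch:

    def epsilon_closure(P, delta):
        # Depth-first search along ε-transitions only.
        closure = set(P)
        stack = list(P)
        while stack:
            q = stack.pop()
            for r in delta.get((q, "eps"), set()):
                if r not in closure:
                    closure.add(r)
                    stack.append(r)
        return closure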

Let w = a1a2 ... an be a word over the alphabet Σ. The automaton M accepts the word w if a sequence of
states, r0,r1, ..., rn, exists in Q with the following conditions:

1. r0 ∈ E(q0),
2. ri+1 ∈ E(r') where r' ∈ Δ(ri, ai+1) for each i = 0, ..., n−1, and
3. rn ∈ F.

In words, the first condition says that the machine starts at the state that is reachable from the start state
q0 via ε-transitions. The second condition says that after reading ai, the machine takes a transition of Δ
from ri to r', and then takes any number of ε-transitions according to Δ to move from r' to ri+1. The last
condition says that the machine accepts w if the last input of w causes the machine to halt in one of the
accepting states. Otherwise, it is said that the automaton rejects the string. The set of strings M accepts is
the language recognized by M and this language is denoted by L(M).

Example

The state diagram for M

Let M be a NFA-ε, with a binary alphabet, that determines if the input contains an even number of 0s or an
even number of 1s. Note that 0 occurrences is an even number of occurrences as well.

In formal notation, let M = ({S0, S1, S2, S3, S4}, {0, 1}, Δ, S0, {S1, S3}) where the transition relation Δ can
be defined by this state transition table:

State    Input 0    Input 1    Input ε
S0       {}         {}         {S1, S3}
S1       {S2}       {S1}       {}
S2       {S1}       {S2}       {}
S3       {S3}       {S4}       {}
S4       {S4}       {S3}       {}

M can be viewed as the union of two DFAs: one with states {S1, S2} and the other with states {S3, S4}.
The language of M can be described by the regular language given by this regular expression
(1*(01*01*)*) ∪ (0*(10*10*)*). We define M using ε-moves but M can be defined without using ε-moves.

Equivalence to NFA

To show NFA-ε is equivalent to NFA, first note that NFA is a special case of NFA-ε, so it remains to show
for every NFA-ε, there exists an equivalent NFA.

Let A = (Q, Σ, Δ, q0, F) be an NFA-ε. The NFA A' = (Q, Σ, Δ', E(q0), F) is equivalent to A, where for each a ∈
Σ and q ∈ Q, Δ'(q, a) = E(Δ(q, a)).

Thus NFA-ε is equivalent to NFA. Since NFA is equivalent to DFA, NFA-ε is also equivalent to DFA.
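
The construction above translates directly into code. This sketch reuses the epsilon_closure function from the earlier sketch and assumes the same dict encoding; note that the resulting NFA has the set E(q0) of start states, consistent with the multiple-initial-state variant mentioned in section 1.3:

    def remove_epsilon_moves(Q, sigma, delta, q0):
        # Δ'(q, a) = E(Δ(q, a)) for every state q and input symbol a.
        new_delta = {}
        for q in Q:
            for a in sigma:
                new_delta[(q, a)] = epsilon_closure(delta.get((q, a), set()), delta)
        start_states = epsilon_closure({q0}, delta)   # the set E(q0)
        return new_delta, start_states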

Closure properties

Composed NFA accepting the union of the languages of some given NFAs N(s) and N(t). For an input word
w in the language union, the composed automaton follows an ε-transition from q to the start state (left
colored circle) of an appropriate subautomaton — N(s) or N(t) — which, by following w, may reach an
accepting state (right colored circle); from there, state f can be reached by another ε-transition. Due to
the ε-transitions, the composed NFA is properly nondeterministic even if both N(s) and N(t) were DFAs;
vice versa, constructing a DFA for the union language (even of two DFAs) is much more complicated.

NFAs are said to be closed under a (binary/unary) operator if NFAs recognize the languages that are
obtained by applying the operation to NFA-recognizable languages. The NFAs are closed under the
following operations:

Union (cf. picture)


Intersection
Concatenation
Negation
Kleene closure

Since NFAs are equivalent to nondeterministic finite automata with ε-moves (NFA-ε), the above closures
may be proved using the closure properties of NFA-ε. The above closure properties imply that NFAs only
recognize regular languages.

NFAs can be constructed from any regular expression using Thompson's construction algorithm.

Properties
The machine starts in the specified initial state and reads in a string of symbols from its alphabet. The
automaton uses the state transition function Δ to determine the next state using the current state, and the
symbol just read or the empty string. However, "the next state of an NFA depends not only on the current
input event, but also on an arbitrary number of subsequent input events. Until these subsequent events
occur it is not possible to determine which state the machine is in".[3] If, when the automaton has finished
reading, it is in an accepting state, the NFA is said to accept the string, otherwise it is said to reject the
string.

The set of all strings accepted by an NFA is the language the NFA accepts. This language is a regular
language.

For every NFA a deterministic finite automaton (DFA) can be found that accepts the same language.

Therefore, it is possible to convert an existing NFA into a DFA for the purpose of implementing a (perhaps)
simpler machine. This can be performed using the powerset construction, which may lead to an
exponential rise in the number of necessary states. For a formal proof of the powerset construction, please
see the Powerset construction article.

Implementation
There are many ways to implement an NFA:

Convert to the equivalent DFA. In some cases this may cause exponential blowup in the number of
states.[4]
Keep a set data structure of all states which the NFA might currently be in. On the consumption of an
input symbol, unite the results of the transition function applied to all current states to get the set of
next states; if ε-moves are allowed, include all states reachable by such a move (ε-closure). Each step
requires at most s² computations, where s is the number of states of the NFA. On the consumption of
the last input symbol, if one of the current states is a final state, the machine accepts the string. A
string of length n can be processed in time O(ns²),[5] and space O(s).
Create multiple copies. For each n-way decision, the NFA creates up to n − 1 copies of the machine.
Each will enter a separate state. If, upon consuming the last input symbol, at least one copy of the
NFA is in the accepting state, the NFA will accept. (This, too, requires linear storage with respect to
the number of NFA states, as there can be one machine for every NFA state.)
Explicitly propagate tokens through the transition structure of the NFA and match whenever a token
reaches the final state. This is sometimes useful when the NFA should encode additional context
about the events that triggered the transition. (For an implementation that uses this technique to
keep track of object references have a look at Tracematches.[6])

Application of NFA
NFAs and DFAs are equivalent in that if a language is recognized by an NFA, it is also recognized by a DFA
and vice versa. The establishment of such equivalence is important and useful. It is useful because
constructing an NFA to recognize a given language is sometimes much easier than constructing a DFA for
that language. It is important because NFAs can be used to reduce the complexity of the mathematical
work required to establish many important properties in the theory of computation. For example, it is
much easier to prove closure properties of regular languages using NFAs than DFAs.

1.4. DFA minimization

DFA minimization is the task of transforming a given deterministic finite automaton (DFA) into an
equivalent DFA that has a minimum number of states. Several different algorithms accomplishing this task
are known and described in standard textbooks on automata theory.[1]

Example DFA. In state c, the automaton exhibits the same behavior for every input string as in state d or
state e. Similarly, states a and b are nondistinguishable. The DFA has no unreachable states.

Equivalent minimal DFA. Nondistinguishable states have been joined into a single one.

Minimum DFA
For each regular language, there also exists a minimal automaton that accepts it, that is, a DFA with a
minimum number of states, and this DFA is unique (except that states can be given different names).[2][3]
The minimal DFA ensures minimal computational cost for tasks such as pattern matching.

There are two classes of states that can be removed or merged from the original DFA without affecting the
language it accepts, in order to minimize it.

Unreachable states are the states that are not reachable from the initial state of the DFA, for any
input string.
Nondistinguishable states are those that cannot be distinguished from one another for any input
string.

DFA minimization is usually done in three steps, corresponding to the removal or merger of the relevant
states. Since the elimination of nondistinguishable states is computationally the most expensive one, it is
usually done as the last step.

Unreachable states
The state p of DFA M = (Q, Σ, δ, q0, F) is unreachable if no string w in Σ* exists for which p = δ*(q0, w).
Reachable states can be obtained with the following algorithm:
# Python version of the reachable-states algorithm; Q, sigma (the alphabet Σ)
# and delta (a dict mapping (state, symbol) to a state) are assumed given.
reachable_states = {q0}
new_states = {q0}
while new_states:
    temp = set()
    for q in new_states:
        for c in sigma:
            temp.add(delta[(q, c)])      # p such that p = δ(q, c)
    new_states = temp - reachable_states
    reachable_states |= new_states
unreachable_states = Q - reachable_states

Unreachable states can be removed from the DFA without affecting the language that it accepts.

NFA minimization
While the above procedures work for DFAs, the method of partitioning does not work for non-deterministic
finite automata (NFAs).[9] While an exhaustive search may minimize an NFA, there is no polynomial-time
algorithm to minimize general NFAs unless P=PSPACE, an unsolved conjecture in computational
complexity theory which is widely believed to be false. However, there are methods of NFA minimization
that may be more efficient than brute force search.[10]

1.5. NDFA to DFA conversion algorithm

The powerset construction or subset construction is a standard method for converting a
nondeterministic finite automaton (NFA) into a deterministic finite automaton (DFA) which recognizes the
nondeterministic finite automaton (NFA) into a deterministic finite automaton (DFA) which recognizes the
same formal language. It is important in theory because it establishes that NFAs, despite their additional
flexibility, are unable to recognize any language that cannot be recognized by some DFA. It is also
important in practice for converting easier-to-construct NFAs into more efficiently executable DFAs.
However, if the NFA has n states, the resulting DFA may have up to 2^n states, an exponentially larger
number, which sometimes makes the construction impractical for large NFAs.

The construction, sometimes called the Rabin–Scott powerset construction (or subset construction) to
distinguish it from similar constructions for other types of automata, was first published by Michael O.
Rabin and Dana Scott in 1959.[1]

To simulate the operation of a DFA on a given input string, one needs to keep track of a single state at any
time: the state that the automaton will reach after seeing a prefix of the input. In contrast, to simulate an
NFA, one needs to keep track of a set of states: all of the states that the automaton could reach after
seeing the same prefix of the input, according to the nondeterministic choices made by the automaton. If,
after a certain prefix of the input, a set S of states can be reached, then after the next input symbol x the
set of reachable states is a deterministic function of S and x. Therefore, the sets of reachable NFA states
play the same role in the NFA simulation as single DFA states play in the DFA simulation, and in fact the
sets of NFA states appearing in this simulation may be re-interpreted as being states of a DFA. [2]

Construction
The powerset construction applies most directly to an NFA that does not allow state transitions
without consuming input symbols (i.e., "ε-moves"). Such an automaton may be defined as a 5-tuple (Q, Σ,
T, q0, F), in which Q is the set of states, Σ is the set of input symbols, T is the transition function (mapping
a state and an input symbol to a set of states), q0 is the initial state, and F is the set of accepting states.
The corresponding DFA has states corresponding to subsets of Q. The initial state of the DFA is {q0}, the
(one-element) set of initial states. The transition function of the DFA maps a state S (representing a subset
of Q) and an input symbol x to the set T(S,x) = ∪{T(q,x) | q ∈ S}, the set of all states that can be reached
by an x-transition from a state in S. A state S of the DFA is an accepting state if and only if at least one
member of S is an accepting state of the NFA.[2][3]

In the simplest version of the powerset construction, the set of all states of the DFA is the powerset of Q,
the set of all possible subsets of Q. However, many states of the resulting DFA may be useless as they may
be unreachable from the initial state. An alternative version of the construction creates only the states
that are actually reachable.[4]
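
The reachable-states version of the construction can be sketched in Python as follows, assuming T is a dict mapping (state, symbol) pairs to sets of NFA states; frozensets stand in for DFA states, and all names are illustrative:

    from itertools import chain

    def powerset_construction(T, sigma, q0, accepting):
        start = frozenset({q0})
        dfa_delta, todo, seen = {}, [start], {start}
        while todo:
            S = todo.pop()
            for x in sigma:
                # T(S, x) = union of T(q, x) over all q in S
                U = frozenset(chain.from_iterable(T.get((q, x), ()) for q in S))
                dfa_delta[(S, x)] = U
                if U not in seen:
                    seen.add(U)
                    todo.append(U)
        # A DFA state accepts iff it contains an accepting NFA state.
        dfa_accepting = {S for S in seen if S & accepting}
        return dfa_delta, start, dfa_accepting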

NFA with ε-moves

For an NFA with ε-moves (also called an ε-NFA), the construction must be modified to deal with these by
computing the ε-closure of states: the set of all states reachable from some given state using only ε-moves.
Van Noord recognizes three possible ways of incorporating this closure computation in the powerset
construction:[5]

1. Compute the ε-closure of the entire automaton as a preprocessing step, producing an equivalent NFA
without ε-moves, then apply the regular powerset construction. This version, also discussed by
Hopcroft and Ullman,[6] is straightforward to implement, but impractical for automata with large
numbers of ε-moves, as commonly arise in natural language processing applications.[5]
2. During the powerset computation, compute the ε-closure of each state q that is considered by the
algorithm (and cache the result).
3. During the powerset computation, compute the ε-closure of each subset of states Q' that is considered
by the algorithm, and add its elements to Q'.

Multiple initial states

If NFAs are defined to allow for multiple initial states,[7] the initial state of the corresponding DFA is the
set of all initial states of the NFA, or (if the NFA also has ε-moves) the set of all states reachable from
initial states by ε-moves.

Example
The NFA below has four states; state 1 is initial, and states 3 and 4 are accepting. Its alphabet consists of
the two symbols 0 and 1, and it has ε-moves.

The initial state of the DFA constructed from this NFA is the set of all NFA states that are reachable from
state 1 by ε-moves; that is, it is the set {1,2,3}. A transition from {1,2,3} by input symbol 0 must follow
either the arrow from state 1 to state 2, or the arrow from state 3 to state 4. Additionally, neither state 2
nor state 4 has outgoing ε-moves. Therefore, T({1,2,3},0) = {2,4}, and by the same reasoning the full
DFA constructed from the NFA is as shown below.

As can be seen in this example, there are five states reachable from the start state of the DFA; the
remaining 11 sets in the powerset of the set of NFA states are not reachable.

Complexity

NFA with 5 states (left) whose DFA (right) requires 16 states.[4]

Because the DFA states consist of sets of NFA states, an n-state NFA may be converted to a DFA with at
most 2^n states.[2] For every n, there exist n-state NFAs such that every subset of states is reachable from
the initial subset, so that the converted DFA has exactly 2^n states, giving Θ(2^n) worst-case time
complexity.[8][9] A simple example requiring nearly this many states is the language of strings over the
alphabet {0,1} in which there are at least n characters, the nth from last of which is 1. It can be
represented by an (n + 1)-state NFA, but it requires 2^n DFA states, one for each n-character suffix of the
input; cf. picture for n=4.[4]
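
As a hedged illustration of this lower-bound example, the following Python sketch simulates the (n + 1)-state NFA for n = 3 directly on its state set; the state numbering is an assumption of the sketch:

    n = 3

    def nth_from_end_is_1(w):
        # State 0 loops on any symbol and may guess that the current 1 is the
        # nth symbol from the end; states 1..n count the remaining symbols.
        current = {0}
        for c in w:
            nxt = set()
            for q in current:
                if q == 0:
                    nxt.add(0)
                    if c == "1":
                        nxt.add(1)
                elif q < n:
                    nxt.add(q + 1)
            current = nxt
        return n in current

    print(nth_from_end_is_1("0100"))  # True: the 3rd symbol from the end is 1
    print(nth_from_end_is_1("1001"))  # False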

Applications
Brzozowski's algorithm for DFA minimization uses the powerset construction, twice. It converts the input
DFA into an NFA for the reverse language, by reversing all its arrows and exchanging the roles of initial
and accepting states, converts the NFA back into a DFA using the powerset construction, and then repeats
its process. Its worst-case complexity is exponential, unlike some other known DFA minimization
algorithms, but in many examples it performs more quickly than its worst-case complexity would
suggest.[10]

Safra's construction, which converts a non-deterministic Büchi automaton with n states into a
deterministic Muller automaton or into a deterministic Rabin automaton with 2^O(n log n) states, uses the
powerset construction as part of its machinery.[11]

1.6. Finite-state transducer

A finite-state transducer (FST) is a finite-state machine with two memory tapes, following the
terminology for Turing machines: an input tape and an output tape. This contrasts with an ordinary
finite-state automaton, which has a single tape. An FST is a type of finite-state automaton that maps
between two sets of symbols.[1] An FST is more general than a finite-state automaton (FSA). An FSA
defines a formal language by defining a set of accepted strings while an FST defines relations between
sets of strings.

An FST reads a set of strings on the input tape and generates a set of relations on the output tape. An
FST can be thought of as a translator or relater between strings in a set.

In morphological parsing, an example would be inputting a string of letters into the FST; the FST would
then output a string of morphemes.

Overview
An automaton can be said to recognize a string if we view the content of its tape as input. In other words,
the automaton computes a function that maps strings into the set {0,1}. Alternatively, we can say that an
automaton generates strings, which means viewing its tape as an output tape. On this view, the automaton
generates a formal language, which is a set of strings. The two views of automata are equivalent: the
function that the automaton computes is precisely the indicator function of the set of strings it generates.
The class of languages generated by finite automata is known as the class of regular languages.

The two tapes of a transducer are typically viewed as an input tape and an output tape. On this view, a
transducer is said to transduce (i.e., translate) the contents of its input tape to its output tape, by
accepting a string on its input tape and generating another string on its output tape. It may do so
nondeterministically and it may produce more than one output for each input string. A transducer may
also produce no output for a given input string, in which case it is said to reject the input. In general, a
transducer computes a relation between two formal languages.

Each string-to-string finite-state transducer relates the input alphabet Σ to the output alphabet Γ.
Relations R on Σ*×Γ* that can be implemented as finite-state transducers are called rational relations.
Rational relations that are partial functions, i.e. that relate every input string from Σ* to at most one
string in Γ*, are called rational functions.

Finite-state transducers are often used for phonological and morphological analysis in natural language
processing research and applications. Pioneers in this field include Ronald Kaplan, Lauri Karttunen,
Martin Kay and Kimmo Koskenniemi.[2] A common way of using transducers is in a
so-called "cascade", where transducers for various operations are combined into a single transducer by
repeated application of the composition operator (defined below).

Formal construction
Formally, a finite transducer T is a 6-tuple (Q, Σ, Γ, I, F, δ) such that:

Q is a finite set, the set of states;
Σ is a finite set, called the input alphabet;
Γ is a finite set, called the output alphabet;
I is a subset of Q, the set of initial states;
F is a subset of Q, the set of final states; and
δ ⊆ Q × (Σ ∪ {ε}) × (Γ ∪ {ε}) × Q (where ε is the empty string) is the transition relation.

We can view (Q, δ) as a labeled directed graph, known as the transition graph of T: the set of vertices is Q,
and (q, a, b, r) ∈ δ means that there is a labeled edge going from vertex q to vertex r. We also say that a is
the input label and b the output label of that edge.

NOTE: This definition of finite transducer is also called letter transducer (Roche and Schabes 1997);
alternative definitions are possible, but can all be converted into transducers following this one.

Define the extended transition relation δ* as the smallest set such that:

δ ⊆ δ*;
(q, ε, ε, q) ∈ δ* for all q ∈ Q; and
whenever (q, x, y, r) ∈ δ* and (r, a, b, s) ∈ δ then (q, xa, yb, s) ∈ δ*.

The extended transition relation is essentially the reflexive transitive closure of the transition graph that
has been augmented to take edge labels into account. The elements of δ* are known as paths. The edge
labels of a path are obtained by concatenating the edge labels of its constituent transitions in order.

The behavior of the transducer T is the rational relation [T] defined as follows: x[T]y if and only if there
exists i ∈ I and f ∈ F such that (i, x, y, f) ∈ δ*. This is to say that T transduces a string x ∈ Σ* into a
string y ∈ Γ* if there exists a path from an initial state to a final state whose input label is x and whose
output label is y.

Weighted automata

Finite State Transducers can be weighted, where each transition is labelled with a weight in addition to
the input and output labels. A Weighted Finite State Transducer (WFST) over a set K of weights can be
defined similarly to an unweighted one as an 8-tuple T=(Q, Σ, Γ, I, F, E, λ, ρ), where:

Q, Σ, Γ, I, F are defined as above;


E ⊆ Q × (Σ ∪ {ε}) × (Γ ∪ {ε}) × K × Q (where ε is the empty string) is the finite set of transitions;
λ : I → K maps initial states to weights;
ρ : F → K maps final states to weights.

In order to make certain operations on WFSTs well-defined, it is convenient to require the set of weights
to form a semiring.[3] Two typical semirings used in practice are the log semiring and tropical semiring:
unweighted automata may be regarded as having weights in the Boolean semiring.[4]

Stochastic FST

Stochastic FSTs (also known as probabilistic FSTs or statistical FSTs) are presumably a form of weighted
FST.

Operations on finite-state transducers


The following operations defined on finite automata also apply to finite transducers:

Union. Given transducers T and S, there exists a transducer T ∪ S such that x[T ∪ S]y if and only if
x[T]y or x[S]y.
Concatenation. Given transducers T and S, there exists a transducer T·S such that wx[T·S]yz if and
only if w[T]y and x[S]z.
Kleene closure. Given a transducer T, there exists a transducer T* with the following properties:

(k1) ε[T*]ε;

(k2) whenever w[T*]y and x[T]z then wx[T*]yz;

and x[T*]y does not hold unless mandated by (k1) or (k2).

Composition. Given a transducer T on alphabets Σ and Γ and a transducer S on alphabets Γ and Δ,
there exists a transducer T ∘ S on Σ and Δ such that x[T ∘ S]z if and only if there exists a string
y ∈ Γ* such that x[T]y and y[S]z. This operation extends to the weighted case.[5]

This definition uses the same notation used in mathematics for relation composition. However, the
conventional reading for relation composition is the other way around: given two relations T and S,
(x, z) ∈ T ∘ S when there exists some y such that (x, y) ∈ S and (y, z) ∈ T.

Projection to an automaton. There are two projection functions: π1 preserves the input tape, and
π2 preserves the output tape. The first projection, π1, is defined as follows:

Given a transducer T, there exists a finite automaton π1(T) such that π1(T) accepts x if and only if there
exists a string y for which x[T]y.

The second projection, π2, is defined similarly.

Determinization. Given a transducer T, we want to build an equivalent transducer that has a unique
initial state and such that no two transitions leaving any state share the same input label. The
powerset construction can be extended to transducers, or even weighted transducers, but sometimes
fails to halt; indeed, some non-deterministic transducers do not admit equivalent deterministic
transducers.[6] Characterizations of determinizable transducers have been proposed[7] along with
efficient algorithms to test them:[8] they rely on the semiring used in the weighted case as well as a
general property on the structure of the transducer (the twins property).
Weight pushing for the weighted case.[9]
Minimization for the weighted case.[10]
Removal of epsilon-transitions.

1.6.1. Moore machine

A Moore machine is a finite-state machine whose output values are determined only by its current state.
This is in contrast to a Mealy machine, whose output values are determined both by its current state and
by the values of its inputs. The Moore machine is named after Edward F. Moore, who presented the
concept in a 1956 paper, “Gedanken-experiments on Sequential Machines.”[1]

Formal definition
A Moore machine can be defined as a 6-tuple (S, S0, Σ, Λ, T, G) consisting of the following:

A finite set of states S

A start state (also called initial state) S0 which is an element of S
A finite set called the input alphabet Σ
A finite set called the output alphabet Λ
A transition function T : S × Σ → S mapping a state and the input alphabet to the next state
An output function G : S → Λ mapping each state to the output alphabet

A Moore machine can be regarded as a restricted type of finite-state transducer.

Visual representation

Table

A state transition table shows, for each state and input, the next state, together with the output
associated with each state:

State   a    b    output
q0      q1   q2   1
q1      q1   q1   0
q2      q1   q0   1
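
Read as code, the table gives the transition function T and output function G of a Moore machine. The sketch below assumes q0 is the start state (the table does not say) and uses an illustrative dict encoding:

    T = {("q0", "a"): "q1", ("q0", "b"): "q2",
         ("q1", "a"): "q1", ("q1", "b"): "q1",
         ("q2", "a"): "q1", ("q2", "b"): "q0"}
    G = {"q0": 1, "q1": 0, "q2": 1}

    def moore_run(inputs, state="q0"):
        outputs = [G[state]]          # a Moore machine emits on entering a state
        for symbol in inputs:
            state = T[(state, symbol)]
            outputs.append(G[state])
        return outputs

    print(moore_run("ab"))  # [1, 0, 0]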

Diagram

The state diagram for a Moore machine, or Moore diagram, is a diagram that associates an output value
with each state. A Moore machine is an output producer.

1.6.2. Mealy machine

A Mealy machine is a finite-state machine whose output values are determined both by its current state
and the current inputs. (This is in contrast to a Moore machine, whose output values are determined solely
by its current state.) A Mealy machine is a deterministic finite-state transducer: for each state and input,
at most one transition is possible.

The Mealy machine is named after George H. Mealy, who presented the concept in a 1955 paper, “A
Method for Synthesizing Sequential Circuits”.[1]

Formal definition
A Mealy machine is a 6-tuple (S, S0, Σ, Λ, T, G) consisting of the following:

a finite set of states S

a start state (also called initial state) S0 which is an element of S
a finite set called the input alphabet Σ
a finite set called the output alphabet Λ
a transition function T : S × Σ → S mapping pairs of a state and an input symbol to the
corresponding next state.
an output function G : S × Σ → Λ mapping pairs of a state and an input symbol to the corresponding
output symbol.

In some formulations, the transition and output functions are coalesced into a single function
T : S × Σ → S × Λ.

Comparison of Mealy machines and Moore machines


1. Mealy machines tend to have fewer states:
Different outputs on arcs (n²) rather than states (n).
2. Moore machines are safer to use:
Outputs change at clock edge (always one cycle later).
In Mealy machines, input change can cause output change as soon as logic is done—a big
problem when two machines are interconnected – asynchronous feedback may occur if one isn't
careful.
3. Mealy machines react faster to inputs:
React in same cycle—don't need to wait for clock.
In Moore machines, more logic may be necessary to decode state into outputs—more gate delays
after clock edge.

Not all sequential circuits can be implemented using the Mealy model. Some sequential circuits can only
be implemented as Moore machines.[2]

Diagram
The state diagram for a Mealy machine associates an output value with each transition edge (in contrast
to the state diagram for a Moore machine, which associates an output value with each state).

When the input and output alphabet are both Σ, one can also associate to a Mealy automaton a helix
directed graph (S × Σ, (x, i) → (T(x, i), G(x, i))).[3] This graph has as vertices the pairs of state and
letter, every node has out-degree one, and the successor of (x, i) is the next state of the automaton paired
with the letter that the automaton outputs when it is in state x and reads letter i. This graph is a union of
disjoint cycles if the automaton is bireversible.

Examples
Simple

State diagram for a simple Mealy machine with one input and one output.

A simple Mealy machine has one input and one output. Each transition edge is labeled with the value of
the input (shown in red) and the value of the output (shown in blue). The machine starts in state Si. (In
this example, the output is the exclusive-or of the two most-recent input values; thus, the machine
implements an edge detector, outputting a one every time the input flips and a zero otherwise.)
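
A hedged Python sketch of this edge detector follows; it reproduces the described behavior (output = exclusive-or of the two most recent inputs), with state names S0/S1 for "last input was 0/1" as assumptions of the sketch:

    T = {("Si", 0): "S0", ("Si", 1): "S1",
         ("S0", 0): "S0", ("S0", 1): "S1",
         ("S1", 0): "S0", ("S1", 1): "S1"}
    G = {("Si", 0): 0, ("Si", 1): 0,   # no previous input yet, so no edge
         ("S0", 0): 0, ("S0", 1): 1,
         ("S1", 0): 1, ("S1", 1): 0}

    def mealy_run(bits, state="Si"):
        outputs = []
        for bit in bits:
            outputs.append(G[(state, bit)])   # output depends on state AND input
            state = T[(state, bit)]
        return outputs

    print(mealy_run([0, 1, 1, 0]))  # [0, 1, 0, 1]: a 1 on every input flip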

Complex

More complex Mealy machines can have multiple inputs as well as multiple outputs.

Applications
Mealy machines provide a rudimentary mathematical model for cipher machines. Taking the input and
output alphabets to be the Latin alphabet, for example, one can design a Mealy machine that, given a
string of letters (a sequence of inputs), processes it into a ciphered string (a sequence of outputs).
However, although one could use a Mealy model to describe the Enigma, the state diagram would be too
complex to provide feasible means of designing complex ciphering machines.

Moore and Mealy machines are FSMs that also produce output at each tick of the clock. Modern CPUs,
computers, cell phones, digital clocks and basic electronic devices/machines have some kind of finite
state machine to control them.

Simple software systems, particularly ones that can be represented using regular expressions, can be
modeled as finite state machines. There are many such simple systems, such as vending machines or
basic electronics.

By finding the intersection of two finite state machines, one can design, in a very simple manner,
concurrent systems that exchange messages, for instance. For example, a traffic light is a system that
consists of multiple subsystems, such as the different traffic lights, that work concurrently.

Some examples of applications:

number classification
watch with timer
vending machine
traffic light
bar code scanner
gas pumps

2. Regular grammar

A regular grammar is a formal grammar that is right-regular or left-regular.

Strictly regular grammars


A right regular grammar (also called right linear grammar) is a formal grammar (N, Σ, P, S) such that
all the production rules in P are of one of the following forms:

1. B → a - where B is a non-terminal in N and a is a terminal in Σ


2. B → aC - where B and C are non-terminals in N and a is in Σ
3. B → ε - where B is in N and ε denotes the empty string, i.e. the string of length 0.

In a left regular grammar (also called left linear grammar), all rules obey the forms

1. A → a - where A is a non-terminal in N and a is a terminal in Σ


2. A → Ba - where A and B are in N and a is in Σ
3. A → ε - where A is in N and ε is the empty string.

A regular grammar is a left or right regular grammar.

Some textbooks and articles disallow empty production rules, and assume that the empty string is not
present in languages.

Examples

An example of a right regular grammar G with N = {S, A}, Σ = {a, b, c}, P consists of the following rules

S → aS
S → bA
A → ε
A → cA

and S is the start symbol.
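
This grammar generates the language described by the regular expression a*bc*: S produces a prefix of a's and exactly one b, after which A produces a suffix of c's. The Python sketch below randomly derives words from G and checks them against that expression; the helper name is illustrative:

    import random
    import re

    def derive():
        # Expand nonterminals S and A using the production rules of G.
        s = "S"
        while "S" in s or "A" in s:
            if "S" in s:
                s = s.replace("S", random.choice(["aS", "bA"]), 1)
            else:
                s = s.replace("A", random.choice(["cA", ""]), 1)
        return s

    for _ in range(5):
        w = derive()
        assert re.fullmatch(r"a*bc*", w), w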

2.1. Regular expression

The match results of the pattern (?<=\.) {2,}(?=[A-Z]): at least two spaces are matched, but only if they
occur directly after a period (.) and before an upper case letter.

Stephen Cole Kleene, who helped found the concept.

A blacklist on Wikipedia which uses regular expressions to identify bad titles.

A regular expression, regex or regexp[1] (sometimes called a rational expression)[2][3] is a sequence
of characters that defines a search pattern. Usually this pattern is then used by string searching algorithms
for "find" or "find and replace" operations on strings, or for input validation.

The concept arose in the 1950s when the American mathematician Stephen Cole Kleene formalized the
description of a regular language. The concept came into common use with Unix text-processing utilities.
Since the 1980s, different syntaxes for writing regular expressions have existed, one being the POSIX standard
and another, widely used, being the Perl syntax.

Regular expressions are used in search engines, search and replace dialogs of word processors and text
editors, in text processing utilities such as sed and AWK and in lexical analysis. Many programming
languages provide regex capabilities, built-in or via libraries.

Patterns
The phrase regular expressions, and consequently, regexes, is often used to mean the specific, standard
textual syntax (distinct from the mathematical notation described below) for representing patterns for
matching text. Each character in a regular expression (that is, each character in the string describing its
pattern) is either a metacharacter, having a special meaning, or a regular character that has a literal
meaning. For example, in the regex a., a is a literal character which matches just 'a' and '.' is a meta
character that matches every character except a newline. Therefore, this regex matches, for example, 'a ',
or 'ax', or 'a0'. Together, metacharacters and literal characters can be used to identify text of a given
pattern, or process a number of instances of it. Pattern-matches may vary from a precise equality to a very
general similarity, as controlled by the metacharacters. For example, . is a very general pattern, [a-z]
(match all letters from 'a' to 'z') is less general and a is a precise pattern (matches just 'a'). The
metacharacter syntax is designed specifically to represent prescribed targets in a concise and flexible way
to direct the automation of text processing of a variety of input data, in a form easy to type using a
standard ASCII keyboard.

A very simple case of a regular expression in this syntax is to locate a word spelled two different ways in a
text editor, the regular expression seriali[sz]e matches both "serialise" and "serialize". Wildcards also
achieve this, but are more limited in what they can pattern, as they have fewer metacharacters and a
simple language-base.

The usual context of wildcard characters is in globbing similar names in a list of files, whereas regexes are
usually employed in applications that pattern-match text strings in general. For example, the regex ^[
\t]+|[ \t]+$ matches excess whitespace at the beginning or end of a line. An advanced regular expression
that matches any numeral is [+-]?(\d+(\.\d+)?|\.\d+)([eE][+-]?\d+)?.
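A quick way to experiment with the patterns above is Python's re module, used here purely as one illustration; any regex-capable tool would do:

    import re

    # the two spellings matched by seriali[sz]e
    print(re.findall(r"seriali[sz]e", "serialise or serialize"))
    # ['serialise', 'serialize']

    # trimming excess whitespace at the beginning or end of a line
    print(repr(re.sub(r"^[ \t]+|[ \t]+$", "", "  padded line\t")))
    # 'padded line'

    # the numeral pattern from above
    numeral = re.compile(r"[+-]?(\d+(\.\d+)?|\.\d+)([eE][+-]?\d+)?")
    print(bool(numeral.fullmatch("-3.14e10")), bool(numeral.fullmatch("abc")))
    # True False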

Translating the Kleene star (s* means 'zero or more of s')
A regex processor translates a regular expression in the above syntax into an internal representation
which can be executed and matched against a string representing the text being searched in. One possible
approach is the Thompson's construction algorithm to construct a nondeterministic finite automaton
(NFA), which is then made deterministic and the resulting deterministic finite automaton (DFA) is run on
the target text string to recognize substrings that match the regular expression. The picture shows the
NFA scheme N(s*) obtained from the regular expression s*, where s denotes a simpler regular expression
in turn, which has already been recursively translated to the NFA N(s).
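The following Python sketch implements the core of Thompson's construction for a tiny regular-expression algebra (literals, concatenation, alternation, and the Kleene star). Representing each NFA fragment as a (start, accept) pair of states is an implementation choice of this sketch, not part of the formal construction.

    class State:
        def __init__(self):
            self.eps = []     # epsilon transitions
            self.edge = None  # (symbol, target) or None

    def lit(c):               # N(c): a single transition labeled c
        s, t = State(), State()
        s.edge = (c, t)
        return s, t

    def concat(f, g):         # N(st): accept of f feeds the start of g
        f[1].eps.append(g[0])
        return f[0], g[1]

    def alt(f, g):            # N(s|t): new start/accept around both branches
        s, t = State(), State()
        s.eps += [f[0], g[0]]
        f[1].eps.append(t)
        g[1].eps.append(t)
        return s, t

    def star(f):              # N(s*): loop back, and allow skipping s entirely
        s, t = State(), State()
        s.eps += [f[0], t]
        f[1].eps += [f[0], t]
        return s, t

    def eps_closure(states):
        stack, seen = list(states), set(states)
        while stack:
            for r in stack.pop().eps:
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    def matches(fragment, word):
        start, accept = fragment
        current = eps_closure({start})
        for c in word:
            moved = {q.edge[1] for q in current if q.edge and q.edge[0] == c}
            current = eps_closure(moved)
        return accept in current

    nfa = star(alt(lit('a'), lit('b')))                # the NFA N((a|b)*)
    print(matches(nfa, "abba"), matches(nfa, "abc"))   # True False

Running the NFA directly, as matches does, performs the subset construction on the fly; determinizing ahead of time would yield the DFA mentioned above.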

Basic concepts
A regular expression, often called a pattern, is an expression used to specify a set of strings required for a
particular purpose. A simple way to specify a finite set of strings is to list its elements or members.
However, there are often more concise ways to specify the desired set of strings. For example, the set
containing the three strings "Handel", "Händel", and "Haendel" can be specified by the pattern
H(ä|ae?)ndel; we say that this pattern matches each of the three strings. In most formalisms, if there exists
at least one regular expression that matches a particular set then there exists an infinite number of other
regular expressions that also match it—the specification is not unique. Most formalisms provide the
following operations to construct regular expressions.
following operations to construct regular expressions.

Boolean "or"
A vertical bar separates alternatives. For example, gray|grey can match "gray" or "grey".
Grouping
Parentheses are used to define the scope and precedence of the operators (among other uses). For
example, gray|grey and gr(a|e)y are equivalent patterns which both describe the set of "gray" or "grey".
Quantification
A quantifier after a token (such as a character) or group specifies how often that preceding element
is allowed to occur. The most common quantifiers are the question mark ?, the asterisk * (derived
from the Kleene star), and the plus sign + (Kleene plus).

? The question mark indicates zero or one occurrences of the preceding element. For
example, colou?r matches both "color" and "colour".
* The asterisk indicates zero or more occurrences of the preceding element. For example,
ab*c matches "ac", "abc", "abbc", "abbbc", and so on.
+ The plus sign indicates one or more occurrences of the preceding element. For example,
ab+c matches "abc", "abbc", "abbbc", and so on, but not "ac".

{n} The preceding item is matched exactly n times.[19]
{min,} The preceding item is matched min or more times.[19]
{min,max} The preceding item is matched at least min times, but not more than max times.[19]

These constructions can be combined to form arbitrarily complex expressions, much like one can
construct arithmetical expressions from numbers and the operations +, −, ×, and ÷. For example,
H(ae?|ä)ndel and H(a|ae|ä)ndel are both valid patterns which match the same strings as the earlier example,
H(ä|ae?)ndel.
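The quantifiers described above can be tried out directly, for example with Python's re module (shown only as one illustration):

    import re

    print(bool(re.fullmatch(r"colou?r", "color")))     # True
    print(bool(re.fullmatch(r"colou?r", "colour")))    # True
    print([s for s in ("ac", "abc", "abbc") if re.fullmatch(r"ab*c", s)])
    # ['ac', 'abc', 'abbc']
    print([s for s in ("ac", "abc", "abbc") if re.fullmatch(r"ab+c", s)])
    # ['abc', 'abbc']
    print(bool(re.fullmatch(r"H(ä|ae?)ndel", "Haendel")))   # True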

The precise syntax for regular expressions varies among tools and with context; more detail is given in the
Syntax section.

Regular expressions describe regular languages in formal language theory. They have the same expressive
power as regular grammars.

Formal definition

Regular expressions consist of constants, which denote sets of strings, and operator symbols, which
denote operations over these sets. The following definition is standard, and found as such in most
textbooks on formal language theory.[20][21] Given a finite alphabet Σ, the following constants are defined
as regular expressions:

(empty set) ∅ denoting the set ∅.


(empty string) ε denoting the set containing only the "empty" string, which has no characters at all.
(literal character) a in Σ denoting the set containing only the character a.

Given regular expressions R and S, the following operations over them are defined to produce regular
expressions:

(concatenation) RS denotes the set of strings that can be obtained by concatenating a string in R and
a string in S. For example, let R = {"ab", "c"}, and S = {"d", "ef"}. Then, RS = {"abd", "abef", "cd",
"cef"}.
(alternation) R | S denotes the set union of sets described by R and S. For example, if R describes
{"ab", "c"} and S describes {"ab", "d", "ef"}, expression R | S describes {"ab", "c", "d", "ef"}.
(Kleene star) R* denotes the smallest superset of set described by R that contains ε and is closed
under string concatenation. This is the set of all strings that can be made by concatenating any finite
number (including zero) of strings from set described by R. For example, {"0","1"}* is the set of all
finite binary strings (including the empty string), and {"ab", "c"}* = {ε, "ab", "c", "abab", "abc",
"cab", "cc", "ababab", "abcab", … }.

To avoid parentheses it is assumed that the Kleene star has the highest priority, then concatenation and
then alternation. If there is no ambiguity then parentheses may be omitted. For example, (ab)c can be
written as abc, and a|(b(c*)) can be written as a|bc*. Many textbooks use the symbols ∪, +, or ∨ for
alternation instead of the vertical bar.

Examples:

a|b* denotes {ε, "a", "b", "bb", "bbb", …}


(a|b)* denotes the set of all strings with no symbols other than "a" and "b", including the empty string:
{ε, "a", "b", "aa", "ab", "ba", "bb", "aaa", …}
ab*(c|ε) denotes the set of strings starting with "a", then zero or more "b"s and finally optionally a "c":
{"a", "ac", "ab", "abc", "abb", "abbc", …}
(0|(1(01*0)*1))* denotes the set of binary numbers that are multiples of 3: { ε, "0", "00", "11", "000",
"011", "110", "0000", "0011", "0110", "1001", "1100", "1111", "00000", … }

Expressive power and compactness

The formal definition of regular expressions is purposely parsimonious and avoids defining the redundant
quantifiers ? and +, which can be expressed as follows: a+ = aa*, and a? = (a|ε). Sometimes the complement
operator is added, to give a generalized regular expression; here Rc matches all strings over Σ* that do not
match R. In principle, the complement operator is redundant, as it can always be circumscribed by using
the other operators. However, the process for computing such a representation is complex, and the result
may require expressions of a size that is double exponentially larger.[22][23]

Regular expressions in this sense can express the regular languages, exactly the class of languages
accepted by deterministic finite automata. There is, however, a significant difference in compactness.
Some classes of regular languages can only be described by deterministic finite automata whose size
grows exponentially in the size of the shortest equivalent regular expressions. The standard example here
is the languages Lk consisting of all strings over the alphabet {a,b} whose kth-from-last letter equals a. On
one hand, a regular expression describing L4 is given by (a|b)*a(a|b)(a|b)(a|b).

Generalizing this pattern to Lk gives the expression (a|b)*a(a|b)(a|b)…(a|b), with k−1 trailing copies of (a|b).

On the other hand, it is known that every deterministic finite automaton accepting the language Lk must
have at least 2k states. Luckily, there is a simple mapping from regular expressions to the more general
nondeterministic finite automata (NFAs) that does not lead to such a blowup in size; for this reason NFAs
are often used as alternative representations of regular languages. NFAs are a simple variation of the
type-3 grammars of the Chomsky hierarchy.[20]
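For a concrete feel for the Lk example, the regular expression for L4 can be tested directly; the Python shorthand (a|b){3} below abbreviates the three trailing copies of (a|b):

    import re

    l4 = re.compile(r"(a|b)*a(a|b){3}")   # 4th-from-last letter is 'a'
    print(bool(l4.fullmatch("babba")))    # True:  the 4th-from-last letter is a
    print(bool(l4.fullmatch("bbbb")))     # False: the 4th-from-last letter is b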

In the opposite direction, there are many languages easily described by a DFA that are not easily
described by a regular expression. For instance, determining the validity of a given ISBN number requires
computing the value of the integer modulo 11, which can be easily implemented with an 11-state DFA.
However, a regular expression answering the same question of divisibility by 11 is at least multiple
megabytes in length.[citation needed]

Given a regular expression, Thompson's construction algorithm computes an equivalent nondeterministic
finite automaton. A conversion in the opposite direction is achieved by Kleene's algorithm.

Finally, it is worth noting that many real-world "regular expression" engines implement features that
cannot be described by the regular expressions in the sense of formal language theory; rather, they
implement regexes. See below for more on this.

Deciding equivalence of regular expressions

As seen in many of the examples above, there is more than one way to construct a regular expression to
achieve the same results.

It is possible to write an algorithm that, for two given regular expressions, decides whether the described
languages are equal; the algorithm reduces each expression to a minimal deterministic finite state
machine, and determines whether they are isomorphic (equivalent).
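A full decision procedure minimizes both automata as just described. The sketch below instead compares two expressions on every string up to a bounded length, which can refute equivalence but only gives evidence for it; the helper name, alphabet, and bound are arbitrary choices of the sketch:

    import re
    from itertools import product

    def distinguishing_word(r1, r2, alphabet="ab", max_len=8):
        # Return a word on which the regexes disagree, or None if they
        # agree on every string up to max_len (a finite approximation only).
        p1, p2 = re.compile(r1), re.compile(r2)
        for n in range(max_len + 1):
            for tup in product(alphabet, repeat=n):
                w = "".join(tup)
                if bool(p1.fullmatch(w)) != bool(p2.fullmatch(w)):
                    return w
        return None

    print(distinguishing_word(r"(a|b)*", r"(a*b*)*"))   # None: no difference found
    print(distinguishing_word(r"a(a|b)*", r"(a|b)*a"))  # 'ab': starts with a, does not end with a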

Algebraic laws for regular expressions can be obtained using a method by Gischer which is best explained
by an example: in order to check whether (X+Y)* and (X* Y*)* denote the same regular language for all
regular expressions X, Y, it is necessary and sufficient to check whether the particular regular expressions
(a+b)* and (a* b*)* denote the same language over the alphabet Σ={a,b}. More generally, an equation E=F
between regular-expression terms with variables holds if, and only if, its instantiation with different
variables replaced by different symbol constants holds.[24][25]

The redundancy can be eliminated by using Kleene star and set union to find an interesting subset of
regular expressions that is still fully expressive, but perhaps their use can be restricted.[clarification needed]
This is a surprisingly difficult problem: as simple as regular expressions are, there is no method to
systematically rewrite them to some normal form. The lack of axioms in the past led to the star height
problem. In 1991, Dexter Kozen axiomatized regular expressions as a Kleene algebra, using equational
and Horn clause axioms.[26] Already in 1964, Redko had proved that no finite set of purely equational
axioms can characterize the algebra of regular languages.[27]

Syntax
A regex pattern matches a target string. The pattern is composed of a sequence of atoms. An atom is a
single point within the regex pattern which it tries to match to the target string. The simplest atom is a
literal, but grouping parts of the pattern to match an atom will require using ( ) as metacharacters.
Metacharacters help form: atoms; quantifiers telling how many atoms (and whether it is a greedy
quantifier or not); a logical OR character, which offers a set of alternatives, and a logical NOT character,
which negates an atom's existence; and backreferences to refer to previous atoms of a completing pattern
of atoms. A match is made, not when all the atoms of the string are matched, but rather when all the
pattern atoms in the regex have matched. The idea is to make a small pattern of characters stand for a
large number of possible strings, rather than compiling a large list of all the literal possibilities.

Depending on the regex processor there are about fourteen metacharacters, characters that may or may
not have their literal character meaning, depending on context, or whether they are "escaped", i.e.
preceded by an escape sequence, in this case, the backslash \. Modern and POSIX extended regexes use
metacharacters more often than their literal meaning, so to avoid "backslash-osis" it makes sense to have
a metacharacter escape to a literal mode; but starting out, it makes more sense to have the four
bracketing metacharacters ( ) and { } be primarily literal, and "escape" this usual meaning to become
metacharacters. Common standards implement both. The usual metacharacters are {}[]()^$.|*+? and \. The
usual characters that become metacharacters when escaped are dswDSW and N.

Delimiters

When entering a regex in a programming language, they may be represented as a usual string literal,
hence usually quoted; this is common in C, Java, and Python for instance, where the regex re is entered as
"re". However, they are often written with slashes as delimiters, as in /re/ for the regex re. This originates
in ed, where / is the editor command for searching, and an expression /re/ can be used to specify a range
of lines (matching the pattern), which can be combined with other commands on either side, most
famously g/re/p as in grep ("global regex print"), which is included in most Unix-based operating systems,
such as Linux distributions. A similar convention is used in sed, where search and replace is given by
s/re/replacement/ and patterns can be joined with a comma to specify a range of lines as in /re1/,/re2/. This
notation is particularly well-known due to its use in Perl, where it forms part of the syntax distinct from
normal string literals. In some cases, such as sed and Perl, alternative delimiters can be used to avoid
collision with contents, and to avoid having to escape occurrences of the delimiter character in the

contents. For example, in sed the command s,/,X, will replace a / with an X, using commas as delimiters.

Standards

The IEEE POSIX standard has three sets of compliance: BRE (Basic Regular Expressions), [28] ERE
(Extended Regular Expressions), and SRE (Simple Regular Expressions). SRE is deprecated,[29] in favor of
BRE, as both provide backward compatibility. The subsection below covering the character classes applies
to both BRE and ERE.

BRE and ERE work together. ERE adds ?, +, and |, and it removes the need to escape the metacharacters
( ) and { }, which are required in BRE. Furthermore, as long as the POSIX standard syntax for regexes is
adhered to, there can be, and often is, additional syntax to serve specific (yet POSIX compliant)
applications. Although POSIX.2 leaves some implementation specifics undefined, BRE and ERE provide a
"standard" which has since been adopted as the default syntax of many tools, where the choice of BRE or
ERE modes is usually a supported option. For example, GNU grep has the following options: "grep -E" for
ERE, and "grep -G" for BRE (the default), and "grep -P" for Perl regexes.

Perl regexes have become a de facto standard, having a rich and powerful set of atomic expressions. Perl
has no "basic" or "extended" levels. As in POSIX EREs, ( ) and { } are treated as metacharacters unless
escaped; other metacharacters are known to be literal or symbolic based on context alone. Additional
functionality includes lazy matching, backtracking, named capture groups, and recursive patterns.

POSIX basic and extended

In the POSIX standard, Basic Regular Syntax (BRE) requires that the metacharacters ( ) and { } be
designated \(\) and \{\}, whereas Extended Regular Syntax (ERE) does not.

Metacharacter Description
^ Matches the starting position within the string. In line-based tools, it matches the
starting position of any line.
. Matches any single character (many applications exclude newlines, and exactly which
characters are considered newlines is flavor-, character-encoding-, and platform-specific,
but it is safe to assume that the line feed character is included). Within POSIX bracket
expressions, the dot character matches a literal dot. For example, a.c matches "abc", etc.,
but [a.c] matches only "a", ".", or "c".
[ ] A bracket expression. Matches a single character that is contained within the brackets.
For example, [abc] matches "a", "b", or "c". [a-z] specifies a range which matches any
lowercase letter from "a" to "z". These forms can be mixed: [abcx-z] matches "a", "b", "c",
"x", "y", or "z", as does [a-cx-z].

The - character is treated as a literal character if it is the last or the first (after the ^, if
present) character within the brackets: [abc-], [-abc]. Note that backslash escapes are not
allowed. The ] character can be included in a bracket expression if it is the first (after the
^) character: []abc].
[^ ] Matches a single character that is not contained within the brackets. For example, [^abc]
matches any character other than "a", "b", or "c". [^a-z] matches any single character
that is not a lowercase letter from "a" to "z". Likewise, literal characters and ranges can
be mixed.
$ Matches the ending position of the string or the position just before a string-ending
newline. In line-based tools, it matches the ending position of any line.
( ) Defines a marked subexpression. The string matched within the parentheses can be
recalled later (see the next entry, \n). A marked subexpression is also called a block or
capturing group. BRE mode requires \( \).
\n Matches what the nth marked subexpression matched, where n is a digit from 1 to 9.
This construct is vaguely defined in the POSIX.2 standard. Some tools allow referencing
more than nine capturing groups.
* Matches the preceding element zero or more times. For example, ab*c matches "ac",
"abc", "abbbc", etc. [xyz]* matches "", "x", "y", "z", "zx", "zyx", "xyzzy", and so on. (ab)*
matches "", "ab", "abab", "ababab", and so on.
{m,n} Matches the preceding element at least m and not more than n times. For example, a{3,5}
matches only "aaa", "aaaa", and "aaaaa". This is not found in a few older instances of
regexes. BRE mode requires \{m,n\}.

Examples:

.at matches any three-character string ending with "at", including "hat", "cat", and "bat".
[hc]at matches "hat" and "cat".

[^b]at matches all strings matched by .at except "bat".
[^hc]at matches all strings matched by .at other than "hat" and "cat".
^[hc]at matches "hat" and "cat", but only at the beginning of the string or line.
[hc]at$ matches "hat" and "cat", but only at the end of the string or line.
\[.\] matches any single character surrounded by "[" and "]" since the brackets are escaped, for
example: "[a]" and "[b]".
s.* matches s followed by zero or more characters, for example: "s" and "saw" and "seed".

POSIX extended

The meaning of metacharacters escaped with a backslash is reversed for some characters in the POSIX
Extended Regular Expression (ERE) syntax. With this syntax, a backslash causes the metacharacter to be
treated as a literal character. So, for example, \( \) is now ( ) and \{ \} is now { }. Additionally, support is
removed for \n backreferences and the following metacharacters are added:

Metacharacter Description
? Matches the preceding element zero or one time. For example, ab?c matches only "ac" or
"abc".
+ Matches the preceding element one or more times. For example, ab+c matches "abc",
"abbc", "abbbc", and so on, but not "ac".
| The choice (also known as alternation or set union) operator matches either the
expression before or the expression after the operator. For example, abc|def matches
"abc" or "def".

Examples:

[hc]?at matches "at", "hat", and "cat".


[hc]*at matches "at", "hat", "cat", "hhat", "chat", "hcat", "cchchat", and so on.
[hc]+at matches "hat", "cat", "hhat", "chat", "hcat", "cchchat", and so on, but not "at".
cat|dog matches "cat" or "dog".

POSIX Extended Regular Expressions can often be used with modern Unix utilities by including the
command line flag -E.

Character classes

The character class is the most basic regex concept after a literal match. It makes one small sequence of
characters match a larger set of characters. For example, [A-Z] could stand for the upper case alphabet,
and \d could mean any digit. Character classes apply to both POSIX levels.

When specifying a range of characters, such as [a-Z] (i.e. lowercase a to upper-case z), the computer's
locale settings determine the contents by the numeric ordering of the character encoding. They could
store digits in that sequence, or the ordering could be abc…zABC…Z, or aAbBcC…zZ. So the POSIX
standard defines a character class, which will be known by the regex processor installed. Those definitions
are in the following table:

POSIX          Non-standard  Perl/Tcl  Vim    Java             ASCII                         Description
[:ascii:][30]                                 \p{ASCII}        [\x00-\x7F]                   ASCII characters
[:alnum:]                                     \p{Alnum}        [A-Za-z0-9]                   Alphanumeric characters
               [:word:]      \w        \w     \w               [A-Za-z0-9_]                  Alphanumeric characters plus "_"[citation needed]
                             \W        \W     \W               [^A-Za-z0-9_]                 Non-word characters
[:alpha:]                              \a     \p{Alpha}        [A-Za-z]                      Alphabetic characters
[:blank:]                              \s     \p{Blank}        [ \t]                         Space and tab
                             \b        \< \>  \b               (?<=\W)(?=\w)|(?<=\w)(?=\W)   Word boundaries
                                              \B               (?<=\W)(?=\W)|(?<=\w)(?=\w)   Non-word boundaries
[:cntrl:]                                     \p{Cntrl}        [\x00-\x1F\x7F]               Control characters
[:digit:]                    \d        \d     \p{Digit} or \d  [0-9]                         Digits
                             \D        \D     \D               [^0-9]                        Non-digits
[:graph:]                                     \p{Graph}        [\x21-\x7E]                   Visible characters
[:lower:]                              \l     \p{Lower}        [a-z]                         Lowercase letters
[:print:]                              \p     \p{Print}        [\x20-\x7E]                   Visible characters and the space character
[:punct:]                                     \p{Punct}        [][!"#$%&'()*+,./:;<=>?@\^_`{|}~-]  Punctuation characters
[:space:]                    \s        \_s    \p{Space} or \s  [ \t\r\n\v\f]                 Whitespace characters
                             \S        \S     \S               [^ \t\r\n\v\f]                Non-whitespace characters
[:upper:]                              \u     \p{Upper}        [A-Z]                         Uppercase letters
[:xdigit:]                             \x     \p{XDigit}       [A-Fa-f0-9]                   Hexadecimal digits

POSIX character classes can only be used within bracket expressions. For example, [[:upper:]ab] matches
the uppercase letters and lowercase "a" and "b".

An additional non-POSIX class understood by some tools is [:word:], which is usually defined as [:alnum:]
plus underscore. This reflects the fact that in many programming languages these are the characters that
may be used in identifiers. The editor Vim further distinguishes word and word-head classes (using the
notation \w and \h) since in many programming languages the characters that can begin an identifier are
not the same as those that can occur in other positions.

Note that what the POSIX regex standards call character classes are commonly referred to as POSIX
character classes in other regex flavors which support them. With most other regex flavors, the term
character class is used to describe what POSIX calls bracket expressions.

2.2. Regular language

A regular language (also called a rational language)[1][2] is a formal language that can be expressed
using a regular expression, in the strict sense of the latter notion used in theoretical computer science (as
opposed to the many regular expression engines provided by modern programming languages, which are
augmented with features that allow recognition of languages that cannot be expressed by a classic regular
expression).

Alternatively, a regular language can be defined as a language recognized by a finite automaton. The
equivalence of regular expressions and finite automata is known as Kleene's theorem[3] (after American
mathematician Stephen Cole Kleene). In the Chomsky hierarchy, regular languages are defined to be the
languages that are generated by Type-3 grammars (regular grammars).

Regular languages are very useful in input parsing and programming language design.

Formal definition
The collection of regular languages over an alphabet Σ is defined recursively as follows:

The empty language Ø, and the empty string language {ε} are regular languages.
For each a ∈ Σ (a belongs to Σ), the singleton language {a} is a regular language.
If A and B are regular languages, then A ∪ B (union), A • B (concatenation), and A* (Kleene star) are
regular languages.
No other languages over Σ are regular.

See regular expression for its syntax and semantics. Note that the above cases are in effect the defining
rules of regular expressions.

Examples
All finite languages are regular; in particular the empty string language {ε} = Ø* is regular. Other typical
examples include the language consisting of all strings over the alphabet {a, b} which contain an even
number of as, or the language consisting of all strings of the form: several as followed by several bs.

A simple example of a language that is not regular is the set of strings { a^n b^n | n ≥ 0 }.[4] Intuitively, it
cannot be recognized with a finite automaton, since a finite automaton has finite memory and it cannot
remember the exact number of a's. Techniques to prove this fact rigorously are given below.

Closure properties

The regular languages are closed under the various operations, that is, if the languages K and L are
regular, so is the result of the following operations:

the set-theoretic Boolean operations: union K ∪ L, intersection K ∩ L, and the complement of L, hence also
the relative complement K − L[10] (a product-construction sketch for intersection is given after this list)
the regular operations: union K ∪ L, concatenation K ∘ L, and Kleene star L*.[11]
the trio operations: string homomorphism, inverse string homomorphism, and intersection with
regular languages. As a consequence they are closed under arbitrary finite state transductions, like
the quotient K / L with a regular language. Even more, regular languages are closed under quotients with
arbitrary languages: if L is regular then L/K is regular for any K.[citation needed]
the reverse (or mirror image) L^R.[citation needed]
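As one illustration of the Boolean closures, here is a minimal product construction for the intersection of two DFAs in Python; the (states, start, accepting, delta) encoding is an assumed convention of this sketch, not a standard library interface:

    def intersect(dfa1, dfa2):
        # Product construction: run both DFAs in lockstep on pairs of states.
        (Q1, s1, F1, d1), (Q2, s2, F2, d2) = dfa1, dfa2
        alphabet = {a for (_, a) in d1}
        delta = {((p, q), a): (d1[(p, a)], d2[(q, a)])
                 for p in Q1 for q in Q2 for a in alphabet}
        states = {(p, q) for p in Q1 for q in Q2}
        accepting = {(p, q) for p in F1 for q in F2}
        return states, (s1, s2), accepting, delta

    def accepts(dfa, word):
        _, q, accepting, delta = dfa
        for c in word:
            q = delta[(q, c)]
        return q in accepting

    # "even number of a's" intersected with "ends in b":
    even_a = ({"e", "o"}, "e", {"e"},
              {("e", "a"): "o", ("o", "a"): "e", ("e", "b"): "e", ("o", "b"): "o"})
    ends_b = ({"x", "y"}, "x", {"y"},
              {("x", "a"): "x", ("y", "a"): "x", ("x", "b"): "y", ("y", "b"): "y"})
    both = intersect(even_a, ends_b)
    print(accepts(both, "aab"), accepts(both, "ab"))   # True False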

The number of words in a regular language


Let s_L(n) denote the number of words of length n in L. The ordinary generating function for L is the
formal power series

S_L(z) = Σ_{n≥0} s_L(n) z^n.

The generating function of a language L is a rational function if L is regular.[26] Hence for any regular
language L there exist an integer constant n0, complex constants λ1, …, λk and complex polynomials
p1(x), …, pk(x) such that for every n ≥ n0 the number s_L(n) of words of length n in L is

s_L(n) = p1(n) λ1^n + ⋯ + pk(n) λk^n.[28][29][30][31]

Thus, non-regularity of certain languages L′ can be proved by counting the words of a given length in L′.
Consider, for example, the Dyck language of strings of balanced parentheses. The number of words of
length 2n in the Dyck language is equal to the Catalan number C_n ~ 4^n / (n^{3/2} √π), which is not of the form
p(n) λ^n, witnessing the non-regularity of the Dyck language. Care must be taken since some of the
eigenvalues λi could have the same magnitude. For example, the number of words of length n in the
language of all even binary words is not of the form p(n) λ^n, but the numbers of words of even or odd
length are of this form; the corresponding eigenvalues are 2 and −2. In general, for every regular language
there exists a constant d such that, in each residue class of lengths modulo d, the number of words of
length n is either eventually zero or asymptotically of the form C n^k λ^n.[32]
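The Dyck-language claim is easy to check empirically for small lengths; the brute-force counter below is only an illustration of the statement, not part of the cited results:

    from itertools import product
    from math import comb

    def dyck_count(n):
        # Count balanced parenthesis strings of length 2n by brute force.
        count = 0
        for s in product("()", repeat=2 * n):
            depth = 0
            for c in s:
                depth += 1 if c == "(" else -1
                if depth < 0:
                    break
            else:
                count += (depth == 0)
        return count

    for n in range(7):
        assert dyck_count(n) == comb(2 * n, n) // (n + 1)   # Catalan number C_n
    print("counts match the Catalan numbers")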

The zeta function of a language L is[26]

ζ_L(z) = exp( Σ_{n≥1} s_L(n) z^n / n ).

The zeta function of a regular language is not in general rational, but that of a cyclic language is.[33][34]

Generalizations
The notion of a regular language has been generalized to infinite words (see ω-automata) and to trees (see
tree automaton).

Rational set generalizes the notion (of regular/rational language) to monoids that are not necessarily free.
Likewise, the notion of a recognizable language (by a finite automaton) has its namesake in the recognizable
sets over a monoid that is not necessarily free. Howard Straubing notes in relation to these facts that “The
term "regular language" is a bit unfortunate. Papers influenced by Eilenberg's monograph[35] often use
either the term "recognizable language", which refers to the behavior of automata, or "rational language",
which refers to important analogies between regular expressions and rational power series. (In fact,
Eilenberg defines rational and recognizable subsets of arbitrary monoids; the two notions do not, in
general, coincide.) This terminology, while better motivated, never really caught on, and "regular
language" is used almost universally.”[36]

Rational series is another generalization, this time in the context of a formal power series over a semiring.
This approach gives rise to weighted rational expressions and weighted automata. In this algebraic
context, the regular languages (corresponding to Boolean-weighted rational expressions) are usually
called rational languages.[37][38] Also in this context, Kleene's theorem finds a generalization called the

Kleene-Schützenberger theorem.


2.3. Pumping lemma for regular languages

The pumping lemma for regular languages is a lemma that describes an essential property of all
regular languages. Informally, it says that all sufficiently long words in a regular language may be
pumped—that is, have a middle section of the word repeated an arbitrary number of times—to produce a
new word that also lies within the same language.

Specifically, the pumping lemma says that for any regular language L there exists a constant p such that
any word w in L with length at least p can be split into three substrings, w = xyz, where the middle
portion y must not be empty, such that the words xz, xyz, xyyz, xyyyz, … constructed by repeating y zero or
more times are still in L. This process of repetition is known as "pumping". Moreover, the pumping lemma
guarantees that the length of xy will be at most p, imposing a limit on the ways in which w may be split.
Finite languages vacuously satisfy the pumping lemma by having p equal to the maximum string length in
L plus one.

The pumping lemma is useful for disproving the regularity of a specific language in question. It was first
proven by Michael Rabin and Dana Scott in 1959.

Formal statement
Let L be a regular language. Then there exists an integer p ≥ 1 depending only on L such that every string
w in L of length at least p (p is called the "pumping length"[4]) can be written as w = xyz (i.e., w can be
divided into three substrings), satisfying the following conditions:

1. |y| ≥ 1
2. |xy| ≤ p
3. xy^n z ∈ L for all n ≥ 0

y is the substring that can be pumped (removed or repeated any number of times, and the resulting string
is always in L). (1) means the loop y to be pumped must be of length at least one; (2) means the loop must
occur within the first p characters. |x| must be smaller than p (a consequence of (1) and (2)), but apart from
that, there is no restriction on x and z.

In simple words, for any regular language L, any sufficiently long word w (in L) can be split into 3 parts,
i.e. w = xyz, such that all the strings xy^n z for n ≥ 0 are also in L.

Below is a formal expression of the pumping lemma:

(∀L ⊆ Σ*)(regular(L) ⇒ ((∃p ≥ 1)((∀w ∈ L)((|w| ≥ p) ⇒ ((∃x, y, z ∈ Σ*)(w = xyz ∧ |y| ≥ 1 ∧ |xy| ≤ p ∧ (∀n ≥ 0)(xy^n z ∈ L)))))))
Use of the lemma
The pumping lemma is often used to prove that a particular language is non-regular: a proof by
contradiction (of the language's regularity) may consist of exhibiting a word (of the required length) in the
language that lacks the property outlined in the pumping lemma.

For example, the language L = {a^n b^n : n ≥ 0} over the alphabet Σ = {a, b} can be shown to be
non-regular as follows. Let w, x, y, z, p, and i be as used in the formal statement for the pumping lemma
above. Let w in L be given by w = a^p b^p. By the pumping lemma, there must be some decomposition w =
xyz with |xy| ≤ p and |y| ≥ 1 such that xy^i z is in L for every i ≥ 0. Using |xy| ≤ p, we know y only consists of
instances of a. Moreover, because |y| ≥ 1, it contains at least one instance of the letter a. We now pump y
up: xy^2 z has more instances of the letter a than the letter b, since we have added some instances of a
without adding instances of b. Therefore, xy^2 z is not in L. We have reached a contradiction. Therefore, the
assumption that L is regular must be incorrect. Hence L is not regular.

The proof that the language of balanced (i.e., properly nested) parentheses is not regular follows the same
idea. Given p, there is a string of balanced parentheses that begins with more than p left parentheses, so
that y will consist entirely of left parentheses. By repeating y, we can produce a string that does not
contain the same number of left and right parentheses, and so it cannot be balanced.

Proof of the pumping lemma

Proof idea: whenever a sufficiently long string xyz is recognized by a finite automaton, it must have
reached some state (qs = qt) twice. Hence, after repeating ("pumping") the middle part y arbitrarily often
(xyyz, xyyyz, …) the word will still be recognized.

For every regular language there is a finite state automaton (FSA) that accepts the language. The number
of states in such an FSA is counted, and that count is used as the pumping length p. For a string of length
at least p, let q0 be the start state and let q1, ..., qp be the sequence of the next p states visited as the
string is read. Because the FSA has only p states, within this sequence of p + 1 visited states there
must be at least one state that is repeated. Write qs for such a state. The transitions that take the machine
from the first encounter of state qs to the second encounter of state qs match some string. This string is
called y in the lemma, and since the machine will match a string without the y portion, or with the string y
repeated any number of times, the conditions of the lemma are satisfied.

For example, the following image shows an FSA.

The FSA accepts the string abcd. Since this string has a length at least as large as the number of states,
which is four, the pigeonhole principle indicates that there must be at least one repeated state among the
start state and the next four visited states. In this example, only q1 is a repeated state. Since the substring
bc takes the machine through transitions that start at state q1 and end at state q1, that portion could be
repeated and the FSA would still accept, giving the string abcbcd. Alternatively, the bc portion could be
removed and the FSA would still accept, giving the string ad. In terms of the pumping lemma, the string
abcd is broken into an x portion a, a y portion bc and a z portion d.
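The repeated-state argument is directly executable. The sketch below, with an assumed dictionary encoding of the transition function, finds the decomposition w = xyz for the abcd example from the text:

    def pump_split(delta, start, word):
        # Follow the run of the DFA and split the word at the first
        # repeated state, as in the proof: returns (x, y, z).
        seen = {start: 0}
        q = start
        for i, c in enumerate(word, 1):
            q = delta[(q, c)]
            if q in seen:
                s = seen[q]
                return word[:s], word[s:i], word[i:]
            seen[q] = i
        raise ValueError("no state was repeated (word too short)")

    delta = {("q0", "a"): "q1", ("q1", "b"): "q2",
             ("q2", "c"): "q1", ("q1", "d"): "q3"}
    print(pump_split(delta, "q0", "abcd"))   # ('a', 'bc', 'd')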

General version of pumping lemma for regular languages
If a language L is regular, then there exists a number p ≥ 1 (the pumping length) such that every string
uwv in L with |w| ≥ p can be written in the form

uwv = uxyzv

with strings x, y and z such that |xy| ≤ p, |y| ≥ 1 and

uxyizv is in L for every integer i ≥ 0.[5]

From this, the standard version above follows as a special case, with both u and v being the empty string.

Since the general version imposes stricter requirements on the language, it can be used to prove the
non-regularity of many more languages, such as { a^m b^n c^n : m ≥ 1 and n ≥ 1 }.[6]

Converse of lemma not true


While the pumping lemma states that all regular languages satisfy the conditions described above, the
converse of this statement is not true: a language that satisfies these conditions may still be non-regular.
In other words, both the original and the general version of the pumping lemma give a necessary but not
sufficient condition for a language to be regular.

For example, consider the following language L:

L = {uvwxy : u, y ∈ {0,1,2,3}*; v, w, x ∈ {0,1,2,3}; (v = w or v = x or w = x)}
    ∪ {w ∈ {0,1,2,3}* : precisely 1/7 of the characters of w are 3's}

In other words, L contains all strings over the alphabet {0,1,2,3} with a substring of length 3 including a
duplicate character, as well as all strings over this alphabet where precisely 1/7 of the string's characters
are 3's. This language is not regular but can still be "pumped" with p = 5. Suppose some string s has
length at least 5. Then, since the alphabet has only four characters, at least two of the first five characters
in the string must be duplicates. They are separated by at most three characters.

If the duplicate characters are separated by 0 characters, or 1, pump one of the other two characters
in the string, which will not affect the substring containing the duplicates.
If the duplicate characters are separated by 2 or 3 characters, pump 2 of the characters separating
them. Pumping either down or up results in the creation of a substring of size 3 that contains 2
duplicate characters.
The second condition of L ensures that L is not regular: there are infinitely many strings that belong to L
only by virtue of the 1/7 condition, and they induce infinitely many pairwise distinguishable prefixes, so L
is not regular by the Myhill-Nerode theorem.

The Myhill-Nerode theorem provides a test that exactly characterizes regular languages. The typical
method for proving that a language is regular is to construct either a finite state machine or a regular
expression for the language.

3. Context-free grammar

In formal language theory, a context-free grammar (CFG) is a certain type of formal grammar: a set of
production rules that describe all possible strings in a given formal language. Production rules are simple
replacements. For example, the rule

A → a

replaces A with a. There can be multiple replacement rules for any given nonterminal. For example,

A → a
A → b

means that A can be replaced with either a or b.

In context-free grammars, all rules are one-to-one, one-to-many, or one-to-none. These rules can be applied
regardless of context. The left-hand side of the production rule is always a nonterminal symbol. This
means that the symbol does not appear in the resulting formal language. So in our case, our language
contains the letters a and b but not A.[1]

Rules can also be applied in reverse to check if a string is grammatically correct according to the
grammar.

Here is an example context-free grammar that describes all two-letter strings containing the letters a and b:

S → AA
A → a
A → b

If we start with the nonterminal symbol S then we can use the rule S → AA to turn S into AA. We can
then apply one of the two later rules. For example, if we apply A → a to the first A we get aA. If we then
apply A → b to the second A we get ab. Since both a and b are terminal symbols, and in context-free
grammars terminal symbols never appear on the left hand side of a production rule, there are no more
rules that can be applied. This same process can be used, applying the second two rules in different orders,
to get all possible strings within our simple context-free grammar.

Formal definitions
A context-free grammar G is defined by the 4-tuple G = (V, Σ, R, S),[5]
where

1. V is a finite set; each element is called a nonterminal character or a variable. Each variable
represents a different type of phrase or clause in the sentence. Variables are also sometimes called
syntactic categories. Each variable defines a sub-language of the language defined by G.
2. Σ is a finite set of terminals, disjoint from V, which make up the actual content of the sentence. The
set of terminals is the alphabet of the language defined by the grammar G.
3. R is a finite relation from V to (V ∪ Σ)*, where the asterisk represents the Kleene star operation. The
members of R are called the (rewrite) rules or productions of the grammar (also commonly
symbolized by a P).
4. S is the start variable (or start symbol), used to represent the whole sentence (or program). It must
be an element of V.

Production rule notation

A production rule in R is formalized mathematically as a pair (α, β) ∈ R, where α ∈ V is a nonterminal
and β ∈ (V ∪ Σ)* is a string of variables and/or terminals; rather than using ordered pair notation,
production rules are usually written using an arrow operator with α as its left hand side and β as its right
hand side: α → β.

It is allowed for β to be the empty string, and in this case it is customary to denote it by ε. The form α → ε
is called an ε-production.[6]

It is common to list all right-hand sides for the same left-hand side on the same line, using | (the pipe
symbol) to separate them. Rules α → β1 and α → β2 can hence be written as α → β1 | β2. In this case,
β1 and β2 are called the first and second alternative, respectively.

Rule application

For any strings u, v ∈ (V ∪ Σ)*, we say u directly yields v, written as u ⇒ v, if there exist (α, β) ∈ R
and u1, u2 ∈ (V ∪ Σ)* such that u = u1 α u2 and v = u1 β u2. Thus, v is a result of applying the rule α → β
to u.

Repetitive rule application

For any strings u, v ∈ (V ∪ Σ)*, we say u yields v, written as u ⇒* v (or u ⇒⇒ v in some textbooks), if
there is an integer k ≥ 1 and strings u1, …, uk ∈ (V ∪ Σ)* such that u = u1 ⇒ u2 ⇒ … ⇒ uk = v. In this case,
if k ≥ 2 (i.e., at least one rule has been applied), the relation u ⇒+ v holds. In other words, ⇒* and ⇒+ are
the reflexive transitive closure (allowing a word to yield itself) and the transitive closure (requiring at
least one step) of ⇒, respectively.

Proper CFGs

A context-free grammar is said to be proper[7] if it has

no unreachable symbols: for every N ∈ V there are α, β ∈ (V ∪ Σ)* such that S ⇒* αNβ
no unproductive symbols: for every N ∈ V there is a w ∈ Σ* such that N ⇒* w
no ε-productions: there is no N ∈ V with (N, ε) ∈ R
no cycles: there is no N ∈ V such that N ⇒+ N

Every context-free grammar can be effectively transformed into a weakly equivalent one without
unreachable symbols,[8] a weakly equivalent one without unproductive symbols,[9] and a weakly
equivalent one without cycles.[10] Every context-free grammar not producing ε can be effectively
transformed into a weakly equivalent one without ε-productions;[11] altogether, every such grammar can
be effectively transformed into a weakly equivalent proper CFG.

Example

The grammar G = ({S}, {a, b}, P, S), with productions

S → aSa,
S → bSb,
S → ε,

is context-free. It is not proper since it includes an ε-production. A typical derivation in this grammar is

S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabbaa.

This makes it clear that L(G) = {ww^R : w ∈ {a, b}*}, the language of even-length palindromes over
{a, b}. The language is context-free; however, it can be proved that it is not regular.

3.1. Production (computer science)

A production or production rule in computer science is a rewrite rule specifying a symbol substitution
that can be recursively performed to generate new symbol sequences. A finite set of productions P is the
main component in the specification of a formal grammar (specifically a generative grammar). The other
components are a finite set N of nonterminal symbols, a finite set (known as an alphabet) Σ of terminal
symbols that is disjoint from N, and a distinguished symbol S ∈ N that is the start symbol.

In an unrestricted grammar, a production is of the form u → v, where u and v are arbitrary strings of
terminals and nonterminals; however, u may not be the empty string. If v is the empty string, this is
denoted by the symbol ε or λ (rather than leaving the right-hand side blank). So productions are members
of the cartesian product

V*NV* × V*,

where V := N ∪ Σ is the vocabulary, * is the Kleene star operator, V*NV* indicates concatenation (a string
containing at least one nonterminal), and ∪ denotes set union. If we do not allow the start symbol to occur
in v (the word on the right side), we have to replace V* by (V \ {S})* on the right side of the cartesian
product symbol.[1]

The other types of formal grammar in the Chomsky hierarchy impose additional restrictions on what
constitutes a production. Notably, in a context-free grammar the left-hand side of a production must be a
single nonterminal symbol, so productions are of the form

A → w, where A ∈ N and w ∈ (N ∪ Σ)*.
Grammar generation
To generate a string in the language, one begins with a string consisting of only a single start symbol, and
then successively applies the rules (any number of times, in any order) to rewrite this string. This stops
when we obtain a string containing only terminals. The language consists of all the strings that can be
generated in this manner. Any particular sequence of legal choices taken during this rewriting process
yields one particular string in the language. If there are multiple different ways of generating this single
string, then the grammar is said to be ambiguous.

For example, assume the alphabet consists of a and b, with the start symbol S, and we have the following
rules:

1. S → aSb
2. S → ba

We start with S, and can choose a rule to apply to it. If we choose rule 1, we replace S with aSb and
obtain the string aSb. If we choose rule 1 again, we replace the S with aSb and obtain the string aaSbb. This
process is repeated until we only have symbols from the alphabet (i.e., a and b). If we now choose rule 2,
we replace S with ba and obtain the string aababb, and are done. We can write this series of choices more
briefly, using symbols: S ⇒ aSb ⇒ aaSbb ⇒ aababb. The language of the grammar is the set of all the
strings that can be generated using this process: {ba, abab, aababb, aaababbb, …} = {a^n ba b^n : n ≥ 0}.
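This rewriting process is easy to mechanize. The Python sketch below enumerates, by breadth-first leftmost rewriting, all terminal strings the example grammar generates up to a length bound; encoding nonterminals as uppercase letters is an assumption of the sketch:

    from collections import deque

    def generate(rules, start="S", max_len=10):
        # Leftmost rewriting, breadth-first; nonterminals are uppercase letters.
        results, queue, seen = set(), deque([start]), {start}
        while queue:
            s = queue.popleft()
            i = next((k for k, c in enumerate(s) if c.isupper()), None)
            if i is None:                      # no nonterminal left: a sentence
                if len(s) <= max_len:
                    results.add(s)
                continue
            for rhs in rules[s[i]]:
                t = s[:i] + rhs + s[i + 1:]
                # +1 slack: the final S -> ba step grows the string by one
                if len(t) <= max_len + 1 and t not in seen:
                    seen.add(t)
                    queue.append(t)
        return results

    print(sorted(generate({"S": ["aSb", "ba"]}), key=len))
    # ['ba', 'abab', 'aababb', 'aaababbb', 'aaaababbbb']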

3.2. Context-free language

In formal language theory, a context-free language (CFL) is a language generated by a context-free
grammar (CFG).

Context-free languages have many applications in programming languages, in particular, most arithmetic
expressions are generated by context-free grammars.

Context-free parsing

Main article: Parsing

The context-free nature of the language makes it simple to parse with a pushdown automaton.

Determining an instance of the membership problem, i.e. given a string w, determining whether w ∈ L(G),
where L(G) is the language generated by a given grammar G, is also known as recognition. Context-free
recognition for Chomsky normal form grammars was shown by Leslie G. Valiant to be reducible to boolean
matrix multiplication, thus inheriting its complexity upper bound of O(n^2.3728639).[2][3][note 2] Conversely,
Lillian Lee has shown O(n^(3−ε)) boolean matrix multiplication to be reducible to O(n^(3−3ε)) CFG parsing, thus
establishing some kind of lower bound for the latter.[4]

Practical uses of context-free languages require also to produce a derivation tree that exhibits the
structure that the grammar associates with the given string. The process of producing this tree is called
parsing. Known parsers have a time complexity that is cubic in the size of the string that is parsed.

Formally, the set of all context-free languages is identical to the set of languages accepted by pushdown
automata (PDA). Parser algorithms for context-free languages include the CYK algorithm and Earley's
Algorithm.
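As an illustration of recognition, here is a compact CYK recognizer in Python for a grammar in Chomsky normal form; the particular grammar shown (for {a^n b^n : n ≥ 1}) and the dictionary encoding are assumptions of the sketch, not taken from the text:

    def cyk(word, unary, binary, start="S"):
        # unary:  terminal -> set of A with A -> terminal
        # binary: (B, C)   -> set of A with A -> B C
        n = len(word)
        if n == 0:
            return False        # CNF without S -> eps derives no empty string
        table = [[set() for _ in range(n)] for _ in range(n)]
        for i, c in enumerate(word):                  # spans of length 1
            table[0][i] = set(unary.get(c, ()))
        for length in range(2, n + 1):                # longer spans
            for i in range(n - length + 1):
                for k in range(1, length):            # split point
                    for B in table[k - 1][i]:
                        for C in table[length - k - 1][i + k]:
                            table[length - 1][i] |= binary.get((B, C), set())
        return start in table[n - 1][0]

    # CNF grammar for { a^n b^n : n >= 1 }:  S -> AT | AB, T -> SB, A -> a, B -> b
    unary = {"a": {"A"}, "b": {"B"}}
    binary = {("A", "T"): {"S"}, ("A", "B"): {"S"}, ("S", "B"): {"T"}}
    print(cyk("aabb", unary, binary), cyk("aab", unary, binary))   # True False

The three nested span loops give the cubic running time mentioned above.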

A special subclass of context-free languages are the deterministic context-free languages which are
defined as the set of languages accepted by a deterministic pushdown automaton and can be parsed by a
LR(k) parser.[5]

See also parsing expression grammar as an alternative approach to grammar and parser.

Closure

Context-free languages are closed under the following operations. That is, if L and P are context-free
languages, the following languages are context-free as well:

the union of L and P
the reversal of L
the concatenation of L and P
the Kleene star of L
the image of L under a homomorphism
the image of L under an inverse homomorphism
the cyclic shift of L (the language {vu : uv ∈ L})

Context-free languages are not closed under complement, intersection, or difference. This was proved by
Scheinberg in 1960.[6] However, if L is a context-free language and D is a regular language then both their
intersection and their difference are context-free languages.

Nonclosure under intersection, complement, and difference

The context-free languages are not closed under intersection. This can be seen by taking the languages
A = {a^n b^n c^m : m, n ≥ 0} and B = {a^m b^n c^n : m, n ≥ 0}, which are both context-free.[note 3] Their
intersection is A ∩ B = {a^n b^n c^n : n ≥ 0}, which can be shown to be non-context-free by the pumping
lemma for context-free languages.

Context-free languages are also not closed under complementation, as for any languages A and B we have
A ∩ B = (A^c ∪ B^c)^c (where X^c denotes the complement Σ* \ X), so closure under complement would,
together with closure under union, imply closure under intersection.

Context-free languages are also not closed under difference, since the complement L^c = Σ* \ L is the
difference of the context-free language Σ* and L.[6]

Languages that are not context-free

The set {a^n b^n c^n : n ≥ 1} is a context-sensitive language, but there does not exist a context-free grammar
generating this language.[19] So there exist context-sensitive languages which are not context-free. To
prove that a given language is not context-free, one may employ the pumping lemma for context-free
languages[18] or a number of other methods, such as Ogden's lemma or Parikh's theorem.[20]

3.3. Ambiguous grammar

An ambiguous grammar is a context-free grammar for which there exists a string that can have more
than one leftmost derivation, while an unambiguous grammar is a context-free grammar for which every
valid string has a unique leftmost derivation or parse tree. Many languages admit both ambiguous and
unambiguous grammars, while some languages admit only ambiguous grammars. Any non-empty language
admits an ambiguous grammar by taking an unambiguous grammar and introducing a duplicate rule or
synonym (the only language without ambiguous grammars is the empty language).

For computer programming languages, the reference grammar is often ambiguous, due to issues such as
the dangling else problem. If present, these ambiguities are generally resolved by adding precedence
rules or other context-sensitive parsing rules, so the overall phrase grammar is unambiguous.
[citation needed] The set of all parse trees for an ambiguous sentence is called a parse forest.[1]

Examples
Trivial language

The simplest example is the following ambiguous grammar for the trivial language, which consists of only
the empty string:

A → A | ε

…meaning that the nonterminal A can be rewritten either to itself again or to the empty string. Thus the
empty string has leftmost derivations of length 1, 2, 3, and indeed of any length, depending on how many
times the rule A → A is used.

This language also has an unambiguous grammar, consisting of a single production rule:

A → ε

…meaning that the unique production can produce only the empty string, which is the unique string in the
language.

In the same way, any grammar for a non-empty language can be made ambiguous by adding duplicates.

Unary string

The regular language of unary strings of a given character, say 'a' (the regular expression a*), has the
unambiguous grammar:

A → aA | ε

…but also has the ambiguous grammar:

A → aA | Aa | ε

These correspond to producing a right-associative tree (for the unambiguous grammar) or allowing both
left- and right-association. This is elaborated below.

Addition and subtraction

The context free grammar

A → A + A | A − A | A * A | id

is ambiguous since there are two leftmost derivations for the string a + a + a:

A → A + A → a + A → a + A + A → a + a + A → a + a + a
A → A + A → A + A + A → a + A + A → a + a + A → a + a + a

(In the second derivation, the first A is replaced by A + A; replacing the second A instead would yield a
similar derivation.)

As another example, the grammar above is ambiguous since there are two parse trees for the string a + a − a.

The language that it generates, however, is not inherently ambiguous; the following is a non-ambiguous
grammar generating the same language:

A → A + a | A − a | a

Dangling else

Main article: Dangling else

A common example of ambiguity in computer programming languages is the dangling else problem. In
many languages, the else in an If–then(–else) statement is optional, which results in nested conditionals
having multiple ways of being recognized in terms of the context-free grammar.

Concretely, in many languages one may write conditionals in two valid forms: the if-then form, and the
if-then-else form; in effect, making the else clause optional:[note 1]

if Condition then Statement
if Condition then Statement else Statement

In a grammar containing the rules


Statement → if Condition then Statement |
if Condition then Statement else Statement |
...
Condition → ...

some ambiguous phrase structures can appear. The expression


if a then if b then s else s2

can be parsed as either
if a then begin if b then s end else s2

or as
if a then begin if b then s else s2 end

depending on whether the else is associated with the first if or the second if.

This is resolved in various ways in different languages. Sometimes the grammar is modified so that it is
unambiguous, such as by requiring an endif statement or making else mandatory. In other cases the
grammar is left ambiguous, but the ambiguity is resolved by making the overall phrase grammar context-
sensitive, such as by associating an else with the nearest if. In this latter case the language is
unambiguous, but the context-free grammar is ambiguous.[clarification needed]

Counter-example

The simple grammar


S → A + A
A → 0 | 1

is an unambiguous grammar for the language { 0+0, 0+1, 1+0, 1+1 }. While each of these four strings
has only one leftmost derivation, each has two different derivations in general, for example
S ⇒ A + A ⇒ 0 + A ⇒ 0 + 0

and
S ⇒ A + A ⇒ A + 0 ⇒ 0 + 0

Only the former one is a leftmost one.

Recognizing ambiguous grammars


The decision problem of whether an arbitrary grammar is ambiguous is undecidable because it can be
shown that it is equivalent to the Post correspondence problem.[2] However, there are tools implementing
some semi-decision procedure for detecting ambiguity of context-free grammars.[3]

The efficiency of context-free grammar parsing is determined by the automaton that accepts it.
Deterministic context-free grammars are accepted by deterministic pushdown automata and can be
parsed in linear time, for example by the LR parser.[4] This is a subset of the context-free grammars which
are accepted by the pushdown automaton and can be parsed in polynomial time, for example by the CYK
algorithm. Unambiguous context-free grammars can be nondeterministic.

For example, the language of even-length palindromes on the alphabet of 0 and 1 has the unambiguous
context-free grammar S → 0S0 | 1S1 | ε. An arbitrary string of this language cannot be parsed without
reading all its letters first, which means that a pushdown automaton has to try alternative state transitions
to accommodate the different possible lengths of a semi-parsed string.[5] Nevertheless, removing
grammar ambiguity may produce a deterministic context-free grammar and thus allow for more efficient
parsing. Compiler generators such as YACC include features for resolving some kinds of ambiguity, such
as by using the precedence and associativity constraints.

Inherently ambiguous languages


The existence of inherently ambiguous languages was proven with Parikh's theorem in 1961 by Rohit
Parikh in an MIT research report.[6]

While some context-free languages (the set of strings that can be generated by a grammar) have both
ambiguous and unambiguous grammars, there exist context-free languages for which no unambiguous
context-free grammar can exist. An example of an inherently ambiguous language is the union of
{a^n b^m c^m d^n : n, m > 0} with {a^n b^n c^m d^m : n, m > 0}. This set is context-free, since the union of two
context-free languages is always context-free. But Hopcroft & Ullman (1979) give a proof that there is no
way to unambiguously parse strings in the (non-context-free) common subset {a^n b^n c^n d^n : n > 0}.[7]

3.4. Chomsky normal form

In formal language theory, a context-free grammar G is said to be in Chomsky normal form (first
described by Noam Chomsky)[1] if all of its production rules are of the form:[2]:92–93,106

A → BC,   or
A → a,   or
S → ε,

where A, B, and C are nonterminal symbols, a is a terminal symbol (a symbol that represents a constant
value), S is the start symbol, and ε denotes the empty string. Also, neither B nor C may be the start
symbol, and the third production rule can only appear if ε is in L(G), the language produced by the
context-free grammar G.

Every grammar in Chomsky normal form is context-free, and conversely, every context-free grammar can
be transformed into an equivalent one[note 1] which is in Chomsky normal form and has a size no larger
than the square of the original grammar's size.

Converting a grammar to Chomsky normal form


To convert a grammar to Chomsky normal form, a sequence of simple transformations is applied in a
certain order; this is described in most textbooks on automata theory. [2]:87–94[3][4][5] The presentation
here follows Hopcroft, Ullman (1979), but is adapted to use the transformation names from Lange, Leiß
(2009).[6][note 2] Each of the following transformations establishes one of the properties required for
Chomsky normal form.

START: Eliminate the start symbol from right-hand sides

Introduce a new start symbol S0, and a new rule

S0 → S,

where S is the previous start symbol. This doesn't change the grammar's produced language, and S0 won't
occur on any rule's right-hand side.

TERM: Eliminate rules with nonsolitary terminals

To eliminate each rule

A → X1 ... a ... Xn

with a terminal symbol a being not the only symbol on the right-hand side, introduce, for every such
terminal, a new nonterminal symbol Na, and a new rule

Na → a.

Change every rule

A → X1 ... a ... Xn

to

A → X1 ... Na ... Xn.

If several terminal symbols occur on the right-hand side, simultaneously replace each of them by its
associated nonterminal symbol. This doesn't change the grammar's produced language. [2]:92

BIN: Eliminate right-hand sides with more than 2 nonterminals

Replace each rule

A → X1 X2 ... Xn

with more than 2 nonterminals X1,...,Xn by rules

A → X1 A1,
A1 → X2 A2,
... ,
An-2 → Xn-1 Xn,

where Ai are new nonterminal symbols. Again, this doesn't change the grammar's produced language.
[2]:93

DEL: Eliminate ε-rules

An ε-rule is a rule of the form

A → ε,

where A is not S0, the grammar's start symbol.

To eliminate all rules of this form, first determine the set of all nonterminals that derive ε. Hopcroft and
Ullman (1979) call such nonterminals nullable, and compute them as follows:

If a rule A → ε exists, then A is nullable.


If a rule A → X1 ... Xn exists, and every single Xi is nullable, then A is nullable, too.

Obtain an intermediate grammar by replacing each rule

A → X1 ... Xn

by all versions with some nullable Xi omitted. By deleting in this grammar each ε-rule, unless its left-hand
side is the start symbol, the transformed grammar is obtained.[2]:90

For example, in the following grammar, with start symbol S0,

S0 → AbB | C
B → AA | AC
C→b|c
A→a|ε

the nonterminal A, and hence also B, is nullable, while neither C nor S0 is. Hence the following
intermediate grammar is obtained:[note 3]

S0 → AbB | Ab | bB | b   |   C
B → AA | A | A | ε   |   AC | C
C → b | c
A → a | ε

In this grammar, all ε-rules have been "inlined at the call site".[note 4] In the next step, they can hence be
deleted, yielding the grammar:

S0 → AbB | Ab | bB | b   |   C
B → AA | A   |   AC | C
C→b|c
A→a

This grammar produces the same language as the original example grammar, viz.
{ab,aba,abaa,abab,abac,abb,abc,b,bab,bac,bb,bc,c}, but apparently has no ε-rules.
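The set of nullable nonterminals is the least fixed point of the two rules above, so it can be computed by
simple iteration until nothing changes. Below is a minimal Python sketch under an assumed encoding of
rules as (left-hand side, right-hand side) pairs, with () standing for an ε right-hand side; run on the
example grammar it reports exactly A and B as nullable:

def nullable(rules):
    """Fixed-point computation of the nullable nonterminals (step DEL)."""
    null = {lhs for lhs, rhs in rules if rhs == ()}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in null and rhs and all(x in null for x in rhs):
                null.add(lhs)
                changed = True
    return null

# The example grammar from the text (terminals in lowercase):
rules = [
    ('S0', ('A', 'b', 'B')), ('S0', ('C',)),
    ('B', ('A', 'A')), ('B', ('A', 'C')),
    ('C', ('b',)), ('C', ('c',)),
    ('A', ('a',)), ('A', ()),
]
print(nullable(rules))  # {'A', 'B'}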

UNIT: Eliminate unit rules

A unit rule is a rule of the form

A → B,

where A, B are nonterminal symbols. To remove it, for each rule

B → X1 ... Xn,

where X1 ... Xn is a string of nonterminals and terminals, add rule

A → X1 ... Xn

unless this is a unit rule which has already been (or is being) removed.

Order of transformations

When choosing the order in which the above transformations are to be applied, it has to be considered
that some transformations may destroy the result achieved by other ones. For example, START will
re-introduce a unit rule if it is applied after UNIT. The table below shows which orderings are admitted:
transformation X always preserves (✓) resp. may destroy (✗) the result of Y.

X\Y     START   TERM   BIN   DEL   UNIT
START           ✓      ✓     ✗     ✗
TERM    ✓              ✗     ✓     ✓
BIN     ✓       ✓            ✓     ✓
DEL     ✓       ✓      ✓           ✗
UNIT    ✓       ✓      ✓     (✓)*

*UNIT preserves the result of DEL if START had been called before.

Moreover, the worst-case bloat in grammar size[note 5] depends on the transformation order. Using |G| to
denote the size of the original grammar G, the size blow-up in the worst case may range from |G|^2 to
2^(2|G|), depending on the transformation algorithm used.[6]:7 The blow-up in grammar size depends on
the order between DEL and BIN. It may be exponential when DEL is done first, but is linear otherwise.
UNIT can incur a quadratic blow-up in the size of the grammar.[6]:5 The orderings
START, TERM, BIN, DEL, UNIT and START, BIN, DEL, UNIT, TERM lead to the least (i.e. quadratic) blow-up.

Example

Abstract syntax tree of the arithmetic expression "a^2+4*b" w.r.t. the example grammar (top) and its
Chomsky normal form (bottom)

The following grammar, with start symbol Expr, describes a simplified version of the set of all
syntactically valid arithmetic expressions in programming languages like C or Algol60. Both number and
variable are considered terminal symbols here for simplicity, since in a compiler front end their internal
structure is usually not considered by the parser. The terminal symbol "^" denotes exponentiation in
Algol60.

Expr → Term | Expr AddOp Term | AddOp Term


Term → Factor | Term MulOp Factor
Factor → Primary | Factor ^ Primary
Primary → number | variable | ( Expr )
AddOp → + | −
MulOp → * | /

In step "START" of the above conversion algorithm, just a rule S0→Expr is added to the grammar. After
step "TERM", the grammar looks like this:

S0 → Expr
Expr → Term | Expr AddOp Term | AddOp Term
Term → Factor | Term MulOp Factor
Factor → Primary | Factor PowOp Primary
Primary → number | variable | Open Expr Close
AddOp → + | −
MulOp → * | /
PowOp → ^
Open → (
Close → )

After step "BIN", the following grammar is obtained:

S0 → Expr
Expr → Term | Expr AddOp_Term | AddOp Term
Term → Factor | Term MulOp_Factor
Factor → Primary | Factor PowOp_Primary
Primary → number | variable | Open Expr_Close
AddOp → + | −
MulOp → * | /
PowOp → ^
Open → (
Close → )
AddOp_Term → AddOp Term
MulOp_Factor → MulOp Factor
PowOp_Primary → PowOp Primary
Expr_Close → Expr Close

Since there are no ε-rules, step "DEL" doesn't change the grammar. After step "UNIT", the following
grammar is obtained, which is in Chomsky normal form:

S0 → number | variable | Open Expr_Close | Factor PowOp_Primary | Term MulOp_Factor | Expr AddOp_Term | AddOp Term
Expr → number | variable | Open Expr_Close | Factor PowOp_Primary | Term MulOp_Factor | Expr AddOp_Term | AddOp Term
Term → number | variable | Open Expr_Close | Factor PowOp_Primary | Term MulOp_Factor
Factor → number | variable | Open Expr_Close | Factor PowOp_Primary
Primary → number | variable | Open Expr_Close
AddOp → + | −
MulOp → * | /
PowOp → ^
Open → (
Close → )
AddOp_Term → AddOp Term
MulOp_Factor → MulOp Factor
PowOp_Primary → PowOp Primary
Expr_Close → Expr Close

The Na introduced in step "TERM" are PowOp, Open, and Close. The Ai introduced in step "BIN" are
AddOp_Term, MulOp_Factor, PowOp_Primary, and Expr_Close.

3.5. Greibach normal form

In formal language theory, a context-free grammar is in Greibach normal form (GNF) if the right-hand
sides of all production rules start with a terminal symbol, optionally followed by some variables. A
non-strict form allows one exception to this format restriction for allowing the empty word (epsilon, ε) to
be a member of the described language. The normal form was established by Sheila Greibach and it bears
her name.

More precisely, a context-free grammar is in Greibach normal form, if all production rules are of the form:

A → a A1 A2 ... An   or
S → ε,

where A is a nonterminal symbol, a is a terminal symbol, A1 A2 ... An is a (possibly empty) sequence of
nonterminal symbols not including the start symbol, S is the start symbol, and ε is the empty word.[1]

Observe that the grammar does not have left recursions.

Every context-free grammar can be transformed into an equivalent grammar in Greibach normal form.[2]
Various constructions exist. Some do not permit the second form of rule and cannot transform context-free
grammars that can generate the empty word. For one such construction the size of the constructed
grammar is O(n^4) in the general case and O(n^3) if no derivation of the original grammar consists of a
single nonterminal symbol, where n is the size of the original grammar.[3] This conversion can be used to
prove that every context-free language can be accepted by a real-time pushdown automaton, i.e., the
automaton reads a letter from its input at every step.

Given a grammar in GNF and a derivable string in the grammar with length n, any top-down parser will
halt at depth n.
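The depth bound can be seen directly in code: every GNF expansion consumes exactly one input letter, so
a naive top-down recognizer recurses exactly once per letter. Below is a minimal Python sketch under an
assumed encoding that maps each nonterminal to its (terminal, nonterminals-to-push) alternatives:

def parse(grammar, stack, word):
    # Nondeterministic top-down recognizer for a grammar in GNF.
    # `stack` holds the nonterminals still to be expanded.
    if not word:
        return not stack        # success iff nothing is left to expand
    if not stack:
        return False            # input remains but nothing to expand
    head, rest = stack[0], stack[1:]
    return any(word[0] == a and parse(grammar, tail + rest, word[1:])
               for (a, tail) in grammar[head])

# GNF grammar for { a^n b^n | n >= 1 }: S -> a S B | a B ; B -> b
grammar = {'S': [('a', ('S', 'B')), ('a', ('B',))], 'B': [('b', ())]}
print(parse(grammar, ('S',), 'aabb'))  # True
print(parse(grammar, ('S',), 'aab'))   # False

Each recursive call strips one letter from the input, so the recursion depth on a string of length n is
exactly n, mirroring the statement above.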

3.6. Pumping lemma for context-free languages

The pumping lemma for context-free languages, also known as the Bar-Hillel lemma, is a lemma that
gives a property shared by all context-free languages and generalizes the pumping lemma for regular
languages. As the pumping lemma does not suffice to guarantee that a language is context-free, there are
more stringent necessary conditions, such as Ogden's lemma.

Proof idea: If s is sufficiently long, its derivation tree w.r.t. a Chomsky normal form grammar must
contain some nonterminal N twice on some tree path (upper picture). Repeating n times the derivation
part N ⇒ ... ⇒ vNx obtains a derivation for u v^n w x^n y (lower left and right picture for n = 0 and 2,
respectively).

If a language L is context-free, then there exists some integer p ≥ 1 (called a "pumping length"[1]) such
that every string s in L that has a length of p or more symbols (i.e. with |s| ≥ p) can be written as

s = uvwxy

with substrings u, v, w, x and y, such that

1. |vwx| ≤ p,
2. |vx| ≥ 1, and
3. u v^n w x^n y ∈ L for all n ≥ 0.

Below is a formal expression of the pumping lemma:

(∀ L ⊆ Σ*) context-free(L) ⇒
(∃ p ≥ 1) (∀ s ∈ L) (|s| ≥ p ⇒
(∃ u, v, w, x, y ∈ Σ*) (s = uvwxy ∧ |vwx| ≤ p ∧ |vx| ≥ 1 ∧ (∀ n ≥ 0)(u v^n w x^n y ∈ L)))

Informal statement and explanation


The pumping lemma for context-free languages (called just "the pumping lemma" for the rest of this
article) describes a property that all context-free languages are guaranteed to have.

The property is a property of all strings in the language that are of length at least p, where p is a constant
—called the pumping length—that varies between context-free languages.

Say s is a string of length at least p that is in the language.

The pumping lemma states that s can be split into five substrings, s = uvwxy, where vx is non-empty and
the length of vwx is at most p, such that repeating v and x any (and the same) number of times in s
produces a string that is still in the language (it is possible and often useful to repeat zero times, which
removes v and x from the string). This process of "pumping up" additional copies of v and x is what gives
the pumping lemma its name.

Finite languages (which are regular and hence context-free) obey the pumping lemma trivially by having p
equal to the maximum string length in L plus one. As there are no strings of this length the pumping
lemma is not violated.

Usage of the lemma


The pumping lemma is often used to prove that a given language L is non-context-free, by showing that
arbitrarily long strings s are in L that cannot be "pumped" without producing strings outside L.

For example, the language L = { a^n b^n c^n | n > 0 } can be shown to be non-context-free by using the
pumping lemma in a proof by contradiction. First, assume that L is context-free. By the pumping lemma,
there exists an integer p which is the pumping length of language L. Consider the string s = a^p b^p c^p in L.
The pumping lemma tells us that s can be written in the form s = uvwxy, where u, v, w, x, and y are
substrings, such that |vwx| ≤ p, |vx| ≥ 1, and u v^i w x^i y ∈ L for every integer i ≥ 0. By the choice of s and the
fact that |vwx| ≤ p, it is easily seen that the substring vwx can contain no more than two distinct symbols.
That is, we have one of five possibilities for vwx:

1. vwx = a^j for some j ≤ p.
2. vwx = a^j b^k for some j and k with j+k ≤ p.
3. vwx = b^j for some j ≤ p.
4. vwx = b^j c^k for some j and k with j+k ≤ p.
5. vwx = c^j for some j ≤ p.

For each case, it is easily verified that u v^i w x^i y does not contain equal numbers of each letter for any
i ≠ 1. Thus, u v^2 w x^2 y does not have the form a^i b^i c^i. This contradicts the definition of L. Therefore, our
initial assumption that L is context-free must be false.
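This case analysis can also be checked mechanically. The Python sketch below enumerates every
decomposition s = uvwxy of s = a^p b^p c^p satisfying conditions 1 and 2 and tests pumping with i = 2;
that no decomposition survives is exactly the contradiction used above. The helper names and the small
illustrative value of p are assumptions of the sketch:

def in_L(s):
    # membership in { a^n b^n c^n | n > 0 }
    n = len(s) // 3
    return n > 0 and s == 'a' * n + 'b' * n + 'c' * n

def some_decomposition_pumps(s, p):
    for start in range(len(s)):                    # where vwx begins
        for vwx_len in range(1, p + 1):            # condition 1: |vwx| <= p
            if start + vwx_len > len(s):
                break
            for v_len in range(vwx_len + 1):
                for x_len in range(vwx_len - v_len + 1):
                    if v_len + x_len == 0:
                        continue                   # condition 2: |vx| >= 1
                    u = s[:start]
                    v = s[start:start + v_len]
                    w = s[start + v_len:start + vwx_len - x_len]
                    x = s[start + vwx_len - x_len:start + vwx_len]
                    y = s[start + vwx_len:]
                    if in_L(u + 2 * v + w + 2 * x + y):   # pump with i = 2
                        return True
    return False

p = 4   # stand-in for the pumping length; any value works for the illustration
print(some_decomposition_pumps('a' * p + 'b' * p + 'c' * p, p))  # False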

While the pumping lemma is often a useful tool to prove that a given language is not context-free, it does
not give a complete characterization of the context-free languages. If a language does not satisfy the
condition given by the pumping lemma, we have established that it is not context-free.

On the other hand, there are languages that are not context-free but still satisfy the condition given by
the pumping lemma, for example L = { b^j c^k d^l | j, k, l ∈ ℕ } ∪ { a^i b^j c^j d^j | i, j ∈ ℕ, i ≥ 1 }: for s = b^j c^k d^l with
e.g. j ≥ 1 choose vwx to consist only of b's; for s = a^i b^j c^j d^j choose vwx to consist only of a's; in both cases
all pumped strings are still in L.[2]

4. Pushdown automaton

A pushdown automaton (PDA) is a type of automaton that employs a stack.

Pushdown automata are used in theories about what can be computed by machines. Deterministic
pushdown automata can recognize all deterministic context-free languages while nondeterministic ones
can recognize all context-free languages, with the former often used in parser design.

The term "pushdown" refers to the fact that the stack can be regarded as being "pushed down" like a tray
dispenser at a cafeteria, since the operations never work on elements other than the top element. A stack
automaton, by contrast, does allow access to and operations on deeper elements. Stack automata can
recognize a strictly larger set of languages than pushdown automata.[1] A nested stack automaton allows
full access, and also allows stacked values to be entire sub-stacks rather than just single finite symbols.
full access, and also allows stacked values to be entire sub-stacks rather than just single finite symbols.

A diagram of a pushdown automaton

A finite state machine just looks at the input signal and the current state: it has no stack to work with. It
chooses a new state, the result of following the transition. A pushdown automaton (PDA) differs from a
finite state machine in two ways:

1. It can use the top of the stack to decide which transition to take.
2. It can manipulate the stack as part of performing a transition.

A pushdown automaton reads a given input string from left to right. In each step, it chooses a transition by
indexing a table by input symbol, current state, and the symbol at the top of the stack. A pushdown
automaton can also manipulate the stack, as part of performing a transition. The manipulation can be to
push a particular symbol to the top of the stack, or to pop off the top of the stack. The automaton can
alternatively ignore the stack, and leave it as it is.

Put together: Given an input symbol, current state, and stack symbol, the automaton can follow a
transition to another state, and optionally manipulate (push or pop) the stack.

If, in every situation, at most one such transition action is possible, then the automaton is called a
deterministic pushdown automaton (DPDA). In general, if several actions are possible, then the
automaton is called a general, or nondeterministic, PDA. A given input string may drive a
nondeterministic pushdown automaton to one of several configuration sequences; if one of them leads to
an accepting configuration after reading the complete input string, the latter is said to belong to the
language accepted by the automaton.

Formal definition
We use standard formal language notation: Σ* denotes the set of strings over alphabet Σ, and ε denotes
the empty string.

A PDA is formally defined as a 7-tuple:

M = (Q, Σ, Γ, δ, q0, Z, F)

where

Q is a finite set of states
Σ is a finite set which is called the input alphabet
Γ is a finite set which is called the stack alphabet
δ is a finite subset of Q × (Σ ∪ {ε}) × Γ × Q × Γ*, the transition relation
q0 ∈ Q is the start state
Z ∈ Γ is the initial stack symbol
F ⊆ Q is the set of accepting states

An element (p, a, A, q, α) ∈ δ is a transition of M. It has the intended meaning that M, in state p ∈ Q, on
the input a ∈ Σ ∪ {ε} and with A ∈ Γ as topmost stack symbol, may read a, change the state to q, and pop A,
replacing it by pushing α ∈ Γ*. The (Σ ∪ {ε}) component of the transition relation is used to formalize
that the PDA can either read a letter from the input, or proceed leaving the input untouched.

In many texts the transition relation is replaced by an (equivalent) formalization, where

δ : Q × (Σ ∪ {ε}) × Γ → P(Q × Γ*)

is the transition function, mapping into finite subsets of Q × Γ*. Here δ(p, a, A) contains all possible
actions in state p with A on the stack, while reading a on the input. One writes, for example,
(q, BA) ∈ δ(p, a, A) precisely when (p, a, A, q, BA) ∈ δ. Note that finiteness of these subsets is essential in
this definition.

Computations

a step of the pushdown automaton

In order to formalize the semantics of the pushdown automaton a description of the current situation is
introduced. Any 3-tuple (p, w, β) ∈ Q × Σ* × Γ* is called an instantaneous description (ID) of M, which
includes the current state, the part of the input tape that has not been read, and the contents of the stack
(topmost symbol written first). The transition relation δ defines the step-relation ⊢ of M on
instantaneous descriptions. For instruction (p, a, A, q, α) ∈ δ there exists a step (p, ax, Aγ) ⊢ (q, x, αγ),
for every x ∈ Σ* and every γ ∈ Γ*.

In general pushdown automata are nondeterministic, meaning that in a given instantaneous description
there may be several possible steps. Any of these steps can be chosen in a computation. With the
above definition in each step always a single symbol (top of the stack) is popped, replacing it with as many
symbols as necessary. As a consequence no step is defined when the stack is empty.

Computations of the pushdown automaton are sequences of steps. The computation starts in the initial
state q0 with the initial stack symbol Z on the stack, and a string w on the input tape, thus with initial
description (q0, w, Z). There are two modes of accepting. The pushdown automaton either accepts by final
state, which means after reading its input the automaton reaches an accepting state (in F), or it accepts
by empty stack (ε), which means after reading its input the automaton empties its stack. The first
acceptance mode uses the internal memory (state), the second the external memory (stack).

Formally one defines

1. L(M) = { w ∈ Σ* | (q0, w, Z) ⊢* (f, ε, γ) with f ∈ F and γ ∈ Γ* } (final state)
2. N(M) = { w ∈ Σ* | (q0, w, Z) ⊢* (q, ε, ε) with q ∈ Q } (empty stack)

Here ⊢* represents the reflexive and transitive closure of the step relation ⊢, meaning any number of
consecutive steps (zero, one or more).

For each single pushdown automaton these two languages need have no relation: they may be equal but
usually this is not the case. A specification of the automaton should also include the intended mode of
acceptance. Taken over all pushdown automata both acceptance conditions define the same family of
languages.

Theorem. For each pushdown automaton M one may construct a pushdown automaton M' such that
L(M') = N(M), and vice versa: for each pushdown automaton M one may construct a pushdown
automaton M' such that N(M') = L(M).

Example
The following is the formal description of the PDA which recognizes the language { 0^n 1^n | n ≥ 0 } by final
state:

PDA for { 0^n 1^n | n ≥ 0 }
(by final state)

M = ({p, q, r}, {0, 1}, {A, Z}, δ, p, Z, {r}), where

states: Q = {p, q, r}
input alphabet: Σ = {0, 1}
stack alphabet: Γ = {A, Z}
start state: q0 = p
start stack symbol: Z
accepting states: F = {r}

The transition relation δ consists of the following six instructions:

(p, 0, Z, p, AZ),
(p, 0, A, p, AA),
(p, ε, Z, q, Z),
(p, ε, A, q, A),
(q, 1, A, q, ε), and
(q, ε, Z, r, Z).

In words, the first two instructions say that in state p any time the symbol 0 is read, one A is pushed onto
the stack. Pushing symbol A on top of another A is formalized as replacing top A by AA (and similarly for
pushing symbol A on top of a Z).

The third and fourth instructions say that, at any moment the automaton may move from state p to state q.

The fifth instruction says that in state q, for each symbol 1 read, one A is popped.

Finally, the sixth instruction says that the machine may move from state q to accepting state r only when
the stack consists of a single Z.

There seems to be no generally used representation for PDA. Here we have depicted the instruction
(p, a, A, q, α) by an edge from state p to state q labelled by a; A/α (read a; replace A by α).
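To see these instructions in action, here is a small Python sketch of a nondeterministic PDA simulator,
run on the six instructions above. The configuration encoding (state, remaining input, stack string with
the topmost symbol leftmost) and the use of '' for ε are assumptions of the sketch:

def accepts(transitions, start, stack_start, finals, word):
    # Depth-first search over configurations; acceptance is by final state.
    todo = [(start, word, stack_start)]
    seen = set()
    while todo:
        cfg = todo.pop()
        if cfg in seen:
            continue              # guards against epsilon-move cycles
        seen.add(cfg)
        state, rest, stack = cfg
        if state in finals and rest == '':
            return True           # complete input read in an accepting state
        if not stack:
            continue              # no step is defined on an empty stack
        for (p, a, A, q, alpha) in transitions:
            if p == state and A == stack[0]:
                if a == '':
                    todo.append((q, rest, alpha + stack[1:]))
                elif rest.startswith(a):
                    todo.append((q, rest[1:], alpha + stack[1:]))
    return False

delta = [('p', '0', 'Z', 'p', 'AZ'), ('p', '0', 'A', 'p', 'AA'),
         ('p', '',  'Z', 'q', 'Z'),  ('p', '',  'A', 'q', 'A'),
         ('q', '1', 'A', 'q', ''),   ('q', '',  'Z', 'r', 'Z')]

print(accepts(delta, 'p', 'Z', {'r'}, '0011'))   # True
print(accepts(delta, 'p', 'Z', {'r'}, '00111'))  # False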

Understanding the computation process

accepting computation for 0011

The following illustrates how the above PDA computes on different input strings. The subscript M from
the step symbol ⊢ is here omitted.

a. Input string = 0011. There are various computations, depending on the moment the move from state
p to state q is made. Only one of these is accepting.
   i. (p, 0011, Z) ⊢ (q, 0011, Z) ⊢ (r, 0011, Z)
      The final state is accepting, but the input is not accepted this way as it has not been read.
   ii. (p, 0011, Z) ⊢ (p, 011, AZ) ⊢ (q, 011, AZ)
      No further steps possible.
   iii. (p, 0011, Z) ⊢ (p, 011, AZ) ⊢ (p, 11, AAZ) ⊢ (q, 11, AAZ) ⊢ (q, 1, AZ) ⊢ (q, ε, Z) ⊢ (r, ε, Z)
      Accepting computation: ends in accepting state, while complete input has been read.
b. Input string = 00111. Again there are various computations. None of these is accepting.
   i. (p, 00111, Z) ⊢ (q, 00111, Z) ⊢ (r, 00111, Z)
      The final state is accepting, but the input is not accepted this way as it has not been read.
   ii. (p, 00111, Z) ⊢ (p, 0111, AZ) ⊢ (q, 0111, AZ)
      No further steps possible.
   iii. (p, 00111, Z) ⊢ (p, 0111, AZ) ⊢ (p, 111, AAZ) ⊢ (q, 111, AAZ) ⊢ (q, 11, AZ) ⊢ (q, 1, Z) ⊢ (r, 1, Z)
      The final state is accepting, but the input is not accepted this way as it has not been (completely)
      read.

PDA and context-free languages


Every context-free grammar can be transformed into an equivalent nondeterministic pushdown
automaton. The derivation process of the grammar is simulated in a leftmost way. Where the grammar
rewrites a nonterminal, the PDA takes the topmost nonterminal from its stack and replaces it by the
right-hand part of a grammatical rule (expand). Where the grammar generates a terminal symbol, the PDA
reads a symbol from input when it is the topmost symbol on the stack (match). In a sense the stack of the
PDA contains the unprocessed data of the grammar, corresponding to a pre-order traversal of a derivation
tree.

Technically, given a context-free grammar, the PDA has a single state, 1, and its transition relation is
constructed as follows:

1. (1, ε, A, 1, α) for each rule A → α (expand)
2. (1, a, a, 1, ε) for each terminal symbol a (match)

The PDA accepts by empty stack. Its initial stack symbol is the grammar's start symbol.

For a context-free grammar in Greibach normal form, defining (1,γ) ∈ δ(1,a,A) for each grammar rule A →
aγ also yields an equivalent nondeterministic pushdown automaton.[2]:115
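A minimal Python sketch of the single-state construction; the rule encoding and the toy grammar for
{ 0^n 1^n | n ≥ 1 } are illustrative assumptions:

def cfg_to_pda(rules, terminals):
    # Build the transition relation of a PDA accepting by empty stack.
    delta = []
    for lhs, rhs in rules:                    # expand: pop a nonterminal and
        delta.append((1, '', lhs, 1, rhs))    # push the rule's right-hand side
    for a in sorted(terminals):               # match: pop a terminal while
        delta.append((1, a, a, 1, ()))        # reading it from the input
    return delta

# Grammar S -> 0 S 1 | 0 1 for { 0^n 1^n | n >= 1 }:
rules = [('S', ('0', 'S', '1')), ('S', ('0', '1'))]
for t in cfg_to_pda(rules, {'0', '1'}):
    print(t)

The stack thus always holds the still-unprocessed part of a leftmost derivation, with the leftmost pending
symbol on top, matching the pre-order traversal described above.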

The converse, finding a grammar for a given PDA, is not that easy. The trick is to code two states of the
PDA into the nonterminals of the grammar.

Theorem. For each pushdown automaton M one may construct a context-free grammar G such that
N(M) = L(G).[2]:116

The language of strings accepted by a deterministic pushdown automaton is called a deterministic
context-free language. Not all context-free languages are deterministic.[note 1] As a consequence, the
DPDA is a strictly weaker variant of the PDA, and there exists no algorithm for converting a PDA to an
equivalent DPDA, if such a DPDA exists.

Generalized pushdown automaton (GPDA)


A GPDA is a PDA which writes an entire string of some known length to the stack or removes an entire
string from the stack in one step.

A GPDA is formally defined as a 6-tuple:

M = (Q, Σ, Γ, δ, q0, F)

where Q, Σ, Γ, q0 and F are defined the same way as for a PDA, and

δ : Q × Σ* × Γ* → finite subsets of Q × Γ*

is the transition function.

Computation rules for a GPDA are the same as for a PDA, except that the a's and A's are now strings
instead of symbols.

GPDA's and PDA's are equivalent in that if a language is recognized by a PDA, it is also recognized by a
GPDA and vice versa.

One can formulate an analytic proof for the equivalence of GPDA's and PDA's using the following
simulation:

Let (q2, y1y2...yn) ∈ δ(q1, w, x1x2...xm) be a transition of the GPDA,

where w ∈ Σ*, x1, ..., xm, y1, ..., yn ∈ Γ, m ≥ 0 and n ≥ 0.

Construct the following transitions for the PDA: first pop the symbols x1, ..., xm one at a time (reading w
along the way) while passing through a chain of fresh intermediate states, and then push the symbols
yn, ..., y1 one at a time via ε-moves, ending in state q2.
Stack automaton
As a generalization of pushdown automata, Ginsburg, Greibach, and Harrison (1967) investigated stack
automata, which may additionally step left or right in the input string (surrounded by special endmarker
symbols to prevent slipping out), and step up or down in the stack in read-only mode. [4][5] A stack
automaton is called nonerasing if it never pops from the stack. The class of languages accepted by
nondeterministic, nonerasing stack automata is NSPACE(n^2), which is a superset of the context-sensitive
languages.[1] The class of languages accepted by deterministic, nonerasing stack automata is
DSPACE(n·log(n)).[1]

Alternating pushdown automata


An alternating pushdown automaton (APDA) is a pushdown automaton with a state set

Q = Q∃ ∪ Q∀, where Q∃ ∩ Q∀ = ∅.

States in Q∃ and Q∀ are called existential resp. universal. In an existential state an APDA
nondeterministically chooses the next state and accepts if at least one of the resulting computations
accepts. In a universal state an APDA moves to all next states and accepts if all the resulting computations
accept.

The model was introduced by Chandra, Kozen and Stockmeyer.[6] Ladner, Lipton and Stockmeyer[7]
proved that this model is equivalent to EXPTIME i.e. a language is accepted by some APDA iff it can be
decided by an exponential-time algorithm.

Aizikowitz and Kaminski[8] introduced synchronized alternating pushdown automata (SAPDA) that are
equivalent to conjunctive grammars in the same way as nondeterministic PDA are equivalent to
context-free grammars.

4.1. Nested stack automaton

In automata theory, a nested stack automaton is a finite automaton that can make use of a stack
containing data which can be additional stacks.[1] Like a stack automaton, a nested stack automaton may
step up or down in the stack, and read the current symbol; in addition, it may at any place create a new
stack, operate on that one, eventually destroy it, and continue operating on the old stack. This way, stacks
can be nested recursively to an arbitrary depth; however, the automaton always operates on the innermost
stack only.

A nested stack automaton is capable of recognizing an indexed language,[2] and in fact the class of
indexed languages is exactly the class of languages accepted by one-way nondeterministic nested stack
automata.[1][3]

Nested stack automata should not be confused with embedded pushdown automata, which have less
computational power.

Formal definition

A (nondeterministic two-way) nested stack automaton is a tuple ⟨Q, Σ, Γ, δ, q0, Z0, F, [, ], ♯⟩ where

Q, Σ, and Γ is a nonempty finite set of states, input symbols, and stack symbols, respectively,
[, ], and ♯ are distinct special symbols not contained in Σ ∪ Γ,
[ is used as left endmarker for both the input string and a (sub)stack string,
] is used as right endmarker for these strings,
♯ is used as the final endmarker of the string denoting the whole stack.[note 1]
An extended input alphabet is defined by Σ' = Σ ∪ {[, ]}, an extended stack alphabet by Γ' = Γ ∪ {]},
and the set of input move directions by D = {-1, 0, +1}.
δ, the finite control, is a mapping from Q × Σ' × (Γ' ∪ [Γ' ∪ {♯, []}) into finite subsets of Q × D × ([Γ*
∪ D), such that δ maps[note 2]

      Q × Σ' × [Γ into subsets of Q × D × [Γ* (pushdown mode),
      Q × Σ' × Γ' into subsets of Q × D × D (reading mode),
      Q × Σ' × [Γ' into subsets of Q × D × {+1} (reading mode),
      Q × Σ' × {♯} into subsets of Q × D × {-1} (reading mode),
      Q × Σ' × (Γ' ∪ [Γ') into subsets of Q × D × [Γ*] (stack creation mode), and
      Q × Σ' × {[]} into subsets of Q × D × {ε}, (stack destruction mode),

Informally, the top symbol of a (sub)stack together with its preceding left endmarker "[" is viewed as
a single symbol;[4] then δ reads
the current state,
the current input symbol, and
the current stack symbol,
and outputs
the next state,
the direction in which to move on the input, and
the direction in which to move on the stack, or the string of symbols to replace the topmost stack
symbol.

q0 ∈ Q is the initial state,


Z0 ∈ Γ is the initial stack symbol,
F ⊆ Q is the set of final states.

Configuration

A configuration, or instantaneous description, of such an automaton consists in a triple
⟨ q, [a1a2...ai...an-1], [Z1X2...Xj...Xm-1♯ ⟩, where

q ∈ Q is the current state,
[a1a2...ai...an-1] is the input string; for convenience, a0 = [ and an = ] is defined.[note 3] The current
position in the input, viz. i with 0 ≤ i ≤ n, is marked by underlining the respective symbol.
[Z1X2...Xj...Xm-1♯ is the stack, including substacks; for convenience, X1 = [Z1 [note 4] and Xm = ♯ is
defined. The current position in the stack, viz. j with 1 ≤ j ≤ m, is marked by underlining the
respective symbol.

Example

An example run (input string not shown):

Action             Step   Stack
                   1:     [a b [k ] [p ] c ♯
create substack    2:     [a b [k ] [p [r s ] ] c ♯
pop                3:     [a b [k ] [p [s ] ] c ♯
pop                4:     [a b [k ] [p [] ] c ♯
destroy substack   5:     [a b [k ] [p ] c ♯
move down          6:     [a b [k ] [p ] c ♯
move up            7:     [a b [k ] [p ] c ♯
move up            8:     [a b [k ] [p ] c ♯
push               9:     [a b [k ] [n o p ] c ♯

(The underlining that marks the current stack position in the original drawing is not reproduced here,
which is why steps 5–8 display identically.)

Properties

When automata are allowed to re-read their input ("two-way automata"), nested stacks do not result in
additional language recognition capabilities, compared to plain stacks. [5]

Gilman and Shapiro used nested stack automata to solve the word problem in certain groups.[6]

5. Turing machine
A Turing machine is a mathematical model of computation that defines an abstract machine,[1] which
manipulates symbols on a strip of tape according to a table of rules. [2] Despite the model's simplicity,
given any computer algorithm, a Turing machine capable of simulating that algorithm's logic can be
constructed.[3]

The machine operates on an infinite[4] memory tape divided into discrete cells.[5] The machine positions
its head over a cell and "reads" (scans)[6] the symbol there. Then, as per the symbol and its present place
in a finite table[7] of user-specified instructions, the machine (i) writes a symbol (e.g., a digit or a letter
from a finite alphabet) in the cell (some models allowing symbol erasure or no writing), [8] then (ii) either
moves the tape one cell left or right (some models allow no motion, some models move the head), [9] then
(iii) (as determined by the observed symbol and the machine's place in the table) either proceeds to a
subsequent instruction or halts the computation.[10]

The Turing machine was invented in 1936 by Alan Turing,[11][12] who called it an a-machine (automatic
machine).[13] With this model, Turing was able to answer two questions in the negative: (1) Does a
machine exist that can determine whether any arbitrary machine on its tape is "circular" (e.g., freezes, or
fails to continue its computational task); similarly, (2) does a machine exist that can determine whether
any arbitrary machine on its tape ever prints a given symbol.[14] Thus by providing a mathematical
description of a very simple device capable of arbitrary computations, he was able to prove properties of
computation in general—and in particular, the uncomputability of the Entscheidungsproblem ("decision
problem").[15]

Thus, Turing machines prove fundamental limitations on the power of mechanical computation. [16] While
they can express arbitrary computations, their minimalistic design makes them unsuitable for computation
in practice: real-world computers are based on different designs that, unlike Turing machines, use
random-access memory.

Turing completeness is the ability for a system of instructions to simulate a Turing machine. A
programming language that is Turing complete is theoretically capable of expressing all tasks
accomplishable by computers; nearly all programming languages are Turing complete if the limitations of
finite memory are ignored.

A Turing machine is a general example of a CPU that controls all data manipulation done by a computer,
with the canonical machine using sequential memory to store data. More specifically, it is a machine
(automaton) capable of enumerating some arbitrary subset of valid strings of an alphabet; these strings
are part of a recursively enumerable set. A Turing machine has a tape of infinite length that enables read
and write operations to be performed.

Assuming a black box, the Turing machine cannot know whether it will eventually enumerate any one
specific string of the subset with a given program. This is due to the fact that the halting problem is
unsolvable, which has major implications for the theoretical limits of computing.

The Turing machine is capable of processing an unrestricted grammar, which further implies that it is
capable of robustly evaluating first-order logic in an infinite number of ways. This is famously
demonstrated through lambda calculus.

A Turing machine that is able to simulate any other Turing machine is called a universal Turing machine
(UTM, or simply a universal machine). A more mathematically oriented definition with a similar
"universal" nature was introduced by Alonzo Church, whose work on lambda calculus intertwined with
Turing's in a formal theory of computation known as the Church–Turing thesis. The thesis states that
Turing machines indeed capture the informal notion of effective methods in logic and mathematics, and
provide a precise definition of an algorithm or "mechanical procedure". Studying their abstract properties
yields many insights into computer science and complexity theory.

The Turing machine mathematically models a machine that mechanically operates on a tape. On this tape
are symbols, which the machine can read and write, one at a time, using a tape head. Operation is fully
determined by a finite set of elementary instructions such as "in state 42, if the symbol seen is 0, write a 1;
if the symbol seen is 1, change into state 17; in state 17, if the symbol seen is 0, write a 1 and change to
state 6;" etc. In the original article ("On Computable Numbers, with an Application to the
Entscheidungsproblem", see also references below), Turing imagines not a mechanism, but a person
whom he calls the "computer", who executes these deterministic mechanical rules slavishly (or as Turing
puts it, "in a desultory manner").

The head is always over a particular square of the tape; only a finite stretch of squares is shown. The
instruction to be performed (q4) is shown over the scanned square. (Drawing after Kleene (1952) p. 375.)

Here, the internal state (q1) is shown inside the head, and the illustration describes the tape as being
infinite and pre-filled with "0", the symbol serving as blank. The system's full state (its complete
configuration) consists of the internal state, any non-blank symbols on the tape (in this illustration
"11B"), and the position of the head relative to those symbols including blanks, i.e. "011B". (Drawing
after Minsky (1967) p. 121.)

More precisely, a Turing machine consists of:

A tape divided into cells, one next to the other. Each cell contains a symbol from some finite
alphabet. The alphabet contains a special blank symbol (here written as '0') and one or more other
symbols. The tape is assumed to be arbitrarily extendable to the left and to the right, i.e., the Turing
machine is always supplied with as much tape as it needs for its computation. Cells that have not
been written before are assumed to be filled with the blank symbol. In some models the tape has a
left end marked with a special symbol; the tape extends or is indefinitely extensible to the right.
A head that can read and write symbols on the tape and move the tape left and right one (and only
one) cell at a time. In some models the head moves and the tape is stationary.
A state register that stores the state of the Turing machine, one of finitely many. Among these is the
special start state with which the state register is initialized. These states, writes Turing, replace the
"state of mind" a person performing computations would ordinarily be in.
A finite table[19] of instructions[20] that, given the state (qi) the machine is currently in and the
symbol (aj) it is reading on the tape (the symbol currently under the head), tells the machine to do the
following in sequence (for the 5-tuple models):

1. Either erase or write a symbol (replacing aj with aj1).
2. Move the head (which is described by dk and can have values: 'L' for one step left, or 'R' for one step
right, or 'N' for staying in the same place).
3. Assume the same or a new state as prescribed (go to state qi1).

In the 4-tuple models, erasing or writing a symbol (aj1) and moving the head left or right (dk) are specified
as separate instructions. Specifically, the table tells the machine to (ia) erase or write a symbol or (ib)
move the head left or right, and then (ii) assume the same or a new state as prescribed, but not both
actions (ia) and (ib) in the same instruction. In some models, if there is no entry in the table for the
current combination of symbol and state then the machine will halt; other models require all entries to be
filled.

Note that every part of the machine (i.e. its state, symbol-collections, and used tape at any given time) and
its actions (such as printing, erasing and tape motion) is finite, discrete and distinguishable; it is the
unlimited amount of tape and runtime that gives it an unbounded amount of storage space.

Formal definition
Following Hopcroft and Ullman (1979, p. 148), a (one-tape) Turing machine can be formally defined as a
7-tuple M = ⟨Q, Γ, b, Σ, δ, q0, F⟩ where

Q is a finite, non-empty set of states;
Γ is a finite, non-empty set of tape alphabet symbols;
b ∈ Γ is the blank symbol (the only symbol allowed to occur on the tape infinitely often at any step
during the computation);
Σ ⊆ Γ \ {b} is the set of input symbols, that is, the set of symbols allowed to appear in the initial tape
contents;
q0 ∈ Q is the initial state;
F ⊆ Q is the set of final states or accepting states. The initial tape contents is said to be accepted by
M if it eventually halts in a state from F.
δ : (Q \ F) × Γ → Q × Γ × {L, R} is a partial function called the transition function, where L is left
shift, R is right shift. (A relatively uncommon variant allows "no shift", say N, as a third element of the
latter set.) If δ is not defined on the current state and the current tape symbol, then the machine
halts.[21]

Anything that operates according to these specifications is a Turing machine.

The 7-tuple for the 3-state busy beaver looks like this (see more about this busy beaver at Turing machine
examples):

Q = {A, B, C, HALT} (states);
Γ = {0, 1} (tape alphabet symbols);
b = 0 (blank symbol);
Σ = {1} (input symbols);
q0 = A (initial state);
F = {HALT} (final states);
δ = see state-table below (transition function).

Initially all tape cells are marked with 0.

State table for the 3-state, 2-symbol busy beaver

Tape    Current state A         Current state B         Current state C
symbol  Write  Move  Next       Write  Move  Next       Write  Move  Next
0       1      R     B          1      L     A          1      L     B
1       1      L     C          1      R     B          1      R     HALT
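The state table can be executed directly. The following Python sketch simulates a single-tape machine
whose tape is a dictionary defaulting to the blank symbol 0, interpreting the move column as a movement
of the head; the names and the convention of counting one step per applied transition are choices of the
sketch:

from collections import defaultdict

def run(delta, state, halt_states):
    tape = defaultdict(int)              # unwritten cells read as blank (0)
    head, steps = 0, 0
    while state not in halt_states:
        write, move, state = delta[(state, tape[head])]
        tape[head] = write
        head += -1 if move == 'L' else 1
        steps += 1
    return steps, sum(tape.values())

# (current state, scanned symbol) -> (write, move, next state)
delta = {('A', 0): (1, 'R', 'B'), ('A', 1): (1, 'L', 'C'),
         ('B', 0): (1, 'L', 'A'), ('B', 1): (1, 'R', 'B'),
         ('C', 0): (1, 'L', 'B'), ('C', 1): (1, 'R', 'HALT')}

print(run(delta, 'A', {'HALT'}))  # (13, 6): six 1s after 13 transitions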

Turing machine "state" diagrams

The table for the 3-state busy beaver ("P" = print/write a "1")

Tape    Current state A         Current state B         Current state C
symbol  Write  Move  Next       Write  Move  Next       Write  Move  Next
0       P      R     B          P      L     A          P      L     B
1       P      L     C          P      R     B          P      R     HALT

The "3-state busy beaver" Turing machine in a finite state
representation. Each circle represents a "state" of the
table—an "m-configuration" or "instruction". "Direction" of a
state transition is shown by an arrow. The label (e.g. 0/P,R)
near the outgoing state (at the "tail" of the arrow) specifies
the scanned symbol that causes a particular transition (e.g.
0) followed by a slash /, followed by the subsequent
"behaviors" of the machine, e.g. "P Print" then move tape "R
Right". No general accepted format exists. The convention
shown is after McClusky (1965), Booth (1967), Hill, and
Peterson (1974).

To the right: the above table as expressed as a "state transition" diagram.

Usually large tables are better left as tables (Booth, p. 74). They are more readily simulated by computer
in tabular form (Booth, p. 74). However, certain concepts—e.g. machines with "reset" states and machines
with repeating patterns (cf. Hill and Peterson p. 244ff)—can be more readily seen when viewed as a
drawing.

Whether a drawing represents an improvement on its table must be decided by the reader for the
particular context. See Finite state machine for more.

The evolution of the busy beaver's computation starts at the top and proceeds to the bottom.

The reader should again be cautioned that such diagrams represent a snapshot of their table frozen in
time, not the course ("trajectory") of a computation through time and space. While every time the busy
beaver machine "runs" it will always follow the same state-trajectory, this is not true for the "copy"
machine that can be provided with variable input "parameters".

The diagram "Progress of the computation" shows the three-state busy beaver's "state" (instruction)
progress through its computation from start to finish. On the far right is the Turing "complete
configuration" (Kleene "situation", Hopcroft–Ullman "instantaneous description") at each step. If the
machine were to be stopped and cleared to blank both the "state register" and entire tape, these
"configurations" could be used to rekindle a computation anywhere in its progress (cf. Turing (1936) The
Undecidable, pp. 139–140).

5.1. Linear bounded automaton

A linear bounded automaton is a nondeterministic Turing machine that satisfies the following three
conditions:

Its input alphabet includes two special symbols, serving as left and right endmarkers.
Its transitions may not print other symbols over the endmarkers.
Its transitions may neither move to the left of the left endmarker nor to the right of the right
endmarker.[1]:225

In other words: instead of having potentially infinite tape on which to compute, computation is restricted
to the portion of the tape containing the input plus the two tape squares holding the endmarkers.

An alternative, less restrictive definition is as follows:

Like a Turing machine, an LBA possesses a tape made up of cells that can contain symbols from a
finite alphabet, a head that can read from or write to one cell on the tape at a time and can be moved,
and a finite number of states.
An LBA differs from a Turing machine in that while the tape is initially considered to have unbounded
length, only a finite contiguous portion of the tape, whose length is a linear function of the length of
the initial input, can be accessed by the read/write head; hence the name linear bounded automaton.
[1]:225

This limitation makes an LBA a somewhat more accurate model of a real-world computer than a Turing
machine, whose definition assumes unlimited tape.

The strong and the weaker definition lead to the same computational abilities of the respective automaton
classes,[1]:225 due to the linear speedup theorem.

In 1960, John Myhill introduced an automaton model today known as deterministic linear bounded
automaton.[2] In 1963, Peter S. Landweber proved that the languages accepted by deterministic LBAs are
context-sensitive.[3] In 1964, S.-Y. Kuroda introduced the more general model of (nondeterministic) linear
bounded automata, noted that Landweber's proof also works for nondeterministic linear bounded
automata, and showed that the languages accepted by them are precisely the context-sensitive languages.
[4][5]

LBA and context-sensitive languages


Linear bounded automata are acceptors for the class of context-sensitive languages.[1]:225-226 The only
restriction placed on grammars for such languages is that no production maps a string to a shorter string.
Thus no derivation of a string in a context-sensitive language can contain a sentential form longer than the
string itself. Since there is a one-to-one correspondence between linear-bounded automata and such
grammars, no more tape than that occupied by the original string is necessary for the string to be
recognized by the automaton.

5.2. Multitape Turing machine

A multi-tape Turing machine is like an ordinary Turing machine with several tapes. Each tape
has its own head for reading and writing. Initially the input appears on tape 1, and the others start
out blank.[1]

This model intuitively seems much more powerful than the single-tape model, but any multi-tape
machine, no matter how many tapes, can be simulated by a single-tape machine using only
quadratically more computation time.[2] Thus, multi-tape machines cannot calculate any more
functions than single-tape machines,[3] and none of the robust complexity classes (such as
polynomial time) are affected by a change between single-tape and multi-tape machines.

Formal definition
A k-tape Turing machine can be described as a 6-tuple M = ⟨Q, Γ, s, b, F, δ⟩ where:

Q is a finite set of states
Γ is a finite set of the tape alphabet
s ∈ Q is the initial state
b ∈ Γ is the blank symbol
F ⊆ Q is the set of final or accepting states
δ : Q × Γ^k → Q × (Γ × {L, R, S})^k is a partial function called the transition function, where k is the
number of tapes, L is left shift, R is right shift and S is no shift.

Two-stack Turing machine


Two-stack Turing machines have a read-only input and two storage tapes. If a head moves left on either
tape a blank is printed on that tape, but one symbol from a "library" can be printed.

5.3. Multi-track Turing machine

A multitrack Turing machine is a specific type of multi-tape Turing machine. In a standard n-tape
Turing machine, n heads move independently along n tracks. In an n-track Turing machine, one head reads
and writes on all tracks simultaneously. A tape position in an n-track Turing machine contains n symbols
from the tape alphabet.

Formal definition
A multitrack Turing machine with n tapes can be formally defined as a 6-tuple M = ⟨Q, Σ, Γ, δ, q0, F⟩,
where

Q is a finite set of states
Σ ⊆ Γ is a finite set of input symbols
Γ is a finite set of symbols called the tape alphabet
q0 ∈ Q is the initial state
F ⊆ Q is the set of final or accepting states
δ : Q × Γ^n → Q × Γ^n × {L, R} is a partial function called the transition function.

Sometimes also denoted as δ(qi, [x1, x2, ..., xn]) = (qj, [y1, y2, ..., yn], d), where d ∈ {L, R}.

A non-deterministic variant can be defined by replacing the transition function δ by a transition relation
δ ⊆ (Q × Γ^n) × (Q × Γ^n × {L, R}).

Proof of equivalence to a standard Turing machine

This will prove that a two-track Turing machine is equivalent to a standard Turing machine; the argument
generalizes to an n-track Turing machine. Let L be a recursively enumerable language, let
M = ⟨Q, Σ, Γ, δ, q0, F⟩ be a standard Turing machine that accepts L, and let M' be a two-track Turing
machine. To prove M = M' it must be shown that M ⊆ M' and M' ⊆ M.

If all but the first track is ignored, then M and M' are clearly equivalent.

The tape alphabet of a one-track Turing machine equivalent to a two-track Turing machine consists of
ordered pairs: each symbol of the two-track Turing machine M' can be identified with an ordered pair
[x, y] in the tape alphabet Γ × Γ of a one-track Turing machine M whose transition function is
δ'(qi, [x1, x2]) = δ(qi, [x1, x2]).

This machine also accepts L.

5.4. Non-deterministic Turing machine

Deterministic Turing Machine

In a deterministic Turing machine (DTM), the set of rules prescribes at most one action to be
performed for any given situation.

A deterministic Turing machine has a transition function that, for a given state and symbol under
the tape head, specifies three things:

the symbol to be written to the tape,


the direction (left, right or neither) in which the head should move, and
the subsequent state of the finite control.

For example, an X on the tape in state 3 might make the DTM write a Y on the tape, move the head
one position to the right, and switch to state 5.

Non-Deterministic Turing Machine

By contrast, in a non-deterministic Turing machine (NTM) the set of rules may prescribe more than one
action to be performed for any given situation. For example, an X on the tape in state 3 might allow the
NTM to:

Write a Y, move right, and switch to state 5

or

Write an X, move left, and stay in state 3.

Resolution of multiple rules

How does the NTM "know" which of these actions it should take? There are two ways of looking at it. One
is to say that the machine is the "luckiest possible guesser"; it always picks a transition that eventually
leads to an accepting state, if there is such a transition. The other is to imagine that the machine
"branches" into many copies, each of which follows one of the possible transitions. Whereas a DTM has a
single "computation path" that it follows, an NTM has a "computation tree". If at least one branch of the
tree halts with an "accept" condition, we say that the NTM accepts the input.

Definition
A non-deterministic Turing machine can be formally defined as a 6-tuple M = ⟨Q, Σ, ι, ⊔, A, δ⟩, where

Q is a finite set of states
Σ is a finite set of symbols (the tape alphabet)
ι ∈ Q is the initial state
⊔ ∈ Σ is the blank symbol
A ⊆ Q is the set of accepting (final) states
δ ⊆ (Q \ A × Σ) × (Q × Σ × {L, S, R}) is a relation on states and symbols called the transition
relation. L is the movement to the left, S is no movement, and R is the movement to the right.

The difference with a standard (deterministic) Turing machine is that for those, the transition relation is a
function (the transition function).

Configurations and the yields relation on configurations, which describes the possible actions of the
Turing machine given any possible contents of the tape, are as for standard Turing machines, except that
the yields relation is no longer single-valued. The notion of string acceptance is unchanged: a
non-deterministic Turing machine accepts a string if, when the machine is started on the configuration in
which the tape head is on the first character of the string (if any), and the tape is all blank otherwise, at
least one of the machine's possible computations from that configuration puts the machine into a state in
A. (If the machine is deterministic, the possible computations are the prefixes of a single, possibly infinite,
path.)

Computational equivalence with DTMs


NTMs can compute the same results as DTMs, that is, they are capable of computing the same values,
given the same inputs. The time complexity of these computations varies, however, as is discussed below.

DTM as a special case of NTM

NTMs effectively include DTMs as special cases, so it is immediately clear that DTMs are not more
powerful. It might seem that NTMs are more powerful than DTMs, since they can allow trees of possible
computations arising from the same initial configuration, accepting a string if any one branch in the tree
accepts it.

DTM simulation of NTM

It is possible to simulate NTMs with DTMs, and in fact this can be done in more than one way.

Multiplicity of configuration states

One approach is to use a DTM of which the configurations represent multiple configurations of the NTM,
and the DTM's operation consists of visiting each of them in turn, executing a single step at each visit, and
spawning new configurations whenever the transition relation defines multiple continuations.

Multiplicity of tapes

Another construction[1] simulates NTMs with 3-tape DTMs, of which the first tape always holds the
original input string, the second is used to simulate a particular computation of the NTM, and the third
encodes a path in the NTM's computation tree. The 3-tape DTMs are easily simulated with a normal
single-tape DTM.

In this construction, the resulting DTM effectively performs a breadth-first search of the NTM's
computation tree, visiting all possible computations of the NTM in order of increasing length until it finds
an accepting one. Therefore, the length of an accepting computation of the DTM is, in general, exponential
in the length of the shortest accepting computation of the NTM. This is considered to be a general
property of simulations of NTMs by DTMs; the most famous unresolved question in computer science, the
P = NP problem, is related to this issue.
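A minimal Python sketch of this breadth-first idea: a deterministic loop explores the NTM's configuration
tree level by level until an accepting state appears. The configuration encoding, the toy transition relation
(which "guesses" a symbol and then verifies it), and the step limit are illustrative assumptions of the
sketch:

from collections import deque

def ntm_accepts(delta, start, accept, tape, blank=0, limit=10_000):
    first = (start, tuple(tape), 0)
    queue, seen = deque([first]), {first}
    while queue and limit:
        limit -= 1
        state, cells, head = queue.popleft()
        if state in accept:
            return True
        for write, move, nxt in delta.get((state, cells[head]), []):
            t = list(cells)
            t[head] = write
            h = head + {'L': -1, 'S': 0, 'R': 1}[move]
            if h < 0:
                t.insert(0, blank); h = 0        # grow the tape on demand
            elif h == len(t):
                t.append(blank)
            cfg = (nxt, tuple(t), h)
            if cfg not in seen:
                seen.add(cfg)
                queue.append(cfg)
    return False

# Nondeterministically guess a bit, then check that the guess was 1:
delta = {('guess', 0): [(0, 'S', 'saw0'), (1, 'S', 'saw1')],
         ('saw1', 1): [(1, 'S', 'accept')]}
print(ntm_accepts(delta, 'guess', {'accept'}, [0]))  # True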

6. Recursive language
A formal language (a set of finite sequences of symbols taken from a fixed alphabet) is called recursive if
it is a recursive subset of the set of all possible finite sequences over the alphabet of the language.
Equivalently, a formal language is recursive if there exists a total Turing machine (a Turing machine that
halts for every given input) that, when given a finite sequence of symbols as input, accepts it if it belongs
to the language and rejects it otherwise. Recursive languages are also called decidable.

The concept of decidability may be extended to other models of computation. For example one may speak
of languages decidable on a non-deterministic Turing machine. Therefore, whenever an ambiguity is
possible, the synonym for "recursive language" used is Turing-decidable language, rather than simply
decidable.

The class of all recursive languages is often called R, although this name is also used for the class RP.

This type of language was not defined in the Chomsky hierarchy (Chomsky 1959). All recursive
languages are also recursively enumerable. All regular, context-free and context-sensitive languages are
recursive.

Definitions
There are two equivalent major definitions for the concept of a recursive language:

1. A recursive formal language is a recursive subset in the set of all possible words over the alphabet of
the language.
2. A recursive language is a formal language for which there exists a Turing machine that, when
presented with any finite input string, halts and accepts if the string is in the language, and halts and
rejects otherwise. The Turing machine always halts: it is known as a decider and is said to decide the
recursive language.

By the second definition, any decision problem can be shown to be decidable by exhibiting an algorithm
for it that terminates on all inputs. An undecidable problem is a problem that is not decidable.

Examples
As noted above, every context-sensitive language is recursive. Thus, a simple example of a recursive
language is the set L = {abc, aabbcc, aaabbbccc, ...}; more formally, the set

L = { a^n b^n c^n | n ≥ 1 }

is context-sensitive and therefore recursive.

Examples of decidable languages that are not context-sensitive are more difficult to describe. For one such
example, some familiarity with mathematical logic is required: Presburger arithmetic is the first-order
theory of the natural numbers with addition (but without multiplication). While the set of well-formed
formulas in Presburger arithmetic is context-free, every deterministic Turing machine accepting the set of
true statements in Presburger arithmetic has a worst-case runtime of at least 2^(2^(cn)), for some constant
c > 0 (Fischer & Rabin 1974). Here, n denotes the length of the given formula. Since every context-sensitive
language can be accepted by a linear bounded automaton, and such an automaton can be simulated by a
deterministic Turing machine with worst-case running time at most c^n for some constant c,
the set of valid formulas in Presburger arithmetic is not context-sensitive. On the positive side, it is known
that there is a deterministic Turing machine running in time at most triply exponential in n that decides the
set of true formulas in Presburger arithmetic (Oppen 1978). Thus, this is an example of a language that is
decidable but not context-sensitive.

Closure properties
Recursive languages are closed under the following operations. That is, if L and P are two recursive
languages, then the following languages are recursive as well:

The Kleene star L*
The image φ(L) under an e-free homomorphism φ
The concatenation L ∘ P
The union L ∪ P
The intersection L ∩ P
The complement of L
The set difference L − P

The last property follows from the fact that the set difference can be expressed in terms of intersection
and complement.

6.1. Recursive set

A subset S of the natural numbers is called recursive if there exists a total computable function f such
that f(x) = 1 if x ∈ S and f(x) = 0 if x ∉ S. In other words, the set S is recursive if and only if the indicator
function 1_S is computable.
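As a concrete illustration of the definition, here is a Python sketch of a total computable indicator
function for the set of prime numbers, one of the examples below; the trial-division method is just one
convenient choice:

def indicator_primes(x: int) -> int:
    # Total: returns 0 or 1 for every natural number x.
    if x < 2:
        return 0
    return 0 if any(x % d == 0 for d in range(2, int(x ** 0.5) + 1)) else 1

print([indicator_primes(n) for n in range(10)])  # [0, 0, 1, 1, 0, 1, 0, 1, 0, 0]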

Examples
Every finite or cofinite subset of the natural numbers is computable. This includes these special
cases:
The empty set is computable.
The entire set of natural numbers is computable.
Each natural number (as defined in standard set theory) is computable; that is, the set of natural
numbers less than a given natural number is computable.
The set of prime numbers is computable.
The set of Gödel numbers of arithmetic proofs described in Kurt Gödel's paper "On formally
undecidable propositions of Principia Mathematica and related systems I"; see Gödel's
incompleteness theorems.

Properties
If A is a recursive set then the complement of A is a recursive set. If A and B are recursive sets then A ∩ B,
A ∪ B and the image of A × B under the Cantor pairing function are recursive sets.

A set A is a recursive set if and only if A and the complement of A are both recursively enumerable sets.
The preimage of a recursive set under a total computable function is a recursive set. The image of a
computable set under a total computable bijection is computable.

A set is recursive if and only if it is at level Δ⁰₁ of the arithmetical hierarchy.

A set is recursive if and only if it is either the range of a nondecreasing total computable function or the
empty set. The image of a computable set under a nondecreasing total computable function is computable.

6.2. Decision problem

A decision problem has only two possible outputs (yes or no) on any input.

In computability theory and computational complexity theory, a decision problem is a problem that can
be posed as a yes-no question of the input values. Decision problems typically appear in mathematical
questions of decidability, that is, the question of the existence of an effective method to determine the
existence of some object or its membership in a set; some of the most important problems in mathematics
are undecidable.

For example, the problem "given two numbers x and y, does x evenly divide y?" is a decision problem. The
answer can be either 'yes' or 'no', and depends upon the values of x and y. A method for solving a decision
problem, given in the form of an algorithm, is called a decision procedure for that problem. A decision
procedure for the decision problem "given two numbers x and y, does x evenly divide y?" would give the
steps for determining whether x evenly divides y, given x and y. One such algorithm is long division,
taught to many school children. If the remainder is zero the answer produced is 'yes', otherwise it is 'no'. A
decision problem which can be solved by an algorithm, such as this example, is called decidable.
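As a small sketch (ours, not from the source), the long-division decision procedure amounts to checking
the remainder; it assumes x > 0:

def divides(x, y):
    # Decision procedure for "does x evenly divide y?" (assumes x > 0).
    # It halts on every input, so the problem is decidable.
    return 'yes' if y % x == 0 else 'no'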

The field of computational complexity categorizes decidable decision problems by how difficult they are to
solve. "Difficult", in this sense, is described in terms of the computational resources needed by the most
efficient algorithm for a certain problem. The field of recursion theory, meanwhile, categorizes
undecidable decision problems by Turing degree, which is a measure of the noncomputability inherent in
any solution.

Definition
A decision problem is any arbitrary yes-or-no question on an infinite set of inputs. Because of this, it is
traditional to define the decision problem equivalently as: the set of possible inputs together with the set
of inputs for which the problem returns yes.

These inputs can be natural numbers, but may also be values of some other kind, such as strings over the
binary alphabet {0,1} or over some other finite set of symbols. The subset of strings for which the problem
returns "yes" is a formal language, and often decision problems are defined in this way as formal
languages.

Alternatively, using an encoding such as Gödel numberings, any string can be encoded as a natural
number, via which a decision problem can be defined as a subset of the natural numbers.

Examples
A classic example of a decidable decision problem is the set of prime numbers. It is possible to effectively
decide whether a given natural number is prime by testing every possible nontrivial factor. Although much
more efficient methods of primality testing are known, the existence of any effective method is enough to
establish decidability.
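A minimal sketch (ours) of the trial-division decider described above; any always-halting method, however
slow, suffices for decidability:

def is_prime(n):
    # Decide primality by testing every possible nontrivial factor.
    # The loop is finite, so the procedure always halts.
    if n < 2:
        return False
    for d in range(2, n):
        if n % d == 0:
            return False
    return True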

6.3. Undecidable problem

In computability theory and computational complexity theory, an undecidable problem is a decision
problem for which it is known to be impossible to construct a single algorithm that always leads to a
correct yes-or-no answer. The halting problem is an example: there is no algorithm that correctly
determines whether arbitrary programs eventually halt when run.

A decision problem is any arbitrary yes-or-no question on an infinite set of inputs. Because of this, it is
traditional to define the decision problem equivalently as the set of inputs for which the problem returns
yes. These inputs can be natural numbers, but may also be values of some other kind, such as strings of a
formal language. Using some encoding, such as a Gödel numbering, the strings can be encoded as natural
numbers. Thus, a decision problem informally phrased in terms of a formal language is also equivalent to a
set of natural numbers. To keep the formal definition simple, it is phrased in terms of subsets of the
natural numbers.

Formally, a decision problem is a subset of the natural numbers. The corresponding informal problem is
that of deciding whether a given number is in the set. A decision problem A is called decidable or
effectively solvable if A is a recursive set. A problem is called partially decidable, semi-decidable,
solvable, or provable if A is a recursively enumerable set. This means that there exists an algorithm that
halts eventually when the answer is yes but may run forever if the answer is no. Partially decidable
problems and any other problems that are not decidable are called undecidable.

Example: the halting problem in computability theory


In computability theory, the halting problem is a decision problem which can be stated as follows:

Given the description of an arbitrary program and a finite input, decide whether the program finishes
running or will run forever.

Alan Turing proved in 1936 that a general algorithm running on a Turing machine that solves the halting
problem for all possible program-input pairs necessarily cannot exist. Hence, the halting problem is
undecidable for Turing machines.

Relationship with Gödel's incompleteness theorem


The concepts raised by Gödel's incompleteness theorems are very similar to those raised by the halting
problem, and the proofs are quite similar. In fact, a weaker form of the First Incompleteness Theorem is
an easy consequence of the undecidability of the halting problem. This weaker form differs from the
standard statement of the incompleteness theorem by asserting that a complete, consistent and sound
axiomatization of all statements about natural numbers is unachievable. The "sound" part is the
weakening: it means that we require the axiomatic system in question to prove only true statements about
natural numbers. It is important to observe that the statement of the standard form of Gödel's First
Incompleteness Theorem is completely unconcerned with the question of truth, but only concerns the
issue of whether it can be proven.

The weaker form of the theorem can be proved from the undecidability of the halting problem as follows.
Assume that we have a consistent and complete axiomatization of all true first-order logic statements
about natural numbers. Then we can build an algorithm that enumerates all these statements. This means
that there is an algorithm N(n) that, given a natural number n, computes a true first-order logic statement
about natural numbers such that, for all the true statements, there is at least one n such that N(n) yields
that statement. Now suppose we want to decide if the algorithm with representation a halts on input i. We
know that this statement can be expressed with a first-order logic statement, say H(a, i). Since the
axiomatization is complete it follows that either there is an n such that N(n) = H(a, i) or there is an n' such
that N(n') = ¬ H(a, i). So if we iterate over all n until we either find H(a, i) or its negation, we will always
halt. This means that this gives us an algorithm to decide the halting problem. Since we know that there
cannot be such an algorithm, it follows that the assumption that there is a consistent and complete
axiomatization of all true first-order logic statements about natural numbers must be false.
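The search loop at the heart of this argument can be sketched as follows (our illustration; N and H are the
hypothetical enumerator and halting statement from the text, and Not denotes negation):

def decide_halting(a, i):
    # Hypothetical decider built from a complete, consistent axiomatization.
    # N(n) enumerates all true statements; exactly one of H(a, i) and its
    # negation is true, so the loop always terminates -- contradicting the
    # undecidability of the halting problem.
    n = 0
    while True:
        if N(n) == H(a, i):
            return True
        if N(n) == Not(H(a, i)):
            return False
        n += 1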

Examples of undecidable problems


Undecidable problems can be related to different topics, such as logic, abstract machines or topology.
Note that since there are uncountably many undecidable problems, any list, even one of infinite length, is
necessarily incomplete.

Examples of undecidable statements
There are two distinct senses of the word "undecidable" in contemporary use. The first of these is the
sense used in relation to Gödel's theorems, that of a statement being neither provable nor refutable in a
specified deductive system. The second sense is used in relation to computability theory and applies not to
statements but to decision problems, which are countably infinite sets of questions each requiring a yes or
no answer. Such a problem is said to be undecidable if there is no computable function that correctly
answers every question in the problem set. The connection between these two is that if a decision problem
is undecidable (in the recursion theoretical sense) then there is no consistent, effective formal system
which proves for every question A in the problem either "the answer to A is yes" or "the answer to A is no".

Because of the two meanings of the word undecidable, the term independent is sometimes used instead of
undecidable for the "neither provable nor refutable" sense. The usage of "independent" is also ambiguous,
however. It can mean just "not provable", leaving open whether an independent statement might be
refuted.

Undecidability of a statement in a particular deductive system does not, in and of itself, address the
question of whether the truth value of the statement is well-defined, or whether it can be determined by
other means. Undecidability only implies that the particular deductive system being considered does not
prove the truth or falsity of the statement. Whether there exist so-called "absolutely undecidable"
statements, whose truth value can never be known or is ill-specified, is a controversial point among
various philosophical schools.

One of the first problems suspected to be undecidable, in the second sense of the term, was the word
problem for groups, first posed by Max Dehn in 1911, which asks if there is a finitely presented group for
which no algorithm exists to determine whether two words are equivalent. This was shown to be the case
in 1952.

The combined work of Gödel and Paul Cohen has given two concrete examples of undecidable statements
(in the first sense of the term): The continuum hypothesis can neither be proved nor refuted in ZFC (the
standard axiomatization of set theory), and the axiom of choice can neither be proved nor refuted in ZF
(which is all the ZFC axioms except the axiom of choice). These results do not require the incompleteness
theorem. Gödel proved in 1940 that neither of these statements could be disproved in ZF or ZFC set
theory. In the 1960s, Cohen proved that neither is provable from ZF, and the continuum hypothesis cannot
be proven from ZFC.

In 1970, Russian mathematician Yuri Matiyasevich showed that Hilbert's Tenth Problem, posed in 1900 as
a challenge to the next century of mathematicians, cannot be solved. Hilbert's challenge sought an
algorithm which finds all solutions of a Diophantine equation. A Diophantine equation is a more general
case of Fermat's Last Theorem; we seek the integer roots of a polynomial in any number of variables with
integer coefficients. Since we have only one equation but n variables, infinitely many solutions exist (and
are easy to find) in the complex plane; however, the problem becomes impossible if solutions are
constrained to integer values only. Matiyasevich showed this problem to be unsolvable by mapping a
Diophantine equation to a recursively enumerable set and invoking Gödel's Incompleteness Theorem.[1]

In 1936, Alan Turing proved that the halting problem—the question of whether or not a Turing machine
halts on a given program—is undecidable, in the second sense of the term. This result was later
generalized by Rice's theorem.

In 1973, the Whitehead problem in group theory was shown to be undecidable, in the first sense of the
term, in standard set theory.

In 1977, Paris and Harrington proved that the Paris-Harrington principle, a version of the Ramsey
theorem, is undecidable in the axiomatization of arithmetic given by the Peano axioms but can be proven
to be true in the larger system of second-order arithmetic.

Kruskal's tree theorem, which has applications in computer science, is also undecidable from the Peano
axioms but provable in set theory. In fact Kruskal's tree theorem (or its finite form) is undecidable in a
much stronger system codifying the principles acceptable on the basis of a philosophy of mathematics called
predicativism.

Goodstein's theorem is a statement about the Ramsey theory of the natural numbers that Kirby and Paris
showed is undecidable in Peano arithmetic.

Gregory Chaitin produced undecidable statements in algorithmic information theory and proved another
incompleteness theorem in that setting. Chaitin's theorem states that for any theory that can represent
enough arithmetic, there is an upper bound c such that no specific number can be proven in that theory to
have Kolmogorov complexity greater than c. While Gödel's theorem is related to the liar paradox, Chaitin's
result is related to Berry's paradox.

In 2007, researchers Kurtz and Simon, building on earlier work by J.H. Conway in the 1970s, proved that a
natural generalization of the Collatz problem is undecidable.[2]

6.3.1. Halting problem

In computability theory, the halting problem is the problem of determining, from a description of an
arbitrary computer program and an input, whether the program will finish running or continue to run
forever.

Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible
program-input pairs cannot exist. A key part of the proof was a mathematical definition of a computer and
program, which became known as a Turing machine; the halting problem is undecidable over Turing
machines. It is one of the first examples of a decision problem.

Informally, for any program f that might determine if programs halt, a "pathological" program g called
with an input can pass its own source and its input to f and then specifically do the opposite of what f
predicts g will do. No f can exist that handles this case.

Jack Copeland (2004) attributes the term halting problem to Martin Davis.[1]

The halting problem is a decision problem about properties of computer programs on a fixed Turing-
complete model of computation, i.e., all programs that can be written in some given programming
language that is general enough to be equivalent to a Turing machine. The problem is to determine, given
a program and an input to the program, whether the program will eventually halt when run with that
input. In this abstract framework, there are no resource limitations on the amount of memory or time
required for the program's execution; it can take arbitrarily long, and use an arbitrary amount of storage
space, before halting. The question is simply whether the given program will ever halt on a particular
input.

For example, in pseudocode, the program

while (true) continue

does not halt; rather, it goes on forever in an infinite loop. On the other hand, the program

print "Hello, world!"

does halt.

While deciding whether these programs halt is simple, more complex programs prove problematic.

One approach to the problem might be to run the program for some number of steps and check if it halts.
But if the program does not halt, it is unknown whether the program will eventually halt or run forever.
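This bounded-simulation idea can be sketched as follows (our illustration; run_for_steps is a hypothetical
interpreter that simulates a program for at most a given number of steps):

def try_halts(program, input_value, max_steps):
    # Simulate the program for at most max_steps steps.
    if run_for_steps(program, input_value, max_steps) == 'halted':
        return 'halts'
    return 'unknown'   # it may halt later, or it may run forever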

Turing proved no algorithm exists that always correctly decides whether, for a given arbitrary program
and input, the program halts when run with that input. The essence of Turing's proof is that any such
algorithm can be made to contradict itself and therefore cannot be correct.

Programming consequences

Some infinite loops can be quite useful. For instance, event loops are typically coded as infinite loops.[2]
However, most subroutines are intended to finish (halt).[3] In particular, in hard real-time computing,
programmers attempt to write subroutines that are not only guaranteed to finish (halt), but are also
guaranteed to finish before a given deadline.[4]

Sometimes these programmers use some general-purpose (Turing-complete) programming language, but
attempt to write in a restricted style—such as MISRA C or SPARK—that makes it easy to prove that the
resulting subroutines finish before the given deadline.[citation needed]

Other times these programmers apply the rule of least power—they deliberately use a computer language
that is not quite fully Turing-complete, often a language that guarantees that all subroutines are
guaranteed to finish, such as Coq.[citation needed]

Common pitfalls

The difficulty in the halting problem lies in the requirement that the decision procedure must work for all
programs and inputs. A particular program either halts on a given input or does not halt. Consider one
algorithm that always answers "halts" and another that always answers "doesn't halt". For any specific
program and input, one of these two algorithms answers correctly, even though nobody may know which
one. Yet neither algorithm solves the halting problem generally.

There are programs (interpreters) that simulate the execution of whatever source code they are given.
Such programs can demonstrate that a program does halt if this is the case: the interpreter itself will
eventually halt its simulation, which shows that the original program halted. However, an interpreter will
not halt if its input program does not halt, so this approach cannot solve the halting problem as stated; it
does not successfully answer "doesn't halt" for programs that do not halt.

The halting problem is theoretically decidable for linear bounded automata (LBAs) or deterministic
machines with finite memory. A machine with finite memory has a finite number of states, and thus any
deterministic program on it must eventually either halt or repeat a previous state:

...any finite-state machine, if left completely to itself, will fall eventually into a perfectly periodic
repetitive pattern. The duration of this repeating pattern cannot exceed the number of internal states
of the machine... (italics in original, Minsky 1967, p. 24)

Minsky warns us, however, that machines such as computers with e.g., a million small parts, each with two
states, will have at least 2^1,000,000 possible states:

This is a 1 followed by about three hundred thousand zeroes ... Even if such a machine were to
operate at the frequencies of cosmic rays, the aeons of galactic evolution would be as nothing
compared to the time of a journey through such a cycle (Minsky 1967 p. 25):

Minsky exhorts the reader to be suspicious—although a machine may be finite, and finite automata "have a
number of theoretical limitations":

...the magnitudes involved should lead one to suspect that theorems and arguments based chiefly on
the mere finiteness [of] the state diagram may not carry a great deal of significance. (Minsky p. 25)

It can also be decided automatically whether a nondeterministic machine with finite memory halts on
none, some, or all of the possible sequences of nondeterministic decisions, by enumerating states after
each possible decision.

Representation as a set

The conventional representation of decision problems is the set of objects possessing the property in
question. The halting set

K := { (i, x) | program i halts when run on input x}

represents the halting problem.

This set is recursively enumerable, which means there is a computable function that lists all of the pairs
(i, x) it contains (Moore and Mertens 2011, pp. 236–237). However, the complement of this set is not
recursively enumerable (Moore and Mertens 2011, pp. 236–237).

There are many equivalent formulations of the halting problem; any set whose Turing degree equals that
of the halting problem is such a formulation. Examples of such sets include:

{ i | program i eventually halts when run with input 0 }


{ i | there is an input x such that program i eventually halts when run with input x }.

Proof concept

The proof that the halting problem is not solvable is a proof by contradiction. To illustrate the concept of
the proof, suppose that there exists a total computable function halts(f) that returns true if the subroutine
f halts (when run with no inputs) and returns false otherwise. Now consider the following subroutine:
def g():
    if halts(g):
        loop_forever()

halts(g) must either return true or false, because halts was assumed to be total. If halts(g) returns true,
then g will call loop_forever and never halt, which is a contradiction. If halts(g) returns false, then g will
halt, because it will not call loop_forever; this is also a contradiction. Overall, halts(g) can not return a
truth value that is consistent with whether g halts. Therefore, the initial assumption that halts is a total
computable function must be false.

The method used in the proof is called diagonalization - g does the opposite of what halts says g should do.

The difference between this sketch and the actual proof is that in the actual proof, the computable
function halts does not directly take a subroutine as an argument; instead it takes the source code of a
program. The actual proof requires additional work to handle this issue. Moreover, the actual proof avoids
the direct use of recursion shown in the definition of g.

Sketch of proof

The concept above shows the general method of the proof; this section will present additional details. The
overall goal is to show that there is no total computable function that decides whether an arbitrary
program i halts on arbitrary input x; that is, the following function h is not computable (Penrose 1990,
p. 57–63):

    h(i, x) = 1 if program i halts on input x, and h(i, x) = 0 otherwise.

Here program i refers to the i-th program in an enumeration of all the programs of a fixed Turing-
complete model of computation.

The proof proceeds by directly establishing that no total computable function with two arguments can be
the required function h. As in the sketch of the concept, given any total computable binary function f, the
following partial function g is also computable by some program e:

    g(i) = 0 if f(i, i) = 0, and g(i) is undefined otherwise.

The table below shows possible values for a total computable function f arranged in a 2D array, with the
value of f(i, j) placed at column i, row j; the cells f(i, i) form the diagonal. The values of f(i, i) and g(i) are
shown at the bottom; U indicates that the function g is undefined for a particular input value.

    f(i, j)    i = 1  2  3  4  5  6
    j = 1          1  0  0  1  0  1
    j = 2          0  0  0  1  0  0
    j = 3          0  1  0  1  0  1
    j = 4          1  0  0  1  0  0
    j = 5          0  0  0  1  1  1
    j = 6          1  1  0  0  1  0

    f(i, i)        1  0  0  1  1  0
    g(i)           U  0  0  U  U  0

The verification that g is computable relies on the following constructs (or their equivalents):

    computable subprograms (the program that computes f is a subprogram in program e),
    duplication of values (program e computes the inputs i, i for f from the input i for g),
    conditional branching (program e selects between two results depending on the value it computes for f(i, i)),
    not producing a defined result (for example, by looping forever),
    returning a value of 0.

The following pseudocode illustrates a straightforward way to compute g:


procedure compute_g(i):
    if f(i,i) == 0 then
        return 0
    else
        loop forever

Because g is partial computable, there must be a program e that computes g, by the assumption that the
model of computation is Turing-complete. This program is one of all the programs on which the halting
function h is defined. The next step of the proof shows that h(e,e) will not have the same value as f(e,e).

It follows from the definition of g that exactly one of the following two cases must hold:

f(e,e) = 0 and so g(e) = 0. In this case h(e,e) = 1, because program e halts on input e.
f(e,e) ≠ 0 and so g(e) is undefined. In this case h(e,e) = 0, because program e does not halt on input
e.

In either case, f cannot be the same function as h. Because f was an arbitrary total computable function
with two arguments, all such functions must differ from h.

This proof is analogous to Cantor's diagonal argument. One may visualize a two-dimensional array with
one column and one row for each natural number, as indicated in the table above. The value of f(i,j) is
placed at column i, row j. Because f is assumed to be a total computable function, any element of the array
can be calculated using f. The construction of the function g can be visualized using the main diagonal of
this array. If the array has a 0 at position (i,i), then g(i) is 0. Otherwise, g(i) is undefined. The contradiction
comes from the fact that there is some column e of the array corresponding to g itself. Now assume f was
the halting function h: if g(e) is defined (g(e) = 0 in this case), then program e halts on input e, so
f(e,e) = 1. But g(e) = 0 only when f(e,e) = 0, contradicting f(e,e) = 1. Similarly, if g(e) is not defined, then
the halting function gives f(e,e) = 0, which leads to g(e) = 0 under g's construction. This contradicts the
assumption that g(e) is not defined.

In both cases a contradiction arises. Therefore no total computable function f can be the halting
function h.

Computability theory
The typical method of proving a problem to be undecidable is with the technique of
reduction [clarification needed]. To do this, it is sufficient to show that if a solution to the new problem were
found, it could be used to decide an undecidable problem by transforming instances of the undecidable
problem into instances of the new problem. Since we already know that no method can decide the old
problem, no method can decide the new problem either. Often the new problem is reduced to solving the
halting problem. (Note: the same technique is used to demonstrate that a problem is NP-complete, only in
this case, rather than demonstrating that there is no solution, it demonstrates there is no polynomial time
solution, assuming P ≠ NP).

For example, one such consequence of the halting problem's undecidability is that there cannot be a
general algorithm that decides whether a given statement about natural numbers is true or not. The
reason for this is that the proposition stating that a certain program will halt given a certain input can be
converted into an equivalent statement about natural numbers. If we had an algorithm that could find the
truth value of every statement about natural numbers, it could certainly find the truth value of this one;
but that would determine whether the original program halts, which is impossible, since the halting
problem is undecidable.

Gregory Chaitin has defined a halting probability, represented by the symbol Ω, a type of real number that
informally is said to represent the probability that a randomly produced program halts. These numbers
have the same Turing degree as the halting problem. It is a normal and transcendental number which can
be defined but cannot be completely computed. This means one can prove that there is no algorithm which
produces the digits of Ω, although its first few digits can be calculated in simple cases.

While Turing's proof shows that there can be no general method or algorithm to determine whether
algorithms halt, individual instances of that problem may very well be susceptible to attack. Given a
specific algorithm, one can often show that it must halt for any input, and in fact computer scientists often
do just that as part of a correctness proof. But each proof has to be developed specifically for the
algorithm at hand; there is no mechanical, general way to determine whether algorithms on a Turing
machine halt. However, there are some heuristics that can be used in an automated fashion to attempt to
construct a proof, which succeed frequently on typical programs. This field of research is known as
automated termination analysis.

Since the negative answer to the halting problem shows that there are problems that cannot be solved by
a Turing machine, the Church–Turing thesis limits what can be accomplished by any machine that
implements effective methods. However, not all machines conceivable to human imagination are subject to
the Church–Turing thesis (e.g. oracle machines). It is an open question whether there can be actual
deterministic physical processes that, in the long run, elude simulation by a Turing machine, and in
particular whether any such hypothetical process could usefully be harnessed in the form of a calculating
machine (a hypercomputer) that could solve the halting problem for a Turing machine amongst other
things. It is also an open question whether any such unknown physical processes are involved in the
working of the human brain, and whether humans can solve the halting problem (Copeland 2004, p. 15).

Gödel's incompleteness theorems

The concepts raised by Gödel's incompleteness theorems are very similar to those raised by the halting
problem, and the proofs are quite similar. In fact, a weaker form of the First Incompleteness Theorem is
an easy consequence of the undecidability of the halting problem. This weaker form differs from the
standard statement of the incompleteness theorem by asserting that a complete, consistent and sound
axiomatization of all statements about natural numbers is unachievable. The "sound" part is the
weakening: it means that we require the axiomatic system in question to prove only true statements about
natural numbers. The more general statement of the incompleteness theorems does not require a
soundness assumption of this kind.

The weaker form of the theorem can be proven from the undecidability of the halting problem as follows.
Assume that we have a consistent and complete axiomatization of all true first-order logic statements
about natural numbers. Then we can build an algorithm that enumerates all these statements i.e. an
algorithm N that, given a natural number n, computes a true first-order logic statement about natural
numbers, such that for all the true statements there is at least one n such that N(n) is equal to that
statement. Now suppose we want to decide whether the algorithm with representation a halts on input i.
By using Kleene's T predicate, we can express the statement "a halts on input i" as a statement H(a, i) in
the language of arithmetic. Since the axiomatization is complete it follows that either there is an n such
that N(n) = H(a, i) or there is an n' such that N(n') = ¬ H(a, i). So if we iterate over all n until we either
find H(a, i) or its negation, we will always halt. This means that this gives us an algorithm to decide the
halting problem. Since we know that there cannot be such an algorithm, it follows that the assumption
that there is a consistent and complete axiomatization of all true first-order logic statements about natural
numbers must be false.

Generalization
Many variants of the halting problem can be found in computability textbooks (e.g., Sipser 2006, Davis
1958, Minsky 1967, Hopcroft and Ullman 1979, Börger 1989). Typically their undecidability follows by
reduction from the standard halting problem. However, some of them have a higher degree of
unsolvability. The next two examples are typical.

Halting on all inputs

The universal halting problem, also known (in recursion theory) as totality, is the problem of determining
whether a given computer program will halt for every input (the name totality comes from the equivalent
question of whether the computed function is total). This problem is not only undecidable, as the halting
problem is, but highly undecidable. In terms of the arithmetical hierarchy, it is Π⁰₂-complete (Börger 1989,
p. 121).

This means, in particular, that it cannot be decided even with an oracle for the halting problem.

Recognizing partial solutions

There are many programs that, for some inputs, return a correct answer to the halting problem, while for
other inputs they do not return an answer at all. However the problem ″given program p, is it a partial
halting solver″ (in the sense described) is at least as hard as the halting problem. To see this, assume that
there is an algorithm PHSR (″partial halting solver recognizer″) to do that. Then it can be used to solve the
halting problem, as follows: To test whether input program x halts on y, construct a program p that on
input (x,y) reports true and diverges on all other inputs. Then test p with PHSR.
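A sketch of this construction (our illustration; PHSR is the assumed recognizer):

def halts(x, y):
    # Reduce the halting problem to PHS recognition.
    def p(a, b):
        # p answers only the pair (x, y), claiming that x halts on y,
        # and diverges on every other input.
        if (a, b) == (x, y):
            return True
        while True:
            pass
    # p is a partial halting solver exactly when its single claim is
    # correct, i.e., exactly when x halts on y.
    return PHSR(p)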

The above argument is a reduction of the halting problem to PHS recognition, and in the same manner,
harder problems such as halting on all inputs can also be reduced, implying that PHS recognition is not
only undecidable, but higher in the arithmetical hierarchy, specifically Π⁰₂-complete.

6.4. P (complexity)

In computational complexity theory, P, also known as PTIME or DTIME(n^O(1)), is a fundamental
complexity class. It contains all decision problems that can be solved by a deterministic Turing machine
using a polynomial amount of computation time, or polynomial time.

Cobham's thesis holds that P is the class of computational problems that are "efficiently solvable" or
"tractable". This is inexact: in practice, some problems not known to be in P have practical solutions, and
some that are in P do not, but this is a useful rule of thumb.

A language L is in P if and only if there exists a deterministic Turing machine M, such that

M runs for polynomial time on all inputs


For all x in L, M outputs 1
For all x not in L, M outputs 0

P can also be viewed as a uniform family of boolean circuits. A language L is in P if and only if there exists
a polynomial-time uniform family of boolean circuits {C_n : n ∈ ℕ}, such that

    For all n ∈ ℕ, C_n takes n bits as input and outputs 1 bit
    For all x in L, C_|x|(x) = 1
    For all x not in L, C_|x|(x) = 0

The circuit definition can be weakened to use only a logspace uniform family without changing the
complexity class.

Notable problems in P
P is known to contain many natural problems, including the decision versions of linear programming,
calculating the greatest common divisor, and finding a maximum matching. In 2002, it was shown that the
problem of determining if a number is prime is in P.[1] The related class of function problems is FP.

Several natural problems are complete for P, including st-connectivity (or reachability) on alternating
graphs.[2] The article on P-complete problems lists further relevant problems in P.
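As a concrete sketch (ours), Euclid's algorithm computes the greatest common divisor in time polynomial
in the length (number of digits) of its inputs, which is why the corresponding decision problem lies in P:

def gcd(a, b):
    # Euclid's algorithm: each iteration roughly halves the smaller
    # operand, so the number of iterations is O(log min(a, b)).
    while b != 0:
        a, b = b, a % b
    return a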

Relationships to other classes


A generalization of P is NP, which is the class of decision problems decidable by a non-deterministic Turing
machine that runs in polynomial time. Equivalently, it is the class of decision problems where each "yes"
instance has a polynomial size certificate, and certificates can be checked by a polynomial time
deterministic Turing machine. The class of problems for which this is true for the "no" instances is called
co-NP. P is trivially a subset of NP and of co-NP; most experts believe it is a proper subset,[3] although this
(the P ≠ NP hypothesis) remains unproven. Another open problem is whether NP = co-NP (a negative
answer would imply P ≠ NP).

P is also known to be at least as large as L, the class of problems decidable in a logarithmic amount of
memory space. A decider using O(log n) space cannot use more than 2^O(log n) = n^O(1) time, because this is
the total number of possible configurations; thus, L is a subset of P. Another important problem is whether
L = P. We do know that P = AL, the set of problems solvable in logarithmic memory by alternating Turing
machines. P is also known to be no larger than PSPACE, the class of problems decidable in polynomial
space. Again, whether P = PSPACE is an open problem. To summarize:

    L ⊆ AL = P ⊆ NP ⊆ PSPACE ⊆ EXPTIME

Here, EXPTIME is the class of problems solvable in exponential time. Of all the classes shown above, only
two strict containments are known:

P is strictly contained in EXPTIME. Consequently, all EXPTIME-hard problems lie outside P, and at
least one of the containments to the right of P above is strict (in fact, it is widely believed that all
three are strict).
L is strictly contained in PSPACE.

The most difficult problems in P are P-complete problems.

Another generalization of P is P/poly, or Nonuniform Polynomial-Time. If a problem is in P/poly, then it can
be solved in deterministic polynomial time provided that an advice string is given that depends only on the
length of the input. Unlike for NP, however, the polynomial-time machine doesn't need to detect fraudulent
advice strings; it is not a verifier. P/poly is a large class containing nearly all practical problems, including
all of BPP. If it contains NP, then the polynomial hierarchy collapses to the second level. On the other hand,
it also contains some impractical problems, including some undecidable problems such as the unary
version of any undecidable problem.

In 1999, Jin-Yi Cai and D. Sivakumar, building on work by Mitsunori Ogihara, showed that if there exists a
sparse language that is P-complete, then L = P.[4]

Properties
Polynomial-time algorithms are closed under composition. Intuitively, this says that if one writes a function
that is polynomial-time assuming that function calls are constant-time, and if those called functions
themselves require polynomial time, then the entire algorithm takes polynomial time. One consequence of
this is that P is low for itself. This is also one of the main reasons that P is considered to be a machine-
independent class; any machine "feature", such as random access, that can be simulated in polynomial
time can simply be composed with the main polynomial-time algorithm to reduce it to a polynomial-time
algorithm on a more basic machine.

Languages in P are also closed under reversal, intersection, union, concatenation, Kleene closure, inverse
homomorphism, and complementation.[5]

6.5. R (complexity)

In computational complexity theory, R is the class of decision problems solvable by a Turing machine.

Equivalent formulations

R is equal to the set of all total computable functions.

Relationship with other classes


Since we can decide any problem for which there exists a recogniser and also a co-recogniser by simply
interleaving them until one obtains a result, the class is equal to RE ∩ coRE.

6.6. RP (complexity)

In computational complexity theory, randomized polynomial time (RP) is the complexity class of
problems for which a probabilistic Turing machine exists with these properties:

It always runs in polynomial time in the input size
If the correct answer is NO, it always returns NO
If the correct answer is YES, then it returns YES with probability at least 1/2 (otherwise, it returns NO).

In other words, the algorithm is allowed to flip a truly random coin while it is running. The only case in
which the algorithm can return YES is if the actual answer is YES; therefore if the algorithm terminates
and produces YES, then the correct answer is definitely YES; however, the algorithm can terminate with
NO regardless of the actual answer. That is, if the algorithm returns NO, it might be wrong.

If the correct answer is YES and the algorithm is run n times with the result of each run statistically
independent of the others, then it will return YES at least once with probability at least 1 − 2^−n. So if the
algorithm is run 100 times, then the chance of it giving the wrong answer every time is lower than the
chance that cosmic rays corrupted the memory of the computer running the algorithm.[1] In this sense, if
a source of random numbers is available, most algorithms in RP are highly practical.

    RP algorithm (1 run)            Answer produced
    Correct answer                  Yes          No
    Yes                             ≥ 1/2        ≤ 1/2
    No                              0            1

    RP algorithm (n runs)           Answer produced
    Correct answer                  Yes          No
    Yes                             ≥ 1 − 2^−n   ≤ 2^−n
    No                              0            1

    co-RP algorithm (1 run)         Answer produced
    Correct answer                  Yes          No
    Yes                             1            0
    No                              ≤ 1/2        ≥ 1/2

The fraction 1/2 in the definition is arbitrary. The set RP will contain exactly the same problems, even if
the 1/2 is replaced by any constant nonzero probability less than 1; here constant means independent of
the input to the algorithm.
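The amplification argument above can be sketched as follows (our illustration; rp_algorithm is a
hypothetical one-sided-error routine as defined at the start of this section):

def amplified_rp(x, runs=100):
    # A YES from an RP algorithm is always correct, so repeating it
    # only shrinks the one-sided error: after `runs` independent NOs,
    # the true answer is YES with probability at most 2**(-runs).
    for _ in range(runs):
        if rp_algorithm(x) == 'YES':
            return 'YES'   # never wrong
    return 'NO'            # wrong with probability <= 2**(-runs)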

Related complexity classes


The definition of RP says that a YES answer is always right and that a NO answer might be wrong
(because a question with the YES answer can sometimes be answered NO). In other words, while NO
questions are always answered NO, you cannot trust the NO answer; it may be a mistaken answer to a YES
question. The complexity class co-RP is similarly defined, except that NO is always right and YES might
be wrong. In other words, it accepts all YES instances but can either accept or reject NO instances. The
class BPP describes algorithms that can give incorrect answers on both YES and NO instances, and thus
contains both RP and co-RP. The intersection of the sets RP and co-RP is called ZPP. Just as RP may be
called R, some authors use the name co-R rather than co-RP.

Connection to P and NP
P is a subset of RP, which is a subset of NP. Similarly, P is a subset of co-RP, which is a subset of co-NP. It
is not known whether these inclusions are strict. However, if the commonly believed conjecture P = BPP is
true, then RP, co-RP, and P collapse (are all equal). Assuming in addition that P ≠ NP, this then implies
that RP is strictly contained in NP. It is not known whether RP = co-RP, or whether RP is a subset of the
intersection of NP and co-NP, though this would be implied by P = BPP.

A natural example of a problem in co-RP currently not known to be in P is Polynomial Identity Testing, the
problem of deciding whether a given multivariate arithmetic expression over the integers is the
zero-polynomial. For instance, x·x − y·y − (x + y)·(x − y) is the zero-polynomial while x·x + y·y is not.
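A minimal randomized sketch of such a test (ours; it evaluates the expression at random points, the
standard Schwartz–Zippel approach, and treats the polynomial as a callable):

import random

def probably_zero(p, num_vars, trials=50, bound=10**6):
    # co-RP-style test: a genuine zero polynomial always yields YES;
    # a nonzero one is caught with high probability by some trial.
    for _ in range(trials):
        point = [random.randint(0, bound) for _ in range(num_vars)]
        if p(*point) != 0:
            return 'NO'    # certainly not the zero polynomial
    return 'YES'           # zero with high probability

# The example from the text: x*x - y*y - (x + y)*(x - y) is identically zero.
print(probably_zero(lambda x, y: x*x - y*y - (x + y)*(x - y), 2))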

An alternative characterization of RP that is sometimes easier to use is the set of problems recognizable
by nondeterministic Turing machines where the machine accepts if and only if at least some constant
fraction of the computation paths, independent of the input size, accept. NP on the other hand, needs only
one accepting path, which could constitute an exponentially small fraction of the paths. This
characterization makes the fact that RP is a subset of NP obvious.

6.7. Recursively enumerable set

In computability theory, traditionally called recursion theory, a set S of natural numbers is called
recursively enumerable, computably enumerable, semidecidable, provable or Turing-
recognizable if:

There is an algorithm such that the set of input numbers for which the algorithm halts is exactly S.

Or, equivalently,

There is an algorithm that enumerates the members of S. That means that its output is simply a list of
the members of S: s1, s2, s3, ... . If necessary, this algorithm may run forever.

The first condition suggests why the term semidecidable is sometimes used; the second suggests why
computably enumerable is used. The abbreviations r.e. and c.e. are often used, even in print, instead of
the full phrase.

In computational complexity theory, the complexity class containing all recursively enumerable sets is RE.
In recursion theory, the lattice of r.e. sets under inclusion is denoted ℰ.

A set S of natural numbers is called recursively enumerable if there is a partial recursive function whose
domain is exactly S, meaning that the function is defined if and only if its input is a member of S.

Equivalent formulations
The following are all equivalent properties of a set S of natural numbers:

Semidecidability:
    The set S is recursively enumerable. That is, S is the domain (co-range) of a partial recursive
    function.
    There is a partial recursive function f such that f(x) = 0 if x ∈ S, and f(x) is undefined if x ∉ S.

Enumerability:
    The set S is the range of a partial recursive function.
    The set S is the range of a total recursive function, or empty. If S is infinite, the function can be
    chosen to be injective.
    The set S is the range of a primitive recursive function, or empty. Even if S is infinite, repetition
    of values may be necessary in this case.

Diophantine:
    There is a polynomial p with integer coefficients and variables x, a, b, c, d, e, f, g, h, i ranging
    over the natural numbers such that x ∈ S if and only if there exist a, b, c, d, e, f, g, h, i with
    p(x, a, b, c, d, e, f, g, h, i) = 0.
    There is a polynomial from the integers to the integers such that the set S contains exactly the
    non-negative numbers in its range.

The equivalence of semidecidability and enumerability can be obtained by the technique of dovetailing.
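Dovetailing can be sketched as follows (our illustration; accepts_within is a hypothetical bounded
simulation of the semidecision procedure). Every member of S is accepted within some finite number of
steps, so it is eventually printed:

def enumerate_members(accepts_within):
    # Turn a semidecider into an enumerator: at stage k, run the
    # semidecider on inputs 0..k for k steps each.
    printed = set()
    k = 0
    while True:
        for x in range(k + 1):
            if x not in printed and accepts_within(x, k):
                printed.add(x)
                print(x)   # x is a member of S
        k += 1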

The Diophantine characterizations of a recursively enumerable set, while not as straightforward or
intuitive as the first definitions, were found by Yuri Matiyasevich as part of the negative solution to
Hilbert's Tenth Problem. Diophantine sets predate recursion theory and are therefore historically the first
way to describe these sets (although this equivalence was only remarked more than three decades after
the introduction of recursively enumerable sets). The number of bound variables in the above definition of
the Diophantine set is the best known so far; it might be that a lower number can be used to define all
diophantine sets.

Examples
Every recursive set is recursively enumerable, but it is not true that every recursively enumerable set
is recursive. For recursive sets, the algorithm must also say if an input is not in the set – this is not
required of recursively enumerable sets.
A recursively enumerable language is a recursively enumerable subset of a formal language.
The set of all provable sentences in an effectively presented axiomatic system is a recursively
enumerable set.
Matiyasevich's theorem states that every recursively enumerable set is a Diophantine set (the
converse is trivially true).
The simple sets are recursively enumerable but not recursive.
The creative sets are recursively enumerable but not recursive.
Any productive set is not recursively enumerable.
Given a Gödel numbering φ of the computable functions, the set {⟨i, x⟩ : φ_i(x) is defined} (where
⟨i, x⟩ is the Cantor pairing of i and x, and "defined" means that the computation halts) is recursively
enumerable (cf. the picture, for a fixed x). This set encodes the halting problem as it describes the
input parameters for which each Turing machine halts.
Given a Gödel numbering φ of the computable functions, the set {⟨x, y, z⟩ : φ_x(y) = z} is recursively
enumerable. This set encodes the problem of deciding a function value.
Given a partial function f from the natural numbers into the natural numbers, f is a partial recursive
function if and only if the graph of f, that is, the set of all pairs ⟨x, f(x)⟩ such that f(x) is defined, is
recursively enumerable.

Figure: recursive enumeration of the set of all Turing machines halting on a fixed input. Simulate all
Turing machines (enumerated on the vertical axis) step by step (horizontal axis), using the shown
diagonalization scheduling. If a machine terminates, print its number; this way, the number of each
terminating machine is eventually printed. In the example, the algorithm prints "9, 13, 4, 15, 12, 18,
6, 2, 8, 0, ...".

Properties
If A and B are recursively enumerable sets then A ∩ B, A ∪ B and A × B (with the ordered pair of natural
numbers mapped to a single natural number with the Cantor pairing function) are recursively enumerable
sets. The preimage of a recursively enumerable set under a partial recursive function is a recursively
enumerable set.

A set is recursively enumerable if and only if it is at level Σ⁰₁ of the arithmetical hierarchy.

A set is called co-recursively enumerable or co-r.e. if its complement is recursively enumerable.
Equivalently, a set is co-r.e. if and only if it is at level Π⁰₁ of the arithmetical hierarchy.

A set A is recursive (synonym: computable) if and only if both A and the complement of A are recursively
enumerable. A set is recursive if and only if it is either the range of an increasing total recursive function
or finite.

Some pairs of recursively enumerable sets are effectively separable and some are not.

6.8. Rice's theorem

In computability theory, Rice's theorem states that all non-trivial, semantic properties of programs are
undecidable. A semantic property is one about the program's behavior (for instance, does the program
terminate for all inputs), unlike a syntactic property (for instance, does the program contain an if-then-else
statement). A property is non-trivial if it is neither true for every computable function, nor for no
computable function.

Rice's theorem can also be put in terms of functions: for any non-trivial property of partial functions, no
general and effective method can decide whether an algorithm computes a partial function with that
property. Here, a property of partial functions is called trivial if it holds for all partial computable
functions or for none, and an effective decision method is called general if it decides correctly for every
algorithm. The theorem is named after Henry Gordon Rice, who proved it in his doctoral dissertation of
1951 at Syracuse University. It is also known as the Rice–Myhill–Shapiro theorem after Rice, John
Myhill, and Norman Shapiro.

Another way of stating Rice's theorem that is more useful in computability theory follows.

Let S be a set of languages that is nontrivial, meaning

1. there exists a Turing machine that recognizes a language in S


2. there exists a Turing machine that recognizes a language not in S

Then, it is undecidable to determine whether the language recognized by an arbitrary Turing machine lies
in S.

In practice, this means that there is no machine that can always decide whether the language of a given
Turing machine has a particular nontrivial property. Special cases include the undecidability of whether a
Turing machine accepts a particular string, whether a Turing machine recognizes a particular
recognizable language, and whether the language recognized by a Turing machine could be recognized by
a nontrivial simpler machine, such as a finite automaton.

It is important to note that Rice's theorem does not say anything about those properties of machines or
programs that are not also properties of functions and languages. For example, whether a machine runs
for more than 100 steps on some input is a decidable property, even though it is non-trivial. Implementing
exactly the same language, two different machines might require a different number of steps to recognize
the same input. Similarly, whether a machine has more than 5 states is a decidable property of the
machine, as the number of states can simply be counted. Where a property is of the kind that either of the
two machines may or may not have it, while still implementing exactly the same language, the property is
of the machines and not of the language, and Rice's Theorem does not apply.

Using Rogers' characterization of acceptable programming systems, Rice's Theorem may essentially be
generalized from Turing machines to most computer programming languages: there exists no automatic
method that decides with generality non-trivial questions on the behavior of computer programs.

As an example, consider the following variant of the halting problem. Let P be the following property of
partial functions F of one argument: P(F) means that F is defined for the argument '1'. It is obviously
non-trivial, since there are partial functions that are defined at 1, and others that are undefined at 1. The
1-halting problem is the problem of deciding of any algorithm whether it defines a function with this
property, i.e., whether the algorithm halts on input 1. By Rice's theorem, the 1-halting problem is
undecidable. Similarly the question of whether a Turing machine T terminates on an initially empty tape
(rather than with an initial word w given as second argument in addition to a description of T, as in the full
halting problem) is still undecidable.

Formal statement
Let φ be an admissible numbering of the computable functions; a map from the natural numbers to the
class P^(1) of unary (partial) computable functions. Denote by φ_e the e-th (partial) computable function.

We identify each property that a computable function may have with the subset of P^(1) consisting of the
functions with that property. Thus, given a set F ⊆ P^(1), a computable function φ_e has property F if and
only if φ_e ∈ F. For each property F there is an associated decision problem D_F of determining, given e,
whether φ_e ∈ F.

Rice's theorem states that the decision problem D_F is decidable (also called recursive or computable) if
and only if F = ∅ or F = P^(1).

Examples
According to Rice's theorem, if there is at least one computable function in a particular class C of
computable functions and another computable function not in C then the problem of deciding whether a
particular program computes a function in C is undecidable. For example, Rice's theorem shows that each
of the following sets of computable functions is undecidable:

The class of computable functions that return 0 for every input, and its complement.
The class of computable functions that return 0 for at least one input, and its complement.
The class of computable functions that are constant, and its complement.
The class of indices for computable functions that are total [1]
The class of indices for recursively enumerable sets that are cofinite
The class of indices for recursively enumerable sets that are recursive

Proof by Kleene's recursion theorem

A corollary to Kleene's recursion theorem states that for every Gödel numbering φ of the computable
functions and every computable function Q(x, y), there is an index e such that φ_e(y) returns Q(e, y). (In
the following, we say that f(x) "returns" g(x) if either f(x) = g(x), or both f(x) and g(x) are undefined.)
Intuitively, φ_e is a quine, a function that returns its own source code (Gödel number), except that rather
than returning it directly, φ_e passes its Gödel number to Q and returns the result.

Let F be a set of computable functions such that ∅ ≠ F ≠ P^(1). Then there are computable functions
f ∈ F and g ∉ F. Suppose that the set of indices x such that φ_x ∈ F is decidable; then, there exists a
computable function Q(x, y) that returns g(y) if φ_x ∈ F, and f(y) otherwise. By the corollary to the
recursion theorem, there is an index e such that φ_e(y) returns Q(e, y). But then, if φ_e ∈ F, then φ_e is
the same function as g, and therefore φ_e ∉ F; and if φ_e ∉ F, then φ_e is f, and therefore φ_e ∈ F. In both
cases, we have a contradiction.

Proof by reduction to the halting problem


Proof sketch

Suppose, for concreteness, that we have an algorithm for examining a program p and determining
infallibly whether p is an implementation of the squaring function, which takes an integer d and returns
d2. The proof works just as well if we have an algorithm for deciding any other nontrivial property of
programs, and is given in general below.

The claim is that we can convert our algorithm for identifying squaring programs into one that identifies
functions that halt. We will describe an algorithm that takes inputs a and i and determines whether
program a halts when given input i.

The algorithm for deciding this is conceptually simple: it constructs (the description of) a new program t
taking an argument n, which (1) first executes program a on input i (both a and i being hard-coded into
the definition of t), and (2) then returns the square of n. If a(i) runs forever, then t never gets to step (2),
regardless of n. Then clearly, t is a function for computing squares if and only if step (1) terminates. Since
we've assumed that we can infallibly identify programs for computing squares, we can determine whether
t, which depends on a and i, is such a program, and that for every a and i; thus we have obtained a
program that decides whether program a halts on input i. Note that our halting-decision algorithm never
executes t, but only passes its description to the squaring-identification program, which by assumption
always terminates; since the construction of the description of t can also be done in a way that always
terminates, the halting-decision cannot fail to halt either.
halts (a,i) {
    define t(n) {
        a(i)
        return n×n
    }
    return is_a_squaring_function(t)
}

This method doesn't depend specifically on being able to recognize functions that compute squares; as
long as some program can do what we're trying to recognize, we can add a call to a to obtain our t. We
could have had a method for recognizing programs for computing square roots, or programs for
computing the monthly payroll, or programs that halt when given the input "Abraxas"; in each case, we
would be able to solve the halting problem similarly.

Formal proof

If we have an algorithm that decides a non-trivial property, we can construct a Turing machine that
decides the halting problem.

For the formal proof, algorithms are presumed to define partial functions over strings and are themselves
represented by strings. The partial function computed by the algorithm represented by a string a is
denoted Fa. This proof proceeds by reductio ad absurdum: we assume that there is a non-trivial property
that is decided by an algorithm, and then show that it follows that we can decide the halting problem,
which is not possible, and therefore a contradiction.

Let us now assume that P(a) is an algorithm that decides some non-trivial property of Fa. Without loss of
generality we may assume that P(no-halt) = "no", with no-halt being the representation of an algorithm
that never halts. If this is not true, then this holds for the negation of the property. Since P decides a
non-trivial property, it follows that there is a string b that represents an algorithm and P(b) = "yes". We
can then define an algorithm H(a, i) as follows:

1. construct a string t that represents an algorithm T(j) such that
    T first simulates the computation of Fa(i),
    then T simulates the computation of Fb(j) and returns its result.
2. return P(t).
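This construction can be sketched as follows (our illustration; run and source_of are hypothetical helpers
that simulate an algorithm and serialize one to a string):

def H(a, i):
    def T(j):
        run(a, i)          # simulate Fa(i); this step diverges if a
                           # does not halt on i, so T never reaches
                           # the next line in that case
        return run(b, j)   # otherwise T behaves exactly like Fb
    t = source_of(T)       # the string representing T
    return P(t)            # "yes" iff Ft = Fb iff a halts on input i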

We can now show that H decides the halting problem:

Assume that the algorithm represented by a halts on input i. In this case Ft = Fb and, because P(b) =
"yes" and the output of P(x) depends only on Fx, it follows that P(t) = "yes" and, therefore H(a, i) =
"yes".
Assume that the algorithm represented by a does not halt on input i. In this case Ft = Fno-halt, i.e., the
partial function that is never defined. Since P(no-halt) = "no" and the output of P(x) depends only on
Fx, it follows that P(t) = "no" and, therefore H(a, i) = "no".

Since the halting problem is known to be undecidable, this is a contradiction and the assumption that
there is an algorithm P(a) that decides a non-trivial property for the function represented by a must be
false.

Rice's theorem and index sets


Rice's theorem can be succinctly stated in terms of index sets:

Let C be a class of partial recursive functions with index set D = {e : φ_e ∈ C}. Then D is recursive if and
only if D = ∅ or D = ℕ,

where ℕ is the set of natural numbers, including zero.

An analogue of Rice's theorem for recursive sets


One can regard Rice's theorem as asserting the impossibility of effectively deciding for any recursively
enumerable set whether it has a certain nontrivial property.[2] In this section, we give an analogue of
Rice's theorem for recursive sets, instead of recursively enumerable sets.[3] Roughly speaking, the
analogue says that if one can effectively determine for every recursive set whether it has a certain
property, then only finitely many integers determine whether a recursive set has the property. This result
is analogous to the original theorem of Rice, because both results assert that a property is "decidable"
only if one can determine whether a set has that property by examining, for at most finitely many n (for
no n, for the original theorem), whether n belongs to the set.

Let A be a class (called a simple game and thought of as a property) of recursive sets. If S is a recursive
set, then for some e, the computable function φ_e is the characteristic function of S. We call e a
characteristic index for S. (There are infinitely many such e.) Let's say the class A is computable if there
is an algorithm (computable function) that decides, for any nonnegative integer e (not necessarily a
characteristic index),

    if e is a characteristic index for a recursive set belonging to A, then the algorithm gives "yes";
    if e is a characteristic index for a recursive set not belonging to A, then the algorithm gives "no".

A set S ⊆ ℕ extends a string τ of 0's and 1's if, for every k < |τ| (the length of τ), the k-th element of τ is 1
if k ∈ S, and is 0 otherwise. For example, the set S = {1, 3, 4, 7, ...} extends the string 01011001. A string
τ is winning determining if every recursive set extending τ belongs to A. A string τ is losing determining if
no recursive set extending τ belongs to A.

We can now state the following analogue of Rice's theorem (Kreisel, Lacombe, and Shoenfield, 1959,[4]
Kumabe and Mihara, 2008[5]):

A class A of recursive sets is computable if and only if there are a recursively enumerable set T₀ of losing
determining strings and a recursively enumerable set T₁ of winning determining strings such that every
recursive set extends a string in T₀ ∪ T₁.

This result has been applied to foundational problems in computational social choice (more broadly,
algorithmic game theory). For instance, Kumabe and Mihara (2008,[5] 2008[6]) apply this result to an
investigation of the Nakamura numbers for simple games in cooperative game theory and social choice
theory.
