
CS103B Handout 21

Winter 2007 March 7, 2007


Pushdown Automata
Handout by Maggie Johnson and Jerry Cain.

PDAs
In our discussion of CFG's, we saw that the class of languages generated by CFG's is larger than the class of
languages defined by regular expressions. This means that all regular languages can be defined by a CFG,
and so can some non-regular languages. We also know that a class of abstract machines (FA's) has this property: for
each regular language, there is at least one machine that runs successfully only on the input strings from that
language.

It would be very helpful if we could define an abstract machine model that corresponds analogously to
CFG's. We want CFL acceptors, just as FA's are regular language acceptors. To build these new machines,
we will begin with FA's and add some new gadgets to make them more powerful.

Step 1 is to develop a different representation for FA's. Thus far, we have not defined where the input string
lives as it is being read; we will define this now as an input tape. This tape must be infinitely long and will
be read in one particular direction, so we can distinguish between the first, second, third... symbol on the
tape. Each symbol goes in a location called a cell. The symbol Δ will be used to indicate an empty cell
(which will be the contents of the majority of the cells on this infinite tape).
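As a rough illustration of why an "infinitely long" tape is no problem in practice, here is a minimal Python sketch. The class name `Tape` and its method `read` are hypothetical, invented for this example: only the cells that hold the input are stored, every cell past them reads as Δ, and the read position only ever moves forward.

```python
BLANK = "Δ"  # the handout's symbol for an empty cell


class Tape:
    """One-way-infinite input tape; cells beyond the input read as BLANK."""

    def __init__(self, word: str) -> None:
        self._cells = list(word)  # only the non-blank prefix is stored
        self._head = 0            # index of the next unread cell

    def read(self) -> str:
        """Return the next symbol and move forward; there is no way back."""
        if self._head < len(self._cells):
            symbol = self._cells[self._head]
        else:
            symbol = BLANK
        self._head += 1
        return symbol


tape = Tape("ab")
print([tape.read() for _ in range(4)])  # ['a', 'b', 'Δ', 'Δ']
```

Storing only the finite input is a design choice of this sketch; the machine itself never needs to know where the blanks end, only that every further cell is Δ.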

As we process this tape, we read one symbol at a time and eliminate each as it is used; we can never go
backwards. When we reach the first Δ cell, we stop. As part of this new pictorial representation for FA's, we
need the following symbols:

The START symbol just represents a "-" or start state; the ACCEPT symbol represents a "+" or final state
from which there is no outbound path (dead-end final state); the REJECT symbol represents a dead-end state
that is not final. Note there is an important difference between these states and final states in an FA.
Previously in an FA, we could pass through a final state as we read the input string. With ACCEPT and
REJECT, we cannot exit.

Another modification to the pictorial representation is to represent every function performed in a state by a
different box. The typical task performed in a state is to "read and branch", which will now be represented by
a diamond-shaped box:

So, with these new symbols, the following FA can be redrawn in its new representation:

Note that Δ edges lead only from READ boxes to some kind of halt state, indicating that the end of the string has
been encountered. Notice that this just looks like a flowchart. The reason we have defined this new
representation for FA's is that it makes it easy to add the new mechanism required to enhance our
machine to accept CFL's. This new mechanism is a pushdown stack, a place where we can store
input symbols until we want to refer to them again. PUSH adds a new symbol to the top of the stack, pushing all other
letters on the stack down; POP removes the top symbol from the stack. In terms of our use of this stack,
there is absolutely no way to get to a symbol in the stack short of popping (and therefore discarding) the
letters on top of it. There is no "peek" or "traverse". Popping an empty stack, like reading an empty tape,
gives us Δ. Also, the stack is initialized with a blank on top. A stack is added to the representation with the
following symbols:

When FA's have been enhanced with stacks, we call them PDA's, or pushdown automata. PDA's were
introduced by Anthony Oettinger in 1963. Stacks as a data structure have been around for a long time, but
researchers recognized that incorporating one into an FA increased its language-recognizing
capabilities considerably.
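The stack discipline described above is easy to mimic. In this minimal sketch (the class name `Stack` is hypothetical), a Python list holds the stack, only push and pop are offered, and popping an empty stack returns Δ, which plays the role of the blank the stack is initialized with.

```python
BLANK = "Δ"


class Stack:
    """Pushdown stack: push and pop only, no peek or traversal."""

    def __init__(self) -> None:
        self._items: list[str] = []  # the end of the list is the top of the stack

    def push(self, symbol: str) -> None:
        self._items.append(symbol)   # everything already stored moves "down"

    def pop(self) -> str:
        # Popping an empty stack, like reading past the input, yields Δ.
        return self._items.pop() if self._items else BLANK


s = Stack()
s.push("a")
s.push("b")
print(s.pop(), s.pop(), s.pop())  # b a Δ
```

Deliberately leaving out any `peek` method keeps the sketch faithful to the rule that the only way to see a symbol is to pop (and discard) it.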

What language does this PDA accept?

Trace the operation of this machine on the string "aaabbb". We push three "a's" onto the stack and then read a "b".
We pop the top "a" and read another "b". This continues until all three "a's" have been popped and all three "b's"
read, at which point we read Δ from the tape. This takes us to the final POP, where we pop the Δ that is on the
bottom of the stack, so the string is accepted.

The language of the words accepted by this PDA is $\{a^n b^n : n = 0, 1, 2, 3, \ldots\}$. This is evident from the READ and
PUSH states right after the START state: all the "a's" are pushed onto the stack. There are then two possibilities
on the tape: Δ or "b". If Δ is read, we then pop the top element off the stack. If this element is Δ, then the
tape was blank to begin with; if the top element is an "a", the input string is rejected ($a^+$ is not accepted by
this PDA).

The other possibility is that, after the "a's" at the front of the string have been pushed, we read a "b". This starts
an alternation of popping and reading. This series of states ensures that we accept only if the number of "b's" read
is equal to the number of "a's" popped. If any other combination occurs, we reject the string.

Any FA can be redrawn in the new notation as a PDA that simply never uses its stack, so every regular language
is accepted by some PDA. Since the language above is not regular, PDA's are strictly more powerful than FA's:
once the stack is used, acceptors for non-regular languages can be built.
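The trace and the two possibilities described above can be sketched in Python. This is a hypothetical rendering of the flowchart (the figure is not reproduced here), assuming the branches the text describes: leading "a's" are pushed, each "b" pops one "a", reading Δ leads to a final POP that must find the bottom Δ, and any other combination rejects. The function name `accepts_anbn` is made up for this example.

```python
BLANK = "Δ"


def accepts_anbn(word: str) -> bool:
    """Deterministic PDA for {a^n b^n : n >= 0}, following the branches in the text."""
    letters = iter(word)
    stack: list[str] = []

    def read() -> str:
        return next(letters, BLANK)             # past the input the tape holds Δ

    def pop() -> str:
        return stack.pop() if stack else BLANK  # an empty stack pops Δ

    symbol = read()
    while symbol == "a":                        # READ/PUSH loop: push every leading a
        stack.append("a")
        symbol = read()
    while symbol == "b":                        # each b read must pop one a
        if pop() != "a":
            return False                        # more b's than a's: REJECT
        symbol = read()
    if symbol != BLANK:
        return False                            # an a (or other letter) after the b's: REJECT
    return pop() == BLANK                       # final POP must find the bottom Δ: ACCEPT


print([w for w in ["", "ab", "aabb", "aab", "abab"] if accepts_anbn(w)])  # ['', 'ab', 'aabb']
```

Because every input has exactly one path through these loops, the sketch needs no backtracking, which matches the text's description of this machine as deterministic.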

What makes these machines more powerful? What does a stack really provide? PDA's have a finite number
of states just like an FA, but PDA's also have a memory. PDA's can figure out where they have been, and how
often. The reason we could not build an FA to recognize $\{a^n b^n : n = 0, 1, 2, 3, \ldots\}$ is that, for large $n$, the
$a^n$ part had to run around a circuit, and the machine could not keep track of how many times it had looped
the circuit. So it could not distinguish $a^n b^n$ from $a^m b^n$ with $m \neq n$. In a PDA, the stack serves as a memory. Is
this model as powerful as a computer? Not yet; we still have some enhancements to throw in.

Non-Determinism in PDA's
We saw above that a stack allowed us to build a machine that accepted a language generated by a CFG. Is
the addition of a stack all that is needed to allow these new machines to accept all CFL's? A deterministic
PDA is one like the example above, where every input string has a unique path through the machine. A
non-deterministic PDA is one where we may have to choose among many possible paths. We say that an
input string is accepted by such a machine if some set of choices leads to an Accept state.

The PDA's that are equivalent to CFG's are the non-deterministic ones. The following diagram
illustrates:

It's beyond the scope of this class to prove that PDA's are equivalent to CFG's, but the proof is constructive,
like the proof of Kleene's Theorem. We show how all possible constructs in a CFG can be represented by
mechanisms in a PDA, and vice versa. This proof provides a concise algorithm for translating any CFG into
a PDA.
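The handout does not spell out the construction, so the sketch below uses one common textbook version (often called the top-down or expand/match construction), which may differ in detail from the proof alluded to above. The stack holds the part of a derivation not yet matched: a nonterminal on top is popped and replaced by one of its right-hand sides (the non-deterministic choice), and a terminal on top must match the next letter read. The grammar `GRAMMAR` (S → aSb | ab, generating $\{a^n b^n : n \geq 1\}$) and the function `derives` are hypothetical examples, and the pruning rule assumes the grammar has no ε- or unit productions.

```python
# Hypothetical example grammar for {a^n b^n : n >= 1}: S -> aSb | ab.
# Uppercase letters are nonterminals; there are no ε- or unit productions.
GRAMMAR = {"S": ["aSb", "ab"]}


def derives(word: str, start: str = "S") -> bool:
    """Simulate the non-deterministic PDA built from GRAMMAR, trying every choice.

    The stack is a string whose first character is the top.
    """

    def run(pos: int, stack: str) -> bool:
        # Without ε-productions every stack symbol yields at least one letter,
        # so a stack taller than the unread input can never lead to ACCEPT.
        if len(stack) > len(word) - pos:
            return False
        if not stack:
            # Accept when the stack empties exactly as the input runs out.
            return pos == len(word)
        top, rest = stack[0], stack[1:]
        if top in GRAMMAR:
            # POP a nonterminal, then PUSH one right-hand side (the choice).
            return any(run(pos, rhs + rest) for rhs in GRAMMAR[top])
        # POP a terminal: it must match the next letter READ from the tape.
        return pos < len(word) and word[pos] == top and run(pos + 1, rest)

    return run(0, start)


print([w for w in ["ab", "aabb", "aab", "abab", ""] if derives(w)])  # ['ab', 'aabb']
```

Trying every right-hand side with `any` is simply exhaustive search over the machine's choices; it illustrates acceptance by "some set of choices" rather than an efficient parser.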

The following is an example of a CFL that can be accepted by a non-deterministic PDA, but not by a
deterministic PDA. ODD-PALINDROME is the language of all strings of a's and b's that are palindromes and
have an odd number of letters.

ODD-PALINDROME = {a b aaa aba bab bbb ... }

This is a difficult language because the middle letter does not stand out, making it difficult to tell where the
first part ends and the second part begins. For a deterministic PDA it is not merely hard but impossible. A PDA
(like an FA) just reads the string from left to right; it has no idea how many letters remain to be read. If the
middle letter were an "X" or something else that really stands out, the job would be much easier.
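To see why a marker helps, here is a sketch of a deterministic machine for a hypothetical variant of the language in which the middle letter is the distinct symbol X, that is, strings of the form $w\,X\,w^R$ with $w$ over $\{a, b\}$. With the marker, the machine always knows when to stop pushing and start popping, so no choices are needed. The function name `accepts_marked_palindrome` is made up for this example.

```python
BLANK = "Δ"


def accepts_marked_palindrome(word: str) -> bool:
    """Deterministic PDA for the hypothetical language {w X reverse(w) : w over {a, b}}."""
    stack: list[str] = []
    letters = iter(word)
    symbol = next(letters, BLANK)
    while symbol in ("a", "b"):       # left half: push every letter
        stack.append(symbol)
        symbol = next(letters, BLANK)
    if symbol != "X":
        return False                  # no middle marker: REJECT
    for symbol in letters:            # right half: each letter must match a POP
        if not stack or stack.pop() != symbol:
            return False
    return not stack                  # tape and stack both exhausted: ACCEPT


print([w for w in ["X", "aXa", "abXba", "aXb", "aba"] if accepts_marked_palindrome(w)])
# ['X', 'aXa', 'abXba']
```

Note that "aba" is rejected here: without the X the machine keeps pushing, which is exactly the difficulty described for ODD-PALINDROME.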

Consider the following machine:



This is non-deterministic since the left READ has two possible exits for "a" and two for "b". If we branch at
the right time, the machine will accept the words in this language; if we don't, it won't. But recall the
definition of non-determinism in a PDA: if we make the right choices, the machine accepts; we need only
find one set of right choices. The word aba is accepted by this machine if we decide to read the "b" rather
than push it; along any other path it is rejected.
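Because the figure is not reproduced here, the sketch below assumes the machine behaves as the text describes: in the left half, each letter is either pushed or treated as the middle letter (read but not pushed); after the middle, each letter read must match a letter popped, and reading Δ must be followed by popping Δ. Trying each possible choice in turn is one simple way to decide whether some set of choices leads to ACCEPT. The function name `npda_odd_palindrome` is hypothetical.

```python
def npda_odd_palindrome(word: str) -> bool:
    """Explore every branch of the machine; accept if any branch accepts."""

    def after_middle(pos: int, stack: list[str]) -> bool:
        # Right half: each letter READ must equal the letter POPped.
        stack = stack.copy()
        while pos < len(word):
            if not stack or stack.pop() != word[pos]:
                return False            # this branch reaches REJECT
            pos += 1
        # READ gives Δ; the final POP must also give Δ (stack back to its bottom).
        return not stack

    stack: list[str] = []
    for pos, letter in enumerate(word):
        if letter not in "ab":
            return False
        # Choice 1: treat this letter as the middle one (read it, push nothing).
        if after_middle(pos + 1, stack):
            return True
        # Choice 2: push the letter and keep reading the left half.
        stack.append(letter)
    return False                        # no choice of middle letter leads to ACCEPT


print([w for w in ["a", "aba", "abba", "bab", "ab"] if npda_odd_palindrome(w)])
# ['a', 'aba', 'bab']
```

For "aba", choosing the first "a" as the middle fails, but pushing it and choosing the "b" as the middle succeeds, which is the set of right choices mentioned above.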

A non-deterministic pushdown automaton (N-PDA) consists of 7 components:

1. An alphabet Σ of input letters.


2. An input tape (infinite in one direction). Initially the string of input letters is placed
on the tape starting at the first cell. The rest of the tape is blank (Δ).
3. A stack (infinite in one direction). Initially, the stack has a blank cell (Δ) on top.
4. One START state with only out-edges, no in-edges.
5. Two types of HALT states: ACCEPT and REJECT. They have only in-edges, and no out-edges.
6. Finitely many nonbranching PUSH states that introduce characters on the top of the
stack.
7. Finitely many branching states of two kinds:

a) READ: states that read the next unused letter from the tape. These states
may have out-edges labeled with letters from Σ and Δ, with no restriction on
duplication of labels and no insistence that there be a label for each member
of Σ. They must have at least one in-edge.

b) POP: states that read and discard the top letter from the stack. These
states may have out-edges labeled with letters from Σ or Δ, again with
no restrictions. They must have at least one in-edge.

One last requirement is that the states are connected so as to become a connected
directed graph.
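The seven components translate fairly directly into a data structure. The following is a minimal sketch under our own assumptions: the dictionary encoding, the state names such as `READ1` and `POP_END`, and the function `npda_accepts` are all hypothetical. READ and POP states map an edge label to a list of next states (several entries mean a non-deterministic choice), PUSH states give the symbol pushed and the next state, and a label with no edge is treated as a crash (reject). A breadth-first search over configurations (state, tape position, stack) accepts if any branch reaches ACCEPT. The example machine is one possible flowchart for ODD-PALINDROME, not necessarily the one in the figure. The `max_stack` cut-off is a simplification to stop PUSH loops from running forever; a machine that genuinely needed a taller stack could be wrongly rejected.

```python
from collections import deque

BLANK = "Δ"

# Hypothetical encoding: state name -> (kind, data).
ODD_PAL = {
    "START": ("START", "READ1"),
    "READ1": ("READ", {"a": ["PUSH_a", "READ2"], "b": ["PUSH_b", "READ2"]}),
    "PUSH_a": ("PUSH", ("a", "READ1")),
    "PUSH_b": ("PUSH", ("b", "READ1")),
    "READ2": ("READ", {"a": ["POP_a"], "b": ["POP_b"], BLANK: ["POP_END"]}),
    "POP_a": ("POP", {"a": ["READ2"]}),
    "POP_b": ("POP", {"b": ["READ2"]}),
    "POP_END": ("POP", {BLANK: ["ACCEPT"]}),
    "ACCEPT": ("ACCEPT", None),
}


def npda_accepts(machine, word, max_stack=None):
    """Breadth-first search over (state, tape position, stack) configurations."""
    if max_stack is None:
        max_stack = len(word) + 1   # assumed cut-off for PUSH loops
    start = next(name for name, (kind, _) in machine.items() if kind == "START")
    todo = deque([(start, 0, ())])  # the stack is a tuple whose last item is the top
    seen = set()
    while todo:
        config = todo.popleft()
        if config in seen:
            continue
        seen.add(config)
        state, pos, stack = config
        kind, data = machine[state]
        if kind == "ACCEPT":
            return True
        if kind == "START":
            todo.append((data, pos, stack))
        elif kind == "PUSH":
            symbol, nxt = data
            if len(stack) < max_stack:
                todo.append((nxt, pos, stack + (symbol,)))
        elif kind == "READ":
            letter = word[pos] if pos < len(word) else BLANK
            for nxt in data.get(letter, []):
                todo.append((nxt, min(pos + 1, len(word)), stack))
        elif kind == "POP":
            top, rest = (stack[-1], stack[:-1]) if stack else (BLANK, stack)
            for nxt in data.get(top, []):
                todo.append((nxt, pos, rest))
        # A REJECT state, or a missing edge label, simply ends that branch.
    return False


print([w for w in ["a", "aba", "abba", "bab", "ab"] if npda_accepts(ODD_PAL, w)])
# ['a', 'aba', 'bab']
```

Searching breadth-first rather than committing to one path is what makes the simulator respect the non-deterministic acceptance rule: a word is accepted as soon as any branch reaches ACCEPT.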

Non-Context-Free Languages


Are all languages context free? No. The set of CFL's contains the set of regular languages, but both are
subsets of still larger sets.

Example
Consider the language $\{a^n b^n a^n : n = 1, 2, 3, \ldots\}$ = { aba aabbaa aaabbbaaa ... }. Think about a PDA that
might accept this language. We know that as we read the a's, we have to keep track of exactly how many a's
have been read so we can check it against the number of b's read. The stack is used to track this information.
This works fine for the first pair ($a^n b^n$), but when it comes time to check the second $a^n$, the stack is empty
because we had to pop off all the a's as we read the b's.

Maybe we can try this approach: for every "a" read from the initial cluster, we push two a's onto the stack.
Then, when we read the b's, we match them up with half the a's. When we get to the last clump of a's, we
have exactly the right number of a's in the stack to match. Will this idea work? Every word of the language is
accepted, but unfortunately this PDA will accept a lot more than the specified language.

The problem is that we have no way of checking that the b's use up exactly half of the a's in the stack. The
word $a^{10} b^{8} a^{12}$ will also be accepted by this PDA. The truth is that no PDA can be built to accept this language.
This is an example of a non-context-free language.
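A small sketch makes the counterexample concrete. It assumes the "push two a's" machine works exactly as described: two a's pushed per leading "a", one "a" popped per "b", one "a" popped per trailing "a", and acceptance when the input and the stack run out together. The function name `double_push_pda` is hypothetical.

```python
def double_push_pda(word: str) -> bool:
    """The flawed idea from the text; deterministic, so a plain loop suffices."""
    stack: list[str] = []
    i, n = 0, len(word)
    while i < n and word[i] == "a":   # first clump: push two a's for each a
        stack += ["a", "a"]
        i += 1
    while i < n and word[i] == "b":   # b's: pop one a each
        if not stack:
            return False
        stack.pop()
        i += 1
    while i < n and word[i] == "a":   # last clump: pop one a each
        if not stack:
            return False
        stack.pop()
        i += 1
    # Accept only if the input is used up and the stack is back to its bottom blank.
    return i == n and not stack


print(double_push_pda("aabbaa"))                       # True  (n = 2, in the language)
print(double_push_pda("a" * 10 + "b" * 8 + "a" * 12))  # True  (not in the language)
```

The arithmetic shows the flaw: 20 a's are pushed, 8 are popped for the b's, and the remaining 12 exactly match the last clump, even though the three clumps have different lengths.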
