Aho - Indexed Grammars

Indexed Grammars--An Extension of Context°
Free Grammars
ALFt~ED V. A H O
Bell Telephone Laboratories, Inc., Murray Hill, New Jersey
aBSTRACT. A new type of grammar for generating formal languages, called an indexed gram-
mar, is presented. An indexed grammar is at~ extension of a context-free grammar, and the
class of languages generated by indexed g r a m m a r s has closure properties and decidability re-
sults similar to those for context-free languages. The class of languages generated by indexed
grammars properly includes all context-free languages and is a proper subset of the class of
context-sensitive languages. Several subclasses of indexed grammars generate interesting
classes of languages.
KEY WORDS AND PHRASES: formal grammar, formal language, language theory, a u t o m a t a
theory, p h r a s e - s t r u c t u r e grammar, p h r a s e - s t r u c t u r e language, syntactic specification, con-
text-free grammar, context-sensitive grammar, stack a u t o m a t a
CR CATEGORIES: 5.22, 5.23
1. Introduction
A language, whether a natural language such as English or a programming language
such as ALGOL, in an abstract sense can be considered to be a set of sentences)
One criterion for the syntactic specification of a language is that, invariably, a
finite representation is required for an infinite class of sentences. There are several
different approaches toward such a specification. In one approach a finite genera-
tire device, called a grammar, is used to describe the syntactic structure of a lan-
guage. Another approach is to specify a device or algorithm for recognizing well-
formed sentences. In this approach the language consists of all sentences recognized
by the device or algorithm. A third possible method for the specification of a lan-
guage would be to specify a set of properties and then consider a language to be a
set of words obeying these properties.
In this paper a new type of grammar, called an indexed grammar, is defined.
The language generated by an indexed grammar is called an indexed language. It
is shown that the class of indexed languages properly includes all context-free
languages and yet is a proper subset of the class of cotttext-sensitive languages.
Moreover, indexed languages resemble context-fl'ee languages in that many of the
closure properties and decidability results for both classes of languages are the same.
Recently, there has been a great deal of interest in defining classes of recursive
languages larger than the class of context-free languages. Programmed grammars
are a recent example of a grammatical definition of such a class [16], and various
A summary of this paper was presented a t the I E E E E i g h t h Annual Symposium on Switch-
ing and A u t o m a t a Theory, October 1967.
The words " s e n t e n c e , " " w o r d , " and " p r o g r a m " will be used synonymously as being an ele-
ment in a language.
Journal of the Associationfor Computing Machinery, Vol. 15, No. 4, Oetcber 1968,pp. 647-671.
Copyright ~ 1968,Associationfor Computing Machinery, Inc.

648 A. V. AHO
kinds of stack automata provide a device-oriented definition of such a class [8, 9,

14].
Part of this interest in larger classes of languages stems from the inadequacy of
context-free grammars in specifying all of the syntactic structures found in many
modern-day algorithmic programming languages, such as ALGOL [5]. For any auto-
matic compiler writing system to be a practical reality, at the very least, some
reasonable scheme for representing the complete syntactic structure of a language
must be available. At present, however, there does not appear to be a reasonable
grammar or automaton model less powerful than a context-sensitive grammar or
linear-bounded automaton which is capable of providing such a specification.
In Section 2 of this paper the definitions of indexed grammar and indexed lan-
guage are given. The basic closure properties and decidability results for indexed
grammars and languages are presented in Sections 3 and 4. It is shown that the class
of indexed languages possesses closure properties enjoyed by many other well-known
classes of languages and, in fact, qualifies as an example of an abstract family of
languages as defined in [7].
In Section 5 it is shown that the class of indexed languages includes all context-
free languages and some context-sensitive languages, but yet is a proper subset of
the class of context-sensitive languages.
Several restricted forms of indexed grammars capable of generating interesting
subclasses of indexed languages are presented in Section 6. Two such classes of
restricted forms of indexed grammars are equivalent in generative capability to the
class of context-free grammars.
Each of the four classes of grammars in the Chomsky hierarchy, viz. type 0,
context-sensitive, context-free, and regular grammars, generates a family of lan-
guages which corresponds exactly to that family of sets recognizable by a particular
class of automata, viz. Turing machines, linear bounded automata, one-way non-
deterministic pushdown automata, and finite automata, respectively [3]. Indexed
languages also enjoy this dual representation. A new automaton model, called
a nested stack automaton, provides this additional characterization, in the sense
that a set is recognized by a (one-way, nondeterministie) nested stack automaton
if and only if the set is an indexed language [1].
Both the pushdown automaton and the recently defined stack automaton are
special cases of a nested stack automaton. Because of this characterization for
indexed languages in terms of languages accepted by a class of devices which are
natural generalization of both pushdown automata and stack automata, we feel
that the class of indexed languages assumes a natural position in the Chomsky
hierarchy of languages.
Nested stack automata and the equivMence of indexed languages with (one-way,
nondeterministic) nested stack automaton languages are discussed in [1].
2. Indexed Grammars
In this section the basic definition of an indexed grammar is presented. A few simple
examples of indexed grammars are given, and it is shown how a derivation tree can
be associated with the derivation of a sentence in an indexed grammar.
Definition. An indexed grammar is a 5-tuple, G = (N, T, F, P, S), in which:
(a) N is a finite nonempty set of symbols called the nonterminal alphabet.
Journal of the Association for Computing Machinery, *Vol. 15, No. 4, October 1968
Indexed Grammars--An Extension of Context-Free Grammars 649
(b) T is a finite set of symbols called the terminal alphabet.

(c) F :is a finite set each element of which is a finite set of ordered pairs of the
form (A, x), where A is in N and x is in ( N U T)*. 2 An element f in F is called an
index or flag• Art ordered pair (A, x) in f is written A + x and is called an index
production contained i~t f.
(d) P is a finite set of ordered pairs of the form (A, a) with A in N and a in
(NF* U T)*• Such a pair is usually written A --, a; it is called a production.
(e) S, the sentence symbol, is a distinguished symbol in N.
Let G = (N, 7', F, P, S) be an indexed grammar• Roughly speaking, a derivation
m G is a sequence of strings, eel, a2, • • • , a~ with each a~ in (NF* U T)*, in which
a~+t is derived from a~ b y the application of either one production or one index
production. E x c e p t for the m a n n e r in which indices and strings of indices are manip-
ulated, a derivation in G proceeds in exactly the same manner as in a context-
free grammar.
However, in a derivation in G each nonterminal in N can be immediately followed
by a string of indices in F*. A string of symbols of the form Af, where A is in N and
~"~• s"m F * , is" calle d an indexed nonterminal. A string ce in (NF* U T ) * is often referred
to as a sentential form.
t o r m a t l y , a sentential form c~ is said to directly generate (or directly derive) a
sentential form/~, written a 7 5, if either: a
1. c, = 'tArS,
A ~ X~,X2n2 • • • Xk~k is a production in P,
= ,rX~O~X~O2...XkO~where, f o r l < i < k, 0 ~ = n £ if X~ is in N, or
0~ = e if X~ is in .7';
or
2. = ,rAft'S,
A ~ X~X: . . . Xk is an index production in the index f,
= 7XlOIX202"'XkOk~where, for 1 < i < k, 0~ = ~" if Xi is in N, or
0~= eifXiisinT.
In both cases 1 and 2, the nonterminal A is said to be expanded• I n case 2 the
index f is said to be consumed by the nonterminal A. Observe t h a t a terminal symbol
can never have a string of indices immediately following it in a n y line of a deriva-
tion.
For example, if Ag" is an indexed nonterminM and A --~ aB~COb is a production
in P, then A~ can directly derive the sentential form aB~'CO~b in G. Here the index
2Let T be a set. 7'+ is the set of all finite-length strings of elements of T, excluding ~, the empty
string. T* := 7'+ U l~}.
3Whenever the reference is to a grammar G ~- (N, T, F, P, S), unless otherwise stated, the
following symbolic conventions are used:
(1) A, B, C, D are in N.
(2) a a n d b a r e i n T (j {~} where ~ is the null word.
(3) w and x are in T*.
(4) x and ~o are in (N U T)*.
(5) f, g, h are in F.
(6) L v, 8 are in F*.
(7) a, fl, 7 , ~ a r e i n (NF* U T)*.
(8) XisinN tJ T .
Jom'~alof ~he Associationfor ComputingMachinery,Vol. 15, No. 4, October19668

650 A.v. A~o
string f distributes over the indexed nonterminals B~ and CO but not over the
terminal symbols a and b in the production.
If A f f is an indexed nonterminal and the index f contains the index production
A --~ aBC, then A f ~ can directly derive the sentential form a B f C f in G. Notice that
in such a step the index f is removed from the next line of the derivation, and the
index string f distributes over the nonterminals B and C but not over the terminal
symbol a.
The reflexive and transitive closure of the relation -2 oil ( N F * U T)* is defined
as follows:
0
1. a-2~.
n+l n
2. F o r n > 0,
- -
~ - - -(/* ~ , i f , forsome¢~, a-~>~and~-~%
i
3. a*/~iffc-2/~forsomei> 0.
We say that ~ generates or derives ~ if a: *-~ ¢~.A sequence of sentential forms
a0, (:Y1, ~:~2, " ' " , an (1.1)

such that a~ -2 a~+~, for 0 < i < n, is called a derivation (of length n) of a~ from
a0 in G. If in each sentential form a~ the leftmost nonterminal is expanded in order
to directly generate the string a~+~, for 0 < i < n, then the sequence of strings (1.1)
is called a lefgmost derivation. It should be clear that if there exists a derivation of
B from a, then there always exists a leftmost derivation of ~ from ~. Unless it is
otherwise specified, the word "derivation" implies a leftmost derivation.
The language generated by an indexed grammar, G = (N, T, F, P, S), denoted g
L(G), is called an indexed language and is defined to be the s e t / w in T* [ S -L*ow}.

Henceforth, the subscript G on the relations "-~"
o
and "*-*"
o
is dropped when-
ever G is clearly understood.
E x a m p l e 1. Let G1 = (IS, A, B}, {a, b, e}, {f, g}, P, S), where P consists of
the productions S - - ~ a A f c , A --* aAgc, and A --~ B, a n d f = [B--* b], g =
[B ---* bB].
Applying the first production once, the second production n - 1 times, n > 1,
and then the third production once, we have
S --~ a A f c --~ aaAgfcc ~ . . . --* a'Mg'~-~fc n --~ a~B g ~-~fc

~ ~•
Then, expanding B by consuming indices, we have
Bg " - l f --~ .oti

. . .g -~,j --~ • .. --~ b ~- ~B f "-'* b ~.
Thus, S 2,~+~ a % % ' L Moreover, a derivation in G~ is "deterministic" except for the

decision to expand A to B. Thus, it should be clear that the only strings of terminals
which can be derived from S in G~ are all of the form a~b% ~, n >_ 1. Consequently,
L(G~) = Iambic ~ I n 7> 11.
As with context-free grammars the structural description of a word in an indexed
language can be described in terms of a derivation tree or parsing diagram. A deriva-
tion tree in an indexed grammar G = (N, T, F, P, S) is a tree with labeled nodes.
Each node consists of a tuple of positive integers of the form (i~, i2, . " , i,),
Journal of the Association for Computing Machinery, VoI. 15, No. 4, October 1968
Iezdexed G r a m m a r s - - A n E x t e n s i o n of Context-Free G r a m m a r s 651
r >_ 1, and the directed lines of the tree are ordered pairs of nodes of the form
( ( i , , • .. , it), (i~, • .. , i r , in+,)). E a c h node of the derivation tree is labeled by
~ n indexed nonterminal or terminal symbol in N F * U T.
Let l be the m a x i m u m n u m b e r of nonterminals and terminals oil the right-hand
s i d e of a n y production in P. With eaeh node n = (i~, i2, .. • , i~) with 1 < is <_ l,
xve ean associate the l-sty fraction O.ici2 . • • i~, which we call the node n u m b e r for n.
VVe say t h a t n~ < n2 (or n~ < n2) if the node number for'n, is less t h a n (or less t h a n
o r equal t o ) t h a t for n2. T h e r e I a t i o n " _<" is a simple order oa the set of nodes.
Suppose D is derivation tree and {a~ni~, a.en~2, " " , o~n,~} is the set of labeled
leaves ~ of D arranged such that n~j < nij+, for 1 _< j </~. Each node label, a l , is in
N F * U T . T h e string o ~ 2 .. • ak is called the yield of D.
A derivation tree in an indexed g r a m m a r G = ( N , T, F, P , S ) with root A¢0, A
i n N and ~'0 in F*, is a set of labeled nodes formally defined as follows:
1. T h e set containing exactly the labeled node A¢0 (1), the root of the tree, is a
d e r i v a t i o n tree.
2. Suppose D is a derivation tree with yield 5 B b ' where ~ and-~ are in ( N F * U T ) *
a n d Be is the node label of leaf (i~, i2, . . . , i~). If B - ~ X~v~X2v.,.... Xk~k is in P,
t.hen D' - D U {X~O~m~, X202m~, . . . , XkOkmk} is a derivation tree where, for
1 < j <_ k, 0~ = ~.¢ if X j is in N, or 0~ = e otherwise, and mj =
( i, , i~ , • • • , i~ , j ) . T h e yield of D' is 5X~O,X,O, • • • X~O~'I.
3. Suppose D is a derivation tree with yield ~ B f b where B f f is the node label of
teaf (i~, iu, . . . , i,). If f contains B --~ X,X= . . . X~, then D ' = D U {X,O,m~,
X,~O~m,, . .. , X~O~m~} is a derivation tree where, for i < j _< k, 03. = ~" if X~. is in
N , or 0i = e otherwise, and mi = (i~, i~, . . . , i~, j ) . T h e yield of D ' is 5X~O,X~O=
• .. X~O~'y.
4. D is a derivation tree in G with root Af0 if and only if its being so follows from
a finite n u m b e r of applications of 1, 2, and 3.
I t should be clear that, given an indexed g r a m m a r G = ( N , T, F, P , S ) , for
each derivation in G there exists a corresponding derivation tree, and for each
derivation tree in G there exists at least one derivation (and exactly one leftmost
derivation).
THEOREM 2.1 Given an indexed g r a m m a r G = (N, T, F, P , S ) , A ¢ 2+ o
ce i f and
only i f there exists a derivation tree in G with root A f and yield a.
The proof of T h e o r e m 2.1 is similar to t h a t for context-free grammars.
We conclude this section with a a example of a derivation tree in an indexed
grammar. For notational convenience each node of the derivation tree is represented
only b y its node label.
E x a m p l e 2. Let G= = ( {S, T, A, B, C}, {a, b}, {f, g}, P, S), where P contains
the productions
S ---* T f , T ~ Tg, T --~ A B A ,
and where
f = [A -+ a, B -+ b, C --~ b], g = [A --+ a A , B ---+ bBCC, C ---+bC].
4 A leaf is a node (i~ , ... , i~) of D such that there is no node in D of the form (il , ... , is, j)
for any j.
Journal of the Association for Computing Machinery, VoL 15, No. 4, October 1968
652 A. V. AH0
S
I
Tf
Tgf
/~fAgf Bgf A~
FIG. 1. Derivation tree for aabbbbaa
It is easy to show that

L(G2) = {anb'~2a'~ln > 1}.
The derivation tree corresponding to the derivation of a2b4c~is shown in Figure 1.
3. Closure Properties
A class of languages can be formally considered to be a pair (T, £), or £ whenever
T is understood, where
(a) T is a countably infinite set of symbols,
(b) £ is a collection of subsets of T*,
(c) For each L in £, there exists a finite subset T1 of T such that L ~ TI*.
A large number of the classes of languages studied in language theory have many
properties that are common to each class. A number of the closure properties enjoyed
by many well-known classes of languages were selected in [7] as an axiom system
for defining abstract families of languages. Specifically, a full abstract family of
languages (full AFL) is a class of languages closed under the operations of union,
concatenation, Kleene closure (*), homomorphism, inverse homomorphism, and
intersection with regular sets. These properties, 5 in turn, imply a number of other
closure properties including closure under left and right quotient by a regular set
and closure under L i t , Fin, and Sub [7].
In this section we show that the class of indexed languages qualifies as a full AFL
and thus possesses all properties enjoyed by full AFL's. In addition, we show that
indexed languages are also closed under the operations of word reversal and full
substitution.
Before doing so, we use some preliminary results which make the representation
of an indexed grammar notationally more convenient.
Two grammars G1 and G2 are said to be equivalent if L(GO = L(G2).
An indexed grammar G = (N, T, F, P, S) is said to be in reduced form if:
.(a) Each index production in each index in F is of the form A --+ B, where both
A and B are in N.
(b) Each production in P is of one of the forms
(1) A --+ BC,
Not all of the AFL operations themselves are independent. See [12] for details.
Journal of the Association for Computing Machinery, Vol. 15, No. 4, October 1968
(2) A ~ Bf, or
(3) A ~ a,
where A, B, C are in N, f is in F, and a is in T U {e}.
THEOREM 3.1. Given an indexed grammar G = (N, T, F, P, S), an equivalent
reduced form indexed grammar G' = ( N', T, F', P', S) can be effectively constructed
from G.
PROOF.
1. Suppose f is an index in F. A corresponding reduced index f' in F' is constructed
as follows:
(a) If A --~ B is in f, then A --* B is also i n f .
(b) If A --+ x is in f, where x is not a single nonterminal, then A --+ C is i n f '
and C --~ × is in P', where C is a new nonterminal not in N.
2. Add to P ' all productions in P replacing each index in F by the corresponding
reduced index in F'.
3. In P ' replace each production of the form A --~ X171... Xkw with k >__ 2,
where vi is a string of reduced indices, with the following productions:
A ~ B1CI ,
or, if k = 2,
A --~ B1B2 ,
C~--~B~+IC~+~ for 1 <i< k- 2,
Ck-1 ---* B k _ l B k •
Here B~ and C~ are new symbols not in N, for 1 < i < k, except as noted in (d)
below. For each i = 1, • .. , k:
(a) If w = f~' " " " f / w i t h r >_ 2, then add to P ' the productions:
Bi --~ D,f,',
I
Di---*Dj-,f¢-i for 3 _ < j _ < r,
D~ --~ X~fl'.
Again, D~. is a new symbol for 2 _< j < r.
(b) If m = f', add to P the production
Bi ~ X~f'.
(c) If X~ is a terminal symbol, add to P the production
Bi --~ X i .
(d) Otherwise, Bi = X i . (This is the case where X~ is a nonterminal not
followed by any indices.)
4. In P', each production of the form A -+ Bf~ • • • f / w i t h r >__ 2 is replaced b y
a set of productions similar to those in 3(a).
5. The productions now in P ' which are not of reduced form are those of the
form A --* B. Call such a production a type 1 production. All type 1 productions
are removed from P'. However, if A *-+ O B by type 1 productions only and B ~ A,
then, for each production in P ' of the form (i) B ~ CD, (ii) B --~ Cf, or (iii)
B ~ a, add to P ' the production (i) A -~ CD, (ii) A -+ Cf', or (iii) A --* a, re-
spectively.
Journal of t.he Association for Computing Machinery, Vol. 15, No. 4, October 1968
654 A . v . AHO
Let G' = (N', T, F', P', S) be the grammar constructed from G according to
rules 1-5 given above, where N' is that set of nonterminals appearing in P'. G' is a
reduced form indexed grammar, and a straightforward proof by induction on the
length of a derivation yields that Afl ... f~ -~G w if and only if Aft' ... f / *-~
G
w,
where r > 0, A is in N', and f ( is the reduced index constructed from fi • From
this, it immediately follows that L(G t) = L(G).
LEMMA 3.1. The class of indexed languages is closed under the operations of union,
concatenalion, and Kleene closure.
The proof of Lemma 3.1 is obvious.
The concept of a "finite" transducer mapping has many important applications
in language theory.
Definition. A nondeterministicfinite transducer (NFT, for short) is a 6-tuple M =
(Q, T, 2;, 3, q0, F) where:
(a) Q, T, Z are finite sets of states, input symbols, and output symbols, respec-
tively.
(b) ~ is a mapping from Q X ( T [J {e} ) into finite subsets of Q X ~*.
(c) q0, in Q, is the initial state.
(d) F ~ Q is the set of final states.
If ~ is a mapping from Q × T into Q X z*, then M is a deterministic finite trans-
ducer ( D F T ) .
will be an extended mapping from Q X T* into subsets of Q × 2;* defined as
follows:
(i) ~(q, e) contains (q, e).
(ii) If ~(q~, w) contains (q2, x) and ~(q~, a) eomains (q~, y), then g(q~ , wa)
contains (q3, xy) for w in T*, a in T iJ {e}, x and y in 2;*.
An N F T M = (Q, T, Z, ~, q0, F) can induce a mapping on a language L ~ T*
in the following manner. For w in T*, M(w) = {x I ~(qo, w) contains (p, x) for
s o m e p i n F } . M ( L ) = (J~i,,LM(w).
We can also define an inverse N F T mapping. Let M-l(x) = {wl M(w) = xl.
Then, M-~(L ') = (J~ in L' M-'(X).
The definition of an N F T is sufficiently general for every inverse N F T mapping
to be an N F T mapping. That is, given an NFT, M = (Q, T, 2~, ~, q0, F), an NFT
M ~ can be effectively constructed from M such that M ' ( L ) = M-~(L) for any
L ~Z*.
LEMMA 3.2. The class of indexed languages is closed under N F T mappings.
P~ooF. Let G = (N, T, F, P, S) be a reduced form indexed grammar and
let M = (Q, T, 2;, ~, q0, K) be an NFT. We will construct an indexed grammar
G' = (N', ~, F', P', S') such that L(G') = M ( L ( G ) ) .
The nonterminals in N' will be of the form (p, X, q), where p and q are in Q and
X is in N (J T (J I e}. The set of productions P' is constructed as follows.
1. If A ~ BC is in P, then P' contains the set of productions (p, A, q) -o
(p, B, r) (r, C, q) for allp, q, r in Q.
2. If A ---* Bf is in P, then P' contains the set of productions
(p, A, q) --* (p, B, q)y'
for all p, q in Q. The index f' contains the index productions (r, C, s) --* (r, D, s)
for all r, s in Q, if and only iff contains the index production C -~ D.
Indexed G r a m m a r s - - A n Exten~ion of Context-Free Grammar;~ 655
3. If A -~> a is h: P, then P' conto.ins the set of productions (p, A, q) --~ (p, a, q)
for ~dl p, q i:~ @
4. ]/ t~lso eont;ains the productions
(~, a, g) --, (p, a, r) (r, ~, q),
(~, ~, q) --~ (p, ~, ~) (r, a, q),
(~, ,, q) -~ (~J, ~, r) (r, ~, q),
for all a in T a:id p, q, r in Q.
5. [ / eont~fins the terrnh:~al production (p, a, q) --~ x if ~(p, a) contains (q, x)
f o r a i n T U{e}, x i n E * .
6. Finally, P' contains the initial productions S' ~ (q0, S, p) for' all p in K.
t $
We now show that, (p, A, q)f: . . . f j ~* x, j > 0, x in E*, :if and only if
A/:. •. fj --Lvv and ~(p, w) contains (q, m) for some ~vin T*. First, we show that:
G
k
(*) If (p, A, q ) f j . . , fj ~> x is any derivation of length k, then All . . .
fj -;, w and ~(p, w) contains (q, x) for some w in T*.
The proof of (*) will be by induction on/c, the length of a derivation in g'. (*) is
vacuously true for derivations of length 1.
Now suppose (*) is true for all/c < m with m > 1 and consider a derivation
m
(p, A, q)f:' .. • f j -~ x. Such a derivation can be of one of four following forms.
(i) (p, A, q)f:' . . . f / o-? (p, B, r)f:' . . . f j ( r , C, q)f:' - . - f /
! 'tT'lI
(p, B, r)f: . . . fi' ~7 x: where m~ < m,
t m~
(r, g, q)A " " fJ' ~ z2 whereto2 < m .
In th:is ease, we have the following derivation in G:
A fl "'" f j -~ BJi "'" f j C f , "'" fs
-L
O w:Cf: . . . f j
W:W2 •
Also, from the inductive hypothesis ~(p, w:) contains (r, x:) and ~(r, w~) contains
(q, z~). Hence ~(p, w) cont~fins (q, x), where w = w:w~ and x = x::~.
(ii) (p, A , q)f:' " " f / ~,-->(p, B, q ) f ' f : ' . . , f j
m !
-*z within ~ <m.
ty ~
In G, we have the derivation

A f: . . . f i -~ Bff: . . . f~
w,
a:d from the inductive hypothesis, ~(p, w) contMns (q, x).

(iii) (p, A, q)f:' .." f j ~ (P, a, q)f:' . . . f j
m ~
--~ x with m' < m.
G~
i~i! Here, the indices disappear when terminals are introduced.
Journal of the Associatioa for Computing Maehiaery, V o l . 15, N o . 4, O c t o b e r 1968
~ili
i , U ,!ili,
656 A, V. A H 0
It is straightforward to show that (p, a, q) ~ x, a in T U {e} if and only if

~(p, a) contains (q, x).
Consequently, in G we have: A f l . . . f~ -~ a and ~(p, a) contains (q, x).
G
(iv) (p, A , q)£'f2' " " fj' ~ (p, B, q)f~' . . . f /
--*x
ql within' < m .
Here, the index fl' contains the index production (p, A, q) --~ (p, B, q), and is
consumed in the first line of the derivation.
In G we have the derivation
A f ~ f 2 . . . f j --~
G
Bf2 . . . f~
--~
q
w for some w in T*,
and from the inductive hypothesis ~(p, w) contains (q, w).
These four cases represent all possible ways in which the first expansion in the
derivation in G' can proceed. Thus, (*) holds for all k > 1.
A similar proof by induction on the length of a derivation in G yields that if
Af~ . . . f j *-~ow and ~(p, w) contains (q, x), then (p, A, q)f~' .. • f j ~, x. The details
are left for the reader.
Thus, we have: S' * w, w in T*,
(q0, S, p) o27,x for p in K if and only if S -~
and ~(q6, w) contains (p, x). Thus, L ( G ' ) = M ( L ( G ) ) .
Since an inverse N F T mapping is also an N F T mapping, we have:
COROLLARY l. The class of indexed languages is closed under inverse N F T map-
pings.
We also have the following special cases.
COROLLARY 2. The class of indexed languages is closed under (i) gsm mappings, 6
( ii) inverse gsm mappings, ( iii) homomorphlsms,
• 7 and (iv) inverse homomorphisms.
COROLLARY 3. The class of indexed languages is closed under intersection with
regular sets.
PRoof. Let L ~ ~* be L ( G ) for some indexed grammar G and let R ~ Z* be
any regular set. We can construct an N F T M = (Q, z, ~, ~, q0, F) such that
M ( w ) = {w} ifw is in R and M ( w ) = ~, where ~ is the empty set, otherwise. Then,
M(L) -- L n R.
F r o m Lemma 3.1 and Corollaries 2(iii), 2(iv), and 3 of Lemma 3.2, we have:
THEOREM 3.2. The class of indexed languages i~ a full abstrac~ family of lan-
guages.
Two important closure properties of the class of indexed languages which do not
follow from its being a full AFL are closure under word reversal s and substitution. 9
THEOREM 3.3. The class of indexed languages is closed under word reversal.
PROOF. Let L be L ( G ) for some reduced form indexed grammar G =
e A generalized sequential machine (gsm) mapping is defined by a DFT of the form (Q, T, ~, ~,
q0, Q) [6].
7 A homomorphism is a mapping h of T* into ~* such that h(e) = e and h(xa) = h(x)h(a) for x
inT*, a i n T .
eR = ~.Ifw = ala2.., a . , t h e n w R = a,~.., al. L R = {w R I w i s i n L } . w Ris called the
reversal of w.
Let T be a finite set and for each a in T let Ta also be a finite set. Suppose r(a) C Ta* for
all a in T. r is a substitution if r(~) = e and T ( x a ) = T(X)T(a) for x in T*, a in T.
(N, T, F, P, S). Let G' = (N, T, F, P', S) where P' contains the following produc-
tions.
1. If A --* BC is in P, P' contains A --~ CB.
2. All productions of the form A ~ Bf and A --~ a which arc in P are also in P'.
A simple proof by induction of the length of a derivation yields that A~ * w if
G
and only if A~" a27 wR. It then follows L(G') = ( L ( G ) ) ~.
Substitution of arbitrary indexed languages for the terminal symbols of words
in an indexed language yields an indexed language.
THEOREm 3.4. Let T -- {al, • •., a,,}, Ti be a finite alphabet and L~ ~ Ti* be an
indexed language for 1 ~_ i < n. I f L ~ T* is an indexed language, then L ' =
{w~ ... wm ] aq ... ai.~ is in L and for 1 ~_ j <_ m and 1 ~_ ij ~ n, w~ is in L~j} is
an indexed language.
PROOF. Let L be L(G) for G = (N, T, F, P, S), and let L~ be L(G~) for G~ =
(N~, T~, F i , P~, a~), 1 < i < n. We can assume without loss of generality that
N~NNj= cfori~j, NFIN~=~,andTNT~= ~forl<i<n.
Let G' = (N', T', F', P', S) with
N'=NU ON,, F'=FU (JF,,

¢=1 i=l
T' --- U T , , P'=PU OP,.

ill i=l
It should be clear that L(G') = L'.

In the next section we show that the class of indexed languages is not closed
under intersection or complement.
The class of indexed languages very clearly includes all context-free languages
and also, as we have seen, includes some context-sensitive languages which are not
context-free, such as { a % % " [ n >_ 1} and {a~b~a~[n >_ 1}. However, indexed
languages seem to more closely resemble context-free languages than context-sensi-
tive languages in that many properties that hold for indexed languages also hold
for context-free languages but do not necessarily hold for context-sensitive lan-
guages. For example, the class of context-sensitive languages is not a full AFL,
whereas both the context-free and the indexed languages are. In Section 4 we show
that decidability results for indexed languages also bring out this distinction.
4. Decidability
Let ~ be a family of languages. The emptiness problem is said to be solvable for ~ if,
given any L in ~, we can effectively determine whether L = ~. If L is specified
as L(G) for some grammar G, the emptiness problem for L becomes one of deter-
mining whether G generates any terminal strings.
The membership problem is solvable for £ if given any string w and given any
L in 2, we can effectively determine whether or not w is in L. The membership
problem is solvable for ~ if each language in .9 is recursive.
The solvability of either the emptiness problem or the membership problem for a
class of languages depends upon the representation of the class of languages. Here,
we implicitly assume that the class of indexed languages is represented as some
lexicographic listing of indexed grammars. Under this assumption we show that
658 A.V. A~O
both the emptiness problem and the membership problem are solvable for the class
of indexed languages.
THEOREM 4.1. Given an indexed grammar G' = (N, T, F, P', S), the predicate
L( G') = ~ is solvable.
PaOOF. Without loss of generality, assume that G~ is in reduced form. Let G =
(N, ~, F, P, S) be the grammar derived from G by replacing each production in P
of the form A -~ a, where A is in N and a is in T by the production A --~ e, and
leaving all other productions unchanged. I t should be clear that L(G') is nonernpty
if and only if e is in L(G).
Let 1° # ( P ) -- p and #(N) = n.
From P, we construct a set Q of new productions. The symbols appearing in these
productions are of the form (A, Ni), where A is in N and N~ is in n 2 N, and there
is also a symbol of the form (e, ~), where ~ is the empty set. Moreover, all symbols
on the left-hand side of index productions appearing in Q are in N.
Initially, Q is as follows:
1. If A ---* BC is in P, then Q contains the 22~ productions (A, N~ (J N~) -~
(B, Ni) (C, Nj) for all values of N~ and N j ir~ 2 ~.
2. I f A --+ e is in P, then Q contains the production (A, ~) --~ (e, ~0).
3. If A ~ Bf is in P and f = [C1 --* D1, .. • , Ck -+ Dk], then Q contains for all
values of N~j, 1 ~_ j _< k, the productions (A, Ni) --~ (B, M ) f ' , where
f ' = [C, --* (D1, N~,), . . . , C~ --, (Dk, N~)]
such that N~ = [J~ffi~N~i . M is a set to which nonterminals in N appearing on the
left-hand side of the index productions in f~ may be added during the course of the
algorithm to follow. Initially, M = ¢. However, if it is determined that Dj --~* ¢0,
~0 in N~*, and Cj --~ ( D j , Nii) is an index production in f', then the nonterminal
Cj is added to M.
The number of productions in Q is certainly no more than p ( 2 r~ ~- 22~), where
l is the maximum number of index productions in any index in F.
We now make a sequence of passes through Q. During each pass symbols in Q of
the form (A, N~) are marked or checked off in accordance with Algorithm 1 below.
Upon termination of the algorithm a symbol of the form (A, N~) is marked in Q if
and only if A *-2-,B~B~ . . . B~ and[J~=~ {B~.} = N~. A symbol of the form (A, ~) is
(7
marked if and only if A -**
q
e. Thus, L ( G ~) is nonempty if and only if (S, ¢) is
marked in Q.
ALGORITHM 1
1. On the first pass through Q:
(i) Mark all occurrences of symbols of the form (A, {A}) for all A in N.
(ii) For all A in N mark the symbol (A, ~o) in all productions of the form (A, ~) ~ (e, ~)
in Q.
2. On each successive pass through Q:
(i) For all A in N and all N~ in 2~, if (A, N~) is a marked symbol on the left side of a pro-
duction in Q, mark all occurrences of the symbol (A, N~) appearing on the right of pro-
ductions and index productions in Q.
10#(E) = cardinality of the set E.
n Let M be any finite set with #(M) = k. Then, 2M = {M1, M2, -.. , M2k} represents the set
of subsets of M.
Indexed Grarmrlars--An Extension of Context-Free Grammars 659
(ii) If (A, N~) ~ (B, N~)(C, N~) is in Q ~nd b o t h (B, N~) and (C, N~) ~re marked in this
production, then mark the symbol (A, N~) in this production.
(iii) If ( A , N ~ ) - > (B, M)f' is in Q, suchth~t ( B , M ) i s m a r k e d , M = { C ~ , . . . ,C,}, r > 0,
C~-~ (Di,N~i) i s i n f ' , a n d ( D ~ , N ~ ) is marked f o r l < j < r, a n d N ~ = tJ~=~ {N~i},
t h e n mark the symbol (A, N~) in this production.
(iv) If (A, N~) -~ (B, M)f' is in Q a n d f ' contains an index production C --~ (D, N~) in which
(D, N~) is marked, add to M the nonterminM C. If M now contMns the nonterminM B
and N~ = N~ , t h e n mark the symbol (A, N~) in this production.
3. Repeat step 2 of this procedure until on a pass no more symbols are marked.
If on a given pass no more symbols are marked, then on each succeeding pass no
more symbols would be marked. Since there are only a finite number of symbols in
Q, it is obvious that this procedure must always halt.
The following lemma completes the proof of Theorem 4.1.
LEMMA 4.1. A symbol (A, N~) in Q, A in N , N~ in 2 N is marked by
Algorithm 1 if and only if A ~ oofor some ~oin .. ~ .
o
PRooF. (*) If (A, Ni) is the kth symbol to be marked in Q, then A ~O oo for
• *
some co in N i .
The statement (*) is proved by induction on k. (*) is obviously true for k = 1.
Assuming that the statement is true for all k < m with m > 1, consider tile ways
in which the mth symbol can be marked. Suppose that (B, Ni) is the ruth symbol
in Q to be marked.
(a) (*) is clearly true if (B, N¢) is marked in the first pass.
(b) If (B, N¢) is marked in step 2(i), (*) is trivally true•
(c) If (B, Nj) is marked in step 2(ii) in a production of the form (B, Ni) --~
(C, Nk) (D, Nz) where N¢ = N~ U N~, then B ---~ CD is in P' and from the induc-
tive hypothesis C 2_> o~ and D ~ 002 from some col in Nk* and ~2 in Nt*. Thus,
o o
B 2_> w~¢o2for some oo~o~2in Ni*.
G
(d) If (B, Ni) is marked by step 2(iii) in a production of the form (B, Ni)
(C, M ) f ' , then C ~* Ci, " " Ci,,, with
" Cik in
" M and Cik f -~ Dik -~
* ook, ~k n" l N i*k ,
1 < k < m'. Consequently, B -~> Cf "-~ ¢o~ . . . ¢o~, and col .. • tom' is in Ni* since
- - - - G
m!
N~. = Uk=l Nik.
(e) Suppose (B, N~) ~ (C, M ) f ! is a production in Q such that (i) (B, Nj) is
marked by step 2(iv), and (ii) f ' contains an index production C ~ (D, Nj) in
which (D, Nj) has been marked. Then, B -~ Cf --~ D .2~ ~ofor some o~in Ni*.
Thus, statement (*) is valid for all values of k. Now consider the following state-
ment.
k
(**) If A --~ B~B~ . . . B, and U[=~ {B~} = N~, then all occurrences of (A, N~)
on the right of productions and index productions in Q and at least one occurrence
of (A, N~) on the left of a production in Q are marked upon termination of Algo-
rithm 1.
This statement is obviously true for k = 1. Assuming that it is true for all k < m,
consider the derivation B -o C~ -. • C, where U s~=~ {C~} = N i , with m > 1. Such a
o
derivation can be one of two forms:
By convention, ~* = {e}.
660 ~. v . h~to
m I ~/ "~ m2
1. B --~
G
DE --~ C1 " " CkE --~ CI .." CkCk+~ " . C~ , where m~ < m and m2 < m.
Let Nj~ = U~=~ {Ci} and Nj~ = [J~=k+~ {C~}. From the inductive hypothesis,
(D, Nj~) and (E, Nj~) must be marked symbols in the production (D, Nj)
(D, Ni~) (E, Nj~) in Q, and from step 2(ii) of Algorithm 1, (D, N~.) is marked.
From step 2(i) of Algorithm 1 ~ll occurrences of (B, N j) on the right of productions
and index productions in Q are raarked.
~n I m i
2. B --~ Df, where D --~ E ~ . . . E ~ , r _> 0, and m' < m, and E ~ f - ~ F~ -~
C ~ . . . C~j~, m~ < m , f o r l < i <rsuchthat
Cll "'" CmC21 "'" C2j2 "'" Crl "'" C~jr = C1 "'" C,.
Let N~ = U~L~{Cik}. There is a production (B, N j ) ---+ (D, M ) f ' in Q in which f'
contains the index productions E,~ --~ (Fi, Nj~) for 1 < i < r. From the inductive
hypothesis each (Fi, Nj.~) is marked, and from step 2(iv) of the algorithm M will
contain U;=l {Ek}. From the inductive hypothesis, (D, M) is marked, and thus,
by step 2(iii) of the algorithm, (B, Ni) is marked. If m' = 0, then (B, Nj) is
checked by step 2(iv) of the algorithm.
This completes the proof of (**) and also the proof of Lemma 4.1. Theorem 4.1
now follows directly from the fact that L(G') is nonempty if and only if (S, ~) is a
checked symbol in Q upon termination of Algorithm 1.
Solvability of emptiness and closure under intersection with regular sets imply
solvability of membership for any class of languages.
THEOREM 4.2. Let £ be a family of languages for which the emptiness problem is
solvable. If £ is effectively closed under intersection with regular sets, then the member-
ship problem is solvable for £.
PROOf. Let L ~ T* be any language in £ and let w be an arbitrary string in T*.
w is in L if and only if L A {w} is not empty.
THEOREM 4.3. The membership problem is solvable for the class of indexed lan-
guages.
PROOF. Theorem 4.3 follows immediately from Corollary 3, Lemma 3.2, and
Theorems 4.1 and 4.2.
COROLLARY. Given an indexed grammar G = (N, T, F, P, S), the predicate
A f ~ w for given w in T*, A in N, and ~ in F* is recursive.
PROOf. Let G' = (N U {S'}, T, F, P U {S' -~ A~'}, S'), where S' is a new sym-
bol. w is in L(G') if and only if A~" 2~ w.
Several nonclosure properties of indexed languages follow directly from some
well-known results. Given any recursively enumerable set W, it is possible to find
two deterministic context-free languages L~(W) and L2(W) and a homomorphism
h such that W = h(L~(W) N L K W ) ) [9].
If W is a recursively enumerable set which is not recursive [4], it must then fol-
low that L I ( W ) fl L2(W) = L K W ) U L~(W) is not an indexed language. L~(W)
and L K W ) and their complements (L denotes the complement of L) are all de-
terministic context-free languages. Nonclosure of indexed languages under comple-
ment and intersection is thus evident.
THEOREM 4.4. The class of indexed languages is not closed under intersection or
complement.
The language L I ( W ) N L~(W) is a context-sensitive language. Thus, there can-
Journal of the Association for Computing Machinery, Vol. 15, No, 4, October 1968
~ o t be any class of reeursive languages closed under homomorphism which includes

fall context-sensitive languages, la
At present, the class of indexed languages represents one of the largest known full
e~bstract families of rccursive languages.
It is often convenient to be able to remove e-productions (productions of the
f o r m A --~ e) from a grammar.
A normal form indexed grammar G is a reduced form indexed grammar (N, T, F,
t9, S) in which:
1. No production or index production contains the symbol S on the right-hand
side.
2. No production is of the form A --~ e, where A is in N - {S}.
3. The production S --~ e is in P if and only if e is in L(G).
THEOREM 4.5. Given an indexed grammar G, an equivalent normal form indexed
grammar G' can be effectively constructed from G.
PROOf. Let G = (N, T, F, P, S) be an indexed grammar in reduced form.
F r o m G, an equivalent e-free intermediate indexed grammar G' = (N', T, F', P',
S ' ) will be constructed. The nonterminals in N' will be of the form (A, M~), where
A is in N and M~ is in 2 N. For each M~ in 2 N, let m~ be the index containing the index
production B --* e if and only if B is in M~. If M~ = ~, then m~ = e.
P' is constructed in the following manner:
i. If A --~ BC is in P, P' contains the productions (A, M~) --~ (B, M~)(C, M~)
for all values of Mi in 2 ~:.
(i) If Bm~ ~ e, p ' also contains the production (A, M~) --~ (C, M~).
(ii) If Cm~ *-~ ~, p ' contains the production (A, M~) --* (B, Mi).
G
2. If A -~ a is in P with a in T, P' contains (A, Mi) --* a for all Mi in 2 N.
3. If A --~ Bf is in P, P' contains for all values of Mi the productions (A, M~) --~
(B, M~)f', where f contains the index production (C, M~) --* (D, Mi) if and only
if f contains C -* D and where Mi = {C {Cfml -~ Dm~ -~ ~}.
4. P' contains the initial production S' -* (S, ~,).
5. Finally, S' -~ e is in P' if and oifly if e is in L(G).
At this point it is convenient to define corresponding indexed nonterminals (c.i.n.
for short) in the grammars G and G'. The definition is recursive.
1. A and (A, ~) are c.i.n, for all A in N.
2. If A~- and (A, M ) f ' are c.i.n, and A --~ BC is in P, then (i) Bf and (B, M ) f '
are c.i.n., (ii) C~" and (C, M)f' are c.i.n.
3. If A~" and (A, M ) f ' are c.i.n, and A --+ Bf is in P, then Bf~ and (B, M')f'f"
are c.i.n, where f ' contains the index production (C, M') --~ (D, M) if and only if
f contains 6'-~ D and where M' = {C I Cfm ~o Dm -~ e}.
4. If Aff and (A, M)f'f' are e.i.n, and f and f ' contain the index productions
A --* B and (A, M) -~ (B, M'), respectively, then Bf and (B, M')f' are e.i.n.
5. Af and (A, M ) f ' are c.i.n, if and only if their being so follows from the rules
above.
LEMMX 4.2. If Af and (A, M)f' are c.i.n, in G and G', respectively, then A f -~o w,
w ~ e, if and only if (A, M)f' o--#,w.
*
~ I n fact, there cannot be any family of recursive languages closed under homomorphism which
includes all languages of offline Turing machine [13] tape complexity L(n), for any L(n) > log
log n.
Journal of the Association for Computing Machinery, ¥ol. 15, No. 4, October 1968
662 A.v. AHO
PROOF. A proof by induction on the length of a derivation is straightforward.

The details are left for the reader.
We now have S ' wo, w ~ ~,if and only if S' ~ (S, ~) 7* w. Also S ~ j2 e if
and only if S *G e. Thus, L ( G ~) = L ( G )
To complete the proof of Theorem 4.5, apply the construction of Theorem 3.1 to
obtain an equivalent reduced form grammar G~ from G'. G" will also be an equiva-
lent normal form grammar.
One undecidability result should be stated.
THEOnEM 4.6. Given an indexed grammar G, it is recursively unsolvable to determine
whether:
(a) L ( G ) = R for some regular set R,
(b) L ( G ) = L for some context-free language L.
PROOF. (a) follows directly from the fact that it is reeursively unsolvable to
determine whether a context-free grammar ~ generates a regular set [2]. (b) See [7]
or [11] for what has become a standard proof for questions of this type.
5. Containment Within Context-Sensitive Languages

At first glance it may not be clear that the class of indexed languages is a subset of
the class of context-sensitive languages. For example, in a normal form indexed
grammar it is possible to have a derivation of the form (5.1) in which an
s --, All1 ~ A2fJ, -~... ---, A & .../1 ---, " ~ A k f ~ . . . f l ---,
(5.1)
Bk-lfk-1 " " f~ --~ Bk-2fk-~ ' ' " f~ --~ ' ' " ---+B~f~ . . . fl -~ a
intermediate sentential form of arbitrary length derives only one terminal symbol"
Let G = (N, T, F, P, S) be a normal form indexed grammar. We begin by de-
scribing how an online nondetcrministic Turing machine with one working tape can
be effectively constructed from G to recognize exactly L ( G ) . We then modify M
such that the length of working tape used in determining whether an input word
w is in L ( G ) is a linear function of the length of w.
We construct M to trace out, nondeterministically, on its working tape a leftmost
derivation of any word in L ( G ) . The word that is being derived is compared symbcl
by symbol with w, the word on the input tape. If the two words are the same, M
accepts w. Otherwise, M rejects w. Since M is nondeterministie, if there exists a
leftmost derivation of w from S in G, M will find this derivation and accept w. If
there exists no derivation of w from S in G, M will reject w.
M will retain only one copy of any string of indices on its working tape. For
example, M will simulate a derivation of the form (5.2)
All . . . fk ~ Bfl . . . h e f t " " f~ (5.2)
by first having on its working tape the string SAf~ . . . fk# and then replacing the
nonterminal A by the nonterminals B C to create the string SBCfl . . . fk#. Thus,
each nonterminal on the working tape of M will be considered to be indexed by all
indices between that nonterminal and the right endmarker #.
If the index fl contains the index production B --* D and if derivation (5.2) pro-
14A coatexVfree grammar G = (N, T, P, S) is the indexed grammar (N, T, ~, P, S).
Ju~tnal of the Association for Computing Machinery, Vol. 15, No. 4, October 1968
Indexed G r a m m a r s - - A n Extension of Context-Free CGrammars 663
ceeds with the expansion of B by consuming the index fl as in (5.3), then M
Bfl . . . fkCfl . . . J'k --+ Df= . . . fkCf, . . . f , (5.3)

rewrites its working tape as $Cf~$D¢f2 .. • fk#.
We call the symbol immediately to the right of the rightmost $ sign on the work-
ing tape the active symbol. We use the symbol ¢ as a right endmarker on the working
tape for any intermediate sentential form derived by consuming an index.
For the sake of clarity, we describe the operation of M in terms of "composite
moves." Each composite move consists of a number of elementary moves by M,
and we leave it to the interested reader to supply the set of elementary moves to
effect each composite move. We assume that at the staxt of each composite move,
M is in the same state and the working tape head is scanning the active symbol on
the working tape. A configuration of M can then be represented by a pair (aoal • • •
d / . . . an, Z 0 " " ~ , j ' " Z,~), where
(i) a k i s i n T f o r 0 < k < n , w = a l . . . an-l;
(ii) ao = ¢, the left endmarker for the input; a~ = $, the right endmarker for
the input;
(iii) Z k i s i n N U T U F U { $ , ¢ } f o r 0 < k < m;
(iv) Z0 = $ and Zm = #;
(V) M is scanning the input symbol a~, 0 < i < n;
(vi) Z~ is the active symbol. Zi is in N lJ T LJ F U {~}.
We write (a0 . . . e l / . . , a , , o2/3) ~- (a0 . . . tli+~ . . . a~, a~2,5~), d in {0, 1} if
there is a composite move which takes M from configuration ( a 0 . . . d , . . . a . ,
aZS) to configuration (a0 . . . di+~ . • • a~, a~2~/~,). We use the relation t-* over the
set of configurations to denote the reflexive and transitive closure of ~-.
M is constructed to execute the following composite moves.
1. If M is in any configuration of the form ~5 (x~y, a$Al~) and if (i) A ~ BC,
(ii) A + b, or (iii) A ~ B f is a production in P, then M may enter the configura-
tion (i) (x~y, aSBC[3), (ii) (x~y, oeSbt~), or (iii) (x~y, aSBfl~), respectively.
2. If M is in configuration (xdy, a$~13f~/) where f is the first index to the right of
A and f contains an index production of the form A --~ B, then M may enter the
configuration ( xdy, ce$1~f$B~'l ).
3. If M is in configuration (ao • • • d~ .. • a , , aSetZl~), then M enters the configura-
tion (a0 . . . d~+~ "." a , , a$2~) if a = al, and M halts and rejects a~ . . . a._, if
a~a~.
4. If M is in configuration (xdy, c $]ZS) where f is in F, then M enters the con-
figuration ( xt~y, a$2l~).
5. If M is in configuration (xdy, a$Zl3$~'l) where/3 contains no $ symbols, then
M enters configuration (xdy, 05213"r).
6. If M is in configuration (itS, $~#) and S --~, is in P, then M enters configura-
tion (¢.~, $~).
M initially enters the configuration (a0tt~ . . . a , , $S~) and then executes any
~ We use the following conventions unless otherwise specified:
(1) xay is in ¢T*$.
(2) a is any string of working-tape symbols.
(3) /~ and "r are strings of working-tape symbols not containing the symbol $.
(4) Each string of symbols on the working tape begins with $ and ends with #.
(5) Z is any working-tape symbol except $.
Journal of the Associationfor Computing Machinery, Vol. 15, No. 4, October 1968
664 A . v . ~to
sequence of possible composite moves. M accepts the input w = a, . . . a~-I if and

only if M can enter the final configuration (a0 ' . . d~, $~) after some sequence of
composite moves.
A rather straightforward proof by induction on the length of a derivation and on
the number of composite moves made by M yields that
(ao . . . d~ . . . a,,, c$~Z~,flf3~f~ ... ~kfkl~k+1#) ~- *
(ao . . . di+,,+l " " a , , , c,$2~1f~ . . . f~fk~k+l#), k >__ 0,

where
(i) Zf~ is~6 in (Y U {¢} )* and
(ii) each /~j is in ( N U {¢} )*, 1 _<j_< k + 1
if and only if A f x . . . f k *(7 al . . . a¢+,, .
Thus, (a0g~ . . . am ' $S#) [- * (a0 . . . ~,$~) if and onlyif S *q ax - . . a~-i ' The set
of words accepted by M is then exactly L ( G ) .
The manner in which M operates is very similar to the operation of a device
called a nested stack automaton, which is a generalized stack automaton in which
the stack automaton can create and erase embedded stacks in very much the same
way M can create and erase $-¢ pairs [1]. It is shown in [1] that a language L is
an indexed language if and only if L is accepted by a (one-way, nondeterministic)
nested stack automaton.
There has been no guarantee that the length of the string of symbols on the work-
ing tape in any intermediate configuration in a sequence of composite moves by M
is linearly bounded by the length of the input. Each nonterminal appearing on the
working tape derives at least one terminal symbol. However, an arbitrarily long
string of indices may give rise to only one terminal, as in derivation (5.1).
An a u g m e n t e d g r a m m a r G is a normal form indexed grammar (N, T, F, P, S)
such that if A * B and if B ~ a, B --~ C f , or B ~ C D is a production in P, then
A ~ a, A ~ C f , or A ~ C D respectively, is also a production in P. Moreover, if
A *q B and if an index f in F contains the index production C --~ A, then f also
contains the index production C--~ B.
In un augmented grammar, it is not necessary to generate any indices in a deriva-
tion in which the only net change is one nonterminal into another nonterminal. For
$
example, in derivation (5.1), A~--* B~ and thus this derivation in an augmented
grammar would proceed as shown in (5.4).
S ---> A l f ~ ~ A~f2f~ - - - ~ . . . --~ A ~ f , . . . f l - ~ a. (5.4)
From Theorem 4.3 it immediately follows that the predicate A --~*
q B is recursive
for given G = (N, T, F, P, S) and A and B in N. Thus, given a normal form indexed
grammar G, an equivalent augmented grammar G' can be effectively constructed
from G.
We now describe how M can be modified into a machine M' which will accept ~i1
input w = al • • • a~_~ never using more than 6n squares of working tape if and only
if w is in L ( G ) for any augmented grammar (N, T, F, P, S).
In derivation (5.4) the indices f~, f2, " " , f~ are not used and consequently need
16If z/~ = Eand Y ~ ~, then (xdy, a $ 2 ~ Y ~ ) =- (x~y, a$~'~.).
J'ournal of the Association for Computing Machinery, Vol. 15, No. 4, October 1968
ndexed Grammars--An Extension of Context-Free Grammars 665
ot have been generated in this particular derivation1. Hence, M' will not print on
~s working tape any indices which are never used in the course of a derivation.
Moreover, M' will compress, wherever possible, strings of indices into single sym-
ols. For example, consider derivation (5.5),
A --~ Blfl '~ B~f~fl ---* Cf~flDfff,, (5.5)

~here
Cfef~ --~ E~fi ~ E~ --~ a, Df.2f, ~ b.
a this particular derivation the index string f2f~ could be condensed in a single
ndex h = (f2f~) containing the index production C --~ E~.
In general we define a set F~ of compressed indices as follows.
1. All indices in F are in F~.
2. If f and g are in F~, F¢ also contains the index (fg) containing the index
production A --* C if and only if Afg 7 Bg 7 C.
M' will print indices on the working tape in the form (f, d) where f is in F~ and d
s in {0, 1, 2, 3}. If f is the first index generated by a nonterminal and if f is con-
~umed during the derivation, then f will be printed on the working tape as (f, 0).
Any other index f which is consumed during the derivation will be printed as (f, 2),
~r if the symbol immediately to the right of the active symbol is an index of the
form (g, d) and no production of the form A --~ a or A --~ BC is ever used after f
has been consumed but before g is consumed, then f will be combined with g in a
symbol of the form (h, d) where h = (fg).
If an index production, A -~ B, is used from an index of the form (f, 0) or (f, 2)
on the working tape and then B is expanded as B -j~ a or B * C~ -~ D~E~ in the
derivation, then the index will be rewritten on the working tape as (f, 1) or (f, 3),
respectively. An index (f, d) will be erased from the working tape only if d = 1 or
3. Since G is an augznented grammar, this ensures that each index written on the
working tape derives at least one additional terminal symbol.
M ' will print a nonterminal A in N on its working tape as A' or A depending on
whether there is, or is not, respectively, at least one index to the right of A on the
working tape which will be consumed during the expansion of A.
The working tape vocabulary for M' will consist of the sets of symbols N,
N' = {A' I A is in N}, FJ = {(f, d) If is in F~, d is in {0, 1, 2, 3}} plus the special
symbols $, ¢, #.
M' will initially start off with the working tape blank and the string Ca~ • • • a~_~$
on the input tape. M' then marks off 6n squares on the working tape in such a
fashion that if more than 6n symbols are ever written on the working tape, M' will
halt and reject the input w = a l . . . a~_~.
M' then enters the configuration (¢~% . . . a~_~$, $~#) and will then be able to
execute any sequence of the following composite moves. If a is a string of working-
tape symbols, then a+ is
(i) fl(f, 1) if a = fl(f, 0),
(ii) fl(f, 3) if a = fl(f, 2),
(iii) a otherwise.
1. If M ' is in any configuration of the form (x~iy, a$Afl) and if (i) A ~ BC,
(ii) A --~ b, or (iii) A ~ B f is in P, then M may eater the configuration
666 A. V. ARO
(i) (xdy, ,~+$[~Cfl),

(ii) (x~y, a+$~),
(iii) (x~y, aSB~) or (xdy, a$]~'(f, 0)~), respectively.
2. If M' is in any configuration of the form (xdy, asfl.'Z~) and if (i) A ---, BC
or (ii) A ---*Bf is in P, then M 1may enter the configuration
(i) (a) (xdy, a+$B'ClZfl) or (b) (xay, a+$B'CZ~) or
(c) (x~y, ~+$~c'z~),
(ii) (a) (xdy, aSB'(f, 2)Z~) or,
(b) if Z is of the form (g, d) and the compressed index h = (fg) is non-
empty, (x~y, a$[:f (h, d)fl).
3. If M' is in any configuration of the form (x~y, ,~$~'~(f, d)~,) where g~contains
no symbol in Fo' and f contains the index production A -~ B, then M' may enter
the configuration
(~) (x~y, aS~(f, d)$[~¢~,) or,
(b) if d = 2 or 3, (xdy, aSH(f, d)$ff~7).
4. If M ' is in a configuration of the form (a0 .. • (k • ." a~, a$(~Z~), 0 < i < n,
then M ' enters the configuration (a0 • .. ~+~ • • • a~, a$2~) if a -- ai and M ' halts
and rejects w = al • • • a~_~ if a ~ a~.
5. If M' is in a configuration of the form (xdy, aS(], d)Z~), then M' enters the
configuration (xdy, ~$2~) if d = 1 or 3, and halts and rejects w = at . . . a~_~ if
d = 0or2.
6. If M ' is in a configuration of the form (xdy, aSZ~$~'y) where B contains no $
symbols, then M ' eaters configuration (xgty, c $],~'y).
7. If M' is in configuration (¢~, $~#) and if S---, e is in P, then M' enters con-
figuration (¢S, $~).
The input w = a~ . . . a~_~ will be accepted by M' if and only if M ' can go from
configuration ( ¢ d ~ . . . a~_~$, $~#) to configuration ( C a , " " a~_lS, $~) b y some
sequence of composite moves during which the number of symbols on the working
tape in any intermediate configuration never exceeds 6n.
For example, M' would simulate derivation (5.5) with the following sequence of
configurations.
(~abS, SB~(h, 0)#)

}- ((tabS, $CD(h, 0)#)
[- (CabS, SD (h, 0)$B~¢#)
(¢(lb$, $D (h, 1)$~¢#)
[- (CabS, $D(h, 1 ) $ ~ )
(¢a~, $D(h, 1)#)
(ta~, $~(h, 1)#)
~- (CabS, $(h, 1)#)
(~ab~, $~)
where h = (H,).
It should be clear that (aod,... a,~, $~#) ~-* (aoa,... ~,~, $~) if and only if
S -~ al • • • a,_l. Moreover, each nonterminal printed on the working tape derives
at least one terminal symbol. Also each index first written on the working tape in
the form (f, d) is such that f contains at least one index production of the form
A -~ B which is used in the simulation of the derivation of w -- a~ . . . a,-1 from
S in G, and in which the nonterminal B is involved in a derivation of the form
B * a or B --~ C~ --~ D~E~ for some ~ in F*. The index f can be associated with the
terminal a or the first terminal derived from CL respectively. Thus, each index
appearing on the working tape in any configuration can be associated with a distinct
terminal symbol. Therefore, the number of indices on the working tape can never
exceed the length of w. Since each symbol in N, N' or F ] can at worst be surrounded
by a $-¢ pair, and since the working tape is delimited by 8-#, the maximum number
of symbols on the working tape at any time can certainly be no more than 6n.
It is straightforward to construct a linear bounded automaton (Iba for short)
M " which simulates the behavior M'. Since a set L is accepted by some lba if and
only if L is context-sensitive [15], we have shown the following result.
THEOREM 5.1 Every indexed language is a context-sensitive language.
We summarize results of Section 4 and Theorem 5.1 in the following theorem.
THEOREM 5.2. The class of indexed languages is a proper superset of the class of
context-free languages and is a proper subset of the class of context-sensitive languages.
6. Restricted Forms of Indexed Grammars

Indexed grammars, as defined, t~re capable of generating a very large class of lan-
guages. In many situations it may be more convenient to consider smaller classes
of languages. With this in mind some restricted forms of indexed grammars are
defined in this section and the hierarchy of languages they generate is investigated.
A grammar is herein named according to the form of the index productions to-
gethcr with the form of the productions.
Definition. A right linear indexed right linear ( R I R for short) grammar G is an
indexed grammar (N, T, F, P, S) in which:
(1) Each index production in each index in F is of the form A ~ aB or A --~ a.
A and B are in N, a i s i n T U { ~ } .
(2) Each production in P is of the form A --~ aBf, A --~ aB, or A --~ a. A and
BareinN, fisinF, aisinTU{,}.
LEMMA 6.1. Given a context-free grammar G = (N, T, P, S), an equivalent RIR
grammar G' can be effectively constructed from G.
PROOF. Without loss of generality, assume that G is in Greibach normal 2-form
[10].17 Let N' = N U {(AB') I A is in N, B is in N} U {A' I A is in N}.
Let P ' be the set of productions constructed in the following manner:
For each production in P of the form (i) A --+ aBC, (ii) A ~ aB, or (iii) A --+ a,
add to P ' productions of the form
(i) A --~ a(BB')[B' ~ C], and (AD') --+ a(BB')[B' ~ (CD')] for all D in N;
(ii) A --~ aB, and (AD') ---+a(BD') for all D in N;
(iii) A --+ a, and (AD') ---+aD' for all D in N;
respectively.
Let F be the set of indices appearing in the productions in P'. Let G' = (N', T,
F, P', S).
We now claim that:
If A -ow, then (AD') *-~wD', for allD i n N . (6.1)

O (/!
17That is, every production in P is of one of the following forms: (i) A ---*aBC, (ii) A --* aB,
or (iii) A --~ a, where A, B, and C are in N, and a is in T. If e is in L(G), then S --* ~ is also
in P.
Journal of tile Association for Computing Machinery, Vol. 15. No. 4, October 1,q68
668 A . v . AHO
Statement (6.1) is clearly true for n = 1. Assuming that (6.1) is true for all n < m
m
with m > 1, consider a derivation A "~7 w. Such a derivation in G can be of one of
two forms:
(a) A -~ aBC
ml
awiC w i t h m l < m,
m2
O
awlw2 with ms < m.
(b) A --~ aB
ml
--~ awl with m~ < m.
From the constructions and the inductive hypothesis, the following derivations
are possible in G'.
(a) (AD') --~a(BB')[B' --~ (CD')]
awiB [B --~ (CD') ]
awl(CD')
* P.
awfw2D
(b) (AD') ~ a(BD')
awlD'.
Similarly, we can show that:
n t
If (AD')-~wD, then A ~w. (6.2)
O
In the same fashion it is easy to show that A ~q w if and only if A ~ w. It now

follows t h a t L( G') = L( G).
A (one-way, nondeterministic) pushdown automaton P is a 6-tuple (Q, T, F, ~,
q0, Z0) where:
(a) Q, T, and P are finite nonempty sets of state symbols, input symbols, and
stack symbols, respectively.
(b) ~ is a mapping from Q × (~ U {e}) X r into finite subsets of Q X I'*.
(c) q0 in Q is the initial state, and Z0 in Z is the initial symbol on the stack.
A configuration of P is a triple (q, aw, Z~,) where q is in Q, a is in ~ U {e}, w is
in z*, Z~ is in r*. q is the state of finite control, aw is the string remaining on
the input tape, and Z~ is the stack contents with Z the top symbol.
We write (p, aw, Z~,) ~ (q, w, c~,) if ~(p, a, Z) contains (q, c~). The relation ~-*
over the set of configurations denotes the reflexive and transitive closure of ~.
N ( P ) = {w I (qo, w, Zo) ~* (q, ~, ~) for some q in Q}. It is well known that L is
N(P) for some pushdown automaton P if and only if L is context-free [3].
LEMMi 6.2. Given an RI R grammar G = ( N, T, F, P, S ), a pushdown automaton
P = (Q, T, I', ~, qo, S) can be e~?ctively constructed from G such ~hat N ( P ) = L(G).
PRooF. P will be capable of deriving a leftmost derivation of any string w in
L(G). The set of pushdown tape symbols I' will be N U T U F. Q = {q0l O {qA I A
is in N}. ~ is constructed as follows:
1. If (i) A ~ aBf, (ii) A ~ aB, or (iii) A -~ a is a production in P, then ~(q0, e,
A) contains (i) (q0, aBf), (ii) (qo, aB), or (iii) (qo, a), respectively.
2. If f contains the index production (i) A ~ aB or (ii) A --~ a, then ~(qa, e, f)
contains (i) (q0, aB) or (ii) (q0, a), respectively.
3. ~(q0, a, a) = {(qo, e)} for all a in T.
Journal of t h e Association for C o m p u t i n g Machinery, Vol. 16, No. 4, October 1968

Indexed G r a m m a r s - - A n Extension of Context-Free Grammars 669
4. ~(q0, e, A) also contains (qa, e) for aliA in N.

5. ~(q0, e,f) = {(q0, e)} for all f i n F.
A straightforward proof by induction on the length of a derivation in G yields
that if A~ *>w, then (qo, w, A~) ~* (q0, e, e). A proof by induction oa tlle num-
ber of moves made by P shows that if (q0, w, A~) ~-* (q0, e, E), then A ~ - ~ w.
Thus, ,~ --*>w if and only if (q0, w, S) }* (q0, e, e) and, hence, N ( P ) = L ( G ) .
Theorem 6.1 now follows immediately from Lemmas 6.1 and 6.2.
THEOREM 6.1. The class of languages generated by R I R grammars is equivalent
to the class of context-free languages.
A left linear indexed left linear (LIL for short) grammar can be defined analo-
gously. Then, the following lemma is obvious.
LEMMA 6.3. Given an R I R grammar G, an L I L grammar G' can be effectively
constructed from G such that L ( G ' ) = (L(G) )~.
The class of context-free languages is closed under reversal. Thus:
THEOREM 6.2. The class of languages generated by L I L grammars is also equiva-
lent to the class of context-free languages.
It is now natural to consider gramm'ars in which productions are left linear in
form and index productions are right linear, or vice versa.
Definition. A left linear indexed right linear (LIR) grammar G is an indexed
grammar (N, T, F, P, S) in which:
(1) Each index production in each index in F is of the form A --~ B a or A -+ a.
(2) Each production in P is of the form A -~ aBf, A --~ aB, or A -+ a.
Here, A and B are in N, f i s i n F , a n d a i s i n T U { e } .
Definition. A ,¢ght linear indexed left linear (RIL) grammar G is an indexed
grammar (N, T, F, P, S) in which:
(1) Each index production in each index in F is of the form A -+ aB or A --+ a.
(2) Each production in P is of the form A - + B f a , A ---~Ba, or A -+a.
Example 3. Let
Ga = ( { S , A , B , C , D } , { a , b } , { f a , h , g ~ , g b , h , , h 2 } , P , S ) ,
be the L I R grammar where
fa = [C --+ Ca], go = [D -+ Da],
fb = [C --+ Cb], gb = [D --~ Db],
hi = [C --~ el, h2 = [D-+ A],
and P contains the productions
S --~ Ahl , A ~ Bh2 ,
A ---+aAf~, A -+bAfb,
B --~ aBga, B -+ bBgb,
A-+C, B -+D,
L(Ga) = {wlxlw2x~ " " wnx~wlw~ " " w~xnx,_t ".. xl I wl , xl are in {a, b}*,
l<i<n}.
L(Ga) is not a context-free language.
The construction in Lemma 6.1 also shows that each context-free language can
be generated by some L I R grammar. Thus, the class of languages generated by
670 A.v. AHO
LIR grammars includes alt context-free la~guages and some context-sensitive

languages.
A similar remark holds for RIL grammars.
Perhaps one other restricted form of indexed grammar should be mentioned. It
is possible to have two types of nonterminals in an indexed grammar, viz. those
that are capable of generating new indices and those that are not. Tile former we
call nonterminals, and the latter, intermediates. If all index productions contain
only terminals and intermediates, then once an index is consumed, no new
indices can be generated by these intermediates. We call such a grammar a restricted
indexed grammar.
Formally, a restricted #tdezed grammar is a 6-tuple (N, I, 7', F, P, S) where:
(a) N, I, T are finite disjoint sets of nontermirtal, intermediate, and terminal
symbols, respectively.
(b) F is a finite set of objects of the form [B~ --~ ~o~, •. • , B~ --÷ cok], where B~ is
in I and e~ is iIl (I U T)*, 1 < i < k.
(e) P is a set of productions of two forms: (i) A --, a where A is in N, a is in
( N F * U I F * U T ) * ; (ii) A - - - ~ w h e r e B i s i n I , ~ i s i n (I U T)*. A pro-
duetion of type (ii) is called an intermediate production.
(d) S is a distinguished symbol in N.
A derivation in a restricted indexed grammar G = (N, I, T, F, P, S) and the
language generated by G are defined in exactly the same way as for the indexed
grammar (N U I, T, F, P, S). The derivation tree corresponding to the derivation
has the property that once an intermediate appears in a path no new indices appear
later in that path.
There are several interesting subclasses of restricted indexed grammars. Re-
stricted right linear indexed right linear grammars and restricted left linear indexed
left linear grammars can be defined analogously to R I R grammars and LIR gram-
mars. It is not difficult to show that the class of languages generated by restricted
RIR grammars and the class of languages generated by restricted LIL grammars
are both exactly the class of linear context-free languages.
An R-grammar is a restricted indexed grammar in which all intermediate and
index productions are right linear in form. A stack automaton [8, 9] which earl read
the stack in only one direction, from the top toward the bottom, is called a reading
pushdown automaton (RPA) [1]. It is shown in [1] that the class of languages
generated by R-grammars is equivalent to the class of languages recognized by
(one-way, nondeterministic) RPA.
7. Conclusions
A new class of grammars, called indexed grammars, has been defined. These gram-
mars generate a new class of languages, called indexed languages, properly contain-
ing all context-free languages and properly contained within the class of con-
text-sensitive languages. The class of indexed languages represents a natural
generalization of context-free languages and closely resembles this class of lan-
guages with similar closure properties and decidability results.
REFERENCES
1. AHO,A.V. Nested stack automata. (Submitted to a technical journal.)
2. BAR-HI~LEL,Y., PIi~RLES,M., AND SHAMIR, E. On formal properties of simple phrase
Indexed Grammars--An Extension of Conte~ct-Free Gra~nmars 671
structure grammars. Z. Phonetik, Sprachwiss. Kommunikatio~tsforsch. 14 (1961), 143-172;

in Bar-Hillel, Y,, Language and Information, Addisol~-Wesley, Reading, Mass., 1965, pp.
116-150.
3. C~io~tsKY,N. Formal properties of grammars. In Lace, R., Bush, R., and Galauter, E.
(Eds.), Handbook of Mathematical Psychology, Vol. II, Wiley, New York, 1963.
4. DAvis, M. Computability and Unsolvability. McGraw-Hilt, New York, 1958.
5. FLOYD,R.W. On the nonexistence of a phrase structure gr~mmar for ALGOL 60. Corn.re.
ACM 5, 9 (Sept. 1962), 483-484.
6. GINSBURC,S. The Mathematical Theory of Context Free Languages. McGraw-Hill, New
York, 1966.
7. GINSB]~:RG,S., AND GI~EIBACII,S.A. Abstract families of languages. IEEE Conference
Record of Eighth Annual Symposium on Switching and Automata Theory, IEEE pub. no.
16-C-56, Oct. 1967, pp. 128-139.
8. GINSBURG,S., QItEIBACH,S. A., AND HA:RRISON,M.A. Stack automata and compililxg.
J. ACM 14, 1 (Jan. 1967), 172-201.
9. GINSBURG,S., GREIBACI-I,S. A., AND H~naISON, M. A. One-way stack automata. J.
ACM 14, 2 (April 1967), 389-418.
10. GR~I:RACH,S. A. A new normal-form theorem for eontext~free phrase structure gram-
mars. J. ACM 12, 1 (Jan. 1965), 42-52.
11. GR~IBACH, S. A. A note on undecidable properties of formal languages. Tech. Rep.
TM-738/038/00, System Development Corp., Santa Monica, Calif., Aug. i967.
12. GREI:RACH,S. A., AND HOPCaOFT, J. E. Independence of AFL operations. Tech. Rep.
TM-738/034/00, System Development Corp., Santa Monica, Calif., July 1967.
13. H.ARTMANIS,J., AND STEARNS, R . E . On the computational complexity of algorithms.
Trans. Amer. Math. Soc. 117 (May 1965), 285-306.
14. HOPC:ROFT,J. E., AND ULLMAN, J . D . Nonerasing stack automata. J. Camp. Syst. Sci. I,
2 (Aug. 1967), 166-186.
15. KURODA,S. Y. Classes of languages and linear bounded automata. Inform. Contr. 7
(June 1964), 207-223.
16. ROSENKRANTZ, D. J. Programmed grammars--a new device for generating formal
languages. IEEE Conference Record of Eighth Annual Symposium on Switching and
Automata Theory, IEEE pub. no. 16-C-56, Oct, 1967, pp. 14-20.
:RECEIVED FEBRUARY,1968; REVISEDAPRIL, 1968
Journal of the Associationfor ComputingMachinery,VoL 15, No. 4, October1968

Aho - Indexed Grammars

Caricato da

Informazioni sul documento

Copyright

Formati disponibili

Condividi questo documento

Condividi o incorpora il documento

Opzioni di condivisione

Hai trovato utile questo documento?

Questo contenuto è inappropriato?

Copyright:

Formati disponibili

Aho - Indexed Grammars

Caricato da

Copyright:

Formati disponibili

Indexed Grammars--An Extension of Context°

Bell Telephone Laboratories, Inc., Murray Hill, New Jersey

CR CATEGORIES: 5.22, 5.23

Copyright ~ 1968,Associationfor Computing Machinery, Inc.

kinds of stack automata provide a device-oriented definition of such a class [8, 9,

(b) T is a finite set of symbols called the terminal alphabet.

Jom'~alof ~he Associationfor ComputingMachinery,Vol. 15, No. 4, October19668

a0, (:Y1, ~:~2, " ' " , an (1.1)

L(G), is called an indexed language and is defined to be the s e t / w in T* [ S -L*ow}.

S --~ a A f c --~ aaAgfcc ~ . . . --* a'Mg'~-~fc n --~ a~B g ~-~fc

Then, expanding B by consuming indices, we have

Bg " - l f --~ .oti

Thus, S 2,~+~ a % % ' L Moreover, a derivation in G~ is "deterministic" except for the

S ---* T f , T ~ Tg, T --~ A B A ,

f = [A -+ a, B -+ b, C --~ b], g = [A --+ a A , B ---+ bBCC, C ---+bC].

FIG. 1. Derivation tree for aabbbbaa

It is easy to show that

A fl "'" f j -~ BJi "'" f j C f , "'" fs

In G, we have the derivation

a:d from the inductive hypothesis, ~(p, w) contMns (q, x).

i~i! Here, the indices disappear when terminals are introduced.

Journal of the Associatioa for Computing Maehiaery, V o l . 15, N o . 4, O c t o b e r 1968

It is straightforward to show that (p, a, q) ~ x, a in T U {e} if and only if

N'=NU ON,, F'=FU (JF,,

T' --- U T , , P'=PU OP,.

It should be clear that L(G') = L'.

~ o t be any class of reeursive languages closed under homomorphism which includes

PROOF. A proof by induction on the length of a derivation is straightforward.

5. Containment Within Context-Sensitive Languages

ceeds with the expansion of B by consuming the index fl as in (5.3), then M

Bfl . . . fkCfl . . . J'k --+ Df= . . . fkCf, . . . f , (5.3)

sequence of possible composite moves. M accepts the input w = a, . . . a~-I if and

(ao . . . d~ . . . a,,, c$~Z~,flf3~f~ ... ~kfkl~k+1#) ~- *

(ao . . . di+,,+l " " a , , , c,$2~1f~ . . . f~fk~k+l#), k >__ 0,

A --~ Blfl '~ B~f~fl ---* Cf~flDfff,, (5.5)

(i) (xdy, ,~+$[~Cfl),

(~abS, SB~(h, 0)#)

6. Restricted Forms of Indexed Grammars

If A -ow, then (AD') *-~wD', for allD i n N . (6.1)

In the same fashion it is easy to show that A ~q w if and only if A ~ w. It now

Journal of t h e Association for C o m p u t i n g Machinery, Vol. 16, No. 4, October 1968

4. ~(q0, e, A) also contains (qa, e) for aliA in N.

LIR grammars includes alt context-free la~guages and some context-sensitive

structure grammars. Z. Phonetik, Sprachwiss. Kommunikatio~tsforsch. 14 (1961), 143-172;

:RECEIVED FEBRUARY,1968; REVISEDAPRIL, 1968

Journal of the Associationfor ComputingMachinery,VoL 15, No. 4, October1968

Potrebbero piacerti anche