Sei sulla pagina 1di 603

The Mathematics of Language

Marcus Kracht Department of Linguistics UCLA PO Box 951543 450 Hilgard Avenue Los Angeles, CA 900951543 USA

 & #       '%$"! 


Printed Version September 16, 2003

ii

Was dann nachher so sch n iegt . . . o wie lange ist darauf rumgebrutet worden.

o Peter R hmkorf: Ph nix voran u

Preface
The present book developed out of lectures and seminars held over many years at the Department of Mathematics of the Freie Universit t Berlin, the a Department of Linguistics of the Universit t Potsdam and the Department of a Linguistics at UCLA. I wish to thank in particular the Department of Mathematics at the Freie Universit t Berlin as well as the Freie Universit t Berlin a a for their support and the always favourable conditions under which I was allowed to work. Additionally, I thank the DFG for providing me with a HeisenbergStipendium, a grant that allowed me to continue this project in between various paid positions. I have had the privilege of support by HansMartin G rtner, Ed Keenan, a Hap Kolb and Uwe M nnich. Without them I would not have had the energy o to pursue this work and ll so many pages with symbols that create so much headache. They always encouraged me to go on. Lumme Erilt, Greg Kobele and Jens Michaelis have given me invaluable help by scrupulously reading earlier versions of this manuscript. Further, I wish to thank Helmut Alt, Christian Ebert, Benjamin Fabian, Stefanie Gehrke, Timo Hanke, Wilfrid Hodges, Gerhard J ger, Makoto Kanazawa, a Franz Koniecny, Thomas Kosiol, Ying Lin, Zsuzsanna Lipt k, Istv n N meti, a a e Terry Parsons, AlexisManaster Ramer, Jason Riggle, Stefan Salinger, Ed Stabler, Harald Stamm, Peter Staudacher, Wolfgang Sternefeld and Ngassa Tchao for their help.

Los Angeles and Berlin, September 2003

Marcus Kracht

Introduction
This book is as the title suggests a book about the mathematical study of language, that is, about the description of language and languages with mathematical methods. It is intended for students of mathematics, linguistics, computer science, and computational linguistics, and also for all those who need or wish to understand the formal structure of language. It is a mathematical book; it cannot and does not intend to replace a genuine introduction to linguistics. For those who are not acquainted with general linguistics we recommend (Lyons, 1968), which is a bit outdated but still worth its while. For a more recent book see (Fromkin, 2000). No linguistic theory is discussed here in detail. This text only provides the mathematical background that will enable the reader to fully grasp the implications of these theories and understand them more thoroughly than before. Several topics of mathematical character have been omitted: there is for example no statistics, no learning theory, and no optimality theory. All these topics probably merit a book of their own. On the linguistic side the emphasis is on syntax and formal semantics, though morphology and phonology do play a role. These omissions are mainly due to my limited knowledge. However, this book is already longer than I intended it to be. No more material could be tted into it. The main mathematical background is algebra and logic on the semantic side and strings on the syntactic side. In contrast to most introductions to formal semantics we do not start with logic we start with strings and develop the logical apparatus as we go along. This is only a pedagogical decision. Otherwise, the book would start with a massive theoretical preamble after which the reader is kindly allowed to see some worked examples. Thus we have decided to introduce logical tools only when needed, not as overarching concepts. We do not distinguish between natural and formal languages. These two types of languages are treated completely alike. I believe that it should not matter in principle whether what we have is a natural or an articial product. Chemistry applies to naturally occurring substances as well as articially produced ones. All I will do here is study the structure of language. Noam Chomsky has repeatedly claimed that there is a fundamental difference between natural and nonnatural languages. Up to this moment, conclusive evidence for this claim is missing. Even if this were true, this difference should

Introduction

not matter for this book. To the contrary, the methods established here might serve as a tool in identifying what the difference is or might be. The present book also is not an introduction to the theory of formal languages; rather, it is an introduction to the mathematical theory of linguistics. The reader will therefore miss a few topics that are treated in depth in books on formal languages on the grounds that they are rather insignicant in linguistic theory. On the other hand, this book does treat subjects that are hardly found anywhere else in this form. The main characteristic of our approach is that we do not treat languages as sets of strings but as algebras of signs. This is much closer to the linguistic reality. We shall briey sketch this approach, which will be introduced in detail in Chapter 3. A sign is dened here as a triple e c m , where e is the exponent of , which typically is a string, c the (syntactic) category of , and m its meaning. By this convention a string is connected via the language with a set of meanings. Given a set of signs, e means m in if and only if (= iff) there . Seen this way, the task of language is a category c such that e c m theory is not only to say which are the legitimate exponents of signs (as we nd in the theory of formal languages as well as many treatises on generative linguistics which generously dene language to be just syntax) but it must also say which string can have what meaning. The heart of the discussion is formed by the principle of compositionality, which in its weakest formulation says that the meaning of a string (or other exponent) is found by homomorphically mapping its analysis into the semantics. Compositionality shall be introduced in Chapter 3 and we shall discuss at length its various ramications. We shall also deal with Montague Semantics, which arguably was the rst to implement this principle. Once again, the discussion will be rather abstract, focusing on mathematical tools rather than the actual formulation of the theory. Anyhow, there are good introductions to the subject which eliminate the need to include details. One such book is (Dowty et al., 1981) and the book by the collective of authors (Gamut, 1991b). A system of signs is a partial algebra of signs. This means that it is a pair M , where is a set of signs and M a nite set, the set of socalled modes (of composition). Standardly, one assumes M to have only one nonconstant mode, a binary function , which allows one to form a sign 1 2 from two signs 1 and 2 . The modes are generally partial operations. The action of is explained by dening its action on the three components of the respective signs. We give a

) (

0 ) ) (

1 20 ) ) (

Introduction

xi

simple example. Suppose we have the following signs.

Here, v and n are the syntactic categories (intransitive) verb and proper name, respectively. is a constant, which denotes an individual, namely Paul, and is a function from individuals to the set of truth values, which typically is the 1 if and only if x is running.) On the level set 0 1 . (Furthermore, x of exponents we choose word concatenation, which is string concatenation (denoted by ) with an intervening blank. (Perfectionists will also add the period at the end...) On the level of meanings we choose function application. Finally, let be a partial function which is only dened if the rst argument is n and the second is v and which in this case yields the value t. Now we put

Then

is a sign, and it has the following form. : t

We shall say that this sentence is true if and only if 1; otherwise we say that it is false. We hasten to add that is not a sign. So, is indeed a partial operation. The key construct is the free algebra generated by the constant modes alone. This algebra is called the algebra of structure terms. The structure terms can be generated by a simple context free grammar. However, not every structure term names a sign. Since the algebras of exponents, categories and meanings are partial algebras, it is in general not possible to dene a homomorphism from the algebra of structure terms into the algebra of signs. All we can get is a partial homomorphism. In addition, the exponents are not always strings and the operations between them not only concatenation. Hence the dened languages can be very complex (indeed, every recursively enumerable language can be so generated). Before one can understand all this in full detail it is necessary to start off with an introduction into classical formal language theory using semi Thue systems and grammars in the usual sense. This is what we shall do in Chapter 1. It constitutes the absolute minimum one must know about these matters. Furthermore, we have added some sections containing basics from algebra,

0 aV

3 B P 7 H G B P 7 H G QVRU '4 dR'4 D ( 3 0aV U ) ) A9@785cP7IHG D BCAF97b5'4 QI64 B P 7 H G 3 B A 9 7 5 B P 7 H G "b'4 QR64 ( 3 Y0 ) ) (

W ` W (

e1 c1 m1

e2 c2 m2 :

0 ) ) ( P 7 H 0 ) ) RG ( A 9 7 F85
v n

DEQI64 B P 7 H G D B A 9 7 5 EC@864 D V U W X T ) S 3

e1

e2 c1 c2 m2 m1

xii

Introduction

set theory, computability and linguistics. In Chapter 2 we study regular and context free languages in detail. We shall deal with the recognizability of these languages by means of automata, recognition and analysis problems, parsing, complexity, and ambiguity. At the end we shall discuss semilinear languages and Parikhs Theorem. In Chapter 3 we shall begin to study languages as systems of signs. Systems of signs and grammars of signs are dened in the rst section. Then we shall concentrate on the system of categories and the socalled categorial grammars. We shall introduce both the AjdukiewiczBar Hillel Calculus and the LambekCalculus. We shall show that both can generate exactly the context free string languages. For the LambekCalculus, this was for a long time an open problem, which was solved in the early 1990s by Mati Pentus. Chapter 4 deals with formal semantics. We shall develop some basic concepts of algebraic logic, and then deal with boolean semantics. Next we shall provide a completeness theorem for simple type theory and discuss various possible algebraizations. Then we turn to the possibilities and limitations of Montague Semantics. Then follows a section on partiality and presupposition. In the fth chapter we shall treat socalled PTIME languages. These are languages for which the parsing problem is decidable deterministically in polynomial time. The question whether or not natural languages are context free was considered settled negatively until the 1980s. However, it was shown that most of the arguments were based on errors, and it seemed that none of them was actually tenable. Unfortunately, the conclusion that natural languages are actually all context free turned out to be premature again. It now seems that natural languages, at least some of them, are not context free. However, all known languages seem to be PTIME languages. Moreover, the socalled weakly context sensitive languages also belong to this class. A characterization of this class in terms of a generating device was established by William Rounds, and in a different way by Annius Groenink, who introduced the notion of a literal movement grammar. We shall study these types of grammars in depth. In the nal two sections we shall return to the question of compositionality in the light of Leibniz Principle, and then propose a new kind of grammars, de Saussure grammars, which eliminate the duplication of typing information found in categorial grammar. The sixth chapter is devoted to the logical description of language. This approach has been introduced in the 1980s and is currently enjoying a revival. The close connection between this approach and the socalled constraint programming is not accidental. It was proposed to view grammars not as

Introduction

xiii

generating devices but as theories of correct syntactic descriptions. This is very far away from the tradition of generative grammar advocated by Chomsky, who always insisted that language contains a generating device (though on the other hand he characterizes this as a theory of competence). However, it turns out that there is a method to convert descriptions of syntactic structures into syntactic rules. This goes back to ideas by B chi, Wright as well u as Thatcher and Doner on theories of strings and theories of trees in monadic second order logic. However, the reverse problem, extracting principles out of rules, is actually very hard, and its solvability depends on the strength of the description language. This opens the way into a logically based language hierarchy, which indirectly also reects a complexity hierarchy. Chapter 6 ends with an overview of the major syntactic theories that have been introduced in the last 25 years. N OTATION . Some words concerning our notational conventions. We use is the German typewriter font for true characters in print. For example: word for mouse. Its English counterpart appears in (English) texts either as or as , depending on whether or not it occurs at the beginning of a sentence. Standard books on formal linguistics often ignore these points, but since strings are integral parts of signs we cannot afford this here. In between true characters in print we also use socalled metavariables (placeholders) such as a (which denotes a single letter) and x (which denotes a string). The notation i is also used, which is short for the true letter followed by the binary code of i (written with the help of appropriately chosen characters, mostly and ). When dening languages as sets of strings we distinguish between brackets that appear in print (these are and ) and those which are just used to help the eye. People are used to employ abbreviatory conventions, for example in place of . Similarly, in logic one uses or even in place of . We shall follow that usage when the material shape of the formula is immaterial, but in that case we avoid using the true function symbols and the true brackets and , and use and instead. For is actually not the same as . To the reader our notation may appear overly pedantic. However, since the character of the representation is part of what we are studying, notational issues become syntactic issues, and syntactical issues simply cannot be ignored. Notice that and are truly metalinguistic symbols that are used to dene sequences. We also use sans serife fonts for terms in formalized and computer languages, and attach a prime to refer to its denotation (or meaning). For example, the computer code for a whileloop is written

A 7 FRH s

RR6 t t x v ws v u @8d8ys

RI x v w v 8$bu

e h A 7 g U

RdI

@ d ' t

h A g 7 f

xiv

Introduction

semiformally as i 100 x: x x i . This is just a string of symbols. However, the notation denotes the proposition that John sees Paul, not the sentence expressing that.

V k 8@Ra) k 8$6nmlIjih r q p o d Uk d V g fe U d @ D

Contents

1 1 2 3 4 5 6 7 2 1 2 3 4 5 6 7 3 1 2 3 4 5 6 7 8 4 1 2 3 4 5

Fundamental Structures Algebras and Structures . . . Semigroups and Strings . . . Fundamentals of Linguistics Trees . . . . . . . . . . . . . Rewriting Systems . . . . . Grammar and Structure . . . Turing machines . . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . .

1 1 16 29 43 52 66 80 95 95 103 117 132 147 160 165 177 177 191 207 225 239 249 258 269 281 281 296 308 323 332

Context Free Languages Regular Languages . . . . . . . . . . . . . . . Normal Forms . . . . . . . . . . . . . . . . . . Recognition and Analysis . . . . . . . . . . . . Ambiguity, Transparency and Parsing Strategies Semilinear Languages . . . . . . . . . . . . . . Parikhs Theorem . . . . . . . . . . . . . . . . Are Natural Languages Context Free? . . . . . Categorial Grammar and Formal Semantics Languages as Systems of Signs . . . . . . . . . Propositional Logic . . . . . . . . . . . . . . . Basics of Calculus and Combinatory Logic . The Syntactic Calculus of Categories . . . . . . The ABCalculus . . . . . . . . . . . . . . . . The LambekCalculus . . . . . . . . . . . . . Pentus Theorem . . . . . . . . . . . . . . . . Montague Semantics I . . . . . . . . . . . . . . Semantics The Nature of Semantical Representations Boolean Semantics . . . . . . . . . . . . Intensionality . . . . . . . . . . . . . . . Binding and Quantication . . . . . . . . Algebraization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

xvi 6 7 5 1 2 3 4 5 6 7 8 6 1 2 3 4 5 6 7

Contents

Montague Semantics II . . . . . . . . . . . . . . . . . . . . . 343 Partiality and Discourse Dynamics . . . . . . . . . . . . . . . 354 PTIME Languages MildlyContext Sensitive Languages . . . . Literal Movement Grammars . . . . . . . . Interpreted LMGs . . . . . . . . . . . . . . Discontinuity . . . . . . . . . . . . . . . . Adjunction Grammars . . . . . . . . . . . . Index Grammars . . . . . . . . . . . . . . . Compositionality and Constituent Structure de Saussure Grammars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367 367 381 393 401 414 424 434 447 461 461 470 485 505 515 529 540

The Model Theory of Linguistic Structures Categories . . . . . . . . . . . . . . . . . . . . . Axiomatic Classes I: Strings . . . . . . . . . . . Categorization and Phonology . . . . . . . . . . Axiomatic Classes II: Exhaustively Ordered Trees Transformational Grammar . . . . . . . . . . . . GPSG and HPSG . . . . . . . . . . . . . . . . . Formal Structures of GB . . . . . . . . . . . . .

Chapter 1 Fundamental Structures


1. Algebras and Structures

In this section we shall provide denitions of basic terms and structures which we shall need throughout this book. Among them are the notions of algebra and structure. Readers for whom these are entirely new are advised to read this section only cursorily and return to it only when they hit upon something for which they need background information. We presuppose some familiarity with mathematical thinking, in particular some knowledge of elementary set theory and proof techniques such as induction. For basic concepts in set theory see (Vaught, 1995) or (Just and Weese, 1996; Just and Weese, 1997); for background in logic see (Goldstern and Judah, 1995). Concepts from algebra (especially universal algebra) can be found in (Burris and Sankappanavar, 1981) and (Gr tzer, 1968), and in a (Burmeister, 1986) and (Burmeister, 2002) for partial algebras; for general background on lattices and orderings see (Gr tzer, 1971) and (Davey and a Priestley, 1990). We use the symbols for the union, for the intersection of two sets. Instead of the difference symbol M N we use M N. denotes the empty set. M denotes the set of subsets of M, f in M the set of nite subsets of M. Sometimes it is necessary to take the union of two sets that does not identify the common symbols from the different sets. In that case one uses . We dene M N : M 0 N 1 ( is dened below). This is called the disjoint union. For reference, we x the background theory of sets that we are using. This is the theory (Zermelo Fraenkel Set Theory with Choice). It is essentially a rst order theory with only two two place relation symbols, and . (See Section 3.8 for a denition of rst order logic.) We dene x y by z z x x y . Its axioms are as follows. 1. Singleton Set Axiom. x y z z y z x . This makes sure that for every x we have a set x . 2. Powerset Axiom. x y z z y z x . This ensures that for every x the power set x of x exists.

T S

V U V }

T xe S

1 V V V U U ~U U ~

1 V V V U U ~U U ~

| { 8Qz

s S yT xe

1 V U U ~

Fundamental Structures

5. Replacement. If f is a function with domain x then the direct image of x under f is a set. (See below for a denition of function.) 6. Weak Foundation.

This says that in every set there exists an element that is minimal with respect to . 7. Comprehension. If x is a set and a rst order formula with only y occurring free, then y : y x y also is a set.

We remark here that in everyday discourse, comprehension is generally applied to all collections of sets, not just elementarily denable ones. This difference will hardly matter here; we only mention that in monadic second order logic this stronger from of comprehension is expressible and also the axiom of foundation.

Foundation is usually dened as follows

In mathematical usage, one often forms certain collections of sets that can be shown not to be sets themselves. One example is the collection of all nite sets. The reason that it is not a set is that for every set x, x also is a set. The

T S

Foundation. There is no innite chain x0

x1

x2

Full Comprehension. For every class P and every set x, y : y is a set.

x and x

9. Axiom of Choice. For every set of sets x there is a function f : x y for all y x. with f y

8. Axiom of Innity. There exists an x and an injective function f : x such that the direct image of x under f is not equal to x.

VV aaV

1 V  U ~U

x x

y y

z z

V aV

1 V  U ~U

T V U

1 V w V U U U U ~ D

U ~U ~ V V U

4. Extensionality.

y x

z z

y .

V aV

1 V U U

1 V V V U U ~U U ~ 1 S

3. Set Union. x u is denoted by tence.

z z y u z u u x . z or simply by x. The axiom guarantees its exisx

x x

1 QV U

Algebras and Structures

function x x is injective (by extensionality), and so there are as many nite sets as there are sets. If the collection of nite sets were a set, say y, its powerset has strictly more elements than y by a theorem of Cantor. But this is impossible, since y has the size of the universe. Nevertheless, mathematicians do use these collections (for example, the collection of algebras). This is not a problem, if the following is observed. A collection of sets is called a class. A class is a set iff it is contained in a set as an element. (We use iff to abbreviate if and only if.) In set theory, numbers are dened as follows. n 1: k:k n

The set of soconstructed numbers is denoted by . It is the set of natural numbers. In general, an ordinal (number) is a set that is transitively and linearly ordered by . (See below for these concepts.) For two ordinals and , either (for which we also write ) or or . Theorem 1.1 For every set x there exists an ordinal and a bijective funcx. tion f : f is also referred to as a wellordering of x. The nite ordinals are exactly the natural numbers dened above. A cardinal (number) is an ordinal such that for every ordinal there is no onto map f : . It is not hard to see that every set can be wellordered by a cardinal number, and this cardinal is unique. It is denoted by M and called the cardinality of M. The smallest innite cardinal is denoted by 0 . The following is of fundamental importance.

By denition, 0 is actually identical to so that it is not really necessary to distinguish the two. However, we shall do so here for reasons of clarity. (For example, innite cardinals have a different arithmetic than ordinals.) If M is nite, its cardinality is a natural number. If M 0 , M is called countable; it is uncountable otherwise. If M has cardinality , the cardinality of M is denoted by 2 . 20 is the cardinality of the set of all real numbers. 2 0 is strictly greater than 0 (but need not be the smallest uncountable cardinal). We remark here that the set of nite sets of natural numbers is countable.

Theorem 1.2 For two sets x, y exactly one of the following holds: x x y or x y.

v aaa) ) ) S )

(1.1)

0:

T S g 1

012

y,

Fundamental Structures

If M is a set, a partition of M is a set P M such that every member of P is nonempty, P M and for all A B P such that A B, A B . If M and N are sets, M N denotes the set of all pairs x y , where x M and y N. A denition of x y , which goes back to Kuratowski and Wiener, is as follows.

Proof. By extensionality, if x u and y v then x y u v . Now assume u v . Then either x u or x u v , and x y u or x y that x y u v . Assume that x u. If u x y then x x y , whence x x, in violation to foundation. Hence we have x y u v . Since x u, we must have y v. This nishes the rst case. Now assume that x u v . Then xy u cannot hold, for then u u v y , whence u uv u. So, u v . However, this gives x x y , once again a we must have x y contradiction. So, x u and y v, as promised. With these denitions, M N is a set if M and N are sets. A relation from M to N is a subset of M N. We write x R y if x y R. Particularly interesting is the case M N. A relation R M M is called reexive if x R x for all x M; symmetric if from x R y follows that y R x. R is called transitive if from x R y and y R z follows x R z. An equivalence relation on M is a reexive, symmetric and transitive relation on M. A pair M is called an ordered set if M is a set and a transitive, irreexive binary relation on M. is then called a (strict) ordering on M and M is then called ordered by . is linear if for any two elements x y M either x y or x y or y x. A partial ordering is a relation which is reexive, transitive and antisymmetric; the latter means that from x R y and y R x follows x y. x y : y R x for the socalled If R M N is a relation, we write R : converse of R. This is a relation from N to M. If S N P and T M N are relations, put

We have R S M P and R T M N. In case M N we still make further denitions. We put M : x x : x M and call this set the diagonal

R T:

x y : x R y or x T y

0 ) @S ( } sD

0 ) @S ( D ( 0 ) @S

(1.3)

R S:

x y : for some z : x R z S y

1 T ) S

T ) S D 1 T ) 1 S T ) S

0 i)

D T ) D T ) S T ) S 0D ) ( 0 ) D

1 0 ) (

( 0 ) @S

DT ) S

T bT ) S ) S

T ) S T ) S

D T ) S

0 ) (

DT ) S

0 ) (

Lemma 1.3 x y

u v iff x

u and y

T T ) ') S S

0 ) (

0 ) (

(1.2)

xy :

x xy

v.

0 ) (

0 ) (

T ) S 0 ) (

T ) S

Algebras and Structures

on M. Now put Ri

R is the smallest transitive relation which contains R. It is therefore called the transitive closure of R. R is the smallest reexive and transitive relation containing R. M N such that if A partial function from M to N is a relation f x f y and x f z then y z. f is a function if for every x there is a y such f x to say that x f y and f : M N to say that that x f y. We write y f is a function from M to N. If P M then f P : f P N . Further, f: M N abbreviates that f is a surjective function, that is, every y N is of the form y f x for some x M. And we write f : M N to say that f is injective, that is, for all x x M, if f x f x then x x . f is bijective if it is injective as well as surjective. Finally, we write f : x y if f x : x X is the socalled direct image y f x . If X M then f X : of X under f . We warn the reader of the difference between f X and f X . For example, let suc : : x x 1. Then according to the denition 5 and suc 4 1 2 3 4 , since of natural numbers above we have suc 4 0 1 2 3 . Let M be an arbitrary set. There is a bijection between the set 4 of subsets of M and the set of functions from M to 2 0 1 , which is dened as follows. For N M we call N : M 0 1 the characteristic function of N if N x 1 iff x N. Let y N and Y N; then put f 1 y : x: 1Y : 1 y denotes the f x y and f x : f x Y . If f is injective, f unique x such that f x y (if that exists). We shall see to it that this overload in notation does not give rise to confusions. M n , n , denotes the set of ntuples of elements from M.

In addition, M 0 : 1 . Then an ntuple of elements from M is an element of M n . Depending on need we shall write xi : i n or x0 x1 xn 1 for a member of M n . An nary relation on M is a subset of M n , an nary function on M is a function f : M n M. n 0 is admitted. A 0ary relation is a subset of 1, hence it is either the empty set or the set 1 itself. A 0ary function on M is a function c : 1 M. We also call it a constant. The value of this constant is

) aaa)

(1.5)

M1 :

Mn

Mn

T ) ) ) S

V U V U

V U

U xt

Vk U

T ) S

0 i

V U

R :

R :

T ) S

(1.4)

R0 :

Rn

V U

1 k ) 1

V U S

1 V U

V U

V T w IIS

V U S D T 1D V U } D

V U

R Rn Ri

D }

T ) ) ) S

V U

V U

Fundamental Structures

the element c . Let R be an nary relation and x M n . Then we write R x in place of x R. Now let F be a set and : F . The pair F , also denoted by alone, is called a signature and F the set of function symbols. Denition 1.4 Let : F be a signature and A a nonempty set. Further, let be a mapping which assigns to every f F an f ary function on A. Then we call the pair : A an algebra. algebras are in general denoted by upper case German letters. In order not to get drowned in notation we write f for the function f . In place of denoting by the pair A we shall denote it somewhat ambiguously by A f : f F . We warn the reader that the latter notation may give rise to confusion since functions of the same arity can be associated with different function symbols. However, this problem shall not arise. The set of terms is the smallest set Tm such that if f F and ti Tm , i f , also f t0 t f 1 Tm . Terms are abstract entities; they are not to be equated with functions nor with the strings by which we denote 0, then f is a term them. To begin we dene the level of a term. If f of level 0, which we also denote by f . If t i , i f , are terms of level ni , then f t0 t f 1 is a term of level 1 max ni : i f . Many proofs run by induction on the level of terms, we therefore speak about induction on the construction of the term. Two terms u and v are equal, in symbols u v, if they have identical level and either they are both of level 0 and there is an f F such u v f or there is an f F, and terms s i , ti , i f , such that u f s0 s f 1 and v f t0 t f 1 as well as si ti for all i f . An important example of an algebra is the socalled term algebra. We choose an arbitrary set X of symbols, which must be disjoint from F. The signature is extended to F X such that the symbols of X have arity 0. The terms over this new signature are called terms over X. The set of terms over X is denoted by Tm X . Then we have Tm Tm . For many purposes (indeed most of the purposes of this book) the terms Tm are sufcient. For we can always resort to the following trick. For each x X add a 0ary function symbol x to F. This gives a new signature X , also called the constant expansion of by X. Then Tm can be canonically identied X with Tm X . There is an algebra which has as its objects the terms and which interprets

V U i

V U

V w U

V U D

V U

T V U

V U

0 ) ( D V

V U V U

1 i

)iaaa) U 1

0 ) (

V U

0 ) (

1 QV

0 IT

V V U

) iaaa) U

) iaaa)

V w U D

') ( S

1 i

) aaa) U

V U

V U D

V U

Algebras and Structures

the function symbols as follows.

Then Tm X is an algebra, called the term algebra X : generated by X. It has the following property. For any algebra and any map v : X A there is exactly one homomorphism v : Tm X such that v X v. This will be restated in Proposition 1.6. Denition 1.5 Let be an algebra and X A. We say that X generates if A is the smallest subset which contains X and which is closed under all functions f . If X we say that is generated. Let be a class of algebras and . We say that is freely generated by X in if for every and maps v : X B there is exactly one homomorphism v : such that v X v. If X we say that is freely generated in . Proposition 1.6 Let be a signature, and let X be disjoint from F. Then the term algebra over X, X , is freely generated by X in the class of all algebras. The following is left as an exercise. It is the justication for writing for the (up to isomorphism unique) freely generated algebra of . In varieties such an algebra always exists. Proposition 1.7 Let be a class of algebras and a cardinal number. If and are both freely generated in they are isomorphic. Maps of the form : X Tm X , as well as their homomorphic extensions are called substitutions. If t is a term over X, we also write t in place of t . Another notation, frequently employed in this book, is as follows. Given terms si , i n, we write si xi : i n t in place of t , where is dened as follows.

(Most authors write t si xi : i n , but this notation will cause confusion with other notation that we use.) Terms induce term functions on a given algebra . Let t be a term with variables xi , i n. (None of these variables has to occur in the term.) Then

V U

(1.7)

y :

si y

if y xi , else.

V U

dV U

V U

V U

) aaa) U

V U

0 "aV U D

0 V U ) V U

V U D

(1.6)

f : ti : i

f t0

V U

V U

Fundamental Structures

We denote by Clon the set of nary term functions on . This set is also called the clone of nary term functions of . A polynomial of is a term function over an algebra that is like but additionally has a constant for each element of A. (So, we form the constant expansion of the signature with every a A. Moreover, a (more exactly, a ) shall have value a in A.) . For The clone of nary term functions of this algebra is denoted by Pol n example, x0 x1 x0 is a term and denotes a binary term function in an algebra for the signature containing only and . However, 2 x 0 x0 is a polynomial but not a term. Suppose that we add a constant 1 to the signature, with denotation 1 in the natural numbers. Then 2 x 0 x0 is still not a term of the expanded language (it lacks the symbol 2), but the associated function actually is a term function, since it is identical with the function induced by x 0 x0 . the term 1 1

We write h : if h is a homomorphism from to . Further, we write h: if h is a surjective homomorphism and h : if h is an injective homomorphism. h is an isomorphism if h is injective as well as surjective. is called isomorphic to , in symbols if there is an isomorphism from to . If we call h an endomorphism of ; if h is additionally bijective then h is called an automorphism of .

Denition 1.9 Let be an algebra and a binary relation on A. is called a congruence relation on if is an equivalence relation and for all f F and all x y A f we have:

V U i

V U i

V U

(1.10)

If xi yi for all i

f then f

x f

y.

If h : A B is an isomorphism from phism from to .

to

then h

1:

V aV

U aaaV ))

U V )

U U

V i aV U U

(1.9)

h f

h x0 h x1

h x

A is an isomor-

1 0 IT

') ( S

0 IT

') ( S

1 i

Denition 1.8 Let algebras and h : A every f tuple x

A f : f F and B f : f F B. h is called a homomorphism if for every f A f we have

be F and

V aV

V U

U g U

V aV

U g U

V U

V aV

V i aV 6U

)) i iaaaV 6U U

V U

) aaa) U U

f t0

a :

V 6U aV i V V 6U i

(1.8)

xi a :

ai f t0 a t a

0 aV U

U g V

1 ) i i

g aU U

U aU

V U

t : An

A is dened inductively as follows (with a

ai : i

f ).

Algebras and Structures

We call x the equivalence class of x. Then for all x and y we have either x y or x y . Further, we always have x x . If additionally is a congruence relation then the following holds: if y i xi for all i f then f y f x . Therefore the following denition is independent of representatives.

(1.13) (1.14)

A :

x:x

The product of the algebras i , i I, is dened as follows. The carrier set is the set of functions : I Ai for all i I. Call this i I Ai such that i set P. For an nary function symbol f put

0V aaV U

U aaaaV U ))V

U aV U )V

U (

D V V U

) iaaa)

(1.16)

0 i

1 i

0 IT

Denition 1.10 Let A f : f F be an algebra and B under all f F. Put f : f B f . The pair B f : f F a subalgebra of .

A closed is called

1 i

V U

') ( S

ker h is a congruence relation on . Furthermore, to if h is surjective. A set B A is closed under f we have f x B.

1 QV U

T V U

V U

0 IT

1 y0 ) @S (

D ') ( S

V U

(1.15)

ker h :

xy

A2 : h x

hy

ker h is isomorphic F if for all x B f

We call the factorization of by . The map h : x proved to be a homomorphism. be a homomorphism. Then put Conversely, let h :

0 IT

: f

V y iU

V Q iU

Then because of (1.10) we immediately have f means f y f x . Put

@S') ( D T 1 @S D i V U V U 1 i

y f

x . This simply

x is easily

V U

) aaa) 1

Namely, let y0

x0

. Then yi xi for all i

i V U

) iaaa) ) aU

(1.12)

x0 x1

x f .

i V U xV U 1 i

S D

1 iU V Q

V U

1 D

(1.11)

x:

y : xy

V U

We also write x y in place of xi yi for all i relation put

f . If is an equivalence

V U

10

Fundamental Structures

The resulting algebra is denoted by i I i . One also denes the product in the following way. The carrier set is A B and for an nary function symbol f we put

The algebra is isomorphic to the algebra i 2 i , where 0 : , . However, the two algebras are not identical. (Can you verify this?) : 1 A particularly important concept is that of a variety or equationally denable class of algebras. Denition 1.11 Let be a signature. A class of algebras is called a variety if it is closed under isomorphic copies, subalgebras, homomorphic images, and (possibly innite) products. Let V : xi : i be the set of variables. An equation is a pair s t of terms (involving variables from V ). We introduce a formal symbol and write s t for this pair. An algebra satises the equation s t iff for all maps v : V A, v s v t . We then write s t. A class of algebras satises this equation if every algebra of satises it. We write s t.

The verication of this is routine. It follows from the rst three facts that equality is an equivalence relation on the algebra V , and together with the fourth that the set of equations valid in form a congruence on V . There is a bit more we can say. Call a congruence on fully invariant if for : if x y then h x h y . The next theorem folall endomorphisms h : lows immediately once we observe that the endomorphisms of V are

V U

V U

V U

V U

V U

V U

V U

If s t .

t and : V

Tm V is a substitution, then

V iU

V U i

V U

V U

If

si

ti for all i

f then

f s

DD D

DD D

If

t;t

u then

DD D

DD D

DD D

If

t then

s. s. u. f t .

Proposition 1.12 The following holds for all classes

of algebras.

DD D

0 aV

D0 ) (

) iaaa)

DD D

U V )

)iaaa) U ( D V a0 )

()) aaa0

V U

V U

( aU b

(1.17)

a0 b0

an

bn

a0

D DD

DDD

an

b0

bn

Algebras and Structures

11

exactly the substitution maps. To this end, let h : V V . Then h is uniquely determined by h V , since V is freely generated by V . It is easily computed that h is the substitution dened by h V . Moreover, every map v : V V induces a homomorphism v : V V , which is unique. Now write Eq : st : s t .

Let E be a set of equations. Then put

This is a class of algebras. Classes of algebras that have the form E for some E are called equationally denable.

We state without proof the following result. Theorem 1.15 (Birkhoff) Every variety is an equationally denable class. Furthermore, there is a biunique correspondence between varieties and fully invariant congruences on the algebra 0 . The idea for the proof is as follows. It can be shown that every variety has free exists. Moreover, a variety is algebras. For every cardinal number , uniquely characterized by 0 . In fact, every algebra is a subalgebra of a direct image of some product of 0 . Thus, we need to investigate the equations that hold in the latter algebra. The other algebras will satisfy these equations, too. The free algebra is the image of V under the map xi i. The induced congruence is fully invariant, by the freeness of 0 . Hence, this congruence simply is the set of equations valid in the free algebra, hence in the whole variety. Finally, if E is a set of equations, we write E t u if t u for all E . Theorem 1.16 (Birkhoff) E t u iff t of the rules given in Proposition 1.12. u can be derived from E by means

The notion of an algebra can be extended into two directions, both of which shall be relevant for us. The rst is the concept of a manysorted algebra.

V U

V 2 U

Proposition 1.14 Let E be a set of equations. Then

V U

D D V 2 U 1

1 d0 ) (

U y

D DD

jS

V 2 U

(1.18)

E :

: for all s t

E:

E is a variety.

V lU

Corollary 1.13 Let be a class of algebras. Then Eq variant congruence on V .

is a fully in-

V U

V U

dV U

dV U

T d D

DD D

0 ) @S (

V U

V lU

V U

V U

DD D

V 2 U

12

Fundamental Structures

Denition 1.17 A sorted signature is a triple F , where F and are sets, the set of function symbols and of sorts, respectively, and : F a function assigning to each element of F its socalled signature. We shall denote the signature by the letter , as in the unsorted case. So, the signature of a function is a (nonempty) sequence of sorts. The last member of that sequence tells us what sort the result has, while the others tell us what sort the individual arguments of that function symbol have.

If B : is another algebra, a (sorted) homomorphism from to is a set h : A B : of functions such that for each f F with signature i : i n 1 :

A manysorted algebra is an algebra of some signature .

Evidently, if for some , then the notions coincide (modulo trivial adaptations) with those of unsorted algebras. Terms are dened as before, but now they are sorted. First, for each sort we assume a countably innite set whenever . Now, every term is V of variables. Moreover, V V given a unique sort in the following way.

The set of terms over V is denoted by Tm V . This can be turned into a sorted algebra; simply let Tm V be the set of terms of sort . Again, given a map v that assigns to a variable of sort an element of A , there is a unique homomorphism v from the algebra of terms into . If t has sort , A . A sorted equation is a pair s t , where s and t are of equal then v t sort. We denote this pair by s t. We write s t if for all maps v into ,vs v t . The Birkhoff Theorems have direct analogues for the many sorted algebras, and can be proved in the same way.

0 ) (

V U

V U

f t0 tn 1 has sort n , if f for all i n.

V U

) iaaa) U

V U

If x

V , then x has sort .

i : i

1 and ti has sort i

V aV

)) aaaV

V aV

) iaaa)

U U

(1.20)

h n f

a0

an

h 0 a 0

T 1

e aae

( S 0 b1 ) T

V U

(1.19)

f : A 0

A 1

A n

A n

h n

an

( V U 0 D b1 ) T

S !(

Denition 1.18 A (sorted) algebra is a pair that for every A is a set and for every f i n 1

A : such F such that f i :

0 ) ( ) D

!( S D

1 QV U

V U

Algebras and Structures

13

Sorted algebras are one way of introducing partiality. To be able to compare the two approaches, we rst have to introduce partial algebras. We shall now return to the unsorted notions, although it is possible even though not really desirable to introduce partial manysorted algebras as well. Denition 1.19 Let be an unsorted signature. A partial algebra is a pair A , where A is a set and for each f F: f is a partial function from A f to A. The denitions of canonical terms split into different notions in the partial case. Denition 1.20 Let and be partial algebras, and h : A B a map. h is a weak homomorphism from to if for every a A f we have f h a if both sides are dened. h is a homomorphism if it is h f a a weak homomorphism and for every a A f if h f a is dened then so is f h a . Finally, h is a strong homomorphism if it is a homomorphism and h f a is dened iff f h a is. is a strong subalgebra of if A B and the identity map is a strong homomorphism. Denition 1.21 An equivalence relation on A is called a weak congruence of if for every f F and every a c A f if a c and f a , f c are both dened, then f a f c . is strong if in addition f a is dened iff f c is. It can be shown that the equivalence relation induced by a weak (strong) homomorphism is a weak (strong) congruence, and that every weak (strong) congruence denes a surjective weak (strong) homomorphism. A be a function, t f s0 s f 1 a term. Then v t is Let v : V

dened iff (a) v si is dened for every i f and (b) f is dened on v si : i n . Now, we write v w s t if v s v t in case both are s s dened and equal; v t if v s is dened iff v t is and if one is dened the two are equal. An equation s t is said to hold in in the weak (strong) sense, if v w s t ( v s s t) for all v : V A. Proposition 1.12 holds with respect to s but not with respect to w . Also, algebras satisfying an equation in the strong sense are closed under products, strong homomorphic images and under strong subalgebras. The relation between classes of algebras and sets of equations is called a Galois correspondence. It is useful to know a few facts about such correspondences. Let A, B be sets and R A B (A and B may in fact also be

V U

V y iU

V 6U i V 6y iU

1 i

V U V U

V iU aV 6 U

V U V i

V U D V U DD D ) iaaa)

D DD

DV U

) 0 (

1 i

1 !) i i

) Q0 D DD(

V i U aV 6U

V U i

D DD

) 0 (

) 0 (

V 6U i

V i aV 6U U

V U

V iU aV ' U V i aV 6U U

0 ) ( D

V i aV 6U U V y iU }

V U (

14

Fundamental Structures

classes). The triple A B R is called a context. Now dene the following operators:

Theorem 1.22 Let A B R be a context. Then the following holds for all OO A and all P P B.

Proof. Notice that if A B R is a context, B A R also is a context, and so we only need to show , and . . O P iff every x O stands in relation R to every member of P iff P O . . If O O and y O , then for every x O : x R y. This means that for every x O: x R y, which is the same as y O . . Notice that O O by implies O O . Denition 1.23 Let M be a set and H : M M a function. H is called a closure operator on M if for all X Y M the following holds.

Proposition 1.24 Let A B R be a context. Then O O and P P are closure operators on A and B, respectively. The closed sets are the sets of the form P for the rst, and O for the second operator.

V U

A set X is called closed if X

0 ) ) (

V aV U

V U

H X

H H X . H X .

V U

} "V U

If X

Y then H X

V U

H X . H Y .

1 } } 0 Q ) ) (

V

0 ) ) (

P .

O .

If P

P then P

P .

If O

O then O

O .

P iff O

P.

One calls O the intent of O

A and P the extent of P

B.

V U

} Y ) 0 ) ) (

(1.22)

:B

A :P

A : for all y

P:xRy

1 S S

V U

V U V U

(1.21)

:A

0 ) ) (

B :O

B : for all x

O:xRy

} )

Theorem 1.26 Let A B R be a context. The concepts are exactly the pairs of the form P P , P B, or, alternatively, the pairs of the form O O , O A. As a particular application we look again at the connection between classes of algebras and sets of equations over terms. (It sufces to take the set of algebras of size for a suitable to make this work.) Let denote is the class of algebras, the set of equations. The triple and the map nothing but . The a context, and the map is nothing but classes E are the equationally denable classes, the equations E such that E and E . valid in . Concepts are pairs Often we shall deal with structures in which there are also relations in addition to functions. The denitions, insofar as they still make sense, are carried over analogously. However, the notation becomes more clumsy.

as well as Denition 1.27 Let F and G be disjoint sets and : F : G functions. A pair A is called an structure if for all f F f is an f ary function on A and for each g G g is a g ary relation on A. is called the functional signature, the relational signature of .
Whenever we can afford it we shall drop the qualication and simply talk of structures. If A is an structure, then A F is an algebra. An algebra can be thought of in a natural way as a structure, where is the empty relational signature. We use a convention similar to that of algebras. Furthermore, we denote relations by upper case Roman letters such as R, S and so on. Let A f : f F R :R G and B f : f F R : R G be structures of the same signature. A map h : A B is called an isomorphism from to , if h is bijective and for all f F and all x A f we have

V i aV U U

V i aV U U

(1.23)

h f

hx

0 IT

VlUF V 2 U D D V U l$ 2 $i) 2( 2

0 w !) ( 0

0 "!)

V U

V " U

) ( 0 ) (

e V U

'bT S)

0 ) (

1 Y0 ) (

Denition 1.25 Let A B R be a context. A pair O P called a concept if O P and P O .

B is

') ( S

Proof. We have O O , from which O O O , so that we get O O . Likewise, P now follow easily.

. On the other hand, O P is shown. The claims

Algebras and Structures

15

0 ) (

% 0 IT $ 0 ) ( 1 D D

0 l( )

0 ) (

'bT S)

0 ) ) D (

) ( } 0 0 ) ) (

V U

D 1 i

') ( S

V " U

V % U

V U

16

Fundamental Structures

Exercise 1. Since y y is an embedding of x into x , we have x x . Show that x x for every set. Hint. Let f : x x be any function. Look at the set y : y f y x. Show that it is not in im f . N and g : N P. Show that if g f is surjective, g Exercise 2. Let f : M is surjective, and that if g f is injective, f is injective. Give in each case an example that the converse fails. Exercise 3. In set theory, one writes N M for the set of functions from N to M. n and M m, then N M mn . Deduce that N M Mn . Show that if N Can you nd a bijection between these sets?

(1.25a) (1.25b)

R R

R S

S S

R S

R S

Exercise 5. Let and be algebras for some signature . Show that if h: is a surjective homomorphism then is isomorphic to with x y iff h x hy. Exercise 6. Show that every algebra is the homomorphic image of a term algebra. Hint. Take X to be the set underlying . .
0

Exercise 8. Prove Proposition 1.7. 2. Semigroups and Strings

In formal language theory, languages are sets of strings over some alphabet. We assume throughout that an alphabet is a nite, nonempty set, usually called A. It has no further structure (but see Section 1.3), it only denes the material of primitive letters. We do not make any further assumptions on the

V e

U e I

e "V

e U e c

Exercise 7. Show that . Show also that 1

is isomorphic to i 0 1 is isomorphic to

i,

where

Show by giving an example that analogous laws for

do not hold.

} k )

V k X V X U V k s X U s U D V X k V X U U s X V k s U D

} k )

Exercise 4. Show that for relations R R

N, S S

P we have

V U

V U

V U

V aV
,

U aaaV )) D

} T V U

U V )

U U D

1 S jV U T S

V U

V U i D D

(1.24)

R x

h x0 h x1


h x R

1 i

as well as for all R

G and all x

A R

V U

jV U

Semigroups and Strings

17

size of A. The Latin alphabet consists of 26 letters, which actually exist in two variants (upper and lower case), and we also use a few punctuation marks and symbols as well as the blank. On the other hand, the Chinese alphabet consists of several thousand letters! Strings are very fundamental structures. Without a proper understanding of their workings one could not read this book, for example. A string over A is nothing but the result of successively placing elements of A after each other. It , is not necessary to always use a fresh letter. If, for example, A then , , are strings over A. We agree to use typewriter font to mark actual symbols (= pieces of ink), while letters in different font are only proxy for letters (technically, they are variables for letters). Strings are denoted by a vector arrow, for example w, x, y and so on, to distinguish them from individual letters. Since paper is of bounded length, strings are not really written down in a continuous line, but rather in several lines, and on several pieces of paper, depending on need. The way a string is cut up into lines and pages is actually immaterial for its abstract constitution (unless we speak of paragraphs and similar textual divisions). We wish to abstract from these details. Therefore we dene strings formally as follows. A for Denition 1.28 Let A be a set. A string over A is a function x : n some natural number n. n is called the length of x and is denoted by x . x i , i n, is called the ith segment or the ith letter of x. The unique string of length 0 is denoted by . If x : m A and y : n A are strings over A then x y denotes the unique string of length m n for which the following holds: (1.26) x y j :

We often write x y in place of x y. In connection with this denition the set A is called the alphabet, an element of A is also referred to as a letter. Unless stated otherwise, A is nite and nonempty. So, a string may also be written using simple concatenation. Hence we have . Note that there no blank is inserted between the two strings; for the blank is a letter. We denote it by . Two words of a language are usually separated by a blank possibly using additional punctuation marks. That the blank is a symbol is felt more clearly when we use a typewriter. If we want to have a blank, we need to press down a key in order to get it. For purely formal reasons we have added the empty string to the set of strings.

i dW i

v U i V V i W U U i V U i D

x j y j

if j m, else.

V U i ni

T ) p) $'l) S H D i

i i i i g

p F p p W p H H H D H H H

p p @ H H H H H i i

i Wi

18

Fundamental Structures

It is not visible (unlike the blank). Hence, we need a special symbol for it, which is , in some other books also . We have

We say, the empty string is the unit with respect to concatenation. For any triple of strings x, y and z we have

We therefore say that concatenation, , is associative. More on that below. We dene the notation x i by induction on i. x :

Note that the letter is technically distinct from the string x : 1 A : 0 . They are nevertheless written in the same way, namely . If x is a string over A and A B, then x is a string over B. The set of all strings over A is denoted by A . Let be a linear order on A. We dene the socalled lexicographical ordering (with respect to ) as follows. Put x L y if there exist u, v and w as well as a and b such that x u a v, y u b w and a b. Notice that x L y can obtain even if x is longer than y. Another important ordering is the following one. Let a : k if a is the kth symbol of A in the ordering . Further, put n : A . For x x0 x1 x p 1 we associate the following number. (1.31) Z x :
p 1 i 0

xi

1 n

p i 1

Now put x N y if and only if Z x Z y . This ordering we call the numerical ordering. Notice that both orderings depend on the choice of . We shall illustrate these orderings with A : and . Then the numerical ordering is as follows. 1 2 4 5 7 8 13 14 16

aa

V U i

x Z x

i i

H H

H @H

V U V U i i i i U g V U i V g V @V U U D aa i D V U D Di i i W W i i i W W i i D i D

H H @H

T l) S H D

i 0

i n 1

i n

i WV i

H H

(1.30)

xi :

Furthermore, we dene i

i W i

(1.29)

i 1

x0 :

xi x
n xi

as follows.

xi

i W V i W U i

V i W U W i i

(1.28)

y z

x y

i W D

i i

Wi

(1.27)

xn

Semigroups and Strings

19

This ordering is linear. The map sending i to the ith element in this sequence is known as the dyadic representation of the numbers. In the dyadic representation, 0 is represented by the empty string, 1 by , 2 by , 3 by and so on. (Actually, if one wants to avoid using the empty string here, one may start with instead.) The lexicographical ordering is somewhat more complex. We illustrate it for words with at most four letters.

In the lexicographical as well as the numerical ordering is the smallest element. Now look at the ordered tree based on the set A . It is a tree in which every node is nary branching (cf. Section 1.4). Then the lexicographical ordering corresponds to the linearization obtained by depthrst search in this tree, while the numerical ordering corresponds to the linearization obtained by breadthrst search (see Section 2.2). A monoid is a triple M 1 where is a binary operation on M and 1 an element such that for all x y z M the following holds. (1.32b) 1 x

(1.32a)

x 1

x x

1 ) ) 0 X !) )

@ 

 H

, ,

@ H H

@ H @H

 HH H@ @H H  H

, , , ,

H H @H H H H H

H@H H H H H

, ,

, ,

, , , ,

H H

@ H @ H H H@H H H H @H

H H H H H @H

H H @H

HH H H

Figure 1. The Tree A

aa 

aa aa H

aa H H H

20

Fundamental Structures

A monoid is therefore an algebra with signature : 1 0 2, which in addition satises the above equations. An example is the algebra 4 0 max (recall that 4 0 1 2 3 ), or 0 .

The function which assigns to each string its length is a homomorphism from A onto the monoid 0 . It is surjective, since A is always assumed to be nonempty. A are special monoids:

This map is surely well dened. For the dening clauses are mutually exclusive. Now we must show that this map is a homomorphism. To this end, let x and y be words. We shall show that

This will be established by induction on the length of y. If it is 0, the claim is evidently true. For we have y , and hence v x y vx vx 1 v x v y . Now let y 0. Then y w a for some a A. vx y (1.35) vx w a vx w vx vx vx

va

vw vw

va

va

vy

This shows the claim. The set A is the only set that generates A freely. For a letter cannot be produced from anything longer than a letter. The empty string is always dispensable, since it occurs anyway in the signature. Hence any generating set

X i V U

V U i

D i

1 V i W U i

V Y U

W i

V U @V U i X i D V aV U V ijU Y@V U X U X i D V U aV ijU @V U U X V X i D V U @V i W U X i D V W i W U i V i W U i D

V U @V U i X i

I ni

V i W U i

(1.34)

vx y

vx

vy

V U @V U X i

vx a :

(1.33)

v :

1 vx va

N 1 be a monoid and v : A Proof. Let we dene a map v as follows.

V Y U

Proposition 1.30 The monoid

A is freely generated by A. N an arbitrary map. Then

V Y U

0 a) ) (

0 g !) ) (

V Y U

Proposition 1.29 Let

A :

. Then

A is a monoid.

) ) (

8a)

0 g !) ) (

X @V X U D

T ) ) ) S

0 X !) ) (

V X X U D V Y U V U V W U i D D

(1.32c)

y z

x y

V U V U i X i

V Y U

Semigroups and Strings

21

must contain A, and since A generates A it is the only minimal set that does so. A nonminimal generating set can never freely generate a monoid. For example, let X . X generates A , but it is not minimal. Hence it does not generate A freely. For example, let v : . Then there is no homomorphism that extends v to A . For then on the one hand v , on the other v v v v . The fact that A generates A freely has various noteworthy consequences. First, a homomorphism from A into an arbitrary monoid need only be xed on A in order to be dened. Moreover, any such map can be extended to a homomorphism into the target monoid. As a particular application we get that every map v : A B can be extended to a homomorphism from A to B . Furthermore, we get the following result, which shows that the monoids A are up to isomorphism the only freely generated monoids (allowing innite alphabets). They reader may note that the proof works for algebras of any signature.

be freely generated by X, freely generated by Y . Then either Proof. Let X Y or Y X . Without loss of generality we assume the rst. Then there is an injective map p : X Y and a surjective map q : Y X such that q p 1X . Since X generates freely, there is a homomorphism p : with p X p. Likewise, there is a homomorphism q : such that q Y q, since is freely generated by Y . The restriction of q p to X is the identity. (For if x X then q p x q px q px x.) Since X freely generates , there is only one homomorphism which extends 1 X on and this is the identity. Hence q p 1M . It immediately follows that q is surjective and p injective. Hence obtains. If Y X holds, is shown in the same way.

i YW i

If x u

y u, then x

V Y U

Theorem 1.32 In

A the following cancellation laws hold. y.

V aV U U

There exists an injective homomorphism i : homomorphism h : such that h i 1N .

There is an injective homomorphism i : momorphism h : such that h i

0 @!) ( ) X

V aV U U

Theorem 1.31 Let M 1 and noids. Then either or obtains.

1 be freely generated moand a surjective hoand a surjective

1M .

) ll) H

 V U W U V D H

V Y U

V W U

V U

V Y U V Y U V U D H X X 0 @!) ) X

V Y U T @ll) S ) H H D ( D 1

V Y U

V @U H D H D V Y U D i YW i

V Y U X

22

Fundamental Structures

xT is dened as follows. (1.36) :

T i n i n

xT is called the mirror string of x. It is easy to see that x T T x. The reader is asked to convince himself that the map x x T is not a homomorphism if A 1. Denition 1.33 Let x y A . Then x is a prex of y if y x u for some u A . x is called a postx or sufx of y if y u x for some u A . x is called a substring of y if y u x v for some u v A . It is easy to see that x is a prex of y exactly if x T is a postx of yT . Notice that a given string can have several occurrences in another string. For example, occurs four times is . The occurrences are in addition not always disjoint. An occurrence of x in y can be dened in several ways. We may for example assign positions to each letters. In a string x 0 x1 xn 1 the numbers n 1 are called positions. The positions are actually thought of as the spaces between the letters. The ith letter, x i , occurs between the position i and the position i 1. The substring i j k xi occurs between the positions i and k. The reason for doing it this way is that it allows us to dene occurrences of the empty string as well. For each i, there is an occurrence of between position i and position i. We may interpret positions as time points in between which certain events take place, here the utterance of a given sound. Another denition of an occurrence is via the context in which the substring occurs. Denition 1.34 A context is a pair C y z of strings. The substitution of x into C, in symbols C x , is dened to be the string y x z. We say that x occurs in v in the context C if v C x . Every occurrence of x in a string v is uniquely dened by its context. We call C a substring occurrence of x in v. Actually, given x and v, only one half of the context denes the other. However, as will become clear, contexts dened in this way allow for rather concise statements of facts in many cases. Now consider two substring occurrences C, D in a given word z. Then there are various ways in which the substrings may be related with respect to each other.

1 i i YW i

i i W dW i

V U i

aa

i W i

1 ) i i

0 l) ( i i

V U i

i i QW yW i

H H H H @@H

V U i

1 ) i i

xi

xn

i Wi i

If u x

u y, then x

y.

i Wi i

1 i

H H

i i i

1 i

Semigroups and Strings

23

Denition 1.35 Let C u1 u2 and D v1 v2 be occurrences in z of the strings x and y, respectively. We say that C precedes D if u 1 x is a prex of v1 . C and D overlap if C does not precede D and D does not precede C. C is contained in D if v1 is a prex of u1 and v2 is a sufx of u2 . Notice that if x is a substring of y then every occurrence of y contains an occurrence of x; but not every occurrence of x is contained in a given occurrence of y. Denition 1.36 A (string) language over the alphabet A is a subset of A . This denition admits that L and that L A . Moreover, L also may occur. The admission of is often done for technical reasons (like the introduction of a zero).

0 . Proof. This is a standard counting argument. We establish that A The claim then follows since there are as many languages as there are subsets of 0 , namely 20 . If A is nite, we can enumerate A by enumerating the strings of length 0, the strings of length 1, the strings of length 2, and so on. If A is innite, we have to use cardinal arithmetic: the set of strings of length k of any nite k is countable, and A is therefore the countable union of countable sets, again countable. One can prove the previous result directly using the following argument. (The argument works even when C is countably innite.) Theorem 1.38 Let C ci : i p , p 2, be an arbitrary alphabet and i A . Further, let v be the homomorphic extension of v : c i . The map S v S : C A dened by V S v S is a bijection between C and those languages which are contained in the direct image of v. The proof is an exercise. The set of all languages over A is closed under , , and , the relative complement with respect to A . Furthermore, we can

Theorem 1.37 Suppose A is not empty, and A 20 languages.

0 . Then there are exactly

i W i i

0 !) ( i i D V U

0 a) 6( i i V U D

V U

V U T Fl) S H D

i v

24

Fundamental Structures

dene the following operations on languages.

(1.37b) (1.37c) (1.37d) (1.37e)

L :
1

L : L : L M: M L:

0 n

is called the Kleene star. For example, L A is the set of all strings which can be extended to members of L; this is exactly the set of prexes of members of L. We call this set the prex closure of L, in symbols L P . Analogously, LS : A L is the sufx or postx closure of L. It follows that L P S is nothing but the substring closure of L. In what is to follow, we shall often encounter string languages with a special distinguished symbol, the blank, typically written . Then we use the abbreviation

Let L be a language over A, C x y a context and u a string. We say that C accepts u in L, and write u L C, if C u L. The triple A A A L is a context in the sense of the previous section. Let M A and P A A. Then denote by CL M the set of all C which accept all strings from M in L (intent); and denote by ZL P the set of all strings which are accepted by all contexts from P in L (extent). We call M (L)closed if M Z L CL M . The closed sets form the socalled distribution classes of strings in a language. ZL CL M is called the Sestierclosure of M and the map S L : M ZL CL M the Sestieroperator. From Proposition 1.24 we immediately get this result. Proposition 1.39 The Sestieroperator is a closure operator. For various reasons, identifying terms with strings that represent them is a dangerous affair. As is wellknown, conventions for writing down terms

V aV

e )

e ) (

1 !) i

1 i

i i y S

1 "V i6U 0 !) ( i i

V U

i W ` Wi

V aV

i y i

(1.38)

x y:

L M:

x y:x

Ly

(1.37g)

A :

M x y

TV T V

1 i W V iU 1 i W V iU

1 U i 1 U i

(1.37f)

i 1 IS D 1 IS i D

Ln

Ln L Ln Ln A : x M y x L
n

1 !) i

1 i

D D T S D i W IS i

(1.37a)

L M:

x y:x

Ly

V aV U

Semigroups and Strings

25

can be misleading, since they might be ambiguous. Therefore we dened the term as an entity in itself. The string by which we denote the term is only as a representative of that term. Denition 1.40 Let be a signature. A representation of terms (by means of strings over A) is a relation R Tm A such that for each term t there exists a string x with t x R. x is called a representative or representing string of t with respect to R. x is called unambiguous if from t x u x R it follows that t u. R is called unique or uniquely readable if every x A is unambiguous. R is uniquely readable iff it is an injective function from Tm to A (and therefore its converse a partial injective function). We leave it to the reader to verify that the representation dened in the previous section is actually uniquely readable. This is not self evident. It could be that a term possesses several representing strings. Our usual way of denoting terms is in fact not even though this could uniquely readable. For example, one writes be a representative of the term 2 3 4 or of the term 2 3 4 . This hardly matters, since the two terms denote the same number, but nevertheless they are different terms. There are many more conventions for writing down terms. We give a few examples. (a) A binary symbol is typically written in between its arguments but . (b) (this is called the inx notation). So, we do not write Outermost brackets may be omitted: denotes the same term as . (c) The multiplication sign binds stronger than . So, the following strings all denote the same term.
"   #!#

In logic, it was customary to use dots in place of brackets. In this notation, means the same as the more common . The dots are placed to the left or right (sometimes both) of the operation sign. Ambiguity is resolved by using more than one dot, for example . (See (Curry, 1977) on this notation.) Also, let be a binary operation symbol, written in inx notation. Suppose that denes a string for every term in the following way.

(1.40)

3 X yV U @V Ub3 V aV ) $3 U XU D X@V b3 U V aV ) $b3 U XU V 3 U

x : : :

xy xt

x x

( % 16

"    $

"   $Q#

"    @#!d

(1.39)

x basic y basic t complex

1 i 1 i () i C0 !) 0 6) (


v 8

t v 8s

V aV ) !) U g U g

t  s Qv

x g g y

t v 8s

V V ) U U g ) g

1 i y0 6) ( D D

& (& % 0)'

26

Fundamental Structures

(1.41)

represents a different term than (and both Since the string have a different value) the brackets cannot be omitted. That we can do without brackets is an insight we owe to the Polish logician Jan ukasiewicz. In his notation, which is also called Polish Notation (PN), the function symbol is always placed in front of its arguments. Alternatively, the function symbol may be consistently placed behind its arguments (this is the socalled Reverse Polish Notation, RPN). There are some calculators (in addition to the programming language FORTH) which have implemented RPN. In place of . It is needed to separate the (optional) brackets there is a key called two successive operands. For in RPN, the two arguments of a function follow each other immediately. If nothing is put in between them, both the terms 13 5 and 1 35 would both be written . To prevent this, is used to separate the rst from the second input string. You therefore need . (Here, the box is the usual way in to enter into the computer computer handbooks to turn a sequence into a key. In Chapter 3 we shall deal again with the problem of writing down numbers.) Notice that in practice (i.e. as far as the tacit conventions go) the choice between Polish and Reverse Polish Notation only affects the position of the function symbol, and not the way in which arguments are placed with respect to each other. For example, suppose there is a key for the exponential function. Then to 3 , you enter get the result of 2 on a machine using RPN and on a machine using PN. Hence, the relative order between base ( ) and exponent ( ) remains. (Notice incidentally the need for typing in or something else that indicates the end of the second operand in PN!) This effect is also noted in natural languages: the subject precedes the object in the overwhelming majority of languages irrespective of the place of the verb. The mirror image of an VSO language is an SOV language, not OSV. Now we shall show that Polish Notation is uniquely readable. Let F be a set of symbols and a signature over F. Each symbol f F is assigned an arity f . Next, we dene a set of strings over F, which we assign to the various terms of Tm . PN is the smallest set M of strings over F for which

5 h 4 9 8h

" !  

5 h 4 9 b$Rh

v u 81r

8 6 97

v 8u

V U

y : x xy : x y t y : t y
" #    

y basic x basic t complex

V U

If t represents t, we say that is leftassociative. If on the other hand t represents the term t, is said to be rightassociative.

5 h 4 9 8$$Rh

8 6 97

5 h 4 9 8$$Rh

X "yV U V aV ) $b3 U XU D V U X V aV ) $U U X V U
5

D D

) U g

@ 

5 h 4 9 8$$h

V U

V )

V b3 U

8 6 #

U g h

Semigroups and Strings

27

the following holds.

(Notice the special case n 0. Further, notice that no special treatment is needed for variables, by the remarks of the preceding section.) This denes the set PN , members of which are called wellformed strings. Next we shall dene which string represents which term. The string f , f 0, represents the term f . If xi represents ti , i f , then f x0 x f 1 represents f t0 t f 1 . We shall now show that this relation is bijective. (A different proof than the one used here can be found in Section 2.4, proof of Theorem 2.61.) Here we use an important principle, namely induction over the length of the string. The following is for example proved by induction on x. No proper prex of x is a wellformed string. If x is a wellformed string then x has length at least 1 and the following holds.

(b) If x 1, then there are f and y such that x f y, and y is the concatenation of exactly f many uniquely dened well formed strings. The proof is as follows. Let t and u be terms represented by x. Let x 1. Then t u f , for some f F with f 0. A proper prex is the empty string, which is clearly not well formed. Now for the induction step. Let x have length at least 2. Then there is an f F and a sequence y i , i f , of wellformed strings such that

Therefore for each i f there is a term u i represented by yi . By , the ui are uniquely determined by the yi . Furthermore, the symbol f is uniquely determined, too. Now let zi , i f , be wellformed strings with

i W aa W i W

(1.43)

f z0

V U

i W aa W i W

V U

(1.42)

f y0

V U

i W

i D V U

V U

V U

 ni i

(a) If x

1, then x

f for some f

F with f

i W i d!aaW dW V U

0.

V U

V U

D i W i Q!aaW QW 1 i 1 i D V i

For all f

F and for all xi M, i f x0 x f 1 M.

f :

ni

) iaaa) U D D

ni

28

Fundamental Structures

Then y0 z0 . For no proper prex of z0 is a wellformed term, and no proper prex of y0 is a term. But they are prexes of each other, so they cannot be proper prexes of each other, that is to say, they are equal. If f 1, we are done. Otherwise we carry on in the same way, establishing by the same argument that y1 z1 , y2 z2 , and so on. The fragmentation of the string in f many wellformed strings is therefore unique. By inductive hypothesis, the individual strings uniquely represent the terms u i . So, x uniquely represents the term f u . This shows . Finally, we shall establish . Look again at the decomposition (1.42). If u is a wellformed prex, then u . Hence u f v for some v which can be decomposed into f many wellformed strings w i . As before we shall argue that wi xi for every i f . Hence u x, which shows that no proper prex of x is wellformed. Notes on this section. Throughout this book the policy is to regard any linguistic object as a string. Strings are considered the fundamental structures. This in itself is no philosophical commitment, just a matter of convenience. Moreover, when we refer to sentences qua material objects (signiers) we take them to be strings over the Latin alphabet. This again is only a matter of convenience. Formal language theory very often treats words rather than letters as units. If one does so, their composite nature has to be ignored. Yet, while most arguments can still be performed (since a transducer can be used to switch between these representations), some subtleties can get lost in this abstraction. We should also point out that since alphabets must be nite, there can be no innite set of variables as a primitive set of letters, as is often assumed in logic. Exercise 9. Prove Theorem 1.38. Exercise 10. (The Typewriter Model.) Fix an alphabet A. For each a A assume a unary symbol a . Finally, let be a zeroary symbol. This denes the signature . Dene a map t : Tm A as follows. : , and a s : s a. Show that is bijective. Further, show that there is no x y , and not even a term vx y term u over such that u x y such that vx y x y , for any given x A . On the other hand there x y for any given y A . does exist a wy such that wy x Exercise 11. Put Z x : i p xi n p i 1 . Now put x N y if and only if Z x Z y . Show that N is transitive and irreexive, but not total. Exercise 12. Show that the postx relation is a partial ordering, likewise the

V U

V U

V q U

V U i

1 i

i yW

V U

1 i

i W y'V U

W V U

V U

V U

V aV U A U D V U V aV ) U U

i D

V U

Wi

V U i

V aV U

WV U

V U i

V 'U i

V aV U

i V U

Fundamentals of Linguistics

29

prex and the subword relation. Show that the subword relation is the transitive closure of the union of the postx relation with the prex relation. Exercise 13. Let F, X and be three pairwise disjoint sets, a signature over F. We dene the following function from terms into strings over F X :

(To be clear: we represent terms by the string that we have used in Section 1.1 already.) Prove the unique readability of this notation. Notice that this does not already follow from the fact that we have chosen this notation to begin with. (We might just have been mistaken ...) Exercise 14. Give an exact upper bound on the number of prexes (postxes) of a given string of length n, n a natural number. Also give a bound for the number of subwords. What can you say about the exactness of these bounds in individual cases?

(1.45b)

M L:

y:

M x y

3.

Fundamentals of Linguistics

In this section we shall say some words about our conception of language and introduce some linguistic terminology. Since we cannot dene all the linguistic terms we are using, this section is more or less meant to get those readers acquainted with the basic linguistic terminology who wish to read the

u au

Exercise 16. Show that not all equivalences are valid if in place of we choose and . Which implications remain valid, though?

u au

(1.46)

L N

L M

Show the following for all L M N

A : L N M

TV T V

1 i W V iU 1 i W V iU

1 U i ~ 1 U i ~

iIS D i IS

u au a

(1.45a)

L M:

y:

Exercise 15. Let L M

A . Dene x M y x L

t W

W aa W W s W

) iaaa) U

f t0

V D

(1.44)

x : :

T t)s 8iS D

T t)s S 8is u

x f t0 t

and

30

Fundamental Structures

Semantical Stratum Syntactical Stratum Morphological Stratum Phonological Stratum


Figure 2. The Strata of Language

book without going through an introduction into linguistics proper. (However, it is recommended to have such a book at hand.) A central tool in linguistics is that of postulating abstract units and hierarchization. Language is thought to be more than a mere relation between sounds and meanings. In between the two realms we nd a rather rich architecture that hardly exists in formal languages. This architecture is most clearly articulated in (Harris, 1963) and also (Lamb, 1966). Even though linguists might disagree with many details, this basic architecture is assumed even in most current linguistic theories. We shall outline what we think is minimal consensus. Language is organized in four levels or layers, which are also called strata, see Figure 2: the phonological stratum, the morphological stratum, the syntactic stratum and the semantical stratum. Each stratum possesses elementary units and rules of combination. The phonological stratum and the morphological stratum are adjacent, the morphological stratum and the syntactic stratum are adjacent, and the syntactic stratum and the semantic stratum are adjacent. Adjacent strata are interconnected by so called rules of realization. On the phonological stratum we nd the mere representation of the utterance in its phonetic and phonological form. The elementary units are the phones. An utterance is composed from phones (more or less) by concatenation. The terms phone, syllable, accent and tone refer to this stratum. In the morphological stratum we nd the elementary signs of the language (see Section 3.1), which are called morphs. These are dened to be the smallest units that carry meaning, although the denition of

Fundamentals of Linguistics

31

smallest may be difcult to give. They are different from words. The word is a word, but it is the combination of two morphs, the root and the ending of the third person singular present, . The units of the syntactical stratum are called lexes, and they more or less are the same as words. The units of the semantical stratum are the semes. On each stratum we distinguish concrete from abstract units. The concrete forms represent substance, while the abstract ones represent the form only. While the relationship between these two levels is far from easy, we will simplify the matter as follows. The abstract units are seen as sets of concrete ones. The abstraction is done in such a way that the concrete member of each class that appears in a construction is dened by its context, and that substitution of another member results simply in a non wellformed unit (or else in a virtually identical one). This denition is deliberately vague; it is actually hard to make precise. The interested reader is referred to the excellent (Harris, 1963) for a thorough discussion of the structural method. We shall also return to this question in Section 6.3. The abstract counterpart of a phone is a phoneme. A phoneme is simply a set of phones. The sounds of a single language are a subset of the entire space of human sounds, partitioned into phonemes. This is to say that two distinct phonemes of a languages are disjoint. We shall deal with the relationship between phones and phonemes in Section 6.3. We use the following notation. We enclose phonemes in slashes while square brackets are used to name phones. So, if [p] denotes a phone then /p/ is a phoneme containing [p]. (Clearly, there are innitely many sounds that may be called [p], but we pick just one of them.) An index is used to make clear which language the phoneme belongs to. For phonemes are strictly language bound. It makes little sense to compare phonemes across languages. Languages cut up the sound continuum in a different way. For example, let [p] and [p h ] be two , distinct phones, where [p] is a phone corresponding to the letter in [ph ] a phone corresponding to the letter in . Hindi distinguishes these two phones as instantiations of different phonemes: p H ph H . Enh glish does not. So, p E p E . Moreover, the context determines whether what is written is pronounced either as [p] or as [p h ]. Actually, in English there is no context in which both will occur. Finally, French does not even have the sound [ph ]. We give another example. The combination of the letis pronounced in two noticeably distinct ways in German. After [], it ters sounds like [c], for example in [lct], but after [a] it sounds like [x] as [naxt]; the choice between these two variants is conditioned solely in by the preceding vowel. It is therefore assumed that German does not possess
4 D 8 FEC

h h RA A w

4  8

7 

G H

G 5

G H

A h h @A H
P

32

Fundamental Structures

Table 1. German Plural Morphs

singular

plural
G 5p P h 4 V H 5 h54 V H bp 9WIU G 58h4 D h A A RF7 T A g 7 S 4 9 h R R$#H Q

two phonemes but only one, written , which is pronounced in these two ways depending on the context. In the same way one assumes that German has only one plural morpheme even though there is a fair number of individual plural morphs. Table 1 shows some possibilities of forming the plural in German. The plural can be expressed either by no change, or by adding an sufx, an sufx (the reduis a phonological effect and needs no accounting for plication of in in the morphology), an sufx, or by umlaut or a combination of umlaut together with an sufx. (Umlaut is another name for the following change of vowels: becomes , becomes , and becomes . All other vowels remain the same. Umlaut is triggered by certain inectional or derivational sufxes.) All these are clearly different morphs. But they belong to the same morpheme. We therefore call them allomorphs of the plural morpheme. The differentiation into strata allows to abstract away from irregularities. Moving up one stratum, the different members of an abstraction class are not distinguished. The different plural morphs for example, are dened as sequences of phonemes, not of phones. To decide which phone is to be inserted is the job is known to the of the phonological stratum. Likewise, the word syntactical stratum only as a plural nominative noun. That it consists of the root morph together with the morph rather than any other plural morph is not visible in the syntactic stratum. The difference between concrete and abstract carries over in each stratum in the distinction between a surface and a deep substratum. The morphotaxis has at deep level only the root and the plural morpheme. At the surface, the latter gets realized as . The step from deep to surface can be quite complex. For example, the plural of is formed by changing the root vowel and adding

5 h 8$4

G 5

5 bh

G H

5 hH4 8$p #H G

A 7 g $7 4 9 h R H
G H

car car bus light father night

g VH

5 bh

S T I U P G 5

h A A F7 p H
P

G 5

h $4

G 5

G H V

5 8h

Fundamentals of Linguistics

33

the sufx . (Which of the vowels of the root are subject to umlauted must be determined by the phonological stratum. For example, the plural of altar is and not or !) As we have already said, on the socalled deep morphological (sub)stratum we nd only the combination of two morphemes, the morpheme and the plural morpheme. On the syntactical stratum (deep or surface) nothing of that decomposition is visible. We have one lex(eme), . On the phonological stratum we nd a sequence of 5 (!) phonemes, which in writing correspond to , , , and . This is the deep phonological representation. On the surface, we nd the allophone [c] for the phoneme (written as) . In Section 3.1 we shall propose an approach to language by means of signs. This approach distinguishes only 3 dimensions: a sign has a realization, it has a combinatorics and it has a meaning. While the meaning is uniquely identiable to belong to the semantic stratum, for the other two this is not clear. The combinatorics may be seen as belonging to the syntactical stratum. The realization of a sign, nally, could be spelled out either as a sequence of phonemes, as a sequence of morphemes or as a sequence of lexemes. Each of these choices is legitimate and yields interesting insights. However, notice that choosing sequences of morphemes or lexemes is somewhat incomplete since it further requires an additional algorithm that realizes these sequences in writing or speaking. Language is not only spoken, it is also written. However, one must distinguish between letters and sounds. The difference between them is foremost a physical one. They use a different channel. A channel is a physical medium in which the message is manifested. Language manifests itself rst and foremost acoustically, even though a lot of communication is done in writing. We principally learn a language by hearing and speaking it. Mastery of writing is achieved only after we are fully uent just speaking the language, even though our views of language are to a large extent shaped by our writing culture (see (Coulmas, 2003) on that). (Sign languages form an exception that will not be dealt with here.) Each channel allows by its mere physical properties a different means of combination. A piece of paper is a two dimensional thing, and we are not forced to write down symbols linearly, as we are with acoustical signals. Think for example of the fact that Chinese characters are composite entities which contain parts in them. These are combined typically by juxtaposition, but characters are aligned vertically. Moreover, the graphical composition internally to a sign is of no relevance for the actual sound that goes with it. To take another example, Hindi is written in a

5 H 4 b#P

G H

VH 9

h 5 4 bV H #P
G H G H

VS

h 4

h 5 H 4 $b9P
G 5

VS

h 5 4 V H #P

34

Fundamental Structures

syllabic script, which is called Devanagari. Each simple consonantal letter denotes a consonant plus . Vowel letters may be added to these in case the vowel is different from . (There are special characters for word initial vowels.) Finally, to denote consonantal clusters, the consonantal characters are melted into each other in a particular way. There is only a nite number of consonantal clusters and the way the consonants are melted is xed. The individual consonants are usually recognizable from the graphical complex. In typesetting there is a similar phenomenon known as ligature. The graphemes and melt into one when the rst is before the second: . (Typewriters have no ligature, for obvious reasons. So you get .) Also, in mathematics the possibilities of the graphical channel are widely used. We use indices, superscripts, subscripts, underlining, arrows and so on. Many diagrams are , therefore not so easy to linearize. (For example, x is spelled out as x as .) Sign languages also make use of the threedimensional space, which proves to require different perceptual skills than spoken language. While the acoustic manifestation of language is in some sense essential for human language, its written manifestation is typically secondary, not only for the individual human being, as said above, but also from a cultural historic point of view. The sounds of the language and the pronunciation of words is something that comes into existence naturally, and they can hardly be xed or determined arbitrarily. Attempts to stop language from changing are simply doomed to failure. Writing systems, on the other hand, are cultural products, and subject to sometimes severe regimentation. The effect is that writing systems show much greater variety across languages than sound systems. The number of primitive letters varies between some two dozen and a few thousand. This is so since some languages have letters for sounds (more or less) like Finnish (English is a difcult case), others have letters for syllables (Hindi, written in Devanagari) and yet others have letters for words (Chinese). It may be objected that in Chinese a character always stands for a syllable, but words may consist of several syllables, hence of several characters. Nevertheless, the difference with Devanagari is clear. The latter shows you how the word sounds like, the former does not, unless you know character by character how it is pronounced. If you were to introduce a new syllable into Chinese you would have to create a new character, but not so in Devanagari. But all this has to be taken with care. Although French uses the Latin alphabet it becomes quite similar to Chinese. You may still know how to pronounce a word that you see written down, but from hearing it you are left in the dark as to how to spell it. For example, the following words are
4 #

D X

5 H

Fundamentals of Linguistics

35
4 7

pronounced completely alike: , , , ; similarly , , , . In what is to follow, language will be written language. This is the current practice in such books as this one; but it requires comment. We are using the socalled Latin alphabet. It is used in almost all European countries, while each country typically uses a different set of symbols. The difference is slight, but needs accounting for (for example, when you wish to produce keyboards or design fonts). Finnish, Hungarian and German, for example, use , and . The letter is used in the German alphabet (but not in Switzerland). In French, one uses , also accents, and so on. The resource of single characters, which we call letters, is for the European languages somewhere between 60 and 100. Besides each letter, both in upper and lower case, we also have the punctuation marks and some extra symbols, not to forget the ubiquitous blank. Notice, however, that not all languages have a blank (Chinese is a case in point, and also the Romans did not use any blanks). On the other hand, one blank is not distinct from two. We can either decide to disallow two blanks in a row, or postulate that they are equal to one. (So, the structure .) A nal problem area to be considered we look at is A is our requirement that sign composition is additive. This means that every change that occurs is underlyingly viewed as adding something that was not there. This can yield awkward results. While the fact that German umlaut is graphically speaking just the addition of two dots ( becomes , becomes , becomes ), the change of a lower case letter to an upper case letter cannot be so analysed. This requires another level of representation, one at which the process is completely additive. This is harmless, if we only change the material aspect (substance) rather than the form. The counterpart of a letter in the spoken languages is the phoneme. Every language utterance can be analyzed into a sequence of phonemes (plus some residue about which we will speak briey below). There is generally no biunique correspondence between phonemes and letters. The connection between the visible and the audible shape of language is everything but predictable or unambiguous in either direction. English is a perfect example. There is hardly any letter that can unequivocally be related to a phoneme. For example, the letter represents in many cases the phoneme [g] unless it is followed by , in which case the two typically together represent a sound that can be zero (as in [sO:t]), or (as in ([la:ft@]). To add to the confusion, the letters represent different sets of phones in different languages. (Note that it makes no sense to speak of the same phoneme in two
V

5 8h

g VH

A 5 8h

g VH

5 h b4

6 ! G

7 H 7 H I$h Rh
R !

7 H IP

4 

7 RH
X

T ` W `

7 IH

R !

` S V Y U

7 g A

A h 5 5 $8h
a G V

h 5 5 bh

7 7

36

Fundamental Structures

different languages, as phonemes are abstractions that are formed within a single language.) The letter has many different manifestations in English, German and French that are hardly compatible. This has prompted the invention of an international standard, the socalled International Phonetic Alphabet (IPA, see (IPA, 1999)). Ideally, every sound of a given language can be uniquely transcribed into IPA such that anyone who is not acquainted with the language can reproduce the utterances correctly. The transcription of a word into this alphabet therefore changes whenever its sound manifestation changes, irrespective of the spelling norm. Unfortunately, the transcription must ultimately remain inconsequential, because even in the IPA letters stand for sets of phones, but in every language the width of a phoneme (= the set of phones it contains) is different. For example, if (English) p E contains both (Hindi) p H and ph H , we either have to represent in English by (at least) two letters or else give up the exact correspondence. The carriers of meaning are however not the sounds or letters (there is simply not enough of them); it is certain sequences thereof. Sequences of letters that are not separated by a blank or a punctuation mark other than are called words. Words are units which can be analyzed further, for example into letters, but for the most part we shall treat them as units. This is the reason why the alphabet A in the technical sense will often not be the alphabet in the sense of stock of letters but in the sense of stock of words. However, since most languages have innitely many words (due to compounding), and since the alphabet A must be nite, some care must be exercised in choosing the alphabet. Typically, it will exclude the compound words, but it will have to include all idioms. We have analyzed words into sequences of letters or sounds, and sentences into sequences of words. This implies that sentences and words can always be so analyzed. This is what we shall assume throughout this book. The individual occurrences of sounds (letters) are called segments. For ex. ample, the (occurrences of the) letters , , and are the segments of The fact that words can be segmented is called segmentability property. At closer look it turns out that segmentability is an idealization. For example, a question differs from an assertion in its intonation contour, which is the rise and fall of the pitch during the utterance. The contour shows distribution over the whole sentence but follows specic rules. It is of course different in different languages. (Falling pitch at the end of a sentence, for example, may accompany questions in English, but not in German.) Because of its nature, intonation contour is called a suprasegmental feature. There are more, for
4 c

g 9

g 9

Fundamentals of Linguistics

37

example emphasis. Segmentability differs also with the channel. In writing, a question is marked by a segmental feature (the question mark), but emphasis is not. Emphasis is typically marked by underlining or italics. For example, if we want to emphasize the word board, we write or board. As can be seen, every letter is underlined or set in italics, but underlining or italics is usually not something that is meant to emphasize those letters that are marked by it; rather, it marks emphasis of the entire word that is composed from them. We could have used a segmental symbol, just like quotes, but the fact of the matter is that we do not. Disregarding this, language typically is segmentable. However, even if this is true, the idea that the morphemes of the language are sequences of letters is largely mistaken. To give an extreme example, the plural is formed in Bahasa Indonesia by reduplicating the noun. For exmeans child, the word therefore means ample, the word means man, and means men. children, the word Clearly, there is no sequence of letters or phonemes that can be literally said to constitute a plural morph. Rather, it is the function f : A A : x x x, sending each string to its duplicate (with an interspersed hyphen). Actually, and is commonplace. Here, is a in writing the abbreviation segmentable marker of plurality. However, notice that the words in the singular or the plural are each fully segmentable. Only the marker of plurality cannot be identied with any of the segments. This is to some degree also the case in German, where the rules are however much more complex, as we have seen above. The fact that morphs are (at closer look) not simply strings will be of central concern in this book. Finally, we have to remark that letters and phonemes are not unstructured either. Phonemes consist of various socalled distinctive features. These are features that distinguish the phonemes from each other. For example, [p] is distinct from [b] in that it is voiceless, while [b] is voiced. Other voiceless consonants are [k], [t], while [g] and [d] are once again voiced. Such features can be relevant for the description of a language. There is a rule of German (and other languages, for example Russian) that forbids voiced consonants hunting is to occur at the end of a syllable. For example, the word pronounced ["ja:kt], not ["ja:gd]. This is so since [g] and [d] may not occur at the end of the syllable, since they are voiced. Now, rst of all, why do we not write then? This is so since inection and derivation show that when these consonants occur nonnally in the syllable they are voiced: we have ["ya:kden] huntings, with [d] now in fact being voiced, and

i i @c i

R #

R !

9 H I5 g d H 9 FRH

5 H g
c c

R $

d F

9 H R5 g H 9 RH

R $

9 H R5 g

d F

H 9 IH

R $

9 H R5 g

d F

H 9 RH

4 d 9

9 h

R 

38
f

Fundamental Structures

also ["ya:g@n] to hunt. Second: why do we not propose that voiceless consonants become voiced when syllable initial? Because there is plenty of evidence that this does not happen. Both voiced and voiceless sounds may appear at the beginning of the syllable, and those ones that are analyzed as underlyingly voiceless remain so in whatever position. Third: why bother writing the underlying consonant rather than the one we hear? Well, rst of all, since we know how to pronounce the word anyway, it does not matter whether we write [d] or [t]. On the other hand, if we know how to write the word, we also know a little bit about its morphological behaviour. What this comes down to is that to learn how to write a language is to learn how the language works. Now, once this is granted, we shall explain why we nd [k] in place of [g] and [t] in place of [d]. This is because of the internal organisation of the phoneme. The phoneme is a set of distinctive features, one of which (in German) is voiced . The rule is that when the voiced consonant may not occur, it is only the feature voiced that is replaced by voiced . Everything else remains the same. A similar situation is the relationship between upper and lower case letters. The rule says that a sentence may not begin with a lower case letter. So, when the sentence begins, the rst letter is changed to its upper case counterpart if necessary. Hence, letters too contain distinctive features. Once again, in a dictionary a word always appears as if it would normally appear elsewhere. Notice by the way that although each letter is by itself an upper or a lower case letter, written language attributes the distinction upper versus lower case to the word not to the initial letter. Disregard, ing some modern spellings in advertisements (like in Germany and so on) this is a reasonable strategy. However, it is nevertheless not illegitimate to call it a suprasegmental feature. In the previous section we have talked extensively about representations of terms by means of strings. In linguistics this is an important issue, which is typically discussed in conjunction with word order. Let us give an example. Disregarding word classes, each word of the language has one (or several) arities. The nite verb has arity 2. The proper names and on the other hand have arity 0. Any symbol of arity 0 is called a functor with respect to its argument. In syntax one also speaks of head and complement. These are relative notions. In the term , the functor , and its arguments are and . To distinguish these or head is arguments from each other, we use the terms subject and object. is is the object of the sentence. The notions subject and the subject and object denote socalled grammatical relations. The correlation between

h r ut RvFq

D R q  p

h r ut RxFq

h 5 h 4  89

h r ut RvFq s V 8@jRvFq w8jih r q r ) h r ut s U

r q 8@r

r q 8@r

jih

r q 8@r

g h

jih

A A h 9D A 77 T h

9 h R $#H

Fundamentals of Linguistics

39

argument places and grammatical relations is to a large extent arbitrary, and is of central concern in syntactical theory. Notice also that not all arguments are complements. Here, syntactical theories diverge as to which of the arguments may be called complement. In generative grammar, for example, it is assumed that only the direct object is a complement. Now, how is a particular term represented? The representation of is , that of is and that of is . The whole term (1.47) is represented by the string (1.48). (1.48)
y

So, the verb appears after the subject, which in turn precedes the object. At the end, a period is placed. However, to spell out the relationship between a language and a formal representation is not as easy as it appears at rst sight. For rst of all, the term should be something that does not depend on the particular language we choose and which gives us the full meaning of the term (so it is like a language of thought or an interlingua, if you wish). So the above term shall mean that Marcus sees Paul. We could translate the English sentence (1.48) by choosing a different representation language, but the choice between languages of representation should actually be immaterial as long as they serve the purpose. This is a very rudimentary picture but it works well for our purposes. We shall return to the idea of producing sentences from terms in Chapter 3. Now look rst at the representatives of the basic symbols in some other languages. German Latin Hungarian

Here is how (1.47) is phrased in these languages. (1.51)

English is called an SVOlanguage, since in transitive constructions the subject precedes the verb, and the verb in turn the object. This is exactly the inx

(1.52)

(1.50)

P H @'G A 7 P 7 H FRRG

y9'cH 4 P H G 4 H P A 9F7 ` f y41D D 7 P 7 H A IIG F7 G P 7 H G R4 7F7 h D A A

H 9P` 4 H

4 FD

(1.49)

P 7 H r Rq  8@r G

p e A7 p bH e 5 AF7 p H e 5 A r 7 u t 5 q FRvFH s h

p e P 7 H G A h h A A 5 R@F7 H nV8@jRv9q w8jih r q r ) h r ut s U

(1.47)

jih

P 7 H RG

r q 8@r

p e A 5 7 bH h jD i bA h

h r ut RvFq

5H p e 5H e 5 H

A h h @A

40

Fundamental Structures

notation. (However, notice that languages do not make use of brackets.) One uses the mnemonic symbols S, V and O to dene the following basic 6 types of languages: SOV, SVO, VSO, OSV, OVS, VOS. These names tell us how the subject, verb and object follow each other in a basic transitive sentence. We call a language of type VSO or VOS verb initial, a language of type SOV or OSV verb nal and a language of type SVO or OVS verb medial. By this denition, German is SVO, Hungarian too, hence both are verb medial and Latin is SOV, hence verb nal. These types are not equally distributed. Depending on the method of counting, 40 50 % of the worlds languages are SOV languages, up to 40 % SVO languages and another 10 % are VSO languages. This means that in the vast majority of languages the order of the two arguments is: subject before object. This is why one does not generally emphasize the relative order of the subject with respect to the object. There is a bias against placing the verb initially (VSO), and a slight bias to put it nally (SOV) rather than medially (SVO). One speaks of a head nal (head initial) language if a head is consistently put at the end behind all of its arguments (at the beginning, before all the arguments). One denotes the type of order by XH (HX), X being the complement, H the head. There is no notion of a head medial language for the reason that most heads only have one complement. It is often understood that the direct object is the only complement of the verb. Hence, the word orders SVO and VOS are head initial, OVS and SOV head nal. (The orders VSO and OSV are problematic since the verb is not adjacent to its object.) A verb is a head, however a very important one, since it basically builds the clause. Nevertheless, different heads may place their arguments differently, so a language that is verb initial need not be head initial, a language that is verb nal need not be head nal. Indeed, there are few languages that are consistently head initial (medial, nal). Japanese is rather consistently head nal. Even a relative clause precedes the noun it modies. Hungarian is a mixed case: adjectives precede nouns, there are no prepositions, only postpositions, but the verb tends to precede its object. For the interested reader we give some more information on the languages shown above. First, Latin was initially an SOV language, however word order was not really xed (see (Lehmann, 1993) and (Bauer, 1995)). In fact, any of the six permutations of the sentence (1.51) is grammatical. Hungarian is more complex, again the word order shown in (1.52) is the least marked, but the rule is that discourse functions determine word order. (Presumably this is true for Latin as well.) German is another special case. Against all appearances

Fundamentals of Linguistics

41

there is all reason to believe that it is actually an SOV language. You can see this by noting rst that only the carrier of inection appears in second place, for example only the auxiliary if present. Second, in a subordinate clause all parts of the verb including the carrier of inection are at the end. (1.53) (1.54) (1.55) (1.56) (1.57) (1.58)
y G p e P 7 H G R4 7F7 H h D A A 5

Marcus sees Paul.

Marcus wants to see Paul.

Marcus wants to be able to see Paul. ..., because Marcus sees Paul. ..., ...,
G

...,

..., because Marcus wants to see Paul. ..., because Marcus wants to be able to see Paul. So, the main sentence is not always a good indicator of the word order. Some languages allow for alternative word orders, like Latin and Hungarian. This is not to say that all variants have the same meaning or signicance; it is only that they are equal as representatives of (1.47). We therefore speak of Latin as having free word order. However, this only means that the head and the argument can assume any order with respect to each other, not that simply all permutations of the words mean the same. Now, notice that subject and object are coded by means of socalled cases. In Latin, the object carries accusative case, so we nd instead of . Likewise, in Hungarian we have in place of , the nominative. So, the way a representing string is arrived at is rather complex. We shall return again to case marking in Chapter 5. Natural languages also display socalled polyvalency. We say that a word is polyvalent if it can have several arities (even with the same meaning). The can be unary (= intransitive) as well as binary (= transitive verb if the second argument is accusative, intransitive otherwise). This is not allowed in our denition of signature. However, it can easily be modied to account for polyvalent symbols.
y

P H @'G f IRG 7 P 7 H

P P D 9 h 9 bR9 V g

G p e P P D 9 h P 7 H A 5 bRh RA IG F7 bH b! P D h

9 h 9 9 V g

4 # '

P @H G

y 4

G p e 9 h P 7 H A 5 Rh RA IG F7 bH b! P D h

p e h D P 7 H A 5 bA IG F7 bH b! P D h

G p e 9 h A P 7 H G P P D A 5 h RRcbF7 H G p e 9 h A P 7 H G P P D A 5 h RRcbF7 H

P P g 5 g

A 7 P 7 H FRIG
4

42

Fundamental Structures
G 5

Notes on this section. The rule that spells out the letters in German is more complex than the above explications show. For example, it is [x] in but [c] in . This may have two reasons: (a) There is a morpheme boundary between and in the second word but not in the rst. This morpheme boundary induces the difference. (b) The morpheme is special in that will always be realized as [c]. The difference between (a) and (b) is that while (a) denes a realization rule that uses only the phonological representation, (b) uses morphological information to dene the realization. Mel uk denes the realization rules as follows. In each stratum, there are c rules that dene how deep representations get mapped to surface representations. Across strata, going down, the surface representations of the higher stratum get mapped into abstract representations of the lower stratum. (For example, a sequence of morphemes is rst realized as a sequence of morphs and then spelled out as a sequence of phonemes, until, nally, it gets mapped onto a sequence of phones.) Of course, one may also reverse the process. However, adjacency between (sub-)strata remains as dened. Exercise 17. Show that in Polish Notation, unique readability is lost when there exist polyvalent function symbols. Exercise 18. Show that if you have brackets, unique readability is guaranteed even if you have polyvalency. Exercise 19. We have argued that German is a verb nal language. But is it strictly head nal? Examine the data given in this section as well as the data given below. Josef is.picking a beautiful rose for Mary Heinrich is fatter than Josef
e

Exercise 20. Even if languages do not have brackets, there are elements that indicate clearly the left or right periphery of a constituent. Such elements are and ( ). Can you name more? Are there elements in the determiners English indicating the right periphery of a constituent? How about demonstratives like or ? Exercise 21. By the denitions, Unix is head initial. For example, the com-

9 H

4 #

G H

A $D

(1.60)

y X

h RA g

A P 5 h H bd

4 5 D

A $

D q

5 9 D $h

(1.59)

9 h

G 5

y q D

H 5 8 bH

5 $V 7

h RA g

G 5p p h 9 V g !d V 7 P A h 9D h 4

G 5

9 h

7 5p G

7 H R5

X 8

h A g

G H

9 h

G 5

7 IH

Trees

43

mand precedes its arguments. Now study the way in which optional arguments are encoded. (If you are sitting behind a computer on which Unix (or Linux) is running, type and you get a synopsis of the command and its syntax.) Does the syntax guarantee unique readability? (For the more linguistic minded reader: which type of marking strategy does Unix employ? Which natural language you know of corresponds best to it?) 4. Trees

where L is a nite linearly Strings can also be dened as pairs ordered set and : L A a function, called the labelling function. Since L is nite we have L n for n : L . (Recall that n is a set that is linearly by the isomorphic n , and eliminating ordered by .) Replacing L the redundant , a string is often dened as a pair n , where n is a natural number. In what is to follow, we will very often have to deal with extensions of relational structures (over a given signature ) by a labelling function. , where M is a set, an interpretation They have the general form M and a function from M to A. These structures shall be called structures over A or Astructures. A very important notion in the analysis of language is that of a tree. A tree , is a special case of a directed graph. A directed graph is a structure G 2 is a binary relation. As is common usage, we shall write x where G y if x y or x y. Also, x and y are called comparable if x y or y x. A (directed) chain of length k is a sequence x i : i k 1 such that xi xi 1 for all i k. An undirected chain of length k is a sequence x i : i k 1 where xi xi 1 or xi 1 xi for all i k. A directed graph is called connected if for every two elements x and y there is an undirected chain from x to y. A directed chain of length k is called a cycle of length k if x k x0 . A binary relation is called cycle free if it only has cycles of length 0. A root is an r, where is the reexive, transitive element r such that for every x x closure of .

0 i) (

Denition 1.42 and if x y and x

G is called a forest if is transitive and irreexive z then y and z are comparable. A forest with a root is

0 i) (

Denition 1.41 A directed acyclic graph (a DAG) is a pair that G2 is an acyclic relation on G. If is transitive, directed transitive acyclic graph (DTAG).

G such is called a

0 g 0 i) (

03 ) ( 0 1 Ci) (

0 i) (

0 3) 2(

0 3) )

5 8 P 9 $FIH f (

1 0i) ( 1 0 1 Ci) ( 0 i) ( D 3

5 8 1P

44

Fundamental Structures

called a tree. In a connected rooted DTAG the root is comparable with every other element since the relation is transitive. Furthermore, in presence of transitivity is cycle free iff it is irreexive. For if is not irreexive it has a cycle of length 1. Conversely, if there is a cycle x i : i k 1 of length k 0, we immediately have x0 xk x0 , by transitivity. If x y and there is no z such that x z y, x is called a daughter of y, and y the mother of x, and we write x y. Lemma 1.43 Let T be a nite tree. If x y then there exists a x such that x x y and a y such that x y y. x and y are uniquely determined by x and y. The proof is straightforward. In innite trees this need not hold. We dene x y by x y or y x and say that x and y overlap. The following is also easy. Lemma 1.44 (Predecessor Lemma) Let be a nite tree and x and y nodes which do not overlap. Then there exist uniquely determined u, v and w, such that x u w, y v w and v u. A node branches n times downwards if it has exactly n daughters; and it branches n times upwards if it has exactly n mothers. We say that a node branches upwards (downwards) if it branches upwards or downwards at least 2 times. A nite forest is characterized by the fact that it is transitive, irreexive and no node branches upwards. Therefore, in connection with trees and forests we shall speak of branching when we mean downward branching. x is called a leaf if there is no y x, that is, if x branches 0 times. The . set of leaves of is denoted by b Further, we dene the following notation.
Y

By denition of a forest, x is linearly ordered by . Also, x together with the restriction of to x is a tree. A set P G is called a path if it is linearly ordered by and convex, that is to say, if x y P then z P for every z such that x z y. The length of P is dened to be P 1. A branch is a maximal path with respect to set

(1.61)

x:

y:y

x:

y:y

V U

0 i) (

1 )

Trees

45

inclusion. The height of x in a DTAG, in symbols h x or simply h x , is the maximal length of a branch in x. It is dened inductively as follows.

Dually we dene the depth in a DTAG.

If and are DTAG, forests or trees, then is a subDTAG, subforest and subtree of , respectively. A subtree of with underlying set x is called a constituent of . Denition 1.46 Let A be an alphabet. A DAG over A (or an ADAG) is a pair such that G is a DAG and : G A an arbitrary function. Alternatively, we speak of DAGs with labels in A, or simply of labelled DAGs if it is clear which alphabet is meant. Similarly with trees and DTAGs. The notions of substructures are extended analogously. The tree structure in linguistic representations encodes the hierarchical relations between elements and not their spatial or temporal relationship. The latter have to be added explicitly. This is done by extending the signature by another binary relation symbol, . We say that x is before y and that y is after x if x y is the case. We say that x dominates y if x y. The relation articulates the temporal relationship between the segments. This is rst of all dened on the leaves, and it is a linear ordering. (This reects the insistance on segmentability. It will have to be abandoned once we do not assume segmentability.) Each node x in the tree has the physical span of its segments. This allows to dene an ordering between the hierarchically higher elements
d i

if

D i)

Denition 1.45 Let G G and G H. Then is called a subgraph of


i

i) (

0 i) (

and call this the height of

. (This is an ordinal, as is easily veried.) H be directed graphs and 2 H G .

V U S

V U

(1.64)

For the entire DTAG

we set h x :x T

V U S

V U

(1.63)

d x :

0 1

max d y : y

V U S

V U

(1.62)

hx :

0 1

max h y : y

if x is a leaf, otherwise.

if x is a root, otherwise.

V U

V U

0 3) (

46

Fundamental Structures
j

as well. We simply stipulate that x y iff all leaves below x are before all leaves below y. This is not unproblematic if nodes can branch upwards, but this situation we shall rarely encounter in this book. The following is an intrinsic denition of these structures.

The condition (ot2) requires that the ordering is coherent with the ordering on the leaves. It ensures that x y only if all leaves below x are before all leaves below y. (ot3) is a completeness condition ensuring that if the latter holds, then indeed x y. x b . We We agree on the following notation. Let x G. Put x : call this the extension of x. x is linearly ordered by . If a labelling function is given in addition, we write k x : x x and call this the associated string of x. It may happen that two nodes have the same associated string. The string associated with the entire tree is

A constituent is called continuous if the associated string is convex with respect to . A set M is convex (with respect to ) if for all x y z M: if x z y then z M as well. For sets M, N of leaves put M N iff for all x M and all y N we have x y. From (ot4) and (ot3) we derive the following:

This property shows that the orderings on the leaves alone determines the relation uniquely.
j

(1.66)

1 ) )

0V aU U ( 3 ) j ) V

V U

(1.65)

V U

d k

0 3) j) l  a(

(ot4) If x is not a leaf and for all y If z is not a leaf and for all y

V U

(ot3) If x If x

z and y z and y

x then also y z then also x

z. y. xy zx z then also x y then also x z. z.

0 i) (

(ot2)

0 i) (

(ot1) T

is a tree. .

is a linear, strict ordering on the leaves of T

0 j) i) (

Denition 1.47 An ordered tree is a triple T holds.

such that the following

Trees
j

47

We emphasize that the ordering cannot be linear if the tree has more than one element. It may even happen that . One can show that overlapping nodes can never be comparable with respect to . For let x y, say x y. Let u x be a leaf. Assume x y; then by (ot3) u y as well as u u. This contradicts the condition that is irreexive. Likewise y x cannot hold. So, nodes can only be comparable if they do not overlap. We now ask: is it possible that they are comparable exactly when they do not overlap? In this case we call exhaustive. Theorem 1.49 gives a criterion on the existence of exhaustive orderings. Notice that if M and N are convex sets, then so is M N. Moreover, if M N then either M N or N M. Also, M is convex iff for all u: u M or M u. be a tree and a linear ordering on the leaves. Theorem 1.49 Let T There exists an exhaustive extension of iff all constituents are continuous. Proof. By Theorem 1.48 there exists a unique extension, . Assume that all constituents are continuous. Let x and y are nonoverlapping nodes. Then x y . Hence x y or y x . since both sets are convex. So, by (1.66) we have x y or y x. The ordering is therefore exhaustive. Conversely, assume that is exhaustive. Pick x. We show that x is convex. Let u be a leaf and u x . Then u does not overlap with x. By hypothesis, u x or x u, whence u x or x u , by (1.66). This means nothing but that either u y for all y x or y u for all y x . So, x is convex. Lemma 1.50 (Constituent Lemma) Assume T is an exhaustively ordered Atree. Furthermore, let p q. Then there is a context C uv such that

x.

0 i) ( j )

Proposition 1.51 Let T ing iff x y for some y

be an ordered tree and x

T . x is 1branch-

0 ) (

The converse does not hold. Furthermore, it may happen that C which case k q k p without q p.

i WV U

W i

V aV U U

V U

V U

V U

(1.67)

k q

C k p

u k p

in

0 ) 6( i i

0 %i) ( k j) k
j j j

0 3) j) i) (

x1

Theorem 1.48 Let T be a tree and Then there exists exactly one relation dered tree.

j 1 cj c j 1 k j k j k j  j

 j

a linear ordering on its leaves. such that T is an or-

k j

k j

0 i) ( t
j

0 i) (

k %j

x@ t

k %j

48

Fundamental Structures

Proof. Let x be a 1branching node with daughter y. Then we have x y but x y. So, the condition is necessary. Let us show that is sufcient. Let x be minimally 2branching. Let u x. There is a daughter z x such that u z, and there is z x different from z. Then u z x as well as z x . All sets are nonempty and z z . Hence z x and so also u x. We say that a tree is properly branching if it has no 1branching nodes. There is a slightly different method of dening trees. Let T be a set and a cycle free relation on T such that for every x there is at most one y such that x y. And let there be exactly one x which has no successor (the root). Then put : . T is a tree. And x y iff x is the daughter of y. Let D x be the set of daughters of x. Now let P be a relation such that (a) y P z only if y and z are sisters, (b) P , the transitive closure of P, is a relation that linearly orders D x for every x, (c) for every y there is at most one z such that y P z and at most one z such that z P y. Then put x y iff there is z such that (a) x x z for some x, (b) y y y for some y, (c) x P y. and P are the immediate neighbourhood relations in the tree.
j f k 0 j) i) (

Finally we mention a further useful concept, that of a constituent structure.

Proposition 1.54 Let M be a nonempty set. There is a biunique correspondence between nite constituent structures over M and nite properly branching trees whose set of leaves is x : x M . Proof. Let M be a constituent structure. Then is a tree. To see this, one has to check that is irreexive and transitive and that it has a root. , because of This is easy. Further, assume that S T U. Then U T S condition (cs2). Moreover, because of (cs3) we must have U T or T U.

} w

0 m ) (

T S S

0 !)

(cs3) if S T

and S

T as well as T

S then S T

1 w

(cs2)

,M

, .

1 T S

(cs1)

for every x

M,

Denition 1.53 Let M be a set. A constituent structure over M is a system of subsets of M with the following properties.

j 

Proposition 1.52 Let T iff there are x x and y


f 

be an exhaustively ordered tree. Then x y which are sisters and x y .

n m d } }
Y

j k t

Y o

0 i) (

V U

n m k }
Y

V U

Trees

49

This means nothing else than that T and U are comparable. The set of leaves is exactly the set x : x M . Conversely, let T be a properly branching tree. Put M : b and : x : x T . We claim that M is a constituent structure. For (cs1), notice that for every u b , u u . Further, for every x x , since the tree is nite. There is a root r of , and we have r M. This shows (cs2). Now we show (cs3). Assume that x y and y x . Then x and y are incomparable (and different). Let u be a leaf and u x , then we have u x. u y cannot hold since u is linear, and then x and y would be comparable. Likewise we see that from u y we get u x. Hence x y . The constructions are easily seen to be inverses of each other (up to isomorphism). In general we can assign to every tree a constituent structure, but only if the tree is properly branching it can be properly reconstructed from this structure. The notion of a constituent structure can be extended straightforwardly to the notion of an ordered constituent structure, and we can introduce labellings. We shall now discuss the representation of terms by means of trees. There are two different methods, both widely used. Before we begin, we shall introduce the notion of a tree domain. Denition 1.55 Let T be a set of nite sequences of natural numbers. T is called a tree domain if the following holds.

We assign to a tree domain T an ordered tree in the following way. The set of nodes is T , (1) x y iff y is a proper prex of x and (2) x y iff there are numbers i j and sequences u, v, w such that (a) i j and (b) x u i v, y u j w. (This is exactly the lexicographical ordering.) Together with these relations, T is an exhaustively ordered nite tree, as is easily seen. Figure 3 0 1 2 10 11 20 200 . If T is a tree domain shows the tree domain T and x T then put

This is the constituent below x. (To be exact, it is not identical to this constituent, it is merely isomorphic to it. The (unique) isomorphism from T x onto the constituent x is the map y x y.)

i i dW i

1 i Wi

i IS

(1.68)

T x:

y:x y

i i W W i i iD j i

Wi

) ) ) ) S

i i i

i i

(td2) If x i

T and j

1 i

(td1) If x i

T then x

T. i then also x j T.

$ p 12T S U V 0 !) ( D `
q

0 i) (

@S

V lU 1

w " D }

D T S S

1 W p ) Wi Wi

1 i

i W W i

50

Fundamental Structures

10

11

Figure 3. A Tree Domain

Conversely, let T be an exhaustively ordered tree. We dene a tree domain T by induction on the depth of the nodes. If d x 0, let x : . In this case x is the root of the tree. If x is dened, and y a daughter of x, then put y : x i, if y is the ith daughter of x counting from the left (starting, as d x .) We can see quite easily that the usual, with 0). (Hence we have x so dened set is a tree domain. For we have u T as soon as u j T for some j. Hence (td1) holds. Further, if u i T , say u i y then y is the ith daughter of a node x. Take j i. Then let z be the jth daughter of x (counting from the left). It exists, and we have z u j. Moreover, it can easily be shown that the relations dened on the tree domain are exactly the ones that are dened on the tree. In other words the map x x is an isomorphism. Theorem 1.56 Let T be a nite, exhaustively ordered tree. The function x x is an isomorphism from onto the associated tree domain . . Furthermore, iff Terms can be translated into labelled tree domains. Each term t is assigned a tree domain t b and a labelling function t . The labelled tree domain associated with t is t m : t b t . We start with the variables. xb : , and x : x. Assume that the labelled tree domains t im , i n 1, are dened, and put n : f . Let s : f t0 tn 1 ; then
i n

1 i

i W S

s "T S

(1.69)

sb :

i x:x

tib

T S

W i

V U

Wi

1 i

W i

1 D V

V U

r D

W i

) aaa) U

0 j) i) (

2 20 200

r o

) (

0 j) i) ( D D D

V U

0 i) j )

l(

Trees

51

Then s is dened as follows.

This means that sm consists of a root named f which has n daughters, to which the labelled tree domains of t0 tn 1 are isomorphic. We call the reprem the dependency coding. This coding is more sentation which sends t to t efcient that the following, which we call structural coding. We choose a new symbol, T , and dene by induction to each term t a tree domain t c and a labelling function t . Put xc : 0 , x : , x 0 : x. Further let for s f t0 tn 1
0 i n 1

(Compare the structural coding with the associated string in the notation without brackets.) In Figure 4 both codings are shown for the term for comparison. The advantage of the structural coding is that the string associated to the labelled tree domain is also the string associated to the term (with brackets dropped, as the tree encodes the structure anyway). Notes on this section. A variant of the dependency coding of syntactic structures has been proposed by Lucien Tesni` re in (1982). He called tree e representations stemmata (sg. stemma). This notation (and the theory surrounding it) became known as dependency syntax. See (Mel uk, 1988) for c a survey. Unfortunately, the stemmata do not coincide with the dependency trees dened here, and this creates very subtle problems, see (Mel uk, 1988). c Noam Chomsky on the other hand proposed the more elaborate structural coding, which is by now widespread in linguistic theory. Exercise 22. Dene exhaustive ordering on constituent structures. Show that a linear ordering on the leaves is extensible to an exhaustive ordering in a tree iff it is in the related constituent structure. Exercise 23. Let T be a tree and a binary relation such that x y only if x y are daughters of the same node (that is, they are sisters). Further, the daughter nodes of a given node shall be ordered linearly by . No other
j j j

u "   #tQ

V U i

0 i) (

g aU U

x : t x j

s 0 :

Vi WV D V U V U

(1.71)

s :

1 i

i W S

s "T ) S

sc :

i x:x

tic

V U i

V U

Vi W U D

V U

T ) S

) aaa)

V U

(1.70)

s :

s j x : t x j

) aaa) U )

52

Fundamental Structures
s

Figure 4. Dependency Coding and Structural Coding

relations shall hold. Show that this ordering can be extended to an exhaustive ordering on . Exercise 24. Show that the number of binary branching exhaustively ordered trees over a given string is exactly (1.72) Cn 1 n 2n 1 n
x w

These numbers are called Catalan numbers. Exercise 25. Show that Cn n 1 1 4n . (One can prove that 2n approximates n n the series 4 n in the limit. The latter even majorizes the former. For the exercise there is an elementary proof.)
z y

5.

Rewriting Systems

Languages are by Denition 1.36 arbitrary sets of strings over a (nite) alphabet. However, languages that interest us here are those sets which can be described by nite means, particularly by nite processes. These can be processes which generate strings directly or by means of some intermedi-

Exercise 26. Let L be nite with n elements and Construct an isomorphism from L onto n .

a linear ordering on L.

0 1 Ci) (

v 0 i) ( w g

u D

Rewriting Systems

53

ate structure (for example, labelled trees). The most popular approach is by means of rewrite systems on strings. Denition 1.57 Let A be a set. A semi Thue system over A is a nite set T xi yi : i m of pairs of Astrings. If T is given, write u 1 v if there T are s t and some i m such that u s xi t and v s yi t . We write u 0 v T n if u v, and u T 1 v if there is a z such that u 1 z n v. Finally, we write T T u T v if u n v for some n , and we say that v is derivable in T from u. T We can dene 1 also as follows. u 1 v iff there exists a context C and T T x y T such that u C x and v C y . A semi Thue system T is called a Thue system if from x y T follows y x T . In this case v is derivable from u iff u is derivable from v. A derivation of y from x in T is a nite sequence vi : i n 1 such that v0 x, vn y and for all i n we have vi 1 vi 1 . The length of this derivation is n. (A more careful denition will T be given on Page 57.) Sometimes it will be convenient to admit v i 1 vi even if there is no corresponding rule. A grammar differs from a semi Thue system as follows. First, we introduce a distinction between the alphabet proper and an auxiliary alphabet, and secondly, the language is dened by means of a special symbol, the so called start symbol. S N A R such that N A Denition 1.58 A grammar is a quadruple G are nonempty disjoint sets, S N and R a semi Thue system over N A such that R only if A . We call S the start symbol, N the nonterminal alphabet, A the terminal alphabet and R the set of rules. Elements of the set N are also called categories. Notice that often the word type is used instead of category, but this usage is dangerous for us in view of the fact that type is reserved here for types in the calculus. As a rule, we choose S . This is not necessary. The reader is warned that need not always be the start symbol. But if nothing else is said it is. As is common practice, nonterminals are denoted by upper case Roman letters, terminals by lower case Roman letters. A lower case Greek letter signies a letter that is either terminal or nonterminal. The use of vector arrows follows the practice established for strings. We write G or G in case that S R and say that G generates . Furthermore, we write G if R . The language generated by G is dened by
| i i | i i i i D i) i 0 !) @S i i( D

T ~ i

1 IS i

V U

(1.73)

LG :

A :G

0 ) ) ) (

i | i i i W dW i
|

1 i i d0 6) ( V U i

i i

i i W QW i
|

i D

1 i i Q0 ) ( V U i i

1 i

1 d0 i ) i (

i (

1 i i C0 !) (
|

54

Fundamental Structures

Notice that G generates strings which may contain terminal as well as nonterminal symbols. However, those that contain also nonterminals do not belong to the language that G generates. A grammar is therefore a semi Thue system which additionally denes how a derivation begins and how it ends. Given a grammar G we call the analysis problem (or parsing problem) for G the problem (1) to say for a given string whether it is derivable in G and (2) to name a derivation in case that a string is derivable. The problem (1) alone is called the recognition problem for G. A rule is often also called a production and is alternatively written . We call the left hand side and the right hand side of the production. The productivity p of a rule is the difference . is called expanding if p 0, strictly expanding if p 0 and contracting if p 0. A rule is terminal if it has the form x (notice that by our convention, x A ). This notion of grammar is very general. There are only countably many grammars over a given alphabet and hence only countably many languages generated by them ; nevertheless, the variety of these languages is bewildering. We shall see that every recursively enumerable language can be generated by some grammar. So, some more restricted notion of grammar is called for. Noam Chomsky has proposed the following hierarchy of grammar types. .) (Here, X is short for X Any grammar is of Type 0. A grammar is said to be of Type 1 or context sensitive if all rules are of the form 1 X 2 1 2 and either (i) always or (ii) is a rule and never occurs on the right hand side of a production. A grammar is said to be of Type 2 or context free if it is context sensitive and all productions are of the form X . A grammar is said to be of Type 3 or regular if it is context free and all productions are of the form X where A N .
}

One says that X can be rewritten into in the context 1 2 . A language is said to be of Type i if it can be generated by a grammar of Type i. It is

i i

(1.74)

i i i

A context sensitive rule 1 X 2

1 2 is also written

i i V U

i D

1 i

i i

f 

V U

V U

1 i

i i i

T s S

V U

0 i) i(

i i i v

Rewriting Systems

55

not relevant if there also exists a grammar of Type j, j i, that generates this language in order for it to be of Type i. We give examples of grammars of Type 3, 2 and 0. E XAMPLE 1. There are regular grammars which generate number expressions. Here a number expression is either a number, with or without sign, or a pair of numbers separated by a dot, again with or without sign. The grammar is as follows. The set of terminal symbols is , the set of nonterminals is . The start symbol is and the productions are

(1.75)

Here, we have used the following convention. The symbol on the right hand side of a production indicates that the part on the left of this sign and the one to the right are alternatives. So, using the symbol saves us from writing two rules expanding the same symbol. For example, can be expanded either by , or by . The syntax of the language ALGOL has been written down in this notation, which became to be known as the BackusNaur Form. The arrow was written :: . (The BackusNaur form actually allowed for contextfree rules.) E XAMPLE 2. The set of strings representing terms over a nite signature with nite set X of variables can be generated by a context free grammar. Let F m and i : i . i:i
i

Since the set of rules is nite, so must be F. The start symbol is . This grammar generates the associated strings in Polish Notation. Notice that this grammar reects exactly the structural coding of the terms. More on that later. If we want to have dependency coding, we have to choose instead the following grammar.

(1.77)
i

j0 j1

j i

j0 j1

j i

1 1

aa

aa

(1.76)

T !bS ) c) v) ) ) w) ) u) x) ) ) r) q

aaH rRqebaaIRH bq e er e e y aa q r aaIH1q r c 7Cv

T e) ) ) F) S V U D D

V U

e
s

c

56

Fundamental Structures

This is a scheme of productions. Notice that for technical reasons the root symbol must be . We could dispense with the rst kind of rules if we are allowed to have several start symbols. We shall return to this issue below. E XAMPLE 3. Our example for a Type 0 grammar is the following, taken from (Salomaa, 1973).
5

is the start symbol. This grammar generates the language n : n 0 . This can be seen as follows. To start, with (a) one can either generate the string or the string . Let i i , i . We consider derivations which go from i to a terminal string. At the beginning, only (b) or (d) can be applied. Let it be (b). Then we can only continue with (c) and then we create a string of length 4 i . Since we have only one letter, the string is uniquely determined. Now assume that (d) has been chosen. Then we get the string i . The only possibility to continue is using (e). This moves the index 1 stepwise to the left and puts before every occurrence of i . Now there is no an . Finally, it hits and we use (f) to get other choice but to move the index 2 to the right with the help of (g). This gives a string i 1 i 1 with i 1 i . We have
3

where x i counts the number of in i . Since x i 1 2, 0 , x i 2 we conclude that x i 2i and so i i 1 4, i 0. Hence, i i 1 2 , as promised. In the denition of a context sensitive grammar the following must be remembered. By intention, context sensitive grammars only consist of noncontracting rules. However, since we must begin with a start symbol, there would be no way to derive the empty string if no rule is contracting. Hence, . But in order not to let other contracting uses of we do admit the rule

i D D i

g V i U

v V V iU

g U

g @V i U

(1.79)

T H1 i ) S

k i

k i

1

i D i H

i g

F7

i D H

g ! i i D

#H

1

(1.78)

H @H

H 

HH H H

1

V iU

i H

V U V U V U

V U V U V U V U

a b c d e f g

V U

g U

Rewriting Systems
}

57

this rule creep in we require that is not on the right hand side of any rule whatsoever. Hence, can only be applied once, at the beginning of the derivation. The derivation immediately terminates. This condition is also in force for context free and regular grammars although without it no more languages can be generated (see the exercises). For assume that in a grammar G with rules of the form X there are rules where occurs on the right hand side of a production, and nevertheless replace by Z in all rules which are not not of the form . Add also all rules , where is a rule of G and results from by replacing by Z. This is a context free grammar which generates the same language, and even the same structures. (The only difference is with the nodes labelled or Z.) The class of regular grammars is denoted by RG, the class of all context free grammars by CFG, the class of context sensitive grammars by CSG and the class of Type 0 grammars by GG. The languages generated by these grammars is analogously denoted by RL, CFL, CSL and GL. The grammar classes form a proper hierarchy.
m m m

(1.80)

RG

CFG

CSG

GG

This is not hard to see. It follows immediately that the languages generated by these grammar types also form a hierarchy, but not that the inclusions are proper. However, the hierarchy is once again strict.
m m m

(1.81)

RL

CFL

CSL

GL

We shall prove each of the proper inclusions. In Section 1.7 (Theorem 1.96) we shall show that there are languages of Type 0 which are not of Type 1. Furthermore, from the Pumping Lemma (Theorem 1.81) for CFLs it follows that n n n : n is not context free. However, it is context sensitive (which is left as an exercise in that section). Also, by Theorem 1.65 below, 2 the language n : n has a grammar of Type 1. However, this language is not semilinear, whence it is not of Type 2 (see Section 2.6). Finally, it will be shown that n n : n is context free but not regular. (See Exercise 51.) Let . We call a triple A C an instance of if C is an occurrence of in and also an occurrence of in . This means that there exist 1 and 2 such that C 1 2 and 1 2 as well as 1 2 . We call C the domain of A. A derivation of length n is a i Ci i for sequence Ai : i n of instances of rules from G such that A i

0 i)

) i(

i Wi W i

k i

0 i) ) i( D

0 i) i(

k i H

( i W i W i

p D

D i

58

Fundamental Structures
1

n 1 the end. We denote by der G the set of derivations G from the string and der G : der G S . This denition has been carefully chosen. Let A i : i n be a derivation in G, where Ai i Ci i 1 (i n). Then we call i : i n 1 the (associated) string sequence. Notice that the string sequence has one more element than the derivation. In what is to follow we shall often also call the string sequence a derivation. However, this is not quite legitimate, since the string sequence does not determine the derivation uniquely. Here is an example. Let G consist of the rules , and . Take the string sequence . There are two derivations for this sequence.

After application of a rule , the left hand side is replaced by the right hand side, but the context parts 1 and 2 remain as before. It is intuitively clear that if we apply a rule to parts of the context, then this application could be permuted with the rst. This is claried in the following denition and theorem.

, Denition 1.59 Let 1 2 be an instance of the rule and let 1 2 be an instance of . We call the domains of these applications disjoint if either (a) 1 is a prex of 1 or (b) 2 is a sufx of 2 .
Lemma 1.60 (Commuting Instances) Let C be an instance of , and D an instance of . Suppose that the instances are disjoint. Then there exists an instance D of as well as an instance C of , and both have disjoint domains. The proof is easy and left as an exercise. Analogously, suppose that to the same string the rule can be applied with context C and the rule can be applied with context D. Then if C precedes D, after applying one of them the domains remain disjoint, and the other can still be applied (with the context modied accordingly).

i W i

0 i lk ) i ( ) i i D 0 i) ) i(

i W i i D

0 i 0 i ) i ) i ( ) ( 0 i 0 i ) i ) i ( ) (

(1.82b)

0a0 0 a0

T S S

)0 ) ) ( SS S ) 0 ) ) (

T S

(0 ) S () 0

T S

)0 ) ) a( ( ( S } ) 0 ) ) a( ( (

(1.82a)

i(

T S

S S

T S

V i) U

0 i)

) i(

V ) U

0 i) ) i(

T S S

0 i)k ) i (

T S

V U
}

) (

n and for every j

n 1 j

i D

j . 0 is called the start of the derivation,

Rewriting Systems

59

We give rst an example where the instances are not disjoint. Let the following rules be given.

There are two possibilities to apply the rules to . The rst has domain , the second the domain . The domains overlap and indeed the rst rule when applied destroys the domain of the second. Namely, if we apply the rule we cannot reach a terminal string.
T | T S | T S

(1.84)

So much for noncommuting instances. Now take the string . Again, the two rules are in competition. However, this time none destroys the applicability of the other.

As before we can derive the string . Notice that in a CFG every pair of rules that are in competition for the same string can be used in succession with either order on condition that they do not compete for the same occurrence of a nonterminal. Denition 1.61 A grammar is in standard form if all rules are of the form X Y, X x. In other words, in a grammar in standard form the right hand side either consists of a string of nonterminals or a string of terminals. Typically, one restricts terminal strings to a single symbol or the empty string, but the difference between these requirements is actually marginal. Lemma 1.62 For every grammar G of Type i there exists a grammar H of Type i in standard form such that L G LH .

V U

V U

(1.87)

7 |

(1.86)

(1.85)

If on the other hand we rst apply the rule

we do get one.

H H

H H

0 ) (

(1.83)

0 ) ( i

60

Fundamental Structures

Proof. Put N : A N and h : a X : N A N 1 . For a:a a X each rule let h be the result of applying h to both strings. Finally, let R : h : R a:a A ,H: N A R . It is easy to a verify, using the Commuting Instances Lemma, that L H L G . (See also below for proofs of this kind.) We shall now proceed to show that the conditions on Type 0 grammars are actually insignicant as regards the class of generated languages. First, we may assume a set of start symbols rather than a single one. Dene the notion of a grammar (of Type i) to be a quadruple G N A R such that N and for all S , S N A R is a grammar (of Type i). Write G if there is an S such that S R . We shall see that grammars are not more general than grammars with respect to languages. Let G be a A N be a new nonterminal and grammar . Dene G as follows. Let S add the rules S X to R for all X . It is easy to see that L G LG. (Moreover, the derivations differ minimally.) Notice also that we have not changed the type of the grammar. The second simplication concerns the requirement that the set of terminals and the set of nonterminals be disjoint. We shall show that it too can be dropped without increasing the generative power. We shall sometimes work without this condition, as it can be cumbersome to deal with. N A R such that A Denition 1.63 A quasigrammar is a quadruple and N are nite and nonempty sets, N, and R a semi Thue system over N A such that if R then contains a symbol from N. Proposition 1.64 For every quasigrammar there exists a grammar which generates the same language. N A R be a quasigrammar. Put N1 : N A. Then assume for Proof. Let every a N1 a new symbol a . Put Y : N1 , N : N N1 Y , a :a A : A. Now N A . We put : if A and : if A. Finally, we dene the rules. Let be the result of replacing every occurrence of an a N1 by the corresponding a . Then let

Put G : N A R . We claim that L G L G . To that end we dene A N by h a : a for a A N1 , a homomorphism h : A N ha : N1 and h X : X for all X N N1 . Then h a for a
}

Dv

V U

v 1 V U D V U

V V U

S s "T

1 i

V U UD V

1 s U 0 ) ) ) j j (

i S

(1.88)
}

R :

a:a

N1

V U

1 s $V }

0 ) ) ) (

V C U

v 7D U

V U V U 0 k )D ) k ) ( 0 ) ) ) ( D
}

0 ) ) ) (

SD

s CT

P C

1 Q0 i ) i (

S s yT D 1

0 ) ) ) (

V U t

P C

V U S 1 D 1 }

V U

Rewriting Systems

61

as well as h R R . From this it immediately follows that if G then G h . (Induction on the length of a derivation.) Since we can derive in G from h , we certainly have L G L G . For the converse we have to convince ourselves that an instance of a rule a a can always be moved to the end of the derivation. For if is a rule then it is of type b b and replaces a b by b; and hence it commutes with that instance of the rst rule. ; since a does not occur in , Or it is of a different form, namely these two instances of rules commute. Now that this is shown, we conclude from G already G . This implies G . The last of the conditions, namely that the left hand side of a production must contain a nonterminal, is also no restriction. For let G N A R be a grammar which does not comply with this condition. Then for every terminal a let a1 be a new symbol and let A1 : a1 : a A . Finally, for each rule let 1 be the result of replacing every occurrence of an a A by a 1 (on every side of the production). Now set : if A and : otherwise, R : 1 : R a a : a A . Finally put G : N A1 A R . It is not hard to show that L G L G . These steps have simplied the notion of a grammar considerably. Its most general form is N A R , where N N A N A a nite set. is the set of start symbols and R Next we shall show a general theorem for context sensitive languages. A grammar is called noncontracting if either no rule is contracting or only the rule is contracting and in this case the symbol never occurs to the right of a production. Context sensitive grammars are contracting. However, not all noncontracting grammars are context sensitive. It turns out, however, that all noncontracting grammars generate context sensitive languages. (This can be used also to show that the context sensitive languages are exactly those languages that are recognized by a linearly space bounded Turing machine.) Theorem 1.65 A language is context sensitive iff there is a noncontracting grammar that generates it. Proof. ( ) Immediate. ( ) Let G be a noncontracting grammar. We shall construct a grammar G which is context sensitive and such that L G L G . To this end, let X0 X1 Xm 1 Y0Y1 Yn 1 , m n, be a production. (As remarked above, we can reduce attention to such rules and rules of the form X a. Since the latter are not contracting, only the former kind needs attention.) We assume m new symbols, Z 0 , Z1 Zm 1 . Let be the
|

V " U

0k ) )

0 ) ) ) (

1D }

V s 0 ) ) ) (

k )k ( k } D 1

) aaa)

aa

V U T i
} ~

U e V

} V U D k i S
}

V U

U }

aa

Vk U

S T s

} V U V iU 1

V iU
~

V U

i k

62

Fundamental Structures

following set of rules. Z0 X1 X2 (1.89) Xm Z0 Z1 X2 Z0 Z1 Y0 Z1 Xm


1

Let G be the result of replacing all non context sensitive rules by . The new grammar is context sensitive. Now let us be given a derivation in G. Then replace every instance of a rule by the given sequence of rules in . This gives a derivation of the same string in G . Conversely, let us be given a derivation in G . Now look at the following. If somewhere the rule is applied, and then a rule from 1 then the instances commute unless 1 and the second instance is inside that of that rule instance of . Thus, by suitably reordering the derivation is a sequence of segments, where each segment is a sequence of the rule for some , so that it begins with X and ends with Y . This can be replaced by . Do this for every segment. This yields a derivation in G. Given that there are Type 0 languages that are not Type 0 (Theorem 1.96) the following theorem shows that the languages of Type 1 are not closed under arbitrary homomorphisms. Theorem 1.66 Let A be (distinct) symbols. For every language L over A of Type 0 there is a language M over A of Type 1 such that for i x every x L there is an i with M and every y M has the form i x with x L.

be a contracting rule. Then put


1

aa

aa

(1.91)

: X0 X1

Xm

m n

Y0Y1

aa

aa

(1.90)

X0 X1

T d ) ) s S

Xm

Proof. We put N :

. Let Yn
1

Y0Y1

Yn

1 i

T S l) s

aa

1 i

1 l)

aa

Y0Y1

Ym 2 Zm

Y0Y1

Yn

aa

Y0 Z1 Z2

Zm

Y0Y1 Z2

Zm

Z0 Z1

Zm

Zm

aa aa

aa aa

aa

Z0 Z1

Zm 2 Xm

Zm

1 1

aa

aa

aa aa

aa aa

X0 X1

Xm

Z0 X1

Xm

1 1

1 i 1 i

Rewriting Systems

63

is certainly not contracting. If is not contracting then put : . Let R consist of all rules of the form for R as well as the following rules.

i x for some x A . For strings Let M : L G . Certainly, y M only if y contain (or ) only once. Further, can be changed into only if it occurs directly before . After that we get followed by . Hence must occur after all occurrences of but before all occurrences of . Now consider the homomorphism v dened by v : and v : X X for X N, v : a a for a A. If i : i n is a derivation in G then v i : 0 i n is a derivation in G (if we disregard repetitions). In this way one shows that i x M implies x L G . Next, let x L G . Let : i n be a derivation i of x in G. Then do the following. Dene 0 : S and 1 . Further, ki for some k which is determined inductively. let i 1 be of the form i i It is easy to see that i 1 G i 2 , so that one can complete the sequence

i : i n 1 to a derivation. From kn x one can derive kn x. This shows that kn x M, as desired. Now let v : A B be a map. v (as well as the generated homomorphism v) is called free if v a for all a A.

v L1 , where v is free. Proof. Before we begin, we remark the following. If L A is a language and G N A R a grammar over A which generates L then for an arbitrary N B R is a grammar over B which generates L B . Therefore we B A may now assume that L1 and L2 are languages over the same alphabet. is seen as follows. We have G1 1 N1 A R1 and G2 2 N2 A R2 with L G1 L G2 . By renaming the nonterminals of G2 we can see to it that

) )

) )

0 ) ) ) ( 0 ) ) }) (

If i

1 then v L1 also is of Type i even if v is not free.

L 1 L2 , L 1 L2 , L 1

Theorem 1.67 Let L1 and L2 be languages of Type i, 0 following are also languages of Type i.

3. Then the

} T

V iU (

1 iH

i(

i V U

H D

) Fl) ) )

1 i

S T

1 i

2V U D

V U

S T

i(

1 i

S T

(1.92)

X
T

} T

1 i

i i 1 i

i(

64

Fundamental Structures

N1 N2 . Now we put N3 : N1 N2 (where N1 N2 ) and R : R 1 R2 . This denes G3 : N3 A R3 . This 1 2 is a grammar which generates L1 L2 . We introduce a new start symbol together with the rules 1 2 where 1 is the start symbol of G1 and G2 the start symbol of G2 . This yields a grammar of Type i except if i 3. In this case the fact follows from the results of Section 2.1. It is however not difcult to construct a grammar which is regular and generates the language L 1 L2 . Now for L1 . Let be the start symbol for a grammar G which generates L 1 . Then introduce a new symbol as well as a new start symbol together with the rules

This grammar is of Type i and generates L 1 . (Again the case i 3 is an exception that can be dealt with in a different way.) Finally, . Let v be free. We extend it by putting v X : X for all nonterminals X. Then replace the rules by v : v v . If i 0 2, this does not change the type. If i 1 we must additionally require that v is free. For if X is a rule and is a terminal string we may have v . This is however not the case if v is free. If i 3 again a different method must be used. For xY now after applying the replacement we have rules of the form X and X x, x x0 x1 xn 1 . Replace the latter by X x0 Z0 , Zi xi Zi 1 and Zn 2 xn 1Y and Zn 2 xn 1 , respectively. Denition 1.68 Let A be a (possibly innite) set. A nonempty set A is called an abstract family of languages (AFL) over A if the following holds.

We still have to show that the languages of Type i are closed with respect to intersections with regular languages. A proof for the Types 3 and 2 is found

If L1 L2

then also L1

L2

and L1 L2

If L

and R is a regular language then L R

If h : A A is a homomorphism and L h 1L B .

,B

A nite, then also .

t 1

If h : A

A is a homomorphism and L

then also h L

For every L

there is a nite B

A such that L

B . .

V U

i i i

V U

V iU

V i U

D V U

V U

} }

aa

i i

(1.93)

} 7}

) ) D
}

7}

7}

7}

S s

7}

} }

S s

tD

Rewriting Systems

65

in Section 2.1, Theorem 2.14. This proof can be extended to the other types without problems. The regular, the context free and the Type 0 languages over a xed alphabet form an abstract family of languages. The context sensitive languages fulll all criteria except for the closure under homomorphisms. It is easy to show that the regular languages over A form the smallest abstract family of languages. More on this subject can be found in (Ginsburg, 1975). Notes on this section. It is a gross simplication to view languages as sets of strings. The idea that they can be dened by means of formal processes did not become apparent until the 1930s. The idea of formalizing rules for transforming strings was rst formulated by Axel Thue (1914). The observation that languages (in his case formal languages) could be seen as generated from semi Thue systems, is due to Emil Post. Also, he has invented independently what is now known as the Turing machine and has shown that this machine does nothing but string transformations. The idea was picked up by Noam Chomsky and he dened the hierarchy which is now named after him (see for example (Chomsky, 1959), but the ideas have been circulating earlier). In view of Theorem 1.66 it is unclear, however, whether grammars of Type 0 or 1 have any relevance for natural language syntax, since there is no notion of a constituent that they dene as opposed to context free grammars. There are other points to note about these types of grammars. (Langholm, 2001) voices clear discontentment with the requirement of a single start symbol, which is in practice anyway not complied with. Exercise 27. Let T be a semi Thue system over A and A B. Then T is also a semi Thue system T over B. Characterize T B B by means of A A . Remark. This exercise shows that with the Thue system we T also have to indicate the alphabet on which it is based. Exercise 28. Let A be a nite alphabet. Every string x is the value of a constant term xE composed from constants a for every a A, the symbol , and . Let T be a Thue system over A. Write T E : xE yE : x y T . Let M be consist of Equations (1.27) and (1.28). T E is an equational theory. Show that x T y iff y T x iff T E M xE yE . Exercise 29. Prove the Commuting Instances Lemma. Exercise 30. Show that every nite language is regular. Exercise 31. Let G be a grammar with rules of the form X

. Show that

1 i i C0 !) (

1 D DD i

i IS

D DD

66

Fundamental Structures

L G is context free. Likewise show that L G is regular if all rules have the form X 0 1 where 0 A and 1 N . a is Exercise 32. Let G be a grammar in which every rule distinct from X strictly expanding. Show that a derivation of a string of length n takes at most 2n steps. Exercise 34. Write a Type 1 grammar for the language one for x x : x A . 6. Grammar and Structure

Processes that replace strings by strings can often be considered as processes that successively replace parts of structures by structures. In this section we shall study processes of structure replacement. They can in principle operate on any kind of structure. But we will restrict our attention to algorithms that generate ordered trees. There are basically two kinds of algorithms: the rst is like the grammars of the previous section, generating intermediate structures that are not proper structures of the language; and the second, which generates in each step a structure of the language. Instead of graphs we shall deal with socalled multigraphs. A directed multigraph is a structure V Ki : i n where is V a set, the set of vertices, and Ki V V a disjoint set, the set of edges of type i. In our case edges are always directed. We shall not mention this fact explicitly later on. Ordered trees are one example among many of (directed) multigraphs. For technical reasons we shall not exclude the case V , so that :i n also is a multigraph. Next we shall introduce a colouring on the vertices. A vertexcolouring is a function V : V FV where FV is a nonempty set, the set of vertex colours. Think of the labelling as being a vertex colouring on the graph. The principal structures are therefore vertex coloured multigraphs. However, from a technical point of view the different edge relations can also be viewed as colourings on the edges. Namely, if v and w are vertices, we colour the edge v w by the set i : v w Ki . This set may be empty. Denition 1.69 An FV FE coloured multigraph or simply a graph (over FV and FE ) is a triple V V E , where V is a (possibly empty) set and V : V FV as well as E : V V FE are functions.

0 a0

w() w (

1 d0 ) (

0 a0

Exercise 33. Show that the language

n n

:n

is context free.
n n n

:n

and

T S s

V U

T s S 0 S

( ) (

) (

0 ) (

1 i

i i yW IS

V U

Grammar and Structure

67 y

Figure 5. Graph Replacement

Now, in full analogy to the string case we shall distinguish terminal and nonterminal colours. For simplicity, we shall study only replacements of a single vertex by a graph. Replacing a vertex by another structure means embedding a structure into some other structure. We need to be told how to do so. Before we begin we shall say something about the graph replacement in general. The reader is asked to look at Figure 5. The graph 3 is the result of replacing in 1 the encircled dot by 2 . The edge colours are 1 and 2 (the vertex colours pose no problems, so they are omitted here for clarity). E E K be a graph and M1 and M2 be disjoint subsets of E Let i i i i with M1 M2 E. Put i Mi V E , where V : V Mi and E : E Mi Mi . These graphs do not completely determine since there is no information on the edges between them. We therefore dene functions in out : M2 FE M1 , which for every vertex of M2 and every edge colour name the set of all vertices of M1 which lie on an edge with a vertex that either is directed into M1 or goes outwards from M1 .

out p 1

out p 2

T ) S V ) U D V ) U

T S

V ) U V ) U

(1.95)

in p 1

in p 2

wy

It is clear that 1 , 2 and the functions in and out determine In our example we have

(1.94b)

out x f :

M1 : f

Ta0 ) aU V ( TV a0 ) aU (

V ) U V ) U

(1.94a)

in x f :

M1 : f

E y x E x y

completely.

)D (

c
2

68

Fundamental Structures
i

Now assume that we want to replace 2 by a different graph . Then not only do we have to know but also the functions in out : H FE M1 . This, however, is not the way we wish to proceed here. We want to formulate rules of replacement that are general in that they do not presuppose exact knowledge about the embedding context. We shall only assume that the functions in x f and out x f , x H, are systematically dened from the sets in y g , out y g , y M2 . We shall therefore only allow to specify how the sets of the rst kind are formed from the sets of the second kind. This we do by means of four socalled colour functionals. A colour functional from to 2 is a map

In our case a functional is a function from a b c 1 2 to p 1 2 . We can simplify this to a function from a b c 1 2 to 1 2 . The colour functionals are called , , and . For the example of Figure 5 we get the following colour functionals (we only give values when the functions do not yield ). : : c2

The result of substituting 2 by by means of the colour functionals from is denoted by . This graph is the union of 1 and together 2 : with the functions in and out , which are dened as follows.

If g x f we say that an edge with colour g into x is transmitted as an ingoing edge of colour f to y. If g x f we say that an edge with colour g going out from x is transmitted as an ingoing edge with colour f to and . So, we do allow for an edge to change colour y. Analogously for and direction when being transmitted. If edges do not change direction, we only need the functionals and , which are then denoted simply by and . Now we look at the special case where M2 consists of a single element,

0 aV )  1 U

in x g : g

x f

0 aV ) U

V ) "1 U

V ) U

V ) U

V ) U

out

x f :

(1.98)

out x g : g

0 aV ) C1 U

out x g : g

0 aV )  1 U

V ) U

V ) U

V ) U

in

x f :

in x g : g

x f

x f

x f

T "0 ) ( S T C0 ) ( S

"

T C0 ) ( S

V ) x1 U

(1.97)

: b1

: a2

V IT ) !U S e T !U S

T ) dT ) ) S S e T ) T ) ) S S e

(1.96)

:H

FE

M2

FE

V ) U

V ) U V ) U V ) U

V IT ) S

Grammar and Structure

69

Denition 1.70 A context free graph grammar with edge replacement a context free grammar for short is a quintuple of the form

in which FV is a nite set of vertex colours, FE a nite set of edge colours, T FV FV a set of socalled terminal vertex colours, a graph over FV and FE , the socalled start graph, and nally R a nite set of triples X T such that X FV FV is a nonterminal vertex colour, a graph over FV and FE and is a matrix of colour functionals. A derivation in a grammar is dened as follows. For graphs and 1 with the colours FV and FE , means that there is X R such R : , where is a subgraph consisting of a single vertex x that having the colour X. Further we dene R to be the reexive and transitive closure of 1 and nally we put if R . A derivation terminates R if there is no vertex with a nonterminal colour. We write L for the class of graphs that can be generated from . Notice that the edge colours only the vertex colours are used to steer the derivation. We also dene the productivity of a rule as the difference between the cardinality of the replacing graph and the cardinality of the graph being replaced. The latter is 1 in context free grammars, which is the only type 1. It equals 1 if the we shall study here. So, the productivity is always replacing graph is the empty graph. A rule has productivity 0 if the replacing graph consists of a single vertex. In the exercises the reader will be asked to verify that we can dispense with rules of this kind. Now we shall dene two types of context free grammars. Both are context free as grammars but the second type can generate nonCFLs. This shows that the concept of grammar is more general. We shall begin with ordinary CFGs. We can view them alternatively as grammars for string replacement or as grammars that replace trees by trees. For that we shall now assume that there are no rules of the form X . (For such rules generate trees whose leaves are not necessarily marked by letters from A. This case can be treated if we allow labels to be in A A , which we shall not do here.) Let G A N R be such a grammar. We put FV : A N 2 . We 0 for X 0 and X 1 for X 1 . F T : A N 0 . FE : . Furwrite X V thermore, the start graph consists of a single vertex labelled 1 and no edge.
i

0 ) i ) (

Tj)8S } U s D

1 0 ) 2) (

V U

T e S

v f

T s S

0 ) (

0 )

0 ) ( 0 ) ) ) (

) (

(1.99)

T FV FV FE R

say x. In this case a colour functional simply is a function FE .

:H

FE

70

Fundamental Structures

The rules of replacement are as follows. Let X 0 1 n 1 be a rule from G, where none of the i is . Then we dene a graph as follows. H : yi : i n x . V x X 0 , V yi i if i A and V yi i1 if i N. (1.100)

(1.101)

u u

: :

u u

: :

(1.102)

G :

T FE FE FT R

We shall show that this grammar yields exactly those trees that we associate with the grammar G. Before we do so, a few remarks are in order. The nonterminals of G are now from a technical viewpoint terminals since they are also part of the structure that we are generating. In order to have any derivation at all we dene two equinumerous sets of nonterminals. Each nonterminal N is split into the nonterminal N 1 (which is nonterminal in the new grammar) and N 0 (which is now a terminal vertex colour). We call the rst kind active, nonactive the second. Notice that the rules are formulated in such a way that only the leaves of the generated trees carry active nonterminals. A sinhas been gle derivation step is displayed in Figure 6. In it, the rule applied to the tree to the left. The result is shown on the right hand side. It is easy to show that in each derivation only leaves carry active nonterminals. This in turn shows that the derivations of the grammar are in one to one correspondence with the derivations of the CFGs. We put

aa

This is the for each X on labelled if (i) of is maximal

class of trees generated by G, with X 0 and X 1 mapped to X N. The rules of G can therefore be interpreted as conditions ordered trees in the following way. is called a local subtree it has height 2 (so it does not possess inner nodes) and (ii) it with respect to inclusion. For a rule X Y0Y1 Yn 1 we

V U

(1.103)

LB G :

h L G

Finally we put :

. R :

R .

) 0 ) ) ) ( D S 0 IT ) ') ) ( S i D D T8S T V i) U 8S V i) U D D T j S V j ) U S T j V ) U j D D

This denes

Now we dene the colour functionals. For u

bT

yi y j : i

) bT

E 1

yi x : i

n we put

V U

aa

V U

0 ) @S ( V T j S I!U D 0 ) @S ( V T S I8!U D

V U

T T S s

Grammar and Structure

71

Figure 6. Replacement in a Context Free Grammar

Proposition 1.71 Let G N AR. isomorphic to an such that R.

LB G iff every local tree of

Theorem 1.72 Let B be a set of trees over an alphabet A N with terminals n of from A. Then B LB G for a CFG G iff there is a nite set i :i trees of height 2 and an S such that B exactly if the root carries label S, a label is terminal iff the node is a leaf, and

every local tree is isomorphic to some

i.

We shall derive a few useful consequences from these considerations. It is clear that G generates trees that do not necessarily have leaves with terminal symbols. However, we do know that the leaves carry labels either from A or 1 while all other nodes carry labels from N 0 : N 0 . from N 1 : N For a labelled tree we dene the associated string sequence k in the usual A N be dened way. This is an element of A N 1 . Let v : A N 2 by v a : a, a A and v X 0 : v X 1 : X for X N.

T e S

yi : i n x , : yi x : i dene L : j n , and nally x : X, yi : Yi . : isomorphism between labelled ordered trees C is a bijective map h : B C such and hx x for all x B.

1 1 V U 0 ) ) ) ( } D 1 V U 3 aV U U 3 V j D 0 3) j) $i) D ( 0 D) ) i) ( 3 j 0 ) ) i) ( D 3 j V b3 U V U 3 T ) @S ( j T D 0 ) @S D ( T QT S Ds S D D D

n , : L B that h

yi y j : i . Now, an and ,h is

U s

U V

V U D s U

V U

T e S

D 1

V U

72

Fundamental Structures

Proof. Induction over the length of the derivation. If the length is 0 then 1 and v 1 . Since G this case is settled. Now let be the on where result of an application of some rule X . We then have k A N 1 . The rule has been applied to a leaf; this leaf corresponds to an occurrence of X 1 in k . Therefore we have k 1 X 1 2 . Then 1 2 . k is the result of a single application of the rule from k k . Denition 1.74 Let be a labelled ordered tree. A cut through is a maximal set that contains no two elements comparable by . If is exhaustively ordered, a cut is linearly ordered and labelled, and then we also call the string associated to this set a cut.

This theorem shows that the tree provides all necessary information. If you have the tree, all essential details of the derivation can be reconstructed (up to commuting applications of rules). Now let us be given a tree and let be a cut. We say that an occurrence C of in is a constituent of category X in if this occurrence of in is that cut dened by on x where x carries the label X. This means that 1 2 , C 1 2 , and x contains exactly those nodes that do not belong to 1 or 2 . Further, let G be a CFG. A substring occurrence of is a Gconstituent of category X in if there is a Gtree for which there exists a cut such that the occurrence is a constituent of category X. If G is clear from the context, we shall omit it. Lemma 1.76 Let be a Gtree and a cut through a tree with associated string and v v . . Then there exists

Lemma 1.77 Let G 1 2 , C 1 2 an occurrence of as a G constituent of category X. Then C is a Gconstituent occurrence of X in 1 X 2 . C X For a proof notice that if 1 2 is a cut and is a constituent of category X therein then 1 X 2 also is a cut. Theorem 1.78 (Constituent Substitution) Suppose that C is an occurrence of as a Gconstituent of category X. Furthermore, let X G . Then G C 1 2 and C is a Gconstituent occurrence of of category X.
~

V iU

0 i) i(

i i W i W i

V iU

0 i) i(

V iU

i Wi W i

i W i W i

Proposition 1.75 Let G

and let be a cut through

. Then G

v .

V iU `

i W

W i

s x1 i U V U D

V U

i W

Lemma 1.73 Let G


}

and

. Then

A N1

and G

v .

W i

i W

i Wi W i

i Wi W i

W i

s "U U 1 V D D D V U V V iU V "U i U

Grammar and Structure

73

Proof. By assumption there is a tree in which is a constituent of category X in 1 2 . Then there exists a cut 1 X 2 through this tree, and by Lemma 1.76 there exists a tree with associated string 1 X 2 . Certainly we have that X is a constituent in this tree. However, a derivation X G can in this case be extended to a Gderivation of 1 2 in which is a constituent. Lemma 1.79 Let G be a CFG. Then there exists a number k G such that for each derivation tree of a string of length k G there are two constituents y and z of identical category such that y z or z y, and the associated strings are different. Proof. To begin, notice that nothing changes in our claim if we eliminate the unproductive rules. This does not change the constituent structure. Now let be the maximum of all productivities of rules in G, and : N . Then let kG : 1 1. We claim that this is the desired number. (We can assume that 0. Otherwise G only generates strings of length 1, and then k G : 2 k G . Then there exists in satises our claim.) For let x be given such that x every derivation tree a branch of length . (If not, there can be no more than leaves.) On this branch we have two nonterminals with identical label. The strings associated to these nodes are different since we have no unproductive rules. We say, an occurrence C is a left constituent part (right constituent part) if C is an occurrence of a prex (sufx) of a constituent. An occurrence of x contains a left constituent part z if some sufx of x is a left constituent part. We also remark that if u is a left constituent part and a proper substring of x then x v v1 u with v1 a possibly empty sequence of constituents and v a right constituent part. This will be of importance in the sequel. Lemma 1.80 Let G be a CFG. Then there exists a number k G such that for every derivation tree of a string x and every occurrence in x of a string z of length kG z contains two different left or two different right constituent parts y and y1 of constituents that have the same category. Moreover, y is a prex of y1 or y1 a prex of y in case that both are left constituent parts, and y is a sufx of y1 or y1 a sufx of y in case that both are right constituent parts. Proof. Let : N and let be the maximal productivity of a rule from G. We can assume that 2. Put kG : 2 2 . We show by induction on the number m that a string of length 2 2 m has at least m left or at least
d

i W i

i Wi W i

W i

f 1

i W V

W i
f

g f U g U k D

i i i

i k

g V

i W i W i i i D
f

g U

74

Fundamental Structures

m right constituent parts that are contained in each other. If m 1 the claim is trivial. Assume that it holds for m 1. We shall show that it also holds for m 1. Let z be of length 2 2 m 1 . Let x i 2 2 xi for certain xi with length at least 2 2 m . By induction hypothesis each xi contains at least m constituent parts. Now we do not necessarily have 2 2 m constituent parts in x. For if xi contains a left part then x j with j i may contain the corresponding right part. (There is only one. The sections in between contain subwords of that constituent occurrence.) For each left constituent part we count at most one (corresponding) right constituent part. In total we have at least 1 m m 1 constituent parts. However, we have to verify that at least m 1 of these are contained inside each other. Assume this is not the case, for all i. Then xi , i 2 2, contains exactly m left or exactly m right constituent parts. Case 1. x0 contains m left constituent parts inside each other. If x1 also contains m left constituent parts inside each other, we are done. Now suppose that this is not the case. Then x 1 contains m right constituent parts inside each other. Then we obviously get m entire constituents stacked inside each other. Again, we would be done if x 2 contained m right constituent parts inside each other. If not, then x2 contains exactly m left constituent parts. And again we would be done if these would not correspond to exactly m right part that x3 contains. And so on. Hence we get a sequence of length of constituents which each contain m constituents stacked inside each other. Now three cases arise: (a) one of the constituents is a left part of some constituent, (b) one of the constituent is a right part of some constituent. (For if neither is the case, we have a rule of arity , a contradiction.) In Case (a) we evidently have m 1 left constituent parts stacked inside each other, and in Case (b) m 1 right constituent parts. Case 2. x 0 contains m right hand constituents stacked inside each other. Similarly. This shows our auxiliary claim. Putting m : 1 the main claim now follows. Theorem 1.81 (Pumping Lemma) Given a CFL L there exists a p L such that for every string z L of length at least p L and an occurrence of a string r of length at least pL in z, z possesses a decomposition (1.104) x y such that the following holds. Either the occurrence of x or the occurrence of y is contained in the specied occurrence of r .

u x v y w

i U

V g f U i g

i Wi Wi Wi W i i D i i 1 i

g U i i

QW i i D

g g U

Grammar and Structure

75

(The last property is called the pumpability of the substring occurrences of x and y.) Alternatively, in place of one may require that v p L . Further we can choose pL in such a way that every derivable string with designated occurrences of a string of length pS can be decomposed in the way given. Proof. Let G be a grammar which generates L. Let p L be the constant dened in Lemma 1.80. We look at a Gtree of z and the designated occurrence of r . Suppose that r has length at least pL . Then there are two left or two right constituent parts of identical category contained in r . Without loss of generality we assume that r contains two left parts. Suppose that these parts are not fully contained in r. Then r s x s1 where x s1 and s1 are left constituent parts of 0. There are s 2 and y such that v : s1 s2 identical category, say X. Now x and x s1 s2 y are constituents of category X. Hence there exists a decomposition

where v is a constituent of the same category as x v y satisfying and . By the Constituent Substitution Theorem we may replace the occurrence of x v y by v as well as v by x v y. This yields , after an easy induction. Now let the smaller constituent part be contained in r but not the larger one. Then we have a decomposition r s x v s1 such that v is a constituent part of category X and x v s1 a left constituent part of a constituent of category X. Then there exists a s2 such that also xvs1 s2 is a constituent of category X. Now put y : s 1 s2 . Then we also have y . The third case is if both parts are proper substrings of r . Also here we nd the desired decomposition. If we want to have in place of that v is as small as possible then notice that v already is a constituent. If it has length 1 then there is a decomposition of v such that it contains pumpable substrings. Hence in place of we may require that v pG. The Pumping Lemma can be stated more concisely as follows. For every large enough derivable string x there exist contexts C, D, where C , and a string y such x D C y , and D C k y L for every k . The strongest form of a pumping lemma is the following. Suppose that we have two decompositions into pumping pairs u 1 x1 v1 y1 w1 , u2 x2 v2 y2 w2 . We say that the two pairs are independent if either (1a) u 1 x1 v1 y1 is a prex of u2 , or (1b) u2 x2 v2 y2 is a prex of u1 , or (1c) u1 x1 is a prex of u2 and y1 w1 a sufx of w2 , or (1d) u2 x2 is a prex of u1 and y2 w2 a sufx of

i i i

0 ) (

i W i i i yW i i i W i i i i i QW dW dW i i W i W i W i W i i W i W i W i W i

i i

i i i

1 aV U V i

V i aV U U i

i i W i W i W i

i Wi Wi Wi W i

i i i i

i i i

i i iiiD

g f U

(1.105)

u x v y w

i i

i i

R ni

} T

i i i

i W i W i W i W bS i i i i D i i i

u xi v yi w : i

L.

i i i i

i W i i

i i i

76

Fundamental Structures

w1 and (2) each of them can be pumped any number of times independently of the other. Theorem 1.82 (Manaster-Ramer & Moshier & Zeitman) Let L be a CFL. Then there exists a number mL such that if x L and we are given kmL occurrences of letters in x there are k independent pumping pairs, each of which contains at least one and at most mL of the occurrences. This theorem implies the wellknown Ogdens Lemma (see (Ogden, 1968)), which says that given at least mL occurrences of letters, there exists a pumping pair containing at least one and at most m L of them. Notice that in all these theorems we may choose i 0 as well. This means that not only we can pump up the string so that it becomes longer except if i 1, but we may also pump it down (i 0) so that the string becomes shorter. However, one can pump down only once. Using the Pumping Lemma we can show that the language n n n : n is not context free. For suppose the contrary. Then there is an m such that for all k m the string k k k can be decomposed into

k times the letters , and . It is clear The string v x contains exactly that we must have v . For if v contains two distinct letters, say and , then v contains an occurrence of before an occurrence of (certainly not the other way around). But then v 2 contains an occurrence of before an occurrence of , and that cannot be. Analogously it is shown that y . But this is a contradiction. We shall meet this example of a nonCFL quite often in the sequel. The second example of a context free graph grammar shall be the so called tree adjunction grammars. We take an alphabet A and a set N of nonterminals. A centre tree is an ordered labelled tree over A N such that all leaves have labels from A all other nodes labels from N. An adjunction tree is an ordered labelled tree over A N which is distinct from ordinary trees in that of the leaves there is exactly one with a nonterminal label; this label is the same as that of the root. Interior nodes have nonterminal labels. We require that an adjunction tree has at least one leaf with a terminal symbol.

s s } i p H v d3

i W i W i W i Wi

(1.107)

u v2 w x2 y

Furthermore there is an

i Wi W i Wi Wi

D 

(1.106)

k k k

u v w x y k such that

1 i

s s p

i yW i

1 i p

Grammar and Structure

77

Figure 7. Tree Adjunction

An unregulated tree adjunction grammar, briey UTAG, over N and A, N A where is a nite set of centre trees over N and is a quadruple A, and a nite set of adjunction trees over N and A. An example of a tree adjunction is given in Figure 7. The tree to the left is adjoined to a centre tree with root X and associated string ; the result is shown to the right. Tree B be a tree adjunction can formally be dened as follows. Let and A m an adjunction tree. We assume that r is the root of

0 3) j) i) (

S S

H H

0 ) ) ) (

0 i) ( ) j)

78

Fundamental Structures

and that s is the unique leaf such that m r m s . Now let x be a node of B such that x m r . Then the replacement of x by is dened by naming the colour functionals. These are (1.108) y y y : : : y y y y : : : :

Two things may be remarked. First, instead of a single start graph we have a nite set of them. This can be remedied by standard means. Second, all vertex colours are terminal as well as nonterminal. One may end the derivation at any given moment. We have noticed in connection with grammars for strings that this can be remedied. In fact, we have not dened context free grammars but context free quasi grammars . However, we shall refrain from being overly pedantic. Sufce it to note that the adjunction grammars do not dene the same kind of generative process if dened exactly as above. Finally we shall give a graph grammar which generates all strings of the o form n n n , n 0. The idea for this grammar is due to Uwe M nnich (1999). We shall exploit the fact that we may think of terms as structures. We posit a ternary symbol, , which is nonterminal, and another ternary symbol, , which is terminal. Further, there is a binary terminal symbol . The rules are as follows. (To enhance readability we shall not write terms in Polish Notation but by means of brackets.) xyz xyz x y z
X

(1.110)

xyz

These rules constitute a socalled term replacement system. The start term . Now suppose that u v is a rule and that we have derived a is term t such that u occurs in t as a subterm. Then we may substitute this

T 8S

V i) U

V i) U

(1.109)

if y s, else.
f

T j S

V j ) U

if y s, else.
j

V j ) U

if s y, else.
j

V U

V U

V ) ) U V ) )  U X ) V W ') W l) W V ) )  p U U H

T j S V j ) U  T ) j S i% D

w V j ) U T 8S D

w V i) U D w V i) U  T S 8% D

V U

V b3 U

V p) U 8'Fl) F H

Grammar and Structure

79

occurrence by v . Hence we get the following derivations.

Notice that the terms denote graphs here. We make use of the dependency coding. Hence the associated strings to these terms are , and . In order to write a graph grammar which generates the graphs for these , terms we shall have to introduce colours for edges. Put FE : 0 1 2 T : FV : , and FV . The start graph is as follows. It has four vertices, p, q, r and s. ( is empty (!), and q r s.) The labelling ,q ,r and s . is p

There are two rules of replacement. The rst can be written schematically as follows. The root, x, carries the label and has three incoming edges; their colours are 0, 1 and 2. These come from three disjoint subgraphs, 0 , and and in which 1 and 2 , which are ordered trees with respect to there are no edges with colour 0, 1 and 2. In replacement, x is replaced by a graph consisting of seven vertices, p, q i , ri and si , i 2, where qi r j sk ,

(With p q0 r0 s0 we reproduce the begin situation.) The tree 0 is attached to q0 to the right of q1 , 1 to r0 to the right of r1 and 2 to s0 to the right of

(1.113)

p W

H W

q0 q1

r0 r1

s0 s1

T 0

() 0 )

() 0

( @S

i j k 2, and q colouring is

p, r

p and s

p.

q 1 q0

r1 r0

s1 s0 . The

p 3

(1.112)

T ) j i) ) ) S

p p F@

H @H

p T p) 'l) ) S H X D

T p) @'l) ) FCS ) H X D

3H

(1.111)

V iVbp W !U W 'y W U p p)V Vibp W !U W 'yV W U V p p) Vbp Vbp V bp

) W lyV ) W lyV p) W 'F p) W 'F p) W 'F

W U W H H W U W H H W l) W H W l) W H W l) W H )V p) b'l)

U H X U  H U V p) 8'l) H U H X U8'l) V p) H V p) 8'l) H

U U  U  ) p p p @F@ ) S

) )

H H @H

80

Fundamental Structures

s1 . Additionally, we put x p for all vertices x of the i . (So, the edge x p has colour for all such x.) By this we see to it that in each step the union of the relations , 0, 1 and 2 is the intended tree ordering and that there always exists an ingoing edge with colour 0, 1 and 2 into the root. The second replacement rule replaces the root by a one vertex graph with label at the root. This terminates the derivation. The edges with label 0, 1 and 2 are transmitted under the name . This completes the tree. It has the desired form. Exercise 35. Strings can also be viewed as multigraphs with only one edge colour. Show that a CFG for strings can also be dened as a context free grammar on strings. We shall show in Section 2.6 that CFLs can also be generated by UTAGs, but that the converse does not hold. Exercise 36. Show that for every context free grammar there exists a context free grammar which has no rules of productivity 1 and which generates the same class of graphs. Exercise 37. Show that for every context free grammar there exists a context free grammar with the same yield and no rules of productivity 0. Exercise 38. Dene unregulated string adjunction grammars in a similar way to UTAGs. Take note of the fact that these are quasigrammars. Characterize the class of strings generated by these grammars in terms of ordinary grammars. Exercise 39. Show that the language w w : w A is not context free but that it satises the Pumping Lemma. (It does not satisfy the Interchange Lemma (2.111).) 7. Turing machines

We owe to (Turing, 1936) and (Post, 1936) the concept of a machine which is very simple and nevertheless capable of computing all functions that are believed to be computable. Without going into the details of what makes a function computable, it is nowadays agreed that there is no loss if we dene computable to mean computable by a Turing machine. The essential idea was that computations on objects can be replaced by computations on strings. The number n can for example be represented by n 1 successive strokes on a piece of paper. (So, the number 0 is represented by a single stroke. This is

0 ) (

1 i

i i W S

Turing machines

81

really necessary.) In addition to the stroke we have a blank, which is used to separate different numbers. The Turing machine, however powerful, takes a lot of time to compute even the most basic functions. Hence we agree from the start that it has an arbitrary, nite stock of symbols that it can use in addition to the blank. A Turing machine is a physical device, consisting of a tape which is innite in both directions. That is, it contains cells numbered by the set of integers (but the numbering is irrelevant for the computation). Each cell may carry a symbol from an alphabet A or a blank. The machine possesses a read and write head, which can move between the cells, one at a time. Finally, it has nitely many states, and can be programmed in the following way. We assign instructions for the machine that tell it what to do on condition that it is in state q and reads a symbol a from the tape. These instruction tell the machine whether it should write a symbol, then move the head one step or leave it at rest, and subsequently change to a state q . Denition 1.83 A (nondeterministic) Turing machine is a quintuple

where A is a nite set, the alphabet, L A is the socalled blank, Q a nite set, the set of (internal) states, q 0 Q the initial state and

L . Often, we use or even Here, we have written AL in place of A as particular blanks. What this describes physically is a machine that has a twosided innite tape (which we can think of as a function : A L ), with a read/write head positioned on one of the cells. A computation step is as follows. Suppose the machine scans the symbol a in state q and is on cell i . Then if b 1 q f a q , the machine may write b in place of a, advance to cell i 1 and change to state q . If b 0 q f a q the machine may write b in place of a, stay in cell i and change to state q . Finally, if b 1q f a q , the machine may write b in place of a, move to cell i 1 and switch to state q . Evidently, in order to describe the process we need (i) the tape, (ii) the position of the head of that tape, (iii) the state the machine is currently in. We assume throughout that the tape is almost everywhere lled by a blank. (The locution almost all and almost everywhere is often used

V ) U

RjV ) U

1 0 dk ) ) (

T s S

V ) U

the transition function. If for all b is called deterministic.

A L and q

e T ) )

v S 8e

1 y0 k ) ) (

V ) U

(1.115)

f : AL

AL

101

0 )
Q Q f bq 1, the machine

) ) ) ( g

(1.114)

A L Q q0 f

1 C0 k )

v i) (

82

Fundamental Structures

in place all but nitely many and all but nitely many places, respectively.) This means that the content of the tape plus the information on the machine may be coded by a single string, called conguration. Namely, if the tape is almost everywhere lled by a blank, there is a unique interval m n which contains all nonblank squares and the head of the machine. Suppose that the machine head is on Tape . Then let x1 be the string dened by the interval m 1 (it may be empty), and x2 the string dened by the interval n . Finally, assume that the machine is in state q. Then the string x 1 q x2 is the conguration corresponding to that phyical conguration. So, the state of the machine is simply written behind the symbol of the cell that is being scanned. (Obviously, A and Q are assumed to be disjoint.) Denition 1.84 Let T A Q q0 f be a Turing machine. A T conguration is a string xqy AL Q AL such that x does not begin and y does not end with a blank. This conguration corresponds to a situation that the tape is almost empty (that is, almost all occurrences of symbols on it are blanks). The nonempty part is a string x, with the head being placed somewhere behind the prex u. Since x u v for some v, we insert the state the machine is in between u and v. The conguration omits most of the blanks, whence we have agreed that uqv is the same conguration as uqv and the same uqv . We shall now describe the working of the machine using congurations. We say, x q y is transformed by T in one step into x 1 q1 y1 and write x q y T x1 q1 y1 if one of the following holds.

Now, for T congurations Z and Z we dene Z n Z inductively by (a) T n Z 0 Z iff Z Z and (b) Z T 1 Z iff for some Z we have Z n Z T Z . T T It is easy to see that we can dene a semi Thue system on congurations that mimicks the computation of T . The canonical Thue system, C T , is shown in Table 2. (x and y range over A L and q and q over Q.) Notice that we have to take care not to leave a blank at the left and right end of the strings. This is why the denition is more complicated than expected. The

V U

~ F

V ) U

kk

1 d0

v i) (

kk

i QW

We have x

x1 c and y1

b y as well as c

1 q1

V ) U

1 0

) ) (

i W

We have x1

x c and y

b y1 as well as c 1 q1

f bq. f bq.

i yW

i yW

x1 x, and for some v and b and c we have y well as c 0 q1 f bq.

b v and y1

c v, as

R3 ) i

) D i

i W W i i dW

W i

i i

0 )

i i

e e ) ) ( ) I

i D V ) U

W i i D i Wi D 10 ) ) ( i i D i W W i ~ i W Wi i QW W i

1 i i

i i

v 3 Q)
~

i i

Turing machines Table 2. The Canonical Thue System

83

alphabet of the semi Thue system is Q A L . The following is easily shown by induction. Proposition 1.85 Let T be a Turing machine, C T be its associated semi Thue system. Then for all T congurations Z and Z and for all n 0: Z n Z T n n iff Z C T Z . Moreover, if Z is a T conguration and Z C T u for an arbitrary string u Q AL , then u is a T conguration and Z n u. T Of course, the semi Thue system denes transitions on strings that are not congurations, but this is not relevant for the theorem. Denition 1.86 Let T be a Turing machine, Z a conguration and x A . Z is called an end conguration if there is no conguration Z such that Z T Z . T accepts x if there is an end conguration Z such that q 0 x T Z. The language accepted by T , L T , is the set of all strings from A which are accepted by T . It takes time to get used to the concept of a Turing machine and the languages that are accepted by such machines. We suggest to the interested reader to play a little while with these machines and see if he can program them to compute a few very easy functions. A rst example is the machine which computes the successor function on binary strings. Assume our alphabet is . We want to build a machine which computes the next string for x in the numerical encoding (see Section 1.2 for its denition). This means that if

1 i

i W

k V U

) U

) U dk ) ) ( 0 i k a) i 1 0 i ) U d0 k ) i( 0 k a) 1 v) i ) U d0 k ) i) ( 0 i k ) 1 v 1 i U 0 k ) D i) ( 0 i k a) i 1 v i ) ) V U d0 k ) ( k 1 ) 0 V ) U d0 k ) ( 0 i k ) i 1 ) V U k ) ) ( 0 k a) ) 1 0 i V s i U V U V

1 d0 k ) ) (

0 ik

i i) i

i( 6@S ) @S ( ( @S i( 6@S i( 6@S i( 6@S i @S ( s i s s s

i( '@S U x1 i s s s s k

C T :

uqxv uyq v : y 1 q

f x q ; u or y A; v or x A uq uyq : y 1 q f q ; u or y A qxv q v : 1q f x q ; v or x A qq : q f q 101 uxqv uq yv : y 1 q f xq; u or x A; v or y A qv q yv : y 1 q f q ; v or y A uxq uq : 1q f x q ; u or x A uqxv uq yv : y 0 q f xq; v or x y A

T 1 ) i D V T 1 i V D i V T 1 T 1 D i D V ) TT ) ) 81 v S T 1 i T 1 D i T 1 iD D i V D

V U

T r) q S

84

Fundamental Structures

Table 3. The Successor Machine

q3

the machine starts with q0 x it shall halt in the conguration q 0 y where y is the word immediately following x in the numerical ordering. (If in the sequel we think of numbers rather than strings we shall simply think instead of the string x of the number n, where x occupies the nth place in the numerical ordering.) How shall such a machine be constructed? We need four states, q i , i 4. First, the machine advances the head to the right end of the string, staying in q0 until it reads . Finally, when it hits , it changes to state q 1 and starts moving to the left. As long as it reads , it changes to and continues in state q1 , moving to the left. When it hits , it replaces it by , moves left and changes to state q2 . When it sees a blank, that blank is lled by and the machine changes to state q3 , the nal state. In q2 , the machine simply keeps moving leftwards until it hits a blanks and then stops in state q 3 . The machine is shown in Table 3. (If you want a machine that computes the successor in the binary encoding, you have to replace Line 6 by 1 q 3 .) In recursion theory the notions of computability are dened for functions on the set of natural numbers. By means of the function Z, which is bijective, these notions can be transferred to functions on strings. Denition 1.87 Let A and B be alphabets and f : A B a function. f is called computable if there is a deterministic Turing machine T such that for every x A there is a qt Q such that q0 x T qt f x and qt f x is an end conguration. Let L A . L is called recursively enumerable if L or there is a computable function f : A such that f L.

w D

D V U i

b T r) q S

v ) r ( i8a

i W

V U i

bS T r) q

i QW

q2

0 0 0

) ) )

q1

0 0

) ( ) v) r iba( v) q i( v) q i( v) q i( v) r iba( v) i( ) ba( ) r ) ( ) q

q0

1 q0 1 q0 1 q1 1 q2 1 q1 1 q3 1 q2 1 q2 0 q3

i W 1

1 i

Turing machines

85

The proof is a construction of a machine U from machines T and T computing f and g, respectively. Simply write T and T using disjoint sets of states, and then take the union of the transition functions. However, make the transition function of T rst such that it changes to the starting state of T as soon as the computation by T is nished (that is, whenever T does not dene any transitions).

Write a machine that generates all strings of A in successive order (using the successor machine, see above), and computes f x for all these strings. As soon as the target string is found, the machine writes x and deletes everything else. Lemma 1.90 Let A and B be nite alphabets. Then there are computable bijections f : A B and g : B A such that f g 1 . In this section we shall show that the recursively enumerable sets are exactly the sets which are accepted by a Turing machine. Further, we shall show that these are exactly the Type 0 languages. This establishes the rst correspondence result between types of languages and types of automata. Following this we shall show that the recognition problem for Type 0 languages is in general not decidable. The proofs proceed by a series of reduction steps for Turing machines. First, we shall generalize the notion of a Turing machine. A ktape Turing machine is a quintuple A L Q q 0 f where A, L, Q, and q0 are as before but now

This means, intuitively speaking, that the Turing machine manipulates k tapes in place of a single tape. There is a read and write head on each of the tapes. In each step the machine can move only one of the heads. The next state depends on the symbols read on all the tapes plus the current internal state. The initial conguration is as follows. All tapes except the rst are empty.

e T ) )

v S 8e

(1.116)

f : Ak L

Ak L

101

Lemma 1.89 Let f : A B be computable and bijective. Then f A also is computable (and bijective).

0 )

V U i

) ) ) (

Lemma 1.88 Let f : A B and g : B Then g f : A C is computable as well.

L is decidable if both L and A

L are recursively enumerable. C be computable functions.

1:

86

Fundamental Structures

The heads are anywhere on these tapes (we may require them to be in position 0). On the rst tape the head is immediately to the left of the input. The ktape machine has k 1 additional tapes for recording intermediate results. The reader may verify that we may also allow such congurations as initial congurations in which the other tapes are lled with some nite string, with the head immediately to the left of it. This does not increase the recognition power. However, it makes the denition of a machine easier which computes a function of several variables. We may also allow that the information to the right of the head consists in a sequence of strings each separated by a blank (so that when two successive blanks follow the machine knows that the input is completely read). Again, there is a way to recode these machines using a basic multitape Turing machine, modulo computable functions. We shall give a little more detail concerning the fact that also ktape Turing machines (in whatever of the discussed forms) cannot compute more functions than 1tape machines. For this dene the following coding of the k tapes using a single tape. We shall group 2k cells together to a macro cell. The (micro) cell 2kp 2m corresponds to the entry on cell p on Tape m. The (micro) cell number 2kp 2m 1 only contains or depending on whether the head of the machine is placed on cell p on tape m. (Hence, every second micro cell is lled only with or .) Now given a ktape Turing machine T we shall dene a machine U that simulates T under the given coding. This machine operates as follows. For a single step of T it scans the actual string for the positions of the read and write heads and remembers the symbols on which they are placed (they can be found in the adjacent cell). Remembering this information requires only nite amount of memory, and can be done using the internal states. The machine scans the tape again for the head that will have to be changed in position. (To identify it, the machine must be able to do calculations modulo 2k. Again nite memory is sufcient.) It adjusts its position and the content of the adjacent cell. Now it changes into the appropriate state. Notice that each step of T costs 2k x time for U to simulate, where x is the longest string on the tapes. If there is an algorithm taking f n steps to compute then the simulating machine needs at most 2k f n n 2 time to compute that same function under simulation. (Notice that in f n steps the string(s) may acquire length at most f n n.) We shall use this to show that the nondeterministic Turing machines cannot compute more functions than the deterministic ones.

V U

Proposition 1.91 Let L

L T for a Turing machine. Then there is a deter-

V U V V U U g V U

ni

g @V U

Turing machines

87

Proof. Let L L T . Choose a number b such that f q x b for all q Q, x A. We x an ordering on f q x for all x and q. V is a 3tape machine that does the following. On the rst tape V writes the input x. On the second tape we generate all sequences p of numbers b of length n, for increasing n. These sequences describe the action sequences of T . For each sequence p a0 a1 an 1 we copy x from Tape 1 onto Tape 3 and let V work as follows. The head on Tape 2 is to the left of the sequence a. In the rst step V follows the a0 th alternative for machine T on the 3rd tape and advances head number 2 one step to the right. In the second step it follows the alternative a 1 in the transition set of T and executes it on Tape 3. Then the head of Tape 2 is advanced one step to the right. If an 1 b and the an 1 st alternative does not an 1 , exist for T but there is a computation for a 0 a1 an 2 a for some a V exits the computation on Tape 3 and deletes p on Tape 2. If a n 1 b, the b, then V an 1 st alternative does not exist for T , and none exists for any a halts. In this way V executes on Tape 3 a single computation of T for the input and checks the prexes for paths for which a computation exists. Clearly, V is deterministic. It halts iff for some n T halts on some alternative sequences of length n 1. It is easy to see that we can also write a machine that enumerates all possible outputs of T for a given input.

has to be dealt with separately. It is easy to construct Proof. The case L a machine that halts on no word. This shows the equivalence in this case. Now assume that L . Let L be recursively enumerable. Then there exists a function f : A such that f L and a Turing machine U which computes f . Now we construct a (minimally) 3tape Turing machine V as follows. The input x will be placed on the rst tape. On the second tape V generates all strings y starting with , in the numerical order. In order to do this we use the machine computing the successors in this ordering. If we have computed the string y on the second tape the machine computes the value f y on the third tape. (Thus, we emulate machine T on the third tape, with input given on the second tape.) Since f is computable, V halts on Tape 3. Then it compares the string on Tape 3, f y , with x. If they are equal, it halts, if not it computes the successor of y and starts the process over again.

V U

Lemma 1.92 L is recursively enumerable iff L T.

L T for a Turing machine

D k

IjV ) U i k

V U i

i aa

b T r) q S

V U

ministic Turing machine U such that L

LU .

V ) U i

b1 i T r) q S

bS T r) q w D

V U D

V U i

aa

88

Fundamental Structures

It is easy to see that L L V . By the previous considerations, there is a one tape Turing machine W such that L L W . Now conversely, let L L T for some Turing machine T . We wish to show that L is recursively enumerable. We may assume, by the previous theorem, that T is deterministic. We leave it to the reader to construct a machine U which computes a function f: A whose image is L. Theorem 1.93 The following are equivalent. L is of Type 0. L is recursively enumerable.

and . The theorem then follows with Proof. We shall show Lemma 1.92. Let L be of Type 0. Then there is a grammar N A R which generates L. We have to construct a Turing machine which lists all strings that are derivable from . To this end it is enough to construct a nondeterministic machine that matches the grammar. This machine always starts at input and in each cycle it scans the string for a left hand side of a rule and replaces that substring by the right hand side. This shows . Now let L L T for some Turing machine. Choose the following grammar G: in addition to the alphabet let be the start symbol, and two nonterminals, and let each q Q q be a nonterminal. The rules are as follows.
q0
}

(1.117)

qb qb qb

rc

if c 0 r if c if f b q

r cb

1r

Starting with this grammar generates strings of the form q0 x, where x is a binary string. This codes the input for T . The additional rules code in a transparent way the computation of T on the string. If the computation stops, it is allowed to eliminate q . If the string is terminal it will be generated by G. In this way it is seen that L G LT . Now we shall derive an important fact, namely that there exist undecidable languages of Type 0. We rst of all note that Turing machines can be regarded

w V D V ) U y0 ) 1 V ) U 0 1 V ) U 0 1

) U v i) ( ) ) ( ) ) (

qb

if c 1 r

f bq

f bq

f bq

0 ) ) ) (

V U

V U

V U

r q t

V U

L T for a Turing machine T .

V U `

V U

bS T r) q D

Turing machines

89

as semi Thue systems, as we have done earlier. Now one can design a machine U which takes two inputs, one being the code of a Turing machine T and the other a string x, and U computes what T computes on x. Such a machine is called a universal Turing machine. The coding of Turing machines can be done as follows. We only use the letters , and , which are, of course, also contained in the alphabet B. Let A n . Then let i be the i :i number i in dyadic coding (over , where replaces and replaces ). The number 0 is coded by to distinguish it from . Furthermore, we associate the number n with the blank, . The states are coded likewise; we assume that Q 01 n 1 for some n and that q 0 0. Now we still have to write down f . f is a subset of

Each element a q b m r of f can be written down as

where x a , u Z 1 q , y b , v Z 1 r . Further, we have if 1, if m 0 and if m 1. Now we simply write down f m as a list, the entries being separated by . (This is not necessary, but is easier to handle.) We call the code of T T . The set of all codes of Turing machines is decidable. (This is essential but not hard to see.) It should not be too hard to see that there is a machine U with two tapes, which for two strings x and y does the following. If y T for some T then U computes on x exactly as T does. If y is not the code of a machine, U moves into a special state and stops. Suppose that there is a Turing machine V which decides for given x and T wether or not x L T . Now we construct a two tape machine W as follows. The input is x, and it is given on both tapes. If x T for some T then W computes T on x. (This is done by emulating V .) If T halts on x, we send W into an innite loop. If T does not halt, W shall stop. (If x is not the code of a machine, the computation stops right away.) Now we have the following: W L W exactly if W L W . For W L W exactly when W stops if applied to W . This however is the case exactly if W does not stop. If on the other hand W L W then W does not stop if applied to W , which we can decide with the help of machine V , and then W does halt on the input W . Contradiction. Hence, V cannot exist. There is, then, no machine that can decide for any Turing machine (in code) and any input whether that machine halts on that string. It is still conceivable that this is decidable for

H D

p@p v i i H D V U D i V U D i V U D i V U D i D D D D p W i W p W i W p W W p W i W p W i i

(1.119)

e 2T ) )

v S e 82

V U

0 ) ) ) ) (

1 i

(1.118)

101

V U H

T Fl) S H H T

v aaa) ) S )

90

Fundamental Structures

every T , but that we simply do not know how to extract such an algorithm for given T . Now, in order to show that this too fails, we use the universal Turing machine U, in its single tape version. Suppose that L U is decidable. Then we can decide whether U halts on x T . Since U is universal, this means that we can decide for given T and given x whether T halts on x. We have seen above that this is impossible. Theorem 1.94 (Markov, Post) There is a recursively enumerable set which is not decidable. So we also shown that the Type 1 languages are properly contained in the Type 0 languages. For it turns out that the Type 1 languages are all decidable. Theorem 1.95 (Chomsky) Every Type 1 language is decidable. Proof. Let G be of Type 1 and let x be given. Put n : x and : A N . If there is a derivation of x that has length n , there is a string that occurs twice in it, since all occurring strings must have length n. Then there exists a shorter derivation for x. So, x L G iff it has a Gderivation of length n . This is decidable.
m

Corollary 1.96 CSL

GL.

Chomsky (1959) credits Hilary Putnam with the observation that not all decidable languages are of Type 1. Actually, we can give a characterization of context sensitive languages as well. Say that a Turing machine is linearly space bounded if given input x it may use only O x on each of its tapes. Then the following holds. Theorem 1.97 (Landweber, Kuroda) A language L is context sensitive iff L L T for some linear space bounded Turing machine T . The proof can be assembled from Theorem 1.65 and the proof of Theorem 1.93. We briey discuss socalled word problems. Recall from Section 1.5 the denition of a Thue process T . Let A be an alphabet. Consider the monoid A . The set of pairs s t A A such that s T t is a congruence on A . Denote the factor algebra by T . (One calls the pair A T a preT .) It can be shown to be undecidable whether T is sentation of the one element monoid. From this one deduces that it is undecidable whether

V U 0 ) (

ni

V U

nV niiU

V " U e

V U

W i i

1 i

1 0 ) (

V " U

V U

V Y U V Y U

Turing machines

91

or not T is a nite monoid, whether it is isomorphic to a given nite monoid, and many more. Before we close this chapter we shall introduce a few measures for the complexity of computations. In what is to follow we shall often have to deal with questions of how fast and with how much space a Turing machine can compute a given problem. Let f : be a function, T a Turing machine which computes a function g : A B . We say that T needs O f space if there is a constant c such that for all but nitely many x A there is a computation of an accepting conguration qt g x from q0 x in which every conguration has length c f x . For a multi tape machine we simply add the lengths of all words on the tapes. We say that T needs O f time if for almost all x A there is a k c f x such that q0 x k q0 g x . T We denote by DSPACE f (DTIME f ) the set of all functions which for some k are computable by a deterministic ktape Turing machine in O f space (O f time). Analogously the notation NSPACE f and NTIME f is dened for nondeterministic machines. We always have

as well as

For a machine can ll at most k cells in k steps, regardless of whether it is deterministic or nondeterministic. This applies as well to multi tape machines, since they can only write on one cell and move one head at a time. The reason for not distinguishing between the time complexity f n and the c f n (c a constant) is the following result. Theorem 1.98 (Speed Up Theorem) Let f be a computable function and let T be a Turing machine which computes f x in at most g x steps (using at most h x cells) where inf n g n n . Further, let c be an arbitrary real number 0. Then there exists a Turing machine U which computes f in at most c g x steps (using at most c h x cells). The proof results from the following fact. In place of the original alphabet AL we may introduce a new alphabet BL : A B L , where each symbol from B corresponds to a sequence of length k of symbols from A L . The symbol L then corresponds to Lk . The alphabet AL is still used for giving the

V U

nV niiU

T k s S

V U i

nV niiU

V U

V U

} QV U

(1.121)

DSPACE f

NSPACE f

V U

} dV U

} dV U

(1.120)

DTIME f

NTIME f

NSPACE f

V U V U

V U i

V U

V U

i yW

i W 1 i

V U

V U i

nV niiU

V U e

nV i iU

V U

1 i

V U V U k

nV i iU nV i iU

V U

92

Fundamental Structures

input. The new machine, upon receiving x recodes the input and calculates completely inside BL . Since to each single letter corresponds a block of k letters in the original alphabet, the space requirement shrinks by the factor k. (However, we need to ignore the length of the input.) Likewise, the time is cut by a factor k, since one move of the head simulates up to k moves. However, the exact details are not so easy to sum up. They can be found in (Hopcroft and Ullman, 1969). Typically, one works with the following complexity classes. Denition 1.99 PTIME is the class of functions computable in deterministic polynomial time, NP the class of functions computable in nondeterministic polynomial time. PSPACE is the class of functions computable in polynomial space, EXPTIME (NEXPTIME) the class of functions computable in deterministic (nondeterministic) exponential time.

Notes on this section. In the mid 1930s, several people have independently studied the notion of feasibility. Alonzo Church and Stephen Kleene have dened the notion of denablity and of a general recursive function, Emil Post and Alan Turing the notion of computability by a certain machine, now called the Turing machine. All three notions can be shown to identify the same class of functions, as these people have subsequently shown. It is known as Churchs Thesis that these are all the functions that humans can compute, but for the purpose of this book it is irrelevant whether it is correct. We shall dene the calculus later in Chapter 3, without going into the details alluded to here, however. It is to be kept in mind that the Turing machine is a physical device. Hence, its computational capacities depend on the structure of the spacetime continuum. This is not any more a speculation. Quantum computing exploits the different physical behaviour of quantum physics to do parallel computation. This radically changes the time complexity of problems (see (Deutsch et al., 2000)). This asks us to be cautious not to attach too much signicance to complexity results in connection with human behaviour since we do not know too well how the brain works. Exercise 40. Construct a Turing machine which computes the lexicographic predecessor of a string, and which returns for input . Exercise 41. Construct a Turing machine which, given a list of strings (each

Denition 1.100 A language L

A is in a complexity class

iff L

Turing machines

93

string separated from the next by a single blank), moves the rst string onto the end of the list. Exercise 42. Let T be a Turing machine over A. Show how to write a Turing which computes the same partial function over A under machine over a coding that assigns each letter of A a unique block of xed length. Exercise 43. In many denitions of a Turing machine the tape is only one sided. Its cells can be numbered by natural numbers. This requires the introduction of a special symbol that marks the left end of the tape, or of a predicate , which is true each time the head is at the left end of the tape. The transitions are different depending on whether the machine is at the left end of the tape or not. (There is an alternative, namely to stop the computation once that the left end is reached, but this is not recommended. Such a machine can compute only very uninteresting functions.) Show that for a Turing machine with a one sided tape there is a corresponding Turing machine in our sense computing the same function, and that for each Turing machine in our sense there is a one sided machine computing the same function. Exercise 44. Prove Lemma 1.90. Hint. Show rst that it is enough to look at the case A 1.

T r) q S b

Exercise 45. Show that L putable.

T r) q bS

o IRt D

A is decidable iff L : A

is com-

Chapter 2 Context Free Languages


1. Regular Languages

Type 3 or regular grammars are the most simple grammars in the Chomsky Hierarchy. There are several characterizations of regular languages: by means of nite state automata, by means of equations over strings, and by means of socalled regular expressions. Before we begin, we shall develop a simple form for regular grammars. First, all rules of the form X Y can be eliminated. To this end, the new set of rules will be (2.1)
G
~

G
~

It is easy to show that the grammar with R in place of R generates the same strings. We shall introduce another simplication. For each a A we introduce a new nonterminal Ua . In place of the rules X a we now add the rules X aUa as well as Ua . Now every rule with the exception of Ua is strictly expanding. This grammar is therefore not regular if L G but it generates the same language. However, the last kind of rules can be used only once, at the end of the derivation. For the derivable strings all have the then the nonterform x Y with x A and Y N. If one applies a rule Y minal disappears and the derivation is terminated. We call a regular grammar strictly binary if there are only rules of the form X aY or X . Denition 2.1 Let A be an alphabet. A (partial) nite state automaton is a quintuple A Q i0 F such that Q is a nite set, i0 Q, F Q and : Q A Q . Q is the set of states, i0 is called the initial state, F the set of accepting states and the transition function. is called deterministic if q a contains exactly one element for each q Q and a A.

V V 6) U U ) i

(2.2c)

Sx a : Sx a

V ) U (

(2.2b)

Sa :

V W 6) U i D V ) U V ) U

(2.2a)

S : S q a :q
S

can be extended to sets of states and strings in the following way (S a A).

V U

1 6) i i

V U 0 ) ) ) ) (

x:X

xx

S S s

R :

aY : X

aY

1 i

Wi

V ) U

Q,

96

Context Free Languages

With this dened, we can now dene the accepted language.

is strictly partial if there is a state q and some a A such that q a . An automaton can always be transformed into an equivalent automaton which is not partial. Just add another state q and add to the transition function the following transitions.

Furthermore, q shall not be an accepting state. In the case of a deterministic q for some q . In this case we think of the automaton we have q x transition function as yielding states from states plus strings, that is, we now have q x q . Then the denition of the language of an automaton can be rened as follows.

For every given automaton there is a deterministic automaton that accepts the same language. Put

Proposition 2.2 d is deterministic and L d L . Hence every language accepted by a nite state automaton is a language accepted by a deterministic nite state automaton. The proof is straightforward and left as an exercise. Now we shall rst show that a regular language is a language accepted by a nite state automaton. We may assume that G is (almost) strictly binary, as we have seen above. So, let G N A R . We put QG : N, i0 : , FG : X : X R as well as

) )

) (

Now put

A QG i0 FG G .

V ) U

(2.7)

G X a :

Y :X

aY

V U

where F d : G Q:G extended to sets of states.

0 )

Tw D ) bT

S) 'V U

) (

0 ) ) ) (

(2.6)

AQ

i0 F d

1 i V 6) U

i IS

V U

(2.5)

x : i0 x

and is the transition function of

T lk S

V !) U i

V ) U

(2.4)

qa :

V ) U

qa q

if q a if q a

and q q , or q q .

V ) U

w V ) U D w 2V ) U D

T w

t i) V 6bT

S !U

i IS D

V U

(2.3)

x:

i0 x

V 6) U i

Regular Languages
R
|

97

Proof. Induction over the length of x. The case x is evident. Let x a A. Then Y G X a by denition iff X aY R, and from this we get X R aY . Conversely, from X R aY follows that X aY R. For since the derivation uses only strictly expanding rules except for the last step, the derivation of aY from X must be the application of a single rule. This nishes the case of length 1. Now let x y a. By denition of G we have Hence there is a Z such that Z G X y and Y G Z a . By induction hypothesis this is equivalent with X R y Z and Z R aY . From this we get X R y a Y x Y . Conversely, from X R x Y we get X R y Z and Z R aY for some Z, since G is regular. Now, by induction hypothesis, Z G X y and Y G Z a , and so Y G x X .
G

Proof. It is easy to see that L G x:G x Y Y R . By Lemma 2.3 x Y L G iff S R x Y . The latter is equivalent with Y G x . And this is nothing but x L G . Hence L G L G . Given a nite state automaton A Q i0 F put N : Q, S : i0 . R consists of all rules of the form X aY where Y X a as well as for X F. Finally, G : S N A R . G is all rules of the form X strictly binary and G . Therefore we have L G L . Theorem 2.5 The regular languages are exactly those languages that are accepted by some deterministic nite state automaton. Now we shall turn to a further characterization of regular languages. A regular term over A is a term which is composed from A with the help of the symbols 0 (0ary), (0ary), (binary), (binary) and (unary). A regular term denes a language over A as follows. (2.9b) (2.9c) (2.9d) (2.9e) (2.9f) L : La : LR S : LR

V U D V s U V U

LR S :

aV U V U V U s V U V U T S T S

a LR LS LR LS LR

V U V U

V U

(2.9a)

L0 :

V 6) U i

V U V 0 ) ) ) D ( V ) U 1

D 0 ) ) ) ) ( U V U D D

W ~ i

i S

V U

V U

Wi

1 i

Proposition 2.4 L
|

LG.

Wi

V ) U

Wi

V ) U i

Wi | V ) U i

V V ) U ) i

V ) U

Wi

V 6) U i

(2.8)

G X x

G G X y a

Wi

1 i D V 6) U i

Wi

Lemma 2.3 For all X Y

N and x we have Y

X x iff X

x Y.

V ) U D

V ) U i | W Wi |

V U

Wi

98

Context Free Languages

(Commonly, one writes R in place of L R , a usage that we will follow in the sequel to this section.) Also, R : R R is an often used abbreviation. Languages which are dened by a regular term can also be viewed as solutions of some very simple systems of equations. We introduce variables (say X, Y and Z) which are variables for subsets of A and we write down equations for the terms over these variables and the symbols 0, , a (a A), , and . An example is the equation X X, whose solution is X .

Proof. The proof is by induction over the length of x. x X means by denition that x R X. If x then x R . Hence let x ; then x R X and so it is of the form u0 x0 where u0 R and x0 X. Since u0 , x0 has smaller length than x. By induction hypothesis we therefore have x 0 R . Hence x R . The other direction is as easy.

We shall now show that regular languages can be seen as solutions of systems of equations. A general system of string equations is a set of equations of the i i form X j Q i m T where Q is a regular term and the T have the form R Xk where R is a regular term. Here is an example. (2.11) X0 X1

X1 X0

Notice that like in other systems of equations a variable need not occur to the right in every equation. Moreover, a system of equations contains any given variable only once on the left. The system is called proper if for all i and j we have L T ji . We shall call a system of equations simple if it is proper and Q as well as the T ji consist only of terms made from elements of A using and . The system displayed above is proper but not simple. Let now N A R be a strictly binary regular grammar. Introduce for each nonterminal X a variable QX . This variable QX shall stand for the set of all strings which can be generated from X in this grammar, that is, all strings

has exactly one solution, namely X

p s p fs D p H H D

0 ) ) ) (

(2.10)

C D X D C.

V U

Lemma 2.7 Let C D be regular terms, D

0 and

L D . The equation

1 i i 1D i

1 i i D1 i i

1 i 1 i

V U

i W i i

Lemma 2.6 Assume R X R X.

0 and

L R . Then R is the unique solution of

H D 1

V U

D H

D i

1 i

1 i

Regular Languages

99

x for which X R x. This latter set we denote by X . We claim that the Q X so interpreted satisfy the following system of equations.

This system of equations is simple. We show QY Y for all Y N. The proof is by induction over the length of the string. To begin, we show that QY Y . For let y QY . Then either y and Y R or we have y a x with x QX and Y a x R. In the rst case Y R, whence Y . In the second case x y and so by induction hypothesis x X , hence X R x. Then we have Y R a x y, from which y Y . This shows the rst inclusion. Now we show that Y QY . To this end let Y R y. Then either y and so Y R or y a x for some x. In the rst case y QY , aX R by denition. In the second case there must be an X such that Y y and therefore by induction hypothesis x Q X . and X R x. Then x Finally, by denition of QY , y QY , which had to be shown. So, a regular language is the solution of a simple system of equations. Conversely, every simple system of equations can be rewritten into a regular grammar which generates the solution of this system. Finally, it remains to be shown that regular terms describe nothing but regular languages. What we shall establish is more general and derives the desired conclusion. We shall show that every proper system of equations which has as many equations as it has variables has as its solution for each variable a regular language. To i this end, let such a system X j i m j T j be given. We begin by eliminating X0 from the system of equations. We distinguish two cases. (1) X0 appears in i the equation X0 i m0 T j only to the left. This equation is xed, and called the pivot equation for X0 . Then we can replace X0 in the other equations by i m T ji . (2) The equation is of the form X0 C D X0 , C a regular 0 term, which does not contain X0 , D free of variables and L D . Then X0 D C by Lemma 2.7. Now X0 does not occur and we can replace X0 in the other equations as in (1). The system of equations that we get is not simple, even if it was simple at the beginning. We can proceed in this fashion and eliminate step by step the variables from the right hand side (and putting aside the corresponding pivot equations) until we reach the last equation. The solution for Xn 1 does not contain any variables at all and is a regular term. The solution can be inserted into the other equations, and then we continue with Xn 2 , then with Xn 3 , and so on. As an example, we take the following

1 i

1 i i

1 i

V U

1 i

i yW i 1 } D i iyW | D nib i 1 QW i

1 i

i  i

1 i

(2.12)

a QX : Y

aX

QY

:Y

s D

1 i

1 i QW i } D

100

Context Free Languages

system of equations.

X1 X1 X1 X1

(IV) X2

Now that X2 is known, X1 can be determined by inserting the regular term for X2 , and, nally, X0 is obtained by inserting the values for X2 and X1 . Theorem 2.8 (Kleene) Let L be a language over A. Then the following are equivalent: L is regular.

L is the solution for X0 of a simple system of equations over A with variables Xi , i m. Further, there exist algorithms which (i) for a given automaton compute a L R ; (ii) for a given regular term R compute regular term R such that L a simple system of equations over X whose solution for a given variable X0 is exactly L R ; and (iii) which for a given simple system of equations over Xi : i m compute an automaton such that X is its set of states and the solution for Xi is exactly the set of strings which send the automaton from state X0 into Xi . This is the most important theorem for regular languages. We shall derive a few consequences. Notice we can turn a nite state automaton into a Turing machine T accepting the same language in linear time and no additional space. Therefore, the recognition problem for regular languages is in

V U

V U

V U

L R for some regular term R over A.

V U

for a nite, deterministic automaton

(III) X1 X2

V s U Ip H H IIp V p s s V p H !U H sH p V H H s H sp !U s p s H

over A.

yp V yp

s @V p !U  ! s p V pU H H H H H p!U !" @V V pU s s 2U pU !s H p H !U @H V p H H H

s U s H

H@H s H p

(II)

X0 X1 X2

pH ys s H

s s s V p H H

ypH

s D D D

D D

(I)

X0 X1 X2

X0 X0 X0

X1

X2 X2 X2 X2 X2 X2 X2

V U

D D

Regular Languages

101

DTIME n and in DSPACE n . This also applies to the parsing problem, as is easily seen.

Corollary 2.10 The set of regular languages over A is closed under intersection and relative complement. Further, for given regular terms R and S one can determine terms U and V such that L(U) = A* − L(R) and L(V) = L(R) ∩ L(S).

Proof. It is enough to do this construction for automata. Using Theorem 2.8 it follows that we can do it also for the corresponding regular terms. Let 𝔄 = ⟨A, Q, i0, F, δ⟩. Without loss of generality we may assume that 𝔄 is deterministic. Then let 𝔄⁻ := ⟨A, Q, i0, Q − F, δ⟩. We then have L(𝔄⁻) = A* − L(𝔄). This shows that for given 𝔄 we can construct an automaton which accepts the complement of L(𝔄). Now let 𝔅 = ⟨A, P, j0, G, θ⟩. Put

(2.13) 𝔄 × 𝔅 := ⟨A, Q × P, ⟨i0, j0⟩, F × G, η⟩

where

(2.14) ⟨q, p⟩ →a ⟨q′, p′⟩ iff q →a q′ in 𝔄 and p →a p′ in 𝔅

It is easy to show that L(𝔄 × 𝔅) = L(𝔄) ∩ L(𝔅). The proof of the next theorem is an exercise.

Theorem 2.11 Let L and M be regular languages. Then so are L ∪ M and M · L. Moreover, Lᵀ, Lᴾ := {x : x·y ∈ L for some y} as well as Lˢ := {y : x·y ∈ L for some x} are regular.

Furthermore, the following important consequence can be established.

Theorem 2.12 Let 𝔄 and 𝔅 be finite state automata. Then it is decidable whether L(𝔄) = L(𝔅).

Proof. Let 𝔄 and 𝔅 be given. By Theorem 2.8 we can compute a regular term R with L(R) = L(𝔄) as well as a regular term S with L(S) = L(𝔅). Then L(𝔄) = L(𝔅) iff L(R) = L(S) iff (L(R) − L(S)) ∪ (L(S) − L(R)) = ∅. By Corollary 2.10 we can compute a regular term U such that

(2.15) L(U) = (L(R) − L(S)) ∪ (L(S) − L(R))

Hence L(𝔄) = L(𝔅) iff L(U) = ∅. This is decidable by Lemma 2.13.

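The constructions in the last two proofs are easily mechanized. The following Python sketch is our own encoding, not the book's: total deterministic automata as 4-tuples (Q, i0, F, delta), with delta a dictionary (state, letter) → state.

```python
def complement(aut):
    Q, i0, F, delta = aut
    return (Q, i0, [q for q in Q if q not in F], delta)

def product(a, b, alphabet):
    (Q, i0, F, d), (P, j0, G, e) = a, b
    QP = [(q, p) for q in Q for p in P]
    eta = {((q, p), x): (d[q, x], e[p, x]) for (q, p) in QP for x in alphabet}
    return (QP, (i0, j0), [(q, p) for q in F for p in G], eta)

def nonempty(aut, alphabet):
    Q, i0, F, delta = aut
    seen, todo = {i0}, [i0]            # states reachable from the initial state
    while todo:
        q = todo.pop()
        if q in F:
            return True
        for x in alphabet:
            r = delta[q, x]
            if r not in seen:
                seen.add(r); todo.append(r)
    return False

def equivalent(a, b, alphabet):
    # L(a) = L(b) iff L(a)-L(b) and L(b)-L(a) are both empty
    return not any(nonempty(product(x, complement(y), alphabet), alphabet)
                   for (x, y) in ((a, b), (b, a)))
```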

Lemma 2.13 The problem ‘L(R) = ∅’, where R is a regular term, is decidable.

Proof. By induction on R. If R = ε or R = a then L(R) ≠ ∅. If R = 0 then by definition L(R) = ∅. Now assume that the problems ‘L(R) = ∅’ and ‘L(S) = ∅’ are decidable. Notice that (a) L(R ∪ S) = ∅ iff L(R) = ∅ and L(S) = ∅, (b) L(R · S) = ∅ iff L(R) = ∅ or L(S) = ∅, and (c) L(R*) ≠ ∅, since it always contains the empty string. All three problems are therefore decidable.

We conclude with the following theorem, which we have used already in Section 1.5.

Theorem 2.14 Let L be context free and R regular. Then L ∩ R is context free.

Proof. Let G = ⟨S, N, A, R⟩ be a CFG with L(G) = L and 𝔄 a deterministic automaton consisting of n states, numbered 0 through n − 1 with initial state 0, such that L(𝔄) = R. We may assume that the rules of G are of the form X → a or X → ⃗Y. We define new nonterminals, which are all of the form ⁱXʲ, where i, j < n and X ∈ N. The interpretation is as follows. X stands for the set of all strings ⃗x such that X ⊢_G ⃗x; ⁱXʲ stands for the set of all such ⃗x for which in addition δ(i, ⃗x) = j. We have a set of start symbols, consisting of all ⁰Sʲ with j ∈ F. As we already know, this does not increase the generative power. A rule ρ = X → Y₀Y₁⋯Y_{k−1} is now replaced by the set of all rules of the form

(2.16) ⁱXʲ → ⁱY₀^{i₀} · ^{i₀}Y₁^{i₁} · ⋯ · ^{i_{k−2}}Y_{k−1}ʲ

Finally, we take all rules of the form ⁱXʲ → a, if δ(i, a) = j. This defines the grammar Gʳ. We shall show: ⊢_{Gʳ} x iff ⊢_G x and x ∈ L(𝔄). (⇒) Let 𝔅 be a Gʳ-tree with associated string x. The map ⁱXʲ ↦ X turns 𝔅 into a G-tree. Hence x ∈ L(G). Further, it is easily shown that the node dominating x_j carries a label of the form ^{k_j}X^{k_{j+1}}, where k₀ = 0 and k_{j+1} = δ(k_j, x_j); the top node carries the label ⁰S^{k_n}, and if x has length n then by construction k_n ∈ F. Hence δ(0, x) ∈ F and so x ∈ L(𝔄). (⇐) Let x ∈ L(G) and x ∈ L(𝔄). We shall show that x ∈ L(Gʳ). We take a G-tree 𝔅 for x. We shall now prove that one can replace the G-nonterminals in 𝔅 in such a way by Gʳ-nonterminals that we get a Gʳ-tree. The proof is by induction on the height of a node. We begin with nodes of height 1. Let x = ∏_{i<n} xᵢ, and let Xᵢ be the nonterminal above xᵢ. Further let p_{i+1} := δ(pᵢ, xᵢ). Then p₀ = 0 and p_n ∈ F. We replace Xᵢ by ^{pᵢ}Xᵢ^{p_{i+1}}. We say that two nodes x and y connect if they are adjacent and for the labels ⁱXʲ of x and ᵏYˡ of y we have j = k. Let x be a node of height n + 1 with label X and let x be the mother of the nodes with


labels Y₀Y₁⋯Y_{n−1} in G. We assume that below x all nodes carry labels from Gʳ in such a way that adjacent nodes connect. Then there exists a rule in Gʳ such that X can be labelled with superscripts, with the left hand superscript of Y₀ to its left and the right hand superscript of Y_{n−1} to its right. All adjacent nodes of height n + 1 connect, as is easily seen. Further, the leftmost node carries the left superscript 0, and the rightmost node carries a right superscript p_n, which is an accepting state. Eventually, the root has superscripts as well. It carries the label ⁰S^{p_n}, and so we have a Gʳ-tree.
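The construction of Gʳ is easy to mechanize. Below is a small Python sketch, in our own encoding of grammars and automata: rules are pairs (X, rhs) with rhs either a letter or a tuple of nonterminals, and the automaton has states 0, …, n − 1 with transition dictionary delta.

```python
from itertools import product

def intersect(rules, start, n, delta, final):
    """Rules of G^r together with its start symbols (0, S, q), q in final."""
    new_rules = []
    for X, rhs in rules:
        if isinstance(rhs, str):                       # X -> a
            for i in range(n):
                j = delta.get((i, rhs))
                if j is not None:
                    new_rules.append(((i, X, j), rhs))
        else:                                          # X -> Y0 ... Y_{k-1}
            k = len(rhs)
            for seq in product(range(n), repeat=k + 1):
                body = tuple((seq[m], rhs[m], seq[m + 1]) for m in range(k))
                new_rules.append(((seq[0], X, seq[k]), body))
    return new_rules, [(0, start, q) for q in final]
```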

Exercise 46. Prove Theorem 2.11.

Exercise 47. Show that a language is regular iff it can be generated by a grammar with rules of the form X → Y, X → Ya, X → a and X → ε. Such a grammar is called left regular, in contrast to the grammars of Type 3, which we also call right regular. Show also that it is allowed to add rules of the form X → x and X → Y·x.

Exercise 48. Show that there is a grammar with rules of the form X → a, X → aY and X → Ya which generates a nonregular language. This means that a Type 3 grammar may contain (in general) only left regular rules or only right regular rules, but not both.

Exercise 49. Show that if L and M are regular, then so are L/M and M\L.

Exercise 50. Let L be a language over A. Define an equivalence relation ∼_L over A* as follows: x ∼_L y iff for all z ∈ A* we have x·z ∈ L ⇔ y·z ∈ L. L is said to have finite index if there are only finitely many equivalence classes with respect to ∼_L. Show that L is regular iff it has finite index.

Exercise 51. Show that the language {aⁿbⁿ : n ∈ ℕ} does not have finite index. Hence it is not regular.

Exercise 52. Show that the intersection of a context sensitive language with a regular language is again context sensitive.

Exercise 53. Show that L is regular iff it is accepted by a read-only 1-tape Turing machine.

2. Normal Forms

In the remaining sections of this chapter we shall deal with CFGs and their languages. In view of the extensive literature about CFLs it is only possible

to present an overview. In this section we shall deal in particular with normal forms. There are many normal forms for CFGs, each having a different purpose. However, notice that the transformation of a grammar into a normal form necessarily destroys some of its properties. So, to say that a grammar can be transformed into another is meaningless unless we specify exactly what properties remain constant under this transformation. If, for example, we are only interested in the language generated then we can transform any CFG into Chomsky Normal Form. However, if we want to maintain the constituent structures, then only the so-called standard form is possible. A good exposition of this problem area can be found in (Miller, 1999).

Before we deal with reductions of grammars we shall study the relationship between derivations, trees and sets of rules. To be on the safe side, we shall assume that every symbol occurs at least once in a tree, that is, that the grammar is slender in the sense of Definition 2.17. From the considerations of Section 1.6 we conclude that for any two CFGs G = ⟨S, N, A, R⟩ and H = ⟨S, N, A, R′⟩, L_B(G) = L_B(H) iff der(G) = der(H). Likewise we see that for all X ∈ N, der(G, X) = der(H, X) iff R = R′. Now let G = ⟨S, N, A, R⟩ and a sequence Γ = ⟨γᵢ : i < n⟩ be given. In order to test whether Γ is a G-string sequence we have to check for each i < n − 1 whether γ_{i+1} can be derived from γᵢ with a single application of a rule. To this end we have to choose an instance of a rule, apply it, and check whether the string obtained equals γ_{i+1}. Checking this needs a_G·|γᵢ| steps, where a_G is a constant which depends only on G. Hence for the whole derivation we need ∑_{i<n} a_G·|γᵢ| steps. This can be estimated from above by a_G·n·(n + 1), and if G is strictly expanding also by a_G·n·(n + 1)/2. It can be shown that there are grammars for which this is the best possible bound.

In order to check for an ordered labelled tree whether it can be generated by G we need less time. We only need to check for each node x whether the local tree at x conforms to some rule of G. This can be done in constant time. The time therefore depends only linearly on the size of the tree.

There is a tight connection between derivations and trees. To begin, a derivation has a unique tree corresponding to it: simply translate the derivation step by step into the construction of a tree. Conversely, however, there may exist many derivations for the same tree. Their number can be very large. However, we can obtain them systematically in the following way. Let 𝔅 be an (exhaustively ordered, labelled) tree. Call ≺ ⊆ B² a linearisation if ≺ is an irreflexive, linear ordering and x ≺ y whenever x properly dominates y. Given a linearisation, a derivation is found as follows. We begin with the element which is smallest

with respect to ≺. This is, as is easy to see, the root. The root carries the label S. Inductively, we shall construct cuts γᵢ through 𝔅 such that the sequence ⟨γᵢ : i < n⟩ is a derivation of the associated string. (Actually, the derivation is somewhat more complex than the string sequence, but we shall not complicate matters beyond need here.) The beginning is clear: we put γ₀ := S. Now assume that γᵢ has been established, and that it is not identical to the associated string of 𝔅. Then there exists a node y with nonterminal label in γᵢ. (There is a unique correspondence between nodes of the cut and segments of the strings γᵢ.) We take the smallest such node with respect to ≺. Let its label be Y. Since we have a G-tree, the local tree with root y corresponds to a rule of the form Y → α for some α. In γᵢ, y defines a unique instance of that rule. Then γ_{i+1} is the result of replacing that occurrence of Y by α. The new string is then the result of applying a rule of G, as desired. It is also possible to determine for each derivation a linearisation of the tree which yields that derivation in the described manner. However, there can be several linearisations that yield the same derivation.

Theorem 2.15 Let G be a CFG and 𝔅 ∈ L_B(G). Further, let ≺ be a linearisation of 𝔅. Then ≺ determines a G-derivation der(≺) of the string which is associated to 𝔅. If ≺′ is another linearisation of 𝔅 then der(≺) = der(≺′) is the case iff ≺ and ≺′ coincide on the interior nodes of 𝔅.

Linearisations can also be considered as top down search strategies on a tree. We shall present examples. The first is a particular case of the so-called depth-first search and the linearisation shall be called the leftmost linearisation. It is as follows: x ≺ y iff x properly dominates y or x precedes y in the left-to-right order. For every tree there is exactly one leftmost linearisation. We shall denote the fact that there is a leftmost derivation of α from X by X ⊢ℓ_G α. We can generalize the situation as follows. Let π be a linear ordering uniformly defined on the leaves of local subtrees. That is to say, if 𝔅 and ℭ are isomorphic local trees (that is, if they correspond to the same rule ρ) then π orders the leaves of 𝔅 linearly in the same way as it orders the leaves of ℭ (modulo the unique (!) isomorphism). In the case of the leftmost linearisation the ordering is the left-to-right order of the leaves. Now a minute's reflection reveals that every linearisation π of the local subtrees of a tree induces a linearisation of the entire tree, but not conversely (there are orderings which do not proceed in this way, as we shall see shortly). X ⊢π_G α denotes the fact that there is a derivation of α from X determined by π. Now call π a priorisation for G = ⟨S, N, A, R⟩ if π defines a linearisation on the local tree belonging to ρ, for every ρ ∈ R. Since the root is always the first element in a linearisation, we only need to order the daughters of the root node, that is, the leaves. Let this ordering be π(ρ). We write X ⊢π α if X ⊢π_G α for the linearisation defined by π.

Proposition 2.16 Let π be a priorisation. Then X ⊢π_G x iff X ⊢_G x.

A different strategy is the breadth-first search. This search goes through the tree in order of increasing depth. Let Sₙ be the set of all nodes x with d(x) = n. For each n, Sₙ shall be ordered linearly by the left-to-right order. The breadth-first search is a linearisation ≺ which is defined as follows. (a) If d(x) = d(y) then x ≺ y iff x precedes y in the left-to-right order, and (b) if d(x) < d(y) then x ≺ y. The difference between these search strategies, depth-first and breadth-first, can be made very clear with tree domains (see Section 1.4). The depth-first search traverses the tree domain in the lexicographical order, the breadth-first search in the numerical order. Let the tree domain {ε, 0, 1, 2, 00, 10, 11, 20} be given. The depth-first linearisation is

(2.17) ε, 0, 00, 1, 10, 11, 2, 20

The breadth-first linearisation, however, is

(2.18) ε, 0, 1, 2, 00, 10, 11, 20

Notice that with these linearisations the tree domain ω* cannot be enumerated. Namely, the depth-first linearisation begins as follows.

(2.19) ε, 0, 00, 000, 0000, …

So we never reach 1. The breadth-first linearisation goes like this.

(2.20) ε, 0, 1, 2, 3, …

So, we never reach 00. On the other hand, ω* is countable, so we do have a linearisation, but it is more complicated than the given ones.
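In the finite case both linearisations are easy to compute; the following little Python sketch (ours, exploiting the fact that tree domains are sets of strings) runs them on the tree domain used above.

```python
domain = {"", "0", "1", "2", "00", "10", "11", "20"}

depth_first   = sorted(domain)                         # lexicographic order
breadth_first = sorted(domain, key=lambda x: (len(x), x))  # numerical order

print(depth_first)    # ['', '0', '00', '1', '10', '11', '2', '20']
print(breadth_first)  # ['', '0', '1', '2', '00', '10', '11', '20']
```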
S T S T

In the given grammar , and are completable, and , , and are reachable. Since , the start symbol, is not completable, no symbol is both reachable and completable. The grammar generates no terminal strings. Let N be the set of symbols which are both reachable and completable. If N then L G . In this case we put N : and R : . Otherwise, let R be the restriction of R to the symbols from A N . This denes G N A R . It may be that throwing away rules may make some nonterminals unreachable or uncompletable. Therefore, this process must be G, in which case every element is both reachable and repeated until G completable. Call the resulting grammar G s . It is clear that G iff Gs . Additionally, it can be shown that every derivation in G is a derivation in G s and conversely. Denition 2.17 A CFG is called slender if either L G and G has no nonterminals except for the start symbol and no rules; or L G and every nonterminal is both reachable and completable. Two slender grammars have identical sets of derivations iff their rule sets are identical.

Proposition 2.19 For every CFG G there is an effectively constructable slender CFG Gs N s A Rs such that N s N, which has the same set of derivations as G. In this case it also follows that L B Gs LB G . Next we shall discuss the role of the nonterminals. Since these symbols do not occur in L G , their name is irrelevant for the purposes of L G . To make this precise we shall introduce the notion of a rule simulation. Let G and

V U

V U

V U

Proposition 2.18 Let G and H be slender. Then G


}

H iff der G

der H .

V U

w dV U D w V U D

T S

) )

S

0 k ) k ) ( )

V U

) (

V U

(2.21)

i 0 ) ) ) (
T 6

T S

D k

i W
}

W i k D k

108

Context Free Languages

G′ be grammars with sets of nonterminals N and N′. Let ≈ ⊆ N × N′ be a relation. This relation can be extended to a relation between (N ∪ A)* and (N′ ∪ A)* by putting α ≈ β if α and β are of equal length and αᵢ ≈ βᵢ for every i. A relation ≈ ⊆ N × N′ is called a forward rule simulation or an R-simulation if (0) S ≈ S′, (1) if X → α ∈ R and X ≈ Y then there exists a β such that α ≈ β and Y → β ∈ R′, and (2) if Y → β ∈ R′ and X ≈ Y then there exists an α such that α ≈ β and X → α ∈ R. A backward simulation is defined thus: (0) from X ≈ S′ follows X = S, and from S ≈ Y follows Y = S′; (1) if X → α ∈ R and α ≈ β then Y → β ∈ R′ for some Y such that X ≈ Y; and (2) if Y → β ∈ R′ and α ≈ β then X → α ∈ R for some X such that X ≈ Y. We give an example of a forward simulation. Let G and G′ be the following grammars.

(2.22) (the grammar G)
(2.23) (the grammar G′)

The start symbol is S in both grammars. Then the relation pairing each nonterminal of G with its counterpart in G′ is an R-simulation.

Together with ≈ also the converse relation is an R-simulation. If ≈ is an R-simulation and ⟨γᵢ : i < n + 1⟩ is a G-derivation, there exists a G′-derivation ⟨δᵢ : i < n + 1⟩ such that γᵢ ≈ δᵢ for every i < n + 1. We can say more exactly: if γ_{i+1} results from γᵢ by an instance of a rule from G in a context C = σ₁—σ₂, then there is a context D = τ₁—τ₂ with σ₁ ≈ τ₁ and σ₂ ≈ τ₂ such that δ_{i+1} results from δᵢ by an instance of a rule from G′ in the context D. In this way we get that for every 𝔅 ∈ L_B(G) there is a 𝔅′ ∈ L_B(G′) with the same label on every leaf and ≈-related labels on every non-leaf. Analogously to a rule simulation we can define a simulation of derivations by requiring that for every G-derivation there is a G′-derivation which is equivalent to it.

Proposition 2.20 Let G₁ and G₂ be slender CFGs and ≈ ⊆ N₁ × N₂ an R-simulation. Then for every G₁-derivation ⟨γᵢ : i < n⟩ there exists a G₂-derivation ⟨δᵢ : i < n⟩ such that γᵢ ≈ δᵢ, i < n.

We shall look at two special cases of simulations. Two grammars G and G′ are called equivalent if there is a bijection b : N ∪ A → N′ ∪ A such that b(x) = x

for every x ∈ A, b(S) = S′, and b induces a bijection between G-derivations and G′-derivations. This notion is more restrictive than the one which requires that b is a bijection between the sets of rules. For it may happen that certain rules can never be used in a derivation. For given CFGs we can easily decide whether they are equivalent. To begin, we bring them into a form in which all rules are used in a derivation, by removing all symbols that are not reachable and not completable. Such grammars are equivalent if there is a bijection b which puts the rules into correspondence. The existence of such a bijection is easy to check. The notion of equivalence just proposed is too strict in one sense. There may be nonterminal symbols which cannot be distinguished. We say G is reducible to G′ if there is a surjective function b : N ∪ A → N′ ∪ A such that b(S) = S′, b(x) = x for every x ∈ A, and such that b maps every G-derivation onto a G′-derivation, while every preimage under b of a G′-derivation is a G-derivation. (We do not require however that the preimage of the start symbol of G′ is unique; only that the start symbol of G′ has one preimage which is the start symbol of G.)

Definition 2.21 G is called reduced if every grammar G′ such that G is reducible onto G′ can itself be reduced onto G.

Given G we can effectively construct a reduced grammar onto which it can be reduced. We remark that in our example above G′ is not reducible onto G: even though the pairing is a function, what can be derived in one step in G′ cannot always be matched by a one-step derivation in G. Given G and this function, the following grammar is reduced onto G.

(2.24) (the reduced grammar)

Now let G be a CFG. We add to A two more symbols, namely ( and ), not already contained in A. Subsequently, we replace every rule X → α by the

rule X → (α). The so-constructed grammar is denoted by Gᵇ; derivations in G and in Gᵇ correspond to each other step by step. Consider the following grammar G.

(2.25) S → a ;  S → SS

The grammar G generates the language a⁺. The string aaa has several derivations, which correspond to different trees.

(2.26) ⟨S, SS, aS, aSS, aaS, aaa⟩ ;  ⟨S, SS, SSS, aSS, aaS, aaa⟩

If we look at the analogous derivations in Gᵇ we get the strings

(2.27) ((a)((a)(a))) ;  (((a)(a))(a))

These are obviously distinct. Define a homomorphism e by e(a) := a, if a ∈ A, e(() := ε and e()) := ε. Then it is not hard to see that

(2.28) L(G) = e[L(Gᵇ)]

Now look at the class of trees L_B(G) and forget the labels of all nodes which are not leaves. Then the structure obtained shall be called a bracketing analysis of the associated string. The reason for this name is that the bracketing analyses are in one-to-one correspondence with the strings which Gᵇ generates. Now we ask ourselves whether for two given grammars G and H it is decidable whether they generate the same bracketing analyses. We ask first what the analogon in Gᵇ of a derivation of G is. Let γ·X·δ be derivable in G, and let the corresponding Gᵇ-string in this derivation be γᵇ·X·δᵇ. In the next step X is replaced by α. Then we get γ·α·δ, and in Gᵇ the string γᵇ·(α)·δᵇ. If we have an R-simulation from G to H then it is also an R-simulation from Gᵇ to Hᵇ, provided that it sends the opening bracket of Gᵇ to the opening bracket of Hᵇ and the closing bracket of Gᵇ to the closing bracket of Hᵇ. It follows that if there is an R-simulation from G to H then not only do we have L(G) = L(H), but also L(Gᵇ) = L(Hᵇ).

Theorem 2.22 We have L(Gᵇ) = L(Hᵇ) if there is an R-simulation from G to H.
2t

Normal Forms

111

The bracketing analysis is too strict for most purposes. First of all it is not customary to put a single symbol into brackets. Further, it makes no sense to distinguish between x and x , since both strings assert that x is a constituent. We shall instead use what we call constituent analyses. These are pairs x in which x is a string and an exhaustively ordered constituent structure dened over x. We shall denote by L c G the class of all constituent analyses generated by G. In order to switch from bracketing analyses to constituent analyses we only have to eliminate the unary rules. This can be done as follows. Simply replace every rule Y , where 1, by the set 2 : Z : Z Y .R : 2 : R . Finally, let G : N A R . Every rule is strictly productive and we have L c G Lc G . (Exception needs to be made for , as usual. Also, if necessary, we shall assume that G is slender.) Denition 2.23 A CFG is in standard form if every rule different from has the form X Y with Y 1 or the form X a. A grammar is in 2 standard form or Chomsky Normal Form if every rule is of the form , a. X Y0Y1 or X (Notice that by our conventions a CFG in standard form contains the rule X for X , but this happens only if is not on the right hand side of a rule.) We already have proved that the following holds. Theorem 2.24 For every CFG G one can construct a slender CFG G n in standard form which generates the same constituent structures as G. Theorem 2.25 For every CFG G we can construct a slender CFG G c in LG. Chomsky Normal Form such that L Gc Proof. We may assume that G is in standard form. Let X Y0Y1 Yn 1 Zn 2 be new nonterminals. Replace by be a rule with n 2. Let Z0 Z1 the rules
} }

Every derivation in G of a string can be translated into a derivation in Gᶜ by replacing every instance of ρ by the sequence ρᶜ₀, ρᶜ₁, …, ρᶜ_{n−2}. For the converse we introduce the following priorisation on the rules: let Zᵢ be

always before Yi . However, in Zn 3 Yn 2Yn 1 we choose the leftmost prix iff Gc x. For if i : i p 1 is a leftmost orisation. We show G derivation of x in G, then replace every instance of a rule by the sequence c c c 0 , 1 , and so on until n 2 . This is a Gc derivation, as is easily checked. It is also a derivation. Conversely, let j : j q 1 be a Gc derivation c which is priorized with . If i 1 is the result of an application of the rule k , c k n 2, then i 2 q 1 and i 2 is the result of an application of k 1 on i 1 , which replaced exactly the occurrence Z k of the previous instance. This c c c c means that every k in a block of instances of 0 , 1 n 2 corresponds to a single instance of . There exists a Gderivation of x, which can be obtained by backward replacement of the blocks. It is a leftmost derivation. For example, the right hand side grammar is the result of the conversion of the left hand grammar into Chomsky Normal Form.

For an invertible grammar the labelling on the leaves uniquely determines the labelling on the entire tree. We propose an algorithm which creates an invertible grammar from a CFG. For simplicity a rule is of the form X Y x. Now we choose our nonterminals from the set N . The or X terminal rules are now of the form x, where X :X x R . The nonterminal rules are of the form 0 1 n 1 with

Further, we choose a start symbol, , and we take the rules for every X, for which there are Xi X R. This grammar we call Gi . with S i

1 i

aa

(2.31)

X :X

Y0Y1

Yn

R for some Yi

1 i

1 i T w v IS V U

1 i

Denition 2.26 A CFG is called invertible if from X it follows that X Y .

aa

(2.30)

R and Y

i ) iaaa)

U S

T 7T

9}

s T

T T S

T s T } S

Normal Forms

113

It is not difcult to show that Gi is invertible. For let 0 1 n 1 be the , i n, and an X such right hand side of a production. Then there exist Yi i Y is a rule in G. Hence there is an such that is in Gi . that X i is in standard form (Chomsky Normal is uniquely determined. Further, G Form), if this is the case with G. Theorem 2.27 Let G be a CFG. Then we can construct an invertible CFG Gi which generates the same bracketing analyses as G. The advantage offered by invertible grammars is that the labelling can be reconstructed from the labellings on the leaves. The reader may reect on the fact that G is invertible exactly if G b is. Denition 2.28 A CFG is called perfect if it is in standard form, slender, reduced and invertible. It is instructive to see an example of a grammar which is invertible but not reduced. G H
} } T S } T } S }

Theorem 2.29 For every CFG we can construct a perfect CFG which generates the same constituent structures. Finally we shall turn to the socalled Greibach Normal Form. This form most important for algorithms recognizing languages by reading the input from left to right. Such algorithms have problems with rules of the form X Y , in particular if Y X. Denition 2.30 Let G N A R be a CFG. G is in Greibach (Normal) Form if every rule is of the form or of the form X x Y.
G
~

1 i

1 i

Proposition 2.31 Let G be in Greibach Normal Form. If X a leftmost derivation from X in G iff y Y for some y and y only if Y X.

then has A and Y N

i W

G is invertible but not reduced. To this end look at H and the map , . This is an Rsimulation. H is reduced and invertible.
}

i W

i Wi D i

0 )} ) ) (

D i

(2.32)
T

aa


The proof is not hard. It is also not hard to see that this property characterizes the Greibach form uniquely. For if there is a rule of the form X → Y·⃗α then there is a leftmost derivation of Y·⃗α from X, but not in the desired form. Here we assume that there are no rules of the form X → X.

Theorem 2.32 (Greibach) For every CFG one can effectively construct a grammar Gᵍ in Greibach Normal Form with L(Gᵍ) = L(G).

Before we start with the actual proof we shall prove some auxiliary statements. We call ρ an X-production if ρ = X → α for some α. Such a production is called left recursive if it has the form X → X·β. Let ρ = X → α be a rule; define Rρ as follows. For every factorisation α = α₁·Y·α₂ of α and every rule Y → β add the rule X → α₁·β·α₂ to R, and finally remove the rule ρ. Now let Gρ := ⟨S, N, A, Rρ⟩. Then L(Gρ) = L(G). We call this construction skipping the rule ρ. The reader may convince himself that the trees for Gρ can be obtained in a very simple way from trees for G, simply by removing all nodes x which dominate a local tree corresponding to the rule ρ. (This has been defined in Section 1.6.) This technique works only if ρ is not an ε-production. In that case we proceed as follows: replace every rule σ by all rules σ′ where σ′ derives from σ by applying the rule ρ. Skipping a rule does not necessarily yield a new grammar. This is so if there are rules of the form X → Y (in particular rules like X → X).

Lemma 2.33 Let G = ⟨S, N, A, R⟩ be a CFG and let X → X·αᵢ, i < m, be all left recursive X-productions, as well as X → βⱼ, j < n, all non left recursive X-productions. Now let G¹ := ⟨S, N ∪ {Z}, A, R¹⟩, where Z ∉ N ∪ A and R¹ consists of all Y-productions from R with Y ≠ X as well as the productions

(2.33) X → βⱼ ;  X → βⱼ·Z (j < n) ;  Z → αᵢ ;  Z → αᵢ·Z (i < m)

Then L(G¹) = L(G).

Proof. We shall prove this lemma rather extensively, since the method is relatively tricky. We consider the following priorisation on G¹. In all rules of the form X → βⱼ and Z → αᵢ we take the natural ordering (that is, the leftmost ordering), and in all rules X → βⱼ·Z as well as Z → αᵢ·Z we also put the left

to right ordering, except that Z precedes all elements of βⱼ and αᵢ, respectively. This defines the linearisation π. Now, let M(X) be the set of all α such that there is a leftmost derivation of α from X in G in which α is the first element not of the form X·β. Likewise, we define P(X) to be the set of all α which can be derived from X priorized by π in G¹ such that α is the first element which does not contain Z. We claim that P(X) = M(X). It can be seen that

(2.34) M(X) = P(X) = {βⱼ·αᵢ₀·αᵢ₁⋯αᵢ_{k−1} : j < n, iₗ < m for all ℓ < k}

From this the desired conclusion follows thus. Let x ∈ L(G). Then there exists a leftmost derivation ⟨Aᵢ : i < n + 1⟩ of x. (Recall that the Aᵢ are instances of rules.) This derivation is cut into segments Δᵢ, i < p, of length kᵢ, such that

(2.35) Δⱼ = ⟨A_{k₀+⋯+k_{j−1}+i} : i < kⱼ⟩

This partitioning is done in such a way that each Δᵢ is a maximal portion of X-productions or a maximal portion of Y-productions with Y ≠ X. The X-segments can be replaced by a derivation Δᵢ′ in G¹, by the previous considerations. The segments which do not contain X-productions are already G¹-derivations. For them we put Δᵢ′ := Δᵢ. Now let Δ′ be the result of stringing together the Δᵢ′. This is well-defined, since the first string of Δᵢ′ equals the first string of Δᵢ, and the last string of Δᵢ′ equals the last string of Δᵢ. Δ′ is a G¹-derivation, priorized by π. Hence x ∈ L(G¹). The converse is proved analogously, by beginning with a derivation priorized by π.

Now to the proof of Theorem 2.32. We may assume at the outset that G is in Chomsky Normal Form. We choose an enumeration of N as N = {Xᵢ : i < p}. We claim first that by taking in new nonterminals we can see to it that we get a grammar G¹ such that L(G¹) = L(G) in which the Xᵢ-productions have the form Xᵢ → x·⃗Y or Xᵢ → Xⱼ·⃗Y with j > i. This we prove by induction on i. Let i₀ be the smallest i such that there is a rule Xᵢ → Xⱼ·⃗Y with j ≤ i. Let j₀ be the largest j such that Xᵢ₀ → Xⱼ·⃗Y is a rule. We distinguish two cases. The first is j₀ = i₀. By the previous lemma we can eliminate the production by introducing some new nonterminal symbol Zᵢ₀. The second case is j₀ < i₀. Here we apply the induction hypothesis on j₀. We can skip the rule Xᵢ₀ → Xⱼ₀·⃗Y and introduce rules of the form (a) Xᵢ₀ → Xₖ·⃗Y with k > j₀. In this way the second case is either eliminated or reduced to the first.
116

Context Free Languages

Now let P : Zi : i p be the set of newly introduced nonterminals. It may happen that for some j Z j does not occur in the grammar, but this does not disturb the proof. Let nally Pi : Z j : j i . At the end of this reduction we have rules of the form (2.36b) (2.36c) Xi x Y

x Y . If It is clear that every X p 1 production already has the form X p 1 some X p 2 production has the form (2.36a) then we can skip this rule and get rules of the form X p 2 xY . Inductively we see that all rules of the form can be eliminated in favour of rules of the form (2.36b). Now nally the rules of type (2.36c). Also these rules can be skipped, and then we get rules of the form Z x Y for some x A, as desired. For example, let the following grammar be given.

With this we get the grammar


Next we skip the productions.


T S

T S

(2.41)

S T S

S T S

S T S S T S 7 7

T S

T S

(2.40)

S 7

(2.39)

Likewise we replace the production


T S

) 

(2.38)

The production lemma by


}

is left recursive. We replace it according to the above

S }

(2.37)

by

i W

Ux1 i U

k i i

T S 

S }

Zi

Pi

VaV s U W V V 1 U U

Zi

i W i W

(2.36a)

Xi

Xj Y

i W
}

Recognition and Analysis

117

Next can be eliminated (since it is not reachable) and we can replace on the right hand side of the productions the rst nonterminals by terminals.
S T

Now the grammar is in Greibach Normal Form.

Exercise 55. Let Gi be the invertible grammar constructed from G as dened above. Show that the relation dened by

is a backward simulation from Gi to G. Exercise 56. Let B be an ordered labelled tree. If x is a leaf then x . Since is a branch and can be thought of in a natural way as a string x the leaf x plays a special role, we shall omit it. We say, a branch expression of is a string of the form x x , x a leaf of . We call it x . Show that the set of all branch expressions of trees from L B G is regular. Exercise 57. Let G be in Greibach Normal Form and x a terminal string of length n 0. Show that every derivation of x has exactly the length n. How long is a derivation for an arbitrary string ? 3. Recognition and Analysis
e

CFLs can be characterized by special classes of automata, just like regular languages. Since there are CFLs that are not regular, automata that recognize them cannot all be finite state automata. They must have an infinite memory. The special way such a memory is organized and manipulated differentiates the various kinds of nonregular languages. CFLs can be recognized by so-called pushdown automata. These automata have a memory in the form of a stack onto which they can put symbols and remove (and read) them one by one. However, the automaton only has access to the symbol added most recently. A stack over the alphabet D is a string over D. We shall agree that the first letter of the string is the highest entry in the stack and the last letter


corresponds to the lowest entry. To denote the end of the stack, we need a special symbol, which we denote by . (See Exercise 43 for the necessity of an endofstack marker.) A pushdown automaton steers its actions by means of the highest entry of the stack and the momentary memory state. Its actions consist of three successive steps. (1) The disposal or removal of a symbol on the stack. (2) The moving or not moving of the read head to the right. (3) The change into a memory state (possibly the same one). If the automaton does not move the head in (2) we call the action an move. We write A in place of A . Denition 2.34 A pushdown automaton over A is a septuple

a function such that q a d is always nite. We call Q the set of states, i 0 the initial state, F the set of accepting states, D the stack alphabet, the beginning of the stack and the transition function.
x

if for some d1 d Z d1 , d e d1 and p e p Z x . We call this a transition. We extend the function to congurations. p d pd x is also used. Notice that in contrast to a pushdown automaton a nite state automaton may not change into a new state without reading a new symbol. For a pushdown automaton this is necessary in particular if the automaton wants to clear the stack. If the stack is empty then the automaton cannot work further. This means, however, that the pushdown automaton is necessarily partial. The transition function can now analogously be extended to strings. Likewise, we can dene it for sets of states.
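For concreteness, here is a minimal Python sketch of how such a machine processes a string. The encoding is our own, not the text's: configurations are pairs (state, stack string), δ is a dictionary keyed by (state, top of stack, letter), and None marks an ε-move; we assume only finitely many configurations are ε-reachable.

```python
def steps(delta, confs, letter):
    """One transition on `letter` from a set of configurations."""
    out = set()
    for q, stack in confs:
        if not stack:
            continue                      # empty stack: the automaton halts
        top, rest = stack[0], stack[1:]
        for p, push in delta.get((q, top, letter), ()):
            out.add((p, push + rest))
    return out

def eps_closure(delta, confs):
    seen, todo = set(confs), list(confs)
    while todo:
        c = todo.pop()
        for d in steps(delta, {c}, None):
            if d not in seen:
                seen.add(d); todo.append(d)
    return seen

def accepts_by_state(delta, i0, bottom, final, word):
    confs = eps_closure(delta, {(i0, bottom)})
    for a in word:
        confs = eps_closure(delta, steps(delta, confs, a))
    return any(q in final for q, _ in confs)
```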

T i ( 0 l) ) ( 0

1 l) i

i IS

V xU

(2.48)

x : for some q

Fz

D : i0

x
A

qz

k I

If

x
A

we say that there is a computation for x from to . Now

kk

kk

(2.47)

there exists

with

A A

x y

V )i ) U

10 k i )k ( V ) ) U

1 i) 0 !k (

W i Wi k D i i D i i 0 k i ) k 0 i ) ( (

(2.46)

pd

p d

1 i

We call :

q d , where q

Q and d

D , a conguration. We now write

V ) ) U

0i ) ( D

(2.45)

: Q

where Q and D are nite sets, i0

0 ) ) ) ) ) ( )

(2.44)

Q i0 A F D

Q,

D and F

Q, as well as

T s S

Recognition and Analysis

119

We call this the language which is accepted by by state. We call a pushdown automaton simple if from q z p Z a follows z a 2. It is an exercise to prove the next theorem.

Proposition 2.35 For every pushdown automaton down automaton such that L L .

there is a simple push-

For this reason we shall tacitly assume that the automaton does not write arbitrary strings but a single symbol. In addition to L there also is a language which is accepted by by stack.

The languages L and Ls are not necessarily identical for given . Howfor some pushdown automaton ever, the set of all languages of the form L s equals the set of all languages of the form L for some pushdown automaton. This follows from the next theorem.

Proof. Let Q i0 A F D be given. We add to Q two states, q i and q f . qi shall be the new initial state and F : q f . Further, we add a new symbol which is the beginning of the stack of . We dene qi : i0 . There are no more transitions exiting qi . For q qi q f and Z q Z x : q Z x , x A. Further, if q F and Z , we put qZ : qZ q f and otherwise q Z : q Z . for x A and q f Z : q f for Z Finally, let q f Z x :

versely, let x Ls . Then qi p for a certain p. Then is deleted only at last since it happens only in q f and so p q f . Further, we have putation i0 q d . This, however, is also a computation. This s L and so also the rst claim. Now for the construcshows that L tion of . We add two new states, q f and qi , and a new symbol, , which

V xU }YYU V 0aW i ) ( a$) ( 0 W A 0 a W i ) ( a) ( 0

qi

x
A

qd

for some state q

F. This means that there is an com-

0 ) a) ( ( 0

x
A

V YU

Since q f d

q f we have x

Ls

. Hence L

Ls

. Now, con-

0i

q d for some q

F and so we also have an computation q i

qf d .

. Assume now x

. Then there is a

computation i 0

x
A

V xU

Proposition 2.36 For every pushdown automaton there is an with L Ls as well as a pushdown automaton with L s L .

V U } dV xU ) ( a) ( 0 A 0 $) ( T 0 ) @S ( V ) ) D ) ) U V ) ) U D 1 D ) V ) U D )

V lU

T 0 ) ) ( ( 0

V 2xU

V2xU V xU

1 i 0 ) ( 0i ) ( 1 0i ) ( V xU 1 i T S 1s 1 w V ) ) U D T 0 ) (@SsIV ) ) U V ) ) U 1 V ) ) U V i6) D ) U D D T0 W ( a$) @S

0 ) ) ) ) ) ( )

V xU

i S

V YU

V 2xU

V 2xU

(2.49)

Ls

x : for some q

Q : i0

x
A

W i

V xU

V ) ) U

V xU

1 i d0 l) ( D

V U

1 i

V U

120

Context Free Languages

tion qi q f d for some d. One can see quite easily that d this computation factors as follows.

Lemma 2.37 Let L be a CFL over A. Then there exists a pushdown automaton such that L Ls . Proof. We take a CFG G N A R in Greibach Form with L L G . We assume that G. (If L G , then we construct an automaton for LG and then modify it slightly.) The automaton possesses only one state, i0 , and uses N as its stack alphabet. The beginning of the stack is .

This denes : i0 i0 A i0 N . We show that L Ls . To this end recall that for every x L G there is a leftmost derivation. In a grammar in Greibach Form every leftmost derivation derives strings of the form y Y . i0 y . Now one shows by induction that G y Y iff i0 Y

Q i0 A F D be a pushdown automaton. We may asProof. Let sume that it is simple. Put N : Q D Q , where is a new symbol.
}

V 2xU

V S IT s

U e

e 0 ) D ) ) ) ) ( )

Lemma 2.38 Let

be a pushdown automaton. Then L s

is context free.

i W i

V xU

V !) ) U i

1 d0 i ) (

1 i W

i i W ~

(2.52)

i0 X x :

i0 Y : X

x Y

V U

0 ) $) ( ( 0

a transition. Hence there is a computation i 0 follows x Ls , and so Ls L .

V U 1 i 0 ) ) )bT ') ) bT !( S ) S } D ( 0 i ) @S V ) ) U D

V lU

V U 1 0 ) ) ) (

V 2xU

V xU

V 2xU

T S V U v

1 i

Here p

Q, whence p

q f qi . But every

0 )

) D ( a) a W ) a) ( 0 ( 0 ( 0

(2.51)

qi

i0

qf

transition from i0 to p is also


x

p . From this

D i

V lU

1 i

0 i ) a) ( ( 0 A V lU 1 i

Hence x

. Conversely, let x

0 )

( a) a) ( 0 ( 0

(2.50)

qi

x
A

qf

. Then there exists an computa-

0 ) ) ( ( 0

consider an x Ls . There is a computation i0 Then there exists an computation

x
A

p for some p.

. Further,

shall be the begin of stack of , and we put F qi x : for x A and qi : q Z x : q Z x for Z and q q x : for x A. Further, q f Z x :

w V ) ) U D T0 ) @S ( V ) U ) Ta ( 0 V ) U D ) W ) D @S D T S D

: q f . Again we put i0 . Also, we put : q f , as well as . This denes . Now

1 w V ) ) U D V ) ) " U V ) ) U D V ) ) U V xU

1 i

Recognition and Analysis


}

121

shall also be the start symbol. We write a general element of N in the form q A p . Now we dene R : Rs R0 R R , where

(2.53)

This sufces for the proof. For if x L G then we have i 0 q G x and i0 x , which means nothing but x Ls . so because of (2.54) q And if the latter holds then we have i 0 q G x and so G x, which is nothing else but x L G . Now we show (2.54). It is clear that (2.54) follows from (2.55).

(2.55) is proved by induction. On some reection it is seen that for every automaton there is an automaton with only one accepting state which accepts the same language. If one takes in place of then there is no need to use the trick with a new start symbol. Said in another way, we may choose i 0 q as a start symbol where q is the accepting state of . Theorem 2.39 (Chomsky) The CFLs are exactly the languages which are accepted by a pushdown automaton, either by state or by stack. From this proof we can draw some further conclusions. The rst conclusion is that for every pushdown automaton we can construct a pushdown automaton for which Ls Ls and which contains no moves. Also, such that L s Ls and which there exists a pushdown automaton contains only one state, which is at the same time an initial and an accepting state. For such an automaton these denitions reduce considerably. Such an automaton possesses as a memory only a string. The transition function can

V ) ) U i

q0 Y0Y1

V xU

1d0 aa ) )

) )

V lU

) ( aa )

V xU

V YU

Wi

~ 

) )

(2.55)

pZ q

y q0 Y0 q1 q1 Y1 q2

qm

Ym

Ym

pZ y

V xU

1 i

) )

V !) ) U i

~ l

) )

V U

1 d0 ) (

V 6) U i) 1 i

1 Q0 ) (

V U

~ 

1 i

) )

(2.54)

pZ q

pZ x

1 i

V U

The grammar thus dened is called G every p q Q and every Z D

. We claim that for every x

TV ) ) U Q0 ) k ( 1 T V ) ) U 0 ) ( 1 T V ) ) U 0 ) ( 1
A ,

) ) ! ) ) k $ ) ) $ $ ) )

) ) @S ) ) @S D ) ) @S D S D

: : : :

Rs R0 R R

i0 pZ q pZ q pZ q

q : q Q x: r pZ x xrY q : rY pZ x p X r r Y q : p XY pZ

1 )

) )

122

Context Free Languages

be reduced to a function from A D into nite subsets of D . (We do not allow transitions.) The pushdown automaton runs along the string from left to right. It recognizes in linear time whether or not a string is in the language. However, the automaton is nondeterministic. Q i 0 A F D is determinDenition 2.40 A pushdown automaton istic if for every q Q, Z D and x A we have q Z x 1 and for all q Q and all Z D either (a) q Z or (b) q Z a for all for a deterministic a A. A language L is called deterministic if L L automaton . The set of deterministic languages is denoted by . Deterministic languages are such languages which are accepted by a deterministic automaton by state. Now, is it possible to build a deterministic automaton accepting that language just like regular languages? The answer is negative. To this end we consider the mirror language x x T : x A . This language is surely context free. There are, however, no deterministic automata that accept it. To see this one has to realize that the automaton will have to put into the stack the string x xT at least up to x in order to compare it with the remaining word, xT . The machine, however, has to guess when the moment has come to change from putting onto stack to removing from stack. The reader may reect that this is not possible without knowing the entire word. Theorem 2.41 Deterministic languages are in DTIME n . The proof is left as an exercise. We have seen that also regular languages are in DTIME n . However, there are deterministic languages which are not regular. Such a language is L x xT : x . In contrast to the mirror language L is deterministic. For now the machine does not have to guess where the turning point is: it is right after the symbol . Now there is the question whether a deterministic automaton can recognize languages using the stack. This is not the case. For let L L s , for some deterministic automaton . Then, if x y L for some y then x L. We say that L is prex free if it has this property. For if x L then there exists a computation from q0 to q . Further, since is deterministic: if q0 then q . However, if the stack has been emptied the automaton cannot work further. Hence x y L. There are deterministic languages which are not prex free. We present an important class of such

1 i i

1 i V xU

V U D w V ) ) U w V ) ) U D D jV ) ) U 1 0 ) ) ) ) ) ( ) D

1 i

V U

1 i i D

i S i

V U

1 i i

0 ) (

0 ) ( D 0 ) (

i i

T Fl) 1 i T S H

) 0

i i 8p IS

Recognition and Analysis

123

languages, the Dycklanguages. Let A be an alphabet. For each x A let x be another symbol. We write A : x : x A . We introduce a congruence on A A . It is generated by the equations (2.56) aa

for all a A. (The analogous equations aa are not included.) A string x A A is called balanced if x . x is balanced iff x can be rewritten into by successively replacing substrings of the form xx into . Denition 2.42 Dr denotes the set of balanced strings over an alphabet consisting of 2r symbols. A language is called a Dycklanguage if it has the form Dr for some r (and some alphabet A A). The language XML (Extensible Markup Language, an outgrowth of HTML) embodies like no other language the features of Dycklanguages. For every string x it allows to form a pair of tags x (opening tag) and x (closing tag). The syntax of XML is such that the tags always come in pairs. The tags alone (not counting the text in between) form a Dyck Language. What distinguishes XML from other languages is that tags can be freely formed. Proposition 2.43 Dycklanguages are deterministic but not prex free. The following grammars generate the Dycklanguages:

Dycklanguages are therefore context free. It is easy to see that together with x y Dr also xy Dr . Hence Dycklanguages are not prex free. That they are deterministic follows from some general results which we shall establish later. We leave it to the reader to construct a deterministic automaton which recognizes Dr . This shows that the languages which are accepted by a deterministic automaton by empty stack are a proper subclass of the languages which are accepted by an automaton by state. This justies the following denition. Denition 2.44 A language L is called strict deterministic if there is a deterministic automaton such that L Ls . The class of strict deterministic languages is denoted by s . Theorem 2.45 L is strict deterministic if L is deterministic and prex free.

V xU

1 ii

(2.57)

x x
}

i @

i @

} }

s 1 i U

s Y U 1 !) i i

124

Context Free Languages

Proof. We have seen that strict deterministic languages are prex free. Now let L be deterministic and prex free. Then there exists an automaton which accepts L by state. Since L is prex free, this holds for every x L, and for every proper prex y of x we have that if q 0 q Y then q is not an accepting state. Thus we shall rebuild in the following way. Let 1 q Z x : q Z x if q is not accepting. Further, let 1 q Z x : if q F and x A; q , Z D. Finally, let be the automaton which relet 1 q Z : sults from by replacing with 1 . is deterministic as is easily checked. Further, an computation can be factored into an computation followed by a deletion of the stack. We claim that L L s . The claim then follows. So let x L . Then there exists a computation using x from q 0 to q Y where q F. For no proper prex y of x there is a computation into an accepting state since L is prex free. So there is an computation with x from q0 to q Y . Now q Y q and so x Ls . Conversely, asthe longest string such that q0 q Y . Then the step before reaching q Y is a step. So there is a computation for x from q 0 to q Y , and so x L . The proof of this theorem also shows the following.

For the following denition we make the following agreement, which shall be used quite often in the sequel. We denote by k the prex of of length k in case has length at least k; otherwise k : . Denition 2.47 Let G N A R be a grammar and N A a partition. We write if there is an M such that M. is called strict for G if the following holds.

(b) 1

and C

C.

k i

(a) 1 2

and

2 or

i i

For C C N and 1 2 N as C 2 R then either

U f1 i ) i ) i

i D D i) D 1 i i 1 $k

bk

A : if C C and C

1 as well

1 i

Theorem 2.46 Let U be a deterministic CFL. Let L be the set of all x which no proper prex is in U. Then L is strict deterministic.

` 0 i) (

0 )

D i

0 i ) $) ( 0

x
A

U for

1 i

0 ) ) ) (

V 2xU

V YU

1 i

1 i

sume x

Ls

. Then there is a computation q0

q . Let Y

D be

0 )

1 V ) ) U

0 ) ) ( 0 A V YU 1 i

0 i) ( V U

y
A

1 i

V ) ) U

$) 0 i D

V xU

0 ) 0 i ) ( (

T 0 ) @S (

0 i) (

1 V xU

1 i

V ) ) U V ) ) U

0 )

0 i) (

0 i) (

Recognition and Analysis

125

Denition 2.48 A CFG G is called strict deterministic if there is a strict partition for G. We look at the following example (taken from (Harrison, 1978)):

is a strict partition. The language generated by this grammar is n k n n k k : k n 1 . We shall now show that the languages generated by strict deterministic grammars are exactly the strict deterministic languages. This justies the terminology in retrospect. To begin, we shall draw a few conclusions from the denitions. If G N A R is strict deterministic and R R then N A R is strict deterministic as well. Therefore, for a strict deG terministic grammar we can construct a weakly equivalent strict deterministic slender grammar. We denote by n the fact that there is a leftmost L derivation of length n of from . Lemma 2.49 Let G be a CFG with a strict partition . Then the following is true. For C C N and 1 2 N A : if C C and C n 1 as well L n then either as C L 2 1

The proof is an easy induction over the length of the derivation.


L
|

Proof. Assume C n D . Then because of Lemma 2.49 we have for all L k 1: C kn D for some . From this it follows that there is no terminating L leftmost derivation from C. This contradicts the fact that G is slender. It follows that a strict deterministic grammar is not left recursive, that is, A L A cannot hold. We can construct a Greibach Normal Form for G in the following way. Let C be a rule. If A then we skip by such that R. Then replacing it with the set of all rules C Lemma 2.49 assures us that is a strict partition also for the new grammar. This operation we repeat as often as necessary. Since G is not left recursive, this process terminates.

Lemma 2.50 Let G be slender and strict deterministic. Then if C we have C D.

1 i

i i

and C

C.

1 2

and

2 or

} k

i i

s i

p ) S T H )H S ) T 5'bT H ) 'bT 'b'l) S S) S) T p) S T S } H D

U 1 i ) i ) i

0 ) ) ) (

i
}

H
|

0k ) ) ) (

i D D i 1 $k )

S T

(2.58)

pp q
D

S H H

D i) i

126

Context Free Languages

Theorem 2.51 For every strict deterministic grammar G there is a strict deterministic grammar H in Greibach Normal Form such that L G LH . Now for the promised correspondence between strict deterministic languages and strict deterministic grammars. Lemma 2.52 Let L be strict deterministic. Then there exists a deterministic automaton with a single accepting state which accepts L by stack. be given. We add a new state q into which the automaton Proof. Let changes as soon as the stack is empty. Lemma 2.53 Let be a deterministic automaton with a single accepting is strict deterministic. state. Then G Proof. Let Q i0 A F D . By the preceding lemma we may assume that F q f . Now let G dened as in (2.53). Put

We show that is a strict partition. To this end, let q Z q 1 and qZ q 2 be two rules. Assume rst of all 1 2 . Case 1. . Consider i : 1 i . If 1 A then also 2 A, since is deterministic. If on the other hand 1 A then we have 1 q Y0 q1 and 2 q Y0 q1 , 1 . If and so 1 2 . Case 2. . Let then : A, then we now have 1 qi Yi qi 1 and 2 qi Yi qi 1 for some qi qi 1 qi 1 Q. This completes this case. Assume now 1 . Then 1 is a prex of 2 . Case 1. . Then 2 , hence 2 . Case 2. . Then it is easy to see that 2 . Hence in both cases we have 2 , and so q q . This shows the claim. Theorem 2.54 Let L be a strict deterministic language. Then there exists a strict deterministic grammar G such that L G L. The strategy to put a string onto the stack and then subsequently remove it from there has prompted the following denition. A stack move is a move where the machine writes a symbol onto the stack or removes a symbol from the stack. (So the stack either increases in length or it decreases.) The automaton is said to make a turn if in the last stack move it increased the stack and now it decreases it or, conversely, in the last stack move it diminishes the stack and now increases it.

` kk k i D i D i i i i D D D D i i i i i i D D 1 k ) ) k ) ) ) ) D D 1 i i D k ) ) ) ) D 1 D 1D 1 i D i i  kk ) ) i) i "k ) ) D

i i i

1 )  kk lk ) 1 ) kk ) )

V U

(2.59)

A or qZ q qZ q for some q q q QZ

V U

V U

) k ) ) 1 D)

V U 0 ) ) ) ) ) ( )

V U

Recognition and Analysis

127

Denition 2.55 A language L is called an nturn language if there is a pushdown automaton which recognizes every string from L with at most n turns. L is ultralinear if it is an nturn language for some n . Notice that a CFL is nturn exactly if there is an automaton which accepts L and in which for every string x every computation needs at most n turns. For given any automaton which recognizes L, we build another automaton which has the same computations as except that they are terminated before the n 1st turn. This is achieved by adding a memory that counts the number of turns. We shall not go into the details of ultralinear languages. One case is worth noting, that of 1turn languages. A CFG is called linear if in every rule X the string contains at most one occurrence of a nonterminal symbol. A language is linear if it is generated by a linear grammar. Theorem 2.56 A CFL is linear iff it is 1turn. Proof. Let G be a linear grammar. Without loss of generality a rule is of the aY or X Ya. Further, there are rules of the form X . We form X N, where is the beginning construct the following automaton. D : q , i0 : ,F: q . Further, for x A we put of the stack, Q : X x : Y if X xY R and X : Y if X Y x R; let Y x : if X Y x R. And nally X x : if X x R. Finally, : q . This denes the automaton G . It is not hard to show that G only admits computations without stack moves. For if the automaton is in state the stack may not decrease unless the automaton changes into the state . If it is in , the stack may not increase and it may only be changed into a state , or, nally, into q. We leave it to the reader to check that L G L G . Therefore L G is a 1turn language. Conversely, let be an automaton which allows computations with at most one turn. It is then clear that if the stack is emptied the automaton cannot put anything on it. The automaton may only ll the stack and later empty it. Let us consider the automaton G as dened above. Then all rules are of the form X xY with x A . Let Y Y0Y1 Yn 1 . We claim that every Yi production for i 0 is of the form Yi a or Yi X. If not, there is a computation in which the automaton makes two turns, as we have indicated above. (This argument makes tacit use of the fact that the automapX q ton possesses a computation where it performs a transition to Yi that is to say, that it goes from p to q where X is the topmost stack symbol.

) )

V U V U V aV U xU D v v v g V U V U T0 ) @S ( V 'U ) ) v 1 T ) v( 0 '@S Va0 ) )(aU g 1D T ) v( 0 '@S V ) 'U ) v 1 D T ) g 0 (@S V ) )U g 1 TD ) g ( 0 @S V ) U ) g D D g ) v) g T S T iIS D D s T D HS

aa

D V i xU

128

Context Free Languages

If this is not the case, however, then the transitions can be eliminated without harm from the automaton.) Now it is easy to eliminate the rules of the form Yi X by skipping them. Subsequent skipping of the rules Yi a yields a linear grammar. The automata theoretic analyses suggest that the recognition problem for CFLs must be quite hard. However, this is not the case. It turns out that the recognition and parsing problem are solvable in O n 3 steps. To see this, let a grammar G be given. We assume without loss of generality that G is in Chomsky Normal Form. Let x be a string of length n. As a rst step we try to list all substrings which are constituents, together with their category. If x is a constituent of category S then x L G ; if it is not, then x L G . In order to enumerate the substrings we use an n 1 n 1 matrix whose entries are subsets of N. Such a matrix is called a chart. Every substring is dened by a pair i j of numbers, where 0 i j n 1. In the cell i j we enter all X N for which the substring x i xi 1 x j 1 is a constituent of category X. In the beginning the matrix is empty. Put d : i j. Now we start by lling the matrix starting at d 1 and counting up to d n. For each d, we go from i 0 until i n d. So, we begin with d 1 and compute for i 0, i 1, i 2 and so on. Then we set d : 2 and compute for i 0, i 1 etc. We consider the pair d i . The substring x i xi d is a constituent of category X iff it decomposes into substrings y x i xi e and z xi e 1 xi d such that there is a rule X Y Z where y is a constituent of category Y and z is a constituent of category Z. This means that the set of all X N which we enter at i i d is computed from all decompositions into substrings. There are d 1 n such decomposition. For every decomposition the computational effort is limited and depends only on a constant c G whose value is determined by the grammar. For every pair we need c G n 1 steps. Now there exist n proper subwords. Hence the effort is bounded by c G n3 . 2 In Figure 8 we have shown the computation of a chart based on the word . Since the grammar is invertible any substring has at most one category. In general, this need not be the case. (Because of Theorem 2.27 we can always assume the grammar to be invertible.)

The construction of the chart is as follows. Let Cx i j be the set of all nonterminals X such that X G xi xi 1 x j 1 . Further, for two nonterminals X

V ) U

aa

T }

} T

(2.60)
T

V U

aa aa

g U

1 i

aa

g "V U e

g U

0 ) (

V U

1 i

S T

v g ) (

S }

T S

0 ) (

} 7}

} S

aa

H H H

0 ) (

Recognition and Analysis

129

Now we can compute Cx i j inductively. The induction parameter is j i. If j i 1 then Cx i j X :X x R . If j i 1 then the following equation holds.
i k j

We always have j k k i j i. Now let x L G . How can we nd a derivation for x? To that end we use the fully computed chart. We begin with x and decompose it in an arbitrary way; since x has the category , there must be a rule XY and a decomposition into x of category X and y of category Y . Or x a A and a is a rule. If the composition has been found, then we continue with the substrings x and y in the same way. Every decomposition needs some time, which only depends on G. A substring of length i has i n decompositions. In our analysis we have at most 2n substrings. This follows from the fact that in a properly branching tree with n leaves there are at most 2n nodes. In total we need time at most d G n2 for a certain constant dG which only depends on G. From this it follows that in general even if the grammar is not in Chomsky Normal Form the recognition and analysis only needs O n 3 steps where

V U

1 i

V ) U

V ) U

v )

V ) U

(2.62)

Cx i j

Cx i k

Cx k j

0 21

) %1

S V ) U V ) UDA

(2.61)

Y :X

} ) %

and Y X

Y:

Z:Z

1HFH

Figure 8. A Chart for

XY

R and for sets

T T w } T }
N let

H S w S } T


S S S

w
}


S T

130

Context Free Languages

at the same time we only need O n2 cells. For let G be given. Now transform G into 2standard form into the grammar G 2 . Since L G2 L G , the recognition problem for G is solvable in the same amount of time as G 2 . One needs O n2 steps to construct a chart for x. One also needs an additional O n2 steps in order to create a Gtree for x and O n steps to turn this into a derivation. However, this is not already a proof that the problem is solvable in O n 3 steps and O n2 space, for we need to nd a Turing machine which solves the problem in the same time and space. This is possible; this has been shown independently by Cocke, Kasami and Younger. Theorem 2.57 (Cocke, Kasami, Younger) CFLs have the following multitape complexity.

Proof. We construct a deterministic 3 tape Turing machine which only needs O n2 space and O n3 time. The essential trick consists in lling the tape. Also, in addition to the alphabet A we need an auxiliary alphabet consisting of and as well as for every U N a symbol U and a symbol U . On Tape 1 we have the input string, x. Put C i j : Cx i j . Let x have length n. On Tape 1 we construct a sequence of the following form.
T T 7T

This is the skeleton of the chart. We call a sequence of s in between two s a block. The rst block is being lled as follows. The string x is deleted step by step and the sequence n is being replaced by the sequence of the C i i 1 . This procedure requires O n2 steps. For every d from 1 to n 1 we shall ll the d 1st block. So, let d be given. On Tape 2 we write the sequence (2.64)

On Tape 3 we write the sequence (2.65)

V ) v U aaV ) g v U !V ) v U W W aaV g ) U aaV g ) U !V g ) U W W V ) v U aaV ) U !V ) U

C 0d C 1d C d 1d C 1d 1 C 2d 1 C d d 1 C n d n C n d 1n C n 1n

V )

v U aaV g v ) v U !V g v ) v U W aaWlV g ) U aaV ) U !V ) U W W V ) U aaV ) U !V ) U

C 01 C 02 C 0d C 12 C 13 C 1d 1 C n d n d 1 C n d n

C n

d n

g ) U

aay

(2.63)

n 1

V ) UA

V ) U

CFL

DSPACE n2 .

CFL

DTIME n3 .

V U

V U

V V

U U

Recognition and Analysis

131

From this sequence we can compute the d 1st block quite fast. The automaton has to traverse the rst block on Tape 2 and the second block on Tape 3 cogradiently and memorize the result of C 0 j C j d 1 . When it reaches the end it has computed C 0 d 1 and can enter it on Tape 1. Now it moves on to the next block on the second and the third tape and computes C 1 d 2 . And so on. It is clear that the computation is linear in the length of the Tape 2 (and the Tape 3) and therefore needs O n 2 time. At the end of this procedure Tape 2 and 3 are emptied. Also this needs quadratic time. At the end we need to consider that the lling of Tapes 2 and 3 needs O n 2 time. Then for every d the time consumption is at most O n 2 and in total O n3 . For this we rst write and position the head of Tape 1 on the element C 0 1 . We write C 0 1 onto Tape 2 and C 0 1 onto Tape 1. (So, we tick off the symbol. This helps us to remember what we did.) Now we advance to C 1 2 copy the result onto Tape 2 and replace it by C 1 2 . And so on. This only needs linear time; for the symbols C i i 1 we recognize because they are placed before the . If we are ready we write onto Tape 2 and move on Tape 1 on to the beginning and then to the rst symbol to the right of a ticked off symbol. This is C 1 2 . We copy this symbol onto Tape 2 and tick it off. Now we move on to the next symbol to the right of the symbol which has been ticked off, copy it and tick it off. In this way Tape 2 is lled in quadratic time. At last the symbols that have been ticked off are being ticked on, which needs O n 2 time. Analogously the Tape 3 is lled. Exercise 58. Prove Proposition 2.35. Exercise 59. Prove Theorem 2.41. Hint. Show that the number of moves of an automaton in scanning of the string x is bounded by k x , where k is a number that depends only on . Now code the behaviour of an arbitrary pushdown automaton using a 2tape Turing machine and show that to every move of the pushdown automaton corresponds a bounded number of steps of the Turing machine. Exercise 60. Show that a CFL is 0turn iff it is regular. Exercise 61. Give an algorithm to code a chart onto the tape of a Turing machine. Exercise 62. Sketch the behaviour of a deterministic Turing machine which recognizes a given CFL using O n2 space.

V V ) U

niy

g ) U

g ) U U

V ) U

V ) U

V ) U

g ) U V

V ) U

V ) U

V ) U

g ) U V U

132

Context Free Languages

Exercise 64. Construct a deterministic automaton which recognizes a given Dycklanguage. Exercise 65. Prove Theorem 2.46. 4. Ambiguity, Transparency and Parsing Strategies

In this section we will deal with the relationship between strings and trees. As we have explained in Section 1.6, there is a bijective correspondence between derivations in G and derivations in the corresponding graph grammar G. Moreover, every derivation Ai : i p of G denes an exhaustively ordered tree with labels in N A whose associated string is exactly p , p 1 C p 1 p . If p is not a terminal string, the labels of where A p 1 the leaves are also not all terminal. We call such a tree a partial Gtree. Denition 2.58 Let G be a CFG. is called a Gconstituent of category A if A G . Let be a Gtree with associated string x and y a substring of x. Assume further that y is a Gconstituent of category A and x D y . The occurrence D of y in x is called an accidental Gconstituent of category A in if it is not a Gconstituent of category A in . We shall illustrate this terminology with an example. Let G be the following grammar.
S

The string x has several derivations, which generate among other the following bracketing analyses. (2.67)

We now list all Gconstituents which occur in x: (2.68)


T

:
}

:
S

VVV VV aaa U aa aU UUUV U H H H

  F ) ) H H H H @H H ) @ H ) l)  ) ) H @H H H H H H H H H

)VVVV V U U aaa aU "U U H H H

T }

} T

i H @H H D

(2.66)
T

V U i

1 i D s

Exercise 63. Show that w wT : w

is context free but not deterministic.

S T

0 i)

i S i
S } T S

} 7}

} S

i(

Ambiguity, Transparency and Parsing Strategies

133

Some constituents occur several times, for example in and also in . Now we look at the rst bracketing, . The constituents are (contexts: , , ), , (for example in the context: ), in the context , , and . These are the constituents of the tree. The occurrence of in is therefore an accidental occurrence of a Gconstituent of category in that tree. For although is a Gconstituent, this occurrence in the tree is not a constituent occurrence of it. Notice that it may happen that y is a constituent of the tree but that as a Gconstituent of category C it occurs accidentally since its category in is D C. Denition 2.59 A grammar G is called transparent if no Gconstituent occurs accidentally in a Gstring. A grammar which is not transparent will be called opaque. A language for which no transparent grammar exists will be called inherently opaque. An example shall illustrate this. For any given signature , Polish Notation can be generated by a transparent grammar.

This denes the grammar for PN . Moreover, given a string x generated by this grammar, the subterm occurrences of x under a given analysis are in one to one correspondence with the subcontituents of category . An occurrence of an nary function symbol is a constituent of type n . We shall show that this grammar is not only unambiguous, it is transparent. Let x x0 x1 xn 1 be a string. Then let x : i n xi , where for every f F, f : f 1. (So, if f 0, f 1.) The proof of the following is left as an exercise.

It follows from this theorem that no proper prex of a term is a term. (However, a sufx of a term may again be a term.) The constituents are therefore all the substrings that have the properties (a) and (b). We show that the grammar is transparent. Now suppose that x contains an accidental occurrence of a term y. Then this occurrence overlaps properly with a constituent z. Without loss of generality y u v and z v w (with u w ). It follows that v y u 0 since u 0. Hence there exists a proper prex u 1

) i i

i Wi

V U i

Lemma 2.60 x PN iff (a) x x we have y 0.

1 and (b) for every proper prex y of

V U

V U

V U D i

V U

V 6U i

i W i

vV U V U D aa

i yV 6U

1 i

V U i

(2.69)

0 ) ( H  @H @ 0 ) ( @ 0 l) ( H@H @H @l) H H (   ( @ H l) ( H H H 0 0 ) 0 H H H @H VVVV V U U aaaH aU "HU H U 0  ) ( H H H H H H


f
}

v i V U

@ } H @ H D 1 i i

0 l) ( H H D

H H H H V U i i

134

Context Free Languages

of u such that u1 1. (In order to show this one must rst conclude that the set P x : p : p is a prex of x is a convex set for every term x. See Exercise 68.)

Now look at the languages and . Both are regular. There is a transparent regular grammar for . It has the rules , . is on the other hand inherently opaque. For any CFG must generate at least two constituents of the form p and q , q p. Now there exist two occurrences of p in q which properly overlap. One of them must be accidental.

It can easily be seen that if L is transparent and L, then L . Also, a language over an alphabet consisting of a single letter can only be transparent if it contains no more than a single string. Many properties of CFGs are undecidable. Transparency is different in this respect. Theorem 2.63 (Fine) Let G be a CFG. It is decidable whether or not G is transparent. Proof. Let k be the constant from the Pumping Lemma (1.81). This constant can effectively be determined. By Lemma 2.64 there is an accidental occurrence of a constituent iff there is an accidental occurrence of a right hand side of a production. These are of the length p 1 where p is the maximum productivity of a rule from G. Further, because of Lemma 2.66 we only need to check those constituents for accidental occurrences whose length does not exceed p2 p. This can be done in nite amount of time.

Proof. Let be a string of minimal length which occurs accidentally. And let C be an accidental occurrence of . Further, let 1 2 , and let A be a rule. Then two cases may occur. (A) The occurrence of is accidental. Then we have a contradiction to the minimality of . (B) The occurrence of is not accidental. Then : 1 A2 also occurs accidentally in C ! (We can undo the replacement A in the string C since is a constituent.) Also this contradicts the minimality of . So, is the right hand side of a production.

V iU

i i i

Lemma 2.64 G is opaque iff there is a production has an accidental occurrence in a partial Gtree.

such that

Proposition 2.62

is inherently opaque.

Theorem 2.61 The grammar is transparent.

T S

T S

V iU

Ti

H H

V jU S i H

i D

V U i g

Ambiguity, Transparency and Parsing Strategies

135

Lemma 2.65 Let G be a CFG without rules of productivity 1 and let , be strings. Further, assume that is a Gconstituent of category A in which occurs accidentally and in which is minimal in the following sense: there is no of category A with (1) and (2) G and (3) occurs accidentally in . Then every constituent of length 1 overlaps with the accidental occurrence of . Proof. Let 1 2 , 1, and assume that the occurrence of is a constituent of category A which does not overlap with . Then occurs accidentally in : 1 A 2 . Further, , contradicting the minimality of . Lemma 2.66 Let G be a CFG where the productivity of rules is at least 0 and at most p, and let be a string of length n which occurs accidentally. Then there exists a constituent of length np in which occurs accidentally. Proof. Let A G be minimal in the sense of the previous lemma. Then we have that every constituent of of length 1 overlaps properly with . Hence has been obtained by at most n applications of rules of productivity 0. Hence np. The property of transparency is stronger than that of unique readability, also known as unambiguity, which is dened as follows. Denition 2.67 A CFG G is called unambiguous if for every string x there is at most one Gtree whose associated string is x. If G is not unambiguous, it is called ambiguous. A CFL L is called inherently ambiguous if every CFG generating it is ambiguous. Proposition 2.68 Every transparent grammar is unambiguous. There exist inherently ambiguous languages. Here is an example. Theorem 2.69 (Parikh) The language L is inherently ambiguous.

Proof. L is context free and so there exists a CFG G such that L G L. We shall show that G must be ambiguous. There is a number k which satises the Pumping Lemma (1.81). Let n : k! : k 1 i . Then there exists a i decomposition of 2n 2n 3n into

i W i W i W i W i

(2.71)

u 1 x1 v1 y1 z1

V U

S "T s

(2.70)

L:

n n m

:n m

m n n

:n m

i i

i b i

i $ i i i H

 i

i i i

D i

R i

136

Context Free Languages

in such a way that u1 k. Furthermore, we may also assume that v 1 k. It is easy to see that x1 y1 may not contain occurrences of , and at the p and same time. Since it contains , it may not contain . So we have x 1 p for some p. We consider a maximal constituent of (2.71) of the form y1 q q . Such a constituent must exist. (x 1 v1 y1 is of that form.) In it there q i q i for some i is a constituent of the form k. This follows from the Pumping Lemma. Hence we can pump up i and i at the same time and get strings of the form

in such a way that z2 v2 k. Analogously we get a constituent of the form 2p s 2p r for certain r s k. These occurrences overlap. For the left hand constituent contains 3p s many occurrences of and the right hand constituent contains 3p s many occurrences of . Since 3p s 3p s 6p s s 3p, these constituents must overlap. However, they are not equal. But this is impossible. So G is ambiguous. Since G was arbitrary, L is inherently ambigous. Now we discuss a property which is in some sense the opposite of the property of unambiguity. It says that if a right hand side occurs in a constituent, then under some different analysis this occurrence is actually a constituent occurrence. Denition 2.70 A CFG has the NTSproperty if from C G 1 2 and B R follows: C G 1 B 2 . A language is called an NTSlanguage if it has an NTSgrammar. The following grammar is not an NTSgrammar.

For we have but it does not hold that mars are not NTS. However, we have
~

H @H )

(2.75)

. In general, regular gram-

i W i W i

i W

v ) k k

W i

) ni i

i W i W i W i W i

(2.74)

u 2 x2 v2 y2 z2

Now we form a decomposition of

(2.73)

3p 3p 3q

3n 2n 2n

into

while there exists a constituent of the form k. In particular, for k : p i we get

(2.72)

2p ki 2p ki 3q 2p ki r 2p ki s

for certain r s

H p D i

i i dW QW i H

i i i

Vk

g U v

p W

1 i

iH

Ambiguity, Transparency and Parsing Strategies

137

Theorem 2.71 All regular languages are NTSlanguages. Proof. Assume that L is regular. Then there exists a nite state automaton A Q q0 F such that L L . Put N : S L pq :pq Q . Further, put G : N A R , where R consists of (2.76) pq pq

Then we have p q G x iff q p x , as is checked by induction. From L . Hence we have L G L. It remains this follows that G x iff x to show that G has the NTSproperty. To this end let p q G 1 2 r s 2 . In order to and r s G . We have to show that p q G 1 do this we extend the automaton to an automaton which reads strings from N A. Here q p C iff for every string y with C G y we have q p y . Then it is clear that q p p q . Then it still holds that p q G iff q p . Hence we have r p 1 and q s 2 . From this follows that p q G p r r s s q and nally p q G 1 r s 2 . If a grammar has the NTSproperty, strings can be recognized very fast. We sketch a pushdown automaton that recognizes L G . Scanning the string from left to right it puts the symbols onto the stack. Using its states the automaton memorizes the content of the stack up to symbols deep, where is the length of a longest right hand side of a production. If the upper part of the stack matches a right hand side of a production A in the appropriate order, then is deleted from the stack and A is put on top of it. At this moment the automaton rescans the upper part of the stack up to symbols deep. This is done using a series of empty moves. The automaton pops symbols and then puts them back onto the stack. Then it continues the procedure above. It is important that the replacement of a right hand side by a left hand side is done whenever rst possible. Theorem 2.72 Let G be an NTSgrammar. Then G is deterministic. Furthermore, the recognition and parsing problem are in DTIME n . We shall deepen this result. To this end we abstract somewhat from the pushdown automata and introduce a calculus which manipulates pairs x of strings separated by a turnstile. Here, we think of as the stack of the automaton and x as the string to the right of the reading head. It is not really

i V ) U i

i W i W i

i ~ i

V ) FI U

~ U i V ) UFI i V ) I V i) U 1

~ 

U i W V ) I W i V ) I U V U

V U

V U

V aV ) U

V ) I U

1 U

pa

1 U i

V U 1 i i V !) U 1 i i

V ) V ) "V ) I U I U I U V i) U 1 V aV ) ) U U I

V ) UV ) I I U V ) I U

q0 q pr rq

1 )

V ) U S b S s T

V U

0 ) ) )

1 V ) U

V ) } I U

V ) FI U V ) FI U

( 0 )D ) i 1

V ) FI U V i) U 1

) ) ( i

V ) FI U

138

Context Free Languages

necessary to have terminal strings on the right hand side; however, the generalization to arbitrary strings is easy to do. There are several operations. The rst is called shift. It simulates the reading of the rst symbol.
~ ~~

Another operation is reduce. (2.78) reduce :

x X x

Here X must be a Grule. This calculus shall be called the shift reducecalculus for G. The following theorem is easily proved by induction on the length of a derivation.
~~ ~

Theorem 2.73 Let G be a CFG. G x iff there is a derivation of x in the shiftreducecalculus for G.

This strategy can be applied to every language. We take the following grammar.
S

Then we have S G . Indeed, we get a derivation shown in Table 4. Of course the calculus does not provide unique solutions. On many occasions we have to guess whether to shift or whether to reduce, and if the latter, then by what rule. Notice namely that if some right hand side of a production is a sufx of a right hand side of another production we have an option. We call a kstrategy a function f which tells us for every pair x whether or not we shall shift or reduce (and by which rule). Further, f shall only depend (1) on the reduction rules which can be at all applied to and (2) on the rst k symbols of x. We assume that in case of competition only one rule is chosen. i So, a kstrategy is a map R x is given then we i k A to s r . If determine the next rule application as follows. Let be a sufx of which is reducible. If f k x s, then we shift; if f k x r then we apply reduction to . This is in fact not really unambigous. For a right hand side

i ~ i

i~~~ i

V i ) iU

T ) S

@p

V i ) iU

H@H

(2.79)
T

i%~ i i %~~~ i i

T } S

(2.77)

shift:

xy x y

from

Ambiguity, Transparency and Parsing Strategies Table 4. A Derivation by Shifting and Reducing

139

of a production may be the sufx of a right hand side of another production. Therefore, we look at another property.

Denition 2.74 A CFG G is called an LR k grammar if not and if for some k there is a kstrategy for the shiftandreduce calculus for G. A language is called an LR k language if it is generated by some LR k grammar. Theorem 2.75 A CFG is an LR k grammar if the following holds: Suppose that 1 1 x1 and 2 2 x2 have a rightmost derivation and that with p : 1 1 k we have (2.81) Then 1
p

1 1 x1

2 2 x2
k

2 , 1

2 and

x1

x2 .

This theorem is not hard to show. It says that the strategy may be based indeed only on the kprex of the string which is to be read. This is essentially the

V U

V !) i U i

V !) i U i

V U

i i i i i D D D i i i i i i D

V U

I ni

and if y

k then f 1 y or f 2 y is undened.

1 i

V U

1 i

i i i

(2.80)

If 1

X1

R and 2

X2

R, 1

~~ ~ S ~ } } S ~~~ T S S ~ } S } S ~~~ S S ~ p} S S p @~~~ S S p @~ H S p ~~~ S p H ~ H H p ~~ ~ H H

~ ~~ ~

T } S }

2 ,

i i i

g ! i i i

140

Context Free Languages

property (2.80). One needs to convince oneself that a derivation in the shift reducecalculus corresponds to a rightmost derivation, provided reduction is scheduled as early as possible. Theorem 2.76 LR k languages are deterministic. We leave the proof of this fact to the reader. The task is to show how to extract a deterministic automaton from a strategy. The following is easy.

So we have the following hierarchy.

This hierarchy is stationary already from k

Proof. For a proof we construct an LR k grammar G from an LR k 1 grammar G. For simplicity we assume that G is in Chomsky Normal Form. The general case is easily shown in the same way. The idea behind the construction is as follows. A constituent of G corresponds to a constituent of G which has been shifted one letter to the right. To implement this idea we introduce new symbols, a X b , where a b A, X N, and a X , a A. The start symbol of G is the start symbol of G. The rules are as follows, where a b c range over A.

By induction on the length of a derivation the following is shown.


~

G
~

(2.84b)

aX

~ ) ) ) )

~ 

(2.84a)

aX b

b G

(2.83)

) ) ) ) ) ) ) )

a a a a

X X X X

b b

YZ R YZ R a R a R

if a A if X if X if X if X

) ) F ) ) ) ) F ) ) ) )

a a aY c cZ b aY c cZ b
}

g U

) )

1 )

g U

V U

) )

Lemma 2.78 Let k LR k language.

0. If L is an LR k

1 language then L also is an

aa@V U

} V U

} V U

} V U

(2.82)

LR 0

LR 1

LR 2

LR 3 1.

g U

Lemma 2.77 Every LR k language is an LR k

V U

V U

1 language.

) )

V U

Ambiguity, Transparency and Parsing Strategies

141

From this we can deduce that G is an LR k grammar. To this end let 1 1 x1 and 2 2 x2 be rightmost derivable in G , and let p : 1 1 k as well as

Then a1 1 x1 1 1 bx1 for some a b A and some 1 , 1 with a1 1 c for c A and c1 1 b. Furthermore, we have a2 2 x2 2 2 bx2 , a2 2 c and c2 2 for certain 2 and 2 . Hence we have

1 1 k 1. Furthermore, the left hand and the right hand and p 1 string have a rightmost derivation in G. From this it follows, since G is an LR k 1 grammar, that 1 2 and 1 2 , as well as k 1 bx1 k 1 bx2 . k x , as required. From this we get 1 2 , 1 2 and k x1 2 Now we shall prove the following important theorem.
Theorem 2.79 Every deterministic language is an LR 1 language. The proof is relatively long. Before we begin we shall prove a few auxiliary theorems which establish that strictly deterministic languages are exactly the languages that are generated by strict deterministic grammars, and that they are unambiguous and in LR 0 . This will give us the key to the general theorem. We still owe the reader a proof that strict deterministic grammars only generate strict deterministic languages. This is essentially the consequence of a property that we shall call left transparency. We say occurs in 1 2 with left context 1 . Denition 2.80 Let G be a CFG. G is called left transparent if a constituent may never occur in a string accidentally with the same left context. This means that if x is a constituent of category C in y 1 x y2 and if z : y1 x y3 is a Gstring then x also is a constituent of category C in z. For the following theorem we need a few denitions. Let be a tree and n n a natural number. Then denotes the tree which consists of all nodes above the rst n leaves from the left. Let P the set of leaves of , say P pi : i q , and let pi p j iff i j. Then put Nn : pi : i n , and n Nn . : On r , where and are the relations relativized On : to On . If is a labelling function and a labelled tree then let n :

i i i

i i D

i i i

V U

i i i

0) ( 3 D

Dk i

k i

k iD

0 j) i) )

V U

k i

g k i k i

(2.86)

p 1

1 1 bx1

p 1

2 2 bx2

Dk i

i k ik i i i i k iD k i

D i i i D i V
3

(2.85)

1 1 x1

2 2 x2

i i i

g 6 i i

V U

i k ik i i k ik i D k i k i k i k i i 1 ) i k i kD i D i i i i i i D

i i i g
e

g U

k i

142

Context Free Languages


3

On . Again, we denote On simply by . We remark that the set n 1 Rn : n is linearly ordered by . We look at the largest element z from Rn . Two cases arise. (a) z has no right sister. (b) z has a right sister. In Case (a) the constituent of the mother of z is closed at the transition from n 1 to n . Say that y is at the right edge of if there is no z such that y z. Then Rn consists exactly of the elements which are at the right edge of n and Rn consists of all those elements which are at the right edge of n but not contained in n 1 . Now the following holds. Proposition 2.81 Let G be a strict deterministic grammar. Then G is left transparent. Furthermore: let 1 1 1 and 2 2 2 be partial Gtrees such that the following holds.

.
1 and
3 3

Proof. We show the theorem by induction on n. We assume that it holds for all k n. If n 0, it holds anyway. Now we show the claim for n. There exists n by assumption an isomorphism f n : n 1 2 satisfying the conditions n 1 n given above. Again, put Rn 1 : 1 1 . At rst we shall show that fn x x for all x Rn 1 . From this it immediately follows that 2 1 Rn 1 Rn 1 since G is strict deterministic. 2 fn x 1 x for all x This claim we show by induction on the height of x. If h x 0, then x is a leaf and the claim holds because of the assumption that 1 and 2 have the same associated string. If h x 0 then every daughter of x is in R n 1 . x. Since G By induction hypothesis therefore 2 fn y 1 y for every y is strict deterministic, the label of x is uniquely xed by this for 2 fn x 1 x , by induction hypothesis. So we now have 2 f n x 1 x . This shows the rst claim. Now we extend f n to an isomorphism f n 1 from n 1 1 onto n 1 Rn 1 . 2 and show at the same time that 2 f n x 1 x for every x This holds already by inductive hypothesis for all x R n 1 . So, we only have to show this for x Rn 1 . This we do as follows. Let u0 be the largest node in Rn 1 . Certainly, u0 is not the root. So let v be the mother of u 0 . fn is dened on v and we have 2 fn v 1 v . By assumption, 2 f n x 1 x for all x u. So, we rst of all have that there is a daughter x 0 of fn v which is not

V aV U

e1 V U

V U

V U

V aV U U

1 n 1

2 1

V U V aV U U

Then there is an isomorphism f : n 1 1 x in case x is not at the right edge of wise.

n 1

such that 2 f x

f x x other-

V aV U

V U

V U V aV U

V U

3 n

v 1 e e 1 v D

V aV U

V aV U

If Ci is the label of the root of

then C1

C2 .

3 D
3

V U

V U

3 W

V aV U

V lU V lU D

V U V U

v D
3 e

3VaV U V aV U

) D 3

V U

V U

(
n

U U
j

3 3 3 3

Ambiguity, Transparency and Parsing Strategies

143

in the image of f n . We choose x0 minimal with this property. Then we put fn 1 u0 : x0 . Now we have 2 fn 1 u0 1 u0 . We continue with u0 in n 1 place of v. In this way we obtain a map f n 1 from n 1 Rn 1 1 n 1 Rn 1 and 2 fn 1 x to 2 with 2 f n 1 x 1 x , if x 1 x otherwise. That f n 1 is surjective is seen as follows. Suppose that u k is the fn 1 uk is not a leaf in 2 , and then there leaf of 1 in Rn 1 . Then xk n 1 n exists a xk 1 in 2 2 . We have 2 f n 1 xk 1 uk . Let x p be the leaf in L. By Lemma 2.50 2 x p xk and therefore also 2 x p 2 1 uk . However, by assumption x p is the n 1st leaf of 2 and likewise uk is the n 1st leaf of 1 , from which we get 1 uk 2 x p in contradiction to what has just been shown. Theorem 2.82 Let G be a strict deterministic grammar. Then L G is unambiguous. Further, G is an LR 0 grammar and L G is strict deterministic. Proof. The strategy of shifting and reducing can be applied as follows: every time we have identied a right hand side of a rule X then this is a constituent of category X and we can reduce. This shows that we have a 0strategy. Hence the grammar is an LR 0 grammar. L G is certainly unambiguous. Furthermore, L G is deterministic, by Theorem 2.76. Finally, we have to show that L G is prex free for then by Theorem 2.45 it follows that L G is strict deterministic. Now let x y L G . If also x L G , then by Proposition 2.81 we must have y . At rst sight it appears that Lemma 2.78 also holds for k 0. The construction can be extended to this case without trouble. Indeed, in this case we get something of an LR 0 grammar; however, it is to be noted that a strategy for G does not only depend on the next symbol. Additionally, it depends on the fact whether or not the string that is yet to be read is empty. The strategy is therefore not entirely independent of the right context even though the dependency is greatly reduced. That LR 0 languages are indeed more special than LR 1 languages is the content of the next theorem. Theorem 2.83 (Geller & Harrison) Let L be a deterministic CFL. Then the following are equivalent. L is an LR 0 language.

There are strict deterministic languages U and V such that L

1 i i

1 i

1 i i

1 i

If x

L, x v

L and y

L then also y v

L. U V .

V U 3 V aV U U D s D V ` ` U
3

YV

V U

V U

1 i

V U

V aV

V U

V U

U U

3 kW

1 i i

V U

V aV
3

V U

U U k U g U V U
3

YV

V U

V U

V aV U U
3

V U

V U

V U

V U

V U

144

Context Free Languages

Proof. Assume . Then there is an LR 0 grammar G for L. Hence, if X is a rule and if y is Gderivable then also X y is Gderivable. Using induction, this can also be shown of all pairs X, for which X G . Now let x L and x v L. Then G x, and so by the previous G v. Therefore, since y we have G y v. Hence obtains. Assume now . Let U be G the set of all x L such that y L for every proper prex y of x. Let V be the set of all v such that x v L for some x U but x w L for every x U and every proper prex w of v. Now, V is the set of all y V for which no proper prex is in V . We show that U V L. To this end let us prove rst that L U V . Let u L. We distinguish two cases. (a) No proper prex of u is in L. Then u U, by denition of U. (b) There is a proper prex x of u which is in L. We choose x minimally. Then x U. Let u x v. Now two subcases arise. (A) For no proper prex w 0 of v we have x w0 L. Then v V , and we are done. (B) There is a proper prex w0 of v with x w0 L. Let v w0 v1 . Then, by , we have x v1 L. (In , put x w0 in place of x and in place of y put x and for w put v 1 .) x v1 has smaller length than x v. Continue with x v 1 in the same way. At the end we get a partition of v w0 w1 wn 1 such that wi V for every i n. Hence L. Let u x i n wi . If n 0, then u x L U V . We now show U V and by denition of U we have u L. Now let n 0. With we can show that x i n 1 wi L. This shows that u L. Finally, we have to show that U and V are deterministic. This follows for U from Theorem 2.46. Now let x y U. Then by P : v : x v L v : y v L . The reader may convince himself that P is deterministic. Now let V be the set of all v for which there is no prex in P . Then P V and because of Theorem 2.46 V is strict deterministic. This shows . Finally, assume . We have to show that L is an LR 0 language. To this end, let G1 N1 A R1 be a strict deterministic grammar which generates U and G2 2 N2 A R2 a strict deterministic grammar which generates V . Then let G 3 : 3 N1 N2 3 4 A R3 be dened as follows.
3 1 3 1 4 4 2 4 2 4

It is not hard to show that G3 is an LR 0 grammar and that L G3 L. The decomposition in is unique, if we exclude the possibility that V and if we require that U shall be the case only if V . In this way and L U. The case U V may arise. Then we take care of the cases L L U . The semi Dyck languages are of this kind.

w ` D

T S

V U

T DS

S s

(2.87)

R3 :

R 1 R2

T S v 1 i

i 1 i

i i

) bT )

i i

1D i

Ss s ) ( 0 ) ) } ) D( ) ) }) ( D

1 i i

1 i Wi i 1 i D i i

1 i i IS i

1 i

1 i 1 i } T Sfv i i 1 i i

V U
}

i i

1 i } i Iaa i

1 i

1 i i IS i

i i i

1 i

i iD

T xv S

i i i 1 i

1 i

1 i i i 1 i D i i

i i

i i i i i

Wi

V U

1 i i

1 !) i i

Ambiguity, Transparency and Parsing Strategies

145

Now we proceed to the proof of Theorem 2.79. Let L be deterministic. Then put M : L $ , where $ is a new symbol. M is certainly deterministic; and it is prex free and so strict deterministic. It follows that M is an LR 0 language. Therefore there exists a strict deterministic grammar G which generates M. From the next theorem we now conclude that L is an LR 1 language. Lemma 2.84 Let G be an LR 0 grammar of the form G N AR with R N N A N A and L G A , and assume that there is no derivation in G. Then let H : N A R , where R A R

Then H is an LR 1 grammar and L H

LG.

For a proof consider the following. We do not have in H. Further: if L L in H then there exists a D such that L D in G, and if L in G then we have D and L in H. From this we can immediately conclude that H is an LR 1 grammar. Finally, let us return to the calculus of shifting and reducing. We generalize this strategy as follows. For every symbol of our grammar we add a symbol . This symbol is a formal inverse of ; it signals that at its place we look for an but havent identied it yet. This means that we admit the following transitions.

We call this rule cancellation. We write for strings also . This denotes the formal inverse of the entire string. If i n i then i n n i . Notice that the order is reversed. For example . These new strings allow to perform reductions on the left hand side even when only part of the right hand side of a production has been identied. The most general rule is this one. (2.90)

X x x

This rule is called the LCrule. Here X must be a Grule. This means intuitively speaking that vec is an X if followed by . Since is not yet there

T S

i i

i%~ i i i %~~~ i i

(2.89)

x x

TV

V U V U D T 1 i U 1 i ) 1 i

V U

i D i

i%~~~ i %~

S s

V U

(2.88)

R :

:A :A

0 ) bT s ) S

) (

0k ) ) ) ( }} V U D

V V

V U

U s } V

T j S
|

UU ae

V U }

V U

146

Context Free Languages

we have to write . The LCcalculus consists of the rules shift, reduce and LC. Now the following holds.
G
~ |

A special case is . Here no part of the production has been identied, and one simply guesses a rule. If in place of the usual rules only this rule is taken, we get a strategy known as topdown strategy. In it, one may shift, reduce and guess a rule. A grammar is called an LL k grammar if it has a deterministic recognition algorithm using the topdownstrategy in which the next step depends on the rst k symbols of x. The case k 0 is of little use (see the exercises). This method is however too exible to be really useful. However, the following is an interesting strategy. The right hand side of a production is divided into two parts, which are separated by a dot.

This dot xes the part of the rule that must have been read when the corresponding LCrule is triggered. A strategy of this form is called generalized left corner strategy. If the dot is at the right edge we get the bottomup strategy, if it is at the left edge we get the topdown strategy. Exercise 66. Let R be a set of context free rules, a symbol, N and A nite sets, and G : N A R . Show that if R and G is transparent then G is a CFG. Remark. Transparency can obviously be generalized to any grammar that uses context free rules. Exercise 67. Show Theorem 2.76. Exercise 68. Prove Lemma 2.60. Show in addition: If x is a term then the set P x : y : y is a prex of x is convex. Exercise 69. Show the following: If L is deterministic then also L x as well as x L are deterministic. (See Section 1.2 for notation.) Exercise 70. Show that a grammar is an LL 0 grammar if it generates exactly one tree. Exercise 71. Give an example of an NTSlanguage which is not an LR 0 language.
}

V U

T IS i

V U

Ti

0 ) ) ) (

i V U S i

(2.91)
T

V U

Theorem 2.85 Let G be a grammar. from x in the LCcalculus.

T wy S }

x holds iff there is a derivation of

i ~ i
}

u i IT S

V U i

Semilinear Languages Table 5. The Generalized LCStrategy

147

5.

Semilinear Languages

In this section we shall study semilinear languages. The notion of semilinearity is important in itself as it is widely believed that natural languages are semilinear. Whether or not this is case, is still open (see Section 2.7). The issue of semilinearity is important, because many grammar formalisms proposed in the past only generate semilinear languages (or else are generally so powerful that they generate every recursively enumerable set). Even though semilinearity in natural languages is the rule rather than the exception, the counterexamples show that the grammar formalisms do not account for natural language in a satisfactory way. In this chapter we shall prove a theorem by Ginsburg and Spanier which says that the semilinear subsets of n are exactly the sets denable in Presburger Arithmetic. This theorem has numerous consequences, in linguistics as well as in mathematics. The proof given here differs substantially from the original one. Denition 2.86 A commutative monoid or commutative semigroup with unit

~~~ ~ T T ~ ~~ T ~ T ~~~ T ~ lp} } } T p @~~~ T p @~ } S } p @~~~ p ~ H } H } p ~~~ S p H ~ H H p ~~ ~ H H



~ ~~ ~

T T

} } } T T T T T T T T T T T

148

Context Free Languages

Notice that because of associativity we may dispense with brackets. Alternatively, any term can be arbitrarily bracketed without affecting its value. We dene the notation x as follows: 0 x : 0 and 1 x : x x. (Later on we shall drop .) Then x0 x0 x0 , and x0 x0 , simply by denition. Furthermore, x y x y , by induction on . This can be generalized. Lemma 2.87 In a commutative semigroup, the following holds. (2.93)
i m

Proof. Induction on m. The case m


i m 1

i m i m

i m

i m 1

Also
i m 1

i m

i m 1

i xi

V

U g

i xi

i m

i m

m xm

V

xi

U@V g

i m

i m

xi

m xm

xi

m xm

UgV

i m 1

(2.96)

i xi

i xi

V

i xi

xi

m xm

V

U g

V

i xi

m xm

U j

g V

U j

xi

m xm

U j

U j

(2.95)

i xi

i m

i m

1 has been dealt with. Now: xi

V

(2.94)

xi

xi

i m

i m

V

U j

xi

V @V U V g j U g U V U D V g U g D g D V g U D D

g V

g g D g U V g U g U U

(2.92)

x x y z

i xi i xi

m xm

1 ) )

0 g !) )

is a structure H 0

in which the following holds for every x y z

H.

V

Semilinear Languages

149

This nishes the proof. We shall denote by M A set underlying the commutative monoid freely generated by A. By construction, A : M A 0 is a commutative semigroup with unit. What is more, A is freely generated by A as a commutative semigroup. We now look at the set n of all nlong sequences of natural numbers, endowed with the operation dened by

This also forms a commutative semigroup with unit. Here the unit is the sequence 0 consisting of n 0s. We denote this semigroup by n . For the following theorem we also need the socalled Kronecker symbol. (2.98)

ji :

1 0

if i = j, otherwise.

Theorem 2.88 Let A ai : i n . Let h be the map which assigns to each element ai the sequence ei ji : j n . Then the homomorphism which extends h is an isomorphism from A onto n . Proof. Let be the smallest congruence relation on Tm A (with : 0 0 2) which satises (2.92). It follows from Lemma 2.87 by induction on the level of the term t that for t Tm A there is a u t of the form
i n

If (2.99) obtains, put q t : ki : i n . Now, it is immediately seen that ker q, whence Tm A n . On the other hand, Tm A A, since it is easily shown that the rst is also freely generated by A. Namely, suppose that v : ai ni is a map from A into . Let v : Tm A N be the extension of v. Then, since is a monoid, ker v, so that we can dene a such that v q h . map q : A This theorem tells us that free commutative semigroups can be thought of as vectors of numbers. A general element of M A can be written down as i n ki ai where ki . Now we shall dene the map : A M A by (2.100)

V U i

g i V U

x y

V U

V i W U i V U

ai

0 ai

V U

CV U

V U

V U

V U

V U V U

(2.99)

ki

ai

V U

V U

V U

( 0 g

"V U

(2.97)

xi : i

yi : i

n :

xi

yi : i

0 g ) !) V U

V U V U D

V U

g !) D

150

Context Free Languages

This map is a homomorphism of monoids and also surjective. It is not injective, except in the case where A consists of one element only. The map is called the Parikh map. We have

Denition 2.90 Elements of M A will also be denoted using vector arrows. Moreover, if x n we write x i for the ith component of x. A set U M A is called linear if for some and some u v i M A

The vi are called cyclic vectors of U. The smallest for which U has such a representation is called the dimension of U. U is said to be semilinear if U is the nite union of linear sets. A language L A is called semilinear if S is semilinear. We can denote semilinear sets rather compactly as follows. If U and V are subsets of M A then write U V : x y : x U y V . Further, let x U : x y : y U . So, vectors are treated as singleton sets. Also, we write nx : n . Finally, we denote by U the union of all nU, n . nU : With these abbreviations we write the set U from Denition 2.90 as follows.

This in turn we abbreviate by


i

Lemma 2.91 The following holds.

(2.105)

U;V : U

i S

Finally, for V

vi : i

g i

(2.104)

vi vi

g iaag i

g i

g i

(2.103)

v0

v1

g i

1 !) i

1 i

i i g IS

) iaaa)

i i

g bS i

(2.102)

ki

vi : k 0

V U

V U

1 !) i i

Denition 2.89 Two languages L M M. have L

1 V U i V U

i k

i k

V U i
A are called letter equivalent if we

(2.101)

1 i S 1 i i g IS D i D V U

1 i

xi

xi

Semilinear Languages

151

Theorem 2.92 (Parikh) A language is semilinear iff it is letter equivalent to a regular language. Proof. It is enough to show this for linear languages. Suppose that L u ;V , V vi : i n . Pick a string x and yi , i n, such that x u and yi vi for all i n. Put
i n

Clearly, M is regular and letter equivalent to L. By induction on the length of the regular term R we shall show that L R is semilinear. This is clear for R ai or R . It is also clear for R S1 S2 . Now let R S1 S2 . Using the equations S T U S U T U and U S T U S U T , we u ;C1 can assume that S1 and S2 are linear. Then by denition L S1 and L S2 v ;C2 , for certain u, v, and sets C1 and C2 . Then, using Lemma 2.91, we get

Theorem 2.93 Let A be a (possibly innite) set. The set of semilinear languages over A form an AFL with the exception that the intersection of a semilinear language with a regular language need not be semilinear. Proof. Closure under union, star and concatenation are immediate. We have to show that semilinear languages are closed under homomorphisms and inverse homomorphisms. The latter is again trivial. Now let v : A A be a

Hence R too is linear. This ends the proof. We draw a useful conclusion from the denitions.

s i "T bS T i !U S

T b!U i S

V U

(2.108)

LR

u ;C

0 ; u

i V U

D T 8!U i S

sD

Now, nally, R S . If S T assume that S is linear, say, S

U, then R T U , so that we may again u ;C for some u and C. By Lemma 2.91 C

T i g b!U i S

T !U i S

g @V

D T b!U i S

D V U

(2.107)

LR

u ;C1

v ;C2

v ;C1 C2

T b!U i S V s D

U V

s a U s V U V U

i i

bV

V i

s U

T I!U i S

U Wi

(2.106)

M:

yi

V U i

T i !U S

i S

U;V

0 ;U

V .

Vk

Vk

k U

g V D D

U;V

U ;V

U ;V

k U

s V

U;V

U ;V

U ;V . V .

i V U i D V T b!U i S V | U

U U

152

Context Free Languages

homomorphism. v induces a map v : A A . The image under v of a semilinear set is semilinear. For given a string x A we have v x v x , as is easily checked by induction on the length of x. Let M be linear, say M u i k vi . Then
i k

From this the claim follows. Hence we have v L v L . The right hand side is semilinear as we have seen. Finally, take the language L : 2i 2i : i j j : j 2i 2i : i . L is semilinear. L is not semilinear, however. Likewise, a subset of n ( n ) is called linear if it has the form

for subsets of n . The linear subsets of n are nothing but the afne subspaces. A subset of n ( n, n ) is called semilinear if it is the nite union of linear sets. Presburger Arithmetic is dened as follows. The basic symbols are , , , and m , m 0 1 . Then Presburger Arithmetic is the set of rst order sentences which are valid in : 01 m , where m: 1 a m b iff a b is divisible by m (for FOL see Sections 3.8 and 4.4). Negation can be eliminated. Notice namely that 0 =x1 is equivalent to , to 0 1 and 0 1 1 0 0 1 1 0 0 m 1 is equivalent to 0 n m 0 m 1 n . Here, n is dened by 0 : ,n 1: n . We shall use 0 1 for 0 1 0 1 . Moreover, multiplication by a given natural number also is denable: put 0t : 0, and n 1 t : nt t . Every term in the variables i , i n, is equivalent to a term 0 i n ai i , where b ai , i n. A subset S of n is denable if there is a formula 0 1 n 1 such that (2.112)

The denable subsets of n are closed under union, intersection and complen 1 is denable, ment and permutation of the coordinates. Moreover, if S

T ) aaa) ) 1 Q0 ( @S D V  iaaa)  ) U )  1 )      V g U D D            @ g      D  d D             d        d

ki : i

k 0 k1

kn

0 a0

() ) g i!) ) ( )

i g %aaag 2g %g i i i

(2.111)

v0

T ) xv S

for subsets of

as well as v1 v2 vm

g aaag i

g i

g i

(2.110)

v0

v1

v2

vm

V U i

g i @V 6U

s S "T

(2.109)

v M

v u

v vi

V i aV U U

1 i V V U U

i a

g i 1

V i aV D U U

Semilinear Languages

153

so is its projection

The same holds for denable subsets of n , which are simply those denable n is denable, so is subsets of n that are included in n . Clearly, if S n. S Lemma 2.94 Suppose that a i n pi xi b i n qi xi is a linear equation with rational numbers a, b, pi and qi (i n). Then there is an equation

with the same solutions, but with positive integer coefcients such that g h 0 and for every i n: vi ui 0.

Proof. First, multiply with the least common denominator to transform the equation into an equation with integer coefcients. Next, add p i xi to both sides if pi 0, unless qi pi 0, in which case we add qi xi . Now all coefcients are positive. Next, for every i n, substract q i xi from both sides if pi qi and pi xi otherwise. These transformations preserve the set of solutions. Call an equation reduced if it has the form

with positive integer coefcients g and k i , i n. Likewise for an inequation. Evidently, modulo renaming of variables we can transform every rational equation into reduced form. Lemma 2.95 The set of solutions of a reduced equation is semilinear. Proof. Let be the least common multiple of the k i . Consider a vector of the form ci j k i ei k j e j , where i m and m j n. Then if v is a solution, so is v ci j and conversely. Put C : ci j : i m j n and

Both P and C are nite. Moreover, the set of solutions is exactly P;C .

i m

m i n

dV U aV U i i)

V U i

(2.116)

P:

u:g

ki u i

ki u i u i

ki

S i

i aV

g i i U g 'V U i

(2.115)

i m

ki xi

m i n

ki xi

i n

(2.114)

ui xi

i n

vi xi

1 Q0

( @S

(2.113)

n S :

ki : i

n : there is kn

such that ki : i n 1 S

154

Context Free Languages

Lemma 2.96 The set of solutions of a reduced inequation is semilinear. Proof. Assume that the inequation has the form

We can assume that the vi are linearly independent. Clearly, since w n, i w for any nonzero rational number , we can assume that v i m. Now, put
0 i m

w V

g 'aag i

g i

g i

(2.122)

v0

v1

v2

vm

Proof. It sufces to show this for linear subsets. Let v i , i such that

Lemma 2.98 Let M linear.

be a semilinear subset of

n.

Then M n is semi1, be vectors

This is a semilinear set.

V %o'aag %og U i g i i

(2.121)

v1

vm

1 ik

g i

1 i

g i

1 k

V if i

n is

nite. Moreover, if v0 i . Hence,

i m i vi

g IS i

(2.120)

V:

v0

i vi : 0

n then

v0

i m i vi

1 i

i 'aag i g %g i g i

(2.119)

v0

Proof. Let vi , i

1, be vectors such that v1 v2 vm


1

"}

Lemma 2.97 Let M subset of n.

be an afne subspace. Then M

n is

a semilinear

i IS

The set of solutions is P;C F , where F :

i m

(2.118)

ki xi

m i n

ki xi ei : i m .

Dene C and P as before. Let E : ei : m i is P;C E . If the inequation has the form

i S

i m

g V

(2.117)

ki xi

m i n

ki xi n . Then the set of solutions

V i

Semilinear Languages

155

Thus, we may without loss of generality assume that

Notice, however, that these vectors are not necessarily in n . For i starting at 1 until n we do the following. Let xij : v j i . Assume that for 0 j p we have xij 0, and that for p j m we have xij 0. (A renaming of the variables can achieve this.) We introduce new cyclic vectors c j k for 0 j p and p k m. Let the least common multiple of the xis , for all 0 s m where xis 0.

Notice that the scoordinates of these vectors are positive for s i, since this is a positive sum of positive numbers. The ith coordinate of these vectors is 0. Suppose that the ith coordinate of
0 j m

is 0, where j for all 0 j m. Suppose further that for some k p we have k vi0 m xik . Then there must be a j p such that j xij . i and : i . Then Then put r : r for r j k, j : j x j xk k k
0 j m

Moreover, j j for all j p, and k k . Thus, by adding these cyclic vectors we can see to it that the coefcients of the v k for p k m are bounded. Now dene P to be the set of all w which have a decomposition
0 j m

u P

0 j p

0 j p k m

g i

g i

(2.129)

M n

jv j

where j

j v0

m xij for all 0

1 i

g i

(2.128)

v0

jv j

n
j

m. Then

j kc j k

ik

g i

(2.127)

cj k

jv j

V g U f U

U v

D nV U

g i

(2.126)
f

v0

jv j

i aV

g aV U i

(2.125)

ci j :

xij v j

xik vk

g aaag i

g i

g i

(2.124)

v0

v1

v2

vm

g aaag i

g aaag i

g i

g i

(2.123)

v0

V U i

D D

i v k D
f

Put wi :

vi , 0

m. Then

v1

v2

vm

w1

wm

156

Context Free Languages


f

with all j , j k 0. Now we have achieved that all jth coordinates of vectors are positive. The following is now immediate.

Lemma 2.100 The intersection of semilinear sets is again semilinear. Proof. It is enough to show the claim for linear sets. So, let S 0 and S1 be ui : i m and C1 vi : i n and u and v such linear. Then there are C0 that S0 u ;C0 and S1 : v ;C1 . Notice that w S0 S1 iff there are natural numbers i (i m) and j ( j n) such that

So, we have to show that the set of these w is semilinear. The equations are now taken as linear equations with i , i m and i , i n, as variables. Thus we have equations for m n variables. We solve these m n equations rst in m n . The solutions form an afne subspace V m n . By Lemma 2.99, V m n is semilinear, and so is its projection m (or to n for that matter). Let it be p, onto i p Li , where for each i Li m is linear. Thus there is a representation of L i as

Now put
i m

From the construction we get that

So, the Wi are linear. This shows the claim.

g aaag i

g i

(2.134)

Wi

q0

i V U i

g i

i V U i

Dene vectors qi :

m i

j ui , i

and r : c

(2.133)

S0

S1

i p

Wi j
m

1 i

i V U i

g bS i

(2.132)

Wi :

i ui :

g aaag i

(2.131)

Li

Li

j ui . Then

$}

i m

i n

g i

g i D

g i

(2.130)

i ui

i vi

1 i

i IS

T !U i S

D 8S i

#}

Lemma 2.99 Let M subset of n .

be an afne subspace. Then M n is a semilinear

T b!U i S

Semilinear Languages

157

We need one more prerequisite. Say that a rstorder theory T has quantier elimination if for every formula x there exists a quantier free formula x such that T x x . We follow the proof of (Monk, 1976). Theorem 2.102 (Presburger) Presburger Arithmetic has quantier elimination.

We may further eliminate negation (see the remarks above) and disjunctions inside y x (since is equivalent with . Finally, we may assume that all conjuncts contain . For if does not contain free, is equivalent to . So, can be assumed to be a conjunction of atomic formulae of the following form:

Since s mt is equivalent with ns m nt, so after suitable multiplication we may see to it that all the ni , ni , ni and ni are the same number .

i p

i q

i r

i s

Assume that p 0. Then the rst set of conjunctions is equivalent with the conjunction of i j p i j (which does not contain ) and 0 . We may by 0 in the formula. therefore eliminate all occurrences of Thus, from now on we may assume that p 0. Furthermore, notice that is equivalent to . This means that we can assume q 1, and likewise that r 1. Next we show that we can actually have s 1. To see this, notice the following.

kkk

3@ 4

 

kk

6 7  5 k 4

$

   

 4



  5 F (1 4   &

(2.138)

i
mi i

We may rewrite the formula in the following way (replacing is divisible by ). adding instead the condition that

by

kkk

kk

6  

 

 

4 F (1  &

(2.137)

4 k 

i p

i q

i r

i
i s mi i

kkk

 kkk

4 6 kk 7  kk

5   4

kkk

kk

'F (1 4  &

(2.136)

i p ni

ti

i q ni

ti

i r ni

ti

i s ni

mi t i

and

('d3  & 

21@  &

821d & 

k  k 4

i yV U

V  ) U i

 & i ) F (' 0

 &  21

21  &

 

V ) U i

(2.135)

V ) U i

y with y x Proof. It is enough to show that for every formula quantier free there exists a quantier free formula y such that

V  ) U i

V U i ('  &

V U i

V U i

V U i

Lemma 2.101 If S

n is semilinear, so is its projection n S .



V U i

 @

158

Context Free Languages

Let u v w x be integers, w x 1, and let p be the least common multiple of w and x. Then gcd p w p x 1, and so there exist integers m n such that 1 m p w n p x. It follows that the following are equivalent. u v mod gcd w x and y m p wu

The Euclidean algorithm yields numbers m and n as required (see (Jones, 1955)). Now suppose that the rst obtains. Then y u ew and y v f x for some numbers e and f . Then u v f x ew, which is divisible by gcd x w . So, u v mod gcd w x . Furthermore,

m p wu n p x y

m p w y

m p w em 0

mod p

So, the second holds. Conversely, if the second holds, then for some k we have u v k gcd w x . Then

Analogously y v mod x is shown. Using this equivalence we can reduce the congruence statements to a conjunction of congruences where only one involves . This leaves us with 8 possibilities. If r 0 or s 0 the formula is actu , , ally trivially true. So, m , as well as m and m can all be dropped or replaced by . Finally, is equivalent with and is equivalent with i m i i m . m This shows the claim. Theorem 2.103 (Ginsburg & Spanier) A subset of n is semilinear iff it is denable in Presburger Arithmetic.

 5

D

 @F21   &

 5D     8 @('  &

D 

   F 21   &  ('   & HH H ('  & (  21    & @F21   &

mod w

V ) U

V U

m p wu

n p xv

v V U V U

V v V v V

(2.140)

m p wu

n p xu

n p x k gcd m n

V V U g V v V v V U v V U

V U V U V V g V

v V

V ) U

(2.139)

m p wu

n p xv

m p wy

n p xy u

n p xv

n p x fn

V ) U

B A @

E B

U U D U g U D U v U V U D

A @

BGB 8 @ F B

V aV ) U

@ @

u mod w and y

v mod x

n p x v mod p .

C DB A 8 A @ 9 8

8 8 8 F F A

Semilinear Languages
|

159

Proof. ( ) Every semilinear set is denable in Presburger Arithmetic. To see this it is enough to show that linear sets are denable. For if M is a union of Ni , i p, and each Ni is linear and hence denable by a formula i x , then M is denable by i p i x . Now let M v v0 vm 1 be linear. Then put

x denes M. ( ) Let x be a formula dening S. By Theorem 2.102, there exists a quantier free formula x dening S. Moreover, as we have remarked above, can be assumed to be negation free. Thus, is a disjunction of conjunctions of atomic formulae. By Lemma 2.100, the set of semilinear subsets of n is closed under intersection of members, and it is also closed under union. Thus, all we need to show is that atomic formulae dene semiis equivalent to m , linear sets. Now, observe that m m onto the rst two which is semilinear, as it is the projection of components.
Exercise 72. Let A 1. Show that A is isomorphic to A . Derive from this that there are only countably many semilinear languages over A. Exercise 73. Let L A . Call L almost periodical if there are numbers p (the modulus of periodicity) and n0 such that for all x L with length n0 there is a string y L such that y x p. Show that a semilinear language is almost periodical. Exercise 74. Let A . Further, let U : . Now let N M A be a set such that N U is innite. Show that there are 2 0 many languages L with L N. (The cardinality of A is 0 , hence there can be no more than 20 such languages. The exercise consists in showing that there are no less of them either.) Exercise 75. Show that semilinear languages have the following pumping property: For every semilinear set V n there exists a number n such that if v V has length n, there exist w and x such that v w x and w x V .
f

Show that V satises the pumping property of the previous exercise. Show

0 ) @S (

(2.142)

V :

m n :m

n or m

Exercise 76. Let

. Let V

2 be dened by

V U i V U i    V U i  )V U 4  i   4 F ('aa@ 21 21  &  &  &

V U

} i

  &   @F ('

g i

V U

i xg i

1 i

  

H D

V Y U

g i

 

ni

T Fl) S H D

1 i

V U i

(2.141)

x :

n 1

n m 1 i n

i m

n i

v i j

m n iv

V U i

g aaa g i

g i

V U i

V U i

1 i

160

Context Free Languages

further that V is semilinear iff is. Exercise 77. Show that for every sentence of Presburger Arithmetic it is decidable whether or not it is true in . Hint. Use quantiers elimination and the fact that the elimination is constructive. 6. Parikhs Theorem

Now we shall turn to the already announced embedding of context free tree sets into tree sets generated by UTAGs. (The reader may wonder why we speak of sets and not of classes. In fact, we shall tacitly assume that trees are really tree domains, so that classes of nite trees are automatically sets.) N A R be a CFG. We want to dene a tree adjunction grammar Let G LB G G N A G such that LB G G . We dene G to be the set of all (ordered labelled) tree (domains) which can be generated by L B G and which are centre trees and in which on no path not containing the root some nonterminal symbol occurs twice. Since there are only nitely many symbols and the branching is nite, this set is actually nite. Now we dene N, (modulo identication of X, X G . Let G contain all adjunction trees Y 0 Y 1 with Y for all Y N) such that (1) X can be derived from X in G, (2) no symbol occurs twice along a path that does contain the root. Also G is nite. It is not hard to show that LB LB G . The reverse inclusion we G shall show by induction on the number of nodes in the tree (domain). Let be in LB G . Either there is a path not containing the root along which some symbol occurs twice, or there is not. In the second case the tree is in G . LB B of Hence G and we are done. In the rst case we choose an x minimal height such that there is a y x with identical label; let the label be X. Consider the subtree induced by the set x y y . We claim that . For this we have to show the following. (a) is an adjunction tree, G (b) can be deduced from X, (c) no symbol symbol occurs twice along a path which does not contain x. Ad (a). A leaf of is either a leaf of or y. In the rst case the label is a terminal symbol in the second case it is identical to that of the root. Ad (b). If is a tree of G then can be derived from X. Ad (c). Let be a path which does not contain x and let u v nodes with identical label and u v. Then v x, and this contradicts the minimality of x. Hence all three conditions are met. So we can disembed . This means that there is a such that is derived from by adjoining . We have LB G tree and by induction hypothesis LB . Hence LB , which had G G

V U V U

1 @k

V yU

T V S s

1 )

V U

V yU
r

V U k

} V yU

V U D

1 k

0 ) ) ) ( 0 ) ) ) ( D } D V yU V U 1

Parikhs Theorem

161

to be shown. Theorem 2.104 (Joshi & Levy & Takahashi) Every set of labelled ordered tree domains generated by a CFG is also one generated by a UTAG. Now we shall prove Parikhs Theorem for UTAGs. Let be a letter and a tree. Then is the number of nodes whose label is . If is an adjunction tree then the label of the root is not counted. Now let N A be a UTAG and , . i:i j : j

The proof of this lemma is easy. From this it follows that we only need to know for an arbitrarily derived tree how many times which tree has been be a tree which resulted adjoined and what the starting tree was. So let from i by adjoining j p j times, j . Then
i i

We dene the following sets

Then LB i n i . However, equality need not always hold. We have to notice the following problem. A tree j can be adjoined to a tree only if its root label actually occurs in the tree . Hence not all values of i are among the values under of a derived tree. However, if a tree can be adjoined once it can be adjoined any number of times and to all trees that result from this tree by adjunction. Hence we modify our starting set of trees somewhat. We consider the set D of all pairs k W such that k , W and there is a derivation of a tree that starts with k and uses exactly the trees from W . For k W D
i j W

V U

g @V U

) U

(2.146)

LkW

) (

V U

d2aU } V0 ) (

g V U

(2.145)

i :

g V U

D 1 0

) (

(2.144)

pj

V

Let now

A a

a. Then

V U

g V U

(2.143)

pj

Vk

Lemma 2.105 Let .

result from

by adjoining the tree . Then

0 ) ) ) ( D

jS

IS

V U

g @V

162

Context Free Languages

Theorem 2.106 Let L be the language of an unregulated tree adjunction grammar then L is semilinear. Corollary 2.107 (Parikh) Let L be context free. Then L is semilinear.

This theorem is remarkable is many respects. We shall meet it again several times. Semilinear sets are closed under complement (Theorem 2.103) and hence also under intersection. We shall show, however, that this does not hold for semilinear languages.

The Parikh image is 2n 2 2 2n 1 1 : n . This set is not semilinear, since the result of deleting the symbol (that is, the result of applying the projection onto ) is not almost periodical. We know that for every semilinear set N M A there is a regular grammar G such that L G N. However G can be relatively complex. Now the question arises whether the complete preimage 1 N under is at least regular or context free. This is not the case. However, we do have the following. Theorem 2.109 The full preimage of a semilinear set over a single letter alphabet is regular. This is the best possible result. The theorem becomes false as soon as we have two letters. Theorem 2.110 The full preimage of context free. The full preimage of

is not regular; it is however is not context free.

aaa)

V U

V p g bYg U V xg U H H

U cg V

V U

U @S

(2.148)

2 2 4

2 2 4 4 8

2 2 4 4 8 8 16

Because of Theorem 1.67 L1 and L2 are context free. Now look at L1 is easy to see that the intersection consists of the following strings.

(2.147)

L1 :

M1

L2 :

M2 L2 . It

Proof. Let M1 :

n n

:n

and M2 :

n 2n

:n

. Put

Proposition 2.108 There are CFLs L1 and L2 such that L1 linear.

L2 is not semi-

0 ) ) ) ( H

1 y0

) ( V

V U ) U ( D D H

Then L : set of all

LkW : kW D is semilinear. At the same time it is the where is derivable from N A .

Parikhs Theorem

163

Proof. We show the second claim rst. Let (2.149) W:

Assume that W is context free. Then the intersection with the regular language is again context free. This is precisely the set n n n : n . Contradiction. Now for the rst claim. Denote by b x the number of occurrences of in x minus the number of occurrences of in x. Then V : x: bx 0 is the full preimage of . V is not regular; otherwise the intersection with is also regular. However, this is n n : n . Contradiction. However, V is context free. To show this we shall construct a CFG G over A which generates V . We have three nonterminals, , , and . The rules are
S

The start symbol is . We claim: 0, 1 and G x iff b x G x iff b x 1. The directions from left to right are easy to verify. It G x iff b x therefore follows that V L G . The other directions we show by induction on the length of x. It sufces to show the following claim.

Hence let x i n xi be given. Dene k x j : b j x , and K : k x j : j n 1 . As is easily seen, K m m with m 0. Further, k x n bx . (a) Let b x 0. Then put y : x0 and z : 0 i n xi . This satises the 1. Case 1: x0 . Then put again y : x0 and conditions. (b) Let b x z : 0 i n xi . Case 2: x0 . Then k x 1 1 and there is a j such that k x j 0. Put y : i j xi , z : j i n xi . Since 0 j n, we have y z x . Furthermore, b y 0 and b z 1. (c) b x 1. Similar to (b).

Exercise 79. Prove Theorem 2.109. Hint. Restrict your attention rst to the case that A .

Exercise 78. Let A 1 and ated by over A is regular.

be a UTAG. Show that the language gener-

V U i V ) U i V ) U D S i D

I WC I I

V U i

V i U

S I VS I 0TS I S S U S8

1 0 1 there are y and z such that y z If b x as well as b y b z 1.

x and such that x

V U i

V U i

V iU D V ) U i

H i D

V ) U i

) D

i D

V U

V U i

T }

Y S I S 8S I `B X@ 0TB @ S R8 8 B P@ Q I

V U i

} T

T S H D

(2.150)
T

yz

i IS

V U i

V g U H

V p Rg g U H
1

S T

S }

T S

} 7}

} S

T Fl) S H D

V U i

H p D V U i i y T T

i 8 i ni ) V ) U i D i D

V U i
~

T T

164

Context Free Languages

Exercise 80. Let N M A be semilinear. Show that the full preimage is of Type 1 (that is, context sensitive). Hint. It is enough to show this for linear sets. Exercise 81. In this exercise we sketch an alternative proof of Parikhs Theorem. Let A n be an alphabet. In analogy to the regular terms we i:i dene semilinear terms. (a) i , i n, is a semilinear term, with interpretation ei . (b) If A and B are semilinear terms, so is A B with interpretation u v : u A v B , A B, with interpretation u : u A or u B and A with interpretation ku : k u A . The rst step is to translate a CFG into a set of equations of the form Xi Ci X0 X1 Xq 1 , q the number of nonterminals, Ci semilinear terms. This is done as follows. Without loss of generality we can assume that in a rule X , contains a given variable at most once. Now, for each nonterminal X let X i , i p, be all the rules of G. Corresponding to these rules there is an obvious equation of the form

where A and B are semilinear terms that do not contain X. The second step is to prove the following lemma:
C X , with A, B and C semilinear terms not conLet X A B X taining X. Then the least solution of that equation is A B C. If B X is missing from the equation, the solution is A C, and if C X is missing the solution is A B.

Using this lemma it can be shown that the system of equations induced by G can be solved by constant semilinear terms for each variable. generates exExercise 82. Show that the UTAG n , where x is a string of n s and n s such actly the strings of the form x that every prex of x has at least as many s as s.

Hi 0 T H S ) T ) p ) S ) S ) T S Ij'b$'Fl) 'bT 'bCI!( H }

p i

@ a bB

U s

(2.151)

X or X

1 i

1 i

) iaaa)

i bS

TD

1 a) i

V U T

i S T

c ba @

1 ) i

1 i i g bS i T IS i
C

Are Natural Languages Context Free?

165

Show also that this language is not context free. (This example is due to (Joshi et al., 1975).) 7. Are Natural Languages Context Free?

We shall nish our discussion of CFLs by looking at some naturally arising languages. We shall give examples of languages and constructions which are denitely not context free. The complexity of natural languages has been high on the agenda ever since the introduction of this hierarchy. Chomskys intention was in part to discredit structuralism, which he identied with the view that natural languages always are context free. By contrast, he claimed that natural languages are not context free and gave many examples. It is still widely believed that Chomsky had won his case. (For an illuminating discussion read (Manaster-Ramer and Kac, 1990).) It has emerged over the years that the arguments given by Noam Chomsky and Paul Postal against the context freeness of natural languages were faulty. Gerald Gazdar, Geoffrey Pullum and others have repeatedly found holes in the argumentation. This has nally led to the bold claim that natural languages are all context free (see (Gazdar et al., 1985)). The rst to deliver a correct proof of the contrary was Riny Huybregts, only shortly later followed by Stuart Shieber. (See (Huybregts, 1984) and (Shieber, 1985).) Counterevidence from Bambara was given by Culy (1987). Of course, it was hardly doubted that from structural point of view natural languages are not context free (see the analyses of Dutch and German within GB, for example, or (Bresnan et al., 1987)), but it was not shown decisively that they are not even weakly context free. How is a proof the non context freeness of a language L possible? A typical method is this. Take a suitable regular language R and intersect it with L. If L is context free, so is L R. Now choose a homomorphism h and map the language L R onto a known nonCFL. We give an example from the paper by Stuart Shieber. Look at (2.152) (2.154). If one looks at the nested innitives in Swiss German (rst rows) we nd that they are structured differently from English (last rows) and High German (middle rows). (Instead of a gloss, , we offer the following parallels: , , , , , .) (2.152)

G Hp G f e y '4 D 5 4 A H H A 7 n@1c@7 A h A 9 IH H A 4 D A 1$V H IH 9 Hp G 5p G 9 h P $5 G 8D d 5 fbh D d D 9 G 9 4 h P d 9 h A A H P 8pRcd 9 HV g p P eId G h d 5 D G Hp G 5 8h f hAF7 g bFRH i7 $H 8 d 9 d A 7 d A 7 4 9 D D h 5 4 A 9 H d D 5 4 A H 8 P G G G gh $5Ieh n@1H Fh P $h gh X V H 9H f@H eH d P 4 4 d A A d A

d 9 hh

G 5p G y 0h D 5 4 A H H q5h X @V H P 9 Vg P G 5p G f h 8h f H x FV H RH e A 7 F7 IH f h D A h A 9 9 5 A 4 D A 9 G 5p y 0h D 5 4 A H q5H G G x e h X @V H F7 IH h 8h f H P A 7 A h A 9 5 A 4 D A FV H RH 9 G G y h A 0R7 g h $$8 4 4 9 D H G 5p G G G e A 9 FRH F$h R5 8D 8 P 9 h P h 8#H A bIH 4 4 h P h 4 4 H A 9 G G Hp y 9 h A A H P R@h X h 9 P 9 Rh !$5RH D h 5 4 A 9 f e A 7 FRH H FRH 8h D bD RH A A 9 5 9 h 5 D A A 4 R H A bIH 9 G 5p G y 0h D 5 4 A H H q5h X @V H P 9 Vg P G 5p G f e A 7 F7 IH f h D B bh f H A h A 9 9 5 A 4 D A 1$V H IH 9 G G G G e yhF7 g A h IH 1$h #H A bIH 4 4 9 D H 8 A 9 8 P h 4 4 H A 9 G y 9 Rh X $h P G 5p f e 9Rh !$1FRFRH H FRH RH D h 5 4 A 9 H A 7 A A 9 5 D A A 4 R H A bIH 9 G 5p y 0h D 5 4 A H q5H G G f e h X V H @7 RH f bh f H P A 7 A h A 9 h 5 A 4 D A 1$V H IH 9 G G G e yhF7 g A h $q$IH #H A bIH 4 R 9 D 4 9 D H 8 A D A 9 4 4 H A 9 f e h 5 4 A 9 H A 7 @1FRFRH H FRH H A A 9 A A 4 R H A bIH 9 y 4 G 5

D !

(2.156)

(2.155)

(2.154)

(2.153)

Now we assume this is an empirical assumption, to be sure that this is the general pattern. It shall be emphasized that the processing of such sentences becomes difcult with four or ve innitives. Nevertheless, the resulting sentences are considered grammatical. By asking who does what to whom (we let, the children help, Hans paints) we see that the constituents are quite different in the three languages. Subject and corresponding verb are together in English (see (2.157a)), in High German they are on opposite sides of the embedded innitive (see (2.157b), this is called the nesting order). Swiss German, however, is still different. The verbs follow each other in the reverse order as in German (so, they occur in the order of the subjects, see (2.157c)). This is called the crossing order. (2.157a) 166
Context Free Languages

S1 V1 S2 V2 S3 V3

aa

(2.157b)

S1 S2 S3

aa aa

(2.157c)

S1 S2 S3

V1 V2 V1 V3 V2 V1

aa

Are Natural Languages Context Free?

167

Now we proceed as follows. The verbs require accusative or dative on their complements. The following examples show that there is a difference between dative and accusative. In (2.155) is accusative and the complement of , which selects dative. The resulting sentence is ungrammatical. In (2.156), is dative, while selects accusative. Again the sentence is ungrammatical. We now dene the following regular language (recall the denition of from Section 1.2). (2.158) R:

This is dened over the standard alphabet. It is not hard to see (invoking the Transducer Theorem, 6.40) that the corresponding language over the alphabet of lexemes is also regular. We dene the following mapping from the lexemes and to , and (denoted by their strings). v sends , , to , everything else inculding the blank is mapped to . The claim is that To this end we remark that a verb is sent to if it has a dative object and to if it has an accusative object. An accusative object is of the form N or N (N a noun) and is mapped to by v. A dative object has the form N, N a noun, and is mapped onto . Since the nouns are in the same order as the associated innitives we get the desired result. In mathematics we nd a phenomenon similar to Swiss German. Consider the integral of a function. If f x is a function, the integral of f x in the interval a b is denoted by
a

This is not in all cases well formed. For example, 01 x 1 dx is ill formed, since there Riemann approximation leads to a sequence which is not bounded, hence has no limit. Similarly, limn 1 n does not exist. Notice that the value range of x is written at the integral sign without saying with what variable the range is associated. For example, let us look at
a c

V ) U

(2.161)

f x y dxdy

v 'U

V U

(2.160)

f x dx

f h

V U

TV ji@s j U

V U

1 i

i IS i

(2.159)

hS R

xx : x

h X V H P

` W G s s ` W U % W GHp WV ` W h q41HH D 5 A h X @V H P 9 Vg P H H @P G W A 7 F7 A h aV ` W V s ` W G5!U W V ` W s s ` W U U p f h a% AFRH 9 9 D h B Q9 W e A H 4 D FV H A 9 RH D

f h H

g 9V P

g 9V P

A 9 FRH

H H @P h

5 8h f

9 D

G5p f h

G H

D q 1 4

5 A H  H

168

Context Free Languages

The rectangle over which we integrate the function is a x b and c y d. Hence, the rst integral sign corresponds to the operator dx, which occurs rst in the list. Likewise for three integrals:

a0

a1

a2

is well formed iff ai 0 for all i n such that i 1. The dependencies are crossing, and the order of elements is exactly as in Swiss German (considering the boundaries and the variables). The complication is the mediating function, which determines which of the boundary elements must be strictly positive. In (Kac et al., 1987), it is argued that even English is not context free. The argument applies a theorem from (Ogden et al., 1985). If L is a language, let Ln denote the set of strings that are in L and have length n. The following theorem makes use of the fact that a string of length n possesses n n 1 2 proper substrings and that n n 1 2 n2 for all n 1. Denote by c the smallest integer c. Theorem 2.111 (Interchange Lemma) Let L be a CFL. Then there exists a real number cL such that for every natural number n 0 and every set Q L n there is a k Q cL n2 , and strings xi , yi , zi , i k, such that

1 i i i

for all i j

k: xi y j zi

1 i i i

for all i

k: xi yi zi

I i ni ni )

for all i

k: yi xi zi

0,

Q, and Ln .

ni i

for all i

k: xi

x j , yi

y j , and zi

zj .

t s

g U

i i

g U

t uV

a0

a1

an

aa

) iaaa)

q aa

U $f s

(2.164)

b0

b1

bn

f x 0 x1

xn

dx0 dx1

dxn

1 1 , i n. Further, we allow for the interval a i bi either 0 1 with i or 1 2 . Then an integral expression

) aaa)

T )

(2.163)

f x0

xn :

x i
D
i n

where the value range is ai functions:

xi

(2.162)

b0

b1

b2

f x0 x1 x2 dx0 dx1 dx2 bi for all i 3. Consider the following

v S 81

Are Natural Languages Context Free?

169

Proof. Let G be a CFG that generates L. Let c L : N . We show that cL satises the above conditions. Take any set Q L n . Then there is E Q of cardinality 2 Q n 1 n and numbers k 0 and 0 such that every member of E possesses a decomposition x y z where x has length k, y has length , and x z is a constituent occurrence of y in the string. It is then clear that there is a subset F E of cardinality 2Q n 1 nN Q c L n2 such that all x z are constituent occurrences of identical nonterminal category. The above conditions are now satised for F. Moreover, F Q c L n2 , which had to be shown. Note that if the sequence of numbers L n n2 is bounded, then L satises the conditions of the Interchange Lemma. For assume that there is a c such that for all n we have Ln n2 c. Then cL : sup Ln n2 : n c. Then Ln cL n2 1. for every n and every subset Q of Ln , Q cL n2 However, with k 1 the conditions above become empty. Theorem 2.112 Let L A be a language such that Ln n2 n is a bounded sequence. Then L satises the conditions of the Interchange Lemma. This is always the case if A 1. Kac, ManasterRamer and Rounds use constructions with shown below, in which there is an equal number of nouns and verb phrases to be matched. In these constructions, the nth noun must agree in number with the nth verb phrase.

(2.166) (2.167) (2.168)

The problematic aspect of these constructions is illustrated by (2.168). There need not be an exact match of NPs and VPs, and when there is no match,

4 $

A P h h A

4 $

9 h R5 g

9 h $5 g

(2.165)

t uV

P $h

tuV U xuV s t U s S w T g1 D

D q4

U v5 s f

` p y h n4 C5 P D h 8A h f h G 4$9R5 9Ih ` A f h G A g 4 4 p #7h 4 h H A P h 4 P P h h h 8 6 p G g h A G 9 9 H P A G 9IH 9 H A RPcD 4 RA @g Rh 4 IH Ic$D s 9 H ` p ` G y h q4 Hh5 X P$RF@$RA P D h 8A h A 4 D P P h 9 H A P h Rh $RA f h 4 p p g h A G 9 9 H P A G h $4 h#hh 9IH A g h 4 RH R$D s 8 6 ` p ` G y h q4 H5 Ah P$RA f h P D h 8A h h 4 P P h @$A 9 RH X $RFD P h A 4 p p g G G 4 h h 8 6 4 97hh 9RH A @g Rh 4 IH Ic$D s h A 9 9 H P A ` G y A h $RA f h P h 4 p p g h A G X P$hRFD A 4 P P hA g 4 h$4 #h RH A g h s h 8 6 h 9 ` G y A h $A f h 4 P h p p G h h 8 6 h $4 #h RH 9 9 H P A Ic$D s

h 8A h H5

U nV V

y V

P h A 4 D P P h 1A g

iU

g aU U

i i i
f

g U D }

0 l) ( i i

0 l) ( i i

170

Context Free Languages

agreement becomes obscured (though it follows clear rules). Now let

and let D be the set of strings of A that contain as many nouns as they contain pronouns. B is that subset of D where the ith noun is iff the ith pronoun is . The empirical fact about English is that the intersection of English with D is exactly B. Based on this we show that English is not context free. For suppose it were. Then we have a constant c L satisfying the Interchange Lemma. (We ignore the blanks and the period from now on.) Let n be given. Choose Q : Bn , the set of strings of length n in B. Notice that B n 2 n 8 2 for all n. Therefore, for some n, Bn 2n2 cL so that Bn cL n2 2. This means that there are x1 , x2 , z1 , z2 and y1 and y2 such that Bn contains x1 y1 z1 as well as x2 y2 z2 , but x1 y2 z1 and x2 y1 z2 are also grammatical (and therefore even in Bn ). It is easy to see that this cannot be. The next example in our series is modelled after the proof of the non context freeness of ALGOL. It deals with a quite well known language, namely predicate logic. Predicate logic is dened as a language over a set of relation and function symbols of varying arity and a set of variables i : i . In order to be able to conceive of predicate logic as a language in our sense, we code the variables as consisting of sequences , where . We have iff . (Leading zeros are not suppressed. The numbers are usually put as subscripts, but we shall not do that here.) We restrict ourselves to the language of pure equality. The alphabet is . The grammar rules are as follows. (2.170)

T () ) ) ) ) 8G )  8iRS ) ) ) &) )

i i i

` V W h ` $A f h G 4 s A P h ` V W h ` A f h A P h

b1 i T ) S

f t

f H

 S

` V W s 9 H R@P

735  5 6 & F71 F )  Q5  dI  d (

i 

(2.169)

A:

` p W W h n4 C5 y P D h 8A h ` W U %FV s U 2 W X FD P h A 4 P P h $RA $R5 4 9 h 9 RH G s ` W U %9V s U % X $R1D 4 P h A 4 P P h @$A $$5 4 9 h g 4 4 p #7h h " RH p W h `h 8 6 9 G s W G % U A gg h 4 h A 9 H IP $D 4 A V @g G s Q G U A g h 4 h A 9 H R@P D 4 D A

9 RH

i i i

i i i i i

i D i

i i i

i  D i 

P h A 4 FD

Are Natural Languages Context Free?

171

Here stands for the set of formulae for the set of prime formulae for the set of quantier prexes, the set of variables and for the set of strings over and . Let x be a formula and C an occurrence of a variable . We now say that this occurrence of a variable is bound in x if it is an occurrence D of a formula Q y in x with Q which contains C. A formula is called a sentence if every occurrence of a variable is bound. Theorem 2.113 The set of sentences of predicate logic of pure equality is not context free. Proof. Let L be the set of sentences of pure equality of predicate logic. Assume this set is context free. Then by the Pumping Lemma there is a k such that every string of length k has a decomposition u x v y w such that u x i v y i z L for all i and x v y k. Dene the following formulae.

All these formulae are sentences. If is sufciently long (for example, longer than k) then there is a decomposition as given. Since x v y must have length k x and y cannot both be disjoint to all occurrences of . On the other hand, it follows from this that x and y consist only of and , and so necessarily they are disjoint to some occurrence of . If one pumps up x and y, necessarily one occurrence of a variable will end up being unbound. We can strengthen this result considerably. Theorem 2.114 The set of sentence of predicate logic of pure equality is not semilinear. Proof. Let P be the set of sentences of predicate logic of pure equality. Assume that P is semilinear. Then let P1 be the set of sentences which contain only one occurrence of a quantier, and let this quantier be . P1 is the intersection of P with the set of all vectors whose component is 1 and whose component is 0. This is then also semilinear. Now we consider the image of P1 under deletion of all symbols which are different from , and . The result is denoted by Q1 . Q1 is semilinear. By construction of P1 such that every occurrence of a variable is of the form there is an . If this variable occurs k times and if contains p occurrences of and kp kq . It is easy q occurrences of we get as a result the vector k to see that k must be odd. For a variable occurs once in the quantier and

&

g 2

i i i i

&

g 2

i   i F i   )

(2.171)

i 

i i i i i

T &) )S RF1

I i i ni

i R i 

b1 i T ) S

1 i i i i i

i 

172

Context Free Languages

elsewhere once to the left and once to the right of the equation sign. Now we have among others the following sentences. (2.172)

Since we may choose any sequence we have

Q1 is an innite union of planes of the form 2k 3 . We show: no nite union of linear planes equals Q 1 . From this we automatically get a contradiction. So, assume that Q1 is the union of Ui , i n, Ui linear. Then there exists a Ui which contains innitely many vectors of the form 2k 3 . From this one easily deduces that Ui contains a cyclic vector of the form m , m 0. (This is left as an exercise.) However, it is clear that if v Q 1 then we have m v Q1 , and then we have a contradiction. Now we shall present an easy example of a natural language which is not semilinear. It has been proposed in somewhat different form by Arnold Zwicky. Consider the number names of English. The stock of primitive names for numbers is nite. It contains the names for digits ( up to ) the names for the multiples of ten ( until ), the numbers from and until as well as some names for the powers of ten: , , , , and a few more. (Actually, using Latin numerals we can go to very high powers, but few people master these numerals, so they will hardly know more than these.) Assume without loss of generality that is the largest of them. Then there is an additional recipe for naming higher powers, namely by stacking the word . The 6k is represented by the kfold iteration of the word number 10 . For example, the sequence (2.174)

names the number 1024 . (It is also called , from Latin eight, because there are eight blocks of three zeros.) For arbitrary numbers the schema is as follows. A number in digital expansion is divided from right to left into blocks of six. So, it is divided as follows:

aa

(2.175)

106

1012

9 h P Rh 8$h h 9 D 9

 0V

9 g $bD f D P P 9 g $@8D f D P P

g 4 p g

V I

g C

g 8h 5

9 g 8D f 9 g $@8D f 9 g bD f 9 g $@8D f 9 g D P P D P P D P P D P P h

g U QV

p 9 g $@8n4 g D P P D

1 ) )

 4

h 9 D 9

G D P P 9 H A 9 g $@8D 9 g bD f IbF7 g 4 ` $5 @7 D P P h 9 9 h h 4 h 9 D @#9 h $74 P h

g y

9 h R4

g U 2V

9 g 8D f D P P

U @S

(2.173)

Q1

@ i   i dI i   i @d'V i   i @ i ('    &  i   i R i   i @ i ('    & i   i  i ('   &

2k 3 p

:k p q

g 

Are Natural Languages Context Free?

173

where i is the number name of i . If i 0 the ith block is omitted. Let Z be the set of number names. We dene a function as follows. ; : , all other primitive names are mapped onto . The Parikh image of Z is denoted by W . Now we have

Here, k is the largest integer k. We have left the proof of this fact to the reader. We shall show that W is not semilinear. This shows that Z is also not semilinear. Suppose that W is semilinear, say W i n Ni where all the Ni are linear. Let

for certain ui and vij ji ij . Suppose further that for some i and j we have ji 0. Consider the set Certainly we have P put : ji ij . Then Ni k ij W . Furthermore, we surely have ij

Proof. Let ui x y . Then a general element of the set P is of the form x k ji y k ij . We have to show that for almost all k the inequality y k ij

This holds for almost all k.

k ij

k ij

ij x

ij

(2.182)

k ji

is satised. Indeed, if k

(2.181)

k ji

x , ij

then x k ij x

k ji

Lemma 2.115 For every q where q p .

0 almost all elements of P have the form p

V 

(2.180)

ui

:k

g U H

(2.179)

P:

ui

vij

ui

k ji

(2.178)

Ni

ui

j pi

vij

k ij : k

o px

l mw

(2.177)

k0

k1 : k 1

k0 9 2

V g U D 9 8D f D P P

i h d f f d 2g(e

i h d f f d i h d f f d 7252ee77g52e

j kD

i )aa

(2.176)

g g V U H

where i

106 for all i. The associated number name is then as follows.

V `U
l

0. Now

g U

174

Context Free Languages

Lemma 2.116 Almost all points of P are outside of W . Proof. Let n0 be chosen in such a way that n02 9 n0 1 . Then for all n 9 n 1 . Let p q W with p n0 . Then n n0 we also have 2 q we have p , and therefore p q P. Put H : p q : p n0 . Then P H . However W H is certainly nite. Hence W P is nite, as required. Now have the desired contradiction. For on the one hand no vector is a multiple of ; on the other hand there can be no vector m n with n 0. Hence W is not semilinear. Notes on this section. The question concerning the complexity of variable binding is discussed in (Marsh and Partee, 1987). It is shown there that the language of sentences of predicate logic is not context free (a result that was folklore) but that it is at least an indexed language. (Indexed languages neeed not be semilinear.) On the other hand, it has been conjectured that if we take V to the set of formulae in which every quantier binds at least one free occurrence of a variable, the language V is not even an indexed language. See also Section 5.6. Philip Miller (1991) argues that Swedish and Norwegian are not context free, and if right branching analyses are assumed, they are not even indexed languages. Exercise 83. Formalize the language of functions and integral expressions. Prove that the language of proper integral expressions is not context free. Exercise 84. Show the following: Let U be a linear set which contains innitely many vectors of the form k . Then there exists a cyclic vector of the form m , m 0. Hint. Notice that the alphabet may consist of more than one letter. Exercise 85. Show that W has the claimed form. Exercise 86. Show that the set V is not semilinear. k0 2
w

(2.184)

: max

ij :i ji

n j

pi

Hint. Evidently, no linear set following is welldened.

V may contain a vector k . Therefore the

o gx

(2.183)

V:

k0

k1 : k 1

g U H S

D 1 g H zr qy

Hg U H

v t

zr q

D H

Are Natural Languages Context Free?

175

Show now that for every 0 almost all elements of W are of the form x y where y x. If we put for example 1 we now get a contradiction. Exercise 87. Prove the unique readability of predicate logic. Hint. Since we have strictly speaking not dened terms, restrict yourself to proving that the grammar given above is unambiguous. You might try to show that it is also transparent.
m n:m Exercise 88. Let . Put L : n or m . Then L V , as dened in Exercise 76. Show that L satises the properties of Theorem 1.82 and of Theorem 2.111. It follows that there are 2 0 many languages over and that satisfy these criteria for context freeness and are not even semilinear.

g c U }

Chapter 3 Categorial Grammar and Formal Semantics


1. Languages as Systems of Signs

Languages are certainly not sets of strings. They are systems for communication. This means in particular that the strings have meaning, a meaning which all speakers of the language more or less understand. And since natural languages have potentially innitely many strings, there must be a way to nd out what meaning a given string has on the basis of nite information. An important principle in connection with this is the socalled Principle of Compositionality. It says in simple words that the meaning of a string only depends on its derivation. For a CFG this means: if 0 1 n 1 is a rule and ui a string of category i then v : u0 u1 un 1 is a string of category and the meaning of v depends only on the meaning of the u i and . In this form the principle of compositionality is still rather vague, and we shall rene and precisify it in the course of this section. However, for now we shall remain with this denition. It appears that we have admitted only context free rules. This is a restriction, as we know. We shall see later how we can get rid of it. To begin, we shall assume that meanings come from some set M, which shall not be specied further. As before, exponents are members of A , where A is a nite alphabet. (Alternatives to this assumption will be discussed later.) Denition 3.1 An interpreted (string) language over the alphabet A and with meanings in M is a relation A M. The string language associated with is

The meanings expressed by are

Alternatively, we may regard a language as a function from A to M . x: f x is the string language associated with f and Then L f : M f : f x the set of expressed meanings of f . These denitions are x A

T s 1 p0 ) ( i

1 i

T w V U i D

V U i

i IS

Vs PU

(3.2)

m : there is x

A such that x m

T s 1 Ig0 ) ( i

i S

Vs PU

(3.1)

x : there is m

M such that x m

aa i

i aaa i i

} s

V U V UD

178

Categorial Grammar and Formal Semantics

not equivalent when it comes to compositionality. In the original denition, any particular meaning of a composite expression is derived from some particular meanings of its parts, in the second the totality of meanings is derived from the totality of the meanings of the parts. We give an example. We consider the number terms as known from everyday life as for example . We shall write a grammar with which we can compute the value of a term as soon as its analysis is known. This means that we regard an interpreted language as a set of pairs t x where t is an arithmetical term and x its value. Of course, the analysis does not directly reveal the value but we must in addition to the rules of the grammar specify in which way the value of the term is computed inductively over the analysis. Since the nodes correspond to the subterms this is straightforward. Let T be the following grammar. (3.3)

(This grammar only generates terms which have ciphers in place of decimal strings. But see Section 3.4.) Let now an arbitrary term be given. To this term corresponds a unique number (if for a moment we disregard division by 0). This number can indeed be determined by induction over the term. To this end we dene a partial interpretation map I, which if dened assigns a number to a given term.

. We may also regard If a function f is undened on x we write f x as a value. The rules for are then as follows. If at least one argument is , for all a. If x is a term, then I x is so is the value. Additionally, a 0 uniquely dened. For either x is a cipher from to or it is a negative cipher,

V U i

{ $D

V U

{ |D

V w $zU

aa D

(3.4)

0 1

V U i V U i V U i V U i

V U v i y i V U eV U i vV U i g i @V U

V R i v V i V bu i V i V i $

VRuU uV$zU U iyU iyU iyU i yU

I I I I I I I

x x x x

y y y y x

: : : : : : :

I I I I

x x x x I x

I I I I

y y y y

0 ) (

t v 2gt

 "   #7

wxaa7  u t 7ut t  t 


t t

Languages as Systems of Signs

179

or x y1 y2 for some uniquely determined y1 , y2 and . In this way one can calculate I x if one knows I y 1 and I y2 . The value of a term can be found by naming a derivation and then computing the value of each of its subterms. Notice that the grammar is transparent so that only one syntactical analysis can exist for each string. The method just described has a disadvantage: the interpretation of a term is in general not unique, for example if a string is ambiguous. (For example, has two values, 13 or 16.) As if we erase all brackets then the term explained above, we could take the meaning of a string to be a set of numbers. If the language is unambiguous this set has at most one member. Further, we have I x only if x is a constituent. However, in general we wish to avoid taking this step. Different meanings should arise only from different analyses. There is a way to implement this idea no matter what the grammar is. Let U be the grammar which results from T by deleting the brackets of T . (3.5)

The strings of U can be viewed as images of a canonical transparent grammar. This could be (3.3). However, for some reason that will become clear we shall choose a different grammar. Intuitively, we think of the string as the image of a term which codes the derivation tree. This tree differs from the structure tree in that the intermediate symbols are not nonterminals but symbols for rules. The derivation tree is coded by term in Polish Notation. For each rule we add a new symbol . In place of the rule A we now take the rule A . This grammar, call it V , is transparent (see Exercise 89). x L V is called a derivation term. We dene two maps and . yields a string for each derivation term, and yields an interpretation. Both maps shall be homomorphisms from the term algebra, though the concrete denition is dened over strings. can be uniformly dened by deleting the symbols . However, notice that the rules below yield values only if the strings are derivation terms.

In the last line, is different from all . We have assumed here that the grammar has no rules of the form A even though a simple adaptation can

V U

(3.6)

W aa W V

WV

} i aa i ~U

n 1 : 0 :

T ) H1 ) ) S
v u

V U i

V U i

 "  $7

t v 7t

wxaa7  u t u 2t t t

V U i D

} i

w yV U i D

V U

1 i

180

Categorial Grammar and Formal Semantics

help here as well. Now on to the denition of . In the case at hand this is without problems.

Here we have put the derivation term into Polish Notation, since it is uniquely readable. However, this only holds under the condition that every symbol is unique. Notice, namely, that some symbols can have different meanings as in our example the minus symbol. To this end we have added an additional annotation of the symbols. Using a superscript we have distinguished between the unary minus and the binary one. Since the actual language does not do so (we write without distinction), we have written 1 if the rule for the unary symbol has been used, and 2 if the one for the binary symbol has been used. The mapping is a homomorphism of the algebra of derivation terms into the algebra of real numbers with , which is equivalent to a partial homomorphism from the algebra of terms to the algebra of real numbers. For example the symbol is interpreted by the function : , where : and satises the laws specied above. In principle this algebra can be replaced by any other which allows to interpret unary and binary function symbols. We emphasize that it is not necessary that the interpreting functions are basic functions of the algebras. It is enough if they are polynomial functions (see (Hendriks, 2001) on this point). For example, we can introduce a unary function symbol whose interpretation is duplication. Now 2x x x, and hence the duplication is a polynomial function of the algebra 0 1 , but not basic. However, the formal setup is easier if we interpret each function symbol by a basic function. (It can always be added, if need be.) This exposition motivates a terminology which sees meanings and strings as images of abstract signs under a homomorphism. We shall now develop this idea in full generality. The basis is formed by an algebra of signs. Recall from Section 1.1 the notion of a strong (partial) subalgebra. A strong subalgebra is determined by the set B. The functions on B are the restrictions of the respective functions on A. Notice that it is not allowed to partialize functions

ge

m}

D D

(3.7)

V iU V iU V iU V iU

V iU v y V i U eV i U vV i U g @V i U

D D

V vi V i V iu V i V i 
u

} i g~U } i ~U } i g~U u  } } ~U

 } i ~U

0 1 2 0 1 0 1 0 1 1

: : : : :

0 0 0 0

1 1 1 1

T S s

0 ) )a!2P( ) g) g D

Languages as Systems of Signs

181

additionally. For example, A with f is not a strong subalgebra of unless f . A sign is a triple e c m where e is the exponent of , usually some kind of string over an alphabet A, c the category of and m its meaning. Abstractly, however, we shall set this up differently. We shall rst dene an algebra of signs as such, and introduce exponent, category and meaning as values of the signs under some homomorphisms. This will practically amount to the same, however. So, we start by xing a signature F . In this connection the function symbols from F are called modes. Over this signature we shall dene an algebra of signs, of exponents, of categories and meanings. An algebra of signs over F is simply a 0generated partial algebra over this signature together with certain homomorphisms, which will be dened later. A is called ngenerated if Denition 3.2 A (partial) algebra there is an nelement subset X A such that the smallest strong subalgebra containing X is .

is called a sign grammar over the Denition 3.3 The quadruple signature if is a 0generated partial algebra and : ,: and : homomorphisms to certain partial algebras such that the homomorphism is injective and strong. is called the algebra the of signs, the algebra of exponents, the algebra of categories and algebra of meanings.
This means in particular: Every sign is uniquely characterized by three things: its socalled exponent , its (syntactical) category (which is also often called its type), its meaning .

To every function symbol f F corresponds an f ary function f in , an f ary function f in and an f ary function f in . Signs can be combined with the help of the function f any time their respective exponents can be combined with the help of f , their respective categories can be combined with f and their respective meanings with f . (This corresponds to the condition of strongness.)

V U

0 ) (

V U

0 ) (

V U

V U D

0 ) ) ( )

V U

0 ) ) ( } V U

0 ) (

0 ) ) (

0 ) (

V U

V U

182

Categorial Grammar and Formal Semantics

In the sequel we shall write f in place of f , f in place of f and f in place of f . This will allow us to suppress mentioning which actual algebras are chosen. If is a sign, then is uniquely dened by , and on the other hand it uniquely denes as well. We shall call this triple the realization of . Additionally, we can represent by a term in the free algebra. We shall now deal with the correspondences between these viewpoints. Let PN g : g F , where PN is the set of constant : terms written in Polish Notation and
1

is a freely 0generated algebra. The elements of PN are called structure terms. We use , , and so on as metavariables for structure terms. We give an example. Suppose that is a 0ary mode and a unary mode. : x Then we have and x. This yields the following strings as representatives of structure terms.

We denote by h : M N the fact that h is a partial function from M to N. p p p We now dene partial maps : PN E, : PN C and : PN M in the following way.
0 g 1 0 g 1

Here, the left hand side is dened iff the right hand side is and then the two are equal. If we have a 0ary mode g, then it is a structure term g g E. Likewise we dene the other maps.
0 0 g g 1 1 0 g 1

As remarked above, for every sign there is a structure term. The converse need not hold. Denition 3.4 We say, a structure term is orthographically denite if is dened and semantically deis dened. is syntactically denite if is dened. Finally, is denite if is orthographically, syntactinite if cally as well as semantically denite.

V 8U

V aV

V 8U

7 U

(3.12)

V aV

U aaaV U U )) )) U aaaV U U

VaV iaaa) U ) ) iaaa) U

V aV

7 U

(3.11)

V U

V aV

)) U iaaaV U U

V aV

) iaaa) U

7 U

(3.10)

) aaaP

) P

) ) P P

(3.9)

i g

i QW

i) i 6iaaa) U

} } 7}

} }

(3.8)

x0

x g

0 aV U V U V U ( ) )
g

0 IT

S 7 ')

V 8U

xi

Languages as Systems of Signs

183

Figure 9. Synopsis

The reader is referred to Figure 9 for a synopsis of the various algebras and maps between them. In the sequel we shall often identify the structure term with its image under the unfolding map. This will result in rather strange types of denitions, where on the left we nd a string (which is the structure term, by convention) and on the right a triple. This abuse of the language shall hopefully present no difculty. is isomorphic to the partial algebra of all , where is a denite structure term. This we can also look at differently. Let D be the set of denite structure terms. This set becomes a partial algebra together with the partial functions g D. We denote this algebra by . is usually not a strong subalgebra of . be the identity map. Then we have j g For let j : 0 g 1 g j 0 j g 1 . The right hand side is always dened, the left hand side need not be. The homomorphism D (which we also denote by ) is however strong. 1 . is a congruence Now look at the relation : 0 1 : 0 on ; for it clearly is an equivalence relation and if i i for all i f

V U

VaV iaaa) 6 U ) U D Q 2

T V U

V U

0 ) ) (

Denition 3.5 The partial map :

is called the unfolding map.

0 ) @S (

RV RW

V aV

)) U aaaV U U

0V )V )V a8U 8U 8U (

0 ) ) (

0 ) ) (

e e

RW

Q
RW 7

id

184

Categorial Grammar and Formal Semantics

So, is isomorphic to the algebra of signs. For every sign there is a structure term, but there might also be several. As an instructive example we look at the sign system of triples of the form T 285 , where is the arrangement of hands of an ordinary clock (here showing 4:45), T a xed letter, and 285 the number of minutes past midnight/noon that is symbolized by this arrangement. So, the above triple is a sign of the language, while T 177 is not, since the hands show 3:10, which equals 190 minutes, not 177. We propose two modes: (the zero, 0ary) and (the successor function, unary). So, the unfolding of is T 0 , and the unfolding of is the advancement by one minute. Then is a total function, and we have

From this one easily gets that for every structure term , 720 . Hence every sign has innitely many structure terms, and so is inherently structurally ambiguous. If instead we take as meanings the natural numbers (say, the minutes that elapsed since some xed reference point) and : 0 as well as : n n 1 then every structure term represents a different sign! However, still there are only 720 exponents. Only that every exponent has innitely many meanings. We shall illustrate the concepts of a sign grammar by proceeding with our initial example. Our alphabet is now

The algebra consists of R together with some functions that we still have to determine. We shall now begin to determine the modes. They are , 2 , , , which are binary, 1 , , which are unary, and nally ten 0ary modes, namely , . We begin with the 0ary modes. These are, by denition, signs. For their identication we only need to know the three components. For example, to

 g} D}

T ) ) 8i) $) Gaaa8S ) ) w)) )

(3.15)

R:

V 8

V 8U

(Gaaa  ))

V 

V PU

(3.14)

720

) ( )

V @ U 0 ) ( )

Proposition 3.6

This is welldened and we get an algebra, the algebra is easy to see.

RV

V0 aaV U

( aU

V0 aaV U

RV D

aaU ( D D

(3.13)

:i

:i

. The following

V i U

V @i U

V i U

then f is dened iff f now put:

is. And in this case we have f

. We can

V @i U

RW

) ( )

} }

Languages as Systems of Signs

185

the mode corresponds the triple 0 . This means: the exponent of the sign (what we get to see) is the digit ; its category is , and its meaning the number 0. Likewise with the other 0ary modes. Now on to the unary modes. These are operations taking signs to make new signs. We begin with , which is dened as 1 1 . On the level of strings we get the polynomial follows.
1

On the level of categories we get the function


1

Here is again the symbol for the fact that the function is not dened. Finally we have to dene 1 . We put
1

Notice that even if the function x x is iterable, the mode 1 is not. This is made impossible by the categorial assignment. This is an artefact of the example. We could have set things up differently. The mode nally is dened by the following functions. x : x, x : x and c : 1 c . Finally we turn to the binary modes. Let us look at . is the partial (!) binary function on . Further, we put

as well as (3.20)
p

The string denes as is easily computed a sign whose exponent is . By contrast, does not represent a sign. It is syntactically denite but not semantically, since we may not divide by 0. Denition 3.7 A linear system of signs over the alphabet A, the set of categories C and the set of meanings M is a set A C M. Further, let be a category. Then the interpreted language of with respect to this category is dened by

1 ) i 0 R) (

0 ) @S i(

V U

(3.21)

xm : x

m}

t{

cd :

u "   ##7@

} (353  x}

V ) U

if c d , otherwise.

i i

V !) U } i i

(3.19)

xy :

x y

V U }

V U

} D}

V U

V U i

V U }

(3.18)

x :

t{

V U }

(3.17)

c :

if c , otherwise.

V U } i

(3.16)

x :

0 Gz( ) )

186

Categorial Grammar and Formal Semantics

We added the qualifying phrase linear to distinguish this from sign systems which do not generally take strings as exponents. (For example, pictograms are nonlinear.) A system of signs is simply a set of signs. The question is whether one can dene an algebra over it. This is always possible. Just take a 0ary mode for every sign. Since this is certainly not as intended, we shall restrict the possibilities as follows. Denition 3.8 Let E C M be a system of signs. We say that is compositional if there is a nite signature and partial algebras E f :f F , C f :f F , M f : f F such that all functions are computable and is the carrier set of the 0generated par. is weakly compositional tial (strong) subalgebra of signs from if there is a compositional system such that E C M. Notice that E C M for certain sets E , C and M . We remark that p M in the sense of the denition above is a coma partial function f : M n M such that f M n f . So, the compuputable total function f : M n tation always halts, and we are told at its end whether or not the function is dened and if so what the value is. Two conditions have been made: the signature has to be nite and the functions on the algebras computable. We shall show that however strong they appear, they do not really restrict the class of sign systems in comparison to weak compositionality. We start by drawing some immediate conclusions from the denitions. If is a sign we say that (no dots!) is its realization. We have introduced the unfolding map above.

be a compositional sign grammar. Then the Proposition 3.9 Let unfolding map is computable.
Simply note that the unfolding of a structure term can be computed inductively. This has the following immediate consequence. Corollary 3.10 Let be compositional. Then is recursively enumerable. This is remarkable inasmuch as the set of all signs over E C M need not even be enumerable. For typically M contains uncountably many elements (which can of course not all be named by a sign)! Theorem 3.11 A system of signs is weakly compositional iff it is recursively enumerable.

0 IT

t k

S ') k

D e e

0 aV U V U V U ( ) )

IT 0

0 ) ) ( )

S ') (

e 8k

e bk

IT 0

} k

S ') (

Languages as Systems of Signs

187

Proof. Let E C M be given. If is weakly compositional, it also is recursively enumerable. Now, let us assume that is recursively enumerable, say ei ci mi : 0 i . (Notice that we start counting with 1.) Now n n n :n let be a symbol and : a system of signs. By properly choosing we can see to it that and that no n occurs in E, C , : 0, : 1 and : 1. or M. Let F :

(3.22)

This is welldened. Further, the functions are all computable. For example, the map i ei is computable since it is the concatenation of the computable functions i i, i ei ci mi with ei ci mi ei . We claim: the system of signs generated is exactly . For this we notice rst that a structure term is i i denite iff it has the following form. (a) t , or (b) t . In Case i 1 i 1 i 1 , in Case (b) the sign e (a) we get the sign i 1 ci 1 mi 1 . Hence we generate exactly . So, is weakly compositional. Notice that the algebra of exponents uses additional symbols which are only used to create new objects which are like natural numbers. The just presented algebra is certainly not very satisfying. (It is also not compositional.) Hence one has sought to provide a more systematic theory of categories and their meanings. A rst step in this direction are the categorial grammars. To motivate them we shall give a construction for CFGs that differs markedly from the one in Theorem 3.11. The starting point is once again an interpreted language x f x : x L , where L is context free and f computable. Then let G N A R be a CFG with L G L. Put A : A, C : N and M : M A . For simplicity we presuppose that G is already in Chomsky Normal Form. For every rule of the form A x we take a 0ary mode , which is dened as follows:

B C we take a binary mode

For every rule of the form

0 6) ) ( i i

(3.23)

xAx

dened

` 0

) ) ( F

V U

F0

) ) (

s k 0 ) ) ) ( D 1 i aV U ) @S } D s 0 i i( D

s 0 ) ) ( U U U

) ) (

V U

0 ) )

ei ci mi

i if otherwise.

) ) (

V U

0 )

0 ) ) ( U U U

i 1

i 1

i 1

i if otherwise,

) 0 ) ) (

V RU

V U

1D

t D

V U

( @S

1 D

D T ) ) 5S

) ) @S ( T S s
U

188 by

Categorial Grammar and Formal Semantics

Finally we choose a unary mode :

Then is indeed the set of signs with category . As one can see, this algebra of signs is more perspicuous. The strings are just concatenated. The meanings, however, are not the ones we expect to see. And the category assignment is unstructured. This grammar is not compositional, since it still uses nonstandard meanings. Hence once again some pathological examples, which will show that there exist nonrecursive compositional systems of signs. Suppose that is a decidable system of signs. This means that there are , countable sets E, C and M such that either (i) E C M, or (ii) or (iii) there are two computable functions,

In particular, E, C and M are nite or countable. Also, we can nd a bijection : , where . (Simply generate a list d i for i 0 1 and skip repeated items.) Its inverse is also computable. Now we look at the projections 0 : e c m e, 1 : e c m c and 2 : e c m m. Denition 3.12 Let be a system of signs. is called enumerative if the projections 0 , 1 , and 2 are either bijective and computable or constant. Here is an enumerative subsystem of English. Take E to be the set of number , where is the category of names of English (see Section 2.7), C numbers, and M . Now let be the set of signs x n , where x names the number n in English. It is straightforward to check that is enumerative. Let be enumerative. We introduce two modes, (zeroary) and (unary) and say that

This generates , as is easily veried. This, however, is not compositional, unless we can show that the can be dened componentwise. Therefore put

VV aaV U U U

V U

(3.28)

e :

e 0

(3.27)

gV U U V U

if 0 is constant, otherwise.

aaa) )

"0 ) ) (

V U

0 ) ) ( i

T S

0 ) ) (

"0 ) ) (

U V

(3.26)

d :

d :

0 i aV U )

i ) (

i( V i a0 6) ) aU

(3.25)

f x

0 i !) ) i ( i i

V i a0 ) ) 0 6) ) aU i() i i(
}

(3.24)

xBx

yC y

xy A xy

Languages as Systems of Signs

189

This is computable if it is decidable whether or not e is in the image of 0 . So, the set 0 must be decidable. Similarly and are dened, and are computable if 1 and 2 , respectively, are decidable.

Theorem 3.14 Suppose that is modularly decidable and enumerative. Then is compositional. Theorem 3.15 (Extension) Let E C M be a recursively enumerable set of signs. Let be modularly decidable and enumerative. Assume that E is nite iff 0 is constant on ; similarly for C and M. Then is compositional. Proof. We rst assume that E, C and M are all innite. By Theorem 3.14, is compositional. Further, is recursively enumerable. So there is a computable function : . Moreover, 1 is also computable, and so 1 : is computable. Add a unary mode to the signature and let

m : 2

(On all other inputs the functions are not dened.) This is welldened and surjective. is partial, computable, and dened only on . Its full image is . Now assume that one of the projections, say 0 , is constant. Then e i : i n for some n. Then put E is nite, by assumption on , say E ei C M . i is also recursively enumerable. We do the proof i : as before, with an enumeration i : i in place of . Assume n new unary modes, i , and put : ei : 2 i (3.30) : 1 i

All i are computable, partial, and dened exactly on , which they i i map onto i .

VaaV U V X V U VV aaV U V X U

UaU U aU

V U D V U V U

i e i c i m

1 1

1 1 c

2 1 m

V U

V U

(3.29)

c : 1

1 1

VaaV U V V U VaaV U V V U VV aaV U V U

U X aU U X aU U X aU

e : 0

V U

0 1 e 1 1 c

2 1 m

Denition 3.13 is called modularly decidable if , 0 , 1 and 2 are decidable.

e T

0 ) ) (

SU !t

190

Categorial Grammar and Formal Semantics

In this construction all occurring signs are in . Still, we do want to say that the grammar just constructed is compositional. Namely, if we apply to the string x we may get a string that may have nothing to do with x at all. Evidently, we need to further restrict our operations, for example, by not allowing arbitrary string manipulations. We shall deal with this problem in Section 5.7. Compositionality in the weak sense denes semantics as an autonomous component of language. When a rule is applied, the semantics may not spy into the phonological form or the syntax to see what it is supposed to do. Rather, it acts autonomously, without that knowledge. Its only input is the semantics of the argument signs and the mode that is being applied. In a similar way syntax is autonomous from phonology and semantics. That this is desirable has been repeatedly argued for by Noam Chomsky. It means that syntactic rules apply regardless of the semantics or the phonological form. It is worthwile to explain that our notion of compositionality not only makes semantics autonomous from syntax and phonology, but also syntax autonomous from phonology and semantics and phonology autonomous from syntax and semantics. Notes on this section. The notion of sign dened here is the one that is most commonly found in linguistics. In essence it goes back to de Saussure (1965), published posthumously in 1916, who takes a linguistic sign to consist of a signier and denotatum (see also Section 5.8). De Saussure therewith diverged from Peirce, for whom a sign was a triadic relation between the signier, the interpreting subject and the denotatum. (See also (Lyons, 1978) for a discussion.) On the other hand, following the mainstream we have added to de Saussure signs the category, which is nothing but a statement of the combinatorics of that sign. This structure of a sign is most clearly employed, for example, in Montague Grammar and in the MeaningtoText framework of Igor Mel uk (see for example (Mel uk, 2000)). Other theories, for example c c early HPSG and Unication Categorial Grammar also use the tripartite distinction between what they call phonology, syntax and semantics, but signs are not triples but much more complex in structure. The distinction between compositionality and weak compositionality turns on the question whether the generating functions should work inside the language or whether they may introduce new objects. We strongly opt for the former not only because it gives us a stronger notion. The denition in its informal rendering makes reference to the parts of an expression and their meanings and in actual practice the parts from which we compose an ex-

Propositional Logic

191

pression do have meanings, and it is these meanings we employ in forming the meaning of a complex expression.

Exercise 90. Show that English satises the conditions of Theorem 3.15. Hence English is compositional! Exercise 91. Construct an undecidable set such that its projections 0 , 1 and 2 are decidable. Construct a which is decidable but not its projection 0 . Exercise 92. Show that the functions postulated in the proof of Theorem 3.15, z and m , do exist if is recursively enumerable. Exercise 93. Say that E C M is extra weakly compositional if there exists a nite signature and algebras , and over sets E E, C C and M M, respectively, such that is the carrier set of which belong to the set the 0generated partial subalgebra of E C M. (So, the denition is like that of weak compositionality, only that the functions are not necessarily computable.) Show that is extra weakly compositional iff it is countable. (See also (Zadrozny, 1994).) 2. Propositional Logic

Before we can enter a discussion of categorial grammar and type systems, we shall have to introduce some techniques from propositional logic. We seize the opportunity to present boolean logic using our notions of the previous . Further, let section. The alphabet is dened to be A P : T : P , and M : 0 1 . Next, we dene the following modes. The zeroary modes are Here, ranges over (possibly empty) sequences of and . (So, the signature is innite.) Further, let be the following function: (3.32) 0 1 0 1 1 1 0 1
A n

0 ) ( )

) 0 ) ) i (

) 0 ) ) i (

A n

(3.31)

P0

P1

P0

T () ) ) ) ) ) R@ij8GS

k k

k $C e k e k

1 i 0 ) ) ) D( e } k

T ) S

Exercise 89. Let G R : X : transparent.

N A R be a CFG. Put N : N R , and : X R ,G : N A R . Show that G is

k 0k ) )k ) ( S s k } D D

T S

192

Categorial Grammar and Formal Semantics

The system of signs generated by these modes is called boolean logic and is denoted by . To see that this is indeed so, let us explain in more conventional terms what these denitions amount to. First, the string language L we have dened is a subset of AP , which is generated as follows.

x is also called a wellformed formula (wff) or simply a formula iff it belongs to L. There are three kinds of wffs. Denition 3.16 Let x be a wellformed formula. x is a tautology if x P 0 . x is a contradiction if x P 1 . If x is neither a tautology nor a contradiction, it is called contingent. The set of tautologies is denoted by Taut , or simply by Taut if the language is clear from the context. It is easy to see that x is a tautology iff x is a contradiction. Likewise, x is a contradiction iff x is a tautology. We now agree on the following convention. Lower case Greek letters are proxy for wellformed formulae, upper case Greek letters are proxy for sets of formulae. Further, we write ; instead of and ; in place of . Our rst task will be to present a calculus with which we can generate all the tautologies of . For this aim we use a socalled Hilbert style calculus. Dene the following sets of formulae.
( ( (

The logic axiomatized by (a0) (a3) is known as classical or boolean logic, the logic axiomatized by (a0) (a2) as intuitionistic logic. To be more precise, (a0) (a3) each are sets of formulae. For example:

d( S

(3.35)

(a0)

( 1

( 1 g( @ 1 (

(3.34)

8

( d1

Q(5 

d(  Q(

(a0) (a1) (a2) (a3)

1 0 ) ) ( i

V i g( y

S s

V ) (U R

1 0 ) ) ( i

1 $( y i i

1 !) i i

If x y

L. L then x y L.

1 i

b1 i T ) S

If variables.

, then

L. These sequences are called propositional

) @ $( ( ) i i

V a0 ) ) 0 ) ) aU e i() i(

(3.33)

x P

y P

ge

The binary mode

of implication formation is spelled out as follows. x y P

V i (

T )

Propositional Logic

193

We call (a0) an axiom schema and its elements instances of (a0). Likewise with (a1) (a3). Denition 3.17 A nite sequence i : i n of formulae is a proof of if (a) n 1 and (b) for all i n either (b1) i is an instance of (a0) j i . The number n is called (a3) or (b2) there are j k i such that k if there is a proof of . the length of . We write The formulae (a0) (a3) are called the axioms of this calculus. Moreover, this calculus uses a single inference rule, which is known as Modus Ponens. It is the inference from and to . The easiest part is to show that the calculus generates only tautologies.
~

Lemma 3.18 If

then is a tautology.

The proof is by induction on the length of the proof. The completeness part is somewhat harder and requires a little detour. We shall extend the notion of proof somewhat to cover proofs from assumptions. Denition 3.19 A proof of from is a nite sequence i : i n of formulae such that (a) n 1 and (b) for all i n either (b1) i is an instance of (a0) (a3) or (b2) there are j k i such that k j i or (b3) i . We write if there is a proof of from . To understand this notion of a hypothetical proof, we shall introduce the notion of an assignment. It is common to dene an assignment to be a function from variables to the set 0 1 . Here, we shall give an effectively equivalent denition. Denition 3.20 An assignment is a maximal subset A of
A q`

(So, an assignment is a set of zeroary modes.) Each assignment denes a and , which we denote by A . closure under the modes Lemma 3.21 Let A be an assignment and a wellformed formula. Then either P 0 A or P 1 A , but not both. The proof is by induction on the length of x. We say that an assignment A makes a formula true if P 1 A .

V U

V U

V U

1 0 ) ) (

1 0 ) ) (

A A z n

such that for no both

A.

T V s U j'zx1 i

S s T V s U "j'Iyzf1 i

V U

1 0 ) ) (

(3.36)

T ) S

194

Categorial Grammar and Formal Semantics

Denition 3.22 Let be a set of formulae and a formula. We say that follows from (or is a consequence of) if for all assignments A: if A makes all formulae of true then it makes true as well. In that case we write . Our aim is to show that the Hilbert calculus characterizes this notion of consequence:

Again, the proof has to be deferred until the matter is sufciently simplied. Let us rst show the following fact, known as the Deduction Theorem (DT).

Proof. The direction from right to left is immediate and left to the reader. Now, for the other direction suppose that ; . Then there exists a proof i : i n of from ; . We shall inductively construct a proof j : j m of from . The construction is as follows. We dene i inductively.
1

where i , i n, is dened as given below. Furthermore, we will verify inductively that i 1 is a proof of its last formula, which is i . Then : n will be the desired proof, since n 1 . Choose i n. Then either (1) i or (2) is an instance of (a0) (a3) or (3) i or (4) there are j k i such that k j i . In the rst two cases we put i : i i i i . In Case (3) we put

i is a proof of , as is readily checked. Finally, Case (4). There are j k i such that k j i . Then, by induction hypothesis, j and

) 8

( d1

0 ( ) R@ ( Q( )  ( d1 ( d( ( )  1 ( @Q( ( d( d1@ 1 ( d( ( (

 @(

(3.38)

i :

) 8

(3.37)

0 :

i i

Lemma 3.24 (Deduction Theorem) ;

iff

Theorem 3.23

Q( ) (

k (

iff

Propositional Logic

195

Proof. Assume that . Then there is a proof of from . It follows that is a proof of from ; . Conversely, assume that ; . Applying DT we get . Using (a3) we get . Proposition 3.26 The following holds.
~

.
~

This is easily veried. Now we are ready for the proof of Theorem 3.23. An easy induction on the length of a proof establishes that if then also . (This is called the correctness of the calculus.) So, the converse implication, which is the completeness part needs proof. Assume that . We shall show that also . Call a set consistent (in ) if .

Proof. . Assume that both ; and ; are inconsistent. Then we and ; . So by DT and, ushave ; ing (a3), . Hence ; and so ; . Because , we also have ; , showing that ; is inconsis; tent. . Assume ; ; is inconsistent. Then ; ; . So,

~ g( ( ~ ( ( 1( 

g(

g(

g(

( g5

g(

Let ;

be consistent. Then also ; ;

is consistent.

Lemma 3.27 Let ; be consistent. Then either ; consistent or ; is consistent.

If

and ;

then ;

If

and

then also

. .

is

Lemma 3.25 (Little Deduction Theorem) For all and : only if ; .

if and

It is veried that i 1 is a proof of i . A special variant is the following.

( g5 g(  ~ (

) b

( d1

0 R ( )b ( d1 (  ( ( d1@ ( d( @(
(

(3.39)

i :

~ g( 0 ) R 8g( ( W

g(

D d(

k
(

j i

already occur in the proof. Then put

j i

196
~

Categorial Grammar and Formal Semantics

; , by applying DT. So, ; , using (a3). Applying DT we get . Using (a3) and DT once again it is nally seen that ; is inconsistent. Finally, let us return to our proof of the completeness theorem. We assume that . We have to nd an assignment A that makes true but not . We may also apply the Little DT and assume that ; is consistent and nd an assignment that makes this set true. The way to nd such an assignment is by applying the socalled downward closure of the set. Denition 3.28 A set is downward closed iff (1) for all either or and (2) for all formulae also and . Now, by Lemma 3.27 every consistent set has a consistent closure . (It is an exercise for the diligent reader to show this. In fact, for innite sets a little work is needed here, but we really need this only for nite sets.) Dene the following assignment. does occur in

It is shown by induction on the formulae of that the sodened assignment makes every formula of true. Using the correspondence between syntactic derivability and semantic consequence we immediately derive the following. Theorem 3.29 (Compactness Theorem) Let be a formula and a set of formulae such that . Then there exists a nite set such that . Proof. Suppose that . Then . Hence there exists a proof of from . Let be the set of those formulae in that occur in that proof. is nite. Clearly, this proof is a proof of from , showing . Hence . Usually, one has more connectives than just and . Now, two effectively equivalent strategies suggest themselves, and they are used whenever convenient. The rst is to introduce a new connective as an abbreviation. So, we might dene (for wellformed formulae) (3.42) (3.43)

(1@g( Q( (5 g( g(

(3.41)

: : :

jk

~ #

} Rk

( i ' ( i '

0 ) ) i @s ( S ( 0 ) ) i @S

(3.40)

P1 : P0 :

does not occur in

1 g1 (

g(

( g1 ( ~ ( g5g(  D 1 D k

1 g( 1 g(

bk

Propositional Logic

197

After the introduction of these abbreviations, everything is the same as before, because we have not changed the language, only our way of referring to its strings. However, we may also change the language by expanding the alphabet. In the cases at hand we will add the following unary and binary modes (depending on which symbol is to be added): x P x P 0 1 0 1 1 1 y P y P 0 1

(3.47)
(

0 1

0 1 1 0

(3.49) (3.50)

If we eliminate the connective and dene as before (eliminating the axioms (a2) and (a3), however) we get once again intuitionistic logic, unless we add (3.51). The semantics of intuitionistic logic is too complicated to be explained here, so we just use the Hilbert calculus to introduce it. We claim that with only (a0) and (a1) it is not possible to prove all formulae of Taut that use only . A case in point is the formula

which is known as Peirces Formula. Together with Peirces Formula, (a0) and (a1) axiomatize the full set of tautologies of boolean logic in . The
(

( 1

( 1

@

(3.52)

( 5

(3.51)

Notice that in dening the axioms we have made use of (3.51) is derivable.

alone. The formula

) 

@ dQ( !@ ) Q5 (5  @!@ ( ( ) ) b@  d( !R  ) !R 1 ) ( Q( !I@ )

( 1 Q(

d1 @Q1 (  ( ( ( ( Q5 5  @ Q(  d( d(

(3.48)

For , and respectively:

we need the postulates shown in (3.48), (3.49) and (3.50),

0 1 0 0 0 1

(3.46)

(3.45)

(3.44)

) @  ( ) i i V a0 ) ) 0 ) ) ag i() i(U D ) @  ( ) i i V a0 ) ) 0 ) ) ag i() i(U D v ) i i) 8 ( V a0 ) ) a$ i(U D


x P : x P

x y P x y P

198

Categorial Grammar and Formal Semantics

calculus based on (a0) and (a1) is called and we write to say that there is a proof in the Hilbert calculus of from using (a0) and (a1). Rather than axiomatizing the set of tautologies we can also axiomatize the deducibility relation itself. This idea goes back to Gerhard Gentzen, who used it among other to show the consistency of arithmetic (which is of no concern here). For simplicity, we stay with the language with only the arrow. We shall axiomatize the derivability of intuitionistic logic. The statements that we are deriving now have the form and are called sequents. is called the antecedent and the succedent of that sequent. The axioms are
~

(3.53)

(ax)

Then there are the following rules of introduction of connectives:

Notice that these rules introduce occurrences of the arrow. The rule (I ) introduces an occurrence on the right hand side of , while ( I) puts an occurrence on the left hand side. (The names of the rules are chosen accordingly.) Further, there are the following socalled rules of inference:
~ ~ ~

(3.55)

(cut)

The sequents above the line are called the premisses, the sequent below the lines the conclusion of the rule. Further, the formulae that are introduced by the rules ( I) and (I ) are called main formulae, and the formula in (cut) the cutformula. Let us call this the Gentzen calculus. It is denoted by . Denition 3.30 Let be a sequent. A (sequent) proof of length n of in is a sequence i i : i n 1 such that (a) n , n , (b) for all i n 1 either (ba) i i is an axiom or (bb) i i follows from some earlier sequents by application of a rule of . It remains to say what it means that a sequent follows from some other sequents by application of a rule. This, however, is straightforward. For example, follows from the earlier sequents by application of the rule (I ) if among the earlier sequents we nd the sequent ; . We shall dene also a different notion of proof, which is based on trees rather than sequences. In doing so, we shall also formulate a somewhat more abstract notion of a calculus.
~ ~~ ~

~~ ~

; ;
~~ ~

(mon)

~~ 0 ~

~~

~~~

~~ ~

(3.54)

(I )

;
~

( I)

; ;

~~

D~

Propositional Logic

199

Denition 3.31 A nitary rule is a pair M , where M is a nite set of sequents and a single sequent. (These rules are written down using lower case Greek letters as schematic variables for formulae and upper case Greek letters as schematic variables for sets of formulae.) A sequent calculus is a set of nitary rules. An proof tree is a triple T such that T is a tree and for all x: if yi : i n are the daughters of T , yi : i n x is an instance of a rule of . If r is the root of , we say that proves r in . We write
~

For negation we have these rules.

The following are the rules for conjunction.

Finally, these are the rules for .


~'

(3.60)

Let us return to the calculus . We shall rst of all show that we can weaken the rule system without changing the set of derivable sequents. Notice that the following is a proof tree.
~

This shows us that in place of the rule (ax) we may actually use a restricted rule, where we have only i i . Call such an instance of (ax) primitive. This fact may be used for the following theorem.

~~~

(3.61)

~~ ~

~ ~~

(I1 )

~ ~~

~~ ~

( I)

; ; ; (I2 )

~'

(3.59)

(I )

~ ~~

~ ~~

~~~

( I)

; ; ;

p~ '

(3.58)

~ ~~

( I)

(I )

~~~

~~ ~

(3.57)

( I)

We start with the only rule for

, which actually is an axiom.

to say that the sequent

(3.56)

has a proof in .

0 3) h ) (

V b3 U 0 U 3) aV bbT

V b!( U 3S 0 ) (

200

Categorial Grammar and Formal Semantics

Proof. From right to left follows using the rule (I ). Let us prove the other direction. We know that there exists a proof tree for from primitive axioms. Now we trace backwards the occurrence of in the tree from the root upwards. Obviously, since the formula has not been introduced by (ax), it must have been introduced by the rule (I ). Let x be the node where the formula is introduced. Then we remove x from the tree, thereby also removing that instance of (I ). Going down from x, we have to repair our proof as follows. Suppose that at y x we have an instance of (mon). Then instead of the proof part to the left we use the one to the right.
~~ ~ (

Suppose that we have an instance of (cut). Then our specied occurrence of is the one that is on the right of the target sequent. So, in place of the proof part on the left we use the one on the right.
(

Now suppose that we have an instance of ( I). Then this instance must be as shown to the left. We replace it by the one on the right.
~~ ~ ~ ~~ ( ~ ~~

The rule ( I) does not occur below x, as is easily seen. This concludes the replacement. It is veried that after performing these replacements, we obtain a proof tree for ; .
~

Q( ~ ( ~~~

(3.65)

~~ ~

Proof. Suppose that


~

. By induction on the length of the proof we shall show that . Using DT we may restrict ourselves to . First, we shall show that (a0) and (a1) can be derived. (a0) is derived as follows.

Theorem 3.33

iff

(3.64)
(

~ 0 ~~~

; ;

; ; ; ;

~~~

~~~

(3.63)

; ;

(3.62)

~~~

; ; ;

; ; ; ;

~~ ~

Lemma 3.32

iff

Propositional Logic

201

For (a1) we need a little more work.

If we apply (I ) three times we get (a1). Next we have to show that if we and then . By DT, we also have have

and then a single application of (cut) yields the desired conclusion. . Now, conversely, we have to show that This proves that implies that . This is shown by induction on the height of the nodes clearly holds. in the proof tree. If it is 1, we have an axiom: however, Now suppose the claim is true for all nodes of depth i and let x be of depth i. Then x is the result of applying one of the four rules. ( I). By induction hypothesis, and ; . We need to show that ; . Simply let 1 be a proof of from , 2 a proof of from ; . Then 3 is a proof of from ; .

(I ). This is straightforward from DT. (cut). Suppose that 1 is a proof of from and 2 a proof of from ; . Then 1 2 is a proof of from ; , as is easily seen. (mon). This follows from Proposition 3.26. Call a rule admissible for a calculus if any sequent that is derivable in is also derivable in . Conversely, if is admissible in , we say that is eliminable from . We shall show that (cut) is eliminable from , so that it can be omitted without losing derivable sequents. As cut elimination will play a big role in the sequel, the reader is asked to watch the procedure carefully.

Proof. Recall that (cut) is the following rule.


~ ~~ ~~ ~

(3.68)

(cut)

; ;
~

Two measures are introduced. The degree of (3.68) is

6 ! ! g g g

(3.69)

d:

Theorem 3.34 (Cut Elimination) (cut) is eliminable from

~~ ~

) W 0 b

( W

(3.67)
(

3 :

~ ~~

~ ~ ~

~' ~

~~ ~


( (

~~ ~

(3.66)

; ; ; ; ;
~

~ ~ ~

d(
~

~ ~~

~ ~ ~

~ ~~

202

Categorial Grammar and Formal Semantics

The weight of (3.68) is 2d . The cutweight of a proof tree is the sum over all weights of occurrences of cuts (= instances of (cut)) in it. Obviously, the cutweight of a proof tree is zero iff there are no cuts in it. We shall now present a procedure that operates on proof trees in such a way that it reduces the cutweight of every given tree if it is nonzero. This procedure is as follows. Let be given, and let x be a node carrying the conclusion of an instance of (cut). We shall assume that above x no instances of (cut) exist. (Obviously, x exists if there are cuts in .) x has two mothers, y 1 and y2 . Case (1). Suppose that y1 is a leaf. Then we have y1 , y2 ; and x ; . In this case, we may simply skip the application of cut by dropping the nodes x and y1 . This reduces the degree of the cut by 2 , since this application of (cut) has been eliminated without trace. Case (2). , y1 , whence Suppose that y2 is a leaf. Then y2 and x y1 . Eliminate x and y2 . This reduces the cutweight by the weight of that cut. Case (3). Suppose that y 1 has been obtained by application of (mon). Then the proof is as shown on the left. (3.70)

We may assume that 0. We replace the local tree by the one on the right. The cut weight is reduced by

Case (4). y2 has been obtained by application of (mon). This is similar to the previous case. Case (5). y1 has been obtained by ( I). Then the main formula is not the cut formula. (3.72) ; ; ; ; ;

And the cut can be rearranged as follows. (3.73)


~ ~ ~

~~ ~

~'

~~ ~

; ; ; ; ; ;

~~ ~

(3.71)

~~ ~

~ ~~

~~ ~

U b3

; ; ; ;
~ ~~

; ; ;

~~~

U b3

~~~

U b3

U b3

U 3

~0

U 3

~F

~~ ~

~~ ~

U 3

V 3 U

V b3 U

Propositional Logic

203

Here, the degree of the cut is reduced by 0. Thus the cut weight is reduced as well. Case (6). y 2 has been obtained by ( I). Assume . (3.74)

In this case we can replace the one cut by two as follows. (3.75)
~ ~ ~ ~

If we now apply ( I), we get the same sequent. The cutweight has been diminished by

(See also below for the same argument.) Suppose however . Then either is not the main formula of y 1 , in Case (1), (3), (5), or it actually is the main formula, and then we are in Case (7), to which we now turn. Case (7). y1 has been introduced by (I ). If the cut formula is not the main formula, we are in cases (2), (4), (6) or (8), which we dealt with separately. Suppose however the main formula is the cut formula. Here, we cannot simply permute the cut unless y 2 is the result of applying ( I). In this case we proceed as follows. for some and . The local proof is as follows. (3.77)

This is rearranged in the following way. (3.78)


~~ ~ ~~ ~ ~~ ~

This operation eliminates the cut in favour of two cuts. The overall degree of these cuts may be increased, but the weight has been decreased. Let d :

~~ ~

; ; ; ;

~'

~~ ~

~~~

~~ ~

~~ ~

; ; ;

U b3

U b3 D

(3.76)

~~ ~

; ;
~ ~~ (

~~ ' ~

; ; ; ; ; ; ;
~ ~~

; ; ; ;

@ j@ v

U b3

~ ~~

U 3

~ ~~

204

Categorial Grammar and Formal Semantics

since p 0. (Notice that 2a c 2a d 2a 2c 2d 2a 2c d a c d if c d 2 0.) Case (8). y2 has been obtained by (I ). Then for some and . We replace the left hand proof part by the right hand part, and the degree is reduced by 0. (3.80)
~~ ~ ~~ ~

So, in each case we managed to decrease the cutweight. This concludes the proof. Before we conclude this section we shall mention another deductive calculus, called Natural Deduction. It uses proof trees, but is based on the Deduction Theorem. First of all notice that we can write Hilbert style proofs also in tree format. Then the leaves of the proof tree are axioms, or assumptions, and the only rule we are allowed to use is Modus Ponens.
(

(3.81)

(MP)

This, however, is a mere reformulation of the previous calculus. The idea behind natural deduction is that we view Modus Ponens as a rule to eliminate the arrow, while we add another rule that allows to introduce it. It is as follows.
(

However, when this rule is used, the formula may be eliminated from the assumptions. Let us see how this goes. Let x be a node. Let us call the set y y : y x y leaf the set of assumptions of x. If (I ) is used Ax : to introduce , any number of assumptions of x that have the form y may be retracted. In order to know what assumption has been effectively retracted, we check mark the retracted assumptions by a superscript
(

(3.82)

(I )

~~~

; ; ; ;
(

; ; ; ; ;

"V

v 

g D

U b3

( ~~~ ~

~~ ~

(3.79)

2d

2d

2d

2d

; , p : . Then the rst cut has weight 2 d cuts have weight

) g

( 0 U 3 ( aV b) @S

. The two other

0 ) (

V U

Propositional Logic
{

205

(e. g. ). Here are the standard rules for the other connectives. The fact that the assumption is or may be removed is annotated as follows:

Here, means that any number of assumptions of the form above the node carrying may be check marked when using the rule. (So, it does not mean that it requires these formulae to be assumptions.) The rule (E ) is nothing but (MP). First, conjunction.

For negation we need some administration of the check mark.

So, using the rule (I ) any number of assumptions of the form may be check marked. Disjunction is even more complex.

(E )

(3.87)

. . .

. . .

(I1 )

(I2 )

(3.86)

(I )

(E )

(3.85)

(E )

The next is

. . .

(3.84)

(I )

(E1 )

(E2 )

(3.83)

(I )

(E )

. . .

206

Categorial Grammar and Formal Semantics

In the last rule, we have three assumptions. As we have indicated, whenever it is used, we may check mark any number of assumptions of the form in the second subtree and any number of assumptions of the form in the third. We shall give a characterization of natural deduction trees. A nitary rule is a pair i Ai : i n , where for i n, i is a formula, Ai a nite set of formulae and a single formula. A natural deduction calculus is a set of nitary rules. A proof tree for is a quadruple T such that T is a tree, T a set of leaves and is derived in the following way. (Think of as the set of leaves carrying discharged assumptions.)

There is a rule i Ai : i n , and is formed from trees i, i n, with roots si , by adding a new root node r, such that yi i , i n, i x . Further, i n i n Ni , where Ni is a set of leaves i of i such that for all i n and all x Ni : x Ai .
i

Further, here is a proof tree ending in (a1).

A formula depends on all its assumptions that have not been retracted in the following sense. Lemma 3.35 Let be a natural deduction tree with root x. Let be the set of all formulae such that y is an unretracted assumption of x and let : x . Then .



Q5 ( @

( @Q1@ ( Q( @ ( d1 ( @ (

(3.89)

0 ) (

d(

d(

(3.88)

V S U 3 0 S) 3) w) S IT 'R!bT !(

(Notice that the second case includes n 0, in which case where x is simply an axiom.) We say that proves r in x leaf x . Here now is a proof tree ending in (a0).

x from

V U 3

V 3 U

1 yV U 3

D 1 s m D

0 bT )

0 w) 3) w) S !R!bT !(

S !(

T  1 ) V 3 U

V F3 U

V 3 U

, where : x

0 ) 3) h I) (

0 bT )

S !(

0 ) ( D

x x :

Basics of Calculus and Combinatory Logic

207

Proof. By induction on the derivation of the proof tree. The converse also holds. If then there is a natural deduction proof for with the set of unretracted assumptions (this is Exercise 99). Notes on this section. Proofs are graphs whose labels are sequents. The procedure that eliminates cuts can be described using a graph grammar. Unfortunately, the replacements also manipulate the labels (that is, the sequents), so either one uses innitely many rules or one uses schematic rules.

Exercise 96. Show that a Hilbert style calculus satises DT for iff the formulae (a0) and (a1) are derivable in it. (So, if we add, for example, the connectives , and together with the corresponding axioms, DT remains valid.) Exercise 97. Dene by and . Show that if then iff ; , and (b) for all : iff (a) for all and : ; . Exercise 98. Let us call the Hilbert calculus for , , , and . Fur iff ther, call the Gentzen calculus for these connectives . Show that Exercise 99. Show the following claim: If then there is a natural deduction proof for with the set of unretracted assumptions. Exercise 100. Show that the rule of Modus Tollens is admissible in the natural deduction calculus dened above (with added negation).
( ~~ ~

3.

Basics of Calculus and Combinatory Logic

There is a fundamental difference between a term and a function. The term x2 2xy is something that has a concrete value if x and y have a concrete value. For example, if x has value 5 and y has value 2 then x 2 2xy 25 20 45. However, the function f : : xy x 2 2xy does not

y0 ) (

(3.90)

Modus Tollens:

R ~

Exercise 95. Show that a set is inconsistent iff for every :

Exercise 94. Show (a) , where is with the axioms for added.
(

and (b)

Q(

d(

208

Categorial Grammar and Formal Semantics

need any values for x and y. It only needs a pair of numbers to yield a value. That we have used variables to dene f is of no concern here. We would have obtained the same function had we written f : x u x 2 2xu. However, the term x2 2xu is different from the term x2 2xy. For if u has value 3, x has value 5 and y value 2, then x2 2xu 25 30 55, while x2 2xy 45. To accommodate this difference, the calculus has been developed. The calculus allows to dene functions from terms. In the case above we may write f as

This expression denes a function f and by saying what it does to its arguments. The prex xy means that we are dealing with a function from pairs m n and that the function assigns this pair the value m 2 2mn. This is the x 2 2xy. Now we can also same as what we have expressed with x y dene the following functions.

The rst is a function which assigns to every number m the function y m 2 2my; the latter yields the value m2 2mn for every n. The second is a function which gives for every m the function x x 2 2xm; this in turn yields n2 2nm for every n. Since in general m2 2mn n2 2nm, these two functions are different. In calculus one usually does not make use of the simultaneous abstraction of several variables, so one only allows prexes of the form x, not those of the form xy. This we shall also do here. We shall give a general denition of terms. Anyhow, by introducing pairing and projection (see Section 3.6) simultaneous abstraction can be dened. The alphabet consists of a set F of function symbols (for which a signature needs to be given as well), , the variables V : the brackets , and the period . i:i Denition 3.36 The set of terms over the signature , the set of terms for short, is the smallest set Tm V for which the following holds: Every term is in Tm V .

V U

1 t

s

If M

Tm and x is a variable then

x M

V U

1 %t

V U

If M N

Tm V then also MN

Tm V . Tm V .

t s

g g D

V U

V U

(3.92)

x y x2

2xy

y x x2

2xy

Y0 ) (

(3.91)

f:

xy x2

2xy

y0 ) (

D g

0 ) (

Basics of Calculus and Combinatory Logic

209

If the signature is empty or clear from the context we shall simply speak of terms. Since in we do not write an operator symbol, Polish Notation is now ambiguous. Therefore we follow standard usage and use the brackets and . We agree now that x, y and z and so on are metavariables for variables (that is, for elements of V ). Furthermore, upper case Roman letters like M, N are metavariables for terms. One usually takes F to be , to concentrate on the essentials of functional abstraction. If F , we speak of pure terms. It is customary to omit the brackets if the term is bracketed to the left. Hence MNOP is short for MN O P and x MN short for x MN (and distinct from x M N ). However, this abbreviation has to be used with is not a care since the brackets are symbols of the language. Hence string of the language but only a shorthand for , a difference that we shall ignore after a while. Likewise, outer brackets are often omitted and brackets are not stacked when several prexes appear. Notice that is a term. It denotes the application of to itself. We have dened occurrences of a string x in a string y as contexts u v where u x v y. terms are thought to be written down in Polish Notation. Denition 3.37 Let x be a variable. We dene the set of occurrences of x in a term inductively as follows. If M is an term then the set of occurrences of x in the term M is the set of occurrences of the variable x in the term M. The set of occurrences of x in MN is the union of the set of pairs u vN , where u v is an occurrence of x in M and the set of pairs M u v , where u v is an occurrence of x in N.

So notice that technically speaking the occurrence of the string x in the prex of x M is not an occurrence of the variable x. Hence does not occur in x as a term although it does occur in it as a string! Denition 3.38 Let M be a term, x a variable and C an occurrence of x in M. C is a free occurrence of x in M if C is not inside a term of the form x N for some N; if C is not free, it is called bound. A term is called closed if no variable occurs free in it. The set of all variables having a free occurrence in M is denoted by fr M .

0 i t ) i

s (

The set of occurrences of x in x M is the set of all where u v is an occurrence of x in M.

x uv ,

t @ s

t t

  

i i i

s s

 @  

0 !) 6( i i

s

t t U i

0 ) '( i i 0 ) 6( i i

t s s s s @s y t t
6 )y

0 ) '( i i
y

s s

0t !) i ( i s 0 i is t !) ( t

s

210

Categorial Grammar and Formal Semantics

A few examples shall illustrate this. In M the variable occurs only bound, since it only occurs inside a subterm of the form N (for example N : ). However, occurs free. A variable may occur free as well as bound in a term. An example is the variable in . Bound and free variable occurrences behave differently under replacement. If M is a term and x a variable then denote by N x M the result of replacing x by N. In this replacement we do not simply replace all occurrences of x by N; the denition of replacement requires some care.

(3.93c) (3.93d) (3.93e) (3.93f)

N x MM

N xM

N xM

N x

x M : y M : y M :

x M

N x N x

y N xM

z N x z yM xy

In (3.93f) we have to choose z in such a way that it does not occur freely in N or M. In order for substitution to be uniquely dened we assume that z i , where i is the least number such that z satises the conditions. The precaution in (3.93f) of an additional substitution is necessary. For let y and M . Then without this substitution we would get

This is clearly incorrect. For is the function which for given a returns the value of . However, is the identity function and so it is different from that function. Now the substitution of a variable by another variable shall not change the course of values of a function. (3.95a) (3.95b) (3.95c) M M M M N N L
|

D DD

D DD

M L

s

t s

6 )y t

s 6 s

6 )y

DDD

y s

(3.94)

6 )y

if y

fr N and x

fr M

if y

x and: y

fr N or x

V U 1 ) D t ! s y V U 1 D t s y s t y tyV k aV as U U iaaa) aU )

6 y

t s y D y t s y D y tk s y V U i

(3.93b)

N x f s :

6 )y

(3.93a)

N xy:

N y

if x y, otherwise. N x s

f N x s0

fr M

t t I s s
6

t R s

s y

DD D DD D DD D

t t

6 )y

6 k

s s

s D
6

Basics of Calculus and Combinatory Logic

211

We shall present the theory of terms which we shall use in the sequel. It consists in a set of equations M N, where M and N are terms. These are subject to the laws above. The theory axiomatized by (3.95a) (3.95g) and (3.95i) is called , the theory axiomatized by (3.95a) (3.95i) . Notice that (3.95a) (3.95e) simply say that is a congruence. A different rule is the following socalled extensionality rule.
|

(3.96)

Mx

Nx

(ext)

It can be shown that (ext) . The model theory of calculus is somewhat tricky. Basically, all that is assumed is that we have a domain D together with a binary operation that interprets function application. Abstraction is dened implicitly. Call a function : V D a valuation. Now dene M inductively as follows.
i

(3.97b) (3.97c)

MN

xM

a:

(Here, a D.) (3.97c) does not x the interpretation of x M uniquely on the basis of the interpretation of M. If it does, however, the structure is called extensional. We shall return to that issue below. First we shall develop some more syntactic techniques for dealing with terms. Denition 3.39 Let M and N be terms. We say, N is obtained from M by replacement of bound variables or by conversion and write M N if there is a subterm y L of M and a variable z which does not occur in L such that N is the result of replacing an occurrence of y L by z z y L . The relation is the transitive closure of . N is congruent to M, in symbols M N, if both M N and N M.

s

s

D V aU D V U

s

3 t  s t s

(3.97a)

x: a

(3.95i)

xM

xN

DDD

g d

DD D

D s

(3.95h)

xM

fr M

s

(3.95g)

xM N

N xM

t s t s | D V U 1 t D t D DD  s t

s 1

(3.95f)

xM

y y xM

D DD

(3.95e)

LM

LN

t
y fr M

s
|

(3.95d)

ML

DDD t DDD

DD D DD D

NL

conversion) conversion conversion rule

s

212

Categorial Grammar and Formal Semantics

Similarly the denition of conversion. Denition 3.40 Let M be a term. We write M N and say that M contracts to N if N is the result of a single replacement of an occurrence of x L P in M by P x L . Further, we write M N if N results from M by a series of contractions and M N if M N and N M. x M N is called a redex and N x M its contracA term of the form tum. The step from the redex to the contractum represents the evaluation of a function to its argument. A term is evaluated or in normal form if it contains no redex. Similarly for the notation , and . Call M and N equivalent ( equivalent) if M N is contained in the least equivalence relation containing ( .
~ 

Proposition 3.41 M N iff M and N are equivalent. iff M and N are equivalent.
~

If M N and N is in normal form then N is called a normal form of M. Without proof we state the following theorem.

The proof can be found in all books on the calculus. This theorem also holds for .

The proof is simple. For by the previous theorem there exists a P such that N P and N P. But since N as well as N do not contain any redex and conversion does not introduce any redexes then P results from N and N by conversion. Hence P is congruent with N and N and hence N and N are congruent. Not every term has a normal form. For example

t@t s t s s t s t s 6 6 y 6 6 6 y 6 t t @t s t s s s t s

(3.98)

Corollary 3.43 Let N and N be normal forms of M. Then N

N.

Theorem 3.42 (Church, Rosser) Let L M N be terms such that L and L N. Then there exists a P such that M P and N P.

DD D

0 )

t s k

DD D

s @s

Rk

t t

s s

Basics of Calculus and Combinatory Logic

213

Or

The typed calculus differs from the calculus which has just been presented by an important restriction, namely that every term must have a type.

In other words: types are simply terms in the signature with 2 over a set of basic types. Each term is associated with a type and the structure of terms is restricted by the type assignment. Further, all terms are admitted. Their type is already xed. The following rules are valid. If MN is a term of type then there is a type such that M has the type and N the type . x M is of

Notice that for every type there are countably many variables of type . i :i More exactly, the set of variables of type is V : . We shall often use the metavariables x , y and so on. If then also x x (they represent different variables). With these conditions the formation of terms is severely restricted. For example is not a typed term no matter which type has. One can show that a typed term always has a normal form. This is in fact an easy matter. Notice by the way that if the term has type and and also have the type , the function has the type . The type of an term is the type of its value, in this case . The types are nothing but a special version of sorts. Simply take Typ B to be the set of sorts. However, while application (written ) is a single symbol in the typed calculus, we must now assume in place of it a family of symbols of signature

0 ) )

s

If M has the type and x is a variable of type then type .


6

V U

t t s s

T cIS

SD

V U

t@ttI v s s s 6 6 y t v6 R I s y 6

If

M. M and M then

M.

V U

Denition 3.44 Let B be a set. The set of types over B, Typ est set M for which the following holds.

t t t t t t R R @I @ @s @R  @s @s s s t t t s s s s


B , is the small-

t ttt t s Ft t s @s s s t s s s 6 6 6 6 y 6 6 6 6 y 6 t t @t t @s @t t s s s s t s s


6 6 6 6 y 6 6 6 6 y 6

(3.99)

214

Categorial Grammar and Formal Semantics

for every type . Namely, M N is dened iff M has type and N type , and the result is of sort (= type) . While the notation within many sorted algebras can get clumsy, the techniques (ultimately derived from the theory of unsorted algebras) are very useful, so the connection is very important for us. Notice that algebraically speaking it is not but that is a member of the signature, and once again, in the many sorted framework, . That is to say, turns into a family of operations of sort is a function symbol that only forms a term with an argument of sort (= type) and yields a term of type . We shall now present a model of the calculus. We begin by studying the purely applicative structures and then turn to abstraction after the introduction of combinators. In the untyped case application is a function that is everywhere dened. The model structures are therefore socalled applicative structures. A where is a Denition 3.45 An applicative structure is a pair binary operation on A. If is only a partial operation, A is called a partial applicative structure. is called extensional if for all a b A:

Denition 3.46 A typed applicative structure over a given set of basic types B is a structure A : Typ B such that (a) A is a set for every , (b) A A if and (c) a b is dened iff there are types and such that a A and b A , and then a b A . A typed applicative structure denes a partial applicative structure. Namely, is nothing but a partial binary operation on A. The put A : A ; then typing is then left implicit. (Recovering the types of elements is not a trivial affair, see the exercises.) Not every partial applicative structure can be typed, though. One important type of models are those where A consists of sets and is the usual functional application as dened in sets. More precisely, we want that A is a set of sets for every . So if the type is associated with the set S then a variable may assume as value any member of S. So, it follows that if is associated with the set T and M has the type then the interpretation of to be the M is a function from S to T . We set the realization of set of all functions from S to T . This is an arbitrary choice, a different choice (for example a suitable subset) would do as well.

3 0 3) T !bV U

(3.100)

b iff for all c

A:a c

b c

1 )

03!) ( 0 3 !) (

) (

1 D

S !( D

s

Basics of Calculus and Combinatory Logic

215

Let M and N be sets. Then a function from M to N is a subset F of the cartesian product M N which satises certain conditions (see Section 1.1). Namely, for every x M there must be a y N such that x y F and if xy F and x y F then y y . (For partial functions the rst condition is omitted. Everything else remains. For simplicity we shall deal only with totally dened functions.) Normally one thinks of a function as something that gives values for certain arguments. This is not so in this case. F is not a function in this sense, it is just the graph of a function. In set theory one does not distinguish between a function and its graph. We shall return to this later. How do we have to picture F as a set? Recall that we have dened

is a bijection. Its inverse is the mapping

Finally we put

Elsewhere we have used the notation N M for that set. Now functions are also sets and their arguments are sets, too. Hence we need a map which applies a function to an argument. Since it must be dened for all cases of functions and arguments, it must by necessity be a partial function. If x is a function and y an arbitrary object, we dene x y as follows. (3.105) xy :

is a partial function. Its graph in the universe of sets is a proper class, however. It is the class of pairs F x y , where F is a function and x y F. Note that if F M N O then

U e

} V

U e

(3.106)

1 0 ) (

1 Q0 ) (

V 0 0 ) a( ) (

1 d0 ) (

if y z x, if no z exists such that y z

V ) @@8q U p p

V ) @@q U p p

(3.104)

N:

N : F a function

x.

U e

e "V

0 a0 ) ) 0 0 ) a( ( ( ) (

(3.103)

xy z

x yz : M

e V

U FV

U e

0 0 ) aa0 ) ) ( ) ( ( 0 (

(3.102)

: x yz

xy z :M

e FV

U e

This is a set. Notice that M

U yV D 1 )

0 ) @S (

(3.101)

x y :x

M y

O. However, the mapping M N O

1 0 ) (

1 0 k ) ( D

1 Q0 ) (

p p 8@q

216

Categorial Grammar and Formal Semantics

Then F M N O, and one calculates that F M N O. In this way a unary function with values in N O becomes a unary function from M N to O (or a binary function from M, N to O). Conversely, one can see that if F M N O then F M N O.

In place of V one can take any V where is an ordinal. However, only if is a limit ordinal (that is, an ordinal without predecessor), the structure will be combinatorially complete. A more general result is described in the following theorem for the typed calculus. Its proof is straightforward. Theorem 3.48 Let B be the set of basic types and Mb , b B, arbitrary sets. Let M be inductively dened by M : M M . Then

is a typed applicative structure. Moreover, it is extensional. For a proof of this theorem one simply has to check the conditions. In categorial grammar, with which we shall deal in this chapter, we shall use terms to name meanings for symbols and strings. It is important however that the term is only a formal entity (namely a certain string), and it is not the meaning in the proper sense of the word. To give an example, is a string which names a function. In the set universe, this function is a subset of . For this reason one has to distinguish between equality and the symbol(s) / . M N means that we are dealing with the same strings (hence literally the same terms) while M N means that M and N name the same function. In this sense , but they also denote the same value. Nevertheless, in what is to follow we shall not always distinguish between a term and its interpretation, in order not to make the notation too opaque. The calculus has a very big disadvantage, namely that it requires some caution in dealing with variables. However, there is a way to avoid having to use variables. This is achieved through the use of combinators. Given a set V of variables and the zeroary constants , , , combinators are terms over the signature that has only one more binary symbol, . This symbol is generally omitted, and terms are formed using inx notation with brackets. Call this signature .

V w w PU

0 p p q) T @8bV U

w D

t s 8 Ft @t v s t D 6 6 6

t t v @R R

6 )y

6 )y

S !(

(3.107)

M :

Typ

0 p p q R@@)

Theorem 3.47 Let V be the set of nite sets. Then V applicative structure.

is a partial

FV

U } 1 V (

U V

e V

U f1

U }
6 6

s s

s s

y y

6 6

Basics of Calculus and Combinatory Logic

217

Denition 3.49 An element of Tm V is called a combinatorial term. A combinator is an element of Tm . (3.108a) (3.108c) (3.108d) (3.108e) (3.108f) (3.108g)
}

X X XY X

X X

if X Y and Y Z then X Z

Combinatory logic ( ) is (3.108a) (3.108e). It is an equational theory if we read simply as equality. (The only difference is that is not symmetric. So, to be exact, the rule if X Y then Y X needs to be added.) We note that there is a combinator containing only and such that (see Exercise 104). This explains why is sometimes omitted. We shall now show that combinators can be dened by terms and vice versa. First, dene (3.109b) (3.109c) : :
6 )y

(3.110d)

x MN :

)bt  s s t V U 1 )

(3.110c)

x Mx :

M if x

var M . xN otherwise.

xM

(3.110b)

xM:

i

(3.110a)

xx:

M if x

var M .

The converse translation is more difcult. We shall dene rst a function x on combinatory terms. (Notice that there are no bound variables, so var M fr M for any combinatorial term M.)

D then

D .

Theorem 3.50 Let C and D be combinators. If C D then C

D . Also, if

Dene a translation by X : X for X V , : , : , the following is proved by induction on the length of the proof.

D t t t @t

6 y

t @t

D s s s 6 y 6 D s s

DDD

s

(3.109a)

i f

DDD

if X Y then ZX

ZY

m m

if X Y then XZ

YZ

DD D

XY Z XZ Y Z

6 )y

(3.108b)

Further, the redex relation

is dened as follows.

V U

V w U

. Then

DD D

218

Categorial Grammar and Formal Semantics

(So, (3.110d) is applied only if (3.110b) and (3.110c) cannot be applied.) For example . Indeed, if one applies this to , then one gets

Further, one has


H

Now we have dened translations from terms to combinators and back. It can be shown, however, that the theory is stronger than under translation. Curry found a list A of ve equations such that is as strong as A in the sense of Theorem 3.52 below. Also, he gave a list A such that A is equivalent to (ext). A also is equivalent to the rstorder z x z y z x y. postulate (ext): xy Theorem 3.52 (Curry) Let M and N be terms.
~

There is also a typed version of combinatorial logic. There are two basic approaches. The rst is to dene typed combinators. The basic combinators now split into innitely many typed versions as follows. Combinator Type

Together with they form the typed signature . For each type there are countably innitely many variables of that type in V . Typed combinatorial terms are elements of Tm V , and typed combinators are elements of Tm . Further, if M is a combinator of type and N a combinator

V aV

U V

UU V a"aV

U U U

V U

(3.113)

g |

~ 

If

N then

D DD

g |

DD D

If

N then

N . N .

g d|

g y|

Theorem 3.51 Let C be a closed term. Then

C .

DD D

D DD

t s D D

The reader may verify that MN : M N and x N

. Now dene x N .

t t s i ws

D DD

V 3

DD D

t y s

3 V aV U ~U U

g y

~ U

DD D

(3.112)

t I

s 

t eR

s 

i 0

@ t

(3.111)

i 6

t Ft s s
6 6 6

} 6


6 }

by x :

x, x

V,

Basics of Calculus and Combinatory Logic

219

of type then MN is a combinator of type . In this way, every typed combinatorial term has a unique type. The second approach is to keep the symbols , and and to let them stand for any of the above typed combinators. In terms of functions, takes an argument N of any type and returns N (of type ). Likewise, is dened on any M, N of type and , respectively, and MN M of type . Also, M is dened and of type . Basically, the language is the same as in the untyped case. A combinatorial term is stratied if for each variable and each occurrence of , , there exists a type such that if that (occurrence of the) symbol is assigned that type, the resulting string is a typed combinatorial term. (So, while each occurrence of , and , respectively, may be given a different type, each occurrence of the same variable must have the same is stratied, while is not. type.) For example, := We show the second claim rst. Suppose that there are types , , , , such that is a typed combinator. (3.114)
}

This combinator is applied to , and so we have , whence , which is impossible. So, is not stratied. On the other hand, is stratied. Assume types such that is a typed combinator. First, is applied to . This means that

The result has type (3.117)

VV aaV

V U

aaV UU V

U a UU

(3.118)

This is the argument of

Hence we must have

VV aaV

V U

aaV UU V

U a UU

V aV

V U

aaV UU V

U U

(3.116)

yV

i 7i

UU aV

U aU

(3.115)

Then, since is applied to we must have whence . So, has the type

V aV

i i

UV V aV U s s

'

UU a U

U D

220

Categorial Grammar and Formal Semantics

So, , , . The resulting type is . This is applied to of type . For this to be welldened we must have , or and . Finally, this results in , . So, , , and may be freely chosen, and the other types are immediately dened. It is the second approach that will be the most useful for us later on. We call combinators implicitly typed if they are thought of as typed in this way. (In fact, they simply are untyped terms.) The same can be done with terms, giving rise to the notion of a stratied term. In the sequel we shall not distinguish between combinators and their representing terms. Finally, let us return to the models of the calculus. Recall that we have dened abstraction only implicitly, using Denition (3.97c) repeated below:

In general, this object need not exist, in which case we do not have a model for the calculus. Denition 3.53 An applicative structure is called combinatorially complete if for every term t in the language with free variables from i : i n there exists a y such that for all bi A, i n: This means that for every term t there exists an element which represents this term:

Thus, this denes the notion of an applicative structure in which every element can be abstracted. It is these structures that can serve as models of the calculus. Still, no explicit way of generating the functions is provided. One way is to use countably many abstraction operations, one for every number i (see Section 4.5). Another way is to translate terms into combinatory logic using for abstraction. In view of the results obtained above we get the following result.

V 3 Y@V 3 U U 3

V @V @V 3 aaU 3 3 UU

V @V 3 aU 3 U

(3.122)

k a

s a

Theorem 3.54 (Schonnkel) ements k and s such that

is combinatorially complete iff there are ela c b c

n 1

n 1

t t t aaCyV

) iaaa) U

s aa

s s

(3.121)

) iaaa)

3 3 iaaQV

3 V

3 ayaa6U UU

(3.120)

y b0

b1

bn

t b0

bn

3 t

 s

(3.119)

xM

a:

x: a

D V

D D V U D D U CV U U

U V v
6

Basics of Calculus and Combinatory Logic

221

Denition 3.55 A structure A is called a combinatory algebra if x y x x y z x z y z . It is a algebra (or extensional) if it satises A (A ) in addition. So, the class of combinatory algebras is an equationally denable class. (This is why we have not required A 1, as is often done.) Again, the partial case is interesting. Hence, we can use the theorems of Section 1.1 to create structures. Two models are of particular signicance. One is based on the algebra of combinatorial terms over V modulo derivable identity, the other is the algebra of combinators modulo derivable identity. Indirectly, this also shows how to create models for the calculus. We shall explain a different method below in Section 4.5. Call a structure A a partial combinatory algebra if (i) x y is always dened and (ii) the dening equations hold in the intermediate sense, that is, if one side is dened so is the other and they are equal (cf. Section 1.1). Consider once again the universe V . Dene (3.124) : x y z xz

V is not a partial combinatory algebra because x y is not always dened. So, the equation k x y x does not hold in the intermediate sense (since the right hand is obviously always dened). The dening equations hold only in the weak sense: if both sides are dened, then they are equal. Thus, V is a useful model only in the typed case. In the typed case we need a variety of combinators. More exactly: for all types , and we need elements A , and A , such that for all a A and b A we have

We now turn to an interesting connection between intuitionistic logic and type theory, known as the CurryHowardIsomorphism. Write M : if M is a term of type . Notice that while each term has exactly one type, there

V 3 V 3 U U 3

3 3 V V 3 laU hU

(3.126)

a c

b c

and for every a

3 V 3 0U

(3.125)

a ,b A and c A we have

V V ) @@8@q ) U p p qU p p

1 ) )

DD D

3 V 3 U

V aV

U V

(3.123)

x yx

:x y

yz

:x y z

3 h

0a0aaV ) Up8p@q)V ) Up@p@8@) ) ) @S V qU p p q ( ( ( D 1 ) a0 ) ) @S 0 ( ( D

V 3 C3 3 U 0h) ) 3 u@!) (

DDD

0h) ) 3 Ru@!) (

UU V aaV

3 3 @) 3 h D

DD D

0 ) p p q 8) 8@8)

3 p 3

222

Categorial Grammar and Formal Semantics

Table 6. Rules of the Labelled Calculus


~ ~~

(E ) (I )
( (

are innitely many terms having the same type. The following is a Gentzen calculus for statements of the form M : . Here, , , denote arbitrary sets of such statements, x, y individual variables (of appropriate type), and M, N terms. The rules are shown in Table 6. First of all notice that if we strip off the labelling by terms we get a natural deduction calculus for intuitionistic N: logic (in the only connective ). Hence if a sequent Mi : i : i n is derivable then i : i n , whence i : i n . Conversely, given a natural deduction proof of i : i n , we can decorate the proof with terms by assigning the variables at the leaves of the tree for the axioms and then descending it until we hit the root. Then we get a proof of the sequent Mi : i : i n N : in the above calculus. Now we interpret the intuitionistic formulae in this proof calculus as types. For a set of terms over the set B of basic types we put
~

Denition 3.56 For a set of types and a single type over a set B of basic types we put if there is a term M of type such that every type of a variable occurring free in M is in . Returning to our calculus above we notice that if
~ ~

is derivable, we also have i : i n . This is established by induction on the proof. Moreover, the converse also holds (by induction on the derivation). Hence we have the following result.
~

(3.128)

Mi : i : i

N:

V U

(3.127)

Typ

B : there is M

of type

~~ 0 ~

~~ ~

s

~~ ~

(cut)

t s ~ ) ~~~ ( ~ ) ) )

~~~

(axiom)

M: x: M: M: x: N: M xN :B N: M: MN : x: M: x M : x: x: (M)
~~ ~ ~

Basics of Calculus and Combinatory Logic

223

The correspondence between intuitionistic formulae and types has also been used to obtain a rather nice characterization of shortest proofs. Basically, it turns out that a proof of N : can be shortened if N contains a redex. Suppose, namely, that N contains the redex x M U . Then, as is easily seen, the proof contains a proof of x M U : . This proof part can be shortened. To simplify the argument here we assume that no use of (cut) and (M) has been made. Observe that we can assume that this very sequent has been introduced by the rule (I ) and its left premiss by the rule (E ) and .
~

Then a single application of (cut) gives this: (3.130)


~

While the types and the antecedent have remained constant, the conclusion now has a term associated to it that is derived from contracting the redex. The same can be shown if we take intervening applications of (cut) and (M), but the proof is more involved. Essentially, we need to perform more complex proof transformations. There is another simplication that can be made, namely when the derived term is explicitly converted. Then we have a sequent of the form x Mx : . Then, again putting aside intervening occurrences of (cut) and (M), the proof is as follows. (3.131)
~

This proof part can be eliminated completely, leaving only the proof of the left hand premiss. An immediate corollary of this fact is that if the sequent xi : i : i n N : is provable for some N, then there is an N obtained from N by a series of / and normalization steps such that the sequent xi : i : i n N : is also derivable. The proof of the latter formula is shorter than the rst on condition that N contains a subterm that can be or reduced.

s ~~~

s

s ~

x Mx : y: y: My : y My :

~~~ kk ) k ) k

s ~~ ~

~9

kk

U :

x: M: M xU :

kk

s s

~~ ~

kk ) k

s ~ k

(3.129)

x: M: x M : x M U :
~~~

U :

y:

s @s

s

~
~~

~~ ~

Theorem 3.57 (Curry)

iff

) k

kk

sk

S S

224

Categorial Grammar and Formal Semantics

Notes on this section. abstraction already appeared in (Frege, 1962) (written in 1891). Frege wrote . f ( ). The rst to study abstraction systematically was Alonzo Church (see (Church, 1933)). Combinatory logic on the other hand has appeared rst in the work of Moses Sch nnkel (1924) and o Haskell Curry (1930). The typing is reminiscent of Husserls semantic categories. More on that in Chapter 4. Sufce it to say that two elements are of the same semantic category iff they can meaningfully occur in the same terms. There are exercises below on applicative structures that demonstrate that Husserls conception characterizes exactly the types up to renaming of the basic types. Exercise 102. Find combinators XY Z XZY. and such that XY Z X ZY Z and

Exercise 104. We have seen in Section 3.2 that can be derived from (a0) and (a1). Use this proof to give a denition of in terms of and . Exercise 105. Show that any combinatorially complete applicative structure with more than one element is innite. Exercise 106. Show that , and dened on V are proper classes in V . Hint. It sufces to show that they are innite. However, there is a proof that works for any universe V , so here is a more general method. Say that C V is rich if for every x V , x C. Show that no set is rich. Next show that , and are rich. Exercise 107. Let A : Typ B be a typed applicative structure. Now dene the partial algebra A where A : A . Show that if the applicative structure is combinatorially complete, the type assignment is unique up to permutation of the elements of B. Show also that if the applicative structure is not combinatorially complete, uniqueness fails. Hint. First, establish the elements of basic type, and then the elements of type b c, where b c C are basic. Now, an element of type b c can be applied to all and only the elements of type c. This allows to dene which elements have the same basic type.
}

: . Denote the set of all types of comExercise 108. Let V : binators that can be formed over the set V by C. Show that C is exactly the

1 )

0 3) T !bV U

T b1 i T r) q S

Exercise 103. Determine all types of

and

of the previous exercise.

e V

U e

0 3 !) (

| { @dz

Exercise 101. Show that in

,M

U YV D

O.

i S

S !(

The Syntactic Calculus of Categories

225

set of intuitionistically valid formulae, that is, the set of formulae derivable in .

4.

Categorial grammars in contrast to phrase structure grammars specify no special set of rules, but instead associate with each lexical element a nite set of context schemata. These context schemata can either be dened over strings or over structure trees. The second approach is older and leads to the so called AjdukiewiczBar HillelCalculus ( ), the rst to the Lambek Calculus ( ). We present rst the calculus . We assume that all trees are strictly binary branching with exception of the preterminal nodes. Hence, every node whose daughter is not a leaf has exactly Y Z licenses the expansion of two daughters. The phrase structure rule X the symbol X to the sequence Y Z. In categorial grammar, the category Y represents the set of trees whose root has label Y , and the rule says that trees with root label Y and Z, respectively, may be composed to a tree with root X. The approach is therefore from bottom to top rather than top to bottom. The fact that a tree of the named kind may be composed is coded by the so called category assignment. To this end we rst have to dene categories. Categories are simply terms over a signature. If the set of proper function symbols is M and the set of 0ary function symbols is C we write Cat M C rather than TmM C for the set of terms over this signature. The members are called categories while members of C are called basic categories. In the AB Calculus we have M . ( also has .) Categories are written in inx notation. So, we write in place of . Categories will be denoted by lower case Greek letters, basic categories by lower case Latin letters. If C then , are categories. Notice that we take the actual strings to be the categories. This convention will soon be relaxed. Then we also use left associative bracketing as with terms. So, will be short for . (Notice the change in font signals that the way the functor is written down has been changed.) The interpretation of categories in terms of trees is as follows. A tree is understood to be an C , exhaustively ordered strictly binary branching tree with labels in Cat which results from a constituent analysis. This means that nonterminal nodes branch exactly when they are not preterminal. Otherwise they have a single daughter, whose label is an element of the alphabet. The labelling function

V U

V @ U

x 7

v 1 T 3S )

3v

v

v

'

v

V U

'@

T ) @) 3S

The Syntactic Calculus of Categories

226

Categorial Grammar and Formal Semantics

must be correct in the sense of the following denition.

Call a tree 2standard if a node is at most binary branching, and if it is nonbranching iff it is preterminal. Denition 3.58 Let A be an alphabet and : A Cat C be a function for which a is always nite. Then is called a category assignT t be a 2standard tree with labels in Cat C . ment. Let is correctly labelled if (1) for every nonbranching x with daughter y x y , and (2) for every branching x which immediately dominates x y1 or y1 y0 x . y0 , y1 and y0 y1 we have: y0 Denition 3.59 The quadruple K S C A is an ABgrammar if A and C are nite sets, the alphabet and the set of basic categories, respectively, S C, and : A Cat C a category assignment. The set of labelled trees that is accepted by K is denoted by L B K . It is the set of 2standard correctly labelled trees with labelling : T Cat C such that the root carries the label S. We emphasize that for technical reasons also the empty string must be assigned a category. Otherwise no language which contains the empty string is a language accepted by a categorial grammar. We shall ignore this case in the sequel, but in the exercises will shed more light on it. ABgrammars only allow to dene the mapping . For given , the set of trees that are correctly labelled are then determined and can be enumerated. To this end we need to simply enumerate all possible constituents. Then for each preterminal x we choose an appropriate label y , where y x. The labelling function therefore is xed on all other nodes. In other words, the ABgrammars (which will turn out to be variants of CFGs) are invertible. The algorithm for nding analysis trees is not very effective. However, despite this we can show that already a CFG generates all trees, which allows us to import the results on CFGs. Theorem 3.60 Let K CFG G such that LB K C A be an ABgrammar. Then there exists a LB G .

V @ U

U 3 yV bFuV

V U 3 aV xU

V U aV 

U 3 w

V  U

U b3

yV

V U

0 ) ) ) (

U b3 V b U 3

D D V

V U V U 0 ) ) ) (D } D

V U aV $

U 3

0 )j)i) ( V U D

(3.132)

V U 3 aV bxU
j

1V Ub3 1

The Syntactic Calculus of Categories

227

Proof. Let N be the set of all subterms of terms in a , a A. N is clearly nite. It can be seen without problem that every correctly labelled tree only carries labels from N. The start symbol is that of K. The rules have the form

where , run through all symbols of N and a through all symbols from A. This denes G : N A R . If LB G then the labelling is correct, as is easily seen. Conversely, if LB K then every local tree is an instance of a rule from G, the root carries the symbol , and all leaves carry a terminal symbol. Hence LB G . Conversely every CFG can be converted into an ABgrammar; however, these two grammars need not be strongly equivalent. Given L, there exists a L. We distinguish grammar G in Greibach Normal Form such that L G two cases. Case 1. L. We assume that is never on the right hand side of a production. (This can be installed keeping to Greibach Normal Form; see the exercises.) Then we choose a category assignment as in Case 2 and add : . Case 2. L. Now dene

Put K : NG A G . We claim that L K L G . To this end we shall transform G by replacing the rules X a i n Yi by the rules

This denes the grammar H. We have L H L G . Hence it sufces to show that L K L H . In place of K we can also take a CFG F; the nonterminals are NF . We show now that that F and H generate the same trees modulo the NH NF , which is dened as follows. (a) For X NG we Rsimulation have X Y iff X Y . (b) Zi W iff W X Yn 1 Yi 1 and X Y0 Y1 Yn 1 for certain Y j , i j n. To this end it sufces to show that the rules of F correspond via to the rules of H. This is directly calculated. Theorem 3.61 (BarHillel & Gaifman & Shamir) Let L be a language. L is context free iff L LB K for some ABgrammar.

D 1

aa

V U

) iaa

V U

V U

D }

V U

(3.137)

Z0

aY0

Z1

Z0Y1

Zn

V U

V U

i n

Yn 2Yn

aa

) )

V U

(3.136)
}

G a :

X Yn

Y1 Y0 : X

Yi

V aV U

V U

1 U

V U

V U 1

1 0 ) ) ) (

V U

(3.135)

(3.134)

(3.133)

V U

) (

T S D

W aa W

V U

V U W

228

Categorial Grammar and Formal Semantics

Notice that we have used only . It is easy to see that alone would also have sufced. Now we look at Categorial Grammar from the standpoint of the sign grammars. We introduce a binary operation on the set of categories which satises the following equations.

Hence is dened only when or for some . Now let us look at the construction of a sign algebra for CFGs of Section 3.1. Because of the results of this section we can assume that the set T is a subset of Cat C which is closed under . Then for our proper modes we may proceed as follows. If a is of category then there exists a context free rule a and we introduce a 0ary mode : a a . The other rules can be condensed into a single mode

(Notice that is actually a structure term, so should actually write is place of it. We will not do so, however, to avoid clumsy notation.) However, this still does not generate the intended meanings. We still have as in Section 3.1. We do not want to do this, however. Instead to introduce we shall deal with the question whether one can generate the meanings in a more systematic fashion. In general this is not possible, for we have only assumed that f is computable. However, in practice it appears that the syntactic categories are in close connection to the meanings. This is the philosophy behind Montague Semantics. Let an arbitrary set C of basic categories be given. Further, let a set B of basic types be given. From B we can form types in the sense of the typed calculus and from C categories in the sense of categorial grammar. We shall require that these two are connected by a homomorphism from the algebra of categories to the algebra of types. Both are realized over strings. So, for each basic category c C we choose a type c . Then we put (3.140)

Let now A : Typ B be a typed applicative structure. denes a realization of B in by assigning to each category the set A ,

yV U yV U

0 3) T !bV U

( )

( )

V U

V U

c : c : :

V U

0 i 6) ) i ( i i

V i a0 !) ) 0 6) ) aU i() i i(

VR v U D VR U V U

S !(

(3.139)

x x

y y

xy xy

0 ) ) (

D b

V $ U

(3.138)

The Syntactic Calculus of Categories

229

which we also denote by . We demonstrate this with our arithmetical terms. The applicative structure shall be based on sets, using as the interpretation of function application. This means that A A A . Consequently, . There is the basic category , and it is realized by the set of numbers from 0 to 9. Further, there is the category which gets realized by the rational numbers for example.

: is a binary function. We can redene it as shown in Section 3.3 to an element of , which we also denote by . The syntactic category which we assign to has to match this. We choose . Now we have

as desired. Now we have to see to it that the meaning of the string is indeed 12. To this end we require that if is combined with to the constituent the meaning of (which is a function) is applied to the number 7. So, the meaning of is the function x x 7 on . If we nally group and together to a constituent then we get a constituent of category whose meaning is 12. If things are arranged in this way we can uniformly dene two modes for , and .

We further assume that if a A has category then there are only nitely many M which are meanings of a of category . For each such meaning M we assume a 0ary mode a M . Therewith is completely standardized. In the respective algebras , and there is only one binary operation. In it is the concatenation of two strings, in it is cancellation, and in function application. The variability is not to be found in the proper modes, only in the 0ary modes, that is, the lexicon. Therefore one speaks of Categorial Grammar as a lexical theory; all information about the language is in the lexicon.

(3.143b)

x M

y N

x y NM

) ) (

) ) aU i( S ) ) aU i(

(3.143a)

x M

y N

x y MN

w $v

w v $bu

V cc U

) i ( i V a0 ) ) 0 i() D ) i ( i V a0 ) ) 0 i() D

t 2v

w $v

s

(3.142)

t 2v

g V U

T aaa) ) S )

t gFs

(3.141)

01

p p 8@q D

t t s

t 7

t s

s

t 7

t s

5S

s

s 1

w $v

230

Categorial Grammar and Formal Semantics

Denition 3.62 A sign grammar is called an ABsign grammar if the signature consists of the two modes and and nitely many 0ary modes i , i n such that

Notice that the algebra of meanings is partial and has as its unique operation function application. (This is not dened if the categories do not match.) As we shall see, the concept of a categorial grammar is somewhat restrictive with respect to the language generated (it has to be context free) and with respect to the categorial symbols, but it is not restrictive with respect to meanings. We shall give an example. We look at our alphabet of ten digits. Every nonempty string over this alphabet denotes a unique number, which we name denotes the number by this very sequence. For example, the sequence or . We want to write an AB 721, which in binary is grammar which couples a string of digits with its number. This is not as easy as it appears at rst sight. In order not to let the example appear trivial we shall write a grammar for binary numbers, with in place of 1 and in place of 0. To start, we need a category as in the example above. This category is realized by the set of natural numbers. Every digit has the category . So, we have the following 0ary modes.

(3.145)

Now we additionally agree that digits have the category number is analyzed in this way.
I

0 ) ( ) I

5'

0 )  ( )

(3.144)

Z 0

Z 1

. With this the

r w

I I I I #5g#g#gI

r q q r q r r q II@Ir

and Ni

,i

n.

0 a0

() 3) T @!bV U

S !(

I #gI

M : Typ B Ni : i applicative structure by constants,

0 a0

( ) U ) aV $

Cat

i : i

0 a0

i( ) W ) (

xi : i

i ) ) (

xi i Ni , i

n, n , n for some set C, n is an expansion of a typed

0 ) ) ( )

The Syntactic Calculus of Categories

231

This means that digits are interpreted as functions from to . As one easily nds out these are the functions x0 2x0 k, k 0 1 . Here k must be the value of the digit. So, we additionally need the following zeroary modes. (3.147) :

(Notice that we write x0 2x0 and not , since the latter is a string, while the former is actually a function in a particular algebra.) However, the grammar does not have the ideal form. For every digit has two different meanings which do not need to have anything to do with each other. For example, we could have introduced the following mode in place of or even in addition to .

We can avoid this by introducing a second category symbol, , which stands for a sequence of digits, while only stands for digits. In place of we now dene the empty modes , and : (3.149) (3.150) : :

The meaning of this term is calculated as follows.



P

x1 x0 2x1 5

U U aU

P P

(3.152)

x1 x0 2x1

2 1

V V aV U UV g U  V V U V UaV V aV V U UV g U  V aV V U UV U V aV aV V UV UV U U aU V aV aV U aU UV UV P U V aV aV U U U V U V P V

(3.151)

x 0 x0 1
1

1 0

x0

I 9I

For example, we get

as the exponent of the term

x0

1 0

2 1

x 0 x0 x1 x0 2x1

x0

) R g) ( P v t D 0 ) ) ( P D

) ) q b5'(

t v 2t

P

U

(3.148)

x0 2x0

t t

$

s s

(3.146)

x0 2x0 x0 2x0

T ) x1 S

e )b5'ba( ) r D ) ) q b5'( e D e D

232

Categorial Grammar and Formal Semantics

This solution is far more elegant than the rst. Despite of this, it too is not satisfactory. We had to postulate additional modes which one cannot see on the string. Also, we needed to distinguish strings from digits. For comparison we show a solution that involves restricting the concatenation function. Put

Now take a binary symbol (3.154)

One could also dene two unary modes for appending a digit. But this would mean making the empty string an exponent for 0, or else it requires another set of two digits to get started. A further problem is the restricted functionality in the realm of strings. With the example of the grammar T of the previous section we shall exemplify this. We have agreed that every term is enclosed by brackets, which merely are devices to help the eye. These brackets are now symbols of the alphabet, but void of real meaning. To place the brackets correctly, some effort must be made. We propose the following grammar. : : : : : : : : x0 x1

0 1

The conception is that an operation symbol generates an unbracketed term which needs a left and a right bracket to become a real term. A semantics that ts with this analysis will assign the identity to all these. We simply take for all basic categories. The brackets are interpreted by the identity function. If we add a bracket, nothing happens to the value of the term. This is a viable solution. However, it amplies the set of basic categories without any increase in semantic types as well.

(3.155)

x0 x1

x0 x1

0 0 v ) ) ) )

t7v t7v t7v t 7v

0 ) ) t  ( ) t 0 ) P( ) ) t b vg8'( ) t ) su b 7v j( ) b t ) ( U  t ( ) v  t ) u (  t ) ( )   (

x1 x1 x1 x1 x 0 x0 x 0 x0 x 0 x0

x0 x0 x0 x0

x0

x1

) ) i

i (

V ) i() ) i( a0 ) 0 ) aU

x y

( D ( D D 2

(3.153)

x y:

and set 2m n

1 i

x y undened

if y A, otherwise.

i QW i D

The Syntactic Calculus of Categories

233

The application of a function to an argument is by far not the only possible rule of composition. In particular Peter Geach has proposed in (Geach, 1972) to admit further rules of combination. This idea has been realized on the one hand in the LambekCalculus, which we will study later, and also in combinatory categorial grammars. The idea to the latter is as follows. Each mode in Categorial Grammar is interpreted by a semantical typed combinator. For example, acts on the semantics like the combinator (dened in Section 3.3) and is interpreted by the combinator . This choice of combinators is seen from the standpoint of combinatory logic only one of many possible choices. Let us look at other possibilities. We could add to the ones we have also the functions corresponding to the following closed term.

MN is nothing but function composition of the functions M and N. For evidently, if has type then must have the type for some and the type for some . Then is of type 1 . Notice that for each , and we have a typed term .

However, as we have explained earlier, we shall not use the explicitly typed terms, but rather resort to the implicitly typed terms (or combinators). We dene two new category products and by (3.158b) : : :

Here, it is not required that the type of M matches in any way, or the type of N the category . In place of NM we could have used MN, where

t t t t t @@@ s s

s

s s

(3.161)

(3.160)

x M

y N

x y

NM

) )

) ) aU i( ) ) aU i(

(3.159)

x M

y N

x y

i ) i W ( V a0 ) ) 0 i() D i ) i W ( V a0 ) ) 0 i() D

Further, we dene two new modes,

and

(3.158d)

(3.158c)

D v v v D v D

(3.158a)

, as follows: MN

t t t t @@t

s

s

s

(3.157)

t t t @ s s

t t t t @@t

s 

y 6

 s s

s

s s

(3.156)

234

Categorial Grammar and Formal Semantics

We denote by CCG the extension of by the implicitly typed combinator . This grammar not only has the modes and but also the modes and . The resulting tree sets are however of a new kind. For now, if x is branching with daughters y0 and y1 , x can have the category if y0 has the category and y1 the category . In the denition of the products and there is a certain arbitrariness. What we must expect from the semantic typing regime is that the type of and equals if and for some , and . Everywhere else the syntactic product should be undened. However, in fact the syntactic product has been symmetried, and the directions specied. This goes as follows. By applying a rule a category (here ) is cancelled. In the category the directionality (here: right) is viewed as a property of the argument, hence of . If is not cancelled, we must nd being selected to the right again. If, however, it is cancelled from , then the latter must be to the left of its argument, which contains some occurrence of (as a result, not as an argument). This yields the rules as given. We leave it to the reader to show that the tree sets that can be generated from an initial category assignment are again all context free. Hence, not much seems to have been gained. We shall next study another extension, CCG . Here

In order for this to be properly typed we may freely choose the type of and , say and . Then is of type for some and of type for some . stands for an at least binary function, for a function that needs at least one argument. If the combinator is dened, the mode is xed if we additionally x the syntactic combinatorics. To this end we dene the products , as in Table 7. Now we dene the following new modes:

We shall study this type of grammar somewhat closer. We take the following modes.

(3.165)

v v 0 R) ) ) v v ) )   )

x 0 x 1 x0 x1 x 0 x 1 x0 x1

r 

(3.164)

x M

y N

x y

NM

r 

) ) aU i( G ) ) aU i(

(3.163)

x M

y N

x y

MN

t t t t t t @@$ @ s s s

) i W ( i V a0 ) ) 0 i() D ) i W ( i V a0 ) ) 0 i() D

s

s s

Fe e

(3.162)
6

V r FU

V U

V "uU
y

V U

The Syntactic Calculus of Categories

235

: :

Take the string . It has two analyses, shown in Figure 10. In both analyses the meaning is 5. In the rst analysis only the mode has been used. The second analysis uses the mode . Notice that in the course of the derivation the categories get larger and larger (and therefore also the types).

We shall show that the grammar just dened is of this kind. To this end we shall make a few more considerations.

In particular, is associative if dened (in contrast to ). Now, let us look at a string of the form x y, where x ,y and h x yT, where h : . An example is the string . Then with the exception of x all prexes are constituents. For prexes of x are constituents, as one can easily see. It follows easily that the tree sets are not context free. For if x y then x h y T is not derivable. However, x h x T is derivable. If the tree set was context free, there cannot be innitely many such x, a contradiction.

V U i

V U i

i S H@H s H U S T 1 i S V H V H

Proof. Proof by direct computation. For example,

s 1 i U

V U i

(3.166)

1 2 3 4 5 .

Lemma 3.64 Let

1 2 3 ,

3 4 5 and

V r U

Theorem 3.63 There exist CCG free tree sets.

grammars which generate non context

5 6 7 . Then

x 0 x0

@ @ @ V v v

C v C v C C  v  v  

D D D D D D D

 v  v @ v v

H H v H S T S )b x ) ( D e ) ge 0 ) F( D

: : : : : : : :

v @ v v

Table 7. The Products

and

v @

i i D
S

V U V U H D H s U D V S s V U T S H D
: h

3v

 v  v

 H

 x@
S

 x@
S

v

 x
x v v

  $
. . . . . .


. . .

x v

 
S

x@

 
S

236

Categorial Grammar and Formal Semantics

. . . . . . . . . . . .

v x v v  v v x@

v

. . .

. . . . . .

v  v x3v v @  @ v  v 3v v x@v v $ v v

Figure 10. Two Analyses of

So, we have already surpassed the border of context freeness. However, we can push this up still further. Let be the following grammar.

#HF9

x 0 x 1 x0 x1 x 0 x 1 x0 x1

0 ) $ ) ( ) 0 ) F( ) v 0 R) H ( )  v x2x) ( T )  x2x) (


x 0 x0

(3.167)

Proof. Let L be the language generated by . Put M : is context free, so is L M (by Theorem 2.14). Dene h by h

D D

Theorem 3.65 :

generates a non context free language. 2 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. If L : ,

P P
P

The Syntactic Calculus of Categories

237

Now let xy be such that x and y . It is not hard to see that then x is a constituent. (Basically, one can either multiply or apply. The complex categories cannot be applied to the right, they can only be applied to to the left, so this can happen only with . If one applies one gets , which cannot be multiplied by with any other constituent formed. It cannot be applied either (assuming that the string is not , in which case does become a constituent under this analysis), because nothing on the right of it has category . Now let x : x 0 x1 xn 1 . Further, let if xi and di : if xi , i n. Then the category of x equals di : with dn i 1 : i n . Hence x is a constituent of category . This means, however, that y0 has the category d0 (because d0 is the last in the list hence the rst to be discharged), y 1 the category d1 and so on. But if yi has the category di then h xi yi , as is easily checked. This yields that h x y. If on the other hand this is the case, the string is derivable. Hence we now have a grammar which generates a non context free language. CCGs are therefore stronger than ABgrammars. There is a still different way to introduce CCGs. There we do not enlarge the set of combinatorial rules but instead introduce empty modes. : : :

Here we do not have four but innitely many modes, one for each choice of , and . Only in this way it is possible to generate non context free languages. Lexical elements that have a parametric (= implicitly typed) set of categories (together with parametric meanings) are called polymorphic.

(3.170)

0 ) 2V u U V 0 ) 2V U 0"V u U V ) 0 ) "V U

u U V u ) U V u U ) U V u ) U U )

V U i

@ xC(

aa

s 1 i V U H

i 

s 1 i U

V U

R$8 U u

(3.169)

2 ; 1

RU$ub V R$V U u D aa

Hence h L M yy : y this follows by Theorem 1.67 that L context free either. Now for the proof denote the category 0 1

. The latter is not context free. From M is not context free, hence L is not of (3.168). If i : i n then let n 1 . Then we have:

ii

V U i

t T s 1 i V U H

1 i

i IS i

1 i
S

(3.168)

L M iff (a) x

L and (b) h x

yy for some y

s f1 i V U H

V $U

V U
T T

i  i  i

V R$8 U u

H D

V U

as well as h

. We show:

238

Categorial Grammar and Formal Semantics

Particularly interesting cases of polymorphic elements are the logical connectors, and . Syntactically, they have the category and , respectively, where can assume any (non parametric) category. This means that two constituents of identical category can be conjoined by to another constituent of the same category, and every constituent can be turned by to a constituent of identical category. Notes on this section. Although we have said that the meanings shall be functions in an applicative structure, we sometimes put strings in their place. These strings only denote these functions. This is not an entirely harmless afand the string denote fair. For example, the string the same function. In fact, for reduced terms terms uniqueness holds only up to renaming of bound variables. It is standard practice in calculus to consider terms up to renaming of bound variables (see (Pigozzi and Salibra, 1995) for a discussion). A possible remedy might be to use combinators. But here the same problem arises. Different strings may denote the same function. This is why normalisation becomes important. On the other hand, strings as meanings have the advantage to be nite, and thus may function as objects that can be stored (like codes of a Turing machine, see the discussion of Section 4.1). Exercise 109. Let : A Cat C be a category assignment. Show that the correctly labelled trees form a context free tree set. Exercise 110. Show that for every CFG there exists a weakly equivalent grammar in Greibach Normal Form, where the start symbol does not occur on the right hand side of a production.
} }

Exercise 112. Let L A be context free and f : A M a computable function. Write an ABsign grammar whose interpreted language is x f x : x L . Exercise 113. Let be an ABsign grammar. Show for all signs x M generated by that grammar: M has the type . Hint. Induction on the length of the structure term.

0 i aV U ) @S i(

V U

V U

k aa k V U aV $

Exercise 111. Let : A Cat C let be the distinguished category. is a contains an of the form 0 Show that for any there is a normal language.

be a category assignment. Further, called normal if and no n 1 with i for some i n. such that and have the same

9 RH

t r @v

u U

6 )y

s

V U aV $

t r @v

6 )y

s

0 ) ) ( )

g 9

9 RH

) ) ( i

g 9
}

V U

1 i

The ABCalculus

239

Exercise 114. Show that the CCG grammars only generate context free string languages, even context free tree sets. Hint. Show the following: if A is an arbitrary nite set of categories, then with one can generate at most A n many categories. Exercise 115. Suppose we dened a product on categories as follows. is dened whenever (a) is dened (and has the same value), or (b) is dened (and has the same value). Show that this does not allow to unambiguously dene the semantics. (Additional question: why does this problem not arise with ?) Hint. Take .

5.

The ABCalculus

We shall now present a calculus to derive all the valid derivability statements for ABgrammars. Notice that the only variable element is the elementary category assignment. We choose an alphabet A and an elementary category assignment . We write for the set of all unlabelled binary constituent structures over A that have root category under some correct labelling. As is completely arbitrary, we shall deal here only with the constituent structures obtained by taking away the terminal nodes. This eliminates and A, and leaves a class of purely categorial structures, denoted by . Since ABgrammars are invertible, for any given constituent structure there exists at most one labelling function (with the exception of the terminal labels). Now we introduce a binary symbol , which takes as arguments correctly labelled constituent structures. Let X and Y m such constituent structures and X Y . Then let

(3.172)

nz :

(In principle, is well dened also if the constituent structures are not binary branching.) In case where X Y one has to proceed to the disjoint sum. We shall not spell out the details. With the help of we shall form terms over A, that is, we form the algebra freely generated by A by means of . To every

)x1 ) 1

V U

z mz X mY

if z if z if z

X Y.

0 bT )

S s s

V 3 U  V U V U D V 3  U

0 x) Y@) ) ( ) ( X 03

(3.171)

m :

X Y

X Y n

0 f) ( )

03 ) ) (

V CuU D

240

Categorial Grammar and Formal Semantics

term we inductively associate a constituent structure in the following way. (3.173b) s t


k

Notice that has been used with two meanings. Finally, we take a look at . It denotes classes of binary branching constituent structures over A. The following holds. (3.174b)

We abstract now from A and . In place of interpreting as a constructor for constituent structures over A we now interpret it as a constructor to form constituent structures over Cat C for some given C. We call a term from categories with the help of a category complex. Categories are denoted by lower case Greek letters, category complexes by upper case Greek letters. Inductively, we extend the interpretation to structures as follows.

Next we introduce yet another symbol, . This is a relation between structures and categories. If is a structure and a category then denotes the fact that for every interpretation in some alphabet A with category assignment . We call the object a sequent. The interpretation that we get in this way we call the cancellation interpretation. Here, categories are inserted as concrete labels which are assigned to nodes and which are subject to the cancellation interpretation. We shall now introduce two different calculi, one of which will be shown to be adequate for the cancellation interpretation. In formulating the rules we use the following convention. above the line means in this connection that is a category complex in which we have xed a single occurrence of . When we write, for example, below the line, then this denotes the result of replacing that occurrence of by .

~ ~$ ~

~

~~ ~

~ ~~

(I )

( I)

~$

~

(3.176)

(I )

( I)


~ ~

~~  ~

~~ ~

(ax)

(cut)

~

~ ~~

~~ ~

~~ ~

Y X

(3.175)

V  U

F u } X F } X

(3.174a)

X D 0 ) S() T S S) S a0 bT !bT 'bT !(


s
k

(3.173a)

k :

V X U X

F }

The ABCalculus

241

We denote the above calculus by (cut), and by the calculus without (cut). Further, the calculus consisting of (ax) and the rules ( I) and ( I) is called . Denition 3.66 Let M be a set of category constructors. A categorial sequent grammar is a quintuple

where C is a nite set, the set of basic categories, C the so called distinguished category, A a nite set, the alphabet, : A Cat M C a category assignment, and a sequent calculus. We write G x if for some category We stress here that the sequent calculi are calculi to derive sequents. A sequent corresponds to a grammatical rule, or, more precisely, the sequent expresses the fact that a category complex of type is a constituent that has the category by the rules of the grammar. The rules of the sequent calculus can then be seen as metarules, which allow to pass from one valid statement concerning the grammar to another.
} ~

In natural deduction style calculi this corresponds to the following unary rule: (3.179)

This rule is known as type raising, since it allows to proceed from the category to the raised category . Perhaps one should better call it category raising, but the other name is standard. To see that it is not derivable in we simply note that it is not correct for the cancellation interpretation. in the next We shall return to the question of interpretation of the calculus section. An important property of these calculi is their decidability. Given and we can decide in nite time whether or not is derivable.

~~

u aV

u aV

u aV

~~ ~ U

(3.178)

is strictly stronger than derivable in :

. Notice namely that the following sequent is

$ }

~~ ~

Proposition 3.67 (Correctness) If

is derivable in

then

~ ~~

complex whose associated string via is x we have

V aV U

0 ) ) ) ) (

(3.177)

C A

242

Categorial Grammar and Formal Semantics

Theorem 3.68 (Cut Elimination) There exists an algorithm to construct a proof of a sequent in from a proof of in (cut). Hence (cut) is admissible for . Proof. We presented a rather careful proof of Theorem 3.34, so that here we just give a sketch to be lled in accordingly. We leave it to the reader to verify that each of the operations reduces the cutweight. We turn immediately to the case where the cut is on a main formula of a premiss. The rst case is that the formula is introduced by (I ). (3.180)

Now look at the rule instance that is immediately above . There are several cases. Case (0). The premiss is an axiom. Then , and the cut is superuous. Case (1). is a main formula of the right hand premiss. Then for some and , and the instance of the rule was as follows.

Now we can restructure (3.181) as follows. (3.182)


~$

Now we assume that the formula is not a main formula of the right hand premiss. Case (2). and the premiss is obtained by application of (I ).
~~ ~

(3.183)

We replace (3.183) by
~

~ ~~

~  X

(3.184)

~

~

~

(3.181)

~~  ~

~~ ~

~! ~

~~  ~

~

~ ~~

X X X

The ABCalculus

243

Case (3). and has been obtained by applying the rule (I ). Then proceed as in Case (2). Case (4). The application of the rule introduces a formula which occurs in . This case is left to the reader. Now if the left hand premiss has been obtained by ( I), then one proceeds quite analogously. So, we assume that the left hand premiss is created by an application of ( I). (3.185)
~

We can restructure (3.185) as follows. (3.186)


~

Also here one calculates that the degree of the new cut is less than the degree of the old cut. The case where the left hand premiss is created by ( I) is analogous. All cases have now been looked at.

gives a method to test category complexes for their syntactic category. We expect that the meanings of the terms are likewise systematically connected with a term and that we can determine the meaning of a certain string once to see how we have found a derivation for it. We now look at the rules of they can be used as rules for deducing signsequents. Before we start we shall distinguish two interpretations of the calculus. The rst is the intrinsic interpretation: every sequent we derive should be correct, with all basic parts of it belonging to the original lexicon. The second is the global interpretation: the sequents we derive should be correct if the lexicon was suitably expanded. This only makes a difference with respect to signs with empty exponent. If a lexicon has no such signs the intrinsic interpretation bans their use altogether, but the global interpretation leaves room for their addition. Adding them, however, will make more sequents derivable that are based on the original lexicon only. We also write x : : M for the sign x M . If x, or M is irrelevant in the context it is omitted. For the meanings we use terms, which are however only proxy for the real meanings (see the discussion at the

) ) ( i

Corollary 3.69

(cut) is decidable.

~~  ~

~

~

~~  ~

X ~ ~  ~ X ~

~

~~ ~

244

Categorial Grammar and Formal Semantics

end of the preceding section). Therefore we now write x xy in place of . A sign complex is a term made of signs with the help of . Sequents are pairs where is a sign complex and a sign. maps categories to types, as in Section 3.4. If x : : M is derivable, we want that M is of type . Hence, the rules of should preserve this property. We dene rst a relation between sign complexes. It proceeds by means of the following rules. (3.188) x: :M y: :N x y : : NM

Since M and N are actually functions and not terms, one may exchange any two terms that denote the same function. However, if one considers them being actual terms, then the following rule has to be added:

For a sign complex and a sign put iff . We want to design a calculus that generates all and only the sequents such that . To begin, we shall omit the strings and deal with the meanings. Later we shall turn to the strings, which pose independent problems. The axioms are as follows.
~

(3.190)

(ax)

:M

:M

where M is a term of type . (cut) looks like this.

So, if : x is a sign complex containing an occurrence of : x , then the occurrence of this sign complex is replaced and with it the variable x in M. So, semantically speaking cut is substitution. Notice that since we cannot tell which occurrence in M is to be replaced, we have to replace all of them. We will see that there are reasons to require that every variable has exactly one occurrence, so that this problem will never arise. (We could make this a condition on (cut). But see below for the fate of (cut).) The other rules are more complex. (3.192) ( I)
~

:M : x : x : x

~ ~ ~

~ ~ ~

(3.191)

cut

~

:N

: x : M : N x M

:N M x N

~~ ~

V U

h q

(3.189)

x: :M

x: :N

if M

V V

i W ni i W ni

h $

h $

i CX

i CX

(3.187)

x: :M y: :N

x y : : MN

V U

t @t s i

The ABCalculus

245

This corresponds to the replacement of a primitive constituent by a complex constituent or the replacement of a value M x by the pair M x . Here, the variable x is introduced, which stands for a function from objects of type to objects of type . The variable x has, however, disappeared. This is a serious decit of the calculus (which has other advantages, however). We shall below develop a different calculus. Analogously for the rule ( I).

Lemma 3.70 The rule (E ) is derivable from (cut) and ( I). Likewise, the rule (E ) is derivable from (cut) and ( I). Proof. The following is an instance of ( I). (3.194)

: x : x : x : x : x : x : x

Now two cuts, with : M : M and with : N : N, give (E ). Thus, the rules (3.187) and (3.188) are accounted for. The rules (I ) and (I ) can be interpreted as follows. Assume that is a constituent of category . (3.195) (I ) : x : M : x M
~

Here, the meaning of is of the form M and the meaning of is N. Notice that forces us in this way to view the meaning of a word of category to be a function from objects to objects. For it is formally required that has to have the meaning of a function. We call the rules (I ) and (I ) also abstraction rules. These rules have to be restricted, however. Dene for a variable x and a term M, focc x M to be the number of free occurrences of x in M. In the applications of the introduction rules, we add a side condition:

(In fact, one can show that from this condition already follows focc x M 1, by induction on the proof.) To see the need for this restriction, look at the

(3.196)

In (I ) and (I ) : focc x M

~~~

~~ ~

~~ ~

) U

~~ ~

(E )

~ ~~

~~ ~

(3.193)

~~~

~~ ~

(E )

:M : :M :

:N NM :N NM

0 )

V U

246

Categorial Grammar and Formal Semantics

following derivation.

The rst is obtained using two applications of the derivable (E ). This rule must be further restricted, however, as the next example shows. In the rule ( I) put : : . (3.197)

Using (I ), we get

This is the type raising rule which we have discussed above. A variable x can also be regarded as a function, which for given function f taking arguments of type returns the value f x . However, x is not the same function as x x x . (The latter has the type .) Therefore the application of the rule is incorrect in this case. Moreover, in the typed x x x is invalid. calculus the equation x To remedy the situation we must require that the variable which we have abstracted over appears on the left hand side of in the premiss as an argument variable and not as a variable of a function that is being applied to something. So, the nal form of the right slashintroduction rule is as follows.
~

How can one detect whether x is an argument variable? To this end we require that the sequent be derivable in categorial . This seems

(3.200)

(I )

: x : M : x M

x an argument variable, and focc x M 1

QV

V aV

~~

V aV

U 

U 

u aV

~ U

V aV

(3.199)

: x : x : x x : x : x x x

Now x x x is the same function as x ing (I ) we get

~ ~~

U 

(3.198)

. On the other hand, by apply-

VaV V

: x : x

: x : x : x x

x x

: x : x : x : x : x : x : x x
~~ ~

UU a

U 

~ ~ ~~

: M : x : x : M : x :M

: Mx x : x Mx x : x x Mx x

X @V X D

~~ ~

U
v

U 

The ABCalculus

247

paradoxical. For with this restriction the calculus seems to be as weak as . Why should one make use of the rule (I ) if the sequent is anyway derivable? To understand this one should take note of the difference between the categorial calculus and the interpreted calculus. We allow the use of the interpreted rule (I ) if is derivable in the categorial calculus; or, to be more prosaic, if has the category and hence the type . That this indeed strengthens the calculus can be seen as follows. In the interpreted the following sequent is not derivable (though it is derivable in ). The proof of this claim is left as an exercise.

We assign to a sign complex a sign as follows.

x: :M y: :N :

x M

It is easy to see that if : M is derivable in the interpreted then x M for some M M. (Of course, M and M are just notational variants denoting the same object. Thus they are identical qua objects they represent.) The calculus that has just been dened has drawbacks. We will see below that (cut) cannot be formulated for strings. Thus, we have to do without it. But then we cannot derive the rules (E ) and (E ). The calculus obviates the need for that.


In (the interpreted) , (cut) is admissible. (The proof of that is left as an exercise.) Now let us turn to strings. Now we omit the interpretation, since it has been dealt with. Our objects are now written as x : where x is a string and a category. The reader is reminded of the fact that y x denotes that string which results from y by removing the postx x. This is clearly dened only if

Denition 3.71 The calculus (E ).




has the rules (ax), (I ), (E ) (I ) and

Va0 ) u ) 0 i() V a0 ) ) 0 ) i()

i i

i CX

0k

iU i U

(3.202)

x: :M y: :N :

x M

x: :M :

) ) aU i( V ) aU D V i( D i ) ) ( V D

iCX i U

~~ ~

(3.201)

: x

: x x

x M

y N

y N

) ) ( i

V U

248 y

Categorial Grammar and Formal Semantics

The cut rule is however no more a rule of the calculus. There is no formulation of it at all. Suppose we try to formulate a cut rule. Then it would go as follows. (3.204)

Here, y x z denotes the result of replacing y for x in z. So, on the strings (cut) becomes constituent replacement. Notice that only one occurrence may be replaced, so if x occurs several times, the result of the operation y x z is not uniquely dened. Moreover, x may occur accidentally in z! Thus, it is not clear which of the occurrences is the right one to be replaced. So the rule of (cut) cannot even be properly formulated. On the other hand, semantically it is admissible, so for the semantics we can do without it anyway. However, the same problem of substitution arises with the rules ( I) and ( I). Thus, they had to be eliminated as well. This completes the denition of the sign calculus . Call the calculus consisting of just ( E) and ( E). Based on Lemma 3.70 the completeness of for is easily established.

~

is certainly correct for the global interpretation, but it is correct for the intrinsic interpretation? The answer is actually yes! The fact is that the introduction rules are toothless tigers: they can only eliminate a variable that has never had a chance to play a role. For assume that we have an proof. If (I ) is used, let the highest application be as follows. (3.205) : : x y : : M y : : x M y: :N : : x : : x : : x y : : Nx
~~~

Then the sequent above the line has been introduced by (E ):

i ~

(3.206)

U i ~

i ~

i ~~~



Theorem 3.72


iff

i i ni

i ~ 

i i i ~ 

i ~

x:

y: z: y x z:

(I )

(E )

i ~

(3.203)

(I )

(E )

i W ~~~ i i~ i i dW ~~~ i ~

i ~

u iu ~~~ i i~ X i i i~~~ i CX

i ~

i ~

(ax)

x: x: x: y: y x: x: y: x y:

x: y: y x: x: y: x y:

i u i

u x for some u, and then we have y x

i i

i i i

i Wi

u. Analogously for x y.

The LambekCalculus

249

Theorem 3.73 The rules (I ) and (I ) are admissible in . In the next section, however, we shall study the effect of adding associativity. In presence of associativity the introduction rules actually do considerable work. In that case, to regain correctness, we can either ban the introduction rules, or we can restrict the axioms. Given an ABsign grammar we can restrict the set of axioms to

For an ABgrammar does not possess any modes of the form x where x is a variable. Exercise 116. Prove the correctness theorem, Proposition 3.67. Exercise 117. Dene a function p from category complexes to categories as follows.

Show that is derivable in iff p . Show that this also holds for (cut). Conclude from this that (cut) is admissible in (categorial!) . (This could in principle be extracted from the proof for , but this proof here is quite simple.) Exercise 118. Show that every CFL can be generated by an ABgrammar using only two basic categories.

Exercise 120. Show that (cut) is admissible for . 6. The LambekCalculus




The LambekCalculus, , is in many respects an extension of . It has (Lambek, 1958). In contrast to , in categories are been introduced in not interpreted as sets of labelled trees but as sets of strings. This means

Exercise 119. Show the following claim: in the interpreted sequents are derivable which contain bound variables.

calculus no

V U

V U V U

(3.208)

p : p

p :

) ) (

V U

V U

V U

(3.207)

(ax )

where f

F and f

V
v

Here, Nx M. Since x M be eliminated.

x Nx

V U

X U

N, this part of the proof can

250

Categorial Grammar and Formal Semantics

that the calculus has different laws. Furthermore, possesses a new category constructor, pair formation; it is written and has a counterpart on the level of categories, also denoted by that symbol. The constructors of the classical Lambekcalculus for categories therefore are , and . Given an alphabet A and an elementary category assignment we denote by the set of all strings over A which are of category with respect to . Then the following holds. (3.209)

Since we have the constructor at our disposal, we can in principle dispense with the symbol . However, we shall not do so. We shall formulate the calculus as before using , which makes it directly comparable to the ones we have dened above. Hence as before we distinguish terms from structures. . We shall axiomatize the sequents of . In We write if order to do so we add the following rules to the calculus (without (cut)). (ass1) (ass2)
~

This calculus is called the LambekCalculus, or simply . Further, we put : (I ) ( I) and : ( I) (I ). is also called the Nonassociative LambekCalculus. Theorem 3.74 (Lambek) (cut) is admissible for .

For a proof we only have to look at applications of (cut) following an application of the new rules. Assume that the left hand premiss has been obtained by an application of (ass1). (3.210) 1 2 2 1 2 2 1 2 3
~

~~  ~

~

X @V

X ~ ~~ V

Corollary 3.75 (Lambek)

with or without (cut) is decidable.

~~  ~

X X V

~~ ~

UYX X U

~

~~  ~

1 2 3 1 2 3 ( I)

1 2 3 1 2 3 (I )

~~

T T S S

~

~ ~ ~

S }

X@V X

U X@V X X YX U

3 X X U U YX

T S

S T

g 3

S
~

S 3

The LambekCalculus

251

This proof part we reformulate into the following one. (3.211)


~~  ~

Analogously if the left hand premiss has been obtained by using (ass2). We leave it to the reader to treat the case where the right hand premiss has been obtained by using (ass1) or (ass2). We have to remark here that by reformulation we do not diminish the degree of the cut. So the original proof is not easily transported into the new setting. However, the depth of the application has been diminished. Here, depth means (intuitively) the length of a longest path through the proof tree from the top up to the rule occurrence. If we assume that 1 2 3 has depth i and depth j then in the rst tree the application of (cut) has depth max i j 1, in the second however it has depth max i j . Let us look at the cases of introduction of . The case of ( I) on the left hand premiss is easy. 1 2 1 2 1 2

Now for the case of (I ) on the right hand premiss. (3.213)

In this case 1 2 . Furthermore, 1 2 and the marked occurrence of either is in 1 or in 2 . Without loss of generality we assume that it is in 1 . Then we can replace the proof by (3.214)
~

~~ ~

~ ~$ ~

1 1 1 1 2 1 2 1 2

~ ~ ~

~

~~ ~

~ ~~

~~ ~

~

~ ~ ~

~~$ ~

~~  ~

~$

~~ ~

(3.212)

1 2 1 2 1 2

gQT ) S

~~ ~ V ~

XV X

~ ~ ~

T ) S

X U YX

U
~~ ~

U YX X

2 3 1 2 3 1 2 3

U %X

252

Categorial Grammar and Formal Semantics

We have 1 2 by hypothesis on the occurrence of . Now we look at the case where the left hand premiss of cut has been introduced by (I ). We may assume that the right hand premiss has been obtained through application of ( I). The case where is a side formula is once again easy. So let be main formula. We get the following local tree. (3.215)
~~  ~

In all cases the cutweight (or the sum of the depth of the cuts) has been reduced. We shall also present a different formulation of using natural deduction over ordered DAGs. Here are the rules:

(I )

These rules are very much like the natural deduction rules for intuitionistic logic. However, two differences must be noted. First, suppose we disregard for the moment the rules for . (This would incidentally give exactly the nat.) The rules must be understood ural deduction calculus corresponding to to operate on ordered trees. Otherwise, the difference between then rules for and the rules for would be obliterated. Second, the elimination rule for creates two linearly ordered daughters for a node, thus we not only create

(I )

(E )

. . .

(3.216)

(E )

. . .

(I )

(E )

~ ~ ~

~

~ ~ ~

~~ ~

1 2 1 1 2 2

~

~~ ~

~~ ~

1 1 1 2

~ ~~

X I 3

2 2 1 2 1 2

1 2 1 2

v3

The LambekCalculus

253

ordered trees, we in fact create ordered DAGs. We shall not spell out exactly how the rules are interpreted in terms of ordered DAGs, but we shall point out a few noteworthy things. First, this style of presentation is very much linguistically oriented. We may in fact proceed in the same way as for and dene algorithms that decorate strings with certain categorial labels and proceed downward using the rules shown above. Yet, it must be clear that the so created structures cannot be captured by constituency rules (let alone rules of a CFG) for the simple reason that they are not trees. The following derivation is illustrative of this.

(3.217)

Notice that if a rule has two premisses, these must be adjacent and follow each other in the order specied in the rule. No more is required. This allows among other to derive associativity, that is, . However, notice the role of the socalled assumptions and their discharge. Once an assumption is discharged, it is effectively removed, so that the items to its left and its right are now adjacent. This plays a crucial role in the derivation of the rule of function composition.

(3.218)

As soon as the assumption is removed, the top sequence reads . The relationship with is as follows. Let be a sequence of categories. We interpret this as a labelled DAG, which is linearly ordered. Now we successively apply the rules above. It is veried that each rule application preserves the property that the leaves of the DAG are linearly ordered. Dene a category corresponding to a sequence as follows.

V ) U D

(3.219)

: :

. . .

V 3

U 3

~ ~

3 V

. . .

3 @V

V u U V u 3 U V u Y3 U U

254

Categorial Grammar and Formal Semantics

First of all we say that for two sequences and , is derivable from in the natural deduction style calculus if there is a DAG constructed according to the rules above, whose topmost sequence is and whose lowermost sequence is . (Notice that assumptions get discharged, so that we cannot simply say that is the sequence we started off with.) The following is then shown by induction. Theorem 3.76 Let and be two sequences of categories. is derivable from iff is derivable in . This shows that the natural deduction style calculus is effectively equivalent to . allows for a result akin to the CurryHowardIsomorphism. This is an extension of the latter result in two respects. First, we have the additional type constructor , which we have to match by some category constructor, and second, there are different structural rules. First, the new type constructor is actually the pairformation. Denition 3.77 Every term is a term. Given two terms M and N, M N , p1 M and p2 M also are terms. Further, the following equations hold.

p1 U and p2 U are not dened if U is not of the form M N for some M and N. The functions p1 and p2 are called the projections. Notice that antecedents of sequents no longer consist of sets of sequences. Hence, , , now denote sequences rather than sets. In Table 8 we display the new calculus. We have also put a general constraint on the proofs that variables may not be used twice. To implement this constraint, we dene the notion of a linear term: Denition 3.78 A term M is strictly linear if for every variable x and every 1. A term is linear if it results from a strictly linear subterm N, focc x N term M by iterated replacement of a subterm M by p1 N x p2 N y M , where N is a linear term. The calculus above yields only linear terms if we start with variables and require in the rules (I ), (E ), (E ) that the sets of free variables be disjoint, and that in (I ) and (I ) the variable occurs exactly once free in M.

V U

! V U

0 )

V a0 )

( aU

QV ) U

V a0 )

V U

( aU

(3.220)

p1 M N

p2 M N

0 )

V U

The LambekCalculus

255

x: x: M: x: N: (cut) M xN: M: N: x: M: (E ) (I ) MN : x M : M: N: x: M: (E ) (I ) MN : x M : M: x: y: U : (E ) p1 M x p 2 M y U : M: N: (I ) M N :

In this way we can ensure that for every sequent derivable in there actually exists a labelling such that the labelled sequent is derivable in the labelled calculus. This new calculus establishes a close correspondence between linear terms and the socalled multiplicative fragment of linear logic, which naturally arises from the above calculus by stripping off the terms and leaving only the formulae. A variant of proof normalization can be shown, and all this yields that has quite wellbehaved properties. In presence of the rules (ass1) and (ass2) behaves exactly like concatenation, that is, it is a fully associative operation. Therefore we shall change the notation in what is to follow. In place of structures consisting of categories we C . shall consider nite sequences of categories, that is, strings over Cat We denote concatenation by comma, as is commonly done. Now we return to the theory of meaning. In the previous section we have seen how to extend by a component for meanings which computes the meaning in tandem with the category. We shall do the same here. To this end we shall have to rst clarify what we mean by a realization of . We shall agree on the following.

s e p%t

(3.221)

V  U

~ ~~

~~ ~

V )

! V

~~~ )

~~ ~

~~~

0 )

~ 3

) )

( ~

) )

~~ ~

(ax)

~ ~~

~~ ~

~~ ~

Table 8.

with Term Annotation

256

Categorial Grammar and Formal Semantics

The rules are tailored to t this interpretation. They are as follows.


~

This means that the restructuring of the term is without inuence on its meaning. Likewise we have (3.223)
~ ~ ~

(3.224) says that in place of a function of two arguments and we can form a function of a single argument of type . The two arguments we can recover by application of the projection functions. The fourth rule nally tells us how the type/category is interpreted.
~ ~

Here we have the same problem as before with . The meaning assignments that are being computed are not in full accord with the interpretation. The term 0 1 2 does not denote the same function as 0 1 2 . (Usually, one of them is not even well dened.) So this raises the question whether it is at all legitimate to proceed in this way. We shall avoid the question by introducing a totally different calculus, sign based (see Table 9), which builds on the calculus of the previous section. The rules (ass1) and (ass2) are dropped. Furthermore, ( I) is restricted to . These restrictions are taken over from for the abstraction rules. Sign based has the global side condition that no variable is used in two different leaves. This condition can be replaced (up to conversion) by the condition that all occurring terms are linear. In turn, this can be implemented by the adding suitable side conditions on the rules. Sign based is not as elegant as plain categorial . However, it is semantically correct. If one desperately wants to have associativity, one has to introduce combinators at the right hand side. So, a use of the associativity

s s

0 )

t t

(3.225)

:M :N : M N

~ ~$ ~

(3.224)

~ ~ ~

So, for

we assume the following rule.

:M : x : x : p 1 z x p2 z

~

1 2 3 1 2 3

~

(3.222)

1 2 3 1 2 3

:M :M

XV X X X V

X U U X U X X U

:M :M

x M

The LambekCalculus

257

Exercise 121. Assume that in place of sequents of the form for arbitrary only sequents c c, c C, are axioms. Show that with the rules of can be derived for every . Exercise 122. Let G C SA be a categorial sequent grammar. Show that the language x : G x is context free.

The categories do not form a loop with respect to , and (!), for the reason that is only partially dened. Here is a possible remedy. Dene

V U

V u j U

(3.226)

x x y

y x x

V U

V u U

Exercise 124. A loop is a structure L where and the following equations hold for all x y L.

Exercise 123. Show that the sequent is derivable in not in . What semantics does the structure have?

but 2

~ ~~

V 6U D

rule is accompanied in the semantics by a use of We shall not spell out the details here.

with MNP

~~ ~

1 ) 0 @!) a) ( ) u

0 )

T i ~ S i 0 ) ) ) ) ( 

(I )

( i ~

i ~~~

i yW i~ X i~ i Wi

( I)

V i ~~~

i CX

i i W ~ i

(E )

i ~~~

i W ~ i

(E )

"V )

i~

i ~ i i Ru ~

(I )

"V )

(I )

V U

i ~

i ~

~ ~~

i i~ i CX

i ~~~

(ax)

x : : x x : : x x : : x y : : N x an argument variable, 1 focc x N y x : : x N x : : x y : : N x an argument variable, 1 focc x N x y : : x N y: :N x: :M y x : : NM x: :M y: :N x y : : NM x : : x y : : y z : : M x y : : z z : : p 1 z x p2 z y M x: :M y: :N x y: : M N

Table 9. The Sign Based Calculus

M NP .

~~ ~

258

Categorial Grammar and Formal Semantics

Exercise 125. Show that the following rules are admissible in .


~

7.

Pentus Theorem

It was conjectured by Noam Chomsky that the languages generated by are context free, which means that is in effect not stronger than . This was rst shown by Mati Pentus (see (Pentus, 1997)). His proof makes use of the fact that has interpolation. We start with a simple observation. Let : 1 1 be a group and : C G G. We extend to all types and structures as follows.

: 1 :

We call a group valued interpretation.

The proof is performed by induction over the length of the derivation and is left as an exercise. Let C be given and c C. For a category over C we dene

c c :

V U

V U

V U

gV U gV U g V U

u U U

(3.230)

c : c c : c c : c

c c c

1 0

if c c , otherwise.

Theorem 3.79 (Roorda) If is derivable in interpretations .

V U

(3.229)

V U V V U V V U V V U V

then for all group valued

~~ ~

~~ ~

~

~~ ~

V U

Vk U

X U u U U

(3.228)

( E)

(E )

(E )

1 2 1 2

Show that the free algebra of categories over C factored by is in the factored algebra?

U $u

(3.227)

V U 

Cat

to be the least congruence such that

0 ) ) a) (

is a loop. What

Pentus Theorem

259

Likewise we dene
c C

These denitions are extended in the canonical way to structures. Let be a nonempty structure (that is, ) and a structure containing a marked occurrence of a substructure. An interpolant for a sequent in a calculus with respect to is a category such that

In particular if satises these conditions. We say that has interpolation if for every derivable there exists an interpolant with respect to . We are interested in the calculi and . In the case of we have to remark that in presence of full associativity the interpolation property can be formulated as follows. We deal with sequents of the form where is a sequence of categories. If 1 2 with then there is an interpolant with respect to . For let be a structure in which corresponds to (after omitting all occurrences of ). Then there exists a sequent which is derivable and in which occurs as a substructure. Interpolation is shown by induction on the derivation. In the case of an axiom there is nothing to show. For there we have a sequent and the marked structure has to be . In this case is an interpolant. Now let us assume that the rule (I ) has been applied to yield the nal sequent. Further, assume that the interpolation property has been shown for the premisses. Then we have the following constellation.
~ ~~

(3.233)

We have to nd an interpolant with respect to . By induction hypothesis there is a formula such that and are both derivable and c min c c for all c C. Then also and

~ ~~

~$

~ ~~

~ ~

~~ ~

) )

~~ ~

V U

T V U X

t $V

) V

X U

X U

} YV U

~

~

is derivable in .

is derivable in ,

T V U

) V U

g V U

QV U

min c

c c , for all c

C,

~W ~

dV U

V U

(3.232)

C : c

V U
0

(3.231)

yV U

260
~ ~~

Categorial Grammar and Formal Semantics

are derivable and we have c min c c . Hence also is an interpolant with respect to in . The case of (I ) is fully analogous. Now we look at the case that the last rule is ( I).

Choose a substructure Z from . Several cases have to be distinguished. (1) Z is a substructure of , that is, Z . Then there exists an interpolant for Z with respect to Z. Then also is an interpolant for Z with respect to Z. (2) Z is disjoint with . Then we have Z (with two marked occurrences of structures) and there is an interpolant with respect to Z for Z . Also in this case one calculates that is the desired interpolant. (3) Z . By induction hypothesis there is an interpolant for with respect to , as well as an interpolant r for with respect to . Then : r is the interpolant. For we have

Furthermore, (3.236)
~

(4) Z . Then for some . Then by hypothesis there is an interpolant for with respect to . We show that is the desired interpolant.
~~ $ ~

In addition (3.238)

This ends the proof for the case ( I). The case ( I) again is fully analogous.

TV l

min c

X U TV l U

)V X k U ) V X k U

dV U

min c

c c

~

(3.237)

~~  ~

~~ ~

~~ ~

~~ ~

~ ~~

~ q k k

~

~  X

r r

r r r

T V U

) V U

T V

U V X X U ) g T V U V X U )

g V

V U

(3.235)

c r c min c c min c c min c c

~#

D ) lk

~

~

lk

) k

(3.234)

~~  ~

~~  ~

~~ ~

T V U

) V

X U

~ ~

YV U

kD

~~ ~

X D

Pentus Theorem

261

Now we move on to . Clearly, we only have to discuss the new rules. Let us rst consider the case where we add together with its introduction rules. Assume that the last rule is ( I). (3.239)
~$

Choose a substructure Z of . (1) Z does not contain the marked occurrence of . Then Z , and by induction hypothesis we get an interpolant for Z with respect to Z. It is easily checked that also is an interpolant for Z with respect to Z. (2) Let Z . Then . By induction hypothesis there is with respect to , and it also is an interpolant for an interpolant for with respect to . In both cases we have found an interpolant. Now we turn to the case (I ). (3.240)
~ ~~ ~~ ~

There are now three cases for Z. (1) Z . By induction hypothesis there is an interpolant for Z with respect to Z. This is the desired interpolant. (2) Z . Analogous to (1). (3) Z . By hypothesis there is an interpolant for with respect to and an interpolant r for with respect to . Put : r . This is the desired interpolant. For
~ ~

In addition it is calculated that c min c c . This concludes a proof of interpolation for . Finally we must study . The rules (ass1), (ass2) pose a technical problem since we cannot proceed by induction on the derivation. For the applications of these rules change the structure. Hence we change to another system of sequents and turn as discussed above to sequents of the form where is a sequence of categories. In this case the rules (ass1) and (ass2) must be eliminated. However, in the proof we must make more distinctions in cases. The rules

T V

X U

) V

~~

QV U

~ ~~

(3.241)

r r

lk

~

3 X

~  ~~ n ~ X 3 k D ~ E 3 ) k ~ X ) ~~  ) lk X D

k 3

3 k k 3

~$

Theorem 3.80

has interpolation.

X 3 X

262

Categorial Grammar and Formal Semantics

(I ) and (I ) are still unproblematic. So we look at a more complicated case, namely an application of the rule ( I).
~~  ~ ~~ ~

We can segment the structure into . Let a subsequence Z be distinguished in . The case where Z is fully contained in is relatively easy; likewise the case where Z is fully contained in . The following cases remain. (1) Z 1 1 , where 0 1 for some 0 , and 1 2 for some 2 . Even if Z is not empty 1 as well as 1 may be empty. Assume 1 . In this case an interpolant for with respect to 2 and r an interpolant of with respect to 1 . (Here it becomes clear why we need not assume 1 .) The following sequents are therefore derivable.
~~ ~ ~

and on the other


~ ~~

The conditions on the numbers of occurrences of symbols are easy to check. (2) As Case (1), but 1 is empty. Let then be an interpolant for with respect to and r an interpolant for 0 1 with respect to 1 . Then put : r . is an interpolant for the end sequent with respect to Z.

(3) Z does not contain the marked occurrence of . In this case Z 2 1 for some nal part 2 of and an initial part 1 of . 2 as well as 1 may

kk ) )

kk

~~~

) )

(3.246)

~~ 9 ~

kk )

~~ ~

1 r 1 r 1 r

0 r 0 r

kk ) )

~ ~# ~

~9

kk )

(3.245)

kk )

0 r 0 r 2

~ ~~

) )

(3.244)

1 1 r 1 1 r 1 1 r
~

Now put :

r . Then we have on the one hand

kk )

~~ ~

(3.243)

2 1 r

1 0 r

~~ ~

kk ) )

k )

D ~ 

D kk ) ) ) k )

~

(3.242)

kk

Pentus Theorem

263

be assumed to be nonempty, since otherwise we have a case that has already been discussed. The situation is therefore as follows with Z 2 1 .
~ ~~ ~~~

Let be an interpolant for 1 2 with respect to 2 and r an interpolant for 1 2 with respect to 1 . Then the following are derivable
~

and

In this case as well the conditions on numbers of occurrences are easily checked. This exhausts all cases. Notice that we have used to construct the interpolant. In the case of the rules (I ) and ( I) there are no surprises . with respect to

Now we shall move on to show that is context free. To this end we introduce a series of weak calculi of which we shall show that together they are not weaker than . These calculi are called m , m . The axioms of m are sequents such that the following holds.

(cut) is the only rule of inference. The main work is in the proof of the following theorem.

1 2

m.

~ ~~

is derivable in .

1 2 or

1 for certain categories 1 and 2 .

Theorem 3.81 (Roorda)

has interpolation.

~~~

k ) k )

(3.250)

1 r 2 1 r 2 1 r 2
~

) 3 ) ) ) ) ) )k

~~ ~

(3.249)

2 1

1 r r

Now we choose :

r . Then we have both

~~ ~

) )k

(3.248)

2 1 r
~~ ~

1 r 2

k )

(3.247)

1 2 1 2 1 2 1 2

) ) ) k )

) )

~~ ~

) )k D

264

Categorial Grammar and Formal Semantics

We shall show rst how to get from this fact that grammars are context free. We weaken the calculi still further. The calculus m has the axioms of m but (cut) may be applied only if the left hand premiss is an axiom.
~~ ~

Lemma 3.83 For all sequents the following holds: in m iff is derivable in m . The proof is relatively easy and left as an exercise.
~~ ~

Theorem 3.84 The languages accepted by grammars are context free. Proof. Let C A be given. Let m be larger than the maximum of all , a , a A. Since A as well as a are nite, m exists. For simplicity we shall assume that C : a a A . Now we put N: : m .G: N A R , where

Now let x , x x 0 x1 xn 1 . Then for all i n there exist an i xi such that is derivable in , where : 0 1 n 1 . By Theorem 3.82 and Lemma 3.83 is also derivable in m . Induction over the length of the derivation yields that G 0 1 n 1 and hence also x. Now let conversely G x. We extend the category assignment to G :A N Cat C by putting : while A . By induction over the length of the derivation of one shows that from G we get . Now on to the proof of Theorem 3.82.

%V U

Denition 3.85 A category is called thin if c sequent is called thin if the following holds.
~

1 for all c

C. A

V U

) aaa)

 $

~ ~~

W aa W

")

V U

aaW

V  U

0 1 : 0 1

~~ ~

" #)

S s

(3.251)

T V U

S s

R:

a:a

0 1

1 V U )

V U

0 ) ) ) ( } D V U (

1 V U 0 b) ) ) ) (

i ! i ~

D I

~~ ~

~ ~~

m and

is derivable in .

R i s

m for all i

m,

is derivable

~~ ~

) aaa)

Theorem 3.82 (Pentus) Let

0 1

n 1 .

is derivable in

iff

Pentus Theorem

265

All categories occurring in as well as are thin.

For a thin category we always have . We remark that for a thin sequent only c 0 or 2 can occur since c always is an even number in a derivable sequent (see Exercise 127). Let us look at a thin sequent and an interpolant of it with respect to . Then c c 1. For either c , and then c , whence c 0. Or c ; but 1. then c , and so by assumption c

Lemma 3.86 Let be a sequent and c d C two distinct elementary categories. Further, let c as well as d . Then is not thin. Proof. Let G C be the free group generated by the elementary categories. 1 The elements of this group are nite products of the form c s0 cs2 c sn 1 , n 0 2 0 . (If n 0 then the empty product where ci ci 1 for i n 1 and si denotes the group unit, 1.) For if c0 c1 the term cs0 cs1 can be shortened 1 0 s to c00 s1 . Look at the group valued interpretation sending every element of C to itself. If the sequent was thin we would have . By hypothesis the left hand side is of the form w c 1 x d 1 y c 1 z for certain products w x y z. The right hand side equals t d 1 u for certain t u. Furthermore, we know that terms which stand for w, x, y, z as well as t and u cannot contain c or d if maximally reduced. But then equality cannot hold.

Proof. The proof is by induction on n. We start with n 1. Here the sequent 1 since the sequent has the form 0 1 2 . Let c 1 . Then c 1 is thin. And since c 0 1 2 2, we have c 0 2 1, whence c 0 2 . This nishes the case n 1. Now let n 1 and the claim proved for all m n. Case a. 0 1 n 2 n . Then we

t V

) aaa)

s V

V U D 1

~~ ~

} V U ) iaaa) )

Lemma 3.87 Let 0 1 0 k n 1 and k


~~ ~

n n 1 be thin, n k 1 k 1 .

0. Then there is a k with

V U

V U

jaa

t V U

~~ ~

Now c c is an even number hence either 0 or 2. Hence is thin. Likewise it is shown that is thin.

% % % V U V U V U

dV U

g ) V U

V U

T 1 S v D

t V U

QV U

~ ~

g @V U

~~~

) )

) ) )

QV ) U

(3.252)

V U V U

V U

V ) U

V U

jV U

V U

V U

V ) U

V U

V U

V U

V ) U 1

dV ) U

2 for all c

g @V U

) )

~ ~~

is derivable in .

C.

s V

~F

also

266

Categorial Grammar and Formal Semantics

choose k : n. For if c n then c 0 n 2 0, and so we have c n 1 c n 1 1. Hence we get c n 1 n 1 . Case b. 0 1 n 2 n . Then there exists an elementary category c with c 0 n 2 and c n . Put : 0 1 n 1 , : n . Let be an interpolant for n 1 with respect to . Then and n n 1 are thin. By induction hypothesis there exists a k such that k k 1 k 1 , if k n 1, or k k 1 in case k n 1. If k n 1 then k is the desired number for the main sequent. Let now k n 1. Then

We show that k in this case too is the desired number for the main sequent. Let n 1 n 1 , say d n 1 n 1 . Then surely d n , so d c. Therefore the sequent is not thin, by Lemma 3.86. Hence we have n 1 n 1 , and so n 1 n 2 n . Lemma 3.88 Let be an derivable thin sequent in which all categories have length m. Then is already derivable in m .

n 1 ; put n : . If n 2 then already is Proof. Let 0 1 an axiom of m . So, let n 2. By the previous lemma there is a k such that k k 1 k 1 . Case 1. k n. Case 1a. k 1 k k 1 k . Put : 0 1 k 2 , : k 1 n 1 , and : k 1 k . Let be an interpolant for n with respect to . Then the sequent
~

is thin. Furthermore (3.255)

1 1

Let c k 1 . Then c k 1 1 and c k 1 k n 2, from which c k n 1. Hence either c k 1 or c n 1. Since c was arbitrary we have

V aV

t @V

U U V v

) ) U

t V

(3.256)

D V aV

) ) U V ) )

) ) U

t @V

D V U ) U

U aV U s V ) ) U

V D ) ) U t V @aV U

t@V s @V

U U D U QV U U }

) iaaa)

) )

) ) ) U V U

) iaaa)

(3.254)

f 

D jV

t V

~~ ~

) iaaa) U

s V

~~

) ) ) iaaa)

t bV

} V

s V

s V

} QV U

s V

V D

) iaaa)

w V D w YV D

~~ ~

} dV

) jV U V U t s V U YV U }

(3.253)

V U

) iaaa)

s bV

s V U

} CV

U 1 ) iaaa) U

~~

w V D

1 D U

) iaaa) U 1 t V ) iaaa) ) U V U g V U v s bV

~~ ~

t@V t bV D

Dv } CV ) 1

DU U U

Pentus Theorem

267

So (3.258)

(Note that k 1 k .) Therefore also m and so k 1 k is an axiom of m . Hence, by induction hypothesis n is derivable in m . A single application from both sequents yields the main sequent. It is there k l 1 . Here fore derivable in m . Case 1b. k 1 k one puts : 0 k 1 , : k k 1 , : k 1 n 1 and proceeds n 2 . Also here we as in Case 1a. Case 2. k n 1. So, n 1 distinguish to cases. Case 2a. n 2 n 1 n 1 n . This case is similar to Case 1a. Case 2b. n 2 n 1 n 1 n . n 2 , : n 1 . Let be an interpolant for n Here put : 0 with respect to . Then as well as n 1 n are thin. Further we have (3.259)

As in Case 1a we conclude that


2

Hence n 1 n is an axiom of m . By induction hypothesis, is derivable in m . A single application of (cut) yields the main sequent, which is therefore derivable in m . Finally we proceed to the proof of Theorem 3.82. Let i m for all i n, m. Finally, let 0 1 m 1 be derivable in . We choose and a derivation of this sequent. We may assume here that the axioms are only

~~ ~

) iaaa)

jV

(3.260)

jV

t V

U ijV v

U !jV g

t V

n n m

VV aaV

t V

U U vV U

V aV

U @aV U s V U t@V U @aV U s V V aV U @V s

t V U U D U V U U t D U @V U QV U U t }

n 1 n n 1 n n 2 n 1 n n

jV

~~

~ ~~ ) ~~ ~ ) ) iaaa) U IV t U U t U D jV IV jV U RV t U jV f U RV t U V U RV s U V } U v D aaa) )iaaa) ) ) U jV D U $V t U D

jV

t $V

~~ ~

) )

jV

t V

U 6jV g

t @V

jV U !jV g

jV U D U U jV U D

U U jV U D

U U }

(3.257)

VaV U V U aV U t U s V V aV ) ) U @V U aV t UV
1

tV U U V v ) ) U @V t

U U

V U
~

k k

n k n k 1 k k k

V
1

t V

) ) U

t V

By choice of k, k

. Hence

268

Categorial Grammar and Formal Semantics


~~ ~ ~~ ~

sequents of the form c c. For every occurrence of an axiom c c we choose a new elementary category c and replace this occurrence of c c by c c. We extend this to the entire derivation and so we get a new derivation of a sequent 0 1 n 1 . We get c i n c i 2, if c occurs at all in the sequent. Nevertheless, the sequent need not be thin, since it may contain categories which are not thin. However, if c 2 for some and some c, then c is not contained in any other category. We exploit this as follows. By successively applying interpolation we get the following sequents, which are all derivable in .
Y ~

(3.261)

It is not hard to show that c i 1 for all c and all i n. So the sequent 0 1 n 1 is thin. Certainly m as well as i i i m for all i n. By Lemma 3.88 the sequent 0 1 n 1 is derivable in m . The sequents i i , i n, as well as n are axioms of n 1 is derivable in m . We undo the replacement in m . Hence 0 0 the derivation. This can in fact be done by applying a homomorphism (substin 1 n tution) t which replaces c by c. So, we get a derivation of 0 1 in m . This concludes the proof of Theorem 3.82. We remark that Pentus has also shown in (Pentus, 1995) that is complete with respect to socalled Lframes. Denition 3.89 An Lframe is a free semigroup of the form A . A valuation is a function v : C A . v is extended to categories and sequents as follows: v v v v

(3.262)

: :

v :

v v

V U

} YV U

is true under v if v under all valuations.


~ ~~

v . It is valid in an Lframe if it is true

V U V U V U V U

V U uu aaV U a V U V U

V X D VR D VR V R

~ ~~

) aaa)

0 a) (

) iaaa)

~~ ~

Y 

~~ ~

yV U

V U

) iaaa)

0 1

) aaa)

n n
Y

0 1

. . .

. . .

~ ~~

) Y aaa) ) Y aaa)

0 0 1 1
~ ~ ~~ Y Y

0 1 2 0 1 2

n n

1 1

Y ~

~ ~~

V YU

V U

g bV Y U

~ ~~

) iaaa)

U v U U U

) iaaa) 3
~

Y ) iaaa) D )

Montague Semantics I

269

A survey of this subject area can be found in (Buszkowski, 1997). Exercise 126. Prove Theorem 3.79. Exercise 127. Let is an even number.

Exercise 128. Prove Lemma 3.83.


~ ~

8.

Montague Semantics I

Until the beginning of the 1970s semantics of natural languages was considered a hopeless affair. Natural language was thought of as being completely illogical so that no formal theory of semantics for natural languages could ever be given. By contrast, Montague believed that natural languages can be analysed in the same way as formal languages. Even if this was too optimistic (and it is quite certain that Montague did deliberately overstate his case) there is enough evidence that natural languages are quite wellbehaved. To prove his claim, Montague considered a small fragment of English, for whose semantics he produced a formal account. In this section we shall give a glimpse of the theory shaped by Montague. Before we can start, we have to talk about predicate logics and its models. For Montague has actually built his semantics somewhat differently than we have done so far. In place of dening the interpretation in a model directly, he dened a translation into calculus over predicate logic, whose interpretation on the other hand is xed by some general conventions. A language of rstorder predicate logic with identity has the following symbols: a set R of relation symbols, a disjoint set F of function symbols,
i

the equality symbol , the booleans , , , , the quantiers , .


( @

a countably innite set V :

" 

Exercise 129. Show that if

is valid in all Lframes.

:i

of variables,

V U

g $V U

be derivable in , c

" 

Theorem 3.90 (Pentus)

iff

is valid in all Lframes.

C. Show that c

&

270

Categorial Grammar and Formal Semantics

As outlined in Section 1.1, the language is dened by choosing a signature . Then r is a r ary relation symbol and f a f ary function symbol. Equality is always a binary relation symbol (so, 2). We dene the set of terms as usual. Next we dene formulae (see also Section 2.7).
1

If t0 and t1 are terms then t0 t1 is a formula.

A structure is a triple M f : f F r : r R such that M for every f F and r M r for every r R. Now let f : M f :V M. Then we dene for a formula by induction. To begin, we associate with every t its value t under . (3.263)

(3.264)

In this way formulae are interpreted in models.

) "0 l(

Denition 3.91 Let be a set of formulae, and a formula. Then for all models : if for every , then also

0 k l( ) ) 0 k l( ) 0 l( ) 0 l( ) "0 l(

k k

0 0

(0

(0

: : : : x : x :

and or or there is x : for all x :

1 0 QaV U

r s :

si : i

)l ( )l ( )l ( )l ( a (

) 0 l(

0 l( ) ) 10 l( & ) ( 0 l( )  0 l( ) 0 l( ) "0 l( ) "0 l( i ) ) C0 l(

s 0 s1 :

s0

s1

Now we move on to formulae. (In this denition, y only if y x.)

, for x

V , if y

YV U D

0 l( )

) iaaa) U

f t0

V )aaa) aU V U

x :

t0

0 IT

' &

S) 'bT

0 )

S ')

0 )l( 1

If is a formula and x

V , then

x and

x are formulae.

if .

If and are formulae, so are

, , and .

) iaaa) s

V U

If ti , i

r , are terms then r t0

t r

is a formula.

V @ U DV U

V U

0 ) (

0 ) ( V U

Montague Semantics I

271

For example, the arithmetical terms in , and with the relation can be interpreted in the structure where and are the usual operations, 0 and . Then for the valuation with 7 we have:

This formula says that is a prime number. For a number w is a prime number iff for all numbers u and v: if u v w then u 1 or v 1. We compare this with (3.265). (3.265) holds if for all different only on from This in turn is the case if for all different only on

This means: if u : , v: and w : and if we have w u v, then u 1 or v 1. This holds for all u and v. Since on the other hand w we have (3.265) iff , that is to say 7, is a prime number. The reader may convince himself that for every

This says that for every number there exists a prime number larger than it. For later use we introduce a type e. This is the type of terms. e is realized by M. Before we can start designing a semantics for natural language we shall have to eliminate the relations from predicate logic. To this end we shall introduce a new basic type, t, which is the type of truth values. It is realized by the set 0 1 . An nplace relation r is now replaced by the characteristic function r from ntuples of objects to truth values, which is dened as follows.

This allows us to use calculus for handling the argument places of r. For example, from the binary relation r we can dene the following functions r 1 and r2 . (3.271) r2 :

x e ye r

ye xe

(3.270)

r1 :

x e ye r

xe ye

) aaa)

) iaaa)

(3.269)

@@  (3  Q(   V         )  )  &  ) ) w @0b0 (' gy0 2P(

(3.268)

x0 x1

x r

1:

r x0

x r

V l kk U

D V U l kk V D  U R kk D D D ) w @@  3  d5  7 0 kk 2P(    (  

(3.267)

@   d5   F 0 k 2P(    (    ) ) w

(3.266)

8  (3  QC  7 @ 0 gy0 2P(    (    )  ) ) w

(3.265)

from

V U 

D y

v D qy v k

V  R!U

kk

V U 

D y

T ) S

V  U

D y

272

Categorial Grammar and Formal Semantics

So, we can dene functions that either take the rst argument of r rst, or one which takes the rst argument of r second. Further, we shall also interpret , , and by the standard settheoretic functions , , and , respectively: (3.272) 0 1 1 0 0 1 0 1 0 1
(

Syntactically speaking has category t t and , and have category t t t. Finally, also the quantiers must be turned into functions. To this end t . Moreover, we introduce the function symbols and of type e t X is true iff for all x X x is true, and X is true iff for some x X x is true. x is now replaced by x , and x by x . So, ignoring types for the moment, we have the equations (3.273) (3.274)

We shall however continue to write x and x . This denition can in fact be used to dene quantication for all functions. This is the core idea behind the language of simple type theory (STT) according to Church (1940). Church assumes that the set of basic categories contains at least t. The symbol has the type t t, while the symbols , and have type t t t . (Church actually works only with negation and conjunction as basic symbols, but this is just a matter of convenience.) To get the power of predicate logic we assume for each type a symbol of type t t and a symbol of type t . Put : Typ B . Denition 3.92 A Henkin frame is a structure

such that the following holds.

For every a

1 iff for every b

D : b a

Dt 0 1 , : Dt Dt and intersection, respectively.

0 3) T !b1

v T ) S

S !(

D :

is a functionally complete typed applicative structure. : Dt Dt Dt are complement and 1.

0 T I1

S) T 'b1

S) t) v) 3) T '!i@!b1

S !(

(3.275)

D :

V U

&

C'

CV

1 &

aU U

V ' U

0 1 0 0 0 1

V U

'

&

d&

&

0& )& R& (   '   & D 0& Q(& R& (   &   ) D

&

V U

0 1 0 1 1 1

0 1 1 1 0 1

V u U
0

V F& U

Montague Semantics I

273

A valuation into a Henkin frame is a function such that for every variable x of type x D . For every N of type t, N iff N 1. Further, for a set of expressions of type t and every N of type t, N if for every Henkin frame and every valuation : if M for all M then N. is the interpretation of and the interpretation of . So, is the device discussed above that allows to dene the universal quantier for functions of type t. on the other hand is a kind of choice or witness function. If a is a function from objects of type into truth values then a is an object of type , and, moreover, if a is at all true on some b of type , then it is true on a. In Section 4.4 we shall deliver an axiomatization of STT and show that the axiomatization is complete with respect to these models. The reason for explaining about STT is that every semantics or calculus that will be introduced in the sequel can easily be interpreted into STT. We now turn to Montague Semantics. To begin we choose a very small base of words.

The type of (the meaning of) and is e, the type of is e t, the type of e e t . This means: names are interpreted by individuals, intransitive verbs by unary relations, and transitive verbs by binary relations. The (nite) verb is interpreted by the relation and by the relation . Because of our convention a transitive verb denotes a function (!) of type e e t . So the semantics of these verbs is

We already note here that the variables are unnecessary. After we have seen how the predicate logical formulae can be massaged into typed expressions, we might as well forget this history and write in place of the function xe xe and in place of xe ye ye xe . This has the additional advantage that we need not mention the variables at all (which is a moot point, as we have seen above). We continue in this section to use the somewhat more longwinded notation, however. We agree further that the value of shall

P 7 H RG

V ) jjinh Uk p k Rj jh

k jih

(3.278)

U k jinh U k Rj jnh p

(3.277)

xe xe ye

xe

ye xe

hp k Rj jh

A 8 h h P F@A

(3.276)

) i 0 v(

) i 0 v(

V U A h h @A 5bh4h$G P 7 H RG ) ) ) S A h h A 8 h h P 5 h 4 h P 7 H RA 9@@A b$G RG

) i 0 v(

A 8 h h P F9@@bA

&

k jih

For every a D a a 1.

t,

if there is a b

D such that a b

1 then also

1D

AA h h A 8 h h P F98bA

V 3

1 yV U V

U k Rj jnh p

U Y3

A h h RA

274

Categorial Grammar and Formal Semantics

(3.279)

e e e t e t e

xe x e ye

xe

ye xe

The sentences or are grammatical, and their meaning is and . The syntactic categories possess an equivalent in syntactic terminology. e for example is the category of proper names. The category e t is the category of intransitive verbs and the category e t e is the category of transitive verbs. This minilanguage can be extended. For example, we can introduce the word by means of the following constant mode.

The reader is asked to verify that now is an intransitive verb, whose meaning is the complement of the meaning of . So, is true iff is false. This is perhaps not such a good example, since the negation in English is formed using the auxiliary . To give a better example, we may introduce by the following mode.

In this way we have a small language which can generate innitely many grammatical sentences and which assigns them correct meanings. Of course, English is by far more complex than this. The real advance that Montague made was to show that one can treat quantication. Let us take a look at how this can be done. (Actually, what we are going to outline right now is not Montagues own solution, since it is not in line with Categorial Grammar. We will deal with Montagues approach to quantication in Chapter 4.) Nouns like and are not proper names does not denote a single but semantically speaking unary predicates. For individual but a class of individuals. Hence, following our conventions, the semantic type of and is e t. Syntactically speaking this corresponds to either t e or e t. Here, no decision is possible, for neither
4 #

0 aV U

V U

4 9H h A g 7 f

4 9

) V u U aV u $aV u a V U u UU)

h A g F7 f

4 #

(3.281)

e t

e t

e t

xe

ye

z e xe

ze

ye

ze

P 7 H RG

A 8 h h P F8bA

9 IH

A 8 h h P A P 7 H F9@@bRG

4 g 9 F@A A 8 h h P 0 aV U

(3.280)

e t

e t ee

x e xe

xe

kt F'bp u

be the constant and the value of nally our 0ary modes.

58h4#hG AR8#G h h A 5 h 4 h A 8 h h P A 5 h 4 h 9@@8#G 0aV ) lkjinh U ) V u U ) ( A h h 0 aV U k pRj jnh ) u ) RA ( A) 8 h h P 9@@A ( 0 k F'bp t ) 5 h 4 h 0 k @@Ip r q ) ) 8$$G ( P 7 H RG V k F'baF6b6jih t p ) k t p U k V u U V k 8@R6U k Rj jh r q p p

5 h 4 h 8$$G

the constant

. Here are

) V u $aV u ) g ( U u U 4 9

r q k @@Ip

9 RH

g 9 F98bA A 8 h h P

g 9

Montague Semantics I
4 9

275

nor is a grammatical sentence. Montague did not solve this problem; he introduced a new category constructor , which allows to distinguish a category t e from t e (the intransitive verb) even though they are not distinct in type. Our approach is simpler. We introduce a category n and stipulate that n : e t. This is an example where the basic categories are different from the (basic) semantic types. Now we say that the subject quantier has the sentactic category t e t n. This means the following. It forms a constituent together with a noun, and that constituent has the category t e t . This therefore is a constituent that needs an intransitive verb to form a sentence. So we have the following constant mode.

Let us give an example.

The syntactic analysis is as follows.

This induces the following constituent structure.

Now we shall have to insert the meanings in place of the words and calculate. This means converting into normal form. For by convention, a constituent has the meaning that arises from applying the meaning of one immediate part to the meaning of the other. That this is now welldened is checked by the syntactic analysis. We calculate in several steps. is a constituent and its meaning is

V aV

V

V U k 5 q u U U k 5hu q UU a

(3.287)

V aV U k Hhu q

V aV U UVV aaV U

UVaV U

~U 

xe ye ye

Further,

is a constituent with the following meaning

y e t xe xe t xe ye t xe xe xe x e xe xe ye t xe xe xe ye t xe

V k F'ba) t p

U k jinh

V k F6b6aV t p U V

U k jinh

4 #

5bh h U

(3.286)

x e ye

ye xe

ye

5 h 4 h G A h h 8#A

V aV

U dV

U aU

(3.285)

(3.284)

V u U

5 h 4 h G A h h 8$Y@A

V u U aV u U U V

t e t n t e t

e t e e t

5 h 4 h b$G

A h h RA

4 #

4 #

5 8h h

5 bh h

(3.283)

ye

xe

0V aaV U

V U

U 

5 h 4 h A h h A 4 8$$G @#H

) aV u U ) V U

5 bh h

(3.282)

t e t

n xe

ye

xe xe

aV u U U V

V u U

V U

5 bh h

5 8h h

P 7 H RG
`

P 7 H IG

xe

ye

xe

276

Categorial Grammar and Formal Semantics

Now we combine these two: (3.288) xe xe

ye

ye

xe

xe

This is the desired result. Similarly to

we dene

If we also want to have quantiers for direct objects we have to introduce new modes. (3.290) (3.291)

e t

e t e
e t

e t

e t
t

e t e
e t

e t

For as a direct object is analyzed as a constituent which turns a transitive verb into an intransitive verb. Hence it must have the category e t e t e . From this follows immediately the category assignment for . Let us look at this using an example.
`

The constituent structure is as follows.

From this we get for

is analogous to

V aV U

@V U k 5 q u U

(3.296)

ye

(3.295)
4 9

xe

xe

xe

ye

xe

V aV

U UV V aV

p ` 4 #H 5 bh h V aV ) U k jiV U k i$d )U h hr 1 U k jinh U V U k i$d )U hr 1

y e xe y e xe

xe

xe ye ye xe

e t

e t

ye xe

xe ye

V aV

U V U

` g hAF7 f 8h A 5 h A h h U k i$d )U ~ hr 1

(3.294)

ye

The meaning of

is, as is easily checked, the following: xe ye xe ye

y e xe

VVV aaaV

(3.293)

h A g F7 f

h A g R7 f bh h 5 ` U U yV p f aU U 5 8h h RA #H h Ig A A h h 4

(3.292)

0V U aaV V U

V

xe

ye

y e xe xe

xe

ye

0V U aaV V U

V

xe

ye

y e xe xe

xe

ye

xe ye xe ye

0V aaV

V

U 

h A g F7 f

U ) aV V u aU V u a) g V U UU f h U  ~ ) aV V u aU V u a) V U UU ` 8h 5

(3.289)

t e t

n xe

ye

xe xe

xe

V t p aV k F'ba)
: ye
t

` f 8h h h Ig A 5 V t p aV k F6ba) U k jiV U k 5 h q u U V UV t p aV aV k F'ba) U k jinh U V U k 5 q u U U k jinh UV aV U V U k 5hu ~ q

ye xe xe

xe

xe

ye

xe

ye

ye

5 h A h h A 4 8h 9H

) aV u U ) Ig ( V U f h A

f h Ig A

4 #

V V u aU V 8h u U h U 5

U ~ ~ A h p ( ( D D

xe

5 8h h H p
`

f h g A

Montague Semantics I

277

We combine (3.296) and (3.295).

xe

ze

xe ze

One can see that the calculations require some caution. Sometimes variables may clash and this calls for the substitution of a variable. This is the case for example when we insert a term and by doing so create a bound occurrences of a variable. The calculus is employed to do this work for us. (On the other hand, if we used plain functions here, this would again be needless.) Montague used the cancellation interpretation for his calculus, hence the sequent formulation uses the calculus . We have seen that this calculus can also be rendered into a sign grammar, which has two modes, forward application ( ) and backward application ( ). In syntactic theory, however, the most popular version of grammar is the LambekCalculus. However, the latter does not lend itself easily to a compositional interpretation. The fault lies basically in the method of hypothetical assumptions. Let us see why this is has category n n, and its type is e t e t . so. An adjective like (This is not quite true, but good enough for illustration.) This means that it , but not relational nouns such as or can modify nouns such as . Let us assume that the latter have category n g (where g stands for the category of a genitive argument). Now, in Natural Deduction style LambekCalculus we can derive a constituent by rst feeding it a hypothetical argument and then abstracting over it. n n: . . . n g: g:y

(3.298)

n:

n: n g : y

This allows us, for example, to coordinate and and then compose with . Notice that this proof is not available in . There also is a sign based analogue of this. Introduce binary modes and : (3.299)
I

x M

y x

y x x Mx

0aV 0 aV

U) u ) Ru ( i i V a0 D U ) ) i ( i V a0

) ) 0 i() ) ) 0 i()

x M

y x

x y x Mx

V U

9 h D n5

V aV
I

9 h D n5

UVV aaV
X X

G 5 7 g

h 9 D f X g @G 5$7 g RFD!hcFD 9 R V aV U k $b3 I6U k b4 t r d 2 o 2 V aV U k $b3 II6U k b2 t r d 2 o V U k tr$b3 IIo d 2 d 2 k tr$b3 Io k b2 G 5 $7 g F!9 R D h R FD

R D 9$

h 9 R c9D

5 bH

R FD

) ) aU i( ) ) aU i(

D D

(3.297)

xe

y e xe xe xe

y e xe ze

VV aaV ) U k jihV U k i$d hr ) U k jihV U k i$d ) hr 1U VaaV ) U k jiV U k V h VV aaV U


xe ye xe xe

1U ) V U k 5 ~ q u U ~ U V U k 5 q u U h r d 1U i$) ~ U @V U k 5 q u U

ye

xe

xe

ye

xe

U @G 5 7 g
R D 9$

ye xe

xe

h 9
I

278

Categorial Grammar and Formal Semantics

A condition on the application of these modes is that the variable x actually occurs free in the term. Now introduce a new 0ary mode with exponent c , which shall be a symbol not in the alphabet.

Consider the structure term (3.301)

These modes play the role of hypothetical arguments in Natural Deduction style derivations. However, the combined effect of these modes is not exactly the same as in the LambekCalculus. The reason is that abstraction can only be over a variable that is introduced right or left peripherally to the constituent. However, if we introduce two arguments in succession, we can abstract over them in any order we please, as the reader may check (see the exercises). The reason is that c bears no indication of the name of the variable that it introduces. This can be remedied by introducing instead the following 0ary modes.

Notice that these empty elements can be seen as the categorial analogon of traces in Transformational Grammar (see Section 6.5). Now the exponent reveals the exact identity of the variable and the LambekCalculus is exactly mirrorred by these modes. The price we pay is that there are structure terms whose exponents are not pronounceable: they contain elements that are strictly speaking not overtly visible. The strings are therefore not surface strings. Notes on this section. Already in (Harris, 1963) the idea is defended that one must sometimes pass through nonexistent strings, and TG has made much use of this. An alternative idea that suggests itself is to use combinators. This route has been taken by Steedman in (1990; 1996). For example, the addition of the modes and assures us that we can derive the these constituents as well. Steedman and Jacobson emphasize in their work also

) )

(3.303)

c i x i

0 aV U k 7$b$ Io k b4 t r d 2 2

) )

(

(3.302)

n g xg 0

0 k 7$b$ Ia) ) t r d 2 o

Here, : n n and : n g dition that it is denite, it has the following unfolding.

@G 5 $7 g

R D F!

h 9 fD

bP

6 @ 6 @ A A

0
. On conxg 0

G 5 7 g

0 k 8B) ) 2

) )

R D 9!

( h 9

9 7 @8

R 9D

R 9D

(

(3.300)

c x i

R T

Montague Semantics I

279

that variables can be dispensed with in favour of combinators. See (Jacobson, 1999; Jacobson, 2002) (and references therein) for a defense of variable free semantics. For a survey of approaches see (B ttner and Th mmel, 2000). o u Exercise 130. Write an ABgrammar for predicate logic over a given signature and a given structure. Hint. You need two types of basic categories: e and t, which now stand for terms and truthvalues. Exercise 131. The solutions we have presented here fall short of taking certain aspects of orthography into account. In particular, words are not separated by a blank, sentences do not end in a period and the rst word of a sentence is written using lower case letters only. Can you think of a remedy for this situation?
6
U I I

Exercise 132. Show that with the help of it is possible to derive the sign

and

from the sign


`

Exercise 133. We have noted earlier that , and are polymorphic. The polymorphicity can be accommodated directly by dening polyadic operations in the calculus. Here is how. Call a type tnal if it has the following form: (a) t, or (b) , where is tnal. Dene , and by induction. Similarly, for every type dene functions and t that interpret the existential and universal quantier. of type Exercise 134. A (unary) generalized quantier is a function from propert). Examples are ties to truth values (so, it is an object of type e t and , but there are many more: (3.306) (3.307) (3.308)

First, give the semantics of each of the generalized quantiers and dene a sign for them. Now try to dene the semantics of . (It takes a

f h Ig A

'

9 RH

h g 5 f

g 9

0 U U aV V V U k RC $n

5 g RH 9

p G AB 5 g 4 $RD h 4 h 5 g 8h f @h ` RH 5 7 9 9 h 9 G G h h 4 9 4 h g @5 RH 5 f

) V u ) U

(3.305)

e t e e x y z

0 U U aV V V U k RC $n
z y x

) V u ) U h

D qR

(3.304)

e t e e x y z

z x y

and the 0ary modes

D qR

5 bh h

&

280

Categorial Grammar and Formal Semantics

number and forms a generalized quantier.) Exercise 135. In CCG , many (but not all) substrings are constituents. We should therefore be able to coordinate them with . As was noted for example by Eisenberg in (1973), such a coordination is constrained (the brackets enclose the critical constituents).

The constraint is as follows. x y z is wellformed only if both x z and y z are. The suggestion is therefore that rst the sentence x z y z is formed and then the rst occurrence of z is deleted. Can you suggest a different solution? Note. The construction is known as forward deletion. The more common backward deletion gives x z y, and is far less constrained.

i i i 9 i RH

i i

9 RH

i d i

(3.310)

h #H A 4

D $

H A

(3.309)

h #H A 4

D 

9 IH

H bA

5 H

i i l i 9 RH G ybh f H1h 5 f D A 4 A h  4 A H e i G G e 5bH 9 RH 4 #H 4 D H A b9 g f D A 4 A h G $D y'58h @f n51h 4 A i G G e I 9 IH 4 #H 4 D H A $9 g

V "uU

i i

Chapter 4 Semantics
1. The Nature of Semantical Representations

This chapter lays the foundation of semantics. In contrast to much of the current semantical theory we shall not use a modeltheoretic approach but rather an algebraic one. As it turns out, the algebraic approach helps to circumvent many of the difculties that beset a modeltheoretic analysis, since it does not try to spell out the meanings in every detail, only in as much detail as is needed for the purpose at hand. In this section we shall be concerned with the question of feasibility of interpretation. Much of semantical theory simply denes mappings from strings to meanings without assessing the question whether such mappings can actually be computed. While on a theoretical level this gives satisfying answers, one still has to address the question how it is possible that a human being can actually understand a sentence. The question is quite the same for computers. 2. However, Mathematicians solve the equation x 2 2 by writing x this is just a piece of notation. If we want to know whether or not 3 2 6, this requires calculation. This is the rule rather than the exception (think of trigonometric functions or the solutions of differential equations). However, hope is not lost. There are algorithms by which the number 2 can be approximated to any degree of precision needed, using only elementary operations. Much of mathematical theory has been inspired by the need to calculate difcult functions (for example logarithms) by means of elementary ones. Evidently, even though we do not have to bother any more with them thanks to computers, the computer still has to do the job for us. Computer hardware actually implements sophisticated algorithms for computing nonelementary functions. Furthermore, computers do not compute with arbitrary degree of precision. Numbers are stored in xed size units (this is not necessary, but the size is limited anyhow by the size of the memory of the computer). Thus, they are only close to the actual input, not necessarily equal. Calculations on the numbers propagate these errors and in bad cases it can happen that small errors in the input yield astronomic errors in the output (problems that have this property independently of any algorithm that computes the solution are called illconditioned). Now, what reason do we have to say that a

D Eg

282

Semantics

particular machine with a particular algorithm computes, say, 2? One answer could be: that the program will yield exactly 2 given exact input and enough time. Yet, for approximative methods the ones we generally have to use the computation is never complete. However, then it computes a series of numbers an , n , which converges to 2. That is to say, if 0 is any real number (the error) we have to name an n such that for all n n : an 2 , given exact computation. That an algorithm computes such a series is typically shown using pure calculus over the real numbers. This computation is actually independent of the way in which the computation proceeds as long as it can be shown to compute the approximating series. For example, to compute 2 using Newtons method, all you have to do is to write a program that calculates
1

For the actual computation on a machine it matters very much how this series is calculated. This is so because each operation induces an error, and the more we compute the more we depart from the correct value. Knowing the error propagation of the basic operations it is possible to compute exactly, given any algorithm, with what precision it computes. To sum up, in addition to calculus, computation on real machines needs two things: a theory of approximation, and a theory of error propagation. Likewise, semantics is in need of two things: a theory of approximation, showing us what is possible to compute and what not, and how we can compute meanings, and second a theory of error propagation, showing us how we can determine the meanings in approximation given only limited resources for computation. We shall concern ourselves with the rst of these. Moreover, we shall look only at a very limited aspect, namely: what meanings can in principle be computed and which ones cannot. We have earlier characterized the computable functions as those that can be computed by a Turing machine. To see that this is by no means an innocent assumption, we shall look at propositional logic. Standardly, the semantics of classical propositional logic is given as follows. (This differs only slightly and the set from the setup of Section 3.2.) The alphabet is of variables V : . A function : V 2 is called a valuation. We

T ) ) r) q ) t)s b) @iS

U V r s q

U v

8 k

(4.1)

an

an

a2 n

2 2an

D Fv

The Nature of Semantical Representations

283

extend to a mapping from the entire language to 2.

(4.2)

To obtain from this a compositional interpretation for the language we turn matters around and dene the meaning of a proposition to be a function from valuations to 2. Let 2V be the set of functions from V to 2. Then for every proposition , denotes the function from 2V to 2 that satises

(The reader is made aware of the fact that what we have performed here is akin to type raising, turning the argument into a function over the function that applies to it.) Also can be dened inductively.

(4.4)

Now notice that V is innite. However, we have excluded that the set of basic modes is innite, and so we need to readjust the syntax. Rather than working with only one type of expression, we introduce a new type, that of a register. . Then V G. Valuations are now Registers are elements of G : functions from G to 2. The rest is as above. Here is now a sign grammar for propositional logic. The modes are (0ary), , , , (all unary), and (binary). The exponents are strings over the alphabets, categories are either R or P, and meanings are either registers (for expressions of category R) or sets of functions from registers to 2 (for expressions of category P).

(4.5c) (4.5d) (4.5e) (4.5f)

xRy xRx

: : :

Ry

xP

x PM

y PN

x y PM N

x PM

x P 2V

0 I 0 

Va0 D Va0 D Va0 D Va0 V a0

) ) 0 ) ) aU H i() i( ) ) a7H i(U i6) ) a i(U i!) ) aU i( i !) ) a$ i(U

(4.5b)

xRy

) 8  ( ) i i ) 8 ( ) i 0 l i ) ) ( i ) i W i!) W ( ) i W i!)  W ( 0 ) ) (

(4.5a)

Ry

7H

IzU V s

p :

: p

t j D v j D V U S D

V U

V ! U

(4.3)

V U
1 p V

t V U V I U D V U v V I U D V U V U D
p : p

284

Semantics

It is easily checked that this is welldened. This denes a sign grammar that meets all requirements for being compositional except for one: the functions on meanings are not computable. Notice that (a) valuations are innite objects, and (b) there are uncountably many of them. However, this is not sufcient as an argument because we have not actually said how we encode sets of valuations as strings and how we compute with them. Notice also that the notion of computability is dened only on strings. Therefore, meanings too must be coded as strings. We may improve the situation a little bit by assuming that valuations are functions from nite subsets of G to 2. Then at least valuations can be represented as strings (for example, by listing pairs consisting of a register and its value). However, still the set of all valuations that make a given proposition true is innite. On the other hand, there is an algorithm that can check for any given partial function whether it assigns 1 to a given register (it simply scans the string for the pair whose rst member is the given register). Notice that if the function is not dened on the register, we must still give an output. Let it be . We may then simply take the code of the Turing machine computing that function as the meaning the variable (see Section 1.7 for a denition). Then, inductively, we can dene for every proposition a machine T that computes the value of under any given partial valuation that gives a value for the occurring variables, and assigns otherwise. Then we assign as the meaning of the code T of that Turing machine. However, this approach suffers from a number of deciencies. First, the idea of using partial valuations does not always help. To see this let us now turn to predicate logic (see Section 3.8). As in the case of propositional logic we shall have to introduce binary strings for registers, to form variables. The meaning of a formula is by denition a function from pairs to 0 1 , where is a structure and a function from variables to the domain of . Again we have the problem to name nitary or at least computable procedures. We shall give two ways of doing so that yield quite different results. The rst attempt is to exclude innite models. Then , and in particular the domain M of , are nite. A valuation is a partial function from V to M with a nite domain. The meaning of a term under such a valuation is a member of M or . (For if x is in t, and if is not dened on x then t is undened.) The meaning of a formula is either a truth value or . The truth values can be inductively dened as in Section 3.8. M has to be nite, since we usually cannot compute the value of x x without knowing all values of x . This denition has a severe drawback: it does not give the correct results.

{ D

T ) S

0 l( )

The Nature of Semantical Representations

285

For the logic of nite structures is stronger than the logic of all structures. For example, the following set of formulae is not satisable in nite structures while it has an innite model. (Here is a 0ary function symbol, and a unary function symbol.) Proposition 4.1 The theory T is consistent but has no nite model.

Proof. Let be a nite model for T . Then for some n and some k 0: sn k 0 sn 0. From this it follows with the second formula that s k 0 0. Since k 0, the rst formula is false in . There is, however, an innite model for these formulae, namely the set of numbers together with 0 and the successor function. We remark here that the logic of nite structures is not recursively enumerable if we have two unary relation symbols. (This is a theorem from (Trakht nbrodt, 1950).) However, the logic of all structures is clearly recure sively enumerable, showing that the sets are very different. This throws us into a dilemma: we can obviously not compute the meanings of formulae in a structure directly, since quantication requires search throughout the entire structure. (This problem has once worried some logicians, see (Ferreir s, o 2001). Nowadays it is felt that these are not problems of logic proper.) So, once again we have to actually try out another semantics. The rst route is to let a formula denote the set of all formulae that are equivalent to it. Alternatively, we may take the set of all formulae that follow from it. (These are almost the same in boolean logic. For example, can be dened using and ; and can be dened by . So these approaches are not very different. However the second one is technically speaking more elegant.) This set is again innite. Hence, we do something different. We shall take a formula to denote any formula that follows from it. (Notice that this makes formulae have innitely many meanings.) Before we start we seize the opportunity to introduce a more abstract theory. A propositional language is a language of formulas generated by a set V of variables and a signature. The identity of V is the same as for boolean logic above. As usual, propositions are considered here as certain strings. The language is denoted by the letter L. A substitution is given by a map : V L. denes a map from L to L by replacement of occurrences of variables by their image. We denote by the result of applying to .

T  (   I  )  ) )  I  ) bI  I  3QFIF@b  @P@$gS

(4.6)

T:

286

Semantics

Denition 4.2 A consequence relation over L is a relation L L such that the following holds. (We write for the more complicated .)
~

.
~ #

is called structural if from follows for every substitution. is nitary if implies that there is a nite subset of such that . In the sequel consequence relations are always assumed to be structural. A . is nitary if rule is an element of L L, that is, a pair is nite; it is nary if n. Given a set R of rules, we call R the least structural consequence relation containing R. This relation can be explicitly dened. Say that is a 1step Rconsequence of if there is a substitution and some rule R such that and . Then, an nstep consequence of is inductively dened. Proposition 4.3 R iff there is a natural number n such that is an nstep Rconsequence of . The reader may also try to generalize the notion of a proof from a Hilbert calculus and show that they dene the same relation on condition that the rules are all nitary. We shall also give an abstract semantics and show its completeness. The notion of an algebra has been dened. Denition 4.4 Let L be a propositional logic over the signature . A matrix D , where is an algebra (the algebra of for L and is a pair truth values) and D a subset of A, called the set of designated truth values. Let h be a homomorphism from . We write h if V into D and say that is true under h in . Further, we write if h for all homomorphisms h : V : if h D then h D.
~ ~ ~ ~ 5

Notice that in boolean logic is the 2element boolean algebra and D 1 , but we shall encounter other cases later on. Here is a general method for obtaining matrices.

T S

Proposition 4.5 If lation.

is a matrix for L,

is a structural consequence re-

1 dV U ) 0 l(

0 ) (

V U

V U

0 ( )

eD V U

1 Q0 ) (

If
~

and ;

, then ;

If

and

then

. .

e QV U

1 0 ) ( 1 QV U

The Nature of Semantical Representations


~

287

Denition 4.6 Let L be a propositional language and a consequence relation. Put : : . is deductively closed if . It is consistent if L. It is maximally consistent if it is consistent but no proper superset is. A matrix is canonical for if for some set . (Here, V V is the canonical algebra with carrier set L whose functions are just the associated string functions.) It is straightforward to verify that . Now consider some set and a formula such that . Then put : and let h be the identity. Then h , but h V by denition of . So, . This shows the following.
~

Theorem 4.7 (Completeness of Matrix Semantics) Let consequence relation over L. Then

be a structural

(The reader may verify that an arbitrary intersection of consequence relations again is a consequence relation.) This theorem establishes that for any consequence relation we can nd enough matrices such that they together characterize that relation. We shall notice also the following. Given and D , then iff D is closed under the consequence. (This is pretty trivial: all it says is that if and h is a homomorphism, then if D we must have h D.) Such sets are called lters. Now, let h D be a matrix, and a congruence on . Suppose that for any x: x D or x D . Then we call admissible for and put : D , where D : x : x D . The following is easy to show.

Finally, call a matrix reduced if only the diagonal is an admissible congruence. Then, by Proposition 4.8 and Theorem 4.7 we immediately derive that every consequence relation is complete with respect to reduced matrices. One also calls a class of matrices a (matrix) semantics and says is adequate . for a consequence relation if Now, given L and , the system of signs for the consequence relation is this.

T ~ i 0 ) ) @T i i i( S s

1 i 0 6) ) @S i i(

(4.8)

P :

x R x :x

x Py :x

Y `D

Proposition 4.8 Let . Then

be a matrix and an admissible congruence on

0 ~

@S

1 %V U

S X(

( }D 0 ( ) } yD

0 ( )

(4.7)

canonical for

D T} S

1 V U

0 ) R V U

dl(

S U

V WD

0 R V U )

V U

R D

dl(

288

Semantics

How does this change the situation? Notice that we can axiomatize the consequences by means of rules. The following is a set of rules that fully axiomatizes the consequence. The proof of that will be left to the reader (see the exercises), since it is only peripheral to our interests. (4.9b) (4.9c) (4.9d) (4.9e) (4.9f) (4.9g)

p1 : mp :

With each rule we can actually associate a mode. We only give examples, since the general scheme for dening modes is easily extractable. (4.10) (4.11)

If we have as a primitive symbol then the following mode corresponds to the rule mp , Modus Ponens. This is satisfactory in that it allows to derive all and only the consequences of a given proposition. A drawback is that the functions on the exponents are nonincreasing. They always return x. The structure term of the sign x P y on the other hand encodes a derivation of y from x. Now, the reader may get worried by the proliferation of different semantics. Arent we always solving a different problem? Our answer is indirect. The problem is that we do not know exactly what meanings are. Given a natural language, what we can observe more or less directly is the exponents. Although it is not easy to write down rules that generate them, the entities are more or less concrete. A little less concrete are the syntactic categories. We have already seen in the previous chapter that the assignment of categories to strings (or other exponents, see next chapter) are also somewhat arbitrary. We shall return to this issue. Even less clearly denable, however, are the meanings. What, for example, is the meaning of (4.13)?
G

(4.13)

0 !) ) ( i i

0 ) ) ( i i

p 9 g D @h 7 p

V i i()0 i i a0 ) ) ( ) ) ag i(U f e

(4.12)

0 i R i) ) ( i V i i() i i( a0 l) ) 0 !) ) aU d P D 0 !) ) ( i i 0 i I ) ) aU i( b a c@ D

xP

x Py

xP y z

0l)bTIRd'!( ) S 0 ) T S $lbbR'!( 0 ) T S lbbR'!( 0R'bq !( ) T ) S 8 0 ) T ) S $lbbR!( 0 ) T S IRbF!( 0 ) T S lbd!(


y : x Py x Pz : x Py :

(4.9a)

p h A RA g 5 bb@H 5 H A h

D D D D D D

d : dn : u : c : p0 :

xP y z

x Pz

The Nature of Semantical Representations

289

The rst answer we have given was: a truth value. For this sentence is either true or false. But even though it is true, it might have been false, just in case Caesar did not cross the Rubicon. What makes us know this? The second answer (for rstorder theories) is: the meaning is a set of models. Knowing what the model is and what the variables are assigned to, we know whether that sentence is true. But we simply cannot look at all models, and still it seems that we know what (4.13) means. Therefore the next answer is: its meaning is an algorithm, which, given a model, tells us whether the sentence is true. Then, nally, we do not have to know everything in order to know whether (4.13) is true. Most facts are irrelevant, for example, whether Napoleon was French. On the other hand, suppose we witness Caesar walk across the Rubicon, or suppose we know for sure that rst he was north of the Rubicon and the next day to the south of it. This will make us believe that (4.13) is true. Thus, the algorithm that computes the truth value does not need all of a model; a small part of it actually sufces. We can introduce partial models and dene algorithms on them, but all this is a variation on the same theme. A different approach is provided by our last answer: a sentence means whatever it implies. We may cast this as follows. Start with the set L of propositions and a of models. A primary (or model theoretic) semantics is set (or class) given in terms of a relation L . Most approaches are variants of the primary semantics, since they more or less characterize meanings in terms of facts. However, from this semantics we may dene a secondary semantics, which is the semantics of consequence. iff for all M : if M for all then M . (We say in this case that entails .) Secondary semantics is concerned only with the relationship between the objects of the language, there is no model involved. It is clear that the secondary semantics is not fully adequate. Notice namely that knowing the logical relationship between sentences does not reveal anything about the nature of the models. Second, even if we knew what the models were: we could not say whether a given sentence is true in a given model or not. It is perfectly conceivable that we know English to the extent that we know which sentences entail which other sentences, but still we are unable to say, for example, whether or not (4.13) is true even when we witnessed Caesar cross the Rubicon. An example might make this clear. Imagine that all I know is which sentences of English imply which other sentences, but that I know nothing more about their actual meaning. Suppose now that the house is on re. If I realize this I know that I am in danger and I act accordingly. However, suppose that someone shouts

h p1

h ie

} d

290

Semantics

(4.14) at me. Then I can infer that he thinks (4.14) is true. This will certainly make me believe that (4.14) is true and even that (4.15) is true as well. But still I do not know that the house is on re, nor that I am in danger.
X G

(4.14) (4.15)

Therefore, knowing how sentences hang together in a deductive system has little to do with the actual world. The situation is not simply remedied by knowing some of the meanings. Suppose I additionally know that (4.14) means that the house is on re. Then if I see that the house is on re then I know that I am in danger, and I also know that (4.15) is the case. But I still may fail to see that (4.15) means that I am in danger. It may just mean something else that is being implied by (4.14). This is reminiscent of Searles thesis that language is about the world: knowing what things mean is not constituted by an ability to manipulate certain symbols. We may phrase this as follows.
Indeterminacy of secondary semantics. No secondary semantics can x the truth conditions of propositions uniquely for any given language.

Searles claims go further than that, but this much is perhaps quite uncontroversial. Despite the fact that secondary semantics is underdetermined, we shall not deal with primary semantics at all. We are not going to discuss what a word, say, really means we are only interested in how its meaning functions language internally. Formal semantics really cannot do more than that. In what is to follow we shall sketch an algebraic approach to semantics. This contrasts with the far more widespread modeltheoretic approach. The latter may be more explicit and intuitive, but on the other hand it is quite inexible. We begin by examining a very inuential principle in semantics, called Leibniz Principle. We quote one of its original formulation from (Leibniz, 2000) (from Specimen Calculi Coincidentium, (1), 1690). Eadem vel Coincidentia sunt quae sibi ubique substitui possunt salva veritate. Diversa quae non possunt. Translated it says: The same or coincident are those which can everywhere be substituted for each other not affecting truth. Different are those that cannot. Clearly, substitution must be understood here in the context of sentences, and we must assume that what we substitute is constituent

h 5 ID

y 5 h R 9 '8$$RH 9 D f H G 9 g D F7 g h A h A h

X $ D

The Nature of Semantical Representations

291

occurrences of the expressions. We therefore reformulate the principle as follows.


Leibniz Principle. Two expressions A and B have the same meaning iff in every sentence any occurrence of A can be substituted by B and any occurrence of B by A without changing the truth of that sentence.

To some people this principle seems to assume bivalence. If there are more than two truth values we might interpret Leibniz original denition as saying that substitution does not change the truth value rather than just truth. (See also Lyons for a discussion.) We shall not do that, however. First we give some unproblematic examples. In second order logic ( , see Chapter 1), the following is a theorem.

Hence, Leibniz Principle holds of second order logic with respect to terms. There is general no identity relation for predicates, but if there is, it is dened according to Leibniz Principle: two predicates are equal iff they hold of the same individuals. This requires full second order logic, for what we want to have is the following for each n (with Pn and Qn variables for nary relations):

(Here, x abbreviates the ntuple x0 xn 1 .) (4.16) is actually the basis for Montagues type raising. Recall that Montague identied an individual with the set of all of its properties. In virtue of (4.16) this identication does not conate distinct individuals. To turn that around: by Leibniz Principle, this identication is onetoone. We shall see in the next section that boolean algebras of any kind can be embedded into powerset algebras. The background of this proof is the result that if there are two elements x, y in a boolean 2 we have h x h y , then algebra and for all homomorphisms h : x y. (More on that in the next section. We have to use homomorphisms here since properties are functions that commute with the boolean operations, that is to say, homomorphisms.) Thus, Leibniz Principle also holds for boolean semantics, dened in Section 4.6. Notice that the proof relies on the Axiom of Choice (in fact the somewhat weaker Prime Ideal Axiom), so it is not altogether innocent.

V U

V U

VV i aaV U

i V U

U i ~U V 

) iaaa)

U V

~U V

~ U

(4.17)

Pn

Qn Pn

Qn

x Pn x

VV aaV U

V U V U

~U 

U ~U ~ V V U

(4.16)

y x

P Px

Py

Qn x

q r

292

Semantics

We use Leibniz Principle to detect whether two items have the same meaning. One consequence of this principle is that semantics is essentially unique. If : A M, : A M are surjective functions assigning meanings to expressions, and if both satisfy Leibniz Principle, then there is a bijection : M M such that and 1 . Thus, as far as formal semantics is concerned, any solution is as good any other. As we have briey mentioned in Section 3.5, we may use the same idea to dene types. This method goes back to Husserl, and is a key ingredient to the theory of compositionality by Wilfrid Hodges (see his (2001)). A type is a class of expressions that can be substituted for each other without changing meaningfulness. Hodges just uses pairs of exponents and meanings. If we want to assimilate his setup to ours, we may add a category U, and let U : U. However, the idea is to do without for every mode f , f U categories. If we further substract the meanings, we get what Hodges calls a grammar. We prefer to call it an Hgrammar. (The letter H honours Hodges here.) Thus, an Hgrammar is dened by some signature and corresponding operations on the set E of exponents, which may even be partial. An Hsemantics is a partial map from the structure terms (!) to a set M of meanings. Structure terms and are synonymous if is dened on both and . We write to say that and are synonymous. (Notice iff is dened on .) An Hsemantics is equivalent to if that . An Hsynonymy is an equivalence relation on a subset of the set of structure terms. We call that subset the eld of the Hsynonymy. Given an Hsynonymy , we may dene M to be the set of all equivalence classes of , and set : iff is in that subset, and undened otherwise. Thus, up to equivalence, Hsynonymies and Hsemantics are in onetoone extends if the eld of contains the eld correspondence. We say that of , and the two coincide on the eld of . Denition 4.9 Let G be an Hgrammar and an Hsemantics for it. We write iff for every structure term with a single free variable x, x is meaningful iff x is meaningful. The equivalence classes of are called the categories. This is the formal rendering of the meaning categories that Husserl denes. Denition 4.10 and its associated synonymy is called Husserlian if for then all structure terms and : if . is called Husserlian if it is Husserlian.
p p

k R

k R

V aaa) U )

k b

k b

V U 8ys

V !PU

k R

V 8U

The Nature of Semantical Representations

293

It is worthwhile to compare this denition with Leibniz Principle. The latter denes identity in meaning via intersubstitutability in all sentences; what must remain constant is truth. Husserls meaning categories are also dened by intersubstitutability in all sentences; however, what must remain constant is the meaningfulness. We may connect these principles as follows. Denition 4.11 Let Sent be a set of structure terms and Sent. We call sentential if Sent, and true if . is Leibnizian if for all structure terms and : iff for all structure terms such that x also x and conversely. Under mild assumptions on it holds that Leibnizian implies Husserlian. The following is from (Hodges, 2001). Theorem 4.12 (Hodges) Let be an Hsemantics for the Hgrammar G. Suppose further that every subterm of a meaningful structure term is again meaningful. Then the following are equivalent.

Furthermore, if is Husserlian then the second already holds if it holds for n 1. It is illuminating to recast the approach by Hodges in algebraic terms. This allows to compare it with the setup of Section 3.1. Moreover, it will also give a proof of Theorem 4.12. We start with a signature . The set Tm X forms an algebra which we have denoted by X . Now select a subset D Tm X of meaningful terms. It turns out that the embedding i : D Tm X : x x is a strong homomorphism iff D is closed under subterms. We denote the induced algebra by . It is a partial algebra. The map : D M induces an equivalence relation . There are functions f : M f M and into a homomorphism iff is a weak that make M into an algebra congruence relation (see Denition 1.21 and the remark following it). This is the rst claim of Theorem 4.12. For the second claim we need to investigate the structure of partial algebras.

V U

V U

ut

V U V U }

xi : i

ut

If

is a structure term and xi : i n and i xi : i n i i then

i,

(i n) are structure terms such that n are both meaningful and if for all xi : i n

For each mode f there is an f ary function f : M that is a homomorphism of partial algebras.

M such

1 T

V U

k @

 k 1

1 T k D

294

Semantics

Proposition 4.14 Let be a partial algebra. (a) is a strong congruence relation on . (b) A weak congruence on is strong iff it is contained in . Proof. (a) Clearly, is an equivalence relation. So, let f F and a i ci for all i f . We have to show that f a f c , that is, for all g Pol 1 : g f a is dened iff g f c is. Assume that g f a is dened. The function g f x 0 a1 a f 1 is a unary polynomial h0 , and h0 a0 is dened. By denition of , h0 c0 g f c 0 a1 a f 1 is also dened. Next, (4.18) h 1 x1 : f g c 0 x1 a2 a
f 1

is a unary polynomial and dened on a 1 . So, it is dened on c1 and we have h1 c1 f g c 0 c1 a2 a f 1 . In this way we show that f g c is dened. (b) Let be a weak congruence. Suppose that it is not strong. Then there is a polynomial f and vectors a c A f with ai ci (i f ) such that f a is dened but f c is not. Now, for all i f ,

if both sides are dened. Now, f a is not congruent to f c . Hence there is an i f such that the left hand side of (4.19) is dened and the right hand side is not. Put

Then h ai is dened, h ci is not, but ai ci . So, . Conversely, if is strong we can use (4.19) to show inductively that if f a is dened, so are all members of the chain. Hence f c is dened. And conversely. Proposition 4.15 Let be a partial algebra and an equivalence relation on . is a strong congruence iff for all g Pol 1 and all a c A such that a c: g a is dened iff g c is, and then g a g c . The proof of this claim is similar. To connect this with the theory by Hodges, notice that is the same as . is Husserlian iff .

1 )

v y}

V U

V 6U i

v xp

V U

V U

V U ) iaaa) ) ) ) iaaa) U

V U i

V U

V U

V U

(4.20)

hx :

f a0

ai

x ci

V U i ) iaaa) ) )

f a0

)iaaa) U

) aaa) ) ) ) aaa)

V 6U i

V U

(4.19)

f a0

ai

ai ci

ai

ci ci

V U

V i aV U U

Denition 4.13 Let be a partial algebra. Put x y (or simply x if for all f Pol1 : f x is dened iff f y is dened.

y)

V U

V U

V i aV 6U U V U i

V aV iaaa) ) ) U U ) V U D VaV )iaaa) ) U U V U v D V aV iaaa) ) V i aV U U

V U

v wV

1 !) i i

i 'U

V aV

V U i

) iaaa)

V U

V U )
v

U U

V U

V U

V 6U i

) U U V aV i'U U V U

The Nature of Semantical Representations

295

Propositions 4.14 and 4.15 together show the second claim of Theorem 4.12. If is the only operation, we can actually use this method to dene the types (see Section 3.5). In the following sections we shall develop an algebraic account of semantics, starting rst with boolean algebras and then going over to intensionality, and nally carrying out the full algebraization. Notes on this section. The idea that the logical interconnections between sentences constitute their meanings is also known as holism. This view and its implications for semantics is discussed by Dresner (2002) . We shall briey also mention the problem of reversibility (see Section 4.6). Most formalisms are designed only for assigning meanings to sentences, but it is generally hard or impossible to assign a sentence that expresses a given content. We shall briey touch on that issue in Section 4.6. Exercise 136. Prove Proposition 4.8. be a rule. Devise a mode that captures the Exercise 137. Let effect of this rule in the way discussed above. Translate the rules given above into modes. What happens with 0ary rules (that is, rules with )? Exercise 138. There is a threefold characterization of a consequence: as a consequence relation, as a closure operator, and as a set of theories. Let be a consequence relation. Show that is a closure operator. The closed sets are the theories. If is structural the set of theories of are inversely closed under substitutions. That is to say, if T is a theory and a substitution, then 1 T is a theory as well. Conversely, show that every closure gives rise to a consequence relation and that the conoperator on V sequence relation is structural if the set of theories is inversely closed under substitutions.
~

Exercise 140. Show that for any given nite signature the set of predicate logical formulae valid in all nite structures for that signature is corecursively enumerable. (The latter means that its complement is recursively enumerable.) Exercise 141. Let L be a rstorder language which contains at least the symbol for equality ( ). Show that a rstorder theory T in L satises Leibniz

Exercise 139. Show that the rules (4.9) are complete for boolean logic in and .

0 ) (

VaV U QlU

Proposition 4.16 congruence.

is Husserlian iff it is contained in

iff it is a strong

296

Semantics

Principle if the following holds for any relation symbol r

and the following for every function symbol f :

Use this to show that the rstorder set theory satises Leibniz Principle. Further, show that every equational theory satises Leibniz Principle.

2.

Boolean Semantics

Boolean algebras are needed in all areas of semantics, as is demonstrated in (Keenan and Faltz, 1985). Boolean algebras are the structures that correspond to propositional logic in the sense that the variety turns out to be generated from just one algebra: the algebra with two values 0 and 1, and the usual operations (Theorem 4.33). Moreover, the calculus of equations and the usual deductive calculus mutually interpret each other (Theorem 4.36). This allows to show that the axiomatization is complete (Theorem 4.39). , where 0 1 B, : B B and Denition 4.17 An algebra B 0 1 : B2 B, is called a boolean algebra if it satises the following equations for all x y z B.

The operation is generally referred to as the meet (operation) and as the join (operation). x is called the complement of x and 0 the zero and 1 the

vU t 'V

v 'U

vU 's

s Uv s

vU s 'V

v 'U D D

V v'Uv V t Uv t

(li ) x x (ne ) x 1 (dm ) x y (dn ) x

(ui ) x x (ne0) x 0 (dm ) x y

s V U t

t @V U s

x y 0 x x x

x z

x y 1 x

s V

D D V s DU V

D V

s s U

t s U t s U

t V

t UD

D V

t t U

s t U s t U

vU 't

(as (co (id (ab (di

) ) ) ) )

x y z x y x x x y x x y z

x y y x x x

(as (co (id (ab (di

) ) ) ) )

x y z x y x x x y x x y z

x y y x x x

s D s U D V D

1 )

i @ y

| { @dz

 i

0 s) t) v y!!i) ) ) (

" V U ~ T

t D t U D V D

1 ) )

(4.22)

T;

x i yi : i

f x

i @
f y z x z

( 5

" BV U i ~ T

(4.21)

T;

x i yi : i

r x

r y

s) !t

Boolean Semantics

297

one or unit. Obviously, the boolean algebras form an equationally denable class of algebras. The laws (as ) and (as ) are called associativity laws, the laws (co ) and (co ) commutativity laws, (id ) and (id ) the laws of idempotence and (ab ) and (ab ) the laws of absorption. A structure L satisfying these laws is called a lattice. If only one operation is present and the corresponding laws hold we speak of a semilattice. (So, a semilattice is a semigroup that satises commutativity and idempotence.) Since and are associative and commutative, we follow the general practice and omit brackets whenever possible. So, rather than x y z we simply write x y z. Also, x y x is simplied to x y. Furthermore, given a nite set S L the notation x : x S or simply S is used for the iterated join of the elements of S. This is uniquely dened, since the join is independent of the order and multiplicity in which the elements appear.

Notice that x y iff x y x. This can be shown using the equations above. We leave this as an exercise to the reader. Notice also the following.

Proof. (a) x x x, whence x x. (b) Suppose that x y and y x. Then we get x y x and y x y, whence y x y x. (c) Suppose that x y and y z. Then x y y and y z z and so x z x y z x y z y z z. Let x y z. Then, since x x y, we have x z by (c); for the same reason also y z. Now assume x z and y z. Then x z y z z and so z z z x z y z x y z, whence x y z. Similarly, using x y iff x y x. In fact, it is customary to dene a lattice by means of . This is done as follows. Denition 4.20 Let be a partial order on L. Let X L be an arbitrary set. The greatest lower bound (glb) of X, also denoted X, is that element u such that for all z: if x z for all x X then also u z (if it exists). Analogously, the least upper bound (lub) of X, denoted by X, is that element v such that for all z: if x z for all x X then also v z (if it exists).

s V

s U

s s U

s bV

s U

s bV U s

s D

t sD U D

s D

x y iff z

x and z

x y

z iff x

Lemma 4.19

is a partial ordering. z and y z. y.

Denition 4.18 Let

be a lattice. We write x

y if x y

y.

0 s t y!) !) ( t

V aV

t t U U

V aV

t t U U

298

Semantics

Notice that there are partial orderings which have no lubs. For example, let L 0123 , where (4.23)

Here, 0 1 has no lub. This partial ordering does therefore not come from a lattice. For by the facts established above, the join of two elements x and y is simply the lub of x y , and the meet is the glb of x y . It is left to the reader to verify that these operations satisfy all laws of lattices. So, a partial order is the order determined by a lattice structure iff all nite sets have a least upper bound and a greatest lower bound. The laws (di ) and (di ) are the distributivity laws. A lattice is called distributive if they hold in it. A nice example of a distributive lattice is the following. Take a natural number, say 28, and list all divisors of it: 1, 2, 4, 7, 14, 28. Write x y if x is a divisor of y. (So, 2 14, 2 4, but not 4 7.) Then turns out to be the greatest common divisor and the least common multiple. Another example is the linear lattice dened by the numbers n with the usual ordering. is then the minimum and the maximum. which is a lattice with reA bounded lattice is a structure L 0 1 spect to and , and in which satises (ne ) and (ne ). From the denition of , (ne ) means that x 1 for all x and (ne ) that 0 x for all x. Every nite lattice has a least and a largest element and can thus be extended to a bounded lattice. This extension is usually done without further notice. Denition 4.21 Let L be a lattice. An element x is join irreducible in if for all y and z such that x y z either x y or x z. x is meet irreducible if for all y and z such that x y z either x y or x z. It turns out that in a distributive lattice irreducible elements have a stronger property. Call x meet prime if for all y and z: from x y z follows x y or x z. Obviously, if x is meet prime it is also meet irreducible. The converse is generally false. Look at M3 shown in Figure 11. Here, c a b 0 , but neither c a nor c b holds. Lemma 4.22 Let be a distributive lattice. Then x is meet (join) prime iff x is meet (join) irreducible. Let us now move on to the complement. (li ) and (ui ) have no special name. They basically ensure that x is the unique element y such that x y 0 and x y 1. The laws (dm ) and (dm ) are called de Morgan laws. Finally, (dn ) is the law of double negation.

( T0 ) ()y0 ) ()y0 ) ()y0 ) (y)0 ) ()0 ) ()0 ) ()0 ) @S D 0 ) bT ) ) ) !( S


: 00 02 03 11 12 13 22 33

T ) S

s t 0 s) t y!!) ) ) (

sD

0 s t y!) !) (

T ) S

T ) S

Boolean Semantics

299

Figure 11. The Lattice M3

Lemma 4.23 The following holds in a boolean algebra.

x y x y , whence Proof. x y means x y y, and so y y x. From y x we now get x x y y. If x y then x y x, and so x y x y y x 0 0. Conversely, suppose y 0. Then x y x y x y x y y x 1 x. that x So, x y. It is easily seen that x y 0 iff x y 1. We can use the terminology of universal algebra (see Section 1.1). So, the notions of homomorphisms and subalgebras, congruences, of these structures should be clear. We now give some examples of boolean algebras. The rst example is the powerset of a given set. Let X be a set. Then X is a boolean algebra with in place of 0, X in place of 1, A X A, and the intersection and union. We write X for this algebra. A subalgebra of this algebra is called a eld of sets. Also, a subset of X closed under the boolean operations is called a eld of sets. The smallest examples are the algebra 1 : , consisting just of one element ( ), and 2 : , the algebra of subsets of 1 . Now, let X be a set and B01 be a boolean algebra. Then for two functions f g : X B we may dene f , f g and f g as follows. f x : g x : g x : f x (4.24) f f f x f x gx gx

Further, let 0 : X B : x 0 and 1 : X B : x 1. It is easily veried that the set of all functions from X to B form a boolean algebra: B X 0 1 .

v 0vi)!!) ) ) ( s) t D w V T w SU II!f D

0 s) t) v y!!i) ) )

vU t v 'V D 'U

V aV

V U

s v V 'U v'Us D Ut V vU aV 't D t V vD v D v v U V s v

V U

V 'Ut v D UsIV t U t v 'UtV t U D V 't vU D v v

Dv

V f U

s @V

V U V U t V V U D V U V U s V V U D V U v V V U D

v 'U

T w IS

vU 't

V wU f

s U

t U

v 'U

vU 't

y iff x

0 iff

y iff

x. 1.

3 3 3

vD t t v

300

Semantics

We denote this algebra by X . The notation has been chosen on purpose: this indexed over X. A particular algebra is nothing but the direct product of case is 2. Here, we may actually think of f : X 2 as the characteristic 1 1 . It is then again veried that function M of a set, namely the set f M M , M N M N , M N M N . So we nd the following.

We provide some applications of these results. The intransitive verbs of English have the category e t. Their semantic type is therefore e t. This in turn means that they are interpreted as functions from objects to truth values. We assume that the truth values are just 0 and 1 and that they form a boolean algebra with respect to the operations , and . Then we can turn the interpretation of intransitive verbs into a boolean algebra in the way given above. Suppose that the interpretation of , and is also canonically extended in the given way. That is: suppose that they can now also be used for intransitive verbs and have the meaning given above. Then we can account for a number of inferences, such as the inference from (4.25) to (4.26) and (4.27), and from (4.26) and (4.27) together to (4.25). Or we can infer that (4.25) implies that (4.28) is false; and so on. (4.26) (4.28)

With the help of that we can now also assign a boolean structure to the transitive verb denotations. For their category is e t e, which corresponds to the type e e t . Now that the set functions from objects to truth values carries a boolean structure, we may apply the construction again. This allows us then to deduce (4.30) from (4.29). (4.30)
y d D q 9 4

Obviously, any category that nally ends in t has a space of denotations associated to it that can be endowed with the structure of a boolean algebra. (See also Exercise 133.) These are, however, not all categories. However, for the remaining ones we can use a trick used already by Montague. Montague

5 H G A 5 H  $h

5 H 8h P p

(4.29)

V u U

D q # 4

p ` 5 g d q#@bh @P D 5 4 H G A h h A 5 H G ` 5 H G A 5 H  $h 5 g @bh @P A h h A 5 H

(4.27)

(4.25)

g 9

5 g IH 9

y F d

V f U

Theorem 4.24 2X is isomorphic to

X .

s V U D

s t

A P H F 4

yd9P@H4 g h g 9 A 5 bh y A d P H 4 5 9@bh y A d P H 5 9@bh 9 H A d P H 5 Rc9@bh

H@P H@P H@P H @P

D v

Boolean Semantics

301

was concerned with the fact that names such as and denote objects, which means that their type is e. Yet, they ll a subject NP position, and subject NP positions can also be lled by (nominative) quantied NPs such as , which are of type e t t. In order to have homogeneous type assignment, Montague lifted the denotation of and to e t t. In terms of syntactic categories we lift from e to t e t . We have met this earlier in Section 3.4 as raising. Cast in terms of boolean algebras this is the following construction. From an arbitrary set X we rst form the boolean algebra X and then the algebra 2 X .

(4.32)

So, this licenses the inference from (4.31) to (4.33) and (4.34), as required. (We have tacitly adjusted the morphology here.) (4.33) (4.34)
yAFFRbFR} d P H 9 H A 7 A P H 5 h 4 h F 8#G y F d

It follows that we can make the denotations of any linguistic category a boolean algebra. The next theorem we shall prove is that boolean algebras are (up to isomorphism) the same as elds of sets. Before we prove the full theorem we shall prove a special case, which is very important in many applications. An atom is an element x 0 such that for all y x: either y 0 or y x. At denotes the set of all atoms of .

V k o@jlU k 8V k F6b6U k T@ q h r h q t t p q D V aV k T@qU k @jlaV k @yU k F'b6U o qh r hU t V q t p D V k T@V k @jt k F'b6U q U o q h r h t p

k F'bp t

9 H A 7 k F6IbI} t bp

We interpret now by interprets ilarly,

(4.31)

is the individual Peter. Simwhere . Then (4.31) means

V U

Proof. Suppose that x y. Then x x x x 1, while y x y 0. Thus x y . To see that this does the trick, consider the following sentence.

V U

V U

V U

V U

y d F

P H 9 H A 7 RbFR}

Proposition 4.25 The map x of X into 2 X .

x given by x f :

f x is an embedding

5 h 4 h 8$$G

9 H A 7 RR}

V

5 h 4 h 8#G U D

V f U

5 bh

g A g bD P D

V D D

9 H 5 h 4 h R8$$G

5 h 4 h 8$$G

f h Ig A

ko qhr @jh

9 H A 7 RR} D

V U

V u U

302

Semantics

Lemma 4.26 In a boolean algebra, an element is an atom iff it is join irreducible. This is easy to see. An atom is clearly join irreducible. Conversely, suppose that x is join irreducible. Suppose that 0 y x. Then

The map x x is a homomorphism: x At x. For let u be an atom. For any x, u u x u x ; and since u is irreducible, u u x or u u x , which gives u x or u x. But not both, since u 0. Second, x y x y, as is immediately veried. Now, if is nite, x is nonempty iff x 0.

Proof. Put x : x. Clearly, x x. Now suppose x x. Then x x 0. x x, whence u x. But u x , a contradicHence there is an atom u tion. A boolean algebra is said to be atomic if x is the lub x for all x.
Y

Now we proceed to the general case. First, notice that this theorem is false in general. A subset N of M is called conite if its complement, M N, is nite. Let be the set of all subsets of which are either nite or conite. Now, as is easily checked, contains , and is closed under complement, union and intersection. The singletons x are the atoms. However, not every set of atoms corresponds to an element of the algebra. A case in point is 2k : k . Its union in is the set of even numbers, which is neither nite nor conite. Moreover, there exist innite boolean algebras that have no atoms (see the exercises). Hence, we must take a different route. be a boolean algebra. A point is a homomorphism Denition 4.29 Let h: 2. The set of points of is denoted by Pt .

Theorem 4.28 Let be a nite boolean algebra. The map x morphism from onto At .

x is an iso-

t bV k

v 'U

@k

T S

t V k

V aV

Lemma 4.27 If

is nite, x

x.

v FV U

V aV

8k

vU 'xt xV U s

vU '

U f

t U

(4.36)

x:

At

:y

vD

By irreducibility, either y x or x y x. From the latter we get x or y x, using Lemma 4.19. Since also y x, y x x 0. So, y Therefore, x is an atom. Put

vU 't

V aV
y, 0.

D vU 't V U s

t U D

(4.35)

x y

V 't vU D vU 't s U V aV D
y x

Y Y

v 'U t D

v T

S S

Boolean Semantics

303

Notice that points are necessarily surjective. For we must have h 0 0 and h1 1. (As a warning to the reader: we will usually not distinguish 1 and 1.) Denition 4.30 A lter of is a subset that satises the following.

A lter F is an ultralter iff for all x: either x F or x F. For suppose neither is the case. Then let F x be the set of elements y such that there is a u F with y u x. This is a lter, as is easily checked. It is a proper lter: it does not contain x. For suppose otherwise. Then x u x for some u F. x. So, By Lemma 4.23 this means that 0 u x, from which we get u x F, since u F. Contradiction. Proposition 4.31 Let h : be a homomorphism of boolean algebras. Then Fh : h 1 1 is a lter of . Moreover, for any lter F of , F F is dened by x F y iff x y F is a congruence. The factor algebra also denoted by F and the map x x F by hF . It follows that if h : then Fh . Now we specialize to 2. Then 2, we have a lter h 1 1 . It is clear that this must be an ultralif h : ter. Conversely, given an ultralter U, U 2. We state without proof the following theorem. A set X B has the nite intersection property if for every nite S X we have S 0. Theorem 4.32 For every subset of B with the nite intersection property there exists an ultralter containing it.

Y Y

x y

x y

Y Y

(4.37)

x y

x y

Y '

V U

Now put x :
Y

Pt

:h x

1 . It is veried that

V U D

A lter F is called an ultralter if F F G B.


m m

D } Y

V " U

If x

F and x

y then y

1 )

If x y

F. F then x y F. F. B and there is no lter G such that

V U

V U 1 1

304

Semantics

To see the rst, assume h x. Then h x 1, from which h x 0, and so h x, that is to say h x. Conversely, if h x then h x 1, whence h x 1, showing h x. Second, h x y implies h x y 1, so h x 1 and h y 1, giving h x as well as h y. Conversely, if the latter holds then h x y 1 and so h x y. Similarly with . Theorem 4.33 The map x x is an injective homomorphism from into the algebra Pt . Consequently, every boolean algebra is isomorphic to a eld of sets. Proof. It remains to see that the map is injective. To that end, let x and y be two different elements. We claim that there is an h : 2 such that h x hy. For we either have x y, in which case x y 0; or we have y x, in which case y x 0. Assume (without loss of generality) the rst. There is y , by Theorem 4.32. Obviously, an ultralter U containing the set x x U but y U. Then hU is the desired point. We point out that this means that every boolean algebra is a subalgebra of a direct product of 2. The variety of boolean algebras is therefore generated by 2. The original representation theorem for nite boolean algebras can be extended in the following way (this is the route that Keenan and Faltz take). is called complete if any set has a least upper bound A boolean algebra and a greatest lower bound.

It should be borne in mind that within boolean semantics (say, in the spirit of Keenan and Faltz) the meaning of a particular linguistic item is a member of a boolean algebra, but it may at the same time be a function from some boolean algebra to another. For example, the denotations of adjectives form a boolean algebra, but they may also be seen as functions from the algebra of common noun denotations (type e t) to itself. These maps are, however, in general , can in not homomorphisms. The meaning of a particular adjective, say principle be any such function. However, some adjectives behave better than others. Various properties of such functions can be considered. Denition 4.35 Let be a boolean algebra and f : B B. f is called monotone iff for all x y B: if x y then f x f y . f is called antitone if f y . f is called restricting iff for each for all x y B: if x y then f x x Bf x x. f is called intersecting iff for each x B: f x x f 1.

Theorem 4.34 Let At .

be a complete atomic boolean algebra. Then

V U

V U

V U

V U D

P P H @4

V t U DV U V UD D V U

V U

v 1 s

dV U

vU 't

T V

v 'U

V U

vU 't S

f 0

V U

v f1 v 1
Y

v 1

1 )

V aV

v t

U f

yV U 1 )

V aV

t U V U

V 'U v D 1 U 1 1

U f

Boolean Semantics

305

Adjectives that denote intersecting functions are often also called intersective. An example is . A white car is something that is both white and a car. Hence we nd that is intersecting. Intersecting functions are restricting but not necessarily conversely. The adjective denotes a restricting function (and is therefore also called restricting). A tall student is certainly a student. Yet, a tall student is not necessarily also tall. The problem is that tallness varies with the property that is in question. (We may analyze it, say, as: belongs to the 10 % of the longest students. Then it becomes clear that it has this property.) Suppose that students of sports are particularly tall. Then a tall student of sports will automatically qualify as a tall student, but a tall student may not be a tall student of sports. On the other hand, if students of sports are particularly short, then a tall student will be a tall student of sports, but the converse need not hold. There are also adjectives that have or ). We will return none of these properties (for example, to sentential modiers in the next section. We conclude the section with a few remarks on the connection with theories and lters. Let be the signature of boolean logic: the 0ary symbols , , the unary and the binary and . Then we can dene boolean algebras by means of equations, as we have done with Denition 4.17. For reference, . Or we may actually dene a consequence we call the set of equations relation, for example by means of a Hilbertcalculus. Table 10 gives a complete set of axioms, which together with the rule MP axiomatize boolean logic. Call this calculus . We have to bring the equational calculus and the deductive calculus into correspondence. We have a calculus of equations (see Section 1.1), which tells us what equations follow from what other equations. in place of . Write Theorem 4.36 The following are equivalent.

The proof is lengthy, but routine. and are equivalent by the fact that an algebra is a boolean algebra iff it satises . So, needs proof. It rests on the following

H ~ gy D H 8 ~ 8

Lemma 4.37 (a) (b) iff


~

iff

. .

y

$Q

For every boolean algebra :

8w~

. .

P P H @4

h R h P P 7@@H

h RA g

8 8 

7 A

U V

k ' 8 $Q

h 4 $FD | "r U

$Q

306

Semantics

Table 10. The Axioms of Propositional Logic

p0 p0

p0

p0

p0

p0

Proof. (a) Suppose that . Since we get . we get . Conversely, if , then Similarly, from with (a8) we get and with (a6) and MP, . (b) We can take advantage of our results on BAs here. Put a b : a b a b . The claim boils down to a b 1 iff a b. Now, if a b 1, then a b 1, from which a b, and also a b 1, from which b a. Together this gives a b. Conversely, if a b then a b b b 1 and a b a a 1, showing a b 1. The next thing to show is that if ; then also . Finally, for all of the form (a1) (a12), . This will show that implies . is an immediate consequence. For the converse direction, rst we establish that for all basic equations of we have . This is routine. Closure under substitution is guaranteed for theorems. So we need to show that this is preserved by the inference rules of Proposition 1.12, that is:

V iU

V i U

T V U

(4.38d)

i : i

(4.38c)

(4.38b)

(4.38a)

D V

cH 8 H 8~

v s

s v v s @V U t

H p~ $Q

V aV

v s

s D 'U v

U V

H U

H ~ $Q

8~

V aV

UU aV

$Q

v s

V

V V

p0 p1 p 0 p1 p 0 p1 p0 p0 p1 p0 p 0 p1

p0 p0 p1 p1 p1 p2

p1

p0

p2

p1

p2

V aV

U "V

V UU V aaV

V U V U aU U U U

8#~

H 8

U U U

H 8

8~

U aU

(a0) (a1) (a2) (a3) (a4) (a5) (a6) (a7) (a8) (a9) (a10) (a11) (a12)

p0 p0 p0

p1 p1 p1

p0 p2 p0

p0

p1

p0

p2

8~

H ~ p$Q

y

Boolean Semantics

307

In the last line, f is one of the basic functions. The verication is once again routine. We shall now show that the sodened logic is indeed the logic of the two element matrix with designated element 1. By DT (which holds in ), iff and .

is a congruence on the term algebra. What is more, it is admissible for every deductively closed set. For if is deductively closed and , then also for every , by Modus Ponens. Lemma 4.38 V is a boolean algebra. Moreover, if is a deductively closed set in is a lter on V then V . If is maximally consistent, is an ultralter. Conversely, if F is a lter on 1 V , then h F is a deductively closed set. If F is an ultralter, this set is a maximally consistent set of formulae. is the intersection of all is a boolean algebra and Thus, F , where F a lter. Now, instead of deductively closed sets we can also take maximal (consistent) deductively closed sets. Their image under the canonical map is an ultralter. However, the equivalence U : x y :x y U is a congruence, and it is admissible for U. Thus, we can once again factor it out and obtain the following completeness theorem.
)

This says that we have indeed axiomatized the logic of the 2valued algebra. What is more, equations can be seen as statements of equivalence and conversely. We can draw from this characterization a useful consequence. Call a propositional logic inconsistent if every formula is a theorem. Corollary 4.40 is maximally complete. That is to say, if an axiom or rule is not derivable in , is inconsistent. Proof. Let be a rule that is not derivable in . Then by Theorem 4.39 there is a valuation which makes every formula of true but false. Dene the following substitution: p : if p 1, and p : otherwise. Then for every , , while . Hence, as derives for every , it also derives , and so . On the other hand, in , everything follows from . Thus, is inconsistent.

V U

V U V U

| "r

g | r V U

H V U

V U

g | "r

| r

0 ) (

8~

Theorem 4.39

2 1

V U

( 0 ) @S

8
~

V U

( 0 ) @S

d V U

| r

V U

(4.39)

| r

V U

| "r | "r

308

Semantics

Notes on this section. The earliest sources of propositional logic are the writing of the Stoa, notably by Chrysippos. Stoic logic was couched in terms of inference rules. The rst to introduce equations and a calculus of equations was Leibniz. The characterization of in terms of union (or intersection) is explicitly mentioned by him. Leibniz only left incomplete notes. Later, de Morgan, Boole and Frege have completed the axiomatization of what is now known as Boolean logic. Exercise 143. For a lattice L dene d : L . Show that dd . d is called the dual lattice of . this is lattice as well. Obviously, The dual of a lattice term t d is dened as follows. xd : x if x is a variable, t t d : t d t d , t t d : t d t d . Evidently, s t iff d sd t d . d d holds in every lattice. Deduce that s t holds in every lattice iff s t Exercise 144. (Continuing the previous exercise.) For a boolean term dene additionally 0d : 1, 1d : 0, t d : t d and d : B 1 0 for d B01 . Show that . This implies that s t iff sd t d . Exercise 145. Prove Lemma 4.22. Exercise 146. Let be a partial ordering on L with nite lubs and glbs. Dene x y : lub x y , and x y : glb x y . Show that L is a lattice. Exercise 147. Let be the set of entire numbers. For i j and j 2 i let Ri j : m 2i j : m . Let H be the set of subsets of generated by all nite unions of sets of the form Ri j . Show that H forms a eld of sets (hence a boolean algebra). Show that it has no atoms.

3.

Intensionality

Leibniz Principle has given rise to a number of problems in formal semantics. One such problem is its alleged failure with respect to intensional contexts. This is what we shall discuss now. The following context does not admit any substitution of A by a B different from A without changing the truth value of the entire sentence.
G G

(4.40)

y $

B 4 9 g $97h f bh A H D A A h 5 8 6 h H A

0 vD ) t s i!) !) ) ) (

0 s) t y!!) (

0 t) s !!) (

1 )

T ) S

0 s )D t !!) (

4 D

A $ B 4

k s

V 'U v

9 g 7@$#h D A A h 5 8 6 h

Exercise 142. Show that x

y iff x y

x.

Vk t U

T ) S

0 v) s) t i!!) ) ) D (

k Dt D g

Vk s U D

Intensionality

309

Obviously, if such sentences were used to decide about synonymy, no expression is synonymous with any other. However, the feeling with these types of sentences is that the expressions do not enter with their proper meaning here; one says, the expressions A and B are not used in (4.40) they are only mentioned. This need not cause problems for our sign based approach. We might for example say that the occurrences of A where A is used are occurrences with a different category than those where A is mentioned. If we do not assume this we must exclude those sentences in which the occurrences of A or B are only mentioned, not used. However, in that case we need a criterion for deciding when an expression is used and when it is mentioned. The picture is as follows. Let S x be shorthand for a sentence S missing a constituent x. We call them contexts. Leibniz Principle says that A and B have identical meanS B is true for all S x . Now, let be the ing, in symbols A B, iff S A set of all contexts, and the set of all contexts where the missing expression is used, not mentioned. Then we end up with two kinds of identity: SA

Obviously, . Generalizing this, we get a Galois correspondence here between certain sets of contexts and equivalence relations on expressions. Contexts outside of are called hyperintensional. In our view, (4.40) does not contain occurrences of the language signs for A and B but only occurrences of strings. Strings denote themselves. So, what we have inserted are not the same signs as the signs of the language, and this means that Leibniz Principle is without force in example (4.40) with respect to the signs. , we get the actual However, if put into the context meaning of A that the language gives to it. Thus, the following is once again transparent for the meanings of A and B: B
G G G G

(4.43)

A hyperintensional context is
G G e

(4.44)

What John thinks here is that the expression denotes a special kind of leaet, where in fact it denotes a kind of manuscript. Although this

y  4

A H R 9 D 9 H !Rh f h f bh H A

A h H h P h 5 A 4 A h A D P H 8 4 8P X @$bH 1F8 f #H

4 5

A h A D P H R8 f $@8

R D ! E

9 9 H I$h f h

A H

y $B 4 9 g 7$#7h D A A h 5 8 6 B 9 g 7$#7h 4 D A A h 5 8 6 h

4 k  D d

A 9 F

G 9 g

(4.42)

B:

Sx

SB

VaV U V aV U

"V U V U V U V U

1QV U 1 QV U

~U ~ U

(4.41)

B:

Sx

SA

SB

V U

V U

V U

V U

310

Semantics

is a less direct case of mentioning an expression, it still is the case that the sign with exponent is not an occurrence of the genuine English language sign, because it is used with a different meaning. The meaning of that sign is once again the exponent (string) itself. There are other problematic instances of Leibniz Principle, for example the socalled intensional contexts. Consider the following sentences. (4.45) (4.46) (4.47) (4.48) (4.49)
e e s e y 9 4 A A h RH @@P G G ` G g g X g 4 g 5cbh #H A $D h 5 H 7 A 4 4 4 A kh $h h D P 9  G g 5 h 5 H 7 A G X g 4 @g c$bbh y 9 4 A h P A RH kA@@cD ` y 5 H 4 A 'b5!ERh h R 9 D 9 G G G ` G h 4A$D5bH1$85 gf h #H kh $h 9 g 4 A R 9 D 9 4 4 4 A h D P y 5 H 4 A 'b5!Eb5 f R 9 D 9 g G G ` G 4#H 4 4 A kh $h h D P 9 g G A 5 H 4 A R 9D 9 g  b5!Eb5 f h G h 4A$Db1$85 f h 5 H 4 A R 9 D 9 g ` G 5 H 4 A R 9D 9 h b5!ERh h y ' 4 D 4 1

It is known that (4.45) is true. However, it is quite conceivable that (4.46) may be true and (4.47) false. By Leibniz Principle, we must assume that and have different meaning. However, as Frege points out, in this world they refer to the same thing (the planet Venus), so they are not different. Frege therefore distinguishes reference (Bedeutung) from sense (Sinn). In (4.45) the expressions enter with their reference, and this is why the sentence is true. In (4.46) and (4.47), however, they do not enter with their reference, otherwise John holds an inconsistent belief. Rather, they enter with their senses, and the senses are different. Thus, we have seen that expressions that are used (not mentioned) in a sentence may either enter with their reference or with their sense. The question is however the same as before: how do we know when an expression enters with its sense rather than its reference? The general feeling is that one need not be worried by that question. Once the sense of an expression is given, we know what its reference is. We may think of the sense as an algorithm that gives us the reference on need. (This analogy has actually been pushed by Yannis Moschovakis, who thinks that sense actually is an algorithm (see (Moschovakis, 1994)). However, this requires great care in dening the notion of an algorithm, otherwise it is too ne grained to be useful. Moschovakis shows that equality of meaning is decidable, while equality of denotation is not.) Contexts that do not vary with the sense only with the reference of

5 H 4 A R 9D 9 h b5!ERh h

A h A D P H R8 f $@8
7 ` G G 4

5 H 4 A R 9D 9 g 5!E85 f

Intensionality

311

their subexpression are called extensional. Nonextensional contexts are intensional. Just how ne grained intensional contexts are is a difcult matter. For example, it is not inconceivable that (4.48) is true but (4.49) is false. Since 2 1 5 we expect that it cannot be otherwise, and that one cannot even believe otherwise. This holds, for example, under the modal analysis of belief by Hintikka (1962). Essentially, this is what we shall assume here, too. The problem of intensionality with respect to Leibniz Principle disappears once we realize that it speaks of identity in meaning, not just identity in denotation. These are totally different things, as Frege rightly observed. Of course, we still have to show how meaning and denotation work together, but there is no problem with Leibniz Principle. Intensionality has been a very important area of research in formal semantics, partly because Montague already formulated an intensional system. The inuence of Carnap is clearly visible here. It will turn out that equating intensionality with normal modal operators is not always helpful. Nevertheless, the study of intensionality has helped enormously in understanding the process of algebraization. , where the boolean symbols are used as beLet A : fore and is a unary symbol, which is written before its argument. We form expressions in the usual way, using brackets. The language we obtain shall and as well as typical shortbe called LM . The abbreviations hands (omission of brackets) are used without warning. Notice that we have a propositional language, so that the notions of substitution, consequence relation and so on can be taken over straightforwardly from Section 4.1. Denition 4.41 A modal logic is a subset L of L M which contains all boolean tautologies and which is closed under substitution and Modus Ponens. L is called classical if from L follows that L, monotone if from L follows L. L is normal if for all L M (a) L, (b) if L then L. The smallest normal modal logic is denoted by , after Saul Kripke. A quasi normal modal logic is a modal logic that contains . One also denes

and calls this the dual operator (see Exercise 144). is usually called a necessity operator, a possibility operator. If is an axiom and L a (normal) modal logic, then L (L ) is the smallest (normal) logic containing L . Analogously the notation L , L for a set is dened.

td

(4.50)

T ) ) ) r) q ) t)s 8) @ijS

1 V

U V 1

S s

312

Semantics
~

Denition 4.42 Let L be a modal logic. Then L is the following consequence relation. L iff can be deduced from L using (mp) only. L is the consequence relation generated by the axioms of L, the rule (mp) and the rule (mn): . L is called the local consequence relation, the global consequence relation associated with L. L It is left to the reader to verify that this indeed denes a consequence relation. the rule (mn) is by denition admissible. HowWe remark here that for ever, it is not derivable (see the exercises) while in it is, by denition. Before we develop the algebraic approach further, we shall restrict our attention to normal logics. For these logics, a geometric (or model theoretic) semantics has been given. where F is a set, the set of Denition 4.43 A Kripkeframe is a pair F worlds, and F 2 , the accessibility relation. A generalized Kripkeframe is a triple F where F is a Kripkeframe and F a eld of sets closed under the operation on F dened as follows:
d

(4.52)

(One often writes x , suppressing and .) Furthermore, the local frame consequence is dened as follows. if for every and x: if x for every then x . This is a consequence relation. Moreover, are valid. Furthermore, the axioms and rules of

For if x ; and x y then y ; , from which y . As y was arbitrary, x . Finally, suppose that . Then . For choose x and . Then for all y such that x y: y , by assumption. Hence x . Since x and were arbitrarily chosen, the conclusion follows. g if for all : if Dene F for all , then also F.

V U

C0 ) ( )

V U

U V

f g

(4.53)

0 ) ( )

C0 ) ( )

x x x

C0 ) ( ) 50 ) ( ) V U 1

f X

"0 ) ( ) y C0 ) ( ) y C0 ) ( ) "0 ) ( )

p p V x x ; for all y : if x y then

0 ) ) (

Call a valuation into a general Kripkeframe .

| "r 0 ) ( )

(4.51)

A:

x : for all y : if x

y then y

a function : V

V U

0 ) (

V U

0 ) (

0 Rt

#~

s ) S bT !(

0)) ( } c

F0 ) ( )

 

Intensionality

313

This is the global frame consequence determined by . For a class of frames we put

: . We noticed that this is a normal modal Proof. We put L : logic if is one membered. It is easy to see that this therefore holds for all classes of frames. Clearly, since both L and have a deduction theorem, they are equal if they have the same tautologies. This we have just shown. For g . Moreover, the global consequence relation, notice rst that L : L is the smallest global consequence relation containing L , and similarly g the smallest global consequence relation containing . We shall give some applications of modal logic to the semantics of natural language. The rst is that of (meta)physical necessity. In uttering (4.55) we suggest that (4.56) obtains whatever the circumstances. Likewise, in uttering (4.57) we suggest that there are circumstances under which (4.58) is true.

(4.56) (4.57) (4.58)

as an operator on senThe analysis is as follows. We consider tences. Although it appears here in postverbal position, it may be rephrased by , which can be iterated any number of times. The same can be done with , which can be rephrased as and turns out to be the dual of the rst. We disregard questions of form here and represent sentential operators simply as and , prexed to the sentence in question. is a modal operator, and it is normal. For example, if A and B are both necessary, then so is A B, and conversely. Second, if A is logically true, then A is necessary. Necessity has been modelled following to Carnap by frames of the form W W W . Metaphysically possible worlds should be possible no matter what is the case (that is, no matter which world

h 8P

A A g

D 7

8 D

A 

4 1D

(4.55)

p 8qbh h9 P D 5 H A A p G y 6 D 09n5 g 4#$!9D h R 5 8h U 9$h X h 4 g H b8H h 4 H 9 A 5 H A h p ` G G y'6FDn5 g 4hR!9D 5bh U 9H$h X h h H 4 g 9 4 FD f b8H h 4 R 5 H A h x G v 8 y 9 4 5 h 4 H h 5 R A IH 89$$@$D p v 8 P D 5 H A A h 9 A bnbh $D

9 RH

5 h 4 H h 5 b#8R

Theorem 4.44 For every class g L . Moreover, L.

of frames there is a modal logic L such that

1 f

w

R FD

4 #

p 5 H A A h 9 A D 4 bh FD

Analogously,

is the intersection of all

0 1 f
g,

f g(

(4.54)

V hD

4 #

314

Semantics

we are in). It turns out that the interpretation above yields a particular logic, called .

We defer a proof of the fact that this characterizes . Hintikka (1962) has axiomatized the logic of knowledge and belief. Write KJ to represent the proposition John knows that and B J to represent the proposition John believes that . Then, according to Hintikka, both turn out to be normal modal operators. In particular, we have the following axioms.

(4.63) (4.64) (4.65)

Further, if is a theorem, so is BJ and KJ . Now, we may either study both operators in isolation, or put them together in one language, which now has two modal operators. We trust that the reader can make the necessary amendments to the above denitions to take care of any number of operators. We can can then also formulate properties of the operators in combination. It turns out, namely, that the following holds.

The logic of BJ is known as : p p and it is the logic of all transitive Kripkeframes; KJ is once again . The validity of this set of axioms for the given interpretation is of course open to question. A different interpretation of modal logic is in the area of time. Here there is no consensus on how the correct model structures look like. If one believes in determinism, one may for example think of time points as lying on the real line . Introduce an operator by

"0 k ) 2P( ) )

` 0 ) 2P( ) )

(4.67)

for all t

t:

P n i

l m

(4.66)

KJ

BJ

a U U !

(4.62)

(4.61)

! V V ! V aU

U !

(4.60)

BJ

BJ BJ BJ B J BJ KJ KJ KJ KJ KJ KJ KJ KJ KJ KJ

logical omniscience for belief positive introspection for belief logical omniscience factuality of knowledge positive introspection negative introspection

i P

TW

k)

S %

(4.59)

i P

0 ) 2P(

i P

p p

Intensionality

315

One may read as it will always be the case that . Likewise, may be read as it will at least once be the case that . The logic of is

to be read as it has always been the case that . Finally, reads it has been the case that . On , has the same logic as . We may also study both operators in combination. What we get is a bimodal logic (which is simply a logic over a language with two operators, each dening a modal logic in p p; p p. The details its own fragment). Furthermore, need not be of much concern here. Sufce it to say that the modelling of time with the help of modal logic has received great attention in philosophy and linguistics. Obviously, to be able to give a model theory of tenses is an important task. Already Montague integrated into his theory a treatment of time in combination with necessity (as discussed above). We shall use the theory of matrices to dene a semantics for these logics. We have seen earlier that one can always choose matrices of the form V , deductively closed. Now assume that L is classical. Then put L if L. This is a congruence relation, and we can form the factor algebra along that congruence. (Actually, classicality is exactly the condition that induces a congruence relation.) It turns out that this algebra is a boolean algebra and that is interpreted by a function on that boolean algebra (and by a function ). Denition 4.45 A boolean algebra with (unary) operators (BAO) is an algebra A 0 1 such that i : A A for all i . i :i
e o

If furthermore L is a normal modal logic, morphism.

turns out to be a socalled hemi-

Denition 4.46 Let be a boolean algebra and h : B B a map. h is called x y. a hemimorphism if (i) 1 1 and (2) for all x y B: x y A multimodal algebra is an algebra M 01 :i , i is a boolean algebra and i , i , a hemimorphism where M 0 1 on it.

e 0 a0 e( ) v ) s) t ri!!) ) ) ( V U sIV U e V t U e e t 1 ) D

"0 k ) 2P( ) )

0 ) i2P(

0 a0

` i2P( 0 )

` 0 ) 2P( ) )

(4.69)

for all t

Alternatively, we may dene an operator

by t:

0 ) 2P( ) )

S 1

e( ) v ri)

0 v) s) t i!!) ) )

0 ) ( 2P%p s) t !!) ) ) (

(4.68)

: for all x :

0 V U )

dl(

316

Semantics

We shall remain with the case 1 for reasons of simplicity. A hemimorphism is thus not a homomorphism (since it does not commute with ). The modal algebras form the semantics of modal propositional logic. We also have to look at the deductively closed sets. First, if L then iff . So, we can factor by L . It turns out that , being closed under (mp), becomes a lter of the boolean quotient algebra. Thus, normal modal logics are semantically complete with respect to matrices F , where is a modal algebra and F a lter. We can rene this still further to F being an ultralter. This is so since if L there actually is a maximally consistent set of formulae that contains but not , and reduced by L this turns into an ultralter. Say that if U for all ultralters U on . Since x is in all ultralters iff x 1, we have exactly if for all homomor1. (Equivalently, iff 1 .) Notice phisms h into , h that if for all h: h h . Now we shall apply the representation theory of the previous section. A boolean algebra can be represented by a eld of sets, where the base set is the set of all ultralters (alias points) over the boolean algebra. Now take a modal algebra . Underlying it we nd a boolean algebra, which we can represent . Now, for two by a eld of sets. The set of ultralters is denoted by U ultralters U, V put U V iff for all x U we have x V . Equivalently, U V iff x V implies x U. We end up with a structure U , where is a binary relation over U and U a eld of sets closed under the operation A A. is an algebra if it satises the axioms given above. A modal algebra Let be an algebra and U V W ultralters. Then (a) U U. For let x U. Then x U since p p. Hence, U U. (b) Assume U V and V W . We show that U W . Pick x W . Then x V and so x U. Since p p, we have x U. (c) Assume U V . We show V U. To this end, pick x U. Then x U. Hence x V , by denition of . Hence we nd that is an equivalence relation on U . More exactly, we have shown the following.

p iff

T

p iff

p iff

is reexive. is transitive. is symmetric.

V lU

Proposition 4.47 Let above.

be a modal algebra, and

0 ) )V d lU (

be dened as

q Tq

0 l( )

0 S) QIT 'l(

VV alU U

1 V lU

V lU

V lU

V U

0 l( )

) )

V U

qe q

e P W i

V U D D

i P

ud

Intensionality

317

The same holds for Kripkeframes. For example, F p p iff is reexive. Therefore, U already satises all the axioms of . Finally, let F be a Kripkeframe, G F be a set such that x G and x y implies y G. (Such sets are called generated.) Then the induced frame G G2 is called a generated subframe. A special case of a generated subset is the set F x consisting of all points that can be reached in nitely many steps from x. Write x for the generated subframe induced by F x. Then a valuation on induces a valuation on x, which we denote also by .

Then 0 1 iff 0 and 1 . (For if x F0 then 0 1 x iff 0 x , and analogously for x F1 .) It follows that a modal logic which is determined by some class of Kripkeframes is already determined by some class of Kripkeframes generated from a single point. This shows the following.

Now that we have looked at intensionality we shall look at the question of individuation of meanings. In algebraic logic a considerable amount of work has been done concerning the semantics of propositional languages. Notably in Blok and Pigozzi (1990) Leibniz Principle was made the starting point of a denition of algebraizability of logics. We shall exploit this work for our purposes here. We start with a propositional language of signature . Recall the denition of logics, consequence relation and matrix from Section 4.1. We distinguish between a theory, (world) knowledge and a meaning postulate. Denition 4.50 Let be a consequence relation. A theory is a set such that . If T is a set such that T , T is called an axiomatization of . Theories are therefore sets of formulae, and they may contain variables. For example, is a theory. However, in virtue of the fact that
~

Theorem 4.49

is the logic of all Kripkeframes of the form M M

F0 ) ) %

T ( ) @@mQH 'S

i P

Q0 ) ) ( %

(4.70)

F0

F1

M .

A special consequence is the following. Let 0 : F0 be Kripkeframes. Assume that F0 and F1 are disjoint.


and

C0 ) )

Lemma 4.48 x iff x x is a generated subframe of then also


e

. It follows that if L.

L and F1

t ) (

i P

0 ) ( 1

0 )V lU

C0 ) ( )

0 ) ( D
R
e

318

Semantics

variables are placeholders, it is not appropriate to say that knowledge is essentially a theory. Rather, for a theory to be knowledge it must be closed under substitution. Sets of this form shall be called logics. Denition 4.51 Let be a structural consequence relation. A logic is a theory closed under substitution. Finally, we turn to meaning postulates. Here, it is appropriate not to use sets of formulae, but rather equations. Denition 4.52 Let L be a propositional language. A meaning postulate for L is an equation. Given a set M of meaning postulates, an equation s t follows from M if s t holds in all algebras satisfying M. Thus, the meaning postulates effectively axiomatize the variety of meaning algebras, and the consequences of a set of equations can be derived using the calculus of equations of Section 1.1. In particular, if f is an nary operation f t , and and si ti holds in the variety of meaning algebras, so does f s t holds for any substitution . These likewise, if s t holds, then s are natural consequences if we assume that meaning postulates characterize identity of meaning. We shall give an instructive example. (4.72)
T ~ ~ ~

(4.73) (4.74)

It is not part of the meanings of the words that Caesar crossed the Rubicon, so John may safely believe or disbelieve it. However, it is part of the language that bachelors are unmarried men, so not believing it means associating different meanings to the words. Thus, if (4.73) is true and moreover a meaning postulate, (4.74) cannot be true. It is unfortunate having to distinguish postulates that take the form of a formula from those that take the form of an equation. Therefore, one has sought to reduce the equational calculus to the logical calculus and conversely. The notion of equivalential logic has been studied among other by Janusz Czelakowski and Roman Suszko. The following denition is due to

(4.71)

V iU

y 9 Rh f h D 5 5 9 bn@bH f 7 G 5p G ` G hbH5 g $h H #H h $h 4 g h g 9 g 5 A P 4 4 h D P 9 A G 5p y9h f h D 5 5 9 7 h 5 H A P nH f 5 g h H p y 9 g D 7 @p p G ` G h A@A g 5 bA8H #H h $h 4 g h g 9 g 5 H h 4 4 h D P 9 A p G p 9 g D 8h 4 @A g 5 b8H 7 p h A 5 H A h

V U i

V U

V U

Intensionality
~

319

Prucnal and Wro ski (1974). (For a set of formulae, we write n that for all .)
~ ~

to say

(4.75d) (4.75e)
~

i f

p i qi q

f p f q

p; p q

is called equivalential if it has a set of equivalential terms, and nitely pq equivalential if it has a nite set of equivalential terms. If p q is a set of equivalential terms for then p q is called an equivalential term for . q is an equivalential term for . If there As the reader may check, p is no equivalential term then synonymy is not denable language internally. (Zimmermann, 1999) discusses the nature of meaning postulates. He requires among other that meaning postulates should be expressible in the language itself. To that effect we can introduce a 0ary symbol and a binary symbol such that is an equivalential term for in the expanded language. (So, p q .) To secure that and do the job we add (4.75) for p q : as intended, we shall stipulate that the logical and the equational calculus are intertranslatable in the following way. (4.77)

Here, denotes model theoretic consequence, or, alternatively, derivability in the equational calculus (see Section 1.1). In this way, equations are translated into sets of formulae. In order for this translation to be faithful in both directions we must require the following (see (Pigozzi, 1991)).

Y 

(4.79)

x y

(Rrule)

An equivalent condition is x; y

@ ~

(4.78)

(Grule) x y . Second, we must require that

i : i

:i

 T ~ T

~ 'T T

(4.76)

si ti : i

u v

si

ti : i

u v

T V ) U S

V ) U

V ) U

V aV i6U )V U U i V ) U

T b

V ) U

(4.75c)

p q ; q r

(4.75b)

pq

~ V ) U ~ WV ) U ~V ) U V ) V ) U V ) ~ V ) U

(4.75a)

p p

q p

pr

V ) U

Denition 4.53 Let be a consequence relation. We call the set p q i p q : i I a set of equivalential terms for if the following holds

V ) U

320

Semantics

Then one can show that on any algebra and any two congruences , on , iff , so every congruence is induced by a theory. (Varieties satisfying this are called congruence regular.) Classical modal logics admit the addition of . is simply . The postulates can more or less directly be veried. Notice however that for a modal logic L there are two choices for in Denition 4.53: if we choose L then p q is an equivn p alential term; if, however, we choose L then q : n is a set of equivalential terms. In general no nite set can be named in the local case. In fact, this holds for any logic which is an extension of boolean logic by any number of congruential operators. There we may conate meaning postulates with logics. However, call a logic Fregean if it satises p q p q . A modal logic is Fregean iff it contains p p. There are exactly p p. The other three four Fregean modal logics the least of which is are p p, and , the inconsistent logic. This follows from the following theorem.

p p. p p p p, so have to show Proof. Put L : p L. (1) p p . Furthermore, L p p. that p Hence also L p p . (2) p is a theorem of , p p holds by assumption. This shows the claim. Now, in a Fregean logic, any proposition is equivalent either to a non , where and modal proposition or to a proposition are nonmodal. It follows from this that the least Fregean logic has only constant extensions: by the axiom , by its negation , or both (which yields the inconsistent logic). Now let us return to Leibniz Principle. Fix a theory T . Together with the algebra of formulae it forms a matrix. This matrix tells us what is true and what is not. Notice that the members of the algebra are called truthvalues in the language of matrices. In the present matrix, a sentence is true only if it is a member of T . Otherwise it is false. Thus, although we can have as many truth values as we like, sentences are simply true or false. Now apply Leibniz Principle. It says: two propositions and have identical meaning iff for every proposition and every variable p: p T iff p T . This leads to the following denition.

Denition 4.55 Let be a signature and

an algebra. Then for any

U V H V b k gz

VV ab U

V b r U ~ V #Eb U

cb

U %

V b ~ U 1 V b U

Proposition 4.54

k yy x

b %

x yy
%

Intensionality

321

This is called the Leibniz equivalence and the map the Leibniz operator. Notice that the denition uses unary polynomials, not just terms. This means in effect that we have enough constants to name all expressible meanings, not an unreasonable assumption.

Proof. It is easy to see that this is an equivalence relation. By Proposition 4.15 it is a congruence relation. We show that it is compatible with F. Let x F and x F y. Take t : p we get x p t x, y p p y. Since x F we have y F as well. We know that the consequence relation of F is the same as the congruence relation of F F F . So, from a semantical point of view we may simply factor out the congruence F. Moreover, this matrix satises Leibniz Principle! So, in aiming to dene meanings from the language and its logic, we must rst choose a theory and then factor out the induced Leibniz equivalence. We may then take as the meaning of a proposition simply its equivalence class with respect to that relation. Yet, the equivalence depends on the theory chosen and so do therefore the meanings. If the 0ary constant 0 is a particular sentence (say, that Caesar crossed the Rubicon) then depending on whether this sentence is true or not we get different meanings for our language objects. However, we shall certainly not make the assumption that meaning depends on accidental truth. Therefore, we shall say the following. Denition 4.57 Let be a structural consequence relation over a language of signature . Then the canonical Leibniz congruence is dened to be

The reader is asked to check that Pol 1 Clo1 , so that nothing hinged on the assumption made earlier to admit all polynomials for the denition of . We shall briey comment on the fact that we deal only

VV w aU

dlU

For a proposition free of variables, the object (Leibniz) meaning of .

VV w aU

VV w aU

QlU

t ~ U

(4.81)

:
R

Taut

Tm

is called the canonical

0 ( )

Lemma 4.56 F is an admissible congruence on

F .

1 dV U

1 QV U D

0 ( )

V U

0 ) @S ( ( D
~

(4.80)

F:

} 1

A put a b : for all t Pol1 :t a F t b F

322

Semantics

with constant propositions. From the standpoint of language, propositional variables have no meaning except as placeholders. To ask for the meaning of in the context of language makes little sense since language is a means of communicating concrete meanings. A variable only stands in for the possible concrete meanings. Thus we end up with a single algebra of meanings, one that even satises Leibniz Principle. In certain cases the Leibniz operator actually induces an isomorphism between the lattice of deductively closed sets and the lattice of congruences on . This means that different theories will actually generate different equalities and different equalities will generate different theories. For example, in boolean logic, a theory corresponds to a deductively closed set in the free algebra of propositions. Moreover, T iff T . On the left hand side we nd the Leibniz congruence generated by T , on the right hand side we nd T applied to a complex expression formed from and . It means in words the following (taking T to be the theory generated by ): two propositions, and , have the same meaning iff the proposition is a tautology. This does not hold for modal logic; for in modal logic a theory induces a nontrivial consequence only if it is closed under the necessitation rule. (The exactness of the correspondence is guaranteed for boolean logic by the fact that it is Fregean.) We shall now briey address the general case of a language as a system of signs. We assume for a start a grammar, with certain modes. The grammar supplies a designated category t of sentences. We may dene notions of logic, theory and so on on the level of denite structure terms of category t, since these are unique by construction. This is how Church formulated the simple (see next section). A theory is now a set of denite theory of types, structure terms of category t which is closed under consequence. Given a theory T we dene the following relation on denite structure terms: iff the two are intersubstitutable in any structure term preserving deniteness, and for a structure term : x T iff x T . Again, this proves to be a congruence on the partial algebra of denite structure terms, and the congruence relation can be factored. What we get is the algebra of natural meanings. Notes on this section. There have been accounts of propositional attitudes that propose representations of attitude reports that do not contain a representation of the embedded proposition. These accounts have difculties with Leibniz Principle. Against this argues (Recanati, 2000). For him, the representation of the report contains a representation of the embedded proposition

k a

1 k T !

1 Q0 ) (

1 T

p | @gp

 @@5ed5'

Binding and Quantication

323

so that Leibniz Principle does not need to be stipulated. Modal operators actually have that property. However, they do not provide enough analysis of the kinds of attitudes involved. On the other hand, modal operators are very exible. In general, most attitudes will not give rise to a normal logic, though classicality must be assumed, in virtue of Leibniz Principle. Also, there is a consensus that a proposition is an expression modulo the laws of . However, notice that this means only that if two expressions are interderivable in , we must have the same attitude towards them. It does not say, for example, that if follows from then if I believe I also believe that . Classical logics need not be monotone (see the exercises below). For the general theory of modal logic see (Kracht, 1999). Exercise 148. Set up a Galois correspondence between contexts and equivalence classes of expressions. You may do this for any category . Can you characterize those context sets that generate the same equivalence class? Exercise 149. Show that L as dened above is a consequence relation. Show that (mn) is not derivable in . Hint. You have to nd a formula such that .

Exercise 151. Show that the logic of knowledge axiomatized above is

Exercise 153. Show that there are classical modal logics which are not monotone. Hint. There is a counterexample based on a twoelement algebra. Exercise 154. Prove Lemma 4.48. Exercise 155. Dene : . Let L be the set of formulae in and the boolean connectives that are derivable. Show that L is a normal modal logic containing . 4. Binding and Quantication
}

Quantication and binding are one of the most intricate phenomena of formal semantics. Examples of quantiers we have seen already: the English phrases

V U

C0 ) ( )

V U d 0ek)!) !) !x( s t ) w) 0 ) ) ( D

Exercise 152. Let F : into it. Put tension : V

be a generalized Kripkeframe and a valuation F . Then has an obvious homomorphic ex. Show that x iff x .

i r

Exercise 150. Show that with .

is the global consequence relation associated .

| r

l m

%p

| "r

324

Semantics

and , and and of predicate logic. Examples of binding without quantication can be found easily in mathematics. The integral
0

is a case in point. The integration operator takes a function (which may have parameters) and returns its integral over the interval 0 1 . What this has in common with quantication is that the function h y does not depend on x. Likewise, the limit limn an of a convergent series a : , is independent of n. (The fact that these operations are not everywhere dened shall not concern us here.) So, as with quantiers, integration and limits take entities that depend on a variable x and return an entity that is independent of it. The easiest way to analyze this phenomenon is as follows. Given a function f that depends on x, x f is a function which is independent of x. Moreover, everything that lies encoded in f is also encoded in x f . So, unlike quantication and integration, abstraction does not give rise to a loss of information. This is ensured by the identity x f x f . Moreover, extensionality ensures that abstraction also does not add any information: the abstracted function is essentially nothing more than the graph of the function. abstraction therefore is the mechanism of binding. Quantiers, integrals, limits and so on just take the abstract and return a value. This is exactly how we have introduced the x . Likewise, the integral quantiers: x was just an abbreviation of can be decomposed into two steps: rst, abstraction of a variable and then the actual integration. Notice, how the choice of variable matters:
0

The notation dx actually does the same as x: it shows us over which variable we integrate. We may dene integration as follows. First, we dene an operation I : , which performs the integration over the interval 0 1 of f . Then we dene
0

This denition decouples the denition of the actual integration from the binding process that is involved. In general, any operator O x i : i n M which binds the variables xi , i n, and returns a value, can be dened as
1

baaa

(4.85)

O xi : i

n M:

O x 0 x1

xn

V U

(4.84)

f x dx :

I x f

(4.83)

y 3

x2 ydx

x 2

1 0

x2 ydy

V U i

s '

V ) U i U

V U i

(4.82)

hy :

f x y dx

f h g A

~ g1

5 8h h
`

Binding and Quantication Table 11. The Axioms for Predicate Logic (FOL)

325

for a suitable O. In fact, since Ox M does not depend on x, we can use (4.85) to dene O. What this shows is that calculus can be used as a general tool for binding. It also shows that we can to some extent get rid of explicit variables, something that is quite useful for semantics. The elimination of variables removes a point of arbitrariness in the representation that makes meanings nonunique. In this section, we shall introduce two different algebraic calculi. The rst is the algebraic approach to predicate logic using so called cylindric algebras, the other an equational theory of calculus, which embraces the (marginally popular) variable free approach to semantics for rst order logic. We have already introduced the syntax and semantics of rstorder predicate logic. Now we are going to present an axiomatization. To this end we expand the set of axioms for propositional logic by the axioms (a13) (a21) in Table 11. The calculus (a0) (a21) with the rules (mp) and (gen) is called . A rstorder theory is a set of formulae containing (a0) (a21) and which is closed under (mp). We write if every theory containing x also contains . In virtue of (a16) and (a17) we get that x as well as x x which means that one of the quantiers can be dened from the other. In (a21), we assume i R . We shall prove its completeness using a more powerful result due to Leon

V U

V U ~

V U

" B

V U ~

~ U

V U

RULES . (mp)

(gen)

VV aaV

) iaaa)

A XIOMS . (a0) (a12) (a13) x x x (a14) x t x (a15) x x fr (a16) x x (a17) x x (a18) x x x (a19) x y x y y x (a20) x y z x y y z x z (a21) x0 x R 1 y yi i R xi R x0 x R 1 y xi R x0

U V 4 aV UU

DV

V V  ~U D

V aV U VD

iaaa) U ) U ~U V aa@V ~U U ~U ~U V V V D U ~U V V D DV U V V  D V U ~U V U V 1 U V  ~U V V aV ~U U V U


Y

~U ~ U ~ U ~U ~U ~U ~ U

x R

326

Semantics

Henkin that simple type theory ( ) is complete with respect to Henkin frames. Notice that the status of (gen) is the same as that of (mn) in modal logic. (gen) is admissible with respect to the model theoretic consequence dened in Section 3.8, but it is not derivable in it. To see the rst, suppose that is a theorem and let x be a variable. Then x is a theorem, too. However, x does not follow. Simply take a unary predicate letter P and a structure consisting of two elements, 0, 1, such that P is true of 0 but P x but xPx. not of 1. Then with x : 0 we have Now let P be the set of all formulae that can be obtained from (a0) (a21) by applying (gen). Then the following holds.
" 
~

Theorem 4.58

iff is derivable from P using only (mp).

Proof. Let be a proof of from P using only (mp). Transform the proof in the following way. First, prex every occurring formula by x . Further, for j for some i j k, insert in every k such that k follows from i and i front of the formula x j the sequence

The new proof is a proof of x from P, since the latter is closed under (gen). Hence, the set of formulae derivable from P using (mp) is closed under (gen). Therefore it contains all tautologies of . The next theorem asserts that this axiomatization is complete. Denition 4.59 Let be a set of formulae of predicate logic over a signature, a formula over that same signature. Then iff can be proved from P using only (mp).

Recall from Section 3.8 the denition of the simple theory of types. There we have also dened the class of models, the so called Henkinframes. Recall further that this theory has operators , which allow to dene the universal quantiers in the following way.

The simple theory of types is axiomatized as follows. We dene a calculus exclusively on the terms of type t (truth values). However, it will also be possible to express that two terms are equal. This is done as follows. Two terms

V aV

~ U

(4.87)

x Nt :

x Nt

" B

Theorem 4.60 (Godel)

iff

V  ~U

V V ~U )

V  ~U

V aV ~U U

V U ~

U ~ V U

(4.86)

x i

x i

x j

x i

x j

V U V 20 l( ~U )

) V U ~

V U ~

V U

) 0 l(

p | @gp

V U ~

V U

V x ~U

Binding and Quantication Table 12. The Simple Theory of Types

327

RULES . (mp) (conv)

For this denition we assume that z t is free neither in M nor in N . If one dislikes the side conditions, one can prevent the accidental capture of z t using the following more rened version: However, if z t is properly chosen, no problem will ever arise. Now, let (s0) (s12) be the formulae (a0) (a12) appropriately translated into the language of types. We call the Hilbertstyle calculus consisting of (s0) (s17) and the rules given in Table 12 . All instances of theorems of are theorems of . For predicate logic this will also be true, but the proof of that requires work. The rule (gen) is a derived rule of this calculus. To see this, assume that M t z is a theorem. Then, by (conv), z M t z z also is a theorem. Using (ug) we get z M t z , which by abbreviatory convention is z M t . We will also show that (a14) and (a15) are theorems of . Proof. By convention, x yt x yt . Moreover, by (s14), x yt x yt x x yt yt . Using (sub) we get

~ U

~U U a!

2gc~

(4.90)

N x

x yt

yt

x yt

N x yt

p | @gp

V V

~U aU V V D U ~

~U ~ p | gp

Lemma 4.61

x yt

N x yt .

p | @gp

V aV

| "r

U V

~U 

p | rp

(4.89)

N :

x y

t x

t y

M N

U V

~ U

(4.88)

N :

t M

t N

M and N of type are equal if for every term O and O t N are equivalent.

the terms O

Mt

fr M

U V V

Mt

Nt Mt Nt Mt Nt Nt

(ug)

M t x M M t x (sub) M t N

fr M

U V "aV

V
t t M

U V aV

A XIOMS . (s0) (s12) + (s13) x yt M t x (s14) x t x t y (s15) xt yt xt yt z x z y (s16) (s17) x t y x t x


yt

U V aU ~U V U V U V aU U ~U

p | gp

~U aU

~ U

328

Semantics

Lemma 4.62 Assume that x is not free in Nt . Then Proof. With Nt Nt Nt x Nt x and the fact that x Nt is derivable (using (gen)), we get with (conv) x Nt x Nt x with (s13) and (mp) we get

(The fact that x is not free in Nt is required when using (s13). In order for the replacement of Nt for yt in the scope of x to yield exactly Nt again, we need that x is not free in Nt .) = + (ext), and (ext) is the axiom (s16). Hence it remains to show Proof. that M N implies M N . So, assume the rst. Then we have M N . Hence

By abbreviatory convention, M N . We shall now show that is complete with respect to Henkinframes where simply is interpreted as identity. To do that, we rst prove that is a congruence relation.

I k

(4.96d)

(4.96c)

(4.96b)

(4.96a)

x y y x z x z x
y

p | @gp

Lemma 4.64 The following formulae are provable in

p | @gp Qc ~

U V

~U 2gc~

(4.95)

t M

Qc

By symmetry,

t M

gc

(4.94)

t M

t N

t N .

Using (gen) we get


t N

g

Hence, using (conv), and

t M

(4.93)

t M

t M

t M

t M

p | gp

gc

~ 

Lemma 4.63 If

N then

N .

t N

we get

~U 

~ U

DD D

gc

(4.92)

Nt

x Nt

Nt

x Nt

V aV

U V

~ U

U aU

U V

~ U

~U 

p | @gp

(4.91)

Nt

x Nt Nt and

as required.

~ '

Binding and Quantication

329

Proof. (4.96a) Let z t be a variable of type t. Then z t x z t x is provable in (as p p is in ). Hence, z t z t x z t x is provable, which is x x . (4.96b) and (4.96c) are shown using predicate logic. (4.96d) Assume x x and y y . Now,
W

(4.98)

Likewise, one can show that


This allows to derive the desired conclusion. Now we get to the construction of the frame. Let C be the set of closed formulae of type . Choose a maximally consistent Ct . Then, for each type , dene by M N iff M N . By Lemma 4.64 this is an equivalence relation for each , and, moreover, if M M and (4.101) M : N : M

Lemma 4.65 (Witnessing Lemma)

V "aV

00 T aI1

()0 T I1

() t) v) 3) T !i@!b1

Qc

S !(

p | gp

(4.102)

D :

D bn

0 1D

Finally, put D : : , : Typ B ).

M : M and :

C . Next, is dened as usual, : . This denes a structure (where

& D @S D

V U

bn

N , then also M

N .

For M

C put

Qc

k U

(4.100)

V aV

Similarly, using N

one shows that x


y

V k

k U

V

k U

g

(4.99)

V k

V k

gX~

g

M y

V aV

Put M

u z

. Using the rule (conv), we get M y

UV aV

(4.97)

u z

U V

~ U k

| r

p | rp

330

Semantics

Using classical logic we obtain Now, N t yt Similarly, N Hence


gc
~

Using (gen), (s13) and (mp) we get This we had to show.

Proof. By Lemma 4.63, if M N , then M N . So, the axioms of the theory are valid, and D : S is a functionally complete (typed) applicative structure. Since is maximally consistent, D t consists of two elements, which we now call 0 and 1. Furthermore, we may arrange 1 iff Mt . It is then easily checked that the interpretation it that Mt of is complement, and the interpretation of is intersection. Now we treat . We have to show that for a D a : 1 iff for every b D : t M a b 1. Or, alternatively, t iff M t N for every closed term N . Suppose that M t . Using Lemma 4.61 and the fact that is deductively closed, M t N . Conversely, assume M t N for every constant term N . Then M x M t x is a constant term, and it is in . Moreover, by the Witnessing Lemma, M t . Finally, we have to show that for every a D t : if there is a b D such that a b 1 then a a 1. This means that for M t : if there is a constant term N such that M t N then M t M t . This is a consequence of (s17). Now, it follows that Nt iff Nt . More generally, let an assign ment of constant terms to variables. Let M be a term. Write M for the result of replacing a free occurrence of a variable x by x . Then This is shown by induction.

"0 )

i v

(4.107)

0 3) !bT

V aV

1 V

S !(

%i

Lemma 4.66

is a Henkinframe.

VV aaV

U gc

(4.106)

V "aV

(4.105)

t y

V aV

VV aaV

N t y , the latter being equivalent to N t y . N t is equivalent with N t N t .

VV aaV

U V

U aU

U U U V aU U V V U aU

U aU gc

(4.104)

V aV
t

U V

U gc

(4.103)

Proof. Write N

t x

. Now, by (s17)

& D V 3

U Q3

Binding and Quantication

331

Lemma 4.67 Let 0 be a consistent set of constant terms. Then there exists a maximally consistent set of constant terms containing 0 . Proof. Choose a wellordering N : on the set of constant terms. Dene i by induction as follows. 1 : N if the latter is consistent. Otherwise, 1 : . If is a limit ordinal, : . We shall show that : is maximally consistent. Since it contains 0 , this will complete the proof. (a) It is consistent. This is shown inductively. By assumption 0 is consistent, and if is consistent, then so is 1 . Finally, let be a limit ordinal and suppose that is inconsistent. Then there is a nite subset which is inconsistent. There exists an ordinal such that . is consistent, contradiction. (b) There is no consistent superset. Assume that there is a term M such that M is consistent. Then . Then is consistent, whence by denition N for some , M N N 1 . Contradiction. Theorem 4.68 (Henkin) (a) A term Nt is a theorem of iff it is valid in iff it holds all Henkinframes. (b) An equation M N is a theorem of in all Henkinframes iff it is valid in the many sorted sense. We sketch how to prove Theorem 4.60. Let be a signature for predicate : logic. Dene a translation into

0 )

l(

) "0 l(

(4.110)

Lemma 4.69 Let be a valuation on

. Extend to . Then

Now, given a rstorder model with De M and D : D (4.108a) and (4.108b).

(4.109b)

r : e

, we can construct a Henkinframe for D , by interpreting f and r as given by

VV a'aa@V VV aaaa@V

aa U aa U

U U U U

(4.109a)

f : e

p | rp

This is extended to all formulae. Now we look at the signature for the constants f , r of type

) aaa)

U V

U aa@V

U V

(4.108b)

r :

x 0 x1

x r

r x0

x r

) aaa)

U V

U aa@V

U V

(4.108a)

f :

x 0 x1

f x0

with

p | gp

p | gp

S s

S s

0 ) (

p | rp

Ss 1

332

Semantics

Right to left is by induction. Now if then there is a model , from which we get a Henkinframe . The proof of Theorem 4.60 is as follows. Clearly, if is derivable it is valid. Suppose that it is not derivable. Then is not derivable in . There is a Henkinframe . This allows to dene a rstorder model . Exercise 156. The set of typed terms is dened over a nite alphabet if the set B of basic types is nite. Dene from this a wellordering on the set of terms. Remark. This shows that the proof of Lemma 4.67 does not require the use of the Axiom of Choice for obtaining the wellordering. Exercise 157. Show Lemma 4.69. Exercise 158. Complete the details of the proof of Theorem 4.60. Exercise 159. Let L be a normal modal logic. Show that L iff b L , n : where b : n . Hint. This is analogous to Theorem 4.58.
~

5.

Algebraization

Now that we have shown completeness with respect to models and frames, we shall proceed to investigate the possibility of algebraization of predicate logic and simple type theory. Apart from methodological reasons, there are also practical reasons for preferring algebraic models over frames. If is a sentence and a model, then either or . Hence, the theory of a single model is maximally consistent, that is, complete. One may argue that this is as it should be; but notice that the base logic ( , ) is not complete neither is the knowledge ordinary people have. Since models are not enough for representing incomplete theories, something else must step in their place. These are algebras for some appropriate signature, for the product of algebras is an algebra again, and the logic of the product is the intersection of the logics of the factors. Hence, for every logic there is an adequate algebra. However, algebraization is not straightforward. The problem is that there is no notion of binding in algebraic logic. Substitution always is replacement of an occurrence of a variable by the named string, there is never a preparatory replacement of variables being performed. Hence, what creates in fact big problems is those axioms and rules that employ the notion of a free or bound

) 0 l(

p | rp

(0 )

p | @gp

l(

" B

" 
~

1 )

gc
~

Lemma 4.70

iff

Algebraization

333

variable. In predicate logic this is the axiom (a15). (Unlike , has no rule of substitution.) It was once again Tarski who rst noticed the analogy between modal operators and quantiers. Consider a language L of rst order logic without quantiers. We may interpret the atomic formulae of this language as propositional atoms, and formulae made from them using the boolean connectives. Then we have a somewhat more articulate version of our propositional boolean language. We can now introduce a quantier Q simply as a unary operator. For example, is a unary operator on formulae. Given a formula , is again a formula. (Notice that the way we write the formulae is somewhat different, but this can easily be accounted for.) In this way we get an extended language: a language of formulae extended by a turn the logic exactly into a single quantier. Moreover, the laws of normal modal logic. The quantier then corresponds to , the dual of . Clearly, in order to reach full expressive power of predicate logic we need to add innitely many such operators, one for each variable. The resulting algebras are called cylindric algebras. The principal reference is to (Henkin et al., 1971). We start with the intended models of cylindric algebras. A formula may , to 2, where is be seen as a function from models, that is, pairs a structure and an assignment of values to the variables. First of all, we shall remove the dependency on the structure, which allows us to focus on the assignments. There is a general rst order model for any complete (= maximal consistent) theory, in which exactly those sentences are valid that belong to the theory. Moreover, this model is countable. Suppose a theory T is not complete. Then let i , i I, be its completions. For each i I, let i be the canonical structure associated with i . If i is the cylindrical algebra associated with i (to be dened below), the algebra associated with T will i I i . In this way, we may reduce the study to that of a cylindric algebra of a single structure. Take a rst order structure M , where M is the universe and the interpretation function. For simplicity, we assume that there are no functions. (The reader shall see in the exercises that there is no loss of expressivity in renouncing functions.) Let V : xi : i be the set of variables. Let V ; M be the boolean algebra of sets of functions into M. Then for every formula we associate the following set of assignments:

) "0 l(

(4.111)

| "r

0 l( )

 ) g

 & 21

0 )

 ) 5 ( 1

 ) 50 D

334

Semantics
i.

Then i S : S . (The standard notation for i is i . The letter here i is suggestive for cylindrication. We have decided to stay with a more logical notation.) Furthermore, for every pair of numbers i j we assume the element i j .
i j

It is interesting to note that with the help of these elements substitution can be dened. Namely, put

i j

Lemma 4.71 Let y be a variable distinct from x. Then y x is equivalent with x y x . Thus, equality and quantication alone can dene substitution. The relevance of this observation for semantics has been nicely explained in (Dresner, 2001). For example, in applications it becomes necessary to introduce constants for the relational symbols. Suppose, namely that is a binary relation symbol. Its interpretation is a binary relation on the domain. If we want to replace the structure by its associated cylindric algebra, the relation is replaced by an element of that algebra, namely

However, this allows us prima facie only to assess the meaning of x 0 is taller than x1 . We do not know, for example, what happens to x 2 is taller than x7 . For that we need the substitution functions. Now that we have the unary substitution functions, any nitary substitution becomes denable. In this particular case,

Thus, given the denability of substitutions, to dene the interpretation of R x R 1 . we only need to give the element R x0 x1 The advantage in using this formulation of predicate logic is that it can be axiomatized using equations. It is directly veried that the equations listed in the next denition are valid in the intended structures.

) iaaa)

U k F @a h h t q

U k F @a t q

(4.116)

x2 x7

7 2 1 0

x0 x1

T V

U 1 0 " daV

U V )

U (

U k F @a t q

(4.115)

x0 x1 :

: x 0 x1

5 h P P H 88@4

5 h P P H b@@4

t U

I   ' & &

V U h

(4.114)

i j

x :

x x

if i j, otherwise.

T V

V U

(4.113)

: xi

xj

1 )

v'U %v V U D V U D

(4.112)

S :

: for all

xi

Now, for each number i we assume the following operation

Algebraization

335

Denition 4.72 A cylindric algebra of dimension , a cardinal number, is a structure


(4.118)

We shall see that this denition allows to capture the effect of the axioms above, with the exception of (a15). Notice rst the following. is a congruas well. For if is a tautology then so is x i xi . ence in Hence, we can encode the axioms of as equations of the form as long as no side condition concerning free or bound occurrences is present. We shall not go into the details. For example, in x x occurs trivially bound. It remains to treat the rule (gen). It corresponds to the rule (mn) of modal logic. In equational logic, it is implicit anyway. For if x y then Ox O y for any unary operator O.

A particular example of a cylindric algebra is , the formulae of pure equality based on the variables i , i , and iff is a theorem. (If additional function or relation symbols are needed, they can be added with little change to the theory.) This algebra is locally nite dimensional and is freely generated. The second approach we are going to elaborate is one which takes substitutions as basic functions. For predicate logic this has been proposed by

is called the dimension of a. a 0 for all a A.

is said to be locally nite dimensional if

i)

(4.119)

a :

i:i

ia

Denition 4.73 Let

be a cylindric algebra of dimension , and a

A. Then

V aV

v'Ut U V t U t V t U D

V U ~

D V t D D D 0 s) t) v !!i) ) D

D U s

) (

(ca1) (ca2) (ca3) (ca4) (ca5) (ca6) (ca7) (ca8)

A01 is a boolean algebra. 0 0. x x x. x y x y. x x. 1. If then . If then x x

) )

1 )

such that the following holds for all x y

A and

0 a0
0.

0 ()

() s) t) v '!!i) ) ) ( D

V U

(4.117)

A01

V U

336

Semantics

Halmos (1956), but most people credit Quine (1960) for this idea. For an exposition see (Pigozzi and Salibra, 1995). Basically, Halmos takes substitution as primitive. This has certain advantages that will become apparent soon. Let us agree that the index set is , again called the dimension. Halmos denes operations for every function : such that there are only nitely many i such that i i. The theory of such functions is axiomatized independently of quantication. Now, for every nite set I Halmos admits an operator I , which represents quantication over each of variables i , , I is the identity, otherwise I K x I K x. where i I. If I Thus, it is immediately clear that the ordinary quantiers i sufce to generate all the others. However, the axiomatization is somewhat easier with the polyadic quantiers. Another problem, noted in (Sain and Thompson, 1991), is the fact that the axioms for polyadic algebras cannot be schematized using letters for elements of the index set. However, Sain and Thompson (1991) also note that the addition of transpositions is actually enough to generate the same functions. To see this, here are some denitions. I. The support of , supp , is Denition 4.74 Let I be a set and : I the set i : i i . A function of nite support is called a transformation. is called a permutation of I if it is bijective. If the support contains exactly two elements, is called a transposition. The functions whose support has at most two elements are of special interest. Notice rst the case when supp has exactly one element. In that case, is j called an elementary substitution. Then there are i j I such that i and k k if k i. If i and j are in I, then denote by i j the permutation that sends i to j and j to i. Denote by i j the elementary substitution that sends i to j. Proposition 4.75 Let I be a set. The set I of functions : I I of nite support is closed under concatenation. Moreover, I is generated by the elementary substitutions and the transpositions. The proof of this theorem is left to the reader. So, it is enough if we take only functions corresponding to i j and i j . The functions of the rst kind are already known: these are the ij . For the functions of the second kind, write i j . Sain and Thompson effectively axiomatize cylindric algebras that have these additional operations. They call them nitary polyadic algebras. Notice also the following useful fact, which we also leave as an exercise.
6

V U

V U

s  U

V SU IT ! V V 'V F U U U

V ) U

V U

1 )

V U

V ) U

h )

V U

V F U

%V U D T

2V U D

VD  U

V U

V U

Algebraization

337

Proposition 4.76 Let : I I be an arbitrary function, and M I nite. Then there is a product of elementary substitutions such that M M. This theorem is both stronger and weaker than the previous one. It is stronger because it does not assume to have nite support. On the other hand, only approximates on a given nite set. (The reader may take notice of the fact that there is no sequence of elementary substitutions that equals the transformation 0 1 on . However, we can approximate it on any nite subset.) Rather than developing this in detail for predicate logic we shall do it for the typed calculus, as the latter is more rich and allows to encode arbitrarily complex abstraction (for example, by way of using ). Before we embark on the project let us outline the problems that we have to deal with. Evidently, we wish to provide an algebraic axiomatization that is equivalent to the rules (3.95a) (3.95g) and (3.95i). First, the signature we shall choose has function application and abstraction as its primitives. However, we cannot have a single abstraction symbol corresponding to , rather, for each variable (and each type) we must assume a different unary function symbol i , corresponding to i . Now, (3.95a) (3.95e) and (3.95i) are already built into the BirkhoffCalculus. Hence, our only concern are the rules of conversion. These are, however, quite tricky. Notice rst that the equations make use of the substitution operation N x . This operation is in turn dened with the denitions (3.93a) (3.93f). Already (3.93a) for N i can only be written down if we have an operation that performs an elementary substitution. So, we have to add the unary functions i j , to denote this substitution. Additionally, (3.93a) needs to be broken down into an inductive denition. To make this work, we need to add correlates of the variables. That is, we add zeroary function symbols i for every i . Symbols for the functions i j permuting i and j will also be added to be able to say that the variables all range over the same set. Unfortunately, this is not all. Notice that (3.95f) is not simply an equation: it has a side condition, namely that y is not free in M. In order to turn this into an equation we must introduce sorts, which will help us keep track of the free variables. Every term will end up having a unique sort, which will be the set of i such that i is free in it. B is the set of basic types. Call a member of : Typ B an index. If i is an index, is its type and i its numeral. Let be the set of pairs where is a type and a nite set of indices. We now start with the signature. Let and be nite sets of indices, ,

D }

p | rp

0 ) ( 0 ) (

e V U

338

Semantics

Here is the result of exchanging and in and is the result of replacing by in . Notice that is now a constant! We may also have additional functional symbols stemming from an underlying (sorted) algebraic signature. The reader is asked to verify that nothing is lost if we assume that additional function symbols only have arity 0, and signature for a suitable . This greatly simplies the presentation of the axioms. This denes the language. Notice that in addition to the constants we also have variables x for each sort . The former represent the variable i i (where i ) of the calculus and the latter range over terms of sort . Now, in order to keep the notation perspicuous we shall drop the sorts whenever possible. That this is possible is assured by the following fact. If t is a term without variables, and we hide all the sorts except for those of the variables, still we can recover the sort of the term uniquely. For the types this is clear, for the second component we observe the following.

The proof of this fact is an easy induction. For the presentation of the equations we therefore omit the sorts. They have in fact only been introduced to ensure that we may talk about the set of free variables of a term. The equations (vb1) (vb6) of Table 13 characterize the behaviour of the substitution and permutation function with respect to the indices. We assume that , , all have the same type. (We are dropping the superscripts indicating the sort.) The equations (vb7) (vb11) characterize the pure binding by the unary operators . The set of equations is invariant under permutation of the indices. Moreover, we can derive the invariance under replacement of bound variables, for example. Thus, effectively, once the interpretation of 0 is known, the interpretation of all i , i , is

V U

Lemma 4.77 If a term t has sort then fr t

00 w ( a!) a(

! )

0 a0 k

s ) 0 k ) ()

0 ) (

0 ) (

V ) U

(4.120e)

00 al !

(4.120d)

(4.120c)

where

00 al V 00 S aIT fv )

(4.120b)

() 0 ) a(  ( ) ) 0 ) a( () ( A ) ) 0 ) a( U () ( 8 0 ) a( () ( 00 S ( aIT ') a(

(4.120a)

: :

0 k k ( )

0 ) (

types, and their type:

i ,

i indices. We list the symbols together with

Algebraization

339

The equivalent of (3.95f) now turns out to be derivable. However, we still need to take care of (3.95g). Since we do not dispose of the full substitution N x , we need to break down (3.95g) into an inductive denition (vb12) (vb14). The condition fr y is just a shorthand; all it says is that we take only those equations where the term y has sort and . Notice that

0 ) (

 

(4.121)

known as well. For using the equations we can derive that inverse of 0 i , and so
8 8

V U

3 V

D 

U

(vb14)

if if

3 @V

U

(vb13)

V U

V @V 3

U aU

if if

V V 3

UU 3 3 aY@V @V

U aU

3@VaV 3 U U V 3 U

y x y

T ) ) S

if

T ) ) S

if

T ) ) S

A A

x x x x x x

if

x
8

(vb3) (vb4) (vb5) (vb6) (vb7) (vb8) (vb9) (vb10) (vb11) (vb12)

x
x

(vb2)

if if otherwise.

(vb1)

) bT ) 1 S

Table 13. The Variable Binding Calculus

if otherwise.

A
8 8 8

3 3 3

fr y

is the

340

Semantics

the disjunction in (vb13) is not complete. From these equations we deduce that k k 0 0 , so we could in principle dispense with all but one variable symbol for each type. The theory of sorted algebras now provides us with a class of models which is characteristic for that theory. We shall not spell out a proof that these models are equivalent to models of the calculus in a sense made to be precise. Rather, we shall outline a procedure that turns an algebra into a model of the above equations. Start with a signature F , sorted or unsorted. For ease of presentation let it be sorted. Then the set B of basic types is the set of sorts. Let be the equational theory of the functions from the signature alone. For complex types, put A : A A . Now transform the original signature into a new signature F where f 0 for all f F. Namely, for f : i n Ai A set
n 1

n 1

This is an element of A where

We blow up the types in the way described above. This describes the transition from the signature to a new signature . The original equations are turned into equations over as follows.

(4.124a) (4.124b) (4.124c)

Next, given an theory, T , let T be the translation of T , with the postulates (vb1) (vb14) added. It should be easy to see that if T s t then also T s t . For the converse we provide a general model construction that for each multisorted structure for T gives a multisorted structure for T in which that equation fails. An environment is a function from ( Typ B ) into A : such that for every index i , i A . We denote the set of environments by . Now let C be the set of functions from to A which depend at most on . That is to say, if and are environments such that for all , , and if f C then f f .

V U

Vk U

V U

1QVa0 ) aU ( e

3 3 iaay@V

0 ) (

3 @V

V U D D D 3 UaUyaa6U V i aV U U D 6 D 6

f s

s 0

s 1

VV a6aa@V

U yaa

V lk U

V U

(4.123)

V iaaa) U )

8aaa

0 1

(4.122)

f :

n 1

n 1

s n

V k U

0 ) (

0k ) ( k D

$ D

8 k

DD D

Algebraization

341

(4.125)

It takes some time to digest this denition. Basically, given f , g f : y f y : y A is a function from A to A with parameter . Hence it is a member of A . It assigns to y the value of f on , which is identical to except that now is replaced by y. This is the abstraction the value from y. Finally, for each f C , f assigns to g f . (Notice that the role of abstraction is now taken over by the set formation operator x : .) Theorem 4.78 Let a multisorted signature, and T an equational theory over . Furthermore, let be the signature of the calculus with 0ary constants f for every f F. The theory T consisting of the translation of T and the equations (vb1) (vb14) is conservative over T . This means that an equation s t valid in the algebras satisfying T iff its translation is valid in all algebras satisfying T .

p1

T 1

V U

0 IT

0 aV V U

V U! V U

aU ) @') @ ( S ( S

y f y

(4.129)

:C

0 ) (

Finally, we dene abstraction. Let

i.

:y

T 1

0 aV U

3V U ) @0 ) ( ( S u

f g

0 aV V U

(4.128)

aU ) @S (

Next,

is interpreted as follows.

:C

V ) U X

(4.127)

: C

C : f

) X

(4.126)

: C

C : f

0 l V ) ) ( U

Let

and

The constant f is now interpreted by the function f : variables we put : . A transformation : duces a map : : . Further,

f . For the naturally in-

2s

V U X

8 A 0 l ! ) ) ( 0 ) ( D D V V Y X Y U U D VaV U Y U Y D XV X U D X X U V U X D

V U

342

Semantics

Notes on this section. The theory of cylindric algebras has given rise to a number of difcult problems. First of all, the axioms shown above do not fully characterize the cylindric algebras that are representable, that is to say, have as their domain U , the dimension, and where relation variables range over nary relations over U. Thus, although this kind of cylindric algebra was the motivating example, the equations do not fully characterize it. As Donald Monk (1969) has shown, there is no nite set of schemes (equations using variables for members of set of variable indices) axiomatizing the class of representable cylindric algebras of dimension if 0 ; moreover, for nite , the class of representable algebras is not nitely axiomatizable. J. S. Johnson has shown in (Johnson, 1969) an analogue of the second result for polyadic algebras, Sain and Thompson (1991) an analogue of the rst. The model construction for the model of the calculus is called a syntactical model in (Barendregt, 1985) . It is due to Hindley and Longo from (Hindley and Longo, 1980). The approach of using functions from the set of variables into the algebra as the carrier set is called a functional environment model, and has been devised by Koymans (see (Koymans, 1982)). A good overview over the different types of models is found in (Meyer, 1982) and (Koymans, 1982). Exercise 160. For f an nary function symbol let R f be an n 1ary relation symbol. Dene a translation from terms to formulae as follows. First, for a term t let xt be a variable such that xt xs whenever s t. (4.130b) f t0 tn

Finally, extend this to formulae as follows.

M be a signature. We replace the function symbols by Now, let relation symbols, and let be the extension of such that

V V U iU

1 Q0 ) @S i(

(4.132)

Rf

xy

: f x

1 &

0 )

'U &

(4.131d)

(4.131c)

(4.131b)

V

) iaaa)

) iaaa) U

(4.131a)

R t0

tn

R xt0

xtn

i n

ti

@V

) iaaa)

R f xt0

 

V U 

(4.130a)

xi

xi

xtn

xf

i n

ti

) iaaa) U ) (

Montague Semantics II

343

Exercise 161. Show that if is a cylindric algebra of dimension , every , satises the axioms of . Moreover, show that if then , . Exercise 162. Prove Lemma 4.71. Exercise 163. Show Proposition 4.76. Exercise 164. Show that is a cylindric algebra of dimension and that it is locally nite dimensional. Exercise 165. Prove Proposition 4.75.

6.

Montague Semantics II

This section deals with the problem of providing a language with a compositional semantics. The problem is to say, which languages that are weakly context free are also strongly context free. The principal result of this section is that if a language is strongly context free, it can be given a compositional interpretation based on an ABgrammar. Recall that there are three kinds of languages: languages as sets of strings, interpreted languages, and nally, systems of signs. A linear system of signs is a subset of A C M, where C is a set of categories. Denition 4.79 A linear sign grammar is context free if (a) C is nite, (b) if f is a mode of arity n 0 then f x0 xn 1 : i n xi , (c) f m0 mn 1 is dened if there exist derivable signs i ei ci mi , i n, such that f c is dened and (d) if f g then f g . If only (a) (c) are satised, the grammar is quasi context free. is (quasi) context free if it is generated by a (quasi) context free linear sign grammar. This denition is somewhat involved. (a) says that if f is an nary mode, f can be represented by a list of nary immediate dominance rules. It conjunction with (b) we get that we have a nite list of context free rules. Condition (c) says that the semantics does not add any complexity to this by introducing partiality. Finally, (d) ensures that the rules of the CFG uniquely dene the modes. (For we could in principle have two modes which reduce to the same phrase structure rule.) The reader may verify the following simple fact.

V iU

) iaaa)

0 ) l( e 0

) ) (

) "0 l( V D

) iaaa)

i r

0 ) D

Then put

. Show that

iff

344

Semantics

Proposition 4.80 Suppose that is a context free linear system of signs. Then the string language of is context free. An interpreted string language is a subset of A M where M is the set of (possible) (sentence) meanings. The corresponding string language is S . An interpreted language is weakly context free iff the string language is. Denition 4.81 An interpreted language is strongly context free if there is a context free linear system of signs and a category such that . For example, let L be the set of declarative sentences of English. M is arbitrary. We take the meanings of declarative sentences to be truthvalues, here 0 or 1 (but see Section 4.7). A somewhat more rened approach is to let the meanings be functions from contexts to truth values. Next, we shall also specify what it means for a system of signs to be context free. Obviously, a linear context free system of signs denes a strongly context free interpreted language. The converse does not hold, however. A counterexample is provided by the following grammar, which generates simple equality statements.

Expressions of category are called equations, and they have as their meanor . Now, assign the following meanings to the strings. ing either has as its and meaning the number 1, the number 2, and as its meaning the number 2. The meanings are as follows.

This grammar is unambiguous; and every string of category X has exactly one Xmeaning for X . Yet, there is no CFG of signs for this language. has the same meaning as , while substituting one for For the string the other in an equation changes the truth value.
s

(4.134)

r v Fr

T rt IICS TIih @S q T rt IICS T rt IICS T q Iih @S

r v r @ r v yF5r

@ r v 4R5r

4jh r v r @ 4j5r r @ r v j5r r v r @

T rt ICS TIih @S D q TIih @S D q T rt ICS D

T G ) S ) 51

q ih @

r C v

(4.133)

Vs PU
s

V U

@ 7

r @ 4FHr r @ 4F#h I$Hr @ @ 4I$#h

r v Fr

rt 

Montague Semantics II

345

We shall show below that weakly context free and strongly context free coincide for interpreted languages. This means that the notion of an interpreted language is not a very useful one, since adding meanings to sentences does not help in establishing the structure of sentences. The idea of the proof is very simple. Consider an arbitrary linear sign grammar and a start symbol . We replace throughout by , where C. Now replace the algebra of meanings by the partial algebra of denite structure terms. This denes . Then for c C , x c is a sign generated by iff is a denite struc ture term such that x and c; and x is generated by iff is a denite structure term such that x, and . Finally, we introduce the following unary mode .

This new grammar, call it , denes the same interpreted language with respect to . So, if an interpreted language is strongly context free, it has a context free sign grammar of this type. Now, suppose that the interpreted language is weakly context free. So there is a CFG G generating . At the rst step we take the trivial semantics: everything is mapped to 0. This is a strongly context free sign system, and we can perform the construction above. This yields a context free sign system where each x has as its Cdenotations the set of structure terms that dene a Cconstituent with string x. Finally, we have to deal with the semantics. Let x be an string and let Mx be the set of meanings of x and x the set of structure terms for x as an . If x Mx , there is no grammar for this language based on G. If, however, x Mx there is a function f x : x Mx . Finally, put (4.136)

This denes the sign grammar . It is context free and its interpreted language with respect to the symbol is exactly . Theorem 4.82 Let be a countable interpreted language. If the string language of is context free then is strongly context free. there exists a context free system of signs For every CFG G for and with a category such that

Vs PU

) a i(U

0V alU

f :

fx : x

7 A f I A }

Vs PU

0aV8I ) ) ( U i V0 a8) } D 1 i A ( D

0 ) ) ( i

V0 a8)

) a i(U

(4.135)
}

0D 8)

) ( i

i D 0 i 8) ) (D T v S
}

346

Semantics

(b) for every nonterminal symbol A


G
~

Proof. has been established. For it sufces to observe that for every CFL there exists a CFG in which every sentence is innitely ambigous. Just replace by and add the rules . Notice that the use of unary rules is essential. If there are no unary rules, a given string can have only exponentially many analyses. Lemma 4.83 Let L be a CFL and d 0. Then there is a CFG G and such that for all x L the set of nonisomorphic Gtrees for x has at least d x members. Proof. Notice that it is sufcient that the result be proved for almost all x. For nitely many words we can provide as many analyses as we wish. First of all, there is a grammar in Chomsky normal form that generates L. Take two rules that can be used in succession.

Add the rules

Then the string ABC has two analyses: A B C and A B C . We proceed similarly if we have a pair of rules

This grammar assigns exponentially many parses to a given string. To see this notice that any given string x of length n contains n distinct constituents. For n 3, we use the almost all clause. Now, let n 3. Then x has a decomposition x y0 y1 y2 into constituents. By inductive hypothesis, for y i we have d yi many analyses. Thus x has at least 2d y0 d y1 d y2 2d x analyses. The previous proof actually assumes exponentially many different structures to a string. We can also give a simpler proof of this fact. Simply replace N by N d and replace in each rule every nonterminal X by any one of the X k , k d.

i i i

(4.139)

XE

AB

(4.138)

XE

(4.137)

BC

DE

AD

Ti

i S

1 0 ) ) ( i

x : for some m

M: x A m

i S

V U

(a)

x:A

1 i

0 ) (

Montague Semantics II

347

Theorem 4.84 Let be a countable interpreted language. Then is strongly context free for a CFG without unary rules iff is context free and there is some constant c 0 such that for almost all strings of A : the number of meanings of x is bounded by c x . We give an example. Let A : (4.140) L: and

We associate the following truth values to the sentences.

(4.141)

1 0

Furthermore, we x a CFG that generates L:


U

We construct a context free system of signs with . For every rule of arity n 0 we introduce a symbol of arity n. In the rst step the interpretation is simply given by the structure term. For example,

(To be exact, on the left hand side we nd the unfolding of rather than the symbol itself.) Only the denition of is somewhat different. Notice that the meaning of sentences is xed. Hence the following must hold.

We can now do two things: we can redene the action of the function . Or we can leave the action as given and factor out the congruence dened by

(4.144)

1 0 1 0

0 ! W W ) P

) y ( i i

V0 a!z)

) 8) ) aU i()0 i(

VI$HP HP U P P P V P tP 1$U P P P VI$HP P P U P P V HP P P U P P

(4.143)

x y

V U

(4.142)

0 : 2 : 4 :

1 : 3 :

T p e p ) e p e AF7 H AA 7 bH RR7 bH 5 h h A 5 P 7 H G A h h A A 5 ) p e ) S A7 5bH @IG IG RG D A h h A P 7 H P 7 H A h h A P 7 H T ) p ) e S A h h A 5 A 7 H RG D P 7 H

A hP h A G U

Vs PU G

T 0 ) p e p ( e A7 H @F7 bH 5 A h h A A 5 ) 0 ) p ( e P 7 H G A h h A A 5 I@F7 bH )0 ) p e ( A 5 7 H @RG A h h A P 7 H ) 0 ) ( @S s P 7 H G A h h A P 7 H I@RG D

p e A 5 7 bH P 7 H PG R U G

348

Semantics

(4.144) in the algebra of the structure terms enriched by the symbols 0 and 1. If we choose the latter option, we have (4.145)

Now let be the congruence dened by (4.144). (4.146)

The functions on the exponents are xed. Let us pause here and look at the problem of reversibility. Say that f : S T is nite if f x for every x S. Likewise, f is bounded if there k for all x. is a number k such that f x Denition 4.85 Let E M be an interpreted language. is nitely reversible (boundedly reversible) if for every x E, x : y : x y is nite (bounded), and for every y M, y : x : x y is nite (bounded), and moreover, the functions x x and y y are computable. The conditions on x are independent from the conditions on x (see (Dymetman, 1991; Dymetman, 1992)). Theorem 4.86 (Dymetman) There are interpreted languages and such that (a) x x is nite and computable but y y is not, (b) y y is nite and computable but x x is not. In the present context, the problem of enumerating the analyses is trivial. So, given a context free sign grammar, the function x x is always computable, although not always nite. If we insist on branching, x grows at most exponentially in the length of x. We have established nothing about the maps m m . Let us now be given a sign system whose projection to E C is context free. What conditions must be impose so that there exists a context free sign

T s 1 b0 ) (

T 1 s Q0 ) (

jV U

8jV U

} s

) iaaa)

(4.147)

n :

if for all i n : i otherwise.

Ai ,

Next we dene the action on the categories. Let

D aa D T T P PS) 8$HP C'bT ) bT P tP 1) P P 5) P P P P P P ) T P S) T P S) T P P P) P bbgC'bb$C'b855P 5P b$HP P

5P CS P P S) P 'bT CS P ) S P S

M :

T P P 5) P tP 5) P P 1$P P P P P P P P P ) 1P P P 855P ) 55P ) 88q) ) S P P ) P P P P P ) P ) P D


0 B A0 An 1 . Then

M:

01

V U

Montague Semantics II

349

grammar for ? Let x A and A C. We write x A : m: x A m and call this set the set of Ameanings of x in . If is given by the context, we omit it. If a system of has a CFG then it satises the following equations for every A and every x.
Z F 1 1

where Z B0 Bn 1 A. This means simply put that the Ameanings of x can be computed directly from the meanings of the immediate subconstituents. From (4.148) we can derive the following estimate.
1

This means that a string cannot have more meanings than it has readings (= structure terms). We call the condition (4.149) the count condition. In particular, it implies that there is a constant c such that for almost all x and all A:

Even if the count condition is satised it need not be possible to construct a context free system of signs. Here is an example.

A CFG for is

The condition (4.148) is satised. But there is no context free sign grammar for . The reason is that no matter how the functions are dened, they must produce an innite set of numbers from just one input, 0. Notice that we can even dene a boundedly reversible system for which no CFG exists. n It consists of the signs n n , n , the signs 2n and the signs n n n where n is prime. We have n 3n , whence x 3 2 2, to be partial, but if 2. However, suppose we allow the functions f and k dened f x0 x f 1 i f xi . Then a partial context free grammar exists if the interpreted language dened by is boundedly reversible. (Basically, the partiality allows any degree of sensitivity to the string of which the

T ) } S q H H ) ) q ( } H H

0 ) ) ( q

i) i 6iaaa) U

(4.152)

0 ) )

q( 0 ) ) @S

(4.151)

0 :n

q @"T ( S s H

H s #H

(4.150)

xA

cx

n :n

Z F

i Z

i W aa W i f

(4.149)

xi

Bi

x0

x Z

i W aa W i

i Raae e

(4.148)

B0

x Z

Z x0

B Z

x0

x Z

1 0 ) ) ( i

ni

1 i i

) baaa)

0
}

) ) q ( } H H

350

Semantics

expression is composed. Each meaning is represented by a bounded number of expressions.) Let A B0 Bn 1 . Notice that the function makes Ameanings from certain Bi meanings of the constituents of the string. However, often linguists use their intuition to say what an Astring means under a certain analysis, that is to say, structure term. This as is easy to see is tantamount to knowing the functions themselves, not only their domains and ranges. For let us assume we have a function which assigns to every structure term of an Astring x an Ameaning. Then the functions are uniquely determined. For let a derivation of x be given. This derivation determines derivations of its immediate constituents, which are now unique by assumption. For the tuple of meanings of the subconstituents we know what the function does. Hence, it is clear that for any given tuple of meanings we can say what the function does. (Well, not quite. We do not know what it does on meanings that are not expressed by a Bi string, i f . However, on any account we have as much knowledge as we need.) Let us return to Montague Grammar. Let A C M be strongly context free, with F the set of modes. We want to show that there is an AB grammar which generates . We have to precisify in what sense we want to understand this. We cannot expect that is any context free system, since ABgrammars are always binary branching. This, however, means that we have to postulate other constituents than those of . Therefore we shall only aim to have the same sentence meanings. In what way we can get more, we shall see afterwards. To start, there is a trivial solution of our problem. If A B0 B1 is a rule we add a 0ary mode
0 1 0 1

This allows us to keep our constituents. However, postulating empty elements does have its drawbacks. It increases the costs of parsing, for example. We shall therefore ask whether one can do without empty categories. This is possible. For, as we have seen, with the help of combinators one can liberate oneself from the straightjacket of syntactic structure. Recall from Section 2.2 the transformation of a CFG into Greibach Normal Form. This uses essentially the tool of skipping a rule and of eliminating left recursion. We leave it to the reader to formulate (and prove) an analogon of the skipping of rules for context free sign grammars. This allows us to concentrate on the elimination of left recursion. We look again at the construction of Lemma 2.33. Choose

0 aV

) (

(4.153)

A B0 B1 xB xB F xB xB

V U

aa i

Montague Semantics II

351

a nonterminal X. Assume that we have the following Xproductions, where j , j m, and i , i n, do not contain X.

assume that j Y j , i Ui , are nonterminal symbols. (Evidently, this can be achieved by introducing some more nonterminal symbols.) We we have now these rules.

So, we generate the following structures.

We want to replace them by these structures instead:


0 1 n 2 n 1

Proceed as in the proof of Lemma 2.33. Choose a new symbol Z and replace the rules by the following ones.

Now dene the following functions.


i i

Now we have eliminated all left recursion on X. We only have to show that we have not changed the set of Xmeanings for any string. To this end, let x be an Xstring, say x y i k zi , where y is a Y j string and zi a U j string. i Then in the transformed grammar we have the Zstrings
p i k

(4.160)

up :

zi

x0 :

Wi

x0

x0 x1

) V

x0 x1 :

(4.159)

x2

x0 x1 :

i i i

x 1 x0

x1 x2 x0

(4.158b)

j : X

Yj Z

: Z

Yi Z

(4.158a)

j : X

Uj

i : Z

aa

k aa

(4.157)

Y Ui Ui

n 2

n 1

Ui

aa

aa

(4.156)

Y Ui Ui

Ui

Ui

Ui

Yi

(4.155)

X Ui

D i

Further let

j ,

m, and

i ,

n, be given. To keep the proof legible we

Yi

) i W

(4.154)

j : X

X i

i : X

352

Semantics

and x is an Xstring. Now we still have to determine the meanings. Let be a meaning of y as a Yi string and i , i k, a meaning of zi as a U j string. i The meaning of x as an Xstring under this analysis is then

Inductively we get

If we put j n, and if we apply at last the function on and the result we j nally get that x has the same Xmeaning under this analysis. The converse shows likewise that every Xanalysis of x in the transformed grammar can be transformed back into an Xanalysis in the old grammar, and the Xmeanings of the two are the same. The reader may actually notice the analogy with the semantics of the Geach rule. There we needed to get new constituent structures by bracketing A B C into A B C . Supposing that A and B are heads, the semantics of the rule forming A B must be function composition. This is what the denitions achieve here. Notice, however, that we have no categorial grammar to start with, so the proof given here is not fully analogous. Part of the semantics of the construction is still in the modes themselves, while categorial grammar requires that it be in the meaning of the lexical items. After some more steps, consisting in more recursion elimination and skipping of rules we are nally done. Then the grammar is in Greibach normal form. The latter can be transformed into an ABgrammar, as we have already seen.

n j

n j

n 2

n 1

) V

)) iaaaV

aa6U

(4.164)

x0

x0

x2

) V

) V

~U

(4.163)

x2

n n

n 1 x2 n 2 i x 2 n 1 n 2

n 2

n 2

n 2

n 1

)

Then un

has the meaning


i

n 1

n 1

n 1

n 1

(4.162)

As a Zstring un

has the meaning


i

x0

x0

n 1

n 2

n 2

n 1

) V

)) aaaV

) V

) U

U aa6U

(4.161)

Montague Semantics II

353

Theorem 4.87 Let be a context free linear system of signs. Then there exists an ABgrammar that generates . The moral to be drawn is that Montague grammmar is actually quite powerful from the point of view of semantics. If the string languages are already context free, then if any context free analysis succeeds, so does an analysis in terms of Montague grammar (supposing here that nothing except linear concatenation is allowed in the exponents). We shall extend this result later to PTIMElanguages. Notes on this section. With suitable conditions (such as nondeletion) the set x becomes enumerable for every given x, simply because the number of parses of a string is nite (and has an a priori bound based on the length of x). Yet, x is usually innite (as we discussed in Section 5.1), and the sets y need not be recursively enumerable for certain y. (Pogodalla, 2001) studies how this changes for categorial grammars if semantic representations are not formulae but linear formulae. In that case, the interpreted grammar becomes reversible, and generation is polynomial time computable. Exercise 166. Let G be a quasi context free sign grammar. Construct a context free sign grammar which generates the same interpreted language. . Show that Exercise 167. Let G be determined by the two rules the set of constituent structures of G cannot be generated by an ABgrammar. Hint. Let d be the number of occurrences of slashes ( or ) in . If is d or d d . the mother of and then either d Exercise 168. Let be strongly context free with respect to a 2standard CFG G with the following property: there exists a k such that for every Gtree T and every node x T there is a terminal node y T with yx k. Then there exists an ABgrammar for which generates the same constituent structures as G. Exercise 169. As we have seen above, left recursion can be eliminated from a grammar which generates for every CFG G. Show that there exists a CCG nonterminal X the same set of Xstrings. Derive from this that we can write an ABgrammar which for every X generates the same Xstrings as G. Why does it not follow that LB G can be generated by some ABgrammar? Hint. For the rst part of the exercise consider Exercise 114.

i,

V U

h1

Exercise 170. Let C A be an ABgrammar. Put 0 : are the 0th projections. Inductively we put for i

a A

a . These , i 1 :

Vv U

} }

dV U

V U

V "uU

dV U

V U

0 ) ) ) (

0 3) j) i) (

V U

 )

354

Semantics

. In this way we dene the projections of the symbols from 0 . Show that by these denitions we get a grammar which satises the principles of Xsyntax. Remark. The maximal projections are not necessarily in 2 .
7. Partiality and Discourse Dynamics

After having outlined the basic setup of Montague Semantics, we shall deal with an issue that we have so far tacitly ignored, namely partiality. The name partial logic covers a wide variety of logics that deal with radically different problems. We shall look at two of them. The rst is that of partiality as undenedness. The second is that of partiality as ignorance. We start with partiality as undenedness. Consider the assignment y : x 1 u2 9 to y in a program. This clause is potentially dangerous, since u may equal 3, in which case no value a n n , lim : limn an can be assigned to y. Similarly, for a sequence is dened only if the series is convergent. If not, no value can be given. Or in type theory, a function f may only be applied to x if f has type for certain and x has type . In the linguistic and philosophical literature, this phenomenon is known as presupposition. It is dened as a relation between propositions (see (van der Sandt, 1988)).
~

Denition 4.88 A proposition presupposes if both and We write (or simply ) to say that presupposes .

The denition needs only the notion of a negation in order to be welldened. Clearly, in boolean logic this denition gives pretty uninteresting results. presupposes in iff is a tautology. However, if we have more than two truthvalues, interesting results appear. First, notice that we have earlier remedied partiality by assuming a dummy element that a function assumes as soon as it is not dened on its regular input. Here, we shall remedy the situation by giving the expression itself the truthvalue . That is to say, rather than making functions themselves total, we make the assignment of truthvalues a total function. This has different consequences, as will be seen. Suppose that we totalize the operator lim n so that it can be applied 1 n, to all sequences. Then if an n is not a convergent series, say an and 3 . The negation of 3 limn an is not true, since limn an the statement will then be true. This is effectively what Russel (1905) and Kempson (1975) claim. Now suppose we say that 3 lim n an has no truth

v 'U

{ D

{ D

U V

g U

| r

Partiality and Discourse Dynamics

355

value; then 3 limn an also has no truthvalue. To nevertheless be able to deal with such sentences rather than simply excluding them from discourse, we introduce a third truthvalue, . The question is now: how do we dene the 3valued counterparts of , and ? In order to keep confusion at a minimum, we agree on the following conventions. 3 denotes the 3valued consequence relation determined by a matrix 01 1 , where is an interpretation of the connectives. We shall assume that F consists of a subset of the set of the 9 unary and 27 binary symbols, which represent the unary and binary functions on the three element set. This denes . Then 3 is uniquely xed by F, and the logical connectives will receive a distinct name every time we choose a different function on 0 1 . What remains to be solved, then, is not what logical language to use but rather by what , , , and connective to translate the ordinary language connectors . Here, we assume that whatever interprets them is a function on 01 (or 0 1 2 ), whose restriction to 0 1 is its boolean counterpart, which is already given. For those functions, the 2valued consequence is also dened and denoted by 2 . Now, if is the truthvalue reserved for the otherwise truth valueless statements, we get the following three valued logic, due to Bochvar (1938) . Its characteristics is the fact that undenedness is strictly hereditary. 0 1 1 0 0 1 0 1

The basic connectives are , and , which are interpreted by , and . Here is a characterization of presupposition in Bochvars Logic. Call a connective  a Bochvarconnective if  x iff x i for some i . Proposition 4.89 Let and be composed using only Bochvarconnectives. Then 3 iff (i) is not classically satisable or (ii) 2 and var var . Proof. Suppose that 3 and that is satisable. Let be a valuation such that 1 for all . Put p : p for all p var and p : otherwise. Suppose that var var . Then , contradicting our assumption. Hence, var var . It follows that every

} dV U

V U { D V U

{ vD

V U w V U DV U

{ D

V V U iU

}QV U vV U V U

(4.165)

0 1 0 0 0 1

0 1 0 1 1 1

5 g RH 9

0 S IT ')

) bT ) ) !( S

T ) ) S

g 9

T ) S

T ) ) S

V U { D

9 Rh

T ) ) S

V U

V U

aa

X D

356

Semantics

valuation that satises also satises , since the valuation does not assume on its variables (and can therefore be assumed to be a classical valuation). Now suppose that 3 . Then clearly must be satisable. Furthermore, by the argument above either var var or else 2 . This characterization can be used to derive the following corollary. Corollary 4.90 Let and be composed by Bochvarconnectives. Then iff var var and 2 . Hence, although Bochvars logic makes room for undenedness, the notion of presupposition is again trivial. Bochvars Logic seems nevertheless adequate as a treatment of the operator. It is formally dened as follows. Denition 4.91 is a partial function from predicates to objects such that x x is dened iff there is exactly one b such that b , and in that case x x : b. Most mathematical statements which involve presuppositions are instances of a (hidden) use the operator. Examples are the derivative, the integral and the limit. In ordinary language, corresponds to the denite determiner . Using the operator, we can bring out the difference between the bivalent inon innite terpretation and the three valued one. Dene the predicate sequences of real numbers as follows:
~

This is in formal terms the denition of a Cauchy sequence. Further, dene a predicate as follows.

This predicate says that x is a cumulation point of . Now, we may set lim : x x . Notice that is equivalent to

This is exactly what must be true for lim to be dened. (4.170) (4.171)

V

CV UVIkU k 13rRUV U@V UIkU k 3RV U u ~ V 1 r uU V IkU k $Rhu UV 1 r

(4.169)

V aV

U V 1 r uU ~U U V 1 r uU CV IkU k 3RV @V IkU k $RV U

(4.168)

v U V @jV

y 0

h h @5

~U U V V

4 k

A P H 7 @3h

V | u r q IkU k gR5@u

~ U

4 FD

V IkU k 3Ru UV 1 r

(4.167)

x :

U v U bjV @V @jV

~U U V V

~ U

V | u r q IkU k QR1@u

(4.166)

| u r q k QR18u

V U

w V U D

v V U

V U

f $h D P

} dV U

k 3Ru 1 r

V Ikk $Rhu UV U 1 r

V U D V U

Partiality and Discourse Dynamics

357

Under the analysis (4.170) the sentence (4.169) presupposes that is a Cauchy sequence. (4.171) does not presuppose that. However, the dilemma for the translation (4.171) is that the negation of (4.169) is also false (at least in ordinary judgement). What this means is that the truthconditions of (4.172) are not expressed by (4.174), but by (4.175) which in three valued logic is (4.173).
G

(4.172) (4.173) (4.174) (4.175)

It is difcult to imagine how to get the translation (4.175) in a bivalent approach, although a proposal is made below. The problem with a bivalent analysis is that it can be shown to be inadequate, because it rests on the assumption that the primitive predicates are bivalent. However, this is problematic. The most clearcut case is that of the truthpredicate. Suppose we dene the semantics of on the set of natural numbers as follows.

Here, is, say, the G del code of . It can be shown that there is a such o that is true in the natural numbers. This contradicts (4.176). The sentence corresponds to the following liar paradox.
X G

(4.177)

Thus, as Tarski has observed, a truthpredicate that is consistent with the facts in a sufciently rich theory must be partial. As sentence (4.177) shows, natural languages are sufciently rich to produce the same effect. Since we do not want to give up the correctness of the truthpredicate (or the falsity predicate), the only alternative is to assume that it is partial. If so, however, there is no escape from the use of three valued logic, since bivalence must fail. Let us assume therefore that we three truthvalues. What Bochvars logic gives us is called the logic of hereditary undenedness. For many reasons it

y 0

h A P H

p A D $h $$R$D 9 h 4 9 h A A

V "t

sU p

(4.176)

V aV

U V

V

x x x

h h 5

CV IkU k 3RV @V IkU k 3RV U UV 1 r uU ~U U V 1 r uU V UIkU k 1$RV V IkU k 3RV U V r uU ~U U V 1 r uU V V IkU k $Rhu U UV 1 r D g h g X g FD f $h G s 9 A 4 D P
4

P H 7 @3h

V dt
s

s p U s

358

Semantics

is problematic, however. Consider the following two examples.

lim 3

lim

(4.179)

y:

u2

y:

By Bochvars Logic, (4.178) presupposes that , and are convergent series. (4.179) presupposes that u 3. However, none of the two sentences have nontrivial presuppositions. Let us illustrate this with (4.178). Intuitively, the ifclause preceding the equality statement excludes all sequences from consideration where and are nonconvergent sequences. One can show that , the pointwise sum of and , is then also convergent. Hence, the if clause covers all cases of partiality. The statement lim lim lim never fails. Similarly, has the power to eliminate presuppositions.

lim

lim x

(4.181)

u:

4; y :
X D

u2

As it turns out, there is an easy x for that. Simply associate the following connectives with and . 0 1 0 1 0 1

The reader may take notice of the fact that while and are reasonable candidates for and , is not as good for . In the linguistic literature, various attempts have been made to explain these facts. First, we distinguish the presupposition of a sentence from its assertion. The denition of these terms is somewhat cumbersome. The general idea is that the presupposition of a sentence is a characterization of those circumstances under which it is either true or false, and the assertion is what the sentence says when it is either true of false (that is to say, the assertion tells us when the sentence is true given that it is either true or false). Let us attempt to dene this. Let be a proposition. Call a generic presupposition of if the following holds. (a) , (b) if then 3 . If is a generic

g k 5

k Y

9 h

aa

(4.182)

0 1 1 1 1 0 1

0 1 0 0 1 0 1

0 1 0 1 1 1 1

V g Rx"kU

(4.180)

lim

V g IkU g "

V g Rx"kU

g C

(4.178)

ih R V v U V g U o Ry D D D g " y D p ` p i X A 9 h 7 h A 4 9 h R 5 h b3R$8h 9 g H h 5 9 IH

lim

V v U V g U D D g " y D ` p 9 H A h D 5 h A 4 9 h R 5 Rbnb$bh 9 g $bH h 5 9 RH

9 IH

9 Rh

X D

9 RH
4

aa

9 IH

g C

Partiality and Discourse Dynamics

359

presupposition of , is called an assertion of . First, notice that presuppositions are only dened up to interderivability. This is not a congruence. We may have 3 without and receiving the same truthvalue under all assignments. Namely, 3 iff and are truthequivalent, that 1 exactly when 1. In order to have full equivalence, we is, must also require 3 . Second, notice that satises (a) and (b). However, presupposes itself, something that we wish to avoid. So, we additionally require the generic presupposition to be bivalent. Here, is bivalent if for every valuation into 0 1 : 0 1 . Dene the following connective.

Denition 4.92 Let be a proposition. is a generic presupposition of with respect to 3 if (a) , (b) is bivalent and (c) if then 3 . is the assertion of if (a) is bivalent and (b) . Write 3 P for the generic presupposition (if it exists), and A for the assertion. It is not a priori clear that a proposition has a generic presupposition. A case in point is the truthpredicate. Write 3 if for all .
3

The projection algorithm is a procedure that assigns generic presuppositions to complex propositions by induction over their structure. Table 14 shows a projection algorithm for the connectives dened so far. It is an easy matter to dene projection algorithms for all connectives. The prevailing intuition is that the three valued character of , and is best explained in terms of context change. A text is a sequence of propositions, say i : i n . A text is coherent if for every i n: j : j i 3 i i . In other words, every member is either true or false given that the previous propositions are considered true. (Notice that the order is important now.) In order to extend this to parts of the j we dene the local context as follows. Denition 4.94 Let i : i n . The local context of j is i : i j . For a subformula occurrence of j , the local context of that occurrence is dened as follows.

9 h

aa

X D

5 g RH 9

V U

V U

Proposition 4.93

P .

V U

V U

V U

(4.183)

0 1 0 0 1 1

T ) QV U S 1

T ) ) S

~ W

V U

~ W

V U

V U

360

Semantics

Table 14. The Projection Algorithm

If is the local context of then (a) is the local context of and (b) ; is the local context of . If is the local context of then (a) is the local context of and (b) ; is the local context of . If is the local context of then (a) is the local context of and (b) ; is the local context of .

j is bivalent in its local context if for all valuations that make all formulae in the local context true, j is true or false.
The presupposition of is the formula such that is bivalent in the context , and which implies all other formulae that make bivalent. It so turns out that the context dynamics dene a three valued extension of a 2valued connective, and conversely. The above rules are an exact match. Such formulations have been given by Irene Heim (1983), Lauri Karttunen (1974) and also Jan van Eijck (1994). It is easy to understand this in computer programs. A computer program may contain clauses carrying presuppositions (for example clauses involving divisions), but it need not fail. For if whenever a clause carrying a presupposition is evaluated, that presupposition is satised, no error ever occurs at runtime. In other words, the local context of that clause satises the presuppositions. What the local context is, is dened by the evaluation procedure for the connectives. In computer languages, the local context is always to the left. But this is not necessary. The computer

V aV U

V U V aV U

k y

@V U U

V k

V aV U

V U

V U

V k

V aV U

V U V U V U

FV U 2 U V U V k V U D U 2 V U V d k V U D 2 U

U U U

D V

V kd

V V

U U U U

V U V U V U

V U

U U U U

D D D

D V

V k

V V

A A A A

V V V V V

@V V V @V V U

U U U U U U

A A A A A

A A A A A

P P P P P

P P P P P A P A P A

P P P

Partiality and Discourse Dynamics

361

evaluates by rst evaluating and then only if is true but it could also evaluate rst and then evaluate whenever is true. In (Kracht, 1994) it is shown that in the denition of the local context all that needs to be specied is the directionality of evaluation. The rest follows from general principles. Otherwise one gets connectives that extend the boolean connectives in an improper way (see below on that notion). The behaviour of presuppositions in quantication and propositional attitude reports is less straightforward. We shall only give a sketch.
X G X

(4.184) (4.185) (4.186)

using the quantier in predicate logic. We wish We have translated to extend it to a threevalued quantier, which we also call . (4.184) is true even if not everybody in the region is a bachelor; in fact, it is true exactly if there is no nonbachelor. Therefore we say that x is true iff there is no x for which x is false. x is false if there is an x for which x is false. Thus, the presupposition effectively restricts the range of the quantier. x is bivalent. This predicts that (4.185) has no presuppositions. On (4.186) the intuitions vary. One might say that it does not have any presuppositions, or else that it presupposes that the neighbour is a man (or perhaps: that John believes that his neighbour is a man). This is deep water (see (Geurts, 1998)). Now we come to the second interpretation of partiality, namely ignorance. Let now stand for the fact that the truthvalue is not known. Also here the resulting logic is not unique. Let us take the example of a valuation that is only dened on some of the variables. Now let us be given a formula . Strictly speaking is not dened on if the latter contains a variable that is not in the domain of . On the other hand, there are clear examples of propositions that receive a denite truthvalue no matter how we extend to a total function. For example, even if is not dened on p, every extension of it must make p p true. Hence, we might say that also makes p p true. This is the idea of supervaluations by Bas van Fraassen. Say that is svtrue (svfalse) under if is true under every total . If is neither svtrue nor svfalse, call it svindeterminate. Unfortunately, there is no logic to go with this approach. Look at the interpretation of . Clearly, if either or is svtrue, so is their disjunction. However, what if both and

V U ~

G 5p 5 g $h H P

f Ig 5

V U

5 h 4 4 h P H b7@4 g

5 g

5 g

~ ~ ` 8h h 5 G G G ` G e H A$D7 g R9D!9$D #H kh $h 9 g 5 h A 4 4 A h D P G 5p G ` G P h H cAD 9 g Dq7$54#H 9 g 88 8h H R h 4 9 D A 5 h 5 p G y hRHbnH f 9H 9 h R H D 5 5 4 4 G ` g 5 g $h Hp H 8h G P 5

V U ~

9 g

D R q

h 5 h

V U ~

V U

o @q

V U

362

Semantics

are svindeterminate?
0 1 0 1 0 1 1 1 1 1 ?

(4.187)

The formula p q is svindeterminate under the empty valuation. It has no denite truthvalue, because both p and q could turn out to be either true p is svtrue under the empty valuation, or false. On the other hand, p even though both p and p are svindeterminate. So, the supervaluation approach is not so wellsuited. Stephen Kleene actually had the idea of doing a worst case interpretation: if you cant always say what the value is, ll in . This gives the socalled Strong Kleene Connectives (the weak ones are like Bochvars). 0 1 1 0 0 1 0 1

For example, 0 0 01 0 00 1 01 . So, we simply take sets of truthvalues and calculate with them. A more radical account of ignorance is presented by constructivism and intuitionism. A constructivist denies that the truth or falsity of a statement can always be assessed directly. In particular, an existential statement is true only if we produce an instance that satises it. A universal statement can be considered true only if we possess a proof that any given element satises it. For example, Goldbachs conjecture is that every even number greater than 2 is the sum of two primes. According to a constructivist, at present it is neither true nor false. For on the one hand we have no proof that it holds, on the other hand we know of no even number greater than 2 which is not the sum of two primes. Both constructivists and intuitionists unanimously reject axiom (a2). (Put for p1 . Then in conjunction with the other rules

{ $D

T ) S

s )

(4.189)

f x0 x1 :

f x0

x1

T S

These connectives can be dened in the following way. Put 1 : 0 and : 0 1 . Now put

1 ,0 :

s S jT ) Se%T %s a0 ) aU s ( S V D D { e V ) U D

(4.188)

ws

0 1 0 0 0 0 1 0

0 1 0 1 1 1 1 1

{ {

T ) S

{ {

D {

T S

Partiality and Discourse Dynamics

363

this gives p0 p0 p0 . This corresponds to the Rule of Clavius: from p0 p0 conclude p0 .) They also reject p p, the socalled Law of the Exluded Middle. The difference between a constructivist and an intuitionist is the treatment of negative evidence. While a constructivist accepts basic negative evidence, for example, that this lemon is not green, for an intuitionist there is no such thing as direct evidence to the contrary. We only witness the absence of the fact that the lemon is green. Both, however, are reformist in , , , the sense that they argue that the mathematical connectives and have a different meaning. However, one can actually give a reconstruction of both inside classical mathematics. We shall deal rst with intuitionism. Here is a new set of connectives, dened with the help of , which satises .

Intuitively, the nodes of P represent stages in the development of knowledge. Knowledge develops in time along . We say that x accepts if P x , and that x knows if P x . By denition of , once a proposition p is accepted, it is accepted for good and therefore considered known. Therefore G del simply translated variables p by the foro mula p. Thus, intuitionistically the statement that p may therefore be understood as p is known rather than p is accepted. The systematic conation of knowledge and simple temporary acceptance as true is the main feature of intuitionistic logic.

Constructivism in the denition by Nelson adds to intuitionism a second valuation for those variables that are denitely rejected, and allows for the possibility that neither is the case. (However, nothing can be both accepted and rejected.) This is reformulated as follows.

0 i) ( )

Proposition 4.96 Let P for all .

be an Imodel and an Iproposition. Then

0 i) (

Denition 4.95 An Imodel is a pair P , where P ordered set and p p for all variables p.

is a partially

Call an Iproposition a proposition formed from variables and the connectives just dened.

0 ) )i) (

0 i) ( )

V U

V U

V U

(4.190)

using only

g 9 5 g IH 9

"V

9 h

0 ) i) ( )

aa

X D

V U

364

Semantics

Denition 4.97 A Cmodel is a pair P , where P is a partially ordered set and : V P 01 such that if p v 1 and v w then also p w 1, and if p v 0 and v w then p w 0. We write P x p if p x 1 and P x p if p x 0. We can interpret any propositional formula over 3valued logic that we have dened so far. We have to interpret and , however. x x x there is y for no y
f

(4.191)

x:y
f

there is y
f

x:y x:y

Now dene the following new connectives.


c

In his data semantics (see (Veltman, 1985)), Frank Veltman uses constructive logic and proposes to interpret and as and , respectively. What is interesting is that the set of points accepting is lower closed but not necessarily upper closed, while the set of points rejecting it is upper but not necessarily lower closed. The converse holds with respect to . This is natural, since if our knowledge grows there are less things that may be true but more that must be. The interpretation of the arrow carries the germ of the relational interpretation discussed here. A different strand of thought is the theory of conditionals (see again (Veltman, 1985) and also (G rdenfors, 1988)). The conditional a is accepted as true under Ramseys interpretation if, on taking as a hypothetical assumption (doing as if is the case) and performing the standard reasoning, we nd that is true as well. Notice that after this routine of hypothetical reasoning we retract the assumption that . In G rdenfors moda els there are no assignments in the ordinary sense. A proposition is mapped directly onto a function, the update function U . The states in the G rdenfors a model carry no structure. Denition 4.98 A G rdenfors model is a pair G U , where G is a set, and a U : Tm GG subject to the following constraints.

0 ) (

H f

4 1

A F7 f

(4.192)

: :

c c

: : : :

for no y

x:y

V ) U D V ) U

0 i) D (

V ) U

C"0 ) )i) ( V ) U V ) UD T ) D ) e S

0 i) ( )

) 0 ) i) ( V ) U

Partiality and Discourse Dynamics

365

For all : U U

The reader may verify that this relation is reexive and transitive. If we require that x y iff x and y accept the same propositions, then this ordering is also a partial ordering. We can dene as follows. If U U U then we write . Hence, if our language actually has a conjunction, the latter is a condition on the interpretation of it. To dene , G rdenfors a does the following. Suppose that for all and there exists a such that U U U U . Then we simply put U : U . Finally, since is equivalent to , once we have and , we can also dene . For negation we need to assume the existence of an inconsistent state. The details need not concern us here. Obviously, G rdenfors models are still more a general than data semantics. In fact, any kind of logic can be modelled by a G rdenfors model (see the exercises). a Notes on this section. It is an often discussed problem whether or not a statement of the form x is true if there are no x at all. Equivalently, in three valued logic, it might be said that x is undened if there is no x such that x is dened. Exercise 171. A three valued binary connective  satises the Deduction Theorem if for all , and : ; 3 iff 3  . Establish all truth tables for binary connectives that satisfy the Deduction Theorem. Does any of the implications dened above have this property? Exercise 172. Let be a language and a structural consequence relation over . Let G be the set of theories of . For , let U : T T . Show that this is a G rdenfors model. Show that the set of formulae accepted a by all T G is exactly the set of tautologies of .
R

V IT

S s U

V U ~

V U

X aayX

V U ~

(4.193)

U 0 U 1

U n

V U

Put x

y iff there is a nite set i : i

V U

We say that x

G accepts if U x

For all and : U U

U . U U . x. n such that

Chapter 5 PTIME Languages


1. MildlyContext Sensitive Languages

The introduction of the Chomsky hierarchy has sparked off a lot of research into the complexity of formal and natural languages. Chomskys own position was that language was not even of Type 1. In transformational grammar, heavy use of context sensitivity and deletion has been made. However, Chomsky insisted that these grammars were not actually models of performance, neither of sentence production nor of analysis; they were just models of competence. They were theories of language or of languages, couched in algorithmic terms. In the next chapter we shall study a different type of theory, based on axiomatic descriptions of structures. Here we shall remain with the algorithmic approach. If Chomsky is right, the complexity of the generated languages is only of peripheral interest and, moreover, cannot even be established by looking at the strings of the language. Thus, if observable language data can be brought to bear on the question of the language faculty, we actually need to have a theory of the human language(s), a theory of human sentence production, and a theory of human sentence analysis (and understanding). Namely, the reason that a language may fail to show its complexity in speech or writing is that humans simply are unable to produce the more complex sentences, even though given enough further means they would be able to produce any of them. The same goes for analysis. Certain sentences might be avoided not because they are illegitimate but because they are misleading or too difcult to understand. An analogy that might help is the notion of a programming language. A computer is thought to be able to understand every program of a given computer language if it has been endowed with an understanding of the syntactic primitives and knows how to translate them into executable routines. Yet, some programs may simply be too large for the computer to be translated let alone executed. This may be remedied by giving it more memory (to store the program) or a bigger processing unit (to

368

PTIME Languages

be able to execute it). None of the upgrading operations, however, seem to touch on the basic ability of the computer to understand the language: the translation or compilation program usually remains the same. Some people have advanced the thesis that certain monkeys possess the symbolic skills of humans but since they cannot handle recursion, so that their ability to use language is restricted to single clauses consisting of single word phrases (see for example (Haider, 1991) and (Haider, 1993), Pages 8 12). One should be aware of the fact that the average complexity of spoken language is linear (= O n ) for humans. We understand sentences as they are uttered, and typically we seem to be able to follow the structure and message word by word. To conclude that therefore human languages must be regular is premature. For one thing, we might just get to hear the easy sentences because they are also easy to generate: it is humans who talk to humans. Additionally, it is not known what processing device the human brain is. Suppose that it is a nite state automaton. Then the conclusion is certainly true. However, if it is a pushdown automaton, the language can be deterministically context free. More complex devices can be imagined giving rise to even larger classes of languages that can be parsed in linear time. This is so since it is not clear that what is one step for the human brain also is one step for, say, a Turing machine. It is known that the human brain works with massive use of parallelism, for example. Therefore, the problem with the line of approach advocated by Chomsky is that we do not possess a reliable theory of human sentence processing let alone of sentence production (see (Levelt, 1991) for an overview of the latter). Without them, however, it is impossible to assess the correctness of any proposed theory of grammar. Many people have therefore ignored this division of labour into three faculties (however reasonable that may appear) and tried to assess the complexity of the language as we see it. Thus let us ask once more:
How complex is human language (are human languages)?

While the Chomsky hierarchy has suggested measuring complexity in terms of properties of rules, it is not without interest to try to capture its complexity in terms of resources (time and space complexity). The best approximation that we can so far give is this.
Human languages are in PTIME.

V U

MildlyContext Sensitive Languages

369

In computer science, PTIME problems are also called tractable, since the time consumption grows slowly. On the other hand, EXPTIME problems are called intractable. Their time consumption grows too fast. In between the two lie the classes NPTIME and PSPACE. Still today it is not known whether or not NPTIME is contained in (and hence equal to) PTIME. Problems which are NPTIMEcomplete usually do possess algorithms that run (deterministically) in polynomial time on the average. Specically, Aravind Joshi has advanced the claim that languages are what he calls mildly context sensitive (see (Joshi, 1985)). Mildly context sensitive languages are characterized as follows. Denition 5.1 L A has the constant growth property if it is nite or there y is a number cL such that for every x L there is a y L such that x x cL . Every context free language is mildly context sensitive. There are mildly context sensitive languages which are not context free. Mildly context sensitive languages can be recognized in deterministic polynomial time. There is only a nite number of crossed dependency types. Mildly context sensitive languages have the constant growth property. These conditions are not very strong except for the second. It implies that the mildly context sensitive languages form a proper subset of the context sensitive languages. needs no comment. is quite weak. Moreover, it seems that for every natural language there is a number d L such that for every n dL there is a string of length n in L. Rambow (1994) proposes to replace it with the requirement of semilinearity, but that seems to be too strong (see Michaelis and Kracht (1997)). Also is problematic. What exactly is a crossed dependency type? In this chapter we shall study grammars in which the notion of structure can be dened as with context free languages. Constituents are certain subsets of disjoint (occurrences of) subwords. If this denition is accepted, can be interpreted as follows: there is a number n such that a given constituent has no more than n parts. This is certainly not what Joshi had in mind when formulating his conditions, but it is certainly not easy to come up with a denition that is better than this one and as clear.
f

R nicR i

1 i

1 i

g 6 ni

370

PTIME Languages

So, the conditions are problematic with the exception of . Notice that implies , and, as we shall see, also (if only weak equivalence counts here). shall be dropped. In order not to create confusion we shall call a language a PTIME language if it has a deterministic polynomial time recognition algorithm (see Denition 1.100). In general, we shall also say that a function f : A B is in PTIME if there is a deterministic Turing machine which computes that function. Almost all languages that we have considered so far are PTIME languages. This shall emerge from the theorems that we shall prove further below. Proposition 5.2 Every context free language is in PTIME. This is a direct consequence of Theorem 2.57. However, we get more than this. Proposition 5.3 Let A be a nite alphabet and L 1 , L2 languages over A. If L1 L2 PTIME then so is A L1 , L1 L2 and L1 L2 . The proof of this theorem is very simple and left as an exercise. So we get that the intersection of CFLs, for example n n n : n , are PTIME languages. Condition for mildly context sensitive languages is satised by the class of PTIME languages. Further, we shall show that the full preimage of a PTIME language under the Parikhmap is again a PTIME language. To this end we shall identify M A with the set of all strings of the form A , which i n ipi . The Parikhmap is identied with the function : A pn 1 p0 p1 assigns to a string x the string 0 , where p j is the number 1 n 1 of occurrences of j in x. Now take an arbitrary polynomial time computable function g : A 2. Clearly, g M A is also in PTIME. The preimage of 1 under this function is contained in the image of . g 1 1 M A can be thought of in a natural way as a subset of M A . Theorem 5.4 Let L M A be in PTIME. Then 1 L , the full preimage of L under , also is in PTIME. If L is semilinear, 1 L is in PTIME. The reader is warned that there nevertheless are semilinear languages which are not in PTIME. For PTIME is countable, but there are uncountably many semilinear languages (see Exercise 74). Theorem 5.4 follows directly from

A be in PTIME and L Theorem 5.5 Let f : B M : f 1 L also is in PTIME.

A in PTIME. Then

V U

t V U

V U

W aa W

V U

V U

V U

i H

MildlyContext Sensitive Languages

371

Proof. By denition L PTIME. Then M L f PTIME. This is the characteristic function of M. Another requirement for mildly context sensitive languages was the constant growth property. We leave it to the reader to show that every semilinear language has the constant growth property but that there are languages which have the constant growth property without being semilinear. We have introduced the Polish Notation for terms in Section 1.2. Here we shall introduce a somewhat exotic method for writing down terms, which has been motivated by the study of certain Australian languages (see (Ebert and Kracht, 2000)). Let F be a nite signature. Further, let

An element of M t is a product f y, where f F and y . We call f the main symbol and y its key. We choose a new symbol, . Now we say that y is an Aform of t if y is the product of the elements of M t in an arbitrarily chosen (nonrepeating) sequence, separated by . For example, let t: . Then (5.2) M t

Hence the following string is an Aform of t:

Theorem 5.6 Let F be a nite signature and L the language of Aforms of terms of this signature. Then L is in PTIME. Proof. For each Aform x there is a unique term t such that x is the Aform of t, and there is a method to calculate M t on the basis of x. One simply has to segment x into correctly formed parts. These parts are maximal sequences

v r q @5@q

q q q

V U

r q

c c 5 $7 $

r q q q v q r r r 8 $@7F5Hp

(5.3)

V U

Tr@rp')qrF )q )brq )q@q )qq$)br@S V U r q q v c) q c) v H 6 D t t p c s v c $$Qt Ft v @s s s H 6 D

1 i

i f

T V U

1 i

i W S

i W

s T S

M t :

0 ) (

V U

) aaa)

V U

If t

f s0

then put x i:x M si

T S

V U

V U

If t

f with f

0 then put M f :

f .

IT V

v d

)) r) q aaa8S s

U } V U

Inductively, we assign to every term t a set M t

V U

(5.1)

max f : f

0 ) (

D D

372

PTIME Languages

consisting of a main symbol and a key, which we shall now simply call stalks. The segmentation into stalks is unique. We store x on a read and write tape i . Now we begin the construction of t. t will be given in Polish Notation. t will be constructed on Tape o in lefttoright order. We will keep track of the unnished function symbols Tape on s . We search through the keys (members of ) in lexicographic order. On a separate tape, k , we keep note of the current key. s :

, k : , o : .

Let g be the last symbol of s , n the last symbol of k . Else s :

If k If i

, go to else go to .

exit: String is an Aform. Else exit: String is not an Aform.

It is left to the reader to check that this algorithm does what it is supposed to do. Polynomial runtime is obvious. Ebert and Kracht (2000) show that this algorithm requires O n 3 2 log n time to compute. Now we shall start the proof of an important theorem on the characterization of PTIME languages. An important step is a theorem by Chandra, Kozen and Stockmeyer (1981), which characterizes the class PTIME in terms of space requirement. It uses special machines, which look almost like Turing machines but have a special way of handling parallelism. Before we can do that, we introduce yet another class of functions. We say that a function f : A B is in LOGSPACE if it can be computed by a so called deterministic logarithmically bounded Turing machine. Here, a Turing machine is called logarithmically space bounded if it has k 2 tapes (where k may be 0) such that the length of the tapes number 1 through k is bounded by log2 x . Tape 0 serves as the input tape, Tape k 1 as the output tape. Tape 0 is read only, Tape k 1 is write only. At the end of the computation, T has to have written f x onto that tape. (Actually, a moments reection shows that we may assume that the length of the intermediate tapes is bounded by c log2 x , for some c 0, cf. also Theorem 1.98.) This means that if x has

g %'V U W

s , k :

k n

1.

v V U

If n

1, s :

s g, k :

k n .

i i

Match i y f k z. If match succeeds, put i : o : o f . Else exit: String is not an Aform.

y z , s :

i R

V U i

ni

s f ,

ni

MildlyContext Sensitive Languages

373

length 12 the tapes 2 to k 1 have length 3 since 3 log 2 12 4. It need not concern us further why this restriction makes sense. We shall see in Section 5.2 that it is wellmotivated. We emphasize that f x can be arbitrarily large. It is not restricted at all in its length, although we shall see later that the machine cannot compute outputs that are too long anyway. The reader may reect on the fact that we may require the machine to use the last tape only in this way: it moves strictly to the right without ever looking at the previous cells again. Further, we can see to it that the intermediate tapes only contain single binary numbers. Denition 5.7 Let A be a nite alphabet and L A . We say that L is in LOGSPACE if L is deterministically LOGSPACEcomputable.

Proof. We look at the congurations of the machine (see Denition 1.84). A conguration is dened with the exception of the output tape. It consists of the positions of the read head of the rst tape and the content of the intermediate tapes plus the position of the read/write heads of the intermediate tapes. Thus the congurations are ktuples of binary numbers of length c log 2 x , for some c. A position on a string likewise corresponds to a binary number. So we have k 1 binary numbers and there are at most

of them. So the machine can calculate at most x c k 1 steps. For if there are more the machine is caught in a loop, and the computation does not terminate. Since this was excluded, there can be at most polynomially many steps. Since f is polynomially computable we immediately get that f x is likewise polynomially bounded. This shows that a space bound implies a time bound. (In general, if f n is the space bound then c f n is the corresponding time bound for a certain c.) We have found a subclass of PTIME which is dened by its space consumption. Unfortunately, these classes cannot be shown to be equal. (It has not been disproved but is deeemed unlikely that they are equal.) We have to do much more work. For now, however, we remark the following.

Theorem 5.9 Suppose that f : A computable. Then so is g f .

B and g : B

C are LOGSPACE

i jV U

ni

V U

(5.4)

2k

1 c log 2 x

xck

ni

Theorem 5.8 Let f : A PTIME.

B be LOGSPACEcomputable. Then f is in

V U i

374

PTIME Languages

Proof. By assumption there is a logarithmically space bounded deterministic k 2tape machine T which computes f and a logarithmically space bounded deterministic 2tape machine U which computes g. We cascade these machines in the following way. We use k 3 tapes, of which the rst k 2 are the tapes of T and the last 2 the tapes of U. We use Tape k 1 both as the output tape of T and as the input tape of U. The resulting machine is deterministic but not necessarily logarithmically space bounded. The problem is Tape k 1. However, we shall now demonstrate that this tape is not needed at all. For notice that T cannot but move forward on this tape and write on it, while U on the other hand can only progress to read the input. Now rather than having Tape k ready, it would be enough for U if it can access the symbol number i on the output tape of T on request. Clearly, as T can compute that symbol, U only needs to communicate the request to T by issuing i in p x for binary. (This takes only logarithmic space. For we have f x some polynomial p for the length of the output computed by T , so we have log2 x for some natural number .) The proof follows once log2 f x we make this observation: there is a machine T that computes the ith symbol of the output tape of T , given the input for T input and i, using only logarithmic space. The rest is simple: everytime U needs a symbol, it calls T issuing i in binary. The global input T reads from Us input tape. This proof is the key to all following proofs. We shall now show that there is a certain class of problems which are, as one says, complete with respect to the class PTIME modulo LOGSPACEreductions. An nary boolean function is an arbitrary function f : 2 n 2. Every such function is contained in the polynomial clone of functions generated by the functions , and (see Exercise 176). We shall now assume that f is composed from projections using the functions , , and . For example, let f x0 x1 x2 : x2 x1 x0 . Now for the variables x0 , x1 and x2 we insert concrete values (either 0 or 1). Which value does f have? This problem can clearly be solved in PTIME. However, the formulation of the problem is a delicate affair. Namely, we want to think of f not as a string, but as a network. (The difference is that in a network every subterm needs to be represented only once.) To write down networks, we shall have to develop a more elaborate coding. Networks are strings over the alphabet . A cell is a string of the form , W : , or of the form , where is a binary sequence writ, and either are binary strings (written down ten in the alphabet using the letters and in place of and ) or a single symbol of the form

  i i i

nV i iU

i @jV U

V aV

g 3 g r s

q T Fl) S H i) i H i i i a i i i T ) ) ) ) ) ) ) ) G) R8G8iS D

U t

g 3 k

vU 'v

ni

g 3

i 8jV U

MildlyContext Sensitive Languages

375

or . is called the number of the cell and and the argument key, unless it is of the form or . Further, we assume that the number represented by and is smaller than the number represented by . (This makes sure that there are no cycles.) A sequence of cells is called a network if (a) there are no two cells with identical number, (b) the numbers of cells are the numbers from 1 to a certain number , and (c) for every cell with argument key (or ) there is a cell with number ( ). The cell with the highest number is called the goal of the network. Intuitively, a network denes a boolean function into which some constant values are inserted for the variables. This function shall be evaluated. With the cell number we associate a value wx as follows. (5.5) wx :
A

We write w in place of wx . The value of the network is the value of its goal (incidentally the cell with number ). Let : W 01 be the if x is not a network. Otherwise, x is the following function. x : value of x. We wish to dene a machine calculating . We give an example. We want to evaluate f x0 x1 x2 x2 x1 x0 for x0 : 0, x1 : 1 and x2 : 0. Then we write down the following network:

Lemma 5.10 The set of all networks is in LOGSPACE.

Proof. The verication is a somewhat longwinded matter but not difcult to do. To this end we shall have to run over the string several times in order to check the different criteria. The rst condition is that no two cells have the same number. To check that we need the following: for every two positions i and j that begin a number, if i j, then the numbers that start there are different. To compare the numbers means to compare the strings starting at these positions. (To do that requires to memorize only one symbol at a time, running back and forth between the strings.) This requires memorizing two further positions. However, a position takes only logarithmic space. Theorem 5.11 is in PTIME.

V U

t V $ U

Now w and v x

1, w w

1 0 1, w 1 0.

1 1

v @ U v V U V V U i D D V@ U s D V U v  U VD D D D D  i # F#X` iTu#'

(5.6)

V U i T ) ) S

i i i   i

i i i i i i

V aV

U t

vU 'v

V iU V iU

wx w x wx w x wx
A A

if x contains , if x contains , if x contains .

V iU

V iU

V iU A v  tV i U A V iU D s V i U A 

{ |D

V U i

V iU

i r

1,

376

PTIME Languages

Proof. Let x be given. First we compute whether x is a network. This computation is in PTIME. If x is not a network, output . If it is, we do the following. Moving up with the number k we compute the value of the cell number k. For each cell we have to memorize its value on a separate tape, storing pairs w consisting of the name and the value of that cell. This can also be done in polynomial time. Once we have reached the cell with the highest number we are done. It is not known whether the value of a network can be calculated in LOGSPACE. The problem is that we may not be able to bound the number of intermediate values. Now the following holds. Theorem 5.12 Let f : A be in PTIME. Then there exists a function N: A W in LOGSPACE such that for every x A N x is a network and f x N x. Proof. First we construct a network and then show that it is in LOGSPACE. By assumption there exist numbers k and c such that f x is computable in : c x k time using a deterministic Turing machine T . We dene a construction algorithm for a sequence x : C i j i j , where 0 i j c x k . x is ordered in the following way:

(5.7)

C 10 C 11 C 12 C 20 C 21 C 22

C i j contains the following information: (a) the content of the jth cell of the tape of T at time point i, (b) information, whether the read head is on that cell at time point i, (c) if the read head is on this cell at i also the state of the automaton. This information needs bounded length. Call the bound . We denote by C i j k , k , the kth binary digit of C i j . C i 1 j depends only on C i j 1 , C i j and C i j 1 . (T is deterministic. Moreover, we assumed that T works on a tape that is bounded to the left. See Exercise 43 that this is no loss of generality.) C 0 j are determined by x alone. (A) We have C i 1 j C i j if either (A1) at i the head is not at j 1 or else did not move right, or (A2) at i the head is not at j 1 or else did not move left; (B) C i 1 j can be computed from (B1) C i j 1 if the head was at i positioned at j and moved right, (B2) C i j if the head was at i positioned at j and did not move, (B3) C i j 1 if the head at i was positioned at j 1

V )

g U

V ) U

v ) U g

V ) U

V ) U

) aaiV ) U V ) U V ) U ) ) )aaiV ) U V ) U V ) U ) ) ) ) ) aaiV ) U V ) U V ) U ) )

C 00 C 01 C 02

V U i

V U i

aV ) U U V

1 i

V C iU

g ) U

T ) bS

g ) U

V ) U

V ) U

V aV i U ) i U

V v ) U V ) ) U

V )

V V iU

V )

g U

X U

g U

V ) U

V C iU

V U i D

MildlyContext Sensitive Languages

377

These functions can be computed from T in time independent of x. Moreover, we can compute sequences of cells that represent these functions. Basically, the network we have to construct results in replacing for every i 0 and appropriate j the cell C i j by a sequence of cells calling on appropriate other cells to give the value C i j . This sequence is obtained by adding a xed number to each argument key of the cells of the sequence computing the boolean functions. Now let x be given. We compute a sequence of cells i j corresponding to C i j . The row 0 j is empty. Then ascending in i, the rows i j are computed and written on the output tape. If row i is computed, the following numbers are computed and remembered: the length of the ith row, the position of the read head at i 1 and the number of the rst cell of i j , where j is the position of the read head at i. Now the machine writes down C i 1 j with ascending j. This is done as follows. If j is not the position of the read head at i 1, the sequence is a sequence of cells that repeats the value of the cells of i j . So, j 1 i k for the number of the actual cell and is minus some appropriate number, which is computed from the length of the ith row the length of the sequence i j and the length of the sequence i 1 j . If j is the position of the read head, we have to insert more material, but basically it is a sequence shifted by some number, as discussed above. The number by which we shift can be computed in LOGSPACE from the numbers which we have remembered. Obviously, it can be decided on the basis of this computation when the machine T terminates on x and therefore when to stop the sequence. The last entry is the goal of the network. One also says that the problem of calculating the value of a network is complete with respect to the class PTIME. A network is monotone if it does not contain the symbol . Theorem 5.13 There exists a LOGSPACEcomputable function M which transforms an arbitrary network into a monotone network with identical value.

V k ) U

V ) U

Vk ) U

V ) U

ax i i

V ) )

V )

g U

V ) U

g U

V ) U

i i V ) U

V ) U

C i

1 k

fR C i

1 C i

V aV

(5.8)

C i

1 j k

g U g U g U

C i

10k

fL C i 0 C i 1 fM C i j

1 C i j C i j

T ) S

VaV ) U V v ) U U ) V ) D g ) U V ) U V v ) U U ) ) V ) D V aV ) U V ) U U ) V ) D T ) S

T ) S

T ) S

and moved left. Hence, for every k k k k and fR such that f L fR : 0 1 2

k k there exist boolean functions f L , fM , f k : 0 1 3 , and 01 01 M

V ) k

V ) U

g U

378

PTIME Languages

The proof is longwinded but rather straightforward, so we shall only sketch it. Call g : 2n 2 monotone if for every x and y such that x i yi for all i n, gx g y . (We write x y if xi yi for all i n.) Lemma 5.14 Let f be an nary boolean function. Then there exists a 2nary boolean function g which is monotone such that

Theorem 5.15 Every monotone boolean function is a polynomial function over and . Proof. One direction is easy. and is monotone, and every composition of monotone functions is again monotone. For the other direction, let f be monotone. Let M be the set of minimal vectors x such that f x 1. For every vector x, put p x : 0 0, then p x : .) Finally, xi 1 xi . (If x put
x M

If M , put f : . It is easily seen that f x f x for all x 2n . What is essential (and left for the reader to show) is that the map f g translates into a LOGSPACE computable map on networks. So, if g is an arbitrary PTIMEcomputable function from A to 0 1 , there exists a LOGSPACEcomputable function N constructing monotone networks such that g x N x for all x A . Now we shall turn to the promised new type of machines. Denition 5.16 An alternating Turing machine is a sextuple

where A Q q0 f is a Turing machine and : Q an arbitrary function. A state q is called universal if q , and otherwise existential. We tacitly generalize the concepts of Turing machines to the alternating Turing machines (for example an alternating ktape Turing machine, and a logarithmically space bounded alternative Turing machine). To this end one has to add the function in the denitions. Now we have to dene when a Turing machine accepts an input x. This is done via congurations. A conguration is said to be accepted by T if one of the following is the case:

T ) YRIS

V U

0 ) )

0 )

) ) ( ) I

(5.11)

Q q0 f

1 i

T ) S

V U i

V U i

1 i

V V iU

V U i

(5.10)

M :

px

V U D i

V U i

aa

v i)

) aaa)

v i)

v i)

V U i

X U

D A

V U i

(5.9)

f x

g x0

x0 x1

x1

xn

xn

i i

) ) ( ) I

V U i

V U i

i V U

MildlyContext Sensitive Languages

379

T is in an existential state and one of the immediately subsequent congurations is accepted by T . T is in a universal state and all immediately subsequent congurations are accepted by T . Notice that the machine accepts a conguration that has no immediately subsequent congurations if (and only if) it is in a universal state. The difference between universal and existential states is effective if the machine is not deterministic. Then there can be several subsequent congurations. Acceptance by a Turing machine is dened as for an existential state if there is a successor state, otherwise like a universal state. If in a universal state, the machine must split itself into several copies that compute the various subsequent alternatives. Now we dene ALOGSPACE to be the set of functions computable by a logarithmically space bounded alternating multitape Turing machine. Theorem 5.17 (Chandra & Kozen & Stockmeyer)

The theorem is almost proved. First, notice

For every deterministic logarithmically space bounded Turing machine also is an alternating machine by simply letting every state be universal. Likewise the following claim is easy to show, if we remind ourselves of the facts concerning LOGSPACEcomputable functions.

Also this proof is not hard. We already know that there are at most polynomially many congurations. The dependency between these congurations can also be checked in polynomial time. (Every conguration has a bounded number of successors. The bound only depends on T .) This yields a computation tree which can be determined in polynomial time. Now we must determine in the last step whether the machine accepts the initial conguration. To this end we must determine by induction on the depth in a computation tree

Lemma 5.20 ALOGSPACE

PTIME.

Lemma 5.19 Let f : A B and g : B ALOGSPACE, so is g f .

Lemma 5.18 LOGSPACE

ALOGSPACE.

C be functions. If f and g are in

ALOGSPACE

PTIME

380

PTIME Languages

whether the respective conguration is accepted. This can be done as well in polynomial time. This completes the proof. Now the converse inclusion remains to be shown. For this we use the following idea. Let f be in PTIME. We can write f as N where N is a monotone network computing f . As remarked above we can construct N in LOGSPACE and in particular because of Lemma 5.18 in ALOGSPACE. It sufces to show that is in ALOGSPACE. For then Lemma 5.19 gives us that f N ALOGSPACE.

Proof. We construct a logarithmically space bounded alternating machine which for an arbitrary given monotone network x calculates its value w x . Let a network be given. First move to the goal. Descending from it compute as follows. If the cell contains change into the universal state q 1 . Else change into the existential state q2 . Goto . Choose an argument key of the current cell and go to the cell number . If is not an argument key go into state q f if and into qg if . Here q f is universal and qg existential and there are no transitions dened from q f and qg . All other states are universal; however, the machine works nondeterministically only in one case, namely if it gets the values of the arguments. Then it makes a nondeterministic choice. If the cell is an cell then it will accept that conguration if one argument has value 1, since the state is existential. If the cell is a cell then it shall accept the conguration if both arguments have value 1 for now the state is universal. The last condition is the termination condition. If the string is not an argument key then it is either or and its value can be computed without recourse to other cells. If it is the automaton changes into a nal state which is universal and so the conguration is accepted. If the value is the automaton changes into a nal state which is existential and the conguration is rejected. Notes on this section. The gap between PTIME and NPTIME is believed to be a very big one, but it is not known whether the two really are distinct. The fact that virtually all languages are in PTIME is good news, telling

V U i

Lemma 5.21

ALOGSPACE.

Literal Movement Grammars

381

us that natural languages are tractable, at least syntactically. Concerning the tractability of languages as sign systems not very much is known, however.

Exercise 174. Show the following. If L 1 and L2 are in PTIME then so is L1 L2 . Exercise 175. Show that L has the constant growth property if L is semilinear. Give an example of a language which has the constant growth property but is not semilinear. 2 a boolean function. Show that it can be obExercise 176. Let f : 2n tained from the projections and the functions , , and . Hint. Start with 2 such that gx y : 1 iff x y. Show that they can the functions gx : 2n be generated from and . Proceed to show that every boolean function is either the constant 0 or can be obtained from functions of type g x using . Exercise 177. Prove Lemma 5.14. Exercise 178. Call a language L A weakly semilinear if every intersection with a semilinear language A has the constant growth property. Show m n : that every semilinear language is also weakly semilinear. Let M : m . Show that M n 2 is weakly semilinear but not semilinear. Exercise 179. Show that every function f : A B which is computable using an alternating Turing machine can also be computed using a Turing machine. (It is not a priori clear that the class of alternating Turing machines is not more powerful than the class of Turing machines. This has to be shown.)

2.

Literal Movement Grammars

The concept of a literal movement grammar LMG for short has been introduced by Annius Groenink in (1997b) (see also (Groenink, 1997a)). With the help of these grammars one can characterize the PTIME languages by means of a generating device. The idea to this characterization goes back to a result by William Rounds (1988). Many grammar types turn out to be special subtypes of LMGs. The central feature of LMGs is that the rules contain a context free skeleton which describes the abstract structure of the string and in addition to this a description of the way in which the constituent is

t S

Exercise 173. Show Proposition 5.3: With L 1 and L2 also L1 well as A L1 are in PTIME.

L2 , L 1

L2 as

i i D s

V U i

Fl) } T S H

382

PTIME Languages

formed from the basic parts. The notation is different from that of CFGs. In an LMG, nonterminals denote properties of strings and therefore one writes Q x in place of just Q. The reason for this will soon become obvious. If Q x obtains for a given string x we say that x has the property Q or that x is a Qstring. The properties play the role of the nonterminals in CFGs, but technically speaking they are handled differently. Since x is metavariable for strings, we now need another set of (ofcial) variables for strings in the formulation of the LMGs. To this end we use the plain symbols x, y, z and so on (possibly with subscripts) for these variables. In addition to these variables there are also constants , , for the symbols of our alphabet A. We give a simple example of an LMG. It has two rules.

These rules are written in Hornclause format, as in Prolog, and they are exactly interpreted in the same way: the left hand side obtains with the variables instantiated to some term if the right hand obtains with the variables instantiated in the same way. So, the rules correspond to more familar looking formulae:

(Just reverse the arrow and interpret the comma as conjunction.) Denition 5.22 A formulae of predicate logic is called a Hornformula iff it has the form
1 i n

Here, it is assumed that only the variables x i , i q, occur in the i and in . We abbreviate x0 x p 1 by x . Now, consider the case where the language has the following functional signature: for every letter from A a zeroary function symbol (denoted by the same letter), (zeroary) and (binary). Further, assume the following set of axioms:

T V

U ~U ) W V V

x x

x x

U ~ W V U ) V W V W U

V W U W V U

~U @S

(5.15)

SG :

xyz x

y z

x y

V U i ~

~U yaa@V

~ U

where the i , i

n, and are atomic formulae.

U V

~U aa@V

~ U

(5.14)

x0

xq

V aV W U

"V U V U U ~

(5.13)

x Sx

Sx x

XV

V U

) V U H

(5.12)

S xx

Sx ;

V U i V U i

Literal Movement Grammars

383

Then a Hornclause is of the form

Denition 5.23 A literal movement grammar, or LMG for short, is a quintuple G A R S H , where A is the alphabet of terminal symbols, R a set of socalled predicates, : R a signature, S R a distinguished symbol such that S 1, and H a set of Hornformulae in the language consisting of constants for every letter of A, the empty string, concatenation, and the relation symbols of R. x is a Gsentence iff S x is derivable from H and S G :

Proof. Surely L G . This settles the case n 0. By induction one shows n n 1 2n L G for every n 0. For if 2 is a string of category S so is 2 n n n 2 2 . This shows that L G 2 :n 0 . On the other hand this set satises the formula . For we have L G and with x L G we also have 2n for a certain n 2n 2 2n 1 x x L G . For if x 0 then x x LG. There is an inductive denition of L G by means of generation. We write is a rule or x y y and G S y . G S x (vector arrow!), if either S x Both denitions dene the same set of strings. Let us elaborate the notion of a 1LMG in more detail. The maximum of all n such that G has an nary rule is called the branching number of G. In the rule

we have n (5.20) we have n

1 and T

U0

S, t

xx and s0

0, T

S and t

H D

V U

D D XV U H

(5.19)

S xx

Sx

x. In the rule

V U

V U i

H D D H V U 1 i

i i

i i

V U

i V U V U

HS

V U

V U

Proposition 5.24 L G

2n

:0

n .

We call G a kLMG if max Q : Q 1LMG.

T i V U

V U

V U

H D

i IS

V U

(5.18)

LG :

x : SG ; H

Sx

k. The grammar above is a

V U i

)) aaaV U

) V U

) ) ) ) (

V U

(5.17)

T t

U0 s0 U1 s1

where t and the si (i

n) are string polynomials. This we shall write as Un sn

V aV U

'aa@V U

V U

U i ~ V FU D V U D

(5.16)

x U0 s0

U1 s1

Un

sn

T t

V U H1 V U i

` 1 i i

H H
~

384

PTIME Languages
~

Denition 5.25 Let G A R S H be an LMG. Then we write n iff G n in predicate logic and ; H; SG iff ; H; SG in predicate logic. G We shall explain in some detail how we determine whether or not n Q x (Q G unary). Call a substitution a function which associates a term to each variable; and a valuation a function that associates a string in A to each string variable. Given we dene s for a polynomial by homomorphic extension. For example, if s x2 y and x , y then s , as is easily computed. Notice that strings can be seen as constant terms modulo equivalence, a fact that we shall exploit here by confusing valuations with substitutions that assign constant terms to the string variable. (The socalled Herbranduniverse is the set of constant terms. It is known that any Horn formula that is not valid can be falsied in the Herbranduniverse, in this case A .)

is an instance of the rule

is an instance of

if there is a substitution such that s i si for all i m and t t . The notion of generation by an LMG can be made somewhat more explicit. Proposition 5.26 (a) 0 Q x iff Q x is a ground instance of a rule of G n G. (b) G 1 Q x iff n Q x or there is a number m, predicates R i , i m, G and strings yi , i m, such that

i U

)) i iaaaV U

Qx

V U i

i U

n G

Ri yi , and R 0 y0 Rm ym is a ground instance of a rule of G.

V U

i mV U

)) aaaV U

V U i V U i

) V U

i V U i

(5.24)

T t

U0 s0 U1 s1

Um

sm

k U

)) iaaaV k U

) V k U

V k

(5.23)

T t

U0 s0 U1 s1

Um

sm

if there is a valuation such that x

t and yi

s for all i i

)) aaaV U

) V U

(5.22)

T t

U0 s0 U1 s1

Um

sm

i U

)) i iaaaV U

) i V U

i U

(5.21)

T x

U0 y0 U1 y1

Um

ym

m. Similarly,

@p p H D H H

V U i

V U

H D

0 ) ) ) ) (
~

V U

H D

V Y U

Literal Movement Grammars

385

We shall give an example to illustrate these denitions. Let K be the following grammar.

Then L K is that language which contains all strings that contain an identical number of , and . To this end one rst shows that L K and in virtue of the rst rule L K is closed under permutations. Here y is a permutation of x if y and x have identical image under the Parikh map. Here is an example (the general case is left to the reader as an exercise). We can derive in one step from S using the second rule, and S in two S steps, using again the second rule. In a third step we can derive S from this, using the rst rule this time. Put v : , x : , y : and z : . Then

Let H be a CFG. We dene a 1LMG H as follows. (For the presentation we shall assume that H is already in Chomsky normal form.) For every nonterminal A we introduce a unary predicate A. The start symbol is . If A BC is a rule from H then H contains the rule

One can show relatively easily that L H LH . The 1LMGs can therefore generate all CFLs. Additionally, they can generate languages without constant growth, as we have already seen. Let us note the following facts. Theorem 5.27 Let L1 and L2 be languages over A which can be generated by 1LMGs. Then there exist 1LMGs generating the languages L 1 L2 and L1 L2 . Proof. Let G1 and G2 be 1LMGs which generate L1 and L2 , respectively. We assume that the set of nonterminals of G 1 and G2 are disjoint. Let Si be the start predicate of Gi , i 1 2 . Let H be constructed as follows. We

V U

T ) S 1

(5.28)

Aa

If A

a is a terminal rule then we introduce the following rule into H :

V U V U )

wV

(5.27)

A xy

Bx C y

) p p @F

(5.26)

vxyz

vyxz

p p F F H D H

H H D

p F V U V U H D V p D RpF@ U H@ VHp p b F U H H

V U

XV

} b U V p

V U

H D

V U

V U

V U

(5.25)

S vxyz

S vyxz ;

p W F U H

Sx ;

H V U D i

V U

V p 8 U H s

386

PTIME Languages

form the union of the nonterminals and rules of G 1 and G2 . Further, let S be a new predicate, which will be the start predicate of G . At the end we add the following rules: S x S1 x ; S x S2 x . This denes G . G is dened similarly, only that in place of the last two rules we have a single rule, S x S1 x S2 x . It is easily checked that L G L1 L2 L1 L2 . We show this for G . We have x L G if there is and L G n an n with n S x . This in turn is the case exactly if n 0 and G 1 S1 x G n n n as well as G 1 S2 x . This is nothing but G 1 S1 x and G 1 S1 x . Since n 1 2 was arbitrary, we have x L G iff x L G1 L1 and x L G2 L2 , as promised. The 1LMGs are quite powerful, as the following theorem shows.

The proof is left to the reader as an exercise. Since the set of recursively enumerable languages is closed under union and intersection, Theorem 5.27 already follows from Theorem 5.28. It also follows that the complement of a language that can be generated by a 1LMG does not have to be such a language again. For the complement of a recursively enumerable language does not have to be recursively enumerable again. (Otherwise every recursively enumerable set would also be decidable, which is not the case.) In order to arrive at interesting classes of languages we shall restrict the format of the rules. Let be the following rule.

is called upward nondeleting if every variable which occurs in one of the si , i n, also occurs in t. is called upward linear if no variable occurs more than once in t. is called downward nondeleting if every variable which occurs in t also occurs in one of the si . is called downward linear if none of the variables occurs twice in the si . (This means: the si are pairwise disjoint in their variables and no variable occurs twice in any of the s i .)

)) aaaV U

) V U

(5.29)

: T t

U0 s0 U1 s1

Un

sn

V U

Theorem 5.28 Let A be a nite alphabet and L iff L is recursively enumerable.

A .L

L G for a 1LMG

V U i

V U D V U i

V U V U

1 i

1 i
~

V U D

V U i

uV

V U

1 i

V U

V U

uV

) V U

1 i

V U i V U i

V U

Literal Movement Grammars

387

is called noncombinatorial if the s i are variables. is called simple if it is noncombinatorial, upward nondeleting and upward linear. G has the property if all rules of G possess . In particular the type of simple grammars shall be of concern for us. The denitions are not always what one would intuitively expect. For example, the following rule is called upward nondeleting even though applying this rule means deleting a symbol: x x . This is so since the denition focusses on the variables and ignores the constants. Further, downward linearity could alternatively be formulated as follows. One requires any symbol to occur in t as often as it occurs in the si taken together. This, however, is too strong a requirement. One would like to allow a variable to occur twice to the right even though on the left it occurs only once. Lemma 5.29 Let be simple. Further, let

Proof. Let x be an arbitrary string and n : N x . Because of Lemma 5.29 for every predicate Q: G Q x iff n Q x . From this follows that every G derivation of S x has length at most n. Further, in a derivation there are only predicates of the form Q y where y is a subword of x. The following chartalgorithm (which is a modication of the standard chartalgorithm) only takes polynomial time: For i 0 n: For every substring y of length i and every predicate Q check if there are subwords z j , j p, of length i and predicates R j , j p, such that Q y R 0 z0 R1 z1 R p 1 z p 1 is an instance of a rule of G. The number of subwords of length i is proportional to n. For given p, a string of length n can be decomposed in O n p 1 ways as product of p (sub)strings.

i U

ni

)) i iaaaV U

V U i

) V i U i

V U i

V U i

mV

i U

Theorem 5.30 Let L in PTIME.

A be generated by some simple 1LMG. Then L is

f 9

i S

ni

f 9

ni

be an instance of . Then y subword of y for every i n.

xi

max xi : i

i U

)) i aaaV U

) i V U

) aaa)

V U i

XV

i U

(5.30)

Qy

U U W wwV " i D i

R 0 x0 R1 x1

Rn

xn

n . Further, xi is a

388

PTIME Languages

Thus for every i, O n p many steps are required, in total O n p 1 on a deterministic multitape Turing machine. The converse of Theorem 5.30 is in all likelihood false. Notice that in an LMG, predicates need not be unary. Instead, we have allowed predicates of any arity. There sometimes occurs the situation that one wishes to have uniform arity for all predicates. This can be arranged as follows. For an iary predicate A (where i k) we introduce a kary predicate A which satises

1 1

j i

There is a small difculty in that the start predicate is required to be unary. So we lift also this restriction and allow the start predicate to have any arity. Then we put
i S G
~

This does not change the generative power. An important class of LMGs, which we shall study in the sequel, is the class of simple LMGs. Notice that in virtue of the denition of a simple rule a variable is allowed to occur on the right hand side several times, while on the left it may not occur more than once. This restriction however turns out not to have any effect. Consider the following grammar H. (5.33)

It is easy to see that H E x y iff x y. H is simple. Now take a rule in which a variable occurs several times on the left hand side.

We replace this rule by the following one and add (5.33).

This grammar is simple and generates the same strings. Furthermore, we can see to it that no variable occurs more than three times on the right hand side,

V ) U

) V U

wV

(5.35)

S xy

Sx

V U

(5.34)

S xx

Sx

E xy

V ) U

E xy

1 U 1 U

V !) U i i

V ) U V ) U V ) U

E E aa E xa ya

a a

A A

T V

i) i !aaa) U

V U

(5.32)

LG :

xi :

S x0

x S

@V

) iaaa)

V

) aaa)

(5.31)

A x0

xk

A x0

xi

xj

k 1

Literal Movement Grammars

389

j and that sij sk for i k. Namely, replace sij by distinct variables, say xij , and add the clauses E xij xij , if sij sij . We do not need to introduce all of these clauses. For each variable we only need two. (If we want to have A i A j for all i n we simply have to require Ai A j for all j i 1 mod n .) With some effort we can generalize Theorem 5.30.

The main theorem of this section will be to show that the converse also holds. We shall make some preparatory remarks. We have already seen that PTIME = ALOGSPACE. Now we shall provide another characterization of this class. Let T be a Turing machine. We call T read only if none of its heads can write. If T has several tapes then it will get the input on all of its tapes. (A read only tape is otherwise useless.) Alternatively, we may think of the machine as having only one tape but several read heads that can be independently operated. Denition 5.32 Let L A . We say that L is in ARO if there is an alternating read only Turing machine which accepts L. Theorem 5.33 ARO ALOGSPACE.

Proof. Let L ARO. Then there exists an alternating read only Turing machine T which accepts L. We have to nd a logarithmically space bounded alternating Turing machine that recognizes L. The input and output tape remain, the other tapes are replaced by read and write tapes, which are initially empty. Now, let be a read only tape. The actions that can be performed on it are: moving the read head to the left or to the right (and reading the symbol). We code the position of the head using binary coding. Evidently, this coding needs only log 2 x 1 space. Calculating the successor and predecessor (if it exists) of a binary number is LOGSPACE computable (given some extra tapes). Accessing the ith symbol of the input, where i is given in binary code, is as well. This shows that we can replace the read only tapes by logarithmically space bounded tapes. Hence L ALOGSPACE. Suppose now that L ALOGSPACE. Then L L U for an alternating, logarithmically space bounded Turing machine U. We shall construct a read only alternating Turing machine which accepts the same language. To this end we shall replace every intermediate Tape by several read only tapes which together perform

V U

Theorem 5.31 Let L PTIME.

A be generated by a simple LMG. Then L is in

V ) D

g a i

390

PTIME Languages

the same actions. Thus, all we need to show is that the following operations are computable on read only tapes (using enough auxiliary tapes). (For simplicity, we may assume that the alphabet on the intermediate tapes is just and .) (a) Moving the head to the right, (b) moving the head to the left, (c) writing onto the tape, (d) writing onto the tape. Now, we must use at least two read only tapes; one, call it a , contains the content of Tape , b contains the position of the head of . The position i, being bounded by log 2 x , can be coded by placing the head on the cell number i. Call i a the position of the head of a , ib the position of the head of b . Arithmetically, these steps correspond to the following functions: (a) i b ib 1, (b) ib 1 ib , (c) replacing the ib th symbol in the binary code of ia by , (d) replacing the ib th symbol in the binary code of ia by . We must show that we can compute (c) and (d). (It is easy to see that if we can compute this number, we can reset the head of b onto the position corresponding to that number.) (A) The i b th symbol in the binary code of ia is accessed as follows. We successively divide i a by 2, exactly ib times, throwing away the remainder. If the number is even, the result is , otherwise it is . (B) 2ib is computed by doubling 1 ib times. So, (c) is performed as follows. First, check the i b th digit in the representation. If it is , leave ia unchanged. Otherwise, substract 2 ib . Similarly for (d). This shows that we can nd an alternating read only Turing machine that recognizes L. Now for the announced proof. Assume that L is in PTIME. Then we know that there is an alternating read only Turing machine which accepts L. This machine works with k tapes. For simplicity we shall assume that the machine can move only one head in a single step. We shall construct a 2k 2LMG G such that L G L. Assume for each a A two binary predicates, L a and Ra , with the following rules. Ra a Ra cx cy Ra x y

It is easy to see that La x y is derivable iff y ax and Ra x y is derivable iff y x a. If w is the input we can code the position of a read head by a pair x y for which x y w. A conguration is simply determined by naming the state of the machine and k pairs xi yi with xi yi w. Our grammar will monitor the actions of the machine step by step. To every state q we associate a predicate q . If q is existential the predicate is 2k 2ary. If q changes to r when reading the letter a and if the machine moves to the left on Tape j then the following

0 ) ( i i

V !) U i i

wV

(5.37)

V ) U V ) U

wV

i i

0 !) ( i i

V !) U i i

V ) U V ) U

(5.36)

La a

La xc yc

La x y

ni

V U

i i

Literal Movement Grammars

391

rule is added to G.

r w w x 0 y0

If the machine moves the head to the right we instead add the following rule.

r w w x 0 y0

If the machine does not move the head, then the following rule is added.

Notice that the rst two argument places of the predicate are used to get rid of superuous variables. If the state q is universal and if there are exactly p transitions with successor states r i , i p, which do not need to be different, then q becomes 2k 2ary and we introduce symbols q i , i p, which are 2k 2ary. Now, rst the following rule is introduced.

Second, if the transition i consists in the state q changing to r i when reading the symbol a and if the machine moves to the left on Tape j, G gets (5.38) with q i in place q and r j in place of r . If movement is to the right, instead we use (5.39). If the machine does not move the head, then (5.40) is added. All of these rules are simple. If q is an accepting state, then we also take the following rule on board.

The last rule we need is

V ) aaa) ) ) ) ) ) U )

(5.43)

Sw

q0 w w w w

) iaaa)

)k ) U

(5.42)

q w w x0 y0

V ) ) iaaa) ) ) k ) U V v U ) iaa )V ) ) iaaa) ) ) k ) U V U ) V ) )iaaa) ) ) k ) U V U ) ) iaaa) ) ) k ) U

q0 q1

w w x0 y0 w w x0 y0

q p

w w x0 y0

xk

yk

xk xk

(5.41)

q w w x0 y0

xk

yk

1 1

yk yk

1 1

xk

yk

V U

) iaaa)

)k ) U

) iaaa)

)k ) U

(5.40)

q w w x0 y0

V ) k U V k ) U V ) ) ) ) aaa) ) ) ) U V ) ) iaaa) ) ) k ) k ) ) ) iaaa) )

) U

(5.39)

q w x j y j x0 y0

V ) k U V k ) U V ) ) ) ) aaa) ) ) ) U V ) ) iaaa) ) ) k ) k ) ) ) iaaa) )


1 1 1 1 1 1

) U

(5.38)

q w x j y j x0 y0

xj

yj

xj yj xj
1

yj
a

xk

yk

xk

yk

L yj yj R xj xj

xj

yj

xj yj xj
1

yj
a

xk

yk

xk

yk

R xj xj L yj yj

xk

yk

r w w x0 y0

xk

yk

V U

392

PTIME Languages

This is a simple rule. For the variable w occurs to the left only once. With this denition made we have to show that L G L. Since L L T it sufces to show that L G L T . We have w L T if there is an n such that T moves into an accepting state from the initial conguration for w. Here the initial conguration is as follows. On all tapes we have w and the read heads are to the left of the input. An end conguration is a conguration from which no further moves are possible. It is accepted if the machine is in a universal state. We say that : q w w x0 y0 xk 1 yk 1 codes the conguration K where T is in state q, and for each i k (a) Tape i is lled by x i yi , and (b) the read head Tape i is on the symbol immediately following x i . Now we have: If q is existential then is an instance of a rule of G iff T computes K from K in one step.
~

0 G

iff K is an accepting end conguration.

This corresponds to the initial conguration of T for the input w. We conclude n from what we have said above that if G 1 there exists a k n such that K in k steps. Furthermore: if T accepts K in k steps, then 2k . T accepts G Hence we have L G LT .

Notes on this section. LMGs are identical to Elementary Formal Systems (EFSs) dened in (Smullyan, 1961), Page 4. Smullyan used them to dene recursion without the help of a machine. (Post, 1943) did the same, however his rules are more akin to actions of a Turing machine than to rules of an EFS (or an LMG). There is an alternative characterization of PTIMElanguages. Let be the expansion of rstorder predicate logic (with constants for each letter and a single binary symbol in addition to equality) by the leastxed point operator. Then the PTIMElanguages are exactly those that can be dened in . A proof can be found in (Ebbinghaus and Flum, 1995).

Theorem 5.34 (Groenink) L is accepted by a simple LMG iff L

PTIME.

i ) i V ) iaaa) I) ) ) ) ) U i i i

V U

V r { U q$

V U

V r { U Fn$

(5.44)

n 1 G

: q0 w w w w

V U i

V U

1 i

Let w

L G . This means that

n G

S w and so

If q is universal then is derivable from i , i T computes the transitions K iK (i p).

p, in two rule steps iff

1 V U

V U i )

V U

i) i i i iU !aaa) ) 6) ) j

1 i

V U

V U

Interpreted LMGs

393

Exercise 180. Prove Theorem 5.28. Hint. You have to simulate the actions of a Turing machine by the grammar. Here we code the conguration by means of the string, the states by means of the predicates. Exercise 181. Prove Theorem 5.31. Exercise 182. Construct a simple 1LMG G such that LG.
n n n

A R S H be an LMG which generates L. FurtherExercise 183. Let G more, let U be the language of all x whose Parikh image is that of some y L. (In other words: U is the permutation closure of L.) Let

Exercise 184. Let L be the set of all theorems of intuitionistic logic. Write a 1LMG that generates this set. Hint. You may use the Hilbertstyle calculus here. 3. Interpreted LMGs

In this section we shall concern ourselves with interpreted LMGs. The basic idea behind interpreted LMGs is quite simple. Every rule is connected with a function which tells us how the meanings of the elements on the right hand side are used to construct the meaning of the item on the left. We shall give an example. The following grammar generates as we have shown above n the language 2 : n 0 .

We write a grammar which generates all pairs 2 n . So, we take the number n n to be the meaning of the string 2 . For the rst rule we choose the function n n 1 as the meaning function and for the second the constant 0. We shall adapt the notation to the one used previously and write as follows.

0 ) )

(5.48)

:S:2

or

S2

0 )

H H H @@H

mV

V U

H H H @@H

(5.47)

S xx

Sx

Show that L G p

U.

T V

V U

S s

(5.46)

Hp :

V " U

where S

1, and let R S x S x ;S vyxz S vxyz

) bT

S s ) (

(5.45)

Gp :

AR

Hp

1 i

:n

H S

) ) ) ) (

V U

394

PTIME Languages

Both notations will be used concurrently. (5.48) names a sign with exponent with category (or predicate) S and with meaning 2. The rules of the above grammar are written as follows:

This grammar is easily transformed into a sign grammar. We dene a 0ary mode and a unary mode . (5.50)

S0

The structure term for example denes the sign 8 S 3 . It seems that one can always dene a sign grammar from a LMGs in this way. However, this is not so. Consider adding the following rule to (5.47).

The problem with this rule is that the left hand side is not uniquely determined S 2 we can derive in one by the right hand side. For example, from step S 6 as well as S 6 and S 6 . We shall therefore agree on the following. Denition 5.35 Let U1 s0 s1 1 1
q s 11 1

be a rule. is called denite if for all instances of the rule the following holds: For all , if the sij are given, the t j are uniquely determined. An LMG is called denite if each of its rules is denite. Clearly, to be able to transform an LMG into a sign grammar we need that it is denite. However, this is still a very general concept. Hence we shall restrict our attention to simple LMGs. They are denite, as is easily seen. These grammars have the advantage that the s ij are variables over strings and the t j polynomials. We can therefore write them in notation. Our grammar can therefore be specied as follows. (5.53)

: :

S0

x x x S n n

) aaa)

V U

Un

s0 n

s1 n

) V

) aaa)

)) iaaaV

) iaaa) ) U

(5.52)

T t0 t1

tp

U0 s0 s1 0 0

0 ) F ) ( H H 0 ) ) H ( H H H @H

0 ) ) ( H H H

0 ) )

) iaaa)

) ) W ( D 0 ) ) ( H D

( m0

V U

) )

0 ) )

(5.51)

y S 3n

ySn

q s 00

q snn 11

0 ) )

xSn

xx S n

m0

g ) ) ( V a0 ) ) ab (U D ) 0 ) ) ( H D

0 ) ) w0 ( H H
S

g ) )

( 

(5.49)

xx S n

xSn

) ) ( H

( H H H

H H H @H

S0

Interpreted LMGs

395

In certain cases the situation is not so simple. For this specication only works if a variable of the right hand side occurs there only once. If it occurs several times, we cannot regard the t j as polynomials using concatenation. Namely, they are partial, as is easily seen. An easy example is provided by the following rule.

Intuitively, one would choose x x for the string function; however, how does one ensure that the two strings on the right hand side are equal? For suppose we were to introduce a binary mode .

Then we must ensure that x y is only dened if x y. So in addition to concatenation on A we also have to have a binary operation , which is dened as follows.

With the help of this operation we can transform the rule into a binary mode. Then we simply put : x y x y . We shall try out our concepts by giving a few examples. Let x be a binary sequence. This is the binary code n of a natural number n. This binary sequence we shall take as the meaning of the same number in Turing n 1 . Here is a grammar for code. For the number n it is the sequence n : the language n S n : n . x Sn xT n

(5.57)

xx

T n

xT n xT n

xx T n

Notice that the meanings are likewise computed using concatenation. In place of n 2n or n 2n 1 we therefore have x x and x x . We can also write a grammar which transforms binary codes into Turing codes, by simply exchanging exponent and meaning.
I

RC1 i T ) IS

H D

V ) U

0 ) ) ( 0 ) ) (X W ) ) ( 0 0 ) ) (X0I W ) ) ( H 0 ) ) ( 0 ) ) ( H 0 ) ) @S (

V !) U i i

(5.56)

xy :

if x y, otherwise.

0 aV ) U

) V ) U

) i i V ) U

V ) U i i

V a0 ) ) 0 ) ) a i() i(U

(5.55)

x X

y Y

V U V U )

i D

mV

(5.54)

C x

Ax Bx

xy

X Y

396

PTIME Languages

A somewhat more complex example is a grammar which derives triples x y S z of binary numbers where z is the binary code of the sum of the numbers represented by x and y. (The symbol serves to separate x from y.)

Now let us return to the specication of interpreted LMGs. First of all we shall ask how LMGs can be interpreted to become sign grammars. To this end we have to reconsider our notion of an exponent. Up to now we have assumed that exponents are strings. Now we have to assume that they are sequences of strings (we say rather vectors of strings, since strings are themselves sequences). This motivates the following denition.
k the Denition 5.36 Let A be a nite set. We denote by V A : k A V : set of vectors of strings over A. Furthermore, let F 0 , 2, 1; 0 0. Here, the following is assumed to hold. (Strings are denoted by vector arrows, while , and range over V A .)
B

is the empty sequence.

V bg U

0 ( D V U g

V U V U V U T ) D $r$ kD ) W ) ) S D ) ) ) V U V UD D

0 I )

0R) 0( ) I I ) I 0@ (

(5.58c)

V 8U

V 8U

V U

V U

V rU

(5.58b)

x x x x x x x x

yA z yA z yA z yA z

x yAz

x yU z x yU z x yAz x yAz x yAz x yU z

yU z yU z yU z yU z A

x yU z

) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) )

x yS z

x yU z

x yS z

x ySz

ySz

x ySz

x ySz

x ySz

0 0 0 0 0 0 0 0 0 0 0 0 0

) ) ) ) ) ) ) )

X) 0 I XR) 0 ( I 0 ) ( 0 R) ( I 0 ) ( 0 R) ( I 0 ) ( 0 R) ( I 0 ) ( 0 R) ( 0 z) ( 0 R) (m0 ) (m0 ) ( m0 )

) ( ) I  ( )  3 ( ) 0 I ) 0 I ) ) ) 0 I ) 0 I ) ) ) ( ) ( ) ( ) (
( ( (
I I I I

(5.58a)

x ySz

x yAz

0 l) ) 2 ( i i i

V W QU

Interpreted LMGs

397

is the empty string.

The resulting (partial) algebra is called the algebra of string vectors over A and is denoted by A . In this algebra the following laws hold among other.
V D V BbU VV D abU BVb V U U D D

The fourth and fth equation hold under the condition that is dened. A vector has length m if m 1 is dened but m is not. In this case m i 1 i is dened for all i m and they are the projection functions. Now we have: (5.60)
m 1 m 2 m 2 m 1

All polynomial functions that appear in the sequel can be dened in this algebra. The basis is the following theorem.

Proof. Let p be given. We assume that one of the variables appears at least once. (Otherwise p and then we put q : .) Let q arise from p by replacement of xi by m i 1 i , for all i m. This denes q. (It is well

i) i !aaa) U

V U

i (

V f1 U

If

and

xi : i

m then

V 8 U D

V f1 U

V b U

is dened only if

m.

p x0

xm

A be a function which is a polynomial in Theorem 5.37 Let p : A m and . Then there exists a vector polynomial : V A V A such that

V bU

V U

V U

z g)aa D

V U

V U

V U

(5.59)

V b() U

V U

U U 

V b() U

if

and

otherwise.

{ D

V U i

U

V U

if is not a string; x

x otherwise.

{ D

i mx

i m 1 xi ,

and

i m xi

is the usual concatenation of strings, so it is not dened on vectors of length 1.

0 i m xi .

398

PTIME Languages

dened, for the symbols , , are in the signature F V .) Let now be given. As remarked above, q is dened on only if has length m. In this case xi : i n for certain xi , and we have xi m i 1 i . Since the symbols , and coincide on the strings in both algebras (that of the strings and that of the vectors) we have q q x0 xm 1 . m n is a polynomial function means that there exist That p : A A polynomials pi , i n, such that

We can therefore replace the polynomials on strings by polynomials over vectors of strings. Thise simplies the presentation of LMGs considerably. We can now write down a rule as follows.

We shall make a further step and consider LMGs as categorial grammars. To this end we shall rst go over to Chomsky Normal Form. This actually brings up a surprise. For there are kLMGs for which no kLMG in Chomsky Normal Form can be produced (see the exercises). However, there exists a k LMG in Chomsky Normal Form for a certain effectively determinable k k, where is the maximal productivity of a rule. Namely, look at a rule. We introduce new symbols Zi , i m 2, and replace this rule by the following rules.

Here q and f are chosen in such a way that

e aae

f X0

Xm

(5.64)

m 1

m 1

f X0

Xm

)iaaa) U ) iaaa) U

aa

m 3

m 1

) 0

Zm

Ym

m 3

m 1

0 aV

U I

A f Ym

Xm

Bm

Xm

m 4

m 2

Zm

Ym

m 4

m 2

(5.63)

Zm

Ym

Xm

Bm

Xm

) (

) ( ) V ) U R ( ) ( ) ) ( aa ) (

Z1 Y0

X2

Z0 Y0

B2 X2

0 0

) ( ) (

) 0 ) 0

) (

Z0 X0

X1

B0 X0

B1 X1

k k

m 1

B0 X0

Bm

m 1

0 aV

) ()) aaa0 ) ) ( ) iaaa) U ) V ) ) iaaa) U (

(5.62)

A f X0

Xm

Xm

i) i 6iaaa) U

i) i !aaa) U

(5.61)

p x0

xm

pi x0

xm

:i

V U V

i) i !aaa) U

V U D

U V

) (

V U

i (

Interpreted LMGs

399

It is not hard to see how to dene the functions by polynomials. Hence, in the sequel we may assume that we have at most binary branching rules. 0ary rules are the terminal rules. A unary rule has the following form.

There is only one binary mode, , which is dened thus:

This is exactly the scheme of application in categorial grammar. One difference remains. The polynomial p is not necessarily concatenation. Furthermore, we do not have to distinguish between two modes, since we in the string case we have the possibility of putting p to be either x x y or x y x. Application has in this way become independent of the accidental order. Many more operations can be put here, for example reduplication. The grammar that we have mentioned at the beginning of the section is dened by the following two modes. (5.68) : : S0

In this way the grammar has become an ABgrammar, with one exception: the treatment of strings must be explicitly dened. The binary rules remain. A binary rule has the following form.

We keep the sign on the right hand side and introduce a new sign.

0 iV ) U

(5.71)

0 ) 0 ) ) XaV ) U ) b() U ( ) () ( 0 )V

(5.70)

) V b() U  ( U)V D

(5.69)

C f X Y

To the previous structure term

AX

C A B y x f x y

x x x S S n n

now corresponds the structure term

BY

W i

i W

0 aV

U ) V U ( )

V a0 ) ) 0 ) ) a () (U

(5.67)

pAX

qBY

0 aV U

) ) W ( D 0 ) ) ( H D

(5.66)

C A x f x

p q A B XY

) V U ( ) D

0 ) ) (

We keep the sign ing form.

0 ) ) aV U ) V U ( ( 0 )
A X and introduce a new sign

(5.65)

C f X

AX

which has the follow-

400

PTIME Languages
1

Table 15. Arabic Binyanim:

I II III VI VIII I II III VI VIII

Exercise 185. Show that for any k 1 there are simple kLMGs G with branching number 3 such that for no simple kLMG H with branching number 2, L G LH . Exercise 186. Here are some facts from Arabic. In Arabic a root typically consists of three consonants. Examples are to write, to study. There are also roots with four letters, such as (from Greek Drachme), which names a numismatic unit. From a root one forms socalled binyanim, roughly translated as word classes, by inserting vowels or changing the consonantism of the root. In Table 15 we give some examples of verbs derived from the root . Of these forms we can in turn form verbal forms in different tenses and voices. We have only shown the transparent cases, there are other classes whose forms are not so regular. Write an interpreted LMG that generates these forms. For the meanings, simply assume unary operators, for example for II, for passive, and so on. Exercise 187. In Mandarin (a Chinese language) a yesnoquestion is formed as follows. A simple assertive sentence has the form (5.72) and the corresponding negative sentence the form (5.73). Mandarin is an SVOlanguage, and so the verb phrase follows the subject. The verb phrase is negated by

k 8u h r q

H 4 H 4 d #7 H47#F7 4 H d H 4 H 4 H H d #@7 H 4 4 H d 97 H 4 d 7 A 5

D 4 H 4 d n#FH H47#99H 4 H d H 4 4 H H d Dn#@7 H 7 7 4 d 8FH f


G D 4 4 d n7# 

4 d

D 4 7 4 n8d Dq4$78b4 7 d 7 D 7 n47@d 78 7 8


D 4 4 n7$ d

Perf. Act.

H 4 H 4 #d H49F4 H H d H H 4 H H #@d H 4 4 H 9d H 4 H 9d


D 4 q$ d

to write to make write to correspond to write to each other to write, to be inscribed Impf. Act. Impf. Pass.

Perf. Pass.

H 4 H 4 #d H49HF4 H d H H 4 H H #@d H 4 4 H 9d H 4 H 9d V U

4 d

khh q 'aIp

V U

Discontinuity

401

He/She/It (is) at home He/She/It (is) not at home The yesnoquestion is formed by concatenating the subject phrase with the positive verb phrase and then the negated verb phrase. Is he/she/it at home?
f f

As Radzinski (1990) argues, the verb phrases have to be completely identical (with the exception of ). For example, (5.75) is grammatical, (5.76) is ungrammatical. However, (5.77) is again grammatical and means roughly what (5.75) means. You like his shirt not like his shirt?
P

Do you like his shirt? You like his not like his shirt? You like not like his shirt?
G 5 G G 5

Write an interpreted LMG generating these examples. Use ? : to denote the question, whether or not is the case. 4. Discontinuity

In this section we shall study a very important type of grammars, the so called Linear ContextFree Rewrite Systems LCFRS for short (see (Vijay-Shanker et al., 1987)). These grammars are weakly equivalent to what we call linear LMGs.

8c

(5.77)

9 RH

A 9 Rh

8c

H 4 9 H I7

D q6

8c

9 H R7

D 6 qD

(5.76)

9 IH

bc

A 9 h

H 4 9 H I7

D q6

G 5

8c

H 4 9 H I7

9 IH

D 6 qD

A 9 h

G 5

(5.75)

H 4 9 H R7

D !6

9 A 9 IH Fh

H 4 9 H R7

D 6 qD

(5.74)

H bD

D 

H bD

D 

(5.73)

y D

H 8

D $

(5.72)

y D

H b

D 

H s H s H s

prexing

. (We do not write tones.)

402

PTIME Languages

Denition 5.38 A kLMG is called linear if it is a simple kLMG and every rule which is not 0ary is downward nondeleting and downward linear, while 0ary rules have the form X x0 x X 1 with xi A for all i X . In other words, if we have a rule of this form (5.78) A t0 tk B0 s0 0 s0 k Bn
n s0 1 n sk 1 1

then for every i k and every j n: sij xij and xij xij implies i i and j j . Finally, i k ti is a term containing each of these variables exactly once, in addition to occurrences of constants. It is easy to see that the generative capacity is not diminished if we disallowed constants. In case k 1 we get exactly the CFGs, though in somewhat disguised form: for now if there are no constants a rule is of the form
i n 1

i n

(To this end we replace the variable x i by the variable x i for every i. After that we permute the Bi . The order of the conjuncts is anyway insignicant in an LMG.) This is as one can easily see exactly the form of a context free rule. For we have
i n

This rule says therefore that if we have constituents u i of type B i for i n, then i n ui is a constituent of type A. The next case is k 2. This denes a class of grammars which have been introduced before, using a somewhat different notation and which have been shown to be powerful enough to generate non CFLs such as Swiss German. In linear 2LMGs we may have rules of this kind. A x 2 x1 y1 y2 A y 1 x1 y2 x2 A x 1 y1 x2 y2 B x 1 y1 C x 2 y2 B x 1 y1 C x 2 y2 B x 1 y1 C x 2 y2

(5.82)

V V V V

U V ) U V ) U V ) U V )

A x 1 x2 y1 y2

B x 1 y1 C x 2 y2

aa

(5.81)

xi
i

x0 x1

xn

n 1

)) iaaaV

) V

XV

(5.80)

xi

x0 B

x1

where is a permutation of the numbers n. If : inverse to we can write the rule in this way.

U aaaV ))

) V

(5.79)

x i

B0 x0 B1 x1

B xn

is the permutation

xn

)) aaaV

V U

D D V iaaa) U )

1 i

i) i 6iaaa) U D ) iaaa) U

XV

) iaaa) U

Discontinuity

403

The following rules, however, are excluded.

The rst is upward deleting, the second not linear. We shall see that the language n n n n : n can be generated by a linear 2LMG, the language n n n n n :n however cannot. The second fact follows from Theorem 5.45. For the rst language we give the following grammar.

This shows that 2linear LMGs are strictly stronger than CFGs. As a further example we shall look again at Swiss German (see Section 2.7). We dene y.) the following grammar. (Recall that x y : x

This grammar is pretty realistic also with respect to the constituent structure, about which more below. For simplicity we have varied the arities of the predicates. Notice in particular the last two rules. They are the real motor of

(5.86)
U

x y x x y x z0 x z0 xy

y z 0 z1 u y z1 y z1

V U ) V U P 9 U ) U H V ) Ui V U RH V G P ) U U V ) Ui V U 5 V U G P ) ) U U H UG V U V ) Ui V U bH V  ) ) 5 P ) U X U A A G ) U V U @V p U V U G) V U f P U A G V U p G U 4FD$V H A V U A H

x x x

y y y

z 0 z1 z 0 z1 z 0 z1

V A 7 V @7 A 9 IH

V  U 9 Vg U P X G Ah U G H f h G

W ` W

V ) $i U V ) $i U U V ) $i V U V # W U G V W UG V y fU 5 8h U@G A A V Hp G h Dq4 A H 9 5 V1H U RH H H G 5 V @P vU bH hV X @V U 5 P eH 9R f pH U G A B G H

V @

9 D

(5.85)
T

V ) p $$!U l) U V H XV ) U

U V )

U V )

y0 x0 y1 z0 x1 z1
}

x0 x1

y0 y1

G 5

p h

(5.84)

A x 2 x2 x1 y1 y2

B x 1 y1 C x 2 y2

V V
z0 z1 u

U V ) U V )

(5.83)

A x 1 y1 y2

B x 1 y1 C x 2 y2

404

PTIME Languages

the Swiss German innitive constructions. For we can derive the following.

However, we do not have

The sentences of Swiss German as reported in Section 2.7 are derivable and some further sentences, which are all grammatical. Linear LMGs can also be characterized by the vector polynomials which occur in the rules. We shall illustrate this by way of example with linear 2 LMGs and here only for the at most binary rules. We shall begin with the unary rules. They can make use of these vector polynomials. x0 x1 : x0 x1 : x0 x1 : x 1 x0

Then the following holds.

G K

x0 x1

x0 x1

This means that one has G at ones disposal if one also has X and F , and that one has F if one also has X and G and so on. With binary rules already the situation gets quite complicated. Therefore we shall assume that we have all unary polynomials. A binary vector polynomial is of the form

VaV ) V aV )

(5.95)

x0 x1

U U V D U U V D

V aV

x0 x1

x0 x1

) (

) (

x0 x1 :

x 0 x1 x 1 x0

0 )

U U

(5.94)

x 1 x0

0 )

x0 x1 :

x 0 x1

x0 x1 :

x 0 x1

XV

(5.93)

x0 x1

XV

) H H A 9 @P FRH

U f h i

(5.92)

z0

z1

z0 z1

V ) V )

U$i U $i

XV

(5.91)

(5.90)

G ) H@HPh X V H P G ) h X @V H @P P H H XV h XV

(5.89)

z0

V
z1 z0 z1

U $i

XV

(5.88)

z0

z1

z0 z1

U i

XV

(5.87)

G U $i A 7 F7 IH f h A h A 9 Hp $i G U A 9 FRH f h D B 9 G Hp ) G U $i D 5 4 A H q1H @7 A 7 A h GHp )  Hp $i G U D 5 4 A H q1H 9 D B G ) U $i h X PV H A 9 IH f h ) Hp $i G U H H P 9 D B

z0

z1

z0 z1

Discontinuity

405

p0 x0 x1 y0 y1 p1 x0 x1 y0 y1 such that q : p0 p1 is linear. Given q there exist exactly 5 choices for p0 and p1 , determined exactly by the cutoff point. So we only need to list q. Here we can assume that in q x 0 x1 y0 y1 x0 always appears to the left of x1 and y0 to the left of y1 . Further, one may also assume that x0 is to the left of y0 (otherwise exchange the xi with the yi ). After simplication this gives the following polynomials. (5.96) qW x0 x1 y0 y1 : qZ x0 x1 y0 y1 :

Let us take a look at qW . From this polynomial we get the following vector polynomials. x0 x1 x0 x1 x0 x1 x0 x1 y0 y1 y0 y1 y0 y1 y0 y1 : : : :

W1

W2

W3

W4

We say that a linear LMG has polynomial basis Q if in the rules of this grammar only vector polynomials from Q have been used. It is easy to see that if is a polynomial that can be presented by means of polynomials from Q, then one may add to Q without changing the generative capacity. Notice also that it does not matter if the polynomial contains constants. If we have, for example,

we can replace this by the following rules.

(5.99)

This is advantageous in proofs. We bring to the attention of the reader some properties of languages that can be generated by linear LMGs. The following is established in (1987), see also (Weir, 1988).

U ) U ) V V FV U V FV U ) U )

Xb! V pU XU V T XV U

wV

uxvwy

U ) U p V V FV

U F

(5.98)

0 )

(5.97)

W0

) (

( Va0 D ( Va0 D ( Va0 D ( Va0 V a0

(0 ) (0 ) (0 ) (0 ) () 0

x0 x1

y0 y1

qC x0 x1 y0 y1 :

x 0 x1 y0 y1 x 0 y0 x1 y1

x 0 y0 y1 x1

x 0 y0 x1 y1 x 0 y0 x1 y1 x 0 y0 x1 y1 x 0 y0 x1 y1 x 0 y0 x1 y1

0 aV

) V

(aU (aU (aU (aU ( aU

406

PTIME Languages

Proposition 5.39 (VijayShanker & Weir & Joshi) Let G be a linear k LMG. Then L G is semilinear. A special type of linear LMGs are the socalled head grammars. These grammars have been introduced by Carl Pollard in (1984). The strings that are manipulated are of the form xay where x and y are strings and a A. One speaks in this connection of a in the string as the distinguished head. This head is marked here by underlining it. Strings containing an underlined occurrence of a letter are called marked. The following rules for manipulating marked strings are now admissible. hC2 vaw ybz : hL2 vaw ybz : hR2 vaw ybz : vawybz vaybzw vaybzw vybzaw vybzaw

(Actually, this denition is due to (Roach, 1987), who showed that Pollards denition is weakly equivalent to this one.) Notice that the head is not allowed to be empty. In (Pollard, 1984) the functions are partial: for example, hC1 w is undened. Subsequently, the denition has been changed slightly, basically to allow for empty heads. In place of marked strings one takes 2vectors of strings. The marked head is the comma. This leads to the following denition. (This is due to (VijayShanker et al., 1986). See also (Seki et al., 1991).) Denition 5.40 A head grammar is a linear 2LMG with the following polynomial basis.
C2

L1

L2 R1

It is not difcult to show that the following basis of polynomials is sufcient: C1 , C2 and
W

V a0

() 0

( aU

(5.102)

x0 x1

y0 y1

x 0 y0 y1 x1

x0 x1

y0 y1

x 0 y0 y1 x1

x0 x1

y0 y1

x 0 y0 y1 x1

(5.101)

x0 x1

y0 y1

x 0 y0 y1 x1

x0 x1

y0 y1

C1

x 0 x1 y0 y1

( Va0 D ( Va0 D ( Va0 D ( Va0 V a0

(0 ) (0 ) (0 ) (0 ) () 0

x0 x1

y0 y1

ii

ii

hR1 vaw ybz :

(5.100)

hL1 vaw ybz :

i i i i i i i i i i i i i i i i

hC1 vaw ybz :

vawybz

x 0 x1 y0 y1

i i D D D

V i !) i V i !) i V i !) i V i !) i V i !) i V i !) i

i U i i U i i U i i U i i U i i U i

V U (aU (aU (aU (aU ( aU

V ) U i

Discontinuity

407

Notice that in this case there are no extra unary polynomials. However, some of them can be produced by feeding empty material. These are exactly the polynomials , F and H . The others cannot be produced, since the order of the component strings must always be respected. For example, one has
C2 F

We shall now turn to the description of the structures that correspond to the trees for CFGs. Recall the denitions of Section 1.4.

i n

Notice that an nsequence of strings can alternatively be regarded as an n 1 context. Let G be a klinear LMG. If G A x0 xk 1 then this means that the ktuple xi : i k is a constituent of category A. If G S x , then the derivation will consist in deriving statements of the form A y i : i A such that there is an n 1context for y i : i A in x. The easiest kinds of structures are trees where each nonterminal node is assigned a tuple of subwords of the terminal string. Yet, we will not follow this approach as it generates stuctures that are too articial. Ideally, we would like to have something analogous to constituent structures, where constituents are just appropriate subsets of the terminal nodes. Denition 5.42 An labelled ordered tree of discontinuity degree k is a labelled, ordered tree such that x has at most k discontinuous pieces. If G is given, the labels are taken from A and R, and A is the set of labels of leaves, while R is the set of labels for nonleaves. Also, if Q is a kary predicate, it must somehow be assigned a unique ktuple of subwords of the terminal string. To this end, we segment x into k continuous parts. The way this is done exactly shall be apparent later. Now, if x is assigned B and if x is the disjoint union of the continuous substrings y i , i k, and if yi precedes in y j in x iff i j then B yi : i k . However, notice that the predicates apply to sequences of substrings. This is to say that the linear order is projected from the terminal string. Additionally, the division into substrings can be read off from the tree (though not

V0 aaV U

V U i

i( aU

0 aV U

i) i !aaa) U

i (

V a0

i( aU

i (

V t 8rU

We write C

in place of x.

i i

W i

(5.104)

w0

vi wi

0 i (

Denition 5.41 A sequence C context. A sequence vi : i

wi : i n 1 of strings is called an n n occurs in x in the context C if

V a0 ) 0 () i ( D

( aU

(5.103)

x0 x1

x 0 x1

U 0 ) D

x0 x1

408

PTIME Languages

from x alone). There is, however, one snag. Suppose G contains a rule of the form

Then assuming that we can derive B u0 u1 and C v0 v1 , we can also derive A u1 u0 v0 v1 . However, this means that u1 must precede u0 in the terminal string, which we have excluded. We can prevent this by introducing a nonterminal B such that B x0 x1 B x1 x0 , and then rewrite the rule as The problem with the rule (5.105) is that it switches the order of the x i s. Rules that do this (or switch the order of the y i s) are called nonmonotone in the sense of the following denition. Denition 5.43 Let L M0 Mn 1 be a linear rule, L B t j : j k and Mi Ai xij : j ki , i n. is called monotone of for every i n and every pair j j ki the following holds: if xij occurs in t q and xij in t q then either q q or q q and xij occurs before xij in the polynomial tq . An LMG is monotone if all of its rules are. Now, for every LCFRS there exists a monotone LCFRS that generates the same strings (and modulo lexical rules also the same structures). For every predicate A and every permutation : k k, k : A assume a distinct predicate A with the intended interpretation that A x 0 xk 1 iff x A 10 x 1 k 1 . Every rule A s B0 B p 1 is replaced by all

possible rules A 1 s B0 B p 1 . Let A B0 B p 1 be a rule. Now put i j : k iff xk is the jth variable i from the variables xq , q Bi , which occurs in i A ti , counting from the i results from A by applying the substitution x ij xi j (while the variables i of the B remain in the original order). This rule is monotone. For example, i assume that we have the rule

) V

XV

(5.108)

A x 0 y0 x1 x2 y1 y2 y3

B 0 x 0 x 1 x 2 C 1 y 0 y 1 y 2 y 3

Then we put 0 : 0 So we get

21

12

0, and 1 : 0

21

02

U V )

XV

(5.107)

A x 2 y1 x1 x0 y2 y0 y3

B0 x0 x1 x2 C y 0 y1 y2 y3

13

3.

aa

left. Then A

B 0 0

1 B pp 1 will replace the rule A

B0

B p 1 , where A

V a0

)iaaa) U V U

( aU

aa

V U

i U

U V )

aa

aa

aV U U V i

V a0

aa

@k ( aU

aa

) aaa)

(5.106)

A x 0 x1 y0 y1

B x0 x1 C y 0 y1

V ) U i i

i V a) 6U i i

U V ) )

(5.105)

V !) i i 6U i i

A x 1 x0 y0 y1

B x 0 x1 C y 0 y1

Discontinuity

409

Every terminal rule is monotone. Thus, we can essentially throw out all nonmonotone rules. Thus, for nonmonotone LCFRS there is a monotone LFCRS generating the same strings. We can a derive useful theorem on LCFRSs. Denition 5.44 A language L is called kpumpable if there is a constant p L such that for all x of length pL there is a decomposition x u0 i k vi ui 1 such that

i k

Theorem 5.45 (Groenink) Suppose that L is a kLCFRL. Then L is 2k pumpable. We provide a proof sketch based on derivations. Transform the language into a language of signs, by adding the trivial semantics. Then let be a sign grammar based on a monotone kLCFRS for it. Observe that if is a structure term containing x free exactly once, and if and are denite and unfold to signs of identical category, then with also n is denite for every n (the rule skeleton is context free). Now, if x is large enough, its structure term will be of the form such that and have the same category. Finally, suppose that the grammar is monotone. Then is a monotone, linear polynomial function on ktuples; hence there are v i , i 2k, such that
1 1

Thus let us now focus on monotone LCFRSs. We shall dene the structures that are correspond to derivations in monotone LCFRSs, and then show how they can be generated using graph grammars. Denition 5.46 Suppose that degree of discontinuity k and G that is a Gtree if

for every nonleaf v: there is a set H v i : i k v , and H v i x is continuous,

V U

} V U

V U 3 aV xU

1 U V 3

A iff v is a leaf, k of leaves such that

0 ) ) ) ) ( 0 3) j) i) ( D D

T is an ordered labelled tree with A R S H a monotone LCFRS. We say

i (

) aaa)

(5.110)

p x0

xk

v2i xi v2i

:i

VV U a~U P

V U ~

VV U U a~P V U ~

} T

VV U U a~P

V i

i U

i 8S D

(5.109)

u 0 vi n ui : n

V i U i

410

PTIME Languages

such that

The last clause is somewhat convoluted. It says that whatever composition we assume of the leaves associated with v, it must be compatible with the composition of the wi , and the way the polynomials are dened. Since the preterminals are unary, it can be shown that H v i is unique for all v and i. The following grammar is called G .

G derives with the structure shown in Figure 12. This tree is not exhaustively ordered. This is the main difference with CFGs. Notice that the tree does not reect the position of the empty constituent. Its segments are found between the second and the third as well as between the sixth and the seventh letter. One can dene the structure tree also in this way that it explicitly contains the empty strings. To this end one has to replace also occurrences of by variables. The rest is then analogous. We shall now dene a context free graph grammar that generates the same structures as a given monotone LCFRS. For the sake of simplicity we assume that all predicates are kary, and that all terminal rules are of the form Y a , a A. Monotonicity is not necessary to assume, but makes life easier. The only problem that discontinuity poses is that constituents cannot be ordered with respect to each other in a simple way. The way they are related to each other shall be described by means of special matrices. vi : i k 1 a k 1context for xi , i k, Assume that u is a string, C in u and D wi : i k 1 a k 1context for y j , j k, in u. Now, write M C D : i j i j for the following matrix:

i p

i q

i i

W i

i i

(5.112)

pq

1 iff

vi xi is a prex of w0 yi wi

U V 8! V pU V U V wV ) U V U V U ) U )

i (

(5.111)

xy

V )

U V )

U ) V

y0 x0 y1 z0 x1 z1 xy x y
mV

x0 x1

y0 y1

V U

gi

v has label A, wi has label Bi (i if t j i xh i then H v j

n), H wg i

hi

z0 z1

) iaaa)

V
1

) ) aaV

V U

) aaa)

XV

iaaa) U ) )
55

A t0

t A

B0 x0 0

if v has daughters wi , i

n, then there is a rule


x 0 B0

Bn

x0 n

x n Bn 1

mV XV

V XU T XV U S ) H U ) U

i j(

V iaaa) ) U )

V ) U i

Discontinuity

411

We call M C D the order scheme of C and D. Intuitively, the order scheme tells us which of the x p precede which of the yq in u. We say that C and D overlap if there are i j k such that the (occurrences of) x p and yq overlap. This is the case iff pq qp 0. Lemma 5.47 Assume that is a labelled ordered tree for a monotone LCFRS with yield u. Further, let x and y be nodes. Then x and y overlap iff x and y are comparable. Notice that in general qp 1 pq in the nonoverlapping case, which is what we shall assume from now on. We illustrate this with an example. Figure 13 shows some 2schemes for monotone rules together with the orders which dene them. (We omit the vector arrows; the ordering is dened by is to the left of in the string.) For every kscheme M let M be a relation. Now we dene our graph grammar. N : A : A N is a set of fresh nonterminals. The set of vertex colours FV : N N A, the set of terminal vertex T M : M a kscheme . colours is FV : N A, the set of edge colours is (As we will see immediately, is the relation which ts to the relation where 1 i j is the matrix which consists only of 1s.) The start graph is the oneelement graph which has one edge. The vertex has colour S, the edge

V b3 U

S s T 8S

V b3 U

s 1

S D

Figure 12. A Structure Tree for G


vv } T

S S

D ) i

V ) U

H V U i D

412

PTIME Languages

Figure 13. Order Schemes

be given. Let p : i k t i be the characteristic polynomial of the rule. Then the graph replacement replaces the node B by the following graph. B w (5.114)

Furthermore, between the nodes vi and w j , i j, the following relations hold (which are not shown in the picture). Put i j i j : 1 if xii is to the left of x jj in p. Otherwise, put i j i j : 0. Then put Hi j : i j i j i j . The relation from vi to v j is H . Notice that by denition always either x ii is to the
ij

left of x jj or to the right of it. Hence the relation between v j and vi is exactly 1 H . This is relevant insofar as it allows us to concentrate on one colour functional only: . Now supppose that w is in relation M to a node u. (So, there is an edge of colour M from v to u.) We need to determine the relation (there is only one) from vi to u. This is N , where N pq pq and pq 1 iff p q 1, where xip occurs in t p . The map that sends M to N and to is the desired colour functional . The other functionals can be straightforwardly dened.

V V ) U U

V ) U

V ) U

A0

An

v0

v1 A1

xn

1 1

V a0

( aU

V aa@a0

( aU

3 3 aa 3

V ma0

( aU

(5.113)

B tj : j

j A 0 x0 : j

An

j xn

pq pq with pq has only one colour, K , where K rule we add a graph replacement rule. Let

0 1 0 1 y0 x0 x1 y1
x w

0 1 0 0 y0 x0 y1 x1

0 0 0 0 y0 y1 x0 x1

1 1 1 1 x0 x1 y0 y1
x w

1 1 0 1 x0 y0 x1 y1

1 1 0 0 x0 y0 y1 x1

1 iff p

q. For every :j k

Discontinuity

413

Figure 14. An Exponentially Growing Tree

There is a possibility of dening structure in some restricted cases, namely always when the right hand sides do not contain a variable twice. This differs from linear grammars in that variables are still allowed to occur several times on the left, but only once on the right. An example is the grammar

The notion of structure that has been dened above can be transferred to this grammar. We simply do as if the rst rule was of this form

where it is clear that x and y always represent the same string. In this way we get the structure tree for shown in Figure 14. Notes on this section. In (Seki et al., 1991), a slightly more general type of grammars than the LCFRSs is considered, which are called Multiple Context Free Grammars (MCFGs). In our terminology, MCFGs maybe additionally upward deleting. In (Seki et al., 1991) weak equivalence between MCFGs and LCFRs is shown. See also (Kasami et al., 1987). (Michaelis, 2001b) also denes monotone rules for MCFGs and shows that any MCFL can be

H H H H H H H @@@H

V U V U )

mV

(5.116)

xy

mV

V U

mV

(5.115)

xx

x ;

vv

H
}

H
} }

414

PTIME Languages

generated by a monotone MCFG. For the relevance of these grammars in parsing see (Villemonte de la Clergerie, 2002a; Villemonte de la Clergerie, 2002b) . In his paper (1997), Edward Stabler describes a formalisation of minimalist grammars akin to Noam Chomskys Minimalist Program (outlined in (Chomsky, 1993)). Subsequently, in (Michaelis, 2001b; Michaelis, 2001a; Michaelis, 2001c) and (Harkema, 2001) it is shown that the languages generated by this formalism are exactly those that can be generated by simple LMGs, or, for that matter, by LCFRSs. Exercise 188. Show that the derivation is determined by up to renaming of variables. Exercise 189. Prove Proposition 5.39. Exercise 190. Let Ak : k , and let Wk : i:i that Wk is a mLCFRL iff k 2m.

Exercise 191. Determine the graph grammar G . Exercise 192. Show the following. Let N xi : i k yi : i k and a linear ordering on N with xi x j as well as yi y j for all i j k. Then if mi j 1 iff xi y j then M mi j i j is a kscheme. Conversely: let M be a kscheme and dened by (1) xi x j iff i j, (2) yi y j iff i j, (3) xi y j iff mi j 1. Then is a linear ordering. The correspondence between orderings and schemes is biunique. Exercise 193. Show the following: If in a linear kLMG all structure trees are exhaustively ordered, the generated tree set is context free.

5.

Adjunction Grammars

In this and the next section we shall concern ourselves with some alternative types of grammars which are all (more or less) equivalent to head grammars. These are the tree adjoining grammars (TAGs), CCGs (which are some rened version of the adjunction grammars of Section 1.4 and the grammars CCG Q of Section 3.4, respectively) and the socalled linear index grammars. Let us return to the concept of tree adjoining grammars. These are pairs N A , where is the set of centre trees and a set of adjunction G trees. In an adjunction tree a node is called central if it is above the distin

S s "T

3S

n k i

:n

. Show

0 ) ) ) (

V U

Adjunction Grammars

415

guished leaf or identical to it. It is advantageous to dene a naming scheme for nodes in an adjunction tree. Let T T T T be a centre tree. Then put N T : T . If adjoining A A A A at x to yields U U U U then

Let H be the set of all nodes of centre or adjunction trees. Then N H H . Furthermore, as is inductively veried, N is prex free. Thus, as x gets replaced by strings of which x is a sufx, no name that is added equals any name in N . So, the naming scheme is unique. Moreover, it turns out that the structure of is uniquely determined by N . The map that identies x T with its name is called . Put N : N and

If j, let N : X for the unique X which is the label of the node j in . Second, put N if j and j for certain , , , and j j such that j j . Third, if j and j for certain , , , and j j , such that (a) j j , (b) j and every node of is central in its corresponding adjunction tree.

Thus, we basically only need to dene N . It turns out that the sets N are the sets of leaves of a CFG. For the sake of simplicity we assume that the set of nodes of is the set of numbers j 01 j 1 . j : max j : . The terminals are the numbers j . The nonterminals are pairs i where is a tree and i j . The start symbols are 0 , . First, we shall dene the grammar D G . The rules are of this form.

where (5.119) is to be seen as rule a scheme: for every and every admissible j we may choose whether Xi is i or i i (i j ) for some tree i which can be adjoined at i in . This grammar we denote by D G and call it the derivation grammar. A derivation for G is simply a tree generated by D G . The following is clear: trees from D G are in onetoone correspondence with their sets of leaves, which in turn dene tree of the adjunction grammar.

V U

V U

V U

V U

) U

aa

CV ) U

(5.119)

X0

X1

Xj

V lU

v "V

U yaaa) ) S )

V U

V l U

Proposition 5.48 is an isomorphism from

onto

k k i D k j i i i i W W W i D W W i CW W i i "W k W i i

V lU

V ) U T Ys o1

i i

j 

i) (

V iU

V lU

(5.118)

} V lU

V lU

V lU

V lU

(5.117)

N T

v:v

V U

j 

1 V V U W W SsIT SvV U U D r i) ( 0 ) ) i) ( 3 j i) ( D D D
r

V U

j 

i i W Wk W i i iD i

U S

1 !) U V

W Wi

U x D D i

416

PTIME Languages

It should be said that the correspondence is not always biunique. (This is so since any given tree may have different derivations, and these get matched with nonisomorphic trees in D G . However, each derivation tree maps to exactly one tree of G modulo isomorphism.) TAGs differ from the unregulated tree adjunction grammars in that they allow to specify whether adjunction at a certain node is licit, which trees may be adjoined at which node, and whether adjunction is obligatory at certain nodes. We shall show that increases the generative power but and do not in presence of . To establish control over derivations, we shall have to change our denitions a little bit. We begin with . To control for the possiblity of adjunction, we assume that the category symbols are now of the form a, a A, or X and X respectively, where X N. Centre and adjunction trees are dened as before. Adjunction is also dened as before. There is a leaf i which has the same label as the root (all other leaves carry terminal labels). However, no adjunction is licit at nodes with label X . Notice that root and distinguished leaf must carry the same nonterminal, it is not admitted that one carries X while the other has X . Even if we admitted that, this would not increase the generative capacity. Using such grammars one can generate the language n n n n : n . Figure 15 shows such a grammar. The centre tree is shown to the left, the adjunction tree to the right. It is not hard to show that one can generally reduce such grammars to those where both the root and the leaf carry labels of the form X . Namely, if the root does not carry an adjunction prohibition, but, say, a label X N, then add a new root which has label X , and similarly for the distinguished leaf. Also, notice that adjunction prohibition for interior nodes of adjunction trees can be implemented by changing their label to a newly added nonterminal. Trivially, no tree can be adjoined there. Denition 5.49 A standard tree adjoining grammar (or simply a TAG) is an adjunction grammar in which the adjunction trees carry an adjunction prohibition at the root and the distinguished leaf. Now let us turn to and . It is possible to specify in standard TAGs whether adjunction is obligatory and which trees may be adjoined. So, we also have

V U

Adjunction Grammars

417

a function f which maps all nodes with nonterminal labels to sets of adjuncthen that node effectively has an adjunction tion trees. (If for some i f i prohibition.) We can simulate this as follows. Let be the set of adjunction trees. We think of the nonterminals as labels of the form X and X , respectively, where X N and . A (centre or adjunction) tree is replaced by all trees on the same set of nodes, where i carries the label X if i had label X in if f i , and X if i has the label X in . However, if i is the root, it will only get the label i . The second element says nothing but which tree is going to be adjoined next. This eliminates the second point from the list, as we can reduce the grammars by keeping the tree structure. Now let us turn to the last point, the obligation for adjunction. We can implement this by introducing labels of the form X . (Since obligation and prohibition to adjoin are exclusive, occurs only when does not.) A tree is complete only if there are no nodes with label X for any X. Now we shall show that for every adjunction grammar of this kind there exists a grammar generating the same set of trees where there is no obligation for adjunction. We adjoin to a centre tree as often as necessary to eliminate the obligation. The same we do for adjunction trees. The resulting trees shall be our new centre and adjunction trees. Obviously, such trees exist (otherwise we may choose the set of centre trees to be empty). Now we have to show that there

0 ) (

0 g)

0 ) (

) (

0 ) (

Figure 15. A Tree Adjoining Grammar for

n n n n

:n

} 3

3 s

3 3 s
}

3H 3

V U

V U

3 3 1
r

k Y

418

PTIME Languages

exists a nite set of minimal trees. Look at a tree without adjunction obligation and take a node. This node has a history. It has been obtained by successive adjunction. If this sequence contains an adjunction tree twice, we may cut the cycle. (The details of this operation are left to the reader.) This grammar still generates the same trees. So, we may remain with the standard form of TAGs. Now we shall rst prove that adjunction grammars cannot generate more languages as linear 2LMGs. From this it immediately follows that they can be parsed in polynomial time. The following is from (1986), who incidentally show the converse of that theorem as well: head grammars are weakly equivalent to TAGs. Theorem 5.50 (VijayShanker & Weir & Joshi) For every TAG G there exists a head grammar K such that L K LG. Proof. Let G be given. We assume that the trees have pairwise disjoint sets of nodes. We may also assume that the trees are at most binary branching. (We only need to show weak equivalence.) Furthermore, we can assume that the nodes are strictly branching if not preterminal. The set of all nodes is denoted ia : i M in : i M . The by M. The alphabet of nonterminals is N : start symbol is the set of all ia and in where i is the root of a centre tree. By massaging the grammar somewhat one can achieve that the grammar contains only one start symbol. Now we shall dene the rules. For a local tree we put

if i is a leaf with terminal symbol a. If i is a distinguished leaf of an adjunction tree we also take the rule

Further, if i is a node to which a tree with root j can be adjoined, then also this is a rule.

(5.123)

i n x0 y0 y1 x1

j n x0 x1

ia y0 y1

) V

(5.122)

Now let i

k be a branching local tree. Then we add the following rules. j n x0 x1 k n y0 y1

i a x0 x1 y0 y1

XV

) U

(5.121)

in

mV

) U

(5.120)

ia

S s "T

V U S

V U

Adjunction Grammars

419

If adjunction is not necessary or prohibited at i, then nally the following rule is added.

This ends the denition of K. In view of the rules (5.122) it is not entirely clear that we are dealing with a head grammar. So, replace the rules (5.122) by the following rules: j n x0 x1 y0 y1 j n x0 x1

(5.128)

These are rules of a head grammar; (5.122) can be derived from them. For this reason we remain with the rules (5.122). It remains to show that L K L G . First the inclusion L G LK . We show the following. Let be a local tree which contains exactly one distinguished leaf and nonterminal leaves x i , i n, with labels ki . Let therefore j i be distinguished. We associate with a vector polynomial which returns
i j j i n

0 i n

This claim can be proved inductively over the derivation of in G. From this it follows immediately that x L K if x L G . For the converse inclusion one has to choose a different proof. Let x L K . We choose a Kderivation of x. Assume that no rule of type (5.123) has been used. Then x is the string of

V U V U

1 i 1 i

V U

1 i

0 i i

) i ( i

(5.131)

y 0 z0

yi zi

V lU

If no leaf is distinguished in

the value of p

is exactly

V a0

i l)

i( aU

)iaa )V i i( a0 ) aU

maa0 V V

0 l) aalU U i i((UV

(5.130)

in

yi zi : i

n k0 y0 z0

n kn

yn

zn

for given pairs of strings yi zi . It is possible to show by induction over that there is a Kderivation

0 l) ( i i

0 i i

) i i

(5.129)

yi zi

yi zi

V U

V lU

} V U

V U

V U

(5.127)

x0 x1 y0 y1

x0 x1 k y0 y1

V V

(5.126)

y0 y1

(5.125)

XV ) I U U V ) V ) U I ) UV ) U I ) V ) ) V ) U V )

ia x0 x1 y0 y1

j n x0 x1 k n y0 y1

XV

(5.124)

i n x0 x1

ia x0 x1

420

PTIME Languages

a centre tree as is easily seen. Now we assume that the claim has been shown for derivations with fewer than n applications of (5.123) and that the proof has exactly n applications. We look at the last application. This is followed only by applications of (5.122), (5.120) and (5.121). These commute if they belong to different subtrees. We can therefore rearrange the order such that our application of (5.123) is followed exactly by those applications of (5.122), (5.120) and (5.121) which belong to that subtree. They derive

where i is the left hand side of the application of (5.123), and x 0 x1 is the pair of the adjunction tree whose root is i. (x 0 is to the left of the distinguished leaf, x1 to the right.) Before that we have the application of our rule (5.123):

Now we eliminate this part of the derivation. This means that in place of ja x0 y0 y1 x1 we only have j n y0 y1 . This however is derivable (we already have the derivation). But on the side of the adjunction this corresponds exactly to the disembedding of the corresponding adjunction tree. The converse also holds. However, the head grammars do not exhaust the 2LMGs. For example look at the following grammar G. x0 x1 x0 y1 x0 x1 x0 x1 x0 x1

(5.134)

To analyze the generated language we remark the following facts.

From this the following characterization can be derived.

G
~

(5.135)

x0

x1

iff

x0 x1

V ) U

As a proof one may reect that rst of all

and secondly

0 ) ( i i

x y iff x y

V !) U i i

Lemma 5.51

n n

x0 x1

V V

x0 x1

x0 x1

n n

V V U V )

XV ) U U XV ` ) U U XV ) U 7 4 X$' U V p) H } U " U XV  ) H } U V "X$ ') U p

XV

y0 x0 y1 x1

x0 x1

y0 y1

for some n

i i V ) U

) i i V 6) U

V !) U i i

i !) i U i i

(5.133)

j a x0 y0 y1 x1

ia x0 x1 j n y0 y1

0 6) ( i i

i i V 6) U )

(5.132)

i a x0 x1

V i !) i U i i

Adjunction Grammars
`

421

and yn :

for certain natural numbers m and n. For example

are in L G but not

Now for the promised proof that there is no TAG which can generate this language.

for certain natural numbers m and n .

We put : n m . (This is , if m 0.) Certainly, there exists the minimum of all for all adjunction trees. It is easy to show that it must be 0. So there exists an adjunction tree which consists only of , , and , in equal number. Further there exists an adjunction tree which contains . Let x be a string from L G such that

for certain natural numbers m and n such that (a) m is larger than any m , and (b) n m is smaller than any that is not equal to 0. It is to be noticed that such a x exists. If m and n are chosen, the following string does the job.

(5.140)

n n m 1 n 1 n n

m m

g Y` g

g U

H 7

V U i

(5.139)

g Y` g

g U

g V g p Cg g U H

V U

(5.138)

V U

V U

Lemma 5.53 Let H be a TAG with L H tion tree. Then

L G and

p p @

` ` 

` $

a centre or adjunc-

p p p aa @@

` ` 

` 

H 7 4 H 7 7 4 4 89b9H

g V g p g g U H

p p RF

p RF@

` 

g Y` g

H 7 4 H 7 7 4 4 89@89H

g U

H 7 4 b#H

H 7 4 b#H

V U i

(5.137)

g V g p Cg g U H

V U

1 i

In particular, for every x

LG

1 ) 1 aa2 p i i i
1

6aa i

(5.136)

LG

x n0 x n1

x nk

y nk

4 k

Lemma 5.52 Let xn :

n n

i S V U H D D

n n.

Then y n1 y n0
k k

: k

ni

for all i

p p @
4 #

V U

H H

422

PTIME Languages

This string results from a centre tree by adjoining (a ) an in which occurs, by adjoining (b ) a in which does not occur. Now we look at points in which has been inserted in (5.140). These can only be as follows.

If we put this on top of each other, we get

Now we have a contradiction. The points of adjunction may not cross! For the subword between the two must be a constituent, likewise the part between the two . However, these constituents are not contained in each other. (In order for this to become a real proof one has to reect over the fact that the constituent structure is not changed by adjunction. This is Exercise 194.) So we have a 2LMG which generates a language that cannot be generated by a TAG. This grammar is 2branching. In turn, 2branching 2LMGs are weaker than full linear 2LMGs. Some parts of the argumentation shall be transferred to the exercises, since they are not of central concern. Denition 5.54 A linear LMG is called nbranching if the polynomial base consists of at most kary vector polynomials. The reason for this denition is the following fact. Proposition 5.55 Let L L G for some nbranching, klinear LMG G. Then there exists a klinear LMG H with L H L in which every rule is at most nbranching. To this end one has to see that a rule with more than n daughters can be replaced by a canonical sequence of rules with at most n daughters, if the corresponding vector polynomial is generated by at most nary polynomials. On the other hand it is not guaranteed that there is no nbranching grammar if higher polynomials have been used. Additionally, it is possible to construct languages such that essentially n 1ary polynomials have been used and they cannot be reduced to at most nary polynomials. Dene as before
`

(5.144)

xn :

yn :

n n

V U

"X

p F3

V U

FX

(5.143)

n m 1

n 1 n n

"X

p F

FX

(5.142)

n n m 1

n 1 n n

However, let us look where the adjunction tree

F p

H 7

H 7

(5.141)

n m 1 n 1 n

m m

has been inserted.

n n

H 7

4 # 4 # 4 #

H H H X

Adjunction Grammars

423

The following polynomial is not generable using polynomials that are at most ternary.

From this we can produce a proof that the following language cannot be generated by a 2branching LMG.

We close this section with a few remarks on the semantics. Adjunction is an operation that takes complete trees as input and returns a complete tree. This concept is not easily coupled with a semantics that assembles the meanings of sentences from their parts. It is at least in Montague semantics impossible to recover the meaning components of a sentence after completion, which would be necessary for a compositional account. (Harris, 1979) only gives a modest sketch of how adjunction is done in semantics. Principally, for this to work one needs a full record of which items are correlated to which parts of meaning (which is assumed, for example, in many syntactic theories, for example LFG and HPSG). Exercise 194. Let be a tree and an adjunction tree. Let be the result of adjoining to x in . We view in a natural way as a subtree of with x the lower node of in . Show the following: the constituents of are exactly the intersection of constituents of with the set of nodes of .
n n n n :n Exercise 195. Show that the language L : cannot be generated by an unregulated TAG. Hint. Proceed as in the proof above. Take a string which is large enough so that a tree has been adjoined and analyze the places where it has been adjoined.

Exercise 197. Show the following: For every TAG G there is a TAG G in standard form such that G and G have the same constituent structures. What can you say about the labelling function? Exercise 198. Prove Proposition 5.55.

Exercise 196. Show that in the example above min : Compare the discussion in Section 2.7.

0. Hint.

i S

(5.146)

x n0 x n1 x n2 x n3 y n2 y n0 y n3 y n1 : n 0 n 1 n 2 n 3

V a0

() 0

() 0

() 0

(U a8 D

(5.145)

w 0 w1

x0 x1

y0 y1

z0 z1

w 0 x0 y0 z0 y1 w1 z1 x1

424 6.

PTIME Languages

Index Grammars

Index grammars broaden the concept of CFGs in a very special way. They allow to use in addition of the nonterminals a sequence of indices; the manipulation of the sequences is however very limited. Therefore, we may consider these grammars alternatively as grammars that contain rule schemata rather than individual rules. Let as usual A be our alphabet, N the set of nonterminals (disjoint with A). Now add a set I of indices, disjoint to both A and N. Furthermore, shall be a symbol that does not occur in A N I. An index scheme has the form

or alternatively the form

where i I for i n, and a A. The schemata of the second kind are called terminal schemata. An instantiation of is a rule

where the following holds.

For a terminal scheme the following condition holds: if then x . An index scheme simply codes the set of all of its instantiations. So we may also call it a rule scheme. If in a rule scheme we have as well as i for all i n then we have the classical case of a context free rule. We therefore call an index scheme context free if it has this form. We call it linear if i for at most one i n. Context free schemata are therefore also linear but the converse need not hold. One uses the following suggestive notation. A denotes an A with an arbitrary stack; on the other hand, A is short for A . Notice for example the following rule.

(5.150)

For all i

n: if i

then yi

D i

i D

If

then for all i

n: yi

or yi

If

then x

and yi

for all i

n. x.

i W

aa

i i W

i i W

(5.149)

A x

B 0 y0 0

Bn

T S Hs

i W

(5.148)

i W

aa

i W

i W

(5.147)

B 0 0

Bn

yn 1 n

1 i) i D D

D i

Index Grammars

425

This is another form for the scheme


f  S T

which in turn comprises all rules of the following form


f

Denition 5.56 We call an index grammar a sextuple G SAN I R where A, N, and I are pairwise disjoint nite sets not containing , S N the start symbol and R a nite set of index schemata over A, N, I and . G is called linear or a LIG if all its index schemata are linear. The notion of a derivation can be formulated over strings as well as trees. (To this end one needs A, N and I to be disjoint. Otherwise the category symbols cannot be uniquely reconstructed from the strings.) The easiest is to picture an index grammar as a grammar S N A R , where in contrast to a context free rule set we have put an innite set of rules which is specied by means of schemata, which may allow innitely many instantiations. This allows us to transfer many notions to the new type of grammars. For example, it is easily seen that for an index grammar there is a 2standard form which generates the same language. ,N: The following is an example of an index grammar. Let A : ,I: , and (5.153)

Index grammars are therefore quite strong. Nevertheless, one can show that they too can only generate PTIMElanguages. (For index grammars one can

) f q ) f f $q! ) f f f H H! ) f f H! $qq!@H $!q @H f f

(5.154)

V U

This denes the grammar G. We have L G look at the following derivation.


D )7D D f s D D 7D f s s D7D ) (s }

2n

:n

. As an example,

1 0 ) ) ) ) ( )

T S H D

"y$  D $

0 ) ) ) (

f !

i $

f" H yw D s bf 

0 H H H )$!q@@H f f q f ) q !H f f D ) D f

T f ) S

(5.152)

(5.151)

D S

T F) ) S

426

PTIME Languages

dene a variant of the chartalgorithm This variant also needs only polynomial time.) Of particular interest are the linear index grammars. Now we turn to the equality between LIGs and TAGs. Let G be an LIG; we shall construct a TAG which generates the same constituent structures. We shall aim for roughly the same proof as with CFGs. The idea is again to look for nodes x and y with identical label X x. This however can fail. For on the one hand we can expect to nd two nodes with identical label from N, but they may have different index stack. It may happen that no such pair of nodes exists. Therefore we shall introduce the rst simplication. We only allow rules of the following form. (5.155b) (5.155c) (5.155d) X Y0 Y0 a Yj Yn Yj i Yj Yn

X X

In other words, we only admit rules that stack or unstack a single letter, or which are context free. Such a grammar we shall call simple. It is clear that we can turn G into simple form while keeping the same constituent structures. Then we always have the following property. If x is a node with label X x and if x immediately dominates the node x with label Y xi then there exists a node y x with label V xi which immediately dominates a node with label W x. At least the stacks are now identical, but we need not have Y V . To get this we must do a second step. We put N : N 2 o e a (but write A B x in place of A B x ). The superscript keeps score of the fact whether at this point we stack an index (a), we unstack a letter (e) or we do nothing (o). The index alphabet is I : N 2 I. The rules above are now reformed as follows. (For the sake of perspicuity we assume that n 3 and j 1.) For a rule of the form (5.155b) we add all rules of the form So we stack in addition to the index i also the information about the label with which we have started. The superscript a is obligatory for X X ! From the rules of the form (5.155a) we make rules of the following form. However, we shall also add these rules:

( 0 ) k 0 k )

(5.158)

Y1 Y1

Y1 Z

a o

0k )

0k )

0k )

$0 ) k )

( 0k ) (

(5.157)

X X

a o

W Y1 i

Y0 Y0

a o

W Y1

Y2 Y2

0k ) (

0k )

( 0 ) k ) ( 0 k )

( 0k )

( 0 k ) (

(5.156)

X X

Y0 Y0

a o

Y1 Y1

a o

X X i

Y2 Y2

a o

a o

0 ) (

T ) ) e S

aa aa

aa aa aa

 

(5.155a)

Xi

Y0

Yj

Yj

Yj

Yn

1 1

0 ) ) (

8k

Index Grammars

427

Finally, the rules of the form (5.155d) are replaced by these rules.

We call this grammar G . We shall at rst see why G and G generate the same constituent structures. To this end, let us be given a G derivation. We then get a Gderivation as follows. Every symbol of the form X X a e o is replaced by X, every stack symbol X X i by i. Subsequently, the rules of type (5.158) are skipped. This yields a Gderivation, as is easily checked. It gives the same constituent structure. Conversely, let a Gderivation be given with associated ordered labelled tree . Then going from bottom to top we do the following. Suppose a rule of the form (5.155b) has been applied to a node x and that i has been stacked. Then look for the highest node y x where the index i has been unstacked. Let y have the label B, x the label A. Then replace A by A B a and the index i on all nodes up to y by A B i . In between x and y we insert a node y with label A B e . y has y as its only daughter. y keeps at rst the label B. If however no symbol has been stacked at x then exchange the label A by A A o , where A is arbitrary. If one is at the bottom of the tree, one has a G tree. Again the constituent structures have been kept, since only unary rules have been inserted. Now the following holds. If at x the index A B i has been stacked then x has the label A B a and there is a node y below x at which this index is again removed. It has the label A B e . We say that y is associated to x. Now dene as in the case of CFLs centre trees as trees whose associated string is a terminal string and in which no pair of associated nodes exist. It is easy to see that in such trees no symbol is ever put on the stack. No node carries a stack symbol and therefore there are only nitely many such trees. Now we dene the adjunction trees. These are trees in which the root has label A B a exactly one leaf has a nonterminal label and this is A B e . Further, in the interior of the tree no pair of associated nodes shall exist. Again it is clear that there are only nitely many such trees. They form the basic set of our adjunction trees. However, we do the following. The labels X X o we replace by X X , the labels X X a and X X e by X X . (Root and associated node get an adjunction prohibition.) Now the proof is as in the context free case.

0k

0 ) ) (

0k ) (

0 ) (

) (

0 ) (

0 ) ) (

0 ) (

0k ) (

0 )k ) (

0k ) (

0 k

0 ) (

) (

0k ) (

0 ) (

0 ) (

0k ) (

(5.160)

X X

0k )

0k )

0k )

( 0 k ) (

(5.159)

X X

for all Y1 Y1 Z

N. The rules of the form (5.155c) are replaced thus.


o

0k ) (

)k )

Y0 Y0

a o

Y1 Y1

a o

Y2 Y2

a o

428

PTIME Languages

Now let conversely a TAG G N A be given. We shall construct a LIG which generates the same constituent structures. To this end we shall assume that all trees from and are based on pairwise disjoint sets of nodes. Let K be the union of all sets of nodes. This is our set of nonterminals. The set is our set of indices. Now we formulate the rules. Let i j 0 j1 jn 1 be a local subtree of a tree. (A) i is not central. Then add

(C) Let jk be a distinguished leaf of . Then add (D) Let i be central in , but not a root and j k central but not a distinguished leaf. Then let

be a rule. Nothing else shall be a rule. This denes the grammar G I . (This grammar may have start trees over distinct start symbols. This can be remedied.) Now we claim that this grammar generates the same constituent structures over A. This is done by induction over the length of the derivation. Let I , where be a centre tree, say B . Then let I : B I i : i if i is nonterminal and I i : i otherwise. One establishes easI ily that this tree is derivable. Now let B and I B already be constructed; let C result from by adjoining a tree to a node x. By making x into the root of an adjoined tree we get I . B C, B2 , B2 and B . Now I C Further, there is an isomorphism between the adjunction tree and the local subtree induced on C x . Let : C x B be this isomorphism. Put I y : I y if y I y : y if y is not central; and put B C. Put I y : I x : I x if y is a distinguished leaf. Finally, assume I x X x, where X is a nonterminal symbol and x I . If y is central but not root or leaf then put

i 'V U

V U k

(5.165)

y :

yx

0 i) ( 3 ) j )

0 l2%i) ( k 3)k j )k

V U

0 i) ( 3 ) j )

0 kl2j)i) ( 3 ) k k D 0)j)i) ( 3 V U3 D V U 3 0D3 ) j ) i) (

V U

V U V U 3 k S T Ds

aa

k y3

1 i

v T s S

t k 2j

aa

V U

V U k3 V U k3 D V U 3 D V U 3 k D

(5.164)

j0

jk

jk

jk

jn

aa

$y

(5.163)

j0

jk

jk

jk

jn

aa

(5.162)

j0

(B) Let i be root of Then add

and jk central (and therefore not a distinguished leaf). jk jk jk


1

aa

(5.161)

j0

j1

jn

jn

aa

0 ) ) ) (

t k %

V U

Index Grammars

429

Now it is easily checked that the sodened tree is derivable in G I . We have to show likewise that if is derivable in G I there exists a tree A with A I which is derivable in G. To this end we use the method of disembedding. One looks for nodes x and y such that they have the same stack, x y, there is no element between the two that has the same stack. Further, there shall be no such pair in x y x . It is easily seen that this tree is isomorphic to an adjunction tree. We disembed this tree and gets a tree which is strictly smaller. (Of course, the existence of such a tree must still be shown. This is done as in the context free case. Choose x of minimal height such that such there exists a y x with identical stack. Subsequently, choose y maximal with this property. In x y x there can then be no pair x , y of nodes with identical stack such that y x . Otherwise, x would not be minimal.) We summarize. Theorem 5.57 A set of constituent structures is generated by a linear index grammar iff it is generated by a TAG. We also say that these types of grammars are equivalent in constituent analysis. A rule is called right linear if the index is only passed on to the right hand daughter. So, the right hand rule is right linear, the left hand rule is not:
r

An index grammar is called right linear if all of its rules are right linear. Hence it is automatically linear. The following is from (Michaelis and Wartena, 1997; Michaelis and Wartena, 1999 ). Theorem 5.58 (Michaelis & Wartena) A language is generated by a right linear index grammar iff it is context free. Proof. Let G be right linear, X N. Dene HX as follows. The alphabet of nonterminals has the form T : X : X N . The alphabet of terminals is the one of G, likewise the alphabet of indices. The start symbol is X. Now for every rule
1

we add the rule

y

(5.168)

B i

aa

(5.167)

B0

Bn

Bn i

1 D

(5.166)

k V IT S s

V S IT s

U v

U v

430

PTIME Languages

This grammar is right regular and generates a CFL (see the exercises). So L L there exists a CFG LX : SX NX N RL which generates L HX . (Here N is X the alphabet of nonterminals of G but the terminal alphabet of L X .) We asX L sume that NL is disjoint to our previous alphabets. We put N : NX N as L well as R : RX R R where R is the set of context free rules of G and R the set of rules A B0 Bn 1 such that A B0 Bn 1 Bn i R. Finally, let G : SL N A R . G is certainly context free. It remains to show that L G L G . To this end let x L G . There exists a tree with associated string x which is derived from G. By induction over the height of this tree one shows that x L G . The inductive hypothesis is this: For every Gtree with associated string x there exists a G tree with associated string x; and if the root of carries the label X x then the root of carries the label contains no stack symbols, this claim is certainly true. Simply take X. If : . Further, the claim is easy to see if the root has been expanded with a context free rule. Now let this not be the case; let the tree have a root with label U. Let P be the set of right hand nodes of . For every x P let B x be that tree which contains all nodes which are below x but not below any y P with y x. It is easy to show that these sets form a partition of . Let u x, u P. By induction hypothesis, the tree dominated by u can be restructured into a tree u which has the same associated string and the same root label and which is generated by G . The local tree of x in B x is therefore an instance of a rule of R . We denote the tree obtained from x in such a way by x, y P, and if u x then u y. x. x is a G tree. Furthermore: if y xi : i n is an enumeration with xi xi 1 for Therefore we have that P all i n 1. Let Ai be the root label of xi in xi . The string i n Ai is a string of HU . Therefore it is generated by LU . Hence it is also generated by G . So, there exists a tree associated to this string. Let the leaves of this tree be exactly the xi and let xi have the label Ai . Then we insert xi at the place of xi for all i n. This denes . is a G tree with associated string x. The converse inclusion is left to the reader. We have already introduced Combinatory Categorial Grammars (CCGs) in Section 3.4. The concept of these grammars was very general. In the literature, the term CCG is usually xed following Mark Steedman to a particular variant where only those combinators may be added that perform function application and generalized function composition. In order to harmonize the notation, we revise it as follows.

(5.169)

replaces

replaces

V U

aa

V U

V U

) )

1 i k

0k ) )k ) aa 

Vk U

1 i

V U

k D i

Vk U v k

Index Grammars

431

We take pi as a variable for elements from . A category is a well formed string over B . We agree on obligatory left associative bracketing. That means that the brackets that we do not need to write assuming left associativity actually are not present in the string. Hence is a category, as is . However, and are not. A block is a sequence of the form a or a, a basic, or of the form or , where is a complex category symbol. (Often we ignore the details of the enclosing brackets.) A pcategory is a sequence of blocks, seen as a string. With this a category is simply a string of the form where is a pcategory. If and are pcategories, so is . For a category we dene by induction the head, , K , as follows. K b : b.

If we regard the sequence simply as a string we can use as the concatenation symbol of blocks as well as of sequences of blocks. We admit the following operations. (If is basic, omit the additional enclosing brackets.)

Here n is a variable for pcategories consisting of n blocks. In addition it is possible to restrict the choice of heads for and . This means that we dene operations iF A n in such a way that

This means that we have to step back from our ideal to let the categories be solely determined by the combinators.

(5.174)

LRn i

1 dV U )

1 dV U

(5.173)

n 4

: n

(5.172)

X W D X W C  D X

(5.171)

n 3

n :

(5.170)

:
:

if K LK otherwise.

R,

WV U

Lemma 5.59 Every category can be uniquely segmented as where is a pcategory.

V U

: K

: K .
K

 @ 1   W k

T HS ) 
u

 #

'

V U

T 3) 3@ij) S ) t ) s

 D

V U

432

PTIME Languages

Denition 5.60 A combinatory categorial grammar (or CCG) is a categorial grammar which uses nitely many operations from

Lemma 5.61 Let G be a CCG over A and M the set of categories which are subcategories of some a , a A. Then the following holds. If x is a string of category in G then where M and is a pcategory over M. The proof is by induction over the length of x and is left as an exercise. Theorem 5.62 For every CCG G there exists a linear index grammar H which generates the same trees. Proof. Let G be given. In particular, G associates with every letter a A a nite set a of categories. We consider the set M of subterms of categories from a : a A . This is a nite set. We put N : M and I : M. By Lemma 5.61, categories can be written as pairs where N and is a pcategory over I. Further, there exist nitely many operations which we L write as rules. Let for example 1 R be an operation. This means that we have rules of the form

where K L and K R, and not basic. We write this into linear index rules. Notice that in any case M because of Lemma 5.61. Furthermore, we must have M . So we write down all the rules of the form

where for certain M and M. We can group these R. Let be the into nitely many rule schemata. Simply x where K set of all sequences i : i p M whose concatenation is . is nite. Now put for (5.177) all rules of the form

 k

(5.178)

1 V U

1 d0

(5.177)

1 V U

(5.176b)

(5.176a)

DV U

Notice by the way that

0 3

and

n. 4

This simplies the calculations.

) )

X X D X X S s !) I"T

X X !) IS 1 1 V U D

(5.175)

LR 1

LR 2

:L R

LRn 3

LRn 4

:n

LR

V U ( V U

Index Grammars

433

where M is arbitrary with K L and . Now one can see easily that every instance of (5.177) is an instance of (5.178) and conversely. Analogously for the rules of the following form.

where K L and K R. Now it turns out that, because of Lemma 5.61 n M n and M. Only may again be arbitrarily large. Nevertheless we have M , because of Lemma 5.61. Therefore, (5.180) only corresponds to nitely many index schemata. The converse does not hold: for the trees which are generated by an LIG need not be 3branching. However, the two grammar types are weakly equivalent. Notes on this section. There is a descriptive characterization of indexed languages akin to the results of Chapter 6 which is presented in (Langholm, 2001). The idea there is to replace the index by a socalled contingency function, which is a function on the nodes of the constituent structure that codes the adjunction history. Exercise 199. Show the following claim. For every index grammar G there L H . If G is is an index grammar H in 2standard form such that L G linear (context free) H can be chosen linear (context free) as well. Exercise 200. Prove the Lemma 5.61. Exercise 201. Write an index grammar that generates the sentences of predicate logic. (See Section 2.7 for a denition.) Exercise 202. Let NB be the set of formulae of predicate logic with and in which every quantier binds at least one occurrence of a variable. Show that there is no index grammar that generates NB. Hint. It is useful to concentrate on formulae of the form QM, where Q is a sequence of quantiers and M a formula without quantiers (but containing any number of conjuncts). Show that in order to generate these formulae from NB, a branching rule is needed. Essentially, looking top down, the index stack has to memorize which variables have been abstracted over, and the moment that there is a branching rule, the stack is passed on to both daughters. However, it is not required that the left and right branch contain the same variables.

V U

V U

1 "V U

(5.180)

In a similar way we obtain from the operations

(5.179)

LRn 3

i1

1 dV U

1 V U

1 $k

rules of the form

434 7.

PTIME Languages

Compositionality and Constituent Structure

In this section we return to the discussion of compositionality, which we started in Chapter 3. Our concern is how constituent structure and compositionality constrain each other. In good cases this shows that the semantics of a language does not allow a certain syntactic analysis. This will allow to give substance to the distinction between weak and strong generative capacity of grammar types. Recall once again Leibniz Principle. It is dened on the basis of constituent substitution and truth equivalence. However, constituent substitution is highly problematic in itself, for it hardly ever is string substitution. If we do string substitution of by in (5.181), we get (5.182) and not the correct (5.183).
G

(5.181) (5.182) (5.183)

Notice that substitution in calculus and predicate logic also is not string substitution but something more complex. (This is true even when variables are considered simple entities.) What makes matters even more difcult in natural languages is the fact that there seems to be no uniform algorithm to perform such substitution. Another, related problem is that of determining occur as a subexpression in the word occurrences. For example, does , as a subexpression of ? What about in ? Worse still, does occur as a subexpression of , as a subexpression of ? Obviously, no one would say that occurs in , and that by Leibniz Principle its meaning is distinct from since there is no word . Such an argument is absurd. Likewise, in the formula , the variable does not occur, even though the string is a substring of the formula as a string. To be able to make progress on these questions we have to resort to the distinction between language and grammar. As the reader will see in the exercises, there is a tight connection between the choice of constituents and the meanings these constituents can have. If we x the possible constituents and their meanings this eliminates some but not all choices. However it does settle the question of identity in meaning and can then lead to a detection of subconstituents. For if two given expressions have the same meaning we can
d

D d

p d 4 p H 5 5 9 H @8h R5

y P 7 H RG RH 9 G yPIG IH 7 H 9 P 7 H 9 RG RH

h h h 9 8P A

5 h 4 8$#H

g g
D q

t r @q

G g D 4 5 h 4 4 b#h A D 9 f $} g g D 458h g $D R A 9 f } g D 5 h 4 A b1H X D 9 f $} A

%q s

4 5 8

A H

4 # d d

h h h @P 9 p p H 4 1A H p d 4 H 4 p hG 5 p d 1A h  7 h d d p 4 D 5 5 8h bE#89H 5 H P P D 8 5 h 4


8

Compositionality and Constituent Structure

435

conclude that they can be substituted for each other without change in the truth value in any given sentence on condition that they also have the same category. (Just an aside: sometimes substitution can be blocked by the exponents so that substitution is impossible even when the meanings and the category are the same. These facts are however generally ignored. See below for further remarks.) So, Leibniz Principle does tell us something about which substitutions are legitimate, and which occurrences of substrings are does not exist we can actually occurrences of a given expression. If safely conclude that has no proper occurrence in if substitution is simply string substitution. Moreover, Leibniz Principle also says that if two expressions are intersubstitutable everywhere without changing the truth value, then they have the same meaning. Denition 5.63 Let be a structure term with a single free variable, x, and a structure term unfolding to . If x unfolds to we say that the sign occurs in under the analyses and . Suppose now that is a structure term unfolding to , and that x is denite and unfolds to . Then we say that results from by replacing the occurrence of by under the analyses , and . This denition is complicated since a given sign may have different structure terms, and before we can dene the substitution operation on a sign we must x a structure term for it. This is particularly apparent when we want to dene simultaneous substitution. Now, in ordinary parlance one does not usually mention the structure term. And substitution is typically dened not on signs but on exponents (which are called expressions). This, however, is dangerous and the reason for much confusion. For example, we have proved in Section 3.1 that almost every recursively enumerable sign system has a compositional grammar. The proof used rather tricky functions on the exponents. Consequently, there is no guarantee that if y is the exponent of and x the exponent of there is anything in x that resembles y. Contrast this with CFGs, where a subexpression is actually also a substring. To see the dangers of this we discuss the theory of compositionality of (Hodges, 2001). Hodges discusses in passim the following principle, which he attributes to Tarski (from (Tarski, 1983)). The original formulation in (2001) was awed. The correct version according to Hodges (p.c.) is this.
A A X8 A 3
Tarskis Principle. If there is a meaningful structure term x to and x also is a meaningful structure term with x then .
d

unequal x

H45A h h h 9 8P A i

k T 

H 4

k $

Q A XQ

436

PTIME Languages

Notice that the typed calculus satises this condition. Hodges dismisses Tarskis Principle on the following grounds. (5.185) (5.186) (5.187)
s

Substituting by in (5.184) yields a meaningful sentence, but it does not with (5.186). (As an aside: we consider the appearance of the upper case letter as well as the period as the result of adding a sign that turns the proposition into an assertion. Hence the substitution is performed on the string beginning with a lower case .) Thus, substitutability in one sentence does not imply substitutability in another, so the argument goes. The problem with this argument is that it assumes that for . Moreover, it we can substitute assumes that this is the effect of replacing a structure term by in some structure term for (5.184). Thirdly, it assumes that if we perform the same substitution in a structure term for (5.186) we get (5.187). Unfortunately, none of these assumptions is justied. (The pathological examples of Section 3.1 should sufce to destroy this illusion.) What we need is a strengthening of the conditions concerning admissible operations on exponents. In the example sentence, the substituted strings are actually nonconstituents, so even under standard assumptions they do not constitute counterexamples. We by . This is can try a different substitution, for example replacing a constituent substitution under the ordinary analysis. But this does not save the argument, for we do not know which grammar underlies the examples. It is not clear that Tarskis Principle is a good principle. But the argument against it is fallacious. Obviously, what is needed is a restriction on the syntactic operations. In this book, we basically present two approaches. One is based on polynomials (noncombinatorial LMGs), the other on terms for strings. In both cases the idea is that the functions should not destroy any material (ideally, each rule should add something). In this way the notion of composition does justice to the original meaning of the word. (Compositionality derives from Latin the putting together.) Thus, every derived string is the result of applying some polynomial applied to certain vectors, and this polynomial determines the structure as well as indirectly the meaning and the cate-

4 #

H h f h

` h 5 $7 g h

4 1

A H $h

h 4 $#H

4 5

A H $h

h h h h

G G G s

(5.184)
s

` G h7 g h 5$h h 4 5 4 A H h 4 H 4 A H #1h ` y h 5 $7 g h 5h 4 A H y h 4 H 4 A H 091$h G ` y'4#H$h f h 4 h 5 $7 g h 1$h 4 A H G H h f h 91$h 4 h 4 H 4 A H
y 4 '#

d h d p Rb1

d ju

Compositionality and Constituent Structure

437

gory. Both approaches have a few abstract features in common. For example, that the application of a rule is progressive with respect to some progress measurse. (For LMGs this measure is the combined length of the parts of the string vector.)

f is strictly progressive if
i n

A sign grammar is (strictly) progressive with respect to if for all modes f , f is (strictly) progressive. For example, a CFG is progressive with respect to length if it has no unary rules, and no empty productions (and then it is also strictly progressive). Let be a progress measure, a progressive grammar generating . Then a given exponent e can be derived with a term that has depth at most e . This means that its length is e , where : max f : f F . The number of such terms is F , so it is doubly exponential in e ! If is strictly progressive, the length of the structure term is e , so we have at most F e many. Now, nally, suppose that the unfolding of a term is at most exponential in its length, then we can compute for strictly progressive grammars in time O 2c e whether e is in 0 . Theorem 5.65 Suppose that is strictly progressive with respect to the progress measure . Assume that computing the unfolding of a term can be done in time exponential in the length. Then for every e, e 0 can be solved in time O c e for some c 0. If is only progressive, e 0 can be e for some c 0. solved in time O cc Notice that this one half of the nite reversibility for grammars (see Denition 4.85). The other half requires a similar notion of progress in the semantics. This would correspond to the idea that the more complex a sentence is the more complex its meaning. (Unlike in classical logic, where for example is simpler than .)

V U

V U

V U

V U

V U

V i aV U U

(5.189)

f e

ei

V U

V i aV U U U U

(5.188)

f e

max ei : i

Denition 5.64 A progress measure is a function : E f : E n E is progressive with respect to if

. A function

 @Rd'

438

PTIME Languages

We still have not dened the notion of intersubstitutability that enters Leibniz Principle. We shall give a denition based on a grammar. Recall that for Leibniz Principle we need a category of sentences, and a notion of truth. So, let be a sign grammar and a category. Put

This denes the set of sentential terms (see also Section 6.1). Let be a structure term with a single occurrence of a free variable, say x. Then given , x if denite is the result of putting in place of x. Thus Pol1 . We dene the context set of e as follows.

We shall spell this out for a CFG. In a CFG, Cont Pol1 A . Moreover, if x occurs only once, the polynomials we get are quite simple: they are of the form p x u x v for certain strings u and v. Putting C : u v , p x C x . Thus, the context set for x dened in (5.191) is the set of all C such that C x is a sentence, and C is a constituent occurrence of x in it. Thus, (5.191) denes the substitution classes of the exponents. We shall also dene what it means to be syntactically indistinguishable in a sign system.

Denition 5.66 e and e are syntactically indistinguishable we write e e iff

This criterion denes which syntactic objects should belong to the same substitution category. Obviously, we can also use Husserls criterion here. However, there is an intuition that certain sentences are semantically but not syntactically well formed. Although the distinction between syntactic and semantic wellformedness breaks the close connection between syntactic categories and context sets, it seems intuitively justied. Structural linguistics, following Zellig Harris and others, typically denes categories in this way, using context sets. We shall only assume here that categories may not distinguish syntactic objects ner than the context sets.

1 Y0 k ) lk ( )

1 k

for all c C and all m . such that e c m

M: if e c m

then there is an m

1 k

1 C0 ) ) (

for all c C and all m M: if e c m that e c m and

then there is an m

M such M

V U i

0 !) '( i i

VaV YU U 1 Q

`S

1 d0 ) ) (

V U

(5.191)

Cont

e :

: for some such that

e: x Sent

V C0U

S

1 0 k ) k ( ) 1

i yW W i

(5.190)

Sent

V U

V U i V U i

Compositionality and Constituent Structure

439

Denition 5.67 Let be a system of signs. A sign grammar that generates is natural with respect to if Cont e Cont e implies e e . A context free sign grammar is natural iff the underlying CFG is reduced. Here is an example. Let

0 0 3

(5.194)

Further, is a two place function dened only on with result , a , two place function dened only on 0 3 with value 5. Similarly, and is undened elsewhere, and 0 3 7, and is undened elsewhere. Then the only denite structure terms are , , , , and . Together they unfold to exactly . This grammar is, however, not natural. We have

However, and do not have the same category. Now look at the grammar based on the following rules:

Here we compute that

Notice that in this grammar, has no constituent occurrence in a sentence. Only has. So, is treated as an idiom.

V U

(5.197)

Cont

Cont

T0 ( b) @S w  U V D D

7 g

(5.196)

T0 ( b) @S

V U

V @ U

(5.195)

Cont

Cont

V ) U

0 ) (

V ) U 0 ) (

This corresponds to choosing ve modes, binary.

all unary, and

0 !( ) ) p D 0 ) F( ) T D 0 ) ) ( S H D

(5.193)
7

Let

be the sign grammar based on the following rules.

T ) ) () ) ) () ) ) () 0 R 0 Rg 0 (R0 ) ) 0 @S () ) ) (

(5.192)

both

V k U

V U

440

PTIME Languages

Denition 5.68 A vectorial system of signs is strictly compositional if there is a natural sign grammar for in which for every f F the function f is a vector term which is stricly progressive with respect to the combined lengths of the strings. The denition of compositionality is approximately the one that is used in the literature (modulo adaptation to systems of signs) while the notion of strict compositionality is the one which we think is the genuine notion reecting the intuitions concerning compositionality. A particularly wellknown case of noncompositionality in the strict sense is the analysis of quantication by Montague.
G

In the traditional, preFregean understanding the subject of this sentence is and the remainder of the sentence is the predicate; further, the predicate is predicated of the subject. Hence, it is said of nobody that he has seen Paul. Now, who is this nobody? Russell, following Freges analysis has claimed that the syntactic structure is deceptive: the subject of this sentence is contrary to all expectations not the argument of its predicate. Many, including Montague, have endorsed that view. For them, the subject denotes a socalled generalized quantier. Type theoretically the generalized quantie t . This is a set of properties, in this er of a subject has the type e case the set of properties that are disjoint to the set of all humans. Now, denotes a property, and (5.199) is true iff this property is in the set denoted by , that is to say, if it is disjoint with the set of all humans. The development initiated by Montague has given rise to a rich literature. Generalised quantiers have been a big issue for semantics for quite some time (see (Keenan and Westerst hl, 1997)). Similarly for the treatment of ina tensionality that he proposed. Montague systematically assigned intensional types as meanings, which allowed to treat world or situation dependencies. The general ideas were laid out in the semiotic program and left room for numerous alternatives. This is what we shall discuss here. However, rst we scrutinize Montagues analysis of quantiers. The problem that he chose to deal with was the ambiguity of sentences that were unambiguous with respect

A H

9 IH f 7

g 9

(5.199)

9 f IH g g g

(5.198)

yPIG @H 7 H 9 h h A A P 7 H 9 h h A A IG @H y

P 7 H 9 h h IG @A

g g 9

Compositionality and Constituent Structure

441

to the type assignment. Instructive examples are the following. (5.201)


e

(5.200)

Both sentences are ambiguous. (5.200) may say that there is a man such that he loves all women. Or it may say that for every woman there is a man who loves her. In the rst reading the universal quantier is in the scope of the existential, in the second reading the existential quantier is in the scope of the universal quantier. Likewise, (5.201) may mean two things. That there is a real unicorn and Jan is looking for it, or Jan is looking for something that is in his opinion a unicorn. Here we are dealing with scope relations between an existential quantier and a modal operator. We shall concentrate on example (5.200). The problem with this sentence is that the universal quantier may not take scope over . This is so since the latter does not form a constituent. (Of course, we may allow it to be a constituent, but then this option creates problems of its own. In particular, this does not t in with the tight connections between the categories and the typing regime.) So, if we insist on our analysis there is only one reading: the universal quantier is in the scope of the existential. Montague solved the problem by making natural language look more like predicate logic. He assumed an innite set of pronouns called n . These pronouns exist in inected forms and in other genders as well, so we also have n, n and so on. (We shall ignore gender and case inection at this point.) The other reading is created as follows. We feed to the verb the pronoun and get the constituent . This is an intransitive verb. Then we feed another pronoun, say and get . Next we combine this with and then with . These operations substitute genuine phrases for these pronouns in the following way. Assume that we have two signs:

Further, let Qn be the following function on signs:

Here subn x y is dened as follows.


n

by

V !) U i i

h i

For some k: x currences of


G

k.

Then subn x y is the result of replacing all ock.

0 aV k

) U ) V !) U ) i i

Vk ) U

(5.203)

Qn :

subn x y t Q m xn m

0 k ) ) ( i

) 0 ) u ) ( i

(5.202)

xe t m

yt m

f 9 Ig IH H G r h ` A h g P

5 bh h

9 IH f

` f A h g RH f h Ig A P 9

p g P A IH yb5 g DE9@7H5 g X !nd @g c$D 9 R 9 D 9 ` ` f 5 h A bh h g IH f h g P 9


y

9 f IH Ig
`

5 8h h

5 bh

V ) U i i

I h @Prf

9 f RH g

442

PTIME Languages

At last we have to give the signs for the pronouns. These are

Depending on case, x is a variable of type e t (for nominative pronouns) or of type e e t (for accusative pronouns). Starting with the sign

we get the sign (5.206)

Now we need the following additional signs. (5.207) (5.208) (5.209) (5.210)

e t x

If we feed the existential quantier rst we get the reading , and if we feed the universal quantier rst we get the reading . The structure terms are as follows. (5.211) (5.212)

We have not looked at the morphological realization of the phrase x. Number and gender must be inserted with the substitution. So, the case is determined by the local context, the other features are not. We shall not go into this here. (Montague had nothing to say about morphology as English has very little. We can only speculate what would have been the case if Montague had

~

~ 8

S S G S G 5S Q 555S } HS g g

I t

$G

u ) ( Q D Q ( e 9 RH f D

x y sub n xy t t x y x n x xn y
e t x x x

n
}

x y sub n xy t t x y x n x xn y

0 aV U k @1 Q ) o q d 0 aV U k @q 4 ) u o 1 0 aV V U  U ) V u U V V ) f U U) h g A 0 aV V U  ~ U ) V u U V V ) U) ` U 8h h 5

0V aaV

I h U k Id a) ) rP$(f r( h C

x1 x0

e t

e t

0 aV

U k I d C

(5.205)

e t e x 0 x1

0 aV

U d

U ) V u ) ` g ( I Ah P D V U

G v

(5.204)

x x xn

V !) U i i

For all k: x occurrence of of n .

hG i D h

k . Then subn x y n by x and deleting

is the result of replacing the rst the index n on all other occurrences

D D D

x1 x0

Compositionality and Constituent Structure

443

spoken, say, an inecting language.) Notice that the present analysis makes quantiers into sentence adjuncts. Recall from Section 2.7 that the grammar of sentences is very complex. Hence, since Montague dened the meaning of sentences to be closed formulae, it is almost unavoidable that something had to be sacriced. In fact, the given analysis violates several of our basic principles. First, there are innitely many lexical elements. Second, the syntactic structure is not respected by the translation algorithm, and this yields the wrong results. Rather than taking an example from a different language, we shall exemplify the probas the surface realization lems with the genitive pronouns. We consider of , where is the socalled Anglo Saxon genitive. Look for example at (5.213), which resulted from (5.214). Application of the above rules gives (5.215), however. (5.214) (5.215)

This happens not because the possessive pronouns are also part of the rules: of course, they have to be part of the semantic algorithm. It is because the wrong occurrence of the pronoun is being replaced by the quantier phrase. This is due to the fact that the algorithm is ignorant about the syntactic structure (which the string reveals only partly) and second because the algorithm is order sensitive at places where it should better not be. See Section 6.5 on GB, a theory that has concerned itself extensively with the question which NP may be a pronoun, or a reexive pronoun or empty. Fiengo and May (1994) speak quite plastically of vehicle change, to name the phenomenon that a variable appears sometimes as a pronoun, sometimes as pro (an empty pronoun, see Section 6.5), sometimes as a lexical NP and so on. The synchronicity between surface structure and derivational history which has been required in the subsequent categorial grammar, is not found with Montague. He uses instead a distinction proposed by Church between tectogrammar (the inner structure, as von Humboldt would have called it) and phenogrammar (the outer structure, which is simply what we see). Montague admits quite powerful phenogrammatical operations, and it seems as if only the label distinguishes him from GB theory. For in principle his maps could be interpreted as transformations.

G G ` G G 58h$4#h d g h 4 A g P H $h B IH f 8h 9 g 9H H 1 A 9 5 h 4 4 D V $ @$ 5f I h h P(PI `7h m $) d i d ` U 8h h 5

sub

(5.213)

5 h 4 4 87h

A g P 9 Fd @g IH f

A $D

5 8h h

H h

A D

9RH f 9 g

4 9

A yB H

4 D 1

A B f

444

PTIME Languages

We shall briey discuss the problems of blocking and other apparent failures of compositionality. In principle, we have allowed the exponent functions to be partial. They can refuse to operate on certain items. This may be used in the analysis of defective words, for example . This word exists only in the singular (though it arguably also has a plural meaning). There is no form . In morphology, one says that each word has a root; in this case the root may simply be . The singular is formed by adding , the plural by adding . The word does not let the plural be formed. It is defective. If that is so, we are in trouble with Leibniz Principle. Suppose we have a word X that is synonymous with but exists in the singular and the plural (or only in the plural like ). Then, by Leibniz Principle, the two roots can never have the same meaning, since it is not possible to exchange them for each other in all contexts (the context where X appears in the plural is a case in point). To avoid this, we must actually assume . The classical grammar calls it a singuthat there is no root form of lare tantum, a singular only. This is actually more appropriate. If namely this word has no root and exists only as a singular form, one simply cannot exchange the root by another. We remark here that English has pluralia tanta . In Latin, darkness, (plural only nouns), for example cease re are examples. Additionally, there are words which are only formwise derived from the singular counterpart (or the root, for that matter). One such example is in its meaning troops, in Latin assets, whose singular means luck. Again, if both forms are assumed to be derived from the root, we have problems with the meaning of the plural. Hence, some of these forms (typically but not always the plural form) will have to be part of the lexicon (that is, it constitutes a 0ary mode). Once we have restricted the admissible functions on exponents, we can show that weak and strong generative capacity do not necessarily coincide. Recall the facts from Exercise 187, taken from (Radzinski, 1990). In Mandarin yesnoquestions are formed by iterating the statement with the negation word in between. Although it is conceivable that Mandarin is context free as a string language, Radzinski argues that it is not strongly context free. Now, suppose we understand by strongly context free that there is a context free sign grammar. Then we shall show that under mild conditions Mandarin is not strongly context free. To simplify the dicussion, we shall dene a somewhat articial counterpart of Mandarin. Start with a context free language G G. Furand a meaning function dened on G. Then put M : G G

h H 9 7 4 8b5 g

"

h H h 9 h 85 R4

p h R H 5 $#7 g s

A 4 7 $p R h R H 5 $#$7 g D

p h$#7 g p R H 5 h R H 5 $#$7 g

A g 5 F8 g 4

p h R H 5 $#$7 g

H 9 7p 4 875 g A h 5 g X

p A h R H 5 $7 g
X

h H D 4 9 88q$7 D

Compositionality and Constituent Structure

445

ther, put

Here, ? forms questions. We only need to assume that it is injective on the set G and that ? G is disjoint from x y : x y L G . (This is the case in Mandarin.) Assume that there are two distinct expressions u and v of equal category in G such that u v . Then they can be substituted for each other. Now suppose that G has a sublanguage of the form r z i s : i such that r z i s r z j s for all i j. We claim that M together with is not context free. Suppose otherwise. Then we have a context free grammar H together with a meaning function that generates it. By the Pumping Lemma, there is a k such that z k can be adjoined into some r z i s any number of times. (This is left as an exercise.) Now look at the expressions

Adjunction is the result of substitution. However, the meaning of these expressions is if i j and a yesno question if i j. Now put j i k. If we adjoin z k on the left side, we get a yesno question, if we substitute it to the right, we do not change the meaning, so we do not get a yesno question. It follows that one and the same syntactic substitution operation denes two different semantic functions, depending on where it is performed. Contradiction. Hence this language is not strongly context free. It is likely that Mandarin satises the additional assumptions. For example, colour words are extensional. So, means the same as , , and so on. Next we look at Bahasa Indonesia. Recall that it forms the plural by reduplication. If the lexicon is nite, we can still generate the set of plural expressions. However, we must assume a distinct syntactic category for each noun. This is clearly unsatisfactory. For every time the lexicon grows by another noun, we must add a few rules to the grammar (see (Manaster-Ramer, 1986)). However, let us grant this point. Suppose, we have two nouns, m and n, which have identical meaning. If there is no syntactic or mophological blocking, by Leibniz Principle any constituent occurrence of the rst can be substituted by the second and vice versa. Therefore, if m has two constituent occurrences in m m, we must have a word m n and a word n m, and both mean the same as the rst. This is precisely what is not the case. Hence, no

h 7 RP

4  D

5 A h 7 I RP

i Cc i

h 7 IP

i @c i

i i W i W "

4  D

5 A h 7 R RP

C i W i W i

(5.217)

r zi s

r zj s

i i jS i

T V U

1 !) i V U i i

i i i

i 7V U

V U i

V 6U i

V U i

V i i U i

V y i i D

i Cc i

4 7 D

5 A h 7 R IP

V i i U i

i C U
H
G

(5.216)

y :

V U i

i 5V U

x x?

if x if x

D i i D

y, y.

h 7 IP

446

PTIME Languages

such pair of words can exist if Bahasa Indonesia is strongly context free. This argument relies on a stronger version of Leibniz Principle: that semantic identity enforces substitutability tout court. Notice that our previous discussion of context sets does not help here. The noun m has a different context set as the noun n, since it occurs in a plural noun m m, where n does not occur. However, notice that the context set of m contains occurrences of m itself. If that circularity is removed, m and n become indistinguishable. These example might sufce to demonstrate that the relationship between syntactic structure and semantics is loose but not entirely free. One should be extremely careful, though, of hidden assumptions. Many arguments in the literature showing that this or that language is not strongly context free rest on particular assumptions that are not made explicit. Notes on this section. The idea that syntactic operations should more or less be restricted to concatenation give or take some minor manipulations is advocated for in (Hausser, 1984), who calls this surface compositionality. Hausser also noted that Montague did not actually dene a surface compositional grammar. Most present day categorial grammars are, however, surface compositional.

(5.218) (5.219)
}

Here, suc x denotes the successor of x in the decimal notation, for example, suc . Let a string be given. What does a derivation of that string look like? When does a sign occur in another sign ? Describe the exponent of , for given structure terms , , . Dene a progress measure for which this grammar is progressive. Exercise 204. Let A : . We shall present two ways for generating ordinary arithmetical terms. Recall that there is a convention to drop brackets in the following circumstances. (a) When the same operation symbol is used in succession ( in place of ), (b) when the enclosed term is multiplicative ( in place of ). Moreover, (c) the outermost brackets are dropped. Write a sign grammar that generates triples x T n , where x is a term and n its value, where the conventions (a), T 16 as well (b) and (c) are optionally used. (So you should generate as T 16 ). Now apply Leibniz Principle to the pairs and

u  " #t 0 ) 37( )  u  "

   5$  u  " @Q#t

T ) ) ) ) w)) @ijGaaaS

suc x

T )) r) q aaabS

Exercise 203. Suppose that A

, with the following modes.

i c i i

g ))V U ( i V ) i( a0 ) aU D ) ) q 0 ( D

  !

 u  37#"

) @Q#t( )  u  "

0 ) ) ( i

q@ 1aU V r V U D i

I k

de Saussure Grammars

447

Figure 16. Latin Verbs of Saying

Exercise 205. (Continuing the previous exercise.) Write a grammar that treats every accidental occurrence of a term as a constituent occurrence in some difin is in the grammar ferent parse. For example, the occurrence of of the previous exercise a nonconstituent occurrence, now however it shall be a constituent occurrence under some parse. Apply Leibniz Principle. Show is not identical to , and is not identical to and so that on. Which additive terms without brackets are identical in meaning, by Leibniz Principle? and (I say) are highly defective. Exercise 206. The Latin verbs They exist only in the present. Apart from one or two more forms (which we shall ignore for simplicity), Figure 16 gives a synopsis of what forms exist of these verbs and contrast them with the forms of . The morphology of is irregular in that form (we expect ); also the syntax of is somewhat peculiar (it is used parenthetically). Discuss whether and can be identical in meaning by Leibniz Principle or not. Further, the verb is formwise in the perfect, but it means I remember; similarly I hate. 8. de Saussure Grammars

In his famous Cours de Linguistique G n rale, de Saussure speaks about line e guistic signs and the nature of language as a system of signs. In his view, a sign is constituted by two elements: its signier and its signied. In our terms, these are the exponent and the meaning, respectively. Moreover, de Saussure says that signiers are linear, without further specifying what he means by that. To a modern linguist all this seems obviously false: there are

u v w v 8$b

w x v br@@

g bPD 7 9 g p D

x v @

f PD H 7 9

w v u v 8b

g H D

u v $@w

 u  37"

D E D

g D 9 f h f p

4 1D

t x v ws v @@d8u

and

. What problems arise? Can you suggest a solution?

p 4 9 $@7 D p A 4 Dn1D p D AF7 f D p D 4 1D D p A D D g p D

4 $

9 7D 7 9 EPD

4 D 7 9 1EPD AEPD D 7 9 f PD H 7 9 4 

9 7 D H

A$H D g H D
4 D F 4 D 7 9 1EbD 4 D 7 9 1EbD f bPD H 7 9

I say you(sg) say he says we say you(pl) say they say

H w v $8u
u  "

448

PTIME Languages

categories, and linguistic objects are structured, they are not linear. Notably Chomsky has repeatedly offered arguments to support this view. He believed that structuralism was fundamentally mistaken. In this section we shall show that the rejection of de Saussures ideas is illfounded. To make the point, we shall look at a few recalcitrant syntactic phenomena and show how they can be dealt with using totally string based notions. Let us return to the idea mentioned earlier, that of terms on strings. We call a string term a term over the algebra of strings (consisting of constants for every a A, , and ). We assume here that strings are typed, and that we have strings of different type. Assume for the moment that there is only one type, that of a string, denoted by s. Then x y y x is the function of reverse concatenation, and it is of type s s s . Now we wish to implement restrictions on these terms that make sure we do not lose any material. Call a term relevant if for all subterms x N, x occurs at least once free in N. x y y x is relevant, x y x is not. Clearly, relevance is a necessary restriction. However, it is not sufcient. Let and be variables of type s s, x a variable of type x. Then function composition, x x , is a relevant term. But this is problematic. Applying this term leaves no visible trace on the string, it just changes the analysis. Thus, we shall also exclude combinators. This means, an admissible term is a relevant term that contains or an occurrence of a constant at least once. Denition 5.69 A string term is weakly progressive if it is relevant and not a combinator. is progressive if it is weakly progressive and does not contain . em, Denition 5.70 A de Saussure sign or simply dSsign is a pair where e is a progressive string term and m a term over meanings. The type of is the pair , where is the type of e and the type of m. If e m is another de Saussure sign then is dened iff ee is dened and mm is dened, and then

In this situation we call the functor sign and the argument sign. A de Saussure grammar is a nite set of dSsigns. So, the typing regime of the strings and the typing regime of the meanings do all the work here.

0k

)k

Vk U

(5.220)

ee mm

0 ) (

V U U aV )2

Vk U

0 ) (

k 0k )k (

de Saussure Grammars

449

Proposition 5.71 Let and be dSsigns of type and , respectively. Then is dened iff there are , such that , , and then has type . The rest is actually the same as in ABgrammars. Before we shall prove any results, we shall comment on the denition itself. In Montague Grammar and much of Categorial Grammar there is a conation of information that belongs to the realm of meaning and information that belongs to the realm of exponents. The category , for example, tells us that the meaning must be a function of type , and that the exponent giving us the argument must be found to the right. , is different only in that the exponent is to be found to the left. While this seems to be reasonable at rst sight, it is already apparent that the syntactic categories simply elaborate the semantic types. (This is why is a homomorphism.) The information concerning the semantic types is however not necessary, since the merger would fail anyhow if we did not supply signs with the correct types. So, we could leave it to syntax to specify only the directionality. However, syntax is not well equipped for that. There are discontinuous constituents and this is not easily accommodated in categorial grammar. Much of the research can be seen as an attempt to upgrade the string handling potential in this direction. Notice further that the original categorial apparatus created distinctions that are nowhere attested. For example, adjectives in English are of category n n. In order to modify a relational noun, however, they must be lifted to the category of a relational noun. The lifting will have to specify whether the noun is looking for its complement on its right or on its left. Generally, however, modiers and functors do not care very much about the makeup of their arguments. However, in and , categories must be explicit about these details. De Saussure grammars do away with some of the problems that beset CGs. They do not require to iterate the semantic types in the category, and the string handling has more power than in standard categorial grammar. We shall discuss a few applications of de Saussure grammars. These will illustrate both the strength as well as certain deciencies. A striking fact about de Saussure grammars is that they allow for word order variation in the most direct way. Let us take a transitive verb, , with meaning x y y x . Its rst argument is the direct object and the second its subject. We assume no case marking, so that the following nouns will be either subject or object.

0 k Fq B) |t 1

e (

0 k 8$6m ) o d

JOHN

MARY

(5.221)

k 0 k k D ( ) h h A

0 ) ( 5 bH

0 ) (

V V U k jinh U

u V U

Vk U

G ( e 9 g D

FV U

Vk U

k jih

450

PTIME Languages

Now we can give to the verb one of the following six signs. of which each corresponds to a different word order pattern. Recall that x y x y. (5.223) (5.224) (5.225) (5.226) (5.227)
SEES 1 SEES 2 SEES 3 SEES 4 SEES 5

The structure term for a basic sentence expressing that John sees Mary is in all cases the same. (Structure terms will be written using brackets, to avoid confusion. The convention is that bracketing is leftassociative.) It is SEES i MARY JOHN , i 6. Only that the order of the words is different in each case. For example,

(5.229)

SEES 4 MARY

x y x y

Notice that this construction can be applied to heads in general, and to heads with any number of arguments. Thus, de Saussure grammars are more at ease with word order variation than categorial grammars. Moreover, in the case of OSV word order we see that the dependencies are actually crossing, since the verb does not form a constituent together with its subject. We have seen in Section 5.3 how interpreted LMGs can be transformed into ABgrammars using vector polynomials. Evidently, if we avail ourselves of vector polynomials (for example by introducing pair formation and projections and redening the notion of progressivity accordingly) this result can be reproduced here for de Saussure grammars. Thus, de Saussure grammars suitably generalized are as strong as interpreted LMGs. However, we shall actually not follow this path. We shall not use pair formation; instead, we shall

9 g @A bH A h h 5 e AhA bH h 5 A hhA U V U e G g 5H 9 eC bH 5

x y y x y y

JOHN

y y

0 o d U |t 1 h aV k @$!nmV k Fq )U k ji) V o d a0 k 8$6m ) G aaV k 4Fq )U k ji) e ( U 0 |t 1 h 9 g V o d a0 k 8$6m ) G aa0 k Fq B) e ( U V |t 1 e(U h a0 k ji) bH 9 g 5 V 0 o d U |t 1 h aV k @$!nmV k Fq )U k ji) V o d a0 k 8$6m ) G aaV k 4Fq )U k ji) e ( U 0 |t 1 h 9 g V a0 k o8$d6m ) G aa0 k Fq B) e ( U V |t 1 e(U h a0 k ji) bH 9 g 5 V

A@A h h AA h h A h h RA U V

(5.228)

SEES 0 MARY

( V D D D D

( D

(5.222)

SEES 0

x y y x x y y x y x y x y x y x x y

SOV SVO VSO OSV OVS VOS

y x y

x y

JOHN

0 k ji) h A h h A 0 k ji) h A h h RA 0 k ji) h A h h RA 0 k ji) h A h h A 0 k ji) h A h h RA 0 k ji) h A h h RA

e (

U V ( ( ( ( (

D D D D D D

de Saussure Grammars Table 16. Plural in English

451

stay with the more basic apparatus. The examples that we shall provide below will give evidence that this much power is actually sufcient for natural languages, though some modications will have to be made. Next we shall look at plural in Bahasa Indonesia (or Malay). The plural is man is formed by reduplicating the noun. For example, the plural of , the plural of child is . To model this, we assume one type of strings, n.

The term x x x is progressive. The plural operation can in principle be iterated; we shall see below how this can be handled. (We see no obvious semantical reason why it cannot, so it must be blocked morphologically.) Now let us turn to English. In English, the plural is formed by adding an . However, some morphophonological processes apply, and some nouns form their plural irregularly. Table 16 gives an (incomplete) list of plural formation. Above the line we nd regular plurals, below irregular plurals. As we have outlined in Section 1.3, these differences are explained by postulating different plural morphs, one for each noun class. We can account for that by introducing noun class distinctions in the semantic types. For example, we may introduce a semantic type for nouns endings in a nonsibilant, another for nouns ending in a sibilant, and so on. However, apart from introducing the distinction where it obviously does not belong, this proposal has another drawback. Recall, namely, that linguists speak of a plural morpheme, which abstracts away from the particular realizations of plural formation. Mel uk c denes a morpheme as a set of signs that have identical category and identical meaning. So, for him the plural morpheme is simply the set of plural morphs. Now, suppose that we want the morpheme to be a (de Saussure) sign. Then

0 T U IV 2

S ) W

(5.230)

PLU

:
c

x x

x:

R $

9 H R5 g

d 9

H 9 RH

d 9

H 9 RH

9 h f G A $D X 9 h R$6 g G A A h 7 A h h 5 @4
d 9

Singular

Plural

H 9 RH

9 RH f G A $D X 6 g G A 7 h h 5 @4 W D W
R $

plain sufx e-insertion ensufx no change vowel change

9 H R5 g

R !

9 H R5 g

452

PTIME Languages

its meaning is that of any of its morphs, but the string function cannot be a term. For it may act differently on identical strings of different noun class. A good example is German . Like its English counterpart it can denote (i) a money institute, (ii) something to sit on, (iii) the bank of a river. However, in the rst case its plural is and in the other two it is . Now, since the function forming the plural cannot access the meaning we must distinguish two different string classes, one for nouns that form the plural by umlaut plus added , and the other for nouns that form the plural by adding . Further, we shall assume that German is in both, but with different meanings. Thus, we have two signs with exponent , one to mean money institute and the other to mean something to sit on or the bank of a river. This is the common practice. The classes are morphological, that is, they do not pertain to meaning, just to form. Thus we are led to the introduction of string types. We assume that types are ordered by some partial ordering , so that if and are string types and then any string of type is a string of type . Moreover, we put iff and . No other relations hold between nonbasic types. The basic type s is the largest basic type. Returning now to English, we shall split the type n into various subtypes. In particular, we need the types ni, nr, of irregular and regular nouns. We shall rst treat the regular nouns. The rule is that if a noun ends in a sibilant, the vowel is inserted, otherwise not. Since this is a completely regular phenomenon, we can only dene the string function if we have a predicate that is true of a string iff it ends in a sibilant. Further, we need to be able to dene a function by cases.

Thus, we must have a basic type of booleans plus some functions. We shall not spell out the details here. Notice that denitions by cases are not necessarily unique, so they have to be used with care. Notice a further problem. The minute that we admit different types we have to be specic about the type of the resulting string. This is not an innocent matter. The operation is dened on all strings. Suppose now that is a string of type nr, which type does have? Obviously, we do not want it to be just a string, and we may not want it to be of type nr again. (The difference between regular and irregular is needed only for plural formation.) Also, as we shall see below, there are operations that simply change the type of a string without changing the string itself. Hence we shall move from a system of implicit typing to one of explicit typing (see (Mitchell, 1990) for an overview). Rather than using

W ih R

o U 2 h W RyV 

h h 5 @4

r 8t p

(5.231)

h d 9 VH RlT h

d 

9 IH

2 h

d 

9 RH

A h

9 h d 9 RH

d 

9 RH

A h h 5 $4

9 h

de Saussure Grammars

453

variables for each type, we use a single set of variables. abstraction is now written x : M : in place of x M. Here x must be a variable of type , and the result will be of type . Thus, x : nr x : n denotes the function that turns and nrstring into an nstring by appending . The reader may recall from Section 4.1 the idea that strings can be taken to mean different things depending on what type they are paired with. Internally, a typed string term is represented by N , where N is the string term and its type. The operation M : does the following: it evaluates M on N, and gives it the type . Now, the function is also dened for all , so we nally have

Now we turn to the irregular plural. Here we face two choices. We may simply take all singular and plural nouns as being in the lexicon; or we devise rules for all occurring subcases. The rst is not a good idea since it does not allow and the plural morpheme actually occur in . The sign us to say that is namely an unanalyzable unit. So we discard the rst alternative and turn to the second. In order to implement the plural we again need a predicate of strings that tells us whether a string equals some given string. The minimum we have to do is to introduce an equality predicate on strings. This allows to dene the plural by cases. However, suppose we add a binary predicate x y which is true of x and y iff x is a sufx of y. Then the regular plural can be dened also as follows:

Moreover, equality is denable from . Evidently, since we allow a function to be dened by cases, the irregular plural forms can be incorporated here as well, as long as they are additive (as is but not ). For nonadditive plural formations see the remarks on umlaut in Section 6.3. Now take another case, causatives. Many English verbs have causative forms. Examples are , , . (5.235) (5.236)
s

(5.234)
s

y 0  5 R 4

g 9 g d p H RbD p 7 f 4 9 H D A G y h 08P h 8h H 4 4 5 9 @7 G p X h H A   h 4 X g 5 g 4

9 h f

h A 7 9

p 4!9E1P R D 8 8 H A H h G 9 h bDq5 X D 9R5 A d H g p h G 4 h G R$7RHh P G

9 h R$6 g

8 1

H P

r jh

d D  n

9 5

y0$#5h h R H 4 A 4 p 9 h 7 H RbD Rh 58hRH f h R H 9 9 h 7 H RbD Rh

R !

7 H IP

r @t p

(5.233)

: x : nr

: n;

9 h $6 g

W ih R o W Ry A A h V ) Urh t d aV G aHj8#V ) aCjl% U r hU A A

U V

(5.232)

x : M : N :

N xM:

Vk D

@k

0 ) (

if , otherwise.

V ) aHjh Ur

454

PTIME Languages

In all these cases the meaning of the causative is regularly formed so that we may actually assume that there is a sign that performs the change. But it leaves no visible trace. Thus, we must at least allow operators that perform type conversion even when they change nothing in the semantics. In the type system we have advocated above they can be succinctly represented by

Now, the conversion of a string of one type into another is often accompanied by morphological marking. For example, the gerund in English turns a verb ). It is formed regularly by sufxing . So, it has the into a noun ( following sign:

The semantics of these nominalizations is rather complex (see (Hamm and van Lambalgen, 2003)), so we have put the identity for simplicity here. Signs that consist of nothing more than a type conversion are called conversionemes in (Mel uk, 2000). Obviously, they are not progressive in the intuitive sense. c For we can in principle change a string from to and back; and we could do this as often as we like. However, there is little harm in admitting such signs. The additional complexity can be handled in much the same way as unproductive context free rules. Another challenge is Swiss German. Since we do not want to make use of products, it is not obvious how we can instrumentalize the terms to get the word order right. Here is how this can be done. We distinguish the main (inected) verb from its subordinate verbs, and raising from nonraising verbs. (See Table 17. We use i short for inected, r for raising, and t for transitive. We have suppressed the type information as it is of marginal relevance here.) Here, v, x, z are variables over NPcluster strings, w, y variables over verbcluster strings, and a variable for functions from NPcluster strings to functions from verbcluster strings to strings. NPs are by contrast very basic:

We ignore case for the moment. The lowest clause is translated as follows. z
D n 1 4

0 h r d o q p aV k i$I6U k 58Ia)

G 5

y z y

5 A H @ @H

A 7 F7

A h

(5.240)

AASTE HUUS

0 k i$Ia) hr d

0 k y)

(5.239)

MER

HUUS

A 7 @7

A h

( 5 8h f D

(5.238)

GER

R ! D

x : v x

: n x x

R ! D

R D R D $ !! 

9 9 A

(5.237)

x : x :

de Saussure Grammars Table 17. Swiss German Verbs

455

LAAT

The recursive part, raising verb plus object, is translated as follows:

(5.241)

If we combine the two we get something that is of the same kind as the lower innitive, showing that the recursion is adequately captured: (5.242) w
D n 5 4

Had we inserted a nite verb, the second hole would have been closed. There would have been just a place for the subject. Once that is inserted, there are no more holes left. The recursion is nished. Notice that the structure term has the form of the corresponding English structure. The terms simply transliterate it into Swiss German. Let us briey speak about case. We insert only the bare nouns and let the verb attach the appropriate case marker. For example, if DAT is the function that turns a DP into a dative marked DP, the sign H ALFE will be

Next, we shall deal with case agreement inside a noun phrase. In many languages, adjectives agree in case with the noun they modify. We take our

0 k p RIaV )

UV aV U

U 2

(5.243)

H ALFE

v w

DAT

G 5

v w v

5 A H H  @h X V H P

H H @P

VaaV k ihr$dI6U k Ho 8Ip6V k oRq 8RU k p R V q U t u G G 5p G f h ( D U

V aV

A 7 F7

A h

9 D

U V

H ALFE CHIND

AASTE HUUS

)V aV

v w

)V @

v w

v x w

9 D

GHp UV f h aV h X @V H P

h X @V H P

0aV k Rq 8RU k o t u U@ 5p V G f h U2 9 D 0 o u aV k Rtq 8RU k G UV 2 U U

H ALFE CHIND

( (

LAA

SCHW

SCHWE

AAST

ir+t +ir+t irt +irt i+r+t +i+r+t

AASTE

:= := := := := :=

x y z y x z x y y x y z y z x x x v w v x w x x

0 k aV ) U U V 2 H 0 k aV ) U V 2 P U @H H H @P 0 k ') 1 h G f f D Hp 0 k ) f 4 Hp A 1 h G h f D 0 k 58Ia) 5p A o q p G D 5 4 A H 0 k 58Ia) 5p 4 q5 H o q p G h q5H D 5 4 A H

p R U a( D p R a( U D

456

PTIME Languages

example from Finnish. The phrase a/the big train inects in the singular as follows. (We show only a fraction of the case system.) nominative genitive allative inessive

(5.244)

In the present case, it is the same sufx that is added to the adjective as well as the noun. Now, suppose we analyze the allative as a sufx that turns a caseless noun phrase into a case marked noun phrase. Then we want to avoid analyzing the allative as consisting of occurrences of the allative case. We want to say that it occurs once, but is spelled out twice. To achieve this, we introduce two types: , the type of case marked nouns, and , the type of case markers. Noun roots will be of type , adjectives of type .

Finally, assume the following sign for the allative.

Then the last two signs combine to

This has the advantage that the tectogrammatical structure of signs is much like their semantic structure, and that we can stack as many adjectives as we like: the case ending will automatically be distributed to all constituents. Notice that LMGs put a limit on the number of occurrences that can be controlled at the same time, and so they cannot provide the same analysis for agreeing adjectives. Thus, de Saussure grammars sometimes provide more adequate analyses than do LMGs. We remark here that the present analysis

0V qt 2 d C d 1 aaV k o blU k 8)U k 6tRB)

h P P H 9 @@@7 8P g $D h P A V aV U U

(5.249)

ALL ISO JUNA


f

0 k 6tRB) d C d 1

(5.248)

ALL

0 o qt 2 aV k 8tU k bB)

f 

h P @P

(5.247)

ISO JUNA

x :

H 9 7

W g AD

So, x has the type ,

the type

. These signs combine to


x:

0 k bB) 2

V 2 U

(5.246)

ISO

x :

W g A $D 0 k o bl) qt

(5.245)

JUNA

x :

x:

x :

H 9 7

Hb@A@7 f A A H 9 H A h @P@PH@7 8P 9 hf P 9 H 9 R@7 f 9 H 9 7 h P P H 9 @@@7


f

g D A

H 9 @7

h P A 8P g $D

g $D A g $D A g $D A g $D A V

D D

U V (

de Saussure Grammars

457

conforms to the idea proposed in (Harris, 1963), who considers agreement simply as a multiple manifestation of a single morpheme. Case assignment can also be handled in a rather direct way. Standardly, a verb that takes a case marked noun phrase is assumed to select the noun phrase as a noun phrase of that case. Instead, however, we may assume that the sign for a case marking verb actually carries the case marker and attaches it to the NP. The Finnish verb to resemble selects ablative case. Assume that it has an ablative marked argument that it takes directly to its right. Then its sign may be assumed to be like this (taking the 3rd person singular present form).

The reason is that if we simply insist that the noun phrase comes equipped with the correct case, then it enters with its ablative case meaning rather than with its typical NP meaning. Notice namely that the ablative has an unmotivated appearance here given the semantics of the ablative case in Finnish. (Moreover, it is the only case possible with this verb.) So, semantically the situation is the same as if the verb was transitive. Notice that the fact that selects an ablative NP is a coincidence in this setup. The ablative form is directly added to the complement selected. This is not the best way of arranging things, and in (Kracht, 2003) a proposal has been made to remedy the situation. There is a list of potential problems for de Saussure grammars. We mention a few of them. The plural in German is formed with some stems by umlauting them (see Section 1.3). This is (at least on the surface) an operation that is not additive. As mentioned earlier, we shall discuss this phenomenon in Section 6.3. Another problem is what is known as suppletion. We exemplify this phenomenon with the gradation of Latin adjectives. Recall )a that adjectives in many languages possess three forms: a positive ( comparative ( ) and a superlative ( ). This is so in Latin. Table 18 gives some examples. Adjectives above the line are regularly formed, the ones below are irregular. Interesting is the fact that it is not the comparative or superlative sufx that is irregular: it is the root form itself. The expected form is replaced by : the root changes from to . (The English adjective is also an example.) A different phe. The comparative is formed either nomenon is exhibited by English by adding ( , ) or by adding . The form resists we speak of a decomposition into a stem and a sufx. In the case of a portmanteau morph. Portmanteau morphs can be treated in de Saussure

9 g

1 8 8

h A 5 g

h A R5 g

0 k 8diwV 2 1 h t )

4 1

h g 5 f

A h D 8 8 E7FH

5 g $h f D P

h A 5 g g g

5 h 4 A 8$1H

(5.250)

TUNTUU

H 4 #P

7 7 4 9 7 @884

5 g E9 g D

5 h D 8 8 bbFH

5 h 4 4 b7h

5 bh

H 7 4 9 7 b$@84

H 7 4 9 7 8$@84 P $h f

458

PTIME Languages

Table 18. Gradation of Latin Adjectives

grammars only as lexical items (since we only allow additive phonological processes). Notes on this section. A proposal similar to de Saussure grammars one has been made by Philippe de Groote (2001). Exercise 207. Recall from Section 3.5 the notion of a combinatory extension of categorial grammar. We may attempt the same for de Saussure grammars. Dene a new mode of combination, , as follows.

Here, e is of type , e of type and x of type , so that the string term x e e x is of type . Likewise for the semantics. Show that this extension does not generate different signs, it just increases the set of structure terms. Contrast this with the analogous extension of ABgrammars. Look especially at mixed composition rules. Exercise 208. Review the facts from Exercise 186 on Arabic. Write a de Saussure grammar that correctly accounts for them. Hint. This is not so simple. First, dene schemes, which are functions of type

They provide a way of combining consonantism (root) and vocalism. The rst three arguments form the consonantism, the remaining three the vocalism. The change in consonantism or vocalism can be dened on schemes before inserting the actual consonantism and vocalism. Exercise 209. Write a de Saussure grammar that generates the facts of Mandarin shown in Exercise 187. Exercise 210. In European languages, certain words inside a NP do not inect for case (these are adverbs, relative clauses and other) and moreover, no

VVVV aaaaV

U U U U U

(5.252)

0V aaV U k U

)V aV U k U

V a0 k ) k aa0 ) aU (UV (

V U aV k U

(5.251)

em

e m

x e e x

A D A A h F7 f 8 A D 4 7 f n8 g A7 f $q#h D A A D 4 H ` A D A A 5 H 7 f $D 8


y m m y

5 g 5 g
T

5 g !8 D h 5 g D$P$h f H $h 5 H b8
D D 4 q# `

Positive

Comparative

Superlative

A 7 P FR@H f A 7 F9 g Ab`#h 7 4 H A 5 H 7 8

de Saussure Grammars

459

word can more than two cases. Dene case marking functions that take care of this. (If you need concrete examples, you may elaborate the Finnish example using English substitute words.) Exercise 211. We have talked briey in Section 5.1 about Australian case marking systems. We shall simplify the facts (in particular the word order) as follows. We dene a recursive translation from PN (terms t in Polish notation) inductively as follows. We assume case markers i , i . For a constant term c, put c : c. If F is an nary function symbol and t i , i n, terms then put

Exercise 212. In many modern theories of grammar, socalled functional heads play a fundamental role. Functional elements are elements that are responsible for the correct shape of the structures, but have typically very little if any content. A particularly useful idea is to separate the content of an element from its syntax. For example, we may introduce the morphological type of a transitive verb (tv) without specifying any selectional behaviour.

Then we assume one or two functional elements that turn this sign into the signs SEE i , i 6. Show how this can be done for the particular case of the signs SEE i . Can you suggest a general recipe for words of arbitrary category? Do you see a solution of the problem of ablative case selection in Finnish?

0 k ji) h

(5.254)

SEE

: tv

0 R @S ) (

Write a de Saussure grammar that generates the set

t t :t

p W

aa p W

p W

h h @A

aa

(5.253)

Ft0

tn

F t0

t1

tn

n 1

PN .

Chapter 6 The Model Theory of Linguistic Structures


1. Categories

Up to now we have used plain nonterminal symbols in our description of syntactic categories symbols with no internal structure. For many purposes this is not a serious restriction. But it does not allow to capture important regularities of language. We give an example from German. The sentences (6.1) (6.6) are grammatical. I see-1.S G You.S G see-2.S G He/She/It see-3.S G We see-1.P L You.P L see-2.P L They see-3.P L By contrast, the following sentences are ungrammatical. I see-2.S G/see-3.S G/see-1/3.P L /see-2.P L You.S G see-1.S G/see-3.S G/see-1/3.P L /see-2.P L One says that the nite verb of German agrees with the subject in person and number. This means that the verb has different forms depending on whether the subject is in the 1st, 2nd or 3rd person, and whether it is singular or plural.
G G G G G G G Hi G Q G G G G H i

(6.2) (6.3) (6.4) (6.5) (6.6)

(6.8)

y 4

G h 9 h A h A

G h D bA h 7 h A

G 5 i

p 8

(6.7)

y '4

G h 9 h RA Rh A

y 4

h D 7A

h D A bA

4 1

y 4 1

A 7A h D

y 4

9 h A 5 h ID

9 h A h D h $}

A 77 h D A
G

h D $} 5

h A 5

p @

(6.1)

y 0

h A h

462

The Model Theory of Linguistic Structures

How can we account for this? On the one hand, we may simply assume that there are six different kinds of subjects (1st, 2nd or 3rd person, singular or plural) as well as ve different kinds of verb forms (since two are homophonous, namely 1st and 3rd person plural). And the subjects of one kind can only cooccur with a matching verb form. But the grammars we looked at so far do not allow to express this fact at this level of generality; all one can do is provide lists of rules. A different way has been proposed among other in Generalized Phrase Structure Grammar (GPSG, see (Gazdar et al., 1985)). Let us start with the following basic rule. Here the symbols , and are symbols not for a single category but for a whole set of them. (This is why we have not used typewriter font.) In fact, the labels are taken to be descriptions of categories. They are not string anymore. This means that these labels can be combined using boolean connectives such as negation, conjunction and disjunction. For example, if we introduce and then our rule (6.9) can be rened the properties , and as well as as follows: Furthermore, we have the following terminal rules. Here is the description of a category which is a noun phrase ( ) in the rst person ( ) singular ( ). This means that we can derive the sentence (6.1). In order for the sentences (6.7) and (6.8) not to be derivable we now have to eliminate the rule (6.9). But this excludes the sentences (6.2) (6.6). To get them back again we still have to introduce ve more rules. we These can however be fused into a single schematic rule. In place of now write CAT : np , in place of we write PERS : 1 , and in place of we write NUM : pl . Here, we call CAT, PER and NUM attributes, and np, vp, 1, and so on values. In the pair CAT : np we say that the attribute CAT has the value np. A set of pairs A : v , where A is an attribute and v a value is called an attributevalue structure or simply an AVS. The rule (6.9) is now replaced by the schematic rule (6.12).
CAT PER NUM
G r r  t

(6.12)

CAT

: np : :

CAT PER NUM

: vp : :

Cr

r 

x 

) G Hp

x z

x r dp

r '

(6.11)
r 

h A h

x F

x r  Ft

(6.10)

Cr

r '

(6.9)

Categories

463

Here, and are variables. However, they have different value range; may assume values from the set 1 2 3 values from the set sg pl . This fact shall be dealt with further below. One has to see to it that the properties inducing agreement are passed on. This means that the following rule also has to be rened in a similar way.
r 

This rule says that a VP may be a constituent comprising a (transitive) verb and an NP. The agreement features have to be passed on to the verb.
CAT PER NUM

NUM

Now, there are languages in which the verb not only agrees with the subject but also with the object in the same categories. This means that it does not sufce to simply write PER : ; we also have to say whether concerns the subject or the object. Hence the structure relating to agreement has to be further embedded into the structure.

PER

It is clear that this rule does the job as intended. One can make it look even nicer by assuming also for the NP an embedded structure for the agreement complex. This is what we shall do below. Notice that the value of an attribute is now not only a single value but may in turn be an entire AVS. Thus, two kinds of attributes are distinguished. 1, sg are called atomic values. In the present context, all basic expressions are either (atomic) values or attributes. Attributes which have only atomic values are called Type 0 attributes, all others are Type 1 attributes. This is the basic setup of (Gazdar et al., 1988). In the socalled Head Driven PhraseStructure Grammar by Carl Pollard and Ivan Sag (HPSG, see (Pollard and Sag, 1994)) this has been pushed much further. In HPSG, the entire structure is encoded using AVSs of the kind just shown. Not only the bare linguistic features but also the syntactic structure

AGRO :

NUM :

PER

(6.15)

NUM :

: vp PER : NUM :
CAT

AGRS

CAT

:v

: np PER : NUM :
CAT

(6.14)

: vp : :

CAT PER

: v : :

(6.13)

CAT

np

T ) ) S

464

The Model Theory of Linguistic Structures

itself is coded into AVSs. We shall study these structures from a theoretical point of view in Section 6.6. Before we enter this investigation we shall move one step further. The rules that we have introduced above use variables for values of attributes. This certainly is a viable option. However, HPSG has gone into a different direction here. It introduces what are in fact structure variables, whose role it is to share entire AVSs between certain members of an AVS. To see how this works we continue with our example. Let us now write an NP not as a at AVS, but let us instead embed the agreement related attribute value pairs as the value of an attribute AGR. A 3rd person NP in the plural is now represented as follows.
CAT AGR

: :

np
NUM

(6.16)

The value of AGR is now structured in the same way as the values of AGRS and AGRO. Now we can rewrite our rules with the help of structure variables as follows. The rule (6.12) now assumes the form
AGR

The rule that introduces the object now has this shape. (6.18)

CAT

The labels 1 and 2 are variables for AVSs. If some variable occurs several times in a rule then every occurence stands for the same AVS. This is precisely what is needed to formulate agreement. AVS variables help to avoid that agreement blows up the rule apparatus beyond recognition. The rules have become once again small and perspicuous. (However, the agreement facts of languages are full of tiny details and exceptions, which make the introduction of more rules unavoidable.) Now if AVSs are only the description, then what are categories? In a nutshell, it is thought that categories are Kripkeframes. One assumes a set of vertices and associates with each attribute a binary relation on this set. So, attributes are edge colours, atomic values turn into vertex colours. And a syntactic tree is no longer an exhaustively ordered tree with simple labels but

: vp AGRS : 1

:v AGRS : 1 AGRO : 2
CAT

CAT

: np AGR : 2

(6.17)

CAT

CAT

: :

PER

: pl : 3

np 1

CAT AGRS

: :

vp 1

Categories
CAT

465

np

vp

Figure 17. The Kripkeframe of an AVS

an exhaustively ordered tree with labels having complex structure. Or, as it is more convenient, we shall assume that the tree structure itself also is coded by means of AVSs. The Figure 17 shows an example of a structure which as one says is licensed by the rule (6.17). The literature on AVSs is rich (see the books (Johnson, 1988) and (Carpenter, 1992) ). In its basic form, however, it is quite simple. Notice that it is a mistake to view attributes as objects. In fact, AVSs are not objects, they are descriptions of objects. Moreover, they can be the values of attributes. Therefore we treat values like np, 1 as properties which can be combined with the usual boolean operations, for example , , or . This has the advantage that we are now able to represent the category of the German verb form in either of the following ways.
CAT PER NUM

NUM

The equivalence between these two follows only if we assume that the values of PER can be only 1, 2 or 3. This fact, however, is a fact of German, and will be part of the grammar of German. (In fact, it seems to hold pretty universally across languages.) Notice that the collocation of attributevalue pairs into an attributevalue structure is nothing but the logical conjunction. So the left hand AVS can also be written down as follows. One calls underspecication the fact that a representation does not x an object in all detail but that it leaves certain properties unspecied. Disjunctive

(6.20)

CAT

:v

PER

:1 3

NUM : pl

(6.19)

: v : 1 3 : pl

CAT PER

: v : 2 : pl

9 h h A

sg

NUM

CAT

CAT

NUM

PER

PER

sg

466

The Model Theory of Linguistic Structures

specications are a case in point. However, they do not in fact provide the most welcome case. The most ideal case is when certain attributes are not contained in the AVS so that their actual value can be anything. For example, the category of the English verb form may be (partially!) represented thus.

(6.21)

CAT TEMP

: :

v past

This means that we have a verb in the past tense. The number and person are simply not mentioned. We can but need not write them down explicitly.
TEMP NUM PER

Here is the maximally unspecied value. We have this is a linguistical, that is to say, an empirical, fact :

From this we can deduce that the category of representation.


TEMP NUM PER

1 2 3

Facts of language are captured by means of axioms. More on that later. Since attributevalue pairs are propositions, we can combine them in the same way. The category of the English verb form has among other the following grammatical representation.
CAT PER NUM

This can alternatively be written as follows.

V l

V 5l

U aU

(6.26)

CAT

:v

PER

:3

NUM : sg

(6.25)

: v : 3 : sg

CAT NUM

: :

v pl

NUM

h h RA

(6.24)

CAT

: : : :

v past

H A

(6.23)

PER

:1 2 3

(6.22)

CAT

: : : :

v past

H bA

also has the following

: pl

Categories

467

In turn, this can be simplied.

This follows on the basis of the given interpretation. Since AVSs are not the objects themselves but descriptions thereof, we may exchange one description of an object or class of objects by any other description of that same object or class of objects. We call an AVS universally true if it is always true, that is, if it holds of every object. If is a tautology of propositional logic then holds for all replacements of AVSs for the propositional variables. If is universally true, then so is X : .

is uniIn order to avoid having to use , we shall write if versally true. Most attributes are denite, that is, they can have at most one value in any object. For such attributes we also have

Denite attributes are the norm. Sometimes, however, one needs nondenite attributes; they are called set valued to distinguish them from the denite ones. The AVSs are nothing but an alternative notation for formulae of some logical language. In the literature, two different kinds of logical languages have been proposed. Both serve the purpose equally well. The rst is the so ), which is a fragment called monadic second order predicate logic ( of second order logic ( ). Second order logic extends standard rst order predicate logic as follows. There additionally are variables and quantiers for predicates of any given arity n . The quantiers are also written and and the variables are Pin , n i . Here, n tells us that the variable is a variable for nary relations. So, PdV : Pin : n i is the set of predx i : i the set of object icate variables for unary predicates and V : variables. We write Pin x to say that Pin applies to (the ntuple) x. If is a formula so are Pin and Pin . The set of (M)SOformulae dened over a given signature is denoted by and , respectively. The

V U

1 )

q r

q g

D S

V U

q r

1 )

q r

V U i

~ U

(6.28)

X:

X:

X:

If and

are universally true then so is .

X:

X:

X: .

V l

(6.27)

CAT

:v

PER

:3

NUM

: pl

468

The Model Theory of Linguistic Structures

structures are the same as those of predicate logic (see Section 3.8): triples M f : f F r : r R , where M is a nonempty set, f the interpretation of the function f in and r the interpretation of the relation r. A model is a triple where is a structure : V Ma function assigning to each variable an element from M and : P M a function assigning to each nary predicate variable an nary relation on M. The relation is dened inductively.

We write iff for all and . is that fragment of which has only predicate variables for unary relations (n 1). When using MSO we drop the superscript 1 in the variables Pi1 . Another type of languages that have been proposed are modal languages (see (Blackburn, 1993) and (Kracht, 1995a)). We shall pick out one specic language that is actually an extension of the ones proposed in the quoted literature, namely quantied modal logic (QML). This language possesses a pi : i of proposition variables, a set denumerably innite set PV : Md of socalled modalities, and a set Cd of propositional constants. And nally, there are the symbols , , , , , , and . Formulas (called propositions) are dened inductively in the usual way. Moreover, if is a proposition, so is pi and pi . The notions of Kripkeframe and Kripemodel remain the same. A Kripkeframe is a triple F R C , where R : Md F 2 and C : Cd F . If m is a modality, R m is the accessibility relation associated with m. In particular, we have

For the constants we put

We dene

(6.34)

C0 k ( ) ) C0 ) k ( )

V 0 ) ( U ) ~U 0 ) ( )

x x

p p

: :

V U

"0 ) ( )

(6.33)

C c

for all for some

: p :
p

0 ) ( )

V U

0 "0 ) ( ( )

(6.32)

there is y : x R m y and

x x

q r

V U 0 ) ) (

q r

~ 0 '( v v

0 ) l( )

V U

(6.31)

P:

for some

0 ) k l( ) 0 ) k l( )

C0 ) l( U ) ~U C0 ) l( )

(6.30)

P:

for all

V U

V Uk

~ U

We dene

if Q

Q for all Q

P.

1 i QV U

D V U i

"0 ) l( )

(6.29)

Pin x

Pin

0 IT

0 ) l( )

S) 'bT

"0 ) l( )

S ')

Categories

469

We write if for all x F x ; we write , if for all we have . We dene an embedding of into m , where m is dened as follows. Let R : r m : m Md and Cd : Qc : c K . m rm : 2, m Qc : 1. Then dene as in Table 19. Here in the last two clauses xi is a variable that does not already occur in . Finally, if is a Kripke frame, we dene an MSOstructure m as follows. The underlying set is F, m m rm : R m for every m Md, and Qc : C c . Now we have

Proof. We shall show the following. Assume : PV F is a valuation F valuations for the in and that x F and : PdV F and : V predicate and the object variables. Then if Pi pi for all i and x0 x we have

It is not hard to see that (6.35) allows to derive the claim. We prove (6.35) by induction. If pi then Pi x0 and the claim holds in virtue of the fact that pi Pi and x0 x. Likewise for c C. The steps for , , and are routine. Let us therefore consider pi . Let x . Then for some which differs from at most in pi : x . Put as follows: Pi : pi for all i . By induc and differs from at most in Pi . Theretion hypothesis m m fore we have Pi , as desired. The argument can be rePi . Now for versed, and the case is therefore settled. Analogously for m . Let x . Then there exists a y with x r m y and y .

0 ) )( V U ~

1D

Uk

0 ) )

V D U

D k V Uk k

U C0 ) ) ( ) y0 lk ) (

V U

0 ) ( )

"0 ) ( )

(6.35)

V U

V U

V U

V $1 U s 

Theorem 6.1 Let frame : iff m

. Then

m . And for every Kripke

V V ) V V ) V V aU U V U

D D D

V U

q g

qr V

"0 ) ( )

T 1 V $ U s

UV U V

U ~U D ~ U D

p i 1 2 pi m m

: : : : : :

Pi x0 c : Q c x0 1 2 : 1 2 1 2 1 2 : 1 2 : Pi pi Pi m x x x0 r 0 i xi x0 x 0 r m x0 xi xi x0

Table 19. Translating

into

 

V 0 aU ( V aU V V aU ~U V U D V ) 0 ( ) 0 ( 1 U

V U

0 ) k ( ) 20 ) ( )

0 (

470

The Model Theory of Linguistic Structures

Choose such that x0 y and xi xi for every i 0. Then by induction hypothesis m . If xi is a variable that does not occur in then let xi : xi , x0 : x and x j : x j for all j 0 i . Then m rm x0 xi ; xi x0 . Hence we have m . Now it holds that x0 x x0 and xi is bound. Therem fore also . Again the argument is reversible, and the case is proved. Likewise for m . Exercise 213. Let 1, 2 and 3 be modalities. Show that a Kripkeframe satises the following formula iff R 3 R1 R2.

Exercise 214. Let 1, 2 and 3 be modalities. Show that a Kripkeframe satises the following formula iff R 3 R1 R2.

Exercise 215. In HPSG one writes CAT : if CAT has at least two values: and . (If is consistent, CAT can take also one value, , for . What if cat : but not necessarily.) Devise a translation into also means that CAT can have no other value than and ? Exercise 216. Let r be a binary relation symbol. Show that in a model of the following holds: Q x y iff x r y (this means that y can be reached from x in nitely many rsteps).

2.

Axiomatic Classes I: Strings

For the purposes of this chapter we shall code strings in a new way. This will result in a somewhat different formalization than the one discussed in Section 1.4. The differences are, however, marginal. Denition 6.2 A Zstructure over the alphabet A is a triple of the form L Qa : a A , where L is an arbitrary nite set, Q a : a A a partition and its inverse are linear, of L and a binary relation on L such that both irreexive, total orderings of L.

V aV U

V IaV U I

@V U V U

~U V U V U

~ U

0 IT

V ) U

(6.38)

Qxy :

P Px

yz P y

yrz

Pz

Py

s 

V ) U

0 ) l( )

0 0 ( (

0 (

(6.37)

3 p

1 2 p

V U

X V U

V U

0 0 ( (

0 (

(6.36)

3 p

1 p

2 p

Uk

U kk

V U U

V U

V U kk D VD ) U Q0 kk ) ) ( V U kk V Uk V U kk D Y0 k ) D ) ( V k U s V U D V U D

V k U

Q0 ) ) ( 0 kk ) ) T ) S 1

S) ') (

q r

(
s

Axiomatic Classes I: Strings

471

Zstructures are not strings. However, it is not difcult to dene a map which assigns a string to each Zstructure. However, if L there are innitely many Zstructures which have the same string associated with them and they form a proper class. the MSOlanguage of the binary relation symFix A and denote by bol as well as a unary predicate constant a for every a A. Denition 6.3 Let be a set or class of Zstructures over an alphabet A. Then denotes the set : for all : , called the then let be the MSOtheory of . If is a set of sentences from set of all which satisfy every sentence from . is called the model class of . Recall from Section 1.1 the notion of a context. It is easy to see that together with the class of Zstructures and the MSOformulae form a context. From this we directly get the following Theorem 6.4 The map : is a closure operator on the class of of classes Zstructures over A. Likewise, : is a closure operator on the set of all subsets of . (We hope that the reader does not get irritated by the difference between classes and sets. In the usual set theory one has to distinguish between sets and classes. Model classes are except for trivial exception always classes while classes of formulae are always sets, because they are subclasses of the set of formulae. This difference can be neglected in what is to follow.) We now call the sets of the form logics and the classes of the form axiomatic classes. A class is called nitely MSOaxiomatizable if it has the form for a nite , while a logic is nitely MSOaxiomatizable if it is the logic of a nitely axiomatizable class. We call a class of Zstructures over A regular if it is the class of all Zstructures of a regular language. Formulae are called valid if they hold in all structures. The following result from (B chi, 1960) is the central theorem of this section. u Theorem 6.5 (Buchi) A class of Zstructures is nitely MSOaxiomatizable iff it corresponds to a regular language which does not contain . we can only dene regular This sentence says that with the help of classes of Zstructures. If one wants to describe nonregular classes, one has to use stronger logical languages (for example ). The proof of this theorem requires a lot of work. Before we begin, we have to say something about
q g
s

V U

@ds T

d s p @%`

r1 q

d s

q r

p d s %

q g

q g

V lU

q r

V @s U d

472

The Model Theory of Linguistic Structures

Px

Py

F Pi x Pi x

the formulation of the theorem. By denition, models are only dened on nonempty sets. This is why a model class always denes a language not containing . It is possible to change this but then the Zstructure of (which is actually unique) is a model of every formula, and then is regular but not MSOaxiomatizable. So, complete correspondence cannot be expected. But this is the only exception. Let us begin with the simple direction. This is the claim that every regular class is nitely MSOaxiomatizable. Let be a regular class and L the corresponding regular language. Then there exists a nite state automaton A Q i0 F with L L. We may choose Q : n for a natural number n and i0 0. Look at the sentence dened in Table 20.

Proof. Let and let be an MSOstructure. Then there exists a binary relation (the interpretation of ) and for every a A a subset Q a L. By (a) and (b) an element x has at most one successor and at most one predecessor. By (c), every nonempty subset which is closed under successors contains a last element, and by (d) every nonempty subset which is

V U

Lemma 6.6 Let be an MSOstructure. Then and its associated string is in L .

iff

is a Zstructure

V aV U

V U V aV U RV

y x

VV aaV TVV aaV U V U V aV U V TVV aaV U V U V aV U V

TlV U G V  @V U 4 V % U ~U  GIV U V aV % ~ U U ~ U U V UaV U@V ~ U ~ S aa V U "V U  4 V V U  V U T V U V  V U V 2 ~U U V U U V @V U ~U S U ~U U U V V U V  U V U I UV @V ~U S U U U V ~V U UV  U V U I V ~@V U U S V UV U V

V U

I R

V U

V U

V U

U ~U ~ U

~ U ~ U ~ U

~ U

~ U

0 ) )D ) ) (

V U

xyz x y x z y z xyz y x z x y z P xy x y Px x Px y x y P xy x y Py x Px y y x P xy x y Px xPx xPx x a Aa x x a ba x bx P0 P1 Pn 1 x y x y x y xy x y a A a Pj y j ia

B  @
(a) (b) xPx xPx Py Py (c) (d) (e) (f) (g) Py P0 x (h)

Table 20. The Formula

Axiomatic Classes I: Strings

473

closed under predecessors contains a rst element. Since L is not empty, it has a least element, x0 . Let H : xi : i be a maximal set such that xi 1 is the (unique) successor of xi . H cannot be innite, for otherwise x i : i would be a successor closed set without last element. So, H is nite. H is also closed under predecessors. So, H is a maximal connected subset of L. By (e), every maximal connected nonempty subset of L is identical to L. So, H L, and hence L is nite, connected, and linear in both directions. Further, by (f) and (g) every x L is contained in exactly one set Q a . Therefore is a Zstructure. We have to show that its string is in L . (h) says that we can nd sets Hi L for i n such that if x is the rst element with respect to then x H0 , if x is the last element with respect to then x H j for some j F and if x Hi , y Qa , x y then y H j for some j i a . This means that the string is in L . (There is namely a biunique correspondence between accepting runs of the automaton and partitions into Hi . Under that partition x Hi means exactly that the automaton is in state i . Then either is not a Zstructure or there at x in that run.) Now let exists no accepting run of the automaton . Hence the string is not in L . This concludes the proof. Notice that we can dene , the transitive closure of , and , the transitive closure of (and converse of ) by an MSOformula (see Exercise 216). Further, we will write x y for x y x y, and x y in place of x y. Now, given a Zstructure L Qa : a A put A structure of the form M we call an MZstructure. Now we shall prove the converse implication of Theorem 6.5. To this end we shall make a detour. We put M : and C : ca : a A . Then we call the language of quantied modal logic with basic modalities from M and propositional constants from C. Now we put
%  4  2v V v a! v 2 g V U g a! g U 0v'( ''2 0 v(0 v( 0 g( ( 0 g(0 g 2 h 0 h( 0 ( 0 h ( !2 0 ( h ! 0 ' 2 v( 0 h( 0 g( ( 0

:
(6.40)

p p p

p p p p

p p

b ca

p cb

p p

A ca

with nonempty We call a structure connected if it is not of the form and . As already mentioned, QMLformulae cannot distinguish between

T h) ) v) g iIS

0 IT

S) ) ) h) 'ii) (

V YU

s 

V U

(6.39)

Qa : a

` V U

V U

V U

0 IT

V U

S) ') (

V ) U

474

The Model Theory of Linguistic Structures

connected and nonconnected structures.

Proof. The proof is not hard but somewhat lengthy. It consists of the following facts (the others are dual to these).

p p. Then there exists We show only and . For this let and x such that x p; p. This means that there is a y F y. If x R y then x p, contradiction. This with y p and x R R . Assume conversely R R . Then there exist x shows R and y such that x R y but not x R y. Now set p : y . Then we have x p; p. Because of we restrict ourselves to proving this for transitive R . Assume p p p. Then there exists a and a x0 with x p p; p. So there exists a x1 with x0 R x1 and x1 p . Then x1 p p and therefore x1 p. Because of the transitivity of R we also have x 1 p p . Repeating this argument we nd an innite chain x i : i such that xi R xi 1 . Therefore, R contains an innite ascending chain. Conversely, assume that R has an innite ascending chain. Then there exists a set x i : i with xi R xi 1 for all i . Put p : y : there is i : y R xi . Then it holds that x0 p p; p. For let x R y and suppose that y p. Then there exists an i with y R x i (for R is transitive). Hence y p, whence y p p. Since y was arbitrary, we have x p p . Also x1 p and x0 R x1 . Hence x0 p, as required. Notice that a nite frame has an innite ascending chain iff it has a cycle. into . To this end we need some Now we dene an embedding of preparations. As one convinces oneself easily the following laws hold.

` 0 g( V g U V g a!g x U g V g U V g U g V g U 0 g g a! g d0 ) Y( U ) ( V U S T VU g 1 V U 1 V g D 0 1 ( V g U V g U gU V 0 1 ( V g a! g U V g U 0 g g V U 1 V g U ( 0 g U ) ( V g !g C0 ) ( g CV g a!g i U V g U 0 g ( 0 0 ) ( ( ) T S V U V g U V U D VU pVU g V g U 0U p V 0( g V g U V U V U 1 0 ) 0 g ( ( d0 ) ( 0 ( e

0 g(

s 

V g U

q r

yV g

U ! g

p p p iff R ascending chains (or cycles).

V g U

0 g(

0 g(0 g(

p iff R

is transitive.

is transitive and free of innite

V U

0 (

p iff every point has at most one R

'V U

} V h U

0 ( h !

p iff R

V g U

} V U

0 g(

0 (

p iff R

. . successor.,

V dU

s 

V dU

0 IT

S')Vv'U )VU U V U ) ( g )V h )

Theorem 6.7 Let FR R R ture over A.

be a connected Kripkeframe for R K ca : a A . iff Z

. Put Z : is an MZstruc-

Axiomatic Classes I: Strings

475

We now dene following quantiers. x y: y: y: x x x x x x y

(6.42)

We call these quantiers restricted. Evidently, we can replace an unrestricted quantier x by the conjunction of restricted quantiers. Lemma 6.8 For every MSOformula with at least one free object variable there is an MSOformula g with restricted quantiers such that in all Z structures : g. We dene the following functions f , f , g and g on unary predicates. g x : y x y
h

(6.43)

A somewhat more abstract approach is provided by the notion of a universal modality. Denition 6.9 Let M be a set of modalities and M. Further, let L be a M modal logic. is called a universal modality of L if the following formulae are contained in L:

x :

g
f

x :

x y x y

V U V V U V V U V V U V

x :

x y

U V aV UV D U V aV UV D U V aV UV D U V aV UV D

V U V U V U V U

U D U D ~U D ~ U

y:

x x

V U

x y

@VaV U V aV U

V  U ~U V U U ~ "V U V U ~

(6.41)

x x

Finally, for every variable y

x: x x y

x x

V x U

yV

U V U V

V  ~U

QV

U U U U U U U U

V U ~

U U ~U ~ U

U ~ V U

x 1 2 1 x 2 , does not occur freely in 1 .

V U V  ~U

V V U V V ~U

UV U U ~ V U

x 1 x 1

2 2

x 1 x 1

U V U

V U ~ V

.
x 2 , x 2 . x 1

x 2 , if x

U s $

476

The Model Theory of Linguistic Structures

Proposition 6.10 Let L be a M logic, M a universal modality and F R a connected Kripkeframe with L. Then R F F. The proof is again an exercise. The logic of Zstructures allows to dene a universal modality. Namely, set

This satises the requirements above. The obvious mismatch between and is that the former allows for several object variables to occur freely, while the latter only contains one free object variable (the world variable), which is left implicit. However, given that contains only one free object variable, we can actually massage it into a form suitable for . Let Px be a predicate variable which does not occur in . Dene Px x inductively as in Table 21. (Here, assume that x v w and x y.) Let Px x . Then

Lemma 6.11 Let (6.46)

x Px

V UV aV aV U U V UV aV aV U U

V U V  U ~U V U V aV U V U U ~U V U

P :

x Px

x Px

S "0 ) l( )

"0 ) l( )

(6.45)

s 

T V U

q r

2v g

s 

V U

(6.44)

Px x

V U

U s $

m p, for all m

M.

0 ! (

0 ) (

p, p

p, p

p.

T V U S V V T V U D V V S ~ V U D V V T V U D V V S S T S D V S T S D V S V U D V aV S V aV U U D UV V

V UD

D D

UU a@T ~U U a@T UU a@T UU a@T U @T U @T U @T U @T U U @T D @T U

Px Px Px Px Px Px Px Px Px Px

x x x x x x x x x x

x y v w y x a x 1 2 1 2 y x P P

: Px y Px : v w Px : f Px y Px : ax Px : Px x 1 Px : Px x 1 Px : y Px x : x : P Px x : P Px x

x x x x x x

T T T S V @T U D V @T U V UaV U U D V @T V U D V V U U @T D D
y x v

Table 21. Mimicking the Variables in

2 2

x : y : w : :

Px y g Px y v w Px x

T ) 1 S

S S S S S S S S S S D

Axiomatic Classes I: Strings

477

This is easy to show. Notice that P contains no free occurrences of x. This is crucial, since it allows to directly translate P into a formula. Now we dene an embedding of into . Let h : P PV be a bijection from the set of predicate variables of onto the set of propositional variables of .

For the rstorder quantiers we put

and

The correctness of this translation follows from the fact that

Theorem 6.12 Let contain at most one free variable, the object variable x0 . Then there exists a QMLformula M such that for all Zstructures :

0 aV

U YU )V

) 0 2(

(6.51)

x0 iff M

q g

T S

exactly if px

v for some v

F.

x0

x0

VV aaV

k ! 5V U

U aaU k ! aV k  U U ~U

0 "0 ) ( ( )

(6.50)

px

px

px

px

px

px

V T

S VaVaVaV k ! u U k ! aV k  U U ~U

U a 0 aaV U (UU ~

V aV U V aU U

(6.49)

x x

px

px px

px px px
px

px

px x

V T

CVaVaaV V k !  U U a ~ k ! aV k  0 aaV U U U ~U (UU

V aV U V aU ~U

(6.48)

x x

px

g
f

px px

~ VaV U U V V D V aV U U V V

px px px
px

~U aU U aU

(6.47)

0 g ( 0 v '( 0 ( 0 h ( V U

Py

hP

1 : 1 : h : h
:

V aV VaV VaV VaV VaV V aV

ax

ca

1 2 1 2 P P

2 2 P P

px

px p

s 

qr  s V U

D V

q r

V U

T S

V U

s 

V U

U U U U U U U U U U

U U

"0 ) l( ) U

Then

P iff P

x for some x

M.

px

478

The Model Theory of Linguistic Structures

Corollary 6.13 Modulo the identication M , and dene the same classes of connected nonempty and nite Zstructures. Further: is a nitely MSOaxiomatizable class of Zstructures iff M is a nitely QMLaxiomatizable class of MZstructures. This shows that it is sufcient to prove that nitely QMLaxiomatizable classes of MZstructures dene regular languages. This we shall do now. Notice that for the proof we only have to look at grammars with rules of the form X a aY and no rules of the form X . Furthermore, instead of regular grammars we can work with regular grammars , where we have a set of start symbols. Let G N A R be a regular grammar and x a string. For a derivation of x we dene a Zstructure over A N (!) as follows. We consider the grammar G : N A N R which consists of the following rules.

The map h : A N A : a X a denes a homomorphism from A N to A , which we likewise denote by h. It also gives us a map from Zstructures over A N to Zstructures over A. Every Gderivation of x uniquely denes a G derivation of a string x with h x x and this in turn denes a Z structure

N A R be a regular grammar and a Denition 6.14 Let G constant formula (with constants for elements of A). We say, G is faithful to if there is a subset H N such that for every string x, every x and every iff there exists X H with w QX . We say, H codes with w: x w respect to G.

s  1

0 ) ) ) (

) i 0 (

Here w a A.

Qa iff w

Q a X for some X and w

QX iff w

0 IT

S) 'bT

S) ') (

(6.54)

x :

Qa : a

QX : X

Q a X for some

From

we dene a model over the alphabet A N, also denoted by x .

0 IT

1 d0 ) (

S) ') (

(6.53)

Q aX : a X

V i U

a X :X

0 ) (

0 ) ( 0 ) (

S s

(6.52)

R :

a X Y :X

aY

s 

V lU

q r

V YU

0 @ )

) ) (

0 ) ) ) ( D D

Axiomatic Classes I: Strings

479

The idea behind this denition is as follows. Given a set H and a formula , H codes with respect to G if in every derivation of a string x is true in x at exactly those nodes where the nonterminal H occurs. The reader may convince himself of the following facts. Proposition 6.15 Let G be a regular grammar and let H code and K code with respect to G. Then the following holds.

We shall inductively show that every QMLformula can be coded in a regular grammar on condition that one suitably extends the original grammar.

where

We have

The following theorem is not hard to show and therefore left as an exercise.

V U

. A code for is a pair G H where L G Denition 6.19 Let and H codes with respect to G. is called codable if it has a code.

Lemma 6.18 Let be coded in G2 by H. Then is coded in G1 N1 H and in G2 G1 by H N1 .

G2 by A

Proposition 6.17 Let G1 and G2 be grammars over A. Then L G1 L G1 L G2 .

G2

) (

s  1

We call G1

G2 the product of the grammars G1 and G2 .

X1 X2

a : Xi

Ri

"0 "0

(@s S ( @S

(6.56)

R1

R2 :

X1 X2

a Y1 Y2 : Xi

) )

(6.55)

G1

G2 :

2 N1

N2 A R1

R2

aYi

Ri

) )

) )

Denition 6.16 Suppose that G1 are regular grammars . Then put

1 N1 A R1 and G2

K codes

K codes

H codes with respect to G.

with respect to G. with respect to G.

t V

2 N2 A R2

480

The Model Theory of Linguistic Structures

Assume G H is a code for and let G be given. Then we have L G G L G and is coded in G G by N H. Therefore, it sufces to name just one code for every formula. Moreover, the following fact makes life simpler for us. Lemma 6.20 Let be a nite set of codable formulae. Then there exists a grammar G and sets H , , such that G H is a code of . Proof. Let : i : i n and let Gi Mi be a code of i , i n. Put G : Gi and Hi : i n j i Ni Hi i j n N j . Iterated application of Lemma 6.18 yields the claim. Theorem 6.21 (Coding Theorem) Every constant QMLformula is codable. Proof. The proof is by induction over the structure of the formula. We begin with the code of a, a A. Dene the following grammar Ga . Put N : X Y and : X Y . The rules are

a . The code is G a X as where b ranges over all elements from A one easily checks. The inductive steps for , , and are covered by Proposition 6.15. Now for the case . We assume that is codable G H is a code. Now we dene G . Let N : N 01 and that C and : 0 1 . Finally, let the rules be of the form

for X a R . One easily checks that for every rule X aY or X a there is a rule G . The code of is now G N 1 . Now to the case . Again we assume that is codable and that the code is C G H . Now we dene G . Let N : N 0 1 , : 0 . Finally, let R be the set of rules of the form

) 0 ) (

0 ) (

) 0 ) (

"0 ) (

(6.61)

X 0

a Y 1

X 1

a Y 1

0 h (

T e S

0 S IT e

T ) e S

"0 ) (

(6.60)

X 0

where X

aY

R and Y

H . Further, we take all rules of the form

) 0 ) (

0 ) (

) 0 ) (

"0 ) (

(6.59)

X 0

a Y 0

where X

aY

R and Y

H and of the form X 0 a Y 1

) 0 ) (

0 ) (

) 0 ) (

"0 ) (

(6.58)

X 1

a Y 0

X 1

a Y 1

T ) e S

0 IT

S ')

0 ( D T S v

T ) Se (

(6.57)

a aX aY

b bX bY

T )

e k U D

) (

e Ik

e bk T

S D

T )

) ( S

Vk U

Axiomatic Classes I: Strings

481

on board. The code of is now G N 1 . Now we look at . Again we put N : N 0 1 as well as : 0 1 . We take all rules of the form

a R . The code of is then G N 1 . Now we look at where X . Let N : N 0 1 , and : 0 . The rules are of the form

0 ) (

"0 ) (

(6.71)

X 0

where X

aY

R and X

0 ) (

"0 ) (

(6.70)

X 0

a Y 0

for X

aY

R and X

0 ) (

"0 ) (

(6.69)

X 0

for X

aY

R . Further there are rules of the form a Y 1

0 ) (

"0 ) (

(6.68)

X 1

a Y 1

H . Moreover, we take the rules

H , and, nally, all rules of the form X 1 a

T e S 0 S IT e )

D(

T ) e S

"0 ) (

(6.67)

X 0

0 v '(

for X

aY

R and Y

0 ) (

"0 ) (

(6.66)

X 1

a Y 0

H , as well as all rules

where X

aY

R and Y

0 ) (

"0 ) (

(6.65)

X 1

where X

aY

R . Further, the rules have the form a Y 1

0 ) (

"0 ) (

(6.64)

X 0

a Y 0

H . Moreover, we take the rules

0 g (

T ) e S

D IT e 0 S

0 ) (

T ) e S

"0 ) (

(6.63)

X 0

X 1

where X the rule

aY

R and X

H ; and nally for every rule X

) 0 ) (
a we take

0 ) (

) 0 ) (

"0 ) (

(6.62)

X 0

a Y 0

where X

aY

R and X

H and of the form X 1 a Y 0

482

The Model Theory of Linguistic Structures

where X a R . The code of is then G N 1 . The biggest effort goes into the last case, pi . To start, we introduce a new alphabet, namely A 0 1 , and a new constant, . Let a : a 0 a 1 . Further, assume that a and a 0 a . a A a 1 holds. Then a 1 Then let : pi . We can apply the inductive hypothesis to this formula. Let be the set of subformulae of . For an arbitrary subset let

We can nd a grammar G and for each sets H such that G H codes . Hence for every there exist H such that G H codes L . Now we rst form the grammar G1 with the nonterminals N and the alphabet 0 1 . The set of rules is the set of all A

1 where X a R and X H . Put H : H . Again one easily sees that 1 H 1 is a code for for every G . We now step over to the grammar G2 with N 2 : N 0 1 and A2 : A as well as all rules

where X a i R1 . Finally, we dene the following grammar . N 3 : N , A3 : A, and let

1 Q0 k ) k ) (

0 ) ) (

(6.78)

X i

Y i

R2

T ) xk ) S 1

be a rule iff such that

is the set of all for which

0 ) (

0 Y) (

(6.77)

and there are i i

01

1 "0 D ) (

0 ) ) (

(6.76)

X i

1 0 k ) k (

0 ) 0 ) ( (

where X

ai

0k )k )k (

0 ) ) (

(6.75)

X i

X i

R1 and

T e S

V U

e 2T ) e S

0 ) 0 ) ( (

(6.74)

ai

1 k

1 k

V aV U U 0 ) (

where X

aX

R, X

H and X

0k )k (

0 ) 0 ) ( (

(6.73)

ai

H ; further all rules of the form

) (

V U

e ) (

(6.72)

L :

0 ) ( u 0 ) x 0 ) ( ( }

0 S IT e D

0 ) (

~ U

k u 0 ) (  uD T ) e S D

T ) e S 0 ) e

Axiomatic Classes I: Strings

483

Likewise

Put H : : . We claim: G3 H is a code for . For a proof let x be a string and let a G3 derivation of x be given. We construct a G 1 derivation. Let x i n xi . By assumption we have a derivation

i 1

We therefore have a G2 derivation of x. From this we immediately get a G 1 derivation. It is over the alphabet A 0 1 . By assumption is coded in G1 by H . Then holds in all nodes i with Xi H . This is the set of all i with Xi H for some with . This is exactly the set of all i with Xi i H . Hence we have Xi i H iff the Zstructure of x satises in the given G3 derivation at i. This however had to be shown. Likewise from a G1 derivation of a string a G3 derivation can be constructed, as is easily seen. Now we are almost done. As our last task we have to show that from the fact that a formula is codable we also get a grammar which only generates strings that satisfy this formula. So let be nite. We may assume that all members are sentences (if not, we quantify over the free variables with a universal quantier). By Corollary 6.13 we can assume that in place of MSOsentences we are dealing with QMLformulae. Further, a nite conjunction of QMLformulae is again a QMLformula so that we are down to

q r

1 d0 ) ) (

T ) e S i

10 } @k

) )

(6.84)

Xi ji i

Xi

Descending we get for every i

0

(6.83)

Xn

jn

R2

1 a ji and a i with ji
1

R2

By construction there exists a jn

n 1

(6.82)

1 "0 1

for i

1 and Xn

xn

and a n

) (

(6.81)

Xi

xi

Xi

n 1

such that

0 ) ) (

(6.80)

X i

R2

T ) 1 S

1 k

iff there is a

and some i

0 Y) ( S D

(6.79)

R3 0 1 with

484

The Model Theory of Linguistic Structures

the case where consists of a single QMLformula . By Theorem 6.21, has a code G H , with G N A R . Put G : H N H A RH , where X a R:X H

Now there exists a G derivation of x iff x . Notes on this section. Some remarks are in order about FOLdenable classes of Zstructures over the signature containing (!) and a, a A. A regular term is free if it does not contain , but may contain occurrences of , which is a unary operator forming the complement of a language. Then the following are equivalent. The class of Zstructures for L are nitely FOaxiomatizable.

See (Ebbinghaus and Flum, 1995) and references therein. Exercise 217. Show that every (!) language is an intersection of regular languages. (This means that we cannot omit the condition of nite axiomatizability in Theorem 6.5.) Exercise 218. Let be a nite MSOtheory, L the regular language which belongs to . L is recognizable in O n time using a nite state automaton. Give upper bounds for the number of states of a minimal automaton recognizing L. Use the proof of codability. Are the derived bounds optimal? Exercise 219. An MSOsentence is said to be in 1 if it has the form 1

where does not contain second order quantiers. is said to be in 1 if 1 it has the form P0 P1 Pn 1 P0 Pn 1 where does not contain second order quantiers. Show the following: Every MSOaxiomatizable class of Zstructures is axiomatizable by a set of 1 sentences. If is 1 1 sennitely MSOaxiomatizable then it is axiomatizable by nitely many 1 tences.

) aaa)

) iaaa)

U V

U V

~U yaa@V

U aa@V

~U V

U V

~ U

(6.86)

P0

P1

Pn

P0

Pn

1 i i i

1 ) i i

1 i

V U

There is a k x y k 1 z L.

1 i i i

V U

L t for a free regular term t. 1 such that for every y A and x z A : x y kz L iff

S S s

(6.85)

RH :

aY

R:X Y

) )

t (

0 ) ) ) (

) (

Categorization and Phonology

485

3.

Categorization and Phonology

In this section we shall deal with syllable structure and phonological rules. We shall look at the way in which discrete entities, known as phonemes, arise from a continuum of sounds or phones, and how the mapping between the sound continuum and the discrete space of language particular phonemes is to be construed. The issue is far from resolved; moreover, it seems that it depends on the way we look at language as a whole. Recall that we have assumed sign grammars to be completely additive: there is no possibility to remove something from an exponent that has been put there before. This has a number of repercussions. Linguists often try to dene representations such that combinatory processes are additive. If this is taken to be a denition of linguistic processes (as in our denition of compositionality) the organisation of phonology and the phoneticstophonology mapping have to have a particular form. We shall discuss a few examples, notably umlaut and nal devoicing. is . How can For example, the plural of the German noun this be realized in an additive way? First, notice that the plural is not formed from the singular; rather, both forms are derived from an underlying form, the root. Notice right away that the root cannot be a string, it must be a string where at most one vowel is marked for umlaut. (Not all roots will undergo umlaut and if so only one vowel is umlauted!) Technically, we can implement . This allows us to restrict this by writing the root as a string vector: our attention to the representation of the vowel alone. Typically, in grammar books the root is assumed to be just like the singular: . Early phonological theory on the other hand would have posited an abstract phoneme in place of or , a socalled archiphoneme. Write for the archiphoneme that is underspecied between and . Then the root is , and the singular adds the specication, say an element , that makes be like , while the plural adds something, say that makes be like . In other words, in place of and we have and .
U

This solution is additive. Notice, however, that cannot be pronounced, and so the root remains an abstract element. In certain representations, however, is derived from . Rather than treating the opposition between and as
V

(6.88)

plur : x y z

z y

z:

(6.87)

sing : x y z

z y

z:

5 h 4 VH b9WU
S

584 7S h 5 h 4 86 S

6 S

5 h 4 b9H

5 h 84

58$4 h 5 h 8$4

5 h b4 H

5 h 84

S S

6 S

486

The Model Theory of Linguistic Structures

equipollent, we may treat it as privative: is plus something else. One specic proposal is that differs from in having the symbol i in the i tier (see (Ewen and van der Hulst, 2001) and references therein). So, rather than writing the vowels using the Latin alphabet, we should write them as sequences indicating the decomposition into primitive elements, and the process becomes literally additive. Notice that the alphabet that we use actually is additive. differs from by just two dots and this is the same with and . Historically, the dots derive from an e that was written above the vowel to indicate umlaut. (This does not always work; in Finnish is written , blocking for us this cheap way out for Finnish vowel harmony.) Final devoicing could be solved similarly by positing a decomposition of voiced consonants into voiceless consonant plus an abstract voice element (or rather: being voiceless is being voiced plus having a devoicingfeature). All these solutions, however, posit two levels of phonology: a surface phonology and a deep phonology. At the deep level, signs are again additive. This allows us to say that languages are compositional from the deep phonological level onwards. The most inuential model of phonology, by Chomsky and Halle (1968), is however not additive. The model of phonology they favoured referred to simply as the SPEmodel transforms deep structures into surface structures using context sensitive rewrite rules. We may illustrate these rules with German nal devoicing. The rule says, roughly, that syllable nal consonants (those following the vowel) are voiceless in German. However, as we have noted earlier (in Section 1.3), there is evidence to assume that some consonants are voiced and only become voiceless exactly when they end up in syllable nal position. Hence, instead of viewing this as a constraint on the structure of the syllable we may see this as the effect of a rule that devoices consonants. Write for the syllable boundary. Sidestepping a few difculties, we may write the rule of nal devoicing as follows.

(Phonologists write +voiced what in attributevalue notation is VOICED : .) This says that a consonant preceding a voiceless sound or a syllable boundary becomes voiceless. Using such rules, Chomsky and Halle have formulated a theory of the sound structure of English. This is a Type 1 grammar for English. It has been observed, however, by Ron Kaplan and Martin Kay (1994) and Kimmo Koskenniemi (1983) that for all that language really needs the relation between deep level and surface level is a regular relation

(6.89)

voiced

voiced

voiced or

v | D

Categorization and Phonology

487

and can be effected by a nite state transducer. Before we go into the details, we shall explain something about the general abstraction process in structural linguistics, exemplied here with phonemes, and on syllable structure. Phonetics is the study of phones (= linguistic sounds) whereas phonology is the study of the phonemes of the languages. We may simply dene a phoneme as a set of phones. Different languages group different phones into different phonemes, so that the phonemes of languages are typically not comparable. The grouping into phonemes is far from trivial. A good exposition of the method can be found in (Harris, 1963). We shall look at the process of phonemicization in some detail. Let us assume for simplicity that words or texts are realized as sequences of discrete entities called phones. This is not an innocent assumption: it is for example often not clear whether the sequence [t] plus [S], resulting in an affricate [tS], is to be seen as one or as two phones. (One can imagine that this varies from language to language.) Now, denote the set of phones by . A word is not a single sequence of phones, but rather a set of such sequences. Denition 6.22 L is a language over if L is a subset of such that and if W W L and W W then W W . We call the members of L words. x W is called a realization of W . For two sequences x and y we write x L y if they belong to (or realize) the same word. One of the aims of phonology is to simplify the alphabet in such a way that words are realized by as few as possible sequences. (That there is only one sequence for each word in the written system is an illusion created by orthographical convention. English orthography often has little connection with actual pronunciation.) We proceed by choosing a new alphabet, P, and a map s we say ping : P. The map induces a partition on . If s that s and s are allophones. induces a mapping of L onto a subset of P in the following way. For a word W we write W : x : x W . Finally, L : W :W L .

Lemma 6.24 Let L be a language and : nating for L, L is a language over P.

P. If is discrimi-

t j

V U

1 @k }

Denition 6.23 Let : P be a map and L is called discriminating for L if whenever W W W .

be a language . L are distinct then W

V U

V k U

V U

1 i V U i

V U

w k t D

V U

1 k

V U

1 i

i D D

V U

1 w

488

The Model Theory of Linguistic Structures

Denition 6.25 A phonemicization of L is a discriminating map v : A B such that for every discriminating w : A C we have C B . We call the members of B phonemes. If phonemes are sets of phones, they are clearly innite sets. To account for the fact that speakers can manipulate them, we must assume that they are nitely specied. Typically, phonemes are dened by means of articulatory gestures, which tell us (in an effective way) what basic motor program of the vocal organs is associated with what phoneme. For example, English [p] is voiceless. This says that the chords do not vibrate while it is being pronounced. It is further classied as an obstruent. This means that it obstructs the air ow. And thirdly it is classied as a bilabial: it is pronounced by putting the lips together. In English, there is exactly one voiceless bilabial obstruent, so these three features characterize English [p]. In Hindi, however, there are two phonemes with these features, an aspirated and an unaspirated one. (In fact, the actual pronunciation of English [p] for a Hindi speaker oscillates between two different sounds, see the discussion below.) As sounds have to be perceived and classied accordingly, each articulatory gesture is identiable by an auditory feature that can be read off its spectrum. The analysis of this sort ends in the establishment of an alphabet P of abstract sounds classes, dened by means of some features, which may either be called articulatory or auditory. (It is not universally agreed that features must be auditory or articulatory. We shall get to that point below.) These can be modeled in the logical language by means of constants. For example, the . Then is the same feature +voiced corresponds to the constant as being unvoiced. The features are often interdependent. For example, vowels are always voiced and continuants. In English and German voiceless plosives are typically aspirated, while in French this is not the case; so [t] is pronounced with a subsequent [h]. (In older German books one often nds (part) in place of the modern .) The aspiration is lacking when [t] is preceded within the syllable by a sibilant, which in standard German always is [S], for example in ["StUmpf]. In German, vowels are not simply long or short. Also the vowel quality changes with the length. Long vowels are tense, short vowels are not. The letter is pronounced [] when it is short and [i:] when it is long (the colon indicates a long sound). (For example, (sense) ["zn] as opposed to (deep) ["t h i:f].) Likewise for the other vowels. Table 22 shows the pronunciation of the long and short vowels,

P D b!h

d jju RC

9 f
G s

d jju RC

h D bns

P D h b!$s

X 8

f 85A 7 4

9 9D $}

Categorization and Phonology Table 22. Long and short vowels of German

489

long i: a: o: u:

short a O U

long y: e: : E:

short Y @ E

drawn from (IPA, 1999), Page 87 (written by Klaus Kohler). Only the pairs [a:]/[a] and [E:]/[E] are pronounced in the same way, differing only in length. It is therefore not easy to say which feature is distinctive: is length distinctive in German for vowels, or is it rather the tension? This is interesting in particular when speakers learn a new language, because they might be forced to keep distinct two parameters that are cogradient in their own. For example, in Finnish vowels are almost purely distinct in length, there is no cooccurring distinction in tension. If so, tension cannot be used to differentiate a long vowel from a short one. This is a potential source of difculty for Germans if they want to learn Finnish. If L is a language in which every word has exactly one member, L is uniquely dened by the language L : x: x L . Let us assume after suitable reductions that we have such a language ; then we may return to studying languages in the customary sense. It might be thought that languages do not possess nontrivial phonemicization maps. This is, however, not so. For example, English has two different sounds, [p] and [p h ]. The rst occurs after [s], while the second appears for example word initially before a vowel. It turns out that in English [p] and [ph ] are not two but one phoneme. To see why, we offer rst a combinatorial and then a logical analysis. Recall the denition of a context set. For regular languages it is simply

If ContL a ContL a , a and a are said to be in complementary distribution. An example is the abovementioned [p] and [p h ]. Another example is pronounced is [x] versus [ ] in German. Both are written . However, [x] if occurring after [a], [o] and [u], while it is pronounced [c] if occurring af ter other vowels and [r], [n] or [l]. Examples are ["lct], ["naxt], ["ect] and ["fuKct]. (If you do not know German, here is a short
4 G H

G H

G 5

G 5

1 i W W i 0 !) @S i i(

Vk U

G H

V U

(6.90)

ContL a :

x y :x a y

1 i T IS

i IS

5 $7

t @V U

G H

490

The Model Theory of Linguistic Structures

description of the sounds. [x] is pronounced at the same place as [k] in English, but it is a fricative. [c] is pronounced at the same place as in English , however the tongue is a little higher, that is, closer to the palatum and also the air pressure is somewhat higher, making it sound harder.) Now, from Denition 6.25 we extract the following. Denition 6.26 Let A be an alphabet and L a language over A. : A B is a prephonemicization if is injective on L. : A B is a phonemicization if for all prephonemicizations : A C, C B. The map sending [x] and [c] to the same sound is a prephonemicization in German. However, notice the following. In the language L 0 : , and are in complementary distribution. Nevertheless, the map sending both to the same element is not injective. So, complementary distribution is not enough to make two sounds belong to the same phoneme. We shall see be. We may either send and to low what is. Second, let L1 : and obtain the language M0 : , or we may send and to and obtain the language M1 : . Both maps are phonemicizations, as is easily checked. So, phonemicizations are not necessarily unique. In order to analyse the situation we have to present a few denitions. The general idea is this. Suppose that A is not minimal for L in the sense that it possesses a noninjective phonemicization. Then there is a prephonemicization that conates exactly two symbols into one. The image M of this map is a regular language again. Now, given the latter we can actually recover for each member of M its preimage under this conation. What we shall show now is that moreover if L is regular there is an explicit procedure telling us what the preimage is. This will be cast in rather abstract terms. We shall dene here a modal language , namely with converse. that is somewhat different from The syntax of propositional dynamic logic (henceforth PDL) has the usual boolean connectives, the program connectives ;, , , further ? and the brackets and . Further, there is a set 0 of elementary programs. Every propositional variable is a proposition. if and is a proposition, so are , If is a proposition, ? is a program. Every elementary program is a program.

If and are programs, so are ; ,

, and .

, and

T l)

H @H

cF f

!r

T T p ) T ) h l$p

s 

l) S X X S H D h S D D

0 '( v

2v

G 5

Categorization and Phonology

491

If is a program and a proposition, and are propositions.

(6.91)

R ? :

x : there is y : x R y and y

We write F R if x . Elementary PDL (EPDL) is the fragment of PDL that has no star. The elements of 0 are constants; they are like the modalities of modal logic. Obviously, it is possible to add also propositional constants. In addition to , it also has a program constructor . denotes the R . The axiomatizaconverse of . Hence, in a Kripkeframe R tion consists in the axioms for together with the axioms p p, p p for every program . The term dynamic logic will henceforth refer to an extension of by some axioms. The fragment without is called elementary PDL with converse, and is denoted by . An ana. log of B chis Theorem holds for the logic u Theorem 6.27 Let A be a nite alphabet. A class of MZstructures over A is regular iff it is axiomatizable over the logic of all MZstructures by means of constant formulae in (with constants for letters from A). Proof. By Kleenes Theorem, a regular language is the extension of a regular term. The language of such a term can be written down in using a formula it can be constant formula. Conversely, if is a constant rewritten into an formula.

0 ! (

"r

r "E

T V U 1 T V U 1

'V U

V U !r

V U "r

V d U

V U V U

"r

V U

V U "r

"r

"r

q r

0 aU ( aU

0 ) ) (

: R R ; : R R : R

S V D S V D U V D U V D U V D ) (@S V D v 'U V

x : for all y : if x R y then y

x x :x

aV V U @V X V U V s T V U 1 0 V U saV U V V U s@V t @V

V U

V U

: F

A Kripkemodel is a triple F R , where R : 0 F . We extend the maps R and as follows.

0 (

F 2 , and : PV

0 ) ) (

U U

0 ! (

V U

492

The Model Theory of Linguistic Structures

The last point perhaps needs reection. There is a straightforward translation of into . We only have to observe that the transitive closure of an denable relation is again denable (see Exercise 216).

X z

X y

(6.93b) (6.93c) (6.93d)

R ; R

R ; R R ?

R ?

can also be seen as an axiomatic extension of ; Hence, by the axioms p p, p p. Now let be a dynamic logic. Recall from Section 4.3 the denition of , the global consequence associated with . Now, we shall assume that we have a language ; D , where D is a set of constants. For simplicity, we shall assume that for each letter a A, D contains a constant a. However, there may be additional constants. It is those constants that we shall investigate here. We shall show (i) that these constants can be eliminated in an explicit way, (ii) that one can always add constants such that A can be be described purely by contact rules. Denition 6.28 Let be a dynamic logic and q a formula. q globally implicitly denes q in if q ; q q. q Features (or constants, for that matter) that are implicitly dened are called inessential. Here the leading idea is that an inessential feature does not constitute a distinctive phonemic feature, because removing the distinction that this feature induces on the alphabet turns out to induce an injective map. 0 1 be an alphabet, and Formally, this is spelled out as follows. Let A assume that the second component indicates the value of the feature . Let : A 01 A be the projection onto the rst factor. Suppose that the denes language L can be axiomatized by the constant formula . implicitly if : L L is injective. This in turn means that the map is a
u u

V h

U  !r

V U

Vu U

Vu U

U "r

T ) e S

k V U

0 ( h !

 mV k

V U V V aU U V U V s U

V U

0 h ( !

V V aU U D V V aU U D V V U aU V V s aU U

(6.93a)

Notice also that we can eliminate lowing identities.

from complex programs using the fol-

VaV U V aV k U @V U V k V @V U V U ~U ~U U

~ U

V U !r

(6.92)

xR y

X X x

q r

q g

T ) e S

r r q"

X z

zRz

Categorization and Phonology

493

prephonemicization. For in principle we could do without the feature. Yet, it is not clear that we can simply eliminate it. In we call eliminable if there is a formula provably equivalent to that uses only the constants of without . In the present case, however, an inessential feature is also eliminable. Notice rst of all that a regular language over an alphabet B is denable by means a constant formula over the logic of all strings, with constants b for every element b of B. By Lemma 6.31, it is therefore enough to show the claim for the logic of all strings. Moreover, by a suitable replacement of other variables by new constants we may reduce the problem to the case where p is the only variable occurring in the formula. Now the language L is regular over the alphabet A 0 1 . Therefore, L is regular as well. This means that it can be axiomatized using a formula without the constant . However, this only means that we can make the representation of words more compact. Ideally, we also wish to describe for given a A, in which context we nd a 0 (an a lacking ) and in which context we nd a 1 (an a having ). This can be done. Let A Q q0 F be a nite state automaton. Then L q : x : q0 q is a regular language (for L q L A Q q0 q , and the latter is a nite state automaton). Furthermore, A q Q L q . If is deterministic, then L q L q whenever q q . Now, let be a deterministic nite state automaton over A 0 1 such that x L iff x . Suppose we have a constraint , where is a constant formula. Denition 6.29 The FisherLadner closure of , FL , is dened as follows. FL : : : : : :

(6.94g) (6.94h)

The FisherLadner closure covers only ; formulae, but this is actually enough for our purposes. For each formula in the FisherLadner closure of we introduce a constant c . In addition, we add the following

V U

V h

U  "r

V U

( aU

(aU ( aU

(6.94f)

( aU

(6.94e)

V 0 aU (

0 (@S 0 (@S 0 (@S ( @S ( @S

( aU

(6.94d)

V U s "T V U s @V U s T s@V 0 0 aU ( ( s T V 0 0 aU ( ( s "T 0 s @V 0 aU ( s "T 0 Vk U s V U s T k

(6.94c)

FL FL ; FL FL ? FL

FL

T S

V 0 V 0 V 0 V 0 V 0 Vk V U

(6.94b)

FL FL FL FL ; FL FL FL ? FL FL FL basic

(6.94a)

FL pi :

pi

V U 1 i k V U D V ) S a0 bT ') ) ) D aU (

V U

V U

T ) e S

Vk U

t V U

x
A

0 ) (

Vu U

Vu U

% !r

0 ) )

T ) e S

) ) (

Du

0 ) ( i IS

Vu U

V U

494

The Model Theory of Linguistic Structures

axioms.

(6.95)

We call these formulae cooccurrence restrictions. After the introduction of these formulae as axioms c is provable for every FL . In parc is provable. This means that we can eliminate in favour ticular, of c . The formulae that we have just added do not contain any of ?, , , or ;. We only have the most simple axioms, stating that some constant is true before or after another. Now we construct the following automaton. Let be a subset of FL . Then put

Now let Q be the set of all consistent q . Furthermore, put q q iff q ; a? q is consistent. Let F : q : and B : q : . For every b B, A Q b F is a nite state automaton. Then
b B

is a regular language. It immediately follows that the automaton above is welldened and for every subformula of the set of positions i such that x i is uniquely xed. Hence, for every x there exists exactly one accepting run of the automaton. x i iff holds at the ith position of the accepting run. We shall apply this to our problem. Let be an implicit denition of . Construct the automaton for as just shown, and lump together a into a single state q and put q q for all states that do not contain c every a. All states different from q are accepting. This denes the automaton . Now let C : q : . The language c C L q is regular, and it possesses a description in terms of the constants a, a A, alone.
u

V U

Vu U

Vu U

i F0 ) (

V a0 ) ) ) ) aU (

(6.97)

L:

L AQbF

1 h P

1 P

T 1 k VaVU U u VVu U aU

V U

0 ) ) ) ) (

@V

(6.96)

q :

V U

V 0 0 aU ( ( V 0

V 0 aU (

5V

V U

0 aU h ( 0 aU ( 0 aU (

c ; c c c

V U

c ?

V U

U V 0 h( U V 0 ( V U V (0 aU "V ( 0 aU V ( V U "V @V U V V U V
c

( aU

D V U

V U

( aU

( aU

d0 ) ( i

V U 0

Categorization and Phonology

495

Denition 6.30 Let be a logic and q a formula. Further, let be a formula not containing q. We say that globally explicitly denes q in with respect to if q q. Obviously, if globally explicitly denes q with respect to q then q globally implicitly denes q. On the other hand, if q globally implicitly denes q then it is not necessarily the case that there is an explicit denition for it. It very much depends on the logic in addition to the formula whether there is. A logic is said to have the global Bethproperty if for any global implicit denition there is a global explicit denition. Now suppose that we have a formula implicitly dening q. Suppose further that is an explicit denition. Then the following is valid.

The logic dened by adding the formula as an axiom to can therefore equally well be axiomatized by . The following is relatively easy to show. Lemma 6.31 Let be a modal logic, and a constant formula. Suppose that has the global Bethproperty. Then also has the global Beth property. Theorem 6.32 Every logic of a regular string language has the global Beth property. If the axiomatization is innite, by the described procedure we get an innite array of formulae. This does not have a regular solution in general, as the reader is asked to show in the exercises. The procedure of phonemicization is inverse to the procedure of adding features that we have looked at in the previous section. We shall briey look at this procedure from a phonological point of view. Assume that we have an alphabet A of phonemes, containing also the syllable boundary marker and the word boundary marker . These are not brackets, they are separators. Since a word boundary is also a syllable boundary, no extra marking of the syllable is done at the word boundary. Let us now ask what are the rules of syllable and word structure in a language. The minimal assumption is that any combination of phonemes may form a syllable. This turns out to be false. Syllables are in fact constrained by a number of (partly language dependent)

V U

V U

V U

(6.98)

V U

V U

V U

V U

 XV

496

The Model Theory of Linguistic Structures

principles. This can partly be explained by the fact that vocal tract has a certain physiognomy that discourages certain phoneme combinations while it enhances others. These properties also lead to a deformation of sounds in contact, which is called sandhi, a term borrowed from Sanskrit grammar. A particular example of sandhi is assimilation ([np] [mp]). Sandhi rules exist in nearly all languages, but the scope and character varies greatly. Here, we shall call sandhi any constraint that is posed on the occurrence of two phonemes (or sounds) next to each other. Sandhi rules are 2templates in the sense of the following denition. Denition 6.33 Let A be an alphabet. An ntemplate over A (or template of length n) is a cartesian product of length n of subsets of A. A language L is an ntemplate language if there is a nite set of length n such that L is the set of words x such that every subword of length n belongs to at least one template from . L is a template language if there is an n such that L is an ntemplate language. Obviously, an ntemplate language is an n 1template language. Furthermore, 1template languages have the form B where B A. So the rst really interesting class is that of the 2template languages. It is clear that if the alphabet is nite, we may actually dene an ntemplate to be just a member of An . Hence, a template language is dened by naming all those sequences of bounded length that are allowed to occur. Proposition 6.34 A language is a template language iff its class of Astrings is axiomatizable by nitely many positive EPDLformulae. To make this more realistic we shall allow also boundary templates. Namely, of left edge templates and a set of right edge we shall allow a set templates. lists the admissible nprexes of a word and the admissible nsufxes. Call such languages boundary template languages. Notice that phonological processes are conditioned by certain boundaries, but we have added the boundary markers to the alphabet. This effectively eliminates the need for boundary templates in the description here. We have not explored the question what would happen if they were eliminated from the alphabet. Proposition 6.35 A language is a boundary template language iff its class of Astrings is axiomatizable by nitely many EPDLformulae. It follows from Theorem 6.5 that template languages are regular (which is is regular but easy to prove anyhow). However, the language not a template language.

p p s H H

Categorization and Phonology

497

The set of templates effectively names the legal transitions of an automaton that uses the alphabet A itself as the set of states to recognize the language. We shall dene this notion, using a slightly different concept here, namely that of a partial nite state automaton. This is a quintuple A Q I F , such that A is the input alphabet, Q the set of internal states, I p the set of initial states, F the set of accepting states and : A Q Q a partial function. accepts x if there is a computation from some q I to some q F with x as input. is a 2template language if Q A and a b is either undened or a b b. The reason for concentrating on 2template languages is the philosophy of naturalness. Basically, grammars are natural if the nonterminal symbols can be identied with terminal symbols, that is, for every nonterminal X there is ContL a . a terminal a such that for every Xstring x we have Cont L x For a regular grammar this means in essence that a string beginning with a has the same distribution as the letter a itself. A moments reection reveals that this is the same as the property of being 2template. Notice that the 2template property of words and syllables was motivated from the nature of the articulatory organs, and we have described a parser that recognizes whether something is a syllable or a word. Although it seems prima facie plausible that there are also auditory constraints on phoneme sequences we know of no plausible constraint that could illustrate it. We shall therefore concentrate on the former. What we shall now show is that syllables are not 2template. This will motivate either adding structure or adding more features to the description of syllables. These features are necessarily nonphonemic. We shall show that nonphonemic features exist by looking at syllable structure. It is not possible to outline a general theory of syllable structure. However, the following sketch may be given (see (Grewendorf et al., 1987)). The sounds are aligned into a socalled sonoricity hierarchy, which is shown in Table 23 (vd. = voiced, vl. = voiceless). The syllable is organized as follows.
Syllable Structure. Within a syllable the sonoricity increases monotonically and then decreases.

This means that a syllable must contain at least one sound which is at least as sonorous as all the others in the syllable. It is called the sonoricity peak. We shall make the following assumption that will simplify the discussion.
Sonoricity Peak. The sonoricity peak can be constituted by vowels only.

V U

V ) U

V U i

V ) U

0 ) ) ) ) ( 1 k

498

The Model Theory of Linguistic Structures

Table 23. The Sonoricity Hierarchy

This wrongly excludes the syllable [krk], or [dn]. The latter is heard in the German (to disappear) [ fEK"Swndn]. (The second that appears in writing is hardly ever pronounced.) However, even if the assumption is relaxed, the problem that we shall address will remain. The question is: how can we implement these constraints? There are basiformulae. cally two ways of doing that. (a) We state them by means of This is the descriptive approach. (b) We code them. This means that we add some features in such a way that the resulting restrictions become speciable by 2templates. The second approach has some motivation as well. The added features can be identied as states of a productive (or analytic) device. Thus, while the solution under (a) tells us what the constraint actually is, the approach under (b) gives us features which we can identify as (sets of) states of a (nite state) machine that actually parses or produces these structures. That this can be done is expressed in the following corollary of the Coding Theorem. Theorem 6.36 Any regular language is the homomorphic image of a boundary 2template language. So, we only need to add features. Phonological string languages are regular, so this method can be applied. Let us see how we can nd a 2template solution for the sonoricity hierarchy. We introduce a feature and its negation . We start with the alphabet P, and let C P be the set of consonants. The new alphabet is

S e

s T

v S 8e

(6.99)

!r

vd. plosives [b], [d]

vl. fricatives [s], [S]

rsounds [r]

nasals; laterals [m], [n]; [l]

dark vowels [a], [o]

mid vowels [], []

high vowels [i], [y] vd. fricatives [z], [Z] vl. plosives [p], [t]

G Hp 9 9 D h n bh A 5

Categorization and Phonology

499

Let son a be the sonoricity of a. (It is some number such that the facts of Table 23 fall out.) a a

(6.100)

As things are dened, any subword of a word is in the language. We need to mark the beginning and the end of a sequence in a special way, as described above. This detail shall be ignored here. has a clear phonetic interpretation: it signals the rise of the sonoricity. It has a natural correlate in what de Saussure calls explosive articulation. A phoneme carrying is pronounced with explosive articulation, a phoneme carrying is pronounced with implosive articulation. (See (Saussure, 1965).) So, actually has an articulatory (and an auditory) correlate. But it is a nonphonemic feature; it has been introduced in addition to the phonemic features in order to constrain the choice of the next phoneme. As de Saussure remarks, it makes the explicit marking of the syllable boundary unnecessary. The syllable boundary is exactly where the implosive articulation changes to explosive articulation. However, some linguists (for example van der Hulst in (1984)) have provided a completely different answer. For them, a syllable is structured in the following way. (6.101)

[onset

[nucleus

So, the grammar that generates the phonological strings is actually not a regular grammar but context free (though it makes only very limited use of phrase structure rules). marks the onset, while marks the nucleus together with the coda (which is also called rhyme). So, we have three possible ways to arrive at the constraint for the syllable structure: we postulate an axiom, we introduce a new feature, or we assume more structure. We shall nally return to the question of spelling out the relation between deep and surface phonological representations. We describe here the simplest kind of a machine that transforms strings into strings, the nite state transducer. Denition 6.37 Let A and B be alphabets. A (partial) nite state transducer from A to B is a sextuple A B Q i0 F such that i0 Q, F Q and

T)S1 T v 0 a0 k ) k 0 () TV k U dV U ) 1 k a0 i) k 0 0 v () TV k U fV U 0a0 i) k 0 v () 0 a0 ) k 0 ()



a

: son a

:a

C son a

:a

coda]]

T V k U

0 ) ) ) ) ) (

dV U v

vi) a@s (( S ) a@s (( S vi) a@s (( S (( ) a@S

V U v

: son a

son a

son a

son a

500

The Model Theory of Linguistic Structures

: Q A Q B where q x is always nite for every x A . Q is called the set of states, i0 is called the initial state, F the set of accepting states and the transition function. is called deterministic if q a contains at most one element for every q Q and every a A.
We call A the input alphabet and B the output alphabet. The transducer differs from a nite automaton in the transition function. This function does not only say into which state the automaton may change but also what symbol(s) it will output on going into that state. Notice that the transducer may also output an empty string and that it allows for empty transitions. These are not eliminable (as they would be in the nite state automaton) since the machine may accompany the change in state by a nontrivial output. We write

if the transducer changes from state q with input x ( A ) into the state q and outputs the string y ( B ). This is dened as follows.

Finally one denes

Transducers can be used to describe the effect of rules. One can write, for example, a transducer that syllabies a given input according to the constraints on syllable structure. Its input alphabet is A , where A is the set of phonemes, the word boundary and the syllable boundary. The output alphabet is A . Here, stands for onset, for nucleus, and for coda. The machine annotates each phoneme stating whether it belongs to the onset of a syllable, to its nucleus or its coda. Additionally, the machine inserts a syllable boundary wherever necessary. (So, one may leave the input partially or entirely unspecied for the syllable boundaries. The machine will look which syllable segmentation can or must be introduced.)

T ) v S s

T ) v S s T p C') ) g e S 9 v

# $i

0 !) @S i i(

V lU

(6.104)

x y : there is q

F with i0

x:y
A A

i Wi

q q 1 1q and x u u1 y

i!) i W i i D k v kk v

u:v
A A

u :v
A A

i i i i !) ) a) a) kk

 





)k

(6.103)

x:y
A A

if

or

for some q u u1 v v1 : v v1 .

V 6) U i

1 i) dV !lk U

q y

qx

1 i



 



1 i

(6.102)

x:y
A A

V ) U

1 i

V 6) U i

Categorization and Phonology

501

Now we write a machine which simulates the actions of nal devoicing. It has one state, i0 , it is deterministic and the transition function consists in [b] : [p] , [d] : [t] , [g] : [k] as well as [z] : [s] and [v] : [f] . Everywhere else we have P : P , P a phoneme, . The success of the construction is guaranteed by a general theorem known as the Transducer Theorem. It says that the image under transduction of a regular language is again a regular language. The proof is not hard. First, by adding some states, we can replace the function : Q A Q B by a function : Q A Q B for some set Q . The details of this construction are left to the reader. Next we replace this function by the function 2 : Q A B Q . What we now have is an automaton over the alphabet A B . We now take over the notation from the Section 5.3 and write x y for the pair consisting of x and y. We dene

Denition 6.38 Let R be a regular term. We dene L 2 R as follows. (6.106b) (6.106c) (6.106d) (6.106e) L2 x y : L R S : L R S : L R
2 2 2

L R

A regular relation on A is a relation of the form L 2 R for some regular term R.

This is essentially a consequence of the Kleenes Theorem. In place of the alphabets A we have chosen the alphabet A B . Now observe that the transitions : do not add anything to the language. We can draw a lot of conclusions from this. Corollary 6.40 (Transducer Theorem) The following holds. Regular relations are closed unter intersection and converse.

}D

Theorem 6.39 A relation Z transducer such that L

A Z.

B is regular iff there is a nite state

V U

V U

aV U s V U

L R

L S

L R
2

L S

TV U 1 ) (V U 1 2 U i i

Ti

V lU

V U D V s U D S V U D iIS V 2 U i i V U

(6.106a)

L2 0 :

x y

x y

V U

i V i W BV i W 'U iU

V 2 U W V i 'U i i i

(6.105)

u v

w x :

u w

v x

0 p b')

( 8') 0 p U

0 ) (

0 ) ( 0 p 8')

( 8') 0 p V i

e U

( b') 0 p V U

% &

0 p 8') D D

T ') ) p 0bp') ( 8') 09 p ( b') ( 0 p

i i

S 1 H ( 0 p 8') (

502

The Model Theory of Linguistic Structures

If Z is a regular relation and H A a regular set then Z H also is regular. Z H : y : there is x H with x y Z One can distinguish two ways of using a transducer. The rst is as a machine which checks for a pair of strings whether they stand in a particular regular relation. The second, whether for a given string over the input alphabet there is a string over the output alphabet that stands in the given relation to it. In the rst use we can always transform the transducer into a deterministic one that recognizes the same set. In the second case this is impossible. The relation n : n is regular but there is no deterministic translation algorithm. One easily nds a language in which there is no deterministic algorithm in any of the directions. From the previous results we derive the following consequence.

Proof. By assumption and the previous theorems, both R C and A S are regular. Furthermore, R C A S x y z : x y R y z S is regular, and so is its projection onto A B , which is exactly R S. This theorem is important. It says that the composition of rules which dene regular relations denes a regular relation again. Effectively, what distinguishes regular relations from Type 1 grammars is that the latter allow arbitrary iterations of the same process, while the former do not. Notes on this section. There is every reason to believe that the mapping from phonemes to phones is not constant but context dependent. In particular, nal devoicing is believed by some not to be a phonological process, rather, it is the effect of a contextually conditioned change of realization of the voice feature (see (Port and ODell, 1985)). In other words, on the phonological level nothing changes, but the realization of the phonemes is changed, sometimes so radically that they sound like the realization of a different phoneme

e X 1 i i( "0 ) ) C0 !) ( 0 ) ) @S V e 8V 1 i i i i i( U t D e e

Corollary 6.41 (Kaplan & Kay) Let R A lar relations. Then R S A C is regular.

B and S

C be regu-

1 i i 0 ) (

1 i

2 Z :

y : there is x with x y

Z .

1d0 !) ( i i 1 i i d0 !) (

1 Z :

i IS

iIS D i IS D

e T

If Z

B is a regular relation, so are the projections x : there is y with x y Z ,

If H

A is regular so is H

B . If K

B is regular so is A

K.

) @S ( H H

Categorization and Phonology

503

(though in a different environment). This simplies phonological processes at the cost of complicating the realization map. The idea of eliminating features was formulated in (Kracht, 1997) and already brought into correspondence with the notion of implicit denability. Concerning long and short vowels, Hungarian is an interesting case. The vowels , , , , show length contrast alone, while the long and short forms of and also differ in lip attitude and/or aperture. Sauvageot noted in (1971) that Hungarian moved towards a system where length alone is not distinctive. Effectively, it moves to eliminate the feature . Exercise 220. Show that for every given string in a language there is a separation into syllables that conforms to the Syllable Structure constraint. Exercise 221. Let 0 : i : i n be a nite set of basic programs. Dene M : i : i n i : i n . Show that for every formula there is a modal formula over the set M of modalities such that . Remark. A modal formula is a formula that has no test, and no and ;. formula. Whence it can be seen as a Exercise 222. The results of the previous section show that there is a translaM into M . Obviously, the problematic symbols are tion of and . With respect to the technique shown above works. Can you suggest a perspicuous translation of ? Hint. holds if holds in the smallest set of worlds closed under successors containing the current world. This can be expressed in rather directly. Exercise 223. Show that in Theorem 6.32 the assumption of regularity is nec2n n :n essary. Hint. For example, show that the logic of L fails to have the global Bethproperty. Exercise 224. Prove Lemma 6.31. Exercise 225. One of the aims of historical linguistics is to reconstruct the afliation of languages, preferrably by reconstructing a parent language for a certain group of languages and showing how the languages of that group developed from that parent language. The success of the reconstruction lies in the establishment of socalled sound correspondences. In the easiest case they take the shape of correspondences between sounds of the various languages. Let us take the IndoEuropean (I.E.) languages. The ancestor of this language, called IndoEuropean, is not known directly to us, if it at all existed. The proof of its existence is among other the successful estabr " # t

"r

d Ijh

"r

U s 

s 

FT S s

U "r

h 7 7 Vg g

504

The Model Theory of Linguistic Structures

lishment of such correspondences. Their reliability and range of applicability have given credibility to the hypothesis of its existence. Its sound structure is reconstructed, and is added to the sound correspondences. (We base the correspondence on the written language, viz. transcriptions thereof.) IE Sanskrit
G G

Greek
G A g @8 9 A g $R 9 h A g 9 h H d #h G H 4 8 9h G A g A $D g g 5 G A f bh 4

Latin

Some sounds of one language have exact correspondences in another. For corresponds to across all languages. (The added star inexample, I.E. dicates a reconstructed entity.) With other sounds the correspondence is not so clear. I.E. and become in Sanskrit. Sanskrit in fact has multiple correspondences in other languages. Finally, sounds develop differently becomes Sanskrit , but it in different environments. In the onset, I.E. becomes at the end of the word. The details need not interest us here. Write a transducer for all sound correspondences displayed here. Exercise 226. (Continuing the previous exercise.) Let L i , i n, be languages over alphabets Ai . Show the following: Suppose R is a regular relation between Li , i n. Then there is an alphabet P, a protolanguage Q P , and regular relations Ri P Ai , i n, such that (a) for every x P there is exactly one y such that x Ri y and (b) Li is the image of P under Ri . Exercise 227. Finnish has a phenomenon called vowel harmony. There are three kinds of vowels: back vowels ([a], [o], [u], written , and , respectively), front vowels ([], [], [y], written , and , respectively) and neutral vowels ([e], [i], written and ). The principle is this.
Vowel harmony (Finnish). A word contains not both a back and a front harmonic vowel.

(meaning) warm sheep his seven ten new gender sleep

The vowel harmony only goes up to the word boundary. So, it is possible to combine two words with different harmony. Examples are share
V n4 D

1 i

h d H 9bA g

g H

AF9 g A 7 f A 7 9 h `R$R A 7 p g 9 f h h f $FA h 4 8 h A 7 7 @`A A $D g A F7 f 5 g X

g VH

H f H 5 i e
G D

HG1H A f 9 8 H 9 RH G ` H 9 H HH A H4FA 8 H G ` H ` A H h i H
`

A g @9h A 9 8 7 A g @R 9 h A g R9 7 h f )9h ( d f $FA 4 8 h A g A 7 A $D 7 g g 5 h' A f 8R


8

Axiomatic Classes II: Exhaustively Ordered Trees

505

holder company. It consists of the back harmonic word share and the front harmonic word society. First, give an denition of strings that satisfy Finnish vowel harmony. It follows that there is a nite automaton that recognizes this language. Construct such an automaton. Hint. You may need to explicitly encode the word boundary.
V q4 D

4.

Axiomatic Classes II: Exhaustively Ordered Trees

The theorem by B chi on axiomatic classes of strings has a very interesting u analogon for exhaustively ordered trees. We shall prove it here; however, we shall only show those facts that are not proved in a completely similar way. Subsequently, we shall outline the importance of this theorem for syntactic theory. The reader should consult Section 1.4 for notation. Ordered trees are structures over a language that has two binary relation symbols, and . We also take labels from A and N (!) in the form of constants and get the b language . In this language the set of exhaustively ordered trees is a nitely axiomatizable class of structures. We consider rst the postulates. is transitive and irreexive, x is linear for every x, and there is a largest element, and every subset has a largest and a smallest element with respect to . From this it follows in particular that below an arbitrary element there is a leaf. Here are now the axioms listed in the order just described.

In what is to follow we use the abbreviation x y : x y x y. Now we shall lay down the axioms for the ordering. is transitive and irreexive, it is linear on the leaves, and we have x y iff for all leaves u below x and all leaves v y we have u v. Finally, there are only nitely many leaves, a fact which we can express by requiring that every set of nodes has a smallest and

(6.107f)

y Px

Py

VaV U V aV U

@V U V U @V U V U

(6.107e)

y Px

(6.107d)

y y

x y

Py

(6.107c)

xyz x

y x

z y

(6.107b)

~V V U U ~ UV V U U ~U V V V U U V V U

~ U ~ U U ~U ~U ~ U

(6.107a)

xyz x

y y

z y

!r h d H 9bA g

q g

506

The Model Theory of Linguistic Structures

(6.108b) (6.108c) (6.108d) (6.108e)

xy b x xy x P

x Px

bx

y Py y Py

z Pz z Pz

y z

Thirdly, we must regulate the distribution of the labels.

The fact that a tree is exhaustively ordered is described by the following formula.

The class of ordered trees. The class of nite exhaustively ordered trees. Likewise we can dene a quantied modal language. However, we shall change the base as follows, using the results of Exercise 23. We assume 8 operators, M8 : , which correspond to the relations , , , , immediate left sister of, left sister of, immediate right sister of, as well as right sister of. These relations are MSOdenable from the original ones, and conversely the original relations can be MSOdened from the present ones. Let T be an exhaustively ordered tree.

h T o) o) o) o) o) o) o) o S D

Proposition 6.42 The following are nitely

axiomatizable classes.

q r

0 j) i) (

RV

U V U

~ U

(6.110)

xy

y y

V a0

V U

"V U V U ( U ~

(6.109c)

x :

V a0

V U ( 2V U 1

U V

~ U

(6.109b)

bx

A x :A

V a0

V U 2V U V U ( 1 U ~

(6.109a)

x bx

a x :a

y y

V aV

TVV aaV 0 U U ~U U V U V V U UV VV aaV j U ~ U V U UV UV U V U VaV U "V U V ~@V U U S @V U @V U V  I j V U ~U U V j j 8V U V U V U D V j U V U V


x x by x y x y uv b u
j j j

~U ~U ~U ~U ~ U

(6.108a)

xyz x

y y

y y

x bv

a largest member (with respect to

). We put b x :

V U U V U D
y y

x.

Axiomatic Classes II: Exhaustively Ordered Trees

507

xR xR (6.111) xR xR xR xR xR

y: y: y: y: y: y: y:

xR

y y

xR xR

xR

y y y y

The resulting structure we call M . Now if T as well as R are given, then the relations , , , , and , as well as are denable. First we dene : , and likewise for the other relations. Then R R . R R

Analogously, as with the strings we can show that the following properties are axiomatizable: (a) that R is transitive and irreexive with converse relation R ; (b) that R is the transitive closure of R and R the transitive closure of R . Likewise for R and R ,R and R . With the help of the axiom below we axiomatically capture the condition that x is linear: The other axioms are more complex. Notice rst the following.

Hence the following denitions. (6.114) (6.115)

o o o

o o o

: :

Lemma 6.43 Let T be an exhaustively ordered tree and x y Then x y iff (a) x y or (b) x y or (c) x y or (d) x y.

(6.113)

p q

V oU

V oU

o U o V o U o 5V

V oU

V oU

V oU

V oU

V oU V oU

U o

XV o U XV o U

D D o o D

V oU V oU V oU

0 j) i) (

I o

0 ( D

(6.112)
j

V oU V oU

X @V o U X@V o U V oU V oU

T.

V o U D

V lU

V V oU

X V o U

z x

z y

V V oU
y y

X V o U

U V oU V U U 0 U V oU U U V j

xR

y:

V U

D D D D D D D D

Then we dene R : M8

T as follows.
y z x z y z

V oU V oU V oU V oU V oU V oU V oU V oU
h

o D D
}

V oU

V oU o

508

The Model Theory of Linguistic Structures

So we add the following set of axioms.

(Most of them are already derivable. The axiom system is therefore not minimal.) These axioms see to it that in a connected structure every node is reachable from any other by means of the basic relations, moreover, that it is reachable in one step using R . Here we have

Notice that this always holds in a tree and that conversely it follows from the above axioms that R possesses a largest element. Now we put

b holds at a node x iff x is a leaf and is true exactly at x. Now we can must be linear on the set of axiomatically capture the conditions that R leaves. (6.119) bq p p

Finally, we have to add axioms which constrain the distribution of the labels. The reader will be able to supply them. A forest is dened here as the disjoint union of trees.

b b can be embedded into . The converse is as We already know that usual somewhat difcult. To this end we proceed as in the case of strings. We , introduce an analogon of restricted quantiers. We dene functions , , , , , , , as well as on unary predicates, whose meaning should be self explanatory. For example

V U

~ U

V U

(6.121)

O :

V U

where y

fr . Finally let O be dened by x

(6.120b)

x :

x y

V U V V U V

oU o U

(6.120a)

x :

x y

s 

Proposition 6.44 The class of exhaustively ordered forests is nitely axiomatisable.


q r
s

V o U

0 ( D

V U 0 ` ( D

U V V U D U V V U D

s 

V U

(6.118)

b :

` D

0 ) @S (

V oU

V } rU

(6.117)

x y : there is z : x

T
b

} } j

} k)

} ` }

} k)

} k)

V } rU

`
}

} k)

` D

} S o

(6.116)

V U

Axiomatic Classes II: Exhaustively Ordered Trees

509

O says that x is nowhere satisable. Let Px be a predicate variable which does not occur in . Dene Px x inductively as described in Section 6.2. Let Px x . Then we have

Therefore put

Let again h : P PV be a bijection from the set of predicate variables of b b onto the set of proposition variables or .

b Theorem 6.45 Let be an formula with at most one free variable, b the object variable x0 . Then there exists a formula M such that for all exhaustively ordered trees :

b b Corollary 6.46 Modulo the identication M and deis a ne the same model classes of exhaustively ordered trees. Further: b structures iff M is a nitely axiomnitely axiomatizable class of b atizable class of structures.

s 

q r

V lU

V lU

0 aV

U lU )V

q r

s 

0 ) l(

(6.126)

x0 iff M

x0

s 

s 

q g

Then the desired embedding of


s

1 2 : hP
q r

V aV U U

1 2 P

o D

(6.125)

O 1 2 P

: : 1 2 : hP
: is shown.

into

x0

~ V ~U aV U U V V aU D U V D } D o D o D

o D

V U

VaV U U V aV o U V oU V oU V oU V aV U U

U V V aU U V V U V oU V oU o U V V o U V aV U U

ax

Qa

Py

hP

U 0 ) %l( )

s 

V 0 ) ) l( U

(6.124)
q r

Ex

Because of this we have for all exhaustively ordered trees

IV

@V

U V

V U V

(6.123)

Ex x :

Px

O Px

O Px

Px

0 ( D

S C0 ) l( )

C0 ) l( )

(6.122)

T V U

V U D

V U

Px x

Px x

510

The Model Theory of Linguistic Structures

For the purpose of denition of a code we suspend the difference between terminal and nonterminal symbols.
b N A R be a CFG and Denition 6.47 Let G a constant formula (with constants over A). We say, G is faithful for if there is a set H N such that for every tree and every node w T : w iff b w H . We also say that H codes with respect to G. Let be a formula and n a natural number. An ncode for is a pair G H such that LB G is the set of all at most nary branching, nite, exhaustively ordered trees over A N and H codes in G. is called ncodable if there is an ncode for . is called codable if there is an ncode for for every n.

Notice that for technical reasons we must restrict ourselves to at most n branching trees since we can otherwise not write down a CFG as a code. Let G N A R and G N A R be grammars over A. The product is dened by

where

To prove the analogon of the Coding Theorem (6.21) for strings we shall have to use a trick. As one can easily show the direct extension on trees is false since we have also taken the nonterminal symbols as symbols of the language. So we proceed as follows. Let h : N N be a map and T a tree hA where hA N : h and with labels in A N. Then let h : T hA a : a for all a A. Then h is called a projection of . If is a class of trees, then let h : h : . Theorem 6.48 (Thatcher & Wright, Doner) Let A be a terminal alphabet, N a nonterminal alphabet and n . A class of exhaustively ordered, at most b iff it is nbranching nite trees over A N is nitely axiomatizable in the projection onto A N of a context free class of ordered trees over some alphabet. Here a class of trees is context free if it is the class of trees generated by some CFG . Notice that the symbol is not problematic as it was for regular
q r
s

0 3) j) Di) (

Tk

k aa k

k ) 1 aa ( aa@0 k ) k )

03 EX

T 1 y Q ) j) i) ( Q k D

( 0 k ) @S (

(6.128)

R :

X X

0 0 n 1 n 0 n 1 R X

0k

) )k

)k

(6.127)

N AR

0 ) ( s C0 ) l( 1
:

s  31

0 k ) k k ( ) )

0 ) ) ) (

0 ) ) ) (

V U

1 U V b3 V U D

Axiomatic Classes II: Exhaustively Ordered Trees

511

languages. We may look at it as an independent symbol which can be the label of a leaf. However, if this is to be admitted, we must assume that the terminal alphabet may be A and not A. Notice that the union of two context free sets of trees is not necessarily itself context free. (This again is different for regular languages, since the structures did not contain the nonterminal symbols.) From now on the proof is more or less the same. First one shows the b codability of formulae. Then one argues as follows. Let G H be the code of a formula . We restrict the set of symbols (that is, both N as well as A) to H. In this way we get a grammar which only generates trees that satisfy . Finally we dene the projection h : H A N as follows. Put h a : a, a A, and h Y : X if LB G x Y x X x . In order for this to be well dened we must therefore have for all Y H an X N with this property. In this case we call the code uniform. Uniform codability follows easily from codability since we can always construct products G G of grammars so N A R and LB G G X x y iff LB G X x . The map that G h is nothing but the projection onto the rst component.

Proof. We only deliver a sketch of the proof. We choose an n and show the uniform ncodability. For ease of exposition we illustrate the proof for n 2. For the formulae a x , a A, and Y x , Y N, nothing special has to be done. Again, the booleans are easy. There remain the modal operators and the quantiers. Before we begin we shall introduce a somewhat more convenient notation. As usual we assume that we have a grammar G N A R as well as some sets H for certain formulae. Now we take the product with a new grammar and dene H . In place of explicit labels we now use the formulae themselves, where stands for the set of labels from H . The basic modalities are as follows. Put

where R2 consists of all possible nbranching rules of a grammar in standard form. To code , we form the product of G with 2. However, we only choose a subset of rules and of the start symbols. Namely, we put : 0 1 and H : H 0 1 ,H : N 1 . The rules are all rules of the

T e S

) bT ) 'bT ) !( ) S) S

(6.129)

2:

01

0 1 A R2

0 ) ) ) (

V U

s 

Theorem 6.49 Every constant

formula is uniformly codable.

V U

) ( V U

"V U k e

V aV U

V a0 ) aU (

V U V fCV U U ~U V k
o

T ) e S

V U

0 ) ) ) (

s 
o

V U D D k

T ) S

512 form

The Model Theory of Linguistic Structures

We proceed to the transitive relations. Notice that on binary branching trees and . Now let us look at the relation .

o U V o o

(6.134)

(6.133)

Likewise,

is the start symbol of G in the case of

(6.132)

T e S

With

we choose

0 .

(6.131)

T e S

Now we proceed to

. Here

0 .

(6.130)

Axiomatic Classes II: Exhaustively Ordered Trees

513

0 1 . Next we look at

The set of start symbols is : 0 . Finally we study the quantier p . Let : p , where is a new constant. Our terminal alphabet is now A 0 1 , the nonterminal alphabet 1 H 1 is a uniform code for , an arbitrary N 0 1 . We assume that G subformula of . For every subset of the set of all subformulae of we put

1 Then G1 H is a code for L where

(6.138)

1 1 1 where X i H , Y0 j0 H , Y1 j1 H and 0 1 is a rule of 0 1 G1 . (This in turn is the case if there are X, Y0 and Y1 as well as i, j0 and j1 Y0 j0 Y1 j1 R.) Likewise for unary rules. Now we go such that X i 3 , with N 3 : N N 1 . Here we take all rules over to the grammar G of the form

(6.139) Y0

0 ) ( 0 ) 0 y) (

Y1

V aV

1 0

D C0 1

Y0 j0 0

1 y0

( d0

) 0 ) ) ( 0 ) ) (

X i

Y1 j1 1

e T ) e S

Now we build a new CFG , G2 . Put N 2 : G2 are all rules of the form

(6.137)

1 H :

1 H

V wD

( 0 ) (

(6.136)

L :

1 H

01

N 1 . The rules of

T ) e S k

V U T e S

(6.135)

o o o U

T ) e S

The set of start symbols is

o D 0 1 y0 ) ( ( ( ) T ) e S (

514 where

The Model Theory of Linguistic Structures

is the set of all for which there are 0 , 1 and i, j0 , j1 such that

(6.140)

is a rule of G2 . Notes on this section. From complexity theory we know that CFLs, being in PTIME, actually possess a description using rst order logic plus inationary xed point operator. This means that we can describe the set of strings in L G for a CFG by means of a formula that uses rst order logic plus inationary xed points. Since we can assume G to be binary branching and invertible, it sufces to nd a constituent analysis of the string. This is a set of subsets of the string, and so of too high complexity. What we need is a rst order description of the constituency in terms of the string alone. The exercises describe a way to do this. Exercise 228. Show the following: is denable from , likewise . Also, trees can be axiomatized alternatively with (or ). Show furthermore that in ordered trees is uniquely determined from . Give an explicit denition.
j

Exercise 229. Let x L y if x and y are sisters and x trees L can be dened with and conversely.

y. Show that in ordered

Exercise 230. Let be a tree over A and N such that every node that is not preterminal is at least 2branching. Let x x 0 xn 1 be the associated C iff the least node above x i string. Dene a set C n3 as follows. i j k and x j is lower than the least node above xi and xk . Further, for X N, dene LX n2 by i j LX iff the least node above xi and x j has label X. Show that C uniquely codes the tree structure and L X , X N, the labelling. Finally, for every a A we have a unary relation Ta n to code the nodes of category a. Axiomatize the trees in terms of the relations C, L X , X N, and Ta , a A. Exercise 231. Show that a string of length n possesses at most 2 cn different constituent structures for some constant c.
3

V U

aa

1 y0 ) ) ( i

Y0 j0 0

) ( 0 ) ) 0 ) ) ( }

X i

Y1 j1 1

1 C0 ) (

( }

Transformational Grammar

515

5.

Transformational Grammar

In this section we shall discuss the socalled Transformational Grammar, or TG. Transformations have been introduced by Zellig Harris. They were operations that change one syntactic structure into another without changing the meaning. The idea to use transformations has been adopted by Noam Chomsky, who developed a very rich theory of transformations. Let us look at a simple example, a phenomenon known as topicalization. (6.141) (6.142)
yA5$P @bH h d D 5 5 A 9 D H 5 F$s A 9D H 5 4 A h d D $@k1P H 5 5 | y 

We have two different English sentences, of which the rst is in normal serialization, namely SVO, and the second in OSV order. In syntactic jargon we say that in the second sentence the object has been topicalized. (The metaphor is by the way a dynamic one. Speaking statically, one would prefer to express that differently.) The two sentences have different uses and probably also different meanings, but the meaning difference is hard to establish. For the present discussion this is however not really relevant. A transformation is a rule that allows us for example to transform (6.141) into (6.142). TransformaSC. Here SD stands for structural description tions have the form SD and SC for structural change. The rule TOP, for topicalization, may be formulated as follows.
|

(6.143)

NP1 V NP2 Y

This means the following. If a structure can be decomposed into an NP followed by a V and a second NP followed in turn by an arbitrary string, then the rule may be applied. In that case it moves the second NP to the position immediately to the left of the rst NP. Notice that Y is a variable for arbitrary strings while NP and V are variables for constituents of category NP and V, respectively. Since a string can possess several NPs or Vs we must have for every category a denumerable set of variables. Alternatively, we may write W NP . This denotes an arbitrary string which is an NPconstituent. We agree that (6.143) can also be applied to a subtree of a tree (just as the string replacement rules of Thueprocesses apply to substrings). Analogously, we may formulate also the reversal of this rule: (6.144) NP2 NP1 V Y

NP2 NP1 V Y

NP1 V NP2 Y

516

The Model Theory of Linguistic Structures

However, one should be extremely careful with such rules. They often turn out to be too restrictive and often also too liberal. Let us look again at TOP. As formulated, it cannot be applied to (6.145) and (6.147), even though topicalization is admissible, as (6.146) and (6.148) show. (6.146) (6.147) (6.148)

The problem is that in the SD V only stands for the verb, not for the complex consisting of the verb and the auxiliaries. So, we have to change the SD in such a way that it allows the examples above. Further, it must not be disturbed by eventually intervening adverbials. German exemplies a construction which is one of the strongest arguments in favour of transformations, namely the socalled V2phenomenon. In German, the verb is at the end of the clause if that clause is subordinate. In a main clause, however, the part of the verb cluster that carries the inection is moved to second position in the sentence. Compare the following sentences. ..., that Hans his car repairs. Hans repairs his car.
G H S S a

(6.150) (6.151)

..., that Hans not into the opera go wants. Hans wants not into the opera go.
G 5 G

..., that Hans in class rarely PREFattention.pay. Hans attention.pay in class rarely PREF.
G H

As is readily seen, the auxiliaries and the verb are together in the subordinate clause, in the main clause the last of the series (which carries the inection) moves into second place. Furthermore, as the last example illustrates, it can

(6.154)

y X

7 H 9 h 4 P h A RR9$4

D n

5 5 h 4 @8$$9

D 4

H 8 A 9 IH

) iaa

(6.153)

y 4

H 7 H 9 h 4 P h A 8 X I$#4

D n

5 5 h 4 b9

G 5

(6.152)

P P D 9 h R 5 h @8nh $b8

9 h R 5 h h 88

h bD

g 7 4

h bD

9 D

9D h A 4 5 h D 5 H 8 h 5 A 9 !bbnFIH

9 D

D E

9 A 9 FRH

A 9 FRH

9 P P D A 9 bkIH

) iaa

(6.149)

y 4 '

5 h D 5 H 8 h 8n9$5 g 7 4

9D h A A 9 !FRH

(6.145)

p @bH yh1$P $bh A d D P 9 D H 4 5 5 5 yF9D$k5$P $78h A H 5 4 A h d D P 9D H 4 5 G y0h5$P 4 FD f @bH d D R 5 5 G A 9 D H 5 4 h d D @1P 4 9D R


y 

AF$s 9 D H 5 p H 5 5  AF9$s D H 5 5 5 H ) iaa

Transformational Grammar

517

happen that certain prexes of the verb are left behind when the verb moves. In transformational grammar one speaks of V2movement. This is a transformation that takes the inection carrier and moves it to second place in a main clause. A similar phenomenon is what might be called damit- or davorsplit, which is found mainly in northern Germany.
` G 5 G G

(6.155)

DA

has he me always VOR warned.


4

He has always warned me of that.


DA

could I simply not MIT reckon.

I simply could not reckon with that. We leave it to the reader to picture the complications that arise when one wants to formulate the transformations when V2movement and damit- or davorsplit may operate. Notice also that the order of application of these rules must be reckoned with. A big difference between V2movement and damitsplit is that the latter is optional and may apply in subordinate clauses, while the former is obligatory and restricted to main clauses.

(6.158) (6.159) (6.160)

In (6.158) we have reversed the effect of both transformations of (6.155). The sentence is ungrammatical. If we only apply V2movement, however, we get (6.157), which is grammatical. Likewise for (6.160) and (6.159). In contrast to Harris, Chomsky did not construe transformations as mediating between grammatical sentences (although also Harris did allow to pass through illegitimate structures). He insisted that there is a two layered process of generation of structures. First, a simple grammar (context free, preferrably) generates socalled deep structures. These deep structures may be seen as the canonical representations, like Polish Notation or inx notation, where the meaning can be read off immediately. However, these structures may not be legitimate objects of the language. For example, at deep structure, the verb of

f f

G p 58i 9D h 4 !1D f H G p H@i 9!$$@9 g d D h h 4 9 G Hp G 5 f 8h @f D D f 5

G 5

(6.157)

G 5p G Hp G 5p y0$$9 g R9 h 4 9 d 9 h h 5 $4 9 D H G5p Hp G G 5p y 99 h h 4 51D f H 4 9 D H G ` y9H 4 4 9 5 H h R 8b#5 g H ` 9 5 H h R bb#$5 g H bh 5

G 5

y 4 $

G 5

G 5

G 5

(6.156)

9 h R9

h 5 4 $FD f

y 4 '$

9 5 H h R 89$5 g

D E

5 f 8h f

H X !h 9D

p f D 5 h 4 89H
D  4

p f D

h 9  9 g

4 9

H H 5

518

The Model Theory of Linguistic Structures

a German sentence appears in nal position (where it arguably belongs) but alas these sentences are not grammatical as main clauses. Hence, transformations must apply. Some of them apply optionally, for example damit- and davorsplit, some obligatorily, for example V2movement. At the end of the transformational cycle stands the surface structure. The second process is also called (somewhat ambiguously) derivation. The split between these two processes has its advantages, as can be seen in the case of German. For if we assume that the main clause is not the deep structure, but derived from a deep structure that looks like a surface subordinate clause, the entire process for generating German sentences is greatly simplied. Some have even proposed that all languages have universally the same deep structure, namely SVO in (Kayne, 1994); or right branching, allowing both SVO and SOV deep structure. The latter has been defended in (Haider, 2000) (dating from 1991) and (Haider, 1995; Haider, 1997). Since the overwhelming majority of languages belongs to either of these types, such claims are not without justication. The differences that can be observed in languages are then caused not by the rst process, generating the deep structure, but entirely by the second, the transformational component. However, as might be immediately clear, this is on the one hand theoretically possible but on the other hand difcult to verify empirically. Let us look at a problem. In German, the order of nominal constituents is free (within bounds). (6.162) (6.163) (6.164)

The father gives a dog to the son.

How can we decide which of the serializations are generated at deep structure and which ones are not? (It is of course conceivable that all of them are deep structure serializations and even that none of them is.) This question has not found a satisfactory answer to date. The problem is what to choose as a diagnostic tool to identify the deep structure. In the beginning of transformational grammar it was thought that the meaning of a sentence is assigned at deep structure. The transformations are not meaning related, they only serve to make the structure speakable. This is reminiscent of Harris idea that transformations leave the meaning invariant, the only difference being that Harris conceived of transformations as mediating between sentences of the

(6.161)

G Hp G y 5b4#H U 8h h 5 9 7 $h 9 g } f h 9 h 9 D h 4 d 9 A G Hp G y 9@7 9R!b#H U 8h h 9 g } f h h 9D h 5 h 4 5 4 d 9 A G G f h b#H U 8h h Hp A 7 D G y9 g } 5 h 4 5 4 d 9 9 9 h 9 G G 5p 9 h 9D h R!9 g } f h Rh 8$#H U bh 4 d 9 A 5 h 4 5
y

9 @7

Transformational Grammar

519

language. Now, if we assume this then different meanings in the sentences sufce to establish that the deep structures of the corresponding sentences are different, though we are still at a loss to say which sentence has which deep structure. Later, however, the original position was given up (on evidence that surface structure did contribute to the meaning in the way that deep structure did) and a new level was introduced, the socalled Logical Form (LF), which was derived from surface structure by means of further transformations. We shall not go into this, however. Sufce it to say that this increased even more the difculty in establishing with precision the deep structure(s) from which a given sentence originates. Let us return to the sentences (6.161) (6.164). They are certainly not identical. (6.161) sounds more neutral, (6.163) and (6.162) are somewhat marked, and (6.164) nally is somewhat unusual. The sentences also go together with different stress patterns, which increases the problem here somewhat. However, these differences are not exactly semantical, and indeed it is hard to say what they consist in. Transformational grammar is very powerful. Every recursively enumerable language can be generated by a relatively simple TG. This has been shown by Stanley Peters and R. Ritchie (1971; 1973) . In the exercises the reader is asked to prove a variant of these theorems. The transformations that we have given above are problematic for a simple reason. The place from which material has been moved is lost. The new structure is actually not distinguishable from the old one. Of course, often we can know what the previous structure was, but only when we know which transformation has been applied. However, it has been observed that the place from which an element has been moved inuences the behaviour of the structure. For example, can be contracted to in American Chomsky has argued that English; however, this happens only if no element has been placed between and during the derivation. For example, contraction is permitted in (6.166), in (6.168) however it is not, since was the subject of the lower innitive (standing to the left of the verb), and had been raised from there. (6.165) (6.166) (6.167) (6.168)
` G G y h 9 g PcFh H$@P 9IhIH f h $$D H A 7 h H 9 H 9 4 A D A s ` G G y0h9 g H 7h H@P g 49RhRH f h $$D P A h 4 H 9 4 A D A s ` y h @Ih Q H h P H 9 9 H ` h @P g $Ih Q H h 4 4 9 H

H 9 9 H I

9 IH f h

4 4 $

9 H I

4 $

9 H I

520

The Model Theory of Linguistic Structures

The problem is that the surface structure does not know that the element has once been in between and . Therefore, one has assumed that the moved element leaves behind a socalled trace, written t. For other reasons the trace also got an index, which is a natural number, the same one that is given to the moved element (= antecedent of the trace). So, (6.142) really looks like this.


We have chosen the index 1 but any other would have done equally well. The indices as well as the t are not audible, and they are not written either (except in linguistic textbooks, of course). Now the surface structure contains traces and is therefore markedly different from what we actually hear or read. Whence one assumed nally that there is a further process turning a surface structure into a pronounceable structure, the socalled Phonological Form (PF). PF is nothing but the phonological representation of the sentence. On PF there are no traces and no indices, no (or hardly any) constituent brackets. One of the most important arguments in favour of traces and the instrument of coindexing was the distribution of pronouns. In the theory one distinor ) from anaphors. guishes referential expressions (like , ) as well as reexive pronouns To the latter belong pronouns ( , ( ). The distribution of these three is subject to certain rules which are regulated in part by structural criteria.

(6.170)

is a subjectoriented anaphor. Where it appears, it refers to the subject of the same sentence. Semantically, it is interpreted as a variable which is identical to the variable of the subject. As (6.171) shows, the domain of a reexive ends with the nite sentence. The antecedent of must in be taken to be John, not Harry. Otherwise, we would have to have place of . (6.172) shows that sometimes also phonetically empty pronouns can appear. In other languages they are far more frequent (for example in Latin or in Italian). Subject pronouns may often be omitted. One

P h $A f

P h A f

(6.172)
X

(6.171)

G ` X PA f D h 5 g X @P D7AF9 g HAh$5h g 4 h b$h h 8 h D P y X A P h G e G ` 5 g X h@P D9 g HA5 H9 g 49H 4 h b$h A 8 h A h D P G P h $RA f D 1P A h d D

9 D H 5 4 $h

h 7 g 5 5 H

y X

(6.169)

t1

4 $

A h d D 1$P

9 H I

5 5 H

A 9 D H 5 Fs

5 5 H

5H 5 5 5 H

P h A h $R9 g

P h A f

9 IH f
D

Transformational Grammar

521

says that these languages have an empty pronoun, called pro (little PRO). Additionally to the question of the subject also structural factors are involved.

We may understand (6.173) in two ways: either Peter was driving his (= Peters) car or Nats car. (6.174) allows only one reading (on condition that his refers to Nat, which it does not have to): Peter was happy about Nats car, not Peters. This has arguably nothing to do with semantical factors, but only with the fact that in the rst sentence, but not in the second, the pronoun is bound by its antecedent. Binding is dened as follows. Denition 6.50 Let be a tree with labelling . x binds y if (a) the smallest branching node that properly dominates x dominates y, but x does not dominate y, and (b) x and y carry the same index. The structural condition (a) of the denition is called ccommand. (A somewhat modied denition is found below.) The antedecent ccommands the pronoun in case of binding. In (6.173) the pronoun is ccommanded by . For the smallest constituent properly containing is the entire senis not ccommanded by . (This is of tence. In (6.174) the pronoun course not entirely clear and must be argued for independently.) There is a rule of distribution for pronouns that is as follows: the reexive pronoun has to be bound by the subject of the sentence. A nonreexive pronoun however may not be bound by the subject of the sentence. This applies to German as well as to English. Let us look at (6.175).
 X 3

(6.175)

If this sentence is grammatical, then binding is computed not only at surface structure but at some other level. For the pronoun is not c commanded by the subject . The structure that is being assumed is [ [ ]]. Such consideration have played a role in the introduction of traces. Notice however that none of the conclusions is inevitable. They are only inevitable moves within a certain theory (because it makes certain assumptions). It has to be said though that binding was the central diagnostic tool of transformational grammar. Always if it was diagnosed that there was no ccommand relation between an anaphor and some
X

P h $A f

4 #

g g
4 # G

A D

A h d D 1P

A $D

5 5 H

5 5 @bH

A h d D 1P

P h A f

5 5 @bH

(6.174)

g g

4 #

(6.173)

 G 5 h 4 h b#G IH 7FH 9 8 8 p 5 h 4 h 8$$G RH D 9 H 5 A

R $ D

p h H f bH D 5 A A H 4 9H
D n

P h $RA f

4 9

522

The Model Theory of Linguistic Structures

element one has concluded that some movement must have taken place from a position, where ccommand still applied. In the course of time the concept of transformation has undergone revision. TG allowed deletions, but only if they were recoverable: this means that if one has the output structure and the name of the transformation that has been applied one can reconstruct the input structure. (Effectively, this means that the transformations are partial injective functions.) In the socalled Theory of Government and Binding (GB) Chomsky has banned deletion altogether from the list of options. The only admissible transformation was movement, which was later understood as copy and delete (which in effect had the same result but was theoretically a bit more elegant). The movement transformation was called Move and allowed to move any element anywhere (if only the landing site had the correct syntactic label). Everything else was regulated by conditions on the admissibility of structures. Quite an interesting complication arose in the form of the socalled parasitic gaps.

We are dealing here with two verbs, which share the same direct object ( and ). However, at deep structure only one them could have as its object and so at deep struchad the overt object phrase ture we either had something like (6.177) or something like (6.178).
X

It was assumed that essentially (6.177) was the deep structure while the verb (in its form , of course) just got an empty coindexed object.
G G 5

However, the empty element is not bound in this conguration. English does not allow such structures. The transformation that moves the whconstituent at the beginning of the sentence however sees to it that a surface structure the pronoun is bound. This means that binding is not something that is decided at deep structure alone but also at surface structure. However, it cannot be one of the levels alone (see (Frey, 1993)). We have just come to see that deep structure alone gives the wrong result. If one replaces

A 5 h 8 H bF8

G 5

(6.179)

e1

R ! D

9 H h 5 4 c$7 g

4 D 1

A 5 h 8 H bF8

R ! D

h P @8D X 9 H h 5

7 g

(6.178)

7 g

(6.177)

G Hp G 3A58hF8 8 H D H h 5 4 $$$7 g

R 9 r$D

$ D R G 4 D F

G 9 H h 5 4 $$$7 g G 5p A 5 h 8 H 8F8

A 5 h 8 H 8F8

G H


D G

4FD G

h88D P h P 88D

H h 5 g 7 g

G 5

Q 4

(6.176)

R 9 P!D

H h 5 4 7 g

4 D Fn

h P @bD

7 g

A 5 h 8 H bF8

H h $$5 g

h P @bD

Transformational Grammar
X

523

by then we have an example in which the binding conditions apply neither exclusively at deep structure nor exclusively at surface structure. And the example shows that traces form an integral part of the theory. A plethora of problems have since appeared that challenged the view and the theory had to be revised over and over again in order to cope with them. One problem area were the quantiers and their scope. In German, quantiers have scope more or less as in the surface structure, while in English matters are different (not to mention other languages here). Another problem is coordination. In a coordinative construction we may intuitively speaking delete elements. However, deletion is not an option any more. So, one has to assume that the second conjunct contains empty elements, whose distribution must be explained. The deep structure of (6.180) is for example (6.181). For many reasons, (6.182) or (6.183) would however be more desirable.

Karl has Maria a bicycle stolen and Peter a radio. Karl has stolen a bicycle from Maria and a radio from Peter. e1 e2
G

We shall conclude this section with a short description of GB. It is perhaps not an overstatement to say that GB has been the most popular variant of TG, so that it is perhaps most fruitful to look at this theory rather than previous ones (or even the subsequent Minimalist Program). GB is divided into several subtheories, socalled modules. Each of the modules is responsible for its particular set of phenomena. There is Binding Theory, the ECP (Empty Category Principle), Control Theory,

(6.183)

(6.182)

(6.181)

G G y 4 '#H 9 h @P g 1R g D !8#G 7 4 A h H p 9 D h 5 h 4 h 9 G G G e 4 #H 9 h R@P g 5$R @5 H $8qbH 4 A h H 5 9 D h H D 5 P 5 H G G y 4 9H 9 h R@P g 5$R 4 A h g R G e 4 7 S 9D$h8$$G 5 h 4 h 9 7 @5 H $8qbH H 5 9 D h H D 5 P 5 H g D !b$G y H p 9D h 5 h 4 h 9 7 G G e 9 h @P g 1R @5 H $8qbH 4 A h H 5 9 D h H D 5 P 5 H

4 9

g D

(6.180)

5 h 4 h 8$$G

G G 9 7 9 h c8P g 1$R 5 H 4 A h H 5

P h A 5 $R7 g

9 D h H D 5 !c8nH

4 

7 g H 8F8 5 h 8 H
4 9

H!h p 9 D G H H P 5

G 5

524

The Model Theory of Linguistic Structures

Bounding Theory, the Theory of Government, Case Theory, Theory, Projection Theory. The following four levels of representation were distinguished. DStructure (formerly deep structure), SStructure (formerly surface structure), Phonetic Form (PF) and Logical Form (LF). There is only one transformation, called Move . It takes a constituent of category ( arbitrary) and moves it to another place either by putting it in place of an empty constituent of category (substitution) or by adjoining it to a constituent. Binding Theory however requires that trace always have to be bound, and so movement always is into a position ccommanding the trace. Substitution is dened as follows. Here X and Y are variables for strings and category symbols. i is a variable for a natural number. It is part of the representation (more exactly, it is part of the label, which we may construe as a pair of a category symbol and a set of natural numbers). i may occur in the left hand side (SC) namely, if it gures in the label . So, if C I ,C a category label and I , i : C I i .

Adjunction is the following transformation.

Both rules make the constituent move leftward. Corresponding rightward rules can be formulated analogously. (In present day theory it is assumed that movement is always to the left. We shall not go into this, however.) In both cases the constituent on the right hand side, X i , is called the antecedent of the trace, ti . This terminology is not arbitrary: traces in GB are

4 |

(6.185)

Adjunction:

X Y

X ti

(6.184)

Substitution:

X eY Z

X Z

iY

ti

0 ) (

| D 0 S IT s ) (

Transformational Grammar

525

considered as anaphoric elements. In what is to follow we shall not consider adjunction since it leads to complications that go beyond the scope of this exposition. For details we refer to (Kracht, 1998). For the understanding of the basic techniques (in particular with respect to Section 6.7) it is enough if we look at substitution. As in TG, the Dstructure is generated rst. How this is done is not exactly clear. Chomsky assumes in (1981) that it is freely generated and then checked for conformity with the principles. Subsequently, the movement transformation operates until the conditions for an Sstructure are satised. Then a copy of the structure is passed on to the component which transforms it into a PF. (PF is only a level of representation, therefore there must be a process to arrive at PF.) For example, symbols like ti , e, which are empty, are deleted together with all or part of the constituent brackets. The original structure meanwhile is subjected to another transformational process until it has reached the conditions of Logical Form and is directly interpretable semantically. Quantiers appear in their correct scope at LF. This model is also known as the Tmodel. We begin with the phrase structure, which is conditioned by the theory of projection. The conditions of theory of projection must in fact be obeyed at all levels (with the exception of PF). This theory is also known as X syntax. It differentiates between simple categorial labels (for example V, N, A, P, I and C, to name the most important ones) and a level of projection. The categorial labels are either lexical or functional. Levels of projection are natural numbers, starting with 0. The higher the number the higher the level. In the most popular version one distinguishes exactly 3 levels for all categories (while in (Jackendoff, 1977) it was originally possible to specify the numbers of levels for each category independently). The levels are added to the categorial label as superscripts. So N2 is synonymous with

If X is a categorial symbol then XP is the highest projection. In our case NP is synonymous with N2 . The rules are at most binary branching. The non branching rules are

(6.188)

Xj

Xj

X j is the head of X j

(6.187)

Xj

Xj
1

. There are, furthermore, the following rules: YP Xj


1

(6.186)

CAT PROJ

: :

N 2

YP X j

526

The Model Theory of Linguistic Structures

Here, YP is called the complement of X j if j Finally, we have these rules.

0, and the specier if j

Here YP is called the adjunct of X j . The last rules create a certain difculty. We have two occurrences of the symbol X j . This motivated the distinction between a category (= connected sets of nodes carrying the same label) and segments thereof. The complications that arise from this denition have been widely used by Chomsky in (1986). The relation head of is transitive. Hence x with category Ni is the head of y with N j , if all nodes z with x z y have category Nk for some k. By necessity, we must have i k j. Heads possess in addition to their category label also a subcategorization frame. This frame determines which arguments the head needs and to which arguments it assigns case and/or a role. roles are needed to recover an argument in the semantic representation. For example, there are roles for agent, experiencer, theme, instrument and so on. These are coded by suggestive names such as a , e , th , inst , and so on. gets for example the following subcategorization frame. It is on purpose that the verb does not assign case to its subject. It only assigns a role. The case is assigned only by virtue of the verb getting the niteness marker. The subcategorization frames dictate how the local structure surrounding a head looks like. One says that the head licenses nodes in the deep structure, namely those which correspond to entries of its subcategorization frame. It will additionally determine that certain elements get case and/or a role. Case- and Theory determine which elements need case/ roles and how they can get them from a head. One distinguishes between internal and external arguments. There is at most one external argument, and it is signalled in the frame by underlining it. It is found at deep structure outside of the maximal projection of the head (some theorists also think that it occupies the specier of the projection of the head, but the details do not really matter here). Further, only one of the internal arguments is a complement. This is already a consequence of Xsyntax; the other arguments therefore have to be adjuncts at Dstructure. One of the great successes of the theory is the analysis of . The uninected has the following frame.

0 l

(6.191)

INFL2 t

f RA h h

0 l

(6.190)

NP e NP ACC th

h h @A

f hA h f RA h h

(6.189)

Xj

Xj

YP

Xj

YP X j

1.

h h A

Transformational Grammar

527

(INFL is the symbol of inection. This frame is valid only for the variant which selects innitives.) This verb has an internal argument, which must be realized by the complement in the syntactic tree. The verb assigns a role to this argument. Once it is inected, it has a subject position, which is assigned case but no role. A caseless NP inside the complement must be moved into the subject position of in syntax, since being an NP it needs case. It can only appear in that position, however, if at deep structure it has been assigned a role. The subject of the embedded innitive however is a canonical choice: it only gets a role, but still needs case.

(6.193)

It is therefore possible to distinguish two types of intransitive verbs, those which assign a role to their subject ( ) and those which do not ( ). There were general laws on subcategorization frames, such as
Burzios Generalization. A verb assigns case to its governed NPargument if and only it assigns a role to its external argument.

The Theory of Government is responsible among other for case assignment. It is assumed that nominative and accusative could not be assigned by heads (as we wrongly, at least according to this theory said above) but only in a specic conguration. The simplest conguration is that between head and complement. A verb having a direct complement licenses a direct object position. This position is qua structural property (being sister to an element licensing it) assigned accusative. The following is taken from (von Stechow and Sternefeld, 1987), p. 293. Denition 6.51 x with label governs y with label iff (1) x and y are dominated by the same nodes with label XP, X arbitrary, and (2) either X 0 , where X is lexical or AGR0 and (3) x ccommands y. x governs y properly if x governs y and either X 0 , X lexical, or x and y are coindexed. (Since labels are currently construed as pairs X i P , where X i is a category symbol with projection and P a set of natural numbers, we say that x and y are coindexed if the second component of the label of x and the second component of the label of y are not disjoint.) The ECP is responsible for the distribution of empty categories. In GB there is a whole army of different

f @A h h

0 )

P P @H

f h i 337I g 2H 3@$ I I i f h R@37I g m33I 3(H

(6.192)

f @A h h
]

528

The Model Theory of Linguistic Structures

empty categories: e, a faceless constituent into which one could move, t, the trace, PRO and pro, which were pronouns. The ECP says among other that t must always be properly governed, while PRO may never be governed. We remark that traces are not allowed to move. In Section 6.7 we consider this restriction more closely. The Bounding Theory concerns itself with the distance that syntactic processes may cover. It (or better: notions of distance) is considered in detail in Section 6.7. Finally, we remark that Transformational Grammar also works with conditions on derivations. Transformations could not be applied in any order but had to follow certain orderings. A very important one (which was the only one to remain in GB) was cyclicity. Let y be the antecedent of x after movement and z y. Then let the interval x z be called the domain of this instance of movement. Denition 6.52 Let be a set of syntactic categories. x is called a bounding node if the label of x is in . A derivation is called cyclic if for any two instances of movement 1 and 2 and their domains B1 and B2 the following holds: if 1 was applied before 2 then every bounding node from B1 is dominated (not necessarily properly) by some bounding node from B 2 and every bounding node from B2 dominates (not necessarily properly) a bounding node from B1 . Principally, all nite sentences are bounding nodes. However, it has been argued by Rizzi (and others following him) that the choice of bounding categories is language dependent. Notes on this section. This exposition may sufce to indicate how complex the theory was. We shall not go into the details of parametrization of grammars and learnability. We have construed transformations as acting on labelled (ordered) trees. No attempt has been made to precisify the action of transformations on trees. Also, we have followed common practice to write t , even though strictly speaking t is a symbol. So, it would have been more appropriate to write , say, to make absolutely clear that there is a symbol that gets erased. (In TG, deletion really erased the symbol. Today transformations may not delete, but deletion must take place on the way to PF, since there are plenty of empty categories.) , and have quite a exible syntax, Exercise 232. Coordinators like as was already remarked at the end of Section 3.5. We have , , and so on. What difculties arise in connection with Xsyntax for these words? What solutions can you propose?
R

9 H 4 R9H

g 9

5 g RH 9

h 7 RP

9 H 9 h h 5 h 4 D 5 R$@R Fq$

9 RH

H h 5

GPSG and HPSG

529

Exercise 233. A transformation is called minimal if it replaces at most two adjacent symbols by at most two adjacent symbols. Let L be a recursively enumerable language. Construct a regular grammar G and a nite set of minimal transformations such that the generated set of strings is L. Here the criterion for a derivation to be nished is that no transformation can be applied. Hint. If L is recursively enumerable there is a Turing machine which generates L from a given regular set of strings. Exercise 234. (Continuing the previous exercise.) We additionally require that the deep structure generated by G as well as all intermediate structures conform to Xsyntax. Exercise 235. Write a 2LMG that accommodates German V2 and damit and davorsplit. Exercise 236. It is believed that if traces are allowed to move, we can create unbound traces by movement of traces. Show that this is not a necessary conclusion. However, the ambiguities that arise from allowing such movement on condition that it does not make itself unbound are entirely harmless. 6. GPSG and HPSG

In the 1980s, several alternatives to transformational grammar were being developed. One alternative was categorial grammar, which we have discussed in Chapter 3. Others were the grammar formalisms that used a declarative (or model theoretic) denition of syntactic structures. These are Generalised Phrase Structure Grammar (mentioned already in Section 6.1) and Lexical Functional Grammar (LFG). GPSG later developed into HPSG. In this section we shall deal mainly with GPSG and HPSG. Our aim is twofold. We shall give an overview of the expressive mechanism that is being used in these theories, and we shall show how to translate these expressive devices into a suitable polymodal logic. In order to justify the introduction of transformational grammar, Chomsky had given several arguments to show that traditional theories were completely inadequate. In particular, he targeted the theory of nite automata (which was very popular in the 1950s) and the structuralism. His criticism of nite automata is up to now unchallenged. His negative assessment of structuralism, however, was based on factual errors. First of all, Chomsky has made a caricature of Bloomelds structuralism by equating it with the claim that natural

530

The Model Theory of Linguistic Structures

languages are strongly context free (see the discussion by ManasterRamer and Kac (1990)). Even if this was not the case, his arguments of the insufciency of CFGs are questionable. Some linguists, notably Gerald Gazdar and Geoffrey Pullum, after reviewing these and other proofs eventually came to the conclusion that contrary to what has hitherto been believed all natural languages were context free. However, the work of Riny Huybregts and Stuart Shieber, which we have discussed already in Section 2.7 put a preliminary end to this story. On the other hand, as Rogers (1994) and Kracht (1995b) have later shown, the theories of English proposed inside of GB actually postulated an essentially context free structure for it. Hence English is still (from a theoretical point of view) strongly context free. An important argument against context free rules has been the fact that simple regularities of language such as agreement cannot be formulated in them. This was one of the main arguments by Paul Postal (1964) against the structuralists (and other people), even though strangely enough TG and GB did not have much to say about it either. Textbooks only offer vague remarks about agreement to the effect that heads agree with their speciers in certain features. Von Stechow and Sternefeld (1987) are more precise in this respect. In order to formulate this exactly, one needs AVSs and variables for values (and structures). These tools were introduced by GPSG into the apparatus of context free rules, as we have shown in Section 6.1. Since we have discussed this already, let us go over to word order variation. Let us note that GPSG takes over Xsyntax more or less without change. It does, however, not insist on binary branching. (It allows even unbounded branching, which puts it just slightly outside of context freeness. However, the bound on branching may seem unnatural, see Section 6.4.) Second, GPSG separates the context free rules into two components: one is responsible for generating the dominance relation, the other for the precedence relation between sisters. The following rule determines that a node with label VP can have daughters, which may occur in any order.

This rule stands for no less than 24 different context free rules. In order to get for example the German word order of the subordinate clause we now add the following condition.

(6.195)

(6.194)

VP

NP nom NP dat NP acc V

GPSG and HPSG

531

This says that every daughter with label N is to the left of any daughter with label V. Hence there only remain 6 context free rules, namely those in which the verb is at the end of the clause. (See in this connection the examples (6.161) (6.164).) For German one would however not propose this analysis since it does not allow to put any adverbials in between the arguments of the verb. If one uses binary branching trees, the word order problems reappear again in the form of order of discharge (for which GPSG has no special mechanism). There are languages for which this is better suited. For example, Staal (1967) has argued that Sanskrit has the following word orders: SVO, SOV, VOS and OVS. If we allow the following rules without specifying the linear order, these facts are accounted for.

All four possibilities can be generated and no more. Even if we ignore word order variation of the kind just described there remain a lot of phenomena that we must account for. GPSG has found a method of capturing the effect of a single movement transformation by means of a special device. It rst of all denes metarules, which generate rules from rules. For example, to account for movement we propose that in addition to Y ti W V also the tree Yi W V will be a legitimate tree. To make this happen, there shall be an additional unary rule that allows to derive the latter tree whenever the former is derivable. The introduction of these rules can be captured by a general scheme, a metarule. However, in the particular case at hand one must be a bit more careful. It is actually necessary to do a certain amount of bookkeeping with the categories. GPSG borrows from categorial grammar the category W Y , where W and Y are standard categories. In place of the rule V W one writes V Y W Y . The ofcial notation is

(6.197)

W
SLASH

:Y

How do we see to it that the feature SLASH : Y is correctly distributed? Also here GPSG has tried to come up with a principled answer. GPSG distinguishes foot features from head features. Their behaviour is quite distinct. Every feature is either a foot feature or a head feature. The attribute SLASH is classied as a foot feature. (It is perhaps unfortunate that it is called a feature and not an attribute, but this is a minor issue.) For a foot feature such as SLASH, the SLASHfeatures of the mother are the unication of the SLASH features of the daughters, which corresponds to the logical meet. Let us look

aa aa

(6.196)

VP

NP nom

V1

V1

NP acc

V0

aa aa

532

The Model Theory of Linguistic Structures

more closely into that. If W is an AVS and f a feature then we denote by f W the value of f in W . Denition 6.53 Let G be a set of rules over AVSs. f is a foot feature in G if for every maximally instantiated rule A B 0 Bn 1 the following holds.
i n

So, what this says is that the SLASHfeature can be passed on from mother to any number of its daughters. In this way (Gazdar et al., 1985) have seen to it that parasitic gaps can also be handled (see the previous section on this phenomenon). However, extreme care is needed. For the rules do not allow to count how many constituents of the same category have been extracted. Head features are being distributed roughly as follows.

The exact formulation of the distribution scheme for head features however is much more complex than for foot features. We shall not go into the details here. This nishes our short introduction to GPSG. It is immediately clear that the languages generated by GPSG are context free if there are only nitely many category symbols and bounded branching. In order for this to be the case, the syntax of paths in an AVS was severely restricted. Denition 6.54 Let A be an AVS. A path in A is a sequence f i : i n such that fn 1 f0 A is dened. The value of this expression is the value of the path. In (Gazdar et al., 1985) it was required that only those paths were legitimate in which no attribute occurs twice. In this way the niteness is a simple matter. The following is left to the reader as an exercise. Proposition 6.55 Let A be a nite set of attributes and F a nite set of paths over A. Then every set of pairwise nonequivalent AVSs is nite. Subsequently to the discovery on the word order of Dutch and Swiss German this restriction nally had to fall. Further, some people had anyway argued that the syntactic structure of the verbal complex is quite different, and that

Head Feature Convention. Let A B0 f Bi . a head feature. Then f A

C mB @

V U

V U

(6.198)

f A

f Bi

Bn

be a rule with head Bi , and f

aa

X yaaX

GPSG and HPSG Table 24. An LFGGrammar

533

this applies also to German. The verbs in a sequence of innitives were argued to form a constituent, the so called verb cluster. This has been claimed in the GB framework for German and Dutch. Also, Joan Bresnan, Ron Kaplan, Stanley Peters and Annie Zaenen argue in (1987) for a different analysis, based on principles of LFG. Central to LFG is the assumption that there are three (or even more) distinct structures that are being built simultaneously: cstructure or constituent structure: this is the structure where the linear precedence is encoded and also the syntactic structure. fstructure or functional structure: this is the structure where the grammatical relations (subject, object) but also discourse relations (topic) are encoded. astructure or argument structure: this is the structure that encodes argument relations ( roles). A rule species a piece of c-, f- and astructure together with correspondences between the structures. For simplicity we shall ignore astructure from now on. An example is provided in Table 24. The rules have two lines: the upper line species a context free phrase structure rule of the usual kind for the cstructure. The lower line tells us how the cstructure relates to the fstructure. These correspondences will allow to dene a unique fstructure (together with the universal rules of language). The rule if applied creates a local tree in the cstructure, consisting of three nodes, say 0, 00, 01, with label S, NP and VP, respectively. The corresponding fstructure is different. This is indicated by the equations. To make the ideas precise, we shall assume two sets of nodes in the universe, C and F, which are sets of cstructure

OBJ

e x

VP

PP

NP

Det

SUBJ

e x

NP

VP

534

The Model Theory of Linguistic Structures

and fstructure nodes, respectively. And we assume a function FUNC, which maps C to F. It is clear now how to translate context free rules into rstorder formulae. We directly turn to the fstructure statements. Cstructure is a tree, Fstructure is an AVS. Using the function UP to map a node to its mother, the equation SUBJ is translated as follows:

In simpler terms: I am my mothers subject. The somewhat simpler statement is translated by

Here the fstructure does not add a node, since the predicate installs itself into the root node (by the second condition), while the subject NP is its SUBJ value. Notice that the statements are local path equations, which are required to hold of the cstructure node under which they occur. LFG uses the fact that fstructure is atter than cstructure to derive the Dutch and Swiss German sentences using rules of this kind, despite the fact that the cstructures are not context free. While GPSG and LFG still assume a phrase structure skeleton that plays an independent role in the theory, HPSG actually offers a completely homogeneous theory that makes no distinction between the sources from which a structure is constrained. What made this possible is the insight that the attribute value formalism can also encode structure. A very simple possibility of taking care of the structure is the following. Already in GPSG there was a feature SUBCAT whose value was the subcategorization frame of the head. Since the subcategorization frame must map into a structure we require that in the rules

where B A. (Notice that the order of the constituents does not play any role.) This means nothing but that B is being subsumed under A that is to say that it is a special A. The difference with GPSG is now that we allow to stack the feature SUBCAT arbitrarily deep. For example, we can attribute to

(6.201)

SUBCAT

: A

V U

V U

V V U

e x

(6.200)

x :

FUNC

UP

FUNC

V U

V U

U V V D

e x

(6.199)

SUBJ

x :

SUBJ

FUNC

UP

e x

FUNC

GPSG and HPSG

535

the German word


SUBCAT

CASE SUBCAT

The rules of combination for category symbols have to be adapted accordingly. This requires some effort but is possible without problems. HPSG essentially follows this line, however pushing the use of AVSs to the limit. Not only the categories, also the entire geometrical structure is now coded using AVSs. HPSG also uses structure variables. This is necessary in particular for the semantics, which HPSG treats in the same way as syntax. (In this it differs from GPSG. The latter uses a Montagovian approach, pairing syntactic rules with semantical rules. In HPSG and LFG for that matter , the semantics is coded up like syntax.) Parallel to the development of GPSG and related frameworks, the so called constraint based approaches to natural language processing were introduced. (Shieber, 1992) provides a good reference. Denition 6.56 A basic constraint language is a nite set F of unary funct x or a statement tion symbols. A constraint is either an equation s x s x . A constraint model is a partial algebra A for the signas t iff for every a, s a and t a are dened and ture. We write equal. s x iff for every a, s a is dened. Often, one has a particular constant o, which serves as the root, and one considers equations of the form s o t o . Whereas the latter type of equation holds only at the root, the above type of equations are required to hold globally. We shall deal only with globally valid equations. Notice that we can encode atomic values into this language by interpreting an atom p as a unary function f p with the idea being that f p is dened at b iff p holds of b. (Of course, we wish to have f p f p b f p b if the latter is dened, but we need not require that.) We give a straightforward interpretation of this language into modal logic. For each f F, take a modality, which we call by the same f q f p q . (This logic is known as name. Every f satises f p , see (Kracht, 1995a).) With each term t we associate a modality in the

V y U 0 ) ( DV U V U D

CAT CASE

: NP : acc

V y U

0 8 0 0 ( U ( (

V U

V U

V y U

DD D

V aV U

V U

SUBCAT

(6.202)

CAT

: NP : dat

CASE

CAT

: NP : nom

CAT

:v

9 h h $R DD D

(to give) the following category.

e E

V U

8 9 @q

V U

536

The Model Theory of Linguistic Structures

Further,

Now, this language of constraints has been extended in various ways. The attributevalue structures of Section 6.1 effectively extend this language by boolean connectives. C : A is a shorthand for C A , where A is the modal formula associated with A. Moreover, following the discussion of Section 6.4 we use , , and to steer around in the phrase structure skeleton. HPSG uses a different encoding. It assumes an attribute called DAUGHTERS, whose value is a list. A list in turn is an AVS which is built recursively using the predicates FIRST and REST. (The reader may write down the path language for lists.) The notions of a Kripkeframe and a generalized Kripkeframe are then dened as usual. The Kripkeframes take the role of the actual syntactic objects, while the AVSs are simply formulae to talk about them. The logic L0 is of course not very interesting. What we want to have is a theory of the existing objects, not just all conceivable ones. A particular concern in syntactic theory is therefore the formulation of an adequate theory of the linguistic objects, be it a universal theory of all linguistic objects, or be it a theory of the linguistic objects of a particular language. We may cast this in logical terms in the following way. We start with a set (or class) of Kripkeframes. The theory of that class is . It would be most preferrable if for any given Kripkeframe we had iff . Unfortunately, this is not always the case. We shall see, however, that the situation is as good as one can hope for. Notice the implications of the setup. Given, say, the admissible structures of English, we get a modal logic L M Eng , which is an extension of L0 . Moreover, if LM Univ is the modal logic of all existing

0 (

%p x %p

V U e x

V U e

(6.205)

(6.204)

V U

0 ) (

Finally, given A we dene a Kripkeframe iff f x is dened and equals y. Then

H u0

0 ) ( D V e U

(6.203)

DD D

t :

V aV U U

f, f t

s p

obvious way: f : by

f ;t . Now, the formula s t p

A R with x R f y

DD D

t is translated

U
o o

V U

GPSG and HPSG

537

linguistic objects, then LM Eng furthermore is an axiomatic extension of LM Univ . There are sets and E of formulae such that

If we want to know, for example, whether a particular formula is satisable in a structure of English it is not enough to test it against the postulates of the logic L0 , nor those of LM Univ . Rather, we must show that it is consistent with LM Eng . These problems can have very different complexity. While L 0 is decidable, this need not be the case for L M Eng nor for LM Univ . The reason is that in order to know whether there is a structure for a logic that satises the axioms we must rst guess that structure before we can check the axioms on it. If we have no indication of its size, this can turn out to be impossible. The exercises shall provide some examples. Another way to see that there is a problem is this. is a theorem of L M Eng if it can be derived from L0 E using modus ponens (MP), substitution and (MN). However, E L iff can be derived from L0 E using (MP) and (MN) 0 alone. Substitution, however, is very powerful. Here we shall be concerned with the difference in expressive power of the basic constraint language and the modal logic. The basic constraint language allows to express that two terms (called paths for obvious reasons) are identical. There are two ways in which such an identity can be enforced. (a) By an axiom: then this axiom must hold of all structures under consideration. An example is provided by the agreement rules of a language. (b) As a datum: then we are asked to satisfy the equation in a particular structure. In modal logic, only equations as axioms are expressible. Except for trivial cases there is no formula s t in polymodal such that

Hence, modal logic is expressibly weaker than predicate logic, in which such a condition is easily written down. Yet, it is not clear that such conditions are at all needed in natural language. All that is needed is to be able to state conditions of that kind on all structures which we can in fact do. (See (Kracht, 1995a) for an extensive discussion.) HPSG also uses types. Types are properties of nodes. As such, they can be modelled by unary predicates in , or by boolean constants in modal logic. For example, we have represented the atomic values by proposition constants. In GPSG, the atomic values were assigned only to Type 0 features.
q r
s

V U

V U

C0 ) )

(6.207)

s x

V ) U

8 8q

(6.206)

L0

LM Univ

L0

LM Eng

L0

538

The Model Theory of Linguistic Structures

HPSG goes further than that by typing AVSs. Since the AVS is interpreted in a Kripkeframe, this creates no additional difculty. Reentrancy is modelled by path equations in constraint languages, and can be naturally expressed using modal languages, as we have seen. As an example, we consider the agreement rule (6.17) again.

P

In the earlier (Pollard and Sag, 1987) the idea of reentrancy was motivated by information sharing. What the label 1 says is that any information available under that node in one occurrence is available at any other occurrence. One way to make this true is to simply say that the two occurrences of 1 are not distinct in the structure. (An analogy might help here. In the notation a a b the two occurrences of a do not stand for different things of the universe: they both denote a, just that the linear notation forces us to write it down twice.) There is a way to enforce this in modal logic. Consider the following formula. (6.209)
AGR

AGRS

This formula says that if we have an S which consists of an NP and a VP, then whatever is the value of AGR of the NP also is the value of AGRS of the VP. The constituency structure that the rules specify can be written down using is so powquantied modal logic. As an exercise further down shows, erful that rstorder can be encoded. (See Section 1.1 for the denition of .) In one can write down an axiom that forces sets to be wellfounded with respect to and even write down the axioms of (von in having a NeumannG delBernays Set Theory), which differs from o simpler scheme for set comprehension. In its place we have this axiom.

It says that from a set x and an arbitrary subset of the universe P (which does not have to be a set) there is a set of all things that belong to both x and P. In presence of the results by Thatcher, Doner and Wright all this may sound paradoxical. However, the introduction of structure variables has made the structures into acyclic graphs rather than trees. However, our reformulation of HPSG is not expressed in but in the much weaker polymodal

B RB @

B AB DB BB A@ @ @@ C@ @@ @

Class Comprehension.

z z

x Pz .

s 

| { @dz

V 0 VV a o C(0 p C

( o 0 ( aU o C80 p o

s 

( o U o ( o U o !0 h

CAT

CAT

CAT

V Q11'U ) 1 | { D DD @dz q r

(6.208)

CAT

:s

CAT

: np AGR : 1

CAT

: vp AGRS : 1

T T ) ') S S | { 8Qz

GPSG and HPSG

539

logic. Thus, theories of linguistic objects are extensions of polymodal . However, as (Kracht, 2001a) shows, by introducing enough modalities one can axiomatize a logic such that a Kripkeframe F R is a frame for this logic iff F R is a model of . This means that effectively any higher order logic can be encoded into HPSG notation, since it is reducible to set theory, and thereby to polymodal logic. Although this is not per se an argument against using the notation, it shows that anything goes and that a claim to the effect that such and such phenomenon can be accounted for in HPSG is empirically vacuous. Notes on this section. One of the seminal works in GPSG besides (Gazdar et al., 1985) is the study of word order in German by Hans Uszkoreit (1987). The constituent structure of the continental Germanic languages has been a focus of considerable debate between the different grammatical frameworks. The discovery of Swiss German actually put an end to the debate whether or not context free rules are appropriate. In GSPG it is assumed that the dominance and the precedence relations are specied separately. Rules contain a dominance skeleton and a specication that says which of the orderings is admissible. However, as Almerindo Ojeda (1988) has shown, GPSG can also generate cross serial dependencies of the Swiss German type. One only has to relax the requirement that the daughters of a node must be linearly ordered to a requirement that the yield of the tree must be so ordered.

Exercise 238. Show that the logic L0 of any number of basic modal operators satisfying p q p q is decidable. This shows the decidability of L0 . Hint. Show that any formula is equivalent to a disjunction of conjunctions of statements of the form , where is a sequence of modalities and is either nonmodal or of the form m . Exercise 239. Write a grammar using LFGrules of the kind described above to generate the crossing dependencies of Swiss German. Exercise 240. Let A be an alphabet, T a Turing machine over A. The computation of T can be coded onto a grid of numbers . Take this grid to be a Kripkestructure, with basic relations the immediate horizontal successor and predecessor, the transitive closure of these relations, and the vertical successor. Take constants ca for every a A Q . c codes the position of the read write head. Now formulate an axiom T such that a Kripkestructure

w e

S s

| { @dz

Exercise 237. Show that all axioms of are expressible in .

and also Class Comprehension

0 ) (

0 (

DD D

2 U

) 1 Q11 'U

q r

0V 1 a11'U ) (


540

The Model Theory of Linguistic Structures

satises T iff it represents a computation of T . 7. Formal Structures of GB

We shall close this chapter with a survey of the basic mathematical constructs of GB. The first complex concerns constraints on syntactic structures. GB has many types of such constraints. It has for example many principles that describe the geometrical configuration within which an element can operate. A central definition is that of idc-command, often referred to as c-command, although the latter was originally defined differently.

Definition 6.57 Let T be a tree, x, y ∈ T. x idc-commands y if for every z > x we have z ≥ y. A constituent x idc-commands a constituent y if the root of x idc-commands the root of y.

In (Koster, 1986), Jan Koster proposed an attempt to formulate GB without the use of movement transformations. The basic idea was that the traces in the surface structure leave enough indication of the deep structure that we can replace talk of deep structure and derivations by talk about the surface structure alone. The general principle that Koster proposed was as follows. Let x be a node with label α, where α is a so-called dependent element. (Dependency is defined with reference to the category.) Then there must exist a uniquely defined node y with label β which c-commands x and is local to x. Koster required in addition that α and β share a property. However, in formulating this condition it turns out to be easier to constrain the possible choices of α and β. In addition to the parameters α and β it remains to say what locality is. Anticipating our definitions somewhat, we shall say that we have x R y for a certain relation R. (Barker and Pullum, 1990) have surveyed the notions of locality entering the definition of R that were used in the literature and have given a definition of command relation. Using this, (Kracht, 1993) developed a theory of command relations that we shall outline here.

Definition 6.58 Let T be a tree with root r and R ⊆ T² a relation. R is called a command relation if there is a function f_R : T → T such that (1)–(3) hold. R is a monotone command relation if in addition it satisfies (4), and tight if it satisfies (1)–(5).

(1) x < f_R(x) for all x ≠ r.
(2) f_R(r) = r.
(3) R(x) := {y : x R y} = {y : y ≤ f_R(x)}.
(4) If x ≤ y then f_R(x) ≤ f_R(y).
(5) If x < f_R(y) then f_R(x) ≤ f_R(y).

The first class that we shall study is the class of tight command relations. Let T be a tree and P ⊆ T. We say that x P-commands y if for every z > x with z ∈ P we have z ≥ y. We denote the relation of P-command by K(P). If we choose P = T we get exactly idc-command. The following theorem is left as an exercise.

Proposition 6.59 Let R be a binary relation on the tree T. R is a tight command relation iff R = K(P) for some P ⊆ T.

Let T be a tree. We denote by MCr(T) the set of monotone command relations on T. This set is closed under intersection, union and relation composition. We even have

(6.210)  f_{R∩S}(x) = min(f_R(x), f_S(x)),  f_{R∪S}(x) = max(f_R(x), f_S(x)),  f_{R∘S}(x) = f_S(f_R(x)).

For union and intersection this holds without assuming monotonicity. For relation composition, however, it is needed. For suppose x R∘S y. Then we can conclude that x R f_R(x) and f_R(x) S y. Hence x R∘S y iff y ≤ f_S(f_R(x)), from which the claim now follows. Now we set

(6.211)  MCr(T) := ⟨MCr(T), ∩, ∪, ∘⟩.

MCr(T) is a distributive lattice with respect to ∩ and ∪. What is more, there are additional laws of distribution concerning relation composition.

Proposition 6.60 Let R, S, T ∈ MCr(T). Then

R∘(S∩T) = (R∘S)∩(R∘T),  (S∩T)∘R = (S∘R)∩(T∘R),
R∘(S∪T) = (R∘S)∪(R∘T),  (S∪T)∘R = (S∘R)∪(T∘R).
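To make these definitions concrete, here is a small computational sketch. It is ours, not the book's: the encoding of trees as parent maps, all helper names and the example tree are assumptions made purely for illustration. It computes f_{K(P)} and spot-checks the first two equations of (6.210).

# Illustrative sketch (assumed names): trees as parent maps, the map
# f_{K(P)} behind a tight command relation, and a check of (6.210).
parent = {'r': None, 'a': 'r', 'b': 'a', 'c': 'b', 'd': 'b'}
nodes = set(parent)

def up(x):
    # Proper ancestors of x, nearest first.
    while parent[x] is not None:
        x = parent[x]
        yield x

def depth(x):
    return len(list(up(x)))

def f_K(P, x):
    # f_{K(P)}(x): the least z > x with z in P; the root if there is none.
    chain = list(up(x))
    return next((z for z in chain if z in P), chain[-1] if chain else x)

def rel(f):
    # The command relation {(x, y) : y <= f(x)} induced by f.
    return {(x, y) for x in nodes for y in nodes
            if f(x) == y or f(x) in up(y)}

P, Q = {'a', 'b'}, {'a'}
R, S = rel(lambda x: f_K(P, x)), rel(lambda x: f_K(Q, x))
# (6.210): intersection takes the lower of the two f-values (min in the
# tree order, i.e. greater depth), union takes the higher one.
f_meet = lambda x: max(f_K(P, x), f_K(Q, x), key=depth)
f_join = lambda x: min(f_K(P, x), f_K(Q, x), key=depth)
print(R & S == rel(f_meet), R | S == rel(f_join))   # True True

The same representation also makes Proposition 6.59 plausible: in this encoding every tight relation arises from some choice of P.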

Proof. Let x be an element of the tree. Then

(6.212)  f_{R∘(S∩T)}(x) = f_{S∩T}(f_R(x)) = min(f_S(f_R(x)), f_T(f_R(x))) = min(f_{R∘S}(x), f_{R∘T}(x)) = f_{(R∘S)∩(R∘T)}(x).

The other claims can be shown analogously.

Definition 6.61 Let T be a tree and R ∈ MCr(T). R is called generated if it can be produced from tight command relations by means of ∩, ∪ and ∘. R is called chain-like if it can be generated from tight relations with ∘ alone.

Theorem 6.62 R is generated iff R is an intersection of chain-like command relations.

Proof. Because of Proposition 6.60 we can move ∘ to the inside of ∩ and ∪. Furthermore, we can move ∩ outside of the scope of ∪. It remains to be shown that the union of two chain-like command relations is an intersection of chain-like command relations. This follows from Lemma 6.66.

Lemma 6.63 Let R = K(P) and S = K(Q) be tight. Then

(6.213)  R ∪ S = (R∘S) ∩ (S∘R) ∩ K(P∩Q).

Proof. Let x be given. We look at f_R(x) and f_S(x). Case 1. f_R(x) < f_S(x). Then f_{R∪S}(x) = f_S(x). On the right hand side we have f_S(f_R(x)) = f_S(x), since S is tight. Moreover, f_R(f_S(x)) ≥ f_S(x), as well as f_{K(P∩Q)}(x) ≥ f_S(x). Case 2. f_S(x) < f_R(x). Analogously. Case 3. f_S(x) = f_R(x). Then f_R(f_S(x)), f_S(f_R(x)) ≥ f_S(x). The smallest node above x which is both in P and in Q is clearly f_S(x). Hence we have f_{K(P∩Q)}(x) = f_S(x). Since by (6.210) the value of the intersection on the right hand side at x is the minimum of the three values, equality holds in all cases. We put

(6.214)  K(P) ⊗ K(Q) := K(P∩Q).

The operation ⊗ is defined only on tight command relations. If ⟨Ri : i < m⟩ is a sequence of command relations, then R0 ∘ R1 ∘ ⋯ ∘ Rm−1 is called its product. In what is to follow we shall characterize a union of chain-like relations as an intersection of products. To this end we need some definitions. The first is that of a shuffling. This operation mixes two sequences in such a way that the linear order inside the sequences is respected.
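The identity (6.213) lends itself to a mechanical spot-check. The following self-contained sketch is again ours, not the book's; the tree and all names are assumptions made for illustration.

# Illustrative spot-check of Lemma 6.63 on a small tree.
parent = {'r': None, 'a': 'r', 'b': 'a', 'c': 'b', 'd': 'b'}
nodes = set(parent)

def up(x):                       # proper ancestors of x, nearest first
    while parent[x] is not None:
        x = parent[x]
        yield x

def f_K(P, x):                   # least z > x with z in P, else the root
    chain = list(up(x))
    return next((z for z in chain if z in P), chain[-1] if chain else x)

def K(P):                        # K(P) = {(x, y) : y <= f_{K(P)}(x)}
    return {(x, y) for x in nodes for y in nodes
            if f_K(P, x) == y or f_K(P, x) in up(y)}

def compose(R, S):               # relation composition R o S
    return {(x, z) for (x, y) in R for (u, z) in S if y == u}

P, Q = {'a', 'b'}, {'a'}
R, S = K(P), K(Q)
print(R | S == compose(R, S) & compose(S, R) & K(P & Q))   # True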

Definition 6.64 Let ρ = ⟨ai : i < m⟩ and σ = ⟨bj : j < n⟩ be sequences of objects. A shuffling of ρ and σ is a sequence ⟨ck : k < m + n⟩ such that there are injective monotone functions f : m → m + n and g : n → m + n with im(f) ∩ im(g) = ∅ and im(f) ∪ im(g) = m + n, as well as c_{f(i)} = ai for all i < m and c_{g(j)} = bj for all j < n. f and g are called the embeddings of the shuffling.

Definition 6.65 Let ρ = ⟨Ri : i < m⟩ and σ = ⟨Sj : j < n⟩ be sequences of tight command relations. Then T is called weakly associated with ρ and σ if there is a shuffling ⟨Ti : i < m + n⟩ of ρ and σ with embeddings f and g such that

(6.215)  T = T0 •0 T1 •1 T2 ⋯ •m+n−2 Tm+n−1,

where each •i ∈ {∘, ⊗}, and •i = ∘ always if i, i + 1 ∈ im(f) or i, i + 1 ∈ im(g).

If m = n = 2, we have the following shufflings.

(6.216)  R0 R1 S0 S1    S0 R0 R1 S1
         R0 S0 R1 S1    S0 R0 S1 R1
         R0 S0 S1 R1    S0 S1 R0 R1

The sequence R1 S0 S1 R0 is not a shuffling, because the order of the Ri is not respected. In general there exist up to C(m+n, n) different shufflings (a binomial coefficient). For every shuffling there are up to 2^(2n−1) weakly associated command relations (if n ≤ m). For example, the following command relations are weakly associated with the third shuffling.

(6.217)  R0 ⊗ S0 ∘ S1 ⊗ R1    R0 ∘ S0 ∘ S1 ∘ R1

The relation R0 ∘ S0 ⊗ S1 ∘ R1 is however not weakly associated with it, since ⊗ may not occur in between two S.

Lemma 6.66 Let ρ = ⟨Ri : i < m⟩ and σ = ⟨Si : i < n⟩ be sequences of tight command relations with products T and U, respectively. Then T ∪ U is the intersection of all chain-like command relations which are weakly associated with a shuffling of ρ and σ.
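Shufflings are easy to enumerate mechanically. The following sketch is ours (the function names are assumptions); it generates all shufflings of two sequences, respecting the internal order of each, and reproduces the six shufflings of (6.216).

# Illustrative enumeration of shufflings; there are C(m+n, n) of them,
# e.g. C(4, 2) = 6 for m = n = 2.
def shufflings(rho, sigma):
    if not rho:
        return [list(sigma)]
    if not sigma:
        return [list(rho)]
    return ([[rho[0]] + s for s in shufflings(rho[1:], sigma)] +
            [[sigma[0]] + s for s in shufflings(rho, sigma[1:])])

for s in shufflings(['R0', 'R1'], ['S0', 'S1']):
    print(' '.join(s))          # prints the six shufflings of (6.216)

At each junction of a shuffling whose neighbours come from different sequences one may then choose ∘ or ⊗; junctions internal to one sequence are fixed to ∘, in accordance with Definition 6.65.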

In practice one has restricted attention to command relations which are characterized by certain sets of nodes, such as the set of all maximal projections, the set of all finite sentences, or the set of all sentences in the indicative mood, and so on. If we choose P to be the set of nodes carrying a label subsuming the category of finite sentences, then we get the following: if x is a reflexive anaphor, it has to be c-commanded by a subject, which it in turn P-commands. (The last condition makes sure that the subject is a subject of the same sentence.) There is a plethora of similar examples where command relations play a role in defining the range of phenomena. Here, one took not just any old set of nodes but those that were definable. To make this precise, let ⟨T, ℓ⟩ with ℓ : T → N be a labelled tree and Q ⊆ N. Then K(Q) := K(ℓ⁻¹(Q)) is called a definable tight command relation.

Definition 6.67 Let T be a tree and R ⊆ T × T. R is called a (definable) command relation if it can be obtained from definable tight command relations by means of composition, union and intersection.

It follows from the previous considerations that the union of definable relations is an intersection of chains of tight relations. A particular role is played by subjacency. The antecedent of a trace must be 1-subjacent to the trace. As is argued in (Kracht, 1998) on the basis of (Chomsky, 1986), this relation is exactly

(6.218)  K(P) ∘ K(P),

where P is the set of bounding nodes.

The movement and copy transformations create so-called chains. Chains connect elements in different positions with each other. The mechanism inside the grammar is coindexation. For, as we have said in Section 6.5, traces must be properly governed, and this means that an antecedent must c-command its trace in addition to being coindexed with it. This is a restriction on the structures as well as on the movement transformations. Using coindexation one also has the option of associating antecedent and trace without assuming that anything has ever moved. The transformational history can anyway be projected from the S-structure up to minor (in fact inessential) variations. This means that we need not care whether the S-structure has been obtained by transformations or by some other process introducing the indexation (this is what Koster has argued for). The association between antecedent and trace can also be done in a different way, namely by collecting sets of constituents. We call a chain a certain set of constituents. In a chain the members may be thought of as being coindexed, but this is not necessary. Chomsky has in the 1990s once again taken up the idea that movement is a sequence of copying and deletion, and has made this one of the main innovations of the reform

in the Minimalist Program (see (Chomsky, 1993)). Deletion here simply means marking a constituent as phonetically empty (so the copy remains, but it is marked). However, the same idea can be introduced into GB without substantial change. Let us do this here and introduce in place of Move-α the transformation Copy-α. It will turn out that it is actually not necessary to say which of the members of the chain has been obtained by copying from which other member. The reason is simple: the copy (= antecedent) c-commands the original (= trace), but the latter does not c-command the former. Knowing who is in a chain with whom is therefore enough. This is the central insight that is used in the theory of chains in (Kracht, 2001b), which we shall now outline. We shall see below that copying gives more information on the derivation than movement, so that we must be careful in saying that nothing has changed by introducing copy movement. Recall that constituents are subtrees. In what is to follow we shall not distinguish between a set of nodes and the constituent that is based on that set. Say that x ac-commands y if x and y are incomparable, x idc-commands y but y does not idc-command x.

Definition 6.68 Let T be a tree. A set Γ of constituents of T which is linearly ordered with respect to ac-command is called a chain in T. The member which is highest with respect to ac-command is called the head of Γ, the lowest the foot. Γ is a copy chain if any two members are isomorphic. Γ is a trace chain if all non-heads are traces.

The definition of chains can be supplemented with more detail in the case of copy chains. This will be needed in the sequel.

Definition 6.69 Let T be a tree. A copy chain in T is a pair ⟨Γ, Φ⟩ for which the following holds.

(a) Φ = {φ_CD : C, D ∈ Γ} is a family of isomorphisms φ_CD : C → D such that for all C, D, E ∈ Γ we have φ_CC = 1_C and φ_DE ∘ φ_CD = φ_CE.
(b) Γ is a chain.

The chain associated with ⟨Γ, Φ⟩ is Γ.
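Here is a small sketch (ours; the predicate names and the example tree are assumptions) of idc-command and ac-command as just defined. It also anticipates Lemma 6.73 below: the ac-commanding node always has the smaller depth.

# Illustrative sketch of idc-command (Definition 6.57) and ac-command.
def dominates(parent, x, y):
    # x > y: x is a proper ancestor of y.
    while parent[y] is not None:
        y = parent[y]
        if y == x:
            return True
    return False

def idc_commands(parent, x, y):
    # Every z > x also dominates or equals y.
    return all(dominates(parent, z, y) or z == y
               for z in parent if dominates(parent, z, x))

def ac_commands(parent, x, y):
    # x, y incomparable, x idc-commands y, y does not idc-command x.
    return (x != y
            and not dominates(parent, x, y) and not dominates(parent, y, x)
            and idc_commands(parent, x, y)
            and not idc_commands(parent, y, x))

parent = {'r': None, 'a': 'r', 'b': 'r', 'c': 'b', 'd': 'b'}
print(ac_commands(parent, 'a', 'c'))  # True: a is higher, c is deeper
print(ac_commands(parent, 'a', 'b'))  # False: sisters idc-command each other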

Often we shall identify a copy chain with its associated chain. The isomorphisms give explicit information as to which elements of the various constituents are counterparts of which others.

Definition 6.70 Let T be a tree and ⟨Γ, Φ⟩ a copy chain. Then we put x ∼Φ y if there is a map φ ∈ Φ such that φ(x) = y. We put [x]Φ := {y : x ∼Φ y}. If C is a set of copy chains then let ∼C be the smallest equivalence relation generated by the ∼Φ, ⟨Γ, Φ⟩ ∈ C. Further, let [x]C := {y : x ∼C y}.

Definition 6.71 Let ⟨Γ, Φ⟩ be a copy chain and C, D ∈ Γ. C is said to be immediately above D if there is no E ∈ Γ distinct from C and D which ac-commands D and is ac-commanded by C. A link of ⟨Γ, Φ⟩ is a triple ⟨C, D, φ⟩ where C is immediately above D and φ ∈ Φ is the isomorphism from D onto C. φ is called a link map if it occurs in a link. An ascending map is a composition of link maps.

Lemma 6.72 Let φ be a link map. Then t(φ(x)) < t(x).

Proof. Let φ : D → C, and let v be the root of C and w the root of D. Further, let t_D(x) be the depth of x in D and t_C(φ(x)) the depth of φ(x) in C. Then t_C(φ(x)) = t_D(x), since φ is an isomorphism. On the other hand, t(x) = t(w) + t_D(x) and t(φ(x)) = t(v) + t_C(φ(x)) = t(v) + t_D(x). The claim now follows from the next lemma, given the remark that v idc-commands w but w does not idc-command v, so that v ac-commands w.

Lemma 6.73 Let T be a tree and x, y ∈ T. If x ac-commands y then t(x) < t(y).

Proof. There exists a uniquely defined z immediately dominating x. By definition of idc-command we have z ≥ y. But y ≠ z, since y is not comparable with x. Hence y < z. Moreover, y is not a daughter of z, for otherwise y would idc-command x. Now we have t(x) = t(z) + 1 and t(y) ≥ t(z) + 2 = t(x) + 1. Whence the claim.

We call a pair ⟨T, C⟩ a copy chain tree (CCT) if T is a finite tree and C is a set of copy chains on T. We consider among others the following constraints.

Uniqueness. Every constituent of T is contained in exactly one chain.

Liberation. Let Γ be a chain and C ∈ Γ. If C is properly contained in a member of a chain distinct from Γ, then C is the foot of Γ.

Lemma 6.74 Let K be a CCT which satisfies Uniqueness and Liberation. Further, let φ and ψ be link maps with im(φ) = im(ψ). Then already φ = ψ.

Proof. Let φ : D → C and ψ : D′ → C′ be link maps with im(φ) = im(ψ). Then C and C′ are comparable, so C = C′ or C ⊊ C′ or C′ ⊊ C. Without loss of generality we may assume C ⊆ C′. If C ⊊ C′ then, by Liberation, C is the foot of its chain, in contradiction to the fact that C is the target of a link map. Hence we have C = C′. By Uniqueness, C and C′ are therefore in the same chain, and since φ and ψ are link maps, we must have D = D′. Hence φ = ψ.

Definition 6.75 Let K be a CCT. x is called a root if x is not in the image of a link map.

The proof of the following proposition is now easy to provide. It is left for the reader.

Proposition 6.76 Let K be a CCT which satisfies Uniqueness and Liberation. Let x be an element, φi (i < m) and ψj (j < n) link maps, and y, z roots such that

(6.219)  φ_{m−1} ∘ ⋯ ∘ φ_0(y) = ψ_{n−1} ∘ ⋯ ∘ ψ_0(z) = x.

Then we have y = z, m = n and φi = ψi for all i < n.

Hence, for given x there is a uniquely defined root x^r with x ∼C x^r. Further, there exists a unique sequence ⟨πi : i < n⟩ of link maps such that x is the image of x^r under π_{n−1} ∘ ⋯ ∘ π_0. This sequence we call the canonical decomposition of x.

Proposition 6.77 Let K be a CCT satisfying Uniqueness and Liberation. Then the following are equivalent.

(1) x ∼C y.
(2) There exist two ascending maps ρ and σ with y = ρ(σ⁻¹(x)).
(3) x^r = y^r.

Proof. (1) ⇒ (2). Let x ∼C y. Then there exists a sequence ⟨πi : i < p⟩ of link maps or inverses thereof such that y = π_{p−1} ∘ ⋯ ∘ π_0(x). Now if πi is a link map and π_{i+1} an inverse link map, then π_{i+1} = πi⁻¹ by Lemma 6.74, and the pair may be cancelled. Hence we may assume that for some q ≤ p all πi with i < q are inverse link maps and all πi with q ≤ i < p are link maps. Now put ρ := π_{p−1} ∘ ⋯ ∘ π_q and σ := π_0⁻¹ ∘ π_1⁻¹ ∘ ⋯ ∘ π_{q−1}⁻¹. Then ρ and σ are ascending maps and y = ρ(σ⁻¹(x)). So, (2) obtains. (2) ⇒ (3).

Let ascending maps ρ and σ be given with y = ρ(σ⁻¹(x)). Put u := σ⁻¹(x). Then u = τ(u^r) for some ascending map τ. Further, x = σ(u) = σ ∘ τ(u^r) and y = ρ(u) = ρ ∘ τ(u^r). Now, u^r is a root, and x as well as y are images of u^r under ascending maps. Hence u^r is a root of x and of y. This however means that u^r = x^r = y^r. Hence, (3) obtains. (3) ⇒ (1) is straightforward. The proof also establishes the following fact.

Lemma 6.78 Every ascending map is a canonical decomposition. Every composition of link maps and their inverses equals a product ρ ∘ σ⁻¹ where ρ and σ are ascending maps. A minimal composition of link maps and their inverses is unique.

Let x be an element and ⟨πi : i < n⟩ its canonical decomposition. Then we call

(6.220)  T_K(x) := {π_{j−1} ∘ ⋯ ∘ π_0(x^r) : j ≤ n}

the trajectory of x. The trajectory mirrors the history of x in the process of derivation. We call the root line of x the set

(6.221)  W_K(x) := {y ∈ T_K(x) : y idc-commands x^r}.

Notice that x^r idc-commands itself. The peak of x is the element of W_K(x) of smallest depth. We write x^p for the peak of x and π_x for the ascending map which sends x to x^p.

Definition 6.79 Let K be a CCT satisfying Uniqueness and Liberation. If r is the root of the tree then r is the zenith of r, and the zenith map is ζ_r := 1_T. If x ≠ r then the zenith map is the composition ζ_x := ζ_y ∘ π_x, where y is the mother of x^p. The zenith of x equals ζ_x(x). We write x^z for the zenith of x.

Definition 6.80 A link map is called orbital if it occurs in a minimal decomposition of a zenith map.

At last we can formulate the following restriction on CCTs.

No Recycling. All link maps are orbital.

The effect of a copy transformation is that (1) it adds a new constituent and (2) this constituent is added to an already existing chain as its head. Hence the whole derivation can be thought of as a process which generates a tree together with its chains. These can be explicitly described, and this eliminates the necessity of talking about transformations.

Definition 6.81 A copy chain structure (CCS) is a CCT K = ⟨T, C⟩ which satisfies Uniqueness, Liberation and No Recycling.

Everything that one wants to say about transformations and derivations can also be said about copy chain structures. The reason for this is the following fact. We call a CCT simply a tree if every chain consists of a single constituent. Such a tree is also a CCS. A transformation can naturally be defined as an operation between CCSs. It turns out that Copy-α turns a CCS into a CCS. The reason for this is that traces have to be bound and may not be moved. (Only in order to reflect this in the definition of the CCSs has the condition No Recycling been introduced. Otherwise it would be unnecessary.) The following now holds.

Theorem 6.82 A CCT is a CCS iff it is obtained from a tree by successive application of Copy-α.

Transformational grammar and HPSG are not as different as one might think. The appearance to the contrary is created by the fact that TG is written up using trees, while HPSG has acyclic structures, which need not be trees. In this section we shall show that GB actually defines structures that are more similar to acyclic graphs than to trees. The basis for the alternative formulation is the idea that instead of movement transformations we define an operation that changes the dominance relation. If the daughter constituent z of x moves and becomes a daughter constituent of y, then we simply add the pair ⟨z, y⟩ to the dominance relation. This rather simple idea has to be worked out carefully. First, we have to switch from the usual transitive dominance relation to the immediate dominance relation. Second, one has to take care of the linear order of the elements at the surface, since it is no longer represented.

Definition 6.83 A multidominance structure (MDS) is a triple ⟨M, ≺, r⟩ such that ⟨M, ≺⟩ is a directed acyclic graph with root r and for every x ≠ r the set M(x) := {y : x ≺ y} is linearly ordered by dominance.

With an MDS we have only coded the dominance relation between the constituents. In order to include the linear order we cannot simply add another relation, as we did with trees. Depending on the branching number, a fair number of new relations will have to be added, representing the relations 'is the ith daughter of' (where i < n, the maximum branching number). Since we are dealing with binary branching trees, we need only two of these relations.

Definition 6.84 An ordered (binary branching) multidominance structure (OMDS) is a quadruple ⟨M, ≺0, ≺1, r⟩ such that the following holds.

(a) ⟨M, ≺0 ∪ ≺1, r⟩ is an MDS.
(b) From y ≺0 x and z ≺0 x it follows that y = z.
(c) From y ≺1 x and z ≺1 x it follows that y = z.
(d) If z ≺1 x for some z then there exists a y ≺0 x with y ≠ z.

(The reader may verify that these conditions together imply that ≺0 ∩ ≺1 = ∅.) Let T be a binary branching ordered tree. Then we put x ≺0 y if x is a daughter of y and there is no daughter z of y preceding x. Further, we write x ≺1 y if x is a daughter of y but not x ≺0 y.

Theorem 6.85 Let K = ⟨T, C⟩ be a CCS over an ordered binary branching tree with root r. Put M := {[x]C : x ∈ T}, and for i ∈ {0, 1} put [x]C ≺i [y]C iff there is an x′ ∈ [x]C and a y′ ∈ [y]C with x′ ≺i y′. Finally, let

(6.222)  M(K) := ⟨M, ≺0, ≺1, [r]C⟩.

Then M(K) is an OMDS.

Now we want to deal with the problem of finding the CCS from the OMDS.

Definition 6.86 Let M = ⟨M, ≺0, ≺1, r⟩ be an OMDS. An identifier is a sequence I = ⟨xi : i < n⟩ such that x0 = r and xi+1 ≺ xi for all i < n − 1. Id(M) denotes the set of all identifiers of M. The address of I is the sequence ⟨δi : i < n − 1⟩ such that for all i < n − 1 one has xi+1 ≺δi xi.

The following is easy to see.

Proposition 6.87 The set of addresses of an OMDS is a tree domain.

This means that we have already identified the tree structure. What remains to be done is to find the chains. The order is irrelevant, so we ignore it. First we want to establish which elements are overt. In a CCS an element x is called overt if for every y ≥ x the constituent rooted at y is the head of its chain. This we can also describe in the associated MDS. We say that a pair ⟨x, y⟩ is a link in ⟨M, ≺, r⟩ if x ≺ y. The link is maximal if y is maximal with respect to dominance in M(x). An S-identifier is an identifier I = ⟨xi : i < n⟩ where ⟨xi+1, xi⟩ is a maximal link for all i < n − 1. (For the purpose of this definition, x−1 is the root.) The overt elements are exactly the S-identifiers.
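The passage from an MDS to identifiers, addresses and S-identifiers can be illustrated as follows. This sketch is ours; the representation of an MDS by lists of mothers, ordered so that the maximal link comes last, and all names are assumptions made for illustration.

# Illustrative sketch: identifiers and S-identifiers of a small MDS.
# mothers[x] lists the mothers of x; the last entry is the maximal link.
def identifiers(mothers, root):
    # All downward paths <x0, ..., x(n-1)> with x0 = root, x(i+1) < x(i).
    daughters = {x: [y for y in mothers if x in mothers[y]] for x in mothers}
    paths = [[root]]
    for path in paths:               # the list grows as we iterate
        for d in daughters[path[-1]]:
            paths.append(path + [d])
    return paths

def s_identifiers(mothers, root):
    # Identifiers that always enter a node through its maximal link.
    return [p for p in identifiers(mothers, root)
            if all(p[i] == mothers[p[i + 1]][-1] for i in range(len(p) - 1))]

# Node 2 has two mothers, 1 and 0; the link <2, 0> is the maximal one.
mothers = {0: [], 1: [0], 2: [1, 0]}
print(identifiers(mothers, 0))    # [[0], [0, 1], [0, 2], [0, 1, 2]]
print(s_identifiers(mothers, 0))  # [[0], [0, 1], [0, 2]]

Here [0, 2] is the overt (highest) position of the doubly dominated element, while [0, 1, 2] corresponds to its nonovert position, in keeping with the discussion above.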

Definition 6.88 Let M = ⟨M, ≺, r⟩ and M′ = ⟨M, ≺′, r⟩ be MDSs. Then M′ is called a link extension of M if ≺′ = ≺ ∪ {⟨x, y⟩}, where ⟨x, y⟩ is maximal in M′.

One finds out easily that if K′ is derived from K by simple copying then M(K′) is isomorphic to a link extension of M(K). Let conversely M′ be a link extension of M, with new link ⟨x, y⟩, and let K be a CCS such that M(K) ≅ M. Then we claim that there is a CCS K′ for which M(K′) ≅ M′ and which results from K by copying; moreover, K′ is unique up to isomorphism. The tree of K′ is Id(M′). First we have Id(M) ⊆ Id(M′), and the identity is an embedding whose image contains all identifiers which do not contain the subsequence x; y. Further, let I be the S-identifier of y in M and I′ the S-identifier of y in M′. Then I′ = I; J for some J. Define h : I; J; x; K ↦ I; x; K. This h is an isomorphism of the constituent of I; J; x onto the constituent of I; x. Now we define the chains on Id(M′): the chain ⟨Γ, Φ⟩ of K which contains the constituent of I; J; x is extended by the constituent of I; x as its new head, with h added to the family of isomorphisms, while every other chain of K is carried over unchanged. Then we put

(6.223)  K′ := ⟨Id(M′), C′⟩,

where C′ is the resulting set of chains. This is a CCS. Evidently it satisfies Uniqueness. Further, Liberation is satisfied, as one easily checks. For No Recycling it suffices that the new link map is orbital. This is easy to see.

Now, how does one define the kinds of structures that are common in GB? One approximation is the following. We say that a trace chain structure is a pair ⟨T, C⟩ where C is a set of trace chains. If we have a CCS, we get the trace chain structure relatively easily. To this end we replace every maximal nonovert constituent in the tree by a trace (which is a one node tree). This however deletes some chain members! Additionally, it may happen that some traces are no longer bound. Hence we say that a trace chain structure is a pair ⟨T, C⟩ which results from a CCS by deleting the nonovert constituents. Now one can define trace chain structures also from MDSs, and it turns out that if two CCSs K and K′ have isomorphic MDSs then their trace chain structures are isomorphic. This has the following reason. An MDS is determined by T and ∼C alone. We can determine the root of every element from T and ∼C, and further also the root line. From this we can define the peak of every element and therefore

also the zenith. The overt elements are exactly the elements in zenith position. Besides the overt elements, the trace chain structure also contains the traces. These are exactly the nonovert daughters of the overt elements. Let us summarize. There exists a biunique correspondence between derivations of trace chain structures, derivations of CCSs and derivations of MDSs. Further, there is a biunique correspondence between MDSs and trace chain structures. In this respect the latter two kinds of structures are exactly equivalent. CCSs contain more information about the derivation (see the exercises).

Exercise 241. This example shows why we cannot use the full dominance ordering in MDSs. Let M = ⟨{0, 1, 2}, ≺, 0⟩ and M′ = ⟨{0, 1, 2}, ≺′, 0⟩ with 2 ≺ 1 ≺ 0 and ≺′ = ≺ ∪ {⟨2, 0⟩}. Evidently, M′ is a link extension of M. Construct Id(M) and Id(M′) as well as the connected CCSs.

Exercise 242. Prove Proposition 6.59.

Exercise 243. Show Lemma 6.66.

Exercise 244. Show that ac-command is transitive.

Exercise 245. Show Proposition 6.76.

Exercise 246. Let the CCS in Figure 18 be given. The members of a chain are annotated by the same upper case Greek letter. Trivial chains are not shown. Let the link maps be π : 2 ↦ 4, ρ : i ↦ i + 6 (i < 6), and also σ : i ↦ i + 13 (i < 13). Compute [i]C for every i. If instead of ρ we take the map

(6.224)  ρ′ : 1 ↦ 8, 2 ↦ 7, 3 ↦ 9, 4 ↦ 10, 5 ↦ 11,

how do the equivalence classes change? Determine the peak and the zenith of every element and the corresponding maps.

Exercise 247. Let d(n) be the largest number of nonisomorphic CCSs which have (up to isomorphism) the same MDS of size n. Show that d(n) = O(2^n).

Figure 18. A Copy-Chain Structure
Bibliography
Ajdukiewicz, Kazimierz. 1936. Die syntaktische Konnexität [The syntactic connectivity]. Studia Philosophica 1:1–27.
Barendregt, Henk. 1985. The Lambda Calculus. Its Syntax and Semantics. Studies in Logic and the Foundations of Mathematics 103. 2nd ed. Amsterdam: Elsevier.
Barker, Chris, and Pullum, Geoffrey. 1990. A theory of command relations. Linguistics and Philosophy 13:1–34.
Bauer, Brigitte L. M. 1995. The Emergence and Development of SVO Patterning in Latin and French. Oxford: Oxford University Press.
Bird, Steven, and Ellison, Mark. 1994. One-level phonology: Autosegmental representations and rules as finite automata. Computational Linguistics 20:55–90.
Blackburn, Patrick. 1993. Modal logic and attribute value structures. In Diamonds and Defaults, Maarten de Rijke (ed.), 19–65. (Synthese Library 229.) Dordrecht: Kluwer.
Blok, Wim J., and Pigozzi, Don J. 1990. Algebraizable logics. Memoirs of the American Mathematical Society 77(396).
Bochvar, D. A. 1938. On a three-valued logical calculus and its application to the analysis of contradictions. Mathematicheskii Sbornik 4:287–308.
Böttner, Michael, and Thümmel, Wolf (eds.). 2000. Variable-free Semantics. Artikulation und Sprache 3. Osnabrück: secolo Verlag.
Bresnan, Joan, Kaplan, Ronald M., Peters, Stanley, and Zaenen, Annie. 1987. Cross-Serial Dependencies in Dutch. In The Formal Complexity of Natural Language, Walter Savitch, Emmon Bach, William Marsh, and Gila Safran-Naveh (eds.), 286–319. Dordrecht: Reidel.
Büchi, J. 1960. Weak second-order arithmetic and finite automata. Zeitschrift für Mathematische Logik und Grundlagen der Mathematik 6:66–92.
Burmeister, Peter. 1986. A Model Theoretic Oriented Approach to Partial Algebras. Berlin: Akademie Verlag.
Burmeister, Peter. 2002. Lecture Notes on Universal Algebra. Many Sorted Partial Algebras. Manuscript available via internet.
Burris, Stanley, and Sankappanavar, H. P. 1981. A Course in Universal Algebra. Graduate Texts in Mathematics 78. Berlin/New York: Springer.
Buszkowski, Wojciech. 1997. Mathematical linguistics and proof theory. In Handbook of Logic and Language, Johan van Benthem and Alice ter Meulen (eds.), 683–736. Amsterdam: Elsevier.
Carpenter, Bob. 1992. The Logic of Typed Feature Structures. Cambridge Tracts in Theoretical Computer Science 32. Cambridge: Cambridge University Press.
Chandra, A. K., Kozen, D. C., and Stockmeyer, L. J. 1981. Alternation. Journal of the Association for Computing Machinery 28:114–133.
Chomsky, Noam, and Halle, Morris. 1968. The Sound Pattern of English. New York: Harper and Row.
Chomsky, Noam. 1959. On certain formal properties of grammars. Information and Control 2:137–167.
Chomsky, Noam. 1962. Context-free grammars and pushdown storage. MIT Research Laboratory of Electronics Quarterly Progress Report 65.
Chomsky, Noam. 1981. Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, Noam. 1986. Barriers. Cambridge (Mass.): MIT Press.
Chomsky, Noam. 1993. A minimalist program for linguistic theory. In The View from Building 20: Essays in Honour of Sylvain Bromberger, Ken Hale and Samuel J. Keyser (eds.), 1–52. Cambridge (Mass.): MIT Press.
Church, Alonzo. 1933. A set of postulates for the foundation of logic. Annals of Mathematics 2:346–366.
Church, Alonzo. 1940. A formulation of the simple theory of types. Journal of Symbolic Logic 5:56–68.
Coulmas, Florian. 2003. Writing Systems. An Introduction to Their Linguistic Analysis. Cambridge: Cambridge University Press.
Culy, Christopher. 1987. The Complexity of the Vocabulary of Bambara. In The Formal Complexity of Natural Language, Walter Savitch, Emmon Bach, William Marsh, and Gila Safran-Naveh (eds.), 345–351. Dordrecht: Reidel.
Curry, Haskell B. 1930. Grundlagen der kombinatorischen Logik [Foundations of combinatory logic]. American Journal of Mathematics 52:509–536, 789–834.
Curry, Haskell B. 1977. Foundations of Mathematical Logic. 2nd ed. New York: Dover Publications.
Davey, B. A., and Priestley, H. A. 1990. Lattices and Order. Cambridge: Cambridge University Press.
Deutsch, David, Ekert, Artur, and Luppacchini, Rossella. 2000. Machines, logic and quantum physics. Bulletin of Symbolic Logic 6:265–283.
Doner, J. E. 1970. Tree acceptors and some of their applications. Journal of Computer and Systems Sciences 4:406–451.
Dowty, David R., Wall, Robert E., and Peters, Stanley. 1981. Introduction to Montague Semantics. Synthese Library 11. Dordrecht: Reidel.
Dresner, Eli. 2001. Tarski's Restricted Form and Neale's Quantificational Treatment of Proper Names. Linguistics and Philosophy 24:405–415.
Dresner, Eli. 2002. Holism, Language Acquisition, and Algebraic Logic. Linguistics and Philosophy 25:419–452.
Dymetman, Marc. 1991. Inherently reversible grammars, logic programming and computability. In Proceedings of the ACL Workshop: Reversible Grammars in Natural Language Processing.
Dymetman, Marc. 1992. Transformations de Grammaires Logiques et Réversibilité [Transformations of Logical Grammars and Reversibility]. Ph.D. diss., Université Joseph Fourier, Grenoble.
Ebbinghaus, Hans-Dieter, and Flum, Jörg. 1995. Finite Model Theory. Perspectives in Mathematical Logic. Berlin/New York: Springer.
Ebert, Christian, and Kracht, Marcus. 2000. Formal syntax and semantics of case stacking languages. In Proceedings of the EACL 2000.
van Eijck, Jan. 1994. Presupposition failure: a comedy of errors. Formal Aspects of Computing 3.
Eisenberg, Peter. 1973. A Note on Identity of Constituents. Linguistic Inquiry 4:417–420.
Ewen, Colin J., and van der Hulst, Harry. 2001. The Phonological Structure of Words. Cambridge: Cambridge University Press.
Ferreirós, José. 2001. The road to modern logic: an interpretation. The Bulletin of Symbolic Logic 7:441–484.
Fiengo, Robert, and May, Robert. 1994. Indices and Identity. Linguistic Inquiry Monographs 24. Cambridge (Mass.): MIT Press.
Fine, Kit. 1992. Transparency, Part I: Reduction. Unpublished manuscript, UCLA.
Frege, Gottlob. 1962. Funktion und Begriff [Function and Concept]. In Funktion, Begriff, Bedeutung. Fünf logische Studien [Function, Concept, Meaning. Five Logical Studies], Günther Patzig (ed.), 17–39. Göttingen: Vandenhoeck & Ruprecht.
Frey, Werner. 1993. Syntaktische Bedingungen für die semantische Interpretation [Syntactic Conditions for the Semantic Interpretation]. Studia Grammatica 35. Berlin: Akademie Verlag.
Fromkin, V. (ed.). 2000. Linguistics: An Introduction to Linguistic Theory. London: Blackwell.
Gamut, L. T. F. 1991a. Logic, Language and Meaning. Vol. 1: Introduction to Logic. Chicago: The University of Chicago Press.
Gamut, L. T. F. 1991b. Logic, Language and Meaning. Vol. 2: Intensional Logic and Logical Grammar. Chicago: The University of Chicago Press.
Gärdenfors, Peter. 1988. Knowledge in Flux. Cambridge (Mass.): MIT Press.
Gazdar, Gerald, Klein, Ewan, Pullum, Geoffrey, and Sag, Ivan. 1985. Generalized Phrase Structure Grammar. London: Blackwell.
Gazdar, Gerald, Pullum, Geoffrey, Carpenter, Bob, Hukari, T., and Levine, R. 1988. Category structures. Computational Linguistics 14:1–19.
Geach, Peter. 1972. A Program for Syntax. In Semantics for Natural Language, Donald Davidson and Gilbert Harman (eds.). (Synthese Library 40.) Dordrecht: Reidel.
Geller, M. M., and Harrison, M. A. 1977. On LR(k) grammars and languages. Theoretical Computer Science 4:245–276.
Geurts, Bart. 1998. Presupposition and Anaphors in Attitude Contexts. Linguistics and Philosophy 21:545–601.
Ginsburg, Seymour, and Spanier, Edwin H. 1964. Bounded ALGOL-Like Languages. Transactions of the American Mathematical Society 113:333–368.
Ginsburg, Seymour, and Spanier, Edwin H. 1966. Semigroups, Presburger Formulas, and Languages. Pacific Journal of Mathematics 16:285–296.
Ginsburg, Seymour. 1975. Algebraic and Automata-Theoretic Properties of Formal Languages. Amsterdam: North-Holland.
Goldstern, Martin, and Judah, Haim. 1995. The Incompleteness Phenomenon. Wellesley (Mass.): AK Peters.
Grätzer, George. 1968. Universal Algebra. New York: van Nostrand.
Grätzer, George. 1971. Lattice Theory: First Concepts and Distributive Lattices. Freeman.
Greibach, Sheila A. 1967. A new normal-form theorem for context-free phrase structure grammars. Journal of the Association for Computing Machinery 13:42–52.
Grewendorf, Günter, Hamm, Friedrich, and Sternefeld, Wolfgang. 1987. Sprachliches Wissen. Eine Einführung in moderne Theorien der grammatischen Beschreibung [Knowledge of Language. An Introduction to Modern Theories of Grammatical Description]. suhrkamp taschenbuch wissenschaft 695. Frankfurt a.M.: Suhrkamp Verlag.
Groenink, Annius V. 1997a. Mild context-sensitivity and tuple-based generalizations of context-free grammar. Linguistics and Philosophy 20:607–636.
Groenink, Annius V. 1997b. Surface without Structure. Word Order and Tractability Issues in Natural Language Analysis. Ph.D. diss., University of Utrecht.
de Groote, Philippe. 2001. Towards Abstract Categorial Grammars. In Association for Computational Linguistics, 39th Annual Meeting and 10th Conference of the European Chapter, 148–155, Toulouse.
Haider, Hubert. 1991. Die menschliche Sprachfähigkeit: exaptiv und kognitiv opak [The human language faculty: exaptive and cognitively opaque]. Kognitionswissenschaft 2:11–26.
Haider, Hubert. 1993. Deutsche Syntax generativ. Vorstudien zur Theorie einer projektiven Grammatik [German Syntax, Generative. Preliminary Studies towards a Theory of Projective Grammar]. Tübingen: Gunter Narr Verlag.
Haider, Hubert. 1995. Downright down to the right. In On Extraction and Extraposition, U. Lutz and J. Pafel (eds.), 145–271. Amsterdam: John Benjamins.
Haider, Hubert. 1997. Extraposition. In Rightward Movement, D. Beerman, D. LeBlanc, and H. van Riemsdijk (eds.), 115–151. Amsterdam: John Benjamins.
Haider, Hubert. 2000. Branching and Discharge. In Lexical Specification and Insertion, Peter Coopmans, Martin Everaert, and Jane Grimshaw (eds.), 135–164. (Current Issues in Linguistic Theory 197.) Amsterdam: John Benjamins.
Halmos, Paul. 1956. Homogeneous locally finite polyadic boolean algebras of infinite degree. Fundamenta Mathematicae 43:255–325.
Hamm, Friedrich, and van Lambalgen, Michiel. 2003. Event Calculus, Nominalization and the Progressive. Linguistics and Philosophy 26:381–458.
Harkema, Henk. 2001. A Characterization of Minimalist Languages. In Logical Aspects of Computational Linguistics (LACL '01), Philippe de Groote, Glyn Morrill, and Christian Retoré (eds.), 193–211. (Lecture Notes in Artificial Intelligence 2099.) Berlin/New York: Springer.
Harris, Zellig S. 1963. Structural Linguistics. The University of Chicago Press.
Harris, Zellig S. 1979. Mathematical Structures of Language. Huntington (NY): Robert E. Krieger Publishing Company.
Harrison, Michael A. 1978. Introduction to Formal Language Theory. Reading (Mass.): Addison Wesley.
Hausser, Roland R. 1984. Surface Compositional Grammar. München: Wilhelm Finck Verlag.
Heim, Irene. 1983. On the projection problem for presuppositions. In Proceedings of the 2nd West Coast Conference on Formal Linguistics, M. Barlow, D. Flickinger, and D. Westcoat (eds.), 114–126, Stanford University.
Hendriks, Herman. 2001. Compositionality and Model-Theoretic Interpretation. Journal of Logic, Language and Information 10:29–48.
Henkin, Leon, Monk, Donald, and Tarski, Alfred. 1971. Cylindric Algebras. Part 1. Studies in Logic and the Foundations of Mathematics 64. Amsterdam: North-Holland.
Hindley, J. R., Lercher, B., and Seldin, J. P. 1972. Introduction to Combinatory Logic. London Mathematical Society Lecture Notes 7. Oxford: Oxford University Press.
Hindley, J. R., and Longo, L. 1980. Lambda calculus models and extensionality. Zeitschrift für mathematische Logik und Grundlagen der Mathematik 26:289–310.
Hintikka, Jaakko. 1962. Knowledge and Belief. An Introduction into the Logic of the Two Notions. Ithaca: Cornell University Press.
Hodges, Wilfrid. 2001. Formal features of compositionality. Journal of Logic, Language and Information 10:7–28.
Hopcroft, John E., and Ullman, Jeffrey D. 1969. Formal Languages and their Relation to Automata. Reading (Mass.): Addison Wesley.
van der Hulst, Harry. 1984. Syllable Structure and Stress in Dutch. Dordrecht: Foris.
Huybregts, Riny. 1984. Overlapping Dependencies in Dutch. Utrecht Working Papers in Linguistics 1:3–40.
IPA. 1999. Handbook of the International Phonetic Association. Cambridge: Cambridge University Press.
Jackendoff, Ray. 1977. X-bar Syntax: A Study of Phrase Structure. Linguistic Inquiry Monographs 2. Cambridge (Mass.): MIT Press.
Jacobson, Pauline. 1999. Toward a Variable-Free Semantics. Linguistics and Philosophy 22:117–184.
Jacobson, Pauline. 2002. The (Dis)Organisation of the Grammar: 25 Years. Linguistics and Philosophy 25:601–626.
Johnson, J. S. 1969. Nonfinitizability of classes of representable polyadic algebras. Journal of Symbolic Logic 34:344–352.
Johnson, Mark. 1988. Attribute-Value Logic and the Theory of Grammar. CSLI Lecture Notes 16. Stanford: CSLI.
Jones, Burton W. 1955. The Theory of Numbers. New York: Holt, Rinehart and Winston.
Joshi, Aravind, Levy, Leon S., and Takahashi, Masako. 1975. Tree Adjunct Grammars. Journal of Computer and System Sciences 10:136–163.
Joshi, Aravind K. 1985. Tree adjoining grammars: How much context-sensitivity is required to provide reasonable structural descriptions? In Natural Language Parsing. Psychological, Computational, and Theoretical Perspectives, David Dowty, Lauri Karttunen, and Arnold Zwicky (eds.), 206–250. Cambridge: Cambridge University Press.
Just, Winfried, and Weese, Martin. 1996. Discovering Modern Set Theory. Vol. I: The Basics. Graduate Studies in Mathematics 8. AMS.
Just, Winfried, and Weese, Martin. 1997. Discovering Modern Set Theory. Vol. II: Set-Theoretic Tools for Every Mathematician. Graduate Studies in Mathematics 18. AMS.
Kac, Michael B., Manaster-Ramer, Alexis, and Rounds, William C. 1987. Simultaneous-Distributive Coordination and Context-Freeness. Computational Linguistics 13:25–30.
Kaplan, Ron M., and Kay, Martin. 1994. Regular Models of Phonological Rule Systems. Computational Linguistics 20:331–378.
Karttunen, Lauri. 1974. Presuppositions and linguistic context. Theoretical Linguistics 1:181–194.
Kasami, Tadao, Seki, Hiroyuki, and Fujii, Mamoru. 1987. Generalized context-free grammars, multiple context-free grammars and head grammars. Technical report, Osaka University.
Kasami, Tadao. 1965. An efficient recognition and syntax-analysis algorithm for context-free languages. Technical report, Air Force Cambridge Research Laboratory, Bedford (Mass.). Science Report AFCRL-65-758.
Kayne, Richard S. 1994. The Antisymmetry of Syntax. Linguistic Inquiry Monographs 25. Cambridge (Mass.): MIT Press.
Keenan, Edward L., and Faltz, Leonard L. 1985. Boolean Semantics for Natural Language. Dordrecht: Reidel.
Keenan, Edward L., and Westerståhl, Dag. 1997. Generalized quantifiers. In Handbook of Logic and Language, Johan van Benthem and Alice ter Meulen (eds.), 835–893. Amsterdam: Elsevier.
Kempson, Ruth. 1975. Presupposition and the Delimitation of Semantics. Cambridge: Cambridge University Press.
Kleene, Stephen C. 1956. Representation of events in nerve nets. In Automata Studies, C. E. Shannon and J. McCarthy (eds.), 3–40. Princeton: Princeton University Press.
Knuth, Donald. 1965. On the translation of languages from left to right. Information and Control 8:607–639.
Koskenniemi, Kimmo. 1983. Two-level morphology. A general computational model for word form recognition. Technical Report 11, Department of General Linguistics, University of Helsinki.
Koster, Jan. 1986. Domains and Dynasties: The Radical Autonomy of Syntax. Dordrecht: Foris.
Koymans, J. P. C. 1982. Models of the Lambda Calculus. Information and Control 52:306–332.
Kracht, Marcus. 1993. Mathematical aspects of command relations. In Proceedings of the EACL 93, 241–250.
Kracht, Marcus. 1994. Logic and Control: How They Determine the Behaviour of Presuppositions. In Logic and Information Flow, Jan van Eijck and Albert Visser (eds.), 88–111. Cambridge (Mass.): MIT Press.
Kracht, Marcus. 1995a. Is there a genuine modal perspective on feature structures? Linguistics and Philosophy 18:401–458.
Kracht, Marcus. 1995b. Syntactic Codes and Grammar Refinement. Journal of Logic, Language and Information 4:41–60.
Kracht, Marcus. 1997. Inessential Features. In Logical Aspects of Computational Linguistics (LACL '96), Christian Retoré (ed.), 43–62. (Lecture Notes in Artificial Intelligence 1328.) Berlin/New York: Springer.
Kracht, Marcus. 1998. Adjunction Structures and Syntactic Domains. In The Mathematics of Sentence Structure. Trees and Their Logics, Uwe Mönnich and Hans-Peter Kolb (eds.), 259–299. (Studies in Generative Grammar 44.) Berlin: Mouton de Gruyter.
Kracht, Marcus. 1999. Tools and Techniques in Modal Logic. Studies in Logic and the Foundations of Mathematics 142. Amsterdam: Elsevier.
Kracht, Marcus. 2001a. Modal Logics That Need Very Large Frames. Notre Dame Journal of Formal Logic 42:141–173.
Kracht, Marcus. 2001b. Syntax in Chains. Linguistics and Philosophy 24:467–529.
Kracht, Marcus. 2003. Against the feature bundle theory of case. In New Perspectives on Case Theory, Eleonore Brandner and Heike Zinsmeister (eds.), 165–190. Stanford: CSLI.
Kuroda, S.-Y. 1964. Classes of languages and linear bounded automata. Information and Control 7:207–223.
Lamb, Sydney M. 1966. Outline of Stratificational Grammar. Washington: Georgetown University Press.
Lambek, Joachim. 1958. The Mathematics of Sentence Structure. The American Mathematical Monthly 65:154–169.
Landweber, Peter S. 1963. Three theorems on phrase structure grammars of type 1. Information and Control 6:131–137.
Langholm, Tore. 2001. A Descriptive Characterisation of Indexed Grammars. Grammars 4:205–262.
Lehmann, Winfred P. 1993. Theoretical Bases of Indo-European Linguistics. London: Routledge.
Leibniz, Gottfried Wilhelm. 2000. Die Grundlagen des logischen Kalküls (Lateinisch-Deutsch) [The Foundations of the Logical Calculus (Latin-German)]. Philosophische Bibliothek 525. Hamburg: Meiner Verlag.
Levelt, Willem P. 1991. Speaking. From Intention to Articulation. 2nd ed. Cambridge (Mass.): MIT Press.
Lyons, John. 1968. Introduction to Theoretical Linguistics. Cambridge: Cambridge University Press.
Lyons, John. 1978. Semantics. Vol. 1. Cambridge: Cambridge University Press.
Manaster-Ramer, Alexis, and Kac, Michael B. 1990. The Concept of Phrase Structure. Linguistics and Philosophy 13:325–362.
Manaster-Ramer, Alexis, Moshier, M. Andrew, and Zeitman, R. Suzanne. 1992. An Extension of Ogden's Lemma. Manuscript, Wayne State University.
Manaster-Ramer, Alexis. 1986. Copying in natural languages, context-freeness and queue grammars. In Proceedings of the 24th Annual Meeting of the Association for Computational Linguistics, 85–89.
Markov, A. A. 1947. On the impossibility of certain algorithms in the theory of associative systems (Russian). Doklady Akademii Nauk SSSR 55:587–590.
Marsh, William, and Partee, Barbara H. 1987. How Non-Context Free is Variable Binding? In The Formal Complexity of Natural Language, Walter Savitch, Emmon Bach, William Marsh, and Gila Safran-Naveh (eds.), 369–386. Dordrecht: Reidel.
Mel'čuk, Igor. 1988. Dependency Syntax: Theory and Practice. SUNY Linguistics Series. Albany: State University of New York Press.
Mel'čuk, Igor. 2000. Cours de Morphologie Générale [General Morphology. A Coursebook]. Volumes 1–5. Montréal: Les Presses de l'Université de Montréal.
Meyer, A. R. 1982. What is a model of the lambda calculus? Information and Control 52:87–122.
Michaelis, Jens, and Kracht, Marcus. 1997. Semilinearity as a syntactic invariant. In Logical Aspects of Computational Linguistics (LACL '96), Christian Retoré (ed.), 329–345. (Lecture Notes in Artificial Intelligence 1328.) Heidelberg: Springer.
Michaelis, Jens, and Wartena, Christian. 1997. How linguistic constraints on movement conspire to yield languages analyzable with a restricted form of LIGs. In Proceedings of the Conference on Formal Grammar (FG '97), Aix-en-Provence, 158–168.
Michaelis, Jens, and Wartena, Christian. 1999. LIGs with reduced derivation sets. In Constraints and Resources in Natural Language Syntax and Semantics, Gosse Bouma, Geert-Jan M. Kruijff, Erhard Hinrichs, and Richard T. Oehrle (eds.), 263–279. Stanford: CSLI.
Michaelis, Jens. 2001a. Derivational minimalism is mildly context-sensitive. In Logical Aspects of Computational Linguistics (LACL '98), Michael Moortgat (ed.), 179–198. (Lecture Notes in Artificial Intelligence 2014.) Heidelberg: Springer.
Michaelis, Jens. 2001b. On Formal Properties of Minimalist Grammars. Ph.D. diss., Universität Potsdam.
Michaelis, Jens. 2001c. Transforming linear context-free rewriting systems into minimalist grammars. In Logical Aspects of Computational Linguistics (LACL '01), Philippe de Groote, Glyn Morrill, and Christian Retoré (eds.), 228–244. (Lecture Notes in Artificial Intelligence 2099.) Heidelberg: Springer.
Miller, Philip H. 1991. Scandinavian Extraction Phenomena Revisited: Weak and Strong Generative Capacity. Linguistics and Philosophy 14:101–113.
Miller, Philip H. 1999. Strong Generative Capacity. The Semantics of Linguistic Formalisms. Stanford: CSLI.
Mitchell, J. C. 1990. Type systems for programming languages. In Handbook of Theoretical Computer Science, Vol. B: Formal Models and Semantics, Jan van Leeuwen (ed.), 365–458. Amsterdam: Elsevier.
Monk, Donald J. 1969. Nonfinitizability of classes of representable cylindric algebras. Journal of Symbolic Logic 34:331–343.
Monk, Donald J. 1976. Mathematical Logic. Berlin/Heidelberg: Springer.
Mönnich, Uwe. 1999. On cloning context-freeness. In The Mathematics of Sentence Structure. Trees and their Logics, Hans-Peter Kolb and Uwe Mönnich (eds.), 195–229. (Studies in Generative Grammar 44.) Berlin: Mouton de Gruyter.
Moschovakis, Yannis. 1994. Sense and denotation as algorithm and value. In Proceedings of the ASL Meeting 1990, Helsinki, Juha Oikkonen and Jouko Väänänen (eds.), 210–249. (Lecture Notes in Logic 2.) Berlin/Heidelberg: Springer.
Myhill, John. 1960. Linear bounded automata. Technical report, Wright-Patterson Air Force Base.
Ogden, R. W., Ross, R. J., and Winkelmann, K. 1985. An Interchange Lemma for Context Free Languages. SIAM Journal of Computing 14:410–415.
Ogden, R. W. 1968. A helpful result for proving inherent ambiguity. Mathematical Systems Sciences 2:191–194.
Ojeda, Almerindo E. 1988. A Linear Precedence Account of Cross-Serial Dependencies. Linguistics and Philosophy 11:457–492.
Pentus, Mati. 1995. Models for the Lambek calculus. Annals of Pure and Applied Logic 75:179–213.
Pentus, Mati. 1997. Product-Free Lambek Calculus and Context-Free Grammars. Journal of Symbolic Logic 62:648–660.
Peters, Stanley P., and Ritchie, R. W. 1971. On restricting the base component of transformational grammars. Information and Control 18:483–501.
Peters, Stanley P., and Ritchie, R. W. 1973. On the generative power of transformational grammars. Information Sciences 6:49–83.
Pigozzi, Don J., and Salibra, Antonino. 1995. The Abstract Variable-Binding Calculus. Studia Logica 55:129–179.
Pigozzi, Don J. 1991. Fregean Algebraic Logic. In Algebraic Logic, Hajnal Andréka, Donald Monk, and István Németi (eds.), 475–504. (Colloquia Mathematica Societatis János Bolyai 54.) Budapest/Amsterdam: János Bolyai Matematikai Társulat and North-Holland.
Pogodalla, Sylvain. 2001. Réseaux de preuve et génération pour les grammaires de types logiques [Proof nets and generation for type logical grammars]. Ph.D. diss., Institut National Polytechnique de Lorraine.
Pollard, Carl J., and Sag, Ivan. 1987. Information-Based Syntax and Semantics. Vol. 1. CSLI Lecture Notes 13. Stanford: CSLI.
Pollard, Carl J., and Sag, Ivan. 1994. Head-Driven Phrase Structure Grammar. Chicago: The University of Chicago Press.
Pollard, Carl J. 1984. Generalized Phrase Structure Grammar, Head Grammars and Natural Language. Ph.D. diss., Stanford University.
Port, R. F., and O'Dell, M. L. 1985. Neutralization of syllable-final voicing in German. Technical Report 4, Indiana University, Bloomington.
Post, Emil L. 1936. Finite combinatory processes: formulation. Journal of Symbolic Logic 1:103–105.
Post, Emil L. 1943. Formal reductions of the combinatorial decision problem. American Journal of Mathematics 65:197–215.
Post, Emil L. 1947. Recursive unsolvability of a problem of Thue. Journal of Symbolic Logic 11:1–11.
Postal, Paul. 1964. Constituent Structure: A Study of Contemporary Models of Syntax. The Hague: Mouton.
Prucnal, T., and Wroński, A. 1974. An algebraic characterization of the notion of structural completeness. Bulletin of the Section of Logic of the Polish Academy of Sciences 3:20–33.
Quine, Willard van Orman. 1960. Variables explained away. Proceedings of the American Philosophical Society 104:343–347.
Radzinski, Daniel. 1990. Unbounded Syntactic Copying in Mandarin Chinese. Linguistics and Philosophy 13:113–127.
Rambow, Owen. 1994. Formal and Computational Aspects of Natural Language Syntax. Ph.D. diss., University of Pennsylvania.
Recanati, François. 2000. Oratio Obliqua, Oratio Recta. An Essay on Metarepresentation. Cambridge (Mass.): MIT Press.
Roach, Kelly. 1987. Formal properties of head grammars. In Mathematics of Language, Alexis Manaster-Ramer (ed.), 293–347. Amsterdam: John Benjamins.
Rogers, James. 1994. Studies in the Logic of Trees with Applications to Grammar Formalisms. Ph.D. diss., University of Delaware, Department of Computer & Information Sciences.
Rounds, William C. 1988. LFP: A Logic for Linguistic Description and an Analysis of its Complexity. Computational Linguistics 14:1–9.
Russell, Bertrand. 1905. On denoting. Mind 14:479–493.
Sain, Ildikó, and Thompson, Richard S. 1991. Finite Schema Axiomatization of Quasi-Polyadic Algebras. In Algebraic Logic, Hajnal Andréka, Donald Monk, and István Németi (eds.), 539–571. (Colloquia Mathematica Societatis János Bolyai 54.) Budapest/Amsterdam: János Bolyai Matematikai Társulat and North-Holland.
Salomaa, Arto K. 1973. Formal Languages. New York: Academic Press.
van der Sandt, Rob A. 1988. Context and Presupposition. London: Croom Helm.
de Saussure, Ferdinand. 1965. Course in General Linguistics. Columbus: McGraw-Hill.
Sauvageot, Aurélien. 1971. L'Édification de la Langue Hongroise [The Building of the Hungarian Language]. Paris: Klincksieck.
Schönfinkel, Moses. 1924. Über die Bausteine der mathematischen Logik [On the building blocks of mathematical logic]. Mathematische Annalen 92:305–316.
Seki, Hiroyuki, Matsumura, Takashi, Fujii, Mamoru, and Kasami, Tadao. 1991. On multiple context-free grammars. Theoretical Computer Science 88:191–229.
Sestier, A. 1960. Contributions à une théorie ensembliste des classifications linguistiques [Contributions to a set-theoretical theory of classifications]. In Actes du Ier Congrès de l'AFCAL, 293–305, Grenoble.
Shieber, Stuart. 1985. Evidence against the Context-Freeness of Natural Languages. Linguistics and Philosophy 8:333–343.
Shieber, Stuart. 1992. Constraint-Based Grammar Formalisms. Cambridge (Mass.): MIT Press.
Smullyan, Raymond M. 1961. Theory of Formal Systems. Annals of Mathematics Studies 47. Princeton: Princeton University Press.
Staal, J. F. 1967. Word Order in Sanskrit and Universal Grammar. Foundations of Language, Supplementary Series 5. Dordrecht: Reidel.
Stabler, Edward P. 1997. Derivational Minimalism. In Logical Aspects of Computational Linguistics (LACL '96), Christian Retoré (ed.), 68–95. (Lecture Notes in Artificial Intelligence 1328.) Heidelberg: Springer.
von Stechow, Arnim, and Sternefeld, Wolfgang. 1987. Bausteine syntaktischen Wissens. Ein Lehrbuch der generativen Grammatik [Building Blocks of Syntactic Knowledge. A Textbook of Generative Grammar]. Opladen: Westdeutscher Verlag.
Steedman, Mark. 1990. Gapping as constituent coordination. Linguistics and Philosophy 13:207–263.
Steedman, Mark. 1996. Surface Structure and Interpretation. Linguistic Inquiry Monographs 30. Cambridge (Mass.): MIT Press.
Tarski, Alfred. 1983. The concept of truth in formalized languages. In Logic, Semantics, Metamathematics, J. Corcoran (ed.), 152–178. Indianapolis: Hackett Publishing.
Tesnière, Lucien. 1982. Éléments de syntaxe structurale [Elements of Structural Syntax]. 4th ed. Paris: Klincksieck.
Thatcher, J. W., and Wright, J. B. 1968. Generalized finite automata theory with an application to a decision problem of second-order logic. Mathematical Systems Theory 2:57–81.
Thue, Axel. 1914. Probleme über Veränderungen von Zeichenreihen nach gegebenen Regeln [Problems concerning changing strings according to given rules]. Skrifter utgit av Videnskapsselkapet i Kristiania, I. Mathematisk-naturvidenskabelig klasse 10.
Trakhtenbrot, B. A. 1950. Nevozmožnost' algorifma dlja problemy razrešimosti na konečnyh klassah [Impossibility of an algorithm for the decision problem of finite classes]. Doklady Akademii Nauk SSSR, 569–572.
Turing, Alan M. 1936. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society 42:230–265.
Uszkoreit, Hans. 1987. Word Order and Constituent Structure in German. CSLI Lecture Notes 8. Stanford: CSLI.
Vaught, Robert L. 1995. Set Theory. An Introduction. 2nd ed. Basel: Birkhäuser.
Veltman, Frank. 1985. Logics for Conditionals. Ph.D. diss., Department of Philosophy, University of Amsterdam.
Vijay-Shanker, K., Weir, David J., and Joshi, Aravind K. 1986. Tree adjoining and head wrapping. In Proceedings of the 11th International Conference on Computational Linguistics (COLING '86), Bonn, 202–207.
Vijay-Shanker, K., Weir, David J., and Joshi, Aravind K. 1987. Characterizing structural descriptions produced by various grammar formalisms. In Proceedings of the 25th Meeting of the Association for Computational Linguistics (ACL '87), Stanford, CA, 104–111.
Villemonte de la Clergerie, Eric. 2002a. Parsing MCS Languages with Thread Automata. In Proceedings of the Sixth International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+6), Venezia.
Villemonte de la Clergerie, Eric. 2002b. Parsing mildly context-sensitive languages with thread automata. In Proceedings of the 19th International Conference on Computational Linguistics (COLING '02), Taipei.
Weir, David J. 1988. Characterizing Mildly Context-Sensitive Grammar Formalisms. Ph.D. diss., University of Pennsylvania, Philadelphia.
Younger, D. H. 1967. Recognition and parsing of context-free languages in time n³. Information and Control 10:189–208.
Zadrozny, Wlodek. 1994. From Compositional Semantics to Systematic Semantics. Linguistics and Philosophy 17:329–342.
Zimmermann, Thomas Ede. 1999. Meaning Postulates and the Model-Theoretic Approach to Natural Language Semantics. Linguistics and Philosophy 22:529–561.

Index
,1 , , , ,1 , , ,1 , f in , 1 , 0 , 3 M ,3 R S, R S, 4 , Rn , R , R , 5 N , 5 , ,5 f P, 5 f X ,5 Tm , 6 X ,7 x , 9 , 9 ker, 9 E , 11 w , s , 13 x y, , 17 x , 17 Z x , 18 x i , i n xn , 18 A , 20 xT , 22 CL M , ZL P , 24 L M, Ln , L , L , L M, L M, 24 LP , LS , 24 , , 24 L , 24 PN , 26 M L, L M, 29 , 44 x, x, 44 , 44 b , 44 x y, 44 , 45 h x , d x , 45 x , 46 k x , 46 T x, 49 G x, G x, 53 L G , 53 n, T , 53 T T, X , 54 , 54 der G , der G , 58 in x f , out x f , 67 : , 68 , , , , 68 L , 69 LB G , 70 x q y T x1 q1 y1 , 82 C T , 82 L T , 83 L , 96 d , 96 , , , 97 , 98 der , 105 Lc G , 111 , 122 s , 123 k x, 124 n , 125 L X Y, , 129 , 133 A , n , 149 , 149 U V , nU, U, 150 U;V , 150 a , , 161 L , M , 177 F , 181 , , , 181 f , f , f , 182

I q

B 8 @ B @

I A r B 8 @ B I8 @ 6 I

q I

j 8 Bh '@ Bh '@ i B f pg@ pg@ B f

e d

I q qI

B @ I I A B @ I B @ B @ I B @ B @

B @

B  @ B @ B @

I I

B @

A A

B @

I r

B @

B p@

B D@

B @

B @ I S IS I qI

t xt

B @

p i hf g

A 

d eR

Is

Yy

574

Index , 311 , 311 , 312 L L , 312 KJ , BJ , 314 , 314 , 314 , , , , 314 , , 315 , 321 , , 325 M N , 327 , 327 a, 335 supp , 336 f , 340 x , x , 348 x A , 349 , 354 2 , 3 , 355 , , , 358 , 359 3 , 359 , , 362 , 371 x , 375 w x , 375 G , 384 , , , , , 396 , 397 D G , 415 , 415 X , 416 K , 431 Sent , 438 , 438 , 467 m , 469 , 472 Px x , 476 G1 G2, 479 ContL a , 489

, 241

y 6 F I

B @

B @

B @

a W

B R @ B @ B  @

B @

B  @

v )

 u

B P@ I B @ I

 

D8
8

c F

7 f

, 183 Taut , 192 A , 193 , 193 , 194 , , 205 , 207 , 211 (ext), 211 , 212 M N, 215 x y , 215 , , 215 , , , 216 , 217 Cat C , 225 , 229 , , 229 , , 233 CCG , 234 , , 234 , , 239 , 239 , (cut), , , 244 focc x M , 245 , 247 , 247 , 248 , , , 250 , 256 Lm , 263 Lm , 264 , , 294 X , 299 X , 300 At , 301 Pt , 302 , , 305 , 307 t d , 308 A B, 309

B kk @ B llPx5k l8 (@ B

7   

~ B @ 

F t 8 D8 E

F t 8 g8

B 8 @ wwu v v

B (@

8 @

B  @ } B f pg@

| { }

B @

B @

qm o rpn


Yf

s s z f

Index , , , 491 FL , 493 b , 505 , , , , , , , s t, s , 535 , 541 K P K Q , 542 x y, x , 546 x , x , x , x , 548

575

B chi, J., xiii, 471 u BackusNaur form, 55 backward deletion, 280 Bahasa Indonesia, 37, 445, 451 BarHillel, Yehoshua, 225, 227

Aform, 371 Ameaning, 349 astructure, 533 absorption, 297 abstract family of languages, 64 accommand, 545 accessibility relation, 312 address, 550 adjunct, 526 adjunction tree, 76 AFL, 64 Ajdukiewicz, Kazimierz, 225 AjdukiewiczBar Hillel Calculus, 225 algebra, 6 , 6 ngenerated, 181 boolean, 296 freeely ( -)generated, 7 manysorted, 12 product, 9 algebra of structure terms, xi ALGOL, 55, 170 allomorph, 32 allophone, 487 ALOGSPACE, 379 alphabet, 17 input, 500 output, 500 Alt, Helmut, vii analysis problem, 54 anaphor, 520 antecedent, 198, 524 antitonicity, 304


applicative structure, 214; combinatorially complete, 220; extensional, 214; partial, 214; typed, 214; Arabic, 400, 458; archiphoneme, 485; argument, 38; external, 526; internal, 526; argument key, 375; argument sign, 448; ARO, 389; assertion, 359; assignment, 193; associativity, 18, 297; left-, 26; right-, 26; assumption, 204; atom, 301; attribute, 462; definite, 467; set valued, 467; Type 0, 463; Type 1, 463; attribute-value structure (AVS), 462; automaton: deterministic finite state, 95; finite state, 95; pushdown, 118; automorphism, 8; AVS (see attribute-value structure), 462; axiom, 193; primitive, 199; axiom schema, 193; instance, 193; axiomatization, 317


functional, 525; lexical, 525; thin, 264; category assignment, 225, 226; category complex, 240; CCG (see combinatory categorial grammar), 233; CCS (see copy chain structure), 549; CCT (see copy chain tree), 546; cell, 374; centre tree, 76; chain, 43, 545; associated, 545; copy-, 545; foot-, 545; head-, 545; trace-, 545; Chandra, A. K., 372, 379; channel, 33; characteristic function, 5; chart, 128; Chinese, 34, 400; Chomsky Normal Form, 111; Chomsky, Noam, ix, xii, 51, 65, 90, 165, 367, 414, 448, 486, 517, 519, 522, 525, 529, 544; Chrysippos, 308; Church, Alonzo, 92, 224, 272, 443; class, 3; axiomatic, 471; finitely MSO-axiomatisable, 471; closure operator, 14; Cocke, J., 130; coda, 499; code, 478, 479, 510; uniform, 511; coding: dependency, 51; structural, 51; Coding Theorem, 480; colour functional, 68; combinator, 217

c-command, 521, 540; C-model, 364; c-structure, 533; cancellation, 145; cancellation interpretation, 240; canonical decomposition, 547; canonical Leibniz congruence, 321; canonical Leibniz meaning, 321; cardinal, 3; Carnap, Rudolf, 311; case, 41, 455; Catalan numbers, 52; Categorial Grammar, 274; categorial grammar: AB-, 226; categorial sequent grammar, 241; category, 53, 181, 225, 526; –, 292; basic, 225, 241; distinguished, 241

Beth property: global, 495; binding, 521; bivalence, 359, 360; blank, 24, 81; block, 431; Blok, Wim, 317; Bloomfield, Leonard, 529; Boole, George, 308; boolean algebra, 296; atomic, 302; complete, 304; with operators (BAO), 315; boolean function, 374; monotone, 378; bounding node, 528; bracketing analysis, 110; branch, 44; branch expression, 117; branching number, 44, 383; Bresnan, Joan, 533; Burzio, Luigi, 527


stratified, 219; typed, 218; combinatorial term, 217; stratified, 219; typed, 218; combinatory algebra, 221; extensional, 221; partial, 221; combinatory categorial grammar, 233, 432; combinatory logic, 217; command relation, 540; chain like, 542; definable, 544; generated, 542; monotone, 540; product of –s, 542; tight, 540; weakly associated, 543; commutativity, 297; Commuting Instances Lemma, 58, 60, 65; Compactness Theorem, 196; complement, 38, 296, 526; complementary distribution, 489; compositionality, x, 177; comprehension, 2; computation, 118; computation step, 81; concept, 15; conclusion, 198; configuration, 82, 118; congruence: fully invariant, 10; strong, 13; weak, 13; congruence relation, 8; admissible, 287; connective: Bochvar, 355; strong Kleene, 362; consequence, 194

n-step –, 286; consequence relation, 286; equivalential, 319; finitary, 286; finitely equivalential, 319; global, 312; local, 312; structural, 286; constant, 5; eliminable, 493; constant growth property, 369; constituent, 45, 72; G-, 132; accidental, 132; continuous, 46; constituent analysis, 111; Constituent Lemma, 47; constituent part: left, right, 73; constituent structure, 48; Constituent Substitution Theorem, 72; constraint, 535; basic, 535; context, 14, 22, 309; n-, 407; extensional, 311; hyperintensional, 309; intensional, 311; left, 141; local, 359; context change, 359; context set, 438, 489; contractum, 212; contradiction, 192; conversion (α, β, η), 211; conversioneme, 454; cooccurrence restrictions, 494; copy chain structure, 549; copy chain tree, 546; copy chain, 545; Copy-α, 545


domains: disjoint, 58; domination, 45; Doner, J. E., xiii, 510; double negation, 298; Dresner, Eli, 295; Dutch, 533; dyadic representation, 19; Ebert, Christian, vii, 372; edge, 66; element: overt, 550; elementary formal system, 392; embedding, 543; end configuration, 83; endomorphism, 8; English, 31, 36, 165, 168, 172, 451, 488, 515, 519, 523; environment, 340; equation, 10; reduced, 153; sorted, 12; equationally definable class, 11; equivalence, 108, 292; equivalence class, 9; equivalence relation, 4; equivalential term, 319; set of –s, 319; Erilt, Lumme, vii; exponent, 181; exponents: syntactically indistinguishable, 438; EXPTIME, 92; extension of a node, 46; extent, 14; f-structure, 533; Fabian, Benjamin, vii; factorization, 9; faithfulness, 510; Faltz, Leonard L., 304; feature, 531

Curry, Haskell B., 218, 221, 223, 224; Curry–Howard Isomorphism, 221; cut, 72; degree of a –, 201; weight of a –, 201; Cut Elimination, 201, 242; cut-weight, 201; cycle, 43; cyclicity, 528; cylindric algebra, 335; locally finite dimensional, 335; Czelakowski, Janusz, 318; DAG, 43; damit-split, 517; de Groote, Philippe, 458; de Morgan law, 298; de Morgan, Augustus, 308; de Saussure grammar, 448; de Saussure sign, 448; de Saussure, Ferdinand, 190, 447; decidable set, 84; Deduction Theorem, 194, 365; deductively closed set, 287; deep structure, 517; definability, 152; definition: global explicit, 495; global implicit, 492; dependency syntax, 51; depth, 45; derivation, 53, 58, 69, 415, 518; end of a –, 58; start of a –, 58; derivation grammar, 415; derivation term, 179; Devanagari, 34; devoicing, 485, 486; diagonal, 5; dimension, 150, 335, 336; direct image, 5; discontinuity degree, 407; distribution classes, 24

distinctive, 37; foot, 531; head, 531; suprasegmental, 36; field of sets, 299; Fiengo, Robert, 443; filter, 287, 303; Fine, Kit, 134; finite intersection property, 303; finite state automaton: partial, 497; Finnish, 34, 35, 456, 489, 504; first-order logic, 269, 325; Fisher–Ladner closure, 493; FOL, 269, 325; forest, 43; formula: codable, 478, 510; contingent, 192; cut-, 198; main, 198; well-formed, 192; FORTH, 26; forward deletion, 280; foundation, 2; frame consequence: global, 313; local, 312; Frege, Gottlob, 224, 308, 440; French, 31, 34–36; Fujii, Mamoru, 413; function, 5; bijective, 5; bounded, 348; computable, 84; finite, 348; injective, 5; partial, 5; surjective, 5; functional head, 459; functor, 38; functor sign, 448


graph, 66; Gärdenfors model, 364; Gärdenfors, Peter, 364; Gärtner, Hans-Martin, vii; Gödel, Kurt, 326; Gaifman, Haim, 227; Galois correspondence, 13; Gazdar, Gerald, 165, 530; GB (see Theory of Government and Binding), 522; Gehrke, Stefanie, vii; Geller, M. M., 143; Generalized Phrase Structure Grammar, 462, 529; generalized quantifier, 279; Gentzen calculus, 198; German, 31, 35, 36, 40, 165, 452, 457, 461, 488, 489, 517, 523, 530, 533, 539; Ginsburg, Seymour, 147, 158; government, 527; proper, 527; GPSG (see Generalized Phrase Structure Grammar), 462; grammar, 53; LR(k)-, 139; ambiguous, 135; Chomsky Normal Form, 111; context-free, 54; context-sensitive, 54; de Saussure, 448; derivation, 415; inherently opaque, 133; invertible, 112; left regular, 103; left transparent, 141; linear, 127; natural, 439; noncontracting, 61; of Type 0, 1, 2, 3, 54; perfect, 113; reduced, 109


Hindi, 31, 34, 36; Hindley, J. R., 342; Hodges, Wilfrid, vii, 292, 293, 435; homomorphism, 8, 13; free, 63; sorted, 12; strong, 13; weak, 13; Horn formula, 382; Howard, William, 221; HPSG (see Head Driven Phrase Structure Grammar), 463; Hungarian, 35, 40, 503; Husserl, Edmund, 224; Huybregts, Riny, 165, 530; I-model, 363; idc-command, 540; idempotence, 297; identifier S, 550; independent pumping pair, 75; index, 337; index grammar, 425; linear, 425; right linear, 429; simple, 426; index scheme, 424; context-free, 424; linear, 424; terminal, 424; Indo-European, 503; instantiation, 424; intent, 14; Interchange Lemma, 80, 168; interpolant, 259; interpolation, 259; interpretation: group valued, 258; interpreted language: boundedly reversible, 348; finitely reversible, 348; strongly context-free, 344

H-grammar, 292; H-semantics, 292; Halle, Morris, 486; Halmos, Paul, 336; Hanke, Timo, vii; Harkema, Henk, 414; Harris, Zellig S., 278, 423, 457, 517; Harrison, M. A., 143; Hausser, Roland, 446; head, 38, 525; Head Driven Phrase Structure Grammar, 463, 529; head grammar, 406; height, 45; Heim, Irene, 360; hemimorphism, 315; Henkin frame, 272; Henkin, Leon, 331; Herbrand universe, 384; Hilbert(-style) calculus, 192

regular, 54; right regular, 103; slender, 107; standard form, 111; strict deterministic, 125; strictly binary, 95; transparent, 133; grammar –, 60; faithful, 478; product of –s, 510; grammatical relation, 38; graph: connected, 43; directed, 43; directed acyclic, 43; directed transitive acyclic (DTAG), 43; graph grammar, 69; context-free, 69; greatest lower bound (glb), 297; Greibach Normal Form, 113; Groenink, Annius, xii, 381, 392, 409

weakly context-free, 344; interpreted string language, 177; intersectiveness, 304; intuitionistic logic, 192; isomorphism, 8; Italian, 520; Jäger, Gerhard, vii; Japanese, 40; Johnson, J. S., 342; join, 296; join irreducibility, 298; Joshi, Aravind, 161, 369, 406, 418; Kac, Michael B., 530; Kanazawa, Makoto, vii; Kaplan, Ron, 486, 502, 533; Karttunen, Lauri, 360; Kasami, Tadao, 130, 413; Kay, Martin, 486, 502; Keenan, Edward L., vii, 304; Kempson, Ruth, 354; key, 371; Kleene star, 24; Kleene, Stephen C., 92, 100, 362; Kobele, Greg, vii; Kolb, Hap, vii; Koniecny, Franz, vii; Kosiol, Thomas, vii; Koskenniemi, Kimmo, 486; Koymans, J. P. C., 342; Kozen, Dexter C., 372, 379; Kracht, Marcus, 369, 372, 530, 539; Kripke frame, 312, 468; generalized, 312; Kripke model, 468; Kronecker symbol, 149; Kuroda, S.-Y., 90


λ-algebra, 221; λ-term, 209; closed, 209; congruent, 211

contraction, 212; evaluated, 212; pure, 209; relevant, 448; –-term, 254; linear, 254; strictly linear, 254; –-term, 208; L-frame, 268; labelling function, 43; Lambek, Joachim, 225, 250; Lambek Calculus, 225, 250; Non-associative, 250; Landweber, Peter S., 90; Langholm, Tore, 433; language, 23; LR(k)-, 139; k-pumpable, 409; n-turn, 127; accepted by stack, 119; accepted by state, 119; 2-template, 497; accepted, 83; almost periodical, 159; context-free, 55, 103; context-free deterministic, 122; context-sensitive, 55; decidable, 85; Dyck, 123; finite index, 103; head final, 40; head initial, 40; inherently ambiguous, 135; interpreted, 177; linear, 127, 150; mirror, 122; NTS-, 136; of Type 0, 1, 2, 3, 54; OSV, OVS, VOS, 39; prefix free, 122; propositional, 285; PTIME, 300


link, 546, 550; maximal, 550; link extension, 551; link map, 546; orbital, 548; Lipták, Zsuzsanna, vii; literal movement grammar: n-branching, 422; definite, 394; linear, 402; monotone, 408; noncombinatorial, 387; nondeleting, 386; simple, 387; literal movement grammar (LMG), 383; Little Deduction Theorem, 195; little pro, 521; LMG (see literal movement grammar), 383; logic, 318, 471; boolean, 192; classical, 192; first-order, 269, 325; Fregean, 320; intuitionistic, 192; logical form, 519; LOGSPACE, 372; Longo, G., 342; loop, 257; Mönnich, Uwe, vii, 78; Malay, 451; Manaster-Ramer, Alexis, vii, 530; Mandarin, 400, 444, 458; map: ascending, 546; discriminating, 487; Markov, A. A., 90; matrix, 286; canonical, 287; reduced, 287; matrix semantics, 287; adequate, 287

recursively enumerable, 84; regular, 55, 95; semilinear, 150; SOV, SVO, VSO, 39; strict deterministic, 123; string, 23; ultralinear, 127; verb final, 40; verb initial, 40; verb medial, 40; weakly semilinear, 381; language –, 487; Latin, 39, 40, 447, 457, 520; lattice, 297; bounded, 298; distributive, 298; dual, 308; law of the excluded middle, 363; LC-calculus, 145; LC-rule, 145; leaf, 44; least upper bound (lub), 297; left transparency, 141; Leibniz equivalence, 321; Leibniz operator, 321; Leibniz Principle, 290, 291, 293, 296, 308–311, 317, 320–323, 434, 435, 438, 444–447; Leibniz, Gottfried W., 290; letter, 17, 35; letter equivalence, 150; Levy, Leon S., 161; lex, 31; LFG (see Lexical Functional Grammar), 529; Liberation, 546; licensing, 465, 526; LIG, 425; ligature, 34; Lin, Ying, vii; linearisation, 104; leftmost, 105

Matsumura, Takashi, 413; May, Robert, 443; MDS (see multidominance structure), 549; meaning: A-, 349; meaning postulate, 318; meet, 296; meet irreducibility, 298; meet prime, 298; Mel'čuk, Igor, 42, 51, 190, 451; mention, 309; metarule, 531; Michaelis, Jens, vii, 369, 413, 429; Miller, Philip H., 174; Minimalist Program, 523; mirror string, 22; modal logic, 311; classical, 311; monotone, 311; normal, 311; quasi-normal, 311; modality: universal, 475; mode, x, 181; model, 468; model class, 471; module, 523; Modus Ponens (MP), 193, 204; Monk, Donald, 342; monoid, 19; commutative, 147; monotonicity, 304; Montague Grammar (see Montague Semantics), 190; Montague Semantics, 190, 228, 269, 273, 343, 350, 353, 354; Montague, Richard, 269, 274, 291, 300, 311, 315, 440–443, 446; morph, 30; morpheme, 32; morphology, 30


move, 118; Move-α, 522; MSO (see monadic second order logic), 467; multidominance structure, 549; ordered, 549; multigraph, 66; multimodal algebra, 315; MZ-structure, 473; Németi, István, vii; natural deduction, 204; natural deduction calculus, 206; natural number, 3; network, 375; goal, 375; monotone, 377; NEXPTIME, 92; No Recycling, 548; node: active, 70; central, 414; daughter, 44; mother, 44; non-active, 70; nonterminal: completable, 107; reachable, 107; normal form, 212; notation: infix, 25; Polish, 26; Reverse Polish, 26; NP, 92; NTS-property, 136; nucleus, 499; number, 375; object, 38; occurrence, 435; OMDS (see ordered multidominance structure), 549


polyadic algebra: finitary, 336; polymorphism, 237; polynomial, 8; polyvalency, 41; portmanteau morph, 457; position, 22; Post, Emil, 65, 80, 90, 92; Postal, Paul, 165, 530; postfix, 22; postfix closure, 24; precedence, 23; Predecessor Lemma, 44; predicate, 383; prefix, 22; prefix closure, 24; premiss, 198; Presburger, 157; Presburger Arithmetic, 147, 160; presentation, 90; presupposition, 354; generic, 358, 359; priorisation, 105; problem: ill-conditioned, 281; product, 479; product of algebras, 9; product of grammars –, 479; production, 54; X-, 114; contracting, 54; expanding, 54; left recursive, 114; strictly expanding, 54; productivity, 54; program: elementary, 490; progress measure, 437; projection, 254, 510; projection algorithm, 359; proof, 193; length of a –, 193

one, 297; onset, 499; operator: dual, 311; necessity, 311; possibility, 311; order: crossing, 166; nesting, 166; ordered set, 4; ordering, 4; exhaustive, 47; lexicographical, 18; numerical, 18; ordinal, 3; overlap, 44, 411; p-category, 431; parasitic gap, 522; Parikh map, 150; Parikh, Rohit, 135, 151, 162; parsing problem, 54; Parsons, Terry, vii; partition, 4; strict, 124; path, 44, 532; PDL (see propositional dynamic logic), 490; Peirce, Charles S., 190; Pentus, Mati, xii, 258, 264, 268, 269; permutation, 336, 385; Peters, Stanley, 533; phenogrammar, 443; phone, 30; phoneme, 31, 35, 488; phonemicization, 488; Phonological Form, 520; phonology, 30; Pigozzi, Don, 317; Pogodalla, Sylvain, 353; point, 302; Polish Notation, 42, 55, 133, 179, 180; Pollard, Carl, 406

proof tree, 199, 206; propositional dynamic logic, 490; elementary, 491; propositional logic: inconsistent, 307; Prucnal, T., 319; PSPACE, 92; PTIME, 92; Pullum, Geoffrey, 165, 530; Pumping Lemma, 74, 80; pushdown automaton, 118; deterministic, 122; simple, 119; Putnam, Hilary, 90; QML (see quantified modal logic), 468; quantified modal logic, 468; quantifier: restricted, 475; quantifier elimination, 157; quasi-grammar, 60; Quine, Willard van Orman, 336; R-simulation, 108; Radzinski, Daniel, 401, 444; Rambow, Owen, 369; Ramsey, Frank, 364; readability: unique, 25; realization, 182, 487; Recanati, François, 322; recognition problem, 54; recursively enumerable set, 84; redex, 212; reducibility, 109; reduction, 138; reference, 310; referential expression, 520; register, 283; regular relation, 501; relation, 4; reflexive, 4; regular, 501; symmetric, 4; transitive, 4; replacement, 211, 435; representation, 25; representative: unique, 25; restrictiveness, 304; RG, CFG, CSG, GG, 57; rhyme, 499; Riggle, Jason, vii; RL, CFL, CSL, GL, 57; Roach, Kelly, 406; Rogers, James, 530; Roorda, Dirk, 258, 263; root, 43, 547; Rounds, William, xii, 381; rule, 53, 199, 206; admissible, 201; definite, 394; downward linear, 386; downward nondeleting, 386; eliminable, 201; finitary, 199, 206, 286; instance, 384; instance of a –, 57; monotone, 408; noncombinatorial, 387; simple, 387; skipping of a –, 114; terminal, 54; upward linear, 386; upward nondeleting, 386; rule instance: domain of a –, 57; rule of realization, 30; rule scheme, 424; rule simulation: backward, 108; forward, 108; Russell, Bertrand, 354, 440; Sain, Ildikó, 336, 342; Salinger, Stefan, vii

sandhi, 496 Sch nnkel, Moses, 220, 224 o search breadthrst, 106 depthrst, 105 second order logic, 467 monadic, 467 segment, 17, 36, 526 segmentability, 36 Seki, Hiroyuki, 413 semantics primary, 289 secondary, 289 seme, 31 semi Thue system, 53 semigroup commutative, 147 semilattice, 297 sense, 310 sentence, 171 sequent, 198, 240 thin, 264 sequent calculus, 199 sequent proof, 198 Sestierclosure, 24 Sestieroperator, 24 set, 1 of worlds, 312 conite, 302 consistent, 195, 287 countable, 3 deductively closed, 287 downward closed, 196 maximally consistent, 287 Shamir, E., 227 Shieber, Stuart, 165, 530 shift, 138 shufing, 543 sign, x, 181 category, x de Saussure, 448 exponent, x

Index Staudacher, Peter, vii Steedman, Mark, 278 stemma, 51 Sternefeld, Wolfgang, vii, 530 Stockmeyer, L. J., 372, 379 strategy, 138 bottomup, 146 generalized left corner, 146 topdown, 146 stratum, 30 deep, 32 morphological, 30 phonological, 30 semantical, 30 surface, 32 syntactical, 30 string, 17 associated, 46 length, 17 representing, 25 string sequence, 58 associated, 58 string term, 448 progressive, 448 weakly progressive, 448 string vector algebra, 397 structural change, 515 structural description, 515 structure, 15, 468 structure over A, 43 structure term, 182 denite, 182 orthographically denite, 182 semantically denite, 182 sentential, 293 syntactically denite, 182 structures A , 43 connected, 473 subalgebra, 9 strong, 13 subcategorization frame, 526

587

subframe generated, 317 subgraph, 45 subjacency, 544 subject, 38 substitution, 285, 384 string, 22 substitutions, 7 substring, 22 substring occurrence, 22 contained, 23 overlapping, 23 subtree, 45 local, 70 succedent, 198 sufx, 22 supervaluation, 361 suppletion, 457 support, 336 surface structure, 518 Suszko, Roman, 318 Swiss German, 165, 167, 454, 539 symbol nonterminal, 53 terminal, 53 synonymy, 292 H , 292 Husserlian, 292 Leibnizian, 293 syntax, 30 system of equations proper, 98 simple, 98 Tmodel, 525 TAG, 416 Takahashi, Masako, 161 Tarskis Principle, 435 Tarski, Alfred, 435 tautology, 192 Tchao, Ngassa, vii tectogrammar, 443 template, 496

588

Index correctly labelled, 226 ordered, 46 partial G , 132 properly branching, 48 tree adjoining grammar standard, 416 tree adjunction grammar unregulated, 77 tree domain, 49 truth, 193 truth value, 286 designated, 286 Turing machine, 81 alternating, 378 deterministic, 81 linearly space bounded, 90 logarithmically space bounded, 372 multitape, 85 universal, 89 Turing, Alan, 80, 92 turn, 126 type, 181, 337 type raising, 241 ultralter, 303 umlaut, 32, 33, 35, 485 underspecication, 465 unfolding map, 183 unication, 531 uniqueness, 546 unit, 18 use, 309 Uszkoreit, Hans, 539 UTAG, 77 V2movement, 517 valuation, 211, 282, 312, 384 value, 462 atomic, 463 van der Hulst, Harry, 499 van Eijck, Jan, 360 van Fraassen, Bas, 361 variable

template language, 496 boundary, 496 term, 6 , 6 equivalential, 319 level of a , 6 regular, 97 term algebra, 7 term function, 8 clone of, 8 term replacement system, 78 Tesni` re, Lucien, 51 e text, 359 coherent, 359 TG (see Transformational Grammar), 515 Thatcher, J. W., xiii, 510 theory, 317, 322 MSO, 471 Theory of Government and Binding, 522 thin category, 264 Thompson, Richard S., 336, 342 Thue system, 53 Thue, Axel, 65 topicalisation, 515 trace, 520, 524 trace chain structure, 551 trajectory, 548 Trakht nbrodt, B. A., 285 e transducer deterministic nite state, 500 nite state, 499 Transducer Theorem, 167, 501 transformation, 336 Transformational Grammar, 515 transition, 118 transition function, 95, 118, 500 transitive closure, 5 transparency, 133 transposition, 336 tree, 44, 549

Index bound, 171, 209 free, 209 occurrence, 209 propositional, 192 structure , 464 variety, 10 congruence regular, 320 vector cyclic, 150 Veltman, Frank, 364 verb intransitive, 41 transitive, 41 verb cluster, 533 vertex, 66 vertex colour, 66 vertex colouring, 66 VijayShanker, K., 406, 418 Villemonte de la Clergerie, Eric, 414 von Humboldt, Alexander, 443 von Stechow, Arnim, 530 Wartena, Christian, 429 Weir, David, 406, 418 wellordering, 3 wff, 192 word, 36, 487 word order free, 41 Wright, J. B., xiii, 510 Wro ski, Andrzej, 319 n

589

rule, 211 Xsyntax, 353, 525 XML, 123


Younger, D. H., 130; Z-structure, 470; Zaenen, Annie, 533; zero, 296; Zwicky, Arnold, 172
