63
Natural Language Communication with Computers
Editor Leonard Bolc Institute of Informatics Warsaw University PKiN, pok. 850 00-901 Warszawa/Poland
Library of Congress Cataloging in Publication Data
Natural language communication with computers. (Lecture notes in computer science ; 63) Bibliography: p. Includes index. 1. Interactive computer systems--Addresses, essays, lectures. 2. Question-answering systems--Addresses, essays, lectures. 3. Language data processing--Addresses, essays, lectures. I. Bolc, Leonard, 193~ II. Series. QA76 99. I58N37 OO1.6 '~ 78-15393
AMS Subject Classifications (1970): 68-02, 68A30, 68A45 CR Subject Classifications (1974):
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically those of translation, reprinting, re-use of illustrations, broadcasting, reproduction by photocopying machine or similar means, and storage in data banks. Under § 54 of the German Copyright Law where copies are made for other than private use, a fee is payable to the publisher, the amount of the fee to be determined by agreement with the publisher. © by Springer-Verlag Berlin Heidelberg 1978 Printed in Germany Printing and binding: Beltz Offsetdruck, Hemsbach/Bergstr. 2145/3140-543210
PREFACE
In recent years, attempts have been made in numerous countries to develop natural language systems of communication with computers. [...] research institutes [...] research programs. This publication should facilitate an exchange of information concerning the present state of research in this area.
The authors would like to express their thanks to Springer-Verlag for publishing this volume.
Warsaw, May 1978
Leonard Bolc
CONTENTS
A formalism for the description of question answering systems
    Camilla Schwind
Access to data base systems via natural language
    Klaus-Dieter Krägeloh, Peter C. Lockemann
An overview [...] a problem solving [...] as query language
    G.L. Berry-Rogghe, H. Wulz
Metamorphosis [...]
    A. Colmerauer
The theory of augmented transition [...] grammars
    Madeleine [...]
Syntactic analysis of written Polish
    Stanisław Szpakowicz
A FORMALISM FOR THE DESCRIPTION OF QUESTION ANSWERING SYSTEMS
Camilla Schwind
Technische Universität München
ABSTRACT
The following article presents a formalism for the description of a [...] system. [...] by a state [...] which are applied to formulae and make their truth value dependent [...] the formula is evaluated. [...] of the non-logical symbols depends also on the state of the world and it may change when a state changes. Natural language texts are described syntactically by a formal grammar, which is an extension of a CHOMSKY-grammar. The alphabet consists [...] symbols [...] by special rules. The derivation [...] of a rule is governed by the structure of the symbols and, on applying one rule, we can derive a set of sentences. Natural language texts are translated into state logic formulae [...] which are associated with the production rules. [...] are evaluated. We will give a [...]
INTRODUCTION
Since the early 60's intelligent systems have been developed which are capable of understanding natural language sentences, of answering questions [...] bases, or of carrying out commands. Most of those systems have been designed in regard to their special problem areas which are very different from each other (e.g. [1], [2], [3], [4], [5]) [...] the same main problems:
(1) The representation of the knowledge which is [...] by the natural language sentences.
(2) The syntactic analysis of texts and the translation of them into a semantic representation.
We intend to propose a formalism that describes these two problem areas in a very general manner so that existing natural language based intelligent systems fit into this formalism. The heart of the knowledge representation system is a state logic containing special operators for immediately following and preceding states (+,-) as well as for all future states (F) and all past states (P). Similar systems have also been mentioned in [7]. But the crucial point is: In usual tense logic systems, the structure of tense has been studied only as to its "pure logical" properties; we could only prove theorems like "If p is true from today on then it will be true from tomorrow on" (Fp → +Fp). In intelligent systems however, we need theorems about the nonlogical properties of state changes. The tense structure of a world is determined by changes within the world which affect the nonlogical symbols of the world, i.e. the functions or predicates: If a robot takes a block a lying on a block b, then this causes a change of the world [...] predicate symbol ON [...] the world, [...] with the meaning of the function symbol [...]. If we incorporate such nonlogical change descriptions [...] system, we will be able to prove theorems like: "If only [...] table and John takes it, then the table will be empty at the following instant". Taking into account these considerations,
[...] which can be subject to some change and the resulting structure is [...]. Such Kripke-type semantics has been used for the semantic characterization of modal logic ([8],[9]). Truth values are assigned to formulae depending on the state of the world in which the formula is evaluated. And the state operators take into account the truth value of a formula in some other states which can be "reached" from the actual state.
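As a minimal sketch of this kind of evaluation (not part of the original formalism), the following Python fragment interprets formulae as functions from states to truth values. The three-state world, the state names `s0`, `s1`, `s2` and the helper names are hypothetical. The reading of the operators is an assumption taken from the informal description above: +A and -A look at the immediately following and preceding states, while FA and PA look at all states reachable forwards and backwards, the present state included.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

State = str
Formula = Callable[[State], bool]

# Hypothetical world: R relates each state to its immediate successor.
R: Set[Tuple[State, State]] = {("s0", "s1"), ("s1", "s2")}

# State-dependent extension of the predicate symbol ON:
# block a lies on block b in s0 only (it has been taken away afterwards).
ON: Dict[State, Set[Tuple[str, str]]] = {
    "s0": {("a", "b")},
    "s1": set(),
    "s2": set(),
}


def successors(s: State) -> Set[State]:
    return {t for (u, t) in R if u == s}


def predecessors(s: State) -> Set[State]:
    return {u for (u, t) in R if t == s}


def reachable(s: State, step: Callable[[State], Iterable[State]]) -> Set[State]:
    """States reachable from s via `step`, including s itself."""
    seen = {s}
    todo = [s]
    while todo:
        for nxt in step(todo.pop()):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def on(x: str, y: str) -> Formula:
    return lambda s: (x, y) in ON[s]


def neg(f: Formula) -> Formula:
    return lambda s: not f(s)


def plus(f: Formula) -> Formula:      # +A: A holds in every immediately following state
    return lambda s: all(f(t) for t in successors(s))


def minus(f: Formula) -> Formula:     # -A: A holds in every immediately preceding state
    return lambda s: all(f(t) for t in predecessors(s))


def future(f: Formula) -> Formula:    # FA: A holds from now on
    return lambda s: all(f(t) for t in reachable(s, successors))


def past(f: Formula) -> Formula:      # PA: A held up to now
    return lambda s: all(f(t) for t in reachable(s, predecessors))
```

With these definitions, `plus(neg(on("a", "b")))("s0")` holds because the block is no longer on b in the state following `s0`, and the purely logical theorem Fp → +Fp quoted above can be checked state by state.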
Let us consider as an example a world consisting of three blocks a, b, c and two hands h and h'. [...] the possible actions which can be executed next are that [...] takes [...] back on [...] on the floor [...]. The [...] network [...] of the world [...] state transition [...] changes, or we may also say it contains descriptions of all actions which can be executed by some of the objects of the world.
Figure: (1) h takes b. (2) h gives b to h'. (3) h puts b on a. (4) h puts b on the floor.
The language of the state logic is formalized by a set of axioms and inference rules for which completeness has been proven.
Natural language texts are analysed syntactically by a formal grammar which is an extension of a CHOMSKY-grammar. The alphabet consists of symbols which are in their turn composed of pairs (feature, value). The algebraic properties [...] in [10]. Their structure is defined by insertion rules which specify what features with what values can be contained in one structured symbol. The set of the structured symbols is ordered by inclusion and this ordering gives rise to a modified definition of derivability. [...] (P1 ... Pm, Q1 ... Qn) [...] occurrence of P1' ... Pm' [...] Q1' ... Qn' if Pi ⊆ Pi' and Qj ⊆ Qj' for 1 ≤ i ≤ m and 1 ≤ j ≤ n. Structured symbols are used in a formal grammar describing natural language sentences in the following way: There is one "starting feature", cat, the
value
usu:Jlly appearing VP
language DET
(determiner),
features
properties
categories it
(transitive),
(perfect) ,
(future)
Semantic
"animate"
bo]on~ing
to nouns.
feature.
in:~e~'tlon rules
are t r a n s d u c e d
is p e r f o r m e d This
by functions
the p r o d u c t i o n
rules.
concept
of assigning syntactic
sentation introduced
to sentences by Knuth
depending
on their [12]
has been
[11]
and Koster
languages.
an attributed formulae.
feature
description
of sentences
logic
syntax
transformation
introduced
are a very
cumbersome
means
of analysing
for i n t r o d u c i n g
to associate
structure "surface"
because which
get
had the same m e a n i n g tem the semantic the help ture but (2) The
syntactically
different.
representation
of a sentence
of applications.
and different
sentences symbols
logically
equivalent
rules which
classes. rules
Derivability
in such a way
that production
incompletely,
have to be specified
features can be calculated by application of an insertion rule, which adds features to alphabetic symbols according to the features belonging already to that symbol. It is possible to add such features during sentence analysis when they are needed for the analysis and it is possible to neglect them when they are not needed. This [...] is advantageous in natural language analysis because there are many features of categories or words which are sometimes needed in sentence analysis and sometimes are not. See for example the sentence "The woman is sitting in the café". The feature "animate" of the noun "woman" is not needed for the analysis of this sentence.
[...] OF THE MEANING OF NATURAL LANGUAGE SENTENCES
The
language
of
Z,L,
prudicate logic
logic. [7]. We
It
in tenure here,
have h o w e v e r
of tense nor
the p r o p o s e d
antisymmetric
linear.
con:~ists
of the
following x,y,z, V, A
symbols: ''' xI' "'' YI' "'' and +A, set which zl' "'']sl-place symbols are PA this R9 , in formed is. We to be is done of
variables ^, ~ , and if
symbols way
F i , where +, -,
s i c ~; r j - p l a c e F, P . T e r m s then of the
rj r w 9
is a formula, subset In c l a s s i c a l
a certain
formulae
predicate
logic,
of a s t r u c t u r e
consists
of a set
between
the objects.
a set
of c l a s s i c a l
structures, of the
relation
reflects
the m e a n i n g
operators.
values
are a s s i g n e d state
to f o r m u l a e
depending
on the
en s t r u c t u r e
and the a c t u a l
of the
structure.
for
is g i v e n the set
by
the
pair of
is c a l l e d
of states relation
( fl)ir a
is a b i n a r y
on
structure: OB , r fl Pjg objects of system. assigns s on si-tuples for c ~ . i OB . by R . : OBSl Arj , is a p a r t i a l mapping which
of e l e m e n t s
to e l e m e n t s is a r j - p l a c e closure of
rj c m reflexive
relation Re
transitive,
is d e n o t e d
We
consider of every t If L
A and
as a m a p p i n g a truth
assigning
an e l e m e n t
of
OB
to every of
value
to every
variable-free
formula
state
(2) If
is not a name,
t = F i to
sl-place
function
L{ymbol
F i . Then
.~'t(c*) ( A ( ~ , t o ) , . . . A ( m , t p s
of L :
a closed
formula then
A(~,t I) : A(s,t2) trj_ | , then (A(&,to), A(a,B) then A(s,B) then A(s,Bx[C]) for all A(8,A) for all A(8,A) 8 : T 8 : T B : T B = T . A iff A iff such that 8 R such that a R B such that 8 R o : T such for all that c r s Ro 8 . = A(s,C) = T = F .... A(~ ,trj. I ) ) r -(~) pj
A(~,A) A(~,A)
A(a,A) (7) if
A(~,A)
(8) A C ~ , + A ) : (9) A ( s , - A ) :
(D) A ( s , F A ) =
iff
(]I) A ( ~ , P A ) =
iff
is v a Z 4 d for all
4n
a 8~a~e
of i,
a strueture a 8~ruature
and a f o r m u l a
is u u Z 4 d
A formula We have
is u a Z 4 d a set
given
of l o g i c a l proven.
inference
completeness
has b e e n
The existence [...] within each [...]. As for [...] symbols [...] us to describe nonlogical changes [...] the world. For an example, think [...] piling up sand. [...] a change [...] causes [...] increase. If we choose SIZE [...] the size of an object [...], c be the name of the child, h the name of the pile, [...] at state α being 5 and at state β 10, [...] in the following:

SIZE^α(h) = 5; SIZE^β(h) = 10; α Rσ β

and we can verify the formula

SIZE(h) = 5 ∧ PILE(c,h) → +SIZE(h) = 10.
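The sand pile example can be checked mechanically. The sketch below is an illustration under assumptions, not the paper's own procedure: the state names `alpha` and `beta`, the tables `SIZE` and `PILE` and the helper names are hypothetical, and + is read as "in every immediately following state". The point it shows is that a function symbol such as SIZE has a different value in each state.

```python
# Hypothetical state-dependent interpretations of SIZE and PILE.
SIZE = {"alpha": {"h": 5}, "beta": {"h": 10}}
PILE = {"alpha": {("c", "h")}, "beta": set()}
R = {("alpha", "beta")}  # beta immediately follows alpha


def plus(formula, state):
    """+A at `state`: A holds in every immediately following state."""
    return all(formula(t) for (s, t) in R if s == state)


def size_is(obj, n):
    return lambda state: SIZE[state][obj] == n


def piles(child, obj):
    return lambda state: (child, obj) in PILE[state]


def change_law(state):
    """SIZE(h) = 5 and PILE(c,h) implies +SIZE(h) = 10, evaluated at `state`."""
    premise = size_is("h", 5)(state) and piles("c", "h")(state)
    return (not premise) or plus(size_is("h", 10), state)
```

The formula holds at `alpha` because the premise is true there and the size is 10 in the following state; it holds at `beta` because the premise is false there.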
A FORMAL LANGUAGE FOR THE SYNTACTIC ANALYSIS OF NATURAL LANGUAGE TEXTS
Texts are analysed syntactically with the help of a formal grammar, which is an extension of a CHOMSKY-grammar. The alphabet consists of finite sets, which are generated by "insertion rules". The insertion rules are applied to "start symbols". These start symbols correspond to the alphabetic elements which one usually finds in phrase structure grammars for natural languages: S (sentence), NP (noun phrase), VP (verbal phrase), N (noun), V (verb), etc. Insertion rules subclassify the categories in such a way that for every category there is a set of alphabetic elements. E.g.: for NP we get: [...] (noun phrase with embedding, "the child who is playing the piano", and without embedding, "the green ball"), [NP,(c,+)], [NP,(c,-)] (composed noun phrase, "the teacher and all his [...]", and not composed, "my father"); for V we get: [V,(vk,trans)], [...], [V,(t,perf)] (verb in perfect form), [V,part f] [...].
Our grammar has rule classes instead of rules according to the structure of the alphabetic elements. Our grammar is a special van Wijngaarden grammar [14].
Structured symbols
The following definitions are from [10]:
Definition 1
Let M be a finite set and (B_m)_{m∈M} a family of finite sets, where B_m ≠ ∅. Then every partial mapping a: M → ∪{B_m | m ∈ M}, where a(m) ∈ B_m, is called a structured symbol. The elements of M are called features, and a(m) is called the value of m, for m in the domain of a.
Let
M . The following
and
If
a T b , the
and
10 d(a) = d(b) classes for <{m}> . ~ , the least a and is denoted by $ ~ M are de<m> instead of =
~--
notud by
[...], the latter being defined only for compatible structured symbols. A structured symbol a with the domain {m1,...,mk} and a(mi) = ai is written [a1 m1, ..., ak mk]. The unusual notation is used in phonology where structured symbols characterize phonemes. We use structured symbols in our formal grammar for natural languages in the following way:
(1) There is one feature, cat (category), that plays a special part and whose values are the categories usually needed in a natural language grammar: S (sentence), NP (noun phrase), VP (verbal phrase) etc.
(2) There are further features whose values stand for properties according to which these categories are subclassified. E.g. the feature pl subclassifies verbs and its possible values are the numbers 1, 2, 3, which stand for the number of complements to the verb.
"Semantic" features are not distinguished from "syntactical" features. The features turn out to be ordering principles according to which a category can be subclassified.
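To make the notion concrete, a structured symbol can be modelled as a dictionary from features to values. This is only an illustrative sketch: the value sets in `VALUES` and all function names are hypothetical, and the union is shown for compatible symbols only, following the remark above.

```python
from typing import Dict, Optional

Symbol = Dict[str, str]  # partial mapping: feature -> value

# Hypothetical value sets B_m for a few features.
VALUES = {
    "cat": {"S", "NP", "VP", "N", "V", "DET", "NG"},
    "pl": {"1", "2", "3"},
    "n": {"sing", "plur"},
    "d": {"indef", "ex", "all"},
}


def is_symbol(a: Symbol) -> bool:
    """Every feature is known and its value belongs to the feature's value set."""
    return all(m in VALUES and v in VALUES[m] for m, v in a.items())


def included(a: Symbol, b: Symbol) -> bool:
    """Ordering by inclusion: every (feature, value) pair of a also occurs in b."""
    return all(b.get(m) == v for m, v in a.items())


def compatible(a: Symbol, b: Symbol) -> bool:
    """a and b agree on every feature they share."""
    return all(b[m] == v for m, v in a.items() if m in b)


def union(a: Symbol, b: Symbol) -> Optional[Symbol]:
    """Union of two symbols, defined only when they are compatible."""
    return {**a, **b} if compatible(a, b) else None


def show(a: Symbol) -> str:
    """Render in the value-before-feature notation, e.g. [V cat, 2 pl]."""
    return "[" + ", ".join(f"{v} {m}" for m, v in a.items()) + "]"
```

For instance, `show({"cat": "V", "pl": "2"})` gives `[V cat, 2 pl]`, and `[V cat]` is included in `[V cat, 2 pl]` but not the other way round.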
Insertion rules
The features are ordering principles for the grammatical categories. Therefore they always refer to certain categories and the alphabet of the grammar is a strict subset of the set of all structured symbols. [N cat, itrans v] for example is not a meaningful alphabetic element because nouns are not subclassified according to transitiveness.
Definition 2
An insertion rule for S ⊆ M and C is a pair p = (a,A), where a ∈ C, A ⊆ <S> and d(a) ∩ S = ∅. p is called applicable to x ∈ C iff
(A1) a ⊆ x
(A2) d(x) ∩ S = ∅.
Let
u,v r
R ~ C x {AIA ~ <8> , $ = M)
C 9
Then ar%d b
u imp o v
Iff
3p r R , p : (a,A)
(A2)
We write also
reflexive, also
is denoted by
u plimPpnV..,
... impoVpn ' C in an analogous way as produca r C , then we denote the set a by
R
by
L(R,a) and we set

L(R,a) = {x ∈ C | a imp_{p1} ... imp_{pn} x, p1, ..., pn ∈ R}
TL(R,a) = L(R,a) ∩ {x ∈ C | ¬∃y ∈ C : x imp y}.
Insertion rules for structured symbols are applied to symbols of the form [X cat], which figure as "start symbols" for the alphabetic elements, and TL(R, [X cat]) is the set of alphabetic elements belonging to X.
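The generation of alphabetic elements from a start symbol can be sketched as a closure computation. The fragment below is an illustration under assumptions: a rule is written as a pair (a, S), the set A is taken to be all value assignments to the features in S, and a rule is taken to be applicable to x when a is included in x and x has none of the features in S. The value sets are hypothetical; the four rules are those given for noun phrases, noun groups and nouns later in the text.

```python
from itertools import product

# Hypothetical value sets for the features involved.
VALUES = {
    "eb": ["+", "-"],
    "cp": ["+", "-"],
    "n": ["sing", "plur"],
    "pl": ["1", "2"],
}

# Rules (a, S): a is the symbol that must be included, S the features to insert.
RULES = [
    ({"cat": "NP"}, ("eb", "cp")),
    ({"cat": "NP", "cp": "-"}, ("n", "pl")),
    ({"cat": "NG"}, ("n", "pl")),
    ({"cat": "N"}, ("n", "pl")),
]


def freeze(sym):
    """Hashable form of a symbol, used to detect symbols already derived."""
    return tuple(sorted(sym.items()))


def applicable(rule, x):
    a, feats = rule
    return all(x.get(m) == v for m, v in a.items()) and not any(m in x for m in feats)


def apply_rule(rule, x):
    """All symbols obtained from x by inserting one value per feature of the rule."""
    _, feats = rule
    for combo in product(*(VALUES[m] for m in feats)):
        yield {**x, **dict(zip(feats, combo))}


def derive(start):
    """Return (L, TL): all derivable symbols, and those to which no rule applies."""
    seen = {freeze(start): start}
    todo = [start]
    terminal = []
    while todo:
        x = todo.pop()
        succ = [y for r in RULES if applicable(r, x) for y in apply_rule(r, x)]
        if not succ:
            terminal.append(x)
        for y in succ:
            key = freeze(y)
            if key not in seen:
                seen[key] = y
                todo.append(y)
    return list(seen.values()), terminal
```

Starting from `{"cat": "NP"}`, compound noun phrases stop after the first rule, while non-compound ones are further specified for number and number of places.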
Feature grammars
Feature grammars are defined in the same way as CHOMSKY-grammars, but the derivability concept is modified according to the alphabet structure. The alphabet of a feature grammar is to be the set of all structured symbols which can be derived from a set of feature values of the feature cat of a set of features M by some given insertion rules R. Now it is often the case that production rules are independent of the subclassification of the alphabetic elements and that they should be applied to all subclassifications. Let us consider for an example the rule replacing "noun phrase" by "indefinite article" "noun group". [...] [NP cat] → [DET cat][NG cat] [...] [NP cat] → [DET cat, indef d][NG cat] [...] chains [...] and [...] having the rule defined elementwise. [...] us with this possibility.
grammar
is a tupel
G = (M,(Bm)mcM,R,cat,S,Z,~) , where finite set of f e a t u r e s family over M feature of cat rules, where for every a r C p = occurrp r ~ : cat r d(a) : Let of f i n i t e value rules sets for the features for the set C of s t r u c t u r e d symbols set o1" i n s e r t i o n starting
(x1 ... xn, y1 ... ym) ∈ Π, then cat ∈ d(xi) and cat ∈ d(yj) for all i, j: 1 ≤ i ≤ n and 1 ≤ j ≤ m.
Let B be a value of cat. Then we set

C_B = L(R, [B cat])
TC_B = TL(R, [B cat])
C_β' = ∪{C_B | B ∈ β'} for β' ⊆ B_cat
TC_β' = ∪{TC_B | B ∈ β'} for β' ⊆ B_cat

These definitions are extendable to strings over B_cat. Let b = B1 ... Bn ∈ B_cat*. Then we set

C_b = {a | a = a1 ... an; ai ∈ C_Bi}
TC_b = {a | a = a1 ... an; ai ∈ TC_Bi}
C_β = ∪{C_b | b ∈ β} for β ⊆ B_cat*
TC_β = ∪{TC_b | b ∈ β} for β ⊆ B_cat*

After what we said at the beginning, it is natural that the definition of derivability for feature grammars must be extended in such a way that, given a production rule (p,q), we can apply all production rules (p',q') provided p ⊆ p' and q ⊆ q'. We only have to pay attention [...] only such q' which are contained in the alphabet given by the insertion rules.
%)
C" r
over
and
C : C~\{z)
where
Defir~|t Let
ion r
,
I~ C*.
x,y
in
~ ,
x o'
> y
C
ifr
*
and
B(p,q) r R x~--~--> y .
and
BI,",q" r
Beat
p = p' = p"
is the transitive, reflexive closure of derivable by a feature grammar IS cat] > x} and G . is the Zanguage generated by G G
o is
> .
The set of sentences L(G) TL(G) 9 {xlx r C = L(G) ~ and (T CZ) "
[...] is not limited [...]. However, as can easily be seen in the definition of derivability, such production rules can never be applied. So we can eliminate such production rules in R without changing the set of derivable sentences. The type of a feature grammar is defined in exactly the same way as the type of a CHOMSKY-language. C and therefore TC_{B_cat} being finite, it is possible to replace every production rule (p,q) of a feature grammar, which is a rule class, by all production rules (p',q') where p ⊆ p' and q ⊆ q', provided p' and q' [...] of TC*_{B_cat}.
The CHOMSKY-grammar obtained in this way is equivalent to the appropriate feature grammar. So we have proved the following theorem.
Theorem
For every feature grammar G of type i there is an equivalent CHOMSKY-grammar of the same type.
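The replacement step used in this argument can be sketched as follows. This is an illustration only: the small alphabet `ALPHABET` is hypothetical, and a rule class is written as a pair of tuples of structured symbols. Each symbol of the class is replaced by every alphabetic element that includes it, which yields finitely many ordinary production rules.

```python
from itertools import product

# Hypothetical finite alphabet of fully specified structured symbols.
ALPHABET = [
    {"cat": "NP", "n": "sing"},
    {"cat": "NP", "n": "plur"},
    {"cat": "DET", "d": "indef"},
    {"cat": "DET", "d": "all"},
    {"cat": "NG", "n": "sing"},
    {"cat": "NG", "n": "plur"},
]


def included(a, b):
    """a is included in b: every (feature, value) pair of a occurs in b."""
    return all(b.get(m) == v for m, v in a.items())


def instances(pattern):
    """All alphabetic elements that include the given symbol."""
    return [x for x in ALPHABET if included(pattern, x)]


def expand(rule):
    """Replace a rule class (p, q) by all rules (p', q') over the alphabet."""
    lhs, rhs = rule
    return [
        (left, right)
        for left in product(*(instances(p) for p in lhs))
        for right in product(*(instances(q) for q in rhs))
    ]


# The rule class [NP cat] -> [DET cat, indef d][NG cat] from the text.
NP_RULE = (({"cat": "NP"},), ({"cat": "DET", "d": "indef"}, {"cat": "NG"}))
```

Here `expand(NP_RULE)` produces four ordinary rules. The sketch uses inclusion only; any further conditions of the modified derivability definition, such as agreement between the two sides, are not modelled.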
Semantic attributes
Every natural language sentence generated by a feature grammar must be translated into a state logic formula. This transduction is a mapping from the set of sentences together with their derivations into the set of state logic formulae. It is calculated by semantic attributes and attribute functions. This formalism has been introduced in [11] for [...] of programming languages.
To every atele-
element is given
and to every
a set of values.
For every
a s,~t of a~rgbute
[un~s
ment
to an alphabetic define
occuring
within values
functions
all of with-
the attribute in the same and have belonging nition. values. rational rections, Therefore, values
to in terms occuring
of the attributes
belonging
production
functions
as arguments
of those attributes
to alphabetic values
elements
occur
The So,
if we think within
structure
attribute
functions
transport
ute values
to the leaves
fl-om t h e
are used:
ders
from the
leaves
carry
from the root to the leaves. for every derived left attribute
production
side of the p r o d u c t i o n
belonging
rule value
to a value element
set of
this d,.t'iw.d attribute. the: noo,~ labelled productLon there rule to which element
]:3 an attribute
for every
side of a production
attribute
of other attributes
belonging rule
alphabetic value
elements
occuring of that
within
of the value
Bet of that
attribute.
attribute
we need
o~e special
derived
attribute,
to represent
fine what
is a semantically
tial functions
structure
trees
formulae. it gram-
that a sentence,
s , is semanticalZy structure
entreat, if
it can be analysed
a formal
defin:[tlon
here;
the
o~' the m e c h a n i s m
will be d e m o n s t r a t e d
example.
TRANSLATION OF NATURAL LANGUAGE DIALOGS INTO STATE LOGIC FORMULAE
An attributed English grammar fragment is given and discussed in detail. The grammar analyses natural language dialogs and maps them to state logic formulae.
The alphabet
Here we describe what features and what insertion rules are used for a natural language grammar.
Features:
a    Kind of adjective. We distinguish between two kinds of adjectives: (i) relational adjectives (value r) which describe a property of a noun in comparison with other nouns; e.g. big, old. (ii) adjectives that select a subset of the set of all objects they can refer to, i.e. those objects that have the property described by the adjective; e.g. round, black.
cat  Starting feature category. The values of cat correspond to alphabetic elements usually needed in transformational grammars: S for sentence; [...] NP [...] V [...] with and without article, e.g. large yellow teeth); A for adjectives; DET for determiner (e.g. the, all, any, some); PN for proper name; PRON for pronoun; PP for prepositional phrase; ADV for adverbial [...] (e.g. today, [...]).
cp   Composition of nouns, noun phrases, noun groups, or prepositional groups. The possible values of cp are + and - according as the corresponding noun or group is compound (e.g. the teacher and all [...]) [...].
dc   [...] like some). dc can be abs, comp, sup according as the cor[...] or superlative (e.g. good, better, [...]).
eb   [...] The values of eb are [...]
f    [...] are part for verbs in participle form (e.g. eating) and prop for verbs in "propositional" form (e.g. eats).
kc   Kind of conjunction. This feature subclassifies sentence conjunctions. Its values are: cond for conditional (e.g. if ... then); caus for causal (e.g. because); temp for temporal (e.g. after); fin for final [...]; conc for concessive (although).
m    Negation. [...]
n    Number. The values are plur for plural and sing for singular.
pl   Number of "supplements" of a verb or a noun. The possible values of pl are 1, 2, 3: Intransitive verbs are one-place (e.g. work); two-place verbs have one object (e.g. know, John knows Mary); three-place verbs have two objects (e.g. give, John gives Mary a book). Nouns are one-place (e.g. table) or two-place (e.g. father, John is the father of Mary).
rel  Subclassification of relative clauses. The values of rel are subj, obj1, obj2 according as the relative pronoun is the subject, the first or the second object [...].
t    Tense of a verb. The values of t are past, pres, fut.
Insertion rules
(NP,<{eb,cp}>)
Noun phrases can have embeddings and can be compound.
([NP,- cp],<{n,pl}>)
(NG,<{n,pl}>)
(N,<{n,pl}>)
Noun phrases that are not compound and noun groups and nouns are in singular or plural form and are specified according to their number of places.
(It
Relational
to their n u m b e r of possible
and they are in participle
tO negation,
Derived
attributes
ag
NP,
~s a constant sentence,
of the object
For an imperative
is the name
of the "person"
to whom the
command
dressed. con is defined for determiners, DET, and elementary noun phrases, there is a sen-
formula
fragments.
depends phrase
belonging
men w o r k
is r e p r e s e n t e d
some m e n w o r k by
some
on the d e t e r m i n e r
for a verb
and
belongs
of questions
if an
phrase.
If there
is more
what
in the structure
a question
is g e n e r a t e d
is defined nector of
CONJ,
the
con-
to the conjunction.
Example:
h(or)
log
quadrupel ( q u a n t i f i e r ,
(2) Adjective groups as conjunction of the formulae for the adjectives [...].
(3) Noun groups as conjunction of the formulae of the nouns and the adjectives the noun group is composed of; e.g. "a green parrot" is represented by ∃x[PARROT x ∧ GREEN x].
(4) Noun phrases as conjunctions of the formulae of the noun groups and supplements the noun phrase is composed of; e.g. the green parrot which ... [...] relative clause [...]: ∃x[PARROT x ∧ GREEN x ∧ ...]
[...] PP; e.g. on the table is represented [...] is the name of the object of the noun phrase the prepositional phrase is embedded in and t the name of the object described by the table.
op   is defined for sentence adverbs, [...] appropriate operator.
q    is defined for determiners, DET, and its value is the quantifier [...]. In the example given below [...] ∀ for all resp. ∃ for some.
sy   is defined for verbs, V, nouns, N, prepositions, PREP, [...]; its value is the predicate symbol representing that concept.
top  is defined for verbs, V, and has as its value the tense operator depending on the tense of the verb.
w    is the state of the structure in which the dialog [...]. We need this information for the assignment of [...]
Inherited attributes
agr  is defined for adjectives, adjective groups and sentences embedded [...]. agr is the name of the object the adjective refers to. For a sentence it is the name of the object described by the noun phrase the sentence is embedded in.
agcr is defined for adjectives in comparative form and its value is the name of the object that is compared with the object the adjective refers to.
ix   is defined for nouns, noun groups and noun phrases. The attribute functions generate bounded variables [...] x_i for noun phrases like a [...], children, and their indexes are generated by the attribute ix.
syr  is defined for relational adjectives [...] predicate symbol that represents [...]
Attribute functions
In the following, we describe how the most important word categories are represented in state logic. We shall explain in detail the attribute functions for the lexical rules.
(1) Verbs are represented by predicate symbols of the appropriate number of places. We are aware of the manifold difficulties which can arise whenever this number is not uniquely determinable. Problems connected with this have often been described and discussed (e.g. [15]). We have not resolved this problem but we think it should be possible to come to terms with it with the help of the following practical device. For every verb the number of supplements is fixed and part of the lexical information for that verb. Whenever the verb occurs in a text with one or more supplements missing the empty variable places are filled up by dummy elements. When it occurs with supplements not provided in the lexicon the additional formula fragments must be connected with the rest of the sentence formula by ∧.
[V, x t, y m, z pl] ::= v
sy(V) = φ(z)
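The practical device described for verbs can be sketched in a few lines. Everything here is an assumption made for illustration: the lexicon entries, the dummy element written `_`, the parenthesized argument notation and the function name do not come from the paper.

```python
# Hypothetical lexicon: verb -> (predicate symbol, fixed number of supplements).
LEXICON = {"work": ("WORK", 1), "know": ("KNOW", 2), "give": ("GIVE", 3)}
DUMMY = "_"  # stands for a dummy element filling an empty variable place


def verb_formula(verb, supplements, extra=()):
    """Build the formula for a verb occurrence.

    Missing supplements are filled with dummy elements; formula fragments for
    supplements not provided in the lexicon are passed in `extra` and are
    connected to the verb formula by conjunction.
    """
    symbol, places = LEXICON[verb]
    args = list(supplements)
    if len(args) > places:
        raise ValueError("surplus supplements must be passed as extra fragments")
    args += [DUMMY] * (places - len(args))
    atom = f"{symbol}({', '.join(args)})"
    return " ∧ ".join([atom, *extra])
```

For example, `verb_formula("give", ["John", "Mary"])` fills the third place with the dummy element, and `verb_formula("work", ["John"], extra=["IN(John, garden)"])` attaches the additional fragment by ∧.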
of m :
is
or
depending
on the value
of
con(I m])
con(Im])
The ture
tense t :
of
depends
on the v a l u e
of the
fea-
means F ~
"there means
is an i n s t a n t
"there
is an i n s t a n t con and
such
values placed
of the at
attributes
the head
of the w h o l e sentence.
sentence
occurs
operate
on the whole
(2) Nouns [...] by one- or two-place predicate symbols or by one-place [...] symbols. One-place nouns [...] describing objects, e.g. table, house, block, and nouns describing animals or humans, e.g. mouse, baby. Two-place nouns also describe [...] things or persons but they express at the same time a relation to other things or persons; examples are all nouns [...] congeniality relations as father, mother, aunt.
[N, x pl] ::= a
sy([N, x pl]) = φ(x)
φ(x) is the x-place predicate symbol representing the meaning of the noun.
Function nouns always correspond to adjectives expressing the same type of measure [...] function. For each [...] of these adjectives a measure function is introduced mapping the objects the adjective can refer to into numbers. This will be described in detail [...]. [...] needed function is [...] SIZE.
[...] malice because these kinds of nouns have hardly been dealt with in [...]. Such concepts appear in [2], but there they are treated in a very "material" manner. They are measurable and they operate exactly like concrete nouns. The degree of malice or [...] these numbers increase or decrease depending on the things that happen in the world. We think that for a better treatment of such nouns it would be necessary to use higher order predicate symbols; but we would need predicates that can operate on other predicates of different types and this possibility is not provided in type logic.
"Mass" nouns (see [16]) have not been treated semantically. They [...] symbols. Sometimes they have the same properties as constants, sometimes they act like predicates.
(3) Adjectives
The state logic formula representing the meaning of the proposition contained in the adjective is built up on the lexical level of the grammar. Therefore, log belongs to A.
As mentioned above, we distinguish between two kinds of adjectives.
(3.1) Adjectives that select a subset of the set of all objects they can refer to. All adjectives describing colors belong to this group; the noun phrase the red balls designates a subset of the set of all balls, the balls being red. Other adjectives selecting a subset are round, [...]
[A, s a] ::= u
log(A) = φu agr(A)
φu is the one-place predicate symbol representing the adjective u. The inherited attribute agr has as its value the name of the object the adjective refers to within the appropriate sentence and this value is assigned to agr within the rule AG1, [...] is generated. Example: log(red) = RED t, where t is the name of the object that the noun phrase is describing.
(3.2) Adjectives that express a relation between the objects they refer to [...] comparison with [...] belonging [...] category; i.e. [...] adjectives describe [...] being comprehended [...] exceeds a certain number [...] for dogs [...] other absolute size. A sentence like this big dog is much smaller than that small elephant must be verifiable. Such "relational" adjectives are paired: (young, old), (small, big), (thin, fat), (soft, [...]
adjectives
orders
to the m e a s u r a b l e
So the pair
an age a c c o r d i n g
to that
of relational
adjectives
functions,
relations,
and constants:
(i)
~a
is a function
symbol whose
extension
takes
subset
of the u n i v e r s e
the logic of ~a
the adj~ctlves
typical for
am . For example
a2 : big.
: s r M} , Re)
be a structure
~Js)
T
: T ~ ~ {ia} , T ~ O B
of the set of objects OB of A al and am
is the subset
refer to.
(ii)
OP al
and
OP Ga are two-place p r e d i c a t e symbols, o r d e r i n g of the objects r e f e r r e d to by to the property OP at one and OP states invers~ expressed by al and
describing al and aa
the
ordered
T h e i r extensions
s r M
.*..~ n S m
and
n,m r
,,... n 9 m
Example:
OPyo~ng = OPozd =
~Jears ~Jears
P IN
and
such that
al
and
as
and limiting the sc~]e for values oi' P a~ . symbol P comprehending obcan reach a minimal CS
G3tP
fur objects "of type P" . Wc w]:h to exprc,:3s by this that CS and a maximal size of about al and ac-
an object of the conceptual category size of about a1,P cording to the properties
al
and
am
r r R
CSozd, dog
Example:
= 20 = 20
CS
young~man
(iv)
cs
al
and and aa
CS ~a
al
: X CS : X CS
predicate symbol
alwX
of
L(Z)
such that
predicate
a3wX
ing of a relational adjective is composed of the symbols introof the adjective, i.e. of the value of the feature
(3.2.1) Absolute
A2 [A, abs dc] ::= u
log(A) = φu agr(A) OPu CS_{u,syr(A)}
[...] u refers to within the [...] NG2, see later. [...]
occurs within a nounphrase of the form "deis the lexical entry for the e.g. in the sentence is a proper name,
agr(A) syr(A)
John
0.3
(3.2.2) Comparative
A3 [A, comp dc] ::= u
log(A) = φu agr(A) OPu φu agcr(A)
agcr is the name of the object agr(A) is compared with. It is not always possible to find agcr(A) within the same sentence; [...] in these other cases this value must be found in the dialog or text structure.
Example: John is older than Mary. We have the lexical rule [A, comp dc] ::= older for the generation of the adjective and we get: φold = AGE [...] i.e. the name of the person whose elder brother is being spoken about.
(3.2.3) Superlative
A4 [A, sup dc] ::= u
log(A) = ∀x[syr(A) x → φu agr(A) OPu φu x]
Example: Let [...] OPbig [...] φbig [...]
(4) Prepositions PI [PREP,n pl] sy (PPEP) = : ~(n) is an n-place p r e d i c a t e symbol d e s c r i b i n g the meanp . ::= p
(5) Predicate symbols are also used for the description of such relations between nouns that are not expressed by a fixed word category. We have for an example the OWN-relation, which can be expressed by pronouns, by verbs, by prepositions, or by cases; e.g. his dog, John has a dog, John owns a dog, the dog of John, John's dog. In all these examples the relationship OWN between John and the dog is expressed.
(6) Determiner
Depending on the type, a quantifier and a connector are assigned to DET, which are needed for the construction of the formula describing the meaning of the appropriate noun phrase. The quantifier becomes the quantifier for the whole noun phrase, and the connector is the connector with which the formula is attached to the other formula fragments belonging to the other sentence fragments.
DI",'I' : :: u D1
c(,n([DET,indef d]) : ^ q ( [~)I~T,iIldef d]) : a , oF is She empty is plur. 3x1[DOO x1^ ..., 3 and
Here, n
string if
Example: ^ D2
respectively.
Noun phrases with a definite article like "the ball" always designate a certain, fixed object of the world which is already known in the context and so already has a name. Therefore we do not generate an expression like ∃x[BALL x ...], but the name of the object mentioned is searched for in the structure. We will discuss the problem [...] generating the noun phrase.
D3 con([DET,ex d]) = ∧
q([DET,ex d]) = ∃
The lexical entries for determiners specified by [ex d] are pronouns like some. Example: Some
ohs
are
works and ^
depending on the pronoun a o m e . D4 con([DET,all d]) : q([DET,all d]) : Y Pronouns fied by like e u e r ~ [all d] .
pZa~ ...
and e a c h
connective is Example:
aZ~ a h i K d r o n
is represented by
(7) Temporal adverbs
Naturally, it is possible to describe any time re[...] with the help of the time operators +, -, F, P. We demonstrate here the one-place time operators for some time adverbial groups. In the following, A [...] Z [...]
::=
aZ~a~a
= ([~) Z
op(ADV) (~)
is a defined operator of
(~) A
+-*
FPA ^ PFA
This means that from every state from now on we can go to every state into the past, and from every state from now on into the past we can go into the future, and A is true in all states we can "reach" in this way. We would like to stress that what is meant by a temporal adverb depends on the structure in which sentences are evaluated. If we consider a linear time structure, it would be sufficient to represent always by FPA. Our representation demands a [...]; otherwise isolated points cannot be "reached" by means of F and P. This consideration is important because we require that time adverbs have non-logical meanings, i.e. what they are represented by depends on a given structure and not only on a given logic; that is to say, it does not depend on the logic but on the interpretation of the logic. We conceive of time as a non-logical concept.
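The difference between FPA alone and the conjunction FPA ∧ PFA can be checked on a small branching structure. The following is only a sketch under stated assumptions: F and P are read as "in every state reachable forward (resp. backward)", reachability is taken as the reflexive-transitive closure of a successor relation, and the names TimeStructure, F, P and always are hypothetical and not taken from the text.

```python
def reachability(states, edges):
    """Reflexive-transitive closure of the successor relation (an assumption)."""
    reach = {(s, s) for s in states} | set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(reach):
            for c, d in list(reach):
                if b == c and (a, d) not in reach:
                    reach.add((a, d))
                    changed = True
    return reach


class TimeStructure:
    """A finite set of states with a 'later than or equal' relation."""

    def __init__(self, states, edges):
        self.states = set(states)
        self.reach = reachability(self.states, edges)

    def F(self, prop):
        """States s such that prop holds in every state reachable forward from s."""
        return {s for s in self.states
                if all(t in prop for t in self.states if (s, t) in self.reach)}

    def P(self, prop):
        """States s such that prop holds in every state from which s is reachable."""
        return {s for s in self.states
                if all(t in prop for t in self.states if (t, s) in self.reach)}

    def always(self, prop):
        """The conjunction FPA and PFA, with prop as the set of states where A holds."""
        return self.F(self.P(prop)) & self.P(self.F(prop))


# Branching structure: state 0 has two incomparable successors, 1 and 2.
branching = TimeStructure({0, 1, 2}, {(0, 1), (0, 2)})
a_holds = {0, 1}                      # A is false only in state 2
fpa_only = branching.F(branching.P(a_holds))
both = branching.always(a_holds)
```

In state 1 the formula FPA holds, because only the history 0, 1 is inspected, although A is false in state 2. The conjunct PFA goes back to state 0 and forward again, reaches state 2, and so rejects "always A" everywhere in this structure.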
ADV2
ADV
op(AOV)
C~-~A .,--*~
[.-, + ~ A
((~)) A
immediately preceding or following state in which tuitively speaking, -ed" starting from every state.
ADV3 ADV ::=
sometimes
= C~D
op(ADV)
.nPF .nA
sometgmos
states
means
or there
from which
states
in the future
ADV4
ADV
::: a Z m o s ~ :
.ooor
op(ADV)
(~A
['n+A
"n-A]
This means
that
in every
state
((D)
immedi-
altely p r e c e d i n g
or following
state where
ADV5
ADV
op(ADV)
~DA
*-* FP.nA
PF'~A
In every future A
state
reachable
by first
going
and then
or by first going
is false.
the following
furmulae:
the
more often.
adverbial
oor V oo~dom,
vo~y~vor~
so~dom,
~a~h~r of~r
oery o : t e .
and so on.
Attributed
production
rules
a part
rules
together to state
with
their atformu-
functions.
English
are mapped
logic
representing sentences.
is the c o n j u n c t i o n
of the for-
NO1
NG : : :
NG2
NOt
::= A NO. #) = log(A) : ag(NGs) : sy(NGa) = ag(NGs) : sy(NGs) = ix(NGI) logic expression representing where resp. P the noun group is the one- or P is the valfor N resp. ags is ^ log(NG1)
the state
or Pxy, of ag
representing
the noun.
N ,x
values above, xi ,
of the object
described This
is a subscript.
a production
it must be guaranteed
variables
are generated
noun phrases
For these
the verb p r e d i c a t e
generates
such as ~ h e
b e a u t i f u l f~ower.
with the formula in NGI are submitted
formula
representing
the m e a n i n g
It is linked
%) Subscripts are used to distinguish between identical non-terminals occurring within the same rule.
u~.~/~n_.J? h | ~{III I! I] N o
NI'I
con(DET)
log
state logic expression consists of the proposition the noun group the quantifier which depends on the type of the determithe value of ag is the name of con Js are and the value of ner quantifies this expression;
the connector with which the noun phrase is linked to the other sentence formulae fragments. These values, q, log, ag, con, the constituents of the state logic expression representing the noun phrase. This expression is only formed on sentence level. E.g. For the noun phrase uZZ men within the sentence uZZ men work we get log = (V,MAN xl,x1, . The definite VxIMAN xl logical expression is the re[,re~.nlt[ng the noun phrase is pz'~.ssion ~'or the sentence is and the definite ex-
connector linking the noun phrase formula to the verb phrase formula. The reason for this is that the constituents of compound noun phrases such as the teacher and all his pupils or neither John nor Mary must still be available on sentence level, because they are arranged within the logical expression in another order than within the natural language sentence. In fact we have transposed the problem, otherwise resolved by a transformation, to the semantic level, and attribute functions perform the task of transformational rules in transformational grammars.
Noun phrases with definite article are not represented by the logical expression but only by an object name. The name of the object described by the noun phrase is either a bound variable or a constant. This depends on the type of the determiner. If it is subclassified by [ex d] or by [all d] or by [indef d], i.e. it is all or some or a etc., then the noun phrase expression is quantified and the object name is the variable xi which has been generated within NG1. Otherwise, i.e. if the determiner is subclassified by [def d], the noun phrase is definite, i.e. it has the form the u, and we must find the object c that
has the properties described by the noun phrase and that is unambigues only if it is
sentence 2akm ~he blg grsen ba~ll cloar what ball is meant, tion for in, s(NP) ag, w(NG) c
green. This search condition is formulated by the attribute funcis the structure the dialog is evaluated is the only constant of the structure, is true in w(NP) at state sy(NO) z s(NP). if such is the actual state of the dialog. The search condilog(NG)[c]
If the search condition cannot be verified, a following up sentence must be generated. It is a question of the form what u if the noun phrase is ambiguous, i.e. there is more than one c such that log(NG)[c] is true. It is a sentence of the form there is no u if there is no such c. The logical expression representing a noun phrase the u is U[c], where U is the logical expression representing the noun group u and c is the constant designated by the noun phrase. We could also generate ∃x[U[x] ∧ ∀y[U[y] → x = y]]. The first expression U[c] can be derived from the second semantically by searching for an object c such that U c holds, together with the requirement that the set of x such that U x holds contains only one element. In NP1 we have formulated it in terms of structure and truth condition, i.e. on the semantic level; the second solution, ∃x[U x ∧ ∀y[U y → x = y]], is formulated on the syntactical level of the logic only. In our solution we must verify the expression when analysing the sentence and transducing it, namely when generating ag(NP). If we take the second solution, we first generate the expression and evaluate it when the parsing of the sentence is already finished. In each of the two cases the search condition is the same. The difference is only when, i.e. on what level, we execute the necessary deductions. The advantage of our solution is that an ambiguity is discovered during sentence analysis and a following up question for resolving such an ambiguity can be generated and answered immediately.
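The search for the object designated by a definite noun phrase, with its two failure cases, can be sketched as follows. This is an illustration only: the representation of a world as a dictionary from object names to sets of predicate symbols, and the function name resolve_definite, are assumptions made for the example and are not part of the formalism.

```python
def resolve_definite(noun, properties, world):
    """Search the world for the object designated by 'the <noun>'.

    Returns ("object", c) if exactly one object c has all the properties,
    ("question", ...) if the noun phrase is ambiguous, and
    ("statement", ...) if no object fits.
    """
    candidates = [name for name, props in sorted(world.items())
                  if properties.issubset(props)]
    if len(candidates) == 1:
        return ("object", candidates[0])
    if not candidates:
        return ("statement", f"there is no {noun}")
    return ("question", f"what {noun}")


# Hypothetical world: two balls, two teachers.
world = {
    "b1": {"BALL", "BIG", "GREEN"},
    "b2": {"BALL", "RED"},
    "t1": {"TEACHER"},
    "t2": {"TEACHER"},
}

unique = resolve_definite("ball", {"BALL", "GREEN"}, world)
ambiguous = resolve_definite("teacher", {"TEACHER"}, world)
missing = resolve_definite("dog", {"DOG"}, world)
```

Because the search runs while the noun phrase is being analysed, the ambiguous case can raise its following up question before the rest of the sentence is processed, which is the advantage claimed for the first solution.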
NP2
::= PN
= (c,c,ag(NP),~) : sy(PN)
= r
If a noun phrase consists only of a proper name it does not contain a logical proposition. The only "information" contained in such a noun phrase is the object name, i.e. the proper name.
NP3 [NP,- z,- e] ::= PRON
log(NP) = (∅,∅,ag(NP),∅)
ag(NP) = ag(PRON)
con(NP) = ∅
As in the case of proper names, a pronoun only refers to an object and does not contain a logical expression. ag(PRON), i.e. the name of the object the pronoun refers to, must be found in the structure the sentence is evaluated in. There is no general rule for finding this object. One can compare the objects mentioned in the text and take the nearest one that fits morphologically and semantically, i.e. has the same number and gender and the semantic features the appropriate verb demands. Questions such as What is meant by he? are generated for ambiguities that are not resolvable in this way.
NP4 [NP,- z,- e,x pl] ::= POSSPRON [NG,x pl]
log([NP,l pl]) = (E,beg,ag(NG),r where beg = log(NG) ^ R E L N G a g ( P O S S P R O N ) a g ( N G ) log([NG,2 pl]) = (3,1og(NG),ag(NG),A) agr([NG,2 ag(NP) con(NP) pl]) : ag(POSSPRON) = ag(NG) = ^
If a possessive pronoun precedes a one-place noun there is a binary relation between the object the possessive pronoun refers to and the object the noun refers to. This relation is not explicitly mentioned. What relation is meant must be concluded from the semantic descriptions of the two objects. If the possessive pronoun [...] of the body, RELNG is [...]; if the two refer to things, RELNG is most probably the ownership relation. For a two-place noun the relation between the two objects is already expressed by the predicate symbol representing the noun. ag(POSSPRON) meets the same difficulties as ag(PRON).
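The heuristic described above for pronouns (take the nearest mentioned object that fits in number, gender and the semantic features demanded by the verb) can be sketched in a few lines. The record type Mention, its fields and the function resolve_pronoun are hypothetical names chosen for this illustration; the text itself stresses that there is no general rule for finding the referent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    name: str            # object name in the structure
    number: str          # "sg" or "pl"
    gender: str          # "m", "f" or "n"
    features: frozenset  # semantic features, e.g. {"HUMAN"}


def resolve_pronoun(number, gender, required, mentions):
    """Return the most recently mentioned object that fits, or None.

    'mentions' is ordered from earliest to latest; 'required' holds the
    semantic features demanded by the verb the pronoun is an argument of.
    """
    fitting = [m for m in mentions
               if m.number == number
               and m.gender == gender
               and required.issubset(m.features)]
    return fitting[-1].name if fitting else None


mentions = [
    Mention("John", "sg", "m", frozenset({"HUMAN"})),
    Mention("Mary", "sg", "f", frozenset({"HUMAN"})),
    Mention("ball1", "sg", "n", frozenset({"THING"})),
]

he = resolve_pronoun("sg", "m", frozenset({"HUMAN"}), mentions)
they = resolve_pronoun("pl", "m", frozenset(), mentions)
```

When the function returns None, a system along these lines would fall back to a following up question such as What is meant by they?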
Noun phrases with embedded sentences NP5 [NPI,- z,+ e] IoE(NPI) I : ~ log(S)^ ::= [NPa,- z] Is, rl ks] if ~,(log(NPa)) ^ log(S) : r
: (wI(IoE(NPa),I,~3(IoE(NPm)),=~(IoE(NPa))) else
~a(log(NPa)) = ag(NPm)
ag(NP,)
agr(Nl'a): sg(NPI) The state-logic-formula ^ subclassifying sentences sentences ~i log(NP) representing the relative is a feature clauses clause is linked by (kind of sentence) We need this mappas a quadrupel constitlevel.
into relative
the logical expression representing uent of the quadrupel erated until a rule Thus, AZZ ed by log(S) Vx[CHILD
i.e. at sentence
is always in the domain of the quantifier. x ^ PLAY x p ~ AGE x = (jears,ll)]. clauses referring who is working (r162162 in London, ....
~he ch~Zdren who are pla~ing hero are eleven years oZd is representThis case disi.e. NP Here the log(S) ^ is needed for relative like John, to proper names,
tinction
John is represented by
The proposition
the other formula fragments ed by the noun phrase to the sentence by Prepositional tences, linked by ^
complements
representing
9 (Iog(NPs),Iog(NP=),h(CONJ)) = ix(NP,)
= ix(NP,)
... nor)
h(egther
generated as such a triplet because the conjunction has such an effect on the noun phrase that the sentence the noun phrase is contained in is a compound sentence linked by this conjunction. E.g. the sentence I know John and Mary has the same meaning as I know John and I know Mary.
Sentences
We have production rules for types of sentences differing as to the kind and the number of verb complements. We will give here as an example the rule for a sentence with three verb complements. All the other sentence generation rules have the same form.
S1 [S, as ks] ::= NP1 [V, x t, y m, z f, 3 pl] NP2 NP3
log(S) = top([x t]) con([y m]) α(sy(V), log(NP1), log(NP2), log(NP3))
ix(NP1) = 1
ix(NP2) = 2
ix(NP3) = 3
α is a partial function. Its arguments are the expressions log(NP); it forms the state logic expression from the formula fragments which are the constituents of log(NP). sy(V) is the predicate symbol for the verb V. log(NP_i) is either a quadruple (quantifier, formula fragment, object name, connector) or a triple (l_1, l_2, h), where l_1 and l_2 are again [...] and h is a connector. A quadruple log(NP) is called elementary.
Definition scheme of α(sy, l_1, ..., l_n).
1. Let all l_i be elementary, i.e. l_i = (q_i, F_i, ag_i, con_i). Then
α(sy, l_1, ..., l_n) = q_1 ag'_1 [F_1 con_1 q_2 ag'_2 [F_2 ... q_n ag'_n [F_n con_n sy ag_1 ... ag_n] ...]]
where ag'_i = ∅ if q_i = ∅ and ag'_i = ag_i else.
Example for the two-place verb know: From the sentence Every boy knows a girl we get:
log(NP1) = (∀, BOY x1, x1, →)
log(NP2) = (∃, GIRL x2, x2, ∧)
sy(V) = KNOW
α(KNOW, (∀, BOY x1, x1, →), (∃, GIRL x2, x2, ∧)) = ∀x1[BOY x1 → ∃x2[GIRL x2 ∧ KNOW x1 x2]]
2. Let i0 be the least i such that l_i is not elementary, l_i0 = (k_1, k_2, h). Then
α(sy, l_1, ..., l_i0-1, (k_1, k_2, h), l_i0+1, ..., l_n) = α(sy, l_1, ..., l_i0-1, k_1, l_i0+1, ..., l_n) h α(sy, l_1, ..., l_i0-1, k_2, l_i0+1, ..., l_n)
If there is more than one non-elementary noun phrase argument, α is only defined when the two corresponding connectors h_i and h_j are compatible. Let l_i = (k_1, k_2, h_i) and l_j = (m_1, m_2, h_j)
. Then
~(sy,ll,...kl,...ml,...in) hi(e(sy,l~,ka,...ml,...In)
(X_1 h_j X_2) h_i (Y_1 h_j Y_2) ↔ (X_1 h_i Y_1) h_j (X_2 h_i Y_2)
h(neither ... nor) is not compatible with itself, as in a sentence like Neither the teacher nor his pupil know neither the alphabet nor multiplication tables. Such a sentence is refused semantically. The sentence is incorrect because it allows more than one analysis.
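The definition scheme of α can be made concrete with a short sketch. It is an illustration under assumptions, not the author's implementation: formulas are built as plain strings, the empty quantifier and the empty fragment are represented by None, the quantifiers and connectors are written in ASCII (ALL, EX, ->, &), the function name alpha is hypothetical, and the compatibility check for several non-elementary arguments is left out.

```python
def alpha(sy, *args):
    """Build the formula for predicate symbol sy from noun phrase values.

    An elementary argument is a quadruple (q, fragment, ag, con);
    a non-elementary argument is a triple (k1, k2, h).
    """
    # Case 2: distribute over the first non-elementary argument.
    for i, arg in enumerate(args):
        if len(arg) == 3:
            k1, k2, h = arg
            left = alpha(sy, *args[:i], k1, *args[i + 1:])
            right = alpha(sy, *args[:i], k2, *args[i + 1:])
            return f"({left}) {h} ({right})"
    # Case 1: all arguments are elementary. Start with the verb predicate
    # applied to all object names, then wrap it from the inside out.
    formula = " ".join([sy] + [ag for (_q, _frag, ag, _con) in args])
    for q, frag, ag, con in reversed(args):
        if frag is None:
            continue  # proper names and pronouns contribute only their name
        prefix = f"{q} {ag}" if q is not None else ""
        formula = f"{prefix}[{frag} {con} {formula}]"
    return formula


every_boy = ("ALL", "BOY x1", "x1", "->")
a_girl = ("EX", "GIRL x2", "x2", "&")
speaker = (None, None, "I", None)
john = (None, None, "John", None)
mary = (None, None, "Mary", None)

knows = alpha("KNOW", every_boy, a_girl)
know_both = alpha("KNOW", speaker, (john, mary, "&"))
```

The second call shows the transformation-like effect of α: the compound noun phrase John and Mary is split into two sentences joined by its connector.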
+~ a ~
Ca +
b) +
(c +
d) ~ Ca +
c) +
(b +
d)
[...] John or Mary drinks either champagne or beer have the same meaning representation. It is obvious that α acts like a transformation rule analysing a sentence like a and b do c and d into a does c and a does d and b does c and b does d.
The value of top is the sentence's tense operator determined by the value x of the feature t. con is ¬ or ∅ according to whether the verb is negated or not. The subscripts of the object variables of the NP are generated on sentence level, within an S-rule, since NP that are different on this level must receive different object names because they are within the scope of one verb predicate, the verb predicate of the verb of the sentence, sy(V).
Relative clauses
For relative clauses the relative pronoun is generated directly and it is not the result of the application of a transformation rule to a noun phrase once generated. As before we have a sentence rule for every type of relative clause, corresponding to the kind and the number of verb complements and to the grammatical function of the relative pronoun in the sentence. We give an example rule for a verb with three complements and the relative pronoun as its first object.
S2 [S, obj1 rel] ::= RP NP1 [V, x t, y m, z f, 3 pl] NP2
log(S) ix(NP1) ix(NP2)
= top([x t]) con ([y m]) a(sy(V),log(NP1),(z,r162 = I =2 m corresponding to the relative pronoun NP and has been assigned within NP5 . RP is ele-
This rule d e s c r i b e s
~t does not rain o r
John works i n
and Mary
the garden
esudlee
~fl
in London
aS go-
L(Z)
representing
the conjunc-
are expressed by
~ . iS Sm ;
1 wiZZ
take
SI 9 T h e s e n -
is
C1.2
x : temp
S, after
Sm
~ + ~
log(S,). Example:
I went I wont to bed after I had eaten. I had care,
is
S,
and
to bed is
is represented by
to bed.
log(S=) CI.3
~ + ~
x = caus $I because Sa is represented ^ log(St) Sl and by ^ log(Sa) SI and $I Sa both hold. Ss if Our is ~D[log(Sa) log(S1)] implies
Sa
I am wet beaause
it is raining.
it 48 ralning. ~D[log(S=)
it ia raining
~ log(S,)]
formalization does not prevent things which are sometimes true at the same time from being causally related. But we think that humans intend to relate things causally in this way.
C1.4 x = conc
S1 although S2 is represented by
op(almost always)[log(S2) → ¬log(S1)] ∧ log(S1) ∧ log(S2)
i.e. almost always log(S2) implies ¬log(S1), and both hold.
Example: I do not take the umbrella although it is raining; this means almost always I take the umbrella if it is raining and it is raining and I do not take the umbrella; i.e. op(almost always)[log(S2) → ¬log(S1)] ∧ log(S2) ∧ log(S1), where S1 is I do not take the umbrella and S2 is it is raining.
Sample sentence ~ e n e r a t l o n
big
red
blook
where SIZE = q)big and lOg(As) : RED agr(As) sy(N) BL : BL : BL(') = b-~
(N1)
(NG1) log(NO)
ag (NG)
40 sy(NS) (NG2)I IoZ(NGI) ~ig(NG1) sy(NG1) agr(A) syr(A) ix(NG) (NG2)m log(NGz) : BL = I~ID agr(Az) = HED Xix(NG) = ag(NG) = sy(NG) 9 ag(NG) = sy(NG) = ix(NG1) = SIZE Xix(NG1 ) ~(meter) ^ (meter,l)A a bL Xix(~G ) ^ BL xix(NG)
= Xix(NG) = BL
attributes
(NP1)
Iog(NPI)
= (3,l,Xix(NP,),^) i = size
Xix(NP~ ) ~{meter)
9 9 9
and
V
NP,
[co Jl( o )
all
NPa NPs
we get (NP6)
(NPI)
con(DET) q(DET)
= * = V like NPI : = ls
: (V,BALL X i x ( N P s ) , X i x ( N P a ) , = (11,1z,^)
Let now be
John
NP~
/P e r f
PN
~ /
m [
I/
NPm
\
~
[def
kdJ
...z
NP- (NP2~)
(NP2)
= (r162
= John
= c : ^
(D2)
con(DET)
q(DET)
= r
w(NP) , contain more than one object and Tcm for cl,gm objects of
(NP1)
= (c,TXix(NP~),Yix(NP~),a) = Yix(NP~)
= <what teaoher>
is generated.
say by
c , or this
in M u n i c h ,
answering noun phrase is evaluated in the same way and if necessary another following up question is generated. fying the object found, get Iog(NP~) = (r When the answer is satislog(NP~) and we then c , is inserted into
(sl)
log(St) =
~P'~m(GIVE,(r162162 = ~P-~[~(GIVE,(c,r ^~(GIVE,(c,r162
^[Tc ^ GIVE John x, c]]] xm ~ [Tc ^ GIVE John xa]]]] clause embedded into a noun phrase, where
be a relative
',/
""/
pl
NP, (NP5)
s3 (s2)
(NPS) log(NPi) ag(NPe) agr(Sa) ($3) log(Sa) = (r
= John
= John rule not explicitely mentioned here. = a(WORK,(c,r = log(St) a WORK John
is a sentence generation
John
v~ov~ing
b6oause
he
neeae
mo,~e~,
sB (c2.3)
(C2.3) is not explicitly mentioned here. This rule generates compound sentences where the superordinate sentence precedes the subordinate one.
lo~(Sm)
9 ~D[log(s,) a log(S~)
- log(S,)]
^ log(s~)
Dialogs
A dialog is a sequence of sentences of L(G) where a question is followed by an assertive sentence, the answer, or by another question, the following up question. A command or a statement is followed by an answer such as I carried it out or I understand, or by a following up question. Following up questions are generated whenever the sentence contains an ambiguity; i.e. a definite noun phrase whose object cannot be determined or a pronoun the reference of which cannot be determined.
DeI'in~t~on
I. A d~'alog o~oZo on a s t r u c t u r e
Is a s e q u e n c e S S QC A of where L(G) is a s e n t e n c e
A = ({A s
: s c M},Ro)
in a s t a t e
or a s e q u e n c e
of pairs u and
QB B
where
is a f o l l o w i n g of the f o r m up q u e s t i o n
up o to
of the f o r m w h a C QC is e m p t y
is an a n s w e r
if t h e r e
is no f o l l o w i n g ... Qn Bn if
g m = ~ . QC = QI B, Qm Bm 9 S . If c . and for by S If
is a q u e s t i o n S
is yes A
or no or is of the
~he u or c
is a w h - q u e s t i o n = T where 1 is
Ohs v resp.
A(s,1)
log(S)
with
generated described
word replaced by c . If S
by the n a m e is an If S S is is
question
A = yes
A(s,log(S))
= T .
is I carried ~ sentence A
ou~ or I c a n n o t do i~. If
an a s s e r t i v e
is I u n d e r s t a n d .
2. A d i ~ Z o g on a s t r u c t u r e dialog cycles DI Da
is
... Dn and
... Sn
s i R si+ I and
s i * si+ I i~ out.
D i = SiQZiA i
A. = I c a r r i e d I is e v a l u a t e d
a dialog
in a Z - s t r u c t u r e . it m a M e s
It c o n s t i t u t e s forward
through
the s t r u c t u r e is c a r r i e d
a l o n g Ro and out. We w i l l
a step
when-
see in the f o l l o w ~ n g
paragraph
is the e f f e c t
of a c o m m a n d
in a s t r u c t u r e .
CHARACTERIZATION OF QUESTION-ANSWERING-SYSTEMS BY Z-STRUCTURES
Here we shall describe how a structure for the state logic can characterize a natural language understanding system. The knowledge that is formulated in such a system is represented by a Kripke-structure. In connection with this the non-logical interpretation of state transitions is very important. [...] bears a non-logical meaning. For two structures As, [...] holds iff the "world" [...] can be executed within [...]
~s us,~d in such a way that the relation J:1 obtained from the world
Worlds, state changes and their dependence on action verbs
A world is a set of objects that have certain properties, i.e. a color, a size, etc. They are subclassified by conceptualizations like MAN, TREE, TABLE, HUMAN, etc. There are relations which can hold between objects, i.e. position relations such as ON, BESIDE-OF, etc., or "abstract" relations such as the ownership relation OWN. All these elements of a world are represented by the language of the state logic, as can be concluded from the description of the last paragraph. Objects are constants, properties are predicate or function symbols, relations are predicates of the appropriate number of places. Verbs are relations between objects too, and they are represented by predicate symbols of the appropriate number of places. We can distinguish two types of verbs, called static verbs and dynamic verbs. An n-place verb is called static if it does not describe an action that changes relations or functions of objects of the world; i.e. if the formula representing the assertion of the verb holds within a structure, this does not effect any change on objects of the structure. Examples of static verbs are believe, [...]. An n-place verb is called dynamic if the structure is subject to some change whenever the action described by the verb is executed in it. Examples are take, give, grow, put. If somebody takes a thing the position of that thing changes, i.e. the extension of the predicate symbols ON, BEHIND etc., and the extension of the static verb predicate symbol HOLD changes, because the person holds the thing now. Constructive verbs are always dynamic, e.g. build, paint: She builds a house. He paints a picture. The appropriate thing comes into existence by execution of the action.
Furthe,', there
is orlly has a position b . We describe by non-loglcul in
of an action,
a takes b
is nothil~g on of an actlon
axioms. whatever
must
of an action hold
We would ~xl...Xn
out a p e c u l i a r i t y dynamic
when
expressions
are evaluated:
If an expression verb v
a state
of a structure
that
action
following
all the
a state more
on each other.
by a text
or a dialog which
constitutes
something
like a path
a structure.
of action
verbs by n o n - l o s i c a l
axioms
verb
there
are two n o n - l o g i c a l
axioms
de-
axiom
~x,...Xn
(2) Ex~cut[on ~xl...Xn C E is called is called
C[x,,...Xn]
axiom E[xl,...Xn] v and
v v.
(CA) and
Naturally
C[xl,...Xn]
~ E[xl,...Xn]
holds.
what
requirements
must be complied
in an action take
can be executed.
(CA) TAKE x y → HAND x ∧ THING y ∧ ¬HOLD x z ∧ ¬ON z y
This means: x can take y iff x is a hand and y is a thing and x does not hold any other object and there is nothing on y.
E in the [...] of the execution of an action does not contain F, P, +, - as sub[...].
(EA) TAKE x y → +[HOLD x y ∧ ¬ON y z]
This means: if x takes y, then there is an immediately following state such that x holds y and y is not lying on anything.
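The condition and execution axioms for take can be read operationally: the condition is checked in the current state, and the execution yields an immediately following state in which the results hold. The sketch below illustrates this reading under assumptions of our own: a state is a frozen set of ground facts written as tuples, ("ON", z, y) means that z lies on y, and the function names can_take and take are hypothetical.

```python
def can_take(state, x, y):
    """Condition of take: x is a hand, y is a thing, x holds nothing,
    and nothing lies on y."""
    holds_something = any(f[0] == "HOLD" and f[1] == x for f in state)
    covered = any(f[0] == "ON" and f[2] == y for f in state)
    return (("HAND", x) in state and ("THING", y) in state
            and not holds_something and not covered)


def take(state, x, y):
    """Execution of take: return the immediately following state,
    or None if the condition does not hold."""
    if not can_take(state, x, y):
        return None
    # After the action y is no longer lying on anything, and x holds y.
    remaining = {f for f in state if not (f[0] == "ON" and f[1] == y)}
    return frozenset(remaining | {("HOLD", x, y)})


# Hypothetical world: block b1 lies on block b2, the hand h is empty.
start = frozenset({
    ("HAND", "h"),
    ("THING", "b1"),
    ("THING", "b2"),
    ("ON", "b1", "b2"),
})

blocked = take(start, "h", "b2")   # b1 lies on b2, so the condition fails
after = take(start, "h", "b1")
```

A failed condition corresponds to the answer I cannot do it, a successful execution to I carried it out together with a step to the new state.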
C ~ E
holds
we have: and every (s Re s'), by v state where s there is a state of the s' im-
action
verb
following action
the r e s u l t s its
execuhold.
of the
described
hold whenever
conditions
SO, we
can d e f i n e
a model
for a c t i o n
verbs:
Del'tnttion
Let verbs tion A
S
and
and axioms
a set of n o n - l o g i c a l
i.e.
condition
is a m o d e Z
r M
.
A 9 ~
and all
Because Let
of the
correctness Then of v
of
the s 9 M
theorem and N v c V
be a m o d e l E = ~ +~
for every
every
is the
execution
= T - there
is
s' = T
such
that
s R s'
and
s * s'
and
A(s,E')
The
(CA)
and
(EA)
is i m p o r t a n t
too for the use commands. a state the use and the con-
of the Then
answers
or e x e c u t i n g possible by
a system
verify where
(CA), (CA)
is not This
found (CA)
axioms.
When
is v e r i f i e d within the
can be So
out
can p u r s u e "heuristic"
structure.
and
(EA)
for the
execution
of commands.
CONCLUSION
We have given a device to describe how natural language sentences can be translated into a semantic representation for an underlying knowledge system. What is proposed here is a first attempt at formalizing "what is understanding of natural language text". Among the most important open problems are a better and more refined subclassification of adjectives, a satisfactory description of mass nouns, and a revision of the logic to allow the appearance and disappearance of objects of the world. As to the latter, we have either to revise the substitution rule or to allow only closed formulae to be manipulated by inference rules. Both restrictions are not very satisfactory solutions (see also [17], [18], [19]).
REFERENCES
[1] Winograd, T., Understanding Natural Language. Academic Press 1972.
[2] Schank, R.C. and Abelson, R.P., Scripts, Plans, and Knowledge. Advance Papers of the IJCAI 4, Sept. 1975.
[3] Bobrow, D., Natural Language Input for a Computer Problem Solving System. In Minsky, M., Ed., Semantic Information Processing. Cambridge: The MIT Press, 1968.
[4] Kellogg, C., A Natural Language Compiler for On-line Data Management. Proceedings of the Fall Joint Computer Conference. New York: Spartan, 1968.
[5] Woods, W.A., Transition Network Grammars for Natural Language Analysis. Comm. of the ACM vol. 13, Nr. 10, Oct. 1974.
[6] Ershov, A.P., Mel'chuk, I.A., Nariniany, A.S., RITA - An Experimental Man-Computer System on a Natural Language Basis. Advance Papers of the IJCAI 4, Sept. 1975.
[7] Rescher, N. and Urquhart, A., Temporal Logic. Springer Verlag, Wien 1971.
[8] Kripke, S.A., Semantical Analysis of Modal Logic I, Normal Modal Propositional Calculi. Zeitschr. f. math. Logik und Grundlagen d. Math. Bd. 9, 1963.
[9] Schütte, K., Vollständige Systeme modaler und intuitionistischer Logik. Springer Verlag 1968.
[10] Braun, S., Eigenschaften strukturierter Symbole in formalen Sprachen. Habilitationsschrift, München 1971.
[11] Knuth, D.E., Semantics of Context-Free Languages. Math. Syst. Theory 2, 1969.
[12] Koster, C.H.A., Affix-Grammars. In Peck, J.E.L., ALGOL 68 Implementation, North Holland Publ. Comp. 1971.
[13] Chomsky, N., Aspects of the Theory of Syntax. Cambridge, MIT 1965.
[14] van Wijngaarden, A., Ed., et al., Report on the Algorithmic Language ALGOL 68.
[15] Bruce, B., Case Systems for Natural Language. Artificial Intelligence 6, 1975.
[16] Moravcsik, J., Mass Terms in English. In Hintikka et al., Ed., Approaches to Natural Language, D. Reidel Publ. Comp., 1973.
[17] Hintikka, J., Modality and Quantification. Theoria 1961.
[18] Kripke, S.A., Semantical Considerations on Modal Logic. Acta Philosophica Fennica 1963.
[19] Montague, R., Universal Grammar. Theoria 36, 1970.
ACCESS TO DATA BASE SYSTEMS VIA NATURAL LANGUAGE
Krägeloh
Universität
Abstract
with
computer
via natural
language
of a r t i f i c i a l by s i m u l a t i n g
intelligence. human
Modern
approaches The
language
perception.
the s e m a n t i c s The
of
natural
statements
data formal
on the o t h e r subject
model due
matter
in the
form of a d a t a
base.
is m a i n l y
to the large
amount short
of d a t a time.
that m u s t The
be i n s p e c t e d of any diato
within
semantics
the m a c h i n e model.
that
all
statements
can be r e l a t e d
In l i g h t
of this
difference
the d e f i n i t i o n
of a n a t u r a l
query
language
for d a t a b a s e to a m o d e l l i n g
systems
of n a t u r a l
language
system will
to be a p p r o a c h e d one will
intelligence.
In g e n e r a l
research the
in a r t i f i c i a l
simpler different
base
account. will
In o t h e r
cases
pragmatic
intent
language
access
to a d a t a
In d o i n g base
language
analysis model,
systems
are d e v e l o p e d
the s y n t a c t i c premises
analysis,
semantic system
validity that
These
underly
on a d a t a
base
provides
for a s e t - t h e o r e t i c
modelling
and a G e r m a n
language
interface.
Goals
of n a t u r a l
language
processing
systems
1.1
Simulation
of n a t u r a l
language
understanding
Natural
language
long
tradition ties
in i n f o r m a t i c s .
these a c t i v i (AI) or
intelligence
understanding cognitive
systems
a i m to s i m u l a t e
processes
is the l i n g u i s t i c to l a n g u a g e a cog-
of m e a n i n g
of p r o d u c i n g
image of his e n v i r o n m e n t .
(called a model)
is always
an a b s t r a c t i o n it is to serve.
understanding,
real w o r l d are r e l a t e d
to the c o g n i t i v e m o d e l such as m o d i f i c a t i o n s
of this p r o c e s s
above all r e q u i r e s
specification
of a
system
(MS), by m e a n s of w h i c h as a m o d e l
any e n v i r o n m e n t problem).
or p a r t of it
(representation
s y s t e m is a l w a y s b a s e d on a m o d e l
formulated
expressions
understanding
is the fact
units
or " d e m o c r a c y "
systems
however, models
leads to
v e r y c o m p l e x MS and, cases w h e r e
to e x t e n s i v e
even in t h o s e
only small
of the e n v i r o n m e n t networks
are c o n s i d e r e d [5]).
[3], d e p e n d e n c y
[4], demons
the f i t t i n g
p r o b l e m and,
if of concern,
the s y s t e m reactions.
T a k e as an e x a m p l e those subnets
s e m a n t i c net all
corresponding
B o t h in h u m a n c o g n i t i o n be p h y s i c a l l y realized. (brain,
represented,
case w e a s s u m e a p r a c t i c a l l y This p r o v i d e s
neurons,
a fully associative
structures,
In c o m p u t e r
simulation, of l i m i t e d exten-
devices
and p r o c e s s e s
sive models,
this m e r e l y allows
and r e p r e s e n t a t i o n Likewise, no on
of a d r a s t i c a l l y manipulation a computer
l i m i t e d p o r t i o n of the e n v i r o n m e n t . on an a s s o c i a t i v e basis
of m o d e l s
is d i r e c t l y p o s s i b l e
as yet.
Instead,
the a s s o c i a t i v e
thus c a u s i n g times
considerable
costs a n d p r o c e s s
in an i n t e r a c t i v e
mode. less t r o u b l e s o m e as n e w h a r d w a r e
become
available etc.).
LISP-machines,
discussions,
on c o m p u t e r [6].
r e a l l y d i f f e r just q u a n t i t a t i v e l y
qualitatively
If one a p p r o a c h e s
the same p r o b l e m
from d a t a b a s e t e c h n o l o g y , of a d m i n i s t r a t i n g
however,
the n e c e s s i t y Consequently,
and p r o c e s s i n g is
again regarded
modelling
s y s t e m c a n n o t be chosen
at w i l l but m u s t m e e t a
n u m b e r of o b j e c t s representation
and facts to be m o d e l l e d
s h o u l d take up as little
storage
2) T h e rules
even e x t e n s i v e m o d e l s in a data b a s e
times
system). their MS to
in data b a s e t e c h n o l o g y w i l l a b s t r a c t
f r o m the real w o r l d
more
than
the usual
Consequently, be much details
the range than
smaller that
in AI,
are much more Thus,
to reject in DT are Since in DT in
one may wish towards
to include
in them.
generally
oriented
narrow
purpose.
defined is then
on models called
they
are usually
included
language
access
to data access
systems not
by providing one hopes the
easy
to users more
familiar
formal [14,15,16,
system
to become
widely
available
in MS, data
it is obvious have
that
language
under-
bases
to be based
on something are
than
of cognitive interfaces
processes.
In DT the MS data base
rigorously
defined the
of existing
systems. necessary
to an extent Under these
language
systems,
although
much more
programming For
languages, use,
is still
a formal
practical
these
restrictions in case
are quite base
tests
of the user access
behaviour
of data English
that
provided that
by unrestricted language
language is never almost
shown,
of natural certain
expressions were
utilized
query
structures
used
indicating highly
that
a natural
language user's
interface point
for data too.
becomes be due
stylized
from the
of view,
to a certain
lethargy
to whom
it is difficult
complex of this
statements paper
is to examine to the
the
consequences imposed
processing
conditions
by data
interfaces. from AI may for data
include
discussion
of the question
incorporated access,
of natural violating techniques
languages the aspect
and to which
without
of practicability. (automatic area
In addition indexing,
to AI,
analysis
morphemic their
analysis results.
that may
contribute will be
clusions means
of these
illustrated data
in this base
of natural
to a particular
system.
above,
the differences
approach.
We shall explore
of the development
of language processing
1.2 Approaches to natural language processing in AI
The early language
processing
systems
in AI concerned language
themselves
with
a natural
into another
[23]. These
a certain knowledge
of the external
structure of the
(grammar),
(dictionary of
approaches
and dictionary meaning
by themselves of sentences
proved [1]:
insufficient
to deal with
the different
like
(1) The fish was bought by the cook.
(2) The fish was bought by the river.
The system would have something is applied to know about the real world to which the difference (place). in translation in the
the sentence meaning
in order to recognize (person)
language under-
of a computer
to comprehend
natural
in man-machine "plausible" the Turing
communication
the machine
that seems
A system would be perfect
if it passed
In order to enable least knowledge
the simulation
language relative
at
about the meaning
environment must be provided purpose the environment
or part of it has to be modelled is assigned
of the model however,
below,
(4) is meaningless,
the system requires (complex)
knowledge
of the environment
than the
meaning
between model parts.
Some of the MS used in the past cannot meet this condition or, at best, only if applied to sections of an environment
interest. standing [29].
In particular, systems
this applies [26],
to the earliest SIR
such as STUDENT the STUDENT
[27], BASEBALL
For example,
as MS,
restricted
to a small mathematical for very specific
that were
suitable only
situations
of a formal
best-known
among
(applied to a language
understanding
system
by Coles
In a formal-language
MS the environment (axioms).
of certain basic units
Strictly
for deriving
between model parts. to construct description
[31] or Woods
more complex MS from programmes of environment (e.g. Schank situations
which provide both the changes.
and operational
Other authors
[4]) argue that language
as MS complex data structures networks
such as
[3] or dependency
is solved by demonstrating (blockworld
the capabilities guide
ly restricted worlds geology [33],
[31], Airline [34]).
industrial
enterprise
1.3 Approaches to natural language processing in DT
The application industry
of computers
to the processing
of large data
sets
in
and public
lead to the development
of a class
of special-purpose systems". management
A data base
system provides
and processing
which make
the modelling
of extensive to
The structure
of most MS is such that it is possible according to comparatively
the description
of models
simple The
and to express languages
their processing
by means of algorithms.
developed
for that purpose
are called the interfaces
of data [9],
base
systems.
known [13],
MS
in use
are
Model
the Network
and the
Binary As soon
attempted
to make EPD,
data even
base
systems
available
to the
casual
unfamiliar
with
simple
formalization
rules handi-
corresponding users.
formal
languages
proved
to be a severe language need
to many
Natural
language
some hope
of overcoming
these
the user have
to observe
restric- access like
well-known be
to him.
Natural
language in areas or public
to data
advantage
industrial
management, where
engineering - as part
of a problem in which [17, he
employ and
the very solution
formulates of meanings
problem
33]).
of natural by the MS
language
queries
system
is determined towards base
(data model).
Generally, The
oriented the data
cognitive
theories
as it is in AI. no matter of natural what
system it,
is maintained in the case
language
is chosen too.
i.e. the
language
access,
systems language
aim of language into
processing of the
queries
expressions
interface. was one of the CONVERSE first to advance a formal this approach. He chose as the
system
[39]
language
to infor pur-
retrieval Likewise,
and hence Woods
containing
operators
put his procedural and
form of an for the
in which
predicate Both
function
defined
procedures.
these Also
approaches
to be included
at this
may be interpreted and did
model
a formal the
he - just data
Woods into
- was
heavily system.
concerned
with
integration
large
bases
his REL
expressiveness
of natural data
language base
goes
far beyond In order
the power
of the of
interfaces this
of available the
systems.
to make for
some use ex-
expressiveness
formal
MS must
at least
allow
lengthy
pressions
based on a small number of operators nesting of functions).
In general, [41].
in so-called navigating
systems
In these a complex into a The
is described by a single expression which is divided or parallel
as a guide to further model; Counter or
example
is the relational
exist for a natural
language
access
to it [17,42].
seem to be the Network Model,
file management.
Correspondingly, access
have been pub-
for use of natural-language
that might provide
a basis
are
[12], LEAP-structures
[43] or mathematical
relations
strating natural-language
A set-theoretic modelling system (set language)
The set-theoretic
modelling
language
MS in types 2-1
based on set and relation the language, operands the language and the control
algorithms (object)
must be classified
into operators,
of the set language together with for the instances corresponding theoretic Further,
together
symbols
are listed
in fig.
examples object
application.
by indexing includes operators
the symbols
operators some
2-2).
logical
operators
structure
the operators
is indicated
in the language
by expressions
~(I
in functional
, MA (M
notation,
Steicard
Iheart neurosis ))) drug for heart neurosis?) , @. They offer of complexity: drugs? cytostatic drugs? the
Is Steicardin applied
were
in the sequence:
Vg, Mn
are introduced possibility
quantifiers.
to formulate
(1) Are all drugs (2) Which In both tics") those examples a certain with elements
antibiotics
are incompatible
the flow of control of a given condition cytostatic listed set drugs) (i.e. (e.g.:
is identical: "drugs for glaucoma" to a test. In or "antibioto be in(2) only (1) cormachine: drug,
are
for which
"true".
to the following
formulation
in the set-theoretic
are: EI: three some arguments: each of its substitutions x1 (range): ZB: how many
of the loop:
in a set of objects
Vg(Rdrug, Iglaucoma)
Premises for language analysis in data base systems
The main
purpose
of data
base
systems
is to provide of data
tools
for the
management
of large
volumes
that
on peripheral justified as storage be
devices.
Access
by natural
language
if it does space
not consume
an inordinate
amount
of resources
or processing on a natural interface
time.
Consequently interface
a number
of restrictions system system. of a
must
imposed
language
in a data base
as compared (1) The
to the
of a general should
language
understanding in terms
natural
language model.
interface
be describable
simple This
syntax
suggests
to limit has
the shown
syntax that
model
to context-free grammars but
Previous for
research
the purpose
of defining
natural
the examples complex
in the mars
literature
of more as
gram- appli- a
are of a rather in data base
far as their [45]
cability
systems
Kratzer
defined
large without
subset
of natural
by means
of a context the seman- defi- MS
any need
for restricting expect
subset.
one would
nition and
to be justified
the more
in connection
of the
semantics that
to it.
The work
by Malhotra sive
indicates
is no need area. Hence
language
definition languages
seem
to place
formulations should be
procedures
analysis
of natural
language,
in particular, Depending forms)
duces
the problem error
of morphemic rate
analysis
[20,46]. word
reduced rise
costs
this simple
arbitrarily be chosen
high here
procedures
should
resulting low
simple
procedures
is surprisingly should be
requirements
in defining to a large MS accounts
a language extent.
for data
may be omitted as soon as a the
Obviously,
verbs
indispensable and,
for temporal relationships
consequently,
permits
description
of dynamic processes however, have
(e.g.
see REL
[40]).
Data bases
of that kind,
so far never gone beyond pilot studies; use do not include them yet. on the average, influences fast. of the
data base systems (4) The parser Like (1) and total free system. languages
in practical
the complexity
a number of parsers
for context is measured sentento
[48,49,50].
Their efficiency
for processing
n words
the efficiency
is proportional
for queries
to a data base system n is comparatively
small, so that for choosing a parser the factor k becomes of major importance. (5) The semantic validity test should be performed only after a syntactic analysis. The semantic validity of a query may be controlled in combination with the retrieval. Concurrent
far more negative
effect,
since all dead ends during the analysis time although not to the result. [51] quite a number of authors [17,32]
at natural-language
defend the principle syntactically
of postponing
The number of
correct but semantically meaningless structure
of the grammar
a subset of the German
language was defined
the KAIFAS data base
(4.1). Following
analysis
was developed.
The parser
since
language
traditional
This process
consists
(a) lexical analysis
(b) syntactical analysis
(c) code generation
(d) transformations.
Fig. 3-1 illustrates the interaction between the steps. During each of these the following functions
(a) The query
is divided
into terminal
nary on those, language
their corresponding
representations
on the set-
level is found.
the syntactical
(c) If the query is parsed to a sentence, an expression (d) Then, of the set-language
Translation: Natural language into set language
4.1
4.1.1 Vocabulary
a context-free
formalization
of the natural
lan-
as a context-free be practicable,
as important
comprehensible by the
and user,
and easy to
Practicability
grammar
contains phenomena
in order to ia~of case,
the syntactical
In many natural
the combinations
These may be considered
a main category
to syntactical
refer to secondary phenomena correspond
of a feature (1) (2) = =
to such phenomena,
Schemas
denote
sets of non-terminals, N num,cas,gen
denotes
a set of 24 non-terminals
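The count of 24 can be checked by expanding the schema over the feature values the grammar uses for number (sin, plu), case (nom, gen, dat, acc) and gender (mas, fem, neu). The following is only a sketch: the function name expand_schema and the bracketed spelling of the expanded non-terminals are illustrative assumptions, not notation taken from the paper.

```python
from itertools import product

# Feature values as listed for the KAIFAS grammar.
NUMBER = ("sin", "plu")
CASE = ("nom", "gen", "dat", "acc")
GENDER = ("mas", "fem", "neu")


def expand_schema(main_category, *feature_domains):
    """Expand a schema into the plain non-terminals it abbreviates.

    The bracket notation of the result is a hypothetical spelling.
    """
    return [
        f"{main_category}[{','.join(values)}]"
        for values in product(*feature_domains)
    ]


noun_nonterminals = expand_schema("N", NUMBER, CASE, GENDER)
print(len(noun_nonterminals))  # 24 = 2 * 4 * 3
```

Writing one schema instead of 24 separate non-terminals is what keeps the context-free grammar small enough to remain practicable.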
Complex categories
may be partially
ordered by assigning N
to the features:
for each,
Assigning
a single value to each feature of a complex
results
The treatment existing
of complex categories
approaches
(a) Because chosen
of the restricted in accordance with ME,
for sets:
for relations:
In this way we make
valid constructions
(b) A grammar whose main categories not "naturally" nouns, reject sentences in number,
syntactical
the features.
main categories features For practical tained. The set language, (syntactical)
defined for quantifiers. (c) Only binary features are allowed:
case features: nom, gen, dat, acc
gender features: mas, fem, neu
number features: sin, plu
The binary values
tures may easily be expressed by logic formulas. Fig. 4-I provides a list of the main categories and features of the
KAIFAS grammar.
It is apparent
fall into two classes, (a) object-categories (b) operator-categories, corresponding Applying to the classification of the set-theoretic language into in objects and operators. the same distinction (a) object symbols instances of the object types of the set-theoretic in the sense of symbols the operators to the terminals of the grammar results
(the environment
of the language
symbols
is variable area)
and large
(approxi-
in the pharmaceutical
terminal productions
by means
of a dic-
(lexical
is described
4.2).
form a fixed
symbols
for quantifiers
category
(see above).
symbols
included
4.1.2
cas,gen,
num
is a production when
from which
substituting
suitable
In doing
values
to meet
are separated
program
which
combination rule.
of feature
values
to the complex
categories specifying
The feature
program
the conditions
categories
in the right-hand
and an assignment complex
([45])
or by means
of programs
language
a complex
(1) V0 V1
rewrite feature
V0, V1,...
Vp denote complex categories, V0', V1', ..., Vp' their main categories.
Fig. 4-2 provides a summary of the operators used in feature programs.
The semantical
part
the meaning Since
rule.
It consists
of set-language
and place-markers
semantics
of the complex
in the rule.
aspects
level,
the semantical
on conditions
These dependencies
are defined by
feature programs
(dependencies
on features)
(no dependencies)
A complex rule may be applied in the following are provided in chapter 4.3):
side of the
(2) Testing the right-hand (3) If yielding assignment 4.2 4.2.1 Lexical analysis "true", of features
features
reduction
to left-hand
and semantics
contains
will not be defined until a user actually works with the system. The lexical analysis fulfills two functions: to a terminal
(a) assignment
of a complex category
(a1) assign a main category (a2) assign feature-values, (b) assignment language). A large number of syntactical ambiguities may be expressed within one "ein" of semantics (i.e. a terminal symbol of the set
complex category,
is needed as well,
a conjunctive
normal
Example
from German
language:
not for neutrum or femininum.
4.2.2 Morphology
The multitude
of inflections all word
in German forms
language
does not allow
for sto-
to be derived only contains
the dictionary form to word
tion from inflective (morphological
stem is done of verbs
analysis).
The exclusion
ture of a terminal published
(gender,
Different
have been
for solving this problem
but all of them require
that extensive
can hardly be expected a word stem in KAIFAS the
to specify of a word
a minimum of information,
namely:
and plural
to a specific morphemic endings
all morphemic
that may be attached one or more
Each morphemic
(set of feature values) By explicitly
for all these terminals
storing the plural of terminals
forms of terminals of
reduction
The additional forms occur
be tolerated, identifiers
in case of
only.
4-4 presents
an example
of a morphemic
In order to save storage
space in the dictionary,
the syntactical
analysis.
Thus
any morphemic
class will
contain
an entry
"e".
In fig.
lexical
analysis
of a query
of a terminal
due to
analysis.
The feature
programs,
for easy
as demonstrated
by the example
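The dictionary scheme of this section (word stems only, each stem pointing to a morphemic class that lists the permitted endings together with their feature values) can be sketched as follows. The sketch rests on assumptions: the class name "n1", the stem entry, the endings and feature sets, and the identifier M_drugs are invented for illustration and are not taken from the KAIFAS dictionary. The ending is tried with length Z = 0, 1, 2, ... and the remaining prefix is looked up as a stem.

```python
# Hypothetical morphemic class: ending -> feature values it may express.
MORPHEMIC_CLASSES = {
    "n1": {
        "": {"neu", "nom", "dat", "acc", "sin"},
        "es": {"neu", "gen", "sin"},
        "e": {"neu", "nom", "gen", "acc", "plu"},
        "en": {"neu", "dat", "plu"},
    }
}

# Hypothetical dictionary: stem -> main category, morphemic class, semantics.
DICTIONARY = {
    "medikament": {"main": "ME", "class": "n1", "semantics": "M_drugs"},
}


def analyse_word(word):
    """Return all (main category, feature values, semantics) readings of a word."""
    word = word.lower()
    readings = []
    for z in range(len(word)):  # z = length of the candidate ending
        cut = len(word) - z
        stem, ending = word[:cut], word[cut:]
        entry = DICTIONARY.get(stem)
        if entry is None:
            continue  # the prefix is not a known word stem
        features = MORPHEMIC_CLASSES[entry["class"]].get(ending)
        if features is not None:  # the ending is permitted for this class
            readings.append(
                (entry["main"], frozenset(features), entry["semantics"])
            )
    return readings


print(analyse_word("Medikamente"))
print(analyse_word("Medikamentx"))  # [] since "x" is not a permitted ending
```

Storing one stem per word and sharing the ending tables between all stems of a class is the storage saving the section argues for; the price is that every reading of an ending is returned and must be disambiguated later by the feature programs.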
4.2.3 Lexical analysis: algorithm
analysis algorithm
out according
to the
Z = 0,1,2,...,
then assign
of the dictionary entry
of the dictionary defined
by the entry
of x_{k-Z}...x_k in the
class specified in the dictionary.
algorithm to each terminal of a query. The result will be converted to a form suitable for the ensuing parsing process.
4.3 Parser
completes
the syntactical
and semantical
analysis
of a query.
the following
(1) The parser has to recognize (2) It must be able to operate (3) The storage be kept val process. (4) Furthermore constructing process. in order space small in comparison
categories
and execution
time required
is independent
grammar.
This
for natural
languages
or to extend
permissible
(4), whereas Earley's
an
parser
suggests
[48].
However,
adapting
an improved version in an unwieldy
of this parser
to complex categories condition (3)
resulted
(see
The
on general expressed
rules
in our complex
Only a short introduction [38].
to this parser will be presented,
The graph contains
is labelled by a complex category
an initial parsing
graph is constructed
(heavy
It contains
edges only between vertices two vertices
(1≤k≤n). number
is Z, where
of complex
assigned to the k-th terminal
on the initial
for all sequences
of edges
compares
the main categories
r, the parser performs
steps: on the complex categories
the parser produces
a new edge between The new
and ending vertices
edge is labelled by the left-hand the features program of r. (c) The edge is additionally all place markers plex categories This process vertex 1. of a query will prove 1 and n+1, which obtained
rule and by
from the assignment
in the feature
labelled by the semantics
replaced by pointers
to the semantics
in the sequence
is repeated
for all vertices
The parsing
successful,
tween vertices
is labelled by the axiom of the grammar
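The graph-based procedure outlined in this section can be sketched as a small bottom-up chart parser. This is an illustration under stated assumptions, not the KAIFAS implementation: semantics and place-markers are left out, a complex category is modelled as a main category plus the set of its positively marked feature values, and the two rules and the goal category ME (instead of the sentence axiom) are chosen only so that the fragment "welche drageeförmigen Psychopharmaka" can be parsed. The feature values of the initial edges are those given for these words in Fig. 4-6.

```python
from collections import namedtuple

# An edge joins two vertices and is labelled by a complex category.
Edge = namedtuple("Edge", "start end main features")

GENDER = ("mas", "fem", "neu")
CASE = ("nom", "gen", "dat", "acc")
NUMBER = ("sin", "plu")


def agreement_test(a, b):
    """Test part: in each feature group at least one value is shared."""
    return all(
        any(f in a.features and f in b.features for f in group)
        for group in (GENDER, CASE, NUMBER)
    )


def shared_features(a, b):
    """Assignment part: keep the feature values marked in both edges."""
    return a.features & b.features


# (left-hand side, main categories of the right-hand side, test, assignment)
RULES = [
    ("ME", ("ME", "ME"), agreement_test, shared_features),
    ("ME", ("QU", "ME"), agreement_test, shared_features),
]


def sequences(edges, rhs, start=None):
    """Yield chains of adjacent edges whose main categories spell out rhs."""
    if not rhs:
        yield ()
        return
    for e in edges:
        if e.main == rhs[0] and (start is None or e.start == start):
            for rest in sequences(edges, rhs[1:], e.end):
                yield (e,) + rest


def parse(initial_edges, rules=RULES):
    """Add edges until no rule produces a new one; return the full graph."""
    edges = set(initial_edges)
    changed = True
    while changed:
        changed = False
        snapshot = tuple(edges)
        for lhs, rhs, test, assign in rules:
            for seq in sequences(snapshot, rhs):
                if not test(*seq):
                    continue  # dead end: the feature program rejects the sequence
                new = Edge(seq[0].start, seq[-1].end, lhs, assign(*seq))
                if new not in edges:
                    edges.add(new)
                    changed = True
    return edges


fs = frozenset
initial = [
    Edge(1, 2, "QU", fs({"mas", "fem", "neu", "nom", "acc", "plu"})),
    Edge(1, 2, "QU", fs({"fem", "nom", "acc", "sin"})),
    Edge(2, 3, "ME", fs({"mas", "neu", "gen", "dat", "acc", "sin"})),
    Edge(2, 3, "ME", fs({"fem", "neu", "gen", "dat", "sin"})),
    Edge(2, 3, "ME", fs({"mas", "fem", "neu", "nom", "gen", "dat", "acc", "plu"})),
    Edge(3, 4, "ME", fs({"neu", "nom", "gen", "acc", "plu"})),
]
graph = parse(initial)
spanning = [e for e in graph if (e.start, e.end) == (1, 4)]
print(len(spanning))  # 1
```

With n = 3 terminals the parse counts as successful when an edge joins vertices 1 and n+1 = 4. The single spanning edge carries neu, nom, acc and plu, so only the nominative/accusative ambiguity survives.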
In fig.
4-8 a special
numbering
( @
, Q)
edges have been generated.
the semantics.
space over a solution generating
code fragments
4.4 Code generation
the fragments
are assembled
language depending
form the result of query translation.
4.5 Transformations
Applying results r
process
to the parsing graph
in fig.
4-7
(M25,M4),#),
This expression
is not well-formed, (prenex normal
of this kind and other syntactical
in the set language
is subject to certain an expression:
their relative
position within
like DB must appear EI.
in front of logical quanti-
(b) Difficulties operator
arise
from the difference
in relative
position
of
symbols
for quantifiers
within natural German and in within the set-theoretic
that of their corresponding equivalent: Which remedies DB(x1,Mdiseases, These problems grammar proves to an analysis process
quantifiers
for which diseases
are prescription
DB(x2,Vg(Rremedy,x1), ~(x2,Mprescription
these problems
of the parsing
and hence after application
A solution semantical
could be based on the tree-like fragments,
which we shall call a semantical
tical linear
by suitable puts
rules
the quantifiers
Transformations
are usually
formulated
of transformational
implementing
these requires
in time and personnel
subset
of the trans- there
It can be shown
on the linear
an expression
pattern
to be transformed.
For example,
of linear expressions Thus
are easier
the transformations
are postponed
by means
(Our work
are integrated
after
some examples
5 Conclusions
The linguistic techniques discussed in this paper have been implemented at the University of Karlsruhe on a Burroughs 6700 as part of the KAIFAS information system. Some experience has been gained in their usefulness by applying
data base containing data on
on the German pharmaceutical market
inexperienced
via the natural German of this implementation
to test the premises
in
A context-free
proved sufficient
for this application.
is true in general interface.
can only be decided
if one included verbs
The descriptive large,
in the system was
even unnecessarily tended
to use successively i.e.
in steps.
clauses
be excluded
the possibility
of references
to queries
involving
solutions analysis
to the well-known proved
The morphological inflectional
to be sufficient,
not guarantee, will be refused
incorrect
u n d e r any c i r c u m s t a n c e s . to c f - l a n g u a g e s , t u r n e d out to be is s u p e r i o r to in
T h e M . K a y - p a r s e r , w h i c h we r e s t r i c t e d a very simple a l g o r i t h m .
Earley's
the n e i g h b o u r h o o d is p a r t i c u l a r l y
of ten w o r d s or less.
Consequently,
the M . K a y p a r s e r above.
s u i t e d to the s t e p w i s e u s e r a p p r o a c h m e n t i o n e d
Fig. 2-1 Object types
I    Individuals, e.g. Thomapyrin, Perphyllon
M    Sets, e.g. drugs, diseases
R    Relations, e.g. manufacturer. Lists of pairs of individuals (Work is under way to cover n-ary relations)
     Numbers
     Measures, e.g. 4 tablets/day
     Measure functions, e.g. dosage. Lists of ordered n-tuples whose last component is a measure
     truth values
Operands
I1, I2, I3, ..., In, M1, M2, M3, ..., Mk, R1, ...
Fig. 2-2
Operators
on sets: Mb(I1,...,In), Mu(M1,M2), Mn(M1,M2), Km(M1,M2), Kz(M1)
on relations:
Ko(R1)       converse
Rb(R1,M1)    restriction ({(x,y) | (x,y) ∈ R1 ∧ x ∈ M1})
RP(R1,R2)    product
RU(R1,R2)    union
reduction of binary relations:
domain       {x | ∃y: (x,y) ∈ R1}
range        {x | ∃y: (y,x) ∈ R1}
             {x | (x,I1) ∈ R1}
             {x | (I1,x) ∈ R1}
Vo(R1)       domain
number
G(I1,M1)
C(M1,M2)
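How an expression of the set language is evaluated can be illustrated with a few of these operators. The sketch below is a reading under assumptions rather than the paper's definition: the set-valued definitions of Ko, Rb and the two reductions follow the formulas of the figure, but the assignment of the names Vg and Ng to the reductions, the reading of Mn as intersection, of G as a membership test and of DB as "those elements of a set for which a condition holds" are inferred from the way the operators are used in the example queries. The contents of M25, M4 and R8 are invented.

```python
def mn(m1, m2):
    """Mn: intersection of two sets (assumed reading)."""
    return m1 & m2


def ko(r):
    """Ko: converse of a binary relation."""
    return {(y, x) for (x, y) in r}


def rb(r, m):
    """Rb: restriction {(x,y) | (x,y) in R and x in M}."""
    return {(x, y) for (x, y) in r if x in m}


def vg(r, i):
    """{x | (x,i) in R}; the name Vg is assigned by assumption."""
    return {x for (x, y) in r if y == i}


def ng(r, i):
    """{x | (i,x) in R}; the name Ng is assigned by assumption."""
    return {y for (x, y) in r if x == i}


def g(i, m):
    """G: membership test (assumed reading)."""
    return i in m


def db(m, condition):
    """DB quantifier: the elements of m for which the condition is true."""
    return {x for x in m if condition(x)}


# Hypothetical data; only the identifiers come from the paper's example.
M25 = {"drug_a", "drug_b", "drug_c"}   # dragee-form drugs
M4 = {"drug_a", "drug_c", "drug_d"}    # psychopharmaca
R8 = {("depression", "drug_a"), ("depression", "drug_d"), ("psychosis", "drug_c")}
I128 = "depression"

# DB(x, Mn(M25,M4), G(x, Ng(R8,I128)))
answer = db(mn(M25, M4), lambda x: g(x, ng(R8, I128)))
print(answer)  # {'drug_a'}
```

The bound variable x of the DB quantifier becomes the parameter of the condition, which mirrors the loop-and-test flow of control described for the quantifiers.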
Fig. 3-1 Translation to set language
Fig. 4-1
Main categories
AF DZ ED EI IN ME MF PR RE RP RS SA SF QU VO ZA
AL-quantifier measures term for evaluating measures measure units proper name set measure function operators: relations sets (prepositions) relations DB-quantifier restriction of sets (relative clause) sentence EI / KE-quantifier other quantifiers relational operator number
Features
syntactical aspects:
mas        masculinum
fem        femininum
neu        neutrum
nom        nominative
gen        genitive
dat        dative
acc        accusative
sin, plu   number
adj        adjective/noun
att        attributive
ajm        adjective-modified
pdt        predetermined (the drug)
prm        premodified (Peter's friend)
pom        postmodified (friend of Peter)
std        strong declination
svk        stops genitive concatenations
aspects:
test (<complex category>,<list of feature-values>)
     yields true, if the complex category has associated with it the feature-values specified, else false.
meq  (<complex category>,<complex category>,<list of features>)
     yields true, whenever at least one of the listed features agrees in both complex categories specified.
equ  (<complex category>,<complex category>,<list of features>)
     same as meq, but all features must agree.
∧, ∨ logical connectives
(2) assignment part: (all assignments are to the complex category of the left rule part)
zuw  (<list of feature-values>)
     assigns the feature-values specified.
cop  (<complex category>,<list of features>)
     copies the values of the features of the denoted complex symbols.
and  (<complex category>,<complex category>,<list of features>)
     assigns those feature-values which agree in both complex categories.
Fig. 4-3
(1) ME1 → ME2 ME3    rewrite rule
(2) test(ME2,+adj-att) ^ test(ME3,-adj) ^ meq(ME2,ME3,sin,plu) ^ meq(ME2,ME3,nom,gen,dat,acc) ^ meq(ME2,ME3,mas,fem,neu);    feature program (test)
(3) zuw(-adj), and(ME2,ME3,sin,plu), and(ME2,ME3,mas,fem,neu), and(ME2,ME3,nom,gen,dat,acc);    feature program (assignment)
(4) Mn(ME2,ME3)    semantical part
The upper
index
(ME2) serves
for disambiguation
catego- and
test excludes
adjectives or
The resulting
complex
category in gender
ambiguities
The semantics
of the production
Fig. 4-4
ending
"e"
"es"
Word              complex category                              set-theoretic representation
Welche            QU  +mas+fem+neu+nom+acc+plu                  DB
                  QU  +fem+nom+acc+sin                          DB
drageeförmigen    ME  +mas+neu+gen+dat+acc+sin                  M25
                  ME  +fem+neu+gen+dat+sin                      M25
                  ME  +mas+fem+neu+nom+gen+dat+acc+plu          M25
Psychopharmaka    ME  +neu+nom+gen+acc+plu                      M4
haben
+nom+dat+acc
<terminal> I64
-
Depression
IN
als +fem+nom+gen+dat+acc+sin
<terminal>
Ng R8 s
Indikation
RE
Fig. 4-6
Disambiguation
Rule: ME1 → ME2 ME3
Meq(mas,fem,neu,ME2,ME3) ^ Meq(nom,gen,dat,acc,ME2,ME3) ^ Meq(sin,plu,ME2,ME3);
And(mas,fem,neu,ME2,ME3), And(nom,gen,dat,acc,ME2,ME3), And(sin,plu,ME2,ME3);
Mn(2,3);
applied to
(1) ME  +mas-fem-neu-nom+gen+dat+acc+sin-plu
(2) ME  -mas+fem+neu-nom+gen+dat-acc+sin-plu
(3) ME  +mas+fem+neu+nom+gen+dat+acc-sin+plu
(4) ME  -mas-fem+neu+nom+gen-dat+acc-sin+plu
accepts combinations (3)
-mas-fem+neu+nom+gen-dat+acc-sin+plu ME3
(drageeförmigen Psychopharmaka)
Rule: ME1 → QU2 ME3
Meq(mas,fem,neu,QU2,ME3) ^ Meq(nom,gen,dat,acc,QU2,ME3) ^ Meq(sin,plu,QU2,ME3);
And(mas,fem,neu,QU2,ME3), And(nom,gen,dat,acc,QU2,ME3), And(sin,plu,QU2,ME3);
2(x,3,#)
applied to
(6) QU  +mas+fem+neu+nom-gen-dat+acc-sin+plu   (welche)
(7) QU  -mas+fem-neu+nom-gen-dat+acc+sin-plu   (welche)
(5) ME  -mas-fem+neu+nom+gen-dat+acc-sin+plu   (drageeförmigen Psychopharmaka)
Only combination (6) / (5) is accepted.
Result: ME  -mas-fem+neu+nom-gen-dat+acc-sin+plu   (welche drageeförmigen Psychopharmaka)
Thus all ambiguities with the exception of case (nominative/accusative) are resolved.
Fig. 4-7
[Parsing graph for the query "Welche drageeförmigen Psychopharmaka haben Depression als Indikation"; the drawing is not recoverable from the source.]
Only those features are listed which prevent a rule from being applied.
Rule fragments legible in the figure (semantics only): ME + IN als RE; ME1 + QU; ME1 + ME2: Mn(ME2,ME3)
Only those rules are stated whose application does not result in dead-ends.
Fig. 4-8 Transformations
1) Welche drageeförmigen Psychopharmaka haben Depression als Indikation?
Transformation: DB(x,Mn(M25,M4),G(x,Ng(R8,I128)))
2) Welche Indikationen welcher Medikamente sind Psychosen?
Preliminary translation: (DB(x1,Vg(R8,DB(x2,M29,#)),#), M30)
Transformation: Both quantifiers
[1] Computer
[2] L.C.Smith: Artificial Intelligence in Information Retrieval Systems. Information Processing & Management, Vol.12, pp.189-222, Pergamon (1976)
[3] R.Quillian: Semantic Memory. In M.Minsky (ed.): Semantic Information Processing. MIT Press, Cambridge, Mass. (1968)
[4] R.Schank: Identification of Conceptualizations Underlying Natural Language. In Schank and Colby (eds.): Computer Models of Thought and Language, pp.187-248, Freeman (1973)
[5] E.Charniak: Toward a Model of Children's Story Comprehension. MIT Artificial Intelligence Laboratory, Cambridge, Mass. (1972)
[6] V.Cherniavsky: On Algorithmic Natural Language Analysis and Understanding. Advanced Course on Data Base Languages and Natural Language Processing, Freudenstadt (Sept. 1976)
[7] Procs. IFIP-TC-2 Working Conference on: Modelling in Data Base Management Systems, Freudenstadt, 5.-9.1.1976
[8] R.Durchholz, G.Richter: Concepts for Data Base Management Systems. Procs. IFIP-TC-2 Working Conference on "Data Base Management Systems", Cargese, Corsica, North-Holland Publishing Co. (1974)
[9] E.F.Codd: A Relational Model of Data for Large Shared Data Banks. Comm. ACM 13 (1970), pp.377-387
[10] R.W.Taylor, R.L.Frank: CODASYL Data-Base Management Systems. ACM Computing Surveys, Vol.8, Nr.1
[11] CODASYL-DBTG (1971)
[12] J.R.Abrial: Data Semantics. Procs. IFIP-TC-2 Working Conference on "Data Base Management Systems", Cargese, Corsica, North-Holland Publishing Co. (1974)
[13] C.J.Date: An Introduction to Database Systems. Addison-Wesley Publ. Company (1975)
C.A.Montgomery: Proc. ACM Natl.
K.D.Krägeloh, P.C.Lockemann: Hierarchies of Data Base Languages: An Example. Information Systems, Vol.1, pp.79-90, Pergamon Press (1975)
[17] E.F.Codd: Seven Steps to Rendezvous with the Casual User. Procs. IFIP-TC-2 Working Conference on "Data Base Management Systems", Cargese, Corsica, North-Holland Publishing Co. (1974)
[18] A.Malhotra: Design Criteria for a Knowledge-Based English Language System for Management: An Experimental Analysis. MIT Project MAC, Cambridge, Mass. (1975)
[19] Data Base. Vol.8, Nr.2 (1968)
[20] G.Schott: Automatische Analyse der Flexionsmorpheme deutscher Substantive. Technische Universität München, Abteilung Mathematik, Gruppe Informatik, Bericht Nr.7210 (1972)
[21] Zur maschinellen Syntaxanalyse. Forschungsberichte, Institut für Deutsche Sprache, Mannheim, Bd. 18.1, 18.2, 19., Narr-Verlag, Tübingen (1974)
[22] G.Salton: Automatic Information Organization and Retrieval. McGraw Hill Book Co., New York (1968)
[23] H.L.Josselson: Automatic Translation of Languages Since 1960: A Linguist's View. In: Advances in Computers 11 (1971), pp. 1-58
[24] A.M.Turing: Computing Machinery and Intelligence. Mind 59 (1950), pp. 433-460
[25] J.A.Fodor, J.J.Katz: The Structure of a Semantic Theory. In: Fodor/Katz (eds.): The Structure of Language. Prentice Hall (1964), pp. 170-210
[26] D.G.Bobrow: A Question-Answering System for High School Algebra Word Problems. In Procs. AFIPS 1964 Fall Joint Comp. Conf., Vol. 26, pp. 591-614
[27] B.Raphael: SIR. In Procs. AFIPS 1964 Fall Joint Comp. Conf., Vol. 26, pp. 577-589
[28] B.F.Green et al.: BASEBALL. In E.A.Feigenbaum, J.Feldman (eds.): Computers and Thought. McGraw Hill, New York (1963), pp. 207-216
[29] J.Weizenbaum: ELIZA. Communications of the ACM, Vol.9 (1966), pp. 36-45
[30] L.Coles, L.Stephen: An On-Line Question Answering System With Natural Language and Pictorial Input. In Procs. ACM 23rd Natl. Conf. (1968), pp. 157-167
[31] T.Winograd: Understanding Natural Language. Academic Press Inc., New York (1972)
[32] W.A.Woods: Procedural Semantics for a Question Answering Machine. Procs. AFIPS Fall Joint Comp. Conf., 33 (1968), pp. 457-471
[33] W.A.Woods: Progress in Natural Language Understanding - An Application to Lunar Geology. Procs. National Comp. Conf. (1973), pp. 441-450
[34] J.Mylopoulos, S.Schuster, D.Tsichritzis: A Multi-Level Relational System. Procs. National Comp. Conf. (1975), pp. 403-408
[35] M.M.Astrahan et al.: System R: Relational Approach to Database Management. ACM Transactions on Database Systems, Vol.1, Nr.2 (1976)
[36] IMS 2. In: Kurzbeschreibung von Information Storage and Retrieval Systemen, Gesellschaft für Mathematik und Datenverarbeitung, St.Augustin (1973)
[37] S.Todd: Integrated Architecture for Transaction Specification and Optimization in Relational Data Base Systems. Summer School on Data Base Technology, GMD St.Augustin (1976)
[38] K.D.Krägeloh: A Multi-Level System Architecture with Natural Language Interface (in German). Ph.D.Thesis, University of Karlsruhe (1976)
[39] C.H.Kellog: A Natural Language Compiler for Online Data Management. AFIPS 1968 Fall Joint Comp. Conf., Vol.33, pp.473-493
[40] F.B.Thompson, P.C.Lockemann, B.Dostert, R.S.Deverill: REL: A Rapidly Extensible Language System. Procs. 24th National ACM Conference (1969), pp. 399-417
[41] C.W.Bachman: The Programmer as Navigator. Communications of the ACM, Vol.16, Nr.11 (1973), pp. 653-658
[39] [40]
[41]
[42]
M.Lacroix, A.Pirotte: ILL: An English Structured Query Language for Relational Data Bases. M.B.L.E. Research Laboratory Report, Brussels (1976) J.A.Feldman, P.P.Rovner: An ALGOL-Based Associative Language. Communications of the ACM, Voi.12, Nr.8 (1969), pp. 439-449 G.Goos: Programmkonstruktion. Karlsruhe (1974) Interner Bericht, Universit~t
A.Kratzer, E.Pause, A.v. Stechow: E i n f ~ h r u n g in die Theorie und Anwendung der qenerativen Syntax. Athenaeum Verlag, Frankfurt (1974) PASSAT. Systembeschreibung Siemens PBS4OO4, MOnchen (1973)
Dokumentationssysteme.
De Gruyter,
Berlin, New
J.C.Earley: An Efficient Context-Free Parsing Algorithm. Ph.D. Thesis, Carnegie-Mellon University, Pittsburgh, Pennsylvania (1968) T.Kasami: An Efficient Recognition and Syntax Analysis Algorithm for Context-Free Languages. University of Illinois (1966) D.H.Young~r: Recognition and Parsing of Context-Free Languages in Time n . Information and Control 10 (1967), pp. 189-208 R.F.Simmons: Natural Language Question Anwering Systems: 1969. Communications of the ACM, Voi.13, Nr.1 (1970), pp. 15-30 M. Kay: Experiments with a Powerful Parsere sur le Traitement automatique des langues, Deuxi~me Conference Grenoble (1967)
B.H.Dostert, F.B.Thompson: How Features Resolve Syntactic Ambiguity. Procs. of the Symposium on Information Storage and Retrieval, University of Maryland (1971) K.Brockhaus: Automatische schweig (1971) Ubersetzung. Vieweg Verlag, Braun-
[54]
[55]
H.Wulz: ISLIB - Ein Informationssystem auf linguistischer Basis. Interner Bericht, Institut for Deutsche Sprache, Abteilung Linguistische Datenverarbeitung, M a n n h e i m (1975) F.B.Thompson: English for the Computer. Comp. Conf. (1966), pp. 349-356 Procs. AFIPS Fall Joint
[56]
86
W.Wohlleber: Ein Parser f~r die Analyse natHrlicher Sprache. Diplomarbeit, Universit~t Karlsruhe (1973) J.Friedman: A Computer Model fo Transformational Grammar. American Elsevier Publishing Company Inc., New York (1971) C.Mathis: Entwurf und Implementierung einer textverarbeitenden Sprache. Diplomarbeit, Universit~t Karlsruhe (1975) ROTE LISTE 1975. Herausgeber: Bundesverband der pharmazeutischen Industrie, Frankfurt, EDITIO CANTOR, Aulendorf/W~rtt. (1975)
PLIDIS

[...] SYSTEM WITH GERMAN AS QUERY LANGUAGE *)

G.L. Berry-Rogghe, H. Wulz

Institut für deutsche Sprache
Postfach 5409, D-6800 Mannheim 1
1. Background and application of the system

PLIDIS (Problemlösendes Informationssystem mit Deutsch als Interaktionssprache) is a natural language information system which is being designed in the context of a project in automated [...] at the Institut für deutsche Sprache [...] sponsored by the Ministry [...] for the years 1976-77. The present project [...] of a previous two-year project which achieved the construction of the experimental question-answering system ISLIB (Informationssystem auf linguistischer Basis) (e.g. KOLB & WULZ 1975). Within [...] theoretical [...] experimented with.

The PLIDIS project differs in its intention to implement an actual system, [...] lies on the adaptation of the methods [...] of the system is being developed [...] with the regional [...] at Stuttgart, [...] supervise [...] industrial
*) The research reported here is supported by the German Federal Republic's "Bundesminister für Forschung und Technologie" under grant Nr. 081 5900 69 within the "3. DV-Programm der Bundesregierung". The authors are indebted to W.Brecht, W.Dilger, R.Guntermann, D.Kolb, M.Kolvenbach, A.Lötscher, H.D.Lutz, K.Saukko, G.Zifonun, who collaborate within the PLIDIS-project and who did a lot of the research reported here.
[...] of water pollution [...] and several [...] bodies [...] local [...] such as [...] of the Land [...] the various [...] variety of [...] PLIDIS [...] of people [...] with [...] aspects [...] industrial waste [...] following [...] involved in the process of controlling [...] a particular [...] inspect the firms in order to collect [...] the waste products [...] about [...] type [...] production process, [...] functioning of the sewage plant, etc., [...] sewage [...] are taken at regular intervals [...] samples [...] for analysis [...] is sent [...].

[...] reference to this [...] of application, [...] is scheduled [...] in the following [...]:

- as [...] system, [...] to compare the current sample with previous [...] and to issue appropriate warnings if a norm has been transgressed,
- as information system, e.g. to answer queries concerning the composition and toxicity of certain chemicals, the characteristics of the production processes of the firms involved,
- as investigation system, e.g. to detect where [...] originated and possibly suggest plans of actions to be taken.
2. General design of PLIDIS

The PLIDIS information system is composed of, on the one hand, a linguistic-logical [...] translates the German input into an internal representation [...] calculus, and, on the other hand, a problem-solving [...] in addition to performing the usual storage and retrieval [...] involves problem domain-specific [...] in the deduction [...]. [...] of the system is largely [...] and allows extensive user-interaction between the various execution phases. This modularity is an essential prerequisite for efficient teamwork [...] a specific part and possible [...] smoothly. [...] interactive [...] are essential to facilitate [...] and debugging.

[...] 1 is a diagrammatic representation [...] are designed especially [...] destined for the system designer and administrator.

The natural language processor (NLP) enables the user to formulate questions as natural language [...] or to use natural language for the input of shorter pieces of information such as rules about his problem domain or data for updating. For the input of stereotyped [...] of formatted data of larger quantities, the user may have data-sheets [...] on his terminal, which are processed [...] input [...] provides also for plausibility checks [...].

The natural [...] and the formatted input processor [...] to translate [...] representation (IR), an extension [...]. [...] for informations and problem descriptions (PIP) either stores the incoming information or activates problem solving mechanisms according to the type of question [...]. [...] the processor generates only some sort of 'pretty-print' [...] of internal representation, which contain the information [...] the PIP-component [...] as answer to the user's questions. It would be desir[...] be replaced by procedures [...] out of IRL-formulas.

[...] of these components [...] the command language [...]
fig. 1 [Block diagram; only the labels are recoverable. Components: Natural Language Processor (NLP), Formatted Input Processor (FIP), Processor for Informations and Problemdescriptions (PIP), PAF, Axioms, Data Base, PLIDIS Supervisor, User. Interfaces: Natural Language (NL), Forms (FS), Internal Representation (IR), Command Language (CL).]
The command language gives the non-naive user access to various interactive facilities, which are helpful for testing [...].

The [...] algorithms draw on lexical and operational information in [...] external data bases, [...]:

- The morpho-syntactic [...] entries of non-lemmatised [...] features [...] number, [...]
- The semantic [...] contains [...] in the internal [...] such as relational symbol, individual [...] 'sort' (see section 3), the number [...] such as laboratory [...]
- The syntactic [...] 'Transition Network'.
- The translation rules specify 'transformations' of the parsings of the NL-sentences [...]
- Heuristics specify [...] problem-solver.

[...] and semantic 'knowledge' in the internal representation [...] information about the problem domain: (i) mass-data [...] river water, [...] composition of various [...] information [...] production processes, [...] logical axioms stating general implications [...] regularities in the world-model [...] "if a chemical [...] than y implies [...]

[...] of the system, the natural language [...] and the formatted input processor [...] as they were in a previous [...]. Some adaptation to specific [...] of the internal representation in the domain of application [...]. A new concept for a more efficient and theoretically [...] for translating the [...] analysis [...] representation is [...] at the moment. [...] to be done is in the problem solving component.
3. The internal representation in KS

3.1. General considerations

The choice of an appropriate internal [...] (IR) [...] within the PLIDIS-system was [...] knowledge [...] solely [...] theoretical considerations [...] of, namely [...] retrieval [...] effective [...] of answers in natural language. [...] must have [...] following [...] to match the complexity [...] capacity to describe all states occurring in a given micro-world, [...] capacity [...] of problems [...] pertaining to the solution [...] put to the [...]. [...] aspects can be made more explicit in the following specific requirements.

(I) Like natural language, the IR must be an 'object language', i.e. it should [...] describe [...] regularities of the German language, but should act on the [...] referential level as natural language. This entails that it should [...] contain metalinguistic symbols such as set theoretical ones, [...] cases...

(II) The IR should be able to describe arbitrary microworlds; i.e. for any given concrete micro-world, it should have the means [...] all existing [...] typical [...] individuals, processes, [...]. Similarly, it should [...] temporal relations [...]

[...] syntax of the IR must be explicitly [...] grammar [...] guides [...] automatic processes [...] of natural [...] into IR structures [...] allows the problem-solver [...] syntactic level of the [...]

[...] With the IR must be associated a formal semantic interpretation [...] accounts for the way in which IR formulas correspond to partic[...] arrangements in the external world and allows one to decide about the equivalence of formalisms [...]
[...] of general formal deduction [...] to program specific deduction [...] - which does [...] for the internal representation, as PC is interpreted by a formal semantics theory and a general 'theorem prover' mechanism [...].

The standard first-order predicate calculus does, however, not fulfill [...] (for example, condition (V)). Therefore a [...] (in German 'Konstruktsprache', abbreviated to KS) was [...] on the first-order predicate calculus but incorpo[...] a number of extensions [...]. In the discussion below, we distinguish [...] according to requirement [...] is independent [...] defined by a world-[...] KS are described in 3.2; a preliminary [...] KS-water-pollution-control is given in 3.4.
3.2. Short description of the syntax of KS

In addition to the usual sets of symbols in a predicate calculus, namely predicate symbols, individual [...], connectives and quanti[...] can be understood as abbreviations [...] Intervall, Objekt, Situation, Person/Personenkörperschaft, Zustand, [...] of variables [...] or relation [...]

The following conditions of well-formedness for terms are defined:

(1) [...] variables from SV are terms.

(2) [...] constants are terms. To each constant is assigned a [...]

(3) [...] F be an n-place [...] tuple of sorts: $\langle a_1, \ldots, a_n, a_{n+1} \rangle$. Let $t_1^{a_1}, \ldots, t_n^{a_n}$ be terms [...]. [...] $(F\ t_1^{a_1} \ldots t_n^{a_n})$ is a term of the sort $a_{n+1}$. Operational terms [...] individual [...] term [...]. If the nth argument [...] then [...] time. [...] actions, and so on. They [...] sort [...] (situation) [...] following [...] up of an operation symbol followed [...] of sorts: $\langle a_1, \ldots, a_{n-1}, \mathit{int}, \mathit{sit} \rangle$

(4) Let R be an m-place relation symbol, to which is assigned an m-tuple of sorts: $\langle a_1, \ldots, a_m \rangle$ ($a_i \in S$). Let $t_1^{a_1}, \ldots, t_{m-1}^{a_{m-1}}$ be terms of the sort $a_1, \ldots, a_{m-1}$. Then [...] are [...] Relational terms (see 3.3). [...] of individuals [...]

Atomic [...] in KS are constructed according to the following conditions:

[...] be terms of the sorts $a_1, \ldots, a_{n+1}$. Then $(F\ t_1^{a_1} \ldots t_n^{a_n}\ ;\ t_{n+1}^{a_{n+1}})$ is an operational atomic formula.
(6) Let R be an m-place relation symbol with the tuple of sorts [...]. [...] $(R\ t_1^{a_1} \ldots t_m^{a_m})$ is a relational atomic formula.

Non-atomic formulas [...] according to the usual construction rules of predicate [...]. [...] detailed description of the syntax of KS can be found [...] and ZIFONUN 1976.
3.3. Special features of KS

a. [...]

The set S of sorts, described in 3.2, can be extended as demanded by requirements of specific fields of application. The [...] sorts [...] underlie [...] structure which [...] use is made of [...].

The [...] structure of KS imposes - in the sense of Katz-Fodor - semantically [...] syntactic [...] 'selection restrictions'. [...] specification of the number and sorts of the arguments of a predicate is made in function of the world-model, rather [...] guided by linguistic principles. [...] advantages of a sortal structure in a representation language were indicated in HAYES 1971. A logical calculus with sortal [...] linguistic considerations was proposed by THOMASON 1972.

[...] notion of 'term' [...] so that it is possi[...] to embed terms within [...] more closely [...] some nat[...] language constructs, [...] as complex [...]

Example: "the mother of the neighbour of the friend [...]" in KS:

(MOTHER (NEIGHBOUR (FRIEND HANS)))
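The nesting of terms and the role of the sort tuples can be illustrated with a minimal Python sketch. It is not taken from the PLIDIS implementation: the signature table, the sort name 'per' for all three symbols, and the helper names `sort_of` and `show` are assumptions made for illustration, and sorts are compared by simple equality rather than by a sort hierarchy.

```python
# Hypothetical signatures: symbol -> (argument sorts..., result sort).
# The sort 'per' for these symbols is an assumption, not taken from the paper.
SIGNATURES = {
    "MOTHER": ("per", "per"),
    "NEIGHBOUR": ("per", "per"),
    "FRIEND": ("per", "per"),
}
CONSTANTS = {"HANS": "per"}


def sort_of(term):
    """Return the sort of a KS-style term, or raise TypeError if ill-formed."""
    if isinstance(term, str):
        return CONSTANTS[term]
    symbol, *args = term
    *arg_sorts, result_sort = SIGNATURES[symbol]
    if len(args) != len(arg_sorts):
        raise TypeError(f"{symbol} expects {len(arg_sorts)} argument(s)")
    for arg, expected in zip(args, arg_sorts):
        actual = sort_of(arg)
        if actual != expected:
            raise TypeError(f"{symbol}: expected sort {expected}, got {actual}")
    return result_sort


def show(term):
    """Render a nested tuple in parenthesised prefix notation."""
    if isinstance(term, str):
        return term
    return "(" + " ".join(show(part) for part in term) + ")"


term = ("MOTHER", ("NEIGHBOUR", ("FRIEND", "HANS")))
print(show(term))     # (MOTHER (NEIGHBOUR (FRIEND HANS)))
print(sort_of(term))  # per
```

A term whose argument count or argument sorts do not match the assigned tuple is rejected, which is the sense in which the sortal structure acts as a selection restriction.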
[...] symbols VIELE, [...] entities. The same applies to the natural [...] used as quantification symbols. They underlie [...] of well-formedness: [...] symbol, let [...] $(R\ t_1^{a_1} \ldots t_{m-1}^{a_{m-1}}$ [...]

Singular [...] terms can be designated in KS by 'individual [...]' and [...] respectively. As an example [...] of the sentences "der Nachbar der Mutter von Hans" and "die Nachbarn [...] Hans und Franz [...] sind Freunde [...] Egon" is given:

(NACHBAR [...] (NACHBAR [...] (MUTTER HANS); [...] FRITZ) [...]

[...] can be interpreted as LISP functions.
3.4. The KS-language for the control of water pollution

In PLIDIS is defined a concrete KS-language deriving its vocabulary from the field of application in the control of water pollution. [...] extended. A pre[...] is shown [...] of KS-water-pollution-control are given [...]:

- constants:
  ARSEN (sort: 'stoffkoll')
  ZYANID (sort: 'stoffkoll')
- operation [...]:
  - PROBE (2-place; [...]
fig. 2 Preliminary sort tree of KS-water-pollution [Tree diagram not recoverable. Legible sort labels: per, stoffquant, maßzahl, ort, physobj, num, sit, int, ortindiv, ortkoll, stoffkoll, natper, firma, betrieb, stoff, zeitpunkt, zeitkoll, stoffindiv. Legible examples: [...] GEWAESSER, RHEIN, FRANKFURT, ...; stoffindiv: ARSEN, [...]; stoffkoll: PROBE, GROESSE, [...]; stoffqual: TEMPERATUR, [...]; stoffquant: ANTEIL, [...]; maßzahl: (0.3 MG/L), ...]
  - PROBENEHMER (1-place; [...] tuple: <stoffkoll, [...]
  - ANTEIL (3-place; sortal tuple: <stoff, stoffkoll, [...]
  - LABORBERICHT (3-place; [...]: <perkörp, stoffkoll, [...] physobj>)
  - BETRIEB ([...] sortal tuple: <firma, ort, betrieb>)

[...] sortal tuple [...] predicates can be 'translated' [...] = 'sample', PROBENEHMER = 'sampler', [...] = 'laboratory report', BETRIEB [...] 'firm', [...]

[...] following is an example [...] (BETRIEB [...] sample taken [...]:

(ANTEIL ZYANID [...] (LABORBERICHT CHEM-UNTERSUCHUNGSANSTALT [...] MAX-MULLER [...] (BETRIEB [...] STUTTGART) 76.01.15.) ; (0,5 mg/l))

"The amount of cyanide contained in the sample taken from the firm [...] in Stuttgart on 13.1.76, according to the laboratory [...] of the chemical analysis centre in Plochingen produced on [...] amounted to 0.5 milligram per liter."
4. Natural-language analysis in PLIDIS

The requirement of modularity in a system such as PLIDIS is dictated not only by organizational reasons, but also, from a more systematic point of view, it was desirable to maintain a strict separation between [...] which are theoretically or methodologically well [...] of no central interest to the project, and those of genuine [...] effort and experimentation and where research is still [...] on.
Thus the natural language processor is separated into three passes (see fig. 4): a PASS0 for the morphological identification, a PASS1 for syntactic analysis and a PASS2 for code generation, [...] the language of internal representation.

In an earlier stage of the system PASS0 was a program for morphological analysis which operated with a lemmatized dictionary. For each German word of the system's vocabulary there existed only one dictionary entry, the basic form of the word. A certain class of verb forms, for example, were represented by their infinitive form. It was the task of the program to apply morphological rules to inflected forms of words and to reduce them to their basic form, which then allowed a dictionary look-up for further information. This analysis technique was very time-consuming, and it was replaced by a very simple program which works with a non-lemmatized dictionary. Each inflected form of a word has an entry in the dictionary with its full morphological information, such as the basic form of the inflected word, wordclass, gender, tense etc. (see fig. 3).
MORPHOLOGISCHE BESCHREIBUNG VON 'PROBEN':

([...] NF PROBE KNG 7690 K (NOM GEN DAT AKK) PN 6 G F)
 (VERB NF PROBEN KNG 7831 PN (2 5) MOD BEF TEMP GE DIA AKT)
 (VERB NF PROBEN KNG 8191 TEMP IN)
 (VERB NF PROBEN KNG 7727 PN (4 6) MOD (IND KONJ) TEMP GE DIA AKT))

MORPHOLOGISCHE BESCHREIBUNG VON 'PRODUZIERT':

((VERB [...] TEMP P2 HS HABEN [...]

fig. 3
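The difference between the two dictionary designs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table `FULL_FORM_LEXICON`, its feature names and the label "NOUN" are hypothetical, and the entries are simplified from fig. 3 and from the PASS1 sample output (BETRUG with basic form BETRAGEN). With a non-lemmatized dictionary, morphological identification is a single look-up keyed by the inflected form, and ambiguity shows up as several readings under one key.

```python
# Hypothetical full-form lexicon: one key per inflected form, one dict per reading.
# Feature names are simplified; "NOUN" is an assumed label for the noun reading.
FULL_FORM_LEXICON = {
    "PROBEN": [
        {"wordclass": "NOUN", "base": "PROBE", "case": ("NOM", "GEN", "DAT", "AKK")},
        {"wordclass": "VERB", "base": "PROBEN"},
    ],
    "BETRUG": [
        {"wordclass": "VERB", "base": "BETRAGEN"},
    ],
}


def identify(word_form):
    """Return all morphological readings of an inflected form (no rules applied)."""
    return FULL_FORM_LEXICON.get(word_form.upper(), [])


def base_forms(word_form):
    """Return the distinct basic forms an inflected form can belong to."""
    return sorted({reading["base"] for reading in identify(word_form)})


print(base_forms("Proben"))  # ['PROBE', 'PROBEN']
print(identify("Hund"))      # [] (not in this toy lexicon)
```

The price of this design is a larger dictionary, which is why the text goes on to discuss how entries are stored and written.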
[fig. 4: diagram not recoverable; only the label "Error Message" is legible]
The dictionary is stored on an external device with index sequential [...] such that [...] time required for morphological identification of a [...] is rather [...]. [...] amount of work [...] is reduced by a special [...] of PASS0, [...] of a basic [...] entries [...] a dictionary entry [...] a HELP-routine of PASS0 enables even a user with little lin[...] knowledge to write the dictionary entry for a basic wordform.
4.2. PASS1: Morpho-syntactic analysis

In the framework of the PLIDIS project, the following requirements were laid down for a syntax analysis [...]:

- [...] should be [...] requirement of modification [...] and enables [...] formulation of the grammar to be carried out by linguists [...] versed in programming;
- [...] must allow the formulation of certain context-sensitive [...]; in particular it must be able to deal effectively with dis[...] constituents, which are very common in the German language;
- [...] of the syntactic analysis should be only one parsing, [...] must be equipped with adequate backtracking facilities to [...] alternative parsings if requested by subsequent components of the system.
4.2.1. [...]

Bearing the above requirements in mind, it was decided to adopt the model of an 'augmented transition network' (ATN) (WOODS 1973). An ATN is a context-sensitive [...] of a 'basic transition network' (BTN). The latter is equivalent [...] 'push-down store' automaton or a context [...]. To illustrate the [...] consider the following small context-free [...] with [...] following rules:

S   → NP VP
NP  → DET N | NPR
VP  → V
DET → der
N   → Hund
V   → schläft
NPR → Hans

Whereby S is the start symbol; NP, VP, DET, N, NPR, V are non-terminal symbols; der, Hund, Hans, schläft are terminal symbols.
The finite-state-transition-diagram [...] as shown in fig. 5.

fig. 5 A sample basic transition network [Diagram not recoverable. Legible arc labels: PUSH NP/, PUSH VP/, CAT DET, CAT N, CAT NPR, CAT V, POP.]
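The small grammar above can be turned into a recogniser in a few lines, which may help to see what a basic transition network accepts. The following Python sketch is an assumption-laden illustration, not the PLIDIS parser: it walks the rules directly instead of an explicit state/arc table, and the names `GRAMMAR`, `parse` and `accepts` are hypothetical. Recognising a non-terminal plays the role of a PUSH, matching a word plays the role of a category arc, and returning the reached positions plays the role of a POP.

```python
# The seven rules of the sample grammar; each non-terminal maps to its alternatives.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "N"], ["NPR"]],
    "VP": [["V"]],
    "DET": [["der"]],
    "N": [["Hund"]],
    "V": [["schläft"]],
    "NPR": [["Hans"]],
}


def parse(symbol, words, pos):
    """Yield every input position reachable after recognising `symbol` at `pos`."""
    if symbol not in GRAMMAR:  # terminal symbol: compare with the scanned word
        if pos < len(words) and words[pos] == symbol:
            yield pos + 1
        return
    for alternative in GRAMMAR[symbol]:  # try each rule; this gives backtracking
        positions = [pos]
        for part in alternative:
            positions = [q for p in positions for q in parse(part, words, p)]
        yield from positions


def accepts(sentence):
    """True if the whole sentence is derivable from the start symbol S."""
    words = sentence.split()
    return any(end == len(words) for end in parse("S", words, 0))


print(accepts("der Hund schläft"))  # True
print(accepts("Hans schläft"))      # True
print(accepts("der Hans schläft"))  # False
```

Because every alternative is tried, the sketch already shows the backtracking behaviour that the requirements for PASS1 ask for, although without any of the tests, registers or structure-building actions of an ATN.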
[...] concepts of a BTN are 'state' and 'arc'. An arc repre[...] the transition from state i to state j. There are the following [...]:

[...] input sentence which is being pointed [...] category indicated on the arc.

A PUSH-arc introduces recursion [...]. The transition can be taken if a substring of the input is accepted by the state indicated on the arc. When leaving the state, [...] scanner points to the last word in the accepted phrase.

A POP-arc should be [...] with a PUSH-arc. After [...] on the PUSH-arc, [...] up one level. It [...] to a next state, [...] it leaves [...] as being given [...]. It will also specify the [...] presentation [...] being [...] substring [...].

An ATN is a context-sensitive extension of a BTN [...] on the arcs to control [...]. [...] in the above sample it could [...] transition [...] state S/NP to state [...] is only allowed if the category "number" [...] subject [...] agree. Furthermore, actions, i.e. the construction of parts of a parsing, [...] has been formulated [...] when the transition to a state [...]. To achieve [...] number of registers [...] which to store results [...] thus allow for the treatment of discontinuous [...]. To enable [...] to deal more [...] this problem, a further arc was [...] a VIR [...] the [...]. In addition, there is a TST-arc, [...] transition [...] state only if the word currently scanned satisfies a given condition, [...] arc which is like a TST-arc, except that it does not consume input [...].

[...] structure building actions. Normally, a parser assigns [...] corresponds directly to the string [...] of the transi[...] corresponds to the surface structure). [...] actions allow to build up new structures, thus picking up constituents out of order, or even to represent semantic representations. Because of this an ATN was also used in an earlier version of PLIDIS for the translation from natural language to internal representation.
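The idea of tests and registers on arcs can be made concrete with a deliberately small Python sketch. Everything in it is assumed for illustration: the lexicon (including the plural forms "die", "Hunde", "schlafen"), the state names, and the helpers `store` and `agrees` are hypothetical, and the network is deterministic and flat, so it shows neither PUSH/POP recursion nor backtracking. What it does show is an arc that may only be taken if a condition on a register holds, here agreement in number.

```python
# Hypothetical toy lexicon: word -> (category, number).
LEXICON = {
    "der": ("DET", "sg"), "die": ("DET", "pl"),
    "Hund": ("N", "sg"), "Hunde": ("N", "pl"),
    "schläft": ("V", "sg"), "schlafen": ("V", "pl"),
}


def always(registers, number):
    """Test that is always satisfied."""
    return True


def agrees(name):
    """Build a test: the scanned word must match the value held in a register."""
    return lambda registers, number: registers.get(name) == number


def store(name):
    """Build an action: copy the number of the scanned word into a register."""
    return lambda registers, number: {**registers, name: number}


def keep(registers, number):
    """Action that leaves the registers unchanged."""
    return registers


# state -> list of arcs (category, test, action, next state); names are assumed.
NETWORK = {
    "S/": [("DET", always, store("number"), "S/DET")],
    "S/DET": [("N", agrees("number"), keep, "S/NP")],
    "S/NP": [("V", agrees("number"), keep, "S/VP")],
}
FINAL_STATES = {"S/VP"}


def accepts(sentence):
    """Follow the first arc whose category and test both succeed for each word."""
    state, registers = "S/", {}
    for word in sentence.split():
        if word not in LEXICON:
            return False
        category, number = LEXICON[word]
        for arc_category, test, action, target in NETWORK.get(state, []):
            if arc_category == category and test(registers, number):
                registers = action(registers, number)
                state = target
                break
        else:
            return False  # no arc could be taken
    return state in FINAL_STATES


print(accepts("der Hund schläft"))   # True
print(accepts("der Hund schlafen"))  # False
```

In a full ATN the same registers would also collect the pieces from which the structure-building actions assemble the parsing.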
[...] of an ATN to represent the grammar of a natural [...] 'open endedness'. New arcs, tests and actions can be formulated to meet the requirements of the specific language being analysed.
4.2.2. [...]

The following describes briefly how the ATN concept, proposed by WOODS, was adapted to suit the specific structures of the German language (e.g. KOLB & LUTZ 1975). [...] allowing [...] LCAT stipulates [...] of a verb-form, [...] a passive verbphrase.

The table below describes the formats [...] implemented ATN for German:

([...] <category> <test> <action>*)
([...] (<category>*) <test> <action>*)
(VIR <category> <stack> <test> <action>*)
(VIRL <category> <stack> <test> <action>*)
(TST <label> <test> <action>*)
(WRD <word> <test> <action>*)
(LWRD <word> <test> <action>*)
(MEM (<word>*) <test> <action>*)
(PUSH <state> <test> <preaction>* <action>*)
(POP <form> <test>)
Disadvantages of using an ATN for German became apparent, mainly due [...] structure. Since [...] morpho-syntactic am[...] articles, [...] the parsing [...] amount of backtracking [...]. [...] the article [...] morpho-syntactic interpretations: [...] sing masc [...] sing fem [...] sing fem [...] masc fem neut [...].

[...] general disadvantage of an ATN is that the tests are for[...] as constants so that it is not possible to vary a test according to the path followed in the analysis so far except by an excessively complex apparatus of register [...].

[...] there seems to be no alternative [...] of natural language parsing, which is as well [...] and which at the same [...] easily [...] by lin[...] without a special programming [...].

Diagrams of a part of the ATN grammar for noun phrase and verb phrase can be seen in figs. 6-1ff.

[...] is very weakly [...] phrases. Whereas a linguist [...] like to have a [...] to something like fig. 7, PASS1 produces an ana[...] the sentence

[...] an Cyanid in der Probe der Firma Müller betrug [...]
(The amount of cyanide contained in the sample [...] was 2 mg/l.)

[...] the parser is able to recognise [...] of [...] structures, including complex sentences [...] of subordinate clauses: relative clauses, [...] subject [...] object and adverbial clauses. There [...] a restriction on [...] of adverbial clauses, [...] to occur either at the [...] or at the end of the sentence.

[...] following main sentence-constituents are recognized: noun phrases (NG), [...] (ADJG), [...] (PNG) [...] attributes [...] phrases [...] noun phrases [...] attributes, such as 'Das von der Firma Müller in den [...] eingeleitete Abwasser' can be handled as well.
figs. 6-1 to 6-4: ATN for German (Hauptsatz) [Transition network diagrams not recoverable. Legible arc labels include: CAT REFLPRON; MCAT (VERB AUXH AUXS AUXW MVERB); VIR VERB; PUSH RES/; PUSH HSVK/; CAT PRAEP; PUSH ZAHL/; CAT ADJ with AGREE tests on case; JUMP; WRD; CAT ADV; POP.]
fig. 7 Example of syntactic structuring within the noun [...] domain [Tree diagram not recoverable. Legible node labels: NK, NG, VK, VG, NP, PNK, PNG, DET, N, NPR, PRAEP, ZAHL, V. Terminal string: der Anteil an Cyanid in der Probe der Firma-Müller betrug 20 mg/l]

fig. 8 Example [...] capacity of the PLIDIS natural language [...] [Tree diagram not recoverable. Top-level constituents: NG PNG PNG NG VK NG, over the same terminal string: der Anteil an Cyanid in der Probe der Firma-Müller betrug 20 mg/l]
(S ((TYPE . AUSSAGE)
    (DIATHESE . AKTIV)
    (NS . (DER ANTEIL AN ZYANID IN DER PROBE DER FIRMA-MUELLER BETRUG 2 MG/L ,)))
   (VK ((PN . (1 5)))
       (V ((TEMP . VE)) BETRAGEN))
   (NG ((KNG . 4164) (K . NOM) (PN . 3) (G . M) (NS . (DER ANTEIL)))
       (DET NIL DER)
       (N NIL ANTEIL))
   (PNG ((KNG . 1601) (K . (DAT AKK)) (PN . 3) (G . N) (NS . (AN ZYANID)))
        (PRAEP NIL AN)
        (N NIL ZYANID))
   (PNG ((KNG . 1090) (K . DAT) (PN . 3) (G . F) (NS . (IN DER PROBE)))
        (PRAEP NIL IN)
        (DET NIL DIE)
        (N NIL PROBE))
   (NG ((KNG . 3138) (K . (GEN DAT)) (PN . 3) (G . F) (NS . (DER FIRMA-MUELLER)))
       (DET NIL DIE)
       (NPR NIL FIRMA-MUELLER))
   (VERB ((NS . BETRUG)) BETRAGEN)
   (NG ((KNG . 7745) (K . (NOM GEN DAT AKK)) (PN . 3) (G . N) (NS . (2 MG/L)))
       (ZAHL NIL (INTEGERZAHL NIL 2))
       (N NIL MG/L)))

fig. 9 Sample output of PASS1
[...] may contain a main verb in any tense or mode, with the [...] mood.

[...] of their ambiguity, which cannot be resolved by purely syntactic criteria, [...] constructions had to be excluded:

- coordination between noun phrases (e.g. 'Die alten Männer und Frauen')
- 'elliptical' noun phrases, i.e. noun phrases without a nominal head (e.g. 'Er nannte das billigste [...]')

Certainly it would be possible [...] by extending the syntactic informations [...] dependency [...] frames of verbs. [...] phrase [...] analysis needs sooner or later semantic information [...] decided to restrict the syntactic analysis [...] of a list of the main [...] of the input sentence [...] dependency structure [...] the burden of semantic interpretation to the translation component in PASS2.
4.3.
P~SS2:
Semantic
analysis
component
Within
the P L I D I S - s y s t e m natural
semantic
analysis into
is v i e w e d formulas
of
translating
language
sentences
representation the p a r s i n g
language which
KS, m o r e
precisely:
to g e n e r a t e
trees,
are p r o d u c e d
ISLIB-approach for K S - c o d e
augmented
generation.
stated and
earlier,
this
efficient
remained
at an a d - h o c which
since have
it w a s
to find
a theoretical of r u l e s needed
foundation within
allowed
the a m o u n t
this
approach. The new concept the c o n c e p t where lation prets The for the n a t u r a l - l a n g u a g e - t o - K S grammar for a p a i r goal translation starts LI, from L2,
of a t r a n s l a t i o n source
of l a n g u a g e s language
L I is the (WULZ
language then
and L 2 the
1976).
PASS2
as a p r o g r a m
the t r a n s l a t i o n
grammar
translation
grammar 1969),
m a y be c o m p a r e d the r u l e s
a transformational on a l r e a d y
grammar existing
(GINSBURG/PARTEE
of w h i c h
operate
114
trees
of a p h r a s e
structure The
grammar nodes of
language, labelled
in the P L I D I S
system.
are
(syntactic
categories
of the grammar)
terminal In a
language
words)
grammar.
translation
grammar
trees
language
which and
correspond labelled
within
of b r a c k e t e d
constituents
of n a t u r a l sake here
language
of s i m p l i c i t y by s i m p l i f i e d trees. grammar
plained
in an a b b r e v i a t e d
terminology
of d e r i v a t i o n The
translation
disposes
of t h r e e
types
(I) r u l e s
of s o u r c e words
language
general
- by the c o n t e x t - p a t t e r n
of t h e i r
rules
raising
for e a c h pattern.
language context
w e will
pattern in w h i c h
syntactical
context
the w r i t e r
grammar
to KS m a y
correspond that
to the g e r m a n
(sample). two-place
of KS defines, sort
PROBE where
<TERM>
of the
of the s o r t Thus
<firma>
second
argument
in any c o n t e x t PROBE,
where
the g e r m a n
"Probe"
is t r a n s specias a
lated fied
by the K S - s y m b o l as a b o v e like
it w i l l pattern
be f o l l o w e d for P R O B E
by two terms
can be d e f i n e d
of type 1 for the German word "Probe" would then state that it is to be replaced by the context pattern of fig. 10. In the context pattern of PROBE, <TERM;stoffkoll> is viewed as the ... of the context pattern, whereas the non-terminal KS-symbols <TERM;firma> and <TERM;int> are considered as "slots", and it is the task of the rules of type (2) of the translation grammar to define how to fill the slots.
A distributional
analysis
of the c o n t e x t
fig. 10: Context pattern of PROBE (nodes: <TERM;stoffkoll>, PROBE, <TERM;firma>, <TERM;int>)
of g e r m a n i.e.
"Probe"
will
show
that
the n o m i n a l case
attributes
of
a noun-group "Probe",
following
the c o n s t i t u e n t s , slots of
into for
the P R O B E pattern
pattern. within
the
the K S - c o n t e x t language
transstate with a
of a n a t u r a l the
sentence <TERM
to g e r m a n ; int>
"Probe"
would
e.g.
slot w i t h of the
has
to be f i l l e d
context
pattern
resulting "Probe",
f r o m the
translation also
of a
following "vom".
specifying
possible
RR 6 denote
some
rules
of type
(1)
of
german IRI,
words
pattern
of t h e i r insertion
IR 2 r u l e s
patterns;
the of
pattern, german
consisting "Probe"
these
rules Co v o m
within
bei MOller&
15.12.76"
(the
sample
12/15/76)
can be represented schematically as shown in fig. 11, where the arcs stand for the application of the rules which label the arc.
The u s e c a n be
of the shown
sorts
within
if o n e
yon MHller
fig. 11: ... of replacement, relating "die Probe bei Müller & Co vom 15.12.76" to the context pattern <TERM;stoffkoll> PROBE <TERM;firma> <TERM;int>
as ~ i t e r n a t i v e As insertion
formulation
rule IR2 r e q u i r e s
of a p r e p o s i t i o n a l
pattern
the P R O B E -
is a T E R M of the
at the place of a TERM with the sort <intervall>. For each insertion rule there is a side effect defined: if a filled-in context pattern is inserted, it is deleted at its original place (see fig. ...) and replaced by the empty pattern E.
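The slot-filling step and its side effect can be pictured with a small sketch. The Python fragment below is only an illustration under stated assumptions: the names `Slot`, `ContextPattern` and `insert_from`, and the pairing of each constituent with a sort, are hypothetical and not taken from PLIDIS; only the PROBE pattern, its sorts and the two fillers come from the figures.

```python
from dataclasses import dataclass, field

EMPTY = "E"  # the empty pattern left behind by the side effect


@dataclass
class Slot:
    sort: str              # sort of the TERM expected here, e.g. "firma"
    filler: object = None  # filled-in context pattern, or None while open


@dataclass
class ContextPattern:
    symbol: str            # KS symbol, e.g. "PROBE"
    result_sort: str       # sort of the term the pattern yields
    slots: list = field(default_factory=list)

    def insert(self, filler, sort):
        """Fill the first open slot of a matching sort; report success."""
        for slot in self.slots:
            if slot.filler is None and slot.sort == sort:
                slot.filler = filler
                return True
        return False

    def render(self):
        parts = [s.filler if s.filler is not None else f"<TERM;{s.sort}>"
                 for s in self.slots]
        return "(" + " ".join([self.symbol] + parts) + ")"


def insert_from(constituents, index, pattern):
    """Insertion with side effect: a moved constituent is replaced by E."""
    filler, sort = constituents[index]
    if pattern.insert(filler, sort):
        constituents[index] = (EMPTY, None)
        return True
    return False


# Hypothetical run on the two attributes of "Probe" from the example phrase.
probe = ContextPattern("PROBE", "stoffkoll", [Slot("firma"), Slot("int")])
rest = [("(BETRIEB FRITZ-MULLER&CO 7000.STUTTGART)", "firma"),
        ("76.12.15.", "int")]
for i in range(len(rest)):
    insert_from(rest, i, probe)
print(probe.render())
```

A filler whose sort matches no open slot is simply refused, which is the behaviour one would expect from a sort-controlled insertion rule.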
If all t e r m i n a l
to the r e m a i n i n g
structure
f o l l o w i n g ways:
s y m b o l x of the s o u r c e l a n g u a g e in c o n t e x t p a t t e r n
g r a m m a r can be reis d o m i n a t e d
placed by a filled
if this p a t t e r n which
are d o m i n a t e d by x,
(2) If a n o n - t e r m i n a l
language
grammar dominates
catenation tion
symbols
of a s o u r c e l a n g u a g e grammar; above in
gram-
13 then i l l u s t r a t e s
the a p p l i c a t i o n
t h e s e rules
to the tree, w h o s e
fig. 12: Result of the application of the rules of type 1 and 2 on "die Probe bei Müller & Co vom 15.12.76": the pattern <TERM;stoffkoll> PROBE <TERM;firma> <TERM;int>, with the slots filled by (BETRIEB FRITZ-MULLER&CO 7000.STUTTGART) and 76.12.15.
F \
bcef
fig.13
(PR).
indicate
the order,
in w h i c h
f r o m the a p p l i c a t i o n symbols
of t e r m i n a l
of the goal
s u b j e c t of e x p e r i m e n t a t i o n
the t r a n s l a t i o n
5. Information handling and problem-solving
for i n f o r m a t i o n s
and p r o b l e m
descriptions
14)
data b a s e m a n a g e m e n t and,
procedures
'problem-solving
procedures'
of q u e s t i o n s .
and will
"real-life"
of the system,
when mass-data
to be p r o c e s s e d .
data b a s e m a n a g e m e n t
be too w e a k to h a n d l e adaptions of a l r e a d y
the c o m p o n e n t m a y be r e p l a c e d systems.
data b a s e m a n a g e m e n t
5.1. An outline of the database management component of PLIDIS
to in
such a way as to ensure easy retrieval. A further task is to ensure the security of the data, so that access is only given to authorized persons. The normalising process includes skolemising of the existential quantifiers and subsequently reducing the KS-formulas
als. C e r t a i n
argument-terms,
Z~I@9.STOFFKOLL,
the n u m b e r of (PROBE M U L L E R
This p r e s u p p o s e s
that the f o r m u l a
13.10.76
; ~I@9.STOFFKOLL)
is also stored.
fig. 14: Structure of the PLIDIS component for information processing and problem solving (components shown: PIP, the Processor for Informations and Problem-descriptions; Monitor; Problem-solver with Theorem-prover, Term-Interpreter and other problem-solving operations; Data Management with Facts/Data, Axioms and Heuristics; PLIDIS Supervisor).
The data base is divided into two parts: the ... base, containing the ..., and the ... base, containing the ... Each item is prefixed with the security-key of the user entering it; the security-keys of all users are arranged in a dependency tree. A user is allowed access only to those items prefixed with his own key or with the key of a user on a dependent node. This ensures that only authorized persons have access to specific data.
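The access rule can be sketched in a few lines. This is a minimal illustration, assuming that the dependency tree is stored as a mapping from each key to the keys directly below it; the key names and the functions `dependents` and `may_access` are hypothetical and do not describe the actual PLIDIS data structures.

```python
def dependents(tree, key):
    """All keys below `key` in the dependency tree (children, grandchildren, ...)."""
    found, stack = set(), list(tree.get(key, ()))
    while stack:
        k = stack.pop()
        if k not in found:
            found.add(k)
            stack.extend(tree.get(k, ()))
    return found


def may_access(tree, user_key, item_key):
    """An item is readable if it carries the user's own key or a dependent key."""
    return item_key == user_key or item_key in dependents(tree, user_key)


# Hypothetical dependency tree: "head" is above two labs, "lab1" above "tech".
tree = {"head": ["lab1", "lab2"], "lab1": ["tech"]}
print(may_access(tree, "head", "tech"))   # head may read items entered by tech
print(may_access(tree, "lab2", "tech"))   # lab2 is not above tech
```

Storing only the direct dependents keeps the tree easy to update when a user is added; the transitive closure is computed on demand.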
5.2. Characteristics of the problem-domain for PLIDIS
First it should be made clear what is meant by 'problem-solving' in the context of PLIDIS. Defined in a negative sense, it is not ... of mathematical theorem proving, ... reasoning. The emphasis is rather on retrieving the answer for a question from a large data base with a minimum of deductions. The types of problems the system will have to deal with are:
1. The retrieval of facts explicitly or implicitly contained in the data base. An example of a question asking for an explicit fact would be 'how high is the level of arsenic in a specific sample'. An example of a question asking for an implicit fact: 'which toxic materials were contained in sample y'; whereby it has to be deduced what is meant by 'toxic' from such statements of facts as 'a stuff which impedes the growth of plants in the river is toxic'; 'the growth of plants is impeded by chemicals which reduce the level of oxygen in water'.
2. ... of arithmetic operations on the data retrieved, e.g. 'the average level of cyanide over a specified period'.
3. ... processes. For example, if an excessive ... of some ... was detected in a river at place x, which firm ... this? To arrive ... it has to be deduced ... located upstream ... x might have produced cyanide as a waste product; this information itself might not be explicitly given, but it might have to be deduced from the chemical processes involved in the production process.
4. The controlling of pollution. ... of water samples ... is immediately ...; if a norm is found to be transgressed, appropriate action is taken. This action involves checking of previous samples to find out if it is a 'first offence', and so on.
From the above catalogue, it follows that the problem solving component
operations:
(hash-coding,
questions
put to the s y s t e m u s u a l l y
involve
the
it should be possible ... (of individuals or of mass-terms). This is performed by a component called 'Terminterpreter' (TI), which reformulates the KS-question into a set-theoretic term and evaluates this term with set-theoretic operators.
process proper
p r e s e n t no p a r t i c u l a r which are e v a l u a t e d
as K S - o p e r a t o r s
components
in the p r o c e s s g u i d e d by a
problem. example
'monitor'.
of the o p e r a t i o n
problem-solver
is s h o w n in s e c t i o n
5.3. Problem-solving with an automatic 'theorem-prover' *)
disposes
of a 'declarative'
internal
representation
of
c o n s i s t i n g of a set of K S - f o r m u l a s , an a u t o m a t i c
it s e e m e d i n d i c a t e d (TP) b a s e d
theorem-prover
o f f e r i n g the a d v a n t a g e s (cf. C H A N G
*) for further information see DILGER 1976a
Without
giving a detailed
analysis
of the r e s p e c t i v e m e r i t s of d i f f e r e n t for
problem-solving
approaches,
a pro-
in its a d a p t -
problem-domains.
the l a t t e r m e t h o d is c l e a r l y de-
may achieve
the p r o b l e m - d o m a i n
expected
in the p a s t b e e n h e a v i l y
searching
strategies
h a v e sigcriticism in f i r s t -
A m o r e deep r e a c h i n g
c a n n o t be a d e q u a t e l y
presented
order predicate
In our o p i n i o n , earlier,
the e x t e n s i o n s
incorporated the
in KS, w h i c h w e r e d e s c r i b e d p o w e r of f i r s t - o r d e r c a t e s of a
have considerably
improved
predicate
calculus.
Another objection
by a d v o operates reon
'procedural'
approach
is t h a t a t h e o r e m - p r o v e r
a static w o r l d - m o d e l
whereas
in a r e a l - w o r l d model,
it is o f t e n
q u i r e d to be able to r e m o v e data states of the world. all facts and a c t i o n s variables. Most important in the e v a l u a t i o n In P L I D I S
are c h a r a c t e r i s e d
normalization of n o r m a l i s i n g
The p r o c e s s
in r e d u c i n g the K S - f o r m u l a s in c o n j u n c t i v e normal
into form,
t a k e s p l a c e w h e n the f o r m u l a s
are
so that it o n l y needs
to b e c a r r i e d o u t
Questions
m u s t of c o u r s e
The p r o c e s s pects:
of r e s o l u t i o n
proper
i) s e a r c h s t r a t e g i e s
'problem r e d u c t i o n ' ,
versus
'breadth-first'
analysis,
g r a p h s as d e s c r i b e d
by K O W A L S K I
(1975),
s u p p o r t e d by m e t h o d s presents important
E a c h of these t e c h n i q u e s problems. It w a s d e e m d e d
advantages
in the P L I D I S
implementation
strategies
to be k e p t variable,
according
for example,
'input r e s o l u t i o n ' )
has the a d v a n t a g e
r e l e v a n t to the q u e s of i n p u t r e s o l u is avoided.
B e c a u s e of the i n c o m p l e t e n e s s
tion c o n c l u s i o n s
f r o m false p r e m i s e s
w h e r e the TP is to be u s e d for c o n t r o l l i n g p o l l u is in g e n e r a l indicated. of the P L I D I S theorem'prover functions in not k n o w n and a 'state-space' de-
the g o a l - s t a t e
is h e n c e
implementation
state-space
analysis.
It is p o s s i b l e or
to c h a n g e
'input r e s o l u t i o n ' ,
The a x i o m s b e i n g is e n v i s a g e d
r e s o l v e d u p o n are l i n k e d b y a c o n n e c t i o n connection
graph. ~) It
to c o n s t r u c t
g r a p h s w h e n the d a t a is e n t e r e d into s u b s e t s
in the d a t a base.
l i n k e d by a c o n n e c t i o n of r e l a t e d axioms.
As a f u r t h e r e x t e n s i o n , subsets,
into
w h i c h w o u l d aid the s e l e c t i o n
the e n t i r e it seems
system's
knowledge
c a n thus be n e a t l y d i v i d e d verified. On a p r e l i m i n a r y
s u b s e t s has n o t y e t b e e n e m p i r i c a l l y tigation,
process,
the s e l e c t i o n
to be r e s o l v e d u p o n is g u i d e d by a calls u p o n s e m a n t i c
'selection
function
as w e l l as s y n t a c t i c
heuristics.
*) cf. DILGER 1976b
In the context of resolution by means of connection graphs, heuristics can be defined as functions which evaluate links by some semantic or syntactic criteria, so that the 'optimal' pair of clauses is chosen to be resolved upon. Such heuristic functions could be viewed as functions operating on the subsets of the set of links, having as values fuzzy sets of links, defined
as follows: Let $K = \{k_1, \ldots, k_n\}$ be the set of links in the connection graph and $FS$ the set consisting of the union of all fuzzy sets $FS(K')$, whereby $K'$ denotes a subset of $K$: $FS = \bigcup_{K' \subseteq K} FS(K')$.
(A f u z z y
is a function:
function
: 2K~Fs 6 FS(K') (K'~K) function of a are does not need to y i e l d a value for
h(K') that
a heuristic
of K. only
method end
of a n a l y s i s , same
in the
for a p a r t i c u l a r be defined.
subset
K' of K,
several
heuristic
func-
It is n e c e s s a r y functions
to a l l o w to the
the u s e r
the p o s s i b i l by his
new heuristic
system,
required
of a s y n t a c t i c i.e.
heuristic the n u m b e r
would of
be a f u n c t i o n
the
substitutions then
unifier would
k contains
I f(k)=~) ; a n o t h e r (the v a l u e
example
be t h e u s e would
of r e s o l u t i o n '0' or into
clauses
of this
function Semantic
be e i t h e r take
heuristics and
account of the
the a r g u m e n t s
Such h e u r i s t i c s
in t e r m s
of the w o r l d - m o d e l makes
Finally, the PLIDIS problem-solver makes use of the sorts of KS in selecting a unifier for a set of clauses. Before a substitution is carried out, it is checked if the sort of the constant is compatible with the sort of the argument, as is illustrated by the following two clauses:
¬(AT x y) ∨ ¬(MOVE x y z) ∨ (AT x z)
(AT table (PLACE table))
The following unifier can be established together with a specification of the substitutions ... The resolvent ... is ill-formed ... 'animate'; the substitution must hence be rejected.
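The sort check that precedes a substitution can be sketched as follows. This is only an illustration under stated assumptions: the sort hierarchy, the sorts given to the constants and the function names `is_subsort` and `bind` are hypothetical, chosen so that a constant like `table` cannot be substituted where an 'animate' argument is required.

```python
# Hypothetical sort hierarchy: each sort maps to its direct super-sort.
SUPERSORT = {"animate": "object", "place": "object", "object": None}

# Hypothetical sorts of the constants used in the example.
CONSTANT_SORT = {"table": "object", "john": "animate"}


def is_subsort(sort, required):
    """True if `sort` equals `required` or lies below it in the hierarchy."""
    while sort is not None:
        if sort == required:
            return True
        sort = SUPERSORT[sort]
    return False


def bind(variable, constant, required_sort, substitution):
    """Extend the substitution only if the constant's sort is compatible."""
    if not is_subsort(CONSTANT_SORT[constant], required_sort):
        return None  # ill-formed: the substitution is rejected
    extended = dict(substitution)
    extended[variable] = constant
    return extended


print(bind("x", "table", "animate", {}))  # rejected
print(bind("x", "john", "animate", {}))   # accepted
```

Rejecting the binding before resolution is attempted prunes ill-formed resolvents early instead of generating and discarding them.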
5.4. Illustrating the problem-solving component of PLIDIS
interaction
in s e c t i o n
set t h e o r e t i c
operations,
operations
(Yes/no question) (2) Q u e s t i o n s questions) (3) Q u e s t i o n s asking about 'processes' or s e q u e n c e s of a c t i o n s needed asking for s p e c i f i c information (what/which/who...
(how/why questions) a theorem-prover knowledge?' the v a r i a b l e w h o s e extension which is b e i n g is i m p l i e d can be c a l l e d upon, as all of t h e m repre-
to the form:
'can q be d e d u c e d
f r o m the f o r m u l a s
the s y s t e m ' s
b y a d d i n g an
'answer-predicate'
The d e d u c t i o n clause'
is c o m p l e t e d
if i n s t e a d of the e m p t y of o n l y o n e
'answering
is derived.
This consists
the a n s w e r p r e d i c a t e , In the c a s e of s
the a r g u m e n t
of w h i c h c o n s t i t u t e s is similar,
3 questions,
the p r o c e d u r e clause
is not an indi-
variable
hut
a term,
which
denotes
the s e q u e n c e
of a c t i o n s
col-
in the d e d u c t i o n
following of a type
2 question.
is a p r a g m a t i c
a yes/no
firm MOller
(?(ANZAHL
(PROBE
int, xI ))))
- "How o f t e n
has M U l l e r zahl xI
been
checked
this
year?"
(LAMBDA
(ANZAHL
int (LAMBDA x I stoffkoll (EXIST x I (UND (PROBE (BETRIEB M U L L E R & CO 7 O O O . S T U T T G A R T ) int stoffkoll) xI ; x1
x int,}))) I
The part
interaction of P L I D I S
between will be
components by m e a n s
of a f i c t i c i o u s simplified
wherein
in a s o m e w h a t
format.
predicates
'translated'
to e n s u r e to the
greater
t o the n o n - G e r m a n
system
is the
question: toxic taken materials were contained in the samples of the firm
on 2 4 . 5 . 7 5
internal form
representation (making
lowing
allowance
the
translation
of t h e p r e d i c a t e s
individual
(LAMBDA x s t ~
x stOff)
(TOXIC x S t ~
The data base contains the following axioms defining 'toxic' material in ...:
- materials which interfere directly or indirectly with the fauna or flora in the river are toxic.
- poisons interfere directly with the flora and fauna of the river.
- materials which reduce the oxygen level of the water interfere indirectly with the flora and fauna.
- chemicals which stimulate growth excessively or slightly oxidising materials reduce the oxygen level.
The above axioms are formalised as follows:
(1) (FUERALL x_stoff (IMPLIK (ODER (DIRINTERFER x_stoff) (INDIRINTERFER x_stoff)) (TOXIC x_stoff)))
(2) (FUERALL x_stoff (IMPLIK (POISON x_stoff) (DIRINTERFER x_stoff)))
(3) (FUERALL x_stoff (IMPLIK (REDUCEOXYGEN x_stoff) (INDIRINTERFER x_stoff)))
Apart from these axioms, the data base contains entries about the composition of the samples as well as information ..., for example that nitrate stimulates growth and that arsenic and cyanide are poisons:
(5) (COMPONENT(SAMPLE MUELLER ...
(6) (COMPONENT(SAMPLE MUELLER ... LEAD))
(POISON(LISTE ARSENIC CYANIDE))
to deduce
the answer
the f o l l o w i n g
steps
'TI' c o m p o n e n t formula,
reformulates
the K S - q u e s t i o n
whereby
the o p e r a t o r s
intersection
and union,
MUELLER MUELLER
(ii) The extension of the individual set terms contained in the formula has to be defined:
A : (COMPONENT(SAMPLE MUELLER 24.5.75))
B : (COMPONENT(SAMPLE MUELLER 7.9.75))
C : (TOXIC)
The matching operations called by TI give the following answers: A = ... B = ... obtain from (5) and (6) ... for the predicate
at this point. The conclusion to be deduced by the TP is:
(LAMBDA x (TOXIC x))
which is normalised as:
(10) ((NEG(TOXIC x))(ANS x))
The normalisation process changes sentences (1-4) into the following clauses:
(a) ((NEG (DIRINTERFER X)) (TOXIC X))
(b) ((NEG (INDIRINTERFER X)) (TOXIC X))
(c) ((NEG (POISON X)) (DIRINTERFER X))
(d) ((NEG (REDUCEOXYGEN X)) (INDIRINTERFER X))
(e) ((NEG (STIMULGROWTH X)) (REDUCEOXYGEN X))
(f) ((NEG (OXIDISING X)) (REDUCEOXYGEN X))
From (10) and (a-f) the TP deduces:
(g) ((NEG (DIRINTERFER X)) (ANS X))   (10), (a)
(h) ((NEG (POISON X)) (ANS X))   (c), (g)
Instead of continuing the deduction, ... to retrieve ... Control is passed back again to the TP which makes further deductions:
(i) ((NEG(INDIRINTERFER X)) (ANS X))   (10), (b)
(j) ((NEG(REDUCEOXYGEN X)) (ANS X))   (d), (i)
(k) ((NEG(STIMULGROWTH X)) (ANS X))   (e), (j)
TI retrieves ...
((NEG(OXIDISING X)) (ANS X))
Evaluation ... (ET(VEL A B)C) ... (LISTE CYANIDE NITRATE)
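The interplay of deduction and set-theoretic evaluation in this example can be imitated with a short sketch. It is not the PLIDIS procedure: plain forward chaining over the implications stands in for resolution with an answer predicate, ET and VEL are assumed to denote intersection and union, and the contents of the two samples (sets A and B) are invented for illustration, since only fragments of entries (5) and (6) are given.

```python
# Implications read off the axioms: (premise predicate, conclusion predicate).
RULES = [
    ("DIRINTERFER", "TOXIC"),
    ("INDIRINTERFER", "TOXIC"),
    ("POISON", "DIRINTERFER"),
    ("REDUCEOXYGEN", "INDIRINTERFER"),
    ("STIMULGROWTH", "REDUCEOXYGEN"),
    ("OXIDISING", "REDUCEOXYGEN"),
]

# Facts mentioned in the text: arsenic and cyanide are poisons,
# nitrate stimulates growth.
FACTS = {"POISON": {"ARSENIC", "CYANIDE"}, "STIMULGROWTH": {"NITRATE"}}


def extensions(facts, rules):
    """Forward-chain until no predicate extension grows any further."""
    ext = {pred: set(items) for pred, items in facts.items()}
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            new = ext.get(premise, set()) - ext.get(conclusion, set())
            if new:
                ext.setdefault(conclusion, set()).update(new)
                changed = True
    return ext


A = {"LEAD", "CYANIDE"}  # assumed components of the sample of 24.5.75
B = {"NITRATE"}          # assumed components of the sample of 7.9.75
C = extensions(FACTS, RULES)["TOXIC"]

answer = (A | B) & C     # (ET (VEL A B) C)
print(sorted(answer))
```

With these assumed sample contents the evaluation yields cyanide and nitrate, in line with the answer (LISTE CYANIDE NITRATE); lead is dropped because nothing in the axioms makes it toxic.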
6. Implementation of PLIDIS
PLIDIS is written in ... which is an implementation of ... on a SIEMENS-4004/151 running under the ... Uppsala-INTERLISP ... is itself an implementation of INTERLISP (TEITELMAN 1974) for an IBM 360/370 configuration. ... SIEMENS-INTERLISP features were used so that the system ... run in other INTERLISP implementations.
REFERENCES
IdS = Institut für deutsche Sprache, Mannheim
Chang, C.L. & Lee, R.: ... Theorem ...
Dilger, W. (1976a): Ein Frage-Antwort-System auf der Basis einer prädikatenlogischen Sprache. - Proceedings of the workshop 'Dialoge in natürlicher Sprache und Darstellung von Wissen', Freudenstadt, 1976, p. 31ff.
--- (1976b): Verbindungsgraph und Auswahlfunktion. - unpubl. working paper, IdS, Mannheim.
Ginsburg, S. & Partee, B. (1969): A Mathematical Model of Transformational Grammars. - In: Information and Control 15 (1969), pp. 297-334.
Hayes, P.J. (1971): A Logic of Actions. - In: B. Meltzer & D. Michie (eds.): Machine Intelligence 6. Edinburgh.
--- (1974): Some Problems and Non-problems in Representation Theory. - Proceedings of the 1974 AISB Summer Conference, pp. 63ff.
Kolb, D. & Lutz, H.D. (1975): Verarbeitung von Netzwerken. - ISLIB-Info I-4, IdS, Mannheim.
--- & Wulz, H. (1975): Allgemeine Beschreibung und Kurzanleitung für die Benutzung von ISLIB Börse. - ISLIB-Info I-1, IdS, Mannheim.
Kowalski, R. (1975): A Proof Procedure Using Connection Graphs. - In: Journal of the ACM, 22(4).
Teitelman, W. (1974): INTERLISP Reference Manual. - XEROX Palo Alto Research Center, Palo Alto.
Thomason, R. (1972): A Semantic Theory of Sortal Incorrectness. - In: Journal of Philosophical Logic 1, pp. 209-258.
Urmi, J. (1975): INTERLISP /360 and /370 User Reference Manual. - Uppsala University Data Center, Uppsala.
Woods, W.A. (1973): ... Transition ... (ed.): Natural ...
Wulz, H. (1976): Konzept einer Theorie einer Übersetzungsgrammatik. - unpubl. ms., IdS, Mannheim.
Zifonun, G. (1974): KS: eine formale Sprache zur kanonischen Darstellung natürlicher Inhalte in einem automatischen Frage-Antwort-System. - Arbeitspapier LDV-MA-73-3, IdS, Mannheim.
--- (1976): Die Konstruktsprache KS. Entwurf eines Darstellungsmittels für natürlichsprachlich formulierte Information. - working paper, IdS, Mannheim.
METAMORPHOSIS GRAMMARS
A. COLMERAUER
GROUPE D'INTELLIGENCE ARTIFICIELLE, U.E.R. Scientifique de Luminy, Université d'Aix-Marseille II, 70, Route Léon Lachamp, 13288 MARSEILLE (FRANCE)
Let us also indicate that the Artificial Intelligence Group is an Associated Research Group of the CNRS.
Abstract :
of the type: "replace such and such sequence of trees by such and such another sequence of trees". Within the framework of programming in first-order logic, we propose axioms for these grammars which produce efficient parsing and synthesis algorithms. We illustrate this work by the programming language PROLOG and by two important examples: writing of a compiler and writing of an intelligent system conversing in French.
Key-words
INTRODUCTION
In 1970 I was trying to perfect a particular kind of non-deterministic programming language: q-systems (4). This work concerned a formal system allowing us to write complex grammars, to which was associated an interpreter in order to analyse or synthesise structures conforming to these grammars. The basis of the formal system was composed of re-writing rules.
"context-free" type, i.e. one could re-write any sub-sequence of any length in any sequence ; on the other hand, instead of working on sequences of simple symbols, one could work on sequences of complex symbols (more precisely, trees). A system of formal parameters allowed us to transmit into each symbol any In{ormation required.
powerful language, based on few but very systematic principles. It allowed us to complete all the stages of our process of English/French translation : morphology and analysis of English sentences, stages of transference from the English deep structure to the French deep structure, synthesis and morphology of the French sentences,
Having become more interested subsequently in the semantics of language and in mechanisms of deduction, I abandoned q-systems and turned to techniques of automatic demonstration, basing my work on J.A. Robinson's principle of resolution (cf. 10 and 8).
I then collaborated in the elaboration of the programming language PROLOG (cf. 11 and 1). Originally conceived to resolve deductive problems in a system conversing in French (6), this language immediately found a number of applications: let us quote, among others, formal integration (3), robotics (12) and speech recognition (2). However, although this language was superior in many fields to the q-systems, the latter were simpler and clearer as far as the treatment of syntax was concerned. It was to remedy this situation that we conceived metamorphosis grammars: these involve an axiomatisation in first-order logic of the associativity of concatenation in order to obtain in PROLOG the facilities of the q-systems, thus obtaining a very powerful instrument for all syntactic and semantic treatment of languages.
This article is divided into two parts : a theoretical part in chapters 1 and 2, and a practical part in the last 3 chapters.
The first chapter introduces our terminology and proposes some ideas which may be considered a better basis for PROLOG than "SL-resolution" ideas suggested in (9). (8). We take up here
The third chapter gives a brief outline of PROLOG and of the way in which metamorphosis grammars are treated in that language. For more details we refer the reader to the PROLOG-Manual (11).
Chapter 4 illustrates by an example the way in which we can write a compiler by means of metamorphosis grammars.
In chapter 5 metamorphosis grammars are used to treat the problem which interests us most of all : conversing in French with a machine capable of reasoning. The example proposed is described very briefly, but is based on an extensive study of the role of articles in French. This study follows the general line of R. Pasero's work on the representation of French in logic.
CHAPTER 1
=========
A SUBSET OF 1ST-ORDER LOGIC AS A PROGRAMMING-LANGUAGE
1.1 BASIC TERMINOLOGY
To each functional symbol of F is associated an integer, its order. A term on F is defined as follows:
(1) if x is a variable then x is a term
(2) if f ∈ F and order(f) = 0 then f is a term
(3) if f ∈ F, order(f) = n and t1,t2,...,tn are terms then f(t1,t2,...,tn) is a term.
We write ..., or simply ..., for the set of terms, and H[F], or simply H, for the set of terms containing no variables. H is often called a Herbrand universe.
The e l e m e n t s
of the H e r b r a n d
computer scientist
constructed
A formula
or set of f o r m u l a e p
p'
obtained
by substi-
t u t i n g for each v a r i a b l e of
of the H e r b r a n d universe.
Let
called
relational
symbols
formula
(I] if {2) if
= 0 = n
then
is a t o m i c r [ t l , t 2 , . . , , t n)
A clause
is a set of literals.
A (Herbrand) relational
interpretation r
variables.
To each
symbol
of o r d e r
relation p
between
the e l e m e n t s
of the H e r b r a n d
universe
iff
r(t I ..... t n) E I
Vtl,t 2 ..... t N E H :
r 6 1
An interpretation
iff
I c J
We consider that {1) a set of clauses is a conjunction (A) of clauses quantified at its head
(2J t h e v a r i a b l e s
of literals
We therefore
as follows
An Interpretation
as always satisfied. each value of the clause, if~ it satisfies at least one literal of the
A and B
by satisfies B .
each interpretation
INTERPRETATION
SATISFYING
THEM
to considering
a "programme"
as the definition
Let
be a set of clauses
n-ary
relational
symbol
r.
Let us suppose that there exists a smallest interpretation ween the trees, p I therefore
interpretation r an
E . This bet-
as a "programme"
deductive n-ary
rules which play the part of a "maby enumeratinz all the n-uplets
chine" allowinz
relation us.
From this point of view, our programmes will be sets of clauses of a peculiar type, called "regular".
Definition :
A clause is said to be regular iff it contains one and only one positive literal. A set of clauses is said to be regular iff it contains only regular clauses.
A regular set of clauses always admits an interpretation I which satisfies it. We need only take as I the set of all atomic formulae without variables.
is regular and if
and
satisfy
, then {{+a,+b}}
IDJ
is e counter-example).
If we now consider the intersection of all the interpretations which satisfy a regular set, we can deduce from it the following property :
Property 1. If E is a regular set of clauses, then there exists a smallest interpretation, written Imin[E], which satisfies it.
+conc(.(s,x),y,.(e,z))
(each literals line represents one a f t e r
the other). E
The alert reader will verify that the smallest interpretation satisfying in this example a s s o c i a t ~ t o conc'[u,v,w] iff u ccnc the ternary relation ~ , ~ .
is of the form
v w
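The relation defined by such a pair of regular clauses can be explored with a small sketch. It assumes the usual two clauses for concatenation, +conc(nil,y,y) and +conc(.(e,x),y,.(e,z)) -conc(x,y,z), and represents strings as Python lists; the function name `conc_splits` is hypothetical. Given the third argument w, it enumerates every pair (u, v) such that conc(u, v, w) belongs to the smallest interpretation.

```python
def conc_splits(w):
    """Yield every (u, v) with conc(u, v, w), i.e. u followed by v equals w."""
    # Clause +conc(nil, y, y): the empty string followed by w is w.
    yield [], list(w)
    # Clause +conc(.(e,x), y, .(e,z)) -conc(x, y, z):
    # strip the first element e of w and split the rest z recursively.
    if w:
        e, z = w[0], w[1:]
        for x, y in conc_splits(z):
            yield [e] + x, y


# The two ways of splitting the one-element string a.nil
for u, v in conc_splits(["a"]):
    print(u, v)
```

Each clause becomes one branch of the generator, which is why the enumeration terminates here: every recursive call works on a strictly shorter string.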
The notion of smallest interpretation satisfying a set of clauses takes on all its interest only when one notes the second property, which follows.
Property
2.
Let
be a set of clauses
interpretation we have
Imin[E]
satisfying
iff
p E Imin[E]
needed to calculate
be those used
demonstration.
presented
of Robinson's
principle
(10), reasoning
into account the fact that the, notlon of a sequence of elements easily to programming than that of a set of elements.
Let
llterale
ordered clauses
(including 4) is written
For each
x,y C L*
x y = ala2...anblb2...bn
XA = AX = X
Let
be a regular set of clauses and Eord a set of ordered clauses obtained by for each clause
substituting
{+pO,-Pl,-P2 . . . . -pn }
of E an ordered clause +Po -Pl -P2 "'" -Pn where the positive literal is placed at the head.
Definition
x,y E L* iff
(s] 3+p E L
is a variant of s
the variables
Theorem E
p a v a l u e of p
there - p +p
an a t o m i c f o r m u l a q. we o b t a i n :
such that
By u s i n g p r o p o s i t i o n
Corollary
r E Imin[E]
n > 0 +q and
such that
Let us consider again the preceding example and try to calculate conc'[.[a,nil),.[b,nil),x] Since -conc[.[a,nil),.(b,nil),u) -conc[nil,.[b,n +conc[.(a,nil),.[b,nil),u) +conc[.[a,nil),.[b,nil),.[a,z)) E~rd E~rd
such t h a t
x,y
such that
+conc[u,v,.(a,nil))
E~rd
+conc(u,v,.(a,nil))
Eord
+conc[;(a,x),y,,(a,nil)E~rd
deductions are possible, t h e two s o l u t i o n s y = nil may be deduced i s of semi-decision. not nqcesare and x = . [ a , n i l )
+conc[.[a,nil],nil,.[a,nil))
and s i n c e no o t h e r x = nil Of c o u r s e , sarily y = .Ca,nil)
in general,
finite
However, i n
by i n t r o d u c i n g
Definition
containing
triplet x = u -p v
q[x] = [u,-p,v]
u,v E L *
Stronger theorem
Let
The preceding
theorem is
~[x]
[u,-p,v]
CHAPTER 2
=========
METAMORPHOSIS GRAMMARS
2.1 STRINGS, STRING-SCHEMAS AND CONCATENATION
of functional "nil" .
symbols
We use an infix notation with bracketing tructed with the functlonal symbol
al.S2.---.an_l.a instead of
Let
A string-schema
of length
al.a2.---,an.nil
ei E V
The s t r i n g - s c h e m a string-schemas. a
of
length
0 reduces of length
to
"nil".
We w r i t s
V~
For strings
for
1 we i n t r o d u c e
the abridged
a.nil
contains
no variables,
V*, concatenation is a law of internal composition written as a product and defined by
if x = nil then xy = y
if x = a1.a2.---.an.nil then xy = a1.a2.---.an.y
Of course, this is an associative law whose neutral element is "nil".
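The two defining cases of concatenation translate directly into a recursive function. The sketch below is an illustration only: strings are encoded as nested tuples built with a hypothetical `cons` helper standing for the functional symbol ".", and "nil" is the Python string "nil".

```python
NIL = "nil"


def cons(a, rest):
    """Build the term a.rest with the binary functional symbol '.'."""
    return (".", a, rest)


def concat(x, y):
    """Concatenation xy: nil y = y, and (a.x') y = a.(x' y)."""
    if x == NIL:
        return y
    _, a, rest = x
    return cons(a, concat(rest, y))


x = cons("a", cons("b", NIL))
y = cons("c", NIL)
z = cons("d", NIL)
print(concat(x, y))
```

Because the second argument is only attached at the end, y may be any term, which is what makes the same definition usable for string-schemas ending in a variable.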
Moreover,
xy
yE~
.If we
a I e 2 --- a n
el.a2.---,an.nil
2.2 RE-WRITING RELATION → AND RELATIONS →^i AND →*
relation i.e.
between
the elements
of
and let
V be a v o c a b u l a r y
The relation
Starting
w i t h the r e - w r i t i n g of
y y iff iff
, we define
the f o l l o w i n g
relations
bet-
H .
x = y there and exist x,y E V* E V* such that
i+1
u,v,r,s
and
r ~ s i ~ 0
and
usv~
exist
such that
i x ~ y
new r e l a t i o n s
are also r e - w r i t i n g
relations.
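The step-indexed relations can be made concrete with a sketch. It reads the definition in the usual way, which is an assumption given the state of the formulas above: x rewrites to y in one step when x = u r v and y = u s v for some pair r → s, and x →* y when y is reached after some number i ≥ 0 of steps. Strings are tuples of symbols; the rule set and the names `one_step` and `reachable` are hypothetical.

```python
def one_step(x, rules):
    """All y with x -> y in one step: x = u r v, y = u s v for a rule r -> s."""
    results = []
    for r, s in rules:
        n = len(r)
        for i in range(len(x) - n + 1):
            if x[i:i + n] == r:
                results.append(x[:i] + s + x[i + n:])
    return results


def reachable(x, rules, max_steps):
    """All strings y such that x rewrites to y in at most max_steps steps."""
    seen = {x}
    frontier = {x}
    for _ in range(max_steps):
        frontier = {y for w in frontier for y in one_step(w, rules)} - seen
        seen |= frontier
    return seen


# Hypothetical finite rule set: S -> a S and S -> b
rules = [(("S",), ("a", "S")), (("S",), ("b",))]
print(one_step(("S",), rules))
print(sorted(reachable(("S",), rules, 2)))
```

The bound `max_steps` is needed because →* is in general infinite; a metamorphosis grammar may moreover have infinitely many rule instances, which this finite sketch does not attempt to model.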
2.3 METAMORPHOSIS GRAMMAR
Definition: A metamorphosis grammar G is defined by a quintuplet (F,VT,VN,VS,→) where
(1) F is a set of functional symbols containing "." and "nil"
(2) VT is a vocabulary said to be terminal
(3) VN is a vocabulary said to be non-terminal, with VN ∩ VT = ∅; we write V = VT ∪ VN
T h e language
generated L(G)
by the g r a m m a r
strings s ~*
on t}
VT
= {t E ~T I there and _s -~ t
exist s
I~
s E Vs
, t E ~T
then
deep
structure
of
Example (I)
I : Here is an e x a m p l e
of a m e t a m o r p h o s i s with
grammar
= o r d e r [a]
= order [b] = I
= 0
= order [suo]
vT = {a,b} VN = Vs U { b s ( x ) Vs = { s u i t e ( x ) I x E H [F]}
I x E H IF]}
satisfying the r e - w r i t i n g relation ~ are enume-
of strin&S
We o b t a i n suite(suc(suc(z@ro)))
Since
a b b b
suite[suc[suc(z~ro)))
~1 a s u i t e [ s u c [ s u c [ s u c ( z @ r o ) ) ) ) ~1 a b b s [ s u c [ s u c ( z @ r o ) ) ) ~1
~1 a b b b bs(z@~e) ~1
In a general of Strings
way,
we notice
that
the language
generated
by this grammar
is the set
of the f o r m a i bj w i t h
- i ~ 0 to each s t r i n g
end t h a t
t h e deep s t r u c t u r e
ai bj
associated
is the t r e e
Example
2 :
Here is a n o t h e r
example
of a m e t a m o r p h o s i s
grammar
(I) F = { n i l , a , b order
[+] = order = I
order order
[formula] [,] = 2
= order
(2) V T = {a,b,<,>,+}
V N = V S U (end) U { value[x] V S = {formula[x] The couples rated by : formula(a] formula(b] formula[x] value[y.z] value[nil] end < ~ end ~ > where gnates y and z designate of H[F] + ~ ~ ~ a b
I x 6 H [ F]}
of s t r i n g s s a t i s f y i n g
~ <formula(y) ~ nil
arbitrary
elements u.v
of
H[F]
and
desi-
an element among
of the f o r m
We t h e r e f o r e
obthin
other results ~* a ~* < a + b + a > ~* < a + < < a + b > > >
formula(a]
formu!a[a.b.a.nilJ
"does".
2.4. METAMORPHOSIS GRAMMAR IN NORMAL FORM
Definition: A metamorphosis grammar is said to be in normal form if it satisfies the restriction:
$ax \to y$ implies $a \in V_N$ and $x \in V_T^*$
This is the case in the examples given in the preceding paragraph. The restriction proposed:
Property there
I .
G = (F,VT,VN,Vs,~ ]
exists
G' = [F',V+,V~,V~,~'
a E V N , each
a ~* t if#
t E V~
a ~* t
Here
G' with I I
from
order
[2] V~ [3] V~
a E V N} a C V T}
(4)
V~
a 4' nt(a)
te(a) 4' a
for for
a E VN a E VT y' with
ax 4 y
implies
if
a' = a
a i E VT if ai E V N
a!m = t e ( a . ) z
y' = nil if
of metamorphosis
Property 2.
For each
t E V~
each
x,y E V* 3z E V*
each wlth
i ~ 0 x 4i z and tz = y
lmplles
can be demonstrated
by induction on
i .
Let
G = (F,VT,VN,Vs,4)
be a m e t a m o r p h o s i s grammar i n normal f o r m .
Definition
: ~
of the Herbrand
universe. :
is constructed
in this way
Vu E V * ,
n E V~, then
Vbl,b2,---,b uv 0 ~ t o V 0
n E VN,
VVo,Vl,---,v
n E H[F]
u 4 tO
u 4 tobbltl~2t2---bntn
and i f
we a l r e a d y
have
~b tnVn ~ Vn-1
Let us agree that a binary relation ~ i s If# Yx,y x ~I y implies ~is x~2 y
of this re-
Equivalent o#
definition
[I) and
Let us notice that the following property the couples Ix,y] satisfying ~
is constantly
verified
as we construct
Property
For each
x,y E V* x ~ y
each implies
u E H xu ~ yu
The theorem and the property which follow show that there exists a very simple link between the relation ~ and the relation ~*
Theorem
For each x ~y
as ~* t,
tv = y
By
is minimal, v = cw.
we understand
o E
VT
and
w E H
such that
The damenstratlon
2.7
If,
i n t h e p r e c e d i n g t h e o r e m we t a k e
x : ~
with
a E VN
we o b t a i n
Corollary
for a ~t
each iff
a E VN, ~ ~* t
each and
t t
s H E V~
2.6. CALCULATING RELATION
Let $G = (F, V_T, V_N, V_S, \to)$ be a metamorphosis grammar. We make the following hypotheses:
Hypotheses [1) there exist x x we write sets of terms VT and ~N such that ~T ~N iff iff x E VT x E VN
such t h a t
Vx,y
EH[F]
E E
, and R
symbol o f o r d e r
contained in
x,y E ~
Definition
1. Let
be a new relational t by
R.
u E ~,
each
t O E ~T where
^
{ + d [ U V o , t o V o )}
v0
i s a new v a r i a b l e
^
[2) { o r each
u E ~,
each
t i E ~T"
each
b i s VN
Definition
2,
We des E
by
Tr[E]
U g
relational symbols of
Theorem :
The demonstration can be found in paragraph 2.8. By using the corollary of paragraph 2.5, we obtain the new corollary :
Corollary
: For each
a E VN
and
each and
a.nil ~*t
t E V~
2.3. The re-writing relation ~ can be defined by the minimal interpretation satisfying the set of clauses E
+r(formula(a).nil, a.nil)
+r(formula(b).nil, b.nil)
+r(value(x.y).nil, <.formula(x).end.value(y).nil)
+r(value(nil).nil, nil)
+r(end.<.nil, +.nil)
+r(end.nil, >.nil)
graph are s a t i s f i e d
and therefore,
~T = { a , b }
min~nel interpretation
according to the preceding theorem, the relation ~ is defined by the satisfying the set of clauses
-d(formula[x).vl,v o? -d[end.v2,v 1)
- d ( v a l u e [ y ] . v 3 , v 2]
to obtain the deep structure formula(a.b.nil) by the sequence of deductions
-d(formula(x).nil, <.a.+.b.>.nil)
+d(formula(x).nil, <.a.+.b.>.nil)
+d(formula(a.b.nil).nil, <.a.+.b.>.nil)
and inversely to produce the terminal string
< a + b >
from the deep structure formula(a.b.nil) by the sequence of deductions
-d(formula(a.b.nil).nil, x)
+d(formula(a.b.nil).nil, x)
+d(formula(a.b.nil).nil, <.a.+.b.>.nil)
Remark :
are always
of the form :
d(f(
where f
1 ). 2 , 3 )
n 9 This is true in a general
way and results from the restrictive hypotheses stated at the beginning of the paragraph. We can therefore substitute for each of these formulae the formula :
where
f'
f'( 1 , 2 , 3 ) i s a new r e l a t i o n a l
n+2
Tr[E]
+formula'Ca,va,a. Vo) + f o r m u l a ' { b , V o , b . v o) + f o r m u l s ' ( X . V l , V o) - e g a l ( x , r . s ) - v a l u e ' C x , v l , v o) - v a l u e ' ( y , v 3 , v 2) +valueP(nil,Vo,V o) +end'(<.Vo,+.v o) +end'(Vo,>.v o) +egal(x,x)
and to obtain,
for example,
formula[a.b.nil] of the string < a + b > we need only make the sequence of deductions -formula'(x,nil,<.a.+.b.>.nil) :
+formula'(x,nil,<.a.+.b.>.nil)
+formula'[a.b.nil,nil,<.a.+.b.>.nil)
Theorem
for all
x,y E H
(I)
x ~ y
iff
[2)
3a E V N, As ~ t,
3s,t
E V~, y=tv
3v E H and v
such t h a t minimal
x=~sv,
Demonstration,
Ist part.
is true constantly ~. If x ~ y
the couples
(x,y)
is constructed
by using rule
there exists
a E V N, u , t 0 E V { , x = E u v 0, posing
v0 E H
such that
y = t o V 0, a_u ~ t o
v 0 = WoV with s = uw O,
w 0 E V~
and
minimal
t = tow 0
(x,yJ
, Because of this, and because we are reasonin@ on a grammar ~n normal form, there exists
U,to,tl,---,t x = ~ u v n, ~ItlVl n E V~, y :tov a,bl,b2,---b n E VN, v o , v l , - - - , v ~ n E H
such t h a t
o,
~ t~It1~t2---~t ---
~ v O,
~ 2 t 2 v 2 ~ v 1,
, ~ n t n V n ~ Vn_ I
the couples
(bitivi,vi_l)
= wim i
with
w E V~ implies
and
bitiwi ~
wO, _b2t2w 2 ~ *
WI ,
---
, btnW n ~*
Wn_ 1
we have
bltlb2t2---btnW and t h e r e f o r e ~uw n ~ toW 0 n ~* w0
posing
s = uwn, t = tow 0
we f i n a l l y
as ~ t ,
obtain
x = auv = auw v ~ a s v , y = t o Y 0 = toWo v = t v
6 V~,
Vv 6 H
as ~
and
mlnimal implies
asv ~ tv
Recalling the property of ~ cited in paragraph 2.5, we deduce from it that we need only to demonstrate that
Va s VN, as ~ t Ys,t implies E V~ as ~ t
This last implication is the particular case of the proposition which ensues when u = nil Va E V N, ~su ~ t Vs,t E V~, u Vu E V * , ~ v and V i ~ O, asv ~ t a v E V~, and 3j ~ 0 such t h a t
implies
j ~ i
Let us demonstrate this last proposition by induction on The proposition is true for plication is false since a E VN and t E V~ and V N n vT = ~ implies
asu # t
Let us suppose the propositiod true for is true for i+l . If asu ~ i+1 t
O~<k<i
there exist
r E V~
The passage f r o m
[a]
asu
aSU'
= r
with
u ~lu'
u ~ E V*
since
asu' ~ i t
there exist
v E VT
and
j ~ 0 j~i
such that
U 7 ~J Vj
asv ~ tj
[b]
with
~s' ~t O,
s',t 0 E V~,
u' E V*
according to the property 2 such that u' ~i therefore su : s'u' i S.Vo Vo ' toy 0 = t
v 0 E V~
v E V~
V,
SV
S'V 0
:~
(c)
= r
with b i E V N, u' E V*
s',t i E V~,
v 0 E V{
b.t.b~t~---b n t n u' ~ -
-~i
v O,
the preposition that we wish to demonstrate k ~< i, there exists b2t2---bntnU' v I E VT ~Jl v I, and Jl >~ 0
bltlVl ~ Vo "
the proposltien that we wish to demonstrate being supposed true for k ~< i~ there exists b3t3---bntnU' v 2 E V~ and J2 >~ 0 such that J2 ~< i
-*J2 v 2 , _b2t2v 2 ~ v I ,
o,o~
....
~176
........
. . . . . . .
~ 1 7 6 1 7 . . . . . . . . 6
, o ~ 1 7 6 . . . . . . .
~ 1 7 6
Jn ~ i
such that
~jn v, SV = SPV n
2.8
Theorem
for all x ~ y
is the set o{ clauses of the form U {-r(u,v)} with that E U Trbis[E] ~ {{+d(x,y)}} +r(u,v) element of a clause of
[+r(u,v)] need o n l y =
to demonstrate if{
{{+d(x,y)}}
U {{-d[x,y)}}
E U Trbis[E]
U {{-d(x,y)}}
If
satisfies
Tr[E]
we can a r r a n g e In that by
in G
such a way t h a t
it
contains
no f o r m u l a
of the form of E
case l e t I
be t h e s e t o f a l l
the values
satisfied with
. The i n t e r p r e t a t i o n
I 3g E G U {{-d[x,y]}}
+r(u,v)
E g}
satisfies
E U Trbis[E]
I{
that
according
to
hypothesis
[3)
of paragraph
2.6,
there
could only
a clause-value
of the form
t[+r[u,v)]
does n o t o c c u r i n I
g) g
. There{ore
does n o t s a t i s f y
it.
Since
does n o t s a t i s f y
g ,
satisfies
J , which contradicts
(a).
If
satisfies
E U Trbis[E] J
U {-d(x,y)}
I , obtained
by removing from
, satisfies
Tr[E]
U {-d[x,y)}
If that were not the case, there would exist a clause value of the form
t[+r[u,v)] not s a t i s f i e d
U not s a t i s f i e d by J . By h y p o t h e s i s J
satisfies
by I and t h e r e { o r e U g and
{+r[u,v)} therefore 3
t[+r(u,v~]
U {-r(u,v)}
satisfies U g
t[+r[u,v]] which i s c o n t r a d i c t o r y .
x,y~H
Let us first demonstrate that
$I_{min}[E \cup Tr_{bis}[E]] = I_{min}[E] \cup K_{min}$
with
$K_{min}$ = the smallest $K \subset I_d$ such that $I_{min}[E] \cup K$ satisfies $Tr_{bis}[E]$
$I_d = \{\, d(u,v) \mid u, v \in H \,\}$
Indeed, let $I$ be an interpretation satisfying $E \cup Tr_{bis}[E]$. Let us pose $I' = I_{min}[E] \cup K$ with $K = I \cap I_d$. We obtain on the one hand $I' \subset I$; on the other hand, $I'$ also satisfies $E \cup Tr_{bis}[E]$, since $I'$ satisfies $E$ by definition and satisfies $Tr_{bis}[E]$, which contains no +r(u,v). Therefore
$I_{min}[E \cup Tr_{bis}[E]]$ = the smallest $I_{min}[E] \cup K$ which satisfies $E \cup Tr_{bis}[E]$
hence the required result.
L e t us s p e c i f y
of U K
may be w r i t t e n
+r[u,v] element of a value of a clause of Imin[E] U K satisfies E U {-flu,v)}
implies that or
t[+r[u,v)]
of a clause
of
and
t[+r[u,v]]
E Imin[E] Imin[E],
+r[u,v)
element
of
a value E
of a clause
of E
- {r[u,v)}
does n o t s a t i s f y
; the prece-
ding property
satisfies
t[+r[u,v)]
and t h e r e f o r e
implies
L e t us now n o t i c e defined by
that
t o each
K c Id
the relation
u ~K v
iff
d[u,v)
E K
The relation
~Kmin
is therefore
the smallest
relation
satisfying
the points
{I]
of ~ in paragraphe
to the equivalent
d[x,y)
E Kmin
iff
x ~ y
CHAPTER 3
= = = = = = = = =
INTRODUCTION TO PROLOG
PROLOG is a programming language which materialises ideas developed in chapter 1. (In fact, these ideas only became clear after the birth of PROLOG.) In this language each instruction is therefore a logical statement and the execution of a programme consists in making deductions.
More precisely, a PROLOG programme will consist in a sequence of clauses. Each clause is a sequence of literals and ends with either a full-stop or an exclamation mark. The clauses ending with a full-stop correspond to instructions to be recorded, while those ending with an exclamation mark correspond to instructions to be executed immediately. If we take up the example common to paras 1.2 and 1.3, it may be written in PROLOG :
-SORT(*X)
Let us note in passing that the variables are preceded by an asterisk. The general system, of which a large part is written in PROLOG, reads the first two clauses, records them and launches an execution as soon as it has read a third clause. This execution consists in taking the third clause x as a starting-point and in calculating successively the clauses such that
two c l a u s e s
The s e l e c t i n g exist
literal.
a Yi there
clauses a
order in
to construct
Yl E~rd
Yi+I
direction
c2
are recorded
The literal
-SORTC~X)
other literals
: it is a specie1
the printing
. (This Kind of mechanism will be described after reading the third clause,
Therefore,
.(A,.CB.NIL))
then,
3.2 PREDEFINED RELATIONS
In PROLOG there exist a certain number of relational symbols predefined by a standard set of clauses or by sub-programmes.
Input and o u t p u t
reads the next character reeds the next character writes the character x.
and u n i f i ~ i t
with
x. with x .
jumps a line on the output device. writes the term x . writes one after the other the characters considers constituting the string x is x.
AJOP(",",n,"f")
of characters
an infixed functional
specified
-AJOP(,.,,,I,,,X=CX=X] ,,) will allow us to note the functional in the usual manner. symbol ....
Note :
It is always permitted to write
instead of
C1.C2.---.Cn.NIL
Creation
of c l a u s e s and symbols
AJOUT(x)
all the clauses which already exist within the system. Example : the evaluation of
UNIV[x,y)
Example : t h e e v a l u a t i o n
of
-UNIV[*X,(T.O.T.O.NIL).F(A).G(B).NIL)
unifies *X wlth
(T.O,T.O.NIL).F(A].G(B).NIL
(1) +P(*X) -O(*X) -R(*X) -/ -S(*X).
(2) +P(*X) -U(*X).
To evaluate a literal of the form -P(y) we will first use the clause (1).
(a) If the literals -O(y) and -R(y) can be evaluated, one will go on to evaluate -S(y), but on returning one will not use the clause (2).
(b) If one cannot evaluate all the literals of clause (1) which precede -/ , one will use clause (2).
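The effect of the literal -/ in the two clauses above can be illustrated by a small Python sketch. It is a deliberate simplification, not the PROLOG evaluator: the sub-goals O, R, S and U are modelled as plain Boolean functions with at most one solution each, and the name `prove_p` is hypothetical.

```python
def prove_p(y, o, r, s, u):
    """Return the numbers of the clauses that yield a proof of P(y).

    Clause (1): P if O, R, /, S.   Clause (2): P if U.
    Assumption: o, r, s and u are deterministic Boolean tests.
    """
    answers = []
    if o(y) and r(y):
        # The literals before -/ succeeded: we are committed to clause (1),
        # so clause (2) is never tried, whether or not S succeeds.
        if s(y):
            answers.append(1)
        return answers
    # The literals before -/ could not all be evaluated: fall back on clause (2).
    if u(y):
        answers.append(2)
    return answers


def yes(_):
    return True


def no(_):
    return False


print(prove_p(0, yes, yes, yes, yes))  # only clause (1) is used
print(prove_p(0, yes, yes, no, yes))   # S fails and clause (2) is not tried
print(prove_p(0, no, yes, yes, yes))   # O fails before -/, so clause (2) is used
```

Without the -/ literal, the first call would also find a second proof through clause (2), and the second call would succeed through clause (2) instead of failing.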
Treatment of characters and integers
x x
INF(x,y)
is strictly
3.3 TREATMENT OF METAMORPHOSIS GRAMMARS IN PROLOG
The language PROLOG was conceived to facilitate the definition and use of metamorphosis grammars in normal form. These grammars, of course, must satisfy the hypotheses of para 2.8.
:VALUE(*X.*Y) :VALUE(NIL)
==.
+EGAL(*X,*X).
The terms which correspond to non-terminals (pseudo-non-terminals) are preceded by ":" and those which correspond to terminals (pseudo-terminals) are preceded by "#". The translation (written in PROLOG) takes into account the remark at the end of para 2.6. Each pseudo-non-terminal is transformed into a literal with two supplementary arguments. The pseudo-terminals are inserted into these supplementary arguments.
To analyse or synthesise a string one must use the predefined relational symbol SYN (abbreviation of synthesis) which plays the same role as d in para 2.6. The execution of
-SYN(FORMULA(*X).NIL,<.A.+.B.>.NIL) -SORT(FORMULA(*X))!
will provoke the printing of the deep structure of <.A.+.B.>.NIL whereas the execution of
-SYN(FORMULA(A.B.NIL).NIL,*X) -SORT(*X)!
will provoke the printing of the terminal sequence of which the deep structure is FORMULA(A.B.NIL)
4.1
It will be constituted
principally
It contains no declarations
the reader may deduce the semantic part from the notations
<progran~
::=
<instruction>
<identifier> while
exp I> I J
<boolean
<instruction> <boolean
repeat <instruction> read goto if else if <identifier> <identifier> <boolean exp I>
until write
<arithmetical
then
<instruction>
<instruction> ::=
2>
exp 2> *
<integer>
exp I>
<boolean
exp 2>
::=
<boolean and
exp 3>
I <boolean
exp 2>
<boolean <boolean
<boolean
exp 3>
::=
not
<arithmetical
<relation> <integer> ::= = I less ::= <digit> ::=
<arithmetical
<identifier>
<letter>
The machine which executes the compiled programme is constituted by a series of memories (numbered from 0) and by an accumulator (which we shall call accu). The execution of the programme starts with the instruction contained in the memory n° 0. Here is the list of the instructions and pseudo-instructions of the machine :
LOAD n   load into the accu the contents of memory n° n
STOR n   store the contents of the accu into memory n° n
PLUS n   add to the contents of the accu the contents of memory n° n
MINU n   subtract from the contents of the accu the contents of memory n° n
MULT n   multiply the contents of the accu by the contents of memory n° n
GOTO n   goto the instruction contained in memory n° n
GOZE n   goto n if the contents of the accu = 0 (goto if zero)
GONE n   goto n if the contents of the accu < 0 (goto if negative)
GONZ n   goto n if the contents of the accu ≠ 0 (goto if not zero)
GONN n   goto n if the contents of the accu ≥ 0 (goto if not negative)
WRIT     write the integer contained in the accu
READ     read an integer and load it into the accu
STOP     stop
ALLO     allocate a memory (pseudo-instruction executed when loading the program)
ALLO n   allocate a memory and initialise it with n (pseudo-instruction executed when loading the program)
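The behaviour of this machine can be sketched by a short interpreter in Python. This is an illustrative sketch rather than the authors' implementation: the names `parse` and `run` are hypothetical, the listing format (mnemonic and operand separated by a full stop, `EMPTY` for an uninitialised memory) follows the compiled examples printed in this chapter, and the test program is the second compiled example (the factorial program) with its OCR damage repaired.

```python
def parse(listing):
    """Turn a listing such as 'READ STOR.22 ALLO.EMPTY' into (mnemonic, operand) pairs."""
    program = []
    for word in listing.split():
        op, _, arg = word.partition(".")
        program.append((op, None if arg in ("", "EMPTY") else int(arg)))
    return program


def run(program, inputs):
    """Execute a program and return the list of integers written by WRIT."""
    # Loading: ALLO cells become data memories, all other cells hold instructions.
    memory = [arg if op == "ALLO" else (op, arg) for op, arg in program]
    pending = iter(inputs)
    accu, pc, output = 0, 0, []
    while True:
        op, n = memory[pc]
        pc += 1
        if op == "LOAD":
            accu = memory[n]
        elif op == "STOR":
            memory[n] = accu
        elif op == "PLUS":
            accu += memory[n]
        elif op == "MINU":
            accu -= memory[n]
        elif op == "MULT":
            accu *= memory[n]
        elif op == "GOTO":
            pc = n
        elif op == "GOZE":
            pc = n if accu == 0 else pc
        elif op == "GONE":
            pc = n if accu < 0 else pc
        elif op == "GONZ":
            pc = n if accu != 0 else pc
        elif op == "GONN":
            pc = n if accu >= 0 else pc
        elif op == "WRIT":
            output.append(accu)
        elif op == "READ":
            accu = next(pending)
        elif op == "STOP":
            return output
        else:
            raise ValueError(f"unknown instruction {op}")


FACTORIAL = parse(
    "READ STOR.22 LOAD.23 MINU.22 GONE.21 LOAD.25 STOR.24 LOAD.27 STOR.26 "
    "LOAD.24 MINU.22 GONN.19 LOAD.24 PLUS.27 STOR.24 LOAD.24 MULT.26 STOR.26 "
    "GOTO.9 LOAD.26 WRIT STOP "
    "ALLO.EMPTY ALLO.10 ALLO.EMPTY ALLO.0 ALLO.EMPTY ALLO.1"
)

print(run(FACTORIAL, [5]))
```

Instructions and data share the same numbered memories, which is why the ALLO pseudo-instructions at the end of a listing receive addresses that the preceding instructions can refer to.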
4.2
Here is the whole of the PROLOG programme which constitutes this compiler, followed by two examples of program-compilation. In the case of the first example we print intermediate results.
)!
** ( i ) READINGOF THE SOURCE-PROGRAM. +READING(*L.*U) - / -LUB(*K) -TR(*K.NIL,*L) -READBIS(*L,*U). +READBIS(DOT,NIL) - / . +READBIS(BLANK,*U) - / -READING(*U). +READBIS(*K,*M.*U) -LU(*L) -TR(*L.NIL,*M) -READBIS(*M,*U). +TR(".",DOT) - / . +TR("*",STAR) - / . +TR(")",RBRACK) - / . +TR(.... ,BLANK) - / . +TR("(",LBRACK) - / . +TR(*K.NIL,*K).
** (2) PREATREATINGOF THE SOURCE-PROGRAM. +PRETREATING(*U,*V) -SYN(UNITS(*V).NIL,*U) -OUT(*V). :UNITS(*U.*X) : : :UNIT(*U) - / :SPACE :UNITS(*X). :UNITS(NIL) =:.
:SPACE == s -/. :SPACE :=.
:UNIT(IN(*X)) == s -CHIFFRE(*K) - / :DIGITS(*U) -UNIV(*X,(*K.*U).NIL). :UNIT(*Y) == s -LETTRE(*K) - / :ALPHANUMS(*U) -UNIV(*X,(*K.*U).NIL) -CHGT(*X,*Y). :UNIT(*K) == s :DIGITS(*K.*U) =: s :DIGITS(NIL) ==. -CHIFFRE(*K) - / :DIGITS(*U). -ALPHANUM(*K) - / :ALPHANUMS(*U). +ALPHANUM(*K) -CHIFFRE(*K). +CHGT(*X,ID(*X)). +DO. +DR. +END. +REPEAT.
** (3) ANALYSISOF THE SOURCE-PROGRAM. +ANALYSIS(*S,*I) -SYN(PROG(*I).NIL,*S) - / -OUT(*I). +ANALYSIS(*S,*I) -LIGNE -SORM("SYNTAX-ERRDR") -LIGNE -CULDESAC. :PROG(*I) == :INST(*I) s :INST(SEQ.*I.*S) == s - / :INST(*I) :INSTRS(*S) s :INST(ASSIGN.*X.*Y) == s s s - / :EXP(ARIT,I,*Y). :INST(WHILE.*B.*I) == s - / :EXP(BdOL,I,*B) s :INST(*I). :INST(REPEA.*B.*I) == s - / :INST(*I) s :EXP(BODL,I,*B). :INST(GOTD.*X) == s s -/. :INST(READ.*X) == s s -/. :INST(WRITE.*X) == s - / =EXP(ARIT,I,*X). :INST(*IF.*B.*S) == s - / ~EXP(BOOL,I,*B) :ENDIF(*IF,*S). :INST(LABEL.*X.*I) == s s - / :INST(*I). :INST(SEQ.NIL) ==. :ENDIF(IFI,*I) == s - / :INST(*I). :ENDIF(IF2,*I.*3) == s :INST(*I) s :INSTRS(*I.*S) == s :INSTRS(NIL) ==. :INST(*3).
- / :INST(*I) :INSTRS(*S).
=EXP(*T,3,*X) == s - / :EXP(*T,I,*X) s :EXP(ARIT,3,*X) == s -/. :EXP(ARIT,3,1N(*X)) == s -/. :EXP(BOOL,3,NOT.*B) == s - / :EXP(BOOL,3,*B). :EXP(BOOL,3,*R.*X.*Y) == - / :EXP(ARIT,I,*X) s -RELATION(*R) :EXP(ARIT,I,*Y). :EXP(*T,*N,*X) == -INF(*N,3) -PLUS(*N,I,*M) :EXP(*T,*M,*Y) - / :ENDEXP(*T,*N,*Y,*X). :EXP(ARIT,I,*X) == :ENDEXP(ARIT,I,IN(O),*X). +RELATION(=) - / . +RELATION(LESS).
:ENDEXP(*T,*N,*X,*Z) == s -DPERATDR(*R,*T,*N) - / -PLUS(*N,I,*M) :EXP(*T,*M,*Y) :ENDEXP(*T,*N,*R.*X.*Y,*Z). :ENDEXP(*T,*N,*X,*X) ==. +OPERATDR(DR,BOOL,1) - / . +OPERATDR(+,ARIT,I) - / . +DPERATOR(STAR,ARIT,2). +DPERATOR(AND,BOOL,2). -/ +OPERATOR(-,ARIT,1). -/
168 ** (4) SYNTHESIS OF THE MACHINE-CODE. +SYNTHESIS(*I,*S) -SYN(PRO(*I).NIL,*S) -DUT(*S). :PRO(*I) == :INS(*I,*U.*V.*W) s :ALLOCATION(*W). :ALLOCATIDN(*V)
:INS(SEQ.NIL,*D) := - / . :INS(SEQ.*I.*S,*D) == - / :INS(*I,*D) :INS(SEQ.*S,*D). :INS(LABEL.*X.*I,*U.*V.*W) == s - / -ADR(*X,*E,*U) :INS(*I,*U.*V.*W). :INS(ASSIGN.*X.*Y,*U.*V.*W) == - / -ADR(*X,*E,*V) :EXPARIT(*Y,*V.*W) s :INS(WHILE.*B.*I,*D) == s -/ :IFGO(NOT.*B,*F,*D) :INS(*I,*D) s s :INS(REPEA.*B.*I,*D) == s -/ :INS(*I,*D) :IFGO(NOT.*B,*E,*D). :INS(GOTO.*X,*U.*V.*W) == s -/ -ADR(*X,*E,*U). :INS(READ.*X,*U.*V.*W) == s s -/ -ADR(*X,*E,*V). :INS(WRITE.*X,*U.*V.*W) == -/ :EXPARIT(*X,*V.*W) s :INS(IF2.*B.*I.*J,*D) == -/ :IFGO(*B,*E,*D) :INS(*J,*D) s s :INS(*I,*D) s :INS(IFI.*B.GOTO.*X,*U.*V.*W) == -/ :IFGO(*B,*E,*U.*V.*W) -ADR(*X,*E,*U). :INS(IFI.*B.*I,*D) == :IFGO(NOT.*B,*E,*D) :INS(*I,*D) s :ALLOCATION(NIL) == -/. :ALLOCATION((*X.*E).*U) == s :ALLOCATION(*U). +CONTENT(IN(*X),*X) -/. s -CONTENT(*X,*Y)
+CONTENT(*X,EMPTY).
:IFGO(OR.*B.*C,*E,*D) == -/ :IFGO(*B,*E,*D) :IFGO(*C,*E,*D). :IFGO(AND.*B.*C,*E,*D) == -/ :IFGO(NOT.*B,*F,*D) :IFGO(*C,*E,*D) s :IFGO(NOT.NOT.*B,*E,*D) == -/ :IFGO(*B,*E,*D). :IFGO(NOT.OR.*B.*C,*E,*D) == -/ :IFGO(AND.(NOT.*B).NOT.*C.*E,*D). :IFGO(NOT.AND.*B.*C,*E,*D) == -/ :IFGO(OR.(NOT.*B).NOT.*C,*E,*D). :IFGO(NOT.*R.*S,*E,*D) == -/ :IFGO((NOT.*R).*S,*E,*D). :IFGO(*R.*X.*Y,*E,*U.*V.*W) == :EXPARIT(-.*x.*Y,*v.*w) s -HOMOLOGOUS(*R,*Q). :EXPARIT(*R.*X.*Y,*V.(EMPTY.*E).*W) == -COMPLEX(*Y) -/ :EXPARIT(*Y,*V.(EMPTY.*E).*W) s :EXPARIT(*X,*V.*W) s :EXPARIT(*R.*X.*Y,*V.*W) == -/ -HOMOLOGOUS(*R,*Q) -ADR(*Y,*E,*V) :EXPARIT(*X,*V.*W) s :EXPARIT(*X,*V.*W) == s -ADR(*X,*E,*V). +COMPLEX(*R.*X.*Y). +HOMOLOGOUS(=,GOZE) -/. +HOMOLOGOUS(LESS,GONE) -/. +HOMOLOGOUS(+,PLUS) -/. +HOMOLOGOUS(STAR,MULT). +HOMOLOGOUS(NOT.=,GONZ) -/. +HOMOLOGOUS(NOT.LESS,GONN) -/. +HOMOLOGOUS(-,MINU) -/.
** (5) ASSEMBLING OF THE MACHINE-CODE.
+ASSEMBLING(*X,*U) -ASS(*X,*U,0).
+ASS(LAB(*N).*X,*U,*N) -/ -ASS(*X,*U,*N).
+ASS(CODE(*C).*X,*C.*U,*N) -/ -PLUS(*N,1,*M) -ASS(*X,*U,*M).
+ASS(NIL,NIL,*N) -/.
+ASS(*X,*U,*N) -SORM("ERROR: TWICE THE SAME LABEL") -LIGNE -CULDESAC.
** (6) FINAL PRINTING.
+PRINTING(*X) -LIGNE -PRI(0,*X) -LIGNE -LIGNE.
+PRI(*N,*X.*Y) -/ -SORT(*N) -SORM(" ") -SORT(*X) -LIGNE -PLUS(*N,1,*M) -PRI(*M,*Y).
+PRI(*N,NIL).
-TRACE -COMPILE! BEGIN READ NI READM; IF N T N=5 A D (M LESS 1D O M=50) THEN WRITE D O N R ELSE WRITE (2+N)*(IO+M) END. BEGIN.READ.ID(N).I.READ.ID(M).~.IF.NDT.ID(N).=.IN(5).AND.LBRACK.ID(M) .LESS.IN(1D).OR.ID(M).=.IN(50).RBRACK.THEN.WRITE.IN(O).ELSE.WRITE.LBR ACK.IN(2).+.ID(N).RBRACK.STAR.LBRACK.IN(IO).+.ID(M).RBRACK.END.DOT.NI L
SEQ.(READ.N).(READ.H).(IF2.(AND.(NOT.=.N.IN(5)).OR.(LESS.M.IN(1D)).=.
M.IN(SO)).(WRITE.IN(D)).WRITE.STAR.(+.IN(2).N).+.IN(1D).M).NIL CODE(READ).CODE(STDR.*XO).CODE(READ).CODE(STOR.*X1).CDDE(LOAD.*XO).CO DE(MINU.*X2).CODE(GOZE.*X3).CODE(LOAD.*X1).CODE(MINU.*X4).CODE(GONE.* XS).CODE(LDAD.*X1).CODE(MINU.*X6).CODE(GDZE.*XS).LAB(*X3).CODE(LOAD.*
X4).CODE(PLUS.*X1).CODE(STDR.*X7).CDDE(LOAD.*XB).CODE(PLUS.*XO).CODE( MULT.*XT).CDDE(WRIT).CODE(GOTD.*Xg).LAB(*XS).CODE(LDAD.*XID).CODE(WRI
T).LAB(*X9).CODE(STOP).LAB(*X0).CODE(ALLO.EMPTY).LAB(*X1).CODE(ALLO.EMPTY).LAB(*X2).CODE(ALLO.5).LAB(*X4).CODE(ALLO.10).LAB(*X6).CODE(ALLO.50).LAB(*X8).CODE(ALLO.2).LAB(*X10).CODE(ALLO.0).LAB(*X7).CODE(ALLO.EMPTY).NIL
 0 READ
 1 STOR.24
 2 READ
 3 STOR.25
 4 LOAD.24
 5 MINU.26
 6 GOZE.13
 7 LOAD.25
 8 MINU.27
 9 GONE.21
10 LOAD.25
11 MINU.28
12 GOZE.21
13 LOAD.27
14 PLUS.25
15 STOR.31
16 LOAD.29
17 PLUS.24
18 MULT.31
19 WRIT
20 GOTO.23
21 LOAD.30
22 WRIT
23 STOP
24 ALLO.EMPTY
25 ALLO.EMPTY
26 ALLO.5
27 ALLO.10
28 ALLO.50
29 ALLO.2
30 ALLO.0
31 ALLO.EMPTY
-TRACE -COMPILE!
BEGIN READ N; IF 10 LESS N DO GOTO TOOBIG; I:=0; F:=1; WHILE I LESS N DO BEGIN I:=I+1; F:=I*F END; WRITE F; TOOBIG: END.
 0 READ
 1 STOR.22
 2 LOAD.23
 3 MINU.22
 4 GONE.21
 5 LOAD.25
 6 STOR.24
 7 LOAD.27
 8 STOR.26
 9 LOAD.24
10 MINU.22
11 GONN.19
12 LOAD.24
13 PLUS.27
14 STOR.24
15 LOAD.24
16 MULT.26
17 STOR.26
18 GOTO.9
19 LOAD.26
20 WRIT
21 STOP
22 ALLO.EMPTY
23 ALLO.10
24 ALLO.EMPTY
25 ALLO.0
26 ALLO.EMPTY
27 ALLO.1
4.3 EXPLANATION OF THE PROGRAM
clause
indicates with
that the f o n c t i o n n a l
symbol
"."
will
be w r i t t e n
in
notation
right-to-lsft
parenthesising.
phasis.
The phases
[1),
(2),
succes-
(1) Reading of the source-program. The source-program is read character by character and transformed into a string of characters. Spacing is reduced to a single blank. The characters ".", "*", ")", " ", "(" are renamed respectively DOT, STAR, RBRACK, BLANK, LBRACK.
(2) Pretreating of the source-program. The string of characters is transformed into a string of units. A unit is a basic symbol, or an identifier capped by the functional symbol ID, or an integer capped by the functional symbol IN.
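The passage from characters to units can be sketched as follows in Python. The sketch does not reproduce the metamorphosis grammar of phase (2); it only illustrates the result, with identifiers capped by ID and integers capped by IN. The function name `pretreat` and the set `BASIC_WORDS` are assumptions made for the illustration, and punctuation is kept as plain characters rather than renamed (DOT, STAR, ...) as in the program.

```python
# Hypothetical list of words treated as basic symbols, taken from the
# keywords that appear in the example programs of this chapter.
BASIC_WORDS = {
    "BEGIN", "END", "READ", "WRITE", "IF", "THEN", "ELSE", "WHILE", "DO",
    "REPEAT", "UNTIL", "GOTO", "AND", "OR", "NOT", "LESS",
}


def pretreat(text):
    """Split a source text into units: basic symbols, ('ID', name), ('IN', value)."""
    units = []
    i = 0
    while i < len(text):
        c = text[i]
        if c.isspace():
            i += 1                       # spacing separates units and is dropped
        elif c.isdigit():
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            units.append(("IN", int(text[i:j])))
            i = j
        elif c.isalpha():
            j = i
            while j < len(text) and text[j].isalnum():
                j += 1
            word = text[i:j]
            units.append(word if word in BASIC_WORDS else ("ID", word))
            i = j
        else:
            units.append(c)              # any other character is a basic symbol
            i += 1
    return units


print(pretreat("READ N; WRITE (2+N)*10"))
```

Capping identifiers and integers in this way lets the later analysis phase distinguish them from basic symbols by the shape of the term alone.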
[3) A n a l y s i s which
of the source-program.
The string
of units
is t r a n s f o r m e d
into a tree
This
is done by means
of a m e t a m o r p h o s i s and the
grammar boolean
expressions
are treated
compact
structure
of all the n o r m a l i s e d
program-forms.
<normalised <instruction>
forme> ::=
::=
prog
[ <instruction> ) I
[ saqu
exp> ) i ) l
) l
. <boolean . <boolean
exp> )
I
exp> ) i
. <identifier>
I [ <instruction> ( <integer> )
::= in
<identifier> exp>
I
exp> )
( <+ - mult>
.<arithmetical
. <arithmetical
<boolean
exp>
::=
<arithmetical exp> ) .
exp>
<arithmetical exp> ) I
exp>
<boolean exp>
<boolean
<boolean
4+
melt>
::=
I mult
<and or>
::=
<relation> , <integer>
(4) Synthesis of the machine-code. A metamorphosis grammar transforms the normalised form of the program into a string of elementary instructions. Each elementary instruction is either an instruction of which the address-part is a prolog-variable, the whole capped by the functional symbol CODE, or a prolog-variable representing an address capped by the symbol LAB (for label).
The last parameter of the non-terminal INS represents three tables. Each of these is composed of a sequence of doublets (object, prolog-variable representing the address where it is to be found). The first table associates an address to each identifier representing a label. The second table associates an address to each integer or identifier representing an integer. The third table associates an address to each supplementary memory necessary to compute an arithmetical expression. These tables which, at the start, are represented by prolog-variables, are constantly updated and consulted by the predicate ADR(x,e,v) which gives the address e of the object x recorded in the table v.
The code of the boolean expressions is generated by means of the non-terminal IFGO(b,e,d) which means : "if the boolean expression b is true then goto e" (d represents the three preceding tables). This code is optimised so as to minimise the evaluations of relations at run-time.
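The scheme followed by IFGO can be sketched in Python as a jump-code generator. The case analysis mirrors the IFGO and HOMOLOGOUS clauses of phase (4) as far as they are legible; everything else is an assumption of the sketch: boolean expressions are represented as nested tuples, operands are written as names or constants instead of addresses taken from the tables, the two operands of a relation are assumed to be simple, and `ifgo`, `difference` and `label_maker` are hypothetical names.

```python
from itertools import count

JUMP = {"=": "GOZE", "less": "GONE"}          # jump taken when the relation holds
NEGATED_JUMP = {"=": "GONZ", "less": "GONN"}  # jump taken when the relation fails


def difference(x, y):
    """Code leaving x - y in the accu (simple operands only, by assumption)."""
    return [("LOAD", x), ("MINU", y)]


def ifgo(b, target, fresh):
    """Code meaning: if the boolean expression b is true then goto target."""
    op = b[0]
    if op == "or":
        return ifgo(b[1], target, fresh) + ifgo(b[2], target, fresh)
    if op == "and":
        skip = fresh()                   # where to go when the first operand is false
        return (ifgo(("not", b[1]), skip, fresh)
                + ifgo(b[2], target, fresh)
                + [("LAB", skip)])
    if op == "not":
        inner = b[1]
        if inner[0] == "not":
            return ifgo(inner[1], target, fresh)
        if inner[0] == "or":
            return ifgo(("and", ("not", inner[1]), ("not", inner[2])), target, fresh)
        if inner[0] == "and":
            return ifgo(("or", ("not", inner[1]), ("not", inner[2])), target, fresh)
        rel, x, y = inner
        return difference(x, y) + [(NEGATED_JUMP[rel], target)]
    rel, x, y = b
    return difference(x, y) + [(JUMP[rel], target)]


def label_maker():
    """Return a function producing the fresh labels L0, L1, ..."""
    numbers = count()
    return lambda: f"L{next(numbers)}"


# Condition of the first compiled example: NOT N=5 AND (M LESS 10 OR M=50)
condition = ("and", ("not", ("=", "N", 5)),
             ("or", ("less", "M", 10), ("=", "M", 50)))
code = ifgo(condition, "THEN", label_maker())
for instruction in code:
    print(instruction)
```

Each relation is evaluated at most once on any execution path and negations are pushed down to the relations, so no boolean value is ever materialised; the generated sequence for this condition has the same shape as the corresponding portion of the intermediate code printed for the first example.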
The code of arithmetical expressions is optimised in the sense that in an operation where the second operand is simple, no supplementary memory is used. So as not to complicate (by playing on associativity, commu-
(5) Assembling of the machine-code. All the addresses represented by prolog variables are replaced by integers and some minor transformations allow us to obtain the final result.
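The replacement of symbolic addresses by integers can be sketched in Python as a two-pass assembler. In the PROLOG program the same effect is obtained in a single pass, because unifying the variable of LAB with the current position instantiates every instruction that shares that variable; the two passes below are a substitute technique, and the name `assemble` and the tuple representation are assumptions of the sketch.

```python
def assemble(items):
    """Replace label names by instruction positions.

    items is a list of ("LAB", name) markers and (mnemonic, operand) pairs,
    where operand is a label name or None.
    """
    addresses = {}
    position = 0
    for item in items:                   # first pass: record where each label falls
        if item[0] == "LAB":
            if item[1] in addresses:
                raise ValueError("twice the same label")
            addresses[item[1]] = position
        else:
            position += 1
    code = []
    for item in items:                   # second pass: substitute the addresses
        if item[0] != "LAB":
            op, operand = item
            code.append((op, None if operand is None else addresses[operand]))
    return code


program = [
    ("READ", None),
    ("GOZE", "end"),
    ("WRIT", None),
    ("LAB", "end"),
    ("STOP", None),
]
print(assemble(program))
```

A label occupies no memory: it names the position of the instruction that follows it, which is why the position counter advances only on instructions.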
(6) Final printing. The final result is printed, each instruction being numbered.
(7) Printing of intermediate results. The call for TRACE provokes the printing of intermediate results. Recalling TRACE suppresses this printing. A second recalling provokes printing anew, etc.
CHAPTER
= = = = : = = : =
5.1
We propose to write a system allowing us to hold an "intelligent" conversation in French with the computer. The conversation will concern relations of friendship and parenthood between different persons. The user of this system will be able :
- to communicate information to the computer by means of affirmative and negative sentences (and possibly by replying "yes" or "no" to certain questions)
- to suppress information (voluntarily or involuntarily) by negative or affirmative sentences contradicting facts previously stated ;
- to ask questions of the type "qui est..." ("who is...") or "qui n'est pas..." ("who is not...") to which the computer will attempt to reply by making a certain number of deductions ;
- to ask questions of the type "pourquoi..." ("why...") to which the computer will reply by retracing the sequence of deductions which allows him to reach the conclusion ;
- to ask the computer to write or not to write intermediate results instrumental in his understanding of sentences and in his reasoning ;
- to stop the system by saying "au revoir !"
Here is the set of all the sentences ("phrases") of French in which the interlocutor must address the computer.
<phrase> ::= AU REVOIR ! | OUI ! | NON ! | QUI <ne> <est> <pas> <sn> ? | POURQUOI EST-CE <que> <sn> <ne> <est> <pas> <sn> ? | <sn> <ne> <est> <pas> <sn>.
<sn> ::= <art> <nom> <de> <sn> | TOUT LE MONDE | PERSONNE | QUELQU'UN | JE | <nom propre>
<art> <nom> DE JE <art> <de> LE <est> <mon> <ne> <nom> ::= ::= ::= ::= ::= ::=
AUCUN I AUCUNE I CHAQUE I L' l LA [ LE I UN [ U N E DU SUIS I EST MON l MA <vide> I N' i NE AMI I AMIE l FEMME l FILLE I FILS l FRERE I MARI l MERE l NEVEU I NIECE l ONCLE J PERE I SOEUR l TANTE <de> ::= d' I de
>
::=
<vide> OU '
Here now is the list of all the replies ("réponses" in French) which can be produced by the computer.
<r~ponse>
::= <r~pons e & : qui...> I <r~ponse ~ : pourquo <rejet d'une phrase> I <demande d'information>
<r6ponse ~ : qui...>
::=
::=
<et que>
::=
::=
JE NE COMPRENDS PAS TRES BIEN. IL Y A UN PROBLEME DE SEXE. VOUS VOUS CONTREDISEZ. J'OUBLIE QUE VOUS M'AVEZ DIT : ' <phrase> '
<demande d'information> ::= VOUS ETES BIEN DE SEXE MASCULIN ? | <nom propre> EST BIEN DE SEXE MASCULIN ? | OUI ! OU NON !
5.2
Here is the whole of the PROLOG program which realises the system.
177 -AJOP( "ET" ,1 ,"Xs (Xs -AJOP("IMPLIQUE" ,1 ,"Xs163 -AJOP( "NON",2 ,"s ! -AJOP( " / " ,3 ,"Xs (Xs ! -AJOP(" ." ,4 ,"Xs163 ") ! -AJOP( " - " ,5 ," (Ds163 !
** (O) IMBRICATION DES DIFFERENTES PHASES. +BAVARDONS -ECHANGES(1) -IMPASSE. +ECHANGES(*N) -PHRASELUE(*P) -AJOUTER(+(PHRASE(*N,*P)).NIL). +ECHANGES(*N) -PHRASE(*N,*P) -PRETRAITEMENT(*P,*Q) -ANALYSE(*Q,*R) -ENONCE(*R,*S,*N) -AJOUTER(*S). +ECHANGES(*N) -HALTE -MESSAGE("BONSOIR!") - / . +ECHANGES(*N) -ABSURDE(*R) -SORTIR(*R) -ELAGUER(*R,*S,*M,*P) -REPONSE(*N.*S,*M,*P). +ECHANGES(*N) -QUESTION -AJOUTER(+(KO(*N)).NIL). +ECHANGES(*N) -PLUS(*N,I,*M) -ECHANGES(*M). +ELAGUER(NIL,*S,IOOOO,*P) - / . +ELAGUER(*I-*U.*R,*S,*K,*P) - / -OK(*I) -DANS(*I,*S) -ELAGUER(*R,*S,*J,*P) -MIN(*I,*Jg*K). +ELAGUER(INDIVlDU(*X).*R,*S,*K,*P) - / -CONFIRMATION(*X) -ELAGUER(*R,*S,*K,*P). +ELAGUER(*P.*R,*S,*K,*P) -ELAGUER(*R,*S,*K,*P). +MIN(*I,*3,*I) -INF(*I,*3) - / . +MIN(*I,*J,*J).
+REPONSE(*S,*N,NIL) - / -AJDUTER(+(KO(*N)).NIL) -PHRASE(*N,*P) -MESSAGE("VOUS VOUS CONTREDISEZ. 3'OUBLIE QUE VOUS M'AVEZ DIT:") -DIRE("'".*P.' .... ) -LIGNE. +REPONSE(*S,*N,QUI(*X)) - / -VALR(*X,*P) 'MESSAGE(*P."!"). +REPONSE(*M.*S,*N,POURQUOI) -MESSAGE("PARCEQUE JE SAIS RAISONNER.") -ETQUE(*M,*S)~ +VALR(NIL-IN(NIL),"TOUT LE MONDE") - / . +VALR(F-IN(NIL),"TOUTE FEMME") - / . +VALR(M-IN(NIL),"TOUT HDMME") - / . +VALR(*G-JE,"VOUS") - / . +VALR(*G-IN(*A),*P) -UNIV(*A,*P). +ETQUE(*N,NIL) - / . +ETQUE(*N,*N.*S) -ETQUE(*N,*S). -/ +ETQUE(*N,*M.*S) -PHRASE(*M,*P) -DIRE("ET QUE VOUS M'AVEZ DIT: " . " ' " . * P . " ' " ) -LIGNE -ETQUE(*N,*S). ** (1) DEMANDED'INFORMATION SUPPLEMENTAIRE. +CONFIRMATION(*G-*I) -VAR(*G) - / . +CONFIRMATION(*G-*I) -NOMPROPRE(*H-*I) - / -PAREILS(*G,*H). +CONFIRMATION(*G-*I) -RECONNU(*H-*I) -AJOUTER(+(NOMPROPRE(*H-*I)).NIL) -PAREILS(*G,*H). +RECONNU(*G-JE) - / -MESSAGE("VOUS ETES BIEN DE SEXE MASCULIN?") -REPONSELUE(*G). +RECONNU(*G-IN(*A)) -UNIV(*A,*U) -MESSAGE(*U." EST BIEN DE SEXE MASCULIN?") -REPONSELUE(*G).
+REPONSELUE(*G) -PHRASELUE(*P) -RESULTAT(*P,*G). +RESULTAT("OUI!",M) - / . +RESULTAT("NON!",F)/ . +RESULTAT(*P,*G) -MESSAGE("OUI! O NON!") -REPONSELUE(*G). U ** (2) PRETRAITEMENT DE LA PHRASE. +PRETRAITEMENT(*U,*V) -SYN(UNITES(*V).NIL,*U).
:UNITES(*U.*X) : : :UNITES(NIL) ==. :ESPACE : : s :ESPACE == s :UNITE(*U) - / :ESPACE :UNITES(*X).
-/ -/
:ESPACE. :ESPACE.
-/
:ESPACE.
:UNITE(*X) == s :UNITE(*K) =: s
** (3) ANALYSEDU FRANCAIS. +ANALYSE(*P,*Q) -SYN(PH(*Q).NIL,*P) - / -SORTIR(*Q). +ANALYSE(*P,*Q) -MESSAGE("JE NE C M R N S PAS CETTE PHRASE.") O PED -IMPASSE. :PH(*U.HALTE) == s s s -/. :PH(*U.DETAILS(*U)) == s s s -/. :PH(NDN *U.DETAILS(*U)) == s s s s -/. :PH(NON(*Q ET *U.DANS(QUI(*X),*U)) ET *V.QUESTION) == s -/ :NE :EST :PAS(*P.*Q) :CO(*X.*P) s :PH(NON(*R ET *U.DANS(PDURQUOI,*U)) ET *V.QUESTION) == s s s - / :QUE :SN(*X.*P.*Q) :NE :EST :PAS(*Q.*R) :CO(*X.*P) s :PH(*R) : : :SN(*X.*P.*Q) :NE :EST :PAS(*Q.*R) :CO(*X.*P) s :CD(*X.*Q) == :ART(*X.*P.*Q.ILYA(*I,*P ET *Q)) - / :NOM(*X.*Y.*P) :DE :SN(*Y.*P.*Q). :CO(*X.*Q) =: :SN(*Y.(*U.EGAL(*X,*Y,*U)).*Q). :SN(*X.*Q.*S) == :ART(*X.*P.*Q.*R) :NOM(*X.*Y.*P) - / :DE :SN(*Y.*R.*S). :SN(*G-*I.*P.TOUT(*G,TDUT(*I,*P))) == s s s -/. :SN(*G-*I.*P.(NON ILYA(*G,ILYA(*I,*P)))) == s -/. :SN(*G-*I.*P.ILYA(*G,ILYA(*I,*P))) == s s -/. :SN(*G-JE.*P.LE(*G,*U.NOMPROPRE(*G-JE),*P)) == s -/.
:SN(*G-IN(*A).*P.LE(*G,*U.NOMPR OPRE(*G-IN(*A)), *P) ) :ART(*XPQR) == s -VAL(*M,ART(*XPQR)) - / . : A R T ( * G - * I . * P . * Q . I L Y A ( * I , * P ET *Q)) s s :NOM(*XYP) == s :DE : : s :DE == s -VAL(*M,NOM(*XYP)). :DE s :: s -/. := s
== :MON(*G-*I) s
-/.
179 :EST : : s -/. :MON(M-*I) : : s -/. :NE : : s -/. :NE ==. :PAS(*P.(NON *P)) == s :QUE == s -/. :EST : : s :MON(F-*I) : : s :NE : : s - / . -/. :PAS(*P.*P) = : . :QUE == s
+AMI(NOM(M-*I.*Y.*U.AMY(M-*I,*Y,*U))). +AMIE(NOM(F-*I.*Y.*U.AMY(F-*I,*Y,*U))). +AUCUN(ART(M-*I.*P.*Q.TOUT(*I,*P IMPLIQUEN N *Q))). O +AUCUNE(ART(F-*I.*P.*Q.TOUT(*I,*P IMPLIQUE N N *Q))). O +CHAQUE(ART(*G-*I.*P.*Q.TDUT(*I,*P IMPLIQUE *Q))). +FEMME(NOM(F-*I.M-*J.*U.EPOUX(M-*J,F-*I,*U))).
+FILLE(NOM(F-*I.*P)) -ENFAN(F-*I.*P). +FILS(NDM(M-*I.*P)) -ENFAN(M-*I.*P). +FRERE(NOM(M-*I.*P)) -FREUR(M-*I.*P). +L(ART(*G-*I.*P.*Q.ILYA(*I,*P ET * Q ) ) ) . +LA(ART(F-*I.*P.*Q.ILYA(*I,*P ET * Q ) ) ) . +LE(ART(M-*I.*P.*Q.ILYA(*I,*P ET * Q ) ) ) . +MARI(NOM(M-*I.F-*J.*U.EPOUX(M-*I,F-*J,*U))). +MERE(NDM(F-*I.*Y.*P)) -ENFAN(*Y.F-*I.*P). +NEVEU(NDM(M-*I.*Y.*P)) -ONTE(*Y.M-*I.*P). +NIECE(NOM(F-*I.*Y.*P)) -ONTE(*Y.F-*I.*P). +ONCLE(NOM(M-*I.*Y.*P)) -ONTE(M-*I.*Y.*P). +PERE(NOM(M-*I.*Y.*P)) -ENFAN(*Y.M-*I.*P). +SOEUR(NOM(F-*I.*P)) -FREUR(F-*I.*P). +TANTE(NOM(F-*I.*Y.*P)) -ONTE(F-*I.*Y.*P). +UN(ART(M-*I.*P.*Q.ILYA(*I,*P ET * Q ) ) ) . +UNE(ART(F-*I.*P.*Q.ILYA(*I,*P ET * Q ) ) ) .
** (6) REGLES DE RAISONNEMENT.
+VALIDE(*N-*U,*V) -OK(*N) -NOUVEAU(*N-*U,*V).
+OK(*N) -KO(*N) -/ -IMPASSE.
+OK(*N).
+ABSURDE(*U) -EGAL(*G-IN(*A),*G-IN(*B),1.*U) -PASPAREILS(*A,*B).
+ABSURDE(*U) -EGAL(M-*I,F-*J,1.*U).
+EGAL(*X,*X,1.*U).
+EGAL(*X,*Y,1.*U) -EGAL(*X,*Z,2.*U) -EGAL(*Z,*Y,1.*U).
+EGAL(*X,*Y,1.*U) -EGAL(*Z,*X,2.*U) -EGAL(*Z,*Y,1.*U).
+AMY(*X,*Y,1.*U) -EGAL(*X,*R,1.*U) -EGAL(*Y,*S,1.*U) -AMYBIS(*R,*S,*U).
+AMYBIS(*X,*Y,*U) -AMY(*X,*Y,2.*U).
+AMYBIS(*X,*Y,*U) -AMY(*Y,*X,2.*U).
+EPOUX(*X,*Y,1.*U) -EGAL(*X,*R,1.*U) -EGAL(*Y,*S,1.*U) -EPOUX(*R,*S,2.*U).
-TR(*K.NIL,*L) -SUITELIRE(*L,*U).
+SUITELIRE(POINT,NIL) -/.
+SUITELIRE(!,NIL) -/.
+SUITELIRE(?,NIL) -/.
+SUITELIRE(*K,*M.*U) -LU(*L) -TR(*L.NIL,*M) -SUITELIRE(*M,*U).
+AJOUTER(*P) -DETAILS(1.*U) -/ -AJOUT(*P) -MESSAGE("J'ENREGISTRE:") -SORC(*P) -LIGNE.
+AJOUTER(*P) -AJOUT(*P).
+SORTIR(*U) -DETAILS(1.*V) -/ -MESSAGE("JE TROUVE:") -SORT(*U) -LIGNE.
+SORTIR(*U).
+MESSAGE(*U) -LIGNE -DIRE("LA MACHINE. - ") -DIRE(*U) -LIGNE.
+DIRE(NIL) -/.
+DIRE(*U.*V) -/ -DIRE(*U) -DIRE(*V).
+DIRE(*U) -TR(*V.NIL,*U) -ECRIT(*V).
+TR(".",POINT) -/.
+TR("-",TRAIT) -/.
+TR(" ",BLANC) -/.
+TR(*X.NIL,*X).
The program is composed of 8 parts, numbered (0) to (7). Each is explained below.

(0) Nesting of the different phases (imbrication des différentes phases). The system functions by executing a series of exchanges. Each exchange consists principally in: reading a sentence and recording it; transforming it into a sequence of elementary statements (which are regular clauses) and retaining them; starting a proof from the literal -ABSURDE(*R); computing a possible reply from *R; and possibly suppressing certain clauses which render the recorded information absurd. In this last case, it suppresses the clauses deriving from the oldest sentences.
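The exchange cycle can be illustrated with a small Python sketch. It is not the PROLOG program: it assumes that each sentence reduces to a single signed atom and that "absurd" simply means an atom recorded both positively and negatively. The names `KnowledgeBase`, `tell` and `forgotten` are hypothetical. It shows only the behaviour stated in the text, namely that the clauses from the oldest sentence involved in a contradiction are suppressed.

```python
class KnowledgeBase:
    """Toy store of signed atoms, each tagged with its sentence number."""

    def __init__(self):
        self.facts = []        # list of (sentence_no, atom, is_positive)
        self.forgotten = []    # sentence numbers suppressed so far
        self.counter = 0

    def tell(self, atom, positive=True):
        """Record one sentence, then repair any contradiction it causes."""
        self.counter += 1
        self.facts.append((self.counter, atom, positive))
        self._repair()
        return self.counter

    def _repair(self):
        # Repeat until no atom is recorded both positively and negatively.
        while True:
            clash = None
            for n1, a1, p1 in self.facts:
                for n2, a2, p2 in self.facts:
                    if a1 == a2 and p1 and not p2:
                        clash = min(n1, n2)   # the oldest sentence loses
                        break
                if clash is not None:
                    break
            if clash is None:
                return
            self.forgotten.append(clash)
            self.facts = [f for f in self.facts if f[0] != clash]


kb = KnowledgeBase()
kb.tell("DETAILS")                    # sentence 1, like "DES DETAILS!"
kb.tell("HORACE")                     # sentence 2
kb.tell("DETAILS", positive=False)    # sentence 3, like "PAS DE DETAILS!"
```

After the third call, sentence 1 has been forgotten and the other two remain, which mirrors the machine's reply "VOUS VOUS CONTREDISEZ" in the sample conversation.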
(1) Request for information (demande d'information). The system itself manages the dictionary of proper nouns and of their genders. In certain cases, when it has not been able to determine the gender of a proper noun and needs it in its deductions, it requests this information directly from the interlocutor.

(2) Preprocessing of the sentence (prétraitement de la phrase). After reading a sentence presented in the form of a sequence of characters, the system eliminates some of them and produces a sequence of words and punctuation marks. This is done by means of a small metamorphosis grammar.
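As a rough illustration of this step, here is a Python sketch that turns a character sequence into words and punctuation marks. It is ordinary procedural code, not a metamorphosis grammar. The function name `pretreat` is hypothetical, and treating the apostrophe as a separator is an assumption (suggested by the dictionary entry for the article L). The symbolic name POINT is the one used in the listing.

```python
PUNCTUATION = {".": "POINT", "!": "!", "?": "?"}

def pretreat(chars):
    """Return the list of words and punctuation marks found in `chars`.

    Letters accumulate into words; blanks, apostrophes and any other
    characters only separate words and are eliminated.
    """
    tokens, word = [], []
    for ch in chars:
        if ch.isalpha():
            word.append(ch)
            continue
        if word:                      # a non-letter ends the current word
            tokens.append("".join(word))
            word = []
        if ch in PUNCTUATION:
            tokens.append(PUNCTUATION[ch])
    if word:
        tokens.append("".join(word))
    return tokens

tokens = pretreat("JE SUIS HORACE.")
```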
(3) Analysis of French (analyse du français). The analysis of each sentence is effected by a grammar modelled directly on the formal definition. Each deep structure obtained is a formula of the following type:

<formula> ::=
    ( NON <formula> ) |
    ( <formula> ET <formula> ) |
    ( <formula> IMPLIQUE <formula> ) |
    ILYA ( <indice> , <formula> ) |
    TOUT ( <indice> , <formula> ) |
    LE ( <indice> , <formula> , <formula> ) |
    <formule élémentaire>

<formule élémentaire> ::=
    AMY ( <personne> , <personne> , <variable> ) |
    DANS ( POURQUOI , <variable> ) |
    DANS ( QUI ( <personne> ) , <variable> ) |
    DETAILS ( <variable> ) |
    EGAL ( <personne> , <personne> , <variable> ) |
    EPOUX ( <personne> , <personne> , <variable> ) |
    HALTE |
    NOMPROPRE ( <personne> ) |
    QUESTION
::=
<variable>
I F I M
::= < g e n r e > - <nom> P I M I <variable> <nom p r o p r e > ::= I PAR ( < g e n r e > PROLOG " , <personne> ) I <variable>
::= ::=
<variable>
" variable
(4) Dictionary (dictionnaire).
The d i c t i o n a r y All
essoeiatmto of
each a r t i c l e
formula.
the relations
parenthood
EGAL [ e q u a l ) sex ~G
by i n t r o d u c i n g
PAR(~G,~X) w h i c h
represents
of the individual
(5) Creation of elementary statements (création d'énoncés élémentaires). A metamorphosis grammar permits the transformation of each deep structure into a set of elementary statements. This is essentially an algorithm of the "skolemisation" type which also produces regular clauses. The positive literal is always placed at the head; if there is none we create the literal +ABSURDE(*U); if there are several, the machine emits the message "C'EST TROP COMPLIQUE" (it's too complicated). Moreover, each clause carries the literal -VALIDE(*N-*T,*U): it permits the control of deductions and allows us to simulate the suppression of a clause. In this literal, *N designates the number of the sentence from which the clause comes; *T is a term permitting differentiation between several clauses; *U is a variable appearing in all the literals, used to gather certain results, in particular the listing of all the clauses instrumental in the demonstration.
(6) Rules of reasoning (règles de raisonnement). In order to reason, the machine knows the axioms of equality; it supposes that a person can have only one name and only one sex, and that, if a is the friend of b, then b is the friend of a.
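The effect of two of these rules (equality, and the symmetry of friendship) can be sketched in Python. This is only an illustration: it replaces the resolution proofs of the PROLOG clauses with a union-find structure, it does not model the "one name, one sex" check, and the names `Reasoner`, `equal`, `befriend` and `are_friends` are hypothetical.

```python
class Reasoner:
    """Equality classes (union-find) plus friendship taken modulo equality."""

    def __init__(self):
        self.parent = {}          # union-find forest over individuals
        self.friend_pairs = []    # friendships exactly as they were stated

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def equal(self, x, y):
        """Record that x and y denote the same individual."""
        self.parent[self.find(x)] = self.find(y)

    def same(self, x, y):
        return self.find(x) == self.find(y)

    def befriend(self, x, y):
        self.friend_pairs.append((x, y))

    def are_friends(self, x, y):
        # Friendship is symmetric and respects equality of individuals.
        for a, b in self.friend_pairs:
            if (self.same(a, x) and self.same(b, y)) or \
               (self.same(a, y) and self.same(b, x)):
                return True
        return False


r = Reasoner()
r.equal("JE", "HORACE")          # "JE SUIS HORACE."
r.befriend("CURIACE", "HORACE")
```

With these two statements, `r.are_friends("JE", "CURIACE")` holds even though neither the order of the arguments nor the name JE appears in the recorded friendship.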
(7) Reading and writing (lectures et écritures). This concerns the management of the input-output device.
5.8 EXAMPLE OF CONVERSATION

Here is an example of a conversation. When the machine prints intermediate results, it gives all the clauses it is recording, the deep structure of each sentence and the result of the deductions from which it produces its replies.
-BAVARDONS!
MOI. DIEU -
MOI. DIEU
DETAILS!
MOI. JE SUIS HORACE.
LA MACHINE. - J'ENREGISTRE:
+PHRASE(4,J.E.BLANC.S.U.I.S.BLANC.H.O.R.A.C.E.POINT.NIL).
LA MACHINE. - JE TROUVE:
LE(*X0,*X1.NOMPROPRE(*X0-JE),LE(*X2,*X3.NOMPROPRE(*X2-IN(HORACE)),*X4.EGAL(*X0-JE,*X2-IN(HORACE),*X4)))
LA MACHINE. - J'ENREGISTRE:
+EGAL(*X0-JE,*X1-IN(HORACE),2.*X2)-VALIDE(4-(1-(1-1)),*X2)-DANS(INDIVIDU(*X0-JE),*X2)-DANS(INDIVIDU(*X1-IN(HORACE)),*X2).
LA MACHINE. - JE TROUVE:
4-(1-(1-1)).INDIVIDU(M-JE).INDIVIDU(F-IN(HORACE)).*X0
LA MACHINE. - VOUS ETES BIEN DE SEXE MASCULIN?
MOI. - OUI!
MOI. - PAS DE DETAILS!
LA MACHINE. - J'ENREGISTRE:
+PHRASE(5,P.A.S.BLANC.D.E.BLANC.D.E.T.A.I.L.S.!.NIL).
LA MACHINE. - JE TROUVE:
NON *X0.DETAILS(*X0)
LA MACHINE. - J'ENREGISTRE:
+ABSURDE(*X0)-VALIDE(5-1,*X0)-DETAILS(1.*X0).
LA MACHINE. - JE TROUVE:
5-1.3-1.*X0
LA MACHINE. - J'ENREGISTRE:
+KO(3).
LA MACHINE. - VOUS VOUS CONTREDISEZ. J'OUBLIE QUE VOUS M'AVEZ DIT: 'DES DETAILS!'
MOI. -
PDURQUOI
LA MACHINE. - PARCE QUE JE SAIS RAISONNER. ET QUE VOUS M'AVEZ DIT: 'JE SUIS HORACE.'
LA MACHINE. - PARCE QUE JE SAIS RAISONNER. ET QUE VOUS M'AVEZ DIT: 'JE SUIS HORACE.'
~01. L'AMI -
ZOZO EST MON FILS.
LA MACHINE.
MOI. OUI!
MOI. ZOZO EST MON FILS.
MOI. QUI EST L'AMI DE LA TANTE AU ZOZO.
LA MACHINE. - JE NE COMPRENDS PAS CETTE PHRASE.
MOI. QUI EST L'AMI DE LA TANTE DE ZOZO?
LA MACHINE. - DIEU!
LA MACHINE. - CURIACE!
LA MACHINE. - DIEU!
LA MACHINE. - CURIACE!
LA MACHINE. - DIEU!
LA MACHINE. - CURIACE!
MOI. POURQUOI EST-CE QUE CURIACE EST L'AMI DE LA TANTE DE ZOZO?
LA MACHINE. - PARCE QUE JE SAIS RAISONNER. ET QUE VOUS M'AVEZ DIT: 'L'AMI DE MA SOEUR EST CURIACE.' ET QUE VOUS M'AVEZ DIT: 'ZOZO EST MON FILS.'
-
OUI!
OU NON!
LA MACHINE. - PARCE QUE JE SAIS RAISONNER. ET QUE VOUS M'AVEZ DIT: 'L'AMI DE MA SOEUR EST CURIACE.' ET QUE VOUS M'AVEZ DIT: 'ZOZO EST MON FILS.'
LA MACHINE. - PARCE QUE JE SAIS RAISONNER. ET QUE VOUS M'AVEZ DIT: 'JE SUIS HORACE.' ET QUE VOUS M'AVEZ DIT: 'L'AMI DE MA SOEUR EST CURIACE.' ET QUE VOUS M'AVEZ DIT: 'ZOZO EST MON FILS.'
MOI. SATAN EST MON AMI.
MOI. QUI EST L'AMI DE DIEU?
LA MACHINE. - DIEU!
LA MACHINE. - TOUT HOMME!
LA MACHINE. - VOUS!
LA MACHINE. - VOUS!
LA MACHINE. - CURIACE!
LA MACHINE. - HORACE!
LA MACHINE. - HORACE!
MOI. SATAN N'EST L'AMI DE PERSONNE.
LA MACHINE. - VOUS VOUS CONTREDISEZ. J'OUBLIE QUE VOUS M'AVEZ DIT: 'SATAN EST MON AMI.'
LA MACHINE. - VOUS VOUS CONTREDISEZ. J'OUBLIE QUE VOUS M'AVEZ DIT: 'DIEU EST L'AMI DE TOUT LE MONDE.'
MOI.
QUI -
LA MACHINE.
SATAN!
LA MACHINE. - SATAN!
LA MACHINE. - SATAN!
MOI. L'AMI DE LA SOEUR DE MA NIECE N'EST PAS MON AMI.
LA MACHINE. - C'EST TROP COMPLIQUE.
MOI. AU REVOIR!
LA MACHINE. - BONSOIR!
BIBLIOGRAPHY

(1) BATTANI G. et MELONI H., Interpréteur du langage de programmation PROLOG, Rapport de DEA, Groupe d'Intelligence Artificielle, UER de LUMINY, Université d'AIX-MARSEILLE, September 1973.

(2) BATTANI G. et MELONI H., Mise en oeuvre des contraintes phonologiques, syntaxiques et sémantiques dans un système de ..., Thèse de 3ème cycle, Groupe d'Intelligence ...

(3) BERGMAN M. et KANOUI H., Sycophante : Système de calcul formel et d'intégration symbolique sur ordinateur, Rapport de recherche, Groupe d'Intelligence Artificielle, UER de LUMINY, Université d'AIX-MARSEILLE, October 1975.

(4) COLMERAUER A., Les systèmes-Q ou un formalisme pour analyser et synthétiser des phrases sur ordinateur, publication interne n° 43, Département d'Informatique, Université de MONTREAL, September 1970.

(5) COLMERAUER A., DANSEREAU J., HARRIS B. et KITTREDGE, TAUM 71, Rapport annuel du projet de traduction automatique de l'Université de MONTREAL, January 1971.

(6) COLMERAUER A., KANOUI H., PASERO R. et ROUSSEL Ph., Un système de communication homme-machine en français, Rapport de recherche, Groupe d'Intelligence Artificielle, UER de LUMINY, Université d'AIX-MARSEILLE, June 1973.

(7) PASERO R., Représentation du français en logique du 1er ordre, en vue de dialoguer avec un ordinateur, Thèse de 3ème cycle, Groupe d'Intelligence Artificielle, UER de LUMINY, Université d'AIX-MARSEILLE, October 1972.

(8) KOWALSKI R. et KUEHNER D., Linear resolution with selection function, Artificial Intelligence 2, 1971.

(9) KOWALSKI R. et VAN EMDEN M., The semantic of predicate logic as programming language, JACM 23, n° 4, pp. 733-743, October 1976.

(10) ROBINSON J.A., A machine-oriented logic based on the resolution principle, JACM 12, n° ..., pp. 227-234, December 1965.
THE THEORY AND PRACTICE OF AUGMENTED TRANSITION NETWORK GRAMMARS
Mathematics
Massachusetts
and Newman
Mass.
02138 / USA
I. INTRODUCTION
eight
years
augmented language
transition
network
(ATN) and
understanding
systems
for both text and speech. and debug, and easy to interface
able to handle
to other
structures
computers,
by easily program a it
visualized computer
diagrams.
in order to write or use an ATN grammar. to learn about ATNs. in 1975]. English, as well. Although English have found ATNs useful the
description
examples
presented
will be useful
network
grammars et al,
although
appeared
independently 1969]. 2)
in earlier
summarized of
perspicuity, 4)
power,
efficiency regularities
representation,
the ability
and generalities,
and 5) efficiency
formalism
and
various content
unfortunately attempts
to bring together
the primary
of
those
sources
and to p r o v i d e
the reader
with e x a m p l e s for
styles wish
It is i n t e n d e d about ATNs to
I have recent to of
to describe
since
no one grammar
the ATN
formalism
and
by r e a d e r s 1970]
details
There
of the issues
and t r a d e o f f s
in w r i t i n g
ATN g r a m m a r s
variety
of p u r p o s e s
and i l l u s t r a t i o n s
constructions suggested
procedure
to f o r m u l a t e
an ATN grammar.
2. THE ATN FORMALISM
basic
transition
network
(BTN)
grammar is a
of finite labeled
diagrams; arcs,
each
labeled final
states.
the s y n t a c t i c
category,
allow the t r a n s i t i o n
to be m a d e
by the n e t w o r k of arcs)
beginning
state,
(sequence
can be followed
terminates
on a final
state.
The n e t w o r k permits
differs by
state label
automaton on some
in arcs
that to
it be a
recursion rather
nonterminal
than a t e r m i n a l
symbol.
That
of input
re-applying When
state.
such a r e c u r s i v e onto
encountered, is
computation
is pushed
constituent. stack is
in this
suspended
computation have
is
continued.
The
input in
pointer,
been moved
to a later point by
the level
sentence process.
accepted
constituent)
the lower
An attempt the
input p o i n t e r
is at the end of
that the s e n t e n c e
acceptable.
such
a basic
transition in
subsequent used. of
diagrams
States the
as c i r c l e s states
around
(label)
indicated is u s u a l l y states
by double obvious,
circles most
(the initial
state
grammar final
likely the
are shown
on any other in F i g u r e
later;
a transition
processing
CAT means
be of
indicated
category
(and
is taken), in the
that
a recursive
is to be made
indicated
Figure 1: A Small Grammar for Noun Phrases (network diagram not reproduced; its arc labels include CAT ADJ, CAT N and PUSH PP/)
followed
in this
paper
for
labeling
states
is that
generally
composed
of two parts
separated
by a slash.
part
indicates for
being
(a
phrase,
example); the
indicates or
through
occur that in
next.
in a h y p o t h e t i c a l it has been an an
parsing that
sentence
imperative;
NP/ADJ
either where
a conjunction (a name
is also used
that a POP,
the t e r m i n a t i o n
It is c r u c i a l l y only to
important the
meaning
writer but
s y s t e m does not use them as a n y t h i n g set of arcs coming from them. instead
...
REL/PRO,
the o p e r a t i o n that
of the parser.
A grammar
mneumonically will
accurate
or any other)
greatly and
clarify
the w r i t i n g
debugging
the grammar.
A context strong
BTN
grammar
as
described
above store
is
to a from
or a p u s h d o w n only in
automaton.
its i n a b i l i t y
to c h a r a c t e r i z e
unbounded
branching,
ART ADJ ADJ ADJ ... ADJ N.
The g r a m m a r grammar
of F i g u r e
I corresponds
to
the
following
context
free
productions):
ADJS? I an
NMODS? i a
N PPS?
i ... I ...
QUANT?
I all
I some
Both
grammars
can p r o d u c e
to accept
sentences
such
as
The new red law books. Each b e a u t i f u l Men with wives The t a l l e s t picture in the recent exhibit. (ambiguous)
in p r o f e s s i o n a l
careers.
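To make the idea of a basic transition network concrete, here is a Python sketch of a recognizer for noun phrases. The states and arcs follow the noun phrase grammar listed in Figure 5, but with all tests and actions left out, so this is a BTN, not an ATN. The tiny lexicon, the tuple encoding of arcs and the function names `ends` and `is_noun_phrase` are assumptions made for the illustration.

```python
LEXICON = {
    "the": {"ART"}, "new": {"ADJ"}, "red": {"ADJ"}, "professional": {"ADJ"},
    "law": {"N"}, "books": {"N"}, "men": {"N"}, "wives": {"N"},
    "careers": {"N"}, "with": {"PREP"}, "in": {"PREP"},
}

NETWORK = {
    "NP/":      [("CAT", "ART", "NP/ART"), ("JUMP", "NP/ART")],
    "NP/ART":   [("CAT", "QUANT", "NP/QUANT"), ("JUMP", "NP/QUANT")],
    "NP/QUANT": [("CAT", "ADJ", "NP/QUANT"), ("JUMP", "NP/ADJ")],
    "NP/ADJ":   [("CAT", "N", "NP/N"), ("CAT", "N", "NP/ADJ")],
    "NP/N":     [("PUSH", "PP/", "NP/N"), ("POP",)],
    "PP/":      [("CAT", "PREP", "PP/PREP")],
    "PP/PREP":  [("PUSH", "NP/", "PP/NP")],
    "PP/NP":    [("POP",)],
}

def ends(state, words, i):
    """Return every input position at which a POP can be reached from `state`."""
    result = set()
    for arc in NETWORK[state]:
        kind = arc[0]
        if kind == "POP":
            result.add(i)
        elif kind == "JUMP":                  # change state, consume nothing
            result |= ends(arc[1], words, i)
        elif kind == "CAT":                   # consume one word of a category
            if i < len(words) and arc[1] in LEXICON.get(words[i], ()):
                result |= ends(arc[2], words, i + 1)
        elif kind == "PUSH":                  # recursive call to a sub-network
            for j in ends(arc[1], words, i):
                result |= ends(arc[2], words, j)
    return result

def is_noun_phrase(text):
    words = text.lower().split()
    return len(words) in ends("NP/", words, 0)
```

Returning a set of end positions lets the recognizer follow every alternative without explicit backtracking, which is enough for an ambiguous phrase such as "men with wives in professional careers".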
of a
context
free rule
grammar [Woods,
in
expression
Thus a r e g u l a r
expression
1969]
as X ->
(A) B C* D can be r e p r e s e n t e d
in F i g u r e
Figure 2: Another Simple Network
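The rule X -> (A) B C* D can be written as a small transition table and run directly. The Python sketch below is an illustration only: the state names are invented, and the optional A is encoded by giving the start state two outgoing arcs, which may differ from the way the figure draws it.

```python
# One entry per state: the list of (symbol, next state) arcs leaving it.
NETWORK = {
    "X/":  [("A", "X/A"), ("B", "X/B")],   # A is optional
    "X/A": [("B", "X/B")],
    "X/B": [("C", "X/B"), ("D", "X/D")],   # C may repeat any number of times
    "X/D": [],
}
FINAL = {"X/D"}

def accepts(symbols, start="X/"):
    """Return True if the sequence of symbols leads from start to a final state."""
    states = {start}
    for sym in symbols:
        states = {nxt for s in states
                  for label, nxt in NETWORK[s] if label == sym}
        if not states:
            return False
    return bool(states & FINAL)
```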
free g r a m m a r s
are not
adequate is
addition, or
we have d e s c r i b e d strings; it
capable produce
accepting which
rejecting shows
cannot
structure
something
about the r e l a t i o n s h i p s
a m o n g the words.
To make and a
a more of
powerful
each
arc
is p r o v i d e d
with
test
actions,
producing an arc
an a u g m e n t e d must be
arc The
to be taken, actions
and the
arc
is traversed. case of
structures,
variables and
level are
grammar
Registers be
contents changed,
available
on s u b s e q u e n t of the
and
combined,
copied,
or added
to as more
input
is processed.
The of
arrangement input
of states sentences, a
and
arcs the
the
surface
structure
acceptable
and
permit be
to c r e a t e
which very
quite
surface
structure.
general deep
mechanism
capability
which
produce
structures and it m a k e s
the ATN
as those
of a t r a n s f o r m a t i o n a l in power to a T u r i n g
grammar, machine.
formalism
equivalent
We now an ATN
describe
in d e t a i l An arc
the
format
and as
operation
of the
arcs list
of of
is r e p r e s e n t e d be w o r d s schemas words or
themselves by the
lists. I.
of arcs
as shown
in T a b l e
(Capitalized are
actual
lower be
case
in b r a c k e t s and *
descriptions
defined zero or
below, more
is the K l e e n e of the
which
indicates
occurrences
previous
The
first
element second
of
each
arc
indicates on the
its type
type. of the
of the
be e x p l a i n e d be s a t i s f i e d occur in any
below. in o r d e r number is
element arc to
on all stored
arcs in
except
that
registers. as flags
constants of s t r u c t u r e . and/or
used
to be t e s t e d built
arcs)
or p i e c e s
structures item
are
register of the
of input
and/or
in the POP
dictionary. indicates
element of the
of every grammar
JUMP
is to be c o n s i d e r e d
next.
(CAT <category> <test> <action>* (TO <nextstate>))
(WRD <word> <test> <action>* (TO <nextstate>))
(MEM <list> <test> <action>* (TO <nextstate>))
(PUSH <state> <test> <pre-action>* <action>* (TO <nextstate>))
(VIR <constit-type> <test> <action>* (TO <nextstate>))
(JUMP <state> <test> <action>*)
(POP <form> <test>)

Table 1:
CAT
arc
may
be
taken
if
the c u r r e n t
is of the arc. A
category
specified
by the second
element
specifies
rather that
a WRD
which
element allow
eliminate
of a WRD arc to be a list of words.) input when they are taken, that
t h r e e of these cause
is, they
the input p o i n t e r
to be a d v a n c e d
A JUMP made
arc s p e c i f i e s "consuming"
is
to
be
without
a constituent action
of the named
list by a HOLD
of some
previous
(see below).
A network
PUSH which
arc
initiates
new,
recursive, which
begins A POP
marks of
terminal
element which
indicates
(usually the
structure)
is to be r e t u r n e d
result
of the
the
portion
of input to
parsed return
level arc
of the which
control
See Figure
3 for an and
a PUSH
Figure 3: The Operation of PUSH (diagram not reproduced; its arc labels include PUSH and POP)
When Whenever
a number register
of r e g i s t e r s list, along
are with
active. other
recursively register
(lower)
list.
is popped,
list and r e s t o r i n g
the r e g i s t e r
The c o n s t i t u e n t then
the second
the current
item
In m o s t structure
of
the
examples is
given
in
this
paper,
the
type trees
of of in by the
produced
grammar is
written
of e l e m e n t s
the root which may be either both a standard for further tree and
leaves
or subtrees.
4 shows
representation
w h i c h has been
formatted
clarity.
(tree diagram of the structure below not reproduced)

(S DCL
   (NP (ART AN)
       (ADJ OLD)
       (ADJ (NP (N MOUNTAIN) (NU SG)))
       (N LION)
       (NU SG))
   (TNS PAST)
   (VP (V CHASE)
       (NP (ART THE) (ADJ YOUNG) (N DEER) (NU SG/PL))))

Figure 4:
as
easily,
for
example,
((ACT: HIDE) (ACTOR: JOHN) (OBJECT: MONEY) (TIME: PAST) (LOCATION: FLOWER-POT))
(NP
,ANIM
semantic
representations:
(FOR:
ALL XI / (FINDQ:
structures,
stratificational
analyses,
as
5 shows
the details
of the
arcs
which
to
listing
file
using
LISP the
1967;
were written
below.
represented other in
element
elements and
Comments
upper
should that
Appendix
this
point.)
ATN parsers
and grammars
of p r o g r a m m i n g
and that
if one other
the notation. and LIFTR may are the basic extend have this been actions set. found of an are
SENDR,
Here
which
convenient. set to
the indicated
register
to be
value
form. This is like It is SETR except that to sets the last <reg> element (QUOTE named
<reg> not
<value>)
equivalent (NPCLAUSE))
(SETR a
HEAD
register
value
returned
by the function
NPCLAUSE;
(SETRQ HEAD
(NP/
  (CAT ART T
    (* T is a predicate which is always true)
    (SETR ART (BUILDQ ((ART *))))
    (TO NP/ART))
  (JUMP NP/ART T))

(NP/ADJ
  (CAT N T
    (SETR N *)
    (SETR NU (GETF NUMBER))
    (TO NP/N))
  (CAT N T
    (ADDL ADJS (BUILDQ (ADJ (NP (N *) (NU #))) (GETF NUMBER)))
    (TO NP/ADJ)))

(NP/ART
  (CAT QUANT T
    (SETR QUANT (BUILDQ ((QUANT *))))
    (TO NP/QUANT))
  (JUMP NP/QUANT T))

(NP/QUANT
  (CAT ADJ T
    (ADDR ADJS (BUILDQ (@ (ADJ) # (*)) (GETF DEGREE)))
    (* This will add the form (ADJ SUPERLATIVE root) for words like BIGGEST and the form (ADJ word) if uninflected.)
    (TO NP/QUANT))
  (JUMP NP/ADJ T))

(NP/N
  (PUSH PP/ (PPSTART)
    (* the test checks that the next word is a preposition)
    (ADDL NMODS *)
    (TO NP/N))
  (POP (BUILDQ (@ (NP) + + + ((N +)) ((NU +)) +) ART QUANT ADJS N NU NMODS)
    (DETAGREE)
    (* the predicate DETAGREE tests for agreement between the ART and N registers to screen out "a books", "an table")))

(PP/
  (CAT PREP T
    (SETR PREP *)
    (TO PP/PREP)))

(PP/PREP
  (PUSH NP/ (NPSTART)
    (* predicate fails if the next word cannot begin a NP)
    (SETR NP *)
    (TO PP/NP)))

(PP/NP
  (POP (BUILDQ (PP (PREP +) +) PREP NP) T))

Figure 5:
(NPCLAUSE))
sets the r e g i s t e r
to a list which
has
one
element,
(ADDL <reg>
<form>) to
This
action
takes
contents
(which
is of to
expected
be a list)
of the named
the form to the left end of the list, this new list. sets it to If the r e g i s t e r a list which
ADDL It is
contains (CONS
form.
equivalent
to (SETR <reg>
(EVAL
(GETR <reg>))).
(ADDR
<reg> adds
<form>) elements to
action the
is e x a c t l y of the
that It
it is
right
equivalent <form>)))).
(APPEND
(GETR
(LIST
(EVAL
are useful
like a d j e c t i v e s
and conjuncts.
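The register actions can be pictured with a small Python sketch. It is an assumption-laden illustration, not the parser's implementation: registers are held in a dictionary, an unset register reads as None (standing in for NIL), and the class and method names are hypothetical. ADDL puts the new value at the left end of the list and ADDR at the right end; either one starts a fresh one-element list when the register is empty.

```python
class Registers:
    """A toy register bank for one level of an ATN computation."""

    def __init__(self):
        self._regs = {}

    def setr(self, reg, value):
        self._regs[reg] = value

    def getr(self, reg):
        return self._regs.get(reg)            # None plays the role of NIL

    def addl(self, reg, value):
        # New value goes to the LEFT end of the existing list.
        self._regs[reg] = [value] + list(self._regs.get(reg) or [])

    def addr(self, reg, value):
        # New value goes to the RIGHT end of the existing list.
        self._regs[reg] = list(self._regs.get(reg) or []) + [value]


regs = Registers()
regs.addr("ADJS", ("ADJ", "OLD"))
regs.addr("ADJS", ("ADJ", "RED"))
regs.addl("ADJS", ("ADJ", "BIG"))
```

Accumulating adjectives with ADDR keeps them in the order in which they were read, which is why that action suits sequences such as adjectives and conjuncts.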
(SENDR
<reg>
<form>) It
This
is a p r e - a c t i o n
which
is only used on
PUSH
arcs.
causes
the r e g i s t e r
level of r e c u r s i o n allows
about
in effect,
the lower
(LIFTR
<reg>
This
is the i n v e r s e
of SENDR
register current
value
<form>)
This
places
the
indicated
form
on list
the is a
as a c o n s t i t u e n t is
of c o n s t i t - t y p e . at all
The HOLD
list w h i c h with
accessible
levels.
This
action with
together
VIR
arcs c o n s t i t u t e left
a mechanism in
for d e a l i n g
called
extraposition
transformational
Examples
of this will
be given below.
Sometimes
it is useful its
to have
a second
test
on
an If on
VERIFY
evaluates
a predicate. condition
the arc is a b o r t e d
the
had failed.
is implemented,
one may do
of d i f f e r e n t
<form>s
may be used w i t h i n
an action.
The GETF, it as
implementation are e v a l u a t e d
forms
follows:
word of input
as it
appears
before
has been p e r f o r m e d
on it.
* refers arc
to the c u r r e n t it
On a JUMP,
POP, WRD,
or
MEM
On a CAT arc it is the root * = for arc "stop". the it On test is a and the
word on the
pre-actions, value
returned
lower
level
computation
w h i c h was
initiated
(GETR <reg>)
returns
contents set,
of the
indicated
register.
been
it r e t u r n s
an error.
<word>) value
checks of word
word
and is
the
(GETF
SG or PL.
features
without
values;
(GETF PASSIVE)
on w h e t h e r
word
argument
where
are d e p e n d e n t
on the s y n t a c t i c
<form>*)
is a c o n s t r u c t o r of structure of forms.
that
takes
template and of
fragment
constants a piece
special
from s u b s t i t u t i n g in the t e m p l a t e .
special
The s p e c i a l m a r k s
+ expects
the c o r r e s p o n d i n g
form to be a r e g i s t e r ~o be s u b s t i t u t e d
causes
the c o n t e n t s # causes
of the r e g i s t e r
form to be e v a l u a t e d
as a LISP
form and
substitutes
for the #.
* causes
the value of * to be s u b s t i t u t e d . form. form be form. if the DET r e g i s t e r (@ x x ... appended x) and together.
It
does
not
need
causes It
the does
lists not
which take a
contained
the list
(DET then
THE) the
current
"books"
(BUILDQ
(NP + (N *) (NP
would return
structure
(DET THE)
(N BOOK)
If in a d d i t i o n were set to
the form
(BUILDQ
(@ (NP +) the
(NU #)))
(GETF
would
produce
(ADJ OLD)
(ADJ RED)
(N BOOK)
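The behaviour of BUILDQ can be imitated in a few lines of Python. This sketch rests on several assumptions: templates are nested Python lists, the argument matching a + is a register name looked up in a plain dictionary, the argument matching a # is a zero-argument callable standing in for a LISP form, and an empty register contributes nothing under @. The function name `buildq` and its parameters are hypothetical.

```python
def buildq(template, args, registers, star=None):
    """Instantiate `template`, consuming `args` from left to right."""
    pending = list(args)

    def build(node):
        if node == "+":                   # contents of the named register
            return registers.get(pending.pop(0))
        if node == "#":                   # value of an evaluated form
            return pending.pop(0)()
        if node == "*":                   # the current constituent
            return star
        if isinstance(node, list):
            if node and node[0] == "@":   # append the built sub-lists
                out = []
                for part in node[1:]:
                    out.extend(build(part) or [])
                return out
            return [build(part) for part in node]
        return node                       # any other atom is a constant

    return build(template)


registers = {
    "ART": [["ART", "THE"]],
    "ADJS": [["ADJ", "OLD"], ["ADJ", "RED"]],
    "N": "BOOK",
    "NU": "PL",
}
# The POP template of state NP/N in the noun phrase grammar.
np = buildq(["@", ["NP"], "+", "+", "+", [["N", "+"]], [["NU", "+"]], "+"],
            ["ART", "QUANT", "ADJS", "N", "NU", "NMODS"], registers)
```

Because the QUANT and NMODS registers are empty here, they simply disappear from the appended result, which suggests why registers such as ART are stored as one-element lists.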
(ABORT) the
causes
failed.
in c o n d i t i o n a l
used
as actions.
(COND
(<pred1> (<pred2>
<e11> <e21>
<e12> <e22>
(<predi>
<ell>
<ei2>
... <eij>))
This The
a nested
conditional
statement. I for
actions
or forms.
See A p p e n d i x
details.
(QUOTE <value>) This is a LISP function which keeps value from being evaluated. See SETRQ above for an example.
useful
to m e n t i o n
a few of
the
(or in V E R I F Y
or COND
common and
predicates EQUAL,
functions
than to invent
(NULLR
<reg>)
is true
been
set to NIL.
(CHECKF
<feature> word to
<value>)
succeeds
for
the is
current
is the i n d i c a t e d (EQUAL
ROLE OBJ)
equivalent
(GETF ROLE)
(QUOTE OBJ)).
(CATCHECK cat.
<word>
<cat>)
word
is of the lexical as in
category (CATCHECK
It is useful
in r e g i s t e r s ,
(GETR V)
(QUOTE MODAL)). succeeds only if there are no more words in the input
(ENDOFSENTENCE) string.
(x-AGREE some
<form> sort
<form>) of
represents
a family of the
predicates e.g.,
to
check
agreement etc.
between
forms,
DET-N-AGREE,
SUBJ-V-AGREE,
(x-START)
represents on JUMP
set
of
"look-ahead
predicates"
which
are
useful
For example,
IMP-START
there were
sentence; began
similarly
with a q u e s t i o n
consant as the
false, arc
this (since by
does
not
appear
by be
function
fails.
T is a LISP any
constant
which
true;
however,
since
non-NIL
value
context, (GETR
SUBJ)
used
contents
tests w h e t h e r
register
has been
To grammar
test
his
understanding,
the
reader
to m o d i f y nouns,
the
of F i g u r e s
I and 5 to include
pronouns,
and/or
possessives.
2a. A Detailed Example
In
this
section
we p r e s e n t
a more
complex
grammar
the p r o c e s s i n g which,
of an actual with
sentence.
Figures
together
and p r e p o s i t i o n a l number of
of F i g u r e s English [Woods,
I and 5, can
large
fairly
(This g r a m m a r
is an e l a b o r a t i o n it can
of that given
of the s e n t e n c e s
parse are:
The girl on the red bus was w a n t e d police. Will a boy scout help an old woman
in s e v e r a l
countries
by the
to cross
The m a y o r w o u l d
not have w a n t e d
to be e l e c t e d
trying
burning
of
the
is fairly to those
this
section
into u n d e r s t a n d i n g
the f o l l o w i n g
explanation
as was put
into w r i t i n g
it.
Let
us consider
the p a r s i n g
of the
sentence of
"The mayor
have w a n t e d give an
to be e l e c t e d
to the p o s i t i o n
dog-catcher."
overview attention
special
in state
S/ we look
to serve, is
perhaps we
one
found,
grammar
determiners
If we cannot
a noun
we assume Of
question.
details
the parse
(it w o u l d
good e x e r c i s e
but we assert
arc s u c c e e d s from is
(NP (DET T H E ) ( N
MAYOR)(NU
is r e t u r n e d
The TYPE
register
Figure 6: A Grammar for Sentences (network diagram not reproduced; its arc labels include PUSH NP/, CAT V and JUMP)
DCL to i n d i c a t e
that the s e n t e n c e
is
declarative,
to S/NP.
In state S/NP the CAT V arc cannot be taken with the word "would" in the input, but the CAT MODAL arc succeeds. The MODAL register is set to ((MODAL WILL)), using the root form "will". (The reason for the extra parentheses will be clear later.) The tense is recorded in the TNS register in the form (TNS PAST).
In state S/AUX we pick up the negative. ensures negatives that the arc has not been taken The NEG r e g i s t e r
The before,
test
on
this
arc
thus m a k i n g
double
unacceptable.
is being
for the test and as a piece of s t r u c t u r e the parse structure. to undo The test for
to be later "do"
necessary meet
the effect
of d o - s u p p o r t
in s e n t e n c e s to
it would be n e c e s s a r y
action not
it from the V
the V r e g i s t e r register.
"do"
(S/
  (PUSH NP/ T
    (SETR SUBJ *)
    (SETRQ TYPE DCL)
    (TO S/NP))
  (JUMP S/NP T
    (SETRQ TYPE Q)))

(S/NP
  (CAT V (GETF TNS)
    (SETR V *)
    (SETR TNS (BUILDQ (TNS #) (GETF TENSE)))
    (SETR PNCODE (GETF PNCODE))
    (TO S/AUX))
  (CAT MODAL T
    (SETR MODAL (BUILDQ ((MODAL *))))
    (SETR TNS (BUILDQ (TNS #) (GETF TENSE)))
    (TO S/AUX)))

(S/AUX
  (CAT NEG (NULLR NEG)
    (SETRQ NEG (NEG))
    (COND ((EQUAL (GETR V) (QUOTE DO)) (SETRQ V NIL)))
    (TO S/AUX))
  (JUMP VP/V (AND (GETR SUBJ) (AGREE (GETR SUBJ) (GETR PNCODE))))
  (PUSH NP/ (NULLR SUBJ)
    (COND ((NOT (AGREE * (GETR PNCODE))) (ABORT)))
    (SETR SUBJ *)
    (TO VP/V)))

(VP/V
  (CAT V (AND (GETF PASTPART) (EQUAL (GETR V) (QUOTE BE)))
    (HOLD (QUOTE NP) (GETR SUBJ))
    (SETRQ SUBJ (NP (PRO SOMEONE)))
    (SETR AGFLAG T)
    (SETR V *)
    (TO VP/V))
  (CAT V (AND (GETF PASTPART) (EQUAL (GETR V) (QUOTE HAVE)))
    (ADDR TNS (QUOTE PERFECT))
    (SETR V *)
    (TO VP/V))
  (CAT V (AND (GETF UNTENSED) (GETR MODAL) (NULLR V))
    (SETR V *)
    (TO VP/V))
  (CAT V (AND (GETF PRESPART) (EQUAL (GETR V) (QUOTE BE)))
    (ADDR TNS (QUOTE PROGRESSIVE))
    (SETR V *)
    (TO VP/V))
  (JUMP VP/HEAD T
    (COND ((OR (GETR MODAL) (GETR NEG))
           (SETR AUX (BUILDQ ((@ (AUX) + +)) MODAL NEG))))))
(VP/HEAD
  (JUMP VP/VP (GETF INTRANS (GETR V)))
  (PUSH NP/ (GETF TRANS (GETR V))
    (SETR OBJ *)
    (TO VP/OBJ))
  (VIR NP (GETF TRANS (GETR V))
    (SETR OBJ *)
    (TO VP/OBJ))
  (WRD TO (AND (GETF SCOMP (GETR V)) (NULLR AGFLAG))
    (SETR SPECIALSUBJ (GETR SUBJ))
    (TO VP/TO)))

(VP/OBJ
  (JUMP VP/VP T)
  (WRD TO (GETF SCOMP (GETR V))
    (SETR SPECIALSUBJ (GETR OBJ))
    (TO VP/TO)))

(VP/VP
  (PUSH PP/ T
    (ADDR VMODS *)
    (TO VP/VP))
  (WRD BY (GETR AGFLAG)
    (SETR AGFLAG NIL)
    (TO VP/BY))
  (POP (COND ((GETR OBJ)
              (BUILDQ (S + + + (@ + (VP (V +) +) +)) TYPE SUBJ TNS AUX V OBJ VMODS))
             (T
              (BUILDQ (S + + + (@ + (VP (V +)) +)) TYPE SUBJ TNS AUX V VMODS)))
    T))

(VP/BY
  (PUSH NP/ T
    (SETR SUBJ *)
    (TO VP/VP)))

(VP/TO
  (PUSH COMP/ T
    (SENDR SUBJ (GETR SPECIALSUBJ))
    (SENDR TNS (TENSEOF (GETR TNS)))
    (SENDRQ TYPE COMP)
    (SETR OBJ *)
    (TO VP/VP)))

(COMP/
  (CAT V (GETF UNTENSED)
    (SETR V *)
    (TO VP/V)))

Figure 7:
back
to
our
analysis,
again
the input
pointer
positioned
"wanted".
have
something
in the SUBJ r e g i s t e r
can f o l l o w the JUMP arc to state VP/V. PUSH NP/ arc would try to
find one,
strike
thirteen?"
the sentence,
by the time
state
VP/V
is
reached
the
(or modal)
In this state
of the c o m p l e x arcs
verb a u x i l i a r y
is processed.
i m p l y i n g that
the c o n d i t i o n s be
impossible.
passivized it places it
list as a NP so We invent
probably
a dummy
subject
AGFLAG, the
in a b y - p h r a s e
sentence This
("Juliet ability
by Romeo.")
dummy. it on
to put i n f o r m a t i o n context
in r e g i s t e r s
remove of
features and
looks
like
later it can
of input, the
be t h o u g h t of as p o s t p o n i n g is reached.
differential
The verb be a
second form as
CAT of in
requs current
participle, remembers
killed"
the effect
"have"
by a p p e n d i n g
PERFECT, by the
and r e p l a c e s
the "have"
in the V r e g i s t e r
The untensed
third
CAT
arc by
may be t a k e n only a
only
word go
is to
modal,
Washington?" handles
emulate
present
participles
which and
sentence
losing?"
w o u l d now be
(TNS P R E S E N T
PROGRESSIVE).)
it w o u l d
be p o s s i b l e
to c o l l a p s e
all
four
CAT
arcs
one arc w h i c h
and a c o m p l i c a t e d
conditional
action. be
That w o u l d less
be s l i g h t l y This
efficient of than
clear.
method
processing
may seem
but it is s i m p l e r constraints.
a set of c o n t e x t
free rules to e x p r e s s
sentence,
the
third register
and
second
arcs
would
be
in the
following
settings:
(NP (DET THE) DCL ((MODAL (NEG) (TNS PAST WANT WILL))
(N MAYOR)
(NU SG))
PERFECT)
Finally creates so
only
the
JUMP
may be taken.
This are
arc set,
an AUX r e g i s t e r sample
or NEG r e g i s t e r s NEG))
for our
sentence
(MODAL WILL)
in the AUX
register.
(This is a good e x a m p l e
we k n o w that is
the head
is in to
If the verb
intransitive
directly There
We can look
for an o b j e c t by p u s h i n g
to get an object:
directly
or by p i c k i n g on the HOLD
noun
phrase
sentence
was p a s s i v e
three of
can be taken
in our example,
but we will
postpone
discussion
last arc w h i l e d e s c r i b i n g
either
with or w i t h o u t
a direct
object,
we
"to"
indicate
of the action.
to p r e v e n t
finding phrases
of p r e p o s i t i o n a l with
modifying
interspersed
the p r e p o s i t i o n a l
phrases.
We now
postpone above.
discussion
Returning word is "to" marked to our sample sentence Since in state VP/HEAD, the verb "want" we find the
string.
dictionary
feature we are
SCOMP, not
and WRD
since
it
appear looks
final tree. a
declarative in the
structure. may be
the
subject
swallower have
liked
to eat greasy
food".
the
verb,
anything
which
could occur
in the of
verb p h r a s e the
of a sentence, network
it is c o n v e n i e n t
to m e r g e
the end
complement
network
as shown
Figure 8: Merging Networks
in F i g u r e to
8.
double
use of a p o r t i o n
of
work
be sure that the r e g i s t e r s are set on both initial has t h r e e at the of our of
To a c c o m p l i s h which
initialize
registers path
from the c u r r e n t is
level.
example,
register
initialized
contents this
SPECIALSUBJ,
If we had r e a c h e d be s e n d i n g
state to
via the WRD TO arc from VP/OBJ, become the lower level
subject
("Alice w a n t e d
to s p e a k , " of
"The p r o d i g y
was e x p e c t e d
Notice
movement
,'the
prodigy"
from
the
TENSEOF
we are
find
ourselves set; on
in
a lower
in state before in a
the registers
the stack and are completely (since "be" is untensed) registers set:
inaccessible. results
Taking the CAT V arc from COMP/ transition SUBJ: TYPE: TNS: V: Now SOMEONE)). to VP/HEAD
to VP/V with the following (NP (ART THE) COMP (TNS PAST) BE the first replace CAT the (N MAYOR)
(NU SG)
We put the mayor on the the dummy (NP (PRO and we take the JUMP arc Here the VIR arc removes NOTE:
the mayor from the HOLD list and puts him in the OBJ register. we did not VP/VP VIR arc. really need to use the HOLD list here. register, power be say EXTRANP, OBJ of the (GETR EXTRANP)(SETR The which real must
into another
picked
In VP/VP we push for the prepositional phrase "to the position of dog-catcher" and place the resulting structure in the VMODS register. The registers are now:

SUBJ: (NP (PRO SOMEONE))
TYPE: COMP
TNS: (TNS PAST)
AGFLAG: T
V: ELECT
OBJ: (NP (ART THE) (N MAYOR) (NU SG))
VMODS: ((PP (PREP TO) (NP ...)))

Finally the POP arc is taken. The BUILDQ creates the structure (details left to the reader)
(S COMP
   (NP (PRO SOMEONE))
   (TNS PAST)
   (VP (V ELECT)
       (NP (ART THE) (N MAYOR) (NU SG))
       (PP (PREP TO)
           (NP (ART THE) (N POSITION) (NU SG)
               (PP (PREP OF)
                   (NP (N DOG-CATCHER) (NU SG)))))))

This structure is returned to the configuration we were in at the time of the PUSH from state VP/TO, and the remaining actions on that arc place the structure in the OBJ register. Since there are no more words in the input string, the PUSH PP/ and WRD BY arcs cannot be taken, so the POP arc creates the following structure to return as the value of the successful parse:

(S DCL
   (NP (ART THE) (N MAYOR) (NU SG))
   (TNS PAST PERFECT)
   (AUX (MODAL WILL) NEG)
   (VP (V WANT)
       (S COMP
          (NP (PRO SOMEONE))
          (TNS PAST)
          (VP (V ELECT)
              (NP (ART THE) (N MAYOR) (NU SG))
              (PP (PREP TO) (NP ...))))))
which in tree form is:
[Tree diagram of the structure above; not recoverable from the source]
In top down,
this discussion
viewing
the parsing
process
as a
where,
any arc whose of
may be taken.
for purposes The
state.
next
other parsing
algorithms
ATN grammars.
are in order
about the
inadequacies there
of the
sample
For passive a by-phrase
sentences, can really
is no check that
subject. grounds.
(Consider,
was killed
by Brutus"
"They be
will
be married were
by the
time the
the
clock
ten.") be ruled
If
check subject
made
it could out. in a to
incorrect
like
mountains outside
worn
down
to a function
the parser
could
complement cases.
structure
in this cause
grammar their
is also subject
limited rather
to the than
Some verbs
should
object
to be sent down:
friend
to leave.
(obj) (subj) may be tested This to on the WRD TO grammar screen will out the
to leave.
in VP/OBJ accept
to determine various
down. be
badly
sentences,
"The house
only
of c o m p l i c a t i n g
for this
question handle
answering forms
system
grammar
some
of anaphoric
reference, and
ellipsis, clauses,
several
types
reduced
relative
parenthetical syntactic
comparatives,
a number
of other
grammars
large such
enough as
useful who
large
to be in
this. of large
interested consult
listings
grammars
report
contains
of the grammar
in Figure
descriptions
actions
Leal
[Leal, in a
1975]
based which
tagmemic of the
theory large
is written grammar
semi-LISP
particularly
for non-LISP-users
to follow.
Bates
a listing
syntactic 1975]
used of
system grammar
same
system
et al.,
Anyone
[Figure 9: The Grammar for the LUNAR System (network diagram not recoverable from the source)]
language [Grimes,
spoken 1975]
Republic given in by
of Cameroun)
be interested
followed
one produces
structures
and the
regular
2b. Formal Properties of ATNs
It unwieldy [Woods,
be
shown to
that a
the
ATN
formalism It
above are
is
equivalent
in power that
Turing formal
However, direct
inefficient
been and
it is possible to optimize an
recursion optimization
and
network
using
finite
techniques. algorithm for context in free recognition using [Earley, 1970] clever of and K
parallel,
of merging bounded
paths.
strings of the
amount
n is the length
of proportionality the input);
dependent
grammar time is
grammars
bounded
by n^2, are
the bound to
interesting
things
note grammars
First,
it works form)
free
achieves having
the
smaller
for the
or LR(k)
an ATN grammar
algorithm
which maintains Simple
bounds.
removing may be
optimization
constant
proportionality
2c. Modifications to the Grammar Formalism
applications, (necessitating
changes
may easily
be
made
in
the
corresponding
changes
in the parser).
The following changes have been incorporated in various ATN grammars.
1. after
on arcs.
A number either
can be placed
on every
arc,
just
to indicate
information can
the
parser
use
information first.
to score
alternative
in order
to pursue
be to have
an action
on the arc w h i c h to
calculates
available)
a score
incorporate
Actions contents
on
POP
arcs.
Such
actions
could
to the next
level or to ready
registers
If actions
each arc entering
state.
3. 1975b, free
Factorization 1976]
of tests.
speech
parser
[Bates,
(testing
of the current register
context-sensitive
contents).
4. allows which
of
arcs.
The Burton
grammar-compiler to be grouped
of the
that
whenever
fail, up
to be examined
back
taken. so
example, This
a set of mutually
grouped.
is purely
an efficiency
measure.
5.
New
actions.
Because
feature If this
of LISP, feature
as an action
on an
fixed of
be known
combination to his
specify If the
particular to be
before another to be
interfaced
interpretation
routine,
it is convenient
program
on an arc.
results further
of the interpretation parsing.
may be used to
3. PARSERS FOR ATN GRAMMARS
an
of top for
parsing up,
bottom
driven,
grammars.
In fact,
almost
context
free
algorithms
to ATNs
by providing the tests
along register
and to perform
can be tailored
aids,
requirements
of the application
be used.
described
below have
been
implemented in a high
in
is no reason why one could not be written as PL/I, ALGOL, should the PASCAL, be or BCPL, or even or at
level
in an assembler least able to
The language recursion,
recursive to
and
ability
evaluate
a portion
of the though
as if it were
a part of a program feature.
is a very desirable,
not absolutely
necessary,
3a. ATN Interpreters
parsers like
which were written interpreters
for ATNs
to the
as data. interpreter
to build to a
grammar
similar
a context tried
free grammar. in which
the arcs
in the order to
This
grammar
deliberately
and thus
paths
are tried.
cells.
ATN parser
[Woods,
1970.
1973]
is
more
versatile
It is based
information
the
alternative
alternatives
be done of the
An alternative
consists
current state,
state
the arcs remaining
to be tried
at that it
the register
list is used, it is
and
remember
arcs which
level.
at several one
remembers
after "backs
alternative to try.
Another
type of alternative of a
is ambiguity
there
is ambiguity there
about the
word like
(This happens which
in cases
where
are compound
market"
possibly
should
be collapsed.
One may think configuration current register follow process more more LUNAR to
process
as one of
moving
from
one
is a snapshot state,
input
stack
processing
of input in
several
the process
of creating
from a given
configuration. functions
To make
of the parser
PARSER
is
called
with
the
input
string
as
its empty
configuration the
(initial LEXIC
state, to
function
according
to dictionary
information,
etc.
to compute
a new set of configurations
from each
configuration.
STEP, (usually
a configuration uses
and attempts
to follow
state, in
are shown
flowchart 1972]).
form in Figure
(These
flowcharts
[Woods et al,
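The idea of stepping from one configuration to the next can be illustrated with a very small interpreter. The following Python sketch is not the PARSER/LEXIC/STEP implementation described in this section; the toy grammar, the lexicon, and the function names walk and parse are assumptions made for illustration, registers and arc conditions are omitted, and backtracking is provided by Python generators rather than by an explicit list of alternatives.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy lexicon and network, chosen only for this sketch.
LEXICON = {"the": {"ART"}, "mayor": {"N"}, "dog": {"N"},
           "elected": {"V"}, "barked": {"V"}}

GRAMMAR: Dict[str, List[tuple]] = {
    "S/":     [("PUSH", "NP/", "S/NP")],
    "S/NP":   [("CAT", "V", "S/V")],
    "S/V":    [("PUSH", "NP/", "S/OBJ"), ("POP",)],
    "S/OBJ":  [("POP",)],
    "NP/":    [("CAT", "ART", "NP/ART"), ("JUMP", "NP/ART")],
    "NP/ART": [("CAT", "N", "NP/N")],
    "NP/N":   [("POP",)],
}

def walk(state: str, words: List[str], pos: int,
         found: tuple) -> Iterator[Tuple[tuple, int]]:
    """Yield (structure, next position) for every path that reaches a POP."""
    for arc in GRAMMAR[state]:            # arcs are tried in grammar order
        kind = arc[0]
        if kind == "POP":
            # Label the constituent with the network name, e.g. "NP/N" -> "NP".
            yield (state.split("/")[0],) + found, pos
        elif kind == "JUMP":              # change state without consuming input
            yield from walk(arc[1], words, pos, found)
        elif kind == "CAT" and pos < len(words):
            if arc[1] in LEXICON.get(words[pos], set()):
                item = (arc[1], words[pos])
                yield from walk(arc[2], words, pos + 1, found + (item,))
        elif kind == "WRD" and pos < len(words):
            if words[pos] == arc[1]:
                yield from walk(arc[2], words, pos + 1, found + (words[pos],))
        elif kind == "PUSH":              # recursive call on a sub-network
            for sub, after in walk(arc[1], words, pos, ()):
                yield from walk(arc[2], words, after, found + (sub,))

def parse(sentence: str) -> List[tuple]:
    """Return every structure that consumes the whole input."""
    words = sentence.lower().split()
    return [tree for tree, end in walk("S/", words, 0, ()) if end == len(words)]
```

Calling parse("the mayor elected the dog") follows the PUSH arc for the subject, the CAT V arc, and a second PUSH for the object; the POP arc at S/V is also tried, but its result is discarded because input remains.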
[Figure 10a: The Function PARSER (flowchart not recoverable from the source)]
[Figure 10b: The Function LEXIC (flowchart not recoverable from the source)]
[Figure 10c: The Function STEP (flowchart not recoverable from the source)]
When STEPed, a
PARSER
runs
out
of
active is
function
called
DETOUR
If nothing special has been done to give the DETOUR will pickup alternatives in the
weights,
This parser has the advantage of allowing actions on the arcs influence the order in which alternatives are chosen. the search
to
that it is not necessary to completely exhaust following all possible parse paths; by
careful
ordering
of the the
alternatives, others to be
the most likely parsing can be found first, found later if necessary. are ambiguous, alternatives.
leaving
the alternatives implicit in the factoring and merging help to reduce the number of alternatives which must
of the grammar,
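One way to picture a weighted store of alternatives is as a priority queue from which the most promising alternative is resumed first. The Python sketch below is an assumption-laden illustration, not the DETOUR function itself: the class name Alternatives, the numeric weights, and the rule that equal weights are resumed in creation order are all choices made for the example.

```python
import heapq
import itertools
from typing import Any, List, Tuple

class Alternatives:
    """Agenda of suspended alternatives, highest weight first (hypothetical)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, Any]] = []
        self._order = itertools.count()   # tie-breaker: creation order

    def add(self, alternative: Any, weight: float = 0.0) -> None:
        # heapq is a min-heap, so the weight is negated to pop the best first.
        heapq.heappush(self._heap, (-weight, next(self._order), alternative))

    def detour(self) -> Any:
        """Return the most promising alternative, or None if none remain."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

With such an agenda an action on an arc could influence the search order simply by supplying a larger weight when the alternative is created.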
This
parser
has
also
proved
to
be
flexible
testbed for on
modifier
(compiled in LISP) using the grammar of Figure 9 could sentences in about 2 seconds of cpu time on a PDP-10.
parse
Several mentioned.
important
efficiency
The first is the problem of storing register contents in a Although only a few
to be used on any one arc, the total number of quite large. A representation
registers to be copied would be wasteful of both space and time. problem can be solved by representing a register list name/value pairs. as a list
The function GETR searches the list from front to and returns the
back for the first occurrence of the named register associated value. SETR does
not change the name/value pair in the front a new name/value pair.
will effectively hide from GETR any old pair with the particular
Thus to preserve the entire register list at a a pointer to the current front
remembered. remember
Other processes can add new information to the their new front pointer. Thus the on
only
information
process makes
path than
faster
registers
are generally
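The name/value representation of the register list described above can be sketched in a few lines of Python. The function names getr and setr mirror GETR and SETR, but the tuple-based linked list, the use of None for an unset register, and the sample register contents are assumptions of this sketch.

```python
from typing import Any, Optional, Tuple

# A register list is an immutable linked list of (name, value, rest) cells;
# None is the empty list.
RegList = Optional[Tuple[str, Any, "RegList"]]

def setr(regs: RegList, name: str, value: Any) -> RegList:
    """Add a new pair at the front; the old list is shared, never modified."""
    return (name, value, regs)

def getr(regs: RegList, name: str) -> Any:
    """Search front to back and return the first value found for name."""
    while regs is not None:
        if regs[0] == name:
            return regs[1]
        regs = regs[2]
    return None  # unset registers read as None in this sketch

# Saving a configuration only requires remembering the front pointer.
saved = setr(setr(None, "SUBJ", "the mayor"), "TNS", "PAST")
current = setr(saved, "SUBJ", "someone")   # hides the older SUBJ pair
```

Because the older pairs are never changed, the value of saved still describes the earlier configuration after current has been extended, so nothing has to be copied when an alternative is stored.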
parsing is likely At
particularly
in
backup
well-formed-substring just found together the WFST. see whether current directly level. a
table with
(WFST).
every of input
the portion
a PUSH input.
of the desired
instead taken
of redoing
slightly,
constituent
matches parsed.
the constituent
originally
situation
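A well-formed-substring table can be thought of as a lookup table consulted before each PUSH. The Python sketch below assumes that a constituent is identified by the sub-network state and the input position at which it starts; in a grammar that sends register values down to the lower level, those values would presumably have to be part of the key as well. The names push_with_wfst and toy_np_parser are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Key: (sub-network start state, input position). Value: every
# (structure, end position) pair found by parsing from that point.
Result = Tuple[object, int]
WFST = Dict[Tuple[str, int], List[Result]]

def push_with_wfst(state: str, pos: int, wfst: WFST,
                   parse_from: Callable[[str, int], List[Result]]) -> List[Result]:
    """Look a constituent up before re-parsing it; store it afterwards."""
    key = (state, pos)
    if key not in wfst:
        wfst[key] = parse_from(state, pos)   # the expensive recursive parse
    return wfst[key]

# Usage: the second PUSH at the same place is answered from the table.
calls = []

def toy_np_parser(state: str, pos: int) -> List[Result]:
    calls.append((state, pos))
    return [(("NP", "the", "mayor"), pos + 2)]

table: WFST = {}
first = push_with_wfst("NP/", 0, table, toy_np_parser)
second = push_with_wfst("NP/", 0, table, toy_np_parser)
```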
3b. Inside-Out Parsers
Two
parsers
have
been are
for
an
speech
system
which
preceding
They
illustrate
the extraordinary where
flexibility
of ATNs with
to the environment
cannot
disputable
effects
(why choose,
shoes),
function
in context
induced attempt in
process. system
it is not necessarily
left
to right; the
it may be better long content ends. In and about what word this can be these of
begin the
a reliably provide
identified out to
to a portion
of the
processed,
expectations subsequent
system
in its analysis
of the utterance.
these b,
in mind, could
a parser
was
developed
[Bates,
start
parsing
anywhere as to
despite at
each point
As partial so
in tables
that
them
need to reparse paths
sections complete
of input.
a WFST
for partial the
rather could
constituents.) about
grammar,
parser
of lexical
classes
that could be used If
a sequence
either small
between
words
of words could
be modified of
fairly backup,
to allow experimentation and parallel search.
combinations
sequential, and
a combination following
of depth-first
breadth-first
techniques,
into parallel to interact (notably
desirable.
components guidance
system
and to
verify
completed
constituents.
the parser
set up
an
index nouns
into could
consume of the
be easily were
located.
Then,
utterance
presented
to the parser,
it would
retrieve
By adding onto to make
the right and the g r a m m a r "island" utterance. maintenance very general of words Careful of could use
predictions until it
to the
spanned
of
table paths
numerous
alternate
allowed
middle-out
parsing
algorithm.
faster, [Woods
more et al,
efficient 1976].
parser
for
the
strategy.
It did less
interpretation because
system
it pre-processed states.
relations
between
holds
from state
SI to
arrays
the grammar left-to-right
in a bi-directional way it is written.
way rather
than the in
The sets of relations
were
used
by
the
parser
to the L 1970]
algorithm similar
PUSHes was used. This parser all possible had the advantage paths of being able to follow efficiently thus eliminating the need to
partial
in parallel,
checking
setting,
to be made
predictions
tightened
information
added to the utterance. The only accommodation this that the grammar writer needed to make contents taken to
state or states where the original (These context small "scope" price to declarations pay for the were is present to perform
setting
have
predictions.
3c. A Grammar-Compiler
A system has been written Woods, 1976] which produces In this 11.
by R. Burton
[Burton,
1976;
Burton
and
an extremely
a compiled
program which is
is the in
parser/grammar
is schematically program
illustrated shown
The operation
resulting [Burton,
are from
1976].)
The
function which
is produced
in which the state names of the grammar and actions are the "statements".
become
The function
(LAMBDA (PROG
SPREAD-ACF
configuration)
(GO EVAL-ARC) NEXTLEX DETOUR (if (another word?) (if (another then (advance then input)
(GO EVAL-ARC)) alternative?) (ACF-BIt) (GO SPREAD-ACF) else EVAL-ARC arclabell arclabel2 (BRANCH STATE arclabell (code for arc) (code for arc) (RETURN failure)) arclabel2 ... arclabeln)
arclabeln
WRD, CAT,
looks like:
(create
The next
function arclabel,
DOTO which
changes
If the to the
"falls
is the following
The arc code for the last clause. invoke the grammar:
(GO DETOUR)"
(if (test satisfied?) then (create alternatives for remaining arcs) (DOPUSH pushstate (GO pushstate)) remaining-actions-label)
remaining-actions-label (do actions on PUSH arc) (DOPTO nextstate) (GO nextstate-lstarclabel) The function DOPUSH saves as the the current current configuration state) (with the
to initialize
registers.
The lower
[Figure 11: diagram not recoverable from the source; legible labels include ATN GRAMMAR, USER ARC ACTIONS, ATN OBJECT CODE, LEXICAL ROUTINES, and PARSE(S)]
[Figure 12: The Operation of an ATN Parser (flowchart not recoverable from the source)]
by
the GO to the arc label of the state. control will transfer the state
When
level the
finishes, actions
by DOPTO,
is begun.
POP arcs
produce
code like:
(GO EVAL-ARC))
DOPOP
builds
the structure
to be returned,
restores
the higher
level
by getting and sets
information
at EVAL-ARC
(remaining-actions-label).
"parser" methods,
executes about
very
much
faster
10 times
faster
than
parsing
150 milliseconds.
For extremely
production good.
programs
where
speed
this m e t h o d
is
The c o m p i l e r it
whether of
generation much
sort of c h e c k i n g features
eliminates
allows
unused
lexical
removed,
efficiency
a change
is to be made
-- a time
consuming
process,
if the in the an
or d e b u g g i n g
version.
A method
ATN s y s t e m
in a language
w h i c h does
not have
4. VARIOUS TYPES OF ATN GRAMMARS
In this section we explore some of the tradeoffs and decisions which must be made by someone who wants to write an ATN grammar for a particular
purpose. which
earlier,
there
or is
purpose.
The
in t e s t i n g a
a theory of l a n g u a g e programmer
will w r i t e
systems
a small, of
and both
for i n s t r u c t i o n a l
purposes.
4a. Parsing vs. Generation
to
build
"natural as
think
of a grammar i.e.
exclusively
for analysis,
to generate computer
instruction
and generation,
is written
it can be thought
as a form which
is independent
or production.
(rather a
than a parser) as
which
takes
an
and
dictionary
produces the
sentences. can
initial
grammar,
generator
to be followed. be nearly the
VIR, WRD
from the
dictionary
which
satisfies
the conditions
Of course, since
the method
just outlined
will or
produce
it is in no way however,
guided
by intentions which
concepts. supposed
useful,
are
to be "tight" input.
and reject
incorrect
Such
speech
discovery
sentences
accepted
by the grammar.
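A generator of the kind just outlined can be sketched by walking the network and choosing arcs and words at random. In the Python sketch below the toy grammar, the dictionary, and the function name generate are assumptions made for illustration; arc tests, registers, and the VIR and WRD arcs are left out, so the "conditions on the arc" reduce to the lexical category.

```python
import random
from typing import Dict, List

# Hypothetical toy network and dictionary, used only for this sketch.
GRAMMAR: Dict[str, List[tuple]] = {
    "S/":     [("PUSH", "NP/", "S/NP")],
    "S/NP":   [("CAT", "V", "S/V")],
    "S/V":    [("PUSH", "NP/", "S/OBJ"), ("POP",)],
    "S/OBJ":  [("POP",)],
    "NP/":    [("CAT", "ART", "NP/ART")],
    "NP/ART": [("CAT", "ADJ", "NP/ART"), ("CAT", "N", "NP/N")],
    "NP/N":   [("POP",)],
}
DICTIONARY = {"ART": ["the", "a"], "ADJ": ["colorless", "green"],
              "N": ["idea", "mayor"], "V": ["sleeps", "elects"]}

def generate(state: str, rng: random.Random) -> List[str]:
    """Follow randomly chosen arcs from state until a POP arc is taken."""
    words: List[str] = []
    while True:
        arc = rng.choice(GRAMMAR[state])
        if arc[0] == "POP":
            return words
        if arc[0] == "CAT":
            # Pick any dictionary word of the category named on the arc.
            words.append(rng.choice(DICTIONARY[arc[1]]))
        elif arc[0] == "PUSH":
            words.extend(generate(arc[1], rng))
        state = arc[-1]               # last element of the arc is the next state
```

A call such as " ".join(generate("S/", random.Random(0))) yields one random sentence. Every sentence produced is accepted by the network, but since nothing guides the choices, the output is grammatical rather than meaningful.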
generation proposed to
scheme
[Simmons,
1973;
Simmons
1972]
which a
is driven
by a s e m a n t i c A similar whose
network
grammar system of a
similar 1975]
BTN an ATN
grammar. grammar
but more general is a node
[Shapiro,
uses
input
semantic describing
network
and
whose
output
is
a linear
string,
a sentence
the node.
is to be w r i t t e n assumption of E n g l i s h
exclusively
one
can
input will
according be the is
rules
in a computer
assisted
system which
supposed
and c o r r e c t
input
from s t u d e n t s dialog
to parse
a naturally
spoken
or t r a n s c r i p t
to w r i t t e n in many
text.)
the a s s u m p t i o n be simplified
places
to accept
incorrect
A grammar checks
which
is w r i t t e n
to g e n e r a t e
to e l i m i n a t e to help
incorrect
combinations.
designed
choose
must make
extensive
tests
to screen
input.
4b. Competence vs. Performance
Linguists actually it
have
long made
between
language
use it (performance)
(competence). that
As an example, relative
one k n o w s but
a reduced of
it is a
fact
performance
embedding
performed
only once
position:
the man k i s s e d
tells
must
agree
in number such as
yet sentences
are often
in the band
plays
at the
football
game.
them!
is a verb
rule
which
(as in "call
be m o v e d
object
long or c o m p l e x
people would
the s e n t e n c e
I called
up my neighbor. up. in the town where forced me to move I used to live before to C h i c a g o last month up.
financial
Most outside
systems
must
deal with
of p e r f o r m a n c e this reason,
which the
are term
the formal
competence
becomes
ambiguous may be
referring
conventionally
still be a c c e p t e d yet be r e j e c t e d
by the grammar,
or it may be
conventionally
by the grammar.
4c. Syntactic vs. "Semantic" Grammars
particular of
applications, words
one
of
key
classes
by r e s t r i c t i n g only
to accept also
syntactically
but
semantically
pragmatically.
It is p o s s i b l e not only) by
which
classifies
words
not
(or
their
of speech
groupings.
in a s y s t e m
to p a r s e
at a zoo,
lexical could
PEOPLE, words
PEOPLE-ADJ lion),
contain
(boy,
father,
woman), vain)
(hungry,
caged,
fierce),
(hungry,
naughty,
educated,
respectively.
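The zoo example can be made concrete with a dictionary keyed by semantic class instead of part of speech. In the Python sketch below, PEOPLE and PEOPLE-ADJ are class names that survive in the text; ANIMAL and ANIMAL-ADJ, the pairing of each adjective list with a class, and the functions classes_of and acceptable_np are assumptions made for illustration.

```python
SEMANTIC_LEXICON = {
    "PEOPLE":     {"boy", "father", "woman"},
    "PEOPLE-ADJ": {"hungry", "naughty", "educated", "vain"},
    "ANIMAL":     {"lion"},                        # assumed class name
    "ANIMAL-ADJ": {"hungry", "caged", "fierce"},   # assumed class name
}

def classes_of(word):
    """Return every semantic class that lists the word."""
    return {c for c, members in SEMANTIC_LEXICON.items() if word in members}

def acceptable_np(adj, noun):
    """Accept 'adj noun' only if noun is in some class X and adj is in X-ADJ."""
    adj_classes = classes_of(adj)
    return any(c + "-ADJ" in adj_classes for c in classes_of(noun))
```

Under these assumptions "hungry boy" and "fierce lion" are accepted while "caged boy" and "educated lion" are rejected, which is the kind of filtering a purely syntactic ADJ N sequence cannot provide.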
13 shows
portions
of two d i f f e r e n t
semantic
grammars. to
Of any
a grammar extent.
may blend
syntactic
and s e m a n t i c
categories
purely
syntactic accept
grammar a large
would
be c o n f i n e d of
to the usual
parts
of speech which
and w o u l d
syntactic sleep be
constructions In be
were m e a n i n g l e s s domain a
("Colorless where
ideas
a limited
of d i s c o u r s e semantic
input
can
meaningful, parsing
grammar
is a very
efficient
to the
problem. See
for the g e n e r a t i o n
meaningful
sentences. s y s t e m using
for the d e s c r i p t i o n
of a very e f f i c i e n t
such a grammar.
[Figure 13a: A Semantic Grammar for Circuit Measurements (network diagram not recoverable from the source)]
Semantic which author system actions accept same accept knows [Bates,
grammars
tend to be much l a r g e r
than s y n t a c t i c
grammars this
The l a r g e s t
ATN g r a m m a r
1975];
it c o n t a i n e d limited
arcs,
in the v a r i e t y 386 a c t i o n
to a s e m a n t i c
it must be
be w r i t t e n extremely
of d i s c o u r s e to w r i t e
is changed,
and it w o u l d for
to a t t e m p t area.
such a g r a m m a r
anything
but a l i m i t e d
application
4d. ~
Prosodics
Because complex
syntax,
semantics, languages,
and
pragmatics
interact to a t t e m p t
in a very a similar
way in n a t u r a l
it is d e s i r a b l e
fusion when m o d e l i n g
linguistic
processing.
[Figure 13b: A Semantic Grammar for Travel Expenses (network diagram not recoverable from the source)]
As
was
described on
above,
it
is
possible
to
place
constraints this
the grammar
lexical
categories, Tests
actions
semantic
phrase The to is
agreement
parser
may also
be m o d i f i e d
a semantic
verification
or m e a n i n g f u l n e s s
of each c o n s t i t u e n t
the order
of a l t e r n a t i v e s
to to
the s e m a n t i c
called
effect
functions
detect even
prosodic
punctuation actions)
appropriate
[Weischedel,
1976]
demonstrates
the
and e n t a i l m e n t s
of fairly c o m p l e x
Consider
believed
that
some
special
in the lexicon
the r e l e v a n t
structures.
For
systems
dealing
with speech
input,
prosodic system
information [Woods et
is al,
understanding PBDRY
called
consumed function
before in
which effect
a prosodic change
boundary
depending in the
upon w h e t h e r expected
boundary
location.
information
good c l u e s ( f o r
humans)
about vs.
syntactic a yes-no
boundaries, question),
sentence pronominal
type
(for
reference,
is currently
known
about how to
automatically
It
is
not one.
always A system
necessary
to
have
the of
efficiency a syntactic
of a semantic
being a
built which uses a very general case-oriented rules). and semantic dictionary
ATN grammar
together with
semantic
interpretation parsing, is
can be done as soon as a constituent size (75 states, and 153 arcs)
complete. sentences
This grammar
is of moderate
but,
using Burton's
grammar-compiler,
has parsed
interpreted
20-word
5. LINGUISTIC ISSUES
are
number degrees
of of
syntactic depth by
structures linguists.
which have been The current is not very purposes. ideas of and
varying
linguistic
theory,
grammar,
for either a
provide
which
grammar
computationally
efficient
interesting
it is not necessary
to account
language,
for practical
linguistically, captured
smoothly
by an ATN grammar.
5a. Extraposition
In
many
English position
sentences, up and to
deep structure
subtree examples:
to
a place
in a d o m i n a t i n g
tree.
The
following
sentences
are
This
I was afraid
someone
was going
The man
with
this c o n s t r u c t i o n cannot
is that
the c o n s t i t u e n t
when
parsing
later
because
the place
or more reached,
levels
down;
place
input
the c o n s t i t u e n t
and i n a c c e s s i b l e
on the stack.
The HOLD in an
action
to h a n d l e action
this
problem a
efficient, with
The HOLD
associates list
a name
this pair
on a s p e c i a l a global
called which is
list.
The HOLD
list is in e f f e c t
variable
at all lower
levels.
level
or at it the in
at the held
position are
insure
constructions
dominated so that
by the one in w h i c h
found,
the parser
if any c o n s t i t u e n t s level
list at the c u r r e n t
remain on it.
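The behaviour of the HOLD list can be sketched as a small data structure. The Python below is an illustration under stated assumptions, not the parser's own code: the class name HoldList, the representation of levels as integers (0 for the top level, larger numbers for lower levels), and the method names are all hypothetical.

```python
from typing import Any, List, Optional, Tuple

class HoldList:
    """Entries are (category, constituent, level at which HOLD was executed)."""

    def __init__(self) -> None:
        self.items: List[Tuple[str, Any, int]] = []

    def hold(self, category: str, constituent: Any, level: int) -> None:
        self.items.append((category, constituent, level))

    def vir(self, category: str, level: int) -> Optional[Any]:
        """A VIR arc: remove and return a held constituent of this category,
        provided it was held at this level or at a level above it."""
        for i, (cat, item, held_at) in enumerate(self.items):
            if cat == category and held_at <= level:
                del self.items[i]
                return item
        return None

    def may_pop(self, level: int) -> bool:
        """A POP is blocked while anything held at this level is still unused."""
        return all(held_at != level for _, _, held_at in self.items)
```

For example, after hold("NP", "the mayor", 0) at the top level, a POP from level 0 is blocked until some VIR arc at level 0 or below has consumed the held noun phrase.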
The
HOLD-VIR
mechanism It would
is not a b s o l u t e l y be possible to
a register
w h i c h was then
sent down
every time
for that
register
constituent indication
in its proper of w h e t h e r
Every
complicated, the p r e v i o u s
error-prone mechanism.
does
the clarity
Another place of
using the H O L D - V I R
mechanism
is
to
use
in
VIR
built.
such
in F i g u r e the HOLD
level have
(on the PUSH arc on the been done) a copy of the the
action
structure
returned
can be made,
substituting
An a d v a n t a g e constituent
node may be placed to s u b s t i t u t e numerous in the WFST for use by other paths, which may there an e x p l i c i t want are test
a different
structure compared
However,
disadvantages
a constituent
from a wasted at
much
w h i c h cannot do a g r e e m e n t
be r e p l a c e d tests
levels,
between
the e x t r a p o s e d
[Figure 14: A Relative Clause Tree with a Dummy Node (tree diagram not recoverable from the source)]
may happen
to the
right
as well as
the
left.
In
The resort
I went
was pleasant.
How many c h i c k e n s
complete
constituent
is m o v e d
constituent position
(which
is
still
completely
well-formed
despite
to a
farther
to the right
and above
the o r i g i n a l
constituent.
To register some
handle
this
right <state>)
The action
(RESUMETAG
a marker
could c o n t i n u e (RESUME)
later
if the action
encountered the p a r s e r
an arc,
the m a r k e r
is r e t r i e v e d was e x e c u t e d
was in when
the R E S U M E T A G
text
as usual,
and when
a POP is done
the
as the c u r r e n t
input
5b. Conjunction
in E n g l i s h theoretical its
is
problem
computational are
linguist. conjoined; in an
simplest complex,
complete constituents
constituents
at its most
numerous The
overlapping of c o m p l e x i t y .
manner.
following
sentences
range
brown
fox and the lazy dog are with the red, orange
this m e s s a g e
you plant
the roses or cut the g r a s s ? my cat. the c o u r s e hard to fail. the boy next
I groomed
and b r u s h e d
fought with,
later married,
and soon d i v o r c e d
complete
constituents
could
be
conjoined the
(either
by
conjunction, conjunct,
having the
conjunction
before all by be
conjunction be made
between conjoinable
X could
c a l l e d XLIST. PUSHed to
arcs w h i c h
XLIST/
PUSH to X/,
accumulate
would
conjunction different
example,
"and,"
precedence,
then a m o r e
complex
grammar
required.
general
conjunction
which
can
violate
constituent to
cannot
be c o n v e n i e n t l y
handled which
It is p o s s i b l e insight
provides
called
SYSCONJ
in
[Woods,
conjunctions
and c o n s t r u c t s any c h a n g e s
unreduced
structures
is the
reduced
conjunction In
as seen in "John seldom talked about and tried to forget the war." this case the conjoined fragments "... to forget ..." are not constituents, There is,
seldom talked about" and "tried nor are they similar at either however, a great deal of
parser
the SYSCONJ a
facility before
may the
be
built grammar
conjunction
processes
history of the parse up to that time in the form of configurations the path and stack. SYSCONJ selects one of these configurations a deterministic, and non-deterministic,
decision)
restarts
restarted some
reaches
portion of the input which is shared with the suspended configuration. Then the paths can be combined and the constituent completed. at which the If the
suspended
This method is potentially very combinatorially it may be guided by some syntactic heuristics
explosive,
but
before a word which is identical to or of the word after the conjunction and in the house barked"). (e.g.,
this has not been investigated in depth. embedded problem. complex structures.
Usually
punctuation
He eats fried eggs, over-ripe bananas, and soft custard. It is useful punctuation to have a prepass to scan the input to separate , or WRD ;
or CAT PUNCT may be put in the grammar. It is tempting to separate endings like "n't," and be a turn prepass, but here the problem is more complex. for "is" or There it are can "'ve," and "'s" ~"'s" can
them into the separate words "not," "have," and "is" in the For example, be the possessive morpheme predecessor on where may be restrictions grammar to words
("Helen's jewelry"), and "n't" can change the form of its ("won't" not"). also contraction can occur lost by
("Mary's eating and Joe is too," but not "Mary's One way to handle which
eating and Joe's too"), so information useful to the expanding the contraction out of context. the latter problem is to attach a feature CONTRACTED have been so expanded.
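A prepass of the kind discussed here might look like the following Python sketch. It separates punctuation, expands "n't" and "'ve", and marks the expanded word with a CONTRACTED feature so that the grammar can still test for it. The function name prepass, the token representation, and the small table of irregular forms are assumptions; "'s" is deliberately left attached because the prepass cannot tell "is" from the possessive.

```python
import re
from typing import List, Tuple

Token = Tuple[str, frozenset]          # (word, features)

# Contractions whose stem changes form; an illustrative, incomplete table.
IRREGULAR = {"won't": "will", "can't": "can", "shan't": "shall"}

CONTRACTED = frozenset({"CONTRACTED"})
PLAIN: frozenset = frozenset()

def prepass(text: str) -> List[Token]:
    """Split text into word and punctuation tokens, expanding contractions."""
    tokens: List[Token] = []
    for raw in re.findall(r"[A-Za-z']+|[.,;:?!]", text.lower()):
        if raw in IRREGULAR:
            tokens += [(IRREGULAR[raw], PLAIN), ("not", CONTRACTED)]
        elif raw.endswith("n't") and len(raw) > 3:
            tokens += [(raw[:-3], PLAIN), ("not", CONTRACTED)]
        elif raw.endswith("'ve") and len(raw) > 3:
            tokens += [(raw[:-3], PLAIN), ("have", CONTRACTED)]
        else:
            # "'s" stays attached: it may be "is" or the possessive morpheme.
            tokens.append((raw, PLAIN))
    return tokens
```

Punctuation marks come out as separate tokens, so arcs such as WRD , or CAT PUNCT can consume them, and a test on the CONTRACTED feature can enforce restrictions on where a contraction may occur.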
5d. Modifier Placement
One of the most common sources of syntactic ambiguity in English is the problem of what head is modified by a modifier. This is particularly true of a series of prepositional phrases, and it also occurs with adverbs. Often, but not always, semantics determines the correct attachment:
I saw the man in the I
park
~with
telescope~ J
~ark
sui~
~igeons.
t o ~ Jane on Friday.
!
L menti~
]
recently. ~ikesJ
parser
incorporated by a special
a selective type of
facility
SPOP. the
When stack
SPOP other
was e n c o u n t e r e d , that
about to be SPOPed.
sentence find
shown
configurations for a r e l a t i v e At
(after
"Jane"),
"book"),
(after PUSH,
configuration
for
process Then
determines it finds
instead
of the PUSH.
level
(the PUSH
clause)
for a p r e p o s i t i o n a l for a r e l a t i v e
phrase.
This
the c o n f i g u r a t i o n prepositional
PUSHing
clause
is a c a n d i d a t e
for the
modifier.
up
the
stack Then
in
the
same way,
is made.
semantic
information is
represented
semantic
forbid m o d i f i e r s require
modifiers
to make a
and heads
modifier
with
needs (in
Alternatives
are c r e a t e d
case of b a c k u p ambiguities),
or to e n s u r e
the e v e n t u a l one
production
and the preferred
is continued.
6. DEVELOPING AN ATN GRAMMAR
This grammar.
final
section that
is designed
an ATN
Remember
in order the
to effectively structure of
about
(even a restricted the language
subset)
of expressing
in an ATN
form.
of parses
is usually
sufficient
for testing
The mind
first types
step of
in writing sentences one
a grammar would
is to have
a clear What
idea
in
of the
like
aspects going to
of competence be used
must
be handled?
Of performance? Make a list
grammar or twenty
to generate Then for
or to parse? decide on the
sample which is
sentences. desired,
general
of grammar which will
example,
decide
on a syntactic
grammar
produce
stratificational
structures.
sketch find
an ATN
of
the
most After It
common the
in your add
sentences. and
surface
been
drawn, chosen
a few
is a good on
idea
state its
and r e g i s t e r together
to r e c o r d
every
purpose Like
a sample
phrase
arc.
commenting
written, come
this
never easier
actually it will
closer
to this
be to debug
and m o d i f y
grammar
later.
for
portions
which
are
identical. and m a k i n g
There a new
to c o n s o l i d a t e by PUSH
by l o o p s
be r e a c h e d the to choice
grammar a new
is e v e n t u a l l y of the
of w h e t h e r
to create more
network
or to use
a longer,
complex
of arcs
in the
original
grammar can
writers,
where
to
a a a
be a problem. but
natural a
phrase
constituent,
what
verb
phrase?
determiner?
an auxiliary?
a reduced
relative
clause?
linguists To them as a
have
fairly
fixed
ideas
about
what
constitutes which
grouping part
can a
whole
unless
is also
to a n o t h e r respect
and w h i c h This
obeys
certain is not
definition
to w r i t e r s it does
applications , of words
except
indicate
certain is
groups
in s e v e r a l single repeat
places of
more
efficient units
to
process them.
such
rather a
and
nodes nature
comprising of the
required phrases
the may
structure
in noun contain
contain
prepositional
phrases
must
noun
phrases. when
In a
other section
cases of
a input
PUSH can
may be it.
be
used
merely
for
convenience
processed
relatively
independently
of any i n f o r m a t i o n
preceding
of the n e t w o r k there
which
are very
similar, either
identical. using
are two a l t e r n a t i v e s : to k e e p t r a c k
merge
registers
and tests
of which
followed, to convey
or make the
to be r e a c h e d about
by PUSHs w h i c h
information tests
the d i f f e r e n c e s . since it
similar and
networks
is u s u a l l y
desirable
concisely
expresses
some g e n e r a l i z a t i o n however,
about the
language. is lost
this m e t h o d the
is c a r r i e d
to extremes,
complexity
of the tests.
One c o u l d m e r g e tests,
network would
a huge number
of c o n d i t i o n a l
the language!
The permit
merging
of
common
portions
than of
a more
compact
redundant by m a t c h i n g performed
processing
to match)
information
redoing
for
each
sentences
the parser
Try to sketch
constituents those
to the use a
grammar
to p r o d u c e action
structures.
conditional tests
and actions.
the n e t w o r k being
by u s i n g
self loop
taken more
grammar
fragment
in F i g u r e
15 may be r e d u c e d in F i g u r e 6.
state
and one
if it is r e p r e s e n t e d
as it was
tests
on PUSH
arcs
is a great time
saver,
is wasted input.
if a recursive
call
any
Figure 15: A Grammar Fragment Which May be Reduced
is to parse on those
a number which
of sentences It is
of the parser
fail.
facility
in the parser which will (not the whole arc, each structure POPed, just and
each register
state which
blocks.
is useful parsed
not only
to
debug
sentences
which
didn't to a
which
and returned
the wrong of
structure,
but also If
in the processing but requires
correct
sentences.
correctly
to look
for between
be reordered
the correct
and wrong paths be merged?
consideration
in a decision should
among
several
adequate
network all,
representations purpose
method.
the ultimate about
of writing
communicate beings. of
something
the structure
of language
A grammar writer who always sacrifices clarity for efficiency will waste extraordinary amounts of time modifying and explaining it. The programmer's maxim, "Make it work, then make it fast," should be heeded.
of the grammar it
interacts harder or
with many
others,
so as
the
large
becomes
implications
of additions
commenting list
of the grammar which
of sentences
all parts
new capabilities list,
are added
just to be sure
to parse
does.
grows,
will
probably
grow
at
the
of what whether
features a word
the grammar
decide
feature
or not.
See Appendix II for a description of the
which may be kept in the dictionary.
The
reader
studied grammar
presented with
parser
experience
mechanism,
would
greatly of others'
suggestions,
reports
like to express appreciation to William A. Woods for his
of a draft of this paper; the responsibility for errors is my
References
Bates, M. "The Use of Syntax in a Speech Understanding System," IEEE Transactions on Speech and Signal Processing, Vol. ASSP-23, No. 1, Feb. 1975, pp. 112-117.
Bates, M. "Syntactic Analysis in a Speech Understanding System," BBN Report No. 3116, Bolt Beranek and Newman Inc., Cambridge, Ma., 1975.
Bates, M. "Syntax in Automatic Speech Understanding," American Journal of Computational Linguistics, Microfiche 45, 1976.
Bobrow, D.G. and Fraser, J.B. "An Augmented State Transition Network Analysis Procedure," Proc. IJCAI, 557-567, 1969.
Bobrow, R. and Bates, M. "The Efficient Integration of Syntactic Processing with Case-Oriented Semantic Interpretation," submitted to the Annual Meeting of the Association for Computational Linguistics, Georgetown University, Washington D.C., March 1977.
Burton, R.R. "Semantic Grammar: An Engineering Technique for Constructing Natural Language Understanding Systems," BBN Report No. 3453, Bolt Beranek and Newman Inc., Cambridge, Ma., December 1976.
Burton, R.R. and Woods, W.A. "A Compiling System for Augmented Transition Networks," presented at the International Conference on Computational Linguistics, Ottawa, Canada, June 1976.
Earley, J. "An Efficient Context-Free Parsing Algorithm." Communications of the ACM, 13, 1970, 94-102.
Grebe, K. "Verb Clusters of Lamnsok," in Network Grammars, Grimes, J., ed., 1975.
Grimes, J. "Transition Network Grammars, A Guide," in Network Grammars, Grimes, J., ed., 1975.
Grimes, J., ed. Network Grammars, a publication of the Summer Institute of Linguistics of the University of Oklahoma, 1975.
Leal, W.M. "Transition Network Grammars as a Notation Scheme for Tagmemics," in Network Grammars, Grimes, J., ed., 1975.
Rustin, R., ed. Natural Language Processing. Algorithmics Press, N.Y., 1973.
Shapiro, Stuart C. "Generation as Parsing from a Network into a Linear String," American Journal of Computational Linguistics, Microfiche 33, 1975.
Simmons, R.F. "Semantic Networks: Their Computation and Use for Understanding English," in Computer Models of Thought and Language. Eds. R.C. Schank and K.M. Colby. San Francisco: W.H. Freeman and Company, 1973.
Simmons, R. and Slocum, J. "Generating English Discourse from Semantic Networks," CACM, 15:10 (Oct. 1972) pp. 891-905.
Teitelman, W. INTERLISP Reference Manual. Xerox Palo Alto Research Center, Palo Alto, California, 1974.
Thorne, J.P., Bratley, P., and Dewar, H. "The Syntactic Analysis of English by Machine," in Michie, D., ed., Machine Intelligence 3, 281-309, 1968.
Weischedel, R.M. "A New Semantic Computation While Parsing: Presupposition and Entailment." Technical Report 76, Department of Information and Computer Science, University of California, Irvine, California, 1976.
Weissman, C. LISP 1.5 Primer, Dickenson Publishing Co, Belmont, Calif., 1967.
Woods, W.A. "Augmented Transition Networks for Natural Language Analysis." Harvard Computation Laboratory Report No. CS-1, Harvard University, Cambridge, Ma., 1969. (Available from the National Technical Information Service, 5285 Port Royal Rd., Arlington, Va., 22209, USA, as Microfiche PB-203-527; also available from ERIC, PO Box O, Bethesda, Md., 20014, USA as publication ED-037-733)
Woods, W.A. "Transition Network Grammars for Natural Language Analysis." Communications of the ACM, 13 (1970), 591-606.
Woods, W.A. "An Experimental Parsing System for Transition Network Grammars." Natural Language Processing, Randall Rustin, ed., New York: Algorithmics Press, 1973.
Woods, W.A., R.M. Kaplan, and B. Nash-Webber, "The Lunar Sciences Natural Language Information System: Final Report." BBN Report No. 2378, Bolt Beranek and Newman Inc., Cambridge, Ma., 1972. (available from the National Technical Information Service as publication N72-28984)
Woods, W.A. et al, "Speech Understanding Systems, Final Report Vol. IV (Syntax and Semantics)," BBN Report No. 3438, Bolt Beranek and Newman Inc., Cambridge, Ma., 1976.
SYNTACTIC ANALYSIS OF WRITTEN POLISH
University
Abstract
The aim of the paper is to give an idea of methodology and of technical solutions used in the design of an experimental syntax-oriented program to process Polish texts; the program is currently being developed by the author. A classification of Polish words is presented.
category and it covers in
Polish syntax is to be described by means of a formal grammar; description takes into account some newer results
describe syntax is the Colmerauer's metamorphic will be implemented in PROLOG, which metamorphic
program will be the surface syntactic Next, a subset of Polish is specified. ces to be processed by the program. gram are given.
of each sentence.
Finally,
1. Introduction
Syntactic analysis may be understood in several ways, depending on the definition or the description of syntax itself and on the task performed by the analysis process, for example by some algorithm (program, system).
The analysis may concern texts, or sinThe results of analysis depend again on between them.
or phrases.
I use the results of Saloni (1976) as the theoretical foundation of my syntax definition; I confine myself to sentences (compound clauses as well as simple clauses); ve sentences for the time being.
tic function of each word in the sentence by indicating the possibility of locating it in the abstract syntactic structure of the sentence, e.g. in such units as noun-phrase, verb-phrase, adjective-
ending or on another similar formal represenconsist first of all in matching categories as case, gender, number
distinguished.
The words are the basic constituents of a sentence. I do not use the notion of morphemes, that is, I do not distinguish stems and affixes. Instead, I assume that each word is supplied with sufficient inflexional and syntactic information. This can be achieved by some dictionary-based preprocessing of the input sentence. The outcome of syntactic analysis is the sentence representation revealing its structure and the relations between particular words. It might be, for instance, a parsing tree.
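The dictionary-based preprocessing mentioned above can be pictured as a plain table lookup. The following Python sketch is only an illustration under stated assumptions: the names `Reading`, `DICTIONARY`, `UNCLASSIFIED` and `preprocess` are hypothetical, the two entries are a toy vocabulary, and treating an unknown form as unclassified is a choice made for the example. An ambiguous word form simply carries several readings.

```python
from typing import List, NamedTuple, Optional, Tuple


class Reading(NamedTuple):
    """One reading of a word form: its word class and inflexional values."""
    word_class: str
    case: Optional[str] = None
    number: Optional[str] = None
    gender: Optional[str] = None
    person: Optional[int] = None


# Hypothetical toy dictionary (an assumption of this sketch): every word
# form is listed explicitly, once per reading.
DICTIONARY = {
    "przyjaciela": [
        Reading("NOUN", "GEN", "SING", "MASCPERS", 3),
        Reading("NOUN", "ACC", "SING", "MASCPERS", 3),
    ],
    "syn": [Reading("NOUN", "NOM", "SING", "MASCPERS", 3)],
}

# Assumption: a form missing from the dictionary gets a single reading
# without any syntactic categories.
UNCLASSIFIED = Reading("UNCLASSIFIED")


def preprocess(sentence: str) -> List[Tuple[str, List[Reading]]]:
    """Attach to every word of the sentence all of its dictionary readings."""
    result = []
    for word in sentence.lower().split():
        result.append((word, DICTIONARY.get(word, [UNCLASSIFIED])))
    return result
```

A later analysis step would then pick, for each word, the reading that fits the rest of the sentence.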
description
of words
language,
featu-
The word
less important
from the point of view of Polish on the stylistic and semantic cha-
racteristics
seven inflexional
categories.
six values
- nominative,
genitive,
it is useful to single
boy );
- "cock") ;
("kobiet~',
. II
"leg") window i,
( dzlecko
, "okno"
9
child
l
'l
,
tl
,|
II
);
II
It
II
, L drzwi
siblings
, "door").
second,
- positive,
comparative, imperative,
it is sufficient
to distinguish
(the compound
at the level
besides
0 (i.e. "another
like in an implied
and superlative
I assume that words can be gathered into sets called lexemes, which group words differing only in the value of some grammatical category (or values of categories).
tionary entry of the kind used e.g. in the great dictionary of Polish (Doroszewski 1958). category of a word is the category that to which the word be-
It is easiest
each lexeme by simply listing its features are actually recategories: the
a preposition decides the case of a noun; a noun has the selective category of gender which determines the gender of an adjective connected with the noun. per category lective The selective
The generalized
selective
fier; a preposition group; and several combinations
Moreover a subordinate clause can be required, as in "wiem, że ..." ("I know that ..."). All mentioned word categories are understood syntactically, that is "an infinitive" stands for every distributive equivalent of an infinitive,
by an adverb. I assume for the sake of clarity that no word has more than three different requirements at a time; the assumption seems to be justified in almost all cases.
The inflexional categories (both proper and selective) and the syntactic requirements are together referred to as syntactic categories. Below I present the classification of words according to the combinations of relevant syntactic categories. Irrelevant ones are ignored. Basically each class has a unique selection of categories. Several classes are further subdivided. Needless to say, the classification is arbitrary, although relatively well suited to the recent results in the morphology and syntax (Saloni 1974, 1976a); there are also some similarities (Misz, Szupryczyńska
number,
p - person,
d - degree,
m - mood,
t - tense,
quirements greater
consistency).
The symbol x o means that the category x has of the values categories of remaining categories. Pro-
are separated is
of categories
categories
Word class
1) Noun
2) Substantival pronoun (e.g. "ja" - "I")
3) Adjective
4) Adjectival pronoun ("taki" - "such")
5) Adverb
6) Adverbial pronoun ("tak" - "so")
7) Numeral
8) Preposition
9) Conjunction
10) Personal verb
three subclasses sets of proper These are:
n,p,mo,to;-;r1,r2,r3 verb ("idę", "przyjdę" - "I go", "I shall come") n,p,mo,to;-;r1,r2,r3
"I knew") g,n,p,m,to;-;r1,r2,r3
11) Impersonal verb ("zrobiono" - "one did" or "it was done")
12) Infinitive
13) Gerund
14) Adjectival participle. This class is further divided into two subclasses with distinct syntactic functions but with identical categories:
14.1) Active participle ("idący" - "going" as in "a going man")
14.2) Passive participle ("bity" - "beaten")
15) Adverbial participle ("idąc" - "going" as in "he slept, going")
16) Auxiliary verb "będę" ("shall", "will") constituting the compound future n,p;-;-;-;r1,r2,r3
17) Unclassified, i.e. anything else; this class has no syntactic categories.
Remarks:
a) An adjective may have certain requirements which will be taken into account later.
b) At the present stage of research the list of categories of the numeral is still incomplete.
c) Certain characteristics of the conjunction can be categorized, for example affinity to another conjunction, say, "either" to "or", "if" to "then". Such facts will be investigated later.
d) The mood of present verbs is fixed otherwise than that of imperative verbs.
e) The class "unclassified" to include abbreviations,
The process of assigning each word in a given sentence a set of values of its syntactic categories I call syntactic preprocessing. A simple search algorithm will suffice if only the search space is properly organized. One approach can consist in writing down all inflexional forms of all words of vocabulary. The dictionary obtained in such a way should also include selective
I assume that the syntactic preprocessing can be relatively easy to implement or at any rate easy to simulate. It is so because no connections between words need to be analysed. The syntactic categories of a word can be singled out solely on the basis of its appearance. Any possible ambiguities can be solved just by repeating an appropriate dictionary entry as many times (with suitable values of categories) as is needed to account for those ambiguities. Therefore in further considerations I shall use freely all necessary syntactic information.
The above classification and the grammatical characterization of word classes have been already outlined and partially verified in the MARYSIA system (Bień et al. 1973, 1973a, 1973b, 1974; Łukaszewicz,
3. The method of syntax description
Syntactically preprocessed words are the terminal symbols of this grammar. The nonterminal symbols (further referred to as syntactic units) are chosen although according to some linguistic intu-
tic units representing any word of a particular class (cf 2). Actually, the syntactic units are not listed explicitly, they are instead given implicitly by a set of rules. The words are not listed at all: the set of words is determined by the content of a dictionary.
The task of syntactic analysis consists in mapping an analysed sentence onto an appropriate structure; such mapping need not be unique but it should reflect the fundamental characteristics of a sentence. Within the adopted set of replacement rules one should be able to find (for each sentence of a predefined collection) at least one sequence of rules which constitutes a derivation of a given sentence from the axiom of the grammar. The derivation should comprise every match needed to take into account values of syntactic categories of words which make the sentence. Every syntactic unit has also some syntactic categories due to the word class distributively equivalent to it. These are the external categories of a unit which determine its connections, as a whole, with other constituents of a sentence. If a unit includes something more than a single specimen of a word class, then it
has its own internal structure expressed by means of suitable category matches. This structure is hidden from above but it must be revealed if the analysis is to be complete. The structure found out in the course of analysis I call surface syntactic structure.
The only considered features of a word are its word class characteristics. Any word of a given class can be substituted for another one provided that both have identical values of all syntactic categories; the resulting surface syntactic structure is the same in both cases. On the other hand, changing order of two different neighbouring units renders a different (however similar) structure, although both structures may differ only at the lowest level.
The surface syntactic structure can be represented by a parsing tree. Every rule used during analysis specifies a parent node and its daughter nodes. The leaves of such a tree are the syntactically preprocessed words. An augmented version of a parsing tree might be a parsing graph, produced from the tree by linking up all pairs of nodes which have some matching category. Every such link would be an
category. All syntactic relations observed in a sentence would be thus fully exposed.
Some well known facts should be pointed out. It is practically impossible to describe the natural language in extenso by means of a formal grammar. It would be unrealistic, if at all possible. A reaso-
described in a sufficiently detailed manner. A carefully selected collection of syntactic units makes it possible to write down such a set of rules that is highly plausible as a starting point of some computer-based implementation. always
which should be considered as specific to some application.
At the present stage of research it is convenient to express
syntactic relations by means of context-free rules with parameters. Those parameters stand for syntactic categories. The rule with a para-
The parameter can occur in various units in the same rule; it then assumes the same value. This means that the corresponding syntactic
This is how the matching is of two units match, (for instance, the
realized.
categories
connection between the case of a noun and of an adjective can be thus reflected). If the proper category of a unit matches
this manner the gender of a noun and of an adjective can be matched). Similarly the syntactic requirement
word class of a required unit. In general, matching tactic categories different enables us to render distributive
syntactic units,
of complexity. The motivation underlying the choice of syntactic units is strictly distributive. The word class may be (slightly imprecisely) equivalent but
have different degree of complexity in some specific sense. It is then convenient to distinguish a number of subclasses of a word class;
I call
ly one). The phrases are linked up by means of conjunctions or, speaking more precisely, by means of constructions syntactically equivalent to conjunctions. The phrase
tration of the fact that the phrases of all degrees equivalent from the standpoint of distribution.
The number of degrees is arbitrary. it should conform to the experimentally of respective constructions
that
determined relative
frequency
rules one should always choose the most complicated phrase to stand for an element of a word class: this phrase can be directly replaced by any less complicated one. As an example, let us consider the sequence of syntactic equivalents
consists of one or one or more
(SNP)
in turn,
phrases".
makes a
, which
a substantival
a noun accompa-
adjective mentioned
phrase;
case;
the phrase
de subordinate
related
(classes
Every member
can be located
at various
of the sentence,
depending
on the degree
of a relevant
is treated as an SNP, in "he and Jack fence" - as a "noun phrase", whereas in "either he and Jack or Jim and Joe fence" - as a "single-noun phrase". Here are a few examples of rules, connected with the noun-like phrases described
that every such phrase has four parameters: case, number, gender, person. The names of syntactic units
parameters
The sequence
number is not the part of a rule.
1) SERNOUNPHR(case,numb,gend,pers) = NOUNPHR(case,numb,gend,pers)
2) SERNOUNPHR(case,PLURAL,gend,pers) = NOUNPHR(case,numb2,gend2,pers2) CONJUNC SERNOUNPHR(case,numb3,gend3,pers3)
3) = SNGLNOUNPHR(case,numb,gend,pers)
4) = SNGLNOUNPHR(case,numb2,gend2, NOUNPHR(case,numb3,gend3,pers3)
5) SNGLNOUNPHR(case,numb,gend,pers) = TRIMSNGLNOUNP(case,numb,gend,pers)
6) TRIMSNGLNOUNP(case,numb,gend,pers) = NOUNATTR(case,numb,gend,pers)
7) TRIMSNGLNOUNP(case,numb,gend,pers) = NOUNATTR(case,numb,gend,pers) SERNOUNPHR(GENITIVE,numb2,gend2,pers2)
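How a parametrized rule of this kind forces categories to match can be shown with a small Python sketch. It is only an illustration, not the PROLOG mechanism itself: the helper names `is_variable`, `match_units` and `apply_rule` are hypothetical, and the convention that lower-case names are parameters while upper-case names are constants is an assumption read off the rules as printed. Rule 7 is used as the example.

```python
def is_variable(param):
    """Assumed convention: lower-case strings are parameters, the rest constants."""
    return isinstance(param, str) and param.islower()


def match_units(pattern, units):
    """Match a right side (pattern) against concrete units.

    Returns the parameter bindings, or None when some category fails to match.
    A parameter that occurs several times must take the same value each time.
    """
    if len(pattern) != len(units):
        return None
    bindings = {}
    for (p_name, p_params), (u_name, u_values) in zip(pattern, units):
        if p_name != u_name or len(p_params) != len(u_values):
            return None
        for param, value in zip(p_params, u_values):
            if is_variable(param):
                if bindings.setdefault(param, value) != value:
                    return None
            elif param != value:
                return None
    return bindings


def apply_rule(rule, units):
    """Build the left-side unit if the right side matches, else return None."""
    (left_name, left_params), right_side = rule
    bindings = match_units(right_side, units)
    if bindings is None:
        return None
    values = tuple(bindings.get(p, p) if is_variable(p) else p for p in left_params)
    return (left_name, values)


# Rule 7 from the list above.
RULE_7 = (
    ("TRIMSNGLNOUNP", ("case", "numb", "gend", "pers")),
    [
        ("NOUNATTR", ("case", "numb", "gend", "pers")),
        ("SERNOUNPHR", ("GENITIVE", "numb2", "gend2", "pers2")),
    ],
)

# A nominative noun followed by a genitive series of noun phrases.
UNITS = [
    ("NOUNATTR", ("NOM", "SING", "MASP", 3)),
    ("SERNOUNPHR", ("GENITIVE", "SING", "FEM", 3)),
]
RESULT = apply_rule(RULE_7, UNITS)
```

With these inputs the left side inherits case, number, gender and person from the noun, while the attached series only has to be genitive; a series in any other case makes `apply_rule` return `None`.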
Each rule is applied according to the left-to-right principle. That is, a rule reads:
a sequence of right side units, if the sections ponding to the right side units, are contiguous.
all categories supposed to match should actually match. Note that in case of the rules 2 and 4 an additional procedure ought to be used which adjusts the gender of a left side to the genders of all right sides. Care should be also taken that more subtle rules are used to handle special cases of number and gender adjustment. As an example let us consider the sentence: "Dziecko, koń i kobieta przyszli" ("A child, a horse and a woman have come"). Each of the nouns has a different gender, none is masculine-personal, which is the case with the whole group. Another example:
dzie" ("John or Peter will come"), where the group is to be treated as singular.
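The number part of such an adjusting procedure might look as follows. This Python sketch is purely illustrative: the function name `group_number`, the two conjunction sets, and the treatment of mixed numbers are assumptions of the example, not rules taken from the paper, and gender adjustment is left out altogether.

```python
# Assumption of this sketch: some conjunctions add their members up,
# others offer alternatives. The two tiny sets are illustrative only.
ADDITIVE = {"i"}        # "and"
ALTERNATIVE = {"lub"}   # "or" (assumed here)


def group_number(conjunction, member_numbers):
    """Return the number of a coordinated group of noun phrases."""
    if conjunction in ADDITIVE:
        # "a child, a horse and a woman" is plural as a whole
        return "PLURAL"
    if conjunction in ALTERNATIVE:
        # Assumption: alternatives stay singular only if every member is singular
        if all(number == "SING" for number in member_numbers):
            return "SING"
        return "PLURAL"
    raise ValueError("unknown conjunction: " + conjunction)
```

Under these assumptions a group joined by "i" is always plural, whereas two singular alternatives remain singular, in line with the two examples above.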
A grammar of the kind described in the previous section can be directly and conveniently expressed as a metamorphic grammar. The metamorphic
proved in practice as useful means of defining some formal properties of a natural language (Battani, Meloni 1975). Metamorphic grammars are It is
grammars.
even more than is currently needed from the standpoint of written Polish. to a language defined by a metamorin the PROLOG programming lanare translated one-to-one of language elements
of words belonging
PROLOG has been designed and developed by Colmerauer's (Roussel 1975). It is an implementation calculus,
in predicate
Externally it can be viewed as a theorem prover for the facts expressed in clausal form, which is based on the SL-resolution principle (Kowalski, Kuehner 1971). Internally, certain side-effects of a proving process, such as substitutions
literals, result in that PROLOG is a very powerful, programming language. It is not, however,
particularly
The basic data structures in PROLOG are terms, or tree structures. The proof procedure,
depth-first with backtracking in case of failure. A program in PROLOG is made of subprograms, each consisting of a sequence of clauses, and a sequence of invoking clauses which can be interpreted as subprogram calls. The choice of a clause within a subprogram resembles a case statement with a set of parameters as a selector. It is then a kind of pattern-directed procedure invocation where the pattern-matching process is carried out by means of unification.
The metamorphic grammar rules can be directly incorporated into a PROLOG program. They are in fact treated as a part of the program,
thus regarded as a predicate calculus version of a language definition. The rules "work" in two directions: they can be used equally well during the analysis and the synthesis of sentences of a language.
Activation of any of those processes requires a PROLOG command. This command specifies both the direction of a process and the parameters which indicate a particular object submitted to the process.
The metamorphic grammars in PROLOG are especially handy for two reasons. First, one can interpret any parameter of a syntactic unit as
name, then the nonterminal x1(x2,...,xn) makes this unit. The second reason is the possibility of inserting in the right side of a rule any number of procedure calls which are called conditions. They are verbatim transmitted to the clause corresponding to a rule and they
5. The specification of a subset of Polish
of a subset of Polish, to be actually processed by a preliminary version of a syntactic analysis and synthesis system which is currently being implemented in PROLOG. For the
sake of the system it is useful to determine what is meant by a sentence from the technical point of view: it is each section of an input text terminated by a period or a semicolon. The task of syntax analysis of a sentence is to verify its correctness (that is, its accordance with a given set of replacement rules which implicitly define the notion of correctness); every correct sentence should be assigned its surface syntactic structure. Punctuation must be correct too. The subset of Polish includes then all and only those sentences that conform themselves to the restrictions listed below.
indicative or conditional. Com-
clause which has at least one predicate; would not be accepted.)
2) No ellipses are allowed, day.") is not accepted.
3) The phrases ought to be continuous: interlace, is not accepted.
4) The word order should be approximately neutral, although permutations of whole phrases are possible.
A finite verb is the pivot of a Polish sentence. It is the verb belonging to word classes 10 and 11. The members
tions) identical requirements. lected into a superclass has been introduced: applies
of verb derivatives.
category
it is called derivational
and it
I follow here the idea of verb derivatives (1973). It has also (in a specific form)
A verb derivative is the central syntactic unit of a generalized verbal construction built of the derivative itself and of the units required by it. According to the principle given earlier, each requirement is satisfied by the most complicated phrase which can stand for an element of a required word class. For instance, if a verb derivative requires a noun in dative case, then we refer to a "series of noun phrases" in dative. The verbal construction with a fixed discriminant adjective phrase, adverb phrase,
It is then convenient
The syntactic units which may correspond to single words I regard as elementary units, each of classes The elementary units are associated with of classes 10 sad 14, and
of a suitable
which represents
a word form belonging to that class. For instance that correspond to a word
the
The elementary units are, in some sense, terminal units with respect to the definition of the subset of Polish. surface syntactic substituted structure any representative That is, within a
for an elementary unit related to the class, and the struc(obviously, semantic considerations substitutions). would As a
of syntax (in the sense adopted here) since their only features rele-
should well do without lexical items, vant to syntax are their syntactic
categories.
program
The syntactic analysis program has not been implemented yet. Below I shall present some technical decisions which will be thoroughly tested soon. The replacement rules constituting the syntax definition are of the subset of Polish
which has been described above, provided that each word of the sentence is linked to a corresponding elementary unit. This can be accomplished via syntactic preprocessing. If a separate sentence ought to be analysed, to complement
with those and only those specific rules which concern this sentence. These rules can be regarded as local (to the sentence). A local rule defines an elementary unit having a specified word form parameter as this particular word form. The form is supplied with pertinent syntactic categories.
The global rules would be the constant part of a PROLOG program. The local rules would be exchangeable: they would vary from one sentence to another. though, the arran
tinguished nonterminal NT (cf 4). There is one global rule for each elementary unit. For example, a rule for the NOUN unit is:
NOUN(form,case,numb,gend,pers) == NT(form,case,numb,gend,pers)
(The double "=" separates left and right sides of a rule.) For a fixed word form, NT(form,case,numb,gend,pers) corresponds to
a nonterminal form(case,numb,gend,pers). If the parameter "form" has the value, say, PIŁKĘ ("a ball", accusative), then the nonterminal
PIŁKĘ(ACC,SING,FEM,3)
The vocabulary is composed of word forms. Every form has one or more readings with respect to its syntactic categories. There is a rule for each reading, with a nonterminal at the left side of the rule and with a word form at the right.
is written as a metamorphic grammar terminal symbol, ("a friend") may be as
PRZYJACIELA(GEN,SING,MASCPERS,3) == ~PRZYJACIELA
PRZYJACIELA(ACC,SING,MASCPERS,3) == ~PRZYJACIELA
This is how the syntactic preprocessing is simulated.
Beneath I shall give the list of non-elementary syntactic units which occur at the left sides of global replacement rules. The list must not be regarded as complete or definitive; the rules made up so far ought to be verified and then perhaps modified in order to mirror more adequately the characteristics of the chosen subset of Polish. The verification would be carried out on a particular text corpus.
The list of non-elementary syntactic units is the following:
1) Sentence
2) Subject
3) Predicate
4) Noun phrases (four degrees of complexity)
5) Verb phrases (u.s.)
6) Infinitive phrases (u.s.)
7) Adjective phrases (three degrees of complexity)
8) Adverb phrases (u.s.)
9) Numeral phrases (u.s.)
10) Conjunctive construction (such as "a także", "jak również" - "also", "as well as")
11) Verbal construction (cf 5)
12) Verb with requirements, a separate unit for each of these situations: no requirement, noun required, preposition plus noun required, two nouns required, noun and preposition plus noun required, subordinate clause required; this list can be amplified in the future.
13) An undetermined so far number of subordinate clauses, such as those connected with "że" ("that") or "który" ("which", "who").
14) Negation NIE, realized as the word "nie" or as an empty word.
15) Noun with attributes (introduced mainly for technical reasons).
16) Adjective with modifiers (u.s.)
The list will be probably expanded as a result of the verification mentioned above. Punctuation will be also taken into account, as in the initial outline it is not considered at all.
Syntactic analysis or synthesis of a sentence is activated by means of a special PROLOG command SYN with two parameters. The first
se), the second is the sentence put down as a concatenated list of consecutive words and punctuation marks. For purely technical reasons each syntactic unit will have an additional parameter used to transmit successive approximations of a parsing tree produced during analysis. The same parameter will indicate the parsing tree of a sentence to be produced during synthesis. The tree will be transmitted as a term. In the case of analysis the initial value of tree parameter of SENTENCE should be a free variable; the final value would then be a parsing tree. In the case of synthesis the second parameter of SYN command, initially a free variable, would eventually receive the sentence representation as a result.
The information connected with a node of a parsing tree may be as complicated as necessary. The term corresponding to the node may have any number of parameters. The daughter nodes (which are terms themselves) must be among them; one can also choose, for instance, to place in the node an information concerning some match of the daughter nodes, such as name and value of a matching syntactic category.
I shall present below a sample term which corresponds to a parsing tree of the sentence: "Syn mojej siostry i córka przyjaciela wczoraj znaleźli piłkę i zabrali ją do domu" ("My sister's son and the friend's daughter found a ball yesterday and took it home"). For the sake of clarity I have simplified the term by omitting less significant stages of analysis; for instance, I have neglected all single-unit phrases (such as "single-noun phrase", cf 3), because they are not important in this example. I have also removed from the nodes almost all syntactic categories. The remaining categories appear
elementary units). The names of nodes have the following meanings: SNP = series of noun phrases, NP = noun phrase, NP1T = trimmed single-noun phrase; SVP, VP, VP1T = as above for verbs; ADJP, ADJP1T = similarly for adjectives; VCON = verbal construction, VRN = verb requiring noun, VRNPR = verb requiring noun and preposition (plus noun). MASI = masculine-inanimate; other names
Four subterms
term so that it would be easier to read it. The items corresponding to daughter nodes have been successively indented. The word forms have been underlined.

SENTENCE(
  SUBJECT(PL,
    SNP(NOM,PL,MASP,
      ①,
      CONJ(I),
      ②)),
  PREDICATE(PL,
    SVP(PERS,MASP,PL,
      ③,
      CONJ(I),
      ④)))

①
NP(NOM,SING,MASP,
  NP1T(NOM,SING,MASP,
    NOUN(NOM,SING,MASP,SYN),
    SNP(GEN,SING,FEM,
      NP(GEN,SING,FEM,
        NP1T(GEN,SING,FEM,
          ADJP(GEN,SING,FEM,
            ADJP1T(GEN,SING,FEM,
              ADJPRON(GEN,SING,FEM,MOJEJ))),

②
NP(NOM,SING,FEM,
  NP1T(NOM,SING,FEM,
    NOUN(NOM,SING,FEM,CÓRKA),
    SNP(GEN,SING,MASP,
      NP(GEN,SING,MASP,
        NP1T(GEN,SING,MASP,
          NOUN(GEN,SING,MASP,PRZYJACIELA))))))

③
VP(PERS,MASP,PL,3,
  MODIFIER(
    ADVERB(WCZORAJ)),
  VP1T(PERS,MASP,PL,3,
    VCON(PERS,MASP,PL,3,
      VRN(PERS,MASP,PL,3,ACC,
        VERBPERS(MASP,PL,3,ACC,ZNALEŹLI),
        SNP(ACC,SING,FEM,
          NP(ACC,SING,FEM,
            NP1T(ACC,SING,FEM,

④
VP(PERS,MASP,PL,3,
  VP1T(PERS,MASP,PL,3,
    VCON(PERS,MASP,PL,3,
      VRNPR(PERS,MASP,PL,3,ACC,
        VERBPERS(MASP,PL,3,ACC,ZABRALI),
        SNP(ACC,SING,FEM,
          NP(ACC,SING,FEM,
            NP1T(ACC,SING,FEM,
) ) ),
The structure of the sentence revealed during analysis is roughly represented by this term. It can also be shown (in a simplified manner) in the following parenthesized form:

((((syn)(mojej siostry))(i)((córka)(przyjaciela))) (((wczoraj)((znaleźli)(piłkę)))(i)((zabrali)(ją)(do domu))))
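
The relation between a term and its parenthesized form can be illustrated with a minimal sketch. It is written in Python rather than in the PROLOG used for the analyser, and it is not the author's implementation. The names `Term`, `Word`, `t`, `daughters`, `words` and `bracket` are hypothetical, and so is the rule that a node with a single daughter is collapsed; the text does not spell out how the simplified form was obtained. The example uses only the noun phrase "córka przyjaciela" from the sample term.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Word:
    """A word form (underlined in the printed term)."""
    text: str


@dataclass(frozen=True)
class Term:
    """A node: its name, then categories (plain strings), word forms and daughter terms."""
    name: str
    args: Tuple[Union[str, Word, "Term"], ...]


def t(name: str, *args: Union[str, Word, Term]) -> Term:
    """Shorthand constructor so that terms read like NP(NOM, SING, ...)."""
    return Term(name, args)


def daughters(node: Term) -> List[Union[Word, Term]]:
    """Parameters that are word forms or terms; plain strings are categories."""
    return [a for a in node.args if not isinstance(a, str)]


def words(node: Union[Word, Term]) -> List[str]:
    """Word forms of the subtree, left to right."""
    if isinstance(node, Word):
        return [node.text]
    found: List[str] = []
    for d in daughters(node):
        found.extend(words(d))
    return found


def bracket(node: Union[Word, Term]) -> str:
    """Simplified parenthesized form; nodes with one daughter are skipped (an assumption)."""
    if isinstance(node, Word):
        return "(" + node.text.lower() + ")"
    ds = daughters(node)
    if len(ds) == 1:
        return bracket(ds[0])
    return "(" + "".join(bracket(d) for d in ds) + ")"


# The subterm for "córka przyjaciela", copied from the sample term.
DAUGHTER_NP = t("NP", "NOM", "SING", "FEM",
    t("NPIT", "NOM", "SING", "FEM",
        t("NOUN", "NOM", "SING", "FEM", Word("CÓRKA")),
        t("SNP", "GEN", "SING", "MASP",
            t("NP", "GEN", "SING", "MASP",
                t("NPIT", "GEN", "SING", "MASP",
                    t("NOUN", "GEN", "SING", "MASP", Word("PRZYJACIELA")))))))

print(bracket(DAUGHTER_NP))  # ((córka)(przyjaciela))
```

The collapsing rule is enough to reproduce this fragment of the parenthesized form. It is not claimed to reproduce the whole form: "mojej siostry", for example, is kept as a single group there, whereas this sketch would split it into two.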
~. Conclusion
Automatic processing of Polish syntax has reached only the preliminary phase of investigation. The task of this phase consists in disclosing problems and in indicating the course of further research. The syntax definition must be verified, corrected and improved. The set of rules must then be expanded to cover some richer subsets of the language; it seems that the restrictions as to word order and continuity of phrases would be dropped first. A well-structured dictionary accompanied by a reasonably organized lookup should make syntactic preprocessing more efficient and flexible than in the current version.

The research should be carried on in two interacting directions. First, it is necessary to study Polish syntax, especially from the point of view of computer applications. ... sophisticated programming tools is essential ... systems. ... will probably allow better insight into problems which arise in the course of work on automatic processing of natural language texts.
References

(Battani, Meloni 1975) G. Battani, H. Meloni, "Mise en oeuvre des contraintes phonologiques, syntaxiques et semantiques dans un systeme de comprehension automatique de la parole". G.I.A., Université d'Aix-Marseille, June 1975.

(Bień et al. 1973) J. St. Bień, W. Łukaszewicz, S. Szpakowicz, "Wprowadzenie do systemu MARYSIA". Reports of the Warsaw University Computation Centre, No 39, 1973.

(Bień et al. 1973a) J. St. Bień, W. Łukaszewicz, S. Szpakowicz, "Opis systemu MARYSIA. I. Zasady pisania scenariusza i scenopisu". Reports of the Warsaw University Computation Centre, No 41, 1973.

(Bień et al. 1973b) J. St. Bień, W. Łukaszewicz, S. Szpakowicz, "Opis systemu MARYSIA. II. Wprowadzanie haseł do systemu". Reports of the Warsaw University Computation Centre, No 42, 1973.

(Bień et al. 1974) J. St. Bień, W. Łukaszewicz, S. Szpakowicz, "Opis systemu MARYSIA. III. Tworzenie części gramatycznych słowników systemu". Reports of the Warsaw University Computation Centre, No 43, 1974.

(Colmerauer 1975) A. Colmerauer, "Les grammaires de metamorphose". G.I.A., Université d'Aix-Marseille, November 1975. (Also in this volume.)

(Doroszewski 1958) W. Doroszewski (ed.), "Słownik Języka Polskiego", vol. I-XI. Warszawa 1958-1969.

(Kowalski 1973) R. Kowalski, "Predicate calculus as programming language". D.C.L. Memo 70, University of Edinburgh, 1973.

(Kowalski 1974) R. Kowalski, "Logic for problem solving". D.C.L. Memo 75, University of Edinburgh, 1974.

(Kowalski, Kuehner 1971) R. Kowalski, D. Kuehner, "Linear resolution with selection function". Artificial Intelligence 2, 1971, pp. 227-260.

(Łukaszewicz, Szpakowicz 1973) W. Łukaszewicz, S. Szpakowicz, "Start prac nad systemem MARYSIA". In: "Zastosowanie maszyn matematycznych do badań nad językiem naturalnym". Wydawnictwa UW 1973, pp. 34-41.

(Łukaszewicz, Szpakowicz 1974) W. Łukaszewicz, S. Szpakowicz, "Charakterystyka systemu MARYSIA". In: "Systemy wyszukiwania informacji", PWN 1974, pp. 181-186.

(Łukaszewicz, Szpakowicz 1976) W. Łukaszewicz, S. Szpakowicz, "System konwersacyjny MARYSIA". In: "Zastosowanie maszyn matematycznych do badań nad językiem naturalnym II", Wydawnictwa UW 1976, pp. 127-137.

(Misz 1967) H. Misz, "Opis grup syntaktycznych dzisiejszej polszczyzny pisanej". Bydgoszcz 1967.

(Misz, Szupryczyńska 1971) H. Misz, M. Szupryczyńska, "Nad zagadnieniem deskryptorów dla niewspółrzędnych grup syntaktycznych dzisiejszej polszczyzny pisanej". In: "Problemy składni polskiej", Warszawa 1971.

(Roussel 1975) Ph. Roussel, "PROLOG, manuel de reference et d'utilisation". G.I.A., Université d'Aix-Marseille, September 1975.

(Saloni 1974) Z. Saloni, "Klasyfikacja gramatyczna leksemów polskich". "Język Polski" LIV (1974), vol. 1, pp. 3-13, vol. 2, pp. 93-101.

(Saloni 1976) Z. Saloni, "Cechy składniowe polskiego czasownika". Wrocław 1976.

(Saloni 1976a) Z. Saloni, "Kategoria rodzaju we współczesnym języku polskim". In: "Kategorie gramatyczne grup imiennych w języku polskim", Wrocław 1976.

(Tokarski 1973) J. Tokarski, "Fleksja polska". Warszawa 1973.