
Lecture Notes in Computer Science

Edited by G. Goos and J. Hartmanis

63
Natural Language Communication with Computers

Edited by Leonard Bolc

Springer-Verlag Berlin Heidelberg NewYork 1978

Editorial Board: P. Brinch Hansen, D. Gries, C. Moler, G. Seegmüller, J. Stoer, N. Wirth

Editor Leonard Bolc Institute of Informatics Warsaw University PKiN, pok. 850 00-901 Warszawa/Poland

Library of Congress Cataloging in Publication Data

Main entry under title:

Natural language communication with computers. (Lecture notes in computer science ; 63) Bibliography: p. Includes index. 1. Interactive computer systems--Addresses, essays, lectures. 2. Question-answering systems--Addresses, essays, lectures. 3. Language data processing--Addresses, essays, lectures. I. Bolc, Leonard, 193~ II. Series. QA76 99. I58N37 OO1.6 '~ 78-15393

AMS Subject Classifications (1970): 68-02, 68A30, 68A45 CR Subject Classifications (1974):

ISBN 3-540-08911-X ISBN 0-387-08911-X

Springer-Verlag Berlin Heidelberg NewYork Springer-Verlag NewYork Heidelberg Berlin

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically those of translation, reprinting, re-use of illustrations, broadcasting, reproduction by photocopying machine or similar means, and storage in data banks. Under § 54 of the German Copyright Law where copies are made for other than private use, a fee is payable to the publisher, the amount of the fee to be determined by agreement with the publisher. © by Springer-Verlag Berlin Heidelberg 1978. Printed in Germany. Printing and binding: Beltz Offsetdruck, Hemsbach/Bergstr. 2145/3140-543210

PREFACE

In recent years, in numerous countries, attempts have been made to develop natural language systems of communication with computers. This research has been initiated by well-known research institutes, sponsored often by government research programs.

This publication should facilitate an exchange of information concerning the present state of research in this area.

The authors would like to express their thanks to Springer-Verlag for publishing this volume.

Warsaw, May 1978

Leonard Bolc

CONTENTS

A formalism for the description of question answering systems
Camilla Schwind

Access to database systems via natural language
Klaus-Dieter Krägeloh, Peter C. Lockemann

An overview of PLIDIS, a problem solving information system with German as query language
G.L. Berry-Rogghe, H. Wulz

Metamorphosis grammars
A. Colmerauer

The theory and practice of augmented transition network grammars
Madeleine Bates

Syntactic analysis of written Polish
Stanisław Szpakowicz

A FORMALISM FOR THE DESCRIPTION OF QUESTION ANSWERING SYSTEMS

Camilla Schwind
Technische Universität München

ABSTRACT

The following article presents a formalism for the description of a natural language based intelligent system. The meaning of natural language texts is to be represented by a state logic. This is an extension of predicate logic by special operators, which are applied to formulae and make their truth value dependent on the state of the world in which the formula is evaluated. The extension of the non-logical symbols depends also on the state of the world and it may change when a state changes. Natural language texts are described syntactically by a formal grammar, which is an extension of a CHOMSKY-grammar. The alphabet consists of complex symbols and the structure of these symbols is given by special rules. The derivation rules of our grammar are applied to symbols in a different way which constitutes an extension of the usual method. The application of a rule is governed by the structure of the symbols and on applying one rule, we can derive a set of sentences. Natural language texts are translated into state logic formulae by special functions which are associated with the production rules. These functions depend on the syntactic structure of the sentences and on the world in which the sentences are evaluated. We will give a detailed example for the application of the whole formalism.

INTRODUCTION

Since the early 60's intelligent systems have been developed which are capable of understanding natural language sentences, of answering questions according to their knowledge bases, or of carrying out commands. Most of those systems have been designed in regard to their special problem areas which are very different from each other (e.g. [1], [2], [3], [4], [5], [6]). But all such systems are confronted with the same main problems:

(1) The representation of the knowledge which is formulated in a system language and which is manipulated by the natural language sentences.

(2) The handling of the natural language input sentences, that is, the syntactic analysis of texts and the translation of them into a semantic representation.

We intend to propose a formalism that describes these two problem areas in a very general manner so that existing natural language based intelligent systems fit into this formalism. The heart of the knowledge representation system is a state logic containing special operators for immediately following and preceding states ($+$, $-$) as well as for all future states ($F$) and all past states ($P$). Similar systems have also been mentioned in [7]. But the crucial point is: In usual tense logic systems, the structure of tense has been studied only as to its "pure logical" properties; we could only prove theorems like "If $p$ is true from today on then it will be true from tomorrow on" ($Fp \to +Fp$). In intelligent systems however, we need theorems about the nonlogical properties of state changes. The tense structure of a world is determined by changes within the world which affect the nonlogical symbols of the world, i.e. the functions or predicates: If a robot takes a block $a$ lying on a block $b$, then this causes a change (i.e. a state transition) of the world with the meaning of the predicate symbol ON changing. If a flower grows, this causes a change of the world, with the meaning of the function symbol SIZE changing. If we incorporate such nonlogical change descriptions into a formal system, we will be able to prove theorems like: "If only $a$ lies on the table and John takes it, then the table will be empty at the following instant".

Taking into account these considerations, a model for the formal system can be given by a set of classical structures $M$ and a binary relation $R_0$ on $M$, with $(\alpha,\beta) \in R_0$ iff there are objects in $\alpha$ which can be subject to some change and the resulting structure is $\beta$. Such Kripke-type semantics has been used for the semantic characterization of modal logic ([8], [9]). Truth values are assigned to formulae depending on the state of the world in which the formula is evaluated. And the state operators take into account the truth value of a formula in some other states which can be "reached" from the actual state.

Let us consider as an example a world consisting of three blocks $a$, $b$, $c$ and two hands $h$ and $h'$ (Figure 1). The possible changes, i.e. the possible actions which can be executed next, are that $h$ takes $b$ or $h'$ takes $b$. After that the hand can put $b$ back on $a$ or can put it on the floor. So the model turns out to be a generalized state transition network that contains all "possible" changes of the world or we may also say contains descriptions of all actions which can be executed by some of the objects of the world.

Figure 1: (1) $h$ takes $b$. (2) $h$ gives $b$ to $h'$. (3) $h$ puts $b$ on $a$. (4) $h$ puts $b$ on the floor.

The language of the state logic is formalized by a set of axioms and inference rules for which completeness has been proven. Natural language texts are analysed syntactically by a formal grammar which is an extension of a CHOMSKY-grammar. The alphabet consists of symbols which are in their turn composed of pairs (feature, value). The algebraic properties of these structured symbols have been studied in [10]. Their structure is defined by insertion rules which specify what features with what values can be contained in one structured symbol. The set of the structured symbols is ordered by inclusion and this ordering gives rise to a modified definition of derivability. If $(P_1 \ldots P_m, Q_1 \ldots Q_n)$ is a production rule, we can replace each occurrence of $P'_1 \ldots P'_m$ within a string by $Q'_1 \ldots Q'_n$ provided $P_i \subseteq P'_i$ and $Q_j \subseteq Q'_j$ for $1 \le i \le m$ and $1 \le j \le n$.

Structured symbols are used in a formal grammar describing natural language sentences in the following way: There is one "starting feature", cat, the value set of which are the categories usually appearing in natural language grammars: NP (noun phrase), VP (verb phrase), N (noun), DET (determiner), etc. Further features characterize properties according to which categories are subclassified; e.g. v is a feature whose values are it (intransitive), t (transitive), p (prepositional) and these values subclassify verbs; t is a feature with the values pres (present), perf (perfect), and fut (future) that specifies the time of a verb. Semantic properties are incorporated into the grammar in the same formal way; e.g. "animate" is a semantic feature whose values are + and - belonging to nouns. It is clear that not just any features can be combined together in a structured symbol and that values cannot appear indiscriminately with any feature. The insertion rules determine the possible combinations.

The natural language sentences are transduced into state logic formulae. This translation is performed by functions which are associated with the production rules. This concept of assigning a semantic representation to sentences depending on their syntactic structure has been introduced by Knuth [11] and Koster [12] for the semantic description of programming languages. So we propose an attributed feature grammar for the syntactic description of natural language sentences and the transduction of sentences into state logic formulae.

We pursued two aims with this proposal:

(1) Natural language syntax should be described by a context free language without the transformation rules introduced in [13]. Transformation rules are a very cumbersome means of analysing sentences. The real motivation for introducing them has been to associate the same "deep" structure to sentences which have the same meaning but a different "surface" structure. And this has been claimed because one would get the same semantic representation for sentences which had the same meaning but were syntactically different. In our system the semantic representation of a sentence is generated with the help of applications. We have no "surface" and "deep" structure but different sentences have different syntactic structures; and the system generates logically equivalent formulae for them.

(2) The use of structured symbols as alphabetic elements allows us to formulate production rules which are rule classes. Derivability is defined in such a way that production rules can be formulated incompletely, that is to say not all features of an alphabetic element have to be specified in all production rules. The "missing" features can be calculated by application of an insertion rule, which adds features to alphabetic symbols according to the features belonging already to that symbol. It is possible to add such features during sentence analysis when they are needed for the analysis and it is possible to neglect them when they are not needed. This is advantageous in natural language analysis because there are many features of categories or words which are sometimes redundant in sentence analysis and sometimes are not. See for example the sentence "The woman is sitting in the café". The feature value + for the feature "animate" of the noun "woman" is not needed for the analysis of this sentence.

THE FORMAL SYSTEM FOR THE REPRESENTATION OF THE MEANING OF NATURAL LANGUAGE SENTENCES

The language $L$ of state logic is an extension of classical predicate logic. It has its predecessors in modal logic [8] and tense logic [7]. We have however avoided here the concept of tense, the proposed logic system being neither antisymmetric nor linear.

$L$ consists of the following symbols: individual variables $x, y, z, \ldots, x_1, \ldots, y_1, \ldots, z_1, \ldots$; $s_i$-place function symbols $F_i$, where $s_i \in \omega$; $r_j$-place predicate symbols $P_j$, where $r_j \in \omega$; and the symbols $\wedge$, $\neg$, $\forall$, $+$, $-$, $F$, $P$. Terms and formulae are formed in the usual way, and if $A$ is a formula, then $+A$, $-A$, $FA$, $PA$ are formulae.

We now want to select a certain subset of the set of all formulae which is to be the set of valid formulae. In classical predicate logic, this is done by introducing the concept of a structure which consists of a set of objects and functions and relations between the objects. For our extension of predicate logic we use a set of classical structures, together with a binary relation that reflects the meaning of the state operators. Truth values are assigned to formulae depending on the given structure and the actual state of the structure.

A structure for $L$ is given by the pair $\mathfrak{A} = \langle \{\mathfrak{A}_\alpha : \alpha \in M\}, R_0 \rangle$ where

(1) $M \neq \emptyset$ is called the set of states of $\mathfrak{A}$.

(2) $R_0 \subseteq M \times M$ is a binary relation on $M$.

(3) For every $\alpha \in M$, $\mathfrak{A}_\alpha = \langle OB, (f_i^\alpha)_{i \in I}, (p_j^\alpha)_{j \in J} \rangle$ is a classical structure:

(3.1) $OB$ is the set of objects of the system.

(3.2) $f_i^\alpha : OB^{s_i} \to OB$ is a partial mapping which assigns elements of $OB$ to $s_i$-tuples of elements of $OB$, for $i \in I$.

(3.3) $p_j^\alpha \subseteq OB^{r_j}$ is an $r_j$-place relation on $OB$, for $j \in J$.

The transitive, reflexive closure of $R_0$ is denoted by $R$.

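We consider $\mathfrak{A}$ as a mapping assigning an element of $OB$ to every variable-free term and a truth value to every variable-free formula of $L$ at every state $\alpha \in M$. Let $t$ be a term and $\alpha \in M$:

(1) If $t$ is the name of an object $a$ of $OB$, then $\mathfrak{A}(\alpha,t) = a$.

(2) If $t$ is not a name, then, since $t$ is variable-free, $t \equiv F_i t_0 \ldots t_{s_i-1}$ for some $s_i$-place function symbol $F_i$. Then $\mathfrak{A}(\alpha, F_i t_0 \ldots t_{s_i-1}) = f_i^\alpha(\mathfrak{A}(\alpha,t_0), \ldots, \mathfrak{A}(\alpha,t_{s_i-1}))$.

Let now $A$ be a closed formula of $L$:

(3) If $A \equiv [t_1 = t_2]$, then $\mathfrak{A}(\alpha,A) = T$ iff $\mathfrak{A}(\alpha,t_1) = \mathfrak{A}(\alpha,t_2)$.

(4) If $A \equiv P_j t_0 \ldots t_{r_j-1}$, then $\mathfrak{A}(\alpha,A) = T$ iff $(\mathfrak{A}(\alpha,t_0), \ldots, \mathfrak{A}(\alpha,t_{r_j-1})) \in p_j^\alpha$.

(5) If $A \equiv \neg B$, then $\mathfrak{A}(\alpha,A) = T$ iff $\mathfrak{A}(\alpha,B) = F$.

(6) If $A \equiv B \wedge C$, then $\mathfrak{A}(\alpha,A) = T$ iff $\mathfrak{A}(\alpha,B) = \mathfrak{A}(\alpha,C) = T$.

(7) If $A \equiv \forall x B$, then $\mathfrak{A}(\alpha,A) = T$ iff $\mathfrak{A}(\alpha,B_x[c]) = T$ for all $c \in OB$.

(8) $\mathfrak{A}(\alpha,+A) = T$ iff $\mathfrak{A}(\beta,A) = T$ for all $\beta$ such that $\alpha R_0 \beta$.

(9) $\mathfrak{A}(\alpha,-A) = T$ iff $\mathfrak{A}(\beta,A) = T$ for all $\beta$ such that $\beta R_0 \alpha$.

(10) $\mathfrak{A}(\alpha,FA) = T$ iff $\mathfrak{A}(\beta,A) = T$ for all $\beta$ such that $\alpha R \beta$.

(11) $\mathfrak{A}(\alpha,PA) = T$ iff $\mathfrak{A}(\beta,A) = T$ for all $\beta$ such that $\beta R \alpha$.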

A formula $A$ is valid in a state $\alpha$ of a structure $\mathfrak{A}$ iff $\mathfrak{A}(\alpha,A) = T$, and a formula is valid in a structure $\mathfrak{A}$ iff $\mathfrak{A}(\alpha,A) = T$ for all $\alpha \in M$. A formula is valid iff it is valid in every structure $\mathfrak{A}$ for $L$.

We have given a set of logical axioms and inference rules for which completeness has been proven.

The existence of functions and predicates within each $\mathfrak{A}_\alpha$ for the appropriate symbols of $L(\Sigma)$ allows us to describe nonlogical changes within the world in a very general way. As an example, think of a child piling up sand. This "action" causes a change of the world in that the size of the pile will increase. If we choose the unary function symbol SIZE to mean the size of an object, we can formalize this in the following way: Let $c$ be the name of the child, $h$ the name of the pile, its size at state $\alpha$ being 5 and at state $\beta$ being 10:

$SIZE^\alpha(h) = 5$; $SIZE^\beta(h) = 10$; $\alpha R_0 \beta$

and we can verify the formula

$SIZE(h) = 5 \wedge PILE(c,h) \to +\,SIZE(h) = 10$.
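The evaluation clauses for the state operators can be illustrated with a small Python sketch. It is only a toy under stated assumptions: formulae are encoded as nested tuples, the atom names `size_eq` and `pile` are hypothetical, the interpretation of PILE at state beta is made up, and implication is treated as a defined connective. It is not the system described in the article.

```python
# Toy model of the sand-piling example: two states, alpha R0 beta.
STATES = {"alpha", "beta"}
R0 = {("alpha", "beta")}

# State-dependent meaning of the nonlogical symbols (assumed values).
SIZE = {"alpha": {"h": 5}, "beta": {"h": 10}}
PILE = {"alpha": {("c", "h")}, "beta": set()}


def closure(r0, states):
    """Reflexive, transitive closure R of the one-step relation R0."""
    r = set(r0) | {(s, s) for s in states}
    changed = True
    while changed:
        changed = False
        for (a, b) in list(r):
            for (c, d) in list(r):
                if b == c and (a, d) not in r:
                    r.add((a, d))
                    changed = True
    return r


R = closure(R0, STATES)


def holds(state, formula):
    """Truth value of a closed formula at a state."""
    op = formula[0]
    if op == "size_eq":                      # SIZE(obj) = n
        return SIZE[state].get(formula[1]) == formula[2]
    if op == "pile":                         # PILE(x, y)
        return (formula[1], formula[2]) in PILE[state]
    if op == "not":
        return not holds(state, formula[1])
    if op == "and":
        return holds(state, formula[1]) and holds(state, formula[2])
    if op == "implies":                      # defined connective
        return (not holds(state, formula[1])) or holds(state, formula[2])
    if op == "+":                            # all immediate successors
        return all(holds(b, formula[1]) for (a, b) in R0 if a == state)
    if op == "-":                            # all immediate predecessors
        return all(holds(a, formula[1]) for (a, b) in R0 if b == state)
    if op == "F":                            # all states reachable via R
        return all(holds(b, formula[1]) for (a, b) in R if a == state)
    if op == "P":                            # all states that reach this one
        return all(holds(a, formula[1]) for (a, b) in R if b == state)
    raise ValueError(f"unknown operator: {op}")


piling = ("implies",
          ("and", ("size_eq", "h", 5), ("pile", "c", "h")),
          ("+", ("size_eq", "h", 10)))
```

Calling `holds("alpha", piling)` evaluates the piling formula at state alpha; because $R$ is reflexive, `F` and `P` also inspect the current state.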

A FORMAL LANGUAGE FOR THE SYNTACTIC ANALYSIS OF NATURAL LANGUAGE TEXTS

Texts are analysed syntactically with the help of a formal grammar, which is an extension of a CHOMSKY-grammar. The alphabet consists of finite sets, which are generated by "insertion rules". The insertion rules are applied to "start symbols". These start symbols correspond to the alphabetic elements which one usually finds in phrase structure grammars for natural languages: S (sentence), NP (noun phrase), VP (verbal phrase), N (noun), V (verb), etc. Insertion rules subclassify the categories in such a way that for every category there is a set of alphabetic elements. E.g.: for NP we get: [NP,(e,+)], [NP,(e,-)] (noun phrase with embedding "the child who is playing the piano" and without embedding "the green ball"), [NP,(c,+)], [NP,(c,-)] (composed noun phrase "the teacher and all his pupils" and not composed "my father"); for V we get: [V,(vk,trans)], [V,(vk,intrans)] (transitive and intransitive verb), [V,(t,perf)] (verb in perfect form), [V,part f] (verb in participle form).

Our grammar has rule classes instead of rules according to the structure of the alphabetic elements. Our grammar is a special van Wijngaarden grammar [14].

Structured symbols

The following definitions are from [10]:

Definition 1

Let $M$ be a finite set and $(B_m)_{m \in M}$ a family of finite sets, where $B_m \neq \emptyset$. Then every partial mapping $a : M \to \bigcup\{B_m \mid m \in M\}$ where $a(m) \in B_m$ is called a structured symbol over $M$. The elements of $M$ are called features, and the domain of $a$ is noted $d(a)$; $d(a)$ is called feature set of $a$; the elements of $B_m$ are called values of $m$, and $a(m)$ is called the value of $m$ in $a$.

Let $C$ be the set of all structured symbols over $M$. The following relations are defined over $C$:

$a \subseteq b \iff d(a) \subseteq d(b)$ and $\forall m \in d(a) : a(m) = b(m)$

$a \sim b \iff d(a) = d(b)$

$a \uparrow b \iff \forall m \in d(a) \cap d(b) : a(m) = b(m)$

If $a \uparrow b$, then $a$ and $b$ are also called compatible. $\sim$ is an equivalence relation. Equivalence classes belonging to $S \subseteq M$ are denoted by $\langle S \rangle$. We write also $\langle m \rangle$ instead of $\langle \{m\} \rangle$. The greatest lower bound of $a$ and $b$ is denoted by $a \cap b$, the least upper bound by $a \cup b$, the latter being defined only for compatible $a$ and $b$.

Notation: A structured symbol $a$ with the domain $\{m_1, \ldots, m_k\}$ and values $(a_1, \ldots, a_k)$, such that $a(m_i) = a_i$, is written $[a_1 m_1, \ldots, a_k m_k]$. This unusual notation is used in phonology where structured symbols characterize phonemes.

We use structured symbols in our formal grammar for natural languages in the following way:

(1) There is one feature, cat (category), that plays a special part and whose values are the categories usually needed in a natural language grammar: S (sentence), NP (noun phrase), VP (verbal phrase) etc.

(2) There are further features whose values stand for properties according to which these categories are subclassified. E.g. the feature p subclassifies verbs and its possible values are the numbers 1, 2, 3 which stand for the number of complements to the verb.

(3) Semantic criteria are also characterized by features and those "semantic" features are not distinguished from "syntactical" features. The features turn out to be ordering principles according to which a category can be subclassified.
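The relations on structured symbols can be made concrete with a short Python sketch. It assumes that a structured symbol is represented as a `dict` from feature name to value; the function names and the sample symbols are illustrative and do not come from the article.

```python
def included(a, b):
    """a is included in b: every feature of a occurs in b with the same value."""
    return all(m in b and b[m] == v for m, v in a.items())


def same_features(a, b):
    """Equivalence: a and b have the same feature set d(a) = d(b)."""
    return a.keys() == b.keys()


def compatible(a, b):
    """a and b agree on every feature they share."""
    return all(a[m] == b[m] for m in a.keys() & b.keys())


def glb(a, b):
    """Greatest lower bound: the feature/value pairs common to a and b."""
    return {m: a[m] for m in a.keys() & b.keys() if a[m] == b[m]}


def lub(a, b):
    """Least upper bound: union of a and b, defined only if compatible."""
    if not compatible(a, b):
        raise ValueError("least upper bound undefined for incompatible symbols")
    return {**a, **b}


np_start = {"cat": "NP"}
np_plain = {"cat": "NP", "cp": "-", "eb": "-"}
np_compound = {"cat": "NP", "cp": "+", "eb": "-"}
```

Here `np_plain` and `np_compound` have the same feature set but are not compatible, since they disagree on `cp`; their greatest lower bound keeps only `cat` and `eb`.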

Insertion rules

The features are ordering principles for the grammatical categories. Therefore they always refer to certain categories and the alphabet of the grammar is a strict subset of the set of all structured symbols. [N cat, itrans v] for example is not a meaningful alphabetic element because nouns are not subclassified according to transitiveness.

Definition 2

An insertion rule for $M$ and $C$ is a pair $p = (a,A)$, where $a \in C$ and $A \subseteq \langle S \rangle$ for $S \subseteq M$ and $d(a) \cap S = \emptyset$. $p$ is called applicable to $x \in C$ iff

(A1) $a \subseteq x$

(A2) $d(x) \cap S = \emptyset$.

Let $R \subseteq C \times \{A \mid A \subseteq \langle S \rangle, S \subseteq M\}$ be a set of insertion rules. Let $u,v \in C$. Then $u \; imp_0 \; v$ iff $\exists p \in R$, $p = (a,A)$, such that $p$ is applicable to $u$ and $\exists b \in A : v = u \cup b$. As a result of (A2), $u$ is compatible with $b$ and the least upper bound is defined. We write also $u \; imp_p \; v$. The reflexive, transitive closure of $imp_0$ is denoted by $imp$. We write also $u \; imp_{p_1 \ldots p_n} \; v$ if $u \; imp_{p_1} \; u_1 \ldots imp_{p_n} \; v$.

Insertion rules generate subsets of $C$ in an analogous way as production rules generate languages. If $R$ is a set of insertion rules and $a \in C$, then we denote the set of structured symbols generated from $a$ by $R$ by

$L(R,a) = \{x \in C \mid a \; imp_{p_1 \ldots p_n} \; x, \; p_i \in R\}$

and we set

$TL(R,a) = L(R,a) \cap \{x \in C \mid \neg\exists y \in C : x \; imp_0 \; y\}$.

Insertion rules for structured symbols are applied to symbols of the form [X cat], which figure as "start symbols" for the alphabetic elements, and $TL(R,[X\ cat])$ is the set of alphabetic elements belonging to the category $X$.
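How insertion rules generate an alphabet can be sketched in Python. The sketch simplifies Definition 2 by assuming that the set $A$ of a rule is always the whole class $\langle S \rangle$, so a rule is written as a pair `(a, S)`. The value sets in `VALUES` and the two NP rules are illustrative assumptions, and all function names are hypothetical.

```python
from itertools import product

# Assumed value sets B_m for a few features.
VALUES = {
    "cat": ["NP"],
    "eb": ["+", "-"],
    "cp": ["+", "-"],
    "n": ["sing", "plur"],
    "pl": [1, 2, 3],
}


def eq_class(features):
    """<S>: all structured symbols whose feature set is exactly S."""
    feats = sorted(features)
    return [dict(zip(feats, combo))
            for combo in product(*(VALUES[m] for m in feats))]


def applicable(rule, x):
    """(A1) a is included in x, and (A2) x has none of the features in S."""
    a, feats = rule
    return (all(x.get(m) == v for m, v in a.items())
            and not (set(feats) & x.keys()))


def step(rules, x):
    """All v reachable from x by one insertion step."""
    out = []
    for rule in rules:
        if applicable(rule, x):
            out.extend({**x, **b} for b in eq_class(rule[1]))
    return out


def generate(rules, start):
    """Return (L(R, start), TL(R, start)) as lists of dicts."""
    seen = {frozenset(start.items()): start}
    todo = [start]
    while todo:
        x = todo.pop()
        for v in step(rules, x):
            key = frozenset(v.items())
            if key not in seen:
                seen[key] = v
                todo.append(v)
    all_syms = list(seen.values())
    terminal = [x for x in all_syms if not step(rules, x)]
    return all_syms, terminal


RULES = [
    ({"cat": "NP"}, ["eb", "cp"]),            # NPs get embedding and composition
    ({"cat": "NP", "cp": "-"}, ["n", "pl"]),  # non-compound NPs get number and places
]

all_syms, terminal = generate(RULES, {"cat": "NP"})
```

With these assumed rules the start symbol and the non-compound intermediate symbols are not terminal, because an insertion rule still applies to them; compound noun phrases are terminal as soon as `eb` and `cp` are set.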

Feature grammars

Feature grammars are defined in the same way as CHOMSKY-grammars, but the derivability concept is modified according to the alphabet structure. The alphabet of a feature grammar is to be the set of all structured symbols which can be derived from a set of feature values of the feature cat of a set of features $M$ by some given insertion rules $R$. Now it is often the case that production rules are independent of the subclassification of the alphabetic elements and that they should be applied to all subclassifications. Let us consider for an example the rule [NP cat] → [DET cat, indef d][NG cat] replacing "noun phrase" by "indefinite article" "noun group". The rule which replaces "noun phrase" by "definite article" "noun group" has the same structure. For that reason we intend to write a rule [NP cat] → [DET cat][NG cat] generating all chains [DET cat,...][NG cat,...], that is to say all chains having the same length as the chains occurring within the rule and containing the chains elementwise. The feature grammars defined here provide us with this possibility.

Definition 3

A feature grammar is a tuple $G = (M, (B_m)_{m \in M}, R, cat, S, Z, \Pi)$, where

$M \neq \emptyset$ is a finite set of features,
$(B_m)_{m \in M}$ is a family of finite value sets for the features,
$R$ is a set of insertion rules over $M$ for the set $C$ of structured symbols,
$cat \in M$ is the starting feature,
$S \in B_{cat}$ is the starting value of cat,
$Z \subseteq B_{cat}$ is the set of terminal values of cat,
$\Pi \subseteq C^+ \times C^*$ *) is the set of production rules, where for every $a \in C$ occurring within a production $p \in \Pi$: $cat \in d(a)$, i.e. if $p = (X_1 \ldots X_n, Y_1 \ldots Y_m) \in \Pi$, then $cat \in d(X_i)$ and $cat \in d(Y_j)$ for all $i,j$ with $1 \le i \le n$ and $1 \le j \le m$.

Let $B$ be a value of cat. Then we set

$C_B = L(R,[B\ cat])$
$TC_B = TL(R,[B\ cat])$
$C_{B'} = \bigcup\{C_B \mid B \in B'\}$ for $B' \subseteq B_{cat}$
$TC_{B'} = \bigcup\{TC_B \mid B \in B'\}$ for $B' \subseteq B_{cat}$

These definitions are extendable over strings over $B_{cat}$. Let $b = B_1 \ldots B_n \in B_{cat}^*$. Then we set

$C_b = \{a \mid a = a_1 \ldots a_n; \; a_i \in C_{B_i}\}$
$TC_b = \{a \mid a = a_1 \ldots a_n; \; a_i \in TC_{B_i}\}$
$C_\beta = \bigcup\{C_b \mid b \in \beta\}$ for $\beta \subseteq B_{cat}^*$
$TC_\beta = \bigcup\{TC_b \mid b \in \beta\}$ for $\beta \subseteq B_{cat}^*$

After what we said at the beginning, it is natural that the definition of derivability must be extended for feature grammars in such a way that, given a production rule $(p,q)$, we can apply all production rules $(p',q')$ provided $p \subseteq p'$ and $q \subseteq q'$. We only have to pay attention to the fact that we generate only such $q'$ which are contained in the alphabet specified by the insertion rules.

*) $C^*$ is the free word semi-group over $C$ and $C^+ = C^* \setminus \{\varepsilon\}$ where $\varepsilon$ is the empty string.

Definition 4 (derivability)

Let $x,y \in C^*$. Then $y$ is derivable from $x$ in $G$, $x \Rightarrow_G y$, iff $x = x'p'x''$ and $y = x'q'x''$ and $\exists (p,q) \in \Pi$ and $\exists p'',q'' \in TC_{B_{cat}^*}$ such that $p \subseteq p' \subseteq p''$ and $q \subseteq q' \subseteq q''$. We write also $x \Rightarrow_{(p,q)} y$. As usual $\Rightarrow^*$ is the transitive, reflexive closure of $\Rightarrow$.

The set of sentences derivable by a feature grammar $G$ is $L(G) = \{x \mid x \in C^*$ and $[S\ cat] \Rightarrow^* x\}$, and $TL(G) = L(G) \cap (TC_Z)^*$ is the language generated by $G$.

The set of production rules of a feature grammar is not limited according to $C_{B_{cat}}$, i.e. there can be productions $(p,q)$ where $p \not\subseteq x$ or $q \not\subseteq x$ for all $x \in C_{B_{cat}^*}$. However, as can easily be seen in the definition of derivability, such production rules can never be applied. So we can eliminate such production rules in $\Pi$ without changing the set of derivable sentences. The type of a feature grammar is defined in exactly the same way as the type of a CHOMSKY-language. $C$ and therefore $TC_{B_{cat}}$ being finite, it is possible to replace every production rule $(p,q)$ of a feature grammar, which is a rule class, by all production rules $(p',q')$ where $p \subseteq p' \subseteq p''$ and $q \subseteq q' \subseteq q''$, provided $p''$ and $q''$ are elements of $TC_{B_{cat}^*}$. The CHOMSKY-grammar obtained in this way is equivalent to the appropriate feature grammar. So we have proved the following theorem.

Theorem

For every feature grammar $G$ of type $i$ there is an equivalent CHOMSKY-grammar of the same type.
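The construction behind the theorem, replacing a rule class by its finitely many instances, can be sketched in Python. The sketch makes simplifying assumptions: the terminal alphabet is given directly as a hypothetical list `ALPHABET` instead of being generated by insertion rules, only fully specified instances are produced, and no agreement between the two sides of a rule is enforced. The function names are illustrative.

```python
from itertools import product

# Assumed terminal alphabet (structured symbols as dicts).
ALPHABET = [
    {"cat": "NP"},
    {"cat": "DET", "d": "def"},
    {"cat": "DET", "d": "indef"},
    {"cat": "NG", "n": "sing"},
    {"cat": "NG", "n": "plur"},
]


def included(a, b):
    """Every feature of a occurs in b with the same value."""
    return all(m in b and b[m] == v for m, v in a.items())


def instances(pattern):
    """All strings over ALPHABET that contain `pattern` elementwise."""
    choices = [[x for x in ALPHABET if included(sym, x)] for sym in pattern]
    return [list(combo) for combo in product(*choices)]


def expand(rule):
    """Replace a rule class (p, q) by all plain rules (p', q')."""
    left, right = rule
    return [(l, r) for l in instances(left) for r in instances(right)]


# [NP cat] -> [DET cat][NG cat], written as a rule class.
rule_class = ([{"cat": "NP"}], [{"cat": "DET"}, {"cat": "NG"}])
plain_rules = expand(rule_class)
```

Because the alphabet is finite, `expand` always returns a finite list. A rule class whose pattern matches no alphabet element expands to the empty list, which corresponds to a production that can never be applied.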

Semantic attributes

Every natural language sentence generated by a feature grammar must be translated into a state logic formula. This transduction is a mapping from the set of sentences together with their derivations into the set of state logic formulae. It is calculated by semantic attributes and attribute functions. This formalism has been introduced in [11] for the description of the semantics of programming languages. To every alphabetic element we associate a set of attributes and to every attribute is given a set of values. For every production rule a set of attribute functions is given for each attribute belonging to an alphabetic element occurring within the rule. The attribute functions define all of the attribute values of an alphabetic element in terms of the attributes belonging to other alphabetic elements occurring within the same production rule. So, attribute functions are many-place functions and have as arguments tuples of attribute values of those attributes which belong to alphabetic elements which occur in the appropriate definition. The values of the attribute functions are again attribute values.

So, if we think of a phrase structure tree reflecting the derivational structure of a sentence, attribute functions transport attribute values within the tree from node to node. This is done in two directions, from the root to the leaves and from the leaves to the root. Therefore, two kinds of attributes are used: derived attributes, which carry values from the leaves to the root, and inherited attributes, which carry values from the root to the leaves. Therefore, for every production rule and for every derived attribute belonging to an alphabetic element on the left side of the production rule there is an attribute function mapping values of other attributes belonging to alphabetic elements on the right side of the production rule to a value from the value set of this derived attribute. This value is the value of that attribute for the node labelled with the alphabetic element on the left side of the production rule to which the attribute in question belongs. Likewise there is an attribute function for every inherited attribute belonging to an alphabetic element on the right side of a production rule. This attribute function takes values of other attributes belonging to other alphabetic elements occurring within the same production rule and maps them into a value of the value set of that attribute. This value is the value of that inherited attribute for that node.

Additionally, we need one special derived attribute, w, that belongs to the starting elements of the feature grammar and whose value is the expression designed to represent the meaning of the whole sentence. This special attribute, also called the main attribute, is used to define what is a semantically correct sentence: The attributes are partial functions which map phrase structure trees to logical formulae. So we can define that a sentence, s, is semantically correct, if it is syntactically correct, i.e. it can be analysed by the feature grammar, and if w is defined for its phrase structure tree.

We will not give a formal definition of attributed grammars here; the working of the mechanism will be demonstrated by a detailed example.
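The flow of derived and inherited attributes can be illustrated with a deliberately tiny Python sketch for a single hypothetical rule S → NP VP. The lexicon, the inherited attribute name `subj`, the `Node` class, and the string form of the resulting formula are all assumptions made for illustration; only the attribute names `ag` and `w` are taken from the text.

```python
class Node:
    """A node of a phrase structure tree with a table of attribute values."""

    def __init__(self, label, children=(), word=None):
        self.label = label
        self.children = list(children)
        self.word = word
        self.attrs = {}


# Hypothetical lexicon: proper names map to constants, verbs to predicates.
NAMES = {"John": "john", "Mary": "mary"}
VERBS = {"sleeps": "SLEEP", "works": "WORK"}


def evaluate(s):
    """Attribute functions for the single rule S -> NP VP."""
    np, vp = s.children
    # Derived attribute ag of NP: computed from the leaf below it.
    np.attrs["ag"] = NAMES.get(np.word)
    # Inherited attribute subj of VP: passed over from its sibling NP.
    vp.attrs["subj"] = np.attrs["ag"]
    # Derived attribute w of VP: undefined (None) if any part is missing.
    pred = VERBS.get(vp.word)
    if pred is None or vp.attrs["subj"] is None:
        vp.attrs["w"] = None
    else:
        vp.attrs["w"] = f"{pred}({vp.attrs['subj']})"
    # Main attribute w of S: carried up from VP.
    s.attrs["w"] = vp.attrs["w"]
    return s.attrs["w"]


def semantically_correct(s):
    """A parsed sentence is semantically correct iff w is defined for its tree."""
    return evaluate(s) is not None


def sentence(name, verb):
    """Build the tree for a two-word sentence."""
    return Node("S", [Node("NP", word=name), Node("VP", word=verb)])
```

In this sketch a partial attribute function is modelled by returning `None`, so a sentence with an unknown word parses but has no defined main attribute.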

TRANSLATION OF NATURAL LANGUAGE DIALOGS INTO STATE LOGIC FORMULAE

An attributed English grammar fragment is given and discussed in detail. The grammar analyses natural language dialogs and maps them to state logic formulae.

The alphabet

Here we describe what features and what insertion rules are used for a natural language grammar.

Features:

a   Kind of adjective. We distinguish between two kinds of adjectives: (i) relational adjectives (value r) which describe a property of a noun in comparison with other nouns, e.g. big, old; (ii) adjectives that select a subset of the set of all objects they can refer to, i.e. those objects that have the property described by the adjective, e.g. round, black.

cat   Starting feature category. The values of cat correspond to alphabetic elements usually needed in transformational grammars for natural languages: S for sentence; NP for noun phrase (e.g. the worm eating green ...); V for verb; NG for noun group (NP without embedding and without article, e.g. large yellow teeth); A for adjectives; DET for determiner (e.g. the, all, any, some); PN for proper name; PRON for pronoun; PP for prepositional phrase (e.g. on the table); ADV for adverbial (e.g. today, ...).

cp   Composition of nouns, noun phrases, noun groups, or prepositional groups. The possible values of cp are + and - according as the corresponding noun or group is compound (e.g. the teacher and all his pupils is a compound noun phrase and generated by [NP,+ cp]).

d   Type of determiner. d has the values def (definite article), indef (indefinite article), all (for pronouns like all, every), and ex (for pronouns like some).

dc   Degree of comparison. The values of dc can be abs, comp, sup according as the corresponding adjective is in absolute, comparative, or superlative form (e.g. good, better, best).

eb   Embedding. The values of eb are + or - according as the corresponding noun phrase has a sentence or a noun phrase embedded.

f   Form of a verb. The values of f are part for verbs in participle form (e.g. eating) and prop for verbs in "propositional" form (e.g. eats).

kc   Kind of conjunction. This feature subclassifies sentence conjunctions. Its values are: cond for conditional (e.g. if ... then); caus for causal (e.g. because); conc for concessive (although); temp for temporal (e.g. after); fin for final (in order to).

m   Negation. The possible values are + resp. - according as the corresponding verb is negated.

n   Number. The values are plur for plural and sing for singular.

pl   Number of "supplements" of a verb or a noun. The possible values of pl are 1, 2, 3: Intransitive verbs are one-place (e.g. work); two-place verbs have one object (e.g. know, John knows Mary); three-place verbs have two objects (e.g. give, John gives Mary a book). Nouns are one-place (e.g. table) or two-place (e.g. father, John is the father of Mary).

rel   Subclassification of relative clauses. The values of rel are subj, obj1, obj2 according as the relative pronoun is the subject, the first or the second object of the corresponding relative clause.

t   Tense of a verb. The values of t are past, pres, fut.

Insertion rules (NP,<{eb,cp}>) Noun phrases can have embeddings and can be compound.

([NP,- cp],<{n,pl}>) (NG,<{n,pl}>) (N,<{n,pl}>) Noun phrases that are not compound and noun groups and nouns are in singular or plural form and are specified according to their number of places.

(A,<a>) An adjective is relational or not.

([A,r a],<dc>) Relational adjectives can be compared.

(V,<{pl,m,n,t,f}>) Verbs are specified according to their number of possible supplements, to negation, number, and tense and they are in participle form or not.
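The insertion rules above can be read as a table saying which features may be attached to which (possibly already specified) category. The following is a minimal sketch of that reading, not the authors' implementation: the names INSERTION_RULES and allowed_features are hypothetical, and treating a rule such as ([A,r a],<dc>) as "applies when the symbol already carries r for a" is an assumption.

```python
# Each entry: (category, required feature values, features that may be inserted).
# The table mirrors the insertion rules listed in the text; the data layout
# itself is an illustrative assumption.
INSERTION_RULES = [
    ("NP", {}, {"eb", "cp"}),
    ("NP", {"cp": "-"}, {"n", "pl"}),
    ("NG", {}, {"n", "pl"}),
    ("N", {}, {"n", "pl"}),
    ("A", {}, {"a"}),
    ("A", {"a": "r"}, {"dc"}),
    ("V", {}, {"pl", "m", "n", "t", "f"}),
]


def allowed_features(category, features):
    """Return the set of features that may be specified for a structured symbol.

    `features` maps feature names to the values already fixed on the symbol,
    e.g. {"cp": "-"} for [NP,- cp].
    """
    allowed = set()
    for rule_category, required, insertable in INSERTION_RULES:
        if rule_category != category:
            continue
        # A rule applies only if every required value is present on the symbol.
        if all(features.get(name) == value for name, value in required.items()):
            allowed |= insertable
    return allowed
```

For example, a relational adjective [A,r a] may additionally carry dc, while a compound noun phrase [NP,+ cp] may not carry n or pl.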

Derived attributes

ag   is defined for noun groups, NG, noun phrases, NP, and for imperative sentences, [S,imp s]. The value of ag is a constant or a variable of L(Z) which is the name of the object described by the noun group or phrase. For an imperative sentence, the value is the name of the "person" to whom the command is addressed.

con  is defined for determiners, DET, and elementary noun phrases, [NP,- cp,- eb], and for verbs. For every noun phrase, there is a connector which links the noun phrase formula with the other sentence formula fragments. This connector depends on the kind of determiner belonging to the noun phrase. It is the empty word, ε, if there is no determiner, i.e. the noun phrase is expanded by "pronoun" or "proper name". Example: The sentence all men work is represented by ∀x[MAN x → WORK x], the sentence some men work by ∃x[MAN x ∧ WORK x]. Depending on the determiner all resp. some the connector is → resp. ∧. The value of con for a verb is ¬ if the verb is negated and ε if it is not.

gm   belongs to NP. It is used for the generation of questions if an ambiguity arises. Example: Let the green ball be a noun phrase. If there is more than one green ball in the structure a question what ball do you mean? is generated.

h    is defined for conjunctions, CONJ, and has as its value the connector of L(Z) belonging to the conjunction. Example: h(or) = ∨.

log  is the main attribute and its value is the formula belonging to the sentence. log is also defined for other categories and then its value is a formula or a formula followed by a connector or a quadrupel (quantifier, formula, term, connective). log is defined for

(1) Adjectives; e.g. RED x for "x is red".

(2) Adjective groups as conjunction of the formulae for the adjectives.

(3) Noun groups as conjunction of the formulae of the nouns and the adjectives the noun group is composed of; e.g. "a green parrot" is represented by ∃x[PARROT x ∧ GREEN x].

(4) Noun phrases as conjunctions of the formulae of the noun groups and supplements the noun phrase is composed of; e.g. the green parrot, which ... is represented by ∃x[PARROT x ∧ GREEN x ∧ I] where I = log(S) and S is the relative clause which ....

(5) Prepositional phrases PP; e.g. on the table is represented by ON x t where x is the name of the object of the noun phrase the prepositional phrase is embedded in and t the name of the object described by the table.

op   is defined for sentence adverbs, ADV, and its value is the appropriate operator.

q    is defined for determiners, DET, and its value is the quantifier for the appropriate noun phrase. In the example given under con this quantifier is ∀ for all resp. ∃ for some.

sy   is defined for verbs, V, nouns, N, prepositions, PREP; its value is the predicate symbol representing that concept.

top  is defined for verbs, V, and has as its value the tense operator of L(Z) depending on the tense of the verb.

w    is a "global" attribute belonging to every non-terminal element. The value of w is the state of the structure in which the dialog is evaluated. We need this information for the assignment of object names to noun phrases that describe that object.

Inherited attributes

agr  is defined for adjectives, adjective groups and sentences embedded into a noun phrase. The value of agr is the name of the object the adjective refers to. For a sentence it is the name of the object described by the noun phrase the sentence is embedded in.

agcr is defined for adjectives in comparative form and its value is the name of the object that is compared with the object the adjective refers to.

ix   is defined for nouns, noun groups and noun phrases. The attribute functions generate bounded variables for noun phrases like all children. These variables have the form x1, x2, ..., xi and their indexes are generated by the attribute ix.

syr  is defined for relational adjectives and its value is the predicate symbol that represents the noun the adjective belongs to.

Attribute functions for lexical rules

In the following, we describe how the most important word categories are represented in state logic. We shall explain in detail the attribute functions for the lexical rules.

(1) Verbs are represented by predicate symbols of the appropriate number of places. We are aware of the manifold difficulties which can arise whenever this number is not uniquely determinable. (Problems connected with this have often been described and discussed, e.g. [15].) We have not resolved this problem but we think it should be possible to come to terms with it with the help of the following practical device. For every verb the number of supplements is fixed and part of the lexical information for that verb. Whenever the verb occurs in a text with one or more supplements missing the empty variable places are filled up by dummy elements. When it occurs with supplements not provided in the lexicon the additional formula fragments must be connected with the rest of the sentence formula by ∧.

[V,x t,y m,z pl] ::= v
sy(V) = π^(z)

π^(z) is the z-place predicate symbol representing the meaning of the verb. The connector con is ¬ or ε depending on the value of the feature m:

con([+ m]) = ¬
con([- m]) = ε

The tense operator top depends on the value of the feature t:

top([pres t]) = ε
top([past t]) = P
top([fut t]) = F

PA means "there is an instant in the past such that ..."; FA means "there is an instant in the future such that ...". The values of the attributes con and top are operators which are placed at the head of the whole sentence the verb occurs in. They operate on the whole sentence.
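As an illustration of how the values of top and con end up at the head of the sentence formula, here is a minimal sketch. The function names are hypothetical, formulas are plain strings, and the use of ¬ for a negated verb and of the empty string for the empty word are assumptions of this sketch.

```python
# Tense operator per value of the feature t; the empty string stands for the
# empty word (present tense adds no operator).
TENSE_OPERATOR = {"pres": "", "past": "P", "fut": "F"}


def verb_prefix(tense, negated):
    """Return top followed by con, the operators contributed by the verb."""
    top = TENSE_OPERATOR[tense]
    con = "¬" if negated else ""
    return top + con


def sentence_formula(tense, negated, body):
    """Place the verb's operators in front of the whole sentence formula."""
    prefix = verb_prefix(tense, negated)
    # Brackets make explicit that the operators apply to the whole sentence.
    return f"{prefix}[{body}]" if prefix else body
```

With these definitions a negated past-tense sentence about John working would come out as P¬[WORK John], while the present-tense affirmative form is just WORK John.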

(2) Nouns are represented by one- or two-place predicate symbols or by one-place function symbols. One-place nouns are all nouns describing objects, e.g. table, house, block, and nouns describing animals or humans, e.g. mouse, baby. Two-place nouns also describe things or persons but they express at the same time a relationship to other things or persons; examples are all nouns expressing kinship relations such as father, mother, aunt.

N1  [N,x pl] ::= a
    sy([N,x pl]) = π^(x)

π^(x) is the x-place predicate symbol representing the meaning of the noun.

Function nouns always correspond to adjectives expressing the same type of measure. For each of these adjectives a measure function is introduced mapping the objects the adjective can refer to into the set of natural numbers. This will be described in detail below. The same function is needed for the representation of the noun.

N2  [N,r a] ::= b
    sy([N,r a]) = φ_b

φ_b is the one-place function symbol belonging to b.

Example: φ_length = LENGTH
         φ_size = SIZE

We have not treated "abstract" nouns such as eternity, love, malice because these kinds of nouns have hardly been dealt with in existing question-answering systems. Such concepts appear in [2], but there they are treated in a very "material" manner. They are measurable and they operate exactly like concrete nouns. The degree of malice or health of somebody is expressed by numbers and these numbers increase or decrease depending on the things that happen in the world. We think that for a better treatment of such nouns it would be necessary to use higher order predicate symbols; but we would need predicates that can operate on other predicates of different types and this possibility is not provided in type logic.

"Mass" nouns (see [16]) have not been treated semantically. They cannot be treated like concepts, i.e. represented by predicate symbols. Sometimes they have the same properties as constants, sometimes they act like predicates.

(3) Adjectives

The state logic formula representing the meaning of the proposition contained in the adjective is built up on the lexical level of the grammar. Therefore, log belongs to A.

As mentioned above, we distinguish between two kinds of adjectives.

(3.1) Adjectives that select a subset of the set of all objects they can refer to. All adjectives describing colors belong to this group; the noun phrase the red balls designates a subset of the set of all balls, the balls being red. Other adjectives selecting a subset are round, open. These adjectives cannot be compared. They are represented by one-place predicate symbols.


A1  [A,s a] ::= u
    log(A) = π^(1) agr(A)

π^(1) is the one-place predicate symbol representing the adjective u. The inherited attribute agr has as its value the name of the object the adjective refers to within the appropriate sentence and this value is assigned to agr within the rule AG1 or AG2.

Example: the red table. log(red) = RED t, where t is the name of the object the noun phrase is describing.

(3.2) Adjectives that express a relation between the objects they refer to. These adjectives describe the property of an object by comparison with other objects belonging to the same conceptual category, i.e. being comprehended by the same predicate. For example, if we are speaking of a big dog, we mean a dog whose size exceeds a certain number of centimeters that is meant to be characteristic for dogs. And the size of a small elephant is another absolute size. A sentence like this big dog is much smaller than that small elephant must be verifiable.

Such "relational" adjectives are paired: (young,old), (small,big), (thin,fat), (soft,hard). A pair of relational adjectives (a1,a2) orders the set of the objects it refers to according to the measurable property expressed by the adjective. So the pair (young,old) orders all things that have an age according to that age. Relational adjectives are comparable.

For every pair of relational adjectives there are the following functions, relations, and constants:

(i) φ_a is a function symbol whose extension takes as its domain a subset of the universe of the structure the logic is characterized in, namely the subset of all objects the adjectives a1 and a2 can refer to. The range of the extension of φ_a is IN × {i_a}, i.e. natural numbers indexed by an index i_a typical for the adjectives a1 and a2. For example i_a = meter for the adjectives a1 = small, a2 = big.

Let A = ((A_s)_{s ∈ M}, R_e) be a structure of Z.

    φ_a^(A_s) : T → IN × {i_a},  T ⊆ OB

T is the subset of the set of objects OB of A that a1 and a2 refer to.

(ii) OP_a1 and OP_a2 are two-place predicate symbols, describing the ordering of the objects referred to by a1 and a2 according to the property expressed by a1 and a2. The extensions of OP_a1 and OP_a2 are totally ordered binary relations on "indexed" IN, one inverse to the other. Their extensions do not differ with different states of A. So we set for all s ∈ M and n,m ∈ IN

    (i_a,n) OP_a1 (i_a,m)  ⟺  n ≤ m
    (i_a,n) OP_a2 (i_a,m)  ⟺  n ≥ m

Example: OP_young = ≤_years and OP_old = ≥_years.

(iii) For every predicate symbol P comprehending objects a1 and a2 can refer to, i.e. such that a1 and a2 refer to objects of the extension of P, there are two constants CS_a1,P and CS_a2,P belonging to the range of φ_a and limiting the scale for values of φ_a for objects "of type P". We wish to express by this that an object of the conceptual category P can reach a minimal size of about CS_a1,P and a maximal size of about CS_a2,P according to the properties a1 and a2. So we can fix for every predicate P comprehending objects a1 and a2 can refer to:

    CS_a1,P ∈ R and CS_a2,P ∈ R

Example: CS_old,dog = 20
         CS_young,man = 20

(iv) CS_a1 and CS_a2 are absolute measure numbers for all objects a1 and a2 can refer to independently of a predicate P. So, we can set

    CS_a1 = min{CS_a1,X : X predicate symbol of L(Z) such that CS_a1,X is defined}
    CS_a2 = max{CS_a2,X : X predicate symbol of L(Z) such that CS_a2,X is defined}

The state logic expression, log([A,r a]), representing the meaning of a relational adjective is composed of the symbols introduced above. log([A,r a]) depends on the degree of comparison of the adjective, i.e. on the value of the feature dc within the structured symbol the lexical rule is applied to.

(3.2.1) Absolute

A2  [A,abs dc] ::= u
    log(A) = φ_u agr(A) OP_u CS_u,syr(A)

agr(A) is the name of the object u refers to within the appropriate text; its value is assigned in NG2, see later. syr(A) is the predicate symbol belonging to the object u refers to. If u occurs within a noun phrase of the form "determiner" u "noun", syr(A) is the lexical entry for the noun. If agr(A) is a proper name, e.g. in the sentence John is tall, syr(A) cannot be found directly in the same sentence and not always within the same text. It must be searched for within the structure in which the text or the dialog is evaluated.

Example: all small dogs

    φ_small = SIZE
    CS_small,dog = 0.3
    OP_small = ≤_meter

The lexical rule is [A,abs dc] ::= small and we get

    log(A) = SIZE x1 ≤_meter 0.3

x1 is the bound variable being the name of the object.

(3.2.2) Comparative

A3  [A,comp dc] ::= u
    log(A) = φ_u agr(A) OP_u φ_u agcr(A)

The value of agcr is the name of the object agr(A) is compared with. It is not always possible to find agcr(A) within the same sentence; as in these other cases this value must be found in the dialog or text structure.

Example: John is older than Mary. We have the lexical rule [A,comp dc] ::= older for the generation of the adjective and we get:

    φ_old = AGE
    OP_old = ≥_years
    agr(A) = John
    agcr(A) = Mary
    log(A) = AGE John ≥_years AGE Mary

In the sentence the elder brother ... we must find in the dialog structure the object "brother" is compared with, i.e. the name of the person whose elder brother is being spoken about.

(3.2.3) Superlative

A4  [A,sup dc] ::= u
    log(A) = ∀x[syr(A) x → φ_u agr(A) OP_u φ_u x]

Example: the biggest dog. Let [A,sup dc] ::= biggest be the appropriate lexical rule. Then we get

    syr(A) = DOG
    agr(A) = d
    OP_big = ≥_meter
    φ_big = SIZE
    log(A) = ∀x[DOG x → SIZE d ≥_meter SIZE x]
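The three degrees of comparison differ only in what stands on the right-hand side of the ordering predicate. The sketch below builds the three expressions as strings and reproduces the examples given in the text. The class and function names are hypothetical, and the implication arrow in the superlative is an assumption about the garbled original.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationalAdjective:
    measure: str  # measure function symbol, e.g. "SIZE"
    order: str    # ordering predicate symbol, e.g. "≤_meter"


def log_absolute(adj, agr, limit):
    """Absolute form: compare the object's measure with the constant CS."""
    return f"{adj.measure} {agr} {adj.order} {limit}"


def log_comparative(adj, agr, agcr):
    """Comparative form: compare the measures of two named objects."""
    return f"{adj.measure} {agr} {adj.order} {adj.measure} {agcr}"


def log_superlative(adj, agr, syr, var="x"):
    """Superlative form: compare with every object falling under syr."""
    return (f"∀{var}[{syr} {var} → "
            f"{adj.measure} {agr} {adj.order} {adj.measure} {var}]")
```

For instance, log_comparative(RelationalAdjective("AGE", "≥_years"), "John", "Mary") yields the expression given for John is older than Mary.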

(4) Prepositions

P1  [PREP,n pl] ::= p
    sy(PREP) = π^(n)

π^(n) is an n-place predicate symbol describing the meaning of the preposition p.

(5) Predicate symbols are also used for the description of such relations between nouns that are not expressed by a fixed word category. We have for an example the OWN-relation which can be expressed by pronouns, by verbs, by prepositions, or by cases; e.g. his dog, John has a dog, John owns a dog, the dog of John, John's dog. In all these examples the relationship OWN between John and the dog is expressed.

(6) Determiner

As mentioned above, there are four types of determiners. Depending on the type a quantifier and a connector are assigned to DET which are needed for the construction of the formula describing the meaning of the appropriate noun phrase. The quantifier becomes the quantifier for the whole noun phrase and the connector is the connector with which the formula is attached to the other formula fragments belonging to the other sentence fragments.

DET ::= u

D1  con([DET,indef d]) = ∧
    q([DET,indef d]) = ∃

Here, u is an indefinite article like a, or is the empty string if DET is in plural form, i.e. the value of the feature n for the structured symbol [DET,...] is plur.

Example: "a dog ..." is represented by ∃x1[DOG x1 ∧ ...; ∃ and ∧ are the values of q and con respectively.

D2  con([DET,def d]) = ε
    q([DET,def d]) = ε

Noun phrases with definite article like "the ball" always designate a certain, fixed object of the world which is already known in the context and so has already a name. Therefore we do not generate an expression like ∃x[BALL x ...], but the name of the object mentioned is searched for in the structure. We will discuss the problems of this representation below, when we discuss the rules generating the noun phrase.

D3  con([DET,ex d]) = ∧
    q([DET,ex d]) = ∃

The lexical entries for determiners specified by [ex d] are pronouns like some.

Example: Some child works is represented by ∃x1[CHILD x1 ∧ WORK x1]. ∃ and ∧ in this formula are generated depending on the pronoun some.

D4  con([DET,all d]) = →
    q([DET,all d]) = ∀

Pronouns like every and each are generated by a determiner specified by [all d] and the appropriate quantifier is ∀ and the connective is →.

Example: all children play ... is represented by ∀x1[CHILD x1 → PLAY x1]
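The four determiner types amount to a small lookup table from the value of the feature d to a pair (quantifier, connector). A minimal sketch follows; the table and function names are hypothetical, formulas are plain strings, and the empty string stands for the empty word assigned to definite determiners (an assumption where the original glyphs are unreadable).

```python
# (q, con) per determiner type, as listed in rules D1 to D4.
DETERMINER = {
    "indef": ("∃", "∧"),
    "def": ("", ""),      # definite: no quantifier, the object is looked up
    "ex": ("∃", "∧"),
    "all": ("∀", "→"),
}


def quantified(det_type, var, noun_formula, rest):
    """Attach quantifier and connector of a determiner to a noun formula."""
    q, con = DETERMINER[det_type]
    if not q:
        # Definite noun phrases are represented by an object name only.
        raise ValueError("definite determiners do not build a quantified formula")
    return f"{q}{var}[{noun_formula} {con} {rest}]"
```

Calling quantified("all", "x1", "CHILD x1", "PLAY x1") reproduces the formula of the last example.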

(7) Temporal adverbs

Naturally, it is possible to describe any time relation with the help of the time operators +, -, F, P. We demonstrate here the one-place time operators for some time adverbial groups. In the following, A is always the formula representing the natural language sentence the temporal adverb belongs to.

ADV1  ADV ::= always
      op(ADV) = O_always

O_always is a defined operator of L(Z):

      O_always A ↔ FPA ∧ PFA

This means that from every state from now on we can go to every state into the past and from every state from now on into the past we can go into the future and A is true in all states we can "reach" in this way. We would like to stress that what is meant by a temporal adverb depends on the structure in which sentences are evaluated. If we consider a linear time structure, it would be sufficient to represent always by FPA. Our representation demands a totally connected time structure; otherwise isolated points cannot be "reached" by means of F and P. This consideration is important because we require that time adverbs have non-logical meanings, i.e. what they are represented by depends on a given structure and not only on a given logic, that is to say it does not depend on the logic but on the interpretation of the logic. We conceive of time as a non-logical concept.

ADV2  ADV ::= almost always
      op(ADV) = O_almost-always

      O_almost-always A ↔ [¬+¬A ∨ ¬-¬A]

A is almost always true if for every state there is an immediately preceding or following state in which A is true. Intuitively speaking, A is almost always true if it can be "reached" starting from every state.

ADV3  ADV ::= sometimes
      op(ADV) = O_sometimes

      O_sometimes A ↔ ¬FP¬A ∨ ¬PF¬A

sometimes means that there are states in the future from which we can find states in the past where A holds, or there are states in the past starting from which states in the future are reachable such that A is true.

ADV4  ADV ::= almost never
      op(ADV) = O_almost-never

      O_almost-never A ↔ [¬+A ∨ ¬-A]

This means that in every state we can reach another immediately preceding or following state where A is false.

ADV5  ADV ::= never
      op(ADV) = O_never

      O_never A ↔ FP¬A ∧ PF¬A

In every state reachable by first going into the future and then into the past or by first going into the past and then into the future A is false.

Naturally, we can derive the following formulae:

      O_never A ↔ O_always ¬A
      O_almost-never A ↔ O_almost-always ¬A

It is clear how we can refine the time adverbial groups even more by applying the operators + and - more often. So we can describe seldom, very seldom, very very seldom, rather often, very often, and so on.

Attributed production rules

Here we present a part of the production rules together with their attribute functions. English sentences are mapped to state logic formulae and the formula representing a text is the conjunction of the formulae of the single sentences.

NG1  NG ::= N
     log([NG,1 pl]) = sy(N) ag(NG)
     log([NG,2 pl]) = sy(N) ag(NG) ags(NG)
     ag(NG) = x_ix(NG)
     sy(NG) = sy(N)

NG2  NG1 ::= A NG2 *)
     log(NG1) = log(A) ∧ log(NG2)
     ag(NG1) = ag(NG2)
     sy(NG1) = sy(NG2)
     agr(A) = ag(NG2)
     syr(A) = sy(NG2)
     ix(NG2) = ix(NG1)

In NG1, the state logic expression representing the noun group is constructed. It has the form P x resp. P x y, where P is the one- or two-place predicate symbol representing the noun. P is the value of sy for N resp. NG, x resp. y of ag resp. ags, these values having been assigned on the lexical level, as described above, resp. within this production rule. ag(NG) has the form x_i, where i = ix(NG) is a subscript. It is the name of the object described by the noun phrase. This subscript is generated on sentence level, i.e. within a production rule S ::= ..., because it must be guaranteed that different variables are generated for different noun phrases occurring within one sentence. For these variables occur as terms within one single predicate, the verb predicate of the sentence.

NG2 generates noun phrases such as the beautiful flower. The formula representing the meaning of the adjective has been generated on the lexical level. It is linked with the formula representing the noun group which has been constructed in NG1 by the connector ∧. All the other attribute values are submitted identically.

*) Subscripts are used to distinguish between identical non-terminals occurring within the same rule.
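A minimal sketch of what NG1 and NG2 compute for a one-place noun preceded by subset-selecting adjectives: the noun contributes a predicate applied to the variable x_i, and each adjective is conjoined in front of it. The function name is hypothetical, formulas are plain strings, and relational adjectives and two-place nouns are deliberately left out.

```python
def noun_group(noun_symbol, adjective_symbols, ix):
    """Return (log, ag) for a noun group such as 'large yellow teeth'.

    noun_symbol        predicate symbol of the noun, e.g. "FLOWER"
    adjective_symbols  predicate symbols of the adjectives, in textual order
    ix                 subscript handed down from sentence level
    """
    ag = f"x{ix}"                      # object name x_i, as in NG1
    log = f"{noun_symbol} {ag}"        # NG1: predicate applied to the variable
    # NG2 is applied once per adjective; the adjective nearest to the noun
    # is attached first, so iterate from right to left.
    for adjective in reversed(adjective_symbols):
        log = f"{adjective} {ag} ∧ {log}"
    return log, ag
```

Because the subscript is passed in from outside, two noun groups in one sentence receive different variables as long as the caller hands down different values of ix.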

Elementary noun phrases

NP1  [NP,- cp,- eb] ::= DET NG
     log(NP) = (q(DET), log(NG), ag(NG), con(DET))
     ag(NP) = ag(NG)    if con(DET) ≠ ε
     ag(NP) = c         if con(DET) = ε and M = {c}
                        where M = {z : w(NP)(s(NP), log(NG)_ag(NG)[z]) = T}
     ag(NP) = x_ix(NG)  else
     gm(NP) = ε                     if ag(NP) ≠ x_ix(NG)
     gm(NP) = <what sy(NG)>         if card M > 1
     gm(NP) = <there is no sy(NG)>  if M = ∅

The value of log for an elementary noun phrase is a quadrupel (quantifier, state logic expression, object name, connector); the state logic expression consists of the proposition the noun group contains; the quantifier which depends on the type of the determiner quantifies this expression; the value of ag is the name of the object the noun phrase describes and the value of con is the connector with which the noun phrase is linked to the other sentence formulae fragments. These values, q, log, ag, con, are the constituents of the state logic expression representing the noun phrase. This expression is only formed on sentence level. E.g. for the noun phrase all men within the sentence all men work we get log = (∀, MAN x1, x1, →). The definite logical expression representing the noun phrase is ∀x1 MAN x1 and the definite expression for the sentence is ∀x1[MAN x1 → WORK x1]; → is the connector linking the noun phrase formula to the verb phrase formula. The reason for this is that the constituents of compound noun phrases such as the teacher and all his pupils or neither John nor Mary must be still available on sentence level because they are arranged within the logical expression in another order than within the natural language sentence. In fact we have transposed the problem otherwise resolved by a transformation to the semantic level, and attribute functions perform the task of transformational rules in transformational grammars.

Noun phrases with definite article are not represented by a logical expression but only by an object name. The name of the object described by the noun phrase is either a bounded variable or a constant. This depends on the type of the determiner. If it is subclassified by [ex d] or by [all d] or by [indef d], i.e. it is all or some or a etc., then the noun phrase expression is quantified and the object name is the variable x_i which has been generated within NG1. Otherwise, i.e. if the determiner is subclassified by [def d], the noun phrase is definite, i.e. it has the form the u and we must find the object that is spoken of. The search condition is that c has the properties described by the noun phrase and that c is the only object of the world having these properties. The sentence take the big green ball is unambiguous only if it is clear what ball is meant, i.e. if there is only one ball that is green. This search condition is formulated by the attribute function for ag. w(NP) is the structure the dialog is evaluated in, s(NP) is the actual state of the dialog. The search condition requires that c is the only constant of the structure w(NP) such that log(NG)[c] is true in w(NP) at state s(NP).

If the search condition cannot be verified a following up sentence must be generated. It is a question of the form what sy(NG) if the noun phrase is ambiguous, i.e. there is more than one z such that log(NG)[z] is true. It is a sentence there is no sy(NG) if there is no such z. This following up sentence is generated by the attribute function gm.

The logical expression representing a noun phrase the u is u[c] where u is the logical expression representing the noun group and c is the constant designated by the noun phrase. We could also generate ∃x[u[x] ∧ ∀y[u[y] → x = y]]. The first expression u[c] can be derived from the second semantically by searching for an object c such that u[c] holds. Together with the requirement that the set of x such that u x holds contains only one element we have exactly the search condition of NP1. In NP1 we have formulated it in terms of structure and truth condition, i.e. on the semantic level of the logic. In the alternative approach it is formulated on the syntactical level of the logic only. In our solution we must verify the expression ∃x[u x ∧ ∀y[u y → x = y]] of the second solution when analysing the sentence and transducing it, namely when generating ag(NP). If we take the second solution we first generate the expression and evaluate it when the parsing of the sentence is already finished. In each of the two cases the search condition is the same. The difference is only when, i.e. on what level, we execute the necessary deductions. The advantage of our solution is that an ambiguity is discovered during sentence analysis and a following up question for resolving such an ambiguity can be generated and answered immediately.
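The search condition for definite noun phrases can be sketched as follows. The structure and the current state are hidden behind a caller-supplied predicate `holds`, which is an assumption of this sketch, as are the function name and the exact wording of the follow-up sentences.

```python
def resolve_definite(noun_symbol, holds, constants):
    """Look up the unique object described by a definite noun phrase.

    holds(c)   returns True if the noun group formula is true of constant c
               in the current structure and state (supplied by the caller)
    constants  the constants of the structure

    Returns (object name, follow-up sentence); exactly one of them is None.
    """
    matching = [c for c in constants if holds(c)]   # the set M of the text
    if len(matching) == 1:
        return matching[0], None                    # unambiguous reference
    if len(matching) > 1:
        return None, f"what {noun_symbol}?"         # ambiguous: ask back
    return None, f"there is no {noun_symbol}"       # nothing fits
```

Since the lookup happens while the noun phrase is being analysed, an ambiguity is reported before the rest of the sentence is processed, which is the advantage the text claims for this solution.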

NP2  [NP,- cp,- eb] ::= PN
     log(NP) = (ε,ε,ag(NP),ε)
     ag(NP) = sy(PN)
     con(NP) = ε

If a noun phrase consists only of a proper name it does not contain a logical proposition. The only "information" contained in such a noun phrase is the object name, i.e. the proper name.

NP3  [NP,- cp,- eb] ::= PRON
     log(NP) = (ε,ε,ag(NP),ε)
     ag(NP) = ag(PRON)
     con(NP) = ε

As in the case of proper names a pronoun only refers to an object and does not contain a logical expression. ag(PRON), i.e. the name of the object the pronoun refers to, must be found in the structure the sentence is evaluated in. There is no general rule for finding this object. One can compare the objects mentioned in the text and take the nearest one that fits morphologically and semantically, i.e. has the same number and gender and the semantic features the appropriate verb demands. Questions such as What is meant by he? are generated for ambiguities that are not resolvable in this way.

NP4  [NP,- cp,- eb,x pl] ::= POSSPRON [NG,x pl]
     log([NP,1 pl]) = (∃,beg,ag(NG),∧)
         where beg = log(NG) ∧ REL_NG ag(POSSPRON) ag(NG)
     log([NP,2 pl]) = (∃,log(NG),ag(NG),∧)
     agr([NG,2 pl]) = ag(POSSPRON)
     ag(NP) = ag(NG)
     con(NP) = ∧

If a possessive pronoun precedes a one-place noun there is a binary relation between the object the possessive pronoun refers to and the object the noun refers to. This relation is not explicitly mentioned. What relation is meant must be concluded from the semantic descriptions of the two objects. If the possessive pronoun refers to an animate object and the noun to a thing, REL_NG is most probably the ownership relation. If the noun refers to a part of the body, REL_NG is more probably the PART-OF-relation; if the two refer to things, REL_NG is probably the PART-OF-relation too.

For a two-place noun the relation between the two objects is already expressed by the predicate symbol representing the noun. Finding ag(POSSPRON) meets the same difficulties as for ag(PRON).
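The heuristic mentioned for pronouns (and hence for possessive pronouns), taking the nearest previously mentioned object that fits in number, gender and semantic features, can be sketched as below. The text stresses that there is no general rule, so this is only one possible reading; the Mention record, its field names and the feature labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    name: str                        # object name, e.g. a constant
    number: str                      # "sing" or "plur"
    gender: str                      # e.g. "masc", "fem", "neut"
    features: frozenset = frozenset()  # semantic features, e.g. {"animate"}


def resolve_pronoun(mentions, number, gender, required=frozenset()):
    """Return the name of the nearest fitting antecedent, or None.

    mentions  objects mentioned so far, in textual order
    required  semantic features demanded by the verb
    A result of None means a question such as 'What is meant by he?'
    has to be generated.
    """
    for mention in reversed(mentions):          # nearest mention first
        fits_morphologically = (mention.number == number
                                and mention.gender == gender)
        fits_semantically = required <= mention.features
        if fits_morphologically and fits_semantically:
            return mention.name
    return None
```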

Noun phrases with embedded sentences

NP5  [NP1,- cp,+ eb] ::= [NP2,- cp] [S,rl ks]
     log(NP1) = (π1(log(NP2)), I, π3(log(NP2)), π4(log(NP2)))
     I = log(S)                     if π2(log(NP2)) = ε
     I = π2(log(NP2)) ∧ log(S)      else
     ag(NP1) = π3(log(NP2)) = ag(NP2)
     agr(S) = ag(NP1)

The state-logic-formula log(S) representing the relative clause is linked by ∧ with the noun phrase formula. ks is a feature (kind of sentence) subclassifying sentences into relative clauses (value rl) and assertive sentences (value as). π_i is the i-th projection. We need this mapping here because log(NP2) has already been generated as a quadrupel (q,log,ag,con) (rules NP1 to NP4) and log(S) must be connected with the logical expression representing NP2 which is the second constituent of the quadrupel log(NP2). The quantifier q(NP2) is not generated until a rule S ::= NP V ... is applied, i.e. at sentence level. Thus, log(S) is always in the domain of the quantifier. The sentence All the children who are playing here are eleven years old is represented by ∀x[CHILD x ∧ PLAY x p → AGE x = (years,11)].

This case distinction is needed for relative clauses referring to proper names, i.e. for sentences like John, who is working in London, .... Here the NP John is represented by (ε,ε,John,ε). The proposition log(S) for the relative clause who is working in London must be linked with the other formula fragments of the sentence by ∧.

The object being described by the noun phrase NP2 is the same as is described by the relative pronoun of the embedded sentence. Its name must be made accessible to the sentence by agr.

Prepositional complements are treated in the same way as embedded sentences, i.e. the expression representing the prepositional phrase is linked by ∧ with the noun phrase.

Compound noun phrases

NP6  [NP1,+ cp] ::= NP2 CONJ NP3
     log(NP1) = (log(NP2), log(NP3), h(CONJ))
     ix(NP2) = ix(NP1)
     ix(NP3) = ix(NP1)

This rule is for noun phrases such as John and Mary, the teacher or the children, neither my father nor I. The value of h is the logical connector corresponding to the conjunction (e.g. the connectors h(neither ... nor) and h(either ... or ...)). log(NP1) is generated as such a triplet because the conjunction has such an effect on the noun phrase that the sentence the noun phrase is contained in is a compound sentence linked by this conjunct. E.g. the sentence I know John and Mary has the same meaning as I know John and I know Mary.

Sentences

We have production rules for types of sentences differing as to the kind and the number of verb complements. We will give here as an example the rule for a sentence with three verb complements. All the other sentence generation rules have the same form.

S1 [S, as ks] ::= NP1 [V, x t, y m, z f, 3 pl] NP2 NP3
log(S) = top([x t]) con([y m]) α(sy(V), log(NP1), log(NP2), log(NP3))
ix(NP1) = 1
ix(NP2) = 2
ix(NP3) = 3

α is a partial function. Its arguments are the expressions log(NP) generated within the NP-rules. α constructs a well-formed logical expression from the formula fragments which are the constituents of the log(NPi). sy(V) is the predicate symbol for the verb V. log(NPi) is either a quadruple (quantifier, formula fragment, object name, connector) or a triplet (l1, l2, h) where l1 and l2 are again log(NP) expressions (i.e. triplets or quadruples) and h is a connector of L(Σ). In the first case log(NPi) is called elementary.

Now we are able to define α for an n-place verb predicate.

Definition scheme of α(sy, l_1, ..., l_n)

I. Let all l_i be elementary, i.e. l_i = (q_i, F_i, ag_i, con_i). Then

α(sy, l_1, ..., l_n) = q_1 ag_1' [F_1 con_1 q_2 ag_2' [F_2 ... q_n ag_n' [F_n con_n sy ag_1 ... ag_n] ... ]]

where ag_i' is empty if q_i is empty, and ag_i' = ag_i otherwise.

Example for the two-place verb know: From the sentence Every boy knows a girl we get:

log(NP1) = (∀, BOY x1, x1, →)
log(NP2) = (∃, GIRL x2, x2, ∧)
sy(V) = KNOW
α(KNOW, (∀, BOY x1, x1, →), (∃, GIRL x2, x2, ∧)) = ∀x1[BOY x1 → ∃x2[GIRL x2 ∧ KNOW x1 x2]]

II. Let i0 be the least i such that l_i0 is not elementary, l_i0 = (k_1, k_2, h). Then

α(sy, l_1, ..., l_i0-1, (k_1, k_2, h), l_i0+1, ..., l_n) = α(sy, l_1, ..., l_i0-1, k_1, l_i0+1, ..., l_n) h α(sy, l_1, ..., l_i0-1, k_2, l_i0+1, ..., l_n)

If there is more than one non-elementary noun phrase argument of α, α is only defined when the corresponding connectors are compatible. Let l_i = (k_1, k_2, h_i) and l_j = (m_1, m_2, h_j). Then h_i and h_j are compatible iff

(α(sy, l_1, ... k_1, ... m_1, ... l_n) h_j α(sy, l_1, ... k_1, ... m_2, ... l_n)) h_i (α(sy, l_1, ... k_2, ... m_1, ... l_n) h_j α(sy, l_1, ... k_2, ... m_2, ... l_n))
≡ (α(sy, l_1, ... k_1, ... m_1, ... l_n) h_i α(sy, l_1, ... k_2, ... m_1, ... l_n)) h_j (α(sy, l_1, ... k_1, ... m_2, ... l_n) h_i α(sy, l_1, ... k_2, ... m_2, ... l_n))

This holds iff

(x_1 h_j x_2) h_i (y_1 h_j y_2) ≡ (x_1 h_i y_1) h_j (x_2 h_i y_2)

The connector for neither ... nor, written # here, is not compatible with itself: (a # b) # (c # d) is not equivalent to (a # c) # (b # d). For this reason α is not defined for a sentence like Neither the teacher nor his pupil know neither the alphabet nor multiplication tables. Such a sentence is refused semantically. The sentence is incorrect because it allows more than one analysis.
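
The definition scheme of α can be illustrated by a small sketch. It is not the authors' implementation: formulas are kept as plain strings, ASCII names (ALL, EX, ->, &) stand in for the logical symbols, and an empty quantifier and formula fragment are assumed for proper names. The function name `alpha` and the tuple layout are illustrative choices.

```python
def is_elementary(arg):
    # Quadruple (quantifier, formula fragment, object name, connector)
    # versus triplet (left argument, right argument, connector).
    return len(arg) == 4


def alpha(sy, args):
    """Build a formula string for verb predicate `sy` from NP arguments."""
    # Case II: split on the least index whose argument is a triplet.
    for i, arg in enumerate(args):
        if not is_elementary(arg):
            left, right, h = arg
            first = alpha(sy, args[:i] + [left] + args[i + 1:])
            second = alpha(sy, args[:i] + [right] + args[i + 1:])
            return f"({first}) {h} ({second})"
    # Case I: all arguments are elementary. Start with the verb predicate
    # applied to all object names and wrap it from the innermost NP outwards.
    formula = sy + " " + " ".join(ag for _, _, ag, _ in args)
    for q, fragment, ag, con in reversed(args):
        if fragment:
            formula = f"[{fragment} {con} {formula}]"
        if q:  # assumption: an empty quantifier contributes nothing
            formula = f"{q} {ag}{formula}"
    return formula


every_boy = ("ALL", "BOY x1", "x1", "->")
a_girl = ("EX", "GIRL x2", "x2", "&")
print(alpha("KNOW", [every_boy, a_girl]))

# "I know John and Mary": the second argument is a triplet.
i, john, mary = ("", "", "I", ""), ("", "", "John", ""), ("", "", "Mary", "")
print(alpha("KNOW", [i, (john, mary, "&")]))
```

The second call shows the transformation-like behaviour of case II: the compound noun phrase is unfolded into two conjoined elementary sentences. The sketch does not check connector compatibility when several triplets occur.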

The connector for either ... or ..., written + here, is self-compatible: a + (b + c) ≡ (a + b) + c and therefore

(a + b) + (c + d) ≡ (a + c) + (b + d)

The two analyses of the sentence Either John or Mary drinks either champagne or beer have the same meaning representation.

It is obvious that α acts like a transformation rule analysing a sentence like a and b do c and d into a does c and a does d and b does c and b does d.
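
The compatibility condition (x_1 h_j x_2) h_i (y_1 h_j y_2) ≡ (x_1 h_i y_1) h_j (x_2 h_i y_2) can be checked mechanically by a truth table. The sketch below does so under an assumption that is not stated in the text: neither ... nor is read as Boolean NOR and either ... or as exclusive or. The function names are illustrative.

```python
from itertools import product


def compatible(h_i, h_j):
    """True iff (x1 hj x2) hi (y1 hj y2) == (x1 hi y1) hj (x2 hi y2) for all truth values."""
    return all(
        h_i(h_j(x1, x2), h_j(y1, y2)) == h_j(h_i(x1, y1), h_i(x2, y2))
        for x1, x2, y1, y2 in product([False, True], repeat=4)
    )


def conj(a, b):   # "and"
    return a and b


def disj(a, b):   # "or"
    return a or b


def xor(a, b):    # assumed reading of "either ... or"
    return a != b


def nor(a, b):    # assumed reading of "neither ... nor"
    return not (a or b)


for name, h in [("and", conj), ("either-or", xor), ("neither-nor", nor)]:
    print(name, "self-compatible:", compatible(h, h))
print("and / or compatible:", compatible(conj, disj))
```

Under these readings the check agrees with the text: the either ... or connector is compatible with itself, while the neither ... nor connector is not, so a sentence with two neither ... nor noun phrases has no defined representation.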

The value of top is the sentence's tense operator determined by the value x of the feature t. con is ¬ or empty according to whether the verb is negated or not. The subscripts of the object variables of the NPs are generated on sentence level, within an S-rule, since NPs that are different on this level must receive different object names because they are within the scope of one verb predicate, the predicate sy(V) of the verb of the sentence.

Relative clauses

For relative clauses the relative pronoun is generated directly and it is not the result of the application of a transformation rule to a noun phrase once generated. As before we have a sentence rule for every type of relative clause, corresponding to the kind and the number of verb complements and to the grammatical function of the relative pronoun in the sentence. We give an example rule for a verb with three complements and the relative pronoun as its first object.

S2 [S, obj1 rel] ::= RP NP1 [V, x t, y m, z f, 3 pl] NP2
log(S) = top([x t]) con([y m]) α(sy(V), log(NP1), (ε, ε, agr(S), ε), log(NP2))
ix(NP1) = 1
ix(NP2) = 2

The argument of α corresponding to the relative pronoun RP is elementary and refers to another NP within the sentence. The name of its object is agr(S) and has been assigned within NP5. log, q, and con are empty (ε) because the object does not contain any statement (any concept).

38 C~,,,[,ou,!dsent,,nees Cl S ::: SI[CONJ,x kc] Sa log(S) : log(S=) h([x kc]) log(S,)


compound sentences as

This rule d e s c r i b e s
~t does not rain o r

John works i n
and Mary

the garden
esudlee

~fl

also John worko

in London

aS go-

sex. tion. CI.1

h([x kc]) it depends x = cond

is the connector of on the value of kc .

L(Z)

representing

the conjunc-

h([cond kc]) = Conditional Example:


if is tence

llnks between sentences

are expressed by

~ . iS Sm ;

t h e umbreZZa i f i t r a i n s . It r a i n s [CONJ,cond kc] a n d I wi$$ t a k e t h e umbreZZa i s


represented by Sm * S , .

1 wiZZ

take

SI 9 T h e s e n -

is

C1.2

x : temp
S, after

Sm

is represented by log(St) following state where

log(S=) i.e. log(S=)

~ + ~

and there is an immediately

log(S,). Example:
I went I wont to bed after I had eaten. I had care,

is

S,

and

to bed is

S, . The sentence log(St);

is represented by
to bed.

log(S=) CI.3

~ + ~

i.e. I had eaten and then I went

x = caus $I because Sa is represented ^ log(St) Sl and by ^ log(Sa) SI and $I Sa both hold. Ss if Our is ~D[log(Sa) log(S1)] implies

i.e. sometimes Example:

Sa

I am wet beaause

it is raining.

is I am wet and i.e. sometimes (~)

it 48 ralning. ~D[log(S=)
it ia raining

The representation ^ log(S,)

is ^ log(S=); and it is raining.

~ log(S,)]

I am ~eS and I am wet

We are aware of the problems arising in connection with causality. Our formalization does not prevent things which are sometimes true at the same time from being treated as causally related. But we think that humans intend to relate things causally in this way.

C1.4 x = conc

$4 aZthough S a i e
(~D[IoE(S=) i,e. almost always Example: * ~

repros=need by
IoE(SI)] ^ log(St) ^ IoE(S=) implies ~log(S1) and the both hold.

log(S=)

t a k e the u m b r e l l a aZthough i r i a r a i n i n g ; this means t h e umbreZZa if i t i a r a i n i n g and i t ia r a i n i n g az,d I do n o t take t h e umbretZa; i.e. (-ff~D[log(S=) * ~log(S1)] ^ log(Sa) ^ log(S4) where $I is I do not ga~r tha umbraZZa and Ss is it is raining.
almost always I t a k e

I do n o t

Sample sentence ~ e n e r a t l o n

big

red

blook

Gs(NG2)= NP~(NP1) (D1)


(A2) (At) con(DET) q (DET) log(A~) = ^ = 3 = SIZE agr(A1) ~{meter} CPb4g, syr(A1) ~{meter} = OPblg

where SIZE = q)big and lOg(As) : RED agr(As) sy(N) BL : BL : BL(') = b-~

(N1)

(NG1) log(NO)
ag (NG)

= BL aK(NG) = BL Xlx(Nfl) = Xix(NG)

40 sy(NS) (NG2)I IoZ(NGI) ~ig(NG1) sy(NG1) agr(A) syr(A) ix(NG) (NG2)m log(NGz) : BL = I~ID agr(Az) = HED Xix(NG) = ag(NG) = sy(NG) 9 ag(NG) = sy(NG) = ix(NG1) = SIZE Xix(NG1 ) ~(meter) ^ (meter,l)A a bL Xix(~G ) ^ BL xix(NG)

= Xix(NG) = BL

RED Xix(NG1 ) The values submitted of the other identically.

BL Xix(NGI ) aE, s y , a g r , s y r and ix are

attributes

(NP1)

Iog(NPI)

= (3,l,Xix(NP,),^) i = size

where (meter,l) ^ R E D Xix(NP1 ) A

Xix(NP~ ) ~{meter)

Let log(NP1) If the noun phrase

BL Xix(NP1 ) = 11 NPI is the part aZZ of a compound noun phrase


balle

9 9 9

and

V
NP,

[co Jl( o )
all

NPa NPs
we get (NP6)

(NPI)

(D~) (co) (NPI)


(NP6)

con(DET) q(DET)

= * = V like NPI : = ls

h(CONJ) = ^ NPs is derived Ios log(NPs)

: (V,BALL X i x ( N P s ) , X i x ( N P a ) , = (11,1z,^)


Let now be
John

NP~

the object of a uentrnce


gave 99" to ~he teacher

/P e r f
PN

~ /
m [

I/

NPm

\
~

[def

kdJ

...z

NP- (NP2~)

(NP2)

Iog(NP,) ag(NPm) con(NPm)

= (r162
= John

= c : ^

(D2)

con(DET)

q(DET)

= r
w(NP) , contain more than one object and Tcm for cl,gm objects of

L~.t the appropriate structure, which is a teacher; w(NP). i.e. Tcl

(NP1)

Iog(NP~) ag(NP~) gm(NP~)

= (c,TXix(NP~),Yix(NP~),a) = Yix(NP~)
= <what teaoher>

The sentence analysis is interrupted here and the follow-up question what teacher? is generated. When it is answered, say by c, or by a more detailed description, e.g. who is working in Munich, this answering noun phrase is evaluated in the same way and if necessary another follow-up question is generated. When the answer is satisfying, the object found, say c, is inserted into log(NP2), which is thereby completed.

(sl)

log(St) =
~P'~m(GIVE,(r162162 = ~P-~[~(GIVE,(c,r ^~(GIVE,(c,r162

42 = ~[[3xa[.SIZE ^[u Let now Sa is Sa $I x, ~(m} (m,1) ^ RED xa ^ BLxa

^[Tc ^ GIVE John x, c]]] xm ~ [Tc ^ GIVE John xa]]]] clause embedded into a noun phrase, where

be a relative

with John replaced by who:


John who l ie v)orkin H

',/

""/

pl

NP, (NP5)

s3 (s2)
(NPS) log(NPi) ag(NPe) agr(Sa) ($3) log(Sa) = (r
= John

= John rule not explicitely mentioned here. = a(WORK,(c,r = log(St) a WORK John

is a sentence generation

Let this sentence be embedded into a causal sentence:

John is working because he needs money.

(C2.3) is not explicitly mentioned here. This rule generates compound sentences where the superordinate sentence precedes the subordinate one.

lo~(Sm)

9 ~D[log(s,) a log(S~)

- log(S,)]

^ log(s~)

Dialogs

A dialog is a sequence of sentences of L(G) where a question is followed by an assertive sentence, the answer, or by another question, the follow-up question. A command or a statement is followed by another statement, I carried it out or I understand, or by a follow-up question. Follow-up questions are generated whenever the sentence contains an ambiguity, i.e. a definite noun phrase whose object cannot be determined or a pronoun the reference of which cannot be determined.
Definition

1. A dialog cycle on a structure A = ({A_s : s ∈ M}, R0) in a state s is a sequence S QC A where S is a sentence of L(G).

QC is empty or a sequence of pairs Q B where Q is a follow-up question of the form what v and B is an answer to Q of the form the v or c. QC is empty if there is no follow-up question to S, i.e. gm = ∅. QC = Q1 B1 Q2 B2 ... Qn Bn if gm = {Q1, Q2, ..., Qn}.

A is the answer to S. If S is a question, A is yes or no or is of the form the v or c. If S is a wh-question and A is of the form the v resp. c, then A(s, l) = T where l is log(S) with the variable generated for the question word replaced by the name of the object described by the v resp. by c. If S is an alternative question, A = yes iff A(s, log(S)) = T. If S is a command, A is I carried it out or I cannot do it. If S is an assertive sentence, A is I understand.

2. A dialog on a structure A = ({A_s : s ∈ M}, R0) is a sequence of dialog cycles D1 D2 ... Dn on A in s1 s2 ... sn such that for 1 ≤ i < n: s_i R0 s_i+1, and s_i ≠ s_i+1 iff D_i = S_i QC_i A_i, S_i is a command and A_i = I carried it out.

This means a dialog is evaluated in a Σ-structure. It constitutes a path through the structure along R0 and it makes a step forward whenever a command is carried out. We will see in the following paragraph what the effect of a command in a structure is.

CHARACTERIZATION OF QUESTION-ANSWERING-SYSTEMS BY Σ-STRUCTURES

Here we shall describe how a structure for the state logic can characterize a natural language understanding system. The knowledge that is formulated in such a system is represented by a Kripke-structure. In connection with this the non-logical interpretation of state transitions is very important.

The very general model of Kripke-structures is used in such a way that the relation R0 bears a non-logical meaning. For two structures A_s and A_s', A_s R0 A_s' holds iff the "world" A_s' is obtained from the world A_s as the result of an action which can be executed within A_s. What actions can be executed within a world depends on the extensions of the non-logical symbols. On natural language level actions are verbs. We characterize all these dependencies by non-logical axioms.

Worlds, state changes and their dependence on action verbs

A world is a set of objects that have certain properties, i.e. a color, a size, etc. They are subclassified by conceptualizations like MAN, TREE, TABLE, HUMAN, etc. There are relations which can hold between objects, i.e. position relations such as ON, BESIDE-OF, etc., or "abstract" relations such as the ownership relation OWN. All these elements of a world are represented by the language of the state logic, as can be concluded from the description of the last paragraph. Objects are constants, properties are predicate or function symbols, relations are predicates of the appropriate number of places. Verbs are relations between objects too and they are represented by predicate symbols of the appropriate number of places.

We can distinguish two types of verbs, called static verbs and dynamic verbs. An n-place verb is called static if it does not describe an action that changes relations or functions of objects of the world; i.e. if the formula representing the assertion of the verb holds within a structure, this does not effect any change on objects of the structure. Examples of static verbs are believe, want, know. An n-place verb is called dynamic if a structure is subject to some change whenever the action described by the verb is executed in it. Examples are take, give, grow, put. If somebody takes a thing the position of that thing changes, i.e. the extension of the predicate symbols ON, BEHIND etc., and the extension of the static verb predicate symbol HOLD changes, because the person holds the thing now. Constructive verbs are always dynamic, e.g. build, paint. She builds a house. He paints a picture. The appropriate thing comes into existence by execution of the action.

Further, there are conditions for the execution of an action: a takes b is only possible if a does not yet hold anything and if b has such a position that it can be taken, i.e. there is nothing on b. We describe both the conditions and the consequences of an action by non-logical axioms. And the appropriate structure must have the property that in whatever state all the conditions of an action hold there must be some following state in which its consequences are realized.

We would like to point out a peculiarity which arises when logical expressions containing dynamic verbs are evaluated: If an expression v x1...xn for a dynamic verb v holds within a state of a structure, this does not mean that the appropriate action is really executed but that it is possible to execute it; i.e. all the conditions are true and there is a following state where all the consequences are true. It is always possible that within a state more than one action can be executed, but not all at the same time, and so there are several following states which do not depend on each other. What actions are "really" executed is described by a text or a dialog which constitutes something like a path through a structure.

Characterization of action verbs by non-logical axioms

Definition
For every n-place action verb v there are two non-logical axioms defining v:

(1) Condition axiom (CA)
v x1...xn ↔ C[x1,...,xn]

(2) Execution axiom (EA)
v x1...xn → E[x1,...,xn]

C is called the condition of v and E is called the execution of v. (CA) and (EA) are called action axioms of v.

Naturally C[x1,...,xn] → E[x1,...,xn] holds.

(CA) describes what requirements must be complied with by the objects involved in an action in order that the action can be executed.

Example: take

(CA) TAKE x y ↔ HAND x ∧ THING y ∧ ¬HOLD x z ∧ ¬ON z y

This means: x can take y iff x is a hand and y is a thing and x does not hold any other object and there is nothing on y.

(EA) describes the consequences of the execution of an action. E in (EA) has the form +E', where + reads "there is an immediately following state such that", and E' does not contain F, P, +, - as substring.

Example: take

(EA) TAKE x y → +[HOLD x y ∧ ¬ON y z]

This means: if x takes y then there is always an immediately following state such that x holds y and y is not lying on anything.

Since C → E holds we have: For every action verb v and every state s where its conditions hold there is a state s' immediately following s (s R0 s'), where the results of the execution of the action described by v hold.
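
The two action axioms for take can be made concrete with a small state-transition sketch. This is only an illustration under assumptions of our own: a state is a frozen record of finite sets, HOLD and ON are sets of pairs, and the names `State`, `can_take` and `take` are hypothetical. `can_take` plays the role of the condition C, and `take` returns an immediately following state in which the consequences E' hold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    hands: frozenset   # objects x with HAND x
    things: frozenset  # objects y with THING y
    hold: frozenset    # pairs (x, y) meaning HOLD x y
    on: frozenset      # pairs (z, y) meaning ON z y (z lies on y)


def can_take(s, x, y):
    """Condition (CA): x is a hand, y a thing, x holds nothing, nothing is on y."""
    return (
        x in s.hands
        and y in s.things
        and not any(holder == x for holder, _ in s.hold)
        and not any(below == y for _, below in s.on)
    )


def take(s, x, y):
    """Execution (EA): return a following state, or None if (CA) fails in s."""
    if not can_take(s, x, y):
        return None
    return State(
        s.hands,
        s.things,
        s.hold | {(x, y)},                                  # HOLD x y
        frozenset((a, b) for a, b in s.on if a != y),       # y lies on nothing
    )


s0 = State(
    hands=frozenset({"hand"}),
    things=frozenset({"ball", "block", "table"}),
    hold=frozenset(),
    on=frozenset({("ball", "block"), ("block", "table")}),
)
print(can_take(s0, "hand", "block"))  # the ball lies on the block
s1 = take(s0, "hand", "ball")
print(s1.hold, s1.on)
```

Because `take` returns a new state instead of modifying the old one, the pair (s0, s1) corresponds to one step s R0 s' with s ≠ s' in the structure.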

So, we can define a model for action verbs:

Definition
Let A = ({A_s : s ∈ M}, R0) be a Σ-structure, V a set of action verbs and N a set of non-logical axioms for Σ, i.e. condition and execution axioms for all v ∈ V and all the other non-logical axioms. Then A is a model for V and N iff A(s, ax) = T for all axioms ax ∈ N and all s ∈ M.

Because of the correctness of Σ the following theorem holds:

Let A = ({A_s : s ∈ M}, R0) be a model of a set of action verbs V and N. Then for every v ∈ V such that C is the condition of v and E = +E' is the execution of v, and for every s ∈ M, we have:

A(s, C) = T → there is s' such that s R0 s' and s ≠ s' and A(s', E') = T

The differentiation between (CA) and (EA) is important too for the use of the axioms when deriving answers to questions or executing commands. Then a system must first verify (CA); if this is not possible, a state of the world must be found where (CA) holds. This is done by the use of other axioms. When (CA) is verified, (EA) can be carried out and the dialog can pursue its path within the structure. So (CA) and (EA) constitute "heuristic" aids for the execution of commands.

CONCLUSION

We have given a device to describe how natural language sentences can be translated into a semantic representation and how one can represent an underlying knowledge system. The proposed formalism only constitutes a first attempt in formalizing "what is understanding of natural language text". Among the most important problems are a better and more refined subclassification of adjectives, a satisfactory description of mass nouns and a revision of the logic to allow the appearance and disappearance of objects of the world. As to the latter we have either to revise the substitution rule or to allow only closed formulae to be manipulated by inference rules. Both restrictions are not very satisfactory solutions (see also [17], [18], [19]).

REFERENCES

[1] Winograd, T., Understanding Natural Language. Academic Press 1972.
[2] Schank, R.C. and Abelson, R.P., Scripts, Plans, and Knowledge. Advance Papers of the IJCAI 4, Sept. 1975.
[3] Bobrow, D., Natural Language Input for a Computer Problem Solving System. In Minsky, M., Ed., Semantic Information Processing. Cambridge: The MIT Press, 1968.
[4] Kellogg, C., A Natural Language Compiler for On-line Data Management. Proceedings of the Fall Joint Computer Conference. New York: Spartan, 1968.
[5] Woods, W.A., Transition Network Grammars for Natural Language Analysis. Comm. of the ACM vol.13, Nr.10, Oct. 1974.
[6] Ershov, A.P., Mel'chuk, I.A., Nariniany, A.S., RITA - An Experimental Man-Computer System on a Natural Language Basis. Advance Papers of the IJCAI 4, Sept. 1975.
[7] Rescher, N. and Urquhart, A., Temporal Logic. Springer Verlag, Wien 1971.
[8] Kripke, S.A., Semantical Analysis of Modal Logic I, Normal Modal Propositional Calculi. Zeitschr. f. math. Logik und Grundlagen d. Math. Bd.9, 1963.
[9] Schütte, K., Vollständige Systeme modaler und intuitionistischer Logik. Springer Verlag 1968.
[10] Braun, S., Eigenschaften strukturierter Symbole in formalen Sprachen. Habilitationsschrift, München 1971.
[11] Knuth, D.E., Semantics of Context-Free Languages. Math. Syst. Theory 2, 1969.
[12] Koster, C.H.A., Affix-Grammars. In Peck, J.E.L., ALGOL 68 Implementation, North Holland Publ. Comp. 1971.
[13] Chomsky, N., Aspects of the Theory of Syntax. Cambridge, MIT 1965.
[14] van Wijngaarden, A., Ed., et al., Report on the Algorithmic Language ALGOL 68.
[15] Bruce, B., Case Systems for Natural Language. Artificial Intelligence 6, 1975.
[16] Moravcsik, J., Mass Terms in English. In Hintikka et al., Ed., Approaches to Natural Language, D. Reidel Publ. Comp. 1973.
[17] Hintikka, J., Modality and Quantification. Theoria 1961.
[18] Kripke, S.A., Semantical Considerations on Modal Logic. Acta Philosophica Fennica 1963.
[19] Montague, R., Universal Grammar. Theoria 36, 1970.

ACCESS TO DATA BASE SYSTEMS VIA NATURAL LANGUAGE

Klaus-Dieter Krägeloh
Peter C. Lockemann
Fakultät für Informatik
Universität Karlsruhe

Abstract

Communication with computers via natural language is one of the major concerns of artificial intelligence. Modern approaches try to achieve this goal by simulating human language perception. The resulting models are highly complex, because the semantics of natural language statements must remain largely unrestricted. The communication with a commercially available data base system, on the other hand, deals with a heavily restricted formal model of the subject matter in the form of a data base. This is mainly due to the large amount of data that must be inspected and manipulated within reasonably short time. The semantics of any dialogue with the machine are such that all statements can be related to the formal model.

In light of this difference the definition of a natural query language for data base systems and the mapping of natural language statements to a modelling system will have to be approached in a manner different from artificial intelligence. In general one will try to make use of the results of linguistic research in artificial intelligence to the extent that they can take the different requirements of data base systems into account. In other cases simpler and perhaps more pragmatic solutions will be required.

The intent of this paper is to illustrate such an approach to natural language access to a data base system. In doing so a number of premises for natural language analysis in data base systems are developed concerning the syntactic analysis, morphemic analysis, semantic validity tests etc. These premises underlie the work on a data base system that provides for a set-theoretic modelling system and a German language interface.

1. Goals of natural language processing for data base systems

1.1 Simulation of natural language understanding

Natural language communication with the computer has a relatively long tradition in informatics. Depending on their objectives, these activities fall into one of two disciplines, artificial intelligence (AI) or data base technology (DT).

In AI the conception of language understanding systems is part of the more comprehensive aim to simulate cognitive processes on the computer [1,2]. The basic concern of AI in this connection is the linguistic component of cognition, i.e. the assignment of meaning to language entities. Characteristic for man is his capability of producing a cognitive image of his environment. This image (called a model) is always an abstraction from the real world, chosen with respect to the purpose it is to serve. In language understanding, the statements about the real world are related to the cognitive model (assignment of meaning), and thus cause reactions such as modifications of the model, evaluation of the model, or answers.

The simulation of this process above all requires specification of a modelling system (MS), by means of which any environment or part of it can be described as a model (representation problem). A language understanding system is always based on a model formulated in some MS. Further, there is a need for mechanisms which put natural-language expressions in relation to that model (fitting problem [1]).

Characteristic for human language understanding is the fact that meaning cannot directly be constructed from some basic units of meaning [1]. Instead, complex relations between the model objects enter as well into the meaning of language units such as words. Winograd mentions words like "virtue" or "democracy" as examples. In order to account for the aspect of relations, special modelling systems are usually developed for language processing. This, however, leads to very complex MS and, consequently, to extensive models even in those cases where only small sections of the environment are considered (e.g. semantic memories [3], dependency networks [4], demons [5]).

Obviously the complexity and the size of the model affect the fitting problem and, if of concern, the rules for deriving the system reactions. Take as an example the task to find in an extensive semantic net all those subnets corresponding in their structure to a given net.

Both in human cognition and in the case of its simulation, models must be physically represented, and operators for manipulating them must be realized. In the first case we assume a practically unlimited memory (brain, neurons, etc.). This provides a fully associative solution of the fitting problem even for highly complex model structures, such as recollection of impressions, moods, etc. In computer simulation, however, we have to deal with physical devices and processes of limited capacity and duration. Given a complex MS and its corresponding extensive models, this merely allows for the description and representation of a drastically limited portion of the environment. Likewise, no manipulation of models on an associative basis is directly possible on a computer as yet. Instead, the associative behaviour must be simulated, thus causing considerable costs and process times, e.g. long response times in an interactive mode.

This problem will certainly become less troublesome as new hardware technologies become commercially available (use of microprocessors, new storage devices, LISP-machines, etc.). Still, it is by no means clear from current discussions whether human cognition and its simulation on computer really differ just quantitatively and not qualitatively [6].
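
The subnet task named in this section (finding all subnets of a semantic net that correspond in structure to a given net) can be sketched by brute force, which also shows why its cost grows quickly with the size of the model. The representation is an assumption made for illustration only: a net is a set of labelled edges (node, label, node), and the function name `find_subnets` is hypothetical.

```python
from itertools import permutations


def find_subnets(net, pattern):
    """Return all injective mappings of pattern nodes to net nodes
    under which every labelled pattern edge is also an edge of the net."""
    net_nodes = sorted({n for a, _, b in net for n in (a, b)})
    pattern_nodes = sorted({n for a, _, b in pattern for n in (a, b)})
    matches = []
    # Try every ordered selection of net nodes as image of the pattern nodes.
    for image in permutations(net_nodes, len(pattern_nodes)):
        mapping = dict(zip(pattern_nodes, image))
        if all((mapping[a], label, mapping[b]) in net for a, label, b in pattern):
            matches.append(mapping)
    return matches


net = {("a", "ISA", "b"), ("c", "ISA", "b"), ("a", "OWN", "c")}
pattern = {("X", "ISA", "Y")}
for match in find_subnets(net, pattern):
    print(match)
```

With n net nodes and k pattern nodes the loop inspects n!/(n-k)! candidate mappings, so even moderately large nets make an exhaustive search impractical without further restrictions on the model.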

If one approaches the same problem from data base technology, however, one must take into account the necessity of administering and processing usually very large volumes of data. Consequently, even though a data base is again regarded as the model of some real world [7,8], the modelling system cannot be chosen completely at will but must meet a number of conditions:

1) At justifiable costs, the MS must allow for the description of even such worlds whose number of objects and facts to be modelled is very large (i.e. model representation should take up as little storage space as possible).

2) The rules for manipulating the models should be kept simple, since the time for evaluating even extensive models is very limited (acceptable response times in a data base system).

Under these conditions, data base systems must restrict their MS to those that can be formalized and are simple compared to those in AI.

Models in data base technology will abstract from the real world far more than the usual AI models. Consequently, the range of situations to which they are applicable will be much smaller than in AI, since they are much more likely to reject details that one may wish to include in them. Thus, models in DT are generally oriented towards a comparatively narrow purpose. Since in DT the operators defined on models are few, they are usually included in the MS, which is then called a data model [9,10,11,12,13].

Natural language access to data base systems has frequently been discussed: by providing easy access to users not familiar with formal models, one hopes that the systems will become more widely available [14,15,16,17].

Because of the differences in MS, it is obvious that language understanding systems for large data bases have to be based on something other than the simulation of cognitive processes. In DT the MS are rigorously defined by the interfaces of existing data base systems. Consequently, the language can be restricted to an extent necessary and reasonable for the usage of the system. Under these conditions natural language in data base systems, although much more diverse than programming languages, is still a formal language.

For practical use, these restrictions are quite acceptable. For instance, tests of the user behaviour in case of data base systems that provided (simulated) access by unrestricted English language have shown that the wealth of natural language expressions is never utilized [18]. Even more, certain query structures were used almost all of the time, thus indicating that a natural language interface for data base systems becomes highly stylized from the user's point of view, too. This might be due to a certain lethargy of the user, to whom it is difficult to conceptualize complex statements [19].

The objective of this paper is to examine the consequences for natural language processing subject to the conditions imposed by data base system interfaces. This will include discussion of the question which results from AI may be incorporated in the development of natural languages for data base access, and to which extent, without violating the aspect of practicability. In addition to AI, text analysis techniques in documentation (automatic indexing, morphemic analysis [20,21,22]) are another large area that may contribute their results. The conclusions of these considerations will be illustrated in this paper by means of natural language access to a particular data base system.

As mentioned linguistic outline

above,

the d i f f e r e n c e s

in M S h a v e a clear e f f e c t on the this f u r t h e r by g i v i n g a r o u g h in AI and DT.

approach.

We shall e x p l o r e

of the d e v e l o p m e n t

of l a n g u a g e p r o c e s s i n g

1.2 Approaches to natural language processing in AI

The early language processing systems in AI concerned themselves with the problem of translating a natural language into another [23]. These systems possessed a certain knowledge of the external structure of the languages (grammar), and of the correspondences between words of the languages (dictionary of synonyms). These approaches failed, since grammar and dictionary by themselves proved insufficient to deal with the different meaning of sentences like [1]:

(1) The fish was bought by the cook.
(2) The fish was bought by the river.

The system would have to know something about the real world to which the sentence is applied in order to recognize the difference in meaning of "by the cook" (person) and "by the river" (place).

Due to the failure in translation in the early sixties one turned to the broader question of how to simulate language understanding on the computer in general. One indication of a computer system's capability to comprehend natural language in these approaches would be, e.g., whether in man-machine communication the machine responds in a manner that seems "plausible" to its human partner. A system would be perfect if it passed the Turing test [24].

In order to enable the simulation of human language understanding, at least knowledge about the meaning of words relative to a specific environment must be provided (as we pointed out earlier). For this purpose the environment or part of it has to be modelled (MS). A certain portion of the model is assigned to the word as its meaning. The examples below, however, show that this approach is still not sufficient [25]:

(3) John jumps higher than Peter.
(4) John jumps higher than the Eiffel tower.

In order to recognize that (4) is meaningless, the system requires a more extensive knowledge of the environment than the meaning of individual words. The model must contain the (complex) fact that no living being can jump higher than the Eiffel tower; in general, it must establish relations between model parts.

Some of the MS used in the past cannot meet this condition or, at best, only if applied to sections of an environment too small to be of any interest. In particular, this applies to the earliest language understanding systems such as STUDENT [26], SIR [27], BASEBALL [28], ELIZA [29]. For example, the STUDENT system used equations as MS, i.e. it was restricted to a small mathematical world; ELIZA used patterns that were suitable only for very specific situations of a dialogue.

Many MS have the characteristics of a formal language; best-known among these is predicate logic (applied to a language understanding system by Coles [30]). In a formal-language MS the environment is ultimately modelled by means of certain basic units (axioms). Strictly speaking, they already belong to the class of MS which do not meet the demand for deriving the meaning of words from complex structures with diverse dependencies between model parts. Winograd [31] or Woods [32], therefore, preferred to construct more complex MS from programmes which provide both the description of environment situations and operational changes. Other authors (e.g. Schank [4]) argue that language understanding could only be simulated on the basis of cognitive relations, and correspondingly developed as MS complex data structures such as semantic memories [3] or dependency networks [4]. The representation problem is solved by demonstrating the capabilities of the MS on severely restricted worlds (blockworld [31], lunar geology [32], airline guide [33], industrial enterprise [34]).

1.3 Approaches to natural language processing in DT

The application of computers to the processing of large data sets in industry and public administration led to the development of a class of special-purpose program systems known under the name of "data base systems". A data base system provides mechanisms for the description, management and processing of large sets of data. Modern data base systems [11,35,36,37,38] are based on MS (designated as "data models") which make feasible the modelling of extensive environments. The structure of most MS is such that it is possible to formalize the description of models according to comparatively simple rules and to express their processing by means of algorithms. The formal languages developed for that purpose are called the interfaces of data base systems. Well known MS in use are the Relational Model [9], the Hierarchical Model [10], the Network Model [13], and the Binary Relation Model [12].

As soon as one attempted to make data base systems available to the casual user unfamiliar with EDP, even simple formalization rules and their corresponding formal languages proved to be a severe handicap to many users. Natural language as an end user language offers some hope of overcoming these problems, since the user need not learn a new and artificial language, but will only have to observe restrictions on a language already well-known to him. Natural language access to data base systems could be of particular advantage in areas like industrial management, medicine, pharmacy, engineering or public administration where even in data retrieval - as part of a problem solving process - the user may employ the very language in which he generally formulates his problem and solution (also see [17,33]).

The scope of meanings of natural language queries to a data base system is determined by the MS (data model). Generally, such an MS is not oriented towards cognitive theories as it is in AI. The interface of the data base system is maintained no matter what language is chosen for accessing it, i.e. in the case of natural language access, too. In these systems the aim of language processing is to translate natural language queries into expressions of the formal language at the system interface. Kellogg was one of the first to advance this approach. He chose as the MS of his CONVERSE system [39] a formal language dedicated to information retrieval and hence containing operators typical for this purpose. Likewise, Woods put his procedural semantics into the form of an interface, in which predicate and function symbols were defined for the underlying procedures. Both these approaches fall into an area between the AI- and DT-methodology. Also to be included at this point is the work by Thompson [40], whose ring structures may be interpreted both as the representation of cognitive relations and as the realization of a binary relational model [12]. While Thompson did not develop a formal interface, he - just like Woods - was heavily concerned with the integration of large data bases into his REL system.

The expressiveness of natural language goes far beyond the power of the interfaces of available data base systems. In order to make some use of this expressiveness the formal MS must at least allow for lengthy expressions based on a small number of operators (such as the definition of algorithms, nesting of functions). In general, this capability is only found in so-called navigating systems [41]. In these a complex problem is described by a single expression which is divided into a large number of successive or parallel steps during processing. The results of steps serve as a guide to the following search steps in the data base. A classic example is the relational model; suggestions exist for a natural language access to it [17,42]. Counter examples seem to be the Network Model, the Hierarchical Model, or simply file management. Correspondingly, no approaches have been published for use of natural-language access to them.

Other data models that might provide a basis for navigating systems are binary relations [12], LEAP-structures [43] or mathematical sets and relations [16]. The latter will serve as our starting point for demonstrating natural-language access to a data base system.

2 A set-theoretic modelling system (set language)

The set-theoretic modelling system of KAIFAS [38,44] is a formal language MS based on set and relation algebra. In order to express algorithms in the language, their elements must be classified into operators, operands and the control structure of the language. The operand (object) types of the set language together with symbols for the instances are listed in fig. 2-1 together with examples from a pharmaceutical application. The symbols of the instances are generated by indexing the symbols of the corresponding types. The language includes the standard set-theoretic operators (fig. 2-2). Special operators map relations to sets. Further, some logical and relational operators are defined.

The control structure of the language determines the sequential order in which the operators are executed. This is indicated in the language by expressions in functional notation, e.g.:

    ∈(I_Steicardin, M∩(M_prescription drug, Vg(R_drug, I_heart neurosis)))

(Interpretation: Is Steicardin a prescription drug for heart neurosis?)

The operators were applied in the sequence: Vg, M∩, ∈.

Loops are introduced by the use of bounded quantifiers. They offer the (only) possibility to formulate queries of any degree of complexity:

(1) Are all drugs for glaucoma prescription drugs?
(2) Which antibiotics are incompatible with cytostatic drugs?

In both examples the flow of control is identical: For each element of a given set (e.g.: "drugs for glaucoma" or "antibiotics") a certain condition (i.e. to be a prescription drug, to be incompatible with cytostatic drugs) is subject to a test. In (2) only those elements are listed for which the test yields "true". (1) corresponds to the following formulation in the set-theoretic machine:

    AL(x1, Vg(R_drug, I_glaucoma), ∈(x1, M_prescription drug))

Important quantifiers are:

    AL: all, every
    EI: some
    ZB: how many
    DB: which

Bounded quantifiers contain three arguments:

(a) The name of a bound variable, each of its substitutions defining an invocation of the loop: x1
(b) An expression (range) resulting in a set of objects: Vg(R_drug, I_glaucoma)
(c) An expression for the condition (scope) resulting in a truth value: ∈(x1, M_prescription drug).

Expressions containing quantifiers must be in prenex normal form, i.e. quantifiers must always appear as the leftmost part of an expression.
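The loop semantics of the bounded quantifiers AL, EI, ZB and DB can be illustrated with a minimal Python sketch. It is not the KAIFAS implementation: the data, the function signatures, and in particular the reading of Vg as "all objects related to a given instance" are assumptions made only for this illustration.

```python
# Toy data; names and contents are hypothetical and purely illustrative.
M_prescription_drug = {"drug_a", "drug_b"}
R_drug = {("drug_a", "glaucoma"), ("drug_b", "glaucoma"), ("drug_c", "headache")}


def Vg(relation, instance):
    """Assumed reading of Vg: all objects standing in `relation` to `instance`."""
    return {left for (left, right) in relation if right == instance}


# Each quantifier takes a range (a set) and a scope (a truth-valued test);
# the bound variable of the text becomes the lambda parameter.
def AL(rng, scope):
    """all, every: the scope holds for each element of the range."""
    return all(scope(x) for x in rng)


def EI(rng, scope):
    """some: the scope holds for at least one element of the range."""
    return any(scope(x) for x in rng)


def ZB(rng, scope):
    """how many: number of range elements for which the scope holds."""
    return sum(1 for x in rng if scope(x))


def DB(rng, scope):
    """which: the range elements for which the scope holds."""
    return {x for x in rng if scope(x)}


def is_prescription_drug(x):
    return x in M_prescription_drug


# "Are all drugs for glaucoma prescription drugs?"
all_prescription = AL(Vg(R_drug, "glaucoma"), is_prescription_drug)
# "Which drugs for glaucoma are prescription drugs?"
which_prescription = DB(Vg(R_drug, "glaucoma"), is_prescription_drug)
```

In this sketch every quantifier evaluates its range once and then tests the scope per element, which mirrors the description of a loop whose body is the condition.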

3 Premises for language analysis in data base systems

The main purpose of data base systems is to provide tools for the management and retrieval of large volumes of data that are maintained on peripheral storage devices. Access by natural language can only be justified if it does not consume an inordinate amount of resources such as storage space or processing time. Consequently a number of restrictions must be imposed on a natural language interface in a data base system as compared to the interface of a general language understanding system.

(1) The natural language interface should be describable in terms of a simple syntax model.

This suggests limiting the syntax model to context-free grammars. Previous research has shown that context-free grammars are inadequate for the purpose of defining natural language, but the examples used in the literature for demonstrating the need of more complex grammars are of a rather exotic nature, at least as far as their applicability in data base systems goes. Indeed, Kratzer [45] defined a comparatively large subset of natural German by means of a context-free grammar without indicating any need for restricting the semantics of his subset. Therefore one would expect a context-free definition to be justified all the more in connection with formal MS and the restrictions of the semantics corresponding to it. The work by Malhotra [18] also indicates that there is no need for an extensive language definition in the data base area. Hence the application of context-free languages does not seem to place unreasonable constraints on the formulations a user may be able to use.

(2) Simple procedures should be chosen for morphemic analysis.

The analysis of natural language, and German in particular, introduces the problem of morphemic analysis [20,46]. Depending on the permissible error rate (incorrectly reduced word forms), costs and efforts for solving this problem may rise arbitrarily high (e.g. [21]). Preferably, simple procedures should be chosen here again; the error rate resulting from even very simple procedures (masking [47]) is surprisingly low (~30 %).

(3) Verbs should be omitted from the interface.

Both requirements (1) and (2) can be justified all the more, since, in defining a language for data base systems, verbs may be omitted to a large extent. Obviously, verbs are indispensable as soon as a MS accounts for temporal relationships and, consequently, permits the description of dynamic processes (e.g. see REL [40]). Data bases of that kind, however, have so far never gone beyond pilot studies; data base systems in practical use do not include them yet.

(4) The parser should be simple and, on the average, fast.

Like (1) and (2), this requirement influences the complexity of the total system. A number of parsers for context-free languages are given in the literature [48,49,50]. Their efficiency is measured in terms of an upper limit of the time needed for processing sentences containing n words (in general the efficiency is proportional to kn³). However, for queries to a data base system n is comparatively small, so that for choosing a parser the factor k becomes of major importance.

(5) The semantic validity test should be performed only after a syntactic analysis.

The semantic validity of a query may be controlled in combination with the retrieval. Concurrent access to the data base would have a far more negative effect, since all dead ends during the analysis would add to the total retrieval time although not to the result. In spite of well-known objections [51] quite a number of authors working at natural-language access to data base systems [17,32] defend the principle of postponing the validity test. The number of syntactically correct but semantically meaningless constructions may be reduced by a special structure of the grammar (see 4.1).

Based on the above demands a subset of the German language was defined for accessing the KAIFAS data base system by means of a context-free grammar (4.1). Following the works by Schott [20] a procedure for simplified morphemic analysis (without verbs) was developed. The parser was derived from the M. Kay parser [52] since it best met the requirements of (4) (see 4.3, [38]). The translation from the natural language to the set algebraic language (MS in KAIFAS) follows traditional approaches. This process consists of the four steps:

(a) lexical analysis
(b) syntactical analysis
(c) code generation
(d) transformations.

Fig. 3-1 illustrates the interaction between the steps. During each of these the following functions are performed.

(a) The query is divided into terminal symbols. When searching a dictionary on those, their corresponding representations on the set-language level are found.
(b) The parser completes the syntactical analysis by means of the grammar.
(c) If the query is parsed to a sentence, the code generation will form an expression of the set-language using the terminal representations and the code fragments generated by the parser.
(d) Then, transformations will be applied to this expression according to certain rules which will be explained in chapter 4.5.

4 Translation: Natural language into set language

4.1 Type and form of the German grammar

4.1.1 Vocabulary

As pointed out earlier a context-free formalization of the natural language interface in KAIFAS was chosen. While a suitable subset of the German language can be described as a context-free language, it is just as important that this description be practicable, i.e. comprehensible and transparent to the designer and user, and easy to handle by the system. Practicability can only be achieved by using some additional tools.

Each context-free grammar contains a number of non-terminals in order to describe the syntactical phenomena of a language. In many natural languages these tend to be quite extensive, see e.g. the combinations of case, gender and number. In order to limit the set of non-terminals in the grammar, so-called complex categories (based on REL [40,53]) are introduced. These may be considered schemas for non-terminals, and consist of a main category and a number of features. Traditionally ([40,54,55]), the main categories are related to syntactical phenomena such as noun, noun phrase, whereas features refer to secondary phenomena such as number or case. The values of a feature correspond to such phenomena, e.g.:

    number (1) = singular
    case (2) = genitive

Schemas denote sets of non-terminals, e.g.

    N num,cas,gen

denotes a set of 24 non-terminals, all of them nouns.

Complex categories may be partially ordered by assigning values to the features:

    N num,cas(2,3),gen(1,2)

is a more restricted schema than N num,cas,gen, and denotes a set of only 8 non-terminals, since only two values are possible for each, cas(2,3) and gen(1,2). Assigning a single value to each feature of a complex category results in a single non-terminal.

The treatment of complex categories in KAIFAS is different from already existing approaches in several respects:

(a) Because of the restricted semantics in KAIFAS, main categories are chosen in accordance with semantical aspects (e.g. main category for sets: ME, for relations: RE). In this way we make sure that only semantically valid constructions are described by the productions of the grammar.

(b) A grammar whose main categories are based on semantical terms does not "naturally" reject sentences containing major errors such as missing congruence in number, gender and case of adjectives and nouns, since these concepts are based on the traditional syntactical categories. The necessary syntactical aspects will be assigned to the features. As a result KAIFAS closely follows the correspondences

    main categories - semantical aspects
    features - syntactical aspects.

For practical reasons this classification cannot always be maintained. The set language, for example, contains (semantically) different types of quantifiers which in many productions are handled in the same (syntactical) way. The number of main categories and hence productions can be reduced by expressing the difference in type on the feature level so that only one main category will be defined for quantifiers.

(c) Only binary features are allowed:

    case features: nom, gen, dat, acc
    gender features: mas, fem, neu
    number features: sin, plu

The binary values are designated by +/-. Then, operations with features may easily be expressed by logic formulas.

Fig. 4-1 provides a list of the main categories and features of the KAIFAS grammar. It is apparent from the figure that the main categories fall into two classes,

(a) object-categories
(b) operator-categories,

corresponding to the classification of the set-theoretic language into objects and operators. Applying the same distinction to the terminals of the grammar results in

(a) object symbols which represent instances of the object types of the set-theoretic machine (the environment in the sense of [56])
(b) operator symbols which represent the operators of the language.

Because the set of terminal symbols is both variable and large (approximately 50.000 objects in the pharmaceutical area), terminal productions are not made part of the grammar, but are maintained by means of a dictionary (lexical analysis is described in chapter 4.2). The operator symbols form a fixed set, but some concordances can be identified, such as all operator symbols for quantifiers to which only one main category will be assigned (see above). The operator symbols are also included in the dictionary.
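The counting behind the schemas of 4.1.1 (24 non-terminals for N num,cas,gen, 8 for the restricted schema) can be checked with a short Python sketch that expands a complex category into its plain non-terminals. The function name `expand`, the tuple representation, and the mapping of cas(2,3) to genitive/dative and gen(1,2) to masculine/feminine are assumptions for illustration; the text only fixes case (2) = genitive.

```python
from itertools import product

# Feature domains as named in the text: 2 numbers, 4 cases, 3 genders.
# Note: the key "gen" is the gender feature, the value "gen" is the genitive.
FEATURE_VALUES = {
    "num": ("sin", "plu"),
    "cas": ("nom", "gen", "dat", "acc"),
    "gen": ("mas", "fem", "neu"),
}


def expand(main_category, restrictions=None):
    """Enumerate the non-terminals denoted by a schema.

    `restrictions` maps a feature name to the subset of values still
    allowed; unrestricted features range over their full domain.
    """
    restrictions = restrictions or {}
    domains = []
    for feature, values in FEATURE_VALUES.items():
        allowed = restrictions.get(feature, values)
        domains.append([(feature, value) for value in allowed])
    return [(main_category, combo) for combo in product(*domains)]


full_schema = expand("N")
# Assumed numbering: cas(2,3) = genitive, dative; gen(1,2) = masculine, feminine.
restricted_schema = expand("N", {"cas": ("gen", "dat"), "gen": ("mas", "fem")})
single = expand("N", {"num": ("sin",), "cas": ("gen",), "gen": ("mas",)})
```

Under these assumptions `full_schema` has 2 x 4 x 3 entries and `restricted_schema` has 2 x 2 x 2, and fixing every feature leaves exactly one non-terminal, which is the partial ordering described above.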

4.1.2 Productions

The use of complex categories requires a similar extension of productions into complex productions:

    NP cas,gen,num → Det cas,gen,num  N cas,gen,num

This schema is a production from which one may derive a set of context-free productions when substituting suitable non-terminals for the complex categories. In doing so, the feature values have to meet certain conditions (e.g. congruence in case and number). Consequently, the complex rules are separated into a rewrite rule defined on main categories only, and a feature program specifying which combination of feature values may be assigned to the complex categories in this rule. The feature program consists of a test section specifying the conditions that the feature values of the complex categories in the right-hand part of the production have to meet in order to apply the rule, and an assignment section defining the feature values for the left-hand complex category. Test and assignment specification could be done in list form ([45]) or by means of programs in a special programming language such as in KAIFAS. In summary, a complex rule may be defined as follows:

    (1) V0 → V1 ... Vp        rewrite rule
    (2) A(V1,...,Vp)          feature program (test)
    (3) Z(V1,...,Vp)          feature program (assignment of the features of V0)
    (4) S(V1,...,Vp)          semantical part of the rule

V0, V1, ..., Vp denote complex categories; in the rewrite rule (1) the same symbols stand for their main categories. Fig. 4-2 provides a summary of the operators used in feature programs.

The semantical part is a term for defining the meaning of the complex rule. It consists of set-language symbols, and place-markers for the semantics of the complex categories in the rule. Since some semantical aspects are treated on the feature level, the semantical part may depend on conditions concerning features. These dependencies are defined by feature programs as well. Thus the semantical part S(V1,...,Vp) of a complex rule may alternatively be phrased as

    A(V1,...,Vp) → S'(V1,...,Vp)    (dependencies on features)

or

    S'(V1,...,Vp)                   (no dependencies)

Fig. 4-3 gives an example of a complex rule.

A complex rule may be applied in the following steps (further details are provided in chapter 4.3):

(1) Matching the input string with the right-hand side of the rule
(2) Testing the features for acceptance
(3) If yielding "true", reduction to left-hand side and assignment of features and semantics

4.2 Lexical analysis

4.2.1 Assignment of complex categories

The dictionary contains all the object-symbols and all those operator-symbols which have been classified into types. Since the set of object-symbols is chosen in a user- or application-dependent fashion, these will not be defined until a user actually works with the system. The lexical analysis fulfills two functions:

(a) assignment of a complex category to a terminal
    (a1) assign a main category
    (a2) assign feature-values,
(b) assignment of semantics (i.e. a terminal symbol of the set language).

A large number of syntactical ambiguities may be expressed within one complex category, such as the ambiguity arising by the German word "ein" (+nom or +acc):

    ein  --lexical analysis-->  QU-mas-fem+neu+nom-gen-dat+acc+sin-plu

where the list of feature-values may be considered a conjunctive logical expression. If disjunction is needed as well, a conjunctive normal form is used.

Example from German language:

    drageeförmigen  -->  ME+mas-nom+gen+dat+acc+sin-plu | ME+fem+neu-nom+gen+dat-acc+sin-plu

Here the accusative case is allowed for the masculine gender, but not for neuter or feminine.

4.2.2

Morphology

The m u l t i t u d e

of i n f l e c t i o n s all w o r d

in G e r m a n forms

language

does not a l l o w

for sto-

ring in a d i c t i o n a r y vocabulary. Rather,

to be d e r i v e d only contains

from a large u s e r the w o r d stems. Reduc-

the d i c t i o n a r y form to w o r d

tion f r o m i n f l e c t i v e (morphological

stem is done of v e r b s

by a l g o r i t h m i c m e a n s simplifies the problem.

analysis).

The e x c l u s i o n

A word stem is defined as follows:
(a) nouns: nominative-singular
(b) adjectives: attributive form

Also, the morphological analysis must determine the syntactical structure of a terminal (gender, case, etc.). Different approaches have been published for solving this problem for the German language ([20]), but all of them require that extensive linguistic information be supplied with each word in the dictionary, which can hardly be expected from a casual user. Therefore, when defining a word stem in KAIFAS the user will only be required to specify a minimum of information, namely:
(1) object-class of a word
(2) gender
(3) noun/adjective
(4) singular and plural forms of the word

The word may then be assigned to a specific morphemic class (see [20]). This word class contains all morphemic endings that may be attached to the stem. Each morphemic ending will determine one or more syntactical structures (set of feature values) for all these terminals that contain this ending. By explicitly storing the plural forms of terminals the highly problematical reduction of terminals involving mutation of vowels becomes unnecessary. The additional storage space required may be tolerated, since plural forms occur in case of set- and relation identifiers only. Fig. 4-4 presents an example of a morphemic class.

In order to save storage space in the dictionary, the syntactical structure of a word stem will also be defined by morphological analysis.

Thus any morphemic class will contain an entry for the null ending "e". In fig. 4-5 the complete lexical analysis of a query is illustrated.

The syntactical structure of a terminal can be highly ambiguous due to the lexical analysis. The feature programs, however, allow for easy disambiguation as demonstrated by the example of fig. 4-6.

4.2.3 Lexical analysis: algorithm

Lexical analysis for a word x = x_1 ... x_k is carried out according to the simple algorithm outlined below:

For l = 0, 1, 2, ..., min(k-1, 3):
   x' = x / x_{k-l} ... x_k    (delete x_{k-l} ... x_k from x)
   If x' is found in the dictionary and x_{k-l} ... x_k belongs to the morphemic class of x', then assign to x':
   (1) the main category of the dictionary entry
   (2) the features of the dictionary entry
   (3) the features defined by the entry of x_{k-l} ... x_k in the morphemic class
   (4) the semantics specified in the dictionary.

This algorithm is applied to each terminal of a query. The result will be converted to a form suitable for the ensuing parsing process.

4.3 Parser

The parser completes the syntactical and semantical analysis of a query. According to what has been said so far it has to meet the following conditions:

(1) The parser has to recognize context-free languages.
(2) It must be able to operate on complex categories and rules.
(3) The storage space and execution time required for the analysis should be kept small in comparison to the requirements of the entire retrieval process.
(4) Furthermore a syntax-directed approach is needed for parsing which is independent of a special grammar. This is due to the fact that constructing a grammar for natural languages is an approximative process. The grammar will be continuously modified and enlarged, in order to eliminate wrong constructions or to extend the set of permissible sentences.

Several parsers for cfg's are known to meet conditions (1) and (4), whereas an adaptation to (2) is always necessary. Among these, Earley's parser suggests itself [48]. However, adapting an improved version of this parser to complex categories and rules resulted in an unwieldy algorithm violating condition (3) (see [57]).

Therefore a parser based on the ideas of Kay [52] was developed. The original algorithm is capable of operating on general rewrite rules but was restricted to context-free grammars expressed in our complex notation. Only a short introduction to this parser will be presented, for details we refer to [38].

Fig. 4-7 represents a typical parsing graph as generated by the parser. The graph contains n+1 vertices for a query consisting of n words. Every edge of the graph is labelled by a complex category and its semantics.

During lexical analysis an initial parsing graph is constructed (heavy lines in fig. 4-7). It contains edges only between vertices k and k+1 (1≤k≤n). The number of edges between two vertices is l, where l is the number of complex categories assigned to the k-th terminal in a query.

The parser operates on the initial parsing graph as follows: Starting at vertex k, the parser compares, for all sequences of edges from k to vertices k' (k<k'≤n+1), the main categories within the labels with the right-hand sides of all complex rules. On total agreement with a rule r, the parser performs the following steps:

(a) The feature program of rule r operates on the complex categories in the sequence of edges.
(b) If the test yields "true", the parser produces a new edge between the starting and ending vertices of the sequence of edges. The new edge is labelled by the left-hand side of the rewrite rule and by the features obtained from the assignment section in the feature program of r.
(c) The edge is additionally labelled by the semantics of the rule with all place markers replaced by pointers to the semantics of the complex categories in the sequence of edges.

This process is repeated for all vertices from right to left down to vertex 1.

The parsing of a query will prove successful, if there is an edge between vertices 1 and n+1, which is labelled by the axiom of the grammar (in this case SA).

In fig. 4-7 a special numbering (circled numbers) shows the order by which the edges have been generated. The exact structure of the parse can be derived by means of the pointers which connect the semantics. Obviously, the pointer structure saves storage space over a solution generating complete code fragments for each edge.
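The edge-building procedure of steps (a) to (c) can be illustrated by a minimal Python sketch. It is not the KAIFAS implementation: the tuple layout of an edge, the reduction of feature programs to a single `adj` feature, the test attached to the QU rule, and the operator names `INTERSECT` and `SUBSET` are assumptions made for illustration, and semantics are stored as nested tuples instead of pointers. The rules and the query are those of fig. 4-7.

```python
# Edge layout (assumed): (start vertex, end vertex, main category, features, semantics)

def sequences(edges, start, rhs):
    """Yield every chain of edges that begins at `start` and whose main
    categories spell out the right-hand side `rhs`."""
    if not rhs:
        yield []
        return
    for e in edges:
        if e[0] == start and e[2] == rhs[0]:
            for rest in sequences(edges, e[1], rhs[1:]):
                yield [e] + rest

def parse(words, lexicon, rules, axiom="SA"):
    n = len(words)
    edges = set()
    # Initial parsing graph: one edge per lexical reading between k and k+1.
    for k, word in enumerate(words, start=1):
        for cat, feats, sem in lexicon[word]:
            edges.add((k, k + 1, cat, feats, sem))
    # Work through the vertices from right to left, down to vertex 1.
    for k in range(n, 0, -1):
        grown = True
        while grown:                      # repeat until no new edge starts at k
            grown = False
            for lhs, rhs, test, feats, build in rules:
                for seq in list(sequences(edges, k, rhs)):
                    if not test(*(e[3] for e in seq)):       # steps (a), (b)
                        continue
                    sem = build(*(e[4] for e in seq))        # step (c)
                    new = (k, seq[-1][1], lhs, feats, sem)
                    if new not in edges:
                        edges.add(new)
                        grown = True
    # Success: an edge from vertex 1 to n+1 labelled by the axiom.
    return [e[4] for e in edges if e[:3] == (1, n + 1, axiom)]

NONE, ADJ = frozenset(), frozenset({"adj"})

LEXICON = {                       # one reading per word, for brevity
    "Welche":         [("QU", NONE, "DB")],
    "drageeförmigen": [("ME", ADJ, "M25")],
    "Psychopharmaka": [("ME", NONE, "M4")],
    "haben":          [("haben", NONE, None)],
    "Depression":     [("IN", NONE, "I128")],
    "als":            [("als", NONE, None)],
    "Indikation":     [("RE", NONE, "R8")],
}

RULES = [   # (left side, right side, feature test, assigned features, semantics)
    ("ME", ("ME", "ME"), lambda a, b: "adj" in a and "adj" not in b, NONE,
     lambda a, b: ("INTERSECT", a, b)),
    ("ME", ("QU", "ME"), lambda q, m: "adj" not in m, NONE,   # test is assumed
     lambda q, m: (q, "x", m, "#")),
    ("ME", ("IN", "als", "RE"), lambda i, _, r: True, NONE,
     lambda i, _, r: ("Ng", r, i)),
    ("SA", ("ME", "haben", "ME"), lambda a, _, b: True, NONE,
     lambda a, _, b: ("SUBSET", a, b)),
]

QUERY = "Welche drageeförmigen Psychopharmaka haben Depression als Indikation".split()
print(parse(QUERY, LEXICON, RULES))
```

With the single `adj` feature the reading "Welche drageeförmigen" is rejected, so only one axiom edge spans the whole query; its semantics corresponds to the preliminary translation before any transformation.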

4.4 Code generation

On code generation the fragments are assembled into one or more expressions of the set-theoretic language depending on the ambiguity of the query. These expressions form the result of query translation.

4.5 Transformations

Applying the code generation process to the parsing graph in fig. 4-7 results in the following expression:

   ⊂(DB(x, M∩(M25,M4), #), Ng(R8,I128))

This expression is not well-formed, since the quantifier is not the leftmost operator (prenex normal form). # serves as a place marker for the scope.

Problems of this kind and other syntactical properties of the set language pose difficulties when handled by the translation mechanism introduced above [38]. Other examples of this nature are the following:

(a) Nesting of quantifiers in the set language is subject to certain rules which control their relative position within an expression: set quantifiers like DB must appear in front of logical quantifiers like AL, EI.

(b) Difficulties arise from the difference in relative position of operator symbols for quantifiers within natural German and that of their corresponding quantifiers within the set-theoretic equivalent:

   Which remedies for which diseases are prescription drugs?
   DB(x1, Mdiseases, DB(x2, Vg(Rremedy,x1), ∈(x2, Mprescription drug)))

These problems can be solved by means of grammar rules, but then the grammar proves to be impractical (see [38]). Thus these problems are deferred to an analysis phase that takes place after completion of the parsing process and hence after application of the grammar rules.

A solution could be based on the tree-like pointer structure of the semantical fragments, which we shall call a semantical tree. The semantical tree has then to be transformed by suitable rules such that the linear expression derived by code generation puts the quantifiers in the right order.

Transformations of parsing trees are usually formulated by means of transformational grammars, but implementing these requires large efforts in time and personnel [58]. Moreover, the problems just discussed form a trivial subset of the general transformation problem.

It can be shown that for every transformation rule defined on semantical trees there exists a corresponding rule defined on the linear form, the expression. In place of the traditional approach where trees are given as the pattern and arguments, the basic rules needed here contain an expression pattern which must be contained in the expression to be transformed. For example,

(a) DB1(x1, Vg(R, DB2(x2,M,#)))

is transformed into:

   DB2(x2, M, DB1(x1, Vg(R,x2), #))

Manipulations of linear expressions are easier to do than transformations of trees. Thus the transformations are postponed until after code generation.

The transformations may be formulated by means of a string-manipulation language, forming a set of procedures. (Our work in this direction is discussed in [59].) These procedures are integrated in the system and executed after code generation. Fig. 4-8 shows some examples of transformations.
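The quantifier transformations can be illustrated by a small Python sketch. It is only an approximation under stated assumptions: expressions are held as nested tuples rather than strings, the operator names `SUBSET`, `ELEM` and `INTERSECT` stand in for the printed set symbols, and the rule that an inclusion test becomes a membership test once its first operand is replaced by a variable is inferred from the first example of fig. 4-8.

```python
def is_quantifier(e):
    """A DB quantifier is written ("DB", variable, domain, scope)."""
    return isinstance(e, tuple) and e[0] == "DB"

def pull(e):
    """Return (prefix, core): the quantifiers found inside e, outermost first,
    and e with every quantifier replaced by its bound variable."""
    if not isinstance(e, tuple):
        return [], e
    if is_quantifier(e):
        _, var, domain, _scope = e            # the scope is the place marker "#"
        prefix, domain = pull(domain)         # inner quantifiers move further out
        return prefix + [(var, domain)], var
    op, args = e[0], e[1:]
    prefix, new_args = [], []
    for a in args:
        p, core = pull(a)
        prefix += p
        new_args.append(core)
    if op == "SUBSET" and is_quantifier(args[0]):
        op = "ELEM"                           # a set operand became an individual
    return prefix, (op, *new_args)

def to_prenex(e):
    """Place all DB quantifiers in front of the expression."""
    prefix, core = pull(e)
    for var, domain in reversed(prefix):
        core = ("DB", var, domain, core)
    return core

preliminary = ("SUBSET",
               ("DB", "x1", ("Vg", "R8", ("DB", "x2", "M29", "#")), "#"),
               "M30")
print(to_prenex(preliminary))
```

For the nested example the inner quantifier over x2 ends up outermost, which is the reverse order described for the second example of fig. 4-8.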

5 Conclusions

The linguistic techniques discussed in this paper have been implemented at the University of Karlsruhe on a Burroughs 6700 as part of the KAIFAS information system. Some experience in their usefulness has been gained by applying the system to a pharmaceutical data base containing data on a part of the drugs available on the German pharmaceutical market (about 8000 [60]). This data base was used by experts inexperienced in data processing via the natural German language interface. One of the purposes of this implementation was to test the premises in chapter 3 for their validity.

A context-free definition of the natural language interface proved sufficient for this application. Whether this is true in general can only be decided if one included verbs in the interface.

The descriptive power of the cf-grammar in the system was even unnecessarily large, because the users when working with the system tended to use successively several short queries instead of a single long one, i.e. they solved their problems in steps. For this reason relative clauses could eventually be excluded from the query language. Instead, the possibility of references to queries stated before is needed, involving solutions to the well-known problem of pronouns.

The morphological analysis proved to be sufficient, too. All correct inflectional forms were detected and reduced. The simple approach will not guarantee, however, that a syntactically incorrect inflectional form will be refused under any circumstances.

The M. Kay parser, which we restricted to cf-languages, turned out to be a very simple algorithm. One can show that the algorithm is superior to Earley's parser with respect to processing time for short sentences in the neighbourhood of ten words or less. Consequently, the M. Kay parser is particularly suited to the stepwise user approach mentioned above.

Fig. 2-1   Object types

Individuals, e.g. Thomapyrin, Perphyllon
Sets (M), e.g. drugs, diseases: lists of individuals
Relations (R), e.g. indication, contraindication, manufacturer: lists of pairs of individuals (work is under way to cover n-ary relations)
Numbers
Measures, e.g. 4 tablets/day
Measure functions, e.g. dosage: lists of ordered n-tuples whose last component is a measure
Truth values

Operands: I1, I2, I3, ..., In, M1, M2, M3, ..., Mk, R1, ...

Fig. 2-2   Operators

on sets:
   Mb(I1,...,In)   set construction
   M∪(M1,M2)       union
   M∩(M1,M2)       intersection
   Km(M1,M2)       set difference
   Kz(M1)          cardinality

on relations:
   Ko(R1)          converse
   Rb(R1,M1)       restriction ({(x,y) | (x,y) ∈ R1 ∧ x ∈ M1})
   Rp(R1,R2)       product
   R∪(R1,R2)       union

reduction of binary relations:
   Vo(R1)          domain              {x | ∃y: (x,y) ∈ R1}
   Na(R1)          range               {x | ∃y: (y,x) ∈ R1}
   Vg(R1,I1)       individual domain   {x | (x,I1) ∈ R1}
   Ng(R1,I1)       individual range    {x | (I1,x) ∈ R1}

reduction of measure functions:
   Fw(F1,I1)       measure number

logical operators:
   ∈(I1,M1)        test on set membership
   ⊂(M1,M2)        test on set inclusion
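To make the relation-reducing operators concrete, the following Python sketch models relations as sets of pairs. The function bodies follow the set definitions listed in the figure; the reading of DB(x, M, scope) as "all x in M that satisfy the scope", the helper name `DB`, and all data values are assumptions chosen for illustration only.

```python
def Vo(R):
    """Domain: all x such that (x, y) is in R for some y."""
    return {x for x, _ in R}

def Na(R):
    """Range: all x such that (y, x) is in R for some y."""
    return {y for _, y in R}

def Vg(R, i):
    """Individual domain: all x such that (x, i) is in R."""
    return {x for x, y in R if y == i}

def Ng(R, i):
    """Individual range: all x such that (i, x) is in R."""
    return {y for x, y in R if x == i}

def Rb(R, M):
    """Restriction: the pairs of R whose first component lies in M."""
    return {(x, y) for x, y in R if x in M}

def Ko(R):
    """Converse: every pair of R with its components swapped."""
    return {(y, x) for x, y in R}

def DB(M, scope):
    """Assumed reading of the DB quantifier: the x in M for which scope(x) holds."""
    return {x for x in M if scope(x)}

# Invented sample data: pairs (indication, drug).
indication = {("depression", "drug_a"), ("depression", "drug_b"), ("psychosis", "drug_c")}
dragee_shaped = {"drug_a", "drug_b"}
psychotropic = {"drug_a", "drug_c"}

# DB(x, M∩(M25,M4), ∈(x, Ng(R8,I128))) evaluated on the sample data
answer = DB(dragee_shaped & psychotropic, lambda x: x in Ng(indication, "depression"))
print(answer)
```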

Fig. 3-1   Translation to set language

Fig. 4-1   Main categories

   AF   AL-quantifier
   DZ   measures
   ED   term for evaluating measures
   EI   measure units
   IN   proper name
   ME   set
   MF   measure function
   PR   operators: relations → sets (prepositions)
   RE   relations
   RP   DB-quantifier
   RS   restriction of sets (relative clause)
   SA   sentence
   SF   EI/KE-quantifier
   QU   other quantifiers
   VO   relational operator
   ZA   number

Features

syntactical aspects:
   mas        masculinum
   fem        femininum
   neu        neutrum
   nom        nominative
   gen        genitive
   dat        dative
   acc        accusative
   sin, plu   number
   adj        adjective/noun
   att        attributive
   ajm        adjective-modified
   pdt        predetermined (the drug)
   prm        premodified (Peter's friend)
   pom        postmodified (friend of Peter)
   std        strong declination
   svk        stops genitive concatenations

semantical aspects:
   qua        quantified
   neg        negation-quantifiers (no: KE)
   frw        interrogative (who, what)
   mul, div, exp, add   for arithmetic operators

Fig. 4-2   Operators in feature programs

(1) test part:

   test (<complex category>, <list of feature-values>)
        yields true, if the complex category has associated with it the feature-values specified, else false.
   meq  (<complex category>, <complex category>, <list of features>)
        yields true, whenever at least one of the listed features agrees in both complex categories specified.
   equ  (<complex category>, <complex category>, <list of features>)
        same as meq, but all features must agree.
   ∧, ∨ logical connectives

(2) assignment part: (all assignments are to the complex category of the left rule part)

   zuw  (<list of feature-values>)
        assigns the feature-values specified.
   cop  (<complex category>, <list of features>)
        copies the values of the features of the denoted complex symbols.
   and  (<complex category>, <complex category>, <list of features>)
        assigns those feature-values which agree in both complex categories.

Fig. 4-3

(1) ME1 → ME2 ME3                                        rewrite rule

(2) test(ME2,+adj-att) ∧ test(ME3,-adj) ∧
    meq(ME2,ME3,sin,plu) ∧
    meq(ME2,ME3,nom,gen,dat,acc) ∧
    meq(ME2,ME3,mas,fem,neu);                            feature program (test)

(3) zuw(-adj), and(ME2,ME3,sin,plu),
    and(ME2,ME3,mas,fem,neu),
    and(ME2,ME3,nom,gen,dat,acc);                        feature program (assignment)

(4) M∩(ME2,ME3)                                          semantical part

The upper index (ME2) serves for disambiguation of the complex categories of the production. The feature test excludes those adjectives and nouns which do not agree in number and in at least one of gender or case. The resulting complex category is treated like a noun (-adj); most of the possible ambiguities in gender and noun are solved by the and-operator. The semantics of the production is the set-intersection.

Fig. 4-4   Morphemic class for articles singular (e.g. "kein")

   ending            syntactical structure
   (null ending)     +mas-fem-neu+nom-gen-dat-acc
                     -mas-fem+neu+nom-gen-dat+acc
   "e"               -mas+fem-neu+nom-gen-dat+acc
   "es"              +mas-fem+neu-nom+gen-dat-acc
   "em"              +mas-fem+neu-nom-gen+dat-acc
   "en"              +mas-fem-neu-nom-gen-dat+acc
   "er"              -mas+fem-neu-nom+gen+dat-acc

Fig. 4-5

   Word             Main category   Features                             Set-theoretic representation
   Welche           QU              +mas+fem+neu+nom+acc+plu             DB
                    QU              +fem+nom+acc+sin                     DB
   drageeförmigen   ME              +mas+neu+gen+dat+acc+sin             M25
                    ME              +fem+neu+gen+dat+sin                 M25
                    ME              +mas+fem+neu+nom+gen+dat+acc+plu     M25
   Psychopharmaka   ME              +neu+nom+gen+acc+plu                 M4
   haben            <terminal>                                           -
   Depression       IN              +nom+dat+acc                         I64
   als              <terminal>
   Indikation       RE              +fem+nom+gen+dat+acc+sin             R8
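The suffix-stripping loop of section 4.2.3 can be sketched in Python using the morphemic class of fig. 4-4. This is only an illustration: it assumes that l counts the characters stripped from the end of the word (so l = 0 tests the null ending), and the dictionary layout, the category `QU`, the entry features `+sin-plu` and the semantics `KE` given for "kein" are placeholders rather than values taken from the KAIFAS dictionary.

```python
# Morphemic class for singular articles: ending -> syntactical structures
ARTICLE_SINGULAR = {
    "":   ["+mas-fem-neu+nom-gen-dat-acc", "-mas-fem+neu+nom-gen-dat+acc"],
    "e":  ["-mas+fem-neu+nom-gen-dat+acc"],
    "es": ["+mas-fem+neu-nom+gen-dat-acc"],
    "em": ["+mas-fem+neu-nom-gen+dat-acc"],
    "en": ["+mas-fem-neu-nom-gen-dat+acc"],
    "er": ["-mas+fem-neu-nom+gen+dat-acc"],
}

# Hypothetical dictionary entry; only word stems are stored.
DICTIONARY = {
    "kein": {
        "category": "QU",
        "features": "+sin-plu",
        "semantics": "KE",
        "endings": ARTICLE_SINGULAR,
    },
}

def analyse(word, dictionary):
    """Return every (main category, features, semantics) reading of `word`."""
    readings = []
    k = len(word)
    for l in range(min(k - 1, 3) + 1):          # l = 0, 1, ..., min(k-1, 3)
        stem, ending = word[:k - l], word[k - l:]
        entry = dictionary.get(stem)
        if entry is None or ending not in entry["endings"]:
            continue
        for structure in entry["endings"][ending]:
            readings.append((entry["category"],
                             entry["features"] + structure,
                             entry["semantics"]))
    return readings

for form in ("kein", "keinem", "keins"):
    print(form, analyse(form, DICTIONARY))
```

An inflected form with several entries under its ending, such as the bare stem here, yields several readings; the feature programs of the grammar are then responsible for choosing among them.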

Fig. 4-6   Disambiguation

Rule:   ME1 → ME2 ME3
        Meq(mas,fem,neu,ME2,ME3) ∧ Meq(nom,gen,dat,acc,ME2,ME3) ∧ Meq(sin,plu,ME2,ME3);
        And(mas,fem,neu,ME2,ME3), And(nom,gen,dat,acc,ME2,ME3), And(sin,plu,ME2,ME3);
        M∩(2,3);

applied to
   (1) ME   +mas-fem-neu-nom+gen+dat+acc+sin-plu   (drageeförmigen)
   (2) ME   -mas+fem+neu-nom+gen+dat-acc+sin-plu   (drageeförmigen)
   (3) ME   +mas+fem+neu+nom+gen+dat+acc-sin+plu   (drageeförmigen)
   (4) ME   -mas-fem+neu+nom+gen-dat+acc-sin+plu   (Psychopharmaka)

Because of number (sin,plu) the feature test accepts the combination (3)/(4) only. The feature-assignment yields:
   (5) ME   -mas-fem+neu+nom+gen-dat+acc-sin+plu   (drageeförmigen Psychopharmaka)

Rule:   ME1 → QU2 ME3
        Meq(mas,fem,neu,QU2,ME3) ∧ Meq(nom,gen,dat,acc,QU2,ME3) ∧ Meq(sin,plu,QU2,ME3);
        And(mas,fem,neu,QU2,ME3), And(nom,gen,dat,acc,QU2,ME3), And(sin,plu,QU2,ME3);
        2(x,3,#)

applied to
   (6) QU   +mas+fem+neu+nom-gen-dat+acc-sin+plu   (welche)
   (7) QU   -mas+fem-neu+nom-gen-dat+acc+sin-plu   (welche)
   (5) ME   -mas-fem+neu+nom+gen-dat+acc-sin+plu   (drageeförmigen Psychopharmaka)

Only combination (6)/(5) is accepted.

Result:
       ME   -mas-fem+neu+nom-gen-dat+acc-sin+plu   (welche drageeförmigen Psychopharmaka)

Thus all ambiguities with the exception of case (nominative/accusative) are resolved.
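The disambiguation steps of this figure can be replayed with a short Python sketch. It assumes that two categories "agree" in a feature when both carry the value "+", an interpretation that reproduces the results printed in the figure but is not stated explicitly in the text; the function names `parse_features`, `meq`, `and_`, `combine` and `show` are chosen for illustration.

```python
import re

GROUPS = [("mas", "fem", "neu"), ("nom", "gen", "dat", "acc"), ("sin", "plu")]

def parse_features(text):
    """Turn '+mas-fem' into {'mas': True, 'fem': False}."""
    return {name: sign == "+" for sign, name in re.findall(r"([+-])([a-z]+)", text)}

def meq(a, b, features):
    """Test: at least one of the listed features is '+' in both categories."""
    return any(a[f] and b[f] for f in features)

def and_(a, b, features):
    """Assignment: a listed feature stays '+' only if it is '+' in both."""
    return {f: a[f] and b[f] for f in features}

def combine(a, b):
    """Apply the test part, then the assignment part; None if the test fails."""
    if not all(meq(a, b, g) for g in GROUPS):
        return None
    result = {}
    for g in GROUPS:
        result.update(and_(a, b, g))
    return result

def show(features):
    return "".join(("+" if value else "-") + name for name, value in features.items())

r1 = parse_features("+mas-fem-neu-nom+gen+dat+acc+sin-plu")   # drageeförmigen
r2 = parse_features("-mas+fem+neu-nom+gen+dat-acc+sin-plu")   # drageeförmigen
r3 = parse_features("+mas+fem+neu+nom+gen+dat+acc-sin+plu")   # drageeförmigen
r4 = parse_features("-mas-fem+neu+nom+gen-dat+acc-sin+plu")   # Psychopharmaka
q6 = parse_features("+mas+fem+neu+nom-gen-dat+acc-sin+plu")   # welche
q7 = parse_features("-mas+fem-neu+nom-gen-dat+acc+sin-plu")   # welche

r5 = combine(r3, r4)
print(show(r5))
print(show(combine(q6, r5)))
```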

Fig. 4-7   Parsing graph for the query "Welche drageeförmigen Psychopharmaka haben Depression als Indikation" (graph not reproduced). Only those features are listed which prevent a rule from being applied.

Grammar rules applied (the rewrite rule and semantics only):

   ME1 → ME2 ME3;        M∩(ME2,ME3)
   ME1 → QU ME2;         QU(x,ME2,#)
   ME  → IN als RE;      Ng(RE,IN)
   SA  → ME1 haben ME2;  ⊂(ME1,ME2)

Only those rules are stated whose application does not result in dead-ends.

Fig. 4-8   Transformations

1) Welche drageeförmigen Psychopharmaka haben Depression als Indikation?
   (Which dragée-shaped psychotropic drugs have depression as an indication?)

   Preliminary translation:
      ⊂(DB(x, M∩(M25,M4), #), Ng(R8,I128))

   Transformation: Quantifier DB is placed in front of the expression, ⊂ transformed to ∈.
      DB(x, M∩(M25,M4), ∈(x, Ng(R8,I128)))

2) Welche Indikationen welcher Medikamente sind Psychosen?
   (Which indications of which drugs are psychoses?)

   Preliminary translation:
      ⊂(DB(x1, Vg(R8, DB(x2,M29,#)), #), M30)

   Transformation: Both quantifiers are placed in front but in reverse order.
      DB(x2, M29, DB(x1, Vg(R8,x2), ∈(x1,M30)))

[1] T.Winograd: Five Lectures on Artificial Intelligence. Computer Science Department, Stanford University (Sept. 1974)

[2] L.C.Smith: Artificial Intelligence in Information Retrieval Systems. Information Processing & Management, Vol.12, pp. 189-222, Pergamon (1976)

[3] R.Quillian: Semantic Memory. In M.Minsky (ed.): Semantic Information Processing. MIT Press, Cambridge, Mass. (1968)

[4] R.Schank: Identification of Conceptualizations Underlying Natural Language. In Schank and Colby (eds.): Computer Models of Thought and Language, pp. 187-248, Freeman (1973)

[5] E.Charniak: Toward a Model of Children's Story Comprehension. MIT Artificial Intelligence Laboratory, Cambridge, Mass. (1972)

[6] V.Cherniavsky: On Algorithmic Natural Language Analysis and Understanding. Advanced Course on Data Base Languages and Natural Language Processing, Freudenstadt (Sept. 1976)

[7] Procs. IFIP-TC-2 Working Conference on: Modelling in Data Base Management Systems, Freudenstadt, 5.-9.1.1976

[8] R.Durchholz, G.Richter: Concepts for Data Base Management Systems. Procs. IFIP-TC-2 Working Conference on "Data Base Management Systems", Cargese, Corsica, North-Holland Publishing Co. (1974)

[9] E.F.Codd: A Relational Model of Data for Large Shared Data Banks. Comm. ACM 13 (1970), pp. 377-387

[10] R.W.Taylor, R.L.Frank: CODASYL Data Base Management Systems. ACM Computing Surveys, Vol.8, Nr.1 (1976), pp. 67-104

[11] CODASYL-DBTG, Data Base Task Group Report, New York (1971)

[12] J.R.Abrial: Data Semantics. Procs. IFIP-TC-2 Working Conference on "Data Base Management Systems", Cargese, Corsica, North-Holland Publishing Co. (1974)

[13] C.J.Date: An Introduction to Database Systems. Addison-Wesley Publ. Company (1975)

[14] C.A.Montgomery: Is Natural Language an Unnatural Query Language? Proc. ACM Natl. Conf. 1972, pp. 1075-1078

[15] Procs. of the Advanced Course on Data Base Languages and Natural Language Processing, Freudenstadt (Sept. 1976)

[16] K.D.Krägeloh, P.C.Lockemann: Hierarchies of Data Base Languages: An Example. Information Systems, Vol.1, pp. 79-90, Pergamon Press (1975)

[17] E.F.Codd: Seven Steps to Rendezvous with the Casual User. Procs. IFIP-TC-2 Working Conference on "Data Base Management Systems", Cargese, Corsica, North-Holland Publishing Co. (1974)

[18] A.Malhotra: Design Criteria for a Knowledge-Based English Language System for Management: An Experimental Analysis. MIT Project MAC, Cambridge, Mass. (1975)

[19] Data Base. Vol.8, Nr.2 (1968)

[20] G.Schott: Automatische Analyse der Flexionsmorpheme deutscher Substantive. Technische Universität München, Abteilung Mathematik, Gruppe Informatik, Bericht Nr.7210 (1972)

[21] Zur maschinellen Syntaxanalyse. Forschungsberichte, Institut für Deutsche Sprache, Mannheim, Bd. 18.1, 18.2, 19., Narr-Verlag, Tübingen (1974)

[22] G.Salton: Automatic Information Organization and Retrieval. McGraw Hill Book Co., New York (1968)

[23] H.L.Josselson: Automatic Translation of Languages Since 1960: A Linguist's View. In: Advances in Computers 11 (1971), pp. 1-58

[24] A.M.Turing: Computing Machinery and Intelligence. Mind 59 (1950), pp. 433-460

[25] J.A.Fodor, J.J.Katz: The Structure of a Semantic Theory. In: Fodor/Katz (eds.): The Structure of Language. Prentice Hall (1964), pp. 170-210

[26] D.G.Bobrow: A Question-Answering System for High School Algebra Word Problems. In Procs. AFIPS 1964 Fall Joint Comp. Conf., Vol. 26, pp. 591-614

[27] B.Raphael: SIR. In Procs. AFIPS 1964 Fall Joint Comp. Conf., Vol. 26, pp. 577-589

[28] B.F.Green et al.: BASEBALL. In E.A.Feigenbaum, J.Feldman (eds.): Computers and Thought. McGraw Hill, New York (1963), pp. 207-216

[29] J.Weizenbaum: ELIZA. Communications of the ACM, Vol.9 (1966), pp. 36-45

[30] L.Coles, L.Stephen: An On-Line Question Answering System With Natural Language and Pictorial Input. In Procs. ACM 23rd Natl. Conf. (1968), pp. 157-167

[31] T.Winograd: Understanding Natural Language. Academic Press Inc., New York (1972)

[32] W.A.Woods: Procedural Semantics for a Question Answering Machine. Procs. AFIPS Fall Joint Comp. Conf., 33 (1968), pp. 457-471

[33] W.A.Woods: Progress in Natural Language Understanding - An Application to Lunar Geology. Procs. National Comp. Conf. (1973), pp. 441-450

[34] J.Mylopoulos, S.Schuster, D.Tsichritzis: A Multi-Level Relational System. Procs. National Comp. Conf. (1975), pp. 403-408

[35] M.M.Astrahan et al.: System R: Relational Approach to Database Management. ACM Transactions on Database Systems, Vol.1, Nr.2 (1976)

[36] IMS 2. In: Kurzbeschreibung von Information Storage and Retrieval Systemen, Gesellschaft für Mathematik und Datenverarbeitung, St.Augustin (1973)

[37] S.Todd: Integrated Architecture for Transaction Specification and Optimization in Relational Data Base Systems. Summer School on Data Base Technology, GMD St.Augustin (1976)

[38] K.D.Krägeloh: A Multi-Level System Architecture with Natural Language Interface (in German). Ph.D. Thesis, University of Karlsruhe (1976)

[39] C.H.Kellog: A Natural Language Compiler for Online Data Management. AFIPS 1968 Fall Joint Comp. Conf., Vol.33, pp. 473-493

[40] F.B.Thompson, P.C.Lockemann, B.Dostert, R.S.Deverill: REL: A Rapidly Extensible Language System. Procs. 24th National ACM Conference (1969), pp. 399-417

[41] C.W.Bachman: The Programmer as Navigator. Communications of the ACM, Vol.16, Nr.11 (1973), pp. 653-658

[42] M.Lacroix, A.Pirotte: ILL: An English Structured Query Language for Relational Data Bases. M.B.L.E. Research Laboratory Report, Brussels (1976)

[43] J.A.Feldman, P.P.Rovner: An ALGOL-Based Associative Language. Communications of the ACM, Vol.12, Nr.8 (1969), pp. 439-449

[44] G.Goos: Programmkonstruktion. Interner Bericht, Universität Karlsruhe (1974)

[45] A.Kratzer, E.Pause, A.v.Stechow: Einführung in die Theorie und Anwendung der generativen Syntax. Athenaeum Verlag, Frankfurt (1974)

[46] PASSAT. Systembeschreibung Siemens PBS4004, München (1973)

[47] I.Steinacker: Dokumentationssysteme. De Gruyter, Berlin, New York (1975)

[48] J.C.Earley: An Efficient Context-Free Parsing Algorithm. Ph.D. Thesis, Carnegie-Mellon University, Pittsburgh, Pennsylvania (1968)

[49] T.Kasami: An Efficient Recognition and Syntax Analysis Algorithm for Context-Free Languages. University of Illinois (1966)

[50] D.H.Younger: Recognition and Parsing of Context-Free Languages in Time n³. Information and Control 10 (1967), pp. 189-208

[51] R.F.Simmons: Natural Language Question Answering Systems: 1969. Communications of the ACM, Vol.13, Nr.1 (1970), pp. 15-30

[52] M.Kay: Experiments with a Powerful Parser. Deuxième Conférence sur le Traitement automatique des langues, Grenoble (1967)

[53] B.H.Dostert, F.B.Thompson: How Features Resolve Syntactic Ambiguity. Procs. of the Symposium on Information Storage and Retrieval, University of Maryland (1971)

[54] K.Brockhaus: Automatische Übersetzung. Vieweg Verlag, Braunschweig (1971)

[55] H.Wulz: ISLIB - Ein Informationssystem auf linguistischer Basis. Interner Bericht, Institut für Deutsche Sprache, Abteilung Linguistische Datenverarbeitung, Mannheim (1975)

[56] F.B.Thompson: English for the Computer. Procs. AFIPS Fall Joint Comp. Conf. (1966), pp. 349-356

[57] W.Wohlleber: Ein Parser für die Analyse natürlicher Sprache. Diplomarbeit, Universität Karlsruhe (1973)

[58] J.Friedman: A Computer Model of Transformational Grammar. American Elsevier Publishing Company Inc., New York (1971)

[59] C.Mathis: Entwurf und Implementierung einer textverarbeitenden Sprache. Diplomarbeit, Universität Karlsruhe (1975)

[60] ROTE LISTE 1975. Herausgeber: Bundesverband der pharmazeutischen Industrie, Frankfurt, EDITIO CANTOR, Aulendorf/Württ. (1975)

AN OVERVIEW OF PLIDIS

A PROBLEM SOLVING INFORMATION SYSTEM WITH GERMAN AS QUERY LANGUAGE *)

G.L. Berry-Rogghe
H. Wulz

Institut für deutsche Sprache
Postfach 5409, D-6800 Mannheim 1

1. Background and application of the system

PLIDIS (Problemlösendes Informationssystem mit Deutsch als Interaktionssprache) is a natural language information system which is being designed in the context of a project in automated language processing at the Institut für deutsche Sprache, sponsored by the Ministry for Research and Technology for the years 1976-77. The present project is in many ways an extension of a previous two-year project which achieved the construction of the experimental question-answering system ISLIB (Informationssystem auf linguistischer Basis) (e.g. KOLB & WULZ 1975), based on the simulated problem domain of the stock exchange. Within this framework theoretical foundations were investigated and different approaches experimented with. The PLIDIS project differs from its predecessor in its intention to implement an actual system, whereby our emphasis lies on the adaptation of the methods tried out in the pilot study to a real problem domain and on enhancing the problem solving capacities of the system.

The field of application of PLIDIS will be the control of water pollution. A pilot version of the system is being developed in co-operation with the regional 'department of the environment' at Stuttgart, who supervise industrial wastes led into the rivers of Northern Württemberg.

*) The research reported here is supported by the German Federal Republic's "Bundesminister für Forschung und Technologie" under grant Nr. 081 5900 69 within the "3. DV-Programm der Bundesregierung".

**) The authors are indebted to W.Brecht, W.Dilger, R.Guntermann, D.Kolb, M.Kolvenbach, A.Lötscher, H.D.Lutz, K.Saukko, G.Zifonun who collaborate within the PLIDIS-project and who did a lot of the research reported here.

In the control of water pollution several bodies co-operate, such as the Waterboard of the Land and the various local authorities; PLIDIS is therefore to be used by a wide variety of people dealing with different aspects of the supervision of industrial waste water.

The following sets of actions are involved in the process of controlling a particular firm:

- the authorities regularly inspect the firms in order to collect and verify data about their production process, the waste products generated, the type and functioning of the sewage treatment plant, etc.,
- at regular intervals sewage samples are taken and sent for analysis to a neutral laboratory,
- the firm is sent a report containing the results of the analysis.

With reference to this field of application, PLIDIS is scheduled to be used in the following capacities:

- as supervision system, e.g. to check the chemical composition of the samples, to compare the current sample with previous samples from the same firm and to issue appropriate warnings if a norm has been transgressed,
- as information system, e.g. to answer queries concerning the composition and toxicity of certain chemicals, the characteristics of the production processes of the involved firms etc.,
- as investigation system, e.g. to detect where pollution may have originated and possibly suggest plans of actions to be taken.

2. General design of PLIDIS

The PLIDIS information system is composed of, on the one hand, a linguistic-logical part which translates the German input into an internal representation modelled on the predicate calculus, and, on the other hand, a problem-solving part which, in addition to performing the usual storage and retrieval functions, involves problem domain-specific regularities in the deduction process.

The design of the system is largely modular and allows extensive user interaction between the various execution phases. This modularity is an essential prerequisite for efficient teamwork, as each member of the group can be allocated a specific part and possible changes in personnel take place smoothly. From the same point of view, interactive facilities are essential to facilitate experimentation and debugging.

Figure 1 is a diagrammatic representation of the system's main components showing the flow of information between them.

The PLIDIS-user has several choices of access to the system, some of which are designed especially for a more naive user and some of which are destined for the system designer and administrator.

The natural language processor (NLP) enables the user to formulate questions or problem descriptions in natural language, or to use natural language for the input of shorter pieces of information such as rules about his problem domain or data for updating. For the input of stereotyped data of larger quantities, the user may also have data-sheets on his terminal, which are processed by the formatted input processor (FIP). This processor provides facilities, accessible by the system's command language (CL), to define new data sheets and procedures for plausibility checks of the formatted input.

The natural language processor and the formatted input processor have the same task to perform, i.e. to translate the input into the language of internal representation (IR), an extension of first order predicate calculus. The processor for informations and problem descriptions (PIP) either stores the incoming information or, in the case of problem descriptions, activates problem solving mechanisms according to the type of question asked. In the current state of the system, the processor for answer-formulations (PAF) generates only some sort of 'pretty-print' from the formulas of internal representation which contain the information found by the PIP-component as answer to the user's questions. It would be desirable at a future stage that this component be replaced by procedures which generate natural language sentences out of IRL-formulas.

The interaction of these components is guided by the PLIDIS-supervisor, which processes the command language statements and also accepts INTERLISP-code.

fig. 1 PLIDIS - main components and information flow

[Diagram not reproduced. Components shown: User; PLIDIS Supervisor; Natural Language Processor (NLP); Formatted Input Processor (FIP); Processor for Informations and Problemdescriptions (PIP); Processor for Answer-Formulation (PAF); Lexical Base; Data Base (facts, axioms). User channels: Natural Language (NL), Forms (FS), Internal Representation (IR), Command Language (CL).]

The command language gives the non-naive user access to various interactive facilities, which are helpful for testing and debugging. The algorithms draw on lexical and operational information contained in external data bases, which are supplied by the user/designer:

- The morpho-syntactic lexicon contains at the moment some 10,000 entries of non-lemmatised word-forms with their morpho-syntactic features such as tense, number, gender etc.

- The semantic lexicon contains information about a word's equivalent in the internal representation, such as relational symbol, operational symbol, individual term..., its 'sort' (see section 3), the number and sort of the arguments for each predicate, and so on.

- The data-sheet inventory contains the various data-sheets for entering mass-data such as laboratory reports, particulars about the firms, etc.

- The syntactic rules specify a grammar for German as an 'Augmented Transition Network'.

- The translation rules specify 'transformations' of the parsings of the NL-sentences into the internal representation.

- Heuristics specify syntactic and semantic criteria to guide the problem-solver.

The data base proper or the 'knowledge' of the system is a collection of atomic formulas in the internal representation stating the following information about the problem domain: (i) mass-data about samples of river water, the legal norms of the allowed concentrations, information about the composition of various chemicals, their toxicity, etc., information about the firms being controlled (type of plant, production processes, treatment of waste...), (ii) logical axioms stating general implications as well as specific regularities in the world-model, such as "x is greater than y implies that x is not equal to y" and "if a chemical interferes with the river-flora, it is toxic".

At the present stage in the development of the system, the natural language component and the formatted input processor are more or less completed, as they were largely conceived in a previous project. Some adaptations of the internal representation to specific problems in the domain of application are still in progress. A new concept for a more efficient and theoretically sounder mechanism for translating the output of the syntactic analysis into the internal representation is being discussed at the moment. The bulk of work that still remains to be done is in the problem solving component.
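The two kinds of data-base content named above (atomic facts and logical axioms) can be illustrated with a minimal Python sketch. It is not the PLIDIS problem solver, which the text does not describe in detail; it only assumes that facts are stored as (predicate, argument) pairs and that an axiom of the form "if P(x) then Q(x)" is applied by simple forward chaining. The predicate name INTERFERES-WITH-RIVER-FLORA is hypothetical; GIFTIG ('toxic') and ZYANID are taken from the vocabulary in section 3.4.

```python
def forward_chain(facts, rules):
    """Apply one-premise rules (premise, conclusion) until nothing new is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            # Iterate over a copy because new facts are added inside the loop.
            for predicate, argument in list(derived):
                if predicate == premise and (conclusion, argument) not in derived:
                    derived.add((conclusion, argument))
                    changed = True
    return derived


# Axiom: "if a chemical interferes with the river-flora, it is toxic".
RULES = [("INTERFERES-WITH-RIVER-FLORA", "GIFTIG")]

# Atomic facts (hypothetical sample data, not taken from the PLIDIS data base).
FACTS = {("INTERFERES-WITH-RIVER-FLORA", "ZYANID"), ("STOFF", "ARSEN")}

KNOWLEDGE = forward_chain(FACTS, RULES)
```

After the call, `KNOWLEDGE` contains the derived formula `("GIFTIG", "ZYANID")` in addition to the stored facts, while nothing is derived about ARSEN.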

3. The internal representation in KS

3.1. General considerations

The choice of an appropriate internal representation (IR) for the knowledge within the PLIDIS-system was motivated not solely by theoretical considerations but by the use that is to be made of it, namely the effective retrieval of answers to queries stated in natural language. An IR for a QA-system must have the following properties:

- expressive power to match the complexity of natural language,

- world-modelling capacity to describe all situations, events, actions, states and changes of states occurring in a given micro-world,

- deductive capacity pertaining to the solution of problems put to the system.

These broad aspects can be made more explicit in the following specific requirements.

(I) Like natural language, the IR must be an 'object language', i.e. it should not describe regularities of the German language, but should act on the same referential level as natural language. This entails that it should not contain metalinguistic symbols such as set theoretical ones, cases...

(II) The IR should be able to describe arbitrary microworlds; i.e. for any given concrete micro-world, it should have the means to designate all typical entities existing in that world: individuals, sets of individuals, processes, events, actions, etc. Similarly, it should be able to express temporal relations and causality.

(III) The syntax of the IR must be explicitly described in a grammar. This grammar guides automatic mapping processes of natural language structures into IR structures and allows the problem-solver to operate on the syntactic level of the IR.

(IV) With the IR must be associated a formal semantic interpretation which accounts for the way in which IR formulas correspond to particular arrangements in the external world and furthermore allows one to decide about the equivalence of formalisms (HAYES 1974).

(V) It should be suited to the application of general formal deduction mechanisms, so that it is not necessary to program specific deduction algorithms for each deduction (in the sense of "methods") - which does not of course exclude the use of heuristics.

In particular, points (IV) and (V) indicate the use of a predicate calculus for the internal representation, as PC is interpreted by a formal semantics in the form of Tarskian model theory and a general 'theorem prover' mechanism operates on it.

The standard first-order predicate calculus does, however, not fulfill all the above requirements (for example, condition (V)). Therefore a symbolic language (in German 'Konstruktsprache', abbreviated to KS) was designed, modelled on the first-order predicate calculus but incorporating a number of extensions described below.

In the discussion below, we distinguish between the formal representation language KS, which according to requirement (II) is independent of the given micro-world, and concrete KS-languages defined by a world-specific vocabulary. The general construction rules of the IR-language KS are described in 3.2; a preliminary outline of the concrete language KS-water-pollution-control is given in 3.4.

3.2. Short description of the syntax of KS

In addition to the usual sets of symbols in a predicate calculus - namely predicate symbols, individual symbols, connectives and quantifiers - the vocabulary of KS contains the set S of sorts:

S = {uni, obj, int, sit, per, ort, zus, akt, ...}

(These names can be understood as abbreviations for the German 'Universal, Objekt, Intervall, Situation, Person/Personenkörperschaft, Ort, Zustand, Aktion', i.e. universal, object, interval, situation, person/corporate body, place, state, action.)

The set SV of sort-indexed variables is the Cartesian product of the set V = {x1,...,xn} of variables and the set S of sorts.

KS-terms can be constructed with the aid of operation and relation symbols. For each such symbol, the sorts of its arguments are specified. The sort of the term thus constructed is determined by the sort of the last argument of the operation or relation symbol.

The following conditions of well-formedness for terms are defined:

(1) Sort-indexed variables from SV are terms.

(2) Individual constants are terms. To each constant is assigned a member of the set S.

(3) Let $F$ be an n-place operation symbol, to which is assigned an (n+1)-tuple of sorts $\langle a_1, \ldots, a_n, a_{n+1} \rangle$ $(a_i \in S)$. Let $t_1^{a_1}, \ldots, t_n^{a_n}$ be terms of the sorts $a_1, \ldots, a_n$ respectively. Then $(F\ t_1^{a_1}, \ldots, t_n^{a_n})$ is a term of the sort $a_{n+1}$. Operational terms are in general individual terms. If the nth argument term $t_n^{a_n}$ is of the sort 'int' (interval), then the term designates individuals with reference to a particular time. Such individuals are states of the world, processes, actions, and so on. They are terms of the sort 'sit' (situation) and are made up of an operation symbol followed by the following tuple of sorts: $\langle a_1, \ldots, a_{n-1}, \mathrm{int}, \mathrm{sit} \rangle$.

(4) Let $R$ be an m-place relation symbol, to which is assigned an m-tuple of sorts $\langle a_1, \ldots, a_m \rangle$ $(a_i \in S)$. Let $t_1^{a_1}, \ldots, t_{m-1}^{a_{m-1}}$ be terms of the sorts $a_1, \ldots, a_{m-1}$. Then $(R\ t_1^{a_1}, \ldots, t_{m-1}^{a_{m-1}})$ is a term of the sort $a_m$. Relational terms are 'list terms'. Such terms designate sets of individuals (see 3.3).

Atomic formulas in KS are constructed according to the following conditions of well-formedness:

(5) Let $F$ be an n-place operation symbol with the tuple of sorts $\langle a_1, \ldots, a_n, a_{n+1} \rangle$. Let $t_1^{a_1}, \ldots, t_n^{a_n}, t_{n+1}^{a_{n+1}}$ be terms of the sorts $a_1, \ldots, a_{n+1}$. Then $(F\ t_1^{a_1}, \ldots, t_n^{a_n}\ ;\ t_{n+1}^{a_{n+1}})$ is an operational atomic formula.

(6) Let $R$ be an m-place relation symbol with the tuple of sorts $\langle a_1, \ldots, a_m \rangle$. Let $t_1^{a_1}, \ldots, t_m^{a_m}$ be terms of the sorts $a_1, \ldots, a_m$. Then $(R\ t_1^{a_1}, \ldots, t_m^{a_m})$ is a relational atomic formula.

Non-atomic formulas are constructed according to the usual construction rules of predicate calculus. A more detailed description of the syntax of KS can be found in ZIFONUN 1974 and ZIFONUN 1976.

3.3. Special features of KS

a. The set S of sorts, described in 3.2, can be extended as demanded by the requirements of specific fields of application. The sorts are subject to a hierarchical structure, of which use is made in problem-solving.

The sortal structure of KS imposes semantically motivated syntactic well-formedness conditions - in the sense of Katz-Fodor 'selection restrictions'. But the specification of the number and sorts of the arguments of a predicate is made as a function of the world-model, rather than being guided by linguistic principles.

The advantages of a sortal structure in a representation language were indicated in HAYES 1971. A logical sortal calculus with linguistic considerations was proposed by THOMASON 1972.

The notion of 'term' in KS is defined recursively, so that it is possible to embed terms within terms, thus reflecting more closely some natural language constructs, such as complex noun groups.

Example: "the mother of the neighbour of the friend of Hans" becomes in KS:

(MOTHER (NEIGHBOUR (FRIEND HANS)))

In the framework of the concrete KS-language with reference to social relations, MOTHER would have been defined as a 1-place operation symbol taking the tuple of sorts <per, per>. FRIEND and NEIGHBOUR would have been defined as 2-place relation symbols also with the tuple <per, per>.
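The way sort tuples constrain nested terms can be sketched in a few lines of Python. This is only an illustration under stated assumptions: terms are written as nested tuples, each symbol's sort tuple lists the argument sorts followed by the result sort, and sorts must match exactly (the sort hierarchy mentioned above is ignored). The table names `SIGNATURES` and `CONSTANTS` and the function `sort_of` are hypothetical and do not come from PLIDIS.

```python
# Sort tuples: argument sorts followed by the sort of the resulting term.
SIGNATURES = {
    "MOTHER": ("per", "per"),
    "FRIEND": ("per", "per"),
    "NEIGHBOUR": ("per", "per"),
}

# Individual constants with their assigned sort (sample data for illustration).
CONSTANTS = {"HANS": "per", "STUTTGART": "ort"}


def sort_of(term):
    """Return the sort of a term, or raise TypeError if it is ill-sorted."""
    if isinstance(term, str):          # an individual constant
        return CONSTANTS[term]
    symbol, *args = term               # (SYMBOL arg1 ... argk)
    *arg_sorts, result_sort = SIGNATURES[symbol]
    if len(args) != len(arg_sorts):
        raise TypeError(f"{symbol} expects {len(arg_sorts)} argument(s)")
    for arg, expected in zip(args, arg_sorts):
        actual = sort_of(arg)          # recursion handles embedded terms
        if actual != expected:
            raise TypeError(f"{symbol}: expected sort {expected}, got {actual}")
    return result_sort


# "the mother of the neighbour of the friend of Hans"
NESTED = ("MOTHER", ("NEIGHBOUR", ("FRIEND", "HANS")))
```

Under these assumptions `sort_of(NESTED)` yields `"per"`, whereas a term such as `("MOTHER", "STUTTGART")` is rejected because a place is supplied where a person is required.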

In KS are defined the natural language quantification symbols VIELE, MANCHE, EINIGE (many, several, some). They describe the size of sets of entities. The same applies to the natural numbers, which can also be used as quantification symbols. They are subject to the following condition of well-formedness:

Let QU be a quantification symbol and let $(R\ t_1^{a_1}, \ldots, t_{m-1}^{a_{m-1}})$ be a list term of the sort $a_m$; then $(QU\ (R\ t_1^{a_1}, \ldots, t_{m-1}^{a_{m-1}}))$ is a quantified list term of the sort $a_m$.

Singular and plural objects can be designated in KS by 'individual terms' and 'list terms' respectively. As an example, the KS-representation of the sentences "der Nachbar der Mutter von Hans ist Fritz" (the neighbour of Hans's mother is Fritz) and "die Nachbarn der Freunde von Hans sind Franz und Egon" (the neighbours of Hans's friends are Franz and Egon) is given:

(NACHBAR (MUTTER HANS); FRITZ)

(NACHBAR (FREUND HANS); (LISTE FRANZ EGON))

f. KS incorporates arithmetic operations such as PLUS, DIFFERENCE, TIMES... which can be interpreted as LISP functions.

3.4. The KS-language for the control of water pollution

In PLIDIS a concrete KS-language is defined, deriving its vocabulary from the field of application in the control of water pollution. The set S of sorts given in 3.2. has been extended accordingly. A preliminary partial sort tree for the domain of water pollution is shown in figure 2.

Some examples of the vocabulary of KS-water-pollution-control are given below:

- individual constants:
  ARSEN (sort: 'stoffkoll')
  ZYANID (sort: 'stoffkoll')

- operation symbols:
  - PROBE (2-place; sortal tuple: <betrieb, int, stoffkoll>)
  - PROBENEHMER (1-place; sortal tuple: <stoffkoll, per>)
  - ANTEIL (3-place; sortal tuple: <stoff, stoffkoll, physobj, num>)
  - LABORBERICHT (3-place; sortal tuple: <perkörp, stoffkoll, int, physobj>)
  - BETRIEB (2-place; sortal tuple: <firma, ort, betrieb>)

- relational symbol:
  - GIFTIG (1-place; sortal tuple: <stoff>)

fig.2 Preliminary sort tree of KS-water-pollution

[Diagram not reproduced. Sort names legible in the tree: per, ort, physobj, num, sit, int, ortindiv, ortkoll, natper, firma, betrieb, stoff, stoffkoll, stoffindiv, stoffqual, stoffquant, maßzahl, zeitpunkt, zeitkoll. Example vocabulary attached to the leaves: GEWAESSER, RHEIN, FRANKFURT, ARSEN, PROBE, GROESSE, PH-WERT, TEMPERATUR, ANTEIL, (0.3 MG/L).]

The above predicates can be 'translated' into English as follows: PROBE = 'sample', PROBENEHMER = 'sampler', ANTEIL = 'amount', LABORBERICHT = 'laboratory report', BETRIEB = 'firm', GIFTIG = 'toxic'.

The following is an example of a KS-term:

(PROBE (BETRIEB MAX-MULLER STUTTGART) 76.01.13.14.0)

"The sample taken from the firm Max Müller in Stuttgart on 13.1.1976 at 14.00 hours."

The following is an example of a KS-formula:

(ANTEIL ZYANID
        (PROBE (BETRIEB MAX-MULLER STUTTGART) 76.01.13.)
        (LABORBERICHT (BETRIEB CHEM-UNTERSUCHUNGSANSTALT PLOCHINGEN)
                      (PROBE (BETRIEB MAX-MULLER STUTTGART) 76.01.13.)
                      76.01.15.)
        ; (0,5 mg/l))

"The amount of cyanide contained in the sample taken from the firm Max Müller in Stuttgart on 13.1.76, according to the laboratory report of the chemical analysis centre in Plochingen produced on 15.1.76, amounted to 0.5 milligram per litre."

4. Natural-language analysis in PLIDIS

The requirement of modularity in a system such as PLIDIS is dictated not only by organizational reasons; from a more systematic point of view, it was also desirable to maintain a strict separation between components which are theoretically or methodologically well understood or which are of no central interest to the project, and those which are topics of genuine research effort and experimentation and where research is still going on.

Thus the natural language processor is separated into three passes (see fig. 4): a PASS0 for the morphological identification, a PASS1 for syntactic analysis and a PASS2 for code generation, i.e. translation into the language of internal representation.

4.1. PASS0: Morphological identification

In an earlier stage of the system PASS0 was a program for morphological analysis which operated with a lemmatized dictionary. For each German word of the system's vocabulary there existed only one dictionary entry, the basic form of the word. A certain class of verb forms, for example, was represented by its infinitive form. It was the task of the program to apply morphological rules to inflected forms of words and to reduce them to their basic form, which then allowed a dictionary look-up for further information. This analysis technique was very time consuming and it was replaced by a very simple program which works with a non-lemmatized dictionary. Each inflected form of a word has an entry in the dictionary with its full morphological information such as basic form of the inflected word, wordclass, gender, tense etc. (see fig. 3).

MORPHOLOGISCHE BESCHREIBUNG VON 'PROBEN':

((N NF PROBE KNG 7690 K (NOM GEN DAT AKK) PN 6 G F)
 (VERB NF PROBEN KNG 7831 PN (2 5) MOD BEF TEMP GE DIA AKT)
 (VERB NF PROBEN KNG 8191 TEMP IN)
 (VERB NF PROBEN KNG 7727 PN (4 6) MOD (IND KONJ) TEMP GE DIA AKT))

MORPHOLOGISCHE BESCHREIBUNG VON 'PRODUZIERT':

((VERB NF PRODUZIEREN KNG 8191 TEMP P2 HS HABEN PAS 1 REFL (NER AKK))
 (ADJU NF PRODUZIERT S POS))

fig.3 Sample entries of the non-lemmatized morphological dictionary.
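The effect of a non-lemmatized dictionary can be sketched in Python: identification becomes a single keyed look-up of the inflected form, with no morphological rules applied at analysis time. The sketch is a simplification under stated assumptions: an in-memory `dict` stands in for the external index-sequential file, and each reading keeps only the word class and the NF (basic form) field of the entries in fig. 3. The names `LEXICON`, `identify` and `base_forms` are hypothetical.

```python
# One entry per inflected word form; each form may carry several readings.
LEXICON = {
    "PROBEN": [
        {"cat": "N", "NF": "PROBE"},
        {"cat": "VERB", "NF": "PROBEN"},
    ],
    "PRODUZIERT": [
        {"cat": "VERB", "NF": "PRODUZIEREN"},
        {"cat": "ADJU", "NF": "PRODUZIERT"},
    ],
}


def identify(word_form):
    """Return every stored reading of an inflected form (empty list if unknown)."""
    return LEXICON.get(word_form.upper(), [])


def base_forms(word_form):
    """Collect the distinct basic forms (NF) among the readings, in stored order."""
    seen = []
    for reading in identify(word_form):
        if reading["NF"] not in seen:
            seen.append(reading["NF"])
    return seen
```

The design trade-off is the one the text describes: look-up is cheap, but every inflected form needs its own entry, and an ambiguous form such as 'PROBEN' returns all of its readings for later passes to choose from.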

fig. 4 [Diagram of the passes of the natural language processor; not reproduced. The only legible label is 'Error Message'.]

The dictionary is stored on an external device with index sequential access, so that the time required for morphological identification of a word is rather small.

The amount of work necessary for entering all inflected forms of a word into the dictionary is reduced by a special function of PASS0, which generates the entries for the inflected forms from the dictionary entry of a basic form.

Finally, a HELP-routine of PASS0 enables even a user with little linguistic knowledge to write the dictionary entry for a basic wordform.

4.2. PASS1: Morpho-syntactic analysis

In the framework of the PLIDIS project, the following requirements were laid down for a syntax analysis component:

- the grammar should be formulated independently from the parsing algorithm. This requirement ensures ease of modification and enables the formulation of the grammar rules to be carried out by linguists unversed in programming;

- the grammar must allow the formulation of certain context-sensitive features; in particular it must be able to deal effectively with discontinuous constituents, which are very common in the German language;

- as the output of the syntactic analysis should be only one parsing, the parser must be equipped with adequate backtracking facilities to find alternative parsings if requested by subsequent components of the system.

4.2.1.

Bearing the above requirements in mind, it was decided to adopt the model of an 'augmented transition network' (ATN) (WOODS 1973). An ATN is a context-sensitive extension of a 'basic transition network' (BTN). The latter is equivalent to a 'push-down store' automaton or a context free PSG. To illustrate a BTN, consider the following small context-free grammar with the following rules:

S → NP VP
NP → DET N | NPR
VP → V
DET → der
N → Hund
V → schläft
NPR → Hans

Whereby S is the start symbol; NP, VP, DET, N, NPR, V are non-terminal symbols; der, Hund, Hans, schläft are terminal symbols.

The finite-state-transition-diagram for this grammar can be represented as shown in fig. 5.

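fig.5 A sample basic transition network

[Diagram not reproduced. It shows three networks: (1) for S, a PUSH NP/ arc followed by a PUSH VP/ arc and a POP; (2) for NP, either a CAT DET arc followed by a CAT N arc, or a CAT NPR arc, then a POP; (3) for VP, a CAT V arc followed by a POP.]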
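A basic transition network for the small grammar above can be sketched as a Python recogniser. This is an illustrative reconstruction, not the PLIDIS parser: the network is encoded as a table of states and arcs, only CAT, PUSH and POP arcs are modelled, and Python generators supply the backtracking. The state names S/NP and S/VP appear in the text; the remaining state names and the function names `traverse` and `accepts` are hypothetical.

```python
LEXICON = {"der": "DET", "Hund": "N", "schläft": "V", "Hans": "NPR"}

# Each network maps a state to its outgoing arcs.
# Arc shapes: ("CAT", category, next_state), ("PUSH", network, next_state), ("POP",)
NETWORKS = {
    "S": {"S/": [("PUSH", "NP", "S/NP")],
          "S/NP": [("PUSH", "VP", "S/VP")],
          "S/VP": [("POP",)]},
    "NP": {"NP/": [("CAT", "DET", "NP/DET"), ("CAT", "NPR", "NP/N")],
           "NP/DET": [("CAT", "N", "NP/N")],
           "NP/N": [("POP",)]},
    "VP": {"VP/": [("CAT", "V", "VP/V")],
           "VP/V": [("POP",)]},
}


def traverse(network, state, words, pos):
    """Yield every input position at which `network` can POP, starting at `state`."""
    for arc in NETWORKS[network][state]:
        if arc[0] == "POP":
            yield pos
        elif arc[0] == "CAT":
            _, category, next_state = arc
            # A CAT arc consumes one word of the required category.
            if pos < len(words) and LEXICON.get(words[pos]) == category:
                yield from traverse(network, next_state, words, pos + 1)
        elif arc[0] == "PUSH":
            _, sub_network, next_state = arc
            # A PUSH arc succeeds for every way the sub-network accepts a substring.
            for end in traverse(sub_network, sub_network + "/", words, pos):
                yield from traverse(network, next_state, words, end)


def accepts(sentence):
    """True if the S network consumes the whole sentence."""
    words = sentence.split()
    return any(end == len(words) for end in traverse("S", "S/", words, 0))
```

For instance, `accepts("der Hund schläft")` and `accepts("Hans schläft")` hold, while `accepts("der schläft")` does not, because the NP network cannot POP after the determiner alone.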

The two basic concepts of a BTN are 'state' and 'arc'. An arc represents the transition from state i to state j. There are the following main types of arcs:

A CAT-arc represents a transition that can be taken if the word in the input sentence which is being pointed at belongs to the syntactic category indicated on the arc.

A PUSH-arc introduces recursion in the network. The transition can be taken if a substring of the input string is accepted by the state indicated on the arc. When leaving this state, the scanner points to the last word in the accepted phrase.

A POP-arc should be interpreted in conjunction with a PUSH-arc. After successfully reaching the state indicated on the PUSH-arc, it leaves the current level of recursion and goes up one level. It is a 'pseudo' arc, in the sense that it does not lead to a next state, but marks the state it leaves as being final. It will also specify the structural representation given to the substring being analysed.

An ATN is a context-sensitive extension of a BTN in the sense that conditions can be stated on the arcs to control the transition from one state to the next. For example, in the above sample BTN it could be stated that the transition from state S/NP to state S/VP is only allowed if subject and verb agree in the category "number". Furthermore, structure-building actions, i.e. the construction of parts of a parsing tree, can be formulated when the transition to a state has been achieved.

To achieve the desired context-sensitivity, an ATN is augmented with a number of registers which can be used to store intermediate results and thus allow for the treatment of discontinuous constituents. To enable the grammar-writer to deal more conveniently with this problem, a further arc, namely a VIR arc, was introduced. In addition, there is a TST-arc, which allows the transition to the next state only if the word currently being scanned satisfies a given condition, and a JUMP-arc, which is like a TST-arc, except that it does not consume any of the input string.

The purpose of the registers is also to store the intermediate results of the structure building actions. Normally, a parser assigns a structure to the input string which corresponds directly to the sequence of the transitions (i.e. one that corresponds to the surface structure). The actions allow new structures to be built up, picking up constituents out of order, and thus make it possible to represent 'deep structures' or even semantic representations. Because of this feature, an ATN was also used in an earlier version of PLIDIS for the translation from natural language to internal representation.

Another major advantage of an ATN for representing the grammar of a natural language is its 'open endedness'. New arcs, tests and actions can be formulated to meet the requirements of the specific language being analysed.
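The role of tests, actions and registers can be illustrated with a deliberately small Python sketch. It is not an ATN interpreter: two hand-written procedures stand in for the NP and VP networks, the toy lexicon gives each word exactly one reading (which real German articles do not have), and the agreement test is placed on the verb arc rather than on the S/NP to S/VP transition. The register names SUBJ, SUBJ-NUMBER and VERB and all function names are hypothetical.

```python
# word -> (category, number); one reading per word is a simplifying assumption.
LEXICON = {
    "der": ("DET", "sing"), "die": ("DET", "plur"),
    "Hund": ("N", "sing"), "Hunde": ("N", "plur"),
    "schläft": ("V", "sing"), "schlafen": ("V", "plur"),
}


def parse_np(words, pos, registers):
    """NP network (DET N): tests agreement, then stores results in registers."""
    if pos + 1 < len(words):
        det = LEXICON.get(words[pos])
        noun = LEXICON.get(words[pos + 1])
        if det and noun and det[0] == "DET" and noun[0] == "N" and det[1] == noun[1]:
            registers["SUBJ-NUMBER"] = noun[1]            # action: set a register
            registers["SUBJ"] = ("NP", words[pos], words[pos + 1])
            return pos + 2
    return None


def parse_vp(words, pos, registers):
    """VP network (CAT V): the arc test consults the register set by the NP."""
    entry = LEXICON.get(words[pos]) if pos < len(words) else None
    if entry and entry[0] == "V" and entry[1] == registers.get("SUBJ-NUMBER"):
        registers["VERB"] = ("V", words[pos])
        return pos + 1
    return None


def parse_sentence(sentence):
    """Return a structure built from the registers, or None if parsing fails."""
    words, registers = sentence.split(), {}
    pos = parse_np(words, 0, registers)
    if pos is None:
        return None
    pos = parse_vp(words, pos, registers)
    if pos != len(words):
        return None
    return ("S", registers["SUBJ"], registers["VERB"])    # form returned on POP
```

With this lexicon, "der Hund schläft" is accepted and yields a structure assembled from the registers, while "der Hund schlafen" is rejected by the number test on the verb arc.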

4.2.2.

The following describes briefly how the ATN concept, proposed by WOODS, was adapted to suit the specific structures of the German language (e.g. KOLB & LUTZ 1975).

The inventory of arcs was augmented by an LCAT and an LWRD arc, allowing right-to-left analysis, which is particularly required by the German verbphrase. LCAT stipulates that the last word in the sentence be a past participle of a verb-form, and LWRD looks for the form worden to complete a passive verbphrase.

The table below describes the formats for all arcs used in the currently implemented ATN for German:

(CAT <category> <test> <action>*)
(LCAT <category> <test> <action>*)
(MCAT (<category>*) <test> <action>*)
(VIR <category> <stack> <test> <action>*)
(VIRL <category> <stack> <test> <action>*)
(TST <label> <test> <action>*)
(WRD <word> <test> <action>*)
(LWRD <word> <test> <action>*)
(MEM (<word>*) <test> <action>*)
(PUSH <state> <test> <preaction>* <action>*)
(POP <form> <test>)
(JUMP <state> <test> <action>*)
(TO <state> <test> <action>*)

Disadvantages of using an ATN for German became apparent, mainly due to its complex morpho-syntactic structure. Since morpho-syntactic ambiguity is extreme, particularly in the case of very common articles, the parsing is slowed down considerably due to the extensive amount of backtracking required. For example, the article der has the following morpho-syntactic interpretations:

nom sing masc
gen sing fem
dat sing fem
gen plur masc
gen plur fem
gen plur neut

Another, more general disadvantage of an ATN is that the tests are formulated as constants, so that it is not possible to vary a test according to the path followed in the analysis so far except by an excessively complex apparatus of register setting.

Nevertheless there seems to be no alternative parsing technique within the domain of natural language parsing, which is as well studied as ATN-parsing and which at the same time may be handled easily by linguists without a special programming training.

Diagrams of a part of the ATN grammar of the German noun phrase and verb phrase can be seen in figs. 6-1ff. This grammar is very weakly structured for noun phrases. Whereas a linguist would like to have a structure resembling something like fig. 7, PASS1 produces an analysis like fig. 8 for the sentence

"Der Anteil an Cyanid in der Probe der Firma Müller betrug 2 mg/l."
(The amount of cyanide contained in the sample of the firm Müller was 2 mg/l.)

In the present version, the parser is able to recognise the majority of German sentence structures, including complex sentences containing all types of subordinate clauses: relative clauses, subject clauses, object and adverbial clauses. There is, however, a restriction imposed on the position of adverbial clauses, which have to occur either at the beginning or at the end of the sentence. The following main sentence-constituents are recognized:

- noun phrases (NG)
- prepositional noun phrases (PNG)
- adjectival phrases (ADJG)
- adverbs (ADV)
- verb phrases (VK)

Complex noun phrases having inflected participial constructions as attributes, such as 'Das von der Firma Müller in den Rhein eingeleitete Abwasser' can be handled as well.
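The effect of article ambiguity on the parser can be made concrete with a small sketch. It lists the six readings of "der" given above and filters them against the readings of a following noun, in the manner of an agreement test on an arc. The noun readings and the function name `agree` are assumptions made for this illustration only; they are not taken from the PLIDIS lexicon.

```python
# Readings of the article "der" as listed in the text: (case, number, gender).
DER_READINGS = [
    ("nom", "sing", "masc"),
    ("gen", "sing", "fem"),
    ("dat", "sing", "fem"),
    ("gen", "plur", "masc"),
    ("gen", "plur", "fem"),
    ("gen", "plur", "neut"),
]

# Hypothetical noun readings, invented for this illustration.
NOUN_READINGS = {
    "Anteil": [("nom", "sing", "masc"), ("dat", "sing", "masc"), ("akk", "sing", "masc")],
    "Probe": [(case, "sing", "fem") for case in ("nom", "gen", "dat", "akk")],
}


def agree(article_readings, noun_readings):
    """Keep the article readings that are also readings of the noun."""
    noun_set = set(noun_readings)
    return [reading for reading in article_readings if reading in noun_set]


if __name__ == "__main__":
    for noun in ("Anteil", "Probe"):
        print("der", noun, "->", agree(DER_READINGS, NOUN_READINGS[noun]))
```

Under these assumed noun readings, "der Anteil" is left with one reading, while "der Probe" remains ambiguous between genitive and dative, so a parser that commits to one reading early may have to backtrack.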

fig. 6-1  ATN for German (Hauptsatz)

fig. 6-2  ATN for German (Hauptsatz)

fig. 6-4  ATN for German (Artangabe; Zahlgruppe; Adjektivgruppe flektiert; Adjektivgruppe unflektiert)

fig. 7  Example for desirable syntactic structuring within the domain of noun groups (sentence: "der Anteil an Cyanid in der Probe der Firma-Müller betrug 20 mg/l").

fig. 8  Example for structuring capacity of the PLIDIS natural language parser: the same sentence is segmented into the flat constituent sequence NG PNG PNG NG VK NG.

ERGEBNIS DER MORPHO-SYNTAKTISCHEN ANALYSE (PASS1):

(S ((TYPE . AUSSAGE)
    (DIATHESE . AKTIV)
    (NS . (DER ANTEIL AN ZYANID IN DER PROBE DER FIRMA-MUELLER BETRUG 2 MG/L ,)))
   (VK ((PN . (1 5)))
       (V ((TEMP . VE)) BETRAGEN))
   (NG ((KNG . 4164) (K . NOM) (PN . 3) (G . M) (NS . (DER ANTEIL)))
       (DET NIL DER)
       (N NIL ANTEIL))
   (PNG ((KNG . 1601) (K . (DAT AKK)) (PN . 3) (G . N) (NS . (AN ZYANID)))
        (PRAEP NIL AN)
        (N NIL ZYANID))
   (PNG ((KNG . 1090) (K . DAT) (PN . 3) (G . F) (NS . (IN DER PROBE)))
        (PRAEP NIL IN)
        (DET NIL DIE)
        (N NIL PROBE))
   (NG ((KNG . 3138) (K . (GEN DAT)) (PN . 3) (G . F) (NS . (DER FIRMA-MUELLER)))
       (DET NIL DIE)
       (NPR NIL FIRMA-MUELLER))
   (VERB ((NS . BETRUG)) BETRAGEN)
   (NG ((KNG . 7745) (K . (NOM GEN DAT AKK)) (PN . 3) (G . N) (NS . (2 MG/L)))
       (ZAHL NIL (INTEGERZAHL NIL 2))
       (N NIL MG/L)))

fig. 9  Sample output of PASS1

The verb phrase may contain a main verb in any tense or mode, with the exclusion of the conjunctive mood. Because of their inherent ambiguity, which cannot be resolved by purely syntactic criteria, some constructions had to be excluded:

- coordination between noun phrases (e.g. 'Die alten Männer und Frauen')
- 'elliptical' noun phrases, i.e. noun phrases without a nominal head (e.g. 'Er nannte das billigste gut')

Certainly it would be possible to push the noun phrase analysis further by extending the syntactic categories and by using deeper information such as the dependency frames of verbs. But since noun phrase analysis sooner or later needs semantic information, it was decided to restrict the syntactic analysis to the generation of a list of the main constituents of the input sentence with a minimal dependency structure and to pass the burden of semantic interpretation to the translation component in PASS2.

4.3. PASS2: Semantic analysis component

Within the PLIDIS-system semantic analysis is viewed as the problem of translating natural language sentences into formulas of the internal representation language KS, more precisely: to generate KS-code from the parsing trees, which are produced by the network-parser of PASS1. In the earlier ISLIB-approach augmented transition networks were used to state the rules for KS-code generation. As stated earlier, this approach turned out to be not very efficient and remained at an ad-hoc level, since it was not possible to find a theoretical foundation which would have allowed a reduction of the number of rules needed within this approach.

The new concept for the natural-language-to-KS translation starts from the concept of a translation grammar for a pair of languages L1, L2, where L1 is the source language and L2 the goal language of the translation (WULZ 1976). PASS2 can then be viewed as a program which interprets the translation rules.

The translation grammar may be compared with a transformational grammar (GINSBURG/PARTEE 1969), the rules of which operate on already existing

derivation trees of a phrase structure grammar of the source language, i.e. German in the PLIDIS system. The nodes of these trees are labelled with non-terminal (syntactic categories of the grammar) and terminal symbols (source language words) of the phrase-structure grammar. In a similar way the translation grammar rules are applied to the derivation trees of the source language, which correspond within the context of PLIDIS to the lists of bracketed and labelled constituents from the parsing of natural language sentences. For the sake of simplicity and clarity, the translation grammar is explained here in an abbreviated terminology and by simplified examples of derivation trees.

The translation grammar disposes of three types of rules:

(1) rules for the replacement of source language symbols - i.e. in general natural language words - by the context-pattern of their goal-language equivalent
(2) insertion rules for the goal language context-pattern
(3) pattern raising rules.

The rules are based on the concept that it will be possible to define for each goal language symbol something that we will call a context pattern. The context pattern of a symbol is a prediction about the syntactical context in which this symbol will occur. Thus the writer of a translation grammar for German to KS may state in a rule of type (1), that the KS-symbol PROBE may correspond to the German word "Probe" (sample). The grammar of KS defines that PROBE may be used within a two-place <TERM> of the sort <stoffkoll>, where the first argument has to be a <TERM> of the sort <firma> and the second argument a <TERM> of the sort <int>. Thus in any context where the German "Probe" is translated by the KS-symbol PROBE, it will be followed by two terms specified as above, and the context pattern for PROBE can be defined as a structure like fig. 10.

The rule of type 1 for the German word "Probe" would then state that "Probe" is to be replaced by the context pattern of fig. 10. In the context pattern of PROBE, <TERM ; stoffkoll> is viewed as the head of the context pattern, whereas the non-terminal KS-symbols <TERM ; firma> and <TERM ; int> are considered as "slots", and it is the task of the type (2) rules of the translation grammar to define how to fill in these slots.

fig. 10  Context pattern of PROBE: the head <TERM;stoffkoll> dominates PROBE and the two slots <TERM;firma> and <TERM;int>.

A distributional analysis of the context

of German "Probe" will show that the nominal attributes of "Probe", i.e. a noun-group in the genitive case or a prepositional noun-group following "Probe", are the constituents, the translations of which have to be inserted into the slots of the PROBE pattern. Thus the insertion rules for the KS-context pattern assigned within the translation of a natural language sentence to German "Probe" would e.g. state that the slot with the name <TERM ; int> has to be filled with a context pattern of the same name, resulting from the translation of a prepositional noun-group following "Probe", also specifying possible prepositions like "am" or "vom".

Example: Let RR1, ..., RR6 denote some rules of type (1) for the replacement of German words by the context pattern of their KS-equivalent, IR1, IR2 rules of type (2) for the insertion into context patterns; let E denote the empty context pattern, consisting of no symbols; the application of these rules onto German "Probe" within the context "die Probe bei Müller & Co vom 15.12.76" (the sample from Müller & Co of 12/15/76) can be represented schematically as shown in fig. 11, where the arcs stand for the application of the rules which label the arc.

fig. 11  Simplified illustration of the application of replacement rules (RR) and insertion rules (IR).

The use of the sorts of KS for disambiguation within the translation can be shown if one considers "die Probe von Müller & Co am 15.12.76"

as alternative formulation for "die Probe bei Müller & Co vom 15.12.76". As insertion rule IR2 requires the translation of a prepositional noun-group with the preposition "von" or "am" to be inserted as tense argument into the context pattern assigned to "Probe", the translation of "Müller & Co" would take the place of the second TERM within the PROBE-pattern. But since the KS-equivalent to "Müller & Co" is a TERM of the sort <firma>, a check of the sort consistency will block the insertion at the place of a TERM with the sort <intervall>.

For each insertion rule there is a side effect defined. If a filled-in context-pattern is inserted into the slot of another pattern, it is deleted at its original place, i.e. replaced by the empty pattern E (see fig. 12 for illustration).

fig. 12  Result of the application of the rules of type 1 and 2 on "die Probe bei Müller & Co vom 15.12.76": in the PROBE pattern with head <TERM;stoffkoll>, the slot <TERM;firma> is filled by (BETRIEB FRITZ-MÜLLER&CO 7000.STUTTGART) and the slot <TERM;int> by 76.12.15.

If all terminal symbols, i.e. all natural language words of a derivation tree, are replaced by the context-pattern of their KS-equivalent and if all slots of these patterns are filled in, the pattern raising rules may be applied to the remaining structure in the following ways:

(1) A non-terminal symbol x of the source language grammar can be replaced by a filled-in context pattern if this pattern is dominated by x and if all other context patterns which are dominated by x are equal to the empty context pattern.

(2) If a non-terminal symbol x of the source language grammar dominates only empty patterns, then it is replaced by the empty pattern.

(3) If the top node of the remaining tree structure is labelled by a symbol of the grammar of the goal language, a head y of a context pattern can be replaced by the string which results from the concatenation of the symbols dominated by the head y, under the condition that y does not dominate another head of a context pattern.

For simplicity we will illustrate the application of the pattern raising rules with an abstract example.

Example: Let A, B, C, D be some non-terminal symbols of a source language grammar and a, b, c, d, e, f symbols of the goal language grammar; let PR1, PR2, PR3 denote the pattern-raising rules as described above in (1), (2), (3) respectively. Fig. 13 then illustrates the application of these rules to the tree whose top is labelled by A and where a and d are the heads of context patterns. The numbers preceding the rule names indicate the order in which these rules were applied.

fig. 13  Application of pattern raising rules (PR).

If the string resulting from the application of the pattern raising rules consists of terminal symbols of the goal language grammar, then a translation has been found.

Since various details of a translation grammar for a subset of German into KS are still subject of experimentation, the PASS2 program which interprets the translation rules has not yet reached its definitive form.
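The interplay of replacement rules, insertion rules and the sort check described in this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the PASS2 program: the class names `Pattern` and `Slot`, the function names, the lexicon entries and the bracketed output string are all hypothetical, and the side effect of leaving the empty pattern E behind is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    sort: str               # sort required of the filler, e.g. "firma"
    filler: object = None   # a Pattern once the slot has been filled


@dataclass
class Pattern:
    head_sort: str          # sort of the head, as in <TERM; stoffkoll>
    symbol: str             # goal language symbol, e.g. "PROBE"
    slots: list = field(default_factory=list)


# Hypothetical lexicon: word -> (head sort, KS symbol, sorts of the slots).
LEXICON = {
    "Probe": ("stoffkoll", "PROBE", ["firma", "int"]),
    "Müller & Co": ("firma", "MUELLER&CO", []),
    "15.12.76": ("int", "76.12.15", []),
}


def replace_word(word):
    """Type (1) rule: replace a source word by a copy of its context pattern."""
    head_sort, symbol, slot_sorts = LEXICON[word]
    return Pattern(head_sort, symbol, [Slot(sort) for sort in slot_sorts])


def insert(target, index, filler):
    """Type (2) rule: fill a slot; the sort consistency check may block it."""
    slot = target.slots[index]
    if slot.filler is not None or slot.sort != filler.head_sort:
        return False
    slot.filler = filler
    return True


def raise_pattern(pattern):
    """Simplified pattern raising: concatenate the symbols under a head."""
    if any(slot.filler is None for slot in pattern.slots):
        raise ValueError("unfilled slot in pattern " + pattern.symbol)
    if not pattern.slots:
        return pattern.symbol
    parts = [pattern.symbol] + [raise_pattern(s.filler) for s in pattern.slots]
    return "(" + " ".join(parts) + ")"


if __name__ == "__main__":
    probe = replace_word("Probe")
    firm = replace_word("Müller & Co")
    date = replace_word("15.12.76")
    print(insert(probe, 1, firm))   # False: a <firma> term cannot fill the <int> slot
    print(insert(probe, 0, firm))   # True
    print(insert(probe, 1, date))   # True
    print(raise_pattern(probe))     # (PROBE MUELLER&CO 76.12.15)
```

The blocked first insertion corresponds to the case discussed above, where the translation of "Müller & Co" is offered for the time slot and rejected on grounds of sort.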

5. Information handling and problem-solving

The processor for information and problem descriptions (see fig. 14) consists of, on the one hand, data base management procedures for storing the symbolic data into the data base and, on the other hand, 'problem-solving procedures' for the answering of questions. This section deals primarily with the latter, as data base management problems are not within the main topics of the PLIDIS development and will arise only in the "real-life" application of the system, when mass-data have to be processed. Should the PLIDIS data base management prove too weak to handle these problems, the component may be replaced by adaptations of already existing data base management systems.

5.1. An outline of Database management

It is the task of the data base management component of PLIDIS to 'normalise' the KS-formulas representing the system's knowledge in such a way as to ensure easy retrieval. Its other task is to ensure the security of the data so that access is only given to authorized persons.

The normalising process includes skolemising of the existential quantifiers and subsequently reducing the KS-formulas into sets of literals. Certain argument-terms, such as those of the type 'stoffkoll', are replaced by skolem constants; for example (PROBE MÜLLER 13.10.76) might be replaced by the constant 0109.STOFFKOLL, denoting the number of this specific sample. This presupposes that the formula (PROBE MÜLLER 13.10.76 ; 0109.STOFFKOLL) is also stored.

fig. 14  Structure of the PLIDIS component for information processing and problem solving: the PIP (Processor for Informations and Problem-descriptions) comprises a monitor, the problem-solver (theorem-prover, term-interpreter, other problem-solving operations) and the data management (facts, axioms, heuristics), under the PLIDIS supervisor.

The data base is divided into two sections:

- the primary base, containing the items
- the secondary base, containing the 'modes of access' to the items.

Both data bases are in the form of ISAM-files. Each item to be stored gets prefixed with the security-key of the user entering the item. The security-keys of all users are organised in a dependency tree. A user is allowed access only to those items prefixed with his own key or with the key of a user on a dependent node. This ensures that only authorized persons have access to specific data.
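The access rule can be illustrated with a short sketch. It assumes that the dependency tree is stored as a mapping from each key to its parent key; the key names, the item list and the function names are hypothetical and serve only to show the check.

```python
# Hypothetical dependency tree of security-keys: child key -> parent key.
PARENT = {"lab-a": "agency", "lab-b": "agency", "intern": "lab-a"}

# Hypothetical stored items, each prefixed with the key of the entering user.
ITEMS = [
    ("agency", "norms"),
    ("lab-a", "sample 1"),
    ("intern", "draft"),
    ("lab-b", "sample 2"),
]


def depends_on(key, ancestor):
    """True if `key` equals `ancestor` or lies below it in the tree."""
    while key is not None:
        if key == ancestor:
            return True
        key = PARENT.get(key)
    return False


def visible_items(user_key):
    """Items prefixed with the user's own key or with a dependent key."""
    return [data for key, data in ITEMS if depends_on(key, user_key)]


if __name__ == "__main__":
    print(visible_items("lab-a"))    # ['sample 1', 'draft']
    print(visible_items("intern"))   # ['draft']
```

A user at the root of the assumed tree sees every item, while a user at a leaf sees only the items entered under his own key.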

5.2. Characteristics of the problem-domain for PLIDIS

First it should be made clear what is meant by 'problem-solving' in the context of the QA system PLIDIS. Defined in a negative sense, the problem-domain excludes the area of mathematical theorem proving, game playing, and robot reasoning. The emphasis is rather on retrieving the appropriate answer for a question from a large data base with a minimal number of deductions.

The sorts of problems the system will have to deal with are:

1. The retrieval of facts explicitly or implicitly contained in the fact data base. An example of a question asking for an explicit fact would be: 'how high is the level of arsenic in a specific sample'. An example for asking an implicit fact: 'which toxic materials were contained in sample y'; whereby what is meant by 'toxic' has to be deduced from such statements of facts as 'a stuff which impedes the growth of plants in the river is toxic'; 'the growth of plants is impeded by chemicals which reduce the level of oxygen in water'.

2. The performance of arithmetic operations on the data retrieved, such as 'the average level of cyanide over a specified period'.

3. The reconstruction of some processes. For example, an excessive level of cyanide was detected in a river at place x; which firm could have caused this? To arrive at an answer, it has to be deduced which firms located upstream from place x might have produced cyanide as a waste product; this information itself might not be explicitly given, but it might have to be deduced from the chemical processes involved in the production process.

4. The controlling of pollution. Incoming data about the composition of water samples is immediately checked against the legal norms, and if a norm is found to be transgressed, appropriate action is taken. This action involves checking of previous samples to find out if it is a 'first offence', and so on.

From the above catalogue, it follows that the problem solving component of PLIDIS must be able to perform the following operations:

- matching operations
- set-theoretic operations
- deduction operations
- arithmetic operations

The choice of the appropriate matching operations depends in the first place on the techniques used in the 'data base management' component for storage of mass-data (hash-coding, pattern-matching...).

As natural language questions put to the system usually involve the use of plural noun phrases, it should be possible to ask for the extension of sets (of individuals or of mass-terms). This task is performed by a component called 'Terminterpreter' (TI), which reformulates the KS-question into set-theoretic terms and subsequently evaluates this term with set-theoretic operators. The deduction process proper is done by means of a theorem-prover based on the resolution principle. Arithmetic operations present no particular problem, as they are represented as KS-operators which are evaluated as LISP functions. The components which perform the above operations interact with each other in the process of solving a particular problem. This interaction is guided by a 'monitor'. An illustrative example of the operation of the problem-solver is shown in section 5.4.
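The division of labour between set-theoretic and arithmetic operations may be pictured as follows. The sketch evaluates the extension of a plural term as a set of facts and then computes an average over it. The fact base, its tuple layout and the function names are invented for the illustration; the actual Terminterpreter works on KS terms and LISP functions.

```python
from statistics import mean

# Hypothetical fact base: (sample, firm, day, substance, level in mg/l).
FACTS = [
    ("s1", "MUELLER", 1, "cyanide", 2.0),
    ("s2", "MUELLER", 2, "cyanide", 4.0),
    ("s3", "MEIER", 2, "arsenic", 0.5),
]


def extension(predicate):
    """Set-theoretic step: the set of all facts satisfying a predicate."""
    return {fact for fact in FACTS if predicate(fact)}


def average_level(substance, first_day, last_day):
    """Arithmetic step: mean level of a substance over a period of days."""
    hits = extension(
        lambda f: f[3] == substance and first_day <= f[2] <= last_day
    )
    return mean(f[4] for f in hits) if hits else None


if __name__ == "__main__":
    # "the samples of Müller taken on day 2": an intersection of two sets
    print(extension(lambda f: f[1] == "MUELLER") & extension(lambda f: f[2] == 2))
    # "the average level of cyanide over a specified period"
    print(average_level("cyanide", 1, 2))   # 3.0
```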

5.3. Problem-solving with an automatic 'theorem-prover' *)

Since PLIDIS disposes of a 'declarative' internal representation of its knowledge consisting of a set of KS-formulas, it seemed indicated to adopt as deduction mechanism an automatic theorem-prover (TP) based on the resolution principle, offering the advantages of a universal, uniform system with provable properties (cf. CHANG & LEE 1970).

*) for further information see DILGER 1976a

Without giving a detailed analysis of the respective merits of different problem-solving approaches, it may be argued that the justification for the choice of a theorem-prover against its alternative, namely a programming language with deductive capacity, lies primarily in its adaptability to different problem-domains. Nevertheless the latter method may achieve greater efficiency where the problem-domain is clearly defined and the sorts of answers expected are known in advance.

Theorem-provers have in the past been heavily criticised for their inefficiency. Recent research on improved searching strategies has significantly increased their efficiency. A more deep-reaching criticism is probably that not all knowledge can be adequately presented in first-order predicate calculus. In our opinion, the extensions incorporated in KS, which were described earlier, have considerably improved the power of first-order predicate calculus. Another objection by advocates of a 'procedural' approach is that a theorem-prover operates on a static world-model whereas in a real-world model, it is often required to be able to remove data from the data base to reflect changed states of the world. In PLIDIS there is no need to remove data since all facts and actions are characterised by situational and temporal variables.

Most important in the evaluation of the efficiency of a TP is the extent to which it can be guided by appropriate heuristics which not only evaluate syntactic features but also semantic ones.

The theorem-prover based on the resolution principle proceeds in two main stages: normalization and resolution.

The process of normalising consists in reducing the KS-formulas into sets of literals obtained out of clauses in conjunctive normal form, the existential quantifiers having been replaced by skolem-constants or -functions. For greater efficiency, normalising takes place when the formulas are entered into the data base, so that it only needs to be carried out once. Questions must of course still be normalised by the TP.

The process of resolution proper generally involves two important aspects: i) search strategies and ii) heuristics.

Under the heading of search strategies fall such alternative techniques as 'state space' versus 'problem reduction', 'depth-first' versus 'breadth-first' analysis, and the use of connection graphs as described

by KOWALSKI (1975), supported by methods such as the Waltz-algorithm. Each of these techniques presents important advantages for particular types of problems. It was deemed appropriate in the PLIDIS implementation of the TP to allow the deduction strategies to be kept variable, according to the type of problem at hand.

In a QA system, for example, the 'problem reduction' method (such as 'input resolution') has the advantage that the question being asked (i.e. the conclusion) can be taken as the starting clause, thus ensuring that only clauses containing a predicate relevant to the question are resolved upon. Because of the incompleteness of input resolution, drawing conclusions from false premises ('ex falso quodlibet') is avoided.

On the other hand, where the TP is to be used for controlling pollution, the goal-state is in general not known and a 'state-space' deduction method is hence indicated.

The 'default' implementation of the PLIDIS theorem-prover functions in state-space mode with breadth-first analysis. It is possible to change the operation mode to either 'unit resolution' or 'input resolution', by specifying the appropriate parameters.

The axioms being resolved upon are linked by a connection graph. *) It is envisaged to construct connection graphs when the data is entered in the data base. The set of clauses can thus be divided into subsets linked by a connection graph, representing different mini-world models of related axioms. As a further extension, heuristics could be similarly connected into subsets, which would aid the selection function. Whether the entire system's knowledge can thus be neatly divided into subsets has not yet been empirically verified. On a preliminary investigation, it seems that at least certain coherent bodies of knowledge can be distinguished, such as legal norms, geographical data, composition of chemicals etc.

At each step in the deduction process, the selection of the next pair of clauses to be resolved upon is guided by a 'selection function'. This function calls upon semantic as well as syntactic heuristics.
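The difference between the breadth-first default mode and input resolution can be shown on the propositional level. The following sketch is a toy refutation procedure and makes no claim about the PLIDIS implementation: clauses are sets of string literals, a leading "-" marks negation, and the function names and the `input_only` parameter are assumptions of this illustration. Connection graphs and heuristics are left out.

```python
def clause(*literals):
    """A clause is a frozenset of literals; '-p' is the negation of 'p'."""
    return frozenset(literals)


def negate(literal):
    return literal[1:] if literal.startswith("-") else "-" + literal


def resolvents(c1, c2):
    """All clauses obtainable by resolving c1 and c2 on one literal."""
    out = set()
    for literal in c1:
        if negate(literal) in c2:
            out.add(frozenset((c1 - {literal}) | (c2 - {negate(literal)})))
    return out


def refute(axioms, negated_question, input_only=False):
    """Return True if the empty clause can be derived.

    input_only=False: breadth-first saturation over all known clauses.
    input_only=True:  input resolution, starting from the negated question;
                      one parent of every step is an input clause.
    """
    inputs = set(axioms) | set(negated_question)
    known = set(inputs)
    frontier = set(negated_question) if input_only else set(inputs)
    while frontier:
        new = set()
        partners = inputs if input_only else known
        for c1 in frontier:
            for c2 in partners:
                for resolvent in resolvents(c1, c2):
                    if not resolvent:
                        return True
                    if resolvent not in known:
                        new.add(resolvent)
        known |= new
        frontier = new
    return False


if __name__ == "__main__":
    # 'a stuff which impedes growth is toxic'; 'this stuff impedes growth'
    axioms = [clause("-impedes", "toxic"), clause("impedes")]
    print(refute(axioms, [clause("-toxic")]))                    # True
    print(refute(axioms, [clause("-toxic")], input_only=True))   # True
    # An unsatisfiable non-Horn set that input resolution cannot refute.
    hard = [clause("p", "q"), clause("-p", "q"), clause("p", "-q")]
    print(refute(hard, [clause("-p", "-q")]))                    # True
    print(refute(hard, [clause("-p", "-q")], input_only=True))   # False
```

The last two calls illustrate the incompleteness of input resolution mentioned above: the clause set is unsatisfiable, but no refutation exists in which every step uses an input clause.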

*) cf. DILGER 1976b

In the context of resolution by means of connection graphs, heuristics can be defined as functions which evaluate links by some semantic or syntactic criteria so that the 'optimal' pair of clauses is chosen to be resolved upon.

Such heuristic functions could be viewed as functions operating on the subsets of the set of links, having as values fuzzy sets of links defined as follows:

Let $K = \{k_1, \ldots, k_n\}$ be the set of links in the connection graph and $FS$ the set consisting of the union of all fuzzy sets $FS(K')$, whereby $K'$ denotes a subset of $K$:

$FS = \bigcup_{K' \subseteq K} FS(K')$.

(A fuzzy set of $K'$ is a function $f : K' \to [0,1]$.)

A heuristic function is a partial function from the powerset of $K$ into the set $FS$:

$h : 2^K \to FS$, whereby $h(K') \in FS(K')$ $(K' \subseteq K)$.

This means that a heuristic function does not need to yield a value for all subsets of $K$. In the case of a 'depth-first' method of analysis, for example, only those links are evaluated which end in the same clause.

In general, for a particular subset $K'$ of $K$, several heuristic functions will be defined. It is necessary to allow the user the possibility of adding to the system new heuristic functions required by his problem.

An example of a syntactic heuristic would be a function computing the size of the unifier, i.e. the number of substitutions (e.g. if the unifier of the link $k$ contains $p$ elements then $f(k) = \frac{1}{p}$); another example would be the use of resolution with unit clauses (the value of this function would be either '0' or '1').

Semantic heuristics must take into account the semantic characterisation of the predicate and the arguments of the literal. Such heuristics must be formulated in terms of the world-model of the problem at hand.

Finally, the PLIDIS problem-solver makes use of the sortal structure of KS in selecting a unifier for a set of clauses. Before a substitution is carried out, it is checked if the sort of the constant is compatible with the sort of the argument, as is illustrated by the following two clauses:

i (AT x y)vl (MOVE x y z) v (AT x z) (AT table The f o l l o w i n g (PLACE table)) together with a specification

u n i f i e r can b e e s t a b l i s h e d

of the sortal c h a r a c t e r i s a t i o n (table P H Y S O B J ) / x , ( ( P L A C E yielding the f o l l o w i n g

of the s u b s t i t u t i o n s table)LOC)/y

resolvent: z)

i (MOVE t a b l e The a b o v e c l a u s e of the sort

(PLACE table) z) v (AT t a b l e

is i l l - f o r m e d

as the f i r s t a r g u m e n t of MOVE has to be

'animate';

the s u b s t i t u t i o n m u s t h e n c e b e rejected.
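The sort check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tables SORT_OF, ARG_SORTS and PARENT, the argument sorts given for AT, and the idea that ANIMATE is a subsort of PHYSOBJ are all hypothetical and are not taken from PLIDIS itself.

```python
# Hypothetical sort assignments for constants and ground terms (assumption).
SORT_OF = {"table": "PHYSOBJ", ("PLACE", "table"): "LOC"}

# Hypothetical required sort for each argument position of a predicate.
# Only "first argument of MOVE is animate" comes from the text.
ARG_SORTS = {"MOVE": ("ANIMATE", "LOC", "LOC"), "AT": ("PHYSOBJ", "LOC")}

# Hypothetical sort hierarchy: child sort -> parent sort.
PARENT = {"ANIMATE": "PHYSOBJ"}


def is_subsort(sort, required):
    """True if `sort` equals `required` or lies below it in the hierarchy."""
    while sort is not None:
        if sort == required:
            return True
        sort = PARENT.get(sort)
    return False


def substitution_ok(literal, subst):
    """Check that applying `subst` to `literal` respects the argument sorts."""
    pred, *args = literal
    for arg, required in zip(args, ARG_SORTS[pred]):
        term = subst.get(arg, arg)          # apply the substitution
        if term in SORT_OF and not is_subsort(SORT_OF[term], required):
            return False                    # sort clash: reject the unifier
    return True                             # unbound variables are not checked


subst = {"x": "table", "y": ("PLACE", "table")}
print(substitution_ok(("AT", "x", "y"), subst))         # accepted
print(substitution_ok(("MOVE", "x", "y", "z"), subst))  # rejected
```

With these assumed tables the substitution is accepted for the AT literal but rejected for MOVE, because 'table' is a PHYSOBJ and not animate.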

5.4. Illustrating the problem-solving component of PLIDIS

The theorem-prover is only part of the problem-solving component of the question-answering system PLIDIS. There is constant interaction between the four components described in section 5.2., namely retrieving and matching functions, set theoretic operations, arithmetic operations and the theorem-prover.

The following is a brief semantic characterisation of the kind of questions put to PLIDIS.

(1) Questions asking whether or not something is the case (yes/no questions)
(2) Questions asking for specific information (what/which/who... questions)
(3) Questions asking about 'processes' or sequences of actions needed to reach a goal (how/why questions)

In all these cases a theorem-prover can be called upon, as all of them can be reduced to the form: 'can q be deduced from the formulas representing the system's knowledge?'

In the case of type 2 questions, the variable which is being questioned is 'traced' by adding an 'answer-predicate' whose extension is implied by the conclusion. The deduction is completed if instead of the empty clause the 'answering clause' is derived. This consists of only one formula with the answer predicate, the argument of which constitutes the answer. In the case of type 3 questions, the procedure is similar, but the argument of the formula of the answering clause is not an individual variable but a term, which denotes the sequence of actions collected in the deduction process.

The following two examples show the similarity of the KS-representation of a type 1 and a type 2 question. ('?' is a pragmatic operator indicating a yes/no question)

- "Has the firm Müller already been checked three times this year?"

(?(ANZAHL

int (LAMBDA x I (EXIST (UND stoffkoll Xl (BETRIEB M ~ L L E R & CO 7 O O O . S T U T T G A R T ) int stoffkoll, xI ; xI )

(PROBE

(IN 1976. 3))

int, xI ))))

- "How often has Müller been checked this year?"

(LAMBDA

(ANZAHL

int (LAMBDA x I stoffkoll (EXIST x I (UND (PROBE (BETRIEB M U L L E R & CO 7 O O O . S T U T T G A R T ) int stoffkoll) xI ; x1

(IN 1976. zahl,


; x I ))

x int,}))) I

The interaction between the various components of the problem-solving part of PLIDIS will be illustrated by means of a fictitious example, wherein the KS-formulas appear in a somewhat simplified format. The predicates have been 'translated' into English, to ensure greater ease of readability to the non-German reader.

Input to the system is the following question:

'Which toxic materials were contained in the samples of the firm Müller taken on 24.5.75 and on 7.9.75?'

The internal representation of this question in KS would take the following form (making allowance for the translation of the predicates and individual terms):

(LAMBDA x^stoff
  (UND (COMPONENT (SAMPLE MÜLLER (LISTE 24.5.75 7.9.75)) x^stoff)
       (TOXIC x^stoff)))

The data base contains the following axioms defining 'toxic' material in the context of water pollution, which might be called upon for the deduction:

- toxic are such materials which interfere directly or indirectly with the fauna or flora in the river.
- poisons interfere directly with the flora and fauna of the river.
- materials which reduce the oxygen level of the water interfere indirectly with the flora and fauna.
- chemicals which stimulate growth excessively or slightly oxidising materials reduce the oxygen content of the water.

The above axioms are formalised as follows:

(1) (FUERALL x^stoff (IMPLIK (ODER (DIRINTERFER x^stoff) (INDIRINTERFER x^stoff)) (TOXIC x^stoff)))
(2) (FUERALL x^stoff (IMPLIK (POISON x^stoff) (DIRINTERFER x^stoff)))
(3) (FUERALL x^stoff (IMPLIK (REDUCEOXYGEN x^stoff) (INDIRINTERFER x^stoff)))
(4) (FUERALL x^stoff (IMPLIK (ODER (STIMULGROWTH x^stoff) (OXIDISING x^stoff)) (REDUCEOXYGEN x^stoff)))

Apart from these axioms, the data base contains entries about the composition of the samples taken from the firm Müller on 24.5.75 and on 7.9.75, as well as information about the properties of certain chemicals, for example that nitrate stimulates plant growth excessively and that arsenic and cyanide are poisons:

(5) (COMPONENT (SAMPLE MUELLER 24.5.75) (LISTE OXYGEN SULPHATE CYANIDE))
(6) (COMPONENT (SAMPLE MUELLER 7.9.75) (LISTE NITRATE CYANIDE LEAD))
(7) (STIMULGROWTH NITRATE)
(8) (POISON (LISTE ARSENIC CYANIDE))

In order to deduce the answer the following steps are required:

(i) The 'TI' component reformulates the KS-question into a set-theoretic formula, whereby the set-theoretic operators ET and VEL denote intersection and union, respectively.

(9) (ET (VEL (COMPONENT (SAMPLE MUELLER 24.5.75))
             (COMPONENT (SAMPLE MUELLER 7.9.75)))
        (TOXIC))

(ii) The extension of the individual set terms contained in the formula has to be defined:

A : (COMPONENT (SAMPLE MUELLER 24.5.75))
B : (COMPONENT (SAMPLE MUELLER 7.9.75))
C : (TOXIC)

The matching operations called by TI obtain from (5) and (6) the following answers to A and B:

A = (LISTE OXYGEN SULPHATE CYANIDE)
B = (LISTE NITRATE CYANIDE LEAD)

Since no entry for the predicate TOXIC is found, the theorem-prover is called at this point.

(iii) The conclusion to be deduced by the TP is:

(LAMBDA x (TOXIC x))

which is normalised as:

(10) ((NEG (TOXIC X)) (ANS X))

The normalisation process changes sentences (1-4) into the following clauses:

(a) ((NEG (DIRINTERFER X)) (TOXIC X))
(b) ((NEG (INDIRINTERFER X)) (TOXIC X))
(c) ((NEG (POISON X)) (DIRINTERFER X))
(d) ((NEG (REDUCEOXYGEN X)) (INDIRINTERFER X))
(e) ((NEG (STIMULGROWTH X)) (REDUCEOXYGEN X))
(f) ((NEG (OXIDISING X)) (REDUCEOXYGEN X))

From (10) and (a-f) can be deduced:

(g) ((NEG (DIRINTERFER X)) (ANS X))     (10), (a)
(h) ((NEG (POISON X)) (ANS X))          (g), (c)

At this stage, the TP does not find a positive literal with the predicate POISON. Instead of continuing the deduction, control is passed to TI in order to retrieve from the data base the extensions of all entries about toxic materials, yielding:

(LISTE ARSENIC CYANIDE)

Control is passed back again to the TP which makes the following further deductions:

(i) ((NEG (INDIRINTERFER X)) (ANS X))   (10), (b)
(j) ((NEG (REDUCEOXYGEN X)) (ANS X))    (d), (i)
(k) ((NEG (STIMULGROWTH X)) (ANS X))    (e), (j)

At this point, TI retrieves from the data base the answer:

NITRATE

A further deduction step is:

(l) ((NEG (OXIDISING X)) (ANS X))       (f), (j)

In this case no entry is found in the data base so that the final answer to C is:

C = (LISTE ARSENIC CYANIDE NITRATE)

Evaluation by TI of the expression:

(ET (VEL A B) C)

yields the final answer to question (10):

(LISTE CYANIDE NITRATE)
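The worked example can be replayed with a small Python sketch. It is not the PLIDIS implementation: the dictionaries RULES, FACTS and SAMPLES and the function extension are hypothetical names, and the recursive lookup merely stands in for the alternation between the theorem-prover (clauses a-f) and the retrieval component TI (entries 5-8).

```python
# Clauses (a)-(f), read as "head holds of X if some body predicate holds of X".
RULES = {
    "TOXIC": ["DIRINTERFER", "INDIRINTERFER"],       # (a), (b)
    "DIRINTERFER": ["POISON"],                       # (c)
    "INDIRINTERFER": ["REDUCEOXYGEN"],               # (d)
    "REDUCEOXYGEN": ["STIMULGROWTH", "OXIDISING"],   # (e), (f)
}

# Data base entries (7) and (8); OXIDISING has no entry.
FACTS = {"POISON": {"ARSENIC", "CYANIDE"}, "STIMULGROWTH": {"NITRATE"}}

# Data base entries (5) and (6).
SAMPLES = {
    "24.5.75": {"OXYGEN", "SULPHATE", "CYANIDE"},
    "7.9.75": {"NITRATE", "CYANIDE", "LEAD"},
}


def extension(pred):
    """Collect every constant for which `pred` can be deduced."""
    result = set(FACTS.get(pred, ()))     # retrieval step (role of TI)
    for body in RULES.get(pred, ()):      # deduction step (role of the TP)
        result |= extension(body)
    return result


a = SAMPLES["24.5.75"]
b = SAMPLES["7.9.75"]
c = extension("TOXIC")
answer = (a | b) & c                      # (ET (VEL A B) C)
print(sorted(c))                          # ['ARSENIC', 'CYANIDE', 'NITRATE']
print(sorted(answer))                     # ['CYANIDE', 'NITRATE']
```

The sketch relies on the rule set being free of cycles, which holds for clauses (a)-(f); a general prover would need the connection-graph machinery described earlier.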

6. Implementation of PLIDIS

PLIDIS is written in SIEMENS-INTERLISP, which is an implementation of Uppsala-INTERLISP (URMI 1975) on a SIEMENS-4004/151 running under the BS 2000 operating system. Uppsala-INTERLISP is itself an implementation of INTERLISP (TEITELMAN 1974) for an IBM 360/370 configuration. No specific SIEMENS-INTERLISP features were used so that the system will almost certainly run in other INTERLISP implementations.

REFERENCES

IdS = Institut für deutsche Sprache, Mannheim

Chang, C.L. & Lee, R. (1970): Symbolic Logic and Mechanical Theorem Proving. - Academic Press, New York.

Dilger, W. (1976a): Ein Frage-Antwort-System auf der Basis einer prädikatenlogischen Sprache. - Proceedings of the workshop in 'Dialoge in natürlicher Sprache und Darstellung von Wissen', Freudenstadt, 1976, p. 31ff.

--- (1976b): Verbindungsgraph und Auswahlfunktion. - unpubl. working paper, IdS, Mannheim.

Ginsburg, S. & Partee, B. (1969): A Mathematical Model of Transformational Grammars. - In: Information and Control 15 (1969), pp. 297-334.

Hayes, P.J. (1971): A Logic of Actions. - In: B. Meltzer & D. Michie (eds.): Machine Intelligence 6. Edinburgh.

--- (1974): Some Problems and Non-problems in Representation Theory. - Proceedings of the 1974 AISB Summer Conference, pp. 63ff.

Kolb, D. & Lutz, H.D. (1975): Verarbeitung von Netzwerken. - ISLIB-Info I-4, IdS, Mannheim.

--- & Wulz, H. (1975): Allgemeine Beschreibung und Kurzanleitung für die Benutzung von ISLIB B~rse. - ISLIB-Info I-1, IdS, Mannheim.

Kowalski, R. (1975): A Proof Procedure Using Connection Graphs. - In: Journal of the ACM, 22(4).

Teitelman, W. (1974): INTERLISP Reference Manual. - XEROX Palo Alto Research Center, Palo Alto.

Thomason, R. (1972): A Semantic Theory of Sortal Incorrectness. - In: Journal of Philosophical Logic 1, pp. 209-258.

Urmi, J. (1975): INTERLISP /360 and /370 User Reference Manual. - Uppsala University Data Center, Uppsala.

Woods, W.A. (1973): An Experimental Parsing System for Transition Network Grammars. - In: Rustin, R. (ed.): Natural Language Processing. New York.

Wulz, H. (1976): Konzept einer Theorie einer Übersetzungsgrammatik. - unpubl. ms., IdS, Mannheim.

Zifonun, G. (1974): KS: eine formale Sprache zur kanonischen Darstellung natürlicher Inhalte in einem automatischen Frage-Antwort-System. - Arbeitspapier LDV-MA-73-3, IdS, Mannheim.

--- (1976): Die Konstruktsprache KS. Entwurf eines Darstellungsmittels für natürlichsprachlich formulierte Information. - working paper, IdS, Mannheim.

METAMORPHOSIS GRAMMARS

A. COLMERAUER

GROUPE D'INTELLIGENCE ARTIFICIELLE
U.E.R. Scientifique de Luminy
Université d'Aix-Marseille II
70, Route Léon Lachamp
13288 MARSEILLE (FRANCE)

This work was completed with the help of a grant from SESORI (Research Convention 730471).

Let us also indicate that the Artificial Intelligence Group is an Associated Research Group of the CNRS.

Abstract :

We present some very general grammars in which each re-writing rule is of the type: "replace such and such sequence of trees by such and such another sequence of trees". Within the framework of programming in first-order logic, we propose axioms for these grammars which produce efficient parsing and synthesis algorithms. We illustrate this work by the programming-language PROLOG and by two important examples: writing of a compiler and writing of an intelligent system conversing in French.

Key-words :

Grammars, syntactic analysis, 1st order logic, predicate calculus, automatic demonstration, compilation, natural language.

INTRODUCTION

In 1970 I was trying to perfect a particular kind of non-determinist programming-language: q-systems (4). This work concerned a formal system allowing us to write complex grammars, to which was associated an interpreter in order to analyse or synthesise structures conforming to these grammars. The basis of the formal system was composed of re-writing rules.

These rules were very general: on the one hand they were not necessarily of the "context-free" type, i.e. one could re-write any sub-sequence of any length in any sequence; on the other hand, instead of working on sequences of simple symbols, one could work on sequences of complex symbols (more precisely, trees). A system of formal parameters allowed us to transmit into each symbol any information required.

The formal aspect of this work was very satisfactory: here was an example of a powerful language, based on few but very systematic principles. It allowed us to complete all the stages of our process of English/French translation: morphology and analysis of English sentences, stages of transference from the English deep structure to the French deep structure, synthesis and morphology of the French sentences.

Having become more interested subsequently in the semantics of language and in mechanisms of deduction, I abandoned q-systems and turned to techniques of automatic demonstration, basing my work on J.A. Robinson's principle of resolution (cf. 10 and 8).

I then collaborated in the elaboration of a programming-language PROLOG (cf. 11 and 1). Originally conceived to resolve deductive problems in a system conversing in French (6), this language immediately found a number of applications: let us quote among others formal integration (3), robotics (12) and speech-recognition (2). However, although this language was superior in many fields to the q-systems, the latter were simpler and clearer as far as the treatment of syntax was concerned. It was to remedy this situation that we conceived metamorphosis grammars: these involve an axiomatisation into 1st-order logic of the associativity of concatenation in order to obtain in PROLOG the facilities of the q-systems, thus obtaining a very powerful instrument for all syntactic and semantic treatment of languages.

This article is divided into two parts : a theoretical part in chapters 1 and 2, and a practical part in the last 3 chapters.

The first chapter introduces our terminology and proposes some ideas which may be considered a better basis for PROLOG than "SL-resolution" (8). We take up here ideas suggested in (9).

The 2nd chapter is devoted to metamorphosis grammars.

The third chapter gives a brief outline of PROLOG and of the way in which metamorphosis grammars are treated in that language. For more details we refer the reader to the PROLOG-Manual (11).

Chapter 4 illustrates by an example the way in which we can write a compiler by means of metamorphosis grammars.

In chapter 5 metamorphosis grammars are used to treat the problem which interests us most of all : conversing in French with a machine capable of reasoning. The example proposed is described very briefly, but is based on an extensive study of the role of articles in French. This study follows the general line of R. Pasero's work on the representation of French in logic.

CHAPTER 1
= = = = = = = = =

A SUBSET OF 1ST-ORDER LOGIC AS A PROGRAMMING-LANGUAGE

1.1 BASIC TERMINOLOGY

In all that follows we suppose that to each symbol s is associated an integer i ≥ 0 called its order. We write

order[s] = i

Let F be a set of symbols called functional symbols and let there be a finite set of variables. Each formula constructed as follows is called a term on F:

(1) if v_i is a variable then v_i is a term
(2) if f ∈ F and order[f] = 0 then f is a term
(3) if f ∈ F and order[f] = n and t1,t2,...,tn are terms then f(t1,t2,...,tn) is a term.

We write T[F], or simply T, the set of terms, and H[F], or simply H, the set of terms containing no variables. H is often called a Herbrand universe.

The elements of the Herbrand universe are none other than the "good" trees of the computer scientist, constructed on F but respecting the order of each symbol.

A formula or set of formulae p has as its value any p' obtained by substituting for each variable of p a tree, i.e. an element of the Herbrand universe.

Let R be another set of symbols called relational symbols; we call atomic each formula constructed as follows:

(1) if r ∈ R and order[r] = 0 then r is atomic
(2) if r ∈ R and order[r] = n and t1,t2,...,tn are terms then r(t1,t2,...,tn) is atomic.

If p is atomic, then +p and -p are literals.

A clause is a set of literals.

A (Herbrand) interpretation I is a set of atomic formulae without variables. To each relational symbol r of order n, it associates the n-ary relation ρ between the elements of the Herbrand universe:

∀t1,t2,...,tn ∈ H :  ρ[t1,t2,...,tn]  iff  r(t1,...,tn) ∈ I

In the case where n = 0, ρ is reduced to the boolean value

ρ  iff  r ∈ I

An interpretation I is smaller than an interpretation J iff I ⊆ J.

We consider that

(1) a set of clauses is a conjunction (∧) of clauses
(2) the variables of a clause are universally quantified at its head
(3) a clause is a disjunction (∨) of literals
(4) the sign + marks affirmation and the sign - negation.

We therefore define the notion of satisfaction as follows. An interpretation I satisfies

(1) a set of clauses iff it satisfies each clause of the set. The empty set of clauses is considered as always satisfied.
(2) a clause iff it satisfies each value of the clause.
(3) a clause without variables iff it satisfies at least one literal of the clause. The empty clause is never satisfied.
(4) a literal without variables +p iff p ∈ I; a literal without variables -p iff p ∉ I.

Between two sets of clauses A and B we define the relation ⊨ by

A ⊨ B  iff  each interpretation which satisfies A satisfies B.
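For clauses without variables, the four satisfaction rules translate almost literally into code. The following Python sketch is only an illustration; the encoding of a literal as a (sign, atom) pair and the function names are assumptions made here, not notation from the paper.

```python
def satisfies_literal(interp, literal):
    """Rule (4): +p is satisfied iff p is in I, -p iff p is not in I."""
    sign, atom = literal
    return (atom in interp) if sign == "+" else (atom not in interp)


def satisfies_clause(interp, clause):
    """Rule (3): a ground clause needs at least one satisfied literal.
    any() over an empty clause is False, so the empty clause is never satisfied."""
    return any(satisfies_literal(interp, lit) for lit in clause)


def satisfies(interp, clauses):
    """Rule (1): every clause must be satisfied.
    all() over an empty collection is True, so the empty set is always satisfied."""
    return all(satisfies_clause(interp, clause) for clause in clauses)


I = {"p"}
print(satisfies(I, [{("+", "p")}]))              # True
print(satisfies(I, [{("-", "p"), ("+", "q")}]))  # False
print(satisfies(I, [set()]))                     # False: empty clause
print(satisfies(I, []))                          # True: empty set of clauses
```

Rule (2), which quantifies over all values of a clause with variables, is not covered: it would require enumerating the Herbrand universe, which is infinite in general.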

1.2. REGULAR SETS OF CLAUSES AND THE SMALLEST INTERPRETATION SATISFYING THEM

We are accustomed to considering a "programme" as the definition of a certain function f. The "machine" which executes it permits us to "compute" this function by giving the result f[x] for every x which is given as input.

Let E be a set of clauses in which appears a certain n-ary relational symbol r. Let us suppose that there exists a smallest interpretation I which satisfies E. This interpretation I therefore associates to the symbol r an n-ary relation ρ between the trees.

We can therefore consider E as a "programme" defining the relation ρ, provided we have at our disposal deductive rules which play the part of a "machine" allowing us to "compute" this n-ary relation by enumerating all the n-uplets of trees which satisfy it and which may interest us.

From this point of view, our programmes will be sets of clauses of a peculiar type, called "regular".

Definition : A clause is said to be regular iff it contains one and only one positive literal. A set of clauses is said to be regular iff it contains only regular clauses.

A regular set of clauses always admits an interpretation I which satisfies it. We need only take as I the set of all atomic formulae without variables.

One can also show that if E is regular and if I and J satisfy E, then I ∩ J also satisfies E. (This is not always true for a non-regular set: {{+a,+b}} is a counter-example).

If we now consider the intersection of all the interpretations which satisfy a regular set, we can deduce from it the following property :

Property 1. If E is a regular set of clauses, then there exists a smallest interpretation, written Imin[E], which satisfies it.
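Property 1 can be illustrated for a finite regular set of clauses without variables. The Python sketch below does not follow the intersection argument used in the text; it swaps in the usual bottom-up iteration, which reaches the same smallest interpretation in this finite ground case. The encoding of a regular clause as a (head, body) pair and the function name are assumptions.

```python
def smallest_interpretation(clauses):
    """Compute Imin for ground regular clauses.

    Each clause {+p0, -p1, ..., -pn} is given as (p0, [p1, ..., pn]).
    Atoms are added only when forced, until nothing changes.
    """
    interp = set()
    changed = True
    while changed:
        changed = False
        for head, body in clauses:
            if head not in interp and all(b in interp for b in body):
                interp.add(head)   # every -pi fails, so +p0 must hold
                changed = True
    return interp


E = [("a", []), ("b", ["a"]), ("c", ["d"])]
print(sorted(smallest_interpretation(E)))   # ['a', 'b']
```

Here 'c' is left out because 'd' is never forced; any interpretation satisfying E must contain 'a' and 'b', so the result is indeed the smallest one.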

Example

F = {a,b,nil,.}    order[a] = order[b] = order[nil] = 0,  order[.] = 2
R = {conc}         order[conc] = 3
variables : e,x,y,z,...

E =  +conc(nil,y,y)
     +conc(.(e,x),y,.(e,z)) -conc(x,y,z)

(each line represents one clause, each clause is written by writing its literals one after the other).

The alert reader will verify that the smallest interpretation satisfying E in this example associates to conc the ternary relation conc':

conc'[u,v,w]  iff  u is of the form .(a1,.(a2,---.(an,nil)---)),
                   v is of any form,
                   w is obtained by substituting v for the nil at the bottom of u.

The notion of smallest interpretation satisfying a set of clauses takes on all its interest only when one notes the second property, which follows.

Property 2. Let E be a set of clauses having a smallest interpretation Imin[E] satisfying it. For each atomic formula p without variables, we have

E ⊨ {{+p}}  iff  p ∈ Imin[E]

The rules of deduction needed to calculate relations will therefore be those used in automatic demonstration.

1.3 RULES OF DEDUCTION

The rules of deduction presented here are a simplification of Robinson's principle of resolution (10), reasoning on regular sets of clauses. They are formulated taking into account the fact that the notion of a sequence of elements lends itself more easily to programming than that of a set of elements.

Let L be the set of all the literals. We will call ordered clause any sequence of literals a1a2...an with n ≥ 0. When n = 0, we write this sequence Λ. The set of ordered clauses (including Λ) is written L*.

For each x,y ∈ L*, we agree that if x = a1a2...an and if y = b1b2...bm then

xy = a1a2...anb1b2...bm
xΛ = Λx = x

Let E be a regular set of clauses and Eord a set of ordered clauses obtained by substituting for each clause

{+p0,-p1,-p2,...,-pn}

of E an ordered clause

+p0 -p1 -p2 ... -pn

where the positive literal is placed at the head.

Definition : for each x,y ∈ L* we note

x ⊢_Eord y  iff

(a) ∃u,v ∈ L*, ∃-p ∈ L such that x = u -p v
(b) ∃s ∈ Eord and +q t is a variant of s obtained by renaming the variables in such a way as to have no common variable with x
(c) y = [u t v]σ where σ is a most general unifier (in Robinson's sense) of the set {p,q}

x ⊢^n_Eord y  iff  ∃u0,u1,...,un ∈ L* such that x = u0 ⊢_Eord u1 ⊢_Eord u2 ... ⊢_Eord un = y

Since this is not the main purpose of this paper, we ask the reader to admit that

Theorem : for any atomic formula p

E ⊨ {{+r}} and r is a value of p
iff
there exists n > 0 and there exists an atomic formula q such that
-p +p ⊢^n_Eord +q and r is a value of q.

By using property 2 of the preceding paragraph, we obtain:

Corollary : for any atomic formula p

r ∈ Imin[E] and r is a value of p
iff
there exist n > 0 and there exists an atomic formula q such that
-p +p ⊢^n_Eord +q and r is a value of q.

Let us consider again the preceding example and try to calculate x such that

conc'[.(a,nil),.(b,nil),x]

Since

-conc(.(a,nil),.(b,nil),u) +conc(.(a,nil),.(b,nil),u)
  ⊢_Eord  -conc(nil,.(b,nil),z) +conc(.(a,nil),.(b,nil),.(a,z))
  ⊢_Eord  +conc(.(a,nil),.(b,nil),.(a,.(b,nil)))

we deduce according to the corollary

x = .(a,.(b,nil))

If we now try to calculate all the couples x,y such that

conc'[x,y,.(a,nil)]

since

-conc(u,v,.(a,nil)) +conc(u,v,.(a,nil))
  ⊢_Eord  +conc(nil,.(a,nil),.(a,nil))

and since

-conc(u,v,.(a,nil)) +conc(u,v,.(a,nil))
  ⊢_Eord  -conc(x,y,nil) +conc(.(a,x),y,.(a,nil))
  ⊢_Eord  +conc(.(a,nil),nil,.(a,nil))

and since no other deductions are possible, the two solutions

x = nil        y = .(a,nil)
x = .(a,nil)   y = nil

may be deduced.

Of course, in general, the set of ordered clauses that may be deduced is not necessarily finite and we therefore have only an algorithm of semi-decision. However, in order to restrict the field of research, one can make the preceding theorem more sophisticated by introducing the notion of selection function.
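The two computations on conc can be mirrored in Python. This is a sketch under assumptions: it hard-codes the two clauses of the example as ordinary recursive functions instead of performing resolution with unification, and the tuple encoding of the "." symbol as well as the names cons, conc and splits are choices made here for illustration.

```python
NIL = "nil"


def cons(head, tail):
    """Build the tree .(head, tail)."""
    return (".", head, tail)


def conc(x, y):
    """Return the z with conc'[x, y, z], for x a well-formed list."""
    if x == NIL:                 # clause +conc(nil,y,y)
        return y
    _, e, rest = x               # clause +conc(.(e,x),y,.(e,z)) -conc(x,y,z)
    return cons(e, conc(rest, y))


def splits(w):
    """Enumerate every couple (x, y) with conc'[x, y, w]."""
    yield NIL, w                 # first clause: x = nil, y = w
    if w != NIL:                 # second clause: peel one element off w
        _, e, rest = w
        for x, y in splits(rest):
            yield cons(e, x), y


print(conc(cons("a", NIL), cons("b", NIL)))
# ('.', 'a', ('.', 'b', 'nil'))
print(list(splits(cons("a", NIL))))
# [('nil', ('.', 'a', 'nil')), (('.', 'a', 'nil'), 'nil')]
```

The generator yields exactly the two solutions found in the text. A single relational definition that serves both directions, as the clauses do, would need a unifier; here each direction gets its own function.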

Definition : φ is a selection function if to each ordered clause x containing at least one negative literal, it associates a triplet

φ[x] = [u,-p,v]  with  x = u -p v,  -p ∈ L,  u,v ∈ L*

Stronger theorem : Let φ be any selection function. The preceding theorem is always true if in the definition of ⊢_Eord we add to the point (a) the constraint

φ[x] = [u,-p,v]

CHAPTER 2
= = = = = = = = =

METAMORPHOSIS GRAMMARS

2.1 STRINGS, STRING-SCHEMAS AND CONCATENATION

We now suppose that the set F of functional symbols contains the binary symbol "." and the symbol of order 0 "nil".

We use an infix notation with bracketing from right to left to write any term constructed with the functional symbol ".", i.e. we write

a1.a2.---.an-1.an

instead of

.(a1,.(a2,---.(an-1,an)---))

Let V be a subset of H called vocabulary. A string-schema of length n on the vocabulary V is a term of the form

a1.a2.---.an.nil  with n ≥ 0 and ai ∈ V

The string-schema of length 0 reduces to "nil". We write V* the set of all string-schemas. For strings of length 1 we introduce the abridged notation

a̲  for  a.nil

If the vocabulary contains no variables, we speak of strings instead of string-schemas.

In the set of string-schemas V*, concatenation is a law of internal composition written as a product and defined by

if x = nil then xy = y
if x = a1.a2.---.an.nil then xy = a1.a2.---.an.y

Of course, this is an associative law of which the neutral element is "nil".

Moreover, xy is also defined for a y ∈ T which is not a string-schema. If we use the abridged notation of string-schemas of length 1, we can now write

a̲1 a̲2 --- a̲n  instead of  a1.a2.---.an.nil

2.2 RE-WRITING RELATION → AND RELATIONS →^i AND →*

Let → be a binary relation between the elements of H and let V be a vocabulary without variables, i.e. V ⊂ H.

The relation → is said to be a re-writing relation on V* iff for each x,y ∈ H

x → y implies x,y ∈ V*

Starting with the re-writing relation →, we define the following relations between the elements of H:

x →^0 y    iff  x = y and x,y ∈ V*
x →^(i+1) y  iff  there exist u,v,r,s ∈ V* such that x = urv and r → s and usv →^i y
x →* y     iff  there exists i ≥ 0 such that x →^i y

Note that these new relations are also re-writing relations.

2.3 METAMORPHOSIS GRAMMAR

Definition : A metamorphosis grammar G is defined by a quintuplet (F,VT,VN,VS,→) where

(1) F is a set of functional symbols containing "." and "nil"
(2) VT is a vocabulary said to be terminal with VT ⊂ H[F]
(3) VN is a vocabulary said to be non-terminal with VN ⊂ H[F]. We suppose that VN ∩ VT = ∅ and write V = VT ∪ VN
(4) VS ⊂ VN. The elements of VS are termed starting non-terminals.
(5) → is a re-writing relation on V* with the restriction that x → y implies x ≠ nil

The language generated by the grammar G is the set of strings on VT

L(G) = {t ∈ VT* | there exists s ∈ VS with s̲ →* t}

If s ∈ VS, t ∈ VT* and s̲ →* t then s is called deep structure of t.

Example 1 : Here is an example of a metamorphosis grammar

(1) F = {nil,zéro,a,b,suite,bs,suc,.} with
    order[nil] = order[zéro] = order[a] = order[b] = 0
    order[suite] = order[bs] = order[suc] = 1
    order[.] = 2
(2) VT = {a,b}
(3) VN = VS ∪ {bs(x) | x ∈ H[F]}
(4) VS = {suite(x) | x ∈ H[F]}
(5) The couples of strings satisfying the re-writing relation → are enumerated by:

    suite(x) → a suite(suc(x))    ∀x ∈ H[F]
    suite(x) → bs(x)              ∀x ∈ H[F]
    bs(suc(x)) → b bs(x)          ∀x ∈ H[F]
    bs(zéro) → nil

We obtain

suite(suc(suc(zéro))) →* a b b b

since

suite(suc(suc(zéro)))
  →^1 a suite(suc(suc(suc(zéro))))
  →^1 a bs(suc(suc(suc(zéro))))
  →^1 a b bs(suc(suc(zéro)))
  →^1 a b b bs(suc(zéro))
  →^1 a b b b bs(zéro)
  →^1 a b b b

In a general way, we notice that the language generated by this grammar is the set of strings of the form a^i b^j with j - i ≥ 0 and that the deep structure associated to each string a^i b^j is the tree

suc(suc(---suc(zéro)---))

where the number of "suc" is equal to j - i.
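The behaviour of this grammar can be checked with a short Python sketch. It is not a general metamorphosis-grammar interpreter: the four rules are applied in a fixed order, the tuple encoding of suc and the names suc_tree and generate are assumptions, and "zero" is written without the accent purely for convenience.

```python
def suc_tree(k):
    """Build suc(suc(...suc(zero)...)) with k occurrences of suc."""
    tree = "zero"
    for _ in range(k):
        tree = ("suc", tree)
    return tree


def generate(k, max_a):
    """Strings derivable from suite(suc^k(zero)) using at most max_a a's."""
    results = []
    counter = suc_tree(k)        # argument x of the current suite(x)
    prefix = []
    for _ in range(max_a + 1):
        # suite(x) -> bs(x), then unfold bs completely
        tail = []
        x = counter
        while x != "zero":       # bs(suc(x)) -> b bs(x)
            tail.append("b")
            x = x[1]
        # bs(zero) -> nil ends the derivation
        results.append("".join(prefix + tail))
        # suite(x) -> a suite(suc(x)) prepares the next, longer string
        prefix.append("a")
        counter = ("suc", counter)
    return results


print(generate(2, 3))   # ['bb', 'abbb', 'aabbbb', 'aaabbbbb']
```

Every generated string has exactly two more b's than a's, which matches the remark that the number of "suc" in the deep structure equals j - i.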

Example

2 :

Here is a n o t h e r

example

of a m e t a m o r p h o s i s

grammar

(I) F = { n i l , a , b order

,<,>,+,end,formula,value,.} [nil] = order order [a] = o r d e r [>] = order [value]

with [b] = order [<]= [end] = 0

[+] = order = I

order order

[formula] [,] = 2

= order

(2) V T = {a,b,<,>,+}


[3] [4] (SJ

V N = V S U (end) U { value[x] V S = {formula[x] The couples rated by : formula(a] formula(b] formula[x] value[y.z] value[nil] end < ~ end ~ > where gnates y and z designate of H[F] + ~ ~ ~ a b

I x 6 H [ F]}

I x 6 H [F]} the r e - w r i t i n g relation ~ are enume-

of s t r i n g s s a t i s f y i n g

value(x] end value(z)

~ <formula(y) ~ nil

arbitrary

elements u.v

of

H[F]

and

desi-

an element among

of the f o r m

We t h e r e f o r e

obthin

other results ~* a ~* < a + b + a > ~* < a + < < a + b > > >

formula(a]

formu!a[a.b.a.nilJ

formula[a.[[a.b.nilJ.nilJ.nil] which gives a good idea of whet this g r a m m a r

"does".

2.4. METAMORPHOSIS GRAMMAR IN NORMAL FORM

Definition : A metamorphosis grammar is said to be in normal form if it satisfies the restriction:

a̲x → y  implies  a ∈ VN and x ∈ VT*

This is the case in the two grammars given in the preceding examples. The restriction proposed is not very strong, since one can show that:

Property 1. For each metamorphosis grammar G = (F,VT,VN,VS,→) there exists a grammar in normal form G' = (F',VT',VN',VS',→') such that for each a ∈ VN, each t ∈ VT*

a̲ →* t  iff  a̲ →'* t

Here

is a way of c o n s t r u c t i n g [I] F' = F U {te,nt} = V T U {te[a] = V N U {nt[a]

G' with I I

from

G (re] = order [nt] = I

order

[2] V~ [3] V~

a E V N} a C V T}


(4)

V~

= Vs the relation 4, are all enumerated by

[5) The couples of strings satisfying

(a) (b) (c)

a 4' nt(a)

te(a) 4' a

for for

each each a'x' 4

a E VN a E VT y' with

ax 4 y

implies
if

a' = a

a E VN if if a E VT x = nil if x = ~IE2-~-~n with

a' = nt(a) x' = nil

x' = E~ a2 .... ~a' a~ = a i if

a i E VT if ai E V N

a!m = t e ( a . ) z
y' = nil if

y = nil if if y = ~l~2-~-b bi E VT and with 3x,y E V* wlth b.x4 y


--I

Y' = ~I =2b .... ~b' b~ : nt(b i) b' Here is a characteristic property = b. i z

otherwise grammars in normal form.

of metamorphosis

Property 2.

For each

t E V~

each

x,y E V* 3z E V*

each wlth

i ~ 0 x 4i z and tz = y

tx 4 i y This property 2,5 RELATION

lmplles

can be demonstrated

by induction on

i .

Let

G = (F,VT,VN,Vs,4)

be a m e t a m o r p h o s i s grammar i n normal f o r m .

Definition

: ~

is a binary relation between the elements

of the Herbrand

universe. :

The set of couples of trees which verify this relation

is constructed

in this way

Vu E V * ,

Vto,tl,---,t (1) (2) if if

n E V~, then

Vbl,b2,---,b uv 0 ~ t o V 0

n E VN,

VVo,Vl,---,v

n E H[F]

u 4 tO

u 4 tobbltl~2t2---bntn

and i f

we a l r e a d y

have

--bltlV 1 ~ Vo, _b2t2v 2 ~ v I . . . . . then uvn ~ t o V 0

~b tnVn ~ Vn-1

Let us agree that a binary relation $\Rightarrow_1$ is smaller than a binary relation $\Rightarrow_2$ iff

$\forall x, y$: $x \Rightarrow_1 y$ implies $x \Rightarrow_2 y$

From the way in which $\Rightarrow$ is constructed we deduce another definition of this relation.

Equivalent definition

$\Rightarrow$ is the smallest binary relation (between the elements of $H[F]$) which satisfies the conditions (1) and (2).

Let us notice that the following property is constantly verified as we construct the couples $(x, y)$ satisfying $\Rightarrow$.

Property

For each $x, y \in V^{*}$ and each $u \in H$

$x \Rightarrow y$ implies $x u \Rightarrow y u$

The theorem and the property which follow show that there exists a very simple link between the relation $\Rightarrow$ and the relation $\to^{*}$.

Theorem

For each $x, y \in H$

$x \Rightarrow y$ iff $\exists a \in V_N$, $\exists s, t \in V_T^{*}$, $\exists v \in H$ such that $x = a s v$, $a s \to^{*} t$, $t v = y$ and $v$ minimal

By "$v$ is minimal" we understand that there do not exist $c \in V_T$ and $w \in H$ such that $v = c w$.

The demonstration can be found in paragraph 2.7.

If, in the preceding theorem, we take $x = a$ with $a \in V_N$ we obtain

Corollary

For each $a \in V_N$ and each $t$ with $t \in H$ and $t \in V_T^{*}$

$a \Rightarrow t$ iff $a \to^{*} t$

2.6 CALCULATING RELATION $\Rightarrow$

Let $G = (F, V_T, V_N, V_S, \to)$ be a metamorphosis grammar in normal form. We make the following hypotheses:

Hypotheses

(1) there exist sets of terms $\hat{V}_T$ and $\hat{V}_N$ such that

$x \in V_T$ iff $x$ is a value of an element of $\hat{V}_T$
$x \in V_N$ iff $x$ is a value of an element of $\hat{V}_N$

we write $\hat{V} = \hat{V}_T \cup \hat{V}_N$

(2) there exists a regular set of clauses $E$ such that

$\forall x, y \in H[F]$: $x \to y$ iff $r(x, y) \in I_{min}[E]$

$F$ is the set of functional symbols of $E$, $R$ is the set of relational symbols of $E$, and $r$ is a relational symbol of order 2 contained in $R$

(3) no clause in $E$ contains a negative literal of the form $-r(x, y)$, where $x$ and $y$ are terms built on $F$

(4) if a clause of $E$ contains a positive literal of the form $+r(x, y)$ then $x, y \in \hat{V}^{*}$

We also introduce the definitions

Definition 1.

Let $d$ be a new relational symbol of order 2, not contained in $R$. We define the transformation $t$ by

(1) for each $u \in \hat{V}^{*}$ and each $t_0 \in \hat{V}_T^{*}$

$t[+r(u, t_0)] = \{+d(u v_0, t_0 v_0)\}$

where $v_0$ is a new variable

(2) for each $u \in \hat{V}^{*}$, each $t_i \in \hat{V}_T^{*}$ and each $b_i \in \hat{V}_N$

$t[+r(u, t_0 b_1 t_1 b_2 t_2 \cdots b_n t_n)] = \{+d(u v_n, t_0 v_0), -d(b_1 t_1 v_1, v_0), -d(b_2 t_2 v_2, v_1), \ldots, -d(b_n t_n v_n, v_{n-1})\}$

where the $v_i$ are new variables.

Definition 2.

We designate by $Tr[E]$ the set of clauses obtained by substituting, for each clause in $E$ of the form $\{+r(x, y)\} \cup g$ ($r$ does not appear in $g$), the clause $t[+r(x, y)] \cup g$.

$F$ is the set of functional symbols of $Tr[E]$, and $(R \cup \{d\}) - \{r\}$ that of the relational symbols of $Tr[E]$.
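The transformation of Definition 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: symbols are plain strings, the function name `transform` and the predicate `is_nonterminal` are hypothetical, and the clause is returned as a list of printed literals rather than as a set.

```python
def transform(lhs, rhs, is_nonterminal):
    """Sketch of t[+r(u, t0 b1 t1 ... bn tn)] from Definition 1.

    lhs: list of symbols u; rhs: list of symbols of the right-hand side.
    Returns the literals +d(u.vn, t0.v0), -d(b1.t1.v1, v0), ...
    """
    t0, groups = [], []            # groups holds the pairs (b_i, t_i)
    for sym in rhs:
        if is_nonterminal(sym):
            groups.append((sym, []))
        elif groups:
            groups[-1][1].append(sym)
        else:
            t0.append(sym)
    n = len(groups)

    def cat(symbols, var):
        # a string of symbols followed by a continuation variable
        return ".".join(list(symbols) + [var])

    clause = [f"+d({cat(lhs, f'v{n}')},{cat(t0, 'v0')})"]
    for i, (b, t) in enumerate(groups, start=1):
        clause.append(f"-d({cat([b] + t, f'v{i}')},v{i - 1})")
    return clause


def is_nonterminal(sym):
    # assumption: the non-terminals of the running example
    return sym.split("(")[0] in {"formula", "value", "end"}


if __name__ == "__main__":
    print(transform(["value(x.y)"],
                    ["<", "formula(x)", "end", "value(y)"],
                    is_nonterminal))
```

Each non-terminal of the right-hand side receives its own pair of continuation variables, which is what chains the negative literals together.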

We then obtain the following result:

Theorem

For each $x, y \in H[F]$

$x \Rightarrow y$ iff $d(x, y) \in I_{min}[Tr[E]]$

The demonstration can be found in paragraph 2.8. By using the corollary of paragraph 2.5, we obtain the new corollary:

Corollary

For each $a \in V_N$ and each $t$ with $t \in H[F]$ and $t \in V_T^{*}$

$a.nil \to^{*} t$ iff $d(a.nil, t) \in I_{min}[Tr[E]]$

Let us consider again the 2nd example of a metamorphosis grammar given in paragraph 2.3. The re-writing relation $\to$ can be defined by the minimal interpretation satisfying the set of clauses E:

+r(formula(a).nil, a.nil)
+r(formula(b).nil, b.nil)
+r(formula(x).nil, value(x).nil) -egal(x, r.s)
+r(value(x.y).nil, <.formula(x).end.value(y).nil)
+r(value(nil).nil, nil)
+r(end.<.nil, +.nil)
+r(end.nil, >.nil)
+egal(x, x)

where r, s, x, y are variables. If we take $\hat{V}_N$ = {formula(x), value(x)} and $\hat{V}_T$ = {a, b}, the other hypotheses of the beginning of this paragraph are satisfied and therefore, according to the preceding theorem, the relation $\Rightarrow$ is defined by the minimal interpretation satisfying the set of clauses Tr[E]:

+d(formula(a).v0, a.v0)
+d(formula(b).v0, b.v0)
+d(formula(x).v1, v0) -d(value(x).v1, v0) -egal(x, r.s)
+d(value(x.y).v3, <.v0) -d(formula(x).v1, v0) -d(end.v2, v1) -d(value(y).v3, v2)
+d(value(nil).v0, v0)
+d(end.<.v0, +.v0)
+d(end.v0, >.v0)
+egal(x, x)

The last corollary and the deductive rules of paragraph 1.3 allow us, for example, to analyse the string

< a + b >

to obtain the deep structure formula(a.b.nil) by the sequence of deductions

-d(formula(x).nil, <.a.+.b.>.nil) +d(formula(x).nil, <.a.+.b.>.nil)

+d(formula(a.b.nil).nil, <.a.+.b.>.nil)

and inversely to produce the terminal string

< a + b >

from the deep structure formula(a.b.nil) by the sequence of deductions

-d(formula(a.b.nil).nil, x) +d(formula(a.b.nil).nil, x)

+d(formula(a.b.nil).nil,<.a.+.b.>.nil)
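The two directions (analysis and synthesis) of this example grammar can be imitated by hand in Python. The sketch below is not the deduction mechanism of the text: it is a direct recursive reading of the rules, with hypothetical function names `generate` and `parse`, and with a deep structure represented as "a", "b" or a non-empty Python list. It relies on the observation that the rule "end < gives +" makes "+" equivalent to the pair "> <".

```python
def generate(tree):
    """Terminal string (list of tokens) whose deep structure is formula(tree)."""
    if tree in ("a", "b"):
        return [tree]
    if not tree:
        raise ValueError("formula(x) requires a non-empty list x = r.s")
    return _value(tree)


def _value(items):
    if not items:                      # value(nil) produces the empty string
        return []
    rest = _value(items[1:])
    # 'end' followed by '<' is rewritten to '+', otherwise 'end' becomes '>'
    tail = ["+"] + rest[1:] if rest[:1] == ["<"] else [">"] + rest
    return ["<"] + generate(items[0]) + tail


def parse(tokens):
    """Deep structure of a token list such as ['<', 'a', '+', 'b', '>']."""
    expanded = []
    for tok in tokens:                 # undo 'end <' -> '+'
        expanded.extend([">", "<"] if tok == "+" else [tok])
    tree, pos = _formula(expanded, 0)
    if pos != len(expanded):
        raise ValueError("trailing tokens")
    return tree


def _formula(toks, i):
    if i < len(toks) and toks[i] in ("a", "b"):
        return toks[i], i + 1
    items, j = _values(toks, i)
    if not items:
        raise ValueError("formula expected")
    return items, j


def _values(toks, i):
    items = []
    while i < len(toks) and toks[i] == "<":
        item, i = _formula(toks, i + 1)
        if i >= len(toks) or toks[i] != ">":
            raise ValueError("'>' expected")
        items.append(item)
        i += 1
    return items, i


if __name__ == "__main__":
    print(parse("< a + b >".split()))
    print(" ".join(generate(["a", "b"])))
```

Unlike the clauses of Tr[E], which serve for both directions at once, this sketch needs one function per direction.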

Remark:

All atomic formulae constructed with the relational symbol d are always of the form:

d(f(□1).□2, □3)

where f is a precise functional symbol of order n. This is true in a general way and results from the restrictive hypotheses stated at the beginning of the paragraph. We can therefore substitute for each of these formulae the formula:

f'(□1, □2, □3)

where f' is a new relational symbol of order n+2 associated to the symbol f.

If we take up our example again, the set of clauses Tr[E] can be written:

+formula'(a, v0, a.v0)
+formula'(b, v0, b.v0)
+formula'(x, v1, v0) -value'(x, v1, v0) -egal(x, r.s)
+value'(x.y, v3, <.v0) -formula'(x, v1, v0) -end'(v2, v1) -value'(y, v3, v2)
+value'(nil, v0, v0)
+end'(<.v0, +.v0)
+end'(v0, >.v0)
+egal(x, x)

and to obtain, for example, the deep structure formula(a.b.nil) of the string < a + b > we need only make the sequence of deductions:

-formula'(x, nil, <.a.+.b.>.nil) +formula'(x, nil, <.a.+.b.>.nil)

+formula'(a.b.nil, nil, <.a.+.b.>.nil)

2.7 DEMONSTRATION OF THE THEOREM CONCERNING THE LINK BETWEEN $\Rightarrow$ AND $\to^{*}$

Here is the demonstration of the theorem of paragraph 2.5, of which we repeat the terms.

Theorem

For all $x, y \in H$

(1) $x \Rightarrow y$

iff

(2) $\exists a \in V_N$, $\exists s, t \in V_T^{*}$, $\exists v \in H$ such that $x = a s v$, $a s \to^{*} t$, $y = t v$ and $v$ minimal

Demonstration, 1st part.

Let us demonstrate that (1) implies (2). We need only verify that the implication is true constantly as we construct the couples of trees satisfying the relation $\Rightarrow$. If $x \Rightarrow y$ two cases present themselves:

(a) the couple $(x, y)$ is constructed by using rule (1) of the definition of $\Rightarrow$. Because of this, and because we are reasoning on a grammar in normal form, there exist $a \in V_N$, $u, t_0 \in V_T^{*}$, $v_0 \in H$ such that

$x = a u v_0$, $y = t_0 v_0$, $a u \to t_0$

Posing $v_0 = w_0 v$ with $w_0 \in V_T^{*}$ and $v$ minimal, $s = u w_0$, $t = t_0 w_0$, we obtain

$a s = a u w_0 \to^{*} t_0 w_0 = t$, $x = a u w_0 v = a s v$, $y = t_0 w_0 v = t v$

(b) the couple $(x, y)$ is constructed by using rule (2) of the definition of $\Rightarrow$. Because of this, and because we are reasoning on a grammar in normal form, there exist $a, b_1, b_2, \ldots, b_n \in V_N$, $u, t_0, t_1, \ldots, t_n \in V_T^{*}$, $v_0, v_1, \ldots, v_n \in H$ such that

$x = a u v_n$, $y = t_0 v_0$, $a u \to t_0 b_1 t_1 b_2 t_2 \cdots b_n t_n$
$b_1 t_1 v_1 \Rightarrow v_0$, $b_2 t_2 v_2 \Rightarrow v_1$, ..., $b_n t_n v_n \Rightarrow v_{n-1}$

The couples $(b_i t_i v_i, v_{i-1})$ having already been constructed, the implication that we wish to demonstrate applies to them. Posing $v_i = w_i m_i$ with $w_i \in V_T^{*}$ and $m_i$ minimal,

$b_i t_i v_i \Rightarrow v_{i-1}$ implies $b_i t_i w_i \to^{*} w_{i-1}$ and $m_i = m_{i-1} = v$

Since

$b_1 t_1 w_1 \to^{*} w_0$, $b_2 t_2 w_2 \to^{*} w_1$, ..., $b_n t_n w_n \to^{*} w_{n-1}$

we have

$b_1 t_1 b_2 t_2 \cdots b_n t_n w_n \to^{*} w_0$ and therefore $a u w_n \to^{*} t_0 w_0$

Posing $s = u w_n$, $t = t_0 w_0$ we finally obtain

$a s \to^{*} t$, $x = a u v_n = a u w_n v = a s v$, $y = t_0 v_0 = t_0 w_0 v = t v$

Demonstration, 2nd part.

Let us demonstrate that (2) implies (1). We therefore have to demonstrate that

$\forall a \in V_N$, $\forall s, t \in V_T^{*}$, $\forall v \in H$: $a s \to^{*} t$ and $v$ minimal implies $a s v \Rightarrow t v$

Recalling the property of $\Rightarrow$ cited in paragraph 2.5, we deduce from it that we need only demonstrate that

$\forall a \in V_N$, $\forall s, t \in V_T^{*}$: $a s \to^{*} t$ implies $a s \Rightarrow t$

This last implication is the particular case, when $u = nil$, of the following proposition:

$\forall a \in V_N$, $\forall s, t \in V_T^{*}$, $\forall u \in V^{*}$ and $\forall i \ge 0$: $a s u \to^{i} t$ implies $\exists v \in V_T^{*}$ and $\exists j \ge 0$ such that $u \to^{j} v$, $a s v \Rightarrow t$ and $j \le i$

Let us demonstrate this last proposition by induction on $i$. The proposition is true for $i = 0$, since in this case the left side of the implication is false: $a \in V_N$, $t \in V_T^{*}$ and $V_N \cap V_T = \emptyset$ imply $a s u \ne t$.

Let us suppose the proposition true for $0 \le k \le i$ and let us demonstrate that it is true for $i+1$. If $a s u \to^{i+1} t$ there exists $r \in V^{*}$ such that

$a s u \to^{1} r \to^{i} t$

The passage from $a s u$ to $r$ can be made in three ways:

(a) $a s u \to^{1} a s u' = r$ with $u \to^{1} u'$, $u' \in V^{*}$

Since $a s u' \to^{i} t$ there exist $v \in V_T^{*}$ and $j \ge 0$, $j \le i$, such that $u' \to^{j} v$, $a s v \Rightarrow t$, and therefore

$u \to^{j+1} v$, $a s v \Rightarrow t$, $j+1 \le i+1$

(b) $a s u = a s' u' \to^{1} t_0 u' = r$ with $a s' \to t_0$, $s', t_0 \in V_T^{*}$, $u' \in V^{*}$

therefore $t_0 u' \to^{i} t$. According to property 2 of paragraph 2.4, there exists $v_0 \in V_T^{*}$ such that

$u' \to^{i} v_0$, $t_0 v_0 = t$

therefore $s u = s' u' \to^{i} s' v_0$. According to the same property of paragraph 2.4, there exists $v \in V_T^{*}$ such that

$u \to^{i} v$, $s v = s' v_0$

According to point (1) of the definition of $\Rightarrow$, $a s' \to t_0$ implies $a s' v_0 \Rightarrow t_0 v_0$. We therefore finally obtain

$u \to^{i} v$, $a s v = a s' v_0 \Rightarrow t_0 v_0 = t$, $i \le i+1$

(c) $a s u = a s' u' \to^{1} t_0 b_1 t_1 b_2 t_2 \cdots b_n t_n u' = r$ with $a s' \to t_0 b_1 t_1 b_2 t_2 \cdots b_n t_n$, $b_i \in V_N$, $s', t_i \in V_T^{*}$, $u' \in V^{*}$

therefore $t_0 b_1 t_1 b_2 t_2 \cdots b_n t_n u' \to^{i} t$. According to property 2 of paragraph 2.4, there exists $v_0 \in V_T^{*}$ such that

$b_1 t_1 b_2 t_2 \cdots b_n t_n u' \to^{i} v_0$, $t_0 v_0 = t$

The proposition that we wish to demonstrate being supposed true for $k \le i$, there exist $v_1 \in V_T^{*}$ and $j_1 \ge 0$ such that

$b_2 t_2 \cdots b_n t_n u' \to^{j_1} v_1$, $b_1 t_1 v_1 \Rightarrow v_0$, $j_1 \le i$

The proposition that we wish to demonstrate being supposed true for $k \le i$, there exist $v_2 \in V_T^{*}$ and $j_2 \ge 0$ such that

$b_3 t_3 \cdots b_n t_n u' \to^{j_2} v_2$, $b_2 t_2 v_2 \Rightarrow v_1$, $j_2 \le i$

...

$u' \to^{j_n} v_n$, $b_n t_n v_n \Rightarrow v_{n-1}$, $j_n \le i$

Since

$a s' \to t_0 b_1 t_1 b_2 t_2 \cdots b_n t_n$
$b_1 t_1 v_1 \Rightarrow v_0$, $b_2 t_2 v_2 \Rightarrow v_1$, ..., $b_n t_n v_n \Rightarrow v_{n-1}$

according to point (2) of the definition of $\Rightarrow$ we obtain

$a s' v_n \Rightarrow t_0 v_0$

Since $u' \to^{j_n} v_n$ we obtain $s u = s' u' \to^{j_n} s' v_n$. According to property 2 of paragraph 2.4, and since $v_n \in V_T^{*}$, there exists $v \in V_T^{*}$ such that

$u \to^{j_n} v$, $s v = s' v_n$

therefore

$u \to^{j_n} v$, $a s v = a s' v_n \Rightarrow t_0 v_0 = t$, $j_n \le i \le i+1$

2.8 DEMONSTRATION OF THE THEOREM ON THE CALCULATION OF $\Rightarrow$

Here is the demonstration of the theorem of paragraph 2.6, of which we repeat the terms.

Theorem

For all $x, y \in H[F]$

$x \Rightarrow y$ iff $d(x, y) \in I_{min}[Tr[E]]$

Demonstration. 1st part.

Let us demonstrate first that

$d(x, y) \in I_{min}[Tr[E]]$ iff $d(x, y) \in I_{min}[E \cup Trbis[E]]$

where $Trbis[E]$ is the set of clauses of the form $t[+r(u, v)] \cup \{-r(u, v)\}$ with $+r(u, v)$ element of a clause of $E$. We therefore need only demonstrate that

$Tr[E] \models \{\{+d(x, y)\}\}$ iff $E \cup Trbis[E] \models \{\{+d(x, y)\}\}$

or that

$\exists I$ satisfying $Tr[E] \cup \{\{-d(x, y)\}\}$ iff $\exists J$ satisfying $E \cup Trbis[E] \cup \{\{-d(x, y)\}\}$

If $I$ satisfies $Tr[E] \cup \{\{-d(x, y)\}\}$ we can arrange in such a way that it contains no formula of the form $r(u, v)$. In that case let $G$ be the set of all the values of the clauses of $E$ not satisfied by $I$. The interpretation

$J = I \cup \{r(u, v) \mid \exists g \in G$ with $+r(u, v) \in g\}$

satisfies $E \cup Trbis[E] \cup \{\{-d(x, y)\}\}$. If that were not the case, according to hypothesis (3) of paragraph 2.6, there could only exist

(a) a clause-value of the form $t[+r(u, v)] \cup \{-r(u, v)\}$ which would not be satisfied by $J$, and therefore $r(u, v) \in J$.

According to the definition of $J$, there therefore exists a clause-value of $E$ of the form $\{+r(u, v)\} \cup g$ ($r$ does not occur in $g$) not satisfied by $I$. Therefore $I$ does not satisfy $g$. According to the definition of $Tr[E]$, there exists a clause-value of the form $t[+r(u, v)] \cup g$ and, by hypothesis, $I$ satisfies it. Since $I$ does not satisfy $g$, $I$ satisfies $t[+r(u, v)]$ and therefore so does $J$, which contradicts (a).

If $J$ satisfies $E \cup Trbis[E] \cup \{\{-d(x, y)\}\}$ then the interpretation $I$, obtained by removing from $J$ all the atomic formulae of the form $r(u, v)$, satisfies $Tr[E] \cup \{\{-d(x, y)\}\}$. If that were not the case, there would exist a clause-value of the form $t[+r(u, v)] \cup g$ not satisfied by $I$ and therefore not satisfied by $J$. By hypothesis $J$ satisfies $\{+r(u, v)\} \cup g$ and $t[+r(u, v)] \cup \{-r(u, v)\}$, and therefore $J$ satisfies $t[+r(u, v)] \cup g$, which is contradictory.

Demonstration. 2nd part.

It remains to demonstrate that for all $x, y \in H$

$x \Rightarrow y$ iff $d(x, y) \in I_{min}[E \cup Trbis[E]]$

Let us first demonstrate that

$I_{min}[E \cup Trbis[E]] = I_{min}[E] \cup K_{min}$
$K_{min}$ = the smallest $K \subset I_d$ such that $I_{min}[E] \cup K$ satisfies $Trbis[E]$
$I_d = \{d(u, v) \mid u, v \in H\}$

Indeed, let $I$ be an interpretation satisfying $E \cup Trbis[E]$. Let us pose $I' = I_{min}[E] \cup K$ with $K = I \cap I_d$. We obtain on the one hand $I' \subset I$; on the other hand, $I'$ also satisfies $E \cup Trbis[E]$, since $I'$ satisfies $E$ by definition and satisfies $Trbis[E]$, which contains no literals of the form $+r(u, v)$. Therefore

$I_{min}[E \cup Trbis[E]]$ = the smallest $I_{min}[E] \cup K$ which satisfies $E \cup Trbis[E]$

hence the required result.

It remains only to demonstrate that

$x \Rightarrow y$ iff $d(x, y) \in K_{min}$

Let us specify the value of $K_{min}$. The property "$I_{min}[E] \cup K$ satisfies $Trbis[E]$" may be written

$+r(u, v)$ element of a value of a clause of $E$ implies that $I_{min}[E] \cup K$ satisfies $t[+r(u, v)] \cup \{-r(u, v)\}$

or

$+r(u, v)$ element of a value of a clause of $E$ and $r(u, v) \in I_{min}[E]$ implies $K$ satisfies $t[+r(u, v)]$

Noticing that

$r(u, v) \in I_{min}[E]$ implies $+r(u, v)$ element of a value of a clause of $E$

since, contrary to $I_{min}[E]$, $I_{min}[E] - \{r(u, v)\}$ does not satisfy $E$, the preceding property can be simplified to

$r(u, v) \in I_{min}[E]$ implies $K$ satisfies $t[+r(u, v)]$

and therefore finally

$K_{min}$ = the smallest $K \subset I_d$ such that $u \to v$ implies $K$ satisfies $t[+r(u, v)]$, for all $u, v \in H$

Let us now notice that to each $K \subset I_d$ we may associate bi-univocally the relation $\Rightarrow_K$ defined by

$u \Rightarrow_K v$ iff $d(u, v) \in K$

The relation $\Rightarrow_{K_{min}}$ is therefore the smallest relation satisfying the points (1) and (2) of the definition of $\Rightarrow$ in paragraph 2.5. According to the equivalent definition of the relation $\Rightarrow$ we can therefore deduce

$x \Rightarrow_{K_{min}} y$ iff $x \Rightarrow y$

i.e.

$d(x, y) \in K_{min}$ iff $x \Rightarrow y$

CHAPTER 3
= = = = = = = = =

INTRODUCTION TO PROLOG

3.1 GENERAL MECHANISMS OF PROLOG

PROLOG is a programming language which materialises the ideas developed in chapter 1. (In fact, these ideas only became clear after the birth of PROLOG.) In this language each instruction is therefore a logical statement and the execution of a programme consists in making deductions.

More precisely, a PROLOG programme consists of a sequence of clauses. Each clause is a sequence of literals and ends with either a full-stop or an exclamation mark. The clauses ending with a full-stop correspond to instructions to be recorded, while those ending with an exclamation mark correspond to instructions to be executed immediately. If we take up the example common to paras 1.2 and 1.3, it may be written in PROLOG:

+CONC(NIL,*X,*X).
+CONC(.(*E,*X),*Y,.(*E,*Z)) -CONC(*X,*Y,*Z).
-CONC(.(A,NIL),.(B,NIL),*X) -SORT(*X)!
-CONC(*X,*Y,.(A,NIL)) -SORT(*X) -SORT(*Y)!

Let us note in passing that the variables are preceded by an asterisk. The general system, of which a large part is written in PROLOG, reads the first two clauses, records them and launches an execution as soon as it has read a third clause. This execution consists in taking the third clause $x$ as a starting-point and in calculating successively the clauses $y_1, y_2, y_3, \ldots$ such that

$x \vdash_{E_{ord}} y_1 \vdash_{E_{ord}} y_2 \vdash_{E_{ord}} y_3 \vdash_{E_{ord}} \cdots$

where $E_{ord}$ represents the set of the first two clauses (see para 1.3). The selecting function is that which always chooses the left-most literal. If for a $y_i$ there exist several clauses $c_1, c_2, c_3, \ldots$ (recorded in that order in $E_{ord}$) which may be used to construct a $y_{i+1}$ such that

$y_i \vdash_{E_{ord}} y_{i+1}$

the system chooses first $c_1$, and it is only after completing its search in this direction that it will choose $c_2$ and explore that direction, and so on. The order in which the clauses are recorded can therefore assume a certain importance.

The literal -SORT(*X) does not behave like the other literals: it is a special literal which, when evaluated, provokes the printing of the term which has been substituted for *X. (This kind of mechanism will be described in the following paragraph.)

Therefore, after reading the third clause, the system will print

.(A,.(B,NIL))

then, after reading the fourth clause

NIL .(A,NIL)
.(A,NIL) NIL
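The search order described in this paragraph (left-most literal first, clauses tried in recording order, the next clause explored only when the previous direction is exhausted) can be imitated with a small Python sketch. It is an illustration only, not the actual PROLOG system: terms are tuples whose first element is the functional or relational symbol, variables are strings starting with an asterisk, and the names `unify`, `solve` and `answers` are hypothetical. No occur check is performed.

```python
import itertools

def is_var(t):
    return isinstance(t, str) and t.startswith("*")

def walk(t, s):
    # follow variable bindings in the substitution s
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def rename(t, n):
    # give the variables of a recorded clause fresh names
    if is_var(t):
        return f"{t}_{n}"
    if isinstance(t, tuple):
        return tuple(rename(x, n) for x in t)
    return t

def resolve(t, s):
    t = walk(t, s)
    return tuple(resolve(x, s) for x in t) if isinstance(t, tuple) else t

_fresh = itertools.count()

def solve(goals, s, program):
    """Depth-first search: left-most goal, clauses in recording order."""
    if not goals:
        yield s
        return
    first, rest = goals[0], goals[1:]
    for head, body in program:
        n = next(_fresh)
        s2 = unify(first, rename(head, n), s)
        if s2 is not None:
            yield from solve([rename(g, n) for g in body] + rest, s2, program)

# the two recorded clauses of the CONC example
PROGRAM = [
    (("CONC", "NIL", "*X", "*X"), []),
    (("CONC", (".", "*E", "*X"), "*Y", (".", "*E", "*Z")),
     [("CONC", "*X", "*Y", "*Z")]),
]

def answers(goal, names):
    return [tuple(resolve(v, s) for v in names)
            for s in solve([goal], {}, PROGRAM)]

if __name__ == "__main__":
    a_list = (".", "A", "NIL")
    for x, y in answers(("CONC", "*X", "*Y", a_list), ["*X", "*Y"]):
        print(x, y)
```

With the clauses recorded in this order, the two decompositions of .(A,NIL) come out in the same order as the printing shown above.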

3.2 PREDEFINED RELATIONS

In PROLOG there exist a certain number of relational symbols predefined by a standard set of clauses or by sub-programmes (called on in the evaluation of any literal based on one of them). The following are the principal predefined relations:

Input and output

LU(x)      reads the next character and unifies it with x.
LUB(x)     reads the next character other than a blank and unifies it with x.
ECRIT(x)   writes the character x.
LIGNE      jumps a line on the output device.
SORT(x)    writes the term x.
SORM(x)    writes one after the other the characters constituting the string x.
AJOP("c",n,"f")   considers that from now on the sequence of characters c is an infixed functional symbol of priority n and that it must be noted according to certain conventions specified by f.

Example: the evaluation of

-AJOP(".",1,"X£(X£X)")

will allow us to note the functional symbol "." in the usual manner.
Note: It is always permitted to write "C1C2---Cn" instead of C1.C2.---.Cn.NIL if the Ci are characters.

Creation of clauses and symbols

AJOUT(x)   transforms the term x into a clause and adds it to the list of all the clauses which already exist within the system.

Example: the evaluation of

-AJOUT(+(P(*X)).-(O(A,*X)).NIL)

creates and adds the clause

+P(*X) -O(A,*X).

UNIV(x,y)

Example: the evaluation of

-UNIV(*X,(T.O.T.O.NIL).F(A).G(B).NIL)

unifies *X with

TOTO(F(A),G(B))

whereas the evaluation of

-UNIV(TOTO(F(A),G(B)),*Y)

unifies *Y with

(T.O.T.O.NIL).F(A).G(B).NIL
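Judging from the two examples, UNIV relates a term to a list made of the characters of its functional symbol followed by its arguments. A minimal Python sketch of the two directions, assuming terms are tuples whose first element is the symbol name (the function names `term_to_list` and `list_to_term` are hypothetical):

```python
def term_to_list(term):
    """TOTO(F(A),G(B)) -> [characters of the symbol, then the arguments]."""
    functor, *args = term
    return [list(functor), *args]


def list_to_term(items):
    """Inverse direction: rebuild the term from the character list and arguments."""
    chars, *args = items
    return ("".join(chars), *args)


if __name__ == "__main__":
    toto = ("TOTO", ("F", "A"), ("G", "B"))
    print(term_to_list(toto))
    print(list_to_term([["T", "O", "T", "O"], ("F", "A"), ("G", "B")]))
```

In PROLOG a single relation covers both directions through unification; the sketch needs one function per direction.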

Control of the strategy

VAR(x)   verifies that x is a variable.
/        limits the non-determinism.

Example: Let us consider the two clauses

(1) +P(*X) -O(*X) -R(*X) -/ -S(*X).
(2) +P(*X) -U(*X).

To evaluate a literal of the form -P(y) we will first use the clause (1). Two cases then present themselves:

(a) If one can evaluate the literals -O(y) and -R(y), one will evaluate -S(y), but on returning one will not use the clause (2).

(b) If one cannot evaluate all the literals of clause (1) which precede -/, one will use clause (2).
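The effect of "/" in this example can be mimicked with Python generators, each solution of a literal being one item yielded. This is a sketch under assumptions: the relations O, R, S and U are modelled as small sets of integers chosen only for illustration, and the helper `holds` is hypothetical.

```python
# illustrative facts, not taken from the text
O = {1, 2, 4}
R = {2, 3, 4}
S = {2}
U = {1, 2, 3, 4}


def holds(facts, x):
    """Yield once if x satisfies the relation, not at all otherwise."""
    if x in facts:
        yield True


def p(x):
    committed = False
    for _ in holds(O, x):            # -O(*X)
        for _ in holds(R, x):        # -R(*X)
            committed = True         # -/ : no return to O, R or clause (2)
            yield from holds(S, x)   # -S(*X)
            break
        if committed:
            break
    if not committed:                # case (b): clause (2) is used
        yield from holds(U, x)       # -U(*X)


if __name__ == "__main__":
    for value in (1, 2, 4):
        print(value, len(list(p(value))))
```

For the value 4 the literals before "/" succeed but S fails, and clause (2) is not tried, so no solution is produced; this is case (a) of the text.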

Treatment of characters and integers

LETTRE(x)     verifies that x is a letter.
CHIFFRE(x)    verifies that x is a digit.
PLUS(x,y,z)   adds the integer x to the integer y and unifies the result with z.
INF(x,y)      verifies that the integer x is strictly smaller than the integer y.

3.3 TREATMENT OF METAMORPHOSIS GRAMMARS IN PROLOG

The programming language PROLOG was conceived to facilitate the definition and use of metamorphosis grammars in normal form. These grammars, of course, must satisfy the hypotheses of para 2.6. The grammar of example 2 in para 2.3 is written in PROLOG:

:FORMULA(A) == #A.
:FORMULA(B) == #B.
:FORMULA(*X) == -EGAL(*X,*R.*S) :VALUE(*X).
:VALUE(*X.*Y) == #< :FORMULA(*X) :END :VALUE(*Y).
:VALUE(NIL) ==.
:END #< == #+.
:END == #>.
+EGAL(*X,*X).

The terms which correspond to non-terminals (pseudo-non-terminals) are preceded by ":" while those which correspond to terminals (pseudo-terminals) are preceded by "#" or "£". Literals can be inserted in the right-hand side of each rule. Of course, this set of rules represents nothing other than the set of clauses E which defines the relation $\to$. As these rules are read, they are transformed (by a programme written in PROLOG) in order to obtain finally the set of clauses Tr[E], but taking into account the remark at the end of para 2.6. Each pseudo-non-terminal is therefore transformed into a literal with two supplementary arguments. The pseudo-terminals are inserted into these supplementary arguments. The literals figuring in the right-hand sides remain unchanged.

To analyse or synthesise a string one must use the predefined relational symbol SYN (abbreviation of synthesis), which plays the same role as d. For instance, the execution of

-SYN(FORMULA(*X).NIL,<.A.+.B.>.NIL) -SORT(FORMULA(*X))!

will provoke the printing of the deep structure of <.A.+.B.>.NIL, whereas the execution of

-SYN(FORMULA(A.B.NIL).NIL,*X) -SORT(*X)!

will provoke the printing of the terminal sequence of which the deep structure is FORMULA(A.B.NIL).

A COMPILER WRITTEN IN PROLOG

4.1 NATURE OF THE PROBLEM

We propose to write a compiler. It will be constituted principally by two metamorphosis grammars, the one to analyse the source-program and to furnish a normalised form, the other to synthesise the machine-code by means of this normalised form. The language we will compile is of the ALGOL type. It contains no declarations and each variable within it is of integer type. We simply give its definition in Backus normal form; the reader may deduce the semantic part from the notations:

<program> ::= <instruction> .

<instruction> ::=
    begin <instruction> <instructions> end |
    <identifier> := <arithmetical exp 1> |
    while <boolean exp 1> do <instruction> |
    repeat <instruction> until <boolean exp 1> |
    read <identifier> |
    write <arithmetical exp 1> |
    goto <identifier> |
    if <boolean exp 1> then <instruction> else <instruction> |
    if <boolean exp 1> do <instruction> |
    <identifier> : <instruction> |
    <empty>

<instructions> ::= ; <instruction> <instructions> | <empty>

<arithmetical exp 1> ::= <arithmetical exp 2> |
    <plus or minus> <arithmetical exp 2> |
    <arithmetical exp 1> <plus or minus> <arithmetical exp 2>

<arithmetical exp 2> ::= <arithmetical exp 3> |
    <arithmetical exp 2> * <arithmetical exp 3>

<arithmetical exp 3> ::= <integer> | <identifier> | ( <arithmetical exp 1> )

<plus or minus> ::= + | -

<boolean exp 1> ::= <boolean exp 2> | <boolean exp 1> or <boolean exp 2>

<boolean exp 2> ::= <boolean exp 3> | <boolean exp 2> and <boolean exp 3>

<boolean exp 3> ::= not <boolean exp 3> | ( <boolean exp 1> ) |
    <arithmetical exp 1> <relation> <arithmetical exp 1>

<relation> ::= = | less

<integer> ::= <digit> | <digit> <integer>

<identifier> ::= <letter> | <identifier> <letter> | <identifier> <digit>

<digit> ::= 0 | 1 | ... | 9

<letter> ::= a | b | ... | z

<empty> ::=

The machine which executes the compiled programme is constituted by a series of memories (numbered from 0) and by an accumulator (which we shall call accu). Each memory may contain indifferently an instruction or an integer. The execution of the programme starts with the instruction contained in memory no. 0. Here is the list of the instructions and pseudo-instructions of the machine:

LOAD n       load into the accu the contents of memory no. n
STOR n       store into memory no. n the contents of the accu
PLUS n       add to the contents of the accu the contents of memory no. n
MINU n       subtract from the contents of the accu the contents of memory no. n
MULT n       multiply the contents of the accu by the contents of memory no. n
GOTO n       go to the instruction contained in memory no. n
GOZE n       goto n if the contents of the accu = 0 (goto if zero)
GONE n       goto n if the contents of the accu < 0 (goto if negative)
GONZ n       goto n if the contents of the accu ≠ 0 (goto if not zero)
GONN n       goto n if the contents of the accu ≥ 0 (goto if not negative)
WRIT         write the integer contained in the accu
READ         read an integer and load it into the accu
STOP         stop
ALLO EMPTY   allocate a memory (pseudo-instruction executed when loading the program)
ALLO n       allocate a memory and initialise it with n (pseudo-instruction executed when loading the program).
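The behaviour of this machine can be checked with a small simulator. The following Python sketch is an assumption-laden illustration, not part of the original system: a program is a list of (mnemonic, operand) pairs, the function names `load` and `run` are hypothetical, and `FACTORIAL` transcribes the machine-code of the second compilation example of para 4.2 (with the evident OCR slips in the listing corrected).

```python
def load(listing):
    """Build the memory; the pseudo-instruction ALLO is executed here."""
    memory = []
    for op, arg in listing:
        if op == "ALLO":
            memory.append(None if arg == "EMPTY" else int(arg))
        else:
            memory.append((op, arg))
    return memory


def run(listing, inputs):
    """Execute from memory no. 0 until STOP; return the integers written."""
    memory, pending, outputs = load(listing), list(inputs), []
    accu, pc = 0, 0
    jumps = {"GOZE": lambda a: a == 0, "GONE": lambda a: a < 0,
             "GONZ": lambda a: a != 0, "GONN": lambda a: a >= 0}
    while True:
        op, n = memory[pc]
        pc += 1
        if op == "LOAD":
            accu = memory[n]
        elif op == "STOR":
            memory[n] = accu
        elif op == "PLUS":
            accu += memory[n]
        elif op == "MINU":
            accu -= memory[n]
        elif op == "MULT":
            accu *= memory[n]
        elif op == "GOTO":
            pc = n
        elif op in jumps:
            if jumps[op](accu):
                pc = n
        elif op == "WRIT":
            outputs.append(accu)
        elif op == "READ":
            accu = pending.pop(0)
        elif op == "STOP":
            return outputs
        else:
            raise ValueError(f"unknown instruction {op}")


FACTORIAL = [
    ("READ", None), ("STOR", 22), ("LOAD", 23), ("MINU", 22), ("GONE", 21),
    ("LOAD", 25), ("STOR", 24), ("LOAD", 27), ("STOR", 26), ("LOAD", 24),
    ("MINU", 22), ("GONN", 19), ("LOAD", 24), ("PLUS", 27), ("STOR", 24),
    ("LOAD", 24), ("MULT", 26), ("STOR", 26), ("GOTO", 9), ("LOAD", 26),
    ("WRIT", None), ("STOP", None),
    ("ALLO", "EMPTY"), ("ALLO", 10), ("ALLO", "EMPTY"), ("ALLO", 0),
    ("ALLO", "EMPTY"), ("ALLO", 1),
]

if __name__ == "__main__":
    print(run(FACTORIAL, [5]))
```

Instructions and integers share the same memory, as in the description above; the sketch does not guard against executing a data cell.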

4.2 PROGRAMME AND EXAMPLES OF COMPILATION

Here is the whole of the PROLOG programme which constitutes this compiler, followed by two examples of program-compilation. In the case of the first example we print intermediate results.

-AJOP( "." ,I ,"Xs163

)!

** (O) CALL FOR THE DIFFERENTPHASIS.


+COMPILE -LIGNE -READING(*U) -PRETREATING(*U,*V) -ANALYSIS(*V,*W) -SYNTHESIS(*W,*X) -ASSEMBLING(*X,*Y) -PRINTING(*Y).

** ( i ) READINGOF THE SOURCE-PROGRAM. +READING(*L.*U) - / -LUB(*K) -TR(*K.NIL,*L) -READBIS(*L,*U). +READBIS(DOT,NIL) - / . +READBIS(BLANK,*U) - / -READING(*U). +READBIS(*K,*M.*U) -LU(*L) -TR(*L.NIL,*M) -READBIS(*M,*U). +TR(".",DOT) - / . +TR("*",STAR) - / . +TR(")",RBRACK) - / . +TR(.... ,BLANK) - / . +TR("(",LBRACK) - / . +TR(*K.NIL,*K).

** (2) PREATREATINGOF THE SOURCE-PROGRAM. +PRETREATING(*U,*V) -SYN(UNITS(*V).NIL,*U) -OUT(*V). :UNITS(*U.*X) : : :UNIT(*U) - / :SPACE :UNITS(*X). :UNITS(NIL) =:.
:SPACE == s -/. :SPACE :=.

:UNIT(IN(*X)) == s -CHIFFRE(*K) - / :DIGITS(*U) -UNIV(*X,(*K.*U).NIL). :UNIT(*Y) == s -LETTRE(*K) - / :ALPHANUMS(*U) -UNIV(*X,(*K.*U).NIL) -CHGT(*X,*Y). :UNIT(*K) == s :DIGITS(*K.*U) =: s :DIGITS(NIL) ==. -CHIFFRE(*K) - / :DIGITS(*U). -ALPHANUM(*K) - / :ALPHANUMS(*U). +ALPHANUM(*K) -CHIFFRE(*K). +CHGT(*X,ID(*X)). +DO. +DR. +END. +REPEAT.

:ALPHANUMS(*K.*U) := s :ALPHANUMS(NIL) ==.

+ALPHANUM(*K) -LETTRE(*K) - / . +CHGT(*X,*X) -*X - / . +THEN. +LESS. +IF.

+BEGIN. +WRITE. +AND. + U N T I L . +READ. +NOT. +ELSE. +WHILE. +GOTO.


** (3) ANALYSISOF THE SOURCE-PROGRAM. +ANALYSIS(*S,*I) -SYN(PROG(*I).NIL,*S) - / -OUT(*I). +ANALYSIS(*S,*I) -LIGNE -SORM("SYNTAX-ERRDR") -LIGNE -CULDESAC. :PROG(*I) == :INST(*I) s :INST(SEQ.*I.*S) == s - / :INST(*I) :INSTRS(*S) s :INST(ASSIGN.*X.*Y) == s s s - / :EXP(ARIT,I,*Y). :INST(WHILE.*B.*I) == s - / :EXP(BdOL,I,*B) s :INST(*I). :INST(REPEA.*B.*I) == s - / :INST(*I) s :EXP(BODL,I,*B). :INST(GOTD.*X) == s s -/. :INST(READ.*X) == s s -/. :INST(WRITE.*X) == s - / =EXP(ARIT,I,*X). :INST(*IF.*B.*S) == s - / ~EXP(BOOL,I,*B) :ENDIF(*IF,*S). :INST(LABEL.*X.*I) == s s - / :INST(*I). :INST(SEQ.NIL) ==. :ENDIF(IFI,*I) == s - / :INST(*I). :ENDIF(IF2,*I.*3) == s :INST(*I) s :INSTRS(*I.*S) == s :INSTRS(NIL) ==. :INST(*3).

- / :INST(*I) :INSTRS(*S).

=EXP(*T,3,*X) == s - / :EXP(*T,I,*X) s :EXP(ARIT,3,*X) == s -/. :EXP(ARIT,3,1N(*X)) == s -/. :EXP(BOOL,3,NOT.*B) == s - / :EXP(BOOL,3,*B). :EXP(BOOL,3,*R.*X.*Y) == - / :EXP(ARIT,I,*X) s -RELATION(*R) :EXP(ARIT,I,*Y). :EXP(*T,*N,*X) == -INF(*N,3) -PLUS(*N,I,*M) :EXP(*T,*M,*Y) - / :ENDEXP(*T,*N,*Y,*X). :EXP(ARIT,I,*X) == :ENDEXP(ARIT,I,IN(O),*X). +RELATION(=) - / . +RELATION(LESS).

:ENDEXP(*T,*N,*X,*Z) == s -DPERATDR(*R,*T,*N) - / -PLUS(*N,I,*M) :EXP(*T,*M,*Y) :ENDEXP(*T,*N,*R.*X.*Y,*Z). :ENDEXP(*T,*N,*X,*X) ==. +OPERATDR(DR,BOOL,1) - / . +OPERATDR(+,ARIT,I) - / . +DPERATOR(STAR,ARIT,2). +DPERATOR(AND,BOOL,2). -/ +OPERATOR(-,ARIT,1). -/

168 ** (4) SYNTHESIS OF THE MACHINE-CODE. +SYNTHESIS(*I,*S) -SYN(PRO(*I).NIL,*S) -DUT(*S). :PRO(*I) == :INS(*I,*U.*V.*W) s :ALLOCATION(*W). :ALLOCATIDN(*V)

:INS(SEQ.NIL,*D) := - / . :INS(SEQ.*I.*S,*D) == - / :INS(*I,*D) :INS(SEQ.*S,*D). :INS(LABEL.*X.*I,*U.*V.*W) == s - / -ADR(*X,*E,*U) :INS(*I,*U.*V.*W). :INS(ASSIGN.*X.*Y,*U.*V.*W) == - / -ADR(*X,*E,*V) :EXPARIT(*Y,*V.*W) s :INS(WHILE.*B.*I,*D) == s -/ :IFGO(NOT.*B,*F,*D) :INS(*I,*D) s s :INS(REPEA.*B.*I,*D) == s -/ :INS(*I,*D) :IFGO(NOT.*B,*E,*D). :INS(GOTO.*X,*U.*V.*W) == s -/ -ADR(*X,*E,*U). :INS(READ.*X,*U.*V.*W) == s s -/ -ADR(*X,*E,*V). :INS(WRITE.*X,*U.*V.*W) == -/ :EXPARIT(*X,*V.*W) s :INS(IF2.*B.*I.*J,*D) == -/ :IFGO(*B,*E,*D) :INS(*J,*D) s s :INS(*I,*D) s :INS(IFI.*B.GOTO.*X,*U.*V.*W) == -/ :IFGO(*B,*E,*U.*V.*W) -ADR(*X,*E,*U). :INS(IFI.*B.*I,*D) == :IFGO(NOT.*B,*E,*D) :INS(*I,*D) s :ALLOCATION(NIL) == -/. :ALLOCATION((*X.*E).*U) == s :ALLOCATION(*U). +CONTENT(IN(*X),*X) -/. s -CONTENT(*X,*Y)

+CONTENT(*X,EMPTY).

:IFGO(OR.*B.*C,*E,*D) == -/ :IFGO(*B,*E,*D) :IFGO(*C,*E,*D). :IFGO(AND.*B.*C,*E,*D) == -/ :IFGO(NOT.*B,*F,*D) :IFGO(*C,*E,*D) s :IFGO(NOT.NOT.*B,*E,*D) == -/ :IFGO(*B,*E,*D). :IFGO(NOT.OR.*B.*C,*E,*D) == -/ :IFGO(AND.(NOT.*B).NOT.*C.*E,*D). :IFGO(NOT.AND.*B.*C,*E,*D) == -/ :IFGO(OR.(NOT.*B).NOT.*C,*E,*D). :IFGO(NOT.*R.*S,*E,*D) == -/ :IFGO((NOT.*R).*S,*E,*D). :IFGO(*R.*X.*Y,*E,*U.*V.*W) == :EXPARIT(-.*x.*Y,*v.*w) s -HOMOLOGOUS(*R,*Q). :EXPARIT(*R.*X.*Y,*V.(EMPTY.*E).*W) == -COMPLEX(*Y) -/ :EXPARIT(*Y,*V.(EMPTY.*E).*W) s :EXPARIT(*X,*V.*W) s :EXPARIT(*R.*X.*Y,*V.*W) == -/ -HOMOLOGOUS(*R,*Q) -ADR(*Y,*E,*V) :EXPARIT(*X,*V.*W) s :EXPARIT(*X,*V.*W) == s -ADR(*X,*E,*V). +COMPLEX(*R.*X.*Y). +HOMOLOGOUS(=,GOZE) -/. +HOMOLOGOUS(LESS,GONE) -/. +HOMOLOGOUS(+,PLUS) -/. +HOMOLOGOUS(STAR,MULT). +HOMOLOGOUS(NOT.=,GONZ) -/. +HOMOLOGOUS(NOT.LESS,GONN) -/. +HOMOLOGOUS(-,MINU) -/.

+ADR(*X,*E,(*X.*E).*U) -/. +ADR(*X,*E,(*Y.*F).*U) -ADR(*X,*E,*U).

169 ** (5) ASSEMBLINGOF THE MACHINE-CODE. +ASSEMBLING(*X,*U) -ASS(*X,*U,O). +ASS(LAB(*N).*X,*U,*N) - / -ASS(*X,*U,*N). +ASS(CODE(*C).*X,*C.*U,*N) - / -PLUS(*N,I,*M) -ASS(*X,*U,*M). +ASS(NIL,NIL,*N) - / . +ASS(*X,*U,*N) -SORM("ERROR: TWICE THE SAMELABEL") -LIGNE -CULDESA[.

** (6) FINAL PRINTING. +PRINTING(*X) -LIGNE -PRI(D,*X) -LIGNE -LIGNE. +PRI(*N,*X.*Y) - / -SORT(*N) -SORM(" -PLUS(*N,I,*M) -PRI(*M,*Y). +PRI(*N,NIL).
") -SDRT(*X) -LIGNE

** (7) PRINTINGOF INTERMEDIATERESULTS. +TRACE -SUPP(+(OK).NIL) - / . +TRACE -AJOUT(+(DK).NIL).

+OUT(* -0~ - / -LtGNE -SORT(*X) -LIGNE. +OUT(*X).

-TRACE -COMPILE!

BEGIN READ N; READ M; IF NOT N=5 AND (M LESS 10 OR M=50) THEN WRITE 0 ELSE WRITE (2+N)*(10+M) END.

BEGIN.READ.ID(N).;.READ.ID(M).;.IF.NOT.ID(N).=.IN(5).AND.LBRACK.ID(M).LESS.IN(10).OR.ID(M).=.IN(50).RBRACK.THEN.WRITE.IN(0).ELSE.WRITE.LBRACK.IN(2).+.ID(N).RBRACK.STAR.LBRACK.IN(10).+.ID(M).RBRACK.END.DOT.NIL

SEQ.(READ.N).(READ.M).(IF2.(AND.(NOT.=.N.IN(5)).OR.(LESS.M.IN(10)).=.M.IN(50)).(WRITE.IN(0)).WRITE.STAR.(+.IN(2).N).+.IN(10).M).NIL

CODE(READ).CODE(STOR.*X0).CODE(READ).CODE(STOR.*X1).CODE(LOAD.*X0).CODE(MINU.*X2).CODE(GOZE.*X3).CODE(LOAD.*X1).CODE(MINU.*X4).CODE(GONE.*X5).CODE(LOAD.*X1).CODE(MINU.*X6).CODE(GOZE.*X5).LAB(*X3).CODE(LOAD.*X4).CODE(PLUS.*X1).CODE(STOR.*X7).CODE(LOAD.*X8).CODE(PLUS.*X0).CODE(MULT.*X7).CODE(WRIT).CODE(GOTO.*X9).LAB(*X5).CODE(LOAD.*X10).CODE(WRIT).LAB(*X9).CODE(STOP).LAB(*X0).CODE(ALLO.EMPTY).LAB(*X1).CODE(ALLO.EMPTY).LAB(*X2).CODE(ALLO.5).LAB(*X4).CODE(ALLO.10).LAB(*X6).CODE(ALLO.50).LAB(*X8).CODE(ALLO.2).LAB(*X10).CODE(ALLO.0).LAB(*X7).CODE(ALLO.EMPTY).NIL

0   READ
1   STOR.24
2   READ
3   STOR.25
4   LOAD.24
5   MINU.26
6   GOZE.13
7   LOAD.25
8   MINU.27
9   GONE.21
10  LOAD.25
11  MINU.28
12  GOZE.21
13  LOAD.27
14  PLUS.25
15  STOR.31
16  LOAD.29
17  PLUS.24
18  MULT.31
19  WRIT
20  GOTO.23
21  LOAD.30
22  WRIT
23  STOP
24  ALLO.EMPTY
25  ALLO.EMPTY
26  ALLO.5
27  ALLO.10
28  ALLO.50
29  ALLO.2
30  ALLO.0
31  ALLO.EMPTY

-TRACE -COMPILE!

BEGIN READ N; IF 10 LESS N DO GOTO TOOBIG; I:=0; F:=1; WHILE I LESS N DO BEGIN I:=I+1; F:=I*F END; WRITE F; TOOBIG: END.

0   READ
1   STOR.22
2   LOAD.23
3   MINU.22
4   GONE.21
5   LOAD.25
6   STOR.24
7   LOAD.27
8   STOR.26
9   LOAD.24
10  MINU.22
11  GONN.19
12  LOAD.24
13  PLUS.27
14  STOR.24
15  LOAD.24
16  MULT.26
17  STOR.26
18  GOTO.9
19  LOAD.26
20  WRIT
21  STOP
22  ALLO.EMPTY
23  ALLO.10
24  ALLO.EMPTY
25  ALLO.0
26  ALLO.EMPTY
27  ALLO.1

4.3

EXPLANATION

OF THE P R O G R A M

The first infixed

clause

indicates with

that the f o n c t i o n n a l

symbol

"."

will

be w r i t t e n

in

notation

right-to-lsft

parenthesising.

CO) Call for the different sively.

phasis.

The phases

[1),

(2),

. .... (6) are called

succes-

(1) Reading of the source-program. The source-program is read character by character and is transformed into a string of characters. The blanks at the head of the program are suppressed and any other spacing is reduced to a single blank. The characters ".", " ", "*", "(", ")" are renamed respectively DOT, BLANK, STAR, LBRACK, RBRACK.

(2) Pretreating of the source-program. The string of characters is transformed (by a small metamorphosis grammar) into a string of units. Each unit is either a basic symbol (begin, end, +, -, ...), or an identifier capped by the functional symbol ID, or an integer capped by the functional symbol IN.

(3) Analysis of the source-program. The string of units is transformed into a tree which is the normalised form of the program. This is done by means of a metamorphosis grammar derived directly from the definition of the language. The arithmetical and boolean expressions are treated in a very compact and general way. Here follows the structure of all the normalised program-forms.

<normalised form> ::= prog ( <instruction> )

<instruction> ::= ( sequ . <instructions> ) |
                  ( assign . <identifier> . <arithmetical exp> ) |
                  ( while . <boolean exp> . <instruction> ) |
                  ( repea . <instruction> . <boolean exp> ) |
                  ( read . <identifier> ) |
                  ( write . <arithmetical exp> ) |
                  ( goto . <identifier> ) |
                  ( if1 . <boolean exp> . <instruction> ) |
                  ( if2 . <boolean exp> . <instruction> . <instruction> ) |
                  ( label . <identifier> )

<instructions> ::= nil | ( <instruction> . <instructions> )

<arithmetical exp> ::= in ( <integer> ) | <identifier> |
                       ( <+ - mult> . <arithmetical exp> . <arithmetical exp> )

<boolean exp> ::= ( <relation> . <arithmetical exp> . <arithmetical exp> ) |
                  ( <and or> . <boolean exp> . <boolean exp> ) |
                  ( not . <boolean exp> )

<+ - mult> ::= + | - | mult

<and or> ::= and | or

<relation>, <integer> and <identifier> are defined at part 4.1.

(4) Synthesis of the machine-code. A metamorphosis grammar transforms the normalised form of the program into a string of elementary instructions. Each elementary instruction is either an instruction of which the address-part is a prolog-variable, the whole capped by the functional symbol CODE, or a prolog-variable representing an address capped by the symbol LAB (for label).

The last parameter of the non-terminal INS represents three tables. Each of these is composed of a sequence of doublets (object, prolog-variable representing the address where it is to be found). The first table associates an address to each identifier representing a label. The second table associates an address to each integer or identifier representing an integer. The third table associates an address to each supplementary memory cell necessary to compute an arithmetical expression. These tables, which at the start are represented by prolog-variables, are constantly updated and consulted by the predicate ADR(x,a,v), which gives the address a of the object x recorded in the table v.
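The table discipline just described (consult the table, and record the object with a fresh address when it is not yet there) can be sketched in Python. This is only an illustration under stated assumptions: the names AddressTable, adr and assemble, and the "X0", "X1" spelling of symbolic addresses, are inventions of this sketch. In the PROLOG program the role of the symbolic address is played by an unbound prolog-variable, which phase (5) later replaces by an integer.

```python
class AddressTable:
    """Hypothetical stand-in for one of the three tables of doublets."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.entries = {}  # object -> symbolic address, in order of first use

    def adr(self, obj):
        # Consult the table; extend it when the object is not yet recorded.
        if obj not in self.entries:
            self.entries[obj] = f"{self.prefix}{len(self.entries)}"
        return self.entries[obj]


def assemble(table, first_free):
    """Map each symbolic address to a consecutive integer address."""
    return {sym: first_free + i
            for i, sym in enumerate(table.entries.values())}


table = AddressTable("X")
n_addr = table.adr("N")      # first use: a new symbolic address is created
five_addr = table.adr(5)     # integers are recorded in the same way
again = table.adr("N")       # second use: the recorded address is returned
```

The same object always yields the same symbolic address, which is what lets every instruction that mentions N share one memory cell after assembling.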

The code of the boolean expressions is generated by means of the non-terminal IFGO(b,a,d) which means : "if the boolean expression b is true then goto a" (d represents the three preceding tables). This code is optimised so as to minimise the evaluations of relations at run-time.
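A minimal Python sketch of this kind of "jump" code generation follows. It is not the PROLOG grammar itself: the function names, the tuple encoding of expressions and the label names are assumptions of the sketch. The mnemonics LOAD, MINU, GOZE and GONE are those of the listings; GONN occurs in the second listing, and reading it as "go if not negative" is an assumption, as is the mnemonic GONZ ("go if not zero"), which is hypothetical.

```python
from itertools import count

TRUE_JUMP = {"=": "GOZE", "less": "GONE"}      # jump taken when relation holds
FALSE_JUMP = {"=": "GONZ", "less": "GONN"}     # GONZ is a hypothetical mnemonic


def jump(b, target, when, code, fresh):
    """Append code that jumps to `target` when b evaluates to `when`
    and falls through otherwise."""
    op = b[0]
    if op == "not":
        jump(b[1], target, not when, code, fresh)
    elif op in ("and", "or"):
        if (op == "or") == when:
            # One operand alone decides: test both against the same target.
            jump(b[1], target, when, code, fresh)
            jump(b[2], target, when, code, fresh)
        else:
            # The first operand can only rule the jump out: skip past the rest.
            skip = fresh()
            jump(b[1], skip, not when, code, fresh)
            jump(b[2], target, when, code, fresh)
            code.append(("LAB", skip))
    else:
        _, x, y = b
        test = TRUE_JUMP[op] if when else FALSE_JUMP[op]
        code += [("LOAD", x), ("MINU", y), (test, target)]


def ifgo(b, target):
    """Code for: if the boolean expression b is true then goto target."""
    labels = (f"L{i}" for i in count())
    code = []
    jump(b, target, True, code, lambda: next(labels))
    return code


# not n = 5 and (m less 10 or m = 50)
condition = ("and", ("not", ("=", "n", 5)),
                    ("or", ("less", "m", 10), ("=", "m", 50)))
generated = ifgo(condition, "THEN")
```

Each relation is evaluated at most once at run-time, and evaluation stops as soon as the outcome is known. For the condition above the sketch produces the same sequence of tests as instructions 4 to 12 of the first listing.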

The code of arithmetical expressions is optimised in the sense that in an operation where the second operand is simple no supplementary memory is used. So as not to complicate the compiler we make no transformation (by playing on associativity, commutativity, ...) in order to render this case as common as possible.
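The optimisation can be illustrated by a small Python sketch for a one-accumulator machine. The function names, the tuple encoding of expressions and the temporary names "T0", "T1" are assumptions of the sketch; the mnemonics LOAD, STOR, PLUS, MINU and MULT are those of the listings.

```python
OPS = {"+": "PLUS", "-": "MINU", "mult": "MULT"}


def is_simple(e):
    # An identifier or an integer; compound expressions are tuples.
    return not isinstance(e, tuple)


def gen_arith(e, code, temps):
    """Append code that leaves the value of e in the accumulator."""
    if is_simple(e):
        code.append(("LOAD", e))
        return
    op, left, right = e
    if is_simple(right):
        # Simple second operand: no supplementary memory is needed.
        gen_arith(left, code, temps)
        code.append((OPS[op], right))
    else:
        # Compound second operand: compute it first and save it.
        gen_arith(right, code, temps)
        temp = f"T{len(temps)}"
        temps.append(temp)
        code.append(("STOR", temp))
        gen_arith(left, code, temps)
        code.append((OPS[op], temp))


def compile_arith(e):
    code, temps = [], []
    gen_arith(e, code, temps)
    return code, temps


# (2 + n) * (10 + m): the second operand is compound, one temporary is used.
product_code, product_temps = compile_arith(("mult", ("+", 2, "n"), ("+", 10, "m")))
# (a * b) + c: the second operand is simple, no temporary is used.
sum_code, sum_temps = compile_arith(("+", ("mult", "a", "b"), "c"))
```

For (2 + n) * (10 + m) the sketch yields the same order of operations as instructions 13 to 18 of the first listing: the second operand is computed and stored, then the first operand is computed and multiplied by the stored value.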

(5) Assembling of the machine-code. All the addresses represented by prolog-variables are replaced by integers and some minor transformations allow us to obtain the final result.

(6) Final printing. The final result is printed, each instruction being numbered.


(7) Printing of intermediate results. The call for TRACE causes the printing of intermediate results. Calling TRACE again suppresses this printing. A further call causes printing anew, etc.


CHAPTER 5
=========

AN INTELLIGENT SYSTEM CONVERSING IN FRENCH

5.1

DESCRIPTION OF THE SYSTEM

We propose to write a system allowing us to hold an "intelligent" conversation in French with the computer. The conversation will concern relations of friendship and parenthood between different persons. The user of this system will be able :

- to communicate information to the computer by means of affirmative and negative sentences (and possibly by replying "yes" or "no" to certain questions) ;

- to suppress information (voluntarily or involuntarily) by negative or affirmative sentences contradicting facts previously stated ;

- to ask questions of the type "qui est..." ("who is...") or "qui n'est pas..." ("who is not...") to which the computer will attempt to reply by making a certain number of deductions ;

- to ask questions of the type "pourquoi..." ("why...") to which the computer will reply by retracing the sequence of deductions which allowed it to reach the conclusion ;

- to ask the computer to write or not to write intermediate results instrumental in its understanding of sentences and in its reasoning ;

- to stop the system by saying "au revoir !" ("good bye !").

Here is the set of all the sentences ("phrases" in French) which form the subset of French in which the interlocutor must address the computer.

<phrase> ::= AU REVOIR ! | DES DETAILS ! | PAS DE DETAILS ! | OUI ! | NON ! |
             QUI <ne> <est> <pas> <sn> ? |
             POURQUOI EST-CE <que> <sn> <ne> <est> <pas> <sn> ? |
             <sn> <ne> <est> <pas> <sn> .

<sn> ::= <art> <nom> <de> <sn> | TOUT LE MONDE | PERSONNE | QUELQU'UN | JE | <nom propre>

<art> <nom> DE JE ::= <mon> <nom>

<art> ::= AUCUN | AUCUNE | CHAQUE | L' | LA | LE | UN | UNE

<de> LE ::= DU

<est> ::= SUIS | EST

<mon> ::= MON | MA

<ne> ::= <vide> | N' | NE

<nom> ::= AMI | AMIE | FEMME | FILLE | FILS | FRERE | MARI | MERE | NEVEU | NIECE | ONCLE | PERE | SOEUR | TANTE

<de> ::= D' | DE

<nom propre> ::= "any word"

<pas> ::= <vide> | PAS

<que> ::= QUE | QU'

<vide> ::=

Here now is the list of all the replies ("réponses" in French) which can be produced by the computer.

<réponse> ::= <réponse à : qui...> | <réponse à : pourquoi...> |
              <rejet d'une phrase> | <demande d'information>

<réponse à : qui...> ::= TOUT LE MONDE ! | TOUTE FEMME ! | TOUT HOMME ! | VOUS ! | <nom propre> !

<réponse à : pourquoi...> ::= PARCE QUE JE SAIS RAISONNER. <et que>

<et que> ::= <vide> | ET QUE VOUS M'AVEZ DIT ' <phrase> ' <et que>

<rejet d'une phrase> ::= JE NE COMPRENDS PAS CETTE PHRASE. |
                         C'EST TROP COMPLIQUE. |
                         JE NE COMPRENDS PAS TRES BIEN. |
                         IL Y A UN PROBLEME DE SEXE. |
                         VOUS VOUS CONTREDISEZ. J'OUBLIE QUE VOUS M'AVEZ DIT : ' <phrase> '

<demande d'information> ::= VOUS ETES BIEN DE SEXE MASCULIN ? |
                            <nom propre> EST BIEN DE SEXE MASCULIN ? |
                            OUI ! OU NON !

5.2

PROGRAM AND EXPLANATION

Here is the whole of the PROLOG program which realises the system.

177 -AJOP( "ET" ,1 ,"Xs (Xs -AJOP("IMPLIQUE" ,1 ,"Xs163 -AJOP( "NON",2 ,"s ! -AJOP( " / " ,3 ,"Xs (Xs ! -AJOP(" ." ,4 ,"Xs163 ") ! -AJOP( " - " ,5 ," (Ds163 !

** (O) IMBRICATION DES DIFFERENTES PHASES. +BAVARDONS -ECHANGES(1) -IMPASSE. +ECHANGES(*N) -PHRASELUE(*P) -AJOUTER(+(PHRASE(*N,*P)).NIL). +ECHANGES(*N) -PHRASE(*N,*P) -PRETRAITEMENT(*P,*Q) -ANALYSE(*Q,*R) -ENONCE(*R,*S,*N) -AJOUTER(*S). +ECHANGES(*N) -HALTE -MESSAGE("BONSOIR!") - / . +ECHANGES(*N) -ABSURDE(*R) -SORTIR(*R) -ELAGUER(*R,*S,*M,*P) -REPONSE(*N.*S,*M,*P). +ECHANGES(*N) -QUESTION -AJOUTER(+(KO(*N)).NIL). +ECHANGES(*N) -PLUS(*N,I,*M) -ECHANGES(*M). +ELAGUER(NIL,*S,IOOOO,*P) - / . +ELAGUER(*I-*U.*R,*S,*K,*P) - / -OK(*I) -DANS(*I,*S) -ELAGUER(*R,*S,*J,*P) -MIN(*I,*Jg*K). +ELAGUER(INDIVlDU(*X).*R,*S,*K,*P) - / -CONFIRMATION(*X) -ELAGUER(*R,*S,*K,*P). +ELAGUER(*P.*R,*S,*K,*P) -ELAGUER(*R,*S,*K,*P). +MIN(*I,*3,*I) -INF(*I,*3) - / . +MIN(*I,*J,*J).

+REPONSE(*S,*N,NIL) - / -AJDUTER(+(KO(*N)).NIL) -PHRASE(*N,*P) -MESSAGE("VOUS VOUS CONTREDISEZ. 3'OUBLIE QUE VOUS M'AVEZ DIT:") -DIRE("'".*P.' .... ) -LIGNE. +REPONSE(*S,*N,QUI(*X)) - / -VALR(*X,*P) 'MESSAGE(*P."!"). +REPONSE(*M.*S,*N,POURQUOI) -MESSAGE("PARCEQUE JE SAIS RAISONNER.") -ETQUE(*M,*S)~ +VALR(NIL-IN(NIL),"TOUT LE MONDE") - / . +VALR(F-IN(NIL),"TOUTE FEMME") - / . +VALR(M-IN(NIL),"TOUT HDMME") - / . +VALR(*G-JE,"VOUS") - / . +VALR(*G-IN(*A),*P) -UNIV(*A,*P). +ETQUE(*N,NIL) - / . +ETQUE(*N,*N.*S) -ETQUE(*N,*S). -/ +ETQUE(*N,*M.*S) -PHRASE(*M,*P) -DIRE("ET QUE VOUS M'AVEZ DIT: " . " ' " . * P . " ' " ) -LIGNE -ETQUE(*N,*S). ** (1) DEMANDED'INFORMATION SUPPLEMENTAIRE. +CONFIRMATION(*G-*I) -VAR(*G) - / . +CONFIRMATION(*G-*I) -NOMPROPRE(*H-*I) - / -PAREILS(*G,*H). +CONFIRMATION(*G-*I) -RECONNU(*H-*I) -AJOUTER(+(NOMPROPRE(*H-*I)).NIL) -PAREILS(*G,*H). +RECONNU(*G-JE) - / -MESSAGE("VOUS ETES BIEN DE SEXE MASCULIN?") -REPONSELUE(*G). +RECONNU(*G-IN(*A)) -UNIV(*A,*U) -MESSAGE(*U." EST BIEN DE SEXE MASCULIN?") -REPONSELUE(*G).


+REPONSELUE(*G) -PHRASELUE(*P) -RESULTAT(*P,*G). +RESULTAT("OUI!",M) - / . +RESULTAT("NON!",F)/ . +RESULTAT(*P,*G) -MESSAGE("OUI! O NON!") -REPONSELUE(*G). U ** (2) PRETRAITEMENT DE LA PHRASE. +PRETRAITEMENT(*U,*V) -SYN(UNITES(*V).NIL,*U).
:UNITES(*U.*X) : : :UNITES(NIL) ==. :ESPACE : : s :ESPACE == s :UNITE(*U) - / :ESPACE :UNITES(*X).

-/ -/

:ESPACE. :ESPACE.

:ESPACE =: s :ESPACE ==.

-/

:ESPACE.

:UNITE(*X) == s :UNITE(*K) =: s

-LETTRE(*K) - / :LETTRES(*U) -UNIV(*X,(*K.*U).NIL). -LETTRE(*K) - / :LETTRES(*U).

:LETTRES(*K.*U) : : s :LETTRES(NIL) ==.

** (3) ANALYSEDU FRANCAIS. +ANALYSE(*P,*Q) -SYN(PH(*Q).NIL,*P) - / -SORTIR(*Q). +ANALYSE(*P,*Q) -MESSAGE("JE NE C M R N S PAS CETTE PHRASE.") O PED -IMPASSE. :PH(*U.HALTE) == s s s -/. :PH(*U.DETAILS(*U)) == s s s -/. :PH(NDN *U.DETAILS(*U)) == s s s s -/. :PH(NON(*Q ET *U.DANS(QUI(*X),*U)) ET *V.QUESTION) == s -/ :NE :EST :PAS(*P.*Q) :CO(*X.*P) s :PH(NON(*R ET *U.DANS(PDURQUOI,*U)) ET *V.QUESTION) == s s s - / :QUE :SN(*X.*P.*Q) :NE :EST :PAS(*Q.*R) :CO(*X.*P) s :PH(*R) : : :SN(*X.*P.*Q) :NE :EST :PAS(*Q.*R) :CO(*X.*P) s :CD(*X.*Q) == :ART(*X.*P.*Q.ILYA(*I,*P ET *Q)) - / :NOM(*X.*Y.*P) :DE :SN(*Y.*P.*Q). :CO(*X.*Q) =: :SN(*Y.(*U.EGAL(*X,*Y,*U)).*Q). :SN(*X.*Q.*S) == :ART(*X.*P.*Q.*R) :NOM(*X.*Y.*P) - / :DE :SN(*Y.*R.*S). :SN(*G-*I.*P.TOUT(*G,TDUT(*I,*P))) == s s s -/. :SN(*G-*I.*P.(NON ILYA(*G,ILYA(*I,*P)))) == s -/. :SN(*G-*I.*P.ILYA(*G,ILYA(*I,*P))) == s s -/. :SN(*G-JE.*P.LE(*G,*U.NOMPROPRE(*G-JE),*P)) == s -/.
:SN(*G-IN(*A).*P.LE(*G,*U.NOMPR OPRE(*G-IN(*A)), *P) ) :ART(*XPQR) == s -VAL(*M,ART(*XPQR)) - / . : A R T ( * G - * I . * P . * Q . I L Y A ( * I , * P ET *Q)) s s :NOM(*XYP) == s :DE : : s :DE == s -VAL(*M,NOM(*XYP)). :DE s :: s -/. := s

== :MON(*G-*I) s

-/.

179 :EST : : s -/. :MON(M-*I) : : s -/. :NE : : s -/. :NE ==. :PAS(*P.(NON *P)) == s :QUE == s -/. :EST : : s :MON(F-*I) : : s :NE : : s - / . -/. :PAS(*P.*P) = : . :QUE == s

+VAL(*M,*V) -UNIV(*M,*A.NIL) -UNIV(*N,*A.*V.NIL) -*N. * * (4) DICTIONNAIRE.

+AMI(NOM(M-*I.*Y.*U.AMY(M-*I,*Y,*U))). +AMIE(NOM(F-*I.*Y.*U.AMY(F-*I,*Y,*U))). +AUCUN(ART(M-*I.*P.*Q.TOUT(*I,*P IMPLIQUEN N *Q))). O +AUCUNE(ART(F-*I.*P.*Q.TOUT(*I,*P IMPLIQUE N N *Q))). O +CHAQUE(ART(*G-*I.*P.*Q.TDUT(*I,*P IMPLIQUE *Q))). +FEMME(NOM(F-*I.M-*J.*U.EPOUX(M-*J,F-*I,*U))).
+FILLE(NOM(F-*I.*P)) -ENFAN(F-*I.*P). +FILS(NDM(M-*I.*P)) -ENFAN(M-*I.*P). +FRERE(NOM(M-*I.*P)) -FREUR(M-*I.*P). +L(ART(*G-*I.*P.*Q.ILYA(*I,*P ET * Q ) ) ) . +LA(ART(F-*I.*P.*Q.ILYA(*I,*P ET * Q ) ) ) . +LE(ART(M-*I.*P.*Q.ILYA(*I,*P ET * Q ) ) ) . +MARI(NOM(M-*I.F-*J.*U.EPOUX(M-*I,F-*J,*U))). +MERE(NDM(F-*I.*Y.*P)) -ENFAN(*Y.F-*I.*P). +NEVEU(NDM(M-*I.*Y.*P)) -ONTE(*Y.M-*I.*P). +NIECE(NOM(F-*I.*Y.*P)) -ONTE(*Y.F-*I.*P). +ONCLE(NOM(M-*I.*Y.*P)) -ONTE(M-*I.*Y.*P). +PERE(NOM(M-*I.*Y.*P)) -ENFAN(*Y.M-*I.*P). +SOEUR(NOM(F-*I.*P)) -FREUR(F-*I.*P). +TANTE(NOM(F-*I.*Y.*P)) -ONTE(F-*I.*Y.*P). +UN(ART(M-*I.*P.*Q.ILYA(*I,*P ET * Q ) ) ) . +UNE(ART(F-*I.*P.*Q.ILYA(*I,*P ET * Q ) ) ) .

+ENFAN(*X.*G-*I.*U.EGAL(*G-PAR(*G,*X),*G-*I,*U)). +FREUR(*X.*Y.(*U.EGAL(*G-PAR(*G,*X),*G-PAR(*G,*Y),*U) ET N N *V.EGAL(*X,*Y,*V))). O +ONTE(*X.*Y.ILYA(*G,ILYA(*I,*P ET *Q))) -FREUR(*X.*G-*I.*P) -ENFAN(*Y.*G-*I.*Q).


* * (5) CREATION D'ENONCES ELEMENTAIRES. +ENONCE(*R,+(*A).-VALIDE(*N-*V,*U).*C,*N) -SYN(ONA(*R,*N/*V/*U.*A).NIL,*C). :ONA(NON NON * P , * S ) := - / :ONA(*P,*S). :ONA(NON(*P ET *Q),*S) : : - / :ONA(*P IMPLIQUE NON *Q,*S). :ONA(NON(*P IMPLIQUE *Q),*S) == - / :ONA(*P ET NON *Q~*S). :ONA(NON TOUT(*I~*P),*S) == - / :ONA(ILYA(*I~NON * P ) , * S ) . :ONA(NON I L Y A ( * I , * P ) , * S ) == - / :ONA(TOUT(*IgNON * P ) , * S ) . :ONA(NON L E ( * I , * P , * Q ) , * S ) == - / :ONA(LE(*I,*P,NON * Q ) , * S ) . :ONA(NON *U.NOMPROPRE(*X),*N/1/*U.ABSURDE(*U)) == - / :SEXE(*Xg*U). :ONA(NON (1.*U).*P,*N/1/*U.ABSURDE(*U)) =: s -/. :ONA(*P ET *Q,*N/*I-O/*R) == :ONA(*P~G(*N)/*I/*R). :ONA(*P ET *Q,*N/O-*I/*R) == - / :ONA(*Q,D(*N)/*I/*R). :ONA(*P IMPLIQUE * Q , * N / * I - * J / * T ) == - / :ONA(NON * P , G ( * N ) / * I / * R )


:ONA(*Q,D(*N)/*J/*S) -COMBINAISON(*R,*S,*T). :ONA(TOUT(*I,*P),*N/*S) =: - / :ONA(*P,*I-*N/*S).


:ONA(ILYA(SK(*N),*P),*N/*S) =: - / :ONA(*P,D(*N)/*S). :ONA(ILYA(*I,*P),*S) : : - / :ONA(*P,*S). :DNA(LE(*I,*P,*Q),*S) == - / :ONA(TOUT(*I,*P IMPLIQUE *Q),*S). :ONA((2.*U).*P,*N/1/*U.*P) ==. +COMBINAISON(*U.*P,*U.ABSURDE(*U),*U.*P) - / . +COMBINAISDN(*U.ABSURDE(*U),*U.*P,*U.*P) - / . +COMBINAISDN(*P,*Q,*R) -MESSAGE("C'EST TROP COMPLIQUE.") -IMPASSE. :SEXE(*X,*U) : : :SEXE(*G-*I,*U) -MESSAGE("IL :SEXE(*G-*I,*U) :SEXE(*X,*U) == -NOMPROPRE(*X) - / . : : -NOMPROPRE(*H-*I) - / Y A UN PROBLEMEDE SEXE.") -IMPASSE. == s -VAR(*G) - / . -AJOUTER(+(NOMPROPRE(*X)).NIL).

** (6) REGLESDE RAISDNNEMENT. +VALIDE(*N-*U,*V) -OK(*N) -NOUVEAU(*N-*U,*V). +OK(*N~ -KO(*N) - / -IMPASSE. +OK(*N).

+NOUVEAU(*A,*B.*U) -VAR(*B) - / -PAREILS(*A,*B). +NOUVEAU(*A,*A.*U) - / -IMPASSE. +NOUVEAU(*A,*B.*U) -NOUVEAU(*A,*U).


+DANS(INDIVIDU(*X),*U) -NOMPROPRE(*X) - / . +DANS(INDIVIDU(*G-*I),*U) -NOMPROPRE(*H-*I) - / -IMPASSE. +DANS(*X,*X.*U) - / . +DANS(*X,*Y.*U) -DANS(*X,*U).

+ABSURDE(*U) -EGAL(*G-IN(*A),*G-IN(*B),I.*U) -PASPAREILS(*A,*B). +ABSURDE(*U) -EGAL(M-*I,F-*J,I.*U). +EGAL(*X,*X ,1 .*U). +EGAL(*X ,*Y ,1 .*U) -EGAL(*X,*Z ,2 .*U) -EGAL(*Z,*Y ,1 .*U). +EGAL(*X,*Y ,1 .*U) -EGAL(*Z,*X ,2 .*U) -EGAL(*Z,*Y ,1 .*U). +AMY(*X,*Y,I.*U) -EGAL(*X,*R,I.*U) -EGAL(*Y,*S,I.*U) -AMYBIS(*R,*S,*U). +AMYBIS(*X,*Y,*U) -AMY(*X,*Y,2.*U). +AMYBIS(*X,*Y,*U) -AMY(*Y,*X,2.*U).
+EPOUX(*X,*Y,I.*U) -EGAL(*X,*R,I.*U) -EGAL(*Y~*S,I.*U) -EPOUX(*R,*S,2.*U).

+DETAILS(I.*U) -DETAILS(2.*U). +PASPAREILS(*X,*Y) -PAREILS(*X,*Y) - / -IMPASSE. +PASPAREILS(*X,*Y).


+PAREILS(*A,*A). ** (7) LECTURESET ECRITURES.

+PHRASELUE(*L.*U) - / -LIGNE -DIRE("MOI. -") -LIGNE -LUB(*K)


-TR(*K.NIL,*L) -SUITELIRE(*L,*U). +SUITELIRE(PBINT,NIL) - / . +SUITELIRE(!,NIL)- / . +SUITELIRE(?,NIL) - / . +SUITELIRE(*K,*M.*U) -LU(*L) -TR(*L.NIL,*M) -SUITELIRE(*M,*U). +AJOUTER(*P) -DETAILS(I.*U) -/ -AJOUT(*P) -MESSAGE("J'ENREGISTRE:") -SORC(*P) -LIGNE. +AJBUTER(*P) -AJOUT(*P). +SORTIR(*U) -DETAILS(I.*V) -/ -MESSAGE("JETROUVE:") -SORT(*U) -LIGNE. +SORTIR(*U). +MESSAGE(*U) -LIGNE -DIRE("LA MACHINE. - "} -DIRE(*U) -LIGNE. +DIRE(NIL) - / . +DIRE(*U.*V) - / -DIRE(*U) -DIRE(*V). +DIRE(*U) -TR(*V.NIL,*U) -ECRIT(*V). +TR(".",POINT) - / . +TR("-",TBAIT) -/. +TR(.... ,BLANC) -/. +TR(*X.NIL,*X).


The program is composed of 8 parts. Here is the explanation relative to each of them.

(0) Imbrication of the different phases (imbrication des différentes phases). The system functions by executing a series of exchanges. Each exchange consists principally : in reading a sentence and recording it, in transforming it into a sequence of elementary statements (which are regular clauses) and retaining them, in starting a proof from the literal -ABSURDE(*R), in computing from *R a possible reply and possibly in suppressing certain clauses which render the recorded information absurd. In this last case, it suppresses the clauses deriving from the oldest sentences.

(1) Request for information (demande d'information). The system manages by itself the dictionary of proper nouns and of their genders. In certain cases, when it has not been able to determine the gender of a proper noun and needs it in its deductions, it requests this information directly from the interlocutor.

(2) Pretreating of the sentence (prétraitement de la phrase). After reading a sentence presented in the form of a sequence of characters, the system eliminates some of them and produces a sequence of words and punctuation marks. This is done by means of a small metamorphosis grammar.
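The effect of this phase can be approximated by a short Python sketch. It is not a transcription of the metamorphosis grammar: the function name pretreat is hypothetical, and the treatment of the apostrophe (kept with the elided word, so that L', N', D' and QU' come out as units) and of the hyphen (EST-CE kept whole) are assumptions chosen to match the terminals of the sentence grammar of 5.1.

```python
import re

# A word is a run of letters, optionally hyphenated, optionally ending in an
# apostrophe; a punctuation mark is one of . ! ?  Blanks match nothing and
# are therefore eliminated.
UNIT = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*'?|[.!?]")


def pretreat(sentence):
    """Turn a sequence of characters into a sequence of words and marks."""
    return UNIT.findall(sentence)


statement = pretreat("  L'AMI DE  MA SOEUR EST CURIACE.")
question = pretreat("POURQUOI EST-CE QUE JE NE SUIS PAS DIEU?")
```

The result is a flat list of units, which is the form the analysis phase expects as its input string.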

(3) Analysis of French (analyse du français). The analysis of each sentence is effected by means of a metamorphosis grammar directly modelled on the formal definition of our subset of French. Each deep structure obtained is a formula of the following type :

<formule> ::= ( NON <formule> ) |
              ( <formule> ET <formule> ) |
              ( <formule> IMPLIQUE <formule> ) |
              ILYA ( <indice> , <formule> ) |
              TOUT ( <indice> , <formule> ) |
              LE ( <indice> , <formule> , <formule> ) |
              <variable> . <formule élémentaire>

<formule élémentaire> ::= AMY ( <personne> , <personne> , <variable> ) |
                          DANS ( POURQUOI , <variable> ) |
                          DANS ( QUI ( <personne> ) , <variable> ) |
                          DETAILS ( <variable> ) |
                          EGAL ( <personne> , <personne> , <variable> ) |
                          EPOUX ( <personne> , <personne> , <variable> ) |
                          HALTE |
                          NOMPROPRE ( <personne> ) |
                          QUESTION

<indice> ::= <variable> | F | M

<personne> ::= <genre> - <nom>

<genre> ::= F | M | <variable>

<nom> ::= <nom propre> | PAR ( <genre> , <personne> ) | <variable>

<variable> ::= "PROLOG variable"

(4) Dictionary (dictionnaire). The dictionary associates to each article and each noun a "syntactical-semantic" formula. All the relations of parenthood are expressed in terms of the relations EPOUX (spouse) and EGAL (equal) by introducing the term PAR(*G,*X) which represents the parent of sex *G of the individual *X.

(5) Creation of elementary statements (création d'énoncés élémentaires). A metamorphosis grammar permits the transformation of each deep structure into a set of elementary statements. This is essentially an algorithm of the "skolemisation" type which also produces regular clauses. The positive literal is always placed at the head ; if there is none we create the literal +ABSURDE(*U) ; if there are several, the machine emits the message "C'EST TROP COMPLIQUE" (it's too complicated). Moreover, the literal -VALIDE(*N-*T,*U) is systematically inserted after the first literal : it permits the control of deductions and allows us to simulate the suppression of a clause. In this literal *N designates the number of the sentence from which the clause comes ; *T is a term permitting differentiation between several clauses coming from the same sentence ; *U is a variable appearing in all the literals and which, in the course of a demonstration, permits the cumulation of certain results and in particular the listing of all the clauses instrumental in this demonstration.

(6) Rules of reasoning (règles de raisonnement). In order to reason, the machine knows axioms of equality ; it supposes that a person can have only one name and only one sex, and that, if a is the friend of b, then b is the friend of a.

(7) Reading and writing (lectures et écritures). This concerns the management of the input-output device.

5.3

EXAMPLE OF CONVERSATION

Here now is an example of a conversation. When the machine prints intermediate results, it gives all the clauses it is recording, the deep structure of each sentence and the result of the deductions from which it produces its replies.


-BAVARDONS!

MOI. - DIEU EST L'AMI DE TOUT LE MONDE.

MOI. - DIEU EST L'AMIE DE TOUT LE MONDE.
LA MACHINE. - IL Y A UN PROBLEME DE SEXE.

MOI. - DES DETAILS!

MOI. - JE SUIS HORACE.
LA MACHINE. - J'ENREGISTRE:
+PHRASE(4,J.E.BLANC.S.U.I.S.BLANC.H.O.R.A.C.E.POINT.NIL).
LA MACHINE. - JE TROUVE:
LE(*X0,*X1.NOMPROPRE(*X0-JE),LE(*X2,*X3.NOMPROPRE(*X2-IN(HORACE)),*X4.EGAL(*X0-JE,*X2-IN(HORACE),*X4)))
LA MACHINE. - J'ENREGISTRE:
+EGAL(*X0-JE,*X1-IN(HORACE),2.*X2)-VALIDE(4-(1-(1-1)),*X2)-DANS(INDIVIDU(*X0-JE),*X2)-DANS(INDIVIDU(*X1-IN(HORACE)),*X2).
LA MACHINE. - JE TROUVE:
4-(1-(1-1)).INDIVIDU(M-JE).INDIVIDU(F-IN(HORACE)).*X0
LA MACHINE. - VOUS ETES BIEN DE SEXE MASCULIN?
MOI. - OUI!
LA MACHINE. - J'ENREGISTRE:
+NOMPROPRE(M-JE).
LA MACHINE. - HORACE EST BIEN DE SEXE MASCULIN?
MOI. - OUI!
LA MACHINE. - J'ENREGISTRE:
+NOMPROPRE(M-IN(HORACE)).

MOI. - PAS DE DETAILS!
LA MACHINE. - J'ENREGISTRE:
+PHRASE(5,P.A.S.BLANC.D.E.BLANC.D.E.T.A.I.L.S.!.NIL).
LA MACHINE. - JE TROUVE:
NON *X0.DETAILS(*X0)
LA MACHINE. - J'ENREGISTRE:
+ABSURDE(*X0)-VALIDE(5-1,*X0)-DETAILS(1.*X0).
LA MACHINE. - JE TROUVE:
5-1.3-1.*X0

LA MACHINE. - J'ENREGISTRE:
+KO(3).
LA MACHINE. - VOUS VOUS CONTREDISEZ. J'OUBLIE QUE VOUS M'AVEZ DIT:
'DES DETAILS!'

MOI. - POURQUOI EST-CE QUE JE NE SUIS PAS DIEU?
LA MACHINE. - PARCE QUE JE SAIS RAISONNER.
ET QUE VOUS M'AVEZ DIT: 'JE SUIS HORACE.'
LA MACHINE. - PARCE QUE JE SAIS RAISONNER.
ET QUE VOUS M'AVEZ DIT: 'JE SUIS HORACE.'

MOI. - L'AMI DE MA SOEUR EST CURIACE.
LA MACHINE. - CURIACE EST BIEN DE SEXE MASCULIN?
MOI. - ZOZO EST MON FILS.
LA MACHINE. - OUI! OU NON!
MOI. - OUI!

MOI. - ZOZO EST MON FILS.

MOI. - QUI EST L'AMI DE LA TANTE AU ZOZO.
LA MACHINE. - JE NE COMPRENDS PAS CETTE PHRASE.

MOI. - QUI EST L'AMI DE LA TANTE DE ZOZO?
LA MACHINE. - DIEU!
LA MACHINE. - CURIACE!
LA MACHINE. - DIEU!
LA MACHINE. - CURIACE!
LA MACHINE. - DIEU!
LA MACHINE. - CURIACE!

MOI. - POURQUOI EST-CE QUE CURIACE EST L'AMI DE LA TANTE DE ZOZO?
LA MACHINE. - PARCE QUE JE SAIS RAISONNER.
ET QUE VOUS M'AVEZ DIT: 'L'AMI DE MA SOEUR EST CURIACE.'
ET QUE VOUS M'AVEZ DIT: 'ZOZO EST MON FILS.'

LA MACHINE. - PARCE QUE JE SAIS RAISONNER.
ET QUE VOUS M'AVEZ DIT: 'L'AMI DE MA SOEUR EST CURIACE.'
ET QUE VOUS M'AVEZ DIT: 'ZOZO EST MON FILS.'
LA MACHINE. - PARCE QUE JE SAIS RAISONNER.
ET QUE VOUS M'AVEZ DIT: 'JE SUIS HORACE.'
ET QUE VOUS M'AVEZ DIT: 'L'AMI DE MA SOEUR EST CURIACE.'
ET QUE VOUS M'AVEZ DIT: 'ZOZO EST MON FILS.'

MOI. - SATAN EST MON AMI.

MOI. - QUI EST L'AMI DE DIEU?
LA MACHINE. - DIEU!
LA MACHINE. - TOUT HOMME!
LA MACHINE. - VOUS!
LA MACHINE. - VOUS!
LA MACHINE. - CURIACE!
LA MACHINE. - HORACE!
LA MACHINE. - HORACE!

MOI. - SATAN N'EST L'AMI DE PERSONNE.
LA MACHINE. - VOUS VOUS CONTREDISEZ. J'OUBLIE QUE VOUS M'AVEZ DIT:
'SATAN EST MON AMI.'
LA MACHINE. - VOUS VOUS CONTREDISEZ. J'OUBLIE QUE VOUS M'AVEZ DIT:
'DIEU EST L'AMI DE TOUT LE MONDE.'

MOI. - QUI N'EST PAS MON AMI?
LA MACHINE. - SATAN!
LA MACHINE. - SATAN!
LA MACHINE. - SATAN!

MOI. - L'AMI DE LA SOEUR DE MA NIECE N'EST PAS MON AMI.
LA MACHINE. - C'EST TROP COMPLIQUE.

MOI. - AU REVOIR!
LA MACHINE. - BONSOIR!


THE THEORY AND PRACTICE OF AUGMENTED TRANSITION NETWORK GRAMMARS

Madeleine Bates
Boston University Mathematics Department
Boston, Massachusetts 02115 / USA
and
Bolt Beranek and Newman Inc.
Cambridge, Mass. 02138 / USA

1. INTRODUCTION

For the last eight years augmented transition network (ATN) grammars have been used in natural language understanding systems and question answering systems for both text and speech. They have proved to be flexible, easy to write and debug, able to handle a wide variety of syntactic constructions, and easy to interface to other components of a total system. They provide a useful way to give an account of linguistic structures which can be easily communicated to both humans and computers, and they may be (partially) presented by easily visualized diagrams. One does not need to know how to program a computer in order to write or use an ATN grammar. This fact makes it easy for linguists to learn about ATNs. Even linguists without access to a computer have found ATNs useful for the description of several languages [Grimes, 1975]. ATN grammars can be written for languages other than English, as well. Although English will be used for examples throughout this paper, most of the techniques presented will be useful for other languages.

Augmented transition network grammars were developed by William Woods [Woods, 1969, 1970, 1973; Woods et al, 1972], although similar but less well developed models appeared independently in earlier work [Thorne et al, 1968; Bobrow and Fraser, 1969]. The advantages of ATNs may be summarized as 1) perspicuity, 2) generative power, 3) efficiency of representation, 4) the ability to capture linguistic regularities and generalities, and 5) efficiency of operation.

Much has been written about both the ATN formalism and various applications, but unfortunately many of these sources are not widely available. This paper attempts to bring together the primary content of those sources and to provide the reader with examples of several styles of ATN grammars. It is intended to be a guide for those who wish to learn enough about ATNs to implement their own. I have attempted to describe not only the basic mechanism but also recent advances and applications and unsolved problems. I have also tried to indicate time/space/clarity tradeoffs since no one grammar or style of grammar is suitable for all purposes.

The following section introduces the ATN formalism and discusses several types of parsers for ATN grammars. The first part may be skipped by readers familiar with the presentation of ATNs given in [Woods, 1970] and the second may be skipped by those not interested in the details of parser implementation. There follows a discussion of some of the issues and tradeoffs involved in writing ATN grammars for a variety of purposes and illustrations of how some of the syntactic constructions of English may be handled. The paper concludes with a suggested procedure to follow in order to formulate an ATN grammar.

2. THE ATN FORMALISM

A basic transition network (BTN) grammar looks like a collection of finite state transition diagrams; each is a directed graph with labeled states and labeled arcs, a distinguished start state, and a set of distinguished final states. The label on an arc indicates the type (usually the syntactic category, i.e., part of speech) of input which will allow the transition to be made to the next state. An input sequence is said to be accepted by the network if, beginning in the start state, some path (sequence of arcs) can be followed which terminates on a final state.

The network differs from a finite state automaton in that it permits recursion by allowing the label on some arcs to be a nonterminal rather than a terminal symbol. That is, the label on some arcs may call not for a word of input but for a constituent which is found by recursively re-applying the network beginning with an indicated new start state. When such a recursive arc is encountered, the current computation is pushed onto a stack and a new process is begun to look for the desired constituent. When a final state is reached in this lower level, the stack is popped and the suspended computation is continued. The input pointer, in the meantime, will have been moved to a later point in the sentence (just after the accepted constituent) by the lower level process. An attempt to pop an empty stack when the input pointer is at the end of the input means that the sentence has been found acceptable.

Figure 1 illustrates such a basic transition network with two levels. In it and in all subsequent diagrams in this paper the following conventions are used. States are written as circles or ovals around the name (label) of the state. Start states are indicated by double circles (the initial state for the entire grammar is usually obvious, most likely the first one on the page) and final states are shown by the presence of an arc labeled POP which does not terminate on any other state. The arc types in Figure 1 will be described in detail later; JUMP indicates that a transition may be made without processing any input, CAT means that the current word in the input string must be of the indicated syntactic category (and is "consumed" as the arc is taken), and PUSH means that a recursive call is to be made beginning in the indicated state.

[Figure 1: A Small Grammar for Noun Phrases. Arc labels recoverable from the diagram: CAT ADJ, CAT N, PUSH PP/.]
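The way such a network is followed can be sketched in a few lines of Python. The network and lexicon below are hypothetical: they are loosely modelled on the noun phrase example but are not a transcription of Figure 1, and the names NETWORK, LEXICON, traverse and accepts are inventions of this sketch. The Python call stack plays the role of the pushdown stack, and a generator enumerates every path so that all alternatives are tried.

```python
# Hypothetical lexicon: word -> syntactic category.
LEXICON = {
    "the": "ART", "a": "ART", "new": "ADJ", "red": "ADJ",
    "law": "N", "books": "N", "men": "N", "wives": "N",
    "with": "PREP", "in": "PREP",
}

# Hypothetical network: state -> list of arcs (type, label, next state).
NETWORK = {
    "NP/":     [("CAT", "ART", "NP/ART"), ("JUMP", None, "NP/ART")],
    "NP/ART":  [("CAT", "ADJ", "NP/ART"), ("CAT", "N", "NP/N")],
    "NP/N":    [("CAT", "N", "NP/N"), ("PUSH", "PP/", "NP/N"),
                ("POP", None, None)],
    "PP/":     [("CAT", "PREP", "PP/PREP")],
    "PP/PREP": [("PUSH", "NP/", "PP/NP")],
    "PP/NP":   [("POP", None, None)],
}


def traverse(state, pos, words):
    """Yield every input position at which the constituent that was
    begun before reaching `state` at position `pos` can end."""
    for kind, label, nxt in NETWORK[state]:
        if kind == "CAT":
            # Consume the current word if it has the required category.
            if pos < len(words) and LEXICON.get(words[pos]) == label:
                yield from traverse(nxt, pos + 1, words)
        elif kind == "JUMP":
            # Move on without processing any input.
            yield from traverse(nxt, pos, words)
        elif kind == "PUSH":
            # Recursive call: find a lower level constituent, then continue
            # the suspended computation from where that constituent ended.
            for end in traverse(label, pos, words):
                yield from traverse(nxt, end, words)
        elif kind == "POP":
            yield pos


def accepts(phrase, start="NP/"):
    words = phrase.lower().split()
    return any(end == len(words) for end in traverse(start, 0, words))
```

With this network, accepts("The new red law books") and accepts("Men with wives") both hold, while accepts("the red") does not, since no noun is found. Renaming the states to A, B, C, ... would not change the behaviour, because the names serve only as keys of the table.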

The convention followed in this paper for labeling states is that state names will generally be composed of two parts separated by a slash. The first part indicates the type of constituent being processed (a noun phrase, for example); the second part indicates either how far through the constituent the parse has proceeded or the sort of construction which may occur next. For example, in a hypothetical grammar: S/IMP may mean that in parsing a sentence it has been discovered to be an imperative; NP/ADJ that in a noun phrase either an adjective has been found or we have gotten past the place where an adjective could occur; NP/CONJ? that in a noun phrase we are at a point where a conjunction may be expected; S/S that an entire sentence has been processed (a name like S/POP is also used in this situation to indicate that a POP, the termination of one level of the network, is about to be done). It is crucially important to remember that the state names have meaning only to the writer of the grammar; the parsing system does not use them as anything but unique identifiers for the set of arcs coming from them. Thus the states could just as well be named A, B, C, D, ... instead of S/, NP/HEAD, REL/PRO, ... without changing the operation of the parser. A grammar writer will discover, however, that the use of mnemonically accurate state names (by the above convention or any other) will greatly clarify the grammar for human use and will simplify the writing and debugging of the grammar.

A BTN grammar as described above is weakly equivalent to a context free grammar or a pushdown store automaton. It differs from strong equivalence only in its inability to characterize unbounded branching, as in

    ART ADJ ADJ ADJ ... ADJ N.

The grammar of Figure 1 corresponds to the following context free grammar (which uses empty productions; an empty alternative is written as nothing before the first bar):

    NP     --> ART? QUANT? ADJS? NMODS? N PPS?
    ART?   -->  | the | an | a | ...
    QUANT? -->  | all | some | ...
    ADJS?  -->  | ADJ ADJS?
    ADJ    --> pretty | red | ...
    NMODS? -->  | N NMODS?
    N      --> dog | girl | love | ...
    PPS?   -->  | PP PPS?
    PP     --> PREP NP
    PREP   --> of | in | with | by | ...

Both grammars can produce and can be used to accept sentences such as

    The new red law books.
    Each beautiful picture in the recent exhibit.
    Men with wives in professional careers. (ambiguous)
    The tallest boy in a group of students.
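The acceptance of these phrases can be checked mechanically. The following Python sketch is a minimal recognizer for a basic transition network, not the author's parser. It assumes a small hypothetical lexicon (`LEXICON`) and an arc table (`NETWORK`) whose state names are borrowed from the noun phrase listing given later in Figure 5, with all tests and actions omitted. The Python call stack plays the role of the pushdown stack, and generators supply the nondeterminism.

```python
# Hypothetical toy lexicon: word -> set of syntactic categories.
LEXICON = {
    "the": {"ART"}, "a": {"ART"}, "each": {"QUANT"},
    "new": {"ADJ"}, "red": {"ADJ"}, "beautiful": {"ADJ"},
    "recent": {"ADJ"}, "professional": {"ADJ"}, "tallest": {"ADJ"},
    "law": {"N"}, "books": {"N"}, "picture": {"N"}, "exhibit": {"N"},
    "men": {"N"}, "wives": {"N"}, "careers": {"N"}, "boy": {"N"},
    "group": {"N"}, "students": {"N"},
    "in": {"PREP"}, "with": {"PREP"}, "of": {"PREP"},
}

# Each state maps to its arcs: (arc type, label, next state).
NETWORK = {
    "NP/":      [("CAT", "ART", "NP/ART"), ("JUMP", None, "NP/ART")],
    "NP/ART":   [("CAT", "QUANT", "NP/QUANT"), ("JUMP", None, "NP/QUANT")],
    "NP/QUANT": [("CAT", "ADJ", "NP/QUANT"), ("JUMP", None, "NP/ADJ")],
    "NP/ADJ":   [("CAT", "N", "NP/N"), ("CAT", "N", "NP/ADJ")],
    "NP/N":     [("PUSH", "PP/", "NP/N"), ("POP", None, None)],
    "PP/":      [("CAT", "PREP", "PP/PREP")],
    "PP/PREP":  [("PUSH", "NP/", "PP/NP")],
    "PP/NP":    [("POP", None, None)],
}


def traverse(state, words, pos):
    """Yield each input position at which the level entered at `state` can POP."""
    for kind, label, nxt in NETWORK[state]:
        if kind == "POP":
            yield pos
        elif kind == "JUMP":                      # no input consumed
            yield from traverse(nxt, words, pos)
        elif kind == "CAT":                       # consume one word of the category
            if pos < len(words) and label in LEXICON.get(words[pos], set()):
                yield from traverse(nxt, words, pos + 1)
        elif kind == "PUSH":                      # recursive call to a lower level
            for after in traverse(label, words, pos):
                yield from traverse(nxt, words, after)


def tokens(sentence):
    return sentence.lower().rstrip(".").split()


def parse_count(sentence):
    """Number of distinct paths that consume the whole input."""
    words = tokens(sentence)
    return sum(1 for end in traverse("NP/", words, 0) if end == len(words))


def accepts(sentence):
    return parse_count(sentence) > 0
```

Under these assumptions all four example phrases are accepted, and the phrase marked ambiguous is found by two different paths, depending on which noun the second prepositional phrase is attached to.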

One can look at a BTN as a model of a context free grammar in regular expression form. Thus a regular expression rule [Woods, 1969] such as

    X -> (A) B C* D

can be represented by the BTN shown in Figure 2.

[Figure 2: Another Simple Network. Diagram not reproduced.]
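As an illustration of this correspondence, the rule X -> (A) B C* D can be written out as a table of states and arcs and checked against an ordinary regular expression. This is only a sketch: the state names `X/`, `X/A`, `X/B`, and `X/D` are invented here, since the diagram of Figure 2 is not available, and a label of `None` stands for a JUMP arc.

```python
import re

# Hypothetical state names; each arc is (symbol or None for JUMP, next state).
X_NET = {
    "X/":  [("A", "X/A"), (None, "X/A")],   # (A): take A, or jump past it
    "X/A": [("B", "X/B")],                  # B is required
    "X/B": [("C", "X/B"), ("D", "X/D")],    # C* is a self loop, then D
    "X/D": [],                              # final state (a POP would go here)
}
FINAL_STATES = {"X/D"}


def accepts(symbols, state="X/", pos=0):
    """Depth-first search over the network; True if all input is consumed
    on arrival at a final state."""
    if state in FINAL_STATES and pos == len(symbols):
        return True
    for label, nxt in X_NET[state]:
        if label is None:
            if accepts(symbols, nxt, pos):
                return True
        elif pos < len(symbols) and symbols[pos] == label:
            if accepts(symbols, nxt, pos + 1):
                return True
    return False


def accepts_regex(symbols):
    """The same language written as a Python regular expression."""
    return re.fullmatch("A?BC*D", "".join(symbols)) is not None
```

The two functions agree on every string over A, B, C, D, which is the sense in which the network is a model of the regular expression rule.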

But it is well known that context free grammars are not adequate for English. In addition, the basic grammar we have described is capable only of accepting or rejecting input strings; it cannot produce a structure which shows something about the relationships among the words.

To make a more powerful model, each arc is provided with a test and a sequence of actions, thus producing an augmented transition network. The test associated with an arc must be satisfied (in addition to the test for the label) for the arc to be taken, and the actions are executed as the arc is traversed. The actions construct pieces of structure (tree structures, case structures, etc.) and keep them in registers, which may be thought of in programming terms as variables local to the level of the grammar where they are set. Registers and their contents are available on subsequent arcs and can be changed, combined, copied, or added to as more of the input is processed.

The arrangement of states and arcs reflects the surface structure of acceptable input sentences, and the actions may permit rearrangements and embeddings to create a structure which can be quite different from the surface structure. This provides a very general transformational capability which can produce the same sort of deep structures as those of a transformational grammar, and it makes the ATN formalism equivalent in power to a Turing machine.

We now describe in detail the format and operation of the arcs of an ATN grammar. An arc is represented as a parenthesized list of elements which may be words or may themselves be lists. There are seven types of arcs as shown in Table 1. (Capitalized words are actual elements, lower case words in brackets are schemas or descriptions of elements which will be defined below, and * is the Kleene star operator which indicates zero or more occurrences of the previous element.)

The first element of each arc indicates its type. The interpretation of the second element depends on the type of the arc and will be explained below. The third element is an arbitrary test which must be satisfied in order for the arc to be taken. Actions, which may occur in any number on all arcs except POP arcs, generally manipulate information that is stored in registers. The register contents are either constants (often used as flags to be tested on later arcs) or pieces of structure. The structures are built using the previous register contents and/or the current item of input and/or features of the current item which are found in the dictionary. The last element of every arc type except POP and JUMP indicates which state of the grammar is to be considered next.

    (CAT <category> <test> <action>* (TO <nextstate>))
    (WRD <word> <test> <action>* (TO <nextstate>))
    (MEM <list> <test> <action>* (TO <nextstate>))
    (PUSH <state> <test> <pre-action>* <action>* (TO <nextstate>))
    (VIR <constit-type> <test> <action>* (TO <nextstate>))
    (JUMP <nextstate> <test> <action>*)
    (POP <form> <test>)

Table 1: The Form of Arcs of an ATN Grammar

A CAT arc may be taken if the current input word is of the (syntactic) category specified by the second element of the arc. A WRD arc specifies the exact word of input which is required, rather than a category, and a MEM arc is exactly like a WRD arc except that the input word must be one of the list of words which is the second element of the arc. (Some implementations eliminate MEM arcs but allow the second element of a WRD arc to be a list of words.) All three of these arcs "consume" input when they are taken, that is, they cause the input pointer to be advanced to the next word.

A JUMP arc specifies the state to which transition is to be made without "consuming" anything from the input string.

A VIR arc checks to see whether a constituent of the named type has been placed on the HOLD list by a HOLD action of some previous arc (see below).

A PUSH arc initiates a new, perhaps recursive, call to the network which begins in the indicated state and which looks for a constituent. A POP arc, which has no destination state, marks the state that it leaves as a terminal state for some level of the network; its second element indicates the form (usually some sort of syntactic structure) which is to be returned as the result of the analysis of the portion of the input parsed by the current level of the network. The POP causes control to return to the PUSH arc which caused the process at its level to be invoked. See Figure 3 for an example of the order in which the arcs of a path involving a PUSH and POP are traversed.

[Figure 3: The Operation of PUSH and POP Arcs. Diagram not reproduced.]

When the parser is operating, a number of registers are active. Whenever a PUSH occurs, this register list, along with other information, is saved on a stack while the parser operates recursively on the new (lower) level beginning with an empty register list. When a POP arc is taken, the stack is popped, wiping out the current (lower level) register list and restoring the register list which was current before the last PUSH. The constituent which was POPed (the value of the second element of the POP arc) then becomes the current input item for the rest of the PUSH arc.
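The saving and restoring of register lists can be sketched in a few lines of Python. The class and method names below (`RegisterStack`, `push`, `pop`) are illustrative assumptions rather than part of any ATN implementation described here, and the "other information" saved with the register list is left out.

```python
class RegisterStack:
    """Register lists for nested levels of a parse (illustrative only)."""

    def __init__(self):
        self.current = {}   # register list of the level now being parsed
        self.saved = []     # register lists of the suspended upper levels

    def setr(self, reg, value):
        self.current[reg] = value

    def getr(self, reg):
        # An unset register yields None, playing the role of NIL.
        return self.current.get(reg)

    def push(self):
        """On a PUSH arc: save the current list and start with an empty one."""
        self.saved.append(self.current)
        self.current = {}

    def pop(self, constituent):
        """On a POP arc: discard the lower list and restore the upper one.

        The returned constituent becomes the current input item for the
        remaining actions of the PUSH arc.
        """
        self.current = self.saved.pop()
        return constituent


# A small walk through one PUSH and POP.
regs = RegisterStack()
regs.setr("TYPE", "DCL")                      # set at the upper level
regs.push()                                   # enter the lower level
hidden_type = regs.getr("TYPE")               # None: the lower level starts empty
regs.setr("N", "MAYOR")
star = regs.pop(["NP", ["N", regs.getr("N")]])
regs.setr("SUBJ", star)                       # rest of the PUSH arc uses the result
```

After the POP the lower level's N register is gone, the upper level's TYPE register is visible again, and the popped constituent is available for storing in SUBJ.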

In most of the examples given in this paper, the type of structure produced by the grammar is a parse tree like the deep structure trees of transformational theory. To represent a tree structure in a linear form, it is written as a list of elements surrounded by parentheses; the first element is the root of the tree and the subsequent elements are the sons of the root which may be either terminal leaves or subtrees. Figure 4 shows both a standard tree and a linear representation which has been formatted for further clarity.

[Tree diagram not reproduced; it shows the same structure as the linear form below.]

    (S DCL
       (NP (ART AN)
           (ADJ OLD)
           (ADJ (NP (N MOUNTAIN) (NU SG)))
           (N LION)
           (NU SG))
       (TNS PAST)
       (VP (V CHASE)
           (NP (ART THE)
               (ADJ YOUNG)
               (N DEER)
               (NU SG/PL))))

Figure 4: Two Representations of a Parse Tree
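The linear form is easy to handle by program. As a sketch, assuming only that elements are separated by blanks and grouped by parentheses, the following Python functions read the linear representation of Figure 4 into nested lists and write it back out. The names `read_tree` and `to_linear` are invented for this illustration.

```python
LINEAR = (
    "(S DCL (NP (ART AN) (ADJ OLD) (ADJ (NP (N MOUNTAIN) (NU SG))) "
    "(N LION) (NU SG)) (TNS PAST) "
    "(VP (V CHASE) (NP (ART THE) (ADJ YOUNG) (N DEER) (NU SG/PL))))"
)


def read_tree(text):
    """Turn a parenthesized list into nested Python lists of strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def parse(i):
        if tokens[i] == "(":
            node = []
            i += 1
            while tokens[i] != ")":
                child, i = parse(i)
                node.append(child)
            return node, i + 1          # skip the closing parenthesis
        return tokens[i], i + 1         # a leaf

    tree, end = parse(0)
    if end != len(tokens):
        raise ValueError("extra material after the first complete list")
    return tree


def to_linear(node):
    """Inverse of read_tree: the first element is the root, the rest its sons."""
    if isinstance(node, list):
        return "(" + " ".join(to_linear(child) for child in node) + ")"
    return node


TREE = read_tree(LINEAR)
```

Here `TREE[0]` is the root S, and each later element is either a leaf such as DCL or a subtree such as the subject noun phrase.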

Other structures can be produced just as easily, for example, case representations:

    ((ACT: HIDE) (ACTOR: JOHN) (OBJECT: MONEY) (TIME: PAST) (LOCATION: FLOWER-POT))

tagmemic representations [modified from Grimes, 1975 p.169]:

    (NP,ANIM (DET:ART,DEF:THE)
             (MOD:ADJ:PLEASANT)
             (HEAD:NW,ANIM PL (NUC:STEM,ANIM:BOY)
                              (PER:SFX,PL:S)))

semantic representations:

    (FOR: ALL X1 / (FINDQ: TRIP (DESTINATION CHICAGO)
                                (TRAVELER BILL))
          ; (EXPENSIVE X1)
          : (OUTPUT: (TIMEOF: X1)))

dependency structures, stratificational analyses, and many others as well.

Figure 5 shows the details of the arcs which are needed to make the grammar of Figure 1 an ATN. It is a slightly edited listing of a computer file containing the grammar. Because the first ATN grammars were developed using LISP [Weisman, 1967; Teitelman, 1974] they were written in the LISP language form shown below. Each state is represented as a list whose first element is the state name and whose other elements are the arcs emanating from that state. Comments are in upper and lower case and begin with a *. (Readers not familiar with LISP should read Appendix I at this point.) It should be emphasized that ATN parsers and grammars may be written in a variety of programming languages, and that if one other than LISP is chosen it may be advisable to modify the notation.

    (NP/
      (CAT ART T
        (* T is a predicate which is always true)
        (SETR ART (BUILDQ ((ART *))))
        (TO NP/ART))
      (JUMP NP/ART T))

    (NP/ADJ
      (CAT N T
        (SETR N *)
        (SETR NU (GETF NUMBER))
        (TO NP/N))
      (CAT N T
        (ADDL ADJS (BUILDQ (ADJ (NP (N *) (NU #))) (GETF NUMBER)))
        (TO NP/ADJ)))

    (NP/ART
      (CAT QUANT T
        (SETR QUANT (BUILDQ ((QUANT *))))
        (TO NP/QUANT))
      (JUMP NP/QUANT T))

    (NP/QUANT
      (CAT ADJ T
        (ADDR ADJS (BUILDQ (@ (ADJ) # (*)) (GETF DEGREE)))
        (* This will add the form (ADJ SUPERLATIVE root) for words like
           BIGGEST and the form (ADJ word) if uninflected.)
        (TO NP/QUANT))
      (JUMP NP/ADJ T))

    (NP/N
      (PUSH PP/ (PPSTART)
        (* the test checks that the next word is a preposition)
        (ADDL NMODS *)
        (TO NP/N))
      (POP (BUILDQ (@ (NP) + + + ((N +)) ((NU +)) +)
                   ART QUANT ADJS N NU NMODS)
        (DETAGREE)
        (* the predicate DETAGREE tests for agreement between the ART and
           N registers to screen out "a books", "an table")))

    (PP/
      (CAT PREP T
        (SETR PREP *)
        (TO PP/PREP)))

    (PP/PREP
      (PUSH NP/ (NPSTART)
        (* predicate fails if the next word cannot begin a NP)
        (SETR NP *)
        (TO PP/NP)))

    (PP/NP
      (POP (BUILDQ (PP (PREP +) +) PREP NP) T))

Figure 5: Details of the Noun Phrase Grammar

The actions SETR, SENDR, and LIFTR are the basic actions of an ATN grammar, but implementors may extend this set. Here are descriptions of a number of actions which have been found convenient.

(SETR <reg> <form>) This causes the indicated register to be set to the value of the form.

(SETRQ <reg> <value>) This is like SETR except that the last element is not evaluated. It is equivalent to (SETR <reg> (QUOTE <value>)). Thus (SETR HEAD (NPCLAUSE)) sets the register named HEAD to the value returned by the function NPCLAUSE; (SETRQ HEAD (NPCLAUSE)) sets the register to a list which has one element, the word NPCLAUSE.

(ADDL <reg> <form>) This action takes the previous contents (which is expected to be a list) of the named register, adds the value of the form to the left end of the list, and sets the register to this new list. If the register has not been previously set, ADDL sets it to a list which contains the value of the form. It is equivalent to (SETR <reg> (CONS (EVAL <form>) (GETR <reg>))).

(ADDR <reg> <form>) This action is exactly like ADDL except that it adds elements to the right of the previous list. It is equivalent to (SETR <reg> (APPEND (GETR <reg>) (LIST (EVAL <form>)))).

ADDL and ADDR are useful for accumulating things like adjectives and conjuncts.
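The behavior of these register-setting actions can be mimicked with an ordinary dictionary. The Python functions below are a sketch under that assumption (a register list is a `dict`, an unset register reads as `None` in place of NIL, and the value passed in has already been evaluated); they are not taken from any particular ATN system.

```python
def setr(regs, reg, value):
    """SETR: set the register to an already evaluated value."""
    regs[reg] = value


def getr(regs, reg):
    """GETR: contents of the register, or None (standing in for NIL) if unset."""
    return regs.get(reg)


def addl(regs, reg, value):
    """ADDL: add the value at the left end of the list in the register."""
    regs[reg] = [value] + list(getr(regs, reg) or [])


def addr(regs, reg, value):
    """ADDR: add the value at the right end of the list in the register."""
    regs[reg] = list(getr(regs, reg) or []) + [value]


# Accumulating adjectives as successive arcs are taken.
regs = {}
addr(regs, "ADJS", ["ADJ", "OLD"])      # register was unset: one-element list
addr(regs, "ADJS", ["ADJ", "DUSTY"])    # appended on the right
addl(regs, "ADJS", ["ADJ", "RED"])      # added on the left
```

With ADDR the list keeps the order in which the words were read, while ADDL reverses it, so the choice between them determines the order of the accumulated constituents in the final structure.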

(SENDR <reg> <form>) This is a pre-action which is only used on PUSH arcs. It causes the register to be set to the value of form at the lower level of recursion about to be initialized by the PUSH. This, in effect, allows the lower level network to be like a subroutine to which parameters can be passed via SENDRs.

(SENDRQ <reg> <value>) This is to SENDR as SETRQ is to SETR.

(LIFTR <reg> <form>) This is the inverse of SENDR in that it sets the register to the value of the form at the level just above the current level.

(HOLD <constit-type> <form>) This places the indicated form on the HOLD list as a constituent of constit-type. The HOLD list is a global list which is accessible at all levels. This action together with VIR arcs constitute a mechanism for dealing with the phenomenon called left extraposition in transformational grammar theory. Examples of this will be given below.

(VERIFY <form>) Sometimes it is useful to have a second test on an arc. VERIFY evaluates its form as if it were a predicate. If the form fails, the arc is aborted just as if the condition on the arc had failed. (If this action is implemented, one may do away with the test component on arcs.)

A number of different <form>s may be used within an action. The basic ones are the variables * and LEX and the functions GETR, GETF, and BUILDQ, but others may be included. (In a LISP implementation it is useful to allow any LISP form.) These forms are evaluated as follows:

LEX is always set to the current word of input as it appears before any morphological analysis has been performed on it.

* refers to the current item of input. On a JUMP, POP, WRD, or MEM arc it has the same value as LEX. On a CAT arc it is the root form of the word; thus if LEX = "stopped", * = "stop". On a PUSH arc, * is the current input word for the test and the pre-actions, but on subsequent actions it is the value returned from the lower level computation which was initiated by the PUSH arc.

(GETR <reg>) returns the current contents of the indicated register. If the register has never been set, it returns the value NIL but does not cause an error.

(GETF <feature> <word>) checks the dictionary entry of the word and returns the value of the indicated feature. If the word is omitted, the current word of input is assumed. Thus (GETF NUMBER) may return SG or PL. It may also be used as a predicate for features without values; e.g., (GETF PASSIVE) will return true or false depending on whether or not the current word has the feature PASSIVE. It may be used without its second argument only on a CAT arc, where features are dependent on the syntactic category involved.
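To make the roles of LEX, * and GETF concrete, here is a small Python sketch. The dictionary layout (`DICTIONARY`, with a `root` and a table of `features` per word) and the entries in it are assumptions made for illustration; the text does not specify how the dictionary is stored.

```python
# Hypothetical dictionary entries: inflected word -> root form and features.
DICTIONARY = {
    "stopped": {"root": "stop", "features": {"TENSE": "PAST"}},
    "books":   {"root": "book", "features": {"NUMBER": "PL"}},
    "seen":    {"root": "see",  "features": {"PASSIVE": True}},
}


def star_on_cat_arc(lex):
    """Value of * on a CAT arc: the root form of the current word LEX."""
    return DICTIONARY.get(lex, {}).get("root", lex)


def getf(feature, word=None, current_word=None):
    """GETF: look up a feature of `word`, or of the current word if omitted.

    Returns None (standing in for NIL) when the feature is absent, so the
    result can also be used directly as a predicate.
    """
    target = word if word is not None else current_word
    return DICTIONARY.get(target, {}).get("features", {}).get(feature)
```

For example, with the current word "books", `getf("NUMBER", current_word="books")` plays the part of (GETF NUMBER), and `getf("PASSIVE", "seen")` is a feature without a value used as a predicate.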

(BUILDQ <template> <form>*) is a constructor that takes a template (an arbitrary fragment of structure containing constants and special marks) and a series of forms. It returns a piece of structure that results from substituting the values of the forms for the special marks in the template. The special marks are as follows:

+ expects the corresponding form to be a register name and causes the contents of the register to be substituted for the +.

# causes the corresponding form to be evaluated as a LISP form and substitutes the value for the #.

* causes the value of * to be substituted. It does not need a corresponding form.

@ appears in the form (@ x x ... x) and causes the lists which follow it to be appended together. It does not take a corresponding form.

As an example, if the DET register contained the list (DET THE) and the current word of input on a CAT arc were "books" then the form

    (BUILDQ (NP + (N *) (NU #)) DET (GETF NUMBER))

would return the structure

    (NP (DET THE) (N BOOK) (NU PL)).

If in addition to the registers just mentioned the ADJS register were set to ((ADJ OLD)(ADJ DUSTY)(ADJ RED)) then the form

    (BUILDQ (@ (NP +) + ((N *) (NU #))) DET ADJS (GETF NUMBER))

would produce the structure

    (NP (DET THE) (ADJ OLD) (ADJ DUSTY) (ADJ RED) (N BOOK) (NU PL)).
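The substitution rules for the four special marks can be sketched in Python with nested lists standing for LISP lists. This is an illustrative reading of the description above, not the original implementation. It assumes that forms are consumed left to right, that a form for + is a register name, that a form for # is a zero-argument callable (standing in for a LISP form to be evaluated), and that an unset register contributes nothing when spliced in by @.

```python
def buildq(template, forms, registers, star):
    """Fill a template, consuming one form for each + or # encountered."""
    pending = iter(forms)

    def fill(node):
        if node == "+":                       # contents of the named register
            return registers.get(next(pending))
        if node == "#":                       # value of an evaluated form
            return next(pending)()
        if node == "*":                       # current item of input
            return star
        if isinstance(node, list):
            if node and node[0] == "@":       # append the following lists
                result = []
                for part in node[1:]:
                    result.extend(fill(part) or [])
                return result
            return [fill(child) for child in node]
        return node                           # a constant

    return fill(template)


registers = {
    "DET": ["DET", "THE"],
    "ADJS": [["ADJ", "OLD"], ["ADJ", "DUSTY"], ["ADJ", "RED"]],
}


def getf_number():
    return "PL"          # what (GETF NUMBER) is taken to return for "books"


first = buildq(["NP", "+", ["N", "*"], ["NU", "#"]],
               ["DET", getf_number], registers, star="BOOK")

second = buildq(["@", ["NP", "+"], "+", [["N", "*"], ["NU", "#"]]],
                ["DET", "ADJS", getf_number], registers, star="BOOK")
```

Under these assumptions `first` and `second` reproduce the two structures given in the examples above.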

(ABORT) is a form which causes the arc to fail just as if the test on the arc had failed. It is useful in conditional expressions used as actions.

    (COND (<pred1> <e11> <e12> ... <e1n>)
          (<pred2> <e21> <e22> ... <e2m>)
          (<predi> <ei1> <ei2> ... <eij>))

This is the LISP way of writing a nested conditional statement. The <e>s may be either actions or forms. See Appendix I for details.

(QUOTE <value>) This is a LISP function which keeps value from being evaluated. See SETRQ above for an example.

It is probably useful to mention a few of the predicates which may be used as tests on the arcs (or in VERIFY or COND forms). The most common ones are NULLR and the Boolean functions AND, OR, NOT, and EQUAL, but the user will be even more likely to develop his own tests than to invent new actions and forms. Some useful predicates follow.

(NULLR <reg>) is true if the register has never been set or if it has been set to NIL.

(CHECKF <feature> <value>) succeeds if the value of feature for the current word is the indicated value; e.g., (CHECKF ROLE OBJ) is equivalent to (EQUAL (GETF ROLE) (QUOTE OBJ)).

(CATCHECK <word> <cat>) tests whether word is of the lexical category cat. It is useful to test words in registers, as in (CATCHECK (GETR V) (QUOTE MODAL)).

(ENDOFSENTENCE) succeeds only if there are no more words in the input string.

(x-AGREE <form> <form>) represents a family of predicates to check some sort of agreement between the forms, e.g., DET-N-AGREE, SUBJ-V-AGREE, etc.

(x-START) represents a set of "look-ahead predicates" which are useful on JUMP and PUSH arcs. For example, IMP-START could be used on a JUMP arc from the initial state of the grammar to test whether there were an untensed verb at the beginning of the sentence; similarly QSTART could be used to see if the input began with a question word.

NIL is a LISP constant which means false; it does not appear by itself as the test on an arc (since the arc would never be taken), but it is the value returned by a predicate function which fails.

T is a LISP constant which is used to represent true; however, since any non-NIL value in LISP is interpreted as true in a Boolean context, any function may be used as a predicate. For example, (GETR SUBJ) can be used not only in a form which uses the contents of the SUBJ register but also as a predicate which tests whether or not the SUBJ register has been set.

To test his understanding, the reader may wish to modify the grammar of Figures 1 and 5 to include pronouns, proper nouns, and/or possessives.

2a. A Detailed Example

In this section we present a more complex grammar and describe the processing of an actual sentence. Figures 6 and 7 show a grammar which, together with the noun phrase and prepositional phrase levels of Figures 1 and 5, can handle a large number of fairly complex English sentences. (This grammar is an elaboration of that given in [Woods, 1970].) Some of the sentences it can parse are:

    The girl on the red bus was wanted in several countries by the police.
    Will a boy scout help an old woman to cross the street?
    The mayor would not have wanted to be elected to the position of dog-catcher.
    The money was believed to have been hidden by a thief.
    Was the fire engine trying to get to the fire?
    A forest fire had been burning in western Colorado for several weeks.

The diagram of the grammar is fairly straightforward, but the details probably look very confusing to those readers new to ATNs. For this reason we will take the time to examine this grammar in detail. It is hoped that the reader will not just skim this section but will take pencil in hand and put as much effort into understanding the following explanation as was put into writing it.

Let us consider the parsing of the sentence "The mayor would not have wanted to be elected to the position of dog-catcher." We will give an overview of the purpose of all the arcs in the grammar, with special attention to the ones used in the parse of this sentence.

Beginning in state S/ we look for a noun phrase to serve, perhaps temporarily, as the subject of the sentence. If one is found, we assume the current sentence is declarative. (This is reasonable since our noun phrase grammar does not accept question determiners like "what" or "which".) If we cannot find a noun phrase at the beginning of the sentence, we assume that we are in a question. We will not give the details of the parse through the NP/ level (it would be a good exercise for the reader) but we assert that the PUSH arc succeeds and that the structure (NP (DET THE) (N MAYOR) (NU SG)) is returned from the lower level and is put in the SUBJ register. The TYPE register is set to the constant DCL to indicate that the sentence is declarative, and control moves to S/NP.

[Figure 6: A Grammar for Sentences. Diagram not reproduced; legible arc labels include PUSH NP/, CAT V, and JUMP.]

In state S/NP the CAT V arc cannot be taken with the word "would" in the input, but the CAT MODAL arc succeeds. The MODAL register is set to the form ((MODAL WILL)), using the root form of "would." (The need for the extra parentheses will be clear later.) The tense is recorded in the TNS register in the form (TNS PAST).

In state S/AUX we pick up the negative. The test on this arc ensures that the arc has not been taken before, thus making double negatives unacceptable. The NEG register is being used both as a flag for the test and as a piece of structure to be later incorporated in the parse structure. The test for "do" in the verb register is necessary to undo the effect of do-support in sentences like "Didn't I meet you in Pango-Pango?" If we were not permitted to have such a conditional action it would be necessary to have two CAT NEG arcs, one with the test for "do" and an unconditional action removing it from the V register and one with a condition that "do" not be in the V register.

    (S/
      (PUSH NP/ T
        (SETR SUBJ *)
        (SETRQ TYPE DCL)
        (TO S/NP))
      (JUMP S/NP T
        (SETRQ TYPE Q)))

    (S/NP
      (CAT V (GETF TNS)
        (SETR V *)
        (SETR TNS (BUILDQ (TNS #) (GETF TENSE)))
        (SETR PNCODE (GETF PNCODE))
        (TO S/AUX))
      (CAT MODAL T
        (SETR MODAL (BUILDQ ((MODAL *))))
        (SETR TNS (BUILDQ (TNS #) (GETF TENSE)))
        (TO S/AUX)))

    (S/AUX
      (CAT NEG (NULLR NEG)
        (SETRQ NEG (NEG))
        (COND ((EQUAL (GETR V) (QUOTE DO))
               (SETRQ V NIL)))
        (TO S/AUX))
      (JUMP VP/V (AND (GETR SUBJ)
                      (AGREE (GETR SUBJ) (GETR PNCODE))))
      (PUSH NP/ (NULLR SUBJ)
        (COND ((NOT (AGREE * (GETR PNCODE)))
               (ABORT)))
        (SETR SUBJ *)
        (TO VP/V)))

    (VP/V
      (CAT V (AND (GETF PASTPART)
                  (EQUAL (GETR V) (QUOTE BE)))
        (HOLD (QUOTE NP) (GETR SUBJ))
        (SETRQ SUBJ (NP (PRO SOMEONE)))
        (SETR AGFLAG T)
        (SETR V *)
        (TO VP/V))
      (CAT V (AND (GETF PASTPART)
                  (EQUAL (GETR V) (QUOTE HAVE)))
        (ADDR TNS (QUOTE PERFECT))
        (SETR V *)
        (TO VP/V))
      (CAT V (AND (GETF UNTENSED)
                  (GETR MODAL)
                  (NULLR V))
        (SETR V *)
        (TO VP/V))
      (CAT V (AND (GETF PRESPART)
                  (EQUAL (GETR V) (QUOTE BE)))
        (ADDR TNS (QUOTE PROGRESSIVE))
        (SETR V *)
        (TO VP/V))
      (JUMP VP/HEAD T
        (COND ((OR (GETR MODAL) (GETR NEG))
               (SETR AUX (BUILDQ ((@ (AUX) + +)) MODAL NEG))))))

    (VP/HEAD
      (JUMP VP/VP (GETF INTRANS (GETR V)))
      (PUSH NP/ (GETF TRANS (GETR V))
        (SETR OBJ *)
        (TO VP/OBJ))
      (VIR NP (GETF TRANS (GETR V))
        (SETR OBJ *)
        (TO VP/OBJ))
      (WRD TO (AND (GETF SCOMP (GETR V))
                   (NULLR AGFLAG))
        (SETR SPECIALSUBJ (GETR SUBJ))
        (TO VP/TO)))

    (VP/OBJ
      (JUMP VP/VP T)
      (WRD TO (GETF SCOMP (GETR V))
        (SETR SPECIALSUBJ (GETR OBJ))
        (TO VP/TO)))

    (VP/VP
      (PUSH PP/ T
        (ADDR VMODS *)
        (TO VP/VP))
      (WRD BY (GETR AGFLAG)
        (SETR AGFLAG NIL)
        (TO VP/BY))
      (POP (COND ((GETR OBJ)
                  (BUILDQ (S + + + (@ + (VP (V +) +) +))
                          TYPE SUBJ TNS AUX V OBJ VMODS))
                 (T
                  (BUILDQ (S + + + (@ + (VP (V +)) +))
                          TYPE SUBJ TNS AUX V VMODS)))
           T))

    (VP/BY
      (PUSH NP/ T
        (SETR SUBJ *)
        (TO VP/VP)))

    (VP/TO
      (PUSH COMP/ T
        (SENDR SUBJ (GETR SPECIALSUBJ))
        (SENDR TNS (TENSEOF (GETR TNS)))
        (SENDRQ TYPE COMP)
        (SETR OBJ *)
        (TO VP/VP)))

    (COMP/
      (CAT V (GETF UNTENSED)
        (SETR V *)
        (TO VP/V)))

Figure 7: Details of the Grammar for Sentences

Going back to our analysis, we find ourselves again in state S/AUX with the input pointer positioned at the word "have". Since we have something in the SUBJ register which agrees with the verb, we can follow the JUMP arc to state VP/V. If we had had no subject, as in "Will the clock strike thirteen?" the PUSH NP/ arc would try to find one.

Whatever the sentence, by the time state VP/V is reached the first verb (or modal) has been processed. In this state all the rest of the complex verb auxiliary structure is processed. All four CAT V arcs are self loops, implying that they may be taken multiple times in any order unless the conditions on the arcs make this impossible. The first arc may be taken only if the previous verb were some form of "be" and if the current verb may be passivized ("The promise was broken," "Would he have been killed?"); it places what we thought was the subject up to this point on a special HOLD list as a NP so it can be picked up later, probably as the object. We invent a dummy subject "someone" to put in the SUBJ register. This may not be correct, so we set a flag, AGFLAG, to indicate that if an agent is found in a by-phrase later in the sentence ("Juliet was loved by Romeo.") it should replace the dummy. This ability to put information in registers and later remove it on the basis of subsequent context is one of the best features of ATN grammars. Although this looks like we are making a decision about the role of a portion of input, and later reversing it, it can also be thought of as postponing a decision until a differential point is reached.

The second CAT arc in state VP/V requires that the previous verb be a form of "have" and that the current one be a past participle, as in "may have been killed" or "has surprised." It remembers the effect of the "have" by appending an indicator, PERFECT, to the TNS register and replaces the "have" in the V register by the current verb.

The third CAT arc may be taken only if the current word is untensed and is preceded by a modal, as in "Can I go to Washington?" and "He will emulate his superiors." The final CAT V arc handles present participles which are preceded by a form of "be," as in "He will be going to school" and "Was the clown crying?" It adds a PROGRESSIVE indicator to the TNS register and remembers the current verb. (For a sentence like "Has the gambler been losing?" the TNS register would now be (TNS PRESENT PERFECT PROGRESSIVE).)

Note that it would be possible to collapse all four CAT arcs into one arc which had a T test and a complicated conditional action. That would be slightly more efficient for time and space but would be less clear. This method of processing verb sequences may seem complex, but it is simpler than a set of context free rules to express the same constraints.

For our sample sentence, the third and second arcs would be taken, resulting in the following register settings:

    SUBJ:  (NP (DET THE) (N MAYOR) (NU SG))
    TYPE:  DCL
    MODAL: ((MODAL WILL))
    NEG:   (NEG)
    TNS:   (TNS PAST PERFECT)
    V:     WANT
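The effect of the four CAT V arcs on the registers can be traced with a short simulation. The Python sketch below is a deterministic simplification written for illustration: it tries the arcs in the order of Figure 7 and takes the first whose test succeeds, whereas an ATN parser may explore alternatives. The dictionary `VERBS` and its feature names are assumed entries, and the HOLD list is modeled as a plain Python list.

```python
# Hypothetical dictionary entries: word -> (root form, set of features).
VERBS = {
    "have":   ("HAVE", {"UNTENSED"}),
    "wanted": ("WANT", {"PASTPART"}),
    "been":   ("BE", {"PASTPART"}),
    "losing": ("LOSE", {"PRESPART"}),
    "broken": ("BREAK", {"PASTPART"}),
}


def vp_v_arc(regs, hold, word):
    """Apply the first applicable CAT V arc of state VP/V; False if none applies."""
    root, feats = VERBS[word]
    previous = regs.get("V")
    if "PASTPART" in feats and previous == "BE":            # passive
        hold.append(("NP", regs.get("SUBJ")))
        regs["SUBJ"] = ["NP", ["PRO", "SOMEONE"]]
        regs["AGFLAG"] = True
    elif "PASTPART" in feats and previous == "HAVE":        # perfect
        regs["TNS"] = regs["TNS"] + ["PERFECT"]
    elif "UNTENSED" in feats and regs.get("MODAL") and previous is None:
        pass                                                # verb after a modal
    elif "PRESPART" in feats and previous == "BE":          # progressive
        regs["TNS"] = regs["TNS"] + ["PROGRESSIVE"]
    else:
        return False
    regs["V"] = root                                        # (SETR V *)
    return True


# Registers as they stand on arrival at VP/V for the sample sentence.
mayor = {
    "SUBJ": ["NP", ["DET", "THE"], ["N", "MAYOR"], ["NU", "SG"]],
    "TYPE": "DCL",
    "MODAL": [["MODAL", "WILL"]],
    "NEG": ["NEG"],
    "TNS": ["TNS", "PAST"],
}
hold_list = []
taken = [vp_v_arc(mayor, hold_list, w) for w in ["have", "wanted"]]

# "Has the gambler been losing?" after "has" was taken as the first verb.
gambler = {"TNS": ["TNS", "PRESENT"], "V": "HAVE"}
gambler_hold = []
for w in ["been", "losing"]:
    vp_v_arc(gambler, gambler_hold, w)
```

With these assumed entries the sample sentence ends with V set to WANT and TNS set to (TNS PAST PERFECT), matching the register settings listed above, and the gambler sentence ends with (TNS PRESENT PERFECT PROGRESSIVE).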

Finally creates so

only

the

JUMP

arc to state V P / H E A D if either we get the MODAL ((AUX

may be taken.

This are

arc set,

an AUX r e g i s t e r sample

or NEG r e g i s t e r s NEG))

for our

sentence

(MODAL WILL)

in the AUX

register.

(This is a good e x a m p l e

of the use of @ in a BUILDQ.)

In state V P / H E A D the V register. VP/OBJ. ways

we k n o w that is

the head

verb of the sentence we jump

is in to

If the verb

intransitive

directly There

We can look

for an o b j e c t by p u s h i n g

if it is t r a n s i t i v e . for a noun phrase

are two ("I lost which

to get an object:

directly

my m i t t e n s . " ) was placed

or by p i c k i n g on the HOLD

up on the VIR arc the list in

noun

phrase

V P / V when we d i s c o v e r e d sown."). None of those

that the arcs the

sentence

was p a s s i v e

("The seed was

three of

can be taken

in our example,

but we will

postpone

discussion

last arc w h i l e d e s c r i b i n g

the rest of the network.

When we get to VP/OBJ, either jump directly

either

with or w i t h o u t

a direct

object,

we

to V P / V P or find the w o r d if the A G F L A G was set to

"to"

and go to VP/TO. that the

In state V P / V P we can, sentence it, it is passive, is

indicate

look for the real agent register

of the action.

If we find "someone") We can also verb. The

put in the SUBJ is reset

(wiping out the dummy another agent. the

and the AGFLAG

to p r e v e n t

finding phrases

pick up any number agent may be

of p r e p o s i t i o n a l with

modifying

interspersed

the p r e p o s i t i o n a l

phrases.

We now

postpone above.

discussion

of the POP arc and r e t u r n

to the place we left off

Returning to our sample sentence in state VP/HEAD, we find that the word in the input string is "to". Since the verb "want" in the V register is marked in the dictionary with the feature SCOMP, meaning it can take a sentential complement, and since we are not in a passive sentence, we can take the WRD TO arc to VP/TO. A new register, SPECIALSUBJ, holds the subject; the word "to" is not placed in any register since it does not need to appear in our final tree.

A complement of the type we are looking for looks like a declarative sentence beginning with an untensed verb which has no subject in the surface structure. However the deep structure subject of the embedded clause may be the subject of the upper level, e.g. "The sword swallower liked to eat greasy food". After the first verb, we may have in the rest of the complement anything which could occur in the verb phrase of a sentence, hence it is convenient to merge the end of the complement network with the end of the sentence network as shown abstractly in Figure 8.

[Figure 8: Merging Networks (the S/ network and the COMP/ network sharing a common tail)]

In order for this double use of a portion of the network to work correctly, we must be sure that the registers which are used in the tail of the network are set on both initial portions. To accomplish this, the PUSH COMP/ arc from VP/TO has three pre-actions which initialize the SUBJ, TNS, and TYPE registers at the lower level with values from the current level. On the path of our example, the SUBJ register is initialized with the contents of SPECIALSUBJ, which is the old subject. If we had reached this state via the WRD TO arc from VP/OBJ, we would be sending down the object to become the lower level subject ("Alice wanted the parrot to speak," "The prodigy was expected to win the prize." Notice the movement of "the prodigy" from the SUBJ to OBJ registers and then to SUBJ at a lower level!). TENSEOF is a function which will extract the tense, leaving behind the mood and aspect, if any.

Thus after the PUSH we find ourselves in state COMP/ in a lower level computation with three registers set; the registers which existed before the PUSH are hidden on the stack and are completely inaccessible. Taking the CAT V arc from COMP/ (since "be" is untensed) results in a transition to VP/V with the following registers set:

SUBJ:   (NP (ART THE) (N MAYOR) (NU SG))
TYPE:   COMP
TNS:    (TNS PAST)
V:      BE

Now the first CAT V arc may be taken. We put the mayor on the HOLD list and replace the SUBJ register by the dummy (NP (PRO SOMEONE)). Then the AGFLAG register is set, and we take the JUMP arc to VP/HEAD without setting the AUX register. Here the VIR arc removes the mayor from the HOLD list and puts him in the OBJ register.

NOTE: we did not really need to use the HOLD list here. We could have put the mayor into another register, say EXTRANP, in state VP/V and had a (JUMP VP/VP (GETR EXTRANP) (SETR OBJ (GETR EXTRANP))) arc in place of the VIR arc. The real power of the HOLD list is in holding constituents which must be picked up at an even lower level, e.g. "What painter did he want to have his portrait painted by?"

In VP/VP we push for the prepositional phrase "to the position of dog-catcher" and place the resulting structure (details left to the reader) into the VMODS register. Thus the registers are:

SUBJ:    (NP (PRO SOMEONE))
TYPE:    COMP
TNS:     (TNS PAST)
AGFLAG:  T
V:       ELECT
OBJ:     (NP (ART THE) (N MAYOR) (NU SG))
VMODS:   ((PP (PREP TO) (NP ...)))

Finally the POP arc in state VP/VP is taken. The BUILDQ creates the structure

(S COMP
   (NP (PRO SOMEONE))
   (TNS PAST)
   (VP (V ELECT)
       (NP (ART THE) (N MAYOR) (NU SG))
       (PP (PREP TO)
           (NP (ART THE) (N POSITION) (NU SG)
               (PP (PREP OF)
                   (NP (N DOG-CATCHER) (NU SG)))))))

This structure is returned to the configuration we were in at the time of the PUSH from state VP/TO, and the remaining actions place the structure in the OBJ register. Since there are no more words in the input string, the PUSH PP/ and WRD BY arcs cannot be taken, so the POP arc creates the following structure to return as the value of the successful parse:

(S DCL
   (NP (ART THE) (N MAYOR) (NU SG))
   (TNS PAST PERFECT)
   (AUX (MODAL WILL) NEG)
   (VP (V WANT)
       (S COMP
          (NP (PRO SOMEONE))
          (TNS PAST)
          (VP (V ELECT)
              (NP (ART THE) (N MAYOR) (NU SG))
              (PP (PREP TO) (NP ...))))))

which in tree form is:

[Tree diagram of the structure above: S dominates NP, TNS, AUX and VP; the VP dominates V WANT and the embedded S COMP, whose VP dominates V ELECT, the NP "the mayor" and the PP "to the position of dog-catcher".]

In this discussion we have been viewing the parsing process as a top down, non-deterministic process where, at any state, any arc whose label and test are satisfied may be taken. Of course, for purposes of illustration, the correct choice was made at every state. The next section will discuss other parsing algorithms which may be used with ATN grammars.

A few comments are in order about the inadequacies of the sample grammar just presented. For passive sentences, there is no check that the NP picked up in a by-phrase can really be the deep structure subject. This is not a check which can be made strictly on syntactic grounds. (Consider, for example, "Caesar was killed by Brutus" and "They will be married by the time the clock strikes ten.") If a semantic check were made on the head of the NP to see whether it could be the subject of the verb, incorrect parsing could be ruled out. Similarly, in cases like "The mountains were worn down (by the wind)." the dummy subject should be "something," not "someone." Again, a call to a function outside the parser could determine which form to use.

The complement structure in this grammar is also limited to the simplest cases. Some verbs should cause their subject rather than their object to be sent down:

John told his friend to leave.      (obj)
John promised his friend to leave.  (subj)

This feature, if noted in the dictionary, may be tested on the WRD TO arc in VP/OBJ to determine which NP to send down. This grammar will also accept various badly formed sentences, for example, "The house has have painted." Additional tests could be added to screen out these ill-formed strings, but only at the cost of complicating the grammar more than is desired for this introduction.

To show how large, and yet understandable, ATN grammars can grow, Figure 9 shows a diagram of the grammar which was used for the LUNAR question answering system [Woods et al, 1972]. This grammar can handle some forms of anaphoric reference, ellipsis, several types of complements, reduced and unreduced relative clauses, passives, comparatives, parenthetical expressions, and a number of other syntactic constructions.

ATN grammars large enough to be really useful are too large to be included in a paper such as this. Readers who are interested in seeing complete listings of large ATN grammars should consult other references. The final report of the LUNAR question answering system [Woods et al, 1972] contains a listing of the grammar shown in Figure 9 together with descriptions of many of the functions used in tests and actions on the arcs. Leal [Leal, 1975] gives a grammar based on tagmemic theory which is written in a semi-LISP notation that is particularly easy for non-LISP-users to follow. Bates gives a listing of the large syntactic grammar used in the BBN speech understanding system in [Bates, 1975] and diagrams of a portion of the semantic grammar for the same system in [Woods et al., 1976]. Anyone who knows Lamnsok (a Benue-Congo language spoken in part of the United Republic of Cameroun) will be interested in the verb cluster grammar given in [Grebe, 1975]. Grimes [Grimes, 1975] gives an ATN diagram followed by two sets of arcs and actions: one produces tagmemic structures and the other regular parse trees.

[Figure 9: The Grammar for the LUNAR System (transition network diagram, continued over three pages)]

2b. Formal Properties of ATNs

It can be shown that the ATN formalism described above is equivalent in power to a Turing Machine. However, so are many unwieldy and inefficient formal systems. It has also been shown [Woods, 1969] that it is possible to eliminate direct left and right recursion and to optimize an ATN network using finite state optimization techniques.

Earley's algorithm for context free recognition [Earley, 1970] follows all alternatives in parallel, using a particularly clever method of merging paths. It can parse strings in an amount of time which is bounded by Kn^3 where n is the length of the input string and K is a constant of proportionality dependent on the grammar (but not on the input); for linear grammars the recognition time is bounded by n^2, and for LR(k) grammars the bound is n.

There are two interesting things to note about Earley's algorithm. First, it works for context free grammars in any form (not just a normalized form) and it automatically achieves the smaller bounds for the linear or LR(k) grammars (without having to be told that the grammar is so restricted). Second, an ATN grammar can be recognized by a modification of Earley's algorithm which maintains the time bounds. That is, removing recursion from an ATN causes the upper time bound to be reduced from n^3 to n^2 or even to n. Simple finite state optimization may be used to reduce the constant of proportionality in the n^3 case.
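To make the idea of following all alternatives in parallel concrete, here is a minimal sketch of an Earley-style recognizer for plain context free grammars. It is written in Python rather than LISP, it is not the modified algorithm for ATNs mentioned above (there are no registers, tests or actions), and it assumes a grammar with no empty right-hand sides. The function name, the grammar encoding and the toy grammars are all illustrative assumptions.

```python
def earley_recognize(grammar, start, words):
    """Return True if `words` can be derived from `start`.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples).
    Any symbol that is not a key of `grammar` is treated as a terminal word.
    Assumes no empty right-hand sides.
    """
    n = len(words)
    # chart[i] holds items (lhs, rhs, dot, origin) ending at position i.
    chart = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))

    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                symbol = rhs[dot]
                if symbol in grammar:
                    # Predictor: expand the nonterminal after the dot.
                    new = [(symbol, r, 0, i) for r in grammar[symbol]]
                    target = chart[i]
                elif i < n and words[i] == symbol:
                    # Scanner: the next word matches the terminal.
                    new = [(lhs, rhs, dot + 1, origin)]
                    target = chart[i + 1]
                else:
                    continue
            else:
                # Completer: advance every item that was waiting for lhs.
                new = [(l, r, d + 1, o) for (l, r, d, o) in chart[origin]
                       if d < len(r) and r[d] == lhs]
                target = chart[i]
            for item in new:
                if item not in target:
                    target.add(item)
                    if target is chart[i]:
                        agenda.append(item)

    return any(lhs == start and dot == len(rhs) and origin == 0
               for (lhs, rhs, dot, origin) in chart[n])


GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("the", "N")],
    "N": [("mayor",), ("seed",)],
    "VP": [("V",), ("V", "NP")],
    "V": [("left",), ("lost",)],
}
LEFT_RECURSIVE = {"E": [("E", "+", "n"), ("n",)]}

print(earley_recognize(GRAMMAR, "S", "the mayor lost the seed".split()))
print(earley_recognize(LEFT_RECURSIVE, "E", ["n", "+", "n", "+", "n"]))
```

Paths are merged because each chart position is a set: an item reached by several different derivations is stored and processed only once, which is also why the left-recursive toy grammar does not loop.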

2c. Modifications to the Grammar Formalism

For particular applications, changes may easily be made in the grammar formalism (necessitating corresponding changes in the parser). The following changes have been incorporated in various ATN grammars.

1. Weights on arcs. A number can be placed on every arc, just after the test, to indicate either how likely the arc is to be taken from that state or how much information is likely to be gained from taking this arc. The parser can use this information to score alternative paths in order to pursue the best ones first. A variation of this would be to have an action on the arc which calculates (using all context available) a score for the parser to incorporate in its paths.

2. Actions on POP arcs. Such actions could be used to lift register contents to the next level or to ready registers for building the final structure. If actions are not permitted here, they may have to be duplicated on each arc entering the final state.

3. Factorization of tests. For the BBN speech parser [Bates, 1975b, 1976] each arc test was split into two tests; one was context free (testing only features of the current word) and the other was context-sensitive (involving register contents).

4. Grouping of arcs. The Burton grammar-compiler (see below) allows a subset of the arcs from any state to be grouped in a list, which means that whenever one arc from that set is taken the others will necessarily fail, so they do not have to be examined even if the parse should back up over the arc which was originally taken. For example, a set of mutually exclusive WRD arcs should be so grouped. This is purely an efficiency measure.

5. New actions. Because of the EVAL feature of LISP, any LISP form can be used as an action on an arc. If this feature is not available, a fixed set of actions must be known to the interpreter, but the designer of the parser/grammar combination still has the freedom to specify actions particular to his domain before implementation. If the parser is to be interfaced with another program, say a semantic interpretation routine, it is convenient to be able to call that program by an action (or whatever) on an arc. In that way the results of the interpretation may be used to guide further parsing.

3. PARSERS FOR ATN GRAMMARS

Because an ATN grammar is separate from its parser, a number of different parsing algorithms can be used, just as there exist top down, bottom up, table driven, and other types of parsers for context-free grammars. In fact, almost any classical context free parsing algorithm could be adapted to ATNs by providing a mechanism to carry along register contents and to perform the tests and actions on arcs of the grammar. Parsers can be tailored for speed, debugging aids, or special requirements of the application for which they will be used.

Although all the parsers described below have been implemented in LISP, there is no reason why one could not be written in a high level language such as PL/I, ALGOL, PASCAL, or BCPL, or even in an assembler language. The language should be recursive or at least able to simulate recursion, and the ability to evaluate a portion of the grammar as if it were a part of a program is a very desirable, though not absolutely necessary, feature.

3a. ATN Interpreters

The first parsers which were written for ATNs (and the easiest to reproduce) act like interpreters for the grammar-language, using the input string and dictionary as data. The simplest ATN parser to build is a top down, depth first interpreter similar to a parser for a context free grammar. In this model the arcs from each state are tried in the order in which they are written. This allows the grammar writer to deliberately order the arcs so that the most likely ones come first and thus gain control over the order in which alternative paths are tried. Such a parser may be written using less than 5000 LISP cells.
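A top down, depth first interpreter of this kind can be sketched very compactly. The version below is in Python rather than LISP and is only an illustration: the toy grammar, the lexicon, the arc encoding (kind, label, next state, action) and every function name are assumptions made for the example, and only CAT, JUMP, PUSH and POP arcs without tests are supported. Backup is obtained from Python generators instead of an explicit list of alternatives.

```python
LEXICON = {"the": "DET", "mayor": "N", "seed": "N", "left": "V", "lost": "V"}


def setr(name):
    """Return an arc action that stores the arc's value in register `name`."""
    return lambda regs, value: {**regs, name: value}


def keep(regs, value):
    """Arc action that leaves the registers unchanged."""
    return regs


def build_np(regs, value):
    return ["NP", ["DET", regs["DET"]], ["N", regs["N"]]]


def build_s(regs, value):
    tree = ["S", regs["SUBJ"], ["V", regs["V"]]]
    if "OBJ" in regs:
        tree.append(regs["OBJ"])
    return tree


# Arcs are tried in the order written: (kind, label, next_state, action).
GRAMMAR = {
    "S/": [("PUSH", "NP/", "S/NP", setr("SUBJ"))],
    "S/NP": [("CAT", "V", "VP/V", setr("V"))],
    "VP/V": [("PUSH", "NP/", "VP/OBJ", setr("OBJ")),
             ("JUMP", None, "VP/OBJ", keep)],
    "VP/OBJ": [("POP", None, None, build_s)],
    "NP/": [("CAT", "DET", "NP/DET", setr("DET"))],
    "NP/DET": [("CAT", "N", "NP/N", setr("N"))],
    "NP/N": [("POP", None, None, build_np)],
}


def walk(state, pos, regs, words):
    """Yield (structure, next_pos) for each way of reaching a POP from state."""
    for kind, label, target, action in GRAMMAR[state]:
        if kind == "POP":
            yield action(regs, None), pos
        elif kind == "JUMP":
            yield from walk(target, pos, action(regs, None), words)
        elif kind == "CAT":
            if pos < len(words) and LEXICON.get(words[pos]) == label:
                yield from walk(target, pos + 1, action(regs, words[pos]), words)
        elif kind == "PUSH":
            # The lower level starts with an empty register set.
            for value, after in walk(label, pos, {}, words):
                yield from walk(target, after, action(regs, value), words)


def parse(sentence):
    """Return the first structure that consumes the whole sentence, or None."""
    words = sentence.lower().split()
    for tree, end in walk("S/", 0, {}, words):
        if end == len(words):
            return tree
    return None


print(parse("the mayor lost the seed"))
```

In state VP/V the PUSH NP/ arc is written before the JUMP arc, so an object is looked for first; when "the mayor left" is parsed the PUSH fails and the interpreter backs up to the JUMP arc. A left-recursive grammar would make this sketch loop, which is one reason the more elaborate control structures described next are useful.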

The original ATN parser [Woods, 1970, 1973] is more versatile than the parser just described. It is based on a list of alternatives, each of which contains all the information necessary to restart the parser at the point that the alternative was created. This implies that the alternatives can be done in any order, not just depth first. An alternative consists of the current input string, the current state of the grammar, the arcs remaining to be tried at that state, the register list, and the stack. If the HOLD list is used, it too must be remembered as part of an alternative, and it is often useful to remember the path of arcs which have been taken at the current level.

Alternatives may be created at several different times during the parsing process. The most common one remembers the arcs remaining to be tried from the state after one arc is followed. Resuming this alternative "backs up" to that state and looks for another arc to try. Another type of alternative may be created when there is ambiguity about the part of speech of a word. Another is created (by a function called LEXIC) when there is ambiguity about the next word of input. (This happens in cases where there are compound words like "common market" which possibly should be collapsed.)

One may think of the parsing process as one of moving from one configuration to another, where a configuration is a snapshot of the current state of the parse (i.e., the current state, input, stack and register list). When processing one word of input the parser may follow just one path or may try to follow several in parallel. The process of creating a path is exactly the process of creating one or more configurations from a given configuration. To make this a little more concrete, let us look at the main functions of the parser for the LUNAR system.

The function PARSER is called with the input string as its argument. It sets up an initial configuration (initial state, empty register list and stack). It calls the function LEXIC to perform lexical analysis of the string to determine the next word. This may involve expanding contractions, compressing compound words, making substitutions according to dictionary information, etc. Then PARSER calls STEP to compute a new set of configurations from each currently active configuration.

STEP, which may be given either a configuration or an alternative (usually as a result of backup), uses its argument to restore the state of the parse and attempts to follow an arc, thus creating a new configuration. If no more arcs may be taken from the current state, STEP returns NIL. The functions PARSER, LEXIC, and STEP are shown in flowchart form in Figure 10. (These flowcharts are from [Woods et al, 1972].)

[Figure 10a: The Function PARSER (flowchart)]

[Figure 10b: The Function LEXIC (flowchart)]

[Figure 10c: The Function STEP (flowchart)]

When PARSER runs out of active configurations which can be STEPed, a function called DETOUR is used to select the next alternative to be tried. If nothing special has been done to give the alternatives special weights, DETOUR will pick up alternatives in the usual backup order.

This parser has the advantage of allowing actions on the arcs to influence the order in which alternatives are chosen. This means that it is not necessary to completely exhaust the search space by following all possible parse paths; by careful ordering of the alternatives, the most likely parsing can be found first, leaving the others to be found later if necessary. Because many sentences and many partial sentences are ambiguous, any natural language parser must be able to cope with alternatives. The ability to order them, together with the alternatives implicit in the factoring and merging of the grammar, helps to reduce the number of alternatives which must be considered during a parse.
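The selection of the next alternative can be pictured as taking the best entry from an agenda. The following Python sketch is not the LUNAR implementation of DETOUR; it assumes that a lower weight means a more promising alternative and that "the usual backup order" is most-recent-first, and the class and method names are invented for the example.

```python
import heapq
import itertools


class Alternatives:
    """Agenda of suspended alternatives; the lowest weight is resumed first."""

    def __init__(self):
        self._heap = []
        self._created = itertools.count()  # creation order, used as tie-breaker

    def save(self, alternative, weight=0):
        # Negating the creation counter makes equal-weight alternatives come
        # back most-recent-first, i.e. ordinary depth-first backup
        # (an assumption about what "usual backup order" means).
        heapq.heappush(self._heap, (weight, -next(self._created), alternative))

    def detour(self):
        """Return the best remaining alternative, or None if none is left."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


agenda = Alternatives()
agenda.save("retry remaining arcs at S/NP")
agenda.save("retry remaining arcs at VP/V")
agenda.save("treat 'common market' as one word", weight=-5)
order = []
while True:
    alternative = agenda.detour()
    if alternative is None:
        break
    order.append(alternative)
print(order)
```

With no weights supplied the agenda behaves like a stack; an arc action that assigns a weight merely changes the key under which the alternative is filed, so the rest of the parser does not need to know how the ordering is decided.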

This parser has also proved to be a flexible testbed for modifications and special features (for examples, see the sections below on conjunction and selective modifier placement). This parser (compiled in LISP) using the grammar of Figure 9 could parse 8 word sentences in about 2 seconds of cpu time on a PDP-10.

Several important efficiency issues for both parsers should be mentioned. The first is the problem of storing register contents in a space-conserving yet easily accessible manner. Although only a few registers are likely to be used on any one arc, the total number of registers used in the grammar may be quite large. A representation which fixes this number or which requires large numbers of unchanged registers to be copied would be wasteful of both space and time. This problem can be solved by representing a register list as a list of name/value pairs. The function GETR searches the list from front to back for the first occurrence of the named register and returns the associated value. SETR does not change the name/value pair in the current list but instead adds to the front a new name/value pair. This new pair will effectively hide from GETR any old pair with the same name. Thus to preserve the entire register list at a particular point, only a pointer to the current front of the list need be remembered. Other processes can add new information to the list and remember only their new front pointer. Thus the register lists resemble a tree structure with much information commonly shared on branches; any one process is only able to see its unique path from leaf to root. This makes setting a register much faster than accessing one, but registers are generally set more often than they are retrieved for structure building.

Another way to speed up the parsing process, particularly in cases where a lot of backup is likely to be done, is the use of a well-formed-substring table (WFST). At every POP, the constituent just found is placed together with the portion of input it spans in the WFST. Then whenever a PUSH is encountered, the WFST is checked to see whether a constituent of the desired type has been found at the current place in the input. If so, the constituent from the WFST may be used directly instead of redoing the parsing at the lower level. (The use of the HOLD list complicates the WFST slightly, since a constituent can be taken from it only if the current HOLD list matches the one which was current when the constituent was originally parsed. A similar situation holds for the use of SENDRs.)
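The name/value-pair register list described above is easy to model. The sketch below uses Python tuples in place of LISP cons cells; the functions are named after GETR and SETR in the text, but their signatures (an explicit list argument, None for the empty list and for an unset register) are assumptions made for this illustration.

```python
EMPTY = None  # the empty register list


def setr(regs, name, value):
    """Return a new register list with (name, value) added at the front.

    The old list is not modified, so anyone holding a pointer to it
    still sees the old contents.
    """
    return (name, value, regs)


def getr(regs, name):
    """Return the first value found for `name`, searching front to back."""
    while regs is not EMPTY:
        reg_name, value, regs = regs
        if reg_name == name:
            return value
    return None


base = setr(EMPTY, "SUBJ", "the mayor")
path_a = setr(base, "V", "want")    # one alternative
path_b = setr(base, "V", "elect")   # another alternative sharing `base`
passive = setr(path_b, "SUBJ", "someone")  # hides the older SUBJ pair

print(getr(passive, "SUBJ"), getr(path_b, "SUBJ"), getr(path_a, "V"))
```

Saving the registers for an alternative costs one pointer, and path_a and path_b share the cell holding SUBJ, which is the tree-like sharing the text describes.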

3b. Inside-Out Parsers

Two parsers have been developed for an experimental speech understanding system which are quite different from the preceding parsers. They illustrate the extraordinary flexibility of ATNs with respect to the environment where they are applied.

Current and all foreseeable acoustic processors cannot uniquely identify all words in an acoustic signal. This is due to problems with homonyms (bear, bare), disputable word boundaries (tea meeting, team meeting, team eating), phonological effects (why choose, white shoes), incomplete pronunciation of most function words in context (of, have, a, the), low energy at the beginning and end of an utterance, and errors induced by the acoustic analysis and word matching process. Thus it is not necessarily the case that a speech understanding system should attempt to process an utterance left to right; it may be better to begin with a reliably identified long content word in the middle and work from the inside out to the ends. In this way, the grammar can provide predictions about what can be adjacent to a portion of the sentence already processed, and these expectations can also be used by the rest of the system in its analysis of subsequent (or previous) portions of the utterance.

With these constraints in mind, a parser was developed [Bates, 1975a, b, 1976] which could start parsing anywhere in the input stream and could parse despite the lack of certainty as to the exact nature of the words at each point in the input. As partial parse paths were built up, their pieces were stored in tables so that other parse paths could use them and did not need to reparse any common sections of input. (This is like a WFST for partial paths rather than complete constituents.) Using the grammar, the parser could make predictions about the words or lexical classes that could be used to extend a sequence of words either to the right or to the left. If the gap between words was small enough to contain just one word, the parser could predict just the class or classes of words to fill the gap. The control structure of the parser could be easily modified to allow experimentation with various combinations of sequential, backup, and parallel search. It usually used a combination of depth-first and breadth-first techniques, following a single path but allowing splitting into parallel paths when desirable. Care was taken to allow the parser to interact frequently and easily with other components of the system (notably semantics) in order to receive guidance and to verify completed constituents.

As part of its initialization, the parser set up an index into the grammar so that, for example, all arcs which could consume nouns could be easily located. Then, if a noun in the middle of the utterance were presented to the parser, it would retrieve those arcs and set up a partial parse path for each of them. By adding new words onto these paths (and by using the grammar and the grammar index to make predictions to the right and to the left), an "island" of words could be grown until it spanned the entire utterance. Careful use of the well-formed-substring table and the maintenance of numerous alternate possible partial paths allowed a very general middle-out parsing algorithm.
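The index into the grammar can be sketched as a table from lexical category to the arcs that consume it. This Python illustration is not the BBN implementation; the toy grammar fragment, the arc encoding (kind, label, next state) and the function names are assumptions.

```python
from collections import defaultdict

# Hypothetical grammar fragment: state -> list of (kind, label, next_state).
GRAMMAR = {
    "NP/": [("CAT", "DET", "NP/DET"), ("CAT", "N", "NP/N")],
    "NP/DET": [("CAT", "ADJ", "NP/DET"), ("CAT", "N", "NP/N")],
    "NP/N": [("POP", None, None)],
    "S/NP": [("CAT", "V", "VP/V")],
}


def index_arcs(grammar):
    """Map each lexical category to the (state, next_state) pairs of CAT arcs."""
    index = defaultdict(list)
    for state, arcs in grammar.items():
        for kind, label, target in arcs:
            if kind == "CAT":
                index[label].append((state, target))
    return index


def seed_islands(index, category, position):
    """Start one partial parse path for every arc that could consume the word."""
    return [{"from": state, "to": target, "span": (position, position + 1)}
            for state, target in index.get(category, [])]


INDEX = index_arcs(GRAMMAR)
islands = seed_islands(INDEX, "N", 3)
print(islands)
```

Each seeded path records the state before and after the arc, so predictions to the left can be read from the arcs entering the "from" state and predictions to the right from the arcs leaving the "to" state.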

A second, faster, more efficient parser was later built for the same system [Woods et al, 1976]. It too was designed for an island driven parsing strategy. It did less interpretation of the grammar while parsing than the previous system because it pre-processed the grammar to obtain a set of useful relations between states. (For example, the relation (S1 J S2) holds if there is a path made up exclusively of JUMP arcs from state S1 to S2.) The pre-processor created arrays which described the grammar in a bi-directional way rather than the inherently left-to-right way in which it is written. The sets of relations were used by the parser in a way similar to the L set in Earley's algorithm for context free grammars [Earley, 1970], and a technique similar to Earley's for eliminating the stack and performing indirect PUSHes was used.

This parser had the advantage of being able to follow efficiently all possible partial paths in parallel, thus eliminating the need to try to calculate which paths were most likely to be correct. It could also do more context sensitive checking and register setting, thus allowing very good predictions to be made at the ends of islands. These predictions could also be tightened as more information was added to the utterance.

The only accommodation that the grammar writer needed to make to this system was to mark actions which used register contents with the state or states where the original setting could have taken place. (These "scope" declarations were used to determine when sufficient context is present to perform a test or to do an action.) This was a small price to pay for the speed and accuracy of the parser's predictions.
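A relation such as (S1 J S2) can be pre-computed by following JUMP arcs transitively. The Python sketch below shows one way to do so; it is not the pre-processor of [Woods et al, 1976]. The grammar fragment is loosely modelled on the JUMP arcs mentioned in the sample network, the arc encoding is an assumption, and the relation is taken here to require at least one JUMP arc.

```python
# Hypothetical fragment: state -> list of (kind, label, next_state).
GRAMMAR = {
    "VP/V": [("JUMP", None, "VP/HEAD"), ("CAT", "V", "VP/V")],
    "VP/HEAD": [("JUMP", None, "VP/OBJ"), ("PUSH", "NP/", "VP/OBJ")],
    "VP/OBJ": [("JUMP", None, "VP/VP"), ("WRD", "to", "VP/TO")],
    "VP/VP": [("POP", None, None)],
}


def jump_relation(grammar):
    """Return all pairs (s1, s2) with a path of one or more JUMP arcs s1 -> s2."""
    direct = {state: {target for kind, _, target in arcs if kind == "JUMP"}
              for state, arcs in grammar.items()}
    related = set()
    for start in grammar:
        seen = set()
        stack = list(direct[start])
        while stack:
            state = stack.pop()
            if state in seen:
                continue
            seen.add(state)
            related.add((start, state))
            # States that only appear as arc targets have no entry in `direct`.
            stack.extend(direct.get(state, ()))
    return related


J = jump_relation(GRAMMAR)
print(sorted(J))
```

Storing the pairs in both directions (which states can be reached from S1, and which states can reach S2) gives the kind of bi-directional description that lets an island be extended to the left as easily as to the right.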

3c. A Grammar-Compiler

A system has been written by R. Burton [Burton, 1976; Burton and Woods, 1976] which produces an extremely fast ATN parser by a two step process. In the first phase, the ATN grammar is converted (compiled) into a single LISP function that takes the place of an interpreter. The second phase is to compile this LISP function, thus producing a compiled program which is the parser/grammar combination. This process is schematically illustrated in Figure 11. The operation of the resulting program is shown in Figure 12. (Both figures are from [Burton, 1976].)

The LISP function which is produced by the first phase is a PROG in which the state names of the grammar become labels and the tests and actions are the "statements". The function looks like this:

(LAMBDA (ACF)
  (PROG (special variables like STATE, STACK, REGS, HOLD, LEX)
    SPREAD-ACF
      (code to set up current configuration)
      (GO EVAL-ARC)
    NEXTLEX
      (if (another word?)
          then (advance input)
               (GO EVAL-ARC))
    DETOUR
      (if (another alternative?)
          then (ACF ← alt)
               (GO SPREAD-ACF)
          else (RETURN failure))
    EVAL-ARC
      (BRANCH STATE arclabel1 arclabel2 ... arclabeln)
    arclabel1
      (code for arc)
    arclabel2
      (code for arc)
    ...
    arclabeln
      (code for arc)))

The arc code for JUMP, WRD, CAT, and VIR arcs generally looks like:

(if (arctype and test satisfied?)
    then (create alternative for remaining arcs)
         (do actions)
         (DOTO nextstate)
         (GO nextstate-arclabel1))

The function DOTO changes the state and advances the input. If the arctype or test is not satisfied, the function "falls through" to the next arclabel, which is the following arc. The arc code for the last arc of a state must have an "else (GO DETOUR)" clause. PUSH arcs generate arc code to recursively invoke the grammar:

(if (test satisfied?)
    then (create alternatives for remaining arcs)
         (DOPUSH pushstate remaining-actions-label)
         (GO pushstate))
remaining-actions-label
    (do actions on PUSH arc)
    (DOPTO nextstate)
    (GO nextstate-1starclabel)

The function DOPUSH saves the current configuration (with the remaining-actions-label as the current state) on the stack and does any pre-actions to initialize registers. The lower level is started by the GO to the arc label of the state. When and if the lower level finishes, control will transfer to remaining-actions-label where the actions are performed, the state (but not the input) is changed by DOPTO, and the next state is begun.

[Figure 11: The Operation of a Grammar-Compiler]

[Figure 12: The Operation of an ATN Parser]

POP arcs produce code like:

    (if (test satisfied?)
        then (create alternative for remaining arcs)
             (DOPOP form)
             (GO EVAL-ARC))

DOPOP builds the structure to be returned, restores the higher level by getting information from the configuration on the top of the stack, and sets * (the current input item) to the value of form. The branch at EVAL-ARC activates the state which was saved (remaining-actions-label).

This "parser" executes very much faster than the previously described methods, about 10 times faster than the compiled version of the LUNAR parser with the same grammar, parsing 8-12 word sentences in about 150 milliseconds. For production programs where speed is essential, this method is extremely good. The compiler can check, for example, whether the test on an arc is T and if so it can omit the generation of code to evaluate the test; this sort of checking eliminates much inefficiency. The compiler thus allows unused features of the ATN formalism (recursion, the WFST, alternate lexical interpretations, etc.) to be removed, improving efficiency even more. It has the disadvantage that whenever a change is to be made to the grammar both phases of the compiler must be redone, a time consuming process, especially if the grammar is large. There are also few tracing or debugging aids in the final version. A method similar to this one could be used to write an ATN system in a language which does not have an EVAL function.
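The check for a test of T can be pictured with a small sketch. The Python fragment below is only an illustration under stated assumptions, not Burton's compiler: the function name compile_arc, the representation of tests and actions as source strings, and the Python-like lines it emits are all hypothetical.

```python
def compile_arc(test, actions, next_state):
    """Return source lines for one arc (hypothetical representation).

    test       -- source text of the arc test, or "T" if it is always true
    actions    -- list of source lines for the arc actions
    next_state -- name of the state the arc leads to
    """
    body = list(actions) + [f"state = {next_state!r}"]
    if test == "T":
        # The test is trivially true, so no code is generated to evaluate it.
        return body
    # Otherwise the arc body is guarded by the test.
    return [f"if {test}:"] + ["    " + line for line in body]


unguarded = compile_arc("T", ["regs['SUBJ'] = word"], "S/NP")
guarded = compile_arc("word in nouns", ["regs['SUBJ'] = word"], "S/NP")
```

The decision is made once, at compile time, so the generated program never pays for evaluating a test that is known to succeed.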

4. VARIOUS TYPES OF ATN GRAMMARS

In this section we explore some of the tradeoffs and decisions which must be made by someone who wants to write an ATN grammar for a particular purpose. As was noted earlier, there is no one grammar or style of grammar which suits every purpose. The linguist who is interested in testing a theory of language will write a very different grammar from a systems programmer who needs a small, fast natural language front end for a programming system, and both will differ from a grammar written for instructional purposes. Here are some of the issues involved.

4a. Parsing vs. Generation

Computational linguists and those who want to build a "natural language front end" for a system tend to think of a grammar as something which will be used exclusively for analysis, i.e. parsing. However there are many cases where it is useful to generate English text, as in some question answering systems and computer assisted instruction programs. It is not at all clear whether the same grammar can or should suffice for both analysis and generation, but because an ATN grammar is written separately from its parser, it can be thought of as a form which is independent of analysis or production.

A generator (rather than a parser) may be written which takes an ATN grammar and a dictionary as data and produces sentences. Beginning with the initial state of the grammar, the generator can randomly select an arc from that state to be followed. The following of PUSH, POP, VIR, and JUMP arcs would be nearly the same as in a parser, but WRD and CAT arcs would try to select a word from the dictionary which satisfies the conditions on the arc.
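A random generator of this kind can be sketched in a few lines. The Python fragment below is a minimal illustration under assumptions, not the generator written at BBN: the toy grammar, the lexicon, and the names GRAMMAR, LEXICON, and generate are all hypothetical, and tests, registers, and VIR arcs are omitted.

```python
import random

# Hypothetical toy grammar: state -> list of (arc type, label, next state).
GRAMMAR = {
    "S/": [("PUSH", "NP/", "S/NP")],
    "S/NP": [("CAT", "V", "S/V")],
    "S/V": [("PUSH", "NP/", "S/END"), ("JUMP", None, "S/END")],
    "S/END": [("POP", None, None)],
    "NP/": [("CAT", "DET", "NP/DET")],
    "NP/DET": [("CAT", "ADJ", "NP/DET"), ("CAT", "N", "NP/N")],
    "NP/N": [("POP", None, None)],
}

# Hypothetical dictionary: lexical category -> words in that category.
LEXICON = {
    "DET": ["the", "a"],
    "ADJ": ["hungry", "fierce"],
    "N": ["bear", "boy"],
    "V": ["sees", "feeds"],
}


def generate(state, rng):
    """Follow randomly chosen arcs from state until a POP arc is taken."""
    words = []
    while True:
        arc_type, label, next_state = rng.choice(GRAMMAR[state])
        if arc_type == "POP":
            return words
        if arc_type == "CAT":
            # Select any word of the required category.
            words.append(rng.choice(LEXICON[label]))
        elif arc_type == "WRD":
            words.append(label)
        elif arc_type == "PUSH":
            # Recursively generate the lower-level constituent.
            words.extend(generate(label, rng))
        # JUMP arcs produce no words; they only change the state.
        state = next_state


sentence = generate("S/", random.Random(0))
```

Because every choice is random, the output is grammatical with respect to the network but not guided by any intended meaning.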

Of course, the method just outlined will produce random sentences since it is in no way guided by intentions or concepts. It may be useful, however, to test grammars which are supposed to be "tight" in the sense of being able to identify and reject incorrect input. Such a generator was written to help debug the ATN grammar for the BBN speech understanding system and was very helpful in the discovery of sentences which would have been (erroneously) accepted by the grammar.

One generation scheme has been proposed [Simmons, 1973; Simmons and Slocum, 1972] which is driven by a semantic network and a grammar very similar to an ATN grammar. A similar but more general system [Shapiro, 1975] uses a grammar whose input is a node of a semantic network and whose output is a linear string, a sentence describing the node.

If a grammar is to be written exclusively for parsing, one can usually make the assumption that the input will be correct according to the common rules of English syntax. (This may not always be the case, for example, in a computer assisted instruction system which is supposed to check and correct input from students or in a system which is expected to parse a naturally spoken dialog or transcript of such a dialog as opposed to written text.) If one can make the assumption of correct input then the grammar can be simplified in many places by allowing it to accept incorrect input which it will not encounter in practice. A grammar which is written to generate English must contain a great many checks to eliminate incorrect combinations. A grammar which is designed to help choose among many conflicting possible inputs must make extensive tests to screen out the incorrect input.

4b. Competence vs. Performance

Linguists have long made a distinction between language as people actually use it (performance) and language as one may ideally abstract it (competence). As an example, it is in the realm of competence that one knows that a reduced relative clause may be used in a noun phrase, but it is a fact of performance that this embedding is usually performed only once and in the subject position:

    The girl the man kissed screamed.
    The girl the man the mugger robbed kissed screamed.

Competence tells us that the verb of a sentence must agree in number with the subject and sometimes with the object, yet sentences such as the following are often spoken:

    The boys in the band plays at the football game.
    That's them!

There is a rule (competence) which says that a particle associated with a verb (as in "call up") may be moved beyond the object of the verb, yet if the object is very long or complex most people would say the sentence is bad.

    I called up my neighbor.
    I called my neighbor up.
    ?I called my neighbor in the town where I used to live before financial problems forced me to move to Chicago last month up.

Most systems must deal with some aspects of performance which are outside the formal competence model. For this reason, the term "ungrammatical" becomes ambiguous when referring to the input of a parsing system. A sentence may be conventionally ungrammatical ("A apples falls") but may still be accepted by the grammar, or it may be conventionally correct yet be rejected by the grammar.

4c. Syntactic vs. "Semantic" Grammars

For many particular applications, one can take advantage of key words or classes of words and the small range of likely syntactic constructions by restricting the grammar to accept only sequences which are correct not only syntactically but also semantically and pragmatically.

It is possible to use a dictionary which classifies words not (or not only) by their syntactic parts of speech but also by relevant semantic groupings. For example, in a system to parse sentences about people and animals at a zoo, lexical classes such as ANIMAL, PEOPLE, ANIMAL-ADJ, and PEOPLE-ADJ could be used. They would contain words like (bear, goat, lion), (boy, father, woman), (hungry, caged, fierce), and (hungry, naughty, educated, vain) respectively.
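The zoo dictionary can be written down directly as data. The Python sketch below uses the four classes and the word lists given in the text; the names SEMANTIC_CLASSES, classes_of, and acceptable_phrase, and the rule pairing a noun class X with an adjective class X-ADJ, are assumptions made for illustration.

```python
# Semantic lexical classes from the zoo example.
SEMANTIC_CLASSES = {
    "ANIMAL": {"bear", "goat", "lion"},
    "PEOPLE": {"boy", "father", "woman"},
    "ANIMAL-ADJ": {"hungry", "caged", "fierce"},
    "PEOPLE-ADJ": {"hungry", "naughty", "educated", "vain"},
}


def classes_of(word):
    """Return every semantic class that lists the word."""
    return {name for name, words in SEMANTIC_CLASSES.items() if word in words}


def acceptable_phrase(adjective, noun):
    """Accept an adjective-noun pair only if the classes fit together.

    Hypothetical rule: a noun of class X takes adjectives of class X-ADJ.
    """
    adjective_classes = classes_of(adjective)
    return any(f"{noun_class}-ADJ" in adjective_classes
               for noun_class in classes_of(noun))
```

A word may belong to several classes ("hungry" is both an ANIMAL-ADJ and a PEOPLE-ADJ), so a CAT arc that tests for one of these classes filters out semantically odd combinations at the same time as it parses.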

Figure 13 shows portions of two different semantic grammars. Of course, a grammar may blend syntactic and semantic categories to any desired extent. A purely syntactic grammar would be confined to the usual parts of speech and would accept a large number of syntactic constructions which were meaningless ("Colorless green ideas sleep furiously"). In a limited domain of discourse where the input can be assumed to be meaningful, a semantic grammar is a very efficient solution to the parsing problem. It is also useful for the generation of meaningful sentences. See [Burton, 1976] for the description of a very efficient system using such a grammar.

Figure 13a: A Semantic Grammar for Circuit Measurements (network diagram not recoverable; legible arc labels include WRD, CAT, POP, MEAS/QUANT, and PRONOUN/)

Semantic grammars tend to be much larger than syntactic grammars which accept the same set of sentences. The largest ATN grammar this author knows of is one she wrote for the BBN speech understanding system [Bates, 1975]; it contained 448 states, 881 arcs, and 2280 actions but was more limited in the variety of constructions it could accept than an 83 state, 202 arc, 386 action syntactic grammar for the same system. Another drawback to a semantic grammar is that it must be written anew if the domain of discourse is changed, and it would be extremely impractical to attempt to write such a grammar for anything but a limited application area.

4d. ~

Prosodics

Because syntax, semantics, and pragmatics interact in a very complex way in natural languages, it is desirable to attempt a similar fusion when modeling linguistic processing.

Figure 13b: A Semantic Grammar for Dates and Travel Expenses (network diagrams not recoverable; legible labels include TODAY, YESTERDAY, TOMORROW, FARE, COST, BUDGET, EXPENSE/FARE, PUSH CITY/, and PUSH DATE)

As was described above, it is possible to place semantic constraints on the grammar by using semantic lexical categories, but this is not the only way to achieve the desired synthesis. Tests and actions on arcs may call semantic routines to check any number of constraints such as the ordering of adjectives within a noun phrase and the agreement between modifiers and the things they modify. The parser may also be modified to call a semantic verification routine to test the completeness or meaningfulness of each constituent as it is produced by a POP arc.

If the parser is structured to allow the order of alternatives to be modified, the semantic functions called on the arcs can be used to effect the modification. Similar functions may be used to detect prosodic features in speech, punctuation in text, and even non-linguistic events (e.g., body actions) in an appropriate environment.

Weischedel [Weischedel, 1976] demonstrates an ATN to compute the presuppositions and entailments of fairly complex sentences. Consider the following sentences:

    a. The professor doubted that John managed to translate an assignment.
    b. The professor is human.
    c. The professor believed that John attempted to translate some assignment.
    d. The professor believed that it is not the case that John translated some assignment.

Sentences b and c are presupposed by a, while a entails d. Since presuppositions and entailments are determined by both the meaning of certain words and the syntactic constructs used, it is necessary to have special entries in the lexicon as well as an ATN which can accept the relevant structures.

For systems dealing with speech input, prosodic information is very important. In the BBN speech understanding system [Woods et al, 1976] a special pseudo action called PBDRY was placed on arcs which consumed words before which a prosodic boundary was expected. This function would in effect change the score of the process depending upon whether or not a prosodic boundary was actually detected in the expected location. Prosodic information provides good clues (for humans) about syntactic boundaries, sentence type (for example imperative vs. a yes-no question), pronominal reference, and other things, but very little is currently known about how to automatically detect and use such information.

It is not always necessary to have a tradeoff between the efficiency of a semantic grammar and the compactness and generality of a syntactic one. A system is currently being built [Bobrow and Bates, 1977] which uses a very general syntactic ATN grammar together with a case-oriented dictionary (that also contains semantic interpretation rules). The grammar uses case information to help guide the parsing, and semantic interpretations can be done as soon as a constituent is complete. This grammar is of moderate size (75 states and 153 arcs) but, using Burton's grammar-compiler, has parsed and interpreted 20-word sentences in under one second.

5. LINGUISTIC ISSUES

There are a number of syntactic structures which have been studied in varying degrees of depth by linguists. The current paradigm of linguistic theory, transformational grammar, is not very suitable for either theoretical or applied computational purposes. ATNs provide a model which captures many of the ideas of transformational grammar in a computationally efficient and theoretically interesting way. While it is not necessary for every ATN grammar to account for all the structures which may be used in a natural language, and in fact most grammars written for practical use will have to include ad hoc methods to deal with those which have not been studied linguistically, it is important to know that many of the generalizations captured by the theory of transformational grammar can be handled smoothly by an ATN grammar. The following are some such issues.

5a. Extraposition

In many English sentences, a constituent may be moved out of its original position up and to the left, from its deep structure subtree to a place in a dominating tree. The following sentences are examples:

    This is the cat that I was afraid someone was going to try to steal.
    The man in the red shirt is the one who John wanted to speak to.

The problem with this construction is that the constituent when found during parsing cannot just be put in a register to be picked up and used later because the place at which it is likely to be needed is one or more levels down; by the time that the proper place in the input is reached, the constituent is hidden and inaccessible on the stack.

The HOLD action and VIR arcs may be used to handle this problem in an efficient, uncomplicated way. The HOLD action associates a constituent with a name and places this pair on a special list called the HOLD list. The HOLD list is in effect a global variable which is accessible at all lower levels. A VIR arc on the original level or at any lower level may remove a constituent from the HOLD list and use it just as if it had actually occurred at the current position in the input string. To ensure that held constituents are used in constructions dominated by the one in which they are found, the parser must be modified so that every POP arc has an implicit test which will make the arc fail if any constituents which were placed on the HOLD list at the current level remain on it.
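The bookkeeping behind HOLD, VIR, and the implicit POP test can be sketched as follows. This Python fragment is an illustration under assumptions rather than any particular parser's implementation: the class name HoldList, its method names, and the convention that the level number grows by one with each PUSH are all hypothetical.

```python
class HoldList:
    """A global list of (name, constituent, level) entries."""

    def __init__(self):
        self._entries = []

    def hold(self, name, constituent, level):
        """HOLD action: remember a constituent found at the given level."""
        self._entries.append((name, constituent, level))

    def vir(self, name, level):
        """VIR arc: remove and return a held constituent of this name.

        Only entries held at this level or a higher (shallower) one are
        visible, i.e. the VIR arc must be at the original or a lower level.
        """
        for index, (held_name, constituent, held_at) in enumerate(self._entries):
            if held_name == name and level >= held_at:
                del self._entries[index]
                return constituent
        return None

    def pop_allowed(self, level):
        """Implicit POP test: fail if this level still has held entries."""
        return not any(held_at == level for _, _, held_at in self._entries)


holds = HoldList()
holds.hold("NP", ("NP", "the cat"), level=1)
```

With the constituent held at level 1, a POP from level 1 is blocked until some VIR arc at level 1 or deeper has consumed it.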

The HOLD-VIR mechanism is not absolutely necessary to deal with left extraposition. It would be possible to put the extraposed constituent in a register which was then sent down every time a PUSH were done. JUMP arcs could then test for that register and insert the constituent in its proper place. Every POP arc would have to lift an indication of whether or not the constituent had been used. This is a complicated, error-prone procedure which does not have the clarity of the previous mechanism.

Another way to avoid using the HOLD-VIR mechanism is to use in place of VIR arcs JUMP arcs which put an identifiable "dummy node" into the constituent which is being built. A tree with such a dummy node is given in Figure 14. At a higher level (on the PUSH arc on the level where the HOLD action would have been done) a copy of the structure returned from the lower level can be made, substituting the appropriate structure for the dummy node. An advantage of this method over the one just described is that the constituent with the dummy node may be placed in the WFST for use by other paths, which may want to substitute a different structure for the dummy. However, there are numerous disadvantages compared to the HOLD method; an explicit test must be made to avoid returning a constituent with a dummy node from a level where it should have been replaced, much time may be wasted constructing constituents with dummy nodes which cannot be replaced at higher levels, and it is more difficult to do agreement tests between the extraposed portion and the rest of the constituent.

Figure 14: A Relative Clause Tree with a Dummy Node (tree diagram not recoverable; legible node labels include NP, AUX, VP, NU SG, PAST GIVE, PREP TO, and YOU)

Extraposition may happen to the right as well as the left. In sentences like

    The resort I went to which was on the Mediterranean was pleasant.
    How many chickens were there which crossed the road?

a complete constituent is moved out of another constituent (which is still completely well-formed despite the loss) to a position farther to the right and above the original constituent.

To handle this right extraposition, two new actions are used. The action (RESUMETAG <state>) creates a marker combining the current register list with the named state at which the parsing could continue at some later time. Then at some later time, if the action (RESUME) is encountered on an arc, the marker is retrieved and the configuration the parser was in when the RESUMETAG was executed is re-established. The extraposed text can be parsed as usual, and when a POP is done the completed constituent is used as the current input item on the arc containing the RESUME.

5b. Conjunction

Conjunction in English (and many other languages) is a problem for both the theoretical linguist and the applied computational linguist. In its simplest form, two complete constituents are conjoined; at its most complex, numerous constituents can be conjoined in an overlapping manner. The following sentences illustrate the range of complexity.

    The quick brown fox and the lazy dog are famous.
    Enclose this message with the red, orange and blue packages.
    Will you plant the roses or cut the grass?
    I groomed and brushed my cat.
    The easy homework and exams made the course hard to fail.
    She often fought with, later married, and soon divorced the boy next door.

If only complete constituents could be conjoined (either by a single conjunction, a list with commas having the conjunction before the last conjunct, or a list with the conjunction between all conjuncts) then any type of constituent X could be made conjoinable by having a new level of the grammar called XLIST. XLIST/ would be PUSHed to by all the arcs which formerly PUSHed to X/. The level XLIST/ would simply PUSH to X/, accumulate the result in a register, and either POP or find a conjunction (or comma). In the latter case control would transfer to XLIST/ to look for the next conjunct. If more general conjunction is allowed, for example, "and," "or," and "not" with different precedence, then a more complex grammar would be required.
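The XLIST idea can be sketched as a loop wrapped around an existing constituent parser. The Python fragment below is a minimal illustration under assumptions: the function names parse_xlist and parse_np, the tiny word sets, and the choice of a determiner-noun pair as the X/ level are all hypothetical, and the sketch does no backtracking.

```python
CONJUNCTIONS = {"and", "or"}
DETERMINERS = {"the", "a"}
NOUNS = {"fox", "dog", "cat"}


def parse_np(tokens, i):
    """Hypothetical X/ level: a determiner followed by a noun."""
    if i + 1 < len(tokens) and tokens[i] in DETERMINERS and tokens[i + 1] in NOUNS:
        return ("NP", tokens[i], tokens[i + 1]), i + 2
    return None


def parse_xlist(tokens, i, parse_x):
    """XLIST/ level: PUSH to X/, accumulate, then POP or loop on a conjunction."""
    conjuncts = []  # plays the role of the accumulating register
    while True:
        result = parse_x(tokens, i)
        if result is None:
            return None  # no conjunct where one was required
        constituent, i = result
        conjuncts.append(constituent)
        if i < len(tokens) and tokens[i] == ",":
            i += 1
            # A list with commas may have the conjunction before the last item.
            if i < len(tokens) and tokens[i] in CONJUNCTIONS:
                i += 1
            continue  # transfer back to XLIST/ for the next conjunct
        if i < len(tokens) and tokens[i] in CONJUNCTIONS:
            i += 1
            continue
        return conjuncts, i  # POP with everything accumulated so far


two = parse_xlist("the fox and the dog bark".split(), 0, parse_np)
three = parse_xlist("the fox , the dog , and the cat".split(), 0, parse_np)
```

Arcs that formerly PUSHed to the NP level would call parse_xlist instead, and a single unconjoined constituent simply comes back as a list of length one.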

However, general conjunction which can violate constituent boundaries cannot be conveniently handled this way. It is possible to add a facility to the ATN parser which provides some insight into the general process. This facility, called SYSCONJ in [Woods, 1973], handles reduced conjunctions and constructs the appropriate unreduced deep structures without requiring any changes to the grammar.

The most general kind of conjunction is the reduced conjunction as seen in "John seldom talked about and tried to forget the war." In this case the conjoined fragments "... seldom talked about" and "tried to forget ..." are not constituents, nor are they similar at either their beginning or their end. There is, however, a great deal of structure to conjunctions of this type. There is a place in the first conjunct (after "seldom") where the parser could begin parsing the second conjunct, and there is a place in the second (after "forget") where the parser is in the same state it was in at the end of the first.

Because of this regularity, the SYSCONJ facility may be built into the parser to intercept a conjunction before the grammar processes it. At this point the parser has available to it the entire history of the parse up to that time in the form of configurations on the path and stack. SYSCONJ selects one of these configurations (how it decides which one can be a deterministic, non-deterministic, or semantically motivated decision) and restarts it using the string following the conjunction. The configuration which the parser was in when the conjunction was encountered is saved. The restarted configuration will allow the parser to proceed until it reaches some portion of the input which is shared with the suspended configuration. Then the paths can be combined and the constituent completed. If the restarted configuration cannot find a point at which the suspended configuration can join it, it is an indication that the original selection of a restart configuration was in error, and another one may be tried.

This method is potentially very combinatorially explosive, but it may be guided by some syntactic heuristics such as restarting just before a word which is identical to or of the same category as the word after the conjunction (e.g., "The dogs in the yard, on the porch, and in the house barked"). Semantic guidance may be useful also, but this has not been investigated in depth. It also will not work for embedded complex structures. Further study and experimentation with this feature are necessary to show the limits of this approach to the problem.
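The syntactic heuristic for choosing a restart point can be sketched on its own, apart from the rest of SYSCONJ. The Python fragment below is an illustration under assumptions: the function name restart_candidates, the representation of the input as (word, category) pairs with commas already removed, and the category names are all hypothetical.

```python
def restart_candidates(tagged, conj_index):
    """Return positions before the conjunction at which to try a restart.

    A position qualifies if its word is identical to, or of the same
    category as, the word after the conjunction. The most recent
    position comes first.
    """
    word_after, category_after = tagged[conj_index + 1]
    return [i for i in range(conj_index - 1, -1, -1)
            if tagged[i][0] == word_after or tagged[i][1] == category_after]


# "The dogs in the yard, on the porch, and in the house barked"
TAGGED = [
    ("the", "DET"), ("dogs", "N"), ("in", "PREP"), ("the", "DET"),
    ("yard", "N"), ("on", "PREP"), ("the", "DET"), ("porch", "N"),
    ("and", "CONJ"), ("in", "PREP"), ("the", "DET"), ("house", "N"),
    ("barked", "V"),
]
candidates = restart_candidates(TAGGED, 8)
```

Here the word after the conjunction is a preposition, so the two earlier prepositions are the only restart points tried, instead of every configuration in the history of the parse.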

5c. Punctuation and Contraction

Usually punctuation is an aid to parsing because it can be used to delimit the scope of constituents, as in

    John, the plumber, fixed it.
    He eats fried eggs, over-ripe bananas, and soft custard.

It is useful to have a prepass to scan the input to separate punctuation from the preceding word. Then an arc like WRD , or WRD ; or CAT PUNCT may be put in the grammar.

It is tempting to separate endings like "n't," "'ve," and "'s" and turn them into the separate words "not," "have," and "is" in the prepass, but here the problem is more complex. For example, "'s" can be a contraction for "is" or it can be the possessive morpheme ("Helen's jewelry"), and "n't" can change the form of its predecessor ("won't" -> "will not"). There may also be restrictions on where contraction can occur ("Mary's eating and Joe is too," but not "Mary's eating and Joe's too"), so information useful to the grammar can be lost by expanding the contraction out of context. One way to handle the latter problem is to attach a feature CONTRACTED to words which have been so expanded.
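A prepass of the kind described can be sketched as two small functions. The Python fragment below is an illustration under assumptions: the function names, the token representation as dictionaries, and the one-entry table of irregular forms are hypothetical. The sketch expands every "'s" to "is", so it does not resolve the possessive ambiguity discussed above.

```python
import re

# Hypothetical tables; a real system would need many more entries.
IRREGULAR = {"won't": ("will", "not")}
SUFFIXES = (("n't", "not"), ("'ve", "have"), ("'s", "is"))


def split_punctuation(text):
    """Separate trailing punctuation marks from the preceding word."""
    tokens = []
    for raw in text.split():
        match = re.match(r"^(.*?)([,;.?!]*)$", raw)
        word, marks = match.group(1), match.group(2)
        if word:
            tokens.append({"word": word})
        tokens.extend({"word": mark, "cat": "PUNCT"} for mark in marks)
    return tokens


def expand_contraction(token):
    """Expand a contraction, marking the expanded word as CONTRACTED."""
    word = token["word"]
    lowered = word.lower()
    if lowered in IRREGULAR:
        stem, full = IRREGULAR[lowered]
        return [{"word": stem}, {"word": full, "features": {"CONTRACTED"}}]
    for suffix, full in SUFFIXES:
        if lowered.endswith(suffix) and len(word) > len(suffix):
            return [{"word": word[:-len(suffix)]},
                    {"word": full, "features": {"CONTRACTED"}}]
    return [token]
```

The CONTRACTED feature lets a test on an arc reject an expanded "is" in positions where contraction is not allowed, and lets the grammar reconsider "'s" as a possessive.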

5d. Modifier Placement

One of the most common sources of syntactic ambiguity in English is the problem of what head is modified by a modifier. This is particularly true of a series of prepositional phrases, and it also occurs with adverbs. Often, but not always, semantics determines the correct attachment:

    I saw the man in the park with the {telescope / dark suit / pigeons}.
    Paul borrowed the book that {belonged to / mentioned} Jane on Friday.
    The orchestra performed the music Walter {wrote / likes} recently.

Woods' parser [Woods, 1973] incorporated a selective modifier placement facility that was invoked by a special type of POP arc called SPOP. When an SPOP arc was encountered, the parser would search the stack for other configurations that could use the constituent about to be SPOPed. For the second sentence shown above, an SPOP arc at the end of the PP/ network would find configurations PUSHing for a prepositional phrase (after "Jane"), for a relative clause (after "book"), and for a noun phrase (after "borrowed"). At the configuration for the most recent PUSH, the SPOP process determines that a POP could have been done instead of the PUSH. Then it finds out that the next higher level (the PUSH for a relative clause) could also PUSH for a prepositional phrase. This means that the configuration PUSHing for a relative clause is a candidate for the prepositional modifier.

Continuing up the stack in the same way, a list of candidate configurations is made. Then semantic information associated with the head of the level represented by the candidates is examined to see which ones may be associated with the semantic information of the head of the modifier. (This semantic information may come from the dictionary or from special-purpose functions.) Checks which may be made include: heads which forbid modifiers of that type ("sincerity on the table"), heads which require modifiers of that type to make sense ("consort with criminals"), and heads which may use such a modifier ("see with a telescope"). The chosen configuration is the closest one which needs the modifier most. Alternatives are created for other placements (in case of backup or to ensure the eventual production of the less likely ambiguities), and the preferred one is continued.
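The rule "the closest one which needs the modifier most" can be sketched as a small ranking function. The Python fragment below is an illustration under assumptions, not Woods' SPOP implementation: the table HEAD_PREFERENCES, the modifier type names, the numeric weights, and the decision to treat unlisted head and modifier pairs as unattachable are all hypothetical.

```python
# Hypothetical semantic table: head word -> modifier type -> attitude.
HEAD_PREFERENCES = {
    "sincerity": {"LOCATION": "forbid"},
    "consort": {"ACCOMPANIMENT": "require"},
    "see": {"INSTRUMENT": "may"},
}

# A head that requires the modifier needs it more than one that may use it.
NEED = {"require": 2, "may": 1}


def choose_head(candidate_heads, modifier_type):
    """Pick the head to modify; candidate_heads is ordered closest first.

    Heads that forbid the modifier (or have no entry) are never chosen.
    Among equally needy heads the closest one wins.
    """
    best, best_need = None, 0
    for head in candidate_heads:
        attitude = HEAD_PREFERENCES.get(head, {}).get(modifier_type)
        need = NEED.get(attitude, 0)
        if need > best_need:  # strict comparison keeps the closer head on ties
            best, best_need = head, need
    return best


# "I saw the man in the park with the telescope"
chosen = choose_head(["park", "man", "see"], "INSTRUMENT")
```

The heads that were not chosen would be kept as alternatives, so that backup can still produce the other attachments.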

6. DEVELOPING AN ATN GRAMMAR

This final section is designed for those who wish to write an ATN grammar. Remember that it is not necessary to have a parser in order to effectively use an ATN grammar. One can learn a lot about the structure of a language (even a restricted subset) by going through the exercise of expressing the language in an ATN form. Careful hand simulation of parses is usually sufficient for testing portions of the grammar.

The first step in writing a grammar is to have a clear idea in mind of the types of sentences one would like to handle. What aspects of competence must be handled? Of performance? Is the grammar going to be used to generate or to parse? Make a list of ten or twenty sample sentences. Then decide on the general type of grammar which is desired, for example, decide on a syntactic grammar which will produce stratificational structures.

Next, sketch an ATN diagram of the most common constructions which you find in your sample sentences. After the surface structure has been drawn, add a few tests and actions. It is a good idea to use carefully chosen state and register names and to record on every arc (as a comment) its purpose together with a sample phrase or two which will use the arc. Like completely commenting a computer program as it is being written, this ideal is almost never actually done, but the closer one can come to this the easier it will be to debug and modify the grammar later.

Look for portions of the diagram which are identical. There are two ways to consolidate these: merging by loops and making a new level to be reached by PUSH arcs. Every grammar writer is eventually faced with the choice of whether to create a new level of the network or to use a longer, more complex set of arcs in the original net.

For inexperienced grammar writers, deciding where to separate a grammar into levels can be a problem. It seems natural to say that a noun phrase is a constituent, but what about a verb phrase? a determiner? an auxiliary? a reduced relative clause?

Some linguists have fairly fixed ideas about what constitutes a constituent. To them it is a convenient grouping of words which can be moved as a whole (but not in part unless the part is also a constituent) to another place in the sentence and which obeys certain rules with respect to transformational rules. This definition is not very helpful to writers of ATN grammars for applications, however, except that it does indicate that certain groups of words may appear in several places in a sentence. It is more efficient to have the grammar PUSH to a single level to process such units rather than to repeat the arcs and nodes which process them. Sometimes a PUSH is required by the nature of the structure to be parsed, as in noun phrases which may contain prepositional phrases which must contain noun phrases. In other cases a PUSH may be used merely for convenience when a section of input can be processed relatively independently of any information preceding it.

Now look for parts of the network which are very similar, but not quite identical. Again, there are two alternatives: either merge the nets using registers and tests to keep track of which path is being followed, or make a new level to be reached by PUSHs which use SENDRs to convey the necessary information about the differences. Merging similar networks is usually desirable since it cleanly and concisely expresses some generalization about the language. If this method is carried to extremes, however, the clarity is lost in the complexity of the tests. One could merge an entire network into one arc with a huge number of conditional tests, but this would not express very much about the language!

The merging of common portions of the network does more than permit a more compact representation; it eliminates the necessity of redundant processing when parsing. When two rules have common parts, by matching (or even attempting to match) the first, one has already performed some of the tests required for the matching of the second. Thus one can take advantage of this information to avoid redoing the work when trying the second rule.

Next, for each of the sample sentences decide on the structure which you want the parser to return. Try to sketch the structure of the major constituents within the sentence. Add more actions to the grammar to produce those structures. When conflicts arise, use a conditional action or use two arcs of the same type but with different tests and actions.

Try to consolidate the network by using self loop arcs with tests to prevent them from being taken more than once. For example, the grammar fragment in Figure 15 may be reduced by one state and one arc if it is represented as it was in Figure 6.

The use of "look-ahead" tests on PUSH arcs is a great time saver, since a lot of work is wasted if a recursive call is set up but fails before it consumes any input.

[Figure 15 diagram not reproduced; surviving arc labels: CAT NEG, JUMP]

Figure 15: A Grammar Fragment Which May be Reduced

The best way to debug a grammar is to parse a number of sentences and examine the operation of the parser on those which fail. It is extremely useful to have a tracing facility in the parser which will print out each state entered, each arc taken (not the whole arc, just enough to identify it), each register set, each structure POPed, and each state which blocks.
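The kind of trace described above can be sketched as follows. This is only an illustration under stated assumptions: the tiny lexicon, the state names, the arc identifiers and the `parse` function are all hypothetical, the network has no PUSH arcs, and nothing here reproduces the parser discussed in the text.

```python
# Toy tracing interpreter for a flat transition network (hypothetical names).
LEXICON = {"the": "DET", "dog": "N", "barks": "V"}

# state -> list of (arc_id, kind, category, register, next_state)
NETWORK = {
    "S/":    [("S1", "CAT", "DET", "det", "S/DET")],
    "S/DET": [("S2", "CAT", "N", "head", "S/N")],
    "S/N":   [("S3", "CAT", "V", "verb", "S/V")],
    "S/V":   [("S4", "POP", None, None, None)],
}

def parse(words, state="S/", pos=0, regs=None, trace=None):
    """Return (registers or None, trace); the trace lists every event."""
    regs = dict(regs or {})
    trace = trace if trace is not None else []
    trace.append(f"enter {state}")                    # each state entered
    for arc_id, kind, category, register, nxt in NETWORK[state]:
        if kind == "POP":
            if pos == len(words):
                trace.append(f"arc {arc_id} POP {regs}")   # structure POPed
                return regs, trace
        elif pos < len(words) and LEXICON.get(words[pos]) == category:
            trace.append(f"arc {arc_id}")             # arc taken, id only
            new_regs = dict(regs)
            new_regs[register] = words[pos]
            trace.append(f"setr {register} = {words[pos]}")  # register set
            result, _ = parse(words, nxt, pos + 1, new_regs, trace)
            if result is not None:
                return result, trace
    trace.append(f"block {state}")                    # state which blocks
    return None, trace
```

Calling `parse("the barks".split())` returns no structure, and the trace shows that the parser blocked in state `S/DET`, which is the sort of information the text recommends examining for failing sentences.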

The trace is useful not only to debug sentences which didn't parse or which parsed and returned the wrong structure, but also to find inefficiencies in the processing of correct sentences. If a sentence parses correctly but requires much backup, there are several things to look for between the point where the parser started down the wrong path and the point where it blocked: Can a test be made at the beginning of the erroneous path that would prevent the arc from being taken? Should the arcs be reordered so that the correct one is taken first? Can the right and wrong paths be merged?

The overriding consideration in a decision among several adequate but different network representations should be the clarity of the method. After all, the ultimate purpose of writing a grammar is to communicate something about the structure of language to other human beings. A grammar writer who always sacrifices clarity for the sake of efficiency will waste extraordinary amounts of time modifying and explaining his work. The programmer's maxim, "Make it work, then make it fast," should be heeded.

One portion of the grammar interacts with many others, so as the grammar gets large it becomes harder to keep track of (or even to discover) the implications of additions or changes. This is where copious commenting of the grammar pays off. It is also a good idea to keep a list of sentences which thoroughly exercise all parts of the grammar. Add to the list whenever new capabilities are added to the grammar and occasionally parse the entire list, just to be sure that what used to parse still does.
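The list of exercising sentences can be kept together with the structure each one is expected to return, so that reparsing the whole list becomes mechanical. The sketch below is an assumption about how one might organize this; `run_regression`, `toy_parse` and the sample cases are hypothetical, and `toy_parse` merely stands in for a real ATN parser.

```python
def run_regression(parse, cases):
    """Return the cases whose current parse differs from the recorded one."""
    failures = []
    for sentence, expected in cases:
        actual = parse(sentence)
        if actual != expected:
            failures.append((sentence, expected, actual))
    return failures

def toy_parse(sentence):
    # Stand-in parser: accepts exactly two words, subject then verb.
    words = sentence.split()
    if len(words) == 2:
        return ("S", ("NP", words[0]), ("VP", words[1]))
    return None

# Each entry pairs a sentence with the structure it used to produce.
CASES = [
    ("dogs bark", ("S", ("NP", "dogs"), ("VP", "bark"))),
    ("birds sing", ("S", ("NP", "birds"), ("VP", "sing"))),
]

failures = run_regression(toy_parse, CASES)
```

An empty `failures` list means that everything which used to parse still does; a new case is appended to `CASES` whenever a capability is added to the grammar.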


As the grammar grows, the dictionary will probably grow at the same time. It is important to keep track of what features the grammar will be testing and how to decide whether a word is to get that feature or not. See Appendix II for a description of the information which may be kept in the dictionary.

The reader who has carefully studied the concepts presented here should now be able to design an ATN grammar and/or parser with which to experiment. It is a rewarding experience to use such a simple yet powerful mechanism. The author would greatly appreciate receiving comments, suggestions, and reports of others' experiences with ATN grammars.

I would like to express appreciation to William A. Woods for his reading of a draft of this paper; the responsibility for errors is my own.

References

Bates, M. "The Use of Syntax in a Speech Understanding System," IEEE Transactions on Speech and Signal Processing, Vol. ASSP-23, No. 1, Feb. 1975, pp. 112-117.

Bates, M. "Syntactic Analysis in a Speech Understanding System," BBN Report No. 3116, Bolt Beranek and Newman Inc., Cambridge, Ma., 1975.

Bates, M. "Syntax in Automatic Speech Understanding," American Journal of Computational Linguistics, Microfiche 45, 1976.

Bobrow, D.G. and Fraser, J.B. "An Augmented State Transition Network Analysis Procedure," Proc. IJCAI, 557-567, 1969.

Bobrow, R. and Bates, M. "The Efficient Integration of Syntactic Processing with Case-Oriented Semantic Interpretation," submitted to the Annual Meeting of the Association for Computational Linguistics, Georgetown University, Washington D.C., March 1977.

Burton, R.R. "Semantic Grammar: An Engineering Technique for Constructing Natural Language Understanding Systems." BBN Report No. 3453, Bolt Beranek and Newman Inc., Cambridge, Ma., December 1976.

Burton, R.R. and Woods, W.A. "A Compiling System for Augmented Transition Networks," presented at the International Conference on Computational Linguistics, Ottawa, Canada, June 1976.

Earley, J. "An Efficient Context-Free Parsing Algorithm." Communications of the ACM, 13, 1970, 94-102.

Grebe, K. "Verb Clusters of Lamnsok," in Network Grammars, Grimes, J., ed., 1975.

Grimes, J. "Transition Network Grammars: A Guide," in Network Grammars, Grimes, J., ed., 1975.

Grimes, J., ed. Network Grammars, a publication of the Summer Institute of Linguistics of the University of Oklahoma, 1975.

Leal, W.M. "Transition Network Grammars as a Notation Scheme for Tagmemics," in Network Grammars, Grimes, J., ed., 1975.

Rustin, R., ed. Natural Language Processing. Algorithmics Press, N.Y., 1973.

Shapiro, Stuart C. "Generation as Parsing from a Network into a Linear String," American Journal of Computational Linguistics, Microfiche 33, 1975.

Simmons, R.F. "Semantic Networks: Their Computation and Use for Understanding English," in Computer Models of Thought and Language. Eds. R.C. Schank and K.M. Colby. San Francisco: W.H. Freeman and Company, 1973.

Simmons, R. and Slocum, J. "Generating English Discourse from Semantic Networks," CACM, 15:10 (Oct. 1972), pp. 891-905.

Teitelman, W. INTERLISP Reference Manual. Xerox Palo Alto Research Center, Palo Alto, California, 1974.

Thorne, J.P., Bratley, P., and Dewar, H. "The Syntactic Analysis of English by Machine," in Michie, Machine Intelligence 3, pp. 281-309, 1968.

Weischedel, R.M. "A New Semantic Computation While Parsing: Presupposition and Entailment." Technical Report 76, Department of Information and Computer Science, University of California, Irvine, California, 1976.

Weissman, C. LISP 1.5 Primer, Dickenson Publishing Co, Belmont, Calif., 1967.

Woods, W.A. "Augmented Transition Networks for Natural Language Analysis." Harvard Computation Laboratory Report No. CS-1, Harvard University, Cambridge, Ma., 1969. (Available from the National Technical Information Service, 5285 Port Royal Rd., Arlington, Va., 22209, USA, as Microfiche PB-203-527; also available from ERIC, PO Box O, Bethesda, Md., 20014, USA as publication ED-037-733)

Woods, W.A. "Transition Network Grammars for Natural Language Analysis." Communications of the ACM, 13 (1970), 591-606.

Woods, W.A. "An Experimental Parsing System for Transition Network Grammars." Natural Language Processing, Randall Rustin, ed., New York: Algorithmics Press, 1973.

Woods, W.A., R.M. Kaplan, and B. Nash-Webber, "The Lunar Sciences Natural Language Information System: Final Report." BBN Report No. 2378, Bolt Beranek and Newman Inc., Cambridge, Ma., 1972. (Available from the National Technical Information Service as publication N72-28984)

Woods, W.A. et al, "Speech Understanding Systems, Final Report Vol. IV (Syntax and Semantics)," BBN Report No. 3438, Bolt Beranek and Newman Inc., Cambridge, Ma., 1976.

SYNTACTIC ANALYSIS OF WRITTEN POLISH

Stanisław Szpakowicz

Institute of Informatics of Warsaw University

Pałac Kultury i Nauki, pok. 850, 00-901 Warszawa, POLAND

Abstract

The aim of the paper is to give an idea of methodology and of technical solutions used in the design of an experimental syntax-oriented program to process Polish texts; the program is currently being developed by the author. A classification of Polish words is presented. It is based on the notion of syntactic category and it covers in principle most of the inflexional and syntactic features of words. Polish syntax is to be described by means of a formal grammar; the description takes into account some newer results concerning the syntactic function of particular word classes. The formalism used to describe syntax is Colmerauer's metamorphic grammar. The program will be implemented in PROLOG, the powerful programming language in which metamorphic grammars are directly available. The output of the program will be the surface syntactic structure of each sentence. Next, a subset of Polish is specified. The subset consists of sentences to be processed by the program. Finally, some details of the program are given.

1. Introduction

Syntactic analysis may be understood in several ways, depending on the definition or the description of syntax itself and on the task performed by the analysis process, for example by some algorithm (program, system). The analysis may concern texts, or single sentences, or phrases. The results of analysis depend again on the definition of syntactic units and relations between them.

I use the results of Saloni (1976) as the theoretical foundation of my syntax definition; I confine myself to sentences (compound clauses as well as simple clauses); I ignore the interrogative sentences for the time being.

Syntax describes formal relations between words. It gives the rules that allow one to recognize the syntactic function of each word in the sentence by indicating the possibility of locating it in the abstract syntactic structure of the sentence, e.g. in such units as noun-phrase, verb-phrase, adjective-phrase etc. Recognition of the relations is based on the occurrence of some required inflexional ending or on another similar formal representation. The syntactic relations consist first of all in matching the values of such grammatical categories as case, gender, number and person; among the matchings of such kind agreement and government are traditionally distinguished. Another important problem is to recognize in the sentence obligatory subordinate constituents, which are syntactically implied by the head of the construction. I do not consider any semantic interdependence of words.

The words are the basic constituents of a sentence. I do not use the notion of morphemes, that is, I do not distinguish stems and affixes. Instead, I assume that each word is supplied with sufficient inflexional and syntactic information. This can be achieved by some dictionary-based preprocessing of the input sentence. The outcome of syntactic analysis is the sentence representation revealing its structure and the relations between particular words. It might be, for instance, a tree of parsing.

2. The scope of syntactic description of words

Polish is an inflexional language, so the inflexional features of a word determine to some extent its syntactic role. The word order is definitely less important from the point of view of Polish syntax, though it has some impact on the stylistic and semantic characteristics of the sentence.

I distinguish seven inflexional categories. They are as follows:

1) Case, assuming six values - nominative, genitive, dative, accusative, instrumental, locative.

2) Gender; it is useful to single out at least six distinct values, which are the following:

a) masculine-personal (e.g. "chłopiec" - "boy");
b) masculine-animate ("kogut" - "cock");
c) masculine-inanimate ("stół" - "table");
d) feminine ("kobieta", "noga" - "woman", "leg");
e) neuter ("dziecko", "okno" - "child", "window");
f) plurale tantum ("rodzeństwo", "drzwi" - "siblings", "door").

3) Number - singular, plural.

4) Person - first, second, third.

5) Degree - positive, comparative, superlative.

6) Mood - indicative, imperative, conditional.

7) Tense; it is sufficient to distinguish only two possible values - past and nonpast (the compound future will be treated at the level of syntax).

Sometimes it is useful to distinguish, besides the values mentioned above, the universal value 0 (i.e. "another value"), for instance: the case 0, realised as an empty word, like in an implied first person subject; the degree 0, which means that an adjective has no comparative and superlative degree.

I assume that words can be gathered into sets called lexemes, which group words differing only in the value of some grammatical category (or values of categories). A lexeme may be thought of as a dictionary entry of the kind used e.g. in the great dictionary of Polish (Doroszewski 1958).

The proper inflexional category of a word is the category that determines some opposition within the lexeme to which the word belongs. It is easiest to define each lexeme by simply listing its elements and indicating which inflexional features are actually relevant. Here are the examples of proper inflexional categories: the case of a noun; the gender of an adjective; the number of a personal verb.

The selective inflexional category is the category which itself does not constitute any opposition. Instead it determines the value of the proper inflexional category of a governed word. For instance: a preposition decides the case of a noun; a noun has the selective category of gender which determines the gender of an adjective connected with the noun. The selective category can coincide with the proper category (e.g. the case of a noun is proper and at the same time selective from the point of view of an adjective).

The generalized selective category or the syntactic requirement has to do with syntactic implication. If a word implies, say, an infinitive, I recognize infinitive as the value of the syntactic requirement of this word. A word can require: nothing; an infinitive; an adjective; a noun in a particular case; an adverb or an adverbial modifier; a preposition group; and several combinations of the above. Moreover a subordinate clause can be required, as in "wiem, że..." ("I know that ..."). All mentioned word categories are understood syntactically, that is "an infinitive" stands for every distributive equivalent of an infinitive, for instance for an infinitive modified by an adverb. I assume for the sake of clarity that no word has more than three different requirements at a time; the assumption seems to be justified in almost all cases.

The inflexional categories (both proper and selective) and the syntactic requirements I treat jointly and I call them syntactic categories.

Below I present the classification of words according to the combinations of relevant syntactic categories. Irrelevant ones are ignored. Basically each class has a unique selection of categories. Several classes are further subdivided. Needless to say, the classification is arbitrary, although relatively well suited to the recent results in the morphology and syntax (Saloni 1974, 1976a); there are also some similarities to the ideas expressed in (Misz 1971) and (Misz, Szupryczyńska 1971).

I use the following abbreviations: c - case, g - gender, n - number, p - person, d - degree, m - mood, t - tense, r1, r2, r3 - requirements (an absent one may be assigned the value "nothing" for greater consistency). The symbol x0 (x with the subscript 0) means that the category x has a fixed value regardless of the values of remaining categories. Proper categories, selective categories and requirements are separated by semicolons; "-" means that the respective group of categories is wholly absent. I omit the selective categories coinciding with some proper ones.

Word class                                              Categories

1) Noun                                                 c,n; g,p0
2) Substantival pronoun (e.g. "ja" - "I")               c,n0; g,p
3) Adjective                                            c,g,n,d; -
4) Adjectival pronoun ("taki" - "such")                 c,g,n; -
5) Adverb                                               d; -
6) Adverbial pronoun ("tak" - "so")                     -; -
7) Numeral                                              c,g; n0
8) Preposition                                          -; c
9) Conjunction                                          -; -
10) Personal verb                                       g,n,p,m,t; -; r1,r2,r3

This class includes three subclasses for which different sets of proper categories are relevant. These are:

10.1) Imperative verb                                   n,p,m0,t0; -; r1,r2,r3
10.2) Present (or simple future) verb
      ("idę", "przyjdę" - "I go", "I shall come")       n,p,m0,t0; -; r1,r2,r3
10.3) Past verb ("znałem" - "I knew")                   g,n,p,m,t0; -; r1,r2,r3

11) Impersonal verb ("zrobiono" - "one did"
    or "it was done")                                   m; -; r1,r2,r3
12) Infinitive                                          -; -; r1,r2,r3
13) Gerund                                              c,n; g0,p0; r1,r2,r3
14) Adjectival participle                               c,n,g; -; r1,r2,r3

This class is further divided into two subclasses with distinct syntactic functions but with identical categories:

14.1) Active participle ("idący" - "going" as in "a going man")
14.2) Passive participle ("bity" - "beaten")

15) Adverbial participle ("idąc" - "going"
    as in "he slept, going")                            -; -; r1,r2,r3
16) Auxiliary verb "będę" ("shall", "will")
    constituting the compound future                    n,p; -
17) Unclassified, i.e. anything else; this class has no syntactic categories.

Remarks:

a) An adjective may have certain requirements which will be taken into account later.

b) At the present stage of research the list of categories of the numeral is still incomplete.

c) Certain characteristics of the conjunction can be categorized, for example affinity to another conjunction, say, "either" to "or", "if" to "then". Such facts will be investigated later.

d) The mood of present verbs is fixed otherwise than that of imperative verbs.

e) The class "unclassified" may be further diversified to include proper names, numbers, abbreviations, scientific symbols, foreign throw-ins etc.

The process of assigning each word in a given sentence a set of values of its syntactic categories I call syntactic preprocessing. A simple search algorithm will suffice if only the search space is properly organized. One approach can consist in writing down all inflexional forms of all words of the vocabulary. The dictionary obtained in such a way should also include selective categories and requirements.

I assume that the syntactic preprocessing can be relatively easy to implement or at any rate easy to simulate. It is so because no connections between words need to be analysed. The syntactic categories of a word can be singled out solely on the basis of its appearance. Any possible ambiguities can be solved just by repeating an appropriate dictionary entry as many times (with suitable values of categories) as is needed to account for those ambiguities. Therefore in further considerations I shall use freely all necessary syntactic information.

The above classification and the grammatical characterization of word classes have been already outlined and partially verified in the NARu system (Bień et al. 1973, 1973a, 1973b, 1974; Łukaszewicz, Szpakowicz 1973, 1974, 1976).
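A full-form dictionary of the kind just described, with an ambiguous form simply entered more than once, might be sketched as below. The data layout, the function name `preprocess` and the handful of entries are assumptions made for illustration only; the entries cover a few readings of three Polish forms and are not meant as a complete description of them.

```python
# Full-form dictionary: one entry per (form, reading). An ambiguous form
# is listed as many times as it has readings (illustrative entries only).
DICTIONARY = [
    ("okno", {"class": "noun", "case": "nominative",
              "number": "singular", "gender": "neuter"}),
    ("okno", {"class": "noun", "case": "accusative",
              "number": "singular", "gender": "neuter"}),
    ("kobieta", {"class": "noun", "case": "nominative",
                 "number": "singular", "gender": "feminine"}),
    ("widzi", {"class": "personal verb", "number": "singular",
               "person": "third", "tense": "nonpast",
               "requirements": ("noun in accusative",)}),
]

def preprocess(sentence):
    """Attach to every word the list of all its dictionary readings.

    No relations between words are examined: each word is looked up
    solely on the basis of its written form.
    """
    result = []
    for word in sentence.split():
        readings = [cats for form, cats in DICTIONARY if form == word.lower()]
        result.append((word, readings))
    return result

tagged = preprocess("Kobieta widzi okno")
```

Here `tagged` carries one reading for "Kobieta", one for "widzi" and two for "okno"; choosing between the nominative and accusative readings is left to the grammar, as the text assumes.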


3. The method of syntax description

Syntax is described by means of a formal grammar. Syntactically preprocessed words are the terminal symbols of this grammar. The nonterminal symbols (further referred to as syntactic units) are chosen more or less arbitrarily, although according to some linguistic intuitions. The productions, which I call replacement rules, define the structure of syntactic units. The topmost unit, or the axiom of the grammar, is SENTENCE. At the bottom, nearest to the words, are syntactic units representing any word of a particular class (cf. 2). Actually, the syntactic units are not listed explicitly; they are instead given implicitly by a set of rules. The words are not listed at all: the set of words is determined by the content of a dictionary.

The task of syntactic analysis consists in mapping an analysed sentence onto an appropriate structure; such mapping need not be unique but it should reflect the fundamental characteristics of a sentence. Within the adopted set of replacement rules one should be able to find (for each sentence of a predefined collection) at least one sequence of rules which constitutes a derivation of a given sentence from the axiom of the grammar. The derivation should comprise every match needed to take into account values of syntactic categories of words which make the sentence. Every syntactic unit has also some syntactic categories due to the word class distributively equivalent to it. These are the external categories of a unit which determine its connections, as a whole, with other constituents of a sentence. If a unit includes something more than a single specimen of a word class, then it has its own internal structure expressed by means of suitable category matches. This structure is hidden from above but it must be revealed if the analysis is to be complete.

The structure found out in the course of analysis I call surface syntactic structure. The only considered features of a word are its word class characteristics. Any word of a given class can be substituted for another one provided that both have identical values of all syntactic categories; the resulting surface syntactic structure is the same in both cases. On the other hand, changing the order of two different neighbouring units renders a different (however similar) structure, although both structures may differ only at the lowest level.

The surface syntactic structure can be represented by a parsing tree. Every rule used during analysis specifies a parent node and its daughter nodes. The leaves of such a tree are the syntactically preprocessed words. An augmented version of a parsing tree might be a parsing graph, produced from the tree by linking up all pairs of nodes which have some matching category. Every such link would be an arc labelled with name and value of an appropriate category. All syntactic relations observed in a sentence would be thus fully exposed.

Some well known facts should be pointed out. It is practically impossible to describe the natural language in extenso by means of a formal grammar. It would be unrealistic, if at all possible. A reasonably chosen subset of the language can be, however, described in a sufficiently detailed manner. A carefully selected collection of syntactic units makes it possible to write down such a set of rules that is highly plausible as a starting point of some computer-based implementation. The same is valid in case of vocabulary, which should be always considered as specific to some application.

At the present stage of research it is convenient to express syntactic relations by means of context-free rules with parameters. Those parameters stand for syntactic categories. The rule with a parameter can be treated as an abbreviation of a set of rules concerning individual values of the parameter. The parameter can occur in various units in the same rule; it assumes then the same value. This means that the corresponding syntactic categories have identical value. This is how the matching is realized. If the proper inflexional categories of two units match, then it may be interpreted roughly as agreement (for instance, the connection between the case of a noun and of an adjective can be thus reflected). If the proper category of a unit matches the selective category of another unit, then we can interpret it as government (in this manner the gender of a noun and of an adjective can be matched). Similarly the syntactic requirement can be matched with an appropriate word class of a required unit. In general, matching the values of syntactic categories enables us to render distributive similarities of different syntactic units, such as noun phrases with different order of complexity.

The motivation underlying the choice of syntactic units is strictly distributive. The word class may be (slightly imprecisely) thought of as including items which are distributively equivalent but have different degree of complexity in some specific sense. It is then convenient to distinguish a number of subclasses of a word class; they should have approximately the same degree of complexity. I call such subclasses phrases. The phrases can be arranged in a sequence according to their growing complexity. The simplest phrase is just a single word of an appropriate class. The phrase of each next degree consists of some phrases of the previous degree (in particular, of only one). The phrases are linked up by means of conjunctions or, speaking more precisely, by means of constructions syntactically equivalent to conjunctions. The phrase of the lowest degree may be either a single word or (recursively) a highest degree phrase; it is an illustration of the fact that the phrases of all degrees are essentially equivalent from the standpoint of distribution.

The number of degrees is arbitrary. It seems to me, though, that it should conform to the experimentally determined relative frequency of respective constructions in a given text corpus.

In order to attain the greatest possible generality of replacement rules one should always choose the most complicated phrase to stand for an element of a word class: this phrase can be directly replaced by any less complicated one.

As an example, let us consider the sequence of syntactic equivalents of a noun. A "series of noun phrases" consists of one or more "noun phrases"; each of those includes, in turn, one or more "single-noun phrases". A single-noun phrase (SNP) deprived of all adverbial modifiers (that are insignificant from the point of view of fundamental syntactic relations) makes a "trimmed single-noun phrase", which may be one of the following: a substantival pronoun; a noun accompanied by an attribute, which can be, by the way, a fairly complicated adjective phrase; the same plus an SNP in genitive case; one of the mentioned above with numerals involved; moreover the phrase can include a subordinate clause.

I introduce the phrases related to the following word classes: noun, verb (classes 10, 11), adjective, adverb, infinitive, numeral. Every member of each of the classes can be located at various levels of the syntactic structure of the sentence, depending on the degree of complexity of a relevant phrase. For instance, "he" in "he fences" is treated as an SNP, in "he and Jack fence" as a "noun phrase", whereas in "either he and Jack or Jim and Joe fence" as a "single-noun phrase".

Here are a few examples of rules, connected with the noun-like phrases described above. Let us assume that every such phrase has four parameters: case, number, gender, person. The names of syntactic units and the constant parameters are written in block letters. The sequence number is not the part of a rule.

1) SERNOUNPHR(case,numb,gend,pers) = NOUNPHR(case,numb,gend,pers)

2) SERNOUNPHR(case,PLURAL,gend,pers) = NOUNPHR(case,numb2,gend2,pers2) CONJUNC SERNOUNPHR(case,numb3,gend3,pers3)

3) NOUNPHR(case,numb,gend,pers) = SNGLNOUNPHR(case,numb,gend,pers)

4) NOUNPHR(case,PLURAL,gend,pers) = SNGLNOUNPHR(case,numb2,gend2,pers2) CONJUNC NOUNPHR(case,numb3,gend3,pers3)

5) SNGLNOUNPHR(case,numb,gend,pers) = TRIMSNGLNOUNP(case,numb,gend,pers)

6) TRIMSNGLNOUNP(case,numb,gend,pers) = NOUNATTR(case,numb,gend,pers)

7) TRIMSNGLNOUNP(case,numb,gend,pers) = NOUNATTR(case,numb,gend,pers) SERNOUNPHR(GENITIVE,numb2,gend2,pers2)
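The remark that a rule with parameters abbreviates a set of rules over individual parameter values can be made concrete with a short sketch. It expands rule 3 into its plain context-free instances; the value names (for example `MASC-PERSONAL`) and the helper `expand` are assumptions introduced here, with the value counts taken from the seven inflexional categories listed in section 2.

```python
from itertools import product

# Parameter values, following the categories of section 2 (names assumed).
VALUES = {
    "case": ["NOMINATIVE", "GENITIVE", "DATIVE",
             "ACCUSATIVE", "INSTRUMENTAL", "LOCATIVE"],
    "numb": ["SINGULAR", "PLURAL"],
    "gend": ["MASC-PERSONAL", "MASC-ANIMATE", "MASC-INANIMATE",
             "FEMININE", "NEUTER", "PLURALE-TANTUM"],
    "pers": ["FIRST", "SECOND", "THIRD"],
}

def expand(lhs, rhs_units, params):
    """Expand one rule whose units all share the same parameter list.

    Because every unit receives the same tuple of values, each generated
    rule enforces agreement between the left and the right side.
    """
    rules = []
    for combo in product(*(VALUES[p] for p in params)):
        args = ",".join(combo)
        rhs = " ".join(f"{unit}({args})" for unit in rhs_units)
        rules.append(f"{lhs}({args}) = {rhs}")
    return rules

# Rule 3: NOUNPHR(case,numb,gend,pers) = SNGLNOUNPHR(case,numb,gend,pers)
rule3 = expand("NOUNPHR", ["SNGLNOUNPHR"], ["case", "numb", "gend", "pers"])
```

With six cases, two numbers, six genders and three persons, `rule3` holds 6 x 2 x 6 x 3 = 216 unparameterized rules, which shows how much the parameter notation saves.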

Each rule is applied according to the left-to-right principle. That is, a rule reads: a left side syntactic unit is to be replaced by a sequence of right side units, if the sections of a sentence, corresponding to the right side units, are contiguous. Moreover, all categories supposed to match should actually match. Note that in case of the rules 2 and 4 an additional procedure ought to be used which adjusts the gender of a left side to the genders of all right sides. Care should be also taken that more subtle rules are used to handle special cases of number and gender adjustment. As an example let us consider the sentence: "Dziecko, koń i kobieta przyszli" ("A child, a horse and a woman have come"). Each of the nouns has different gender, neither is masculine-personal, which is the case with the whole group. Another example: "Jan lub Piotr przyjdzie" ("John or Peter will come"), where the group is to be treated as singular.
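The additional adjustment procedure mentioned for rules 2 and 4 is not specified in the text, so the sketch below is only a placeholder under loudly stated assumptions: the function `adjust` and its two decision rules are hypothetical and were chosen solely so that the two example sentences above come out as described. Real Polish needs the "more subtle rules" the text asks for.

```python
def adjust(conjunction, conjuncts):
    """Compute (number, gender) of a coordinated group.

    conjuncts is a list of (number, gender) pairs, one per right side unit.
    ASSUMPTION 1: "i" ("and") always yields a plural group; any other
    conjunction, e.g. "lub" ("or"), keeps the number of the last conjunct.
    ASSUMPTION 2: conjuncts of one gender keep that gender; mixed genders
    give MASC-PERSONAL. This is a simplification, not a rule of Polish.
    """
    genders = {gender for _, gender in conjuncts}
    if conjunction == "i":
        number = "PLURAL"
    else:
        number = conjuncts[-1][0]
    if len(genders) == 1:
        gender = next(iter(genders))
    else:
        gender = "MASC-PERSONAL"
    return number, gender

# "Dziecko, koń i kobieta przyszli": neuter, masculine-animate, feminine.
group1 = adjust("i", [("SINGULAR", "NEUTER"),
                      ("SINGULAR", "MASC-ANIMATE"),
                      ("SINGULAR", "FEMININE")])

# "Jan lub Piotr przyjdzie": two masculine-personal singular nouns.
group2 = adjust("lub", [("SINGULAR", "MASC-PERSONAL"),
                        ("SINGULAR", "MASC-PERSONAL")])
```

Under these assumptions `group1` is plural masculine-personal and `group2` is singular masculine-personal, matching the verb forms "przyszli" and "przyjdzie" in the examples. A procedure of this shape would be called when rule 2 or rule 4 is applied, to fill in the `gend` parameter of the left side.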


4~ The tools for describing and analysin~ syntax

A grammar of the kind described in the previous section can be directly and conveniently expressed as a metamorphic grammar. Metamorphic grammars were invented by Colmerauer (1975) and have already proved in practice to be a useful means of defining some formal properties of a natural language (Battani, Meloni 1975). Metamorphic grammars are said to be at least as powerful as context-sensitive grammars. This is presumably even more than is currently needed for the surface syntactic analysis of written Polish. Analysis of words belonging to a language defined by a metamorphic grammar can be easily implemented in the PROLOG programming language. In fact, the grammar rules themselves are translated one-to-one into PROLOG subprograms. (By the way, synthesis of language elements is equally easily available in PROLOG; it is a very appealing property of the implementation of metamorphic grammars.)

PROLOG has been designed and developed by Colmerauer's team (Roussel 1975). It is an implementation of the idea of programming in predicate calculus, which has been advocated e.g. by Kowalski (1973, 1974), and it actually exceeds the capabilities of first order logic. Externally it can be viewed as a theorem prover for facts expressed in clausal form, based on the SL-resolution principle (Kowalski, Kuehner 1971). Internally, certain side-effects of a proving process, such as the substitutions necessary to unify appropriate literals, make PROLOG a very powerful, concise and elegant programming language. It is not, however, particularly efficient.


The basic data structures in PROLOG are terms, or tree structures. The proof procedure, and therefore control flow, is top-down, depth-first with backtracking in case of failure. A program in PROLOG is made of subprograms, each consisting of a sequence of clauses, and a sequence of invoking clauses which can be interpreted as subprogram calls. The choice of a clause within a subprogram resembles a case statement with a set of parameters as a selector. It is then a kind of pattern-directed procedure invocation where the pattern-matching process is carried out by means of unification.

The metamorphic grammar rules can be directly incorporated into a PROLOG program. They are in fact treated as a part of the program, since each rule corresponds to a clause. A set of rules can thus be regarded as a predicate calculus version of a language definition. The rules "work" in two directions: they can be used equally well during analysis and during synthesis of elements of a given language.
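The pattern matching by unification described above can be illustrated with a minimal Python sketch. It is not PROLOG's actual implementation: the representation of compound terms as tuples, the `Var` class and the function names are assumptions made for the example, and, as in PROLOG, no occurs check is performed.

```python
class Var:
    """A logic variable; distinct instances are distinct variables."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "?" + self.name


def walk(term, subst):
    """Follow variable bindings until an unbound variable or a non-variable."""
    while isinstance(term, Var) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst=None):
    """Return a substitution (dict) that makes a and b equal, or None.

    Compound terms are tuples (functor, arg1, ..., argn); constants are
    strings or numbers.
    """
    subst = {} if subst is None else subst
    a, b = walk(a, subst), walk(b, subst)
    if isinstance(a, Var):
        return subst if a is b else {**subst, a: b}
    if isinstance(b, Var):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple):
        if len(a) != len(b):
            return None
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return subst if a == b else None


# A clause head with free variables is matched against a call.
case, numb = Var("case"), Var("numb")
head = ("NOUN", "PIŁKĘ", case, numb, "FEM", 3)
call = ("NOUN", "PIŁKĘ", "ACC", "SING", "FEM", 3)
result = unify(head, call)
print(result[case], result[numb])   # ACC SING
```

Because a variable may sit on either side, the same matching step serves both directions of use: the caller may supply the categories and receive the word form, or the reverse.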

Activation of either process, analysis or synthesis, requires a PROLOG command. This command specifies both the direction of the process and the parameters which indicate the particular object submitted to the process.

Metamorphic grammars in PROLOG are especially handy for two reasons. First, one can interpret any parameter of a syntactic unit as another syntactic unit; a distinguished nonterminal NT(x1,...,xn) is interpreted roughly as a nonterminal x1(x2,...,xn). If x1 is a unit name, then the nonterminal x1(x2,...,xn) stands for this unit. The second reason is the possibility of inserting in the right side of a rule any number of procedure calls, which are called conditions. They are transmitted verbatim to the clause corresponding to the rule and they explicitly condition the use of the rule: the activation of the corresponding literal must succeed, otherwise the rule is abandoned. Moreover, some useful action may be performed, like gender and number adjustment of a noun-like phrase.


5. The specification of a subset of Polish

Here are the properties of a subset of Polish to be actually processed by a preliminary version of a syntactic analysis and synthesis system which is currently being implemented in PROLOG. For the sake of the system it is useful to determine what is meant by a sentence from the technical point of view: it is each section of an input text terminated by a period or a semicolon. The task of syntax analysis of a sentence consists in examining its syntactic correctness (that is, its accordance with a given set of replacement rules which implicitly define the notion of correctness); every correct sentence should be assigned its surface syntactic structure. Punctuation must be correct too. The subset of Polish then includes all and only those sentences that conform to the restrictions listed below.

1) Only proper clauses are considered, indicative or conditional. Compound clauses are admissible too. (By a proper clause I mean a clause which has at least one predicate; e.g. a sole noun phrase would not be accepted.)

2) No ellipses are allowed, e.g. "Dali wczoraj." ("They gave yesterday.") is not accepted.

3) The phrases ought to be continuous: no two distinct phrases should interlace, e.g. "Dobrym jest on lekarzem." ("He is a good physician.") is not accepted.

4) The word order should be approximately neutral, although permutations of whole phrases are possible.
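The technical definition of a sentence given above (a section of input text terminated by a period or a semicolon) can be sketched in Python as follows. The function name is hypothetical, and the decision to discard trailing text that lacks a terminator is an assumption; the paper does not say how such text is treated.

```python
import re


def split_sentences(text):
    """Split text into sections terminated by a period or a semicolon."""
    # Each match is a run of non-terminator characters followed by one
    # terminator; the terminator is kept as part of the sentence.
    # Trailing text without a terminator is not returned.
    return [m.group().strip() for m in re.finditer(r"[^.;]+[.;]", text)]


print(split_sentences("Jan lub Piotr przyjdzie. Dali wczoraj; koniec"))
# ['Jan lub Piotr przyjdzie.', 'Dali wczoraj;']
```

Under this purely technical definition a period inside an abbreviation would also end a sentence, so real input would need preprocessing before being split this way.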


A finite verb is the pivot of a Polish sentence. It is the verb belonging to word classes 10 and 11. The members of word classes 12-15 play a specific role in a sentence too, due to their syntactic requirements. Corresponding items of classes 10-15 have (with few exceptions) identical requirements. These classes have therefore been collected into a superclass of verb derivatives. A new syntactic category has been introduced: it is called derivational discriminant and it applies only to a verb derivative, dividing it into the original classes. I follow here the idea of verb derivatives formulated by Tokarski (1973). It has also (in a specific form) occurred in the MARYSIA system.

A verb derivative is the central syntactic unit of a generalized verbal construction built of the derivative itself and of the units required by it. According to the principle given earlier, each requirement is satisfied by the most complicated phrase which can stand for an element of a required word class. For instance, if a verb derivative requires a noun in dative case, then we refer to a "series of noun phrases" in dative. The verbal construction with a fixed discriminant makes a special case of: verb phrase, adjective phrase, adverb phrase, infinitive phrase. It is then convenient for technical reasons too, as it allows us to limit the number of replacement rules.

The syntactic units which may correspond to single words I regard as elementary units. The elementary units are associated with each of classes 1-16, with five subclasses of classes 10 and 14, and with the superclass of verb derivatives. Every elementary unit is given the parameters that are necessary to stand for all syntactic categories of a suitable class; moreover it has an additional parameter which represents a word form belonging to that class. For instance the elementary unit NOUN has five parameters that correspond to a word form, case, number, gender and person, respectively.

The elementary units are, in some sense, terminal units with respect to the definition of the subset of Polish. That is, within a surface syntactic structure any representative of a word class may be substituted for an elementary unit related to the class, and the structure will remain unchanged (obviously, semantic considerations would be needed to restrict the number of permissible substitutions). As a matter of fact, a description of syntax (in the sense adopted here) should do well without lexical items, since their only features relevant to syntax are their syntactic categories.


6. The organization of an experimental program

The syntactic analysis program has not been implemented yet. Below I shall present some technical decisions which will be thoroughly tested soon.

The replacement rules constituting the syntax definition are the global rules. They apply to every sentence of the subset of Polish which has been described above, provided that each word of the sentence is linked to a corresponding elementary unit. This can be accomplished via syntactic preprocessing. If a separate sentence ought to be analysed, then it will be sufficient to complement the global rules with those and only those specific rules which concern this sentence. These rules can be regarded as local (to the sentence). A local rule defines an elementary unit having a specified word form parameter as this particular word form. The form is supplied with pertinent syntactic categories.

The global rules would be the constant part of a PROLOG program. The local rules would be exchangeable: they would vary from one sentence to another. In the current tentative version, though, the arrangement of rules is slightly different, because not only syntactic preprocessing but also the dictionary are simulated as yet. I use the distinguished nonterminal NT (cf. section 4). There is one global rule for each elementary unit. For example, a rule for the NOUN unit is:

   NOUN(form,case,numb,gend,pers) == NT(form,case,numb,gend,pers)

(The double "=" separates the left and right sides of a rule.) For a fixed word form, NT(form,case,numb,gend,pers) corresponds to a nonterminal form(case,numb,gend,pers). If the parameter "form" has the value, say, PIŁKĘ ("a ball", accusative), then the nonterminal looks like this:

   PIŁKĘ(ACC,SING,FEM,3)

The vocabulary is composed of word forms. Every form has one or more readings with respect to its syntactic categories. There is a rule for each reading, with a nonterminal of the above form at the left side of the rule and with a word form at the right. The word form is written as a metamorphic grammar terminal (prefixed by a special symbol, say, a $, to distinguish it from nonterminals). For example, the set of rules for the word przyjaciela ("a friend") may be as follows:

   PRZYJACIELA(GEN,SING,MASCPERS,3) == $PRZYJACIELA
   PRZYJACIELA(ACC,SING,MASCPERS,3) == $PRZYJACIELA
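The interplay of the global rule for NOUN, the distinguished nonterminal NT and the vocabulary rules can be mimicked in Python as a lookup. This is an illustrative sketch only, not the PROLOG program: `LEXICON`, `nt` and `noun` are hypothetical names, and using `None` in place of a free variable is an assumption of the example.

```python
# Hypothetical vocabulary: each word form maps to its readings, i.e. the
# tuples (case, number, gender, person) of the rules
#   FORM(case,numb,gend,pers) == $FORM
LEXICON = {
    "PRZYJACIELA": [("GEN", "SING", "MASCPERS", 3),
                    ("ACC", "SING", "MASCPERS", 3)],
    "PIŁKĘ": [("ACC", "SING", "FEM", 3)],
}


def nt(form, *categories):
    """Mimic NT(form, c1, ..., cn) read as form(c1, ..., cn).

    None plays the role of a free variable and matches any value.
    Returns the list of readings compatible with the given categories.
    """
    return [reading for reading in LEXICON.get(form, [])
            if len(reading) == len(categories)
            and all(c is None or c == r for c, r in zip(categories, reading))]


def noun(form, case=None, numb=None, gend=None, pers=None):
    """Mimic NOUN(form,case,numb,gend,pers) == NT(form,case,numb,gend,pers)."""
    return nt(form, case, numb, gend, pers)


print(noun("PRZYJACIELA"))              # both readings
print(noun("PRZYJACIELA", case="ACC"))  # [('ACC', 'SING', 'MASCPERS', 3)]
```

An ambiguous form such as PRZYJACIELA yields several readings; in PROLOG the choice among them would be left to backtracking, whereas the sketch simply returns them all.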

Vocabulary rules of this kind are how the syntactic preprocessing is simulated.

Below I shall give the list of non-elementary syntactic units which occur at the left sides of global replacement rules. The list must not be regarded as complete or definitive, because the set of rules made up so far ought to be verified and then perhaps modified in order to mirror more adequately the characteristics of the chosen subset of Polish. The verification would be carried out with some particular text corpus. The list of non-elementary syntactic units is the following:

1) Sentence

2) Subject


3) Predicate

4) Noun phrases (four degrees of complexity)

5) Verb phrases (as above)

6) Infinitive phrases (as above)

7) Adjective phrases (three degrees of complexity)

8) Adverb phrases (as above)

9) Numeral phrases (as above)

10) Conjunctive construction (such as "a także" - "also", "jak również" - "as well as")

11) Verbal construction (cf. section 5)

12) Verb with requirements, a separate unit for each of these situations: no requirement, noun required, preposition plus noun required, two nouns required, noun and preposition plus noun required, subordinate clause required; this list can be amplified in the future.

13) A so far undetermined number of subordinate clauses, such as those connected with "że" ("that") or "który" ("which", "who").

14) Negation NIE, realized as the word "nie" or as an empty word.

15) Noun with attributes (introduced mainly for technical reasons).

16) Adjective with modifiers (as above)

The list will probably be expanded as a result of the verification mentioned above. Punctuation will also be taken into account, as in the initial outline it is not considered at all.

Syntactic analysis or synthesis of a sentence is activated by means of a special PROLOG command SYN with two parameters. The first parameter is an axiom of the metamorphic grammar (SENTENCE in our case), the second is the sentence put down as a concatenated list of consecutive words and punctuation marks.

For purely technical reasons each syntactic unit will have an additional parameter used to transmit successive approximations of a parsing tree produced during analysis. The same parameter will indicate the parsing tree of a sentence to be produced during synthesis. The tree will be transmitted as a term. In the case of analysis the initial value of the tree parameter of SENTENCE should be a free variable; the final value would then be a parsing tree. In the case of synthesis the second parameter of the SYN command, initially a free variable, would eventually receive the sentence representation as a result.

The information connected with a node of a parsing tree may be as complicated as necessary. The term corresponding to the node may have any number of parameters. The daughter nodes (which are terms themselves) must be among them; one can also choose, for instance, to place in the node information concerning some match of the daughter nodes, such as the name and value of a matching syntactic category.

I shall present below a sample term which corresponds to a parsing tree of the sentence: "Syn mojej siostry i córka przyjaciela wczoraj znaleźli piłkę i zabrali ją do domu" ("The son of my sister and the friend's daughter found a ball yesterday and took it home"). For the sake of clarity I have simplified the term by omitting less significant stages of analysis; for instance, I have neglected all single-unit phrases (such as "single-noun phrase", cf. section 3), because they are not important in this example. I have also removed from the nodes almost all syntactic categories. The remaining categories appear


as first parameters of the suitable nodes; other parameters are the daughter nodes (or the word forms in case of the nodes that describe elementary units). The names of nodes have the following meanings: SNP = series of noun phrases, NP = noun phrase, NP1T = trimmed single-noun phrase; SVP, VP, VP1T = as above for verbs; ADJP, ADJP1T = similarly for adjectives; VCON = verbal construction, VRN = verb requiring noun, VRNPR = verb requiring noun and preposition (plus noun). MASP means masculine-personal, MASI - masculine-inanimate; other names are, hopefully, self-explanatory.

Four subterms (denoted ①-④) have been taken out of the term so that it would be easier to read. The items corresponding to daughter nodes have been successively indented. The word forms were underlined in the original; they appear as the last parameter of each elementary unit.

SENTENCE(
   SUBJECT(PL,
      SNP(NOM,PL,MASP,
         ①,
         CONJ(I),
         ②)),
   PREDICATE(PL,
      SVP(PERS,MASP,PL,
         ③,
         CONJ(I),
         ④)))

①
NP(NOM,SING,MASP,
   NP1T(NOM,SING,MASP,
      NOUN(NOM,SING,MASP,SYN),
      SNP(GEN,SING,FEM,
         NP(GEN,SING,FEM,
            NP1T(GEN,SING,FEM,
               ADJP(GEN,SING,FEM,
                  ADJP1T(GEN,SING,FEM,
                     ADJPRON(GEN,SING,FEM,MOJEJ))),
               NOUN(GEN,SING,FEM,SIOSTRY))))))

②
NP(NOM,SING,FEM,
   NP1T(NOM,SING,FEM,
      NOUN(NOM,SING,FEM,CÓRKA),
      SNP(GEN,SING,MASP,
         NP(GEN,SING,MASP,
            NP1T(GEN,SING,MASP,
               NOUN(GEN,SING,MASP,PRZYJACIELA))))))

③
VP(PERS,MASP,PL,3,
   MODIFIER(
      ADVERB(WCZORAJ)),
   VP1T(PERS,MASP,PL,3,
      VCON(PERS,MASP,PL,3,
         VRN(PERS,MASP,PL,3,ACC,
            VERBPERS(MASP,PL,3,ACC,ZNALEŹLI),
            SNP(ACC,SING,FEM,
               NP(ACC,SING,FEM,
                  NP1T(ACC,SING,FEM,
                     NOUN(ACC,SING,FEM,PIŁKĘ))))))))

④
VP(PERS,MASP,PL,3,
   VP1T(PERS,MASP,PL,3,
      VCON(PERS,MASP,PL,3,
         VRNPR(PERS,MASP,PL,3,ACC,
            VERBPERS(MASP,PL,3,ACC,ZABRALI),
            SNP(ACC,SING,FEM,
               NP(ACC,SING,FEM,
                  NP1T(ACC,SING,FEM,
                     SUBSPRON(ACC,SING,FEM,JĄ)))),
            PREP(GEN,DO),
            SNP(GEN,SING,MASI,
               NP(GEN,SING,MASI,
                  NP1T(GEN,SING,MASI,
                     NOUN(GEN,SING,MASI,DOMU))))))))


The structure of the sentence revealed during analysis is roughly represented by this term. It can also be shown (in a simplified manner) in the following parenthesized form:

   ((((syn)(mojej siostry))(i)((córka)(przyjaciela)))
   (((wczoraj)((znaleźli)(piłkę)))(i)((zabrali)(ją)(do domu))))
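The parenthesized form can be produced mechanically from a nested structure. The Python sketch below assumes a simplified representation (nested lists whose leaves are strings of one or more word forms) rather than the full term with syntactic categories; the function name `bracket` is hypothetical.

```python
def bracket(node):
    """Render a nested list of phrases as a parenthesized string."""
    if isinstance(node, str):          # a leaf: one or more word forms
        return "(" + node + ")"
    return "(" + "".join(bracket(child) for child in node) + ")"


# Simplified structure of the sample sentence: subject and predicate,
# each a pair of phrases joined by the conjunction "i".
subject = [["syn", "mojej siostry"], "i", ["córka", "przyjaciela"]]
predicate = [["wczoraj", ["znaleźli", "piłkę"]], "i",
             ["zabrali", "ją", "do domu"]]
sentence = [subject, predicate]

print(bracket(sentence))
```

The printed string is the parenthesized form shown above, written on a single line.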


7. Conclusion

Automatic processing of Polish syntax has reached only the preliminary phase of investigation. The task of this phase consists in disclosing problems and in indicating the course of further research. The syntax definition must be verified, corrected and improved. The set of rules must then be expanded to cover some richer subsets of the language; it seems that the restrictions as to word order and continuity of phrases would be dropped first. A well structured dictionary accompanied by a reasonably organized lookup should make syntactic preprocessing more efficient and flexible than in the current version.

The research should be carried on in two interacting directions. First, it is necessary to study Polish syntax, especially from the point of view of computer applications. Next, looking for even more sophisticated programming tools is essential to implement more powerful syntax processing systems. The results achieved in both directions will probably allow better insight into problems which arise during work on the automatic processing of natural language texts.


References

(Battani, Meloni 1975) G. Battani, H. Meloni, "Mise en oeuvre des contraintes phonologiques, syntaxiques et sémantiques dans un système de compréhension automatique de la parole". G.I.A., Université d'Aix-Marseille, June 1975.

(Bień et al. 1973) J. St. Bień, W. Łukaszewicz, S. Szpakowicz, "Wprowadzenie do systemu MARYSIA". Reports of the Warsaw University Computation Centre, No 39, 1973.

(Bień et al. 1973a) J. St. Bień, W. Łukaszewicz, S. Szpakowicz, "Opis systemu MARYSIA. I. Zasady pisania scenariusza i scenopisu". Reports of the Warsaw University Computation Centre, No 41, 1973.

(Bień et al. 1973b) J. St. Bień, W. Łukaszewicz, S. Szpakowicz, "Opis systemu MARYSIA. II. Wprowadzanie haseł do systemu". Reports of the Warsaw University Computation Centre, No 42, 1973.

(Bień et al. 1974) J. St. Bień, W. Łukaszewicz, S. Szpakowicz, "Opis systemu MARYSIA. III. Tworzenie części gramatycznych słowników systemu". Reports of the Warsaw University Computation Centre, No 43, 1974.

(Colmerauer 1975) A. Colmerauer, "Les grammaires de métamorphose". G.I.A., Université d'Aix-Marseille, November 1975. (Also in this volume.)

(Doroszewski 1958) W. Doroszewski (ed.), "Słownik Języka Polskiego", vol. I-XI. Warszawa 1958-1969.

(Kowalski 1973) R. Kowalski, "Predicate calculus as programming language". D.C.L. Memo 70, University of Edinburgh, 1973.

(Kowalski 1974) R. Kowalski, "Logic for problem solving". D.C.L. Memo 75, University of Edinburgh, 1974.

(Kowalski, Kuehner 1971) R. Kowalski, D. Kuehner, "Linear resolution with selection function". Artificial Intelligence 2, 1971, pp. 227-260.

(Łukaszewicz, Szpakowicz 1973) W. Łukaszewicz, S. Szpakowicz, "Start prac nad systemem MARYSIA". In: "Zastosowanie maszyn matematycznych do badań nad językiem naturalnym". Wydawnictwa UW 1973, pp. 34-41.

(Łukaszewicz, Szpakowicz 1974) W. Łukaszewicz, S. Szpakowicz, "Charakterystyka systemu MARYSIA". In: "Systemy wyszukiwania informacji", PWN 1974, pp. 181-186.

(Łukaszewicz, Szpakowicz 1976) W. Łukaszewicz, S. Szpakowicz, "System konwersacyjny MARYSIA". In: "Zastosowanie maszyn matematycznych do badań nad językiem naturalnym II", Wydawnictwa UW 1976, pp. 127-137.

(Misz 1967) H. Misz, "Opis grup syntaktycznych dzisiejszej polszczyzny pisanej". Bydgoszcz 1967.

(Misz, Szupryczyńska 1971) H. Misz, M. Szupryczyńska, "Nad zagadnieniem deskryptorów dla niewspółrzędnych grup syntaktycznych dzisiejszej polszczyzny pisanej". In: "Problemy składni polskiej", Warszawa 1971.

(Roussel 1975) Ph. Roussel, "PROLOG, manuel de référence et d'utilisation". G.I.A., Université d'Aix-Marseille, September 1975.

(Saloni 1974) Z. Saloni, "Klasyfikacja gramatyczna leksemów polskich". "Język Polski" LIV (1974), vol. 1, pp. 3-13, vol. 2, pp. 93-101.

(Saloni 1976) Z. Saloni, "Cechy składniowe polskiego czasownika". Wrocław 1976.

(Saloni 1976a) Z. Saloni, "Kategoria rodzaju we współczesnym języku polskim". In: "Kategorie gramatyczne grup imiennych w języku polskim", Wrocław 1976.

(Tokarski 1973) J. Tokarski, "Fleksja polska". Warszawa 1973.
