Sei sulla pagina 1di 13

G un ter Neumann

Deutsches Forschungszentrum fur Kunstlic he Intelligenz


D Saarbruc ken Germany

neumann dfki uni sb de

T o appear in Proceedings of the TWLT Twente Netherlands

abstra ct

In this paper w e introduce the basic design


of the disco system a Natural


analysis and generation system

In particular we describe the disco de

velopment shell the basic tool for the

integration of natural language compo

nents in the disco system and its appli

cation in the cosma COoperative Sched

ule MAnagement Agent system

Following an obje ct oriented architec

tural model we introduce a two step ap

proach where in the rst phase the archi

tecture is developed independently of


ci c components to be used and of a par

ticular ow

of control In the second phase

the frame

system is

instantiated by the


of existing

components as well

as by de ning the particular ow of con

trol between these components Because

of the ob ject oriented paradigm it is easy

to augment the frame system which in


the exibility of the whole system


researc h underlying this paper was sup

ported by a research grant FKZ ITW

from the German Bundesministerium fur

Forschung und Technologie to the DFKI project

with respect to new applications The de

velopment of the cosma system will serve

as an example of this claim


Today s natural language systems are

large software products They consist of

serveral mutually connected components

of di erent kinds each developed by dif

ferent researchers often placed on di er

ent locations The integration of these

components has therefore become a soft

ware engineering and mangement prob

lem We will consider the pro ject dis

co DIalogue Systems for COoperating

agents from this perspective

disco s primary goal is the process

ing of multiagent natural language dia

logue Multiagent capabilities make it

an appropriate front end for autonomous

cooperative agents which will be exem

pli ed by the cosma system COopera

tive Scheduling MAnagement system de

scribed in section

The pro ject disco as a whole is a four

year eight person research e ort fund

ed by the German Ministry for Research

and Technology disco is completing its

fourth y ear The rst task of the develop uniform approach instead allows to pro

ment of the disco system was to provide cess these constraints in an incremental

a uniform core formalism based on uni and parallel way by the very same con

cation of feature structures The second straint solving algorithm

was the construction of a modular archi Figure shows graphically the structure

tecture orthogonal to the representation of the linguistic kernel

formalism as a platform for experimen

tation Third research and construction

of dialogue components is ongoing as is

fourth investigation of

the interface be

tween dialogue components and multia

gent systems

We will emphasize in this paper the sec

ond phase i e the description of the


derlying architecture The solutions to


other tasks will be reported brie y in


next section

o verview of the dis

co system

The linguistic core machinery extends a

constraint based approach of linguistic de

scription Pollard and Sag In this

paradigm linguistic ob jects are described

by a set of constraints which express mu

tually co occurence restrictions of phono

logical syntactic and semantic informa

tion A fundamental aspect of these the

ories is that they are de clarative i e they

only describe what constraints are



to describe linguistic ob jects not the

way in what order the constraints involved

are to be


Such a

uniform view has not only a lot

NLL expression NLL SAR SAP DISCO Grammar Lexicon Parser Generator TDL Udine X morf Scanning
NLL expression

Figure Overview of the disco kernel

of advantages for linguistic description but Uniform Core Formalism The lin

also for the design of Natural Language guistic knowledge is speci ed in the

systems because it leads to more com typed feature formalism TDL Sch afer and

pact systems If for example the di er Krieger which incorporates the

ent stratas e g syntax semantics would uni cation engine UDiNe TDL is the ex

be represented and processed in di erent clusive formal device employed to speci

modules than a complex internal ow of fy grammar rules lexical entries and all

control between these modules would be other linguistic knowledge relevant for the

necessary if the mutual co occurence re grammar UDiNe serves also as the basic

strictions should be maintained Using an machinery for linguistic processing e g

during parsing generation or speech act lard and Sag to appear The grammar is


In TDL typed feature structures can be

de ned through simple or multiple inher

itance relations TDL performs full type

expansion at compile time i e if in a type

de nition a type inherits from other types

or if the value of an attribute is restrict

ed to a type these types are replaced by

the associated feature

limited simpli cation

structures with only

Uni cation of feature structures in TDL

and elsewhere in the system is executed

by the un er UDiNe which is one of the

most comprehensive uni ers so far imple

mented It comprises full negation in

cluding negation of co references and full

disjunction UDiNe provides for so called

distribute d disjunction through which dis

junctive information can be kept as lo

cal as possible in the structural speci ca

tion The main advantage of distributed

disjunction is that it helps to avoid the

translation of structures containing dis

junctions into a disjunctive normal form

which given the size of structures in ques

tion could lead to a serious e ciency

problem UDiNe has a mechanism for

treating feature values de ned as func

tional constraints

Grammar The DISCO grammar is a

German grammar whose syntactic part

was developed by and described in Net

ter and which has an integrated se

mantic representation described in Ner


Kasper The style of

the grammar

follows very much the spirit

of Head Driven Phrase Structure Gram

mar HPSG Pollard and Sag Pol

There is no w a new version of TDL available

which enables partial or delayed type expansion

the incremental de nition of types negation dis

reversible in the sense that it is used for

both parsing and generation

Present coverage of German in the dis

co grammar includes on the

nominal level determiners and nu

merals bare plurals and mass nouns

postnominal prepositional phrases

nominal ellipsis

adjectival level complex and sim

ple adjective phrases attributive and

predicative functions

verb level V initial V second and V

nal modal verbs and construction of

verb complexes

clausal level free insertion of ad

juncts topicalization sentence types

Y N and WH interrogatives imper

atives declaratives

In addition the grammar has been ex

tended to include multiword lexemes per

mitting opaque idioms to reside in the lex

icon Fig shows an example of such

an entry This example demonstrates al

so the uniform representation of di erent

levels of linguistic information in HPSG

Semantic information is represented

within the grammar following the ideas

of constraint based semantics Nerbonne

The current status of semantic rep

resentation in the disco pro ject is de

scribed in Kasper It follows the

practice to specify the semantic repre

sentation only partially up to a level of

quasi logical form which leaves for ex

ample some contextual restrictions out of

consideration These are left to a sec

ond step of semantic interpretation Such

quasi logical forms contain the following

kind of information

junctive types and partitioning However this The following is not a complete list of the

version needs still to be integrated into the whole current coverage A detailed description can be


found in Netter










PRED time





STEM h cardinal i STEM huhr i STEM h cardinal i NUM sg HEAD VALUE
STEM h cardinal i STEM huhr i STEM h cardinal i

Figure An example of a multi word lexeme taken from Netter that covers

time expressions such as Uhr p m

predicate argument structure Plural terms

thematic roles of arguments

the modi cation relation

sortal and selectional restrictions on

predicate argument structures and

modi cation These have proved

to be very e cient means for

disambiguation especially of PP

attachment ambiguities see Kasper

NLL Although the core formalism has

a great deal of power the disco sys

tem also provides a logical form module

which facilitates translation into various

types of back end systems NLL is a

representation of standard predicate logic

with lambda abstraction and several fea

tures that support representation of nat

ural language constructions Laubsch and

Nerbonne These include

Named predicate argument positions

ship agt John

th shipment

time Mar

Jim and John are shippers

exist x shipper inst i x

arg fJim Johng arg

Location terms


John is in Saarbruc ken on the Saar

located th John

loc reg XfSB on fn Saar g

Restricted parameters

Generalized quanti ers

Complex determiners

Modularity is achieved by providing

several types of interfaces NLL struc

tures may be created using constructor

functions a structure parser or a feature

description language

The goal of modi ability is realized

through the implementation

of NLL using

the compiler tools Zebu a LISP version

of yacc Laubsch and refine

Processing Components In the fol

lowing we describe brie y the main pro

cessing components of the linguistic kernel

cf g

Scanning The scanner is mainly re strategies as well as giving control over

sponsible for preprocessing the input the processing of individual rules For

string It is implemented in lex and

example it is possible to set the control

yacc The scanner can expand abbre strategy to a breadth rst strategy to give

viations into their full forms as for ex priority to certain rules or to determine in

ample h into Uhr Jan into Jan which order types of daughters e g head

uar etc For tokens algorithmically en daughters adjunct daughters etc as well

coding their denotations the scanner also as individual daughters in a speci c rule

performs a morphological analysis by as are processed In addition the parser pro

signing a feature structure to them Such vides the facility to lter out useless tasks

tokens are above all cardinal and ordinal i e tasks where a rule application can be

numbers e g and but also predicted to eventually fail due to an in

complex time and date expressions such evitable uni cation failure

as or

Generation The surface generator cur

Morphology The morphological com rently in use is a variant of the

ponent receives as input those tokens semantic head driven algorithm described

which have not been analysed by the scan in Shieber et al It is a syntac

ner It produces as output a feature tic bottom up process guided by seman

structure which contains as a key or in tic information passed top down Given

dex the lemma of the respective item a semantic representation expressed as a

as well as other relevant morphosyntac typed feature structures it generates all

tic information which uniquely identi es expressions permitted by the grammar

the form During generation the morpho Instead of using a full form lexicon as

logical component receives as input a fea it is the case for Shieber et al

ture structure description of the form to the generator yields as output a sequence

be produced

of feature structure containing the lexical

At present the morphological informa stems together with morphosyntacic infor

tion is precompiled into a morphological mation These elements are then passed

full form lexicon so that runtime process to the morphological component which

ing reduces to the lookup of full forms computes the appropriate surface forms

and the initialization of the lexical com

ponent with the associated feature struc

tures The precompilation is performed by

the X MORF

system described in Trost

Part of

this system was redesigned

with the

results that the feature part is

now also

speci ed in TDL and that the

morphology can be integrated into the sys

tem for a full runtime analysis

Speech act component The current

disco system incorporates a

rst proto

type of a surface speech act


developed by and described in more de

tail in Hinkelman and Spackman

In linking text to task it is crucial to

capture not only propositional content of

linguistic expressions but also the attitude

or intention behind them Speech acts

Parsing The parser is a bidirection or utterances construed as actions serve

al bottom up chart parser which oper this purpose An agent is assumed to be

ates on a context free backbone implicit an inference system with a planning al

ly contained in the grammar The pars gorithm The planner constructs chains

er provides parameterized general parsing of actions that would if executed result

in the achievements of the agent s goal experimentation with ow of control

Plan recognition

ing this process

then consists in revers

taking observed actions

as evidence for the intentions

In the approach followed in

the disco

system the HPSG framework

is used to

represent the necessary inference rules for

speech act recognition e g simple impli

cations can be expressed using negation

and distributed disjunction Based on

this kind of representation the relation

ship between sentence types and coven

tionalized speech acts like inform or re

quest can be expressed declaratively in

the grammar Parsing of an expression


not only yields an propositional con


but also the possible types of speech

acts The current version of the speech

act component o ers solutions to the fol

lowing subproblems

general linkage between utterances

and intentions

speci c mechanisms linking speech

acts to surface representations

reliable n way speech acts

the disco develop

ment shell

The architecture of the disco system and

cosma have both been realized using the

disco development shell which was

developed by the author and which we are

now going to present in more detail


use of modern programming tech


in system integration is crucial to

support the following desiderata

modularity of NLP components

The current approach only supports speech

incorporation of new modules

building of subsystems and stan

dalone applications

accommodation of alternative mod

ules with similar

In order to perform

above we have chosen


the tasks mentioned

a two step appr oach

to realize DISCO s architecture

In a rst step the architecture is de

scribed and developed independent

ly of the components to be used

and of the particular ow of control

Possible components are viewed as

black boxes and the ow of control

is described independently of specif

ic components In such an abstract

view the architecture realizes only a

schema called the frame system

Next the frame system has to be in

stantiated by the integration of exist

ing modules and by de ning the par

ticular ow of control between these


It is useful to divide the system com

ponents into di erent types according

to their speci c tasks Currently we


tool components e g graphic de

vices printer debugger errror han





components e g morphology pars

er generator speech act recognition

control component

act recognition but it is planned to use the same We do not assume that this list is complete

basic approach also for the planning of speech For example it is also possible to have knowledge

acts see Hinkelman and Spackman sources as components of the frame system

In order to obtain a high degree of ex paradigm a program is viewed as a set

ibility and robustness especially during of ob jects that are manipulated by ac

the development phase of a system the tions The state of each ob ject and the ac

control unit directs and monitors the ow tions that manipulate the state are de ned

of information between the other compo once and for all when the ob ject is creat

nents The important tasks of the control ed The essential ingredients of ob ject

unit are

to direct the data ow between the

individual components

to de ne which components should

run together to form a subsystem

to check the data received from one

component before they are sent to an

other one

to manage global memory and call

speci c tools

There is a command level for direct

communication with the kernel The pur

pose of the command level is to provide

commands that allow users to run subsys

tems to activate or inactivate tracing of

modules and to specify printing devices

Users may also specify values

for global

variables interactively or with

con gura

tion les for each module

Ob ject Oriented Design If a new

component must be integrated one would

like to concentrate only on those parts

that are of speci c interest for these new

components Algorithms or data which

oriented programming are objects class

es and inheritance Objects are modules

that encapsulate data and operations on

that data Every ob ject is an instance of

a speci c class which determines its struc

ture and behaviour Inheritance allows

new classes to specialize already de ned

classes The result is a hierarchy of class


where classes inherit the behaviour da


and operations from superclasses The

advantage for the programmer is that she


only specify to what extend the new


di eres from the class es it inherits

from This supports the design of mod

ular and robust systems that are easy to

use and extend

CLOS The main programming lan

guage for the DISCO pro ject is Common

Lisp Because CLOS is de ned to be

a standard language extension to Com

mon Lisp it is easy to combine ordinary

Lisp code with OOP style CLOS al

lows us to de ne an hierarchical organiza

tion of classes that models the relationship

among the various kinds of ob jects Fur

thermore because CLOS supports mul

tiple inheritance it is possible to de ne

methods that are de ned for particular

are common to all components or compo combinations of classes Therefore a large

nents of a speci c

type should be de ned

only once and then be added automatical

ly for each new component without side

e ects to other already integrated compo


amount of control ow

alized by CLOS This

is automatically re

helps us to concen

trate on the individual properties of new

components which simpli es and speeds

up their integration extremly Of course

We have choosen an object oriented CLOS itself does not enforce modularity

programming style OOP style using the

Common Lisp Ob ject System CLOS in

order to realize the two step approach

The reader should consult e g Keene

for an excellent introduction into CLOS if more

detailed information on object oriented program

described above In the ob ject oriented ming is of interest

or mak es it possible to organize programs language components All current proto

poorly it is just a tool that helps us to cols are de ned using the same structure

achieve such modular systems as shown in g

DISCO s Class Hierachy The dis

co development shell consists of the

class hierarchy and the speci cation of

class speci c methods Every type of com

ponent and its specializations are de ned


CLOS classes Figure shows a portion


the current hierarchy

module is the most general class All

other classes inherit its data structure and

associated methods The class language

component subsumes all modules of the

current system which are responsible for

language processing A module that is ac

tually used in the system is an instance of

one of the classes

New modules are added to the sys

tem by associating a class with them

CLOS supports dynamic extension of the

class hierarchy so that new types can be

added even at run time For example

if we wanted to add a new parser mod

ule we would either use the already exist

ing class parser or de ne a new class

say alternative parser In the rst

case we assume that we only need the

methods that have already been de ned

for the parser class In the second

case we would have to add new


or could specialize some of the


that alternative parser inherits from

parser In principle it could also


that the new parser shares many


ties with disco parser In that case we

would have to

re ne the parser subnet in

order to avoid


Protocols The ow of control between

The controller uses the generic function

call component to activate an individ

ual language component instance special

ized to the appropriate subclass Con

trol ow is determined by the sequence of

call component invocations Between

calls output is veri ed and converted to

the following component s input format by

calling the generic

transform This

function check and

mechanism is very im

portant to support robustness especially

during the development phase of the sys

tem Speci c methods are de ned for each

module that indicate what to do if a mod

ule does not come up with a correct re

sult These methods

controller during the

are activated by the

call of check and

transform In the current version of

the system further processing is then in

terrupted and the user is informed about

the problem that occured

For example the output of mor

phology de nes the input to parser

and so on By calling check and

transform controller morpholo

gy parser the controller checks whether

the morphology yielded a valid output and

eventually transforms the output for the

parser If morphology detects an un

known word X further processing is in

terrupted and the user receives a message

notifying him

that X is unknown to the

morphological component

Some Remarks If two adjacent com

ponents have been proven to work to

gether without problems check and

transform need not be called for


a set of components is mediated by means as it is the case for the components

of protocols Protocols are methods de component and component in the

ned for the class control ler They specify example shown in g

the set of language components to be used Input and output for the whole system

and the input output relation between the is specifed using general communication

Module Tool Controller Command Shell
Controller Command Shell
Module Tool Controller Command Shell Trace Handler Printer Handler Language Component Scanner Morphology Lexicon P

Trace Handler Printer Handler

Language Component

Shell Trace Handler Printer Handler Language Component Scanner Morphology Lexicon P arser NLL Lisp Scanner

Scanner Morphology Lexicon P arser NLL

Component Scanner Morphology Lexicon P arser NLL Lisp Scanner Yacc Scanner X morf Morphix Disco Parser

Lisp Scanner Yacc Scanner

Lexicon P arser NLL Lisp Scanner Yacc Scanner X morf Morphix Disco Parser Figure A portion

X morf Morphix Disco Parser


A portion of the current class hierarchie in the disco system

c al l component control ler component

check and transform control ler component


cal l component control ler component

check and transform control ler component


cal l component control ler component

check and transform control ler component


cal l component control ler component

Figure Schematic structure of protocols

channels In the normal case this is the the current version In principle the ar

standard terminal input output stream of chitecture appeals to be general enough to

Common Lisp In the cosma system realize negotiation based architectures

an e mail interface for standard e mail is

used as the principle communication chan

nel Besides the general input output de

vice the controller also

manages working

and long term memory These memories

are used to process a sequence of sen

tences In this case the controller stores

each analysed sentence in long term mem


Curren t Subsystems For grammar

debugging it is possible to run server

al subsystems called standalone applica

tions which are activated via the com

mand level For example one might want

to run the parser and generator without

morphology or only the set of components

necessary during analysis aso In each

The architecture by itself is not restrict case the same functionality is available as

ed to pipeline processing but would be well as the same set of tools In principle

used in modeling cascade or blackboard it would be possible for a user to de ne

architectures as well In the latter case protocols himself e g to test self written

the working memory can be used to real modules because the integration of mod

ize the possibly structured blackboard ules and the de nition of protocols takes

This has already been partially realized in place in a standardized fashion

Output Input Component-1 Input Component-2 Output Input/Output ControllerUnit Input Input Component-4 Output

Input/Output from/to general device (e.g., email)

Figure Flow of control between four components In this protocol component

and interact directly The controller views them as being

by the dotted lines around


one component indicated

o verview of the cos dar is managed by an individual planning

ma system

component Each cosma system consists

of three basic components

In this nal section we will give a brief

overview of the cosma system The

principle idea behind the cosma


is to support scheduling of appointments

between several human participants by

means of distribute d intel ligent calendar

assistents Instead of using a central

ized solution where only a global


database is maintained we have choosen

a distributed solution Scheduling of ap

pointments between several participants is

than viewed as a

dialogue between

cooperative negotiation

the di erent agents

An intelligent assistent that keeps

and manages the calendar database

A graphical user interface to the cal

endar data base application planner

The natural language system disco

It is assumed that electronic mail will

be used as a basic communication chan

nel Information concerning the schedule


particular appointments e g request

to arrange a meeting cancelation of a pre

viously setup appointment or other infor

We assume that each person has its own mation relevant in performing some nego

therefore local calendar database avail tiation is sent around the set of relevant

able on her computer where each calen participants via e mail Using standard

e mail software has the advantage that Short Example Figure gives an

scheduling of appointments can be done overview of a con guration where three

in a distributed and asychronous way participants a human Tick and two

Natural language NL comes into play

because we allow humans to participate

who have no calendar assistent available

The only restriction is that they have elec

cosmas Trick Track are involved

A possible appointment scheduling is as

follows abstracting away from details

Track to Trick and Tick

arrange meeting p m

tronic mail available Such a poor per Trick to Track

son is responsible for mantaining an old accept meeting

fashioned calendar but is allowed to use Tick to Track

natur al language during appoinment nego

tiation Consequently each cosma sys

Ich bin mit dem Termin einverstanden I

accept the appointment

tem needs to be able to process natural Track to Trick and Tick

language either to understand a NL dia con rm meeting

logue contribution or to produce one

Each user of a cosma system has ac

cess to the calendar data base by means

of a graphical user interface The graphi

cal user interface developed by Stephen

Spackman who named the tool dui

is used to display and update existing

items and enter new items into the data

base The intelligent calendar manager

maintains the calendar database The

current version developed by the AKA

In words Track wanted to arrange a

meeting and

sends this request to Trick

and Tick in form

of an internal


expression Trick


sends an

acception Because there are no con ict

ing entries in his calendar data base Tick

sends an acception using NL Track will

update its calendar while sending a con

rmation to the two participants notify

ing them that all participants accepted the


The current version of the system is able

MOD group of the DFKI consists of time to handle more complex dialogs e g ap

processing functions a nite state proto pointment scheduling initiated by a non

col for arranging appointments and an

action memory storing the protocol state

and original e mail for each arrangement

in progress The principle task of the dis

co system is to

from an natural

extract that information

language expression that

cosma user cancellation and modi ca

tion of already set up appointments


can be used by the calendar manager dis In this paper we have given an overview

co is also responsible for the production of the disco system and a detailed de

of natural language text from the inter scription of the disco development

nal representation of scheduling informa shell The disco development shell

tion computed by the calendar manager has been proven very useful in setting up

The produced text is sent in addition to the cosma system It was possible to

the internal structure of scheduling ex integrate the new modules independent

pressions to the participants via e mail ly from the rest of the system Existing

Busemann describes the current modules have been exchanged by new ones

approach for generating natural language without the need of adding new meth

expressions in cosma in more detail ods Because di erent researchers were

TICK TRACK Planner e-mail DISCO Emailer DUI TRICK Human e-mail e-mail (traditional) Planner DISCO DUI

This Cosma is authorized to make appointments


(loves AI)

Figure General Overview of the Sample Scenarios

able to run subsystems the development References

of the whole system could be done in a

distributed way Therefore eight very dif

ferent modules have been intergrated in

less than three weeks including test phas


Based on these experiences we believe

that the o ject oriented architectual mod

el of our approach is a fruitful basis for

managing large scale pro jects It makes

it possible to develop the

basis of a whole

system in parallel to the

development of

the individual components Therefore it is

possible to take into account restrictions

and modi cations of each component as

early as possible

Busemann and Harbusch

Stephan Busemann and Karin Har

busch editors Pr oc DFKI Workshop

on Natural Language Systems Mod

ularity and Re usability Saarbruc ken


Busemann S Busemann Towards

the con guration of generation


Some initial ideas In Busemann and


Hinkelman and Spackman

Elizabeth A Hinkelman and Stephen P

Spackman Abductive speech act recog

nition corporate agents and the cos

ma system In W J Black G Sabah

and T J Wachtel editors Abduction

Beliefs and Context Proceedings of

the second ESPRIT PLUS workshop in

computational pragmatics

Inc Reasoning Systems Inc Re

ne user s guide Technical report Palo

Alto CA

Kasper Walter Kasper Integra

tion of syntax and semantics in feature

structures In Busemann and Harbusch R C Moore Semantic head driven

Keene Sonya E Keene Obje ct

generation Computational Linguistics

Oriented Programming in Common Trost Harald Trost X MORF

Lisp A Programmer s Guide to CLOS A morphological component based

Addision Wesley

Laubsch and Nerbonne J Laub

sch and J Nerbonne An overview of

nll Technical report Hewlett Packard

Laboraties Palo Alto CA

Laubsch J Laubsch Zebu A tool

for specifying reversible lalr parsers

Technical report Hewlett Packard Lab

oraties Palo Alto CA


John Nerbonne Constraint based se

mantics In Paul Dekker and Martin

Stokhof editors Proceedings of the th

Amsterdam Col loquium pages

Institute for Logic Language and Com

putation also DFKI RR

Netter Klaus Netter Architecture

and coverage of the disco grammar In

Busemann and Harbusch

Pollard and Sag C Pollard and

I A Sag Information Based Syntax

and Semantics Volume Center for

the Study of Language and Information


Pollard and Sag to appear

C Pollard and I M Sag Information

Based Syntax and Semantics Volume

Center for the Study of Language and

Information Stanford to appear

Sch afer and Krieger Ulrich

Sch afer and Hans Ulrich Krieger

extra light User s Guide Franz Al


Common LISP Version DISCO

Shieber et al S M Shieber

F C N Pereira G van Noord and

on augmented two level morphology

Technical Report RR Deutsches

Forschungsinstitut fur Kunstlic he Intel

ligenz Saarbruc ken Germany