Discourse in Computational Linguistics and Artificial Intelligence
Johanna D. Moore Peter Wiemer-Hastings
HCRC/Division of Informatics
University of Edinburgh
2 Buccleuch Place
Edinburgh EH8 9LW Scotland
{jmoore|peterwh}@cogsci.ed.ac.uk
+44 131 651 1336 (voice)
+44 131 650 4587 (fax)
Running head: Discourse in CL and AI
List of Figures
1 A simple DRS for the sentence, John sleeps. . . . . . . . . . . . . . . . . . . . . 49
2 A DRS rule for processing proper nouns . . . . . . . . . . . . . . . . . . . . . . . . 50
3 Before and after applying the proper noun rule . . . . . . . . . . . . . . . . . . . . 51
4 A snapshot of the world . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5 Two readings of an ambiguous quantifier scoping . . . . . . . . . . . . . . . . . . . 53
6 A DRS for a conditional sentence with accommodated presuppositions . . . . . . . 54
7 For G&S, dominance in intentional structure determines embedding in linguistic
structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
8 Discourse structure affects referent accessibility. . . . . . . . . . . . . . . . . . . . . 56
9 Graphical Representation of an RST Schema . . . . . . . . . . . . . . . . . . . . . 57
10 Graphical Representation of an RST Analysis of (9) . . . . . . . . . . . . . . . . . 58
11 An SDRS for discourse (10) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
12 A Sample Discourse Plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
13 Arguing from cause to effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
14 Arguing from effect to cause . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Models of discourse structure and processing are crucial for constructing computational systems capable of interpreting and generating natural language. Research on discourse focuses on two fundamental questions within computational linguistics and artificial intelligence. First, what information is contained in extended sequences of utterances that goes beyond the meaning of the individual utterances themselves? Second, how does the context in which an utterance is used affect the meaning of the individual utterances, or parts of them?
Discourse research in computational linguistics and artificial intelligence encompasses work on spoken and written discourse, monologues as well as dialogue (both spoken and keyboarded). The questions that discourse research attempts to answer are relevant to all combinations of these features. The juxtaposition of individual clauses may imply more than the meaning of the clauses themselves, whether or not the clauses were contributed by the same speaker (writer).[1] Likewise, the context created by prior utterances affects the current one regardless of which participant uttered it.
In this chapter, we will first provide an overview of types of discourse structure and illustrate these with examples. We will then describe the influential theories that account for one or more types of structure. We then show how these theories are used to address specific discourse processing phenomena in computational systems. Finally, we discuss the use of discourse processing techniques in a range of modern language technology applications.
Overview of Discourse Structure
Researchers in computational linguistics have long argued that coherent discourse has structure,
and that recognising the structure is a crucial component of comprehending the discourse (Grosz
& Sidner, 1986; Hobbs, 1993; Moore & Pollack, 1992; Stone & Webber, 1998). Interpreting
referring expressions (e.g., pronouns and definite descriptions), identifying the temporal order of
events (e.g., the default relationship between the falling and pushing events in Max fell. John
pushed him.), and recognizing the plans and goals of our interlocutors all require knowledge of
discourse structure (Grosz & Sidner, 1986; Kehler, 1994b; Lascarides & Asher, 1993; Litman &
Allen, 1987). Moreover, early research in language generation showed that producing natural-
sounding multisentential texts required the ability to select and organise content according to
rules governing discourse structure and coherence (Hovy, 1988b; McKeown, 1985; Moore & Paris,
1993).
[1] Henceforth, we will use speaker and hearer to indicate the producer and interpreter of discourse, respectively, whether it is spoken or written. Where the distinction between spoken and written discourse is important, we will be more explicit.
Although there is still considerable debate about the exact nature of discourse structure and
how it is recognized, there is a growing consensus among researchers in computational linguistics
that at least three types of structure are needed in computational models of discourse processing
(Grosz & Sidner, 1986; Hobbs, 1993; Moore & Pollack, 1992). These are described below.
Intentional structure describes the roles that utterances play in the speaker's communicative plan to achieve desired effects on the hearer's mental state or the conversational record (Lewis, 1979; Thomason, 1990). Intentions encode what the speaker was trying to accomplish with a given portion of discourse. Many have argued that the coherence of discourse derives from the intentions of speakers, and that understanding depends on recognition of those intentions (e.g., Grice, 1957; Grosz & Sidner, 1986). Research in response generation shows that in order to participate in a dialogue, agents must have a representation of the intentional structure of the utterances they produce. Intentional structure is crucial for responding effectively to questions that address a previous utterance; without a record of what an utterance was intended to achieve, it is impossible to elaborate or clarify that utterance (Moore, 1995; Young, Moore, & Pollack, 1994a). Moreover, speaker intentions are an important factor in generating nominal expressions (Appelt, 1985; Green, Carenini, & Moore, 1998) and in selecting appropriate lexical items, including discourse cues (e.g., because, thus) (Moser & Moore, 1995; Webber, Knott, Stone, & Joshi, 1999) and scalar terms (e.g., difficult, easy) (Elhadad, 1995).
Informational structure consists of the semantic relationships between the information conveyed by successive utterances (Moore & Pollack, 1992). Causal relations are a typical example of informational structure, and psychologists working in reading comprehension have shown that these relations are inferred during reading (Gernsbacher, 1990; Graesser, Singer, & Trabasso, 1994; Singer, Revlin, & Halldorson, 1992). In addition, several researchers have identified types of text whose organization follows the inherent structure of the subject matter being communicated, e.g., the structure of the domain plan being discussed (Grosz, 1974; Linde & Goguen, 1978), or the spatial (Sibun, 1992; Linde, 1974), familial (Sibun, 1992) or causal relationships (Paris, 1988; Suthers, 1991) between the objects or events being described, or the states and events being narrated (Lehnert, 1981). Several systems that generate coherent texts based on domain or informational structure have been constructed (Sibun, 1992; Paris, 1988; Suthers, 1991).
Attentional structure as defined by Grosz and Sidner (1986) contains information about the objects, properties, relations, and discourse intentions that are most salient at any given point in the discourse. In discourse, humans focus or center their attention on a small set of entities, and attention shifts to new entities in predictable ways. Natural language understanding systems must track attentional shifts in order to resolve anaphoric expressions (Grosz, 1977; Gordon, Grosz, & Gilliom, 1993; Sidner, 1979) and understand ellipsis (Carberry, 1983; Kehler, 1994a). Natural language generation systems track focus of attention as the discourse as a whole progresses, as well as during the construction of individual responses, in order to influence choices about what to say next (Kibble, 1999; McKeown, 1985; McCoy & Cheng, 1990), to determine when to pronominalise (Elhadad, 1992), to make choices in syntactic form (e.g., active vs. passive) (McKeown, 1985; Elhadad, 1992; Mittal, Moore, Carenini, & Roth, 1998), to appropriately mark changes in topic (Cawsey, 1993), and to generate elliptical utterances.
In addition to these three primary types of discourse structure, the literature on discourse in
computational linguistics has discussed two additional types of structure. One of them, rhetorical
structure, has had considerable impact on computational work in natural language generation.
Information structure consists of two dimensions: (1) the contrast a speaker makes between the part of an utterance that connects it to the rest of the discourse (the theme), and the part of an utterance that contributes new information on that theme (the rheme); and (2) what the speaker takes to be in contrast with things a hearer is or can be attending to. Information structure can be conveyed by syntactic, prosodic, or morphological means. Steedman argues that information structure is the component of linguistic structure (or grammar) that links intentional and attentional structure to syntax and prosody, via a compositional semantics for notions like theme (or topic) and rheme (or comment). Recently, a number of theories of information structure, including those of Vallduvi (1990) and Steedman (1991), have brought hitherto unformalised notions like theme, rheme and focus within the compositional semantics that forms a part of formal grammar (Steedman, 2000, 2001).
Rhetorical structure is used by many researchers in computational linguistics to explain a wide range of discourse phenomena. Several proposals have defined the set of rhetorical (or discourse or coherence) relations that can hold between adjacent discourse elements, and have attempted to explain the inferences that arise when a particular relation holds between two discourse entities, even if that relation is not explicitly signalled in the text. Researchers in interpretation have argued that recognizing these relationships is crucial for explaining discourse coherence, resolving anaphora, and computing conversational implicature (Hobbs, 1983; Mann & Thompson, 1988; Lascarides & Asher, 1993). Researchers in generation have shown that it is crucial that a system recognise the additional inferences that will be conveyed by the sequence of clauses it generates, because these additional inferences may be the source of problems if the user does not understand or accept the system's utterance. Moreover, in order to implement generation systems capable of synthesizing coherent multi-sentential texts, researchers identified patterns of such relations that characterise the structure of texts that achieve given discourse purposes, and many text generation systems have used these patterns to construct coherent monologic texts to achieve a variety of discourse purposes (Hovy, 1991; McKeown, 1985; Mellish, O'Donnell, Oberlander, & Knott, 1998; Mittal et al., 1998; Moore & Paris, 1993; Rösner & Stede, 1992; Scott & de Souza, 1990).[2]
Much of the remaining debate concerning discourse structure within computational linguistics centers around which of these structures are primary and which are parasitic, what role the structures play in different discourse interpretation and generation tasks, and whether their importance or function varies with the discourse genre under consideration. For example, a major result of early work in discourse was the determination that discourses divide into segments much like sentences divide into phrases. Each utterance of a discourse either contributes to the preceding utterances, or initiates a new unit of meaning that subsequent utterances may augment. The usage of a wide range of lexicogrammatical devices correlates with discourse structure, and the meaning of a segment encompasses more than the meaning of the individual parts. In addition, recent studies show significant agreement among segmentations performed by naive subjects (Passonneau & Litman, 1997). Discourse theories differ about the factors they consider central to explaining this segmentation and the way in which utterances in a segment convey more than the sum of the parts. Grosz and Sidner (1986) argue that intentions are the primary determiners of discourse segmentation, and that linguistic structure (i.e., segment embedding) and attentional structure (i.e., global focus) are dictated by relations between intentions. Polanyi (1988, p. 602) takes an opposing view and claims that hierarchical structure emerges from the structural and semantic relationships obtaining among the linguistic units which speakers use to build up their discourses. In Hobbs (1985), segmental structure is an artefact of binary coherence relations (e.g., background, explanation, elaboration) between a current utterance and the preceding discourse. Despite these different views, there is general agreement concerning the implications of segmentation for language processing. For example, segment boundaries must be detected in order to resolve anaphoric expressions (Asher, 1993; Grosz & Sidner, 1986; Hobbs, 1979; Passonneau & Litman, 1997). Moreover, several studies have found prosodic as well as textual correlations with segment boundaries in spoken language (Grosz & Hirschberg, 1992; Hirschberg, Nakatani, & Grosz, 1995; Nakatani, 1997; Ostendorf & Swerts, 1995), and appropriate usage of these intonational indicators can be used to improve the quality of speech synthesis (Davis & Hirschberg, 1988).
In the sections that follow, we will further examine the theories of discourse structure and processing that have had significant impact on computational models of discourse phenomena. A comprehensive survey of discourse structure for natural language understanding appears in (Grosz, Pollack, & Sidner, 1989), and therefore we will focus on discourse generation and dialogue in this chapter. We will then review the role that discourse structure and its processing plays in a variety of current natural language applications. The survey in (Grosz et al., 1989) focuses largely on the knowledge-intensive techniques that were prevalent in the late eighties. Here, we will put more emphasis on statistical and shallow processing approaches that enable discourse information to be used in a wide range of today's natural language technologies.

[2] Moore and Pollack (1992) argue that the rhetorical relations used in these systems typically conflate informational and intentional considerations, and thus do not represent a fourth type of structure.
Computational Theories of Discourse Structure and Semantics
Discourse Representation Theory
Discourse Representation Theory (DRT) is a formal semantic model of the processing of text in context which has applications in discourse understanding. DRT was originally formulated in (Kamp, 1981) and further developed in (Kamp & Reyle, 1993), with a concise technical summary in (van Eijck & Kamp, 1997). DRT grew out of Montague's model-theoretic semantics (Thomason, 1974), which represents the meanings of utterances as logical forms and supports the calculation of the truth conditions of an utterance. DRT addresses a number of difficulties in text understanding (e.g., anaphora resolution) that arise at the level of the discourse.
This section gives a brief overview of the philosophy behind DRT, the types of structures and
rules that DRT uses, and the particular problems that it addresses. We also describe some of the
limitations of the standard DRT theory.
Philosophical foundations of DRT. As mentioned above, DRT is concerned with ascertaining the semantic truth conditions of a discourse. The semantic aspects of a discourse are related to the meaning of the discourse, but not to the particular situation (including time, location, common ground, etc.) in which the discourse is uttered. The advantage of this approach, from the logical point of view, is that the semantic representation for the discourse can be built up (more or less) automatically from the contents (words) and structure of the discourse alone, without bringing in information about the external context of the utterance. Once constructed, it can be compared with a logical representation of some world (a model in DRT terms) to determine whether the discourse is true with respect to that model.
DRT structures. The standard representation format in DRT, known as a discourse representation structure (DRS), consists of a box with two parts as shown in Figure 1. The top part of the box lists the discourse referents, which act as variables that can be bound to different entities in the world. The bottom section of the DRS lists the propositions that are claimed to be true of those referents in the described situation. Figure 1 gives the DRS of the sentence John sleeps. The representation can be read as: there is an individual who is named John, and of whom the sleep predicate is true. This is equivalent to the logical expression: ∃x(John(x) ∧ sleep(x)).
[Figure 1 about here.]
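The box notation translates directly into a simple data structure. The following Python sketch (the class and field names are our own illustration, not part of any standard DRT implementation) encodes the DRS of Figure 1 as a list of referents plus a list of conditions:

```python
from dataclasses import dataclass, field

@dataclass
class DRS:
    """A discourse representation structure: the top section of the
    box (discourse referents) and the bottom section (conditions)."""
    referents: list = field(default_factory=list)
    conditions: list = field(default_factory=list)

# The DRS for "John sleeps": one referent x, of which the naming
# condition John(x) and the predicate sleep(x) are claimed to hold.
drs = DRS(referents=["x"],
          conditions=[("John", "x"), ("sleep", "x")])
```

Each condition is stored as a (predicate, argument) pair, mirroring the propositions listed in the bottom section of the box.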
DRT rules and processing. To derive a structure like the one shown above, DRT uses a set of standard context-free grammar rules, and a set of semantic interpretation rules that are based on the syntactic structure of the input sentence. Figure 2 shows a simple DRT rule for processing proper nouns. The left hand side shows a segment of the syntactic tree which must be matched, and the right hand side shows the result of applying the rule, including adding propositions to the DRS, and changing the parse tree. This rule applies to the structure on the left in Figure 3, which shows the parse tree of the example sentence, John sleeps, in a DRS. The rule produces the representation on the right in Figure 3 by deleting part of the parse tree, inserting a variable in its place, and adding a proposition to the DRS.
[Figure 2 about here.]
Next, a similar rule is applied which reduces the verb phrase to a proposition, sleep, which is true of a new discourse referent, y. Then, the sentence rule deletes the remaining syntactic structure and equates the subject discourse referent with the object referent, x = y. Finally, x is substituted for y in the sleep proposition, producing the structure shown in Figure 1. The next sentence in a discourse is processed by adding its parse tree to the box, and then applying the semantic transformation rules to it.
[Figure 3 about here.]
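The sequence of rule applications just described can be sketched procedurally. In this hypothetical Python fragment (a deliberate simplification of the construction algorithm; the function names and the dictionary format are our own, not Kamp and Reyle's), each rule adds referents and conditions to the DRS, and the sentence rule performs the final substitution:

```python
def fresh(drs):
    """Introduce a new discourse referent in the top of the box."""
    ref = f"x{len(drs['referents'])}"
    drs["referents"].append(ref)
    return ref

def proper_noun_rule(drs, name):
    """NP rule: replace a proper-noun subtree with a referent
    and add the naming condition, e.g. John(x0)."""
    ref = fresh(drs)
    drs["conditions"].append((name, ref))
    return ref

def vp_rule(drs, verb):
    """VP rule: reduce the verb phrase to a condition, e.g.
    sleep(x1), true of a new discourse referent."""
    ref = fresh(drs)
    drs["conditions"].append((verb, ref))
    return ref

def sentence_rule(drs, subj_ref, vp_ref):
    """S rule: equate subject and predicate referents (x = y),
    then substitute one for the other in the conditions."""
    drs["referents"].remove(vp_ref)
    drs["conditions"] = [(pred, subj_ref if arg == vp_ref else arg)
                         for (pred, arg) in drs["conditions"]]

drs = {"referents": [], "conditions": []}
x = proper_noun_rule(drs, "John")   # adds John(x0)
y = vp_rule(drs, "sleep")           # adds sleep(x1)
sentence_rule(drs, x, y)            # substitutes x0 for x1
# drs now matches Figure 1: referents ["x0"],
# conditions [("John", "x0"), ("sleep", "x0")]
```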
The construction of a complete DRS enables the calculation of its truth value with respect to a model. A model is a formal representation of the state of the world. Figure 4 shows a model, which is distinguished from a DRS by its double box. The elements in the top section of a model are interpreted differently than those in a DRS. In a DRS, they are variables which can bind to entities in the world. In a model, each referent indexes a particular entity in the world.
In this example, the DRS is true with respect to the model because there is a consistent mapping (x = b) between the discourse referents in the DRS and the individuals in the model, and the model contains all of the propositions which are in the DRS. Because the model is taken to be a snapshot of the world, it may contain many additional propositions which are not in the DRS without affecting the truth conditions of the DRS. Those which are not relevant to the DRS are simply ignored.
[Figure 4 about here.]
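The truth test described above amounts to searching for a consistent mapping. A minimal Python sketch (our own illustration of the idea, not an implementation from the DRT literature) tries every assignment of model entities to discourse referents:

```python
from itertools import product

def satisfies(drs, model):
    """True iff some mapping of discourse referents to model
    entities makes every DRS condition a fact of the model;
    extra facts in the model are simply ignored."""
    refs = drs["referents"]
    for entities in product(model["entities"], repeat=len(refs)):
        mapping = dict(zip(refs, entities))
        if all((pred, mapping[arg]) in model["facts"]
               for pred, arg in drs["conditions"]):
            return True
    return False

# The DRS for "John sleeps" and a snapshot-of-the-world model:
# entity b is named John and sleeps; the fact about a is irrelevant.
drs = {"referents": ["x"], "conditions": [("John", "x"), ("sleep", "x")]}
model = {"entities": ["a", "b"],
         "facts": {("John", "b"), ("sleep", "b"), ("Mary", "a")}}
print(satisfies(drs, model))   # True: consistent mapping x = b
```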
Uses of DRT. The major advantages of DRT are that it provides a simple, structure-based procedure for converting a syntactic representation of a sentence into a semantic one, and that that semantic representation can be mechanically compared to a representation of the world to compute the truth-conditional status of the text. DRT also addresses (at least partially) the difficult discourse problems of anaphora resolution, quantifier scoping, and presupposition.
When a pronoun is processed in DRT, the semantic interpretation rule adds an instruction of the form x = ?, which is read as: find some discourse referent x in the discourse context. The referent must satisfy three constraints: the consistency constraint, the structural constraint, and the knowledge constraint. The consistency constraint specifies that the new mapping of a discourse referent must not introduce a contradiction into the DRS. In practice, this ensures that number and gender restrictions are applied. The structural constraint limits where in a complex DRS a coreferent can be found. In practice, this is similar to the constraints proposed in Centering Theory (Grosz, Joshi, & Weinstein, 1995). Finally, the knowledge constraint is intended to prohibit the inclusion of any coreference which would violate world knowledge or common sense. Unfortunately, the scope of this constraint makes a complete implementation of it impossible.
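The number and gender filtering performed by the consistency constraint can be illustrated with a small sketch. In this hypothetical Python fragment (the feature inventory and names are our own; the structural and knowledge constraints are not modelled), a candidate mapping x = r is ruled out if the referent's recorded features contradict the pronoun's:

```python
def candidate_referents(pronoun, referents):
    """Apply the consistency constraint only: keep referents whose
    number and gender do not contradict the pronoun's."""
    features = {"she": ("sing", "fem"), "he": ("sing", "masc"),
                "it": ("sing", "neut"), "they": ("plur", None)}
    number, gender = features[pronoun]
    return [r for r, (n, g) in referents.items()
            if n == number and (gender is None or g == gender)]

# Referents introduced by the prior discourse, with their features.
refs = {"x": ("sing", "fem"),    # e.g. introduced by "Mary"
        "y": ("sing", "masc"),   # e.g. introduced by "John"
        "z": ("plur", None)}     # e.g. introduced by "the farmers"
print(candidate_referents("she", refs))   # ['x']
```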
DRT is the foundation of recent research in psycholinguistics which attempts to model human
judgements of the acceptability of a range of anaphors. Gordon and Hendrick (1997) collected
ratings from humans of the acceptability of various combinations of names, quantified, definite
and indefinite noun phrases, and pronouns. Their results showed that human judgements did
not support some of the constraints on coreference acceptability that came from classical binding
theory (Chomsky, 1981). Gordon and Hendrick (1998) claimed that a model of coreference based
on DRT corresponds better with human acceptability judgements.
Of course, pronominal coreference is not the only type of anaphora. Asher (1993) addresses
other types of anaphora, as described below.
[Figure 5 about here.]
In sentences like (1), there is a structural ambiguity concerning the scope of the quantifier every. Specifically, there are two readings of the sentence: one in which each farmer owns a different donkey, and one in which all the farmers collectively own a particular donkey. When processing such a sentence with DRT, there is a choice of processing the quantifier before or after processing the indefinite noun phrase. The two orders of rule application produce the two different structures shown in Figure 5.
(1) Every farmer owns a donkey.
Both of these DRSs include substructures which represent the quantifier as a conditional. They are read as: if the conditions on the left hold, the conditions on the right must also hold. In the DRS on the left, the donkey is within the scope of the conditional. Thus, for every farmer, there should be a (potentially different) donkey. In the DRS on the right, the referent for the donkey is global, and outside the scope of the conditional. Thus, there is one donkey that every farmer owns. Although this example applies only within a sentence, it suggests how hierarchical discourse relations can be represented by variants of DRT, as described below.
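The truth-conditional difference between the two DRSs can be made concrete by evaluating both readings against a small model. In this Python sketch (our own illustration; the set-based model format is not part of DRT proper), the narrow-scope reading quantifies over a possibly different donkey for each farmer, while the wide-scope reading requires a single donkey:

```python
# A model in which each farmer owns a different donkey.
farmers = {"f1", "f2"}
donkeys = {"d1", "d2"}
owns = {("f1", "d1"), ("f2", "d2")}

# Left DRS (donkey referent inside the conditional): for every
# farmer there is some donkey, possibly a different one, he owns.
narrow_scope = all(any((f, d) in owns for d in donkeys) for f in farmers)

# Right DRS (donkey referent global): there is one donkey
# that every farmer owns.
wide_scope = any(all((f, d) in owns for f in farmers) for d in donkeys)

print(narrow_scope, wide_scope)   # True False in this model
```

This model makes the narrow-scope reading true and the wide-scope reading false, which is exactly the difference the two DRSs in Figure 5 encode.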
DRT's treatment of presupposition in discourse is related to the way it handles quantifier scoping. In particular utterances, such as (2a), certain propositions are said to project out of the sentence; that is, they are held to be true whether or not the premise of the conditional is. In (2a), there is a presupposition (at least for rhetorical purposes) that John has a dog, and that presupposition is true whether or not she has fleas. For constructs like John's dog, DRT creates discourse referents at the global level of the representation which correspond to John and his dog.
(2) a. If John's dog has fleas, she scratches them.
b. She wears a flea collar though.
Figure 6: A DRS for a conditional sentence with accommodated presuppositions
Intentional Structure:
I0: IntendS(IntendH a)
I1: IntendS(BelieveH b)
I2: IntendS(BelieveH c)
Linguistic Structure:
DS0: (a) Come to the party for the new President.
DS1: (b) There will be lots of good food.
DS2: (c) The Fluted Mushroom is doing the catering.
Figure 7: For G&S, dominance in intentional structure determines embedding in linguistic structure.
DS0  1 A: I'm going camping next weekend. Do you have a two-person tent
       I could borrow?
     2 B: Sure. I have a two-person backpacking tent.
DS1  3 A: The last trip I was on there was a huge storm.
     4 It poured for two hours.
     5 I had a tent, but I got soaked anyway.
     6 B: What kind of a tent was it?
     7 A: A tube tent.
     8 B: Tube tents don't stand up well in a real storm.
     9 A: True.
    10 B: Where are you going on this trip?
    11 A: Up in the Minarets.
    12 B: Do you need any other equipment?
    13 A: No.
    14 B: Okay. I'll bring the tent in tomorrow.
Figure 8: Discourse structure affects referent accessibility.
Figure 9: Graphical Representation of an RST Schema (relations shown: MOTIVATION, ENABLEMENT)
Figure 10: Graphical Representation of an RST Analysis of (9) (relations shown: EVIDENCE, MOTIVATION, ENABLEMENT over segments a-d)
π1, π6
π1: K1
π6: π2, π5, π7
π2: K2, π5: K5, Narration(π2, π5)
π7: π3, π4
π3: K3, π4: K4, Narration(π3, π4)
Elaboration(π2, π7)
Elaboration(π1, π6)
Figure 11: An SDRS for discourse (10)
Figure 12: A Sample Discourse Plan, with steps such as INFORM(P1), SUPPORT(P1), INFORM(P2), Cause-to-Bel(P1), and Cause-to-Bel(P2) linking an INITIAL state to a FINAL state. Key: P1 = It would have been better to troubleshoot by taking measurements instead of swapping. P2 = You can't count on having spare cards available for swapping. G = to troubleshoot the test station. A = taking measurements. B = swapping components.
Cause-to-Bel(b)
Cause-to-Bel(a)
cause(a,b)
(15) a. You know that the signal on pin 26 is bad.
b. Thus, there's a break in the path created by TPA63.
Figure 13: Arguing from cause to effect
Cause-to-Bel(a)
Cause-to-Bel(b)
cause(a,b)
(16) a. You know that the signal on pin 26 is bad
b. because there's a break in the path created by TPA63.
Figure 14: Arguing from effect to cause
List of Tables
1 RST Relation MOTIVATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
2 Discourse Plan Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3 Fragment of a naturally occurring tutoring dialogue . . . . . . . . . . . . . . . . . 66
Table 1: RST Relation MOTIVATION
relation name: MOTIVATION
constraints on N: Presents an action (unrealized with respect to N)
in which the Hearer is the actor.
constraints on S: none
constraints on N + S combination:
Comprehending S increases the Hearer's desire to
perform the action presented in N.
effect: The Hearer's desire to perform the action presented in N
is increased.
Table 2: Discourse Plan Operators
Operator 1: Action operator for Cause-to-Bel
HEADER: Cause-to-Bel(?p)
CONSTRAINTS: not(Bel(?p))
PRECONDITIONS: nil
EFFECTS: Bel(?p)
Operator 2: Decomposition operator for Cause-to-Bel
HEADER: Cause-to-Bel(?p)
CONSTRAINTS: nil
STEPS: Begin, Inform(?p),
Support(?p), End
Operator 3: Decomposition operator for Support
HEADER: Support(?p)
CONSTRAINTS: causes(?q, ?p)
STEPS: Begin, Cause-to-Bel(?q),
Cause-to-Bel(causes(?q, ?p)), End
Table 3: Fragment of a naturally occurring tutoring dialogue
...
tutor   Next, you replaced the A1A3A13. [1]
student This is the first circuit card assembly in the drawer that the signal goes
        through. Why would I not start at the entrance to the station and follow
        the path to the measurement device? [2]
tutor   It would have been better to troubleshoot this card by taking measurements
        instead of swapping. You can't count on having spare cards available for
        swapping. [3]