
Paraphrasing Questions Using Given and New Information 1

Kathleen R. McKeown
Computer Science Department
Columbia University
New York, NY 10027

The design and implementation of a paraphrase component for a natural language
question-answering system (CO-OP) is presented. The component is used to produce a
paraphrase of a user's question to the system, which is presented to the user before the
question is evaluated and answered. A major point made is the role of given and new
information in formulating a paraphrase that differs in a meaningful way from the user's
question. A description is also given of the transformational grammar that is used by the
paraphraser.

1. Introduction
In a natural language interface to a data base query system, a paraphraser can be used to
ensure that the system has correctly understood the user. Such a paraphraser has been
developed as part of the CO-OP system (Kaplan 1979). In CO-OP, an internal representation
of the user's question is passed to the paraphraser, which then generates a new version of
the question for the user. Upon seeing the paraphrase, the user has the option of
rephrasing her/his question before the system attempts to answer it. Thus, if the question
was not interpreted correctly, the error can be caught before a possibly lengthy search of
the data base is initiated. Furthermore, the user is assured that the answer she/he
receives is an answer to the question asked and not to a deviant version of it.

The idea of using a paraphraser in the above way is not new. To date, other systems have
used canned templates to form paraphrases, filling in empty slots in the pattern with
information from the user's question (Waltz 1978, Codd 1978). In CO-OP, a transformational
grammar is used to generate the paraphrase from an internal representation of the
question. Moreover, the CO-OP paraphraser generates a question that differs in a
meaningful way from the original question. It makes use of a distinction between given
and new information to indicate to the user the existential presuppositions made in
her/his question.

2. Overview of the CO-OP System
The CO-OP system is aimed at infrequent users of data base query systems. These casual
users are likely to be unfamiliar with computer systems and unwilling to invest the time
needed to learn a formal query language. Being able to converse naturally in English
enables such persons to tap the information available in a data base.

In order to allow the question-answering process to proceed naturally, CO-OP follows some
of the "co-operative principles" of conversation (Grice 1975). In particular, the system
attempts to find meaningful answers to failed questions by addressing any incorrect
assumptions the questioner may have made in her/his question. When the direct response to
a question would be simply "no" or "none", CO-OP gives a more informative response by
correcting the questioner's mistaken assumptions.

The false assumptions that CO-OP corrects are the existential presuppositions of the
questions. 2 Since these presuppositions can be computed from the surface structure of the
question, a large store of semantic knowledge for inferencing purposes is not needed.

1 This work was carried out in the Department of Computer and Information Science, The
University of Pennsylvania. It was partially supported by an IBM Fellowship, and by NSF
grants MCS 78-08401 and MCS 79-19171.

2 For example, in the question "Which users work on projects sponsored by NASA?", the
speaker makes the existential presupposition that there are projects sponsored by NASA.

Copyright 1983 by the Association for Computational Linguistics. Permission to copy without fee all or part of this material is granted
provided that the copies are not made for direct commercial advantage and the Journal reference and this copyright notice are included on
the first page. To copy otherwise, or to republish, requires a fee and/or specific permission.

0362-613X/83/010001-10$03.00

American Journal of Computational Linguistics, Volume 9, Number 1, January-March 1983



In fact, a lexicon and data base schema are the only items that contain domain-specific
information. Consequently, the CO-OP system is a portable one; a change of data base
requires that only these two knowledge sources be modified.

3. The CO-OP Paraphraser
CO-OP's paraphraser provides the only means of error checking for the casual user. If the
user is familiar with the system, she/he can ask to have the intermediate results printed,
in which case the parser's output and the formal data base query will be shown. The naive
user, however, is unlikely to understand these results. It is for this reason that the
paraphraser was designed to respond in English.

The use of English to paraphrase queries creates several problems. The first is that
natural language is inherently ambiguous. A paraphrase must clarify the system's
interpretation of possibly ambiguous phrases in the user's question without introducing
additional ambiguity.

One particular type of ambiguity that a paraphraser must clarify and avoid re-introducing
is caused by the linear nature of sentences. A modifying relative clause, for example,
frequently cannot be placed directly after the noun phrase it modifies. In such cases, the
semantics of the sentence may indicate the correct choice of modified noun phrase, but
occasionally the sentence may be genuinely ambiguous. For example, question (A) below has
two interpretations, both equally plausible. The speaker could be referring to books
dating from the '60s or to computers dating from the '60s.

(A) Which students read books on computers dating from the '60s?

A second problem in paraphrasing English queries is the possibility of generating the
exact question that was originally asked. If a grammar were developed to simply generate
English from an underlying representation of the question, this possibility could be
realized. Instead, a method must be devised that can determine how the phrasing should
differ from the original.

The CO-OP paraphraser addresses both the problem of ambiguity and the rephrasing of the
question. It makes the system's interpretation of the question explicit by breaking down
the clauses of the question and reordering them depending upon their function in the
sentence. Thus, question (A) above will result in either paraphrase (B) or (C), reflecting
the interpretation the system has chosen.

(B) Assuming that there are books on computers (those computers date from the '60s),
    which students read those books?

(C) Assuming that there are books on computers (those books date from the '60s), which
    students read those books?

The method adopted generates a paraphrase that differs from the original except in cases
where no relative clauses or prepositional phrases were used. It was formulated on the
basis of a distinction between given and new information and indicates to the user the
presuppositions she/he has made in the question (in the "assuming that" clause), while
focusing her/his attention on the attributes of the class she/he is interested in.

4. Linguistic Background
As mentioned earlier, the lexicon and the data base are the sole sources of world
knowledge for CO-OP. While this design increases CO-OP's portability, it means that little
semantic information is available for the paraphraser's use. Contextual information is
also limited since no running history or context is maintained for a user session in the
current version. The input the paraphraser receives from the parser is a syntactic parse
tree of the question. Using this information, the paraphraser must construct a question
that differs in phrasing from the original. The following question must therefore be
addressed:

    What reasons are there for choosing one syntactic form of expression over another?

Some linguists maintain that word order is affected by functional roles elements play
within the sentence. 3 Terminology used to describe the types of roles that can occur
varies widely. Some of the distinctions that have been described include given/new,
topic/comment, theme/rheme, and presupposition/focus. Definitions of these terms, however,
are not consistent. 4

3 Some other influences on syntactic expression are discussed in Morgan and Green 1973.
They suggest that stylistic reasons, in addition to some of the functions discussed here,
determine when different syntactic constructions are to be used. They point out, for
example, that the passive voice is often used in academic prose to avoid identification of
agent and to lend a scientific flavor to the text.

4 For example, see Prince 1979 for a discussion of various usages of "given/new".

Nevertheless, one influence on expression does appear to be the interaction of sentence
content and the beliefs of the speaker concerning the knowledge of the listener. Some
elements in the sentence function in conveying information the speaker assumes is present
in the "consciousness" of the listener (Chafe 1976). This information is said to be
contextually dependent, either by virtue of its presence in the preceding discourse or
because it is part of the shared world knowledge of the dialog participants. In a
question-answering system, shared world-knowledge
refers to information the speaker assumes is present in the data base. Information
functioning in the role just described has been termed "given".

"New" labels all information in the sentence that is presented as not retrievable from
context. In the declarative, elements functioning in asserting information that the
listener is presumed not to know are called new. In the question, elements functioning in
conveying what the speaker wants to know (i.e., what she/he doesn't know) represent
information the speaker presumes the listener is not already aware of. Firbas 1974
identifies additional functions in the question. Of these, (ii) is used here to augment
the interpretation of new information. He says (p. 31):

    (i)  it indicates the want of knowledge on the part of the inquirer and appeals to
         the informant to satisfy this want.
    (ii) [a] it imparts knowledge to the informant in that it informs him what the
         inquirer is interested in (what is on her/his mind) and [b] from what particular
         angle the intimated want of knowledge is to be satisfied.

Although word order vis-a-vis these and related distinctions has been discussed in light
of the declarative sentence, less has been said about the interrogative form. Halliday
1967 and Krizkova 5 are among the few to have analyzed the question. Despite the fact that
they arrive at different conclusions, 6 the two follow similar lines of reasoning.
Krizkova argues that both the wh-item of the wh-question and the finite verb (e.g., "do"
or "be") of the yes/no question point to the new information to be disclosed in the
response. These elements, she claims, are the only unknowns to the questioner. Halliday,
in discussing the yes/no question, also argues that the finite verb is the only unknown.
The polarity of the text is in question and the finite element indicates this.

5 Summary by Firbas 1974 of the untranslated article "The Interrogative Sentence and Some
Problems of the So-called Functional Sentence Perspective (Contextual Organization of the
Sentence)," Naše řeč 4, 1968.

6 It should be noted that Halliday and Krizkova discuss the unknowns in the question in
order to define the theme and rheme of a question. Although they agree about the unknowns
for the questioner, they disagree about which elements function as theme and which
function as rheme. A full discussion of their analysis and conclusions is given in
McKeown 1979.

In this paper the interpretation of the unknown elements in the question as defined by
Krizkova and Halliday is followed. The wh-items, in defining the questioner's lack of
knowledge, act as new information. Firbas's analysis of the functions in questions is used
to further elucidate the role of new information in questions. The remaining elements are
given information. They represent information assumed by the questioner to be true of the
data base domain. This labeling of information within the question will allow the
construction of a natural paraphrase, avoiding ambiguity.

5. Formulation
Following the analysis described above, the CO-OP paraphraser breaks down questions into
given and new information. More specifically, an input question is divided into three
parts, of which (2) and (3) form the new information.

    1. given information
    2. lack of knowledge (ii[a] from Firbas above)
    3. angle (ii[b] from Firbas above)

In terms of the question components, part (2) is indicated by the question with no
subclauses 7 as it defines the lack of knowledge of the hearer. Part (3) is indicated by
the direct and indirect modifiers of the interrogative words as they define the angle from
which the question was asked. They identify the attributes of the missing information for
the hearer. Part (1) is formed from the remaining clauses.

As an example, consider question (D):

(D) Which division of the computing facility works on projects using oceanography
    research?

Following the outline above, part (2) of the paraphrase will be the question minus the
subclauses: "Which division works on projects?" Part (3), the modifiers of the
interrogative words, will be "of the computing facility", which modifies "which
division". 8 The remaining clause "projects using oceanography research" is considered
given information. The three parts can then be assembled into a natural sequence:

(E) Assuming that there are projects using oceanography research, which division works on
    those projects? Look for a division of the computing facility. 9

7 Here, subclauses are defined as relative clauses, prepositional phrases, and adjectival
phrases.

8 Note that this phrase also identifies a presupposition of the questioner. For the
paraphrase, however, its function to precisely specify what the questioner is interested
in (which is new information for the hearer) is of greater importance.

9 This example, as well as sample questions and paraphrases that follow, were taken from
actual sessions with the paraphraser. Question (A) and its possible paraphrases (B) and
(C) were not run on the system.

Information belonging to each of the three categories occurred in question (D). If one of
these types of information is missing, the question will be presented minus the initial or
concluding clauses. Only part (2) of the paraphrase will invariably occur. Note that this
means that if there are no clauses in the original question corresponding to parts (1) and
(3) (i.e., the question contains no relative clauses, prepositional phrases,
or adjectival phrases), the paraphrase may be the same as the original question.

If more than one clause occurs in a particular category, the question will be further
splintered. Additional given information is parenthesized following the "assuming that
..." clause. Example (F) below illustrates the paraphrase for a question containing
several clauses of given information and no clauses defining specific attributes of the
missing information. Clauses containing information characterized by category (3) will be
presented as separate sentences following the stripped-down question. (G) below
demonstrates a paraphrase containing more than one clause of this type of information.

(F) Q: Which users work on projects in oceanography that are sponsored by NASA?
    P: Assuming that there are projects in oceanography (those projects are sponsored by
       NASA), which users work on those projects?

(G) Q: Which programmers in superdivision 5000 from the ASD group are advised by Thomas
       Wirth?
    P: Which programmers are advised by Thomas Wirth? Look for programmers in
       superdivision 5000. The programmers must be from the ASD group.

6. Implementation Overview
The paraphraser's first step in processing is to reform the parse tree it is given so that
the main verb occurs as the root of the new tree. This is done to simplify the
identification of given and new information in the parse. The tree is then divided into
three separate trees reflecting the division of given and new information in the question.
The design of the tree allows for a simple set of rules that flatten the tree. The final
stage of processing in the paraphraser is translation. In the translation phase, labels in
the parser's representation are translated into their corresponding words. During this
process, necessary transformations of the grammar are performed upon the string.

6.1 The phrase structure tree
In its initial processing, the paraphraser transforms the parser's representation into one
that is more convenient for generation purposes. The resultant structure is a tree that
highlights certain syntactic features of the question. This initial processing gives the
paraphraser some independence from the CO-OP system. Were the parser's representation
changed or the component moved to a new system, only the initial processing phase would
need to be modified.

The paraphraser's phrase structure tree uses the main verb of the question as the root
node of the tree. The subject of the main verb is the root node of the left subtree, the
object (if there is one) the root node of the right subtree. In the current system, the
use of binary relations in the parser's representation 10 creates the illusion that every
verb or preposition has a subject and object. The paraphraser's tree does allow for the
representation of other constructions should the incoming language use them.

10 See Kaplan 1979 for a description of Meta Query Language, or MQL.

Note that the use of binary relations in the incoming parse tree to represent the verbs
and prepositions of a sentence means that modifiers of verbs are represented as modifiers
of their objects (and thus hang off the object in the paraphraser's reformed tree). While
this is not the usual interpretation of questions using such constructions, it functions
adequately for both CO-OP and the paraphraser as illustrated by a hypothetical paraphrase
for such a question, shown below in (H):

(H) Q: Which programmers worked on oceanography projects in 1972?
    P: Assuming that there were oceanography projects in 1972, which programmers worked on
       those projects?

Each of the paraphrase subtrees represents other clauses in the question. Both the subject
and the object of the main verb will have a subtree for each other clause it participates
in. If a noun in one of these clauses also participates in another clause in the sentence,
it too will have subtrees.

As an example, consider the question: "Which active users advised by Thomas Wirth work on
projects in area 3?" The phrase structure tree used in the paraphraser is shown in Figure
1. Since "work on" is identified as the main verb of the question by the parser, it will
be the root node of the tree. "users" is root of the left subtree, "projects" of the
right. Each noun participates in one other clause and therefore has one subtree. Modifiers
are closely bound to the noun they modify and are treated as properties of the noun (i.e.,
each node in the tree that is modified has a property called "modifiers" whose value is
any adjectival or noun modifier). In Figure 1, modifiers are shown as part of the node
label for clarity. Subtree nodes (the leaves of Figure 1) have three pieces of information
associated with them:

• the relation between the node and its parent,
• the noun phrase the node represents, and
• an indication of whether the node functions as subject or object in the clause.
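As an illustration only, the tree described in Section 6.1 can be sketched as a small data structure. The class and field names below (PhraseTree, NounRoot, ClauseNode, and the wh flag) are hypothetical and are not taken from the CO-OP implementation; the sketch simply records the main verb at the root, the subject and object as left and right subtree roots, and the three pieces of information carried by each subtree node. The example builds the tree for "Which active users advised by Thomas Wirth work on projects in area 3?"

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClauseNode:
    """A subtree node: one further clause that a noun participates in."""
    arc_label: str   # relation between the node and its parent (verb or preposition)
    set_label: str   # noun phrase the node represents
    role: str        # "subject" or "object" within the clause
    modifiers: List[str] = field(default_factory=list)
    subtrees: List["ClauseNode"] = field(default_factory=list)


@dataclass
class NounRoot:
    """Root of the left (subject) or right (object) subtree."""
    noun: str
    modifiers: List[str] = field(default_factory=list)
    wh: bool = False  # hypothetical flag marking the questioned noun
    subtrees: List[ClauseNode] = field(default_factory=list)


@dataclass
class PhraseTree:
    verb: str                         # main verb at the root
    left: NounRoot                    # subject of the main verb
    right: Optional[NounRoot] = None  # object, if there is one


def example_tree() -> PhraseTree:
    """Tree for the example question discussed in Section 6.1."""
    users = NounRoot(
        "users", ["active"], wh=True,
        subtrees=[ClauseNode("advised by", "Thomas Wirth", "object")],
    )
    projects = NounRoot(
        "projects",
        subtrees=[ClauseNode("in", "area 3", "object")],
    )
    return PhraseTree("work on", users, projects)
```

Keeping modifiers as a property of the noun, rather than as separate nodes, mirrors the description above that modifiers are closely bound to the noun they modify.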


                 work on
                /       \
      active users       projects
           |                 |
       advised by            in
      Thomas Wirth         area 3
         object            object

Figure 1.

6.2 Dividing the tree
The constructed tree is computationally suited for the three-part paraphrase. The tree is
flattened after it has been divided into subtrees containing given information and the two
types of new information. The splitting of the tree is accomplished by first extracting
the topmost smallest portion of the tree containing the wh-item. At the very least, this
will include the root node plus the left and right subtree root nodes. This portion of the
tree is the stripped-down question. The clauses that define the particular aspect from
which the question is asked are found by searching the left and right subtrees for the
wh-item or questioned noun. The subtree whose root node is the wh-item contains these
clauses. Note that this may be the entire left or right subtree or may only be a subtree
of one of these. The remainder of the tree represents given information. Figure 2
illustrates this division for the previous example.
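A minimal sketch of this division follows. It assumes a simplified dictionary layout for the tree (the keys "verb", "left", "right", "noun", "modifiers", "wh", and "clauses" are hypothetical) and handles only the simple case in which the wh-item is the root of the left or right subtree; the deeper case mentioned above is not covered.

```python
def divide(tree):
    """Split a verb-rooted tree into (given, stripped_question, angle).

    Assumed layout: `tree` has keys "verb", "left", "right"; each side has
    keys "noun", "modifiers", "wh", and "clauses", where "clauses" is a list
    of (arc_label, set_label) pairs hanging off that noun.
    """
    left, right = tree["left"], tree["right"]
    if left["wh"] == right["wh"]:
        raise ValueError("sketch expects exactly one top-level wh-item")
    wh_side, other = (left, right) if left["wh"] else (right, left)

    def bare(side):
        # A subtree root without the clauses that hang off it.
        return {"noun": side["noun"], "modifiers": side["modifiers"], "wh": side["wh"]}

    # Part 2: the root node plus the left and right subtree root nodes.
    stripped_question = {"verb": tree["verb"], "left": bare(left), "right": bare(right)}
    # Part 3: the clauses in the subtree whose root is the wh-item.
    angle = {"noun": wh_side["noun"], "clauses": list(wh_side["clauses"])}
    # Part 1: the remainder of the tree is given information.
    given = {"noun": other["noun"], "clauses": list(other["clauses"])}
    return given, stripped_question, angle


# The example question: "Which active users advised by Thomas Wirth
# work on projects in area 3?"
EXAMPLE = {
    "verb": "work on",
    "left": {"noun": "users", "modifiers": ["active"], "wh": True,
             "clauses": [("advised by", "Thomas Wirth")]},
    "right": {"noun": "projects", "modifiers": [], "wh": False,
              "clauses": [("in", "area 3")]},
}
```

Calling divide(EXAMPLE) places "projects in area 3" in the given part and "users advised by Thomas Wirth" in the angle part, leaving "active users work on projects" as the stripped-down question.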

[Figure 2 diagram: the tree of Figure 1 divided into Pt. 2 information (new), Pt. 3
information (new), and Pt. 1 information (given).]

Q: Which active users advised by Thomas Wirth work on projects in area 3?

P: Assuming that there are projects in area 3, which active users work on these projects?
   Look for users advised by Thomas Wirth.

Figure 2.
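The surface layout of the three-part paraphrase shown in Figure 2 and in examples (F) and (G) can be sketched as simple string assembly. This is an illustration under stated assumptions, not the paraphraser's method: the real system produces these strings by flattening trees and applying transformations, whereas the hypothetical function below takes already-worded clauses. It also hard-codes the plural "there are" and gives each additional given clause its own parentheses, which is a guess for the case of more than one additional clause.

```python
def assemble_paraphrase(given, question, angle):
    """Assemble a paraphrase from pre-worded parts (hypothetical helper).

    given:    list of given-information clauses; the first is introduced by
              "Assuming that there are", any others are parenthesized.
    question: the stripped-down question, lower-case, without "?".
    angle:    list of (noun, modifier) pairs describing the questioned noun.
    """
    sentences = []
    if given:
        first, *rest = given
        intro = "Assuming that there are " + first
        for clause in rest:
            intro += " (" + clause + ")"
        sentences.append(intro + ", " + question + "?")
    else:
        # No given information: the question stands alone, capitalized.
        sentences.append(question[0].upper() + question[1:] + "?")
    for i, (noun, modifier) in enumerate(angle):
        if i == 0:
            sentences.append("Look for " + noun + " " + modifier + ".")
        else:
            sentences.append("The " + noun + " must be " + modifier + ".")
    return " ".join(sentences)


print(assemble_paraphrase(
    ["projects in area 3"],
    "which active users work on these projects",
    [("users", "advised by Thomas Wirth")],
))
```

With the inputs above, the function reproduces the paraphrase P of Figure 2; passing an empty list for the given part yields the layout of example (G).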

6.3 Flattening
If the structure of the phrase structure tree is

    Tree:        Subtree:
       R             R'
      / \           /  \
     A   B         A'   B'

    Figure 3.

with A the left subtree and B the right, then the following rules define the flattening
process:

    TREE → A R B
    SUBTREE → R' A' B'

In other words, the top level of the tree (shown on the left in Figure 3) is linearized by
an in-order traversal while each of its subtrees (shown on the right in Figure 3) is
linearized by a pre-order traversal. In the example shown in Figure 2, part (2) of the
tree corresponds to the top level of the tree and will undergo in-order linearization, and
parts (1) and (3) are the subtrees, which will be linearized by a pre-order traversal. The
use of two traversals to linearize the tree stems from the fact that different types of
information are stored at nodes at different levels in the tree. As a node in a subtree
has three pieces of information associated with it, one more rule is required to expand a
node. A node consists of:

• arc-label
• set-label
• subject/object

where arc-label is the label of a binary relation in the input parse tree (i.e., a verb or
preposition) and set-label is the label of a set in the input parse (i.e., noun phrase).
The input parse is in MQL representation, which consists of sets and binary relations
between them. Subject/object indicates whether the sub-node noun phrase functions as
subject or object in the clause; it is used by the subject-aux transformation and does not
apply to the expansion rule. In Figure 2, the leaves of the tree carry these three pieces
of information. For example, the leftmost leaf has arc-label advised by, set-label Thomas
Wirth, and is labeled as
the object of the relation. The following rule expands a subtree node:

    NODE → ARC-LABEL SET-LABEL

The tree of given information is flattened first. It is part of the left or right subtree
of the phrase structure tree and therefore is flattened by a pre-order traversal. It is
during the flattening stage that the words "Assuming that there [be] ..." are inserted to
introduce the clause of given information. "be" will agree with the subject of the clause.
Following these rules, the tree of given information in Figure 2 would be flattened by a
pre-order traversal yielding "projects in area #6" (R' A' arc-label set-label). After the
"Assuming that" clause is inserted, this portion of the paraphrase is "Assuming that there
be projects in area #6". If there is more than one clause, parentheses are inserted around
the additional ones.

The tree representing the stripped-down question is flattened next, using the in-order
traversal. Applying this process to Part (2) of the tree in Figure 2 yields the phrase "wh
active users work on projects" (A R B). (In final processing stages, the correct
demonstrative ("those" or "that") is selected to modify nouns already mentioned in the
first part of the paraphrase.)

The tree that represents modifiers of the questioned noun is linearized to follow these
phrases. A pre-order traversal of this portion of the tree in Figure 2 yields "users
advised by Thomas Wirth" (R' A' arc-label set-label). Any modifiers of a noun (here,
"active") are omitted in this part of the paraphrase if they have already been mentioned.
The phrase "Look for" is inserted before the first clause of modifiers.

Two transformations are applied during the flattening process. They are wh-fronting and
subject-aux inversion. Other transformations are applied following the flattening process
to produce sentences in final grammatical form.

6.4 Transformations
The grammar used in the paraphraser is a transformational one. In addition to the basic
flattening rules described above, the following transformations are used:

    wh-fronting
    negation
    do-support
    subject-aux inversion
    tense-placement
    contraction
    has-deletion

The curved lines in the original diagram (not reproduced here) indicate the ordering
restrictions. There are two connected groups of transformations. If wh-fronting applies,
then so will do-support, subject-aux inversion, and tense-placement. The second group of
transformations is invoked through the application of negation. It includes do-support,
contraction, and tense-placement. Has-deletion is not affected by the absence or presence
of other transformations. A description of the transformation rules follows. The rules
used here are based on analyses described by Akmajian and Heny (1975) and by Culicover
(1976).

The rule for wh-fronting is specified as follows, where SD stands for structural
description and SC, structural changes. Each rule is followed by an example input string
and the string after it has undergone the transformation. The full tree for the string is
not shown, but the string is labeled by markers in the SD.

    SD: X   - NP - Y
        1     2    3
    SC: 2+1   0    3

    Input to rule:
        [programmers in division 5 past plur work on]1 [wh projects]2 ?
    Transformed input:
        [wh projects]2 [programmers in division 5 past plur work on]1 ?

The first step in the implementation of wh-fronting is a search of the tree for the
wh-item. A slightly different approach is used for paraphrasing than would be used if
simply generating a question from the input parse. The difference occurs because in the
original question the NP to be fronted may be the head noun of some relative clauses or
prepositional phrases. If generating, these clauses would be fronted along with the head
noun. Since the clauses of the original question are broken down for the paraphrase, it
will never be the case when paraphrasing that the NP to be fronted also dominates relative
clauses or prepositional phrases. For this reason, the applicability of wh-fronting is
tested for and is applied in the flattening process of the stripped-down question. Note
that the phrase markers (or categories) of each word are retained as the tree is flattened
and thus the SD's can be matched against both the tree and its linearized version. If
wh-fronting applies, only one word need be moved to the initial position.

The paraphraser is capable of generating English from the input as well as paraphrasing
(see Section 7). When generation is being done, the applicability of wh-fronting is tested
for immediately before flattening. If the transformation applies, the tree is split. The
subtree of which the wh-item is the root is flattened separately from the remainder of the
tree and is attached in fronted position to the string resulting from flattening the other
part.

After wh-fronting has been applied, do-support is invoked. In CO-OP, the underlying
representation of
the question does not contain modals or auxiliary verbs. Thus, fronting the wh-item
necessitates supplying an auxiliary. The following rule is used for do-support:

    SD: NP - NP - tense - num - V - X
        1 2 3 4
    SC: 1 2+do 3 4
    condition: 1 dominates wh

    Input to rule:
        [wh projects]1 [programmers in division 5]2 [past plur work on?]3
    Transformed input:
        [wh projects]1 [programmers in division 5 do]2+do [past plur work on?]3

Subject-aux inversion is activated immediately afterwards. Again, if wh-fronting is
applied, subject-aux inversion will apply also. The rule is:

    SD: NP - NP  - AUX - X
        1    2     3     4
    SC: 1    3+2   0     4
    condition: 1 dominates wh

    Input to rule:
        [wh projects]1 [programmers in division 5]2 [do]3 [past plur work on?]4
    Transformed input:
        [wh projects]1 [do]3 [programmers in division 5]2 [past plur work on?]4

Tense-placement follows subject-aux inversion. Tense, number, and negation (if present)
are attributes of all verbs in the parser's representation. When an auxiliary is
generated, the tense, number, and negation are moved from the verb to the auxiliary.
Formally:

    SD: X - AUX - Y - tense-num ( - no - ) V - Z
        1 2 3 4 5 6
    SC: 1 2+4 3 0 5 6

    Input to rule:
        [Wh projects]1 [do]2 [programmers in division 5]3 [past plur]4 [work on?]5
    Transformed input:
        [Wh projects]1 [do]2 [past plur]4 [programmers in division 5]3 [work on?]5

Some transformational analyses propose that wh-fronting and subject-aux inversion apply to
the relative clause as well as the question. In the CO-OP paraphraser, the head-noun is
properly positioned by the flattening process and wh-fronting need not be used.
Subject-aux inversion, however, may be applicable. In cases where the head noun of the
clause is not its subject, subject-aux inversion results in the proper order.

The rule for negation is tested during the translation phase of execution. It has been
formalized as:

    SD: X - tense-num - V - NP - Y
        1 2 3 4
    SC: 1 2+no 3 4
    condition: 4 marked as negative

    Input to rule:
        [wh students]1 [pres plur have]2 [advisors?]3
        (advisors has property "neg")
    Transformed input:
        [wh students]1 [pres plur have no]2+no [advisors?]3

In the CO-OP representation, an indication of negation is carried on the object of a
binary relation (see Kaplan 1979). When generating an English representation of the
question, it is possible in some cases to express negation as modification of the noun
(see question (H) below). In all cases, however, negation can be indicated as part of the
verb (see version (I) of question (H)). Therefore, when the object is marked as negative,
the paraphraser moves the negation to become part of the verbal element.

(H) Which students have no advisors?
(I) Which students don't have advisors?

In English, the negative marker is attached to the auxiliary of the verbal element and,
therefore, as was the case for questions, an auxiliary must be generated. Do-support is
used. The rule for do-support after
American Journal of Computational Linguistics, Volume 9, Number 1, January-March 1983 7



negation differs from the one used after wh-fronting. They are presented this way for clarity, but could have been combined into one rule.

    SD: X - tense-num - V - no - Y
        1    2    3
    SC: 1 do+2 3

Input to rule:

    1             2                   3
    wh students   pres plur have no   advisors?

Transformed input:

    1             do+2                   3
    wh students   do pres plur have no   advisors?

Tense-placement, as described above, moves the tense, number, and negation from the verb to the auxiliary verb. The cycle of transformations invoked through application of negation is completed with the contraction transformation. The statement of the contraction transformation is:

    SD: X - do+tense-num - V - no - Y
        1    2    3    4    5
    SC: 1 #2+n't# 3 0 5

Input to rule:

    1             2              3      4    5
    wh students   do pres plur   have   no   advisors?

Transformed input:

    1             #2+n't#              3      5
    wh students   #do+pres+plur+n't#   have   advisors?

where # indicates that the result must be treated as a unit for further transformations. The morphology routines will combine the result to produce "don't".

7. Other Features of the Paraphraser

The paraphraser is used for a second purpose in addition to paraphrasing. It can generate an English version of the parser's representation as well as paraphrase in the three-part form. This function uses the same procedures and grammar as the three-part paraphraser, but the tree is not split into three separate trees before being flattened.

In CO-OP, generation is used to produce alternative suggestions and corrective responses. A corrective response is used to correct the user's false presuppositions. When an existential presupposition encoded in the question is incorrect, the portion of MQL representing the failed presupposition (this is determined by CO-OP) is passed to the paraphraser, which generates the corrective response. For example, (K) below is a corrective response that could be generated by the paraphraser if (J) were asked:

(J) Which programmers in division 3 work on projects in oceanography?
(K) I don't know of any projects in oceanography.

Alternative suggestions are also used by the CO-OP system when the direct response to the user's question is negative. If an incorrect presupposition is removed from a question, the resulting question may no longer have a negative response.¹¹ In such cases, CO-OP suggests the wider class question to the user as a possible interest. CO-OP passes the MQL representing this question to the paraphraser, which generates the English for the suggestion. A sequence like (J), (K) above might be followed by the alternative suggestion (L):

(L) But you might be interested in programmers in division 3 that work on any projects.

¹¹ See Kaplan 1979 for details on determining the most appropriate alternative suggestion.

For both types of responses, the paraphraser generates the response using the paraphrase functions with minor differences. The flattening process for generation differs from that used for paraphrases in that the tree is not divided into subtrees representing given and new information and, therefore, the tree is flattened as a whole. The transformational grammar also applies to the generation process, with the one difference being the point at which the applicability of wh-fronting is tested for (described in Section 6.4). Other than these changes and the use of different leading phrases (e.g., "But you might be interested in ..."), the generation process is the same as the paraphraser process. The generation function is general enough that it could be used for other types of responses in cases when something other than a direct response is needed.

8. Related Research

At the time of the CO-OP paraphraser implementation, two main other paraphrasers had been developed and implemented for data base question-answering systems:

• PLANES, Waltz et al. 1978;
• RENDEZVOUS Version 1, Codd 1978.

Both systems used templates to form the paraphrases. Templates are canned English phrases (or sentences) containing slots that may be filled with different words to produce a variety of full English phrases.

The PLANES system generates the paraphrase from the formal data base query using templates. The process involves three specific actions. English words are substituted for any abbreviations or code names in the




data base query, using a table look-up. A single appropriate paraphrase template is selected for use based on the query, and the slots in the template are then filled with words and phrases from the query. The major effort in designing this kind of system is in the formation, by hand, of templates suitable for the particular data base and for the types of questions that can be asked. An example of an English question and the PLANES paraphrase for it is shown below in (M):

(M) Q: How many flights did plane 3 make in Jan 73?
    P: PLANES searches the MONTHLY FLIGHT and MAINTENANCE SUMMARIES and returns: The value of TOTAL FLIGHTS for plane SERIAL #3 during January 1973.

The RENDEZVOUS system also generates the paraphrase from the formal query using templates, although it is slightly more sophisticated than Waltz's. There are three parts to generation, and two types of templates are used. A header template corresponding to the type of query is chosen first. There are three types of queries in the system (FIND, EXIST, COUNT), of which FIND occurs most frequently. The header for FIND is PRINT THE ... EVERY ..., where the dots must be filled in. The second part of the paraphrase is the target list. It specifies the attributes requested by the user and is supplied by doing a table look-up on the attribute. The third part of the paraphrase is called the body. It is formed by extracting templates from tables, associated with particular items in the query, that specify restrictions on the requested values. An example of a query and the paraphrase generated by RENDEZVOUS is shown in (N) below.

(N) Q: I want to find certain projects. Pipes were sent to them in Feb. 1975.
    P: Print the name of every project to which a shipment of a part named pipe was sent during February 1975.

The goals of the RENDEZVOUS generation component are important ones. The generated English must be unambiguous, easy to understand, discriminating, and not misleading (Codd 1978). Instead of developing a general solution to achieve these goals, however, the research seems to be concentrated on particular examples which don't meet these criteria. This results in part from the use of templates. The templates must be constructed beforehand for a particular data base, and great care must be taken to choose phrases that can be easily patched together with a variety of other phrases. Unforeseen interaction between juxtaposed phrases is a problem that frequently arises. Such an approach necessitates looking at particular examples, instead of the general framework.

In both of these systems, the use of templates means that the major effort in developing the system must be done by hand in formatting the English phrases. All questions that will be asked must be anticipated ahead of time, and although the systems can be extended by adding new templates, undesirable interactions between new and old templates must be specifically avoided, and each new required addition does not ease the addition of subsequent templates. Note that this means coverage in a template system is also difficult to specify.

The use of a grammar in the CO-OP paraphraser makes it more flexible than these earlier paraphrasers:

• less work must be done by hand in formulating the system,
• interactions between templates are not a problem since the grammar determines how to combine words and phrases in an acceptable way, and
• the system is capable of handling new questions for which it has not been explicitly prepared, as long as they fall within the syntactic range of the system.

The paraphraser's ability to perform the generation task described in the previous section nicely illustrates its flexibility. Note furthermore that the CO-OP paraphraser specifically addresses the problems of disambiguating relative clause modification in a general way and of generating a paraphrase that differs from the original question on a theoretical basis, issues not addressed by either the PLANES or the RENDEZVOUS paraphraser.

9. Conclusions

The paraphraser described here is a syntactic one. While this work has examined the reasons for different forms of expression, additions must be made in the area of semantics. The substitution of synonyms, phrases, or idioms for portions or all of the question requires an examination of the effect of context on word meaning and of the intentions of the speaker on word or phrase choice. The lack of a rich semantic base and contextual information dictated the syntactic approach used here, but the paraphraser can be extended once a wider range of information becomes available.

When testing the implementation of the CO-OP system and extending its linguistic coverage, the paraphraser proved particularly helpful in debugging incorrect parses. It provided fast, easy-to-recognize notification when an incorrect interpretation had been made. This leads us to believe that the paraphrase would also prove helpful to actual users of the system were CO-OP to interpret a question differently than it was intended. Testing of this facility with a large number of actual users remains a topic for future work.

The CO-OP paraphraser has been designed to be domain-independent, and thus a change of the data base requires no change in the paraphraser. Paraphrasers that use the template form, however, will require such changes. This is because the templates or patterns, which constitute the type of question that can be asked, are necessarily dependent on the domain. Different sets of templates must be used for different data bases.

The CO-OP paraphraser also differs from other systems in that it generates the question using a transformational grammar of questions. It addresses two specific problems involved in generating paraphrases:

1. ambiguity in determining which noun phrases a relative clause modifies;
2. the production of a question that differs from the user's.

These goals have been achieved for questions using relative clauses through the application of a theory of given and new information to the generation process.

Acknowledgments

This work was partially supported by an IBM fellowship and NSF grant MC78-08401. I would like to thank Dr. Aravind K. Joshi and Dr. Bonnie Webber for their invaluable comments on the style and content of this paper.

References

Akmajian, A. and Heny, F. 1975 An Introduction to the Principles of Transformational Syntax. Academic Press, New York, New York.
Chafe, W.L. 1977 Givenness, Contrastiveness, Definiteness, Subjects, Topics, and Points of View. In Li, C.N., Ed., Subject and Topic. Academic Press, New York, New York.
Codd, E.F. et al. 1978 Rendezvous Version 1: An Experimental English-Language Query Formulation System for Casual Users of Relational Data Bases. IBM Research Report RJ2144, IBM Research Laboratory, San Jose, California.
Culicover, P.W. 1976 Syntax. Academic Press, New York, New York.
Danes, F., Ed. 1974 Papers on Functional Sentence Perspective. Academia, Prague.
Firbas, Jan. 1966 On Defining the Theme in Functional Sentence Analysis. In Travaux Linguistiques de Prague 1. University of Alabama Press.
Firbas, Jan. 1974 Some Aspects of the Czechoslovak Approach to Problems of Functional Sentence Perspective. In Papers on Functional Sentence Perspective. Academia, Prague.
Goldman, N. 1975 Conceptual Generation. In Schank, R.C., Ed., Conceptual Information Processing. North-Holland Publishing Co., Amsterdam.
Grice, H.P. 1975 Logic and Conversation. In Cole, P. and Morgan, J.L., Eds., Syntax and Semantics: Speech Acts, Vol. 3. Academic Press, New York, New York.
Halliday, M.A.K. 1967 Notes on Transitivity and Theme in English. Journal of Linguistics 3.
Heidorn, G. 1975 Augmented Phrase Structure Grammar. In TINLAP-1 Proceedings.
Joshi, A.K. 1979 Centered Logic: The Role of Entity Centered Sentence Representation in Natural Language Inferencing. In IJCAI Proceedings.
Kaplan, S.J. 1979 Cooperative Responses from a Portable Natural Language Data Base Query System. Ph.D. dissertation. University of Pennsylvania, Philadelphia, Pennsylvania.
McDonald, D.D. 1978 Subsequent Reference: Syntactic and Rhetorical Constraints. In TINLAP-2 Proceedings.
McKeown, K. 1979 Paraphrasing Using Given and New Information in a Question-answering System. Master's thesis. University of Pennsylvania, Philadelphia, Pennsylvania.
Morgan, J.L. and Green, G.M. 1977 Pragmatics and Reading Comprehension. University of Illinois.
Prince, E. 1979 On the Given/New Distinction. CLS 15.
Simmons, R. and Slocum, J. 1972 Generating English Discourse from Semantic Networks. Communications of the ACM 15(10).
Waltz, D.L. 1978 An English Language Question Answering System for a Large Relational Data Base. CACM 21(7).
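
Illustrative sketch of the transformation cycle. The rules given above for do-support, subject-aux inversion, tense-placement, negation, and contraction can be traced with a few lines of Python. This is only a minimal sketch, not the CO-OP implementation: it assumes a question is held as a flat list of string segments (one per numbered position in the SD), and the function names (`do_support`, `wh_question_cycle`, `negation_cycle`, and so on) are hypothetical names chosen here for illustration.

```python
from typing import List


def do_support(wh: str, subj: str, tense: str, rest: str) -> List[str]:
    """SC: 1 2+do 3 4 -- supply 'do' after the subject NP."""
    if not wh.startswith("wh"):  # condition: 1 dominates wh
        return [wh, subj, tense, rest]
    return [wh, subj, "do", tense, rest]


def subject_aux_inversion(segs: List[str]) -> List[str]:
    """SC: 1 3+2 0 4 -- move the auxiliary in front of the subject NP."""
    wh, subj, aux, *rest = segs
    if not wh.startswith("wh") or aux != "do":
        return segs
    return [wh, aux, subj, *rest]


def tense_placement(segs: List[str]) -> List[str]:
    """SC: 1 2+4 3 0 5 -- move tense-num from the verb to the auxiliary.

    Assumes an auxiliary has already been generated in position 2.
    """
    x, aux, y, tense, *rest = segs
    return [x, aux, tense, y, *rest]


def wh_question_cycle(wh: str, subj: str, tense: str, rest: str) -> str:
    """Apply do-support, subject-aux inversion, and tense-placement in order."""
    segs = do_support(wh, subj, tense, rest)
    segs = subject_aux_inversion(segs)
    segs = tense_placement(segs)
    return " ".join(segs)


def negation_cycle(x: str, tense: str, verb: str, obj: str) -> List[str]:
    """Return the string after each step of the negation cycle.

    Assumes the object is already known to be marked as negative.
    """
    aux = "+".join(["do", *tense.split()])   # e.g. "do+pres+plur"
    stages = [
        [x, tense, verb, "no", obj],         # negation: 1 2+no 3
        [x, "do", tense, verb, "no", obj],   # do-support after negation: 1 do+2 3
        [x, aux, verb, "no", obj],           # tense-placement onto the auxiliary
        [x, f"#{aux}+n't#", verb, obj],      # contraction: 1 #2+n't# 3 0 5
    ]
    return [" ".join(stage) for stage in stages]


if __name__ == "__main__":
    print(wh_question_cycle("wh projects", "programmers in division 5",
                            "past plur", "work on?"))
    for line in negation_cycle("wh students", "pres plur", "have", "advisors?"):
        print(line)
```

Run on the two examples from the text, the sketch reproduces the transformed strings shown with the rules; a morphology step, not modelled here, would then turn `#do+pres+plur+n't#` into "don't".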

