
Teaching Machines to Ask

Useful Clarification Questions

Sudha Rao

PhD Defense Examination
Dept. of Computer Science
University of Maryland, College Park

Committee:
Prof. Hal Daumé III (advisor)
Prof. Philip Resnik
Prof. Marine Carpuat
Prof. Jordan Boyd-Graber
Prof. Lucy Vanderwende
Natural Language Understanding

2
Natural Language Understanding

How long does it take to get a PhD?

3
Natural Language Understanding

Give me a recipe
for lasagna

How long does it take to get a PhD?

4
Natural Language Understanding

Give me a recipe
for lasagna

How long does it take to get a PhD?

Please bring me my
coffee mug from the
kitchen

5
Natural Language Understanding

Give me a recipe
for lasagna

How long does it take to get a PhD?

Please bring me my
coffee mug from the
kitchen

6
Human Interactions

7
Human Interactions

Please bring me my
coffee mug from the
kitchen

8
Human Interactions

Please bring me my
coffee mug from the
kitchen

9
Human Interactions

Please bring me my
coffee mug from the
kitchen

What color is
your coffee mug?

10
Teach Machines to Ask Clarification Questions

11
Teach Machines to Ask Clarification Questions

Context-aware questions about missing information

12
Teach Machines to Ask Clarification Questions

Context-aware questions about missing information

How long does it take to get a PhD ?

In which field?

13
Teach Machines to Ask Clarification Questions

Context-aware questions about missing information

How long does it take to get a PhD ? Give me a recipe


for lasagna
In which field?
Any dietary
restrictions?

14
Teach Machines to Ask Clarification Questions

Context-aware questions about missing information

How long does it take to get a PhD ? Give me a recipe


for lasagna
In which field?
Any dietary
restrictions?

Please bring me my
coffee mug from the
kitchen What color is your
coffee mug?

15
PRIOR WORK

16
Reading Comprehension Question Generation

My class is going to the movies on a field trip next week.


We have to get permission slips signed before we go.
We are going to see a movie that tells the story from a book we read.

Q: What do the students need to do before going to the movies?

o  Rus, et al. INLG 2010


o  Heilman. PhD thesis 2011
o  Olney, Graesser, and Person. Dialogue & Discourse 2012
o  Richardson, et al. EMNLP 2013
o  Chali and Hasan. ACL 2015
o  Serban, et al. ACL 2016
o  Du, Shao & Cardie ACL 2017
o  Tang et al. NAACL 2018
o  Sachan and Xing. NAACL 2018

17
Question Generation for Slot Filling
SLOTS: <origin city>  <departure city>  <origin time>  <departure time>  <airline>

USER: I want to go to Melbourne on July 14
SYSTEM: What time do you want to leave?
USER: I must be in Melbourne by 11 am
SYSTEM: Would you like a Delta flight that arrives at 10.15 am?
USER: Sure
SYSTEM: In what name should I make the reservation?

o  Goddeau, et al. 1996


o  Bobrow, et al. Artificial Intelligence 1977
o  Lemon, et al. EACL 2006
o  Williams, et al. SIGDIAL 2013
o  Young, et al. IEEE 2013
o  Dhingra, et al. ACL 2017
o  Bordes, et al. ICLR 2017

18
Visual Question Generation Task

Q: Was anyone injured in the crash?

Q: Is the motorcyclist alive?

Q: What caused the accident?

Mostafazadeh et al. "Generating natural questions about an image." ACL 2016

19
We consider two scenarios

20
We consider two scenarios -- First Scenario

StackExchange

How to set environment variables for installation?


I'm aiming to install ape, a simple code for pseudopotential generation.
I'm having this error message while running ./configure
<error message> Context
So I have the library but the program installation isn't finding it.
Any help? Thanks in advance!

21
We consider two scenarios -- First Scenario

StackExchange

How to set environment variables for installation?


I'm aiming to install ape, a simple code for pseudopotential generation.
I'm having this error message while running ./configure
<error message> Context
So I have the library but the program installation isn't finding it.
Any help? Thanks in advance!

Shortlist of useful questions:
What version of Ubuntu do you have?
How are you installing ape?
Do you have GSL installed?

22
We consider two scenarios -- Second Scenario

Amazon

23
We consider two scenarios -- Second Scenario

Amazon

Is this induction safe?

What is the warranty or guarantee on this?

What are the handles made of?

24
Our Contributions

1.  Question Ranking Model:


ü  Good question is one whose answer is useful

25
Our Contributions

1.  Question Ranking Model:


ü  Good question is one whose answer is useful

2.  Question Generation Model:


ü  Generate question from scratch
ü  Sequence-to-sequence trained using adversarial networks

26
Talk Outline

o  How do we build the clarification questions dataset?

o  How do we rank clarification questions from an existing set?

o  How do we generate clarification questions from scratch?

o  How do we control the specificity of the generated clarification questions?

o  Future Directions

27
Talk Outline

o  How do we build the clarification questions dataset?

o  How do we rank clarification questions from an existing set?

o  How do we generate clarification questions from scratch?

o  How do we control the specificity of the generated clarification questions?

o  Future Directions

28
Clarification Questions Dataset: StackExchange

29
Clarification Questions Dataset: StackExchange

How to configure path or set environment variables for installation?

I'm aiming to install ape, a simple code for pseudopotential generation.


I'm having this error message while running ./configure
<error message>
So I have the library but the program installation isn't finding it.
Any help? Thanks in advance!

30
Clarification Questions Dataset: StackExchange

How to configure path or set environment variables for installation?

Initial Post:
I'm aiming to install ape, a simple code for pseudopotential generation.
I'm having this error message while running ./configure
<error message>
So I have the library but the program installation isn't finding it.
Any help? Thanks in advance!

Finding: Questions go unanswered for a long time if they are not clear enough

Asaduzzaman, Muhammad, et al. "Answering questions about unanswered questions of Stack
Overflow." Working Conference on Mining Software Repositories. IEEE Press, 2013.

31
Clarification Questions Dataset: StackExchange

How to configure path or set environment variables for installation?

Initial Post:
I'm aiming to install ape, a simple code for pseudopotential generation.
I'm having this error message while running ./configure
<error message>
So I have the library but the program installation isn't finding it.
Any help? Thanks in advance!

Question comment:
What version of ubuntu do you have?

32
Clarification Questions Dataset: StackExchange

How to configure path or set environment variables for installation?

Initial Post:
I'm aiming to install ape, a simple code for pseudopotential generation.
I'm having this error message while running ./configure
<error message>
So I have the library but the program installation isn't finding it.
Any help? Thanks in advance!

Question comment:
What version of ubuntu do you have?

Updated Post:
I'm aiming to install ape in Ubuntu 14.04 LTS, a simple code for pseudopotential generation.
I'm having this error message while running ./configure
<error message>
So I have the library but the program installation isn't finding it.
Any help? Thanks in advance!

33
Clarification Questions Dataset: StackExchange

How to configure path or set environment variables for installation?

Initial Post:
I'm aiming to install ape, a simple code for pseudopotential generation.
I'm having this error message while running ./configure
<error message>
So I have the library but the program installation isn't finding it.
Any help? Thanks in advance!

Question comment:
What version of ubuntu do you have?

Updated Post (the edit serves as an answer to the question):
I'm aiming to install ape in Ubuntu 14.04 LTS, a simple code for pseudopotential generation.
I'm having this error message while running ./configure
<error message>
So I have the library but the program installation isn't finding it.
Any help? Thanks in advance!

34
Clarification Questions Dataset: StackExchange

How to configure path or set environment variables for installation?

Initial Post:
I'm aiming to install ape, a simple code for pseudopotential generation.
I'm having this error message while running ./configure
<error message>
So I have the library but the program installation isn't finding it.
Any help? Thanks in advance!

Question comment:
What version of ubuntu do you have?

Updated Post (the edit serves as an answer to the question):
I'm aiming to install ape in Ubuntu 14.04 LTS, a simple code for pseudopotential generation.
I'm having this error message while running ./configure
<error message>
So I have the library but the program installation isn't finding it.
Any help? Thanks in advance!

35
Clarification Questions Dataset: StackExchange

Dataset Creation

( context, question , answer ) triples

context Original post

question Clarification question posted in comments

answer Edit made to the post in response to the question


OR author’s reply to the question comment

Dataset Size: ~77 K triples


Domains: AskUbuntu, Unix, Superuser

36
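A rough Python sketch of how such triples could be assembled from a post dump; the field names used below (body, comments, edits, author_replies, timestamp) are illustrative assumptions, not the actual StackExchange schema or the thesis pipeline.

    def build_triples(posts):
        """Pair each post with its first question-comment and the author's response."""
        triples = []
        for post in posts:
            # Keep only comments that look like clarification questions.
            questions = [c for c in post['comments'] if c['text'].strip().endswith('?')]
            if not questions:
                continue
            q = questions[0]
            # Prefer an edit made after the question as the "answer";
            # otherwise fall back to the author's reply in the comments.
            later_edits = [e for e in post['edits'] if e['timestamp'] > q['timestamp']]
            if later_edits:
                answer = later_edits[0]['text']
            elif post['author_replies']:
                answer = post['author_replies'][0]
            else:
                continue
            triples.append((post['body'], q['text'], answer))
        return triples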
Clarification Questions Dataset: Amazon

37
Clarification Questions Dataset: Amazon

McAuley and Yang. Addressing complex and subjective product-related queries with customer reviews. WWW 2016

38
Clarification Questions Dataset: Amazon

context

question
answer

McAuley and Yang. Addressing complex and subjective product-related queries with customer reviews. WWW 2016

39
Clarification Questions Dataset: Amazon

context

question
answer

Dataset Size: ~24K (3-10 questions per context)


Domain: Home & Kitchen

McAuley and Yang. Addressing complex and subjective product-related queries with customer reviews. WWW 2016

40
Talk Outline

o  How do we build the clarification questions dataset?

§  Two datasets: StackExchange & Amazon

o  How do we rank clarification questions from an existing set?

o  How do we generate clarification questions from scratch?

o  How do we control the specificity of the generated clarification questions?

o  Future Directions

41
Talk Outline

o  How do we build the clarification questions dataset?

ü  Two datasets: StackExchange & Amazon

o  How do we rank clarification questions from an existing set?

o  How do we generate clarification questions from scratch?

o  How do we control the specificity of the generated clarification questions?

o  Future Directions

Sudha Rao, Hal Daumé III, "Learning to Ask Good Questions: Ranking Clarification Questions using Neural Expected
Value of Perfect Information ”, ACL 2018

42
Expected Value of Perfect Information (EVPI) inspired model

Mordecai et al. "The value of information and stochastic programming." Operations Research 18.5 (1970)

43
Expected Value of Perfect Information (EVPI) inspired model

o  Use EVPI to identify questions that add the most value to the given post

Mordecai et al. "The value of information and stochastic programming." Operations Research 18.5 (1970)

44
Expected Value of Perfect Information (EVPI) inspired model

o  Use EVPI to identify questions that add the most value to the given post

o  Definition: Value of Perfect Information VPI (x|c)


How much value does x add to a given information content c?

Mordecai et al. "The value of information and stochastic programming." Operations Research 18.5 (1970)

45
Expected Value of Perfect Information (EVPI) inspired model

o  Use EVPI to identify questions that add the most value to the given post

o  Definition: Value of Perfect Information VPI (x|c)


How much value does x add to a given information content c?

o  Since we have not acquired x, we define its value in expectation

Mordecai et al. "The value of information and stochastic programming." Operations Research 18.5 (1970)

46
Expected Value of Perfect Information (EVPI) inspired model

o  Use EVPI to identify questions that add the most value to the given post

o  Definition: Value of Perfect Information VPI (x|c)


How much value does x add to a given information content c?

o  Since we have not acquired x, we define its value in expectation

EVPI ( x | c ) = Σ_{x ∈ X}

Mordecai et al. "The value of information and stochastic programming." Operations Research 18.5 (1970)

47
Expected Value of Perfect Information (EVPI) inspired model

o  Use EVPI to identify questions that add the most value to the given post

o  Definition: Value of Perfect Information VPI (x|c)


How much value does x add to a given information content c?

o  Since we have not acquired x, we define its value in expectation

Likelihood of x given c

EVPI ( x | c ) = Σ_{x ∈ X} P( x | c )

Mordecai et al. "The value of information and stochastic programming." Operations Research 18.5 (1970)

48
Expected Value of Perfect Information (EVPI) inspired model

o  Use EVPI to identify questions that add the most value to the given post

o  Definition: Value of Perfect Information VPI (x|c)


How much value does x add to a given information content c?

o  Since we have not acquired x, we define its value in expectation

Likelihood of x given c

EVPI ( x | c ) = Σ_{x ∈ X} P( x | c ) · Utility( x , c )

Value of updating c with x

Mordecai et al. "The value of information and stochastic programming." Operations Research 18.5 (1970)

49
EVPI formulation for our problem

50
EVPI formulation for our problem

EVPI ( qi | c )=

c : given context

qi : question from set of question candidates Q

51
EVPI formulation for our problem

Likelihood of aj being the answer to qi on context c

EVPI ( qi | c )= P( aj | c , qi )

c : given context

qi : question from set of question candidates Q

52
EVPI formulation for our problem

Likelihood of aj being the answer to qi on context c

EVPI ( qi | c )= P( aj | c , qi ) U( c + aj )

Utility of updating the context c with answer aj

c : given context

qi : question from set of question candidates Q

53
EVPI formulation for our problem

Likelihood of aj being the answer to qi on context c

EVPI ( qi | c ) = Σ_{aj ∈ A} P( aj | c , qi ) U( c + aj )
Utility of updating the context c with answer aj

c : given context

qi : question from set of question candidates Q

aj : answer from set of answer candidates A

54
We rank questions by their EVPI value

EVPI ( qi | c ) = Σ_{aj ∈ A} P( aj | c , qi ) U( c + aj )

Question Candidates EVPI value

What is the make of your wifi card? 0.34

What version of Ubuntu do you have? 0.85

What OS are you using? 0.67

55
We rank questions by their EVPI value

EVPI ( qi | c ) = Σ_{aj ∈ A} P( aj | c , qi ) U( c + aj )

Question Candidates                    EVPI value     Ranked by EVPI value

What is the make of your wifi card?    0.34           1. What version of Ubuntu do you have?
What version of Ubuntu do you have?    0.85           2. What OS are you using?
What OS are you using?                 0.67           3. What is the make of your wifi card?

56
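A toy sketch of the ranking rule above: score each candidate question by its expected utility over candidate answers, then sort. The answer_prob and utility arguments stand in for the learned answer model and utility calculator described on the following slides.

    def evpi(question, context, answer_candidates, answer_prob, utility):
        # EVPI(q | c) = sum over candidate answers a of P(a | c, q) * U(c + a)
        return sum(answer_prob(a, context, question) * utility(context, question, a)
                   for a in answer_candidates)

    def rank_questions(context, question_candidates, answer_candidates, answer_prob, utility):
        scores = {q: evpi(q, context, answer_candidates, answer_prob, utility)
                  for q in question_candidates}
        return sorted(scores, key=scores.get, reverse=True)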
Three parts of our formulation:

EVPI ( qi | c ) = Σ_{aj ∈ A} P( aj | c , qi ) U( c + aj ),   qi ∈ Q

Question & Answer Answer Utility


Candidate Generator Modeling Calculator

1 2 3

57
Three parts of our formulation:

EVPI ( qi | c ) = Σ_{aj ∈ A} P( aj | c , qi ) U( c + aj ),   qi ∈ Q

Question & Answer


Candidate Generator

58
1. Question & Answer Generator

Dataset of
(post, question, answer)

Post as
Documents

Lucene
Post p as
Search
query
Engine

59
1. Question & Answer Generator

Dataset of Ten posts


(post, question, answer) similar to given
post p

p1
Post as
Documents
p2

Lucene
Post p as pj
Search
query
Engine

p10

60
1. Question & Answer Generator

Dataset of Ten posts Questions


(post, question, answer) similar to given paired with
post p those posts
p1 q1

Post as
Documents q2
p2

Lucene qj
Post p as pj
Search
query
Engine

p10 q10

61
1. Question & Answer Generator

Dataset of Ten posts Questions Answers paired


(post, question, answer) similar to given paired with with those posts
post p those posts
p1 q1 a1
Post as
Documents q2
p2 a2

Lucene qj
Post p as pj aj
Search
query
Engine

p10 q10 a10

62
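The thesis uses Lucene for this retrieval step; the sketch below substitutes a TF-IDF retriever from scikit-learn just to make the idea concrete, and assumes the dataset is a list of (post, question, answer) triples.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def candidate_questions_and_answers(post, dataset, k=10):
        """Return the questions/answers attached to the k posts most similar to `post`."""
        corpus = [p for p, _, _ in dataset]
        vectorizer = TfidfVectorizer().fit(corpus + [post])
        sims = cosine_similarity(vectorizer.transform([post]),
                                 vectorizer.transform(corpus))[0]
        top = sims.argsort()[::-1][:k]
        questions = [dataset[i][1] for i in top]
        answers = [dataset[i][2] for i in top]
        return questions, answers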
Three parts of our formulation:

EVPI ( qi | c ) = Σ_{aj ∈ A} P( aj | c , qi ) U( c + aj ),   qi ∈ Q

Answer
Modeling

63
2. Answer Modeling

P( aj | c , qi )≈ cosine_sim ( Embans( c , qi ), aj )

64
2. Answer Modeling

P( aj | c , qi )≈ cosine_sim ( Embans( c , qi ), aj )

Neural
Embedding
Network

c qi aj

65
2. Answer Modeling

P( aj | c , qi )≈ cosine_sim ( Embans( c , qi ), aj )

Training objective Neural


Embedding
Network
close
( c , qi ) a0 Correct
answer
c qi aj

a1
Other
answers

a10

66
2. Answer Modeling

P( aj | c , qi )≈ cosine_sim ( Embans( c , qi ), aj )

Feedforward Average

Neural Network c qi

Context Question
LSTM LSTM

Word embedding module

c qi aj

67
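A PyTorch sketch of the answer model pictured above; the layer sizes are illustrative, and the candidate answer is assumed to arrive as a precomputed vector (e.g. its averaged word embeddings), which is a simplification of the thesis setup.

    import torch
    import torch.nn as nn

    class AnswerModel(nn.Module):
        """Embeds (context, question) and scores candidate answers by cosine similarity."""

        def __init__(self, vocab_size, emb_dim=200, hidden_dim=200):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.context_lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
            self.question_lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
            self.ff = nn.Sequential(nn.Linear(2 * hidden_dim, hidden_dim), nn.Tanh())

        def encode(self, context_ids, question_ids):
            # Average the LSTM states over time for the context and the question,
            # then combine them with a feedforward layer: Emb_ans(c, q).
            c_out, _ = self.context_lstm(self.embed(context_ids))
            q_out, _ = self.question_lstm(self.embed(question_ids))
            return self.ff(torch.cat([c_out.mean(dim=1), q_out.mean(dim=1)], dim=-1))

        def answer_likelihood(self, context_ids, question_ids, answer_vec):
            # P(a | c, q) is approximated by cosine similarity to the answer vector.
            return torch.cosine_similarity(self.encode(context_ids, question_ids), answer_vec)

The training objective from the previous slide can then be expressed as pulling the embedding of (c, qi) toward the correct answer and away from the other candidate answers.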
Three parts of our formulation:

EVPI ( qi | c ) = Σ_{aj ∈ A} P( aj | c , qi ) U( c + aj ),   qi ∈ Q

Utility
Calculator

68
3. Utility Calculator

U( c + aj ) Value between
0 and 1

Neural Network

c qi aj

69
3. Utility Calculator

U( c + aj ) Value between
0 and 1

Training objective
Neural Network
Label
Original
( c , q0 , a0 ) (ques, ans) y=1

c qi aj

( c , q1 , a1 ) y=0
Other
(ques, ans)

( c , q10 , a10 ) y=0

70
3. Utility Calculator

U( c + aj ) Value between
0 and 1

Feedforward

Neural
Network c qi aj

Context Question Answer


LSTM LSTM LSTM

Word embedding module

c qi aj

71
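A minimal PyTorch sketch of the utility calculator; c_vec, q_vec and a_vec are assumed to be the LSTM encodings from the diagram, and the sizes are illustrative.

    import torch
    import torch.nn as nn

    class UtilityCalculator(nn.Module):
        """Scores how useful answer a is when added to context c, given question q."""

        def __init__(self, enc_dim=200):
            super().__init__()
            self.ff = nn.Sequential(
                nn.Linear(3 * enc_dim, enc_dim), nn.ReLU(),
                nn.Linear(enc_dim, 1), nn.Sigmoid())   # value between 0 and 1

        def forward(self, c_vec, q_vec, a_vec):
            return self.ff(torch.cat([c_vec, q_vec, a_vec], dim=-1)).squeeze(-1)

    # Training sketch: binary cross-entropy with y=1 for the original (c, q0, a0)
    # and y=0 for the mismatched (c, qi, ai) pairs, as in the table above:
    # loss = nn.BCELoss()(utility(c_vec, q_vec, a_vec), labels.float())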
Our EVPI inspired question ranking model (in summary)

EVPI ( qi | c ) = Σ_{aj ∈ A} P( aj | c , qi ) U( c + aj ),   qi ∈ Q

Question & Answer Answer Utility


Candidate Generator Modeling Calculator

72
Human-based Evaluation Design

73
Human-based Evaluation Design

TALK: Teaching Machines to Ask


Clarification Questions

74
Human-based Evaluation Design

TALK: Teaching Machines to Ask


Clarification Questions

What is going on?

What is EVPI?

How many candidates do you consider?

How is answer used in selecting useful questions?

When is lunch?

75
Human-based Evaluation Design

TALK: Teaching Machines to Ask Annotator 1


Clarification Questions
Best Valid
What is going on?

What is EVPI?

How many candidates do you consider?

How is answer used in selecting useful questions?

When is lunch?

Note: We use UpWork to find expert annotators 76


Human-based Evaluation Design

TALK: Teaching Machines to Ask Annotator 1 Annotator 2


Clarification Questions
Best Valid Best Valid
What is going on?

What is EVPI?

How many candidates do you consider?

How is answer used in selecting useful questions?

When is lunch?

Note: We use UpWork to find expert annotators 77


Human-based Evaluation Design (Union of “best”)

TALK: Teaching Machines to Ask Annotator 1 Annotator 2


Clarification Questions
Best Valid Best Valid
What is going on?

What is EVPI?

How many candidates do you consider?

How is answer used in selecting useful questions?

When is lunch?

Note: We use UpWork to find expert annotators 78


Human-based Evaluation Design (Intersection of “valid”)

TALK: Teaching Machines to Ask Annotator 1 Annotator 2


Clarification Questions
Best Valid Best Valid
What is going on?

What is EVPI?

How many candidates do you consider?

How is answer used in selecting useful questions?

When is lunch?

Note: We use UpWork to find expert annotators 79
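One way to read the "Union of Best" precision@1 numbers reported on the later slides, sketched under the assumption that each post comes with a model-ranked candidate list and the union of the questions either annotator marked best.

    def precision_at_1(ranked_lists, best_sets):
        # Fraction of posts whose top-ranked question was marked "best" by some annotator.
        hits = sum(1 for ranked, best in zip(ranked_lists, best_sets) if ranked[0] in best)
        return 100.0 * hits / len(ranked_lists)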


Research Questions for Experimentation

80
Research Questions for Experimentation
1.  Does a neural network architecture improve upon non-neural baselines?

81
Research Questions for Experimentation
1.  Does a neural network architecture improve upon non-neural baselines?

2.  Are answers useful in identifying good questions?

82
Research Questions for Experimentation
1.  Does a neural network architecture improve upon non-neural baselines?

2.  Are answers useful in identifying good questions?

3.  Does EVPI formalism improve over a traditionally trained neural network?

83
Neural Baseline Model

o  Neural (c, q, a)

Value between 0 and 1

Feedforward Neural Network
Context LSTM    Ques LSTM    Ans LSTM
Word embedding module
ci qi ai

Note: Neural (c, q, a) and EVPI (q|c, a) have a similar number of parameters

84
Human based evaluation results on StackExchange

Union of Best

Random 17.5

Precision @1

85
Human based evaluation results on StackExchange

Union of Best

Bag-of-ngrams (c, q, a) 19.4

Random 17.5

Precision @1

86
Human based evaluation results on StackExchange

Union of Best

Features (c, q) 23.1

Bag-of-ngrams (c, q, a) 19.4

Random 17.5

Precision @1
Nandi, Titas, et al. IIT-UHH at SemEval-2017 Task 3: Exploring multiple features for community question
answering and implicit dialogue identification. Workshop on Semantic Evaluation (SemEval-2017), 2017.

87
Human based evaluation results on StackExchange

Union of Best

Neural (c, q, a) 25.2

Non-linear vs linear
Features (c, q) 23.1

Bag-of-ngrams (c, q, a) 19.4

Random 17.5

Precision @1

88
Human based evaluation results on StackExchange

Union of Best

Neural (c, q, a) 25.2


Explicitly modeling
“answer” is useful
Neural (c, q) 21.9

Features (c, q) 23.1

Bag-of-ngrams (c, q, a) 19.4

Random 17.5

Precision @1

89
Human based evaluation results on StackExchange

Union of Best

EVPI (q|c, a) 27.7

(EVPI and Neural (c, q, a) mainly differ in their loss function)
Neural (c, q, a) 25.2

Neural (c, q) 21.9

Features (c, q) 23.1

Bag-of-ngrams (c, q, a) 19.4

Random 17.5
Train: 61,678
Tune: 7,710
Test: 500

Precision @1
Note: Difference between EVPI and all baselines is statistically significant with p < 0.05

90
Talk Outline

o  How do we build the clarification questions dataset?

ü  Two datasets: StackExchange & Amazon

o  How do we rank clarification questions from an existing set?

ü  Answers are helpful in identifying useful questions
ü  EVPI formalism outperforms traditional neural network

o  How do we generate clarification questions from scratch?

o  How do we control the specificity of the generated clarification questions?

o  Future Directions

91
Talk Outline

o  How do we build the clarification questions dataset?

ü  Two datasets: StackExchange & Amazon

o  How do we rank clarification questions from an existing set?

ü  Answers are helpful in identifying useful questions
ü  EVPI formalism outperforms traditional neural network

o  How do we generate clarification questions from scratch?

o  Future Directions

o  Conclusion

Sudha Rao, Hal Daumé III, "Answer-based Adversarial Training for Generating Clarification
Questions", In Submission

92
Issue with the ranking approach

o  It only regurgitates previously seen questions

Existing contexts New unseen contexts

Contexts with Ubuntu OS Contexts with Windows OS

What version of Ubuntu do you have? What version of Windows do you have?

93
Issue with the ranking approach

o  It only regurgitates previously seen questions

o  It relies on Lucene to get the initial set of candidate questions

Existing contexts New unseen contexts

Contexts with Ubuntu OS Contexts with Windows OS

What version of Ubuntu do you have? What version of Windows do you have?

94
Sequence-to-sequence neural network model

Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. NIPS 2014

95
Sequence-to-sequence neural network model

o  Given an input sequence, generate output sequence one word at a time

Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. NIPS 2014

96
Sequence-to-sequence neural network model

o  Given an input sequence, generate output sequence one word at a time

A B C <EOS>

Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. NIPS 2014

97
Sequence-to-sequence neural network model

o  Given an input sequence, generate output sequence one word at a time

A B C <EOS>

Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. NIPS 2014

98
Sequence-to-sequence neural network model

o  Given an input sequence, generate output sequence one word at a time

W X

A B C <EOS> W

Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. NIPS 2014

99
Sequence-to-sequence neural network model

o  Given an input sequence, generate output sequence one word at a time

W X Y

A B C <EOS> W X

Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. NIPS 2014

100
Sequence-to-sequence neural network model

o  Given an input sequence, generate output sequence one word at a time

W X Y Z

A B C <EOS> W X Y

Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. NIPS 2014

101
Sequence-to-sequence neural network model

o  Given an input sequence, generate output sequence one word at a time

W X Y Z <EOS>

A B C <EOS> W X Y Z

Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. NIPS 2014

102
Sequence-to-sequence neural network model

o  Given an input sequence, generate output sequence one word at a time

o  Trained to maximize the likelihood of input-output pairs in data

W X Y Z <EOS>

A B C <EOS> W X Y Z

Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. NIPS 2014

103
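A sketch of the word-at-a-time decoding loop illustrated above; encoder and decoder are assumed to be callables with the signatures used here (the encoder returns its final state, the decoder maps a token and state to next-word scores and a new state), not any particular library API.

    import torch

    def greedy_decode(encoder, decoder, input_ids, bos_id, eos_id, max_len=30):
        """Generate the output one word at a time, feeding each prediction back in,
        until <EOS> is produced."""
        with torch.no_grad():
            state = encoder(input_ids)                 # encode "A B C <EOS>"
            token = torch.tensor([bos_id])
            output = []
            for _ in range(max_len):
                logits, state = decoder(token, state)  # one decoding step
                token = logits.argmax(dim=-1)          # most likely next word
                if token.item() == eos_id:
                    break
                output.append(token.item())
        return output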
Max-likelihood clarification question generation model

Context → Question Generator (Seq2seq) → Question

Loss function:  Loss = - log Pr(q|c)

104
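The per-token form of this loss, as a short sketch (the tensor shapes are assumptions for illustration):

    import torch.nn.functional as F

    def mle_loss(logits, target_ids, pad_id=0):
        """Loss = - log Pr(q | c), accumulated over the time steps of the reference question.
        logits:     (batch, time, vocab) decoder scores
        target_ids: (batch, time) reference question tokens"""
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               target_ids.reshape(-1),
                               ignore_index=pad_id)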
Max-likelihood clarification question generation model

Context → Question Generator (Seq2seq) → Question

Loss function:  Loss = - log Pr(q|c)

Issues
o  Maximum-likelihood (MLE) training generates generic questions
       What are the dimensions?
       Is this made in China?

o  MLE relies heavily on the original question.
   Contexts can have multiple good questions

Li et al. A diversity-promoting objective function for neural conversation models. In NAACL, 2016.

105
Max-utility based clarification question generation model

Context

Question Answer
Generator Generator
(Seq2seq) (Seq2seq)

Question Answer

106
Max-utility based clarification question generation model

Context

Question Answer
Utility
Generator Generator Reward
Calculator
(Seq2seq) (Seq2seq)

Question Answer

107
Max-utility based clarification question generation model

Context

Question Answer
Utility
Generator Generator Reward
Calculator
(Seq2seq) (Seq2seq)

Question Answer

Train Question Generator to Maximize this Reward

108
Max-utility based clarification question generation model

Context
Reward
Calculator

Question Answer
Utility
Generator Generator Reward
Calculator
(Seq2seq) (Seq2seq)

Question Answer

Train Question Generator to Maximize this Reward

109
Max-likelihood vs Max-utility

Context → Question Generator (Seq2seq) → Question → Reward Calculator → Reward

                    Max-likelihood                     Max-utility
Objective:          Maximize likelihood of             Maximize reward
                    (context, question) pairs
Loss Function:      Loss = - log Pr(q|c)               Loss = - reward(q|c)

110
Max-likelihood vs Max-utility

Context → Question Generator (Seq2seq) → Question → Reward Calculator → Reward

                    Max-likelihood                     Max-utility
Objective:          Maximize likelihood of             Maximize reward
                    (context, question) pairs
Loss Function:      Loss = - log Pr(q|c)               Loss = - reward(q|c)
                    Differentiable                     Non-differentiable
                                                       (similar to discrete metrics
                                                        like BLEU & ROUGE)

Ranzato, Marc'Aurelio, et al. "Sequence level training with recurrent neural networks." ICLR 2016

111
Max-likelihood vs Max-utility

Context → Question Generator (Seq2seq) → Question → Reward Calculator → Reward

                    Max-likelihood                     Max-utility
Objective:          Maximize likelihood of             Maximize reward
                    (context, question) pairs
Loss Function:      Loss = - log Pr(q|c)               Loss = - reward(q|c)
                    Differentiable                     Non-differentiable

Therefore, we use Reinforcement Learning

Ranzato, Marc'Aurelio, et al. "Sequence level training with recurrent neural networks." ICLR 2016

112
Reinforcement Learning for Clarification Question Generation

Context → Question Generator (Seq2seq) → Question → Reward Calculator → Reward

Key Idea:

ü  Estimate the loss by drawing samples ("questions") from the model

Loss = - reward( q^s | c )

113
Reinforcement Learning for Clarification Question Generation

Context → Question Generator (Seq2seq) → Question → Reward Calculator → Reward

Key Idea:

ü  Estimate the loss by drawing samples ("questions") from the model
ü  Differentiate the loss, as in REINFORCE

L(θ) = - E_{q^s ~ p_θ} [ r(q^s) ]          ∇_θ L(θ) = - E_{q^s ~ p_θ} [ r(q^s) ∇_θ log Pr( q^s | c ) ]

REINFORCE: Ronald J Williams. Simple statistical gradient-following algorithms for connectionist
reinforcement learning. Machine Learning, 8(3-4):229–256, 1992.

114
Reinforcement Learning for Clarification Question Generation

Context → Question Generator (Seq2seq) → Question → Reward Calculator → Reward

Key Idea:

ü  Estimate the loss by drawing samples ("questions") from the model
ü  Differentiate the loss, as in REINFORCE
ü  Mixed Incremental Cross-Entropy Reinforce (MIXER): subtract a baseline reward r(q^b)
    to reduce the high variance otherwise observed with REINFORCE

Loss = - ( r(q^s) - r(q^b) ) log Pr( q^s | c )

We use a self-critical baseline (Rennie et al. 2017): r(q^b) is the reward obtained by the
current model under greedy decoding.

Ranzato et al. Sequence level training with recurrent neural networks. ICLR 2016

115
Max-utility based clarification question generation model

Context

Question
Reward
Generator Reward
Calculator
(Seq2seq)

Question

Train Question Generator to Maximize this Reward


using Reinforcement Learning

116
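A sketch of the policy-gradient loss used in place of cross-entropy here, with the self-critical baseline of Rennie et al. (2017); the tensor shapes are assumptions for illustration.

    import torch

    def max_utility_loss(log_probs, sampled_reward, baseline_reward):
        """log_probs:       (batch, time) log p(q_t | q_<t, c) of the *sampled* question
        sampled_reward:  (batch,) UTILITY reward of the sampled question
        baseline_reward: (batch,) e.g. reward of the greedily decoded question
                         (the self-critical baseline)."""
        advantage = sampled_reward - baseline_reward
        return -(advantage.unsqueeze(1) * log_probs).sum(dim=1).mean()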
Max-utility based clarification question generation model

Context
Trained Offline

Question
Reward
Generator Reward
Calculator
(Seq2seq)

Question

Train Question Generator to Maximize this Reward


using Reinforcement Learning

117
Max-utility based clarification question generation model

Context Train it along with


Question Generator

Question
Reward
Generator Reward
Calculator
(Seq2seq)

Question

Train Question Generator to Maximize this Reward


using Reinforcement Learning

118
Generative Adversarial Networks (GAN) based training

Context

Generator

Question
Generator
(Seq2seq)

Question

Model Data

119
Generative Adversarial Networks (GAN) based training

Context

Generator Discriminator

Question
Reward
Generator
Calculator
(Seq2seq)

Question

Model Data

120
Generative Adversarial Networks (GAN) based training
Real Data
Context
(context,
question,
Generator Discriminator
answer)

Question
Reward
Generator Reward
Calculator
(Seq2seq)

Question
ü  Discriminator tries to distinguish between
Model Data real and model data
ü  Generator tries to fool the discriminator by
generating real looking data

121
GAN-Utility based Clarification Question Generation Model
Real Data
Context
(context,
question,
Generator Discriminator
answer)

Question
Reward
Generator Reward
Calculator
(Seq2seq)

Question

Train Question Generator to Maximize this Reward


using Reinforcement Learning

122
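A sketch of one alternating GAN-Utility update. The generator.sample / answer_generator.sample interfaces (returning sampled sequences plus their log-probabilities) and the discriminator signature are assumptions made to keep the sketch short; the actual models are the seq2seq networks and utility calculator described earlier.

    import torch

    def gan_utility_step(generator, answer_generator, discriminator,
                         gen_opt, disc_opt, context, real_triple):
        bce = torch.nn.BCELoss()

        # Discriminator step: push real (c, q, a) triples toward 1, generated ones toward 0.
        with torch.no_grad():
            fake_q, _ = generator.sample(context)
            fake_a, _ = answer_generator.sample(context, fake_q)
        real_scores = discriminator(*real_triple)
        fake_scores = discriminator(context, fake_q, fake_a)
        disc_loss = (bce(real_scores, torch.ones_like(real_scores)) +
                     bce(fake_scores, torch.zeros_like(fake_scores)))
        disc_opt.zero_grad(); disc_loss.backward(); disc_opt.step()

        # Generator step: the discriminator's score on the generated (c, q, a)
        # is the reward, followed with a REINFORCE-style policy gradient.
        sample_q, log_probs = generator.sample(context)
        sample_a, _ = answer_generator.sample(context, sample_q)
        reward = discriminator(context, sample_q, sample_a).detach()
        gen_loss = -(reward * log_probs.sum(dim=1)).mean()
        gen_opt.zero_grad(); gen_loss.backward(); gen_opt.step()
        return disc_loss.item(), gen_loss.item()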
Our clarification question generation model (in summary)

123
Our clarification question generation model (in summary)
Sequence-to-sequence model trained using MLE
Context

Question
Generator
(Seq2seq)

Question

124
Our clarification question generation model (in summary)
Sequence-to-sequence model trained using RL
Context

Question Answer
Utility
Generator Generator Reward
Calculator
(Seq2seq) (Seq2seq)

Question Answer

Train Question Generator to Maximize this Reward

125
Our clarification question generation model (in summary)
Sequence-to-sequence model trained using GAN
Context

Generator Discriminator

Question Answer
Utility
Generator Generator Reward
Calculator
(Seq2seq) (Seq2seq)

Question Answer

Train Question Generator to Maximize this Reward

126
Example outputs

Original: are these pillows firm and do they keep their shape

Max-Likelihood: what is the size of the pillow ?

GAN-Utility: does this pillow come with a cover or does it have a zipper ?

127
Example outputs

Original: are these pillows firm and do they keep their shape

Max-Likelihood: what is the size of the pillow ?

GAN-Utility: does this pillow come with a cover or does it have a zipper ?

Original: does it come with a shower hook or ring ?

Max-Likelihood: is it waterproof ?

GAN-Utility: is this shower curtain mildew resistant ?

128
Error Analysis of GAN-Utility model

Incompleteness

what is the size of the towel ? i 'm looking for something to be able to use it for

Word repetition

what is the difference between this and the picture of the cuisinart deluxe
deluxe deluxe deluxe deluxe deluxe deluxe

129
Research Questions for Experimentation

1.  Do generation models outperform simpler retrieval baselines?

130
Research Questions for Experimentation

1.  Do generation models outperform simpler retrieval baselines?

2.  Does maximizing reward improve over max-likelihood training?

131
Research Questions for Experimentation

1.  Do generation models outperform simpler retrieval baselines?

2.  Does maximizing reward improve over max-likelihood training?

3.  Does adversarial training improve over pretrained reward calculator?

132
Research Questions for Experimentation

1.  Do generation models outperform simpler retrieval baselines?

2.  Does maximizing reward improve over max-likelihood training?

3.  Does adversarial training improve over pretrained reward calculator?

4.  How do models perform when evaluated for specificity and usefulness?

133
Human-based Evaluation Design

Context
Evaluation set size: 500

Generated Question

•  How relevant is the question?

•  How grammatical is the question?

•  How specific is it to the product?

•  Does this question ask for new information?

•  How useful is this question to a potential buyer?

Note: We use a crowdsourcing platform called Figure-Eight 134


Human-based Evaluation Design

Context
Evaluation set size: 500

Generated Question

•  How relevant is the question?


All models equal and
close to reference
•  How grammatical is the question?

•  How specific is it to the product?

•  Does this question ask for new information?

•  How useful is this question to a potential buyer?

Note: We use a crowdsourcing platform called Figure-Eight 135


Human-based Evaluation Results on Amazon Dataset

How specific is the question to the given context?

136
Human-based Evaluation Results on Amazon Dataset

How specific is the question to the given context?

Original 3.07


Specificity score

137
Human-based Evaluation Results on Amazon Dataset

How specific is the question to the given context?

Lucene 2.8   (Information Retrieval)

Original 3.07


Specificity score

138
Human-based Evaluation Results on Amazon Dataset

How specific is the question to the given context?

Max-Likelihood 2.84
Learning vs
Non-learning
Lucene 2.8

Original 3.07


Specificity score

139
Human-based Evaluation Results on Amazon Dataset

How specific is the question to the given context?

Max-Utility 2.88
Reinforcement
Learning
Max-Likelihood 2.84

Lucene 2.8

Original 3.07


Specificity score

140
Human-based Evaluation Results on Amazon Dataset

How specific is the question to the given context?

Gan-Utility 2.99
Adversarial
Training
Max-Utility 2.88

Max-Likelihood 2.84

Lucene 2.8

Original 3.07



Specificity score
Note: Difference between GAN-Utility and all others is statistically significant with p < 0.001

141
Human-based Evaluation Results on Amazon Dataset

Does the question ask for new information?

Gan-Utility 2.51
Max-Utility 2.47
(Difference between models statistically insignificant)

Max-Likelihood 2.48

Lucene 2.56

Original 2.68


New information score

142
Human-based Evaluation Results on Amazon Dataset

How useful is this question to a potential buyer?

Gan-Utility 0.94
Max-Utility 0.9
(Difference between models statistically insignificant)

Max-Likelihood 0.93

Lucene 0.77

Original 0.79


Usefulness score

143
Talk Outline

o  How do we build the clarification questions dataset?

ü  Two datasets: StackExchange & Amazon

o  How do we rank clarification questions from an existing set?

ü  Answers are helpful in identifying useful questions
ü  EVPI formalism outperforms traditional neural network

o  How do we generate clarification questions from scratch?

ü  Sequence-to-sequence model generates relevant & useful questions
ü  Adversarial training generates questions more specific to the context

o  How do we control the specificity of the generated clarification questions?

o  Future Directions

144
Talk Outline

o  How we build the clarification questions dataset?


ü  Two datasets: StackExchange & Amazon

o  How we rank clarification questions from an existing set?


ü  Answers are useful in identifying useful questions
ü  EVPI formalism outperforms traditional neural network

o  How we generate clarification questions from scratch?


ü  Sequence-to-sequence model generates relevant & useful questions
ü  Adversarial training generates questions more specific to context

o  How we control specificity of the generated clarification questions?

o  Future Directions

145
Generic versus specific questions

Amazon

Generic questions Specific questions

Where was this manufactured? Is this induction safe?

What is the warranty? Is ladle included in the set?

146
Sequence-to-sequence model for question generation

Input Output

Context
Context Question

Training data
Question
Generator Context Question
(Seq2seq)

Context Question

Question

147
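To make the diagram concrete, below is a minimal sketch (in PyTorch) of the kind of encoder-decoder generator trained with maximum likelihood on (context, question) pairs. The layer sizes, the absence of attention, and the toy batch are illustrative assumptions, not the thesis implementation.

# Minimal encoder-decoder sketch for context -> question generation.
# Illustrative only: layer sizes and the lack of attention are simplifying assumptions.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, context_ids, question_ids):
        # Encode the product description / post.
        _, h = self.encoder(self.embed(context_ids))
        # Teacher forcing: feed the gold question shifted right.
        dec_out, _ = self.decoder(self.embed(question_ids[:, :-1]), h)
        return self.out(dec_out)  # logits predicting question_ids[:, 1:]

# One max-likelihood training step on a toy batch of token-id tensors.
model = Seq2Seq(vocab_size=1000)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss(ignore_index=0)

context = torch.randint(1, 1000, (4, 20))   # 4 contexts, 20 tokens each
question = torch.randint(1, 1000, (4, 12))  # 4 gold questions

opt.zero_grad()
logits = model(context, question)
loss = loss_fn(logits.reshape(-1, logits.size(-1)), question[:, 1:].reshape(-1))
loss.backward()
opt.step()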
Sequence-to-sequence model for controlling specificity

Input Output

Context Specific
< specific > Question
Training data
Question Context Generic
Generator < generic > Question
(Seq2seq)
Context Generic
< generic > Question

Sennrich et al. Controlling politeness in neural machine translation via side constraints. NAACL 2016

148
Sequence-to-sequence model for controlling specificity

Input Output
Context
< specific > Context Specific
< specific > Question
Training data
Question Context Generic
Generator < generic > Question
(Seq2seq)
Context Generic
< generic > Question
Specific
Question

Sennrich et al. Controlling politeness in neural machine translation via side constraints. NAACL 2016

149
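A sketch of the side-constraint trick the slide borrows from Sennrich et al.: the desired specificity token is simply prepended to the context before it is fed to the same seq2seq generator. The helper function and the example context are hypothetical; the token strings follow the slide.

# Side-constraint trick (Sennrich et al. 2016) applied to specificity:
# prepend the desired specificity token to the context so one seq2seq model
# learns to condition on it. The tokenizer is assumed to treat <specific> and
# <generic> as single vocabulary items.

def add_specificity_constraint(context_tokens, level):
    """level is 'specific' or 'generic' (from an annotator or the classifier)."""
    assert level in ("specific", "generic")
    return ["<%s>" % level] + context_tokens

# Training pair as seen by the generator:
src = add_specificity_constraint("8 qt stainless steel stock pot".split(), "specific")
# ['<specific>', '8', 'qt', 'stainless', 'steel', 'stock', 'pot']

# At test time the same context can be paired with either token, asking the
# model for a generic or a specific clarification question.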
Annotating questions with level of specificity

Input Output
o  We need annotations on training data
Context Specific
o  Manually annotating is expensive
< specific > Question

Context Generic
< generic > Question

Context Generic
< generic > Question

150
Annotating questions with level of specificity

Input Output
o  We need annotations on training data
Context Specific
o  Manually annotating is expensive
< specific > Question
o  Hence
Context Generic
Ø  Ask humans1 to annotate a set of
< generic > Question
3000 questions
Ø  Train a machine learning model to Context Generic
< generic > Question
automatically annotate the rest

1 We use a crowdsourcing platform called Figure-Eight

151
Specificity classifier

Context Question specific


Training
Specificity data
Context Question generic
Classifier

Context Question specific

Input Output

Louis & Nenkova. "Automatic identification of general and specific sentences by leveraging discourse annotations.” IJCNLP 2011

152
Specificity classifier

Context Question specific


Training
Test Input Specificity data
Context Question generic
Question Classifier

Context Question specific


Output
Input Output
specific OR generic

Louis & Nenkova. "Automatic identification of general and specific sentences by leveraging discourse annotations.” IJCNLP 2011

153
Specificity classifier

Context Question specific


Training
Test Input Specificity data
Context Question generic
Question Classifier

Context Question specific


Output
Input Output
specific OR generic

Features for training logistic regression model


ü  Question Length
ü  Path of question word in WordNet
ü  Syntax
ü  Polarity
ü  Question bag-of-words
ü  Average word embeddings

Louis & Nenkova. "Automatic identification of general and specific sentences by leveraging discourse annotations.” IJCNLP 2011

154
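Below is a minimal sketch of such a specificity classifier using only two of the listed features (question bag-of-words and question length) with scikit-learn's logistic regression; the remaining features (path in WordNet, syntax, polarity, averaged word embeddings) would be concatenated the same way. The toy questions and labels are illustrative, not the annotated data.

# Minimal specificity classifier sketch: logistic regression over two of the
# features listed on the slide (bag-of-words + question length).
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

questions = [
    "what is the warranty ?",
    "where was this manufactured ?",
    "is the ladle included in the set ?",
    "is this induction safe ?",
]
labels = ["generic", "generic", "specific", "specific"]

vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(questions)                   # question bag-of-words
length = csr_matrix([[len(q.split())] for q in questions])  # question length feature
X = hstack([bow, length])

clf = LogisticRegression(max_iter=1000).fit(X, labels)

test = ["does this pan have a lid ?"]
X_test = hstack([vectorizer.transform(test),
                 csr_matrix([[len(test[0].split())]])])
print(clf.predict(X_test))  # predicts 'specific' or 'generic'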
Summary of specificity-controlled question generation model

Context Question specific


Training
Test Input Specificity data
Context Question generic
Question Classifier

Context Question specific


Output
Input Output
specific OR generic

Context Specific
Training < specific > Question
Input Question data
Test Context
Generation Context Generic
< specific > Model < specific > Question
Output
Context Generic
Specific Question < specific > Question

Input Output

155
Specificity classifier results (with feature ablation)

Feature set                  Test Accuracy   Training Accuracy
All features                 0.73            0.79
Question bag-of-words        0.71            0.80
Syntax                       0.70            0.71
Average word embeddings      0.64            0.66
Polarity                     0.65            0.65
Path in WordNet              0.64            0.63

156
Example Outputs

Original: can this thermometer be left inside of a roast as it cooks ?

Max-Likelihood: is this thermometer dishwasher safe ?

GAN-Utility: is this a leave-in ?

Specificity-MLE (g): is it made in the usa ?

Specificity-MLE (s): can you use this thermometer to make a turkey ?

Specificity-GAN (g): is this dishwasher safe ?

Specificity-GAN (s): does this thermometer have a timer ?

157
Automatic metric based evaluation of question generation

Diversity

GAN-Utility
0.13

MLE
0.12


Diversity = Proportion of unique trigrams in the question

158
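A sketch of the diversity metric as stated on the slide (proportion of unique trigrams). Whether it is computed per question or pooled over all model outputs is not spelled out here, so the sketch pools trigrams over a list of generated questions.

# Diversity = proportion of unique trigrams among all trigrams in the
# generated questions (computed here over a list of model outputs).
def diversity(questions):
    trigrams = []
    for q in questions:
        toks = q.split()
        trigrams.extend(zip(toks, toks[1:], toks[2:]))
    return len(set(trigrams)) / len(trigrams) if trigrams else 0.0

print(diversity(["is this dishwasher safe ?",
                 "is this oven safe ?",
                 "what are the dimensions ?"]))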
Automatic metric based evaluation of question generation

Diversity Diversity (specific)

0.14
Specificity-GAN-Utility

0.16
Specificity-MLE

GAN-Utility
0.13

MLE
0.12


Diversity = Proportion of unique trigrams in the question

159
Automatic metric based evaluation of question generation

Diversity Diversity (specific) Diversity (generic)

0.14
Specificity-GAN-Utility
0.1

0.16
Specificity-MLE
0.1

GAN-Utility
0.13

MLE
0.12


Diversity = Proportion of unique trigrams in the question

160
Automatic metric based evaluation of question generation

BLEU (specific)

Specificity-GAN-Utility 2.95

Specificity-MLE 4.45

GAN-Utility 2.69

MLE 1.41


161
Automatic metric based evaluation of question generation

BLEU (specific) BLEU (generic)

2.95
Specificity-GAN-Utility
12.84

4.45
Specificity-MLE
12.61

2.69
GAN-Utility
12.01

1.41
MLE
12.61


162
Talk Outline

o  How we build the clarification questions dataset?


ü  Two datasets: StackExchange & Amazon

o  How we rank clarification questions from an existing set?


ü  Answers are useful in identifying useful questions
ü  EVPI formalism outperforms traditional neural network

o  How we generate clarification questions from scratch?


ü  Sequence-to-sequence model generates relevant & useful questions
ü  Adversarial training generates questions more specific to context

o  How we control specificity of the generated clarification questions?

o  Future Directions

163
Talk Outline

o  How we build the clarification questions dataset?


ü  Two datasets: StackExchange & Amazon

o  How we rank clarification questions from an existing set?


ü  Answers are useful in identifying useful questions
ü  EVPI formalism outperforms traditional neural network

o  How we generate clarification questions from scratch?


ü  Sequence-to-sequence model generates relevant & useful questions
ü  Adversarial training generates questions more specific to context

o  How we control specificity of the generated clarification questions?

o  Future Directions

164
1. Using multi-modal context (Text + Image)

165
1. Using multi-modal context (Text + Image)

MODEL Generated Question


Using product description: Does the set include a ladle?

Using description + image: Are they induction compatible?

166
2. Knowledge-grounded question asking

Post related to Ubuntu Operating System

What version of Ubuntu are you using?

167
2. Knowledge-grounded question asking

Knowledge Base

Operating systems
ü  <version>
ü  <bit>

Post related to Ubuntu Operating System

What version of Ubuntu are you using?

168
2. Knowledge-grounded question asking

Knowledge Base

Operating systems                 Toaster
ü  <version>                      ü  <dimensions>
ü  <bit>                          ü  <watts>

Post related to Ubuntu Operating System            Product description about Toaster

What version of Ubuntu are you using?              What are the dimensions of the toaster?

169
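A hypothetical sketch of how such a knowledge base could drive question asking: each entity type lists the slots a complete context should specify, and a template question is produced for any slot the context leaves unfilled. The knowledge-base contents, templates, and the naive string check are illustrative only.

# Hypothetical knowledge-grounded question asking: ask a template question
# for any knowledge-base slot the context does not mention.
KNOWLEDGE_BASE = {
    "operating system": ["version", "bit"],
    "toaster": ["dimensions", "watts"],
}

TEMPLATES = {
    "version": "What version of {entity} are you using?",
    "bit": "Is your {entity} 32-bit or 64-bit?",
    "dimensions": "What are the dimensions of the {entity}?",
    "watts": "How many watts does the {entity} use?",
}

def missing_slot_questions(context, entity_type, entity_name):
    for slot in KNOWLEDGE_BASE[entity_type]:
        if slot not in context.lower():  # naive placeholder for a real slot-filling check
            yield TEMPLATES[slot].format(entity=entity_name)

post = "My Ubuntu install hangs after preparing to install."
print(list(missing_slot_questions(post, "operating system", "Ubuntu")))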
3. Towards more intelligent dialog agents

Please bring me my
coffee mug from the
kitchen

What color is your


coffee mug?

Black

I found two black mugs.


Is yours the one with
the NFL logo?

170
CONCLUSION

ü  Identify importance of teaching machines to ask clarification questions

171
CONCLUSION

ü  Identify importance of teaching machines to ask clarification questions

ü  Create dataset of clarification questions (StackExchange & Amazon)

172
CONCLUSION

ü  Identify importance of teaching machines to ask clarification questions

ü  Create dataset of clarification questions (StackExchange & Amazon)

ü  Novel model for ranking clarification questions

173
CONCLUSION

ü  Identify importance of teaching machines to ask clarification questions

ü  Create dataset of clarification questions (StackExchange & Amazon)

ü  Novel model for ranking clarification questions

ü  Novel model for generating clarification questions

174
CONCLUSION

ü  Identify importance of teaching machines to ask clarification questions

ü  Create dataset of clarification questions (StackExchange & Amazon)

ü  Novel model for ranking clarification questions

ü  Novel model for generating clarification questions

ü  Novel model for generating specificity-controlled clarification questions

175
Collaborators

UMD: Philip Resnik, Marine Carpuat, Allyson Ettinger, Yogarshi Vyas, Xing Niu

Hal Daumé III, my wonderful advisor :)

ISI Internship: Daniel Marcu, Kevin Knight
Grammarly Internship: Joel Tetreault
MSR Internship: Paul Mineiro
Acknowledgements

Ø  Thesis committee members:


Hal Daumé III
Philip Resnik
Marine Carpuat
Jordan Boyd-Graber
David Jacobs
Lucy Vanderwende (University of Washington)

Ø  CLIP lab members

Ø  Friends and family


Publications

o  Clarification Questions
ü  Sudha Rao, Hal Daumé III, "Learning to Ask Good Questions: Ranking Clarification Questions
using Neural Expected Value of Perfect Information ”, ACL 2018 (Best Long Paper Award)

ü  Sudha Rao, Hal Daumé III, “Answer-based Adversarial Training for Generating Clarification
Questions” In Submission

o  Formality Style Transfer


ü  Sudha Rao, Joel Tetreault, "Dear Sir or Madam, May I introduce the GYAFC Dataset: Corpus,
Benchmarks and Metrics for Formality Style Transfer”, NAACL 2018
ü  Xing Niu, Sudha Rao, Marine Carpuat, "Multi-task Neural Models for Translating Between Styles
Within and Across Languages”, COLING 2018

o  Semantic Representations
ü  Sudha Rao, Yogarshi Vyas, Hal Daume III, Philip Resnik, "Parser for Abstract Meaning
Representation using Learning to Search", Meaning Representation Parsing, NAACL 2016
ü  Sudha Rao, Daniel Marcu, Kevin Knight Hal Daumé III, "Biomedical Event Extraction using
Abstract Meaning Representation” Biomedical Natural Language Processing, ACL 2017

o  Zero Pronoun Resolution


ü  Sudha Rao, Allyson Ettinger, Hal Daumé III, Philip Resnik, "Dialogue focus tracking for zero
pronoun resolution", NAACL 2015
Backup Slides

179
Generalization beyond large datasets

ü  Bootstrapping process:
1.  Use template based approach or humans to write initial set of questions
2.  Train model on small set of questions and generate more
3.  Add these (noisy) questions to training data and retrain

ü  Domain adaptation:
1.  Find a similar domain that has large no. of clarification questions
2.  Train neural network parameters on out-domain and tune on in-domain

ü  Use reading comprehension question data (like SQuAD)


1.  Remove the answer sentence from the passage
2.  The question can now become a clarification question

ü  EVPI idea can be applicable to identify “good” questions among several


template-based questions

180
StackExchange dataset: Example of comment as answer

Make install: cannot run strip: No such file or directory

root@server:~/shc-3.8.9# make install


*** Installing shc and shc.1 on /usr/local
*** Do you want to continue? y
install -c -s shc /usr/local/bin/
install: cannot run strip: No such file or directory Initial Post
install: strip process terminated abnormally
make: *** [install] Error 1

I don't use make install often. Can someone tell me how to fix it? :)

what exactly are you trying to install and Question


what version of ubuntu are you on ? comment

i 'm trying to install shc-3.8.9 and i tried Answer


to follow this guide : use ubuntu 14.04 comment

181
StackExchange dataset: Example of comment as answer

Not enough space to build proposed filesystem while setting up superblock

Just bought a new external drive. Plugged it in, erased current partition using fdisk and
created a new extended partition using fdisk. Used all the defaults for start and end
blocks. I then try to format the new partition using the following:
sudo mkfs.ext4 /dev/sdb1 Initial
However, I received the following error: Post
mke2fs 1.42 (29-Nov-2011)
/dev/sdb1: Not enough space to build proposed filesystem while setting up superblock
Any ideas what could be wrong? Should I have created a primary partition? If so, why?

Question
are you installing from a bootable thumb drive ?
comment

i am booting from a dvd drive . i created a dvd Answer


with ubuntu 12.04 installation iso image on it . comment

182
StackExchange dataset: Example of edit as answer

VM with host communication

i run a program inside a vm which outputs 0 or 1


only . how can i communicate this result from the vm Initial Post
to my host machine ( which is ubuntu 12.04 )

guest os ? where does your program Question


output the result to ? comment

use virtualbox
2. virtual machine os : ubuntu 12.04 lts Edit to the post
3. host machine os : ubuntu 12.04 lts .

183
StackExchange dataset: Example of non-answer

My Ubunto 12.04 Installation hangs after “Preparing to install Ubuntu”.


What can I do to work around the problem?

I did download Ubuntu 12.04LTS. I tried to install - no progress. I tried to remove all
partition using a bootable version of GParted. I created one big partition ext4 formatted. Initial
It all did not help. The installation stops after "Preparing to install Ubuntu". All three Post
checkmarks are checked an I can click "Continue" but then nothing for hours. What can I
do? Please help!

why don't you try to create a partition Question


via gparted ? comment

i already know how to partition it using gparted . Answer


i am trying to expand my knowledge . comment

184
Human-based Evaluation Results (Specificity)
How specific is the question to the product?

[Bar chart: number of responses per model (Original, Lucene, Max-Likelihood, Max-Utility, GAN-Utility) for each answer category: This product / Similar Products / Products in Home & Kitchen / N/A]

185
Human-based Evaluation Results (Usefulness)
How useful is the question to a potential buyer?
[Bar chart: number of responses per model (Original, Lucene, Max-Likelihood, Max-Utility, GAN-Utility) for each answer category: Should be in the description / Useful to large no. of users / Useful to small no. of users / Useful only to person asking / N/A]

186
Human-based Evaluation Results (Seeking new information)

Does the question ask for new information currently not included in the description?

[Bar chart: number of responses per model (Original, Lucene, Max-Likelihood, Max-Utility, GAN-Utility) for each answer category: Completely / Somewhat / No / N/A]

187
Human-based Evaluation Results (Relevance)

How relevant is the question to the product?

[Bar chart: number of responses per model (Original, Lucene, Max-Likelihood, Max-Utility, GAN-Utility) for each answer category: Yes / No]

188
Human-based Evaluation Results (Grammaticality)

How grammatical is the question?

[Bar chart: number of responses per model (Original, Lucene, Max-Likelihood, Max-Utility, GAN-Utility) for each answer category: Grammatical / Comprehensible / Incomprehensible]

189
Human-based Evaluation Results

190
Error Analysis of MLE model

Short and Generic questions

dishwasher safe ?

what are the dimensions ?

is this a firm topper ?

where is this product made ?

191
Error Analysis of Max-Utility model

Incompleteness and repetition

what are the dimensions of this item ? i have a great size of baking pan and pans and pans

what are the dimensions of this topper ? i have a queen size mattress topper topper topper

what is the height of the trash trash trash trash trash

can this be used with the sodastream system system system system

192
Error Analysis of GAN-Utility model

<unk> tokens and bad long questions

what is the difference between the <unk> and the <unk> ?

what is the size of the towel ? i 'm looking for something to be able to use it for

what is the difference between this and the picture of the cuisinart <unk> deluxe
deluxe deluxe deluxe deluxe deluxe deluxe

193
Error Analysis of specificity model

Incomplete questions

what are the dimensions of the table ? i 'm looking for something to put it in a suitcase

what is the density of the mattress pad ? i 'm looking for a mattress for a memory foam

does this unit come with a hose ? i need to know if the window window can be mounted

Disconnected multi-sentence questions

can you use this in a conventional oven ? i have a small muffin pan for baking .

what is the height of this unit ? i want to use it in a rental .

what are the dimensions of the basket ? i need to know if the baskets are in the picture

194
Reward Calculator

Context Question Answer Training

Real Data

Reward
Calculator

Generated Generated
Context Testing
Question Answer

Model Output

195
Other types of Question Generation

o  Liu, et al. “Automatic question generation for literature review writing support." International
Conference on Intelligent Tutoring Systems. 2010
o  Penas and Hovy, “Filling knowledge gaps in text for machine reading” International Conference
on Computational Linguistics: Posters ACL 2010
o  Artzi & Zettlemoyer, “Bootstrapping semantic parsers from conversations” EMNLP 2011
o  Labutov, et al.“Deep questions without deep understanding” ACL 2015
o  Mostafazadeh et al. "Generating natural questions about an image." ACL 2016
o  Mostafazadeh et al. "Multimodal Context for Natural Question and Response Generation.” IJCNLP
2017.
o  Rothe, Lake and Gureckis. “Question asking as program generation” NIPS 2017.

196
Key Idea behind Expected Value of Perfect Information (EVPI)

How to configure path or set environment variables for installation?


I'm aiming to install ape, a simple code for pseudopotential generation.
I'm having this error message while running ./configure
<error message>
So I have the library but the program installation isn't finding it.
Any help? Thanks in advance!

Possible questions

(a)  What version of Ubuntu do you have? → Just right

(b)  What is the make of your wifi card? → Not useful

(c)  Are you running Ubuntu 14.10 kernel 4.4.0-59-generic
on an x86 64 architecture? → Unlikely to add value

Avriel, Mordecai, and A. C. Williams. "The value of information and stochastic programming." Operations Research 18.5
(1970)

197
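The EVPI idea above can be sketched as follows: a candidate question is scored by the expected usefulness of the answer it would elicit, summing over possible answers. The functions answer_probability and utility below are placeholders for the learned neural models, not the thesis implementation.

# EVPI-style scoring sketch: rank candidate clarification questions by the
# expected usefulness of the answer each one would elicit.
# `answer_probability` and `utility` stand in for learned models (placeholders).

def expected_utility(post, question, candidate_answers,
                     answer_probability, utility):
    return sum(answer_probability(post, question, a) * utility(post, question, a)
               for a in candidate_answers)

def rank_questions(post, candidate_questions, candidate_answers,
                   answer_probability, utility):
    return sorted(candidate_questions,
                  key=lambda q: expected_utility(post, q, candidate_answers,
                                                 answer_probability, utility),
                  reverse=True)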
4. Writing Assistance

Hi Kathy,

We have decided to meet at 10am tomorrow


to discuss the next group assignment.

Hey John,

Thanks for letting me know.


Where are we meeting though?

198
4. Writing Assistance

Hi Kathy,

We have decided to meet at 10am tomorrow


to discuss the next group assignment.

Hey John,

Thanks for letting me know.


Where are we meeting though?

Oh right. Forgot to mention that.


In the 3rd floor grad lounge.

199
4. Writing Assistance

Do you want to include


the location?
Hi Kathy,

We have decided to meet at 10am tomorrow


to discuss the next group assignment.

200
4. Writing Assistance

Do you want to include


the location?
Hi Kathy,

We have decided to meet at 10am tomorrow


to discuss the next group assignment.

Hi Kathy,

We have decided to meet at 10am tomorrow


in the 3rd floor grad lounge to discuss the
next group assignment.

Sounds good!

201
3. Interactive Search Query

Historical gas prices

202
3. Interactive Search Query

Historical gas prices

Which region?

203
3. Interactive Search Query

Historical gas prices

Which region?

Which period?

204
4. Asking questions to help build reasoning

Jack and Jill were running a race.


Jack reached the finish line when
Jill was still a few steps behind.
Jill was quite upset.

205
4. Asking questions to help build reasoning

Jack and Jill were running a race.


Jack reached the finish line when
Jill was still a few steps behind. Why was Jill
Jill was quite upset. upset?

206
4. Asking questions to help build reasoning

Jack and Jill were running a race.


Jack reached the finish line when
Jill was still a few steps behind. Why was Jill
Jill was quite upset. upset?

Because she
did not win
the race.

207
Generating Natural Questions from Images (+ Text)

Q: Was anyone injured in the crash?

Q: Is the motorcyclist alive?

Q: What caused the accident?

User1: My son is ahead and surprised!

User2: Did he end up winning the race?

User1: Yes he won, he can’t believe it!

o  Mostafazadeh et al. "Generating natural questions about an image." ACL 2016

o  Mostafazadeh et al. "Image-Grounded Conversations: Multimodal Context for Natural


Question and Response Generation." IJCNLP 2017.

208
Example outputs

Original: where is the hose attachment hole located ?

Max-Likelihood: does it have a remote control?

GAN-Utility: does this unit have a drain hose on the outside ?

Original: how quickly does it boil water ?

Max-Likelihood: does this kettle have a warranty ?

GAN-Utility: does it come with a cord ?

209
GAN-Utility model for Clarification Question Generation

Ø  General GAN Objective

    L_GAN(D, G) = max_{d ∈ D} min_{g ∈ G}  E_{x ∼ p̂}[log d(x)] + E_{z ∼ p_z}[log(1 − d(g(z)))]

    The generator is an arbitrary model g ∈ G that produces outputs (in our case, questions);
    the discriminator is another model d ∈ D that attempts to classify between real and
    model-generated outputs. The goal of the generator is to produce data as close as possible
    to the real data distribution p̂; the goal of the discriminator is to successfully
    distinguish real data from generated data.

    Training GANs for text generation is difficult because the discrete nature of the outputs
    makes it hard to pass the gradient update from the discriminator to the generator.
    Recent sequence GAN models (2017) overcome this issue by treating the generator as an
    agent and using the discriminator as a reward function, updating the generative model with
    reinforcement learning. Our GAN-based approach is inspired by this, with two main
    modifications: a) we use the MIXER algorithm (§2.2) as our generator instead of the policy
    gradient approach; and b) we use the UTILITY function (§2.3) as our discriminator instead
    of a convolutional neural network (CNN).

Ø  Clarification Question Model GAN Objective

    L_GAN-U(U, M) = max_{u ∈ U} min_{m ∈ M}  E_{(c,q) ∼ p̂}[log u(c, q, A(c, q))] + E_{c ∼ p̂}[log(1 − u(c, m(c), A(c, m(c))))]

    where U is the UTILITY discriminator, M is the MIXER generator, p̂ is the distribution of
    (context, question, answer) triples, and A is our answer generator.

    In our model, the answer is a latent variable: we do not actually use it anywhere except to
    train the discriminator. Because of this, we train the discriminator using (context, true
    question, generated answer) triples as positive instances and (context, generated question,
    generated answer) triples as negative instances.

210
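Following the objective above, here is a sketch of how discriminator training triples could be assembled: positives pair the context with its true question (plus a generated answer), negatives pair it with a generated question. The generator and answer-generator calls are placeholders, not the thesis code.

# Sketch of discriminator training-batch construction for GAN-Utility:
# positives = (context, true question, generated answer), label 1
# negatives = (context, generated question, generated answer), label 0
# `question_generator` and `answer_generator` are placeholder callables.

def build_discriminator_batch(contexts, true_questions,
                              question_generator, answer_generator):
    batch = []
    for c, q in zip(contexts, true_questions):
        q_fake = question_generator(c)
        batch.append((c, q,      answer_generator(c, q),      1))  # real instance
        batch.append((c, q_fake, answer_generator(c, q_fake), 0))  # fake instance
    return batch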
Generative Adversarial Networks (GAN)

Generator Discriminator

211
Generative Adversarial Networks (GAN)
Goal: Train a model to generate digits
Latent Space + Noise

Generator Discriminator

Model Data

212
Generative Adversarial Networks (GAN)
Real Data
Latent Space + Noise

1 (Real)
Generator Discriminator
0 (Fake)

ü  Discriminator tries to distinguish between


real and model data

Model Data

213
Generative Adversarial Networks (GAN)
Real Data
Latent Space + Noise

1 (Real)
Generator Discriminator
0 (Fake)

ü  Discriminator tries to distinguish between
real and model data
ü  Generator tries to fool the discriminator by
generating real looking data
ü  Thus, the generator is optimized

Model Data

214
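To make the alternating updates in these diagrams concrete, here is a textbook-style minimal GAN training loop on toy 1-D data (PyTorch). It is purely illustrative and is not the text-generation GAN used in this thesis.

# Minimal GAN training loop on toy 1-D data, showing the alternating
# discriminator / generator updates from the diagram. Illustrative only.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))   # generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))   # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, 1) * 0.5 + 3.0   # "real data": samples from N(3, 0.5)
    noise = torch.randn(64, 8)              # latent space + noise
    fake = G(noise)

    # Discriminator step: push real towards 1, fake towards 0
    opt_d.zero_grad()
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    d_loss.backward()
    opt_d.step()

    # Generator step: try to make the discriminator output 1 on fakes
    opt_g.zero_grad()
    g_loss = bce(D(G(noise)), torch.ones(64, 1))
    g_loss.backward()
    opt_g.step()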
Style transfer prior work

Informal Formal

Gotta see both sides of the story You have to consider both sides of the story

Shakespearean English Modern English

I should kill thee straight I ought to kill you right now

Brooke et al. Automatic acquisition of lexical formality. ACL 2010

Niu et al. Controlling the formality of machine translation output. EMNLP 2017

Rao and Tetreault. Corpus, Benchmarks and Metrics for Formality Style Transfer. NAACL 2018

Xu et al. Paraphrasing for style COLING 2012

215
Upwork annotation statistics

Ø  Agreement on best in ‘strict sense’: 0.15

Ø  Agreement on best in ‘relaxed sense’: 0.87


(best by one annotator is valid by another)

Ø  Agreement on valid in ‘strict sense’: 0.58


(binary judgment of is valid)

Ø  Original in union of best: 72%

Ø  Original in intersection of best: 20%

Ø  Original in intersection of valid: 76%

Ø  Original in union of valid: 88%

216
Detailed human evaluation results

B1 ∪ B2    V1 ∩ V2    Original
Model p@1 p@3 p@5 MAP p@1 p@3 p@5 MAP p@1
Random 17.5 17.5 17.5 35.2 26.4 26.4 26.4 42.1 10.0
Bag-of-ngrams 19.4 19.4 18.7 34.4 25.6 27.6 27.5 42.7 10.7
Community QA 23.1 21.2 20.0 40.2 33.6 30.8 29.1 47.0 18.5
Neural (p, q) 21.9 20.9 19.5 39.2 31.6 30.0 28.9 45.5 15.4
Neural (p, a) 24.1 23.5 20.6 41.4 32.3 31.5 29.0 46.5 18.8
Neural (p, q, a) 25.2 22.7 21.3 42.5 34.4 31.8 30.1 47.7 20.5
EVPI 27.7 23.4 21.5 43.6 36.1 32.2 30.5 49.2 21.4

Table 4.1: Model performances on 500 samples when evaluated against the union
of the “best” annotations (B1 ∪ B2), intersection of the “valid” annotations (V1 ∩
V2) and the original question paired with the post in the dataset. The difference
between the bold and the non-bold numbers is statistically significant with p <
0.05 as calculated using bootstrap test. p@k is the precision of the k questions
ranked highest by the model and MAP is the mean average precision of the ranking
predicted by the model.

217
Detailed human evaluation results (without original)

B1 ∪ B2    V1 ∩ V2
Model p@1 p@3 p@5 MAP p@1 p@3 p@5 MAP
Random 17.4 17.5 17.5 26.7 26.3 26.4 26.4 37.0
Bag-of-ngrams 16.3 18.9 17.5 25.2 26.7 28.3 26.8 37.3
Community QA 22.6 20.6 18.6 29.3 30.2 29.4 27.4 38.5
Neural (p,q) 20.6 20.1 18.7 27.8 29.0 29.0 27.8 38.9
Neural (p,a) 22.6 20.1 18.3 28.9 30.5 28.6 26.3 37.9
Neural (p,q,a) 22.2 21.1 19.9 28.5 29.7 29.7 28.0 38.7
EVPI 23.7 21.2 19.4 29.1 31.0 30.0 28.4 39.6

Table 4.2: Model performances on 500 samples when evaluated against the union
of the “best” annotations (B1 ∪ B2) and intersection of the “valid” annotations
(V1 ∩ V2), with the original question excluded. The differences between all numbers
except random and bag-of-ngrams are statistically insignificant.

predict the “best” question. The model predicts “why would you need this” with
very high probability, likely because it is a very generic question, unlike the question
marked as “best” by the annotator, which is too specific. In the third example, the
model again predicts a very generic question which is also marked as “valid” by the
annotator.

218
StackExchange example output (ranking)

0.50 define “ frozen ” . did it panic ? or did something else happen ?
0.50 maybe you need to use your ‘fn‘ key when pressing print screen ?
0.50 tried ctrl + alt + f2 ?
0.49 does the script process 1 iteration successfully ?
0.49 laptop or desktop ?
Title: How to flash a USB drive?.
Post: I have a 8 GB Sandisk USB drive. Recently it became write somehow.
So I searched in Google and I tried to remove the write protection
through almost all the methods I found. Unfortunately nothing worked.
So I decided to try some other ways.
Some said that flashing the USB drive will solve the problem.
But I don’t know how. So how can it be done ?
1.01 what file system was the drive using ?
1.00 was it 16gb before or it has been 16mb from the first day you used it ?
0.74 which os are you using ? which file system is used by your pen drive ?
0.64 what operation system you use ?
0.51 can you narrow ’a hp usb down ’ ?
0.50 could the device be simply broken ?
0.50 does it work properly on any other pc ?
0.50 usb is an interface , not a storage device . was it a flash drive or a portable disk ?
0.49 does usb flash drive tester have anything useful to say about the drive ?
0.49 your drive became writeable ? or read-only ?

Table 4.4: Examples of human annotation from the unix and superuser domain of
our dataset. The questions are sorted by expected utility, given in the first column.
The “best” annotation is marked with black ticks and the “valid” annotations
are marked with grey ticks.

219
StackExchange example output (ranking)

Title: Frozen Linux Recovery Without SysReq


Post: RHEL system has run out of memory and is now frozen.
The SysReq commands are not working, so I am not even sure that
/proc/sys/kernel/sysrq is set to 1.
Is there any other ”safe” way I can reboot w/out power cycling?
0.91 why would you need this ?
0.77 maybe you need to use your ‘fn‘ key when pressing print screen ?
0.59 do you have sudo rights on this computer ?
0.55 are you sure sysrq is enabled on your machine ?
0.52 did you look carefully at the logs when you rebooted after it hung ?
0.51 i assume you have data open which needs to be saved ?
0.50 define “ frozen ” . did it panic ? or did something else happen ?
0.50 maybe you need to use your ‘fn‘ key when pressing print screen ?
0.50 tried ctrl + alt + f2 ?
0.49 does the script process 1 iteration successfully ?
0.49 laptop or desktop ?
220
StackExchange example output (ranking)
Title: Ubuntu 15.10 instant resume from suspend
Post: I have an ASUS desktop PC that I decided to install Ubuntu onto.
I have used Linux before, specifically for 3 years in High School.
I have never encountered suspend resume issues on Linux before until now.
It appears that my PC is instantly resuming from suspend on Ubuntu 15.10
I am not sure what is causing this, but my hardware is as follows:
Intel Core i5 4460 @ 3.2 GHz
2 TB Toshiba 7200 RPM disk
8 GB DDR3 RAM
Corsair CX 500 Power Supply
AMD Radeon R9 270X Graphics - 4 Gigs
ASUS Motherboard for OEM builds
VIA technologies USB 3.0 Hub
Realtek Network Adapter
Any help is greatly appreciated. I haven’t worked with Linux in over a year,
and I am trying to get back into it, as I plan to pursue a career in Comp Science
(specifically through internships and trade school) and this is a problem,
as I don’t want to drive the power bill up.
(Even though I don’t pay it, my parents do.)
0.87 does suspend - resume work as expected ?
0.71 what , specifically , is the problem you want help with ?
0.70 the suspend problem exits only if a virtual machines is running ?
0.67 is the pasted workaround still working for you ?
0.57 just wondering if you got a solution for this ?
0.50 we *could* try a workaround , with a keyboard shortcut . would that interest you ?
0.49 did you restart the systemd daemon after the changes ‘sudo restart systemd-logind‘ ?
0.49 does running ‘sudo modprobe -r psmouse ; sleep 1 ; sudo modprobe psmouse‘ enable
the touchpad ?
0.49 2 to 5 minutes ?
0.49 does it work from the menu or not ?

Table 4.3: Example of human annotation from the askubuntu domain of our dataset.
The questions are sorted by expected utility, given in the first column. The “best”
annotation is marked with black ticks and the “valid” annotations are marked
with grey ticks.

221
Automatic metric based evaluation (question generation)

Amazon StackExchange
Model Diversity Bleu Meteor Diversity Bleu Meteor
Reference 0.6934 — — 0.7509 — —
Lucene 0.6289 4.26 10.85 0.7453 1.63 7.96
MLE 0.1059 17.02 12.72 0.2183 3.49 8.49
Max-Utility 0.1214 16.77 12.69 0.2508 3.89 8.79
GAN-Utility 0.1296 15.20 12.82 0.2256 4.26 8.99

Table 5.1: Diversity as measured by the proportion of unique trigrams in model


outputs. Bleu and Meteor scores using up to 10 references for the Amazon
dataset and up to six references for the StackExchange dataset. Numbers in bold
are the highest among the models. All results for Amazon are on the entire test set
whereas for StackExchange they are on the 500 instances of the test set that have
multiple references.

5.3.5 Automatic Metric Results

Table 5.1 shows the results on the two datasets when evaluated according to
automatic metrics.

222
Specificity-controlled question generation model results

Generic Specific
Model Diversity Bleu Meteor Diversity Bleu Meteor

Reference 0.6071 — — 0.7474 — —


Lucene 0.6289 2.90 12.04 0.6289 1.76 6.96

MLE 0.1201 12.61 13.29 0.1201 1.41 5.06


Max-Utility 0.1299 12.17 14.06 0.1299 1.79 5.57
GAN-Utility 0.1304 12.01 14.35 0.1304 2.69 6.12
Specificity-MLE 0.1023 12.61 13.53 0.1640 4.45 7.85
Specificity-GAN-Utility 0.1012 12.84 14.18 0.1357 2.95 6.08

Table 6.2: Diversity as measured by the proportion of unique trigrams in model


outputs. Bleu and Meteor scores are calculated using an average of 6 references
under generic setting and using an average of 3 references under specific setting.
The highest numbers within a column is in bold (except for diversity under generic
setting where the lowest number is bold).

Our best model is the one that uses all the features and attains an accuracy
of 0.73 on the test set. In comparison, a baseline model that predicts the specificity
label at random gets an accuracy of 0.58 on the test set.

223
