Table 3: Test set results on the SICK semantic relatedness subtask. For our experiments, we report mean
scores over 5 runs (standard deviations in parentheses). Results are grouped as follows: (1) SemEval
2014 submissions; (2) Our own baselines; (3) Sequential LSTMs; (4) Tree-structured LSTMs.
[Figure: two line plots comparing DT-LSTM, CT-LSTM, LSTM, and Bi-LSTM. Left panel: accuracy vs. sentence length. Right panel: Pearson's r vs. mean sentence length.]

Figure 3: Fine-grained sentiment classification accuracy vs. sentence length. For each ℓ, we plot accuracy for the test set sentences with length in the window [ℓ - 2, ℓ + 2]. Examples in the tail of the length distribution are batched in the final window (ℓ = 45).

Figure 4: Pearson correlations r between predicted similarities and gold ratings vs. sentence length. For each ℓ, we plot r for the pairs with mean length in the window [ℓ - 2, ℓ + 2]. Examples in the tail of the length distribution are batched in the final window (ℓ = 18.5).
tems without any additional feature engineering, with the best results achieved by the Dependency Tree-LSTM. Recall that in this task, both Tree-LSTM models only receive supervision at the root of the tree, in contrast to the sentiment classification task, where supervision was also provided at the intermediate nodes. We conjecture that in this setting the Dependency Tree-LSTM benefits from its more compact structure relative to the Constituency Tree-LSTM, in the sense that paths from input word vectors to the root of the tree are shorter on aggregate for the Dependency Tree-LSTM.

7 Discussion and Qualitative Analysis

7.1 Modeling Semantic Relatedness

In Table 4, we list nearest-neighbor sentences retrieved from a 1000-sentence sample of the SICK test set. We compare the neighbors ranked by the Dependency Tree-LSTM model against a baseline ranking by cosine similarity of the mean word vectors for each sentence.

The Dependency Tree-LSTM model exhibits several desirable properties. Note that in the dependency parse of the second query sentence, the word "ocean" is the second-furthest word from the root ("waving"), with a depth of 4. Regardless, the retrieved sentences are all semantically related to the word "ocean", which indicates that the Tree-LSTM is able to both preserve and emphasize information from relatively distant nodes. Additionally, the Tree-LSTM model shows greater robustness to differences in sentence length. Given the query "two men are playing guitar", the Tree-LSTM associates the phrase "playing guitar" with the longer, related phrase "dancing and singing in front of a crowd" (note as well that there is zero token overlap between the two phrases).

7.2 Effect of Sentence Length

One hypothesis to explain the empirical strength of Tree-LSTMs is that tree structures help mitigate the problem of preserving state over long sequences of words. If this were true, we would expect to see the greatest improvement over sequential LSTMs on longer sentences. In Figs. 3 and 4, we show the relationship between sentence length and performance, as measured by the relevant task-specific metric. Each data point is a mean score over 5 runs, and error bars have been omitted for clarity.

We observe that while the Dependency Tree-LSTM does significantly outperform its sequential counterparts on the relatedness task for longer sentences of length 13 to 15 (Fig. 4), it also achieves consistently strong performance on shorter sentences. This suggests that, unlike sequential LSTMs, Tree-LSTMs are able to encode semantically useful structural information in the sentence representations that they compose.
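To make the windowing procedure behind Figs. 3 and 4 concrete, here is a minimal Python sketch (an assumed reconstruction, not the authors' released code); the arrays `lengths`, `pred`, and `gold` are hypothetical per-example inputs.

```python
# Sketch (an assumption, not the authors' code) of the sliding-window
# analysis in Figs. 3 and 4: evaluate the task metric over the examples
# whose (mean) sentence length falls in the window [c - 2, c + 2].
import numpy as np
from scipy.stats import pearsonr

def metric_by_length(lengths, pred, gold, centers, half_width=2, metric="acc"):
    """Evaluate a metric within each length window; the final window also
    absorbs the tail of the length distribution, as in the captions."""
    lengths = np.asarray(lengths, dtype=float)
    pred, gold = np.asarray(pred), np.asarray(gold)
    curve = []
    for i, c in enumerate(centers):
        in_window = np.abs(lengths - c) <= half_width
        if i == len(centers) - 1:              # batch the tail examples
            in_window |= lengths > c + half_width
        if metric == "acc":                    # Fig. 3: classification accuracy
            curve.append(float((pred[in_window] == gold[in_window]).mean()))
        else:                                  # Fig. 4: Pearson's r
            curve.append(pearsonr(pred[in_window], gold[in_window])[0])
    return curve

# e.g. an accuracy curve with windows centered at 5, 10, ..., 45:
# acc = metric_by_length(sent_lens, pred_labels, gold_labels, range(5, 50, 5))
```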
Query: a woman is slicing potatoes
  Ranking by mean word vector cosine similarity (score):
    a woman is cutting potatoes (0.96)
    a woman is slicing herbs (0.92)
    a woman is slicing tofu (0.92)
  Ranking by Dependency Tree-LSTM model (score):
    a woman is cutting potatoes (4.82)
    potatoes are being sliced by a woman (4.70)
    tofu is being sliced by a woman (4.39)

Query: a boy is waving at some young runners from the ocean
  Ranking by mean word vector cosine similarity (score):
    a man and a boy are standing at the bottom of some stairs, which are outdoors (0.92)
    a group of children in uniforms is standing at a gate and one is kissing the mother (0.90)
    a group of children in uniforms is standing at a gate and there is no one kissing the mother (0.90)
  Ranking by Dependency Tree-LSTM model (score):
    a group of men is playing with a ball on the beach (3.79)
    a young boy wearing a red swimsuit is jumping out of a blue kiddies pool (3.37)
    the man is tossing a kid into the swimming pool that is near the ocean (3.19)

Query: two men are playing guitar
  Ranking by mean word vector cosine similarity (score):
    some men are playing rugby (0.88)
    two men are talking (0.87)
    two dogs are playing with each other (0.87)
  Ranking by Dependency Tree-LSTM model (score):
    the man is singing and playing the guitar (4.08)
    the man is opening the guitar for donations and plays with the case (4.01)
    two men are dancing and singing in front of a crowd (4.00)

Table 4: Most similar sentences from a 1000-sentence sample drawn from the SICK test set. The Tree-LSTM model is able to pick up on more subtle relationships, such as that between "beach" and "ocean" in the second example.
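The baseline ranking in Table 4 is simple enough to sketch directly. The following minimal illustration assumes a hypothetical `word_vecs` dictionary mapping tokens to pretrained embedding vectors:

```python
# Sketch of the Table 4 baseline: represent a sentence by the mean of its
# word vectors and rank candidates by cosine similarity. `word_vecs` is an
# assumed dict from token to NumPy array (e.g., pretrained embeddings).
import numpy as np

def sentence_vec(tokens, word_vecs):
    """Mean of the available word vectors for a tokenized sentence
    (assumes at least one token is in the vocabulary)."""
    vecs = [word_vecs[t] for t in tokens if t in word_vecs]
    return np.mean(vecs, axis=0)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def rank_neighbors(query, pool, word_vecs, k=3):
    """Return the k sentences in `pool` most similar to `query`."""
    q = sentence_vec(query.split(), word_vecs)
    scored = [(cosine(q, sentence_vec(s.split(), word_vecs)), s) for s in pool]
    return sorted(scored, key=lambda x: x[0], reverse=True)[:k]
```

The Tree-LSTM rankings instead use the trained model's predicted relatedness score for each query-candidate pair, which is why those scores lie on the SICK 1-5 relatedness scale rather than in [0, 1].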
8 Related Work

Distributed representations of words (Rumelhart et al., 1988; Collobert et al., 2011; Turian et al., 2010; Huang et al., 2012; Mikolov et al., 2013; Pennington et al., 2014) have found wide applicability in a variety of NLP tasks. Following this success, there has been substantial interest in the area of learning distributed phrase and sentence representations (Mitchell and Lapata, 2010; Yessenalina and Cardie, 2011; Grefenstette et al., 2013; Mikolov et al., 2013), as well as distributed representations of longer bodies of text such as paragraphs and documents (Srivastava et al., 2013; Le and Mikolov, 2014).

Our approach builds on recursive neural networks (Goller and Kuchler, 1996; Socher et al., 2011), which we abbreviate as Tree-RNNs in order to avoid confusion with recurrent neural networks. Under the Tree-RNN framework, the vector representation associated with each node of a tree is composed as a function of the vectors corresponding to the children of the node. The choice of composition function gives rise to numerous variants of this basic framework. Tree-RNNs have been used to parse images of natural scenes (Socher et al., 2011), compose phrase representations from word vectors (Socher et al., 2012), and classify the sentiment polarity of sentences (Socher et al., 2013).
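To make this composition framework concrete, here is a minimal Tree-RNN sketch for binary trees; the tanh-over-concatenation composition function is one illustrative choice, not the specific variant used in any of the papers cited above.

```python
# Minimal Tree-RNN sketch: each node's vector is a function of its
# children's vectors. The composition f(W [left; right] + b) below is
# one illustrative choice of composition function.
import numpy as np

class Node:
    def __init__(self, word_vec=None, children=()):
        self.word_vec = word_vec        # set at leaf nodes
        self.children = list(children)  # empty at leaves

def compose(node, W, b):
    """Bottom-up composition over a binary tree: leaves return their
    word vectors; internal nodes apply tanh(W [left; right] + b)."""
    if not node.children:
        return node.word_vec
    left, right = (compose(c, W, b) for c in node.children)
    return np.tanh(W @ np.concatenate([left, right]) + b)

# For vectors of dimension d, W has shape (d, 2d) and b has shape (d,):
# root_vec = compose(root, W, b)
```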
9 Conclusion

In this paper, we introduced a generalization of LSTMs to tree-structured network topologies. The Tree-LSTM architecture can be applied to trees with arbitrary branching factor. We demonstrated the effectiveness of the Tree-LSTM by applying the architecture in two tasks: semantic relatedness and sentiment classification, outperforming existing systems on both. Controlling for model dimensionality, we demonstrated that Tree-LSTM models are able to outperform their sequential counterparts. Our results suggest further lines of work in characterizing the role of structure in producing distributed representations of sentences.

Acknowledgements

We thank our anonymous reviewers for their valuable feedback. Stanford University gratefully acknowledges the support of a Natural Language Understanding-focused gift from Google Inc. and the Defense Advanced Research Projects Agency (DARPA) Deep Exploration and Filtering of Text (DEFT) Program under Air Force Research Laboratory (AFRL) contract no. FA8750-13-2-0040. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the view of DARPA, AFRL, or the US government.

References

Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

Bengio, Yoshua, Patrice Simard, and Paolo Frasconi. 1994. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks 5(2):157–166.

Bjerva, Johannes, Johan Bos, Rob van der Goot, and Malvina Nissim. 2014. The Meaning Factory: Formal semantics for recognizing textual entailment and determining semantic similarity. SemEval 2014.
Blunsom, Phil, Edward Grefenstette, Nal Kalchbrenner, et al. 2014. A convolutional neural network for modelling sentences. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics.

Chen, Danqi and Christopher D Manning. 2014. A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). pages 740–750.

Collobert, Ronan, Jason Weston, Leon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. The Journal of Machine Learning Research 12:2493–2537.

Duchi, John, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. The Journal of Machine Learning Research 12:2121–2159.

Elman, Jeffrey L. 1990. Finding structure in time. Cognitive Science 14(2):179–211.

Foltz, Peter W, Walter Kintsch, and Thomas K Landauer. 1998. The measurement of textual coherence with latent semantic analysis. Discourse Processes 25(2-3):285–307.

Ganitkevitch, Juri, Benjamin Van Durme, and Chris Callison-Burch. 2013. PPDB: The Paraphrase Database. In HLT-NAACL. pages 758–764.

Goller, Christoph and Andreas Kuchler. 1996. Learning task-dependent distributed representations by backpropagation through structure. In IEEE International Conference on Neural Networks. volume 1, pages 347–352.

Graves, Alex, Navdeep Jaitly, and A-R Mohamed. 2013. Hybrid speech recognition with deep bidirectional LSTM. In IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). pages 273–278.

Grefenstette, Edward, Georgiana Dinu, Yao-Zhong Zhang, Mehrnoosh Sadrzadeh, and Marco Baroni. 2013. Multi-step regression learning for compositional distributional semantics. arXiv preprint arXiv:1301.6939.

Hinton, Geoffrey E, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R Salakhutdinov. 2012. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.

Hochreiter, Sepp. 1998. The vanishing gradient problem during learning recurrent neural nets and problem solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 6(2):107–116.

Hochreiter, Sepp and Jurgen Schmidhuber. 1997. Long Short-Term Memory. Neural Computation 9(8):1735–1780.

Huang, Eric H., Richard Socher, Christopher D. Manning, and Andrew Y. Ng. 2012. Improving word representations via global context and multiple word prototypes. In Annual Meeting of the Association for Computational Linguistics (ACL).

Irsoy, Ozan and Claire Cardie. 2014. Deep recursive neural networks for compositionality in language. In Advances in Neural Information Processing Systems. pages 2096–2104.

Jimenez, Sergio, George Duenas, Julia Baquero, Alexander Gelbukh, Av Juan Dios Batiz, and Av Mendizabal. 2014. UNAL-NLP: Combining soft cardinality features for semantic textual similarity, relatedness and entailment. SemEval 2014.

Kim, Yoon. 2014. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882.

Klein, Dan and Christopher D Manning. 2003. Accurate unlexicalized parsing. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1. Association for Computational Linguistics, pages 423–430.

Lai, Alice and Julia Hockenmaier. 2014. Illinois-LH: A denotational and distributional approach to semantics. SemEval 2014.

Landauer, Thomas K and Susan T Dumais. 1997. A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review 104(2):211.

Le, Quoc V and Tomas Mikolov. 2014. Distributed representations of sentences and documents. arXiv preprint arXiv:1405.4053.

Marelli, Marco, Luisa Bentivogli, Marco Baroni, Raffaella Bernardi, Stefano Menini, and Roberto Zamparelli. 2014. SemEval-2014 Task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment. In SemEval 2014.

Mikolov, Tomas. 2012. Statistical Language Models Based on Neural Networks. Ph.D. thesis, Brno University of Technology.

Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. pages 3111–3119.

Mitchell, Jeff and Mirella Lapata. 2010. Composition in distributional models of semantics. Cognitive Science 34(8):1388–1429.

Pennington, Jeffrey, Richard Socher, and Christopher D Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the Empirical Methods in Natural Language Processing (EMNLP 2014) 12.

Rumelhart, David E, Geoffrey E Hinton, and Ronald J Williams. 1988. Learning representations by back-propagating errors. Cognitive Modeling 5.

Socher, Richard, Brody Huval, Christopher D Manning, and Andrew Y Ng. 2012. Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Association for Computational Linguistics, pages 1201–1211.

Socher, Richard, Andrej Karpathy, Quoc V Le, Christopher D Manning, and Andrew Y Ng. 2014. Grounded compositional semantics for finding and describing images with sentences. Transactions of the Association for Computational Linguistics 2:207–218.

Socher, Richard, Cliff C Lin, Chris Manning, and Andrew Y Ng. 2011. Parsing natural scenes and natural language with recursive neural networks. In Proceedings of the 28th International Conference on Machine Learning (ICML-11). pages 129–136.

Socher, Richard, Alex Perelygin, Jean Y Wu, Jason Chuang, Christopher D Manning, Andrew Y Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).

Srivastava, Nitish, Ruslan R Salakhutdinov, and Geoffrey E Hinton. 2013. Modeling documents with deep Boltzmann machines. arXiv preprint arXiv:1309.6865.

Sutskever, Ilya, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems. pages 3104–3112.

Turian, Joseph, Lev Ratinov, and Yoshua Bengio. 2010. Word representations: a simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, pages 384–394.

Vinyals, Oriol, Alexander Toshev, Samy Bengio, and Dumitru Erhan. 2014. Show and tell: A neural image caption generator. arXiv preprint arXiv:1411.4555.

Yessenalina, Ainur and Claire Cardie. 2011. Compositional matrix-space models for sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pages 172–182.

Zaremba, Wojciech and Ilya Sutskever. 2014. Learning to execute. arXiv preprint arXiv:1410.4615.

Zhao, Jiang, Tian Tian Zhu, and Man Lan. 2014. ECNU: One stone two birds: Ensemble of heterogenous measures for semantic relatedness and textual entailment. SemEval 2014.