MDye | Fall 2014

Discussion Group: Quantitative Approaches to Language

Helpful References
Manning, C.D. & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.
Jurafsky, D. & Martin, J.H. (2008). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (2nd ed.). Upper Saddle River, NJ: Prentice Hall.
Cover, T.M. & Thomas, J.A. (2006). Elements of Information Theory (2nd ed.). Hoboken, NJ: Wiley.
Manning, C.D., Raghavan, P. & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge: Cambridge University Press.

1. Theoretical & Mathematical Foundations
Abney, S. P. (1996). Statistical methods and linguistics. In J. L. Klavans & P. Resnik (eds.), The Balancing Act: Combining Symbolic and Statistical Approaches to Language (pp. 1–26). Cambridge, MA: MIT Press.
Norvig, P. (2011). On Chomsky and the Two Cultures of Statistical Learning. Web log post, http://norvig.com/chomsky.html.
Harris, Z. (1991). A Theory of Language and Information: A Mathematical Approach. Oxford: Clarendon Press.*
Wittgenstein, L. Philosophical Investigations.*

2. Bags of Words
Goldsmith, J.H. (2007). Probability for linguists.
Baroni, M. (2006). Distributions in texts.
Gilquin, G., & Gries, S. T. (2009). Corpora and experimental methods: A state-of-the-art review. Corpus Linguistics and Linguistic Theory, 5(1), 1–26.

3. Zipf's Law
Zipf, G.K. (1949). Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Cambridge, MA: Addison-Wesley.
Mandelbrot, B. (1953). An informational theory of the statistical structure of language. In W. Jackson (ed.), Communication Theory. London: Butterworths.
Ferrer i Cancho, R., & Solé, R. V. (2003). Least effort and the origins of scaling in human language. Proceedings of the National Academy of Sciences, 100(3), 788–791.
Lü, L., Zhang, Z.-K., & Zhou, T. (2010). Zipf's Law Leads to Heaps' Law: Analyzing Their Relation in Finite-Size Systems. PLoS ONE, 5(12), e14139.
Barabási, A.-L. (2005). The origin of bursts and heavy tails in human dynamics. Nature, 435, 207–211.
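
As orientation for this unit: Zipf's law says that if f(r) is the frequency of the r-th most frequent word, then f(r) is roughly C / r^a with a near 1, so log frequency falls approximately linearly in log rank. A minimal sketch of how one might check this (illustrative only; "sample.txt" is a placeholder for any plain-text corpus):

    # Estimate the Zipf exponent by regressing log frequency on log rank.
    import math
    import re
    from collections import Counter

    with open("sample.txt", encoding="utf-8") as f:
        words = re.findall(r"[a-z']+", f.read().lower())

    freqs = sorted(Counter(words).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(fr) for fr in freqs]

    # Least-squares slope; Zipf's law predicts a value near -1.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    print(f"estimated Zipf exponent: {-slope:.2f}")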

4. The Statistics of Texts
Baayen, R. H. (1996). The randomness assumption in word frequency statistics. Research in Humanities Computing, 5, 17–31. Oxford: Oxford University Press.
Evert, S. (2006). How Random is a Corpus? The Library Metaphor. Zeitschrift für Anglistik und Amerikanistik, 54(2).
Church, K.W., & Gale, W.A. (1995a). Poisson mixtures. Natural Language Engineering, 1(2), 163–190.
Serrano, M. Á., Flammini, A., & Menczer, F. (2009). Modeling Statistical Properties of Written Text. PLoS ONE, 4(4), e5372.
Altmann, E. G., Pierrehumbert, J. B., & Motter, A. E. (2009). Beyond Word
Frequency: Bursts, Lulls, and Scaling in the Temporal Distributions of
Words. PLoS ONE, 4(11), e7678.
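
A recurring point in these papers (especially Church & Gale) is that word occurrences are burstier than a Poisson process predicts: under a Poisson model, a word with rate λ = total count / N documents should appear in N(1 − e^(−λ)) documents, and content words reliably appear in fewer. A toy sketch of that comparison (all numbers invented for illustration):

    import math

    N = 1000           # number of documents (toy value)
    cf = 200           # collection frequency of the word (toy value)
    df_observed = 80   # documents actually containing it (toy value)

    lam = cf / N
    df_poisson = N * (1 - math.exp(-lam))   # expected df if occurrences were Poisson
    print(f"Poisson expects df of about {df_poisson:.0f}; observed df = {df_observed}")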

5. Insights from Information Retrieval
Spärck Jones, K. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28(1), 11–21.
Church, K.W., & Gale, W.A. (1995b). Inverse Document Frequency (IDF): A Measure of Deviation from Poisson. In Proceedings of the Third Workshop on Very Large Corpora (pp. 121–130).
Aizawa, A. (2003). An information-theoretic perspective of tf-idf measures. Information Processing & Management, 39(1), 45–65.
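
The common thread here is the idf weight, idf(t) = log(N / df(t)): large for terms confined to few documents, zero for terms appearing in every document, and combined multiplicatively with term frequency. A from-scratch sketch on a toy corpus (no particular weighting variant from these papers is implied):

    import math
    from collections import Counter

    docs = ["the cat sat on the mat",
            "the dog sat on the log",
            "the retrieval of rare terms"]
    tokenized = [d.split() for d in docs]
    N = len(tokenized)

    # Document frequency: how many documents contain each term.
    df = Counter(t for doc in tokenized for t in set(doc))

    def tf_idf(doc):
        tf = Counter(doc)
        return {t: tf[t] * math.log(N / df[t]) for t in tf}

    print(tf_idf(tokenized[0]))   # "the" scores 0; "cat" and "mat" score log(3)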

6. Distributional Approaches 1
Salton, G., Wong, A., & Yang, C. S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613–620.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211–240.
Lund, K., & Burgess, C. (1996). Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments, & Computers, 28(2), 203–208.
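
These models all reduce to comparing high-dimensional co-occurrence count vectors, almost always by cosine similarity. A minimal HAL-style sketch (window size and corpus are arbitrary illustrative choices):

    import math
    from collections import Counter, defaultdict

    corpus = "the cat chased the mouse and the dog chased the cat".split()
    window = 2

    # Count context words within +/- window positions of each token.
    cooc = defaultdict(Counter)
    for i, w in enumerate(corpus):
        for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
            if i != j:
                cooc[w][corpus[j]] += 1

    def cosine(u, v):
        dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
        norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
        return dot / norm if norm else 0.0

    print(cosine(cooc["cat"], cooc["dog"]))   # high: similar distributional contexts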

7. Distributional Approaches 2
Bullinaria, J. A., & Levy, J. P. (2007). Extracting semantic representations from word co-occurrence statistics: A computational study. Behavior Research Methods, 39(3), 510–526.
Recchia, G., & Jones, M. N. (2009). More data trumps smarter algorithms: Comparing pointwise mutual information with latent semantic analysis. Behavior Research Methods, 41(3), 647–656.
Andrews, M., Vigliocco, G., & Vinson, D. (2009). Integrating experiential and distributional data to learn semantic representations. Psychological Review, 116(3), 463–498.
Lapesa, G., Evert, S., & Schulte im Walde, S. (2014). Contrasting Syntagmatic and Paradigmatic Relations: Insights from Distributional Semantic Models. In Proceedings of the Third Joint Conference on Lexical and Computational Semantics (*SEM 2014).
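
Several of these studies (e.g., Bullinaria & Levy; Recchia & Jones) weight raw co-occurrence counts by pointwise mutual information, PMI(w, c) = log[ p(w, c) / (p(w) p(c)) ], usually clipped at zero ("positive PMI"), which damps the influence of high-frequency context words. A sketch over a stand-in count table:

    import math
    from collections import Counter

    # Toy (word, context) counts standing in for a real co-occurrence matrix.
    counts = Counter({("cat", "chased"): 4, ("dog", "chased"): 3,
                      ("cat", "the"): 10, ("dog", "the"): 9})
    total = sum(counts.values())
    pw, pc = Counter(), Counter()
    for (w, c), n in counts.items():
        pw[w] += n
        pc[c] += n

    def ppmi(w, c):
        pmi = math.log((counts[(w, c)] / total) / ((pw[w] / total) * (pc[c] / total)))
        return max(0.0, pmi)

    print(ppmi("cat", "chased"))  # small positive: informative context
    print(ppmi("cat", "the"))     # clipped to 0: frequent, uninformative context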

8. Statistical Language Modeling 1
Rosenfeld, R. (2000). Two decades of statistical language modeling: Where do we go from here? Proceedings of the IEEE, 88(8), 1270–1278.
Chen, S.F. (1996). Building Probabilistic Models for Natural Language (Unpublished doctoral dissertation). Harvard University.
Goodman, J.T. (2001). A Bit of Progress in Language Modeling. Technical Report MSR-TR-2001-72, Microsoft Research.
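
These readings all concern estimating p(word | history) from counts and smoothing the estimates so that unseen events keep nonzero probability. Roughly the simplest baseline the Chen and Goodman comparisons improve on is a bigram model with add-one (Laplace) smoothing, sketched below on a toy corpus:

    from collections import Counter

    corpus = "the cat sat . the dog sat . the cat ran .".split()
    bigrams = Counter(zip(corpus, corpus[1:]))
    unigrams = Counter(corpus)
    V = len(unigrams)   # vocabulary size

    # Add-one smoothing: p(w2 | w1) = (c(w1, w2) + 1) / (c(w1) + V)
    def prob(w1, w2):
        return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

    print(prob("the", "cat"))   # seen bigram: 3/9
    print(prob("the", "ran"))   # unseen bigram: 1/9, not zero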

9. Statistical Language Modeling 2
Brown, P. F., deSouza, P. V., Mercer, R. L., Della Pietra, V. J., & Lai, J. C. (1992). Class-based n-gram models of natural language. Computational Linguistics, 18(4), 467–479.
Bellegarda, J. R. (2000). Exploiting latent semantic information in statistical language modeling. Proceedings of the IEEE, 88(8), 1279–1296.
Dagan, I., Lee, L., & Pereira, F. C. N. (1999). Similarity-Based Models of Word Cooccurrence Probabilities. Machine Learning, 34, 43–69.

10. Topic Models
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(4–5), 993–1022.
Madsen, R. E., Kauchak, D., & Elkan, C. (2005). Modeling word burstiness using the Dirichlet distribution. In Proceedings of the 22nd International Conference on Machine Learning (pp. 545–552).
Griffiths, T. L., Steyvers, M., & Tenenbaum, J. B. (2007). Topics in semantic representation. Psychological Review, 114(2), 211–244.
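
For hands-on intuition, a small LDA model can be fit in a few lines with scikit-learn, assuming a recent version is installed; the corpus and hyperparameters below are toy choices, not anything from these papers:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = ["the cat chased the mouse",
            "dogs and cats make good pets",
            "stocks fell as markets reacted to rate cuts",
            "investors bought bonds and stocks"]

    vec = CountVectorizer()
    counts = vec.fit_transform(docs)   # document-term count matrix

    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
    terms = vec.get_feature_names_out()
    for k, topic in enumerate(lda.components_):
        top = topic.argsort()[-5:][::-1]   # five highest-weight terms per topic
        print(f"topic {k}:", [terms[i] for i in top])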

11. Networks
Barabási, A.-L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512.
Newman, M. (2005). Power laws, Pareto distributions and Zipf's law. Contemporary Physics, 46(5), 323–351.
Steyvers, M., & Tenenbaum, J. B. (2005). The large-scale structure of semantic networks: Statistical analyses and a model of semantic growth. Cognitive Science, 29(1), 41–78.
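
The Barabási–Albert mechanism is easy to state in code: each new node attaches to an existing node with probability proportional to that node's degree, and a heavy-tailed degree distribution emerges. A bare-bones sketch (one edge per new node, for brevity):

    import random
    from collections import Counter

    random.seed(0)
    endpoints = [0, 1]   # multiset of edge endpoints: a node's degree = its multiplicity
    for new in range(2, 2000):
        old = random.choice(endpoints)   # uniform over endpoints = degree-proportional
        endpoints += [new, old]

    degrees = Counter(endpoints)
    print("largest degrees:", sorted(degrees.values(), reverse=True)[:5])
    # A few hubs dominate, as the model predicts.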

12. Information Theory: Conceptual Foundations 1
Denning, P. J., & Bell, T. (2012). The Information Paradox. American
Scientist, 100(6), 470.
DeDeo, S. (2012). Information Theory for Intelligent People.
Campbell, J. (1982). Grammatical man: Information, entropy, language,
and life.*
Gleick, J. (2011). The Information: A History, A Theory, A Flood.*

13. Information Theory: Conceptual Foundations 2
Shannon, C.E. (1951). Prediction and Entropy of Printed English. Bell System Technical Journal, 30(1), 50–64.
Shannon, C.E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal, 27, 379–423, 623–656.
Pereira, F. (2000). Formal grammar and information theory: together again? Philosophical Transactions of the Royal Society of London A, 358(1769), 1239–1253.
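
Shannon's central quantity is the entropy of a source, H = −Σ p(x) log2 p(x); the 1951 paper estimates it for English letter by letter. The zeroth-order (unigram) estimate sketched below is only the easy upper bound, since it ignores the dependencies between letters that Shannon's guessing game exploits:

    import math
    from collections import Counter

    text = "information theory treats language as a stochastic source"
    counts = Counter(text)
    total = sum(counts.values())

    # H = -sum p(x) * log2 p(x), estimated from character frequencies.
    H = -sum((n / total) * math.log2(n / total) for n in counts.values())
    print(f"unigram entropy: {H:.2f} bits per character")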

14. Information Theory: Entropy Rate
Genzel, D., & Charniak, E. (2002). Entropy rate constancy in text. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (pp. 199–206). Morristown, NJ: Association for Computational Linguistics.
Aylett, M., & Turk, A. (2004). The Smooth Signal Redundancy Hypothesis: A Functional Explanation for Relationships between Redundancy, Prosodic Prominence, and Duration in Spontaneous Speech. Language and Speech, 47(1), 31–56.
Qian, T., & Jaeger, T. F. (2012). Cue Effectiveness in Communicatively Efficient Discourse Production. Cognitive Science, 36(7), 1312–1336.
Pellegrino, F., Coupé, C., & Marsico, E. (2011). A cross-language perspective on speech information rate. Language, 87(3), 539–558.

15. Information Theory: Communicative Repertoires
Piantadosi, S. T., Tily, H., & Gibson, E. (2012). The communicative function of ambiguity in language. Cognition, 122(3), 280–291.
Baddeley, R., & Attewell, D. (2009). The Relationship Between Language and the Environment: Information Theory Shows Why We Have Only Three Lightness Terms. Psychological Science, 20(9), 1100–1107.
McCowan, B., Doyle, L.R., & Hanser, S.F. (2002). Using Information Theory to Assess the Diversity, Complexity, and Development of Communicative Repertoires. Journal of Comparative Psychology, 116(2), 166–172.

16. Information Theory: Processing & Production
Bell, A., Brenier, J. M., Gregory, M., Girand, C., & Jurafsky, D. (2008). Predictability effects on durations of content and function words in conversational English. Journal of Memory and Language, 60(1), 92–111.
Jaeger, T. F. (2010). Redundancy and reduction: Speakers manage syntactic information density. Cognitive Psychology, 61(1), 23–62.
Frank, S.L. & Bod, R. (2011). Insensitivity of the human sentence-processing system to hierarchical structure. Psychological Science, 22, 829–834.
Moscoso del Prado Martín, F., Kostić, A., & Baayen, R.H. (2004). Putting the bits together: An information theoretical perspective on morphological processing. Cognition, 94, 1–18.
