Topics & Papers

We’re broadly following the structure of Sebastian Ruder’s Review of the Neural History of NLP. That blog post provides a useful narrative around the individual papers we will discuss, as well as links to useful background resources, so be sure to read it first.

You should be able to find all of the papers just by dropping the title into the search engine of your choice. But do make sure that you actually get the version specified here. Sometimes there are conference paper versions and later journal paper versions with the same title.

01 Language Modelling with Neural Networks

Feed Forward

Main:

Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C. (2003). A Neural Probabilistic Language Model. The Journal of Machine Learning Research, 3, 1137–1155. Selected for Week 03.

Background:

Bengio, Y., Ducharme, R., & Vincent, P. (2001). A neural probabilistic language model. In NIPS.

Recurrent

Main:

Graves, A. (2013). Generating sequences with recurrent neural networks. ArXiv Preprint ArXiv:1308.0850, 1–43. Selected for Week 04.

Background:

Melis, G., Dyer, C., & Blunsom, P. (2018). On the State of the Art of Evaluation in Neural Language Models. In ICLR (pp. 1–10).

02 Multitask Learning

Main:

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural Language Processing (Almost) from Scratch. Journal of Machine Learning Research, 12, 2493–2537.

Background:

Collobert, R., & Weston, J. (2008). A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning. In ICML.

03 Word Representations

word2vec

Main:

Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. In ICLR (pp. 1–12). Selected for Week 05.

Background:

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed Representations of Words and Phrases and their Compositionality. In NIPS (pp. 1–9).

Levy, O., & Goldberg, Y. (2014). Neural Word Embedding as Implicit Matrix Factorization. In NIPS 2014 (pp. 1–9).

multilingual

Main:

Ruder, S., Vulić, I., & Søgaard, A. (2017). A Survey Of Cross-lingual Word Embedding Models. ArXiv, 1–55.

04 Convolutional Networks for NLP

Main 1:

Kalchbrenner, N., Grefenstette, E., & Blunsom, P. (2014). A Convolutional Neural Network for Modelling Sentences. In ACL (pp. 655–665).

Background 1:

Kalchbrenner, N., Espeholt, L., Simonyan, K., Oord, A. van den, Graves, A., & Kavukcuoglu, K. (2016). Neural Machine Translation in Linear Time. ArXiv.

Main 2:

Kim, Y. (2014). Convolutional Neural Networks for Sentence Classification. In EMNLP.

05 Trees

Recursive NNs

Main:

Socher, R., Perelygin, A., Wu, J. Y., Chuang, J., Manning, C. D., Ng, A. Y., & Potts, C. (2013). Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. In EMNLP 2013.

Tai, K. S., Socher, R., & Manning, C. D. (2015). Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. In ACL.

RNN Grammars

Main:

Dyer, C., Kuncoro, A., Ballesteros, M., & Smith, N. A. (2016). Recurrent Neural Network Grammars. In NAACL (pp. 199–209).

Background:

Kuncoro, A., Ballesteros, M., Kong, L., Dyer, C., Neubig, G., & Smith, N. A. (2017). What Do Recurrent Neural Network Grammars Learn About Syntax? In EACL (Vol. 1, pp. 1249–1258).

06 Sequence to Sequence

Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to Sequence Learning with Neural Networks. In NIPS 2014 (pp. 1–9).

Gillick, D., Brunk, C., Vinyals, O., & Subramanya, A. (2016). Multilingual Language Processing From Bytes. In NAACL (pp. 1296–1306).

07 Attention

Main:

Bahdanau, D., Cho, K., & Bengio, Y. (2015). Neural Machine Translation by Jointly Learning to Align and Translate. In ICLR 2015 (pp. 1–15).

Main:

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention Is All You Need. In NIPS.

08 Memory

Main:

Kumar, A., Irsoy, O., Su, J., Bradbury, J., English, R., Pierce, B., … Socher, R. (2016). Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. ArXiv, 1–10.

09 What do these things learn?

Main:

Blevins, T., Levy, O., & Zettlemoyer, L. (2018). Deep RNNs Encode Soft Hierarchical Syntax. In ACL (pp. 14–19).

Kuncoro, A., Dyer, C., Hale, J., Yogatama, D., Clark, S., & Blunsom, P. (2018). LSTMs Can Learn Syntax-Sensitive Dependencies Well, But Modeling Structure Makes Them Better. In ACL (pp. 1–11).

10 Sesame Street (Massive Pretraining for Word and Sentence Representations)

Main:

Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. In NAACL 2018 (pp. 2227–2237).

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. ArXiv.

Background:

Tenney, I., Xia, P., Chen, B., Wang, A., Poliak, A., McCoy, R. T., … Pavlick, E. (2019). What do you learn from context? Probing for sentence structure in contextualized word representations. In ICLR 2019 (pp. 1–17).

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language Models are Unsupervised Multitask Learners.

11 Multitask II

Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., & Bowman, S. R. (2019). GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In ICLR 2019 (pp. 1–20).

Ruder, S., Bingel, J., Augenstein, I., & Søgaard, A. (2019). Latent Multi-task Architecture Learning. In AAAI 2019.