The following two survey papers are included for historical reasons; they describe the state of the art in machine learning for NLP in 1999 and 2005.

- C. Cardie and R. Mooney, Guest Editors’ Introduction: Machine Learning and Natural Language Processing. Machine Learning Journal. Special Issue on Natural Language Learning 1999
- P. Fung and D. Roth, Guest Editors’ Introduction: Machine Learning in Speech and Language Technologies. Machine Learning Journal, Special Issue on Natural Language Learning 2005

- A. Ng and M. Jordan, On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes, NIPS 2002
- D. Roth Learning to Resolve Natural Language Ambiguities: A Unified Approach AAAI 1998
- D. Roth Learning in Natural Language IJCAI 1999

- S. Har-Peled, D. Roth and D. Zimak, Constraint Classification for Multiclass Classification and Ranking NIPS 2003
- K. Crammer and Y. Singer, Ultraconservative Online Algorithms for Multiclass Problems, JMLR 2003
- Y. Even-Zohar and D. Roth, A Sequential Model for Multi-Class Classification, EMNLP 2001
- X. Li and D. Roth, Learning Question Classifiers: The Role of Semantic Information, NLE 2005
- M. Gupta, S. Bengio and J. Weston, Training Highly Multiclass Classifiers JMLR 2014

- Chapters 9-10 of Manning and Schütze, Foundations of Statistical Natural Language Processing
- Y. Bengio, Markovian Models for Sequential Data, Neural Computing Surveys 1999.
- L. R. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, IEEE 1989.

- **♥** V. Punyakanok and D. Roth, The Use of Classifiers in Sequential Inference, NIPS 2001
- A. McCallum, D. Freitag, and F. Pereira, Maximum entropy Markov models for information extraction and segmentation, ICML 2000.

- **♥** J. Lafferty, A. McCallum, F. Pereira, Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, ICML 2002
- C. Sutton and A. McCallum, An Introduction to Conditional Random Fields for Relational Learning, in Introduction to Statistical Relational Learning, 2007
- **♥** A. F. T. Martins, N. A. Smith, P. M. Q. Aguiar, and M. A. T. Figueiredo, Structured Sparsity in Structured Prediction, EMNLP 2011
- T. Vieira, R. Cotterell and J. Eisner, Speed-Accuracy Tradeoffs in Tagging with Variable-Order CRFs and Structured Sparsity, EMNLP 2016

- M. Collins, Discriminative Training for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms EMNLP 2002.
- **♥** K. Crammer, Y. Singer, Pranking with Ranking, NIPS 2002.
- **♥** L. Huang, S. Fayong, Y. Guo, Structured Perceptron with Inexact Search, NAACL 2012.
- **♥** R. McDonald, K. Hall, G. Mann, Distributed Training Strategies for the Structured Perceptron, NAACL 2010.

**BONUS**: To learn how to implement the averaged perceptron efficiently (without storing every intermediate weight vector), refer to Fig. 2.3 on page 19 of Hal Daumé III's thesis.
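
As a rough sketch of the idea (the binary setting, list-based vectors, and names such as `averaged_perceptron`, `u`, and `beta` are assumptions for illustration, not taken from the thesis), the trick keeps a second "cached" weight vector that accumulates every update scaled by a running example counter `c`; the averaged weights are then recovered with a single subtraction at the end:

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector, label in {-1, +1})


def averaged_perceptron(data: List[Example], n_features: int,
                        epochs: int = 10) -> Tuple[List[float], float]:
    """Binary averaged perceptron using a cached-weights trick (sketch).

    Each update is added to w as usual and also to u scaled by the counter c.
    At the end, w - u / c equals the average of the weight vectors
    w_0, w_1, ..., w_T (w_0 = 0, w_t = weights after the t-th example),
    so no history of weight vectors is ever stored.
    """
    w = [0.0] * n_features  # current weights
    u = [0.0] * n_features  # cached weights: sum of c * y * x over updates
    b, beta = 0.0, 0.0      # current bias and its cached counterpart
    c = 1                   # example counter, incremented after every example
    for _ in range(epochs):
        for x, y in data:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # mistake (or zero margin): perceptron update
                for i, xi in enumerate(x):
                    w[i] += y * xi
                    u[i] += c * y * xi
                b += y
                beta += c * y
            c += 1
    avg_w = [wi - ui / c for wi, ui in zip(w, u)]
    return avg_w, b - beta / c


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Sign of the linear score, with ties broken toward -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


data = [([1.0, 0.0], 1), ([0.0, 1.0], -1)]
w, b = averaged_perceptron(data, n_features=2, epochs=1)
# w ≈ [0.667, -0.333], b ≈ 0.333: the mean of w_0 = 0, w_1, and w_2
```

The cost per mistake stays proportional to the number of features, exactly as in the plain perceptron; the naive alternative of summing a full copy of `w` after every example would cost O(n_features) per example even when no update occurs.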

- C. Burges A Tutorial on Support Vector Machines for Pattern Recognition, 1998
- **♥** B. Taskar, C. Guestrin and D. Koller, Max-Margin Markov Networks, NIPS 2003
- **♥** I. Tsochantaridis, T. Hofmann, T. Joachims, Y. Altun, Large Margin Methods for Structured and Interdependent Output Variables, JMLR 2005

- D. Roth and W. Yih, A Linear Programming Formulation for Global Inference in Natural Language Tasks. CoNLL 2004
- **♥** D. Roth and W. Yih, Global Inference for Entity and Relation Identification via a Linear Programming Formulation. Introduction to Statistical Relational Learning, 2007
- M. Richardson and P. Domingos, Markov Logic Networks, Machine Learning Journal 2006

**BONUS**: To learn how to convert Boolean constraints into ILP constraints, refer to:

- W. Yih Global Inference Using Integer Linear Programming Technical Report 2004.
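
For a quick illustration, the sketch below writes a few common Boolean constraints over 0/1 indicator variables as linear (in)equalities, using the standard textbook linearizations (which may differ in detail from the formulations in Yih's report), and checks each encoding by brute-force enumeration. The helper names (`conj`, `disj`, `neg`, `implies`, `exactly_one`, `feasible_points`) and the tuple representation of constraints are hypothetical; in practice these constraints would be handed to an ILP solver rather than enumerated.

```python
import operator
from itertools import product
from typing import Dict, List, Sequence, Tuple

# A linear constraint over 0/1 variables: ({var: coefficient}, sense, rhs),
# read as  sum(coefficient * var) <sense> rhs.
Constraint = Tuple[Dict[str, int], str, int]
SENSES = {"<=": operator.le, ">=": operator.ge, "==": operator.eq}


def conj(z: str, x: str, y: str) -> List[Constraint]:
    """z <-> (x AND y):  z <= x,  z <= y,  z >= x + y - 1."""
    return [({z: 1, x: -1}, "<=", 0),
            ({z: 1, y: -1}, "<=", 0),
            ({z: 1, x: -1, y: -1}, ">=", -1)]


def disj(z: str, x: str, y: str) -> List[Constraint]:
    """z <-> (x OR y):  z >= x,  z >= y,  z <= x + y."""
    return [({z: 1, x: -1}, ">=", 0),
            ({z: 1, y: -1}, ">=", 0),
            ({z: 1, x: -1, y: -1}, "<=", 0)]


def neg(z: str, x: str) -> List[Constraint]:
    """z <-> NOT x:  z + x == 1."""
    return [({z: 1, x: 1}, "==", 1)]


def implies(x: str, y: str) -> List[Constraint]:
    """x -> y:  x <= y."""
    return [({x: 1, y: -1}, "<=", 0)]


def exactly_one(xs: Sequence[str]) -> List[Constraint]:
    """Exactly one of xs is true (e.g. one label per token):  sum(xs) == 1."""
    return [({x: 1 for x in xs}, "==", 1)]


def satisfied(constraints: List[Constraint], assignment: Dict[str, int]) -> bool:
    """True if the 0/1 assignment meets every linear constraint."""
    return all(SENSES[sense](sum(c * assignment[v] for v, c in coeffs.items()), rhs)
               for coeffs, sense, rhs in constraints)


def feasible_points(constraints: List[Constraint],
                    variables: Sequence[str]) -> List[Dict[str, int]]:
    """Brute-force every 0/1 assignment and keep the feasible ones."""
    points = []
    for bits in product((0, 1), repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        if satisfied(constraints, assignment):
            points.append(assignment)
    return points


# The feasible 0/1 points of conj(...) are exactly the 4 rows of the AND truth table.
for p in feasible_points(conj("z", "x", "y"), ["x", "y", "z"]):
    print(p)
```

Because each encoding admits exactly the satisfying assignments of its Boolean formula, the constraints can be conjoined freely with a linear objective (e.g. classifier scores) without changing which outputs are legal.
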
#### Applications

- **♥** J. Clarke and M. Lapata, Constraint-Based Sentence Compression: An Integer Programming Approach, COLING/ACL 2006
- **♥** S. Riedel and J. Clarke, Incremental Integer Linear Programming for Non-projective Dependency Parsing, EMNLP 2006
- **♥** J. Clarke and M. Lapata, Global Inference for Sentence Compression: An Integer Linear Programming Approach, JAIR 2008
- **♥** A. F. T. Martins, N. A. Smith, and E. P. Xing, Concise Integer Linear Programming Formulations for Dependency Parsing, ACL 2009
- **♥** Y. Choi and C. Cardie, Adapting a Polarity Lexicon Using Integer Linear Programming for Domain-Specific Sentiment Classification, EMNLP 2009
- **♥** X. Cheng and D. Roth, Relational Inference for Wikification, EMNLP 2013.

- **♥** V. Punyakanok, D. Roth, W. Yih, and D. Zimak, Learning and Inference over Constrained Output, IJCAI 2005
- **♥** D. Roth, W. Yih, Integer Linear Programming Inference for Conditional Random Fields, ICML 2005.

- V. Srikumar and C. Manning Learning Distributed Representations for Structured Output Prediction. NIPS 2014

- **♥** B. Taskar, D. Klein, M. Collins, D. Koller and C. Manning, Max-Margin Parsing, EMNLP 2004
- **♥** M. Collins, Discriminative Reranking for Natural Language Parsing, ICML 2000
- **♥** R. Johansson and P. Nugues, Dependency-based Semantic Role Labeling of PropBank, EMNLP 2008
- **♥** V. Punyakanok, D. Roth and W. Yih, The Importance of Syntactic Parsing and Inference in Semantic Role Labeling, Computational Linguistics 2008.
- **♥** Y. Yang and M-W. Chang, S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking, ACL 2015.
- **♥** K.-W. Chang, R. Samdani and D. Roth, A Constrained Latent Variable Model for Coreference Resolution, EMNLP 2013.

- M. Chang, L. Ratinov, N. Rizzolo and D. Roth, Learning and Inference with Constraints AAAI 2008.
- **♥** M. Chang, L. Ratinov, and D. Roth, Guiding Semi-Supervision with Constraint-Driven Learning, ACL 2007.
- **♥** K. Ganchev, J. Graca, J. Gillenwater and B. Taskar, Posterior Regularization for Structured Latent Variable Models, JMLR 2010.
- **♥** K. Hall, R. McDonald, J. Katz-Brown and M. Ringgaard, Training Dependency Parsers by Jointly Optimizing Multiple Objectives, EMNLP 2011.

- **♥** M. Chang, D. Goldwasser, D. Roth and V. Srikumar, Discriminative Learning over Constrained Latent Representations, NAACL 2010.
- **♥** Chun-Nam John Yu and T. Joachims, Learning Structural SVMs with Latent Variables, ICML 2009.
- A. McCallum, K. Bellare and F. Pereira, A Conditional Random Field for Discriminatively-trained Finite-state String Edit Distance, UAI 2005.
- **♥** X. Sun, T. Matsuzaki, D. Okanohara and J. Tsujii, Latent Variable Perceptron Algorithm for Structured Classification, IJCAI 2009.
- T. Matsuzaki, Y. Miyao, J. Tsujii, Probabilistic CFG with Latent Annotations, ACL 2005
- **♥** R. Collobert and J. Weston, A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning, ICML 2008.
- S. Petrov, L. Barrett, R. Thibaux and D. Klein, Learning Accurate, Compact, and Interpretable Tree Annotation, COLING/ACL 2006
- P. Liang, S. Petrov, M. Jordan, and D. Klein, The Infinite PCFG using Hierarchical Dirichlet Processes, EMNLP 2007

- **♥** M. Chang, V. Srikumar, D. Goldwasser and D. Roth, Structured Output Learning with Indirect Supervision, ICML 2010.
- **♥** N. A. Smith and J. Eisner, Contrastive Estimation: Training Log-Linear Models on Unlabeled Data, ACL 2005.
- **♥** G. S. Mann and A. McCallum, Generalized Expectation Criteria for Semi-Supervised Learning with Weakly Labeled Data, JMLR 2010.

- **♥** T. Finley, T. Joachims, Training Structural SVMs when Exact Inference is Intractable, ICML 2008.
- **♥** C. Sutton and A. McCallum, Piecewise Pseudolikelihood for Efficient Training of Conditional Random Fields, ICML 2007
- **♥** T. Joachims, T. Finley, Chun-Nam Yu, Cutting-Plane Training of Structural SVMs, Machine Learning 2009.
- **♥** T. Koo, A. M. Rush, M. Collins, T. Jaakkola, and D. Sontag, Dual Decomposition for Parsing with Non-Projective Head Automata, EMNLP 2010.
- **♥** V. Srikumar, G. Kundu and D. Roth, On Amortizing Inference Cost for Structured Prediction, EMNLP 2012.

- **♥** H. Daumé III, J. Langford, and D. Marcu, Search-based Structured Prediction, Machine Learning 2009
- J. R. Doppa, A. Fern and P. Tadepalli, HC-Search: A Learning Framework for Search-based Structured Prediction, JAIR 2014
- K.-W. Chang, A. Krishnamurthy, A. Agarwal, H. Daumé III, J. Langford, Learning to Search Better Than Your Teacher ICML 2015
- T. Vieira and J. Eisner, Learning to Prune: Exploring the Frontier of Fast and Accurate Parsing TACL 2017

- Y. Goldberg A Primer on Neural Network Models for Natural Language Processing. JAIR 2016.

- R. Socher, J. Bauer, C. D. Manning, and A. Y. Ng, Parsing With Compositional Vector Grammars, ACL 2013.
- I. Sutskever, O. Vinyals, and Q. V. Le, Sequence to Sequence Learning with Neural Networks, NIPS 2014.
- **♥** S. Wiseman and A. M. Rush, Sequence-to-Sequence Learning as Beam-Search Optimization, EMNLP 2016.
- **♥** A. Karpathy, A. Joulin, and L. Fei-Fei, Deep Fragment Embeddings for Bidirectional Image Sentence Mapping, NIPS 2014.
- **♥** L. Kong, C. Dyer, N. A. Smith, Segmental Recurrent Neural Networks, ICLR 2016.
- **♥** L. Yu, P. Blunsom, C. Dyer, E. Grefenstette, T. Kocisky, The Neural Noisy Channel, ICLR 2017.
- **♥** Y. Kim, C. Denton, L. Hoang, A. M. Rush, Structured Attention Networks, ICLR 2017.
- **♥** E. Kiperwasser, Y. Goldberg, Easy-First Dependency Parsing with Hierarchical Tree LSTMs, TACL 2016.