Mitch Marcus

RCA Professor of Artificial Intelligence Emeritus

Office Address:
Dept. of Computer & Information Science
University of Pennsylvania
3330 Walnut Street
503 Levine Hall
Philadelphia, PA 19104-6389


I'm the RCA Professor of Artificial Intelligence Emeritus in the Department of Computer and Information Science at the University of Pennsylvania. I received my Ph.D. in 1978 from the MIT Artificial Intelligence Lab, and was a Member of Technical Staff at AT&T Bell Laboratories before coming to Penn in 1987. I served as chair of Penn's Computer and Information Science Department, as chair of the Penn Faculty Senate, and as president of the Association for Computational Linguistics. I also served as chair of the Advisory Committee of the Center of Excellence in Human Language Technology at Johns Hopkins University and as Technical Advisor to that Center. I was named a Fellow of the American Association for Artificial Intelligence in 1992 and a Founding Fellow of the Association for Computational Linguistics in 2011.

Through the mid-1990s, I created and ran the Penn Treebank Project, which developed the primary training corpus that led to a breakthrough in the accuracy of natural language parsers. I was a co-PI on the OntoNotes Project and PI for an ARO-funded MURI project to investigate natural language understanding for human-robot interaction with co-PIs at Stanford, Cornell, UMass Amherst, UMass Lowell and George Mason. Until recently I was PI of a project on completely unsupervised morphology acquisition under DARPA LORELEI, working with Prof. Charles Yang in Linguistics. I then worked with a team at ISI Boston on simulations of child language acquisition funded by the DARPA GAILA AIX Program. My current research interests include statistical natural language processing and cognitively plausible models for automatic acquisition of linguistic structure.

My past PhD students have gone on to teach at such schools as MIT, Johns Hopkins, University of Arizona, Queens College, SUNY Stony Brook and the Naval Postgraduate School, and to work at such industrial research labs as Google, IBM Research, BBN Technologies, and Microsoft Research.

Selected Publications

Hongzhi Xu, Mitchell Marcus, Charles Yang and Lyle Ungar, Unsupervised morphology learning with statistical paradigms, Proceedings of the 27th International Conference on Computational Linguistics (COLING), pp. 44-54, 2018

Emily Pitler, Sampath Kannan, Mitchell Marcus, Finding optimal 1-endpoint-crossing trees, Transactions of the Association for Computational Linguistics (TACL), Vol. 1, pp. 13-24, 2013

Vasumathi Raman, Constantine Lignos, Cameron Finucane, Kenton CT Lee, Mitchell P. Marcus, and Hadas Kress-Gazit, Sorry Dave, I'm afraid I can't do that: Explaining unachievable robot tasks using natural language, Proceedings of Robotics: Science and Systems IX, 2013

Daniel J. Brooks, Constantine Lignos, Cameron Finucane, Mikhail S. Medvedev, Ian Perera, Vasumathi Raman, Hadas Kress-Gazit, Mitchell P. Marcus, Holly A Yanco, Make it so: Continuous, flexible natural language interaction with an autonomous robot, Proceedings of the Grounding Language for Physical Systems Workshop at the Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012

Qiuye Zhao and Mitch Marcus. Long tail distributions and Unsupervised learning of Morphology, Coling 2012

Qiuye Zhao and Mitch Marcus. Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation, Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 1054-1062, 2012

Constantine Lignos, Erwin Chan, Charles Yang, and Mitchell P. Marcus, Evidence for a morphological acquisition model from development data, Proceedings of the 34th Annual Boston University Conference on Language Development, 2, 269-280. 2010

Constantine Lignos, Erwin Chan, Mitchell P. Marcus, and Charles Yang, A rule-based acquisition model adapted for morphological analysis, Multilingual Information Access Evaluation I. Text Retrieval Experiments. Lecture Notes in Computer Science, 6241, 658-665. 2010

Q. Zhao, M. Marcus, A simple unsupervised learner for POS disambiguation rules given only a minimal lexicon, Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pp. 688-697, 2009

S. Pradhan, E. Hovy, M. Marcus, M. Palmer, L. Ramshaw, and R. Weischedel, OntoNotes: A Unified Relational Semantic Representation, International Journal of Semantic Computing, Vol. 1, No. 4, 2007

R. Gabbard, S. Kulick, M. Marcus, Fully Parsing the Penn Treebank, HLT-NAACL 2006, New York, New York

M. Marcus, B. Santorini, and M. Marcinkiewicz, Building a large annotated corpus of English: the Penn Treebank, Corpus Linguistics: Readings in a Widening Discipline, G. Sampson and D. McCarthy (eds.), Continuum, 2004. Also in Using Large Corpora, S. Armstrong (ed.), MIT Press, 1994. (reprinted from Computational Linguistics, 19(2), 1993)

M. Marcus (ed.), HLT 2002: Proceedings of the Second International Conference on Human Language Technology Research, Morgan Kaufmann, 2002

L. Ramshaw, M. Marcus, Text Chunking using Transformation-Based Learning, Natural Language Processing Using Very Large Corpora, Armstrong et al. (eds.), Kluwer, 1998

E. Brill, D. Magerman, M. Marcus, and B. Santorini, Deducing linguistic structure from the statistics of large corpora, Proceedings of DARPA Speech and Natural Language Workshop, June, 1990, Morgan-Kaufmann

D. Magerman, M. Marcus, Parsing a natural language using mutual information statistics, Proceedings of AAAI 90

M. Marcus, D. Hindle, and M. Fleck, D-Theory: Talking about talking about trees, Proceedings of the 21st Annual Meeting of the ACL, 1983

M. Marcus, A Theory of Syntactic Recognition for Natural Language, MIT Press, 1980

(Please note: Under many circumstances, I don't put my name on my students' papers. Please see my students' web sites for other current work.)

Former PhD Students

Former Postdocs