Mitchell P. Marcus

RCA Professor of Artificial Intelligence

Office Address:
Dept. of Computer & Information Science
University of Pennsylvania
3330 Walnut Street
503 Levine Hall
Philadelphia, PA 19104-6389

Tel: (215) 898-2538
FAX: (215) 898-0587

I'm the RCA Professor of Artificial Intelligence in the Department of Computer and Information Science at the University of Pennsylvania, where I'm also Professor of Linguistics. I received my Ph.D. in 1978 from the MIT Artificial Intelligence Lab, and was a Member of Technical Staff at AT&T Bell Laboratories before coming to Penn in 1987. I have served as chair of Penn's Computer and Information Science Department, as chair of the Penn Faculty Senate, and as president of the Association for Computational Linguistics. I currently serve as chair of the Advisory Committee of the Center of Excellence in Human Language Technology at Johns Hopkins University. I was named a Fellow of the American Association for Artificial Intelligence in 1992.

I created and ran the Penn Treebank Project through the mid-1990s, which developed the primary training corpus that led to a breakthrough in the accuracy of natural language parsers for unrestricted text. My collaborators and I continue to develop hand-annotated corpora for use worldwide as training materials for statistical natural language systems. I am currently the principal investigator for an ARO-funded MURI project to investigate natural language understanding for human-robot interaction, with co-PIs at Stanford, Cornell, UMass Amherst, UMass Lowell and George Mason. My research interests include statistical natural language processing, human-robot communication, and cognitively plausible models for automatic acquisition of linguistic structure.

My past PhD students have gone on to teach at such schools as MIT, Johns Hopkins, University of Arizona, Queens College and the Naval Postgraduate School, and to work at such industrial research labs as IBM Research, BBN Technologies, and Microsoft Research.

Current Projects

Situation Understanding Bot Through Language And Environment (SUBTLE)


Selected Publications

Vasumathi Raman, Constantine Lignos, Cameron Finucane, Kenton CT Lee, Mitchell P. Marcus, and Hadas Kress-Gazit, Sorry Dave, I'm afraid I can't do that: Explaining unachievable robot tasks using natural language, Proceedings of Robotics: Science and Systems IX, 2013

Daniel J. Brooks, Constantine Lignos, Cameron Finucane, Mikhail S. Medvedev, Ian Perera, Vasumathi Raman, Hadas Kress-Gazit, Mitchell P. Marcus, Holly A. Yanco, Make it so: Continuous, flexible natural language interaction with an autonomous robot, Proceedings of the Grounding Language for Physical Systems Workshop at the Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012

Qiuye Zhao and Mitch Marcus. Long tail distributions and Unsupervised learning of Morphology, Coling 2012.

Qiuye Zhao and Mitch Marcus. Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation, Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL), pages 1054--1062, 2012

Constantine Lignos, Erwin Chan, Charles Yang, and Mitchell P. Marcus, Evidence for a morphological acquisition model from development data, Proceedings of the 34th Annual Boston University Conference on Language Development, 2, 269-280. 2010

Constantine Lignos, Erwin Chan, Mitchell P. Marcus, and Charles Yang, A rule-based acquisition model adapted for morphological analysis, Multilingual Information Access Evaluation I. Text Retrieval Experiments. Lecture Notes in Computer Science, 6241, 658-665. 2010

Q. Zhao, M. Marcus, A simple unsupervised learner for POS disambiguation rules given only a minimal lexicon, Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pp. 688-697, 2009.

S. Pradhan, E. Hovy, M. Marcus, M. Palmer, L. Ramshaw, and R. Weischedel, OntoNotes: A Unified Relational Semantic Representation, International Journal of Semantic Computing, Vol. 1, No. 4, 2007.

R. Gabbard, S. Kulick, M. Marcus, Fully Parsing the Penn Treebank, HLT-NAACL 2006, New York, New York.

M. Marcus, B. Santorini, and M. Marcinkiewicz, Building a large annotated corpus of English: the Penn Treebank, Corpus Linguistics: Readings in a Widening Discipline, G. Sampson and D. McCarthy (eds.), Continuum, 2004. Also in Using Large Corpora, S. Armstrong (ed.), MIT Press, 1994. (reprinted from Computational Linguistics, 19(2), 1993)

M. Marcus (ed.), HLT 2002: Proceedings of the Second International Conference on Human Language Technology Research, Morgan Kaufmann, 2002.

L. Ramshaw, M. Marcus, Text Chunking using Transformation-Based Learning, Natural Language Processing Using Very Large Corpora, Armstrong et al. (eds.), Kluwer, 1998.

E. Brill, D. Magerman, M. Marcus, and B. Santorini, Deducing linguistic structure from the statistics of large corpora, Proceedings of DARPA Speech and Natural Language Workshop, June 1990, Morgan Kaufmann.

D. Magerman, M. Marcus, Parsing a natural language using mutual information statistics, Proceedings of AAAI 90.

M. Marcus, D. Hindle, and M. Fleck, D-Theory: Talking about talking about trees, Proceedings of the 21st Annual Meeting of the ACL, 1983.

M. Marcus, A Theory of Syntactic Recognition for Natural Language, MIT Press, 1980.

(Please note: Under many circumstances, I don't put my name on my students' papers. Please see my students' web sites for other current work.)

Former PhD Students