Machine Learning and Natural Language

Fall 2005

Experimental Assignment II. Classification: Prepositional Phrase Attachment (Due 11/15/05)


Prepositional Phrase Attachment

Prepositional phrase attachment is a common cause of structural ambiguity in natural language. The task is to decide whether the Prepositional Phrase (PP) attaches to the Noun Phrase (NP) "the car", as in "Buy the car with the steering wheel", or to the Verb Phrase (VP) headed by "buy", as in "Buy the car with his money".

Earlier work on this problem (see the class lecture page for pointers to several relevant papers) used as input the four head words involved in the attachment: the VP head, the first NP head, the preposition, and the head of the second NP (i.e., buy, car, with, and steering wheel, respectively).

These 4-tuples, together with the attachment decision, constitute the labeled examples and will be the main input you are given in this problem set. The goal of this problem set is for you to train a classifier that does as good a job as possible at resolving this ambiguity. In the course of doing so, you will have a chance to
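To make the input concrete, here is a minimal Python sketch of one labeled 4-tuple and a reader for it. It assumes a hypothetical whitespace-separated line layout `verb noun1 prep noun2 label`, with label `N` for noun attachment and `V` for verb attachment; the real data files may order or encode the fields differently, so check them first. Because fields are split on whitespace, each head must be a single token, so a multi-word head such as "steering wheel" would have to be joined or reduced (e.g. to `wheel`).

```python
from typing import Iterable, List, NamedTuple


class PPExample(NamedTuple):
    """One labeled PP-attachment instance: four head words plus the decision."""
    verb: str   # head of the VP, e.g. "buy"
    noun1: str  # head of the first NP (the object), e.g. "car"
    prep: str   # the preposition, e.g. "with"
    noun2: str  # head of the NP inside the PP, e.g. "wheel"
    label: str  # "N" = attach to the noun phrase, "V" = attach to the verb phrase


def parse_line(line: str) -> PPExample:
    """Parse 'verb noun1 prep noun2 label' (ASSUMED layout; check the real files)."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    verb, noun1, prep, noun2, label = fields
    if label not in ("N", "V"):
        raise ValueError(f"unknown attachment label {label!r}")
    return PPExample(verb, noun1, prep, noun2, label)


def read_examples(lines: Iterable[str]) -> List[PPExample]:
    """Parse every non-blank line, e.g. the lines of an open data file."""
    return [parse_line(line) for line in lines if line.strip()]
```

For instance, `read_examples(["buy car with wheel N", "buy car with money V"])` yields two `PPExample` records, the first a noun attachment and the second a verb attachment.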

For reference, the most recent paper I know of on this problem is "PP-attachment disambiguation using large context" by Marian Olteanu and Dan Moldovan, in EMNLP/HLT 2005.
The paper has many pointers to other relevant papers.

The Assignment

You will be given training and test data in the 4-tuple format described above (the format used in most papers on PP attachment).
Your assignment is to train classifiers on the training data, test them on the test data, and report your results.
You can use this data to generate the features that are given as input to your learning algorithm. For example, you can use as features all non-empty sub-sequences of the 4-tuple, a total of 15 for every input example. You can use additional resources to generate better features. You can also use the whole sentence (I will supply the whole sentence along with the 4-tuples in case you want to extract features from it).
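The 15 conjunction features can be generated mechanically: each one is a non-empty subset of the four slots, kept in their original order, so there are 2^4 - 1 = 15 of them. The Python sketch below uses a naming scheme of its own (slot names such as `v+p`, joined to the word values) so that the same word in different slots yields different features; this naming is a hypothetical choice, not the one used in the provided mapping.

```python
from itertools import combinations
from typing import List, Tuple

# Slot names for (verb, first noun, preposition, second noun); hypothetical labels.
SLOTS = ("v", "n1", "p", "n2")


def conjunction_features(tup: Tuple[str, str, str, str]) -> List[str]:
    """Return one feature string per non-empty subset of the 4 slots (15 in total)."""
    feats = []
    for k in range(1, len(SLOTS) + 1):
        # combinations() yields index tuples in increasing order, so slot order is kept.
        for idx in combinations(range(len(SLOTS)), k):
            name = "+".join(SLOTS[i] for i in idx)   # which slots, e.g. "p+n2"
            value = "_".join(tup[i] for i in idx)    # their words, e.g. "with_wheel"
            feats.append(f"{name}={value}")
    return feats
```

For the tuple `("buy", "car", "with", "wheel")` this returns 15 distinct strings, starting with the four single-word features (`v=buy`, `n1=car`, `p=with`, `n2=wheel`) and ending with `v+n1+p+n2=buy_car_with_wheel`. Other mappings can be obtained by filtering this list or by adding features from external resources.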

At a minimum, you need to do the following:

In addition, you can choose to compare with an additional classifier and/or to use more data (complete sentences rather than 4-tuples) and external resources to build better features, but you do not have to do any of these.
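As one possible additional classifier for such a comparison, the sketch below is a plain mistake-driven perceptron over sparse binary features, written in Python. It is only an illustration, not a required method: the encoding of labels as +1 (noun attachment) and -1 (verb attachment), the bias term, and the fixed number of epochs are all assumptions of the sketch.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# ASSUMPTION: label +1 = noun attachment, -1 = verb attachment.
Example = Tuple[List[str], int]  # (active feature strings, label)


def score(w: Dict[str, float], bias: float, feats: Sequence[str]) -> float:
    """Linear score: bias plus the weights of all active features (unseen ones count 0)."""
    return bias + sum(w.get(f, 0.0) for f in feats)


def train_perceptron(data: List[Example], epochs: int = 10) -> Tuple[Dict[str, float], float]:
    """Plain (non-averaged) perceptron: update the weights only on mistakes."""
    w: Dict[str, float] = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for feats, y in data:
            if y * score(w, bias, feats) <= 0:  # wrong side of (or on) the boundary
                for f in feats:
                    w[f] += y
                bias += y
    return dict(w), bias


def accuracy(w: Dict[str, float], bias: float, data: List[Example]) -> float:
    """Fraction of examples whose predicted sign matches the label."""
    correct = sum(1 for feats, y in data if (1 if score(w, bias, feats) > 0 else -1) == y)
    return correct / len(data)
```

Each example is a list of active feature strings (for instance the 15 conjunctions of its 4-tuple) paired with its label. An averaged perceptron, Winnow, or naive Bayes could be swapped in behind the same train/evaluate interface, which makes side-by-side comparisons straightforward.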


  1. Describe what you did: the specifics of your experiments and models, and any additional decisions you had to make. Justify those decisions and your experimental design.
  2. You will be running many experiments. Think about this ahead of time and plan carefully how you run them, how you collect the results from all of them, and how you present those results, both to yourself, so that you can figure out what is going on, and to me (or any other external reader).
  3. Choose at least three interesting examples (good, bad, or just interesting for some reason) and discuss them in your write-up.
  4. Conclude with some suggestions for improvements, future work, etc.
  5. Send me only your report (no longer than 10 pages), but have a package of your code ready in case you need to show something about it.
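One way to keep track of many experiments (item 2) is to record every run, together with its configuration, in a single table that can be sorted and exported. The Python sketch below uses only the standard library; `ResultsLog` is a hypothetical helper, not part of any provided code, and the configuration field names are whatever you choose to vary (feature set, classifier, parameters, and so on).

```python
import csv
import io
from typing import Any, Dict, List


class ResultsLog:
    """Collects one row per experiment so all runs can be compared side by side."""

    def __init__(self, config_fields: List[str]) -> None:
        self.config_fields = list(config_fields)
        self.rows: List[Dict[str, Any]] = []

    def record(self, accuracy: float, **config: Any) -> None:
        """Store one run; every configuration field must be given, and nothing else."""
        if set(config) != set(self.config_fields):
            raise ValueError(f"expected fields {self.config_fields}, got {sorted(config)}")
        self.rows.append({**config, "accuracy": accuracy})

    def to_csv(self) -> str:
        """Return all runs as CSV text, best accuracy first."""
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=self.config_fields + ["accuracy"])
        writer.writeheader()
        for row in sorted(self.rows, key=lambda r: r["accuracy"], reverse=True):
            writer.writerow(row)
        return buf.getvalue()
```

Typical use: create `ResultsLog(["features", "classifier"])`, call `record(acc, features=..., classifier=...)` once per run, and call `to_csv()` at the end. Rejecting runs with missing or extra fields keeps every row comparable, and the CSV text can go straight into a spreadsheet or a table in the report.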


The data is available here. It is already split into training and test data. Note that one mapping to features is also provided as an example (in the SNoW input format). This mapping uses all 15 conjunctions for each example, but you will need to generate other mappings.
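Generating a new mapping amounts to assigning an integer ID to every feature string and writing each example as the sorted list of its active IDs. The Python sketch below writes lines of the form `label, id, id, ...:`, which is an assumption about what a SNoW-style example line looks like; compare it with the provided example file before relying on it. Reserving IDs 0 and 1 for the two labels is likewise a hypothetical choice.

```python
from typing import Dict, Iterable, Optional


class FeatureIndex:
    """Maps feature strings to integer IDs and encodes examples as sparse lines.

    ASSUMPTIONS (hypothetical, not taken from the provided files): IDs 0 and 1
    are reserved for the labels "V" and "N"; features are numbered from 2 up
    in order of first appearance.
    """

    LABEL_IDS = {"V": 0, "N": 1}

    def __init__(self) -> None:
        self.ids: Dict[str, int] = {}
        self.frozen = False  # set to True after the training set has been encoded

    def lookup(self, feat: str) -> Optional[int]:
        """Return the feature's ID, assigning a new one unless the index is frozen."""
        if feat not in self.ids:
            if self.frozen:
                return None  # unseen in training: dropped from the example
            self.ids[feat] = len(self.ids) + len(self.LABEL_IDS)
        return self.ids[feat]

    def encode(self, label: str, feats: Iterable[str]) -> str:
        """Return 'label, id, id, ...:' listing each active feature ID once, sorted."""
        active = sorted({i for i in map(self.lookup, feats) if i is not None})
        ids = [self.LABEL_IDS[label]] + active
        return ", ".join(str(i) for i in ids) + ":"
```

Build the index while encoding the training set, then set `frozen = True` before encoding the test set, so that features never seen in training are dropped rather than given fresh IDs the trained model knows nothing about.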


Your grade depends on:
  1. The quality of your report.
  2. The quality of your results.
  3. Your originality in going beyond the minimal requirements.

Due date

Tuesday, Nov. 15.
Dan Roth