Don't Panic

The term project is worth 40% of your overall grade.

Term Project

Your project will be a self-designed multi-week team-based effort. You will be paired with a member of the course staff to help mentor the project. The goal of the project is to build or analyze a model on data of your choosing. Your final project will consist of the following components:

A formal definition of the problem and a motivation for why it is an interesting challenge for natural language processing.
A literature review of past approaches to the problem.
A commented implementation of the simplest possible solution to the problem. For instance, this could be a majority class baseline or a random baseline.
A reproduction of a strong baseline or starting point.
Two extensions that attempt to improve on your starting point, along with a brief (one- to three-paragraph) write-up for each extension describing the general approach and whether it worked.
A final report summarizing your results.
A short 10-minute presentation about your project, including 3 minutes for questions.

Most projects will fall into one of three categories: trying prebuilt models on data you are interested in and trying to make improvements, reimplementing a paper, or analyzing an existing system. Extensions should be interpreted in a project-specific way, but should clearly involve something new. For example, if you focus on analysis, an extension could be a new qualitative analysis or a new metric. We will be flexible about what kind of data you would like to work with and will allow for interesting innovations.

After the proposal, a member of the course staff will be paired with you to guide and mentor you throughout the project. As part of each milestone after the proposal, you will need to have scheduled your next meeting with your mentor. Your mentor will be the main evaluator of your project, so it is important that, when you meet, you can clearly explain what's going on in your project. They will be able to offer advice and provide context. Extensions should be agreed upon with your mentor.

The term project is split into 5 deliverables, where the first four are worth 5% each, and the final report and presentation are worth 20%. You don’t have to wait to start working on each part of the project. We encourage you to begin work early, so that you have a polished final product.

The expected team size is 4. You may reach out to the instructors on Piazza about special circumstances regarding team size.

If you don’t have an idea for what you’d like to do, we can suggest a technical project: implementing a sequence-to-sequence question answering system.

Milestones and Due Dates

Here are the milestones for the term project:

Nov 8, 2021 - Proposal - A project proposal of at most 1.5 pages. Give the motivation and define the problem or question you want to explore.
Nov 25, 2021 - Milestone 1 - A literature review of 3 papers, any data you need collected, and a scheduled mentor meeting.
Dec 2, 2021 - Milestone 2 - Evaluation code, a simple baseline, a strong baseline, and a scheduled mentor meeting.
Dec 9, 2021 - Milestone 3 - 1 extension, a draft of your final project presentation, and a scheduled mentor meeting.
Dec 15, 2021 - Project Final - Finished extensions, final report, final presentation, and a scheduled presentation slot.
Dec 16, 2021 - Project Presentation Day Option 1: 3:30-6:00 EST via Zoom.
Dec 21, 2021 - Project Presentation Day Option 2: 3:30-6:00 EST via Zoom.

Proposal

For this milestone, you will need to create a writeup proposing a project. It should include at least:

A motivation for the problem or question (1 paragraph).
An explanation of what you plan on doing or studying (2 paragraphs, plus an illustrative example, in the form of a figure if appropriate).

You do not have to have a perfectly clear picture of your project at this point, but try to make at least the inputs and outputs clear. If you already know what kinds of models you want to work with, please say so. The proposal should be no longer than 1.5 pages, and will serve as the basis for pairing you with a mentor. This milestone is worth 5% of the grade.

Milestone 1

For your first milestone you will need to have reviewed 3 papers and collected any data you need to do your project.

For Milestone 1, you’ll need to create a writeup which includes:

A literature review of three papers or textbook sections that describe the problem
A description of the data you have collected to support your project

For your literature review, you should read 3 research papers that address the problem that you are working on. You should write a 1-2 paragraph summary of each paper, describing the approaches they used and how well those approaches worked.

You will also need to submit any data you have collected.

This milestone is worth 5% of the grade.

Collect your data

Since most of the projects that we do in this course are data-driven, it’s very important to have your data ready to go at the outset of a project. You should collect all of the data that you’ll need for your term project and split the data into three pieces:

Training data
Development data
Test data

The training data will be used to train the model; the dev data can be used to optimize your system parameters and/or to evaluate different approaches to the problem; and the test data is a “blind” test set that will be used in the final evaluation.

If you are basing your term project on a shared task, then usually the data will be collected already, and usually it will be divided into a standard training/dev/test split. If it’s already assembled and split - great! You’re ahead of the game. If you’re not doing a shared task, then you may need to assemble your own data. A good way of creating your own training/dev/test split is to divide the data into chunks that are sized around 80%/10%/10%, where you want to use most of the data for training. It’s important to ensure that the same items don’t appear in more than one of the splits.
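If you are assembling your own data, a split along these lines can be sketched in a few lines of Python. This is a minimal sketch, assuming each item is a hashable value (for example a string, or a (text, label) tuple); the function name split_data is hypothetical. Removing exact duplicates before shuffling is what keeps the same item out of more than one split; near-duplicates, or several items drawn from the same source document, would need to be grouped before splitting.

```python
import random


def split_data(items, train_frac=0.8, dev_frac=0.1, seed=0):
    """Shuffle and split items into train/dev/test, roughly 80/10/10.

    Exact duplicates are removed first so that the same item can never
    appear in more than one split. Items must be hashable.
    """
    # dict.fromkeys keeps the first occurrence of each item, in order
    unique = list(dict.fromkeys(items))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(unique)
    n_train = int(len(unique) * train_frac)
    n_dev = int(len(unique) * dev_frac)
    train = unique[:n_train]
    dev = unique[n_train:n_train + n_dev]
    test = unique[n_train + n_dev:]  # whatever remains goes to test
    return train, dev, test


if __name__ == "__main__":
    # Hypothetical toy data: 100 distinct sentences plus one duplicate
    data = [f"sentence {i}" for i in range(100)] + ["sentence 0"]
    train, dev, test = split_data(data)
    # These per-split counts are what data.md asks you to report
    print(len(train), len(dev), len(test))
```

Fixing the random seed means that rerunning the script reproduces exactly the same split, which matters once you start comparing systems on the test set.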

For your M1 deliverables, we’ll ask you to submit your data, plus a markdown file named data.md that describes the format of the data. If your data is very large, then you can submit a sample of the data and give a link to a Google Drive folder that contains the full data set. Your data.md should describe the number of items in each of your training/dev/test splits.

What do you need to turn in?

A report no longer than two pages with related work and a description of your data.
A markdown file that describes your data (name it data.md). It should give an example of the data, describe the file format of the data, give a link to the full data set (if you’re uploading a sample), and describe where you collected the data from.
You should create a directory containing your training/dev/test data (please create a gzipped tar archive of the data). If your data is too large to upload to Gradescope, then you can submit a sample of the training data, plus your complete dev and test sets.
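One way to build the gzipped tar archive is Python's standard tarfile module; running tar -czf data.tar.gz data/ from a shell does the same thing. The directory and archive names below are assumptions, so adjust them to your own layout.

```python
import tarfile
from pathlib import Path


def make_archive(data_dir="data", archive_path="data.tar.gz"):
    """Pack data_dir (e.g. train/dev/test files) into a gzipped tar archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths inside the archive short, e.g. data/train.tsv,
        # instead of the full path on your machine
        tar.add(data_dir, arcname=Path(data_dir).name)
    return archive_path
```

Calling make_archive() from the directory that contains data/ writes data.tar.gz next to it.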

Milestone 2

For Milestone 2, you will need to:

Write an evaluation script
Write a simple baseline (for instance, a majority class baseline)
Evaluate a strong baseline you may have seen in the literature

This milestone is worth 5% of the grade.

Write an evaluation script

You’ll need to determine a suitable evaluation metric for your project, and implement it. If you’re basing your term project on a shared task, then there is likely an established evaluation metric for the task. You should re-use it. If you’re doing a new task, then you may have to do a literature review in order to determine what metrics are best suited for your task.

You should write an evaluation script that takes two things as input: a system’s output and a corresponding set of gold standard answers. Your script should output a number that quantifies how good the system’s answers are.
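As a rough illustration, here is what a score.py for a classification-style task might look like. It is a sketch under several assumptions: both files hold one answer per line, line i of the system output corresponds to line i of the gold file, and accuracy is the metric. The --pred and --gold flag names are hypothetical, and your own task may call for a different metric entirely.

```python
import argparse
import sys


def read_answers(path):
    """Read one answer per line, stripping surrounding whitespace."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f]


def accuracy(predicted, gold):
    """Fraction of items where the system's answer exactly matches the gold answer."""
    if len(predicted) != len(gold):
        raise ValueError(f"{len(predicted)} system answers but {len(gold)} gold answers")
    if not gold:
        raise ValueError("no items to score")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def main(argv):
    parser = argparse.ArgumentParser(description="Score system output against gold answers.")
    parser.add_argument("--pred", required=True, help="system output, one answer per line")
    parser.add_argument("--gold", required=True, help="gold answers, one answer per line")
    args = parser.parse_args(argv)
    score = accuracy(read_answers(args.pred), read_answers(args.gold))
    print(f"accuracy: {score:.4f}")


if __name__ == "__main__":
    # Run as a command-line tool only when arguments are given, so the
    # functions above can also be imported and tested on their own
    if len(sys.argv) > 1:
        main(sys.argv[1:])
    else:
        print("usage: python score.py --pred PRED_FILE --gold GOLD_FILE")
```

A command line such as python score.py --pred output/test_pred.txt --gold data/test_gold.txt (hypothetical file names) prints a single line of the form accuracy: <score>. Raising an error on mismatched lengths catches the common bug of scoring output against the wrong split.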

For your deliverables, you should include your script, plus an example of how to run it from the command line. You should give a formal definition of the evaluation metric that explains how it is calculated in a markdown file called scoring.md - this file should cite any relevant papers that introduce the metric. You can also cite Wikipedia articles that describe your evaluation metric, and/or link to an overview paper describing the shared task that you’re basing your project on if it defines the metric.
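For example, a classification project might define accuracy and F1 in scoring.md roughly as follows, where $N$ is the number of test items, $y_i$ is the gold answer and $\hat{y}_i$ the system's answer for item $i$, and $TP$, $FP$, and $FN$ count true positives, false positives, and false negatives for the positive class. This is only an illustration of the level of formality expected; use whatever metric suits your task.

```latex
\text{Accuracy} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left[\hat{y}_i = y_i\right]

P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 P R}{P + R}
```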

Write a simple baseline

As part of M2, you should write a simple baseline. This should be the simplest way of producing output for your task. For example, it could be a majority class baseline (like the one that we used in HW1) that determines the majority class from the training data and guesses that class for each item in the test set.
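The majority class baseline described above fits in a few lines. This sketch assumes the training labels and test items are already loaded into Python lists; the function names and the toy data are hypothetical.

```python
from collections import Counter


def majority_class(train_labels):
    """Return the most frequent label in the training data."""
    # most_common(1) returns [(label, count)]; ties go to the label seen first
    return Counter(train_labels).most_common(1)[0][0]


def predict(train_labels, test_items):
    """Guess the training-set majority class for every test item."""
    label = majority_class(train_labels)
    return [label for _ in test_items]


if __name__ == "__main__":
    # Hypothetical toy data for a sentiment task
    train_labels = ["pos", "neg", "pos", "pos", "neu"]
    test_items = ["great movie", "awful plot", "it was fine"]
    print(predict(train_labels, test_items))  # ['pos', 'pos', 'pos']
```

In a real simple-baseline.py you would read the labels from your training file and write one prediction per line, which makes the output easy to feed to your scoring script.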

You should write a Python program that generates the output for the baseline, and submit it as simple-baseline.py. You should also include a markdown file named simple-baseline.md that describes your simple baseline, gives sample output, and reports the baseline's score when you run it on the test set and evaluate its output with your scoring script.

Evaluate a strong baseline

The goal is to evaluate a strong baseline that you may have seen described in the literature.
It’s fine to use machine learning packages like PyTorch or sklearn, or NLP software like AllenNLP, spaCy, or Hugging Face. For example, you could implement an LSTM-based classifier or generation system. You should not turn in existing code that already implements the baseline on the data you have in mind. If you are using a standard model, please integrate it into the code base you are building, with your data. Your strong baseline should not be so complex that it would take an entire paper to describe. Instead, it should be a starting point for any innovations you have in mind.
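Which baseline counts as "strong" depends on your task and should come from the papers in your literature review, and in practice you will probably build it with one of the libraries above. Purely as a self-contained illustration of a baseline living in your own code base with a simple fit/predict interface, here is a multinomial Naive Bayes text classifier (a classic text-classification baseline) in plain Python. The class name, the whitespace tokenization, and the add-one smoothing are all choices made for this sketch.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over lowercased whitespace tokens, with add-one smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> Counter of token counts
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_docs = sum(self.label_counts.values())
        return self

    def _log_score(self, tokens, label):
        """log P(label) + sum of log P(token | label) over known tokens."""
        counts = self.word_counts[label]
        total = sum(counts.values())
        score = math.log(self.label_counts[label] / self.n_docs)
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training are skipped
                score += math.log((counts[tok] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, texts):
        predictions = []
        for text in texts:
            tokens = text.lower().split()
            best = max(self.label_counts, key=lambda lab: self._log_score(tokens, lab))
            predictions.append(best)
        return predictions


if __name__ == "__main__":
    # Hypothetical toy sentiment data
    train_texts = ["great fun movie", "loved it great", "boring and bad", "bad plot awful"]
    train_labels = ["pos", "pos", "neg", "neg"]
    model = NaiveBayes().fit(train_texts, train_labels)
    print(model.predict(["great plot", "awful and boring"]))  # ['pos', 'neg']
```

Working in log space avoids numerical underflow when many small probabilities are multiplied. Keeping the same fit/predict shape for every model lets the simple baseline, the strong baseline, and later extensions share one data loader and one scoring script.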

What do you need to turn in?

A 1-2 page PDF report describing your evaluation measure, baselines, and their performance. Include any equations you need to concisely and formally describe all of these components.
You should describe your evaluation metric in a markdown file called scoring.md. Your scoring.md file should also show how to run your evaluation script on the command line (with example arguments, and example output).
You should include your evaluation script (you can call it score.py if you’re writing it in Python).
You should upload simple-baseline.py and describe how to use the code in simple-baseline.md.
You should upload any code supporting your strong baseline and describe how to use it in strong-baseline.md.

Milestone 3

The goal of Milestone 3 is to complete one extension to the starting point you developed in the previous milestone. You will need to turn in a report describing the extension, no longer than 1 page.

For Milestone 3, you will also prepare a draft presentation about your project: Google Slides with presenter’s notes. Your presentation should convey these main ideas:

What is the topic of your term project? You should clearly explain to your classmates the problem that you selected to work on. Give an illustrative example of the problem first, and then give a more formal definition of the problem.
What is exciting about your term project? Why did you want to work on this topic?
How does the topic relate to the class? What new things did you learn?

You may also want to cover topics like this:

What kind of data is available for this problem? How do you evaluate whether a solution is good or not? If the evaluation metric is not already familiar to the class, then walk through an explanation of how it works.
What is the performance of a simple baseline, such as a majority class baseline?
What approaches have people taken in the past? How successful have they been?
What did you implement for your published baseline?

For final submission, you’ll need to implement several extensions beyond this published baseline. These should be different experiments that you run to try to improve its performance.

What do you need to turn in?

You should include a 2-3 paragraph explanation of your extension and your empirical evaluation of it.
You should submit the code for your extension, along with a markdown file explaining how to use the code.
A link to your draft slides for your final presentation.

This milestone is worth 5% of the grade.

Milestone 4

For your final milestone, you’ll complete your extensions to the baseline, and you’ll produce a final writeup for your term project.

Your final report should be written in the style of a scientific paper, and formatted with this LaTeX style file (which will make it look totally scientific!). Feel free to reuse content from your previous milestone writeups. Your report should contain the following sections:

Title. A descriptive title for your term project.
Authors. A list of team members.
Abstract. Your abstract should give an overview of your project and your results (~100 words).
Introduction. Your introduction should contain the following information. (~300-500 words, plus one illustrative example).
- An informal description of the task, and how it relates to NLP/Computational Linguistics (1-2 paragraphs)
- A figure that illustrates the task, or an illustrative example of the type of problem you’re trying to solve. This can be a picture, or an example of an input and output. You should include a caption or a short paragraph that describes what’s happening in your illustration.
- A formal definition of the problem.
- A paragraph describing why you picked this task for your term project.
Literature Review. You can adapt your literature review from Milestone 1 for this part of your writeup. (~300-500 words, with 3 or more citations).
- If you adapted a shared task for your term project, then you should describe the shared task in your literature review, cite the overview paper, and give a URL to the shared task homepage (if applicable).
- For your literature review, you should also cite and summarize 3-5 research papers that address the problem that you are working on. You should write a 1-2 paragraph summary of each paper, describing the approaches that they proposed and what results they got. Be sure to include a full citation of these papers in your Bibliography.
Experimental Design. Your Experimental Design section should include a description of your training data, your evaluation metric, and your simple baseline model along with its performance on the test set. You can adapt your Milestone 2 submission for this part. (~300-500 words, plus 2 figures/tables, plus 1 or more equations).
- Data. This subsection should describe your training/development/test data. You should give a figure or table with examples from your data (including inputs and output labels). You should include a table that describes the size of your data sets, for example giving the number of sentences or words for each of the splits. You should also characterize the data. For instance, if there’s a skewed distribution over the labels, you should report it here. If your training data comes from a published paper, then cite that paper and explain how they collected the data. If you constructed your data set, then explain in detail how you collected it, and include example code in an appendix.
- Evaluation Metric. This subsection should describe your evaluation metric. You should include an English description of the metric, an equation for how your metric is computed, a citation for the metric, and citation(s) to past publications that used this metric for the task that you’re working on.
- Simple baseline. You should compute the majority class baseline (or other simple baseline) for your data, and report it in this section. This is a way of characterizing the data and showing the difficulty of the task.
Experimental Results. In this section, you should describe your implementation of a published baseline, and all of the extensions that you experimented with for your term project, and an error analysis of your system’s output. (~300-500 words).
- Published baseline. In this subsection you should write a detailed description of the published baseline that you implemented and cite the paper that it was published in. (You can update your Milestone 2 submission for this.) You should report how well the model performs on your test set using the evaluation metric that you defined in your experimental design section. Does your implementation of the published baseline reach the same level of accuracy as the original paper? If not, why not? Are your results directly comparable, that is, are they on the same test set? If not, why not?
- Extensions. In this subsection, you should describe each of the extensions that you tried. You should include a ~1-2 paragraph description of each extension that explains what you tried, why you tried it, and how it performed compared to your baseline. You should include a table of results, where the rows are the performance of the baseline or one of your extensions, and the columns are the performance on the test set (and on the dev set if you measured it). If you did any experiments where you searched over a set of different parameters, then you should include a result on how varying the parameter changed the performance on the dev or test set. Your tables and figures should include detailed captions that explain how to read them.
- Error analysis. In this subsection, you should perform an error analysis for your best performing system. Show examples of the errors that it makes. Can you categorize the types of errors that it makes, and give an estimate of how prevalent each error type is? If your extensions performed better than the published baseline, then show examples of the errors that the published baseline makes that your extensions get correct (and vice versa if your extension introduces some new errors).
Conclusions. You should write a brief summary of what you accomplished in your term project. Did any of your implementations reach state-of-the-art performance on the task? If not, how close did you come? If not very close, then why not? (~100-300 words).
Acknowledgements. If you used someone else’s code or you benefited from discussions with one of the TAs, then you should thank them here. Give credit generously! (Optional)
Appendices. This can include short snippets of code that were relevant to your project, along with a description of what it’s doing. It could also include more examples of your training data or your system’s output. (Optional)
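For the error analysis, a useful first step before reading examples by hand is to tally which (gold, predicted) label pairs account for most of the errors. This is a minimal sketch, assuming aligned lists of gold and predicted labels; the function name error_summary is hypothetical. Label-level confusions are only a starting point, since the error types the report asks about usually come from reading the misclassified examples themselves.

```python
from collections import Counter


def error_summary(gold, predicted, top_k=5):
    """Count the most frequent (gold, predicted) confusions among the errors.

    Returns a list of ((gold_label, predicted_label), count, share_of_errors).
    """
    errors = Counter((g, p) for g, p in zip(gold, predicted) if g != p)
    n_errors = sum(errors.values())
    return [(pair, count, count / n_errors) for pair, count in errors.most_common(top_k)]


if __name__ == "__main__":
    # Hypothetical gold labels and system predictions
    gold = ["pos", "neg", "neu", "neu", "pos", "neg"]
    pred = ["pos", "pos", "pos", "neg", "pos", "pos"]
    for (g, p), count, share in error_summary(gold, pred):
        print(f"gold={g} predicted={p}: {count} ({share:.0%} of errors)")
```

Running the same summary on the published baseline and on your best extension gives a quick view of which kinds of errors an extension fixed or introduced.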

I really like examples and good illustrations. If you created some nice visuals for your final presentation slides, then I encourage you to include them in your writeup too. You can submit your images in a images/ subfolder.

What do you need to turn in for Milestone 4?

You should turn in the following items:

A PDF of your final report
A PDF of your final project presentation slides.
A tarball or zip file with all of your code and data. It should contain:
- data/ - a subdirectory containing the training/dev/test splits that you use. If your data is too large to submit, then you can include a README file in this subdirectory that explains how to download your data.
- code/ - a subdirectory containing all code that you developed for your project, including the baseline and extensions, and your evaluation scripts. This should include a README that gives a step-by-step walk-through of how to run your code, including the command lines needed to reproduce the results that you report.
- output/ - a subdirectory containing your model’s predictions on the test set, along with the gold labels. This should also include a README that shows the command line for running your evaluation script on the output, and an example of the scores the script returns.
Schedule a time to present your results to Mark. You may choose from the two timeslots listed under Milestones and Due Dates, or schedule a separate slot with Mark.

Project Presentation

You will need to present your project via Zoom, either in one of the two public group sessions (listed under Milestones and Due Dates) or in a one-on-one session scheduled with Mark. You will have a specific slot in the schedule. Your presentation should be no longer than 7 minutes, followed by 3 minutes for questions. The public sessions will each have roughly 10 projects, and everyone in the class is welcome to attend. We will take a 15-minute break in the middle.

This final milestone is worth 10%, split equally between final report and final presentation.

You’ve reached the end. Great job!