CIS 520 - Fall 08
Machine Learning
Ben Taskar
Lectures: Moore 216, Monday and Wednesday, 10:30am-12:00pm
Instructor: Ben Taskar
Email: taskar@cis
Office: Levine 464, Monday 3-5pm
Teaching Assistants:
Paramveer Dhillon Email: pasingh@seas
Office: Levine 612
Thursday 10am-12pm
Katerina Fragiadaki Email: katef@seas
Office: Levine 4th Floor Lobby
Friday 3-5pm
Ben Sapp Email: bensapp@seas
Office: Levine 612
Wednesday 4-6pm 3-5pm
Administrative Assistant: Charity Payne
Email: charity@cis
Office: 459 Levine
Course schedule.
Announcements
Midterm is scheduled for Monday, Oct 20th.
To submit your homework, use turnin on seas. Please
turn in a separate file for each question (pdf strongly preferred).
Hw1 is due Mon, Sep 29 at 5 pm. For example, to turn in Homework 1, use
turnin -c cis520 -p hw1_1 file1.pdf
turnin -c cis520 -p hw1_2 file2.pdf
turnin -c cis520 -p hw1_3 file3.pdf
turnin -c cis520 -p hw1_4 file4.pdf
Please include the names of your collaborators.
We will use Blackboard
(https://courseweb.library.upenn.edu) for communicating about assignments and other questions.
Course description
CIS 520 provides a fundamental introduction to the mathematics, algorithms and practice of machine learning. Topics covered include:
- Supervised learning: least squares regression, logistic regression, perceptron, generalized linear models,
discriminant analysis, naive Bayes, support vector machines. Model and feature selection, ensemble methods, bagging, boosting. Learning theory: Bias/variance tradeoff. Union and Chernoff/Hoeffding bounds.
VC dimension. Online learning.
- Unsupervised learning: Clustering. K-means. EM. Mixture of Gaussians.
Factor analysis. PCA. MDS. pPCA.
Independent components analysis (ICA).
- Graphical models: HMMs, Bayesian and Markov networks. Inference. Variable elimination.
- Reinforcement learning:
MDPs. Bellman equations. Value iteration. Policy iteration.
Q-learning. Value function approximation.
For details, see the course schedule.
Audience
The course is aimed broadly at advanced undergraduates and beginning graduate students in computer science, electrical engineering, mathematics, physics, and statistics. Undergraduates who meet the prerequisites are particularly encouraged to enroll, as are students from other departments.
Please enroll in CIS 520/001 if you are a CIS PhD student and in CIS 520/002 if not.
Reading Materials
Required Text: C. Bishop, Pattern Recognition and Machine Learning.
Optional Text: T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
Optional Text: T. Mitchell, Machine Learning.
Selected readings from other books and papers will be distributed as electronic or hard copies.
Software
We will be using Matlab for the course. It can be
purchased at the bookstore ($99) or online from Mathworks, or used on the computers in SEAS and in a variety of other places on campus.
Pre-requisites
- Basic algorithms, data structures and complexity (dynamic
programming, queues, stacks, graphs, big-O, P/NP). [CSE 320]
- Basic probability and statistics (random variables, moments, standard
distributions, simple regression). [CSE 261]
- Basic linear algebra (matrices, vectors, norms, inverses).
- Reasonable programming skills.
Evaluation
- 5 Problem Sets: 40%
- Midterm: 15%
- Project: 20%
- Final: 25%
The problem sets include programming questions. All problem sets will be submitted electronically.
The midterm and final will be open-book, open-notes
exams covering material from the lectures and the assigned
readings. For the project, you will be given an open-ended challenge problem,
set up as a competition (see details below).
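As a rough illustration of how the weights above combine, here is a minimal sketch in Python. It assumes each component is scored on a 0-100 scale and that the total is a plain weighted sum; the function and dictionary names are hypothetical, and the sketch ignores class participation and extra credit.

```python
# Component weights as listed under Evaluation.
WEIGHTS = {
    "problem_sets": 0.40,
    "midterm": 0.15,
    "project": 0.20,
    "final": 0.25,
}


def course_score(scores):
    """Weighted total, assuming each component score is on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


example = {"problem_sets": 90, "midterm": 80, "project": 85, "final": 70}
print(round(course_score(example), 2))
```

Under these assumptions, the example works out to 0.40*90 + 0.15*80 + 0.20*85 + 0.25*70 = 82.5.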
In addition to problem sets, we will hand out exercises. The exercises
are not handed in and will not be graded, but we strongly recommend that you
solve them and look over the solutions to understand them.
We will use class participation as a factor in determining the final grade in borderline cases,
so we encourage you to attend class and participate actively.
We try very hard to make questions unambiguous, but some
ambiguities may remain. On the problem sets, each question will have one TA
responsible for grading and clarifications.
Ask if confused or state your assumptions explicitly.
Reasonable assumptions will be accepted in case of ambiguous questions.
Collaboration
You are allowed and encouraged to work together. You may
discuss the homework to understand the problem and reach a solution. However, each student
must write down the
solution independently, and without referring to written notes
from the joint session. In other words, you must understand the
solution well enough to reconstruct it by yourself. In addition,
each student must write on the problem set the set of people with whom s/he
collaborated. You cannot collaborate with the same person on more than one problem set. This policy will hopefully make sure you get to know more of your peers and prevent unequal contributions within a collaboration.
On the final project competition, you can work in groups of at most two and
submit one solution. You may join someone you have already worked with on a problem set.
Important note on the honor code: The purpose of problem sets in this class
is to help you think about the material, not just give us the right answers.
You are free to use online resources for learning more about the
material covered in class; however, you should not look online for solutions to
questions in the problem sets.
Late Policy
Recognizing that you may face unusual circumstances
and require some flexibility in the course of the semester, each student will
have a total of five free late (calendar) days to use as s/he sees fit.
Once these late days are exhausted, any homework turned in late will
be penalized 20% per late day. However, no homework will be accepted
more than four days after its due date.
Each 24 hours or part thereof that a homework
is late uses up one full late day.
Late days are not permitted for the final project code or the
final project writeup.
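The late-day arithmetic for a single homework can be sketched in Python as follows. This is an illustration under stated assumptions, not an official calculator: it assumes remaining free late days are applied first, that the 20% penalty applies only to late days not covered by free days, and that the penalty is a fraction of that homework's score. The function name and return format are hypothetical.

```python
import math

PENALTY_PER_DAY = 0.20   # penalty per late day once free days are exhausted
MAX_DAYS_LATE = 4        # homework is not accepted after this many days


def late_outcome(hours_late, free_days_left):
    """Return (accepted, free_days_used, penalty_fraction) for one homework."""
    if hours_late <= 0:
        return True, 0, 0.0
    if hours_late > MAX_DAYS_LATE * 24:
        return False, 0, 0.0
    # Each 24 hours or part thereof counts as one full late day.
    days_late = math.ceil(hours_late / 24)
    # Assumption: free late days are used before any penalty applies.
    free_used = min(days_late, free_days_left)
    penalized_days = days_late - free_used
    return True, free_used, PENALTY_PER_DAY * penalized_days
```

For example, a homework turned in 25 hours late counts as two late days: with five free days left it uses two of them with no penalty, while with only one free day left it uses that day and, under the assumptions above, is penalized 20%.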
Project
You will be given an open-ended challenge problem,
set up as a competition. Details on the challenge problem will be announced later in class.
Your solution will be judged both in terms of its performance (at the final project competition
held at the end of the semester), and in terms of the quality
and novelty of your ideas (as described in your writeup).
At the final project competition we will declare a winning and a runner-up team.
Each member of the winning team will receive 3% extra credit;
each member of the runner-up team will receive 1.5% extra credit.
You will also submit a final writeup describing your solution to the
challenge problem.
Related course of interest
Michael Kearns and Koby Crammer are co-teaching CIS 620: Computational Learning Theory, Tuesdays 12-3 PM.
See http://www.cis.upenn.edu/~mkearns/teaching/COLT