Last updated March 26, 2003.
From time to time, graduate students working in machine learning and the related fields of data mining, natural language processing, or other areas of Artificial Intelligence ask me which courses would help them understand the papers they read, particularly papers involving machine learning. I have found reading groups to be among the most helpful things. Here is a list of courses I have taken that either helped or that one might have expected to help. It is intended to be an evaluation of the courses and the material, not the instructors.
Math Essentials: Undergraduate 100-level calculus, plus a 300- or 400-level linear algebra course that includes the singular value decomposition. I followed Gilbert Strang's Linear Algebra and Its Applications several summers ago.
CIS520 - Artificial Intelligence.
Taken: Fall 1998 with Lyle Ungar.
This was a good first course. I gained familiarity with some
of the fundamental algorithms such as linear and logistic regression and
applied them to actual data sets. At the end of the semester I had
a good sense of key concepts like overfitting. However, in
a one semester introductory course for Artificial Intelligence there is
only so much machine learning theory covered and the degree of assumed
mathematical sophistication is limited. More recently the course
has been taught by Lawrence Saul and the emphasis has changed a bit.
CIS620 - Statistical Methods in Artificial Intelligence
Taken: Spring 2002 with Michael Kearns and Lawrence Saul.
What I found most useful in this course was that we derived and implemented
model fitting for linear and logistic regression, and EM model fitting for
elementary models. It remains to be seen whether this course was
a one-time-only offering. Let us hope not.
EE674 - Information Theory
When Dr. Venkatesh gave a special seminar-style version of this course for four motivated undergraduates (Spring 2001), he kindly allowed me to attend as an auditor. I felt this was a strong introduction to information theory. The normal course is rumored to be quite challenging.
Other EE Courses -
There are several EE courses relevant to machine learning. Unfortunately, I have not pursued these opportunities. If you have useful info about these courses, please pass it along and I will post it.
MATH360,361 - Elementary Analysis/Advanced Calculus
Audited: Fall 2002, Spring 2003 with Stephen Preston and Jianqiang Zhao
respectively.
I view 360 as a hurdle that must be overcome in order to prepare for
361. 361 is a real gem of a course introducing the Jacobian, the
Hessian, constrained optimization with Lagrange multipliers, advanced integration
theory and other things you have probably seen in papers or courses but
didn't know where to learn about. There is a 500 level version of this
course which I attempted to audit at one point, but was made to feel unwelcome
as an auditor by the particular professor that semester. I have heard
conflicting reports about whether the 500 level version is much harder
than the 300 level version. The 300-level version is quite
manageable with a steady stream of homework assignments to complete. I wish
I had taken/audited this series my second year. Students with engineering
degrees from countries other than the U.S. may have already seen enough
of this material. U.S. Computer Science students on the other hand
are often weak on continuous mathematics.
STAT511 - Statistics for Business and Economics
Taken: Spring 2000 with Lawrence Brown
This was the most useful course I have ever taken. This is an
applied Statistics course that will help you analyze data sets and give
a strong foundation in the concepts of Statistics.
STAT530 - Probability
Taken: Fall 1999 with J. Michael Steele
This is a core course in the Statistics Ph.D. program at Wharton.
I took this course without a background in Analysis (e.g. MATH360,361)
which made it _extremely_ painful. This is not an applied course,
but more of a foundation course. I would compare it to MATH 360
in that it consists of a bunch of hoops to jump through in order to prepare
for the topics of STAT531, a course I have not taken. You spend the
course learning to prove the fundamental theorems of Probability, for example
the Central Limit Theorem. Students without the analysis background
or looking for a first introduction to Probability should consider instead
EE530: Probability and Random Processes in the Electrical Engineering department.
I didn't take EE530, but a friend did and it looked like a good course.
STAT510 looks like an easier version of EE530.
STAT550/551 - Mathematical Statistics
I have not taken this course, but I know people who have and so I'll
mention a few things.
This is a core course in the Statistics Ph.D. program at Wharton.
It should only be taken by those with advanced calculus or analysis background
and at least a 100 level background in statistics and probability.
I have known a number of computer science students who have bravely taken STAT530/531 and STAT550/551 without the calculus prereqs on the advice of various faculty in our department. This is mostly a waste of time. The advanced calculus in these courses is essential to understanding the material at this level. Check out the text for MATH360/361 or MATH508/509 to see if you have the proper background.
One-Time courses:
CIS6?? - Advanced Topics in Natural Language Processing
Taken: Fall 2001 with Fernando Pereira
This was useful in that it covered machine learning algorithms I had
not seen in prior reading groups.
CIS6?? - Advanced Topics in NLP... Information Extraction
Audited: Fall 2003 with Martha Palmer and Dan Gildea
This course showed machine learning and other approaches applied to
the field of information extraction (IE).
If you know of other relevant courses, please email me a description of the course and why it is useful to graduate students pursuing research in machine learning and related disciplines.