Machine Learning Preparation @Penn: An Informal Course Catalog

By Andrew Schein

Last updated March 26, 2003.

From time to time, graduate students working in machine learning and related fields such as data mining, natural language processing, or other areas of Artificial Intelligence ask me which courses would help them understand the papers they read, particularly papers involving machine learning.  I have found reading groups to be among the most helpful resources.  Here is a list of courses I have taken that either helped or that one might have expected to help.  It is intended as an evaluation of the courses and the material, not the instructors.

Math Essentials:  Undergraduate 100-level calculus, plus a 300- or 400-level linear algebra course that includes the singular value decomposition.  I worked through Gilbert Strang's Linear Algebra and Its Applications several summers ago.

CIS520 - Artificial Intelligence.
Taken: Fall 1998 with Lyle Ungar.
This was a good first course.  I gained familiarity with some of the fundamental algorithms, such as linear and logistic regression, and applied them to actual data sets.  At the end of the semester I had a good sense of key concepts like overfitting.  However, a one-semester introductory course in Artificial Intelligence can cover only so much machine learning theory, and the degree of assumed mathematical sophistication is limited.  More recently the course has been taught by Lawrence Saul and the emphasis has changed a bit.

CIS620  - Statistical Methods in Artificial Intelligence
Taken: Spring 2002 with Michael Kearns and Lawrence Saul.
What I found most useful in this course was that we derived and implemented model fitting for linear and logistic regression, as well as EM fitting for elementary models.  It remains to be seen whether this course was a one-time offering.  Let us hope not.
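
To give a flavor of what "deriving and implementing model fitting" can look like, here is a minimal sketch of logistic regression fit by gradient ascent on the log-likelihood.  It is not taken from the course materials: the function names, the toy data, the learning rate, and the step count are all illustrative assumptions.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Model probability that the label is 1 for feature vector x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit_logistic(xs, ys, lr=0.1, steps=2000):
    """Fit weights w and bias b by batch gradient ascent on the
    mean log-likelihood.  lr and steps are arbitrary illustrative values."""
    n = len(xs)
    n_features = len(xs[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # The gradient of the log-likelihood is sum_i (y_i - p_i) * x_i.
            err = y - predict_proba(w, b, x)
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi + lr * g / n for wi, g in zip(w, grad_w)]
        b += lr * grad_b / n
    return w, b


# Hypothetical toy data: one feature, labels that are not perfectly separable.
xs = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
ys = [0, 0, 1, 0, 1, 1]

w, b = fit_logistic(xs, ys)
print("weights:", w, "bias:", b)
print("P(y=1 | x=5):", predict_proba(w, b, [5.0]))
```

Working out why the gradient takes the form (y - p) * x is the kind of derivation the course asked for before any code was written.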

EE674 - Information Theory
When Dr. Venkatesh gave a special seminar-style version of this course for four motivated undergraduates (Spring 2001), he kindly allowed me to attend as an auditor. I felt this was a strong introduction to information theory. The normal course is rumored to be quite challenging.

Other EE Courses
There are several EE courses relevant to machine learning. Unfortunately, I have not pursued these opportunities. If you have useful info about these courses, please pass it along and I will post it.

MATH360,361 - Elementary Analysis/Advanced Calculus
Audited: Fall 2002, Spring 2003 with Stephen Preston and Jianqiang Zhao, respectively.
I view 360 as a hurdle that must be overcome in order to prepare for 361.  361 is a real gem of a course, introducing the Jacobian, the Hessian, constrained optimization with Lagrange multipliers, advanced integration theory, and other things you have probably seen in papers or courses but didn't know where to learn about.  There is a 500-level version of this course which I attempted to audit at one point, but I was made to feel unwelcome as an auditor by the particular professor that semester.  I have heard conflicting reports about whether the 500-level version is much harder than the 300-level version.  The 300-level version is quite manageable, with a steady stream of homework assignments to complete.  I wish I had taken or audited this series in my second year.  Students with engineering degrees from countries other than the U.S. may have already seen enough of this material.  U.S. Computer Science students, on the other hand, are often weak on continuous mathematics.

STAT511 - Statistics for Business and Economics
Taken: Spring 2000 with Lawrence Brown
This was the most useful course I have ever taken.  This is an applied Statistics course that will help you analyze data sets and give you a strong foundation in the concepts of Statistics.

STAT530 - Probability
Taken: Fall 1999 with J. Michael Steele
This is a core course in the Statistics Ph.D. program at Wharton.  I took this course without a background in Analysis (e.g. MATH360,361), which made it _extremely_ painful.  This is not an applied course, but more of a foundation course.  I would compare it to MATH 360 in that it consists of a bunch of hoops to jump through in order to prepare for the topics of STAT531, a course I have not taken.  You spend the course learning to prove the fundamental theorems of Probability, for example the Central Limit Theorem.  Students who lack the analysis background, or who are looking for a first introduction to Probability, should consider instead EE530: Probability and Random Processes in the Electrical Engineering department.  I didn't take EE530, but a friend did and it looked like a good course.  STAT510 looks like an easier version of EE530.

STAT550/551 - Mathematical Statistics
I have not taken this course, but I know people who have and so I'll mention a few things.
This is a core course in the Statistics Ph.D. program at Wharton.  It should only be taken by those with an advanced calculus or analysis background and at least a 100-level background in statistics and probability.

I have known a number of computer science students who have bravely taken STAT530/531 and STAT550/551 without the calculus prereqs on the advice of various faculty in our department.  This is mostly a waste of time.  The advanced calculus in these courses is essential to understanding the material at this level.  Check out the text for MATH360/361 or MATH508/509 to see if you have the proper background.

One-Time courses:

CIS6??  - Advanced Topics in Natural Language Processing
Taken: Fall 2001 with Fernando Pereira
This was useful in that it covered machine learning algorithms I had not seen in prior reading groups.

CIS6?? - Advanced Topics in NLP... Information Extraction
Audited: Fall 2003 with Martha Palmer and Dan Gildea
This course showed machine learning and other approaches applied to the field of information extraction (IE).

If you know of other relevant courses, please email me a description of the course and why it is useful to graduate students pursuing research in machine learning and related disciplines.
