ADVANCED TOPICS IN MACHINE LEARNING
URL for this page: http://www.cis.upenn.edu/~mkearns/teaching/cis700
Prof. Michael Kearns
Office hours: By appointment
COURSE LOCATION AND TIME
Wednesdays 1:30-4, in 315 Levine Hall.
Just as in the Fall 2003 version, this seminar course will examine selected recent developments in machine learning and related topics from the literature. The term "machine learning" will be construed broadly (as it seems to be in the research community:)), and will include all topics in statistical modeling and probabilistic AI, as well as relevant results and tools from theoretical CS and algorithms, game theory and economics, finance, and others.
See last year's web page (link above) for a flavor of the topics covered then and the level of the readings. Participants will have a strong say in what topics are examined.
COURSE FORMAT AND REQUIREMENTS
The course will be run in seminar/reading group format. Each week we will examine a couple of papers, and each session will have one or more participants who will lead an informal discussion of the chosen material. The course will also act as a venue for external speakers, and we'll also consider locals who would like a forum for presentation and discussion of their own work in machine learning.
Class participation, including leading one or more sessions, will be determine the course grade. It is expected that participants will read the material, at least enough to understand the main ideas, before each meeting. This way the meetings themselves can be for diving into greater detail, discussing strengths and weaknesses, brainstorming on extensions, proving original results, etc.
Auditors are welcome, as are sporadic and one-time attendees who drop in for subjects that interest them.
Familiarity with the basics of machine learning, probability theory and statistics, and the analysis of algorithms. Some knowledge of mathematical programming, optimization, and computational complexity will probably also be helpful.
SCHEDULE AND READINGS
Wed Sep 8:
We'll start with a mainly organizational meeting to discuss possible topics for the term, and to start planning the schedule and discussion leaders. Please come prepared with suggestions of both areas for investigation and specific readings.
OK, we did the above:), and I have summarized our initial suggestions for topics and readings in The Bullpen (see below) shortly. Next meeting or two will be led by me and others on topics in financial modeling.
Wed Sep 15: Financial Modeling I: Random and Non-Random Walks; Technical Trading Strategies
Sham Kakade, Ryan Porter and I will lead the discussions for the first few sessions on different approaches to financial modeling, algorithms for trading profitably in those models, etc. I'll begin with some papers examining the more classical "random walk hypothesis" as well as technical trading strategies. Papers for this meeting:
Wed Sep 22: Financial Modeling II: On-Line, Worst-Case Models
This week, Sham Kakade and Ryan Porter will take us through some recent works in a much more adversarial model of price movement, along with the kinds of results one can obtain in such models.
Wed Sep 29: Financial Modeling III: Trading in Markovian Price Models
We will finish up our financial wanderings with two topics:
Wed Oct 6:
Guest Speaker Yee Whye Teh, UC Berkeley
Hierarchical Dirichlet processes: A Bayesian approach to sharing clusters among related groups
Abstract: We consider problems involving groups of data, where each observation within a group is drawn from a mixture model, and where it is desirable to share mixture components both within and between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from data. Such problems occur often in practice, e.g. in the problem of topic discovery in document corpora, each document is treated as a group of data items (bag of words), and each topic corresponds to a mixture component. In this setting, sharing components between groups simply means that topics can occur across a number of documents, allowing dependencies across documents (groups) to be modeled effectively as well as conferring generalization to new documents (groups).
The hierarchical Dirichlet process (HDP) is a Bayesian solution to this problem, utilizing both hierarchical and nonparametric modeling ideas. Each group is modeled using a Dirichlet process (DP) mixture, which provides a nonparametric prior for the number of components within each group. To facilitate sharing components between groups, we consider a hierarchical extension where the common base distribution for the DPs is itself distributed according to a DP. Such a base distribution being discrete, the group specific DPs necessarily share atoms, hence the mixtures for different groups necessarily share components.
We discuss a variety of representations for the HDP, as well as Markov chain Monte Carlo algorithms for posterior inference. We report experimental results on three text corpora showing the effective and superior performance of the HDP over previous models.
Technical Report: Hierarchical Dirichlet processes. Teh, Jordan, Beal and Blei (2004). UC Berkeley Department of Statistics, TR 653. Can be obtained at: http://www.cs.berkeley.edu/~ywteh/research/npbayes
Wed Oct 13:
Seminar participant and CIS doctoral student Timothee Cour will take us through the Bach-Jordan paper below, and related recent research that Timothy has been doing with Jianbo Shi.
Wed Oct 20:
Seminar participant Ted Sandler will lead us through some recent work in the theory community on spectral and graph clustering:
Wed Oct 27:
Seminar participant Ben Packer will tell us about some recent dimensionality reduction work he has done with Kilian Wienberger, as well as related background and connections to kernel methods. Further readings may be posted as well.
Wed Nov 3:
Once again the inimitable Sham Kakade steps to the plate, this time to speak on some recent work he has been doing:
Online Bounds for Bayesian Algorithms
Abstract: I'll provide some competitive online bounds for a number of Bayesian algorithms, without making any assumptions on the data generation process. These bounds show that some of the most popular Bayesian algorithms (such as least squares regression and Bayesian logistic regression) perform favorably when compared, in retrospect, to the single best model in the model class. I'll also provide bounds for some widely used maximum a posteriori (MAP) estimation algorithms, including regularized logistic regression.
Finally, I'll present some results (on work in progress) on online bounds for Gaussian Processes. The bound here have an interesting dependence on an intrinsic notion of dimensionality.
Wed Nov 10: Seminar participant Axel Bernal will lead us through work on conditional random fields, including the application of such models to gene finding.
Wed Nov 17:
Guest Speaker Prof. Amy Greenwald, Brown University (held this week in Levine 307)
No-regret learning algorithms have attracted a great deal of attention in the game-theoretic and machine learning communities. Whereas rational agents act so as to maximize their expected utilities, no-regret learners are boundedly rational agents that act so as to minimize their ``regret''. In this talk, we discuss the behavior of no-regret learning algorithms in repeated games.
Specifically, we introduce a general class of algorithms called no-$\Phi$-regret learning, which includes common variants of no-regret learning such as no-external-regret and no-internal-regret learning. Analogously, we introduce a class of game-theoretic equilibria called $\Phi$-equilibria. We show that no-$\Phi$-regret learning algorithms converge to $\Phi$-equilibria. In particular, no-external-regret learning converges to minimax equilibrium in zero-sum games; and no-internal-regret learning converges to correlated equilibrium in general-sum games. Although our class of no-regret algorithms is quite extensive, no algorithm in this class learns Nash equilibrium.
Here are Amy's slides.
Wed Nov 24: NO MEETING, THANKSGIVING BREAK
Wed Dec 1:
Gaurav Chakravorty will lead us through several papers of Vladimiar Vovk, who has systematically removed probabilistic assumptions from probability theory:
Wed Dec 8:
This week seminar participant Rong Chen, a postdoc in the Penn radiology department, will talk about his work applying graphical models to medical image analysis.
We'll start with a little bit of relevant background on Bayes networks and belief propagation:
Then we'll move on to Rong's work, including some slides to be provided later.
In this area, I will place links to papers and topics that may eventually promoted to the actual schedule. I will try to group them by rough "modules" of semi-related material. The initial set of modules is strongly based on the suggestions made during our first meeting. As these modules are fleshed out with specific readings and volunteers, they will be promoted to the formal schedule above.
A number of us expressed interest in looking at papers in the general areas of spectral clustering, factor analysis, and blind source separation including potential applications to image processing and vision, and financial data. Some suggested topics in this area were:
There was also interest in probabilistic models for document representation. Suggested topics:
There was healthy interest in the latest in graphical models and its relatives. Suggested topics:
Though a bit of it will be folded into our finance wanderings, there may be interest in developments in on-line learning and statistics, including:
Two other topic areas suggested were machine learning in medical imaging where we may have some local expertise and source of interesting problems; and Joe Halpern's extensive line of work on the foundations of probabilistic reasoning.
I continue to be interested in learning more about the work on property testing going on in the theory community, and its potential connections to topics in machine learning.
Have wanted to read this for a while: