Course Home Page for CIS 700/02, Fall 2004: Advanced Topics in Machine Learning

CIS 700/02
ADVANCED TOPICS IN MACHINE LEARNING
FALL 2004

URL for this page: http://www.cis.upenn.edu/~mkearns/teaching/cis700

INSTRUCTOR

Prof. Michael Kearns
mkearns@cis.upenn.edu
Office hours: By appointment

COURSE LOCATION AND TIME

Wednesdays 1:30-4, in 315 Levine Hall.

COURSE DESCRIPTION

Just as in the Fall 2003 version, this seminar course will examine selected recent developments in machine learning and related topics from the literature. The term "machine learning" will be construed broadly (as it seems to be in the research community:)), and will include all topics in statistical modeling and probabilistic AI, as well as relevant results and tools from theoretical CS and algorithms, game theory and economics, finance, and others.

See last year's web page (link above) for a flavor of the topics covered then and the level of the readings. Participants will have a strong say in what topics are examined.

COURSE FORMAT AND REQUIREMENTS

The course will be run in seminar/reading group format. Each week we will examine a couple of papers, and each session will have one or more participants who will lead an informal discussion of the chosen material. The course will also act as a venue for external speakers, and we'll also consider locals who would like a forum for presentation and discussion of their own work in machine learning.

Class participation, including leading one or more sessions, will be determine the course grade. It is expected that participants will read the material, at least enough to understand the main ideas, before each meeting. This way the meetings themselves can be for diving into greater detail, discussing strengths and weaknesses, brainstorming on extensions, proving original results, etc.

Auditors are welcome, as are sporadic and one-time attendees who drop in for subjects that interest them.

PREREQUISITES

Familiarity with the basics of machine learning, probability theory and statistics, and the analysis of algorithms. Some knowledge of mathematical programming, optimization, and computational complexity will probably also be helpful.

SCHEDULE AND READINGS

Wed Sep 8:

We'll start with a mainly organizational meeting to discuss possible topics for the term, and to start planning the schedule and discussion leaders. Please come prepared with suggestions of both areas for investigation and specific readings.

OK, we did the above:), and I have summarized our initial suggestions for topics and readings in The Bullpen (see below) shortly. Next meeting or two will be led by me and others on topics in financial modeling.

Wed Sep 15: Financial Modeling I: Random and Non-Random Walks; Technical Trading Strategies

Sham Kakade, Ryan Porter and I will lead the discussions for the first few sessions on different approaches to financial modeling, algorithms for trading profitably in those models, etc. I'll begin with some papers examining the more classical "random walk hypothesis" as well as technical trading strategies. Papers for this meeting:

Excerpt from "Options, Futures, and Other Derivative Securities", J. Hull.

Excerpt from "A Non-Random Walk Down Wall Street", Lo and MacKinlay

"Foundations of Technical Analysis: Computational Algorithms, Statistical Inference, and Empirical Implementation" A. Lo, H. Mamaysky, J. Wang.

"Simple Technical Trading Rules and the Stochastic Properties of Stock Returns", W. Brock, J. Lakonishok, B. LeBaron.

Here's a nice collection of links to materials on technical analysis (for reference only)

Wed Sep 22: Financial Modeling II: On-Line, Worst-Case Models

This week, Sham Kakade and Ryan Porter will take us through some recent works in a much more adversarial model of price movement, along with the kinds of results one can obtain in such models.

"On-line Portfolio Selection Using Multiplicative Updates", D. Helmbold, R. Schapire, Y. Singer, M. Warmuth.

"Optimal Search and One Way Trading Online Algorithms", R. El-Yaniv, A. Fiat, R.M. Karp, G. Turpin

"Universal Portfolios with and Without Transaction Costs", A. Blum, A. Kalai

"Can We Learn to Beat the Best Stock", A. Borodin, R. El-Yaniv, V. Gogan.

Wed Sep 29: Financial Modeling III: Trading in Markovian Price Models

We will finish up our financial wanderings with two topics:

I will present some brand-new work that Sham Kakade and I have done on trading in Markovian price models, and I will relate this work to the other models we have examined.

Sham will present the paper "Competitive Algorithms for VWAP and Limit Order Trading, S. Kakade, M. Kearns, Y. Mansour and L. Ortiz.

Wed Oct 6:

Guest Speaker Yee Whye Teh, UC Berkeley

Hierarchical Dirichlet processes: A Bayesian approach to sharing clusters among related groups

Abstract: We consider problems involving groups of data, where each observation within a group is drawn from a mixture model, and where it is desirable to share mixture components both within and between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from data. Such problems occur often in practice, e.g. in the problem of topic discovery in document corpora, each document is treated as a group of data items (bag of words), and each topic corresponds to a mixture component. In this setting, sharing components between groups simply means that topics can occur across a number of documents, allowing dependencies across documents (groups) to be modeled effectively as well as conferring generalization to new documents (groups).

The hierarchical Dirichlet process (HDP) is a Bayesian solution to this problem, utilizing both hierarchical and nonparametric modeling ideas. Each group is modeled using a Dirichlet process (DP) mixture, which provides a nonparametric prior for the number of components within each group. To facilitate sharing components between groups, we consider a hierarchical extension where the common base distribution for the DPs is itself distributed according to a DP. Such a base distribution being discrete, the group specific DPs necessarily share atoms, hence the mixtures for different groups necessarily share components.

We discuss a variety of representations for the HDP, as well as Markov chain Monte Carlo algorithms for posterior inference. We report experimental results on three text corpora showing the effective and superior performance of the HDP over previous models.

Technical Report: Hierarchical Dirichlet processes. Teh, Jordan, Beal and Blei (2004). UC Berkeley Department of Statistics, TR 653. Can be obtained at: http://www.cs.berkeley.edu/~ywteh/research/npbayes

Wed Oct 13:

Seminar participant and CIS doctoral student Timothee Cour will take us through the Bach-Jordan paper below, and related recent research that Timothy has been doing with Jianbo Shi.

"Learning Spectral Clustering", F. Bach, M. Jordan

Here is a link to Timothee's page , which includes a technical report he will speak on.

Wed Oct 20:

Seminar participant Ted Sandler will lead us through some recent work in the theory community on spectral and graph clustering:

"Spectral Partitioning Works: Planar graphs and finite element meshes", Daniel A. Spielman, Shang-Hua Teng

"On the Quality of Spectral Separators", Stephen Guattery, Gary L. Miller

"A unifying theorem for spectral embedding and clustering", Matthew Brand, Kun Huang

Wed Oct 27:

Seminar participant Ben Packer will tell us about some recent dimensionality reduction work he has done with Kilian Wienberger, as well as related background and connections to kernel methods. Further readings may be posted as well.

"Nonlinear Dimensionality Reduction by Semidefinite Programming and Kernel Matrix Factorization", Kilian Q. Weinberger, Lawrence K. Saul, and Benjamin D. Packer

Wed Nov 3:

Once again the inimitable Sham Kakade steps to the plate, this time to speak on some recent work he has been doing:

Online Bounds for Bayesian Algorithms

Abstract: I'll provide some competitive online bounds for a number of Bayesian algorithms, without making any assumptions on the data generation process. These bounds show that some of the most popular Bayesian algorithms (such as least squares regression and Bayesian logistic regression) perform favorably when compared, in retrospect, to the single best model in the model class. I'll also provide bounds for some widely used maximum a posteriori (MAP) estimation algorithms, including regularized logistic regression.

Finally, I'll present some results (on work in progress) on online bounds for Gaussian Processes. The bound here have an interesting dependence on an intrinsic notion of dimensionality.

Wed Nov 10: Seminar participant Axel Bernal will lead us through work on conditional random fields, including the application of such models to gene finding.

"Computational Gene Prediction of Eukaruotic Protein Coding Genes.", MQ Zhang

"Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms.", Michael Collins

"Conditional random fields: Probabilistic models for segmenting and labeling sequence data.", John Lafferty, Andrew McCallum, and Fernando Pereira.

Wed Nov 17:

Guest Speaker Prof. Amy Greenwald, Brown University (held this week in Levine 307)

Game-Theoretic Learning

No-regret learning algorithms have attracted a great deal of attention in the game-theoretic and machine learning communities. Whereas rational agents act so as to maximize their expected utilities, no-regret learners are boundedly rational agents that act so as to minimize their ``regret''. In this talk, we discuss the behavior of no-regret learning algorithms in repeated games.

Specifically, we introduce a general class of algorithms called no-$\Phi$-regret learning, which includes common variants of no-regret learning such as no-external-regret and no-internal-regret learning. Analogously, we introduce a class of game-theoretic equilibria called $\Phi$-equilibria. We show that no-$\Phi$-regret learning algorithms converge to $\Phi$-equilibria. In particular, no-external-regret learning converges to minimax equilibrium in zero-sum games; and no-internal-regret learning converges to correlated equilibrium in general-sum games. Although our class of no-regret algorithms is quite extensive, no algorithm in this class learns Nash equilibrium.

Here are Amy's slides.

Wed Nov 24: NO MEETING, THANKSGIVING BREAK

Wed Dec 1:

Gaurav Chakravorty will lead us through several papers of Vladimiar Vovk, who has systematically removed probabilistic assumptions from probability theory:

"Competitive on-line statistics", Vladimir Vovk

"Aggregating strategies" Vladimir Vovk

"A game of prediction with expert advice" Vladimir Vovk

Wed Dec 8:

This week seminar participant Rong Chen, a postdoc in the Penn radiology department, will talk about his work applying graphical models to medical image analysis.

We'll start with a little bit of relevant background on Bayes networks and belief propagation:

"A Tutorial on Learning with Bayesian Networks", D. Heckerman

"Understanding belief propagation and its generalizations", J. S. Yedidia, W. T. Freeman, and Y. Weiss

Then we'll move on to Rong's work, including some slides to be provided later.

"Learning Bayesian Network Structure from Distributed Data", Chen. R, Sivakumar. K, and Kargupta, H.

THE BULLPEN

In this area, I will place links to papers and topics that may eventually promoted to the actual schedule. I will try to group them by rough "modules" of semi-related material. The initial set of modules is strongly based on the suggestions made during our first meeting. As these modules are fleshed out with specific readings and volunteers, they will be promoted to the formal schedule above.

A number of us expressed interest in looking at papers in the general areas of spectral clustering, factor analysis, and blind source separation including potential applications to image processing and vision, and financial data. Some suggested topics in this area were:

papers by Ng, Jordan, Weiss and others

paper on learning spectral clustering by Bach and Jordan

clustering and factor analyses in financial markets (MK knows some papers)

blind source separation, relationship to factor analysis As they arrive, I will fill in these general topics with specific papers.

There was also interest in probabilistic models for document representation. Suggested topics:

Bengio paper on "theme topic mixture models"

Adar et al. on implicit structure and dynamics of blogspace

There was healthy interest in the latest in graphical models and its relatives. Suggested topics:

Wainwright and Jordan work

Jordan, Beal and Blei on hierarchical Dirichilet models

all that Bethe approximation/loopy belief prop stuff

Radford Neal on defining prior distributions using hierarchical Dirichilet

Ishwaran and James on "stick-breaking priors" (what about back-breaking priors?)

probabilistic programming languages a la Koller, Pfeffer and beyond

our own Hanna Wallach will learn us Gaussian processes! This one is a must

Though a bit of it will be folded into our finance wanderings, there may be interest in developments in on-line learning and statistics, including:

Vovk on competitive online statistics

Warmuth on online density estimation

some recent work by Sham

Two other topic areas suggested were machine learning in medical imaging where we may have some local expertise and source of interesting problems; and Joe Halpern's extensive line of work on the foundations of probabilistic reasoning.

I continue to be interested in learning more about the work on property testing going on in the theory community, and its potential connections to topics in machine learning.

"Property Testing (A Tutorial)", D. Ron.

Have wanted to read this for a while:

"The Complexity of Approximating the Entropy", T. Batu, S. Dasgupta, R. Kumar, R. Rubinfeld.