CIS 700/04
FALL 2003

Prof. Michael Kearns
Office hours: By appointment

Tuesday and Thursday 12-1:30, in 307 Levine Hall. PLEASE NOTE CHANGE IN ROOM.

This seminar course will examine selected recent developments from the machine learning research literature. While more detail will be provided as the term develops, potential topics include:

  • Support vector machines and kernel methods
  • Random projection methods for clustering
  • Information bottleneck
  • Independent component analysis, principal component analysis, etc.
  • Statistical relational learning
  • Nearest-neighbor algorithms
  • Gradient approaches in reinforcement learning
  • Gaussian processes
  • Information geometry
  • Predictive state representation in reinforcement learning
  • Discriminative learning
  • Mixing unlabeled and labeled data
  • Learning rankings
  • Recent advances in boosting
  • Property testing

    I am also open to suggestions for both topics and specific readings.

    The course will be run in seminar/reading group format. Each week we will examine a couple of papers, and each session will have one or more participants who will lead an informal discussion of the chosen material (not give a formal presentation). I will probably also use the course as a venue for external speakers.

    Class participation will determine approximately half of the course grade, with the other half being based on a course project.

    Auditors are welcome, as are sporadic and one-time attendees who drop in for subjects that interest them.

    Prerequisites: familiarity with the basics of machine learning, statistics, and the analysis of algorithms.


    Th Sep 4: We'll start with a mainly organizational meeting to discuss possible topics for the term, and to start plotting the schedule and lining up volunteers.

    Tu Sep 9, Th Sep 11: Support Vector Machines and Kernel Methods
    I will start things off by leading discussions of some portions of the papers below. There is likely to be lots of overlap between the first three items, which are all tutorials, so at a minimum I'd like to suggest that everyone take a good look at at least one of them.

  • ``Support-Vector Networks'', C. Cortes and V. Vapnik.
  • ``A Tutorial on Support Vector Machines for Pattern Recognition'', C. Burges.
  • ``Support Vector and Kernel Machines'', ICML 2001 tutorial by Nello Cristianini.
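    To make the kernel trick concrete before the discussions, here is a minimal sketch of a dual-form (kernel) perceptron with a Gaussian kernel; the algorithm touches the data only through inner products K(x_i, x_j), so it can learn a nonlinear separator. The data and parameters below are invented for illustration, and none of this is taken from the papers above.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    # Gram matrix K[i, j] = exp(-gamma * ||x1_i - x2_j||^2)
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def kernel_perceptron(X, y, epochs=20, gamma=1.0):
    # Dual-form perceptron: the hypothesis is sign(sum_i alpha_i y_i K(x_i, x)),
    # so only kernel evaluations are ever needed -- the "kernel trick".
    K = rbf_kernel(X, X, gamma)
    alpha = np.zeros(len(y))
    for _ in range(epochs):
        for i in range(len(y)):
            if np.sign(K[i] @ (alpha * y) + 1e-12) != y[i]:
                alpha[i] += 1.0  # mistake: add this example to the expansion
    return alpha, K

# XOR-style data: not linearly separable, but separable with an RBF kernel
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha, K = kernel_perceptron(X, y)
preds = np.sign(K @ (alpha * y))
```

    On this data the trained predictor labels all four points correctly, something no perceptron that is linear in the raw inputs could do.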

    Tu Sep 16, Th Sep 18: SVMs Continued
    We'll finish up our study of SVMs with a couple of more application-oriented papers. Fernando Pereira graciously offered to lead the discussion on the second paper below.

  • ``A Statistical Learning Model of Text Classification for Support Vector Machines'', T. Joachims.
  • ``Max-Margin Markov Networks'', B. Taskar, C. Guestrin, D. Koller.
  • ``Predicting Time Series with Support Vector Machines'', K. Muller, A. Smola, G. Ratsch, B. Scholkopf, J. Kohlmorgen, V. Vapnik.

    Tu Sep 23, Th Sep 25: Clustering, and a Special Seminar
    Tu Sep 23: Jon Kleinberg has a really thought-provoking impossibility result that we will examine, led by Nick Montfort.

  • ``An Impossibility Theorem for Clustering'', J. Kleinberg.

    For Th Sep 25, we'll have the following special seminar by Prof. Yishay Mansour, a well-known researcher in computational learning theory, computational game theory, and distributed computation. This talk mixes all of these topics, and can be viewed as a distributed learning result in a game-theoretic setting.

    Title: Convergence Time to Nash Equilibria for Load Balancing
    Speaker: Yishay Mansour
    Affiliation: Tel-Aviv University

    Abstract: We study the number of steps required to reach a pure Nash Equilibrium in a load balancing scenario where each job behaves selfishly and attempts to migrate to a machine which will minimize its cost.

    We consider a variety of load balancing models, including identical, restricted, related and unrelated machines. Our results have a crucial dependence on the allowable weights assigned to jobs. We consider arbitrary weights, integer weights, K distinct weights and identical (unit) weights. We look both at an arbitrary schedule (where the only restriction is that a job migrates to a machine which lowers its cost) and specific efficient schedulers (such as allowing the largest weight job to move first).

    This is joint work with Eyal Even-Dar and Alex Kesselman.
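    To get a feel for the model in the abstract, here is a toy simulation (my own, illustrating the setting only, not the paper's schedulers or bounds) of best-response dynamics for selfish jobs on identical machines; it runs until no job can lower its cost by migrating, i.e., a pure Nash equilibrium:

```python
import random

def best_response_dynamics(weights, n_machines, seed=0):
    # Each job's cost is the total load of its machine. At every step, a
    # randomly chosen improving job migrates to a currently least-loaded
    # machine. The sum of squared loads strictly decreases with each move,
    # so the process must terminate at a pure Nash equilibrium.
    rng = random.Random(seed)
    assign = [rng.randrange(n_machines) for _ in weights]
    load = [0.0] * n_machines
    for j, m in enumerate(assign):
        load[m] += weights[j]
    steps = 0
    while True:
        movers = [j for j in range(len(weights))
                  if min(load) + weights[j] < load[assign[j]]]
        if not movers:
            return assign, steps  # no improving move exists: pure Nash
        j = rng.choice(movers)
        target = load.index(min(load))
        load[assign[j]] -= weights[j]
        load[target] += weights[j]
        assign[j] = target
        steps += 1

assign, steps = best_response_dynamics([3.0, 2.0, 2.0, 1.0, 1.0], n_machines=2)
```

    The interesting question the talk addresses is how many such steps are needed, as a function of the machine model, the weight structure, and the scheduler.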

    Tu Sep 30, Th Oct 2, Tu Oct 7: Clustering Continued
    Sanjoy Dasgupta has contributed a number of fundamental insights to clustering over the past few years. Sham Kakade has also suggested we look at the related Vempala/Wang paper below.

  • ``Learning Mixtures of Gaussians'', S. Dasgupta.
  • ``Experiments with Random Projection'', S. Dasgupta.
  • ``A Two-Round Variant of EM for Gaussian Mixtures'', S. Dasgupta, L. Schulman.
  • ``A Spectral Algorithm for Learning Mixtures of Distributions'', S. Vempala, G. Wang.
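    The random-projection theme of these papers is easy to see numerically. The sketch below (toy data and dimensions of my own choosing) projects two well-separated spherical Gaussians from 100 dimensions down to 10 with a random Gaussian matrix; the distance between the cluster means is roughly preserved, which is what makes clustering after projection plausible:

```python
import numpy as np

rng = np.random.default_rng(0)
D, d, n = 100, 10, 200  # ambient dim, projected dim, points per cluster

# Two spherical Gaussians in R^D whose means are 10 apart
mu = np.zeros(D)
mu[0] = 10.0
A = rng.normal(size=(n, D))        # cluster centered at the origin
B = rng.normal(size=(n, D)) + mu   # cluster centered at mu

# A random projection with N(0, 1/d) entries preserves squared
# lengths in expectation (Johnson-Lindenstrauss style)
P = rng.normal(size=(D, d)) / np.sqrt(d)

gap_before = np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))
gap_after = np.linalg.norm((A @ P).mean(axis=0) - (B @ P).mean(axis=0))
```

    Despite the 10x reduction in dimension, gap_after concentrates around gap_before, with chi-squared fluctuations of order 1/sqrt(d).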

    Th Oct 9: Special Talk --- Prof Lawrence Saul
    Prof Lawrence Saul of CIS will give a special talk on very recent work directly related to our clustering investigation. His abstract is below. I also strongly encourage people to hit Sebastian Seung's talk at IRCS, Fri Oct 10 at noon. He is a terrific speaker.


    ABSTRACT: Can we detect low dimensional structure in high dimensional data sets? The problem of dimensionality reduction is fundamental to machine learning, pattern recognition, and statistics. In this talk, I will describe a new solution to this problem based on semidefinite programming. The resulting algorithm can be used to analyze high dimensional data that lies on or near a low dimensional manifold. The algorithm overcomes various limitations of previous work in manifold learning (e.g., Isomap, LLE, graph laplacians). It also bridges two recent developments in machine learning: semidefinite programming for learning kernel matrices and spectral methods for nonlinear dimensionality reduction.

    Joint work with Kilian Weinberger (U Penn).

    Th Oct 16, Tu Oct 21: Clustering Data Streams
    Prof. Sudipto Guha will lead us through recent algorithms for on-line clustering of data streams. More references to follow.

  • ``Incremental Clustering and Dynamic Information Retrieval'', M. Charikar, C. Chekuri, T. Feder, R. Motwani.
  • ``Clustering Data Streams'', S. Guha, N. Mishra, R. Motwani, L. O'Callaghan.
  • ``Better streaming algorithms for clustering problems'', M. Charikar, L. O'Callaghan, R. Panigrahy.
  • ``Sublinear Time Algorithms for Metric Space Problems'', P. Indyk.

    Th Oct 23, Tu Oct 28, Th Oct 30: Ranking
    Libin Shen will lead us through some of the many recent interesting papers on the problem of learning ordinal rankings of alternatives.

  • ``Discriminative Reranking for Natural Language Parsing'', M. Collins. ICML 2000.
  • ``Pranking with Ranking'', K. Crammer and Y. Singer. NIPS 2001.
  • ``Constraint Classification for Multiclass Classification and Ranking'', S. Har-Peled and D. Roth and D. Zimak. NIPS 2002.
  • ``Large Margin Rank Boundaries for Ordinal Regression'', R. Herbrich, T. Graepel, and K. Obermayer.
  • ``An SVM-based voting algorithm with application to parse reranking'', L. Shen and A. Joshi. CoNLL 2003.

    Tu Nov 4, Th Nov 6: Dimensionality Reduction Revisited
    Kilian Weinberger will take us through a set of related papers on dimensionality reduction that are closely related to his ongoing work with Lawrence Saul on the topic.

  • ``Nonlinear Dimensionality Reduction by Locally Linear Embedding'', S. Roweis and L. Saul.
  • ``A Global Geometric Framework for Nonlinear Dimensionality Reduction'', J. Tenenbaum, V. de Silva, J. Langford.
  • ``Nonlinear Component Analysis as a Kernel Eigenvalue Problem'', B. Scholkopf, A. Smola, K. Muller.
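    The computational core of the Scholkopf-Smola-Muller paper is an eigendecomposition of the centered kernel matrix. Here is a minimal sketch (my own simplification, which ignores out-of-sample projection):

```python
import numpy as np

def kernel_pca(K, n_components):
    # Center the kernel matrix in feature space: Kc = H K H, H = I - 11^T/n
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H
    vals, vecs = np.linalg.eigh(Kc)          # ascending eigenvalues
    vals, vecs = vals[::-1], vecs[:, ::-1]   # reorder: largest first
    # Scale so each feature-space eigenvector has unit norm; the columns
    # are then the projections of the training points onto the components.
    return vecs[:, :n_components] * np.sqrt(np.maximum(vals[:n_components], 0.0))
```

    With a linear kernel K = X X^T this reduces to ordinary PCA (up to sign), which is a handy sanity check; a nonlinear kernel such as the Gaussian gives nonlinear component analysis at the same cost, an eigendecomposition of an n x n matrix.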

    Tu Nov 11, Th Nov 13: Boosting, Online Learning, and Game Theory
    Kobby Essien will lead us through a set of related papers on online prediction and game-theoretic learning frameworks.

  • ``Game theory, on-line prediction and boosting'', Yoav Freund and Robert E. Schapire.
  • ``Efficient Algorithms for Online Decision Problems'', Adam Kalai and Santosh Vempala.
  • ``Regret in the On-line Decision Problem'', Dean Foster and Rakesh Vohra.
  • ``Efficient Algorithms for Universal Portfolios'', Adam Kalai and Santosh Vempala.
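    A thread running through all of these papers is the exponentially weighted (Hedge / multiplicative-weights) forecaster, whose regret against the best of N experts grows only like sqrt(T log N). A minimal sketch of the generic textbook version (not any one paper's algorithm), with made-up loss sequences in mind:

```python
import math

def hedge(loss_rounds, eta):
    # Exponential weights: an expert's weight decays as exp(-eta * its
    # cumulative loss); each round we incur the weighted-average loss.
    n = len(loss_rounds[0])
    w = [1.0] * n
    total = 0.0
    for losses in loss_rounds:
        z = sum(w)
        total += sum(wi / z * li for wi, li in zip(w, losses))
        w = [wi * math.exp(-eta * li) for wi, li in zip(w, losses)]
    return total
```

    For losses in [0, 1] and eta = sqrt(8 ln N / T), the classical guarantee is that the forecaster's total loss exceeds the best single expert's by at most sqrt((T/2) ln N), for any loss sequence.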

    Th Nov 20, Tu Dec 2: Approaches to Multiclass Learning
    Yuan Ding will lead our study of algorithms and models for dealing with multiclass learning problems, many of which have a coding-theoretic element.

  • ``A Comparison of Methods for Multi-class Support Vector Machines'', C. Hsu, C. Lin.
  • ``On the Learnability and Design of Output Codes for Multiclass Problems'', K. Crammer, Y. Singer.
  • ``Solving Multiclass Learning Problems via Error-Correcting Output Codes'', T. Dietterich, G. Bakiri.
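    The coding-theoretic element is visible in a few lines: assign each class a codeword with large pairwise Hamming distance, train one binary learner per bit, and decode a new example's predicted bit string to the nearest codeword. The tiny code matrix below is my own example (minimum distance 4, so one erring binary classifier per example can be corrected):

```python
import numpy as np

# 4 classes encoded with 7 binary dichotomies; rows are class codewords,
# columns are the binary problems. Pairwise Hamming distance is 4.
CODE = np.array([
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1, 1, 0],
    [1, 0, 1, 0, 1, 0, 1],
])

def ecoc_decode(bits, code=CODE):
    # Predicted class = codeword nearest in Hamming distance to the
    # vector of the binary classifiers' outputs.
    return int(np.argmin((code != bits).sum(axis=1)))
```

    For example, if the seven binary classifiers output 1100111 (the codeword of class 2 with its last bit flipped), decoding still recovers class 2.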


    Title: Dimension reduction and information theory - what's the connection?

    Speaker: Naftali Tishby, The Hebrew University


    Abstract: In this talk I will take a new look at Shannon's information theory from a machine learning perspective. I will argue that Shannon's theory provides a compelling mechanism for quantifying the fundamental tradeoff between complexity and accuracy, by unifying the source and channel coding theorems into one principle which I call the "Information Bottleneck" (IB). This unified view of the coding theorems can shed new light on the question of "relevant data representation" and suggests new algorithms for extracting such representations from co-occurrence statistics. In the second part I will briefly examine two different applications of this principle to continuous dimension reduction and feature extraction from high dimensional datasets: Sufficient Dimensionality Reduction (SDR) and Gaussian-IB. Both provide nonlinear continuous dimension reduction that preserves relevant information.
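    For reference during the talk, the self-consistent IB equations can be iterated in a few lines of numpy: alternate between computing p(t), p(y|t), and the assignment p(t|x) proportional to p(t) exp(-beta KL(p(y|x) || p(y|t))). This is my own toy implementation of those updates, not the speaker's code, and the small epsilon terms are purely numerical guards:

```python
import numpy as np

def information_bottleneck(p_xy, n_clusters, beta, n_iter=200, seed=0):
    # Soft-cluster X into T so as to compress X while preserving
    # information about Y; p_xy is the joint distribution as a table.
    rng = np.random.default_rng(seed)
    eps = 1e-12
    p_x = p_xy.sum(axis=1)
    p_y_given_x = p_xy / p_x[:, None]
    q = rng.random((p_xy.shape[0], n_clusters))   # q[x, t] = p(t|x)
    q /= q.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        p_t = q.T @ p_x
        p_y_given_t = (q * p_x[:, None]).T @ p_y_given_x / (p_t[:, None] + eps)
        # KL(p(y|x) || p(y|t)) for every pair (x, t)
        kl = (p_y_given_x[:, None, :] *
              (np.log(p_y_given_x[:, None, :] + eps) -
               np.log(p_y_given_t[None, :, :] + eps))).sum(axis=2)
        q = p_t[None, :] * np.exp(-beta * kl)
        q /= q.sum(axis=1, keepdims=True)
    p_t = q.T @ p_x
    p_y_given_t = (q * p_x[:, None]).T @ p_y_given_x / (p_t[:, None] + eps)
    return q, p_y_given_t
```

    On a toy joint distribution in which x0 and x1 are predictive of y0 while x2 and x3 are predictive of y1, a moderately large beta drives the two clusters to align with this predictive structure.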

    More information and other related papers can be found at

    Possible Future Material

  • ``The Complexity of Approximating the Entropy'', T. Batu, S. Dasgupta, R. Kumar, R. Rubinfeld.
  • ``Property Testing (A Tutorial)'', D. Ron.