Prof. Michael Kearns

Time: Mondays 12-3 PM
Location: Active Learning Classroom, 3401 Walnut St., fourth floor.
(Elevator lobby just left of the Starbucks entrance)

IMPORTANT NOTE: As per the University schedule, the first course meeting will be on Wednesday, Jan 10, from 12 to 3. From then on we will meet Mondays 12-3.

Previous incarnations of this course were taught with Grigory Yaroslavtsev, with Jake Abernethy, and with Koby Crammer.


This course is an introduction to the theory of machine learning, which attempts to provide algorithmic, complexity-theoretic and probabilistic foundations to modern machine learning and related topics.

The first part of the course will closely follow portions of An Introduction to Computational Learning Theory, by M. Kearns and U. Vazirani (MIT Press). We will cover perhaps 6 or 7 of the chapters in K&V over (approximately) the first half of the course, often supplementing with additional readings and materials. Copies of K&V will be available at the Penn bookstore. The second portion of the course will focus on a number of models and topics in learning theory and related areas not covered in K&V.

The course will give a broad overview of the kinds of problems and techniques typically studied in theoretical machine learning, and provide a basic arsenal of powerful mathematical tools for analyzing machine learning problems.

Topics likely to be covered include:

  • Basics of the Probably Approximately Correct (PAC) Learning Model
  • Occam's Razor, Compression and Learning
  • Uniform Convergence and the Vapnik-Chervonenkis Dimension
  • Boosting
  • Learning in the Presence of Noise and Statistical Query Learning
  • Learning and Cryptography
  • Query Models
  • Online and No-Regret Learning Models
  • Agnostic Learning

Additional topics we may cover include:

  • Learning and Differential Privacy
  • Fairness in Machine Learning
  • Theory of Reinforcement Learning


    Much of the course will be in fairly traditional "chalk talk" lecture format, but with ample opportunity for discussion, participation, and critique. The course will meet once a week on Mondays from 12 to 3, with the first meeting on Wednesday, Jan 10. Lunch will be served.

    The course will involve advanced mathematical material, will cover formal proofs in detail, and will be taught at the doctoral level. While there are no specific formal prerequisites, background or courses in algorithms, complexity theory, discrete math, combinatorics, probability theory and statistics will prove helpful, as will "mathematical maturity" in general. We will be examining detailed proofs throughout the course. If you have questions about the desired background, please ask. Auditors and occasional participants are welcome.

    The course requirements for registered students will be a mixture of active in-class participation, problem sets, possibly leading a class discussion, and a final project. The final projects can range from actual research work, to a literature survey, to solving some additional problems.

    The exact timing and set of topics below will depend on our progress and will be updated as we proceed.

    Wed Jan 10
    In the first meeting, we will go over course mechanics and present a course overview, then immediately begin investigating the Probably Approximately Correct (PAC) model of learning. Detailed topics covered: Learning rectangles in the real plane; definition of the PAC model; PAC learnability of rectangles in d dimensions.

    READING: K&V Chapter 1.
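The tightest-fit strategy for rectangles from the first lecture can be sketched in a few lines of Python. The particular target rectangle, sample size, and uniform distribution below are illustrative assumptions, not part of the lecture; the algorithm itself (output the smallest axis-aligned rectangle enclosing the positive examples) is the one discussed in class.

```python
import random

def tightest_fit(sample):
    """Hypothesis: the smallest axis-aligned rectangle enclosing all positive points."""
    pos = [p for p, label in sample if label]
    if not pos:
        return None  # no positives seen: predict negative everywhere
    xs, ys = [p[0] for p in pos], [p[1] for p in pos]
    return (min(xs), max(xs), min(ys), max(ys))

def contains(rect, p):
    return rect is not None and rect[0] <= p[0] <= rect[1] and rect[2] <= p[1] <= rect[3]

# Hypothetical target rectangle; D is uniform over the unit square.
random.seed(0)
target = (0.2, 0.7, 0.3, 0.8)
draw = lambda: (random.random(), random.random())

train = [(p, contains(target, p)) for p in (draw() for _ in range(500))]
h = tightest_fit(train)

# The hypothesis always sits inside the target, so its error is exactly the
# probability mass of the four "border" strips; estimate it on fresh draws.
fresh = [draw() for _ in range(2000)]
err = sum(contains(h, p) != contains(target, p) for p in fresh) / len(fresh)
```

Note that because the hypothesis can only err by predicting negative inside the target, the analysis reduces to bounding the mass of the four strips between the two rectangles, which is how the sample-size bound in K&V Chapter 1 proceeds.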

    Mon Jan 22
    PAC learnability of boolean conjunctions; intractability of PAC learning 3-term DNF.

    READING: K&V Chapter 1.
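The elimination algorithm for conjunctions also has a very short sketch: start with the conjunction of all 2n literals and delete any literal falsified by a positive example (negatives are ignored). The target conjunction below is a made-up example for illustration.

```python
def learn_conjunction(n, positives):
    """Elimination algorithm: keep only the literals consistent with
    every positive example seen.  Negative examples are ignored."""
    literals = {(i, b) for i in range(n) for b in (True, False)}  # (variable, polarity)
    for x in positives:
        literals = {(i, b) for (i, b) in literals if x[i] == b}
    return literals

def evaluate(literals, x):
    """The learned conjunction is satisfied iff every surviving literal holds."""
    return all(x[i] == b for (i, b) in literals)

# Hypothetical target over n = 3 variables: x1 AND (NOT x3).
positives = [(True, False, False), (True, True, False)]
h = learn_conjunction(3, positives)
```

Since the hypothesis only ever shrinks toward the target and never falsely accepts, the error analysis again reduces to bounding the probability of the literals not yet eliminated, as in K&V Chapter 1.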

    Mon Jan 29
    Intractability of PAC learning 3-term DNF, continued; PAC learning 3-term DNF by 3CNF; learning, consistency, and compression; Occam's Razor.

    READING: K&V Chapter 1 and Chapter 2.
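For reference, one standard statement of the cardinality form of the Occam/consistency argument from Chapter 2 (for a finite hypothesis class H and m i.i.d. examples) is:

```latex
% A fixed h with err(h) > eps is consistent with m i.i.d. examples with
% probability at most (1-eps)^m; union-bounding over all of H:
\Pr\big[\exists\, h \in H : \mathrm{err}(h) > \epsilon \text{ and } h \text{ consistent}\big]
  \le |H|\,(1-\epsilon)^m \le |H|\,e^{-\epsilon m} \le \delta
  \quad \text{whenever} \quad
  m \ge \frac{1}{\epsilon}\Big(\ln|H| + \ln\frac{1}{\delta}\Big).
```

This is the sense in which small (compressed) hypothesis classes and consistency together suffice for PAC learning.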

    PROBLEM SET #1 (due in hardcopy form in class Feb 5):

    1. As carefully as you can, prove the PAC learnability of axis-aligned rectangles in n dimensions in time polynomial in n, 1/epsilon and 1/delta. Describe the algorithm precisely, provide as detailed a proof as you can, and calculate the sample size needed.

    For problems 2. and 3. below, you may assume that the input distribution/density D is uniform over the unit square [0,1] x [0,1]. For extra credit, sketch how you might generalize your results to the case of unknown and arbitrary D. You might also find it useful to know about Chernoff bounds and related inequalities, which are discussed in the appendix of K&V.
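Since problems 2 and 3 lean on large-deviation bounds, one standard additive form (Hoeffding's inequality for {0,1}-valued variables, where \hat{p} is the empirical mean of m i.i.d. draws with true mean p) is:

```latex
\Pr\big[\hat{p} \ge p + \gamma\big] \le e^{-2\gamma^2 m},
\qquad
\Pr\big[\hat{p} \le p - \gamma\big] \le e^{-2\gamma^2 m}.
```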

    2. Prove the PAC learnability of unions of 2 axis-aligned rectangles in the real plane in time polynomial in 1/epsilon and 1/delta.

    3. Consider the variant of the PAC model with classification noise: each time the learner asks for a random example of the target concept c, instead of receiving x,c(x) for x drawn from D, the learner receives x,y where y = c(x) with probability 2/3, and y = -c(x) with probability 1/3. Here we assume that c(x) is +1 or -1, and that the noise is independent for each example. Prove the PAC learnability of axis-aligned rectangles in the real plane in this modified model.

    Collaboration on the problem sets is permitted, but everyone must turn in their own independent writeup, in which you acknowledge your collaborators.

    Mon Feb 5
    PAC learning yields weak compression; consistency and learning in the agnostic/unrealizable setting; introduction to VC dimension.

    READING: K&V Chapters 2 and 3.
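A set of points is shattered by a concept class if every one of the 2^n labelings is realized by some concept; the VC dimension is the size of the largest shattered set. For intervals [a, b] on the real line this can be checked by brute force. The sketch below is an illustrative assumption of mine, not material from K&V; restricting candidate endpoints to the points themselves (plus sentinels outside their range, including empty intervals with a > b) suffices for this class because any realizable labeling marks a contiguous block of points.

```python
def interval_labeling(points, a, b):
    """Labels induced on the points by the closed interval [a, b]."""
    return tuple(a <= x <= b for x in points)

def shattered_by_intervals(points):
    """Brute-force check: do intervals [a, b] realize all 2^n labelings?"""
    cands = sorted(points) + [min(points) - 1, max(points) + 1]
    realized = {interval_labeling(points, a, b) for a in cands for b in cands}
    return len(realized) == 2 ** len(points)
```

For example, any two distinct points are shattered, but no three points are (the labeling marking only the two outer points is unrealizable by a single interval), so the VC dimension of intervals is 2.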

    Mon Feb 12
    We will begin today with a special guest lecture from Prof. Dana Moshkovitz of UT Austin on some very exciting recent results in the PAC model; afterwards we will continue with the VC dimension. Dana's abstract:

    What Cannot Be Learned With Bounded Memory

    How does computational learning change when one cannot store all the examples one sees in memory? This question has seen a burst of interest in the past couple of years, leading to the surprising theorem that there exist simple concepts (parities) that require an extraordinary amount of time to learn unless one has quite a lot of memory. In this work we show that in fact most concepts cannot be learned without sufficient memory. This subsumes the aforementioned theorem and implies similar results for other concepts of interest. The new results follow from a general combinatorial framework that we developed to prove lower bounds for space bounded learning.

    Joint work with Michal Moshkovitz, Hebrew University.

    READING: K&V Chapter 3, and here is a link to a paper related to Dana's talk.