**PENN CIS 625, SPRING 2018: THEORETICAL FOUNDATIONS OF MACHINE LEARNING (aka Computational Learning Theory)**

Prof. Michael Kearns

mkearns@cis.upenn.edu

Time: Mondays
12-3 PM

Location: Active Learning Classroom, 3401 Walnut St., fourth floor.

(Elevator lobby just left of the Starbucks entrance)

**IMPORTANT NOTE:**
As per the University schedule, the first course meeting will be on
**Weds Jan 10 from 12-3.**
From then on we will meet Mondays 12-3.

URL for this page:

www.cis.upenn.edu/~mkearns/teaching/COLT

Previous incarnations of this course:

www.cis.upenn.edu/~mkearns/teaching/COLT/colt17.html

www.cis.upenn.edu/~mkearns/teaching/COLT/colt16.html

www.cis.upenn.edu/~mkearns/teaching/COLT/colt15.html (with Grigory Yaroslavtsev)

www.cis.upenn.edu/~mkearns/teaching/COLT/colt12.html (with Jake Abernethy)

www.cis.upenn.edu/~mkearns/teaching/COLT/colt08.html (with Koby Crammer)

**COURSE DESCRIPTION**

This course is an introduction to the theory of machine learning, which attempts to provide algorithmic, complexity-theoretic and probabilistic foundations to modern machine learning and related topics.

The first part of the course will closely follow portions of An Introduction to Computational Learning Theory, by M. Kearns and U. Vazirani (MIT Press). We will cover perhaps 6 or 7 of the chapters in K&V over (approximately) the first half of the course, often supplementing with additional readings and materials. Copies of K&V will be available at the Penn bookstore. The second portion of the course will focus on a number of models and topics in learning theory and related areas not covered in K&V.

The course will give a broad overview of the kinds of problems and techniques typically studied in theoretical machine learning, and provide a basic arsenal of powerful mathematical tools for analyzing machine learning problems.

Topics likely to be covered include:

Additional topics we may also cover include:

**COURSE FORMAT, REQUIREMENTS, AND PREREQUISITES**

Much of the course will be in fairly traditional "chalk talk" lecture format, but with ample opportunity for discussion, participation, and critique. The course will meet once a week on Mondays from 12 to 3, with the first meeting on Weds Jan 10. Lunch will be served.

The course will involve advanced mathematical material, will cover formal proofs in detail, and will be taught at the doctoral level. While there are no specific formal prerequisites, background or courses in algorithms, complexity theory, discrete math, combinatorics, probability theory and statistics will prove helpful, as will "mathematical maturity" in general. We will be examining detailed proofs throughout the course. If you have questions about the desired background, please ask. Auditors and occasional participants are welcome.

The course requirements for registered students will be a mixture of active in-class participation, problem sets, possibly leading a class discussion, and a final project. The final projects can range from actual research work, to a literature survey, to solving some additional problems.

Mon Jan 22

PAC learnability of boolean conjunctions;
intractability of PAC learning 3-term DNF.

READING: K&V Chapter 1.

Mon Jan 29

Intractability of PAC learning 3-term DNF continued;
PAC learning 3-term DNF by 3CNF;
learning, consistency and compression;
Occam's Razor.

READING: K&V Chapter 1 and Chapter 2.

PROBLEM SET #1 (due in hardcopy form in class Feb 5):

1. As carefully as you can, prove the PAC learnability of axis-aligned rectangles in n dimensions in time polynomial in n, 1/epsilon and 1/delta. Describe the algorithm precisely, provide as detailed a proof as you can, and calculate the sample size needed.
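For intuition (not as a substitute for your own writeup), the natural "tightest fit" hypothesis from lecture can be sketched in a few lines of code. The function names and representation here are illustrative, not prescribed by the assignment:

```python
def tightest_fit_learner(examples, n):
    """Return the tightest axis-aligned rectangle containing all positive
    examples, represented as (lows, highs); None if no positives were seen.

    examples: list of (point, label) pairs, point a length-n tuple, label +/-1.
    """
    positives = [x for x, y in examples if y == 1]
    if not positives:
        return None  # hypothesis that labels every point negative
    lows = [min(p[i] for p in positives) for i in range(n)]
    highs = [max(p[i] for p in positives) for i in range(n)]
    return lows, highs


def predict(h, x):
    """Label x as +1 iff it lies inside the hypothesis rectangle h."""
    if h is None:
        return -1
    lows, highs = h
    return 1 if all(lo <= xi <= hi for lo, xi, hi in zip(lows, x, highs)) else -1
```

Note that this hypothesis is always contained in the target rectangle, so it can only err on positive points; the heart of the proof is bounding the probability mass of the four "strips" of the target it may miss.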

For problems 2 and 3 below, you may assume that the input distribution/density D is uniform over the unit square [0,1] x [0,1]. For extra credit, sketch how you might generalize your results to the case of unknown and arbitrary D. You might also find it useful to know about Chernoff bounds and related inequalities, which are discussed both here and in the appendix of K&V.
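For convenience, here is the additive form of the Chernoff-Hoeffding bound most often used in such arguments: if $X_1, \ldots, X_m$ are i.i.d. random variables taking values in $[0,1]$ with mean $p$, then for any $\epsilon > 0$,

```latex
\Pr\left[\,\left|\frac{1}{m}\sum_{i=1}^{m} X_i - p\right| \geq \epsilon\,\right] \leq 2e^{-2m\epsilon^2}.
```

In particular, $m = O\!\left(\frac{1}{\epsilon^2}\log\frac{1}{\delta}\right)$ samples suffice to estimate $p$ to within $\epsilon$ with probability at least $1-\delta$.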

2. Prove the PAC learnability of unions of 2 axis-aligned rectangles in the real plane in time polynomial in 1/epsilon and 1/delta.

3. Consider the variant of the PAC model with classification noise: each time the learner asks for a random example of the target concept c, instead of receiving x,c(x) for x drawn from D, the learner receives x,y where y = c(x) with probability 2/3, and y = -c(x) with probability 1/3. Here we assume that c(x) is +1 or -1, and that the noise is independent for each example. Prove the PAC learnability of axis-aligned rectangles in the real plane in this modified model.
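The noisy oracle described in problem 3 is easy to simulate, which can be helpful for sanity-checking a proposed algorithm empirically. A minimal sketch, with illustrative names (`target`, `draw`) standing in for the concept c and the distribution D:

```python
import random


def noisy_example(target, draw, noise_rate=1 / 3):
    """Simulate one call to the noisy oracle: draw x from D via draw(),
    then return (x, y) where y = target(x) with probability 1 - noise_rate
    and y = -target(x) otherwise, independently for each call."""
    x = draw()
    y = target(x)
    if random.random() < noise_rate:
        y = -y
    return x, y
```

With noise rate 1/3, a label agrees with c(x) with probability 2/3, so (for example) the empirical fraction of positive labels in any region can still be used to estimate the true fraction, after correcting for the known noise rate.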

Collaboration on the problem sets is permitted, but everyone must turn in their own, independent writeup, in which you acknowledge your collaborators.

Mon Feb 5

PAC learning yields weak compression;
consistency and learning in the agnostic/unrealizable setting;
introduction to VC dimension.

READING: K&V Chapters 2 and 3.

Mon Feb 12

Today we will start with a special guest lecture from Prof. Dana Moshkovitz of UT Austin on some very exciting recent results in the PAC model; afterwards we will continue with the VC dimension. Dana's abstract:

What Cannot Be Learned With Bounded Memory

How does computational learning change when one cannot store all the examples one sees in memory? This question has seen a burst of interest in the past couple of years, leading to the surprising theorem that there exist simple concepts (parities) that require an extraordinary amount of time to learn unless one has quite a lot of memory. In this work we show that in fact most concepts cannot be learned without sufficient memory. This subsumes the aforementioned theorem and implies similar results for other concepts of interest. The new results follow from a general combinatorial framework that we developed to prove lower bounds for space bounded learning.

Joint work with Michal Moshkovitz, Hebrew University.

READING: K&V Chapter 3, and here is a link to a paper related to Dana's talk.