CIS 620: ADVANCED TOPICS IN ARTIFICIAL INTELLIGENCE
SPRING 2002


URL for this page: http://www.cis.upenn.edu/~mkearns/teaching/cis620/cis620.html

INSTRUCTORS
Prof. Michael Kearns
mkearns@cis.upenn.edu
Office hours: Wednesday 2-4 PM in 555 Moore/GRW

Prof. Lawrence Saul
lsaul@cis.upenn.edu
Office hours: Wednesday 2-4 PM in 557 Moore/GRW

COURSE DESCRIPTION
Over the last decade, significant advances have been made in a number of core problems of artificial intelligence (AI). These advances share an emphasis on probabilistic models of uncertainty, compact representations of complex environments, and efficient algorithms for reasoning, acting, and learning from experience. We will survey the main problems of probabilistic AI, including planning and learning in Markovian environments, reasoning under uncertainty via probabilistic graphical models, and interaction between rational agents in game-theoretic settings. We will emphasize the significant connections between these problems and the solutions proposed for them. The course will feature a mix of theory and application, with formal mathematical analyses of algorithms as well as frequent in-class computer demonstrations.

COURSE LOCATION AND TIME
Monday and Wednesday 10:30 - 12 in Towne 321.

PREREQUISITES
Introduction to AI at the level of CIS 520. Some multivariable calculus, linear algebra, and elementary probability theory would be helpful, but not essential. This course is aimed at graduate students and advanced undergraduates.

COURSE REQUIREMENTS
Problem sets and a course project; possibly occasional scribe notes and presentations of readings.

RECOMMENDED BOOKS

  • Reinforcement Learning, R. Sutton and A. Barto, MIT Press, 1998.
  • Probabilistic Reasoning in Intelligent Systems, J. Pearl, Morgan Kaufmann, 1988.

    There will also be frequent supplementary research readings made available on this page.

    COURSE NEWSGROUP
    There is a newsgroup for the course, which is upenn.cis.cis620. Students are free to communicate with each other there, and we will occasionally post news and information about the course and problem sets there.

    INFORMATION ON COURSE PROJECTS

  • IMPORTANT FINAL INFORMATION ON PROJECTS: You MUST turn in your project by THIS FRIDAY, MAY 3, 4 PM AT THE LATEST. Please make TWO COPIES of your project, and place one copy in each of our departmental mailboxes (M. Kearns and L. Saul) in Moore/GRW 559 (the CIS administrative office). Formatting information for projects: most projects will consist of a 6-9 page write-up, including abstract, figures, and references. (If you provide source code, it does not count toward the 6-9 pages.) Write-ups should aim for clarity, brevity, originality, and wit.
  • Course projects are due Fri May 3, the last day of finals week. This is a HARD deadline.
  • You should propose a project idea and have it approved by us by Wed Apr 3. Please send us your idea(s) in an email first, then come by to discuss it in person at office hours (or make an appointment with one of us).
  • Projects will count for somewhere between two and four problem sets' worth of the course grade. We'll be quite receptive to weighting the project more heavily if it's a strong one.


    PROBLEM SETS, LECTURE NOTES, HANDOUTS
    Note on MATLAB access: The PC labs are in the Towne building; full details on the labs are available at http://www.seas.upenn.edu/cets/labs.html. The PC labs require an Eniac account for access. Anyone with a gradient account who doesn't already have an Eniac account is entitled to one.
    Problem Set #1 (due Weds Jan 23, start of class) [Postscript] [PDF]
    A clarifying note on definitions [Postscript] [PDF]
    Solutions to Problem Set #1 [Postscript] [PDF]
    Paper: "An Upper Bound on the Loss from Approximate Optimal-Value Functions", by Singh and Yee [Postscript] [PDF]
    Problem Set #2 (due Weds Feb 6, start of class) [Postscript] [PDF]
    Related paper: "A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes", by MK, Mansour, Ng [Postscript] [PDF]
    Link to Prof. Saul's lecture note directory here
    Lecture Notes on Probability [Gzipped Postscript] [PDF]
    Lecture Notes on Regression [Gzipped Postscript] [PDF]
    Problem Set #3 (Due Wed Feb 20, start of class) [Gzipped Postscript] [PDF]
    Related MATLAB binary data files: nasdaq_lag5.mat digits2vs3.mat
    Lecture Notes on Convexity [Gzipped Postscript] [PDF]
    Lecture Notes on Iterative Scaling [Gzipped Postscript] [PDF]
    Problem Set #4 (Due Wed Feb 27, start of class) [Gzipped Postscript] [PDF]
    Lecture Notes on State Aggregation [Gzipped Postscript] [PDF]
    Lecture Notes on Sensor Fusion [Gzipped Postscript] [PDF]
    Lecture Notes on HMMs [Gzipped Postscript] [PDF]
    Problem Set #6 (Due Monday Mar 25) [Gzipped Postscript] [PDF]
    Problem Set #7 (due Mon Apr 1) [Postscript] [PDF]
    MK tutorial including inference in Bayes nets, reinforcement learning [Postscript] [PDF]


    DETAILED COURSE OUTLINE
    The outline below is subject to ongoing revision. In general, we'll try to modify the outline both in advance (to reflect anticipated changes in material and scheduling of topics) and in retrospect (to reflect what was actually covered in each past week). Just to be fancy, we'll change the color of subjects we've already covered to blue.


    WEEK 1 (Jan 7):
    INTRODUCTION; PLANNING IN MDPs

  • Course overview
  • Introduction and motivation for Markov Decision Processes (MDPs) and reinforcement learning
  • Definition of an MDP: states, actions, transition probabilities
  • Policies in an MDP
  • Notions of return: discounted, undiscounted, delayed rewards, credit assignment
  • Horizon time
  • Planning in MDPs: computing the optimal policy given all parameters of an MDP
  • Value functions, fixed-point recurrence of value function
  • Greedy policies with respect to value functions
  • Bellman optimality equation for the optimal value function
  • The value iteration algorithm for planning (see the sketch after this list)
  • Proof of value iteration convergence to optimal value function/policy
  • Demo of value iteration: grid world with stochastic "wind" and dragons
  • Policy evaluation via solution of a linear system of equations
  • The policy iteration algorithm for planning
  • Monotonic improvement of policy iteration, relation to value iteration: see Problem Set 1
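
    As a concrete companion to the planning material above, here is a minimal sketch of value iteration in Python/NumPy. It is an illustration only, not course code: the transition array P, reward array R, and discount factor are hypothetical placeholders.

      import numpy as np

      def value_iteration(P, R, gamma=0.9, tol=1e-8):
          """P[a, s, t] = P(t | s, a); R[s, a] = expected immediate reward.
          Iterates the Bellman optimality backup to convergence and returns
          the optimal value function and a greedy policy with respect to it."""
          n_actions, n_states, _ = P.shape
          V = np.zeros(n_states)
          while True:
              # Q(s, a) = R(s, a) + gamma * sum_t P(t | s, a) V(t)
              Q = R.T + gamma * np.einsum('ast,t->as', P, V)
              V_new = Q.max(axis=0)
              if np.max(np.abs(V_new - V)) < tol:   # contraction guarantees convergence
                  return V_new, Q.argmax(axis=0)
              V = V_new

    Policy iteration alternates the same greedy step with exact policy evaluation, i.e., solving the linear system V = R_pi + gamma * P_pi V for the current policy.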

    Related Readings

  • Sutton and Barto, pages 66-81, introduce MDPs, give some nice examples, and cover value functions.
  • S&B pages 89-102 cover policy evaluation, policy iteration, and value iteration.

    Problem Set #1 (due Weds Jan 23, start of class) [Postscript] [PDF]
    A clarifying note on definitions [Postscript] [PDF]


    WEEK 2 (Jan 14):
    FOUNDATIONS OF PROBABILITY; LARGE DEVIATION THEORY; LEARNING IN MDPs

  • Central limit theorem
  • Large deviation theory; general form and examples
  • Models and goals of learning in MDPs (overview): generative models, episodic (reset-to-start), no-reset
  • Policy evaluation in the learning setting
  • TD(k), phased TD(k): algorithm definitions (a sketch of the k-step backup follows this list)
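
    A minimal sketch of the k-step TD backup under the episodic model; representing a trajectory as a list of (state, reward) pairs is an assumption for illustration. Setting k = 1 gives classical TD(0), while taking k at least the episode length recovers Monte Carlo policy evaluation.

      import numpy as np

      def td_k_pass(V, trajectory, k, alpha, gamma):
          """One pass of k-step TD over a trajectory [(s_0, r_0), (s_1, r_1), ...].
          Target: r_t + gamma r_{t+1} + ... + gamma^{k-1} r_{t+k-1} + gamma^k V(s_{t+k})."""
          T = len(trajectory)
          for t in range(T):
              target, discount = 0.0, 1.0
              for i in range(t, min(t + k, T)):
                  target += discount * trajectory[i][1]        # discounted reward sum
                  discount *= gamma
              if t + k < T:
                  target += discount * V[trajectory[t + k][0]] # bootstrap from current V
              s = trajectory[t][0]
              V[s] += alpha * (target - V[s])                  # move V(s) toward the target
          return V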

    Related Readings

  • Sutton and Barto pages 112-116 discuss Monte Carlo policy evaluation, which we called the TD(k=infinity) algorithm
  • S&B pages 133-145 discuss the interpolation from relying purely on your value function estimate to relying purely on Monte Carlo trajectories
  • S&B pages 164-168 examine what we have been calling TD(k)

    Lecture Notes for Jan 14 (Foundations of Probability) [Gzipped Postscript] [PDF]
    Paper: "On the Complexity of Solving Markov Decision Processes", by Littman, Dean, and Kaelbling [Postscript] [PDF]


    WEEK 3 (Jan 21; Short week, MLK day):
    LEARNING IN MDPs: TD(k) ANALYSIS

  • Analysis of phased TD(k) via large deviation bounds and union bound (uniform convergence analysis)
  • Phased TD(k) "bias-variance" bound on error
  • Schedule for k as a function of phase number
  • Very brief mention of TD(lambda)
  • Introduction to Q-learning

    Paper: "'Bias-Variance' Error Bounds for Temporal Difference Updates", by MK and Singh, gives a fairly brief treatment of the TD(k) material we covered (lots of details missing, though) [Postscript] [PDF]


    WEEK 4 (Jan 28):
    Q-LEARNING; EXPLORATION VS. EXPLOITATION; LARGE STATE SPACES

  • Analysis of Q-learning via large deviation bounds; dependence on number of states (basic update sketched after this list)
  • Indirect (model-based) approach
  • Exploration vs Exploitation: E^3 algorithm and analysis
  • Function approximation in large state spaces: use in Q-learning, convergence properties, gradient methods
  • Generative models for large state spaces
  • Sparse sampling algorithms for planning
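
    A minimal sketch of the tabular Q-learning update, paired with the naive epsilon-greedy exploration rule (in contrast to the explicit exploration/exploitation guarantees of E^3); all names and parameters here are illustrative, not course code.

      import numpy as np

      def q_learning_step(Q, s, a, r, s_next, alpha, gamma):
          """Q(s, a) <- Q(s, a) + alpha * [r + gamma * max_a' Q(s', a') - Q(s, a)]."""
          Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

      def epsilon_greedy(Q, s, epsilon, rng):
          """Explore with a random action w.p. epsilon; otherwise act greedily."""
          if rng.random() < epsilon:
              return int(rng.integers(Q.shape[1]))
          return int(np.argmax(Q[s]))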

    Problem Set #2 (due Weds Feb 6, start of class) [Postscript] [PDF]
    Paper: "A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes", by MK, Mansour, Ng [Postscript] [PDF]


    WEEK 5 (Feb 4):
    LINEAR AND LOGISTIC REGRESSION; GRADIENT AND NEWTON METHODS

  • Discrete models, continuous models
  • Regression as graphical model
  • Linear regression: least squares
  • Logistic regression: gradient descent, Newton's method (sketched after this list)
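
    To make the regression material concrete, here is a minimal sketch of logistic regression fit by Newton's method (iteratively reweighted least squares); the data shapes and iteration count are illustrative, and the Hessian is assumed nonsingular.

      import numpy as np

      def sigmoid(z):
          return 1.0 / (1.0 + np.exp(-z))

      def fit_logistic_newton(X, y, n_iters=20):
          """Fit P(y=1 | x) = sigmoid(w . x) by Newton's method.
          X: (n, d) design matrix; y: (n,) labels in {0, 1}."""
          w = np.zeros(X.shape[1])
          for _ in range(n_iters):
              p = sigmoid(X @ w)
              grad = X.T @ (y - p)                      # gradient of the log-likelihood
              H = -(X * (p * (1 - p))[:, None]).T @ X   # Hessian of the log-likelihood
              w -= np.linalg.solve(H, grad)             # Newton step: w <- w - H^{-1} grad
          return w

    Plain gradient ascent replaces the Newton step with w += learning_rate * grad, trading per-iteration cost for more iterations.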

    Lecture Notes on Regression [Gzipped Postscript] [PDF]
    Problem Set #3 (Due Wed Feb 20, start of class) [Gzipped Postscript] [PDF]
    Data sets in MATLAB (binary) format: nasdaq_lag5.mat digits2vs3.mat


    WEEK 6 (Feb 11):
    PREDICTION AND ITERATIVE SCALING

  • Convex functions
  • Jensen's inequality (numerical check after this list)
  • Legendre transformations
  • Iterative scaling
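
    Jensen's inequality, the workhorse behind the auxiliary-function arguments in iterative scaling (and in EM later in the course), says E[f(X)] >= f(E[X]) for convex f. A tiny numerical check, with an arbitrary choice of distribution and convex function:

      import numpy as np

      rng = np.random.default_rng(0)
      x = rng.exponential(scale=1.0, size=100000)   # arbitrary random variable
      f = np.square                                 # a convex function

      # For Exponential(1): E[X^2] = 2 >= (E[X])^2 = 1.
      print(np.mean(f(x)), ">=", f(np.mean(x)))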


    WEEK 7 (Feb 18):
    STATE AGGREGATION AND CLUSTERING

  • Aggregate Markov models
  • Language modeling
  • k-means algorithm (sketched after this list)
  • Gaussian mixture models
  • Graphical representation
  • EM algorithm: proof of convergence
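
    A minimal sketch of the k-means (Lloyd's) algorithm; the initialization and stopping rule are illustrative, and the empty-cluster edge case is ignored for brevity. EM for Gaussian mixtures generalizes this by replacing hard assignments with posterior probabilities.

      import numpy as np

      def kmeans(X, k, n_iters=100, seed=0):
          """Alternate assigning points to the nearest center and recomputing
          each center as the mean of its assigned points."""
          rng = np.random.default_rng(seed)
          centers = X[rng.choice(len(X), size=k, replace=False)]
          for _ in range(n_iters):
              d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
              labels = d2.argmin(axis=1)                       # assignment step
              new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
              if np.allclose(new_centers, centers):            # converged
                  break
              centers = new_centers                            # update step
          return centers, labels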

    Lecture Notes on Iterative Scaling [Gzipped Postscript] [PDF]
    Problem Set #4 (Due Wed Feb 27, start of class) [Gzipped Postscript] [PDF]
    Lecture Notes on State Aggregation [Gzipped Postscript] [PDF]


    WEEK 8 (Feb 25):
    DIMENSIONALITY REDUCTION AND FUSION

  • PCA algorithm (sketched after this list)
  • Factor analysis
  • Probabilistic union model
  • Application to vision, speech
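
    A minimal sketch of PCA via an eigendecomposition of the sample covariance; names and shapes are illustrative. Factor analysis, by contrast, wraps a similar linear-Gaussian structure in an explicit probabilistic model with per-coordinate noise.

      import numpy as np

      def pca(X, n_components):
          """Project (n, d) data onto its top principal directions."""
          Xc = X - X.mean(axis=0)                      # center the data
          cov = Xc.T @ Xc / (len(X) - 1)               # sample covariance, (d, d)
          eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
          top = eigvecs[:, ::-1][:, :n_components]     # top eigenvectors
          return Xc @ top                              # low-dimensional projection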

    Lecture Notes on Sensor Fusion [Gzipped Postscript] [PDF]


    WEEK 9 (Mar 4):
    HIERARCHY AND DYNAMICS

  • Decision trees
  • Hierarchical mixture of experts
  • Hidden Markov models (forward algorithm sketched after this list)
  • Baum-Welch algorithm
  • Application to speech, chat mining
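
    A minimal sketch of the forward algorithm for a discrete HMM, which computes the likelihood that Baum-Welch (EM for HMMs) maximizes; the parameter names are illustrative, and the rescaling needed to avoid underflow on long sequences is omitted.

      import numpy as np

      def forward(pi, A, B, obs):
          """pi[i]: initial state probs; A[i, j] = P(state j | state i);
          B[i, o] = P(obs o | state i). Returns P(obs sequence)."""
          alpha = pi * B[:, obs[0]]                    # alpha_1(i) = pi_i B_i(o_1)
          for o in obs[1:]:
              alpha = (alpha @ A) * B[:, o]            # sum out the previous state
          return alpha.sum()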

    Lecture Notes on HMMs [Gzipped Postscript] [PDF]


    WEEK 10 (Mar 11):
    SPRING BREAK


    WEEK 11 (Mar 18):
    GRAPHICAL MODELS FOR PROBABILISTIC INFERENCE

  • Motivation and definition of Bayes nets
  • Examples of Bayes nets from previous lectures
  • Conditional independence in Bayes nets; d-separation
  • Explaining away (numerical example after this list)
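
    To make explaining away concrete, here is a tiny numerical example: two independent binary causes of a common effect (a hypothetical noisy-OR alarm network). The causes are marginally independent, but conditioning on the effect couples them: observing one cause lowers the posterior of the other.

      # Hypothetical noisy-OR network: burglary (b) and earthquake (e) cause an alarm.
      p_b, p_e = 0.01, 0.02                          # priors on the two causes

      def p_alarm(b, e):                             # P(alarm = 1 | b, e)
          return 1 - (1 - 0.95 * b) * (1 - 0.3 * e)

      # Joint probabilities P(b, e, alarm = 1) by enumeration.
      joint = {(b, e): (p_b if b else 1 - p_b) * (p_e if e else 1 - p_e) * p_alarm(b, e)
               for b in (0, 1) for e in (0, 1)}
      p_a = sum(joint.values())

      p_b_given_a = (joint[1, 0] + joint[1, 1]) / p_a            # ~0.62
      p_b_given_ae = joint[1, 1] / (joint[0, 1] + joint[1, 1])   # ~0.03
      print(p_b_given_a, ">", p_b_given_ae)   # the earthquake "explains away" the burglary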


    WEEK 12 (Mar 25):
    INFERENCE IN BAYES NETS

  • Definition of inference problem
  • The polytree algorithm and analysis (chain special case sketched after this list)
  • Variational methods for BN inference
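
    On a simple chain with no evidence, the polytree algorithm's message passing reduces to forward marginalization; a minimal sketch of that special case (the chain itself is a hypothetical example):

      import numpy as np

      def chain_marginal(pi, transitions, t):
          """Exact marginal P(X_t) in a chain X_0 -> X_1 -> ... given the initial
          distribution pi and a list of transition matrices A[i, j] = P(j | i)."""
          msg = pi
          for A in transitions[:t]:
              msg = msg @ A          # marginalize out the previous variable
          return msg

    With evidence, the full polytree algorithm combines such causal messages with diagnostic messages flowing against the edges at each node.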

    Problem Set #7 (due Mon Apr 1) [Postscript] [PDF]
    MK tutorial including inference in Bayes nets, reinforcement learning [Postscript] [PDF]


    WEEK 13 (Apr 1):
    BAYES NETS AND MDPs

  • Large deviation methods for BN inference (sampling sketch after this list)
  • Factored MDPs
  • Exploration vs. exploitation in factored MDPs
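
    A minimal sketch of the simplest large-deviation approach to inference: estimate a marginal by forward sampling and bound the error with Hoeffding's inequality. The two-node network X -> Y used here is a hypothetical example.

      import numpy as np

      rng = np.random.default_rng(0)
      n = 100000
      x = rng.random(n) < 0.3                        # P(X = 1) = 0.3
      y = rng.random(n) < np.where(x, 0.8, 0.1)      # P(Y = 1 | X) for X = 1, 0

      p_hat = y.mean()                               # estimate of P(Y = 1); true value 0.31
      delta = 0.01
      eps = np.sqrt(np.log(2 / delta) / (2 * n))     # |p_hat - p| <= eps w.p. >= 1 - delta
      print(p_hat, "+/-", eps)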

    Paper: An Introduction to Variational Methods for Graphical Models. M. Jordan, Z. Ghahramani, T. Jaakkola, L. Saul. [Postscript] [PDF]
    Paper: Variational Methods and the QMR-DT Database. T. Jaakkola and M. Jordan. [Postscript] [PDF]
    Paper: Large Deviation Methods for Approximate Probabilistic Inference, with Rates of Convergence, MK and LS. [Postscript] [PDF]


    WEEK 14 (Apr 8):
    INTRODUCTION TO GAME THEORY

  • Introduction to multi-agent interaction and game theory
  • Basics of game theory: matrix games, mixed strategies, Nash equilibria, etc.
  • Minimax theorem (fictitious-play sketch after this list)
  • Nash's theorem
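
    The minimax value of a zero-sum matrix game can be computed by linear programming; a lighter-weight illustration is fictitious play, whose empirical frequencies converge to a minimax equilibrium in zero-sum games. A minimal sketch on rock-paper-scissors (game and iteration count are illustrative):

      import numpy as np

      # Row player's payoff matrix for rock-paper-scissors (zero-sum).
      A = np.array([[ 0, -1,  1],
                    [ 1,  0, -1],
                    [-1,  1,  0]])

      counts_row = np.ones(3)    # empirical action counts of each player
      counts_col = np.ones(3)
      for _ in range(100000):
          # Each player best-responds to the opponent's empirical mixed strategy.
          row = np.argmax(A @ (counts_col / counts_col.sum()))
          col = np.argmin((counts_row / counts_row.sum()) @ A)
          counts_row[row] += 1
          counts_col[col] += 1

      print(counts_row / counts_row.sum())   # approaches the uniform (1/3, 1/3, 1/3) equilibrium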


    WEEK 15 (Apr 15):
    STOCHASTIC AND GRAPHICAL GAMES

  • Compact representations in game theory: multistage, multiplayer, multiaction
  • Definition of stochastic games
  • Planning in stochastic games: minimax value iteration, sparse sampling, correlated equilibria
  • Q-learning in stochastic games
  • Undirected graphical models for game theory
  • Tree algorithms
  • Summarization games
