Reinforcement Learning and Planning under Uncertainty (ANU COMP6460/4640)

as part of the Masters of Information & Communications Technology (MICT) program

Course Coordinators

Marcus Hutter
Scott Sanner

Presenters

Scott Sanner
Marcus Hutter

Time and Place

  • Lecture: Tuesday, 11:00 -- 13:00, Ian Ross R221 Graduate Teaching Room
  • Tutorial / Lab: Wednesday, 9:00 -- 11:00 (selected weeks, check schedule), CSIT N112

Enrolment

Prerequisites: Ideally you have taken COMP3620/6320 Artificial Intelligence and/or COMP4670/6467 Statistical Machine Learning. In general, some background in elementary logic, statistics, and probability is required, as is programming experience.

Auditors are welcome, but we ask auditing students to commit to attending all lectures, labs, and tutorials, as the course material is cumulative.

If you fulfil the official requirements, please send an email to Scott so that he can support your enrolment.

Assessment

  • 3 written assignments (10% each)
  • Take-home final examination (70%)

Important dates and special announcements

  • Aug. 27, 23:59: Assignment 1 due (email to Scott Sanner)
  • Sep. 24, 23:59: Assignment 2 due (email to Marcus Hutter)
  • Oct. 29, 23:59: Assignment 3 due (email to Scott Sanner)
  • TBD: Take-home final exam

Contact hours for students

After the lecture and by email.

Textbooks

Overview

This course introduces reinforcement learning (RL) and planning under uncertainty, covering the underlying concepts necessary for understanding and developing intelligent systems. For instance, the world-class Backgammon program TD-Gammon is based on RL techniques. Topics covered include the classical Markov decision process (MDP) model, temporal difference learning, dynamic programming, structured models, approximation algorithms, integrating planning and learning, and the theory of universal rational agents based on sequential decision theory and algorithmic information theory.

Syllabus

Nr Date Presenter Content
Week 1
IN Jul. 22 M./S. Administrative + Overview (Admin) (General Overview) (Marcus Overview)
Week 2
1a Jul. 29 Scott Introduction to Reinforcement Learning (MDP model and model classes: deterministic, SSP, process-oriented, etc.; model-based MDP solutions: dynamic programming, (async.) value iteration, (modified) policy iteration, linear programming, search & RTDP) (LAO* not covered) (Decision-theory and One-shot Decision Making) (MDPs and Sequential Decision Making) (Read Ch. 1-4 Sutton & Barto; Ch. 2 of Sanner)
LT Jul. 30   Lab 1a: Learning to drive in a simulator using Value Iteration and RTDP. (Handout) (Code) (Solution)
A1 Jul. 30   Assignment 1 available (Assignment 1)
Week 3
1b Aug. 5 Scott Introduction to Reinforcement Learning (model-free MDP solutions: estimation vs. control, exploration vs. exploitation, bandits and regret bounds, Monte Carlo) (Monte Carlo RL Slides) (Read Ch. 5 of Sutton and Barto)
LT Aug. 6   Lab 1b: Bandit algorithms and Poker. (Handout) (Code) (Solution)
Week 4
1c Aug. 12 Scott Introduction to Reinforcement Learning (model-free MDP solutions, continued: SARSA, Q-learning, temporal difference methods) (Review / Overview Slides) (Temporal Difference Slides) (TD lambda and Eligibility Traces) (Read Ch. 6-7 of Sutton and Barto)
LT Aug. 13   Lab 1c: Monte Carlo and TD Lambda for Tic-Tac-Toe & Othello. (Handout) (Code) (Solution)
Week 5
1d Aug. 19 Scott Introduction to Reinforcement Learning (function approximation for MDPs: linear and nonlinear methods, gradient descent, issues with control and partial observability) (RL with Function Approximation) (combining planning and learning: model-based RL overview, DYNA and extensions) (Model-based RL) (not covered -- least-squares methods (LSTD & LSPI), bandits for trees (UCT), Bayesian methods) (Read Ch. 8-11 of Sutton and Barto)
LT Aug. 20   Tutorial 1d: Sample problems/solutions and review. (Review Questions)
Week 6
2a Aug. 26 Marcus Sequential Decisions based on Algorithmic Probability (information theory & Kolmogorov complexity, algorithmic probability & universal induction) (Slides for All Four Lectures)
A1 Aug. 27   Assignment 1 due (email to Scott Sanner)
A2 Aug. 27   Assignment 2 available (Assignment 2)
Week 7
2b Sep. 2 Marcus Sequential Decisions based on Algorithmic Probability (minimum description length, the universal similarity metric) (See above for slides)
LT Sep. 3   Tutorial 2b: Sample problems/solutions and review.
Week 8
2c Sep. 9 Marcus Sequential Decisions based on Algorithmic Probability (Bayesian sequence prediction, universal sequence prediction) (See above for slides)
Week 9
2d Sep. 16 Marcus Sequential Decisions based on Algorithmic Probability (universal rational agents, computational aspects) (See above for slides)
LT Sep. 17   Tutorial 2d: Sample problems/solutions and review.
Week 10
3a Sep. 23 Scott Decision-Theoretic Planning (structured representation: Markov chains and dynamic Bayesian nets (DBNs), decision diagrams, factored MDPs and Bellman equations, factored concurrent actions) (Markov Chains, DBNs, ADDs, and Factored MDPs) (Section 3.1-3.3 of Sanner)
LT Sep. 24   Lab 3a: Decision diagrams and factored MDPs. (Handout) (Code)
A2 Sep. 24   Assignment 2 due (email to Marcus Hutter)
A3 Sep. 24   Assignment 3 available (Assignment 3)
Week 11
3b Oct. 14 Scott Decision-Theoretic Planning (structured solution and approximation: read SPUDD and APRICODD) (POMDP Intro Slides and POMDP Tutorial Web Page) (not covered: structured linear-value approximation for MDPs)
LT Oct. 15   Tutorial 3b: Go over Assignments 1 and 3.
Week 12
4a Oct. 21 Scott Overview of Extensions (part I: inverse reinforcement learning, reward shaping, semi-MDPs / options / hierarchical planning and RL (MAXQ), constrained policies and HAMs) (Extensions, Part I)
Week 13
4b Oct. 28 Scott Overview of Extensions (part II: continuous state and action spaces, sequential Markov / stochastic games, first-order MDPs, mechanism design) (Extensions, Part II) (Course Summary: Dimensions of RL and Planning)
LT Oct. 29   Tutorial 4b: Sample problems/solutions and review. (Handout)
A3 Oct. 29   Assignment 3 due (email to Scott Sanner)
FE x M./S. Take-home final exam


Last modified 2008-07-14 1:00 PM