as part of the Masters of Information & Communications Technology (MICT) program
Prerequisites: Ideally you've taken the comp3620/6320 Artificial Intelligence and/or the comp4670/6467 Statistical Machine Learning course. In general, some background in elementary logic, statistics, and probability is required, along with programming experience.
Auditors are welcome, but we request that auditing students commit to attending all lectures / labs / tutorials as course material is cumulative.
If you fulfil the official requirements, please send an email to Scott so that he can support your enrolment.
After the lecture and by email.
This course introduces reinforcement learning (RL) and planning under uncertainty, covering the underlying concepts necessary for understanding and developing intelligent systems. For instance, the world-class Backgammon program, TD-Gammon, is based on RL techniques. Topics covered include the classical MDP model, temporal difference learning, dynamic programming, structured models, approximation algorithms, integrating planning and learning, and the theory of universal rational agents based on sequential decision theory and algorithmic information theory.
| Nr | Date | Presenter | Content |
|---|---|---|---|
| Week 1 | |||
| IN | Jul. 22 | M./S. | Administrative + Overview (Admin) (General Overview) (Marcus Overview) |
| Week 2 | |||
| 1a | Jul. 29 | Scott | Introduction to Reinforcement Learning (MDP model and model classes: deterministic, SSP, process-oriented, etc.; model-based MDP solutions: dynamic programming, (async.) value iteration, (modified) policy iteration, linear programming, search & RTDP) (LAO* not covered) (Decision-theory and One-shot Decision Making) (MDPs and Sequential Decision Making) (Read Ch. 1-4 of Sutton and Barto; Ch. 2 of Sanner) |
| LT | Jul. 30 | | Lab 1a: Learning to drive in a simulator using Value Iteration and RTDP. (Handout) (Code) (Solution) |
| A1 | Jul. 30 | | Assignment 1 available (Assignment 1) |
| Week 3 | |||
| 1b | Aug. 5 | Scott | Introduction to Reinforcement Learning (model-free MDP solutions: estimation vs. control, exploration vs. exploitation, bandits and regret bounds, Monte Carlo) (Monte Carlo RL Slides) (Read Ch. 5 of Sutton and Barto) |
| LT | Aug. 6 | | Lab 1b: Bandit algorithms and Poker. (Handout) (Code) (Solution) |
| Week 4 | |||
| 1c | Aug. 12 | Scott | Introduction to Reinforcement Learning (model-free MDP solutions, continued: SARSA, Q-learning, temporal difference methods) (Review / Overview Slides) (Temporal Difference Slides) (TD lambda and Eligibility Traces) (Read Ch. 6-7 of Sutton and Barto) |
| LT | Aug. 13 | | Lab 1c: Monte Carlo and TD Lambda for Tic-Tac-Toe & Othello. (Handout) (Code) (Solution) |
| Week 5 | |||
| 1d | Aug. 19 | Scott | Introduction to Reinforcement Learning (function approximation for MDPs: linear and nonlinear methods, gradient descent, issues with control and partial observability) (RL with Function Approximation) (combining planning and learning: model-based RL overview, DYNA and extensions) (Model-based RL) (not covered: least-squares methods (LSTD & LSPI), bandits for trees (UCT), Bayesian methods) (Read Ch. 8-11 of Sutton and Barto) |
| LT | Aug. 20 | | Tutorial 1d: Sample problems/solutions and review. (Review Questions) |
| Week 6 | |||
| 2a | Aug. 26 | Marcus | Sequential Decisions based on Algorithmic Probability (information theory & Kolmogorov complexity, algorithmic probability & universal induction) (Slides for All Four Lectures) |
| A1 | Aug. 27 | | Assignment 1 due (email to Scott Sanner) |
| A2 | Aug. 27 | | Assignment 2 available (Assignment 2) |
| Week 7 | |||
| 2b | Sep. 2 | Marcus | Sequential Decisions based on Algorithmic Probability (minimum description length, the universal similarity metric) (See above for slides) |
| LT | Sep. 3 | | Tutorial 2b: Sample problems/solutions and review. |
| Week 8 | |||
| 2c | Sep. 9 | Marcus | Sequential Decisions based on Algorithmic Probability (Bayesian sequence prediction, universal sequence prediction) (See above for slides) |
| Week 9 | |||
| 2d | Sep. 16 | Marcus | Sequential Decisions based on Algorithmic Probability (universal rational agents, computational aspects) (See above for slides) |
| LT | Sep. 17 | | Tutorial 2d: Sample problems/solutions and review. |
| Week 10 | |||
| 3a | Sep. 23 | Scott | Decision-Theoretic Planning (structured representation: Markov chains and dynamic Bayesian nets (DBNs), decision diagrams, factored MDPs and Bellman equations, factored concurrent actions) (Markov Chains, DBNs, ADDs, and Factored MDPs) (Section 3.1-3.3 of Sanner) |
| LT | Sep. 24 | | Lab 3a: Decision diagrams and factored MDPs. (Handout) (Code) |
| A2 | Sep. 24 | | Assignment 2 due (email to Marcus Hutter) |
| A3 | Sep. 24 | | Assignment 3 available (Assignment 3) |
| Week 11 | |||
| 3b | Oct. 14 | Scott | Decision-Theoretic Planning (structured solution and approximation: read SPUDD and APRICODD) (POMDP Intro Slides and POMDP Tutorial Web Page) (not covered: structured linear-value approximation for MDPs) |
| LT | Oct. 15 | | Tutorial 3b: Go over Assignments 1 and 3. |
| Week 12 | |||
| 4a | Oct. 21 | Scott | Overview of Extensions (part I: inverse reinforcement learning, reward shaping, semi-MDPs / options / hierarchical planning and RL (MAXQ), constrained policies and HAMs) (Extensions, Part I) |
| Week 13 | |||
| 4b | Oct. 28 | Scott | Overview of Extensions (part II: continuous state and action spaces, sequential Markov / stochastic games, first-order MDPs, mechanism design) (Extensions, Part II) (Course Summary: Dimensions of RL and Planning) |
| LT | Oct. 29 | | Tutorial 4b: Sample problems/solutions and review. (Handout) |
| A3 | Oct. 29 | | Assignment 3 due (email to Scott Sanner) |
| FE | x | M./S. | Take-home final exam |