CS 748: Advances in Intelligent and Learning Agents
(Spring 2022)

(Picture source: https://pixabay.com/photos/sudoku-mystery-puzzle-book-hand-552944/.)

Instructor

  Shivaram Kalyanakrishnan
  Office: 220, New CSE Building
  Phone: 7704
  E-mail: shivaram@cse.iitb.ac.in

Teaching Assistant

  Santhosh Kumar G.
  E-mail: santhoshkg@iitb.ac.in

Course Description

Artificial intelligence is fast making inroads into various fields, and is indeed influencing our lives in a telling way. This course will acquaint students with the state of the art in designing and deploying intelligent and learning agents. The course will build upon the platform laid by CS 747 (Foundations of Intelligent and Learning Agents) to engage students in targeted research and system-building projects.

The course has three main objectives. First, it seeks to impart to students the mindset that artificial intelligence and machine learning can substantially benefit real-world problems, and to give them the confidence that they can drive this process. Second, the course seeks to develop the students' skills to abstract, analyse, design, implement, evaluate, and iterate while devising solutions. The third objective of the course is to train students to comprehend technical discourse and sharpen their communication skills.

The course is organised in the form of two parallel tracks: lectures (mainly) based on research papers, and a semester-long research project. Topics covered in the lectures will include, among others, POMDPs, theoretical analysis of MDP planning and learning, representation discovery, exploration, abstraction, animal behaviour, evolutionary algorithms, philosophy of AI, and applications. Students will be provided a short quiz/assignment every week based on the lecture material.

The research project offers students an opportunity to apply their learning in creative and imaginative ways to understand, build, and analyse systems. Both theoretical and empirical investigations may be undertaken. Students may work alone or in teams. Each team will be guided individually through the phases of the research project, but will share its progress with the class at designated intervals.

Prerequisites

Registration is open to students who have passed CS 747 and have secured the instructor's consent (which is decided based on performance in CS 747).

Hybrid Mode

It will be possible for students to take the course entirely in the on-line mode, although in-person meetings are also planned, as described below.

All lecture slides and instructional videos will be made available on this page.

There shall be no synchronous meetings (either on-line or in person) that students are mandated to attend.

The instructor will hold office hours in the allotted meeting slot (Slot 13: 7.00 p.m. – 8.25 p.m. Mondays and Thursdays). No new material will be presented during these slots. The instructor plans to hold Monday office hours on-line, and Thursday office hours in person. However, this plan may change depending on institute guidelines on in-person meetings. In the first few weeks, both weekly meetings will be held on-line. Details of the meeting schedule and links will be shared on the class Moodle page and updated as the semester progresses.

Students are strongly encouraged to keep up with the weekly plan posted below, and should they have any questions for the instructor, bring them up through one of the channels listed. Nonetheless, students who are unable to interact with the instructor on a regular basis will be at no particular disadvantage. Students who are unable to access course material should promptly inform the instructor.

Research presentations from the students will have to be pre-recorded. Students are encouraged to familiarise themselves with software for creating high-quality video presentations, which will be viewed off-line by the class.

Weekly Plan

Tuesday 12.00 p.m.: Lectures and slides for the week are put up on this page.

Tuesday–Saturday: Students watch the videos and make a note of questions and comments.

Tuesday–Saturday: Students post their questions and comments on the week's discussion forum (on Moodle). It is okay to ask questions based on previous lectures, and bring up topics of general interest.

Office hours (7.00 p.m. – 8.25 p.m. Mondays and Thursdays):
- Each meeting slot will be conducted either fully on-line or fully in-person. To begin, all meeting slots will be on-line; if/when the guidelines permit, the instructor will hold the Thursday office hours in person.
- Each meeting slot will begin with time dedicated to discussing the week's lecture material, following which teams may discuss their research projects.
- Students who are unable to join the on-line or in-person meetings can optionally provide their phone numbers; the instructor will call them during the office hours.
- The instructor will periodically schedule meetings with the teams to discuss their research progress. These meetings will be held during the regular meeting slots. While it would be ideal for the entire team to meet, it is okay if some members are unable to do so.

Thursday 11.55 p.m.: A quiz is published based on the week's lecture and reading material.

Monday 11.55 p.m.: Students submit a response to the week's quiz.

Details of the web-based interaction, as well as a form for requesting the instructor to call, will be provided on Moodle. In addition, students will be given a feedback form through which they can communicate issues related to the course at any point of time.

Evaluation

There will be 10–12 weekly quizzes, each worth 4–6 marks, together totalling at least 50 marks. The marks contributed by the quizzes to the grade will be the minimum of the total marks earned in the quizzes and 40, so that the quizzes and the 60-mark research project add up to 100 marks.

The research project will carry 60 marks, divided as: 5 marks for an introductory presentation, 5 marks for a proposal, 10 marks for a mid-stage presentation, 10 marks for the final presentation, and 30 marks for the final report. An outstanding research project will bypass the regular evaluation criteria and automatically result in an "AA" grade for the concerned team.

All submissions must be made through Moodle.

Students auditing the course must score 50 or more marks in the course to be awarded an "AU" grade.
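
The marks arithmetic above can be illustrated with a short Python sketch. It is not an official grading script: the function and constant names are hypothetical, and it assumes that the quiz contribution is capped at 40 marks, so that the quizzes and the 60-mark project together sum to 100.

```python
# Illustrative sketch of the marks arithmetic; all names here are hypothetical.
QUIZ_CAP = 40  # assumption: quiz contribution is capped at 40 marks

# Maximum marks for each project component, as listed in the Evaluation section.
PROJECT_MAX = {
    "introductory_presentation": 5,
    "proposal": 5,
    "mid_stage_presentation": 10,
    "final_presentation": 10,
    "final_report": 30,
}

AUDIT_PASS_MARK = 50  # auditing students need 50 or more marks for "AU"


def quiz_contribution(quiz_marks):
    """Sum the weekly quiz marks and cap the result at QUIZ_CAP."""
    return min(sum(quiz_marks), QUIZ_CAP)


def project_contribution(project_marks):
    """Sum the project component marks, checking each against its maximum."""
    total = 0
    for component, maximum in PROJECT_MAX.items():
        earned = project_marks.get(component, 0)
        if not 0 <= earned <= maximum:
            raise ValueError(f"{component}: {earned} is outside 0..{maximum}")
        total += earned
    return total


def course_total(quiz_marks, project_marks):
    """Total course marks out of 100 under the assumptions stated above."""
    return quiz_contribution(quiz_marks) + project_contribution(project_marks)


def audit_passes(total):
    """Whether an auditing student meets the threshold for an "AU" grade."""
    return total >= AUDIT_PASS_MARK


# Made-up example: eleven quizzes and one set of project marks.
example_quizzes = [4, 5, 3, 5, 4, 6, 5, 4, 5, 4, 3]  # raw total 48, capped to 40
example_project = {
    "introductory_presentation": 4,
    "proposal": 5,
    "mid_stage_presentation": 8,
    "final_presentation": 9,
    "final_report": 25,
}
print(course_total(example_quizzes, example_project))  # prints 91
```

The sketch does not model the rule that an outstanding research project results in an "AA" grade regardless of marks, since that decision is not a function of the marks alone.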

Moodle

Moodle will be the primary course management system. Marks for the assessments will be maintained on the class Moodle page; discussion fora will also be hosted on Moodle. Students who do not have an account on Moodle for the course must send the instructor a request by e-mail, specifying the roll number/employee number for account creation.

Academic Honesty

Students are expected to adhere to the highest standards of integrity and academic honesty. Academic violations, as detailed below, will be dealt with strictly, in accordance with the institute's procedures and disciplinary actions for academic malpractice.

Students are expected to work alone on all the quizzes. While they are free to discuss the material presented in class with their peers, they must not discuss the contents of the assessments (either the questions or the solutions) with classmates or anybody else other than the instructor and TAs. Violations will be considered acts of dishonesty.

Students may freely collaborate with their peers on their research, but any assistance received from colleagues must be properly acknowledged in the corresponding presentations and reports.

Communication

This page will serve as the primary source of information regarding the course, the schedule, and related announcements. The Moodle page for the course will be used for sharing resources for the lectures and assignments, and also for recording grades.

E-mail is the best means of communicating with the instructor; students must send e-mail to "shivaram@cse.iitb.ac.in" with "[CS748]" in the subject line.

Texts and References

Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, 2nd edition, MIT Press, 2018. On-line version.

Artificial Intelligence: Foundations of Computational Agents, David L. Poole and Alan K. Mackworth, 2nd edition, Cambridge University Press, 2017. On-line version.

Selected research papers.

An Improved Policy Iteration Algorithm for Partially Observable MDPs
Eric A. Hansen, 1997
Planning and acting in partially observable stochastic domains
Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra, 1998
On the Complexity of Policy Iteration
Yishay Mansour and Satinder Singh, 1999
Algorithms for Inverse Reinforcement Learning
Andrew Y. Ng and Stuart Russell, 2000
Autonomous helicopter flight via reinforcement learning
Andrew Y. Ng, H. Jin Kim, Michael I. Jordan, and Shankar Sastry, 2003
Labeling Images with a Computer Game
Luis von Ahn and Laura Dabbish, 2004
Bandit based Monte-Carlo Planning
Levente Kocsis and Csaba Szepesvári, 2006
Evolutionary Function Approximation for Reinforcement Learning
Shimon Whiteson and Peter Stone, 2006
Efficient Selection of Multiple Bandit Arms: Theory and Practice
Shivaram Kalyanakrishnan and Peter Stone, 2010
PAC Subset Selection in Stochastic Multi-armed Bandits
Shivaram Kalyanakrishnan, Ambuj Tewari, Peter Auer, and Peter Stone, 2012
Information Complexity in Bandit Subset Selection
Emilie Kaufmann and Shivaram Kalyanakrishnan, 2013
Human-level control through deep reinforcement learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis, 2015
Batch-Switching Policy Iteration
Shivaram Kalyanakrishnan, Utkarsh Mall, and Ritish Goyal, 2016
Mastering the game of Go with deep neural networks and tree search
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis, 2016
Mastering the game of Go without human knowledge
David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis, 2017
A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, and Demis Hassabis, 2018
A Tighter Analysis of Randomised Policy Iteration
Meet Taraviya and Shivaram Kalyanakrishnan, 2019
PAC Mode Estimation using PPR Martingale Confidence Sequences
Shubham Anand Jain, Rohan Shah, Sanit Gupta, Denil Mehta, Inderjeet Nair, Jian Vora, Sushil Khyalia, Sourav Das, Vinay J. Ribeiro, and Shivaram Kalyanakrishnan, 2022

Schedule

Week 0 (January 4‐10) : Welcome; Introduction to the course.
- Administrative: Video.
Week 1 (January 11‐17) : Model-based RL.
- Lecture 1: Video, Slides.
Week 2 (January 18‐24) : Search.
- Lecture 1: Video, Slides.
- Reading: Chapter 3, Poole and Mackworth (2017); Sections 8, 8.1, 8.8, 8.9, 8.10, 8.11, Sutton and Barto (2018).
- Reference: Kocsis and Szepesvári (2006).
Week 3 (January 25‐31) : Applications of Deep RL.
- Lecture 1: Video, Slides.
Week 4 (February 1‐7) : Inverse Reinforcement Learning.
- Lecture 1: Video, Slides.
Week 5 (February 8‐14) : Evolution and Learning.
- Lecture 1: Video, Slides.
Week 6 (February 15‐28) : Crowdsourcing.
- Lecture 1: Video, Slides.
Week 7 (March 1‐7) : Running-time Analysis of Policy Iteration.
- Lecture 1: Video, Slides.
Week 8 (March 8‐14) : PAC Subset Selection in Stochastic Bandits.
- Lecture 1: Video, Slides.
Week 9 (March 15‐21) : PAC Mode Estimation.
- Lecture and discussion to be held on-line during Thursday meeting slot.
Week 10 (March 22‐28) : POMDPs.
- Lecture 1: Video, Slides.
Week 11 (March 29‐April 4) : POMDP Solution Techniques.
- Lecture 1: Video, Slides.
Week 12 (April 5‐11) : An Agent for Reconnaissance Blind Chess.
- Lecture and discussion to be held on-line during Thursday meeting slot.

Assignments

Week 2 Quiz, due 11.55 p.m. Monday, January 24.
Week 3 Quiz, due 11.55 p.m. Monday, January 31.
Project proposal document, due 11.55 p.m. Thursday, February 3.
Project proposal presentation (video), due 11.55 p.m. Thursday, February 10.
Week 5 Quiz, due 11.55 p.m. Monday, February 14.
Week 6 Quiz, due 11.55 p.m. Monday, February 28.
Week 7 Quiz, due 11.55 p.m. Monday, March 7.
Week 8 Quiz, due 11.55 p.m. Monday, March 14.
Week 9 Quiz, due 11.55 p.m. Monday, March 21.
Mid-stage presentation (video), due 11.55 p.m. Thursday, March 24.
Week 10 Quiz, due 11.55 p.m. Monday, March 28.
Week 11 Quiz, due 11.55 p.m. Monday, April 4.
Week 12 Quiz, due 11.55 p.m. Monday, April 11.
Project report, due 11.55 p.m. Thursday, May 5.
Project final presentation (video), due 11.55 p.m. Saturday, May 7.

Copyright

Slides and videos on this page are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Permission for their use beyond the scope of the license may be sought by writing to shivaram@cse.iitb.ac.in.
