CS 748: Advances in Intelligent and Learning Agents
(Spring 2022)
(Picture source: https://pixabay.com/photos/sudoku-mystery-puzzle-book-hand-552944/.)
Instructor
Shivaram Kalyanakrishnan
Office: 220, New CSE Building
Phone: 7704
E-mail: shivaram@cse.iitb.ac.in
Teaching Assistant
Santhosh Kumar G.
E-mail: santhoshkg@iitb.ac.in
Course Description
Artificial intelligence is fast making inroads into various fields,
and is influencing our lives in significant ways. This course will
acquaint students with the state of the art in designing and deploying
intelligent and learning agents. The course will build upon the
platform laid by CS 747 (Foundations of Intelligent and Learning
Agents) to engage students in targeted research and system-building
projects.
The course has three main objectives. First, it seeks to instil in
students the mindset that artificial intelligence and machine learning
can substantially help solve real-world problems, and to give them the
confidence that they can drive this process. Second, the course seeks
to develop the students' skills to abstract, analyse, design,
implement, evaluate, and iterate while devising solutions. The third
objective of the course is to train students to comprehend technical
discourse and sharpen their communication skills.
The course is organised in the form of two parallel
tracks: lectures (mainly) based on research papers, and a
semester-long research project. Topics covered in the lectures will
include, among others, POMDPs, theoretical analysis of MDP planning
and learning, representation discovery, exploration, abstraction,
animal behaviour, evolutionary algorithms, philosophy of AI, and
applications. Students will be given a short quiz/assignment every
week based on the lecture material.
The research project offers students an opportunity to apply
their learning in creative and imaginative ways to understand, build,
and analyse systems. Both theoretical and empirical investigations may
be undertaken. Students may work alone or in teams. Each team will be
guided individually through the phases of the research project, but
will share its progress with the class at designated intervals.
Prerequisites
Registration is open to students who have passed CS 747 and have
secured the instructor's consent (which is decided based on
performance in CS 747).
Hybrid Mode
It will be possible for students to take the course entirely
on-line, although in-person meetings are also planned, as
described below.
- All lecture slides and instructional videos will be made available on
this page.
- There shall be no synchronous meetings (either on-line or
in person) that students are mandated to attend.
- The instructor will hold office hours in the allotted meeting slot
(Slot 13: 7.00 p.m. – 8.25 p.m. Mondays and Thursdays). No new
material will be presented during these slots. The instructor plans to
hold Monday office hours on-line, and Thursday office hours in
person. However, this plan may change depending on institute
guidelines on in-person meetings. In the first few weeks, both weekly
meetings will be held on-line. Details of the meeting
schedule and links will be shared on the class Moodle page and updated
as the semester progresses.
- Students are strongly encouraged to keep up with the weekly plan
posted below, and should they have any questions for the instructor,
bring them up through one of the channels listed. Nonetheless,
students who are unable to interact with the instructor on a regular
basis will be at no particular disadvantage. Students who are unable
to access course material should promptly inform the instructor.
- Research presentations from the students will have to be
pre-recorded. Students are encouraged to familiarise themselves
with software for creating high-quality video presentations, which
will be viewed off-line by the class.
Weekly Plan
- Tuesday 12.00 p.m.: Lectures and slides for the week are put up
on this page.
- Tuesday–Saturday: Students watch the videos and make a
note of questions and comments.
- Tuesday–Saturday: Students post their questions and
comments on the week's discussion forum (on Moodle). It is okay to
ask questions based on previous lectures, and bring up topics of
general interest.
- Office hours (7.00 p.m. – 8.25 p.m. Mondays and Thursdays):
  - Each meeting slot will be conducted either fully on-line or fully
    in-person. To begin, all meeting slots will be on-line; if/when the
    guidelines permit, the instructor will hold the Thursday office
    hours in person.
  - Each meeting slot will begin with time dedicated to discussing
    the week's lecture material, following which teams may discuss their
    research projects.
  - Students who are unable to join the on-line or in-person
    meetings can optionally provide their phone numbers; the instructor
    will call them during the office hours.
  - The instructor will periodically schedule meetings with the
    teams to discuss their research progress. These meetings will be
    held during the regular meeting slots. While it would
    be ideal for the entire team to meet, it is okay if some members
    are unable to do so.
- Thursday 11.55 p.m.: A quiz is published based on the
week's lecture and reading material.
- Students submit a response to the week's quiz
by 11.55 p.m. Monday.
Details of the web-based interaction, as well as a form for
requesting the instructor to call, will be provided on Moodle. In
addition, students will be given a feedback form through which they
can communicate issues related to the course at any time.
Evaluation
There will be 10–12 weekly quizzes, each worth 4–6 marks,
and together totalling at least 50 marks. The marks contributed by
the quizzes to the grade will be the minimum of the total marks
earned in the quizzes and 40; that is, the quiz contribution is
capped at 40 marks.
The research project will carry 60 marks, divided as: 5 marks for
an introductory presentation, 5 marks for a proposal, 10 marks for a
mid-stage presentation, 10 marks for the final presentation, and 30
marks for the final report. An outstanding research project
will bypass the regular evaluation criteria and automatically result
in an "AA" grade for the concerned team.
All submissions must be made through Moodle.
Students auditing the course must score 50 or more marks in the
course to be awarded an "AU" grade.
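The arithmetic of the marking scheme can be sketched in a few lines of Python. This is only an illustration, not an official grading script: it assumes the quiz contribution is capped at 40 marks, so that quizzes (40) and the research project (60) together total 100. All names below (QUIZ_CAP, PROJECT_MAX, course_total, and so on) are hypothetical, and the "AA" rule for outstanding projects is not modelled.

```python
# Illustrative sketch only; constants mirror the Evaluation section.
QUIZ_CAP = 40          # assumption: quiz contribution is capped at 40 marks
AUDIT_PASS_MARK = 50   # minimum total for an "AU" grade

# Maximum marks for each research project component (5+5+10+10+30 = 60).
PROJECT_MAX = {
    "introductory presentation": 5,
    "proposal": 5,
    "mid-stage presentation": 10,
    "final presentation": 10,
    "final report": 30,
}


def quiz_contribution(quiz_marks):
    """Sum the quiz marks and cap the result at QUIZ_CAP."""
    return min(sum(quiz_marks), QUIZ_CAP)


def project_contribution(project_marks):
    """Sum the project marks after checking each against its maximum."""
    for component, marks in project_marks.items():
        if not 0 <= marks <= PROJECT_MAX[component]:
            raise ValueError(f"marks out of range for {component}")
    return sum(project_marks.values())


def course_total(quiz_marks, project_marks):
    """Total marks out of 100: capped quizzes plus project."""
    return quiz_contribution(quiz_marks) + project_contribution(project_marks)


def audit_passes(total):
    """An auditing student needs at least AUDIT_PASS_MARK marks."""
    return total >= AUDIT_PASS_MARK
```

For instance, a student who earns 4.5 marks in each of 11 quizzes has a quiz total of 49.5, which contributes 40 marks under the assumed cap; with full project marks the course total would be 100.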
Moodle
Moodle will be the primary course management system. Marks for the
assessments will be maintained on the class Moodle page; discussion
fora will also be hosted on Moodle. Students who do not have an
account on Moodle for the course must send the instructor a request
by e-mail, specifying the roll number/employee number for account
creation.
Academic Honesty
Students are expected to adhere to the highest standards of
integrity and academic honesty. Academic violations, as detailed
below, will be dealt with strictly, in accordance with the
institute's procedures and disciplinary actions for academic
malpractice.
Students are expected to work alone on all the quizzes. While they
are free to discuss the material presented in class with their peers,
they must not discuss the contents of the assessments (either the
questions or the solutions) with classmates (or anybody other than
the instructor and TAs). Violations will be considered acts of
dishonesty.
Students may freely collaborate with their peers on their research,
but any assistance received from colleagues must be properly
acknowledged in the corresponding presentations and reports.
Communication
This page will serve as the primary source of information regarding
the course, the schedule, and related announcements. The Moodle page
for the course will be used for sharing resources for the lectures and
assignments, and also for recording grades.
E-mail is the best means of communicating with the instructor;
students must send e-mail to "shivaram@cse.iitb.ac.in" with "[CS748]"
in the subject line.
Texts and References
Reinforcement Learning: An Introduction, Richard S. Sutton and
Andrew G. Barto, 2nd edition, MIT Press, 2018. On-line version.
Artificial Intelligence: Foundations of Computational Agents, David
L. Poole and Alan K. Mackworth, 2nd edition, Cambridge University Press,
2017. On-line version.
Selected research papers.
-
An Improved Policy Iteration Algorithm for Partially Observable MDPs
Eric A. Hansen, 1997
-
Planning and acting in partially observable stochastic domains
Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra, 1998
-
On the Complexity of Policy Iteration
Yishay Mansour and Satinder Singh, 1999
-
Algorithms for Inverse Reinforcement Learning
Andrew Y. Ng and Stuart Russell, 2000
-
Autonomous helicopter flight via reinforcement learning
Andrew Y. Ng, H. Jin Kim, Michael I. Jordan, and Shankar Sastry, 2003
-
Labeling Images with a Computer Game
Luis von Ahn and Laura Dabbish, 2004
-
Bandit based Monte-Carlo Planning
Levente Kocsis and Csaba Szepesvári, 2006
-
Evolutionary Function Approximation for Reinforcement Learning
Shimon Whiteson and Peter Stone, 2006
-
Efficient Selection of Multiple Bandit Arms: Theory and Practice
Shivaram Kalyanakrishnan and Peter Stone, 2010
-
PAC Subset Selection in Stochastic Multi-armed Bandits
Shivaram Kalyanakrishnan, Ambuj Tewari, Peter Auer, and Peter Stone, 2012
-
Information Complexity in Bandit Subset Selection
Emilie Kaufmann and Shivaram Kalyanakrishnan, 2013
-
Human-level control through deep reinforcement learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel
Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas
K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir
Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan
Wierstra, Shane Legg, and Demis Hassabis, 2015
-
Batch-Switching Policy Iteration
Shivaram Kalyanakrishnan, Utkarsh Mall, and Ritish Goyal, 2016
-
Mastering the game of Go with deep neural networks and tree search
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis, 2016
-
Mastering the game of Go without human knowledge
David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis, 2017
-
A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, and Demis Hassabis, 2018
-
A Tighter Analysis of Randomised Policy Iteration
Meet Taraviya and Shivaram Kalyanakrishnan, 2019
-
PAC Mode Estimation using PPR Martingale Confidence Sequences
Shubham Anand Jain, Rohan Shah, Sanit Gupta, Denil Mehta, Inderjeet Nair, Jian Vora, Sushil Khyalia, Sourav Das, Vinay J. Ribeiro, and Shivaram Kalyanakrishnan, 2022
Schedule
- Week 0 (January 4–10): Welcome; Introduction to the course.
- Week 1 (January 11–17): Model-based RL.
  - Lecture 1: Video, Slides.
  - Reading: Sections 8, 8.1, 8.2, 8.3, 8.4, Sutton and Barto (2018); Ng et al. (2003).
- Week 2 (January 18–24): Search.
  - Lecture 1: Video, Slides.
  - Reading: Chapter 3, Poole and Mackworth (2017); Sections 8, 8.1, 8.8, 8.9, 8.10, 8.11, Sutton and Barto (2018).
  - Reference: Kocsis and Szepesvári (2006).
- Week 3 (January 25–31): Applications of Deep RL.
  - Lecture 1: Video, Slides.
  - Reading: Mnih et al. (2015), Silver et al. (2016).
  - References: Silver et al. (2017), Silver et al. (2018).
- Week 4 (February 1–7): Inverse Reinforcement Learning.
  - Lecture 1: Video, Slides.
  - Reading: Ng and Russell (2000).
- Week 5 (February 8–14): Evolution and Learning.
  - Lecture 1: Video, Slides.
  - Reading: Whiteson and Stone (2006).
- Week 6 (February 15–28): Crowdsourcing.
  - Lecture 1: Video, Slides.
  - Reading: von Ahn and Dabbish (2004).
- Week 7 (March 1–7): Running-time Analysis of Policy Iteration.
  - Lecture 1: Video, Slides.
  - References: Mansour and Singh (1999); Kalyanakrishnan, Mall, and Goyal (2016); Taraviya and Kalyanakrishnan (2019).
- Week 8 (March 8–14): PAC Subset Selection in Stochastic Bandits.
  - Lecture 1: Video, Slides.
  - Reading: Sections 1, 2, 3, Kalyanakrishnan and Stone (2010); Kalyanakrishnan et al. (2012).
  - References: Kaufmann and Kalyanakrishnan (2013).
- Week 9 (March 15–21): PAC Mode Estimation.
  - Lecture and discussion to be held on-line during Thursday meeting slot.
  - Reading: Jain et al. (2022).
- Week 10 (March 22–28): POMDPs.
  - Lecture 1: Video, Slides.
  - Reading: Sections 1, 2, 3, Kaelbling et al. (1998).
- Week 11 (March 29–April 4): POMDP Solution Techniques.
  - Lecture 1: Video, Slides.
  - Reading: Sections 4, 4.1, 4.2, 4.3, Kaelbling et al. (1998); Hansen (1997).
- Week 12 (April 5–12): An Agent for Reconnaissance Blind Chess.
  - Lecture and discussion to be held on-line during Thursday meeting slot.
  - Reading shared on Moodle.
Assignments
- Week 2 Quiz, due 11.55 p.m. Monday, January 24.
- Week 3 Quiz, due 11.55 p.m. Monday, January 31.
- Project proposal document, due 11.55 p.m. Thursday, February 3.
- Project proposal presentation (video), due 11.55 p.m. Thursday, February 10.
- Week 5 Quiz, due 11.55 p.m. Monday, February 14.
- Week 6 Quiz, due 11.55 p.m. Monday, February 28.
- Week 7 Quiz, due 11.55 p.m. Monday, March 7.
- Week 8 Quiz, due 11.55 p.m. Monday, March 14.
- Week 9 Quiz, due 11.55 p.m. Monday, March 21.
- Mid-stage presentation (video), due 11.55 p.m. Thursday, March 24.
- Week 10 Quiz, due 11.55 p.m. Monday, March 28.
- Week 11 Quiz, due 11.55 p.m. Monday, April 4.
- Week 12 Quiz, due 11.55 p.m. Monday, April 11.
- Project report, due 11.55 p.m. Thursday, May 5.
- Project final presentation (video), due 11.55 p.m. Saturday, May 7.
Copyright
Slides and videos on this page are licensed under a Creative Commons
Attribution-ShareAlike 4.0 International License. Permission for
their use beyond the scope of the license may be sought by writing to
shivaram@cse.iitb.ac.in.