CS 748: Advances in Intelligent and Learning Agents
(Spring 2021)

(Picture source: https://www.publicdomainpictures.net/pictures/310000/velka/vintage-checkers-game-board.jpg.)

Instructor

  Shivaram Kalyanakrishnan
  Office: 220, New CSE Building
  Phone: 7704
  E-mail: shivaram@cse.iitb.ac.in

Teaching Assistant

Santhosh Kumar G.
E-mail: santhoshkg@iitb.ac.in

Course Description

Artificial intelligence is fast making inroads into various fields, and is indeed influencing our lives in a telling way. This course will acquaint students with the state of the art in designing and deploying intelligent and learning agents. The course will build upon the platform laid by CS 747 (Foundations of Intelligent and Learning Agents) to engage students in targeted research and system-building projects.

The course has three main objectives. First, it seeks to impart to students the mindset that artificial intelligence and machine learning can substantially benefit real-world problems, and to give them the confidence that they can drive this process. Second, the course seeks to develop the students' skills to abstract, analyse, design, implement, evaluate, and iterate while devising solutions. The third objective of the course is to train students to comprehend technical discourse and sharpen their communication skills.

The course is organised in the form of two parallel tracks: lectures (mainly) based on research papers, and a semester-long research project. Topics covered in the lectures will include, among others, multiagent learning, POMDPs, theoretical analysis of MDP planning and learning, representation discovery, exploration, abstraction, animal behaviour, evolutionary algorithms, philosophy of AI, and applications. Students will be provided a short quiz/assignment every week based on the lecture material.

The research project gives students an opportunity to apply their learning in creative and imaginative ways to understand, build, and analyse systems. Both theoretical and empirical investigations may be undertaken. Students may work alone or in teams. Each team will be guided individually through the phases of the research project, but will share its progress with the class at designated intervals.

Prerequisites

Registration is open to students who have passed CS 747 and have secured the instructor's consent (which is decided on the basis of a preliminary research proposal).

On-line Mode

The course will be conducted entirely in on-line mode.

All lecture slides and instructional videos will be made available on this page.

There shall be no synchronous meetings that students are mandated to attend.

The instructor will hold office hours in the allotted meeting slot (Slot 13: 7.00 p.m. – 8.25 p.m. Mondays and Thursdays). No new material will be presented during these slots.

Students are strongly encouraged to keep up with the weekly plan posted below, and should they have any questions for the instructor, bring them up through one of the channels listed. Nonetheless, students who are unable to interact with the instructor on a regular basis will be at no particular disadvantage. Students who are unable to access course material may please promptly inform the instructor.

Research presentations from the students will have to be pre-recorded. Students are encouraged to familiarise themselves with software for creating high-quality video presentations, which will be viewed off-line by the class.

Weekly Plan

Wednesday 12.00 p.m.: Lectures and slides for the week are put up on this page.

Wednesday–Sunday: Students watch the videos and make a note of questions and comments.

Wednesday–Sunday: Students post their questions and comments on the week's discussion forum (on Moodle). It is okay to ask questions based on previous lectures, and bring up topics of general interest.

Office hours (7.00 p.m. – 8.25 p.m. Mondays and Thursdays):
- The instructor is available on a web-based interaction platform 7.00 p.m. – 8.00 p.m. Mondays. This session will be entirely devoted to the lecture material.
- Most Thursdays, the instructor will schedule meetings 7.00 p.m. – 8.25 p.m. with a subset of teams to discuss their research progress. These meetings can be conducted either using a web-based platform or telephone. While it would be ideal for the entire team to meet, it is okay if some members are unable to do so.
- Students with questions call the instructor's office phone (+91 22 2576 7704) 8.00 p.m. – 8.25 p.m. on Mondays.
- Students may also request for the instructor to call them; the instructor makes these calls 8.00 p.m. – 9.00 p.m. Mondays and Thursdays.

Friday 11.55 p.m.: A quiz is published based on the week's lecture and reading material.

Tuesday 11.55 p.m.: Students submit a response to the week's quiz.

Details of the web-based interaction, as well as a form for requesting the instructor to call, will be provided on Moodle. In addition, students will be given a feedback form through which they can communicate issues related to the course at any point of time.

Evaluation

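There will be 10–12 weekly quizzes, each worth 4–6 marks, together totalling at least 50 marks. The marks contributed by the quizzes to the grade will be the minimum of the total marks earned in the quizzes and 40; that is, the quiz contribution is capped at 40 marks, which together with the 60 marks for the research project makes up a total of 100.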

The research project will carry 60 marks, divided as: 5 marks for an introductory presentation, 5 marks for a proposal, 10 marks for a mid-stage presentation, 10 marks for the final presentation, and 30 marks for the final report. An outstanding research project will bypass the regular evaluation criteria and automatically result in an "AA" grade for the concerned team.

All submissions must be made through Moodle.

Students auditing the course must score 50 or more marks in the course to be awarded an "AU" grade.
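
The arithmetic of the evaluation scheme can be sketched as follows. This is only an illustration, assuming that the quiz contribution is capped at 40 marks (so that quizzes and the 60-mark project add up to 100). The function names and the sample marks are hypothetical, and the automatic "AA" for an outstanding project is not modelled.

```python
# Illustrative sketch of the evaluation arithmetic; not an official grading tool.
# Assumption: quizzes contribute at most QUIZ_CAP marks to the course total.

QUIZ_CAP = 40          # maximum contribution of the quizzes
AUDIT_THRESHOLD = 50   # minimum total for an "AU" grade

# Maximum marks for each project component, as listed above (sums to 60).
PROJECT_MAX = {
    "introductory presentation": 5,
    "proposal": 5,
    "mid-stage presentation": 10,
    "final presentation": 10,
    "final report": 30,
}


def quiz_component(quiz_marks):
    """Marks contributed by the quizzes: their total, capped at QUIZ_CAP."""
    return min(sum(quiz_marks), QUIZ_CAP)


def project_component(project_marks):
    """Sum of project marks, checking each component against its maximum."""
    for name, marks in project_marks.items():
        if name not in PROJECT_MAX:
            raise KeyError(f"unknown project component: {name}")
        if not 0 <= marks <= PROJECT_MAX[name]:
            raise ValueError(f"marks for {name} must lie in [0, {PROJECT_MAX[name]}]")
    return sum(project_marks.values())


def course_total(quiz_marks, project_marks):
    """Total marks out of 100: capped quiz contribution plus project marks."""
    return quiz_component(quiz_marks) + project_component(project_marks)


def earns_audit_grade(total):
    """Whether an auditing student's total meets the "AU" threshold."""
    return total >= AUDIT_THRESHOLD


# Hypothetical marks for one student: 11 quizzes and the five project components.
sample_quizzes = [4, 5, 3, 6, 4, 5, 4, 5, 2, 4, 5]   # sums to 47, capped at 40
sample_project = {
    "introductory presentation": 4,
    "proposal": 4,
    "mid-stage presentation": 8,
    "final presentation": 9,
    "final report": 25,
}
sample_total = course_total(sample_quizzes, sample_project)
print(sample_total)  # 40 (quizzes) + 50 (project) = 90
```

Under this reading, a student need not score full marks in every quiz to obtain the full quiz contribution, since the quizzes together total at least 50 marks.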

Moodle

Moodle will be the primary course management system. Marks for the assessments will be maintained on the class Moodle page; discussion fora will also be hosted on Moodle. Students who do not have an account on Moodle for the course must send the instructor a request by e-mail, specifying the roll number/employee number for account creation.

Academic Honesty

Students are expected to adhere to the highest standards of integrity and academic honesty. Academic violations, as detailed below, will be dealt with strictly, in accordance with the institute's procedures and disciplinary actions for academic malpractice.

Students are expected to work alone on all the quizzes. They may not share code or consult with classmates (or anybody other than the instructor and TAs) regarding their solutions. Violations will be considered acts of dishonesty.

Students may freely collaborate with their peers on their research, but any assistance received from colleagues must be properly acknowledged in the corresponding presentations and reports.

Communication

This page will serve as the primary source of information regarding the course, the schedule, and related announcements. The Moodle page for the course will be used for sharing resources for the lectures and assignments, and also for recording grades.

E-mail is the best means of communicating with the instructor; students must send e-mail to "shivaram@cse.iitb.ac.in" with "[CS748]" in the header.

Texts and References

Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, 2nd edition, MIT Press, 2018. On-line version.

Artificial Intelligence: Foundations of Computational Agents, David L. Poole and Alan K. Mackworth, 2nd edition, Cambridge University Press, 2017. On-line version.

Selected research papers.

Markov games as a framework for multi-agent reinforcement learning
Michael L. Littman, 1994
An Improved Policy Iteration Algorithm for Partially Observable MDPs
Eric A. Hansen, 1997
New Methods for Competitive Coevolution
Christopher D. Rosin and Richard K. Belew, 1997
Planning and acting in partially observable stochastic domains
Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra, 1998
On the Complexity of Policy Iteration
Yishay Mansour and Satinder Singh, 1999
Algorithms for Inverse Reinforcement Learning
Andrew Y. Ng and Stuart Russell, 2000
Rational and Convergent Learning in Stochastic Games
Michael Bowling and Manuela Veloso, 2001
Autonomous helicopter flight via reinforcement learning
Andrew Y. Ng, H. Jin Kim, Michael I. Jordan, and Shankar Sastry, 2003
Labeling Images with a Computer Game
Luis von Ahn and Laura Dabbish, 2004
Bandit based Monte-Carlo Planning
Levente Kocsis and Csaba Szepesvári, 2006
If multi-agent learning is the answer, what is the question?
Yoav Shoham, Rob Powers, and Trond Grenager, 2006
Evolutionary Function Approximation for Reinforcement Learning
Shimon Whiteson and Peter Stone, 2006
Efficient Selection of Multiple Bandit Arms: Theory and Practice
Shivaram Kalyanakrishnan and Peter Stone, 2010
PAC Subset Selection in Stochastic Multi-armed Bandits
Shivaram Kalyanakrishnan, Ambuj Tewari, Peter Auer, and Peter Stone, 2012
Information Complexity in Bandit Subset Selection
Emilie Kaufmann and Shivaram Kalyanakrishnan, 2013
Human-level control through deep reinforcement learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis, 2015
Batch-Switching Policy Iteration
Shivaram Kalyanakrishnan, Utkarsh Mall, and Ritish Goyal, 2016
Mastering the game of Go with deep neural networks and tree search
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis, 2016
Mastering the game of Go without human knowledge
David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis, 2017
A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, and Demis Hassabis, 2018
A Tighter Analysis of Randomised Policy Iteration
Meet Taraviya and Shivaram Kalyanakrishnan, 2019

Schedule

Week 0 (January 6‐12) : Welcome; Introduction to the course.
- Administrative: Video.
Week 1 (January 13‐19) : Multiagent Reinforcement Learning.
- Administrative: Video.
- Lecture 1: Video, Slides.
- Reading: Littman, 1994.
- References: Bowling and Veloso (2001); Shoham, Powers, and Grenager (2006).
Week 2 (January 20‐26) : Search.
- Administrative: Video.
- Lecture 1: Video, Slides.
- Reading: Chapter 3, Poole and Mackworth (2017); Sections 8, 8.1, 8.8, 8.9, 8.10, 8.11, Sutton and Barto (2018).
- Reference: Kocsis and Szepesvári (2006).
Week 3 (January 27‐February 2) : No lecture.
Week 4 (February 3‐9) : Model-based RL.
- Administrative: Video.
- Lecture 1: Video, Slides.
Week 5 (February 10‐16) : Applications of Deep RL.
- Lecture 1: Video, Slides.
Week 6 (February 17‐23) : PAC Subset Selection in Stochastic Bandits.
- Lecture 1: Video, Slides.
Week 7 (March 3‐9) : Running-time Analysis of Policy Iteration.
- Lecture 1: Video, Slides.
Week 8 (March 10‐16) : Inverse Reinforcement Learning.
- Lecture 1: Video, Slides.
Week 9 (March 17‐23) : Evolution and Learning.
- Lecture 1: Video, Slides.
Week 10 (March 24‐30) : Crowdsourcing.
- Lecture 1: Video, Slides.
Week 11 (March 31‐April 6) : POMDPs.
- Lecture 1: Video, Slides.
Week 12 (April 7‐13) : POMDP Solution Techniques.
- Lecture 1: Video, Slides.

You can also watch all these lectures on the video portal of Swayam Prabha IIT Bombay. For "Department" choose "Computer Science and Engineering"; for "Course" choose "Advances in Intelligent and Learning Agents"; press "Go".

Assignments

Week 1 Quiz, due 11.55 p.m. Tuesday, January 19.
Project proposal document, due 11.55 p.m. Sunday, January 24.
Project proposal presentation (video), due 11.55 p.m. Sunday, January 31.
Week 2 Quiz, due 11.55 p.m. Tuesday, February 2.
Week 4 Quiz, due 11.55 p.m. Thursday, February 11.
Week 5 Quiz, due 11.55 p.m. Tuesday, February 16.
Week 6 Quiz, due 11.55 p.m. Wednesday, March 3.
Mid-stage presentation (video), due 11.55 p.m. Sunday, March 7.
Week 7 Quiz, due 11.55 p.m. Tuesday, March 16.
Week 8 Quiz, due 11.55 p.m. Tuesday, March 16.
Week 9 Quiz, due 11.55 p.m. Tuesday, March 23.
Week 10 Quiz, due 11.55 p.m. Tuesday, March 30.
Week 11 Quiz, due 11.55 p.m. Thursday, April 15.
Project report, due 11.55 p.m. Monday, May 10.
Project final presentation (video), due 11.55 p.m. Wednesday, May 12.

Copyright

Slides and videos on this page are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Permission for their use beyond the scope of the license may be sought by writing to shivaram@cse.iitb.ac.in.
