CS 748: Advances in Intelligent and Learning Agents
(Spring 2021)
(Picture source: https://www.publicdomainpictures.net/pictures/310000/velka/vintage-checkers-game-board.jpg.)
Instructor
Shivaram Kalyanakrishnan
Office: 220, New CSE Building
Phone: 7704
E-mail: shivaram@cse.iitb.ac.in
Teaching Assistant
Santhosh Kumar G.
E-mail: santhoshkg@iitb.ac.in
Course Description
Artificial intelligence is fast making inroads into various fields,
and is indeed influencing our lives in a telling way. This course will
accustom students to the state of the art in designing and deploying
intelligent and learning agents. The course will build upon the
platform laid by CS 747 (Foundations of Intelligent and Learning
Agents) to engage students in targeted research and system-building
projects.
The course has three main objectives. First, it seeks to instil in
students the mindset that artificial intelligence and machine learning
can substantially benefit real-world problems, and give them the
confidence that they can drive this process. Second, the course seeks
to develop the students' skills to abstract, analyse, design,
implement, evaluate, and iterate while devising solutions. The third
objective of the course is to train students to comprehend technical
discourse and sharpen their communication skills.
The course is organised in the form of two parallel tracks: lectures
(mainly) based on research papers, and a semester-long research
project. Topics covered in the lectures will
include, among others, multiagent learning, POMDPs, theoretical
analysis of MDP planning and learning, representation discovery,
exploration, abstraction, animal behaviour, evolutionary algorithms,
philosophy of AI, and applications. Students will be provided a short
quiz/assignment every week based on the lecture material.
The research project gives students an opportunity to apply
their learning in creative and imaginative ways to
understand, build, and analyse systems. Both theoretical and empirical
investigations may be undertaken. Students may work alone or in
teams. Each team will be guided individually through the phases of the
research project, but will share its progress with the class at
designated intervals.
Prerequisites
Registration is open to students who have passed CS 747 and have
secured the instructor's consent (which is decided by a preliminary
research proposal).
On-line Mode
- The course will be conducted entirely in on-line mode.
- All lecture slides and instructional videos will be made available on
this page.
- There shall be no synchronous meetings that students are
mandated to attend.
- The instructor will hold office hours in the allotted meeting slot
(Slot 13: 7.00 p.m. – 8.25 p.m. Mondays and Thursdays). No new
material will be presented during these slots.
- Students are strongly encouraged to keep up with the weekly plan
posted below, and should they have any questions for the instructor,
bring them up through one of the channels listed. Nonetheless, students
who are unable to interact with the instructor on a regular basis
will be at no particular disadvantage. Students who are unable to
access course material may please promptly inform the instructor.
- Research presentations from the students will have to be
pre-recorded. Students are encouraged to familiarise themselves
with software for creating high-quality video presentations, which
will be viewed off-line by the class.
Weekly Plan
- Wednesday 12.00 p.m.: Lectures and slides for the week are put up
on this page.
- Wednesday–Sunday: Students watch the videos and make a
note of questions and comments.
- Wednesday–Sunday: Students post their questions and
comments on the week's discussion forum (on Moodle). It is okay to
ask questions based on previous lectures, and bring up topics of
general interest.
- Office hours (7.00 p.m. – 8.25 p.m. Mondays and
Thursdays):
- The instructor is available on a web-based interaction
platform 7.00 p.m. – 8.00 p.m. Mondays. This session will be
entirely devoted to the lecture material.
- Most Thursdays, the instructor will schedule meetings 7.00
p.m. – 8.25 p.m. with a subset of teams to discuss their
research progress. These meetings can be conducted either using
a web-based platform or telephone. While it would be ideal for
the entire team to meet, it is okay if some members are unable
to do so.
- Students with questions call the instructor's office phone
(+91 22 2576 7704) 8.00 p.m. – 8.25 p.m. on Mondays.
- Students may also request that the instructor call them; the
instructor makes these calls 8.00 p.m. – 9.00 p.m. Mondays and
Thursdays.
- Friday 11.55 p.m.: A quiz is published based on the
week's lecture and reading material.
- Students submit a response to the week's quiz
by 11.55 p.m. Tuesday.
Details of the web-based interaction, as well as a form for
requesting the instructor to call, will be provided on Moodle. In
addition, students will be given a feedback form through which they
can communicate issues related to the course at any point of time.
Evaluation
There will be 10–12 weekly quizzes, each worth 4–6 marks,
and together totalling at least 50 marks. The marks contributed by
the quizzes to the grade will be the minimum of the total marks
earned in the quizzes and 40; that is, the quiz contribution is
capped at 40 marks.
The research project will carry 60 marks, divided as: 5 marks for
an introductory presentation, 5 marks for a proposal, 10 marks for a
mid-stage presentation, 10 marks for the final presentation, and 30
marks for the final report. An outstanding research project
will bypass the regular evaluation criteria and automatically result
in an "AA" grade for the concerned team.
All submissions must be made through Moodle.
Students auditing the course must score 50 or more marks in the
course to be awarded an "AU" grade.
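The arithmetic of the marks scheme above can be sketched in a few lines of Python. This is only an illustration, not an official grade calculator: the function and constant names are hypothetical, and it assumes the quiz contribution is capped at 40 so that quizzes and project together add up to 100 marks. The rule by which an outstanding project results directly in an "AA" grade is not modelled.

```python
# Illustrative sketch of the marks scheme; all names here are hypothetical.

QUIZ_CAP = 40          # assumption: quiz contribution is capped at 40 marks
AUDIT_THRESHOLD = 50   # marks needed by auditing students for an "AU" grade

# Maximum marks for each project component, as listed in the Evaluation section.
PROJECT_MAX_MARKS = {
    "introductory_presentation": 5,
    "proposal": 5,
    "mid_stage_presentation": 10,
    "final_presentation": 10,
    "final_report": 30,
}


def quiz_contribution(quiz_marks):
    """Return the marks the quizzes contribute to the total (capped)."""
    return min(sum(quiz_marks), QUIZ_CAP)


def project_contribution(project_marks):
    """Sum the marks earned on the project components, validating each one."""
    total = 0
    for component, maximum in PROJECT_MAX_MARKS.items():
        earned = project_marks.get(component, 0)
        if not 0 <= earned <= maximum:
            raise ValueError(f"{component}: {earned} is outside 0..{maximum}")
        total += earned
    return total


def course_total(quiz_marks, project_marks):
    """Total marks out of 100 under the stated assumptions."""
    return quiz_contribution(quiz_marks) + project_contribution(project_marks)


def earns_audit_grade(total):
    """Whether an auditing student's total qualifies for an "AU" grade."""
    return total >= AUDIT_THRESHOLD
```

For instance, a student who earns 5 marks on each of ten quizzes has a quiz total of 50, of which 40 count under this reading; combined with full marks on every project component, the course total is 100.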
Moodle
Moodle will be the primary course management system. Marks for the
assessments will be maintained on the class Moodle page; discussion
fora will also be hosted on Moodle. Students who do not have an
account on Moodle for the course must send the instructor a request
by e-mail, specifying the roll number/employee number for account
creation.
Academic Honesty
Students are expected to adhere to the highest standards of
integrity and academic honesty. Academic violations, as detailed
below, will be dealt with strictly, in accordance with the
institute's procedures and disciplinary actions for academic
malpractice.
Students are expected to work alone on all the quizzes. They may
not share code or consult with classmates (or anybody other than the
instructor and TAs) regarding their solutions. Violations will be
considered acts of dishonesty.
Students may freely collaborate with their peers on their research,
but any assistance received from colleagues must be properly
acknowledged in the corresponding presentations and reports.
Communication
This page will serve as the primary source of information regarding
the course, the schedule, and related announcements. The Moodle page
for the course will be used for sharing resources for the lectures and
assignments, and also for recording grades.
E-mail is the best means of communicating with the instructor;
students must send e-mail to "shivaram@cse.iitb.ac.in" with "[CS748]"
in the header.
Texts and References
Reinforcement Learning: An Introduction, Richard S. Sutton and
Andrew G. Barto, 2nd edition, MIT Press, 2018. On-line version.
Artificial Intelligence: Foundations of Computational Agents, David
L. Poole and Alan K. Mackworth, 2nd edition, Cambridge University Press,
2017. On-line version.
Selected research papers.
-
Markov games as a framework for multi-agent reinforcement learning
Michael L. Littman, 1994
-
An Improved Policy Iteration Algorithm for Partially Observable MDPs
Eric A. Hansen, 1997
-
New Methods for Competitive Coevolution
Christopher D. Rosin and Richard K. Belew, 1997
-
Planning and acting in partially observable stochastic domains
Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra, 1998
-
On the Complexity of Policy Iteration
Yishay Mansour and Satinder Singh, 1999
-
Algorithms for Inverse Reinforcement Learning
Andrew Y. Ng and Stuart Russell, 2000
-
Rational and Convergent Learning in Stochastic Games
Michael Bowling and Manuela Veloso, 2001
-
Autonomous helicopter flight via reinforcement learning
Andrew Y. Ng, H. Jin Kim, Michael I. Jordan, and Shankar Sastry, 2003
-
Labeling Images with a Computer Game
Luis von Ahn and Laura Dabbish, 2004
-
Bandit based Monte-Carlo Planning
Levente Kocsis and Csaba Szepesvári, 2006
-
If multi-agent learning is the answer, what is the question?
Yoav Shoham, Rob Powers, and Trond Grenager, 2006
-
Evolutionary Function Approximation for Reinforcement Learning
Shimon Whiteson and Peter Stone, 2006
-
Efficient Selection of Multiple Bandit Arms: Theory and Practice
Shivaram Kalyanakrishnan and Peter Stone, 2010
-
PAC Subset Selection in Stochastic Multi-armed Bandits
Shivaram Kalyanakrishnan, Ambuj Tewari, Peter Auer, and Peter Stone, 2012
-
Information Complexity in Bandit Subset Selection
Emilie Kaufmann and Shivaram Kalyanakrishnan, 2013
-
Human-level control through deep reinforcement learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel
Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas
K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir
Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan
Wierstra, Shane Legg, and Demis Hassabis, 2015
-
Batch-Switching Policy Iteration
Shivaram Kalyanakrishnan, Utkarsh Mall, and Ritish Goyal, 2016
-
Mastering the game of Go with deep neural networks and tree search
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis, 2016
-
Mastering the game of Go without human knowledge
David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis, 2017
-
A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, and Demis Hassabis, 2018
-
A Tighter Analysis of Randomised Policy Iteration
Meet Taraviya and Shivaram Kalyanakrishnan, 2019
Schedule
-
Week 0 (January 6‐12) : Welcome; Introduction to the course.
-
Week 1 (January 13‐19) : Multiagent Reinforcement Learning.
- Administrative: Video.
- Lecture 1: Video, Slides.
- Reading: Littman, 1994.
- References: Bowling and Veloso (2001); Shoham, Powers, and Grenager (2006).
-
Week 2 (January 20‐26) : Search.
- Administrative: Video.
- Lecture 1: Video, Slides.
- Reading: Chapter 3, Poole and Mackworth (2017); Sections 8, 8.1, 8.8, 8.9, 8.10, 8.11, Sutton and Barto (2018).
- Reference: Kocsis and Szepesvári (2006).
-
Week 3 (January 27‐February 2) : No lecture.
-
Week 4 (February 3‐9) : Model-based RL.
- Administrative: Video.
- Lecture 1: Video, Slides.
- Reading: Sections 8, 8.1, 8.2, 8.3, 8.4, Sutton and Barto (2018); Ng et al. (2003).
-
Week 5 (February 10‐16) : Applications of Deep RL.
- Lecture 1: Video, Slides.
- Reading: Mnih et al. (2015); Silver et al. (2016).
- References: Silver et al. (2017); Silver et al. (2018).
-
Week 6 (February 17‐23) : PAC Subset Selection in Stochastic Bandits.
- Lecture 1: Video, Slides.
- Reading: Sections 1, 2, 3, Kalyanakrishnan and Stone (2010); Kalyanakrishnan et al. (2012).
- Reference: Kaufmann and Kalyanakrishnan (2013).
-
Week 7 (March 3‐9) : Running-time Analysis of Policy Iteration.
- Lecture 1: Video, Slides.
- References: Mansour and Singh (1999); Kalyanakrishnan, Mall, and Goyal (2016); Taraviya and Kalyanakrishnan (2019, see Section).
-
Week 8 (March 10‐16) : Inverse Reinforcement Learning.
- Lecture 1: Video, Slides.
- Reading: Ng and Russell (2000).
-
Week 9 (March 17‐23) : Evolution and Learning.
- Lecture 1: Video, Slides.
- Reading: Whiteson and Stone (2006).
-
Week 10 (March 24‐30) : Crowdsourcing.
- Lecture 1: Video, Slides.
- Reading: von Ahn and Dabbish (2004).
-
Week 11 (March 31‐April 6) : POMDPs.
- Lecture 1: Video, Slides.
- Reading: Sections 1, 2, 3, Kaelbling et al. (1998).
-
Week 12 (April 7‐April 13) : POMDP Solution Techniques.
- Lecture 1: Video, Slides.
- Reading: Sections 4, 4.1, 4.2, 4.3, Kaelbling et al. (1998); Hansen (1997).
You can also watch all these lectures on the video
portal of Swayam Prabha IIT Bombay. For
"Department" choose "Computer Science and Engineering"; for "Course"
choose "Advances in Intelligent and Learning Agents"; press "Go".
Assignments
- Week 1 Quiz, due 11.55 p.m. Tuesday, January 19.
- Project proposal document, due 11.55 p.m. Sunday, January 24.
- Project proposal presentation (video), due 11.55 p.m. Sunday, January 31.
- Week 2 Quiz, due 11.55 p.m. Tuesday, February 2.
- Week 4 Quiz, due 11.55 p.m. Thursday, February 11.
- Week 5 Quiz, due 11.55 p.m. Tuesday, February 16.
- Week 6 Quiz, due 11.55 p.m. Wednesday, March 3.
- Mid-stage presentation (video), due 11.55 p.m. Sunday, March 7.
- Week 7 Quiz, due 11.55 p.m. Tuesday, March 16.
- Week 8 Quiz, due 11.55 p.m. Tuesday, March 16.
- Week 9 Quiz, due 11.55 p.m. Tuesday, March 23.
- Week 10 Quiz, due 11.55 p.m. Tuesday, March 30.
- Week 11 Quiz, due 11.55 p.m. Thursday, April 15.
- Project report, due 11.55 p.m. Monday, May 10.
- Project final presentation (video), due 11.55 p.m. Wednesday, May 12.
Copyright
Slides and videos on this page are licensed under
a Creative Commons
Attribution-ShareAlike 4.0 International License. Permission for
their use beyond the scope of the license may be sought by writing to
shivaram@cse.iitb.ac.in.