CS 748: Advances in Intelligent and Learning Agents
(Spring 2023)
Instructor
Shivaram Kalyanakrishnan
Office: 220, New CSE Building
Phone: 7704
E-mail: shivaram@cse.iitb.ac.in
Teaching Assistant
Santhosh Kumar G.
E-mail: santhoshkg@iitb.ac.in
Class
Lectures will be held in 101, New CSE Building, during Slot 13:
7.00 p.m. – 8.25 p.m. Mondays and Thursdays.
The instructor will be available for consultation immediately
following class, up to 9.00 p.m., on both Mondays and Thursdays. He
will also hold office hours (220, New CSE Building) 9.00
a.m. – 10.00 a.m. Wednesdays.
Course Description
Artificial intelligence is fast making inroads into various fields,
and is indeed influencing our lives in telling ways. This course will
acquaint students with the state of the art in designing and deploying
intelligent and learning agents. The course will build on the
foundation laid by CS 747 (Foundations of Intelligent and Learning
Agents) to engage students in targeted research and system-building
projects.
The course has three main objectives. First, it seeks to instil in
students the mindset that artificial intelligence and machine learning
can substantially benefit real-world problems, and to give them the
confidence that they can drive this process. Second, the course seeks
to develop students' skills to abstract, analyse, design, implement,
evaluate, and iterate while devising solutions. Third, the course aims
to train students to comprehend technical discourse and to sharpen
their communication skills.
The course is organised in the form of two parallel
tracks: class discussions based on research papers, and a
semester-long
research project. Students will be given a reading assignment for
every class, and will be expected to turn in a response
summarising their understanding and related observations. Individual
responses will be shared with the class, and will guide the class
discussion. Topics covered in the reading assignments will include,
among others, multiagent learning, POMDPs, theoretical analysis of MDP
planning and learning, representation discovery, exploration,
abstraction, animal behaviour, evolutionary algorithms, philosophy of
AI, and applications.
The research project gives students an opportunity to apply their
learning in creative and imaginative ways to understand, build, and
analyse systems. Both theoretical and empirical
investigations may be undertaken. Students may work alone or in teams
(of size up to three). Each team will be guided individually through
the phases of the research project, but will share its progress with
the class at designated intervals.
Prerequisites
Registration is open to students who have taken CS 747 (any
previous offering) and secured a grade of AA or AB.
Evaluation
Class discussions will carry 45 marks, of which 35 marks will be
for written responses to the reading assignments, and 10 marks for
contribution to class discussions.
The research project will carry 55 marks, divided as follows: 5 marks
for an introductory presentation, 5 marks for a proposal, 10 marks for
a mid-stage presentation, 10 marks for the final presentation, and 25
marks for the final report. An outstanding research project
will bypass the regular evaluation criteria and automatically result
in an "AA" grade for the concerned team.
All submissions must be made through Moodle.
Students auditing the course must score 50 or more marks in the
course to be awarded an "AU" grade.
Academic Honesty
Students are expected to adhere to the highest standards of
integrity and academic honesty. Academic violations, as detailed
below, will be dealt with strictly, in accordance with the
institute's procedures
and disciplinary
actions for academic malpractice.
Students may freely collaborate with their peers towards their
research projects, but any assistance received from colleagues must be
properly acknowledged in the corresponding presentations and
reports. Failure to list any source used in responses to reading
assignments will be considered an academic violation.
Students are allowed to verbally discuss the readings with their
peers. However, they must not view, access, or consult any others'
written responses. It is all right to read related papers from the
published literature, blogs, and so on. However, students must
cite every resource consulted or used, as a part of their
response itself.
Students may use existing code and libraries towards their research
project, but must take care to provide appropriate attribution in
the relevant reports. Failure to list any resource used will be
considered an academic violation.
If in any doubt as to what is legitimate collaboration and what is
not, students must ask the instructor.
Reading Material
- [R1] Adaptive Treatment of Epilepsy via Batch-mode Reinforcement Learning, Arthur Guez, Robert D. Vincent, Massimo Avoli, and Joelle Pineau, 2008.
- [R2] Outracing champion Gran Turismo drivers with deep reinforcement learning, Peter R. Wurman, Samuel Barrett, Kenta Kawamoto, James MacGlashan, Kaushik Subramanian, Thomas J. Walsh, Roberto Capobianco, Alisa Devlic, Franziska Eckert, Florian Fuchs, Leilani Gilpin, Piyush Khandelwal, Varun Kompella, HaoChih Lin, Patrick MacAlpine, Declan Oller, Takuma Seno, Craig Sherstan, Michael D. Thomure, Houmehr Aghabozorgi, Leon Barrett, Rory Douglas, Dion Whitehead, Peter Dürr, Peter Stone, Michael Spranger, and Hiroaki Kitano, 2022.
- [R3] Labeling Images with a Computer Game, Luis von Ahn and Laura Dabbish, 2004.
- [R4] Algorithms for Inverse Reinforcement Learning, Andrew Y. Ng and Stuart Russell, 2000.
- [R5] On the Expressivity of Markov Reward, David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, and Satinder Singh, 2021.
- [R6] Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion, Nate Kohl and Peter Stone, 2004.
- [R7] On the Complexity of Policy Iteration, Yishay Mansour and Satinder Singh, 1999.
- [R8] A Tighter Analysis of Randomised Policy Iteration, Meet Taraviya and Shivaram Kalyanakrishnan, 2019.
- [R9] Is it Enough to Get the Behaviour Right?, Hector J. Levesque, 2009.
- [R10] Neural Architecture Search with Reinforcement Learning, Barret Zoph and Quoc V. Le, 2017.
- [R11] A Contextual-Bandit Approach to Personalized News Article Recommendation, Lihong Li, Wei Chu, John Langford, and Robert E. Schapire, 2012.
- [R12] Neural Thompson Sampling, Weitong Zhang, Dongruo Zhou, Lihong Li, and Quanquan Gu, 2021.
- [R13] Action Selection for Hammer Shots in Curling, Zaheen Farraz Ahmad, Robert C. Holte, and Michael Bowling, 2016.
- [R14] Cross-Domain Transfer for Reinforcement Learning, Matthew E. Taylor and Peter Stone, 2007.
- [R15] On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜, Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell, 2021.
References
- An Improved Policy Iteration Algorithm for Partially Observable MDPs, Eric A. Hansen, 1997.
- Planning and acting in partially observable stochastic domains, Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra, 1998.
- PAC Mode Estimation using PPR Martingale Confidence Sequences, Shubham Anand Jain, Rohan Shah, Sanit Gupta, Denil Mehta, Inderjeet Nair, Jian Vora, Sushil Khyalia, Sourav Das, Vinay J. Ribeiro, and Shivaram Kalyanakrishnan, 2022.
Communication
This page will serve as the primary source of information regarding
the course, the schedule, and related announcements. The Moodle page
for the course will be used for sharing resources for the lectures and
assignments, and also for recording grades.
E-mail is the best means of communicating with the instructor;
students must send e-mail to "shivaram@cse.iitb.ac.in" with "[CS748]"
in the subject line.
Schedule
- January 2: Welcome, Introduction to the course.
- January 5: Presentation by instructor: An Efficient Algorithm for PAC Mode Estimation. Reference: Jain et al. (2022).
- January 9: Discussion of R1.
- January 12: Discussion of R2.
- January 16: Discussion of R3.
- January 19: Further discussion on crowdsourcing. Reference: Luis von Ahn's TEDxCMU2011 talk.
- January 23: Discussion of R4.
- January 30: Proposal presentations.
- February 2: Proposal presentations.
- February 6: Discussion of R5.
- February 9: Discussion of R6.
- February 13: Discussion of R7.
- February 16: Discussion of R8.
- February 27: POMDPs: Part 1. Reference: Sections 1, 2, 3, Kaelbling et al. (1998).
- March 2: POMDPs: Part 2. Reference: Sections 4, 4.1, 4.2, 4.3, Kaelbling et al. (1998); Hansen (1997).
- March 6: Discussion of R9.
- March 9: Mid-stage presentation.
- March 16: Discussion of R10.
- March 20: Discussion of R11.
- March 23: Discussion of R12.
- March 27: Discussion of R13.
- March 30: Discussion of R14.
- April 3: Presentation by R. B. Sunoj: Reinforcement Learning for Exploration of Reactions.
- April 6: No meeting.
- April 10: Discussion of R15.
- April 13: Discussion of research projects with instructor.
Assignments
- Response to R1, due 11.55 p.m., Sunday, January 8.
- Response to R2, due 11.55 p.m., Wednesday, January 11.
- Response to R3, due 11.55 p.m., Sunday, January 15.
- Writing Assignment 1, due 11.55 a.m., Thursday, January 19.
- Response to R4, due 11.55 p.m., Sunday, January 22.
- Project proposal, due 11.55 p.m., Wednesday, January 25.
- Response to R5, due 11.55 p.m., Sunday, February 5.
- Response to R6, due 11.55 p.m., Wednesday, February 8.
- Response to R7, due 11.55 p.m., Sunday, February 12.
- Response to R8, due 11.55 p.m., Wednesday, February 15.
- Mid-stage presentation, to be made in class on Thursday, March 9.
- Response to R9, due 11.55 p.m., Sunday, March 5.
- Response to R10, due 11.55 p.m., Wednesday, March 15.
- Response to R11, due 11.55 p.m., Sunday, March 19.
- Response to R12, due 11.55 p.m., Wednesday, March 22.
- Response to R13, due 11.55 p.m., Sunday, March 26.
- Response to R14, due 11.55 p.m., Wednesday, March 29.
- Report on mid-stage task, due 11.55 p.m., Sunday, April 2.
- Response to R15, due 11.55 p.m., Wednesday, April 5.
- Project report, due 11.55 p.m., Friday, April 28.
- Project final presentation (video), due 11.55 p.m., Saturday, April 29.