CS 748: Advances in Intelligent and Learning Agents
(Spring 2023)
Instructor
Shivaram Kalyanakrishnan
Office: 220, New CSE Building
Phone: 7704
E-mail: shivaram@cse.iitb.ac.in
Teaching Assistant
Santhosh Kumar G.
E-mail: santhoshkg@iitb.ac.in
Class
Lectures will be held in 101, New CSE Building, during Slot 13:
7.00 p.m. – 8.25 p.m. Mondays and Thursdays.
The instructor will be available for consultation immediately
following class, up to 9.00 p.m., on both Mondays and Thursdays. He
will also hold office hours (220, New CSE Building) 9.00
a.m. – 10.00 a.m. Wednesdays.
Course Description
Artificial intelligence is fast making inroads into various fields,
and is indeed influencing our lives in telling ways. This course will
acquaint students with the state of the art in designing and deploying
intelligent and learning agents. The course will build on the
foundation laid by CS 747 (Foundations of Intelligent and Learning
Agents) to engage students in targeted research and system-building
projects.
The course has three main objectives. First, it seeks to instil in
students the mindset that artificial intelligence and machine learning
can substantially benefit real-world problems, and to give them the
confidence that they can drive this process. Second, the course seeks
to develop students' skills to abstract, analyse, design, implement,
evaluate, and iterate while devising solutions. Third, the course aims
to train students to comprehend technical discourse and to sharpen
their communication skills.
The course is organised in the form of two parallel
tracks: class discussions based on research papers, and a
semester-long
research project. Students will be given a reading assignment for
every class, and will be expected to turn in a response
summarising their understanding and related observations. Individual
responses will be shared with the class, and will guide the class
discussion. Topics covered in the reading assignments will include,
among others, multiagent learning, POMDPs, theoretical analysis of MDP
planning and learning, representation discovery, exploration,
abstraction, animal behaviour, evolutionary algorithms, philosophy of
AI, and applications.
The research project gives students an opportunity to apply their
learning in creative and imaginative ways to understand, build, and
analyse systems. Both theoretical and empirical
investigations may be undertaken. Students may work alone or in teams
(of size up to three). Each team will be guided individually through
the phases of the research project, but will share its progress with
the class at designated intervals.
Prerequisites
Registration is open to students who have taken CS 747 (any
previous offering) and secured a grade of AA or AB.
Evaluation
Class discussions will carry 45 marks, of which 35 marks will be
for written responses to the reading assignments, and 10 marks for
contribution to class discussions.
The research project will carry 55 marks, divided as follows: 5 marks
for an introductory presentation, 5 marks for a proposal, 10 marks for
a mid-stage presentation, 10 marks for the final presentation, and 25
marks for the final report. An outstanding research project
will bypass the regular evaluation criteria and automatically result
in an "AA" grade for the concerned team.
All submissions must be made through Moodle.
Students auditing the course must score 50 or more marks in the
course to be awarded an "AU" grade.
Academic Honesty
Students are expected to adhere to the highest standards of
integrity and academic honesty. Academic violations, as detailed
below, will be dealt with strictly, in accordance with the
institute's procedures
and disciplinary
actions for academic malpractice.
Students may freely collaborate with their peers towards their
research projects, but any assistance received from colleagues must be
properly acknowledged in the corresponding presentations and
reports. Failure to list any source used in responses to reading
assignments will be considered an academic violation.
Students are allowed to verbally discuss the readings with their
peers. However, they must not view, access, or consult any others'
written responses. It is all right to read related papers from the
published literature, blogs, and so on. However, students must
cite every resource consulted or used, as a part of their
response itself.
Students may use existing code and libraries towards their research
project, but must take care to provide appropriate attribution in
the relevant reports. Failure to list any resource used will be
considered an academic violation.
If in any doubt as to what is legitimate collaboration and what is
not, students must ask the instructor.
Reading Material
- [R1] Adaptive Treatment of Epilepsy via Batch-mode Reinforcement Learning, Arthur Guez, Robert D. Vincent, Massimo Avoli, and Joelle Pineau, 2008.
- [R2] Outracing champion Gran Turismo drivers with deep reinforcement learning, Peter R. Wurman, Samuel Barrett, Kenta Kawamoto, James MacGlashan, Kaushik Subramanian, Thomas J. Walsh, Roberto Capobianco, Alisa Devlic, Franziska Eckert, Florian Fuchs, Leilani Gilpin, Piyush Khandelwal, Varun Kompella, HaoChih Lin, Patrick MacAlpine, Declan Oller, Takuma Seno, Craig Sherstan, Michael D. Thomure, Houmehr Aghabozorgi, Leon Barrett, Rory Douglas, Dion Whitehead, Peter Dürr, Peter Stone, Michael Spranger, and Hiroaki Kitano, 2022.
- [R3] Labeling Images with a Computer Game, Luis von Ahn and Laura Dabbish, 2004.
- [R4] Algorithms for Inverse Reinforcement Learning, Andrew Y. Ng and Stuart Russell, 2000.
- [R5] On the Expressivity of Markov Reward, David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, and Satinder Singh, 2021.
- [R6] Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion, Nate Kohl and Peter Stone, 2004.
- [R7] On the Complexity of Policy Iteration, Yishay Mansour and Satinder Singh, 1999.
- [R8] A Tighter Analysis of Randomised Policy Iteration, Meet Taraviya and Shivaram Kalyanakrishnan, 2019.
- [R9] Is it Enough to Get the Behaviour Right?, Hector J. Levesque, 2009.
- [R10] Neural Architecture Search with Reinforcement Learning, Barret Zoph and Quoc V. Le, 2017.
- [R11] A Contextual-Bandit Approach to Personalized News Article Recommendation, Lihong Li, Wei Chu, John Langford, and Robert E. Schapire, 2012.
- [R12] Neural Thompson Sampling, Weitong Zhang, Dongruo Zhou, Lihong Li, and Quanquan Gu, 2021.
- [R13] Action Selection for Hammer Shots in Curling, Zaheen Farraz Ahmad, Robert C. Holte, and Michael Bowling, 2016.
- [R14] Cross-Domain Transfer for Reinforcement Learning, Matthew E. Taylor and Peter Stone, 2007.
- [R15] On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜, Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell, 2021.
References
- An Improved Policy Iteration Algorithm for Partially Observable MDPs, Eric A. Hansen, 1997.
- Planning and acting in partially observable stochastic domains, Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra, 1998.
- PAC Mode Estimation using PPR Martingale Confidence Sequences, Shubham Anand Jain, Rohan Shah, Sanit Gupta, Denil Mehta, Inderjeet Nair, Jian Vora, Sushil Khyalia, Sourav Das, Vinay J. Ribeiro, and Shivaram Kalyanakrishnan, 2022.
Communication
This page will serve as the primary source of information regarding
the course, the schedule, and related announcements. The Moodle page
for the course will be used for sharing resources for the lectures and
assignments, and also for recording grades.
E-mail is the best means of communicating with the instructor;
students must send e-mail to "shivaram@cse.iitb.ac.in" with "[CS748]"
in the subject line.
Schedule
- January 2: Welcome, Introduction to the course.
- January 5: Presentation by instructor: An Efficient Algorithm for PAC Mode Estimation. Reference: Jain et al. (2022).
- January 9: Discussion of R1.
- January 12: Discussion of R2.
- January 16: Discussion of R3.
- January 19: Further discussion on crowdsourcing. Reference: Luis von Ahn's TEDxCMU2011 talk.
- January 23: Discussion of R4.
- January 30: Proposal presentations.
- February 2: Proposal presentations.
- February 6: Discussion of R5.
- February 9: Discussion of R6.
- February 13: Discussion of R7.
- February 16: Discussion of R8.
- February 27: POMDPs: Part 1. Reference: Sections 1, 2, 3, Kaelbling et al. (1998).
- March 2: POMDPs: Part 2. Reference: Sections 4, 4.1, 4.2, 4.3, Kaelbling et al. (1998); Hansen (1997).
- March 6: Discussion of R9.
- March 9: Mid-stage presentation.
- March 16: Discussion of R10.
- March 20: Discussion of R11.
- March 23: Discussion of R12.
- March 27: Discussion of R13.
- March 30: Discussion of R14.
- April 3: Presentation by R. B. Sunoj: Reinforcement Learning for Exploration of Reactions.
- April 6: No meeting.
- April 10: Discussion of R15.
- April 13: Discussion of research projects with instructor.
Assignments
- Response to R1, due 11.55 p.m., Sunday, January 8.
- Response to R2, due 11.55 p.m., Wednesday, January 11.
- Response to R3, due 11.55 p.m., Sunday, January 15.
- Writing Assignment 1, due 11.55 a.m., Thursday, January 19.
- Response to R4, due 11.55 p.m., Sunday, January 22.
- Project proposal, due 11.55 p.m., Wednesday, January 25.
- Response to R5, due 11.55 p.m., Sunday, February 5.
- Response to R6, due 11.55 p.m., Wednesday, February 8.
- Response to R7, due 11.55 p.m., Sunday, February 12.
- Response to R8, due 11.55 p.m., Wednesday, February 15.
- Mid-stage presentation, to be made in class on Thursday, March 9.
- Response to R9, due 11.55 p.m., Sunday, March 5.
- Response to R10, due 11.55 p.m., Wednesday, March 15.
- Response to R11, due 11.55 p.m., Sunday, March 19.
- Response to R12, due 11.55 p.m., Wednesday, March 22.
- Response to R13, due 11.55 p.m., Sunday, March 26.
- Response to R14, due 11.55 p.m., Wednesday, March 29.
- Report on mid-stage task, due 11.55 p.m., Sunday, April 2.
- Response to R15, due 11.55 p.m., Wednesday, April 5.
- Project report, due 11.55 p.m., Friday, April 28.
- Project final presentation (video), due 11.55 p.m., Saturday, April 29.