CS 748: Advances in Intelligent and Learning Agents
(Spring 2016)
(Picture source: https://upload.wikimedia.org/wikipedia/en/6/6c/Mspacman.png.)
Instructor
Shivaram Kalyanakrishnan
Office: SIA-204
Phone: 7716
E-mail: shivaram@cse.iitb.ac.in
Teaching Assistants
Kishan Pandey
E-mail: kishan@cse.iitb.ac.in
Shashank Khobragade
E-mail: kshashank@cse.iitb.ac.in
Class
Lectures will be held in SIC-205 during Slot 6: 11.05 a.m. –
12.30 p.m. on Wednesdays and Fridays.
Office hours will immediately follow class and run until 1.00
p.m. on Wednesdays and Fridays. Meetings can also be arranged by
appointment.
Course Description
Artificial intelligence is fast making inroads into various fields,
and is indeed influencing our lives in a telling way. This course will
familiarise students with the state of the art in designing and deploying
intelligent and learning agents. The course will build upon the
platform laid by CS 747 (Foundations of Intelligent and Learning
Agents) to engage students in targeted research and system-building
projects.
The course has three main objectives. First, it seeks to instil in
students the mindset that artificial intelligence and machine learning
can substantially help in solving real-world problems, and to give them the
confidence that they can drive this process. Second, the course seeks
to develop the students' skills to abstract, analyse, design,
implement, evaluate, and iterate while devising solutions. The third
objective of the course is to train students to comprehend technical
discourse and sharpen their communication skills.
The course is organised in the form of two parallel tracks: class
discussions based on research papers, and a semester-long research
project. Students will be provided a reading
assignment every week, and will be expected to turn in a response
summarising their understanding and related observations. Individual
responses will be shared with the class, and will guide the class
discussion, which will be led by a group of 2-3 designated
students. Topics covered in the reading assignments will include,
among others, (1) philosophy of AI, (2) animal behaviour, (3) POMDPs,
(4) evolutionary computation, (5) representation discovery, (6)
crowdsourcing, (7) contextual bandits, (8) theoretical analysis of MDP
planning and learning, (9) Monte Carlo tree search, (10) game-playing,
and (11) robotics.
The research project gives students an opportunity to
apply their learning in creative and imaginative ways to
understand, build, and analyse systems. Both theoretical and empirical
investigations may be undertaken. Students may work alone or in
teams. Each team will be guided individually through the phases of the
research project, but will share its progress with the class at
designated intervals.
Prerequisites
CS 747 and consent of instructor.
Evaluation
Class discussions will carry 40 marks, of which 20 marks will be
for written responses to the reading assignments, 10 marks will be for
leading the class discussion, and 10 marks for participation in class
discussions.
The research project will carry 60 marks, divided as follows: 5 marks
for an introductory presentation, 10 marks for a proposal, 10 marks for
the final presentation, and 35 marks for the final
report. An outstanding research project will bypass the regular
evaluation criteria and automatically result in an "AA" grade for
the concerned team.
All submissions must be made through Moodle.
Students auditing the course must score 50 or more marks in the
course to be awarded an "AU" grade.
Academic Honesty
Students are expected to adhere to the highest standards of
integrity and academic honesty. Students may freely collaborate with
their peers, but any assistance received from colleagues must be
properly acknowledged in the corresponding presentations and
reports. Academic malpractice will be dealt with strictly, in
accordance with the institute's procedures and disciplinary actions.
Reading Material
R4: Deployed Application 1
-
Mastering the game of Go with deep neural networks and tree search
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis, 2016
Additional References
-
VOX POPULI
Francis Galton, 1907
-
Policy invariance under reward transformations: Theory and application to reward shaping
Andrew Y. Ng, Daishi Harada, and Stuart Russell, 1999
-
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
Richard S. Sutton, Doina Precup, and Satinder Singh, 1999
-
Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition
Thomas G. Dietterich, 2000
-
Algorithms for Inverse Reinforcement Learning
Andrew Y. Ng and Stuart Russell, 2000
-
Evolving Neural Networks through Augmenting Topologies
Kenneth O. Stanley and Risto Miikkulainen, 2002
-
Autonomous Transfer for Reinforcement Learning
Matthew E. Taylor, Gregory Kuhlmann, and Peter Stone, 2008
-
Learning Classifier Systems: A Complete Introduction, Review, and Roadmap
Ryan J. Urbanowicz and Jason H. Moore, 2009
-
PAC Subset Selection in Stochastic Multi-armed Bandits
Shivaram Kalyanakrishnan, Ambuj Tewari, Peter Auer, and Peter Stone, 2012
-
The CMA Evolution Strategy: A Tutorial
Nikolaus Hansen, 2015
Communication
This page will serve as the primary source of information regarding
the course, the schedule, and related announcements. The Moodle page
for the course will be used for sharing resources for the lectures and
assignments, and also for recording grades.
E-mail is the best means of communicating with the instructor;
students must send e-mail with "[CS748]" in the header.
Schedule
-
January 6: Welcome, Introduction to the course.
Summary: Conception of course, syllabus, policies, administrative matters.
-
January 8: Individual meetings with teams in SIA-204 (schedule on Moodle).
-
January 13: Introductory presentations to research projects by
teams T1, T2, T10, T12, and T13.
-
January 15: Introductory presentations to research projects by
teams T4, T5, T6, T7, and T9.
-
January 20: Introductory presentations to research projects by
teams T3 and T11; presentation on PAC Subset Selection in
Stochastic Multi-armed Bandits.
Reference: Kalyanakrishnan et al. (2012).
-
January 22: Class discussion on game-playing (R1).
Reference: Taylor et al. (2008).
-
January 27: Class discussion on evolutionary computation (R2).
References: Stanley and Miikkulainen (2002), Urbanowicz and Moore (2009), Hansen (2015).
-
January 29: Individual meetings with teams T1, T2, T3, T4, T10, and T12 in SIA-204 (schedule on Moodle).
-
February 3: Class discussion on philosophy of AI (R3).
-
February 5: Individual meetings with teams T5, T6, T7, T9, T11, and T13 in SIA-204 (schedule on Moodle).
-
February 7: Class discussion on Deployed Application 1 (R4).
-
February 10: Presentation on Randomised Procedures for Initialising and Switching Actions in Policy Iteration.
-
February 27: Class discussions on Monte Carlo Tree Search (R5) and Partial Observability (R6) (Meeting time: 8.30 a.m. – 11.00 a.m.).
-
March 2: Individual meetings with teams T1, T3, T5, T7, T10, and T12 in SIA-204 (schedule on Moodle).
-
March 4: Individual meetings with teams T2, T4, T6, T9, T11, and T13 in SIA-204 (schedule on Moodle).
-
March 9: Class discussion on Contextual Bandits (R7).
-
March 11: Class discussion on Recent Trends in AI.
-
March 16: Class discussion on Crowdsourcing (R8).
Reference: Galton (1907).
-
March 18: Advanced topics in reinforcement learning (inverse
reinforcement learning, reward shaping, optimistic initialisation,
state and temporal abstraction).
References: Ng et al. (1999), Sutton et al. (1999), Dietterich (2000), Ng and Russell (2000).
-
March 23: Class discussion on Robotics (R9).
-
March 30: Class discussion on Animal Behaviour (R10).
-
April 1: Algorithms for Stochastic Optimization and
Reinforcement Learning: A Non-Asymptotic Viewpoint. Invited talk
by Prashanth L. A. Venue: SIC 201.
-
April 6: Class discussion on Deployed Application 2 (R11).
-
April 13: Final project presentations by teams T3, T4, T7, T9, and T2.
-
April 15: Final project presentations by teams T1, T5, T6, T10, and T11.
Assignments
-
Response to reading material R1, due noon, Monday, January 18.
-
Response to reading material R2, due noon, Monday, January 25.
-
Response to reading material R3, due noon, Monday, February 1.
-
Response to reading material R4, due noon, Monday, February 8.
-
Project proposal, due noon, Monday, February 8.
-
Response to reading material R5, due noon, Wednesday, February 24.
-
Response to reading material R6, due noon, Wednesday, February 24.
-
Response to reading material R7, due noon, Monday, March 7.
-
Response to reading material R8, due noon, Monday, March 14.
-
Response to reading material R9, due noon, Monday, March 21.
-
Response to reading material R10, due noon, Monday, March 28.
-
Response to reading material R11, due noon, Monday, April 4.
-
Project report, due noon, Monday, May 2.