CS 748: Advances in Intelligent and Learning Agents
(Spring 2016)

(Picture source: https://upload.wikimedia.org/wikipedia/en/6/6c/Mspacman.png.)

Instructor

  Shivaram Kalyanakrishnan
  Office: SIA-204
  Phone: 7716
  E-mail: shivaram@cse.iitb.ac.in

Teaching Assistants

  Kishan Pandey
  E-mail: kishan@cse.iitb.ac.in

  Shashank Khobragade
  E-mail: kshashank@cse.iitb.ac.in

Class

Lectures will be held in SIC-205 during Slot 6: 11.05 a.m. – 12.30 p.m. on Wednesdays and Fridays.

Office hours will immediately follow class and run until 1.00 p.m. on Wednesdays and Fridays. Meetings can also be arranged by appointment.

Course Description

Artificial intelligence is fast making inroads into various fields, and is indeed influencing our lives in a telling way. This course will acquaint students with the state of the art in designing and deploying intelligent and learning agents. The course will build upon the platform laid by CS 747 (Foundations of Intelligent and Learning Agents) to engage students in targeted research and system-building projects.

The course has three main objectives. First, it seeks to instil in students the mindset that artificial intelligence and machine learning can contribute substantially to solving real-world problems, and to give them the confidence that they can drive this process. Second, the course seeks to develop the students' skills to abstract, analyse, design, implement, evaluate, and iterate while devising solutions. The third objective of the course is to train students to comprehend technical discourse and sharpen their communication skills.

The course is organised in the form of two parallel tracks: class discussions based on research papers, and a semester-long research project. Students will be provided a reading assignment every week, and will be expected to turn in a response summarising their understanding and related observations. Individual responses will be shared with the class, and will guide the class discussion, which will be led by a group of 2-3 designated students. Topics covered in the reading assignments will include, among others, (1) philosophy of AI, (2) animal behaviour, (3) POMDPs, (4) evolutionary computation, (5) representation discovery, (6) crowdsourcing, (7) contextual bandits, (8) theoretical analysis of MDP planning and learning, (9) Monte Carlo tree search, (10) game-playing, and (11) robotics.

The research project gives students an opportunity to apply their learning in creative and imaginative ways to understand, build, and analyse systems. Both theoretical and empirical investigations may be undertaken. Students may work alone or in teams. Each team will be guided individually through the phases of the research project, but will share its progress with the class at designated intervals.

Prerequisites

CS 747 and consent of instructor.

Evaluation

Class discussions will carry 40 marks: 20 marks for written responses to the reading assignments, 10 marks for leading the class discussion, and 10 marks for participation in class discussions.

The research project will carry 60 marks, divided as: 5 marks for an introductory presentation, 10 marks for a proposal, 10 marks for the final presentation, and 35 marks for the final report. An outstanding research project will bypass the regular evaluation criteria and automatically result in an "AA" grade for the concerned team.

All submissions must be made through Moodle.

Students auditing the course must score 50 or more marks in the course to be awarded an "AU" grade.

Academic Honesty

Students are expected to adhere to the highest standards of integrity and academic honesty. Students may freely collaborate with their peers, but any assistance received from colleagues must be properly acknowledged in the corresponding presentations and reports. Academic malpractice will be dealt with strictly, in accordance with the institute's procedures and disciplinary actions.

Reading Material

R1: Game-playing

Primary article
Practical Issues in Temporal Difference Learning
Gerald Tesauro, 1992
Related reference
TD-Gammon, A Self-Teaching Backgammon Program, Achieves Master-level Play
Gerald Tesauro, 1993

R2: Evolutionary Computation

Primary article
Evolutionary Algorithms for Reinforcement Learning
David E. Moriarty, Alan C. Schultz, and John J. Grefenstette, 1999
Secondary reference
Learning Tetris Using the Noisy Cross-Entropy Method
István Szita and András Lőrincz, 2006

R3: Philosophy of AI

Article 1
The Chinese Room Argument
John Searle, 2006
Article 2
Is it Enough to Get the Behaviour Right?
Hector J. Levesque, 2009

R4: Deployed Application 1

Mastering the game of Go with deep neural networks and tree search
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis, 2016

R5: Monte Carlo Tree Search

Bandit based Monte-Carlo Planning
Levente Kocsis and Csaba Szepesvári, 2006

R6: Partial Observability

Planning and acting in partially observable stochastic domains
Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra, 1998

R7: Contextual Bandits

A Contextual Bandit Approach to Personalized News Recommendation
Lihong Li, Wei Chu, John Langford, and Robert E. Schapire, 2010

R8: Crowdsourcing

Primary paper
Labeling Images with a Computer Game
Luis von Ahn and Laura Dabbish, 2004
Secondary reference
Crowdsourcing Systems on the World-Wide Web
Anhai Doan, Raghu Ramakrishnan, and Alon Y. Halevy, 2011

R9: Robotics

Simultaneous Localisation and Mapping (SLAM): Part I The Essential Algorithms
Hugh Durrant-Whyte and Tim Bailey, 2006

R10: Animal Behaviour

Honeybee Navigation En Route to the Goal: Visual Flight Control and Odometry
M. V. Srinivasan, S. W. Zhang, M. Lehrer, and T. S. Collett, 1996

R11: Deployed Application 2

Adaptive Treatment of Epilepsy via Batch-mode Reinforcement Learning
Arthur Guez, Robert D. Vincent, Massimo Avoli, and Joelle Pineau, 2008

Additional References

VOX POPULI
Francis Galton, 1907
Policy invariance under reward transformations: Theory and application to reward shaping
Andrew Y. Ng, Daishi Harada, and Stuart Russell, 1999
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
Richard S. Sutton, Doina Precup, and Satinder Singh, 1999
Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition
Thomas G. Dietterich, 2000
Algorithms for Inverse Reinforcement Learning
Andrew Y. Ng and Stuart Russell, 2000
Evolving Neural Networks through Augmenting Topologies
Kenneth O. Stanley and Risto Miikkulainen, 2002
Autonomous Transfer for Reinforcement Learning
Matthew E. Taylor, Gregory Kuhlmann, and Peter Stone, 2008
Learning Classifier Systems: A Complete Introduction, Review, and Roadmap
Ryan J. Urbanowicz and Jason H. Moore, 2009
PAC Subset Selection in Stochastic Multi-armed Bandits
Shivaram Kalyanakrishnan, Ambuj Tewari, Peter Auer, and Peter Stone, 2012
The CMA Evolution Strategy: A Tutorial
Nikolaus Hansen, 2015

Communication

This page will serve as the primary source of information regarding the course, the schedule, and related announcements. The Moodle page for the course will be used for sharing resources for the lectures and assignments, and also for recording grades.

E-mail is the best means of communicating with the instructor; students must send e-mail with "[CS748]" in the header.

Schedule

January 6: Welcome, Introduction to the course.
Summary: Conception of course, syllabus, policies, administrative matters.
January 8: Individual meetings with teams in SIA-204 (schedule on Moodle).
January 13: Introductory presentations of research projects by teams T1, T2, T10, T12, and T13.
January 15: Introductory presentations of research projects by teams T4, T5, T6, T7, and T9.
January 20: Introductory presentations of research projects by teams T3 and T11; presentation on PAC Subset Selection in Stochastic Multi-armed Bandits.
Reference: Kalyanakrishnan et al. (2012).
January 22: Class discussion on game-playing (R1).
Reference: Taylor et al. (2008).
January 27: Class discussion on evolutionary computation (R2).
References: Stanley and Miikkulainen (2002), Urbanowicz and Moore (2009), Hansen (2015).
January 29: Individual meetings with teams T1, T2, T3, T4, T10, and T12 in SIA-204 (schedule on Moodle).
February 3: Class discussion on philosophy of AI (R3).
February 5: Individual meetings with teams T5, T6, T7, T9, T11, and T13 in SIA-204 (schedule on Moodle).
February 7: Class discussion on Deployed Application 1 (R4).
February 10: Presentation on Randomised Procedures for Initialising and Switching Actions in Policy Iteration.
February 27: Class discussions on Monte Carlo Tree Search (R5) and Partial Observability (R6) (Meeting time: 8.30 a.m. – 11.00 a.m.).
March 2: Individual meetings with teams T1, T3, T5, T7, T10, and T12 in SIA-204 (schedule on Moodle).
March 4: Individual meetings with teams T2, T4, T6, T9, T11, and T13 in SIA-204 (schedule on Moodle).
March 9: Class discussion on Contextual Bandits (R7).
March 11: Class discussion on Recent Trends in AI.
March 16: Class discussion on Crowdsourcing (R8).
Reference: Galton (1907).
March 18: Advanced topics in reinforcement learning (inverse reinforcement learning, reward shaping, optimistic initialisation, state and temporal abstraction).
References: Ng et al. (1999), Sutton et al. (1999), Dietterich (2000), Ng and Russell (2000).
March 23: Class discussion on Robotics (R9).
March 30: Class discussion on Animal Behaviour (R10).
April 1: Algorithms for Stochastic Optimization and Reinforcement Learning: A Non-Asymptotic Viewpoint. Invited talk by Prashanth L. A. Venue: SIC 201.
April 6: Class discussion on Deployed Application 2 (R11).
April 13: Final project presentations by teams T3, T4, T7, T9, and T2.
April 15: Final project presentations by teams T1, T5, T6, T10, and T11.

Assignments

Response to reading material R1, due noon, Monday, January 18.
Response to reading material R2, due noon, Monday, January 25.
Response to reading material R3, due noon, Monday, February 1.
Response to reading material R4, due noon, Monday, February 8.
Project proposal, due noon, Monday, February 8.
Response to reading material R5, due noon, Wednesday, February 24.
Response to reading material R6, due noon, Wednesday, February 24.
Response to reading material R7, due noon, Monday, March 7.
Response to reading material R8, due noon, Monday, March 14.
Response to reading material R9, due noon, Monday, March 21.
Response to reading material R10, due noon, Monday, March 28.
Response to reading material R11, due noon, Monday, April 4.
Project report, due noon, Monday, May 2.
