CS 747: Programming Assignment 4

This assignment is meant to give you the experience of developing both an agent and an environment. Consequently, it is more open-ended than your previous assignments. As part of this assignment, you will implement the Windy Gridworld task given as Example 6.5 by Sutton and Barto (2018). You will program some agent-environment interactions, record your results, and present your interpretations. You can use any programming language of your choice for this assignment.

Tasks

  1. Implement Windy Gridworld as an episodic MDP. The core of your code will have to be a function (or functions) to obtain next state and reward for a given state and action. You can use your own function names and conventions.
  2. Implement a Sarsa(0) agent as described in the example, and obtain a baseline plot similar to the one accompanying the example (episodes against time steps). You can set learning and exploration rates as you see fit (just be sure to describe them in your report).
  3. Get another plot when King's moves are permitted (that is, 8 actions in total), as described in Exercise 6.9.
  4. Add stochasticity to the task as described in Exercise 6.10, and again plot the resulting performance of the Sarsa agent. Make sure you note down your convention for modeling corner cases.
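
To make the shape of these tasks concrete, here is a minimal sketch in Python of one way the transition function and the Sarsa loop could fit together. It is an illustration under stated assumptions, not a reference solution. The grid size, wind strengths, start cell, and goal cell follow Example 6.5; the step size 0.5, exploration rate 0.1, and undiscounted return are the settings used in that example. The names step, epsilon_greedy, and run_sarsa are hypothetical. Two conventions are assumptions you may well choose differently: the combined move-plus-wind displacement is clipped to the grid, and the stochastic wind perturbation is applied only in columns whose wind is nonzero.

```python
import random
from collections import defaultdict

ROWS, COLS = 7, 10
WIND = [0, 0, 0, 1, 1, 1, 2, 2, 1, 0]  # upward wind strength per column
START, GOAL = (3, 0), (3, 7)           # (row, col); row 0 is the top row

FOUR_MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
KING_MOVES = FOUR_MOVES + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def step(state, move, stochastic=False, rng=random):
    """Return (next_state, reward, done) for one transition."""
    row, col = state
    d_row, d_col = move
    wind = WIND[col]  # wind of the column the agent moves from
    if stochastic and wind > 0:
        # Assumed convention: perturb by -1, 0, or +1 only where wind exists.
        wind += rng.choice([-1, 0, 1])
    # Assumed convention: clip the combined displacement to the grid.
    new_row = min(max(row + d_row - wind, 0), ROWS - 1)
    new_col = min(max(col + d_col, 0), COLS - 1)
    next_state = (new_row, new_col)
    return next_state, -1, next_state == GOAL


def epsilon_greedy(q, state, n_actions, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [q[(state, a)] for a in range(n_actions)]
    best = max(values)
    return rng.choice([a for a, v in enumerate(values) if v == best])


def run_sarsa(moves, total_steps=8000, alpha=0.5, epsilon=0.1,
              stochastic=False, seed=0):
    """Return a list whose entry t is the episodes completed after t+1 steps."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q(s, a) initialised to 0
    n = len(moves)
    episodes_done, curve = 0, []
    state = START
    action = epsilon_greedy(q, state, n, epsilon, rng)
    while len(curve) < total_steps:
        next_state, reward, done = step(state, moves[action], stochastic, rng)
        if done:
            target = reward  # terminal state has value 0
        else:
            next_action = epsilon_greedy(q, next_state, n, epsilon, rng)
            target = reward + q[(next_state, next_action)]  # undiscounted
        q[(state, action)] += alpha * (target - q[(state, action)])
        if done:
            episodes_done += 1
            state = START
            action = epsilon_greedy(q, state, n, epsilon, rng)
        else:
            state, action = next_state, next_action
        curve.append(episodes_done)
    return curve
```

Under these assumptions, run_sarsa(FOUR_MOVES) would correspond to the baseline, run_sarsa(KING_MOVES) to the King's moves variant, and run_sarsa(KING_MOVES, stochastic=True) to the stochastic variant. Whatever conventions you adopt, state them in your report.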

In all your experiments, generate at least ten independent runs by varying the random seed. Plot the average statistic in the graphs.
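
One possible way to organise the averaging, sketched in Python: each run records a curve of equal length (for example, episodes completed after each time step), and the curves are averaged point by point. The names average_curves and average_over_seeds are hypothetical, and run stands for whatever function you write that takes a seed and returns one curve.

```python
def average_curves(curves):
    """Average equal-length curves point by point."""
    n_runs = len(curves)
    length = len(curves[0])
    if any(len(c) != length for c in curves):
        raise ValueError("all runs must record the same number of time steps")
    return [sum(c[t] for c in curves) / n_runs for t in range(length)]


def average_over_seeds(run, seeds):
    """Call run(seed) once per seed and average the resulting curves."""
    return average_curves([run(seed) for seed in seeds])
```

For instance, average_over_seeds(run, range(10)) would average ten runs seeded 0 through 9. Fixing the number of time steps per run, rather than the number of episodes, keeps all curves the same length so that they can be averaged directly.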

Submission

You must submit:

  1. Your code for implementing the task and its variants;
  2. Code for your Sarsa agent;
  3. A script to run your simulations and gather data;
  4. Plots of your agent's performance;
  5. A README file describing how to run your code and obtain the plots; and
  6. A report presenting your observations from these experiments (as a pdf file).

Place all these items in a directory titled [rollno] (such as 1234567). You must then submit your [rollno] directory, compressed as [rollno].tar.gz (say 1234567.tar.gz). Before you upload the submission to Moodle, make sure you can successfully run your code on the departmental (sl2) machines.

Convince yourself that the results obtained match your expectations. Feel free to be creative and use the simulation environment to test related hypotheses you might find interesting. Your report (item 6 above) must explain the variations observed across the three task settings, and describe any particular issues you encountered while experimenting with this task. Don't hesitate to include additional numbers or graphs.

Evaluation

Your marks will be divided roughly equally among the three task settings you have to implement (the baseline, King's moves, and stochastic wind), in each case determined by the plot and the accompanying observations.

The TAs and instructor may look at your source code and notes to corroborate the results obtained by your program, and may also call you to a face-to-face session to explain your code.

Deadline and Rules

Your submission is due by 11.55 p.m., Sunday, November 11. You are advised to finish working on your submission well in advance, keeping enough time to test it on the sl2 machines and upload to Moodle. Your submission will not be evaluated (and will be given a score of zero) if it is not received by the deadline.

Test your code on the sl2 machines even while you are developing it: do not postpone this step to the last minute. If your code requires any special libraries to run, it is your responsibility to get those libraries working on the sl2 machines (go through the CSE bug tracking system to make a request to the system administrators). Make sure that you upload the intended version of your code to Moodle (after uploading, download your submission and test it on the sl2 machines to make sure it is the correct version). You will not be allowed to alter your code in any way after the submission deadline. In short: your grade will be completely determined by your submission on Moodle at the time of the deadline. Play safe by having it uploaded and tested at least a few hours in advance.

You must work alone on this assignment. Do not share any code (whether yours or code you have found on the Internet) with your classmates. Do not discuss the design of your solution with anybody else. Do not look at anybody else's code or report, whether your classmates' or from sites on the Internet that discuss Windy Gridworld.