In this assignment, you will implement and compare different algorithms for sampling the arms of a stochastic multi-armed bandit. Each arm provides i.i.d. rewards from a fixed Bernoulli distribution. The objective is to minimise regret. The algorithms you will implement are epsilon-greedy exploration, UCB, KL-UCB, and Thompson Sampling.
The server
directory comprises the code to simulate a bandit,
in other words the ``environment''. The server waits for a client
(``agent'') to connect with it and start pulling arms in sequence. For
each pull, the server generates a 0/1 reward based on the true mean of
the arm pulled (known to the server, but unknown to the client), which
is communicated back to the agent. Each communication from the server
also contains the total number of pulls performed so far. Thus, each
server message is of the form "reward,pulls". Agent-server
communication happens through a TCP connection. The server is coded in
C++, and compiles into a binary called bandit-environment
on running
make
. It can be started with a shell script
called startserver.sh
.
The data
directory contains two bandit instances:
one with 5 arms and another with 25 arms. Each instance is coded as a
text file that lists the true means of the arms in sequence. Do create
and experiment with other instances for testing your algorithms.
The client
directory is provided to you as an
example of what your submission must achieve. The client is the agent
that implements sampling algorithms. The agent provided to you merely
samples the arms in a round-robin fashion: you will have to implement
more efficient sampling algorithms. Observe that the command line
parameters to the client include --algorithm
(which can
take
values epsilon-greedy
, UCB
, KL-UCB
,
and Thompson-Sampling
(all case-sensitive)),
and --epsilon
(which will be a real number in [0,
1]). The agent provided is coded in C++: it compiles
using make
into a binary
called bandit-agent
.
Run startexperiment.sh
in the top level directory to
run an experiment. It will start the server and the client in
sequence. The server writes out a log file based on its interaction
with the client; the per-step reward and regret are available from
this log file. Run with different algorithms, random seeds, bandit
instances, and horizons to get familiar with the space of
experiments.
You must submit a directory called client
, which contains
all the source and executable files of your agent. The directory must
contain a script named startclient.sh
, which must take
in all the command line arguments that are accepted by the example
client provided. You are free to build upon the provided C++ agent, or
otherwise implement an agent in any programming language of your
choice. The hostname and port number should suffice for setting up a TCP
connection with the server.
Your code will be tested by running an experiment that calls
startclient.sh
in your directory. Before you submit, make
sure you can successfully run startexperiment.sh
on the sl2
machines (sl2-*.cse.iitb.ac.in
).
For each bandit instance from
{instance-5.txt
,instance-25.txt
}; each
algorithm from {epsilon-greedy
, UCB
, KL-UCB
, Thompson-Sampling
}; each
horizon in {10, 100, 1000, 10000, 100000}, generate at least 100 runs
by varying the random seed. Create a separate plot for each bandit
instance, with the horizon on the x axis and the (expected cumulative)
regret on the y axis. Plot a curve corresponding to each algorithm;
each point must be the average over all runs for the corresponding
setting of instance, horizon, and algorithm.
Place your plots in a single pdf file
called report.pdf
, and along with the plots, note down
your observations about the results obtained (what patterns do they
display?; did you notice anything unexpected?; do you have any ideas to
further improve performance?). The report should also include any
relevant details about your implementation of the algorithms (for
example, if you tuned any parameters).
Place all the data generated from your experiments, as well
as report.pdf
, within a directory
called report
, and place this directory
inside client
.
If you have used any external code snippets or libraries in your code, make sure you provide references in your report. It is okay to use public code for parts of your agent such as the network communication module, or, say, libraries for random number generation and sorting. However, the logic used for sampling the arms must entirely be code that you have written.
In summary: you must submit your client
directory (compressed as
client.tar.gz
) through Moodle. The directory must contain
startclient.sh
, along with all the sources and
executables for the agent. The client
directory must also
contain a directory called report
, in which you provide
all the data generated by your experiments, and a file
called report.pdf
.
Your algorithms will be tested on different bandit instances and horizons to verify that they perform as expected. Their performance on the two provided bandit instances, as well as your accompanying report, will carry 6 marks. The performance of the algorithms on other unseen bandit instances will carry 4 marks. For each such (algorithm, instance, horizon) configuration, a large number of runs (say 100) will be conducted by varying the random seed passed to the server and agent. The average regret over these runs will be recorded. The number of bandit arms will be between 2 and 50; the horizon will be between 1 and 100,000.
The TAs and instructor may look at your source code and notes to corroborate the results obtained by your agent, and may also call you to a face-to-face session to explain your code.
Your submission is due by 11.55 p.m., Monday, August 14. You are
advised to finish working on your submission well in advance, keeping
enough time to test it on the sl2
machines and upload to Moodle. Your
submission will not be evaluated (and will be given a score of zero)
if it is not received by the deadline.
You must work alone on this assignment. Do not share any code (whether yours or code you have found on the Internet) with your classmates. Do not discuss the design of your agent with anybody else.
You will not be allowed to alter your code in any way after the
submission deadline. Before submission, make sure that it runs for a
variety of experimental conditions on the sl2
machines. If your code requires any special libraries to run, it
is your responsibility to get those libraries working on
the sl2
machines (go through
the CSE bug tracking
system to make a request to the system administrators.)