
  Assignment 1


      Deadline: Sept 6, 2008

Group: 4 students (max)

This homework is about training models that predict continuous-valued
labels, that is, regression problems.  We have not covered much
regression in class, so your first step is to read up on what
methods exist for regression problems.  A starting point is chapter 3
of the HTF book <../?id=8#HTF>, but you are encouraged to
search the literature for more.  In this assignment we will only do
linear regression.

Read about and design at least two reasonable loss functions. Comment on
their strengths and weaknesses.
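To make "loss function" concrete, here is a minimal Java sketch of two common choices, written as functions of the residual r = prediction - target. The squared loss and the Huber loss are used purely as illustrations and are not prescribed by this assignment; the class name Losses, the method names, and the threshold parameter delta are all hypothetical.

```java
// Illustrative sketch only: two candidate per-example losses,
// each a function of the residual r = prediction - target.
final class Losses {
    private Losses() {}

    /** Squared loss: 0.5 * r^2. Smooth, but grows quickly for large residuals. */
    static double squared(double r) {
        return 0.5 * r * r;
    }

    /** Derivative of the squared loss with respect to r. */
    static double squaredGrad(double r) {
        return r;
    }

    /**
     * Huber loss with threshold delta > 0: quadratic for |r| <= delta,
     * linear beyond it, so large residuals are penalised less heavily.
     */
    static double huber(double r, double delta) {
        double a = Math.abs(r);
        return a <= delta ? 0.5 * r * r : delta * (a - 0.5 * delta);
    }

    /** Derivative of the Huber loss with respect to r. */
    static double huberGrad(double r, double delta) {
        return Math.abs(r) <= delta ? r : delta * Math.signum(r);
    }
}

class LossDemo {
    public static void main(String[] args) {
        double[] residuals = {0.5, 2.0, 10.0};
        for (double r : residuals) {
            System.out.printf("r=%.1f squared=%.3f huber=%.3f%n",
                    r, Losses.squared(r), Losses.huber(r, 1.0));
        }
    }
}
```

Comparing the two columns for a large residual shows the trade-off you are asked to comment on: the squared loss is dominated by outliers, while the Huber loss grows only linearly but needs an extra parameter.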

Implement the trainer using the L-BFGS optimizer (a local copy is
provided as LBFGS.jar). Make sure your code is modular and can be easily
extended to work with a third loss function, if needed.
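One possible way to get this modularity, sketched below, is to hide each loss behind a small interface and let a single objective class compute the value and gradient that a gradient-based optimizer such as L-BFGS consumes. The names Loss, SquaredLoss and LinearObjective are hypothetical, and the sketch deliberately does not call LBFGS.jar, whose API is not described here.

```java
// Hypothetical design sketch: the loss is pluggable, the objective is shared.
interface Loss {
    /** Loss for a single residual r = prediction - target. */
    double value(double residual);

    /** Derivative of the loss with respect to the residual. */
    double derivative(double residual);
}

final class SquaredLoss implements Loss {
    public double value(double r) { return 0.5 * r * r; }
    public double derivative(double r) { return r; }
}

/** Total loss of a linear model w . x over a dataset, for any Loss. */
final class LinearObjective {
    private final double[][] x; // one row per example
    private final double[] y;   // one target per example
    private final Loss loss;

    LinearObjective(double[][] x, double[] y, Loss loss) {
        this.x = x;
        this.y = y;
        this.loss = loss;
    }

    /**
     * Returns the summed loss at weights w and overwrites grad with the
     * gradient with respect to w. An optimizer would call this repeatedly.
     */
    double evaluate(double[] w, double[] grad) {
        java.util.Arrays.fill(grad, 0.0);
        double total = 0.0;
        for (int i = 0; i < x.length; i++) {
            double pred = 0.0;
            for (int j = 0; j < w.length; j++) {
                pred += w[j] * x[i][j];
            }
            double r = pred - y[i];
            total += loss.value(r);
            double d = loss.derivative(r); // chain rule: dLoss/dw_j = d * x_ij
            for (int j = 0; j < w.length; j++) {
                grad[j] += d * x[i][j];
            }
        }
        return total;
    }
}
```

With this layout, adding a third loss means writing one more class that implements Loss; the objective and the optimizer call stay unchanged.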

Integrate your code into Weka, an open-source data mining tool. You need
to understand the Weka code and implement the interfaces required to add
your classifier to Weka. You should not write your own code for reading
datasets and performing experiments; use the Weka libraries as far as
possible. The TAs will provide instructions on how to obtain
Weka.jar and how to use it.

You will be given several datasets by the TAs. For each dataset,
report the squared error and the value of your loss function under
$N$-fold cross-validation, with $N$ increasing from 1 to 10. Exact
formats for reporting your results will be specified by the TAs.

Submit the assignment as follows:

1) Keep all files in a folder named rollno_assign1, where rollno is the
   roll number of any one group member (e.g. 08305001_assign1).
2) Create a README file in the same folder that contains the following
   information:
   --> Names and roll numbers of all the group members.
   --> Information about the files.
   --> Any other information, e.g. assumptions you have made.
3) Make a gzipped tar archive of the folder:
   tar -zcvf rollno_assign1.tar.gz rollno_assign1
4) Upload the tar file on the course webpage at the following link:
   http://www.cse.iitb.ac.in/~sunita/cs636/?id=11

