<html><head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"></head><body>The aim of this homework is to train classifiers that are sensitive to
the following kinds of classification costs.  
<ul>
<li> Misclassification cost: For every pair of classes i,j you will be
given a cost C(i,j) that is the cost incurred if you predict class j
for an instance whose true class is i.  C(i,i)=0.
</li><li> Attribute cost: Each of the d attributes is associated with
a cost A(k), and the cost of a prediction is the sum of the costs of
the attributes you inspect in making that prediction.
</li></ul>
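The two cost components above can be combined into a single total cost per prediction. The following sketch is illustrative only; the names (C, A, used_attrs) are our own, not prescribed by the assignment.

```python
def prediction_cost(true_class, predicted_class, used_attrs, C, A):
    """Total cost of one prediction: the misclassification cost
    C[i][j] plus the cost A[k] of every attribute k inspected."""
    return C[true_class][predicted_class] + sum(A[k] for k in used_attrs)

# Example with two classes and three attributes (made-up costs).
C = [[0, 5],      # C[i][i] = 0; predicting 1 when the truth is 0 costs 5
     [3, 0]]
A = [1.0, 2.5, 0.5]   # per-attribute inspection costs

# Inspecting attributes 0 and 2, then predicting 1 for a true-0 instance:
# C[0][1] + A[0] + A[2] = 5 + 1.0 + 0.5 = 6.5
cost = prediction_cost(0, 1, [0, 2], C, A)
```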

At training time, you will be given a normal classification dataset
with all attributes of all instances available at no cost.  During
deployment, for each instance, you must pay a cost A(k) whenever you
use attribute k in making a prediction.  These are therefore called
active classifiers: they actively decide which attribute values to
evaluate, rather than passively assuming that all attribute values are
available.
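One way to enforce this deployment discipline is to hide the attribute vector behind an interface that charges A(k) on first access. This is a minimal sketch under our own assumptions, not part of the assignment specification.

```python
class CostedInstance:
    """Reveals attribute values only on demand and tracks the
    attribute cost incurred so far for this instance."""

    def __init__(self, values, attr_costs):
        self._values = values          # full attribute vector (hidden)
        self._attr_costs = attr_costs  # A(k) for each attribute k
        self._seen = set()
        self.paid = 0.0

    def inspect(self, k):
        """Return attribute k's value, paying A(k) on first access only."""
        if k not in self._seen:
            self._seen.add(k)
            self.paid += self._attr_costs[k]
        return self._values[k]

# An active classifier would call inspect() only for the attributes
# its decision procedure actually needs:
inst = CostedInstance([0.3, 1.7, 4.2], [1.0, 2.5, 0.5])
x1 = inst.inspect(1)        # pays 2.5
x1_again = inst.inspect(1)  # already paid; no extra charge
```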

Your task is to build two cost-sensitive active classifiers, one of
which could be a decision tree.


Here is some reading material.
<ul>
<li><a href="http://www.cs.washington.edu/research/jair/volume2/turney95a-html/title.html">This is a good introduction to the problem and provides many motivating examples</a>
</li><li> <a href="http://vista.it.iitb.ac.in/%7Esunita/papers/csc/costDecisionTrees.PDF">Cost-based decision trees (local copy)</a>
</li><li><a href="http://vista.it.iitb.ac.in/%7Esunita/papers/csc/ml2002-statistical-pruning.pdf">Another popular paper (local copy)</a>
</li><li> <a href="http://www.cs.rutgers.edu/%7Emlittman/papers/ijcai03-csfr.pdf"> A nice application</a>

</li><li> Watch this space for more...
</li></ul>

The HW can be done in groups of four.  You will be judged both on the
cost you achieve on the test datasets we provide and on the quality of
your implementation.

<h3> Datasets </h3>
<ul>
<li>
Data with costs attached to attributes: <a href="http://vista.it.iitb.ac.in/%7Esunita/datasets/UCI-ML/pima-indians-diabetes/">pima diabetes dataset</a> Attribute costs are <a href="http://vista.it.iitb.ac.in/%7Esunita/datasets/UCI-ML/pima-indians-diabetes/costs/pima-indians-diabetes.cost">here.</a>
</li><li> For multi-class experiments you can use the following datasets from <a href="http://vista.it.iitb.ac.in/%7Esunita/datasets/UCI-ML/">the UCI repository</a>:
solar-flare,
letter, annealing. Misclassification costs and attribute costs can be
generated randomly for testing purposes as described in <a href="http://vista.it.iitb.ac.in/%7Esunita/papers/csc/kdd04AZL.pdf">this paper</a>
</li></ul>
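For quick testing before you follow the cited paper's scheme, random costs can be generated with a simple recipe like the one below. This is our own illustrative scheme (zero-diagonal uniform misclassification costs, uniform attribute costs); the paper's generation procedure may differ.

```python
import random

def random_costs(n_classes, n_attrs, seed=0):
    """Generate a random misclassification matrix C and attribute
    cost vector A for testing cost-sensitive classifiers."""
    rng = random.Random(seed)
    # Zero diagonal (C(i,i) = 0), random integer off-diagonal costs.
    C = [[0 if i == j else rng.randint(1, 10)
          for j in range(n_classes)]
         for i in range(n_classes)]
    # Attribute costs drawn uniformly from [0.5, 5.0].
    A = [round(rng.uniform(0.5, 5.0), 2) for _ in range(n_attrs)]
    return C, A

C, A = random_costs(n_classes=3, n_attrs=8, seed=42)
```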
</body></html>