Continuous Attributes

The lists for continuous attributes are kept in sorted order.
The candidate split points are the midpoints between every two consecutive attribute values in the training data.
To determine the split on an attribute at a node, two histograms are used:
C_below stores the class frequencies of the records that have already been scanned.
C_above stores the class frequencies of the records that are yet to be scanned.
C_below is initialized to zero and C_above is initialized with the class distribution of all the records at the node.
C_below and C_above are evaluated and updated every time an attribute record is read.
If a winning split point is found during the scan, it is saved, and C_below and C_above are deallocated before the next attribute is processed.
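
The scan described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the slide does not say how a candidate split is scored, so a weighted Gini index is assumed here, and the names gini and best_split, as well as the sample records, are hypothetical.

```python
from collections import Counter


def gini(hist, n):
    """Gini impurity of a class histogram that holds n records."""
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in hist.values())


def best_split(records):
    """Scan (value, class_label) pairs that are already sorted by value.

    Returns (split_point, score) for the lowest weighted Gini index,
    or (None, inf) if all attribute values are equal.
    """
    n = len(records)
    c_below = Counter()                               # no records scanned yet
    c_above = Counter(label for _, label in records)  # class distribution at the node
    best_point, best_score = None, float("inf")
    for i in range(n - 1):
        value, label = records[i]
        # The record just read moves from C_above to C_below.
        c_below[label] += 1
        c_above[label] -= 1
        next_value = records[i + 1][0]
        if value == next_value:
            continue  # equal neighbours have no midpoint between them
        n_below, n_above = i + 1, n - (i + 1)
        # Assumed scoring rule: size-weighted Gini of the two partitions.
        score = (n_below * gini(c_below, n_below)
                 + n_above * gini(c_above, n_above)) / n
        if score < best_score:
            best_point, best_score = (value + next_value) / 2, score
    return best_point, best_score


# Made-up sample data: (attribute value, class label), sorted by value.
records = [(17, "high"), (20, "high"), (23, "high"),
           (32, "low"), (43, "high"), (68, "low")]
point, score = best_split(records)
```

For this sample the sketch selects the midpoint 27.5, between the values 23 and 32. Both histograms are local to the function, so they are released when it returns, which mirrors the deallocation before the next attribute is processed.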

DBMS
1999-03-11