- The lists for continuous attributes are kept in sorted order
- The candidate split points are the midpoints between every two consecutive attribute values in the training data
- To determine the split for an attribute at a node, two histograms
are used:
C_below stores the class frequencies of the records that have already been scanned
C_above stores the class frequencies of the records yet to be scanned
- C_below is initialized to zero and C_above is initialized with the
class distribution for all the records for the node.
- C_below and C_above are updated and evaluated every time an attribute record is read
- If a winning split point is found during the scan, it is saved, and C_below and C_above are deallocated before the next attribute is processed
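
The scan described above can be sketched roughly as follows. This is a minimal illustration, not the original implementation: the function names `gini` and `best_split`, the use of the gini index as the split-quality measure, and the sample records are all assumptions made for the example.

```python
from collections import Counter

def gini(hist, n):
    """Gini index of a class histogram holding n records (assumed measure)."""
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in hist.values())

def best_split(records):
    """Scan one sorted list of (value, class_label) pairs once.

    Returns (split_point, score) for the lowest weighted gini,
    or (None, None) if all attribute values are equal.
    """
    n = len(records)
    c_below = Counter()                               # initialized to zero
    c_above = Counter(label for _, label in records)  # full class distribution
    best_point, best_score = None, None

    for i in range(n - 1):
        value, label = records[i]
        # Move the record just read from "yet to be scanned" to "scanned".
        c_below[label] += 1
        c_above[label] -= 1

        next_value = records[i + 1][0]
        if value == next_value:
            continue  # no midpoint between two equal values

        n_below, n_above = i + 1, n - (i + 1)
        score = (n_below / n) * gini(c_below, n_below) \
              + (n_above / n) * gini(c_above, n_above)
        if best_score is None or score < best_score:
            best_point, best_score = (value + next_value) / 2, score

    # c_below and c_above are local, so they are released on return,
    # before the next attribute is processed.
    return best_point, best_score

# Hypothetical training records, already sorted by attribute value.
records = [(17, "H"), (20, "H"), (23, "H"), (32, "L"), (43, "H"), (68, "L")]
point, score = best_split(records)
```

For these sample records the lowest weighted gini falls at the midpoint between 23 and 32. Only the winning point and its score are returned, so nothing but the saved split outlives the scan.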
DBMS
1999-03-11