The goal at each node is to find the best split point. The quality of a split point depends on how well it separates the classes.
- SPRINT uses the Gini index to measure the quality of a split.
- For a data set S containing patterns from n classes, gini(S) is defined as: gini(S) = 1 - sum_j (p[j]^2), where the sum runs over the classes j = 1..n and p[j] is the relative frequency of class j in S.
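- If a split divides S into two subsets S1 and S2, the index of the divided data is given by:
gini_split(S) = (n1/n) gini(S1) + (n2/n) gini(S2)
where n1 and n2 are the numbers of patterns in S1 and S2, and n = n1 + n2 is here the number of patterns in S (not the number of classes).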
- To find the best split point for a node, each of the node's attributes is scanned and its candidate splits are evaluated; the split with the minimum gini_split value is chosen to split the node.
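
The definitions above can be sketched in a few lines of Python. This is only an illustration, not SPRINT's implementation: the function names (gini, gini_split, best_split) are hypothetical, the attribute is assumed to be numeric, and candidate thresholds are assumed to be midpoints between consecutive distinct values.

```python
from collections import Counter


def gini(labels):
    """gini(S) = 1 - sum_j p[j]^2, where p[j] is the relative frequency of class j."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_split(left, right):
    """gini_split(S) = (n1/n) gini(S1) + (n2/n) gini(S2), with n = n1 + n2 records."""
    n1, n2 = len(left), len(right)
    n = n1 + n2
    return (n1 / n) * gini(left) + (n2 / n) * gini(right)


def best_split(values, labels):
    """Scan one numeric attribute and return (threshold, gini_split) of the best split.

    Assumption: a candidate threshold is the midpoint between two
    consecutive distinct attribute values in sorted order.
    """
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = gini_split(left, right)
        if score < best[1]:
            best = (threshold, score)
    return best
```

To choose the split for a node, best_split would be called once per attribute and the attribute with the lowest returned score kept. The sketch recounts the classes for every candidate, which is simple but quadratic; keeping running class counts while scanning the sorted values avoids the recount.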
DBMS
1999-03-11