Next: Performing Splits
Up: Parallelizing Classification
Previous: Data Placement and Workload
- Continuous attributes
- In the parallel environment, each processor holds a separate contiguous
section of a "global" attribute list. Each processor's C_below
and C_above histograms must therefore be initialized to reflect this:
- C_below is initialized to reflect the class distribution of all
sections of the attribute-list assigned to processors of lower rank.
- C_above is initialized to reflect the class distribution of the
local section as well as all sections assigned to processors of
higher rank.
- The statistics for initializing C_above and C_below are gathered when
attribute lists for new leaves are created. Once collected, the
statistics are exchanged among all processors and stored with
each leaf, where they are later used to initialize that leaf's C_above
and C_below histograms.
- After processing its attribute-list section, each processor has a
local best split point. The processors then communicate to determine
which of the N split points (one per processor) has the lowest cost.
- Categorical attributes
- The count matrix built by each processor is based on "local"
information only. Hence, the count matrices are exchanged to get the
"global" counts.
- The global matrix is calculated by a coordinator, which collects the
local count matrices and sums them.
- The global matrix is used to calculate the best split for each
categorical attribute.
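
The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation from the course material: it simulates the N processors inside a single process (no real message passing), and the function names `init_histograms`, `best_global_split`, and `global_count_matrix`, as well as the data layouts, are hypothetical.

```python
from collections import Counter


def init_histograms(section_counts, rank):
    """Return (c_below, c_above) for the processor with the given rank.

    section_counts[i] is a Counter of class labels found in the section
    of the attribute list held by processor i (assumed layout).
    """
    # C_below: class distribution of all sections on lower-ranked processors.
    c_below = Counter()
    for counts in section_counts[:rank]:
        c_below.update(counts)

    # C_above: the local section plus all sections on higher-ranked processors.
    c_above = Counter()
    for counts in section_counts[rank:]:
        c_above.update(counts)

    return c_below, c_above


def best_global_split(local_best):
    """Pick the lowest-cost split among the per-processor candidates.

    local_best is a list of (cost, split_value) pairs, one per processor.
    """
    return min(local_best, key=lambda pair: pair[0])


def global_count_matrix(local_matrices):
    """Sum per-processor count matrices, as a coordinator might.

    Each matrix maps an attribute value to a Counter of class labels.
    """
    total = {}
    for matrix in local_matrices:
        for value, class_counts in matrix.items():
            total.setdefault(value, Counter()).update(class_counts)
    return total
```

For example, with three processors whose sections hold the class counts `Counter(A=2, B=1)`, `Counter(A=1, B=3)`, and `Counter(B=2)`, calling `init_histograms(section_counts, 1)` gives the middle processor a C_below of two A and one B, and a C_above of one A and five B. In a real parallel run the per-section counts and local matrices would be exchanged over the network rather than passed in as lists.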
DBMS
1999-03-11