
Clustering Methodology

The major steps to be considered in a data mining problem are shown in Figure 1.

Figure 1: Clustering methodology

  1. Data collection: This step requires careful recording of data.
  2. Initial screening: Raw data usually need some massaging before they are ready for analysis. For example, if all the values of a particular feature are the same, that feature carries no information and can be eliminated.
  3. Representation: Putting the data into a form suitable for further analysis.
  4. Clustering tendency: Finding out whether there is any justification for clustering. If the data cannot be shown to have a tendency to cluster, then other analysis techniques should be applied instead of cluster analysis.
  5. Clustering strategy: This involves choosing the appropriate clustering algorithm. Thought must be given to details such as matching the algorithm to the data, the presentation of results, and the choice of parameters.
  6. Validation: This step changes the analysis into hard evidence. Stability is one basis for comparing clustering methods.
  7. Interpretation: Drawing conclusions from the analysis. This depends on the application.
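
The initial screening example in step 2 (eliminating a feature whose values are all the same) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are taken to be a list of equal-length rows, one row per pattern, and the function name drop_constant_features and the sample values are hypothetical, not part of the original methodology.

```python
def drop_constant_features(rows):
    """Drop every column whose values are identical across all rows.

    `rows` is assumed to be a list of equal-length lists, one per pattern.
    Returns the screened rows and the indices of the columns that were kept.
    """
    if not rows:
        return [], []
    n_cols = len(rows[0])
    # A column is kept only if it takes more than one distinct value.
    kept = [j for j in range(n_cols) if len({row[j] for row in rows}) > 1]
    screened = [[row[j] for j in kept] for row in rows]
    return screened, kept


# Hypothetical data: the feature at index 1 is the same for every pattern.
data = [
    [1.0, 5.0, 0.2],
    [2.0, 5.0, 0.4],
    [3.0, 5.0, 0.1],
]
screened, kept = drop_constant_features(data)
print(kept)      # indices of the features that survive screening
print(screened)  # the data with the constant feature removed
```

Returning the kept column indices alongside the screened data makes it possible to map results from the later steps back to the original features.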



Miranda Maria Irene
Thu Apr 1 15:43:18 IST 1999