Next: Clustering by mixture decomposition
Up: Partitional Clustering
Previous: Partitional Clustering
The most commonly used clustering strategy is based on the square-root
error criterion.
Objective: To obtain a partition which, for a fixed number of clusters, minimizes the square-error where square-error is the sum of the Euclidean distances between each pattern and its cluster center.
Algorithm
- Select an initial partition with k clusters.
Repeat steps 2 through 5 until the cluster membership stabilizes.
- Generate a new partition by assigning each pattern to its closest
cluster center.
- Compute new cluster centers as the centroids of the clusters.
- Repeat steps 2 and 3 until an optimum value of the criterion is found.
- Adjust the number of clusters by merging and splitting existing clusters
or by removing small or outlier clusters.
The algorithm converges when the criterion function cannot be improved.
Initial partition
- Select k seed points at random or by taking the centroid as the first seed point
and the rest at a certain miminimum distance from this seed point.
- Cluster the remaining points to the closest seed point.
Updating a partition
K-means:
- In each pass(cycle) make an assignment of all patterns to the
closest cluster center.
- Recompute the cluster center after every new assignment is made.
Adjusting the number of clusters
- Clustering algorithms can create new clusters or merge existing ones if
certain conditions specified by the user are met.
- Split a cluster if it has too many patterns and an unusually large
variance along the feature with large spread.
- Merge if they are sufficiently close.
- Remove outliers from future consideration. (outliers are
pattern/patterns that is sufficiently far removed from the rest of
the data and hence suspected as a
mistake in data entry.)
Performance of square-error clustering methods
- Seeks compact hyper-ellipsoidal clusters and this can produce misleading results when the data do not occur in compact, hyper-ellipsoidal boundaries.
- Exhibit inadequacies when the Euclidean measure is used to measure distance but the features are not on comparable scales.
Next: Clustering by mixture decomposition
Up: Partitional Clustering
Previous: Partitional Clustering
Miranda Maria Irene
Thu Apr 1 15:43:18 IST 1999