next up previous
Next: Hierarchical Clustering Up: No Title Previous: Clustering Methodology

Clustering

A clustering is a type of classification imposed on a finite set of objects. The relationship between objects is represented in a proximity matrix in which rows and columns correspond to objects.

The proximity matrix is the one and only input to a clustering algorithm.

The following figure shows a tree of classification problems

   figure44
Figure 2: Tree of classification types

Exclusive versus nonexclusive
An exclusive classification is a partition of the set of objects. Each object belongs to exactly one subset, or cluster. Nonexclusive, or overlapping, classification can assign an object to several classes.

Intrinsic versus extrinsic
An intrinsic classification uses only the proximity matrix to perform the classification. Intrinsic classification is called "unsupervised learning".

Extrinsic classification uses category labels on the objects as well as the proximity matrix. The problem is then to establish a discriminant surface that separates the objects according to category.

Hierarchical versus partitional

A hierarchical clustering method is a procedure for transforming a proximity matrix into a sequence of nested partitions.

Algorithms that express exclusive, intrinsic classification.

Agglomerative versus divisive
An agglomerative, hierarchical classificat ion places each object in its own cluster and gradually merges these atomic clusters into larger and larger clusters until all objects are in a single cluster. Divisive, hierarchical classification reverses the process by starting with all objects in one cluster and subdividing into smaller pieces.

Serial versus simultaneous
Serial procedures handle the patterns one by one whereas simultaneous classification works with the entire set of patterns at the same time.

Monothetic versus polythetic
This option is most applicable to problems in taxonomy, where the objects to be clustered are represented as patterns, or points in a space. A monothetic clustering algorithm uses the features one by one, whereas, a polythetic procedure uses all the features at once.

Graph theory versus matrix algebra

Some algorithms in terms of graph theory use properties such as connectedness and completeness to define classifications, and express other algorithms in terms of algebraic constructs, such as mean-square-error.

When implementing an algorithm on a computer, attention must be paid to questions of computational efficiency.


next up previous
Next: Hierarchical Clustering Up: No Title Previous: Clustering Methodology

Miranda Maria Irene
Thu Apr 1 15:43:18 IST 1999