Multivariate Analysis
Cluster Analysis Overview

Some General Comments on Cluster Analysis

Cluster analysis is one of those techniques that appeals to students and researchers alike. The idea behind it is very simple: to identify groupings, or clusters, of individuals, using multiple variables, that are not readily apparent to the researcher. The figure below gives a simplistic example of two clusters defined by two variables.

The problem with cluster analysis is that, in all but the simplest of cases, uniquely defined clusters may not exist. Cluster analysis is a collection of techniques and algorithms, and different methods often classify the same observations into completely different groupings. For example, cluster analysis tends to be good at finding spherical clusters and has great difficulty with curved clusters, as in the example below, even though humans easily discern the two clusters.
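
The point about curved clusters can be illustrated with a small sketch. Everything in it is an assumption chosen for illustration rather than part of the original notes: the data are two concentric rings of evenly spaced points, and the two methods compared are k-means and single-linkage clustering, written here in plain Python. The helper names (ring, kmeans, single_linkage, same_partition) are hypothetical.

```python
import math

def ring(radius, n):
    """n evenly spaced points on a circle of the given radius, centered at the origin."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]

def kmeans(points, centroids, iters=100):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centroids = list(centroids)
    labels = None
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        new_labels = [min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def single_linkage(points, k):
    """Repeatedly merge the two clusters with the closest members until k remain."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Processing pairs from shortest to longest is equivalent to single linkage.
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i in range(len(points))
                   for j in range(i + 1, len(points)))
    clusters = len(points)
    for _, i, j in edges:
        if clusters == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            clusters -= 1
    return [find(i) for i in range(len(points))]

def same_partition(a, b):
    """True if two labelings group the points identically, ignoring label names."""
    return len(set(zip(a, b))) == len(set(a)) == len(set(b))

# Two curved clusters: an inner ring and an outer ring.
inner, outer = ring(1.0, 12), ring(5.0, 24)
points = inner + outer
truth = [0] * len(inner) + [1] * len(outer)

km = kmeans(points, [points[0], points[-1]])
sl = single_linkage(points, 2)

print("k-means recovers the rings:       ", same_partition(km, truth))
print("single linkage recovers the rings:", same_partition(sl, truth))
```

With this layout, single linkage recovers the two rings because neighboring points within a ring are much closer to each other than to any point on the other ring. K-means cannot, because with two clusters it divides the plane along a straight line, and no straight line separates an inner ring from the ring surrounding it. The same observations therefore end up in different groupings depending on the method.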

Another issue to be aware of is that cluster analysis treats all variables as being equally important in determining cluster membership.
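
A minimal sketch of what equal importance means in practice, assuming the clustering is based on Euclidean distance (the notes do not say which distance is used). The three individuals, the distance function, and the weights are all hypothetical and serve only to show that every variable enters the distance with the same weight unless the analyst changes it.

```python
import math

def distance(a, b, weights=None):
    """Euclidean distance; with weights=None every variable counts equally."""
    if weights is None:
        weights = [1.0] * len(a)
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

# Hypothetical individuals measured on (variable 1, variable 2).
a = (0.0, 0.0)
b = (1.0, 3.0)
c = (2.0, 0.0)

# Equal weights: the default treatment of the variables.
print("equal weights:   d(a,b) =", distance(a, b), " d(a,c) =", distance(a, c))

# Assumed weights that make variable 1 count ten times as much as variable 2.
w = [10.0, 1.0]
print("weighted (10,1): d(a,b) =", distance(a, b, w), " d(a,c) =", distance(a, c, w))
```

With equal weights, a is nearer to c than to b. Once variable 1 is weighted more heavily, a is nearer to b. Which individuals end up grouped together can therefore depend on how much importance each variable is given, and the default gives them all the same.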

Nick Cox Comments on Cluster Analysis

In response to a question about which method of cluster analysis is best, Nick Cox of the University of Durham commented:

From the Stata Reference Manual

The following quote is from the Stata Reference Manual in the section on cluster analysis:


Phil Ender, 21Jan05