Cluster Analysis - Discussion: Definition; Vocabulary; Simple Procedure; SPSS example; ICPSR and hands on
Definition Cluster analysis is a process by which we take a large number of cases (that is, observations across respondents) and reduce them to a smaller number of mutually exclusive “groups” by “clustering” the shared variation among respondents across variables. The result is a single “grouping” for each case, based on all variables.
Vocabulary and Procedure There are essentially two steps in cluster analysis: 1. Create a table of relative similarities or differences between all objects. This table is called a proximity matrix. 2. Use this information to combine the objects into groups. The method of combining objects into groups is called a clustering algorithm. The idea is to combine objects that are similar to one another into the same group.
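The first step can be sketched in a few lines of Python. This is only an illustration, assuming Euclidean distance as the measure of difference (the slides do not name a measure); the function name `proximity_matrix` and the four example cases are hypothetical.

```python
import math

def proximity_matrix(cases):
    """Return the symmetric matrix of Euclidean distances between all pairs of cases."""
    n = len(cases)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(cases[i], cases[j])  # assumption: Euclidean distance
            matrix[i][j] = d
            matrix[j][i] = d  # the distance from i to j equals the distance from j to i
    return matrix

# Hypothetical data: four respondents measured on two variables.
cases = [(1.0, 1.0), (1.0, 2.0), (5.0, 5.0), (5.0, 8.0)]
prox = proximity_matrix(cases)
for row in prox:
    print([round(d, 2) for d in row])
```

Small entries mark pairs of cases that are similar, so a clustering algorithm would tend to place them in the same group.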
Vocabulary and Procedure (cont.) In this respect, cluster analysis is the obverse of factor analysis. Whereas factor analysis reduces the number of variables by grouping them into a smaller set of factors, cluster analysis reduces the number of observations or cases by grouping them into a smaller set of clusters. The obvious challenge is to determine which variables to include across observations and how to combine such variables, once they are chosen.
Clustering – Flat Method There are two types of clustering methods: flat and hierarchical. If the number of groups is known beforehand, the "flat" method works. In SPSS, this is called K-means clustering. With this method, each object is first assigned to a group based on some initial criterion, and the mean of each group is calculated. The next step reshuffles the objects, assigning each one to the group whose current mean it is most similar to. The group means are recalculated at the end of this step. This process repeats iteratively until no objects change groups.
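The reshuffling loop described here can be sketched as follows. This is a minimal illustration and not the SPSS procedure: it assumes Euclidean distance, and it assumes the initial criterion is a set of starting means supplied by the caller. The function name `k_means` and the example data are hypothetical.

```python
import math

def k_means(cases, initial_means, max_iter=100):
    """Assign cases to the nearest group mean, then update the means, until stable."""
    means = [tuple(m) for m in initial_means]
    assignment = None
    for _ in range(max_iter):
        # Assign each case to the group whose current mean is nearest.
        new_assignment = [
            min(range(len(means)), key=lambda g: math.dist(case, means[g]))
            for case in cases
        ]
        if new_assignment == assignment:
            break  # no case changed groups, so the process stops
        assignment = new_assignment
        # Recalculate the mean of each group from its current members.
        for g in range(len(means)):
            members = [c for c, a in zip(cases, assignment) if a == g]
            if members:  # an empty group keeps its previous mean
                means[g] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, means

# Hypothetical data: six cases on two variables, with two groups requested.
cases = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
# Assumption: the first two cases serve as the starting means.
assignment, means = k_means(cases, initial_means=[cases[0], cases[1]])
print(assignment)
print(means)
```

With these starting means, the second case begins in the second group and moves to the first group once the means are recalculated, which is the reshuffling the slide describes.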
Clustering – Hierarchical Method If the groups are not known a priori, hierarchical clustering works better. There are two kinds: Divisive – starts with all observations in one group and continues to divide into subgroups until no further distinction can be made. Agglomerative – starts with each observation as a separate group and continues to merge the most similar groups until all observations belong to a single group or a chosen number of groups remains.
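The agglomerative variant can be sketched in Python as below. The slides do not say how similarity between groups is measured, so this sketch assumes single linkage (the distance between the two closest members) with Euclidean distance. The function name `agglomerate` and the example data are hypothetical.

```python
import math

def agglomerate(cases, n_clusters=1):
    """Merge the two closest groups repeatedly until n_clusters groups remain."""
    clusters = [[i] for i in range(len(cases))]  # every case starts as its own group
    history = []  # record of merges: (group_a, group_b, distance)

    def linkage(a, b):
        # Assumption: single linkage, the smallest distance between any two members.
        return min(math.dist(cases[i], cases[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of groups with the smallest linkage distance.
        d, p, q = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        history.append((clusters[p], clusters[q], d))
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)] + [merged]
    return clusters, history

# Hypothetical data: four cases on two variables, stopping at two groups.
cases = [(1.0, 1.0), (1.0, 2.0), (5.0, 5.0), (5.0, 8.0)]
clusters, history = agglomerate(cases, n_clusters=2)
print(clusters)
for a, b, d in history:
    print(a, "+", b, "at distance", round(d, 2))
```

The merge history plays the role of an agglomeration schedule: reading it from top to bottom shows which groups were joined first and at what distance, which can help in deciding how many groups to keep.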
Steps in the Analysis 1. Input the data. 2. Choose the method for grouping. 3. Generate the output. 4. Interpret the results.