# Classification: Cluster Analysis and Related Techniques Tanya, Caroline, Nick.

Introduction to Classification

- Search for divisions within data: identify groups of individuals with similar characteristics and cluster them together.
- Like ordination, classification helps researchers explore data and generate hypotheses.
  - Ordination techniques vs. classification techniques

Objective?

- What is a cluster?
- No formal rule exists for identifying clusters. It is subjective; you make the call.

Hierarchical vs. Non-Hierarchical

- Hierarchical techniques divide data into clusters and look for relationships between them to create higher-order clusters. They produce dendrograms.
  - Dendrograms subdivide a set of individuals into progressively smaller clusters until a stopping condition is encountered.
- Non-hierarchical techniques divide data into clusters without looking at relationships between clusters.

Dendrogram of Classification Techniques

Hierarchical Techniques

- Monothetic vs. polythetic
  - Monothetic: imposes classifications based on the presence or absence of one attribute at a time
    - Association analysis
  - Polythetic: uses all information within the data
    - Most common modern approach
    - Cluster analysis
    - TWINSPAN

Cluster Analysis

- Many procedures and algorithms may be used to create a valid dendrogram.
- Similar in technique to Bray-Curtis ordination.
- Procedure:
  - Start from a square matrix of dissimilarities.
  - Find the lowest distance in the matrix.
  - Identify the pair that generated it.
  - Fuse the two observations together (the first cluster).
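The first fusion step of the procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' own method: the matrix values, the labels `A` to `D`, and the helper name `closest_pair` are all hypothetical.

```python
from itertools import combinations


def closest_pair(dissim):
    """Return (i, j, d) for the pair of observations with the lowest dissimilarity."""
    n = len(dissim)
    return min(
        ((i, j, dissim[i][j]) for i, j in combinations(range(n), 2)),
        key=lambda t: t[2],
    )


# Hypothetical symmetric dissimilarity matrix for four observations.
labels = ["A", "B", "C", "D"]
dissim = [
    [0.0, 0.3, 0.8, 0.9],
    [0.3, 0.0, 0.7, 0.6],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.6, 0.2, 0.0],
]

# Find the lowest distance, identify the pair, and fuse it into the first cluster.
i, j, d = closest_pair(dissim)
first_cluster = (labels[i], labels[j])
print(first_cluster, d)
```

Later fusions depend on how the distance between a cluster and everything else is defined, which is what the linkage rules decide.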

Example

Example

Dissimilarity Matrix
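As a rough sketch of how a square dissimilarity matrix could be built from raw observations, the snippet below uses Euclidean distance. The distance measure, the `sites` data, and the function name `dissimilarity_matrix` are assumptions made for illustration; the slides do not say which measure was used.

```python
import math


def dissimilarity_matrix(rows):
    """Build a symmetric matrix of pairwise Euclidean distances between rows."""
    n = len(rows)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(rows[i], rows[j])
            matrix[i][j] = matrix[j][i] = d  # fill both triangles
    return matrix


# Hypothetical observations: three sites, each described by two variables.
sites = [[0, 0], [3, 4], [6, 8]]
m = dissimilarity_matrix(sites)
for row in m:
    print(row)
```

The diagonal is zero because every observation is identical to itself, and the matrix is symmetric because the distance from site 1 to site 2 equals the distance from site 2 to site 1.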

Rules for cluster formation

- Single-link clustering (AKA nearest-neighbor clustering)
  - Clusters are fused according to the smallest distance between any pair of individuals, one from each cluster.
  - Chaining: two individuals end up in the same cluster despite having a large dissimilarity. It occurs when they are linked by a series of closely connected points.
  - Constituent clusters may grow gradually, with each fusion adding one or a small number of elements. The result can be inconclusive and hard to interpret.
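Chaining is easier to see on a toy example. The sketch below applies the single-link rule to hypothetical one-dimensional observations spaced one unit apart; the data and the function name `single_link_fusions` are illustrative assumptions.

```python
def single_link_fusions(points):
    """Agglomerate 1-D points with the single-link rule and record every fusion."""
    clusters = [[p] for p in points]
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single link: distance between clusters = closest pair of members.
                d = min(abs(x - y) for x in clusters[a] for y in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        history.append((d, merged))
        clusters[a] = merged
        del clusters[b]
    return history


# Hypothetical observations forming an evenly spaced chain.
points = [0.0, 1.0, 2.0, 3.0, 4.0]
history = single_link_fusions(points)
for d, merged in history:
    print(d, merged)
```

Every fusion happens at distance 1.0 and adds a single element, yet the final cluster contains 0.0 and 4.0, which are four units apart. That is the chaining behavior described above.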

Other Rules

- Complete-link clustering
  - The distance between two clusters is set by the pair of members separated by the greatest distance.
  - Exact opposite of single link
  - May end up separating individuals that are very similar
- Minimum variance clustering (Ward's technique)
  - Intermediate between the single-link and complete-link extremes
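To compare the single-link and complete-link rules side by side, the sketch below runs the same agglomeration loop with a swappable linkage function and stops once the closest clusters are farther apart than a threshold. The data, the threshold of 1.5, and the function name `agglomerate` are hypothetical; Ward's technique is left out because it needs a variance calculation rather than a simple minimum or maximum.

```python
def agglomerate(points, threshold, linkage):
    """Fuse 1-D clusters until the closest pair of clusters exceeds threshold.

    linkage is min for single link or max for complete link.
    """
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(abs(x - y) for x in clusters[a] for y in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # stopping condition: nothing is close enough to fuse
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical observations: a chain of five points plus a distant pair.
points = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0, 10.5]
single = agglomerate(points, 1.5, min)
complete = agglomerate(points, 1.5, max)
print("single link:  ", single)
print("complete link:", complete)
```

With these values, single link chains the first five points into one cluster, while complete link leaves 1.0 and 2.0 in different clusters even though they are only one unit apart.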

Interpretation

- There are NO objective rules for interpreting dendrograms.
- Use the dendrogram for hypothesis formation: look for divisions that coincide with existing knowledge about the data (metadata, Chapter 1).
- Complementary analysis

Divisive Classification Techniques

- Take an entire dataset and divide it into categories.
- As always, the boundaries for these categories are subjective.
- On the plus side, this forces us to admit that there is some uncertainty, which a software package wouldn't tell us.

TWINSPAN

- Acronym for Two-Way Indicator Species Analysis
- Polythetic divisive classification technique
- Output is in two-way tables

TWINSPAN Tables

- There are two ordered lists, one for species and one for observations.
- There are two dendrograms, one to classify species and one to classify observations.
- Pseudospecies are constructs that convert continuous distributions to discrete presence/absence data.
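The pseudospecies idea can be illustrated with a short sketch that turns one continuous abundance value into several presence/absence flags. The cut levels, the convention that a pseudospecies is present when abundance is nonzero and at or above its cut level, and the names `CUT_LEVELS` and `pseudospecies` are assumptions for illustration, not taken from the slides or from the TWINSPAN program itself.

```python
# Assumed cut levels (e.g. percent cover); real analyses choose their own.
CUT_LEVELS = [0, 2, 5, 10, 20]


def pseudospecies(species, abundance, cut_levels=CUT_LEVELS):
    """Map one abundance value to presence/absence flags, one per cut level."""
    return {
        f"{species}_{k}": int(abundance > 0 and abundance >= cut)
        for k, cut in enumerate(cut_levels, start=1)
    }


# Hypothetical species recorded at 7 percent cover in one observation.
flags = pseudospecies("Quercus", 7)
print(flags)
```

Under these assumptions, a species at 7 percent cover counts as present for the first three pseudospecies and absent for the last two, so abundance information survives in a table that only stores presence and absence.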

HOMEWORK!

1. What is the difference between hierarchical and non-hierarchical classification techniques?
2. Define "cluster".
3. True or false: there can be only one valid dendrogram for a single data set. (Correct the statement if false.)

Bonus: What is the background of the PowerPoint supposed to represent?