UPGMA Algorithm

- Main idea: group the taxa into clusters and repeatedly merge the two closest clusters until only one cluster remains.
- Algorithm:
  - Add a leaf to the tree for each taxon.
  - Initially, make each taxon its own cluster.
  - Find the two closest clusters and connect them with a new node in the tree, placed at equal distance from both clusters.
  - Repeat the previous step until all clusters are connected.

[Figure: the five taxa x1 to x5 as separate leaves, and the resulting rooted tree with leaves ordered x3, x5, x1, x2, x4.]
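The steps above can be sketched in Python as follows. This is a minimal illustration, not the presenter's implementation: the function name `upgma`, the tuple-based tree representation, and the three-taxon distances `a`, `b`, `c` are all assumptions made for the example. It recomputes every average from the original taxon distances, which is simple but slower than necessary.

```python
from itertools import combinations


def upgma(taxa, dist):
    """Build a rooted UPGMA tree.

    taxa: list of taxon names.
    dist: dict mapping frozenset({a, b}) to the distance between taxa a and b.
    Returns (tree, height). A leaf is its name; an internal node is a
    tuple (left_subtree, right_subtree, node_height).
    """
    # Each cluster is keyed by the frozenset of its member taxa and stores
    # (subtree, height). Leaves sit at height 0.
    clusters = {frozenset([t]): (t, 0.0) for t in taxa}

    def avg(ci, cj):
        # Average distance over all pairs with one taxon in each cluster.
        total = sum(dist[frozenset((a, b))] for a in ci for b in cj)
        return total / (len(ci) * len(cj))

    while len(clusters) > 1:
        # Find the two closest clusters among those still unmerged.
        ci, cj = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        # The new node is placed at equal distance from both clusters.
        height = avg(ci, cj) / 2
        left, _ = clusters.pop(ci)
        right, _ = clusters.pop(cj)
        clusters[ci | cj] = ((left, right, height), height)

    return next(iter(clusters.values()))


# Hypothetical three-taxon input, chosen only to exercise the function.
dist = {
    frozenset(("a", "b")): 2,
    frozenset(("a", "c")): 6,
    frozenset(("b", "c")): 6,
}
tree, height = upgma(["a", "b", "c"], dist)
print(tree, height)
```

In this toy input, `a` and `b` merge first at height 1, and the root joining them to `c` sits at height 3.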

UPGMA Clustering

- The algorithm needs to compute the distance between clusters.
- The distance between clusters C_i and C_j is defined as the average distance over all pairs of taxa with one taxon in C_i and the other in C_j.
- Shortcut: when C_i and C_j are combined to form a new cluster C_k, the distance from C_k to any other cluster can be computed from the distances already known for C_i and C_j, weighted by cluster size.
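The slide's formulas do not survive in the transcript. As a hedged reconstruction, the standard UPGMA definitions are written out below, assuming d(p, q) denotes the distance between taxa p and q, |C| the number of taxa in cluster C, and C_l any cluster other than C_i and C_j. The notation is an assumption and may differ from the original slide.

```latex
% Average-linkage distance between two clusters
d(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i} \sum_{q \in C_j} d(p, q)

% Shortcut after merging C_i and C_j into C_k = C_i \cup C_j
d(C_k, C_l) = \frac{|C_i|\, d(C_i, C_l) + |C_j|\, d(C_j, C_l)}{|C_i| + |C_j|}
```

The second line follows from the first because the pairs between C_k and C_l are exactly the pairs between C_i and C_l together with the pairs between C_j and C_l.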

UPGMA Example

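Assume the following distance matrix:

        x1    x2    x3    x4    x5
  x1     -    16     ?    16     ?
  x2    16     -     ?     8     ?
  x3     ?     ?     -     ?     2
  x4    16     8     ?     -     ?
  x5     ?     ?     2     ?     -

[Entries marked ? are not individually recoverable from the transcript; the averages below fix only their pairwise sums. The values 2 and 16 are implied by the computations in the example.]

Closest pair is {x3, x5}, so cluster them: C1 = {x3, x5}.

Compute the distance from C1 to the rest:

  d(C1, x1) = 1/2 (d(x3, x1) + d(x5, x1)) = 6
  d(C1, x2) = 1/2 (d(x3, x2) + d(x5, x2)) = 16
  d(C1, x4) = 1/2 (d(x3, x4) + d(x5, x4)) = 16

Add a new node for x3, x5 at height d(x3, x5) / 2 = 1.

[Figure: partial tree joining x3 and x5, each on a branch of length 1.]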
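The first merge can be replayed in a few lines of Python. The sketch below is illustrative only: the individual distances from x3 and x5 to the other taxa are an assumption (6 and 6, 16 and 16), chosen so that their averages match the values 6, 16 and 16 computed in the example. The helper names `d` and `cluster_distance` are likewise hypothetical.

```python
# Pairwise distances between taxa. The split of each x3/x5 pair into equal
# halves (6 + 6, 16 + 16) is an ASSUMPTION made for illustration; only the
# averages are stated in the worked example.
D = {
    ("x1", "x2"): 16, ("x1", "x3"): 6, ("x1", "x4"): 16, ("x1", "x5"): 6,
    ("x2", "x3"): 16, ("x2", "x4"): 8, ("x2", "x5"): 16,
    ("x3", "x4"): 16, ("x3", "x5"): 2,
    ("x4", "x5"): 16,
}


def d(a, b):
    """Symmetric lookup of the distance between two taxa."""
    return D[(a, b)] if (a, b) in D else D[(b, a)]


def cluster_distance(ci, cj):
    """Average distance over all pairs with one taxon from each cluster."""
    return sum(d(a, b) for a in ci for b in cj) / (len(ci) * len(cj))


# Step 1: the closest pair of taxa becomes cluster C1.
closest = min(D, key=D.get)
c1 = list(closest)

# Step 2: distance from C1 to every remaining taxon.
others = [t for t in ("x1", "x2", "x3", "x4", "x5") if t not in closest]
to_rest = {t: cluster_distance(c1, [t]) for t in others}

# Step 3: the new node sits at half the distance between the merged taxa.
node_height = D[closest] / 2

print(closest, to_rest, node_height)
```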

The updated distance matrix (C1 replaces x3 and x5):

        x1    x2    x4    C1
  x1     -    16    16     6
  x2    16     -     8    16
  x4    16     8     -    16
  C1     6    16    16     -

Closest pair is {x1, C1}, so cluster them: C2 = {x1, C1}.

Compute the distances from C2 to the rest:

  d(C2, x2) = 1/3 (d(x1, x2) + d(x3, x2) + d(x5, x2)) = 16
  d(C2, x4) = 1/3 (d(x1, x4) + d(x3, x4) + d(x5, x4)) = 16

Add a new node for x1, C1 at height d(x1, C1) / 2 = 3.

[Figure: partial tree in which x3 and x5 join at height 1 and x1 joins them at height 3; branch lengths are 1 for x3 and x5, 2 for the edge above C1, and 3 for x1.]

The updated distance matrix (C2 replaces x1 and C1):

        x2    x4    C2
  x2     -     8    16
  x4     8     -    16
  C2    16    16     -

Closest pair is {x2, x4}, so cluster them: C3 = {x2, x4}.

Compute the distances from C3 to the rest:

  d(C3, C2) = 1/6 (d(x2, x1) + d(x2, x3) + d(x2, x5) + d(x4, x1) + d(x4, x3) + d(x4, x5)) = 16

Add a new node for x2, x4 at height d(x2, x4) / 2 = 4.

[Figure: two subtrees, C2 (x3 and x5 joined at height 1, x1 joining at height 3) and C3 (x2 and x4 joined at height 4, each on a branch of length 4).]

The updated distance matrix (C3 replaces x2 and x4):

        C2    C3
  C2     -    16
  C3    16     -

Closest pair is {C2, C3}, so cluster them: C4 = {C2, C3}.

Add a new node for C2, C3 at height d(C2, C3) / 2 = 8. This node is the root.

[Figure: the complete rooted tree with leaves x3, x5, x1, x2, x4 and the root at height 8.]

Done! Double-check whether the original distances between taxa are preserved by the tree (this is not guaranteed).
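One way to run that double-check is sketched below. In a UPGMA tree the path between two leaves has length twice the height of the lowest node joining them, so the check compares that value with the original distance. The `merges` list encodes the node heights 1, 3, 4 and 8 from the example; only the four original distances that the example's arithmetic pins down (2, 8, 16, 16) are tested, and the names `merges`, `tree_distance` and `original` are assumptions made for this illustration.

```python
# Merges from the worked example: (taxa in the new cluster, node height).
merges = [
    ({"x3", "x5"}, 1),
    ({"x1", "x3", "x5"}, 3),
    ({"x2", "x4"}, 4),
    ({"x1", "x2", "x3", "x4", "x5"}, 8),
]


def tree_distance(a, b):
    """Path length between leaves a and b in the tree: twice the height of
    the lowest node whose cluster contains both."""
    return 2 * min(h for members, h in merges if a in members and b in members)


# Original distances that can be read off or derived from the example.
original = {
    ("x3", "x5"): 2,
    ("x2", "x4"): 8,
    ("x1", "x2"): 16,
    ("x1", "x4"): 16,
}

# Collect every pair whose tree distance differs from the original one.
mismatches = {
    pair: (dist, tree_distance(*pair))
    for pair, dist in original.items()
    if tree_distance(*pair) != dist
}
print(mismatches or "all checked distances are preserved")
```

For these four pairs the tree reproduces the original distances exactly; the remaining pairs cannot be checked without the full matrix.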

UPGMA Summary

- Distance-based algorithm that produces rooted trees.
- Assumes that all species evolve at the same rate (molecular clock hypothesis).
- An implication of the molecular clock hypothesis is that the distance from the root to every taxon is the same.
- The final tree may not preserve the original distances between the taxa.

[Figure: the final rooted tree from the example.]
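Whether the tree can preserve the original distances depends on whether they are ultrametric, that is, consistent with equal root-to-leaf distances. A common test is the three-point condition: among any three taxa, the two largest pairwise distances must be equal. The sketch below illustrates that test; the function name `is_ultrametric` and both small inputs are hypothetical and not taken from the slides.

```python
from itertools import combinations


def is_ultrametric(taxa, dist, tol=1e-9):
    """Return True if every triple of taxa satisfies the three-point
    condition: the two largest of its three pairwise distances are equal.

    dist maps frozenset({a, b}) to the distance between taxa a and b.
    """
    for a, b, c in combinations(taxa, 3):
        _, middle, largest = sorted(
            [dist[frozenset((a, b))],
             dist[frozenset((a, c))],
             dist[frozenset((b, c))]]
        )
        if abs(largest - middle) > tol:
            return False
    return True


# Hypothetical inputs: the first is clock-like, the second is not.
clock_like = {
    frozenset(("a", "b")): 2,
    frozenset(("a", "c")): 6,
    frozenset(("b", "c")): 6,
}
not_clock_like = {
    frozenset(("a", "b")): 2,
    frozenset(("a", "c")): 6,
    frozenset(("b", "c")): 7,
}
print(is_ultrametric(["a", "b", "c"], clock_like))
print(is_ultrametric(["a", "b", "c"], not_clock_like))
```

When the test fails, UPGMA still returns a tree, but some leaf-to-leaf path lengths in it will differ from the input distances.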