# UPGMA Algorithm


UPGMA Algorithm

Main idea: group the taxa into clusters and repeatedly merge the two closest clusters until one cluster remains.

Algorithm:

1. Add a leaf to the tree for each taxon.
2. Initially, make each taxon its own cluster.
3. Find the two closest clusters and connect them with a new node in the tree, placed at equal distance from both clusters.
4. Repeat the previous step until all clusters are connected.

[Figure: the five taxa x1 to x5 before clustering, and the resulting rooted tree with leaves x3, x5, x1, x2, x4.]

UPGMA Clustering

The algorithm needs to compute the distance between clusters. The distance between clusters $C_i$ and $C_j$ is defined as the average distance between all pairs of taxa, one taken from $C_i$ and one from $C_j$:

$$d(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i} \sum_{q \in C_j} d(p, q)$$

Shortcut when combining $C_i$ and $C_j$ to form a new cluster $C_k$: the distance from $C_k$ to any other cluster $C_l$ is the size-weighted average of the two old distances, so the pairwise sums do not have to be recomputed:

$$d(C_k, C_l) = \frac{|C_i|\, d(C_i, C_l) + |C_j|\, d(C_j, C_l)}{|C_i| + |C_j|}$$
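The merge loop and the size-weighted distance update (the standard UPGMA shortcut) can be sketched in Python as follows. This is a minimal illustration, not the presenter's code: the function name `upgma`, the cluster labels `C1`, `C2`, ... and the three-taxon input are assumptions made for the example.

```python
from itertools import combinations


def upgma(names, pair_dist):
    """Return the list of merges as (new_cluster, left, right, node_height).

    names: taxon labels; pair_dist: dict mapping (a, b) tuples to distances.
    Labels "C1", "C2", ... are generated here and must not clash with taxa.
    """
    dist = {frozenset(k): float(v) for k, v in pair_dist.items()}
    size = {n: 1 for n in names}  # number of taxa in each active cluster
    merges = []
    while len(size) > 1:
        # Find the closest pair of active clusters.
        a, b = min(combinations(size, 2), key=lambda p: dist[frozenset(p)])
        new = f"C{len(merges) + 1}"
        height = dist[frozenset((a, b))] / 2  # new node sits halfway
        # Size-weighted average gives the distance to every other cluster.
        for c in size:
            if c != a and c != b:
                dist[frozenset((new, c))] = (
                    size[a] * dist[frozenset((a, c))]
                    + size[b] * dist[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[new] = size.pop(a) + size.pop(b)
        merges.append((new, a, b, height))
    return merges


# Hypothetical toy input: a and b are close, c is far from both.
merges = upgma(["a", "b", "c"], {("a", "b"): 2, ("a", "c"): 6, ("b", "c"): 6})
print(merges)  # [('C1', 'a', 'b', 1.0), ('C2', 'c', 'C1', 3.0)]
```

Ties between equally close pairs are broken by iteration order here; a real implementation may want an explicit rule.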

UPGMA Example

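Assume the following distance matrix. Entries that did not survive in the slide text are filled in as 16, the value consistent with the averages computed below.

|    | x1 | x2 | x3 | x4 | x5 |
|----|----|----|----|----|----|
| x1 | -  | 16 | 6  | 16 | 6  |
| x2 | 16 | -  | 16 | 8  | 16 |
| x3 | 6  | 16 | -  | 16 | 2  |
| x4 | 16 | 8  | 16 | -  | 16 |
| x5 | 6  | 16 | 2  | 16 | -  |

The closest pair is $\{x_3, x_5\}$, so cluster them: $C_1 = \{x_3, x_5\}$.

Compute the distance from $C_1$ to the rest:

- $d(C_1, x_1) = \tfrac{1}{2}\,(d(x_3, x_1) + d(x_5, x_1)) = 6$
- $d(C_1, x_2) = \tfrac{1}{2}\,(d(x_3, x_2) + d(x_5, x_2)) = 16$
- $d(C_1, x_4) = \tfrac{1}{2}\,(d(x_3, x_4) + d(x_5, x_4)) = 16$

Add a new node for $x_3$, $x_5$ at height $d(x_3, x_5)/2 = 1$.

[Figure: x3 and x5 joined by a new node, with branch length 1 to each.]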
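The first step can be reproduced with a few lines of Python. This is a sketch under one stated assumption: matrix entries that are missing from the slide text are taken to be 16. The variable names are illustrative only.

```python
taxa = ["x1", "x2", "x3", "x4", "x5"]
# Example matrix; entries missing from the slide text are ASSUMED to be 16.
rows = [
    [0, 16, 6, 16, 6],
    [16, 0, 16, 8, 16],
    [6, 16, 0, 16, 2],
    [16, 8, 16, 0, 16],
    [6, 16, 2, 16, 0],
]
d = {(a, b): rows[i][j] for i, a in enumerate(taxa) for j, b in enumerate(taxa)}

# Closest pair among all unordered pairs of distinct taxa.
pairs = [(a, b) for i, a in enumerate(taxa) for b in taxa[i + 1:]]
closest = min(pairs, key=lambda p: d[p])

# Average distance from the new cluster to each remaining taxon.
to_c1 = {t: sum(d[(m, t)] for m in closest) / len(closest)
         for t in taxa if t not in closest}
node_height = d[closest] / 2

print(closest)      # ('x3', 'x5')
print(to_c1)        # {'x1': 6.0, 'x2': 16.0, 'x4': 16.0}
print(node_height)  # 1.0
```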

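The updated distance matrix, with $C_1$ replacing $x_3$ and $x_5$:

|    | x1 | x2 | x4 | C1 |
|----|----|----|----|----|
| x1 | -  | 16 | 16 | 6  |
| x2 | 16 | -  | 8  | 16 |
| x4 | 16 | 8  | -  | 16 |
| C1 | 6  | 16 | 16 | -  |

The closest pair is $\{x_1, C_1\}$, so cluster them: $C_2 = \{x_1, C_1\}$.

Compute the distances from $C_2$ to the rest:

- $d(C_2, x_2) = \tfrac{1}{3}\,(d(x_1, x_2) + d(x_3, x_2) + d(x_5, x_2)) = 16$
- $d(C_2, x_4) = \tfrac{1}{3}\,(d(x_1, x_4) + d(x_3, x_4) + d(x_5, x_4)) = 16$

Add a new node for $x_1$, $C_1$ at height $d(x_1, C_1)/2 = 3$.

[Figure: x3 and x5 joined at height 1; x1 joined to that node at height 3, with branch length 3 to x1 and 2 to the x3/x5 node.]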

The updated distance matrix, with $C_2$ replacing $x_1$ and $C_1$:

|    | x2 | x4 | C2 |
|----|----|----|----|
| x2 | -  | 8  | 16 |
| x4 | 8  | -  | 16 |
| C2 | 16 | 16 | -  |

The closest pair is $\{x_2, x_4\}$, so cluster them: $C_3 = \{x_2, x_4\}$.

Compute the distances from $C_3$ to the rest:

- $d(C_3, C_2) = \tfrac{1}{6}\,(d(x_2, x_1) + d(x_2, x_3) + d(x_2, x_5) + d(x_4, x_1) + d(x_4, x_3) + d(x_4, x_5)) = 16$

Add a new node for $x_2$, $x_4$ at height $d(x_2, x_4)/2 = 4$.

[Figure: the x1/x3/x5 subtree as before, plus x2 and x4 joined by a new node with branch length 4 to each.]
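The six-term average above and the size-weighted shortcut give the same value, which can be checked directly. The sketch below assumes, as before, that matrix entries missing from the slide text are 16. The two cluster-level distances fed to the shortcut are the ones from the three-cluster matrix.

```python
taxa = ["x1", "x2", "x3", "x4", "x5"]
# Example matrix; entries missing from the slide text are ASSUMED to be 16.
rows = [
    [0, 16, 6, 16, 6],
    [16, 0, 16, 8, 16],
    [6, 16, 0, 16, 2],
    [16, 8, 16, 0, 16],
    [6, 16, 2, 16, 0],
]
d = {(a, b): rows[i][j] for i, a in enumerate(taxa) for j, b in enumerate(taxa)}

c2 = ["x1", "x3", "x5"]
c3 = ["x2", "x4"]

# Definition: average over all |C3| * |C2| = 6 pairs of taxa.
full_average = sum(d[(p, q)] for p in c3 for q in c2) / (len(c3) * len(c2))

# Shortcut: C3 merges x2 and x4 (one taxon each), so weight both by 1.
d_x2_c2 = 16  # from the three-cluster matrix
d_x4_c2 = 16
shortcut = (1 * d_x2_c2 + 1 * d_x4_c2) / (1 + 1)

print(full_average, shortcut)  # 16.0 16.0
```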

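The updated distance matrix, with $C_3$ replacing $x_2$ and $x_4$:

|    | C2 | C3 |
|----|----|----|
| C2 | -  | 16 |
| C3 | 16 | -  |

The closest pair is $\{C_2, C_3\}$, so cluster them: $C_4 = \{C_2, C_3\}$.

Add a new node (the root) for $C_2$, $C_3$ at height $d(C_2, C_3)/2 = 8$.

[Figure: the final rooted tree. Branch lengths: 1 to x3 and to x5; 2 from the x3/x5 node up to C2; 3 to x1; 4 to x2 and to x4; 5 from C2 to the root; 4 from C3 to the root.]

Done! Double-check whether the original distances between taxa are preserved (this is not guaranteed).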
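The double-check can be automated. In a tree built this way, the path length between two leaves is twice the height of the node at which they were first joined, so the tree distances follow from the four merges above. The sketch below compares them with the original matrix, again assuming that entries missing from the slide text are 16.

```python
taxa = ["x1", "x2", "x3", "x4", "x5"]
# Example matrix; entries missing from the slide text are ASSUMED to be 16.
rows = [
    [0, 16, 6, 16, 6],
    [16, 0, 16, 8, 16],
    [6, 16, 0, 16, 2],
    [16, 8, 16, 0, 16],
    [6, 16, 2, 16, 0],
]
original = {frozenset((a, b)): rows[i][j]
            for i, a in enumerate(taxa)
            for j, b in enumerate(taxa) if i < j}

# Merges from the worked example: (taxa on one side, taxa on the other, height).
merges = [
    ({"x3"}, {"x5"}, 1),
    ({"x1"}, {"x3", "x5"}, 3),
    ({"x2"}, {"x4"}, 4),
    ({"x1", "x3", "x5"}, {"x2", "x4"}, 8),
]

# Two leaves first meet at the merge that puts them on opposite sides.
tree = {}
for left, right, height in merges:
    for a in left:
        for b in right:
            tree[frozenset((a, b))] = 2 * height

mismatches = [sorted(p) for p in original if tree[p] != original[p]]
print(mismatches)  # [] for this matrix: every original distance is preserved
```

For a matrix that does not satisfy the molecular clock assumption, the same check would list the pairs whose distances the tree distorts.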

UPGMA Summary

- Distance-based algorithm that produces rooted trees.
- Assumes that all species evolve at the same rate (molecular clock hypothesis).
- An implication of the molecular clock hypothesis is that the distance from the root to every taxon is the same.
- The final tree may not preserve the original distances between the taxa.
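The equal root-to-taxon distance can be seen on the example tree. Each branch length is the height of the parent node minus the height of the child, so the lengths along any path from a leaf to the root add up to the root height. A short sketch, using the node heights from the worked example (the dictionary layout is an assumption for illustration):

```python
# Node heights from the worked example; leaves are at height 0.
height = {"x1": 0, "x2": 0, "x3": 0, "x4": 0, "x5": 0,
          "C1": 1, "C2": 3, "C3": 4, "C4": 8}
# Parent of each node; C4 is the root and has no parent.
parent = {"x3": "C1", "x5": "C1", "x1": "C2", "C1": "C2",
          "x2": "C3", "x4": "C3", "C2": "C4", "C3": "C4"}

# Branch length = parent height minus child height.
branch = {node: height[p] - height[node] for node, p in parent.items()}


def root_distance(node):
    """Sum the branch lengths from a node up to the root."""
    total = 0
    while node in parent:
        total += branch[node]
        node = parent[node]
    return total


root_to_leaf = {t: root_distance(t) for t in ["x1", "x2", "x3", "x4", "x5"]}
print(branch)        # x3, x5: 1; x1: 3; C1: 2; x2, x4: 4; C2: 5; C3: 4
print(root_to_leaf)  # every taxon is at distance 8 from the root
```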

