
Presentation transcript:

The Tree of Life
[Figure: tree of life, from Ernst Haeckel, 1891]

Phylogenetic Analysis
- Classical approach considers morphological features
  - number of legs, lengths of legs, etc.
- Modern approach considers molecular features
  - gene sequences
  - protein sequences
- Use of molecular data provides objective criteria for constructing phylogenetic trees

Phylogenetic Analysis
- Phylogenetic analysis is based on homologous sequences in different species (e.g., globins)
- Sequences can be homologous for different reasons:
  - orthologs: sequences that diverged after a speciation event
  - paralogs: sequences that diverged after a duplication event
  - xenologs: sequences that diverged after horizontal transfer (e.g., by a virus)

Tree Terminology
- A tree is a collection of nodes and edges with no cycles (i.e., there is no path from a node back to itself)
- Tree topology refers to the "shape" of the tree
[Figure: a tree, a graph that is not a tree, and two topologically equivalent trees]

Tree Terminology
- A tree is a collection of nodes and edges with no cycles (i.e., there is no path from a node back to itself)
- Classification of nodes (in the context of phylogenetic trees)
  - root (a single distinguished node): represents the common ancestor
  - internal nodes: represent intermediate ancestors in the course of evolution
  - leaves (the non-branching nodes): represent the species for which the tree is built
[Figure: a tree and a graph that is not a tree]

Tree Terminology
- Rooted trees
  - internal nodes have 3 edges (1 for the parent, 2 for the children)
  - a special node (the root) has 2 edges
  - the leaves (the given taxa) have one edge
- Unrooted trees: same as above, but without a root node

Tree Terminology
- Classification of nodes (in the context of phylogenetic trees)
  - root (a single distinguished node): represents the common ancestor
  - internal nodes: represent ancestors in the course of evolution
  - leaves (the non-branching nodes): represent the species for which the tree is built
- When the root node is not specified, the tree is unrooted

Counting Trees
- How many trees are there that have n leaf nodes (or taxa)?
- Three leaf nodes: only one unrooted tree is possible
- Four leaf nodes: three different unrooted trees are possible
[Figure: the single unrooted tree on leaves A, B, C and the three unrooted trees on leaves A, B, C, D]

Counting Trees
- NR = number of possible rooted trees = (2n-3)! / (2^(n-2) (n-2)!)
- NU = number of possible unrooted trees = (2n-5)! / (2^(n-3) (n-3)!)

Tree Explosion
[Table with columns n, Unrooted, Rooted; the values are not recoverable]

Tree Explosion
The number of possible rooted trees for 15 different taxa is 213,458,046,767,875. Assuming a computer can create a tree in 10^-9 seconds, it would take 2.47 days of computation time to create them all. For 20 taxa, there are 8,200,794,532,637,891,559,375 possible trees, and the same computer would take 259,867 years to generate this many trees!

Algorithms
- Distance-based
  - UPGMA: Unweighted Pair-Group Method with Arithmetic Means
  - Fitch-Margoliash (FM)
  - Neighbor-Joining
- Character-based
  - Maximum parsimony algorithm

Distance Data
- Distance-based algorithms expect as input a matrix of distances (dij) between each pair of sequences
- Distance data can be generated from the available sequences and models of base substitution
  - Jukes-Cantor model: p = fraction of mismatches
  - Kimura model: P = fraction of transitions, Q = fraction of transversions
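
The distance formulas themselves are not preserved in this transcript. The sketch below assumes the standard textbook forms of the Jukes-Cantor correction, d = -3/4 ln(1 - 4p/3), and of the Kimura two-parameter correction, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q); the helper names are hypothetical.

```python
import math

PURINES = {"A", "G"}

def mismatch_fractions(a: str, b: str) -> tuple[float, float]:
    """Return (P, Q): fractions of transitions and transversions
    between two aligned DNA sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion
        if (x in PURINES) == (y in PURINES):
            transitions += 1
        else:
            transversions += 1
    return transitions / len(a), transversions / len(a)

def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance from the fraction of mismatches p (p < 3/4)."""
    return -0.75 * math.log(1 - 4 * p / 3)

def kimura(P: float, Q: float) -> float:
    """Kimura two-parameter distance from transition/transversion fractions."""
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)

if __name__ == "__main__":
    P, Q = mismatch_fractions("AAAAAAAAAA", "GAAAAAAAAC")
    print(P, Q, jukes_cantor(P + Q), kimura(P, Q))
```

Both corrections return a value larger than the raw mismatch fraction, because they account for sites that changed more than once.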

UPGMA Algorithm

UPGMA Algorithm
- Main idea: group the taxa into clusters and repeatedly merge the closest two clusters until one cluster remains
- Algorithm
  - Add a leaf to the tree for each taxon
  - Initially make each taxon its own cluster
  - Find the closest clusters and connect them with a new node in the tree (place the new node at equal distance from the two clusters)
  - Repeat the previous step until all clusters are connected
[Figure: taxa x1, ..., x5 before clustering and the resulting rooted tree]

UPGMA Clustering
- The algorithm needs to compute the distance between clusters
- The distance between clusters Ci and Cj is defined to be the average distance between all pairs of taxa in Ci and Cj:
  d(Ci, Cj) = 1 / (|Ci| |Cj|) * (sum of d(p, q) over all p in Ci and q in Cj)

UPGMA Clustering
- Shortcut when combining Ci and Cj to form a new cluster Ck: for any other cluster Cl,
  d(Ck, Cl) = (|Ci| d(Ci, Cl) + |Cj| d(Cj, Cl)) / (|Ci| + |Cj|)

UPGMA Example

UPGMA
Assume the following distance matrix:
[Distance matrix over x1, ..., x5; the only legible entry is d(x2, x4) = 8]
- Closest pair is {x3, x5}, so cluster them: C1 = {x3, x5}
- Compute the distance from C1 to the rest:
  d(C1, x1) = 1/2 (d(x3, x1) + d(x5, x1)) = 6
  d(C1, x2) = 1/2 (d(x3, x2) + d(x5, x2)) = 16
  d(C1, x4) = 1/2 (d(x3, x4) + d(x5, x4)) = 16
- Add a new node for x3, x5 at height d(x3, x5) / 2 = 1
[Figure: partial tree joining x3 and x5, each at branch length 1]

UPGMA
The updated distance matrix (C1 replaced x3, x5):
        x1    x2    x4    C1
  x1    -     16    16    6
  x2    16    -     8     16
  x4    16    8     -     16
  C1    6     16    16    -
- Closest pair is {x1, C1}, so cluster them: C2 = {x1, C1}
- Compute the distances from C2 to the rest:
  d(C2, x2) = 1/3 (d(x1, x2) + d(x3, x2) + d(x5, x2)) = 16
  d(C2, x4) = 1/3 (d(x1, x4) + d(x3, x4) + d(x5, x4)) = 16
- Add a new node for x1, C1 at height d(x1, C1) / 2 = 3
[Figure: partial tree with x3 and x5 joined at height 1 and x1 joined to them at height 3]

UPGMA
The updated distance matrix (C2 replaced x1, C1):
        x2    x4    C2
  x2    -     8     16
  x4    8     -     16
  C2    16    16    -
- Closest pair is {x2, x4}, so cluster them: C3 = {x2, x4}
- Compute the distances from C3 to the rest:
  d(C3, C2) = 1/6 (d(x2, x1) + d(x2, x3) + d(x2, x5) + d(x4, x1) + d(x4, x3) + d(x4, x5)) = 16
- Add a new node for x2, x4 at height d(x2, x4) / 2 = 4
[Figure: two partial trees, one on x1, x3, x5 and one joining x2 and x4 at height 4]

UPGMA
The updated distance matrix (C3 replaced x2, x4):
        C2    C3
  C2    -     16
  C3    16    -
- Closest pair is {C2, C3}, so cluster them: C4 = {C2, C3}
- Add a new node (the root) for C2, C3 at height d(C2, C3) / 2 = 8
- Done! Double-check whether the original distances between taxa are preserved (this is not guaranteed)
[Figure: final rooted tree on x1, ..., x5]
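
The clustering loop of the example can be sketched in a few lines. The full input matrix is not legible in this transcript, so the matrix below is an assumption: it is chosen to agree with every value that the slides do show (d(x3, x5) = 2, d(x2, x4) = 8, the averages 6 and 16), but its individual entries may differ from the original. Cluster distances are computed directly from the definition (average over all pairs) rather than with the shortcut.

```python
from itertools import combinations

def upgma(dist):
    """Run UPGMA on a symmetric dict-of-dicts distance matrix.

    Returns the merges in order as (cluster_a, cluster_b, height),
    where clusters are tuples of taxon names and height is the
    distance of the new node from the leaves below it.
    """
    def avg(ci, cj):
        # average distance over all pairs of taxa, one from each cluster
        return sum(dist[p][q] for p in ci for q in cj) / (len(ci) * len(cj))

    clusters = [(taxon,) for taxon in dist]
    merges = []
    while len(clusters) > 1:
        ci, cj = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merges.append((ci, cj, avg(ci, cj) / 2))
        clusters = [c for c in clusters if c not in (ci, cj)] + [ci + cj]
    return merges

# Assumed matrix, consistent with the values shown in the example.
NAMES = ["x1", "x2", "x3", "x4", "x5"]
ROWS = [
    [0, 16, 6, 16, 6],
    [16, 0, 16, 8, 16],
    [6, 16, 0, 16, 2],
    [16, 8, 16, 0, 16],
    [6, 16, 2, 16, 0],
]
DIST = {a: dict(zip(NAMES, row)) for a, row in zip(NAMES, ROWS)}

if __name__ == "__main__":
    for a, b, height in upgma(DIST):
        print(a, b, height)
```

With this input the merges happen at heights 1, 3, 4 and 8, matching the node heights in the example. Recomputing averages from the original matrix costs more than the shortcut update but keeps the sketch close to the definition.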

UPGMA Summary
- Distance-based algorithm that produces rooted trees
- Assumes that all species evolve at the same rate (molecular clock hypothesis)
- An implication of the molecular clock hypothesis is that the distance from the root to any taxon is the same
- The final tree may not preserve the original distances between the taxa
[Figure: final UPGMA tree on x1, ..., x5]

Fitch-Margoliash (FM) Algorithm

FM Algorithm
- Similar to UPGMA but removes the molecular clock assumption (i.e., the distances from an internal node to its leaves may differ)
- Produces unrooted trees
- Algorithm (similar to UPGMA)
  - Add a leaf to the tree for each taxon
  - Initially make each taxon its own cluster
  - Find the closest clusters and connect them with a new node in the tree (place the new node at the distances given by the 3-point formula)
  - Repeat the previous step until all clusters are connected

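FM and 3-point formula
- Given three taxa i, j, k with distances d(i, j), d(i, k), d(j, k), where should the interior node m be placed to connect the taxa and preserve the distances?
  d(i, m) = 1/2 (d(i, j) + d(i, k) - d(j, k))
  d(j, m) = 1/2 (d(j, i) + d(j, k) - d(i, k))
  d(k, m) = 1/2 (d(k, i) + d(k, j) - d(i, j))
[Figure: taxa i, j, k joined at interior node m]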

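
The 3-point formula follows from solving d(i,m) + d(j,m) = d(i,j), d(i,m) + d(k,m) = d(i,k), d(j,m) + d(k,m) = d(j,k) for the three branch lengths. A minimal sketch (the function name is illustrative):

```python
def three_point(d_ij: float, d_ik: float, d_jk: float) -> tuple[float, float, float]:
    """Branch lengths (d_im, d_jm, d_km) from the interior node m to the
    three taxa i, j, k, chosen so that all pairwise distances are preserved."""
    d_im = (d_ij + d_ik - d_jk) / 2
    d_jm = (d_ij + d_jk - d_ik) / 2
    d_km = (d_ik + d_jk - d_ij) / 2
    return d_im, d_jm, d_km

if __name__ == "__main__":
    d_im, d_jm, d_km = three_point(3, 4, 5)
    print(d_im, d_jm, d_km)
    # The branch lengths add back up to the input distances.
    print(d_im + d_jm, d_im + d_km, d_jm + d_km)
```

A branch length comes out negative whenever the three distances violate the triangle inequality, which is one way poor distance data shows up.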

FM Algorithm
- Algorithm (similar to UPGMA)
  - Add a leaf to the tree for each taxon
  - Initially make each taxon its own cluster
  - Find the closest clusters and connect them with a new node in the tree (place the new node at the distances given by the 3-point formula, where the points are clusters of taxa and we use the distance between clusters)
  - Repeat the previous step until all clusters are connected
[Figure: taxa x1, ..., x5 before clustering and the resulting unrooted tree]

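Apply the FM algorithm to the following distance matrix:
[Distance matrix over A, B, C, D, E; individual entries are not recoverable apart from the values used below]
- A and B are closest; temporarily group C-D-E and compute d(A, B), d(A, C-D-E), d(B, C-D-E) to apply the 3-point formula:
  d(A, C-D-E) = 1/3 (d(A, C) + d(A, D) + d(A, E)) = .93
  d(B, C-D-E) = 1/3 (d(B, C) + d(B, D) + d(B, E)) = .863
  d(A, B) = .31 (only used to help us group A, B)
- By the 3-point formula, with X the new interior node:
  d(C-D-E, X) = 1/2 (d(C-D-E, A) + d(C-D-E, B) - d(A, B)) = .7415
  d(B, X) = 1/2 (d(B, A) + d(B, C-D-E) - d(A, C-D-E)) = .1215
  d(A, X) = 1/2 (d(A, B) + d(A, C-D-E) - d(B, C-D-E)) = .1885
[Figure: A, B and the group C-D-E joined at X]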

A and B are combined in a cluster for the rest of the algorithm, so we need to recompute the distances from A-B to the other clusters:
  d(A-B, C) = 1/2 (d(A, C) + d(B, C)) = 1.005
  d(A-B, D) = 1/2 (d(A, D) + d(B, D)) = .72
  d(A-B, E) = 1/2 (d(A, E) + d(B, E)) = .965
The updated table is:
         C       D      E
  A-B    1.005   .72    .965
  C      -       .61    .42
  D      -       -      .37
[Figure: the partial tree so far, with A and B joined]

Based on the updated table:
         C       D      E
  A-B    1.005   .72    .965
  C      -       .61    .42
  D      -       -      .37
- D and E are closest; temporarily group A-B-C and compute d(D, E), d(D, A-B-C), d(E, A-B-C) to apply the 3-point formula:
  d(D, A-B-C) = 1/3 (d(D, A) + d(D, B) + d(D, C)) = .683
  d(E, A-B-C) = 1/3 (d(E, A) + d(E, B) + d(E, C)) = .783
  d(D, E) = .37 (only used to help us group D, E)
- By the 3-point formula, with Y the new interior node:
  d(A-B-C, Y) = 1/2 (d(A-B-C, D) + d(A-B-C, E) - d(D, E)) = .548
  d(D, Y) = 1/2 (d(D, E) + d(D, A-B-C) - d(E, A-B-C)) = .135
  d(E, Y) = 1/2 (d(E, D) + d(E, A-B-C) - d(D, A-B-C)) = .235
[Figure: D, E and the group A-B-C joined at Y]

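D and E are combined in a cluster for the rest of the algorithm, so we need to recompute the distances from D-E to the other clusters:
  d(A-B, D-E) = 1/4 (d(A, D) + d(A, E) + d(B, D) + d(B, E)) = .8425
  d(A-B, C) = 1/2 (d(A, C) + d(B, C)) = 1.005
  d(C, D-E) = 1/2 (d(C, D) + d(C, E)) = .515
The updated table is now:
         C       D-E
  A-B    1.005   .8425
  C      -       .515
[Figure: the partial tree so far, with A and B joined and D and E joined]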

Based on the updated table:
         C       D-E
  A-B    1.005   .8425
  C      -       .515
There are only three clusters, so just apply the 3-point formula, with Z the new interior node:
  d(A-B, Z) = 1/2 (d(A-B, D-E) + d(A-B, C) - d(D-E, C)) = .66625
  d(D-E, Z) = 1/2 (d(D-E, A-B) + d(D-E, C) - d(A-B, C)) = .17625
  d(C, Z) = 1/2 (d(C, A-B) + d(C, D-E) - d(A-B, D-E)) = .33875
[Figure: A-B, C and D-E joined at Z]

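Now we need to expand the clusters A-B and D-E. We also need to compute the values of a (the branch between Z and the node joining A and B) and b (the branch between Z and the node joining D and E):
  d(A-B, Z) = 1/2 (d(A, Z) + d(B, Z)) = 1/2 (.1885 + a + .1215 + a) = .155 + a = .66625, so a = .51125
  d(D-E, Z) = 1/2 (d(D, Z) + d(E, Z)) = 1/2 (.135 + b + .235 + b) = .185 + b = .17625, so b = -.00875
The negative value for b is a cause for concern about the quality of the data. If we are confident of our data, then since b is close to 0 it would be set to 0.
[Figure: the final unrooted tree, with C attached at Z, A and B attached through branch a, and D and E attached through branch b]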
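
The arithmetic of the last steps can be replayed from the cluster distances quoted in the example. This is only a check of the numbers, not a full FM implementation; the value 1.005 for d(A-B, C) is inferred from the other averages rather than read off the slides, and the variable names are illustrative.

```python
def three_point(d_ij, d_ik, d_jk):
    """Branch lengths from the interior node to i, j, k (3-point formula)."""
    return ((d_ij + d_ik - d_jk) / 2,
            (d_ij + d_jk - d_ik) / 2,
            (d_ik + d_jk - d_ij) / 2)

# Step 1: join A and B against the temporary group C-D-E.
a_x, b_x, cde_x = three_point(0.31, 0.93, 0.863)

# Step 2: join D and E against the temporary group A-B-C.
d_y, e_y, abc_y = three_point(0.37, 0.683, 0.783)

# Step 3: three clusters remain: A-B, D-E and C.
# d(A-B, C) = 1.005 is an inferred value (see the lead-in).
ab_z, de_z, c_z = three_point(0.8425, 1.005, 0.515)

# d(A-B, Z) is the average of d(A, Z) = a_x + a and d(B, Z) = b_x + a.
a = ab_z - (a_x + b_x) / 2
# d(D-E, Z) is the average of d(D, Z) = d_y + b and d(E, Z) = e_y + b.
b = de_z - (d_y + e_y) / 2

if __name__ == "__main__":
    print(f"d(A,X)={a_x:.4f} d(B,X)={b_x:.4f} d(C-D-E,X)={cde_x:.4f}")
    print(f"d(D,Y)={d_y:.4f} d(E,Y)={e_y:.4f}")
    print(f"a={a:.5f} b={b:.5f}")
```

The slightly negative b reproduces the concern raised in the example: the distances are not perfectly tree-like, so one interior branch cannot be fitted with a non-negative length.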

FM Summary
- Distance-based algorithm that produces unrooted trees
- Removes the assumption of a molecular clock, but does not give information about the root (common ancestor)
- To detect the root, one could introduce an extra taxon (outgroup) that is more distantly related to the given taxa

Neighbor-Joining (NJ) Algorithm

NJ Algorithm
- Similar to FM (also removes the molecular clock assumption) but more sophisticated in how it selects clusters to join
- Produces unrooted trees
- Algorithm (similar to FM)
  - Add a leaf to the tree for each taxon
  - Initially make each taxon its own cluster
  - Find the closest clusters using a more sophisticated criterion (place the new node at the distance given by a variant of the 3-point formula)
  - Repeat the previous step until all clusters are connected

NJ "closeness" Criterion
- Suppose that you are given n taxa x1, x2, x3, ..., xn, and suppose that you have some tree that fits the distance data
- Observation: if x1 and x2 are neighbors, then for any two other taxa xi and xj
  d(x1, x2) + d(xi, xj) < d(x1, xi) + d(x2, xj)
  (the right side includes yz twice, the left does not)
[Figure: tree on x1, ..., x6 with interior nodes y and z]

NJ "closeness" Criterion
- From previous slide: d(x1, x2) + d(xi, xj) < d(x1, xi) + d(x2, xj)
- For a fixed i, say i = 3:
  d(x1, x2) + d(x3, x4) < d(x1, x3) + d(x2, x4)
  d(x1, x2) + d(x3, x5) < d(x1, x3) + d(x2, x5)
  d(x1, x2) + d(x3, x6) < d(x1, x3) + d(x2, x6)
  ...
  d(x1, x2) + d(x3, xn) < d(x1, x3) + d(x2, xn)
- Sum these n-3 inequalities, then add d(x3, x1), d(x3, x2), d(x3, x3), d(x2, x1), d(x2, x2) to both sides

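NJ "closeness" Criterion
- From previous slide, if x1 and x2 are neighbors:
  (n-2) d(x1, x2) + S3 < (n-2) d(x1, x3) + S2, where Si is the sum of d(xi, xj) over all j
- Let M(i, j) = (n-2) d(xi, xj) - Si - Sj
- Then in general, if xk and xl are neighbors: M(k, l) < M(k, j) for every other taxon xj
- NJ uses this observation to determine "closeness" and looks for the smallest value M(k, l) to determine a cluster
- Unlike UPGMA and FM, NJ has a more global view of "closeness" when selecting neighbors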

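NJ New Node Placement
- If x1 and x2 are neighbors, where should the new node y be placed?
- By the 3-point formula, for each j = 3, ..., n:
  d(x1, y) = 1/2 (d(x1, x2) + d(x1, xj) - d(x2, xj))
- Average over the n-2 choices of j, and add on the right side the zero term d(x1, x1) + d(x1, x2) - d(x2, x1) - d(x2, x2) so that the sums run over complete rows:
  d(x1, y) = 1/2 (d(x1, x2) + (S1 - S2) / (n-2)), where Si is the sum of row i of the distance matrix
[Figure: tree on x1, ..., x5 with x1 and x2 joined at the new node y]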

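NJ mini summary
- For each pair of nodes xk and xl compute the quantity M(k, l) = (n-2) d(xk, xl) - Sk - Sl, where Si is the sum of row i of the distance matrix
- Actually, could compute M(k, l) = d(xk, xl) - (Sk + Sl) / (n-2)
- When xk and xl are replaced by new node y, place y at d(xk, y) = 1/2 (d(xk, xl) + (Sk - Sl) / (n-2))
- From now on Si will always be divided implicitly by (n-2)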

NJ Algorithm
- From the distance matrix compute the criterion matrix M(i, j) = d(xi, xj) - Si - Sj, where Si = (sum of row i) / (n-2)
- Find the smallest value in M(i, j) and cluster the corresponding pair
- Connect taxa xi and xj with a new node y placed at distance d(xi, y) = 1/2 (d(xi, xj) + Si - Sj)
- Remove xi and xj and replace them with y; update the distance matrix using the 3-point formula
- Repeat from the beginning

Apply the NJ algorithm to the given distance matrix:
[Distance matrix over x1, ..., x6; the entries used below are d(1,2) = 8, d(1,3) = 3, d(1,4) = 14, d(1,5) = 10, d(1,6) = 12]
- First compute Si = sum-of-row / (n-2):
  S1 = 11.75, S2 = 10.25, S3 = 12.75, S4 = 14.25, S5 = 11.25, S6 = 12.25
- Compute
  M(1,2) = d(1,2) - S1 - S2 = 8 - 22 = -14
  M(1,3) = d(1,3) - S1 - S3 = 3 - 24.5 = -21.5
  M(1,4) = d(1,4) - S1 - S4 = 14 - 26 = -12
  M(1,5) = d(1,5) - S1 - S5 = 10 - 23 = -13
  M(1,6) = d(1,6) - S1 - S6 = 12 - 24 = -12
  and so on ...
[Criterion matrix M over x1, ..., x6; values not recoverable]
- Find the min value, i.e. the pair to cluster

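From the previous slide we need to cluster x1 and x3.
- Add a new taxon x7 and place it at distance d(x1, x7) = 1/2 (d(1,3) + S1 - S3) = 1/2 (3 + 11.75 - 12.75) = 1 from x1, and therefore at distance d(1,3) - 1 = 2 from x3
- Recompute the distances from x7 to all others using the 3-point formula:
  d(7,2) = 1/2 (d(1,2) + d(3,2) - d(1,3)) = 7
  d(7,4) = 1/2 (d(1,4) + d(3,4) - d(1,3)) = 13
  d(7,5) = 1/2 (d(1,5) + d(3,5) - d(1,3)) = 9
  d(7,6) = 1/2 (d(1,6) + d(3,6) - d(1,3)) = 11
[Figure: x1 and x3 joined at the new node x7]
[Updated distance matrix over x2, x4, x5, x6, x7; only the x7 row computed above is recoverable]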
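
One NJ iteration (compute Si, build the criterion M, pick the minimum, place the new node, update the distances) can be sketched as below. Only the x1 row of the input matrix is legible in this transcript and the x3 row can be worked back from the d(7, .) values; the remaining six entries in the matrix below are an assumption, chosen only so that the row sums reproduce the S values quoted in the example.

```python
from itertools import combinations

def nj_step(names, d):
    """Perform one neighbor-joining iteration on a dict-of-dicts matrix d.

    Returns the chosen pair (i, j), the branch lengths from i and j to
    the new node, the S values, and the distances from the new node to
    every remaining taxon.
    """
    n = len(names)
    s = {i: sum(d[i][k] for k in names) / (n - 2) for i in names}
    # criterion M(i, j) = d(i, j) - S_i - S_j; the smallest value wins
    i, j = min(combinations(names, 2),
               key=lambda p: d[p[0]][p[1]] - s[p[0]] - s[p[1]])
    d_iy = (d[i][j] + s[i] - s[j]) / 2
    d_jy = d[i][j] - d_iy
    new_d = {k: (d[i][k] + d[j][k] - d[i][j]) / 2
             for k in names if k not in (i, j)}
    return (i, j), (d_iy, d_jy), s, new_d

NAMES = ["x1", "x2", "x3", "x4", "x5", "x6"]
ROWS = [
    [0, 8, 3, 14, 10, 12],   # x1: from the example
    [8, 0, 9, 10, 6, 8],     # x2: last three entries assumed
    [3, 9, 0, 15, 11, 13],   # x3: worked back from d(7, .)
    [14, 10, 15, 0, 10, 8],  # x4: d(4,5), d(4,6) assumed
    [10, 6, 11, 10, 0, 8],   # x5: d(5,6) assumed
    [12, 8, 13, 8, 8, 0],    # x6
]
DIST = {a: dict(zip(NAMES, row)) for a, row in zip(NAMES, ROWS)}

if __name__ == "__main__":
    pair, lengths, s, new_d = nj_step(NAMES, DIST)
    print(pair, lengths)
    print(s)
    print(new_d)
```

A full NJ run would add the new node to the matrix under a fresh name and call `nj_step` again until three clusters remain, at which point the plain 3-point formula finishes the tree.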

Apply the NJ algorithm to the new distance matrix:
[Distance matrix over x2, x4, x5, x6, x7; values not recoverable]
- First compute Si = sum-of-row / (n-2):
  S2 =    S4 =    S5 =    S6 =    S7 =
- Compute
  M(2,4) = d(2,4) - S2 - S4 =
  M(2,5) = d(2,5) - S2 - S5 =
  M(2,6) = d(2,6) - S2 - S6 =
  M(2,7) = d(2,7) - S2 - S7 =
  and so on ...
[Criterion matrix M over x2, x4, x5, x6, x7; values not recoverable]
- Find the min value, i.e. the pair to cluster

From the previous slide we need to cluster ? and ??
- Add a new taxon x8 and place it at distance
- Recompute the distances from x8 to all others using the 3-point formula
[Figure: x? and x?? joined at the new node x8, with branch lengths ? and ?]
[Updated distance matrix over x?, x?, x6, x8; values left blank]

NJ Summary
- Distance-based algorithm that produces unrooted trees
- Removes the assumption of a molecular clock, but does not give information about the root (common ancestor)
- Typically performs better than UPGMA and FM because it uses a more global criterion to select pairs to cluster
- To detect the root, one could introduce an extra taxon (outgroup) that is more distantly related to the given taxa

Maximum Parsimony (MP) Algorithm

MP Algorithm
- Character-based algorithm: does not use distances, but utilizes the character information in sequences
- A criticism of distance-based methods is that they do not exploit the structure of the sequences (they collapse them to a number, the distance)
- Main philosophy is "economy of substitutions": find the tree that requires the fewest mutations (maximum parsimony)

MP Algorithm
- The strategy
  - explore a number of possible trees
  - report the tree with the smallest score (most parsimonious)
- Need to be able to solve two problems
  - small parsimony problem: given a candidate tree, compute its parsimony score
  - large parsimony problem: efficiently generate viable candidate trees (cannot generate all of them because of the tree explosion)

Small Parsimony Problem
- Given a candidate tree, compute its parsimony score
- Consider a candidate tree for the one-site sequences
  s1 = A, s2 = T, s3 = T, s4 = G, s5 = A
[Figure: candidate tree with leaves A, T, T, G, A and internal labels {A,T}, {A,G}, {T}, {A,G,T}]
- Final score = 3

Solving Small Parsimony Problem
- Explore the tree bottom-up (from the leaves to the interior)
- For each internal node one level up:
  - if the "labels" at the two child nodes have no symbols in common, assign the union of both labels as the label at this node and penalize the tree one unit
  - if the "labels" at the two child nodes do have symbols in common, label the node with the common portion; no penalty
[Figure: children {A,G} and {C} give parent {A,G,C}; children {A,G} and {G,T} give parent {G}]

Solving Small Parsimony Problem
- For n-site sequences, run the algorithm in parallel for each site and add up the parsimony scores for all sites
- Consider a candidate tree for the following sequences:
  s1 = ATC, s2 = ACC, s3 = GTA, s4 = GCA
[Figure: candidate tree with leaves ATC, ACC, GTA, GCA and per-site label sets at the internal nodes]
- Final score = 4
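
The bottom-up labeling rule can be sketched as a short recursive function. The tree shapes used below are assumptions: the figures are not preserved, so the topologies were chosen to reproduce the internal labels and scores quoted in the two examples. Trees are written as nested 2-tuples of leaf names.

```python
def fitch(tree, site):
    """Label one site bottom-up.

    tree: a leaf name (str) or a 2-tuple of subtrees.
    site: dict mapping leaf name -> character at this site.
    Returns (label set at this node, number of penalties below it).
    """
    if isinstance(tree, str):
        return {site[tree]}, 0
    left_label, left_cost = fitch(tree[0], site)
    right_label, right_cost = fitch(tree[1], site)
    common = left_label & right_label
    if common:
        # shared symbols: keep the common portion, no penalty
        return common, left_cost + right_cost
    # nothing shared: take the union and penalize one unit
    return left_label | right_label, left_cost + right_cost + 1

def parsimony_score(tree, seqs):
    """Sum the per-site scores over all sites of equal-length sequences."""
    n_sites = len(next(iter(seqs.values())))
    return sum(
        fitch(tree, {name: seq[k] for name, seq in seqs.items()})[1]
        for k in range(n_sites)
    )

# One-site example; topology assumed from the internal labels.
ONE_SITE = {"s1": "A", "s2": "T", "s3": "T", "s4": "G", "s5": "A"}
TREE5 = ((("s1", "s2"), "s3"), ("s4", "s5"))

# Three-site example; topology assumed.
THREE_SITES = {"s1": "ATC", "s2": "ACC", "s3": "GTA", "s4": "GCA"}
TREE4 = ((("s1", "s2"), "s3"), "s4")

if __name__ == "__main__":
    print(fitch(TREE5, ONE_SITE))
    print(parsimony_score(TREE4, THREE_SITES))
```

Because every site is scored independently, the per-site calls could run in parallel, as the slide suggests; the sketch simply loops over them.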

Solving Large Parsimony Problem
- Efficiently generate viable candidate trees (cannot try all of them)
- Branch-and-bound approach
  - create a possible tree by some method and calculate its score
  - start building a tree from scratch, discarding partial trees that already cost more than the current best

Solving Large Parsimony Problem
- Branch-and-bound approach
[Figure: illustration of the branch-and-bound search]
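
A branch-and-bound search along these lines can be sketched for small inputs. The details are assumptions, not the slide's exact procedure: the initial tree is a simple caterpillar in input order, trees are grown by inserting one taxon at a time on every edge, and a partial tree is discarded as soon as its score reaches the current best (valid because adding a taxon never lowers the parsimony score).

```python
def score(tree, seqs):
    """Total parsimony score of a nested-tuple tree over all sites."""
    def fitch(node, k):
        if isinstance(node, str):
            return {seqs[node][k]}, 0
        (la, ca), (lb, cb) = fitch(node[0], k), fitch(node[1], k)
        common = la & lb
        return (common, ca + cb) if common else (la | lb, ca + cb + 1)
    n_sites = len(next(iter(seqs.values())))
    return sum(fitch(tree, k)[1] for k in range(n_sites))

def insertions(tree, leaf):
    """Yield every tree obtained by attaching leaf to one edge of tree
    (including the edge above the root)."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for t in insertions(left, leaf):
            yield (t, right)
        for t in insertions(right, leaf):
            yield (left, t)

def branch_and_bound(seqs):
    """Return (best score, a best tree) for the given sequences."""
    names = list(seqs)
    # Initial bound: a caterpillar tree built in input order.
    start = names[0]
    for name in names[1:]:
        start = (start, name)
    best = {"score": score(start, seqs), "tree": start}

    def extend(tree, k):
        current = score(tree, seqs)
        if current >= best["score"]:
            return  # bound: no completion of this tree can beat the best
        if k == len(names):
            best["score"], best["tree"] = current, tree
            return
        for t in insertions(tree, names[k]):
            extend(t, k + 1)

    extend((names[0], names[1]), 2)
    return best["score"], best["tree"]

if __name__ == "__main__":
    seqs = {"s1": "ATC", "s3": "GTA", "s2": "ACC", "s4": "GCA"}
    print(branch_and_bound(seqs))
```

The search is still exponential in the worst case; the bound only helps when a good tree is found early, which is why the initial tree matters in practice.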

MP Summary
- Character-based algorithm: uses the sequence data
- Produces unrooted trees
- Economy of substitution: the best tree is the one that requires the fewest substitutions
- Examines a number of possible trees in search of the best one