Parsimony-based phylogenetic trees. Sushmita Roy, BMI/CS 576, Sep 30th, 2014.

Presentation transcript:


Phylogenetic tree construction
  – Distance-based methods
  – Parsimony methods
  – Probabilistic methods

Parsimony
Given character data at the leaf nodes, find the tree that has the smallest cost.
The cost of a tree is determined by the number of substitutions.
Best tree = lowest cost = lowest number of substitutions.
Hence there are two problems in finding the best tree:
  – How to compute the cost of a tree
  – How to search the space of trees

Defining the cost of a tree
Assume a set of aligned sequences.
Each sequence corresponds to a leaf in the tree.
Assume sites are independent of each other:
  – Estimate the cost per site
For any possible tree for these sequences, estimate the number of changes needed to produce the characters observed at each site.
Sum over all sites.

Defining the cost of a tree
Consider the sequences AAG, AAA, GGA, AGA.
There are multiple trees that could explain the phylogeny.
[Figure: three candidate trees, Tree 1, Tree 2 and Tree 3, with these sequences at the leaves]
Maximum parsimony will select the tree with the lowest cost, that is, Tree 1.

How to compute the cost of a tree? Weighted parsimony
Assume we have a substitution matrix that gives the cost of switching between any two bases.
A recursive algorithm then computes the cost of the tree.

Weighted parsimony
Remember that we only observe characters at the leaves.
We need to consider all possible ways of producing the characters seen at the leaves and pick the one with the smallest total substitution cost.
Weighted parsimony uses a dynamic programming idea on trees:
  – Perform a bottom-up tree traversal to compute the minimal cost at a node based on its children
  – Re-use the computation done for the children
Thus with n extant (leaf) nodes, n-1 internal nodes, and m letters in our alphabet, we compute (2n-1)*m numbers.

Weighted parsimony notation
Let $C_k(a)$ be the minimal cost of observing $a$ at node $k$.
Let $x_k$ denote the letter at the $k$th node.
Assume our tree has $n$ leaves, and therefore $2n-1$ nodes, with the root numbered $2n-1$.
Let $S(a,b)$ be the cost of switching from $a$ to $b$, where $a$ and $b$ are in our alphabet.
An internal node $k$'s children are referred to as $i$ and $j$.

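Weighted parsimony algorithm
Initialization
  – Set $k = 2n-1$, the root node
Recursion: compute $C_k(a)$ for all $a$
  – If $k$ is a leaf node: $C_k(a) = 0$ if $a = x_k$, and $C_k(a) = \infty$ otherwise
  – Otherwise: compute $C_i(a)$ and $C_j(a)$ for all $a$, for $k$'s daughter nodes $i$ and $j$, then set $C_k(a) = \min_b \left( C_i(b) + S(a,b) \right) + \min_c \left( C_j(c) + S(a,c) \right)$
  – The recursion keeps descending to lower nodes until it reaches the leaf nodes
Termination
  – Tree cost $= \min_a C_{2n-1}(a)$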

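Weighted parsimony for an internal node $k$ with children $i$ and $j$
$C_k(a) = \min_b \left( C_i(b) + S(a,b) \right) + \min_c \left( C_j(c) + S(a,c) \right)$
Pick $b$ (for child $i$) and $c$ (for child $j$) such that each term is minimized.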
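The recursion for a single site can be sketched in Python as follows. This is a minimal illustration, not the course's reference code: the nested-tuple tree encoding and the function name `weighted_parsimony` are assumptions, and the substitution matrix `S` is a hypothetical one (transitions cost 1, transversions cost 2), not the matrix shown on the slides.

```python
import math

ALPHABET = "ACGT"
PURINES = {"A", "G"}

# Hypothetical substitution costs (NOT the slides' matrix):
# 0 on the diagonal, 1 for transitions (A<->G, C<->T), 2 for transversions.
S = {a: {b: 0 if a == b else (1 if (a in PURINES) == (b in PURINES) else 2)
         for b in ALPHABET} for a in ALPHABET}

def weighted_parsimony(tree, S):
    """Return {a: C_k(a)} for the root k of `tree`.

    A leaf is a one-letter string; an internal node is a pair (left, right).
    """
    if isinstance(tree, str):
        # Leaf: cost 0 for the observed letter, infinity for any other letter.
        return {a: (0 if a == tree else math.inf) for a in ALPHABET}
    Ci = weighted_parsimony(tree[0], S)  # costs at daughter i
    Cj = weighted_parsimony(tree[1], S)  # costs at daughter j
    return {a: min(Ci[b] + S[a][b] for b in ALPHABET)
               + min(Cj[c] + S[a][c] for c in ALPHABET)
            for a in ALPHABET}

# Three leaves A, C, T: node 4 joins A and C, node 5 (the root) joins node 4 and T.
root_costs = weighted_parsimony((("A", "C"), "T"), S)
tree_cost = min(root_costs.values())  # termination step: min over a of C_root(a)
```

Because the matrix here is made up, the resulting numbers need not agree with the in-class example; only the structure of the computation is meant to carry over.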

Weighted parsimony example
[Figure: a tree with leaves A, C and T, and a substitution matrix over A, C, G, T]
Estimate the cost of this tree using the substitution matrix.

In class exercise

Weighted Parsimony example

Parsimony can reconstruct ancestral states as well
This requires a small modification to the algorithm: in addition to the cost, keep track of the value that gave the smallest cost.
Let $k$ be an internal node and let $i$ and $j$ be $k$'s children.
Introduce pointers $L_k(a)$ and $R_k(a)$, one for each child.
Update these pointers at the end of the recursion step.
The traceback then follows these pointers to reconstruct the ancestral states.

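Weighted parsimony modification to keep track of ancestral states
Initialization
  – Set $k = 2n-1$, the root node
Recursion: compute $C_k(a)$ for all $a$
  – If $k$ is a leaf node: $C_k(a) = 0$ if $a = x_k$, and $C_k(a) = \infty$ otherwise
  – Otherwise: compute $C_i(a)$ and $C_j(a)$ for all $a$, for $k$'s daughter nodes $i$ and $j$, then set $C_k(a) = \min_b \left( C_i(b) + S(a,b) \right) + \min_c \left( C_j(c) + S(a,c) \right)$
  – Also set $L_k(a) = \arg\min_b \left( C_i(b) + S(a,b) \right)$ and $R_k(a) = \arg\min_c \left( C_j(c) + S(a,c) \right)$
Termination
  – Tree cost $= \min_a C_{2n-1}(a)$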

Example to infer the ancestral states
What is the ancestral state associated with the minimal cost tree?
[Figure: the tree with leaves A, C and T, the substitution matrix over A, C, G, T, and the table of costs for node 5]

Keeping track of the daughters
For node 5, it makes sense to track only node 4, that is $L_5(a)$, because the other daughter is a leaf whose letter is fixed.

Keeping track of the decisions

Tracking daughters for node 4:
a    $L_4(a)$   $R_4(a)$
A    A          C
C    A          C
G    A          C
T    A          C

Tracking daughters for node 5:
a    $L_5(a)$   $R_5(a)$
A    A          T
C    C          T
G    G          T
T    G          T

Recall that the minimum cost is associated with G or T at node 5.
If $x_5 = G$, then $x_4 = G$.
If $x_5 = T$, then $x_4 = G$.
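The pointer bookkeeping and traceback can be sketched in Python as below. This is an illustration under stated assumptions: the dictionary-based node records and the names `solve` and `traceback` are invented here, and `S` is a hypothetical matrix (transitions 1, transversions 2) rather than the one used in the example, so the chosen ancestral letters may differ from the G/T result quoted above.

```python
import math

ALPHABET = "ACGT"
PURINES = {"A", "G"}
# Hypothetical costs (not the slides' matrix): transitions 1, transversions 2.
S = {a: {b: 0 if a == b else (1 if (a in PURINES) == (b in PURINES) else 2)
         for b in ALPHABET} for a in ALPHABET}

def solve(tree):
    """Return a record holding C_k(a), the pointers L_k(a), R_k(a) and the children."""
    if isinstance(tree, str):  # leaf: a one-letter string
        return {"leaf": tree,
                "cost": {a: (0 if a == tree else math.inf) for a in ALPHABET}}
    left, right = solve(tree[0]), solve(tree[1])
    cost, L, R = {}, {}, {}
    for a in ALPHABET:
        # Remember which letter in each child achieved the minimum.
        L[a] = min(ALPHABET, key=lambda b: left["cost"][b] + S[a][b])
        R[a] = min(ALPHABET, key=lambda c: right["cost"][c] + S[a][c])
        cost[a] = (left["cost"][L[a]] + S[a][L[a]]
                   + right["cost"][R[a]] + S[a][R[a]])
    return {"cost": cost, "L": L, "R": R, "children": (left, right)}

def traceback(node, state):
    """Follow the pointers down; internal nodes become (state, left, right)."""
    if "leaf" in node:
        return node["leaf"]
    left, right = node["children"]
    return (state,
            traceback(left, node["L"][state]),
            traceback(right, node["R"][state]))

root = solve((("A", "C"), "T"))            # node 4 = (A, C), node 5 = root
best = min(ALPHABET, key=root["cost"].get)  # ties go to the first letter in ALPHABET
labelled = traceback(root, best)
```

For node 4, whose daughters are both leaves, the pointers come out as A and C for every letter regardless of the matrix, which matches the first table above. Ties at the root are broken arbitrarily here; a fuller implementation might enumerate all optimal labellings.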

Parsimony
Often people use the simpler version of parsimony, in which there is no substitution matrix.
This is equivalent to $S(a,a) = 0$ and $S(a,b) = 1$ for $a \neq b$.
The corresponding algorithm for this unweighted version is called "Fitch's algorithm".
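A minimal Python sketch of the unweighted (Fitch) count, applied to the four sequences AAG, AAA, GGA, AGA from the earlier slide. The nested-tuple encoding and the function names are assumptions made for illustration. The three groupings below are the three possible unrooted topologies for four leaves; which of them corresponds to Tree 2 or Tree 3 in the original figure is not known from the transcript.

```python
def fitch_site(tree, site):
    """Return (candidate state set, number of substitutions) for one site.

    A leaf is a sequence string; an internal node is a pair (left, right).
    """
    if isinstance(tree, str):
        return {tree[site]}, 0
    left_set, left_cost = fitch_site(tree[0], site)
    right_set, right_cost = fitch_site(tree[1], site)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra substitution.
        return common, left_cost + right_cost
    # The children disagree: take the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1

def fitch_cost(tree, n_sites):
    """Sites are treated as independent, so the tree cost is the sum over sites."""
    return sum(fitch_site(tree, u)[1] for u in range(n_sites))

topologies = {
    "AAG+AAA | GGA+AGA": (("AAG", "AAA"), ("GGA", "AGA")),
    "AAG+GGA | AAA+AGA": (("AAG", "GGA"), ("AAA", "AGA")),
    "AAG+AGA | AAA+GGA": (("AAG", "AGA"), ("AAA", "GGA")),
}
costs = {name: fitch_cost(tree, 3) for name, tree in topologies.items()}
```

Under this sketch the grouping that pairs AAG with AAA needs 3 substitutions and the other two need 4, so it is the one maximum parsimony would select.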

Searching the space of possible trees
We know how to score a given tree, but how do we search the space of trees?
Heuristic methods
  – Start with a tree
  – Make small changes to the tree and check for improvements in score
Branch and bound methods
  – Adding a sequence cannot decrease the cost of the tree
  – That is, the cost of a partial tree gives a lower bound on the cost of any tree that can be grown from it
  – Thus, if we have the cost of the best complete tree so far, any partial tree with cost greater than that of the current best tree is not worth exploring

Heuristic methods
Nearest neighbor interchange (NNI)
  – Each internal branch has four neighboring nodes (subtrees)
  – Without changing the nodes, there are three topologies that link these nodes
  – NNI swaps the nodes to evaluate these topologies
Subtree pruning and regrafting (SPR)
  – Delete an internal branch to get two subtrees
  – Add one subtree to the other subtree by considering other branches

Nearest neighbor interchange
[Figure: the three topologies around one internal branch: (A,B)|(C,D), (A,D)|(C,B) and (A,C)|(D,B)]
Every internal branch has three possible topologies for its four nodes.
Nearest neighbor interchange evaluates these three topologies for each internal branch.

Subtree pruning and regrafting
[Figure: old tree and new tree over leaves A to G; a branch is deleted and the pruned subtree is regrafted elsewhere]

Heuristic method: hill-climbing with nearest neighbor interchange

given: set of leaves L
create an initial tree t incorporating all leaves in L
best-tree = t
best-score = parsimony algorithm applied to t
repeat
    for each internal edge e in t
        for each nearest neighbor interchange
            t' ← tree with interchange applied to edge e in t
            score = parsimony algorithm applied to t'
            if score < best-score
                best-score = score
                best-tree = t'
    t = best-tree
until stopping criteria met
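The hill-climbing loop can be sketched in Python as below. Several choices here are assumptions rather than part of the slides: trees are nested pairs with sequence strings as leaves, the root of the pair is treated as a point on one branch of the unrooted tree, scoring uses the unweighted (Fitch) count, and the stopping criterion is "no neighbor improves the score".

```python
def fitch(tree, site):
    """Unweighted parsimony for one site: (state set, substitution count)."""
    if isinstance(tree, str):
        return {tree[site]}, 0
    (ls, lc), (rs, rc) = fitch(tree[0], site), fitch(tree[1], site)
    return (ls & rs, lc + rc) if ls & rs else (ls | rs, lc + rc + 1)

def parsimony(tree, n_sites):
    return sum(fitch(tree, u)[1] for u in range(n_sites))

def nni_neighbours(tree, is_root=True):
    """Yield every tree that is one nearest neighbor interchange away."""
    if isinstance(tree, str):
        return
    x, y = tree
    if is_root:
        # Both root edges form one unrooted branch with subtrees a, b | c, d.
        if not isinstance(x, str) and not isinstance(y, str):
            (a, b), (c, d) = x, y
            yield ((a, c), (b, d))
            yield ((a, d), (b, c))
    else:
        if not isinstance(x, str):   # internal branch above x: swap y with a child of x
            a, b = x
            yield ((y, b), a)
            yield ((a, y), b)
        if not isinstance(y, str):   # internal branch above y: swap x with a child of y
            c, d = y
            yield (c, (x, d))
            yield (d, (c, x))
    for t in nni_neighbours(x, False):
        yield (t, y)
    for t in nni_neighbours(y, False):
        yield (x, t)

def hill_climb(t, n_sites):
    best_tree, best_score = t, parsimony(t, n_sites)
    while True:
        improved = False
        for candidate in nni_neighbours(t):
            score = parsimony(candidate, n_sites)
            if score < best_score:
                best_tree, best_score, improved = candidate, score, True
        if not improved:             # assumed stopping criterion: local optimum
            return best_tree, best_score
        t = best_tree

start = (("AAG", "GGA"), ("AAA", "AGA"))
result = hill_climb(start, 3)
```

Starting from a tree of cost 4, one interchange reaches the grouping of AAG with AAA at cost 3, and no further neighbor improves on it. As with any hill-climbing search, the result is only guaranteed to be a local optimum.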

Branch and bound methods
  – Systematically enumerate solutions and discard avenues that are guaranteed to have higher costs
Lower bound
  – For a set of numbers, a lower bound is a value that is no larger than any number in the set
The cost of a partial tree T provides a lower bound on the cost of all trees that can be grown from T.
Search by repeatedly selecting the partial tree with the lowest lower bound.

Branch and bound methods

Branch and bound algorithm for phylogenetic tree search

Given a set of leaves L
Initialize Q to a partial tree with 3 leaves from L
Repeat
  – Set $T_{new}$ to the tree with the lowest cost in Q, and remove it from Q
  – If $T_{new}$ has all leaves, return $T_{new}$
  – Else
      Generate new trees by adding one of the remaining leaves to each branch of $T_{new}$
      Compute the cost of each new tree
      Add the new trees to Q in sorted order of cost
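A compact Python sketch of this queue-based search follows. The encoding is an assumption made for illustration: the first sequence is kept as a fixed outgroup so that each tree is `(outgroup, rest)`, leaves are added in their input order, the cost is the unweighted (Fitch) count, and Q is a binary heap from the standard library.

```python
import heapq
import itertools

def fitch(tree, site):
    """Unweighted parsimony for one site: (state set, substitution count)."""
    if isinstance(tree, str):
        return {tree[site]}, 0
    (ls, lc), (rs, rc) = fitch(tree[0], site), fitch(tree[1], site)
    return (ls & rs, lc + rc) if ls & rs else (ls | rs, lc + rc + 1)

def cost(tree, n_sites):
    return sum(fitch(tree, u)[1] for u in range(n_sites))

def attach(subtree, leaf):
    """Yield every tree made by attaching `leaf` to a branch above or inside `subtree`."""
    yield (subtree, leaf)
    if not isinstance(subtree, str):
        x, y = subtree
        for t in attach(x, leaf):
            yield (t, y)
        for t in attach(y, leaf):
            yield (x, t)

def branch_and_bound(seqs):
    n_sites = len(seqs[0])
    start = (seqs[0], (seqs[1], seqs[2]))   # the only topology for 3 leaves
    tie = itertools.count()                  # tie-breaker so the heap never compares trees
    queue = [(cost(start, n_sites), next(tie), 3, start)]
    while queue:
        c, _, used, tree = heapq.heappop(queue)  # partial tree with the lowest bound
        if used == len(seqs):
            return tree, c                   # complete, and no queued bound is lower
        outgroup, rest = tree
        for grown in attach(rest, seqs[used]):
            new = (outgroup, grown)
            heapq.heappush(queue, (cost(new, n_sites), next(tie), used + 1, new))
    return None

best_tree, best_cost = branch_and_bound(["AAG", "AAA", "GGA", "AGA"])
```

The first complete tree popped from the heap is optimal because every tree still in the queue has a cost at least as large, and adding leaves cannot decrease that cost. Keeping one leaf as an outgroup means `attach` visits each branch of the unrooted tree exactly once.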

Comments on branch and bound
Exact method
May be more efficient than exhaustive search
Worst case is no better
Efficiency depends on
  – tightness of the lower bound
  – quality of the initial tree

Distance-based vs parsimony methods
Different methods for phylogenetic tree reconstruction:
  – Distance-based methods
      UPGMA
      Neighbor joining
  – Parsimony methods
      Also enable estimation of the ancestral sequences
      No emphasis on branch length estimation
Distance-based methods are faster.
Parsimony gives ancestral sequences.
  – Does not assume anything about branch lengths