Maximum Parsimony

Phenetic (distance-based) methods are fast and often accurate, but they discard data and are not based on explicit character states at each position. Most of them (minimum evolution excepted) are algorithm based: if you follow the algorithm, you arrive at a single tree.

Other methods are character based and/or criterion based: you establish your criterion for the best tree and then explore all possible trees to find the best one according to that criterion.

Parsimony is the principle that, of all possible explanations, the simplest one is the best. Under parsimony, therefore, the criterion is that the best tree is the one requiring the fewest evolutionary changes, i.e. the shortest tree. Parsimony is based on cladistics, elucidated by Willi Hennig (1950).

[Figure: six taxa (Frog, Bird, Crocodile, Kangaroo, Bat, Human) scored for six characters (amnion, hair, wings, antorbital fenestra, placenta, lactation), with the fit (+/-) of each character on Tree 1 and Tree 2. Tree length: Tree 1 = 7, Tree 2 = 10, so Tree 1 is the more parsimonious.]

Note that the clustering methods discussed previously do not actually attempt to define the genealogy of the taxa; they merely cluster them by similarity under an evolutionary model. Cladistics instead analyzes each character in terms of its genealogy and reconstructs the genealogy of the sequences as a whole from that analysis.

Cladistic Terminology
Cladistic terminology is useful for other methods but critical for parsimony analyses, because cladistic analyses are based on character polarity.
1. Ingroup: the group being studied.
2. Outgroup: a group outside the one being studied, but not too distantly related to it. Used to root the tree (set character-state polarities).
3. Plesiomorphy: the ancestral character state. Shared plesiomorphies are symplesiomorphies (uninformative similarity).
4. Apomorphy: a derived character or character state. Shared apomorphies are synapomorphies (informative similarity).
5. Autapomorphy: a derived state unique to a particular taxon. Autapomorphies provide no cladistically useful information, but they do provide information on branch length (to produce phylograms).
6. Homoplasy: similarity not due to common ancestry, i.e. due to convergence or parallelism. It is uninformative and potentially misleading.

Cladistic Terminology: Example
On this tree, find examples of apomorphies, synapomorphies, plesiomorphies, symplesiomorphies and homoplasy. Each mark indicates a change scored for each organism.
[Figure: rooted tree with an outgroup and an ingroup of taxa A, B and C; marks 1 to 6 on its branches indicate character changes.]
1: autapomorphy for the outgroup
2: symplesiomorphy for all
3 and 4: synapomorphies for A, B and C; symplesiomorphies for B and C
5: synapomorphy for B and C
6: autapomorphy for C

Cladistic Terminology: Exercise
Map the characters onto the tree and find examples of apomorphies, synapomorphies, plesiomorphies, symplesiomorphies and homoplasy. What are the outgroup and the ingroup?
Taxa: A lancelet, B fish, C frog, D snake, E parakeet, F Uncle Fred
Characters:
1. Notochord
2. Backbone
3. Four limbs
4. Amniotic development
5. Feathers
6. Diapsid skull
7. Single jaw bone
8. Warm blooded
[Figure: tree of taxa A to F with characters 1 to 8 mapped onto its branches.]

Parsimony analysis consists of two problems:
1. Determining the amount of character change required by any given tree (mapping the characters onto the tree). This is computationally trivial.
2. Searching possible trees for the shortest topology. This is computationally intensive because of the number of possible trees.

A sample problem: what is the most parsimonious tree for these sequences?

    Site  1 2 3 4 5 6 7 8 9 10
    A     A T G G A T T T C G
    B     A T G G C G T T C G
    C     G C G G A G T T C G
    D     G C G G C G T T T G
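The number of possible trees grows as a double factorial: for n labeled taxa there are (2n − 5)!! unrooted and (2n − 3)!! rooted fully bifurcating trees. A short Python sketch of those two formulas (the function names are illustrative):

```python
def double_factorial(k):
    """k * (k - 2) * (k - 4) * ... down to 1; defined as 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def unrooted_tree_count(n):
    """Unrooted, fully bifurcating trees for n >= 3 labeled taxa: (2n - 5)!!"""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n - 5)


def rooted_tree_count(n):
    """Rooted, fully bifurcating trees for n >= 2 labeled taxa: (2n - 3)!!"""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return double_factorial(2 * n - 3)


for n in (4, 10, 20):
    print(n, unrooted_tree_count(n), rooted_tree_count(n))
```

For four taxa this gives 3 unrooted trees (the three enumerated for the sample problem) and 15 rooted ones; for ten taxa there are already 2,027,025 unrooted trees, which is why exhaustive search quickly becomes impractical.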

First, determine which sites are actually parsimony informative. Cladistics is based on synapomorphies, which are the only characters that matter, so a site is parsimony informative if it contains at least two types of nucleotides (or amino acids), at least two of which each occur at least twice. In the sample alignment, only sites 1, 2 and 5 qualify.

Now map one of these characters (site 2) onto an unrooted tree. Note that we must assign states to the ancestral (internal) nodes.
[Figure: site 2 (A = T, B = T, C = C, D = C) mapped onto the unrooted tree ((A,B),(C,D)) with three different T/C assignments to the two internal nodes, requiring 1, 5 and 2 steps respectively.]
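The informative-site rule is easy to check mechanically. The sketch below assumes the alignment is stored as a dict mapping taxon names to equal-length strings; `ALIGNMENT`, `is_parsimony_informative` and `informative_sites` are illustrative names, not part of any package.

```python
from collections import Counter

# Sample alignment from the slides: taxon -> sites 1..10.
ALIGNMENT = {
    "A": "ATGGATTTCG",
    "B": "ATGGCGTTCG",
    "C": "GCGGAGTTCG",
    "D": "GCGGCGTTTG",
}


def is_parsimony_informative(column):
    """True if at least two different states each occur in at least two taxa."""
    counts = Counter(column)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def informative_sites(alignment):
    """Return the 1-based positions of parsimony-informative sites."""
    seqs = list(alignment.values())
    return [
        i + 1
        for i in range(len(seqs[0]))
        if is_parsimony_informative([seq[i] for seq in seqs])
    ]


print(informative_sites(ALIGNMENT))
```

Sites 6 (T G G G) and 9 (C C C T) are variable but fail the rule, because the odd state out occurs only once: such a singleton costs one step on every tree.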

Mapping the changes must be done for all parsimony-informative sites. On the tree ((A,B),(C,D)), site 1 requires 1 step, site 2 requires 1 step, and site 5 requires 2 steps, with two equally parsimonious reconstructions (both internal nodes A, or both C).
[Figure: sites 1, 2 and 5 mapped onto the tree ((A,B),(C,D)).]

Mapping should also be done for all other sites: sites 3, 4, 7, 8 and 10 require 0 steps, and sites 6 and 9 require 1 step each. Mapping should also be done for all possible trees.
[Figure: sites 6 and 9 mapped onto the tree ((A,B),(C,D)).]
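Mapping one site onto a fixed tree is the computationally trivial half of the problem. One standard way to do it for unordered characters with equal costs is Fitch's (1971) algorithm: working from the tips toward a root, each internal node takes the intersection of its children's state sets if that is non-empty, otherwise their union plus one step. A minimal sketch, assuming trees are written as nested Python tuples and the unrooted tree ((A,B),(C,D)) is rooted on its internal branch (the Fitch count does not depend on where the root is placed):

```python
# Sample alignment from the slides: taxon -> sites 1..10.
ALIGNMENT = {
    "A": "ATGGATTTCG",
    "B": "ATGGCGTTCG",
    "C": "GCGGAGTTCG",
    "D": "GCGGCGTTTG",
}

# Unrooted ((A,B),(C,D)), rooted on its internal branch.
TREE = (("A", "B"), ("C", "D"))


def fitch(tree, states):
    """Fitch small-parsimony pass for one site.

    tree   -- a taxon name (leaf) or a (left, right) tuple of subtrees
    states -- dict mapping each taxon to its state at this site
    Returns (candidate state set at this node, minimum changes in the subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can agree, so no extra change is needed here.
        return common, left_cost + right_cost
    # The children disagree: one change on one of the two child branches.
    return left_set | right_set, left_cost + right_cost + 1


def site_states(alignment, site):
    """States of every taxon at a 1-based site number."""
    return {taxon: seq[site - 1] for taxon, seq in alignment.items()}


for site in range(1, 11):
    root_set, steps = fitch(TREE, site_states(ALIGNMENT, site))
    print(f"site {site}: {steps} step(s), root state(s) {sorted(root_set)}")
```

The per-site counts are 1, 1, 0, 0, 2, 1, 0, 0, 1, 0, matching the mappings above, and the root set for site 5 is {A, C}, the two equally parsimonious reconstructions.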

There are three possible unrooted trees for four taxa:
((A,B),(C,D))    ((A,D),(C,B))    ((A,C),(B,D))

Evaluate each possible tree at every site to find the smallest total number of changes it requires. Note that sites 3, 4, 6, 7, 8, 9 and 10 require the same number of changes on every tree, so they are parsimony uninformative. What about site 5?

    Tree \ Site     1  2  3  4  5  6  7  8  9  10  Total
    ((A,B),(C,D))   1  1  0  0  2  1  0  0  1  0     6
    ((A,D),(C,B))   2  2  0  0  2  1  0  0  1  0     8
    ((A,C),(B,D))   2  2  0  0  1  1  0  0  1  0     7

The most parsimonious tree is ((A,B),(C,D)), with 6 steps.
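The table can be reproduced by summing Fitch counts over all sites for each of the three trees and keeping the shortest, i.e. an exhaustive search, which is only feasible for a handful of taxa. A self-contained sketch (names are illustrative):

```python
# Sample alignment from the slides: taxon -> sites 1..10.
ALIGNMENT = {
    "A": "ATGGATTTCG",
    "B": "ATGGCGTTCG",
    "C": "GCGGAGTTCG",
    "D": "GCGGCGTTTG",
}

# The three unrooted trees for four taxa, each rooted on its internal branch.
TREES = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,D),(C,B))": (("A", "D"), ("C", "B")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
}


def fitch(tree, states):
    """Return (state set, minimum changes) for one site under Fitch parsimony."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, alignment):
    """Total Fitch changes over every site of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


lengths = {name: tree_length(tree, ALIGNMENT) for name, tree in TREES.items()}
print(lengths)
best = min(lengths, key=lengths.get)
print("most parsimonious:", best, "with", lengths[best], "steps")
```

This prints lengths of 6, 8 and 7, so ((A,B),(C,D)) is the most parsimonious tree.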

In this simple example, all changes were given the same cost in building the tree. However, some nucleotide changes are more likely than others, and thus less informative, while rarer changes are more informative. Sites that change rapidly and are approaching saturation are often less informative than sites that change more slowly. Weighting can therefore give different answers for the same data.

Suppose we weight transversions twice as heavily as transitions. Site 5 (an A/C transversion) now counts twice as much as sites 1 and 2 (A/G and T/C transitions).

    Tree \ Site     1  2  3  4  5  6  7  8  9  10  Total
    ((A,B),(C,D))   1  1  0  0  4  2  0  0  1  0     9
    ((A,D),(C,B))   2  2  0  0  4  2  0  0  1  0    11
    ((A,C),(B,D))   2  2  0  0  2  2  0  0  1  0     9

((A,C),(B,D)) and ((A,B),(C,D)) are now equally parsimonious.
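Unequal costs need a more general small-parsimony algorithm than Fitch's. The sketch below uses Sankoff's dynamic-programming algorithm with an assumed step matrix of 1 for a transition (A/G, C/T) and 2 for a transversion; the cost values and helper names are illustrative assumptions chosen to match the weighting described above.

```python
import math

# Sample alignment from the slides: taxon -> sites 1..10.
ALIGNMENT = {
    "A": "ATGGATTTCG",
    "B": "ATGGCGTTCG",
    "C": "GCGGAGTTCG",
    "D": "GCGGCGTTTG",
}

TREES = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,D),(C,B))": (("A", "D"), ("C", "B")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
}

BASES = "ACGT"
PURINES = {"A", "G"}


def step_cost(x, y, transition=1, transversion=2):
    """Assumed step matrix: 0 for no change, else the transition or transversion cost."""
    if x == y:
        return 0
    if (x in PURINES) == (y in PURINES):
        return transition      # A<->G or C<->T
    return transversion        # purine <-> pyrimidine


def sankoff(tree, states):
    """Return {base: minimum cost of this subtree if its top node has that base}."""
    if isinstance(tree, str):
        return {b: 0 if b == states[tree] else math.inf for b in BASES}
    left = sankoff(tree[0], states)
    right = sankoff(tree[1], states)
    return {
        b: min(left[c] + step_cost(b, c) for c in BASES)
        + min(right[c] + step_cost(b, c) for c in BASES)
        for b in BASES
    }


def weighted_length(tree, alignment):
    """Sum over sites of the cheapest state assignment at the root."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        min(sankoff(tree, {t: s[i] for t, s in alignment.items()}).values())
        for i in range(n_sites)
    )


lengths = {name: weighted_length(tree, ALIGNMENT) for name, tree in TREES.items()}
shortest = min(lengths.values())
print(lengths)
print("equally parsimonious:", [n for n, v in lengths.items() if v == shortest])
```

Both ((A,B),(C,D)) and ((A,C),(B,D)) come out at 9, with ((A,D),(C,B)) at 11. Sites 6 (a T/G transversion) and 9 add the same amount to every tree, so they change the totals but not the ranking.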

Why use parsimony?
Parsimony is easy to understand, makes relatively few assumptions, is well studied mathematically, and is implemented in many useful software packages. There are also more theoretical arguments:
1. Methodologically, parsimony forces us to maximize homologous similarity. This is not necessarily true for other methods.
2. Parsimony is based on an explicit evolutionary assumption: evolutionary change is rare. Most distance methods make no such assumption.


Why not use parsimony?
Parsimony is not statistically consistent: under some scenarios it is possible (even likely) to get the wrong tree. The classic case is long-branch attraction, which is similar to the rate-heterogeneity problem encountered with distance methods.

See tree 1, the "true" tree. For the correct topology to be recovered, there must be more sites supporting the ((A,D),(C,B)) split than either of the other two possible internal splits. When DNA substitution rates are high, the probability that two lineages will convergently evolve the same nucleotide at the same site increases. When this happens, parsimony erroneously interprets the similarity as a synapomorphy (i.e. as having evolved once, in the common ancestor of the two lineages), and we get tree 2. As more data are collected for these four taxa, the problem becomes worse.

How do you solve the problem? Split the long branches, if possible, by adding taxa.
[Figure: tree 1 (the true tree) and tree 2 (the tree recovered by parsimony) for taxa A, B, C and D.]
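Long-branch attraction is easy to reproduce by simulation. This sketch assumes the true tree is ((A,D),(B,C)) with long terminal branches leading to A and C (which branches are long is an assumption, since the figure is not recoverable), and it uses a crude substitution model in which a base changes to one of the other three with equal probability. For four taxa, only sites with the pattern xxyy distinguish the three trees (such a site costs one step on the tree with that split and two on the others), so the split with the most xxyy sites is the most parsimonious tree.

```python
import random

BASES = "ACGT"

# The three possible internal splits for four taxa, written as (pair, pair).
SPLITS = {
    "AD|BC": (("A", "D"), ("B", "C")),   # assumed true split
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),   # groups the two long branches
}


def evolve(state, p_change, rng):
    """With probability p_change, replace state by one of the other three bases."""
    if rng.random() < p_change:
        return rng.choice([b for b in BASES if b != state])
    return state


def simulate_site(p_long, p_short, rng):
    """Evolve one site on the assumed true tree ((A,D),(B,C)).

    Branches to A and C are long (p_long); all others, including the
    internal branch, are short (p_short).
    """
    x = rng.choice(BASES)            # internal node joining A and D
    y = evolve(x, p_short, rng)      # internal node joining B and C
    return {
        "A": evolve(x, p_long, rng),
        "D": evolve(x, p_short, rng),
        "B": evolve(y, p_short, rng),
        "C": evolve(y, p_long, rng),
    }


def split_support(n_sites, p_long=0.6, p_short=0.05, seed=1):
    """Count informative (xxyy) sites supporting each split."""
    rng = random.Random(seed)
    counts = dict.fromkeys(SPLITS, 0)
    for _ in range(n_sites):
        s = simulate_site(p_long, p_short, rng)
        for name, ((a, b), (c, d)) in SPLITS.items():
            if s[a] == s[b] and s[c] == s[d] and s[a] != s[c]:
                counts[name] += 1
    return counts


for n in (100, 1_000, 10_000):
    print(n, split_support(n))
```

With these assumed settings the A+C split collects far more supporting sites than the true A+D split, and the gap widens as `n_sites` grows, which is the inconsistency described above. Setting `p_long` equal to `p_short` should let the true split win again.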

Versions of parsimony
Fitch parsimony: no limitations on permissible character changes; reversible, P(A→T) = P(T→A).
Wagner parsimony: allows ordered transformations (e.g. to get from C to G, you must proceed through A); reversible.
Dollo parsimony: considers characters such as restriction sites, where P(0→1) ≠ P(1→0). Limited non-reversibility: a derived state arises only once, so once lost it cannot be regained. Works really well for mobile element insertion data.
Camin-Sokal parsimony: evolutionary changes are irreversible.
Transversion parsimony: ignores transitions or downweights them severely.
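Under Camin-Sokal parsimony a binary character on a rooted tree leaves no freedom at all: once a lineage gains state 1 it keeps it, so the taxa scored 1 must be covered by whole clades, and the cost is the number of maximal clades in which every taxon is 1. A minimal sketch, assuming state 0 at the root, no missing data, and trees as nested tuples; the tree and character states in the usage lines are hypothetical.

```python
def camin_sokal(tree, derived):
    """Post-order pass for one binary character under Camin-Sokal parsimony.

    tree    -- a taxon name or a (left, right) tuple, rooted (e.g. at the outgroup)
    derived -- set of taxa scored 1 (derived); all others are 0 (ancestral)
    Returns (all_derived, gains):
      all_derived -- True if every taxon in this subtree has state 1, so one
                     gain on the branch above it would explain them all
      gains       -- gains needed strictly inside the subtree (0 when
                     all_derived; the parent then counts that single gain)
    """
    if isinstance(tree, str):
        return tree in derived, 0
    left_all, left_gains = camin_sokal(tree[0], derived)
    right_all, right_gains = camin_sokal(tree[1], derived)
    if left_all and right_all:
        return True, 0
    # This node must be 0 (a 1 here could never revert), so each
    # all-derived child needs its own gain on the branch leading to it.
    return False, left_gains + right_gains + int(left_all) + int(right_all)


def gains_required(tree, derived):
    """Minimum number of 0->1 gains on the whole rooted tree (root state 0)."""
    all_derived, gains = camin_sokal(tree, derived)
    return 1 if all_derived else gains


# Hypothetical rooted tree with D as the outgroup.
TREE = ("D", (("A", "B"), "C"))
print(gains_required(TREE, {"A", "B"}))       # one gain, on the branch to (A, B)
print(gains_required(TREE, {"A", "C"}))       # two independent gains
print(gains_required(TREE, {"A", "B", "C"}))  # one gain, on the ingroup stem
```

Scoring "?" entries would need an extra rule (e.g. treating unknown taxa as whichever state is cheaper), which this sketch deliberately leaves out.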

Optimal tree searching
Although branch lengths do not have to be optimized, finding an optimal tree under parsimony suffers from the same problem as maximum likelihood: there are too many trees to search exhaustively. The same heuristics apply.
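One common heuristic is stepwise addition: build the tree by adding taxa one at a time, each at the branch that gives the shortest tree so far. The greedy result is not guaranteed to be optimal, which is why programs usually follow it with branch swapping (e.g. NNI, SPR or TBR rearrangements). A self-contained sketch that scores trees with Fitch parsimony; the function names and the taxon-addition order are illustrative assumptions.

```python
# Sample alignment from the slides: taxon -> sites 1..10.
ALIGNMENT = {
    "A": "ATGGATTTCG",
    "B": "ATGGCGTTCG",
    "C": "GCGGAGTTCG",
    "D": "GCGGCGTTTG",
}


def fitch(tree, states):
    """Return (state set, minimum changes) for one site under Fitch parsimony."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, alignment):
    """Total Fitch changes; taxa not yet in the tree are simply never looked up."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {t: s[i] for t, s in alignment.items()})[1]
        for i in range(n_sites)
    )


def attach_everywhere(tree, taxon):
    """Yield every tree obtained by attaching taxon to one branch of tree."""
    yield (tree, taxon)                      # on the branch above this node
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in attach_everywhere(left, taxon):
            yield (new_left, right)
        for new_right in attach_everywhere(right, taxon):
            yield (left, new_right)


def stepwise_addition(alignment, order):
    """Greedy stepwise addition; keeps the first shortest tree at each step."""
    tree = (order[0], order[1])
    for taxon in order[2:]:
        tree = min(attach_everywhere(tree, taxon),
                   key=lambda t: tree_length(t, alignment))
    return tree, tree_length(tree, alignment)


tree, length = stepwise_addition(ALIGNMENT, ["A", "B", "C", "D"])
print(tree, length)
```

On the sample alignment, adding A, B, C, D in that order returns a tree of length 6 whose unrooted topology is ((A,B),(C,D)). Fitch length does not depend on where a tree is rooted, so rooted tuples can stand in for unrooted trees here.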

Exercise
Examine the provided data set. It is a mobile element (SINE) based data set, which means the presence of an insertion is always the derived state (the absence of an insertion is the ancestral, plesiomorphic state).
1. How many unrooted and rooted trees are possible?
2. Assume taxon D is the outgroup. Find the most parsimonious tree.
[Table: taxa A to G scored for Insertions 1 to 10; 0 = absent, 1 = present, ? = unknown.]