Maximum Parsimony Phenetic (distance based) methods are fast and often accurate but discard data and are not based on explicit character states at each.


1 Maximum Parsimony
Phenetic (distance-based) methods are fast and often accurate, but they discard data and are not based on explicit character states at each position. Most (except minimum evolution) are algorithm based: if you follow the algorithm, you will arrive at a single tree.
Other methods are character and/or criterion based: you establish your criterion for the best tree and then explore all possible trees to find the best one according to that criterion.
Parsimony is the principle that, given all possible explanations, the simplest one is the best. Thus, under parsimony, the criterion is that the best tree is the one that requires the fewest evolutionary changes: the shortest tree.
Parsimony is based on cladistics, elucidated by Willi Hennig (1950).

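2 Maximum Parsimony
[Figure: two alternative trees (Tree 1, Tree 2) for the taxa Frog, Bird, Crocodile, Kangaroo, Bat, and Human, with the fit (+/-) of six characters on each tree: 1 amnion, 2 hair, 3 wings, 4 antorbital fenestra, 5 placenta, 6 lactation. Tree length: Tree 1 = 7, Tree 2 = 10.]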

3 Maximum Parsimony
Parsimony is based on cladistics, elucidated by Willi Hennig (1950).
Note that the clustering methods discussed previously don't actually attempt to define the genealogy of the taxa. They merely cluster them according to similarity, based on an evolutionary model.
Cladistics attempts to analyze each character in terms of its genealogy and to reconstruct the genealogy of the sequence as a whole from that analysis.

4 Maximum Parsimony Cladistic Terminology
A. Cladistic terminology. Useful for other methods but critical for parsimony analyses. Cladistic analyses are based on character polarity.
1. Ingroup. Group being studied.
2. Outgroup. Group outside (but not too distantly related to) the one being studied. Used to root the tree (set character state polarities).
3. Plesiomorphy. The ancestral character state. Shared plesiomorphies are symplesiomorphies: uninformative similarity.
4. Apomorphy. A derived character or character state. Shared apomorphies are synapomorphies: informative similarity.
5. Autapomorphy. Uninformative differences unique to particular taxa. They provide no cladistically useful information, but they do provide information on branch length (to produce phylograms).
6. Homoplasy. Uninformative similarity, i.e. similarity due to convergence or parallelism.

5 Maximum Parsimony Cladistic Terminology
On this tree, find examples of: apomorphies, synapomorphies, plesiomorphies, symplesiomorphies, homoplasy.
Each mark indicates a change scored for each organism.
[Figure: tree with an outgroup and an ingroup of taxa A, B, and C, with character changes 1-6 marked on the branches.]
1: autapomorphy for the outgroup
2: symplesiomorphy for all
3 & 4: synapomorphy for A, B, C; symplesiomorphy for B, C
5: synapomorphy for B, C
6: autapomorphy for C

6 Maximum Parsimony Cladistic Terminology
Map the characters onto the tree.
Find examples of apomorphies, synapomorphies, plesiomorphies, symplesiomorphies, and homoplasy.
What are the outgroup and ingroup?
Characters, scored for Taxon A through Taxon F:
1. Notochord
2. Backbone
3. Four limbs
4. Amniotic development
5. Feathers
6. Diapsid skull
7. Single jaw bone
8. Warm blooded
[Table: character scores (1 = present) for Taxa A-F; cell values not preserved.]
[Figure: tree with characters 1-8 marked on the branches. Taxa: A lancelet, B fish, C frog, D snake, E parakeet, F Uncle Fred.]

7 Maximum Parsimony
Parsimony analysis consists of two problems:
1. Determining the amount of character change required by any given tree (mapping the characters onto the tree). Computationally trivial.
2. Searching possible trees for the shortest possible topology. Computationally intensive due to the number of possible trees.
A sample problem: what is the most parsimonious tree for these sequences?
A  A T G G A T T T C G
B  A T G G C G T T C G
C  G C G G A G T T C G
D  G C G G C G T T T G
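The first problem, counting the changes a given tree requires, can be sketched with Fitch's set-based counting for unordered characters. This is a minimal illustration, not the slide author's code: the nested-tuple tree encoding and the function name `fitch` are assumptions, and the unrooted tree is rooted on its internal branch, which should not change the step count when changes are unordered and reversible.

```python
def fitch(tree, states):
    """Return (state_set, steps) for one site on a rooted binary tree.

    tree   -- a leaf name (str) or a 2-tuple of subtrees (hypothetical encoding)
    states -- dict mapping each leaf name to its state at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can share a state, so no change is needed here.
        return common, left_steps + right_steps
    # The children disagree: keep both candidate states and count one change.
    return left_set | right_set, left_steps + right_steps + 1


# Site 2 of the sample alignment: A = T, B = T, C = C, D = C.
site2 = {"A": "T", "B": "T", "C": "C", "D": "C"}
steps_ab_cd = fitch((("A", "B"), ("C", "D")), site2)[1]
steps_ac_bd = fitch((("A", "C"), ("B", "D")), site2)[1]
print(steps_ab_cd, steps_ac_bd)  # prints: 1 2
```

Grouping A with B needs one change at this site, while grouping A with C needs two.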

8 Maximum Parsimony
First, let's determine what is actually parsimony informative in the data set.
Cladistics is based on synapomorphies; they are the only things that matter. Thus, a character is parsimony informative if it contains at least two types of nucleotides (or amino acids), and at least two of them occur with a minimum frequency of two.
Now, let's map one of these characters (#2) onto an unrooted tree. Note that we must assign states to ancestral nodes.
A  A T G G A T T T C G
B  A T G G C G T T C G
C  G C G G A G T T C G
D  G C G G C G T T T G
[Figure: site 2 (T, T, C, C) mapped onto an unrooted four-taxon tree; the alternative ancestral-state assignments shown require 1 step, 5 steps, or 2 steps.]
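The informativeness rule can be checked mechanically. The sketch below applies it to the four sample sequences; the function name `informative_sites` and the dictionary layout are illustrative choices, not part of the original slides.

```python
from collections import Counter

# The sample alignment from the slides, one string per taxon.
alignment = {
    "A": "ATGGATTTCG",
    "B": "ATGGCGTTCG",
    "C": "GCGGAGTTCG",
    "D": "GCGGCGTTTG",
}


def informative_sites(aln):
    """Return the 1-based positions of parsimony-informative sites."""
    sites = []
    for position, column in enumerate(zip(*aln.values()), start=1):
        counts = Counter(column)
        # Informative: at least two states, each seen in at least two taxa.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            sites.append(position)
    return sites


print(informative_sites(alignment))  # prints: [1, 2, 5]
```

Sites 6 and 9 vary, but only one taxon carries the minority state, so they cannot favor one tree over another.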

9 Maximum Parsimony
Mapping the changes must be done for all parsimony informative sites.
A  A T G G A T T T C G
B  A T G G C G T T C G
C  G C G G A G T T C G
D  G C G G C G T T T G
[Figure: the informative sites mapped onto a four-taxon tree (taxa A, C, B, D) with states assigned to the internal nodes; one site is labeled as taking its steps "on two equally parsimonious trees". Site numbers and step counts are not preserved.]

10 Maximum Parsimony
Mapping should also be done for all other sites.
Sites 3, 4, 7, 8, 10: 0 steps.
Mapping should also be done for all possible trees.
A  A T G G A T T T C G
B  A T G G C G T T C G
C  G C G G A G T T C G
D  G C G G C G T T T G
[Figure: the remaining variable sites mapped onto the tree with taxa A, C, B, D. Site 6: 1 step; the label for the second site is not preserved.]

11 Maximum Parsimony
There are three possible unrooted trees for four taxa:
((A,B),(C,D))
((A,D),(C,B))
((A,C),(B,D))
A  A T G G A T T T C G
B  A T G G C G T T C G
C  G C G G A G T T C G
D  G C G G C G T T T G
[Figure: the three unrooted four-taxon trees.]
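The count of three trees for four taxa is one case of a standard result not stated on the slide: for n taxa there are (2n-5)!! unrooted and (2n-3)!! rooted strictly bifurcating trees. The short sketch below, with illustrative function names, shows how quickly these numbers grow.

```python
def double_factorial(k):
    """Product k * (k - 2) * (k - 4) * ... down to 1 (returns 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees for n_taxa >= 3 labeled taxa."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n_taxa - 5)


def rooted_trees(n_taxa):
    """Number of rooted bifurcating trees for n_taxa >= 2 labeled taxa."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    return double_factorial(2 * n_taxa - 3)


for n in (4, 5, 10):
    print(n, unrooted_trees(n), rooted_trees(n))
# prints:
# 4 3 15
# 5 15 105
# 10 2027025 34459425
```

Ten taxa already give over two million unrooted trees, which is why the tree search, not the step counting, dominates the cost of a parsimony analysis.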

12 Maximum Parsimony
Evaluate each possible tree for all sites to determine the smallest total number of changes necessary to generate each one.
Note that sites 3, 4, 6, 7, 8, 9, and 10 require the same number of steps on every tree: they are parsimony uninformative.
What about site 5?
A  A T G G A T T T C G
B  A T G G C G T T C G
C  G C G G A G T T C G
D  G C G G C G T T T G
[Table: steps at sites 1-10 and the total for each tree, ((A,B),(C,D)), ((A,D),(C,B)), and ((A,C),(B,D)); cell values not preserved.]
[Figure: the tree ((A,B),(C,D)).]
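For four taxa the whole evaluation is small enough to do exhaustively. The sketch below is an illustration rather than the slide's own procedure: for each site it tries every pair of states at the two internal nodes, counts changes on the five branches, and sums the per-site minimum for each of the three trees. The names `TREES`, `site_steps`, and `tree_lengths` are assumptions.

```python
from itertools import product

alignment = {
    "A": "ATGGATTTCG",
    "B": "ATGGCGTTCG",
    "C": "GCGGAGTTCG",
    "D": "GCGGCGTTTG",
}

# Each unrooted four-taxon tree is described by its internal split.
TREES = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,D),(C,B))": (("A", "D"), ("C", "B")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
}


def site_steps(split, column):
    """Minimum number of changes for one site on a four-taxon tree."""
    (w, x), (y, z) = split
    return min(
        (column[w] != left) + (column[x] != left)      # two terminal branches
        + (left != right)                              # the internal branch
        + (column[y] != right) + (column[z] != right)  # two terminal branches
        for left, right in product("ACGT", repeat=2)
    )


def tree_lengths(aln):
    """Return {tree name: (per-site steps, total length)}."""
    n_sites = len(next(iter(aln.values())))
    result = {}
    for name, split in TREES.items():
        per_site = [
            site_steps(split, {taxon: seq[i] for taxon, seq in aln.items()})
            for i in range(n_sites)
        ]
        result[name] = (per_site, sum(per_site))
    return result


totals = tree_lengths(alignment)
for name, (per_site, total) in totals.items():
    print(name, per_site, total)
# prints:
# ((A,B),(C,D)) [1, 1, 0, 0, 2, 1, 0, 0, 1, 0] 6
# ((A,D),(C,B)) [2, 2, 0, 0, 2, 1, 0, 0, 1, 0] 8
# ((A,C),(B,D)) [2, 2, 0, 0, 1, 1, 0, 0, 1, 0] 7
```

With all changes weighted equally, ((A,B),(C,D)) is the shortest tree at 6 steps. Site 5 is the only site that favors ((A,C),(B,D)).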

13 Maximum Parsimony
For this simple example, all changes were given the same cost in building the tree.
However, some nucleotide changes are more or less likely and thus more or less informative.
Sites that change rapidly and are approaching saturation are often not as informative as sites that change more slowly.
Weighting can give us different answers for the same data.

14 Maximum Parsimony
Suppose we weight transversions with twice the value of transitions.
Site 5 is now weighted twice as much as sites 1 and 2.
A  A T G G A T T T C G
B  A T G G C G T T C G
C  G C G G A G T T C G
D  G C G G C G T T T G
[Table: weighted steps at sites 1-10 and the total for each tree, ((A,B),(C,D)), ((A,D),(C,B)), and ((A,C),(B,D)); cell values not preserved.]
[Figure: the trees ((A,C),(B,D)) and ((A,B),(C,D)), joined by an equals sign.]
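The effect of this weighting can be sketched with a Sankoff-style dynamic program, a standard way to score trees under unequal change costs. The slides do not say which method was used, so the algorithm choice, the nested-tuple tree encoding, and the function names here are assumptions. Transitions cost 1 and transversions cost 2.

```python
alignment = {
    "A": "ATGGATTTCG",
    "B": "ATGGCGTTCG",
    "C": "GCGGAGTTCG",
    "D": "GCGGCGTTTG",
}

PURINES = {"A", "G"}
INF = float("inf")


def cost(a, b):
    """0 for no change, 1 for a transition, 2 for a transversion."""
    if a == b:
        return 0
    is_transition = (a in PURINES) == (b in PURINES)
    return 1 if is_transition else 2


def sankoff(tree, states):
    """Return {state: minimum cost of the subtree if its root has that state}."""
    if isinstance(tree, str):
        return {s: (0 if s == states[tree] else INF) for s in "ACGT"}
    left = sankoff(tree[0], states)
    right = sankoff(tree[1], states)
    return {
        s: min(cost(s, t) + left[t] for t in "ACGT")
        + min(cost(s, t) + right[t] for t in "ACGT")
        for s in "ACGT"
    }


def weighted_length(tree, aln):
    """Sum over sites of the cheapest weighted reconstruction."""
    n_sites = len(next(iter(aln.values())))
    return sum(
        min(sankoff(tree, {taxon: seq[i] for taxon, seq in aln.items()}).values())
        for i in range(n_sites)
    )


len_ab_cd = weighted_length((("A", "B"), ("C", "D")), alignment)
len_ad_cb = weighted_length((("A", "D"), ("C", "B")), alignment)
len_ac_bd = weighted_length((("A", "C"), ("B", "D")), alignment)
print(len_ab_cd, len_ad_cb, len_ac_bd)  # prints: 9 11 9
```

Under this weighting ((A,B),(C,D)) and ((A,C),(B,D)) tie at 9, so the same data no longer single out one tree.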

15 Maximum Parsimony
Why use parsimony?
Easy to understand.
Makes relatively few assumptions.
Well studied mathematically.
Many useful software packages.
More theoretical arguments:
1. Methodologically, parsimony forces us to maximize homologous similarity. This is not necessarily true for other methods.
2. Parsimony is based on an evolutionary assumption: evolutionary change is rare. This is not true at all for most distance methods.

16 Maximum Parsimony
Why not use parsimony?
Not consistent: under some scenarios it is possible (even likely) to get the wrong tree.
Long-branch attraction: similar to the rate heterogeneity problem encountered with distance methods.
See the tree below. In order for the correct topology to be recovered, there must be more sites supporting the ((A,D),(C,B)) split than either of the other two possible internal splits.
When DNA substitution rates are high, the probability that two lineages will convergently evolve the same nucleotide at the same site increases. When this happens, parsimony erroneously interprets this similarity as a synapomorphy (i.e., as having evolved once in the common ancestor of the two lineages).
[Figure: four-taxon tree with taxa A, B, D, and C.]

17 Maximum Parsimony
Why not use parsimony?
Not consistent: under some scenarios it is possible (even likely) to get the wrong tree.
Long-branch attraction: similar to the rate heterogeneity problem encountered with distance methods.
See tree 1, the 'true' tree. In order for the correct topology to be recovered, there must be more sites supporting the ((A,D),(C,B)) split than either of the other two possible internal splits.
When DNA substitution rates are high, the probability that two lineages will convergently evolve the same nucleotide at the same site increases. When this happens, parsimony erroneously interprets this similarity as a synapomorphy (i.e., as having evolved once in the common ancestor of the two lineages). We get tree 2.
As more data are collected for these four taxa, the problem will become worse.
How do you solve the problem? Split your long branches, if possible, by adding taxa.
[Figure: tree 1 and tree 2, each with taxa A, B, D, and C.]

18 Maximum Parsimony
Versions of parsimony
Fitch parsimony: no limitations on permissible character changes; reversible, P(A->T) = P(T->A).
Wagner parsimony: allows ordered transformations (to get from C to G, you must proceed through A); reversible.
Dollo parsimony: consider restriction site characters, where P(0->1) ≠ P(1->0). Limited non-reversibility: a derived state can be lost, but once lost it cannot be regained. Works really well for mobile element insertion data.
Camin-Sokal parsimony: evolutionary changes are irreversible.
Transversion parsimony: ignores transitions or downweights them severely.
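One way to compare some of these versions is by the cost each assigns to a change between two states. The sketch below is an illustration under stated assumptions: states are coded as integers in their assumed order (0 = ancestral), the function names are hypothetical, and Camin-Sokal is modeled as ordered and irreversible. Dollo parsimony is left out because its rule constrains how often a state may arise across the whole tree, which a pairwise cost does not capture.

```python
import math


def fitch_cost(a, b):
    """Unordered: any change between different states costs one step."""
    return 0 if a == b else 1


def wagner_cost(a, b):
    """Ordered and reversible: pass through every intermediate state."""
    return abs(a - b)


def camin_sokal_cost(a, b):
    """Ordered and irreversible: a reversal (b < a) is forbidden."""
    return b - a if b >= a else math.inf


for a, b in [(0, 1), (0, 2), (2, 0)]:
    print(a, b, fitch_cost(a, b), wagner_cost(a, b), camin_sokal_cost(a, b))
# prints:
# 0 1 1 1 1
# 0 2 1 2 2
# 2 0 1 2 inf
```

Under these assumptions the three versions agree on a single forward step and differ only on multi-state jumps and reversals.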

19 Maximum Parsimony
Optimal tree searching
While branch length is not an issue, finding an optimal tree using parsimony suffers from the same problem as maximum likelihood: there are too many trees to search.
The same heuristics apply.

20

21 Maximum Parsimony
Exercise: examine the provided data set.
It is a mobile element (SINE) based data set. That means the presence of an insertion is always the derived state (the absence of an insertion is the ancestral, plesiomorphic, state).
How many unrooted and rooted trees are possible?
Assume taxon D is the outgroup.
Find the most parsimonious tree.
[Table: Insertions 1-10 scored for taxa A-G (0 = absent, 1 = present, ? = unknown); most cell values not preserved.]

