How to estimate phylogenies? On parsimony, likelihood and probability. Duur Aanen
Overview
Basics
–What is a phylogenetic tree
–Rooted, unrooted, monophyletic group
–Distance methods
–Maximum parsimony
Likelihood methods
–Maximum likelihood
–Bayesian analysis
–Differences
Example: the evolution of fungus-growing termites and their mutualistic fungal symbionts
Basics Phylogeny = evolutionary history of a group. Phylogenetic tree: graphical representation (phylogenetic reconstruction). [Figure: tree with tips A, B and C]
How do we know where the root is? Usually by including an outgroup in the analysis: a species that falls outside the group you study (the ingroup). Examples: ingroup birds, outgroup crocodile; ingroup rodents, outgroup gorilla.
[Figure: alternative trees for rat, hamster and mouse]
Maximum likelihood The best tree is the one that maximizes the likelihood, i.e. the probability of observing the data given that tree. Requires a model of sequence evolution. Calculation takes a lot of computer time.
Example (coin tossing) 10 coins, 1 is biased (p = 0.8 for heads) but we don’t know which one. Experiment: 10 tosses with 1 of the 10 coins: HHHHHHHHHH. Likelihood: Pr[data | hypothesis]. Pr[10H | biased] = 0.8^10 = 0.107. Pr[10H | fair] = 0.5^10 = 0.00098. Maximum likelihood estimate: the coin is biased!
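The two likelihoods on this slide can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function and variable names are illustrative and do not come from the presentation.

```python
def likelihood_all_heads(p_heads: float, n_tosses: int) -> float:
    """Pr[n heads in a row | coin with heads probability p_heads]."""
    return p_heads ** n_tosses


# The two competing hypotheses about the coin that was tossed.
likelihoods = {
    "biased": likelihood_all_heads(0.8, 10),  # 0.8^10
    "fair": likelihood_all_heads(0.5, 10),    # 0.5^10
}

# Maximum likelihood: pick the hypothesis under which the data are most probable.
ml_estimate = max(likelihoods, key=likelihoods.get)

for name, value in likelihoods.items():
    print(f"Pr[10H | {name}] = {value:.5f}")
print("Maximum likelihood estimate:", ml_estimate)
```

Note that the prior knowledge that only 1 coin in 10 is biased plays no role here; maximum likelihood compares the hypotheses on the data alone.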
’The best tree?’ How can we know for sure? –Unlikely that we can decide for sure which tree is the best –Often unlikely that we can even find it
No. of possible rooted trees
no. sequences    no. possible rooted trees
2                1
3                3
4                15
5                105
6                945
7                10,395
8                135,135
9                2,027,025
10               34,459,425
20               8,200,794,532,637,891,559,375
135              about 2.1 × 10^267 (a 268-digit number) !!!
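The slide does not say how these counts are obtained. A standard result, assumed here, is that the number of rooted, strictly bifurcating trees for n labelled sequences is (2n − 3)!! = 3 × 5 × ... × (2n − 3). The sketch below uses that formula; the function name is illustrative.

```python
def num_rooted_trees(n_sequences: int) -> int:
    """Number of rooted bifurcating trees for n labelled tips: (2n - 3)!!"""
    if n_sequences < 2:
        raise ValueError("need at least 2 sequences")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (empty product for n = 2).
    for k in range(3, 2 * n_sequences - 2, 2):
        count *= k
    return count


for n in (2, 3, 4, 5, 10, 20):
    print(n, num_rooted_trees(n))

# For 135 sequences the count is far too large to print usefully;
# report its number of digits instead.
print(135, "->", len(str(num_rooted_trees(135))), "digits")
```

Python integers are arbitrary precision, so the exact value for 135 sequences is computed without overflow. The point of the table is that exhaustively scoring every tree quickly becomes impossible.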
Phylogeny as a statistical problem... Many possible trees, with different likelihoods Estimate the probability distribution of trees
Bayesian methods... Bayes’ theorem: Posterior probability = probability of a hypothesis given the data For trees: probability of a tree given the DNA sequences and model of sequence evolution (likelihood is probability of data given a hypothesis, or of the DNA sequences given the tree)
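Written out for trees, Bayes' theorem takes the form below. The symbols are assumptions introduced here for illustration: T is a tree, D the DNA sequence data, and the sum runs over all possible trees T'. Branch lengths and model parameters are suppressed for brevity.

```latex
\[
\Pr[T \mid D] \;=\;
\frac{\Pr[D \mid T]\,\Pr[T]}
     {\sum_{T'} \Pr[D \mid T']\,\Pr[T']}
\]
```

Here Pr[D | T] is the likelihood, Pr[T] the prior probability of the tree, and Pr[T | D] the posterior probability.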
Example (coin tossing) 10 coins, 1 is biased (p = 0.8 for heads) but we don’t know which one. Prior probability of biased coin = 0.1. Experiment: 10 tosses with one of the 10 coins: HHHHHHHHHH.
Bayes’ theorem: Posterior probability of biased coin given 10H = Pr[10H | biased] * Pr[biased] / ( Pr[10H | biased] * Pr[biased] + Pr[10H | fair] * Pr[fair] )
= (0.107 * 0.1) / (0.107 * 0.1 + 0.00098 * 0.9)
Posterior probability that coin is biased = 0.92
(Remember: Likelihood: Pr[data | hypothesis], Pr[10H | biased] = 0.8^10 = 0.107, Pr[10H | fair] = 0.5^10 = 0.00098)
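The posterior on this slide can be reproduced directly from Bayes' theorem. The following is a minimal sketch with illustrative names; the only inputs are the numbers given on the slide (prior 0.1, p = 0.8 for the biased coin, 10 heads).

```python
def posterior_biased(prior_biased: float,
                     p_biased: float = 0.8,
                     p_fair: float = 0.5,
                     n_heads: int = 10) -> float:
    """Posterior probability that the tossed coin is the biased one,
    given n_heads heads in a row."""
    lik_biased = p_biased ** n_heads   # Pr[data | biased]
    lik_fair = p_fair ** n_heads       # Pr[data | fair]
    prior_fair = 1.0 - prior_biased
    # Denominator: total probability of the data over both hypotheses.
    evidence = lik_biased * prior_biased + lik_fair * prior_fair
    return lik_biased * prior_biased / evidence


posterior = posterior_biased(prior_biased=0.1)
print(f"Pr[biased | 10H] = {posterior:.2f}")
```

Changing `prior_biased` shows how strongly the answer depends on the prior: with fewer tosses or a smaller prior, the same run of heads is less convincing.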
For trees... Pr[tree | data]
Impossible to calculate exactly, because it involves:
–Tree
–Branch lengths
–Evolutionary model with many parameters
... estimation using Markov Chain Monte Carlo (MCMC) simulation
–Start with a random tree
–Propose new tree by changing current tree
–Accept or reject with some criterion (likelihood and chance implemented in criterion)
–Many generations
–Save sample of trees
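The MCMC steps listed above can be illustrated on the coin example instead of on trees. The sketch below is a toy Metropolis sampler over just two hypotheses ("biased" or "fair"). It is an assumption-laden illustration, not the algorithm of any particular phylogenetics program: real tree samplers also propose changes to topology, branch lengths and model parameters. All names are hypothetical.

```python
import random

PRIOR = {"biased": 0.1, "fair": 0.9}
P_HEADS = {"biased": 0.8, "fair": 0.5}
N_HEADS = 10  # observed data: 10 heads in a row


def unnormalized_posterior(hypothesis: str) -> float:
    """Likelihood times prior; the normalizing constant is never needed."""
    return PRIOR[hypothesis] * P_HEADS[hypothesis] ** N_HEADS


def run_mcmc(n_generations: int, seed: int = 1) -> list:
    rng = random.Random(seed)
    current = rng.choice(["biased", "fair"])  # start at a random state
    samples = []
    for _ in range(n_generations):
        # Propose a new state by changing the current one.
        proposal = "fair" if current == "biased" else "biased"
        # Accept with probability min(1, posterior ratio): likelihood and chance.
        ratio = unnormalized_posterior(proposal) / unnormalized_posterior(current)
        if rng.random() < min(1.0, ratio):
            current = proposal
        samples.append(current)  # save the sample
    return samples


samples = run_mcmc(50_000)
freq_biased = samples.count("biased") / len(samples)
print(f"Fraction of samples in state 'biased': {freq_biased:.3f}")
```

The fraction of generations spent in the "biased" state approximates its posterior probability (about 0.92). In the same way, the fraction of sampled trees that contain a given group approximates the posterior probability of that group.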
Phylogenies Termites: COI (mitochondrial) Fungi: two ribosomal genes (nuclear and mitochondrial) Specific fungal primers developed: use of termite guts as source for fungal DNA Analysis of DNA sequences with Bayesian techniques