Phylogenetics LLO9 Maximum Likelihood and Its Applications

Prepared by Jaya Seelan Sathiya Seelan, PhD

Previously:
  The parsimony criterion
  Bootstrapping
  Models of molecular evolution
  Distance methods
  Model-based distance corrections for distance analyses (e.g., Neighbor-Joining, UPGMA)
Today:
  The likelihood criterion
  More on models of molecular evolution
  Maximum likelihood analysis
  Performance of distance, parsimony, and ML analyses in simulation (Huelsenbeck, 1995)
  Bayesian phylogenetic inference

Parsimony criterion: The “shortest” tree is optimal.
Tree length depends on the step matrix that specifies transformation costs.
Pros: intuitive; analytically tractable; flexible (many different weighting schemes are possible); can combine different kinds of data.
Cons: may be inconsistent in the statistical sense, meaning that as more and more data are accumulated, results can converge on an incorrect solution.
Consider a tree with a short internal branch and asymmetric terminal branches, reflecting unequal rates of evolution. Under parsimony, correct reconstruction of the internal branch requires a character that changed along the internal branch but not on the terminal branches. Such informative characters are unlikely to occur, whereas uninformative or misinformative characters (for example, parallel changes on the two long branches) may be common. Parsimony can be "positively misleading" in cases like these, because the number of misinformative characters grows as the number of characters increases. So parsimony will make you more and more confident in the wrong answer. This tree scenario has been called the "Felsenstein zone", and the phylogenetic artifact is called long-branch attraction (Swofford et al., p. 427, fig. 8).
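Under the simplest step matrix (every change costs one step), tree length can be computed with Fitch's algorithm. The sketch below is a minimal illustration, not code from the slides or from any program. It assumes rooted binary trees written as nested tuples and uses a small hypothetical four-taxon alignment, then scores the three possible topologies.

```python
def fitch_site(tree, states):
    """Return (possible_states, changes) for one site under Fitch parsimony.

    A tree is either a taxon name (str) or a pair (left_subtree, right_subtree).
    Every change costs one step (the simplest, unweighted step matrix).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:                      # children can agree: no extra step needed
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1   # one change needed


def tree_length(tree, alignment):
    """Sum the Fitch changes over all sites; alignment maps taxon -> sequence."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        site = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_site(tree, site)[1]
    return total


# Hypothetical toy alignment, for illustration only.
alignment = {"A": "ACGTA", "B": "ACGTT", "C": "AGGCT", "D": "AGGCA"}
topologies = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}
scores = {name: tree_length(t, alignment) for name, t in topologies.items()}
print(scores)   # {'AB|CD': 4, 'AC|BD': 6, 'AD|BC': 5}
```

The shortest tree, AB|CD with 4 steps, is the one the parsimony criterion would choose. A weighted step matrix would replace the "+1" with a cost lookup (Sankoff's algorithm generalizes Fitch in this way).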

Maximum likelihood methods are explicitly model-based.
The likelihood criterion: the tree that maximizes the likelihood of the observed data is optimal.
L = P(data | tree, model)
Likelihood (L) is the probability of the data (the alignment), given a tree (with topology and branch lengths specified) and a probabilistic model of evolution.
Assumptions (the fine print):
  The tree is correct.
  The probability that a position has a certain state at time 1 depends only on its state at time 0; knowing that it had some state before time 0 is irrelevant. This is called a Markov process.
  Data (individual sites) are independent.
  A uniform evolutionary process operated across the entire tree (why might this be false? endosymbiosis? loss of function?); i.e., the process of evolution is a homogeneous Markov process.
Examples of models…
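To make L = P(data | tree, model) concrete, the smallest possible case is two aligned sequences joined by a single branch of length t under the Jukes-Cantor (JC69) model. This is a minimal sketch under stated assumptions: equal base frequencies (as in JC69), hypothetical sequences, and a simple grid search standing in for a real optimizer. It shows the Markov substitution probabilities and the site-independence assumption (per-site log-likelihoods are summed).

```python
import math


def jc69_pair_loglik(seq1, seq2, t):
    """log P(seq1, seq2 | t, JC69) for a two-taxon tree with branch length t.

    t is measured in expected substitutions per site. Sites are treated as
    independent, so the log-likelihood is a sum over sites.
    """
    e = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * e    # P(end in the same base after time t)
    p_diff = 0.25 - 0.25 * e    # P(end in one particular different base)
    loglik = 0.0
    for a, b in zip(seq1, seq2):
        # 0.25 is the JC69 equilibrium frequency of the starting base.
        loglik += math.log(0.25 * (p_same if a == b else p_diff))
    return loglik


# Hypothetical sequences: 2 differences in 10 sites (p = 0.2).
s1, s2 = "ACGTACGTAC", "ACGTTCGAAC"
grid = [i / 1000 for i in range(1, 3001)]          # t from 0.001 to 3.0
t_hat = max(grid, key=lambda t: jc69_pair_loglik(s1, s2, t))
closed_form = -0.75 * math.log(1 - 4 * 0.2 / 3)   # analytic JC69 ML estimate
print(round(t_hat, 3), round(closed_form, 3))
```

For two sequences the grid maximum agrees, to grid resolution, with the closed-form JC69 estimate -(3/4) ln(1 - 4p/3). Summing logs rather than multiplying site likelihoods avoids numerical underflow on long alignments.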

Two simple models of molecular evolution:
Jukes-Cantor (JC69) one-parameter model: assumes that all transformations between nucleotides occur at the same rate.
Kimura (K80 or K2P) two-parameter model: assumes that transitions and transversions occur at different rates (supported by empirical data).
[Figure: substitution diagrams for JC69 and K2P on the bases A, C, G, T, distinguishing transitions (A↔G, C↔T) from transversions]
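The difference between the two models shows up in their substitution probabilities. A minimal sketch using the standard closed-form K80 probabilities; the names d (branch length in expected substitutions per site) and kappa (transition/transversion rate ratio) are notation assumed for this example. Setting kappa = 1 recovers JC69, which illustrates how the simpler model is a restricted case of the richer one.

```python
import math


def k80_probs(d, kappa):
    """K80 (K2P) substitution probabilities over a branch of length d.

    d: expected substitutions per site; kappa: transition/transversion rate ratio.
    Returns (P(no net change), P(a specific transition), P(a specific transversion)).
    """
    e1 = math.exp(-4.0 * d / (kappa + 2.0))
    e2 = math.exp(-2.0 * d * (kappa + 1.0) / (kappa + 2.0))
    p_same = 0.25 + 0.25 * e1 + 0.5 * e2
    p_ts = 0.25 + 0.25 * e1 - 0.5 * e2
    p_tv = 0.25 - 0.25 * e1            # each of the two possible transversions
    return p_same, p_ts, p_tv


def is_transition(x, y):
    """A<->G (purines) and C<->T (pyrimidines) are transitions."""
    return {x, y} in ({"A", "G"}, {"C", "T"})


# With kappa = 1, K80 reduces to JC69.
d = 0.3
jc_same = 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)
print(k80_probs(d, 1.0)[0], jc_same)   # equal up to floating-point rounding
print(k80_probs(d, 5.0))               # transitions now more likely than transversions
```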

JC69* rate matrix (1 parameter: a)
Models of molecular evolution are based on substitution rate matrices, which can be transformed into substitution probability matrices. Models vary in the number and kinds of parameters used to determine the elements of the rate matrix. In the JC69 rate matrix, rows are the "from" state and columns the "to" state (order A, C, G, T); every off-diagonal rate equals a, and each diagonal element is -3a so that each row sums to zero:
$$Q_{JC69} = \begin{pmatrix} -3a & a & a & a \\ a & -3a & a & a \\ a & a & -3a & a \\ a & a & a & -3a \end{pmatrix}$$
*Jukes, T. H., and C. R. Cantor. Evolution of protein molecules. In H. N. Munro (ed.), Mammalian Protein Metabolism. Academic Press, New York. © Paul Lewis
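The step from a rate matrix Q to a substitution probability matrix is the matrix exponential, P(t) = exp(Qt). A minimal NumPy sketch for the JC69 rate matrix (off-diagonal rate a, diagonal -3a), checked against the closed-form JC69 probabilities. It relies on Q being symmetric, so it would not carry over unchanged to non-symmetric matrices such as GTR with unequal base frequencies; `scipy.linalg.expm` is a general alternative.

```python
import numpy as np


def jc69_rate_matrix(a):
    """JC69 rate matrix Q (rows = 'from', columns = 'to', order A, C, G, T)."""
    q = np.full((4, 4), float(a))
    np.fill_diagonal(q, -3.0 * a)      # each row sums to zero
    return q


def probability_matrix(q, t):
    """P(t) = exp(Q t) via eigendecomposition Q = V diag(w) V^T.

    np.linalg.eigh is valid here only because the JC69 Q is symmetric.
    """
    w, v = np.linalg.eigh(q)
    return v @ np.diag(np.exp(w * t)) @ v.T


a, t = 0.1, 2.0
p = probability_matrix(jc69_rate_matrix(a), t)
# Closed form: 1/4 + 3/4 e^{-4at} on the diagonal, 1/4 - 1/4 e^{-4at} elsewhere.
same = 0.25 + 0.75 * np.exp(-4 * a * t)
diff = 0.25 - 0.25 * np.exp(-4 * a * t)
print(np.round(p, 4))
```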

Generalized time-reversible (GTR) model*: transformation rates are determined by the mean substitution rate (μ), relative rate parameters (a-f, with f fixed at 1), and base frequencies (e.g., πA). 9 free parameters: πA, πC, πG, a, b, c, d, e, μ (πT = 1 - πA - πC - πG). For example, the diagonal element for T is $q_{TT} = -\mu(\pi_A c + \pi_C e + \pi_G f)$. GTR is identical to the JC69 model if a = b = c = d = e = f = 1 and all the base frequencies are set to ¼.
*Lanave, C., G. Preparata, C. Saccone, and G. Serio. A new method for calculating evolutionary substitution rates. Journal of Molecular Evolution 20:86-93. © Paul Lewis
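A GTR rate matrix can be assembled as q_ij = μ · r_ij · π_j for i ≠ j, with each diagonal element set so the row sums to zero. In the minimal sketch below, the assignment of the relative rates a-f to base pairs is an assumption, chosen so that the T row reproduces the diagonal element -μ(πA c + πC e + πG f) given above. The rate and frequency values are hypothetical. The checks confirm zero row sums, time reversibility (π_i q_ij = π_j q_ji), and the JC69 special case.

```python
import numpy as np

BASES = "ACGT"
# Assumed assignment of the relative rates a-f to base pairs; it reproduces the
# diagonal element for T, -mu * (piA*c + piC*e + piG*f).
PAIRS = {("A", "C"): "a", ("A", "G"): "b", ("A", "T"): "c",
         ("C", "G"): "d", ("C", "T"): "e", ("G", "T"): "f"}


def gtr_rate_matrix(rates, freqs, mu=1.0):
    """GTR rate matrix: q_ij = mu * r_ij * pi_j for i != j; rows sum to zero."""
    q = np.zeros((4, 4))
    for (x, y), name in PAIRS.items():
        i, j = BASES.index(x), BASES.index(y)
        q[i, j] = mu * rates[name] * freqs[j]
        q[j, i] = mu * rates[name] * freqs[i]
    np.fill_diagonal(q, -q.sum(axis=1))   # row sums computed before the diagonal is set
    return q


# Hypothetical parameter values, for illustration only.
rates = {"a": 1.0, "b": 4.0, "c": 0.5, "d": 1.2, "e": 3.5, "f": 1.0}
freqs = np.array([0.3, 0.2, 0.2, 0.3])          # piA, piC, piG, piT
q = gtr_rate_matrix(rates, freqs)

flux = np.diag(freqs) @ q                        # pi_i * q_ij
print(np.allclose(flux, flux.T))                 # time reversibility: True
```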

The models discussed so far are interconvertible by adding or restricting parameters.
Swofford et al. p. 434

Elaborations to GTR (and other nucleotide models):
Modifications to accommodate rate heterogeneity:
  Discrete gamma (Γ) model of rate heterogeneity (rates of evolution for each site are distributed according to a "discretized" gamma distribution)
  Proportion of invariant sites (I) model (some sites are assumed to be invariant)
  Combinations of the above, e.g., the GTR + Γ + I model
Molecular clock models:
  Strict clock models (uniform rates of evolution across the phylogeny)
  Relaxed clock models (rates allowed to vary across branches)
    Correlated relaxed clock (rates of adjacent branches are correlated)
    Uncorrelated relaxed clock (rates of adjacent branches are independent)
Other kinds of models used in phylogenetics:
  Amino acid sequence models: Dayhoff (PAM), WAG, etc.
  Codon-based nucleotide models
  Morphological character models
Various methods exist for choosing models of molecular evolution, aiming for a model that is neither underparameterized (poor fit to the observed data) nor overparameterized (poor predictive power; computationally intensive). Criteria: likelihood ratio test; Akaike Information Criterion (AIC); Bayesian Information Criterion (BIC).
Programs by David Posada and colleagues: Modeltest (nucleotides), ProtTest (amino acids).
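The three model-selection criteria reduce to short formulas once each model's maximized log-likelihood is known. A minimal sketch, not the code used by Modeltest or ProtTest: the log-likelihood values below are hypothetical, the LRT is limited to nested models that differ by one free parameter (for example JC69 vs. K80, which adds kappa), and only substitution-model parameters are counted, since branch lengths are shared and cancel out of the comparison.

```python
import math


def aic(lnl, k):
    """Akaike Information Criterion: lower is better."""
    return 2 * k - 2 * lnl


def bic(lnl, k, n):
    """Bayesian Information Criterion; n is usually taken as the number of sites."""
    return k * math.log(n) - 2 * lnl


def lrt_1df(lnl_simple, lnl_complex):
    """Likelihood ratio test for nested models differing by one free parameter.

    Uses the chi-square (1 df) tail probability P(X > x) = erfc(sqrt(x / 2)).
    """
    stat = 2.0 * (lnl_complex - lnl_simple)
    p_value = math.erfc(math.sqrt(stat / 2.0)) if stat > 0 else 1.0
    return stat, p_value


# Hypothetical maximized log-likelihoods on the same tree and 1000 sites.
# JC69 has 0 free rate parameters; K80 adds kappa.
lnl_jc, lnl_k80, n_sites = -2500.0, -2480.0, 1000
stat, p = lrt_1df(lnl_jc, lnl_k80)
print(stat, p < 0.001)                                        # 40.0 True
print(aic(lnl_jc, 0), aic(lnl_k80, 1))                        # 5000.0 4962.0
print(bic(lnl_jc, 0, n_sites) > bic(lnl_k80, 1, n_sites))     # True
```

BIC penalizes each extra parameter by ln(n) rather than 2, so for large alignments it tends to favor simpler models than AIC does.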

Calculating the likelihood of a tree
L = P(datatree, model) Consider a tree, with four species and branch lengths specified, and an aligned dataset For each site, there are four observed tip states and sixteen (4x4) possible combinations of ancestral states Each reconstruction of ancestral states has a probability, based on the transformation probability matrix and branch lengths The likelihood of the tree for the one site is equal to the sum of the probabilities of each reconstruction of ancestral states The likelihood of the entire dataset given the tree is the product of the likelihoods of the individual sites, or the sum of the log likelihoods of the sites This calculation is repeated on multiple trees, and the tree that provides the highest likelihood score is preferred. The process of generating the trees can use exhaustive, branch-and-bound, or heuristic searches (often with starting trees generated with parsimony or distance searches). Calculating the likelihood of a tree Graur and Li Fig 5.19

Likelihood calculations, continued
Brute-force approach: calculate L for all 16 combinations of ancestral states and sum them.

Pruning algorithm* (same result, much less time)
Many calculations can be done just once and then reused many times.
*The pruning algorithm was introduced by Felsenstein, J. Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution 17: © Paul Lewis
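The pruning idea can be sketched as a post-order recursion. Each node stores a vector of four conditional likelihoods, one per possible state, and a parent combines its children's vectors instead of enumerating every ancestral reconstruction. This minimal sketch assumes JC69, hypothetical branch lengths, and trees written as nested lists of (child, branch_length) pairs. It checks that the result matches the explicit 16-term sum for the same four-taxon tree.

```python
import itertools
import math

BASES = "ACGT"


def p_jc(i, j, t):
    """JC69 probability of going from base i to base j along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partials(node, site):
    """Conditional likelihoods L_node(s), s in A, C, G, T (pruning recursion).

    A node is either a taxon name (str) or a list of (child, branch_length) pairs.
    Each child's vector is computed once and reused for all four parent states.
    """
    if isinstance(node, str):
        return [1.0 if s == site[node] else 0.0 for s in BASES]
    result = [1.0] * 4
    for child, t in node:
        child_l = partials(child, site)
        for i, s in enumerate(BASES):
            result[i] *= sum(p_jc(s, c, t) * child_l[k] for k, c in enumerate(BASES))
    return result


def site_likelihood_pruning(tree, site):
    # Combine the root's partials with the JC69 base frequencies (1/4 each).
    return sum(0.25 * x for x in partials(tree, site))


def site_likelihood_brute(site, tA, tB, tC, tD, t_int):
    """Explicit sum over the 16 internal-node state combinations, for comparison."""
    return sum(0.25 * p_jc(u, site["A"], tA) * p_jc(u, site["B"], tB)
               * p_jc(u, v, t_int) * p_jc(v, site["C"], tC) * p_jc(v, site["D"], tD)
               for u, v in itertools.product(BASES, repeat=2))


# Unrooted tree ((A,B),(C,D)) rooted at the node joining A and B; hypothetical lengths.
tree = [("A", 0.1), ("B", 0.2), ([("C", 0.3), ("D", 0.4)], 0.05)]
site = {"A": "A", "B": "A", "C": "G", "D": "G"}
print(site_likelihood_pruning(tree, site),
      site_likelihood_brute(site, 0.1, 0.2, 0.3, 0.4, 0.05))
```

The work per branch is a fixed 4 × 4 computation, so the cost grows linearly with the number of taxa. By contrast, the brute-force sum grows as 4 raised to the number of internal nodes.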

Recent algorithms have dramatically improved the efficiency of ML calculations.
Examples: RAxML by Alexandros Stamatakis; GARLI by Derrick Zwickl.
Nonetheless, ML calculations are still a lot of work. Are they worth it?
John Huelsenbeck performed a classic study testing the accuracy of parsimony, ML, and distance methods (Huelsenbeck, J. 1995. Performance of phylogenetic methods in simulation. Syst. Biol. 44: 17-48).
Simulations: 4-taxon trees with varying branch lengths (including the Felsenstein zone), corresponding to between 1% and 75% change per branch; a total of 1296 (parsimony, distance) or 676 (ML) combinations of branch lengths.
Datasets were "evolved" on each tree under three models: Jukes-Cantor, Kimura 2-parameter (with a 5:1 transition/transversion bias), and JC with among-site rate heterogeneity.
1000 simulated datasets were produced for each of the 1296 trees, except for ML, which used 100 datasets for each of 676 trees.
Phylogenetic analyses were performed for each tree/dataset combination using 26 analytical methods, with 100, 500, 1000, or infinite sites.
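Here is a miniature version of this simulation design. The sketch simulates sites under JC69 on a four-taxon tree whose true topology is AB|CD, then records how often parsimony recovers it. The branch lengths, number of sites, replicate count, and the choice to count ties as failures are all hypothetical; they are not Huelsenbeck's settings, and ML is omitted to keep the example short. For four taxa, only sites of the form xxyy change tree length differently among the three topologies, so the most parsimonious tree is the one supported by the most such sites.

```python
import math
import random

BASES = "ACGT"
SPLITS = {"AB|CD": ("A", "B", "C", "D"), "AC|BD": ("A", "C", "B", "D"),
          "AD|BC": ("A", "D", "B", "C")}


def evolve(state, t, rng):
    """Evolve one base along a branch of length t under JC69."""
    p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    if rng.random() < p_change:
        return rng.choice([b for b in BASES if b != state])
    return state


def simulate_site(lengths, rng):
    """Simulate one site on the true tree AB|CD (keys 'A'-'D' and 'int')."""
    u = rng.choice(BASES)                    # node joining A and B (stationary JC69)
    v = evolve(u, lengths["int"], rng)       # node joining C and D
    return {"A": evolve(u, lengths["A"], rng), "B": evolve(u, lengths["B"], rng),
            "C": evolve(v, lengths["C"], rng), "D": evolve(v, lengths["D"], rng)}


def parsimony_choice(sites):
    """Pick the split supported by the most xxyy sites; ties return None."""
    support = dict.fromkeys(SPLITS, 0)
    for s in sites:
        for name, (w, x, y, z) in SPLITS.items():
            if s[w] == s[x] and s[y] == s[z] and s[w] != s[y]:
                support[name] += 1
    best = max(support.values())
    winners = [n for n, c in support.items() if c == best]
    return winners[0] if len(winners) == 1 else None


def fraction_correct(lengths, n_sites, n_reps, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        sites = [simulate_site(lengths, rng) for _ in range(n_sites)]
        hits += parsimony_choice(sites) == "AB|CD"
    return hits / n_reps


# Hypothetical branch lengths: an "easy" tree and a Felsenstein-zone tree
# (long branches to the non-sister taxa A and C, short internal branch).
easy = {"A": 0.1, "B": 0.1, "C": 0.1, "D": 0.1, "int": 0.1}
fz = {"A": 1.0, "B": 0.05, "C": 1.0, "D": 0.05, "int": 0.02}
print(fraction_correct(easy, 500, 20), fraction_correct(fz, 500, 20))
```

In the Felsenstein-zone setting, parallel changes on the two long branches produce many sites supporting AC|BD. Adding more sites only makes that wrong answer more consistent.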

Results: simple parsimony on data generated under the Jukes-Cantor model. The 95% isocline encloses the region within which the correct tree is estimated 95% of the time. Increasing the amount of data improves accuracy, but only up to a point: in the Felsenstein zone, even infinite data do not allow correct reconstruction of the phylogeny. Parsimony is inconsistent.

Results continued: results for 26 methods on data generated under the Jukes-Cantor model. Black: correct tree recovered 100% of the time. White: correct tree never recovered. Gray: intermediate. White lines: 95% isocline (region within which the correct tree is estimated 95% of the time). Black lines: 33% isocline (the accuracy expected from picking one of the three possible topologies at random).

What does it mean? For all methods, more data are better, but:
  Parsimony is inconsistent, even when character-state weighting is applied.
  Model-based distance corrections improve the performance of the minimum evolution distance method.
  UPGMA performs poorly, perhaps due to its implicit assumption of an equal rate of evolution (a molecular clock).
  ML performs best, but still has trouble in the Felsenstein zone.
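Model-based distance correction can be illustrated with the standard JC69 formula d = -(3/4) ln(1 - 4p/3). It converts an observed proportion of differing sites, p, into an estimate of substitutions per site. A minimal sketch; the slides do not say which corrections each of the 26 methods used, so this is only the simplest example. It shows how the uncorrected p-distance falls further below the true distance as multiple hits accumulate, and how the correction recovers it.

```python
import math


def p_distance(s1, s2):
    """Uncorrected proportion of differing sites."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def jc69_distance(p):
    """JC69-corrected distance (expected substitutions per site).

    Undefined when p >= 3/4, where the sequences look saturated.
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def expected_p(d):
    """Expected p-distance after d substitutions per site under JC69."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


for d in (0.1, 0.5, 1.0, 2.0):
    p = expected_p(d)
    print(f"true d = {d:.1f}   p-distance = {p:.3f}   JC69-corrected = {jc69_distance(p):.3f}")
```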

ML appears to be an accurate method across a wide range of tree space (and model space).
But ML is still computationally intensive. Intuitively, the likelihood of the data, L = P(data | tree, model), is not really what we care about (after all, we have already observed the data). What we really want to know is the probability of the tree given the data: P(tree | data). Bayes' theorem allows estimation of this posterior probability of a tree from its likelihood and a prior probability of the tree:
$$P(\text{tree} \mid \text{data}) = \frac{P(\text{data} \mid \text{tree})\,P(\text{tree})}{P(\text{data})}$$
where the denominator, P(data), is the marginal probability of the data.
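For a handful of candidate topologies, Bayes' theorem can be applied directly: multiply each likelihood by its prior and divide by the sum over all trees (the marginal probability of the data). A minimal sketch under simplifying assumptions: the log-likelihoods are hypothetical, the prior is uniform over the three four-taxon topologies, and branch lengths and model parameters are treated as fixed. Real Bayesian analyses typically integrate over those parameters with Markov chain Monte Carlo.

```python
import math


def posterior(log_likelihoods, priors):
    """P(tree | data) = P(data | tree) P(tree) / sum over trees of the same product."""
    log_joint = {t: ll + math.log(priors[t]) for t, ll in log_likelihoods.items()}
    m = max(log_joint.values())                      # log-sum-exp for numerical stability
    log_marginal = m + math.log(sum(math.exp(v - m) for v in log_joint.values()))
    return {t: math.exp(v - log_marginal) for t, v in log_joint.items()}


# Hypothetical log-likelihoods and a uniform prior, for illustration only.
lnl = {"AB|CD": -1000.0, "AC|BD": -1003.0, "AD|BC": -1005.0}
prior = {t: 1.0 / 3.0 for t in lnl}
post = posterior(lnl, prior)
print({t: round(p, 4) for t, p in post.items()})
```

Working in log space matters: likelihoods such as e^-1000 underflow to zero in floating point, whereas differences between log-likelihoods do not.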

Questions: Now you should be able to compare and contrast the advantages and disadvantages of parsimony and likelihood analysis. Which one is better?

Maximum Parsimony vs. Maximum Likelihood
Maximum Parsimony:
  Chooses the tree that requires the smallest number of character-state changes.
  The score is a measure of the number of evolutionary changes (e.g., A changing to T) that would be required to generate the data given that particular tree.
  Optimality criterion: choose the shortest tree among all contenders (the "most parsimonious" tree).
  Assumption: evolutionary change is rare.
  A simple method whose operation is easily understood; it does not seem to depend on an explicit model of evolution.
  Drawbacks: results can converge on an incorrect solution; branch lengths are underestimated; uninformative or misinformative characters; long-branch attraction.
Maximum Likelihood:
  Chooses the tree that maximizes the probability of the data.
  A parametric statistical method, in that it employs an explicit model of character evolution.
  Depends on the complete specification of the data and of a probability model to describe the data.
  The substitution model should be optimized to fit the observed data.
Both: character-based methods; trees are scored based on a character dataset, and the tree with the best score is selected.
More?

Maximum likelihood analysis using ITS
Use of maximum likelihood analysis: ML analysis using ITS (Seelan et al. 2015).
