1 Additive Distances Between DNA Sequences MPI, June 2012.

2 Additive evolutionary distance: the number of substitutions that occurred during the evolution of the sequences. Some substitutions are hidden because later substitutions overwrite them (for example, a site that changes A→G→A shows no observed difference). Therefore, the exact number of substitutions is usually larger than the number of observed changes. [Figure: sequences ACAC and CCCC, with the substitutions at sites 1, 2 and 3.]
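A toy illustration of hidden substitutions. The per-site substitution histories below are made up for the example (they are not the ones in the slide's figure); the sketch counts every substitution and, separately, only the sites whose first and last bases differ.

```python
# Hypothetical per-site substitution histories (ancestral base first).
histories = [
    ["A", "C"],            # one substitution, visible
    ["C", "G", "T", "A"],  # three substitutions, one visible difference
    ["A", "G", "A"],       # two substitutions that revert: nothing visible
    ["C"],                 # no substitution
]

def count_substitutions(history):
    """Actual number of substitutions that occurred at one site."""
    return len(history) - 1

def observed_difference(history):
    """1 if the site differs between ancestor and descendant, else 0."""
    return int(history[0] != history[-1])

actual = sum(count_substitutions(h) for h in histories)
observed = sum(observed_difference(h) for h in histories)
print(f"actual substitutions: {actual}, observed differences: {observed}")
```

Here 6 substitutions occurred but only 2 sites show a difference, which is why the observed count must be corrected by a substitution model.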

3 Edge weight = expected number of substitutions per site. [Figure: an edge joining u = AACA…GTCTTCGAGGCCC and v = AGCA…GCCTATGCGACCT, weighted by the number of substitutions per site.]

4 When the exact number of substitutions between every pair of sequences is known, NJ (and any other algorithm that reconstructs trees from exact distances) returns the correct evolutionary tree. Interleaf distances are sums of edge weights: the distance d(u,v) between leaves u and v is the sum of the weights of the edges on the path between them. In the figure, d(u,v) = 1.12.
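A minimal sketch of additive (path-sum) interleaf distances, assuming the tree is stored as an adjacency dict. The tree, its internal node names x and y, and its edge weights are hypothetical, chosen only so that d(u,v) = 1.12 as in the figure.

```python
from collections import defaultdict

def build_tree(edges):
    """Undirected weighted tree as an adjacency dict: node -> {neighbor: weight}."""
    adj = defaultdict(dict)
    for a, b, w in edges:
        adj[a][b] = w
        adj[b][a] = w
    return adj

def path_distance(adj, source, target):
    """Sum of edge weights on the unique path between two nodes of a tree."""
    stack = [(source, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == target:
            return dist
        for nbr, w in adj[node].items():
            if nbr != parent:  # in a tree, never stepping back suffices
                stack.append((nbr, node, dist + w))
    raise ValueError(f"{target!r} is not reachable from {source!r}")

# Hypothetical tree: leaves u, v, a, b; internal nodes x, y.
tree = build_tree([
    ("u", "x", 0.30), ("x", "y", 0.50), ("y", "v", 0.32),
    ("x", "a", 0.10), ("y", "b", 0.20),
])
print(round(path_distance(tree, "u", "v"), 2))
```

Because every interleaf distance is a sum of edge weights, the whole distance matrix is determined by the tree, which is what lets exact-distance methods such as NJ invert it.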

5 Estimating the number of substitutions from the observed substitutions requires a substitution model, for example:
- JC [Jukes and Cantor 1969]
- Kimura 2-parameter (K2P) [Kimura 1980]
- HKY [Hasegawa, Kishino and Yano 1985]
- TN [Tamura and Nei 1993]
- GTR: generalised time-reversible [Tavaré 1986]
- …and more…

6 Distance estimation in the Jukes-Cantor model

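7 Jukes-Cantor model: all substitutions are equally likely. The JC generic rate matrix, normalized so that the expected number of substitutions per unit time is 1, is
$$R = \begin{pmatrix} -1 & 1/3 & 1/3 & 1/3 \\ 1/3 & -1 & 1/3 & 1/3 \\ 1/3 & 1/3 & -1 & 1/3 \\ 1/3 & 1/3 & 1/3 & -1 \end{pmatrix}.$$
For an edge (u,v), where $t_{uv}$ is the expected number of substitutions per site along the edge, the rate matrix of the edge is $R_{uv} = t_{uv} R$.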

8 Substitution matrix P (from the theory of Markov processes): for a rate matrix R, $P = e^{R} = \sum_{k \ge 0} R^k / k!$. For the edge (u,v), $P_{uv} = e^{R_{uv}}$.
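The relation P = e^{tR} can be checked numerically. The sketch below (plain Python; all helper names are hypothetical) builds the normalized JC rate matrix, exponentiates it with a truncated Taylor series, and compares the result with the standard JC closed form P_ii(t) = 1/4 + 3/4 e^{-4t/3}, P_ij(t) = 1/4 - 1/4 e^{-4t/3}. The truncated series is chosen only for transparency; a library routine such as `scipy.linalg.expm` is the robust choice in practice.

```python
import math

def jc_rate_matrix():
    """JC rate matrix scaled to one expected substitution per unit time:
    off-diagonal rates 1/3, diagonal -1, so every row sums to 0."""
    return [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(m, terms=40):
    """exp(M) via the truncated Taylor series sum_{k<terms} M^k / k!.
    Adequate for the small, moderate-norm matrices used here."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]                    # M^0 / 0! = I
    for k in range(1, terms):
        term = mat_mul(term, m)                          # M^k / (k-1)!
        term = [[x / k for x in row] for row in term]    # M^k / k!
        result = [[x + y for x, y in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

def jc_substitution_matrix(t):
    """P(t) = exp(t R) for the scaled JC rate matrix R."""
    scaled = [[t * x for x in row] for row in jc_rate_matrix()]
    return mat_exp(scaled)

def jc_closed_form(t):
    """Standard JC probabilities: (same base, one specific other base)."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay, 0.25 - 0.25 * decay

P = jc_substitution_matrix(0.5)
same, diff = jc_closed_form(0.5)
print(round(P[0][0], 6), round(same, 6))
print(round(P[0][1], 6), round(diff, 6))
```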

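9 JC distance estimation: first estimate the substitution matrix $P_{uv}$ from the observed substitutions between the aligned sequences u = AACA…GTCTTCGAGGCCC and v = AGCA…GCCTATGCGACCT. Under JC this amounts to estimating p, the fraction of sites at which u and v differ.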
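A minimal sketch of this estimation step, assuming the sequences are already aligned and use only A, C, G, T. Under JC the estimated substitution matrix has 1 - p̂ on the diagonal and p̂/3 elsewhere. Only the fully shown tails of u and v are used below; the elided parts ("…") are not reconstructed. Function names are hypothetical.

```python
def observed_difference_fraction(u, v):
    """Fraction of aligned sites at which u and v carry different bases."""
    if len(u) != len(v) or not u:
        raise ValueError("need two non-empty aligned sequences of equal length")
    mismatches = sum(a != b for a, b in zip(u, v))
    return mismatches / len(u)

def estimate_jc_substitution_matrix(u, v):
    """JC estimate of P_uv: 1 - p_hat on the diagonal, p_hat / 3 elsewhere."""
    p_hat = observed_difference_fraction(u, v)
    same, diff = 1.0 - p_hat, p_hat / 3.0
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

# Fully shown tails of the slide's sequences u and v.
u_tail = "GTCTTCGAGGCCC"
v_tail = "GCCTATGCGACCT"
p_hat = observed_difference_fraction(u_tail, v_tail)
P_hat = estimate_jc_substitution_matrix(u_tail, v_tail)
print(f"p_hat = {p_hat:.4f}")  # 6 of 13 sites differ
```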

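10 Estimate t from the estimate of p(t) by “reverse engineering”: under JC, the probability that a site differs between u and v is $p(t) = \frac{3}{4}\left(1 - e^{-4t/3}\right)$. Solving this formula for t gives $\hat{t} = -\frac{3}{4}\ln\left(1 - \frac{4}{3}\hat{p}\right)$.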
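The inversion can be sketched in a few lines of Python (function names hypothetical). The JC distance is only defined for p < 3/4; the sketch raises an error outside that range rather than guessing a value.

```python
import math

def jc_p(t):
    """Expected fraction of differing sites after t substitutions per site (JC)."""
    return 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))

def jc_distance(p):
    """Invert p(t): t = -3/4 ln(1 - 4p/3). Undefined for p >= 3/4."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC distance is undefined for p outside [0, 3/4)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Round trip: estimating t from the exact p(t) recovers t.
for t in (0.01, 0.3, 1.0, 2.5):
    print(t, round(jc_distance(jc_p(t)), 9))
```

With an estimated p̂ in place of the exact p(t), the same function gives the JC distance estimate; its error grows quickly as p̂ approaches 3/4.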

11 Checking the effect of estimation errors on reconstructing quartets

12 Quartet reconstruction = finding the correct split. Quartets are trees with four leaves. They have three possible (fully resolved) topologies, called splits: AB|CD, AC|BD and AD|BC. Distance methods resolve splits by the 4-point method.

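13 The 4-point method. The 4-point condition: if the true split is AC|BD and $w_{sep}$ is the weight of the edge separating {A, C} from {B, D}, then
$$d(A,B) + d(C,D) = d(A,D) + d(B,C) = d(A,C) + d(B,D) + 2w_{sep}.$$
The 4-point condition for estimated distances: choose the split AC|BD when
$$\hat{d}(A,C) + \hat{d}(B,D) < \min\{\hat{d}(A,B) + \hat{d}(C,D),\; \hat{d}(A,D) + \hat{d}(B,C)\}.$$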
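A minimal sketch of the 4-point method: compute the three pair-sums and return the split with the smallest one. The distance dict keyed by `frozenset` pairs and the example quartet (pendant edges 0.1, 0.3, 0.2, 0.4 and w_sep = 0.05) are hypothetical. Ties are broken by dictionary order, an arbitrary choice.

```python
def four_point_split(d):
    """Return the split ("AB|CD", "AC|BD" or "AD|BC") with the smallest pair-sum.

    d maps frozenset({x, y}) -> distance for the leaves A, B, C, D.
    """
    def dist(x, y):
        return d[frozenset((x, y))]

    sums = {
        "AB|CD": dist("A", "B") + dist("C", "D"),
        "AC|BD": dist("A", "C") + dist("B", "D"),
        "AD|BC": dist("A", "D") + dist("B", "C"),
    }
    return min(sums, key=sums.get)

# Hypothetical exact distances on a quartet whose true split is AC|BD.
pendant = {"A": 0.1, "B": 0.3, "C": 0.2, "D": 0.4}
w_sep = 0.05
side = {"A": 0, "C": 0, "B": 1, "D": 1}
d_exact = {
    frozenset((x, y)): pendant[x] + pendant[y] + (w_sep if side[x] != side[y] else 0.0)
    for x in "ABCD" for y in "ABCD" if x < y
}
print(four_point_split(d_exact))  # → AC|BD
```

With exact distances the two larger sums are equal and exceed the smallest by 2w_sep (here 0.1); with estimated distances, noise larger than about w_sep can make a wrong split win, which is what the simulation measures.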

14 Evaluate the accuracy of reconstructing quartets using evolutionary distances. [Figure: the model quartet, rooted, with leaves A, B, C, D.] Here t is “evolutionary time”; the diameter of the quartet (its largest leaf-to-leaf distance) is 22t.

15 Phase A: simulate evolution along the model quartet, producing sequences at the leaves A, B, C, D.
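One way to simulate evolution along a single edge under JC, sketched in Python with hypothetical function names: each site ends in a different base with probability p(t) = 3/4 (1 - e^{-4t/3}), chosen uniformly among the three other bases. This draws the end state directly from P(t) rather than simulating individual substitution events; the root sequence is drawn from the uniform (stationary) distribution.

```python
import math
import random

BASES = "ACGT"

def random_sequence(n, rng):
    """Root sequence drawn from the JC stationary distribution (uniform)."""
    return "".join(rng.choice(BASES) for _ in range(n))

def evolve_jc(seq, t, rng):
    """Evolve seq along an edge of length t (expected substitutions per site)."""
    p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

rng = random.Random(0)
root = random_sequence(20000, rng)
child = evolve_jc(root, 0.3, rng)
observed = sum(a != b for a, b in zip(root, child)) / len(root)
print(f"observed p = {observed:.3f}, JC p(0.3) = {0.75 * (1 - math.exp(-0.4)):.3f}")
```

Applying `evolve_jc` edge by edge from the root down gives the leaf sequences of a quartet.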

16 Phase B: reconstruct the split by the 4-point condition. Compute the distances between the sequences at the leaves A, B, C, D, apply the 4-point condition, and check whether the reconstruction is correct. Repeat this process 10,000 times and count the number of failures.
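A self-contained sketch of the whole experiment (Phase A followed by Phase B), assuming JC evolution and JC distances. The rooted quartet ((A,B)x,(C,D)y) and its branch lengths are hypothetical; they do not reproduce the slides' model quartet, and the default of 200 repetitions is far fewer than the 10,000 used in the slides. A saturated pair (p̂ ≥ 3/4) gets an infinite distance.

```python
import math
import random

BASES = "ACGT"

def evolve_jc(seq, t, rng):
    """Draw each site's end state from the JC substitution matrix P(t)."""
    p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    return "".join(
        rng.choice([b for b in BASES if b != s]) if rng.random() < p_change else s
        for s in seq
    )

def jc_distance(u, v):
    """JC distance estimate; infinite when the pair is saturated."""
    p = sum(a != b for a, b in zip(u, v)) / len(u)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def four_point_split(d):
    """Split with the smallest pair-sum; d is keyed by sorted leaf pairs."""
    sums = {
        "AB|CD": d["A", "B"] + d["C", "D"],
        "AC|BD": d["A", "C"] + d["B", "D"],
        "AD|BC": d["A", "D"] + d["B", "C"],
    }
    return min(sums, key=sums.get)

def simulate_quartet(edges, n_sites, rng):
    """Phase A on the hypothetical rooted quartet ((A,B)x,(C,D)y)root."""
    root = "".join(rng.choice(BASES) for _ in range(n_sites))
    x = evolve_jc(root, edges["x"], rng)
    y = evolve_jc(root, edges["y"], rng)
    return {
        "A": evolve_jc(x, edges["A"], rng),
        "B": evolve_jc(x, edges["B"], rng),
        "C": evolve_jc(y, edges["C"], rng),
        "D": evolve_jc(y, edges["D"], rng),
    }

def failure_rate(edges, n_sites=500, repeats=200, seed=1):
    """Phase B, repeated: fraction of runs where the true split AB|CD is missed."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(repeats):
        seqs = simulate_quartet(edges, n_sites, rng)
        d = {(a, b): jc_distance(seqs[a], seqs[b])
             for a in "ABCD" for b in "ABCD" if a < b}
        if four_point_split(d) != "AB|CD":
            failures += 1
    return failures / repeats

easy = {"x": 0.5, "y": 0.5, "A": 0.05, "B": 0.05, "C": 0.05, "D": 0.05}
hard = {"x": 0.05, "y": 0.05, "A": 1.5, "B": 1.5, "C": 1.5, "D": 1.5}
print(failure_rate(easy), failure_rate(hard, n_sites=100, repeats=50))
```

Repeating `failure_rate` while scaling all branch lengths is one way to trace failure rate against diameter.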

17 This test was applied to the model quartet with various diameters. For each diameter, record the fraction (percentage) of the simulations in which the reconstruction failed (next slide).

18 Performance of K2P distances in resolving quartets with small diameters. [Figure: failure rates for the template quartet.]
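The K2P distances used here can be computed from the fractions of transition (A↔G, C↔T) and transversion differences with the standard formula d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). A minimal Python sketch, assuming aligned sequences over A, C, G, T only (any other character would be counted as a transversion); the function name is hypothetical.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")

def k2p_distance(u, v):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    P = fraction of sites differing by a transition (A<->G or C<->T),
    Q = fraction differing by a transversion.
    """
    if len(u) != len(v) or not u:
        raise ValueError("need two non-empty aligned sequences of equal length")
    transitions = transversions = 0
    for a, b in zip(u, v):
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    n = len(u)
    P, Q = transitions / n, transversions / n
    w1, w2 = 1.0 - 2.0 * P - Q, 1.0 - 2.0 * Q
    if w1 <= 0.0 or w2 <= 0.0:
        return math.inf  # saturated: a log argument is non-positive
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)

u = "A" * 12
v = "G" + "CC" + "A" * 9  # one transition, two transversions
print(round(k2p_distance(u, v), 6))
```

When transitions make up one third of the differences (as JC expects), the K2P formula reduces to the JC distance; the example above is such a case.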

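19 Performance for larger diameters: “site saturation”. As the diameter grows, the fraction of differing sites approaches its saturation value (3/4 under JC), so the distance estimates become highly variable.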
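A rough way to see saturation numerically, assuming JC and a delta-method approximation: since dt/dp = 1/(1 - 4p/3), the variance of the JC distance estimate from n sites is approximately p(1 - p) / (n (1 - 4p/3)^2). The sketch below (hypothetical names, n = 1000 chosen arbitrarily) shows p(t) flattening toward 3/4 while the approximate standard deviation grows rapidly. The approximation itself becomes poor close to saturation.

```python
import math

def jc_p(t):
    """Expected fraction of differing sites after t substitutions per site (JC)."""
    return 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))

def jc_distance_sd(t, n_sites):
    """Approximate SD of the JC distance estimate (delta method)."""
    p = jc_p(t)
    return math.sqrt(p * (1.0 - p) / (n_sites * (1.0 - 4.0 * p / 3.0) ** 2))

for t in (0.1, 0.5, 1.0, 2.0, 4.0):
    print(f"t={t:4.1f}  p(t)={jc_p(t):.4f}  approx SD (n=1000)={jc_distance_sd(t, 1000):.4f}")
```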

20 Repeat this experiment on the Hasegawa tree. Assume the JC model and reconstruct the tree with the NJ algorithm (use any variant of NJ available in MATLAB).
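A compact Python sketch of standard neighbor joining, as a stand-in for the MATLAB variants the slide refers to (it is not any MATLAB routine). The function name, the `frozenset`-keyed distance dict and the Newick-style string output are choices made for this sketch. On exact additive distances NJ recovers the true tree, which the usage below checks on a hypothetical four-leaf tree with split AB|CD. With noisy distances NJ can produce negative branch lengths; the sketch does not correct them.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """Neighbor joining on a distance dict.

    names: leaf labels; dist: dict mapping frozenset({a, b}) -> distance.
    Returns the unrooted tree as a Newick string with branch lengths.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = dict(dist)

    def D(a, b):
        return d[frozenset((a, b))]

    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(D(a, b) for b in nodes if b != a) for a in nodes}
        # Join the pair minimizing Q(a, b) = (n - 2) D(a, b) - r_a - r_b.
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D(*p) - r[p[0]] - r[p[1]])
        la = D(a, b) / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = D(a, b) - la
        u = f"({a}:{la:.4g},{b}:{lb:.4g})"  # new internal node, labeled by its subtree
        for k in nodes:
            if k not in (a, b):
                d[frozenset((u, k))] = (D(a, k) + D(b, k) - D(a, b)) / 2
        nodes = [k for k in nodes if k not in (a, b)] + [u]

    # Three nodes left: join them at a single central node.
    a, b, c = nodes
    la = (D(a, b) + D(a, c) - D(b, c)) / 2
    lb = (D(a, b) + D(b, c) - D(a, c)) / 2
    lc = (D(a, c) + D(b, c) - D(a, b)) / 2
    return f"({a}:{la:.4g},{b}:{lb:.4g},{c}:{lc:.4g});"

# Hypothetical additive distances: pendant edges A..D, internal edge 0.5.
pendant = {"A": 0.1, "B": 0.2, "C": 0.3, "D": 0.4}
internal = 0.5
side = {"A": 0, "B": 0, "C": 1, "D": 1}
dist4 = {frozenset((x, y)): pendant[x] + pendant[y] + (internal if side[x] != side[y] else 0.0)
         for x in "ABCD" for y in "ABCD" if x < y}
print(neighbor_joining(list("ABCD"), dist4))
```

For four leaves the first join is a tie between {A, B} and {C, D}; either choice yields the same unrooted tree.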

21 Hasegawa tree [figure]