Part 9 Phylogenetic Trees

Slides:



Advertisements
Similar presentations
Computational Molecular Biology Biochem 218 – BioMedical Informatics Doug Brutlag Professor.
Advertisements

Phylogenetic Tree A Phylogeny (Phylogenetic tree) or Evolutionary tree represents the evolutionary relationships among a set of organisms or groups of.
. Class 9: Phylogenetic Trees. The Tree of Life Evolution u Many theories of evolution u Basic idea: l speciation events lead to creation of different.
Multiple Sequence Alignment & Phylogenetic Trees.
Phylogenetic Analysis 1 Phylogeny (phylo =tribe + genesis)
Based on lectures by C-B Stewart, and by Tal Pupko Phylogenetic Analysis based on two talks, by Caro-Beth Stewart, Ph.D. Department of Biological Sciences.
Phylogenetic Analysis
Phylogenetics - Distance-Based Methods CIS 667 March 11, 2204.
Summer Bioinformatics Workshop 2008 Comparative Genomics and Phylogenetics Chi-Cheng Lin, Ph.D., Professor Department of Computer Science Winona State.
Phylogenetic reconstruction
Phylogenetic trees Sushmita Roy BMI/CS 576 Sep 23 rd, 2014.
Molecular Evolution Revised 29/12/06
Tree Reconstruction.
© Wiley Publishing All Rights Reserved. Phylogeny.
Lecture 7 – Algorithmic Approaches Justification: Any estimate of a phylogenetic tree has a large variance. Therefore, any tree that we can demonstrate.
. Computational Genomics 5a Distance Based Trees Reconstruction (cont.) Modified by Benny Chor, from slides by Shlomo Moran and Ydo Wexler (IIT)
Phylogenetic reconstruction
In addition to maximum parsimony (MP) and likelihood methods, pairwise distance methods form the third large group of methods to infer evolutionary trees.
The Tree of Life From Ernst Haeckel, 1891.
Phylogenetic reconstruction
CISC667, F05, Lec15, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Phylogenetic Trees (II) Distance-based methods.
Bas E. Dutilh Phylogenomics Using complete genomes to determine the phylogeny of species.
. Class 9: Phylogenetic Trees. The Tree of Life D’après Ernst Haeckel, 1891.
Introduction to Molecular Phylogeny Starting point: a set of homologous, aligned DNA or protein sequences Result of the process: a tree describing evolutionary.
Building Phylogenies Parsimony 1. Methods Distance-based Parsimony Maximum likelihood.
Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.
Phylogenetic Analysis. 2 Phylogenetic Analysis Overview Insight into evolutionary relationships Inferring or estimating these evolutionary relationships.
Phylogenetic trees Sushmita Roy BMI/CS 576
Multiple Sequence Alignments and Phylogeny.  Within a protein sequence, some regions will be more conserved than others. As more conserved,
Phylogenetic Analysis. 2 Introduction Intension –Using powerful algorithms to reconstruct the evolutionary history of all know organisms. Phylogenetic.
Terminology of phylogenetic trees
Molecular evidence for endosymbiosis Perform blastp to investigate sequence similarity among domains of life Found yeast nuclear genes exhibit more sequence.
Phylogenetics Alexei Drummond. CS Friday quiz: How many rooted binary trees having 20 labeled terminal nodes are there? (A) (B)
Chapter 26: Phylogeny and the Tree of Life Objectives 1.Identify how phylogenies show evolutionary relationships. 2.Phylogenies are inferred based homologies.
1 Dan Graur Molecular Phylogenetics Molecular phylogenetic approaches: 1. distance-matrix (based on distance measures) 2. character-state.
1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.
Phylogentic Tree Evolution Evolution of organisms is driven by Diversity  Different individuals carry different variants of.
Phylogenetic Analysis. General comments on phylogenetics Phylogenetics is the branch of biology that deals with evolutionary relatedness Uses some measure.
Molecular phylogenetics 1 Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Sections
BINF6201/8201 Molecular phylogenetic methods
Bioinformatics 2011 Molecular Evolution Revised 29/12/06.
Phylogenetic Inference Data Optimality Criteria Algorithms Results Practicalities BIO520 BioinformaticsJim Lund Reading: Ch8.
OUTLINE Phylogeny UPGMA Neighbor Joining Method Phylogeny Understanding life through time, over long periods of past time, the connections between all.
分子演化 Molecular Evolution
Building phylogenetic trees. Contents Phylogeny Phylogenetic trees How to make a phylogenetic tree from pairwise distances  UPGMA method (+ an example)
Introduction to Phylogenetics
Calculating branch lengths from distances. ABC A B C----- a b c.
Algorithms in Computational Biology11Department of Mathematics & Computer Science Algorithms in Computational Biology Building Phylogenetic Trees.
Molecular Phylogeny. 2 Phylogeny is the inference of evolutionary relationships. Traditionally, phylogeny relied on the comparison of morphological features.
Chapter 10 Phylogenetic Basics. Similarities and divergence between biological sequences are often represented by phylogenetic trees Phylogenetics is.
Phylogeny Ch. 7 & 8.
Phylogenetic trees Sushmita Roy BMI/CS 576 Sep 23 rd, 2014.
Applied Bioinformatics Week 8 Jens Allmer. Theory I.
Phylogenetics.
Ayesha M.Khan Spring Phylogenetic Basics 2 One central field in biology is to infer the relation between species. Do they possess a common ancestor?
1 CAP5510 – Bioinformatics Phylogeny Tamer Kahveci CISE Department University of Florida.
Distance-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2010.
Distance-based methods for phylogenetic tree reconstruction Colin Dewey BMI/CS 576 Fall 2015.
CSCE555 Bioinformatics Lecture 13 Phylogenetics II Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page:
Phylogenetic trees. 2 Phylogeny is the inference of evolutionary relationships. Traditionally, phylogeny relied on the comparison of morphological features.
Phylogeny and the Tree of Life
Evolutionary genomics can now be applied beyond ‘model’ organisms
Inferring a phylogeny is an estimation procedure.
Phylogenetic Inference
Clustering methods Tree building methods for distance-based trees
Multiple Alignment and Phylogenetic Trees
The Tree of Life From Ernst Haeckel, 1891.
Phylogenetic Trees.
Phylogeny.
Presentation transcript:

Part 9 Phylogenetic Trees

Phylogeny PHYLOGENY (coined 1866 Haeckel) the line of descent or evolutionary development of any plant or animal species the origin and evolution of a division, group or race of animals or plants

Goals Understand evolutionary history Assist in epidemiology Origin of Europeans Assist in epidemiology of infectious diseases of genetic defects Aid in prediction of function of novel genes Biodiversity studies Understanding microbial ecologies

Mitochondria and Phylogeny Mitochondrial DNA (mtDNA): Extra-nuclear DNA, transmitted through maternal lineage. Allows tracing of a single genetic line 16.5 Kb circular DNA contains genes: coding for 13 proteins, 22 tRNA genes, 2 rRNA genes. mtDNA has a pointwise mutation substitution rate 10 times faster than nuclear DNA: provides a way to infer relationships between closely related individuals

HIV-1 Origins

Which species are the closest living relatives of modern humans? Gorillas Chimpanzees Chimpanzees Bonobos Bonobos Gorillas Orangutans Orangutans Humans 14 15-30 MYA (Million Years Ago) MYA Mitochondrial DNA, most nuclear DNA-encoded genes, and DNA/DNA hybridization all show that bonobos and chimpanzees are related more closely to humans than either are to gorillas. The pre-molecular view was that the great apes (chimpanzees, gorillas and orangutans) formed a clade separate from humans, and that humans diverged from the apes at least 15-30 MYA.

Did the Florida Dentist infect his patients with HIV? Phylogenetic tree of HIV sequences from the DENTIST, his Patients, & Local HIV-infected People: Patient C Patient A Patient G Yes: The HIV sequences from these patients fall within the clade of HIV sequences found in the dentist. Patient B Patient E Patient A DENTIST Local control 2 Local control 3 No Patient F Local control 9 Local control 35 Local control 3 No Patient D From Ou et al. (1992) and Page & Holmes (1998)

Gene Tree vs. Species Tree The evolutionary history of genes reflects that of species that carry them, except if : horizontal transfer = gene transfer between species (e.g. bacteria, mitochondria) Gene duplication : orthology/ paralogy

Orthology / Paralogy ! ancestral GNS gene Homology : two genes speciation duplication Homology : two genes are homologous iff Rodents they have a common ancestor. Primates Orthology : two genes are orthologous iff GNS1 GNS2 they diverged following a speciation event. Paralogy : two genes are paralogous iff they diverged following a duplication event. GNS GNS1 GNS1 GNS2 GNS2 Human Rat Mouse Rat Mouse ! Orthology  functional equivalence

Building Phylogenies: Phenotype Information has problems Can be difficult to observe Bacteria Difficult to compare diverse species Plants, bacteria, animals

Data for Building Phylogenies Characteristics Traits (continuous or discrete) Biomolecular features character state matrix Numerical distance estimates distance matrix

Example of Character-based Phylogeny A character state matrix Taxon c1 c2 c3 c4 c5 c6 A 0 0 0 1 1 0 B 1 1 0 0 0 0 C 0 0 0 1 1 1 D 1 0 1 0 0 0 E 0 0 0 1 0 0 1 1 0 0 0 0 B D E A C c1 c4 c2 c3 c5 c6 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 1 0 1 0 0 0 0 0 0 1 1 1

Different Kinds of Trees Order of evolution Rooted: indicates direction of evolution Unrooted: only reflects the distance Rate of evolution Edge lengths: distance (scaled trees) Molecular clock: constant rate of evolution Unscaled trees

Rooted and Unrooted Trees Most phylogenetic methods produce unrooted trees. This is because they detect differences between sequences, but have no means to orient residue changes relatively to time. Two means to root an unrooted tree : The outgroup method : include in the analysis a group of sequences known a priori to be external to the group under study; the root is by necessity on the branch joining the outgroup to other sequences. Make the molecular clock hypothesis : all lineages are supposed to have evolved with the same speed since divergence from their common ancestor. Root the tree at the midway point between the two most distant taxa in the tree, as determined by branch lengths. The root is at the equidistant point from all tree leaves.

Rooting unrooted trees By outgroup: outgroup A d (A,D) = 10 + 3 + 5 = 18 Midpoint = 18 / 2 = 9 By midpoint or distance: 10 C 3 2 B 2 5 D

Unrooted Tree

Rooted Tree

Universal phylogeny Archaea Bacteria Eucarya deduced from comparison of SSU and LSU rRNA sequences (2508 homologous sites) using Kimura’s 2-parameter distance and the NJ method. The absence of root in this tree is expressed using a circular design. Archaea Bacteria

Tree building Methods Character-based methods Distance-based methods Maximum parsimony Maximum likelihood Distance-based methods UPGMA NJ

Distance Matrix Methods Given a pairwise distace matrix D Produce a tree such that the path distance between leaves i and j (sum of edge weights in the path between i and j) equals dij Optimize the error between d and D Least square error metric: LSQ LSQ(d,D) = Σ Σ (dij – Dij)2 NP-complete Heuristics (usually based on agglomerative (group by group) clustering) UPGMA NJ Both assume additive distances implies that distance is a metric symmetry triangle inequality d(x,y) = 0 iff x = y d(x,y) >= 0

Distance Measures DNA sequences Protein sequences Percent Identities PAM matrix

Example Tree and Additive Matrix c b d 1.5 5.5 1 2.5 3 2 a b c d e A 10 12 8 7 B 4 14 C 6 16 D E There exists a tree with additive distances

Additive Trees from Additive Matrices Verify that the distance matrix is additive Choose a pair of objects, which results in the first path in the tree. Choose a third object and establish the linear equations to let the object branch off the path. Choose a pair of leaves in the tree constructed so far and compute the point at which a newly chosen object is inserted. The new path branches off an existing node in the tree: Do the insertion step once more in the branching path. The new path branches off an edge in the tree: This insertion is finished.

Example A B C D E 2 7 4 6 7 C A A 1 6 C X C B 5 A 1 B 2 D C E 5 E A 1 2 7 4 6 A C 1 B 5 2 D C A C 1 B 2 D E 3 A 1 B 5 2 D E 6 NO!

Approximating Additive Matrices In practice, the distance matrix between molecular sequences will not be additive. An additive tree T whose distance matrix approximates the given one is used. The methods for exact tree reconstruction provide an inventory for heuristics for tree construction based on approximating additive metrics. Heuristics give exact results when operating on additive metrics.

UPGMA Unweighted Pair-Group Method with Arithmetic Mean Sokal and Michener 1958 Agglomerative clustering Ultrametric tree distances from root to all leaves are equal Cluster distances defined as

UPGMA Step 1 combine B and C Choose two clusters with minimum distance and combine them A B C D E 10 12 8 7 4 14 6 16 A B D E C

Updating distance matrices BC D E 11 8 7 5 15 12 A 2 C BC 2 D B Distance of new cluster to nodes in the cluster is half of original distance Distance of new cluster to other clusters is weighted mean of individual distances

UPGMA step 2 combine BC and D 11 8 7 5 15 12 E A 2 C BC 2 D B

Updating distance matrices BCD E 10 7 14 2 .5 C BCD BC 2.5 2 B D

UPGMA step 3 combine A and E AE 3.5 3.5 E A A BCD E 10 7 14 2 .5 C BCD BC 2.5 2 B D

Updating distance matrices AE 3.5 3.5 E A AE BCD 12 2 .5 C BCD BC 2.5 2 B D

UPGMA step 4 combine AE and BCD 3.5 3.5 E A AE BCD 12 2 .5 C BCD BC 2.5 2 B D

UPGMA Result A B C D E 10 12 8 7 4 14 6 16 produced tree AE 3.5 2.5 10 12 8 7 4 14 6 16 3.5 2.5 3.5 E A 2 3.5 .5 C BCD BC 2.5 2 B D produced tree

Actual tree A B C D E 10 12 8 7 4 14 6 16 actual tree AE 5.5 2.5 1.5 E 10 12 8 7 4 14 6 16 5.5 2.5 1.5 E A 3 3 2 C BCD BC 1 1 B D actual tree

Limitations of UPGMA Ultrametric tree Ultrametric distance Path distance from the root to each leaf is the same Ultrametric distance Usual metric conditions d(x,y) <= max[d(x,z), d(y,z)] 2 largest distances in any group of 3 are equal meaning in a tree setting? UPGMA works correctly for ultrametric distances

Neighbor Joining (NJ) Saitou and Nei, 1987 Produces unrooted tree Join clusters that are close to each other and also far from the rest Produces unrooted tree NJ is a fast method, even for hundreds of sequences. The NJ tree is an approximation of the minimum evolution tree (that whose total branch length is minimum). In that sense, the NJ method is very similar to parsimony methods because branch lengths represent substitutions. NJ always finds the correct tree if distances are additive (tree-like). NJ performs well when substitution rates vary among lineages. Thus NJ should find the correct tree if distances are well estimated.

Algorithm Define ui = Σ Dik / (n-2) Iterate until 2 nodes are left measure of average distance from other nodes Iterate until 2 nodes are left choose pair (i,j) with smallest Dij – ui– uj close to each other and far from others merge to a new node (ij) and update distance matrix Dk,(ij) = (Dik+ Djk- Dij)/2 -- consider the tree paths Di,(ij) = (Dij+ ui– uj)/2 -- similarly Dj,(ij) = Dij – Di,(ij) -- similarly delete nodes i and j For the final group (i,j), use Dij as the edge weight. k ≠i

Neighbor-Joining Result AE A B C D E 10 12 8 7 4 14 6 16 5.5 2.5 1.5 E A 3 3 2 C BCD BC 1 1 B D actual tree

WWW Resources PHYLIP : an extensive package of programs for all platforms http://evolution.genetics.washington.edu/phylip.html CLUSTALX : beyond alignment, it also performs NJ PAUP* : a very performing commercial package http://paup.csit.fsu.edu/index.html PHYLO_WIN : a graphical interface, for unix only http://pbil.univ-lyon1.fr/software/phylowin.html MrBayes : Bayesian phylogenetic analysis http://morphbank.ebc.uu.se/mrbayes/ PHYML : fast maximum likelihood tree building http://www.lirmm.fr/~guindon/phyml.html WWW-interface at Institut Pasteur, Paris http://bioweb.pasteur.fr/seqanal/phylogeny Tree drawing NJPLOT (for all platforms) http://pbil.univ-lyon1.fr/software/njplot.html