Phylogenetic Trees Lecture 12


Phylogenetic Trees Lecture 12
Based on pages 160-176 in Durbin et al. (the black textbook). This class has been edited from Nir Friedman's lecture, which was available at www.cs.huji.ac.il/~nir. Pictures from Tal Pupko's slides. Changes by Dan Geiger and Shlomo Moran.

Evolution
Evolution of new organisms is driven by:
Diversity: different individuals carry different variants of the same basic blueprint.
Mutations: the DNA sequence can be changed by single-base changes, deletion/insertion of DNA segments, etc.
Selection bias.

The Tree of Life
Source: Alberts et al.

Tree of life: a better picture
After Ernst Haeckel, 1891

Primate evolution
A phylogeny (also called a phylogenetic tree) is a tree that describes the sequence of speciation events that led to the formation of a set of present-day species.

Morphological vs. Molecular
Classical phylogenetic analysis uses morphological features: number of legs, lengths of legs, etc.
Modern biological methods allow the use of molecular features: gene sequences and protein sequences.
Analysis is based on homologous sequences (e.g., globins) in different species.
Important for many aspects of biology: classification and understanding biological mechanisms.

Morphological topology (based on McKenna and Bell, 1997)
[Figure: tree; the branching structure is not reproduced in this transcript.]
Species shown: Bonobo, Chimpanzee, Man, Gorilla, Sumatran orangutan, Bornean orangutan, Common gibbon, Barbary ape, Baboon, White-fronted capuchin, Slow loris, Tree shrew, Japanese pipistrelle, Long-tailed bat, Jamaican fruit-eating bat, Horseshoe bat, Little red flying fox, Ryukyu flying fox, Mouse, Rat, Vole, Cane-rat, Guinea pig, Squirrel, Dormouse, Rabbit, Pika, Pig, Hippopotamus, Sheep, Cow, Alpaca, Blue whale, Fin whale, Sperm whale, Donkey, Horse, Indian rhino, White rhino, Elephant, Aardvark, Grey seal, Harbor seal, Dog, Cat, Asiatic shrew, Long-clawed shrew, Small Madagascar hedgehog, Hedgehog, Gymnure, Mole, Armadillo, Bandicoot, Wallaroo, Opossum, Platypus.
Groups marked: Archonta, Glires, Ungulata, Carnivora, Insectivora, Xenarthra.

From sequences to a phylogenetic tree
Rat      QEPGGLVVPPTDA
Rabbit   QEPGGMVVPPTDA
Gorilla  QEPGGLVVPPTDA
Cat      REPGGLVVPPTEG
There are many possible types of sequences to use (e.g., mitochondrial vs. nuclear proteins).

Mitochondrial topology (based on Pupko et al.)
[Figure: tree; the branching structure is not reproduced in this transcript.]
Species shown: Donkey, Horse, Indian rhino, White rhino, Grey seal, Harbor seal, Dog, Cat, Blue whale, Fin whale, Sperm whale, Hippopotamus, Sheep, Cow, Alpaca, Pig, Little red flying fox, Ryukyu flying fox, Horseshoe bat, Japanese pipistrelle, Long-tailed bat, Jamaican fruit-eating bat, Asiatic shrew, Long-clawed shrew, Mole, Small Madagascar hedgehog, Aardvark, Elephant, Armadillo, Rabbit, Pika, Tree shrew, Bonobo, Chimpanzee, Man, Gorilla, Sumatran orangutan, Bornean orangutan, Common gibbon, Barbary ape, Baboon, White-fronted capuchin, Slow loris, Squirrel, Dormouse, Cane-rat, Guinea pig, Mouse, Rat, Vole, Hedgehog, Gymnure, Bandicoot, Wallaroo, Opossum, Platypus.
Groups marked: Perissodactyla, Carnivora, Cetartiodactyla, Primates, Chiroptera, Moles+Shrews, Afrotheria, Xenarthra, Lagomorpha + Scandentia, Rodentia 1, Hedgehogs, Rodentia 2.

Nuclear topology (based on Pupko et al. slide; tree by Madsen)
[Figure: tree with four major clades numbered 1-4; the branching structure is not reproduced in this transcript.]
Species shown: Round Eared Bat, Flying Fox, Hedgehog, Mole, Pangolin, Whale, Hippo, Cow, Pig, Cat, Dog, Horse, Rhino, Rat, Capybara, Rabbit, Flying Lemur, Tree Shrew, Human, Galago, Sloth, Hyrax, Dugong, Elephant, Aardvark, Elephant Shrew, Opossum, Kangaroo.
Groups marked: Cetartiodactyla, Afrotheria, Chiroptera, Eulipotyphla, Glires, Xenarthra, Carnivora, Perissodactyla, Scandentia + Dermoptera, Pholidota, Primate.

Theory of Evolution
Basic idea: speciation events lead to the creation of different species.
Speciation is caused by physical separation into groups in which different genetic variants become dominant.
Any two species share a (possibly distant) common ancestor.

Phylogenetic trees
[Figure: tree with leaves Aardvark, Bison, Chimp, Dog, Elephant.]
Leaves: current-day species.
Nodes: hypothetical most recent common ancestors.
Edge lengths: "time" from one speciation to the next.

Dangers in Molecular Phylogenies
Gene and protein sequences can be homologous for various reasons:
Orthologs: sequences that diverged after a speciation event. Indicative of a new species.
Paralogs: sequences that diverged after a duplication event.
Xenologs: sequences that diverged after a horizontal transfer (e.g., by a virus).

Gene Phylogenies
Phylogenies can be constructed to describe the evolution of genes.
[Figure: species phylogeny and gene tree with leaves 1A, 2A, 3A, 3B, 2B, 1B; the speciation events and the gene duplication are marked.]
Three species, termed 1, 2, 3. Two paralogous genes, A and B.

Dangers of Paralogs
If we happen to consider only sequences 1A, 2B, and 3A, we get a wrong tree that does not represent the phylogeny of the host species of the given sequences, because duplication does not create new species.
[Figure: gene tree with leaves 1A, 2A, 3A, 3B, 2B, 1B; the gene duplication and the speciation events are marked.]
In the sequel we assume all given sequences are orthologs.

Types of Trees
A natural model to consider is that of rooted trees.
[Figure: rooted tree with the root labeled "Common Ancestor".]

Types of trees
An unrooted tree represents a phylogeny without the root node.
Depending on the model, data from current-day species do not distinguish between different placements of the root.
In this example there are seven possible ways to place a root.
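As a quick check of that count, here is a short worked calculation. It assumes (the figure is not reproduced in this transcript) that the example is an unrooted binary tree on n = 5 leaves and that a root may be placed on any edge. Such a tree has n leaves and n - 2 internal nodes, so:

```latex
\[
\#\text{edges} = \underbrace{n + (n-2)}_{\text{nodes}} - 1 = 2n - 3,
\qquad n = 5 \;\Rightarrow\; 2\cdot 5 - 3 = 7 .
\]
```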

Rooted versus unrooted trees
[Figure: rooted trees a, b, and c, and a single unrooted tree with positions a, b, c marked.]
The unrooted tree represents the three rooted trees.
Slide by Tal Pupko

Positioning Roots in Unrooted Trees
We can estimate the position of the root by introducing an outgroup: a set of species that are definitely distant from all the species of interest.
[Figure: tree with leaves Falcon, Aardvark, Bison, Chimp, Dog, Elephant; the proposed root is marked.]

Type of Data
Distance-based: the input is a matrix of distances between species. A distance can be the fraction of residues on which two sequences disagree, an alignment score between them, or …
Character-based: examine each character (e.g., residue) separately.
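To make the distance-based input concrete, the following minimal Python sketch builds a distance matrix from the four aligned sequences on the earlier "From sequences to a phylogenetic tree" slide. It uses the fraction of differing residues as the distance. The function names `p_distance` and `distance_matrix` are illustrative choices, not part of the lecture.

```python
from itertools import combinations

# Aligned sequences from the "From sequences to a phylogenetic tree" slide.
seqs = {
    "Rat":     "QEPGGLVVPPTDA",
    "Rabbit":  "QEPGGMVVPPTDA",
    "Gorilla": "QEPGGLVVPPTDA",
    "Cat":     "REPGGLVVPPTEG",
}

def p_distance(a, b):
    """Fraction of aligned positions at which the two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def distance_matrix(seqs):
    """Symmetric dict-of-dicts of pairwise distances, with a zero diagonal."""
    d = {s: {t: 0.0 for t in seqs} for s in seqs}
    for s, t in combinations(seqs, 2):
        d[s][t] = d[t][s] = p_distance(seqs[s], seqs[t])
    return d

dist = distance_matrix(seqs)
for name, row in dist.items():
    print(name.ljust(8), " ".join(f"{v:.3f}" for v in row.values()))
```

With these sequences, Rat and Gorilla are identical (distance 0), Rat and Rabbit differ at 1 of 13 positions, and Rat and Cat at 3 of 13.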

Three Methods of Tree Construction
Distance: a tree built by recursively combining the two nodes at the smallest distance.
Parsimony: a tree with the minimum total number of character changes between nodes.
Maximum likelihood: finding the best Bayesian network of a tree shape. The method of choice nowadays. The well-known PHYLIP software package includes this method (alongside distance and parsimony programs). http://evolution.genetics.washington.edu/phylip.html

Distance-Based (1st type of method)
Input: a distance matrix between species.
Outline: cluster species together. Initially, clusters are singletons. At each iteration, combine the two "closest" clusters to get a new one.

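UPGMA Clustering
Let $C_i$ and $C_j$ be clusters, and define the distance between them to be the average distance between their members:
$$d(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i} \sum_{q \in C_j} d(p, q)$$
When we combine two clusters, $C_i$ and $C_j$, to form a new cluster $C_k$, then for every other cluster $C_l$:
$$d(C_k, C_l) = \frac{|C_i|\, d(C_i, C_l) + |C_j|\, d(C_j, C_l)}{|C_i| + |C_j|}$$
Define a node $k$ whose daughters are the nodes of $C_i$ and $C_j$, and place it at height $d(C_i, C_j)/2$ above the leaves.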
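The clustering loop can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code. It assumes the standard UPGMA update, in which the distance from a merged cluster to any other cluster is the size-weighted average of the two old distances. The nested-tuple tree representation and the example matrix are hypothetical.

```python
from itertools import combinations

def upgma(names, d):
    """UPGMA on leaves `names`; d[a][b] is given for a before b in `names`.

    Returns (tree, height): tree is a leaf name or a nested pair
    (left, right); height is the height of the root above the leaves.
    """
    # Each cluster id maps to (subtree, number of leaves, height).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Cluster-to-cluster distances, keyed by the unordered pair of ids.
    dist = {frozenset((i, j)): d[names[i]][names[j]]
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)          # the two closest clusters
        i, j = sorted(pair)
        (ti, ni, _), (tj, nj, _) = clusters.pop(i), clusters.pop(j)
        dij = dist.pop(pair)
        # Size-weighted average distance from the merged cluster to the rest.
        for l in clusters:
            dil = dist.pop(frozenset((i, l)))
            djl = dist.pop(frozenset((j, l)))
            dist[frozenset((next_id, l))] = (ni * dil + nj * djl) / (ni + nj)
        # The new node sits at height d(Ci, Cj) / 2.
        clusters[next_id] = ((ti, tj), ni + nj, dij / 2)
        next_id += 1
    tree, _, height = next(iter(clusters.values()))
    return tree, height

# Hypothetical clock-like distances: (A,B) join at height 1, (C,D) at 2, root at 3.
names = ["A", "B", "C", "D"]
d = {"A": {"B": 2, "C": 6, "D": 6}, "B": {"C": 6, "D": 6}, "C": {"D": 4}}
tree, height = upgma(names, d)
print(tree, height)  # (('A', 'B'), ('C', 'D')) 3.0
```

Ties between equally close pairs are broken by dictionary order here; other implementations may choose differently.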

Example
UPGMA construction on five objects. The length of an edge equals its (vertical) height.
[Figure: dendrogram on leaves 1-5 with internal nodes 6-9; the heights 0.5d(2,3) and 0.5d(7,8) are marked.]

Molecular clock
This phylogenetic tree has all leaves at the same level. When this property holds, the phylogenetic tree is said to satisfy a molecular clock. Namely, the time from a speciation event to the formation of the current species is identical for all paths (an assumption that does not hold in reality).

Molecular Clock
UPGMA constructs trees that satisfy a molecular clock, even if the true tree does not satisfy a molecular clock.
[Figure: a tree on leaves 1-4 that does not satisfy a molecular clock, and the tree that UPGMA produces.]

Restrictive Correctness of UPGMA
Proposition: If the distance function is derived by adding edge distances in a tree T with a molecular clock, then UPGMA will reconstruct T.
Proof idea: Move a horizontal line from the bottom of T to the top. Whenever the line reaches an internal node, the algorithm creates that node.

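Additivity
A molecular clock defines additive distances, namely, distances between objects that can be realized by a tree. For example, for three leaves $i$, $j$, $k$ joined at a common node by edges of lengths $a$, $b$, $c$:
$$d(i,j) = a + b, \quad d(i,k) = a + c, \quad d(j,k) = b + c$$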

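Basic property of Additivity
Suppose the input distances are additive. For any three leaves $i$, $j$, $m$, let $k$ be the node where the paths between them meet, with $a = d(i,k)$, $b = d(j,k)$, $c = d(k,m)$:
$$d(i,m) = a + c, \quad d(j,m) = b + c, \quad d(i,j) = a + b$$
Thus
$$d(k,m) = \tfrac{1}{2}\,\bigl(d(i,m) + d(j,m) - d(i,j)\bigr)$$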
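A worked numeric instance of this three-leaf property may help. The edge lengths below are hypothetical: $k$ is the node where the paths between $i$, $j$, and $m$ meet, with $d(i,k) = 2$, $d(j,k) = 3$, and $d(k,m) = 4$.

```latex
\begin{align*}
d(i,m) &= 2 + 4 = 6, \qquad d(j,m) = 3 + 4 = 7, \qquad d(i,j) = 2 + 3 = 5,\\
d(k,m) &= \tfrac{1}{2}\bigl(d(i,m) + d(j,m) - d(i,j)\bigr)
        = \tfrac{1}{2}(6 + 7 - 5) = 4 .
\end{align*}
```

The distance from the internal node $k$ to $m$ is recovered from leaf-to-leaf distances alone.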

Constructing additive trees: the neighbor finding problem
Can we use this fact to construct trees assuming only additivity (but not a molecular clock)? Yes. The formula for d(k,m) shows that if we knew that i and j are neighboring leaves, then we could construct their parent node k and compute the distances from k to all other leaves m. We then remove nodes i,j and add k.

Neighbor Finding
How can we tell from distances alone that a pair of nodes i,j are neighboring leaves? The closest nodes aren't necessarily neighbors.
[Figure: a tree on leaves A, B, C, D.]
Next we show one way to find neighbors from additive distances.

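Neighbor Finding
Definitions (as used in the proof): let $L$ be the number of leaves, $r_i = \sum_{k} d(i,k)$ the sum of the distances from leaf $i$ to all other leaves, and $D(i,j) = (L-2)\,d(i,j) - (r_i + r_j)$.
Theorem (Saitou & Nei): Assume all edge weights are positive. If $D(i,j)$ is minimal (among all pairs of leaves), then $i$ and $j$ are neighboring leaves in the tree.
[Figure: leaves i, j, m, internal nodes k, l, and subtrees T1, T2.]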
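A small numeric check illustrates the theorem. It takes D(i,j) = (L-2)d(i,j) - (r_i + r_j), with r_i the sum of distances from leaf i to all other leaves, which is how D is expanded in the proof's "{Definition}" step. The four-leaf tree below is a hypothetical example: A and B are neighbors, as are C and D, with edge lengths 1 (A), 4 (B), 1 (internal edge), 1 (C), 4 (D). The closest pair under d is A and C, yet D is minimized exactly by the true neighbor pairs.

```python
from itertools import combinations

# Path lengths in the hypothetical tree ((A:1, B:4) :1 (C:1, D:4)).
pairs = {("A", "B"): 5, ("A", "C"): 3, ("A", "D"): 6,
         ("B", "C"): 6, ("B", "D"): 9, ("C", "D"): 5}

# Symmetric dict-of-dicts view of the same distances.
d = {}
for (i, j), v in pairs.items():
    d.setdefault(i, {})[j] = v
    d.setdefault(j, {})[i] = v

def nj_criterion(d):
    """D[(i, j)] = (L - 2) * d(i, j) - (r_i + r_j) for all leaf pairs."""
    leaves = sorted(d)
    L = len(leaves)
    r = {i: sum(d[i].values()) for i in leaves}
    return {(i, j): (L - 2) * d[i][j] - (r[i] + r[j])
            for i, j in combinations(leaves, 2)}

closest = min(pairs, key=pairs.get)
D = nj_criterion(d)
best = min(D.values())
neighbors = sorted(p for p, v in D.items() if v == best)

print(closest)    # ('A', 'C')
print(neighbors)  # [('A', 'B'), ('C', 'D')]
```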

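Neighbor Joining Algorithm
Set $L$ to contain all leaves.
Iteration: choose $i,j$ such that $D(i,j)$ is minimal. Create a new node $k$, and for each remaining node $m$ set
$$d(k,m) = \tfrac{1}{2}\,\bigl(d(i,m) + d(j,m) - d(i,j)\bigr)$$
Remove $i,j$ from $L$, and add $k$.
Terminate: when $|L| = 2$, connect the two remaining nodes.
[Figure: leaves i and j joined to a new node k; m is another node.]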
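The iteration can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the lecture's code. D(i,j) is taken as (L-2)d(i,j) - (r_i + r_j), with r_i the row sum. The edge lengths to the new node use the common averaged form d(i,k) = (d(i,j) + (r_i - r_j)/(L-2))/2, which is an assumption here because the slide's own formula did not survive in the transcript. The labels "n0", "n1", ... and the example distances are hypothetical.

```python
from itertools import combinations

def neighbor_joining(d):
    """Neighbor joining on a symmetric dict-of-dicts distance table.

    Returns a list of weighted edges (u, v, length).
    """
    d = {i: dict(row) for i, row in d.items()}   # work on a copy
    edges, count = [], 0
    while len(d) > 2:
        L = len(d)
        r = {i: sum(d[i].values()) for i in d}
        # Choose the pair minimizing D(i,j) = (L-2) d(i,j) - (r_i + r_j).
        i, j = min(combinations(d, 2),
                   key=lambda p: (L - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        k = f"n{count}"
        count += 1
        # Edge lengths from i and j to their new parent k (assumed form).
        dik = 0.5 * (d[i][j] + (r[i] - r[j]) / (L - 2))
        edges += [(i, k, dik), (j, k, d[i][j] - dik)]
        # Distances from k to every remaining node m.
        d[k] = {m: 0.5 * (d[i][m] + d[j][m] - d[i][j])
                for m in d if m not in (i, j)}
        for m in d[k]:
            d[m][k] = d[k][m]
        # Remove i and j from the table.
        for x in (i, j):
            del d[x]
            for row in d.values():
                row.pop(x, None)
    a, b = d                                     # connect the last two nodes
    edges.append((a, b, d[a][b]))
    return edges

# Hypothetical additive distances from the tree ((A:1, B:4) :1 (C:1, D:4)).
pairs = {("A", "B"): 5, ("A", "C"): 3, ("A", "D"): 6,
         ("B", "C"): 6, ("B", "D"): 9, ("C", "D"): 5}
dist = {}
for (i, j), v in pairs.items():
    dist.setdefault(i, {})[j] = v
    dist.setdefault(j, {})[i] = v

edges = neighbor_joining(dist)
for e in edges:
    print(e)
```

On this input the sketch joins A with B and C with D and recovers the original edge lengths, even though A and C are the closest pair.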

Neighbor Finding: notations used in the proof
P(i,j) = the path from vertex i to vertex j. Example: P(D,C) = (e1,e2,e3) = (D,E,F,C).
For a vertex i and an edge e=(i',j'): Ni(e) = |{k : k is a leaf and e is on P(i,k)}|. Example: ND(e1) = 3, ND(e2) = 2, ND(e3) = 1, NC(e1) = 1.
[Figure: tree with leaves A, B, C, D and internal nodes E, F; edges e1, e2, e3 lie on the path from D to C.]

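Neighbor Finding
Notation: for $e=(i,m)$, we denote $d(i,m)$ by $d(e)$.
Lemma: for leaves $i,j$ connected by a path $(i,l,\dots,k,j)$,
$$r_i - r_j = (L-2)\,\bigl(d(i,l) - d(j,k)\bigr) + \sum_{e \in P(l,k)} \bigl(N_i(e) - N_j(e)\bigr)\, d(e)$$
[Figure: leaves i and j attached to l and k, with the rest of T between them.]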
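The lemma can be checked numerically on a small tree. The sketch below assumes the lemma has the form r_i - r_j = (L-2)(d(i,l) - d(j,k)) + sum over e in P(l,k) of (N_i(e) - N_j(e)) d(e). That form is reconstructed from how the lemma is applied in the proof, since the slide's formula is missing from the transcript. The tree, its edge weights, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical weighted tree: leaves A-E, internal nodes X, Y, Z.
EDGES = [("A", "X", 1), ("B", "X", 2), ("X", "Y", 3), ("C", "Y", 4),
         ("Y", "Z", 5), ("D", "Z", 6), ("E", "Z", 7)]
tree = {}
for u, v, w in EDGES:
    tree.setdefault(u, {})[v] = w
    tree.setdefault(v, {})[u] = w
leaves = [v for v in tree if len(tree[v]) == 1]

def path(a, b, prev=None):
    """Vertices on the unique path from a to b."""
    if a == b:
        return [a]
    for nxt in tree[a]:
        if nxt != prev:
            rest = path(nxt, b, a)
            if rest:
                return [a] + rest
    return []

def path_edges(a, b):
    p = path(a, b)
    return list(zip(p, p[1:]))

def dist(a, b):
    return sum(tree[u][v] for u, v in path_edges(a, b))

def r(i):
    """r_i: total distance from leaf i to all other leaves."""
    return sum(dist(i, k) for k in leaves)

def N(i, e):
    """N_i(e): number of leaves k whose path from i uses edge e."""
    e = frozenset(e)
    return sum(any(frozenset(f) == e for f in path_edges(i, k)) for k in leaves)

def lemma_rhs(i, j):
    p = path(i, j)                    # (i, l, ..., k, j)
    l, k = p[1], p[-2]
    L = len(leaves)
    inner = path_edges(l, k)          # edges of P(l, k); empty when l == k
    return ((L - 2) * (tree[i][l] - tree[j][k])
            + sum((N(i, e) - N(j, e)) * tree[e[0]][e[1]] for e in inner))

ok = all(r(i) - r(j) == lemma_rhs(i, j) for i, j in combinations(leaves, 2))
print(ok)  # True
```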

Neighbor Finding
Proof of Theorem: Assume by contradiction that D(i,j) is minimal for i,j which are not neighboring leaves. Let (i,l,...,k,j) be the path from i to j. Let T1 and T2 be the subtrees rooted at l and k. Let |T| denote the number of leaves in T.
[Figure: leaves i, j, internal nodes l, k, and subtrees T1, T2.]

Neighbor Finding
Case 1: $i$ or $j$ has a neighboring leaf. WLOG $j$ and $m$ are such leaves.
A. By the definition of $D$, and then by the figure:
$$D(i,j) - D(m,j) = (L-2)\,\bigl(d(i,j) - d(j,m)\bigr) - (r_i + r_j) + (r_m + r_j) = (L-2)\,\bigl(d(i,k) - d(k,m)\bigr) + r_m - r_i$$
B. By the lemma and the figure:
$$r_m - r_i \ge (L-2)\,\bigl(d(k,m) - d(i,l)\bigr) + (4-L)\,d(k,l)$$
(since for each edge $e \in P(k,l)$, $N_m(e) \ge 2$ and $N_i(e) \le L-2$, so $N_m(e) - N_i(e) \ge 4-L$).
Substituting B in A:
$$D(i,j) - D(m,j) \ge (L-2)\,\bigl(d(i,k) - d(i,l)\bigr) + (4-L)\,d(k,l) = 2\,d(k,l) > 0,$$
contradicting the minimality assumption.
[Figure: leaves i, j, m, internal nodes l, k, and subtree T2.]

Neighbor Finding
Case 2: Not case 1. Then both T1 and T2 contain 2 neighboring leaves. We show that if D(i,j) is minimal, then we must have both |T1| > |T2| and |T2| > |T1|, which is a contradiction; hence D(i,j) is not minimal.
We prove that |T1| > |T2| by assuming that |T1| ≤ |T2| and reaching a contradiction. The proof that |T2| > |T1| is similar. Let n,m be neighboring leaves in T1.
[Figure: leaves i, j, internal nodes l, k, subtrees T1, T2, and neighboring leaves m, n attached to node p.]

Neighbor Finding
A. $0 \le D(m,n) - D(i,j) = (L-2)\,\bigl(d(m,n) - d(i,j)\bigr) + (r_i + r_j) - (r_m + r_n)$
B. $r_j - r_m < (L-2)\,\bigl(d(j,k) - d(m,p)\bigr) + (|T_1| - |T_2|)\,d(k,p)$ (because $N_j(e) - N_m(e) < |T_1| - |T_2|$).
C. $r_i - r_n < (L-2)\,\bigl(d(i,k) - d(n,p)\bigr) + (|T_1| - |T_2|)\,d(l,p)$
Adding B and C, noting that $d(l,p) > d(k,p)$ and using the assumption $|T_1| - |T_2| \le 0$:
D. $(r_i + r_j) - (r_m + r_n) < (L-2)\,\bigl(d(i,j) - d(n,m)\bigr) + 2\,(|T_1| - |T_2|)\,d(k,p)$
Substituting D in the right-hand side of A: $0 \le D(m,n) - D(i,j) < 2\,(|T_1| - |T_2|)\,d(k,p)$, hence $|T_1| - |T_2| > 0$, a contradiction.
[Figure: leaves i, j, m, n, internal nodes l, k, p, and subtrees T1, T2.]