Phylogeny: reconstructing a phylogeny

1 Phylogeny

2 Reconstructing a phylogeny
- The phylogenetic tree (phylogeny) describes the evolutionary relationships between the studied data
- The data must be comprised of homologous types
- In molecular evolution, the studied data are homologous DNA/AA sequences
- Phylogeny reconstruction explicitly assumes that the sequences are aligned: INPUT = MSA

3 Reminder: MSA and phylogeny are dependent
[Figure: cycle linking unaligned sequences, sequence alignment (guided by a possibly inaccurate guide tree), the MSA, and phylogeny reconstruction]

4 Phylogeny representation
Visual representation: [Figure: tree in which A groups with C and B groups with D]
Textual representation (Newick format): ((A,C),(B,D));
- Each pair of parentheses () encloses a clade in the tree
- A comma "," separates the members of the corresponding clade
- A semicolon ";" is always the last character
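As a minimal sketch of how the Newick rules above map onto a data structure, the following Python function reads a topology-only Newick string into nested tuples. The name `parse_newick` is hypothetical, and the sketch assumes plain leaf names with no branch lengths, support values, or quoted labels.

```python
def parse_newick(text):
    """Parse a topology-only Newick string into nested tuples of leaf names."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume '(' that opens a clade
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # consume ',' between clade members
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ')' that closes the clade
            return tuple(children)
        # Otherwise this is a leaf: read up to the next delimiter.
        start = pos
        while text[pos] not in "(),;":
            pos += 1
        if start == pos:
            raise ValueError(f"expected a leaf name at position {pos}")
        return text[start:pos]

    tree = parse_node()
    if text[pos] != ";":
        raise ValueError(f"unexpected text at position {pos}")
    return tree


tree = parse_newick("((A,C),(B,D));")
print(tree)
```

Nested tuples are used here only because they mirror the parentheses directly; a real analysis would more likely rely on an established phylogenetics library.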

5 Some terminology
- Root
- Internal nodes
- External nodes (leaves)
- Internal branches (splits)
- External branches
- Monophyletic group (clade)
- Neighbors

6 Swapping neighbors is meaningless
[Figure: the same Human, Chimp, Gorilla tree drawn four times with neighbors swapped; all drawings are equivalent]
(Gorilla,(Human,Chimp)) = (Gorilla,(Chimp,Human)) = ((Human,Chimp),Gorilla) = ((Chimp,Human),Gorilla)
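One way to check this equivalence programmatically is to bring each tree into a canonical form that ignores the order of children. The sketch below assumes trees are given as nested tuples of leaf names; the helper name `canonical` is hypothetical.

```python
def canonical(tree):
    """Return a form of the tree that ignores the order of children."""
    if isinstance(tree, str):
        return tree
    children = [canonical(child) for child in tree]
    # Sort by repr so that leaves and sub-clades can be ordered uniformly.
    return tuple(sorted(children, key=repr))


variants = [
    ("Gorilla", ("Human", "Chimp")),
    ("Gorilla", ("Chimp", "Human")),
    (("Human", "Chimp"), "Gorilla"),
    (("Chimp", "Human"), "Gorilla"),
]
forms = {canonical(t) for t in variants}
print(len(forms))  # the four variants collapse to a single canonical form

# A genuinely different topology does not collapse to that form.
different = canonical((("Human", "Gorilla"), "Chimp"))
print(different in forms)
```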

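7 Rooted vs. unrooted
[Figure: an unrooted tree of A, B, C with branches labeled 1, 2, 3; placing the root on branch 1, 2, or 3 gives three different rooted trees]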

8 Rooted vs. unrooted in Newick format
Unrooted tree: (A,B,C)
Rooted trees: ((C,B),A) ≠ ((A,B),C) ≠ ((A,C),B)

9 How can we root a tree?

10 Rooting the tree based on a priori knowledge: using an outgroup
[Figure: a tree of Human, Chimp, Gorilla (INGROUP) and Chicken (OUTGROUP), shown unrooted and rooted using the outgroup]
The outgroup should be close enough for detecting sequence homology, but far enough to be a clear outgroup

11 The gene tree is not always identical to the species tree
[Figure: a gene tree and a species tree over Gorilla, Chimp, Human, and Chicken with different topologies: gene tree ≠ species tree]

12 Phylogeny reconstruction approaches
Distance based methods: Neighbor Joining
[Figure: a star tree of A, B, C, D, E, and the tree after A and B are joined]
Distance matrix:
     A    B    C    D    E
A    0    2    3    4    4
B         0    3    4    5
C              0    3    4
D                   0    5
E                        0
After joining A and B:
      A,B    C      D      E
A,B   0      2.5    4.5    3.5
C            0      3      4
D                   0      5
E                          0
The Minimum Evolution (ME) criterion: in each iteration, we separate the pair of sequences that results in the minimal sum of branch lengths
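To make the pair-selection step concrete, here is a small sketch of the first Neighbor Joining iteration on the five-taxon matrix above. The slide does not state the selection formula, so this sketch assumes the standard NJ criterion Q(i,j) = (n-2)·d(i,j) - r_i - r_j, where r_i is the sum of distances from i to all other taxa; the names `dist` and `pick_pair` are hypothetical.

```python
from itertools import combinations

# Distance matrix from the slide (upper triangle only).
taxa = ["A", "B", "C", "D", "E"]
upper = {
    ("A", "B"): 2, ("A", "C"): 3, ("A", "D"): 4, ("A", "E"): 4,
    ("B", "C"): 3, ("B", "D"): 4, ("B", "E"): 5,
    ("C", "D"): 3, ("C", "E"): 4,
    ("D", "E"): 5,
}


def dist(x, y):
    """Look up a distance regardless of the order of the pair."""
    if x == y:
        return 0
    return upper[(x, y)] if (x, y) in upper else upper[(y, x)]


def pick_pair(names):
    """Return the pair that minimizes the assumed NJ criterion Q."""
    n = len(names)
    totals = {x: sum(dist(x, y) for y in names) for x in names}

    def q(pair):
        i, j = pair
        return (n - 2) * dist(i, j) - totals[i] - totals[j]

    return min(combinations(names, 2), key=q)


print(pick_pair(taxa))  # selects A and B, the pair joined on the slide
```

A full implementation would then replace the chosen pair by a new internal node, recompute the distances to it, and repeat until the tree is resolved.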

13 Phylogeny reconstruction approaches
Topology search methods: MP, ML
- Maximum Parsimony: finds the most parsimonious topology
- Maximum Likelihood: finds the most likely topology, P(Data|T)
[Figure: four sequences (Seq 1 to Seq 4) and the three candidate topologies, grouping 1,3 | 2,4; 1,4 | 2,3; and 1,2 | 3,4]

14 Phylogeny reconstruction approaches: summary
- Distance based methods
  - Neighbor Joining (e.g., using ClustalX): fast, but inaccurate
- Topology search methods
  - Maximum parsimony (e.g., using MEGA): crude, questionable statistical basis
  - Maximum likelihood (e.g., using RAxML, PhyML): accurate, but slow
- Bayesian methods
  - Markov chain Monte Carlo (MCMC) (e.g., using MrBayes): most accurate, but very slow

15 How robust is our tree?
[Figure: tree of Human, Gorilla, and Chimp]

16 Bootstrap for estimating robustness
- We need some statistical way to estimate the confidence in the tree topology
- But we don't know anything about the distribution of tree topologies
- The only data source we have is our data (MSA)
- So, we must rely on our own resources: "pull up by your own bootstraps"

17 Bootstrap
1. Create n (100-1000) new MSAs (pseudo-MSAs) by randomly sampling K positions from our original MSA with replacement
Original MSA (Sp1 to Sp4), columns 1 2 3 4 5 … K:
1 : ATCTG…A
2 : ATCTG…C
3 : ACTTA…C
4 : ACCTA…T
Pseudo-MSA from sampled columns 1 1 2 4 4 … 3:
1 : AATTT…C
2 : AATTT…C
3 : AACTT…T
4 : AACTT…C
Pseudo-MSA from sampled columns 9 7 4 7 8 … 10:
1 : TTTTA…T
2 : CATAC…A
3 : CATAC…T
4 : AGTGG…A
Pseudo-MSA from sampled columns 5 1 5 7 8 … 12:
1 : GAGTA…T
2 : GAGAC…G
3 : AAAAC…A
4 : AAAGG…C
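The resampling step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the MSA is a list of equal-length strings, the toy data are the first five columns shown on the slide, and the name `bootstrap_msa` and the fixed random seed are choices made for this sketch.

```python
import random


def bootstrap_msa(msa, rng):
    """Sample K columns with replacement from an MSA that has K columns."""
    k = len(msa[0])
    columns = [rng.randrange(k) for _ in range(k)]
    # Apply the same sampled columns to every sequence so rows stay aligned.
    pseudo = ["".join(seq[c] for c in columns) for seq in msa]
    return columns, pseudo


# Toy MSA: the first five columns of the four sequences on the slide.
msa = ["ATCTG", "ATCTG", "ACTTA", "ACCTA"]
rng = random.Random(0)  # fixed seed only to make the sketch reproducible
pseudo_msas = [bootstrap_msa(msa, rng) for _ in range(100)]

columns, pseudo = pseudo_msas[0]
print(columns)
print(pseudo)
```

Because whole columns are sampled, each pseudo-MSA keeps the original number of sequences and positions, while some columns appear several times and others not at all.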

18 Bootstrap
2. Reconstruct a pseudo-tree from each pseudo-MSA with the same method used for reconstructing the original tree
[Figure: each of the three pseudo-MSAs (sampled columns 1 1 2 4 4 … 3, 9 7 4 7 8 … 10, and 5 1 5 7 8 … 12) next to the pseudo-tree of Sp1 to Sp4 reconstructed from it]

19 Bootstrap
3. For each split in our original tree, we count the number of times it appeared in the pseudo-trees
[Figure: the original tree of Sp1 to Sp4 with bootstrap values 67% and 100% on its splits, next to three pseudo-trees]
In 67% of the pseudo-trees, the split between Sp1+Sp2 and the rest of the tree was found
In general, bootstrap (bp) support < 80% is considered low
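The counting step can be illustrated as follows. This sketch assumes trees are nested tuples of leaf names and represents each split as the pair of leaf sets it separates; the three pseudo-trees are hypothetical toy inputs, and `leaf_set`, `splits`, and `support` are names introduced only for this example.

```python
def leaf_set(tree):
    """Collect the leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return the non-trivial splits of a tree given as nested tuples."""
    all_leaves = leaf_set(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaf_set(node)
        other = all_leaves - side
        # Keep only splits with at least two leaves on each side.
        if len(side) > 1 and len(other) > 1:
            found.add(frozenset([side, other]))
        for child in node:
            visit(child)

    visit(tree)
    return found


def support(original, pseudo_trees):
    """Percentage of pseudo-trees that contain each split of the original."""
    pseudo_splits = [splits(t) for t in pseudo_trees]
    return {
        s: 100 * sum(s in ps for ps in pseudo_splits) / len(pseudo_trees)
        for s in splits(original)
    }


original = (("Sp1", "Sp2"), ("Sp3", "Sp4"))
pseudo_trees = [  # hypothetical pseudo-trees, two of three agree
    (("Sp1", "Sp2"), ("Sp3", "Sp4")),
    (("Sp2", "Sp1"), ("Sp4", "Sp3")),
    (("Sp1", "Sp3"), ("Sp2", "Sp4")),
]
result = support(original, pseudo_trees)
for split, pct in result.items():
    print(sorted(sorted(side) for side in split), round(pct))
```

Comparing leaf sets rather than tree drawings means that swapped neighbors in a pseudo-tree still count as the same split.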

20 ClustalX: NJ phylogeny reconstruction

21

22 http://phylobench.vital-it.ch/raxml-bb/

23

24 Viewing the tree with njPlot

25 Note: unrooted tree

26 Defining an outgroup

27 Swapping nodes

28 Bootstrap support

29 FigTree: tree visualization and figure creation
http://tree.bio.ed.ac.uk/software/figtree/

30 Reconstructing the tree of life

31 Darwin’s vision of the tree of life from the Origin of Species

32 The three-domain tree of life based on SSU rRNA MSA

33 But the branching of several kingdoms remains in dispute

34 Lateral Gene Transfer (LGT) challenges the conceptual basis of phylogenetic classification

35

36 Methodology
- Started with 36 genes universally present in 191 species (spanning all 3 domains of life), for which orthologs could be unambiguously identified
- Eliminated 5 genes that are LGT suspects (mostly tRNA synthetases)
- Constructed an MSA for each of the 31 orthogroups
- Concatenated all 31 MSAs to a super-MSA of 8090 columns
- The phylogeny was reconstructed based on the super-MSA using the maximum likelihood approach
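The concatenation step above can be sketched as follows. This is not the pipeline used in the study, only an illustration: each per-gene MSA is assumed to be a dict from species name to aligned sequence, the two toy genes and three species are invented for the example, and `concatenate` is a hypothetical helper.

```python
def concatenate(msas):
    """Concatenate per-gene MSAs (dicts of species -> aligned sequence)."""
    species = sorted(msas[0])
    for msa in msas:
        if sorted(msa) != species:
            raise ValueError("every MSA must cover the same species")
        if len({len(seq) for seq in msa.values()}) != 1:
            raise ValueError("sequences within an MSA must have equal length")
    # Join each species' rows gene by gene, in the order the MSAs were given.
    return {sp: "".join(msa[sp] for msa in msas) for sp in species}


# Invented toy alignments standing in for two of the orthogroups.
gene1 = {"sp1": "MK-L", "sp2": "MKAL", "sp3": "MR-L"}
gene2 = {"sp1": "GTT", "sp2": "G-T", "sp3": "GTS"}
super_msa = concatenate([gene1, gene2])
for name, seq in super_msa.items():
    print(name, seq)
```

The number of columns in the super-MSA is the sum of the columns of the individual MSAs, which is how 31 alignments add up to the 8090 columns reported on the slide.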

37 [Figure: the reconstructed tree of life with the Archaea, Eukaryota, and Bacteria domains; http://itol.embl.de]

38 Tree support
- 81.7% of the splits show bootstrap support of over 80%
- 65% of the splits show bootstrap support of 100%
- However, several deep splits show low support

39 Still, the debate goes on

40 “Tree of one percent of life”
- On the one hand, Ciccarelli et al. favor the claim that bacteria adhere to a bifurcating tree of life, provided that the small number of LGT genes is filtered out
- On the other hand, their filtering process left only 31 proteins, which represent ~1% of an average prokaryotic proteome and ~0.1% of a large eukaryotic proteome
- “If throwing out all non-universally distributed genes and all LGT suspects leaves a 1% tree, then we should probably abandon the tree as a working hypothesis”

