1
Phylogenetics: Reconstructing the Tree of Life
2
In the Speciation lecture, I talked about a “Phylogenetic Species Concept”
What is a “Phylogeny?” How do you construct one? Why on earth should I care?
3
Why you should care: All biological relationships can be determined by constructing phylogenies. Even if phylogenies are not always the best way to define species boundaries, they do tell you the genetic and evolutionary relationships among groups and individuals. Your ancestry. Diseases: figure out the evolutionary origins and evolutionary pathways of diseases such as HIV, Ebola, and SARS. Crops and livestock (food security): rescue from inbreeding, create new varieties. Endangered species: figure out how endangered populations are related and how to perform genetic rescue.
4
Tree of Life Web Project
5
Figure: The three domains of life (textbook version), branching from the COMMON ANCESTOR OF ALL LIFE.
EUKARYA: land plants, green algae, red algae, dinoflagellates, forams, ciliates, diatoms, amoebas, cellular slime molds, euglena, trypanosomes, Leishmania, animals, fungi. ARCHAEA: Sulfolobus, thermophiles, halophiles, Methanobacterium. BACTERIA: green nonsulfur bacteria, (mitochondrion), spirochetes, chlamydia, green sulfur bacteria, cyanobacteria, (plastids, including chloroplasts).
6
Updated Tree of Life 2016 Hug et al. 2016 Nature Microbiology
Figure: updated tree of life with branches labeled Bacteria, Archaea, and Eukarya.
7
Outline What is a phylogeny? How do you construct a phylogeny?
The Molecular Clock Statistical Methods
8
Are Genetic Distances and fossil record roughly congruent?
Think about relationships among the major lineages of life and when they appeared in the fossil record.
9
Fossil Record vs Molecular Clock
4/15/2018 The molecular clock and the fossil record are not always congruent. The fossil record is incomplete, and soft-bodied species are usually not preserved. Mutation rates can vary among species (depending on generation time, replication error, mismatch repair). But they provide complementary information: the fossil record contains extinct species, while molecular data are based on extant taxa. Major events in the fossil record can be used to calibrate the molecular clock.
10
Evolutionary History of HIV
HIV evolved multiple times from SIV (Simian Immunodeficiency Virus). Source: Evolutionary Analysis, Freeman & Herron, 2004. (Figure axis: time.)
11
Charles Darwin (1809 -1882) On the Origin of Species (1859)
Living species are related by common ancestry. Change through time occurs at the population level, not the organism level. The main cause of adaptive evolution is natural selection.
12
Darwin envisaged evolution as a tree
The affinities of all the beings of the same class have sometimes been represented by a great tree. I believe this simile largely speaks the truth…… …The green and budding twigs may represent existing species; and those produced during former years may represent the long succession of extinct species….. ….the great Tree of Life….covers the earth with ever-branching and beautiful ramifications Charles Darwin, On the Origin of Species; pages
13
Reconstructing the Tree of Life
14
The only figure in The Origin of Species
15
What did people believe before Darwin?
Lamarck proposed a ladder of life Past Future
16
Jean-Baptiste Lamarck
French naturalist. “Professor of Worms and Insects” in Paris. The first scientific theory of evolution (inheritance of acquired traits). Lamarck was a French biologist who lived in the 18th century. He propounded the heretical idea that species were not created in their current form but have changed over time. His model was, however, at odds with current ideas. First, he believed (as everybody did then) that life arises spontaneously all the time, so that maggots could appear spontaneously on rotting meat without an adult laying eggs! His idea was that there is a ladder of life with superior forms (humans, of course) near the top and lower forms on lower rungs (plants near the bottom). He imagined that all organisms had an internal drive to ascend the ladder during evolution. Those that started earlier (or had a stronger drive) would be higher than those that evolved later. He did not think of evolution in a tree-like form. His model to explain adaptation was the popular idea of use-and-disuse, or “the inheritance of acquired characters.” He imagined that when an organism repeatedly tries to use an organ for some purpose, that organ will grow and, moreover, its offspring will start with an already enlarged organ. For example, a giraffe would stretch for high leaves throughout its life, and this would result in its offspring being born with longer necks. This mechanism was widely believed until the early 20th century. Now we know that it does not work: events during an organism's life cannot cause directed changes in its heritable material.
17
Lamarck’s View of Evolution
Continuum between the physical and biological world (followed Aristotle). Scala Naturae (“Ladder of Life” or “Great Chain of Being”), descending from Being through God, Angels, Demons, Man, Animals, Plants, and Minerals to Non-Being, with a Realm of Being above a Realm of Becoming.
18
What is wrong with a ladder?
Evolution is not linear but branching. Living organisms are not ancestors of one another. The ladder implies progress. The biggest problem is the treatment of living groups as ancestors; the ladder also tends to reinforce ideas of “progress.”
19
What is right with the tree?
Evolution is a branching process. If a mutation occurs, one species is not turning into another; rather, there is a split, and both lineages continue to evolve. So, evolution is not progressive: all living taxa are equally “successful.” Phylogenies (trees) reflect the hierarchical structuring of relationships.
20
The only figure in The Origin of Species
21
The Tree of Life is a Fractal
22
Genealogical structures
Phylogeny: a depiction of the ancestry relations between species (it includes speciation events); tree-like (divergent). Pedigree: a depiction of the ancestry relations within populations; net-like (reticulating).
23
Four butterflies connected to their parents
offspring parents
24
future Individuals past Population
25
Lineage-branching Speciation Population Lineage/ Species
What happened here? Phylogeny Lineage-branching Speciation
26
What happened here? Extinction
27
Representation of phylogenies?
B C A B C A simplified representation The True History
28
Some terms used to describe a phylogenetic tree
Taxon (taxa) Tip Internal branch Internode Node (Speciation event) Root
29
Outline What is a phylogeny? How do you construct a phylogeny?
The Molecular Clock Statistical Methods
30
What is a Phylogeny?
A phylogenetic tree represents a hypothesis about evolutionary relationships. Each branch point represents the divergence of two taxa (e.g., species). Sister taxa are groups that share an immediate common ancestor.
31
Molecular Clock
Phylogenies rely on the “molecular clock,” namely the observation that mutations, on average, occur at a given rate. So, on average, more mutational differences between taxa mean that they branched from a common ancestor longer ago, and longer branches on a phylogeny often indicate greater evolutionary distance. Example: mitochondrial DNA diverges at roughly 2.2% per million years.
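As a rough illustration of the clock logic above, the sketch below converts an observed percent sequence divergence into a divergence time. It assumes a strict clock and assumes that the 2.2% per million years figure is a pairwise divergence rate (both lineages combined); the function name and the 11% input are hypothetical.

```python
def divergence_time_myr(percent_divergence, rate_percent_per_myr=2.2):
    """Time since two lineages split, in millions of years, under a strict clock.

    Assumption: the rate is the pairwise divergence rate. The default of 2.2
    is the mitochondrial figure quoted on the slide.
    """
    if rate_percent_per_myr <= 0:
        raise ValueError("rate must be positive")
    return percent_divergence / rate_percent_per_myr


# Hypothetical input: two taxa whose mitochondrial sequences differ by 11%.
print(divergence_time_myr(11.0))  # about 5 million years
```

If mutation rates differ among lineages, as the next slide points out, a single rate like this will give biased dates.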
32
Molecular Clock Problem: mutation rate can vary among species
Mutation rate is faster with: a shorter generation time (a greater number of meiosis or mitosis events in a given time); more replication error (e.g., a sloppy DNA or RNA polymerase, or poor mismatch repair mechanisms).
34
Phylogenetic Trees with Proportional Branch Lengths
In some trees, the length of a branch can reflect the number of genetic changes that have taken place in a particular DNA sequence in that lineage. So longer branches = greater evolutionary distance.
35
Neutral data are better for capturing genetic distances (the molecular clock) than genes that might be under selection Why?
36
Phylogenetic Informative Characters (mutations)
Neutral mutations: mutations that are not subjected to selection. They are better for constructing phylogenies because selection could make unrelated taxa appear more similar or related taxa appear more different. Examples: noncoding regions of DNA, third codon positions in protein-coding genes, introns, microsatellites (“junk DNA”).
37
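Codon Bias In the case of amino acids
Mutations in codon positions 1 and 2 usually change the amino acid. Mutations in position 3 often do not, because many amino acids are encoded by several codons that differ only at the third position.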
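The claim about codon positions can be checked directly against the standard genetic code. The sketch below, in which the function name is hypothetical and stop codons are treated as one more "amino acid" for simplicity, counts what fraction of single-base changes at each codon position leave the encoded amino acid unchanged.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_fraction(position):
    """Fraction of single-base changes at a codon position (0, 1, or 2)
    that leave the encoded amino acid (or stop signal) unchanged."""
    same = total = 0
    for codon, aa in CODE.items():
        for base in BASES:
            if base == codon[position]:
                continue  # not a mutation
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            same += CODE[mutant] == aa
    return same / total


for pos in range(3):
    print(f"position {pos + 1}: {synonymous_fraction(pos):.2f} synonymous")
```

Most third-position changes come out synonymous, while only a small fraction of first- and second-position changes do, which is why third positions behave more like neutral markers.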
38
Figure 26.4 The connection between classification and phylogeny
Order Carnivora. Family Felidae: genus Panthera, species Panthera pardus. Family Mustelidae: genus Taxidea, species Taxidea taxus; genus Lutra, species Lutra lutra. Family Canidae: genus Canis, species Canis latrans and Canis lupus.
39
Polytomy (unresolved branching point)
Figure 26.5 How to read a phylogenetic tree. Labels: ancestral lineage; common ancestor of taxa A–F; branch point (node); taxa A through F; sister taxa; polytomy (unresolved branching point).
40
A monophyletic clade consists of an ancestral taxon and all its descendants
Figure: Monophyletic, paraphyletic, and polyphyletic groups, shown on the same tree: (a) monophyletic group (clade), Group I; (b) paraphyletic group, Group II; (c) polyphyletic group, Group III.
41
Examples of Paraphyletic Groups
Figure 4.13 Monophyletic and paraphyletic groups. The prokaryotes, dicotyledonous plants ("dicots"), and fish are all examples of paraphyletic groups. Paraphyletic groups are not recognized as legitimate groups in the Phylogenetic Species Concept, which only recognizes monophyletic groups.
42
(a) Monophyletic group (clade)
Figure 26.10a Monophyletic, paraphyletic, and polyphyletic groups: (a) monophyletic group (clade), Group I. In the lecture on species concepts we discussed that the “smallest” monophyletic group is a “phylogenetic species.”
43
Figure 4.1 Monophyletic groups are comprised of an ancestor and all of its descendants
Monophyletic groups are also called clades or lineages. The groups circled here are all monophyletic. The group described by species 1, 2, 3, and their closest common ancestor is also monophyletic.
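The definition of a monophyletic group can be made concrete with a small check. This is a minimal sketch, assuming a tree written as nested tuples with tip names as strings; the tree and taxon names are hypothetical and are not taken from the figures.

```python
def leaves(tree):
    """Set of tip names under a node; a tip is a string, an internal node a tuple."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Tip sets of every node in the tree (single tips included)."""
    found = {leaves(tree)}
    if not isinstance(tree, str):
        for child in tree:
            found |= clades(child)
    return found


def is_monophyletic(tree, group):
    """True if the group is exactly an ancestor's complete set of descendant tips."""
    return frozenset(group) in clades(tree)


# Hypothetical tree: ((A,B),C) is sister to (D,E).
TREE = ((("A", "B"), "C"), ("D", "E"))
print(is_monophyletic(TREE, {"A", "B", "C"}))  # True
print(is_monophyletic(TREE, {"A", "B", "D"}))  # False: their common ancestor also gave rise to C and E
```

A group that fails this test is either paraphyletic or polyphyletic; telling those two apart requires knowing where the group's defining trait arose, which this sketch does not attempt.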
45
Synapomorphies
Synapomorphies are shared, derived, homologous traits. They can be DNA nucleotides or other heritable traits. They are used to group taxa that are more closely related to one another.
46
Figure 4.2 Synapomorphies arise in ancestral populations, and are passed on to descendants
(a) Speciation leads to the creation of two independent populations. Each acquires unique traits by mutation, selection, and genetic drift but they share traits inherited from their common ancestor. (b) As you go up a tree, synapomorphies create a nested hierarchy. Each successive monophyletic group can be identified by synapomorphies that arose in its ancestors.
47
Figure 4.2a Synapomorphies arise in ancestral populations, and are passed on to descendants
(a) Speciation leads to the creation of two independent populations. Each acquires unique traits by mutation, selection, and genetic drift but they share traits inherited from their common ancestor.
48
Figure 4.2b Synapomorphies arise in ancestral populations, and are passed on to descendants
(b) As you go up a tree, synapomorphies create a nested hierarchy. Each successive monophyletic group can be identified by synapomorphies that arose in its ancestors.
49
Figure 4.3 Synapomorphies reveal the relationships among tetrapods
The traits that are labeled at each hash mark on this tree are synapomorphies shared by the descendant species above that point. For example, birds have feathers and other shared, derived traits that identify them as birds. But they also have four limbs that identify them as a member of the monophyletic group called Tetrapoda, amniotic eggs that identify them as members of the clade called Amniota, and so on. synapomorphies
50
Figure 4.5 Reversals complicate phylogeny inference
(a) Read this tree up from the root, and notice that a change in the fifth position of this DNA sequence creates a shared, derived character in the descendant populations. (b) If a reversal changed the fifth position back to the ancestral state later in the evolution of this group, it would make it much more difficult to infer the correct phylogeny.
51
Figure 4.5a Reversals complicate phylogeny inference
(a) Read this tree up from the root, and notice that a change in the fifth position of this DNA sequence creates a shared, derived character in the descendant populations.
52
Figure 4.5b Reversals complicate phylogeny inference
(b) If a reversal changed the fifth position back to the ancestral state later in the evolution of this group, it would make it much more difficult to infer the correct phylogeny.
53
Sometimes similar-looking traits are not homologous, and are not synapomorphies, but are the result of convergent evolution. Figure 4.4 Similar traits may not be homologous. The pairs of species shown have similar traits even though they are not closely related. The octopus (a) and ray-finned fish (b) have camera eyes. The crocodile (c) and hippos (d) have skulls in which the eyes sit on top. These similarities are due to convergent evolution, not common ancestry.
54
Figure 4.6a Using parsimony to distinguish homology from homoplasy
The trees shown were estimated using a large number of synapomorphies in DNA sequences. (a) If the camera eyes of octopuses and vertebrates are homologous, then six evolutionary changes occurred, as shown.
55
Figure 4.6b Using parsimony to distinguish homology from homoplasy
The trees shown were estimated using a large number of synapomorphies in DNA sequences. (b) If the camera eyes of octopuses and vertebrates are convergent, then two evolutionary changes occurred, as shown.
56
How do we construct Phylogenies?
57
Phylogenetic Methods
Parsimony: minimize the number of steps. Distance matrix: minimize pairwise genetic distances. Maximum likelihood: probability of the data given the tree. Bayesian: probability of the tree given the data.
58
Parsimony
Uses discrete characters (like mutations, or some heritable trait). Select the tree with the minimum number of character-state transitions summed across all characters.
59
Parsimony: Example 1
Three phylogenetic hypotheses for Species I, Species II, and Species III, one for each choice of the two most closely related species. Figure: Applying parsimony to a problem in molecular systematics
60
Parsimony: Example 1 (sequence data)
Site                1 2 3 4
Species I           C T A T
Species II          C T T C
Species III         A G A C
Ancestral sequence  A G T T
The change at site 1 (to C, marked 1/C) is mapped onto each of the three trees. Figure: Applying parsimony to a problem in molecular systematics
61
Parsimony: Example 1 (mapping the changes)
Site                1 2 3 4
Species I           C T A T
Species II          C T T C
Species III         A G A C
Ancestral sequence  A G T T
The changes at sites 1 through 4 (marked 1/C, 2/T, 3/A, 4/C) are mapped onto each of the three trees. Figure: Applying parsimony to a problem in molecular systematics
62
Parsimony: Example 1 (counting events)
Site                1 2 3 4
Species I           C T A T
Species II          C T T C
Species III         A G A C
Ancestral sequence  A G T T
With the changes at sites 1 through 4 (1/C, 2/T, 3/A, 4/C) mapped onto each tree, the totals are 6 events for the first tree and 7 events for each of the other two. Figure: Applying parsimony to a problem in molecular systematics
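The event counts in Example 1 can be reproduced with a short parsimony routine. The sketch below uses the Fitch algorithm, which is one standard way to count the minimum number of changes and is not necessarily how the figure was produced. It assumes the ancestral sequence can be attached as an extra tip ("Anc") at the root; the nested-tuple tree encoding and all names are illustrative.

```python
def fitch(tree, states):
    """Fitch pass for one site: return (possible states at this node, changes below it)."""
    if isinstance(tree, str):              # a tip: its state is observed
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:                             # children can agree: no extra change needed
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, seqs):
    """Minimum number of changes summed over all sites of the alignment."""
    n_sites = len(next(iter(seqs.values())))
    return sum(fitch(tree, {name: s[i] for name, s in seqs.items()})[1]
               for i in range(n_sites))


# Sequences from the slide; "Anc" is the ancestral sequence.
SEQS = {"I": "CTAT", "II": "CTTC", "III": "AGAC", "Anc": "AGTT"}
TREES = {
    "I+II sisters": ("Anc", (("I", "II"), "III")),
    "I+III sisters": ("Anc", (("I", "III"), "II")),
    "II+III sisters": ("Anc", (("II", "III"), "I")),
}
for name, tree in TREES.items():
    print(name, tree_length(tree, SEQS))
```

The tree that groups Species I with Species II needs 6 events and the other two need 7 each, matching the totals on the slide.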
63
Parsimony: Example 2. Three possible trees for taxa A, B, and C with outgroup O.
64
Map the characters (mutations) onto tree 1
Figure: characters 1 to 5 for taxa O, A, B, and C, with the first characters mapped onto tree 1.
65
Map the characters (mutations) onto tree 1
Figure: characters 1 to 5 mapped onto tree 1 (taxa O, A, B, and C). Total number of steps = 6
66
Actually, there is more than one way to map character 3
Figure: two alternative mappings of character 3 (states G and A) onto the tree. Either way, the character contributes 2 steps to the overall tree length.
67
Map the characters onto tree 2
Figure: characters 1 to 5 mapped onto tree 2 (taxa O, A, B, and C). Number of steps = 5
68
Tree 3: length = 6 steps. Figure: characters 1 to 5 mapped onto tree 3 (taxa O, A, B, and C).
69
Which tree requires the fewest steps (is most parsimonious)?
Tree 1: length = 6. Tree 2: length = 5. Tree 3: length = 6. Most parsimonious tree: Tree 2.
70
Where do the Whales belong?
Example from Freeman & Herron, Fig. 4.8 Figure 4.8 Phylogenetic hypotheses for whales and other mammals The tree in (a) shows the Artiodactyla hypothesis: Whales and dolphins are related to the ungulates, possibly as the sister group to the artiodactyls (represented by cows, deer, hippos, pigs, peccaries, and camels). The outgroup to these species is from the ungulate group called Perissodactyla (horses and rhinos). The tree in (b) shows the whale + hippo hypothesis. It is identical to (a) with one exception: The branch leading to whales is moved so that whales are the sister group to the hippos.
71
Freeman & Herron, Fig. 4.9: Using maximum parsimony, it looks like the whales cluster with the hippos (and cows). Figure 4.9 Sequence data for parsimony analysis. These data are 60 nucleotides of aligned sequence from a milk-protein gene in six artiodactyls, a whale (the dolphin Lagenorhynchus obscurus), and a perissodactyl as an outgroup. An X at a site indicates an ambiguously identified nucleotide. Some of the invariant or uninformative sites are shaded blue; sites that provide synapomorphies are shaded orange. The phylogeny is based on a parsimony analysis of these nucleotide synapomorphies.
72
Parsimony
Simplest and fastest method of phylogenetic reconstruction. Can give misleading results if rates of evolution (the rates at which mutations occur) differ among lineages. Tends to become less accurate as genetic distances get greater. Can be misled by reversals and homoplasy: with only 4 nucleotides, after a while the same mutations occur repeatedly at a given site (called “saturation,” or “multiple hits (mutations) per site”).
73
Distance Matrix Continuous or Discrete Characters
74
Distance Matrix Calculate pairwise distances between taxa
Choose the tree that minimizes overall distances between taxa. Table: proportion sequence distance at 2 genes (hypothetical data), arranged as a pairwise matrix for mouse, cat, dog, dolphin, and seal.
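The first step of a distance method, computing the pairwise distances, can be sketched as follows. The distance used here is the simple proportion of differing sites, and the sequences are invented for illustration; they are not the values from the slide's table.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs):
    """Map each unordered pair of taxa to its p-distance."""
    return {frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}


def closest_pair(seqs):
    """The pair of taxa with the smallest distance (candidate sister taxa)."""
    matrix = distance_matrix(seqs)
    return min(matrix, key=matrix.get)


# Hypothetical aligned sequences.
SEQS = {
    "mouse": "ACGTACGTAC",
    "cat":   "ACGTTCGTAA",
    "dog":   "ACGTTCGTAT",
    "seal":  "ACGATCCTAT",
}
for pair, dist in sorted(distance_matrix(SEQS).items(), key=lambda kv: kv[1]):
    print(sorted(pair), dist)
print("closest:", sorted(closest_pair(SEQS)))
```

A clustering method would join the closest pair first, then recompute distances and repeat; only the distance step is shown here.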
75
Freeman & Herron, Fig. 4.10: Using genetic distances, it looks like the whales again cluster with the hippos (and cows). Figure 4.10 Genetic distances for cluster analysis. Each entry in this table is a genetic distance between a pair of taxa, calculated from the sequence data in Figure 4.9. The phylogeny here was produced by a clustering analysis of these genetic distances. Notice that pairs of taxa with low genetic distances are grouped as sister taxa, such as the cow and deer (blue) or the whale and hippo (orange). The lengths of the branches are proportional to the expected proportion of nucleotide differences between groups, and are shown numerically for several branches.
76
Distance Matrix Generally more accurate than parsimony
Like parsimony, it tends to be computationally fast
77
Maximum Likelihood (R.A. Fisher)
Probability of the data given the tree. This is a “frequentist” method (“normal statistics”): one true answer (one true tree). Draw from the data (probability distribution of DNA sequence data) to find the true tree. Choose the tree (x, y axes: tree space) that maximizes the probability of the observed data (z axis: probability of the data). Felsenstein, J Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution. 17(6):
78
Maximum Likelihood (R.A. Fisher)
Probability of the data given the tree. The aim of maximum likelihood estimation is to find the parameter value(s) that make the observed data most likely. For example: finding a mean. If you want a number that describes data such as human height, you could find the mean. $P(\text{data} \mid \text{tree}) = L(\text{tree} \mid \text{data})$, where the tree is the hypothesis. (Figure axes: x, y = tree space; z = probability of the data.) Felsenstein, J Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution. 17(6):
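The "finding a mean" example can be made concrete. The sketch below assumes the heights are normally distributed with a known spread, evaluates the log-likelihood of the data for many candidate means, and keeps the best one. The height values and the spread of 7 cm are invented for illustration.

```python
import math


def normal_log_likelihood(data, mu, sigma):
    """Log of the probability density of the data given a normal(mu, sigma) model."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2)
               for x in data)


heights_cm = [158.0, 162.5, 170.0, 171.5, 180.0]   # hypothetical sample
candidates = [150 + 0.1 * i for i in range(401)]    # candidate means, 150.0 to 190.0

# The maximum likelihood estimate is the candidate that makes the data most probable.
best_mu = max(candidates,
              key=lambda mu: normal_log_likelihood(heights_cm, mu, sigma=7.0))
sample_mean = sum(heights_cm) / len(heights_cm)
print(round(best_mu, 1), round(sample_mean, 1))
```

The likelihood peaks at the sample mean. In phylogenetics the same idea applies with candidate trees in place of candidate means, and a model of sequence evolution in place of the normal distribution.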
79
Maximum Likelihood (R.A. Fisher)
Often yields a more accurate tree than parsimony or distance. Relies on an accurate assumption of which mutations are more probable (for example, does A->G occur more often than A->T or A->C?), i.e., an accurate model of molecular evolution. Computationally intensive.
80
Bayesian Inference Reverend Thomas Bayes (1702-1760)
Probability of a tree given the data. Uses prior information on the tree. Does not assume that there is one correct tree. Will modify the estimate based on additional information. Uses Bayes’ Theorem: $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$
81
Bayesian Inference Reverend Thomas Bayes (1702-1760)
Probability of a tree given the data. Will modify the estimate based on additional information: as you get more data, you update your hypothesis for the tree. Uses prior information on the tree: this is where you start. The sequential (recursive) use of Bayes' formula: when more data become available after calculating a posterior distribution, the posterior becomes the next prior. Does not assume that there is one correct tree.
82
Bayesian Inference Reverend Thomas Bayes (1702-1760)
Uses Bayes’ Theorem: $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$, that is, $P(\text{tree} \mid \text{data}) = \frac{P(\text{data} \mid \text{tree})\,P(\text{tree})}{P(\text{data})}$. $P(A)$ = prior probability, the probability of a tree. $P(A \mid B)$ = posterior probability, the probability of the tree given the data. $P(B \mid A)$ = the probability of observing B (data) given A (tree), also known as the likelihood; it indicates the compatibility of the evidence with the given hypothesis. $P(B)$ = probability of the data.
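Bayes' theorem applied to a handful of candidate trees can be sketched in a few lines. The priors and likelihoods below are invented numbers chosen for easy arithmetic; real analyses compute the likelihoods from sequence data and sample tree space rather than enumerating it.

```python
def posterior(priors, likelihoods):
    """P(tree | data) for each tree, from P(tree) and P(data | tree)."""
    # P(data): sum over all candidate trees of P(data | tree) * P(tree).
    evidence = sum(priors[t] * likelihoods[t] for t in priors)
    return {t: priors[t] * likelihoods[t] / evidence for t in priors}


# Hypothetical values: three candidate trees, equal prior probability.
priors = {"tree1": 1 / 3, "tree2": 1 / 3, "tree3": 1 / 3}
likelihood_gene1 = {"tree1": 0.006, "tree2": 0.003, "tree3": 0.001}

post1 = posterior(priors, likelihood_gene1)
print(post1)  # tree1 about 0.6, tree2 about 0.3, tree3 about 0.1

# Sequential updating: the posterior becomes the prior for the next data set.
likelihood_gene2 = {"tree1": 0.004, "tree2": 0.004, "tree3": 0.002}
post2 = posterior(post1, likelihood_gene2)
print(post2)
```

The second call shows the recursive use described on the previous slide: each new data set reweights the current beliefs rather than starting over.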
83
Bayesian Inference
Like likelihood, often yields a more accurate tree than parsimony or distance. Computationally more intensive than parsimony or distance matrix, but less intensive than likelihood. Needs a prior probability for the tree and a model of evolution.
84
Potential problems of Phylogenetic Reconstruction
Sufficient amount of data: with enough data, most statistical methods usually yield the same tree. Insufficient data would yield a tree that lacks resolution (lacks statistical power). Gene trees vs. species trees: the evolutionary histories of individual genes are not necessarily the same, so one should try to get data from many genes, or the whole genome.
85
Challenges of Phylogenetic Reconstructions
Different parts of the genome might have different evolutionary histories (different gene genealogies, horizontal gene transfers, allopolyploidy, etc) So, there might not be one true tree for a group of taxa, and relationships might be difficult to resolve because they are inherently complex
86
Current trend is to use whole genome data to reconstruct phylogenies
Gain a comprehensive picture of the evolutionary relationships among taxa for the whole genome
87
Phylogenetic Reconstructions
Typically, evolutionary biologists will use a variety of methods to reconstruct a phylogeny. Maximum likelihood and Bayesian methods are considered more robust. A tree is only as good as the data: having many homoplastic characters (due to convergent evolution, reversals, etc.) will make the reconstruction less robust. It is standard to use bootstrapping to assess the validity of the tree. Understanding statistics is fundamental to understanding evolution. Much of statistics was in fact developed in order to model evolutionary processes (such as ANOVA, analysis of variance).
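Bootstrapping can be sketched as follows: resample the alignment columns with replacement, rebuild the tree each time, and report how often a grouping is recovered. This toy version reuses the four-site data from Parsimony Example 1, scores the three possible trees with Fitch parsimony, and counts replicates in which one tree is strictly the shortest. The function names, the replicate count, and the seed are illustrative, and four sites is far too few for a meaningful support value.

```python
import random


def fitch(tree, states):
    """Fitch pass for one site: return (possible states, changes below this node)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, seqs):
    n_sites = len(next(iter(seqs.values())))
    return sum(fitch(tree, {n: s[i] for n, s in seqs.items()})[1]
               for i in range(n_sites))


def bootstrap_support(seqs, trees, target, n_reps=1000, seed=0):
    """Fraction of resampled alignments for which `target` is the unique shortest tree."""
    rng = random.Random(seed)
    n_sites = len(next(iter(seqs.values())))
    wins = 0
    for _ in range(n_reps):
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]  # sample columns with replacement
        sample = {n: "".join(s[c] for c in cols) for n, s in seqs.items()}
        lengths = {name: tree_length(t, sample) for name, t in trees.items()}
        shortest = min(lengths.values())
        winners = [name for name, length in lengths.items() if length == shortest]
        wins += winners == [target]
    return wins / n_reps


SEQS = {"I": "CTAT", "II": "CTTC", "III": "AGAC", "Anc": "AGTT"}
TREES = {
    "I+II": ("Anc", (("I", "II"), "III")),
    "I+III": ("Anc", (("I", "III"), "II")),
    "II+III": ("Anc", (("II", "III"), "I")),
}
print(bootstrap_support(SEQS, TREES, "I+II"))
```

With real data the same loop runs over hundreds or thousands of sites, and the support value is reported on each branch of the tree.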
88
1. Sometimes the Molecular Clock (based on genetic data) conflicts with the Geological Record. Why would this happen? (A) Sometimes there are gaps in the geological record, because fossils do not form everywhere, and mutation rate might vary between different species (B) Radiometric dating relies on chance events in the preservation of isotopes, making the timing events in the geological time scale less accurate than the molecular clock (C) Mutation rates slow down as you go back in time, making estimation of timing of events less accurate as you go back in time (D) The molecular clock is calculated from radioisotopes, while the geological record is obtained from fossil data. The two can conflict when fossils end up displaced from their original sedimentary layer
89
2. You are a medical researcher working on HIV
2. You are a medical researcher working on HIV. A novel strain has appeared in Madison, Wisconsin. To determine which drugs would be most effective in treating this new strain (because different strains are resistant to different drugs), you need to determine its recent evolutionary history. You decide to reconstruct the evolutionary history of HIV by using a phylogenetic approach. Thus, you collect samples from patients in various geographic locations and sequence a fragment of RNA. Using parsimony, which is the correct phylogeny for HIV-1 based on the data below?
HIV-1, Uganda, Africa: ACAUG
HIV-1, San Francisco, USA: UGAUG
HIV-1, Madison, USA: UAAGG
HIV-1, New York, USA: UAAAG
HIV-1, Paris: ACAUC
HIV-2, Africa (ancestral outgroup): ACCUG
90
3. Which of the following is most TRUE regarding phylogenetic reconstructions?
(A) Phylogenetic reconstruction based on any gene would yield the same tree (B) Parsimony is the most accurate method for reconstructing phylogenies (C) Some DNA sequence data are better for phylogenetic reconstruction than others, such as those that tend to be less subjected to selection (3rd codon positions, introns) (D) Maximum likelihood relies on maximizing distances among taxa
91
4. Which of the following types of data would be most optimal for constructing a phylogeny? (a) Non-coding and regulatory sequences (b) Non-coding and non functional sequences (c) Paralogous genes (d) Genes that have undergone purifying selection (e) Intron sequences within rapidly evolving genes
92
5. Which of the following reasons is FALSE on why the type of data chosen in the question above would be optimal for constructing a phylogeny? (a) Because selection might make taxa seem more closely related due to convergent evolution (b) Because selection might make taxa seem more distantly related due to disruptive evolution (c) Because selection might make taxa seem more closely related due to purifying selection (d) Because non-coding regulatory sequences are likely to be neutral (e) Because coding sequences are likely to be under selection
93
Answers 1A 2C 3C 4B 5D