Phylogeny or “tree thinking”

The evolutionary history of extant organisms can be understood in terms of their shared inheritance
- which extant species evolved from the same ancestor?
- how were ancestral traits modified in different lineages?
A hypothesis of the evolutionary history of a group is called its phylogeny
- often summarized in a branching diagram called a phylogenetic tree
Since we can’t travel back in time to identify common ancestors, relationships of existing species must be estimated or inferred from data
- therefore, a phylogeny is always a hypothesis

Tree thinking and phylogeny
Phylogenetic tree
Nodes: branching points
Branches: lines connecting nodes
Topology: branching pattern

Taxon = any named group of organisms (two or more = taxa)
Sister taxa: two taxa (= named groups of organisms) that are more closely related to each other than either is to a 3rd taxon
[Figure labels: A + B; C + D]

[Figure labels: recently diverged; diverged a long time ago]

Branches can be rotated at a node without changing relationships among the taxa
- these 3 trees are all equivalent
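The point about rotation can be checked mechanically. This is a minimal sketch, assuming trees are written as nested Python tuples (a representation chosen here for illustration, not one used in the slides); the function name `canonical` and the taxa A to D are likewise hypothetical.

```python
def canonical(tree):
    """Return an order-independent form of a nested-tuple tree."""
    if isinstance(tree, str):          # a tip (named taxon)
        return tree
    # Treat a node as the unordered set of its children,
    # so rotating branches at the node has no effect.
    return frozenset(canonical(child) for child in tree)

tree1 = (("A", "B"), ("C", "D"))
tree2 = (("B", "A"), ("D", "C"))      # same tree, branches rotated at two nodes
tree3 = (("A", "C"), ("B", "D"))      # a genuinely different topology

same_12 = canonical(tree1) == canonical(tree2)
same_13 = canonical(tree1) == canonical(tree3)
print(same_12, same_13)
```

Rotated drawings compare as equal, while a tree that pairs different sister taxa does not.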

Relationships can be resolved or unresolved
A node with more than 2 branches from it is called a polytomy, and means the relationships are not fully resolved
- maybe the data aren’t good enough to figure out the true relationships among descendants of that ancestor
- maybe the ancestor had multiple direct descendants: rapid speciation can occur during an adaptive radiation

Characters (traits) change across a phylogeny
Plesiomorphy: ancestral (or primitive) character state
Apomorphy: derived character state, modified from the ancestral (primitive) state
Synapomorphy: derived character shared by more than one species or group (shared derived character)
[Figure labels: ancestral; derived; synapomorphy]

[Figure labels: scales (ancestral fish); scales (bony fish); scales (early reptile); scales (snakes); feathers (birds)]
Scales = plesiomorphy; feathers = synapomorphy (unites all birds)
Synapomorphy: shared, derived character
- it’s different from the ancestral state
- it’s found in more than one species

Homologous trait: structurally similar because it was inherited from a common ancestor, even if functionally different
- human arm, dolphin flipper, bat wing are homologous: all descended from the same ancestral tetrapod appendage, with modification

Homoplasy: structurally different (unrelated), but functionally similar
- example: streamlined shape of sharks + killer whales
- adaptation to similar environment, driven by natural selection
[Figure labels: flippers = yes; arms = no; similar coding for these non-relatives]
Distinguishing between true homology and homoplasy is critical if we are to sort out the phylogeny, or evolutionary relationships, among living things

Homoplasy = problem for phylogenetics
Homoplasy: non-homologous traits look superficially similar
- resemblance is not due to common ancestry, but rather to convergent evolution in 2 different groups, which tricks us
- you don’t want to code non-relatives as having the same traits, or they will falsely group together on your tree
- example: wings in birds, wings in bats

Cladistics uses synapomorphies to identify clades, which
(a) are the groups we want to name (true groups)
(b) tell us something about the evolution of a group
For a long time, this was the only way to infer phylogeny, using morphological character traits

Cladistics uses synapomorphies alone to establish recent common ancestry
- relies on synapomorphies to define monophyletic groups, or clades (same thing)
A monophyletic group includes an ancestral species and ALL of its descendants

Types of non-monophyletic groups
Paraphyletic: leaves out one or more descendants of an ancestor
Polyphyletic: derived from 2 or more ancestral taxa (= crap)

Clade = monophyletic group
Birds = clade
Mammals = clade
Reptiles = paraphyletic group (not a clade)
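Whether a named group is a clade can be tested directly on a tree: the group is monophyletic only if some node has exactly those taxa as its descendants. A minimal sketch, assuming a nested-tuple tree and a deliberately simplified amniote tree made up for this illustration (the function names are hypothetical too):

```python
def tips(tree):
    """Set of all tip names below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(tips(child) for child in tree))

def is_monophyletic(tree, group):
    """True if some node's descendants are exactly the taxa in `group`."""
    group = set(group)
    if tips(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)

# Simplified tree for illustration only: birds nest inside "reptiles".
amniotes = ("mammal", (("lizard", "snake"), ("crocodile", "bird")))

reptiles_without_birds = {"lizard", "snake", "crocodile"}
reptiles_with_birds = {"lizard", "snake", "crocodile", "bird"}

print(is_monophyletic(amniotes, reptiles_without_birds))  # paraphyletic group
print(is_monophyletic(amniotes, reptiles_with_birds))     # clade
```

Leaving birds out of "reptiles" fails the test because no single node has only lizards, snakes and crocodiles below it.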

Why do we name para- and polyphyletic groups?
Taxonomists often assigned class or family status to groups that are very morphologically distinctive, such as birds and cetaceans (whales & dolphins)
- does not reflect their evolutionary status within other groups
- what do you think: should scientists call birds and reptiles both classes of vertebrates, if birds are really a subset of reptiles?
Another reason is that mistakes can be made in defining relationships, depending on the character that is used

Cladistics and parsimony
Techniques that identify monophyletic groups based on shared derived characters are cladistic methods
- based on the idea that the most closely related groups will have the most traits in common
Parsimony is the principle that simpler explanations are more likely to be correct than complicated explanations
- when choosing among hypotheses, we should accept the one that explains the data most simply and efficiently
- assumes convergence is less likely than shared descent

When a new trait appears, all descendants will share it as a synapomorphy (barring secondary loss of the trait)
[Figure: bird evolution; label: feathers]

Maximum parsimony = method of phylogeny reconstruction, the process by which we infer the evolutionary history of a group based on the traits we see today
- the best phylogenetic tree is the one that requires the fewest changes in traits (characters) to account for modern character states in surviving lineages
- i.e., assumes that the minimum number of changes is what really happened over the course of evolution

Using parsimony to infer evolutionary relationships
How are the major types of algae-eating sea slugs related?
- some have shells; they seem “primitive” because they retain a plesiomorphy, the ancestral snail shell
- some groups have frilly flaps on their backs... are they the most derived (= most different from the ancestor)?

“Code” the different states for each morphological character, for each existing taxon
- “has a shell” = 0, “no shell” = 1
- # of characters you can use is unlimited

[Figure labels: the characters you are using; a coded representation of one genus]
Feed all this into a computer; ask it to give you the tree that requires the fewest overall changes to produce the distribution of traits seen in modern taxa

Parsimony analysis of morphological traits gave this answer for how all genera in this group are related
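The "count the changes on each candidate tree" step can be sketched with Fitch's small-parsimony algorithm. This is an illustration under stated assumptions, not the program used for the sea slug analysis: the taxa W to Z, the 0/1 character matrix, and both candidate trees are invented, and the trees are assumed to be strictly bifurcating nested tuples.

```python
def fitch(tree, states):
    """Return (possible ancestral states, number of changes) for one character."""
    if isinstance(tree, str):                 # tip: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree                        # assumes exactly two children per node
    lset, lcost = fitch(left, states)
    rset, rcost = fitch(right, states)
    shared = lset & rset
    if shared:                                # children agree: no extra change
        return shared, lcost + rcost
    return lset | rset, lcost + rcost + 1     # children disagree: one change

def tree_length(tree, matrix):
    """Total number of changes over all coded characters."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch(tree, {taxon: row[i] for taxon, row in matrix.items()})[1]
        for i in range(n_chars)
    )

# Hypothetical coding, e.g. character 1: "has a shell" = 0, "no shell" = 1.
matrix = {"W": "000", "X": "100", "Y": "110", "Z": "111"}

tree_a = ("W", ("X", ("Y", "Z")))
tree_b = (("W", "Y"), ("X", "Z"))

length_a = tree_length(tree_a, matrix)
length_b = tree_length(tree_b, matrix)
print(length_a, length_b)
```

The tree with the smaller length is the more parsimonious of the two; a real analysis repeats this comparison over very many candidate trees.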

Homoplasy = problem for phylogenetics
Problem: Morphological traits may be unreliable for inferring evolutionary relationships if they are frequently subject to convergent evolution
- homoplasy may fool us if we rely too heavily on morphology
Can we find traits to use in phylogeny-building that are largely invisible to selection?

DNA sequence data & phylogenetics
Most modern phylogenetic trees are based partly or entirely on molecular data, usually DNA sequence information
- you can combine molecular and morphological data in a tree
- you can combine data from different genes
Some changes in DNA are largely “invisible” to selection; mutations steadily accrue over time in a clock-like manner
- more distant relatives have more changes in their DNA than close relatives
- true for non-coding junk DNA, introns, & silent substitutions

Changes provide insight into patterns of relatedness (phylogeny)
[Figure: tree with aligned sequences at the tips]
ATTCGTATTC
ATTCGTTTTC
ATTCGTTTTC
ATTCGTTTTC
ATTCGTATTC

Drawback: any given site in a DNA sequence can only occupy one of 4 possible “character states”: A, T, G, or C
- “character” is the nucleotide at a given position in the DNA
- “state” is what the character looks like in a particular species (a “G” at position 137 of the actin gene)
Mutations can change the nucleotide sequence, but then later mutations can change the sequence back to the original one

such reversions can fool us (= homoplasy)
[Figure: tree of sequences that differ only at position 7 (A or T)]
ATTCGTATTC
ATTCGTTTTC
ATTCGTTTTC
ATTCGTATTC
ATTCGTATTC
ATTCGTTTTC
ATTCGTATTC

This leads to a form of homoplasy when comparing DNA sequences, since there are only 4 possible character states
- unlike with morphology, at least this kind of homoplasy cannot result from convergent evolution; it’s “accidental”
Amino acids have 20 possible states, which is a bit better
- less chance of homoplasy from reverse-mutations to the ancestral amino acid
- however, they can also be under selection → possibility of convergent evolution exists

Take-home:

Advantages of DNA-based methods:
- changes that are largely invisible to selection do not undergo convergent evolution, so there’s no homoplasy of the kind seen in morphological characters
- you can cheaply acquire 1000’s of nucleotides; each position is potentially informative
- hopefully, the huge amount of good data will drown out the misinformation present due to “accidental” homoplasy

Disadvantages of DNA-based methods:
- homoplasy arises from reverse-mutations, which makes some nucleotide positions misleading
- data can be generated very quickly, but analysis is slow and can require months of computing time (more to follow)

Evolution of whales: who’s the nearest relative?
Classical taxonomy: whales are a sister group to the Artiodactyla
[Figure label: common ancestor of whales and other artiodactyls]
New hypothesis: whales are the sister group of hippos

4 kinds of DNA-based analysis
There are 4 ways to use DNA info to construct a phylogeny
(1) Genetic distance (fast)
- “Neighbor-Joining” or NJ tree
- pairs up the closest sequences (lowest % difference) as sister taxa, builds a tree from there
- discards info about specific changes, focuses on overall similarities and differences
- quick and dirty, not very sophisticated

Genetic distance
Fast, especially when you have a huge amount of sequence data (otherwise many possible trees; slow computation)
Branches show how much each of 2 sister taxa has diverged from their hypothetical common ancestor
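The "lowest % difference" idea behind distance methods can be sketched as follows. This shows only the first step (measure pairwise differences and find the closest pair); it is not a full neighbor-joining implementation, which corrects the distances before choosing pairs. The four sequences and the names `sp1` to `sp4` are invented for the example.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Hypothetical aligned sequences, 10 sites each.
seqs = {
    "sp1": "ATTCGTATTC",
    "sp2": "ATTCGTTTTC",
    "sp3": "ATGCGTTTAC",
    "sp4": "CTGCGTTTAG",
}

# Pairwise distance table.
distances = {
    pair: p_distance(seqs[pair[0]], seqs[pair[1]])
    for pair in combinations(seqs, 2)
}

# The pair with the lowest % difference would be joined first as sister taxa.
closest = min(distances, key=distances.get)
print(closest, distances[closest])
```

Note that the table keeps only how many sites differ, not which ones, which is the sense in which distance methods discard information about specific changes.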

(2) Maximum parsimony (medium speed)
- uses only informative sites to draw the most parsimonious tree
- finds the minimum set of changes that had to occur to produce the data you observe in present-day species
- also discards lots of information
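"Informative sites" has a concrete meaning in parsimony: a column of the alignment is informative only if at least two different nucleotides each occur in at least two taxa. A minimal sketch, using a made-up four-sequence alignment and a hypothetical function name:

```python
from collections import Counter

def informative_sites(alignment):
    """Return 0-based indices of parsimony-informative columns."""
    sites = []
    for i, column in enumerate(zip(*alignment)):
        counts = Counter(column)
        # Need at least two states that are each present in two or more taxa.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            sites.append(i)
    return sites

# Hypothetical aligned sequences (one string per taxon).
alignment = [
    "ATTCGTATTC",
    "ATTCGTTTTC",
    "ATGCGTTTAC",
    "ATGCGTATAG",
]

sites = informative_sites(alignment)
print(sites)
```

Invariant columns, and columns where a single taxon differs from all the rest, add the same number of changes to every tree, so they cannot help choose among trees.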

Parsimony analysis and DNA sequence data
Can DNA sequence comparisons tell us who’s related to whales?
- site 162: a C → T mutation defines a clade that includes hippos, whales, deer + cows

Parsimony analysis and DNA sequence data
Can DNA sequence comparisons tell us who’s related to whales?
- site 166: a G → C mutation defines hippos + whales as a clade

Position 177 shows a reversion: a change from C to T at (1), and a change from T back to C at (2)
[Figure labels: C → T (1); T → C (2)]

Every time you posit that a reversion occurred, you make the tree less parsimonious (it costs you)
- doesn’t mean reversions don’t happen; just that they are less likely than shared inheritance
The tree that requires the smallest number of changes is the most parsimonious, and therefore the “chosen” phylogeny
Problem: there can be “ties” where multiple trees require the same number of changes
- no way to choose among equally parsimonious trees

(3) Maximum likelihood (slow)
- uses a model of DNA sequence evolution to find the most likely tree, given the data
- slow; has to evaluate a huge number of possible trees to find the best one
(4) Bayesian inference (medium speed)
- also uses a model of DNA sequence evolution
- rather than looking for the single best tree, looks at many likely trees (much faster than likelihood)
- tells you how often a particular relationship shows up → i.e., what % of good trees show the clade [hippo+whale]
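The "what % of good trees show the clade" summary is just a count over a sample of trees. The sketch below covers only that counting step; how a Bayesian program generates its sample of likely trees is not shown. The four sampled trees are invented, and trees are assumed to be nested tuples.

```python
def clades(tree):
    """Return (tips below this node, set of all clades found in the subtree)."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    tip_set, found = frozenset(), set()
    for child in tree:
        child_tips, child_clades = clades(child)
        tip_set |= child_tips
        found |= child_clades
    found.add(tip_set)                 # the tips below a node form a clade
    return tip_set, found

def clade_support(trees, group):
    """Fraction of trees in which `group` appears as a clade."""
    group = frozenset(group)
    return sum(group in clades(t)[1] for t in trees) / len(trees)

# Hypothetical sample of likely trees.
sample = [
    ((("hippo", "whale"), "cow"), "pig"),
    ((("hippo", "whale"), "pig"), "cow"),
    ((("hippo", "cow"), "whale"), "pig"),
    (("hippo", "whale"), ("cow", "pig")),
]

support = clade_support(sample, {"hippo", "whale"})
print(support)
```

Here the hippo + whale clade appears in 3 of the 4 sampled trees, so its support is 0.75.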

What is a model of DNA evolution?
A model of DNA sequence evolution is a matrix of estimated rates at which all the different mutations occurred, given your data
→ rates are inferred from your sequence data, and then used to estimate the most likely tree given your data
- the model may also include a rate multiplier that allows a given site to mutate at times slower, or faster, than the G ↔ T rate
[Figure: rate matrix of transitions and transversions; values shown include 10.3, 1.0, 0.5, 0.1 and 0.01]

Advantage of likelihood/Bayesian methods:
- models of DNA sequence evolution include all sorts of rates of change, which are estimated from the data and used by computer programs to infer the likeliest trees
- used to estimate the likelihood of changes at each nucleotide position, based on the overall sequence data
- some changes are more likely than others

type of change        likelihood
A ↔ G transition      1.0
C ↔ G transversion    0.1
C ↔ A transversion    0.01

DNA-based phylogenetic analysis
type of change        relative likelihood
A ↔ G transition      often happens
C ↔ G transversion    sometimes happens
A ↔ C transversion    rarely ever happens

Can we figure out what the ancestral nucleotide was at a certain position, if we know the nucleotide at that position in four related species today?
[Figure: tree with tip states A, C, A and G, and unknown (?) ancestral states]

[Figure: one reconstruction of the ancestral states]
Parsimony: 2 changes
Likelihood: A → G (often), A → C (rarely)
Only two changes required, but one is a very rare change (likelihood doesn’t like to let rare things happen)

[Figure: an alternative reconstruction of the ancestral states]
Parsimony: 3 changes (doesn’t like it as much)
Likelihood: G → A (often), G → A (often), G → C (sometimes)
Likelihood may favor more changes if they are likely changes
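The contrast between the two reconstructions can be made concrete with a toy calculation. This is a deliberately crude sketch: it simply multiplies the relative likelihoods quoted earlier (1.0, 0.1, 0.01) for each change in a scenario, which is an assumption made here for illustration. Real likelihood programs also account for branch lengths, unchanged sites and every possible ancestral state.

```python
from math import prod

# Relative likelihoods from the slide table (direction ignored).
RELATIVE = {
    frozenset("AG"): 1.0,    # A <-> G transition, often happens
    frozenset("CG"): 0.1,    # C <-> G transversion, sometimes happens
    frozenset("AC"): 0.01,   # A <-> C transversion, rarely ever happens
}

def changes_needed(scenario):
    """What parsimony counts: the number of changes."""
    return len(scenario)

def scenario_score(scenario):
    """Toy 'likelihood': product of the relative likelihood of each change."""
    return prod(RELATIVE[frozenset(change)] for change in scenario)

fewer_changes = [("A", "G"), ("A", "C")]                # 2 changes, one rare
more_changes = [("G", "A"), ("G", "A"), ("G", "C")]     # 3 changes, none rare

print(changes_needed(fewer_changes), scenario_score(fewer_changes))
print(changes_needed(more_changes), scenario_score(more_changes))
```

Parsimony prefers the first scenario because it needs fewer changes, while the toy score prefers the second because none of its changes is rare.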

Tree parameters
In addition to estimating the model of DNA sequence evolution, the computer program will draw the phylogeny that maximizes the likelihood of seeing your sequence data, by altering 2 things:
1) the topology itself (the tree, or branching relationships)
2) branch lengths (estimate of evolutionary time, or amount of change, since a split in the tree)
[Figure: alternative topologies and branch lengths for taxa A, B and C, each with its likelihood]

Genes evolve at different rates
Some genes evolve faster than others, so you can pick a gene appropriate to the problem you are tackling
If comparing closely related species, use fast-evolving mitochondrial genes to show differences
→ mitochondria lack the sophisticated proofreading enzymes of the nucleus, so they accumulate mutations faster
If comparing distantly related taxa, use a slow-evolving nuclear gene, like the 18S ribosomal RNA or histone 3 gene

Numbers are bootstrap support (maximum parsimony or maximum likelihood) or posterior probabilities (Bayesian)
Statistical measure of how confident we are that a given node is real (that relationships really are how they appear in the tree)
- significant bootstrap levels are >70%
- significant posterior probabilities are >90%
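Bootstrap support comes from resampling the columns of the alignment with replacement, rebuilding the tree each time, and reporting the % of replicates in which a grouping reappears. The sketch below is a simplification under stated assumptions: "rebuilding the tree" is replaced by finding the closest pair of sequences, the sequences and names are invented, and ties are broken by name order.

```python
import random
from itertools import combinations

def resample_columns(seqs, rng):
    """One bootstrap pseudo-replicate: draw alignment columns with replacement."""
    length = len(next(iter(seqs.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(s[i] for i in cols) for name, s in seqs.items()}

def closest_pair(seqs):
    """Stand-in for tree building: the pair of taxa with the fewest differences."""
    def differences(pair):
        a, b = seqs[pair[0]], seqs[pair[1]]
        return sum(x != y for x, y in zip(a, b))
    return min(combinations(sorted(seqs), 2), key=differences)

def bootstrap_support(seqs, pair, replicates=200, seed=1):
    """% of pseudo-replicates in which `pair` is recovered as the closest pair."""
    rng = random.Random(seed)
    hits = sum(
        closest_pair(resample_columns(seqs, rng)) == pair
        for _ in range(replicates)
    )
    return 100 * hits / replicates

# Hypothetical aligned sequences.
seqs = {
    "sp1": "ATTCGTATTC",
    "sp2": "ATTCGTTTTC",
    "sp3": "ATGCGTTTAC",
    "sp4": "CTGCGTTTAG",
}

replicate = resample_columns(seqs, random.Random(0))
support = bootstrap_support(seqs, ("sp1", "sp2"))
print(support)
```

A grouping that depends on only one or two sites will drop out of many pseudo-replicates and receive low support, which is why the number is read as a measure of confidence in a node.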

Numbers are bootstrap support (maximum parsimony or maximum likelihood) or posterior probabilities (Bayesian)
[Figure labels: E. pratensis; E. subornata; Elysia sp. 2; Elysia tomentosa; Elysia sp. 4, Japan]
This clade is a polytomy: the relationships of these 4 species are not resolved (it’s a 4-way tie)

Mitochondrial COI gene
- every slug has a slightly different sequence
- all alleles differ by silent substitutions
- the species are 20% different at this gene
Nuclear histone H3 gene
- within a species, every slug has the same allele
- the two species are 1% different at this conserved gene

My phylogeny of sea slug family Elysiidae, based on 4 genes:
- 2 fast mitochondrial genes
- 2 slow-evolving nuclear genes
[Figure labels: Boselia (basal, or “primitive”); Thuridilla; Plakobranchus; Elysia (most species-rich genus)]

Sequences of the fast-evolving mitochondrial COI gene help to distinguish individuals within a species from different populations
- Guam versus Australia
- different Caribbean islands
Also resolve the placement of closely related (recently diverged) species, i.e., near the tips of the tree

Sequences of conserved, slow-evolving nuclear genes help to resolve ancient events (relationships among genera)
- things that happened near the root of the tree, farther back in time
For instance, can help answer group-level questions: Who is the sister group of Elysia?
… a clade of Thuridilla + Plakobranchus

Does DNA give you the same “answer” (= phylogeny) as morphological traits?
Why or why not, and which is better?

Morphological phylogeny
Morphology says Thuridilla is the sister group of Elysia
4-gene molecular phylogeny
[Figure labels: Boselia; Plakobranchus; Thuridilla; Elysiella; Elysia australis; Elysia ornata]
Our molecular phylogeny says no: Thuridilla is sister to Plakobranchus

Many kinds of molecular data are used in evolutionary biology:
a) DNA sequence comparisons
b) amino acid sequences of protein-coding genes
c) secondary structure of RNA (rRNA, tRNA genes)
d) presence/absence of introns, transposable elements
e) microsatellites
f) gene order on the chromosome
g) gene duplication events (presence/absence of genes)
h) expression patterns of genes (evo-devo)
