Phylogenetic genome analysis, phylogenomics


1 Phylogenetic genome analysis, phylogenomics
Bas E. Dutilh

2 What we can see

3 Family tree → species tree
Offspring look like their parents
Darwin: species evolve like families

4 Species tree

5 Tree of life
Archaea, Bacteria, Eukaryota

6 Phylogeny
Term coined by Ernst Haeckel (1866)
Phylon (Greek: fulon): tribe, race
Genus (Latin): birth, origin

7 Phenotype ↔ genotype
Phenotype: infinite number of features; subjective choice; value can depend on observation (etc.)
Genotype: gene/genome is finite; objective choice; a sequence is absolute

8 Convergence
Unlike phenotype or structure, sequences do not converge
Highly dimensional: every residue is a dimension

9 Phylogenetic markers
Cytochrome C: available/easy to sequence; present in all species (Fitch, Science 1967)
SSU rRNA: present in all species; constant function; slowly evolving (Woese et al, PNAS 1977)

10 SSU rRNA
The phylogeny of SSU rRNA revealed the three domains: Archaea, Bacteria, Eukaryota
Representative of the evolutionary history of species

11 Phylogenetic assumptions
Sequences are homologous: they have a common ancestor
Sequences diverge in a binary fashion
Each position evolves independently

12 Phylogenetics
Neighbor joining
Maximum parsimony: which tree assumes the fewest mutations?
Maximum likelihood: for a given model, which tree has the highest probability of generating the observed alignment?
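The parsimony question on this slide ("which tree assumes the fewest mutations?") can be illustrated with a small sketch. It scores fixed, rooted binary trees with Fitch's counting rule; the toy alignment, the nested-tuple tree encoding, and the function names are assumptions made for illustration and do not come from the slides. It only compares given trees and does not search tree space.

```python
def fitch_score(tree, states):
    """Minimum number of changes for one alignment column on a fixed tree.

    tree   -- nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D"))
    states -- dict mapping each leaf name to its character in this column
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):      # leaf: state set is the observed character
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:                     # children can agree: no change needed
            return common
        changes += 1                   # children cannot agree: count one change
        return left | right

    visit(tree)
    return changes


def parsimony_score(tree, alignment):
    """Sum the per-column Fitch scores; alignment maps leaf name -> sequence."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_score(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(length)
    )


# Hypothetical four-taxon alignment and two candidate topologies.
alignment = {"A": "ACGT", "B": "ACGA", "C": "TCGA", "D": "TCGT"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree_1, alignment), parsimony_score(tree_2, alignment))
```

Under maximum parsimony, the tree with the lower score would be preferred among the candidates that were scored.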

13 Bootstrapping
Bootstrapping: randomly re-sample all columns in the alignment with replacement; re-create trees; count presence of each branch
Jackknifing: delete a fraction of the columns; re-create tree
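The two resampling schemes can be sketched as follows. To keep the example short, tree building is replaced by a toy stand-in (`closest_pair`, which returns the two taxa with the fewest mismatches), so the "branch" being counted is just that pair; the function names, the toy alignment, and the default `delete_fraction` are assumptions, not part of the slides.

```python
import random
from collections import Counter
from itertools import combinations


def bootstrap_columns(alignment, rng):
    """Resample columns with replacement; the result has the original length."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def jackknife_columns(alignment, rng, delete_fraction=0.5):
    """Delete a random fraction of columns; the kept columns stay in order."""
    length = len(next(iter(alignment.values())))
    n_keep = length - int(length * delete_fraction)
    picks = sorted(rng.sample(range(length), n_keep))
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def closest_pair(alignment):
    """Toy stand-in for tree building: the pair of taxa with fewest mismatches."""
    def mismatches(pair):
        a, b = pair
        return sum(x != y for x, y in zip(alignment[a], alignment[b]))
    return frozenset(min(combinations(sorted(alignment), 2), key=mismatches))


def support(alignment, resample, n_replicates=100, seed=0):
    """Fraction of replicates in which each grouping is recovered."""
    rng = random.Random(seed)
    counts = Counter(
        closest_pair(resample(alignment, rng)) for _ in range(n_replicates)
    )
    return {pair: n / n_replicates for pair, n in counts.items()}


# Hypothetical alignment of three species.
alignment = {"spA": "ACGTACGTAC", "spB": "ACGTACGTTC", "spC": "TCGAACCTTG"}
print(support(alignment, bootstrap_columns))
print(support(alignment, jackknife_columns))
```

In a real analysis the stand-in would be replaced by an actual tree-building step, and support would be counted per branch of the resulting trees.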

14 Different genes tell different stories
Conflict between trees based on single genes
Unrecognized paralogy
Horizontal gene transfer
Mutation saturation, biases, divergent rates
[Figure: gene tree from a common ancestor to spec A, spec B and spec C, with paralogs and orthologs marked]

15 More data → more consistent trees
Combine information from more genes to average out these anomalies
Complete genomes contain the maximum phylogenetic information
Dutilh et al, Bioinformatics 2007

16 Chimeric genomes
Is a tree the right representation of the evolutionary history of a genome?
Endosymbiosis (mitochondrion, chloroplast)
Horizontal gene transfer (many examples, often adaptations to environment)
Darwin, 1859
Doolittle, Science 1999

17 Densitree
“Fuzzy” trees
Draw the tree many times: bootstrap replicates, different genes
Use transparency to show the fuzziness

18 Splitstree
Tries to accommodate non-bifurcating nodes
Some positions evolve independently
Parallel edges are related

19 Genomic properties
Word frequency
Sequence (nt/aa)
Gene content
Gene order

20 Fungi
Yeasts, filamentous and dimorphic fungi
Fungi are the eukaryotic clade with the largest number of completely sequenced genomes
S. cerevisiae is a well-studied model organism
Much consensus about phylogeny
Dutilh et al, Bioinformatics 2007

21 Consensus phylogeny (literature)
19 target nodes
Dutilh et al, Bioinformatics 2007

22 [Figure: panels labeled "13 trees", "14 trees", "15 trees", "12 trees"]

23 Gene content methods
Presence/absence matrix (0/1)
[Table: species (sp1, sp2, sp3, …) as rows, orthologous groups (OG1, OG2, OG3, OG4, …) as columns]
Similarity: number of shared orthologous groups (OGs)
Genomes that share few OGs are distantly related
Genomes that share many OGs are closely related
but…
Snel et al, Nat Genet 1999
Tekaia et al, Genome Res 1999
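A presence/absence matrix and the shared-OG similarity could be built along these lines. This is a minimal sketch: the three genomes, their OG assignments, and the function names are hypothetical.

```python
# Hypothetical gene content: species name -> set of orthologous groups (OGs).
genomes = {
    "sp1": {"OG1", "OG2", "OG3"},
    "sp2": {"OG1", "OG2", "OG4"},
    "sp3": {"OG4"},
}


def presence_absence(genomes):
    """Return the sorted OG names and a 0/1 row per species."""
    ogs = sorted(set().union(*genomes.values()))
    matrix = {
        sp: [1 if og in content else 0 for og in ogs]
        for sp, content in genomes.items()
    }
    return ogs, matrix


def shared_ogs(matrix, a, b):
    """Number of OGs present (1) in both rows."""
    return sum(x & y for x, y in zip(matrix[a], matrix[b]))


ogs, matrix = presence_absence(genomes)
print(ogs)
for sp, row in matrix.items():
    print(sp, row)
print(shared_ogs(matrix, "sp1", "sp2"))
```

The raw count favors large genomes, which is the caveat the slide ends on.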

24 Genome size correction
Large genomes have more genes, so they also share more genes
Divide the number of shared genes by one of:
Average genome size
Smallest of the two genomes
Weighted average genome size
[Figure: # shared genes versus genome size; P. chrysosporium labeled]
Korbel et al, TiG 2002

25 Gene content methods
Similarity: corrected number of shared genes
Distance: (1 – similarity)

\[ \mathrm{dist}(spA, spB) = 1 - \frac{\#\,\text{shared OGs}(spA, spB)}{\text{weighted average size}(spA, spB)} \]

[Table: pairwise distance matrix over sp1, sp2, sp3, sp4, …]
Neighbour joining (Saitou et al, Mol Biol Evol 1987)
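The distance on this slide can be sketched as below. The slides do not spell out how the weighted average genome size is computed; the form used here, sqrt(2)·a·b / sqrt(a² + b²), is an assumption, and the genomes and function names are hypothetical. The normalizer is a parameter, so the other corrections (for example `min`) can be swapped in. Genomes are assumed to be non-empty.

```python
import math


def weighted_average_size(a, b):
    # ASSUMED form of the weighted average genome size (not given on the slide).
    return math.sqrt(2) * a * b / math.sqrt(a * a + b * b)


def gene_content_distances(genomes, normalizer=weighted_average_size):
    """Pairwise distance: 1 - (# shared OGs / size-corrected denominator)."""
    names = sorted(genomes)
    dist = {}
    for a in names:
        for b in names:
            shared = len(genomes[a] & genomes[b])
            denominator = normalizer(len(genomes[a]), len(genomes[b]))
            dist[a, b] = 1 - shared / denominator
    return dist


# Hypothetical gene content: species name -> set of orthologous groups.
genomes = {
    "sp1": {"OG1", "OG2", "OG3"},
    "sp2": {"OG1", "OG2", "OG4"},
    "sp3": {"OG4", "OG5", "OG6"},
}
dist = gene_content_distances(genomes)
for a in sorted(genomes):
    print(a, [round(dist[a, b], 2) for b in sorted(genomes)])
```

The resulting matrix is the kind of input a neighbour-joining implementation expects; tree building itself is not shown here.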

26 Superalignment methods
Multiple alignment
Concatenate alignments (1:1:1)
A missing gene in a certain species (row) can be seen as a gap in the alignment
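Concatenation with gap-filling for missing genes might look like the following sketch. The per-gene alignments, species names, and the `concatenate` helper are hypothetical; each gene alignment is assumed to have rows of equal length.

```python
def concatenate(gene_alignments, species, gap="-"):
    """Join per-gene alignments row by row into one superalignment.

    gene_alignments -- list of dicts: species name -> aligned sequence
    species         -- row order of the superalignment
    A species missing from a gene alignment gets a run of gaps of that
    alignment's width.
    """
    rows = {sp: [] for sp in species}
    for aln in gene_alignments:
        width = len(next(iter(aln.values())))
        for sp in species:
            rows[sp].append(aln.get(sp, gap * width))
    return {sp: "".join(parts) for sp, parts in rows.items()}


# Hypothetical 1:1:1 orthologs; sp2 lacks the second gene.
genes = [
    {"sp1": "MK-L", "sp2": "MKAL", "sp3": "MRAL"},
    {"sp1": "GG", "sp3": "GA"},
]
superalignment = concatenate(genes, ["sp1", "sp2", "sp3"])
for sp, seq in superalignment.items():
    print(sp, seq)
```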

27 Select positions
Percentage gaps
Percentage conservation
GBlocks (Castresana, Mol Biol Evol 2000)
Slow-fast (Brinkmann et al, Mol Biol Evol 1999)
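Selecting positions by percentage gaps and percentage conservation can be sketched with a simple column filter. This is not the GBlocks or slow-fast algorithm; the thresholds, the definition of conservation (share of rows carrying the most common non-gap residue), and the function names are assumptions for illustration.

```python
from collections import Counter


def column_stats(column, gap="-"):
    """Return (gap fraction, conservation) for one alignment column."""
    gap_fraction = column.count(gap) / len(column)
    residues = [c for c in column if c != gap]
    if not residues:
        return gap_fraction, 0.0
    most_common_count = Counter(residues).most_common(1)[0][1]
    return gap_fraction, most_common_count / len(column)


def select_positions(alignment, max_gap_fraction=0.5, min_conservation=0.0):
    """Keep columns with few enough gaps and high enough conservation."""
    names = list(alignment)
    length = len(alignment[names[0]])
    keep = []
    for i in range(length):
        column = [alignment[name][i] for name in names]
        gap_fraction, conservation = column_stats(column)
        if gap_fraction <= max_gap_fraction and conservation >= min_conservation:
            keep.append(i)
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


# Hypothetical alignment: columns 1 and 2 are mostly gaps.
alignment = {"a": "AC-T", "b": "A--T", "c": "G--T"}
print(select_positions(alignment))
print(select_positions(alignment, min_conservation=1.0))
```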

28 Gene content vs. sequence
Gene content supertrees differ from sequence-based supertrees
Dutilh et al, Bioinformatics 2007

29 “Hot” origin of life?
Protein sequence
Gene content

30 Other evidence
Membrane composition (Plötz et al, J Biol Chem 2000)
Gene structure (Gribaldo et al, J Bact 1999)

31 Light from different angles
Sequence: phylogenetic trees (marker genes), phylogenomic trees
Gene content: gene content trees, signature genes
Phenotype: morphology, metabolism / chemistry

32 Highly similar strains
Almost identical gene content
Low recombination rate
Whole genome alignment: Mauve, Nucmer
Extract positions that are not completely conserved from the genome alignment: SNPs, small indels
Abundance
Recombination rate
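Extracting the positions that are not completely conserved can be sketched on an alignment that has already been produced and loaded as equal-length strings. Running Mauve or Nucmer and parsing their output is not shown; the strain names, sequences, and the `variable_positions` helper are hypothetical.

```python
def variable_positions(alignment, gap="-"):
    """List columns that are not identical across all strains.

    Returns (position, column, kind) tuples, where kind is "indel" if the
    column contains a gap and "SNP" otherwise. Rows follow sorted strain names.
    """
    names = sorted(alignment)
    length = len(alignment[names[0]])
    sites = []
    for i in range(length):
        column = "".join(alignment[name][i] for name in names)
        if len(set(column)) > 1:
            kind = "indel" if gap in column else "SNP"
            sites.append((i, column, kind))
    return sites


# Hypothetical whole-genome alignment of three highly similar strains.
strains = {"s1": "ACGTAC", "s2": "ACCTAC", "s3": "ACGT-C"}
sites = variable_positions(strains)
for site in sites:
    print(site)
```

The variable columns alone form a much smaller alignment that can be passed to the tree-building methods from the earlier slides.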

