Bioinformatics and Evolutionary Genomics Genome Evolution (I) and Genomics Context for function prediction.


1 Bioinformatics and Evolutionary Genomics Genome Evolution (I) and Genomics Context for function prediction

2 (Bacterial) genome evolution: added value of genomes
-How does gene content evolve?
-How does gene order evolve?
-How important are various evolutionary dynamics of genes on a genomic scale (e.g. gene fusion, gene loss, gene duplication)? Moving from anecdotes to trends.

3 functionally associated proteins leave evolutionary traces of their relation in genomes Genomic context / in silico interaction prediction

4 Gene order evolution:
-Establish orthologous relations between pairs of genomes (e.g. with the Smith-Waterman (S-W) best bidirectional hit approach).
-Put them in a dotplot and color each point by the relative direction of transcription (green for the same relative direction, red for the opposite direction).
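The two steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used for the original figure: the score dictionaries stand in for Smith-Waterman results, and the names `best_hits`, `bidirectional_best_hits` and `dotplot_points` are hypothetical.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring hit in the other genome.

    scores: dict {(query_gene, target_gene): alignment_score}
    (assumed to come from an all-against-all search such as Smith-Waterman).
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted pairs (gene_a, gene_b) that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


def dotplot_points(pairs, pos_a, pos_b, strand_a, strand_b):
    """One (x, y, colour) point per ortholog pair.

    Green: same relative direction of transcription; red: opposite.
    """
    points = []
    for a, b in pairs:
        colour = "green" if strand_a[a] == strand_b[b] else "red"
        points.append((pos_a[a], pos_b[b], colour))
    return points


# Toy data (invented for illustration only).
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50, ("a2", "b2"): 120, ("a3", "b2"): 90}
b_vs_a = {("b1", "a1"): 200, ("b2", "a2"): 120, ("b2", "a3"): 90, ("b3", "a3"): 40}
orthologs = bidirectional_best_hits(a_vs_b, b_vs_a)
points = dotplot_points(
    orthologs,
    pos_a={"a1": 1, "a2": 2}, pos_b={"b1": 5, "b2": 3},
    strand_a={"a1": "+", "a2": "+"}, strand_b={"b1": "+", "b2": "-"},
)
```

Here `a3` is dropped because its best hit `b2` prefers `a2`. The resulting points can be passed to any plotting tool.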

5

6 Evolution of genome organization:
-In prokaryotes, genome inversions centered around the origin/terminus of replication are a major source of genome rearrangements.
-This suggests that both replication forks are in close contact, so comparative genome analysis provides support for a hypothesis about genome replication.
“and a close proximity of the forks would increase the probability of reciprocal recombination or transposition between sequences at the two forks. That the forks are near each other is also consistent with the 'replication factory' model based on immunolocalization of components of the replication machinery in Bacillus subtilis” (Tillier and Collins, Nat. Gen.)

7 Gene order evolves rapidly But …

8 Gene order evolution: differential retention of divergent / convergent gene pairs suggests that conservation implies a functional association. [Figure label: operons]

9 Conserved gene order
-i.e. genes that are present in the same gene cluster over ‘sufficiently large’ evolutionary distances
-Contributes many reliable predictions

10 Conserved gene order
-NB1: predicting operons is not trivial; in fact, conserved gene order or functional association is a major clue.
-NB2: using ‘only’ operons without requiring conservation results in much less reliable function prediction.
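A rough sketch of how conserved neighbours could be counted is given here. It assumes each genome is already reduced to an ordered list of orthologous-group identifiers, treats chromosomes as linear, ignores strand, and uses the number of genomes as a crude stand-in for ‘sufficiently large’ evolutionary distance. The function names and the `min_genomes` threshold are hypothetical.

```python
from collections import Counter


def adjacent_pairs(gene_order):
    """Unordered pairs of distinct orthologous groups that are direct neighbours."""
    return {
        frozenset(pair)
        for pair in zip(gene_order, gene_order[1:])
        if pair[0] != pair[1]
    }


def conserved_neighbours(genomes, min_genomes=2):
    """Pairs of orthologous groups adjacent in at least `min_genomes` genomes.

    genomes: dict {genome_name: [og_id, og_id, ...]} in chromosomal order.
    """
    counts = Counter()
    for order in genomes.values():
        counts.update(adjacent_pairs(order))
    return sorted(
        tuple(sorted(pair)) for pair, n in counts.items() if n >= min_genomes
    )


# Toy gene orders (invented for illustration only).
genomes = {
    "g1": ["OG1", "OG2", "OG3", "OG4"],
    "g2": ["OG3", "OG2", "OG1", "OG9"],
    "g3": ["OG4", "OG7", "OG1", "OG2"],
}
conserved = conserved_neighbours(genomes, min_genomes=2)
```

In line with NB2, raising `min_genomes` (or requiring that the genomes be distantly related) trades coverage for reliability.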

11 Comparison to pathways: conservation implies a functional association

12 Conserved gene order: an example from metabolism of propionyl-CoA [Figure labels: “query”, “target”]

13 Biochemical assays confirm the function of members of COG0346 as a DL-methylmalonyl-CoA racemase

14 Gene fusion
-“Rare” (especially in prokaryotes): ~3000 linked COGs in STRING v6 (~180 genomes)
-But what about domain recombination?

15 Gene fusion
-i.e. the orthologs of two genes in another organism are fused into one polypeptide
-A very reliable indicator of functional interaction, partly because it is a relatively infrequent evolutionary event

16 Gene fusion: an example

17 Gene Content Evolution

18 What about HGT? Genome trees based on gene content. [Figure: Escherichia coli compared with Haemophilus influenzae, showing shared genes and species-specific genes; gene counts lost in extraction]

19 Genome trees based on gene content
-Presence / absence matrix: species (sp1, sp2, sp3, …) against orthologous groups (OG1, OG2, OG3, OG4, …) [values lost in extraction]
-Distance: $\mathrm{dist}(spA, spB) = 1 - \dfrac{\#\,\text{shared OGs}(spA, spB)}{\text{weighted average genome size}(spA, spB)}$
-Distance matrix d: species against species (sp1, sp2, sp3, sp4, …) [most values lost in extraction]
-Tree: neighbor joining on d
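The distance on this slide can be sketched as follows. The slide does not say how the average genome size is weighted, so the `weight_small` parameter (the weight given to the smaller genome) is an assumption made here to keep the distance symmetric; the function names are hypothetical too.

```python
from itertools import combinations


def gene_content_distance(ogs_a, ogs_b, weight_small=0.5):
    """dist = 1 - (# shared OGs) / (weighted average genome size).

    ogs_a, ogs_b: sets of orthologous-group identifiers present in each genome.
    weight_small: assumed weight of the smaller genome in the average.
    """
    shared = len(ogs_a & ogs_b)
    small, large = sorted((len(ogs_a), len(ogs_b)))
    average_size = weight_small * small + (1.0 - weight_small) * large
    if average_size == 0:
        raise ValueError("cannot compare two empty genomes")
    return 1.0 - shared / average_size


def distance_matrix(genomes, weight_small=0.5):
    """Symmetric nested-dict distance matrix over all genome pairs."""
    names = sorted(genomes)
    dist = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        d = gene_content_distance(genomes[a], genomes[b], weight_small)
        dist[a][b] = dist[b][a] = d
    return dist


# Toy presence / absence data (invented for illustration only).
genomes = {
    "sp1": {"OG1", "OG2", "OG3", "OG4"},
    "sp2": {"OG1", "OG2", "OG3"},
    "sp3": {"OG1", "OG5"},
}
d = distance_matrix(genomes)
```

The resulting matrix is what a neighbor-joining implementation would take as input; tree building itself is not shown here.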

20 Genome trees based on gene content are remarkably similar to the consensus on the Tree of Life (ToL). [Figure: gene-content tree of C. pneumoniae, C. trachomatis, M. tuberculosis, M. pneumoniae, M. genitalium, B. subtilis, T. pallidum, T. maritima, B. burgdorferi, P. horikoshii, M. thermoautotrophicum, A. fulgidus, M. jannaschii, S. cerevisiae, C. elegans, A. aeolicus, E. coli, H. influenzae, R. prowazekii, H. pylori, Synechocystis sp., H. pylori J99 and A. pernix; labelled clades: Proteobacteria, Eukarya, Euryarchaeota, Spirochaetales; one support value of 100 shown]

21 Reconstruction of gene content

22 Reconstruction of LUCA: patchy gene distributions [Figure labels: deletion, gain, ancestral genome]

23 Parsimony
-Attach a cost (g) to HGT / independent gain, expressed in terms of loss; find the scenario with the lowest cost
-At g = 1.5, 733 genes in LUCA
-At g = 2, 956 genes in LUCA
-Evolution is not parsimonious: a minimal estimate?
-Why not use gene trees?
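To make the cost trade-off concrete, here is a small weighted-parsimony (Sankoff-style) sketch for one gene on a fixed species tree. It is an illustration under stated assumptions, not the algorithm behind the numbers on the slide: a loss costs 1, a gain (HGT or independent gain) costs `gain_cost`, presence at the root is free, and ties are resolved as "absent". The tree encoding and function names are hypothetical.

```python
def subtree_costs(tree, present, gain_cost):
    """Return (cost if this node lacks the gene, cost if it has the gene).

    tree: a leaf name (str) or a tuple of subtrees.
    present: set of leaf names in which the gene is observed.
    """
    if isinstance(tree, str):
        inf = float("inf")
        return (inf, 0.0) if tree in present else (0.0, inf)
    cost_absent, cost_present = 0.0, 0.0
    for child in tree:
        child_absent, child_present = subtree_costs(child, present, gain_cost)
        # Parent lacks the gene: child also lacks it, or gains it.
        cost_absent += min(child_absent, child_present + gain_cost)
        # Parent has the gene: child keeps it, or loses it (cost 1).
        cost_present += min(child_present, child_absent + 1.0)
    return cost_absent, cost_present


def in_ancestor(tree, present, gain_cost):
    """True if the cheapest scenario places the gene at the root."""
    cost_absent, cost_present = subtree_costs(tree, present, gain_cost)
    return cost_present < cost_absent


# Toy tree and a patchy distribution (invented for illustration only).
tree = ((("A", "B"), "C"), ("D", "E"))
present = {"A", "D"}
cheap_gain = in_ancestor(tree, present, gain_cost=1.0)   # two gains win
costly_gain = in_ancestor(tree, present, gain_cost=2.0)  # ancestral + losses win
```

With this patchy distribution the gene is placed in the ancestor only at the higher gain cost, which matches the direction of the trend on the slide: a larger g assigns more genes to LUCA.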

24 Nice results e.g. nucleotide biosynthesis

25 Another attempt to reconstruct the genome of LUCA
-Over 1000 gene families, of which more than 90% are also functionally characterized.
-A fairly complex genome similar to those of free-living prokaryotes, with a variety of functional capabilities including metabolic transformation, information processing, membrane/transport proteins and complex regulation.

26 Presence / absence of genes: gene content and co-evolution (the easy case: few genomes)
-Genomes share genes for phenotypes they have in common.
-Differences between gene content reflect differences in phenotypic potentialities.

27 Qualitative differential genome analysis:
-Find “pathogen-specific” proteins that can serve as drug targets.
-Relate the differences between genomes to the differences in the phenotypes.

28 Three-way comparisons Huynen et al., 1998, FEBS Lett

29 Convergence in functional classes of gene content in small intracellular bacterial parasites (Zomorodipour & Andersson, FEBS Letters 1999)

30 Although we can, qualitatively, interpret the variations in shared gene content in terms of the phenotypes of the species, quantitatively they depend on the relative phylogenetic positions of the species. The more closely related two species are, the larger the fraction of their genes they share.

31 Presence / absence of genes L. innocua (non-pathogen) L. monocytogenes (pathogen)

32 Occurrence of genes: L. innocua (non-pathogenic), L. monocytogenes (pathogenic). Genes involved in pathogenicity.

33 Generalization: phylogenetic profiles / co-occurrence [Figure: presence / absence table of genes (Gene 1, Gene 2, Gene 3, …) against species (species 1, species 2, species 3, species 4, …); values lost in extraction]

34 Co-occurrence of genes across genomes
-i.e. two genes have the same presence / absence pattern over multiple genomes
-AKA phylogenetic profiles
-NB: complete genomes are needed to establish absence
-Correction for phylogenetic signal needed → events
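A minimal sketch of profile comparison follows, assuming each gene is described by a tuple of 0/1 values over the same ordered list of complete genomes. It counts mismatching genomes directly and applies none of the phylogenetic correction the slide calls for; the names and the `max_mismatches` threshold are hypothetical.

```python
from itertools import combinations


def hamming(profile_a, profile_b):
    """Number of genomes in which exactly one of the two genes is present."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


def co_occurring_pairs(profiles, max_mismatches=0):
    """Gene pairs whose presence / absence profiles (nearly) coincide.

    profiles: dict {gene: tuple of 0/1, one entry per genome}.
    Genes present in all genomes or in none are skipped, since their
    profiles match each other trivially and carry no signal.
    """
    informative = {
        gene: p for gene, p in profiles.items() if 0 < sum(p) < len(p)
    }
    return [
        (g1, g2)
        for (g1, p1), (g2, p2) in combinations(sorted(informative.items()), 2)
        if hamming(p1, p2) <= max_mismatches
    ]


# Toy profiles over five genomes (invented for illustration only).
profiles = {
    "geneA": (1, 0, 1, 1, 0),
    "geneB": (1, 0, 1, 1, 0),
    "geneC": (1, 1, 1, 1, 1),
    "geneD": (1, 0, 1, 0, 0),
}
exact = co_occurring_pairs(profiles)
relaxed = co_occurring_pairs(profiles, max_mismatches=1)
```

Counting independent gain and loss events on a species tree, rather than genomes, would be one way to address the phylogenetic-signal caveat; that step is not attempted here.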

35 Predicting the function of a disease-gene protein of unknown function, frataxin, using co-occurrence of genes across genomes
-Friedreich’s ataxia
-No (homolog with) known function

36 Frataxin has co-evolved with hscA and hscB, indicating that it plays a role in iron-sulfur cluster assembly. [Figure: phylogenetic profiles over A. aeolicus, Synechocystis, B. subtilis, M. genitalium, M. tuberculosis, D. radiodurans, R. prowazekii, C. crescentus, M. loti, N. meningitidis, X. fastidiosa, P. aeruginosa, Buchnera, V. cholerae, H. influenzae, P. multocida, E. coli, A. pernix, M. jannaschii, A. thaliana, S. cerevisiae, C. jejuni, C. albicans, S. pombe, H. sapiens, C. elegans, H. pylori and D. melanogaster; gene labels: cyaY, Yfh1, hscB, Jac1, hscA, ssq1, Nfu1, iscA, Isa1-2, fdx, Yah1, Arh1, RnaM, IscR, Hyp, iscS, Nfs1, iscU, Isu1-2, Atm1]

37 Iron-Sulfur (2Fe-2S) cluster in the Rieske protein

38 Prediction: ~Confirmation:

39 functionally associated proteins leave evolutionary traces of their relation in genomes Genomic context / in silico interaction prediction

40 Evolutionary rate (Chen & Dokholyan, TiG 2006)

41 Co-evolution: mirrortree (Pazos & Valencia, PEDS 2001)

42 Co-evolution: mirrortree
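The slides only name the method, so the following is a sketch of the general mirrortree idea rather than the published implementation: two protein families are scored by the correlation between their inter-species distance matrices. It assumes both matrices are symmetric and list the same species in the same order; the function names are hypothetical.

```python
import math


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant distances")
    return cov / math.sqrt(var_x * var_y)


def mirrortree_score(dist_a, dist_b):
    """Correlation between the pairwise distances of two protein families."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


# Toy distance matrices over three species (invented for illustration only).
family_a = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
family_b = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
score = mirrortree_score(family_a, family_b)
```

A high score on its own partly reflects the shared species tree, so in practice some correction for that background signal is needed before reading it as co-evolution.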

43 Integrating genomic context scores into one single score (post hoc)
-Compare each individual method against an independent benchmark (KEGG), and find “equivalency”.
-Multiply the chances that two proteins are not interacting and subtract from 1: naive Bayesian, i.e. assuming independence.
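The combination rule in the second bullet is small enough to write out. The sketch assumes each method's raw score has already been calibrated against the benchmark into a probability between 0 and 1; the function name is hypothetical.

```python
import math


def combined_score(probabilities):
    """Combine per-method probabilities of association, assuming independence.

    Implements 1 - product(1 - p_i): the chance that at least one line of
    evidence is right, given that each p_i is a calibrated probability.
    """
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return 1.0 - math.prod(1.0 - p for p in probabilities)


# Example: gene order 0.6, gene fusion 0.5, co-occurrence 0.2 (invented values).
score = combined_score([0.6, 0.5, 0.2])  # 1 - 0.4 * 0.5 * 0.8
```

Because the methods are not truly independent, a score combined this way tends to be optimistic, which is the price of the naive assumption.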

