Nothing in (computational) biology makes sense except in the light of evolution after Theodosius Dobzhansky (1970) Comparative genomics, genome context.


1 “Nothing in (computational) biology makes sense except in the light of evolution” (after Theodosius Dobzhansky, 1970). Comparative genomics, genome context and genome annotation

2 Genome context analysis and genome annotation: using information other than homologous relationships between individual genes/proteins for functional prediction (guilt by association). Types of context analysis: phyletic patterns; domain fusion (“Rosetta Stone” proteins); gene order conservation; co-expression; …

3

4

5

6 COGs: Goals
1. Using gene sets from complete genomes, delineate families of orthologs and paralogs: Clusters of Orthologous Groups (of genes) (COGs)
2. Using COGs, develop an engine for functional annotation of new genomes
3. Apply COGs for analysis of phylogenetic patterns

7 COG: a group of homologous proteins such that all proteins from different species are orthologs (all proteins from the same species in a COG are paralogs)

8 CONSTRUCTION OF COGs FOR 8 COMPLETE GENOMES
Input: complete set of proteins from the analyzed genomes
1. FULL SELF-COMPARISON (BLASTPGP, no cut-off)
2. Collapse obvious paralogs
3. Detect all interspecies Best Hits (BeTs) between individual proteins or groups of paralogs
4. Detect all triangles of consistent BeTs
5. Merge triangles with common edges
6. Detect groups with multidomain proteins and isolate domains
REPEAT STEPS 3-5
Output: COGs
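The BeT-triangle part of this procedure (detect interspecies best hits, find triangles, merge triangles that share an edge) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual COG pipeline: the function names, the toy scores and the protein labels are hypothetical, a BeT in either direction is treated as an undirected edge, and paralog collapsing and domain splitting are left out.

```python
from itertools import combinations


def best_hits(scores):
    """Return undirected BeT edges from a {(query, subject): score} table.

    Proteins are (genome, gene) tuples. For each query and each *other*
    genome, only the best-scoring subject is kept (assumption: an edge
    exists if it is a best hit in at least one direction).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query[0] == subject[0]:
            continue  # ignore hits within the same genome
        key = (query, subject[0])
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)
    return {frozenset((query, subject)) for (query, _), (subject, _) in best.items()}


def find_triangles(edges):
    """Find all triples of proteins from three genomes joined by three edges."""
    neighbours = {}
    for a, b in map(tuple, edges):
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    triangles = set()
    for a, b in map(tuple, edges):
        for c in neighbours[a] & neighbours[b]:
            if c[0] not in (a[0], b[0]):
                triangles.add(frozenset((a, b, c)))
    return triangles


def merge_triangles(triangles):
    """Merge triangles that share an edge (two proteins) into clusters."""
    triangles = sorted(triangles, key=sorted)  # deterministic order
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edge_owner = {}
    for i, tri in enumerate(triangles):
        for edge in combinations(sorted(tri), 2):
            j = edge_owner.setdefault(edge, i)  # first triangle seen with this edge
            parent[find(i)] = find(j)
    clusters = {}
    for i, tri in enumerate(triangles):
        clusters.setdefault(find(i), set()).update(tri)
    return list(clusters.values())


# Toy data (hypothetical): four genomes A-D, one conserved family plus a stray pair.
A1, B1, C1, D1 = ("A", "a1"), ("B", "b1"), ("C", "c1"), ("D", "d1")
scores = {
    (A1, B1): 100, (B1, A1): 100, (A1, C1): 90, (C1, A1): 90,
    (B1, C1): 95, (C1, B1): 95, (B1, D1): 80, (D1, B1): 80,
    (C1, D1): 85, (D1, C1): 85,
    (("A", "a2"), ("B", "b2")): 50,  # isolated best hit, forms no triangle
}
cogs = merge_triangles(find_triangles(best_hits(scores)))
print(cogs)
```

In this toy input the triangles A-B-C and B-C-D share the edge B-C, so they merge into a single four-member cluster, while the isolated a2-b2 hit never enters a triangle and is left out.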

9 A TRIANGLE OF BeTs IS A MINIMAL, ELEMENTARY COG

10 A RELATIVELY SIMPLE COG PRODUCED BY MERGING ADJACENT TRIANGLES

11 A COMPLEX COG WITH MULTIPLE PARALOGS

12 Current status of the COGs
Prokaryotes: 11 Archaea + 1 unicellular eukaryote + 46 bacteria = 58 complete genomes; 149,321 proteins; 105,861 proteins in 4075 COGs (71%)
Eukaryotes: 4 animals + 1 plant + 2 fungi + 1 microsporidium = 8 complete genomes; 142,498 proteins; 74,093 proteins in 4822 COGs (52%)

13 COGnitor...

14 …IN ACTION

15

16

17

18 The Universal COGs

19 Search for genomic determinants of hyperthermophily

20

21 Search for unique archaeo-eukaryotic genes
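A phyletic-pattern search of this kind can be expressed as a set query: keep the COGs that are present in every genome of one group and absent from every genome of another. The sketch below is a minimal illustration; the function name, the COG identifiers and the genome codes are hypothetical, and the real searches may use looser criteria.

```python
def select_by_pattern(patterns, required, forbidden):
    """Return COG ids present in all `required` genomes and in no `forbidden` genome.

    patterns: dict mapping a COG id to the set of genomes that have a member.
    """
    required, forbidden = set(required), set(forbidden)
    return sorted(
        cog for cog, genomes in patterns.items()
        if required <= genomes and not (forbidden & genomes)
    )


# Hypothetical toy patterns over five genomes.
archaea = {"Mjan", "Aful"}
eukaryotes = {"Scer"}
bacteria = {"Ecol", "Bsub"}
patterns = {
    "cog_1": {"Mjan", "Aful", "Scer"},          # archaeo-eukaryotic only
    "cog_2": {"Mjan", "Aful", "Scer", "Ecol"},  # also in one bacterium
    "cog_3": {"Ecol", "Bsub"},                  # bacterial only
    "cog_4": {"Mjan", "Scer"},                  # missing from one archaeon
}

archaeo_eukaryotic = select_by_pattern(patterns, archaea | eukaryotes, bacteria)
bacterial_only = select_by_pattern(patterns, bacteria, archaea | eukaryotes)
print(archaeo_eukaryotic, bacterial_only)
```

Swapping the two groups in the call gives the complementary query for genes found only in bacteria.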

22

23 A complementary pattern: search for unique bacterial genes

24

25 Essential function… but holes in the phyletic pattern. Strict complementary pattern

26

27 Relaxed complementary pattern
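One way to make the strict and relaxed notions concrete: two COGs are strictly complementary when every genome has exactly one of them, and the relaxed version tolerates a few exceptions. The slides do not give the exact relaxation criteria, so the `max_both` and `max_neither` thresholds below are assumptions, as are the function name and the toy genome codes.

```python
def is_complementary(p1, p2, genomes, max_both=0, max_neither=0):
    """Check whether two phyletic patterns complement each other.

    p1, p2: sets of genomes in which each COG is present.
    max_both: tolerated number of genomes carrying both COGs (assumed criterion).
    max_neither: tolerated number of genomes carrying neither (assumed criterion).
    With both thresholds at 0 the test is strict.
    """
    if not p1 or not p2:
        return False  # an empty pattern is not informative
    both = p1 & p2
    neither = set(genomes) - (p1 | p2)
    return len(both) <= max_both and len(neither) <= max_neither


genomes = ["Mjan", "Aful", "Scer", "Ecol", "Bsub"]
pattern_a = {"Mjan", "Aful", "Scer"}
pattern_b = {"Ecol", "Bsub"}
pattern_c = {"Ecol", "Bsub", "Scer"}  # overlaps pattern_a in one genome

strict_ab = is_complementary(pattern_a, pattern_b, genomes)
strict_ac = is_complementary(pattern_a, pattern_c, genomes)
relaxed_ac = is_complementary(pattern_a, pattern_c, genomes, max_both=1)
print(strict_ab, strict_ac, relaxed_ac)
```

Pairs that pass such a test are candidates for unrelated or distantly related proteins that carry out the same essential function in different lineages.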

28

29 Relaxed complementary pattern with extra restrictions

30

31

32

33

34 Conservation of gene order in bacterial species of the same genus: M. genitalium vs M. pneumoniae

35 Conservation of gene order in closely related bacterial genera: C. trachomatis vs C. pneumoniae

36 Lack of gene order conservation, even in “closely related” bacteria of the same Proteobacterial subdivision: P. aeruginosa vs E. coli

37 Genome Alignments - Method
1. Protein sets from complete genomes
2. BLAST cross-comparison
3. Table of Hits
4. Pairwise Genome Alignment: local alignment algorithm Lamarck (gap opening penalty, gap extension penalty); statistics with Monte Carlo simulations
5. Template-Anchored Genome Alignment
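The pairwise step can be illustrated with a generic Smith-Waterman-style local alignment over gene orders, where two genes "match" if the pair appears in the table of hits. This is not the Lamarck program itself: the function names, the scoring values and the shuffle-based significance estimate are assumptions chosen for the sketch, and gene orientation is ignored.

```python
import random


def local_gene_order_score(genome1, genome2, hits, match=1.0, mismatch=-1.0,
                           gap_open=-1.0, gap_extend=-0.5):
    """Best local alignment score between two gene orders (affine gaps).

    genome1, genome2: lists of gene ids in chromosomal order.
    hits: set of (gene1, gene2) pairs taken from the table of hits.
    A gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    n, m = len(genome1), len(genome2)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]      # best score ending at (i, j)
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]  # ends by skipping a gene of genome2
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]  # ends by skipping a gene of genome1
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            pair_score = match if (genome1[i - 1], genome2[j - 1]) in hits else mismatch
            H[i][j] = max(0.0, H[i - 1][j - 1] + pair_score, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


def shuffle_p_value(genome1, genome2, hits, trials=200, seed=0):
    """Monte Carlo estimate: how often does a shuffled gene order score as well?"""
    rng = random.Random(seed)
    observed = local_gene_order_score(genome1, genome2, hits)
    shuffled = list(genome2)
    at_least = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if local_gene_order_score(genome1, shuffled, hits) >= observed:
            at_least += 1
    return (at_least + 1) / (trials + 1)


# Hypothetical toy gene orders: a conserved string of four genes with one insertion.
genome1 = ["a1", "a2", "a3", "a4", "a5"]
genome2 = ["x", "b1", "b2", "y", "b3", "b4"]
hits = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}

score = local_gene_order_score(genome1, genome2, hits)
p_value = shuffle_p_value(genome1, genome2, hits)
print(score, p_value)
```

With these toy scores the best local alignment pairs all four homologous genes and opens one gap for the inserted gene `y`, giving 4 matches minus one gap opening. A real analysis would also need traceback to report the aligned gene strings, not just the score.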

38

39 Genome Alignments - Statistics Distribution of conserved gene string lengths

40 Genome Alignments - Statistics
Pairwise alignments    No. strings   No. genes   % in Gen1   % in Gen2
All homologs
  ecoli-hinf           138           566         13%         33%
  ecoli-bsub           89            322         8%          8%
  ecoli-mjan           10            30          1%          2%
Probable orthologs
  ecoli-hinf           105           482         11%         28%
  ecoli-bsub           34            168         4%          4%
  ecoli-mjan           12            33          1%          2%

41 Genome Alignments - Statistics: breakdown of genes in the genome. Categories: not in gene strings; in non-conserved gene strings (directons); in conserved gene strings

42 Genome Alignments - Statistics: fraction of the genome in conserved gene strings, from template-anchored alignments (listed from minimum to maximum)
Synechocystis sp.         5%
Aquifex aeolicus          10%
Archaeoglobus fulgidus    13%
Escherichia coli          14%
Treponema pallidum        17%
Thermotoga maritima       23%
Mycoplasma genitalium     24%

43 Context-Based Prediction of Protein Functions: A Novel Translation Factor (COG0536). L21; L27; GTPase?; GTP-binding translation factor

44 Context-Based Prediction of Protein Functions: A Novel Translation Factor (COG0012). TGS domain-containing GTPase?; Peptidyl-tRNA hydrolase; GTP-binding translation factor

45

46

