Computational genomic strategies for natural product discovery


1 Computational genomic strategies for natural product discovery
Dr. Marnix H. Medema
Bioinformatics Group, Wageningen University, The Netherlands
EBI Course: Exploiting Metagenomics
Thursday, December 3rd, 2015, 11:00h

2 Microbial biosynthetic pathways: a great source of valuable molecules

3 Specialized metabolites play key roles in microbiomes

4 Specialized metabolites play key roles in microbiomes
Donia et al. (2014) Cell 158:

5 Diverse and complex enzymology produces chemical diversity: RiPPs
RiPPs: Ribosomally synthesized and Posttranslationally modified Peptides. Huge assembly lines, but not the only mechanism. Ortega et al. (2015), Nature 517:

6 Diverse and complex enzymology produces chemical diversity: nonribosomal peptides
Key enzyme class: Nonribosomal Peptide Synthetase (NRPS). NRPSs can introduce nonproteinogenic amino acids into peptides! Huge assembly lines, but not the only mechanism.

7 Diverse and complex enzymology produces chemical diversity: nonribosomal peptides
Huge assembly lines, but not the only mechanism. Schmartz et al. (2014), Nat. Prod. Rep. 12:

8 Diverse and complex enzymology produces chemical diversity: polyketides
Key enzyme class: Polyketide synthase (PKS). Huge assembly lines, but not the only mechanism. Menzella et al. (2005), Nat. Biotechnol. 23:

9 Diverse and complex enzymology produces chemical diversity: polyketides
Not all polyketide synthases are modular; some are iterative (fungal type I, type II, type III, etc.). Huge assembly lines, but not the only mechanism. Shen et al. (2003), Curr. Opin. Chem. Biol. 7:

10 Diverse and complex enzymology produces chemical diversity: terpenes
Key enzyme classes: terpene synthases / cyclases. These turn isoprene precursors into mature terpenoids. Huge assembly lines, but not the only mechanism. Gao et al. (2012), Nat. Prod. Rep. 29:

11 Diverse and complex enzymology produces chemical diversity: saccharides
Key enzyme class: glycosyl transferase. Huge assembly lines, but not the only mechanism. McCranie & Bachmann et al. (2014), Nat. Prod. Rep. 31:

12 Biosynthetic gene clusters: the genetic basis of molecular diversity
So if we can find new gene clusters, we can find new chemicals! Now how to find new gene clusters?

13 Modularity of biosynthetic gene clusters
Second strategy Cacho et al. (2015) Front. Microbiol 5: 774.

14 Modularity of biosynthetic gene clusters
Second strategy Medema, Cimermancic et al. (2015) PLoS Comp. Biol. 10: e

15 antiSMASH: A Web Server for the Detection and analysis of biosynthetic gene clusters
Medema et al. (2011) Nucl. Acids Res. 39: W339-W346. Blin, Medema et al. (2013) Nucl. Acids Res. 41: W

16 Core structure prediction for polyketide synthase and nonribosomal peptide synthetase gene clusters
Medema et al. (2011) Nucl. Acids Res. 39: W339-W346. Blin, Medema et al. (2013) Nucl. Acids Res. 41: W

17 Comparative analysis and subcluster detection
Medema et al. (2011) Nucl. Acids Res. 39: W339-W346. Blin, Medema et al. (2013) Nucl. Acids Res. 41: W

18 Another Method to Detect Biosynthetic Gene Clusters in Prokaryotic Genomes
The training set consisted of 732 biosynthetic gene clusters of known compounds:
136 type I polyketides
100 nonribosomal peptides
76 type II polyketides
82 polyketide-peptide hybrids
93 oligo- and polysaccharides
38 aminoglycosides
36 terpenoids
27 ribosomal peptides
23 lantibiotics
13 indolocarbazoles
11 type III polyketides
9 fatty acids
9 siderophores
8 nucleosides
6 beta-lactams
4 aminocoumarins
61 others
Cimermancic, Medema, Claesen et al. (2014) Cell 158:
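As a quick consistency check on the numbers above, the class counts can be tallied and turned into fractions of the training set. This is only a sketch: the counts are copied from the slide, while the dictionary name and helper function are illustrative and not part of the published method.

```python
# Class sizes as listed on the slide (Cimermancic, Medema, Claesen et al. 2014).
# The variable and function names are illustrative assumptions.
TRAINING_SET = {
    "type I polyketides": 136,
    "nonribosomal peptides": 100,
    "type II polyketides": 76,
    "polyketide-peptide hybrids": 82,
    "oligo- and polysaccharides": 93,
    "aminoglycosides": 38,
    "terpenoids": 36,
    "ribosomal peptides": 27,
    "lantibiotics": 23,
    "indolocarbazoles": 13,
    "type III polyketides": 11,
    "fatty acids": 9,
    "siderophores": 9,
    "nucleosides": 8,
    "beta-lactams": 6,
    "aminocoumarins": 4,
    "others": 61,
}


def class_fractions(counts):
    """Return each class's share of the total number of clusters."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    print("total clusters:", sum(TRAINING_SET.values()))
    ranked = sorted(class_fractions(TRAINING_SET).items(),
                    key=lambda item: item[1], reverse=True)
    for name, share in ranked:
        print(f"{name:28s} {share:6.1%}")
```

The per-class counts add up to the 732 clusters stated on the slide, and the ranking makes the imbalance between frequent and rare classes explicit.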

19 Large metagenomic datasets may contain very large numbers of biosynthetic gene clusters
Now there are of course both rare and frequently occurring classes of gene clusters / compounds. What we had not expected was to find large clusters within this network that contain no known gene clusters. We chose one of these regions, which contained two related families of hundreds of gene clusters encoding, among others, very unusual ketosynthases and CoA-ligases. Cimermancic, Medema, Claesen et al. (2014) Cell 158:

20 Data on BGCs is scattered and not systematically stored

21 The minimum information about a biosynthetic gene cluster (MIBiG)
Medema et al. (2015) Nature Chem. Biol., under review.

22 A rich set of annotations and metadata on biosynthetic gene clusters
General MIBiG parameters: Biosynthetic class; MIxS environmental / taxonomic information; Number of loci; Complete / partial cluster; Nucleotide sequence accession; 16S accession / sequence; Custom gene names; Functional sub-clusters; Biosynthetic genes; Transport-related genes; Regulatory genes; Resistance/immunity genes; Operon architecture; Knockout mutant phenotypes; Compound name; Synonyms for compound name; Exact molecular mass; Molecular formulae of the compound(s); Compound structure; Chemical moieties; Compound activity; Compound molecular target; Publications on activity/toxicity/target; Tailoring reactions; Evidence for compound-cluster connection
Polyketide-specific: Polyketide synthase type; Polyketide subclass; Linear / cyclic; PKS genes; Number of PKS modules; Ketide unit sequence; Starter unit; Reductive domains; KR stereochemistries; AT domain substrate specificities; Non-reductive modifying PKS domains; Module skipping / iteration; Number of iterations (if iterative); Iterative PKS subtype (if iterative); Trans-acyltransferase genes; Inactive / atypical domains; TE domain type; Cyclization / termination type
Nonribosomal peptide-specific: NRP subclass; Linear / cyclic; NRPS genes; Number of NRPS modules; NRP amino acid sequence; A domain substrate specificities; Variable A domain specificities; Condensation domain subtypes; Modifying domains (Me/Ox/Red/Epi); Module skipping / iteration; TE domain type; Cyclization / termination type
RiPP-specific: RiPP subclass; Linear / cyclic; Precursor-encoding gene(s); Precursor peptide length; Leader peptide length; Follower peptide length; Core peptide length; Core peptide sequence; Cleavage recognition site; Number of crosslinks; Crosslink positions; Type of crosslinks/cyclizations; Recognition motif in leader peptide
Terpenoid-specific: Terpene subclass; Precursor carbon chain length; Final isoprenoid precursor; Terpene synthases / cyclases; Prenyltransferases
Saccharide-specific: Saccharide subclass; Glycosyltransferase (GT) genes; GT substrate specificities
Alkaloid-specific: Alkaloid subclass
Specific for other classes: Biosynthetic class specification
Again, MIBiG has an important role to play here, as standardized data submission and storage will allow us to build up a parts registry that can function as a trustworthy repository for designing new pathways. Medema et al. (2015) Nature Chem. Biol., under review.
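To make the idea of a standardized record more concrete, here is a minimal sketch of how a handful of the general parameters listed above could be held in a structured entry and checked for completeness. The class name, field names and the check are hypothetical illustrations; they are not the actual MIBiG schema or submission format.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List, Optional


@dataclass
class ClusterRecord:
    """Hypothetical container for a few of the general parameters on the slide."""
    biosynthetic_class: str            # e.g. "Polyketide", "NRP", "RiPP"
    compound_name: str
    nucleotide_accession: str
    complete_cluster: Optional[bool] = None   # None = not stated
    number_of_loci: int = 1
    publications: List[str] = field(default_factory=list)
    # Class-specific annotations (e.g. number of PKS modules) as free key/values.
    class_specific: Dict[str, object] = field(default_factory=dict)


def missing_text_fields(record: ClusterRecord) -> List[str]:
    """Return the names of string fields that were left empty."""
    return [
        f.name
        for f in fields(record)
        if isinstance(getattr(record, f.name), str)
        and not getattr(record, f.name).strip()
    ]


if __name__ == "__main__":
    # Placeholder values only; not a real cluster or accession.
    entry = ClusterRecord(
        biosynthetic_class="Polyketide",
        compound_name="",
        nucleotide_accession="PLACEHOLDER_ACCESSION",
        class_specific={"number_of_pks_modules": 7},
    )
    print(missing_text_fields(entry))
```

Keeping the general parameters as typed fields and the class-specific ones in a separate mapping mirrors the split on the slide between general and class-specific annotation blocks.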

23 Community annotation of biosynthetic gene clusters using MIBiG
>75 research groups worldwide participated. Result: detailed annotation of ±400 BGCs, essential annotations for another ±900 BGCs. So we currently have a draft version of MIBiG, on which between PIs in the field have already commented through an online survey. Later this week, I will organize a discussion session, to which I would like to invite you all to discuss this further. A standard has to be carried by the community.

24 Community annotation of biosynthetic gene clusters using MIBiG

25 An online repository for MIBiG information

26 Integration with antiSMASH: KnownClusterBlast

27 Finding more variants of known enzymatic parts using MultiGeneBlast
Medema et al. (2013) Mol. Biol. Evol. 30:

28 Finally: some suggestions for analyzing metagenomes using antiSMASH
Assemble first!
Only run contigs > 2 kb; use other tools for very fragmented assemblies, e.g.
Sort contigs by size; if >1000 contigs: run locally or contact us to run it on the public server.
Local installations: Docker container available.
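The contig-selection advice above can be sketched as a small preprocessing step that runs before submitting an assembly. This is an illustrative sketch, not part of antiSMASH: the function names and the plain-Python FASTA reader are assumptions, and the 2 kb and 1000-contig thresholds are taken from the slide.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_contigs(records, min_length=2000):
    """Keep contigs longer than min_length bp, sorted from longest to shortest."""
    kept = [(header, seq) for header, seq in records if len(seq) > min_length]
    kept.sort(key=lambda rec: len(rec[1]), reverse=True)
    return kept


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python filter_contigs.py assembly.fasta > filtered.fasta
    with open(sys.argv[1]) as handle:
        contigs = select_contigs(read_fasta(handle))
    if len(contigs) > 1000:
        print(f"{len(contigs)} contigs kept: consider a local run", file=sys.stderr)
    for header, seq in contigs:
        print(f">{header}")
        print(seq)
```

Sorting by size before submission puts the contigs most likely to contain complete gene clusters first, and the count check flags the case where a local installation is the better option.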

