Greengenes.lbl.gov 16S rRNA gene database and workbench compatible with ARB Todd DeSantis, Phil Hugenholtz, Niels Larson, Igor Dubosarskiy, Jordan Moberg,

1 greengenes.lbl.gov 16S rRNA gene database and workbench compatible with ARB
Todd DeSantis, Phil Hugenholtz, Niels Larson, Igor Dubosarskiy, Jordan Moberg, Yvette Piceno, Ingrid Zubieta, Eoin Brodie, Gary Andersen LBL - JGI

2 Andersen Group Program Aims
Creating a microarray for the simultaneous differentiation and quantification of closely related prokaryotes in complex samples.

3 The Biomarker
16S rDNA: identify and classify organisms by gene sequence variations.
[Figure labels: 16S rDNA; rRNA (functional molecule); LSU; SSU]

4 The Challenges
- The 16S sequence deposit rate is increasing.
- Many sequences are mis-annotated and/or chimeric.
- Taxonomy updates lag years behind sequence availability (“Bacteria, Unclassified”).
- It is difficult to create and manage MSAs of all 16S sequence data (or even thousands of sequences) using Clustal, BioEdit, or ARB.
- Probe quality depends on excellent MSAs and taxonomy.
- “Signatures” can erode as more sequences are discovered.

5 greengenes.lbl.gov

6 greengenes.lbl.gov Stay current
Source: ‘16S NOT 1.16S NOT mitochondr* NOT 18S’

7 greengenes.lbl.gov Verify ‘16S-ness’
Fate of NCBI records:
- short FASTA file (9%)
- short BLAST match length (8%)
- BLAST match to 18S/mitochondrial SSU (1%)
- odd nucleotide insertions (1%)
- passed (81%)
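The slide gives the screening categories and their shares but not the cutoffs. A minimal Python sketch of such a cascade, assuming the checks run in the order listed and that each record is assigned to the first check it fails; the `Record` fields, the hit labels, and both length cutoffs are hypothetical values chosen for illustration, not Greengenes settings.

```python
from collections import Counter
from dataclasses import dataclass

MIN_FASTA_LENGTH = 1250  # hypothetical cutoff (nt)
MIN_HSP_LENGTH = 1000    # hypothetical cutoff (nt)


@dataclass
class Record:
    accession: str
    length: int           # length of the FASTA sequence
    hsp_length: int       # length of the best BLAST match to the 16S reference set
    best_hit: str         # "16S", "18S", or "mito" (hypothetical labels)
    odd_insertions: bool  # result of a hypothetical insertion check


def fate(rec: Record) -> str:
    """Return the first screening check a record fails, or 'passed'."""
    if rec.length < MIN_FASTA_LENGTH:
        return "short FASTA file"
    if rec.hsp_length < MIN_HSP_LENGTH:
        return "short BLAST match length"
    if rec.best_hit in ("18S", "mito"):
        return "BLAST match to 18S/Mito SSU"
    if rec.odd_insertions:
        return "odd nt insertions"
    return "passed"


def fate_percentages(records: list[Record]) -> dict[str, float]:
    """Share of records (in percent) ending up in each fate."""
    counts = Counter(fate(r) for r in records)
    return {name: 100 * n / len(records) for name, n in counts.items()}
```

Because the checks short-circuit, the percentages always sum to 100, as the slide's five categories do.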

8 NAST align step 1: find template
Hand-curated MSA provided by Phil Hugenholtz.
- The alignment "template" is the top BLAST HSP (q = -1, i.e. a mismatch penalty of -1, which favors long matches).
- The candidate is trimmed of extra-16S sequence data (tRNA, intergenic spacer regions, and 23S rDNA) based on the HSP boundaries.
- If the HSP pairs opposite strands, the candidate is reverse complemented.
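A minimal Python sketch of this template step, assuming the BLAST search (with q = -1) has already been run and its HSPs parsed into a hypothetical `HSP` record holding the template ID, score, 0-based candidate coordinates, and strand orientation. It keeps the top-scoring HSP, trims the candidate to that HSP's boundaries, and reverse complements the trimmed candidate when the HSP pairs opposite strands.

```python
from dataclasses import dataclass


@dataclass
class HSP:
    template_id: str   # ID of the MSA sequence hit by BLAST
    score: float       # HSP score; the highest-scoring HSP wins
    q_start: int       # 0-based start on the candidate (query)
    q_end: int         # exclusive end on the candidate
    same_strand: bool  # False when the HSP pairs opposite strands


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def prepare_candidate(candidate: str, hsps: list[HSP]) -> tuple[str, str]:
    """Pick the template and return (template_id, trimmed, oriented candidate)."""
    top = max(hsps, key=lambda h: h.score)
    # Drop flanking non-16S sequence (tRNA, spacer, 23S) outside the HSP.
    trimmed = candidate[top.q_start:top.q_end]
    if not top.same_strand:
        trimmed = reverse_complement(trimmed)
    return top.template_id, trimmed
```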

12 NAST align step 2: gap removal
Preserves global MSA positions (columns) by allowing local misalignments.
DEFINE
  St = post-Align0 template sequence
  Sc = post-Align0 candidate sequence
  Ht = alignment space (hyphen) inserted into St by Align0
  Hc = alignment space (hyphen) inserted into Sc by Align0
WHILE (St contains one or more Ht) DO
  LHt = character index of the 5'-most Ht within St
  L5' = character index of the nearest Hc within Sc on the 5' side of LHt
  L3' = character index of the nearest Hc within Sc on the 3' side of LHt
  IF ((LHt - L5') > (L3' - LHt))
    Delete the Hc found at L3'
  ELSE
    Delete the Hc found at L5'
  Delete the template gap character (the Ht at LHt)
END WHILE
Result: the largest MSA of full-length (>1250 nt) 16S rDNA genes.
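The gap-removal loop can be sketched in Python as below, assuming `st` and `sc` are the equal-length pairwise alignment strings produced by Align0 and that `-` marks the gaps it inserted. Deleting one candidate gap for every template gap keeps both strings the same length as the ungapped template, so the candidate stays in register with the template's MSA columns at the cost of a small local shift. The sketch raises an error when no candidate gap exists on either side, a case the slide does not address.

```python
def nast_gap_removal(st: str, sc: str, gap: str = "-") -> tuple[str, str]:
    """Remove template gaps (Ht) from st, deleting the nearest candidate gap (Hc) in sc each time."""
    if len(st) != len(sc):
        raise ValueError("aligned sequences must have equal length")
    st_list, sc_list = list(st), list(sc)
    while gap in st_list:
        l_ht = st_list.index(gap)  # 5'-most template gap
        # Nearest candidate gap on each side of the template gap (None if absent).
        l5 = next((i for i in range(l_ht - 1, -1, -1) if sc_list[i] == gap), None)
        l3 = next((i for i in range(l_ht + 1, len(sc_list)) if sc_list[i] == gap), None)
        if l5 is None and l3 is None:
            raise ValueError("no candidate gap available to absorb a template gap")
        # IF (LHt - L5') > (L3' - LHt): delete the 3' gap, ELSE the 5' gap (ties go 5').
        if l5 is None or (l3 is not None and (l_ht - l5) > (l3 - l_ht)):
            del sc_list[l3]
        else:
            del sc_list[l5]
        del st_list[l_ht]  # delete the template gap character
    return "".join(st_list), "".join(sc_list)
```

For example, `nast_gap_removal("AC-GTA", "ACTG-A")` removes the template gap at index 2 and the candidate gap at index 4, giving `("ACGTA", "ACTGA")`.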

13 greengenes.lbl.gov Name generator
NCBI annotations are non-standardized, so a name is generated for each GenBank record: determine whether the sequence is from an isolate or from an environmental amplicon/metagenome, then concatenate useful terms. This is also an effort to guide future GenBank submitters toward clear record descriptions.
[Flowchart: text is globbed from the “DEFINITION”, “source”, and “TITLE” fields of the GenBank record. Decisions: Is the sequence from a whole genome record? Is there a “Genus species” style name in DEFINITION or source>organism? Does a source>isolate field exist? Does the text glob contain “clone” OR “uncultur”? Does it contain “symbiont”? Is an isolate tag or strain tag present? Outcomes: isolate, isolate_str, clone, symbiont, or undecided.]
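The exact branching of the flowchart is not recoverable from the slide, so this Python sketch uses one plausible ordering of its tests (symbiont, then clone/uncultured, then isolate evidence, otherwise undecided) and omits the whole-genome check. The function name, its parameters, and the regular expression standing in for a “Genus species” name are all hypothetical.

```python
import re
from typing import Optional

# Rough stand-in for a "Genus species" name: a capitalized word followed by a lowercase word.
GENUS_SPECIES = re.compile(r"\b[A-Z][a-z]+ [a-z]{3,}\b")


def classify_source(definition: str, organism: str, title: str,
                    has_isolate_field: bool, strain: Optional[str] = None) -> str:
    """Hypothetical source-type call for one GenBank record."""
    glob = " ".join([definition, organism, title]).lower()  # the text glob
    if "symbiont" in glob:
        return "symbiont"
    if "clone" in glob or "uncultur" in glob:
        return "clone"
    if has_isolate_field or GENUS_SPECIES.search(definition + " " + organism):
        return "isolate_str" if strain else "isolate"
    return "undecided"
```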

14 greengenes.lbl.gov Chimera tracking
Amplicons from complex gDNA can contain partial sequence from more than one genome. Up to 4% of sequences are deemed chimeric by Bellerophon2. Flags are set so that these questionable sequences are not used in phylogeny assessments.

15 greengenes.lbl.gov Maintain Taxonomy
The JGI taxonomy is organized in ARB using maximum-parsimony tree insertions.
Example record:
prokMSA_id:
prokMSAname: termite gut clone Rs-050
GenBank ACCESSION: AB
GenBank GI:
RDP_id: S
NCBI_tax_id:
Study_id: 21358
G2_chip_tax_string=Bacteria; Firmicutes; Clostridia; Clostridiales; Peptostreptococcaceae; sf_5; otu_2988
JGI_tax_string=Bacteria; Firmicutes (incl. basal lineag; Firmicutes; Peptostreptococcaceae; Mogibacterium
JGI_tax_string_format_2=Bacteria; Firmicutes (incl. basal lineag; Firmicutes; Peptostreptococcaceae; Mogibacterium; otu_415
Pace_tax_string=Bacteria; Firmicutes; Clostridium et al.; Peptostreptococcaceae; Clostridium acidiurici et al.; Clostridium difficile et al.; Clostridium aminobutyricum et
RDP_tax_string=Bacteria; Firmicutes; Clostridia; Clostridiales; unclassified_Clostridiales.
ncbi_tax_string=Bacteria; Firmicutes; Clostridia; Clostridiales; Eubacteriaceae; environmental samples
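Because each taxonomy is stored as a semicolon-delimited string, comparing them is mostly string handling. A small Python sketch, assuming the record's fields have been split into one `key=value` line each; both helper names are hypothetical. `shared_lineage` reports the leading ranks on which two taxonomies agree, which makes it easy to see where, for example, the G2 chip and NCBI taxonomies diverge for the termite gut clone.

```python
def parse_tax_strings(record_lines: list[str]) -> dict[str, list[str]]:
    """Map each *_tax_string field to its list of ranks, split on ';'."""
    parsed: dict[str, list[str]] = {}
    for line in record_lines:
        key, sep, value = line.partition("=")
        if sep and "tax_string" in key:
            parsed[key.strip()] = [rank.strip() for rank in value.split(";") if rank.strip()]
    return parsed


def shared_lineage(a: list[str], b: list[str]) -> list[str]:
    """Leading ranks on which two taxonomy strings agree."""
    common = []
    for x, y in zip(a, b):
        if x != y:
            break
        common.append(x)
    return common
```

For the example record, the G2 chip and NCBI strings agree down to Clostridiales and then split (Peptostreptococcaceae versus Eubacteriaceae).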

16 greengenes.lbl.gov Maintain Taxonomy

17 greengenes.lbl.gov Tools
BLAST, SimRank, Probe matcher, Text search, PCR primer design, Private NAST aligner

18 greengenes.lbl.gov Compatible with ARB
The entire database is downloadable in ARB format. New records can be imported into a personal ARB database.

19 How we use Greengenes data to get our work done

20 16S Sequence clustering
Each sequence is reduced to an array (list) of “probe-friendly” 25-mers that:
- have high complexity
- can be synthesized with 75 or fewer masks
- have adequate H-bond potential: G+C content over 48%, or empirical bond stability found in test arrays
Sequences are clustered transitively by the fraction of 25-mers they have in common; each cluster is considered an Operational Taxonomic Unit (OTU).
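A minimal Python sketch of the two ideas on this slide: a G+C filter over all 25-mers of a sequence, and transitive (single-linkage) clustering by shared 25-mers, implemented with union-find. The complexity and mask-count filters are omitted, and measuring overlap as shared 25-mers divided by the size of the smaller set, with a caller-chosen `min_shared` threshold, is an assumption; the slide does not define the fraction or give its cutoff.

```python
from itertools import combinations

K = 25
MIN_GC = 0.48  # "G+C content over 48%"


def gc_fraction(kmer: str) -> float:
    return sum(base in "GC" for base in kmer) / len(kmer)


def probe_friendly_kmers(seq: str, k: int = K) -> set[str]:
    """All k-mers of seq whose G+C fraction exceeds MIN_GC.
    Complexity and mask-count filters are not modeled here."""
    seq = seq.upper()
    kmers = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer for kmer in kmers if gc_fraction(kmer) > MIN_GC}


def transitive_clusters(kmer_sets: dict[str, set[str]], min_shared: float) -> list[set[str]]:
    """Link two sequences when their shared-k-mer fraction (relative to the
    smaller set, an assumption) reaches min_shared; return connected components."""
    parent = {name: name for name in kmer_sets}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(kmer_sets, 2):
        smaller = min(len(kmer_sets[a]), len(kmer_sets[b]))
        if smaller and len(kmer_sets[a] & kmer_sets[b]) / smaller >= min_shared:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in kmer_sets:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())
```

Transitivity means two sequences can land in the same OTU without sharing any 25-mers directly, as long as a chain of linked sequences connects them.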

21 Extended Bergey’s Taxonomy
Bergey’s v0.9 with added nomenclature from the Hugenholtz tree of environmental DNA.
Each OTU was assigned to one of 455 families; families were split into subfamilies where >15% sequence variation existed.
Results (considering both domains):
- 63 phyla
- 136 classes
- 262 orders
- 455 families
- 842 subfamilies (~94% identity)
- 8,989 OTUs (~99% identity)
- 30,627 sequences (each belongs to only one OTU)

22 Probe Design
[Figure: Example of the location of probes used for the Desulfovibrio vulgaris probe set (Bacteria; Proteobacteria; Deltaproteobacteria; Desulfovibrionales; Desulfovibrionaceae; sf_1; otu_10051). Sequences shown: Desulfovibrio sp. str. DMB, Desulfovibrio sp. 'Bendigo A', Desulfovibrio vulgaris DSM 644. Legend: sequence discrepancies; regions not unique to OTU; regions unique to OTU.]

23 Locus Specific Prevalence Scoring
[Figure: example Proteobacteria OTU composed of 26 sequences; scores at three loci: 22/22, 25/25, and 20/25.]

24 Probe selection objectives for each OTU
Find 11 or more 25-mers (targets) that are:
- >90% prevalent in an OTU’s sequences
- dissimilar from sequences outside the OTU
- >48% G+C or empirically responsive
- located at more than one locus within the 16S rDNA gene
Presumed cross-hybridizing probes were those 25-mers that contained a central 17-mer matching sequences in more than one OTU (Urakawa, Stahl et al. 2002); this avoided probes that were unique solely because of a mismatch in one of the outer four bases. As each PM probe (Perfect Match to target) was chosen, it was paired with a control 25-mer (mismatching probe, MM) identical in all positions except the thirteenth base. The MM probe did not contain an internal 17-mer complementary to sequences in any OTU.
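The selection rules above can be sketched in Python as follows. Prevalence is computed here simply as the fraction of an OTU's sequences that contain the 25-mer, and probes are written in the same sense as the target sequences for simplicity (real array probes are complementary to the labeled target). Replacing the thirteenth base with its complement for the MM probe follows the usual Affymetrix convention and is an assumption; the slide says only that the thirteenth base differs. The "empirically responsive" alternative to the G+C rule is not modeled, and all function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def prevalence(kmer: str, otu_seqs: list[str]) -> float:
    """Fraction of an OTU's sequences that contain the k-mer."""
    return sum(kmer in s for s in otu_seqs) / len(otu_seqs)


def gc_fraction(kmer: str) -> float:
    return sum(base in "GC" for base in kmer) / len(kmer)


def central_17mer(probe: str) -> str:
    """Drop the outer four bases on each side of a 25-mer (4 + 17 + 4)."""
    if len(probe) != 25:
        raise ValueError("expected a 25-mer")
    return probe[4:21]


def is_cross_hybridizing(probe: str, otus: dict[str, list[str]]) -> bool:
    """True if the central 17-mer matches sequences in more than one OTU."""
    core = central_17mer(probe)
    return sum(any(core in s for s in seqs) for seqs in otus.values()) > 1


def passes_selection(probe: str, otu: str, otus: dict[str, list[str]]) -> bool:
    return (prevalence(probe, otus[otu]) > 0.90
            and gc_fraction(probe) > 0.48
            and not is_cross_hybridizing(probe, otus))


def mismatch_probe(probe: str) -> str:
    """MM control: identical to the PM probe except the 13th base (index 12)."""
    return probe[:12] + COMPLEMENT[probe[12]] + probe[13:]
```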

25 Overview of Sample Preparation
[Figure: Extract genomic DNA → PCR amplify DNA → fractionate DNA → end-label with biotin → hybridize.]

26 Image Capture and Data Reduction
Over 500,000 data points are reduced to scores for each of 9,000 OTUs.

27 Distribution of 16S rDNA Sequences detected via Cloning or Microarray Analysis
[Figure: clone hits only (8); clone and array hits (73); array hits only (97).]
Confirmed by specific PCR and sequencing:
Actinobacteria; Actinosynnemataceae; sf_1
Nitrospira; Nitrospiraceae; sf_1
Clostridia; Syntrophomonadaceae; sf_5
Planctomycetes; Planctomycetaceae; sf_3
Gammaproteobacteria; Pseudoalteromonadaceae; sf_1
Acidobacteria; Ellin6075/11-25; sf_1
Spirochaetes; Spirochaetaceae; sf_1
Spirochaetes; Spirochaetaceae; sf_3
Spirochaetes; Leptospiraceae; sf_3

28 Array is quantitative (r = 0.917)
Spike-in organism: % G+C of sequence, % G+C of probes
Mycoplasma neurolyticum: 50.0, 45.4
Oenococcus oeni: 50.9, 50.8
Saprospira grandis: 51.8
Fervidobacterium nodosum: 58.2, 53.8
Caulobacter vibrioides: 56.4, 58.5

29 Array is quantitative
[Figure labels: “~ S gene copies”; “~10^7 16S gene copies”]

30 Example query against meteorological data: Does detection of Actinobacterium PENDANT-38 correlate with temperature?

31 Real-time quantitative PCR confirmation of array monitoring.
Uranium bioremediation: is uranium re-oxidation under reducing conditions due to loss of metal reducers?
(a) Array quantitation (corrected array intensity)
Representative organism | Phylocode group | Area | Reduction | Oxidation
Geothrix fermentans | Acidobacteriaceae | 45 | 2344 | 2290
Geobacter metallireducens | Geobacteraceae | 251 | 2238 | 2188
Geobacter arculus | | 38 | 1412 | 1698
(b) qPCR quantitation: species-specific (Geothrix fermentans); group-specific (Geobacteraceae)

32 Real-time quantitative PCR confirmation – Urban Aerosol
Array hybridization signal correlates significantly with 16S copies in an environmental aerosol DNA extract (Pseudomonas oleovorans example).

33 FEMS Letters - pseudoshift
[Figure: taxa by Order and Class versus Peak Duration (sec), with values 5, 45, and 450 shown. Labels include Phaeophyceae (phylum), Stramenopiles (no rank), Basidiomycota (phylum), Fungi (kingdom), Deferribacterales, Cyanobacteria, Ascomycota (phylum), Vibrionales, Gammaproteobacteria, Flavobacteriales, Flavobacteria, Clostridiales, Clostridia, Rhizobiales, Alphaproteobacteria, Rhodospirillales, n.s., Lactobacillales, Bacilli, Bacillales, Mycoplasmatales, Mollicutes, Xanthomonadales, Burkholderiales, Betaproteobacteria, Sphingomonadales, Sphingobacteriales, Sphingobacteria, Acholeplasmatales.]

34 Acknowledgements
Phil Hugenholtz – Taxonomy, ARB Interface, Chimera
Niels Larson – SimRank
Igor Dubosarskiy – JSP
Jordan Moberg – Microarrays, Cloning
Yvette Piceno – Microarrays, Primer Design
Ingrid Zubieta – PCR, Cloning
Eoin Brodie – Microarrays, QPCR
Gary Andersen – 16S Microarray Group Leader

35 C. perfringens probe set identified in EPA sample 22 (N.Y. Spring)
[Figure: tree of C. perfringens and neighboring Clostridium groups (C. aurantibutyricum, C. thermobutyricum subgroup, C. butyricum, C. algidicarnis, C. botulinum subgroup, C. cadaveris, C. perfringens, C. barati subgroup) with labels CFB, Cyan, High G+C, Proteo, Bacil-Strep, Gram +, and Bacteria; a 16S rDNA map marked 27, 420, 469, and 1492 with probes 5 to 8; and an alignment of the probe region across C. perfringens strains and operons (C. perf. resistant, str. CPN50, A, B, rrnA to rrnJ, str. 13a, str. 13b), Clostridium sp. AB&J clone p Wa2, clone OI1612, and swine manure clones 37-3 and 37-4.]
Sequence shown in the alignment: ...CGTAAAGCTCTGTCTTTGGGGAAGATAATGACGGTACCCAAGGAGGAAGCCACGGCTAACT...
Probes: TAAAGCTCTGTCTTTGGGGAAGATA, tacccaaggaggaagccacggctaa, AAAGCTCTGTCTTTGGGGAAGATAA, AAGCTCTGTCTTTGGGGAAGATAAT, AGCTCTGTCTTTGGGGAAGATAATG
Ave Diff = 1891
Probe properties: each 25-mer exists in 90% of the taxon’s sequences; the internal 21-mer exists in only one taxon. Probes 5 - 8.
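The listed probes are 25-base windows of the sequence shown in the alignment. A short Python check, using only the sequence and probe strings from the slide, recovers each probe's 0-based offset in that fragment. Case is ignored because one probe is printed in lowercase on the slide; the function name is hypothetical.

```python
FRAGMENT = "CGTAAAGCTCTGTCTTTGGGGAAGATAATGACGGTACCCAAGGAGGAAGCCACGGCTAACT"
PROBES = [
    "TAAAGCTCTGTCTTTGGGGAAGATA",
    "tacccaaggaggaagccacggctaa",
    "AAAGCTCTGTCTTTGGGGAAGATAA",
    "AAGCTCTGTCTTTGGGGAAGATAAT",
    "AGCTCTGTCTTTGGGGAAGATAATG",
]


def probe_offsets(fragment: str, probes: list[str], k: int = 25) -> dict[str, int]:
    """0-based start of each k-mer probe in the fragment (-1 if absent)."""
    fragment = fragment.upper()
    offsets = {}
    for probe in probes:
        if len(probe) != k:
            raise ValueError(f"expected a {k}-mer, got {len(probe)} nt")
        offsets[probe] = fragment.find(probe.upper())
    return offsets


if __name__ == "__main__":
    for probe, start in probe_offsets(FRAGMENT, PROBES).items():
        print(f"{probe} starts at {start}")
```

The four uppercase probes start at offsets 2, 3, 4, and 5, so they tile the region one base apart; the lowercase probe starts at offset 34, further along the same fragment.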

