Accurate estimation of microbial communities using 16S tags Julien Tremblay, PhD

Slides:



Advertisements
Similar presentations
In Silico Primer Design and Simulation for Targeted High Throughput Sequencing I519 – FALL 2010 Adam Thomas, Kanishka Jain, Tulip Nandu.
Advertisements

16S sequencing for microbiome studies Nicola Segata and Nick Loman
Clostridium difficile Colitis or Dysbiosis. Symbiostasis/Dysbiosis.
The Past, Present, and Future of DNA Sequencing
Use of the genomic data o Reconstruction of metabolic properties o Nature’s Microbiome o NGS in Population Genetics.
Jeff Dangl, UNC Chapel Hill Phil Hugenholtz, Susannah Tringe, JGI Ruth Ley, Cornell Rhizosphere Grand Challenge Pilot Project Scott Clingenpeel Project.
Metabarcoding 16S RNA targeted sequencing
Advancing Science with DNA Sequence Metatranscriptomics: Challenges and Progress Shaomei He DOE Joint Genome Institute AUG.
Microbiome Analysis from sample to data MGL Users Group June 18, 2014.
Computational Analysis of the Taxanomical Classification of Short 16S rRNA Sequences Christel Chehoud Mentor: Brian Haas.
1 Les mesures de diversité microbienne par séquençage massivement parallèle Richard Christen CNRS UMR 6543 & Université de Nice
Practical Bioinformatics Community structure measures for meta-genomics István Albert Bioinformatics Consulting Center Penn State.
Bioinformatics for Whole-Genome Shotgun Sequencing of Microbial Communities By Kevin Chen, Lior Pachter PLoS Computational Biology, 2005 David Kelley.
The Sorcerer II Global ocean sampling expedition Katrine Lekang Global Ocean Sampling project (GOS) Global Ocean Sampling project (GOS) CAMERA CAMERA METAREP.
High Throughput Sequencing
The Microbiome and Metagenomics
Molecular Microbial Ecology
Advancing Science with DNA Sequence Natalia Ivanova MGM Workshop September 12, 2012 Metagenome analysis: use case.
From Metagenomic Sample to Useful Visual Anna Shcherbina 01/10/ Anna Shcherbina Bioinformatics Challenge Day 02/02/2013 From Metagenomic Sample to.
H = -Σp i log 2 p i. SCOPI Each one of the many microbial communities has its own structure and ecosystem, depending on the body environment it exists.
Species  OTUs  OPUs  Species  OTUs  OPUs. Rosselló-Mora & Amann 2001, FEMS Rev. 25:39-67 Taxa circumscription depends on the observable characters.
Genomics – Next-Gen sequencing and Microarrays
Probes can be designed in an evolutionary hierarchy.
Work by Antonio Izzo Based on 36 soil cores from a total of 9 plots contained within a 2.5 hectare region.
Greengenes: A Tutorial
 16S rRNA gene marker  intra-gene variability  primer selection  size & information content Primer selection, information content, alignment and length.
Christian Rinke Microbial Genomics DOE, Joint Genome Institute Introduction to ARB (From A User's Perspective)
Analysis of microbial communities with QIIME Justin Kuczynski February 2012.
BIOINFORMATICS PROGRAM St. Edward’s University Genomics Education Partnership (GEP) Genomics Consortium for Active Teaching (GCAT)
Metatranscriptomics: Challenges and Progress
Current Challenges in Metagenomics: an Overview Chandan Pal 17 th December, GoBiG Meeting.
Quality Control Hubert DENISE
CompostBin : A DNA composition based metagenomic binning algorithm Sourav Chatterji *, Ichitaro Yamazaki, Zhaojun Bai and Jonathan Eisen UC Davis
Elucidating factors behind pair wise distances discrepancies between short and near full-length sequences. We hypothesized that since the 16S rRNA molecule.
Advancing Science with DNA Sequence Natalia Ivanova MGM Workshop September 29, 2011 Metagenome analysis: use case.
Analyzing Time Course Data: How can we pick the disappearing needle across multiple haystacks? IEEE-HPEC Bioinformatics Challenge Day Dr. C. Nicole Rosenzweig.
The Microbiome and Metagenomics
Metagenomics at Second Genome
Compositionality and Sparseness in 16S rRNA data Anthony Fodor Associate Professor Bioinformatics and Genomics UNC Charlotte.
Accurate estimation of microbial communities using 16S tags
Metagenomic dataset preprocessing – data reduction
Shotgun sequencing reveals transkingdom alterations in immunodeficiency associated enteropathy Xiaoxi Dong (Oregon State University), Jialu Hu (Oregon.
MEGAN analysis of metagenomic data Daniel H. Huson, Alexander F. Auch, Ji Qi, et al. Genome Res
Convenience Sample of 4 Adults and 6 Infants. Adults 4 visits over 2 weeks; infants 2 visits over 2 weeks Adult specimens: 1) plaque (by method, teeth,
Introducing DOTUR, a Computer Program for Defining Operational Taxonomic Units and Estimating Species Richness Patric D. Schloss and Jo Handelsman Department.
Presented by Samuel Chapman. Pyrosequencing-Intro The core idea behind pyrosequencing is that it utilizes the process of complementary DNA extension on.
Date of download: 6/23/2016 Copyright © 2016 McGraw-Hill Education. All rights reserved. Pipeline for culture-independent studies of a microbiota. (A)
Population sequencing using short reads: HIV as a case study Vladimir Jojic et.al. PSB 13: (2008) Presenter: Yong Li.
Date of download: 7/7/2016 Copyright © 2016 McGraw-Hill Education. All rights reserved. Pipeline for culture-independent studies of a microbiota. (A) DNA.
Are Roche 454 shotgun reads giving a accurate picture of the genome?
ESPRIT. Taxonomy ● Works very well and gives accurate results ● Requires a previous blast search that may take long to complete ● When in doubt goes one.
Robert Edgar Independent scientist
16S rRNA Experimental Design
16S RNA sequencing analysis
Metagenomic Species Diversity.
Short Read Sequencing Analysis Workshop
Metagenomics: From Bench to Data Analysis 19-23rd September S rRNA-based surveys for Community Analysis: How Quantitative are they? Dr.
Peter Sterk EBI Metagenomics Course 2014
Metagenomics Rob Edwards.
PNAS 2012 Alpha diversity: how many species are in each sample?
Workshop on the analysis of microbial sequence data using ARB
Independent scientist
Rosie Coates-Brown Final year Bioinformatics trainee
Microbiome: 16S rRNA Sequencing
H = -Σpi log2 pi.
Example of amplicon performance in our presented workflow.
Bacterial composition of olive fermentations is affected by microbial inoculation. Bacterial composition of olive fermentations is affected by microbial.
ITS rRNA gene locus. ITS rRNA gene locus. Schematic of the eukaryotic ribosomal gene cluster. The SILVA database contains sequences of the 18S gene, while.
Comparison of gut microbiota alpha diversity in different preservatives based on 16S rRNA gene V3-V4 amplicon sequencing. Comparison of gut microbiota.
Fig. 3 Postnatal assembly of the humanized gut microbiota.
Presentation transcript:

Accurate estimation of microbial communities using 16S tags Julien Tremblay, PhD

16S rRNA as phylogenetic marker gene Escherichia coli 16S rRNA Primary and Secondary Structure 70S Ribosome subunits 30S 50S 34 proteins 21 proteins 5S rRNA 23S rRNA 16S rRNA Falk Warnecke highly conserved between different species of bacteria and archaea

16S rRNA in environmental microbiology (Sanger clone libraries) Falk Warnecke bp length

Next generation sequencing (NGS) Illumina M 150bp reads/lane $ 0.5M 450bp reads $$ Read length Throughput

Game plan to survey microbial diversity V1V2V3V4V5V6V7V8V9 16S rRNA Reduce dataset by dereplication/clustering X 10,000 X 800 X 1,200 X 200 X 2,000 X 1,000 X 1 X 10 Identification (BLAST, RDP classifier) Generate amplicons of a given variable region from bacterial community (many millions of sequences) Amplicon tags = Deeper, cheaper, faster

Rare biosphere Rank Abundance Rare biosphere Sequencing error? Chimeras? Background noise? Relative small size of amplicons High abundance Low abundance High sequencing depth of NGS  reveals “rare” OTUs

Rare bias sphere? Control experiment: estimate rare biosphere in a single strain of E.coli 27F342R1114F1392R Is rare biosphere an artifact of the NGS error? V1 & V2 V8 Kunin et al., (2009), Environ. Microbiol. It should not, if relatively stringent clustering parameters are applied Subject to controversy – Is rare always real? Quince et al., (2009), Nat. Methods

PyroTagger (for 454 amplicons) Unzip, validate Remove low-quality reads Redundancy removal PyroClust & Uclust Remove chimeras Samples comparison, post-processing pyrotagger.jgi-psf.org

Classification and barcode separation Sequences of cluster (OTU) representatives Blast vs GreenGenes and Silva databases, dereplicated at 99.5% Distribution of microbial phyla in the dataset Also see the Qiime pipeline

Illumina tags (itags) Typical 454 run  450,000 – 500,000 reads “Typical” Illumina run: GAIIx  10,000,000 – 40,000,000 reads/lane Hiseq   ~ 350,000,000 reads/lane Miseq (available soon)  ~4,000,000 reads/lane Move 16S tags sequencing to Illumina platform HiSeq = huge output compared to 454 (suitable for big projects indexes(barcodes)/libraries MiSeq = moderatly high throughput (More suitable?)  throughput  more efficient clustering algorithm (SeqObs).

Illumina tags (itags) 454 = “1” read Illumina = “2” reads => have to be assembled Both reads need to be of good quality ACGTGGTACTACGTGAT…. ~ bp ACGTGGTACTACGTGATAGTGTAT ~252 bp 454 Illumina

itags clustering Sort by alphabetical order  100% identity Reduces dataset by 80% Edward Kirton, JGI 97%

Number of reads >> number of clusters Edward Kirton, JGI Number of reads (millions) Clustering happens here!

Benefits of parallelization Processing time (min.) Number of reads (millions) Edward Kirton, JGI

MiSeq validation Exploratory experiments using 11 wetlands samples. Validate reproducibility between runs

MiSeq validation Beta diversity (UniFrac Distances) Run 1 Run 2

itags Validating SeqObs output by comparing with pyrotagger results Synthetic communities Termite gut Surface Sediments Compost Sludge 454  Pyrotagger (V8 region) Illumina GAIIx  SeqObs pipeline (V4, V5 and V9 regions) Illumina Miseq  SeqObs pipeline (V4 region)

Comparing 454 with illumina GAIIx vs 454 region

Comparing 454 with illumina Primer pair of variable region is likely to affect outcome of results. In silico PCR on 16S Greengenes database.

itags – confidence level bp GAIIx ~110 bp Miseq 5’ reads 150 bp Miseq assembled reads ~250 bp E values

Challenges Short size of amplicon What filtering parameters to use (stringency level)?  balance between stringency filter and keeping as much data as we can Whole new dimension for rare biosphere? Handling large numbers of sample (tens of thousand magnitude) Cost of barcoded primers (will need lots of barcodes), handling Huge ammount of samples  statistics models…

Acknowledgments Susannah Tringe Edward Kirton Feng Chen Kanwar Singh Rob Knight lab (Univ. of Colorado) Thanks!

16S rRNA Dangl lab, UNC