Presentation is loading. Please wait.

Presentation is loading. Please wait.

Accurate estimation of microbial communities using 16S tags Julien Tremblay, PhD

Similar presentations


Presentation on theme: "Accurate estimation of microbial communities using 16S tags Julien Tremblay, PhD"— Presentation transcript:

1 Accurate estimation of microbial communities using 16S tags Julien Tremblay, PhD jtremblay@lbl.gov

2 16S rRNA as phylogenetic marker gene Escherichia coli 16S rRNA Primary and Secondary Structure 70S Ribosome subunits 30S 50S 34 proteins 21 proteins 5S rRNA 23S rRNA 16S rRNA Falk Warnecke highly conserved between different species of bacteria and archaea

3 16S rRNA in environmental microbiology (Sanger clone libraries) Falk Warnecke 900-1100 bp length

4 Next generation sequencing (NGS) Illumina454 10-400M 150bp reads/lane $ 0.5M 450bp reads $$ Read length Throughput

5 Game plan to survey microbial diversity V1V2V3V4V5V6V7V8V9 16S rRNA Reduce dataset by dereplication/clustering X 10,000 X 800 X 1,200 X 200 X 2,000 X 1,000 X 1 X 10 Identification (BLAST, RDP classifier) Generate amplicons of a given variable region from bacterial community (many millions of sequences) Amplicon tags = Deeper, cheaper, faster

6 Rare biosphere Rank Abundance Rare biosphere Sequencing error? Chimeras? Background noise? Relative small size of amplicons High abundance Low abundance High sequencing depth of NGS  reveals “rare” OTUs

7 Rare bias sphere? Control experiment: estimate rare biosphere in a single strain of E.coli 27F342R1114F1392R Is rare biosphere an artifact of the NGS error? V1 & V2 V8 Kunin et al., (2009), Environ. Microbiol. It should not, if relatively stringent clustering parameters are applied Subject to controversy – Is rare always real? Quince et al., (2009), Nat. Methods

8 PyroTagger (for 454 amplicons) Unzip, validate Remove low-quality reads Redundancy removal PyroClust & Uclust Remove chimeras Samples comparison, post-processing pyrotagger.jgi-psf.org

9 Classification and barcode separation Sequences of cluster (OTU) representatives Blast vs GreenGenes and Silva databases, dereplicated at 99.5% Distribution of microbial phyla in the dataset Also see the Qiime pipeline

10 Illumina tags (itags) Typical 454 run  450,000 – 500,000 reads “Typical” Illumina run: GAIIx  10,000,000 – 40,000,000 reads/lane Hiseq   ~ 350,000,000 reads/lane Miseq (available soon)  ~4,000,000 reads/lane Move 16S tags sequencing to Illumina platform HiSeq = huge output compared to 454 (suitable for big projects 1000+ indexes(barcodes)/libraries MiSeq = moderatly high throughput (More suitable?)  throughput  more efficient clustering algorithm (SeqObs).

11 Illumina tags (itags) 454 = “1” read Illumina = “2” reads => have to be assembled Both reads need to be of good quality ACGTGGTACTACGTGAT…. ~200-220 bp ACGTGGTACTACGTGATAGTGTAT ~252 bp 454 Illumina

12 itags clustering Sort by alphabetical order  100% identity Reduces dataset by 80% Edward Kirton, JGI 97%

13 Number of reads >> number of clusters Edward Kirton, JGI Number of reads (millions) 0 10 20 5 15 25 30 Clustering happens here!

14 Benefits of parallelization Processing time (min.) Number of reads (millions) 0 10 20 5 15 25 20015010050 Edward Kirton, JGI

15 MiSeq validation Exploratory experiments using 11 wetlands samples. Validate reproducibility between runs

16 MiSeq validation Beta diversity (UniFrac Distances) Run 1 Run 2

17 itags Validating SeqObs output by comparing with pyrotagger results Synthetic communities Termite gut Surface Sediments Compost Sludge 454  Pyrotagger (V8 region) Illumina GAIIx  SeqObs pipeline (V4, V5 and V9 regions) Illumina Miseq  SeqObs pipeline (V4 region)

18 Comparing 454 with illumina GAIIx vs 454 region

19 Comparing 454 with illumina Primer pair of variable region is likely to affect outcome of results. In silico PCR on 16S Greengenes database.

20 itags – confidence level 454 220 bp GAIIx ~110 bp Miseq 5’ reads 150 bp Miseq assembled reads ~250 bp E values

21 Challenges Short size of amplicon What filtering parameters to use (stringency level)?  balance between stringency filter and keeping as much data as we can Whole new dimension for rare biosphere? Handling large numbers of sample (tens of thousand magnitude) Cost of barcoded primers (will need lots of barcodes), handling Huge ammount of samples  statistics models…

22 Acknowledgments Susannah Tringe Edward Kirton Feng Chen Kanwar Singh Rob Knight lab (Univ. of Colorado) Thanks!

23 16S rRNA Dangl lab, UNC


Download ppt "Accurate estimation of microbial communities using 16S tags Julien Tremblay, PhD"

Similar presentations


Ads by Google