Microbiome Analysis from sample to data MGL Users Group June 18, 2014.



Assaying Microbial Content
One of the most common approaches is to sequence 16S ribosomal RNA amplicons. Another option is shotgun sequencing of the community, assembling the sequences, and assigning the identified genes to metabolic pathways. If a finer level of detail is required, 16S is most often sequenced first, followed by generalized sequencing for finer species resolution (or validation).

16S rRNA Sequencing
Databases of all known 16S sequences have been compiled (SILVA, Greengenes, others). Sequencing targets either amplicons of variable regions or the whole 16S gene.
– Isolate gDNA, then PCR amplify using universal 16S primers.
[Figure: primer pair 1, primer pair 2, primer pair 3]

16S rRNA Sequencing
Shear the amplicon using Covaris focused acoustics.

The Microbiome Library
Libraries have adapter sequences at both ends, used for PCR and sequencing priming. “P1” is the universal forward primer sequence. “P2” has an embedded barcode sequence. Between the two adapter ends is the DNA to be sequenced; reads are primed from “P1” (forward) and from the barcode region (green arrows). Note: these adapter sequences differ from Illumina's, which matters if other preparations are to be adapted to this platform.

Bead Preparation from Libraries
The pool of libraries is subjected to emulsion PCR to populate beads. Oil micro-reactors are titrated such that each bead is populated by a single template. Unpopulated beads are removed in subsequent cleanup.
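The titration described above can be illustrated with a small calculation. This is only a sketch under an assumption that is not stated in the slides: that the number of templates landing in each micro-reactor follows a Poisson distribution with mean `lam` (average templates per bead). The function names are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k templates when the mean is lam (Poisson model)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def bead_occupancy(lam: float) -> dict:
    """Fractions of beads that are empty, single-template, or multi-template.

    Assumes Poisson loading, which is an idealization of emulsion PCR.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    multiple = 1.0 - empty - single
    return {
        "empty": empty,
        "single": single,
        "multiple": multiple,
        # Among beads that carry any template, the share carrying exactly one.
        "single_share_of_populated": single / (single + multiple),
    }


for lam in (0.05, 0.1, 0.5, 1.0):
    occ = bead_occupancy(lam)
    print(
        f"lam={lam:<4} empty={occ['empty']:.3f} single={occ['single']:.3f} "
        f"multiple={occ['multiple']:.3f} "
        f"single/populated={occ['single_share_of_populated']:.3f}"
    )
```

Under this model, lowering the template-to-bead ratio raises the share of populated beads that carry a single template, at the cost of more empty beads, which is consistent with the cleanup step that removes unpopulated beads.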

16S rRNA Sequencing
Bead Preparation from Libraries
Nick translate → Amplify → Quantitate

Slide Deposition of Enriched Beads
Beads are flowed into, and then adhered to, the FlowChip lanes. Optimum density is 160 million beads per lane.

ABI SOLiD 5500xl 16S rRNA Sequencing

The resulting library is sequenced.
– We sequence 75 bp from one end using Exact Call Chemistry; 16S sequencing is more commonly done on a long-read platform (454, MiSeq, etc.).
– We generate millions of reads; most studies generate thousands.
Reads are aligned to the database of 16S sequences to the finest possible level of resolution. We keep only uniquely aligned reads.
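The "keep only uniquely aligned reads" step can be sketched as follows. This is an illustration, not the group's actual pipeline: it assumes alignments are already available as `(read_id, reference_id)` pairs, and the function name and the example identifiers are hypothetical.

```python
from collections import defaultdict


def uniquely_aligned(alignments):
    """Return {read_id: reference_id} for reads hitting exactly one reference.

    `alignments` is an iterable of (read_id, reference_id) pairs; a read that
    is reported against two or more different references is discarded.
    """
    hits = defaultdict(set)
    for read_id, ref_id in alignments:
        hits[read_id].add(ref_id)
    return {
        read_id: next(iter(refs))
        for read_id, refs in hits.items()
        if len(refs) == 1
    }


# Hypothetical example: read2 maps to two references and is dropped.
example = [
    ("read1", "otuA"),
    ("read2", "otuA"),
    ("read2", "otuB"),
    ("read3", "otuC"),
]
kept = uniquely_aligned(example)
print(kept)
```

In practice a short-read aligner reports ambiguity in its own way (for example through mapping quality), so the input format here is a simplification.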

Data Analysis - OTUs
Sequences are often reported in “OTUs” (Operational Taxonomic Units). Because related 16S sequences share high levels of identity, an identity threshold is typically applied and similar sequences are collapsed into OTU sequences (commonly at 97% identity). As a result, the level of taxonomic resolution for individual OTU sequences can vary, even at the same identity threshold.
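Collapsing sequences at an identity threshold can be sketched with a greedy procedure. This is a deliberately simplified illustration and not the algorithm of any particular OTU-picking tool: identity is computed position by position on equal-length sequences (real tools align first), and each sequence joins the first existing centroid it matches at or above the threshold. All names are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Greedily group sequences; the first member of each group is its centroid."""
    clusters = []
    for seq in seqs:
        for cluster in clusters:
            if identity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            # No centroid was close enough, so this sequence seeds a new OTU.
            clusters.append([seq])
    return clusters


def mutate(seq: str, n: int) -> str:
    """Change the first n bases so that each one differs from the original."""
    swap = {"A": "C", "C": "A", "G": "T", "T": "G"}
    return "".join(swap[base] for base in seq[:n]) + seq[n:]


base = "ACGT" * 25                      # 100 bp toy sequence
clusters = greedy_otus([base, mutate(base, 2), mutate(base, 5)])
print([len(c) for c in clusters])
```

Here the sequence with 2 mismatches (98% identity) joins the first OTU, while the one with 5 mismatches (95% identity) starts a new one.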

OTU examples
k__Bacteria; p__Bacteroidetes; c__Flavobacteriia; o__Flavobacteriales; f__Flavobacteriaceae; g__Flavobacterium; s__
k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__; g__; s__
k__Bacteria; p__Cyanobacteria; c__Chloroplast; o__Cercozoa; f__; g__; s__
k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__; g__; s__
k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pseudomonadales; f__Moraxellaceae; g__Enhydrobacter; s__
k__Bacteria; p__Acidobacteria; c__Acidobacteriia; o__Acidobacteriales; f__Acidobacteriaceae; g__Terriglobus; s__
k__Bacteria; p__Verrucomicrobia; c__Opitutae; o__Puniceicoccales; f__Puniceicoccaceae; g__Puniceicoccus; s__
k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Caldicoprobacteraceae; g__Caldicoprobacter; s__
k__Bacteria; p__Spirochaetes; c__Spirochaetes; o__Spirochaetales; f__Spirochaetaceae; g__Treponema; s__
k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Rhodospirillales; f__Rhodospirillaceae; g__; s__
k__Bacteria; p__; c__; o__; f__; g__; s__
(k=kingdom; p=phylum; c=class; o=order; f=family; g=genus; s=species)
OTUs typically lack species-level resolution (as seen in the example subset), but some resolve down to order, family, or even genus. A few are not identifiable at all.
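To make the lineage notation concrete, the following sketch parses a lineage string of the form shown above and reports the deepest rank that carries a name. The function names are hypothetical, and the parser assumes the exact `prefix__name` layout used in these examples.

```python
RANKS = {
    "k": "kingdom",
    "p": "phylum",
    "c": "class",
    "o": "order",
    "f": "family",
    "g": "genus",
    "s": "species",
}


def parse_lineage(lineage: str) -> dict:
    """Map each rank name to its taxon, or to None when the rank is empty."""
    parsed = {}
    for field in lineage.split(";"):
        field = field.strip()
        if not field:
            continue
        prefix, _, name = field.partition("__")
        parsed[RANKS[prefix]] = name or None
    return parsed


def deepest_rank(lineage: str):
    """Return the lowest rank in the lineage that has a name, or None."""
    parsed = parse_lineage(lineage)
    deepest = None
    for rank in RANKS.values():          # iterates from kingdom down to species
        if parsed.get(rank):
            deepest = rank
    return deepest


examples = [
    "k__Bacteria; p__Bacteroidetes; c__Flavobacteriia; o__Flavobacteriales; "
    "f__Flavobacteriaceae; g__Flavobacterium; s__",
    "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__; g__; s__",
    "k__Bacteria; p__; c__; o__; f__; g__; s__",
]
for lineage in examples:
    print(deepest_rank(lineage), "<-", lineage)
```

Applied to the three lineages in `examples`, this reports genus, order, and kingdom respectively, matching the observation that resolution varies from OTU to OTU.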

Gut microbiota
Primarily composed of Firmicutes and Bacteroidetes. The balance between these populations has been linked to obesity in mice.
Example using a public dataset, treated to simulate short 75 bp random reads: 3 simulations on the same dataset.
[Figure labels: TestLib1, TestLib2; TestLib1, TestLib3]
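The slides do not say how the public dataset was treated to simulate short random reads. One minimal way to do it, shown here purely as an assumption, is to draw fixed-length windows at uniformly random start positions along each amplicon. The function name, the toy amplicon, and the seed are hypothetical; sequencing errors and strand are ignored.

```python
import random


def simulate_short_reads(amplicon: str, read_length: int = 75,
                         n_reads: int = 1000, seed: int = 0) -> list:
    """Draw n_reads substrings of read_length at random start positions."""
    if len(amplicon) < read_length:
        raise ValueError("amplicon is shorter than the requested read length")
    rng = random.Random(seed)            # seeded so the simulation is repeatable
    last_start = len(amplicon) - read_length
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, last_start)   # inclusive on both ends
        reads.append(amplicon[start:start + read_length])
    return reads


# Toy 1500 bp "amplicon" made of random bases, standing in for a real 16S sequence.
toy_rng = random.Random(42)
toy_amplicon = "".join(toy_rng.choice("ACGT") for _ in range(1500))

reads = simulate_short_reads(toy_amplicon, read_length=75, n_reads=500, seed=1)
print(len(reads), len(reads[0]))
```

Running the function with different seeds on the same input gives independent simulated libraries, which is one way to obtain several simulations from a single dataset.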

Initial trial run

... Hundreds of lines

Initial run results
4 mice run: 2 WT; 2 KO. N at each taxonomic resolution:
– Phylum resolution: N=7.5 Million [Figure panels: E1, E2, E3, E4]
– Class resolution: N=7.5 Million
– Order resolution: N=7.4 Million
– Family resolution: N=2.2 Million
– Genus resolution: N=1.1 Million (Species N=340K; more complex)
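The drop in N with increasing resolution is easier to see as a fraction of the phylum-level total. The sketch below uses only the N values reported in the initial run results and assumes, as an interpretation not stated in the slides, that each N counts the reads assigned at that rank.

```python
# N values in millions, as reported for the initial run.
counts_millions = {
    "phylum": 7.5,
    "class": 7.5,
    "order": 7.4,
    "family": 2.2,
    "genus": 1.1,
    "species": 0.34,
}

baseline = counts_millions["phylum"]
fractions = {rank: n / baseline for rank, n in counts_millions.items()}

for rank, frac in fractions.items():
    print(f"{rank:8s} {counts_millions[rank]:5.2f} M  {frac:6.1%} of phylum-level N")
```

Under that reading, nearly all of the phylum-level N is retained through order, and the large drop occurs between order and family.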

For more information:
Website: mgl.nichd.nih.gov
List serv: MGL-USERS-L
Phone:
Walk-in: Bldg 10/Rm 9D41

Typical Approach
Use long reads from a whole, intact amplicon (a few thousand reads are typically used).
– Perform trimming, remove chimeric sequences, join overlaps in paired ends, etc.
– Compare sequences to a database through BLAST or against a prepared multiple sequence alignment.
– Compare / clean data.
– Assign and resolve taxonomy, describe the distribution.
– Compare populations across conditions, etc. (statistical digging).

Alternative approach using short reads
– Amplify 16S or other amplicon as normal.
– Randomly shear to construct a typical short-read library comprising random starts/ends.
– Generate millions of reads.
– Assign only reads that map unambiguously to OTUs, using short-read aligners.
– Analyze normally from OTU populations.
A more “wasteful” approach, but in practice it performs just as well. It uses higher-throughput instruments rather than lower-capacity long-read platforms.