Canadian Bioinformatics Workshops www.bioinformatics.ca.



Module 3 Metagenomic Taxonomic Composition

Learning Objectives of Module
Understand the pros and cons of 16S versus metagenomic sequencing
Understand different approaches for determining the taxonomic composition of a metagenomic sample
Be able to run MetaPhlAn2 on one or more samples
Be able to determine statistically significant differences in taxonomic abundance across sample groups using STAMP

16S vs Metagenomics
16S: targeted sequencing of a single gene that acts as a marker for identification
Pros
– Well established
– Sequencing is relatively cheap (~10,000 reads/sample)
– Only amplifies what you want (no host contamination)
Cons
– Primer choice can bias results towards certain organisms
– Usually not enough resolution to identify to the strain level
– Different primers are usually needed for archaea and eukaryotes (18S)
– Doesn’t identify viruses

16S vs Metagenomics
Metagenomics: sequencing ALL the DNA in a sample
Pros
– Less bias from sequencing
– Can identify all microbes (euks, viruses, etc.)
– Provides functional information (“What are they doing?”)
Cons
– Host/site contamination can be significant
– Expensive (more sequencing depth is required)
– May not be able to sequence “rare” microbes
– Complex bioinformatics

Metagenomics: Who is there?
Goal: Identify the relative abundance of different microbes in a given sample using metagenomics
Problems:
– Reads are all mixed together
– Reads can be short (~100 bp)
– Lateral gene transfer
Two broad approaches:
1. Binning based
2. Marker based

Binning Based
Attempts to “bin” reads into the genome from which they originated
Composition-based
– Uses GC composition or k-mers (e.g. Naïve Bayes Classifier)
– Generally not very precise and not recommended
Sequence-based
– Compares reads to a large reference database using BLAST (or some other similarity search method)
– Reads are assigned based on a “best-hit” or “Lowest Common Ancestor” approach
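The two composition features named on this slide, GC content and k-mer counts, can be sketched in a few lines of Python. This is only an illustration of the features themselves, not of any particular classifier; the function names and the example read are hypothetical.

```python
from collections import Counter


def gc_fraction(read: str) -> float:
    """Return the fraction of bases in the read that are G or C."""
    read = read.upper()
    if not read:
        return 0.0
    return sum(base in "GC" for base in read) / len(read)


def kmer_counts(read: str, k: int = 4) -> Counter:
    """Count every overlapping k-mer of length k in the read."""
    read = read.upper()
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


# Hypothetical short read, used only to show the calls.
read = "ATGCGCGATATCGCGC"
print(gc_fraction(read))
print(kmer_counts(read, k=2).most_common(3))
```

A composition-based binner would compare such feature vectors against profiles learned from reference genomes; with reads of only ~100 bp the vectors are noisy, which fits the slide's remark that this approach is generally not very precise.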

LCA: Lowest Common Ancestor
Use all BLAST hits above a threshold and assign taxonomy at the lowest level in the tree that covers these taxa.
Notable examples:
– MEGAN: One of the first metagenomic tools; does functional profiling too
– MG-RAST: Web-based pipeline (might need to wait a while for results)
– Kraken: Fastest binning approach to date and very accurate, but with large computing requirements (e.g. >128 GB RAM)
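The LCA rule described above can be sketched as follows. This is a minimal illustration, assuming each hit is a (score, lineage) pair with the lineage listed from the broadest rank to the most specific; the scores, the threshold, and the simplified lineages are made up for the example and do not reflect the internals of MEGAN or any other tool.

```python
def lowest_common_ancestor(lineages):
    """Return the longest leading run of ranks shared by all lineages."""
    if not lineages:
        return []
    shared = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break  # lineages diverge at this rank
        shared.append(ranks[0])
    return shared


def assign_read(hits, min_score):
    """Keep hits scoring at least min_score, then take their LCA."""
    kept = [lineage for score, lineage in hits if score >= min_score]
    return lowest_common_ancestor(kept)


# Hypothetical hits for one read (simplified lineages, invented scores).
base = ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacteriaceae"]
hits = [
    (250.0, base + ["Escherichia", "Escherichia coli"]),
    (240.0, base + ["Shigella", "Shigella flexneri"]),
    (120.0, base + ["Salmonella", "Salmonella enterica"]),
]

print(assign_read(hits, min_score=100.0))
```

With a threshold of 100 all three hits are kept and the read is assigned at the family level; raising the threshold so that only the top hit survives would assign it to the species of that hit, which is the "best-hit" behaviour.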

Marker Based
Single gene
– Identify and extract reads hitting a single marker gene (e.g. 16S, cpn60, or other “universal” genes)
– Use an existing bioinformatics pipeline (e.g. QIIME)
Multiple gene
– Several universal genes: PhyloSift (Darling et al., 2014)
» Uses 37 universal single-copy genes
– Clade-specific markers: MetaPhlAn (Segata et al., 2012)

Marker or Binning?
Binning approaches
– May be too computationally intensive
– May not adequately reflect organism abundances due to differences in genome size
Marker approaches
– Do not allow functions to be linked directly to organisms
– Genome reconstruction is not possible
– Very sensitive to the choice of markers
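The genome-size point can be made concrete with a small calculation. The sketch below assumes reads are sampled uniformly from all DNA in the sample, so an organism's share of reads is proportional to cells × genome size; the organisms, cell counts, and genome sizes are hypothetical.

```python
def read_share(cells, genome_size_bp):
    """Expected fraction of reads per organism under uniform sampling of DNA."""
    dna = {org: cells[org] * genome_size_bp[org] for org in cells}
    total = sum(dna.values())
    return {org: amount / total for org, amount in dna.items()}


def size_corrected_abundance(shares, genome_size_bp):
    """Divide each read share by genome size, then renormalise to sum to 1."""
    per_bp = {org: shares[org] / genome_size_bp[org] for org in shares}
    total = sum(per_bp.values())
    return {org: value / total for org, value in per_bp.items()}


# Two hypothetical organisms present at equal cell counts.
cells = {"organism_A": 1000, "organism_B": 1000}
genome_size_bp = {"organism_A": 2_000_000, "organism_B": 8_000_000}

shares = read_share(cells, genome_size_bp)
print(shares)  # organism_B receives about four times as many reads
print(size_corrected_abundance(shares, genome_size_bp))
```

Counting binned reads directly would report roughly 20% versus 80% even though the two organisms are equally abundant as cells; dividing by genome size recovers the even split, but only if the genome sizes are known.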

Why MetaPhlAn?
Fast (marker database is considerably smaller)
Markers for bacteria, archaea, eukaryotes, and viruses (since MetaPhlAn2 was released)
Continuously updated and supported
Used by the Human Microbiome Project
Generally accepted as a robust method for taxonomy assignment
Main disadvantage: not all reads are assigned a taxonomic label

MetaPhlAn
Uses “clade-specific” gene markers
A clade represents a set of genomes that can be as broad as a phylum or as specific as a species
Uses ~1 million markers derived from ~17,000 genomes
– ~13,500 bacterial and archaeal, ~3,500 viral, and ~110 eukaryotic
Can identify down to the species level (and possibly even strain level)
Can handle millions of reads on a standard computer within a few minutes

Module 3 bioinformatics.ca MetaPhlAn Open-source: –

Module 3 bioinformatics.ca MetaPhlAn Marker Selection


Using MetaPhlAn
MetaPhlAn uses Bowtie2 for sequence similarity searching (nucleotide sequences vs. nucleotide database)
Paired-end data can be used directly
Each sample is processed individually, and multiple samples can then be combined at the last step
Output is relative abundances at different taxonomic levels
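The "combine at the last step" idea can be sketched in plain Python. This is an illustration only, not MetaPhlAn's own merging code: it assumes each per-sample profile is tab-separated text with a clade name and a relative abundance per line and comment lines starting with `#`, and the sample names and values are invented.

```python
def parse_profile(text):
    """Parse one per-sample profile into {clade: relative_abundance}."""
    profile = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        clade, abundance = line.split("\t")[:2]
        profile[clade] = float(abundance)
    return profile


def merge_profiles(profiles):
    """Combine {sample: profile} into (sample_names, {clade: [values]})."""
    samples = sorted(profiles)
    clades = sorted({clade for p in profiles.values() for clade in p})
    # A clade missing from a sample is reported as 0.0 for that sample.
    table = {c: [profiles[s].get(c, 0.0) for s in samples] for c in clades}
    return samples, table


# Hypothetical profile text for two samples (assumed layout).
sample1 = "#SampleID\tprofile\nk__Bacteria\t100.0\nk__Bacteria|p__Firmicutes\t60.0\nk__Bacteria|p__Bacteroidetes\t40.0\n"
sample2 = "#SampleID\tprofile\nk__Bacteria\t100.0\nk__Bacteria|p__Firmicutes\t100.0\n"

samples, table = merge_profiles({"S1": parse_profile(sample1), "S2": parse_profile(sample2)})
print("clade", *samples, sep="\t")
for clade, values in table.items():
    print(clade, *values, sep="\t")
```

Because each sample is profiled independently, the merge is just a join on clade names, with zeros filled in where a clade was not detected.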

Absolute vs. Relative Abundance
Absolute abundance: numbers represent the real abundance of the thing being measured (e.g. the actual quantity of a particular gene or organism)
Relative abundance: numbers represent the proportion of the thing being measured within a sample
In almost all cases, microbiome studies measure relative abundance
– This is because DNA amplification during sequencing library preparation is not quantitative

Relative Abundance Use Case
Sample A:
– Has 10^8 bacterial cells (but we don’t know this from sequencing)
– 25% of the microbiome from this sample is classified as Shigella
Sample B:
– Has 10^6 bacterial cells (but we don’t know this from sequencing)
– 50% of the microbiome from this sample is classified as Shigella
“Sample B contains twice as much Shigella as Sample A”
– WRONG! (If we quantified it, we would find that Sample A has more Shigella)
“Sample B contains a greater proportion of Shigella compared to Sample A”
– Correct!
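The arithmetic behind this use case can be checked directly. The short sketch below uses the cell counts and proportions from the slide; the function name is hypothetical, and in a real study the total cell counts would have to come from an independent measurement, not from sequencing.

```python
def absolute_count(total_cells, relative_abundance):
    """Absolute number of cells of a taxon = total cells x its proportion."""
    return total_cells * relative_abundance


shigella_a = absolute_count(10**8, 0.25)  # Sample A: 10^8 cells, 25% Shigella
shigella_b = absolute_count(10**6, 0.50)  # Sample B: 10^6 cells, 50% Shigella

print(f"Sample A: {shigella_a:,.0f} Shigella cells")
print(f"Sample B: {shigella_b:,.0f} Shigella cells")
print(f"A has {shigella_a / shigella_b:.0f}x as many Shigella cells as B")
```

Sample A holds 2.5 × 10^7 Shigella cells against 5 × 10^5 in Sample B, a 50-fold difference in the opposite direction to the one the proportions alone might suggest.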

Visualization and Statistics
Various tools are available to determine statistically significant taxonomic differences across groups of samples
– Excel
– SigmaPlot
– R
– MeV (MultiExperiment Viewer)
– Python (matplotlib)
– LEfSe & GraPhlAn (Huttenhower Group)
– STAMP

Module 3 bioinformatics.ca STAMP


Module 3 bioinformatics.ca STAMP Plots

STAMP Input
1. “Profile file”: Table of features (samples by OTUs, samples by functions, etc.)
– Features can form a hierarchy (e.g. Phylum, Class, Order, etc.) to allow data to be collapsed within the program
2. “Group file”: Contains different metadata for grouping samples
– Can be two groups (e.g. Healthy vs Sick) or multiple groups (e.g. water depth at 2 m, 4 m, and 6 m)
Output
– PCA, heatmap, box, and bar plots
– Tables of significantly different features
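As a rough illustration of the two inputs, the sketch below builds a small profile table and a matching group table as tab-separated text. The layout is an assumption and should be checked against the STAMP documentation: here the hierarchy columns come first, followed by one column per sample, and the group table has one row per sample. All sample names, taxa, counts, and group labels are invented.

```python
import csv
import io


def to_tsv(header, rows):
    """Serialise a header and rows as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


samples = ["S1", "S2", "S3", "S4"]

# Profile table (assumed layout): hierarchy columns, then per-sample values.
profile_tsv = to_tsv(
    ["Phylum", "Class", "Order"] + samples,
    [
        ["Firmicutes", "Bacilli", "Lactobacillales", 120, 95, 10, 14],
        ["Firmicutes", "Clostridia", "Clostridiales", 40, 52, 88, 91],
        ["Bacteroidetes", "Bacteroidia", "Bacteroidales", 30, 41, 150, 132],
    ],
)

# Group table (assumed layout): sample ID plus one metadata column.
group_tsv = to_tsv(
    ["Sample", "Status"],
    [["S1", "Healthy"], ["S2", "Healthy"], ["S3", "Sick"], ["S4", "Sick"]],
)

print(profile_tsv)
print(group_tsv)
```

Keeping the sample names identical in both tables is what lets the metadata in the group table be matched to the columns of the profile table.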

Module 3 bioinformatics.ca Questions?
