Advancing Science with DNA Sequence Natalia Ivanova MGM Workshop September 29, 2011 Metagenome analysis: use case.

Presentation transcript:

Minoan eruption and metagenomics
…it seemed as though the sea was being sucked backwards, as if it were being pushed back by the shaking of the land… Behind us were frightening dark clouds, rent by lightning twisted and hurled, opening to reveal huge figures of flame. These were like lightning, but bigger.
From Pliny the Younger’s Letter

Apart from Minoan eruption…
[Figures: from Chernicoff & Stanley, Geology, 2007; diagram by Gary Massoth/PMEL]

Sampling sites: white mat and red mat
Key gradients, white vs red:
Temperature: 60 vs 18 °C
CO2 tension: >99% vs <1%

This is what it looks like

Chimney material may be of biological origin

Standard JGI metagenome pipeline
DNA sample=>DNA QC=>SSU pyrotags and shotgun libraries
SSU pyrotags: community composition; semi-quantitative OTU abundance
Shotgun libraries: Illumina standard, Illumina long mate pair, 454 standard, 454 long mate pair
Assembly=>Metagenome (contigs + unassembled reads)
Analysis in IMG/M-ER: community composition; functional analysis

Pyrotag results – BLASTn against Greengenes database

PhyloDistribution results – BLASTp of metagenome CDSs against isolates in IMG

Pyrotags vs PhyloDistribution – white mat
Big differences in abundance (an order of magnitude or more) of Bacteroidetes and Thermotogae
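One way to make "an order of magnitude or more" concrete is to compare the two relative-abundance profiles on a log10 scale. The sketch below is only an illustration: the function names are invented here, and the counts are made-up placeholders, not the workshop's pyrotag or PhyloDistribution results.

```python
import math

def relative_abundance(counts):
    """Convert raw per-phylum counts to fractions of the total."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}

def log10_fold_differences(profile_a, profile_b, pseudo=1e-6):
    """Return log10(a / b) for every taxon seen in either profile.

    A small pseudo-fraction avoids division by zero for taxa that are
    missing from one of the profiles.
    """
    taxa = set(profile_a) | set(profile_b)
    return {
        t: math.log10((profile_a.get(t, 0.0) + pseudo)
                      / (profile_b.get(t, 0.0) + pseudo))
        for t in taxa
    }

# Hypothetical counts, for illustration only (not data from the slides).
pyrotag_counts = {"Bacteroidetes": 500, "Thermotogae": 20, "Proteobacteria": 480}
cds_hit_counts = {"Bacteroidetes": 40, "Thermotogae": 300, "Proteobacteria": 660}

diffs = log10_fold_differences(relative_abundance(pyrotag_counts),
                               relative_abundance(cds_hit_counts))
# Flag taxa whose abundance differs by at least one order of magnitude.
flagged = sorted(t for t, d in diffs.items() if abs(d) >= 1.0)
print(flagged)
```

With these placeholder counts, only the two phyla whose shares differ more than tenfold are flagged; the sign of each log10 value shows which method reports the higher abundance.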

Possible explanations
- Primer bias in pyrotags (against Proteobacteria)?
- Amplification artifacts in pyrotags – well known for metagenome data
- Sequencing GC bias in the metagenome: low-GC and high-GC (65%) sequences are underrepresented in Illumina data
- K-mer assembler problems: abundant populations may be underrepresented in the assembly if incorrect k-mer/coverage parameters are selected

PCR artifacts in metagenome data
454 technology includes an emulsion PCR (emPCR) step, which may lead to artificial overrepresentation of certain sequences
Reason: presence of free beads during the library prep step; escaped emPCR products bind to free beads and are disproportionately amplified
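A rough way to screen for this kind of artificial overrepresentation is to look for reads that begin with exactly the same bases. The sketch below is a simplification under stated assumptions: the function names, the prefix length and the toy reads are all hypothetical. Reads from a genuinely abundant population can also share start positions by chance, so the result is a hint rather than proof of an artifact.

```python
from collections import defaultdict

def prefix_clusters(reads, prefix_len=20):
    """Group read IDs by the first prefix_len bases of each read.

    reads is an iterable of (read_id, sequence) pairs; reads shorter
    than prefix_len are skipped.
    """
    clusters = defaultdict(list)
    for read_id, seq in reads:
        if len(seq) >= prefix_len:
            clusters[seq[:prefix_len].upper()].append(read_id)
    return clusters

def duplicate_fraction(reads, prefix_len=20):
    """Fraction of clustered reads that are extra copies of some prefix."""
    clusters = prefix_clusters(reads, prefix_len)
    clustered = sum(len(ids) for ids in clusters.values())
    if clustered == 0:
        return 0.0
    extra = sum(len(ids) - 1 for ids in clusters.values())
    return extra / clustered

# Toy reads, for illustration only.
toy_reads = [
    ("r1", "ACGTACGTAAAA"),
    ("r2", "ACGTACGTAAAT"),
    ("r3", "ACGTACGTCC"),
    ("r4", "TTTTGGGGCCCC"),
]
print(duplicate_fraction(toy_reads, prefix_len=8))
```

Here three of the four toy reads share an 8-base prefix, so two of the four count as extra copies.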

What about GC bias?
Examples: low GC (Brachyspira), medium GC (Arcanobacterium), high GC (Cellulomonas)
Question: how do you find average/max/min GC content for a clade?
Answer: IMG=>Genome Browser=>View Phylogenetically=>click on green + to select the clade, then “Add selected to Genome Cart”=>Compare Genomes=>Genome Statistics
Result:
Thermotogae GC percent: 41 average / 47 max / 31 min
Bacteroidetes GC percent: 42.5 average / 66 max / 31 min
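Outside IMG, the same average/max/min summary can be approximated from genome sequences directly. This is a minimal sketch, assuming one concatenated sequence string per genome and an unweighted mean across genomes; the function names and toy sequences are illustrative and say nothing about how IMG computes its own statistics.

```python
def gc_percent(seq):
    """GC content in percent, ignoring ambiguous bases such as N."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt

def clade_gc_summary(genomes):
    """Average, max and min GC percent over a dict of genome sequences."""
    values = [gc_percent(seq) for seq in genomes.values()]
    return {
        "average": sum(values) / len(values),
        "max": max(values),
        "min": min(values),
    }

# Toy sequences standing in for the genomes of one clade.
toy_clade = {"genome_a": "GGCC", "genome_b": "ATGC", "genome_c": "ATAT"}
print(clade_gc_summary(toy_clade))
```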

Are there any abundant populations that could be filtered out in assembly?
[Figure: typical Pyrotagger output]
There are 2 highly abundant populations: just 2 clusters account for nearly all Bacteroidetes and Thermotogae in the sample

Let’s take a closer look at the assemblies and unassembled reads

                                             White mat      Red mat
454 reads total                                299,975    1,429,091
Illumina reads total                        49,227,146   45,337,178
Assembled contigs                              195,590       88,776
N50, bp
Longest contig, bp                              28,145       75,483
Illumina reads mapped to assembly, % total
reads mapped to assembly, % total
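For reference, N50 is the contig length at which the contigs of that length or longer contain at least half of all assembled bases. A minimal sketch of the calculation, with made-up contig lengths:

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are walked from longest to shortest; N50 is the length of
    the contig at which the running total first reaches half of the
    total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")

# Made-up contig lengths: total is 20, and 6 + 5 = 11 passes the halfway mark.
print(n50([2, 3, 4, 5, 6]))
```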

Functional analysis: metagenome as a bag of functions
Red mat is taxonomically more diverse. Is it more diverse functionally?
[Table: COG clusters and Pfam clusters, white mat vs red mat]
Question: where do you find this information?
Answer: IMG=>Taxon Details=>Metagenome Statistics; Genes with Pfam=>Display as a list=>Export
Rarefaction curves: white mat is expected to have ~4000 different Pfams; red mat ~3600
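A rarefaction curve can be sketched from an exported list of per-Pfam gene counts. The code below uses the standard analytical rarefaction formula (the expected number of distinct families in a random subsample drawn without replacement); the function name and the toy counts are assumptions. It only interpolates within the observed data. The ~4000 and ~3600 figures on the slide are extrapolations, which need a separate richness estimator that is not shown here.

```python
from math import comb

def expected_families(counts, n):
    """Expected number of distinct families in a subsample of n genes.

    counts holds the number of genes observed for each family. A family
    with c genes is missed entirely with probability
    C(N - c, n) / C(N, n), where N is the total number of genes.
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total gene count")
    denom = comb(total, n)
    return sum(1 - comb(total - c, n) / denom for c in counts)

# Toy counts for three families; one point per subsample size.
toy_counts = [2, 1, 1]
curve = [expected_families(toy_counts, n) for n in range(sum(toy_counts) + 1)]
print(curve)
```

Plotting such curves for both samples on the same axes shows whether one sample keeps accumulating new families faster than the other at equal sampling depth.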

Abundance Comparisons
Motility and chemotaxis genes are overrepresented in white mat (detected by both Pfams and COG Categories)
[Figure panels: white mat, red mat]
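As a generic stand-in for an abundance comparison, a two-proportion z-test asks whether a functional category makes up a different share of genes in the two samples. This is a textbook test, not necessarily the statistic IMG applies; the function name and the gene counts are hypothetical. When many families are tested at once, a multiple-testing correction is also needed.

```python
import math

def two_proportion_z(hits_a, total_a, hits_b, total_b):
    """Two-sided two-proportion z-test.

    Returns (z, p_value) for the difference between hits_a / total_a
    and hits_b / total_b, using the pooled standard error.
    """
    p_a = hits_a / total_a
    p_b = hits_b / total_b
    pooled = (hits_a + hits_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: genes in a category vs all genes in each sample.
z, p = two_proportion_z(200, 10_000, 100, 10_000)
print(round(z, 2), p)
```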

Is motility/chemotaxis common to all organisms in white mat?
Scenario 1: the function/pathway is overrepresented because it is present in all members of the community, possibly at higher copy number
Scenario 2: the function/pathway is overrepresented because it is present in one clade, which is absent from the second sample
Question: can we distinguish between the two scenarios?
Answer: click on the gene count for protein family/functional category, add all genes to Gene Cart=>add scaffolds to Scaffold Cart=>PhyloDistribution of all scaffolds in the Scaffold Cart
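The same question can be asked of an exported gene list: do the genes of the overrepresented category spread across many clades (scenario 1) or concentrate in one (scenario 2)? The sketch below assumes each gene already has a clade label, for example from its scaffold's phylogenetic assignment. The function names, the 0.8 threshold and the toy labels are all hypothetical.

```python
from collections import Counter

def clade_shares(gene_clades):
    """Share of the category's genes assigned to each clade."""
    counts = Counter(gene_clades)
    total = sum(counts.values())
    return {clade: n / total for clade, n in counts.items()}

def dominant_clade(gene_clades, threshold=0.8):
    """Return the clade holding at least threshold of the genes, or None.

    A dominant clade points towards scenario 2; None is consistent with
    the function being spread across the community (scenario 1).
    """
    if not gene_clades:
        return None
    shares = clade_shares(gene_clades)
    clade, share = max(shares.items(), key=lambda item: item[1])
    return clade if share >= threshold else None

# Toy clade labels for genes of one functional category.
concentrated = ["Epsilonproteobacteria"] * 9 + ["Bacteroidetes"]
spread = ["Epsilonproteobacteria", "Bacteroidetes", "Thermotogae", "Aquificae"]
print(dominant_clade(concentrated), dominant_clade(spread))
```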

Are Sulfurimonas-like bacteria present in both samples?
The total number of sequences in all clusters assigned to Epsilonproteobacteria is 50 in white mat and 66 in red mat
Largest cluster in white mat includes 125K+ sequences
Largest cluster in red mat includes 14K+ sequences
Question: what about the presence of Sulfurimonas-like bacteria in the metagenomes?
Answer: go to Compare Genomes=>PhyloDistribution=>Genome vs Metagenomes, select the genome; the histogram shows the number of BLASTp hits from CDSs in all metagenomes to this genome

Are there any methylotrophs in the white mat?

Conclusions
- The two communities have different compositions; the white mat, sampled next to the hydrothermal vent, has lower complexity
- Community composition as sampled by pyrotags and by the metagenome may be quite different due to a number of biases
- Some protein families/functional categories are more abundant in one sample than in the other because of different community composition, and not necessarily because they are more important in that environment