Metagenomics and the microbiome

What is metagenomics?
Studying microorganisms by sequencing their genomes directly from a sample rather than by culturing them. Environmental use cases: agriculture, biofuels, pollution monitoring. Health use case: the human microbiome. Most microbes are anaerobic and therefore difficult to culture.

Why care about the microbiome?
You = 10^13 of your own cells + 10^14 bacterial cells. More actionable genomics.
Why should we have a microbiome at all? Why do some bugs get a pass from our immune system? They are not just commensal but symbiotic: leveraged adaptation (10^5-10^6 generations of bacteria per generation of human)? It is an ancient adaptation: animals have had resident microbes helping with metabolism for at least 500 million years. "Comparing germ-free and normal mice indicates that microbiota are responsible for most of the metabolites that are detected in plasma." The microbiome is both responsive to and productive of environmental factors.
Examples: 23andMe in The Economist, C. diff. The environmental microbiome: sushi and the Japanese gut. Your pets and your microbiome. People living together. The vaginal microbiome and preterm birth: Lactobacillus. The oral microbiome and dental applications.
The big thing to keep in mind is to look at this ecologically: the microbiome is much more than the sum of its parts in both health and disease. For instance, two entirely different oral microbe populations were found to break down sugars in the same way. "One microbe, one disease" doesn't quite work.
Sources: http://www.med-health.net/Best-Time-To-Take-Probiotics.html http://www.mayo.edu/research/labs/gut-microbiome/projects/fecal-microbiota-transplant-c-diff-colitis

Why care about the microbiome?
Diagnostic or modulatory implications in: obesity, diabetes, fatigue, pain disorders; anxiety, depression, autism; antibiotic-resistant bacteria; IBD and other gut disorders; cardiac function, cancer.
The classic example is H. pylori and ulcers. As on the last slide, "one microbe, one disease" appears to be the wrong framework here as well. The limited association studies done so far suggest that the situation resembles that of GWAS and common diseases: small effect sizes, with non-additive interactions at play. We'll see a bit more about this later when we look at composition. What these diseases have in common is a somewhat nebulous, chronic character that leaves sufferers trying multiple options, many of which don't work that well.

Diseases and the microbiome
Source: The human microbiome: at the interface of health and disease. Nature Reviews Genetics.

Why care about the microbiome?
Publications containing 'microbiome' by date on ScienceDirect.

Goal 1: Composition
500-1000 species of bacteria in the human gut. More and more is being discovered about how composition associates with disease. A "virtual organ". Many ways of looking at diversity.
Sources: The human microbiome: at the interface of health and disease. Nature Reviews Genetics. http://huttenhower.sph.harvard.edu/metaphlan

Diversity measures
Alpha diversity: how diverse is this population? Measured with Simpson's index, Shannon's index, etc. Example: the difference in alpha diversity before and after antibiotics. Beta diversity: how taxonomically similar are two samples? Example: finding compositional associations between a disease cohort and microbial makeup.
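
The diversity measures named on this slide can be sketched in a few lines. This is a minimal illustration, not the method of any particular tool: the function names are made up for the example, the Simpson value is computed in its Gini-Simpson form (1 minus the sum of squared proportions), and Bray-Curtis dissimilarity is an assumed choice of beta-diversity measure, since the slide does not name one.

```python
import math

def shannon_index(counts):
    """Shannon index H = -sum(p_i * log2(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def simpson_index(counts):
    """Gini-Simpson index: 1 - sum(p_i^2). Higher means more diverse."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two count vectors over the same taxa.

    0 means identical composition, 1 means no shared taxa.
    """
    numerator = sum(abs(a - b) for a, b in zip(sample_a, sample_b))
    denominator = sum(sample_a) + sum(sample_b)
    return numerator / denominator
```

With hypothetical counts such as `before = [25, 25, 25, 25]` and `after = [97, 1, 1, 1]` for four taxa before and after antibiotics, `shannon_index(after)` comes out lower than `shannon_index(before)`, which is the kind of drop in alpha diversity the slide refers to.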

Sequencing for diversity
Pyrosequencing the 16S ribosomal RNA gene. The 16S gene's coding regions are highly conserved among bacteria, while other internal regions of the gene are highly variable, possessing almost entirely unique sequences in most bacterial clades. PCR-amplified, Sanger-sequenced. Copy number variation (CNV) and primer biases. 16S often fails to distinguish to the species level, giving genus and family resolution only. E.g. C. diff is hard for 16S to distinguish from other, benign Clostridium species: a very important distinction!
< 10 taxa appear in > 95% of people in the HMP. Recall the implicated diseases: this looks like GWAS, with common disease, small effect size plus common disease, rare variant. About 30 species make up 99% of the bacteria, but the low-abundance ones might still be important. More on this problem later.

Goal 2: Functional profiling
Remember the ecological approach: it is not necessarily the strict composition that matters so much as what the ecosystem as a whole is doing (usually metabolically).
Source: The human microbiome: at the interface of health and disease. Nature Reviews Genetics.

Functional profiling
Current: which genes are present and being transcribed. In development: proteomics, metabolomics.
"Most genes related to amino acid biosynthesis are not expressed by the typical gut microbiome; these compounds generally are available from host diet and metabolites. Rather, the most highly transcribed genes are those related to energy production."
Archaea, when present, are transcriptionally the most active: the fermentation efficiency of the entire gut microbiome is limited by accumulation of hydrogen, and methanogenesis is the most efficient means of excess hydrogen removal.
E.g. IBD: whole-community shifts from amino acid biosynthesis to amino acid transport, a larger reliance on host metabolites and energy harvesting, and more genes for surviving the redox stress of the inflammatory immune response. Some of these might feel roughly causal while others are probably effects.

Sequencing for function
Whole microbiome sequencing (WMS). Avoids primer biases and is more kingdom-agnostic. Assembly is hard, especially where reference genomes don't exist. Assembly is also hard because of the aforementioned abundance problem: if 30 species make up 99% and there is something really nasty lurking in the remaining 1%, how do we make sure it is covered?

Two big problems
Can't understand the body without understanding the microbiome. Can't understand the microbiome by only looking at bacteria. Read fragment assembly is very, very hard in metagenomics.

Kingdom-Agnostic Metagenomics

The players in your body
Your cells, metabolites, bacteria, bacteriophages, other viruses, fungi. Metabolites: various small molecules involved in fuel, structure, signaling, and enzymatic/catalytic activity.

That's not complexity
We've seen these wall charts showing the signaling maps of a given cell. Imagine the complexity of the real ecosystem.
Source: A comprehensive map of the toll-like receptor signaling network. Molecular Systems Biology.

Prokaryotic virome: bacteriophages
Infect bacteria. Transfer genetic material among bacteria. Rapidly evolving. Put constant selection pressure on the bacterial microbiome. Important in antibiotic resistance gene transfer. Potential as therapeutic agents.

Bacteriophages: deep sequencing results
60% of sequences are dissimilar from all sequence databases. More than 80% come from 3 families. Little intrapersonal variation. Large interpersonal variation, even among relatives. Diet affects community structure. Antibiotic resistance genes are found in viral material.

Bacteriophages and function
Cross the intestinal barrier, possibly affecting the systemic immune response. Adhere to mucin glycoproteins, potentially causing an immune response in the gut epithelium. IBD/Crohn's: relative increase in Caudovirales bacteriophages. Affect bacterial composition and/or the host directly.

Eukaryotic virome
Fecal samples from healthy children show a complex community of typically pathogenic viruses. Includes plant RNA viruses from food. Anelloviruses and circoviruses are present in nearly 100% by age 5, likely from industrial agriculture. These are typically viruses of livestock and plants.

Eukaryotic viruses and function
A simian immunodeficiency virus experiment showed enteric virome expansion. Increased gut permeability and caused inflammation of the intestinal lining. Acute diarrhea subjects showed novel viruses and highly divergent viruses with less than 35% similarity to catalogued viruses at the amino acid level. So the immune system does hold the enteric virome at bay, but not completely.

Meiofauna
Fungi, protozoa, and helminths (worms). No experiments have been conducted with sampling to saturation; much more work to be done. 18S sequencing showed 66 genera of fungi in the gut, and fungi were found in 100% of samples. Most subjects had fewer than 10 genera. But high fungal diversity is bad: it increases in IBD and increases with antibiotic usage. Oral Candida and antibiotics. Helminthic parasites seem to confer resistance to asthma, IBD, and other autoimmune disorders.

But it's very hard
Amplicon-based approaches don't work well for viruses: viruses are highly divergent, with no magic 16S-like unit that works well across populations. Heterogeneous sample prep is required: large differences in cellular integrity and nucleic acid encapsidation, nuclear versus cytoplasmic. Large differences in genome sizes, from a few kb in viruses to 100+ Mb in fungi. Small genomes plus divergence require lots of coverage to get contigs.

Getting the whole picture
Source: Meta'omic Analytic Techniques for Studying the Intestinal Microbiome. Gastroenterology.

The assembly problem

Isn't assembly easy?
Recall: 500-1000 species of bacteria in the gut, but about 30 of them make up 99% of the composition. 33% of the bacterial microbiome is not well represented in reference databases, and > 60% for bacteriophages. However, low-abundance organisms can still have a large impact, so we need to know whether they are there or not. < 1% of reads mapped to non-bacterial taxa in the Human Microbiome Project Consortium studies.

Coverage
Coverage: the mean number of reads per base, C = N × L / G, where L = read length, N = number of reads, G = genome size. Problem: with 2nd-gen WMS technologies, L is low and G is astronomical or unknown. Thus, "full or sometimes even adequate coverage may be unattainable".
Source: A primer on metagenomics.
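
The coverage relationship on this slide can be worked through numerically. The sketch below uses the standard estimate C = N × L / G with the slide's symbols. The `abundance` parameter is an assumption added for illustration: it stands for the fraction of all reads that come from the organism of interest, which is how the abundance problem from the earlier slides enters the arithmetic.

```python
import math

def coverage(num_reads, read_length, genome_size):
    """Mean number of reads covering each base: C = N * L / G."""
    return num_reads * read_length / genome_size

def reads_needed(target_coverage, read_length, genome_size, abundance=1.0):
    """Total reads required so that one organism reaches target_coverage.

    abundance is the (assumed) fraction of reads originating from that
    organism; 1.0 means a pure, single-genome sample.
    """
    return math.ceil(target_coverage * genome_size / (read_length * abundance))
```

For a hypothetical 5 Mb genome and 100 bp reads, `coverage(1_000_000, 100, 5_000_000)` gives 20x for a pure sample. If the same organism contributes only a small fraction of the reads, `reads_needed` grows in inverse proportion to that fraction, which is why rare members of the community are so expensive to cover.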

Sequence length and discovery
Mostly stuck on the first 3 or 4 rows.
Source: A primer on metagenomics.

All is not lost
Can use rarefaction curves to estimate our coverage. Green is well-sampled.
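
A rarefaction curve can be approximated by repeatedly subsampling the reads at increasing depths and counting how many distinct taxa show up. The sketch below assumes each read has already been assigned a taxon label; the function name, the number of trials, and the fixed seed are illustrative choices, not part of any specific tool.

```python
import random

def rarefaction_curve(taxon_labels, depths, trials=20, seed=0):
    """Mean number of distinct taxa observed at each subsampling depth.

    taxon_labels: list with one taxon label per read.
    depths: subsample sizes, each no larger than len(taxon_labels).
    """
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        # Subsample without replacement and count distinct taxa, several times.
        richness = [len(set(rng.sample(taxon_labels, depth)))
                    for _ in range(trials)]
        curve.append(sum(richness) / trials)
    return curve
```

A curve that flattens as depth increases suggests the sample is well covered; a curve that is still climbing at the full depth suggests that more sequencing would reveal more taxa.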

All is not lost
For composition analysis, the phylogenetic marker regions (18S, 16S) work pretty well. For functional analysis, ORFs can still be found fairly reliably and aligned to homologs in databases. Barring this, clustering and motif-finding yield some information. Open reading frames (ORFs): stretches of sequence that run from a start codon to a stop codon with no in-frame stop codon in between, i.e. candidate genes. ORF finding is estimated at around 85% to 90% accuracy.
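
To make the ORF idea concrete, here is a deliberately naive ORF scan. It assumes the standard start codon ATG and the three standard stop codons, looks only at the three forward-strand reading frames, and ignores everything real gene finders model (reverse strand, alternative start codons, codon usage, partial genes at fragment ends). The function name and the `min_codons` threshold are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end, sequence) for forward-strand ORFs.

    An ORF here runs from the first ATG in a frame to the next in-frame
    stop codon; min_codons counts both the start and the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None                   # close it and keep scanning
    return orfs
```

On the toy sequence `"CCATGAAATTTTAGGG"` this reports a single ORF, `ATGAAATTTTAG`, at positions 2 to 14. The predicted ORFs would then be the sequences aligned against homologs in a database.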

Different sequencing approaches?
We still haven't addressed how to get at particularly rare or divergent sequences. What to do? Single-cell microfluidics in the future. Now: hybrid long/short read approaches, in which short reads are used to correct errors in long reads, and "finishing" with Sanger sequencing. The Pacific Biosciences SMRT (single-molecule, real-time) approach: only requires one library instead of 2nd gen + Sanger. SMRT errors are random, unbiased. De novo assembly is 99.999% concordant with reference genomes.

HGAP: the SMRT assembly algorithm
Select the longest reads as seeds. Use the seed reads to recruit short reads. Assemble using off-the-shelf assembly tools. Refine the assembly using sequencer metadata.
Source: Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nature Methods.

Seed selection
The average read length from SMRT is 3.2kb; 141k total continuous long reads were generated. Order reads according to length. Consider reads above length L ~ 6kb. Rough end-pair align reads until ~20x coverage is reached. Result: 17.7k seed reads, averaging 7.2kb in length, already at 86.9% accuracy compared to the reference.
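
The seed-selection idea can be sketched as follows. This is a simplification and not the HGAP implementation: it works on read lengths only and approximates "until ~20x coverage is reached" by summing bases against an assumed genome size, whereas the slide describes rough alignment of the reads. The function name and default values are illustrative, with the 6 kb cutoff and 20x target taken from the slide.

```python
def select_seed_reads(read_lengths, genome_size, target_coverage=20,
                      min_length=6000):
    """Pick the longest reads as seeds until the target coverage is reached.

    Coverage is approximated as (total seed bases) / genome_size.
    """
    seeds = []
    total_bases = 0
    for length in sorted(read_lengths, reverse=True):
        # Stop once reads fall below the cutoff or enough bases are collected.
        if length < min_length or total_bases >= target_coverage * genome_size:
            break
        seeds.append(length)
        total_bases += length
    return seeds
```

In a real pipeline the items would be read records rather than bare lengths, but the ordering and stopping rule are the part this sketch is meant to show.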

Recruiting short reads
Align all reads to the seed reads. Each read can be mapped to multiple seed reads, controlled by the -bestn parameter. -bestn must be chosen so that the coverage of seeds plus short aligned reads is about equal to the expected coverage of the sequenced genome. Use multiple sequence alignment (MSA) and consensus to error-correct the long reads. The result is 17.2k reads of length 5.7kb with 99.9% accuracy.
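
The "MSA and consensus" step can be illustrated with a column-wise majority vote. The sketch assumes the reads have already been aligned to the seed and padded with `-` to equal length; producing that alignment is the hard part and is not shown. This is a toy stand-in for the consensus step, not the algorithm HGAP actually uses.

```python
from collections import Counter

def consensus(aligned_reads, gap="-"):
    """Majority-vote consensus over equal-length, pre-aligned reads."""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")
    out = []
    for column in zip(*aligned_reads):
        bases = [base for base in column if base != gap]
        if bases:  # skip columns that contain only gaps
            out.append(Counter(bases).most_common(1)[0][0])
    return "".join(out)
```

For example, `consensus(["ACGT-A", "ACCTTA", "A-GTTA"])` returns `"ACGTTA"`: the single-read errors (a substitution in the second read, missing bases in the first and third) are outvoted by the other reads, which is why random, unbiased errors are comparatively easy to correct with enough coverage.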

Overlap layout consensus assembly
Source: Overview of Genome Assembly Algorithms. Ntino Krampis. http://www.slideshare.net/agbiotec/overview-of-genome-assembly-algorithms
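
As a rough intuition for the overlap and layout stages, the toy below greedily merges the pair of reads with the longest exact suffix-prefix overlap until no overlaps remain. It is a greedy simplification, not a real overlap-layout-consensus assembler: it assumes error-free reads from one strand, does no consensus step, and scales poorly. Function names and the `min_len` threshold are illustrative.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the best-overlapping pair of sequences."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
    return contigs
```

With the three made-up reads `["ATGGCGT", "GCGTGCA", "TGCAATT"]` the result is the single contig `ATGGCGTGCAATT`. Reads with no sufficient overlap are left as separate contigs, which is what happens to low-coverage organisms in a metagenomic sample.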

Refinement
Use the Quiver algorithm, which looks at raw physical data from the sequencer. It uses an HMM and the observed data to classify base calls as genuine or spurious. Do a final consensus alignment, conditioned on Quiver's probabilities. Final result: 17.2k reads, length of 5.7kb, accuracy of 99.999506%.

Summary
Most of the cells in your body aren't yours. But looking at bacteria alone is insufficient. Expanding our view causes us to look for needles in haystacks, which is beyond most conventional approaches. Motif-finding and hybrid approaches will work until 3rd-gen sequencing arrives.

References
Cho, Ilseung, and Martin J. Blaser. "The human microbiome: at the interface of health and disease." Nature Reviews Genetics 13.4 (2012): 260-270.
Wooley, John C., Adam Godzik, and Iddo Friedberg. "A primer on metagenomics." PLoS Computational Biology 6.2 (2010): e1000667.
Chin, Chen-Shan, et al. "Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data." Nature Methods 10.6 (2013): 563-569.
Human Microbiome Project Consortium. "Structure, function and diversity of the healthy human microbiome." Nature 486.7402 (2012): 207-214.
Norman, Jason M., Scott A. Handley, and Herbert W. Virgin. "Kingdom-agnostic metagenomics and the importance of complete characterization of enteric microbial communities." Gastroenterology 146.6 (2014): 1459-1469.
Morgan, X. C., and C. Huttenhower. "Meta'omic Analytic Techniques for Studying the Intestinal Microbiome." Gastroenterology (2014).