2. What is metagenomics?
- Studying microorganisms through genomic sequencing rather than culturing
- Environmental use cases: agriculture, biofuels, pollution monitoring
- Health use case: the human microbiome
- Most microbes are anaerobic and therefore difficult to culture
3. Why care about the microbiome?
- You = 10^13 of your own cells plus bacterial cells
- More actionable genomics
- Why should we have a microbiome at all? Why do some bugs get a pass from our immune system?
- Not just commensal but symbiotic: leveraged adaptation (10^5-10^6 generations of bacteria per generation of human)?
- An ancient adaptation: animals have had resident microbes helping with metabolism for at least 500 million years
- “Comparing germ-free and normal mice indicates that microbiota are responsible for most of the metabolites that are detected in plasma”
- Responsive to and productive of environmental factors
- 23andMe in The Economist: C. diff
- Environmental microbiome: sushi and the Japanese gut
- Your pets and your microbiome
- People living together
- Vaginal microbiome and preterm birth: Lactobacillus
- Oral microbiome and dental applications
The big thing to keep in mind is to look at this ecologically: the microbiome is much more than the sum of its parts in both health and disease. For instance, two entirely different oral microbe populations were found to break down sugars in the same way. “One microbe, one disease” doesn’t quite work.
4. Why care about the microbiome?
Diagnostic or modulatory implications in:
- Obesity, diabetes, fatigue, pain disorders
- Anxiety, depression, autism
- Antibiotic-resistant bacteria
- IBD and other gut disorders
- Cardiac function, cancer
The classic example is H. pylori and ulcers. As on the last slide, “one microbe, one disease” appears to be the wrong framework here as well. The limited association studies done so far suggest a situation like that of GWAS and common diseases: small effect sizes and non-additive interactions. We’ll see a bit more about this later when we look at composition.
What these diseases have in common is a somewhat nebulous, chronic character that leaves sufferers trying multiple options, many of which don’t work that well.
5. Diseases and the microbiome
Source: The human microbiome: at the interface of health and disease. Nature Reviews Genetics
6. Why care about the microbiome?
Figure: publications containing ‘microbiome’ by date on ScienceDirect
7. Goal 1: Composition
- 500-1000 species of bacteria in the human gut
- More and more is being discovered about how composition associates with disease
- A “virtual organ”
- Many ways of looking at diversity
Source: The human microbiome: at the interface of health and disease. Nature Reviews Genetics
8. Diversity measures
- Alpha diversity: how diverse is this population? Simpson’s index, Shannon’s index, etc.
- Difference in alpha diversity before and after antibiotics
- Beta diversity: taxonomic similarity between two samples
- Finding compositional associations between disease cohort and microbial makeup
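The measures named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the analysis pipeline behind any cited study. The taxon counts are made up, the Simpson index is computed in its common "1 minus the sum of squared proportions" form, and Bray-Curtis dissimilarity is an assumed stand-in for "taxonomic similarity between two samples" (the slide does not name a specific beta-diversity metric).

```python
import math


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Simpson diversity in the form 1 - sum(p_i^2); higher means more diverse."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two {taxon: count} samples.

    0.0 means identical composition, 1.0 means no shared taxa.
    """
    taxa = set(sample_a) | set(sample_b)
    shared = sum(min(sample_a.get(t, 0), sample_b.get(t, 0)) for t in taxa)
    total = sum(sample_a.values()) + sum(sample_b.values())
    return 1.0 - 2.0 * shared / total


if __name__ == "__main__":
    # Hypothetical read counts per taxon, before and after antibiotics.
    before = {"Bacteroides": 400, "Faecalibacterium": 300,
              "Lactobacillus": 200, "Clostridium": 100}
    after = {"Bacteroides": 850, "Clostridium": 150}

    print("Shannon before:", shannon_index(list(before.values())))
    print("Shannon after: ", shannon_index(list(after.values())))
    print("Simpson before:", simpson_index(list(before.values())))
    print("Simpson after: ", simpson_index(list(after.values())))
    print("Bray-Curtis:   ", bray_curtis(before, after))
```

Alpha diversity is computed per sample, while beta diversity always takes a pair of samples, which is why the first two functions accept a list of counts and the third accepts two labelled samples.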
9. Sequencing for diversity
- Pyrosequencing the 16S ribosomal RNA subunit
- < 10 taxa appear in > 95% of people in HMP
- Recall the implicated diseases: this looks like the GWAS picture of common disease with small effect sizes, plus common disease with rare variants
- 16S gene coding regions are highly conserved among bacteria. Other internal regions of the gene are highly variable, possessing almost entirely unique sequences in most bacterial clades
- PCR amplified, Sanger-sequenced
- CNV and primer biases
- 16S often fails to distinguish to the species level, giving genus and family resolution only. E.g. C. diff is hard for 16S to distinguish from other benign Clostridium species, a very important distinction!
- About 30 species make up 99% of the bacteria, but the low-abundance ones might still be important. More on this problem later
10. Goal 2: Functional profiling
Remember the ecological approach: it is not necessarily the strict composition that matters so much as what the ecosystem is doing (usually metabolically) as a whole.
Source: The human microbiome: at the interface of health and disease. Nature Reviews Genetics
11. Functional profiling
- Current: which genes are present and are being transcribed
- In development: proteomics, metabolomics
- “Most genes related to amino acid biosynthesis are not expressed by the typical gut microbiome—these compounds generally are available from host diet and metabolites. Rather, the most highly transcribed genes are those related to energy production”
- Archaea, when present, are transcriptionally most active: the fermentation efficiency of the entire gut microbiome is limited by accumulation of hydrogen, and methanogenesis is the most efficient means of excess hydrogen removal
- E.g. IBD: whole-community shifts to amino acid transport from biosynthesis, a larger reliance on host metabolites and energy harvesting, and more genes for surviving the redox stress of the inflammatory immune response. Some of these might feel roughly causal while others are probably effects
12. Sequencing for function
- Whole microbiome sequencing
- Avoids primer biases and is more kingdom-agnostic
- Assembly is hard, especially where reference genomes don’t exist
- Assembly is also hard due to the aforementioned abundance problems: if 30 species make up 99% and there is something really nasty lurking in the remaining 1%, how do we make sure it is covered?
13. Two big problems
- Can’t understand the body without understanding the microbiome
- Can’t understand the microbiome by only looking at bacteria
- Read fragment assembly is very, very hard in metagenomics
15. The players in your body
- Your cells
- Metabolites
- Bacteria
- Bacteriophages
- Other viruses
- Fungi
Metabolites: various small molecules. Fuel, structure, signaling, enzyme/catalytic activity.
16. That’s not complexity
We’ve seen these wall charts showing the signaling maps of a given cell. Imagine the complexity of the real ecosystem.
Source: A comprehensive map of the toll-like receptor signaling network. Molecular Systems Biology
17. Prokaryotic virome: bacteriophages
- Infect bacteria
- Transfer genetic material among bacteria
- Rapidly evolving
- Put constant selection pressure on the bacterial microbiome
- Important in antibiotic resistance gene transfer
- Potential as therapeutic agents
18. Bacteriophages: deep sequencing results
- 60% of sequences dissimilar from all sequence databases
- More than 80% come from 3 families
- Little intrapersonal variation
- Large interpersonal variation, even among relatives
- Diet affects community structure
- Antibiotic resistance genes found in viral material
19. Bacteriophages and function
- Cross the intestinal barrier, possibly affecting systemic immune response
- Adhere to mucin glycoproteins, potentially causing immune response in gut epithelium
- IBD/Crohn’s: relative increase in Caudovirales bacteriophages
- Affect bacterial composition and/or the host directly
20. Eukaryotic virome
- Fecal samples from healthy children show a complex community of typically pathogenic viruses
- Includes plant RNA viruses from food
- Anelloviruses and circoviruses present in nearly 100% by age 5, likely from industrial agriculture
- These are typically viruses of livestock and plants
21. Eukaryotic viruses and function
- Simian immunodeficiency virus experiment showed enteric virome expansion
- The expansion increased gut permeability and caused inflammation of the intestinal lining
- Acute diarrhea subjects showed novel viruses and highly divergent viruses with less than 35% similarity to catalogued viruses at the amino acid level
- So the immune system does hold the enteric virome at bay, but not completely
22. Meiofauna
- Fungi, protozoa, and helminths (worms)
- No experiments conducted with sampling to saturation; much more work to be done
- 18S sequencing showed 66 genera of fungi in the gut, and fungi were found in 100% of samples
- Most subjects had fewer than 10 genera
- But high fungal diversity is bad: it increases in IBD and increases with antibiotic usage
- Oral Candida and antibiotics
- Helminthic parasites seem to confer resistance to asthma, IBD, and other autoimmune disorders
23. But it’s very hard
- Amplicon-based methods don’t work well for viruses
- Heterogeneous sample prep is required
- Large differences in genome sizes, from a few kb in viruses to 100+ Mb in fungi
- Small genomes plus divergence require lots of coverage to get contigs
- Viruses are highly divergent: there is no magic 16S-like unit that works well across populations
- Sample prep: large differences in cellular integrity and nucleic acid encapsidation (nuclear versus cytoplasmic)
24. Getting the whole picture
Source: Meta'omic Analytic Techniques for Studying the Intestinal Microbiome. Gastroenterology
26. Isn’t assembly easy?
- Recall: 500-1000 species of bacteria in the gut, but about 30 of them make up 99% of composition
- 33% of the bacterial microbiome is not well represented in reference databases, > 60% for bacteriophages
- However, low-abundance organisms can still have a large impact, so we need to know whether they are there or not
- < 1% of reads mapped to non-bacterial taxa in Human Microbiome Project Consortium studies
27. Coverage
- Coverage: mean number of reads per base, coverage = N × L / G
- L = read length, N = number of reads, G = genome size
- Problem: with 2nd gen WMS technologies, L is low and G is astronomical or unknown
- Thus, “full or sometimes even adequate coverage may be unattainable”
Source: A primer on metagenomics
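The coverage arithmetic on this slide can be made concrete with a short Python sketch. The mean-coverage function uses the standard definition, N × L / G, with the slide's symbols. Two pieces are assumptions added for illustration: the expected fraction of bases covered uses the Lander-Waterman (Poisson) approximation, which the slide does not mention, and `relative_abundance` is a hypothetical parameter modelling an organism that receives only a small share of the reads.

```python
import math


def mean_coverage(read_length, num_reads, genome_size):
    """Mean number of reads covering each base: N * L / G."""
    return read_length * num_reads / genome_size


def fraction_covered(coverage):
    """Expected fraction of bases hit at least once, assuming reads land
    uniformly at random (Lander-Waterman / Poisson approximation)."""
    return 1.0 - math.exp(-coverage)


def reads_needed(target_coverage, read_length, genome_size, relative_abundance=1.0):
    """Total reads required so that an organism receiving the given share
    of all reads reaches the target mean coverage."""
    return math.ceil(target_coverage * genome_size / (read_length * relative_abundance))


if __name__ == "__main__":
    # Illustrative numbers only: 100 bp reads, a 5 Mb genome.
    print(mean_coverage(100, 1_000_000, 5_000_000))
    print(fraction_covered(1.0))
    # A genome that makes up 0.01% of the sample needs far more total reads.
    print(reads_needed(20, 100, 5_000_000, relative_abundance=0.0001))
```

The last call shows why rare organisms are the hard case: the required sequencing depth scales inversely with relative abundance, before accounting for G being unknown.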
28. Sequence length and discovery
- Mostly stuck on the first 3 or 4 rows
Source: A primer on metagenomics
29. All is not lost
- Can use rarefaction curves to estimate our coverage
- Green is well-sampled
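A rarefaction curve can be sketched by repeatedly subsampling the reads at increasing depths and counting how many distinct taxa appear. The Python below is a minimal illustration under stated assumptions: each read has already been assigned a taxon label, the community is made up, and the function name and parameters (`trials`, `seed`) are hypothetical.

```python
import random


def rarefaction_curve(taxon_labels, depths, trials=20, seed=0):
    """Mean number of distinct taxa observed when subsampling reads.

    taxon_labels: one taxon label per read.
    depths: subsample sizes, each no larger than len(taxon_labels).
    Returns one mean richness value per depth.
    """
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        # Sample without replacement and count the distinct taxa seen.
        observed = [len(set(rng.sample(taxon_labels, depth))) for _ in range(trials)]
        curve.append(sum(observed) / trials)
    return curve


if __name__ == "__main__":
    # Hypothetical, highly skewed community: one dominant taxon, one singleton.
    reads = ["A"] * 900 + ["B"] * 90 + ["C"] * 9 + ["D"]
    depths = [1, 10, 100, 1000]
    for depth, richness in zip(depths, rarefaction_curve(reads, depths)):
        print(depth, richness)
```

A curve that flattens as depth grows suggests the sample is well covered. A curve still climbing at full depth suggests that more sequencing would keep finding new taxa.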
30. All is not lost
- For composition analysis, the phylogenetic marker regions (18S, 16S) work pretty well
- For functional analysis, ORFs can still be found fairly reliably and aligned to homologs in databases
- Barring this, clustering and motif-finding yield some information
- Open reading frames (ORFs): stretches of sequence with no in-frame stop codon, i.e. candidate genes
- ORF finding is estimated at around 85% to 90% accuracy
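To make the ORF idea concrete, here is a deliberately naive Python sketch. It scans the three forward-strand reading frames for an ATG start codon followed by an in-frame stop codon. Scanning only the forward strand, requiring ATG as the start, and the `min_codons` threshold are all simplifying assumptions. Gene finders used on real metagenomic reads are considerably more sophisticated.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(seq, min_codons=3):
    """Find simple ORFs on the forward strand.

    Returns (start, end, frame) tuples, where seq[start:end] runs from the
    start codon through the first in-frame stop codon, inclusive.
    min_codons counts the codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close it and keep scanning this frame
    return orfs


if __name__ == "__main__":
    example = "CCATGAAATTTGGGTAACC"
    for start, end, frame in find_orfs(example):
        print(frame, start, end, example[start:end])
```

Once candidate ORFs are extracted this way, they are what would be aligned to homologs in databases, as the slide describes.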
31. Different sequencing approaches?
- Single-cell microfluidics in the future
- Now: hybrid long/short read approaches, “finishing” with Sanger sequencing
- Pacific Biosciences SMRT approach
- SMRT errors are random, unbiased
- De novo assembly is % concordant with reference genomes
We still haven’t addressed how to get at particularly rare or divergent sequences. What to do?
In hybrid approaches, short reads are used to correct errors in long reads.
SMRT (single-molecule, real-time) only requires one library instead of 2nd gen + Sanger.
32. HGAP: the SMRT assembly algorithm
- Select longest reads as seeds
- Use seed reads to recruit short reads
- Assemble using off-the-shelf assembly tools
- Refine assembly using sequencer metadata
Source: Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nature Methods
33. Seed selection
- Order reads according to length
- Consider reads above length L ~ 6 kb
- Rough end-pair align reads until ~20x coverage is reached
- 17.7k seed reads, averaging 7.2 kb in length, already at 86.9% accuracy compared to reference
- Average read length from SMRT is 3.2 kb; 141k total continuous long reads were generated
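One way to read the seed-selection step is as a greedy pick of the longest reads until a target coverage is reached. The Python below is a rough sketch of that reading only. It works on read lengths alone and skips the alignment step entirely, so it is not the HGAP implementation. The function name, the `target_coverage` and `min_length` defaults (taken from the ~20x and ~6 kb figures on the slide), and the example reads are illustrative assumptions.

```python
def select_seed_reads(read_lengths, genome_size, target_coverage=20.0, min_length=6000):
    """Greedily pick the longest reads as seeds.

    Stops when the seeds' summed length divided by genome_size reaches
    target_coverage, or when the remaining reads are shorter than min_length.
    Returns (seed_lengths, achieved_coverage).
    """
    seeds = []
    total_bases = 0
    for length in sorted(read_lengths, reverse=True):
        if length < min_length or total_bases / genome_size >= target_coverage:
            break
        seeds.append(length)
        total_bases += length
    return seeds, total_bases / genome_size


if __name__ == "__main__":
    # Toy numbers: five reads and a 1 kb "genome" so the arithmetic is easy to follow.
    reads = [3000, 8000, 5000, 7000, 6500]
    seeds, coverage = select_seed_reads(reads, genome_size=1000)
    print(seeds, coverage)
```

In this sketch the shorter reads that are not chosen as seeds are the ones that would later be aligned back to the seeds for error correction.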
34. Recruiting short reads
- Align all reads to the seed reads
- Each read can be mapped to multiple seed reads, controlled by the -bestn parameter
- -bestn must be chosen so that the coverage of seeds plus short aligned reads is about equal to the expected coverage of the sequenced genome
- Use MSA and consensus to error-correct long reads
- Result is 17.2k reads of length 5.7 kb with 99.9% accuracy
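The "MSA and consensus" step can be illustrated with a column-wise majority vote. This Python sketch assumes the reads have already been aligned and padded to the seed's coordinates with `-` as the gap character. It is a simplified stand-in for the consensus step, not the method used by HGAP, and the function name and example reads are hypothetical.

```python
from collections import Counter


def column_consensus(aligned_reads, gap="-"):
    """Majority-vote consensus over reads aligned to the same coordinates.

    aligned_reads: equal-length strings, one per read, padded with gap.
    Columns where the gap character wins the vote are dropped.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")
    consensus = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        if base != gap:
            consensus.append(base)
    return "".join(consensus)


if __name__ == "__main__":
    # The first row plays the role of a seed read with one error (A at position 4);
    # the other rows are recruited reads covering the same region.
    alignment = [
        "ACGTAGCA",
        "ACGTTGCA",
        "ACGTTGCA",
        "ACG-TGCA",
    ]
    print(column_consensus(alignment))
```

Because SMRT errors are random rather than systematic (as noted on the earlier slide), independent reads rarely share the same error at the same position, which is what makes a vote like this effective.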
36. Refinement
- Use the Quiver algorithm, which looks at raw physical data from the sequencer
- Uses an HMM and observed data to classify base calls as genuine or spurious
- Do a final consensus alignment, conditioned on Quiver’s probabilities
- Final result: 17.2k reads, length of 5.7 kb, accuracy of %
37. Summary
- Most of the cells in your body aren’t yours
- But looking at bacteria alone is insufficient
- Expanding our view causes us to look for needles in haystacks, which is beyond most conventional approaches
- Motif-finding and hybrid approaches will work until 3rd gen sequencing arrives
38. References
- Cho, Ilseung, and Martin J. Blaser. "The human microbiome: at the interface of health and disease." Nature Reviews Genetics 13.4 (2012):
- Wooley, John C., Adam Godzik, and Iddo Friedberg. "A primer on metagenomics." PLoS Computational Biology 6.2 (2010): e
- Chin, Chen-Shan, et al. "Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data." Nature Methods 10.6 (2013):
- Human Microbiome Project Consortium. "Structure, function and diversity of the healthy human microbiome." Nature (2012):
- Norman, Jason M., Scott A. Handley, and Herbert W. Virgin. "Kingdom-agnostic metagenomics and the importance of complete characterization of enteric microbial communities." Gastroenterology (2014):
- Morgan, X. C., and C. Huttenhower. "Meta'omic Analytic Techniques for Studying the Intestinal Microbiome." Gastroenterology (2014).