Computational metagenomics and the human microbiome Curtis Huttenhower 01-21-11 Harvard School of Public Health Department of Biostatistics.

1 Computational metagenomics and the human microbiome Curtis Huttenhower 01-21-11 Harvard School of Public Health Department of Biostatistics

2 What to do with your metagenome?
(x10^10)
Diagnostic or prognostic biomarker for host disease
Public health tool monitoring population health and interactions
Comprehensive snapshot of microbial ecology and evolution
Reservoir of gene and protein functional information
Who’s there?
What are they doing?
What do functional genomic data tell us about microbiomes?
What can our microbiomes tell us about us?
* Using terabases of sequence and thousands of experimental results

3 The Human Microbiome Project
2007 - ongoing
300 “normal” adults, 18-40
16S rDNA + WGS
5 sites/18 samples + blood
Oral cavity: saliva, tongue, palate, buccal mucosa, gingiva, tonsils, throat, teeth
Skin: ears, inner elbows
Nasal cavity
Gut: stool
Vagina: introitus, mid, fornix
Reference genomes (~200+800)
All healthy subjects; followup projects in psoriasis, Crohn’s, colitis, obesity, acne, cancer, antibiotic resistant infection…
Hamady, 2009; Kolenbrander, 2010

4 HMP Organisms: Everyone and everywhere is different
[Figure axes: ← Body sites + individuals →; ← Organisms (taxa) →. Body site labels: ear, gut, nose, mouth, vagina, arm, mucosa, palate, gingiva, tonsils, saliva, sub. plaq., sup. plaq., throat, tongue]
Every microbiome is surprisingly different
Most organisms are rare in most places
Even common organisms vary tremendously in abundance among individuals
Aerobicity, interaction with the immune system, and extracellular medium appear to be major determinants
There are few organismal biotypes in health

5 HUMAnN: Community metabolic and functional reconstruction
HUMAnN: HMP Unified Metabolic Analysis Network
Data: 300 subjects, 1-3 visits/subject, ~6 body sites/visit, 10-200M reads/sample, 100bp reads
Flow: WGS reads → Genes (KOs) → Pathways (KEGGs) → Pathways/modules
Functional seq.: KEGG + MetaCYC; CAZy, TCDB, VFDB, MEROPS…
BLAST → Genes
BLAST ?
Genes → Pathways: MinPath (Ye 2009)
Taxonomic limitation: Rem. paths in taxa < ave.
Smoothing: Witten-Bell
Gap filling: c(g) = max( c(g), median )
Xipe: Distinguish zero/low (Rodriguez-Mueller in review)
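The gap-filling rule on this slide, c(g) = max( c(g), median ), can be illustrated with a minimal sketch. The slide gives only the formula, so two things here are assumptions: that the median is taken over the gene abundances within a single pathway, and the names `gap_fill` and the KO identifiers, which are placeholders rather than anything from HUMAnN itself.

```python
from statistics import median


def gap_fill(gene_counts):
    """Raise each gene's abundance to at least the median of its pathway.

    gene_counts maps a gene (KO) identifier to its abundance c(g).
    Assumption: the median is computed over the genes of one pathway.
    """
    if not gene_counts:
        return {}
    med = median(gene_counts.values())
    # c(g) = max(c(g), median): only genes below the median are changed.
    return {gene: max(count, med) for gene, count in gene_counts.items()}


# Hypothetical pathway with one gene missing entirely (abundance 0).
pathway = {
    "K00001": 12.0,
    "K00002": 0.0,
    "K00003": 9.0,
    "K00004": 15.0,
    "K00005": 10.0,
}
filled = gap_fill(pathway)
```

Under these assumptions a gene that was missed by the read search (for example a short gene) no longer drags down the evidence for an otherwise well-supported pathway, while genes already above the median are left untouched.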

6 HUMAnN: Community metabolic and functional reconstruction
[Figure panels: Pathway coverage; Pathway abundance]

7 HUMAnN: Validating gene and pathway abundances on synthetic data
Validated on individual genes, module coverage + abundance
False negatives: short genes (<100bp), taxonomically rare pathways
False positives: large and multicopy (not many in bacteria)

8 HUMAnN: The steps that didn’t make the cut
[Figure panels: Abundance; Coverage]

9 Functional modules in 741 HMP samples
[Figure panels: Coverage; Abundance. Axes: ← Samples →; ← Pathways →. Body site labels: ANO(BM)PFO(SP)SRCRCO(TD)]
Zero microbes (of ~1,000) are core among body sites
Zero microbes are core among individuals
19 (of ~220) pathways are present in every sample
53 pathways are present in 90%+ samples
Only 31 (of 1,110) pathways are present/absent from exactly one body site
263 pathways are differentially abundant in exactly one body site
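Counts such as "present in every sample" and "present in 90%+ samples" come down to a prevalence calculation over a presence/absence table. The sketch below shows one way to compute it; the function names, pathway names, and toy data are hypothetical, and the slide does not say how presence was called from coverage.

```python
def pathway_prevalence(presence):
    """Fraction of samples in which each pathway is called present.

    presence maps a pathway name to a list of booleans, one per sample.
    """
    return {p: sum(calls) / len(calls) for p, calls in presence.items() if calls}


def pathways_at_least(presence, threshold):
    """Sorted pathways whose prevalence is at least the given fraction."""
    prev = pathway_prevalence(presence)
    return sorted(p for p, frac in prev.items() if frac >= threshold)


# Toy presence/absence calls for three pathways across ten samples.
presence = {
    "pathway_A": [True] * 10,
    "pathway_B": [True] * 9 + [False],
    "pathway_C": [True] * 5 + [False] * 5,
}
core = pathways_at_least(presence, 1.0)       # present in every sample
near_core = pathways_at_least(presence, 0.9)  # present in 90%+ of samples
```

Here `core` contains only `pathway_A`, and `near_core` adds `pathway_B`; applying the same two thresholds to real calls is what would yield figures like the 19 and 53 quoted above.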

10 Microbial environment trumps host environment (in health)
[Figure panels: HMP stool, colored by BMI; MetaHIT stool, colored by IBD. Axes: ← Microbes →; ← Pathways →. Labels: Aerobic body sites; Gastrointestinal body sites; Pathways in all body sites (“core”)]
Human microbiome structure dictated primarily by microbial niche, not host (in health)
Huge variation in who’s there; small variation in what they’re doing
Note: definitely variation in how these functions are implemented
Does not yet speak to environment (diet!), genetics, or disease

11 Metagenomic biomarker discovery
Gene expression; SNP genotypes
Healthy/IBD; BMI; Diet
Taxa & pathways
Batch effects? Population structure? Niches & Phylogeny
Test for correlates
Multiple hypothesis correction
Feature selection (p >> n)
Confounds/stratification/environment
Cross-validate
Biological story?
Independent sample
Intervention/perturbation
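The "Multiple hypothesis correction" step matters because one test is run per taxon or pathway. The slide does not name a method; the sketch below uses the Benjamini-Hochberg false discovery rate procedure purely as a common example, and the function name and p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: BH is used only as an illustration; the slide does not
    say which correction was applied.
    """
    n = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-feature p-values (e.g. one per pathway).
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adjusted) if q <= 0.05]
```

With many more features than samples (p >> n), an uncorrected threshold would flag features by chance alone, which is why the correction sits before feature selection and cross-validation in the list above.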

12 LEfSe: Metagenomic class comparison and explanation
LEfSe: LDA + Effect Size
Nicola Segata
http://huttenhower.sph.harvard.edu/lefse

13 LEfSe: Evaluation on synthetic data

14 Microbes characteristic of the oral and gut microbiota

15 Aerobic, microaerobic and anaerobic communities
High oxygen: skin, nasal
Mid oxygen: vaginal, oral
Low oxygen: gut

16 LEfSe: The TRUC murine colitis microbiota
With Wendy Garrett

17 MetaHIT: The gut microbiome and IBD
124 subjects: 99 healthy, 21 UC + 4 CD (Qin 2010)
ReBLASTed against KEGG since published data obfuscates read counts
Flow: WGS reads → Genes (KOs) → Pathways (KEGGs) → Pathways/modules
Taxa: Phymm (Brady 2009)
With Ramnik Xavier, Joshua Korzenik

18 MetaHIT: Taxonomic CD biomarkers
[Figure labels: Firmicutes; Enterobacteriaceae; Up in CD; Down in CD; UC]

19 MetaHIT: Functional CD biomarkers
[Figure panels: Subset of enriched modules in CD patients; Subset of enriched pathways in CD patients. Labels: Motility; Transporters; Sugar metabolism; Growth/replication; Down in CD; Up in CD]

20 Sleipnir: Software for scalable functional genomics
Massive datasets require efficient algorithms and implementations.
Sleipnir C++ library for computational functional genomics
Data types for biological entities: microarray data, interaction data, genes and gene sets, functional catalogs, etc.
Network communication, parallelization
Efficient machine learning algorithms: generative (Bayesian) and discriminative (SVM)
And it’s fully documented!
It’s also speedy: microbial data integration computation takes <3hrs.
http://huttenhower.sph.harvard.edu/sleipnir
http://huttenhower.sph.harvard.edu/lefse
http://huttenhower.sph.harvard.edu/humann

21 Thanks!
Jacques Izard, Wendy Garrett, Pinaki Sarder, Nicola Segata, Levi Waldron, Larisa Miropolsky
Interested? We’re recruiting students and postdocs!
Human Microbiome Project / HMP Metabolic Reconstruction: George Weinstock, Jennifer Wortman, Owen White, Makedonka Mitreva, Erica Sodergren, Vivien Bonazzi, Jane Peterson, Lita Proctor, Sahar Abubucker, Yuzhen Ye, Beltran Rodriguez-Mueller, Jeremy Zucker, Qiandong Zeng, Mathangi Thiagarajan, Brandi Cantarel, Maria Rivera, Barbara Methe, Bill Klimke, Daniel Haft
Ramnik Xavier, Dirk Gevers, Bruce Birren, Mark Daly, Doyle Ward, Eric Alm, Ashlee Earl, Lisa Cosimi, Sarah Fortune
http://huttenhower.sph.harvard.edu/

23 The LEfSe algorithm
Statistical consistency
Biological consistency
Overall effect size

24 HMP: Metabolism, host-microbiome interactions, and microbial taxa
>3200 gene families differential in the mucosa
>1500 upregulated outside the mucosa and not in any Actinobacterial genome
[Figure labels: 16S; WGS]
