Functional profiling with HUMAnN2

Presentation transcript:

Functional profiling with HUMAnN2
Eric Franzosa, Jason Lloyd-Price, Curtis Huttenhower (chuttenh@hsph.harvard.edu), Galeb Abu-Ali (gabuali@hsph.harvard.edu), Ali Rahnavard (rah@broadinstitute.org)
STAMPS 2017, 08-08-17
Harvard T.H. Chan School of Public Health, Department of Biostatistics

The two big questions of microbial community profiling:
Who is there? (taxonomic profiling)
What are they doing? (functional profiling)
Like many great bioinformatics problems, answering these questions begins with sequence search!

HUMAnN2 for taxon-specific metagenome and metatranscriptome functional profiling
The relative abundance of gene family i in a metagenome is the number of reads j that map to a gene sequence in the family, with each mapping weighted by its inverse p-value, normalized by the average length of all gene sequences in the orthologous family.
Eric Franzosa, Lauren McIver
http://huttenhower.sph.harvard.edu/humann2
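
The abundance definition above can be illustrated with a minimal Python sketch. This is not HUMAnN2's code: the function name, the input layout, and the per-kilobase scaling are assumptions made for illustration, and each hit's weight is taken as given because the slide does not say how a p-value is converted into a weight.

```python
from collections import defaultdict


def gene_family_abundance(hits, mean_length_bp):
    """Sum weighted read hits per family, then divide by mean gene length in kb.

    hits: iterable of (family_id, weight) pairs, one per read mapping.
          The weight is supplied by the caller (assumption: derived upstream
          from the mapping p-value).
    mean_length_bp: dict mapping family_id to the average length, in bp,
          of the gene sequences in that family.
    """
    weighted_reads = defaultdict(float)
    for family, weight in hits:
        weighted_reads[family] += weight
    # Length normalization: divide by the family's mean length in kilobases.
    return {
        family: total / (mean_length_bp[family] / 1000.0)
        for family, total in weighted_reads.items()
    }


# Hypothetical toy input: two families, three read mappings.
hits = [("famA", 1.0), ("famA", 0.5), ("famB", 1.0)]
lengths = {"famA": 1500.0, "famB": 500.0}
abundance = gene_family_abundance(hits, lengths)
```

Dividing by length keeps long gene families, which attract more reads simply because they are long, from appearing more abundant than short ones.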

HUMAnN2: stratified output

Gene families
UniRef gene cluster: Gene name                              Total gene abundance (RPK)
UniRef90_R6K3Z5: IMP dehydrogenase                          600.95
UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_caccae       234.76
UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_dorei        107.38
UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_ovatus       92.18
UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_stercoris    83.95
UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_vulgatus     57.27
UniRef90_R6K3Z5: IMP dehydrogenase|unclassified             25.41

The first row is the community total (~HUMAnN1); it is the sum (Σ) of the per-species & unclassified stratifications below it.

Pathways
MetaCyc pathway                                  Pathway abundance    Coverage
PWY-7221: GTP biosynthesis                       200.35               1
PWY-7221: GTP biosynthesis|Bacteroides_caccae    120.23
PWY-7221: GTP biosynthesis|Bacteroides_dorei     11.12
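
As a rough illustration of how the stratified gene-family rows relate to the total, the Python sketch below splits feature names on the "|" separator and checks that the per-species and unclassified rows add up to the community total. The function name and in-memory row layout are assumptions; the values are the ones shown on the slide.

```python
def split_stratified(rows):
    """Separate community totals from per-taxon strata.

    rows: iterable of (feature_name, value) pairs, where stratified rows
          are written as "feature|taxon".
    Returns (totals, strata) where strata[feature][taxon] = value.
    """
    totals, strata = {}, {}
    for feature, value in rows:
        if "|" in feature:
            name, taxon = feature.split("|", 1)
            strata.setdefault(name, {})[taxon] = value
        else:
            totals[feature] = value
    return totals, strata


rows = [
    ("UniRef90_R6K3Z5: IMP dehydrogenase", 600.95),
    ("UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_caccae", 234.76),
    ("UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_dorei", 107.38),
    ("UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_ovatus", 92.18),
    ("UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_stercoris", 83.95),
    ("UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_vulgatus", 57.27),
    ("UniRef90_R6K3Z5: IMP dehydrogenase|unclassified", 25.41),
]

totals, strata = split_stratified(rows)
family = "UniRef90_R6K3Z5: IMP dehydrogenase"
strata_sum = sum(strata[family].values())
# The strata should reproduce the community total up to rounding.
matches_total = abs(strata_sum - totals[family]) < 0.01
```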

HUMAnN2 real-world performance
Applied HUMAnN2’s tiered search to profile >2K human metagenomes (HMP1-II, six major body sites).
~60% of reads align before translated search (bowtie2 w/ sample-specific pangenome db).
~15% more reads align during translated search (DIAMOND w/ comprehensive protein db), for a total of ~75%.
The pangenome search tier is 1-2 orders of magnitude faster than comprehensive translated search.

And it works on non-human meta’omes, too Luke Thompson

Quantifying the diversity of species contributing a function within and across subjects
A pathway’s contributional alpha-diversity is calculated from the distribution of taxa providing it (DNA or RNA) within a community; contributional beta-diversity is the corresponding comparison between communities.
[Figure: within-subject diversity vs. between-subject diversity, four quadrants]
Low within-subject, low between-subject diversity: simple, consistent
Low within-subject, high between-subject diversity: simple, variable
High within-subject, low between-subject diversity: complex, consistent
High within-subject, high between-subject diversity: complex, variable
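
A minimal Python sketch of contributional diversity follows. The slide does not name the indices, so the choices here are assumptions for illustration: the Gini-Simpson index for alpha-diversity and Bray-Curtis dissimilarity on proportions for beta-diversity. The function names are likewise hypothetical.

```python
def gini_simpson(contributions):
    """Alpha-diversity of the taxa contributing one pathway in one sample.

    contributions: dict mapping taxon to that taxon's pathway abundance.
    Returns 0.0 when a single taxon provides everything and approaches 1.0
    as contributions are spread evenly over many taxa.
    """
    total = sum(contributions.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((v / total) ** 2 for v in contributions.values())


def bray_curtis(sample_a, sample_b):
    """Beta-diversity between two samples' contribution profiles.

    Both profiles are first converted to proportions, so the result is
    0.0 for identical taxon mixes and 1.0 for fully disjoint ones.
    """
    total_a, total_b = sum(sample_a.values()), sum(sample_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("each sample needs a non-zero total contribution")
    taxa = set(sample_a) | set(sample_b)
    diff = sum(
        abs(sample_a.get(t, 0.0) / total_a - sample_b.get(t, 0.0) / total_b)
        for t in taxa
    )
    return diff / 2.0


# Hypothetical toy profiles for one pathway in two subjects.
simple = {"Bacteroides_caccae": 10.0}
complex_mix = {"Bacteroides_caccae": 5.0, "Bacteroides_dorei": 5.0}
alpha_simple = gini_simpson(simple)
alpha_complex = gini_simpson(complex_mix)
beta = bray_curtis(simple, complex_mix)
```

In the figure's terms, a pathway with low alpha-diversity in every subject is "simple", and one with low beta-diversity between subjects is "consistent".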

HUMAnN2 reveals unusual “relative expression” in paired metatranscriptomes & metagenomes
Sucrose degradation follows a complex attribution pattern across ~200 human gut metagenomes, but its expression can be dominated by a single species in paired gut metatranscriptomes!
In collaboration with the STARR Consortium & HPFS cohort

The “HMP2” IBD Multi’omics Data resource http://ibdmdb.org With Ramnik Xavier

The IBD Multi’omics DataBase http://ibdmdb.org Cesar Arze

The IBD metatranscriptome in the HMP2 IBDMDB
Subjects (117): 59 Crohn’s Disease, 34 Ulcerative Colitis, 24 non-IBD Controls
Gender: 57 Female, 59 Male, 1 unknown
Cohorts: 32 MGH adult new onset, 30 Cedars-Sinai adult establ., 31 Cincinnati peds new onset, 11 Emory peds new onset, 13 MGH peds new onset
Melanie Schirmer

Different microbes can transcribe shared pathways
HISDEG-PWY: L-histidine degradation I
Histidine is an α-amino acid that is used in the biosynthesis of proteins.
A. putredinis has been implicated in IBD.
Major contributor to transcription in subsets of IBD patients.

Pathways can be contributed by different microbes over time
PWY-7094: fatty acid salvage
Faecalibacterium prausnitzii
Time-courses for individual patients: CD Patient 1, CD Patient 2

HUMAnN2 tutorial
https://bitbucket.org/biobakery/biobakery/wiki/humann2

HUMAnN2 synthetic evaluation (genes)
Synthetic human gut metagenome (top 20 species), staggered abundance, ~0.1x to 100x coverage. Compare expected vs. observed gene abundance.
HUMAnN2 tiered search is more accurate (comprehensive search suffers from spurious hits)...
...and is ~3x faster: ~0.7 hours vs. ~2.1 hours (10M reads, 8 cores)...
...and provides accurate per-species quantification!


Considerations for paired metatranscriptomes & metagenomes

$ humann2_rna_dna_norm --input_dna <DNA genefamilies file> --input_rna <RNA genefamilies file> --output_basename <basename of the 3 output files>

Calculates RNA/DNA abundance ratios
Smooths the RNA and DNA abundances prior to taking the ratio
Also outputs smoothed RNA and DNA files

UniRef90_R6K3Z5: IMP dehydrogenase                          2.02
UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_caccae       5.96
UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_dorei        3.82
UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_ovatus       1.80
UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_stercoris    0.87
UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_vulgatus     0.34
UniRef90_R6K3Z5: IMP dehydrogenase|unclassified             1.96
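
The reason for smoothing before taking the ratio can be seen in a small Python sketch. It is not the humann2_rna_dna_norm implementation: the additive pseudocount stands in for whatever smoothing the script applies, and the function name and toy values are hypothetical.

```python
def smoothed_rna_dna_ratio(rna, dna, pseudocount=1.0):
    """Per-feature RNA/DNA ratio with additive smoothing.

    rna, dna: dicts mapping feature name to abundance.
    pseudocount: value added to both abundances (assumption: simple
                 additive smoothing, chosen only for illustration).
    Features missing from one table are treated as having abundance 0,
    so the ratio is still defined instead of dividing by zero.
    """
    features = set(rna) | set(dna)
    return {
        feature: (rna.get(feature, 0.0) + pseudocount)
        / (dna.get(feature, 0.0) + pseudocount)
        for feature in features
    }


# Hypothetical toy abundances: geneC is transcribed but absent from the DNA table.
rna = {"geneA": 9.0, "geneB": 0.0, "geneC": 3.0}
dna = {"geneA": 4.0, "geneB": 1.0}
ratios = smoothed_rna_dna_ratio(rna, dna)
```

A ratio above 1 suggests a feature is transcribed more than its gene copy number alone would predict, and a ratio below 1 suggests the opposite.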