Functional profiling with HUMAnN2
Eric Franzosa, Jason Lloyd-Price, Curtis Huttenhower (chuttenh@hsph.harvard.edu), Galeb Abu-Ali (gabuali@hsph.harvard.edu), Ali Rahnavard (rah@broadinstitute.org)
STAMPS 2017, 08-08-17
Harvard T.H. Chan School of Public Health, Department of Biostatistics
The two big questions of microbial community profiling:
  Who is there? (taxonomic profiling)
  What are they doing? (functional profiling)
Like many great bioinformatics problems, answering these questions begins with sequence search!
HUMAnN2 for taxon-specific metagenome and metatranscriptome functional profiling
The relative abundance of gene family i in a metagenome is the number of reads j that map to a gene sequence in the family, weighted by the inverse p-value of each mapping and normalized by the average length of all gene sequences in the orthologous family.
Eric Franzosa, Lauren McIver
http://huttenhower.sph.harvard.edu/humann2
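The abundance definition above can be sketched in a few lines of Python. This is an illustration only, not HUMAnN2's implementation: the function name, the input structures, and the example data are hypothetical, "inverse p-value" is interpreted here as a weight of 1 - p (an assumption), and the length normalization is expressed per kilobase.

```python
from collections import defaultdict


def gene_family_rpk(hits, family_of, gene_length_bp, weight=lambda p: 1.0 - p):
    """Weighted read count per gene family, divided by mean gene length in kb.

    hits           -- iterable of (gene_id, p_value), one entry per mapped read
    family_of      -- dict: gene_id -> family_id
    gene_length_bp -- dict: gene_id -> gene length in base pairs
    weight         -- maps a p-value to a read weight (ASSUMPTION: 1 - p)
    """
    # Sum the weights of all reads that hit any gene in each family.
    weighted_reads = defaultdict(float)
    for gene_id, p_value in hits:
        weighted_reads[family_of[gene_id]] += weight(p_value)

    # Collect the lengths of every gene sequence in each family.
    family_lengths = defaultdict(list)
    for gene_id, family in family_of.items():
        family_lengths[family].append(gene_length_bp[gene_id])

    # Normalize by the family's average gene length (in kilobases).
    abundance = {}
    for family, reads in weighted_reads.items():
        lengths = family_lengths[family]
        mean_kb = sum(lengths) / len(lengths) / 1000.0
        abundance[family] = reads / mean_kb
    return abundance


# Hypothetical toy data: one family with two member genes and three mapped reads.
family_of = {"geneA1": "famA", "geneA2": "famA"}
gene_length_bp = {"geneA1": 1000, "geneA2": 3000}
hits = [("geneA1", 0.0), ("geneA2", 0.0), ("geneA1", 0.5)]
example = gene_family_rpk(hits, family_of, gene_length_bp)
```

Passing a different `weight` function (for example one based on 1/p) changes only the read weighting; the length normalization stays the same.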
HUMAnN2: stratified output

Gene families: total gene abundance (RPK), followed by per-species & unclassified stratifications that sum (Σ) to the total

  UniRef gene cluster: Gene name                            Abundance (RPK)
  UniRef90_R6K3Z5: IMP dehydrogenase                                 600.95
  UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_caccae              234.76
  UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_dorei               107.38
  UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_ovatus               92.18
  UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_stercoris            83.95
  UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_vulgatus             57.27
  UniRef90_R6K3Z5: IMP dehydrogenase|unclassified                     25.41

Pathways: pathway abundance & coverage (~HUMAnN1)

  MetaCyc pathway                                         Abundance   Coverage
  PWY-7221: GTP biosynthesis                                 200.35          1
  PWY-7221: GTP biosynthesis|Bacteroides_caccae              120.23
  PWY-7221: GTP biosynthesis|Bacteroides_dorei                11.12
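As a small illustration of how the stratified rows relate to the total, the sketch below splits feature names on the "|" separator and checks that the per-species and unclassified values add up to the community total. The values are copied from the IMP dehydrogenase example on this slide; the function name and the in-memory row format are assumptions made for illustration, not part of HUMAnN2.

```python
def split_stratified(rows):
    """Separate community totals from per-taxon stratifications.

    rows -- iterable of (feature_name, value); stratified rows carry a
            '|taxon' suffix, community totals do not.
    """
    totals, strata = {}, {}
    for feature, value in rows:
        if "|" in feature:
            name, taxon = feature.split("|", 1)
            strata.setdefault(name, {})[taxon] = value
        else:
            totals[feature] = value
    return totals, strata


# Values from the IMP dehydrogenase example (RPK).
rows = [
    ("UniRef90_R6K3Z5: IMP dehydrogenase", 600.95),
    ("UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_caccae", 234.76),
    ("UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_dorei", 107.38),
    ("UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_ovatus", 92.18),
    ("UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_stercoris", 83.95),
    ("UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_vulgatus", 57.27),
    ("UniRef90_R6K3Z5: IMP dehydrogenase|unclassified", 25.41),
]

totals, strata = split_stratified(rows)
for name, total in totals.items():
    stratified_sum = sum(strata[name].values())
    # Allow a small tolerance for rounding in the reported values.
    print(name, total, round(stratified_sum, 2), abs(stratified_sum - total) < 0.01)
```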
HUMAnN2 real-world performance
Applied HUMAnN2’s tiered search to profile >2K human metagenomes (HMP1-II, six major body sites)
  ~60% of reads align before translated search (bowtie2 w/ sample-specific pangenome db)
  ~15% more reads align during translated search (DIAMOND w/ comprehensive protein db), for a total of ~75%
  The pangenome search tier is 1-2 orders of magnitude faster than comprehensive translated search
And it works on non-human meta’omes, too Luke Thompson
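Quantifying the diversity of species contributing a function within and across subjects
A pathway’s contributional alpha-diversity is calculated from the distribution of taxa providing it (DNA or RNA) within a community; contributional beta-diversity is the corresponding comparison between communities.
[Figure: pathways arranged by within-subject diversity (low to high) and between-subject diversity (low to high), giving four categories]
  low within-subject, low between-subject: simple, consistent
  low within-subject, high between-subject: simple, variable
  high within-subject, low between-subject: complex, consistent
  high within-subject, high between-subject: complex, variable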
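A minimal sketch of the two quantities, under stated assumptions: the slide does not name specific indices, so Gini-Simpson is used here for contributional alpha-diversity and Bray-Curtis dissimilarity on normalized contributions for contributional beta-diversity. Both choices, the function names, and the example profiles are illustrative rather than taken from HUMAnN2.

```python
def gini_simpson(contributions):
    """Alpha-diversity of one pathway in one sample (ASSUMED index: Gini-Simpson).

    contributions -- dict: taxon -> abundance of the pathway from that taxon
    """
    total = sum(contributions.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((v / total) ** 2 for v in contributions.values())


def bray_curtis(contrib_a, contrib_b):
    """Beta-diversity between two samples (ASSUMED index: Bray-Curtis).

    Each profile is normalized to proportions first, so the result compares
    who contributes the pathway, not how abundant the pathway is overall.
    """
    total_a, total_b = sum(contrib_a.values()), sum(contrib_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both samples must carry the pathway")
    taxa = set(contrib_a) | set(contrib_b)
    # For two profiles that each sum to 1, Bray-Curtis is half the L1 distance.
    return 0.5 * sum(
        abs(contrib_a.get(t, 0.0) / total_a - contrib_b.get(t, 0.0) / total_b)
        for t in taxa
    )


# Hypothetical per-species contributions to one pathway in two subjects.
subject_1 = {"species_A": 50.0, "species_B": 50.0}
subject_2 = {"species_C": 80.0}

print(gini_simpson(subject_1))            # two equal contributors
print(gini_simpson(subject_2))            # single contributor
print(bray_curtis(subject_1, subject_2))  # no shared contributors
```

In the slide's terms, a pathway with low values on both measures would be "simple, consistent", and one with high values on both would be "complex, variable".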
HUMAnN2 reveals unusual “relative expression” in paired metatranscriptomes & metagenomes
Sucrose degradation follows a complex attribution pattern across ~200 human gut metagenomes…
…but its expression can be dominated by a single species in paired gut metatranscriptomes!
In collaboration with the STARR Consortium & HPFS cohort
The “HMP2” IBD Multi’omics Data resource http://ibdmdb.org With Ramnik Xavier
The IBD Multi’omics DataBase http://ibdmdb.org Cesar Arze
The IBD metatranscriptome in the HMP2 IBDMDB
117 subjects:
  59 Crohn’s Disease
  34 Ulcerative Colitis
  24 non-IBD controls
Gender:
  57 female
  59 male
  1 unknown
Cohorts:
  32 MGH adult new onset
  30 Cedars-Sinai adult established
  31 Cincinnati peds new onset
  11 Emory peds new onset
  13 MGH peds new onset
Melanie Schirmer
Different microbes can transcribe shared pathways
HISDEG-PWY: L-histidine degradation I
  Histidine is an α-amino acid that is used in the biosynthesis of proteins
  A. putredinis has been implicated in IBD
  Major contributor to transcription in subsets of IBD patients
Pathways can be contributed by different microbes over time
PWY-7094: fatty acid salvage
Faecalibacterium prausnitzii
Time-courses for individual patients: CD Patient 1, CD Patient 2
HUMAnN2 tutorial
https://bitbucket.org/biobakery/biobakery/wiki/humann2
HUMAnN2 synthetic evaluation (genes)
Synthetic human gut metagenome (top 20 species); compare expected vs. observed gene abundance
(figure labels: 1x; staggered abundance, ~0.1x to 100x coverage)
HUMAnN2 tiered search is more accurate…
  Comprehensive search suffers from spurious hits
…and is ~3x faster
  ~0.7 hours vs. ~2.1 hours (10M reads, 8 cores)
…and provides accurate per-species quantification!
Considerations for paired metatranscriptomes & metagenomes

  $ humann2_rna_dna_norm --input_dna <DNA genefamilies file> --input_rna <RNA genefamilies file> --output_basename <basename of the 3 output files>

  Calculates RNA/DNA abundance ratios
  Smooths the RNA and DNA abundances prior to taking the ratio
  Also outputs smoothed RNA and DNA files

Example output (RNA/DNA ratio):

  UniRef90_R6K3Z5: IMP dehydrogenase                                 2.02
  UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_caccae              5.96
  UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_dorei               3.82
  UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_ovatus              1.80
  UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_stercoris           0.87
  UniRef90_R6K3Z5: IMP dehydrogenase|Bacteroides_vulgatus            0.34
  UniRef90_R6K3Z5: IMP dehydrogenase|unclassified                    1.96
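To show why smoothing comes before the ratio, here is a minimal sketch that adds a pseudocount to both abundances so that features missing from the DNA data do not cause a division by zero. The additive pseudocount is an assumption chosen for illustration and is not necessarily the smoothing that humann2_rna_dna_norm applies; the function name and the toy values are hypothetical.

```python
def smoothed_rna_dna_ratio(rna, dna, pseudocount=1.0):
    """RNA/DNA ratio per feature after additive smoothing.

    rna, dna    -- dict: feature name -> abundance
    pseudocount -- value added to every abundance (ASSUMED smoothing scheme)
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive to avoid division by zero")
    features = set(rna) | set(dna)
    return {
        f: (rna.get(f, 0.0) + pseudocount) / (dna.get(f, 0.0) + pseudocount)
        for f in features
    }


# Hypothetical abundances for three features; "gene_c" has no DNA signal.
rna = {"gene_a": 3.0, "gene_c": 4.0}
dna = {"gene_a": 1.0, "gene_b": 1.0}

ratios = smoothed_rna_dna_ratio(rna, dna)
for feature in sorted(ratios):
    print(feature, ratios[feature])
```

A ratio above 1 suggests a feature is transcribed more than its gene copy number would predict, and a ratio below 1 suggests the opposite, as in the per-species IMP dehydrogenase values above.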