Taxonomic profiling with MetaPhlAn2


1 Taxonomic profiling with MetaPhlAn2
Curtis Huttenhower, Galeb Abu-Ali, Eric Franzosa
Harvard T.H. Chan School of Public Health, Department of Biostatistics

2 The two big questions of microbial community analysis...
What are they doing? Who is there?

3 Taxonomic profiling: who’s there

4 Efficient assembly-free meta’omics by leveraging isolates
[Figure: isolate genomes reduced to species pan-genomes, core genes, and marker genes. Labels: Nicola Segata; RepoPhlAn; ChocoPhlAn.]
NCBI isolate genomes: Archaea 300, Bacteria 12,926, Viruses 3,565, Eukaryota 112
Open reading frames: 49.0 million total genes
Species pan-genomes: 7,677, containing 18.6 million gene clusters
Core genes
Marker genes

5 MetaPhlAn2: metagenomic taxonomic profiling
X is a unique marker gene for clade Y (figure label: Gene X)
~1M most representative markers used for identification
184±45 markers per species (target 200)
~7,100 species (excludes incomplete annotations, spp., etc.)
False positive/false negative rates of ~1 in 10^6
Profiles all domains of life: bacteria, viruses, euks, archaea
Strain-level profiling using marker barcodes and SNPs
Quasi-markers used to resolve ambiguity in postprocessing

Nicola’s taken advantage of this catalog for several computational methods, but the one I’d like to talk about today relies on identifying high-quality, taxonomically unique marker sequences guaranteed to arise from exactly one microbial clade. By organizing the gene catalog of IMG into groups of gene families (not orthologous families, but highly nucleotide-similar sequences), we can identify gene families that are core to one or more clades. This means that the gene is conserved throughout the clade, although it may appear elsewhere due to conservation or horizontal gene transfer. Core genes are thus a superset of unique marker genes, which are both core to a clade and unique to it: they never appear elsewhere, even by horizontal transfer. Nicola’s developed a system called ChocoPhlAn, which I’m pretty sure is an acronym for something, that identifies all genes core or unique for any clade within IMG. This results in a high-quality set of about two million unique markers, with uniqueness verified by whole-genome BLAST against the entire database. About 400 thousand of these proved sufficient to uniquely identify all 1200 species in the database, plus several hundred higher-level clades, with several hundred markers for most organisms.
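The distinction between core genes and unique marker genes can be illustrated with a minimal Python sketch. This is not ChocoPhlAn's implementation: it assumes gene families have already been clustered and are represented by plain identifiers, and it replaces the nucleotide-similarity and whole-genome BLAST checks with exact set membership. The function name `core_and_unique_markers` and the toy genome and family names are hypothetical.

```python
from typing import Dict, Set, Tuple


def core_and_unique_markers(
    genomes: Dict[str, Set[str]],
    clade_of: Dict[str, str],
    clade: str,
) -> Tuple[Set[str], Set[str]]:
    """Return (core, unique) gene-family sets for one clade.

    genomes  maps a genome ID to the gene families it contains.
    clade_of maps a genome ID to its clade label.
    Both structures are illustrative assumptions, not a real file format.
    """
    inside = [g for g, c in clade_of.items() if c == clade]
    outside = [g for g, c in clade_of.items() if c != clade]
    if not inside:
        return set(), set()

    # Core: families present in every genome of the clade.
    core = set.intersection(*(genomes[g] for g in inside))

    # Unique markers: core families never seen in any genome outside the clade.
    seen_elsewhere = set().union(*(genomes[g] for g in outside))
    unique = core - seen_elsewhere
    return core, unique


# Toy data (hypothetical genomes and family IDs).
genomes = {
    "ecoli_1": {"fam1", "fam2", "fam3"},
    "ecoli_2": {"fam1", "fam2", "fam4"},
    "bfrag_1": {"fam2", "fam5"},
}
clade_of = {"ecoli_1": "E. coli", "ecoli_2": "E. coli", "bfrag_1": "B. fragilis"}

core, unique = core_and_unique_markers(genomes, clade_of, "E. coli")
print(sorted(core), sorted(unique))
```

In this toy case `fam2` is core to E. coli but also occurs in the B. fragilis genome, so only `fam1` qualifies as a unique marker, which mirrors the "core genes are a superset of unique marker genes" point in the notes.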

6 Per-species abundance by robust averaging
[Figure: coverage of abundance-sorted pan-gene families. Annotated regions: multi-copy genes; plateau of genes from one metagenome’s strain; absent genes.]
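One common way to realize "robust averaging" is a trimmed mean over per-marker coverage, which discards the extremes produced by multi-copy genes (very high coverage) and absent genes (zero coverage). The sketch below is an illustration under that assumption, not MetaPhlAn2's actual code; the function names and the trimming fraction of 0.2 are hypothetical choices.

```python
from typing import Dict, Sequence


def robust_clade_coverage(marker_coverage: Sequence[float], trim: float = 0.2) -> float:
    """Trimmed mean of per-marker coverage for one clade.

    Drops the lowest and highest `trim` fraction of markers before
    averaging. The default of 0.2 is an illustrative assumption.
    """
    values = sorted(marker_coverage)
    k = int(len(values) * trim)
    # Fall back to the plain mean if trimming would leave nothing.
    kept = values[k:len(values) - k] if len(values) > 2 * k else values
    return sum(kept) / len(kept) if kept else 0.0


def relative_abundance(clade_coverage: Dict[str, float]) -> Dict[str, float]:
    """Normalize per-clade coverage so the values sum to 1."""
    total = sum(clade_coverage.values())
    return {c: (v / total if total else 0.0) for c, v in clade_coverage.items()}


# Toy coverage values: two absent markers (0) and two multi-copy outliers (40, 50).
coverage = [0, 0, 9, 10, 10, 10, 11, 10, 40, 50]
print(robust_clade_coverage(coverage))          # trimmed mean
print(sum(coverage) / len(coverage))            # plain mean, pulled up by outliers
print(relative_abundance({"species_A": 10.0, "species_B": 30.0}))
```

With these toy values the trimmed mean stays on the plateau near 10, whereas the plain mean is inflated by the two multi-copy outliers.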

7 Meta-analysis of metagenomic taxonomic profiles
Waldron and Segata: meta-analysis of >2,400 gut metagenomes. Available as an R package. Allows systematic tests of phenotypes across datasets, or health vs. disease.


