Strain profiling with StrainPhlAn and PanPhlAn

Presentation transcript:

Strain profiling with StrainPhlAn and PanPhlAn. Nicola Segata. Curtis Huttenhower (chuttenh@hsph.harvard.edu), Galeb Abu-Ali (gabuali@hsph.harvard.edu), Ali Rahnavard (rah@broadinstitute.org). STAMPS 2017, 08-08-17. Harvard T.H. Chan School of Public Health, Department of Biostatistics.

Efficient assembly-free meta’omics by leveraging isolates. NCBI isolate genomes: Archaea 300, Bacteria 12,926, Viruses 3,565, Eukaryota 112. Open reading frames: 49.0 million total genes. Species pan-genomes: 7,677, containing 18.6 million gene clusters. Figure labels: core genes, marker genes, RepoPhlAn, ChocoPhlAn. http://www.metaref.org

StrainPhlAn: metagenomic strain identification and tracking http://segatalab.cibio.unitn.it/tools/strainphlan

A tool for strain-level population genomics. P. copri as an example species. Countries shown: China, Denmark, Estonia, Finland, Peru, Hungary, Italy, Norway, France, Spain, Sweden, USA, Germany. Alignment length: 66k nt. Median SNPs: 830 [3.6%]. Number of positive samples: 123.

A tool for strain-level population genomics. Alignment length: 62k nt. Median SNPs: 830 [1.3%]. Number of positive samples: 123.
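
The bracketed percentages on these slides read as SNP counts normalized by alignment length. As a minimal sketch of that kind of statistic (not StrainPhlAn's own code), the Python below computes a pairwise SNP rate between aligned marker sequences and takes the median over all sample pairs. The function names, the toy alignment, and the choice to skip gap and N positions are assumptions made for illustration.

```python
from itertools import combinations
from statistics import median


def snp_rate(seq_a, seq_b, skip_chars="-N"):
    """Fraction of comparable aligned positions at which two sequences differ.

    Positions where either sequence has a gap or an N are skipped
    (an assumption of this sketch). Returns None if nothing is comparable.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue
        compared += 1
        if a != b:
            mismatches += 1
    return mismatches / compared if compared else None


def median_pairwise_snp_rate(alignment):
    """Median SNP rate over all pairs of samples in {sample_name: sequence}."""
    rates = [snp_rate(a, b) for a, b in combinations(alignment.values(), 2)]
    rates = [r for r in rates if r is not None]
    return median(rates) if rates else None


# Hypothetical toy alignment of concatenated marker sequences.
toy_alignment = {
    "sample_1": "ACGTACGTAC",
    "sample_2": "ACGTACGTAT",
    "sample_3": "ACGAACG-AT",
}
print(median_pairwise_snp_rate(toy_alignment))
```

On real data the input would be the multiple sequence alignment of marker genes reconstructed per sample, which is typically tens of thousands of nucleotides long, as on the slides above.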

Most bugs (in the gut) are dominated by one stable strain


There’s a lot of strain-level variation left to discover. Figure: median divergence from reference markers.

PanPhlAn: the approach. http://bitbucket.org/CibioCM/panphlan Reads from a metagenomic sample are mapped to microbial pangenomes to obtain gene coverage; genes are clustered into gene families to give pan-gene family coverage. Figure: coverage of abundance-sorted pan-gene families, showing multi-copy genes, a plateau of genes from the one strain in the metagenome, and absent genes.
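
The coverage curve described on this slide suggests a simple way to call gene families as multi-copy, present, or absent relative to the plateau. The Python below is a minimal sketch of that idea, not PanPhlAn's implementation: estimating the plateau as the median of nonzero coverages, the two threshold fractions, and all names are assumptions chosen for illustration.

```python
from statistics import median


def call_gene_families(coverage, absent_frac=0.3, multicopy_frac=1.7):
    """Classify pan-gene families from per-family coverage in one sample.

    coverage: dict mapping gene-family id -> mean coverage depth.
    The plateau is estimated as the median of nonzero coverages (assumption).
    Families below absent_frac * plateau are called absent, families above
    multicopy_frac * plateau are called multicopy; both fractions are
    hypothetical defaults, not values taken from the slides.
    """
    nonzero = [c for c in coverage.values() if c > 0]
    if not nonzero:
        return 0.0, {family: "absent" for family in coverage}
    plateau = median(nonzero)
    calls = {}
    for family, cov in coverage.items():
        if cov < absent_frac * plateau:
            calls[family] = "absent"
        elif cov > multicopy_frac * plateau:
            calls[family] = "multicopy"
        else:
            calls[family] = "present"
    return plateau, calls


# Hypothetical per-family coverages for one metagenome.
toy_coverage = {
    "fam_a": 40.0, "fam_b": 11.0, "fam_c": 10.0, "fam_d": 9.0,
    "fam_e": 10.5, "fam_f": 0.5, "fam_g": 0.0,
}
plateau, calls = call_gene_families(toy_coverage)
for family in sorted(toy_coverage, key=toy_coverage.get, reverse=True):
    print(family, toy_coverage[family], calls[family])
```

Sorting the families by coverage before printing mirrors the abundance-sorted curve in the figure: multi-copy families first, then the plateau, then the absent tail.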

PanPhlAn for “meta-epidemiology” http://bitbucket.org/CibioCM/panphlan Metagenomes from [Loman et al., 2013]

Strain-level epidemiology of human-associated E. coli with PanPhlAn (Scholz et al., Nature Methods, 2016). ~5,000 metagenomes (and counting), from all continents and many EU countries. Figure labels: reference genomes; STEC; German outbreak; A, B1, B2, D; T2D (China); Liver Cirr. (China); Infants (Italy); CRC (Europe); HMP (USA); Obesity (Europe); Neilsen (Europe); T2D (Finland); Rampelli (Africa); Liu (Mongolia); Tito (Peru); Segre (Skin).

Multiple options for strain tracking in metagenomes.
StrainPhlAn: map reads to core markers and call SNPs. Requires ~10x coverage, ~0.1% error rate.
PanPhlAn: map reads to pan-genomes and identify absent genes. Requires ~1x coverage, ~1% error rate.
Both work uniquely well for meta-analysis and are not sensitive to typical batch effects.
http://segatalab.cibio.unitn.it/tools/strainphlan http://segatalab.cibio.unitn.it/tools/panphlan
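
The approximate coverage requirements on this slide can be turned into a quick feasibility check for a species in a sample. The Python below is a rough sketch under stated assumptions: coverage is estimated as mapped bases divided by target length, and the ~10x and ~1x figures are treated as rules of thumb rather than hard cutoffs enforced by either tool. The function names are hypothetical.

```python
def estimated_coverage(mapped_reads, read_length, target_length):
    """Mean depth of coverage: total mapped bases divided by target length."""
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    return mapped_reads * read_length / target_length


def feasible_methods(coverage, snp_min=10.0, gene_min=1.0):
    """List the approaches whose rule-of-thumb coverage requirement is met.

    snp_min and gene_min default to the approximate values quoted on the
    slide (~10x for SNP calling, ~1x for gene presence/absence).
    """
    methods = []
    if coverage >= snp_min:
        methods.append("StrainPhlAn")
    if coverage >= gene_min:
        methods.append("PanPhlAn")
    return methods


# Hypothetical example: 100,000 reads of 100 nt mapping to a 5 Mb genome.
cov = estimated_coverage(100_000, 100, 5_000_000)
print(cov, feasible_methods(cov))
```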

StrainPhlAn tutorial: https://bitbucket.org/biobakery/biobakery/wiki/strainphlan

There’s a lot of strain-level variation left to discover. Figure: phylogenetic branch % spanned by reference vs. “wild” bugs.

Gene-family distribution curves (x-axis: E. coli gene families; y-axis: base coverage). Select samples with a “step” distribution (colored curves), which indicates that a strain of the species is present. Reject non-step (gray) curves.
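
One way to express the “step” criterion in code is to check that the sorted coverage curve is roughly flat across the ranks where a single strain's genes should sit and then drops sharply beyond them. The Python below is a heuristic sketch only; the rank positions, the thresholds, and the `genes_per_genome` parameter are assumptions for illustration and are not PanPhlAn's actual defaults.

```python
def has_step_profile(coverages, genes_per_genome, min_plateau=2.0,
                     max_slope_ratio=1.5, max_tail_frac=0.2):
    """Heuristic test for a step-shaped gene-family coverage curve.

    coverages: per-family coverage depths for one sample.
    genes_per_genome: approximate number of gene families in one strain.
    All threshold defaults are hypothetical.
    """
    curve = sorted(coverages, reverse=True)
    if genes_per_genome < 4 or len(curve) < genes_per_genome:
        return False
    left = curve[int(0.3 * genes_per_genome)]
    mid = curve[int(0.5 * genes_per_genome)]
    right = curve[int(0.7 * genes_per_genome)]
    tail = curve[min(int(1.25 * genes_per_genome), len(curve) - 1)]
    if mid < min_plateau:
        return False  # too little coverage to trust the profile
    is_flat = right > 0 and left / right <= max_slope_ratio
    drops_off = tail <= max_tail_frac * mid
    return is_flat and drops_off


# Hypothetical curves over a pangenome of 200 gene families.
step_sample = [30.0] * 10 + [10.0] * 90 + [0.0] * 100
sloped_sample = [20.0 * (1 - i / 199) for i in range(200)]
low_sample = [1.0] * 100 + [0.0] * 100

print(has_step_profile(step_sample, genes_per_genome=100))
print(has_step_profile(sloped_sample, genes_per_genome=100))
print(has_step_profile(low_sample, genes_per_genome=100))
```

The sloped curve fails because coverage is still high well past the expected genome size, which is the pattern expected when the signal does not come from a single dominant strain.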

Synthetic and semi-synthetic validation. Figure: five panels, each plotted against coverage.

PanPhlAn on Eubacterium rectale. Only one Eubacterium rectale genome used here.
