Metagenomic Species Diversity.

Presentation transcript:

Metagenomic Species Diversity

Agenda: Motivation; Basic classification terms; Pre-identification of the microbial community; Identification of the microbial community; Computation tool; “Demo”.

Motivation
Microbial communities are responsible for a broad spectrum of biological activities carried out in virtually all natural environments, including oceans, soil and human-associated habitats. For example, bacteria are responsible for about half of the photosynthesis on Earth, and friendly bacteria in the digestive system, found mainly in the colon, help with the digestive process.
Video: The Microbiome Project - Food Allergies.mp4

In conclusion
Profiling the taxonomic and phylogenetic composition of such communities is critical for understanding their biology and for characterizing complex disorders, such as inflammatory bowel disease and obesity, that do not appear to be associated with any individual microbe.

Taxonomy
The science of defining groups of biological organisms on the basis of shared characteristics and giving names to those groups. Organisms are grouped together into taxa (singular: taxon), and these groups are given a taxonomic rank. Groups of a given rank can be aggregated to form a super group of higher rank, thus creating a taxonomic hierarchy.

Taxonomy hierarchy

Example - Felidae:
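A minimal sketch of how a ranked hierarchy can be represented, assuming the eight commonly used ranks and using two members of Felidae (domestic cat and lion) as illustration. The names `RANKS`, `CAT`, `LION` and `lowest_shared_rank` are hypothetical and do not come from the slides.

```python
# Commonly used taxonomic ranks, ordered from highest (most inclusive) to lowest.
RANKS = ["domain", "kingdom", "phylum", "class", "order", "family", "genus", "species"]

# Lineages of two members of the family Felidae, used only as an illustration.
CAT = {
    "domain": "Eukaryota", "kingdom": "Animalia", "phylum": "Chordata",
    "class": "Mammalia", "order": "Carnivora", "family": "Felidae",
    "genus": "Felis", "species": "Felis catus",
}
LION = {
    "domain": "Eukaryota", "kingdom": "Animalia", "phylum": "Chordata",
    "class": "Mammalia", "order": "Carnivora", "family": "Felidae",
    "genus": "Panthera", "species": "Panthera leo",
}


def lowest_shared_rank(lineage_a, lineage_b):
    """Walk down the ranks and return the last rank at which both lineages agree."""
    shared = None
    for rank in RANKS:
        if lineage_a[rank] != lineage_b[rank]:
            break
        shared = rank
    return shared


rank = lowest_shared_rank(CAT, LION)
print(rank, CAT[rank])  # family Felidae
```

The two species belong to different genera, so the most specific taxon that contains both is the family Felidae.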

Ordering to Taxonomy hierarchy:

Clade
A monophyletic taxon or monophyletic group: a group of organisms that consists of a common ancestor and all of its lineal descendants. The ancestor may be an individual, a population or a species (extinct or extant), and so on right up to a kingdom.

Example - Reptilia:

Example:
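The clade definition can also be checked mechanically. The sketch below assumes a hypothetical toy tree (not taken from the slides) stored as a parent-to-children dictionary, and treats a set of leaves as a clade when some node has exactly those leaves as its descendants.

```python
# Hypothetical toy tree: root splits into leaf A and internal node X; X splits into B and C.
TREE = {
    "root": ["A", "X"],
    "X": ["B", "C"],
    "A": [],
    "B": [],
    "C": [],
}


def leaves_below(tree, node):
    """Return the set of leaves descended from node (the node itself if it is a leaf)."""
    children = tree.get(node, [])
    if not children:
        return {node}
    leaves = set()
    for child in children:
        leaves |= leaves_below(tree, child)
    return leaves


def is_clade(tree, group):
    """True if some ancestor node has exactly this group as its full set of leaf descendants."""
    group = set(group)
    return any(leaves_below(tree, node) == group for node in tree)


print(is_clade(TREE, {"B", "C"}))  # True: X and all of its descendants
print(is_clade(TREE, {"A", "B"}))  # False: their common ancestor (root) also leads to C
```

The second group fails because it leaves out a descendant of the common ancestor, which is exactly what makes a group non-monophyletic.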

Metagenomic shotgun sequencing
Recent advances in bioinformatics have allowed shotgun sequencing to be adapted to metagenomic samples. Metagenomic samples can contain reads from a huge number of organisms; for example, a single gram of soil can hold up to 18,000 different types of organisms, each with its own genome. Shotgun sequencing reveals the genes present in environmental samples and provides a rich profile of the microbial community.

More terms
Reads: pieces of sequenced DNA obtained from a metagenomic sample. Their size is between 500 and 1000 bases.
Marker gene: a piece of DNA whose location on the chromosome is well known, and which can therefore be used to identify organisms.
Relative abundance: the percentage of organisms of a particular kind relative to the total number of organisms in the area.

Binning
The process of associating a particular sequence with an organism. Binning algorithms can use prior information and thus act as supervised classifiers, or they can try to find new groups and thus act as unsupervised classifiers. Many, of course, do both.
Strategies:
Alignment/similarity-based binning: methods used to rapidly search for phylogenetic markers or otherwise similar sequences in existing public databases. For example: BLAST.

BLAST
Basic Local Alignment Search Tool. A search algorithm for comparing a DNA sequence against a large database of reference sequences, and one of the most widely used bioinformatics programs for sequence searching. It enables a researcher to compare a query sequence with a library or database of sequences and to identify library sequences that resemble the query sequence above a certain threshold.
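The idea of "library sequences that resemble the query above a certain threshold" can be illustrated with a deliberately simplified sketch. This is not BLAST's algorithm: it only scores each reference by the fraction of the query's k-mers it contains. The function names, the toy database and the default `k` and `threshold` values are all assumptions made for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def search(query, database, k=4, threshold=0.5):
    """Return (name, score) pairs for references sharing at least
    `threshold` of the query's k-mers, best match first."""
    if len(query) < k:
        raise ValueError("query must be at least k bases long")
    query_kmers = kmers(query, k)
    hits = []
    for name, seq in database.items():
        score = len(query_kmers & kmers(seq, k)) / len(query_kmers)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy reference database (hypothetical sequences).
database = {"ref1": "ACGTACGTGGCA", "ref2": "TTTTTTTTTTTT"}
print(search("ACGTACGT", database))  # [('ref1', 1.0)]
```

Real tools score local alignments and report statistical significance; the k-mer fraction here only stands in for "similarity above a threshold".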

But
Both alignment- and composition-based approaches have been developed for this task, and the two have also been combined in hybrid methods. However, none has simultaneously achieved both the efficiency and the species-level accuracy required by current high-complexity datasets, because of computational limitations, untenable accuracy for short (<400 nt) reads, and the need to normalize read counts into clade-specific relative abundances.

MetaPhlAn
A computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data. It requires only minutes to process millions of metagenomic reads, and it estimates the relative abundance of microbial cells using unique clade-specific marker genes.

Clade-specific markers
Clade-specific markers are coding sequences (CDS) that satisfy:
(i) being strongly conserved within the clade's genomes;
(ii) not possessing substantial local similarity with any sequence outside the clade.
The definition of such markers is to some extent sensitive to the availability of sequenced genomes, especially point (i), because a gene can be present in all available sequenced genomes in a clade but missing from some yet-to-be-sequenced strains.
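The two criteria can be sketched with plain set operations, under the simplifying assumption that each genome is reduced to a set of gene identifiers and that "conserved" and "similar" both mean "same identifier". Real marker selection works on sequence similarity; the genome names, gene names and the function `clade_specific_markers` below are hypothetical.

```python
def clade_specific_markers(genomes, clade):
    """genomes maps genome name -> (clade name, set of gene identifiers).
    Returns genes present in every genome of `clade` and in no genome outside it."""
    inside = [genes for c, genes in genomes.values() if c == clade]
    outside = [genes for c, genes in genomes.values() if c != clade]
    if not inside:
        return set()
    core = set.intersection(*inside)        # criterion (i): conserved within the clade
    seen_outside = set().union(*outside)    # genes found anywhere outside the clade
    return core - seen_outside              # criterion (ii): absent outside the clade


genomes = {
    "G1": ("Bacteroides", {"g1", "g2", "g3"}),
    "G2": ("Bacteroides", {"g1", "g2", "g4"}),
    "G3": ("Escherichia", {"g2", "g5"}),
}
print(clade_specific_markers(genomes, "Bacteroides"))  # {'g1'}

# A newly sequenced strain that lacks g1 removes it from the marker set,
# which illustrates the sensitivity to available genomes noted for point (i).
genomes["G4"] = ("Bacteroides", {"g2", "g3"})
print(clade_specific_markers(genomes, "Bacteroides"))  # set()
```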

The markers catalog
Starting from the 2,887 genomes available from IMG (Integrated Microbial Genomes, July 2011), more than 2 million genes were identified as potential markers meeting this level of stringency while allowing for sequencing and annotation errors. A subset of 400,141 genes most representative of each taxonomic unit was then selected, and the catalog was generated from them. The resulting catalog spans 1,221 species with 231 (standard deviation 107) markers per species and >115,000 markers at higher taxonomic levels.

The MetaPhlAn classifier workflow

The MetaPhlAn classifier
Compares each metagenomic read from a sample to the marker catalog to identify high-confidence matches. This is very efficient, because the catalog contains only ~4% of sequenced microbial genes and each read of interest has at most one match owing to the markers' uniqueness. Since spurious reads are very unlikely to have significant matches with a marker sequence, no pre-processing of the metagenomic DNA (for example, error detection or assembly) is required.
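A toy sketch of the read-to-catalog comparison, assuming exact substring matching as a stand-in for the real alignment step (this is not how the tool itself matches reads). The marker identifiers, sequences and the function `count_marker_hits` are hypothetical.

```python
from collections import Counter


def count_marker_hits(reads, markers):
    """markers maps marker id -> (clade, marker sequence).
    Returns a Counter of clade -> number of reads matching one of its markers."""
    counts = Counter()
    for read in reads:
        for clade, sequence in markers.values():
            if read in sequence:   # exact substring match stands in for alignment
                counts[clade] += 1
                break              # markers are unique, so stop at the first match
    return counts


markers = {
    "m1": ("A", "ACGTACGTAA"),
    "m2": ("B", "GGGCCCTTTA"),
}
reads = ["CGTAC", "GCCCT", "ACGTA", "TTTTT"]
print(count_marker_hits(reads, markers))  # Counter({'A': 2, 'B': 1})
```

Reads that match no marker, such as the last one, are simply left uncounted, which mirrors the point that spurious reads need no separate filtering step.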

The classifier normalizes the total number of reads in each clade by the nucleotide length of its markers and provides the relative abundance of each taxonomic unit, taking into account any markers specific to subclades. It achieves a classification rate of about 450 reads per second on a standard single-processor system.

Calculating relative abundance
Example:
Bacteria A: 2,000,000 reads; total size of its clade-specific markers is 1,000,000 nucleotides.
Bacteria B: 8,000,000 reads; total size of its clade-specific markers is 5,000,000 nucleotides.
Calculations:
Normalized bacteria A: 2,000,000 / 1,000,000 = 2
Normalized bacteria B: 8,000,000 / 5,000,000 = 1.6
Relative abundance of bacteria A: 2 / (2 + 1.6) = 55.56%.
Relative abundance of bacteria B: 1.6 / (2 + 1.6) = 44.44%.
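The worked example can be reproduced in a few lines. This is a minimal sketch of the normalization as described on the slides (reads divided by total marker length, then scaled so that all clades sum to 100%); the function name `relative_abundance` is an assumption, and handling of subclade markers is left out.

```python
def relative_abundance(read_counts, marker_lengths):
    """Normalize per-clade read counts by total marker length (in nucleotides)
    and return each clade's share of the normalized total, in percent."""
    normalized = {
        clade: read_counts[clade] / marker_lengths[clade] for clade in read_counts
    }
    total = sum(normalized.values())
    if total == 0:
        return {clade: 0.0 for clade in normalized}
    return {clade: 100 * value / total for clade, value in normalized.items()}


reads = {"A": 2_000_000, "B": 8_000_000}
marker_nt = {"A": 1_000_000, "B": 5_000_000}
for clade, percent in relative_abundance(reads, marker_nt).items():
    print(f"Bacteria {clade}: {percent:.2f}%")
# Bacteria A: 55.56%
# Bacteria B: 44.44%
```

Although B has four times as many reads as A, its markers are five times longer, so after normalization A comes out as the more abundant of the two.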

“Demo”
Input: Reads from 20 samples collected from 15-18 body sites from 300 healthy human subjects.
Output: profiled_samples.txt

The end…