Metagenomic Species Diversity.

Presentation on theme: "Metagenomic Species Diversity."— Presentation transcript:

1 Metagenomic Species Diversity

2 Agenda
Motivation; Basic classification terms; Pre-identification of microbial community; Identification of microbial community; Computation tool; “Demo”

3 Basic classification Terms
Motivation: Microbial communities are responsible for a broad spectrum of biological activities carried out in virtually all natural environments, including oceans, soil and human-associated habitats. For example, bacteria are responsible for about half of the photosynthesis on Earth, and friendly bacteria in the digestive system, found mainly in the colon, help with the digestive process. Video: The Microbiome Project - Food Allergies.mp4

4 Basic classification Terms
In conclusion: Profiling the taxonomic and phylogenetic composition of such communities is critical for understanding their biology and for characterizing complex disorders, such as inflammatory bowel disease and obesity, that do not appear to be associated with any individual microbe.

5 Basic classification Terms
Taxonomy: The science of defining groups of biological organisms on the basis of shared characteristics and giving names to those groups. Organisms are grouped together into taxa (singular: taxon), and these groups are given a taxonomic rank. Groups of a given rank can be aggregated to form a super group of higher rank, thus creating a taxonomic hierarchy.

6 Basic classification Terms
Taxonomy hierarchy

7 Basic classification Terms
Example - Felidae:

8-11 Ordering to Taxonomy hierarchy: (figure slides)

12 Basic classification Terms
Clade: A monophyletic taxon or monophyletic group, that is, a group of organisms consisting of a common ancestor (which may be an individual, a population, a species (extinct or extant), and so on right up to a kingdom) and all its lineal descendants.
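The clade definition above can be illustrated with a small sketch. It assumes a toy child-to-parent mapping; the function name `clade_members` and the tiny tree are hypothetical and only serve to show "an ancestor plus all of its lineal descendants".

```python
from collections import defaultdict


def clade_members(parent_of, ancestor):
    """Return the ancestor together with every lineal descendant."""
    # Invert the child -> parent mapping into parent -> children.
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    # Depth-first walk starting from the chosen ancestor.
    members, stack = set(), [ancestor]
    while stack:
        node = stack.pop()
        if node not in members:
            members.add(node)
            stack.extend(children[node])
    return members


# Hypothetical toy tree (child -> parent), for illustration only.
toy_tree = {
    "Felidae": "Carnivora",
    "Canidae": "Carnivora",
    "Felis": "Felidae",
    "Panthera": "Felidae",
    "Felis catus": "Felis",
    "Panthera leo": "Panthera",
}

felidae_clade = clade_members(toy_tree, "Felidae")
```

Here `felidae_clade` holds Felidae and everything below it, while Canidae, which descends from a different branch of Carnivora, is excluded.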

13 Example - Reptilia:

14 Basic classification Terms
Example:

15 Basic classification Terms
Metagenomic shotgun sequencing: Recent advances in bioinformatics have allowed shotgun sequencing to be adapted to metagenomic samples. Metagenomic samples can contain reads from a huge number of organisms. For example, in a single gram of soil, there can be up to [count missing] different types of organisms, each with its own genome. Shotgun sequencing reveals the genes present in environmental samples and provides a rich profile of the microbial community.

16 Basic classification Terms
More terms:
Reads: pieces of sequenced DNA obtained from a metagenomic sample. Their size is between 500 and 1,000 bases.
Marker gene: a piece of DNA whose location on the chromosome is well known, so it can be used to identify organisms.
Relative abundance: the percentage of organisms of a particular kind relative to the total number of organisms in the area.

17 Basic classification Terms
Binning: The process of associating a particular sequence with an organism. Binning algorithms can employ previous information and thus act as supervised classifiers, or they can try to find new groups and thus act as unsupervised classifiers. Many, of course, do both. Strategies: alignment/similarity-based binning, which uses methods that rapidly search for phylogenetic markers or otherwise similar sequences in existing public databases. For example: BLAST.

18 Basic classification Terms
BLAST: Basic Local Alignment Search Tool. A search algorithm for comparing any DNA sequence to a large database of reference sequences, and one of the most widely used bioinformatics programs for sequence searching. It enables a researcher to compare a query sequence with a library or database of sequences and to identify library sequences that resemble the query sequence above a certain threshold.

19 Basic classification Terms
But: Both alignment- and composition-based approaches have been developed for this task, and the two approaches have also been integrated in hybrid methods. However, none has simultaneously achieved both the efficiency and the species-level accuracy required by current high-complexity datasets, because of computational limitations, untenable accuracy for short (<400 nt) reads, and the need to normalize read counts into clade-specific relative abundances.

20 Basic classification Terms
MetaPhlAn: A computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data. It requires only minutes to process millions of metagenomic reads, and it estimates the relative abundance of microbial cells using unique clade-specific marker genes.

21 Clade-specific markers
Clade-specific markers are coding sequences (CDS) that satisfy two conditions:
(i) they are strongly conserved within the clade's genomes;
(ii) they do not possess substantial local similarity with any sequence outside the clade.
The definition of such markers is to some extent sensitive to the availability of sequenced genomes, especially for point (i), because a gene can be present in all available sequenced genomes in a clade but missing from some yet-to-be-sequenced strains.
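The two conditions for a clade-specific marker can be sketched with plain set operations. This is a deliberately simplified illustration, not how the catalog was actually built: it assumes each genome is already reduced to a set of gene-family identifiers, so "conserved" becomes "present in every genome of the clade" and "no similarity outside the clade" becomes "absent from every other genome". The function name and the toy data are hypothetical.

```python
def clade_specific_markers(gene_presence, clade_of, clade):
    """Return gene ids found in every genome of `clade` and in no other genome.

    gene_presence: genome id -> set of gene-family ids (assumed precomputed)
    clade_of:      genome id -> clade label
    """
    inside = [g for g, c in clade_of.items() if c == clade]
    outside = [g for g, c in clade_of.items() if c != clade]
    if not inside:
        return set()

    # Condition (i): conserved across all available genomes of the clade.
    core = set.intersection(*(gene_presence[g] for g in inside))
    # Condition (ii): not seen in any genome outside the clade.
    seen_outside = set().union(*(gene_presence[g] for g in outside))
    return core - seen_outside


# Hypothetical toy data: two genomes in clade A, one in clade B.
toy_presence = {
    "A1": {"g1", "g2", "g3"},
    "A2": {"g1", "g2", "g4"},
    "B1": {"g2", "g5"},
}
toy_clades = {"A1": "A", "A2": "A", "B1": "B"}

markers_a = clade_specific_markers(toy_presence, toy_clades, "A")
markers_b = clade_specific_markers(toy_presence, toy_clades, "B")
```

In the toy data, `g2` is conserved in clade A but also occurs in B1, so it fails condition (ii); `g3` and `g4` each occur in only one A genome, so they fail condition (i). Adding a new A genome that lacks `g1` would remove `g1` as a marker, which is the sensitivity to available genomes noted above.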

22 Basic classification Terms
The markers catalog: Starting from the 2,887 genomes available from IMG (Integrated Microbial Genomes, July 2011), more than 2 million genes were identified as potential markers meeting this level of stringency while allowing for sequencing and annotation errors. A subset of the 400,141 genes most representative of each taxonomic unit was then selected, and the catalog was generated from them. The resulting catalog spans 1,221 species, with 231 (standard deviation 107) markers per species and >115,000 markers at higher taxonomic levels.

23 The MetaPhlAn classifier workflow

24 The MetaPhlAn classifier
The classifier compares each metagenomic read from a sample to the marker catalog to identify high-confidence matches. This is done very efficiently, because the catalog contains only ~4% of sequenced microbial genes and each read of interest has at most one match, owing to the markers' uniqueness. Since spurious reads are very unlikely to have significant matches with a marker sequence, no pre-processing of the metagenomic DNA (for example error detection or assembly) is required.
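The counting step that follows the read-to-marker comparison can be sketched as below. This is only an illustration under stated assumptions: the alignment itself is not shown, `hits` stands for the (read, marker) matches that an external aligner would report, and the names `count_reads_per_clade`, `marker_clade` and the toy identifiers are hypothetical.

```python
from collections import Counter


def count_reads_per_clade(hits, marker_clade):
    """Tally reads per clade from (read_id, marker_id) matches.

    Each read is counted at most once, mirroring the idea that a read
    of interest matches at most one marker.
    """
    counts = Counter()
    seen_reads = set()
    for read_id, marker_id in hits:
        # Skip repeated reads and matches to markers not in the catalog.
        if read_id in seen_reads or marker_id not in marker_clade:
            continue
        seen_reads.add(read_id)
        counts[marker_clade[marker_id]] += 1
    return counts


# Hypothetical toy catalog (marker -> clade) and toy matches.
toy_marker_clade = {"m1": "A", "m2": "A", "m3": "B"}
toy_hits = [("r1", "m1"), ("r2", "m3"), ("r3", "m2"), ("r1", "m2")]

toy_counts = count_reads_per_clade(toy_hits, toy_marker_clade)
```

The second match for `r1` is ignored, so the toy sample yields two reads for clade A and one for clade B. These raw counts still have to be normalized by marker length before they can be read as abundances.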

25 Basic classification Terms
The classifier normalizes the total number of reads in each clade by the nucleotide length of its markers and provides the relative abundance of each taxonomic unit, taking into account any markers specific to subclades. It achieves a classification rate of about 450 reads per second on a standard single-processor system.
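The normalization described above can be written compactly. This is a simplified sketch: the symbols are notation introduced here, with n_c the number of reads assigned to clade c, \ell_c the total nucleotide length of that clade's markers, and a_c the resulting relative abundance; the handling of subclade-specific markers is left out.

```latex
\[
  a_c \;=\; \frac{n_c / \ell_c}{\sum_{k} n_k / \ell_k}
\]
```

The sum in the denominator runs over all clades at the same taxonomic level, so the a_c values add up to 1.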

26 Calculating Relative Abundance Example:
Bacteria A: 2,000,000 reads; total size of its clade-specific markers is 1,000,000 nt.
Bacteria B: 8,000,000 reads; total size of its clade-specific markers is 5,000,000 nt.
Calculations:
Normalized bacteria A: 2,000,000 / 1,000,000 = 2
Normalized bacteria B: 8,000,000 / 5,000,000 = 1.6
Relative abundance of bacteria A: 2 / (2 + 1.6) = 55.56%.
Relative abundance of bacteria B: 1.6 / (2 + 1.6) = 44.44%.
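The same arithmetic can be checked in a few lines of code. This is a minimal sketch using the figures from the calculation above (2,000,000 / 1,000,000 and 8,000,000 / 5,000,000); the function name `relative_abundance` is hypothetical and the sketch ignores subclade-specific markers.

```python
def relative_abundance(read_counts, marker_lengths):
    """Normalize read counts by marker length, then convert to percentages."""
    # Reads per marker nucleotide for each clade.
    normalized = {
        clade: read_counts[clade] / marker_lengths[clade]
        for clade in read_counts
    }
    total = sum(normalized.values())
    # Share of each clade in the normalized total, as a percentage.
    return {clade: 100.0 * value / total for clade, value in normalized.items()}


abundances = relative_abundance(
    {"A": 2_000_000, "B": 8_000_000},   # reads per clade
    {"A": 1_000_000, "B": 5_000_000},   # total marker length per clade
)

for clade, percent in abundances.items():
    print(f"Bacteria {clade}: {percent:.2f}%")
```

Although bacteria B has four times as many reads, its markers are five times longer, so after normalization bacteria A is the more abundant of the two.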

27 Basic classification Terms
“Demo”
Input: reads from 20 samples collected from body sites of 300 healthy human subjects.
Output: profiled_samples.txt

28 The end…

