Computational Analysis of the Taxonomical Classification of Short 16S rRNA Sequences. Christel Chehoud. Mentor: Brian Haas.

Presentation transcript:

Computational Analysis of the Taxonomical Classification of Short 16S rRNA Sequences. Christel Chehoud. Mentor: Brian Haas

Overview: Human Microbiome Project; 16S rRNA; Reference and Test Sets; Classifiers; Accuracy of Classifications; Results

Human Microbiome Project (HMP): microorganism communities; human development; physiology; immunity; disease; nutrition; core microbiome

16S rRNA: 16S ribosomal RNA; large RNA component of the small subunit of the ribosome; phylogenetic marker; species identification; 1542 bp

Using 16S for Species Identification (diagram: sequence, classifier, predicted classification)

Project Goal: New sequencing technology. Evaluate the accuracy of the classification of the 16S rRNA across different: classifiers; regions of the sequence; phylogeny

Reference Dataset: RDP Core Set; trusted taxonomies; 6,621 sequences (Phylum: 27, Class: 43, Order: 97, Family: 258, Genus: 1352)

GreenGenes’s Full Collection of Sequences: full collection used by GreenGenes; high phylogenetic diversity; 188,073 sequences

Comparison of Taxonomy Predictions by Method: Classified GreenGenes Core Set using RDP (Naïve Bayesian), kmerRank, and BLAST. All Match: 135,269 sequences (Phylum: 27, Class: 43, Order: 96, Family: 257, Genus: [illegible])

None Match (BLAST, kmerRank, RDP): 19,588
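The All Match and None Match counts can be reproduced in spirit with a short tally over per-sequence predictions. The sketch below is an illustration only: it assumes "None Match" means all three classifiers gave different labels, and the function names, sequence IDs, and genus labels are hypothetical rather than taken from the slides.

```python
from collections import Counter


def agreement(calls):
    """Categorize one sequence by how its classifier labels agree.

    `calls` maps a classifier name to the label it predicted.
    """
    labels = set(calls.values())
    if len(labels) == 1:
        return "all_match"
    if len(labels) == len(calls):
        # Assumption: "none match" means every classifier disagrees.
        return "none_match"
    return "partial_match"


def tally_agreement(predictions):
    """Count agreement categories over {sequence_id: calls}."""
    return Counter(agreement(calls) for calls in predictions.values())


# Hypothetical predictions, for illustration only.
example = {
    "seq1": {"RDP": "Bacillus", "kmerRank": "Bacillus", "BLAST": "Bacillus"},
    "seq2": {"RDP": "Escherichia", "kmerRank": "Shigella", "BLAST": "Escherichia"},
    "seq3": {"RDP": "Vibrio", "kmerRank": "Listonella", "BLAST": "Photobacterium"},
}

if __name__ == "__main__":
    print(tally_agreement(example))
```

Only the sequences in the all-match category would be carried forward as a trusted set under this reading.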

CD-hit: Normalizing Genus Representation. 3% difference between genera; 21,179 sequences (Phylum: 27, Class: 43, Order: 96, Family: 235, Genus: 1241). Citation: Li
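CD-hit reduces redundancy by greedy incremental clustering: sequences are visited longest first, and each one either joins the first existing representative it is similar enough to or becomes a new representative. The sketch below illustrates that idea only. It is not CD-hit itself: the similarity measure is Python's `difflib` ratio standing in for CD-hit's alignment-based identity, and the 0.97 default is an assumed reading of the slide's "3% difference".

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity score in [0, 1]; not CD-hit's identity measure."""
    # autojunk is disabled because a 4-letter DNA alphabet would otherwise
    # be treated as "popular" junk in long sequences.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(sequences, threshold=0.97):
    """Greedy incremental clustering, longest sequence first.

    Returns {representative: [members]}, where the representative is
    also listed as the first member of its own cluster.
    """
    clusters = {}
    for seq in sorted(sequences, key=len, reverse=True):
        for rep in clusters:
            if similarity(rep, seq) >= threshold:
                clusters[rep].append(seq)
                break
        else:
            # No representative was close enough: start a new cluster.
            clusters[seq] = [seq]
    return clusters


if __name__ == "__main__":
    toy = ["ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT"]
    for rep, members in greedy_cluster(toy, threshold=0.85).items():
        print(rep, len(members))
```

Keeping one representative per cluster is what evens out over-sampled groups in the reduced set.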

Sliding Window: Producing our Localized Regions. Van de Peer, 1996. Sliding window approach: 300 bp window, 25 bp overlap. Sanger vs. 454-XLR = full-length vs. localized region
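A minimal sketch of how localized regions could be cut from a full-length sequence. It assumes "25 bp overlap" means consecutive windows share 25 bases, so the step is 275 bp; the slide could also be read as a 25 bp step. The function name and the choice to drop a trailing partial window are assumptions for illustration.

```python
def sliding_windows(sequence, window=300, overlap=25):
    """Return (start, subsequence) pairs for full-length windows only.

    Consecutive windows share `overlap` bases, so the step between
    window starts is `window - overlap`.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    step = window - overlap
    last_start = len(sequence) - window
    return [
        (start, sequence[start:start + window])
        for start in range(0, last_start + 1, step)
    ]


if __name__ == "__main__":
    # Placeholder sequence with the 1542 bp length quoted for 16S rRNA.
    full_length = "A" * 1542
    for start, region in sliding_windows(full_length):
        print(start, len(region))
```

Each window would then be classified on its own and compared with the label of the full-length sequence.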

Overall Accuracy of the Three Different Classifiers. Average: BLASTN 0.843, kmerRank 0.830, RDP 0.831. Standard deviation: BLASTN 0.031, kmerRank 0.030, RDP 0.017
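The averages and standard deviations are summary statistics over a set of accuracy values. The sketch below assumes one accuracy value per sequence region and uses the sample standard deviation; the slides do not say which estimator was used, and the function names are hypothetical.

```python
from statistics import mean, stdev


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that equal the trusted label."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def summarize(accuracies):
    """Return (mean, sample standard deviation); needs two or more values."""
    return mean(accuracies), stdev(accuracies)


if __name__ == "__main__":
    # Hypothetical per-region accuracies for a single classifier.
    per_region = [0.80, 0.85, 0.90]
    print(summarize(per_region))
```

A lower standard deviation means the classifier behaves more consistently from one region of the sequence to the next.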

Genus Prediction Accuracy (per Phylum). Average: BLASTN 0.843, kmerRank 0.830, RDP 0.831. Standard deviation: BLASTN 0.107, kmerRank 0.153, RDP 0.142
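Per-phylum genus accuracy amounts to grouping predictions by the trusted phylum and scoring genus calls within each group. A minimal sketch follows, assuming each record carries the trusted phylum, the trusted genus, and the predicted genus; the record layout and the example taxa are illustrative and not taken from the presentation.

```python
from collections import defaultdict


def genus_accuracy_by_phylum(records):
    """Compute genus-level accuracy per phylum.

    `records` is an iterable of (phylum, true_genus, predicted_genus).
    Returns {phylum: fraction of correct genus predictions}.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for phylum, true_genus, predicted_genus in records:
        totals[phylum] += 1
        if true_genus == predicted_genus:
            correct[phylum] += 1
    return {phylum: correct[phylum] / totals[phylum] for phylum in totals}


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    records = [
        ("Firmicutes", "Bacillus", "Bacillus"),
        ("Firmicutes", "Clostridium", "Bacillus"),
        ("Proteobacteria", "Escherichia", "Escherichia"),
        ("Proteobacteria", "Vibrio", "Vibrio"),
    ]
    print(genus_accuracy_by_phylum(records))
```

The spread of these per-phylum values is one way to arrive at a standard deviation across phyla.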

Finding the 16S Region Providing the Most Reliable Prediction Accuracy

Clustering Phyla and Methods by Prediction Accuracy

The best method is phylum-dependent. Variation in accuracy is impacted by depth of species coverage.

Summary: The central region of 16S is the most accurate, on average. Of the methods examined, BLAST is the most accurate across all 16S regions and all phyla, on average. RDP-bayes is the least variable across short sequence regions. The best short-sequence classification method is phylum-dependent.

Acknowledgements. Genome Sequencing and Analysis Program: Brian Haas, Dirk Gevers, Michael Feldgarden, Doyle Ward, Chad Nusbaum, Bruce Birren. Administration: Shawna Young, Lucia Vielma, Maura Silverstein