Metagenomic Analysis Using MEGAN4
Peter R. Hoyt, Director, OSU Bioinformatics Graduate Certificate Program
Matthew Vaughn, iPlant, University of Texas Super Computing Center

Introduction
In metagenomics, the aim is to understand the composition and operation of complex microbial consortia in environmental samples through sequencing and analysis of their DNA. Similarly, metatranscriptomics and metaproteomics target the RNA and proteins obtained from such samples. Technological advances in next-generation sequencing methods are fueling a rapid increase in the number and scope of environmental sequencing projects. As a consequence, the volume of sequence data to be analyzed is increasing dramatically.

The $100 Human Genome The Importance of Metagenomics is Driven by Sequencing Costs

Basic Computational Metagenomics
The first three basic computational tasks for such data are:
• taxonomic analysis (“who is out there?”)
• functional analysis (“what are they doing?”)
• comparative analysis (“how do they compare?”)
This is an immense conceptual and computational challenge that MEGAN is designed to address.

For Example:

MEGAN4 Uses
• Taxonomic
• Metagenomic
• Metatranscriptomic
• Metaproteomic
• 16S rRNA sequences
• Function/Gene Ontology (SEED)
• Metabolomics/Pathway Analyses (KEGG)
• Comparative Genomics

Getting started
Prepare a dataset for use with MEGAN:
1. First compare the reads against a database of reference sequences, e.g. with a BLASTX search against the NCBI-NR database.
2. The reads file and the resulting BLAST file can be imported directly into MEGAN, which then performs automatic taxonomic or functional classification, using the SEED or KEGG classification, or both.
3. Multiple datasets can be opened simultaneously for comparative views.
[Figure: workflow from a metagenomic sample (DNA, RNA, protein) through raw digital data (reads) and BLAST against reference databases (nr, nt, RefSeq, pdb, rdb) to MEGAN4 and comparative data.]
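As a rough illustration of what the BLAST step produces, the sketch below groups hits by read. It assumes the BLAST+ tabular format (`-outfmt 6`) with its twelve standard columns; the slides do not say which output format was used, and the formats a given MEGAN version accepts should be checked in its manual. The function name `read_blast_tabular` and the example records are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Assumed column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def read_blast_tabular(handle):
    """Group BLAST hits by read, keeping subject ID, e-value and bit score."""
    hits = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(COLUMNS, row))
        hits[rec["qseqid"]].append(
            (rec["sseqid"], float(rec["evalue"]), float(rec["bitscore"])))
    return dict(hits)

# Hypothetical three-hit example (two reads); not real search results.
example = (
    "read1\tref|A|\t98.0\t50\t1\t0\t1\t150\t10\t59\t1e-20\t95.5\n"
    "read1\tref|B|\t90.0\t50\t5\t0\t1\t150\t10\t59\t1e-10\t60.2\n"
    "read2\tref|C|\t85.0\t40\t6\t0\t1\t120\t5\t44\t1e-05\t45.1\n"
)
hits_by_read = read_blast_tabular(io.StringIO(example))
```

Grouping by read first keeps the later steps (taxonomic or functional assignment) simple, because each read can then be handled independently.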

Taxonomic analysis
MEGAN can be used to interactively explore the dataset. The figure shows the assignment of reads to the NCBI taxonomy. Each node is labeled by a taxon and the number of reads assigned to that taxon, and the size of a node is scaled logarithmically to represent the number of assigned reads. Tree display options allow you to drill down interactively to the individual BLAST hits and to export all reads. One can also select a set of taxa and then use MEGAN to generate different types of charts.
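The logarithmic node scaling can be sketched as follows. This is only one plausible way to do it: the exact formula MEGAN uses is not given in the slides, and `node_radius`, `min_radius` and `max_radius` are hypothetical names chosen for this example.

```python
import math

def node_radius(read_count, max_count, min_radius=2.0, max_radius=20.0):
    """Map a read count to a node radius on a logarithmic scale.

    Assumption: radius grows with log(count + 1), normalized so that the
    node with the most reads gets max_radius and empty nodes get min_radius.
    """
    if read_count <= 0 or max_count <= 0:
        return min_radius
    frac = math.log(read_count + 1) / math.log(max_count + 1)
    return min_radius + frac * (max_radius - min_radius)

# Hypothetical counts for three taxa.
counts = {"Bacteria": 1000, "Archaea": 100, "Viruses": 10}
largest = max(counts.values())
radii = {taxon: node_radius(n, largest) for taxon, n in counts.items()}
```

A logarithmic scale keeps rare taxa visible next to dominant ones; with linear scaling, a taxon with 10 reads would be almost invisible beside one with 1000.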

Multiple Chart Options are Available

Functional analysis using the SEED classification
SEED is a comparative genomics environment of curated genomic data. MEGAN attempts to map each read to a SEED functional role by the highest-scoring BLAST protein match with a known functional role. SEED rooted trees are “multi-labeled” because different leaves may represent the same functional role (if it occurs in different types of subsystems). The current complete SEED tree has about 13,000 nodes. The following figure shows a part of the SEED analysis of a marine metagenome sample.
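The mapping rule described here (take the highest-scoring protein match that has a known functional role) can be sketched as below. The data structures, the function name `assign_functional_role`, and the example roles are assumptions for illustration; MEGAN's internal implementation is not shown in the slides.

```python
from collections import Counter

def assign_functional_role(hits, role_of):
    """Return the role of the best-scoring hit that has a known role.

    hits:    list of (subject_id, bitscore) for one read
    role_of: dict mapping subject_id -> functional role (hypothetical table)
    Hits whose subject has no known role are ignored, even if they score
    higher than the annotated ones.
    """
    annotated = [(score, subject) for subject, score in hits
                 if subject in role_of]
    if not annotated:
        return None  # read stays unassigned
    _, best_subject = max(annotated)
    return role_of[best_subject]

# Hypothetical example data.
role_of = {"protB": "Citrate synthase", "protC": "Aconitase"}
reads = {
    "read1": [("protA", 120.0), ("protB", 95.0), ("protC", 60.0)],
    "read2": [("protA", 80.0)],
}
roles = {read: assign_functional_role(h, role_of) for read, h in reads.items()}
role_counts = Counter(r for r in roles.values() if r is not None)
```

In this example read1 is assigned to the role of protB, not protA, because protA scores higher but has no known functional role.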

Functional analysis using the KEGG classification
To perform a KEGG analysis, MEGAN attempts to match each read to a KEGG orthology (KO) accession number, using the best hit to a reference sequence. Reads are then assigned to enzymes and pathways. The KEGG classification is represented by a rooted tree whose leaves represent pathways. Each pathway can also be inspected visually, for example the citric acid cycle (shown). KEGG displays the different participating enzymes as numbered rectangles, and MEGAN shades each such rectangle to indicate the number of reads assigned to the corresponding enzyme.
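A minimal sketch of the read-to-enzyme counting and the shading idea follows. The identifiers (`KO_A`, `EC_A`, ...) are placeholders rather than real KEGG accessions, and the linear gray scale is an assumption; the slides do not specify how MEGAN scales its shading.

```python
from collections import Counter

def enzyme_read_counts(read_to_ko, ko_to_enzymes):
    """Count reads per enzyme, given each read's KO and a KO -> enzymes table."""
    counts = Counter()
    for ko in read_to_ko.values():
        for enzyme in ko_to_enzymes.get(ko, ()):
            counts[enzyme] += 1
    return counts

def shade(count, max_count):
    """Gray level from 255 (white, no reads) to 0 (darkest, most reads).

    Assumption: shading is linear in the read count.
    """
    if max_count <= 0:
        return 255
    return round(255 * (1 - count / max_count))

# Placeholder identifiers, not real KEGG accessions.
read_to_ko = {"read1": "KO_A", "read2": "KO_A", "read3": "KO_B"}
ko_to_enzymes = {"KO_A": ["EC_A"], "KO_B": ["EC_B"]}

counts = enzyme_read_counts(read_to_ko, ko_to_enzymes)
top = max(counts.values())
gray = {enzyme: shade(n, top) for enzyme, n in counts.items()}
```

Each rectangle in a pathway diagram would then be filled with the gray level computed for its enzyme, so heavily covered enzymes stand out.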

Comparative analysis using the SEED classification
MEGAN also supports the simultaneous analysis and comparison of the SEED functional content of multiple metagenomes. A comparative view of assignments to a KEGG pathway is also possible.

Computational comparison of metagenomes
MEGAN's analysis window compares multiple datasets. This enables creating distance matrices for a collection of datasets using different ecological indices. MEGAN supports a number of different methods for calculating a distance matrix. These can be visualized either using a split network calculated using the neighbor-net algorithm, or using a multi-dimensional scaling plot. The figure shows a comparison of eight marine datasets based on the taxonomic content of the datasets and computed using Goodall’s index.
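To make the idea of a distance matrix over datasets concrete, here is a small sketch. It uses the Bray-Curtis dissimilarity as a stand-in because it is simple to state; it is not Goodall's index, which the figure in the slides uses, and the taxon counts are invented.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two taxon -> read-count profiles."""
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    den = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return num / den if den else 0.0

def distance_matrix(datasets):
    """Return dataset names (sorted) and the symmetric distance matrix."""
    names = sorted(datasets)
    matrix = [[bray_curtis(datasets[x], datasets[y]) for y in names]
              for x in names]
    return names, matrix

# Invented counts for three hypothetical samples.
datasets = {
    "s1": {"Bacteria": 60, "Archaea": 40},
    "s2": {"Bacteria": 40, "Archaea": 60},
    "s3": {"Viruses": 100},
}
names, matrix = distance_matrix(datasets)
```

A matrix like this is the input to the visualizations mentioned above: neighbor-net turns it into a split network, and multi-dimensional scaling turns it into a 2D plot.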

Comparative phylogenetic visualization
Once the datasets are all individually opened, MEGAN provides a “compare” dialog. The comparison view is based on a tree in which each node shows the number of reads assigned to it for each of the datasets. This can be displayed as a pie chart, a bar chart, or a heat map. The following figure shows the taxonomic comparison of all eight marine datasets. Here, each node in the NCBI taxonomy is shown as a bar chart indicating the number of reads (normalized, if desired) from each dataset assigned to the node.
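Normalization matters when datasets differ in size. The sketch below rescales every dataset to the same total number of reads (by default the size of the smallest non-empty dataset). This is one common approach, assumed here for illustration; the slides do not describe the scheme MEGAN applies, and `normalize_counts` is a hypothetical helper.

```python
def normalize_counts(datasets, target=None):
    """Scale each dataset's per-taxon counts so all datasets sum to `target`.

    datasets: dict of dataset name -> {taxon: read count}
    target:   desired total per dataset; defaults to the smallest
              non-empty dataset's total. Empty datasets are dropped.
    """
    totals = {name: sum(counts.values()) for name, counts in datasets.items()}
    positive = [t for t in totals.values() if t > 0]
    if not positive:
        return {}
    if target is None:
        target = min(positive)
    return {
        name: {taxon: n * target / totals[name] for taxon, n in counts.items()}
        for name, counts in datasets.items()
        if totals[name] > 0
    }

# Invented counts: sample A has four times as many reads as sample B.
datasets = {
    "A": {"Bacteria": 300, "Archaea": 100},
    "B": {"Bacteria": 50, "Archaea": 50},
}
normalized = normalize_counts(datasets)
```

After normalization the bars at each node compare proportions rather than raw sequencing depth, so a deeply sequenced sample does not dominate the chart.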