

Approaches for our growing metagenomes
Kostas Konstantinidis
Carlton S. Wilder Associate Professor
School of Civil and Environmental Engineering & School of Biology (Adjunct), Center for Bioinformatics and Computational Genomics
Georgia Institute of Technology
ISME 15, Aug 25th, 2014

Adina Howe’s ideas for discussion
- How do you deal with poorly replicated data? The low n high p problem?
- What are the best approaches to re-analyze previous datasets with improved tools?
- What is the progress on integrating different sequencing platforms?
- How big a computer do I really need to do everything I want? Is it reasonable to expect access to this for myself?
- Is metagenomics really useful and worth the investment?
- What are the most useful tools you use regularly?
- How do you reduce dataset sizes?
- How do you share data?
- What kind of statistical tests are appropriate for low replicate data?
- What are the assumptions you make for metagenomics data/analyses?
- Which assumptions should you not make ever? Or which will come back and haunt us?
- What are the best metagenomic datasets?
- What is the dream experiment/dataset?
- What is the single largest obstacle in tackling a metagenome?
- How much data do I need? Is it possible for there to be too much data?
- Do you sequence deeper or for more replicates?
- How do you evaluate statistical power of your approaches?
- How do you visualize enormous datasets?
Too many! I will focus on a few…

Is shotgun metagenomics really useful?
- Not a panacea (like any other technology!), but a powerful, hypothesis-generating tool.
- If the experiment is designed well, metagenomics can also provide a mechanistic understanding of how microbes and their communities evolve and respond to perturbations, which genes they exchange horizontally, which mutations are selected, etc.
A few recent examples from our group:
- Luo et al., AEM 2014
- Oh et al., Env. Microb 2013
Examples from our group in this meeting:
- Minjae Kim’s talk on Thursday
- Kostas’ talk on Friday

How much replication?
- Not much, because replicates typically give the same picture (gene amplicons may be a different story).
- Differentially abundant taxa, genes, and pathways are easily detectable when the differences are not marginal.
- For time series: usually 3 replicates for one sampling point; for the remaining sampling points, no replication.
- More replicates (n >= 6) when we want to detect marginal differences between treatments. DESeq is a powerful package.
- Always include a mock sample (i.e., one for which you know who is there and how abundant) to test for artifacts/errors, especially for gene amplicon work.
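The mock-sample check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the taxon labels, and the idea of reporting an observed/expected ratio per taxon are hypothetical choices, not a method from the slides.

```python
def relative_abundance(counts):
    """Convert raw read counts per taxon into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads were assigned to any taxon")
    return {taxon: n / total for taxon, n in counts.items()}


def mock_deviation(expected_fractions, observed_counts):
    """Return observed/expected abundance ratio for every taxon.

    expected_fractions: known composition of the mock sample (hypothetical input).
    observed_counts: read counts per taxon recovered by the pipeline.
    Taxa that were observed but are not part of the mock (possible
    contaminants or misassignments) get a ratio of infinity.
    """
    observed = relative_abundance(observed_counts)
    ratios = {}
    for taxon in sorted(set(expected_fractions) | set(observed)):
        expected = expected_fractions.get(taxon, 0.0)
        if expected > 0:
            ratios[taxon] = observed.get(taxon, 0.0) / expected
        else:
            ratios[taxon] = float("inf")
    return ratios


if __name__ == "__main__":
    # Hypothetical two-member mock community mixed at equal abundance.
    expected = {"taxon_A": 0.5, "taxon_B": 0.5}
    observed = {"taxon_A": 50, "taxon_B": 40, "taxon_C": 10}
    for taxon, ratio in mock_deviation(expected, observed).items():
        print(taxon, ratio)
```

Ratios far from 1, or taxa with an infinite ratio, point to artifacts worth investigating before the real samples are interpreted.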

What coverage to obtain and why it matters
From Rodriguez-R and Konstantinidis, ISME 2014
Figure: Effect of average coverage on detection of differentially abundant features.
- A winter and a summer shotgun metagenome dataset from the Lake Lanier time series (Atlanta, GA) were subsampled and compared.
- Datasets with average coverage > ~50% perform well (e.g., assembly; detecting differences).
- Avoid comparisons between datasets that differ more than 2-fold in coverage.
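The two rules of thumb on this slide can be written as a small check. This is a minimal sketch, assuming coverage is already available as a fraction between 0 and 1 for each dataset; the function name and the default thresholds (0.5 and 2.0, taken from the slide text) are illustrative.

```python
def comparable(cov_a, cov_b, min_coverage=0.5, max_fold=2.0):
    """Check two datasets against the slide's rules of thumb.

    cov_a, cov_b: estimated average coverage of each dataset, as a
    fraction in (0, 1]. Returns (ok, fold), where fold is the ratio of
    the higher to the lower coverage and ok is True only if both
    datasets exceed min_coverage and fold does not exceed max_fold.
    """
    for cov in (cov_a, cov_b):
        if not 0.0 < cov <= 1.0:
            raise ValueError("coverage must be a fraction in (0, 1]")
    low, high = sorted((cov_a, cov_b))
    fold = high / low
    ok = low > min_coverage and fold <= max_fold
    return ok, fold


if __name__ == "__main__":
    print(comparable(0.6, 0.8))  # both well covered, similar coverage
    print(comparable(0.3, 0.9))  # one dataset poorly covered, 3-fold apart
```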

Need for new tools
Nonpareil: Estimating coverage level of metagenomes
Rodriguez-R and Konstantinidis, ISME 2014
- Our approach examines the redundancy of reads. It does not depend on assembly, reference gene databases (e.g., 16S rRNA gene), or clustering of OTUs.
- Note that more diverse communities require larger sequencing efforts to achieve the same level of coverage, hence their curves are located rightward in the plot.
Available through
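To make the idea of read redundancy concrete, here is a toy Python sketch. It is not Nonpareil's algorithm: it assumes that two reads "match" when they share at least one exact k-mer (on either strand), and it simply reports the fraction of reads that match some other read. The function names and the default k are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(read, k):
    """Return the set of strand-independent k-mers in a read."""
    kmers = set()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
        kmers.add(min(kmer, reverse_complement))
    return kmers


def redundancy_fraction(reads, k=8):
    """Fraction of reads sharing at least one k-mer with another read.

    A crude stand-in for read redundancy: the more of the community has
    been sampled, the more often a read overlaps one already seen.
    """
    if not reads:
        return 0.0
    per_read = [canonical_kmers(read.upper(), k) for read in reads]
    # Count, for every k-mer, how many reads contain it (once per read).
    reads_per_kmer = Counter()
    for kmers in per_read:
        reads_per_kmer.update(kmers)
    redundant = sum(
        1 for kmers in per_read
        if any(reads_per_kmer[kmer] > 1 for kmer in kmers)
    )
    return redundant / len(reads)


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "ACGTACGTAC", "TTTTGGGGCC"]
    print(redundancy_fraction(reads, k=8))
```

A low fraction suggests that most of the community is still unsampled, in line with the slide's point that more diverse communities need a larger sequencing effort.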

How to select the right tool?
- Test the tool first on a mock dataset! Sometimes the code does not work as it is supposed to, or as you anticipated…
- Learn some Perl/Python!
From Luo, Rodriguez-R and Konstantinidis, Methods in Enzymology 2013

Some (potentially) useful approaches
- An approach to assess assembly parameters and results based on in-silico generated “spiked-in” metagenomes.
- For some additional approaches, see: Luo, Rodriguez-R and Konstantinidis, Methods in Enzymology 2013
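The spiked-in idea can be illustrated with a short Python sketch. It is a simplified stand-in rather than the published procedure: reads are error-free substrings drawn from a known reference, the assembler itself is treated as an external step, and recovery is measured as the fraction of reference k-mers found in the contigs (forward strand only). All function names and parameters are hypothetical.

```python
import random


def simulate_reads(genome, read_len, n_reads, seed=0):
    """Draw error-free reads of fixed length from a known genome."""
    last_start = len(genome) - read_len
    if last_start < 0:
        raise ValueError("genome is shorter than the read length")
    rng = random.Random(seed)
    starts = (rng.randint(0, last_start) for _ in range(n_reads))
    return [genome[s:s + read_len] for s in starts]


def spike_in(metagenome_reads, spike_reads, seed=0):
    """Mix simulated reads into a metagenome read set and shuffle."""
    mixed = list(metagenome_reads) + list(spike_reads)
    random.Random(seed).shuffle(mixed)
    return mixed


def recovered_fraction(genome, contigs, k=21):
    """Fraction of the reference's distinct k-mers present in contigs."""
    reference = {genome[i:i + k] for i in range(len(genome) - k + 1)}
    if not reference:
        return 0.0
    found = set()
    for contig in contigs:
        for i in range(len(contig) - k + 1):
            kmer = contig[i:i + k]
            if kmer in reference:
                found.add(kmer)
    return len(found) / len(reference)


if __name__ == "__main__":
    rng = random.Random(1)
    genome = "".join(rng.choice("ACGT") for _ in range(200))
    reads = spike_in(["ACGT" * 10], simulate_reads(genome, 50, 20))
    # Assemble `reads` with the assembler under test, then score its contigs.
    # Here the first half of the genome stands in for an assembled contig.
    print(recovered_fraction(genome, [genome[:100]]))
```

Because the spiked genome is known, the recovered fraction can be compared across assembler settings to see which parameters reconstruct it best.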

Challenges remaining
- Gene functional annotation. Propagation of wrong/poor annotations; many genes are still hypothetical. Need to keep supporting experimental work to decipher gene functions, as well as curated databases.
- Tools do not scale with the volume of data that become available. Need to work more closely with computer engineers and scientists.
- Binning of assembled contigs into populations, especially in complex communities (e.g., to model what each member of the community does). New approaches needed; longer sequencing reads; single cells.

Additional lab presentations at ISME
- Minjae Kim. Seasonal changes and nitrogen cycle genes in midwestern agricultural soils as revealed by metagenomics. Poster 199B, Tuesday.
- Expanding the bioinformatics toolbox for the analysis of genomes and metagenomes. Poster 204B, Tuesday.
- Microbial community degradation of widely used quaternary ammonium disinfectants and implications for controlling disinfectant-induced antibiotic resistance. Contributed talk 1400, Thursday.
- Metagenomics reveal that bacterial species exist. Invited talk, Friday.

Acknowledgements
Konstantinidis Lab: Janet Hatt, Ph.D.; Michael Weigand, Ph.D.; Samantha Waters, Ph.D.; Despina Tsementzi; Natasha DeLeon; Luis Orellana; Luis-Miguel Rodriguez-R.; Eric Johnston; Juliana Soto; Angela Pena; Minjae Kim; Yuanqi Wang
Interested?
Funding