C A M E R A A Metagenomics Resource for Marine Microbial Ecology July 27, 2007 Paul Gilna UCSD/Calit2 Saul A. Kravitz J. Craig Venter Institute.


1 C A M E R A A Metagenomics Resource for Marine Microbial Ecology July 27, 2007 Paul Gilna UCSD/Calit2 Saul A. Kravitz J. Craig Venter Institute

2 Acknowledgements
UCSD/Calit2
- Larry Smarr, PI; Paul Gilna, Executive Director
- Phil Papadopoulos, Technical Lead
- Weizhong Li
JCVI
- Marv Frazier, co-PI
- Leonid Kagan, Architect; Jennifer Wortman, Bioinformatics
- Rekha Seshadri, Outreach and Training
- Doug Rusch, Shibu Yooseph, Aaron Halpern, Granger Sutton
UC Davis
- Jonathan Eisen, co-investigator
Gordon and Betty Moore Foundation
- David Kingsbury and Mary Maxon

3 Outline
- New Discipline of Metagenomics
- Global Ocean Sampling Expedition
- Challenges of Metagenomic Data
- CAMERA Features
- CAMERA Usage to Date
- Cyberinfrastructure

4 Genomics vs Metagenomics
Genomics – ‘Old School’
- Study of an organism's genome
- Genome sequence determined using shotgun sequencing and assembly
- ~1300 microbes sequenced, first in 1995
- DNA usually obtained from pure cultures
Metagenomics
- Application of genome sequencing methods to environmental samples (no culturing)
- Environmental shotgun sequencing is the most widely used approach

5 Metagenomic Questions
Within an environment
- What biological functions are present (absent)?
- What organisms are present (absent)?
Compare data from (dis)similar environments
- What are the fundamental rules of microbial ecology?
Search for novel proteins and protein families

6 Metagenomics Applications
Marine Ecology and Microbiology
Alternative Energy and Industrial
- Hypersaline ponds, Oceans
- Termite Metabolism
Medical Applications
- Microbial Ecology of Human body cavities and fluids
Agricultural
- Disease Vector Metabolism (Glassy-winged Sharpshooter)
- Soil Ecology
Environmental Remediation
- DOE: Acid Mine Drainage, Chemical and Radioactive Waste

7 Metadata
Metagenomics - Genomics + Metadata
Environmental Metadata
- Time and location (lat, long, depth) of sample collection
- Correlate with remote sensing data
- Physico-chemical properties (e.g. temperature, salinity)
Figure: MODIS-Aqua satellite image of ocean chlorophyll in the Sargasso Sea grid about the BATS site from 22 February 2003
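
The metadata fields listed on this slide could be held in a small record type. The following is a minimal sketch, not CAMERA's actual schema: the class name, field names, units, and the bounding-box helper are all assumptions made for illustration, and the example values are placeholders rather than real GOS data.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class SampleMetadata:
    """Hypothetical per-sample record; field names are illustrative only."""
    sample_id: str
    collected_at: datetime                  # time of sample collection (UTC)
    latitude_deg: float                     # degrees north, -90..90
    longitude_deg: float                    # degrees east, -180..180
    depth_m: float                          # depth below the surface, meters
    temperature_c: Optional[float] = None   # physico-chemical property
    salinity_psu: Optional[float] = None    # physico-chemical property

    def __post_init__(self) -> None:
        # Reject values that could not be matched against a lat/long grid.
        if not -90.0 <= self.latitude_deg <= 90.0:
            raise ValueError("latitude out of range")
        if not -180.0 <= self.longitude_deg <= 180.0:
            raise ValueError("longitude out of range")
        if self.depth_m < 0:
            raise ValueError("depth must be non-negative")


def in_bounding_box(sample: SampleMetadata, lat_min: float, lat_max: float,
                    lon_min: float, lon_max: float) -> bool:
    """True if the sample lies inside a lat/long box (no antimeridian wrap)."""
    return (lat_min <= sample.latitude_deg <= lat_max
            and lon_min <= sample.longitude_deg <= lon_max)


# Placeholder values invented for illustration; not real GOS data.
example = SampleMetadata(
    sample_id="SAMPLE-0001",
    collected_at=datetime(2007, 1, 1, 12, 0, tzinfo=timezone.utc),
    latitude_deg=30.0,
    longitude_deg=-65.0,
    depth_m=5.0,
    temperature_c=20.0,
)
print(in_bounding_box(example, 25.0, 35.0, -70.0, -60.0))
```

A box query like this is one simple way samples might be matched to a gridded remote sensing product by time and location.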

8 JCVI Global Ocean Sampling Expedition Largest Metagenomic Study to Date

9 Global Ocean Sampling (GOS)
178 Total Sampling Locations
Phase 1: 41 samples, 7.7M reads, >6M proteins
Diverse Environments
- Open ocean, estuary, embayment, upwelling, fringing reef, atoll, warm seep, mangrove, fresh water, biofilms, sediments, soils

10 GOS Protein Analysis, Yooseph et al (PLoS 2007)
Novel clustering process
- Sequence similarity based
- Predict proteins and group into related clusters
- Include GOS and all known proteins
Findings
- GOS proteins cover ~all existing prokaryotic families
- GOS expands diversity of known protein families
- 1700 large novel clusters with no homology to known protein families
- Higher than expected proportion of novel clusters are viral
- No saturation in the rate of novel protein family discovery
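
To make "sequence similarity based" clustering concrete, here is a toy sketch. It is not the procedure of Yooseph et al: it substitutes a simple greedy, representative-based scheme with k-mer Jaccard similarity for a real alignment-based pipeline. The function names, the 0.5 threshold, and the example sequences are all invented for illustration.

```python
from typing import List, Set, Tuple


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Set of overlapping k-letter words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared k-mers divided by total distinct k-mers."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def greedy_cluster(seqs: List[str], threshold: float = 0.5,
                   k: int = 3) -> List[List[str]]:
    """Assign each sequence to the first cluster whose representative is
    similar enough; otherwise the sequence founds a new cluster."""
    clusters: List[Tuple[Set[str], List[str]]] = []
    for seq in seqs:
        words = kmers(seq, k)
        for rep_words, members in clusters:
            if jaccard(words, rep_words) >= threshold:
                members.append(seq)
                break
        else:
            # No representative matched: this sequence starts a new cluster.
            clusters.append((words, [seq]))
    return [members for _, members in clusters]


# Invented toy sequences, not real proteins.
toy = ["MKTAYIAKQR", "MKTAYIAKQK", "GGGSSGGGSS"]
print(greedy_cluster(toy))
```

The first sequence of each cluster acts as its representative, so later sequences are compared against representatives only and never against every earlier sequence.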

11 Added Diversity
Figure panels: UVDE homologs; Rubisco homologs
Legend: GOS prokaryotes, GOS eukaryotes, GOS viral, Known prokaryotes, Known eukaryotes, Known viral
Labeled organisms: H. marismortui, B. halodurans, T. thermophilus, B. anthracis, D. psychrophila, D. radiodurans

12 Rate of Protein Discovery

13 Fragment Recruitment Viewer (Rusch et al, PLoS 3/2007)
Figure axes: Reference Genome Coordinates; Percent Identity (panels scaled 55% to 100% and 50% to 100%)
Figure annotations:
- Ribosomal operon
- “core” genome, ~75% identical
- Sequence absent from most strains – phage/other lateral transfer?
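
The idea behind a recruitment plot can be sketched in a few lines: each read aligned to a reference genome becomes a point at its reference coordinate and percent identity, and stretches of the reference that recruit no reads stand out as gaps. This is a minimal illustration under assumed inputs, not the viewer's implementation. The `Hit` record, the half-open coordinates, the 50% cutoff, and the toy hits are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Hit:
    """Hypothetical alignment of one read to the reference, [start, end)."""
    read_id: str
    ref_start: int
    ref_end: int
    percent_identity: float


def recruitment_points(hits: List[Hit],
                       min_identity: float = 50.0) -> List[Tuple[float, float]]:
    """(reference midpoint, percent identity) for each recruited read."""
    return [((h.ref_start + h.ref_end) / 2, h.percent_identity)
            for h in hits if h.percent_identity >= min_identity]


def empty_windows(hits: List[Hit], genome_length: int, window: int,
                  min_identity: float = 50.0) -> List[int]:
    """Start coordinates of fixed-size windows that recruit no reads."""
    if window <= 0 or genome_length <= 0:
        raise ValueError("window and genome_length must be positive")
    n_windows = (genome_length + window - 1) // window
    counts = [0] * n_windows
    for h in hits:
        if h.percent_identity < min_identity:
            continue
        first = max(h.ref_start, 0) // window
        last = min(h.ref_end - 1, genome_length - 1) // window
        for w in range(first, last + 1):
            counts[w] += 1
    return [w * window for w, c in enumerate(counts) if c == 0]


# Invented toy hits against a 400-base reference.
hits = [
    Hit("r1", 0, 100, 98.0),
    Hit("r2", 250, 350, 72.0),
    Hit("r3", 100, 200, 40.0),   # below the cutoff, so not recruited
]
print(recruitment_points(hits))
print(empty_windows(hits, genome_length=400, window=100))
```

Windows with no recruited reads are the kind of region the slide flags as possibly absent from most strains.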

14 Why CAMERA?
Public repositories not focused on environmental metagenomics
- Sargasso Sea data underutilized by community
Millions of dollars invested in sequencing and analysis, but the data are accessible only to the bioinformatics elite
Release of GOS dataset in March 2007
Comply with Convention on Biological Diversity

15 CAMERA – http://camera.calit2.net
“Convenient acronym for cumbersome name…” - Henry Nichols, PLoS Biology
Mission
- Enable Research in Marine Microbiology
CAMERA Partners:

16 Challenges
Enormous datasets with high gene density
- large compute resources required
- 2 orders of magnitude jump
Fragmentary data
- inadequate bioinformatics tools for assembly, annotation, analysis, visualization
Metadata standards non-existent
- metadata absent from databases
- Lack of standards impedes collection of datasets
Diversity of User Sophistication and Needs

17 CAMERA Services
Maintain searchable sequence collections
- ALL metagenomic sequence reads, assemblies
- Non-identical amino acid collection (extended NRAA)
- Viral, Fungal, pico-Eukaryotes, Microbial
- CAMERA protein clusters
Metagenomics data easily downloadable
Interactive and Batch Search Facility
- Scalable parallel implementations of BLAST
- Integrated with associated metadata
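
One common way to scale a batch search is to split the query set into batches and search the batches concurrently. The sketch below illustrates only that partitioning idea and makes no claim about how CAMERA parallelizes BLAST. `toy_search` is a stand-in that does exact substring matching, not BLAST, and all names, batch sizes, and sequences are invented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def chunked(items: List[str], size: int) -> List[List[str]]:
    """Split a list into consecutive batches of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def toy_search(queries: List[str],
               database: Dict[str, str]) -> Dict[str, List[str]]:
    """Stand-in for a real search tool: exact substring match only."""
    return {q: [name for name, seq in database.items() if q in seq]
            for q in queries}


def parallel_search(queries: List[str], database: Dict[str, str],
                    batch_size: int = 2,
                    workers: int = 4) -> Dict[str, List[str]]:
    """Search each batch of queries in its own task and merge the results."""
    merged: Dict[str, List[str]] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batches = chunked(queries, batch_size)
        for partial in pool.map(lambda b: toy_search(b, database), batches):
            merged.update(partial)
    return merged


# Invented toy database and queries.
database = {"a": "ACGTACGT", "b": "TTTTACGA"}
print(parallel_search(["ACGT", "TTTT", "GGGG"], database))
```

Because each batch is independent, the same split works across threads, processes, or cluster nodes. Only the merge step needs the results together.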

18 Distinctive Features Set in Progress
Graphical Tools for Visualizing Diversity
- Based on Rusch et al
- Fragment recruitment viewer
CAMERA Protein Clusters
- Based on Yooseph et al
- Incremental version implemented in 2007
Annotation
- Break through quadratic complexity via clusters
- Phyletic Classification
Overviews of sequence collections
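
The point about quadratic complexity can be shown with simple counting. Comparing every sequence with every other grows with the square of the collection size, while comparing new sequences against one representative per existing cluster grows with the product of the two counts. The sketch below assumes one representative per cluster, which the slide does not state, and the numbers are illustrative only.

```python
def all_vs_all_comparisons(n_sequences: int) -> int:
    """Number of unordered pairs among n sequences: n * (n - 1) / 2."""
    return n_sequences * (n_sequences - 1) // 2


def representative_comparisons(n_new: int, n_clusters: int) -> int:
    """Comparisons when each new sequence meets one representative
    per existing cluster (an assumed scheme, for illustration)."""
    return n_new * n_clusters


# Illustrative sizes only; not figures from the talk.
print(all_vs_all_comparisons(1000))
print(representative_comparisons(1000, 50))
```

With these made-up sizes the pairwise count is about ten times the representative count, and the gap widens as the collection grows faster than the number of clusters.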

19 Fragment Recruitment Viewer
- Metagenomic Sequence vs Reference Sequence
- Highlight and Select with Associated Metadata
- View large datasets
- AJAX interface
- Based on Doug Rusch’s Viewer

