The CAMERA Project Metagenomics 2006 Oct 3-5, 2006 Paul Gilna, Calit2, UCSD.


1 The CAMERA Project Metagenomics 2006 Oct 3-5, 2006 Paul Gilna, Calit2, UCSD

2 The CAMERA Partnership Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis

3 Genomic Data Is Growing Rapidly, But Metagenomics Will Vastly Increase The Scale…
GenBank (www.ncbi.nlm.nih.gov/Genbank): 100 Billion Bases!
Protein Data Bank (www.rcsb.org/pdb/holdings.html): 35,000 Structures
Total Data < 1TB

4 The Sargasso Sea Experiment: The Power of Environmental Metagenomics
–Yielded a Total of Over 1 Billion Base Pairs of Non-Redundant Sequence
–Displayed the Gene Content, Diversity, & Relative Abundance of the Organisms
–Sequences from at Least 1800 Genomic Species, Including 148 Previously Unknown
–Identified over 1.2 Million Unknown Genes
Image: MODIS-Aqua satellite image of ocean chlorophyll in the Sargasso Sea grid about the BATS site from 22 February 2003
J. Craig Venter, et al. Science 2 April 2004: Vol. 304, pp. 66-74

5 Full Genome Sequencing is Exploding: Most Sequenced Genomes are Bacterial
Total 422 Completed Genomes; Total 1665 Ongoing Genomes; 55 Metagenomes (www.genomesonline.org)
Chart annotations: First Genome 1995; 6 Genomes/Year 2000; Moore 155 In Here

6 Moore Microbial Genome Sequencing Project: Selected Microbes Throughout the World's Oceans
Microbes Nominated by Leading Ocean Microbial Biologists
www.moore.org/microgenome/worldmap.asp

7 Moore Microbial Genome Sequencing Project: Cyanobacteria Being Sequenced by Venter Institute

8 Marine Genome Sequencing Project Measuring the Genetic Diversity of Ocean Microbes

9 Genomic Data Is Growing Rapidly, But Metagenomics Will Vastly Increase The Scale…
GenBank (www.ncbi.nlm.nih.gov/Genbank): 100 Billion Bases!
Protein Data Bank (www.rcsb.org/pdb/holdings.html): 35,000 Structures
Total Data < 1TB

10 Metagenomics Will Couple to Earth Observations Which Add Several TBs/Day Source: Glenn Iona, EOSDIS Element Evolution Technical Working Group January 6-7, 2005

11 Driven by User Needs
CAMERA serves as one representation of a specific research community's need for a system to:
–Collect and reference increasing metadata relevant to environmental metagenome datasets
–Exploit the power of querying on metadata across multiple geospatial locations
–Have access to a diverse and customizable set of easy-to-use tools to analyze their data
–Have the ability to add, update and propagate improvements to annotations
–Have a pre-publication, pre-submission collaborative workspace
–Serve diverse levels of informatics literacy

12 Services Provided
–Data and Application Services
–Tools and Workflows
–Computational Data, Visualization and Collaborative environment
–Outreach and Training in Environmental Genomics

13 Data and Application Services
Primary Data
–Sargasso Sea and Sorcerer II expedition data
–JGI marine & terrestrial environmental datasets
–Moore Microbial Genomes
–JGI and other relevant whole genomes
–Research community submitted datasets
–Submitted 454-based metagenomic datasets
–Publicly available NR protein and DNA sequence datasets
Derived Data
–Annotations of datasets
–Assemblies
–Alignments
–Pre-computed clusters

14 Sample Metadata from GOS
Site Metadata
–Location (lat/long, water depth)
–Site characterization (finite list of types plus other)
–Site description (free text)
–Country
Sampling Metadata
–Sample collection date/time
–Sampling depth
–Conditions at time of sampling (e.g., stormy, surface temperature)
–Sample physical/chemical measurements (T (°C), S (ppt), chl a (mg m^-3), etc.)
–Author
Experimental Parameters
–Filter size
–Insert size
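The metadata groups on slide 14, together with the slide 11 goal of querying on metadata across geospatial locations, can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the `SampleRecord` class, its field names, the units chosen for filter and insert size, the `filter_by_region` helper, and the two example records (made-up values, not GOS data) are assumptions for illustration and do not reflect CAMERA's actual schema or API. The bounding-box test is deliberately simple and does not handle regions that cross the antimeridian.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class SampleRecord:
    """Hypothetical record mirroring the slide's three metadata groups."""
    # Site metadata
    latitude: float              # decimal degrees, north positive
    longitude: float             # decimal degrees, east positive
    water_depth_m: float
    site_type: str               # one of a finite list of types, or "other"
    site_description: str        # free text
    country: str
    # Sampling metadata
    collected_at: datetime
    sampling_depth_m: float
    conditions: Optional[str] = None            # e.g. "stormy"
    temperature_c: Optional[float] = None       # T (°C)
    salinity_ppt: Optional[float] = None        # S (ppt)
    chlorophyll_a_mg_m3: Optional[float] = None # chl a (mg m^-3)
    author: Optional[str] = None
    # Experimental parameters (units are assumptions)
    filter_size_um: Optional[float] = None
    insert_size_kb: Optional[float] = None


def filter_by_region(
    records: Iterable[SampleRecord],
    south: float,
    north: float,
    west: float,
    east: float,
    max_sampling_depth_m: Optional[float] = None,
) -> List[SampleRecord]:
    """Return records inside a lat/long box, optionally limited by sampling depth."""
    hits = []
    for rec in records:
        inside = south <= rec.latitude <= north and west <= rec.longitude <= east
        shallow_enough = (
            max_sampling_depth_m is None
            or rec.sampling_depth_m <= max_sampling_depth_m
        )
        if inside and shallow_enough:
            hits.append(rec)
    return hits


# Made-up example records for illustration only.
EXAMPLE_RECORDS = [
    SampleRecord(31.7, -64.2, 4200.0, "open ocean", "example site A", "Bermuda",
                 datetime(2003, 2, 26, 10, 0), 5.0, temperature_c=20.0),
    SampleRecord(10.0, -80.0, 60.0, "coastal", "example site B", "Panama",
                 datetime(2004, 1, 19, 9, 30), 2.0, temperature_c=27.5),
]

if __name__ == "__main__":
    for rec in filter_by_region(EXAMPLE_RECORDS, 25.0, 35.0, -70.0, -60.0):
        print(rec.site_description, rec.latitude, rec.longitude)
```

Keeping the measurement fields optional reflects that not every sample will carry every physical or chemical measurement; a production system would more likely use a spatially indexed database than a linear scan.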

15 Tools and Workflows
Initial set
–BLAST Server
–Clustering
–HMM/Profile
–Neighborhood analysis
–Multiple sequence alignments
–Assembly
Proposed New Tools
–Multiple Auto Annotation pipelines
–Fast Sequence lookup
–Customized Assembly
–Phylogenetic Analysis
–Clustering Tools

16 Guiding Philosophy for Development
Sprint: Q4 2006
–Propagate JCVI toolkit and data ASAP
–Mechanism for publication of Sorcerer II data
–Enabler for community
–Defined deliverables, project management approach
Marathon: Q4 2006 onward
–Additional datasets
–Additional tools
–Community drives prioritization for ongoing releases
–Advisory Board, Community Outreach
Keys to success: Tight integration of science, bioinformatics, software, and IT, matched to community needs

17 The Future Home of the Moore Foundation Funded Marine Microbial Ecology Metagenomics Complex
First Implementation of the CAMERA Complex
Major Buildout of Calit2 Server Room Underway
Photo Courtesy Joe Keefe, Calit2
http://calit2-1101-1.ucsd.edu/

18 Moore CAMERA Production Environment
Creation of Initial Production Environment – September 2006
–Hardware
–Compute Nodes: ~200 4-CPU Nodes = ~800 Processing Cores
–Storage Servers: 10 systems = ¼ Petabyte raw storage
–Database Servers: Larger 20-40TB; Smaller 5-10TB
–Network Management: Force10 E1200 Router w/12 10GigE Interfaces to Each System Ports
User Access to Compute Cycles
–Bulk of free cycles available to external users
–Proposal mechanism in process
Source: Greg Hidley, Calit2; Phil Papadopoulos, SDSC, Calit2
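The hardware figures on slide 18 can be sanity-checked with a few lines of Python. This is only a back-of-the-envelope sketch under stated assumptions: one processing core per CPU, ¼ petabyte taken as 250 TB (decimal units), and raw storage split evenly across the 10 storage servers. None of these assumptions is stated on the slide, and the slide's own figures are approximate.

```python
# Approximate figures from the slide.
NODES = 200            # "~200" compute nodes
CPUS_PER_NODE = 4      # "4 CPU Nodes"
STORAGE_SERVERS = 10   # "10 systems"

# Assumptions (not from the slide): 1 core per CPU, 1/4 PB = 250 TB decimal,
# and storage divided evenly between servers.
CORES_PER_CPU = 1
RAW_STORAGE_TB = 250

total_cores = NODES * CPUS_PER_NODE * CORES_PER_CPU
tb_per_server = RAW_STORAGE_TB / STORAGE_SERVERS

if __name__ == "__main__":
    print(f"Total processing cores: ~{total_cores}")
    print(f"Raw storage per server: ~{tb_per_server:.0f} TB")
```

Under these assumptions the core count matches the slide's ~800, and each storage server would hold roughly 25 TB of raw capacity.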

19 Countries are Aggressively Creating Gigabit Services: Interactive Access to CAMERA and LOOKING Systems
www.glif.is: Created in Reykjavik, Iceland, 2003
Visualization courtesy of Bob Patterson, NCSA

20 CAMERA Outreach Modes
Scientific Advisory Board
–Early Adopters – OptIPortal End Points
Targeted Workshops
–User Forums
–User Software Testing
–Viz Tool Brainstorming
Presentations at Scientific Meetings
–Talks, posters, eventually demonstration booths
Partnerships With Metagenomics Projects
–E.g., DOE's Joint Genome Institute (JGI)
Training and User Services Team

21 A Near Future Metagenomics Fiber Optic-Enabled Data Generator
Source: John Delaney, UWash

