The CAMERA Project Metagenomics 2006 Oct 3-5, 2006 Paul Gilna, Calit2, UCSD.

Presentation transcript:

The CAMERA Partnership
Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis

Genomic Data Is Growing Rapidly, But Metagenomics Will Vastly Increase The Scale…
[Chart labels: GenBank (billion bases); Protein Data Bank (35,000 structures); total data < 1 TB]

The Sargasso Sea Experiment
The Power of Environmental Metagenomics
– Yielded a Total of Over 1 Billion Base Pairs of Non-Redundant Sequence
– Displayed the Gene Content, Diversity, & Relative Abundance of the Organisms
– Sequences from at Least 1800 Genomic Species, Including 148 Previously Unknown
– Identified over 1.2 Million Unknown Genes
Image: MODIS-Aqua satellite image of ocean chlorophyll in the Sargasso Sea grid about the BATS site from 22 February 2003
Source: J. Craig Venter, et al., Science, 2 April 2004

Full Genome Sequencing is Exploding: Most Sequenced Genomes are Bacterial
– Total 422 Completed Genomes
– Total 1665 Ongoing Genomes
– 55 Metagenomes
[Chart labels: First Genome; Genomes/Year; 2000; Moore 155 In Here]

Moore Microbial Genome Sequencing Project
– Selected Microbes Throughout the World's Oceans
– Microbes Nominated by Leading Ocean Microbial Biologists

Moore Microbial Genome Sequencing Project: Cyanobacteria Being Sequenced by Venter Institute

Marine Genome Sequencing Project Measuring the Genetic Diversity of Ocean Microbes

Metagenomics Will Couple to Earth Observations Which Add Several TBs/Day
Source: Glenn Iona, EOSDIS Element Evolution Technical Working Group, January 6-7, 2005

Driven by User Needs
CAMERA serves as one representation of a specific research community's need for a system to
– Collect and reference increasing metadata relevant to environmental metagenome datasets
– Exploit the power of querying on metadata across multiple geospatial locations
– Have access to a diverse and customizable set of easy-to-use tools to analyze their data
– Have the ability to add, update and propagate improvements to annotations
– Have a pre-publication, pre-submission collaborative workspace
– Serve diverse levels of informatics literacy

Services Provided
– Data and Application Services
– Tools and Workflows
– Computational Data, Visualization and Collaborative Environment
– Outreach and Training in Environmental Genomics

Data and Application Services
Primary Data
– Sargasso Sea and Sorcerer II expedition data
– JGI marine & terrestrial environmental datasets
– Moore Microbial Genomes
– JGI and other relevant whole genomes
– Research community submitted datasets
– Submitted 454-based metagenomic datasets
– Publicly available NR protein and DNA sequence datasets
Derived Data
– Annotations of datasets
– Assemblies
– Alignments
– Pre-computed clusters

Sample Metadata from GOS
Site Metadata
– Location (lat/long, water depth)
– Site characterization (finite list of types plus other)
– Site description (free text)
– Country
Sampling Metadata
– Sample collection date/time
– Sampling depth
– Conditions at time of sampling (e.g., stormy, surface temperature)
– Sample physical/chemical measurements (T (°C), S (ppt), chl a (mg m^-3), etc.)
– Author
Experimental Parameters
– Filter size
– Insert size
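The metadata groups above, together with the stated goal of querying on metadata across geospatial locations, can be sketched as a small record type plus a bounding-box filter. This is only an illustration under stated assumptions: the class name `SampleRecord`, the field names, the units chosen for filter and insert size, and the three sample records are all hypothetical and are not taken from CAMERA or GOS.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class SampleRecord:
    """Hypothetical record mirroring the three metadata groups on the slide."""
    # Site metadata
    latitude: float                 # decimal degrees, north positive
    longitude: float                # decimal degrees, east positive
    water_depth_m: float
    site_type: str                  # one of a finite list of types, or "other"
    site_description: str           # free text
    country: str
    # Sampling metadata
    collected_at: datetime
    sampling_depth_m: float
    conditions: str = ""            # e.g. "stormy"
    temperature_c: Optional[float] = None
    salinity_ppt: Optional[float] = None
    chlorophyll_a_mg_m3: Optional[float] = None
    author: str = ""
    # Experimental parameters (units are assumptions)
    filter_size_um: Optional[float] = None
    insert_size_kb: Optional[float] = None


def query(records: List[SampleRecord], south: float, north: float,
          west: float, east: float,
          max_sampling_depth_m: Optional[float] = None) -> List[SampleRecord]:
    """Return records inside a lat/long box, optionally limited by sampling depth.

    The box test is deliberately simple and does not handle boxes that
    cross the antimeridian (longitude +/-180).
    """
    hits = [r for r in records
            if south <= r.latitude <= north and west <= r.longitude <= east]
    if max_sampling_depth_m is not None:
        hits = [r for r in hits if r.sampling_depth_m <= max_sampling_depth_m]
    return hits


# Made-up records for illustration only; these are not real GOS samples.
records = [
    SampleRecord(31.5, -64.0, 4200.0, "open ocean", "hypothetical station A",
                 "(hypothetical)", datetime(2003, 2, 26, 10, 0), 5.0,
                 temperature_c=20.0, filter_size_um=0.1),
    SampleRecord(10.0, -80.0, 30.0, "coastal", "hypothetical station B",
                 "(hypothetical)", datetime(2004, 1, 15, 9, 30), 2.0),
    SampleRecord(32.0, -64.5, 4500.0, "open ocean", "hypothetical station C",
                 "(hypothetical)", datetime(2003, 5, 13, 14, 0), 40.0),
]

# Near-surface samples inside an arbitrary box.
hits = query(records, south=30, north=33, west=-66, east=-62,
             max_sampling_depth_m=10)
```

Keeping the measurements optional reflects that not every sample will carry every physical or chemical reading. A production system would more likely hold these fields in a database with a spatial index than in an in-memory list.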

Tools and Workflows
Initial Set
– BLAST Server
– Clustering
– HMM/Profile
– Neighborhood analysis
– Multiple sequence alignments
– Assembly
Proposed New Tools
– Multiple Auto Annotation pipelines
– Fast Sequence lookup
– Customized Assembly
– Phylogenetic Analysis
– Clustering Tools

Guiding Philosophy for Development
Sprint Q
– Propagate JCVI toolkit and data ASAP
– Mechanism for publication of Sorcerer II data
– Enabler for community
– Defined deliverables, project management approach
Marathon Q onward
– Additional datasets
– Additional tools
– Community drives prioritization for ongoing releases
– Advisory Board, Community Outreach
Keys to success: tight integration of science, bioinformatics, software, and IT, matched to community needs

The Future Home of the Moore Foundation Funded Marine Microbial Ecology Metagenomics Complex
– First Implementation of the CAMERA Complex
– Major Buildout of Calit2 Server Room Underway
Photo courtesy of Joe Keefe, Calit2

Moore CAMERA Production Environment
Creation of Initial Production Environment: September 2006
Hardware
– Compute Nodes: ~200 4-CPU nodes = ~800 processing cores
– Storage Servers: 10 systems = ¼ petabyte raw storage
– Database Servers: larger 20-40 TB; smaller 5-10 TB
– Network Management: Force10 E1200 router w/ 12 10GigE interfaces to each system ports
User Access to Compute Cycles
– Bulk of free cycles available to external users
– Proposal mechanism in process
Source: Greg Hidley, Calit2; Phil Papadopoulos, SDSC, Calit2
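The hardware figures on this slide are simple products, and a short calculation makes them explicit. This is a back-of-the-envelope sketch, not a specification. It assumes one processing core per CPU (which is what equating 4-CPU nodes with ~800 cores implies), decimal units (1 PB = 1000 TB), and storage spread evenly across the ten servers; the variable names are illustrative.

```python
# Compute: ~200 nodes with 4 CPUs each, assuming one core per CPU.
nodes = 200
cpus_per_node = 4
cores = nodes * cpus_per_node            # ~800 processing cores

# Storage: 10 systems holding a quarter petabyte of raw storage in total.
storage_systems = 10
raw_storage_tb = 0.25 * 1000             # assumes 1 PB = 1000 TB
per_system_tb = raw_storage_tb / storage_systems  # if spread evenly

print(f"{cores} cores, {per_system_tb:.0f} TB raw storage per storage server")
```

The slide's own numbers are approximate ("~200", "~800"), so the results are order-of-magnitude figures.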

Countries are Aggressively Creating Gigabit Services: Interactive Access to CAMERA and LOOKING Systems
Created in Reykjavik, Iceland 2003
Visualization courtesy of Bob Patterson, NCSA

CAMERA Outreach Modes
Scientific Advisory Board
– Early Adopters: OptIPortal End Points
Targeted Workshops
– User Forums
– User Software Testing
– Viz Tool Brainstorming
Presentations at Scientific Meetings
– Talks, posters, eventually demonstration booths
Partnerships With Metagenomics Projects
– E.g., DOE's Joint Genome Institute (JGI)
Training and User Services Team

A Near Future Metagenomics Fiber Optic-Enabled Data Generator
Source: John Delaney, UWash