Presentation Title April 4, 2002 CAMERA- Metagenomics meets the Cyberinfrastructure David T. Kingsbury Gordon and Betty Moore Foundation BERAC - October.

Presentation transcript:

CAMERA: Metagenomics Meets the Cyberinfrastructure
David T. Kingsbury, Gordon and Betty Moore Foundation
BERAC, October 16, 2006

The CAMERA Partnership
Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis

Genomic Data Is Growing Rapidly, But Metagenomics Will Vastly Increase The Scale…
[Figure: growth charts for GenBank and the Protein Data Bank; chart labels: "Billion Bases!", "Total Data < 1TB", "35,000 Structures"]

The Sargasso Sea Experiment: The Power of Environmental Metagenomics
• Yielded a Total of Over 1 Billion Base Pairs of Non-Redundant Sequence
• Displayed the Gene Content, Diversity, & Relative Abundance of the Organisms
• Sequences from at Least 1800 Genomic Species, Including 148 Previously Unknown
• Identified Over 1.2 Million Unknown Genes
[Figure: MODIS-Aqua satellite image of ocean chlorophyll in the Sargasso Sea grid about the BATS site from 22 February 2003]
Source: J. Craig Venter, et al. Science, 2 April 2004

Marine Genome Sequencing Project
Measuring the Genetic Diversity of Ocean Microbes

Moore Foundation Funded the Venter Institute to Provide the Full Genome Sequence of 155 Marine Microbes

Moore Microbial Genome Sequencing Project: Cyanobacteria Being Sequenced by Venter Institute

GOS Sequences are Largely Bacterial
• ~3 Million Previously Known Sequences
• ~5.6 Million GOS Sequences
Source: Shibu Yooseph, et al. (PLOS Biology in press 2006)

Metagenomics Will Couple to Earth Observations Which Add Several TBs/Day
Source: Glenn Iona, EOSDIS Element Evolution Technical Working Group, January 6-7, 2005

Driven by User Needs
• CAMERA serves as one representation of a specific research community’s need for a system to
  – Collect and reference increasing metadata relevant to environmental metagenome datasets
  – Exploit the power of querying on metadata across multiple geospatial locations
  – Have access to a diverse and customizable set of easy-to-use tools to analyze their data in the context of collected metagenomic and whole genomic datasets
  – Have the ability to update and propagate improvements to annotations
  – Have a pre-publication, pre-submission collaborative workspace
  – Serve a diverse informatics-literate community

Services Provided
• Data and Application Services
• Tools and Workflows
• Computational Data, Visualization and Collaborative Environment
• Outreach and Training in Environmental Genomics

Data and Application Services
• Primary Data
  – Sargasso Sea and Sorcerer II expedition data
  – JGI marine & terrestrial environmental datasets
  – Moore Microbial Genomes
  – JGI and other relevant whole genomes
  – Research community submitted datasets
  – Submitted 454-based metagenomic datasets
  – Publicly available NR protein and DNA sequence datasets
• Derived Data
  – Annotations of datasets
  – Assemblies
  – Alignments
  – Pre-computed clusters

Sample Metadata from GOS
• Site Metadata
  – Location (lat/long, water depth)
  – Site characterization (finite list of types plus “other”)
  – Site description (free text)
  – Country
• Sampling Metadata
  – Sample collection date/time
  – Sampling depth
  – Conditions at time of sampling (e.g., stormy, surface temperature)
  – Sample physical/chemical measurements (T (°C), S (ppt), chl a (mg m⁻³), etc.)
  – “author”
• Experimental Parameters
  – Filter size
  – Insert size
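The metadata categories listed on this slide could be modeled roughly as follows. This is only an illustrative sketch, not CAMERA's actual schema: the class names, field names, the `sample_id` field, and the `filter_by_bounding_box` helper are all hypothetical, and the sign convention for latitude and longitude is an assumption. The helper shows the kind of query on metadata across geospatial locations that the "Driven by User Needs" slide asks for.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class SiteMetadata:
    latitude: float        # decimal degrees, north positive (assumed convention)
    longitude: float       # decimal degrees, east positive (assumed convention)
    water_depth_m: float
    site_type: str         # one of a finite list of types, or "other"
    description: str       # free text
    country: str


@dataclass
class SamplingMetadata:
    collected_at: datetime
    sampling_depth_m: float
    conditions: str                              # e.g. "stormy"
    temperature_c: Optional[float] = None        # T (deg C)
    salinity_ppt: Optional[float] = None         # S (ppt)
    chlorophyll_a_mg_m3: Optional[float] = None  # chl a (mg m^-3)
    author: str = ""


@dataclass
class ExperimentalParameters:
    filter_size: str   # kept as text; the slide gives no units
    insert_size: str   # kept as text; the slide gives no units


@dataclass
class SampleRecord:
    sample_id: str     # hypothetical identifier, not listed on the slide
    site: SiteMetadata
    sampling: SamplingMetadata
    experiment: ExperimentalParameters


def filter_by_bounding_box(
    records: List[SampleRecord],
    south: float, north: float, west: float, east: float,
) -> List[SampleRecord]:
    """Return records whose site lies inside a lat/long box.

    Assumes the box does not cross the antimeridian (west <= east).
    """
    return [
        r for r in records
        if south <= r.site.latitude <= north
        and west <= r.site.longitude <= east
    ]
```

Keeping site, sampling, and experimental fields in separate records mirrors the three groups on the slide, so a query can touch location fields without loading measurement fields.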

Tools and Workflows
• Initial set
  – BLAST Server
  – Clustering
  – HMM/Profile
  – Neighborhood analysis
  – Multiple sequence alignments
  – Assembly
• Proposed New Tools
  – Multiple Auto Annotation pipelines
  – Fast Sequence lookup
  – Customized Assembly
  – Phylogenetic Analysis
  – Clustering Tools

CAMERA Outreach Modes
• Scientific Advisory Board
  – Early Adopters – OptIPortal End Points
• Targeted Workshops
  – User Forums
  – User Software Testing
  – Viz Tool Brainstorming
• Presentations at Scientific Meetings
  – e.g. Demonstration Booth at JCVI Genomes, Medicine, and the Environment Conference, October 2006
• Partnerships With Metagenomics Projects
  – e.g. DoE’s Joint Genome Institute (JGI)
• Training and User Services Team

Guiding Philosophy for Development
• Sprint Q
  – Propagate JCVI toolkit and data ASAP
      • Mechanism for publication of Sorcerer II data
      • Enabler for community
  – Defined deliverables, project management approach
• Marathon Q onward
  – Additional Datasets
  – Additional tools
  – Community drives prioritization for ongoing releases
      • Advisory Board, Community Outreach
• Keys to success:
  – Tight integration of science, bioinformatics, software, and IT
  – Matched to Community Needs

The Future Home of the Moore Foundation Funded Marine Microbial Ecology Metagenomics Complex
• First Implementation of the CAMERA Complex
• Major Buildout of Calit2 Server Room Underway
Photo Courtesy Joe Keefe, Calit2

Moore CAMERA Production Environment
• Creation of Initial Production Environment – September 2006
  – Hardware
      • Compute Nodes: ~200 4 CPU Nodes = ~800 Processing Cores
      • Storage Servers: 10 systems = ¼ Petabyte raw storage
      • Database Servers: Larger 20-40TB; Smaller 5-10TB
      • Network Management: Force10 E1200 Router w/12 10GigE Interfaces to Each System Ports
• User Access to Compute Cycles
  – Bulk of free cycles available to external users
  – Proposal mechanism
Source: Greg Hidley, Calit2; Phil Papadopoulos, SDSC, Calit2
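The capacity figures on this slide can be checked with a little arithmetic. The sketch below assumes one processing core per CPU (which is what "~200 4 CPU Nodes = ~800 Processing Cores" implies) and decimal units (1 PB = 1000 TB); the function names are illustrative only. The per-system storage figure is derived here and is not stated on the slide.

```python
def total_cores(nodes: int, cpus_per_node: int, cores_per_cpu: int = 1) -> int:
    """Total processing cores across identical compute nodes."""
    return nodes * cpus_per_node * cores_per_cpu


def storage_per_system_tb(total_pb: float, systems: int, tb_per_pb: int = 1000) -> float:
    """Average raw storage per server in TB (decimal units assumed)."""
    return total_pb * tb_per_pb / systems


# Approximate figures from the slide: ~200 nodes with 4 CPUs each,
# and 10 storage systems holding 1/4 PB of raw storage in total.
print(total_cores(200, 4))               # 800
print(storage_per_system_tb(0.25, 10))   # 25.0
```

Under these assumptions each storage server would hold about 25 TB raw, the same order of magnitude as the larger database servers (20-40TB) listed above.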

Countries are Aggressively Creating Gigabit Services: Interactive Access to CAMERA and LOOKING Systems
Created in Reykjavik, Iceland 2003
Visualization courtesy of Bob Patterson, NCSA.

Scale

Calit2’s Direct Access Core Architecture Will Create Next Generation Metagenomics Server
[Diagram labels: Traditional User (Request/Response) via Web Portal + Web Services; Flat File Server Farm; Database Farm; 10 GigE Fabric; Dedicated Compute Farm (1000 CPUs); TeraGrid: Cyberinfrastructure Backplane (scheduled activities, e.g. all by all comparison) (10000s of CPUs); Web (other service); Local Environment: Local Cluster, Direct Access Lambda Cnxns]
Data sources:
• Sargasso Sea Data
• Sorcerer II Expedition (GOS)
• JGI Community Sequencing Project
• Moore Marine Microbial Project
• NASA Goddard Satellite Data
• Community Microbial Metagenomics Data
Source: Phil Papadopoulos, SDSC, Calit2

OptIPuter Scalable Adaptive Graphics Environment (SAGE) Allows Integration of HD Streams
OptIPortal – Termination Device for the OptIPuter Global Backplane

OptIPortal – Termination Device for the OptIPuter Global Backplane
• 20 Dual CPU Nodes, 20 24” Monitors, ~$50,000
• 1/4 Teraflop, 5 Terabyte Storage, 45 Mega Pixels--Nice PC!
• Scalable Adaptive Graphics Environment (SAGE), Jason Leigh, EVL-UIC
Source: Phil Papadopoulos, SDSC, Calit2

UIC/UCSD 10GE CAVEWave on the National LambdaRail
Emerging OptIPortal Sites
NEW! CAVEWave Connects Chicago to Seattle to San Diego…and Washington D.C. as of 4/1/06 and JCVI as of 5/15/06
[Map labels: SunLight, CICESE, UW, JCVI, MIT, SIO, UCSD, SDSU, UIC EVL, UCI, OptIPortals]

First Remote Interactive High Definition Video Exploration of Deep Sea Vents
Canadian-U.S. Collaboration
Source: John Delaney & Deborah Kelley, UWash

High Definition Still Frame of Hydrothermal Vent Ecology 2.3 Km Deep
White Filamentous Bacteria on 'Pill Bug' Outer Carapace
[Scale bar: 1 cm]
Source: John Delaney and Research Channel, U Washington

A Near Future Metagenomics Fiber Optic-Enabled Data Generator
Source: John Delaney, UWash