Building a Community Cyberinfrastructure to Support Marine Microbial Ecology Metagenomics
Invited Talk, Metagenomics 2006, Calit2@UCSD, La Jolla, CA, October 4, 2006
Dr. Larry Smarr, Director, California Institute for Telecommunications and Information Technology; Harry E. Gruber Professor, Dept. of Computer Science and Engineering, Jacobs School of Engineering, UCSD
Challenge: Average Throughput of NASA Data Products to End User is < 50 Mbps (Tested October 2005). Internet2 Backbone is 10,000 Mbps! Throughput is < 0.5% to End User. http://ensight.eos.nasa.gov/Missions/icesat/index.shtml
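The gap on this slide is simple arithmetic: 50 Mbps out of a 10,000 Mbps backbone is 0.5%. The sketch below makes that explicit and, to show what the gap means in practice, estimates how long a 1 TB dataset would take at each rate. The 1 TB size is a hypothetical figure chosen for illustration, not a number from the talk.

```python
# Compare measured end-user throughput with backbone capacity.
# The 1 TB dataset size is a hypothetical value used only for illustration.

BACKBONE_MBPS = 10_000   # Internet2 backbone capacity, from the slide
END_USER_MBPS = 50       # upper bound on measured NASA end-user throughput, from the slide

def utilization(end_user_mbps: float, backbone_mbps: float) -> float:
    """Fraction of backbone capacity actually delivered to the end user."""
    return end_user_mbps / backbone_mbps

def transfer_hours(dataset_bytes: float, rate_mbps: float) -> float:
    """Hours needed to move dataset_bytes at a sustained rate of rate_mbps."""
    bits = dataset_bytes * 8
    seconds = bits / (rate_mbps * 1e6)
    return seconds / 3600

HYPOTHETICAL_DATASET_BYTES = 1e12  # 1 TB (assumption)

if __name__ == "__main__":
    print(f"utilization: {utilization(END_USER_MBPS, BACKBONE_MBPS):.1%}")
    print(f"1 TB at 50 Mbps:     {transfer_hours(HYPOTHETICAL_DATASET_BYTES, END_USER_MBPS):.1f} h")
    print(f"1 TB at 10,000 Mbps: {transfer_hours(HYPOTHETICAL_DATASET_BYTES, BACKBONE_MBPS):.2f} h")
```

At the measured rate the hypothetical terabyte takes about 44 hours; at full backbone speed it would take about 800 seconds. That two-orders-of-magnitude difference is the motivation for dedicated lambdas.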
Dedicated Optical Channels (WDM) Make High Performance Cyberinfrastructure Possible. "Lambdas": Parallel Lambdas are Driving Optical Networking the Way Parallel Processors Drove 1990s Computing. Source: Steve Wallach, Chiaro Networks
National Lambda Rail (NLR) and TeraGrid Provide Cyberinfrastructure Backbone for U.S. Researchers. NSF's TeraGrid Has a 4 x 10Gb Lambda Backbone. NLR: 4 x 10Gb Lambdas Initially, Capable of 40 x 10Gb Wavelengths at Buildout; Links Two Dozen State and Regional Optical Networks; DOE, NSF, & NASA Using NLR; International Collaborators. Map nodes: Seattle; Portland; Boise; UC-TeraGrid; UIC/NW-Starlight; Ogden/Salt Lake City; Cleveland; Chicago; New York City; San Francisco; Denver; Pittsburgh; Washington, DC; Kansas City; Raleigh; Albuquerque; Tulsa; Los Angeles; Atlanta; San Diego; Phoenix; Dallas; Baton Rouge; Las Cruces/El Paso; Jacksonville; Pensacola; San Antonio; Houston.
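The capacity figures on this slide follow from wavelength-division multiplexing: each lambda is an independent 10 Gb/s channel, so aggregate capacity scales with the number of wavelengths lit on the fiber. A minimal sketch of that multiplication, using only the counts quoted on the slide:

```python
# Aggregate capacity of parallel WDM wavelengths ("lambdas").
LAMBDA_GBPS = 10  # each lambda carries 10 Gb/s, per the slide

def aggregate_gbps(num_lambdas: int, per_lambda_gbps: int = LAMBDA_GBPS) -> int:
    """Total capacity when independent wavelengths run in parallel over one fiber."""
    return num_lambdas * per_lambda_gbps

if __name__ == "__main__":
    initial = aggregate_gbps(4)     # NLR at launch; also TeraGrid's backbone
    buildout = aggregate_gbps(40)   # NLR capability at buildout
    print(initial, buildout, buildout / initial)
```

This treats lambdas as ideal, fully independent channels; real deployments lose some capacity to framing and protection, which the slide does not quantify.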
The OptIPuter Project: Creating High Resolution Portals Over Dedicated Optical Channels to Global Science Data. NSF Large Information Technology Research Proposal. Calit2 (UCSD, UCI) and UIC Lead Campuses; Larry Smarr, PI. Partnering Campuses: SDSC, USC, SDSU, NCSA, NW, TA&M, UvA, SARA, NASA Goddard, KISTI, AIST, CRC (Canada), CICESE (Mexico). Engaged Industrial Partners: IBM, Sun, Telcordia, Chiaro, Calient, Glimmerglass, Lucent. $13.5 Million Over Five Years, Now In the Fifth Year. Drivers: NIH Biomedical Informatics Research Network; NSF EarthScope and ORION.
OptIPuter Software Architecture: a Service-Oriented Architecture Integrating Lambdas Into the Grid. Components: Distributed Applications/Web Services (Telescience, Vol-a-Tile, SAGE, JuxtaView); Visualization; Data Services (LambdaRAM); DVC Configuration; Distributed Virtual Computer (DVC) API; DVC Runtime Library; DVC Services; DVC Core Services (DVC Job Scheduling, DVC Communication, Resource Identify/Acquire, Namespace Management, Security Management, High Speed Storage Services); IP Lambdas: Discovery and Control (PIN/PDC); RobuStore; Globus (GRAM, GSI, XIO); Transport Protocols (GTP, XCP, UDT, CEP, LambdaStream, RBUDP). Source: Andrew Chien, UCSD
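RBUDP, one of the transport protocols at the bottom of this stack, is generally described as a "blast then repair" scheme: the sender blasts the whole payload as UDP datagrams, the receiver reports which sequence numbers arrived (a bitmap sent back over a TCP control channel), and the sender re-blasts only the gaps until nothing is missing. The toy simulation below illustrates that loop only. It uses no sockets, models loss as independent random drops, and the packet count, loss rate, and function name are hypothetical.

```python
import random

def rbudp_transfer(num_packets: int, loss_rate: float, seed: int = 0) -> list[int]:
    """Simulate RBUDP-style rounds; return how many packets were blasted in each round.

    Each round blasts every still-missing packet over a lossy simulated UDP
    channel. A Python set stands in for the receiver's bitmap, which in the
    real protocol is returned to the sender over TCP.
    """
    if not 0 <= loss_rate < 1:
        raise ValueError("loss_rate must be in [0, 1)")
    rng = random.Random(seed)
    received: set[int] = set()
    missing = list(range(num_packets))
    rounds: list[int] = []
    while missing:
        rounds.append(len(missing))
        # UDP blast: each datagram is dropped independently with probability loss_rate.
        for seq in missing:
            if rng.random() >= loss_rate:
                received.add(seq)
        # Control-channel feedback: the sender learns which packets are still missing.
        missing = [seq for seq in missing if seq not in received]
    return rounds

if __name__ == "__main__":
    rounds = rbudp_transfer(num_packets=10_000, loss_rate=0.05)
    print(f"{len(rounds)} rounds, packets blasted per round: {rounds}")
```

Because only the gaps are resent, each round shrinks roughly by the loss factor, so even a lossy path finishes in a handful of rounds while the bulk data moves at full blast rate. The real protocol also paces the blast at a chosen sending rate, which this sketch omits.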
Calit2 "Lives in the Future" By Building Systems of Emerging Disruptive Technologies. Technologies Diffuse Into Society Following an S-Curve (Example: Co-Evolution of the Personal Automobile and Highway/Petroleum Infrastructure). Figure annotation: "Calit2 Works Here." Source: Harry Dent, The Great Boom Ahead
Calit2: A Systems Approach to the Future of the Internet and its Transformation of Our Society. Calit2 Has Assembled a Complex Social Network of Over 350 UC San Diego & UC Irvine Faculty Working in Multidisciplinary Teams With Staff, Students, Industry, and the Community, Integrating Technology Consumers and Producers Into "Living Laboratories." www.calit2.net
Calit2 Brings Computer Scientists and Engineers Together with Biomedical Researchers (UC Irvine and UC San Diego). Some Areas of Concentration: Metagenomics; Genomic Analysis of Organisms; Evolution of Genomes; Cancer Genomics; Human Genomic Variation and Disease; Proteomics; Mitochondrial Evolution; Computational Biology; Information Theory and Biological Systems.
Evolution is the Principle of Biological Systems: Most of Evolutionary Time Was in the Microbial World. Much of the Genome Work Has Occurred in Animals (figure annotation: "You Are Here"). Source: Carl Woese, et al.
Calit2 is Now Attracting Private Foundation Grants. Announced January 17, 2006: $24.5M Over Seven Years. Paul Gilna, Executive Director; Larry Smarr, PI.
Marine Genome Sequencing Project: Measuring the Genetic Diversity of Ocean Microbes. Sorcerer II Data Will Double the Number of Proteins in GenBank!
Current Universe of Medium/Large Protein Families: 17,067 Protein Family Clusters. Figure categories: Protein Families Unique to GOS; Protein Families Conserved Across the Tree of Life. Source: Shibu Yooseph, et al. (PLoS Biology, in press 2006)
Calit2's Direct Access Core Architecture Will Create the Next Generation Metagenomics Server. Community Microbial Metagenomics Data: Sargasso Sea Data; Sorcerer II Expedition (GOS); JGI Community Sequencing Project; Moore Marine Microbial Project; NASA Goddard Satellite Data. Traditional User: Request to Web Portal (+ Web Services), Response back; Web (other service). Back End: Dedicated Compute Farm (1000 CPUs); Flat File Server Farm; Database Farm; 10 GigE Fabric; Local Cluster Environment; Direct Access Lambda Connections. TeraGrid: Cyberinfrastructure Backplane (scheduled activities, e.g. all-by-all comparison) (10,000s of CPUs). Source: Phil Papadopoulos, SDSC, Calit2
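The slide routes "all-by-all comparison" to the TeraGrid rather than the 1000-CPU dedicated farm. The reason is quadratic growth: comparing every sequence with every other needs n(n-1)/2 pairs. The sketch below counts pairs and gives an idealized wall-clock time for both CPU scales. The sequence count and per-CPU comparison rate are hypothetical placeholders, and the time ignores I/O, scheduling, and load imbalance.

```python
def all_by_all_pairs(n_sequences: int) -> int:
    """Unordered pairwise comparisons needed to compare every sequence with every other."""
    return n_sequences * (n_sequences - 1) // 2

def wall_clock_hours(pairs: int, cpus: int, comparisons_per_cpu_second: float) -> float:
    """Idealized time if comparisons split perfectly across CPUs (no overheads)."""
    return pairs / (cpus * comparisons_per_cpu_second) / 3600

N_SEQUENCES = 1_000_000   # hypothetical dataset size
RATE = 100.0              # hypothetical comparisons per CPU-second

if __name__ == "__main__":
    pairs = all_by_all_pairs(N_SEQUENCES)
    print(f"{pairs:,} pairwise comparisons")
    for cpus in (1_000, 10_000):   # dedicated farm vs. TeraGrid scale, from the slide
        print(f"{cpus:>6} CPUs: {wall_clock_hours(pairs, cpus, RATE):,.1f} h")
```

Doubling the number of sequences roughly quadruples the work, which is why a one-off all-by-all run is a scheduled backplane job while interactive portal requests stay on the local cluster.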
The Bioinformatics Core of the Joint Center for Structural Genomics will be Housed in the Calit2@UCSD Building. Determining the Protein Structures of the Thermotoga maritima Genome: Extremely Thermostable, Useful for Many Industrial Processes (e.g. Chemical and Food). 173 Structures (122 from JCSG); 122 T. maritima Structures Solved by JCSG (75 Unique in the PDB). Direct Structural Coverage of 25% of the Expressed Soluble Proteins, Probably the Highest Structural Coverage of Any Organism. Source: John Wooley, UCSD
Interactive Visualization of Thermotoga Proteins at Calit2. Source: John Wooley, Jurgen Schulze, Calit2
OptIPortal: Termination Device for the OptIPuter Global Backplane. 20 Dual-CPU Nodes, 20 24" Monitors, ~$50,000. 1/4 Teraflop, 5 Terabyte Storage, 45 Megapixels: Nice PC! Scalable Adaptive Graphics Environment (SAGE): Jason Leigh, EVL-UIC. Source: Phil Papadopoulos, SDSC, Calit2
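The OptIPortal numbers can be broken down per component. The sketch below uses the node, monitor, compute, storage, and cost figures from the slide. The 1920x1200 panel resolution is an assumption (a common native resolution for 24" displays of that era), not a figure from the talk; with it, the wall comes out near the quoted 45 megapixels.

```python
# Per-component breakdown of the OptIPortal figures quoted on the slide.
NODES = 20
CPUS_PER_NODE = 2
MONITORS = 20
TOTAL_TFLOPS = 0.25
TOTAL_STORAGE_TB = 5
COST_USD = 50_000

# Assumption: each 24" panel runs at 1920x1200 (not stated on the slide).
PANEL_W, PANEL_H = 1920, 1200

wall_megapixels = MONITORS * PANEL_W * PANEL_H / 1e6
gflops_per_cpu = TOTAL_TFLOPS * 1000 / (NODES * CPUS_PER_NODE)
storage_per_node_gb = TOTAL_STORAGE_TB * 1000 / NODES
cost_per_megapixel = COST_USD / wall_megapixels

if __name__ == "__main__":
    print(f"display wall:     {wall_megapixels:.2f} MP")
    print(f"compute per CPU:  {gflops_per_cpu:.2f} GFLOPS")
    print(f"storage per node: {storage_per_node_gb:.0f} GB")
    print(f"cost per MP:      ${cost_per_megapixel:,.0f}")
```

The pixel count ignores bezels and any scaling SAGE applies; it only checks that the quoted megapixel figure is consistent with 20 commodity panels.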
Calit2 is Now OptIPuter-Connecting Remote Moore-Funded Microbial Researchers. OptIPortals: UW; NW; UIC EVL; MIT; JCVI; UCI; SIO; UCSD; SDSU; CICESE.
Calit2 and the Venter Institute Will Combine Telepresence with Remote Interactive Analysis: Live Demonstration of 21st Century National-Scale Team Science. Diagram: Venter Institute, 25 Miles; OptIPuter Visualized Data; HDTV Over Lambda.
Countries are Aggressively Creating Gigabit Services: Interactive Access to CAMERA and LOOKING Systems. www.glif.is, Created in Reykjavik, Iceland, 2003. Visualization courtesy of Bob Patterson, NCSA.
New OptIPuter Driver: Gigabit Fibers on the Ocean Floor, Controlling Sensors and HDTV Cameras Remotely. The National Science Foundation Is Planning a New Generation of Ocean Observatories: Ocean Research Interactive Observatory Networks (ORION), Fibered Observatories Linked to Land Fiber Infrastructure. Laboratory for the Ocean Observatory Knowledge Integration Grid (LOOKING), Funded by NSF ITR (John Delaney, UWash, PI): Building a Prototype Based on OptIPuter Technologies Plus Web/Grid Services. LOOKING is Driven By NEPTUNE CI Requirements, Making Management of Gigabit Flows Routine; HDTV Streams Over IP Will be a Major Driver.
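Why HDTV over IP makes gigabit flows routine: an uncompressed high-definition stream is itself more than a gigabit per second. The sketch computes the raw bit rate of the active picture and how many such streams fit in one 10 Gb lambda. The 1920x1080 frame, 30 frames/s, and 10-bit 4:2:2 sampling (20 bits per pixel on average) are assumptions about the camera format, not figures from the slide.

```python
def uncompressed_video_gbps(width: int, height: int, fps: float, bits_per_pixel: int) -> float:
    """Raw bit rate of the active picture only (no blanking, audio, or packet headers)."""
    return width * height * bits_per_pixel * fps / 1e9

# Assumptions: 1920x1080, 30 frames/s, 10-bit 4:2:2 sampling = 20 bits per pixel.
hd_gbps = uncompressed_video_gbps(1920, 1080, 30, 20)
LAMBDA_GBPS = 10  # one lambda, as elsewhere in this talk
streams_per_lambda = int(LAMBDA_GBPS // hd_gbps)

if __name__ == "__main__":
    print(f"{hd_gbps:.2f} Gb/s per stream, {streams_per_lambda} streams per 10 Gb lambda")
```

Serial HD video links run somewhat above this figure because they also carry blanking intervals, and IP packetization adds header overhead, so the per-lambda stream count here is an upper bound.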
Using the OptIPuter to Couple Data Assimilation Models to Remote Data Sources, Including Biology. Figure: NASA MODIS Mean Primary Productivity for April 2001 in the California Current System; Regional Ocean Modeling System (ROMS). http://ourocean.jpl.nasa.gov/
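"Coupling a data assimilation model to remote data" means repeatedly correcting the model state with incoming observations. The sketch shows the textbook scalar analysis step (the one-variable form of optimal interpolation or a Kalman update): the forecast and the observation are blended in proportion to their error variances. This is a generic illustration only. It is not the scheme ROMS uses, which works on full multivariate fields, and the input values are hypothetical.

```python
def analysis_update(background: float, background_var: float,
                    observation: float, observation_var: float) -> tuple[float, float]:
    """Blend a model forecast with an observation, weighting each by its error variance.

    Scalar form of the optimal-interpolation / Kalman analysis step.
    Returns (analysis value, analysis error variance).
    """
    gain = background_var / (background_var + observation_var)
    analysis = background + gain * (observation - background)
    analysis_var = (1 - gain) * background_var
    return analysis, analysis_var

if __name__ == "__main__":
    # Hypothetical numbers: a model value and a satellite retrieval of the same quantity.
    x_a, var_a = analysis_update(background=15.0, background_var=1.0,
                                 observation=16.0, observation_var=0.25)
    print(f"analysis = {x_a:.2f}, variance = {var_a:.2f}")
```

Because the observation here is four times more certain than the forecast, the analysis moves most of the way toward it, and its variance ends up smaller than either input's. Getting fresh observations to the model quickly is where the network matters.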
Deploying Novel Infrastructure Enables New Science: Gigabit Fibers on the Ocean Floor, a Canadian-U.S. Collaboration. An Experiment in the NSF Laboratory for the Ocean Observatory Knowledge Integration Grid (LOOKING) ITR, a Prototype of CI for NSF's ORION. Source: John Delaney & Deborah Kelley, UWash
High Definition Still Frame of Hydrothermal Vent Ecology, 2.3 km Deep: White Filamentous Bacteria on 'Pill Bug' Outer Carapace (scale bar: 1 cm). Source: John Delaney and Research Channel, U Washington
A Near-Future Metagenomics Fiber Optic-Enabled Data Generator. Source: John Delaney, UWash