High Performance Cyberinfrastructure Enables Data-Driven Science in the Globally Networked World: Keynote Presentation, Sequencing Data Storage and Management Meeting

Presentation transcript:
1 High Performance Cyberinfrastructure Enables Data-Driven Science in the Globally Networked World
Keynote Presentation, Sequencing Data Storage and Management Meeting at The X-GEN Congress and Expo, San Diego, CA, March 14, 2011
Dr. Larry Smarr, Director, California Institute for Telecommunications and Information Technology; Harry E. Gruber Professor, Dept. of Computer Science and Engineering, Jacobs School of Engineering, UCSD
Follow me on Twitter: lsmarr

2 Abstract
High performance cyberinfrastructure (10Gbps dedicated optical channels end-to-end) enables new levels of discovery for data-intensive research projects such as next generation sequencing. In addition to international and national optical fiber infrastructure, we need local campus high performance research cyberinfrastructure (HPCI) to provide on-ramps, as well as scalable visualization walls and compute and storage clouds, to augment the emerging remote commercial clouds. I will review how UCSD has built out just such a HPCI and is in the process of connecting it to a variety of high throughput biomedical devices. I will show how high performance collaboration technologies allow distributed interdisciplinary teams to analyze these large data sets in real time.

3 Two Calit2 Buildings (UC San Diego and UC Irvine) Provide Laboratories for Living in the Future
Convergence Laboratory Facilities:
– Nanotech, BioMEMS, Chips, Radio, Photonics
– Virtual Reality, Digital Cinema, HDTV, Gaming
Over 1000 Researchers in Two Buildings, Linked via Dedicated Optical Networks
Over 400 Federal Grants, 200 Companies

4 The Required Components of High Performance Cyberinfrastructure
– High Performance Optical Networks
– Scalable Visualization and Analysis
– Multi-Site Collaborative Systems
– End-to-End Wide Area CI
– Data-Intensive Campus Research CI

5 The OptIPuter Project: Creating High Resolution Portals Over Dedicated Optical Channels to Global Science Data
Calit2 (UCSD, UCI), SDSC, and UIC Leads; Larry Smarr, PI
Univ. Partners: NCSA, USC, SDSU, NW, TA&M, UvA, SARA, KISTI, AIST
Industry: IBM, Sun, Telcordia, Chiaro, Calient, Glimmerglass, Lucent
Scalable Adaptive Graphics Environment (SAGE) OptIPortal
Picture Source: Mark Ellisman, David Lee, Jason Leigh

6 Visual Analytics: Use of Tiled Display Wall OptIPortal to Interactively View a Microbial Genome (5 Million Bases)
Acidobacteria bacterium Ellin345, a soil bacterium: 5.6 Mb, ~5000 genes
Source: Raj Singh, UCSD

7 Use of Tiled Display Wall OptIPortal to Interactively View Microbial Genome Source: Raj Singh, UCSD

8 Use of Tiled Display Wall OptIPortal to Interactively View Microbial Genome Source: Raj Singh, UCSD

9 Large Data Challenge: Average Throughput to End User on Shared Internet Is 10-100 Mbps (Tested January 2011)
Transferring 1 TB:
– 50 Mbps = 2 Days
– 10 Gbps = 15 Minutes
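The slide's transfer-time comparison can be checked with a quick back-of-the-envelope script. This is a sketch under stated assumptions: decimal units (1 TB = 10^12 bytes) and ideal sustained throughput, so the 10 Gbps case comes out slightly under the slide's rounded 15 minutes.

```python
# Back-of-the-envelope transfer times for 1 TB, matching the slide's figures.
# Assumes 1 TB = 1e12 bytes and ideal, sustained line-rate throughput.

def transfer_time_seconds(size_bytes: float, rate_bps: float) -> float:
    """Seconds to move size_bytes at rate_bps (bits per second)."""
    return size_bytes * 8 / rate_bps

ONE_TB = 1e12  # bytes

shared = transfer_time_seconds(ONE_TB, 50e6)     # 50 Mbps shared Internet
dedicated = transfer_time_seconds(ONE_TB, 10e9)  # 10 Gbps dedicated lightpath

print(f"50 Mbps: {shared / 86400:.1f} days")     # ~1.9 days (slide: 2 days)
print(f"10 Gbps: {dedicated / 60:.1f} minutes")  # ~13.3 min (slide: 15 min)
```

Real transfers add protocol overhead and disk bottlenecks, which is why the slide's figures are rounded up.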

10 Solution: Give Dedicated Optical Channels (WDM Lambdas) to Data-Intensive Users
Parallel Lambdas Are Driving Optical Networking the Way Parallel Processors Drove 1990s Computing
10 Gbps per User ~ 100-1000x Shared Internet Throughput
Source: Steve Wallach, Chiaro Networks

11 Dedicated 10Gbps Lightpaths Tie Together State and Regional Fiber Infrastructure
NLR: 40 x 10Gb Wavelengths; Interconnects Two Dozen State and Regional Optical Networks
Internet2 Dynamic Circuit Network Is Now Available

12 The Global Lambda Integrated Facility: Creating a Planetary-Scale High Bandwidth Collaboratory
Research Innovation Labs Linked by 10G Dedicated Lambdas
Visualization courtesy of Bob Patterson, NCSA; created in Reykjavik, Iceland, 2003

13 Launch of the 100 Megapixel OzIPortal Kicked Off a Rapid Build Out of Australian OptIPortals, January 15, 2008
No Calit2 Person Physically Flew to Australia to Bring This Up!
Covise: Phil Weber, Jurgen Schulze, Calit2; CGLX: Kai-Uwe Doerr, Calit2

14 Blueprint for the Digital University: Report of the UCSD Research Cyberinfrastructure Design Team, April 2009
Focus on Data-Intensive Cyberinfrastructure
No Data Bottlenecks: Design for Gigabit/s Data Flows

15 Campus Preparations Needed to Accept CENIC CalREN Handoff to Campus
Source: Jim Dolgonas, CENIC

16 Current UCSD Prototype Optical Core: Bridging End-Users to CENIC L1, L2, L3 Services
Quartzite Network MRI #CNS-0421555; OptIPuter #ANI-0225642
Endpoints:
– >= 60 endpoints at 10 GigE
– >= 32 Packet switched
– >= 32 Switched wavelengths
– >= 300 Connected endpoints
Approximately 0.5 Tbit/s Arrives at the Optical Center of Campus
Switching Is a Hybrid of Packet, Lambda, and Circuit: OOO and Packet Switches (Lucent, Glimmerglass, Force10)
Source: Phil Papadopoulos, SDSC/Calit2 (Quartzite PI, OptIPuter co-PI)

17 Calit2 Sunlight Optical Exchange Contains Quartzite
Maxine Brown, EVL, UIC, OptIPuter Project Manager

18 UCSD Planned Optical Networked Biomedical Researchers and Instruments
Buildings: Cellular & Molecular Medicine West, National Center for Microscopy & Imaging, Biomedical Research, Center for Molecular Genetics, Pharmaceutical Sciences Building, Cellular & Molecular Medicine East, CryoElectron Microscopy Facility, Radiology Imaging Lab, Bioengineering, Calit2@UCSD, San Diego Supercomputer Center
Connects at 10 Gbps:
– Microarrays
– Genome Sequencers
– Mass Spectrometry
– Light and Electron Microscopes
– Whole Body Imagers
– Computing
– Storage

19 UCSD Campus Investment in Fiber Enables Consolidation of Energy Efficient Computing & Storage
Components linked at N x 10Gb/s: OptIPortal Tiled Display Wall, Campus Lab Clusters, Digital Data Collections, Triton (Petascale Data Analysis), Gordon (HPD System), Cluster Condo, Scientific Instruments, DataOasis (Central) Storage, GreenLight Data Center
WAN 10Gb: CENIC, NLR, I2
Source: Philip Papadopoulos, SDSC, UCSD

20 Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis

21 Calit2 Microbial Metagenomics Cluster: Next Generation Optically Linked Science Data Server
512 Processors, ~5 Teraflops
~200 TB Sun X4500 Storage
1GbE and 10GbE Switched/Routed Core
4000 Users from 90 Countries
Source: Phil Papadopoulos, SDSC, Calit2

22 OptIPuter Persistent Infrastructure Enables Calit2 and U Washington CAMERA Collaboratory
Ginger Armbrust's Diatoms: Micrographs, Chromosomes, Genetic Assembly
iHDTV: 1500 Mbits/sec Calit2 to UW Research Channel Over NLR, Feb. 29, 2008
Photo Credit: Alan Decker

23 Creating CAMERA 2.0: Advanced Cyberinfrastructure Service Oriented Architecture
Source: CAMERA CTO Mark Ellisman

24 The GreenLight Project: Instrumenting the Energy Cost of Computational Science
Focus on 5 Communities with At-Scale Computing Needs:
– Metagenomics
– Ocean Observing
– Microscopy
– Bioinformatics
– Digital Media
Measure, Monitor, & Web Publish Real-Time Sensor Outputs:
– Via Service-Oriented Architectures
– Allow Researchers Anywhere to Study Computing Energy Cost
– Enable Scientists to Explore Tactics for Maximizing Work/Watt
Develop Middleware that Automates Optimal Choice of Compute/RAM Power Strategies for Desired Greenness
Data Center for School of Medicine Illumina Next Gen Sequencer Storage and Processing
Source: Tom DeFanti, Calit2, GreenLight PI

25 Moving to Shared Enterprise Data Storage & Analysis Resources: SDSC Triton Resource & Calit2 GreenLight
SDSC Large Memory Nodes: x28 nodes, 256/512 GB/sys, 8 TB total, 128 GB/sec, ~9 TF
SDSC Shared Resource Cluster: x256 nodes, 24 GB/node, 6 TB total, 256 GB/sec, ~20 TF
SDSC Data Oasis Large Scale Storage: 2 PB, 50 GB/sec, 3000-6000 disks; Phase 0: 1/3 PB, 8 GB/s
Linked to UCSD Research Labs via the Campus Research Network (N x 10Gb/s)
Source: Philip Papadopoulos, SDSC, UCSD

26 NSF Funds a Data-Intensive Track 2 Supercomputer: SDSC's Gordon, Coming Summer 2011
Data-Intensive Supercomputer Based on SSD Flash Memory and Virtual Shared Memory SW:
– Emphasizes MEM and IOPS over FLOPS
– Supernode Has Virtual Shared Memory: 2 TB RAM Aggregate, 8 TB SSD Aggregate
– Total Machine = 32 Supernodes
– 4 PB Disk Parallel File System, >100 GB/s I/O
System Designed to Accelerate Access to Massive Databases Being Generated in Many Fields of Science, Engineering, Medicine, and Social Science
Source: Mike Norman, Allan Snavely, SDSC
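The per-supernode figures above imply machine-wide totals. A minimal sketch, assuming the slide's per-supernode numbers (2 TB RAM, 8 TB SSD) apply uniformly across all 32 supernodes:

```python
# Machine-wide aggregates for Gordon, derived from the slide's per-supernode
# figures. Assumption: every supernode is identically configured.
SUPERNODES = 32
RAM_TB_PER_SUPERNODE = 2   # virtual shared memory aggregate per supernode
SSD_TB_PER_SUPERNODE = 8   # flash SSD aggregate per supernode

total_ram_tb = SUPERNODES * RAM_TB_PER_SUPERNODE  # 64 TB RAM machine-wide
total_ssd_tb = SUPERNODES * SSD_TB_PER_SUPERNODE  # 256 TB flash machine-wide

print(f"Gordon totals: {total_ram_tb} TB RAM, {total_ssd_tb} TB SSD")
```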

27 Data Mining Applications Will Benefit from Gordon
– De Novo Genome Assembly from Sequencer Reads & Analysis of Galaxies from Cosmological Simulations & Observations Will Benefit from Large Shared Memory
– Federations of Databases & Interaction Network Analysis for Drug Discovery, Social Science, Biology, Epidemiology, Etc. Will Benefit from Low-Latency I/O from Flash
Source: Mike Norman, SDSC

28 Rapid Evolution of 10GbE Port Prices Makes Campus-Scale 10Gbps CI Affordable
– 2005: $80K/port, Chiaro (60 max)
– 2007: $5K/port, Force10 (40 max)
– 2009: $500/port, Arista (48 ports)
– 2010: $400/port, Arista (48 ports); ~$1000/port at 300+ port scale
Port Pricing Is Falling; Density Is Rising Dramatically
Cost of 10GbE Approaching Cluster HPC Interconnects
Source: Philip Papadopoulos, SDSC/Calit2
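The price trend on this slide can be summarized numerically. A rough sketch, assuming the slide's per-port figures for the fixed-configuration switches (Chiaro 2005 through Arista 2010):

```python
# Per-port 10GbE prices from the slide (fixed-config switch endpoints).
prices = {2005: 80_000, 2007: 5_000, 2009: 500, 2010: 400}

overall_drop = prices[2005] / prices[2010]           # 200x over 5 years
annual_factor = overall_drop ** (1 / (2010 - 2005))  # ~2.9x cheaper per year

print(f"{overall_drop:.0f}x cheaper overall, ~{annual_factor:.1f}x per year")
```

That roughly 3x-per-year decline is what made a campus-scale 10Gbps core affordable by 2010.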

29 10G Switched Data Analysis Resource: SDSC's Data Oasis
Radical Change Enabled by Arista 7508 10G Switch: 384 10G-Capable Ports
Connects OptIPuter, Co-Lo, UCSD RCI, CENIC/NLR, Trestles (100 TF), Dash, Gordon, Triton, and Existing Commodity Storage (1/3 PB) over 10Gbps links
Oasis Procurement (RFP), 2000 TB, >50 GB/s:
– Phase 0: >8 GB/s Sustained Today
– Phase I: >50 GB/sec for Lustre (May 2011)
– Phase II: >100 GB/s (Feb 2012)
Source: Philip Papadopoulos, SDSC/Calit2

30 Calit2 CAMERA Automatic Overflows into SDSC Triton
CAMERA-Managed Job Submit Portal (VM) at CAMERA DATA @ CALIT2 Transparently Sends Jobs Over 10Gbps to the Submit Portal on the Triton Resource @ SDSC
Direct Mount == No Data Staging

31 California and Washington Universities Are Testing a 10Gbps-Connected Commercial Data Cloud
Amazon Experiment for Big Data:
– Only Available Through CENIC & Pacific NW GigaPOP
– Private 10Gbps Peering Paths
– Includes Amazon EC2 Computing & S3 Storage Services
Early Experiments Underway:
– Robert Grossman, Open Cloud Consortium
– Phil Papadopoulos, Calit2/SDSC Rocks

32 Academic Research OptIPlanet Collaboratory: A 10Gbps End-to-End Lightpath Cloud
Components: National LambdaRail, Campus Optical Switch, Data Repositories & Clusters, HPC, HD/4K Video Repositories, End User OptIPortal, 10G Lightpaths, HD/4K Live Video, Local or Remote Instruments

33 You Can Download This Presentation at
