SDSC RP Update
TeraGrid Roundtable, January 14, 2010

Reviewing Dash
Unique characteristics:
– A pre-production/evaluation "data-intensive" supercomputer based on SSD flash memory and virtual shared memory
– Nehalem processors
Integrating into TeraGrid:
– Add to TeraGrid Resource Catalog
– Target friendly users interested in exploring unique capabilities
– Available initially for start-up allocations (March 2010)
– As it stabilizes, and depending on user interest, evaluate more routine allocations at the TRAC level
– Appropriate CTSS kits will be installed
– Planned to support TeraGrid wide-area filesystem efforts (GPFS-WAN, Lustre-WAN)

Introducing Gordon (SDSC's Track 2d System)
Unique characteristics:
– A "data-intensive" supercomputer based on SSD flash memory and virtual shared memory; emphasizes MEM and IO over FLOPS
– A system designed to accelerate access to the massive databases being generated in all fields of science, engineering, medicine, and social science
– Sandy Bridge processors
Integrating into TeraGrid:
– Will be added to the TeraGrid Resource Catalog
– Appropriate CTSS kits will be installed
– Planned to support TeraGrid wide-area filesystem efforts
– Coming summer 2011

The Memory Hierarchy
[Memory-hierarchy figure; the slide calls out flash SSD as a tier of O(TB) capacity at roughly 1000-cycle access time.]
– Potential 10x speedup for random I/O to large files and databases
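A back-of-envelope sketch of where a figure like "10x" can come from: assume roughly 5 ms per random 4 KB read on spinning disk and roughly 100 microseconds on flash (illustrative assumptions, not measured Dash or Gordon numbers), and apply an Amdahl-style model to the fraction of runtime spent on random I/O.

    # Minimal sketch in Python. Latencies are illustrative assumptions only.
    DISK_RANDOM_READ_S  = 5e-3     # ~5 ms per random 4 KB read on spinning disk (assumed)
    FLASH_RANDOM_READ_S = 100e-6   # ~100 us per random 4 KB read on flash SSD (assumed)

    def end_to_end_speedup(io_fraction):
        """Amdahl-style speedup when only the random-I/O share of runtime gets faster."""
        device_ratio = DISK_RANDOM_READ_S / FLASH_RANDOM_READ_S   # 50x with these assumptions
        return 1.0 / ((1.0 - io_fraction) + io_fraction / device_ratio)

    for f in (0.5, 0.9, 0.99):
        print(f"random I/O = {f:.0%} of runtime -> ~{end_to_end_speedup(f):.1f}x faster")

A workload spending about 90% of its time on random reads comes out roughly 8-9x faster end to end under these assumptions, which is the ballpark of the "potential 10x" claim.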

Gordon Architecture: "Supernode"
– 32 Appro Extreme-X compute nodes: dual-processor Intel Sandy Bridge, 240 GFLOPS and 64 GB per node
– 2 Appro Extreme-X I/O nodes: Intel SSD drives, 4 TB each, 560,000 IOPS
– ScaleMP vSMP virtual shared memory: 2 TB RAM aggregate, 8 TB SSD aggregate
[Diagram: the 240 GF / 64 GB RAM compute nodes and the 4 TB SSD I/O node joined into one supernode by vSMP memory virtualization.]
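A quick consistency check of the aggregate figures above, using only the per-node numbers on this slide (an arithmetic sketch, not additional specification):

    # Aggregates per Gordon supernode, derived from the slide's per-node figures.
    COMPUTE_NODES = 32      # compute nodes per supernode
    IO_NODES      = 2       # I/O nodes per supernode
    GF_PER_NODE   = 240     # GFLOPS per compute node
    DRAM_GB       = 64      # GB of DRAM per compute node
    SSD_TB_PER_IO = 4       # TB of flash per I/O node

    print("aggregate DRAM (TB):    ", COMPUTE_NODES * DRAM_GB / 1024)       # 2.0
    print("aggregate SSD (TB):     ", IO_NODES * SSD_TB_PER_IO)             # 8
    print("peak per supernode (TF):", COMPUTE_NODES * GF_PER_NODE / 1000)   # 7.68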

Gordon Architecture: Full Machine
– 32 supernodes = 1024 compute nodes
– Dual-rail QDR InfiniBand network, 3D torus (4x4x4)
– 4 PB rotating-disk parallel file system, >100 GB/s
[Diagram: supernodes (SN) attached to the disk (D) subsystem.]
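The full-machine totals implied by the supernode figures (arithmetic only; the DRAM and flash totals agree with the Dash/Gordon comparison table on the next slide):

    # Full-machine totals from 32 supernodes, using the per-supernode aggregates above.
    SUPERNODES     = 32
    NODES_PER_SN   = 32
    TF_PER_SN      = 7.68   # 32 nodes x 240 GFLOPS
    DRAM_TB_PER_SN = 2
    SSD_TB_PER_SN  = 8

    print("compute nodes:", SUPERNODES * NODES_PER_SN)     # 1024
    print("peak (TF):    ", SUPERNODES * TF_PER_SN)        # ~245.8
    print("DRAM (TB):    ", SUPERNODES * DRAM_TB_PER_SN)   # 64
    print("flash (TB):   ", SUPERNODES * SSD_TB_PER_SN)    # 256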

Comparing Dash and Gordon systems
Doubling capacity halves accessibility to any random data on a given medium (see the quick arithmetic after the table).

System Component                             | Dash                      | Gordon
Node characteristics (sockets, cores, DRAM)  | 2 sockets, 8 cores, 48 GB | 2 sockets, TBD cores, 64 GB
Compute nodes (#)                            | 64                        | 1024
Processor type                               | Nehalem                   | Sandy Bridge
Clock speed (GHz)                            | 2.4                       | TBD
Peak speed (TFLOPS)                          |                           |
DRAM (TB)                                    | 3                         | 64
I/O nodes (#)                                | 2                         | 64
I/O controllers per node                     | 2 with 8 ports            | 1 with 16 ports
Flash (TB)                                   | 2                         | 256
Total memory: DRAM + flash (TB)              | 5                         | 320
vSMP                                         | Yes                       | Yes
32-node supernodes                           | 2                         | 32
Interconnect                                 | InfiniBand                | InfiniBand
Disk                                         | 0.5 PB                    | 4.5 PB
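To illustrate the capacity-versus-accessibility point made above the table: if a disk doubles in capacity while its random-read rate stays flat, the achievable IOPS per gigabyte is halved. The 150 IOPS figure below is an assumed, typical single-spindle value, not a number from the slides.

    # Hypothetical illustration: IOPS per GB falls as capacity grows at a fixed random-read rate.
    DISK_IOPS = 150.0                      # assumed random-read rate of one spindle
    for capacity_gb in (1000, 2000, 4000):
        print(f"{capacity_gb:>5} GB disk -> {DISK_IOPS / capacity_gb:.3f} IOPS per GB")

Flash, with far higher IOPS per unit capacity, is what Gordon adds to counter this trend.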

Data mining applications will benefit from Gordon
– De novo genome assembly from sequencer reads, and analysis of galaxies from cosmological simulations and observations: will benefit from large shared memory
– Federations of databases, and interaction-network analysis for drug discovery, social science, biology, epidemiology, etc.: will benefit from low-latency I/O from flash

Data-intensive predictive science will benefit from Gordon
– Solution of inverse problems in oceanography, atmospheric science, and seismology: will benefit from a balanced system, especially large RAM per core and fast I/O
– Modestly scalable codes in quantum chemistry and structural engineering: will benefit from large shared memory

We won the SC09 Data Challenge with Dash! With these numbers (IOR, 4 KB transfers):
– RAMFS: 4 million+ IOPS on up to 0.75 TB of DRAM (1 supernode's worth)
– Flash: 88K+ IOPS on up to 1 TB of flash (1 supernode's worth)
– Sped up Palomar Transients database searches 10x to 100x
– Best IOPS per dollar
Since then we have boosted flash IOPS to 540K, hitting our 2011 performance targets.
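For scale, the 4 KB IOPS figures above translate into the following effective random-read bandwidths (simple arithmetic at the 4 KB transfer size used in the IOR runs; the labels are descriptive, not from the slide):

    # Effective bandwidth = IOPS x 4 KB transfer size.
    KB = 1024
    results = [("RAMFS (DRAM)",   4_000_000),
               ("flash at SC09",     88_000),
               ("flash, boosted",   540_000)]
    for label, iops in results:
        print(f"{label:<15}: {iops:>9,} IOPS ~ {iops * 4 * KB / 1e9:.2f} GB/s")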

Deployment Schedule
Summer 2009 - present
– Internal evaluation and testing with internal apps (SSD and vSMP)
Starting ~March 2010
– Dash allocated via start-up requests by friendly TeraGrid users
Summer 2010
– Expect to change status to an allocable system starting ~October 2010 via TRAC requests
– Preference given to applications that target the unique technologies of Dash
October 2010 - June 2011
– Operate Dash as an allocable TeraGrid resource, available through the normal POPS/TRAC cycles, with appropriate caveats about preferred applications and friendly-user status
– Help fill the SMP gap created by the Altix being retired in 2010
March 2011 - July 2011
– Gordon build and acceptance
July 2011 - June 2014
– Operate Gordon as an allocable TeraGrid resource, available through the normal POPS/TRAC cycles

Consolidating Archive Systems
– SDSC has historically operated two archive systems: HPSS and SAM-QFS
– Due to budget constraints, we're consolidating to one: SAM-QFS
– We're currently migrating HPSS user data to SAM-QFS
[Timeline figure (Jul 2009, mid 2010, Mar 2011, Jun 2013, later TBD): HPSS goes from read/write to read-only and is eventually retired, while SAM-QFS remains read/write for allocated data and read-only for legacy data; the tape hardware is consolidated from 6 silos, 12 PB, and 64 tape drives to 2 silos, 6 PB, and 32 tape drives.]