SAN DIEGO SUPERCOMPUTER CENTER
Niches, Long Tails, and Condos: Effectively Supporting Modest-Scale HPC Users
21st High Performance Computing Symposia (HPC'13)

Presentation transcript:

SAN DIEGO SUPERCOMPUTER CENTER
Niches, Long Tails, and Condos: Effectively Supporting Modest-Scale HPC Users
21st High Performance Computing Symposia (HPC'13)
Rick Wagner, High Performance System Manager, San Diego Supercomputer Center
April 10, 2013

What trends are driving HPC in the SDSC machine room?
- Niches: specialized hardware for specialized applications
- Long tail of science: broadening the HPC user base
- Condos: leveraging our facility to support campus users

SDSC Clusters (overview)
- XSEDE systems: Gordon, Trestles
- UCSD system: Triton Shared Compute Cluster

SDSC Clusters: Gordon
Niches: specialized hardware for specialized applications

Gordon: An Innovative Data-Intensive Supercomputer
- Designed to accelerate access to massive amounts of data in areas such as genomics, earth science, engineering, and medicine
- Emphasizes memory and I/O over FLOPS
- Appro-integrated 1,024-node Sandy Bridge cluster
- 300 TB of high-performance Intel flash
- Large-memory supernodes via vSMP Foundation from ScaleMP
- 3D torus interconnect from Mellanox
- In production operation since February 2012
- Funded by the NSF and available through the NSF Extreme Science and Engineering Discovery Environment (XSEDE) program

Gordon Design: Two Driving Ideas
- Observation #1: Data keeps getting further away from processor cores ("red shift"). Do we need a new level in the memory hierarchy?
- Observation #2: Many data-intensive applications are serial and difficult to parallelize. Would a large, shared-memory machine be better from the standpoint of researcher productivity for some of these?
- Goal: rapid prototyping of new approaches to data analysis

Red Shift: Data keeps moving further away from the CPU with every turn of Moore's Law
[Figure: disk access time trend, annotated "BIG DATA LIVES HERE". Source: Dean Klein, Micron]

Gordon Design Highlights
- 3D torus interconnect, dual-rail QDR
- 64 dual-socket Westmere I/O nodes: 12 cores and 48 GB per node, 4 LSI controllers, 16 SSDs, dual 10GbE, SuperMicro motherboard, PCI Gen2
- 300 GB Intel 710 eMLC SSDs, 300 TB aggregate flash
- 1,024 dual-socket Xeon E5 (Sandy Bridge) compute nodes: 16 cores and 64 GB per node, Intel Jefferson Pass motherboard, PCI Gen3
- Large-memory vSMP supernodes: 2 TB DRAM, 10 TB flash
- "Data Oasis" Lustre parallel file system: 100 GB/s, 4 PB
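As a rough consistency check (not part of the slides), the aggregate figures follow from the per-node ones; a minimal arithmetic sketch:

```python
# Back-of-the-envelope totals implied by the Gordon design highlights above.
# Simple arithmetic checks only; not official system specifications.

io_nodes = 64
ssds_per_io_node = 16
ssd_capacity_gb = 300

compute_nodes = 1024
cores_per_node = 16
dram_per_node_gb = 64

aggregate_flash_tb = io_nodes * ssds_per_io_node * ssd_capacity_gb / 1000
total_cores = compute_nodes * cores_per_node
total_dram_tb = compute_nodes * dram_per_node_gb / 1000

print(f"Aggregate flash: ~{aggregate_flash_tb:.0f} TB")  # ~307 TB, quoted as "300 TB aggregate"
print(f"Compute cores:   {total_cores}")                 # 16,384
print(f"Total DRAM:      ~{total_dram_tb:.0f} TB")       # ~66 TB (derived, not quoted on the slide)
```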

SDSC Clusters: Trestles
Long tail of science: broadening the HPC user base

The Majority of TeraGrid/XD Projects Have Modest-Scale Resource Needs
- "80/20" rule around 512 cores: ~80% of projects only run jobs smaller than this, and they use <20% of resources
- Only ~1% of projects run jobs as large as 16K cores, and they consume >30% of resources
- Many projects/users only need modest-scale jobs/resources, and a modest-size resource can provide the resources for a large number of these projects/users
[Figure: exceedance distributions of projects and usage as a function of the largest job (core count) run by a project over a full year (FY2009)]
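To make the exceedance-distribution idea concrete, here is a minimal sketch on invented project data; the thresholds mirror the slide, but the numbers are not the FY2009 data behind it:

```python
# Illustrative exceedance distribution: for each core-count threshold, the
# fraction of projects whose largest job is at least that size, and the
# fraction of total usage (core-hours) those projects account for.
# The project data below is invented for illustration.

# (largest_job_cores, total_core_hours) for each hypothetical project
projects = [(64, 5e4), (128, 8e4), (256, 1.2e5), (512, 3e5),
            (1024, 9e5), (4096, 2e6), (16384, 6e6)]

def exceedance(threshold_cores):
    big = [p for p in projects if p[0] >= threshold_cores]
    frac_projects = len(big) / len(projects)
    frac_usage = sum(h for _, h in big) / sum(h for _, h in projects)
    return frac_projects, frac_usage

for t in (512, 16384):
    fp, fu = exceedance(t)
    print(f">= {t:5d} cores: {fp:.0%} of projects, {fu:.0%} of usage")
```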

The Modest Scale
[Figure. Source: XSEDE Metrics on Demand (XDMoD)]

Trestles Focuses on the Productivity of Its Users Rather Than System Utilization
- We manage the system with a different focus than has been typical of TeraGrid/XD systems
- Short queue waits are key to productivity; the primary system metric is the expansion factor = 1 + (wait time / run time)
- Long-running job queues (48 hours standard, up to 2 weeks)
- Shared nodes for interactive, accessible computing
- User-settable advance reservations
- Automatic on-demand access for urgent applications
- Robust suite of application software
- Once expectations for the system are established, say yes to user requests whenever possible
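For concreteness, a minimal sketch of the expansion-factor metric exactly as defined above; the job wait and run times are hypothetical:

```python
# Expansion factor as defined on the slide: 1 + (wait time / run time).
# A value near 1.0 means a job waited very little relative to how long it ran.

def expansion_factor(wait_hours, run_hours):
    return 1.0 + wait_hours / run_hours

# Hypothetical jobs: (wait_hours, run_hours)
jobs = [(0.5, 12.0), (2.0, 48.0), (6.0, 4.0)]

for wait, run in jobs:
    xf = expansion_factor(wait, run)
    print(f"wait={wait}h run={run}h -> expansion factor {xf:.2f}")
```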

Trestles is a 100 TF system with 324 nodes (each node: 4-socket, 8-core AMD Magny-Cours, 64 GB DRAM, 120 GB flash)

AMD Magny-Cours compute node:
- Sockets: 4
- Cores: 32
- Clock speed: 2.4 GHz
- Flop speed: 307 Gflop/s
- Memory capacity: 64 GB
- Memory bandwidth: 171 GB/s
- STREAM Triad bandwidth: 100 GB/s
- Flash memory (SSD): 120 GB

Full system:
- Total compute nodes: 324
- Total compute cores: 10,368
- Peak performance: 100 Tflop/s
- Total memory: 20.7 TB
- Total memory bandwidth: 55.4 TB/s
- Total flash memory: 39 TB

QDR InfiniBand interconnect:
- Topology: fat tree
- Link bandwidth: 8 GB/s (bidirectional)
- Peak bisection bandwidth: 5.2 TB/s (bidirectional)
- MPI latency: 1.3 us

Disk I/O subsystem:
- File systems: NFS, Lustre
- Storage capacity (usable): 150 TB initially, expanded to the PB scale by July 2012
- I/O bandwidth: 50 GB/s
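As a quick check (not from the slides), the full-system rows follow from the per-node rows; a minimal sketch:

```python
# Sanity-check the Trestles full-system figures against the per-node values
# in the table above (decimal units, so results are approximate).

nodes = 324
cores_per_node = 32
flop_speed_gflops = 307
memory_gb = 64
flash_gb = 120

print("Total compute cores:", nodes * cores_per_node)                      # 10,368
print("Peak performance:   ", nodes * flop_speed_gflops / 1000, "Tflop/s") # ~99.5, quoted as 100
print("Total memory:       ", nodes * memory_gb / 1000, "TB")              # ~20.7
print("Total flash memory: ", nodes * flash_gb / 1000, "TB")               # ~38.9, quoted as 39
```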

SDSC Clusters: Triton Shared Compute Cluster (UCSD system)
Condos: leveraging our facility to support campus users

Competitive Edge [figure]

Condo Model (for those who can't afford Crays)
- Central facility (space, power, network, management)
- Researchers purchase compute nodes on grants
- Some campus subsidy for operation
- Small selection of node types ("Model T or Model A")
- Benefits: sustainability, efficiency from scale, harvesting of idle cycles
- Adopters: Purdue, LBNL, UCLA, Clemson, ...

Triton Shared Compute Cluster
- Performance: TBD
- Compute nodes: ~100; Intel Xeon E5, 2.6 GHz, dual socket, 16 cores/node, 64 GB RAM
- GPU nodes: ~4; Intel Xeon E5, 2.3 GHz, dual socket, 16 cores/node, 32 GB RAM, 4 NVIDIA GeForce GTX 680
- Interconnects: 10GbE; QDR fat-tree islands (optional)

Conclusion: Supporting Evidence [figure]