SAN DIEGO SUPERCOMPUTER CENTER
SDSC's Data Oasis: Balanced Performance and Cost-Effective Lustre File Systems
Lustre User Group 2013 (LUG13), April 18, 2013
Rick Wagner, San Diego Supercomputer Center
Jeff Johnson, Aeon Computing

SAN DIEGO SUPERCOMPUTER CENTER Data Oasis
High performance, high capacity Lustre-based parallel file system
10GbE I/O backbone for all of SDSC's HPC systems, supporting multiple architectures
Integrated by Aeon Computing using their EclipseSL: a scalable, open platform design
Driven by the 100 GB/s bandwidth target for Gordon
Motivated by $/TB and $/GB/s: $1.5M bought roughly 4 PB of capacity delivering 100 GB/s
6.4 PB capacity and growing
Currently Lustre 1.8.7
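As a rough illustration of the cost metrics above, the sketch below works out the implied $/TB and $/(GB/s) figures and the per-server bandwidth needed to hit the 100 GB/s target across 64 OSSes. The script and the assumption of an even split across servers are ours, not part of the original deck.

```python
# Back-of-the-envelope metrics for Data Oasis (numbers from the slides;
# the even-split assumption across OSSes is ours).
BUDGET_USD = 1.5e6      # $1.5M system cost
CAPACITY_TB = 4000      # ~4 PB delivered for that budget
BANDWIDTH_GBS = 100     # aggregate bandwidth target for Gordon
NUM_OSS = 64            # object storage servers

print(f"$/TB:         {BUDGET_USD / CAPACITY_TB:,.0f}")    # ~$375 per TB
print(f"$/(GB/s):     {BUDGET_USD / BANDWIDTH_GBS:,.0f}")  # ~$15,000 per GB/s
print(f"GB/s per OSS: {BANDWIDTH_GBS / NUM_OSS:.2f}")      # ~1.56 GB/s per server
```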

SAN DIEGO SUPERCOMPUTER CENTER Data Oasis Heterogeneous Architecture
[Architecture diagram. Recoverable details:]
64 OSS (Object Storage Servers), a mix of 72 TB and 108 TB servers, provide 100 GB/s performance and >4 PB raw capacity
JBODs (Just a Bunch Of Disks, 132 TB each) provide capacity scale-out
Redundant Arista 10G switches for reliability and performance
3 distinct network architectures connect the clients:
Gordon (IB cluster): 64 Lustre LNET routers, 100 GB/s
Trestles (IB cluster): Mellanox 5020 bridge, 12 GB/s
Triton (10G & IB cluster): Juniper 10G switches, XX GB/s
Metadata servers: Gordon scratch, Trestles scratch, Triton scratch, Gordon & Trestles project
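The LNET routers exist to bridge the InfiniBand clients onto the 10GbE network where the OSSes live. As a rough illustration of that idea (not SDSC's actual configuration), the snippet below prints the kind of lnet module options an o2ib client and a router might carry; interface names, NIDs, and network labels are placeholders.

```python
# Illustrative LNET routing setup for an IB client reaching Ethernet-attached
# OSSes through routers, in the spirit of the architecture above. Example
# module options only; NIDs and interfaces are hypothetical.
client_opts = (
    'options lnet networks="o2ib0(ib0)" '
    'routes="tcp0 10.10.0.[1-64]@o2ib0"'   # reach tcp0 via 64 router NIDs
)
router_opts = (
    'options lnet networks="o2ib0(ib0),tcp0(eth0)" forwarding="enabled"'
)
print("client modprobe.d entry:\n ", client_opts)
print("router modprobe.d entry:\n ", router_opts)
```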

SAN DIEGO SUPERCOMPUTER CENTER File Systems

SAN DIEGO SUPERCOMPUTER CENTER Data Oasis Servers
[Server configuration diagram. Recoverable details:]
MDS (active) and MDS (backup): LSI RAID 10 (2x6), Myri10GbE
OSS: LSI RAID 6 (7+2), Myri10GbE, 2 TB drives
OSS+JBOD: LSI RAID 6 (8+2) x4, Myri10GbE, 3 TB drives
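To make the RAID 6 notation concrete, the sketch below computes raw versus usable capacity for one 72 TB OSS. The grouping of the server's drives into four 7+2 sets of 2 TB disks is our assumption, chosen because it matches the 72 TB raw figure on the slides.

```python
# Capacity sketch for one Data Oasis OSS. RAID 6 (7+2) = 7 data + 2 parity
# disks per group; four such groups of 2 TB drives is an assumption that
# matches the 72 TB raw figure, not a documented layout.
def raid6_capacity(groups, data_disks, parity_disks, drive_tb):
    """Return (raw, usable) TB for a server built from identical RAID 6 groups."""
    raw = groups * (data_disks + parity_disks) * drive_tb
    usable = groups * data_disks * drive_tb
    return raw, usable

raw, usable = raid6_capacity(groups=4, data_disks=7, parity_disks=2, drive_tb=2)
print(f"raw: {raw} TB, usable: {usable} TB")  # raw: 72 TB, usable: 56 TB
```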

SAN DIEGO SUPERCOMPUTER CENTER Trestles Architecture
[Cluster diagram. Recoverable details:]
324 compute nodes on a QDR (40 Gb/s) InfiniBand switch
GbE management and GbE public networks; 10GbE toward storage
Login nodes (2x) with round-robin login and a redundant front end; management nodes (2x)
NFS servers (4x), mirrored, shared between Gordon and Trestles
Data movers (4x) connecting to the XSEDE & R&E networks and the SDSC network
Data Oasis Lustre PFS, 4 PB, reached through an IB/Ethernet bridge switch
Arista (2x MLAG) GbE switch

SAN DIEGO SUPERCOMPUTER CENTER Gordon Network Architecture
[Cluster diagram. Recoverable details:]
Compute and I/O nodes on dual-rail QDR (40 Gb/s) InfiniBand, arranged as two 3D torus rails
I/O nodes with dual 10GbE connections to storage via Arista 10 GbE switches
GbE management and GbE public networks (management and public edge & core Ethernet)
Login nodes (4x) with round-robin login and a redundant front end; management nodes (2x)
NFS servers (4x), mirrored
Data movers (4x) connecting to the XSEDE & R&E networks and the SDSC network
Data Oasis Lustre PFS, 4 PB
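A quick sanity check on the storage path implied by the two architecture slides: 64 I/O nodes acting as LNET routers (the count from the Data Oasis slide), each with dual 10GbE, give the raw Ethernet capacity computed below. The script and the assumption that all routers contribute evenly are ours.

```python
# Raw Ethernet capacity of Gordon's path to Data Oasis, assuming 64 LNET
# routers (from the Data Oasis architecture slide), each with dual 10GbE.
ROUTERS = 64
LINKS_PER_ROUTER = 2
LINK_GBPS = 10                                         # 10 Gigabit Ethernet, Gb/s

raw_gbs = ROUTERS * LINKS_PER_ROUTER * LINK_GBPS / 8   # convert Gb/s to GB/s
print(f"raw Ethernet capacity: {raw_gbs:.0f} GB/s")    # 160 GB/s
print(f"headroom over target:  {raw_gbs - 100:.0f} GB/s vs. the 100 GB/s goal")
```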

SAN DIEGO SUPERCOMPUTER CENTER Gordon Network Design Detail

SAN DIEGO SUPERCOMPUTER CENTER Data Oasis Performance – Measured from Gordon

SAN DIEGO SUPERCOMPUTER CENTER Issues & The Future
LNET "death spiral": LNET TCP peers stop communicating and packets back up
We need to upgrade to Lustre 2.x soon; can't wait for the MDS SMP improvements & DNE
Design drawback: juggling data is a pain
Client virtualization testing: SR-IOV looks very promising for o2ib clients
Watching the Fast Forward program: Gordon's architecture is ideally suited to burst buffers
HSM: we really want to tie Data Oasis to the SDSC Cloud
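One way an operator might watch for the "death spiral" described above is to periodically `lctl ping` each router and flag the peers that stop answering. The sketch below shows that idea; the NID list, timeout, and polling interval are hypothetical and would need to match the real router inventory.

```python
# Minimal watchdog sketch for quiet LNET peers: `lctl ping` each router NID
# and report the unresponsive ones. NIDs, timeout, and interval are
# placeholders, not SDSC's configuration.
import subprocess
import time

ROUTER_NIDS = [f"10.10.0.{i}@o2ib0" for i in range(1, 65)]  # hypothetical NIDs

def ping(nid: str, timeout: int = 10) -> bool:
    """Return True if the LNET peer answers `lctl ping` within the timeout."""
    result = subprocess.run(
        ["lctl", "ping", nid, str(timeout)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

while True:
    dead = [nid for nid in ROUTER_NIDS if not ping(nid)]
    if dead:
        print(f"unresponsive LNET peers: {', '.join(dead)}")
    time.sleep(300)  # check every five minutes
```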