Slide 1: Lustre Networking with OFED
Q2 2007
Andreas Dilger, Principal System Software Engineer
Cluster File Systems, Inc.
Copyright © 2006, Cluster File Systems, Inc.

Slide 2: Topics
- Lustre deployment overview
- Lustre network implementation
- Summary of what CFS has accomplished with OFED (scalability, performance)
- Problems we've run into lately with OFED
- Future plans for OFED and LNET
- Lustre now and future

Slide 3: Lustre Deployment Overview
(Diagram; the recoverable labels are listed below.)
- Lustre clients (10's to 10,000's)
- Lustre metadata servers (MDS): MDS 1 (active) and MDS 2 (standby), drawn as a pool of metadata servers; shared storage enables failover
- Lustre object storage servers (OSS) (100's): OSS 1 through OSS 7, backed by commodity storage servers or enterprise-class storage arrays & SAN fabrics
- Simultaneous support of multiple network types (GigE, InfiniBand, Elan, Myrinet, etc.), with routers joining the networks
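To make the picture concrete: a client mounts the file system by naming the MGS/MDS NID on whichever LNET network connects them, and a second NID names the failover partner shown above. A minimal sketch in Lustre 1.6-style mount syntax; the addresses and file system name are hypothetical:

    # Hypothetical client mount: MGS at 10.0.0.1 on the o2ib0 (OFED IB)
    # network, failover partner at 10.0.0.2, file system named "testfs".
    mount -t lustre 10.0.0.1@o2ib0:10.0.0.2@o2ib0:/testfs /mnt/testfs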

Slide 4: Lustre Network Implementation
Network features:
- Scalability: networks of 10,000's of nodes
- Support for multiple network types: TCP; InfiniBand (many flavors); Elan3 and Elan4; Myricom GM and MX; Cray SeaStar and RapidArray
- Routing between networks (see the configuration sketch below)
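Routing between networks is configured declaratively: every node names the LNET networks it sits on, and non-routed nodes name a gateway for the networks they cannot reach. A minimal modprobe.conf sketch, assuming a router node with one OFED IB interface and one GigE interface; the interface names and gateway address are hypothetical:

    # Router node: lives on both networks and forwards between them.
    options lnet networks="o2ib0(ib0),tcp0(eth0)" forwarding=enabled

    # TCP-only client: reach the o2ib0 network through the router's
    # hypothetical tcp0 NID.
    options lnet networks="tcp0(eth0)" routes="o2ib0 192.168.1.1@tcp0"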

Slide 5: Modular Network Implementation
Layered stack, with the slide's legend (portable Lustre component / not portable / not supplied by CFS) applied to each layer:
- Lustre request processing (portable): service framework and request dispatch; connection and address naming; generic recovery infrastructure
- Lustre RPC (portable): request is queued; optional bulk data via RDMA; reply via RDMA; teardown; zero-copy marshalling libraries
- Lustre Networking (LNET) (portable): supports multiple network types; network-independent; asynchronous post, then completion event; message passing / RDMA; routing (see the sketch below)
- Lustre Network Drivers (LNDs) (not portable): one per network type
- Vendor network device libraries (not portable, not supplied by CFS)
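LNET's "asynchronous post, then completion event" model is the classic Portals pattern: bind a memory descriptor over a buffer, post a put, and reap completion as an event. A schematic C sketch of that pattern, using the LNET API names from the Lustre tree (LNetEQAlloc, LNetMDBind, LNetPut, LNetEQPoll); the exact signatures, portal index, match bits, and peer PID are assumptions for illustration, not a buildable kernel module:

    /* Schematic sketch of LNET's post/completion-event pattern.
     * Assumes in-tree LNET headers; constants below are hypothetical. */
    #include <lnet/api.h>      /* assumption: Lustre/LNET in-tree header */
    #include <string.h>        /* memset (linux/string.h in-kernel) */

    static int lnet_put_sketch(lnet_nid_t peer, void *buf, unsigned int len)
    {
            lnet_process_id_t target = { .nid = peer, .pid = 12345 /* hypothetical */ };
            lnet_handle_eq_t  eq;
            lnet_handle_md_t  mdh;
            lnet_md_t         md;
            lnet_event_t      ev;
            int               which;

            /* Event queue that will receive the SEND/ACK completion events. */
            LNetEQAlloc(8, LNET_EQ_HANDLER_NONE, &eq);

            /* Describe the outgoing buffer as a memory descriptor (MD). */
            memset(&md, 0, sizeof(md));
            md.start     = buf;
            md.length    = len;
            md.threshold = 2;            /* one SEND event plus one ACK event */
            md.eq_handle = eq;
            LNetMDBind(md, LNET_UNLINK, &mdh);

            /* Post the request; the LND for the target's network moves the
             * bytes, by RDMA where the fabric supports it. */
            LNetPut(LNET_NID_ANY, mdh, LNET_ACK_REQ, target,
                    0 /* portal */, 0 /* match bits */, 0 /* offset */, 0);

            /* Completion arrives asynchronously as an event on the queue. */
            return LNetEQPoll(&eq, 1, 1000 /* ms */, &ev, &which);
    }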

Slide 6: Multiple Interfaces and LNET
(Diagram: a server with two interfaces, one on the vib0 network rail and one on the vib1 network rail; clients are split between the vib0 network and the vib1 network through a switch.)
Support through:
- Multiple Lustre networks on one or two physical networks
- Static load balance (now; sketch below)
- Dynamic load balance and failover (future)
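The static load balance above falls out of plain LNET configuration: declare one Lustre network per rail on the server and split the clients between the two networks. A minimal modprobe.conf sketch; the slide labels the rails vib0/vib1 (Voltaire LND), while this sketch uses the OFED LND's o2ib naming to match the talk's theme, and the interface names are hypothetical:

    # Server: one Lustre network per InfiniBand rail.
    options lnet networks="o2ib0(ib0),o2ib1(ib1)"

    # Half of the clients attach to rail 0...
    options lnet networks="o2ib0(ib0)"

    # ...and the other half to rail 1.
    options lnet networks="o2ib1(ib0)"

Because each client sees only one of the two Lustre networks, traffic divides across the rails without any dynamic balancing.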

Slide 7: OFED Accomplishments by CFS
Customers testing OFED 1.1 with Lustre:
- TACC Lonestar
- Dresden
- MHPCC
- LLNL Peloton: more than 500 clients on 2 production clusters
- Sandia
- NCSA Lincoln: 520 clients (OFED 1.0)
OFED 1.1 is supported in Lustre and beyond.

Slide 8: OFED Accomplishments by CFS
OFED 1.1 network performance attained in tests:
- Test systems with a PCI-X bus: MB/s point to point
- Test systems with a PCI-Express bus: MB/s (testing done at LLNL)

Slide 9: Problems (OFED 1.1) and Wishlist
- Multiple HCAs cause ARP mix-up with IPoIB (#12349)
- Data corruption with memfree HCA and FMR (#11984)
- Duplicate completion events (#7246)
- FMR performance improvement: we would really like to use this

Slide 10: Future Plans for LNET & OFED
- Scale to 1000's of IB clients as such systems become available
- Currently awaiting final changes to the OFED 1.2 API before final LNET integration and testing

Slide 11: Questions / Thank You
OFED/IB-specific questions to: Eric Barton

Slide 12: What Can You Do with Lustre Today?

  Features:     Quota, failover, POSIX, POSIX ACLs, secure ports
  Varia:        Training: Level 1, 2 & Internals; certification for Level 1
  Capacity:     Number of files: 2B; file system size: 32PB or more; max file size: 1.2PB
  Networks:     Native support for many different networks, with routing
  # servers:    Metadata servers: 1 + failover; OSS servers: tested up to 450, with up to 4,000 OSTs
  Performance:  Single client or server: 2 GB/s+; BlueGene/L first week: 74M files, 175TB written; aggregate I/O (one file system): ~130 GB/s (PNNL); pure metadata operations: ~15,000 ops/second
  Stability:    Software reliability on par with hardware reliability; increased failover resiliency
  # clients:    25,000 (Red Storm); 130,000 processes (BlueGene/L); Lustre root file systems are possible

Slide 13: Done, or On Its Way to Release

Other:
- Large ext3 partition (8TB) support (1.4.7)
- Very powerful new ext4 disk allocator (1.6.1)
- Dramatic Linux software RAID5 performance improvements
- Linux pCIFS client: in beta today

Lustre:
- Clients require no Linux kernel patches (1.6.0)
- Dramatically simpler configuration (1.6.0)
- Online server addition (1.6.0)
- Space management (1.6.0)
- Metadata performance improvements (1.4.7 & 1.6.0)
- Recovery improvements (1.6.0)
- Snapshots & backup solutions (1.6.0)
- CISCO, OpenFabrics IB (up to 1.5 GB/sec!) (1.4.7)
- Much improved statistics for analysis (1.6.0)
- Snapshot file systems (1.6.0)
- Backup tools (1.6.1)

Slide 14: Intergalactic Strategy
(Roadmap diagram: Lustre v1.4, v1.6, v1.8, v1.10 (Q1 2008), v2.0, and v3.0, plotted against the axes HPC Scalability and Enterprise Data Management.)
Features along the way:
- Online server addition, simple configuration, patchless client, run with Linux RAID
- 5-10X MD perf, pools, Kerberos, Lustre RAID, Windows pCIFS, clustered MDS
- 1 PFlop systems, 1 trillion files, 1M file creates/sec, 30 GB/s mixed files, 1 TB/s
- Snapshots, optimized backups, HSM, network RAID
- 10 TB/sec, WB caches, small files, proxy servers, disconnected operation