Accelerating High Performance Cluster Computing Through the Reduction of File System Latency
David Fellinger, Chief Scientist, DDN Storage
©2015 DataDirect Networks, Inc. All rights reserved.

Accelerating HPC Applications
Traditionally, where has the focus been? In large clusters, primarily on the cluster itself:
- Lower-latency interconnects
- More efficient message-passing structures
- Higher-performance processors & GPUs
Also, research & study on processing techniques to achieve true parallel processing operations: symmetric multi-processing vs. efficient message passing.

Today’s Challenge: Bulk I/O Latency
[Diagram: HPC job execution alternates between compute-processing phases and compute-waiting I/O phases.]

What’s Needed...
[Diagram: the same HPC jobs with compressed I/O phases, giving a reduced time to solution.]
Benefit: get the answer faster by reducing I/O cycle time without adding additional compute hardware.
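One way to make the benefit concrete (a back-of-the-envelope model added here, not from the deck): treat a job as $N$ alternating compute and I/O phases, so that

$$T_{\text{job}} = N\,\bigl(T_{\text{compute}} + T_{\text{I/O}}\bigr).$$

Shrinking $T_{\text{I/O}}$ then shortens the wall clock linearly with no new compute hardware; for example, with $N = 10$, $T_{\text{compute}} = 50$ min and $T_{\text{I/O}} = 10$ min, the job takes 600 min, while compressing each I/O phase to 2 min brings it down to 520 min.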

Other Industry Approaches
- Old way: a single FTP server (regardless of its performance & availability).
- New way: data sharing over P2P and grid, utilizing parallelism for network transfer efficiency.

Why is HPC living in the "Internet '80s"? This phone is way too heavy to have only one conversation at a time.

Is a Parallel File System Really Parallel?
[Diagram: compute nodes/clients reach the parallel file system's OSSs/OSTs and MDS/MDT over an IB network.]
Deterministic write schema: not parallel, it's a bottleneck! Each client competes to talk to a single OSS (a serial operation).
What's needed? A write-anywhere file system is the next evolution.
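A minimal sketch of why a deterministic write schema serializes clients (hypothetical stripe parameters, not Lustre source): every client derives the target object purely from the file layout, so writes to neighboring offsets of a shared file converge on the same OSS/OST and queue up behind each other.

```python
# Illustration of deterministic, layout-driven placement as used by
# stripe-based parallel file systems; not actual Lustre code.
from collections import Counter

STRIPE_SIZE = 1 << 20   # 1 MiB stripes (hypothetical layout)
STRIPE_COUNT = 4        # file striped across 4 OSTs

def target_ost(offset: int) -> int:
    """Every client computes the same OST from the file offset alone."""
    return (offset // STRIPE_SIZE) % STRIPE_COUNT

# 16 clients each writing a consecutive 256 KiB record of one shared file:
writes = [client * 256 * 1024 for client in range(16)]
print(Counter(target_ost(off) for off in writes))
# e.g. Counter({0: 4, 1: 4, 2: 4, 3: 4}) -- within each OST, the four
# clients still contend for the same object and its lock, one at a time.
```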

File Creation in Lustre®
1. Ext4 extended write (I make a call to a Lustre client)
2. Redirect the request to the metadata server
3. The metadata server returns the OST number & inode number and executes locks
4. Begin the classic write operation: inode to file location table; ext4 gathers blocks from garbage collection into the extent list
5. Metadata server assignment of the handle; then the lock is released
But... where's the FATs?
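The same sequence as a schematic simulation (all names are hypothetical stand-ins, not the actual Lustre client API): the point is that steps 1-5 form a serialized metadata round trip that every new file must complete before data can land.

```python
# Schematic of the file-creation path described above; illustrative only.

class MetadataServer:
    def __init__(self):
        self.next_inode = 0

    def create(self, path):
        # Step 3: return an OST number and an inode number, and take the lock.
        self.next_inode += 1
        return {"ost": hash(path) % 4, "inode": self.next_inode, "locked": True}

class LustreClientSketch:
    def __init__(self, mds):
        self.mds = mds

    def write_new_file(self, path, data):
        # Steps 1-2: the extended-write call is redirected to the MDS.
        layout = self.mds.create(path)
        # Step 4: gather free blocks into an extent list and write them out.
        extent = {"inode": layout["inode"], "length": len(data)}
        # Step 5: the MDS assigns the handle, then the lock is released.
        layout["locked"] = False
        return layout["ost"], extent

mds = MetadataServer()
client = LustreClientSketch(mds)
print(client.write_new_file("/scratch/out.dat", b"x" * 4096))
```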

IME Makes a Parallel File System Parallel
[Diagram: an IME tier sits on the IB network between the compute nodes/clients and the parallel file system's OSSs/OSTs and MDS/MDT.]
Write-anywhere schema: now parallel data access. The POSIX-semantics and PFS bottleneck is broken!

File Creation in IME®
The magic of write anywhere: now the PFS is parallel! Every compute node can write file increments to every storage node, along with metadata & erasure coding: true parallel file creation.
In IME, we've implemented a DHT. Now the compute nodes are sharing files the same way Napster users do.
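A minimal sketch of the write-anywhere idea behind a DHT (a generic hash-placement illustration, not IME's actual scheme): each compute node hashes its own file increments onto the storage nodes independently, so no single server sits on the critical path of file creation.

```python
# Generic hash-based placement of file increments across storage nodes;
# node names and the hashing scheme are hypothetical, not IME internals.
import hashlib

STORAGE_NODES = [f"ime{i}" for i in range(8)]

def place_fragment(file_id: str, offset: int) -> str:
    """Any client can compute a fragment's owner without a metadata server."""
    key = f"{file_id}:{offset}".encode()
    digest = int.from_bytes(hashlib.sha1(key).digest()[:8], "big")
    return STORAGE_NODES[digest % len(STORAGE_NODES)]

# Each compute node writes wherever its own increments hash to, in parallel:
for offset in range(0, 4 << 20, 1 << 20):
    print(offset, "->", place_fragment("checkpoint.h5", offset))
```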

Introducing IME®: Technical Key Components and Operations
- Transparent to apps: a thin IME client resides on cluster compute or I/O nodes.
- Standards embraced: traditional Lustre® (or GPFS™) client model with an MDS (or NSD) interface.
- Application transparency: the parallel file system (PFS) is unaware that IME is an intermediary.
- No source modifications: interfaces include POSIX, MPI-IO, ROMIO, HDF5, NetCDF, etc.
- Breakthrough performance & scalability: low-level communications use a key-value-pair-based protocol, not POSIX.
- Intelligent file management: the IME server software does all the heavy lifting.
[Diagram: POSIX and MPI-IO applications pass through IME to an EXAScaler™ & GRIDScaler™ PFS.]
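To illustrate the "key-value-pair-based protocol, not POSIX" point (a hypothetical encoding, not DDN's wire format): a buffered write becomes a put of a (file, offset) key carrying the data fragment and its integrity information, which needs no POSIX byte-range locking between clients.

```python
# Hypothetical key-value framing of a buffered write; illustrative only --
# the real IME protocol and its erasure coding are not described in the deck.
import zlib

store = {}  # stands in for one IME server's key-value store

def kv_put(file_id: str, offset: int, data: bytes) -> None:
    # Key is (file, offset); the value carries the fragment plus a checksum.
    # A real server would also hold erasure-coded parity for rebuilds.
    store[(file_id, offset)] = {"data": data, "crc": zlib.crc32(data)}

def kv_get(file_id: str, offset: int) -> bytes:
    entry = store[(file_id, offset)]
    assert zlib.crc32(entry["data"]) == entry["crc"]
    return entry["data"]

kv_put("results.nc", 0, b"header" + b"\x00" * 58)
print(kv_get("results.nc", 0)[:6])
```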

IME Benchmarking
[Chart: IOR MPI-IO benchmark results.]

IME Benchmarking Results
Description: HACC-IO is an HPC cosmology kernel.
"Problem applications" choke parallel file systems & slow down the entire cluster. Obstacles: PFS locking, small I/O, and misaligned, fragmented I/O patterns.
IME eliminates POSIX contentions, enabling I/O to run at line rate: 32x I/O acceleration, from 2.2 GB/s to 62.8 GB/s.
[Chart: throughput rises from roughly 2 GB/s on the PFS alone to roughly 63 GB/s with IME.]

Moral of the Story
An HPC cluster = a large group of data users. Why haven't we learned what the Internet P2P guys have known for a long time? Learn to share the precious resources of network bandwidth & storage!

Thank You! Keep in touch with us.
David Fellinger, Chief Scientist, DDN Storage
DataDirect Networks, Patrick Henry Drive, Santa Clara, CA
ddn.com