
Slide 1: 2 TIERS™ Model: Performance of Flash, with Capacity and Enterprise Reliability of Object Stores
Percy Tzelnic, Office of the CTO, EMC
© Copyright 2015 EMC Corporation. All rights reserved. Under NDA.

Slide 2: Long Static, HPC Storage Now Changing Rapidly
[Diagram: the traditional stack of Supercomputer, disk-based parallel file system, and tape archive, now shifting toward the object store]

Slide 3: HPC Research Leads To Enterprise Solutions
EMC developed open source IOD, an exascale I/O technology:
– Based on 2011 CRADA research on the Burst Buffer
– DOE-funded exascale storage research, 2012 to 2014 (Fast Forward)
– Semantic data storage, new storage APIs
– Small, fast burst buffers tiering to larger, slower object stores
– Repurposed as a simplified data architecture for the Enterprise
Technology trickle-down: 2 TIERS™
– Fast acceleration tier; large capacity tier
– Performance of flash; retention and capacity of a data lake
– Global POSIX namespace over one trillion objects
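The burst-buffer idea above — absorb write bursts in a small, fast tier, then drain them asynchronously to a larger, slower object store — can be sketched as a toy model. This is illustrative only, not EMC IOD code; all names are invented:

```python
import threading
import queue

class BurstBufferSketch:
    """Toy burst-buffer tiering: writes land in a small fast tier and a
    background thread drains them to a slower 'object store' (a dict)."""

    def __init__(self):
        self.fast_tier = {}       # small, fast: absorbs write bursts
        self.capacity_tier = {}   # large, slow: stands in for the object store
        self._drain_q = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def write(self, name, data):
        # The application only ever sees fast-tier latency.
        self.fast_tier[name] = data
        self._drain_q.put(name)

    def _drain(self):
        # Background: migrate cooled data down to the capacity tier.
        while True:
            name = self._drain_q.get()
            self.capacity_tier[name] = self.fast_tier.pop(name)
            self._drain_q.task_done()

    def flush(self):
        # Block until every queued write has been drained.
        self._drain_q.join()

bb = BurstBufferSketch()
bb.write("checkpoint.0001", b"simulation state")
bb.flush()
print("checkpoint.0001" in bb.capacity_tier)  # True: drained to capacity
```

The design point the slide makes is exactly this decoupling: the writer's critical path touches only flash-speed storage, while capacity-tier bandwidth is consumed off the critical path.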

Slide 4: Technology Context In The Enterprise As The 2nd Platform Evolves Towards The 3rd
The storage array is being disrupted:
– Flash replaces disk for 100x+ performance (flash array)
– Cloud replaces disk for 100x+ capacity (object store)
Moving older/cold data to the cloud is inevitable:
– Cloud cost is approaching $0 for data at rest
– Capacity disks move from arrays to the cloud, leaving the array as a flash-only Fast Tier, on-premise
We can no longer package performance and capacity in one box at an attractive price/value point:
– Split the two, hence 2 TIERS™ (Fast Tier and Capacity Tier)
2 TIERS™ is a game changer!

Slide 5: Fast Data Use Cases
1. Real-Time Analytics – high performance (low latency, high bandwidth)
2. Ingest Fast Data – high-speed, high-volume data ingest
3. Fast & Big Data Ecosystem – processed Fast Data exported to the Data Lake for Big Data analytics
4. Enterprise 2nd Platform Analytics – HPDA workloads (e.g., simulation)
– Enterprise Fast Data (ingest, real-time analytics) is an emerging market
– EMC IOD integrated with flash products (DSSD, ScaleIO) for Fast Data
– EMC IOD integrated with Data Lake products (Isilon, ECS) for capacity
EMC Solutions are Powered by Intel® Xeon® Processor Technology.

Slide 6: [Diagram: 2nd Platform vs. 3rd Platform storage]
– 2nd Platform: scale-up, no scale-out arrays (VNX, VMAX) – block and file, Fast (auto-tiering) and Capacity, O(1); file system, database, and data warehouse on top
– Intermediate: scale-out "array", O(100) – Cloud Array (TwinStrata), Isilon CloudPools
– 3rd Platform: block, file, and object; geo scale-out – scale-out Fast Tier, O(1,000), with policy tiering; hyperscale Capacity Tier, O(100,000); analytics (structured, unstructured, in-memory)

Slide 7: Related Work
– Many see the need for similar technology
– Products and open source are moving towards multi-personality Data Lakes over object stores
– This is good confirmation of two widely resonating concepts:
  – Object store for capacity, flash for performance!
  – But users think in folders, not objects: we need a hierarchical namespace
– Representing an object store with a huge number of objects as a hierarchical namespace is a challenge
– Closest approach: LANL MarFS
  – Shares origins with 2 TIERS™
  – Different market targets: extreme HPC (LANL) vs. Enterprise (EMC)

Slide 8: MarFS and 2 TIERS™
Similarities:
– Motivation: object store for capacity, but users expect a POSIX interface
– Basic architecture: a POSIX namespace served from a parallel file system; data stored in a one-trillion-object store
– Similar challenge: metadata performance for a POSIX namespace holding one trillion files
Differences:
– 2 TIERS™ has an acceleration tier for data performance
– Different, complementary techniques for metadata performance
[Diagram, the MarFS deployment at LANL: GPFS servers (NSD) holding metadata on dual-copy RAIDed enterprise-class HDD or SSD – the metadata may include some small data, such as object lists too large to fit in xattrs; batch File Transfer Agents (FTAs) mounting the GPFS archive over NFS, PanFS, Lustre, with the GPFS MDS hidden, running pftool, the object client, and PSI; object Data Lakes below]
MarFS BOF 165, jointly with Gary Grider, LANL – Wednesday, 5:30-7:00, Hilton Salon A: "Two Tiers Scalable Storage: Building POSIX-Like Namespaces with Object Stores"
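The parenthetical about object lists "too large to fit in xattrs" points at a common technique in this design space: keep a file's object list inline in an extended attribute when it is small, and spill it to a separate object when it is not. A minimal sketch of that idea, with invented names and plain dicts standing in for the inode xattrs and the object store (real per-value xattr limits are on the order of a few KB on most file systems):

```python
XATTR_LIMIT = 4096  # assumed per-value xattr cap; varies by file system

def store_object_list(inode_xattrs, object_store, path, object_ids):
    """Keep the object list inline in an xattr if it fits,
    otherwise spill it to its own object and store a pointer."""
    packed = ",".join(object_ids).encode()
    if len(packed) <= XATTR_LIMIT:
        inode_xattrs[path] = {"objlist": packed}
    else:
        spill_key = path + ".objlist"          # spill object holding the list
        object_store[spill_key] = packed
        inode_xattrs[path] = {"objlist.ref": spill_key.encode()}

def load_object_list(inode_xattrs, object_store, path):
    """Read the list back, following the spill pointer if present."""
    attrs = inode_xattrs[path]
    if "objlist" in attrs:
        packed = attrs["objlist"]
    else:
        packed = object_store[attrs["objlist.ref"].decode()]
    return packed.decode().split(",")

xattrs, objects = {}, {}
store_object_list(xattrs, objects, "/ns/small", ["obj-1", "obj-2"])
store_object_list(xattrs, objects, "/ns/huge",
                  [f"obj-{i}" for i in range(2000)])
print(load_object_list(xattrs, objects, "/ns/small"))  # ['obj-1', 'obj-2']
print("objlist.ref" in xattrs["/ns/huge"])             # True: list was spilled
```

The point is that the metadata path stays one lookup for the common (small-file) case, while pathological files do not blow up the metadata store.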

Slide 9: Building Blocks Of 2 TIERS™
– A parallel file system supplies the hierarchical namespace – e.g., OrangeFS
– A flash-based acceleration tier – e.g., ScaleIO, DSSD
– A capacity scale-out data lake – e.g., ECS, Isilon
– A software package that binds them all together, tiering data and metadata; EMC will make this package open source
  – EMC IOD, 2 TIERS™ – Software Defined Storage

Slide 10: OrangeFS – The Choice For 2 TIERS™
– Stateless design of the underlying PVFS2
  – Lightweight, multi-threaded Linux kernel client module
  – Leverages Linux containers for additional scale and HA
– Modular design
  – Abstract key-value interface for metadata
  – Abstract storage interface for data
  – Abstract networking allows RDMA and IP
– Client changes NOT required
– Supports Windows and Macintosh clients
– Future roadmap: OrangeFS v3 – changes for cloud PaaS, consistent with our direction (2017+)
– OrangeFS is maintained and developed by Omnibond (Clemson, SC)
  – Agile and responsive open source community
  – Performance comparable to other parallel file systems
  – 4-5 years of production history
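The "abstract key-value interface for metadata" bullet is worth unpacking: namespace operations are written against a get/put interface, so the backing store (a database, LMDB, an in-memory table) can be swapped without touching the file-system logic. A small sketch of that shape — hypothetical, not the actual OrangeFS API:

```python
from abc import ABC, abstractmethod

class KVStore(ABC):
    """The abstraction: metadata code depends only on get/put."""
    @abstractmethod
    def get(self, key): ...
    @abstractmethod
    def put(self, key, value): ...

class MemKVStore(KVStore):
    """One pluggable backend; a DB-backed one would present the same API."""
    def __init__(self):
        self._d = {}
    def get(self, key):
        return self._d.get(key)
    def put(self, key, value):
        self._d[key] = value

# Namespace operations written purely against the abstraction:
def mkdir(kv: KVStore, path: str):
    kv.put(("dirent", path), {"type": "dir"})

def create(kv: KVStore, path: str, objects):
    kv.put(("dirent", path), {"type": "file", "objects": objects})

def lookup(kv: KVStore, path: str):
    return kv.get(("dirent", path))

kv = MemKVStore()
mkdir(kv, "/home")
create(kv, "/home/run.out", ["obj-42"])
print(lookup(kv, "/home/run.out"))  # {'type': 'file', 'objects': ['obj-42']}
```

The same modularity argument applies to the abstract storage and networking interfaces: each is a seam where 2 TIERS™ can plug in flash, object, RDMA, or IP backends.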

Slide 11: Unique Differentiation Of 2 TIERS™
1. Single file-system namespace with dynamically loadable namespace subsets (DLNs)
2. Tiering of both data and metadata
3. Fast Tier performance target: greater than 10x the capacity Data Lake
4. Direct (read-only) access to the Capacity Tier, bypassing the Fast Tier
5. 2 TIERS™ provides tiering and non-tiering modes
6. No client changes required
7. No changes to the products are required to instantiate flash as the Fast Tier and an object store as the Capacity Tier
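Points 1 and 2 above — dynamically loadable namespace subsets, with data and metadata tiered together — can be pictured as packing a subtree (its metadata plus its file contents) into a single versioned capacity-tier object, loading it whole, and evicting it back as a new version. A purely illustrative sketch with invented names:

```python
import json

def pack_dln(namespace, subtree):
    """Pack a namespace subtree (metadata + data) into one object body."""
    entries = {p: meta for p, meta in namespace.items()
               if p.startswith(subtree)}
    return json.dumps(entries).encode()

def load_dln(capacity, dln_name):
    """Load the newest version of a DLN from the capacity tier."""
    version = max(v for (n, v) in capacity if n == dln_name)
    return json.loads(capacity[(dln_name, version)]), version

def evict_dln(capacity, dln_name, version, namespace):
    """Evict a (possibly modified) DLN back as a new version."""
    capacity[(dln_name, version + 1)] = json.dumps(namespace).encode()

# Capacity tier keyed by (DLN name, version).
capacity = {("proj-a", 1): pack_dln(
    {"/proj-a/in.dat": {"data": "AAAA"},
     "/proj-a/cfg": {"data": "x=1"}},
    "/proj-a")}

ns, v = load_dln(capacity, "proj-a")        # promote into the fast tier
ns["/proj-a/out.dat"] = {"data": "result"}  # job reads/writes at flash speed
evict_dln(capacity, "proj-a", v, ns)        # persist as version v+1
print(sorted(capacity))                     # [('proj-a', 1), ('proj-a', 2)]
```

Because metadata travels inside the packed DLN, the one-trillion-entry global namespace never has to be resident anywhere: only the subsets a job actually touches are live in the Fast Tier.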

Slide 12: 2 TIERS™ Designed For The 3rd Platform
– Disaggregates the monolithic memory / storage / IO stack and recasts it into a loosely coupled Fast Tier and Capacity Tier
– 2 TIERS™ is open source Software Defined Storage
– 2 TIERS™ is all about independent scaling:
  – Scale-out for the Fast Tier, O(1,000)
  – Hyperscale for the Capacity Tier, O(100,000)
– 2 TIERS™ deploys equally well on the 2nd Platform, albeit limited to Enterprise scale

Slide 13: 2 TIERS™ Local Fast Tier Example
[Diagram: an app cluster of compute servers doubling as IO nodes (ION), each running App, ScaleIO, and EMC IOD over local flash, tiering down to Isilon / ECS Data Lakes]
Note: compute-server interconnect is RDMA, for best performance.

Slide 14: 2 TIERS™ Network Fast Tier Example
[Diagram: an app cluster of compute servers connected over RDMA to IO nodes (ION) running EMC IOD, attached to DSSD over a PCIe SAN, tiering down to Isilon / ECS Data Lakes]
Note: compute server and IOD interconnect is RDMA, for best performance.

Slide 15: Four 2 TIERS™ Instantiations
[Diagram of four configurations; labels from the slide:]
– Scale-out 2 TIERS™ (tiering): POSIX namespace with DLNs, Fast and Capacity Tiers – OrangeFS with the Syncer on DSSD / ScaleIO, over ECS / Isilon objects
– No Tiering (1): POSIX namespace – OrangeFS as the parallel file system for DSSD (PCIe)
– No Tiering (2): local FS clients (Mac, Windows, Linux) over NFS – OrangeFS on ScaleIO flash/HDD; fast (higher bandwidth than Isilon)
– No Tiering (3): POSIX namespace over Isilon objects

Slide 16: 2 TIERS™ Proof Of Concept
Demo 1:
1. Load several DLNs from objects in the Capacity Tier (ECS) into the Fast Tier (DSSD)
2. Run a life-sciences job – BLAST – on one of the DLNs, on an 8-server cluster
3. At job completion, evict the DLNs as objects to the Capacity Tier, with a new version for the DLN used in the run; leave the Fast Tier empty
Demo 2:
1. There is no Fast Tier
2. A Translator function maps the data and metadata the application requires, from a DLN contained in a Capacity Tier object, into files which the job accesses in local storage
3. At job completion, local storage is cleared entirely, while the modified files are written as new objects into the Capacity Tier
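Demo 2's Translator amounts to a read-through mapping: files materialize from the packed DLN object into local storage on first access, and at job completion local storage is discarded while modified files go back to the Capacity Tier as new objects. A toy model of that flow, with invented names (the real Translator sits behind a FUSE mount, as the next slide shows):

```python
import json

class TranslatorSketch:
    """Toy read-through translator: no Fast Tier; files are materialized
    from a packed DLN object into 'local storage' on demand."""

    def __init__(self, capacity_tier, dln_key):
        self.capacity = capacity_tier
        self.dln = json.loads(capacity_tier[dln_key])  # packed: path -> data
        self.local = {}     # local storage the job actually reads/writes
        self.dirty = set()  # paths modified by the job

    def read(self, path):
        if path not in self.local:        # read-through materialization
            self.local[path] = self.dln[path]
        return self.local[path]

    def write(self, path, data):
        self.local[path] = data
        self.dirty.add(path)

    def finish(self, job_name):
        # Modified files become new Capacity Tier objects; local is cleared.
        for path in self.dirty:
            self.capacity[f"{job_name}:{path}"] = self.local[path]
        self.local.clear()
        self.dirty.clear()

capacity = {"dln:proj": json.dumps({"/in.txt": "hello"})}
t = TranslatorSketch(capacity, "dln:proj")
t.write("/out.txt", t.read("/in.txt").upper())  # the "job"
t.finish("run1")
print(capacity["run1:/out.txt"])  # HELLO
print(t.local)                    # {} – local storage cleared
```

Compared with Demo 1, nothing is pre-loaded and nothing persists locally: the Capacity Tier alone is authoritative, which is what makes this mode viable without a Fast Tier at all.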

Slide 17: [Diagram: DLN lifecycle across the two tiers]
– Capacity Tier at time T0: packed DLNs, holding 2 TIERS™ metadata and 2 TIERS™ file data
– Time T0: load DLN d, version v, and promote it to the Fast Tier on distributed OrangeFS (flash hyperstub)
– Time T1: persist DLN d, version v+1, back to the Capacity Tier (promoted, new, and modified entries)
– Alternative path: read-only, read-through Translation Service on a local FUSE file system (App with local store)