
Using Pattern-Models to Guide SSD Deployment for Big Data in HPC Systems. Junjie Chen¹, Philip C. Roth², Yong Chen¹. ¹Data-Intensive Scalable Computing Lab (DISCL), Department of Computer Science, Texas Tech University. ²Oak Ridge National Laboratory.

Background: HPC applications are increasingly data-intensive. Scientific simulations have already reached 100 TB – 1 PB of data volume, projected to reach 10 PB – 100 PB for upcoming exascale systems. Data collected from instruments is growing rapidly as well; a global climate model with a 100 × 120 km grid cell manages PBs of data. Such a trend brings a critical challenge: efficient I/O access demands a highly efficient storage system.

Storage Media: Over 90% of all data in the world is stored on magnetic media (hard disk drives, HDDs). IBM invented the HDD in 1956, and its mechanism has remained the same since then: various mechanical moving parts. Drawbacks: high latency, slow random access performance, unreliable, power hungry. Strengths: large capacity, low cost (USD 0.10/GB), impressive sequential access performance.

Emerging Storage Media: Non-volatile storage-class memory (SCM): flash-memory-based Solid State Drives (SSDs), PCRAM, NRAM, … SSDs use microchips that retain data in non-volatile memory (an array of floating-gate transistors isolated by an insulating layer). Strengths: superior performance, high bandwidth, low latency (especially for random accesses), less susceptible to physical shock, power efficient. Drawbacks: low capacity, high cost per GB, block erasure, wear-out (10K–100K P/E cycles). Example device: Intel® X25-E SSD.

Motivation: The challenge of leveraging SSDs and maximizing their benefits remains daunting. Deploying SSDs on different nodes can have different impacts, and the fixed hardware budget needs to be considered. A cost-effective deployment decision needs to be made at the design/deployment phase of HPC systems. [Slide figure: compute nodes with local SSD storage, connected via an interconnect.]

Our Study: We investigate different deployment strategies (compute side and storage side), considering the characteristics of SSDs, deployment ratios, and access patterns under a fixed hardware budget. Pattern-Model Guided Deployment Approach: considers the I/O access pattern of workloads and SSD characteristics via a performance model.

Our Contributions: We propose a pattern-model guided deployment approach. We introduce a performance model to quantitatively analyze different SSD deployment strategies. We try to answer the question of how SSDs should be utilized for big data applications in HPC systems. We have carried out initial experimental tests to verify the proposed approach.

Pattern-Model Guided Approach: [Slide diagram: workload characterization (I/O requests, operation types, workload size, spatial pattern) and storage configuration (storage arrays, parallel file system, HDDs, SSDs) feed an analytical model, which drives strategy mapping.]

Workload I/O Access Pattern: Workload characterization captures the request size, I/O operation type, spatial pattern, and ratio of local requests to remote requests. These are either given or obtained from I/O characterization tools such as Darshan and IOSIG. Strategy mapping (Analysis i -> Strategy j): for a specific pattern, give a specific deployment strategy, as in the sketch below.
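To make the mapping step concrete, here is a minimal, hypothetical sketch of an "Analysis i -> Strategy j" rule. The thresholds, field names, and strategy labels are illustrative assumptions, not the mapping actually used in this work.

```python
# Hypothetical sketch of the "Analysis i -> Strategy j" mapping step.
# Thresholds and strategy labels are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class WorkloadPattern:
    request_size_kb: float   # typical I/O request size
    random_fraction: float   # 0.0 = fully sequential, 1.0 = fully random
    local_ratio: float       # gamma: share of requests serviced locally

def choose_deployment(p: WorkloadPattern) -> str:
    """Map an observed I/O access pattern to an SSD deployment strategy."""
    if p.local_ratio > 0.75 and p.random_fraction > 0.5:
        return "compute-side"      # mostly local, random I/O: node-local SSDs pay off
    if p.local_ratio < 0.25:
        return "storage-side"      # mostly remote I/O: spend the SSD budget on storage nodes
    return "compute-storage"       # mixed pattern: split the budget

print(choose_deployment(WorkloadPattern(4.0, 0.8, 0.9)))   # -> compute-side
```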

Performance Model: R is the total response time, R_local is the local response time, R_remote is the remote response time, and R_inter is the time spent on the interconnect. W is the workload and B is the aggregate bandwidth.
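The formula itself appeared as an image on the original slide and is not reproduced in this transcript; a plausible reconstruction from the definitions above (an assumption, not necessarily the slide's exact expression) is:

```latex
R = R_{\mathrm{local}} + R_{\mathrm{remote}} + R_{\mathrm{inter}},
\qquad
R_{x} \approx \frac{W_{x}}{B_{x}}
```

where W_x is the portion of the workload W served over path x and B_x is the aggregate bandwidth of that path.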

Performance Model (cont.): We characterize the three different response times respectively and estimate the total response time. Model parameters: L_ssd, the latency of an SSD; L_hdd, the latency of an HDD; γ, the percentage of the workload serviced locally; ω, the available capacity of SSDs; p, the percentage of the SSD budget deployed on compute nodes.
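Again, the slide's equations are not captured in this transcript. One illustrative way the parameters above could enter the local term (purely an assumed sketch, not the authors' exact model) is:

```latex
R_{\mathrm{local}} \approx \gamma W \left[ \frac{f}{B_{\mathrm{ssd}}} + \frac{1-f}{B_{\mathrm{hdd}}} \right],
\qquad
f = \min\!\left(1, \frac{\omega}{\gamma W}\right)
```

where f is the fraction of the locally serviced data that fits within the node's available SSD capacity ω, and ω itself grows with p, the share of the SSD budget deployed on compute nodes.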

Performance Model (cont.): Trade-off analysis: C is the SSD capacity that one compute node can utilize, G is the SSD budget, and n is the number of compute nodes. Compute-side deployment: all SSDs on compute nodes. Storage-side deployment: all SSDs on storage nodes. Compute-storage deployment: SSDs on both types of nodes. A numerical sketch of this trade-off follows.
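A small numerical sketch of how the three strategies could be compared under a fixed SSD budget. The bandwidth figures and the simple linear response-time model are assumptions for illustration only; they are not the paper's model or measured results.

```python
# Illustrative trade-off sketch under a fixed SSD budget G.
# All numbers and the blended-bandwidth model are assumptions.

def per_node_ssd_capacity(G, p, n):
    """C = p*G/n: SSD capacity each of n compute nodes receives when a
    fraction p of the total SSD budget G (in GB) goes to the compute side."""
    return p * G / n

def estimated_response(W, gamma, p, b_ssd=500.0, b_hdd=150.0, b_net=1000.0):
    """Rough total response time (s) for a workload of W MB with local ratio gamma."""
    b_local = p * b_ssd + (1.0 - p) * b_hdd        # compute-side bandwidth blend
    b_remote = (1.0 - p) * b_ssd + p * b_hdd       # storage-side bandwidth blend
    return gamma * W / b_local + (1.0 - gamma) * W / b_remote + W / b_net

G, n, W, gamma = 10_000.0, 64, 100_000.0, 0.25
for name, p in [("compute-side", 1.0), ("storage-side", 0.0), ("compute-storage", 0.5)]:
    C = per_node_ssd_capacity(G, p, n)
    print(f"{name:15s} p={p:.1f}  C={C:7.1f} GB/node  R={estimated_response(W, gamma, p):7.1f} s")
```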

Preliminary Results and Analysis: IOR: measured the aggregate bandwidth/execution time; the file size is varied, and sequential read/write and random read/write performance is tested. MPI-IO Test: measured the aggregate bandwidth/execution time with different file sizes and operation types (sequential read/write, random read/write). Both benchmarks use the ratio γ = 1/4.

Preliminary Results and Analysis (cont.): [IOR result charts.]

Preliminary Results and Analysis (cont.): [MPI-IO Test result charts.]

Conclusion: Flash-memory-based SSDs are promising storage devices in the storage hierarchy of HPC systems. Different SSD deployment strategies can impact performance given a fixed hardware budget. We proposed a pattern-model guided approach that models the performance impact of various deployment strategies, considers workload characterization and device characteristics, and maps them to a deployment strategy. This study provides a possible solution for guiding such placement and deployment strategies.

Ongoing and Future Work: A unified HPC storage system managing heterogeneous devices. We study the needs of a well-managed, unified heterogeneous storage system for HPC workloads. We propose a working-set based reorganization scheme (WS-ROS) to exploit the capabilities of SSDs and HDDs and provide a highly efficient storage system for HPC workloads.

ACKNOWLEDGEMENT: This research is sponsored in part by the Advanced Scientific Computing Research program, Office of Science, U.S. Department of Energy. This research is also sponsored in part by a Texas Tech University startup grant and by the National Science Foundation under NSF grant CNS. The work was performed in part at the Oak Ridge National Laboratory, which is managed by UT-Battelle, LLC under Contract No. DE-AC05-00OR. Accordingly, the U.S. Government retains a non-exclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes. Thank you. Please visit our website: Q&A