CCGrid 2014 Improving I/O Throughput of Scientific Applications using Transparent Parallel Compression. Tekin Bicer, Jian Yin, and Gagan Agrawal, Ohio State University.

Presentation transcript:

CCGrid 2014 Improving I/O Throughput of Scientific Applications using Transparent Parallel Compression. Tekin Bicer, Jian Yin, and Gagan Agrawal. Ohio State University and Pacific Northwest National Laboratory. 1

CCGrid 2014 Introduction Increasing parallelism in HPC systems –Large-scale scientific simulations and instruments –Scalable computational throughput –Limited I/O performance Example: –PRACE-UPSCALE: 2 TB per day; expectation: 10-100 PB per day –Higher precision, i.e., more computation and data “Big Compute” Opportunities → “Big Data” Problems –Large volume of output data –Data read and analysis –Storage, management, and transfer of data Compression 2

CCGrid 2014 Introduction (cont.) Community focus –Storing, managing and moving scientific datasets Compression can further help –Decreased amount of data Increased I/O throughput Better data transfer –Increased simulation and data analysis performance But… –Can it really benefit application execution? Tradeoff between CPU utilization and I/O idle time –What about integration with scientific applications? Effort required by scientists to adapt their applications 3

CCGrid 2014 Scientific Data Management Libs. Widely used by the community –PnetCDF (NetCDF), HDF5… NetCDF Format –Portable, self-describing, space-efficient High-Performance Parallel I/O –MPI-IO optimizations: collective and independent calls, hints about the file system No Support for Compression 4

CCGrid 2014 Parallel and Transparent Compression for PnetCDF Parallel write operations –Size of data types and variables –Data item locations Parallel write operations with compression –Variable-size chunks –No a priori knowledge about the locations –Many processes write at once 5

CCGrid 2014 Parallel and Transparent Compression for PnetCDF Desired features while enabling compression: Parallel Compression and Write –Sparse and Dense Storage Transparency –Minimum effort from the application developer –Integration with PnetCDF Performance –Different variables may require different compression –Domain-specific compression algorithms 6

CCGrid 2014 Outline Introduction Scientific Data Management Libraries PnetCDF Compression Approaches A Compression Methodology System Design Experimental Result Conclusion 7

CCGrid 2014 Compression: Sparse Storage Chunks/splits are created The compression layer applies user-provided algorithms Compressed splits are written at their original offset addresses I/O can still benefit –Only compressed data is written No benefit for storage space 8

CCGrid 2014 Compression: Dense Storage Generated compressed splits are appended locally New offset addresses are calculated –Requires metadata exchange All compressed data blocks are written using a collective call The generated file is smaller –Advantages: I/O + storage space 9
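
A minimal sketch of how the metadata exchange for the dense layout could be realized: each process contributes the size of its compressed split, and an exclusive prefix sum yields its write offset. The slide only states that a metadata exchange is required; using MPI_Exscan for that exchange, and the function name dense_write_offset, are assumptions made for illustration.

```c
/* Sketch: deriving a rank's write offset for the dense-storage layout.
 * Assumption: the metadata exchange is an exclusive prefix sum over the
 * compressed split sizes (this is not taken from the paper's code). */
#include <mpi.h>

MPI_Offset dense_write_offset(MPI_Offset compressed_size,
                              MPI_Offset base_offset, MPI_Comm comm)
{
    MPI_Offset prefix = 0;
    int rank;

    /* Each rank receives the total compressed bytes of all lower ranks. */
    MPI_Exscan(&compressed_size, &prefix, 1, MPI_OFFSET, MPI_SUM, comm);

    /* MPI_Exscan leaves rank 0's receive buffer undefined; reset it. */
    MPI_Comm_rank(comm, &rank);
    if (rank == 0)
        prefix = 0;

    return base_offset + prefix;   /* start of this rank's block */
}
```

Once every rank knows its offset, all compressed blocks can be written with a single collective call, as the slide describes.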

CCGrid 2014 Compression: Hybrid Method Developer provides: –Compression ratio –Error ratio Does not require metadata exchange Error padding can be used for overflowed data Generated file is smaller Relies on user inputs 10 Off' = Off x (1 / (comp_ratio - err_ratio))
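
A worked example of the offset formula, with made-up numbers (not taken from the paper): if the developer specifies comp_ratio = 2.0 and err_ratio = 0.1, the scaling factor is 1 / (2.0 - 0.1) ≈ 0.526, so a split whose uncompressed offset is 190 MB is placed at roughly 100 MB. Subtracting the error ratio makes the factor slightly conservative, leaving padding for splits that compress worse than the assumed ratio.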

CCGrid 2014 System API Scientific data management libraries are complex; enabling compression should require only trivial changes in scientific applications Requirements for a system API: –Defining a compression function comp_f (input, in_size, output, out_size, args) –Defining a decompression function decomp_f (input, in_size, output, out_size, args) –Registering the user-defined functions ncmpi_comp_reg (*comp_f, *decomp_f, args, …) 11
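
A sketch of how an application might use this interface. The names comp_f, decomp_f, and ncmpi_comp_reg come from the slide; the concrete C prototypes, the return convention, and the pass-through example bodies are assumptions for illustration only.

```c
/* Hedged sketch of the proposed registration interface; the real
 * prototypes in the authors' implementation may differ. */
#include <stddef.h>
#include <string.h>

/* Proposed registration routine (per the slide); prototype assumed. */
int ncmpi_comp_reg(int (*comp_f)(const void *, size_t, void *, size_t *, void *),
                   int (*decomp_f)(const void *, size_t, void *, size_t *, void *),
                   void *args);

/* Placeholder "compressor": copies the split unchanged. A real user
 * would plug in a domain-specific algorithm here. */
static int my_comp(const void *input, size_t in_size,
                   void *output, size_t *out_size, void *args)
{
    (void)args;
    memcpy(output, input, in_size);
    *out_size = in_size;
    return 0;
}

/* Matching decompressor: the inverse of my_comp. */
static int my_decomp(const void *input, size_t in_size,
                     void *output, size_t *out_size, void *args)
{
    (void)args;
    memcpy(output, input, in_size);
    *out_size = in_size;
    return 0;
}

/* Registration is done once, before the parallel write calls:
 *     ncmpi_comp_reg(my_comp, my_decomp, NULL);
 */
```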

CCGrid 2014 Compression Methodology Common properties of scientific datasets –Consist of floating-point numbers –Relationship between neighboring values Generic compression cannot perform well; domain-specific solutions can help Approach: –Differential compression Predict the values of neighboring cells Store the difference 12
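
To make the idea concrete, below is a minimal sketch of differential (prediction-based) encoding for a 1-D array of doubles. It uses previous-value prediction and a one-byte leading-zero header per value; it illustrates the general approach only and is not the paper's encoder, which uses a 5-bit prediction header, neighbor-cell prediction, and an optional lossy mode.

```c
/* Minimal sketch of differential (predictive) compression for doubles:
 * predict each value from its predecessor and store only the residual.
 * Illustration only; not the paper's actual encoder. */
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Encode n doubles into out (caller provides at least 9 * n bytes).
 * The XOR residual against the previous value is stored as a 1-byte
 * leading-zero-byte count followed by its significant bytes. */
static size_t diff_encode(const double *in, size_t n, uint8_t *out)
{
    uint64_t prev = 0;
    size_t pos = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t cur, residual;
        memcpy(&cur, &in[i], sizeof cur);
        residual = cur ^ prev;           /* small when neighbors are close */
        prev = cur;

        int lz = 0;                      /* leading zero bytes (0..8)      */
        while (lz < 8 && ((residual >> (56 - 8 * lz)) & 0xFF) == 0)
            lz++;

        out[pos++] = (uint8_t)lz;        /* per-value header               */
        for (int b = lz; b < 8; b++)     /* remaining significant bytes    */
            out[pos++] = (uint8_t)(residual >> (56 - 8 * b));
    }
    return pos;                          /* compressed size in bytes       */
}
```

Decoding reverses the process: read the header byte, reassemble the residual, and XOR it with the previously decoded value.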

CCGrid 2014 Example: GCRM Temperature Variable Compression E.g.: temperature record The values of neighboring cells are highly related X' table (after prediction): X'' compressed values –5 bits for the prediction + the difference Lossless and lossy compression Fast and good compression ratios 13

CCGrid 2014 PnetCDF Data Flow
1. Generated data is passed to the PnetCDF library
2. Variable info is gathered from the NetCDF header
3. Splits are compressed (user-defined compression algorithm)
4. Metadata info is exchanged
5. Parallel write operations
6. Synchronization and global view (update the NetCDF header) 14
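
From the application's perspective the write path is almost unchanged; the sketch below assumes the compression layer sits behind the standard PnetCDF calls. Only the commented-out ncmpi_comp_reg call is the proposed extension; all other calls are standard PnetCDF, and the file, dimension, and variable names are made up.

```c
/* Sketch: an application writing one variable through PnetCDF while the
 * compression layer handles steps 2-6 of the data flow internally. */
#include <mpi.h>
#include <pnetcdf.h>

void write_temperature(MPI_Comm comm, const double *local, MPI_Offset nlocal)
{
    int rank, nprocs, ncid, dimid, varid;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    ncmpi_create(comm, "output.nc", NC_CLOBBER | NC_64BIT_DATA,
                 MPI_INFO_NULL, &ncid);
    ncmpi_def_dim(ncid, "cells", nlocal * nprocs, &dimid);
    ncmpi_def_var(ncid, "temperature", NC_DOUBLE, 1, &dimid, &varid);
    ncmpi_enddef(ncid);

    /* Proposed extension (see the System API slide): register the
     * user-defined (de)compressor before writing.
     *     ncmpi_comp_reg(my_comp, my_decomp, NULL);
     */

    MPI_Offset start = rank * nlocal, count = nlocal;
    /* Collective write; with compression enabled, chunking, compression,
     * metadata exchange, and the header update happen behind this call. */
    ncmpi_put_vara_double_all(ncid, varid, &start, &count, local);

    ncmpi_close(ncid);
}
```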

CCGrid 2014 Outline Introduction Scientific Data Management Libraries PnetCDF Compression Approaches A Compression Methodology System Design Experimental Result Conclusion 15

CCGrid 2014 Experimental Setup Local cluster: –Each node has 8 cores (Intel Xeon E5630, 2.53 GHz) –Memory: 12 GB InfiniBand network –Lustre file system: 8 OSTs, 4 storage nodes –1 metadata server Microbenchmarks: 34 GB Two data analysis applications: 136 GB dataset –AT, MATT Scientific simulation application: 49 GB dataset –Mantevo Project: MiniMD 16

CCGrid 2014 Exp: (Write) Microbenchmarks 17

CCGrid 2014 Exp: (Read) Microbenchmarks 18

CCGrid 2014 Exp: Simulation (MiniMD) 19 Figures: Application Execution Times; Application Write Times

CCGrid 2014 Exp: Scientific Analysis (AT) 20

CCGrid 2014 Conclusion Scientific data analysis and simulation applications –Deal with massive amounts of data Management of “Big Data” –I/O throughput affects performance –Need for transparent compression –Minimum effort during integration Proposed two compression methods Implemented a compression layer in PnetCDF –Ported our proposed methods –Scientific data compression algorithm Evaluated our system –MiniMD: 22% performance improvement, 25.5% storage space savings –AT, MATT: 45.3% performance improvement, 47.8% storage space savings 21

CCGrid 2014 Thanks 22

CCGrid 2014 PnetCDF: Example Header 23

CCGrid 2014 Exp: Microbenchmarks Dataset size: 34 GB –Timestep: 270 MB Compressed: 17.7 GB –Timestep: 142 MB Chunk size: 32 MB # Processes: 64 Stripe count: 8 24 Comparing Write Times with Varying Stripe Sizes

CCGrid 2014 Outline Introduction Scientific Data Management Libraries PnetCDF Compression Approaches A Compression Methodology System Design Experimental Result Conclusion 25