Www.hdfgroup.org The HDF Group HDF5 Filters Using filters and compression in HDF5 May 30-31, 2012HDF5 Workshop at PSI 1.

Slides:



Advertisements
Similar presentations
Chapter 6 File Systems 6.1 Files 6.2 Directories
Advertisements

Processor Data Path and Control Diana Palsetia UPenn
Chapter 5 : Memory Management
The HDF Group Parallel HDF5 Developments 1 Copyright © 2010 The HDF Group. All Rights Reserved Quincey Koziol The HDF Group
Chapter 4 Memory Management Basic memory management Swapping
Chapter 6 File Systems 6.1 Files 6.2 Directories
Part IV: Memory Management
* 1 Common Dialog Control. * 2 You want your user to set property or provide your application with some information easily? How do you do it? The Common.
Copyright © 2003 by Prentice Hall Computers: Tools for an Information Age Chapter 15 Programming and Languages: Telling the Computer What to Do.
The HDF Group HDF Tools Tutorial September 28-30, 2010HDF and HDF-EOS Workshop XIV1 Peter Cao, The HDF Group Jonathan Kim, The HDF Group.
CS 450 Module R4. R4 Overview Due on March 11 th along with R3. R4 is a small yet critical part of the MPX system. In this module, you will add the functionality.
Module R2 CS450. Next Week R1 is due next Friday ▫Bring manuals in a binder - make sure to have a cover page with group number, module, and date. You.
The HDF Group November 3-5, 2009HDF/HDF-EOS Workshop XIII1 HDF5 Advanced Topics Elena Pourmal The HDF Group The 13 th HDF and HDF-EOS.
1 File Systems Chapter Files 6.2 Directories 6.3 File system implementation 6.4 Example file systems.
CSCE 121, Sec 200, 507, 508 Fall 2010 Prof. Jennifer L. Welch.
XML Compression Aslam Tajwala Kalyan Chakravorty.
Indexing Debapriyo Majumdar Information Retrieval – Spring 2015 Indian Statistical Institute Kolkata.
CSE Lectures 22 – Huffman codes
File System. NET+OS 6 File System Architecture Design Goals File System Layer Design Storage Services Layer Design RAM Services Layer Design Flash Services.
© iPerimeter Ltd Unix and IBM i  AIX and Linux run natively on Power Systems  IBM i can do Unix type things in two ways:  Posix/QShell  Ordinary.
HDF5 Tools Update Peter Cao - The HDF Group November 6, 2007 This report is based upon work supported in part by a Cooperative Agreement.
HDF 1 HDF5 Advanced Topics Object’s Properties Storage Methods and Filters Datatypes HDF and HDF-EOS Workshop VIII October 26, 2004.
Chapter 17 Domain Name System
1 High level view of HDF5 Data structures and library HDF Summit Boeing Seattle September 19, 2006.
NASA EOS DATA COMPRESSION WITH HDF5 SCALEOFFSET FILTER This work was funded by the NASA Earth Science Technology Office under NASA award AIST and.
April 6, 2010GMQS Meeting1 Optional Feature Support in HDF5 Tools Albert Cheng The HDF Group.
A Metadata Based Approach For Supporting Subsetting Queries Over Parallel HDF5 Datasets Vignesh Santhanagopalan Graduate Student Department Of CSE.
February 2-3, 2006SRB Workshop, San Diego P eter Cao, NCSA Mike Wan, SDSC Sponsored by NLADR, NFS PACI Project in Support of NCSA-SDSC Collaboration Object-level.
The HDF Group Multi-threading in HDF5: Paths Forward Current implementation - Future directions May 30-31, 2012HDF5 Workshop at PSI 1.
May 30-31, 2012HDF5 Workshop at PSI1 HDF5 at Glance Quick overview of known topics.
CCGrid 2014 Improving I/O Throughput of Scientific Applications using Transparent Parallel Compression Tekin Bicer, Jian Yin and Gagan Agrawal Ohio State.
The HDF Group HDF5 Datasets and I/O Dataset storage and its effect on performance May 30-31, 2012HDF5 Workshop at PSI 1.
April 28, 2008LCI Tutorial1 Introduction to HDF5 Tools Tutorial Part II.
The HDF Group HDF5 Tools Updates Peter Cao, The HDF Group September 28-30, 20101HDF and HDF-EOS Workshop XIV.
11/7/2007HDF and HDF-EOS Workshop XI, Landover, MD1 HDF5 Software Process MuQun Yang, Quincey Koziol, Elena Pourmal The HDF Group.
October 15, 2008HDF and HDF-EOS Workshop XII1 What will be new in HDF5?
1 N-bit and ScaleOffset filters MuQun Yang National Center for Supercomputing Applications University of Illinois at Urbana-Champaign Urbana, IL
1 HDF5 Life cycle of data Boeing September 19, 2006.
CCGrid 2014 Improving I/O Throughput of Scientific Applications using Transparent Parallel Compression Tekin Bicer, Jian Yin and Gagan Agrawal Ohio State.
The HDF Group Support for NPP/NPOESS by The HDF Group Mike Folk, Elena Pourmal, Peter Cao The HDF Group November 5, 2009 November 3-5,
CE Operating Systems Lecture 17 File systems – interface and implementation.
September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 Introduction to HDF5 Command-line Tools.
The HDF Group HDF5 Chunking and Compression Performance tuning 10/17/15 1 ICALEPCS 2015.
File Systems cs550 Operating Systems David Monismith.
A.Abhari CPS1251 Topic 1: Introduction to Computers Computer Hardware Computer components Connecting Computers Computer Software Operating System (OS)
May 30-31, 2012 HDF5 Workshop at PSI May Partial Edge Chunks Dana Robinson The HDF Group Efficient Use of HDF5 With High Data Rate X-Ray Detectors.
The HDF Group 10/17/15 1 HDF5 vs. Other Binary File Formats Introduction to the HDF5’s most powerful features ICALEPCS 2015.
May 30-31, 2012 HDF5 Workshop at PSI May Metadata Journaling Dana Robinson The HDF Group Efficient Use of HDF5 With High Data Rate X-Ray Detectors.
The HDF Group January 8, ESIP Winter Meeting Data Container Study: HDF5 in a POSIX File System or HDF5 C 3 : Compression, Chunking,
ECE 456 Computer Architecture Lecture #9 – Input/Output Instructor: Dr. Honggang Wang Fall 2013.
Lecture 20: C File Processing. Why Using Files? Storage of data in variables and arrays is temporary Data lost when a program terminates. Files are used.
Copyright © 2010 The HDF Group. All Rights Reserved1 Data Storage and I/O in HDF5.
HDF and HDF-EOS Workshop XII
Elena Pourmal The HDF Group
Hierarchical Data Formats (HDF) Update
Topics Introduction Hardware and Software How Computers Store Data
HDF5 Metadata and Page Buffering
Operation System Program 4
Performance Optimization for Embedded Software
CSCE Fall 2013 Prof. Jennifer L. Welch.
Beginning C Lecture 11 Lecturer: Dr. Zhao Qinpei
CSCE 121: Simple Computer Model Spring 2015
Topics Introduction Hardware and Software How Computers Store Data
Peter Cao The HDF Group November 28, 2006
Moving applications to HDF
Storage Structure and Efficient File Access
CSCE Fall 2012 Prof. Jennifer L. Welch.
Hierarchical Data Format (HDF) Status Update
COMP755 Advanced Operating Systems
1.6) Storing Integer: 1.7) storing fraction:
Presentation transcript:

The HDF Group HDF5 Filters Using filters and compression in HDF5 May 30-31, 2012HDF5 Workshop at PSI 1

Outline Introduction to HDF5 filters HDF5 filters Other filters and how to find them How to add your own filter Future work May 30-31, 2012HDF5 Workshop at PSI2

INTRODUCTION TO HDF5 FILTERS May 30-31, 2012HDF5 Workshop at PSI3

What is an HDF5 filter? Data transformation performed by the HDF5 library during I/O operations HDF5 filters (or built-in filters) Supported by The HDF Group Come with the HDF5 library source code User-defined filters Filters written by HDF5 users and/or available with some applications (h5py, PyTables) May be or may not be registered with The HDF Group May 30-31, 2012HDF5 Workshop at PSI4

HDF5 filters Filters are arranged in a pipeline so the output of one filter becomes the input of the next filter The filter pipeline can be only applied to -Chunked dataset -HDF5 library passes each chunk through the filter pipeline on the way to or from disk -Group -Link names are stored in a local heap, which may be compressed with a filter pipeline May 30-31, 2012HDF5 Workshop at PSI5

May 30-31, 2012HDF5 Workshop at PSI6 Filter pipeline CB A ………….. Filters are applied in a user-specified order when the HDF5 library performs I/O operations on a chunk or on a group heap ABC C File Chunk cacheChunked dataset Filter pipeline Application memory space X Y Z Group heap Filter pipeline

Filter pipeline programming model Operations on the HDF5 filter pipeline Defining a pipeline  Use a sequence of the H5Pset_filter calls or predefined API, e.g., H5Pset_deflate, on a dataset or group creation property to create a pipeline -On write, the filters are applied in the order they were specified -On read, the filters are applied in the reverse order they were specified (last one in the pipeline is applied first) -It is the user’s responsibility to create a meaningful pipeline May 30-31, 2012HDF5 Workshop at PSI7

Filter pipeline programming model Operations on the HDF5 filter pipeline Query -Number of filters in a pipeline -H5Pget_nfilters -Information about a filter using filter identifier -H5Pget_filter_by_id -Check if a filter is available in the library -H5Zfilter_avail Modify -Change properties of existing filter -H5Pmodify_filter -Remove filter from pipeline -H5Premove_filter May 30-31, 2012HDF5 Workshop at PSI8

Filter pipeline programming model Filter pipeline is permanent for dataset or a group Filters are part of an HDF5 object (group or dataset) creation property The object’s filter pipeline cannot be modified after the object has been created May 30-31, 2012HDF5 Workshop at PSI9

, 2012HDF5 Workshop at PSI10 Applying filters to a dataset dcpl_id = H5Pcreate(H5P_DATASET_CREATE); cdims[0] = 100; cdims[1] = 100; H5Pset_chunk(dcpl_id, 2, cdims); H5Pset_shuffle(dcpl); H5Pset_deflate(dcpl_id, 9); dset_id = H5Dcreate (…, dcpl_id); H5Pclose(dcpl_id);

, 2012HDF5 Workshop at PSI11 Applying filters to a group gcpl_id = H5Pcreate(H5P_GROUP_CREATE); H5Pset_deflate(dcpl_id, 9); group_id = H5Gcreate (…, gcpl_id, …); H5Pclose(gcpl_id);

HDF5 FILTERS May 30-31, 2012HDF5 Workshop at PSI12

Types of HDF5 Filters Algebraic data transformation Data shuffling Checksum Data compression -Scale + offset -N-bit -GZIP (deflate) -SZIP May 30-31, 2012HDF5 Workshop at PSI13

, 2012HDF5 Workshop at PSI14 Checking available HDF5 Filters Use API ( H5Zfilter_avail ) Check libhdf5.settings file Features: Parallel HDF5: no ………………………………………………. I/O filters (external): deflate(zlib),szip(encoder) I/O filters (internal): shuffle,fletcher32,nbit,scaleoffset ……………………………………………….

, 2012HDF5 Workshop at PSI15 External HDF5 Filters External HDF5 filters rely on the third-party libraries installed on the system GZIP By default HDF5 configure uses ZLIB installed on the system Configure will proceed if ZLIB is not found on the system SZIP (added by NASA request) Optional; have to be configured in using –with- szlib=/path…. Configure will proceed if SZIP is not found Comes with a license rcial_szip.html rcial_szip.html Decoder is free; for encoder see the license terms

, 2012HDF5 Workshop at PSI16 Internal HDF5 Filters Internal filters are implemented by The HDF Group and come with the library HDF5 internal filters can be configured out using –disable-filters=“ filter1, filter2,.. ” FLETCHER32 SHUFFLE SCALEOFFSET NBIT

17 Checksum filter Predefined HDF5 filter ( H5Pset_fletcher32) Why: Error detection for raw data What: Implements Fletcher32 checksum algorithm Checksum value MemoryFile May 30-31, 2012HDF5 Workshop at PSI

18 Shuffling filter Predefined HDF5 filter ( H5Pset_shuffle ) Why: Better compression of unused bytes What: Changes byte order in a stream of data B B B May 30-31, 2012HDF5 Workshop at PSI

19 Effect of data shuffling File sizeTotal timeWrite Time No Shuffle102.9MB Shuffle67.34MB H5Pset_shuffle followed by H5Pset_deflate Write 4-byte integer dataset 256x256x1024 (256MB) Using chunks of 256x16x1024 (16MB) Values: random integers between 0 and 255 May 30-31, 2012HDF5 Workshop at PSI

May 30-31, 2012HDF5 Workshop at PSI20 N-bit compression filter Predefined HDF5 filter ( H5Pset_nbit) Why: Compact storage for user-defined datatypes What: When data stored on disk, padding bits chopped off and only significant bits stored Supports most datatypes Works with compound datatypes

May 30-31, 2012HDF5 Workshop at PSI21 N-bit compression example In memory, one value of N-Bit datatype is stored like this: | byte 3 | byte 2 | byte 1 | byte 0 | |????????|????SPPP|PPPPPPPP|PPPP????| S-sign bit P-significant bit ?-padding bit After passing through the N-Bit filter, all padding bits are chopped off, and the bits are stored on disk like this: | 1st value | 2nd value | |SPPPPPPP PPPPPPPP|SPPPPPPP PPPPPPPP|... Opposite (decompress) when going from disk to memory

May 30-31, 2012HDF5 Workshop at PSI22 “Scale+offset” filter Predefined HDF5 filter ( H5Pset_scaleoffset) Why: Use less storage when less precision needed What: Performs scale/offset operation on each value Truncates result to fewer bits before storing Currently supports integers and floats

May 30-31, 2012HDF5 Workshop at PSI23 Example with floating-point type Data: { , , , } Choose scaling factor: decimal precision to keep E.g. scale factor D = 2 1. Find minimum value (offset): Subtract minimum value from each element Result: {5.102, 0, 1.086, 6.185} 3. Scale data by multiplying 10 D = 100 Result: {510.2, 0, 108.6, 618.5} 4. Round the data to integer Result: {510, 0, 109, 619} 5. Pack and store using min number of bits

THIRD PARTY HDF5 FILTERS May 30-31, 2012HDF5 Workshop at PSI24

, 2012HDF5 Workshop at PSI25 Third-party HDF5 filters Compression methods supported by HDF5 user community -LZO, BZIP2, BLOSC (PyTables) -LZF (h5py) -MAFISC -The Website has a patch for external module loader

HOW TO ADD YOUR OWN FILTER May 30-31, 2012HDF5 Workshop at PSI26

Filter design considerations A filter is bidirectional -Handles both input and output to the file -A flag is passed to the filter to indicate the direction The filter -Reads data from a buffer -Performs transformation on the data -Places the result in the same or new buffer -Returns the buffer pointer and size to the caller -Returns zero to indicate a failure May 30-31, 2012HDF5 Workshop at PSI27

How to proceed? Implement a filter (See H5Zregister in RM) See H5Zdeflate.c in the HDF5 src directory for ideas Application will need to Register filter with the HDF5 library using H5Zregister Add filter to pipeline using H5Pset_filter Follow the HDF5 programming model as usual May 30-31, 2012HDF5 Workshop at PSI28

Example: Adding BZIP2 compression Source: h5ex_d_bzip2.c h5bzip2.h H5Zbzip2.c Compile %h5cc h5ex_d_bzip2.c H5Zbzip2.c –lbz2 May 30-31, 2012HDF5 Workshop at PSI29

How to register new filter with us? Send request to Provide Filter information Maintainer contact information Get filter unique identifier Filter info will be available May 30-31, 2012HDF5 Workshop at PSI30

Example: h5dump output on BZIP2 data HDF5 "h5ex_d_bzip2.h5" { GROUP "/" { DATASET "DS-bzip2" {... } FILTERS { UNKNOWN_FILTER { FILTER_ID 305 COMMENT bzip2 PARAMS { 9 } }..... } DATA {h5dump error: unable to print data } May 30-31, 2012HDF5 Workshop at PSI31

Problem with using custom filter “Off the shelf” HDF5 tools do not work with the third-party filters h5dump, MATLAB and IDL, etc. Solution Modify HDF5 source with your code Use a patch from hamburg.de/research/projects/icomex/mafischttp://wr.informatik.uni- hamburg.de/research/projects/icomex/mafisc May 30-31, 2012HDF5 Workshop at PSI32

FUTURE IMPROVEMENTS May 30-31, 2012HDF5 Workshop at PSI33

Proposal in works Modify the HDF5 file format and library that allows a dynamic library to be loaded for performing filter operations Challenges: Portable solution between UNIX and Windows is required Increased maintenance cost Testing Code maintenance Documentation May 30-31, 2012HDF5 Workshop at PSI34

The HDF Group Thank You! Questions? May 30-31, 2012HDF5 Workshop at PSI 35