Introduction to HDF5 (ESRF Workshop, January 11-13, 2010). Francesc Alted, consultant and PyTables creator.

Presentation transcript:

1 Introduction to HDF5 ❖ ESRF Workshop, January 11-13, 2010 ❖ Francesc Alted, consultant and PyTables creator

2 Outline ❖ Some words about me ❖ What is HDF5? ❖ Basic file structure ❖ Groups, datasets, attributes and links ❖ The software ❖ The library ❖ Other tools ❖ A short glimpse into the C and Python APIs

3 Slides Provenance ❖ This is the first time I have given an introduction to HDF5 ❖ The HDF Group has already done a great job introducing HDF4/HDF5 to the public ❖ I asked them for permission to reuse part of their material (I don't like to reinvent the wheel) ❖ I added some slides based on my own experience

A slice from The HDF Group

5 Me & HDF5 ❖ Started working with HDF5 back in 2002 ❖ Needed it to scratch my own itch ❖ The PyTables project, based on HDF5, started shortly after ❖ Handle large series of tabular data efficiently ❖ Buffered I/O for maximum throughput ❖ Very fast selections (leverage Numexpr) ❖ Column indexing for top-class speed queries ❖ PyTables Pro is the commercial version that allows me to continue improving the package

What is HDF5?

March 9, th International LCI Conference - HDF5 Tutorial 7 What is HDF5? ❖ HDF stands for Hierarchical Data Format ❖ A file format for managing any kind of data ❖ A software system to manage data in the format ❖ Designed for high-volume or complex data ❖ Designed for every size and type of system

8 Brief History of HDF ❖ 1987: At NCSA (University of Illinois), a task force formed to create an architecture-independent format and library, AEHOO (All Encompassing Hierarchical Object Oriented format), which became HDF ❖ Early 1990s: NASA adopted HDF for the Earth Observing System project ❖ 1996: DOE's ASC (Advanced Simulation and Computing) Project began collaborating with the HDF group (NCSA) to create “Big HDF” (the increase in computing power of DOE systems at LLNL, LANL and Sandia National Labs required bigger, more complex data files); “Big HDF” became HDF5 ❖ HDF5 was released with support from National Labs, NASA and NCSA ❖ 2006: The HDF Group spun off from the University of Illinois as a non-profit corporation

9 Outstanding Features of HDF5 ❖ Can store all kinds of data in a variety of ways ❖ Runs on most systems ❖ Lots of tools to access data ❖ Long-term format support (HDF-EOS, CGNS) ❖ Library and format emphasis on I/O efficiency and different kinds of storage

10 Who uses HDF5? ❖ Applications that deal with big or complex data ❖ Over 200 different types of apps ❖ 2+ million product users world-wide ❖ Academia, government agencies, industry

11 NASA EOS remote sensing data ❖ HDF is the standard file format for storing data from NASA's Earth Observing System (EOS) mission ❖ Petabytes of data are stored in HDF and HDF5 to support the Global Climate Change Research Program

12 HDF5 Basic File Structure

13 An HDF5 “file” is a container into which you can put your data objects (figure: a container holding, among other objects, a table with columns lat, lon and temp, and a palette)

14 Structures to organize objects ❖ “Groups”: “/” (root), “/group” ❖ “Datasets”: 3-D array, raster image, table with columns lat, lon and temp ❖ “Links”: “/link”

15 HDF5 model ❖ Groups: provide structure among objects ❖ Datasets: where the primary data goes (rich set of datatype options; flexible, efficient storage and I/O) ❖ Attributes: for metadata annotations ❖ Links: point to other groups or datasets (hard, soft and external flavors) ❖ Everything else is built essentially from these parts

16 HDF5 Group ❖ A mechanism for organizing collections of related objects ❖ Every file starts with a root group ❖ Similar to UNIX directories ❖ Can have attributes

17 Path to HDF5 object in a file ❖ / (root) ❖ /X ❖ /Y ❖ /Y/temp ❖ /Y/bar/temp

18 HDF5 Dataset (figure labels: rank, dimensions, datatype, attributes, storage info)

19 HDF5 Dataspace ❖ Two roles ❖ Dataspace contains spatial info about a dataset stored in a file (rank and dimensions; permanent part of dataset definition) ❖ Dataspace describes the application's data buffer and the data elements participating in I/O ❖ Figure examples: rank = 2, dimensions = 4x6; rank = 1, dimensions = 12

20 HDF5 Datatype ❖ Datatype: how to interpret a data element ❖ Permanent part of the dataset definition ❖ Two classes: atomic and compound

21 HDF5 Datatype ❖ HDF5 atomic types include: normal integer and float; user-definable (e.g., 13-bit integer); variable-length types (e.g., strings); references to objects/dataset regions; enumeration (names mapped to integers); array ❖ HDF5 compound types: comparable to C structs (“records”); members can be atomic or compound types

22 HDF5 dataset: array of records ❖ Dimensionality: 5 x 3 ❖ Datatype: record with members int8, int4, int16 and a 2x3x2 array of float32
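
A compound type is easiest to picture as a fixed byte layout for one record. The sketch below does not use the HDF5 library; it uses Python's standard struct module to pack a record with an int8, an int16 and a 2x3x2 array of float32. The function names, the little-endian unpadded layout and the sample values are assumptions made for illustration, and the slide's int4 member is left out because struct has no matching code.

```python
import struct

# Hypothetical record layout, loosely following the slide:
#   b   -> int8
#   h   -> int16
#   12f -> 2x3x2 array of float32, flattened in row-major order
# "<" selects little-endian with no padding (an assumption for this sketch).
RECORD = struct.Struct("<bh12f")

def pack_record(flag, count, cube):
    """Pack one record; cube is a nested 2x3x2 list of floats."""
    flat = [v for plane in cube for row in plane for v in row]
    return RECORD.pack(flag, count, *flat)

def unpack_record(raw):
    """Inverse of pack_record: rebuild the nested 2x3x2 list."""
    flag, count, *flat = RECORD.unpack(raw)
    cube = [[flat[(i * 3 + j) * 2:(i * 3 + j) * 2 + 2] for j in range(3)]
            for i in range(2)]
    return flag, count, cube

# Sample values chosen to be exactly representable as float32.
cube = [[[0.5 * (i + j + k) for k in range(2)] for j in range(3)]
        for i in range(2)]
raw = pack_record(-1, 300, cube)
```

Under these assumptions each record occupies RECORD.size bytes, so a 5 x 3 dataset of such records would hold 15 of them back to back.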

23 HDF5 dataset storage layouts ❖ Compact ❖ Contiguous ❖ Chunked

24 Compact storage layout ❖ Dataset data and metadata stored together in the object header (figure labels: File; Application memory; Metadata cache; Dataset header with Datatype, Dataspace, Attributes; Dataset data)

25 Contiguous storage layout ❖ Metadata header separate from dataset data ❖ Data stored in one contiguous block in HDF5 file (figure labels: File; Application memory; Metadata cache; Dataset header with Datatype, Dataspace, Attributes; Dataset data)
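
Because a contiguous dataset is a single block, the byte position of any element follows from its indices alone. A minimal sketch of that arithmetic, assuming row-major (C) element order and a fixed element size; the function name and the data_start parameter are illustrative and not part of the HDF5 API.

```python
def element_offset(index, dims, itemsize, data_start=0):
    """Byte offset of one element in a contiguous, row-major block."""
    if len(index) != len(dims):
        raise ValueError("index rank must match dataset rank")
    linear = 0
    for i, n in zip(index, dims):
        if not 0 <= i < n:
            raise IndexError("index out of bounds")
        linear = linear * n + i   # row-major flattening, one dimension at a time
    return data_start + linear * itemsize

# 4x6 dataset of 32-bit integers: element (2, 3) has linear index 2*6 + 3 = 15
offset = element_offset((2, 3), (4, 6), itemsize=4)
```

With chunked storage the same arithmetic applies inside each chunk, but the chunk itself first has to be located through the chunk index.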

26 Chunked storage layout ❖ Dataset data divided into equal-sized blocks (chunks) ❖ Each chunk stored separately as a contiguous block in HDF5 file (figure labels: File; Application memory; Metadata cache; Dataset header with Datatype, Dataspace, Attributes; header; chunk index; chunks A, B, C, D)

November 3-5, 2009 HDF/HDF-EOS Workshop XIII 27 Why HDF5 Chunking? ❖ Chunking is required for several HDF5 features ❖ Enabling compression and other filters like checksum ❖ Extendible datasets

28 Why HDF5 Chunking? ❖ If used appropriately, chunking improves partial I/O for big datasets ❖ Figure caption: only two chunks are involved in I/O
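
How many chunks a partial read involves can be worked out from the selection and the chunk shape. The sketch below is plain Python arithmetic, not an HDF5 call; the function name, the 100x100 dataset and both chunk shapes are made up for illustration, and the selection is assumed to be a non-empty rectangle.

```python
from itertools import product

def chunks_touched(start, count, chunk_shape):
    """Return grid coordinates of every chunk overlapped by a selection.

    start, count: per-dimension offset and size of a rectangular selection.
    chunk_shape:  per-dimension chunk size.
    """
    ranges = []
    for s, c, ch in zip(start, count, chunk_shape):
        first = s // ch              # chunk holding the first selected element
        last = (s + c - 1) // ch     # chunk holding the last selected element
        ranges.append(range(first, last + 1))
    return list(product(*ranges))

# Hypothetical 100x100 dataset: read one full row (row 42).
wide_chunks = chunks_touched(start=(42, 0), count=(1, 100), chunk_shape=(10, 50))
tall_chunks = chunks_touched(start=(42, 0), count=(1, 100), chunk_shape=(50, 10))
```

Here the wide 10x50 chunks serve the row read from two chunks, while the tall 50x10 chunks spread the same read over ten, which is the sense in which chunking has to be "used appropriately".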

29 HDF5 Attribute ❖ Attribute: data of the form “name = value”, attached to an object by the application ❖ Operations similar to dataset operations, but not extendible and with no compression or partial I/O ❖ Can be overwritten, deleted, added during the “life” of a dataset or a group (but not to a link)

30 HDF5 links (figure: groups A, B and C; object R at /A/R; links L at /B/L and /C/L)

31 Hard links (figure: groups A, B and C; object R at /A/R; hard links HL at /B/HL and /C/HL)

32 Soft links (figure: groups A and B; object R at /A/R; soft link SL at /B/SL holding the path /A/R)

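33 External links (figure: file1 holds groups A and B and object R; the external link EL at file1:/B/EL points to object C under the root “/” of file2, written file2:/C)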
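
The difference between hard and soft links shows up when the original name is removed. The following is a toy model in plain Python, not HDF5 code: groups are dictionaries, a hard link is a second name for the same object, and a soft link stores only a path that is looked up when followed. The Group, SoftLink and resolve names are invented for this sketch, and external links are not modelled.

```python
class Group(dict):
    """A group maps link names to hard-linked objects or SoftLink entries."""

class SoftLink:
    """A soft link stores only a path, resolved each time it is followed."""
    def __init__(self, path):
        self.path = path

def resolve(root, path):
    """Walk an absolute path from the root group, following soft links."""
    node = root
    for name in (p for p in path.split("/") if p):
        node = node[name]                    # KeyError if the name is missing
        if isinstance(node, SoftLink):
            node = resolve(root, node.path)  # look up the stored path again
    return node

root = Group(A=Group(), B=Group())
dataset = [1, 2, 3]                  # stand-in for the object R
root["A"]["R"] = dataset             # /A/R
root["B"]["HL"] = dataset            # hard link /B/HL: same object, second name
root["B"]["SL"] = SoftLink("/A/R")   # soft link /B/SL: just the path

del root["A"]["R"]                   # remove the original name

try:
    resolve(root, "/B/SL")
    soft_link_resolves = True
except KeyError:
    soft_link_resolves = False       # the soft link now dangles
```

After the deletion the object is still reachable through /B/HL, while /B/SL points at a path that no longer exists.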

34 HDF5 Software

35 HDF5 software stack ❖ Tools & Applications ❖ HDF I/O Library ❖ HDF File

36 Structure of HDF5 Library ❖ Object API (C, Fortran 90, Java, C++): specify objects and transformation properties; invoke data movement operations and data transformations ❖ Library internals: performs data transformations and other prep for I/O; configurable transformations (compression, etc.) ❖ Virtual file I/O (C only): perform byte-stream I/O operations (open/close, read/write, seek); user-implementable I/O (stdio, network, memory, etc.)

37 HDF5 Library Features ❖ The HDF5 Library provides capabilities to: ❖ Describe subsets of data and perform write/read operations on subsets (hyperslab selections and partial I/O) ❖ Layered architecture (virtual I/O layers, e.g. parallel I/O) ❖ Use efficient storage mechanisms to achieve good performance while writing/reading subsets of data (chunking, compression)

38 Partial I/O: move just part of a dataset between disk and memory ❖ (a) Hyperslab from a 2D array to the corner of a smaller 2D array ❖ (b) Regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array

39 Partial I/O: move just part of a dataset between disk and memory ❖ (c) A sequence of points from a 2D array to a sequence of points in a 3D array ❖ (d) Union of hyperslabs in file to union of hyperslabs in memory
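
A hyperslab is described per dimension by four numbers: start, stride, count and block. The sketch below is plain Python rather than an HDF5 call; it expands such a description into coordinates and copies the selected elements of a small 2D array into a flat buffer, in the spirit of case (b). The function names and the 4x6 sample array are assumptions for illustration.

```python
from itertools import product

def hyperslab_1d(start, stride, count, block):
    """Indices selected along one dimension by (start, stride, count, block)."""
    return [start + i * stride + b for i in range(count) for b in range(block)]

def hyperslab(start, stride, count, block):
    """Coordinates selected by an N-dimensional hyperslab, in row-major order."""
    per_dim = [hyperslab_1d(s, st, c, b)
               for s, st, c, b in zip(start, stride, count, block)]
    return list(product(*per_dim))

def read_selection(array, coords):
    """Copy the selected elements of a nested 2D list into a flat 1D buffer."""
    return [array[r][c] for r, c in coords]

# 4x6 stand-in for a dataset on disk, holding 0..23 in row-major order
disk = [[r * 6 + c for c in range(6)] for r in range(4)]

# Two 1x2 blocks per row, taken from rows 1 and 3
coords = hyperslab(start=(1, 0), stride=(2, 3), count=(2, 2), block=(1, 2))
memory = read_selection(disk, coords)
```

Only the eight selected elements are copied; the rest of the array is never touched, which is the point of partial I/O.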

40 Virtual I/O layer ❖ Object API (C, Fortran 90, Java, C++) ❖ Library internals ❖ Virtual file I/O (C only)

41 Virtual file I/O layer ❖ A public API for writing I/O drivers ❖ Allows HDF5 to interface to disk, memory, or a user-defined device ❖ Virtual file I/O drivers: Custom, File Family, MPI I/O, Core, Stdio ❖ “Storage”: File, File Family, Memory, ???

42 Layers: parallel example ❖ I/O flows through many layers from application to disk ❖ Application, running on the compute nodes of a parallel computing system (Linux cluster) ❖ I/O library (HDF5) ❖ Parallel I/O library (MPI-I/O) ❖ Parallel file system (GPFS) ❖ Switch network / I/O servers ❖ Disk architecture & layout of data on disk

43 Optimal chunksizes

44 Other Software ❖ The HDF Group: HDFView; Java tools; command-line utilities (h5ls, h5dump, h5repack...); web browser plug-in; regression and performance testing software ❖ 3rd party: IDL, MATLAB, Mathematica, PyTables, h5py, ViTables, HDF Explorer, LabView ❖ Communities: EOS, ASC, CGNS ❖ Integration with other software: OpeNDAP

45 A short glimpse into the HDF5 APIs (C & Python)

46 The General HDF5 API ❖ Currently C, Fortran 90, Java, and C++ bindings ❖ C routines begin with the prefix H5?, where ? is a character corresponding to the type of object the function acts on ❖ Example APIs: H5D: Dataset interface, e.g., H5Dread; H5F: File interface, e.g., H5Fopen; H5S: dataSpace interface, e.g., H5Sclose

47 Example: Create this HDF5 File (figure: a file containing A, a 4x6 array of integers, and B)

48 Code: Create a Dataset

    #include "hdf5.h"

    int main(void)
    {
        hid_t   file_id, dataset_id, dataspace_id;
        hsize_t dims[2];
        herr_t  status;

        /* Create a new file, truncating any existing file of that name */
        file_id = H5Fcreate("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

        /* Create a dataspace: rank 2, current dims 4x6 */
        dims[0] = 4;
        dims[1] = 6;
        dataspace_id = H5Screate_simple(2, dims, NULL);

        /* Create a dataset: pathname, datatype, dataspace, default property lists */
        dataset_id = H5Dcreate2(file_id, "A", H5T_STD_I32BE, dataspace_id,
                                H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

        /* Terminate access to dataset, dataspace, file */
        status = H5Dclose(dataset_id);
        status = H5Sclose(dataspace_id);
        status = H5Fclose(file_id);
        return status < 0;
    }
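
The datatype H5T_STD_I32BE used above names a 32-bit signed integer stored big-endian. As a rough illustration of what that means for the bytes of one element, here is a small sketch with Python's standard struct module; it does not call HDF5, and the sample values and variable names are arbitrary.

```python
import struct

# ">i" is struct's code for a big-endian 32-bit signed integer, the same
# byte layout that H5T_STD_I32BE describes for a stored element.
one = struct.pack(">i", 1)
minus_two = struct.pack(">i", -2)

# A 4x6 dataset of such elements holds 24 values of 4 bytes each.
raw_size = 4 * 6 * struct.calcsize(">i")
```

The most significant byte comes first, whatever the byte order of the machine that wrote the value.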

49 Code: Create a Group

    hid_t file_id, group_id;
    ...
    /* Open "file.h5" */
    file_id = H5Fopen("file.h5", H5F_ACC_RDWR, H5P_DEFAULT);

    /* Create group "/B" in file. */
    group_id = H5Gcreate2(file_id, "/B", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Close group and file. */
    status = H5Gclose(group_id);
    status = H5Fclose(file_id);

50 Code: Create a Dataset and Group (Python/PyTables)

    import tables as tb

    # PyTables 3.x method names (PyTables 2.x spelled them openFile,
    # createCArray and createGroup)
    file_id = tb.open_file("file.h5", "w")
    dataset_id = file_id.create_carray("/", "A", tb.Int32Atom(), shape=(4, 6))
    group_id = file_id.create_group("/", "B")
    # You don't need to explicitly close dataset or group!
    file_id.close()

51 HDF5 Information ❖ HDF Information Center ❖ HDF Help address ❖ HDF users mailing lists

52 Questions?