Introduction to HDF5 Tools: Tutorial Part II (LCI Tutorial, April 28, 2008)

Presentation transcript:


Outline
- Overview of HDF5 tools
- Using the tools for troubleshooting

HDF5 command-line tools
- Readers: h5dump, h5diff, h5ls
  1.8 tools: h5check, h5stat
- Writers: h5repack, h5repart, h5import, h5jam/h5unjam
  1.8 tools: h5copy, h5mkgrp
- Converters: h4toh5, h5toh4, gif2h5, h52gif

h5dump
Dumps the content of an HDF5 file to standard output and, optionally, to one of the following types of files:
  1. ASCII text file
  2. XML file
  3. Binary file
Flags to remember:
  -H   print header information
  -p   print objects' properties
  -b   export data in binary form
  -o   export data to a file (text by default)
  -y   skip printing indices
  -w   specify the line width

h5dump -H SDS.h5

  HDF5 "SDS.h5" {
  GROUP "/" {
     GROUP "Floats" {
        DATASET "FloatArray" {
           DATATYPE  H5T_IEEE_F32LE
           DATASPACE  SIMPLE { ( 4, 3 ) / ( 4, 3 ) }
        }
     }
     DATASET "IntArray" {
        DATATYPE  H5T_STD_I32LE
        DATASPACE  SIMPLE { ( 5, 6 ) / ( 5, 6 ) }
     }
  }
  }

h5dump -d /Floats/FloatArray SDS.h5

  HDF5 "SDS.h5" {
  DATASET "/Floats/FloatArray" {
     DATATYPE  H5T_IEEE_F32LE
     DATASPACE  SIMPLE { ( 4, 3 ) / ( 4, 3 ) }
     DATA {
     (0,0): 0.01, 0.02, 0.03,
     (1,0): 0.1, 0.2, 0.3,
     (2,0): 1, 2, 3,
     (3,0): 10, 20, 30
     }
  }
  }

h5dump -x SDS.h5
Dumps the content of SDS.h5 in XML form.

h5dump binary output
-b F, --binary=F   The form of the binary output (F):
  MEMORY   memory type: data in the output file has the same datatype as in memory
  FILE     disk file type: data in the output file has the same datatype as the corresponding dataset in the HDF5 file
  LE       predefined little-endian type (for example, H5T_IEEE_F64LE)
  BE       predefined big-endian type (for example, H5T_STD_I32BE)

h5dump -d /IntArray -o out_le.bin -b LE SDS.h5
od --width=24 -t x4 out_le.bin

The first command dumps the 32-bit integer dataset IntArray from SDS.h5 to a little-endian binary file, out_le.bin. The second displays that file as 4-byte hexadecimal words, 24 bytes (one six-element row of IntArray) per line.
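A file written with -b LE carries no header, so a reader has to know the element type and byte order in advance. The following is a minimal Python sketch of reading such a file, assuming 32-bit little-endian integers as in the example above. The sample values and the temporary file are hypothetical stand-ins for real h5dump output, and read_le_int32 is an illustrative helper, not part of any HDF5 tool.

```python
import os
import struct
import tempfile


def read_le_int32(path, ncols):
    """Read a flat file of little-endian 32-bit integers into rows of ncols."""
    with open(path, "rb") as f:
        raw = f.read()
    count = len(raw) // 4
    # "<" selects little-endian byte order, "i" a 4-byte signed integer.
    values = struct.unpack("<%di" % count, raw[: count * 4])
    return [list(values[i:i + ncols]) for i in range(0, count, ncols)]


# Hypothetical stand-in for out_le.bin: two rows of six sample values.
sample = list(range(12))
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(struct.pack("<12i", *sample))
    sample_path = tmp.name

rows = read_le_int32(sample_path, ncols=6)
os.remove(sample_path)
print(rows)
```

Reading the same bytes with ">" instead of "<" would silently produce byte-swapped values, which is why the -b form has to be recorded alongside the exported file.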

h5diff
Using h5diff, you can:
- compare two objects in the same file
- compare two objects in two different files
- compare all objects in two files

h5diff SDS.h5 SDS2.h5

  Dataset: </IntArray> and </IntArray>
  5 differences found

h5diff SDS.h5 SDS2.h5 -r /IntArray

  Dataset: </IntArray> and </IntArray>
  position    IntArray    IntArray    difference
  [ 0 0 ]     0           10          10
  [ 1 0 ]
  [ 2 0 ]
  [ 3 0 ]
  [ 4 0 ]
  differences found
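The report format above (position, both values, difference) can be illustrated with a small Python sketch. This is not how h5diff is implemented; it only mimics the shape of the report on two hypothetical in-memory arrays, and it assumes the difference column is the absolute difference of the two values.

```python
def report_differences(a, b):
    """Return (position, a_value, b_value, difference) for each differing element."""
    diffs = []
    for i, (row_a, row_b) in enumerate(zip(a, b)):
        for j, (x, y) in enumerate(zip(row_a, row_b)):
            if x != y:
                diffs.append(((i, j), x, y, abs(x - y)))
    return diffs


# Hypothetical 2 x 3 arrays standing in for the same dataset in two files.
first = [[0, 1, 2], [3, 4, 5]]
second = [[10, 1, 2], [3, 4, 9]]

found = report_differences(first, second)
for (i, j), x, y, d in found:
    print("[ %d %d ]  %-6d %-6d %d" % (i, j, x, y, d))
print("%d differences found" % len(found))
```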

h5repack
Copies an HDF5 file to a new file, with or without compression and chunking. It can:
- remove unused space
- apply a compression filter
- apply a storage layout

h5repack: Applying filters
-f FILTER
  GZIP   apply GZIP compression
  SZIP   apply SZIP compression
  SHUF   apply the HDF5 shuffle filter
  FLET   apply the HDF5 checksum filter
  NBIT   apply NBIT compression
  SOFF   apply the HDF5 Scale/Offset filter
  NONE   remove all filters
For example:
  h5repack -i SDS2.h5 -o SDS2_compressed.h5 -f /IntArray:GZIP=9
Remember that compression is not applied to data smaller than 1K; see the -m flag.

h5repack: Data layout
-l LAYOUT
  CHUNK   apply chunked layout
  COMPA   apply compact layout
  CONTI   apply contiguous layout
For example:
  h5repack -i SDS.h5 -o SDS_chunk.h5 -l /Floats/FloatArray,/IntArray:CHUNK=2x3
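To see what CHUNK=2x3 means for the two datasets in the example, it helps to count the chunks. The sketch below is plain arithmetic in Python, using the dataset dimensions shown earlier for SDS.h5 (4 x 3 and 5 x 6); chunk_grid is a hypothetical helper name.

```python
import math


def chunk_grid(dims, chunk):
    """Number of chunks along each dimension when dims is tiled by chunk."""
    return tuple(math.ceil(d / c) for d, c in zip(dims, chunk))


for name, dims in (("/Floats/FloatArray", (4, 3)), ("/IntArray", (5, 6))):
    grid = chunk_grid(dims, (2, 3))
    print(name, "chunk grid:", grid, "total chunks:", math.prod(grid))
```

The 5 x 6 dataset needs three chunks along its first dimension even though the third is only half used, because chunks at the edge of a dataset are stored whole.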

h5repart
Repartitions a file or a family of files. For example:
  h5repart -m 200m int16kx16k.h5 part200m%d.h5

The 977 MB input file is split into five family members:
  200 MB   part200m0.h5
  200 MB   part200m1.h5
  200 MB   part200m2.h5
  200 MB   part200m3.h5
  177 MB   part200m4.h5
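The member sizes follow directly from dividing the file size by the member size given with -m. A minimal Python sketch of that arithmetic, using the 977 MB and 200 MB figures from the example (family_member_sizes is a hypothetical helper, and sizes are treated as whole megabytes for simplicity):

```python
def family_member_sizes(total_mb, member_mb):
    """Split total_mb into full members of member_mb plus one remainder member."""
    full, rest = divmod(total_mb, member_mb)
    sizes = [member_mb] * full
    if rest:
        sizes.append(rest)
    return sizes


# The %d in the output name is replaced by the member index, starting at 0.
for index, size in enumerate(family_member_sizes(977, 200)):
    print("part200m%d.h5" % index, size, "MB")
```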

h5import
Imports binary or ASCII data into an HDF5 file.
  h5import infile -c config_file [infile -c config_file2...] -outfile outfile
Example:
  h5import float5x4x2.txt -c First_set.conf -o First_set.h5

Configuration file:
  PATH work/First-set
  INPUT-CLASS TEXTFP
  RANK 3
  DIMENSION-SIZES
  OUTPUT-CLASS FP
  OUTPUT-SIZE 64
  OUTPUT-ARCHITECTURE IEEE
  OUTPUT-BYTE-ORDER LE
  CHUNKED-DIMENSION-SIZES
  MAXIMUM-DIMENSIONS

Resulting file:
  GROUP "/" {
     GROUP "work" {
        DATASET "First-set" {
           DATATYPE  H5T_IEEE_F64LE
           DATASPACE  SIMPLE { ( 5, 2, 4 ) / ( 8, 8, H5S_UNLIMITED ) }
           DATA {
           (0,0,0): 1.01, 1.02, 1.03, 1.04,
           (0,1,0): 1.11, 1.12, 1.13, 1.14,
           (1,0,0): 1.21, 1.22, 1.23, 1.24,
           (1,1,0): 1.31, 1.32, 1.33, 1.34,
           (2,0,0): 1.41, 1.42, 1.43, 1.44,
           (2,1,0): 1.51, 1.52, 1.53, 1.54,
           (3,0,0): 2.01, 2.02, 2.03, 2.04,
           (3,1,0): 2.11, 2.12, 2.13, 2.14,
           (4,0,0): 2.21, 2.22, 2.23, 2.24,
           (4,1,0): 2.31, 2.32, 2.33, 2.34
           }
        }
     }
  }

h5jam/h5unjam
Adds or removes a file at the beginning of an HDF5 file (the user block).
Example:
  h5jam: adds text to the user block
    h5jam -u test_ub.txt -i test_ub.h5
  h5unjam: removes text from the user block
    h5unjam -i test_ub.h5 -u out_ub.txt -o out_ub.h5
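A user block works because the HDF5 format allows the superblock to start at offset 0 or at 512, 1024, 2048, and so on, so a reader probes those offsets for the 8-byte format signature. The Python sketch below illustrates that search on a hypothetical in-memory file; it is not the library's implementation, and find_superblock and the 1 MiB search limit are assumptions made for the example.

```python
import io

# The 8-byte signature that begins every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_superblock(stream, max_offset=1 << 20):
    """Return the offset of the HDF5 signature, or None if it is not found.

    Only offsets 0, 512, 1024, 2048, ... are probed, up to max_offset.
    """
    offset = 0
    while offset <= max_offset:
        stream.seek(offset)
        if stream.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
            return offset
        offset = 512 if offset == 0 else offset * 2
    return None


# Hypothetical file: a text user block padded to 512 bytes, then the signature.
user_block = b"text stored in the user block\n".ljust(512, b"\0")
fake_file = io.BytesIO(user_block + HDF5_SIGNATURE + b"\0" * 64)
print(find_superblock(fake_file))
```

Everything before the returned offset is the user block, which is what h5unjam writes back out as a separate file.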

h5ls
Lists selected information about file objects in the specified format.
Example:
  h5ls -r SDS2.h5

  /Floats                  Group
  /Floats/DoubleArray      Dataset {10, 5}
  /Floats/FloatArray       Dataset {4, 3}
  /Floats/subs             Group
  /IntArray                Dataset {5, 6}

gif2h5 / h52gif
gif2h5: converts a GIF file into HDF5
  gif2h5 apollo17_earth.gif apollo17_earth.h5
h52gif: converts an HDF5 file into GIF
  h52gif apollo17_earth.h5 apollo17_earth2.gif -i /apollo17_earth.gif/Image0 -p "/apollo17_earth.gif/Global Palette"

h5copy
Copies an object from one location to another, within a file or across files.
Available in HDF5 1.8 and later.
[Diagram: a source tree with /, Floats, FloatArray, and IntArray, and a destination tree with / and FloatArray]

h5copy
usage: h5copy [OPTIONS] [OBJECTS...]
  -i, --input         input file name
  -o, --output        output file name
  -s, --source        source object name
  -d, --destination   destination object name
  -f, --flag          one of:
        shallow   copy only immediate members of groups
        soft      expand soft links into new objects
        ext       expand external links into new objects
        ref       copy objects that are pointed to by references
        noattr    copy objects without copying attributes

h5copy
Example:
  h5copy -i SDS.h5 -o SDS_cp.h5 -s /Floats/FloatArray -d /FloatArray
[Diagram: SDS.h5 with /, Floats, FloatArray, and IntArray; SDS_cp.h5 with / and FloatArray]

h5copy -f shallow
[Diagram: a source file containing floats (with 64-bit, f32, f1, f2) and integers (with i1, i2). A default copy of floats carries 64-bit, f32, f1, and f2; a copy made with -f shallow carries only 64-bit and f32.]

h5copy -f soft
[Diagram: a file containing f1 and a soft link dset_SL to /f1, shown before and after copying with -f soft]

h5copy -f ref
[Diagram: a file containing d1, d, and a dataset of references dset_ref, shown before and after copying with -f ref]

h5stat
Prints various statistics about an HDF5 file.
Helps:
- to troubleshoot size overhead in HDF5 files
- to choose specific object properties and storage strategies
Available in HDF5 1.8 and later.

h5check
- Verifies that an HDF5 file is encoded according to the HDF5 File Format Specification
- Does not use the HDF5 library
- Serves as a watchdog that the HDF5 library implementation is compliant with the HDF5 File Format Specification
- Is NOT a part of the HDF5 source code distribution

How to use it?
  h5check [-vn] file
-vn sets the verbosity level:
  n=0   Terse: only prints whether the file is compliant or not
  n=1   Default: prints its progress and all errors found
  n=2   Verbose: prints everything it knows, usually for debugging

Example: a compliant file
  % h5check example1.h5
  VALIDATING example1.h5
  FOUND super block signature
  VALIDATING the super block at 0...
  VALIDATING the object header at
  VALIDATING the btree at
  FOUND btree signature.
  VALIDATING the local heap at
  FOUND local heap signature.
  …
  Result: File is in compliance.

Example: a non-compliant file
  h5check invalid2.h5
  FOUND super block signature
  VALIDATING the super block at 0...
  VALIDATING the object header at
  VALIDATING the btree at
  FOUND btree signature.
  VALIDATING the SNOD at
  FOUND SNOD signature.
  VALIDATING the object header at
  check_sym(at 1248): Errors from check_obj_header()
  decode_validate_messages(): Failure in type->decode().
  H5O_sdspace_decode(): Bad version number in simple dataspace message.
  VALIDATING the local heap at
  FOUND local heap signature.
  Main(): Errors from check_obj_header().
  decode_validate_messages(): Failure in type->decode().
  H5O_attr_decode(): Can't decode attribute dataspace.
  H5O_sdspace_decode(): Bad version number in simple dataspace message.
  …
  Result: File is not in compliance.

Using HDF5 Tools for Performance Tuning and Troubleshooting

Introduction
HDF5 tools can be very useful for performance tuning and troubleshooting:
  Discover objects and their properties in HDF5 files    h5dump -p
  Get file size overhead information                     h5stat
  Get the locations of the objects in a file             h5ls
  Discover differences                                   h5diff, h5ls
  Find the location of raw data                          h5ls -var

h5stat
Prints various statistics about an HDF5 file.
Helps:
- to troubleshoot size overhead in HDF5 files
- to choose specific object properties and storage strategies
To use:
  h5stat --help
  h5stat file.h5
A full specification is available. Let us know if you need some "special" type of statistics.

h5stat
Reports two types of statistics:
- High-level information about objects (examples):
    Number of different objects (groups, datasets, datatypes) in a file
    Number of unique datatypes
    Size of raw data in a file
- Information about objects' structural metadata:
    Sizes of structural metadata (total/free)
      Object headers, local and global heaps
      Sizes of B-trees
    Object header fragmentation

h5stat
Examples of high-level information:

  File information
    # of unique groups:
    # of unique datasets: 30
    # of unique named datatypes: 0
  …
    Max. # of links to object: 1
    Max. depth of hierarchy: 4
    Max. # of objects in group: 19
  …
  Group bins:
    # of groups of size 0:
    # of groups of size 1 - 9: 7
    # of groups of size : 1
  …
  Max. dimension size of 1-D datasets: 1643
  …
  Dataset filters information:
    Number of datasets with
    …
    SZIP filter: 2
    …
    NBIT filter: 10
    USER-DEFINED filter: 1

h5stat
Conclusions:
- There are a lot of empty groups in the file, which makes it a good candidate for the compact group feature (h5repack -l ….)
- Some datasets use "user-defined" filters and may not be readable by the HDF5 library
- SZIP compression is needed to read some datasets
Questions this raises for the user:
- Oh… my application uses buffers of size 1024 to read data, but the largest 1-D dataset has 1643 elements… No wonder it crashes on reading…
- Do I have all the filters needed to read the data?

h5stat
Examples of structural metadata information:

  Object header size: (total/unused)
    Groups: 1808/72
    Datasets: 15792/832
  …
  Dataset storage information:
    Total raw data size:
  …
  Dataset datatype #3:
    Count (total/named) = (2/0)
    Size (desc./elmt) = (10/65535)
  Dataset datatype #4:
    Count (total/named) = (1/0)
    Size (desc./elmt) = (10/32000)

h5stat
Conclusions:
- File size: % overhead (not bad at all!)
- There are some elements of size 65535 and 32000. Oh… Is that really what I want? Should I use another datatype and take advantage of compression?

Case study: Using HDF5 tools to debug a problem

Problem report: "My application creates files on Windows with VS2005 and VS2003. I can read the VS2003 file but not the VS2005 one. h5dump reads both files OK and there are no differences. What am I doing wrong?"

  h5diff good.h5 bad.h5
    Datatype: </Definitions/timespec> and </Definitions/timespec>
    1 differences found

  h5ls -var good.h5
    /Definitions/timespec    Type
        Location:  0:1:0:900

  h5debug good.h5 900
    Message Information:
      Type class: compound
      Size: 8 bytes

  h5debug bad.h5 900
    Message Information:
      Type class: compound
      Size: 16 bytes

Case study: Using HDF5 tools to debug a problem
Conclusions:
- The compound datatype "timespec" requires a different number of bytes on VS2005 (16 bytes; 2 x 8 bytes) and on VS2003 (8 bytes; 2 x 4 bytes)
- Oh… How do I read my data back? I assumed that my struct would need only 8 bytes for each element, but it needs 16 bytes on VS2005.
- I need the H5Tget_native_type function to find the type of my data in memory
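The size mismatch can be reproduced without HDF5. The Python sketch below uses the standard struct module to model the two layouts named above (two 4-byte fields versus two 8-byte fields) and shows that a reader hard-coded for the 8-byte layout cannot decode a 16-byte record. The field values are hypothetical; in an HDF5 application the analogous fix is to ask the library for the stored type's in-memory equivalent instead of assuming one.

```python
import struct

# Two 4-byte signed integers (the 8-byte layout) versus
# two 8-byte signed integers (the 16-byte layout).
layout_2x4 = struct.calcsize("<2i")
layout_2x8 = struct.calcsize("<2q")
print(layout_2x4, layout_2x8)

# A hypothetical record written with the 16-byte layout.
record = struct.pack("<2q", 1209340800, 500)

# Decoding with the matching layout recovers both fields.
print(struct.unpack("<2q", record))

# Decoding with the assumed 8-byte layout fails: the buffer size does not match.
try:
    struct.unpack("<2i", record)
except struct.error as exc:
    print("layout mismatch:", exc)
```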

Questions?
End of Part II