Update on HDF5 1.8. The HDF Group. HDF and HDF-EOS Workshop X, November 28, 2006.

1 Update on HDF5 1.8. The HDF Group. HDF and HDF-EOS Workshop X, November 28, 2006.

2 Why HDF5 1.8?

3 Nov. 28, 2006, HDF and HDF-EOS Workshop X, Landover MD. “… as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know.” (Donald Rumsfeld)

4 Some things we knew we knew: need high-level APIs (image, etc.); need more datatypes (packed n-bit, etc.); need external and other links; tools needed (h5pack, etc.); caching embellishments; eventually, multithreading.

5 Things we knew we did not know: new requirements from EOS and ASCI; new applications that would use HDF5; how HDF5 would really perform in parallel; what new tools, features and options would be needed; new APIs and API features.

6 Things we didn’t know we didn’t know: completely unanticipated applications; new data types and structures (e.g. DNA sequences); new operations (e.g. writing many real-time streams simultaneously).

7 HDF5 1.8 topics: dataset and datatype improvements; group improvements; link revisions; shared object header messages; metadata cache improvements; other improvements; platform-specific changes; high-level APIs; parallel HDF5; tool improvements.

8 Dataset and Datatype Improvements

9 Text-based data type descriptions. Why: simplify datatype creation; make datatype creation code more readable; facilitate debugging by printing the text description of a data type. What: new routines to create a data type from its text description (H5LTtext_to_dtype) and to convert a data type back to text (H5LTdtype_to_text).

10 Text data type description: example. Create a datatype of compound type.
    /* Create the data type with text description */
    dtype = H5Ttext_to_type("typedef struct foo {int a; float b;} foo_t;");
    /* Convert the data type back to text */
    H5Ttype_to_text(dtype, NULL, H5T_C, &tsize);

11 Serialized datatypes and dataspaces. Why: allow datatype and dataspace information to be transmitted between processes; allow datatypes and dataspaces to be stored in non-HDF5 files. What: a new set of routines to serialize and deserialize HDF5 datatypes and dataspaces.

12 Integer-to-float conversion during I/O. Why: convert integers to floats during I/O. What: integer-to-float conversion is supported during I/O.

13 Revised conversion exception handling. Why: give applications greater control over exceptions (range errors, etc.) during datatype conversion. What: revised conversion exception handling.

14 Revised conversion exception handling. To handle exceptions during conversions, register a handling function through H5Pset_type_conv_cb(). Exception cases: H5T_CONV_EXCEPT_RANGE_HI, H5T_CONV_EXCEPT_RANGE_LOW, H5T_CONV_EXCEPT_TRUNCATE, H5T_CONV_EXCEPT_PRECISION, H5T_CONV_EXCEPT_PINF, H5T_CONV_EXCEPT_NINF, H5T_CONV_EXCEPT_NAN. Return values: H5T_CONV_ABORT, H5T_CONV_UNHANDLED, H5T_CONV_HANDLED.
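The callback pattern on this slide can be modelled in plain C without the library. The sketch below is only a stand-in: the type names, the `convert_long_to_schar` routine and the clamp-on-unhandled default are assumptions made for illustration, not the HDF5 API (the real hook is registered with H5Pset_type_conv_cb()).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins mirroring a subset of the names on the slide. */
typedef enum { EXCEPT_RANGE_HI, EXCEPT_RANGE_LOW } except_t;
typedef enum { CONV_ABORT = -1, CONV_UNHANDLED = 0, CONV_HANDLED = 1 } conv_ret_t;
typedef conv_ret_t (*except_cb_t)(except_t type, long src,
                                  signed char *dst, void *user);

/* Convert long -> signed char, consulting `cb` for every out-of-range value.
 * Returns 0 on success, -1 if the callback asked to abort. */
int convert_long_to_schar(const long *src, signed char *dst, size_t n,
                          except_cb_t cb, void *user)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        if (src[i] >= SCHAR_MIN && src[i] <= SCHAR_MAX) {
            dst[i] = (signed char)src[i];
            continue;
        }
        except_t type = src[i] > SCHAR_MAX ? EXCEPT_RANGE_HI : EXCEPT_RANGE_LOW;
        conv_ret_t r = cb ? cb(type, src[i], &dst[i], user) : CONV_UNHANDLED;
        if (r == CONV_ABORT)
            return -1;
        if (r == CONV_UNHANDLED) /* assumed default: clamp to the nearest limit */
            dst[i] = (type == EXCEPT_RANGE_HI) ? SCHAR_MAX : SCHAR_MIN;
        /* CONV_HANDLED: the callback already stored a value in dst[i]. */
    }
    return 0;
}

/* Example callback: count exceptions, store 0 for values that are too high,
 * and leave values that are too low to the default handling. */
conv_ret_t count_and_zero(except_t type, long src, signed char *dst, void *user)
{
    (void)src;
    ++*(int *)user;
    if (type == EXCEPT_RANGE_HI) {
        *dst = 0;
        return CONV_HANDLED;
    }
    return CONV_UNHANDLED;
}
```

The three return values give the application three choices per element: supply its own value, defer to the converter's default, or stop the whole conversion.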

15 Compression filter for n-bit data. Why: compact storage for user-defined datatypes. What: when data is stored on disk, padding bits are chopped off and only significant bits are stored; supports most datatypes; works with compound datatypes.

16 N-bit compression example. In memory, one value of an N-bit datatype is stored like this:
    | byte 3 | byte 2 | byte 1 | byte 0 |
    |????????|????SPPP|PPPPPPPP|PPPP????|
    S = sign bit, P = significant bit, ? = padding bit
After passing through the N-bit filter, all padding bits are chopped off, and the bits are stored on disk like this:
    |    1st value    |    2nd value    |
    |SPPPPPPP PPPPPPPP|SPPPPPPP PPPPPPPP|...
The opposite (decompression) happens when going from disk to memory.
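The packing shown in the diagram can be sketched in a few lines of standalone C. This is an illustration of the idea only, not the filter's actual code: the function names are hypothetical, the values are assumed to be 32-bit words, and padding bits are assumed to come back as zero on unpacking. For the layout on the slide, `offset` is 4 and `precision` is 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack the `precision` significant bits that start at bit `offset` of each
 * 32-bit value into `out`, most significant bit first, with no padding
 * between values. Returns the number of bytes written. */
size_t nbit_pack(const uint32_t *in, size_t n, unsigned offset,
                 unsigned precision, uint8_t *out)
{
    assert(precision >= 1 && offset + precision <= 32);
    size_t total_bits = n * (size_t)precision;
    size_t nbytes = (total_bits + 7) / 8;
    memset(out, 0, nbytes);

    size_t bitpos = 0;
    for (size_t i = 0; i < n; i++) {
        for (unsigned b = 0; b < precision; b++, bitpos++) {
            /* Walk the field from its top bit down to its bottom bit. */
            unsigned src_bit = offset + precision - 1 - b;
            if ((in[i] >> src_bit) & 1u)
                out[bitpos / 8] |= (uint8_t)(0x80u >> (bitpos % 8));
        }
    }
    return nbytes;
}

/* Inverse: rebuild the 32-bit values; padding bits come back as zero. */
void nbit_unpack(const uint8_t *in, size_t n, unsigned offset,
                 unsigned precision, uint32_t *out)
{
    assert(precision >= 1 && offset + precision <= 32);
    size_t bitpos = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t v = 0;
        for (unsigned b = 0; b < precision; b++, bitpos++) {
            unsigned bit = (in[bitpos / 8] >> (7 - bitpos % 8)) & 1u;
            v |= (uint32_t)bit << (offset + precision - 1 - b);
        }
        out[i] = v;
    }
}
```

With 16 significant bits per 32-bit value, two values occupy 4 bytes on disk instead of 8.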

17 Offset+size storage filter. Why: use less storage when less precision is needed. What: performs a scale/offset operation on each value; truncates the result to fewer bits before storing; currently supports integers and floats. Example:
    H5Pset_scaleoffset(dcr, H5Z_SO_INT, H5Z_SO_INT_MINBITS_DEFAULT);
    H5Dcreate(……, dcr);
    H5Dwrite(…);

18 Example with floating-point type. Data: {104.561, 99.459, 100.545, 105.644}. Choose the scaling factor, i.e. the decimal precision to keep, e.g. scale factor D = 2.
    1. Find the minimum value (offset): 99.459
    2. Subtract the minimum value from each element. Result: {5.102, 0, 1.086, 6.185}
    3. Scale the data by multiplying by 10^D = 100. Result: {510.2, 0, 108.6, 618.5}
    4. Round the data to integers. Result: {510, 0, 109, 619}
    5. Pack and store using the minimum number of bits.
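The five steps above can be traced with a short standalone C routine. It is a sketch under stated assumptions rather than the filter's implementation: `scale_offset_encode` is a hypothetical name, the data are held as `double`, and the bit count is simply the width of the largest rounded integer (the real filter may reserve extra values, for example for fill data).

```c
#include <assert.h>
#include <stddef.h>

/* Steps 1-4: subtract the minimum, scale by 10^d, round to integers.
 * Writes the integers to `out` and the offset to `*min_out`;
 * returns the number of bits needed for the largest integer (step 5). */
unsigned scale_offset_encode(const double *data, size_t n, unsigned d,
                             unsigned long *out, double *min_out)
{
    assert(n > 0);

    /* Step 1: find the minimum value (the offset). */
    double min = data[0];
    for (size_t i = 1; i < n; i++)
        if (data[i] < min)
            min = data[i];

    /* 10^d, built by repeated multiplication. */
    double scale = 1.0;
    for (unsigned k = 0; k < d; k++)
        scale *= 10.0;

    /* Steps 2-4: differences are non-negative, so adding 0.5 and
     * truncating rounds to the nearest integer. */
    unsigned long max = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = (unsigned long)((data[i] - min) * scale + 0.5);
        if (out[i] > max)
            max = out[i];
    }
    *min_out = min;

    /* Step 5: smallest bit width that can hold `max` (at least 1). */
    unsigned bits = 1;
    while (bits < 8 * sizeof max && (max >> bits) != 0)
        bits++;
    return bits;
}
```

For the slide's data the largest integer is 619, which fits in 10 bits, so each value needs 10 bits instead of a full floating-point word. Reading reverses the steps (divide by 10^D, add the offset), so only D decimal digits of precision survive.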

19 “NULL” dataspace. Why: allow datasets with no elements to be described; netCDF 4 needed a “place holder” for attributes. What: a dataset with no dimensions and no data.

20 Group improvements

21 Access links by creation-time order. Why: allow iteration and lookup of a group’s links (children) by creation order as well as by name order; support the netCDF access model for netCDF 4. What: option to access objects in a group according to relative creation time.

22 “Compact groups”. Why: save space and access time for small groups; if groups are small, they don’t need B-tree overhead. What: alternate storage for groups with few links. Example: file with 11,600 groups; with the original group structure, file size ~20 MB; with compact groups, file size ~12 MB; total savings: 8 MB (40%); average savings per group: ~700 bytes.
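The figures in this example are consistent with one another. A quick check, assuming 1 MB = 10^6 bytes (an assumption; with 2^20 bytes per MB the last value comes to about 723 bytes):

```latex
\begin{align*}
\text{total savings} &= 20\ \text{MB} - 12\ \text{MB} = 8\ \text{MB} \\
\text{relative savings} &= \frac{8\ \text{MB}}{20\ \text{MB}} = 40\% \\
\text{savings per group} &\approx \frac{8 \times 10^{6}\ \text{bytes}}{11{,}600} \approx 690\ \text{bytes}
\end{align*}
```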

23 Better large group storage. Why: faster, more scalable storage and access for large groups. What: new format and method for storing groups with many links.

24 Intermediate group creation. Why: simplify creation of a series of connected groups; avoid having to create each intermediate group separately, one by one. What: intermediate groups can be created when creating an object in a file, with one function call.

25 Example: add intermediate groups. Want to create “/A/B/C/dset1”. “A” exists, but “B/C/dset1” does not. [Diagram: before, the root group / contains A; after, A contains B, which contains C, which contains dset1.]
    H5Dcreate(file_id, "/A/B/C/dset1", ..)
One call creates groups “B” and “C”, then creates “dset1”.

26 Link Revisions

27 What are links? Links connect groups to their members. “Hard” links point to a target by address. “Soft” links store the path to a target. [Diagram: a root group with a hard link to a dataset and a soft link storing the path “/target dataset”.]

28 New: external links. Why: access objects by file and path within the file. What: store the location of a file and a path within that file; can link across files. [Diagram: in file1.h5, the root group holds the external link “dataset EL”, which stores “file2.h5” and “target dataset”; in file2.h5, the root group holds the dataset “target dataset”.]

29 New: user-defined links. Why: allow applications to create their own kinds of links and link operations, such as: create a “hard” external link that finds an object by address; create a link that accesses a URL; keep track of how often a link is accessed, or other behavior. What: an application can create new kinds of links by supplying custom callback functions; these can do anything HDF5 hard, soft, or external links do.

30 Shared Object Header Messages

31 Shared object header messages. Why: metadata is duplicated many times, wasting space. Example: you create a file with 10,000 datasets; all use the same datatype and dataspace; HDF5 needs to write this information 10,000 times! [Diagram: Datasets 1, 2 and 3, each with its own data and its own copy of the datatype and dataspace.]

32 Shared object header messages. What: enable messages to be shared automatically; HDF5 shares duplicated messages on its own! [Diagram: Dataset 1 (data 1) and Dataset 2 (data 2) referring to a single shared datatype and dataspace.]

33 Shared messages. Happens automatically; works with datatypes, dataspaces, attributes, fill values, and filter pipelines; saves space if these objects are relatively large; may be faster if HDF5 can cache shared messages. Drawbacks: usually slower than non-shared messages; adds overhead to the file (an index for storing shared datatypes; 25 bytes per instance); older library versions can’t read files with shared messages.

34 Two informal tests. File with 24 datasets, all with the same big datatype: 26,000 bytes normally; 17,000 bytes with shared messages enabled; saves 375 bytes per dataset. But make a bad decision, invoking shared messages while creating only one dataset: 9,000 bytes normally; 12,000 bytes with shared messages enabled; probably slower when reading and writing, too. Moral: shared messages can be a big help, but only in the right situation!
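Worked out from the sizes quoted in the two tests, the per-dataset saving and the single-dataset penalty are:

```latex
\begin{align*}
\text{24 datasets:}\quad & \frac{26{,}000 - 17{,}000}{24} = \frac{9{,}000}{24} = 375\ \text{bytes saved per dataset} \\
\text{1 dataset:}\quad & 12{,}000 - 9{,}000 = 3{,}000\ \text{bytes of added overhead}
\end{align*}
```

Sharing pays off only once the space recovered from duplicate messages exceeds the fixed cost of the shared-message index.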

35 Metadata cache improvements

36 Metadata cache improvements. Why: improve I/O performance and memory usage when accessing many objects. What: new metadata cache APIs to control the cache size and to monitor the actual cache size and current hit rate. Under the hood: adaptive cache resizing, which automatically detects the current working set size and sets the maximum cache size to the working set size.

37 Metadata cache improvements. Note: most applications do not need to worry about the cache. See “Advanced topics” for details. And if you do see unusual memory growth or poor performance, please contact us. We want to help you.

38 Other improvements

39 New extendible error-handling API. Why: enable an application to integrate its error reporting with the HDF5 library error stack. What: new error-handling API. H5Epush: push major and minor error ID on specified error stack. H5Eprint: print specified stack. H5Ewalk: walk through specified stack. H5Eclear: clear specified stack. H5Eset_auto: turn error printing on/off for specified stack. H5Eget_auto: return settings for specified stack traversal.

40 Extendible ID API. ID management routines allow an application to use the HDF5 ID-to-object mapping routines.

41 Attribute improvements. Why: use less storage when large numbers of attributes are attached to a single object; iterate over or look up attributes by creation order. What: property to create an index on the order in which attributes are created; improved attribute storage.

42 Support for Unicode character set. Why: so applications can create names using Unicode; netCDF 4 needed this. What: UTF-8 Unicode encoding is now supported for string datatypes and for names of links and attributes. Example:
    H5Pset_char_encoding(lcpl_id, H5T_CSET_UTF8);
    H5Llink(file_id, "UTF-8 name", …, lcpl_id, …);

43 Efficient copying of HDF5 objects. Why: enable applications to copy objects efficiently. What: new routines to copy an object in an HDF5 file within the current file or to another file; done at a low level in the HDF5 file, allowing entire group hierarchies to be copied quickly and compressed datasets to be copied without going through a decompression/compression cycle.

44 Performance of object copy routines

45 Data transformation filter. Why: apply arithmetic operations to data during I/O. What: data transformation filter; the transform is expressed by an algebraic formula; only +, -, *, and / are supported. Example: an expression parameter is set, such as x*(x-5); when the dataset is read or written, x*(x-5) is applied per element; when reading, values in the file are unchanged; when writing, transformed data is written to the file.
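The read and write semantics described here can be modelled with plain arrays. The sketch below is a library-free illustration in which the "file" is just an in-memory array and the helper names are hypothetical; in HDF5 itself the expression is supplied as a string on a dataset transfer property list (H5Pset_data_transform).

```c
#include <assert.h>
#include <stddef.h>

typedef double (*transform_fn)(double x);

/* The slide's example expression, x*(x-5). */
double example_transform(double x)
{
    return x * (x - 5.0);
}

/* Read: copy file -> memory, transforming each element on the way.
 * The file array is const, so the stored values stay unchanged. */
void read_with_transform(const double *file, double *mem, size_t n,
                         transform_fn f)
{
    assert(f != NULL);
    for (size_t i = 0; i < n; i++)
        mem[i] = f(file[i]);
}

/* Write: copy memory -> file, transforming each element on the way.
 * The file ends up holding the transformed values. */
void write_with_transform(const double *mem, double *file, size_t n,
                          transform_fn f)
{
    assert(f != NULL);
    for (size_t i = 0; i < n; i++)
        file[i] = f(mem[i]);
}
```

The asymmetry matters in practice: a read-side transform is a view over unchanged stored data, while a write-side transform permanently changes what is stored.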

46 Stackable Virtual File Drivers. What is a Virtual File Driver (VFD)?

47 Structure of HDF5 Library. Object API (C, Fortran 90, Java, C++): specify objects and transformation properties; invoke data movement operations and data transformations. Library internals: performs data transformations and other prep for I/O; configurable transformations (compression, etc.). Virtual file I/O (C only): perform byte-stream I/O operations (open/close, read/write, seek); user-implementable I/O (stdio, network, memory, etc.).

48 Stackable VFD. The HDF5 VFD layer allows: storing data using different physical file layouts, e.g. the Family VFD (writes a file as a “family of files”); doing different types of I/O, e.g. stdio (standard I/O) or MPI-I/O (for parallel I/O).

49 Stackable VFD. Why “stackable”: before now, only one VFD could be used at a time; VFDs could not interoperate. What is “stackable”: a non-terminal VFD may stack on top of compatible non-terminal VFDs and, eventually, a terminal VFD. Two kinds of VFD: non-terminal (e.g. Family) and terminal (e.g. stdio, MPI-I/O).

50 Stackable VFD. [Diagram labels: Application; HDF5 API; non-terminal VFDs (Family, Filesplit with metadata and raw data branches); terminal VFDs (stdio, mpiio, Sec2); default I/O path; HDF5 files.]

51 Platform-specific changes

52 Platform-specific changes. Why: better UNIX/Linux portability. What: 1.8 uses the latest GNU “auto” tools (autoconf, automake, libtool), which improves portability between many machine and OS configurations; the build can now be done in parallel with the gmake “-j” flag, which speeds up the build, test and install processes; the build infrastructure includes many other improvements as well.

53 Platforms to be dropped. Operating systems: HPUX 11.00; Mac OS 10.3; AIX 5.1 and 5.2; SGI IRIX64-6.5; Linux 2.4; Solaris 2.8 and 2.9. Compilers: GNU C compilers older than 3.4 (Linux); Intel 8.*; PGI V. 5.*, 6.0; MPICH 1.2.5. See http://www.hdfgroup.org/HDF5/release/alpha/obtain518.html

54 Platforms to be added. Systems: Alpha OpenVMS; Mac OS X 10.4 (Intel); Solaris 2.* on Intel (?); Cray XT3; Windows 64-bit (32-bit binaries); Linux 2.6; BG/L. Compilers: g95; PGI V. 6.1; Intel 9.*; MPICH 1.2.7; MPICH2.

55 High level APIs

56 High-level Fortran APIs. Fortran APIs have been added for H5Lite, H5Image and H5Table.

57 Dimension scales. Similar to dimension scales in HDF4 and coordinate variables in netCDF. What is a dimension scale? An HDF5 dataset with additional metadata that identifies the dataset as a “Dimension Scale”; associated with dimensions of HDF5 datasets; the meaning of the association is left to applications. A dimension scale can be shared by two or more dataset dimensions.

58 Dimension scales example. [HDF Explorer image]

59 Dimension scales example. [HDF Explorer image]

60 Sample dimension scale functions. H5DSset_scale: convert dataset to a dimension scale. H5DSattach_scale: attach scale to a dimension. H5DSdetach_scale: detach scale from a dimension. H5DSis_attached: verify if scale attached to dataset. H5DSget_scale_name: read name of scale.

61 HDF5Packet. Why: high-performance table writing; for data acquisition, when there are many sources of data (e.g. flight test). What: each row is a “packet”, a collection of fields, fixed or variable length; append only; indexed retrieval.

62 Packets in HDF5. [Diagram: data plotted against time, showing fixed-length data records and variable-length records.]

63 Parallel HDF5

64 Collective I/O improvements. Why: collective I/O was not available for chunked data; collective I/O was not available for complex selections; collective I/O is key to improving performance for parallel HDF5. What: collective I/O works for chunked storage; works for irregular selections for both chunked and contiguous storage.

65 Parallel h5diff (ph5diff). Compares two files in an MPI parallel environment. Compares multiple datasets simultaneously.

66 Windows MPICH support. Windows MPICH support: prototype.

67 Tool improvements

68 New features for old tools. h5dump: dump data in binary format; faster for files with large numbers of objects. h5diff: can now compare dataset regions; parallel ph5diff now available. h5repack: efficient data copy using H5Gcopy(); able to handle big datasets.

69 New HDF5 tools. h5copy: copies a group, dataset or named datatype from one location to another; copies within a file or across files. h5repart: partitions a file into a family of files. h5import: imports binary/ASCII data into an HDF5 file. h5check: verifies an HDF5 file against the defined HDF5 File Format Specification. h5stat: reports statistics about a file and the objects in a file.

70 Thank You

71 Questions/comments?

72 For more information. Go to http://www.hdfgroup.org/HDF5/ and click on “Obtain HDF5 1.8.0 Alpha”. Look at the table “Information”.

73 Acknowledgement. This report is based upon work supported in part by a Cooperative Agreement with NASA under NASA NNG05GC60A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration.

